Python Data Analysis Basics: Mastering Pandas Core Functions

Development, Data Science
20 Jun, 2024

Pandas: the Excel of the Python data ecosystem

How did Python become the dominant language in data science and machine learning? Much of the credit goes to its ecosystem of libraries built for handling data, and Pandas sits at the center of that ecosystem.

Pandas lets you manipulate and analyze tabular data, such as an Excel spreadsheet or a table in a relational (SQL) database, from within Python. It is often said that the bulk of a data analyst's work, including preprocessing, cleaning, filtering, and grouping large datasets, is done with Pandas.

In this post, we will take a quick look at Pandas' two core data structures and essential data manipulation methods.

Getting started with Pandas

To use Pandas, you must first install and import the library. By convention, it is imported using the alias pd.

# Install (Terminal)
# pip install pandas

# import
import pandas as pd
import numpy as np  # NumPy, a numerical computing library, is often used alongside Pandas

1. The heart of Pandas: two core data structures

Pandas provides two special containers (data structures) to store data: Series and DataFrame.

Series

A Series is a one-dimensional array. You can think of it as a single column of an Excel table. It resembles Python's built-in list, but each value also carries a label, called the index, that you can use to access it.

# Convert list to Pandas Series
data = ['Apple', 'Banana', 'Cherry']
s = pd.Series(data, index=['a', 'b', 'c'])
print(s)

# Output result:
# a     Apple
# b    Banana
# c    Cherry
# dtype: object
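To make the role of the index concrete, here is a minimal sketch that reuses the Series above and reads the same value by label and by position. The variable names are only illustrative.

```python
import pandas as pd

s = pd.Series(['Apple', 'Banana', 'Cherry'], index=['a', 'b', 'c'])

# Access by label (the index value)
by_label = s['b']
print(by_label)          # Banana

# Access by integer position, independent of the labels
by_position = s.iloc[1]
print(by_position)       # Banana

# Without an explicit index, Pandas assigns 0, 1, 2, ... as labels
default_index = pd.Series(['Apple', 'Banana', 'Cherry'])
print(list(default_index.index))  # [0, 1, 2]
```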

DataFrame

A DataFrame is a two-dimensional table made up of rows and columns, and its contents can be edited. You can think of it as several Series that share the same index, gathered side by side. This is the structure you will work with most in real data analysis.

# Create DataFrame using dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Paris'],
}
df = pd.DataFrame(data)
print(df)

# Output result:
#       Name  Age      City
# 0    Alice   25  New York
# 1      Bob   30    London
# 2  Charlie   35     Paris
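The idea that a DataFrame is a set of Series gathered together can be checked directly. The sketch below builds a small DataFrame from two Series and reads one column back out. The names `names`, `ages`, and `age_column` are chosen only for illustration.

```python
import pandas as pd

# Two Series that share the default 0, 1, 2 index
names = pd.Series(['Alice', 'Bob', 'Charlie'])
ages = pd.Series([25, 30, 35])

# Each dictionary key becomes a column name
df = pd.DataFrame({'Name': names, 'Age': ages})
print(df.shape)                   # (3, 2)

# Reading a column back returns a Series again
age_column = df['Age']
print(type(age_column).__name__)  # Series
print(age_column.tolist())        # [25, 30, 35]
```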

2. Load external data

In practice, you rarely type data directly into code. Instead, you load it from files saved in formats such as CSV or Excel. Pandas provides functions for reading and writing files in a variety of formats.

# Read CSV file
df_csv = pd.read_csv('sales_data.csv')

# Read Excel file (reading .xlsx requires an Excel engine such as openpyxl to be installed)
df_excel = pd.read_excel('report.xlsx')

# Preview the first 5 rows (essential to understand what the data looks like!)
print(df_csv.head())

# Summary of overall information of the data frame (missing values, data type, etc.)
df_csv.info()
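Since 'sales_data.csv' is not included with this post, the following sketch reads the same kind of data from an in-memory string so that it runs as-is. The column names and values are made up for illustration. Passing a file path instead of the `StringIO` object works the same way.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file such as 'sales_data.csv'
csv_text = """date,product,units,price
2024-01-05,Widget,3,9.99
2024-01-06,Gadget,1,24.50
2024-01-07,Widget,,9.99
"""

# parse_dates converts the 'date' column from text to datetime values
df_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])

print(df_csv.head(2))   # preview the first two rows
df_csv.info()           # column types and non-null counts

# The empty 'units' cell is read as NaN, so the column becomes float64
print(df_csv['units'].dtype)  # float64

# Writing: with no path, to_csv returns the CSV text; index=False omits row labels
out = df_csv.to_csv(index=False)
```

Checking `info()` right after loading is a quick way to notice columns that were read with an unexpected type, such as numbers that turned into floats because of an empty cell.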

3. Core fundamentals of data selection and filtering

A core Pandas skill is selecting just the information you need from a large DataFrame.

Select a column

# Select only the 'Name' column (the result is a Series)
names = df['Name']

# To select several columns at once, pass them as a list (the result is a DataFrame)
subset = df[['Name', 'City']]
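The difference between single and double brackets is easy to miss, so here is a short sketch using the same sample data. The variable `name_frame` is introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Paris'],
})

# Single brackets with one name: a one-dimensional Series
names = df['Name']
print(names.ndim, names.shape)            # 1 (3,)

# Double brackets (a list with one name): a two-dimensional DataFrame
name_frame = df[['Name']]
print(name_frame.ndim, name_frame.shape)  # 2 (3, 1)

# The columns come back in the order given in the list
subset = df[['City', 'Name']]
print(list(subset.columns))               # ['City', 'Name']
```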

Filtering rows that meet conditions (Boolean Indexing)

This works much like the filter feature in Excel: it extracts only the rows that satisfy a given condition.

# Keep only people aged 30 or older
over_30 = df[df['Age'] >= 30]

# When combining multiple conditions (AND: &, OR: |)
# Caution: You must use parentheses () for each conditional expression.
condition_df = df[(df['Age'] >= 30) & (df['City'] == 'London')]
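Boolean indexing is easier to follow once you look at the intermediate mask. The sketch below, built on the same sample data, prints the mask and also shows OR (`|`) and NOT (`~`). The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Paris'],
})

# The condition alone produces a Series of True/False values, one per row
mask = df['Age'] >= 30
print(mask.tolist())                 # [False, True, True]

# Passing the mask to df[...] keeps only the rows marked True
print(df[mask]['Name'].tolist())     # ['Bob', 'Charlie']

# OR: younger than 30, or living in Paris
either = df[(df['Age'] < 30) | (df['City'] == 'Paris')]
print(either['Name'].tolist())       # ['Alice', 'Charlie']

# NOT: everyone who does not live in London
not_london = df[~(df['City'] == 'London')]
print(not_london['Name'].tolist())   # ['Alice', 'Charlie']
```

The parentheses matter because `&` and `|` bind more tightly than comparison operators in Python. Without them, an expression such as `df['Age'] >= 30 & df['City'] == 'London'` is grouped differently from what you intended and typically fails with an error.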

Select data from specific location (loc, iloc)

loc[]: Select based on the ‘label (name)’ of the index.
iloc[]: Select based on the ‘position (integer number)’ of the index.

# Get the whole row whose index label is 0 (label-based)
row_0 = df.loc[0]

# Get the value at row position 0, column position 1 (position-based; here, the first person's 'Age')
val = df.iloc[0, 1]
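With the default 0, 1, 2 index, labels and positions happen to coincide, which hides the difference between `loc` and `iloc`. The following sketch assigns string labels (hypothetical, chosen for this example) to make the distinction visible.

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [25, 30, 35], 'City': ['New York', 'London', 'Paris']},
    index=['alice', 'bob', 'charlie'],  # hypothetical row labels
)

# loc uses labels for both rows and columns
print(df.loc['bob', 'City'])       # London

# iloc uses integer positions for both rows and columns
print(df.iloc[1, 1])               # London

# Slicing differs: loc includes the end label, iloc excludes the end position
print(len(df.loc['alice':'bob']))  # 2
print(len(df.iloc[0:1]))           # 1
```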

4. Handling missing data

Real-world data often has empty cells (shown as NaN, "Not a Number") or is messy and full of invalid values. Handling missing values during the data cleaning (preprocessing) stage, before analysis begins, is essential.

# Count the missing values in each column
print(df.isnull().sum())

# Drop rows containing at least one missing value (returns a new DataFrame)
df_dropped = df.dropna()

# Fill missing values with another value (e.g. 0 or the mean value of a column)
# Passing a dict limits the fill to the 'Age' column, so other columns are not filled with an age
mean_age = df['Age'].mean()
df_filled = df.fillna({'Age': mean_age})
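The sample DataFrame used so far has no missing values, so the calls above have nothing to act on. Here is a small sketch with gaps added on purpose. The extra row for 'Dana' and the 'Unknown' placeholder are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Sample data with two deliberate gaps (the 'Dana' row is hypothetical)
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, np.nan, 35, 30],
    'City': ['New York', 'London', None, 'Paris'],
})

# Count the missing values in each column
print(df.isnull().sum())

# Drop every row that has any missing value
df_dropped = df.dropna()
print(len(df_dropped))            # 2

# Drop rows only when 'Age' is missing
df_age_known = df.dropna(subset=['Age'])
print(len(df_age_known))          # 3

# Fill each column with a value that makes sense for it
fill_values = {'Age': df['Age'].mean(), 'City': 'Unknown'}
df_filled = df.fillna(fill_values)
print(df_filled['Age'].tolist())  # [25.0, 30.0, 35.0, 30.0]
print(df_filled['City'].tolist()) # ['New York', 'London', 'Unknown', 'Paris']
```

Whether to drop or fill depends on the analysis: dropping loses rows, while filling with a mean keeps them but can blur the distribution of the column.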

Next steps

So far, we have covered only the most basic structures and functionality of Pandas. Pandas also has extensive built-in support for merging data (merge, concat), grouping and aggregation (groupby), and time series processing.

Rather than trying to memorize every function up front, download a CSV dataset that interests you from Kaggle or a public data portal, load it yourself, and experiment. There is no better way to learn than working with real data and working through the error messages you run into.
