Pandas is a widely used Python library for data manipulation and analysis. It provides fast, easy-to-use data structures and analysis tools, which have made it a standard choice for data scientists, analysts, and engineers.
The two primary data structures in Pandas are:
- Series: A one-dimensional labeled array, similar to a column in a spreadsheet.
- DataFrame: A two-dimensional labeled data structure with columns of potentially different types, much like a full spreadsheet or a SQL table. This is the most commonly used Pandas object.
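To make the relationship between the two structures concrete, here is a minimal sketch. The item names, prices, and the variables prices and items are made up for illustration and are not part of the examples that follow.

```python
import pandas as pd

# Hypothetical labels and values, chosen only for illustration
labels = ['pen', 'book', 'eraser']

# A Series: one-dimensional values, each with a label in the index
prices = pd.Series([9.99, 24.50, 3.75], index=labels, name='price')
print(prices['book'])  # look up a value by its label

# A DataFrame: several columns of different types sharing one index
items = pd.DataFrame(
    {'price': [9.99, 24.50, 3.75], 'in_stock': [True, False, True]},
    index=labels,
)
print(items)

# Selecting one column of a DataFrame gives back a Series
print(type(items['price']))
```

Thinking of a DataFrame as a collection of Series that share an index helps explain why single-column selection returns a Series.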
Code Examples
To run these examples, you first need to install Pandas: pip install pandas
1. Creating a DataFrame
You can create a DataFrame from many sources. Two common ones are a Python dictionary and a file such as a CSV.
import pandas as pd
# Create a DataFrame from a dictionary
data = {
    'Product': ['Laptops', 'Monitors', 'Keyboards', 'Mice'],
    'Sales_2024': [150, 220, 310, 450],
    'Sales_2025': [180, 250, 290, 400],
    'Category': ['Electronics', 'Electronics', 'Accessories', 'Accessories']
}
df = pd.DataFrame(data)
print("--- Created DataFrame ---")
print(df)
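The example above covers the dictionary route. For the file route, a rough sketch with pd.read_csv might look like the following. To keep it runnable without a file on disk, it reads from an in-memory string via io.StringIO; the CSV contents and the name df_from_csv are assumptions made for illustration.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk (hypothetical contents)
csv_text = """Product,Sales_2024,Sales_2025
Laptops,150,180
Monitors,220,250
"""

# pd.read_csv accepts a file path or a file-like object;
# with a real file you would pass its path instead of the StringIO wrapper
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_csv)
print(df_from_csv.dtypes)  # numeric columns are inferred as integers
```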
2. Inspecting Data
Pandas provides simple methods to get a quick overview of your DataFrame.
import pandas as pd
# Recreate the DataFrame from the first example so this snippet runs on its own
data = {
    'Product': ['Laptops', 'Monitors', 'Keyboards', 'Mice'],
    'Sales_2024': [150, 220, 310, 450],
    'Sales_2025': [180, 250, 290, 400],
    'Category': ['Electronics', 'Electronics', 'Accessories', 'Accessories']
}
df = pd.DataFrame(data)
# Get the first few rows
print("--- First 2 Rows (.head(2)) ---")
print(df.head(2))
# Get a concise summary of the DataFrame
print("\n--- DataFrame Info (.info()) ---")
df.info()
# Get descriptive statistics for numerical columns
print("\n--- Descriptive Statistics (.describe()) ---")
print(df.describe())
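A few attributes complement the methods above when you only need the dimensions, column names, or column types. The sketch below rebuilds the same sample DataFrame so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Laptops', 'Monitors', 'Keyboards', 'Mice'],
    'Sales_2024': [150, 220, 310, 450],
    'Sales_2025': [180, 250, 290, 400],
    'Category': ['Electronics', 'Electronics', 'Accessories', 'Accessories'],
})

# Number of rows and columns as a tuple
print(df.shape)  # (4, 4)

# Column names as a plain Python list
print(df.columns.tolist())

# Data type of each column
print(df.dtypes)

# Last rows, the counterpart of .head()
print(df.tail(2))
```

Note that shape, columns, and dtypes are attributes rather than methods, so they are written without parentheses.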
3. Selecting Data
You can select columns, rows, and specific data points in several ways.
import pandas as pd
# Recreate the DataFrame from the first example so this snippet runs on its own
data = {
    'Product': ['Laptops', 'Monitors', 'Keyboards', 'Mice'],
    'Sales_2024': [150, 220, 310, 450],
    'Sales_2025': [180, 250, 290, 400],
    'Category': ['Electronics', 'Electronics', 'Accessories', 'Accessories']
}
df = pd.DataFrame(data)
# Select a single column (returns a Series)
print("--- Selecting the 'Product' Column ---")
print(df['Product'])
# Select multiple columns
print("\n--- Selecting 'Product' and 'Sales_2025' Columns ---")
print(df[['Product', 'Sales_2025']])
# Select rows by their integer position using .iloc
print("\n--- Selecting the first row (.iloc[0]) ---")
print(df.iloc[0])
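The section also mentions specific data points. One way to reach them, sketched here on the same sample data, is label-based selection with .loc plus the single-value accessors .at and .iat. The variable names (subset, value, value_pos, by_product, mice_2025) are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Laptops', 'Monitors', 'Keyboards', 'Mice'],
    'Sales_2024': [150, 220, 310, 450],
    'Sales_2025': [180, 250, 290, 400],
    'Category': ['Electronics', 'Electronics', 'Accessories', 'Accessories'],
})

# .loc selects by label; a label slice includes both endpoints,
# so 0:1 returns the rows labeled 0 and 1
subset = df.loc[0:1, ['Product', 'Sales_2025']]
print(subset)

# .at reads a single value by row label and column name
value = df.at[2, 'Sales_2024']
print(value)  # 310

# .iat reads a single value by integer row and column position
value_pos = df.iat[3, 1]
print(value_pos)  # 450

# With 'Product' as the index, rows can be looked up by product name
by_product = df.set_index('Product')
mice_2025 = by_product.loc['Mice', 'Sales_2025']
print(mice_2025)  # 400
```

The main difference from .iloc is that .loc works with index labels and column names, while .iloc works with integer positions.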
4. Filtering Data
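Filtering lets you keep only the rows that meet a condition. A comparison such as df['Sales_2024'] > 300 produces a Boolean Series, and indexing the DataFrame with that Series returns the rows where the value is True.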
import pandas as pd
# Recreate the DataFrame from the first example so this snippet runs on its own
data = {
    'Product': ['Laptops', 'Monitors', 'Keyboards', 'Mice'],
    'Sales_2024': [150, 220, 310, 450],
    'Sales_2025': [180, 250, 290, 400],
    'Category': ['Electronics', 'Electronics', 'Accessories', 'Accessories']
}
df = pd.DataFrame(data)
# Filter for products with 2024 sales greater than 300
high_sales_2024 = df[df['Sales_2024'] > 300]
print("--- Products with 2024 Sales > 300 ---")
print(high_sales_2024)
# Filter for products in the 'Electronics' category
electronics_products = df[df['Category'] == 'Electronics']
print("\n--- Products in 'Electronics' Category ---")
print(electronics_products)
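Filters can also be combined. The sketch below, again on the same sample data, uses & (and), | (or), and .isin(); the thresholds and result names are arbitrary choices for illustration. Each condition needs its own parentheses because & and | bind more tightly than comparison operators in Python.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Laptops', 'Monitors', 'Keyboards', 'Mice'],
    'Sales_2024': [150, 220, 310, 450],
    'Sales_2025': [180, 250, 290, 400],
    'Category': ['Electronics', 'Electronics', 'Accessories', 'Accessories'],
})

# Both conditions must hold: accessories with 2025 sales of at least 300
strong_accessories = df[(df['Category'] == 'Accessories') & (df['Sales_2025'] >= 300)]
print(strong_accessories)

# Either condition may hold
either = df[(df['Sales_2024'] > 300) | (df['Sales_2025'] > 200)]
print(either)

# Match against a list of allowed values
selected = df[df['Product'].isin(['Laptops', 'Mice'])]
print(selected)
```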
5. GroupBy and Aggregation
The groupby() method allows you to split your data into groups based on some criteria and then apply a function (like sum, mean, count) to each group.
import pandas as pd
# Recreate the DataFrame from the first example so this snippet runs on its own
data = {
    'Product': ['Laptops', 'Monitors', 'Keyboards', 'Mice'],
    'Sales_2024': [150, 220, 310, 450],
    'Sales_2025': [180, 250, 290, 400],
    'Category': ['Electronics', 'Electronics', 'Accessories', 'Accessories']
}
df = pd.DataFrame(data)
# Group by 'Category' and calculate the sum of sales for each group
category_sales = df.groupby('Category')[['Sales_2024', 'Sales_2025']].sum()
print("--- Total Sales by Category ---")
print(category_sales)
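When one function per group is not enough, .agg() can compute several at once. This sketch uses named aggregation on the same sample data; the output column names (total_2025, mean_2025, products) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Laptops', 'Monitors', 'Keyboards', 'Mice'],
    'Sales_2024': [150, 220, 310, 450],
    'Sales_2025': [180, 250, 290, 400],
    'Category': ['Electronics', 'Electronics', 'Accessories', 'Accessories'],
})

# Each keyword names an output column and maps to (input column, function)
summary = df.groupby('Category').agg(
    total_2025=('Sales_2025', 'sum'),
    mean_2025=('Sales_2025', 'mean'),
    products=('Product', 'count'),
)
print(summary)
```

The result is indexed by Category, so a single figure can be read with, for example, summary.loc['Accessories', 'total_2025'].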
6. Creating New Columns
You can easily create new columns, often based on calculations from existing columns.
import pandas as pd
# Recreate the DataFrame from the first example so this snippet runs on its own
data = {
    'Product': ['Laptops', 'Monitors', 'Keyboards', 'Mice'],
    'Sales_2024': [150, 220, 310, 450],
    'Sales_2025': [180, 250, 290, 400],
    'Category': ['Electronics', 'Electronics', 'Accessories', 'Accessories']
}
df = pd.DataFrame(data)
# Create a new column for the change in sales from 2024 to 2025
df['Sales_Change'] = df['Sales_2025'] - df['Sales_2024']
print("--- DataFrame with New 'Sales_Change' Column ---")
print(df)
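Direct assignment modifies the DataFrame in place. As an alternative sketch, .assign() returns a new DataFrame and leaves the original untouched. The column names Pct_Change and Grew, and the variable with_pct, are hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Laptops', 'Monitors', 'Keyboards', 'Mice'],
    'Sales_2024': [150, 220, 310, 450],
    'Sales_2025': [180, 250, 290, 400],
    'Category': ['Electronics', 'Electronics', 'Accessories', 'Accessories'],
})

# .assign() adds columns to a copy; each lambda receives the DataFrame
with_pct = df.assign(
    Pct_Change=lambda d: (d['Sales_2025'] - d['Sales_2024']) / d['Sales_2024'] * 100,
    Grew=lambda d: d['Sales_2025'] > d['Sales_2024'],
)

# Round only for display; the stored values keep full precision
print(with_pct.round({'Pct_Change': 1}))

# The original DataFrame has no new columns
print(df.columns.tolist())
```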