
Pandas

Unit 1: Foundations of Python and Its Applications in Machine Learning

Pandas is a widely used Python library for data manipulation and analysis. It provides fast, easy-to-use data structures and analysis tools, which has made it a standard choice for data scientists, analysts, and engineers.

The two primary data structures in Pandas are:

Series: A one-dimensional labeled array, similar to a column in a spreadsheet.
DataFrame: A two-dimensional labeled data structure with columns of potentially different types, much like a full spreadsheet or a SQL table. This is the most commonly used Pandas object.
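
As a minimal sketch of the difference between the two structures, the snippet below builds one of each. The names sales and table and the figures are illustrative assumptions, not part of any real dataset.

```python
import pandas as pd

# A Series: one-dimensional values, each paired with an index label.
# The labels and numbers here are illustrative only.
sales = pd.Series(
    [150, 220, 310],
    index=['Laptops', 'Monitors', 'Keyboards'],
    name='Sales_2024',
)

# A DataFrame: several columns sharing one row index; columns may differ in type.
table = pd.DataFrame({
    'Product': ['Laptops', 'Monitors', 'Keyboards'],
    'Sales_2024': [150, 220, 310],
})

print(sales['Monitors'])          # look up a value by its label
print(type(table['Sales_2024']))  # a single DataFrame column is itself a Series
```

Selecting one column of a DataFrame gives back a Series, which is why the two structures share most of their methods.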

Code Examples

To run these examples, you first need to install Pandas: pip install pandas

1. Creating a DataFrame

You can create a DataFrame from many sources; two common ones are a Python dictionary and a file.

import pandas as pd

# Create a DataFrame from a dictionary: keys become column names,
# and each list holds that column's values
data = {
    'Product': ['Laptops', 'Monitors', 'Keyboards', 'Mice'],
    'Sales_2024': [150, 220, 310, 450],
    'Sales_2025': [180, 250, 290, 400],
    'Category': ['Electronics', 'Electronics', 'Accessories', 'Accessories']
}

df = pd.DataFrame(data)

print("--- Created DataFrame ---")
print(df)
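
Reading from a file usually goes through pd.read_csv. The sketch below assumes the same sales data stored as CSV text; to keep it self-contained it reads from an in-memory io.StringIO buffer instead of a file on disk. With a real file you would pass its path to pd.read_csv.

```python
import io
import pandas as pd

# CSV text standing in for a file on disk (an assumption for this sketch)
csv_text = """Product,Sales_2024,Sales_2025,Category
Laptops,150,180,Electronics
Monitors,220,250,Electronics
Keyboards,310,290,Accessories
Mice,450,400,Accessories
"""

# read_csv accepts a path or any file-like object; the first line becomes the header
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print("--- DataFrame read from CSV text ---")
print(df_from_csv)
```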

2. Inspecting Data

Pandas provides simple methods to get a quick overview of your DataFrame.

import pandas as pd

# Recreate the DataFrame from the first example so this snippet runs on its own
data = {
    'Product': ['Laptops', 'Monitors', 'Keyboards', 'Mice'],
    'Sales_2024': [150, 220, 310, 450],
    'Sales_2025': [180, 250, 290, 400],
    'Category': ['Electronics', 'Electronics', 'Accessories', 'Accessories']
}

df = pd.DataFrame(data)

# Get the first few rows
print("--- First 2 Rows (.head(2)) ---")
print(df.head(2))

# Get a concise summary of the DataFrame
# (.info() prints directly, so it is not wrapped in print())
print("\n--- DataFrame Info (.info()) ---")
df.info()

# Get descriptive statistics for numerical columns
print("\n--- Descriptive Statistics (.describe()) ---")
print(df.describe())
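
A few attributes complement the methods above when you only need one fact about the DataFrame. This is a small sketch using the same sample data; shape, dtypes, and columns are standard DataFrame attributes.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Laptops', 'Monitors', 'Keyboards', 'Mice'],
    'Sales_2024': [150, 220, 310, 450],
    'Sales_2025': [180, 250, 290, 400],
    'Category': ['Electronics', 'Electronics', 'Accessories', 'Accessories'],
})

# (number of rows, number of columns)
print(df.shape)

# Data type of each column
print(df.dtypes)

# Column names as a plain Python list
print(df.columns.tolist())
```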

3. Selecting Data

You can select columns, rows, and specific data points in several ways.

import pandas as pd

# Recreate the DataFrame from the first example so this snippet runs on its own
data = {
    'Product': ['Laptops', 'Monitors', 'Keyboards', 'Mice'],
    'Sales_2024': [150, 220, 310, 450],
    'Sales_2025': [180, 250, 290, 400],
    'Category': ['Electronics', 'Electronics', 'Accessories', 'Accessories']
}

df = pd.DataFrame(data)

# Select a single column (returns a Series)
print("--- Selecting the 'Product' Column ---")
print(df['Product'])

# Select multiple columns by passing a list of names (returns a DataFrame)
print("\n--- Selecting 'Product' and 'Sales_2025' Columns ---")
print(df[['Product', 'Sales_2025']])

# Select rows by their integer position using .iloc
print("\n--- Selecting the first row (.iloc[0]) ---")
print(df.iloc[0])
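
To reach a specific data point rather than a whole row or column, you can give both a row and a column to .loc (label-based) or .iloc (position-based). The sketch below assumes the default row labels 0 to 3 that Pandas assigns when no index is given; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Laptops', 'Monitors', 'Keyboards', 'Mice'],
    'Sales_2024': [150, 220, 310, 450],
    'Sales_2025': [180, 250, 290, 400],
    'Category': ['Electronics', 'Electronics', 'Accessories', 'Accessories'],
})

# One value by row label and column name
mice_2025 = df.loc[3, 'Sales_2025']

# One value by integer position: row 0, column 1
first_sales = df.iloc[0, 1]

# A block of rows and columns by position (the end of each slice is excluded)
top_left = df.iloc[0:2, 0:2]

print(mice_2025)
print(first_sales)
print(top_left)
```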

4. Filtering Data

Filtering with a boolean condition is one of the most useful features of Pandas: it selects only the rows that meet the criteria you specify.

import pandas as pd

# Recreate the DataFrame from the first example so this snippet runs on its own
data = {
    'Product': ['Laptops', 'Monitors', 'Keyboards', 'Mice'],
    'Sales_2024': [150, 220, 310, 450],
    'Sales_2025': [180, 250, 290, 400],
    'Category': ['Electronics', 'Electronics', 'Accessories', 'Accessories']
}

df = pd.DataFrame(data)

# Filter for products with 2024 sales greater than 300.
# The comparison yields a boolean Series; indexing with it keeps the True rows.
high_sales_2024 = df[df['Sales_2024'] > 300]

print("--- Products with 2024 Sales > 300 ---")
print(high_sales_2024)

# Filter for products in the 'Electronics' category
electronics_products = df[df['Category'] == 'Electronics']

print("\n--- Products in 'Electronics' Category ---")
print(electronics_products)
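
Conditions can also be combined. A minimal sketch, using the same sample data: & means "and", | means "or", and each condition needs its own parentheses because these operators bind more tightly than comparisons. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Laptops', 'Monitors', 'Keyboards', 'Mice'],
    'Sales_2024': [150, 220, 310, 450],
    'Sales_2025': [180, 250, 290, 400],
    'Category': ['Electronics', 'Electronics', 'Accessories', 'Accessories'],
})

# AND: Electronics products whose sales grew from 2024 to 2025
electronics_growing = df[
    (df['Category'] == 'Electronics') & (df['Sales_2025'] > df['Sales_2024'])
]

# OR: products with 2024 sales below 200 or above 400
low_or_high = df[(df['Sales_2024'] < 200) | (df['Sales_2024'] > 400)]

# Membership: rows whose Product is in a given list
selected = df[df['Product'].isin(['Laptops', 'Mice'])]

print(electronics_growing)
print(low_or_high)
print(selected)
```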

5. GroupBy and Aggregation

The groupby() method allows you to split your data into groups based on some criteria and then apply a function (like sum, mean, count) to each group.

import pandas as pd

# Recreate the DataFrame from the first example so this snippet runs on its own
data = {
    'Product': ['Laptops', 'Monitors', 'Keyboards', 'Mice'],
    'Sales_2024': [150, 220, 310, 450],
    'Sales_2025': [180, 250, 290, 400],
    'Category': ['Electronics', 'Electronics', 'Accessories', 'Accessories']
}

df = pd.DataFrame(data)

# Group by 'Category' and calculate the sum of sales for each group
category_sales = df.groupby('Category')[['Sales_2024', 'Sales_2025']].sum()

print("--- Total Sales by Category ---")
print(category_sales)
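
When you want more than one statistic per group, .agg() can apply several functions at once. The sketch below uses named aggregation on the same sample data; the output column names (total_2025, mean_2025, products) are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Laptops', 'Monitors', 'Keyboards', 'Mice'],
    'Sales_2024': [150, 220, 310, 450],
    'Sales_2025': [180, 250, 290, 400],
    'Category': ['Electronics', 'Electronics', 'Accessories', 'Accessories'],
})

# Each keyword is an output column: (source column, aggregation function)
summary = df.groupby('Category').agg(
    total_2025=('Sales_2025', 'sum'),
    mean_2025=('Sales_2025', 'mean'),
    products=('Product', 'count'),
)

print("--- 2025 Summary by Category ---")
print(summary)
```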

6. Creating New Columns

You can easily create new columns, often based on calculations from existing columns.

import pandas as pd

# Recreate the DataFrame from the first example so this snippet runs on its own
data = {
    'Product': ['Laptops', 'Monitors', 'Keyboards', 'Mice'],
    'Sales_2024': [150, 220, 310, 450],
    'Sales_2025': [180, 250, 290, 400],
    'Category': ['Electronics', 'Electronics', 'Accessories', 'Accessories']
}

df = pd.DataFrame(data)

# Create a new column for the change in sales from 2024 to 2025.
# The subtraction is applied row by row across the two columns.
df['Sales_Change'] = df['Sales_2025'] - df['Sales_2024']

print("--- DataFrame with New 'Sales_Change' Column ---")
print(df)
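
An alternative, sketched below, is .assign(), which returns a new DataFrame and leaves the original unchanged. The column name Sales_Change_Pct and the choice to round to one decimal place are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Laptops', 'Monitors', 'Keyboards', 'Mice'],
    'Sales_2024': [150, 220, 310, 450],
    'Sales_2025': [180, 250, 290, 400],
    'Category': ['Electronics', 'Electronics', 'Accessories', 'Accessories'],
})

# Percentage change from 2024 to 2025, rounded to one decimal place
pct_change = (df['Sales_2025'] - df['Sales_2024']) / df['Sales_2024'] * 100

# assign() builds a copy with the extra column; df itself is not modified
df_with_pct = df.assign(Sales_Change_Pct=pct_change.round(1))

print("--- DataFrame with 'Sales_Change_Pct' Column ---")
print(df_with_pct)
```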
