Matplotlib is a widely used, foundational Python library for creating static, animated, and interactive visualizations. It gives you fine-grained control over nearly every element of a figure, which makes it suitable both for quick data exploration and for publication-quality charts.
It was designed to have a feel similar to MATLAB's plotting functions, making it familiar to many in the scientific and engineering communities.
Key Concepts
- pyplot: This is a module within Matplotlib that provides a simple, state-based interface for creating plots. When you see import matplotlib.pyplot as plt, you are importing this module. It's the most common way to use Matplotlib for quick and easy plotting.
- Figure and Axes: A Matplotlib plot is structured with two main components:
  - Figure: The top-level container for everything. It's the overall window or page that everything is drawn on.
  - Axes: This is what you typically think of as "the plot." It's the area where data points are plotted with x- and y-axes (or other coordinates). A single Figure can contain multiple Axes (subplots).
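The two ideas above can be seen side by side in a short sketch. It is only an illustration with placeholder data, and it selects the non-interactive Agg backend (an assumption made here so that the code runs without a display). The first half relies on pyplot's implicit "current" Figure and Axes; the second half keeps explicit references to a Figure that contains two Axes.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# State-based interface: pyplot keeps track of a "current" Figure and Axes
plt.figure()
plt.plot([1, 2, 3], [2, 4, 8])
plt.title("Created through pyplot")
implicit_fig = plt.gcf()  # the Figure pyplot is currently drawing on
implicit_ax = plt.gca()   # the Axes pyplot is currently drawing on

# Explicit interface: hold references to the Figure and each of its Axes
fig, axes = plt.subplots(nrows=1, ncols=2)  # one Figure containing two Axes
axes[0].plot([1, 2, 3], [2, 4, 8])
axes[0].set_title("Left Axes")
axes[1].plot([1, 2, 3], [8, 4, 2])
axes[1].set_title("Right Axes")

print(len(implicit_fig.axes))  # number of Axes in the first Figure
print(len(fig.axes))           # number of Axes in the second Figure
```

To view either figure, remove the matplotlib.use("Agg") line and call plt.show() at the end.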
Code Examples
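To run these examples, you first need to install Matplotlib: pip install matplotlib. NumPy, which examples 3 to 5 import, is a dependency of Matplotlib and is installed along with it.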
1. Simple Line Plot
A line plot connects data points in order with line segments. It is typically used to show how a value changes across an ordered sequence, often time.
import matplotlib.pyplot as plt
# Sample data representing monthly sales
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
sales = [10000, 12000, 11500, 13000, 15000, 14000]
# Create the plot
# plt.plot() draws the line based on the x (months) and y (sales) data
plt.plot(months, sales)
# Add titles and labels for clarity
plt.title('Monthly Sales Performance')
plt.xlabel('Month')
plt.ylabel('Sales (in INR)')
# Display the plot
plt.show()
2. Bar Chart
Bar charts are excellent for comparing quantities across different categories.
import matplotlib.pyplot as plt
# Sample data for different product categories
products = ['Electronics', 'Clothing', 'Groceries', 'Books']
revenue = [500000, 350000, 600000, 200000]
# Create the bar chart
# plt.bar() creates vertical bars for each category
plt.bar(products, revenue, color=['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728'])
# Add titles and labels
plt.title('Revenue by Product Category')
plt.xlabel('Product Category')
plt.ylabel('Revenue (in INR)')
plt.xticks(rotation=15) # Rotate x-axis labels slightly for better readability
# Display the plot
plt.show()
3. Histogram
A histogram is used to visualize the distribution of a single numerical variable by grouping numbers into "bins."
import matplotlib.pyplot as plt
import numpy as np
# Generate some random data representing student test scores
# np.random.randn(1000) draws 1000 samples from a standard normal distribution;
# scaling by 15 and shifting by 65 gives scores with mean ~65 and standard deviation ~15
scores = 65 + 15 * np.random.randn(1000)
# Create the histogram
# plt.hist() counts how many values fall into each bin and draws one bar per bin
# bins=30 splits the range of the data into 30 equal-width bins
plt.hist(scores, bins=30, color='skyblue', edgecolor='black')
# Add titles and labels
plt.title('Distribution of Student Test Scores')
plt.xlabel('Score')
plt.ylabel('Number of Students')
# Display the plot
plt.show()
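The binning that plt.hist() performs can also be inspected directly, because the function returns the per-bin counts, the bin edges, and the drawn bars. The sketch below reuses the same score formula; the fixed seed and the Agg backend are assumptions added here so that the output is repeatable and no display is required.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Seeded generator (the seed value is an arbitrary choice) for repeatable data
rng = np.random.default_rng(seed=0)
scores = 65 + 15 * rng.standard_normal(1000)

# plt.hist() returns the counts per bin, the bin edges, and the bar objects
counts, bin_edges, bars = plt.hist(scores, bins=30, color='skyblue', edgecolor='black')

print(len(counts))        # one count per bin
print(len(bin_edges))     # one more edge than there are bins
print(int(counts.sum()))  # every score falls into exactly one bin
```

With bins=30 there are 30 counts and 31 edges, and the counts add up to the 1000 generated scores.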
4. Scatter Plot
Scatter plots are used to observe the relationship between two numerical variables. Each dot represents an observation.
import matplotlib.pyplot as plt
import numpy as np
# Sample data: study hours vs. exam scores
study_hours = np.array([1, 2, 2.5, 3, 4, 4.5, 5, 6, 7, 8])
exam_scores = np.array([65, 68, 70, 75, 80, 85, 88, 90, 92, 95])
# Create the scatter plot
# plt.scatter() places a marker at each (x, y) coordinate
plt.scatter(study_hours, exam_scores, color='crimson')
# Add titles and labels
plt.title('Study Hours vs. Exam Scores')
plt.xlabel('Hours Spent Studying')
plt.ylabel('Exam Score (%)')
# Display the plot
plt.show()
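One common way to make the relationship in a scatter plot easier to read is to overlay a straight trend line. The sketch below is one possible approach, not part of the example above: it fits a least-squares line with np.polyfit and draws it over the same data. The Agg backend and the legend labels are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

study_hours = np.array([1, 2, 2.5, 3, 4, 4.5, 5, 6, 7, 8])
exam_scores = np.array([65, 68, 70, 75, 80, 85, 88, 90, 92, 95])

# Fit a degree-1 polynomial (a straight line) by least squares
slope, intercept = np.polyfit(study_hours, exam_scores, 1)

fig, ax = plt.subplots()
ax.scatter(study_hours, exam_scores, color='crimson', label='Observations')
ax.plot(study_hours, slope * study_hours + intercept,
        color='gray', label='Least-squares line')
ax.set_title('Study Hours vs. Exam Scores')
ax.set_xlabel('Hours Spent Studying')
ax.set_ylabel('Exam Score (%)')
ax.legend()
```

A positive slope indicates that, in this sample data, scores tend to rise with study time.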
5. Customized Plot with Subplots
This example uses plt.subplots() to create a Figure and a single Axes explicitly, draws two lines on that Axes, and customizes their appearance.
import matplotlib.pyplot as plt
import numpy as np
# Prepare data for two different lines
x = np.linspace(0, 10, 100) # 100 points from 0 to 10
y1 = np.sin(x)
y2 = np.cos(x)
# Create a figure and a set of subplots. Here, we create one subplot (axes).
# 'fig' is the entire window, 'ax' is the plot inside it.
fig, ax = plt.subplots(figsize=(10, 6)) # figsize sets the figure size (width, height) in inches
# Plot both lines on the same axes
ax.plot(x, y1, color='blue', linestyle='--', linewidth=2, label='Sine')
ax.plot(x, y2, color='red', marker='o', markersize=4, linestyle=':', label='Cosine')
# Customize the plot
ax.set_title('Sine and Cosine Waves', fontsize=16)
ax.set_xlabel('X-axis', fontsize=12)
ax.set_ylabel('Y-axis', fontsize=12)
ax.grid(True, linestyle='-.', alpha=0.6) # Add a grid
ax.legend() # Display the labels for each line
# Set limits for the axes
ax.set_xlim(0, 10)
ax.set_ylim(-1.5, 1.5)
# Display the plot
plt.show()
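The example above draws both lines on a single Axes. As a complementary sketch, the same plt.subplots() call can create several Axes in one Figure by passing nrows and ncols. The layout, labels, and Agg backend below are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# Two rows, one column; sharex=True makes both Axes use the same x-axis range
fig, (ax_top, ax_bottom) = plt.subplots(nrows=2, ncols=1, figsize=(10, 6), sharex=True)

ax_top.plot(x, np.sin(x), color='blue', linestyle='--', label='Sine')
ax_top.set_ylabel('sin(x)')
ax_top.legend()

ax_bottom.plot(x, np.cos(x), color='red', linestyle=':', label='Cosine')
ax_bottom.set_xlabel('X-axis')
ax_bottom.set_ylabel('cos(x)')
ax_bottom.legend()

fig.suptitle('Sine and Cosine on Separate Axes', fontsize=16)
fig.tight_layout()  # reduce overlap between titles, labels, and neighboring Axes
```

To view the result, remove the matplotlib.use("Agg") line and call plt.show(), or write it to disk with fig.savefig() using a file name of your choice.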