
Supervised Learning vs. Unsupervised Learning

Applications of Machine Learning : Python libraries suitable for Machine Learning

Unit:1 Foundations of Python and Its Applications in Machine Learning

The fundamental difference between Supervised and Unsupervised Learning lies in the type of data used for training: Supervised Learning uses labeled data, while Unsupervised Learning uses unlabeled data.

Think of it like this:

Supervised Learning is like learning with a teacher who gives you a workbook with questions and the correct answers. You learn by comparing your results to the known answers.
Unsupervised Learning is like being given a box of mixed LEGO bricks and being asked to sort them into groups. No one tells you what the groups should be; you have to discover the patterns (color, size, shape) on your own.
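In code, the same contrast shows up in what is handed to the model. The sketch below is only an illustration with made-up measurements (the feature names and values are assumptions): the supervised model receives features together with answers, while the unsupervised model receives the features alone.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical features: [height_cm, weight_kg]
X = np.array([[150, 50], [152, 53], [180, 85], [183, 90]])

# Supervised: every row comes with a known answer (the "workbook").
y = np.array([0, 0, 1, 1])
classifier = KNeighborsClassifier(n_neighbors=1).fit(X, y)
supervised_pred = classifier.predict(np.array([[181, 88]]))
print("Predicted label:", supervised_pred[0])

# Unsupervised: the same rows, but no answers (the "box of bricks").
clusterer = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)
groups = clusterer.labels_
print("Discovered groups:", groups)
```

Note that the group numbers found by the clustering step are arbitrary identifiers; they carry no meaning until a person inspects the groups.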

Supervised Learning: Learning with a Teacher

In supervised learning, the algorithm learns from a dataset where each data point is tagged with a correct output or "label." The goal is to learn a mapping function that can predict the output label for new, unseen data.
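To make "learning a mapping function" concrete, here is a minimal sketch that does not use any library. A one-parameter model y = w * x is fitted by repeatedly comparing its predictions with the known labels and nudging w to shrink the squared error. The data, the learning rate, and the number of passes are all illustrative assumptions.

```python
# Hypothetical labeled data: inputs xs with known outputs ys (here y = 3 * x)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

w = 0.0               # the model's single internal parameter
learning_rate = 0.01  # how far to move w on each pass

for _ in range(500):
    # Gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    # Move w a small step in the direction that reduces the error
    w -= learning_rate * grad

print(f"Learned w: {w:.3f}")
print(f"Prediction for unseen x = 5: {w * 5:.2f}")
```

Real libraries use more sophisticated procedures, but the idea of adjusting parameters until predictions match the labels is the same.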

Key Characteristics:

Goal: To predict an outcome or classify data.
Input Data: Labeled data (features + correct answers).
Process: The model is "trained" by comparing its predictions to the correct labels and adjusting its internal parameters to minimize errors.
Main Types:

Classification: The output is a category (e.g., "Spam" or "Not Spam," "Cat" or "Dog").
Regression: The output is a continuous value (e.g., the price of a house, the temperature tomorrow).
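The longer example later in this section covers classification, so here is a short sketch of the regression case. It assumes scikit-learn's LinearRegression and a made-up table of house sizes and prices; the numbers are invented for illustration and follow price = 2 * size + 10 exactly so the result is easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical labeled data: house size (square meters) -> price (thousands)
sizes = np.array([[50], [80], [100], [120]])
prices = np.array([110, 170, 210, 250])

# Fit a straight line through the labeled examples
regressor = LinearRegression()
regressor.fit(sizes, prices)

# Predict a continuous value for a size the model has not seen
predicted_price = regressor.predict(np.array([[90]]))[0]
print(f"Predicted price: {predicted_price:.1f}")  # 190.0 for this toy data
```

The output is a number on a continuous scale rather than one of a fixed set of categories, which is what separates regression from classification.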

Real-World Example: Email Spam Detection

An algorithm is fed thousands of emails that have already been labeled by humans as either "spam" or "not spam." The model learns the features (words, sender, etc.) associated with spam and uses this knowledge to classify new, incoming emails.
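A toy version of this idea can be sketched in a few lines. The example below is an assumption-laden illustration, not how any production spam filter works: it uses six invented emails, counts words with scikit-learn's CountVectorizer, and fits a MultinomialNB (Naive Bayes) classifier on those counts.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical emails that a human has already labeled
emails = [
    "win free money now",
    "free prize claim now",
    "cheap loans win cash",
    "meeting at noon tomorrow",
    "project report attached",
    "lunch with the team tomorrow",
]
labels = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]

# Turn each email into word counts, then learn which words go with which label
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(emails, labels)

# Classify new, unseen emails
new_emails = ["claim your free prize", "team meeting tomorrow"]
predictions = spam_filter.predict(new_emails)
for text, label in zip(new_emails, predictions):
    print(f"{text!r} -> {label}")
```

With so few emails the model only knows a handful of words, so this is a demonstration of the workflow rather than a usable filter.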

Code Example: Predicting Diabetes (Classification)

Here, we'll train a simple model to predict whether a person has diabetes based on their age and blood glucose level. The data is labeled because we know the outcome for each patient in our training set.

# Import the K-Nearest Neighbors classifier
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

# --- Step 1: Prepare the Labeled Data ---
# Features: [Age, Blood Glucose Level]
X_train = np.array([
    [25, 110], [30, 95], [45, 160], [50, 180],
    [22, 85], [60, 190], [35, 140]
])

# Labels: 0 = No Diabetes, 1 = Diabetes
# This is the "teacher" providing the correct answers for the training data.
y_train = np.array([0, 0, 1, 1, 0, 1, 1])

# --- Step 2: Create and Train the Model ---
# Create a classifier object.
# It will find the 3 nearest neighbors to make a prediction.
model = KNeighborsClassifier(n_neighbors=3)

# Train the model using our labeled data
model.fit(X_train, y_train)

# --- Step 3: Make a Prediction on New, Unseen Data ---
# Let's predict the outcome for a new patient: Age 48, Glucose 175
new_patient = np.array([[48, 175]])
prediction = model.predict(new_patient)

# --- Step 4: Interpret the Result ---
print(f"New Patient Data: {new_patient[0]}")
if prediction[0] == 1:
    print("Prediction: The model predicts this patient has Diabetes. 🩺")
else:
    print("Prediction: The model predicts this patient does not have Diabetes. ✅")

# Expected Output: The model predicts this patient has Diabetes.
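To see why the model reaches that answer, it can help to look at which training patients it treated as the nearest neighbors. The sketch below repeats the same toy data so it runs on its own and uses the classifier's kneighbors method, which returns the distances to the closest training points and their row indices.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Same toy data as in the example above
X_train = np.array([
    [25, 110], [30, 95], [45, 160], [50, 180],
    [22, 85], [60, 190], [35, 140]
])
y_train = np.array([0, 0, 1, 1, 0, 1, 1])

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

new_patient = np.array([[48, 175]])

# Ask the model which 3 training rows are closest to the new patient
distances, indices = model.kneighbors(new_patient)
for dist, idx in zip(distances[0], indices[0]):
    print(f"Neighbor {X_train[idx]} label={y_train[idx]} distance={dist:.1f}")

# The prediction is the majority label among those neighbors
neighbor_labels = y_train[indices[0]]
print("Votes:", neighbor_labels)
```

All three neighbors carry the label 1, so the majority vote is unanimous. Because glucose values span a wider numeric range than age here, they dominate the distance calculation; real projects commonly rescale features before using a distance-based model.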

Unsupervised Learning: Finding Hidden Patterns

In unsupervised learning, the algorithm is given a dataset without any labels. The goal is to explore the data and find some inherent structure or patterns within it on its own.

Key Characteristics:

Goal: To discover hidden patterns or group similar data points.
Input Data: Unlabeled data (features only, no answers).
Process: The model tries to learn the relationships between the data points by grouping them or identifying outliers.
Main Types:

Clustering: Grouping similar data points together (e.g., customer segmentation).
Association: Discovering rules that describe large portions of your data (e.g., "customers who buy bread also tend to buy milk").
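The bread-and-milk rule can be made concrete with two standard measures: support (how often the items appear together) and confidence (how often the second item appears when the first one does). The sketch below computes both by hand on a made-up list of shopping baskets; the transactions are invented for illustration.

```python
# Hypothetical shopping baskets
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

# Count baskets containing bread, and baskets containing both bread and milk
with_bread = sum(1 for basket in transactions if "bread" in basket)
with_both = sum(1 for basket in transactions if {"bread", "milk"} <= basket)

# Support: share of all baskets that contain both items
support = with_both / len(transactions)
# Confidence: share of bread baskets that also contain milk
confidence = with_both / with_bread

print(f"Support(bread and milk):   {support:.2f}")     # 0.60
print(f"Confidence(bread -> milk): {confidence:.2f}")  # 0.75
```

No labels were needed: the rule is read directly off the co-occurrence counts in the data.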

Real-World Example: Customer Segmentation

A retail company has data on the purchasing habits of its customers but doesn't have pre-defined customer "types." An unsupervised learning algorithm can process this data and automatically group customers into segments (e.g., "budget shoppers," "brand loyalists," "weekend shoppers") based on their shared behaviors.

Code Example: Grouping Customers (Clustering)

Here, we have data on customer spending habits, but we don't have labels. We want the model to discover natural groupings (clusters) within the data.

# Import the KMeans clustering algorithm
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt

# --- Step 1: Prepare the Unlabeled Data ---
# Features: [Annual Income (in thousands), Spending Score (1-100)]
# Notice there are no 'y_train' labels. The model knows nothing about these customers.
X = np.array([
    [25, 75], [30, 80], [28, 60],
    [60, 30], [55, 25], [65, 20],
    [45, 50], [50, 55]
])

# --- Step 2: Create and Train the Model ---
# Create a KMeans object. We'll ask it to find 2 clusters.
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)

# Train the model. It will find the best centers for the 2 clusters.
kmeans.fit(X)

# The model has now assigned a cluster (0 or 1) to each data point.
cluster_labels = kmeans.labels_
print(f"Cluster assignments for each customer: {cluster_labels}")

# --- Step 3: Make a Prediction for a New Customer ---
# Let's see which cluster a new customer belongs to: Income 58k, Spending Score 28
new_customer = np.array([[58, 28]])
prediction = kmeans.predict(new_customer)
print(f"\nNew Customer Data: {new_customer[0]}")
print(f"Prediction: This new customer belongs to Cluster {prediction[0]}.")

# --- Optional: Visualize the results ---
plt.figure(figsize=(8, 6))

# Plot the data points, coloring them by their assigned cluster
plt.scatter(X[:, 0], X[:, 1], c=cluster_labels, cmap='viridis', marker='o', s=100, label='Existing Customers')

# Plot the new customer
plt.scatter(new_customer[:, 0], new_customer[:, 1], c='red', marker='*', s=200, label='New Customer')

# Plot the cluster centers
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=250, c='blue', marker='X', label='Cluster Centers')

plt.title('Customer Segmentation')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.grid(True)
plt.show()
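The number of clusters in the example above is something we chose, not something the model discovered. As a rough sketch of how that choice might be examined, the code below fits KMeans on the same customer data with 2 and with 3 clusters and prints the inertia_ attribute, which is the sum of squared distances from each point to its assigned cluster center. Treating a lower inertia as a hint is an assumption of this sketch; inertia always shrinks as clusters are added, so it should be read alongside the plot rather than on its own.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same unlabeled customer data as in the example above
X = np.array([
    [25, 75], [30, 80], [28, 60],
    [60, 30], [55, 25], [65, 20],
    [45, 50], [50, 55]
])

results = {}
for k in (2, 3):
    candidate = KMeans(n_clusters=k, random_state=42, n_init=10).fit(X)
    # Keep the assignments and the within-cluster sum of squared distances
    results[k] = (candidate.labels_, candidate.inertia_)
    print(f"k={k}: labels={candidate.labels_}, inertia={candidate.inertia_:.1f}")
```

With 3 clusters, the two mid-income, mid-spending customers form a group of their own instead of being absorbed into one of the other two segments.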
