Recommender systems are machine learning systems that predict how much a user will like an item and use those predictions to make personalized suggestions. They power the personalized experiences on platforms such as Netflix, Amazon, Spotify, and YouTube. Their goal is to help users discover new and relevant content in a vast catalog of options.
There are two primary types of recommender systems: Content-Based Filtering and Collaborative Filtering.
1. Content-Based Filtering
This approach recommends items based on their attributes and a user's past preferences. The core idea is: "If you liked this item, you will probably also like other items that are similar to it."
- How it Works:
  1. Feature Extraction: The system first analyzes the items in the catalog and extracts key features. For movies, these features could be genre, director, actors, and plot keywords. For news articles, they could be the topic or named entities.
  2. User Profile Creation: The system builds a profile for each user based on the items they have liked or interacted with in the past. This profile summarizes the user's preferences (e.g., this user likes sci-fi movies directed by Christopher Nolan).
  3. Recommendation: When a recommendation is needed, the system compares the attributes of new or unrated items to the user's profile and suggests the items that are the best match.
- Example: If you watch a lot of action movies starring Tom Cruise on a streaming service, a content-based system will recommend other action movies, especially those also starring Tom Cruise.
- Pros: Doesn't need data from other users; can recommend new or unpopular items as soon as their attributes are known.
- Cons: Can lead to a "filter bubble" where you are only recommended items similar to what you already know, limiting discovery; also depends on how well the extracted features describe the items.
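The three steps above (feature extraction, user profile creation, recommendation) can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the `ITEM_FEATURES` catalog, its hand-assigned binary features, and the helper names `build_profile` and `recommend_by_content` are all hypothetical, and real systems typically derive features automatically (for example, from text or metadata).

```python
import math

# Hypothetical catalog: each movie is a binary feature vector over
# (action, sci-fi, drama, stars Tom Cruise). Values are illustrative only.
ITEM_FEATURES = {
    "Top Gun": [1, 0, 0, 1],
    "Mission: Impossible": [1, 0, 0, 1],
    "Edge of Tomorrow": [1, 1, 0, 1],
    "Die Hard": [1, 0, 0, 0],
    "Forrest Gump": [0, 0, 1, 0],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def build_profile(liked_titles):
    """User profile = element-wise average of the liked items' feature vectors."""
    vectors = [ITEM_FEATURES[title] for title in liked_titles]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def recommend_by_content(liked_titles, top_n=2):
    """Rank items the user has not liked yet by similarity to the user profile."""
    profile = build_profile(liked_titles)
    candidates = [title for title in ITEM_FEATURES if title not in liked_titles]
    scored = [(title, cosine(profile, ITEM_FEATURES[title])) for title in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

content_recs = recommend_by_content(["Top Gun", "Mission: Impossible"])
for title, score in content_recs:
    print(f"{title}: {score:.3f}")
```

In this toy catalog, the action movie that also stars Tom Cruise ranks ahead of the action movie that does not, which mirrors the example above. Averaging feature vectors is just one simple way to build a profile; weighting by rating or recency is a common alternative.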
2. Collaborative Filtering
This is one of the most widely used approaches. It makes recommendations based on the preferences of similar users. The core idea is: "Users who agreed in the past tend to agree in the future."
- How it Works: The algorithm analyzes a large user-item interaction matrix (e.g., a matrix of users and the ratings they've given to movies). It doesn't need to know anything about the items themselves (like the genre). It works in two main ways:
  - User-Based Collaborative Filtering: It finds users who have similar rating patterns to you. It then recommends items that these similar users have liked but that you haven't seen yet.
  - Item-Based Collaborative Filtering (More Common): Instead of finding similar users, it finds similar items. It analyzes the user-item matrix to find items that are frequently rated similarly by users. For example, it might discover that users who like "The Dark Knight" also tend to like "Inception". If you have liked "The Dark Knight", it will recommend "Inception".
- Example: Amazon's "Customers who bought this item also bought..." feature is a classic example of item-based collaborative filtering.
- Pros: Needs no item metadata; can lead to serendipitous discoveries of new items; often accurate when enough interaction data is available.
- Cons: Suffers from the "cold start" problem (it's hard to make recommendations for new users or new items with no rating history).
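The user-based variant described above can be sketched as follows. This is a rough illustration, not a production algorithm: the `RATINGS` dictionary and its user names are hypothetical, unrated items are treated as 0 when computing similarity, and candidate items are scored with a plain similarity-weighted sum of ratings (real systems often normalize by the total similarity or subtract each user's mean rating).

```python
import math

# Hypothetical ratings (user -> {item: rating}); values are illustrative only.
RATINGS = {
    "alice": {"The Dark Knight": 5, "Inception": 5, "Forrest Gump": 4},
    "bob": {"The Dark Knight": 5, "Inception": 4, "Pulp Fiction": 5},
    "carol": {"Forrest Gump": 5, "The Godfather": 4},
    "dave": {"The Dark Knight": 5, "Inception": 5},
}

def user_similarity(a, b):
    """Cosine similarity between two users' rating vectors (unrated counts as 0)."""
    common = set(RATINGS[a]) & set(RATINGS[b])
    dot = sum(RATINGS[a][item] * RATINGS[b][item] for item in common)
    norm_a = math.sqrt(sum(r * r for r in RATINGS[a].values()))
    norm_b = math.sqrt(sum(r * r for r in RATINGS[b].values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend_user_based(target, top_n=3):
    """Score items the target has not rated by similarity-weighted ratings of other users."""
    scores = {}
    for other in RATINGS:
        if other == target:
            continue
        sim = user_similarity(target, other)
        if sim <= 0:
            continue  # users with no overlap contribute nothing
        for item, rating in RATINGS[other].items():
            if item not in RATINGS[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

user_recs = recommend_user_based("dave")
print(user_recs)
```

Here "carol" shares no rated items with "dave", so her ratings are ignored. A user with no ratings at all would have no similar users, which is the cold start problem noted above.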
Hybrid Approaches
Most modern recommender systems use a hybrid approach, combining both content-based and collaborative filtering methods to leverage the strengths of both and mitigate their weaknesses.
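One simple way to combine the two methods is a weighted blend of their scores. The sketch below assumes that both recommenders already produce scores normalized to the range 0 to 1; the function name `hybrid_scores`, the weight `alpha`, and the sample scores are hypothetical, and weighted blending is only one of several hybrid strategies.

```python
def hybrid_scores(content_scores, collab_scores, alpha=0.5):
    """Blend two score dicts: alpha * content + (1 - alpha) * collaborative.

    Items missing from one source are treated as scoring 0.0 there.
    """
    items = set(content_scores) | set(collab_scores)
    return {
        item: alpha * content_scores.get(item, 0.0)
        + (1 - alpha) * collab_scores.get(item, 0.0)
        for item in items
    }

# Hypothetical, already-normalized scores for one user.
content = {"Inception": 0.9, "New Release": 0.8}
# "New Release" has no ratings yet, so collaborative filtering cannot score it.
collab = {"Inception": 0.7, "Pulp Fiction": 0.6}

blended = hybrid_scores(content, collab, alpha=0.5)
ranking = sorted(blended, key=blended.get, reverse=True)
print(ranking)
```

The unrated "New Release" still appears in the ranking because the content-based score covers for the missing collaborative signal, which is how a hybrid can soften the cold start problem.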
# --- 1. Import Necessary Libraries ---
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
# --- 2. Prepare Sample Data ---
# Create a sample dataset of users, movies they've watched, and the ratings they've given.
data = {
    'user_id': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5],
    'movie_title': [
        'The Dark Knight', 'Inception', 'Forrest Gump',
        'Inception', 'Pulp Fiction',
        'The Dark Knight', 'Forrest Gump', 'The Shawshank Redemption',
        'Pulp Fiction', 'The Godfather', 'Goodfellas',
        'Inception', 'The Dark Knight'
    ],
    'rating': [5, 5, 4, 5, 4, 5, 5, 5, 5, 4, 5, 4, 5]
}
df = pd.DataFrame(data)
# --- 3. Create a User-Item Matrix ---
# This is the core data structure for collaborative filtering.
# We'll use a pivot table to create a matrix where:
# - Rows are users (user_id).
# - Columns are movies (movie_title).
# - Values are the ratings.
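# `fillna(0)` replaces missing values (movies a user hasn't rated) with 0.
# Note: this treats "not rated" the same as a rating of 0, a common simplification for small demos.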
user_item_matrix = df.pivot_table(index='user_id', columns='movie_title', values='rating').fillna(0)
print("--- User-Item Matrix ---")
print(user_item_matrix)
# --- 4. Calculate Item-Item Similarity ---
# We will use cosine similarity to measure how similar movies are to each other.
# Cosine similarity calculates the cosine of the angle between two vectors.
# In this context, each movie is a vector of user ratings.
# Because ratings are non-negative, scores fall between 0 and 1:
# 1 means the same users rated both movies in identical proportions; 0 means no user rated both.
item_similarity = cosine_similarity(user_item_matrix.T) # Transpose the matrix to get items as rows
# Convert the similarity matrix into a DataFrame for easier use.
item_similarity_df = pd.DataFrame(item_similarity, index=user_item_matrix.columns, columns=user_item_matrix.columns)
print("\n--- Item-Item Similarity Matrix ---")
print(item_similarity_df)
# --- 5. Create the Recommendation Function ---
def get_movie_recommendations(movie_title, user_id):
    """
    Generates movie recommendations for a user based on a movie they liked.
    This is an example of item-based collaborative filtering.
    Returns unwatched movie titles, most similar first.
    """
    # Get the similarity scores for the given movie against all other movies.
    similar_scores = item_similarity_df[movie_title]
    # Sort the movies based on their similarity score in descending order.
    similar_movies = similar_scores.sort_values(ascending=False)
    # Get the list of movies the user has already watched.
    user_ratings = user_item_matrix.loc[user_id]
    watched_movies = user_ratings[user_ratings > 0].index
    # Filter out the seed movie, movies the user has already watched,
    # and movies with zero similarity (no user rated both).
    recommendations = []
    for movie, score in similar_movies.items():
        if movie != movie_title and movie not in watched_movies and score > 0:
            recommendations.append(movie)
    return recommendations
# --- 6. Get Recommendations for a User ---
# Let's get recommendations for User 3, who liked "The Dark Knight".
user_to_recommend = 3
movie_liked = 'The Dark Knight'
print(f"\n--- Recommendations for User {user_to_recommend} (based on liking '{movie_liked}') ---")
recommendations = get_movie_recommendations(movie_liked, user_to_recommend)
# Print the top 3 recommendations.
if recommendations:
    for i, movie in enumerate(recommendations[:3]):
        print(f"{i+1}. {movie}")
else:
    print("No new recommendations found.")
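To see what the cosine similarity in step 4 actually computes, one entry of the item-item matrix can be checked by hand. The sketch below uses only the standard library and rebuilds the rating vectors for "The Dark Knight" and "Inception" from the sample data (users 1 to 5, with 0 for "not rated"); the variable names are illustrative.

```python
import math

# Rating vectors over users 1..5, read off the sample data (0 = not rated).
dark_knight = [5, 0, 5, 0, 5]  # rated 5 by users 1, 3, and 5
inception = [5, 5, 0, 0, 4]    # rated 5, 5, and 4 by users 1, 2, and 5

# Dot product: only users who rated both movies (1 and 5) contribute.
dot = sum(a * b for a, b in zip(dark_knight, inception))  # 25 + 20 = 45
norm_dk = math.sqrt(sum(a * a for a in dark_knight))      # sqrt(75)
norm_inc = math.sqrt(sum(b * b for b in inception))       # sqrt(66)

similarity = dot / (norm_dk * norm_inc)
print(round(similarity, 4))  # 0.6396
```

This should agree with the "The Dark Knight" / "Inception" entry in the printed item-item similarity matrix, up to rounding.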