Study | StudyLover

Collaborative filtering


Unit 2: Guide to Machine Learning Algorithms

Collaborative filtering is one of the most widely used techniques in recommender systems. Unlike content-based filtering, which focuses on the attributes of the items themselves, collaborative filtering makes recommendations based on the collective behavior and preferences of similar users. The core idea is: "If person A has the same opinion as person B on an issue, A is more likely to have B's opinion on a different issue than that of a randomly chosen person."

In simpler terms, it works by finding people with similar tastes to you and then recommending items that they have liked but you haven't yet seen.

How It Works

The algorithm analyzes a large user-item interaction matrix (e.g., a table where rows are users and columns are movies, with the values being the ratings). It doesn't need to know anything about the items themselves (like the genre or director). It works in two main ways:

1. User-Based Collaborative Filtering:

o Find Similar Users: The algorithm first finds users who have a similar rating history to you. For example, it might find another user who has also rated "The Dark Knight," "Inception," and "Pulp Fiction" highly.

o Recommend Items: It then looks at the items that this similar user has liked but that you haven't seen yet (e.g., "The Shawshank Redemption") and recommends them to you.

2. Item-Based Collaborative Filtering (More Common and Scalable):

o Find Similar Items: Instead of finding similar users, this method finds similar items. It analyzes the user-item matrix to find items that are frequently rated similarly by users. For example, it might discover that, in general, users who like "The Dark Knight" also tend to like "Inception".

o Recommend Items: If you have liked "The Dark Knight", the system will then recommend "Inception" to you, based on this learned item-item similarity.
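The existing listing in this article covers the item-based variant, so here is a minimal sketch of the user-based variant for comparison. The ratings, the user names ("you", "alice", "bob"), the helper names, and the `min_rating` threshold are all hypothetical, and cosine similarity over co-rated items is just one common choice of similarity measure.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings (rows: users, columns: movies); NaN marks "not rated".
ratings = pd.DataFrame(
    {
        "The Dark Knight": [5, 5, 1],
        "Inception": [5, 4, 5],
        "Pulp Fiction": [4, 5, np.nan],
        "The Shawshank Redemption": [np.nan, 5, 2],
    },
    index=["you", "alice", "bob"],
)

def cosine_on_shared(a, b):
    """Cosine similarity computed only over the items both users have rated."""
    shared = a.notna() & b.notna()
    if not shared.any():
        return 0.0
    x = a[shared].to_numpy(dtype=float)
    y = b[shared].to_numpy(dtype=float)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def recommend_user_based(target, min_rating=4.0):
    # Step 1: find the single most similar other user.
    others = ratings.index.drop(target)
    sims = {u: cosine_on_shared(ratings.loc[target], ratings.loc[u]) for u in others}
    neighbour = max(sims, key=sims.get)
    # Step 2: recommend items the neighbour liked that the target has not rated.
    target_row = ratings.loc[target]
    neighbour_row = ratings.loc[neighbour]
    return [
        movie
        for movie in ratings.columns
        if pd.isna(target_row[movie]) and neighbour_row[movie] >= min_rating
    ]

print(recommend_user_based("you"))
```

In this toy data "alice" is the closer neighbour, so her highly rated "The Shawshank Redemption" is what gets suggested. A practical system would usually combine several neighbours and weight their ratings by similarity rather than rely on a single user.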

Advantages and Disadvantages

Advantages:

Serendipity: It can lead to the discovery of new and unexpected items because it's not limited by the features of the items you've previously liked.
No Need for Item Content: It works without needing any information about the items themselves, which is useful when item features are hard to obtain.
High Accuracy: Given enough interaction data, it is often more accurate than content-based filtering.

Disadvantages:

Cold Start Problem: This is the biggest challenge. It's very difficult to make recommendations for:

New Users: Because they have no rating history, the system doesn't know who they are similar to.
New Items: Because no one has rated them yet, the system can't find any similar items.

Data Sparsity: In most real-world scenarios, the user-item matrix is very sparse (most users have only rated a few items). This can make it difficult to find users or items with enough overlapping ratings to calculate similarity.
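Both problems can be made concrete with a small sketch. The ratings below are invented for illustration, and the names `sparsity` and `co_rated` are assumptions of this example rather than part of any library. It measures what fraction of the matrix is empty and counts, for each pair of items, how many users rated both.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings; NaN marks a user-item pair with no rating.
ratings = pd.DataFrame(
    {
        "Movie A": [5, np.nan, np.nan, 4],
        "Movie B": [np.nan, 3, np.nan, np.nan],
        "Movie C": [4, np.nan, 5, np.nan],
        "Movie D": [np.nan, np.nan, np.nan, np.nan],  # new item: nobody has rated it
    },
    index=["u1", "u2", "u3", "u4"],
)

rated = ratings.notna()

# Fraction of user-item cells that hold no rating.
sparsity = 1.0 - rated.to_numpy().mean()

# co_rated.loc[a, b] = number of users who rated both item a and item b.
rated_int = rated.astype(int)
co_rated = rated_int.T @ rated_int

print(f"sparsity: {sparsity:.2%}")
print(co_rated)
```

Here only 5 of the 16 cells are filled. "Movie A" and "Movie C" share a single rater, which is thin evidence for a similarity score, and the "Movie D" row of `co_rated` is all zeros, which is the new-item cold start in miniature.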

# --- 1. Import Necessary Libraries ---
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# --- 2. Prepare Sample Data ---
# Create a sample dataset of users, movies they've watched, and the ratings they've given.
# The three lists are aligned by position: entry i of each list describes one rating.
data = {
    'user_id': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5],
    'movie_title': [
        'The Dark Knight', 'Inception', 'Forrest Gump',                     # user 1
        'Inception', 'Pulp Fiction',                                        # user 2
        'The Dark Knight', 'Forrest Gump', 'The Shawshank Redemption',      # user 3
        'Pulp Fiction', 'The Godfather', 'Goodfellas',                      # user 4
        'Inception', 'The Dark Knight',                                     # user 5
    ],
    'rating': [5, 5, 4, 5, 4, 5, 5, 5, 5, 4, 5, 4, 5]
}
df = pd.DataFrame(data)

# --- 3. Create a User-Item Matrix ---
# This is the core data structure for collaborative filtering.
# We'll use a pivot table to create a matrix where:
# - Rows are users (user_id).
# - Columns are movies (movie_title).
# - Values are the ratings.
# `fillna(0)` replaces missing values (movies a user hasn't rated) with 0.
user_item_matrix = df.pivot_table(index='user_id', columns='movie_title', values='rating').fillna(0)

print("--- User-Item Matrix ---")
print(user_item_matrix)

# --- 4. Calculate Item-Item Similarity ---
# We will use cosine similarity to measure how similar movies are to each other.
# Cosine similarity calculates the cosine of the angle between two vectors.
# In this context, each movie is a vector of user ratings.
# Because the ratings here are non-negative, the score ranges from 0 to 1:
# 1 means the two rating vectors point in the same direction, 0 means no user rated both movies.
item_similarity = cosine_similarity(user_item_matrix.T)  # Transpose the matrix to get items as rows

# Convert the similarity matrix into a DataFrame for easier use.
item_similarity_df = pd.DataFrame(
    item_similarity,
    index=user_item_matrix.columns,
    columns=user_item_matrix.columns,
)

print("\n--- Item-Item Similarity Matrix ---")
print(item_similarity_df)

# --- 5. Create the Recommendation Function ---
def get_movie_recommendations(movie_title, user_id):
    """
    Generates movie recommendations for a user based on a movie they liked.

    This is an example of item-based collaborative filtering.
    """
    # Get the similarity scores for the given movie against all other movies.
    similar_scores = item_similarity_df[movie_title]

    # Sort the movies based on their similarity score in descending order.
    similar_movies = similar_scores.sort_values(ascending=False)

    # Get the list of movies the user has already watched (any non-zero rating).
    user_ratings = user_item_matrix.loc[user_id]
    watched_movies = user_ratings[user_ratings > 0].index

    # Filter out movies the user has already watched, and the seed movie itself.
    recommendations = []
    for movie, score in similar_movies.items():
        if movie != movie_title and movie not in watched_movies:
            recommendations.append(movie)

    return recommendations

# --- 6. Get Recommendations for a User ---
# Let's get recommendations for User 3, who liked "The Dark Knight".
user_to_recommend = 3
movie_liked = 'The Dark Knight'

print(f"\n--- Recommendations for User {user_to_recommend} (based on liking '{movie_liked}') ---")
recommendations = get_movie_recommendations(movie_liked, user_to_recommend)

# Print the top 3 recommendations.
if recommendations:
    for i, movie in enumerate(recommendations[:3], start=1):
        print(f"{i}. {movie}")
else:
    print("No new recommendations found.")
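As a check on step 4, one entry of the similarity matrix can be reproduced by hand. The sketch below assumes the sample data from step 2, where "The Dark Knight" is rated 5 by users 1, 3, and 5, and "Inception" is rated 5, 5, and 4 by users 1, 2, and 5. The helper name `cosine` is introduced here only for illustration.

```python
import numpy as np

# Rating columns for users 1-5, with 0 standing in for "not rated".
dark_knight = np.array([5, 0, 5, 0, 5], dtype=float)
inception = np.array([5, 5, 0, 0, 4], dtype=float)

def cosine(a, b):
    """Dot product divided by the product of the vector lengths."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

similarity = cosine(dark_knight, inception)

# Numerator: 5*5 + 5*4 = 45 (only users 1 and 5 rated both movies).
# Denominator: sqrt(75) * sqrt(66).
print(round(similarity, 4))  # 0.6396
```

Only the users who rated both movies contribute to the numerator, while every rating contributes to the denominator. This is a side effect of filling missing ratings with 0: a movie rated by many users who did not rate the other one is pulled toward a lower score.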
