Hybrid Recommender Systems

Unit 2: Guide to Machine Learning Algorithms

A hybrid recommender system is a system that combines two or more recommendation strategies to leverage their complementary strengths and mitigate their individual weaknesses. The most common hybrid approach is to combine content-based filtering and collaborative filtering.

The goal is to produce better, more accurate, and more robust recommendations than either method could on its own.

Why Use a Hybrid Approach?

Solving the "Cold Start" Problem: Collaborative filtering struggles with new users and new items because they have little or no interaction history. A hybrid system can fall back on a content-based approach in these situations, recommending items based on their attributes (and, for a new user, on the first few preferences they express) until enough interaction data has been collected.
Improving Accuracy: Collaborative filtering captures behavioral patterns ("users like you also liked this"), while content-based filtering captures item attributes ("this resembles what you already liked"). By combining the two signals, the system can make more nuanced and accurate predictions than either signal allows on its own.
Increasing Diversity: Content-based systems can get stuck in a "filter bubble," only recommending items very similar to what a user already knows. Collaborative filtering can introduce more diverse and serendipitous recommendations from similar users, broadening the user's horizons.

Common Hybridization Methods

1. Weighted Hybrid: This is the most straightforward approach. The system calculates a score for each item using both a content-based and a collaborative filtering method. The final recommendation score is a weighted average of these two scores:

Final Score = (α * Collaborative_Score) + ((1 - α) * Content_Based_Score)

The weight α (a value between 0 and 1) can be adjusted to give more importance to one method over the other. Both scores should be on a comparable scale before they are blended.
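To make the effect of α concrete, here is a minimal sketch of the weighted formula applied to a single item. The function name `weighted_hybrid` and the two sample scores are hypothetical values chosen for illustration, and both scores are assumed to already lie in the range 0 to 1.

```python
def weighted_hybrid(collab_score, content_score, alpha=0.5):
    """Blend two scores; alpha is the weight given to the collaborative score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * collab_score + (1 - alpha) * content_score


# Hypothetical scores for one candidate item (assumed to be on the same 0-1 scale).
collab_score = 0.8
content_score = 0.4

# alpha = 0.0 uses only the content-based score, alpha = 1.0 only the collaborative one.
for alpha in (0.0, 0.5, 1.0):
    print(f"alpha={alpha}: {weighted_hybrid(collab_score, content_score, alpha):.2f}")
```

In practice α is usually tuned on held-out data rather than fixed by hand.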

2. Switching Hybrid: The system switches between different recommendation methods based on certain criteria. For example, if a user has very few ratings, the system uses a content-based approach. Once the user has provided enough ratings, it switches to a more powerful collaborative filtering approach.
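The switching rule described above can be sketched as a simple dispatch on the size of a user's rating history. The threshold `MIN_RATINGS`, the function names, and the two stand-in recommenders are assumptions made for this illustration, not part of any particular library.

```python
MIN_RATINGS = 5  # hypothetical threshold; tune it for your own data


def choose_strategy(num_ratings, min_ratings=MIN_RATINGS):
    """Return which recommender to use for a user with num_ratings ratings."""
    return "collaborative" if num_ratings >= min_ratings else "content_based"


def switching_recommend(user_ratings, collab_recommender, content_recommender,
                        min_ratings=MIN_RATINGS):
    """Dispatch to one of two recommenders based on how much history the user has."""
    strategy = choose_strategy(len(user_ratings), min_ratings)
    recommender = collab_recommender if strategy == "collaborative" else content_recommender
    return strategy, recommender(user_ratings)


# Stand-in recommenders (placeholders, not real models).
def fake_collab(user_ratings):
    return ["item_from_similar_users"]


def fake_content(user_ratings):
    return ["item_with_similar_attributes"]


new_user = {"Inception": 5}                         # one rating: too little history
active_user = {f"movie_{i}": 4 for i in range(8)}   # eight ratings: enough history

print(switching_recommend(new_user, fake_collab, fake_content))
print(switching_recommend(active_user, fake_collab, fake_content))
```

A count of ratings is only one possible switching criterion; a confidence estimate from the collaborative model could serve the same purpose.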

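3. Feature Augmentation (often grouped with Feature Combination): This method treats the output of one recommender as an input feature for another. For example, the predictions from a collaborative filtering model could be used as an additional feature in a content-based model. Strictly speaking, feature combination refers to the related approach of merging raw features from both sources, such as genres and rating patterns, into a single model.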
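The idea of feeding one recommender's output into another as a feature can be sketched as follows. All numbers are made up for illustration, and the second-stage model is a plain least-squares fit chosen only to keep the example short; a real system would typically use a stronger learner and separate training and test data.

```python
import numpy as np

# Hypothetical content features for six items: [is_action, is_drama].
content_features = np.array([
    [1, 0],
    [1, 0],
    [1, 1],
    [0, 1],
    [0, 1],
    [0, 0],
], dtype=float)

# Hypothetical ratings predicted for the same items by a collaborative model.
collab_predictions = np.array([4.5, 4.0, 3.5, 2.5, 3.0, 2.0])

# Append the collaborative prediction as an extra feature column.
augmented = np.column_stack([content_features, collab_predictions])

# Hypothetical ratings the user actually gave, used to fit the second-stage model.
known_ratings = np.array([5.0, 4.0, 4.0, 2.0, 3.0, 2.0])

# Second stage: least-squares linear model on the augmented features plus a bias term.
X = np.column_stack([augmented, np.ones(len(augmented))])
weights, *_ = np.linalg.lstsq(X, known_ratings, rcond=None)
predicted = X @ weights

print(augmented.shape)
print(np.round(predicted, 2))
```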

Real-World Example: Netflix

Netflix is a prime example of a company that uses a sophisticated hybrid recommender system. It analyzes:

Collaborative Data: What you watch and what other users with similar tastes watch.
Content-Based Data: The genres, actors, directors, and even plot keywords of the movies and shows you like.

By combining these, it can recommend a movie because "users with similar tastes to you also liked it" (collaborative) and because "it's a sci-fi movie starring an actor you like" (content-based).

# --- 1. Import Necessary Libraries ---
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# --- 2. Prepare Sample Data ---
# We'll use a combined dataset with both user ratings and item content (genres).
ratings_data = {
    'user_id': [1, 1, 2, 2, 3, 3, 4, 4],
    'movie_title': ['The Dark Knight', 'Inception', 'Inception', 'Pulp Fiction', 'The Dark Knight', 'Forrest Gump', 'Pulp Fiction', 'The Godfather'],
    'rating': [5, 5, 5, 4, 5, 5, 5, 4]
}

movies_data = {
    'movie_title': ['The Dark Knight', 'Inception', 'Forrest Gump', 'Pulp Fiction', 'The Godfather', 'The Matrix'],
    # NOTE: the 'genres' column is missing from the original listing even though
    # step 3 requires it. The values below are illustrative assumptions.
    'genres': ['Action Crime Drama', 'Action Sci-Fi Thriller', 'Drama Romance', 'Crime Drama', 'Crime Drama', 'Action Sci-Fi']
}

ratings_df = pd.DataFrame(ratings_data)
movies_df = pd.DataFrame(movies_data)

# --- 3. Content-Based Filtering Component ---
# Create a TF-IDF matrix from the movie genres.
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(movies_df['genres'])

# Calculate the cosine similarity between movies based on their genres.
content_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
content_sim_df = pd.DataFrame(content_sim, index=movies_df['movie_title'], columns=movies_df['movie_title'])

# --- 4. Collaborative Filtering Component ---
# Create a user-item matrix from the ratings data (unrated movies become 0).
user_item_matrix = ratings_df.pivot_table(index='user_id', columns='movie_title', values='rating').fillna(0)

# Calculate the cosine similarity between movies based on user ratings.
collab_sim = cosine_similarity(user_item_matrix.T)
collab_sim_df = pd.DataFrame(collab_sim, index=user_item_matrix.columns, columns=user_item_matrix.columns)

# --- 5. Create a Weighted Hybrid Recommender ---
def get_hybrid_recommendations(user_id, movie_title, alpha=0.5):
    """
    Generates recommendations using a weighted average of content-based and collaborative filtering scores.

    alpha: The weight given to collaborative filtering (0.0 to 1.0).
    """
    # Get the list of movies the user has already watched.
    user_ratings = user_item_matrix.loc[user_id]
    watched_movies = user_ratings[user_ratings > 0].index

    # Get the similarity scores from both models for the liked movie.
    content_scores = content_sim_df[movie_title]

    # Movies nobody has rated (e.g. 'The Matrix') are absent from the collaborative
    # matrix. Align the collaborative scores to the full catalogue and fill the gaps
    # with 0 so the weighted sum below does not produce NaN for those movies.
    if movie_title in collab_sim_df:
        collab_scores = collab_sim_df[movie_title].reindex(content_sim_df.index, fill_value=0)
    else:
        collab_scores = pd.Series(0, index=content_sim_df.index)

    # Calculate the weighted hybrid score.
    hybrid_scores = (alpha * collab_scores) + ((1 - alpha) * content_scores)

    # Sort the movies based on the hybrid score.
    sorted_scores = hybrid_scores.sort_values(ascending=False)

    # Filter out movies the user has already watched, and the liked movie itself.
    recommendations = []
    for movie, score in sorted_scores.items():
        if movie not in watched_movies and movie != movie_title:
            recommendations.append((movie, score))

    return recommendations

# --- 6. Get Recommendations for a User ---
user_to_recommend = 3
movie_liked = 'The Dark Knight'

print(f"--- Hybrid Recommendations for User {user_to_recommend} (based on liking '{movie_liked}') ---")
recommendations = get_hybrid_recommendations(user_to_recommend, movie_liked, alpha=0.5)

# Print each recommended movie with its hybrid score.
for movie, score in recommendations:
    print(f"{movie}: {score:.3f}")