Model Training Module
HybridModel
A hybrid recommendation system combining content-based and collaborative filtering.
This class implements a hybrid approach that leverages both content-based filtering (using movie genres) and collaborative filtering (using user-item interactions). It uses FAISS for efficient similarity search in the collaborative filtering component and combines recommendations from both approaches.
Source code in src/train.py
__init__(movies, ratings)
Initialize the hybrid recommendation model.
Sets up the hybrid model by creating sparse matrices for collaborative filtering, training the FAISS index, and preparing all necessary mappings.
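The mapping and sparse-matrix setup described above can be sketched in plain Python. This is a minimal illustration, not the code in src/train.py: the toy `ratings` triples and the helper names `build_index_maps` and `to_sparse_rows` are hypothetical, and a nested dictionary stands in for whatever sparse matrix type the real model builds. The FAISS index step is not shown.

```python
from collections import defaultdict

# Hypothetical toy ratings as (user_id, movie_id, rating) triples.
ratings = [
    (10, 500, 4.0),
    (10, 501, 3.5),
    (42, 500, 5.0),
    (77, 502, 2.0),
]


def build_index_maps(triples):
    """Map raw user and movie IDs to contiguous row/column indices."""
    user_index = {}
    movie_index = {}
    for user_id, movie_id, _ in triples:
        # setdefault assigns the next free index the first time an ID is seen.
        user_index.setdefault(user_id, len(user_index))
        movie_index.setdefault(movie_id, len(movie_index))
    return user_index, movie_index


def to_sparse_rows(triples, user_index, movie_index):
    """Store only observed ratings: row index -> {column index: rating}."""
    rows = defaultdict(dict)
    for user_id, movie_id, rating in triples:
        rows[user_index[user_id]][movie_index[movie_id]] = rating
    return dict(rows)


user_index, movie_index = build_index_maps(ratings)
sparse_rows = to_sparse_rows(ratings, user_index, movie_index)
```

Contiguous indices matter because both sparse matrices and vector indexes address rows by position, so the mappings are what let a result row be translated back to a user or movie ID.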
find_closest_title(input_title)
Find the closest matching movie title using fuzzy string matching.
This method handles typos and variations in movie titles by finding the most similar title in the movies database using sequence matching.
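As a rough illustration of fuzzy title matching, the sketch below uses `difflib.get_close_matches` from the Python standard library, which ranks candidates by `SequenceMatcher` similarity. It is an assumption that the real method works this way; the standalone function `closest_title`, its `titles` argument, and the `cutoff` value are hypothetical.

```python
from difflib import get_close_matches


def closest_title(input_title, titles, cutoff=0.6):
    """Return the known title most similar to input_title, or None.

    Comparison is case-insensitive; the original casing is returned.
    """
    lowered = {title.lower(): title for title in titles}
    matches = get_close_matches(
        input_title.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None


# Hypothetical catalogue of titles.
titles = ["Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"]
typo_match = closest_title("Toy Stroy (1995)", titles)
no_match = closest_title("zzzzqqqq", titles)
```

Raising `cutoff` makes the matcher stricter, which trades fewer wrong matches for more inputs that return no match at all.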
hybrid_recommend(user_ratings, content_weight=0.4, top_n=5)
Generate hybrid recommendations combining content-based and collaborative filtering.
This method implements a hybrid recommendation approach that:
1. Generates content-based recommendations using movie genres
2. Creates collaborative filtering recommendations using user similarity
3. Combines both approaches to provide diverse, high-quality recommendations
4. Handles fuzzy matching for movie titles to improve usability
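The page does not say how the two sets of recommendations are combined. One common option, sketched here purely as an assumption, is a weighted sum in which `content_weight` scales the content-based score and `1 - content_weight` scales the collaborative score. The function `blend_scores` and its score dictionaries are hypothetical; only the `content_weight=0.4` and `top_n=5` defaults come from the signature above.

```python
def blend_scores(content_scores, collab_scores, content_weight=0.4, top_n=5):
    """Rank titles by a weighted sum of two score dictionaries.

    Scores are assumed to be on a comparable scale (for example 0 to 1).
    A title missing from one source contributes 0 from that source.
    """
    candidates = set(content_scores) | set(collab_scores)
    blended = {
        title: content_weight * content_scores.get(title, 0.0)
        + (1.0 - content_weight) * collab_scores.get(title, 0.0)
        for title in candidates
    }
    # Sort by descending score, breaking ties alphabetically for stability.
    return sorted(blended, key=lambda t: (-blended[t], t))[:top_n]


# Hypothetical scores from each component.
content = {"A": 1.0, "B": 0.5}
collab = {"B": 1.0, "C": 0.5}
top_two = blend_scores(content, collab, content_weight=0.4, top_n=2)
```

With this scheme a title recommended by both components tends to outrank one recommended strongly by only a single component, which is one way a blend can favor agreement between the two approaches.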
load_hybrid_model()
Load a pre-trained hybrid recommendation model from disk.
This helper function loads a previously saved hybrid model using joblib, ensuring proper class reference resolution for successful deserialization.
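The page does not show how the class reference is resolved. A frequent cause of this problem is that a model saved while the training script ran as `__main__` is recorded as `__main__.<ClassName>`, so a different entry point cannot find the class when loading. The sketch below illustrates one workaround under that assumption. It uses the standard library's `pickle`, which joblib builds on, rather than joblib itself, and `ToyModel` and `register_on_main` are hypothetical names.

```python
import pickle
import sys


class ToyModel:
    """Hypothetical stand-in for the real model class."""

    def __init__(self, top_n):
        self.top_n = top_n


def register_on_main(cls):
    """Expose cls as an attribute of the __main__ module.

    Pickles that recorded '__main__.<ClassName>' can then be resolved
    even when the loading program's __main__ is a different script.
    """
    # For this demo only: make new pickles record '__main__' as the module,
    # mimicking a model that was saved from a script run directly.
    cls.__module__ = "__main__"
    setattr(sys.modules["__main__"], cls.__name__, cls)
    return cls


register_on_main(ToyModel)
payload = pickle.dumps(ToyModel(top_n=5))  # stand-in for saving to disk
restored = pickle.loads(payload)           # stand-in for loading from disk
```

Defining the class in an importable module and always saving from there avoids the need for this kind of registration.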
train_model()
Train and save the hybrid recommendation model.
This function orchestrates the complete model training process:
1. Loads movie and rating data from CSV files
2. Applies data preprocessing and downsampling for performance
3. Trains the hybrid model combining content-based and collaborative filtering
4. Saves the trained model to disk using joblib
To reduce training time and memory usage, the function downsamples the data to the 20,000 most active users and the 10,000 most-rated movies.
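The downsampling step can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the code in src/train.py: the function `downsample`, the triple layout, and the choice to count users and movies independently before filtering are all hypothetical, and the real limits (20,000 users and 10,000 movies) are replaced by small values for the toy data.

```python
from collections import Counter


def downsample(ratings, max_users, max_movies):
    """Keep ratings whose user and movie are both among the most frequent.

    ratings is a list of (user_id, movie_id, rating) triples. Activity and
    popularity are measured as the number of ratings per user and per movie.
    """
    user_counts = Counter(user for user, _, _ in ratings)
    movie_counts = Counter(movie for _, movie, _ in ratings)
    top_users = {user for user, _ in user_counts.most_common(max_users)}
    top_movies = {movie for movie, _ in movie_counts.most_common(max_movies)}
    return [r for r in ratings if r[0] in top_users and r[1] in top_movies]


# Hypothetical toy data: user 1 is most active, movie "a" is most rated.
toy_ratings = [
    (1, "a", 4), (1, "b", 3), (1, "c", 5),
    (2, "a", 2), (2, "b", 4),
    (3, "a", 5),
]
kept = downsample(toy_ratings, max_users=2, max_movies=2)
```

Filtering on both axes shrinks the user-item matrix in both dimensions, which is where the speed and memory savings come from.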