Usage
Getting Started
Prerequisites
Note: this model builds sparse matrices and can be heavy on RAM.
At least 32 GB of RAM is recommended for the deployment system.
Before running the Personal Movie Recommender, ensure you have the following installed:
- WSL (Windows Subsystem for Linux): necessary on Windows for the faiss library, which only works on Linux and macOS
- Python 3.8+ (the Nix shell provides Python 3.13)
- Required packages:
  - pip: Python package installer
  - pandas: for data manipulation and analysis
  - matplotlib: for plotting and visualizations
  - seaborn: statistical data visualization based on matplotlib
  - scikit-learn: for similarity calculations and machine learning
  - gradio: for the interactive web interface
  - faiss: efficient similarity search library
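Before installing anything, it can help to see which of the packages above are already importable. The sketch below is a minimal check and is not part of the repository: the helper name missing_packages is hypothetical, and it assumes the usual import names (scikit-learn imports as sklearn, faiss as faiss).

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Import names assumed for the packages listed above.
    required = ["pandas", "matplotlib", "seaborn", "sklearn", "gradio", "faiss"]
    print("Missing:", missing_packages(required) or "none")
```

Anything reported as missing can then be installed with pip or picked up from the Nix shell.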
Installation
- Clone the repository:
- Install dependencies:
  Using pip: pip install -r requirements.txt
  Or, if you use Nix flakes: nix develop
- Prepare your data:
  Download the required datasets and place them in the data/ directory:
  - movies.csv: movie metadata (title, genres, year, etc.)
  - ratings.csv: user ratings data
  You can obtain these datasets from Kaggle: Movie Recommendation System Dataset.
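A quick way to confirm the files are in place before training is to check for them and read their header rows. This is a minimal sketch using only the standard library; the function name check_data_dir is hypothetical and the script is not part of the repository.

```python
import csv
from pathlib import Path

REQUIRED_FILES = ("movies.csv", "ratings.csv")


def check_data_dir(data_dir="data"):
    """Map each required file name to its header columns, or None if the file is missing."""
    result = {}
    for name in REQUIRED_FILES:
        path = Path(data_dir) / name
        if not path.is_file():
            result[name] = None
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            # The first CSV row is taken to be the header.
            result[name] = next(csv.reader(handle), [])
    return result


if __name__ == "__main__":
    for name, columns in check_data_dir().items():
        print(name, "->", columns if columns is not None else "MISSING")
```

Run it from the repository root so that the relative data/ path resolves correctly.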
Usage
1. Train the Recommendation Model
First, train the hybrid recommendation model: python src/train.py
This will:
- Load and preprocess the movie and ratings data
- Build collaborative filtering and content-based models
- Save the trained model as hybrid_model.joblib
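The source does not say how train.py combines the collaborative filtering and content-based models. One common way to build a hybrid is a weighted blend of the two scores for each movie, and the sketch below illustrates that idea only. The function names blend_scores and top_n, the alpha weight, and the example scores are all assumptions, not the repository's implementation.

```python
def blend_scores(collab_scores, content_scores, alpha=0.5):
    """Blend two {movie_id: score} dicts; alpha weights the collaborative score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    blended = {}
    for movie_id in set(collab_scores) | set(content_scores):
        # A movie missing from one model contributes 0.0 for that model.
        collab = collab_scores.get(movie_id, 0.0)
        content = content_scores.get(movie_id, 0.0)
        blended[movie_id] = alpha * collab + (1.0 - alpha) * content
    return blended


def top_n(scores, n=3):
    """Return the n highest-scoring movie ids, best first (ties broken by id)."""
    return sorted(scores, key=lambda movie_id: (-scores[movie_id], movie_id))[:n]


if __name__ == "__main__":
    collab = {"m1": 1.0, "m2": 0.5}   # hypothetical scores
    content = {"m2": 1.0, "m3": 0.5}  # hypothetical scores
    print(top_n(blend_scores(collab, content, alpha=0.5)))
```

Raising alpha favours movies that similar users liked; lowering it favours movies whose metadata resembles what the user already rated.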
2. Run the Data Debugger
Next, run the data debugger to inspect the dataset: python src/data_debug.py
This script helps you inspect and verify your dataset. It will:
- Print the shape (rows × columns) of the raw movies.csv and ratings.csv
- Show the first 5 entries from each dataset
- List all columns in the movies.csv file
- Preprocess and clean the movie data
- Display genre distribution and perform basic genre analysis
Use this to ensure your dataset is correctly formatted and loaded before training.
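The genre analysis step amounts to counting how often each genre appears. The sketch below shows the idea with the standard library. It assumes a genres column whose values are pipe-separated (for example Action|Comedy), which is common in movie datasets but is not confirmed by the source; the helper name genre_counts is hypothetical.

```python
from collections import Counter


def genre_counts(movie_rows, column="genres", separator="|"):
    """Count genre occurrences across rows given as dicts (e.g. from csv.DictReader)."""
    counts = Counter()
    for row in movie_rows:
        raw = (row.get(column) or "").strip()
        # Skip empty cells; split the rest into individual genre labels.
        for genre in filter(None, (part.strip() for part in raw.split(separator))):
            counts[genre] += 1
    return counts


if __name__ == "__main__":
    rows = [  # hypothetical sample rows
        {"title": "A", "genres": "Action|Comedy"},
        {"title": "B", "genres": "Comedy"},
        {"title": "C", "genres": ""},
    ]
    for genre, count in genre_counts(rows).most_common():
        print(f"{genre}: {count}")
```

With a real file, the rows could come from csv.DictReader opened on data/movies.csv, provided the column name and separator match your copy of the dataset.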
3. Run a Basic Test Before Deployment
Now, confirm the model works properly before launching the app: python src/simple_test.py
This script performs a sanity check to confirm the model is functioning. It will:
- Load the model and required similarity data
- Generate the cosine_sim.npy file (if it doesn't already exist); this precomputed similarity matrix helps reduce RAM usage and speed up recommendations
- Run the model on a few predefined test cases
- Print out sample recommendations to the terminal
Run this after training to confirm everything works as expected before launching the interface.
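To see why a precomputed similarity matrix and the RAM recommendation matter, it helps to estimate the size of a dense movie-by-movie similarity matrix. The arithmetic below is a rough sketch: the movie counts are hypothetical, and it assumes one fixed-size value (4 bytes for float32, 8 bytes for float64) per pair of movies, which the source does not confirm.

```python
def dense_similarity_bytes(n_items, bytes_per_value=4):
    """Bytes needed for a dense n_items x n_items matrix of fixed-size values."""
    if n_items < 0 or bytes_per_value <= 0:
        raise ValueError("n_items must be >= 0 and bytes_per_value must be > 0")
    return n_items * n_items * bytes_per_value


def to_gib(n_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    for n_movies in (10_000, 50_000):  # hypothetical catalogue sizes
        size = to_gib(dense_similarity_bytes(n_movies))
        print(f"{n_movies} movies -> {size:.2f} GiB as float32")
```

Under these assumptions the size grows with the square of the catalogue: 50,000 movies need roughly 9.3 GiB in float32 and twice that in float64.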
4. Launch the Web Interface
Start the Gradio web application: python src/main.py
This will:
- Load the trained model
- Launch an interactive web interface
- Allow you to input preferences and get personalized recommendations
The interface will be available at http://localhost:7860 by default.
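If you want to know ahead of time which port will be free, you can probe for one starting at 7860. This is a minimal sketch using only the standard library; find_free_port is a hypothetical helper, not part of the repository or of Gradio.

```python
import socket


def find_free_port(start=7860, attempts=20, host="127.0.0.1"):
    """Return the first port in [start, start + attempts) that can be bound, else None."""
    for port in range(start, min(start + attempts, 65536)):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                # Binding succeeds only if no other socket holds this address.
                probe.bind((host, port))
            except OSError:
                continue
            return port
    return None


if __name__ == "__main__":
    print("First free port from 7860:", find_free_port())
```

The probe socket is closed before the function returns, so another process could still take the port in the meantime; treat the result as a hint. Assuming src/main.py starts the app through Gradio's launch method, the value could be passed as its server_port argument.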
5. Generate Visualizations (Optional)
Create data visualizations and analysis charts:
This generates various plots showing:
- Rating distributions
- Genre popularity
- User behavior patterns
- Model performance metrics
Development & Testing
The src/ directory contains additional utility scripts for development:
- test.py: unit tests for core functionality
- utils.py: helper functions for data processing
- Various debugging scripts for testing specific components
To run tests:
Troubleshooting
- Missing data files: Ensure movies.csv and ratings.csv are in the data/ directory
- Memory issues: For large datasets, consider using data sampling or increasing system memory
- Port conflicts: If port 7860 is occupied, Gradio will automatically use the next available port
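For the memory issue, one way to sample the data is to keep every rating from a random subset of users, so the users that remain still have their full rating history. This is a sketch only: sample_by_user is a hypothetical helper, and the userId column name is an assumption about ratings.csv that the source does not confirm.

```python
import random


def sample_by_user(rating_rows, fraction, user_key="userId", seed=0):
    """Keep all rows (a list of dicts) belonging to a random `fraction` of the users."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    users = sorted({row[user_key] for row in rating_rows})
    # Keep at least one user so the sample is never empty for non-empty input.
    keep_count = max(1, round(len(users) * fraction)) if users else 0
    kept_users = set(random.Random(seed).sample(users, keep_count))
    return [row for row in rating_rows if row[user_key] in kept_users]


if __name__ == "__main__":
    rows = [  # hypothetical ratings
        {"userId": u, "movieId": m, "rating": 4.0}
        for u in range(10) for m in range(3)
    ]
    sampled = sample_by_user(rows, fraction=0.3)
    print(len(sampled), "of", len(rows), "ratings kept")
```

Sampling whole users rather than individual ratings keeps each remaining user's profile intact, which tends to matter for collaborative filtering; the fixed seed makes the subset reproducible between runs.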
Next Steps
- Explore the API documentation for programmatic access
- Check out the visualization outputs to understand your data better
- Modify the model parameters in train.py to experiment with different approaches