Usage

🚀 Getting Started

📋 Prerequisites


Note: this model builds sparse matrices and can be heavy on RAM!

The deployment system should have at least 32 GB of RAM.
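
To get a feel for why memory matters, here is a rough back-of-the-envelope estimate of what a dense movie-to-movie matrix would occupy. The catalogue sizes and the 4-byte float32 element size are illustrative assumptions, not figures taken from this project.

```python
def dense_matrix_gib(n_items: int, bytes_per_value: int = 4) -> float:
    """Memory in GiB needed for a dense n_items x n_items matrix."""
    return n_items * n_items * bytes_per_value / 2**30


if __name__ == "__main__":
    # Hypothetical catalogue sizes, used only for illustration.
    for n in (10_000, 60_000):
        print(f"{n:>6} movies -> {dense_matrix_gib(n):.1f} GiB as float32")
```

Because the cost grows with the square of the number of movies, a sixfold larger catalogue needs roughly 36 times the memory.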


Before running the Personal Movie Recommender, ensure you have the following installed:

  • 💻 WSL (Windows Subsystem for Linux) if you are on Windows. This is necessary for the faiss library, which only works on Linux and macOS.
  • 🐍 Python 3.8+ (the Nix shell provides Python 3.13)
  • 📦 Required packages:

      • pip - Python package installer
      • pandas - data manipulation and analysis
      • matplotlib - plotting and visualizations
      • seaborn - statistical data visualization based on matplotlib
      • scikit-learn - similarity calculations and machine learning
      • gradio - the interactive web interface
      • faiss - efficient similarity search library
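
A quick way to confirm the packages are present is to ask Python whether each one can be found. This is a minimal sketch and not part of the repository. The list holds import names, so scikit-learn appears as sklearn.

```python
import importlib.util

# Import names for the packages listed above (scikit-learn imports as "sklearn").
REQUIRED_MODULES = ["pandas", "matplotlib", "seaborn", "sklearn", "gradio", "faiss"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```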

🔧 Installation

  1. 📥 Clone the repository:

    git clone https://github.com/Gurjaka/Personal-movie-recommendation-model.git
    cd Personal-movie-recommendation-model
    

  2. ⚙️ Install dependencies:

Using pip:

    pip install -r requirements.txt

Or, if you use Nix flakes:

    nix develop

  3. 📊 Prepare your data:

Download the required datasets and place them in the data/ directory:

  • movies.csv - movie metadata (title, genres, year, etc.)
  • ratings.csv - user ratings data

You can obtain these datasets from Kaggle: Movie Recommendation System Dataset.
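
Before going further, it can help to confirm that both files are where the scripts expect them. The sketch below is not part of the repository, and `check_data_dir` is a hypothetical helper name. It reports each file's header row, or None when the file is missing.

```python
import csv
from pathlib import Path

EXPECTED_FILES = ("movies.csv", "ratings.csv")


def check_data_dir(data_dir="data"):
    """Map each expected file name to its header row, or None if it is missing."""
    headers = {}
    for name in EXPECTED_FILES:
        path = Path(data_dir) / name
        if not path.is_file():
            headers[name] = None
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            # Read only the first row so large files stay cheap to check.
            headers[name] = next(csv.reader(handle), [])
    return headers


if __name__ == "__main__":
    for name, header in check_data_dir().items():
        print(name, "->", "MISSING" if header is None else header)
```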

🎯 Usage

1. 🧠 Train the Recommendation Model

First, train the hybrid recommendation model:

    python src/train.py

This will:

  • 📖 Load and preprocess the movie and ratings data
  • 🔧 Build collaborative filtering and content-based models
  • 💾 Save the trained model as hybrid_model.joblib
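
This page does not show how train.py combines the two models. One common way to build a hybrid recommender is a weighted blend of per-movie scores, and the sketch below illustrates that idea only. The function names, the `alpha` weight, and the score values are all hypothetical.

```python
def blend_scores(collab, content, alpha=0.5):
    """Weighted blend of two {movie: score} mappings.

    alpha weights the collaborative score and (1 - alpha) the content score.
    A movie missing from one mapping contributes 0.0 for that part.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    movies = set(collab) | set(content)
    return {
        m: alpha * collab.get(m, 0.0) + (1 - alpha) * content.get(m, 0.0)
        for m in movies
    }


def top_n(scores, n=3):
    """Return the n highest-scoring movie titles, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


if __name__ == "__main__":
    collab = {"Heat": 0.9, "Alien": 0.4}  # made-up scores
    content = {"Heat": 0.2, "Alien": 0.8, "Up": 0.6}  # made-up scores
    print(top_n(blend_scores(collab, content, alpha=0.5)))
```

Setting alpha closer to 1 favours what similar users rated highly, while a value closer to 0 favours movies with similar metadata.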

2. 🛠️ Run the Data Debugger

Next, run the data debugger to inspect the dataset:

    python src/data_debug.py

This script helps you inspect and verify your dataset. It will:

  • 📏 Print the shape (rows × columns) of the raw movies.csv and ratings.csv
  • 📝 Show the first 5 entries from each dataset
  • 🧾 List all columns in the movies.csv file
  • 🔄 Preprocess and clean the movie data
  • 📊 Display the genre distribution and perform basic genre analysis

Use this to ensure your dataset is correctly formatted and loaded before training.
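
As an illustration of the genre distribution step, here is a minimal standard-library sketch. It is not taken from data_debug.py. It assumes a `genres` column whose values are separated by `|`, so adjust the `column` and `sep` arguments if your file differs.

```python
import csv
import io
from collections import Counter


def genre_counts(csv_file, column="genres", sep="|"):
    """Count how many movies carry each genre label."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        for genre in row[column].split(sep):
            if genre.strip():
                counts[genre.strip()] += 1
    return counts


if __name__ == "__main__":
    # Tiny inline sample standing in for data/movies.csv.
    sample = io.StringIO(
        "movieId,title,genres\n"
        "1,Toy Story (1995),Animation|Comedy\n"
        "2,Heat (1995),Action|Crime\n"
        "3,Sabrina (1995),Comedy|Romance\n"
    )
    for genre, count in genre_counts(sample).most_common():
        print(f"{genre:<10} {count}")
```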

3. ✅ Run a Basic Test Before Deployment

Now confirm that the model works properly before launching the app:

    python src/simple_test.py

This script performs a sanity check on the trained model. It will:

  • 🧠 Load the model and the required similarity data
  • ⚙️ Generate the cosine_sim.npy file if it doesn't already exist. This precomputed similarity matrix helps reduce RAM usage and speed up recommendations.
  • 🎬 Run the model on a few predefined test cases
  • 🖨️ Print sample recommendations to the terminal

Run this after training to confirm everything works as expected before launching the interface.
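
The "generate if missing" behaviour for cosine_sim.npy can be pictured as a load-or-compute cache. The sketch below is an assumption about how such a cache might look, not the code in simple_test.py. The function names are hypothetical, and the memory-mapped load is one possible way a saved matrix could keep RAM usage down.

```python
from pathlib import Path

import numpy as np


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the rows of a 2-D array."""
    features = np.asarray(features, dtype=np.float32)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing all-zero rows by zero
    unit = features / norms
    return unit @ unit.T


def load_or_compute(features, cache_path="cosine_sim.npy"):
    """Load the cached matrix if present; otherwise compute and save it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        # mmap_mode="r" reads slices from disk instead of loading everything.
        return np.load(cache_path, mmap_mode="r")
    sim = cosine_similarity_matrix(features)
    np.save(cache_path, sim)
    return sim
```

The first call pays the cost of computing the matrix. Later calls only open the file.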

4. 🌐 Launch the Web Interface

Start the Gradio web application:

    python src/main.py

This will:

  • 🚀 Load the trained model
  • 🖥️ Launch an interactive web interface
  • 🎭 Allow you to input preferences and get personalized recommendations

The interface will be available at http://localhost:7860 by default.

5. 📈 Generate Visualizations (Optional)

Create data visualizations and analysis charts:

    python src/visualize.py

This generates various plots showing:

  • 📊 Rating distributions
  • 🎬 Genre popularity
  • 👥 User behavior patterns
  • 🎯 Model performance metrics

🛠️ Development & Testing

The src/ directory contains additional utility scripts for development:

  • test.py - 🧪 unit tests for core functionality
  • utils.py - 🔧 helper functions for data processing
  • Various debugging scripts for testing specific components

To run tests:

    python src/test.py

🔧 Troubleshooting

  • 📁 Missing data files: Ensure movies.csv and ratings.csv are in the data/ directory
  • 🐏 Memory issues: For large datasets, consider using data sampling or increasing system memory
  • 🔌 Port conflicts: If port 7860 is occupied, Gradio will automatically use the next available port
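
If the interface does not appear at the expected address, you can check whether something is already listening on port 7860. This is a small standard-library sketch. `is_port_free` is a hypothetical helper, not part of the project.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if nothing is currently listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a listener accepts the connection.
        return sock.connect_ex((host, port)) != 0


if __name__ == "__main__":
    state = "free" if is_port_free(7860) else "in use"
    print(f"Port 7860 is {state}")
```

If the port is in use, look in the terminal output of main.py for the address Gradio actually chose.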

🎯 Next Steps

  • 📚 Explore the API documentation for programmatic access
  • 📊 Check out the visualization outputs to understand your data better
  • 🔬 Modify the model parameters in train.py to experiment with different approaches