Utils Module
ContentModel
A content-based recommendation model using TF-IDF and cosine similarity.
This class implements a content-based filtering approach for movie recommendations. It uses TF-IDF vectorization on movie genres and computes cosine similarity to find movies similar to those rated by users.
Source code in src/utils.py
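The two quantities named above can be written out as follows. This is one common formulation, the smoothed IDF that scikit-learn's `TfidfVectorizer` applies by default. Whether `ContentModel` uses exactly this variant is an assumption. Here $t$ is a genre token, $d$ is a movie's genre string, $N$ is the number of movies, and $\mathrm{df}(t)$ is the number of movies tagged with $t$:

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d)\,\mathrm{idf}(t),
\qquad
\mathrm{idf}(t) = \ln\frac{1 + N}{1 + \mathrm{df}(t)} + 1
\\[6pt]
\operatorname{sim}(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
```

A genre appears at most once per movie, so $\mathrm{tf}(t, d)$ is effectively 0 or 1. The weighting is therefore driven by IDF: rare genres count for more than common ones. Once each TF-IDF vector is L2-normalised, the cosine similarity reduces to a plain dot product.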
__init__(movies)
Initialize the ContentModel with movie data.
This constructor sets up the TF-IDF vectorizer, computes the similarity matrix, and creates necessary mappings for efficient recommendation generation.
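The following is a minimal, standard-library sketch of what such a constructor might do. It assumes movie data arrives as `(title, genres)` pairs with `'|'`-separated genre strings. The class name `ContentModelSketch` and the attributes `vectors`, `similarity`, and `title_to_index` are hypothetical. The real class likely receives a DataFrame and may use a library vectorizer instead of the hand-rolled TF-IDF here, which uses smoothed IDF and L2 normalisation.

```python
import math
from collections import Counter


class ContentModelSketch:
    """Hypothetical stand-in for ContentModel.__init__: TF-IDF on genres,
    a precomputed cosine-similarity matrix, and a title -> index mapping."""

    def __init__(self, movies):
        # movies: list of (title, genres) pairs, genres as a '|'-separated string
        # (assumed input shape, chosen for illustration)
        self.titles = [title for title, _ in movies]
        docs = [genres.split("|") for _, genres in movies]

        # Document frequency: how many movies carry each genre
        n = len(docs)
        df = Counter(g for doc in docs for g in set(doc))
        # Smoothed IDF (assumed variant): ln((1 + N) / (1 + df)) + 1
        idf = {g: math.log((1 + n) / (1 + c)) + 1 for g, c in df.items()}

        # TF-IDF vectors as sparse dicts, L2-normalised so a dot product is a cosine
        self.vectors = []
        for doc in docs:
            tf = Counter(doc)
            vec = {g: tf[g] * idf[g] for g in tf}
            norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
            self.vectors.append({g: w / norm for g, w in vec.items()})

        # Full pairwise cosine-similarity matrix (O(N^2) memory)
        self.similarity = [
            [sum(w * v.get(g, 0.0) for g, w in u.items()) for v in self.vectors]
            for u in self.vectors
        ]

        # Title -> row index mapping, so lookups by title are O(1)
        self.title_to_index = {t: i for i, t in enumerate(self.titles)}
```

Precomputing the full matrix trades O(N²) memory for constant-time lookups at recommendation time. For a large catalogue, the usual alternative is to compute similarities on demand, only for the rows of the movies a user has rated.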
content_recommendations(user_ratings, top_n=10)
Generate content-based movie recommendations using user ratings.
This method implements a content-based filtering algorithm that:
1. Matches user-provided movie titles to dataset titles using fuzzy matching.
2. Calculates weighted similarity scores based on the user's ratings.
3. Recommends the movies most similar to highly rated movies.
4. Filters out movies the user has already rated.
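The four steps above can be sketched as follows. This sketch assumes titles have already been resolved to dataset titles, so step 1 is left out. It also assumes that a candidate's weighted score is the sum, over the user's rated movies, of rating × similarity. The actual weighting in `content_recommendations` is not documented here, and the extra `titles` and `similarity` parameters are hypothetical.

```python
def content_recommendations(user_ratings, titles, similarity, top_n=10):
    """Hypothetical sketch of rating-weighted content-based scoring.

    user_ratings: {title: rating}; titles are assumed to already match the
    dataset exactly (the documented method first resolves them with fuzzy matching).
    similarity: square matrix where similarity[i][j] is the cosine similarity
    between movies i and j.
    """
    index = {t: i for i, t in enumerate(titles)}
    # Keep only ratings whose title exists in the dataset
    rated = {index[t]: r for t, r in user_ratings.items() if t in index}

    scores = []
    for j, title in enumerate(titles):
        if j in rated:  # step 4: skip movies the user has already rated
            continue
        # Steps 2-3: similarity to each rated movie, weighted by its rating (assumed weighting)
        score = sum(r * similarity[i][j] for i, r in rated.items())
        scores.append((score, title))

    # Highest weighted score first; ties broken alphabetically by title
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scores[:top_n]]
```

One consequence of weighting by raw ratings is that every rating is positive, so even a poorly rated movie pushes its neighbours up the list. Subtracting the user's mean rating before weighting is a common way to let low ratings count against similar movies.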
find_closest_title(input_title)
Find the closest matching movie title in the dataset using fuzzy matching.
This method uses difflib to find the most similar movie title to the user's input, helping to handle typos and variations in movie titles.
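Below is a sketch of the matching step using `difflib.get_close_matches`, which ranks candidates by `SequenceMatcher` ratio and drops any that fall below `cutoff`. Several details are assumptions made for illustration: passing `titles` explicitly, the `cutoff=0.6` default, the case-insensitive comparison, and returning `None` when nothing clears the cutoff.

```python
import difflib


def find_closest_title(input_title, titles, cutoff=0.6):
    """Hypothetical sketch: return the dataset title closest to input_title, or None."""
    # Compare case-insensitively, but return the title exactly as stored in the dataset
    by_lower = {t.lower(): t for t in titles}
    matches = difflib.get_close_matches(
        input_title.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else None
```

For example, with `titles = ["Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"]`, the input `"Jumanji 1995"` resolves to `"Jumanji (1995)"`, while `"zzz"` shares no characters with any title and returns `None`.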
DataHandler
A class for handling movie and rating data loading and preprocessing.
This class provides methods to load movie and rating datasets from CSV files and preprocess the movie data by splitting genres and creating genre flags.
__init__(data_path)
load_data(movies_file, ratings_file)
Load movie and rating data from CSV files.
This method reads the specified CSV files from the data directory and returns them as pandas DataFrames.
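Below is a sketch of the loader using `pandas.read_csv`. It assumes that `data_path` is the directory holding both files and that `load_data` returns a `(movies, ratings)` tuple. `DataHandlerSketch` is a hypothetical name, and neither the return order nor any read options are documented here.

```python
import os

import pandas as pd


class DataHandlerSketch:
    """Hypothetical stand-in for DataHandler's constructor and load_data."""

    def __init__(self, data_path):
        # Directory that holds the CSV files (assumed meaning of data_path)
        self.data_path = data_path

    def load_data(self, movies_file, ratings_file):
        # Resolve each file name against the data directory and parse it
        movies = pd.read_csv(os.path.join(self.data_path, movies_file))
        ratings = pd.read_csv(os.path.join(self.data_path, ratings_file))
        # Assumed return shape: a (movies, ratings) tuple
        return movies, ratings
```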
preprocess_movies(movies)
Preprocess movie data by splitting genres and creating genre flags.
This method processes the 'genres' column by:
1. Splitting genre strings on the '|' delimiter into lists.
2. Creating a binary flag column for each unique genre.
3. Setting each flag to 1 if the movie belongs to that genre and 0 otherwise.
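The three steps above can be sketched with pandas as follows. The function name matches the documented method, but it is written here as a standalone function. Copying the input DataFrame rather than modifying it in place is also an assumption.

```python
import pandas as pd


def preprocess_movies(movies):
    """Hypothetical sketch of genre splitting plus one binary flag column per genre."""
    movies = movies.copy()  # leave the caller's DataFrame untouched (a choice, not documented behaviour)
    # Step 1: "Adventure|Comedy" -> ["Adventure", "Comedy"] (a single-character pattern is split literally)
    movies["genres"] = movies["genres"].str.split("|")
    # Collect every distinct genre across the dataset, in a stable order
    all_genres = sorted({g for genres in movies["genres"] for g in genres})
    # Steps 2-3: one column per genre, 1 if the movie carries it, 0 otherwise
    for genre in all_genres:
        movies[genre] = movies["genres"].apply(lambda gs, genre=genre: int(genre in gs))
    return movies
```

For a movie tagged `Adventure|Comedy`, the row gets `Adventure = 1` and `Comedy = 1`, with 0 in every other genre column.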