RecommenderSystemsColab

Improving Deezer’s Music Recommendation Engine

This repository presents my work on improving Deezer’s music recommendation engine for the DSG17 Online Phase, carried out at HSLU (Hochschule Luzern) by Andri Gerber. It delivers a complete, modular pipeline, from data exploration to advanced modeling, for predicting whether a user will listen to a track for more than 30 seconds.

For an in-depth discussion of the methodology and results, see the accompanying Gerber_Andri_Report.pdf.

Overview

The project is divided into three core notebooks:

EDA.ipynb
- Purpose: Conduct rigorous exploratory data analysis to assess data quality, detect outliers, and derive key temporal and behavioral features.
- Outcome: A detailed data dictionary and insights that drive subsequent feature engineering.
Preprocess_Feature_store.ipynb
- Purpose: Transform raw data into a unified feature store using chunk-wise loading, timestamp conversion, outlier filtering, and engineered features (e.g., user behavior, time bins).
- Outcome: Consistent, high-quality processed datasets for both training and testing.
Specific_Preprocess_and_Modelling.ipynb
- Purpose: Build and evaluate recommender models ranging from baseline methods (content-based, collaborative filtering, matrix factorization, PMF, RBM) to advanced approaches (Factorization Machines, NCF/NeuMF, GraphSAGE).
- Outcome: Models optimized primarily for ROC AUC, with additional metrics (precision, recall, F1) to ensure robust performance.
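
To make the preprocessing steps listed for Preprocess_Feature_store.ipynb more concrete, here is a minimal sketch using pandas. It is not the notebook's actual code: the column names (user_id, media_id, ts_listen, media_duration, is_listened), the duration cut-off, the time-bin edges, and the helper name build_feature_store are all assumptions chosen for illustration.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the raw data; the column names are assumptions.
SAMPLE_CSV = """user_id,media_id,ts_listen,media_duration,is_listened
1,10,1480000000,200,1
1,11,1480018000,180,0
2,10,1479956400,240,1
2,12,1480000000,0,0
"""


def build_feature_store(source, chunksize=2, max_duration=3600):
    """Load a CSV chunk by chunk, clean it, and add simple engineered features."""
    cleaned = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Timestamp conversion: Unix seconds -> datetime (UTC, timezone-naive).
        chunk["listen_dt"] = pd.to_datetime(chunk["ts_listen"], unit="s")
        # Outlier filtering: keep only plausible track durations (in seconds).
        plausible = chunk["media_duration"].between(1, max_duration)
        cleaned.append(chunk[plausible])
    features = pd.concat(cleaned, ignore_index=True)

    # Time features: hour of day plus a coarse time-of-day bin.
    features["hour"] = features["listen_dt"].dt.hour
    features["time_bin"] = pd.cut(
        features["hour"],
        bins=[0, 6, 12, 18, 24],
        right=False,  # bins are [0, 6), [6, 12), [12, 18), [18, 24)
        labels=["night", "morning", "afternoon", "evening"],
    )
    # Behavioral feature: number of retained events per user.
    features["user_event_count"] = (
        features.groupby("user_id")["user_id"].transform("count")
    )
    return features


features = build_feature_store(io.StringIO(SAMPLE_CSV))
print(features[["user_id", "hour", "time_bin", "user_event_count"]])
```

Filtering inside the loop keeps memory use bounded by the chunk size, while the features that depend on the whole dataset (here the per-user count) are computed once after concatenation. For real data, pass a file path instead of the in-memory sample and use a much larger chunksize.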

Key Insights

Feature Engineering: Advanced preprocessing (time-based, behavioral, and cross features) is critical for model success.
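Modeling Strategies: In this project, deep learning and graph-based methods (e.g., GraphSAGE) outperformed the traditional baselines; the accompanying report gives the detailed comparison.
Evaluation: ROC AUC is the primary metric because it measures ranking quality independently of any classification threshold, which keeps model comparisons meaningful when the classes are imbalanced.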
Scalability: A modular pipeline allows for continuous updates and real-time integration in a production environment.
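
The evaluation point above can be illustrated with a small, dependency-free sketch. It computes ROC AUC from its pairwise definition (the probability that a randomly chosen positive example is scored above a randomly chosen negative one) and contrasts it with precision, recall, and F1, which do depend on a threshold. The function names and the toy labels and scores are made up for illustration; in practice a library such as scikit-learn (roc_auc_score, precision_recall_fscore_support) would normally be used, and the quadratic loop here is only suitable for small inputs.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall_f1(y_true, scores, threshold=0.5):
    """Threshold-dependent metrics: predict 1 when score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy data (not project results): 1 = listened for more than 30 seconds.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(roc_auc(y_true, scores))                   # same value for any threshold
print(precision_recall_f1(y_true, scores, 0.5))  # changes with the threshold
print(precision_recall_f1(y_true, scores, 0.3))
```

Moving the threshold from 0.5 to 0.3 changes precision, recall, and F1, while the ROC AUC stays the same because it depends only on how the scores are ordered.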

How to Run

EDA.ipynb:
- Open the notebook (in Colab or JupyterLab), load your raw data, and perform initial exploration.
Preprocess_Feature_store.ipynb:
- Run the training and test pipelines to generate processed datasets with uniform features.
Specific_Preprocess_and_Modelling.ipynb:
- Load the processed data, select and tune models, and generate final predictions for evaluation or deployment.

Final Thoughts

This approach blends robust feature engineering with state-of-the-art modeling techniques to create a competitive recommendation engine. For further details, please contact Andri Gerber.¹

Enjoy exploring and extending this pipeline!

Email: andri.gerber@stud.hslu.ch. Department of Business, Lucerne University of Applied Sciences and Arts (HSLU), Lucerne, Switzerland.