A comprehensive recommendation system prototype for online eCommerce. Please cite this project if you use any part of it.
Online eCommerce has accelerated in recent years, driven by contactless shopping, and a strong recommendation system is essential to a business's success. However, it is difficult to make meaningful recommendations without user interaction history, a situation known as the cold start problem. This project builds a recommendation system that suggests both similar and complementary products using deep learning and visual embeddings, so it can recommend products without any knowledge of user preferences, user history, item propensity, or other interaction data.
This project uses uv for dependency management.
Install uv if needed with `curl -LsSf https://astral.sh/uv/install.sh | sh`, then run `uv sync` to install dependencies.
Activate the virtual environment with `source .venv/bin/activate`, or use `uv run`, which uses it automatically.

| Dataset | Purpose |
|---|---|
| Complete the Look (CTL) | Outfit compatibility (anchor/pos/neg triplets). Downloaded as fashion_v2. |
| Shop the Look (STL) | Similar product recommendations (alternative to CTL). |
| Street2Shop | Street photo → shop product matching (multi-task). Streamed from Hugging Face (no full local cache). |
| Polyvore | Additional outfit compatibility data, merged with CTL (multi-task). Streamed by default; use --download-images or add manually. |

STL/CTL setup: Clone both repos into `src/dataset/data/`:
- STL-Dataset for `fashion.json` (STL)
- complete-the-look-dataset for `raw_train.tsv` and `raw_test.tsv` (CTL)

The preparation script fetches images from these metadata files.

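
As a rough illustration of working with such metadata, the sketch below collects the distinct image identifiers referenced by a JSON-lines file before any download step. It assumes each line is a JSON object with `product` and `scene` keys; those field names and the helper `unique_image_keys` are assumptions for illustration, not taken from the preparation script.

```python
import json


def unique_image_keys(lines, fields=("product", "scene")):
    """Collect distinct image identifiers from JSON-lines metadata, in first-seen order.

    The field names are assumptions about the metadata layout.
    """
    seen = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for field in fields:
            key = record.get(field)
            if key is not None:
                seen.setdefault(key, None)  # dict keeps insertion order
    return list(seen)


# Made-up sample records, not real dataset entries.
sample = [
    '{"product": "p1", "scene": "s1", "bbox": [0.1, 0.2, 0.5, 0.9]}',
    '{"product": "p2", "scene": "s1", "bbox": [0.3, 0.1, 0.8, 0.7]}',
]
image_keys = unique_image_keys(sample)
print(image_keys)
```

Deduplicating first avoids fetching the same scene image once per product it appears with.
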
Similar product recommendations:
- Download data: `uv run python -m src.data_pipeline.data_preparation stl_ctl` (edit `configs/data_prep.yaml` for `stl`/`ctl_train`/`ctl_test`)
- Get similar product embedding: `uv run python -m src.features.embeddings` (2+ hours without GPU)
- Recommend similar products: `uv run python -m src.recommend_cli`
- Streamlit UI: `uv run streamlit run streamlit_app.py`

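
Once embeddings exist, recommending similar products amounts to a nearest-neighbor lookup in embedding space. The sketch below shows one common way to do this with cosine similarity in plain Python. The function names, the toy two-dimensional vectors, and the choice of cosine similarity are assumptions for illustration; the repository's `recommend_cli` may work differently.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_similar(embeddings, query_id, k=5):
    """Return the ids of the k items most similar to the query, excluding the query itself."""
    query = embeddings[query_id]
    scored = [
        (cosine(query, vec), item_id)
        for item_id, vec in embeddings.items()
        if item_id != query_id
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:k]]


# Toy embeddings; real visual embeddings would have hundreds of dimensions.
catalog = {
    "tee_a": [1.0, 0.0],
    "tee_b": [0.9, 0.1],
    "jeans": [0.0, 1.0],
    "boots": [-1.0, 0.0],
}
print(top_k_similar(catalog, "tee_a", k=2))
```

Because the lookup needs only item vectors, it works for a brand-new catalog with no interaction history, which is the cold start setting described earlier.
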

Compatible product recommendations:
- Download data: `uv run python -m src.data_pipeline.data_preparation stl_ctl` (set `ctl_train: true` and `ctl_test: true` in `configs/data_prep.yaml`)
- Train compatible model: `uv run python -m src.models.compatibility_trainer`
- Get compatible product embedding: `uv run python -m src.features.embeddings` (see `__main__` in `embeddings.py`)
- Evaluate: `uv run python -m src.models.evaluation`
- Recommend compatible products: `uv run python -m src.recommend_cli` (select `recommend_complementary_products` in `__main__`)

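
The CTL data is organized as anchor/positive/negative triplets, which suggests a triplet-style objective for compatibility. The sketch below shows a standard triplet margin loss and a simple triplet accuracy metric in plain Python. The Euclidean distance, the margin value, and all function names are assumptions for illustration; the actual loss and metric in `compatibility_trainer` and `evaluation` may differ.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Zero once the positive is closer to the anchor than the negative by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def triplet_accuracy(triplets):
    """Fraction of triplets where the positive is strictly closer to the anchor than the negative."""
    correct = sum(1 for a, p, n in triplets if euclidean(a, p) < euclidean(a, n))
    return correct / len(triplets)


# Toy embeddings: the positive sits near the anchor, the negative far away.
anchor, positive, negative = [0.0, 0.0], [0.3, 0.4], [3.0, 4.0]
print(triplet_margin_loss(anchor, positive, negative))  # well-separated triplet
print(triplet_margin_loss(anchor, negative, positive))  # roles swapped, so the loss is positive
```

Training pushes the loss toward zero across triplets, so compatible items end up closer to their anchor than incompatible ones in the learned space.
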
Multi-task training uses three data sources: CTL provides the compatibility base, Street2Shop adds street-to-shop robustness, and Polyvore augments the compatibility triplets.
- Edit `configs/data_prep.yaml` for paths and options
- Download CTL: `uv run python -m src.data_pipeline.data_preparation stl_ctl` (set `ctl_train: true`, `ctl_test: true`)
- Download Street2Shop: `uv run python -m src.data_pipeline.data_preparation street2shop`
- Prepare Polyvore: `uv run python -m src.data_pipeline.data_preparation polyvore` (set `download_images: true` for images)
- Train: `uv run python -m src.models.compatibility_trainer --config configs/train_multitask.yaml`

See docs/street2shop.md and docs/polyvore.md for details.
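
One simple way to combine several data sources during training is to pick the source of each batch at random according to fixed weights. The sketch below illustrates that idea only. The weights, the source labels, and the `sample_sources` helper are hypothetical, and how the trainer actually mixes CTL, Street2Shop, and Polyvore is governed by `configs/train_multitask.yaml`, not by this code.

```python
import random
from collections import Counter


def sample_sources(weights, num_batches, seed=0):
    """Choose a data source for each batch in proportion to its weight (seeded for repeatability)."""
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=num_batches)


# Hypothetical mixing weights: CTL as the base, the other two as auxiliary sources.
weights = {"ctl": 0.5, "street2shop": 0.25, "polyvore": 0.25}
schedule = sample_sources(weights, num_batches=1000)
print(Counter(schedule))
```

Weighting the base source more heavily keeps the compatibility task dominant while still exposing the model to the auxiliary data.
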

Samples of similar product recommendations (the query product is on the left; the top 5 recommended similar products are on the right)

Samples of compatible product recommendations (the query product is on the left; the top 5 recommended compatible products are on the right)

