| title | Movie Recommendation System |
|---|---|
| emoji | 🎬 |
| colorFrom | blue |
| colorTo | indigo |
| sdk | docker |
| pinned | false |
A high-performance, real-time recommendation engine combining sequential Transformers (SASRec), learnable edge networks (KAN), and graph collaboration (LightGCN) with causal popularity debiasing.
Most recommendation system tutorials teach you how to train a model in a Jupyter notebook, but leave out the hard part: how to serve it in production.
APEX is a complete, production-ready recommender engine. It combines 6 complementary ML architectures into an ensemble that dynamically scales from free CPU servers to high-performance GPU instances. It integrates a real-time streaming feedback loop that updates candidate features instantly, and uses causal debiasing to ensure users discover new long-tail content, not just blockbusters.
The codebase is engineered to demonstrate production-grade ensembling and serving patterns: hardware-aware model tiering at startup, low-latency SIMD vector indexes, differential privacy guarantees, PySpark Delta Lake Medallion ETL, and counterfactual policy evaluation.
|
LightGCN (Graph), SASRec (Transformer), KAN (Kolmogorov-Arnold), Quantum-Fluid (Neural ODE), Hyperbolic, and Generative Latent Diffusion models. |
Auto-detects memory and hardware capabilities at startup: Tier 1 (Full GPU Ensemble) vs. Tier 2 (ONNX CPU) vs. Tier 3 (FAISS/TF-IDF lite). |
Clickstream rating feeds are ingested asynchronously. Sequential candidate vectors are updated in real-time without batch DB rebuilds. |
Before launching the APEX recommendation server, ensure your target hardware meets the specifications below:
| Requirement | Tier 3 (Minimum) | Tier 2 (Recommended CPU) | Tier 1 (Enterprise GPU) |
|---|---|---|---|
| Operating System | Linux, macOS, Windows | Ubuntu 22.04 LTS | Ubuntu 22.04 / Rocky Linux 9 |
| System RAM | < 8 GB (Allocates 2-4GB) | 8 GB – 16 GB | 16 GB+ |
| GPU / Hardware | CPU-only | CPU-only | NVIDIA GPU (CUDA-capable, ≥8GB VRAM) |
| Active Mode | FAISS + TF-IDF index only | ONNX Quantized CPU Blend | PyTorch Native GPU Ensemble |
| Docker Engine | Required for Compose | Required for Compose | Required (NVIDIA Container Toolkit) |
| Redis Cache | Optional | Recommended | Required (Stream buffer ingest) |
| Capability / Feature | APEX Recommendation Engine | Surprise Library | LensKit | RecBole |
|---|---|---|---|---|
| Serving Architecture | ✅ Hardware-Aware Tiers (GPU/ONNX/SIMD) | ❌ Batch only | ❌ Batch only | ❌ Research-only training |
| Causal Debiasing | ✅ Doubly Robust Estimator + IPS Weights | ❌ None | ❌ None | ❌ None |
| Real-time Updates | ✅ OnlineLearningCoordinator (SGD) | ❌ Retrain whole model | ❌ Retrain whole model | ❌ Batch training only |
| Hybrid Ensemble | ✅ 6 Models (Graph, ODE, Seq, Diffusion) | ❌ Single algorithms | ❌ Single algorithms | |
| Stripe Billing / SaaS | ✅ Integrated checkout & usage tracking | ❌ None | ❌ None | ❌ None |
| Differential Privacy | ✅ Laplace Noise Injection on Aggregates | ❌ None | ❌ None | ❌ None |
| Data Lakehouse | ✅ PySpark Medallion Delta Lake ETL | ❌ Local pandas/csv | ❌ Local pandas/csv | ❌ Local pandas/csv |
- Adaptive Compute Fallbacks: The startup routine (backend/serving/serving_tier.py) automatically profiles available memory and GPU hardware to map runtime execution into three optimized operational tiers:
- Tier 1 (GPU Ensembling): Standard PyTorch execution of the complete 6-model ensemble.
- Tier 2 (Quantized ONNX CPU): Converts sequential/deep models to quantized ONNX formats for low-latency CPU inference.
- Tier 3 (SIMD Vector Indexing): Deploys a lightweight, fast-retrieval index using in-memory vector indexes (
turbovecSIMD) and TF-IDF cache fallbacks for minimal memory footprints (<4GB RAM).
-
Doubly Robust (DR) Estimator: Counters natural selection and popularity bias (inflated blockbuster discovery) using Inverse Propensity Score (IPS) weighting and a direct reward predictor:
$$V_{DR}(\pi) = \frac{1}{n} \sum_{i=1}^n \left[ \hat{r}(x_i, a_i) + \frac{(r_i - \hat{r}(x_i, a_i)) \cdot \pi(a_i|x_i)}{p(a_i|x_i)} \right]$$ where
$\hat{r}$ is the direct reward prediction model,$r_i$ is observed feedback,$p(a_i|x_i)$ is logging policy propensity, and$\pi(a_i|x_i)$ is the target recommendation policy. -
Simplex Weight Selection: Simulates 200 random ensemble weight candidates on the Dirichlet 6-simplex to pick the combination optimizing the debiased DR metric.
- Event Coordinated Updates: Rating and click actions are written to a message store, where the
OnlineLearningCoordinatorpushes updates to models asynchronously, updating sequence history vectors and KAN weights instantly without full batch model rebuilds. - State Sync: Clickstream features are immediately compiled into the user behavior profile cache, keeping recommendations contextually relevant to the active browsing session.
sequenceDiagram
autonumber
actor User as User Client
participant API as FastAPI Gateway
participant Redis as Redis Cache
participant Coordinator as Online Learning Coordinator
participant Models as Ensemble Models (KAN/SASRec)
User->>API: POST /v1/events (Rating/Click)
API->>Redis: Ingest Clickstream Log (Append to Sequence)
API->>Coordinator: Trigger Event Signal (Async)
Note over API,Coordinator: Response 202 Accepted immediately returned to User
Coordinator->>Redis: Fetch User History Session Queue
Redis-->>Coordinator: User Session History Vector
Coordinator->>Models: Push online training inputs (mini-batch SGD)
Models-->>Models: Update KAN Spline weights & SASRec sequence state
Note over Coordinator,Models: Model weights hot-swapped in memory
-
$\epsilon$ -Differential Privacy ($\epsilon$ -DP): Implements calibrated Laplace noise injection during aggregation to protect sensitive user watch profiles and clickstreams from membership inference or database reconstruction attacks. - Fairness & Gini Metrics: Periodic evaluation computes Gini coefficients and KL-divergence over demographic recommendations to audit and prevent systemic catalog coverage bias.
These metrics record measured system response times across hardware profiles.
- Tier 1 (PyTorch GPU Ensemble):
~12.5msrecommendation latency (100 candidates scored) - Tier 2 (Quantized ONNX CPU):
~24.8msrecommendation latency (CPU INT8 quantized) - Tier 3 (FAISS/TF-IDF Index):
<4.2msretrieval latency (direct SIMD lookup, no deep networks) - Cold Boot Time:
~45s(Tier 1 model weights loading) vs~2.1s(Tier 3 light index loading)
- Tier 1 (GPU):
~1,200 req/sper node (EKS g4dn.xlarge instance) - Tier 2 (CPU):
~450 req/sper node (EKS c6i.xlarge instance) - Tier 3 (SIMD):
~3,500 req/sper node (low compute, raw indexing)
graph TB
subgraph Serving["Serving Path"]
U[UserRequest] --> API[FastAPI]
API --> TD[TierDetector\nbackend.serving]
TD -->|GPU + ≥16GB RAM| T1["Tier1: GPU / Full Ensemble\nLightGCN · Quantum · SASRec\nKAN · Hyperbolic · Diffusion"]
TD -->|No GPU + ≥8GB RAM| T2["Tier2: ONNX CPU\nQuantized Inference"]
TD -->|< 8GB RAM| T3["Tier3: FAISS + TF-IDF Only\nLow-Memory Mode"]
T1 --> RP[RetrievalPipeline\nbackend.pipeline]
T2 --> RP
T3 --> RP
RP --> RK[RankingPipeline\nbackend.pipeline]
RK --> RR[RerankingPipeline\nbackend.pipeline]
RR --> Resp[Response]
end
subgraph Retrieval["Retrieval Sources"]
FAISS[FAISS ANN Index] --> RP
TFIDF[TF-IDF Sparse Index] --> RP
KG[Knowledge Graph] --> RP
end
subgraph Ranking["Ranking Components — 6 Ensemble Models (DR-Optimized Weights)\nbackend.models"]
RK --> LGC[LightGCN\nweight 0.005]
RK --> QNN[Quantum-Fluid NeuralODE\nweight 0.010]
RK --> SAS[SASRec\nweight 0.659]
RK --> KAN2[KAN\nweight 0.298]
RK --> HYP[Hyperbolic\nweight 0.004]
RK --> DIF[Diffusion\nweight 0.024]
end
subgraph DataPipeline["Data Pipeline"]
TMDB[TMDB API] --> ETL[ETL Jobs]
Kaggle[Kaggle Dataset] --> ETL
ETL --> Bronze[Delta Lake Bronze\nRaw Ingestion]
Bronze --> Silver[Delta Lake Silver\nCleaned + Joined]
Silver --> Gold[Delta Lake Gold\nFeature Vectors]
Gold --> MT[Model Training\nPySpark + PyTorch]
MT --> Artifacts[Serving Artifacts\nFAISS + ONNX + Weights]
end
subgraph Compliance["Compliance & Fairness\nbackend.privacy · backend.metrics"]
DP[Differential Privacy\nLaplace/Gaussian ε-DP]
IPS[IPS Debiasing\nDoubly Robust weights]
FA[Fairness Auditor\nGini + KL divergence]
end
Serving -.-> Compliance
DataPipeline --> Artifacts
Artifacts --> Retrieval
Retrieval --> Serving
The production setup runs asynchronously across distinct scaling layers:
graph TD
Client[Client Browser] --> Route53[AWS Route 53]
Route53 --> Gateway[FastAPI API Gateway]
Gateway --> Cache[(Redis Session Store)]
Gateway --> DB[(PostgreSQL Database)]
Gateway -->|Enqueue Event| Queue[(Redis Event Queue)]
Queue --> Learner[Online Learning Coordinator]
Learner --> DB
subgraph Analytics ["Spark Lakehouse Compaction Layer"]
DB --> Spark[PySpark Medallion ETL]
Spark --> Delta[(Delta Lakehouse Storage)]
end
APEX is built on a production-grade data platform that processes both massive historical datasets and real-time interaction events utilizing distributed compute and transactional storage.
The data platform structures raw user interaction logs and movie metadata using a Medallion Architecture on top of Delta Lake:
- Bronze Layer (Raw Ingestion): Ingests raw CSV and JSON events (MovieLens, TMDB metadata) into append-only Delta tables with minimal schema enforcement.
- Silver Layer (Cleaned & Consolidated): Cleans data types, parses timestamps, applies custom data contracts (
etl/data_contracts.py), performs multi-way joins, and handles Slowly Changing Dimensions (SCD Type 2) inetl/scd.pyto preserve historical correctness. - Gold Layer (Feature Store): Compiles dense interaction history arrays, user clickstream vectors, and sparse TF-IDF/co-occurrence metrics. Gold tables are optimized for direct ML model consumption.
To handle live clicks and continuous user feedback without full database rebuilds:
- Ingests clickstream actions asynchronously through Redis streams buffering.
- Uses an
OnlineLearningCoordinatorto consume mini-batches and run online gradient descent (SGD) updates directly on the user's sequential state vectors. - Synchronizes the updated states with local in-memory vector indexes (
turbovecSIMD / FAISS) for immediate, sub-10ms relevance tuning.
The system design decisions are captured inside docs/ARCHITECTURE_DECISIONS.md. Here is a summary of the choices:
| Record | Decision | Context / Rationale | Business & Engineering Impact |
|---|---|---|---|
| ADR-001 | LightGCN Primary | Propagate higher-order collaborative signals in the user interaction graph. | Consistent HR@10 lift over standard matrix factorization. |
| ADR-002 | Quantum Neural ODE | User preferences drift continuously. Standard embeddings represent static averages. | Handles irregular time deltas seamlessly; solves genre fatigue. |
| ADR-003 | SASRec sequential | Current session context (last watched items) drives short-term click probability. | Causal transformer self-attention blends session-level signals. |
| ADR-004 | Zero-Weighted Models | KAN, Hyperbolic, and Diffusion retained at weight 0.00 for conditional activation. |
preserved codebase options to enable dynamic hot-reloading experiments. |
| ADR-005 | 3-Tier Compute | Heterogeneous dev, staging, and production environments require distinct configurations. | Auto-profiles hardware memory and GPU total VRAM at server boot. |
| ADR-006 | Pipeline Decompose | Monolithic recommender.py was too large to maintain, test, or update safely. |
Split into Retrieval / Ranking / Reranking. Setup tests time dropped to <5ms. |
| ADR-007 | Doubly Robust Weights | Hand-tuned weights are biased towards blockbusters and lack empirical backing. | Unbiased propensity scoring optimization. Simplex grid search lift: +4.3%. |
| ADR-008 | Online Coordinator | Interaction logs take 24h to write to Delta Lake, lagging session relevance. | Asynchronous mini-batch SGD updates spline weights and sequences instantly. |
| ADR-009 | DP Inference Noise | Watch history features are sensitive. Attackers could reverse-engineer profile items. | Laplace noise calibrated to user profile counts ensures mathematically bounded |
| ADR-010 | Uncertainty Gating | Extreme query out-of-distribution (OOD) causes ensemble scoring drift. | Variance gate fallbacks to TF-IDF when ensemble model confidence drops. |
For comprehensive training hyperparameters and offline benchmarks, see docs/MODEL_CARDS.md.
| Model | HR@10 | NDCG@10 | DR-Optimized Weight | Paradigm |
|---|---|---|---|---|
| Ensemble | 0.785 | 0.542 | — | Weighted blend |
| SASRec | 0.761 | 0.520 | 0.659 | Sequential Transformer |
| KAN | 0.694 | 0.439 | 0.298 | Kolmogorov-Arnold Network |
| LightGCN | 0.672 | 0.411 | 0.005 | Graph Collaborative Filtering |
| Diffusion | 0.521 | 0.309 | 0.024 | Generative Latent Diffusion |
| Quantum-Fluid | 0.583 | 0.354 | 0.010 | Neural ODE + Complex Embeddings |
| Hyperbolic | 0.498 | 0.287 | 0.004 | Poincaré Ball Manifold |
Note: Evaluation metrics are updated dynamically. Run the ablation evaluation script python scripts/run_ablation.py to regenerate results with fresh datasets.
Standard recommendations suffer from popularity bias—inflating scores for blockbusters at the expense of niche content. APEX integrates an Inverse Propensity Score (IPS) and a Doubly Robust (DR) Estimator to optimize ensemble weights.
A blockbuster movie
Suppose we want to evaluate a target recommendation policy
-
Popular Blockbuster: High logged propensity (
$p(a_1|x) = 0.8$ ). It receives a click ($r_1 = 1$ ), and the reward model predicts high relevance ($\hat{r}(x, a_1) = 0.9$ ).$$\text{DR Score}(a_1) = 0.9 + \frac{(1 - 0.9) \cdot 1.0}{0.8} = 0.9 + 0.125 = 1.025$$ -
Niche Indie: Low logged propensity (
$p(a_2|x) = 0.05$ ). It receives a click ($r_2 = 1$ ) because a user actively sought it out. The reward model predicted moderate relevance ($\hat{r}(x, a_2) = 0.5$ ).$$\text{DR Score}(a_2) = 0.5 + \frac{(1 - 0.5) \cdot 1.0}{0.05} = 0.5 + 10.0 = 10.500$$
Without propensity corrections, the blockbuster dominates. With DR-IPS, the Niche Indie receives a massive correction boost, reflecting its high true utility relative to its poor exposure in the training logs.
APEX applies heuristic boosts to candidate items to maintain topical diversity and user engagement:
| Factor | Boost Weight | Description |
|---|---|---|
| Franchise Match | +0.25 |
Boosts sequels or franchises (e.g. Avatar -> Avatar 2). |
| Director Match | +0.10 |
Stylistic consistency boost. |
| Same Era | +0.03 |
Boosts films released within 5 years of target. |
| Quality | +0.02 |
Vote rating confidence factor. |
| Genre Mismatch | -0.15 |
Penalizes candidates sharing zero genres with history. |
The Maximal Marginal Relevance (MMR) stage balances relevance (similarity to search query/user profile) against diversity (redundancy compared to items already recommended):
where
Movie-Recommendation-System/
├── .github/workflows/ # CI/CD Workflows
│ ├── ci.yml # Runs full backend python tests & frontend linting
│ ├── secrets-scan.yml # Checks repository for exposed API keys & credentials
│ └── serving-quality.yml # Verifies benchmark SLAs in automated environment
├── backend/ # FastAPI Application Layer
│ ├── main.py # Server entry point & middleware pipelines
│ ├── recommender.py # Main pipeline coordinator singleton
│ ├── pipeline_types.py # Stable dataclass definitions (CandidateItem, RankedItem)
│ ├── retrieval_pipeline.py # Stage 1: FAISS + TF-IDF + KG Retrieval
│ ├── ranking_pipeline.py # Stage 2: 6-Model Ensemble blending
│ ├── reranking_pipeline.py # Stage 3: MMR Diversity + Gini Fairness Auditor
│ ├── response_models.py # Pydantic schemas for JSON payloads
│ ├── router_deps.py # Shared API dependecy injectors
│ ├── api/ # Versioned API Routers
│ │ ├── auth_routes.py # JWT credential signups
│ │ ├── recommendation_routes.py # Core recommendation endpoints
│ │ ├── catalog_routes.py # Movie list browse endpoints
│ │ ├── billing_routes.py # Stripe subscription and usage endpoints
│ │ ├── admin_routes.py # Hot-reload ensemble weights controller
│ │ └── evaluation_routes.py # Live accuracy benchmark reporter
│ ├── models/ # Deep Recommendation Models
│ │ ├── sasrec.py # Sequential transformer model
│ │ ├── kan_ranker.py # Kolmogorov-Arnold tabular network
│ │ ├── lightgcn.py # Graph collaborative filtering network
│ │ ├── neural_ode_recommender.py# Quantum Fluid Neural ODE temporal model
│ │ ├── hyperbolic_recommender.py# Poincaré ball hierarchical model
│ │ └── diffusion_recommender.py # Generative latent diffusion model
│ ├── learning/ # Real-Time Online Learner
│ │ ├── online_learning_coordinator.py # coordinates mini-batch event updates
│ │ └── online_learner.py # Learner instances (SGD splines updates)
│ ├── metrics/ # Accuracy & Debiasing metrics
│ │ ├── evaluation.py # Offline HR@10 / NDCG@10 calculators
│ │ ├── debiased_metrics.py # Doubly Robust evaluation engine
│ │ └── recommendation_benchmark.py # Quality verification gates
│ ├── serving/ # Serve layers & hardware detectors
│ │ └── serving_tier.py # Memory & VRAM profiling auto-detector
│ └── privacy/ # Privacy preserving ML
│ └── privacy.py # Calibrated Laplace noise injection
├── docs/ # Deep Architectural & Operational Specs
│ ├── ARCHITECTURE_DECISIONS.md # Detail ADR records (ADR-001 through ADR-010)
│ └── MODEL_CARDS.md # Model lineage & hyperparameters logs
├── etl/ # Spark Data Pipelines
│ └── pyspark_medallion_pipeline.py# Bronze/Silver/Gold Delta Lakehouse compiler
├── frontend/ # Client-Side Application Layer
│ ├── src/ # React source tree
│ └── package.json # Node package configuration
├── scripts/ # Command Line Utilities
│ ├── run_ablation.py # Generate ABLATION_RESULTS.md offline report
│ ├── causal_debias_training.py # Compute DR-IPS ensemble weights from logs
│ └── rebuild_serving_artifacts.py # Compiles search vectors and builds local FAISS
└── tests/ # Complete Pytest Testing Suite (~59 files)
The following environment variables configure the runtime services. Create a .env file in the project root:
| Variable | Type | Default | Purpose |
|---|---|---|---|
TMDB_API_KEY |
string | — | TMDB API Key for metadata fetching (trailers, posters). |
JWT_SECRET_KEY |
string | — | JWT token verification key. Generate via openssl rand -hex 32. |
OPENROUTER_API_KEY |
string | — | API key for LLM explanations (OpenRouter). |
REDIS_URL |
string | redis://localhost:6379/0 |
Cache connection string for session clickstreams. |
DATABASE_URL |
string | sqlite:///./nova_db.sqlite3 |
SQLite/Postgres connection string. |
NOVA_SERVING_TIER |
string | — | Override serving tier. Valid values: tier1, tier2, tier3. |
Launches the complete service container stack (FastAPI backend + React frontend + Redis) in a single command:
git clone https://github.com/pavanbadempet/Movie-Recommendation-System.git
cd Movie-Recommendation-System
cp .env.example .env # Update TMDB_API_KEY & JWT secret key
docker compose up --build# Clone the repository
git clone https://github.com/pavanbadempet/Movie-Recommendation-System.git
cd Movie-Recommendation-System
# Set up python dependencies
python -m pip install -r requirements.txt
cp .env.example .env
# Build serving vector embeddings and FAISS indices
python scripts/rebuild_serving_artifacts.py
# Start FastAPI backend
uvicorn backend.main:app --host 127.0.0.1 --port 8000 --reload# Start React client
cd frontend
npm install
npm run dev| Service | Access URL |
|---|---|
| Cinema Portal | http://127.0.0.1:3000 |
| REST API Server | http://127.0.0.1:8000 |
| Interactive API Documentation | http://127.0.0.1:8000/docs |
APEX exposes the following version-controlled REST endpoints:
POST /v1/auth/register: Create user account.POST /v1/auth/token: Exchange credentials for access token (JWT).
GET /v1/recommendations/id/{movie_id}: Core ensemble collaborative recommendations.GET /v1/recommendations/visually-similar/{movie_id}: Image content recommendations via CLIP embeddings.GET /v1/recommendations/knowledge-graph/{movie_id}: Structured genre/crew relation recommendations.GET /v1/recommendations/user/{user_id}: Personalized sequence recommendations for active users.GET /v1/recommendations/id/{movie_id}/enriched: Fetch recommendations with rich metadata (trailer, casts, poster).GET /v1/search: Keyword string matching catalog search.GET /v1/search/ai: Semantic vector search query over titles and descriptions.
POST /v1/events: Ingests user actions (clicks, ratings) for real-time model updates.GET /v1/events/features: Query active real-time feature sequence cache.GET /v1/events/recommendation-analytics: Live user demographic interaction graphs statistics.
GET /v1/billing/plans: Fetch Stripe pricing tier plans.POST /v1/billing/checkout: Initiate Stripe Checkout redirection session.POST /v1/billing/portal: Fetch user subscription billing dashboard link.GET /v1/billing/usage: Report API consumer usage quota limits.
POST /v1/admin/reload-ensemble-weights: Pull fresh DR weights from disk without service restart.POST /v1/artifacts/reload: Rebuild serving indices in memory.GET /v1/artifacts/health: Verify SHA-256 artifacts checksum matching.
GET /v1/platform/status: Core platform health report.GET /v1/platform/readiness: Verify database connection availability.GET /v1/platform/slo: Logs request execution times against target latency SLAs.GET /v1/evaluation/offline-metrics: Query evaluated model AUC-ROC metrics.
APEX implements a structured Delta Lake Medallion Architecture using PySpark for scaling to 10M+ records. Clickstream raw logs are systematically cleaned, contextualized, and aggregated down to high-performance Gold serving layers:
erDiagram
BronzeRawRatings {
string userId
string movieId
string rating
string timestamp
string ingestionTime
}
SilverCleanedRatings {
int user_id PK
int movie_id PK
float rating
timestamp interaction_time
boolean is_processed
}
GoldUserFeatures {
int user_id PK
array sequence_history
vector user_embedding_768d
timestamp last_updated
}
GoldMovieFeatures {
int movie_id PK
vector movie_embedding_768d
string genres
float popularity_score
}
BronzeRawRatings ||--o{ SilverCleanedRatings : transforms
SilverCleanedRatings ||--|| GoldUserFeatures : updates
SilverCleanedRatings ||--|| GoldMovieFeatures : references
All tests must pass in CI before merging. We enforce strict regression gates for pull request approvals.
# Run the complete backend test suite
python -m pytest tests/ -v
# Run the frontend unit tests
npm --prefix frontend run test- Ensemble Serving: 6 PyTorch architectures blended in a unified engine.
- Causal Debiasing: DR-IPS training scripts to debias popularity.
- Real-Time SGD: OnlineLearningCoordinator for instant parameter updates.
- Dynamic Tiers: Automatic hardware profiling (GPU vs ONNX CPU).
- SaaS Billing: Integrated Stripe checkout session generation.
- Multi-Armed Bandits: Epsilon-greedy & Thompson sampling explorative re-ranking.
- Graph Neural Network Serving: Live DGL/PyG serving layers updates.
- LLM Re-ranking Integration: LLM local agent routing for conversational recommendations.
- A/B Testing Framework: In-app traffic partitioning routing config.
The model designs, estimators, and algorithms in this repository leverage research papers:
- SASRec Transformer: Self-Attentive Sequential Recommendation (Kang & McAuley, ICDM 2018).
- LightGCN: LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation (He et al., SIGIR 2020).
- Kolmogorov-Arnold Networks: KAN: Kolmogorov-Arnold Networks (Liu et al., 2024).
- Doubly Robust Estimation: Doubly Robust Policy Evaluation and Optimization (Dudík et al., Statistical Science 2014).
- Maximal Marginal Relevance (MMR): The Use of MMR in Summarization and Information Retrieval (Carbonell & Goldstein, SIGIR 1998).
- Neural ODEs: Neural Ordinary Differential Equations (Chen et al., NeurIPS 2018).
Click to expand Frequently Asked Questions
Q1: How does the 6-model ensemble combine predictions? The ensemble applies a weighted average to the predicted probabilities of each model (SASRec, KAN, LightGCN, Quantum, Hyperbolic, Diffusion). The weights are derived dynamically using the Doubly Robust estimator.
Q2: What happens if a machine doesn't have a GPU? APEX profiles the hardware at startup. If no CUDA device is present or RAM is under 8GB, it falls back to Tier 2 (quantized ONNX CPU models) or Tier 3 (FAISS index + sparse TF-IDF) to protect memory from overflow.
Q3: How does the real-time feedback loop update model weights?
Rating events are consumed asynchronously by the OnlineLearningCoordinator to update user session history vectors instantly. The KAN ranker weights are updated incrementally via mini-batch SGD.
Q4: How does Differential Privacy protect user watch history?
We apply calibrated Laplace noise to model gradient calculations and aggregate interaction vectors. This provides a mathematical guarantee (
Q5: What datasets are used for model training? APEX uses the TMDB Movie Dataset (v11) combined with public MovieLens ratings (over 1M+ user interactions).
Q6: What is a Kolmogorov-Arnold Network (KAN) model doing here? We use KAN for tabular feature ranking, replacing traditional MLPs. KAN uses learnable 1D B-spline activation functions on edges, achieving superior convergence rates and interpretability for collaborative signals.
Q7: Can I run this offline? Yes. Option B local development mode runs fully offline. The only cloud dependencies are TMDB metadata (for poster fetching) and OpenRouter (for recommendation explanations), both of which have local mock fallbacks.
Q8: How does the Quantum-Fluid Neural ODE model work? It models continuous-time collaborative filtering. It treats user interest evolution as a continuous neural ODE trajectory moving through a complex Hilbert space manifold.
Q9: How do I run a new ablation study?
Run python scripts/run_ablation.py --users 200 --candidates 100. The script will output per-model metrics and compile them to docs/ABLATION_RESULTS.md.
Q10: Why Poincaré ball manifolds (Hyperbolic Embeddings)? Hyperbolic spaces have exponential volume growth, making them mathematically optimal for embedding hierarchical structures like movie genre graphs without spatial distortion.
- FastAPI Framework Web Site — Web framework powering APEX's REST endpoints
- Sentence Transformers Library — Semantic representations for recommendations and search
- FAISS Vector Index Repository — Library for efficient similarity search of dense vectors
- Delta Lake Engine Documentation — Lakehouse storage layer for data pipelines
Contributions are welcome — bug fixes, model enhancements, pipelines, or test improvements.
Read CONTRIBUTING.md and CODE_OF_CONDUCT.md. Follow AGENTS.md — the canonical instruction file for all code changes.
python -m pytest tests/ -v
npm --prefix frontend run testMIT License — Copyright © 2026 Pavan Badempet. See LICENSE for details.
🔍 SEO Metadata, Search Keywords & Indexing Terms
- Causal Recommender Engine: Popularity bias mitigation, Doubly Robust (DR) estimation, Inverse Propensity Score (IPS) counterfactual weight selection.
- Deep Learning Architectures: Sequential Transformer (SASRec), Kolmogorov-Arnold Network (KAN) tabular ranking, Graph Collaborative Filtering (LightGCN), Poincaré ball manifolds (Hyperbolic Embeddings), Quantum-Fluid Neural ODEs, Generative Latent Diffusion models.
- Data Engineering & Lakehouse: PySpark medallion architecture (Bronze/Silver/Gold Delta Lake layers), ETL pipelines, real-time streaming feedback loop, FAISS similarity index, vector search.
- Low-Latency Serving: Hardware-aware compute fallbacks (GPU PyTorch, Quantized ONNX CPU, in-memory turbovec SIMD search).
open source movie recommendation system, causal debiasing counterfactual policy evaluation, sasrec transformer recommendation engine python, kan kolmogorov-arnold network recommendation, hyperbolic embeddings poincare ball graph, pyspark medallion delta lake pipeline, onnx runtime low latency cpu serving, turbovec rust simd vector database search, fairness audits gini coefficient recommender.