ML-powered fantasy cricket team predictor for IPL. Trained on every ball-by-ball IPL delivery from CricSheet (2008–2026), with a calibrated picker that handles pre-toss / post-toss workflows, impact-sub rules, opposition matchups, and venue/form penalties.
Built for the IPL 2026 season. Every picker rule below was added incrementally after a real pick failed, so the picker learns from match-day mistakes rather than from backtest fitting.
| Layer | What |
|---|---|
| Data | 1,235+ IPL T20 matches, 27,900+ player-match rows, exact-spec Dream11 fantasy points calculated from raw deliveries |
| Model | LightGBM quantile regression (floor / expected / ceiling), holdout MAE ±34 pts (expected/P50 model) |
| Optimizer | PuLP ILP with role/team/credit/impact-sub constraints, Monte Carlo C/VC selection (5,000 sims) |
| UI | Next.js dashboard with per-pick reasoning, multiplier breakdown, captain probability chart, backtest history |
| Workflow | Pre-toss prediction with probable XI → single API call promotes to post-toss with confirmed XI + impact subs |
```bash
cd dream11_engine
./run.sh
```

On first run (~6 min total), the script will:
- Create a Python 3.12 venv and install all dependencies
- Download CricSheet's IPL JSON archive (~4 MB)
- Ingest 1,200+ matches into SQLite
- Apply role classification + manual overrides
- Train three LightGBM quantile models
- Install Next.js dependencies
- Start both the FastAPI backend and Next.js frontend
Then visit:
- UI → http://localhost:3000
- API docs → http://localhost:8000/docs
See `dream11_engine/README.md` for the full runbook, the picker rules learned during IPL 2026, and the API reference.
For every player in the pool, the model stacks adjustments:
```
LightGBM quantile models → raw_floor, raw_expected, raw_ceiling
        ↓
Multipliers applied in order:
  1. due-factor uplift              (consistent player below avg → bounce-back)
  2. recency penalty                (recent_5 vs season → penalize fade)
  3. venue-affinity penalty         (<0.65 career avg here → -15%)
  4. opposition-affinity penalty    (<0.65 career avg vs them → -15%)
  5. batting-position penalty       (#6+ non-finisher → conditional on collapse)
  6. wicket-rate boost              (bowlers ≥1.3 wkts/match → +6 to +12%)
  7. cold-start clamp               (n_history <15 → 0.50–0.85× cap)
  8. RAG signal multiplier          (injury / dropped / hot-streak / etc.)
  9. recent-form blender            (0.6·model + 0.4·clipped recent_5)
 10. H2H reversion                  (last_match_vs_opp ≥2× → −22%, capped for veterans)
 11. impact-sub EV discount         (named subs × 0.55, since they may not come on)
```
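A few of the stacked multipliers (steps 3, 4, 7, 9 and 11) can be sketched in Python. This is a minimal illustration under stated assumptions, not the repo's code: affinity is taken as the player's average at this venue (or against this opponent) divided by their career average; the cold-start factor is assumed to ramp linearly from 0.50× at zero matches to 0.85× at 14; and "clipped recent_5" is assumed to mean clamping recent_5 to ±50% of the current prediction before blending. `PlayerContext`, `cold_start_cap` and `adjusted_expected` are hypothetical names.

```python
from dataclasses import dataclass

AFFINITY_THRESHOLD = 0.65    # ratio below which the -15% penalty applies
AFFINITY_PENALTY = 0.85
COLD_START_MIN_HISTORY = 15
IMPACT_SUB_DISCOUNT = 0.55


@dataclass
class PlayerContext:
    expected: float          # raw LightGBM P50 prediction
    venue_affinity: float    # avg at this venue / career avg (assumed definition)
    opp_affinity: float      # avg vs this opponent / career avg (assumed definition)
    n_history: int           # matches in the training window
    recent_5: float          # mean fantasy points over the last 5 matches
    is_impact_sub: bool = False


def cold_start_cap(n_history: int) -> float:
    """Hypothetical linear ramp: 0.50x at 0 matches up to 0.85x at 14 matches."""
    if n_history >= COLD_START_MIN_HISTORY:
        return 1.0
    return 0.50 + 0.35 * n_history / (COLD_START_MIN_HISTORY - 1)


def adjusted_expected(p: PlayerContext) -> float:
    value = p.expected
    if p.venue_affinity < AFFINITY_THRESHOLD:       # step 3
        value *= AFFINITY_PENALTY
    if p.opp_affinity < AFFINITY_THRESHOLD:         # step 4
        value *= AFFINITY_PENALTY
    value *= cold_start_cap(p.n_history)            # step 7
    # Step 9: clip recent_5 to a +/-50% band around the model value (assumed band),
    # then blend 0.6 model + 0.4 recent form.
    lo, hi = sorted((0.5 * value, 1.5 * value))
    clipped = min(max(p.recent_5, lo), hi)
    value = 0.6 * value + 0.4 * clipped
    if p.is_impact_sub:                             # step 11
        value *= IMPACT_SUB_DISCOUNT
    return value
```

Applying the multipliers in a fixed order matters: the blend in step 9 clips against the already-penalized value, so a poor venue record also narrows how far a hot recent_5 can pull the prediction back up.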
```
        ↓
ILP optimizer (PuLP):
  maximize 0.55·E + 0.20·floor + 0.25·ceiling − 0.04·ownership − inexperience_pen
  subject to:
    - 11 players total
    - 1-4 WK · 3-6 BAT · 2-4 AR · 3-4 BOWL
    - ≤100 credits, max 7 per team, ≤1 impact sub per team (post-toss only)
```
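The objective and constraints can be illustrated without a solver. The minimal sketch below scores a fixed lineup and checks it against the same constraints; the real pipeline hands these terms to PuLP/CBC as an integer program instead. `Candidate`, `objective` and `is_feasible` are hypothetical names, and treating ownership as a percentage is an assumption. Per the pre-toss/post-toss rule, the one-impact-sub-per-team check only runs post-toss.

```python
from collections import Counter
from dataclasses import dataclass

ROLE_BOUNDS = {"WK": (1, 4), "BAT": (3, 6), "AR": (2, 4), "BOWL": (3, 4)}


@dataclass
class Candidate:
    name: str
    team: str
    role: str
    credits: float
    expected: float
    floor: float
    ceiling: float
    ownership: float               # projected ownership in percent (assumption)
    inexperience_pen: float = 0.0
    is_impact_sub: bool = False


def objective(lineup: list[Candidate]) -> float:
    """Sum of 0.55*E + 0.20*floor + 0.25*ceiling - 0.04*ownership - penalty."""
    return sum(
        0.55 * c.expected + 0.20 * c.floor + 0.25 * c.ceiling
        - 0.04 * c.ownership - c.inexperience_pen
        for c in lineup
    )


def is_feasible(lineup: list[Candidate], post_toss: bool) -> bool:
    if len(lineup) != 11:
        return False
    roles = Counter(c.role for c in lineup)
    if not set(roles) <= set(ROLE_BOUNDS):          # unknown role label
        return False
    if any(not lo <= roles.get(r, 0) <= hi for r, (lo, hi) in ROLE_BOUNDS.items()):
        return False
    if sum(c.credits for c in lineup) > 100:
        return False
    if max(Counter(c.team for c in lineup).values()) > 7:
        return False
    if post_toss:
        # Impact subs are only named once XIs are confirmed: at most one per team.
        subs = Counter(c.team for c in lineup if c.is_impact_sub)
        if subs and max(subs.values()) > 1:
            return False
    return True
```

In the ILP form, each of these checks becomes a linear constraint over binary pick variables, and the objective is the same per-player sum.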
```
        ↓
Monte Carlo C/VC (5,000 lognormal sims):
  Captain eligibility: n_history ≥ 15 AND recent_5 ≥ 35 AND not OOF
  VC eligibility:      same + consistency_cv < 0.95 (no feast-or-famine players for 1.5×)
```
Every pick traces back to specific numbers. Click any player row in the UI and you see five sections:
- PREDICTION — Floor / Expected / Ceiling / Ownership / Credits / P(top scorer)
- CAREER SIGNAL — EWMA / Recent-5 / Form gap / CV / Batting position / Wickets per match
- MATCHUP — Venue affinity / Opposition affinity / Last H2H score
- MULTIPLIERS APPLIED — Only shows non-1.0× adjustments
- REASONING — SHAP-style top contributing factors as readable bullets
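The CAREER SIGNAL and MULTIPLIERS APPLIED panels can be sketched from a player's chronological fantasy-point history. This is a minimal illustration, not `features.py`: the smoothing factor `alpha = 0.3`, population (rather than sample) standard deviation for CV, and defining form gap as recent_5 minus EWMA are all assumptions, and `ewma`, `career_signal` and `applied_multipliers` are hypothetical names.

```python
import statistics


def ewma(points: list[float], alpha: float = 0.3) -> float:
    """Exponentially weighted mean; later matches weigh more."""
    value = points[0]
    for x in points[1:]:
        value = alpha * x + (1 - alpha) * value
    return value


def career_signal(points: list[float], alpha: float = 0.3) -> dict[str, float]:
    """points: fantasy points per match, oldest first."""
    if not points:
        raise ValueError("need at least one match")
    smoothed = ewma(points, alpha)
    recent_5 = statistics.fmean(points[-5:])
    mean = statistics.fmean(points)
    # Coefficient of variation: spread relative to the average (feast-or-famine gauge).
    cv = statistics.pstdev(points) / mean if mean > 0 else float("inf")
    return {"ewma": smoothed, "recent_5": recent_5,
            "form_gap": recent_5 - smoothed, "cv": cv}


def applied_multipliers(multipliers: dict[str, float], tol: float = 1e-9) -> dict[str, float]:
    """Keep only adjustments that actually moved the prediction (non-1.0x)."""
    return {name: m for name, m in multipliers.items() if abs(m - 1.0) > tol}
```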
- Pre-toss vs post-toss as a first-class state. Fixtures are tagged `pre_toss` (probable XI, no impact subs) or `post_toss` (confirmed XI + named impact subs). The UI shows different banners, and the optimizer applies the max-one-impact-sub-per-team rule only post-toss.
- Impact subs get a 0.55× EV discount. They might not come on at all, so they only get picked if elite at their role.
- VC gate on consistency_cv. A feast-or-famine player (CV ≥ 0.95) at a 1.5× multiplier can cost you the contest.
- Veteran H2H cap. A 100-match veteran who scored 296 fantasy points in their last match against an opponent will likely revert, but the penalty is capped at 0.90× (vs 0.78× for players without that track record). Their career proves they're a match-winner.
- Manual role overrides persist. `backend/scripts/role_overrides.py` applies fixes after every `auto_update` run (e.g., wicket-keeper-batters who haven't recorded a stumping yet in our data window).
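The veteran H2H cap can be sketched as a single multiplier function. This minimal version assumes the 2× trigger compares `last_match_vs_opp` to the player's career average and that "veteran" means 100+ matches (taken from the example above); `h2h_multiplier` and the constant names are hypothetical.

```python
H2H_TRIGGER_RATIO = 2.0   # last H2H score at least 2x the career average
H2H_PENALTY = 0.78        # -22% reversion for most players
VETERAN_FLOOR = 0.90      # veterans are never cut below 0.90x
VETERAN_MATCHES = 100     # hypothetical veteran threshold


def h2h_multiplier(last_match_vs_opp: float, career_avg: float, n_history: int) -> float:
    """Reversion multiplier after an outsized score against this opponent."""
    if career_avg <= 0 or last_match_vs_opp < H2H_TRIGGER_RATIO * career_avg:
        return 1.0
    return VETERAN_FLOOR if n_history >= VETERAN_MATCHES else H2H_PENALTY
```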
Trained on 12,529 player-match rows from 2018–2026:
| Quantile | Holdout MAE |
|---|---|
| Floor (P10) | ±45.7 pts |
| Expected (P50) | ±34.5 pts |
| Ceiling (P90) | ±66.5 pts |
Captain hit rate (predicted top-scorer = actual top-scorer): 8.1%, vs 9% for a random pick. Direct top-scorer prediction is genuinely hard, but it is not the metric that wins Dream11. The Monte Carlo C/VC selector, which consumes the floor/expected/ceiling distribution, is what actually drives captain choice.
| Layer | Tech |
|---|---|
| Data | CricSheet (open data), SQLite |
| Features | NumPy, Pandas |
| ML | LightGBM (quantile regression) |
| Optimization | PuLP / CBC |
| Simulation | NumPy, SciPy (lognormal Monte Carlo) |
| API | FastAPI + Uvicorn |
| Frontend | Next.js 16, React 19, Recharts, TypeScript |
| Storage | SQLite (data), joblib (models) |
```
.
├── dream11_dashboard.jsx       # Original UI design reference (single-file React)
└── dream11_engine/
    ├── README.md               # Detailed run instructions + every picker rule
    ├── run.sh                  # One-command launcher
    ├── backend/
    │   ├── scoring.py          # Dream11 T20 fantasy-points formula (verbatim spec)
    │   ├── db.py               # SQLite schema
    │   ├── features.py         # EWMA, recent-5, due-factor, H2H, venue/opp
    │   ├── train.py            # LightGBM quantile training
    │   ├── predict.py          # Predict → optimize → MC captain → differentials
    │   ├── rag.py              # Signal store + classification
    │   ├── api.py              # FastAPI endpoints
    │   ├── live_fixtures.py    # Pre-toss / post-toss fixture registry
    │   └── scripts/
    │       ├── ingest_cricsheet.py
    │       ├── refine_roles.py
    │       ├── role_overrides.py
    │       └── auto_update.py
    └── frontend/               # Next.js 16 + Recharts dashboard
```
License: MIT