hanad28/market-making-hackathon

QMML Market Making Hackathon — Chanko Trading

Result: 11th of 23 teams · +£46,742.40 net P&L · 46.7% return on a £100k starting bankroll

95 teams registered. 23 competed on the day. A live algorithmic trading competition run by the Queen Mary Machine Learning Society across 9 rounds.


What the problem actually required

Standard regression gets you a fair-value estimate. Market making requires something harder: a calibrated spread. Submit too tight and you become the market maker, forced to accept every trade at prices that may be far from the true value. Submit too wide and you are safe but uncompetitive. The core challenge was balancing prediction accuracy, uncertainty quantification, and risk management simultaneously — under time pressure, round by round, with real money on the line.


My approach

Pre-hackathon modelling (Market_Making_AI_Hackathon.ipynb)

Built a baseline modelling pipeline before the event:

  • Loaded and audited all 9 train/test dataset pairs
  • Benchmarked linear regression, ridge, lasso, and random forest using 5-fold CV on each stock independently
  • Assigned the best-performing model per stock rather than forcing one approach across all datasets
  • Tuned regularisation parameters via GridSearchCV
  • Converted RMSE into quote ranges: aggressive (±0.5σ), balanced (±1.0σ), and defensive (±1.5σ) bands
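The benchmarking and band-conversion steps above can be sketched roughly as follows. This is an illustrative reconstruction, not the notebook's code: the function names `benchmark_models` and `quote_bands`, the hyperparameter values, and the choice to use the cross-validated RMSE directly as σ are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score


def benchmark_models(X, y, cv=5, seed=0):
    """Return {model_name: cross-validated RMSE} for a single stock."""
    # Hyperparameters here are placeholders, not the tuned values.
    candidates = {
        "linear": LinearRegression(),
        "ridge": Ridge(alpha=1.0),
        "lasso": Lasso(alpha=0.1, max_iter=10_000),
        "random_forest": RandomForestRegressor(n_estimators=50, random_state=seed),
    }
    scores = {}
    for name, model in candidates.items():
        neg_mse = cross_val_score(
            model, X, y, cv=cv, scoring="neg_mean_squared_error"
        )
        scores[name] = float(np.sqrt(-neg_mse.mean()))
    return scores


def quote_bands(fair_value, rmse):
    """Turn a point prediction and its RMSE (used as sigma) into quote ranges."""
    widths = {"aggressive": 0.5, "balanced": 1.0, "defensive": 1.5}
    return {
        name: (fair_value - w * rmse, fair_value + w * rmse)
        for name, w in widths.items()
    }
```

For a fair value of 100 and an RMSE of 4, `quote_bands(100.0, 4.0)` gives a balanced range of (96.0, 104.0). The best model per stock would be the key with the lowest RMSE in the dictionary returned by `benchmark_models`.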

Hackathon day — combined final model (Hackathon_Combined_Bl.ipynb)

Merged the best components from three team notebooks into a single pipeline:

A. SVD-based NumPy Ridge with exact leave-one-out CV

Rather than approximating LOO error, the pipeline computes it exactly using the hat-matrix shortcut, so alpha is selected on the exact LOO error with no loop over folds. It sweeps 60 alpha values across a log-spaced grid, fits using the singular value decomposition (SVD) of the training matrix, and averages predictions across near-optimal alphas to reduce variance. For small datasets (fewer than 200 rows), it uses 300 bootstrap samples to get a more stable prediction and an uncertainty estimate.
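A minimal sketch of the hat-matrix shortcut, assuming ridge with an unpenalised intercept handled by centering. With the centered training matrix factored as U·diag(s)·Vᵀ, the leverage of row i at a given alpha is 1/n + Σⱼ Uᵢⱼ² sⱼ²/(sⱼ² + alpha), and the exact LOO residual is the ordinary residual divided by (1 − leverage). The function names, the grid range, and the 1% "near-optimal" tolerance are illustrative assumptions; the notebook may differ in all of these.

```python
import numpy as np


def ridge_loo_svd(X, y, alphas):
    """Exact leave-one-out RMSE of ridge regression for each alpha."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    x_mean, y_mean = X.mean(axis=0), y.mean()
    # One SVD of the centered matrix is reused for every alpha.
    U, s, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    Uty = U.T @ (y - y_mean)
    loo_rmse = np.empty(len(alphas))
    for k, alpha in enumerate(alphas):  # loop over alphas, never over folds
        shrink = s**2 / (s**2 + alpha)
        fitted = y_mean + U @ (shrink * Uty)
        leverage = 1.0 / n + (U**2) @ shrink  # diagonal of the hat matrix
        loo_resid = (y - fitted) / (1.0 - leverage)
        loo_rmse[k] = np.sqrt(np.mean(loo_resid**2))
    return loo_rmse, (s, Vt, Uty, x_mean, y_mean)


def predict_near_optimal(X_new, alphas, loo_rmse, parts, tol=0.01):
    """Average ridge predictions over alphas within `tol` of the best LOO RMSE."""
    s, Vt, Uty, x_mean, y_mean = parts
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    keep = loo_rmse <= loo_rmse.min() * (1.0 + tol)
    preds = []
    for alpha in np.asarray(alphas)[keep]:
        coef = Vt.T @ (s / (s**2 + alpha) * Uty)
        preds.append(y_mean + (X_new - x_mean) @ coef)
    return np.mean(preds, axis=0)


# 60 log-spaced alphas as described above; the range itself is an assumption.
ALPHAS = np.logspace(-3, 3, 60)
```

Typical use would be `loo, parts = ridge_loo_svd(X_train, y_train, ALPHAS)` followed by `predict_near_optimal(X_test, ALPHAS, loo, parts)`. The minimum of `loo` also serves as a per-stock error estimate for spread sizing.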

B. Size-gated sklearn model zoo

Dataset sizes varied widely across the 9 stocks. Rather than applying the same model everywhere, the pipeline gates model availability by number of training rows:

| Model | Minimum rows | Rationale |
|---|---|---|
| Ridge / ElasticNet | any | always available, strong on small data |
| GradientBoosting | ≥ 200 | captures non-linearity |
| HistGradientBoosting | ≥ 500 | faster GBM for mid-size data |
| ExtraTrees | ≥ 2,000 | low-bias ensemble for large datasets |

C. Ensemble and decision logic

If the runner-up model's CV RMSE is within 5% of the best model's, the two predictions are averaged. The final spread is then derived from the combined uncertainty estimate, and a leaderboard-aware decision engine adjusts aggressiveness based on current cash position and round number.
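A sketch of the top-two averaging rule described above. It assumes "within 5%" means the runner-up's CV RMSE is at most 1.05 times the best model's; `blend_top_two` and its arguments are hypothetical names. The decision engine is not sketched because its rules are not specified here.

```python
import numpy as np


def blend_top_two(cv_rmse, predictions, rel_tol=0.05):
    """Average the best and runner-up predictions when their CV RMSEs are close.

    cv_rmse:      {model_name: cross-validated RMSE}
    predictions:  {model_name: array of predictions}
    Returns (prediction array, list of model names used).
    """
    ranked = sorted(cv_rmse, key=cv_rmse.get)  # lowest RMSE first
    best = ranked[0]
    if len(ranked) > 1:
        runner_up = ranked[1]
        if cv_rmse[runner_up] <= cv_rmse[best] * (1.0 + rel_tol):
            blended = (
                np.asarray(predictions[best], dtype=float)
                + np.asarray(predictions[runner_up], dtype=float)
            ) / 2.0
            return blended, [best, runner_up]
    return np.asarray(predictions[best], dtype=float), [best]
```

For example, with RMSEs of 10.0 and 10.4 the two models are blended, while 10.0 versus 10.6 falls outside the 5% band and only the best model is used.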

Live tracking

Used a bespoke Excel control sheet to record fair-value predictions, submitted quotes, true values, and running P&L across all 9 rounds in real time.


Stack

Python · NumPy · Pandas · Scikit-learn · Google Colab

Models: Ridge · Lasso · ElasticNet · GradientBoosting · HistGradientBoosting · ExtraTrees


Repository structure

├── Market_Making_AI_Hackathon.ipynb   # Pre-event baseline and tuning pipeline
├── Hackathon_Combined_Bl.ipynb        # Final combined model used on the day
├── qmml_hackathon_playbook_v2.docx    # Round-by-round strategy notes
├── qmml_hackathon_control_sheet.xlsx  # Live P&L and quote tracking sheet
└── README.md

Key design decisions

Why per-stock model selection? The nine datasets varied significantly in size (29 to ~20,000 rows). A single model class is unlikely to be the best choice across that range. Lasso dominated on small, sparse datasets; tree-based ensembles won on larger ones.

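Why exact LOO over k-fold? On very small datasets, k-fold estimates depend heavily on how the rows happen to be split, so they vary a lot from run to run. For ridge, exact LOO via the hat-matrix shortcut costs about as much as a single fit per alpha, is deterministic, and is nearly unbiased for out-of-sample error, which makes it the better choice whenever the model is linear in the targets.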

Why bootstrap for small datasets? Three of the nine stocks had fewer than 200 training rows. Standard CV uncertainty estimates are unreliable at that scale. 300 bootstrap samples provide a more honest picture of prediction variance, which feeds directly into spread width.
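The bootstrap step can be sketched as below, assuming a plain sklearn `Ridge` refit on each resample. The helper name `bootstrap_prediction` and the fixed `alpha` are assumptions; the 300 resamples match the figure quoted above.

```python
import numpy as np
from sklearn.linear_model import Ridge


def bootstrap_prediction(X, y, x_new, n_boot=300, alpha=1.0, seed=0):
    """Mean and standard deviation of ridge predictions over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_new = np.atleast_2d(np.asarray(x_new, dtype=float))
    n = len(y)
    preds = np.empty((n_boot, x_new.shape[0]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        model = Ridge(alpha=alpha).fit(X[idx], y[idx])
        preds[b] = model.predict(x_new)
    return preds.mean(axis=0), preds.std(axis=0, ddof=1)
```

The returned standard deviation measures how much the fitted model moves between resamples. It does not include irreducible noise in the target, so a spread built from it would probably need to be combined with a residual-based error estimate such as the CV RMSE.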

Why ensemble the top two? When two models are within 5% of each other in cross-validated RMSE, neither has meaningfully won. Averaging their predictions usually reduces variance, provided their errors are not perfectly correlated, and costs little in bias, so it is a cheap improvement before submitting.

What separated the top teams? After speaking to the top finishers, we found that our model and strategy were nearly identical to theirs. The gap came down to position sizing in Round 1. They and we saw the same high-confidence signal, but we played it safe. When a model gives you a large-edge, high-confidence prediction, that is precisely when you size up. A lesson in translating prediction quality into trading decisions, not just quotes.
