Infosys Springboard 6.0 Internship Project
Presented by: Famesh Katre
Mentor: Pranaya Ma'am
Duration: 8 Weeks
Internship Project: Dynamic IPL Player Auction Value Prediction using AI and Multi-source Data
CricketIQ is an AI-driven system that predicts IPL player auction prices by integrating multi-source data: batting/bowling performance statistics, news sentiment, historical auction prices, and player profile features.
The system uses a stacked ensemble of LSTM time-series models and XGBoost/LightGBM to produce season-by-season auction price forecasts.
Data Sources                  Pipeline                       Output
───────────────────────────────────────────────────────────────────────────
Cricsheet Ball-by-Ball  ──┐
IPL Auction History     ──┤──  Preprocessing ──► LSTM    ──┐
News Sentiment (VADER)  ──┤    Feature Eng.              ──┤ Ensemble ──► ₹Cr Prediction
Player Profiles         ──┘                      XGBoost ──┘
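
The ensemble step in the diagram can be illustrated with a small stacking sketch. This is not the project's actual code: it assumes only two base models (one sequence model, one tree model), uses a linear meta-learner without an intercept, and the function names `fit_meta_weights` and `blend` are hypothetical. The numbers are toy values, not project data.

```python
def fit_meta_weights(preds_a, preds_b, y):
    """Least-squares weights (w_a, w_b) minimising sum((w_a*a + w_b*b - y)**2).

    preds_a / preds_b are held-out predictions from two base models
    (for example an LSTM and XGBoost); y holds the true prices.
    """
    saa = sum(a * a for a in preds_a)
    sbb = sum(b * b for b in preds_b)
    sab = sum(a * b for a, b in zip(preds_a, preds_b))
    say = sum(a * t for a, t in zip(preds_a, y))
    sby = sum(b * t for b, t in zip(preds_b, y))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        # Base predictions are collinear: fall back to a plain average.
        return 0.5, 0.5
    # Cramer's rule on the 2x2 normal equations.
    w_a = (say * sbb - sby * sab) / det
    w_b = (sby * saa - say * sab) / det
    return w_a, w_b


def blend(pred_a, pred_b, weights):
    """Combine two base predictions with the fitted meta-weights."""
    w_a, w_b = weights
    return w_a * pred_a + w_b * pred_b


# Toy held-out predictions (illustrative values only, in ₹Cr).
lstm_preds = [1.0, 2.0, 3.0, 4.0]
tree_preds = [2.0, 1.0, 4.0, 3.0]
actual = [0.3 * a + 0.7 * b for a, b in zip(lstm_preds, tree_preds)]

weights = fit_meta_weights(lstm_preds, tree_preds, actual)
print(weights, blend(5.0, 6.0, weights))
```

Fitting the weights on held-out predictions rather than training predictions is what keeps the meta-learner from simply rewarding whichever base model overfits most.
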
CricketIQ/
│
├── data/
│   ├── raw/
│   │   ├── ipl_batting.csv          # Season batting stats (50 players)
│   │   ├── ipl_bowling.csv          # Season bowling stats
│   │   └── ipl_auction.csv          # Historical auction prices 2019–2024
│   ├── processed/
│   │   ├── cricket_feature_matrix.csv
│   │   ├── scaler.pkl
│   │   └── evaluation_report.csv
│   ├── models/
│   │   ├── cricket_lstm.pt
│   │   ├── cricket_xgboost.pkl
│   │   └── cricket_ensemble.pkl
│   └── sentiment/
│       └── ipl_sentiment.csv
│
├── src/
│   ├── data_collection/
│   │   ├── cricsheet_loader.py      # IPL ball-by-ball + auction data
│   │   └── cricket_sentiment.py     # VADER NLP + cricket lexicon
│   │
│   ├── preprocessing/
│   │   └── cricket_feature_engineer.py
│   │
│   ├── models/
│   │   └── cricket_models.py        # LSTM + XGBoost + Ensemble
│   │
│   └── visualization/
│       └── dashboard.py             # Streamlit dashboard
│
├── api/
│   └── main.py                      # FastAPI REST API
│
├── run_pipeline.py                  # Master pipeline
├── requirements.txt
└── README.md
# 1. Clone and setup
git clone https://github.com/fameshkatre87/CricketIQ.git
cd CricketIQ
# 2. Virtual environment
python -m venv venv
source venv/bin/activate # Mac/Linux
venv\Scripts\activate # Windows
# 3. Install packages
pip install -r requirements.txt
# 4. Run full pipeline
python run_pipeline.py
# 5. Launch dashboard
streamlit run src/visualization/dashboard.py
# 6. Start API (optional)
uvicorn api.main:app --reload --port 8000

| Source | Data | Records |
|---|---|---|
| Cricsheet (IPL) | Ball-by-ball - batting/bowling stats | 50 players × 6 seasons |
| IPL Auction DB | Historical prices 2019–2024 | 284 auction records |
| News VADER NLP | Cricket sentiment scores | 139 weekly records |
| Player Profiles | Role, nationality, experience | 50 real IPL players |
Batters: Virat Kohli, Rohit Sharma, David Warner, KL Rahul, Shubman Gill, Faf du Plessis, Jos Buttler, Suryakumar Yadav, Ruturaj Gaikwad, Yashasvi Jaiswal, MS Dhoni, AB de Villiers, Rishabh Pant, Sanju Samson, Ishan Kishan, Tilak Varma...
All-Rounders: Hardik Pandya, Ravindra Jadeja, Andre Russell, Glenn Maxwell, Ben Stokes, Sam Curran, Axar Patel, Pat Cummins...
Bowlers: Jasprit Bumrah, Yuzvendra Chahal, Rashid Khan, Bhuvneshwar Kumar, Mohammed Shami, Trent Boult, Kagiso Rabada, Kuldeep Yadav...
Batting: runs_per_match, impact_score, milestone_score (50s + 100s), consistency_score (avg × SR), boundary_pct, six_pct, dot_ball_risk
Bowling: bowling_impact, death_specialist flag, powerplay_bowler flag, wicket_taker_score, economy_score
All-Rounder: is_allrounder, allrounder_score, dual_threat_bonus
Experience: seasons_played, veteran_flag (≥8 seasons), young_prospect (≤3 seasons), peak_experience flag
Market: prev_auction_price, price_trend, log_prev_price, overseas_premium
Sentiment: compound_score, positive_pct, negative_pct, value_impact_cr
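
A few of the features listed above have formulas stated directly in the list, so they can be sketched in code. The snippet below is only an illustration: the input keys (`runs`, `matches`, `fifties`, `hundreds`, `batting_avg`, `strike_rate`, `prev_auction_price`) and the function name `engineer_features` are hypothetical, `runs_per_match` is assumed to be runs divided by matches, and `log_prev_price` is assumed to use `log1p`. The sample row contains made-up values.

```python
import math


def engineer_features(row):
    """Derive a handful of the listed features from one player-season row.

    `row` is a plain dict; its key names are assumptions for this sketch.
    """
    matches = max(row["matches"], 1)  # guard against division by zero
    seasons = row["seasons_played"]
    return {
        "runs_per_match": row["runs"] / matches,
        # milestone_score (50s + 100s)
        "milestone_score": row["fifties"] + row["hundreds"],
        # consistency_score (avg x SR)
        "consistency_score": row["batting_avg"] * row["strike_rate"],
        # veteran_flag (>= 8 seasons), young_prospect (<= 3 seasons)
        "veteran_flag": int(seasons >= 8),
        "young_prospect": int(seasons <= 3),
        # Assumption: log1p keeps a zero previous price well defined.
        "log_prev_price": math.log1p(row["prev_auction_price"]),
    }


# Made-up example row, not taken from the dataset.
sample = {
    "runs": 600, "matches": 15, "fifties": 5, "hundreds": 1,
    "batting_avg": 50.0, "strike_rate": 140.0,
    "seasons_played": 10, "prev_auction_price": 15.0,
}
print(engineer_features(sample))
```
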
| Model | RMSE (₹Cr) | MAE (₹Cr) | R² |
|---|---|---|---|
| LSTM (Attention) | 5.74 | 5.39 | 0.81 |
| XGBoost | 4.63 | 3.98 | 0.87 |
| LightGBM | 4.70 | 4.05 | 0.86 |
| Ensemble (Final) | 4.25 | 3.62 | 0.91 |
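
For reference, the three metrics in the table follow their standard definitions, which can be sketched in plain Python. The function name `regression_metrics` is hypothetical and the inputs below are toy values, not the project's predictions.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R² for two equal-length lists of prices."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Toy values in ₹Cr, for illustration only.
print(regression_metrics([2.0, 4.0, 6.0, 8.0], [3.0, 4.0, 5.0, 8.0]))
```

RMSE penalises large misses more heavily than MAE, so a model's RMSE is never smaller than its MAE on the same predictions.
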
GET / Health check
GET /players All 50 IPL players
GET /player/{name} Player stats + prediction
POST /predict Predict auction price
GET /predict/top Top value players for 2025
GET /models/comparison Model metrics
GET /features/importance XGBoost feature importance
GET /sentiment/{name} Player sentiment score
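
A minimal client sketch for the GET endpoints above, using only the standard library. It assumes the API is running locally on port 8000 (as in the `uvicorn` command) and that responses are JSON; the response fields are not documented here, so the helper just returns the parsed body. The helper names are hypothetical, and `POST /predict` is left out because its request schema is not specified in this README.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: local dev server


def player_url(name, base_url=BASE_URL):
    """URL for GET /player/{name}; spaces in names are percent-encoded."""
    return f"{base_url}/player/{quote(name)}"


def sentiment_url(name, base_url=BASE_URL):
    """URL for GET /sentiment/{name}."""
    return f"{base_url}/sentiment/{quote(name)}"


def get_json(url, timeout=10):
    """Fetch a URL and parse the body as JSON (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


print(player_url("Virat Kohli"))
# With the server running, something like this would fetch the data:
# data = get_json(player_url("Virat Kohli"))
```
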
Positive additions: century (+3.5), hat-trick (+3.5), six (+2.0), masterclass (+3.2), match-winning (+3.2), orange cap (+2.8)
Negative additions: duck (-2.8), golden duck (-3.2), injured (-2.8), ruled out (-3.0), poor form (-2.5), expensive (-2.0)
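
To show how these lexicon entries could influence a score, here is a simplified, dependency-free sketch. It is not VADER itself and not the project's `cricket_sentiment.py`: it only sums the valences of matched terms and applies the same normalisation VADER uses for its compound score, x / sqrt(x² + 15), while skipping VADER's rules for negation, intensifiers and punctuation. Matching multi-word phrases longest-first is a choice made for this sketch so that "golden duck" is not also counted as "duck". With the real `vaderSentiment` package, custom terms are typically merged into the analyzer's `lexicon` dictionary instead.

```python
import math
import re

# Valences copied from the lists above.
CRICKET_LEXICON = {
    "century": 3.5, "hat-trick": 3.5, "six": 2.0, "masterclass": 3.2,
    "match-winning": 3.2, "orange cap": 2.8,
    "duck": -2.8, "golden duck": -3.2, "injured": -2.8, "ruled out": -3.0,
    "poor form": -2.5, "expensive": -2.0,
}


def score_text(text, lexicon=CRICKET_LEXICON, alpha=15.0):
    """Sum matched term valences and squash the total into (-1, 1)."""
    remaining = text.lower()
    total = 0.0
    # Longest phrases first, so "golden duck" is consumed before "duck".
    for phrase in sorted(lexicon, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        remaining, hits = re.subn(pattern, " ", remaining)
        total += hits * lexicon[phrase]
    return total / math.sqrt(total * total + alpha)


print(score_text("A masterclass century to seal the chase"))
print(score_text("Out for a golden duck and later ruled out"))
```
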
| Week | Task | Status |
|---|---|---|
| 1 | IPL batting/bowling/auction data collection | ✅ |
| 2 | Feature engineering (62 features) | ✅ |
| 3–4 | Advanced features + sentiment integration | ✅ |
| 5 | LSTM with attention mechanism | ✅ |
| 6 | XGBoost + LightGBM + Ensemble | ✅ |
| 7 | Evaluation + comparison report | ✅ |
| 8 | Streamlit dashboard + API + docs | ✅ |
Built with ❤️ as an Internship Project - CricketIQ v1.0 🏏