Upload any CSV or Excel dataset, explore it, clean it, train 30+ machine learning models, compare results, and download everything — PDF report, cleaned dataset, session metadata, and trained .pkl files. Built with Next.js 14 + Tailwind CSS on the frontend and FastAPI on the backend.
- Dataset Insights — shape, dtypes, describe, correlation heatmap, value counts per column
- Missing Values — per-column stats, smart cleaning options (fill mean/median/mode, drop rows/cols, normalize)
- 30+ ML Models — regression, classification, clustering; auto-recommended based on your dataset
- Training Progress — live modal with per-model elapsed time and estimated total time
- Results — metrics table, feature importance chart, confusion matrix (classification)
- Compare — side-by-side bar chart across all trained models
- Download — PDF report, CSV/XLSX dataset,
meta.jsonsession data, and a ZIP of all.pklmodel files
| Task | Models |
|---|---|
| Classification | Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, HistGradient Boosting, XGBoost, LightGBM, CatBoost, SVM (RBF/Linear), KNN, Naive Bayes, LDA, MLP, Stacking, AdaBoost, Bagging, Extra Trees |
| Regression | Linear, Ridge, Lasso, Elastic Net, Bayesian Ridge, Huber, Decision Tree, Random Forest, Gradient Boosting, XGBoost, LightGBM, CatBoost, SVR, KNN, MLP, Stacking |
| Clustering | KMeans, DBSCAN, Agglomerative, Spectral, BIRCH, OPTICS |
- Next.js 14 (App Router, TypeScript)
- Tailwind CSS — dark theme
- Recharts — charts and heatmaps
- FastAPI — modular REST API
- scikit-learn, XGBoost, LightGBM, CatBoost — ML engines
- pandas + pyarrow — data I/O and session storage
- ReportLab — PDF generation
- Uvicorn — ASGI server
MLInsights/
├── backend/
│ ├── main.py # FastAPI app entry point + CORS
│ ├── requirements.txt
│ ├── runtime.txt # Python version pin for Render
│ ├── render.yaml # Render deployment config
│ ├── routers/
│ │ ├── upload.py # Dataset upload + session creation
│ │ ├── insights.py # Shape, dtypes, describe, correlations
│ │ ├── cleaning.py # Missing value analysis + cleaning
│ │ ├── models.py # Model recommendations
│ │ ├── training.py # Model training + .pkl export
│ │ └── report.py # PDF, CSV/XLSX, meta.json, models ZIP
│ └── utils/
│ ├── session_store.py # File-based session management
│ ├── data_utils.py # Pandas helpers
│ ├── ml_models.py # Model registry (30+ models)
│ └── report_gen.py # PDF builder
└── frontend/
├── app/
│ ├── api/download/ # Next.js proxy route (fixes cross-origin filename)
│ └── page.tsx # Main single-page app
├── components/
│ ├── FileUpload.tsx
│ ├── DataInsights.tsx
│ ├── DataCleaning.tsx
│ ├── ModelSelection.tsx # Model cards + training modal
│ ├── TrainingResults.tsx
│ └── ReportDownload.tsx # Download cards (PDF, dataset, JSON, ZIP)
└── lib/
├── api.ts # API fetch helpers
└── utils.ts
- Node.js 18+
- Python 3.11+
cd backend
python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate
pip install -r requirements.txt
# Copy env file and edit if needed
cp .env.example .env
uvicorn main:app --host 0.0.0.0 --port 8000 --reloadAPI runs at http://localhost:8000. Docs at http://localhost:8000/docs.
cd frontend
npm install
# Copy env file
cp .env.example .env.local
# Edit .env.local — set NEXT_PUBLIC_API_URL=http://localhost:8000
npm run devApp runs at http://localhost:3000.
POST /api/upload/ Upload CSV or XLSX → returns session_id
GET /api/insights/{session_id}/summary Shape, dtypes, describe, value counts
GET /api/insights/{session_id}/corr Correlation matrix
GET /api/cleaning/{session_id}/missing Missing value stats per column
POST /api/cleaning/{session_id}/clean Apply cleaning strategy
GET /api/models/{session_id}/recommend Auto-recommended models for the dataset
POST /api/training/{session_id}/train Train selected models → returns results
GET /api/report/{session_id}/pdf Download PDF report
GET /api/report/{session_id}/dataset Download cleaned CSV or XLSX
GET /api/report/{session_id}/meta Download meta.json session data
GET /api/report/{session_id}/models_zip Download ZIP of all .pkl model files
Frontend — create frontend/.env.local:
NEXT_PUBLIC_API_URL=http://localhost:8000Backend — create backend/.env:
ALLOWED_ORIGINS=http://localhost:3000
UPLOAD_DIR=backend/uploadsMIT