AI-Powered Road Accident Risk Prediction & GIS Analytics System
Predict road accident risk with machine learning. Visualize accident hotspots on interactive maps. Analyze traffic patterns in real time.
RoadSense AI is a production-grade machine learning platform that predicts road accident risk based on real-world factors:
- Weather conditions (clear, fog, rain)
- Traffic density (high, low, medium)
- Road type (highway, urban, rural)
- Time features (hour, day, peak hours, night)
- Accident severity indicators (vehicles, casualties)
Current Model: RandomForest with 95% inference reliability
Dataset: 20,000+ annotated accident records
API: REST endpoints with Swagger documentation
Deployment: Docker-ready, scalable
- Scikit-learn RandomForest model (100 estimators)
- One-hot encoding for categorical features
- Feature alignment guaranteed (zero mismatch errors)
- 19 engineered features from raw data
- Training/inference parity verified
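The one-hot encoding and feature alignment listed above can be illustrated with a small, dependency-free sketch. The category values come from the feature list in this README; the column naming (`feature_value`), the column order, and the helper names `training_columns` and `encode` are assumptions made for illustration, and `visibility` is left out because its allowed values are not documented here. With pandas, the same idea is usually written as `get_dummies` followed by `reindex(columns=..., fill_value=0)`.

```python
# Categorical vocabularies taken from the feature list in this README.
# "visibility" is omitted because its allowed values are not documented.
CATEGORIES = {
    "road_type": ["highway", "urban", "rural"],
    "weather": ["clear", "fog", "rain"],
    "traffic_density": ["high", "low", "medium"],
}

# Numeric fields as they appear in the example request payload.
NUMERIC = ["hour", "is_weekend", "temperature", "vehicles_involved",
           "casualties", "is_peak_hour", "is_night"]


def training_columns():
    """Return a fixed column order (hypothetical; the real order is set at training time)."""
    cols = list(NUMERIC)
    for name, values in CATEGORIES.items():
        cols.extend(f"{name}_{value}" for value in values)
    return cols


def encode(record, columns):
    """One-hot encode one record and align it to the training columns."""
    row = {name: float(record[name]) for name in NUMERIC}
    for name in CATEGORIES:
        row[f"{name}_{record[name]}"] = 1.0
    # Alignment: emit every training column in order; columns this record
    # did not set are filled with 0, and unknown columns are dropped.
    return [row.get(col, 0.0) for col in columns]


columns = training_columns()
vector = encode(
    {"hour": 14, "is_weekend": 0, "temperature": 28.5, "vehicles_involved": 2,
     "casualties": 1, "is_peak_hour": 1, "is_night": 0,
     "road_type": "highway", "weather": "clear", "traffic_density": "high"},
    columns,
)
print(dict(zip(columns, vector)))
```

Because the same `columns` list is reused at training and inference time, the model always sees vectors of the same length and order, which is the point of the alignment step.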
- POST /predict - Real-time accident risk classification
- GET /health - Uptime monitoring
- Swagger/OpenAPI documentation at /docs
- Input validation on all endpoints
- Structured responses with confidence scores
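To make the input validation concrete, here is a minimal sketch of a categorical check that produces the same message format as the 422 error example in this README. The project itself validates with Pydantic; this plain-Python version only illustrates the check, and the names `ALLOWED` and `validate_categoricals` are hypothetical.

```python
# Allowed values taken from the feature list in this README.
ALLOWED = {
    "road_type": ["highway", "urban", "rural"],
    "weather": ["clear", "fog", "rain"],
    "traffic_density": ["high", "low", "medium"],
}


def validate_categoricals(payload):
    """Return (status_code, body) for a request payload (hypothetical helper)."""
    for feature, allowed in ALLOWED.items():
        value = payload.get(feature)
        if value not in allowed:
            detail = (f"Invalid value '{value}' for feature '{feature}'. "
                      f"Allowed values: {sorted(allowed)}")
            return 422, {"detail": detail}
    return 200, {"detail": "ok"}


status, body = validate_categoricals(
    {"road_type": "highway", "weather": "rain_heavy", "traffic_density": "high"}
)
print(status, body["detail"])
```

Rejecting unknown categories up front keeps an unseen value from being silently encoded as all zeros further down the pipeline.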
- Interactive hotspot maps (Folium + Leaflet)
- City-wise accident analysis
- Weather pattern analysis
- Time-based trends (hourly, by day)
- Correlation heatmaps
- Modular preprocessing (training = inference)
- Comprehensive validation (422 errors, clear messages)
- Structured logging (all requests tracked)
- CORS support (frontend integration ready)
- Docker deployment (one-command setup)
- Python 3.11+
- Docker (optional)
- pip or conda
# Clone repository
git clone https://github.com/yourusername/roadsense-ai.git
cd roadsense-ai
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Start the API server
./run_api.sh
# Open browser to http://localhost:8000/docs
# Build and run
docker-compose up -d
# Check status
curl http://localhost:8000/health
# Open http://localhost:8000/docs in browser
curl -X POST "http://localhost:8000/predict" \
-H "Content-Type: application/json" \
-d '{
"hour": 14,
"is_weekend": 0,
"temperature": 28.5,
"vehicles_involved": 2,
"casualties": 1,
"is_peak_hour": 1,
"is_night": 0,
"road_type": "highway",
"weather": "clear",
"traffic_density": "high",
"visibility": "high"
}'
# Response:
# {
# "prediction": "LOW RISK",
# "risk_level": 0,
# "confidence": 0.61,
# "timestamp": "2026-05-20T..."
# }
GET /health
Check API and model status. Used for monitoring and uptime checks.
Response:
{
"status": "healthy",
"model_loaded": true,
"model_features": 19,
"timestamp": "2026-05-20T09:16:05.216435"
}
POST /predict
Predict accident risk based on input features.
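For callers who prefer Python over curl, the endpoint can be reached with the standard library alone. This is a sketch assuming the API is running locally on port 8000 as in the quick-start commands; the helper names `build_predict_request` and `predict` are hypothetical and not part of the project.

```python
import json
import urllib.request

# Local address used in the quick-start commands of this README.
API_URL = "http://localhost:8000/predict"


def build_predict_request(features, url=API_URL):
    """Build a POST request carrying `features` as a JSON body."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features, url=API_URL, timeout=5.0):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(features, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


sample = {
    "hour": 14, "is_weekend": 0, "temperature": 28.5,
    "vehicles_involved": 2, "casualties": 1,
    "is_peak_hour": 1, "is_night": 0,
    "road_type": "highway", "weather": "clear",
    "traffic_density": "high", "visibility": "high",
}
request = build_predict_request(sample)
# result = predict(sample)  # uncomment once the API server is running
```

A 422 validation error surfaces in `predict` as `urllib.error.HTTPError`, so callers that want the error detail should catch that exception and read its body.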
Request:
{
"hour": 14,
"is_weekend": 0,
"temperature": 28.5,
"vehicles_involved": 2,
"casualties": 1,
"is_peak_hour": 1,
"is_night": 0,
"road_type": "highway",
"weather": "clear",
"traffic_density": "high",
"visibility": "high"
}
Response:
{
"prediction": "LOW RISK",
"risk_level": 0,
"confidence": 0.61,
"timestamp": "2026-05-20T..."
}
Error Response (422):
{
"detail": "Invalid value 'rain_heavy' for feature 'weather'. Allowed values: ['clear', 'fog', 'rain']"
}
RoadSense AI
├── 📁 backend/
│   ├── main.py
│   ├── config.py
│   └── utils/
│       └── preprocessing.py
│
├── 📁 notebooks/
│   ├── 01_data_collection.ipynb
│   ├── 02_data_cleaning.ipynb
│   ├── 03_eda_visualization.ipynb
│   └── 04_ml_model.ipynb
│
├── 📁 datasets/
│   ├── raw/
│   └── processed/
│
├── 📁 models/
│   └── accident_risk_model.pkl
│
├── 📁 frontend/
│
├── Dockerfile
├── docker-compose.yml
└── requirements.txt
Raw Data (CSV, 20K rows)
↓
Data Cleaning (drop nulls, handle outliers)
↓
Feature Engineering (is_peak_hour, is_night, categoricals)
↓
Processed Dataset (master_accident_dataset.csv)
↓
One-Hot Encoding (19 features total)
↓
Train/Test Split (80/20)
↓
Model Training (RandomForest)
↓
Model Serialization (accident_risk_model.pkl)
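The 80/20 split step in the training pipeline above can be sketched without any dependencies. The function name, the seed, and the shuffling approach are assumptions for illustration; with scikit-learn the usual call is `train_test_split(..., test_size=0.2, random_state=...)`.

```python
import random


def train_test_split_indices(n_rows, test_fraction=0.2, seed=42):
    """Shuffle row indices reproducibly and cut them into train and test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split repeatable
    n_test = int(round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]


# 20,000 rows, as stated for the dataset in this README.
train_idx, test_idx = train_test_split_indices(20_000)
print(len(train_idx), len(test_idx))  # 16000 4000
```

Fixing the seed matters here because the reported test accuracy is only comparable across runs if the held-out rows stay the same.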
Inference Request (JSON)
↓
Input Validation (Pydantic)
↓
Feature Preprocessing (alignment to training)
↓
One-Hot Encoding (consistent with training)
↓
Model Prediction (RandomForest.predict)
↓
Response Formatting (prediction + confidence)
↓
Prediction Response (JSON)
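The response-formatting step at the end of the inference pipeline above might look like the following sketch. It assumes a two-class model whose per-class probabilities are ordered as (LOW RISK, HIGH RISK); that label mapping and the helper name `format_response` are assumptions, since this README only shows a `risk_level` of 0 for "LOW RISK".

```python
from datetime import datetime

# Assumed mapping; only "LOW RISK" (risk_level 0) appears in the documented examples.
LABELS = {0: "LOW RISK", 1: "HIGH RISK"}


def format_response(probabilities, now=None):
    """Turn per-class probabilities into the documented response shape."""
    # The predicted class is the index with the highest probability.
    risk_level = max(range(len(probabilities)), key=probabilities.__getitem__)
    now = now or datetime.now()
    return {
        "prediction": LABELS[risk_level],
        "risk_level": risk_level,
        "confidence": round(float(probabilities[risk_level]), 2),
        "timestamp": now.isoformat(),
    }


print(format_response([0.61, 0.39]))
```

Here `confidence` is the probability of the predicted class, so with two classes it never drops below 0.5.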
Model: RandomForestClassifier (100 estimators)
Dataset: 20,000 accident records
Train/Test Split: 80/20
Accuracy: 55.6% on test set
1. Temperature .............. 31.2%
2. Hour ..................... 22.7%
3. Casualties ............... 11.5%
4. Vehicles Involved ........ 10.5%
5. Is Weekend ............... 4.0%
Note: The model is tuned for production safety. It favors high recall on the HIGH RISK class, accepting more false positives in exchange for missing fewer high-risk cases.
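Since the note above is about recall rather than accuracy, a short worked example of the two metrics may help. The labels below are toy values for illustration only, not results from this project, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels (1 = HIGH RISK); illustration only, not project results.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)  # 0.6 0.75
```

In this toy case the classifier raises two false alarms (lower precision) but misses only one of four high-risk cases, which is the trade-off the note describes.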
- FastAPI - Modern Python web framework
- Pydantic - Data validation & type hints
- Scikit-learn - Machine learning
- Pandas - Data manipulation
- Joblib - Model serialization
- Pandas - Data analysis
- NumPy - Numerical computing
- Matplotlib - Static plots
- Seaborn - Statistical visualization
- Folium - Interactive maps
- Docker - Containerization
- Docker Compose - Multi-service orchestration
- Uvicorn - ASGI server
- Pytest - Test framework
- FastAPI TestClient - API testing
- React/Next.js - UI framework
- Tailwind CSS - Styling
- Recharts - Data visualization
- Leaflet - Interactive maps
This project is licensed under the MIT License - see the LICENSE file for details.
- Dataset: Indian Road Accident Records (2022-2025)
- Models: Scikit-learn RandomForest
- Visualizations: Folium, Matplotlib, Seaborn
- Framework: FastAPI, Pydantic
- 📧 Email: prashantsharma4840@outlook.com
- Frontend Dashboard - React-based prediction interface
- Live Deployment - Public API & dashboard
- Monitoring - Performance tracking & alerts
- Scale - Handle more data & predictions
Built by Prashant Sharma