AI-Powered Code Review and Bug Prediction System

Final Year Project — Institute of Management Sciences (IMSciences), Peshawar
Authors: Muhammad Nasir Khan · Ahmed Khan Khisro · Umair Dost
Supervisor: Mr. Omar Bin Samin
Session: 2021–2025

Overview

This project presents a hybrid AI system that automates software code review and predicts bug-prone modules. It combines a Random Forest classifier (trained on the 61 software metrics retained from 87 raw metrics after correlation pruning) with an LSTM neural network (operating on tokenised source code) in a soft-voting ensemble. The ensemble achieves 87.1% accuracy and an AUC-ROC of 0.94 on a held-out test set of 2,812 labelled modules.

The system generates structured, actionable review suggestions with SHAP-based metric explanations and exposes its functionality through a FastAPI REST backend and an Angular 16 web dashboard.

Key Results

Model	Accuracy	Precision	Recall	F1-Score	AUC-ROC
Random Forest	83.4%	0.81	0.86	0.83	0.91
LSTM	81.7%	0.79	0.88	0.83	0.90
Ensemble (Ours)	87.1%	0.84	0.89	0.87	0.94

Compared to baselines:

System	Accuracy	F1-Score	AUC-ROC
SonarQube (rule-based)	71.2%	0.68	N/A
PMD (rule-based)	68.5%	0.65	N/A
Logistic Regression	76.3%	0.74	0.82
Proposed Ensemble	87.1%	0.87	0.94

Architecture

┌─────────────────────────────────────────────────────────┐
│                    Angular Frontend                      │
│         Code Submission · Results Dashboard · Reports    │
└────────────────────────┬────────────────────────────────┘
                         │ HTTP REST
┌────────────────────────▼────────────────────────────────┐
│                  FastAPI Backend                         │
│   /predict/metrics   ·   /predict/code   ·   /health    │
└──────────┬──────────────────────────┬───────────────────┘
           │                          │
  ┌────────▼────────┐       ┌─────────▼────────┐
  │  Random Forest  │       │   LSTM Network   │
  │  (61 metrics)   │       │ (token sequences) │
  └────────┬────────┘       └─────────┬────────┘
           │                          │
           └──────────┬───────────────┘
                      │
             ┌────────▼────────┐
             │ Soft Voting      │
             │ Ensemble (0.55/  │
             │ 0.45 weights)    │
             └────────┬────────┘
                      │
             ┌────────▼────────┐
             │ SHAP Explainer  │
             │ + Review Report │
             └─────────────────┘

Repository Structure

ai-code-review/
├── backend/                    # FastAPI REST API
│   ├── app/
│   │   ├── api/                # Route handlers
│   │   ├── models/             # Pydantic schemas
│   │   ├── services/           # ML inference logic
│   │   └── utils/              # Helpers (metrics extractor, tokenizer)
│   ├── tests/                  # Pytest unit & integration tests
│   └── requirements.txt
├── frontend/                   # Angular 16 dashboard
│   └── src/
├── ml/                         # Training & evaluation scripts
│   ├── notebooks/              # Jupyter EDA & training notebooks
│   └── scripts/                # CLI training scripts
├── data/
│   ├── raw/                    # GitHub mining output (gitignored)
│   └── processed/              # Feature matrices (gitignored)
├── configs/                    # Hyperparameter YAML configs
├── docs/                       # Thesis PDF and architecture diagrams
├── .github/workflows/          # CI/CD pipeline (GitHub Actions)
├── docker-compose.yml
└── README.md

Quick Start

Prerequisites

Python 3.10+
Node.js 18+
Docker & Docker Compose (optional)

1. Clone the Repository

git clone https://github.com/<your-username>/ai-code-review.git
cd ai-code-review

2. Backend Setup

cd backend
python -m venv venv
source venv/bin/activate          # Windows: venv\Scripts\activate
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000

API docs available at: http://localhost:8000/docs

3. Frontend Setup

cd frontend
npm install
ng serve

Dashboard available at: http://localhost:4200

4. Docker (Full Stack)

docker-compose up --build

API Reference

`POST /predict/metrics`

Accepts a JSON payload of extracted software metrics and returns a bug probability score.

Request:

{
  "loc": 245,
  "cyclomatic_complexity": 18,
  "cbo": 12,
  "wmc": 24,
  "lcom": 0.73,
  "rfc": 45,
  "fan_in": 6,
  "fan_out": 9,
  "num_authors": 4,
  "code_churn": 312
}

Response:

{
  "bug_probability": 0.87,
  "prediction": "BUGGY",
  "severity": "HIGH",
  "top_features": [
    {"feature": "cbo", "shap_value": 0.34},
    {"feature": "wmc", "shap_value": 0.28},
    {"feature": "loc", "shap_value": 0.19}
  ]
}
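A minimal client sketch for this endpoint, using only the Python standard library. It assumes the development server from the Backend Setup step is listening on http://localhost:8000; the function names `build_metrics_request` and `predict_from_metrics` are illustrative and not part of the repository.

```python
import json
import urllib.request

# Assumption: the uvicorn dev server from "Backend Setup" is running here.
BASE_URL = "http://localhost:8000"


def build_metrics_request(metrics: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /predict/metrics."""
    body = json.dumps(metrics).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict/metrics",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_from_metrics(metrics: dict) -> dict:
    """Send the request and decode the JSON response (needs a running backend)."""
    with urllib.request.urlopen(build_metrics_request(metrics), timeout=10) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without a live server.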

`POST /predict/code`

Accepts raw source code and returns LSTM-derived code smell classifications together with the ensemble prediction.

Request:

{
  "source_code": "def process_data(...):\n    ...",
  "language": "python"
}

Response:

{
  "bug_probability": 0.91,
  "prediction": "BUGGY",
  "severity": "HIGH",
  "code_smells": ["God Class", "Long Method", "Duplicate Code"],
  "review_suggestion": "This module exhibits high coupling (CBO = 18) and excessive method complexity (average WMC = 24). Refactoring is recommended: decompose into smaller single-responsibility classes and reduce inter-module dependencies."
}
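One way a caller might turn this response into a short plain-text summary is sketched below. The helper `summarise_review` is hypothetical, and treating `code_smells` and `review_suggestion` as optional fields is an assumption; the field names themselves are taken from the example response above.

```python
def summarise_review(response: dict) -> str:
    """Render a /predict/code response as a short plain-text summary."""
    # Assumption: code_smells and review_suggestion may be absent or empty.
    smells = ", ".join(response.get("code_smells", [])) or "none reported"
    lines = [
        f"{response['prediction']} "
        f"(p = {response['bug_probability']:.2f}, severity {response['severity']})",
        f"Code smells: {smells}",
    ]
    suggestion = response.get("review_suggestion")
    if suggestion:
        lines.append(f"Suggestion: {suggestion}")
    return "\n".join(lines)
```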

ML Pipeline

Data Collection

512 open-source GitHub repositories (Python & Java)
Selection criteria: ≥500 commits, issue tracker with bug labels, ≥1,000 LOC
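Bug labelling via the SZZ algorithm, which starts from bug-fixing commits (identified from commit messages) and traces them back to the changes that introduced the bug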
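The selection criteria above can be expressed as a simple filter. This is a sketch only: the `RepoStats` record and its field names are hypothetical, and the actual logic in `mine_repositories.py` may differ.

```python
from dataclasses import dataclass


@dataclass
class RepoStats:
    """Hypothetical summary of one candidate repository."""
    name: str
    commits: int
    lines_of_code: int
    has_bug_labels: bool  # issue tracker uses bug labels


def meets_selection_criteria(repo: RepoStats) -> bool:
    """Apply the three thresholds from the list above."""
    return repo.commits >= 500 and repo.lines_of_code >= 1000 and repo.has_bug_labels


def select_repositories(repos: list[RepoStats]) -> list[RepoStats]:
    """Keep only the repositories that satisfy every criterion."""
    return [r for r in repos if meets_selection_criteria(r)]
```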

Feature Engineering

87 raw metrics extracted per module → 61 features after correlation pruning
Tools: Radon (Python), CKJMExtended (Java)
Class imbalance handled with SMOTE (training split only)

Models

Component	Details
Random Forest	200 estimators, balanced class weight, 5-fold CV grid search
LSTM	Embedding(10k vocab, 128-dim) → LSTM(128) → LSTM(64) → Dense(64) → Sigmoid
Ensemble	Soft voting: RF×0.55 + LSTM×0.45, threshold=0.5
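The ensemble row can be read as a weighted average of the two models' bug probabilities followed by a threshold. The sketch below uses the weights and threshold from the table; the `CLEAN` label for the negative class and the choice of `>=` at the threshold are assumptions, since the source only shows `BUGGY` outputs.

```python
RF_WEIGHT = 0.55    # from the Models table
LSTM_WEIGHT = 0.45  # from the Models table
THRESHOLD = 0.5     # from the Models table


def soft_vote(p_rf: float, p_lstm: float) -> tuple[float, str]:
    """Combine the two bug probabilities by weighted average, then threshold."""
    for p in (p_rf, p_lstm):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    p_bug = RF_WEIGHT * p_rf + LSTM_WEIGHT * p_lstm
    # Assumption: ">=" at the boundary and "CLEAN" as the negative label.
    label = "BUGGY" if p_bug >= THRESHOLD else "CLEAN"
    return p_bug, label
```

Because the weights sum to 1, the combined score stays in [0, 1] and can be reported directly as `bug_probability`.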

To Train from Scratch

cd ml/scripts
python mine_repositories.py --config ../../configs/mining_config.yaml
python extract_features.py --input ../../data/raw --output ../../data/processed
python train_random_forest.py --config ../../configs/rf_config.yaml
python train_lstm.py --config ../../configs/lstm_config.yaml
python evaluate_ensemble.py

Testing

cd backend
pytest tests/ -v --cov=app --cov-report=html

Citation

If you use this work, please cite:

@misc{khan2025aireview,
  title     = {AI-Powered Code Review and Bug Prediction System},
  author    = {Muhammad Nasir Khan and Ahmed Khan Khisro and Umair Dost},
  year      = {2025},
  note      = {BCS (Hons.) Final Year Project, Institute of Management Sciences, Peshawar}
}

License

This project is licensed under the MIT License. See LICENSE for details.
