BUGLEX: Semantic–Lexical Fusion for Performance Bug Classification

This repository provides the code and framework used to evaluate machine learning models for performance bug report classification, including feature engineering (TF-IDF and embeddings) and hybrid model training.
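As a rough illustration of what semantic–lexical fusion can look like, the sketch below concatenates an L2-normalised TF-IDF vector with an L2-normalised embedding vector for each report. This is an assumption about the general technique, not the repository's implementation: the function names (`tfidf_vectors`, `l2_normalise`, `fuse`) and the toy 3-dimensional embeddings are hypothetical, and the real code in src/ may fuse features differently.

```python
import math
from collections import Counter


def l2_normalise(vec):
    """Scale a vector to unit Euclidean length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for a list of tokenised documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(l2_normalise([tf[t] * idf[t] for t in vocab]))
    return vocab, vectors


def fuse(lexical, semantic):
    """Assumed fusion step: concatenate the normalised lexical and semantic views."""
    return l2_normalise(lexical) + l2_normalise(semantic)


# Two toy bug reports, already tokenised.
reports = [
    "training is slow on gpu".split(),
    "typo in docs".split(),
]
# Hypothetical sentence embeddings standing in for a real encoder's output.
embeddings = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]

vocab, lexical = tfidf_vectors(reports)
hybrid = [fuse(lex, sem) for lex, sem in zip(lexical, embeddings)]
print(len(vocab), len(hybrid[0]))
```

Normalising each view before concatenation keeps either feature family from dominating a linear classifier purely because of its scale.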

Setup

This project uses uv for fast dependency management.

macOS / Linux (Recommended)

# Install uv (if not already installed):
curl -LsSf https://astral.sh/uv/install.sh | sh

# Sync the local virtual environment:
uv sync

# Activate the virtual environment
source .venv/bin/activate  # bash/zsh

Windows

For Windows installation instructions, see the Installing uv page of the uv documentation (https://docs.astral.sh/uv/getting-started/installation/).

Without uv (pip)

python3.13 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Then replace "uv run python" with "python" in any command below, for example: python -m src.run_baseline --dataset caffe

Running Experiments

Commands can be run directly using uv:

# Run the baseline model on a single dataset
uv run python -m src.run_baseline --dataset caffe

# Run the full experiment suite across all datasets
uv run python -m src.run_experiments

# Run all experiments and generate comparison plots automatically
uv run python -m src.run_experiments --with-plots

# Run experiments with a specific preprocessing mode (e.g., lemmatize)
uv run python -m src.run_experiments --preprocessing-mode lemmatize

# Run all preprocessing ablations
uv run python -m src.run_experiments --all-preprocessing

Results

Mean macro-F1 ± standard deviation over 30 stratified runs, each using a 70/30 train/test split. Source: results/main_table_macro_f1.csv.

| Dataset | Baseline NB + TF-IDF | TF-IDF + LogReg | Embedding LogReg | Hybrid LogReg |
|---|---|---|---|---|
| Caffe | 0.623 ± 0.073 | 0.758 ± 0.061 | 0.744 ± 0.046 | 0.788 ± 0.063 |
| Incubator-MXNet | 0.503 ± 0.037 | 0.779 ± 0.031 | 0.832 ± 0.026 | 0.844 ± 0.030 |
| Keras | 0.530 ± 0.039 | 0.805 ± 0.034 | 0.845 ± 0.025 | 0.856 ± 0.023 |
| PyTorch | 0.529 ± 0.039 | 0.751 ± 0.037 | 0.796 ± 0.028 | 0.814 ± 0.027 |
| TensorFlow | 0.628 ± 0.027 | 0.820 ± 0.026 | 0.875 ± 0.016 | 0.882 ± 0.017 |

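The hybrid model achieves the highest mean macro-F1 on all five datasets. Its gain over the Naive Bayes + TF-IDF baseline ranges from 0.165 (Caffe) to 0.341 (Incubator-MXNet), and the improvements over the baseline are statistically significant.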
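The evaluation protocol behind these numbers (repeated stratified 70/30 splits, macro-F1 per run, then mean ± std) can be sketched with the standard library alone. This is a minimal illustration, not the repository's code: `stratified_split`, `macro_f1`, `evaluate`, and the `fit_predict` callback are hypothetical names, the toy labels are made up, and the use of the sample standard deviation is an assumption.

```python
import random
import statistics
from collections import defaultdict


def stratified_split(labels, test_fraction, rng):
    """Return (train_idx, test_idx) with roughly the same class ratio in both parts."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        # Keep at least one test example per class.
        n_test = max(1, round(len(idx) * test_fraction))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return statistics.mean(scores)


def evaluate(labels, fit_predict, n_runs=30, test_fraction=0.3, seed=0):
    """Repeat the stratified split n_runs times and return (mean, std) of macro-F1."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        train, test = stratified_split(labels, test_fraction, rng)
        y_pred = fit_predict(train, test)
        scores.append(macro_f1([labels[i] for i in test], y_pred))
    return statistics.mean(scores), statistics.stdev(scores)


# Toy labels: 1 = performance bug, 0 = other.
labels = [1] * 20 + [0] * 80


def majority(train, test):
    """Hypothetical stand-in classifier that always predicts the majority class."""
    return [0] * len(test)


mean_f1, std_f1 = evaluate(labels, majority)
print(f"{mean_f1:.3f} ± {std_f1:.3f}")
```

Macro-F1 weights both classes equally, so a classifier that ignores the minority class scores poorly even when its accuracy is high, which is why it suits imbalanced bug-report data.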

Generating Documentation & Reports

To compile the results into the final LaTeX PDF report:

uv run python -m src.tools.build_docs

This will automatically generate the figures and tables before compiling the PDF.

Repository Layout

.
├── datasets/              # Raw data used for models
├── docs/                  # Documentation and report source files
├── main.py                # Local runner
├── pyproject.toml         # Dependencies
├── results/               # Generated results
├── src/                   # Main source code
└── README.md              # This file
