C++ SIMT GPU Emulator

A C++17 emulator of a SIMT GPU architecture. It models streaming multiprocessors, 32-lane-style warps with active masks and per-lane registers, scoreboarding, warp scheduling, and memory latency hiding. The memory model covers global memory, L1 cache behavior, coalescing, and shared memory bank conflicts. The emulator also models barriers, branch divergence and reconvergence, and occupancy, and it ships with benchmark analysis and differential verification against a scalar reference interpreter.

Architecture

Grid / CTAs
    |
    v
Streaming Multiprocessors
    |
    +-- Warp scheduler -> scoreboard -> INT/SFU/LSU pipelines
    |
    +-- Warps -> lanes -> per-lane registers/predicates/active masks
    |
    +-- L1 cache / coalescer / memory queue
    |
    +-- Shared memory banks / barriers / divergence stack
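
To make the warp and lane layer of the diagram concrete, the following is a minimal sketch of a warp that executes one integer add under an active mask. The names `Warp`, `kWarpSize`, `kNumRegs`, and `exec_add` are hypothetical and are not taken from the emulator's sources, and the register count is an assumed value chosen only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sizes for illustration; the emulator's real limits may differ.
constexpr std::size_t kWarpSize = 32;
constexpr std::size_t kNumRegs = 8;

struct Warp {
    // regs[lane][r] holds register r of the given lane.
    std::array<std::array<std::uint32_t, kNumRegs>, kWarpSize> regs{};
    // Bit i set means lane i executes the current instruction.
    std::uint32_t active_mask = 0xFFFFFFFFu;
};

// Executes rd = ra + rb on every active lane.
// Inactive lanes keep their previous value of rd.
inline void exec_add(Warp& w, std::size_t rd, std::size_t ra, std::size_t rb) {
    assert(rd < kNumRegs && ra < kNumRegs && rb < kNumRegs);
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        if ((w.active_mask >> lane) & 1u) {
            w.regs[lane][rd] = w.regs[lane][ra] + w.regs[lane][rb];
        }
    }
}
```

With `active_mask = 0x0000FFFF`, calling `exec_add(w, 2, 0, 1)` updates register 2 only in lanes 0 through 15. This is the same mechanism a divergent branch relies on: each side of the branch runs with the subset of lanes that took it.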

Key Features

Multi-SM launch model with configurable CTAs and block size
SIMT warps with per-lane registers, predicates, and active masks
Scoreboarded warp issue and selectable scheduling policies
Branch divergence/reconvergence stack
Global memory latency, L1 cache, and memory coalescing model
Shared memory with bank conflict timing
Barrier synchronization
Occupancy report with limiting-resource guidance
Warp timeline CSV generation and local static viewer
Scalar reference interpreter and --diff mode
Invariant checker and debug dumps
Randomized differential testing and benchmark automation
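
As an illustration of the memory coalescing item above, here is a minimal sketch that counts how many aligned memory segments one warp-wide load touches. The function `count_transactions`, the 128-byte segment size, and the 4-byte access width are assumptions made for this example; the emulator's actual coalescer may use different parameters and rules.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

constexpr std::size_t kWarpSize = 32;
constexpr std::uint64_t kSegmentBytes = 128;  // assumed transaction size
constexpr std::uint64_t kAccessBytes = 4;     // assumed per-lane access width

// Returns the number of distinct aligned segments touched by the active lanes.
// Fewer segments means better coalescing.
inline std::size_t count_transactions(
    const std::array<std::uint64_t, kWarpSize>& addrs,
    std::uint32_t active_mask) {
    std::set<std::uint64_t> segments;
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        if ((active_mask >> lane) & 1u) {
            // Aligned accesses never straddle a segment boundary.
            assert(addrs[lane] % kAccessBytes == 0);
            segments.insert(addrs[lane] / kSegmentBytes);
        }
    }
    return segments.size();
}
```

Under these assumptions, a fully active warp reading consecutive 4-byte words starting at address 0 touches one segment, while the same warp reading with a 128-byte stride touches 32 segments.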

Build

cmake -S . -B build-ninja -G Ninja -DENABLE_TESTS=ON -DENABLE_WARNINGS=ON
cmake --build build-ninja

Useful CMake options:

ENABLE_WARNINGS=ON
ENABLE_ASAN=ON (non-MSVC compilers only)
ENABLE_TRACE=ON
ENABLE_TESTS=ON

Run

./build-ninja/simt_gpu examples/vector_add.gpuasm
./build-ninja/simt_gpu --trace examples/branch_divergence.gpuasm
./build-ninja/simt_gpu --timeline outputs/gpu_warp_timeline.csv examples/vector_add.gpuasm
./build-ninja/simt_gpu --occupancy --grid 8 --block 128 examples/vector_add.gpuasm
./build-ninja/simt_gpu --version
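
For readers unfamiliar with occupancy, the sketch below shows the kind of calculation an occupancy report with limiting-resource guidance typically involves. It is not the emulator's implementation: `SmLimits`, `KernelUse`, `estimate_occupancy`, and every numeric limit are hypothetical values chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

constexpr std::uint32_t kWarpSize = 32;

struct SmLimits {  // hypothetical per-SM resources
    std::uint32_t max_warps = 64;
    std::uint32_t max_ctas = 32;
    std::uint32_t registers = 65536;
    std::uint32_t shared_bytes = 49152;
};

struct KernelUse {
    std::uint32_t block_threads;
    std::uint32_t regs_per_thread;
    std::uint32_t shared_bytes_per_cta;
};

struct Occupancy {
    std::uint32_t resident_ctas;
    std::uint32_t resident_warps;
    std::string_view limiter;  // resource that bounds residency
};

inline Occupancy estimate_occupancy(const SmLimits& sm, const KernelUse& k) {
    assert(k.block_threads > 0);
    const std::uint32_t warps_per_cta = (k.block_threads + kWarpSize - 1) / kWarpSize;
    std::uint32_t ctas = sm.max_ctas;
    std::string_view limiter = "CTA slots";
    // Keep the smallest bound seen so far and remember which resource set it.
    auto tighten = [&](std::uint32_t bound, std::string_view name) {
        if (bound < ctas) { ctas = bound; limiter = name; }
    };
    tighten(sm.max_warps / warps_per_cta, "warp slots");
    if (k.regs_per_thread > 0) {
        tighten(sm.registers / (k.regs_per_thread * warps_per_cta * kWarpSize), "registers");
    }
    if (k.shared_bytes_per_cta > 0) {
        tighten(sm.shared_bytes / k.shared_bytes_per_cta, "shared memory");
    }
    return {ctas, ctas * warps_per_cta, limiter};
}
```

With these assumed limits, a 128-thread block using 32 registers per thread and no shared memory fits 16 CTAs (64 warps) per SM and is limited by warp slots. Raising the register count to 64 per thread drops residency to 8 CTAs and makes registers the limiter.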

Verification

./build-ninja/simt_gpu --diff examples/vector_add.gpuasm
./build-ninja/simt_gpu --check-invariants examples/branch_divergence.gpuasm
./build-ninja/simt_gpu --dump-on-fail --diff examples/vector_add.gpuasm
python tools/run_random_gpu_diff_tests.py --count 20 --instructions 80

Diff mode compares final global memory and selected per-thread registers against a scalar reference interpreter.
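
The comparison described above can be sketched as follows. This is an illustrative outline only: `FinalState`, `first_difference`, and the word-addressed memory layout are assumptions for the example and do not describe the emulator's actual data structures or the output format of `--diff`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct FinalState {
    std::vector<std::uint32_t> global_memory;             // one entry per word
    std::vector<std::vector<std::uint32_t>> thread_regs;  // thread_regs[tid][r]
};

// Returns a description of the first difference between the two final states,
// or std::nullopt when memory and all checked registers agree.
inline std::optional<std::string> first_difference(
    const FinalState& simt, const FinalState& ref,
    const std::vector<std::size_t>& checked_regs) {
    if (simt.global_memory.size() != ref.global_memory.size()) {
        return std::string("global memory size differs");
    }
    for (std::size_t i = 0; i < ref.global_memory.size(); ++i) {
        if (simt.global_memory[i] != ref.global_memory[i]) {
            return "global memory word " + std::to_string(i);
        }
    }
    if (simt.thread_regs.size() != ref.thread_regs.size()) {
        return std::string("thread count differs");
    }
    for (std::size_t tid = 0; tid < ref.thread_regs.size(); ++tid) {
        for (std::size_t r : checked_regs) {
            // A register index outside either file is a caller error.
            assert(r < simt.thread_regs[tid].size() && r < ref.thread_regs[tid].size());
            if (simt.thread_regs[tid][r] != ref.thread_regs[tid][r]) {
                return "thread " + std::to_string(tid) + " register r" + std::to_string(r);
            }
        }
    }
    return std::nullopt;
}
```

Restricting the check to `checked_regs` mirrors the "selected per-thread registers" wording: scratch registers whose final values are not architecturally meaningful can be left out of the comparison.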

Tests

ctest --test-dir build-ninja --output-on-failure

The suite covers ALU execution, warp execution, scoreboarding, global/shared memory, memory coalescing, divergence, scheduling, cache behavior, occupancy, barriers, timeline output, the reference interpreter, diff mode, invariant checking, and random-program smoke coverage.

Benchmarks And Analysis

python tools/run_gpu_benchmarks.py
python tools/analyze_coalescing.py
python tools/compare_schedulers.py
python tools/plot_gpu_results.py

Outputs are written to outputs/, including outputs/gpu_benchmark_results.csv, coalescing reports, scheduler comparison reports, plots, and timeline CSVs.

Visualization

Open viewer/gpu_warp_timeline_viewer.html in a browser and select a generated CSV such as outputs/gpu_warp_timeline.csv or a benchmark timeline under outputs/timelines/. The viewer works locally with no external dependencies.

Project Structure

include/, src/: emulator core
tests/: CTest programs
examples/: runnable kernels
benchmarks/: analysis kernels
tools/: scripts for random testing, benchmarks, coalescing analysis, scheduler comparison, and plotting
viewer/: static warp timeline viewer
docs/: architecture, verification, benchmark, diagrams, limitations
results/: generated example outputs

Limitations

This is an educational architecture emulator, not a CUDA runtime or binary-compatible GPU. The ISA is a compact GPU assembly dialect, the reference interpreter focuses on architectural final state, and the timing model is intentionally readable rather than vendor-accurate.
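
As an example of what a readable timing rule can look like, the sketch below estimates shared memory bank conflict cost as the number of serialized passes a warp-wide access needs. The bank count, bank width, broadcast rule, and the name `bank_conflict_passes` are all assumptions for illustration; they are not taken from the emulator and do not match any particular vendor's hardware.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

constexpr std::size_t kWarpSize = 32;
constexpr std::size_t kNumBanks = 32;         // assumed bank count
constexpr std::uint32_t kBankWidthBytes = 4;  // assumed bank width

// Returns the number of serialized passes for one warp-wide shared access:
// the largest number of distinct words requested from any single bank.
// Lanes that request the same word are treated as one broadcast (assumed).
inline std::size_t bank_conflict_passes(
    const std::array<std::uint32_t, kWarpSize>& byte_addrs,
    std::uint32_t active_mask) {
    std::array<std::set<std::uint32_t>, kNumBanks> words_per_bank;
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        if ((active_mask >> lane) & 1u) {
            assert(byte_addrs[lane] % kBankWidthBytes == 0);  // word-aligned access
            const std::uint32_t word = byte_addrs[lane] / kBankWidthBytes;
            words_per_bank[word % kNumBanks].insert(word);
        }
    }
    std::size_t passes = 0;
    for (const auto& words : words_per_bank) {
        passes = std::max(passes, words.size());
    }
    return passes;
}
```

Under these assumptions, a fully active warp accessing consecutive words needs one pass, a stride of two words needs two passes, and a stride of 32 words sends every lane to the same bank and needs 32 passes.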

Future Work

Add richer predication and memory consistency tests
Improve scalar reference handling for complex shared-memory cross-thread idioms
Add JSON trace export
Add performance regression thresholds
Model additional cache policies and scheduler heuristics

What I Learned

This project reinforced my understanding of SIMT execution, warp scheduling, memory coalescing, branch divergence and reconvergence, shared memory bank conflicts, and occupancy limits, and of how GPUs hide memory latency by keeping many warps eligible to issue.
