A C++17 SIMT GPU architecture emulator modeling streaming multiprocessors, 32-lane-style warps, active masks, per-lane registers, scoreboarding, warp scheduling, memory latency hiding, global memory, L1 cache behavior, coalescing, shared memory bank conflicts, barriers, branch divergence/reconvergence, occupancy, benchmark analysis, and differential verification against a scalar reference interpreter.
Grid / CTAs
|
v
Streaming Multiprocessors
|
+-- Warp scheduler -> scoreboard -> INT/SFU/LSU pipelines
|
+-- Warps -> lanes -> per-lane registers/predicates/active masks
|
+-- L1 cache / coalescer / memory queue
|
+-- Shared memory banks / barriers / divergence stack
- Multi-SM launch model with configurable CTAs and block size
- SIMT warps with per-lane registers, predicates, and active masks
- Scoreboarded warp issue and selectable scheduling policies
- Branch divergence/reconvergence stack
- Global memory latency, L1 cache, and memory coalescing model
- Shared memory with bank conflict timing
- Barrier synchronization
- Occupancy report with limiting-resource guidance
- Warp timeline CSV generation and local static viewer
- Scalar reference interpreter and --diff mode
- Invariant checker and debug dumps
- Randomized differential testing and benchmark automation
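As an illustration of the memory coalescing item above, here is a minimal sketch of how a coalescer might count transactions for one warp-wide load. It is not taken from the emulator's source: the names `kWarpSize`, `kSegmentBytes`, `strided_addresses`, and `count_transactions`, and the 128-byte segment size, are assumptions made for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Assumed values for this sketch; the emulator's real configuration may differ.
constexpr std::size_t kWarpSize = 32;
constexpr std::uint64_t kSegmentBytes = 128;

// Builds per-lane byte addresses of the form base + lane * stride_bytes.
std::array<std::uint64_t, kWarpSize> strided_addresses(std::uint64_t base,
                                                       std::uint64_t stride_bytes) {
    std::array<std::uint64_t, kWarpSize> addrs{};
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        addrs[lane] = base + lane * stride_bytes;
    }
    return addrs;
}

// Counts the distinct aligned segments touched by the active lanes.
// In this simplified model, one segment equals one memory transaction.
std::size_t count_transactions(const std::array<std::uint64_t, kWarpSize>& lane_addrs,
                               std::uint32_t active_mask) {
    std::set<std::uint64_t> segments;
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        if ((active_mask >> lane) & 1u) {  // inactive lanes issue no request
            segments.insert(lane_addrs[lane] / kSegmentBytes);
        }
    }
    return segments.size();
}
```

Under these assumptions, a fully active warp reading consecutive 4-byte words from an aligned base touches a single segment, while a 128-byte stride touches one segment per lane.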
cmake -S . -B build-ninja -G Ninja -DENABLE_TESTS=ON -DENABLE_WARNINGS=ON
cmake --build build-ninja
Useful CMake options:
- ENABLE_WARNINGS=ON
- ENABLE_ASAN=ON on non-MSVC compilers
- ENABLE_TRACE=ON
- ENABLE_TESTS=ON
./build-ninja/simt_gpu examples/vector_add.gpuasm
./build-ninja/simt_gpu --trace examples/branch_divergence.gpuasm
./build-ninja/simt_gpu --timeline outputs/gpu_warp_timeline.csv examples/vector_add.gpuasm
./build-ninja/simt_gpu --occupancy --grid 8 --block 128 examples/vector_add.gpuasm
./build-ninja/simt_gpu --version
./build-ninja/simt_gpu --diff examples/vector_add.gpuasm
./build-ninja/simt_gpu --check-invariants examples/branch_divergence.gpuasm
./build-ninja/simt_gpu --dump-on-fail --diff examples/vector_add.gpuasm
python tools/run_random_gpu_diff_tests.py --count 20 --instructions 80
Diff mode compares final global memory and selected per-thread registers against a scalar reference interpreter.
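The comparison that diff mode performs can be sketched roughly as follows. This is an illustrative sketch only, not the emulator's implementation: `FinalState`, `first_mismatch`, and the word-addressed memory layout are hypothetical names and assumptions introduced for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical snapshot of architectural state after a kernel finishes.
struct FinalState {
    std::vector<std::uint32_t> global_memory;           // one entry per 32-bit word
    std::vector<std::vector<std::uint32_t>> registers;  // registers[thread][reg]
};

// Returns a description of the first difference between the emulator state and
// the reference state, or std::nullopt if they agree on global memory and on
// the selected registers of every thread.
std::optional<std::string> first_mismatch(const FinalState& emu,
                                          const FinalState& ref,
                                          const std::vector<std::size_t>& compared_regs) {
    if (emu.global_memory.size() != ref.global_memory.size()) {
        return std::string("global memory size differs");
    }
    for (std::size_t i = 0; i < emu.global_memory.size(); ++i) {
        if (emu.global_memory[i] != ref.global_memory[i]) {
            return "global memory word " + std::to_string(i);
        }
    }
    if (emu.registers.size() != ref.registers.size()) {
        return std::string("thread count differs");
    }
    for (std::size_t t = 0; t < emu.registers.size(); ++t) {
        for (std::size_t r : compared_regs) {
            if (r >= emu.registers[t].size() || r >= ref.registers[t].size()) {
                return "thread " + std::to_string(t) + " lacks register r" + std::to_string(r);
            }
            if (emu.registers[t][r] != ref.registers[t][r]) {
                return "thread " + std::to_string(t) + " register r" + std::to_string(r);
            }
        }
    }
    return std::nullopt;
}
```

Comparing only a selected set of registers keeps scratch registers, whose final values may legitimately differ between a SIMT execution and a scalar one, out of the verdict.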
ctest --test-dir build-ninja --output-on-failure
The suite covers ALU execution, warp execution, scoreboarding, global/shared memory, memory coalescing, divergence, scheduling, cache behavior, occupancy, barriers, timeline output, the reference interpreter, diff mode, invariant checking, and random-program smoke coverage.
python tools/run_gpu_benchmarks.py
python tools/analyze_coalescing.py
python tools/compare_schedulers.py
python tools/plot_gpu_results.py
Outputs are written to outputs/, including outputs/gpu_benchmark_results.csv, coalescing reports, scheduler comparison reports, plots, and timeline CSVs.
Open viewer/gpu_warp_timeline_viewer.html in a browser and select a generated CSV such as outputs/gpu_warp_timeline.csv or a benchmark timeline under outputs/timelines/. The viewer works locally with no external dependencies.
- include/, src/: emulator core
- tests/: CTest programs
- examples/: runnable kernels
- benchmarks/: analysis kernels
- tools/: random testing, benchmarks, coalescing, scheduler, plotting
- viewer/: static warp timeline viewer
- docs/: architecture, verification, benchmark, diagrams, limitations
- results/: generated example outputs
This is an educational architecture emulator, not a CUDA runtime or binary-compatible GPU. The ISA is a compact GPU assembly dialect, the reference interpreter focuses on architectural final state, and the timing model is intentionally readable rather than vendor-accurate.
- Add richer predication and memory consistency tests
- Improve scalar reference handling for complex shared-memory cross-thread idioms
- Add JSON trace export
- Add performance regression thresholds
- Model additional cache policies and scheduler heuristics
This project reinforced SIMT execution, warp scheduling, memory coalescing, branch divergence and reconvergence, shared memory bank conflicts, occupancy limits, and how GPUs hide memory latency with many eligible warps.
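Of the effects listed above, shared memory bank conflicts are easy to illustrate in isolation. The following is a minimal sketch, not the emulator's code: the names `kLanes`, `kBanks`, `kWordBytes`, `strided_words`, and `conflict_degree` are hypothetical, and it assumes 32 banks of 4-byte words with same-word accesses treated as a broadcast.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Assumed parameters for this sketch; the emulator's real values may differ.
constexpr std::size_t kLanes = 32;
constexpr std::size_t kBanks = 32;
constexpr std::uint32_t kWordBytes = 4;

// Builds per-lane byte addresses where lane i accesses word i * stride_words.
std::array<std::uint32_t, kLanes> strided_words(std::uint32_t stride_words) {
    std::array<std::uint32_t, kLanes> addrs{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        addrs[lane] = static_cast<std::uint32_t>(lane) * stride_words * kWordBytes;
    }
    return addrs;
}

// Returns the number of serialized passes for one warp-wide shared memory
// access: the largest number of distinct words that fall into a single bank.
// Lanes that read the same word share one pass (broadcast assumption).
std::size_t conflict_degree(const std::array<std::uint32_t, kLanes>& byte_addrs,
                            std::uint32_t active_mask) {
    std::array<std::set<std::uint32_t>, kBanks> words_per_bank;
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        if ((active_mask >> lane) & 1u) {
            const std::uint32_t word = byte_addrs[lane] / kWordBytes;
            words_per_bank[word % kBanks].insert(word);
        }
    }
    std::size_t worst = 0;
    for (const auto& words : words_per_bank) {
        worst = std::max(worst, words.size());
    }
    return worst;
}
```

With these assumptions, a unit stride is conflict-free (degree 1), a stride of 2 words gives a 2-way conflict, and a stride of 32 words serializes all 32 lanes onto one bank.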