This project is a shader benchmark suite built on a custom SDL3 GPU engine. It utilizes the Zig build system and injects a custom GPU profiling fork of SDL3 into the engine's dependency tree.
The benchmark provides per-pass GPU nanosecond-precision timing data, allowing for analysis of various real-time blur filter implementations across Vulkan compute and graphics pipelines.
📁 thesis-gpu-benchmark/ <-- You are here
├── 📁 zig-out/
│ ├── 📁 bin/ <-- Output binaries are generated here
│ └── 📁 results/ <-- Output CSVs are generated here
├── 📁 shaders/
│ └── 📄 bloom.glsl <-- Blur filter implementations (multiple shaders)
├── 📁 src/
│ └── 📄 config.zon
│ └── 📄 timeline.zon
│ └── 📄 render.zon
├── 📄 build.zig
└── 📄 build.zig.zon
To successfully compile this benchmark, ensure the following dependencies are installed and available in your system's PATH:
- Zig Compiler v0.16.0: Building with an older zig version will result in build errors. Conversely, a more recent (e.g. nightly) version may work, but isn't recommended.
- Arch Linux / CachyOS:
sudo pacman -S zig - Other Linux OS: Download from ziglang.org:
cd ~ curl -O "https://ziglang.org/download/0.16.0/zig-$(uname -m)-linux-0.16.0.tar.xz" tar -xJf "zig-$(uname -m)-linux-0.16.0.tar.xz" # Run the compiler from anywhere ~/zig-$(uname -m)-linux-0.16.0/zig zen
- Arch Linux / CachyOS:
- Google Shaderc (
glslc): The build process requires shaderc to compile the GLSL source files into SPIR-V binaries. - Libraries:
libpngandfreetype2:- Arch Linux / CachyOS:
sudo pacman -S shaderc libpng freetype2 - Ubuntu / Debian:
sudo apt install glslc libpng-dev libfreetype-dev
- Arch Linux / CachyOS:
- Check the engine README if missing dependencies are encountered.
The custom build.zig handles compiling shaders and directing benchmark runs.
- Compile:
zig build- Outputs binaries tozig-out. - Run the demo:
zig build run- Outputs binaries tozig-outand spins up the interactive testing. See the engine README file for usage. - Run automated benchmarks:
zig build benchmark- Directs a fully automated, serialized profiling routine. It generates precise GPU timing data per each tag on thetimeline. By default, it profiles each tag configuration for 10 seconds, with a 5 second warm-up period. You can override the testing duration directly from the command line interface:# Run every configured variant sequentially for 30 seconds, with a 2 second warm-up zig build -Doptimize=ReleaseFast benchmark -- 30 2
To query the full list of build options, run zig build --help.
For accurate results, never benchmark with a debug binary.
Use -Doptimize:
zig build -Doptimize=ReleaseFast benchmarkBy default, the target architecture is the host CPU and its feature set.
To build for another architecture, such as generic x86_64, use -Dtarget:
# Generic x86_64
zig build -Doptimize=ReleaseFast -Dtarget=x86_64-linux-gnu
# Generic aarch64
zig build -Doptimize=ReleaseFast -Dtarget=aarch64-linux-gnuNote: Each zig build command overwrites the binaries generated previously.
Remember to output the correct binaries if you intend to export the build artifacts to another host,
i.e. don't forget to run zig build with the intended -Doptimize and -Dtarget options.
This subsection documents the binaries emitted by zig build.
If you intend to run the benchmark on the same host as the build process, you can ignore this subsection and just use the zig build commands documented above.
The benchmark binary requires two positional arguments and additionally has two optional arguments.
- Demo binary path (relative or absolute, use "./"-prefix if in the working directory).
- CSV results output path (relative or absolute).
- Run duration (in seconds, default = 10).
- Warm-up duration override (in seconds, default = 5).
cd zig-out/bin
# Run the benchmark for 30 seconds with a 2 second warm-up, output CSVs to /mnt/results
./benchmark ./demo /mnt/results 30 2The demo binary has two two optional flags.
--tags-override [tag1,...]: Fix the set of runtime tags to a comma-separated list, ignore the timeline.--duration-override [seconds]: Quit after running forseconds, ignore the timeline.
cd zig-out/bin
# Run the demo for 30 seconds, output CSV to stdout
./demo --duration-override 30When executing zig build benchmark, the stdout streams are captured into separate files
matching their tag identities inside zig-out/results/[tag].csv.
The CSV data uses the following schema:
| Column Index | Field Name | Data Type | Description |
|---|---|---|---|
| 0 | TargetP |
uint32 |
The pass output resolution scale (numerator). |
| 1 | TargetQ |
uint32 |
The pass output resolution scale (denominator). |
| 2 | PassDescription |
string |
A description of the primary shader that was run in the pass. |
| 3 | PassIndex |
integer |
Sequential index to the tag-culled src/render.zon pass list. See below. |
| 4 | StartTicks |
uint64 |
Raw counter ticks at the start of pass execution on the GPU. |
| 5 | EndTicks |
uint64 |
Raw counter ticks at the end of pass execution on the GPU. |
| 6 | DurationNanos |
float |
Pass start to end duration in nanoseconds. |
The initial rows contain per-run constants in comments, formatted # [Field]: [value]:
# WarmupDuration: The total warm-up duration in nanoseconds.# WarmupRows: The number of initial data rows that are warming up the GPU. Ignore this number of initial rows.# TimestampPeriod: The GPU counter tick period in nanoseconds.
The execution flow and parameters are managed entirely via static configuration declarations using the Zig Object Notation (.zon) format.
This file contains global engine and shader options:
width/height: Internal render resolution (1920x1080).blur_radius: Default blur kernel pixel radius.blur_sigma: Default standard deviation applied to Gaussian kernels.
This file is the source of truth for tags. It specifies the sequence of unique variant workloads targeted by the benchmarking suite.
.{
.tags = .{
.{ .name = "comp_naive", .duration = 10, .t = .seq },
.{ .name = "comp_2pass_separable", .duration = 10, .t = .seq },
.{ .name = "comp_2pass_separable_cache", .duration = 10, .t = .seq },
// ...
},
}This file declares the per-frame pass list (render graph). Pass execution can be conditionally culled via the require_all_tags array field:
- If specified, a pass will only execute if the active tag matches one of the required values.
- If omitted, the pass executes unconditionally on every frame (e.g., UI or final post processing).