A high-performance, multi-threaded Mandelbrot and Julia set explorer written in C99. This project utilizes an Engine-Centric Architecture targeting Native Desktop (CPU/AVX2), Web (WebAssembly/SIMD128), and hardware-accelerated GPU rendering (WebGL/Sokol GFX).
Live Web Demo: tiw302.github.io/mandelbrot-c/
| Overview & UX | Engineering & Math | Dev & Ops | Project Lifecycle |
|---|---|---|---|
| Quick Start | The Mathematics | Prerequisites | Roadmap |
| Introduction | Technical Architecture | Build & Installation | Contributing |
| Technical Preview | Platform Implementations | Configuration | Development Methodology & AI Assistance |
| Core Features | Performance Benchmarks | Running Tests | Author's Note |
| Interactive Controls | Project Structure | License |
# 1. Clone the repository
git clone https://github.com/tiw302/mandelbrot-c.git && cd mandelbrot-c
# 2. Build (interactive menu — pick CPU, GPU, or Web)
./build.sh
# 3. Run
./build_cpu/mandelbrot_cpu # CPU engine
./build_gpu/mandelbrot_gpu # GPU engine (requires OpenGL 3.3+)

For the Web build, see Build & Installation.
Mandelbrot-C is an exploratory project focused on the intersection of low-level C programming and high-performance graphics. This journey began as a deep dive into C99 to understand pointers, memory management, and hardware acceleration. What started as a simple SDL2 experiment has evolved into a production-grade fractal engine.
Throughout the development process, I have explored advanced topics including SIMD intrinsics, multi-threaded load balancing, WebAssembly porting, and shader-based 64-bit precision emulation.
- Hybrid Rendering Pipeline: Choice between optimized multi-threaded CPU rendering or hardware-accelerated GPU rendering.
- WASM Performance: Desktop-class performance in the browser via WebAssembly, SIMD128, and multi-threaded Web Workers.
- Persistent State Sharing: Share mathematical discoveries via URL parameters that encode the full view state. Clicking "Copy Link" generates and copies the URL on demand without constant address bar updates.
The URL format encodes the following parameters:
| Parameter | Format | Description |
|---|---|---|
| `re` / `im` | 14 decimal places | View center on the complex plane |
| `z` | Exponential (6 sig figs) | Zoom level |
| `it` | Integer | Iteration count |
| `p` | Integer | Palette index (0–5) |
| `j` | `1` if active | Julia mode flag |
| `jre` / `jim` | 14 decimal places | Julia set c-parameter (only present in Julia mode) |
Example: ?re=-0.74364388797764&im=0.13182590414575&z=1.234568e+4&it=500&p=0
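As a rough illustration of the format above, the query string could be assembled in C with `snprintf`. This is only a sketch: the `view_state_t` struct and `format_share_query` function are hypothetical names, not taken from the project, and C's `%e` prints at least two exponent digits (`e+04`), so the output differs slightly from the `e+4` shown in the example.

```c
#include <stdio.h>

/* Hypothetical view state; field names are illustrative only. */
typedef struct {
    double re, im;      /* view center on the complex plane */
    double zoom;        /* zoom level */
    int iterations;     /* iteration count */
    int palette;        /* palette index, 0-5 */
    int julia;          /* non-zero when Julia mode is active */
    double jre, jim;    /* Julia c-parameter */
} view_state_t;

/* Writes the query string into out; returns its length, or -1 if it does not fit. */
int format_share_query(const view_state_t *v, char *out, size_t cap) {
    int n = snprintf(out, cap, "?re=%.14f&im=%.14f&z=%.6e&it=%d&p=%d",
                     v->re, v->im, v->zoom, v->iterations, v->palette);
    if (n < 0 || (size_t)n >= cap) return -1;
    if (v->julia) {
        /* Julia parameters are appended only in Julia mode. */
        int m = snprintf(out + n, cap - (size_t)n,
                         "&j=1&jre=%.14f&jim=%.14f", v->jre, v->jim);
        if (m < 0 || (size_t)m >= cap - (size_t)n) return -1;
        n += m;
    }
    return n;
}
```

Fixed-width formatting keeps links reproducible: the same view always yields the same URL.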
- Hi-Lo Precision GPU Math: 64-bit precision emulation in GLSL shaders for deep-zoom exploration without pixelation artifacts.
- Interactive Tour Mode: Automated exploration with two independent tour systems. The Mandelbrot tour cycles through 10 hand-picked deep-zoom coordinates using a three-phase sequence of Pan (1.8s), Zoom In (4.0s), and Zoom Out (3.2s), with smoothstep easing between phases and a zoom depth of 6000x. Both tours pick the next target randomly without repeating the previous one. On desktop, the Julia tour interpolates between 12 preset c-parameter keyframes (3.0s move, 1.2s dwell). On web, the Julia tour uses a continuous circular orbit (`c = 0.7885 × e^(it)`) for smooth real-time animation.
- Professional Screenshot System: Deferred capture logic that ensures high-fidelity PNG exports by synchronizing with the GPU rendering cycle. Both desktop and web save screenshots as `mandelbrot_YYYYMMDD_HHMMSS.png`. On desktop, `stb_image_write` handles PNG encoding with automatic ARGB-to-RGBA conversion. On web, the browser generates and downloads the file directly from the canvas.
- Dynamic HUD: A redesigned, responsive Heads-Up Display showing 14-decimal precision coordinates.
| Action | Desktop Key | Web Key | Web UI / Touch |
|---|---|---|---|
| Zoom In | Left-Drag (Box) | Left-Drag (Box) / Scroll | Pinch-In |
| Pan | Right-Drag | Right-Drag | Two-Finger Drag |
| Undo | `Ctrl + Z` | `Ctrl + Z` | "Undo" Button |
| Screenshot | `S` | `S` | "Screenshot" Button |
| Mega Screenshot (8K) | `X` | - | - |
| Record Video | `V` | - | - |
| Tour Mode | `T` | `T` | "Tour" Button |
| GPU/CPU Toggle | `G` | `G` | "GPU" Button |
| Precision Toggle | `E` (CPU: 64/128-bit, GPU: 32/64-bit) | `E` | "32-bit / 64-bit" Button |
| Julia Toggle | `J` | `J` | "Julia" Button |
| Burning Ship Toggle | `B` | `B` | - |
| Palette Cycle | `P` | `P` | "Palette" Button |
| Iterations | `Up/Down` (`Shift` ×100) | `Up/Down` | `Iter+/Iter-` |
| Save Bookmark | `M` | - | - |
| Load Bookmark | `L` | - | - |
| Reset View | `R` | `R` | "Reset" Button |
| Copy Link | - | - | "Copy Link" Button |
| Quit | `Esc / Q` | - | - |
The Mandelbrot set is defined as the set of complex numbers $c$ for which the sequence $z_{n+1} = z_n^2 + c$, starting from $z_0 = 0$, remains bounded.
To maintain high frame rates in dense regions, the engine implements several mathematical optimizations:
- Main Cardioid Rejection: Points inside the main cardioid are detected using a vectorized check to skip expensive iterations.
- Period-2 Bulb Check: Similar to the cardioid, points within the largest circular bulb are filtered out early.
- Normalized Iteration Count: Prevents color banding by using a fractional iteration formula, resulting in smooth gradients.
The codebase strictly adheres to a modular architecture to ensure Separation of Concerns (SoC):
- Core [SSOT]: Pure mathematical definitions (`mandelbrot.c`, `julia.c`) are the Single Source of Truth, agnostic to rendering APIs.
- Engine Layer: Manages high-level rendering logic, thread-pools, and platform-agnostic graphics abstractions (via Sokol GFX).
- Application Layer: Platform-specific entry points (SDL2 for Desktop, Emscripten for Web) handle input and OS-level interactions.
The WASM implementation utilizes SharedArrayBuffer to enable real multi-threading in the browser. The built-in scripts/server.py is configured to handle the required COOP/COEP security headers for local development.
| Platform | Renderer | SIMD | Status |
|---|---|---|---|
| Linux | CPU / GPU (OpenGL) | AVX-512 / AVX2 | Supported |
| macOS | CPU / GPU (OpenGL) | AVX-512 / AVX2 | Supported |
| Windows | CPU / GPU (OpenGL) | AVX-512 / AVX2 | Supported |
| Web (Browser) | CPU / GPU (WebGL 2.0) | SIMD128 | Supported |
The native CPU engine is designed for maximum throughput on multi-core systems:
- Dynamic Load Balancing: Instead of static partitioning, the engine uses an Atomic Row Counter. Threads dynamically "claim" the next available row of pixels, ensuring that no CPU core sits idle while others are stuck rendering dense "black" regions of the fractal.
- AVX2 Vectorization: Using 256-bit YMM registers, the engine processes 4 double-precision values per SIMD instruction, iterating four pixels in parallel. This gives a theoretical 4x speedup over scalar C code; the benchmarks in this README measure roughly 3.8x (~115 fps versus ~30 fps at 1080p).
- Persistent Thread Pool: To avoid OS overhead, threads are spawned once at startup and managed via condition variables, ready to render new frames instantly as the user navigates. The thread count is capped at 64 regardless of core count. On WebAssembly, the engine always runs single-threaded due to platform constraints — multi-core Web Worker support is handled separately at the WASM subsystem level.
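The atomic row counter idea can be sketched as below. This is an illustration under stated assumptions, not the engine's code: it uses C11 `<stdatomic.h>` (the project targets C99 and may rely on compiler builtins instead), and `frame_job_t`, `claim_next_row`, and `render_rows` are hypothetical names. Thread creation is left out; each pool thread would simply call `render_rows` on the shared job.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-frame job shared by all worker threads. */
typedef struct {
    atomic_int next_row;   /* next unclaimed row */
    int width, height;
    uint32_t *pixels;      /* width * height ARGB pixels */
} frame_job_t;

/* Atomically claims the next row; returns -1 once the frame is exhausted. */
int claim_next_row(frame_job_t *job) {
    int row = atomic_fetch_add(&job->next_row, 1);
    return row < job->height ? row : -1;
}

/* Worker loop: keep claiming rows until none are left.
   Returns how many rows this caller rendered. */
int render_rows(frame_job_t *job) {
    int rows_done = 0, row;
    while ((row = claim_next_row(job)) >= 0) {
        for (int x = 0; x < job->width; ++x) {
            /* Placeholder shading; a real worker would run the fractal kernel here. */
            job->pixels[(size_t)row * (size_t)job->width + (size_t)x] =
                0xFF000000u | (uint32_t)row;
        }
        ++rows_done;
    }
    return rows_done;
}
```

Because a thread that finishes a cheap row immediately claims another, expensive rows inside the set do not leave other cores waiting, which is the point of dynamic over static partitioning.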
Bringing desktop-class performance to the browser required solving several engineering challenges:
- Multithreading via Web Workers: By leveraging Emscripten's pthreads support, the C engine runs across multiple Web Workers. These workers communicate via a SharedArrayBuffer, allowing them to share the same pixel memory space as the main thread.
- WASM-SIMD128: We utilize the modern WebAssembly SIMD proposal (128-bit) to process 2 double-precision points simultaneously, bridging the gap between browser and native performance.
- Security & Headers: To enable `SharedArrayBuffer`, the environment must be "Cross-Origin Isolated." We implemented a specialized Service Worker (`coi-serviceworker.js`) to automatically inject COOP and COEP headers, ensuring the engine runs on standard static hosting without server-side configuration.
The GPU path offloads all calculations to the graphics card for real-time smoothness. The shader is written in GLSL and compiled via Sokol's sokol-shdc annotation format (@vs, @fs, @program).
- Hi-Lo Double Precision Emulation: Each coordinate is passed to the shader as two `vec2` uniforms, `center_hi` and `center_lo`. The shader uses Dekker double-single arithmetic (`ds_add` + `ds_mul`) to perform full compensated addition and multiplication. This recovers ~48 mantissa bits from two 24-bit floats, achieving near-64-bit coordinate precision without hardware double support. Toggle between 32-bit and 64-bit mode at runtime with `E` on web.
- Uniform Interface: The fragment shader receives `center_hi`, `center_lo`, `zoom`, `iterations`, `aspect_ratio`, `palette_idx`, `julia_mode`, `julia_c_hi`, `julia_c_lo`, and `high_precision`, giving the CPU full control over every rendering parameter per frame.
- All 6 Palettes in Shader: The GLSL palette function exactly replicates the fractional iteration interpolation from `color.c`, ensuring GPU and CPU renders are visually identical when switching modes.
- Cardioid and Period-2 Bulb Rejection: The shader performs the same early-exit checks as the CPU scalar path, skipping the iteration loop entirely for points confirmed inside the main set.
- Julia Set Support: A `julia_mode` uniform switches the shader between Mandelbrot (z₀ = 0, c = pixel) and Julia (z₀ = pixel, c = fixed parameter passed as `julia_c_hi + julia_c_lo`).
- Correct Escape Radius: The shader uses `ESCAPE_RADIUS = 10.0` matching `config.h`, consistent with the CPU path.
- Sokol GFX Integration: The same shader and pipeline logic runs on Native OpenGL (Desktop) and WebGL 2.0 (Browser) via Sokol GFX.
- Deferred Readback: Screenshots in GPU mode utilize a "Deferred Capture" system, ensuring the pixel data is read back from the GPU memory only after the frame is fully validated.
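To make the Hi-Lo technique concrete, here is a CPU-side sketch of double-single arithmetic in C using `float` pairs. It mirrors the general Dekker/Knuth approach rather than the project's GLSL: `ds_add` and `ds_mul` are named after the shader functions mentioned above, but their bodies, the `ds_t` type, and the helper names are assumptions. It also assumes the compiler evaluates `float` expressions in single precision without fused multiply-add contraction.

```c
/* A value stored as an unevaluated sum hi + lo of two floats. */
typedef struct { float hi, lo; } ds_t;

ds_t ds_from_double(double x) {
    ds_t r;
    r.hi = (float)x;
    r.lo = (float)(x - (double)r.hi); /* what the first float could not hold */
    return r;
}

double ds_to_double(ds_t a) { return (double)a.hi + (double)a.lo; }

/* Knuth two-sum: s + err equals a + b exactly. */
static ds_t two_sum(float a, float b) {
    float s = a + b;
    float bb = s - a;
    float err = (a - (s - bb)) + (b - bb);
    return (ds_t){ s, err };
}

/* Renormalize when |a| >= |b|. */
static ds_t quick_two_sum(float a, float b) {
    float s = a + b;
    return (ds_t){ s, b - (s - a) };
}

/* Dekker split: 4097 = 2^12 + 1 cuts a 24-bit mantissa into two halves. */
static void split(float a, float *hi, float *lo) {
    float t = 4097.0f * a;
    *hi = t - (t - a);
    *lo = a - *hi;
}

/* Dekker two-product: p + err equals a * b exactly (barring overflow). */
static ds_t two_prod(float a, float b) {
    float p = a * b, ah, al, bh, bl;
    split(a, &ah, &al);
    split(b, &bh, &bl);
    float err = ((ah * bh - p) + ah * bl + al * bh) + al * bl;
    return (ds_t){ p, err };
}

ds_t ds_add(ds_t a, ds_t b) {
    ds_t s = two_sum(a.hi, b.hi);
    return quick_two_sum(s.hi, s.lo + (a.lo + b.lo));
}

ds_t ds_mul(ds_t a, ds_t b) {
    ds_t p = two_prod(a.hi, b.hi);
    return quick_two_sum(p.hi, p.lo + (a.hi * b.lo + a.lo * b.hi));
}
```

For example, adding `1e-10` to `1.0` is lost entirely in a single `float`, but survives in the `lo` component of the pair. That retained residue is what keeps neighboring pixels distinct at deep zoom.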
The following numbers were measured on a Linux system with an Intel CPU (AVX2-capable) and an integrated GPU. Results will vary by hardware.
| Mode | Resolution | Avg FPS | Throughput |
|---|---|---|---|
| 64-bit scalar (no SIMD) | 1920×1080 | ~30 fps | ~62 Mpx/s |
| 64-bit AVX2 (4× SIMD) | 1920×1080 | ~115 fps | ~239 Mpx/s |
| 128-bit `simd-f128` (AVX2 double-double) | 1920×1080 | ~16 fps | ~33 Mpx/s |
| 64-bit AVX2 | 3840×2160 (4K) | ~30 fps | ~249 Mpx/s |
Note: 128-bit mode uses software-emulated double-double arithmetic via AVX2. The ~7× slowdown versus 64-bit is expected and still significantly faster than a naive `__float128` implementation (~20–30× slower).
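The Throughput column follows directly from resolution and frame rate. As a quick sanity check, here is the arithmetic for the 1080p and 4K AVX2 rows, where $W$ and $H$ are the frame dimensions in pixels:

```latex
\text{Throughput} = \frac{W \times H \times \text{FPS}}{10^{6}}\ \text{Mpx/s}
\qquad
\frac{1920 \times 1080 \times 115}{10^{6}} \approx 238.5\ \text{Mpx/s}
\qquad
\frac{3840 \times 2160 \times 30}{10^{6}} \approx 248.8\ \text{Mpx/s}
```

The near-identical throughput at 1080p and 4K suggests the AVX2 path is bound by per-pixel compute rather than per-frame overhead.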
| Mode | Resolution | Avg FPS | Throughput |
|---|---|---|---|
| 32-bit shader (native float) | 1920×1080 | ~79 fps | ~163 Mpx/s |
| 64-bit emulation (Hi-Lo Dekker) | 1920×1080 | ~60 fps | ~124 Mpx/s |
Tip: To reproduce these numbers, build with `-DBUILD_CPU=ON` or `-DBUILD_GPU=ON` and run the benchmarks in `benchmarks/cpu/` or `benchmarks/gpu/` respectively.
Before building, ensure the following tools and libraries are installed on your system.
| Dependency | Version | Notes |
|---|---|---|
| GCC / Clang | GCC 9+ / Clang 10+ | C99 support required |
| CMake | 3.10+ | Build system |
| SDL2 | 2.0.14+ | Windowing and input |
| SDL2_ttf | 2.0+ | Font rendering for HUD |
| libGL / OpenGL | 3.3+ | Required for Sokol GFX |
Linux (Debian/Ubuntu):
sudo apt install cmake libsdl2-dev libsdl2-ttf-dev libgl1-mesa-dev

macOS (Homebrew):

brew install cmake sdl2 sdl2_ttf

| Dependency | Version | Notes |
|---|---|---|
| GCC / Clang | GCC 9+ / Clang 10+ | C99 support required |
| CMake | 3.10+ | Build system |
| libGL / OpenGL | 3.3+ | Required for Sokol GFX |
The GPU engine does not depend on SDL2 or SDL2_ttf.
Linux (Debian/Ubuntu):
sudo apt install cmake libgl1-mesa-dev

macOS (Homebrew):

brew install cmake

| Dependency | Version | Notes |
|---|---|---|
| Emscripten | 3.1.0+ | WASM compiler toolchain |
| Python | 3.x | Required for server.py |
Follow the Emscripten installation guide and ensure emcmake is available in your PATH.
Run ./build.sh without arguments for a numbered menu:
./build.sh

Pass a target directly to skip the menu:
| Command | Action |
|---|---|
| `./build.sh cpu` | Build CPU engine only |
| `./build.sh gpu` | Build GPU engine only |
| `./build.sh web` | Build web (WASM) engine only |
| `./build.sh all` | Build all three targets |
| `./build.sh clean` | Remove all build directories |
# Desktop — CPU engine
cmake -S . -B build_cpu -DBUILD_CPU=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build_cpu
./build_cpu/mandelbrot_cpu
# Desktop — GPU engine
cmake -S . -B build_gpu -DBUILD_GPU=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build_gpu
./build_gpu/mandelbrot_gpu
# Web (WASM)
emcmake cmake -S . -B build_web -DBUILD_WEB=ON
cmake --build build_web
# Output is automatically copied to the deploy/ folder

The web build requires specific HTTP security headers (COOP/COEP) to enable SharedArrayBuffer. Use the included server script:

python3 scripts/server.py

Then open http://localhost:8081 in your browser.
Optional arguments:
| Argument | Default | Description |
|---|---|---|
| `--port` | `8081` | Port to listen on |
| `--dir` | `web` | Directory to serve |
# Example: serve the deploy/ folder on port 9000
python3 scripts/server.py --dir deploy --port 9000

Rendering parameters can be tuned in include/config.h:
| Parameter | Default | Description |
|---|---|---|
| `WINDOW_WIDTH` / `WINDOW_HEIGHT` | `1024` / `768` | Initial window resolution |
| `DEFAULT_ITERATIONS` | `500` | Initial iteration depth |
| `MAX_ITERATIONS_LIMIT` | `10000` | Upper bound for runtime adjustments |
| `DEFAULT_THREAD_COUNT` | `4` | Number of parallel threads (0 = auto-detect from CPU cores, max 64) |
| `ESCAPE_RADIUS` | `10` | Mathematical threshold for divergence |
| `DEFAULT_PALETTE` | `0` | Starting color palette index (see table below) |
| `INITIAL_CENTER_RE` / `INITIAL_CENTER_IM` | `-0.5` / `0.0` | Initial view center (complex plane) |
| `INITIAL_ZOOM` | `3.0` | Initial zoom level |
| `MAX_HISTORY_SIZE` | `100` | Maximum undo history depth |
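For orientation, the defaults in the table could be expressed as preprocessor definitions along the following lines. This is a hypothetical excerpt: the parameter names and values come from the table, but whether `config.h` uses `#define`, and in what order or with what literal types, is an assumption.

```c
/* Hypothetical excerpt; the actual include/config.h may be laid out differently. */
#define WINDOW_WIDTH          1024
#define WINDOW_HEIGHT         768
#define DEFAULT_ITERATIONS    500
#define MAX_ITERATIONS_LIMIT  10000
#define DEFAULT_THREAD_COUNT  4      /* 0 = auto-detect from CPU cores, max 64 */
#define ESCAPE_RADIUS         10.0
#define DEFAULT_PALETTE       0      /* palette index 0-5 */
#define INITIAL_CENTER_RE     (-0.5)
#define INITIAL_CENTER_IM     0.0
#define INITIAL_ZOOM          3.0
#define MAX_HISTORY_SIZE      100
```

Changing any of these requires a rebuild, since they are compile-time constants in this sketch.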
The engine ships with 6 built-in palettes, selectable at runtime with `P` (or the "Palette" button on web), or set as default via `DEFAULT_PALETTE` in `config.h`:
| Index | Name | Character |
|---|---|---|
| `0` | Sine Wave | Smooth cycling colors using sine-wave interpolation |
| `1` | Grayscale | Pure luminance, iteration count mapped to brightness |
| `2` | Fire | Blue-to-white ramp, cool-to-hot gradient |
| `3` | Electric | Red-dominant, high-contrast neon feel |
| `4` | Ocean | Warm amber tones with subtle blue undertones |
| `5` | Inferno | Deep blue-to-white, high-zoom detail emphasis |
All palettes use fractional iteration interpolation to eliminate color banding at region boundaries.
The test suite covers core mathematical correctness, AVX2 vs scalar consistency, threading correctness, and I/O validation. Tests are integrated into the CMake build system and run via ctest.
cmake -S . -B build_cpu -DBUILD_CPU=ON
cmake --build build_cpu
ctest --test-dir build_cpu --output-on-failure

| Test | Description |
|---|---|
| `test_math` | Verifies Mandelbrot/Julia/Burning Ship escape math, cardioid/period-2 bulb rejection, and AVX2 vs scalar consistency within 1e-7 |
| `test_renderer` | Validates the persistent thread pool dispatch: ensures pixel output is correctly produced across all worker threads |
| `test_color` | Confirms all 6 palette functions produce valid ARGB values and gradient continuity |
| `test_bookmark` | Tests bookmark serialization and round-trip load/save correctness |
| `test_tour` | Validates tour phase state machine transitions and coordinate interpolation |
| `test_config` | Verifies `settings.txt` parsing and default fallback values |
AVX2 tests are compiled and run automatically if the host CPU supports it. On machines without AVX2, the scalar path is used and consistency tests are skipped.
.
├── include/ # Global configuration and platform headers
├── src/
│ ├── core/ # Pure Mathematical Engine (Single Source of Truth)
│ ├── engine/ # Platform-Agnostic Renderers, Tours, and Logic
│ └── app/ # Platform-Specific Entries (Desktop, Web)
├── shaders/ # GLSL shader source files
├── web/ # Web Frontend (HTML, CSS, JS)
├── assets/ # Shared Typography and Media
├── tests/ # Automated Unit Testing Suite
├── benchmarks/
│ ├── cpu/ # CPU benchmarks (math kernels, renderer throughput, I/O)
│ └── gpu/ # GPU benchmarks (Sokol shader throughput)
├── third_party/ # Vendored external libraries
│ ├── sokol/ # Sokol headers (GFX, App, GL, Time, Fontstash)
│ ├── stb/ # stb_image_write for PNG/TGA export
│ ├── fons/ # Fontstash for HUD text rendering
│ └── simd-f128/ # AVX2-accelerated 128-bit double-double precision
├── scripts/ # Utility scripts (local dev server, etc.)
├── deploy/ # Generated by web build — ready-to-serve package
├── CMakeLists.txt # Unified Cross-platform Build System
└── build.sh # Interactive TUI Build Wrapper
- Implement dynamic load balancing using atomic row-counters to maximize CPU utilization.
- Integrate a pre-calculated Look-Up Table (LUT) for color mapping.
- Implement smooth coloring algorithms using fractional iteration counts.
- Deploy hardware-specific vectorization (AVX2 for Desktop, SIMD128 for WebAssembly).
- Research and implement pure-shader fractal calculation for GPU rendering.
- Optimize Julia set calculation using hardware-specific vectorization.
- Add interactive runtime controls for iteration depth and palette switching.
- Implement automated "camera path" and "tour" modes.
- Connect HTML5 Frontend APIs to the web-engine for a responsive experience.
- Implement URL-based state recovery and deep-linking for sharing discoveries.
- Add mobile touch support (pinch-to-zoom and gesture-based panning).
- Establish a strict Engine-Centric Monorepo architecture.
- Implement a high-performance CMake build system.
- Expand unit testing coverage to ensure mathematical consistency (math, renderer, color, bookmark, tour, config).
- Implement automatic CPU core detection for dynamic thread pool allocation.
- Implement Hi-Lo 64-bit precision emulation for GPU shaders.
- Implement 128-bit software double-double precision via `simd-f128` (AVX2-accelerated) for deep CPU zoom.
- Build a comprehensive benchmark suite covering CPU math kernels, multi-threaded renderer throughput (64-bit and 128-bit), image I/O, and GPU shader throughput.
- Integrate automated performance benchmarks into all CI pipelines (Linux, macOS, Windows) with GitHub Step Summary reports.
- Add Enterprise CI workflows: Code Formatter Enforcement (`clang-format`), Memory Safety (Valgrind), and Static Security Analysis (CodeQL).
- Research and implement arbitrary-precision arithmetic for infinite zoom.
Contributions, bug reports, and suggestions are welcome. Areas of particular interest include memory safety, SIMD optimization, and GPGPU improvements.
To contribute:
- Open an issue to discuss bugs or proposed changes.
- Fork the repository and open a pull request with your changes.
- Descriptive commit messages and clear explanations are appreciated.
Building a high-performance fractal engine in C involves navigating complex engineering tradeoffs — from SIMD vectorization strategies and IEEE 754 floating-point precision limits, to lock-free thread pool design and cross-platform shader compatibility.
To achieve this level of stability and performance, this project was architected and rigorously verified in collaboration with Advanced Agentic AI. AI was specifically utilized to:
- Validate AVX2 intrinsic correctness and ensure scalar/SIMD result consistency within `1e-7` tolerance.
- Assist in designing the persistent thread pool architecture (condition variable signalling, atomic row counter load balancing).
- Verify Hi-Lo double-single arithmetic in GLSL shaders for 64-bit precision emulation without hardware double support.
- Automate the generation of robust cross-platform CI/CD pipelines (Linux, macOS, Windows, WASM) including memory safety checks and static analysis.
However, human agency remains at the core of this project. Every line of code generated or suggested was manually inspected, audited, and verified. The core architecture, algorithms, and mathematical implementation were human-planned. This hybrid approach — combining human architectural vision with AI-driven debugging and verification — allowed this project to reach a level of engineering quality well beyond what a solo developer could achieve alone.
I'm just a kid building projects as a hobby. Thank you for showing interest in my little library! It really means a lot to me. :)
This project is licensed under the MIT License - see the LICENSE file for details.





