feat: flamegraph CPU-scaling benchmark tooling #985

Merged: Manuthor merged 42 commits into develop from feat/flamegraph on Jun 26, 2026

serene-kitfisto-8899 (Contributor) commented Jun 4, 2026

Summary

This PR adds the profiling infrastructure to generate CPU-scaling flamegraphs for the Cosmian KMS, providing empirical proof that the server scales linearly with CPU count and that the dominant hotspot is cryptographic work (not locking, not I/O).

The --server-workers flag and the http_throughput Criterion bench landed in earlier commits on develop; this PR adds the flamegraph/benchmarking tooling on top.

Changes

New: bench_run_flamegraph.sh

Self-contained script modeled on bench_run_load.sh that:

  1. Runs the http_throughput Criterion bench across all worker counts (1/2/4/8 by default)
  2. Generates per-worker flamegraph SVGs (ecdsa_sign_w<N>.svg) via cargo-flamegraph / Linux perf
  3. Writes an updated cpu_scaling.md report under target/flamegraphs/

Key env controls: WORKER_COUNTS, PROFILE_TIME, PERF_FREQ, SKIP_FLAMEGRAPH, SKIP_THROUGHPUT.
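The env controls above can be sketched as a dry-run planner that prints the steps a run would take. This is a minimal sketch under assumptions, not the script's actual code: the PROFILE_TIME and PERF_FREQ defaults, the function names, and the test_kms_server package name are illustrative; only the variable names, the 1/2/4/8 default, and the output file names come from this PR.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the sweep driven by bench_run_flamegraph.sh's env controls.
# ASSUMPTIONS: PROFILE_TIME/PERF_FREQ defaults, function names, and the
# test_kms_server package name are illustrative, not taken from the script.
set -euo pipefail

WORKER_COUNTS="${WORKER_COUNTS:-1 2 4 8}"   # default sweep described in this PR
PROFILE_TIME="${PROFILE_TIME:-30}"          # seconds per capture (assumed default)
PERF_FREQ="${PERF_FREQ:-99}"                # perf sampling rate in Hz (assumed default)
SKIP_FLAMEGRAPH="${SKIP_FLAMEGRAPH:-0}"
SKIP_THROUGHPUT="${SKIP_THROUGHPUT:-0}"
OUT_DIR="target/flamegraphs"

# Path of the flamegraph SVG for a given worker count.
svg_path() {
  printf '%s/ecdsa_sign_w%s.svg\n' "$OUT_DIR" "$1"
}

# Print one line per step the run would perform, without executing anything.
plan_steps() {
  local n
  if [ "$SKIP_THROUGHPUT" != "1" ]; then
    echo "throughput: cargo bench -p test_kms_server --bench http_throughput"
  fi
  if [ "$SKIP_FLAMEGRAPH" != "1" ]; then
    # Unquoted on purpose: WORKER_COUNTS is a space-separated list.
    for n in $WORKER_COUNTS; do
      echo "flamegraph: workers=$n freq=${PERF_FREQ}Hz time=${PROFILE_TIME}s -> $(svg_path "$n")"
    done
  fi
  echo "report: $OUT_DIR/cpu_scaling.md"
}

# Print the plan when executed directly.
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
  plan_steps
fi
```

For example, `SKIP_FLAMEGRAPH=1 bash plan.sh` (hypothetical file name) prints only the throughput and report lines, which matches the "throughput only" mode described under How to run.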

[profile.bench] in Cargo.toml

The release profile uses opt-level = "z" (optimize for size) and strip = "symbols", which costs 15-40% throughput in benchmarks and strips the function names flamegraphs need, leaving them unreadable. The new [profile.bench] overrides these with opt-level = 3, strip = "none", debug = 1, lto = "thin".

.cargo/config.toml: perf flags for x86_64

  • force-frame-pointers=yes: keeps frame pointers in the compiled code so perf can unwind with --call-graph fp, reducing perf.data from 4.2 GB to ~20 MB

SQLite pragma tuning

Added PRAGMA cache_size (64 MiB), mmap_size (256 MiB), and temp_store = MEMORY to reduce syscall overhead under concurrent read load.
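The PRAGMA values behind those sizes can be derived as below. This is a hedged sketch, not the code in sqlite.rs: it assumes cache_size is set as a negative KiB budget (SQLite's documented meaning for negative values) and mmap_size in bytes; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: derive SQLite PRAGMA statements for the sizes quoted in this PR.
# ASSUMPTION: sqlite.rs may express these differently (e.g. cache_size in pages).
set -euo pipefail

MIB=$((1024 * 1024))

# Emit the tuning PRAGMAs for a cache size and an mmap size, both in MiB.
sqlite_tuning_pragmas() {
  local cache_mib="$1" mmap_mib="$2"
  # A negative cache_size is read by SQLite as a memory budget in KiB.
  printf 'PRAGMA cache_size = -%d;\n' $((cache_mib * 1024))
  # mmap_size is expressed in bytes.
  printf 'PRAGMA mmap_size = %d;\n' $((mmap_mib * MIB))
  # Keep temporary tables and indices in memory rather than temp files.
  printf 'PRAGMA temp_store = MEMORY;\n'
}

sqlite_tuning_pragmas 64 256
```

These PRAGMAs are per connection, so a pooled server has to apply them on every new connection, and each connection gets its own 64 MiB cache budget. mmap_size is also capped by SQLite's compile-time SQLITE_MAX_MMAP_SIZE.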

.gitignore

Ignored perf.data, perf.data.old, target/flamegraphs/.

How to run

```shell
sudo sysctl kernel.perf_event_paranoid=-1
cargo install flamegraph --locked

# Full run (generates per-worker SVGs)
bash .github/scripts/benchmarks/bench_run_flamegraph.sh --variant non-fips

# Throughput only (no perf required)
SKIP_FLAMEGRAPH=1 bash .github/scripts/benchmarks/bench_run_flamegraph.sh
```

Outputs are written to target/flamegraphs/:

  • ecdsa_sign_w<N>.svg — per-worker interactive flamegraph
  • cpu_scaling.md — markdown report linking to all of the above
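Before a full run, the prerequisites from the steps above can be checked up front. This preflight is a hypothetical helper, not part of the PR; it mirrors the PR's setup (perf_event_paranoid set to -1, cargo-flamegraph installed) and additionally checks that perf is on PATH.

```shell
#!/usr/bin/env bash
# Hypothetical preflight for the "How to run" steps; not part of the PR.
set -euo pipefail

# Succeed when a perf_event_paranoid value matches the PR's setup (-1).
paranoid_ok() {
  [ "$1" -le -1 ]
}

# Report every missing prerequisite for a full (flamegraph) run.
preflight() {
  local status=0 level
  level="$(cat /proc/sys/kernel/perf_event_paranoid 2>/dev/null || echo unknown)"
  if [ "$level" = "unknown" ] || ! paranoid_ok "$level"; then
    echo "perf_event_paranoid is '$level'; run: sudo sysctl kernel.perf_event_paranoid=-1" >&2
    status=1
  fi
  if ! command -v perf >/dev/null 2>&1; then
    echo "perf not found on PATH" >&2
    status=1
  fi
  # `cargo install flamegraph` provides the cargo-flamegraph binary.
  if ! command -v cargo-flamegraph >/dev/null 2>&1; then
    echo "cargo-flamegraph not found; run: cargo install flamegraph --locked" >&2
    status=1
  fi
  return "$status"
}
```

A wrapper could call `preflight || echo "use SKIP_FLAMEGRAPH=1 for a throughput-only run"`, since the throughput-only mode needs none of these.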

Copilot AI (Contributor) left a review

Pull request overview

This PR adds CPU-scaling benchmarking + flamegraph tooling around the KMS HTTP server to empirically validate multi-core scaling, and introduces supporting configuration/profile changes (server worker count knob, bench build profile, CI workflow, and benchmark scripts/docs).

Changes:

  • Introduces a Criterion benchmark (http_throughput) and a GitHub Actions workflow to generate CPU-scaling throughput reports and perf-based flamegraphs.
  • Adds a --server-workers / KMS_SERVER_WORKERS configuration option and wires it into Actix HttpServer::workers.
  • Tunes benchmarking build/profile settings (bench profile, cargo config flags) and adjusts SQLite PRAGMAs for concurrent read performance.

Reviewed changes

Copilot reviewed 20 out of 22 changed files in this pull request and generated 14 comments.

Summary per file:

  • resources/kms.toml: Documents server_workers in sample config.
  • pkg/kms.toml: Documents server_workers in packaged config.
  • documentation/mkdocs.yml: Adds navigation entry for CPU scaling/flamegraph page.
  • documentation/docs/certifications_and_compliance/cryptographic_algorithms/benchmarks/cpu_scaling.md: Adds methodology and reproduction documentation for scaling + flamegraphs.
  • crate/test_kms_server/Cargo.toml: Registers the new http_throughput bench and adds futures dev-dependency.
  • crate/test_kms_server/benches/http_throughput.rs: Adds the async HTTP throughput benchmark sweeping worker counts and operations.
  • crate/server/src/start_kms_server.rs: Applies configurable Actix worker thread count on server startup.
  • crate/server/src/main.rs: Updates test config struct initialization for new HTTP field.
  • crate/server/src/config/wizard/tests.rs: Updates wizard round-trip tests for new HTTP field.
  • crate/server/src/config/wizard/http_wizard.rs: Adds default server_workers: None in wizard HTTP config.
  • crate/server/src/config/params/server_params.rs: Plumbs server_workers from config into ServerParams.
  • crate/server/src/config/command_line/http_config.rs: Adds CLI/env support for server_workers.
  • crate/server/kms_template.toml: Documents server_workers in server template config.
  • crate/server_database/src/stores/sql/sqlite.rs: Adds SQLite PRAGMA tuning (cache/mmap/temp_store) alongside WAL/synchronous/busy_timeout.
  • CHANGELOG/develop.md: Adds develop-branch changelog entries describing the new benchmark/tooling.
  • Cargo.toml: Adds [profile.bench] tuned for flamegraph visibility and throughput realism.
  • Cargo.lock: Updates lockfile (notably adds futures and bumps some transitive deps).
  • .gitignore: Ignores perf and flamegraph output artifacts.
  • .github/workflows/flamegraph.yml: Adds CI workflow to run the benchmark and generate/upload flamegraphs.
  • .github/scripts/benchmarks/README.md: Documents the new flamegraph benchmark runner script.
  • .github/scripts/benchmarks/bench_run_flamegraph.sh: Adds a local script to run throughput + flamegraph collection and emit a markdown report.
  • .cargo/config.toml: Adds a bench alias and x86_64 Linux rustflags intended for perf/flamegraph runs.

serene-kitfisto-8899 (Contributor, Author) commented

@copilot disable the flamegraph.yml workflow; this job will be run manually

Copilot AI (Contributor) commented Jun 5, 2026

Done — removed the schedule and pull_request triggers from flamegraph.yml. The workflow now only runs on workflow_dispatch. (commit ef3e27f → new commit on branch)

serene-kitfisto-8899 (Contributor, Author) commented

@copilot fix: .github/scripts/benchmarks/bench_run_flamegraph.sh: line 260: unexpected EOF while looking for matching `"'

Copilot AI (Contributor) commented Jun 5, 2026

Fixed: a stray " on line 212, left over from an earlier edit, opened a string that was never closed. Removed in the latest commit.

serene-kitfisto-8899 (Contributor, Author) commented

@copilot group the SVG flamegraphs from target/flamegraphs and make one graph to show worker scaling

Copilot AI (Contributor) commented Jun 5, 2026

Done. Added .github/scripts/benchmarks/combine_flamegraphs.py and wired it into bench_run_flamegraph.sh.

After the per-worker SVGs are generated, the script now calls the combiner which produces target/flamegraphs/cpu_scaling_combined.html — a single self-contained HTML file containing:

  • A throughput bar chart (req/s vs worker count) at the top, with pale "ideal linear scaling" reference bars for comparison
  • All per-worker interactive flamegraphs stacked vertically below with labelled headers and jump-to navigation

The combined HTML is also linked from the cpu_scaling.md report under a new "Combined scaling overview" section. The individual per-worker SVGs are still kept alongside it.

serene-kitfisto-8899 (Contributor, Author) commented

@copilot remove the combine_flamegraphs.py script. This feature is useless.

Copilot AI (Contributor) commented Jun 8, 2026

Removed combine_flamegraphs.py and all related wiring from bench_run_flamegraph.sh in the latest commit.

serene-kitfisto-8899 (Contributor, Author) commented

@copilot update Nix hashes

Copilot AI (Contributor) commented Jun 19, 2026

The Nix hashes are already current. The last Packaging CI run (all 16 jobs passed, commit f58d1a60) confirmed that nix/expected-hashes/ is consistent with the branch's Cargo.lock and ui/pnpm-lock.yaml. The two subsequent commits (ad67b47, 8d811cf) only touched start_kms_server.rs and CHANGELOG/develop.md — no lock file changes — so the hashes remain valid and no update is needed.

If you want to bring the branch up to develop's 5.24.0 package set, that would require rebasing/merging develop first, after which I can pull the new hashes from CI.

serene-kitfisto-8899 and others added 25 commits June 26, 2026 10:51
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Three for-loops over join_all(tasks).await had incorrect indentation
(flush-left instead of matching the surrounding async block level),
causing cargo fmt --all -- --check to fail.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Removed comments and configuration for target CPU tuning.
The script still referenced the old .github/scripts/benchmarks/ path.
Updated to source ../common.sh relative to the new .mise/scripts/bench/ location.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
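Resolving common.sh relative to the script's own location, rather than the caller's working directory, is what makes the move to .mise/scripts/bench/ transparent. A minimal sketch, assuming a hypothetical helper name; the PR only states the relative path ../common.sh.

```shell
#!/usr/bin/env bash
# Sketch: locate common.sh one directory above the calling script.
# ASSUMPTION: the helper name is hypothetical; only ../common.sh is from the PR.
set -euo pipefail

# Print the absolute path of ../common.sh relative to the given script path.
common_sh_for() {
  local script_dir
  script_dir="$(cd "$(dirname "$1")" && pwd)"
  printf '%s/common.sh\n' "$(cd "$script_dir/.." && pwd)"
}

# Typical use at the top of .mise/scripts/bench/bench_run_flamegraph.sh:
#   source "$(common_sh_for "${BASH_SOURCE[0]}")"
```

Using BASH_SOURCE[0] instead of $0 keeps the lookup correct even when the script is itself sourced.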
FIPS mode requires a locally-built OpenSSL 3.6.2 with the FIPS provider,
which may not be available in all environments (e.g. when the download
is blocked by a proxy). Default to non-fips for convenience; callers can
still override with --variant fips or VARIANT=fips.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
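The non-fips default with both override paths can be sketched as follows. The precedence shown (the --variant flag wins over VARIANT) and the function name are assumptions; the commit only says that either override works.

```shell
#!/usr/bin/env bash
# Sketch of the variant default: non-fips unless overridden by VARIANT=...
# or --variant .... ASSUMPTION: the CLI flag takes precedence over the env var.
set -euo pipefail

# Print the selected variant given the script's arguments.
resolve_variant() {
  local variant="${VARIANT:-non-fips}"
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --variant)
        [ "$#" -ge 2 ] || { echo "--variant needs a value" >&2; return 2; }
        variant="$2"
        shift 2
        ;;
      *)
        shift  # other options are handled elsewhere in the real script
        ;;
    esac
  done
  case "$variant" in
    fips|non-fips) printf '%s\n' "$variant" ;;
    *) echo "unknown variant: $variant" >&2; return 2 ;;
  esac
}
```

Rejecting unknown values early avoids starting a long OpenSSL or cargo build with a typo in the variant name.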
RSA-2048 OAEP decrypt takes ~450 ms/iter, which exceeds the default 5 s
measurement window for 20 samples. Setting it to 10 s eliminates the warning.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
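A worked version of that arithmetic, assuming Criterion runs one iteration per sample for a routine this slow (flat sampling) and taking the ~450 ms/iter figure at face value:

```latex
t_{\min} \approx n_{\text{samples}} \cdot t_{\text{iter}}
  = 20 \times 0.45\,\text{s} = 9\,\text{s},
\qquad 5\,\text{s} < 9\,\text{s} < 10\,\text{s}
```

So 20 single-iteration samples cannot fit in the 5 s default, while 10 s leaves roughly 1 s of headroom. Under linear sampling (sample i runs i iterations), 20 samples would need at least 20 · 21 / 2 = 210 iterations, about 94.5 s, so the warning clearing at 10 s is consistent with flat sampling.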
After each SVG is generated, convert it to PNG using rsvg-convert
(preferred) or ImageMagick convert as fallback. The markdown report
references PNGs when available, falling back to SVGs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
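The converter fallback and the PNG-or-SVG choice for the report can be sketched like this. Function names are hypothetical; `rsvg-convert -o OUT IN` and `convert IN OUT` are the tools' standard invocations.

```shell
#!/usr/bin/env bash
# Sketch of the SVG-to-PNG step: rsvg-convert first, ImageMagick convert as fallback.
# ASSUMPTION: function names are hypothetical, not taken from the script.
set -euo pipefail

# Print the first available converter, or fail if neither is installed.
pick_svg_converter() {
  if command -v rsvg-convert >/dev/null 2>&1; then
    echo rsvg-convert
  elif command -v convert >/dev/null 2>&1; then
    echo convert
  else
    return 1
  fi
}

# Convert one SVG to a PNG next to it; a missing converter is not fatal.
svg_to_png() {
  local svg="$1" png="${1%.svg}.png" tool
  tool="$(pick_svg_converter)" || return 0
  case "$tool" in
    rsvg-convert) rsvg-convert -o "$png" "$svg" ;;
    convert) convert "$svg" "$png" ;;
  esac
}

# Path the markdown report should embed: the PNG if it exists, else the SVG.
report_image() {
  local svg="$1" png="${1%.svg}.png"
  if [ -f "$png" ]; then echo "$png"; else echo "$svg"; fi
}
```

Treating a missing converter as non-fatal keeps the run usable on minimal hosts, where the report simply links the interactive SVGs.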
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three changes to reduce unresolved frames:

1. Cargo.toml [profile.bench]: bump debug=1 → debug=2 for full DWARF
   info (line-level attribution + reliable unwinding by perf).

2. bench_run_flamegraph.sh: switch perf from --call-graph fp to
   --call-graph dwarf,32768. DWARF unwinding resolves system library
   frames (glibc, OpenSSL .so) that lack frame pointers. 32 KB stack
   dump handles deep async/tokio stacks.

3. Export RUSTFLAGS='-C force-frame-pointers=yes' before cargo bench
   so all Rust code preserves the RBP chain — belt-and-suspenders with
   DWARF unwinding.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
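The two unwinding modes can be captured in one command builder. This is a sketch: the function name, output path, and PERF_FREQ fallback are assumptions, while -F, --call-graph fp and --call-graph dwarf,<size> are standard perf record options. DWARF mode copies the given number of stack bytes into every sample, which is presumably why it produced far larger perf.data files than fp in the figures quoted in the PR description.

```shell
#!/usr/bin/env bash
# Sketch: build a perf record invocation for either unwinding mode.
# ASSUMPTIONS: function name, output path and PERF_FREQ fallback are illustrative.
set -euo pipefail

# Keep the RBP frame chain in all Rust code (belt-and-suspenders with DWARF).
export RUSTFLAGS="-C force-frame-pointers=yes"

# Fill the PERF_CMD array with a perf record command for MODE wrapping CMD...
build_perf_cmd() {
  local mode="$1"; shift
  local call_graph
  case "$mode" in
    fp) call_graph="fp" ;;              # small perf.data, needs frame pointers
    dwarf) call_graph="dwarf,32768" ;;  # 32 KB user-stack copy per sample
    *) echo "mode must be fp or dwarf" >&2; return 2 ;;
  esac
  PERF_CMD=(perf record -F "${PERF_FREQ:-99}" --call-graph "$call_graph" -o perf.data -- "$@")
}

# Usage: build_perf_cmd dwarf ./path/to/bench-binary --bench
#        "${PERF_CMD[@]}"
```

Cargo uses only the first rustflags source it finds, and the RUSTFLAGS variable takes precedence over the [target.*] rustflags in .cargo/config.toml, so exporting RUSTFLAGS replaces the config's flags rather than adding to them.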
The script now builds OpenSSL 3.6.2 from source with -g -O2 (optimized
+ full DWARF symbols) into target/openssl-debug-3.6.2-linux-<arch>/.
This eliminates [unknown] frames for OpenSSL/libcrypto in flamegraphs.

- Downloads tarball with --noproxy to bypass corporate proxies
- SHA-256 verified before extraction
- Cached: only builds once, reused on subsequent runs
- Sets OPENSSL_DIR so build.rs uses it instead of system OpenSSL

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
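The download, verification, and caching steps listed above can be sketched as small helpers. Everything here is an assumption except the directory layout, the --noproxy download, the SHA-256 check and OPENSSL_DIR: the function names, deriving <arch> from uname -m, and treating an existing libcrypto as the cache hit are illustrative choices.

```shell
#!/usr/bin/env bash
# Sketch of the fetch/verify/cache logic for the debug OpenSSL build.
# ASSUMPTIONS: function names, uname -m for <arch>, libcrypto as cache marker.
set -euo pipefail

OPENSSL_VERSION="3.6.2"

# Install prefix following the target/openssl-debug-<ver>-linux-<arch>/ layout.
openssl_prefix() {
  printf 'target/openssl-debug-%s-linux-%s\n' "$OPENSSL_VERSION" "$(uname -m)"
}

# Succeed only if FILE's SHA-256 digest equals EXPECTED (lowercase hex).
verify_sha256() {
  local file="$1" expected="$2" actual
  actual="$(sha256sum "$file" | awk '{print $1}')"
  [ "$actual" = "$expected" ]
}

# Download URL to DEST, bypassing any configured proxy for all hosts.
fetch_tarball() {
  curl --noproxy '*' -fsSL -o "$2" "$1"
}

# Treat the build as cached when the prefix already contains libcrypto.
openssl_cached() {
  compgen -G "$1/lib*/libcrypto*" >/dev/null
}

# Usage: openssl_cached "$(openssl_prefix)" || build it, then
#        export OPENSSL_DIR="$PWD/$(openssl_prefix)"
```

Verifying before extraction means a truncated or tampered tarball fails fast instead of producing a half-built prefix that the cache check would later accept.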
Regenerated with debug-instrumented OpenSSL (-g -O2), DWARF unwinding,
and force-frame-pointers. Resolves previously [unknown] frames.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Checked out log-reference.md from origin/develop and ran
update_log_index.py to add 2 new entries from current source:
  - INFO  crate/server  start_kms_server.rs  "KMS HTTP server configured with {n} worker thread(s)"
  - LOG   ui/src        App.tsx              "JWT fallback failed:"

Fixes: Log index CI check on PR #985.
scripts/generate_rekey_vectors.sh was moved to .mise/scripts/test/,
leaving scripts/ with only a stale .gitignore. Remove it.
- Add .mise/scripts/docs/update_log_index.py (copied from develop)
- Add .mise/tasks/docs/log-index — updates log-reference.md from source;
  handles exit-1 (stale/new entries noted) vs exit-2 (hard error)
- Sync log-reference.md: removed 25 stale entries that referenced
  code paths from docs/key-autorotation-spec (rekey/ subdir,
  auto_rotate.rs, recertify.rs) which do not exist in this branch;
  file now documents 1063 entries matching feat/flamegraph source exactly
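The exit-1 vs exit-2 handling in the log-index task can be sketched as a status mapper. Assumptions: exit 0 means "nothing to change", the python3 invocation has no extra arguments, and the function names are hypothetical; the commit only distinguishes 1 (stale/new entries noted) from 2 (hard error).

```shell
#!/usr/bin/env bash
# Sketch of the exit-code handling described for .mise/tasks/docs/log-index.
# ASSUMPTIONS: exit 0 = no changes; invocation and function names are hypothetical.
set -euo pipefail

# Map update_log_index.py's exit status to an outcome; non-zero return fails the task.
handle_log_index_status() {
  case "$1" in
    0) echo "log-reference.md already up to date" ;;
    1) echo "log-reference.md updated (stale/new entries noted)" ;;
    *) echo "update_log_index.py failed with status $1" >&2; return "$1" ;;
  esac
}

run_log_index() {
  local status=0
  # Capture the status so set -e does not abort on the expected exit 1.
  python3 .mise/scripts/docs/update_log_index.py || status=$?
  handle_log_index_status "$status"
}
```

Treating exit 1 as success keeps the task usable as an updater, while exit 2 still fails CI.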
Manuthor enabled auto-merge (squash) June 26, 2026 09:02
Manuthor merged commit 0954eaa into develop Jun 26, 2026
58 checks passed
Manuthor deleted the feat/flamegraph branch June 26, 2026 10:02