feat: flamegraph CPU-scaling benchmark tooling by serene-kitfisto-8899 · Pull Request #985 · Cosmian/kms

serene-kitfisto-8899 · 2026-06-04T12:21:48Z

Summary

This PR adds the profiling infrastructure to generate CPU-scaling flamegraphs for the Cosmian KMS, providing empirical evidence that the server scales linearly with CPU count and that the dominant hotspot is cryptographic work rather than locking or I/O.

The --server-workers flag and the http_throughput Criterion bench landed in previous commits on develop; this PR adds the flamegraph and benchmarking tooling on top.

Changes

New: `bench_run_flamegraph.sh`

Self-contained script modeled on bench_run_load.sh that:

- Runs the http_throughput Criterion bench across all worker counts (1/2/4/8 by default)
- Generates per-worker flamegraph SVGs (ecdsa_sign_w<N>.svg) via cargo-flamegraph / Linux perf
- Writes an updated cpu_scaling.md report under target/flamegraphs/

Key env controls: WORKER_COUNTS, PROFILE_TIME, PERF_FREQ, SKIP_FLAMEGRAPH, SKIP_THROUGHPUT.
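A minimal sketch of how such a sweep could honour these controls. Only the 1/2/4/8 default and the SVG naming come from the description above; the space-separated format of WORKER_COUNTS, the "1 means skip" convention and the plan_steps helper are assumptions for illustration, not the script's actual code.

```shell
# Defaults: only the 1/2/4/8 sweep is taken from the PR description.
WORKER_COUNTS="${WORKER_COUNTS:-1 2 4 8}"   # assumed to be space-separated
SKIP_FLAMEGRAPH="${SKIP_FLAMEGRAPH:-0}"     # assumed: 1 disables the perf step
SKIP_THROUGHPUT="${SKIP_THROUGHPUT:-0}"     # assumed: 1 disables the Criterion step

# plan_steps (hypothetical helper) prints one line per step the sweep would run.
plan_steps() {
  for w in $WORKER_COUNTS; do
    if [ "$SKIP_THROUGHPUT" != "1" ]; then
      echo "throughput workers=$w"
    fi
    if [ "$SKIP_FLAMEGRAPH" != "1" ]; then
      echo "flamegraph workers=$w output=ecdsa_sign_w${w}.svg"
    fi
  done
}

plan_steps
```

Printing the plan instead of running it makes it easy to check which steps a given combination of variables selects before starting a long profiling run.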

`[profile.bench]` in `Cargo.toml`

The release profile uses opt-level = "z" (size) and strip = "symbols", which causes a 15-40% throughput penalty in benchmarks and makes flamegraphs completely unreadable. The new [profile.bench] overrides these with opt-level = 3, strip = "none", debug = 1, and lto = "thin".

`.cargo/config.toml` - perf flags for x86_64

force-frame-pointers=yes: keeps frame pointers in the compiled code so that perf can unwind with --call-graph fp, reducing perf.data from 4.2 GB to ~20 MB

SQLite pragma tuning

Added cache_size (64 MiB), mmap_size (256 MiB), temp_store=MEMORY to reduce syscall overhead under concurrent read load.
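For reference, a sketch of how the quoted sizes translate into SQLite's PRAGMA units. The exact statements in sqlite.rs are not shown in this description, so the values below are derived from the sizes above and print_pragmas is a hypothetical helper.

```shell
# Sizes quoted in the PR description.
CACHE_MIB=64
MMAP_MIB=256

# print_pragmas (hypothetical helper) emits the PRAGMA statements for those sizes.
# SQLite reads a negative cache_size as KiB rather than pages;
# mmap_size is given in bytes.
print_pragmas() {
  echo "PRAGMA cache_size = -$((CACHE_MIB * 1024));"
  echo "PRAGMA mmap_size = $((MMAP_MIB * 1024 * 1024));"
  echo "PRAGMA temp_store = MEMORY;"
}

print_pragmas
```

The output can be piped into the sqlite3 CLI against a scratch database to experiment with the settings outside the server.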

`.gitignore`

Ignored perf.data, perf.data.old, target/flamegraphs/.

How to run

sudo sysctl kernel.perf_event_paranoid=-1
cargo install flamegraph --locked

# Full run (generates per-worker SVGs)
bash .github/scripts/benchmarks/bench_run_flamegraph.sh --variant non-fips

# Throughput only (no perf required)
SKIP_FLAMEGRAPH=1 bash .github/scripts/benchmarks/bench_run_flamegraph.sh

Outputs are written to target/flamegraphs/:

- ecdsa_sign_w<N>.svg: per-worker interactive flamegraph
- cpu_scaling.md: markdown report linking to all of the above

Copilot

Pull request overview

This PR adds CPU-scaling benchmarking + flamegraph tooling around the KMS HTTP server to empirically validate multi-core scaling, and introduces supporting configuration/profile changes (server worker count knob, bench build profile, CI workflow, and benchmark scripts/docs).

Changes:

- Introduces a Criterion benchmark (http_throughput) and a GitHub Actions workflow to generate CPU-scaling throughput reports and perf-based flamegraphs.
- Adds a --server-workers / KMS_SERVER_WORKERS configuration option and wires it into Actix HttpServer::workers.
- Tunes benchmarking build/profile settings (bench profile, cargo config flags) and adjusts SQLite PRAGMAs for concurrent read performance.

Reviewed changes

Copilot reviewed 20 out of 22 changed files in this pull request and generated 14 comments.

File	Description
resources/kms.toml	Documents `server_workers` in sample config.
pkg/kms.toml	Documents `server_workers` in packaged config.
documentation/mkdocs.yml	Adds navigation entry for CPU scaling/flamegraph page.
documentation/docs/certifications_and_compliance/cryptographic_algorithms/benchmarks/cpu_scaling.md	Adds methodology and reproduction documentation for scaling + flamegraphs.
crate/test_kms_server/Cargo.toml	Registers the new `http_throughput` bench and adds `futures` dev-dependency.
crate/test_kms_server/benches/http_throughput.rs	Adds the async HTTP throughput benchmark sweeping worker counts and operations.
crate/server/src/start_kms_server.rs	Applies configurable Actix worker thread count on server startup.
crate/server/src/main.rs	Updates test config struct initialization for new HTTP field.
crate/server/src/config/wizard/tests.rs	Updates wizard round-trip tests for new HTTP field.
crate/server/src/config/wizard/http_wizard.rs	Adds default `server_workers: None` in wizard HTTP config.
crate/server/src/config/params/server_params.rs	Plumbs `server_workers` from config into `ServerParams`.
crate/server/src/config/command_line/http_config.rs	Adds CLI/env support for `server_workers`.
crate/server/kms_template.toml	Documents `server_workers` in server template config.
crate/server_database/src/stores/sql/sqlite.rs	Adds SQLite PRAGMA tuning (cache/mmap/temp_store) alongside WAL/synchronous/busy_timeout.
CHANGELOG/develop.md	Adds develop-branch changelog entries describing the new benchmark/tooling.
Cargo.toml	Adds `[profile.bench]` tuned for flamegraph visibility and throughput realism.
Cargo.lock	Updates lockfile (notably adds `futures` and bumps some transitive deps).
.gitignore	Ignores perf and flamegraph output artifacts.
.github/workflows/flamegraph.yml	Adds CI workflow to run the benchmark and generate/upload flamegraphs.
.github/scripts/benchmarks/README.md	Documents the new flamegraph benchmark runner script.
.github/scripts/benchmarks/bench_run_flamegraph.sh	Adds a local script to run throughput + flamegraph collection and emit a markdown report.
.cargo/config.toml	Adds a bench alias and x86_64 Linux rustflags intended for perf/flamegraph runs.

serene-kitfisto-8899 · 2026-06-05T15:04:45Z

@copilot disable the flamegraph.yml workflow; this job will be run manually

Copilot · 2026-06-05T15:06:40Z

Done — removed the schedule and pull_request triggers from flamegraph.yml. The workflow now only runs on workflow_dispatch. (commit ef3e27f → new commit on branch)

serene-kitfisto-8899 · 2026-06-05T15:36:59Z

@copilot fix: .github/scripts/benchmarks/bench_run_flamegraph.sh: line 260: unexpected EOF while looking for matching `"'

Copilot · 2026-06-05T15:38:29Z

Fixed — there was a stray " on line 212 left over from an earlier edit that opened a string never closed. Removed in the latest commit.

serene-kitfisto-8899 · 2026-06-05T16:13:53Z

@copilot group the SVG flamegraphs from target/flamegraphs into a single graph that shows worker scaling

Copilot · 2026-06-05T16:18:19Z

Done. Added .github/scripts/benchmarks/combine_flamegraphs.py and wired it into bench_run_flamegraph.sh.

After the per-worker SVGs are generated, the script now calls the combiner which produces target/flamegraphs/cpu_scaling_combined.html — a single self-contained HTML file containing:

- A throughput bar chart (req/s vs worker count) at the top, with pale "ideal linear scaling" reference bars for comparison
- All per-worker interactive flamegraphs stacked vertically below, with labelled headers and jump-to navigation

The combined HTML is also linked from the cpu_scaling.md report under a new "Combined scaling overview" section. The individual per-worker SVGs are still kept alongside it.

serene-kitfisto-8899 · 2026-06-08T09:36:20Z

@copilot remove the combine_flamegraphs.py script. This feature is useless.

Copilot · 2026-06-08T09:37:59Z

Removed combine_flamegraphs.py and all related wiring from bench_run_flamegraph.sh in the latest commit.

serene-kitfisto-8899 · 2026-06-19T07:43:26Z

@copilot update nix hashes

Copilot · 2026-06-19T07:50:22Z

The Nix hashes are already current. The last Packaging CI run (all 16 jobs passed, commit f58d1a60) confirmed that nix/expected-hashes/ is consistent with the branch's Cargo.lock and ui/pnpm-lock.yaml. The two subsequent commits (ad67b47, 8d811cf) only touched start_kms_server.rs and CHANGELOG/develop.md — no lock file changes — so the hashes remain valid and no update is needed.

If you want to bring the branch up to develop's 5.24.0 package set, that would require rebasing/merging develop first, after which I can pull the new hashes from CI.

Three for-loops over join_all(tasks).await had incorrect indentation (flush-left instead of matching the surrounding async block level), causing cargo fmt --all -- --check to fail. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

FIPS mode requires a locally-built OpenSSL 3.6.2 with the FIPS provider, which may not be available in all environments (e.g. when the download is blocked by a proxy). Default to non-fips for convenience; callers can still override with --variant fips or VARIANT=fips. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
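A sketch of how that default and its two overrides might be resolved. The precedence (flag over environment variable over default) and the resolve_variant helper are assumptions for illustration; the commit only states that non-fips is the default and that --variant fips or VARIANT=fips override it.

```shell
# resolve_variant (hypothetical helper) prints the build variant to use.
# Assumed precedence: --variant flag, then VARIANT env var, then non-fips.
resolve_variant() {
  variant="${VARIANT:-non-fips}"
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --variant)
        [ "$#" -ge 2 ] || { echo "missing value for --variant" >&2; return 2; }
        variant="$2"
        shift 2
        ;;
      *)
        shift
        ;;
    esac
  done
  case "$variant" in
    fips|non-fips) echo "$variant" ;;
    *) echo "unknown variant: $variant" >&2; return 2 ;;
  esac
}

resolve_variant --variant fips   # prints: fips
```

Rejecting unknown values early avoids starting a long benchmark run against a build variant that does not exist.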

After each SVG is generated, convert it to PNG using rsvg-convert (preferred) or ImageMagick convert as fallback. The markdown report references PNGs when available, falling back to SVGs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
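The fallback chain described here can be sketched as follows. pick_converter and svg_to_png are hypothetical helper names, not the script's own functions; the sketch assumes rsvg-convert and ImageMagick's convert are looked up on PATH.

```shell
# pick_converter prints the first available SVG-to-PNG tool, or "none".
pick_converter() {
  if command -v rsvg-convert >/dev/null 2>&1; then
    echo "rsvg-convert"
  elif command -v convert >/dev/null 2>&1; then
    echo "convert"
  else
    echo "none"
  fi
}

# svg_to_png converts $1 to a sibling .png and prints the file the report
# should reference: the PNG on success, otherwise the original SVG.
svg_to_png() {
  svg="$1"
  png="${svg%.svg}.png"
  ok=1
  case "$(pick_converter)" in
    rsvg-convert) rsvg-convert -o "$png" "$svg" >/dev/null 2>&1 || ok=0 ;;
    convert)      convert "$svg" "$png" >/dev/null 2>&1 || ok=0 ;;
    *)            ok=0 ;;
  esac
  if [ "$ok" -eq 1 ] && [ -s "$png" ]; then
    echo "$png"
  else
    echo "$svg"
  fi
}

# Example: svg_to_png target/flamegraphs/ecdsa_sign_w8.svg
```

Returning the path to reference keeps the report generation independent of which converter, if any, was installed.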

Three changes to reduce unresolved frames:

1. Cargo.toml [profile.bench]: bump debug=1 → debug=2 for full DWARF info (line-level attribution + reliable unwinding by perf).
2. bench_run_flamegraph.sh: switch perf from --call-graph fp to --call-graph dwarf,32768. DWARF unwinding resolves system library frames (glibc, OpenSSL .so) that lack frame pointers. The 32 KB stack dump handles deep async/tokio stacks.
3. Export RUSTFLAGS='-C force-frame-pointers=yes' before cargo bench so all Rust code preserves the RBP chain, as belt-and-suspenders with DWARF unwinding.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
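To make the difference between the two unwinding modes concrete, here is a dry-run sketch that only prints the perf command line for each mode. The perf_record_cmd helper, the 99 Hz default and the benchmark binary path are hypothetical; the --call-graph values and the RUSTFLAGS export are the ones named in the commit message.

```shell
PERF_FREQ="${PERF_FREQ:-99}"   # hypothetical default sampling rate in Hz
PERF_STACK_BYTES=32768         # user-stack bytes copied per sample (from the commit)

# Keep frame pointers in all Rust code, as the commit message describes.
export RUSTFLAGS="-C force-frame-pointers=yes"

# perf_record_cmd prints (does not run) a perf invocation.
# $1 = "fp" or "dwarf", $2 = perf.data path, remaining args = profiled command.
perf_record_cmd() {
  mode="$1"
  data="$2"
  shift 2
  case "$mode" in
    fp)    graph="fp" ;;
    dwarf) graph="dwarf,$PERF_STACK_BYTES" ;;
    *)     echo "unknown mode: $mode" >&2; return 2 ;;
  esac
  echo "perf record -F $PERF_FREQ --call-graph $graph -o $data -- $*"
}

# ./bench-binary stands in for the compiled http_throughput bench executable.
perf_record_cmd fp perf.data ./bench-binary
perf_record_cmd dwarf perf.data ./bench-binary
```

With dwarf unwinding perf copies a chunk of the user stack with every sample, which is why that mode resolves frames in libraries built without frame pointers but produces a much larger perf.data than fp.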

The script now builds OpenSSL 3.6.2 from source with -g -O2 (optimized + full DWARF symbols) into target/openssl-debug-3.6.2-linux-<arch>/. This eliminates [unknown] frames for OpenSSL/libcrypto in flamegraphs.

- Downloads tarball with --noproxy to bypass corporate proxies
- SHA-256 verified before extraction
- Cached: only builds once, reused on subsequent runs
- Sets OPENSSL_DIR so build.rs uses it instead of system OpenSSL

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
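The verify-then-cache steps above could look roughly like this. The helper names are hypothetical and the sketch assumes GNU sha256sum is available; the download URL and the expected digest are deliberately left out because the commit message does not quote them.

```shell
# sha256_of prints the SHA-256 hex digest of file $1.
sha256_of() {
  sha256sum "$1" | cut -d' ' -f1
}

# verify_sha256 succeeds only if file $1 hashes to the expected digest $2.
verify_sha256() {
  [ "$(sha256_of "$1")" = "$2" ]
}

# ensure_cached runs the build command only when the install prefix is missing.
# $1 = install prefix; remaining args = build command.
ensure_cached() {
  prefix="$1"
  shift
  if [ -d "$prefix" ]; then
    echo "cached: $prefix"
  else
    "$@" && echo "built: $prefix"
  fi
}

# Intended order (placeholders, not the script's real values):
#   1. download the tarball, e.g. with curl --noproxy '*'
#   2. verify_sha256 "$tarball" "$expected_digest" || exit 1
#   3. ensure_cached "$prefix" <build command>
#   4. export OPENSSL_DIR="$prefix"
```

Checking the digest before extraction means a truncated or tampered download fails fast instead of producing a broken OpenSSL build that is then cached.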

Checked out log-reference.md from origin/develop and ran update_log_index.py to add 2 new entries from current source:

- INFO crate/server start_kms_server.rs "KMS HTTP server configured with {n} worker thread(s)"
- LOG ui/src App.tsx "JWT fallback failed:"

Fixes: Log index CI check on PR #985.

- Add .mise/scripts/docs/update_log_index.py (copied from develop)
- Add .mise/tasks/docs/log-index: updates log-reference.md from source; handles exit-1 (stale/new entries noted) vs exit-2 (hard error)
- Sync log-reference.md: removed 25 stale entries that referenced code paths from docs/key-autorotation-spec (rekey/ subdir, auto_rotate.rs, recertify.rs) which do not exist in this branch; file now documents 1063 entries matching feat/flamegraph source exactly
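The exit-code distinction mentioned for the log-index task can be sketched as a small wrapper. run_log_index is a hypothetical name, and treating exit status 1 as a successful update is an assumption based on the wording above, not the task's confirmed behaviour.

```shell
# run_log_index (hypothetical wrapper) runs the given command and maps its
# exit status:
#   0     -> index already up to date
#   1     -> stale or new entries were noted; treated here as success
#   other -> hard error, propagated to the caller
run_log_index() {
  status=0
  "$@" || status=$?
  case "$status" in
    0) echo "log index up to date" ;;
    1) echo "log index updated" ;;
    *) echo "log index failed (exit $status)" >&2; return "$status" ;;
  esac
}

# Example: run_log_index python3 .mise/scripts/docs/update_log_index.py
```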

serene-kitfisto-8899 requested review from Manuthor and Copilot June 4, 2026 13:59

Copilot started reviewing on behalf of serene-kitfisto-8899 June 4, 2026 14:00

Copilot AI reviewed Jun 4, 2026

Copilot started work on behalf of serene-kitfisto-8899 June 4, 2026 14:42

Copilot finished work on behalf of serene-kitfisto-8899 June 4, 2026 14:44

Copilot started work on behalf of serene-kitfisto-8899 June 4, 2026 14:57

Copilot finished work on behalf of serene-kitfisto-8899 June 4, 2026 15:02

Manuthor reviewed Jun 5, 2026

Copilot started work on behalf of serene-kitfisto-8899 June 5, 2026 15:04

Copilot finished work on behalf of serene-kitfisto-8899 June 5, 2026 15:06

Copilot started work on behalf of serene-kitfisto-8899 June 5, 2026 15:37

Copilot finished work on behalf of serene-kitfisto-8899 June 5, 2026 15:38

Copilot started work on behalf of serene-kitfisto-8899 June 5, 2026 16:14

Copilot finished work on behalf of serene-kitfisto-8899 June 5, 2026 16:18

Copilot started work on behalf of serene-kitfisto-8899 June 8, 2026 09:36

Copilot finished work on behalf of serene-kitfisto-8899 June 8, 2026 09:38

serene-kitfisto-8899 requested a review from Manuthor June 8, 2026 09:49

Copilot started work on behalf of serene-kitfisto-8899 June 19, 2026 07:43

Copilot finished work on behalf of serene-kitfisto-8899 June 19, 2026 07:50

serene-kitfisto-8899 and others added 25 commits June 26, 2026 10:51

Potential fix for pull request finding

da2dce4

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

fix(ci): replace linux-perf with linux-tools-common + linux-tools-generic for Ubuntu 24.04

83fcf41

ci: remove schedule and pull_request triggers from flamegraph.yml (manual only)

f29fb2b

fix: remove stray double-quote causing EOF syntax error in bench_run_flamegraph.sh

5f2db8c

fix auto fix

eb8b389

feat: combine per-worker flamegraph SVGs into a single CPU-scaling overview HTML

4478d5e

feat: remove combine_flamegraphs.py and related wiring

f03c706

Potential fix for pull request finding

eb5e9eb

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

53a3fc3

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Delete .github/workflows/flamegraph.yml

595884b

Clean up .cargo/config.toml by removing unused settings

ae29510

Removed comments and configuration for target CPU tuning.

fix(bench): correct common.sh source path after directory rename

78ab5c6

The script still referenced the old .github/scripts/benchmarks/ path. Updated to source ../common.sh relative to the new .mise/scripts/bench/ location. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix(bench): increase measurement time to 10s to avoid Criterion warning

6c17aff

RSA-2048 OAEP decrypt takes ~450ms/iter, which exceeds the default 5s measurement window for 20 samples. Setting 10s eliminates the warning. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

docs(bench): add ECDSA P-256 sign flamegraph (8 workers)

0a76d3d

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

docs(bench): update flamegraph with debug OpenSSL symbols

befcdbb

Regenerated with debug-instrumented OpenSSL (-g -O2), DWARF unwinding, and force-frame-pointers. Resolves previously [unknown] frames. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

remove lto should reuse the release lto

5aae303

chore: remove empty scripts/ directory

24a5d90

scripts/generate_rekey_vectors.sh was moved to .mise/scripts/test/, leaving scripts/ with only a stale .gitignore. Remove it.

fix: rebase

2d7ae51

Manuthor force-pushed the feat/flamegraph branch from 5750cca to 2d7ae51 Compare June 26, 2026 08:52

Manuthor enabled auto-merge (squash) June 26, 2026 09:02

Manuthor approved these changes Jun 26, 2026

Manuthor merged commit 0954eaa into develop Jun 26, 2026
58 checks passed

Manuthor deleted the feat/flamegraph branch June 26, 2026 10:02
