Optimizations #62
Open
homm wants to merge 4 commits into
homm
commented
May 2, 2026
pub mod metrics;
pub mod sources;

#[cfg(feature = "bench")]
Author
This bridge is required because benchmarks are a separate target and can't access private functions directly.
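A minimal sketch of what such a bridge might look like, assuming a `bench` Cargo feature as in the diff above. The module name `bench_bridge` and the private helper `parse_key` are hypothetical and are not taken from this PR.

```rust
// Hypothetical private helper that a benchmark wants to exercise.
fn parse_key(raw: u32) -> String {
    // Decode a big-endian four-character code such as 0x54703031 ("Tp01").
    raw.to_be_bytes().iter().map(|&b| b as char).collect()
}

// Compiled only when the crate is built with `--features bench`, so the
// regular public API is unchanged in normal builds.
#[cfg(feature = "bench")]
pub mod bench_bridge {
    /// Public wrapper that forwards to the private helper.
    pub fn parse_key(raw: u32) -> String {
        super::parse_key(raw)
    }
}
```

With the feature enabled, a benchmark target could call the wrapper through the crate's public path, while builds without the feature never see the extra module.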
$ time cargo bb
Timer precision: 41 ns
bench                   fastest   │ slowest   │ median    │ mean      │ samples │ iters
├─ ioreport                       │           │           │           │         │
│  ├─ get_samples_0_1   3.454 ms  │ 7.501 ms  │ 3.578 ms  │ 3.646 ms  │ 100     │ 100
│  ├─ get_samples_0_4   13.95 ms  │ 17.8 ms   │ 14.28 ms  │ 14.37 ms  │ 100     │ 100
│  ╰─ subscription      220.2 ms  │ 225.6 ms  │ 223.4 ms  │ 222.9 ms  │ 10      │ 10
├─ sampler                        │           │           │           │         │
│  ╰─ get_metrics_0     46.93 ms  │ 66.28 ms  │ 49.24 ms  │ 50.7 ms   │ 10      │ 10
╰─ smc                            │           │           │           │         │
   ├─ full_init         2.47 s    │ 2.539 s   │ 2.514 s   │ 2.508 s   │ 3       │ 3
   ╰─ read_all_keys     1.385 s   │ 2.097 s   │ 1.405 s   │ 1.629 s   │ 3       │ 3

cargo bb  1.59s user 5.11s system 37% cpu 17.905 total

Co-authored-by: Codex <codex@openai.com>
Co-authored-by: Codex <codex@openai.com>
# Conflicts:
#	src/metrics.rs
Co-authored-by: Codex <codex@openai.com>
$ time cargo bb
Timer precision: 41 ns
bench                   fastest   │ slowest   │ median    │ mean      │ samples │ iters
├─ ioreport                       │           │           │           │         │
│  ├─ get_samples_0_1   2.862 ms  │ 6.741 ms  │ 3.001 ms  │ 3.057 ms  │ 100     │ 100
│  ├─ get_samples_0_4   11.56 ms  │ 15.67 ms  │ 11.92 ms  │ 12.02 ms  │ 100     │ 100
│  ╰─ subscription      83.46 ms  │ 87.88 ms  │ 84.38 ms  │ 84.89 ms  │ 10      │ 10
├─ sampler                        │           │           │           │         │
│  ╰─ get_metrics_0     44.12 ms  │ 47.54 ms  │ 45.25 ms  │ 45.51 ms  │ 10      │ 10
╰─ smc                            │           │           │           │         │
   ├─ full_init         670.2 ms  │ 778.6 ms  │ 730.1 ms  │ 726.3 ms  │ 3       │ 3
   ╰─ read_all_keys     649.9 ms  │ 691.5 ms  │ 663.5 ms  │ 668.3 ms  │ 3       │ 3

cargo bb  8.62s user 3.02s system 134% cpu 8.637 total

Co-authored-by: Codex <codex@openai.com>
This PR is a port of optimizations from #59.

Summary
- Add divan benchmarks
- Cache SocInfo lookup with OnceLock
- Filter IOReport subscriptions to the channels read by Sampler::get_metrics()

Changes
Add divan benchmarks
Benchmarks for the expensive SMC and IOReport paths can be run through the new cargo bb alias.

Use a single get_soc_info() entry point
This branch converges on one clear API for SoC info: get_soc_info().
It replaces older entry points and patterns such as:
- SocInfo::new()
- sampler.get_soc_info()
get_soc_info() is cached with OnceLock, so the result is reused no matter which startup path asks for it.

Reduce SMC discovery reads during startup
The main startup cost here was the number of SMC calls needed to discover temperature sensors.
This branch reduces that cost by:
- using read_all_keys() only to enumerate keys
- filtering keys by temperature prefix (Tp*, Te*, Ts*, Tg*) before attempting float reads
- using the read_float_val() path for validation and reads
- […] PSTR
That removes unnecessary SMC roundtrips during startup and is the main reason the SMC path got faster.
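The prefix-filtering step can be sketched as follows. This is an illustration under assumptions, not the code in this branch: `is_temp_sensor_key` and `discover_temp_sensors` are hypothetical names, and the `read_float` closure stands in for the real SMC read (read_float_val() in the description), whose actual signature is not shown here.

```rust
/// Temperature key prefixes named in the PR description.
const TEMP_PREFIXES: [&str; 4] = ["Tp", "Te", "Ts", "Tg"];

/// Cheap string check that avoids an SMC roundtrip for unrelated keys.
fn is_temp_sensor_key(key: &str) -> bool {
    TEMP_PREFIXES.iter().any(|p| key.starts_with(p))
}

/// Keep only keys that pass the prefix filter and then return a value.
/// `read_float` is a stand-in for the real SMC read; it is called only
/// for keys that already look like temperature sensors.
fn discover_temp_sensors<F>(all_keys: &[&str], mut read_float: F) -> Vec<String>
where
    F: FnMut(&str) -> Option<f32>,
{
    all_keys
        .iter()
        .copied()
        .filter(|k| is_temp_sensor_key(*k))
        .filter(|k| read_float(*k).is_some())
        .map(|k| k.to_string())
        .collect()
}
```

Because the prefix check runs first, the number of expensive reads is bounded by the number of temperature-like keys rather than by the total number of SMC keys.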
Filter IOReport subscriptions down to the channels we actually use
IOReport subscription now uses a predicate-based channel filter instead of broad group-level subscriptions.
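As a rough illustration of a predicate-based filter, the sketch below keeps only the channels a sampler would read. The `Channel` struct, `wanted`, and `filter_channels` are hypothetical; the group and subgroup names are the ones listed in this description, and accepting the whole Energy Model group is a simplification. The real IOReport types are not shown in this PR text.

```rust
/// Hypothetical, simplified view of an IOReport channel.
struct Channel {
    group: &'static str,
    subgroup: &'static str,
    name: &'static str,
}

/// Predicate: true only for channels the sampler is assumed to read.
/// Simplification: every channel in the "Energy Model" group is accepted.
fn wanted(ch: &Channel) -> bool {
    match (ch.group, ch.subgroup) {
        ("Energy Model", _) => true,
        ("CPU Stats", "CPU Core Performance States") => true,
        ("GPU Stats", "GPU Performance States") => true,
        _ => false,
    }
}

/// Apply any predicate to a channel list before subscribing.
fn filter_channels<'a, P>(all: &'a [Channel], pred: P) -> Vec<&'a Channel>
where
    P: Fn(&Channel) -> bool,
{
    all.iter().filter(|ch| pred(*ch)).collect()
}
```

Passing the predicate as a parameter keeps the decision about which channels matter in one place, next to the code that consumes them.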
On my system, that reduced the subscription set from roughly 300 channels down to 20.
The subscription filter matches what Sampler::get_metrics() actually reads:
- Energy Model: CPU / GPU / ANE / DRAM / GPU SRAM energy channels
- CPU Stats: CPU Core Performance States
- GPU Stats: GPU Performance States

Add more detailed IOReport output to macmon debug
macmon debug now makes IOReport inspection more useful while validating subscriptions:
- […] ioreport_channels_filter
- […] (events, B, KiB, MiB, ns, us, ms, s, empty/count)

Benchmarks
Baseline numbers come from commit a46b8f3 ("Add divan benchmarks for SMC and IOReport").
Optimized numbers come from commit 4885a52 ("Filter IOReport subscription channels"), after the full optimization stack in this branch was applied.
- ioreport / subscription: 2.65x faster
- ioreport / get_samples_0_1: 1.19x faster
- ioreport / get_samples_0_4: 1.20x faster
- smc / full_init: 3.44x faster
- smc / read_all_keys: 2.12x faster
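The ratios above appear to be computed from the median column of the two cargo bb runs recorded in the commit messages (for example, 223.4 ms / 84.38 ms for subscription). A small sketch of that arithmetic, using the hypothetical helpers `speedup` and `format_speedup`:

```rust
/// Speedup factor: how many times faster the optimized run is.
fn speedup(baseline_ms: f64, optimized_ms: f64) -> f64 {
    baseline_ms / optimized_ms
}

/// Format a factor with two decimals, e.g. "2.65x".
fn format_speedup(factor: f64) -> String {
    format!("{:.2}x", factor)
}
```

For instance, `format_speedup(speedup(2514.0, 730.1))` yields "3.44x", matching the smc / full_init entry when both medians are expressed in milliseconds.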