Optimizations #62

Open
homm wants to merge 4 commits into vladkens:main from homm:optimizations

Conversation

@homm homm commented May 2, 2026

This PR is a port of optimizations from #59.

Summary

  • add divan benchmarks
  • cache SocInfo lookup with OnceLock
  • reduce SMC discovery reads during startup
  • filter IOReport subscriptions down to the channels actually consumed by Sampler::get_metrics()

Changes

Add divan benchmarks

Benchmarks for the expensive SMC and IOReport paths can be run through the new cargo bb alias:

# all benchmarks
cargo bb
# list available benchmarks
cargo bb --list
# all IOReport benchmarks
cargo bb ioreport
# single benchmark
cargo bb subscription
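The bb alias is not part of stock cargo; it would live in the repository's .cargo/config.toml. A sketch of what such a definition could look like (the exact flags are an assumption, not taken from the PR; the bench feature is suggested by the #[cfg(feature = "bench")] bridge in src/lib.rs):

```toml
# Hypothetical alias definition in .cargo/config.toml; the real one may differ.
[alias]
bb = "bench --features bench"
```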

Use a single get_soc_info() entry point

This branch converges on one clear API for SoC info: get_soc_info().

Instead of older entry points and patterns such as:

  • SocInfo::new()
  • sampler.get_soc_info()
  • direct repeated startup lookups in app/CLI paths

get_soc_info() is cached with OnceLock, so the result is reused no matter which startup path asks for it.
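The caching pattern described above can be sketched with OnceLock. This is a minimal illustration, not macmon's actual code; the SocInfo fields and the lookup body are placeholders:

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for the real SocInfo struct.
struct SocInfo {
    chip_name: String,
}

// Cached entry point: the expensive lookup runs at most once,
// regardless of which startup path calls get_soc_info() first.
fn get_soc_info() -> &'static SocInfo {
    static CACHE: OnceLock<SocInfo> = OnceLock::new();
    CACHE.get_or_init(|| {
        // In macmon this would do the real platform lookup; simulated here.
        SocInfo { chip_name: "Apple M1".to_string() }
    })
}

fn main() {
    let a = get_soc_info();
    let b = get_soc_info();
    // Both calls return the same cached instance.
    assert!(std::ptr::eq(a, b));
    println!("{}", a.chip_name); // Apple M1
}
```

Because OnceLock::get_or_init is thread-safe, this also stays correct if two startup paths race to the first lookup.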

Reduce SMC discovery reads during startup

The main startup cost here was the number of SMC calls needed to discover temperature sensors.

This branch reduces that cost by:

  • using read_all_keys() only to enumerate keys
  • filtering candidate keys by prefix first (Tp*, Te*, Ts*, Tg*) before attempting float reads
  • using a dedicated read_float_val() path for validation and reads
  • reusing the same float-read path for temperatures and PSTR

That removes unnecessary SMC roundtrips during startup and is the main reason the SMC path got faster.
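The prefix pre-filter is the key step: only keys that can plausibly be temperature sensors get a float read at all. A sketch of that filtering logic (is_temp_candidate and the key list are illustrative, not the PR's actual code):

```rust
// Prefixes of SMC temperature sensor keys, per the PR description.
const TEMP_PREFIXES: [&str; 4] = ["Tp", "Te", "Ts", "Tg"];

// Cheap string check applied before any SMC float read is attempted.
fn is_temp_candidate(key: &str) -> bool {
    TEMP_PREFIXES.iter().any(|p| key.starts_with(p))
}

fn main() {
    // Simulated output of read_all_keys(): only candidates get a float read.
    let keys = ["Tp01", "Te05", "F0Ac", "Ts0P", "PSTR", "Tg0f"];
    let candidates: Vec<&str> =
        keys.iter().copied().filter(|k| is_temp_candidate(k)).collect();
    println!("{:?}", candidates); // ["Tp01", "Te05", "Ts0P", "Tg0f"]
}
```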

Filter IOReport subscriptions down to the channels we actually use

IOReport subscription now uses a predicate-based channel filter instead of broad group-level subscriptions.

On my system, that reduced the subscription set from roughly 300 channels down to 20.

The subscription filter matches what Sampler::get_metrics() actually reads:

  • Energy Model: CPU / GPU / ANE / DRAM / GPU SRAM energy channels
  • CPU Stats: CPU Core Performance States
  • GPU Stats: GPU Performance States
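A predicate-based filter of this shape could look like the following. This is a hedged sketch built from the group/channel names listed above; the real ioreport_channels_filter signature and matching rules in the PR may differ:

```rust
// Hypothetical predicate: accept only the channels Sampler::get_metrics()
// actually reads, instead of subscribing to whole IOReport groups.
fn ioreport_channels_filter(group: &str, channel: &str) -> bool {
    match group {
        "Energy Model" => {
            ["CPU", "GPU", "ANE", "DRAM", "GPU SRAM"]
                .iter()
                .any(|part| channel.contains(part))
        }
        "CPU Stats" => channel == "CPU Core Performance States",
        "GPU Stats" => channel == "GPU Performance States",
        _ => false,
    }
}

fn main() {
    assert!(ioreport_channels_filter("Energy Model", "CPU Energy"));
    assert!(ioreport_channels_filter("GPU Stats", "GPU Performance States"));
    assert!(!ioreport_channels_filter("Battery", "Voltage"));
    println!("ok");
}
```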

Add more detailed IOReport output to macmon debug

macmon debug now makes IOReport inspection more useful while validating subscriptions:

  • marks whether a channel is included by ioreport_channels_filter
  • prints simple scalar units directly (events, B, KiB, MiB, ns, us, ms, s, empty/count)
  • uses a wider debug subscription than the production filter, making it easier to compare what is subscribed against what else is available
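The scalar-unit printing mentioned above can be sketched as a simple match on the unit string. format_scalar is a hypothetical helper for illustration, not the PR's actual function:

```rust
// Hypothetical helper: render a raw channel value with a simple scalar unit,
// covering the unit strings listed in the PR (events, B, KiB, MiB, ns, us,
// ms, s, and empty/count).
fn format_scalar(value: i64, unit: &str) -> String {
    match unit {
        // Bare counts print without a suffix.
        "" | "count" | "events" => format!("{value}"),
        // Known size and time units print with their suffix.
        "B" | "KiB" | "MiB" | "ns" | "us" | "ms" | "s" => format!("{value} {unit}"),
        // Anything else is flagged rather than silently dropped.
        other => format!("{value} (unknown unit: {other})"),
    }
}

fn main() {
    println!("{}", format_scalar(512, "KiB")); // 512 KiB
    println!("{}", format_scalar(7, "events")); // 7
}
```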

Benchmarks

Baseline numbers come from commit a46b8f3 ("Add divan benchmarks for SMC and IOReport").
Optimized numbers come from commit 4885a52 ("Filter IOReport subscription channels"), after the full optimization stack in this branch was applied.

Benchmark                    Baseline   Optimized   Improvement
ioreport / subscription      223 ms     84 ms       2.65x faster
ioreport / get_samples_0_1   3.6 ms     3 ms        1.19x faster
ioreport / get_samples_0_4   14.3 ms    11.9 ms     1.20x faster
smc / full_init              2.5 s      730 ms      3.44x faster
smc / read_all_keys          1.4 s      663 ms      2.12x faster

Comment thread on src/lib.rs
pub mod metrics;
pub mod sources;

#[cfg(feature = "bench")]
Author

This bridge is required because the benchmarks are a separate target and can't access private functions directly.

homm and others added 4 commits May 3, 2026 14:52
$ time cargo bb
Timer precision: 41 ns
bench                  fastest       β”‚ slowest       β”‚ median        β”‚ mean          β”‚ samples β”‚ iters
β”œβ”€ ioreport                          β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”œβ”€ get_samples_0_1  3.454 ms      β”‚ 7.501 ms      β”‚ 3.578 ms      β”‚ 3.646 ms      β”‚ 100     β”‚ 100
β”‚  β”œβ”€ get_samples_0_4  13.95 ms      β”‚ 17.8 ms       β”‚ 14.28 ms      β”‚ 14.37 ms      β”‚ 100     β”‚ 100
β”‚  ╰─ subscription     220.2 ms      β”‚ 225.6 ms      β”‚ 223.4 ms      β”‚ 222.9 ms      β”‚ 10      β”‚ 10
β”œβ”€ sampler                           β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  ╰─ get_metrics_0    46.93 ms      β”‚ 66.28 ms      β”‚ 49.24 ms      β”‚ 50.7 ms       β”‚ 10      β”‚ 10
╰─ smc                               β”‚               β”‚               β”‚               β”‚         β”‚
   β”œβ”€ full_init        2.47 s        β”‚ 2.539 s       β”‚ 2.514 s       β”‚ 2.508 s       β”‚ 3       β”‚ 3
   ╰─ read_all_keys    1.385 s       β”‚ 2.097 s       β”‚ 1.405 s       β”‚ 1.629 s       β”‚ 3       β”‚ 3

cargo bb  1.59s user 5.11s system 37% cpu 17.905 total

Co-authored-by: Codex <codex@openai.com>
Co-authored-by: Codex <codex@openai.com>

# Conflicts:
#	src/metrics.rs
Co-authored-by: Codex <codex@openai.com>
$ time cargo bb
Timer precision: 41 ns
bench                  fastest       β”‚ slowest       β”‚ median        β”‚ mean          β”‚ samples β”‚ iters
β”œβ”€ ioreport                          β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”œβ”€ get_samples_0_1  2.862 ms      β”‚ 6.741 ms      β”‚ 3.001 ms      β”‚ 3.057 ms      β”‚ 100     β”‚ 100
β”‚  β”œβ”€ get_samples_0_4  11.56 ms      β”‚ 15.67 ms      β”‚ 11.92 ms      β”‚ 12.02 ms      β”‚ 100     β”‚ 100
β”‚  ╰─ subscription     83.46 ms      β”‚ 87.88 ms      β”‚ 84.38 ms      β”‚ 84.89 ms      β”‚ 10      β”‚ 10
β”œβ”€ sampler                           β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  ╰─ get_metrics_0    44.12 ms      β”‚ 47.54 ms      β”‚ 45.25 ms      β”‚ 45.51 ms      β”‚ 10      β”‚ 10
╰─ smc                               β”‚               β”‚               β”‚               β”‚         β”‚
   β”œβ”€ full_init        670.2 ms      β”‚ 778.6 ms      β”‚ 730.1 ms      β”‚ 726.3 ms      β”‚ 3       β”‚ 3
   ╰─ read_all_keys    649.9 ms      β”‚ 691.5 ms      β”‚ 663.5 ms      β”‚ 668.3 ms      β”‚ 3       β”‚ 3

cargo bb  8.62s user 3.02s system 134% cpu 8.637 total

Co-authored-by: Codex <codex@openai.com>
@homm homm force-pushed the optimizations branch from 4885a52 to 23f358d on May 3, 2026 10:54