Optimizations #62

Open
homm wants to merge 4 commits into vladkens:main from homm:optimizations

Conversation

@homm homm commented May 2, 2026

This PR is a port of optimizations from #59.

Summary

  • add divan benchmarks
  • cache SocInfo lookup with OnceLock
  • reduce SMC discovery reads during startup
  • filter IOReport subscriptions down to the channels actually consumed by Sampler::get_metrics()

Changes

Add divan benchmarks

Benchmarks for the expensive SMC and IOReport paths can be run through the new cargo bb alias:

# all benchmarks
cargo bb
# list available benchmarks
cargo bb --list
# all IOReport benchmarks
cargo bb ioreport
# single benchmark
cargo bb subscription
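The bb alias is not part of stock cargo; it would live in the repository's .cargo/config.toml. A sketch of what such a definition could look like (the exact flags are an assumption, not taken from the PR; the bench feature is suggested by the #[cfg(feature = "bench")] bridge in src/lib.rs):

```toml
# Hypothetical alias definition in .cargo/config.toml; the real one may differ.
[alias]
bb = "bench --features bench"
```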

Use a single get_soc_info() entry point

This branch converges on one clear API for SoC info: get_soc_info().

Instead of older entry points and patterns such as:

  • SocInfo::new()
  • sampler.get_soc_info()
  • direct repeated startup lookups in app/CLI paths

get_soc_info() is cached with OnceLock, so the result is reused no matter which startup path asks for it.
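The caching pattern described above can be sketched with OnceLock. This is a minimal illustration, not macmon's actual code; the SocInfo fields and the lookup body are placeholders:

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for the real SocInfo struct.
struct SocInfo {
    chip_name: String,
}

// Cached entry point: the expensive lookup runs at most once,
// regardless of which startup path calls get_soc_info() first.
fn get_soc_info() -> &'static SocInfo {
    static CACHE: OnceLock<SocInfo> = OnceLock::new();
    CACHE.get_or_init(|| {
        // In macmon this would do the real platform lookup; simulated here.
        SocInfo { chip_name: "Apple M1".to_string() }
    })
}

fn main() {
    let a = get_soc_info();
    let b = get_soc_info();
    // Both calls return the same cached instance.
    assert!(std::ptr::eq(a, b));
    println!("{}", a.chip_name); // Apple M1
}
```

Because OnceLock::get_or_init is thread-safe, this also stays correct if two startup paths race to the first lookup.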

Reduce SMC discovery reads during startup

The main startup cost here was the number of SMC calls needed to discover temperature sensors.

This branch reduces that cost by:

  • using read_all_keys() only to enumerate keys
  • filtering candidate keys by prefix first (Tp*, Te*, Ts*, Tg*) before attempting float reads
  • using a dedicated read_float_val() path for validation and reads
  • reusing the same float-read path for temperatures and PSTR

That removes unnecessary SMC roundtrips during startup and is the main reason the SMC path got faster.
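The prefix pre-filter is the key step: only keys that can plausibly be temperature sensors get a float read at all. A sketch of that filtering logic (is_temp_candidate and the key list are illustrative, not the PR's actual code):

```rust
// Prefixes of SMC temperature sensor keys, per the PR description.
const TEMP_PREFIXES: [&str; 4] = ["Tp", "Te", "Ts", "Tg"];

// Cheap string check applied before any SMC float read is attempted.
fn is_temp_candidate(key: &str) -> bool {
    TEMP_PREFIXES.iter().any(|p| key.starts_with(p))
}

fn main() {
    // Simulated output of read_all_keys(): only candidates get a float read.
    let keys = ["Tp01", "Te05", "F0Ac", "Ts0P", "PSTR", "Tg0f"];
    let candidates: Vec<&str> =
        keys.iter().copied().filter(|k| is_temp_candidate(k)).collect();
    println!("{:?}", candidates); // ["Tp01", "Te05", "Ts0P", "Tg0f"]
}
```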

Filter IOReport subscriptions down to the channels we actually use

IOReport subscription now uses a predicate-based channel filter instead of broad group-level subscriptions.

On my system, that reduced the subscription set from roughly 300 channels down to 20.

The subscription filter matches what Sampler::get_metrics() actually reads:

  • Energy Model: CPU / GPU / ANE / DRAM / GPU SRAM energy channels
  • CPU Stats: CPU Core Performance States
  • GPU Stats: GPU Performance States
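A predicate-based filter of this shape could look like the following. This is a hedged sketch built from the group/channel names listed above; the real ioreport_channels_filter signature and matching rules in the PR may differ:

```rust
// Hypothetical predicate: accept only the channels Sampler::get_metrics()
// actually reads, instead of subscribing to whole IOReport groups.
fn ioreport_channels_filter(group: &str, channel: &str) -> bool {
    match group {
        "Energy Model" => {
            ["CPU", "GPU", "ANE", "DRAM", "GPU SRAM"]
                .iter()
                .any(|part| channel.contains(part))
        }
        "CPU Stats" => channel == "CPU Core Performance States",
        "GPU Stats" => channel == "GPU Performance States",
        _ => false,
    }
}

fn main() {
    assert!(ioreport_channels_filter("Energy Model", "CPU Energy"));
    assert!(ioreport_channels_filter("GPU Stats", "GPU Performance States"));
    assert!(!ioreport_channels_filter("Battery", "Voltage"));
    println!("ok");
}
```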

Add more detailed IOReport output to macmon debug

macmon debug now makes IOReport inspection more useful while validating subscriptions:

  • marks whether a channel is included by ioreport_channels_filter
  • prints simple scalar units directly (events, B, KiB, MiB, ns, us, ms, s, empty/count)
  • uses a wider debug subscription than the production filter, making it easier to compare what is subscribed against what else is available
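The scalar-unit printing mentioned above can be sketched as a simple match on the unit string. format_scalar is a hypothetical helper for illustration, not the PR's actual function:

```rust
// Hypothetical helper: render a raw channel value with a simple scalar unit,
// covering the unit strings listed in the PR (events, B, KiB, MiB, ns, us,
// ms, s, and empty/count).
fn format_scalar(value: i64, unit: &str) -> String {
    match unit {
        // Bare counts print without a suffix.
        "" | "count" | "events" => format!("{value}"),
        // Known size and time units print with their suffix.
        "B" | "KiB" | "MiB" | "ns" | "us" | "ms" | "s" => format!("{value} {unit}"),
        // Anything else is flagged rather than silently dropped.
        other => format!("{value} (unknown unit: {other})"),
    }
}

fn main() {
    println!("{}", format_scalar(512, "KiB")); // 512 KiB
    println!("{}", format_scalar(7, "events")); // 7
}
```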

Benchmarks

Baseline numbers come from commit a46b8f3 ("Add divan benchmarks for SMC and IOReport").
Optimized numbers come from commit 4885a52 ("Filter IOReport subscription channels"), after the full optimization stack in this branch was applied.

Benchmark                    Baseline   Optimized   Improvement
ioreport / subscription      223 ms     84 ms       2.65x faster
ioreport / get_samples_0_1   3.6 ms     3 ms        1.19x faster
ioreport / get_samples_0_4   14.3 ms    11.9 ms     1.20x faster
smc / full_init              2.5 s      730 ms      3.44x faster
smc / read_all_keys          1.4 s      663 ms      2.12x faster

Comment thread on src/lib.rs
pub mod metrics;
pub mod sources;

#[cfg(feature = "bench")]
Author

This bridge is required because the benchmarks are a separate target and can't access private functions directly.

homm and others added 4 commits May 3, 2026 14:52
$ time cargo bb
Timer precision: 41 ns
bench                  fastest       β”‚ slowest       β”‚ median        β”‚ mean          β”‚ samples β”‚ iters
β”œβ”€ ioreport                          β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”œβ”€ get_samples_0_1  3.454 ms      β”‚ 7.501 ms      β”‚ 3.578 ms      β”‚ 3.646 ms      β”‚ 100     β”‚ 100
β”‚  β”œβ”€ get_samples_0_4  13.95 ms      β”‚ 17.8 ms       β”‚ 14.28 ms      β”‚ 14.37 ms      β”‚ 100     β”‚ 100
β”‚  ╰─ subscription     220.2 ms      β”‚ 225.6 ms      β”‚ 223.4 ms      β”‚ 222.9 ms      β”‚ 10      β”‚ 10
β”œβ”€ sampler                           β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  ╰─ get_metrics_0    46.93 ms      β”‚ 66.28 ms      β”‚ 49.24 ms      β”‚ 50.7 ms       β”‚ 10      β”‚ 10
╰─ smc                               β”‚               β”‚               β”‚               β”‚         β”‚
   β”œβ”€ full_init        2.47 s        β”‚ 2.539 s       β”‚ 2.514 s       β”‚ 2.508 s       β”‚ 3       β”‚ 3
   ╰─ read_all_keys    1.385 s       β”‚ 2.097 s       β”‚ 1.405 s       β”‚ 1.629 s       β”‚ 3       β”‚ 3

cargo bb  1.59s user 5.11s system 37% cpu 17.905 total

Co-authored-by: Codex <codex@openai.com>
Co-authored-by: Codex <codex@openai.com>

# Conflicts:
#	src/metrics.rs
Co-authored-by: Codex <codex@openai.com>
$ time cargo bb
Timer precision: 41 ns
bench                  fastest       β”‚ slowest       β”‚ median        β”‚ mean          β”‚ samples β”‚ iters
β”œβ”€ ioreport                          β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  β”œβ”€ get_samples_0_1  2.862 ms      β”‚ 6.741 ms      β”‚ 3.001 ms      β”‚ 3.057 ms      β”‚ 100     β”‚ 100
β”‚  β”œβ”€ get_samples_0_4  11.56 ms      β”‚ 15.67 ms      β”‚ 11.92 ms      β”‚ 12.02 ms      β”‚ 100     β”‚ 100
β”‚  ╰─ subscription     83.46 ms      β”‚ 87.88 ms      β”‚ 84.38 ms      β”‚ 84.89 ms      β”‚ 10      β”‚ 10
β”œβ”€ sampler                           β”‚               β”‚               β”‚               β”‚         β”‚
β”‚  ╰─ get_metrics_0    44.12 ms      β”‚ 47.54 ms      β”‚ 45.25 ms      β”‚ 45.51 ms      β”‚ 10      β”‚ 10
╰─ smc                               β”‚               β”‚               β”‚               β”‚         β”‚
   β”œβ”€ full_init        670.2 ms      β”‚ 778.6 ms      β”‚ 730.1 ms      β”‚ 726.3 ms      β”‚ 3       β”‚ 3
   ╰─ read_all_keys    649.9 ms      β”‚ 691.5 ms      β”‚ 663.5 ms      β”‚ 668.3 ms      β”‚ 3       β”‚ 3

cargo bb  8.62s user 3.02s system 134% cpu 8.637 total

Co-authored-by: Codex <codex@openai.com>
@homm homm force-pushed the optimizations branch from 4885a52 to 23f358d on May 3, 2026 10:54