Add mixed-batch inference benchmark to perf_check by misko · Pull Request #2050 · facebookresearch/fairchem

misko · 2026-06-24T21:01:49Z

Summary

Adds a MixedPerfCheckRunner that extends perf_check to benchmark inference across varying batch sizes (4, 8, 16, 32, 64, 128, 256) over a diverse pool of systems spanning all 5 UMA tasks (oc20, omat, omol, odac, omc) at multiple size buckets.

- Ground truth, computed once, cached on disk. Per-system fp64 predictions are generated using the existing BASELINE_SETTINGS and run_inference, then stored in mixed_baseline_cache.json keyed by checkpoint, pool signature, device, seed, and batch sizes. Subsequent runs reuse the cache.
- Pre-materialized schedule, no adjacent duplicates. Batches are constructed up front via a deterministic round-robin through batch_sizes, so warmup and benchmark walk the same prepared sequence. The schedule guarantees that no two consecutive batches share the same (size, sorted_indices) key, i.e. the same multiset of systems.
- Per-system accuracy comparison. Each batched forward pass is split per-system (using natoms offsets) and compared to the cached fp64 baseline, so accuracy is decoupled from batch composition.
- Diverse pool via fake_dataset.generate_structures. Reuses the same generator the training benchmark uses, so size distributions match production specs for each UMA task.
- Two-table report. Throughput per batch size (steps, total time, samples/s, atoms/s) plus per-system accuracy (energy abs error, force MAE, force max).
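
For readers who want a concrete picture of the schedule described above, here is a minimal sketch. It is not the PR's build_batch_schedule, whose signature and sampling policy are not shown here. The function name build_schedule_sketch, the sampling with replacement, the n_batches and max_retries parameters, and the retry-on-duplicate strategy are all assumptions made for illustration.

```python
import random


def build_schedule_sketch(pool_size, batch_sizes, n_batches, seed=0, max_retries=100):
    """Return a list of batches; each batch is a sorted tuple of pool indices.

    Hypothetical helper: the real schedule builder may sample differently.
    """
    # Empty-input handling: nothing to schedule.
    if pool_size <= 0 or not batch_sizes or n_batches <= 0:
        return []
    rng = random.Random(seed)  # local RNG so the schedule is reproducible
    schedule = []
    prev_key = None
    for step in range(n_batches):
        size = batch_sizes[step % len(batch_sizes)]  # round-robin over sizes
        for _ in range(max_retries):
            # Assumption: sample with replacement, so a batch can be larger
            # than the pool and is best described as a multiset of indices.
            batch = tuple(sorted(rng.choices(range(pool_size), k=size)))
            key = (size, batch)
            if key != prev_key:
                break
        else:
            raise ValueError("could not avoid an adjacent duplicate batch")
        schedule.append(batch)
        prev_key = key
    return schedule


schedule = build_schedule_sketch(pool_size=10, batch_sizes=[4, 8, 16], n_batches=9, seed=1)
```

Because the whole list is built before any timing starts, the same seed always yields the same sequence, and any phase that iterates over it sees identical batches.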

Usage

# Default run
fairchem -c configs/uma/benchmark/perf_check/mixed_benchmark.yaml

# Compare against an optimized execution backend
fairchem -c configs/uma/benchmark/perf_check/mixed_benchmark.yaml \
  runner.inference_settings.execution_mode=umas_fast_gpu \
  runner.inference_settings.tf32=True

# Smaller pool for smoke runs
fairchem -c configs/uma/benchmark/perf_check/mixed_benchmark.yaml \
  runner.device=cpu 'runner.pool_size_buckets=[20,80]' runner.pool_n_per_bucket=1

Files

src/fairchem/core/components/benchmark/perf_check.py — adds MixedPerfCheckRunner, build_batch_schedule, run_mixed_inference, MixedInferenceResult, BatchTiming, format_mixed_report_table, _mixed_baseline_cache_key. Reuses existing _baseline_cache_key/_load_baseline_cache/_save_baseline_cache/run_inference/compare_results from the same file.
src/fairchem/core/components/benchmark/systems.py — adds SystemPool dataclass and get_diverse_benchmark_pool.
configs/uma/benchmark/perf_check/mixed_benchmark.yaml — Hydra config.
tests/core/components/benchmark/test_mixed_perf_check.py — 13 tests, all CPU, no real model weights required.

Test plan

pytest tests/core/components/benchmark/test_mixed_perf_check.py — 13/13 pass in ~15s
- SystemPool signature stability and uniqueness invariant
- get_diverse_benchmark_pool covers all requested UMA tasks and size buckets
- Schedule determinism, round-robin order, index range, no-adjacent-duplicate invariant, empty-input handling
- Cache key invalidation on pool change and on batch_sizes change
- End-to-end runner with mocked predict unit: report written, return value matches disk, all batch sizes populated
- Baseline cache reuse on second run (zero additional baseline calls)
- Baseline cache invalidation when cache key is tampered
pytest tests/core/components/benchmark/ — full benchmark suite (22 passed, 2 skipped GPU-only smoke tests)
pre-commit run --files ... on every modified file
End-to-end run with a real UMA checkpoint on GPU (left to the reviewer/user; the mocked tests confirm the harness wiring)
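
The cache-key invalidation tests above can be pictured with a small sketch. It assumes the key is a hash over the fields the summary lists (checkpoint, pool signature, device, seed, batch sizes). The names cache_key_sketch and load_if_valid, the SHA-256 choice, and the cache dictionary layout are hypothetical; the PR's _mixed_baseline_cache_key may be built differently.

```python
import hashlib
import json


def cache_key_sketch(checkpoint, pool_signature, device, seed, batch_sizes):
    """Hash the inputs that should invalidate the cached baseline when changed."""
    payload = {
        "checkpoint": checkpoint,
        "pool_signature": pool_signature,
        "device": device,
        "seed": seed,
        "batch_sizes": list(batch_sizes),
    }
    # sort_keys gives a canonical serialization, so equal inputs hash equally.
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def load_if_valid(cache, key):
    """Return cached predictions only when the stored key matches exactly."""
    if cache.get("key") == key:
        return cache.get("predictions")
    return None  # caller recomputes the baseline and rewrites the cache


key = cache_key_sketch("uma.pt", "pool-abc", "cpu", 0, [4, 8])
cache = {"key": key, "predictions": {"0": 1.0}}
```

With this layout, changing the pool or the batch sizes produces a different key, and a tampered stored key fails the equality check, so the baseline is recomputed in both cases.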

Extends perf_check with MixedPerfCheckRunner that benchmarks varying batch sizes (4, 8, 16, 32, 64, 128, 256) across a diverse pool of systems spanning all 5 UMA tasks (oc20, omat, omol, odac, omc). Ground truth is computed once per pool entry at fp64 using the existing BASELINE_SETTINGS and cached on disk, mirroring the singleton runner. Batches are pre-materialized via a deterministic schedule that round-robins through batch sizes and forbids adjacent duplicate batches, so warmup and benchmark phases draw from the same prepared sequence. Per-system predictions are split out of each batched forward pass and compared against the cached baseline. The pool reuses benchmark.fake_dataset.generate_structures so size distributions match the training-benchmark datasets. Usage: fairchem -c configs/uma/benchmark/perf_check/mixed_benchmark.yaml
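
The per-system split and comparison can be sketched as follows. This is an illustration in plain Python lists rather than the tensors the runner presumably uses. The helper names split_by_natoms and compare_system are hypothetical, and reading "force max" as the largest absolute per-component error is an assumption.

```python
from itertools import accumulate


def split_by_natoms(forces, natoms):
    """Split a flat per-atom list into one chunk per system using natoms offsets."""
    offsets = [0, *accumulate(natoms)]  # e.g. natoms [2, 1] -> offsets [0, 2, 3]
    if offsets[-1] != len(forces):
        raise ValueError("sum(natoms) must equal the number of force rows")
    return [forces[a:b] for a, b in zip(offsets, offsets[1:])]


def compare_system(energy, forces, ref_energy, ref_forces):
    """Energy absolute error, force MAE, and max force error for one system."""
    diffs = [
        abs(f - r)
        for row, ref_row in zip(forces, ref_forces)
        for f, r in zip(row, ref_row)
    ]
    return {
        "energy_abs_err": abs(energy - ref_energy),
        "force_mae": sum(diffs) / len(diffs),
        "force_max": max(diffs),  # assumption: max over force components
    }


# Two systems with 2 and 1 atoms, batched into one forward pass.
batched_forces = [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0], [0.5, 0.0, 0.0]]
per_system = split_by_natoms(batched_forces, natoms=[2, 1])
metrics = compare_system(1.0, per_system[1], 1.25, [[0.25, 0.0, 0.0]])
```

Because each system is compared against its own cached reference, the reported errors do not depend on which other systems happened to share the batch.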

lbluque

lgtm

meta-cla bot added the "cla signed" label on Jun 24, 2026

misko added the "minor" (Minor version release) and "enhancement" (New feature or request) labels on Jun 24, 2026

misko requested a review from mshuaibii June 24, 2026 21:07

Merge remote-tracking branch 'origin/main' into mixed-perf-check

3e30446

lbluque self-requested a review June 25, 2026 00:29

lbluque approved these changes Jun 25, 2026

misko wants to merge 2 commits into main from mixed-perf-check
