Skip to content

Add backend-agnostic benchmark core (benchmark refactor, Part 1/5)#6197

Open
AntoineRichard wants to merge 1 commit into
isaac-sim:developfrom
AntoineRichard:antoiner/benchmark-core
Open

Add backend-agnostic benchmark core (benchmark refactor, Part 1/5)#6197
AntoineRichard wants to merge 1 commit into
isaac-sim:developfrom
AntoineRichard:antoiner/benchmark-core

Conversation

@AntoineRichard

@AntoineRichard AntoineRichard commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

Description

Part 1 of 5 of a series that splits the (oversized) benchmark refactor into reviewable, independently-mergeable PRs:

Parts 1–4 are purely additive: they add the new suite alongside the existing scripts, which keep working unchanged. The legacy scripts are removed only in Part 5/5, so downstream consumers (OmniPerf ingestion, job runners) can migrate at their own pace.

This PR adds the reusable core the entry scripts build on:

  • New submodules under isaaclab.test.benchmark:
    • capture — versions / hardware / resources / run-id capture from the benchmark recorders.
    • metrics — TensorBoard log parsing, convergence detection, EMA, mean/std/peak, success-rate tracking.
    • builders — assemble the schema-v1 RuntimeBundle / TrainingBundle / StartupBundle.
    • stepping — backend-agnostic random-action stepping loop.
    • profilingcProfile stats parsing (own/cumulative time, call counts).
    • backend_descriptor — per-RL-library TensorBoard tag descriptors.
  • A new schema output backend that serializes a benchmark bundle through the existing BaseIsaacLabBenchmark metrics-backend system, plus multi-backend support (e.g. --benchmark_backend schema,omniperf) via a new attach_bundle hook. Single-backend behavior and filenames are unchanged, so existing benchmarks are unaffected.

Note: the shared GPU/memory recorders gain additive peak rows (GPU [i] Memory Used peak, System Memory RSS/VMS/USS peak); every pre-existing KPI row is unchanged. This is the only change visible in the legacy scripts' OmniPerf/JSON output.

Builds on the v1.0 benchmark schema merged in #5840. No entry scripts or docs change here — those land with Parts 2–3.

Fixes # (n/a)

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation (user-facing benchmark docs land with the entry scripts in Parts 2–3)
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have added a changelog fragment under source/<pkg>/changelog.d/ for every touched package
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@github-actions github-actions Bot added the isaac-lab Related to Isaac Lab team label Jun 16, 2026
@AntoineRichard AntoineRichard force-pushed the antoiner/benchmark-core branch from b752ce0 to 995ccd1 Compare June 16, 2026 12:21
AntoineRichard added a commit to AntoineRichard/IsaacLab that referenced this pull request Jun 16, 2026
Add backend-agnostic runtime.py (random-action stepping, emits a
RuntimeBundle) and startup.py (cProfile startup-phase profiling, emits a
StartupBundle), wired to develop's launch API (launch_simulation and
add_launcher_args from isaaclab.app; preset tokens forwarded to Hydra
without folding). Remove the legacy benchmark_non_rl.py and
benchmark_startup.py scripts plus the run_non_rl_benchmarks.sh and
run_physx_benchmarks.sh runner shells; repoint benchmark_hydra_resolve
at _common.get_backend_type.

Part 2 of the benchmark refactor series (core -> runtime/startup ->
training -> play); stacked on Part 1 (isaac-sim#6197).
AntoineRichard added a commit to AntoineRichard/IsaacLab that referenced this pull request Jun 16, 2026
Add training.py dispatching over --rl_library {rsl_rl, rl_games, skrl,
sb3}; each adapter runs real training under BenchmarkMonitor and emits a
TrainingBundle via the shared core, with an optional success-metric early
stop. Scripts use develop's launch API (launch_simulation from
isaaclab.app; preset tokens forwarded without folding). Remove the legacy
benchmark_rsl_rl.py / benchmark_rlgames.py scripts, the
run_training_benchmarks.sh runner shell, and the obsolete utils.py helper.

Part 3 of the benchmark refactor series (core -> runtime/startup ->
training -> play); stacked on Parts 1-2 (isaac-sim#6197, isaac-sim#6198).
@AntoineRichard AntoineRichard marked this pull request as ready for review June 16, 2026 12:33
@greptile-apps

greptile-apps Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds a backend-agnostic benchmark core under isaaclab.test.benchmark, introducing six new submodules (capture, metrics, builders, stepping, profiling, backend_descriptor) and a new schema output backend that serializes typed benchmark bundles. It also extends BaseIsaacLabBenchmark to accept multiple comma-separated backends in one run, suffixing each output file with the backend key to avoid collisions.

  • Multi-backend supportbackend_type now accepts a string, list, or comma-separated value; single-backend behavior and filenames are preserved, so existing benchmarks are unaffected.
  • New SchemaBundleFile backend — serializes a typed RuntimeBundle/TrainingBundle/StartupBundle (attached via attach_bundle) while intentionally ignoring flat measurement phases consumed by other backends.
  • Pure-stdlib helper modulescapture, builders, metrics, stepping, and profiling import nothing from Isaac Sim or any RL library, enabling the full test suite to run without a simulator.

Confidence Score: 4/5

The new modules are well-isolated and the multi-backend extension is backwards-compatible. One logic issue in SuccessRateTracker warrants attention before this is wired into Part 2/3 training loops.

The SuccessRateTracker.at_iteration_boundary property returns True before any record_step call because 0 % num_steps_per_env == 0. The class is new in this PR and is explicitly intended for use in upcoming RL training loops (Parts 2–3); if a caller checks the boundary before recording the first step, it will invoke end_iteration() on an empty tracker, silently shortening the history and skewing convergence detection. The remaining modules are clean utility code with good test coverage and no breaking changes to existing benchmarks.

metrics.py — the SuccessRateTracker.at_iteration_boundary property needs a _step_count > 0 guard before it is used in training-loop wrappers.

Important Files Changed

Filename Overview
source/isaaclab/isaaclab/test/benchmark/metrics.py New module with TensorBoard parsing, EMA, convergence detection, and SuccessRateTracker. at_iteration_boundary returns True at _step_count=0 before any step is recorded, violating its own documented contract.
source/isaaclab/isaaclab/test/benchmark/benchmark_core.py Extended to support multi-backend via comma-separated string or list; single-backend filenames unchanged. All existing finalize implementations accept **kwargs, so the new bundle=self._bundle kwarg is safely ignored by non-schema backends.
source/isaaclab/isaaclab/test/benchmark/backends.py New SchemaBundleFile backend added; add_metrics is intentionally a no-op, finalize serializes the attached typed bundle. All finalize signatures consistently accept **kwargs.
source/isaaclab/isaaclab/test/benchmark/builders.py Pure assembly functions for RuntimeBundle/TrainingBundle/StartupBundle; minor inconsistency where zero-time iterations are silently excluded from iterations_per_s but counted in iterations_completed.
source/isaaclab/isaaclab/test/benchmark/capture.py Clean capture helpers for versions/hardware/resources from recorder data. Gracefully defaults when recorders are absent. GPU device keys sorted numerically, tested explicitly.
source/isaaclab/isaaclab/test/benchmark/profiling.py cProfile parsing using internal stats.stats dict; risk is acknowledged in a comment with an explicit fallback plan. Whitelist mode correctly emits zero-entry placeholders for unmatched patterns.
source/isaaclab/isaaclab/test/benchmark/stepping.py Lightweight random-action stepping helper; torch/numpy are lazily imported. Single-agent and multi-agent paths are both tested.
source/isaaclab/isaaclab/test/benchmark/backend_descriptor.py Pure-data frozen dataclass with per-RL-backend TensorBoard tag/path descriptors. No imports from RL libraries.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["BaseIsaacLabBenchmark\n(benchmark_core.py)"] -->|"attach_bundle(bundle)"| B["_bundle"]
    A -->|"_finalize_impl()"| C{multi-backend?}
    C -->|"single"| D["metrics.finalize(path, prefix)"]
    C -->|"multiple"| E["metrics.finalize(path, prefix_key)\nfor each backend"]

    D --> F1["JSONFileMetrics"]
    D --> F2["OmniPerfKPIFile"]
    D --> F3["SummaryMetrics"]
    D --> F4["SchemaBundleFile"]
    E --> F1
    E --> F2
    E --> F3
    E --> F4

    F4 -->|"bundle kwarg"| G["serialize.write_bundle_file()"]

    subgraph "New schema pipeline"
        H["capture.py\n(versions/hardware/resources)"] --> I["builders.py\n(RuntimeBundle / TrainingBundle)"]
        J["metrics.py\n(parse_tf_logs, ema, convergence)"] --> I
        K["profiling.py\n(cProfile parsing)"] --> L["builders.build_startup_bundle()"]
        I --> B
        L --> B
    end
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A["BaseIsaacLabBenchmark\n(benchmark_core.py)"] -->|"attach_bundle(bundle)"| B["_bundle"]
    A -->|"_finalize_impl()"| C{multi-backend?}
    C -->|"single"| D["metrics.finalize(path, prefix)"]
    C -->|"multiple"| E["metrics.finalize(path, prefix_key)\nfor each backend"]

    D --> F1["JSONFileMetrics"]
    D --> F2["OmniPerfKPIFile"]
    D --> F3["SummaryMetrics"]
    D --> F4["SchemaBundleFile"]
    E --> F1
    E --> F2
    E --> F3
    E --> F4

    F4 -->|"bundle kwarg"| G["serialize.write_bundle_file()"]

    subgraph "New schema pipeline"
        H["capture.py\n(versions/hardware/resources)"] --> I["builders.py\n(RuntimeBundle / TrainingBundle)"]
        J["metrics.py\n(parse_tf_logs, ema, convergence)"] --> I
        K["profiling.py\n(cProfile parsing)"] --> L["builders.build_startup_bundle()"]
        I --> B
        L --> B
    end
Loading

Comments Outside Diff (1)

  1. source/isaaclab/isaaclab/test/benchmark/metrics.py, line 1056-1067 (link)

    P1 at_iteration_boundary is True before any steps are recorded

    _step_count is initialised to 0 and 0 % num_steps_per_env == 0 is True for every positive integer. Any caller that reads at_iteration_boundary before the first record_step call will see True and may invoke end_iteration() on an empty tracker (returning None and silently writing nothing to history). If such a caller later checks converged or tail_mean, it will operate over a history that is one iteration short, skewing convergence detection.

    The safe fix is to guard with _step_count > 0.

Reviews (1): Last reviewed commit: "Add backend-agnostic benchmark core to i..." | Re-trigger Greptile

Comment on lines +143 to +144
iter_times = list(iteration_times_s)
iter_per_s = [1.0 / t for t in iter_times if t > 0]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Zero-time iterations silently excluded from iterations_per_s but counted in iterations_completed

iter_per_s is computed only for iterations where t > 0, but iterations_completed always equals len(iter_times). If any iteration reports a wall time of exactly 0 (e.g., a mocked timer in tests or a very fast GPU step rounded down), iterations_completed and iterations_per_s.mean become inconsistent without any warning. This discrepancy is not documented in the docstring.

Suggested change
iter_times = list(iteration_times_s)
iter_per_s = [1.0 / t for t in iter_times if t > 0]
iter_times = list(iteration_times_s)
_zero_count = sum(1 for t in iter_times if t <= 0)
if _zero_count:
import logging as _logging # noqa: PLC0415
_logging.getLogger(__name__).warning(
"%d iteration(s) had non-positive wall time and are excluded from iterations_per_s.",
_zero_count,
)
iter_per_s = [1.0 / t for t in iter_times if t > 0]

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional, and not a real discrepancy. The t > 0 guard is only to avoid a divide-by-zero — real perf_counter_ns iteration times are always positive, so in practice no iteration is ever excluded and iterations_completed == len(iter_per_s).

iterations_completed (the number of iterations that ran) and iterations_per_s (a rate aggregated over timed iterations) intentionally measure different quantities; dropping a hypothetical zero-wall-time sample from a rate is the correct behavior, not a silent inconsistency. A zero wall-time would only arise under a mocked timer, where the proposed warning would add noise to every such test. Leaving as-is.

Introduce the capture, metrics, builders, stepping, profiling, and
backend_descriptor submodules for assembling the schema-v1 benchmark
bundles, add a schema output backend, and let BaseIsaacLabBenchmark emit
several backends in one run via a new attach_bundle hook. Unit tests
cover each submodule plus the schema backend and multi-backend finalize.

Part 1 of a series splitting the oversized benchmark refactor
(core -> runtime/startup -> training -> play).
@AntoineRichard AntoineRichard force-pushed the antoiner/benchmark-core branch from 995ccd1 to 05f7c96 Compare June 17, 2026 07:43
AntoineRichard added a commit to AntoineRichard/IsaacLab that referenced this pull request Jun 17, 2026
Add backend-agnostic runtime.py (random-action stepping, emits a
RuntimeBundle) and startup.py (cProfile startup-phase profiling, emits a
StartupBundle), wired to develop's launch API (launch_simulation and
add_launcher_args from isaaclab.app; preset tokens forwarded to Hydra
without folding). Remove the legacy benchmark_non_rl.py and
benchmark_startup.py scripts plus the run_non_rl_benchmarks.sh and
run_physx_benchmarks.sh runner shells; repoint benchmark_hydra_resolve
at _common.get_backend_type.

Part 2 of the benchmark refactor series (core -> runtime/startup ->
training -> play); stacked on Part 1 (isaac-sim#6197).
AntoineRichard added a commit to AntoineRichard/IsaacLab that referenced this pull request Jun 17, 2026
Add training.py dispatching over --rl_library {rsl_rl, rl_games, skrl,
sb3}; each adapter runs real training under BenchmarkMonitor and emits a
TrainingBundle via the shared core, with an optional success-metric early
stop. Scripts use develop's launch API (launch_simulation from
isaaclab.app; preset tokens forwarded without folding). Remove the legacy
benchmark_rsl_rl.py / benchmark_rlgames.py scripts, the
run_training_benchmarks.sh runner shell, and the obsolete utils.py helper.

Part 3 of the benchmark refactor series (core -> runtime/startup ->
training -> play); stacked on Parts 1-2 (isaac-sim#6197, isaac-sim#6198).
AntoineRichard added a commit to AntoineRichard/IsaacLab that referenced this pull request Jun 17, 2026
Add backend-agnostic runtime.py (random-action stepping, emits a
RuntimeBundle) and startup.py (cProfile startup-phase profiling, emits a
StartupBundle), wired to develop's launch API (launch_simulation and
add_launcher_args from isaaclab.app; preset tokens forwarded to Hydra
without folding). Remove the legacy benchmark_non_rl.py and
benchmark_startup.py scripts plus the run_non_rl_benchmarks.sh and
run_physx_benchmarks.sh runner shells; repoint benchmark_hydra_resolve
at _common.get_backend_type.

Part 2 of the benchmark refactor series (core -> runtime/startup ->
training -> play); stacked on Part 1 (isaac-sim#6197).
AntoineRichard added a commit to AntoineRichard/IsaacLab that referenced this pull request Jun 17, 2026
Add training.py dispatching over --rl_library {rsl_rl, rl_games, skrl,
sb3}; each adapter runs real training under BenchmarkMonitor and emits a
TrainingBundle via the shared core, with an optional success-metric early
stop. Scripts use develop's launch API (launch_simulation from
isaaclab.app; preset tokens forwarded without folding). Remove the legacy
benchmark_rsl_rl.py / benchmark_rlgames.py scripts, the
run_training_benchmarks.sh runner shell, and the obsolete utils.py helper.

Part 3 of the benchmark refactor series (core -> runtime/startup ->
training -> play); stacked on Parts 1-2 (isaac-sim#6197, isaac-sim#6198).
@AntoineRichard AntoineRichard changed the title Add backend-agnostic benchmark core (benchmark refactor, Part 1/4) Add backend-agnostic benchmark core (benchmark refactor, Part 1/5) Jun 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant