Add backend-agnostic benchmark core (benchmark refactor, Part 1/5) by AntoineRichard · Pull Request #6197 · isaac-sim/IsaacLab

AntoineRichard · 2026-06-16T07:11:37Z

Description

Part 1 of 5 of a series that splits the (oversized) benchmark refactor into reviewable, independently-mergeable PRs:

Part 1/5 — this PR: backend-agnostic benchmark core under isaaclab.test.benchmark.
Part 2/5 (Add unified runtime and startup benchmark scripts (benchmark refactor, Part 2/5) #6198): unified runtime.py + startup.py entry scripts (depends on this PR).
Part 3/5 (Add unified training benchmark dispatcher and RL adapters (benchmark refactor, Part 3/5) #6199): unified training.py dispatcher + rsl_rl/rl_games/skrl/sb3 adapters (depends on Parts 1–2).
Part 4/5 (Add play (inference) benchmark with PlayBundle (benchmark refactor, Part 4/5) #6201): play/inference benchmark.
Part 5/5: removal of the legacy benchmark_* / run_*.sh / utils.py scripts.

Parts 1–4 are purely additive: they add the new suite alongside the existing scripts, which keep working unchanged. The legacy scripts are removed only in Part 5/5, so downstream consumers (OmniPerf ingestion, job runners) can migrate at their own pace.

This PR adds the reusable core the entry scripts build on:

New submodules under isaaclab.test.benchmark:
- capture — versions / hardware / resources / run-id capture from the benchmark recorders.
- metrics — TensorBoard log parsing, convergence detection, EMA, mean/std/peak, success-rate tracking.
- builders — assemble the schema-v1 RuntimeBundle / TrainingBundle / StartupBundle.
- stepping — backend-agnostic random-action stepping loop.
- profiling — cProfile stats parsing (own/cumulative time, call counts).
- backend_descriptor — per-RL-library TensorBoard tag descriptors.
A new schema output backend that serializes a benchmark bundle through the existing BaseIsaacLabBenchmark metrics-backend system, plus multi-backend support (e.g. --benchmark_backend schema,omniperf) via a new attach_bundle hook. Single-backend behavior and filenames are unchanged, so existing benchmarks are unaffected.

Note: the shared GPU/memory recorders gain additive peak rows (GPU [i] Memory Used peak, System Memory RSS/VMS/USS peak); every pre-existing KPI row is unchanged. This is the only change visible in the legacy scripts' OmniPerf/JSON output.

Builds on the v1.0 benchmark schema merged in #5840. No entry scripts or docs change here — those land with Parts 2–3.

Fixes # (n/a)

Type of change

New feature (non-breaking change which adds functionality)

Checklist

I have read and understood the contribution guidelines
I have run the pre-commit checks with ./isaaclab.sh --format
I have made corresponding changes to the documentation (user-facing benchmark docs land with the entry scripts in Parts 2–3)
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I have added a changelog fragment under source/<pkg>/changelog.d/ for every touched package
I have added my name to the CONTRIBUTORS.md or my name already exists there

Add backend-agnostic runtime.py (random-action stepping, emits a RuntimeBundle) and startup.py (cProfile startup-phase profiling, emits a StartupBundle), wired to develop's launch API (launch_simulation and add_launcher_args from isaaclab.app; preset tokens forwarded to Hydra without folding). Remove the legacy benchmark_non_rl.py and benchmark_startup.py scripts plus the run_non_rl_benchmarks.sh and run_physx_benchmarks.sh runner shells; repoint benchmark_hydra_resolve at _common.get_backend_type. Part 2 of the benchmark refactor series (core -> runtime/startup -> training -> play); stacked on Part 1 (isaac-sim#6197).

Add training.py dispatching over --rl_library {rsl_rl, rl_games, skrl, sb3}; each adapter runs real training under BenchmarkMonitor and emits a TrainingBundle via the shared core, with an optional success-metric early stop. Scripts use develop's launch API (launch_simulation from isaaclab.app; preset tokens forwarded without folding). Remove the legacy benchmark_rsl_rl.py / benchmark_rlgames.py scripts, the run_training_benchmarks.sh runner shell, and the obsolete utils.py helper. Part 3 of the benchmark refactor series (core -> runtime/startup -> training -> play); stacked on Parts 1-2 (isaac-sim#6197, isaac-sim#6198).

greptile-apps · 2026-06-16T12:38:40Z

Greptile Summary

This PR adds a backend-agnostic benchmark core under isaaclab.test.benchmark, introducing six new submodules (capture, metrics, builders, stepping, profiling, backend_descriptor) and a new schema output backend that serializes typed benchmark bundles. It also extends BaseIsaacLabBenchmark to accept multiple comma-separated backends in one run, suffixing each output file with the backend key to avoid collisions.

Multi-backend support — backend_type now accepts a string, list, or comma-separated value; single-backend behavior and filenames are preserved, so existing benchmarks are unaffected.
New SchemaBundleFile backend — serializes a typed RuntimeBundle/TrainingBundle/StartupBundle (attached via attach_bundle) while intentionally ignoring flat measurement phases consumed by other backends.
Pure-stdlib helper modules — capture, builders, metrics, stepping, and profiling import nothing from Isaac Sim or any RL library, enabling the full test suite to run without a simulator.

Confidence Score: 4/5

The new modules are well-isolated and the multi-backend extension is backwards-compatible. One logic issue in SuccessRateTracker warrants attention before this is wired into Part 2/3 training loops.

The SuccessRateTracker.at_iteration_boundary property returns True before any record_step call because 0 % num_steps_per_env == 0. The class is new in this PR and is explicitly intended for use in upcoming RL training loops (Parts 2–3); if a caller checks the boundary before recording the first step, it will invoke end_iteration() on an empty tracker, silently shortening the history and skewing convergence detection. The remaining modules are clean utility code with good test coverage and no breaking changes to existing benchmarks.

metrics.py — the SuccessRateTracker.at_iteration_boundary property needs a _step_count > 0 guard before it is used in training-loop wrappers.

Important Files Changed

Filename	Overview
source/isaaclab/isaaclab/test/benchmark/metrics.py	New module with TensorBoard parsing, EMA, convergence detection, and SuccessRateTracker. `at_iteration_boundary` returns `True` at `_step_count=0` before any step is recorded, violating its own documented contract.
source/isaaclab/isaaclab/test/benchmark/benchmark_core.py	Extended to support multi-backend via comma-separated string or list; single-backend filenames unchanged. All existing `finalize` implementations accept `**kwargs`, so the new `bundle=self._bundle` kwarg is safely ignored by non-schema backends.
source/isaaclab/isaaclab/test/benchmark/backends.py	New `SchemaBundleFile` backend added; `add_metrics` is intentionally a no-op, `finalize` serializes the attached typed bundle. All finalize signatures consistently accept `**kwargs`.
source/isaaclab/isaaclab/test/benchmark/builders.py	Pure assembly functions for RuntimeBundle/TrainingBundle/StartupBundle; minor inconsistency where zero-time iterations are silently excluded from `iterations_per_s` but counted in `iterations_completed`.
source/isaaclab/isaaclab/test/benchmark/capture.py	Clean capture helpers for versions/hardware/resources from recorder data. Gracefully defaults when recorders are absent. GPU device keys sorted numerically, tested explicitly.
source/isaaclab/isaaclab/test/benchmark/profiling.py	cProfile parsing using internal `stats.stats` dict; risk is acknowledged in a comment with an explicit fallback plan. Whitelist mode correctly emits zero-entry placeholders for unmatched patterns.
source/isaaclab/isaaclab/test/benchmark/stepping.py	Lightweight random-action stepping helper; torch/numpy are lazily imported. Single-agent and multi-agent paths are both tested.
source/isaaclab/isaaclab/test/benchmark/backend_descriptor.py	Pure-data frozen dataclass with per-RL-backend TensorBoard tag/path descriptors. No imports from RL libraries.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["BaseIsaacLabBenchmark\n(benchmark_core.py)"] -->|"attach_bundle(bundle)"| B["_bundle"]
    A -->|"_finalize_impl()"| C{multi-backend?}
    C -->|"single"| D["metrics.finalize(path, prefix)"]
    C -->|"multiple"| E["metrics.finalize(path, prefix_key)\nfor each backend"]

    D --> F1["JSONFileMetrics"]
    D --> F2["OmniPerfKPIFile"]
    D --> F3["SummaryMetrics"]
    D --> F4["SchemaBundleFile"]
    E --> F1
    E --> F2
    E --> F3
    E --> F4

    F4 -->|"bundle kwarg"| G["serialize.write_bundle_file()"]

    subgraph "New schema pipeline"
        H["capture.py\n(versions/hardware/resources)"] --> I["builders.py\n(RuntimeBundle / TrainingBundle)"]
        J["metrics.py\n(parse_tf_logs, ema, convergence)"] --> I
        K["profiling.py\n(cProfile parsing)"] --> L["builders.build_startup_bundle()"]
        I --> B
        L --> B
    end

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A["BaseIsaacLabBenchmark\n(benchmark_core.py)"] -->|"attach_bundle(bundle)"| B["_bundle"]
    A -->|"_finalize_impl()"| C{multi-backend?}
    C -->|"single"| D["metrics.finalize(path, prefix)"]
    C -->|"multiple"| E["metrics.finalize(path, prefix_key)\nfor each backend"]

    D --> F1["JSONFileMetrics"]
    D --> F2["OmniPerfKPIFile"]
    D --> F3["SummaryMetrics"]
    D --> F4["SchemaBundleFile"]
    E --> F1
    E --> F2
    E --> F3
    E --> F4

    F4 -->|"bundle kwarg"| G["serialize.write_bundle_file()"]

    subgraph "New schema pipeline"
        H["capture.py\n(versions/hardware/resources)"] --> I["builders.py\n(RuntimeBundle / TrainingBundle)"]
        J["metrics.py\n(parse_tf_logs, ema, convergence)"] --> I
        K["profiling.py\n(cProfile parsing)"] --> L["builders.build_startup_bundle()"]
        I --> B
        L --> B
    end

Comments Outside Diff (1)

source/isaaclab/isaaclab/test/benchmark/metrics.py, line 1056-1067 (link)

at_iteration_boundary is True before any steps are recorded

_step_count is initialised to 0 and 0 % num_steps_per_env == 0 is True for every positive integer. Any caller that reads at_iteration_boundary before the first record_step call will see True and may invoke end_iteration() on an empty tracker (returning None and silently writing nothing to history). If such a caller later checks converged or tail_mean, it will operate over a history that is one iteration short, skewing convergence detection.

The safe fix is to guard with _step_count > 0.

_{Reviews (1): Last reviewed commit: "Add backend-agnostic benchmark core to i..." | Re-trigger Greptile}

greptile-apps · 2026-06-16T12:38:46Z

+    iter_times = list(iteration_times_s)
+    iter_per_s = [1.0 / t for t in iter_times if t > 0]


Zero-time iterations silently excluded from iterations_per_s but counted in iterations_completed

iter_per_s is computed only for iterations where t > 0, but iterations_completed always equals len(iter_times). If any iteration reports a wall time of exactly 0 (e.g., a mocked timer in tests or a very fast GPU step rounded down), iterations_completed and iterations_per_s.mean become inconsistent without any warning. This discrepancy is not documented in the docstring.

Suggested change

iter_times = list(iteration_times_s)

iter_per_s = [1.0 / t for t in iter_times if t > 0]

iter_times = list(iteration_times_s)

_zero_count = sum(1 for t in iter_times if t <= 0)

if _zero_count:

import logging as _logging # noqa: PLC0415

_logging.getLogger(__name__).warning(

"%d iteration(s) had non-positive wall time and are excluded from iterations_per_s.",

_zero_count,

)

iter_per_s = [1.0 / t for t in iter_times if t > 0]

Intentional, and not a real discrepancy. The t > 0 guard is only to avoid a divide-by-zero — real perf_counter_ns iteration times are always positive, so in practice no iteration is ever excluded and iterations_completed == len(iter_per_s).

iterations_completed (the number of iterations that ran) and iterations_per_s (a rate aggregated over timed iterations) intentionally measure different quantities; dropping a hypothetical zero-wall-time sample from a rate is the correct behavior, not a silent inconsistency. A zero wall-time would only arise under a mocked timer, where the proposed warning would add noise to every such test. Leaving as-is.

Introduce the capture, metrics, builders, stepping, profiling, and backend_descriptor submodules for assembling the schema-v1 benchmark bundles, add a schema output backend, and let BaseIsaacLabBenchmark emit several backends in one run via a new attach_bundle hook. Unit tests cover each submodule plus the schema backend and multi-backend finalize. Part 1 of a series splitting the oversized benchmark refactor (core -> runtime/startup -> training -> play).

Add backend-agnostic runtime.py (random-action stepping, emits a RuntimeBundle) and startup.py (cProfile startup-phase profiling, emits a StartupBundle), wired to develop's launch API (launch_simulation and add_launcher_args from isaaclab.app; preset tokens forwarded to Hydra without folding). Remove the legacy benchmark_non_rl.py and benchmark_startup.py scripts plus the run_non_rl_benchmarks.sh and run_physx_benchmarks.sh runner shells; repoint benchmark_hydra_resolve at _common.get_backend_type. Part 2 of the benchmark refactor series (core -> runtime/startup -> training -> play); stacked on Part 1 (isaac-sim#6197).

Add training.py dispatching over --rl_library {rsl_rl, rl_games, skrl, sb3}; each adapter runs real training under BenchmarkMonitor and emits a TrainingBundle via the shared core, with an optional success-metric early stop. Scripts use develop's launch API (launch_simulation from isaaclab.app; preset tokens forwarded without folding). Remove the legacy benchmark_rsl_rl.py / benchmark_rlgames.py scripts, the run_training_benchmarks.sh runner shell, and the obsolete utils.py helper. Part 3 of the benchmark refactor series (core -> runtime/startup -> training -> play); stacked on Parts 1-2 (isaac-sim#6197, isaac-sim#6198).

Add backend-agnostic runtime.py (random-action stepping, emits a RuntimeBundle) and startup.py (cProfile startup-phase profiling, emits a StartupBundle), wired to develop's launch API (launch_simulation and add_launcher_args from isaaclab.app; preset tokens forwarded to Hydra without folding). Remove the legacy benchmark_non_rl.py and benchmark_startup.py scripts plus the run_non_rl_benchmarks.sh and run_physx_benchmarks.sh runner shells; repoint benchmark_hydra_resolve at _common.get_backend_type. Part 2 of the benchmark refactor series (core -> runtime/startup -> training -> play); stacked on Part 1 (isaac-sim#6197).

Add training.py dispatching over --rl_library {rsl_rl, rl_games, skrl, sb3}; each adapter runs real training under BenchmarkMonitor and emits a TrainingBundle via the shared core, with an optional success-metric early stop. Scripts use develop's launch API (launch_simulation from isaaclab.app; preset tokens forwarded without folding). Remove the legacy benchmark_rsl_rl.py / benchmark_rlgames.py scripts, the run_training_benchmarks.sh runner shell, and the obsolete utils.py helper. Part 3 of the benchmark refactor series (core -> runtime/startup -> training -> play); stacked on Parts 1-2 (isaac-sim#6197, isaac-sim#6198).

github-actions Bot added the isaac-lab Related to Isaac Lab team label Jun 16, 2026

AntoineRichard force-pushed the antoiner/benchmark-core branch from b752ce0 to 995ccd1 Compare June 16, 2026 12:21

AntoineRichard marked this pull request as ready for review June 16, 2026 12:33

greptile-apps Bot reviewed Jun 16, 2026

View reviewed changes

AntoineRichard force-pushed the antoiner/benchmark-core branch from 995ccd1 to 05f7c96 Compare June 17, 2026 07:43

AntoineRichard mentioned this pull request Jun 17, 2026

Remove legacy benchmark scripts (benchmark refactor, Part 5/5) #6206

Open

7 tasks

AntoineRichard changed the title ~~Add backend-agnostic benchmark core (benchmark refactor, Part 1/4)~~ Add backend-agnostic benchmark core (benchmark refactor, Part 1/5) Jun 17, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add backend-agnostic benchmark core (benchmark refactor, Part 1/5)#6197

Add backend-agnostic benchmark core (benchmark refactor, Part 1/5)#6197
AntoineRichard wants to merge 1 commit into
isaac-sim:developfrom
AntoineRichard:antoiner/benchmark-core

AntoineRichard commented Jun 16, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Jun 16, 2026 •

edited

Loading

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot Jun 16, 2026

Uh oh!

AntoineRichard Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		iter_times = list(iteration_times_s)
		iter_per_s = [1.0 / t for t in iter_times if t > 0]

-    iter_times = list(iteration_times_s)
-    iter_per_s = [1.0 / t for t in iter_times if t > 0]
+    iter_times = list(iteration_times_s)
+    _zero_count = sum(1 for t in iter_times if t <= 0)
+    if _zero_count:
+        import logging as _logging  # noqa: PLC0415
+        _logging.getLogger(__name__).warning(
+            "%d iteration(s) had non-positive wall time and are excluded from iterations_per_s.",
+            _zero_count,
+        )
+    iter_per_s = [1.0 / t for t in iter_times if t > 0]

Conversation

AntoineRichard commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Checklist

Uh oh!

greptile-apps Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

AntoineRichard Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

AntoineRichard commented Jun 16, 2026 •

edited

Loading

greptile-apps Bot commented Jun 16, 2026 •

edited

Loading