Add unified training benchmark dispatcher and RL adapters (benchmark refactor, Part 3/5) by AntoineRichard · Pull Request #6199 · isaac-sim/IsaacLab

AntoineRichard · 2026-06-16T07:45:06Z

Description

Part 3 of 5 of the benchmark refactor series — the unified training dispatcher + RL adapters.

Stacked on Parts 1–2 (#6197, #6198). The diff against develop below also includes Parts 1–2 until they merge. For the incremental Part 3 changes only, view:
AntoineRichard/IsaacLab@antoiner/benchmark-runtime-startup...antoiner/benchmark-training

Series: Part 1/5 core (#6197) → Part 2/5 runtime + startup (#6198) → Part 3/5 training (this PR) → Part 4/5 play (#6201) → Part 5/5 cleanup.

This PR is purely additive — it adds training.py and the per-backend adapters alongside the existing benchmark_rsl_rl.py / benchmark_rlgames.py / run_training_benchmarks.sh, which keep working unchanged. Removal of the legacy scripts and utils.py is deferred to Part 5/5.

Adds:

scripts/benchmarks/training.py — dispatcher selecting the RL library with --rl_library {rsl_rl, rl_games, skrl, sb3} (mirrors scripts/reinforcement_learning/train.py).
Per-backend adapters (rsl_rl/bench_rsl_rl.py, rl_games/bench_rl_games.py, skrl/bench_skrl.py, sb3/bench_sb3.py) that run real training under BenchmarkMonitor and emit a TrainingBundle via the shared core.
Smoke tests for all four backends.

Also repoints the shared early_stop.py at the core SuccessRateTracker. This is a behavior-preserving change: the public wrappers/observers/CLI helpers are unchanged, and the legacy benchmark_rsl_rl.py / benchmark_rlgames.py (which import only those preserved symbols) keep working against it.

Docs: updates the RL-training sections of the benchmarking / performance / warp-environments / visualization pages. (The 3.0 migration-guide "Benchmark Scripts" section, which documents the legacy-script removal, lands with Part 5/5.)

Validated on develop (Newton/MJWarp): all four training smokes pass (rsl_rl / skrl / sb3 at 16 envs, rl_games at 512).

Fixes # (n/a)

Type of change

New feature (non-breaking change which adds functionality)

Checklist

I have read and understood the contribution guidelines
I have run the pre-commit checks with ./isaaclab.sh --format
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I have added a changelog fragment under source/<pkg>/changelog.d/ for every touched package
I have added my name to the CONTRIBUTORS.md or my name already exists there

greptile-apps · 2026-06-16T12:41:00Z

Greptile Summary

Part 3 of the benchmark refactor series adds a unified training.py dispatcher and four per-library RL adapters (rsl_rl, rl_games, skrl, sb3), each running real training under BenchmarkMonitor and emitting a typed TrainingBundle JSON via the new schema backend. It also refactors early_stop.py to reuse SuccessRateTracker from the shared core and removes the superseded monolithic benchmark scripts.

New dispatcher + adapters: training.py selects the RL library via --rl_library and delegates to <backend>/bench_<backend>.py; each adapter captures per-iteration timing/reward/episode-length independently of TensorBoard where possible.
Core library changes: BaseIsaacLabBenchmark now accepts a list of backend types, gains attach_bundle(), and dispatches to each backend's finalize() with suffix-disambiguated filenames.
Test coverage: Four subprocess smoke tests validate end-to-end bundle emission; _find_bundle is copy-pasted identically into all four smoke test files and could be consolidated in conftest.py.

Confidence Score: 4/5

The dispatcher and all four RL adapters work correctly end-to-end; the only issues are localized quality concerns in the skrl adapter and duplicated test helpers.

The skrl adapter has two concerns that would silently drop metrics if skrl internals change, but neither causes a crash or data corruption. All other adapters are clean.

scripts/benchmarks/skrl/bench_skrl.py

Important Files Changed

Filename	Overview
scripts/benchmarks/training.py	New unified dispatcher selecting per-library adapter via --rl_library.
scripts/benchmarks/rsl_rl/bench_rsl_rl.py	New RSL-RL adapter; redundant if/else branches and private _finalize_impl() call.
scripts/benchmarks/rl_games/bench_rl_games.py	New RL-Games adapter; flushes TensorBoard writer before parsing, handles missing log data gracefully.
scripts/benchmarks/skrl/bench_skrl.py	New SKRL adapter; private runner._trainer access and heuristic episode-length key may silently produce zero series.
scripts/benchmarks/sb3/bench_sb3.py	New SB3 adapter; BenchmarkCallback captures per-rollout timing/reward; NaN filtering for incomplete rollouts is correct.
scripts/benchmarks/_common.py	New shared helpers: backend normalization, multi-backend splitting, preset extraction, safe importlib loading.
scripts/benchmarks/early_stop.py	Re-exports SuccessRateTracker from core; adds defensive hasattr guard for after_clear_stats.
source/isaaclab/isaaclab/test/benchmark/benchmark_core.py	Adds multi-backend support, attach_bundle(), and per-backend suffixed output filenames.
source/isaaclab/isaaclab/test/benchmark/backends.py	Adds SchemaBundleFile backend serializing typed bundles; no-ops when no bundle attached.
source/isaaclab/isaaclab/test/benchmark/builders.py	New pure assembly module converting raw series into schema dataclasses.
scripts/benchmarks/test/test_training_rsl_rl_smoke.py	Smoke test for rsl_rl; _find_bundle duplicated across all four smoke test files.
scripts/benchmarks/test/conftest.py	New require_isaacsim fixture that skips smoke tests when Isaac Sim is unavailable.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    CLI["training.py --rl_library lib"] --> DISP["dispatch_library_entrypoint()"]
    DISP --> RSL["bench_rsl_rl.py run()"]
    DISP --> RLG["bench_rl_games.py run()"]
    DISP --> SKRL["bench_skrl.py run()"]
    DISP --> SB3["bench_sb3.py run()"]
    RSL & RLG & SKRL & SB3 --> LAUNCH["launch_simulation()"]
    LAUNCH --> TRAIN["Framework training loop"]
    TRAIN --> BUILD["builders.build_training_bundle()"]
    BUILD --> ATTACH["benchmark.attach_bundle()"]
    ATTACH --> FI["benchmark._finalize_impl()"]
    FI --> SCHEMA["SchemaBundleFile TrainingBundle .json"]
    FI --> OTHER["omniperf / json / osmo KPI .json"]

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    CLI["training.py --rl_library lib"] --> DISP["dispatch_library_entrypoint()"]
    DISP --> RSL["bench_rsl_rl.py run()"]
    DISP --> RLG["bench_rl_games.py run()"]
    DISP --> SKRL["bench_skrl.py run()"]
    DISP --> SB3["bench_sb3.py run()"]
    RSL & RLG & SKRL & SB3 --> LAUNCH["launch_simulation()"]
    LAUNCH --> TRAIN["Framework training loop"]
    TRAIN --> BUILD["builders.build_training_bundle()"]
    BUILD --> ATTACH["benchmark.attach_bundle()"]
    ATTACH --> FI["benchmark._finalize_impl()"]
    FI --> SCHEMA["SchemaBundleFile TrainingBundle .json"]
    FI --> OTHER["omniperf / json / osmo KPI .json"]

Comments Outside Diff (3)

scripts/benchmarks/rsl_rl/bench_rsl_rl.py, line 781-791 (link)

Redundant if/else with identical bodies

Both branches produce exactly the same OnPolicyRunner(...) constructor call; the only observable difference is the warning print. The condition could be restructured so that the runner creation happens once unconditionally, with only the warning gated behind the else.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
scripts/benchmarks/skrl/bench_skrl.py, line 799-800 (link)

Private attribute access on skrl Runner

runner._trainer is a private attribute of skrl's Runner class. If skrl renames this field, all per-iteration timing, reward, and episode-length series will be empty in the emitted bundle with no error surfaced. Consider exposing the trainer via a property on _BenchmarkRunner instead.
scripts/benchmarks/skrl/bench_skrl.py, line 550-553 (link)

Episode-length key heuristic may silently produce all-zero series

The search requires both "episode" and "timestep" in the key name. If skrl uses "Episode / Length" instead, ep_len_key is always None and iter_ep_lengths will be all zeros without any warning.

_{Reviews (1): Last reviewed commit: "Add unified training benchmark dispatche..." | Re-trigger Greptile}

greptile-apps · 2026-06-16T12:41:07Z

+ROOT = Path(__file__).resolve().parents[3]
+
+_TASK = "Isaac-Cartpole-Direct"
+
+# Top-level keys that identify a schema TrainingBundle (runtime bundle plus ``learning``).
+_TRAINING_BUNDLE_KEYS = {"run", "versions", "hardware", "runtime", "resources", "learning"}
+
+
+def _find_bundle(out_dir: Path, expected_keys: set[str]) -> dict:
+    """Return the parsed JSON whose top-level keys cover ``expected_keys``.
+
+    The schema backend names its file from a timestamped prefix, so the smoke
+    tests glob the output directory rather than hardcode the filename.


_find_bundle duplicated verbatim across all four smoke-test files

The helper is copy-pasted identically into all four smoke test files. Moving it to conftest.py would eliminate the duplication.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

greptile-apps · 2026-06-16T12:41:08Z

+
+        benchmark.attach_bundle(bundle)
+
+        benchmark._finalize_impl()


All adapters call _finalize_impl() as external callers

All four adapters reach into BaseIsaacLabBenchmark's private API. Consider promoting _finalize_impl to a public method or documenting the external-call contract explicitly.

Introduce the capture, metrics, builders, stepping, profiling, and backend_descriptor submodules for assembling the schema-v1 benchmark bundles, add a schema output backend, and let BaseIsaacLabBenchmark emit several backends in one run via a new attach_bundle hook. Unit tests cover each submodule plus the schema backend and multi-backend finalize. Part 1 of a series splitting the oversized benchmark refactor (core -> runtime/startup -> training -> play).

Add backend-agnostic runtime.py (random-action stepping, emits a RuntimeBundle) and startup.py (cProfile startup-phase profiling, emits a StartupBundle), wired to develop's launch API (launch_simulation and add_launcher_args from isaaclab.app; preset tokens forwarded to Hydra without folding). Remove the legacy benchmark_non_rl.py and benchmark_startup.py scripts plus the run_non_rl_benchmarks.sh and run_physx_benchmarks.sh runner shells; repoint benchmark_hydra_resolve at _common.get_backend_type. Part 2 of the benchmark refactor series (core -> runtime/startup -> training -> play); stacked on Part 1 (isaac-sim#6197).

Add training.py dispatching over --rl_library {rsl_rl, rl_games, skrl, sb3}; each adapter runs real training under BenchmarkMonitor and emits a TrainingBundle via the shared core, with an optional success-metric early stop. Scripts use develop's launch API (launch_simulation from isaaclab.app; preset tokens forwarded without folding). Remove the legacy benchmark_rsl_rl.py / benchmark_rlgames.py scripts, the run_training_benchmarks.sh runner shell, and the obsolete utils.py helper. Part 3 of the benchmark refactor series (core -> runtime/startup -> training -> play); stacked on Parts 1-2 (isaac-sim#6197, isaac-sim#6198).

github-actions Bot added documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team labels Jun 16, 2026

AntoineRichard mentioned this pull request Jun 16, 2026

Add play (inference) benchmark with PlayBundle (benchmark refactor, Part 4/5) #6201

Open

7 tasks

AntoineRichard force-pushed the antoiner/benchmark-training branch from 0e66178 to cc2b7a0 Compare June 16, 2026 12:22

AntoineRichard marked this pull request as ready for review June 16, 2026 12:33

AntoineRichard requested review from Mayankm96, jtigue-bdai, kellyguo11 and ooctipus as code owners June 16, 2026 12:33

greptile-apps Bot reviewed Jun 16, 2026

View reviewed changes

AntoineRichard force-pushed the antoiner/benchmark-training branch from cc2b7a0 to 45e2299 Compare June 17, 2026 07:43

AntoineRichard added 2 commits June 17, 2026 10:46

AntoineRichard force-pushed the antoiner/benchmark-training branch from 45e2299 to a2bc539 Compare June 17, 2026 08:54

This was referenced Jun 17, 2026

Remove legacy benchmark scripts (benchmark refactor, Part 5/5) #6206

Open

Add backend-agnostic benchmark core (benchmark refactor, Part 1/5) #6197

Open

AntoineRichard changed the title ~~Add unified training benchmark dispatcher and RL adapters (benchmark refactor, Part 3/4)~~ Add unified training benchmark dispatcher and RL adapters (benchmark refactor, Part 3/5) Jun 17, 2026

AntoineRichard mentioned this pull request Jun 17, 2026

Add unified runtime and startup benchmark scripts (benchmark refactor, Part 2/5) #6198

Open

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add unified training benchmark dispatcher and RL adapters (benchmark refactor, Part 3/5)#6199

Add unified training benchmark dispatcher and RL adapters (benchmark refactor, Part 3/5)#6199
AntoineRichard wants to merge 3 commits into
isaac-sim:developfrom
AntoineRichard:antoiner/benchmark-training

AntoineRichard commented Jun 16, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Jun 16, 2026 •

edited

Loading

Comments Outside Diff (3)

Uh oh!

greptile-apps Bot Jun 16, 2026

Uh oh!

greptile-apps Bot Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

AntoineRichard commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Checklist

Uh oh!

greptile-apps Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Comments Outside Diff (3)

Uh oh!

greptile-apps Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

AntoineRichard commented Jun 16, 2026 •

edited

Loading

greptile-apps Bot commented Jun 16, 2026 •

edited

Loading