Add backend-agnostic benchmark core (benchmark refactor, Part 1/5) #6197
AntoineRichard wants to merge 1 commit into
Add backend-agnostic runtime.py (random-action stepping, emits a RuntimeBundle) and startup.py (cProfile startup-phase profiling, emits a StartupBundle), wired to develop's launch API (launch_simulation and add_launcher_args from isaaclab.app; preset tokens forwarded to Hydra without folding). Remove the legacy benchmark_non_rl.py and benchmark_startup.py scripts plus the run_non_rl_benchmarks.sh and run_physx_benchmarks.sh runner shells; repoint benchmark_hydra_resolve at _common.get_backend_type. Part 2 of the benchmark refactor series (core -> runtime/startup -> training -> play); stacked on Part 1 (isaac-sim#6197).
Add training.py dispatching over --rl_library {rsl_rl, rl_games, skrl,
sb3}; each adapter runs real training under BenchmarkMonitor and emits a
TrainingBundle via the shared core, with an optional success-metric early
stop. Scripts use develop's launch API (launch_simulation from
isaaclab.app; preset tokens forwarded without folding). Remove the legacy
benchmark_rsl_rl.py / benchmark_rlgames.py scripts, the
run_training_benchmarks.sh runner shell, and the obsolete utils.py helper.
Part 3 of the benchmark refactor series (core -> runtime/startup ->
training -> play); stacked on Parts 1-2 (isaac-sim#6197, isaac-sim#6198).
Greptile Summary
This PR adds a backend-agnostic benchmark core under
Confidence Score: 4/5
The new modules are well-isolated and the multi-backend extension is backwards-compatible. One logic issue in The
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["BaseIsaacLabBenchmark\n(benchmark_core.py)"] -->|"attach_bundle(bundle)"| B["_bundle"]
A -->|"_finalize_impl()"| C{multi-backend?}
C -->|"single"| D["metrics.finalize(path, prefix)"]
C -->|"multiple"| E["metrics.finalize(path, prefix_key)\nfor each backend"]
D --> F1["JSONFileMetrics"]
D --> F2["OmniPerfKPIFile"]
D --> F3["SummaryMetrics"]
D --> F4["SchemaBundleFile"]
E --> F1
E --> F2
E --> F3
E --> F4
F4 -->|"bundle kwarg"| G["serialize.write_bundle_file()"]
subgraph "New schema pipeline"
H["capture.py\n(versions/hardware/resources)"] --> I["builders.py\n(RuntimeBundle / TrainingBundle)"]
J["metrics.py\n(parse_tf_logs, ema, convergence)"] --> I
K["profiling.py\n(cProfile parsing)"] --> L["builders.build_startup_bundle()"]
I --> B
L --> B
end
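The finalize path in the flowchart can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the `Benchmark`, `JSONBackend`, and `SchemaBackend` classes are hypothetical stand-ins, and the `prefix_key` naming (`f"{prefix}_{key}"`) is an assumption about how per-backend filenames might be kept distinct. The only behaviors taken from the flowchart are that a bundle is attached once, that a single backend keeps the plain prefix, and that the schema backend receives the bundle as a keyword argument.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


class JSONBackend:
    """Hypothetical backend that ignores the attached bundle."""

    def finalize(self, out_dir: Path, prefix: str, **kwargs: Any) -> Path:
        path = out_dir / f"{prefix}.json"
        path.write_text(json.dumps({"backend": "json"}))
        return path


class SchemaBackend:
    """Hypothetical backend that serializes the bundle passed as a kwarg."""

    def finalize(self, out_dir: Path, prefix: str, bundle: Any = None, **kwargs: Any) -> Path:
        path = out_dir / f"{prefix}.schema.json"
        path.write_text(json.dumps({"bundle": bundle}))
        return path


class Benchmark:
    """Hypothetical stand-in for the benchmark base class in the flowchart."""

    def __init__(self, backends: dict[str, Any]) -> None:
        self._backends = backends
        self._bundle: Any = None

    def attach_bundle(self, bundle: Any) -> None:
        # Store the bundle so every backend can see it at finalize time.
        self._bundle = bundle

    def finalize(self, out_dir: Path, prefix: str) -> list[Path]:
        single = len(self._backends) == 1
        written = []
        for key, backend in self._backends.items():
            # Single backend: unchanged filename. Several: one file per backend key.
            name = prefix if single else f"{prefix}_{key}"
            written.append(backend.finalize(out_dir, name, bundle=self._bundle))
        return written
```

Passing the bundle as a keyword argument lets backends that do not care about it accept and ignore it through `**kwargs`, which is one way to keep existing backends untouched.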
iter_times = list(iteration_times_s)
iter_per_s = [1.0 / t for t in iter_times if t > 0]
Zero-time iterations silently excluded from iterations_per_s but counted in iterations_completed
iter_per_s is computed only for iterations where t > 0, but iterations_completed always equals len(iter_times). If any iteration reports a wall time of exactly 0 (e.g., a mocked timer in tests or a very fast GPU step rounded down), iterations_completed and iterations_per_s.mean become inconsistent without any warning. This discrepancy is not documented in the docstring.
Suggested change:
- iter_times = list(iteration_times_s)
- iter_per_s = [1.0 / t for t in iter_times if t > 0]
+ iter_times = list(iteration_times_s)
+ _zero_count = sum(1 for t in iter_times if t <= 0)
+ if _zero_count:
+     import logging as _logging  # noqa: PLC0415
+     _logging.getLogger(__name__).warning(
+         "%d iteration(s) had non-positive wall time and are excluded from iterations_per_s.",
+         _zero_count,
+     )
+ iter_per_s = [1.0 / t for t in iter_times if t > 0]
Intentional, and not a real discrepancy. The t > 0 guard is only to avoid a divide-by-zero — real perf_counter_ns iteration times are always positive, so in practice no iteration is ever excluded and iterations_completed == len(iter_per_s).
iterations_completed (the number of iterations that ran) and iterations_per_s (a rate aggregated over timed iterations) intentionally measure different quantities; dropping a hypothetical zero-wall-time sample from a rate is the correct behavior, not a silent inconsistency. A zero wall-time would only arise under a mocked timer, where the proposed warning would add noise to every such test. Leaving as-is.
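To make the two quantities in this thread concrete, here is a small sketch of the behavior being discussed. The function name `summarize_iterations` and the extra `iterations_timed` field are hypothetical and are not part of the PR; only the two lines that compute `iter_times` and `iter_per_s` come from the diff under review.

```python
from statistics import mean


def summarize_iterations(iteration_times_s):
    """Hypothetical helper: count every iteration, rate only the timed ones."""
    iter_times = list(iteration_times_s)
    # Guard against divide-by-zero; non-positive samples are left out of the rate.
    iter_per_s = [1.0 / t for t in iter_times if t > 0]
    return {
        "iterations_completed": len(iter_times),
        "iterations_timed": len(iter_per_s),  # assumption: not in the PR
        "iterations_per_s_mean": mean(iter_per_s) if iter_per_s else None,
    }
```

With real `perf_counter_ns` timings every sample is positive, so the two counts match; they differ only if a zero or negative sample is injected, for example by a mocked timer.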
Introduce the capture, metrics, builders, stepping, profiling, and backend_descriptor submodules for assembling the schema-v1 benchmark bundles, add a schema output backend, and let BaseIsaacLabBenchmark emit several backends in one run via a new attach_bundle hook. Unit tests cover each submodule plus the schema backend and multi-backend finalize. Part 1 of a series splitting the oversized benchmark refactor (core -> runtime/startup -> training -> play).
Description
Part 1 of 5 of a series that splits the (oversized) benchmark refactor into reviewable, independently-mergeable PRs:
- isaaclab.test.benchmark.
- runtime.py + startup.py entry scripts (depends on this PR).
- training.py dispatcher + rsl_rl / rl_games / skrl / sb3 adapters (depends on Parts 1–2).
- benchmark_* / run_*.sh / utils.py scripts.

Parts 1–4 are purely additive: they add the new suite alongside the existing scripts, which keep working unchanged. The legacy scripts are removed only in Part 5/5, so downstream consumers (OmniPerf ingestion, job runners) can migrate at their own pace.
This PR adds the reusable core the entry scripts build on:
isaaclab.test.benchmark:
- capture: versions / hardware / resources / run-id capture from the benchmark recorders.
- metrics: TensorBoard log parsing, convergence detection, EMA, mean/std/peak, success-rate tracking.
- builders: assemble the schema-v1 RuntimeBundle / TrainingBundle / StartupBundle.
- stepping: backend-agnostic random-action stepping loop.
- profiling: cProfile stats parsing (own/cumulative time, call counts).
- backend_descriptor: per-RL-library TensorBoard tag descriptors.
- schema output backend that serializes a benchmark bundle through the existing BaseIsaacLabBenchmark metrics-backend system, plus multi-backend support (e.g. --benchmark_backend schema,omniperf) via a new attach_bundle hook. Single-backend behavior and filenames are unchanged, so existing benchmarks are unaffected.

Note: the shared GPU/memory recorders gain additive peak rows (GPU [i] Memory Used peak, System Memory RSS/VMS/USS peak); every pre-existing KPI row is unchanged. This is the only change visible in the legacy scripts' OmniPerf/JSON output.

Builds on the v1.0 benchmark schema merged in #5840. No entry scripts or docs change here; those land with Parts 2–3.
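As an illustration of the kind of cProfile stats parsing the profiling submodule is described as doing (own/cumulative time, call counts), here is a minimal standard-library sketch. The helper names `profile_call` and `top_functions` and the row field names are hypothetical and do not reflect the PR's API. The sketch reads the `stats` dictionary on `pstats.Stats`, which is widely used but only lightly documented.

```python
import cProfile
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return the collected pstats.Stats."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args, **kwargs)
    finally:
        profiler.disable()
    return pstats.Stats(profiler)


def top_functions(stats, limit=5):
    """Return the `limit` entries with the largest cumulative time."""
    rows = []
    # Each entry maps (filename, lineno, funcname) to
    # (primitive calls, total calls, own time, cumulative time, callers).
    for (filename, lineno, name), (_cc, nc, tt, ct, _callers) in stats.stats.items():
        rows.append(
            {
                "function": f"{filename}:{lineno}({name})",
                "calls": nc,
                "own_time_s": tt,
                "cumulative_time_s": ct,
            }
        )
    rows.sort(key=lambda row: row["cumulative_time_s"], reverse=True)
    return rows[:limit]
```

A startup-phase profile could then be summarized with something like `top_functions(profile_call(some_startup_function))`, where `some_startup_function` is whatever callable is being measured.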
Fixes # (n/a)
Type of change
Checklist
- pre-commit checks with ./isaaclab.sh --format
- source/<pkg>/changelog.d/ for every touched package
- CONTRIBUTORS.md or my name already exists there