Remove legacy benchmark scripts (benchmark refactor, Part 5/5)#6206
Remove legacy benchmark scripts (benchmark refactor, Part 5/5)#6206AntoineRichard wants to merge 17 commits into
Conversation
Introduce the capture, metrics, builders, stepping, profiling, and backend_descriptor submodules for assembling the schema-v1 benchmark bundles, add a schema output backend, and let BaseIsaacLabBenchmark emit several backends in one run via a new attach_bundle hook. Unit tests cover each submodule plus the schema backend and multi-backend finalize. Part 1 of a series splitting the oversized benchmark refactor (core -> runtime/startup -> training -> play).
Add backend-agnostic runtime.py (random-action stepping, emits a RuntimeBundle) and startup.py (cProfile startup-phase profiling, emits a StartupBundle), wired to develop's launch API (launch_simulation and add_launcher_args from isaaclab.app; preset tokens forwarded to Hydra without folding). Remove the legacy benchmark_non_rl.py and benchmark_startup.py scripts plus the run_non_rl_benchmarks.sh and run_physx_benchmarks.sh runner shells; repoint benchmark_hydra_resolve at _common.get_backend_type. Part 2 of the benchmark refactor series (core -> runtime/startup -> training -> play); stacked on Part 1 (isaac-sim#6197).
Add training.py dispatching over --rl_library {rsl_rl, rl_games, skrl,
sb3}; each adapter runs real training under BenchmarkMonitor and emits a
TrainingBundle via the shared core, with an optional success-metric early
stop. Scripts use develop's launch API (launch_simulation from
isaaclab.app; preset tokens forwarded without folding). Remove the legacy
benchmark_rsl_rl.py / benchmark_rlgames.py scripts, the
run_training_benchmarks.sh runner shell, and the obsolete utils.py helper.
Part 3 of the benchmark refactor series (core -> runtime/startup ->
training -> play); stacked on Parts 1-2 (isaac-sim#6197, isaac-sim#6198).
Introduce scripts/benchmarks/play.py, a --rl_library dispatcher mirroring training.py, plus the rsl_rl inference adapter scripts/benchmarks/rsl_rl/bench_play_rsl_rl.py. The adapter resolves a checkpoint via resolve_play_checkpoint, loads the policy the way the rsl_rl play script does, rolls it out under a BenchmarkMonitor using run_play_loop, and emits a PlayBundle.
Roll out a checkpointed skrl policy under a BenchmarkMonitor and emit a PlayBundle. The skrl env wrapper returns reward and done tensors shaped (num_envs, 1); reshape them to (num_envs,) in run_play_loop so the per-environment return accumulator broadcasts correctly across backends.
Roll out a checkpointed Stable-Baselines3 policy under a BenchmarkMonitor and emit a PlayBundle. The sb3 vec env returns NumPy reward/done arrays and a per-environment list of info dicts; coerce reward and dones onto the env device in run_play_loop so CPU NumPy returns do not clash with the on-device accumulators, and skip success extraction when the info value is not a dict.
- rl_games adapter: read obs_groups/concate_obs_groups from the agent cfg and pass them to RlGamesVecEnvWrapper, so tasks with asymmetric/non-default observation layouts feed the policy the same observation it was trained on. - _common: key the published-checkpoint lookup on the bare training-task name (drop any namespace prefix and the -Play suffix), matching the reinforcement_learning play scripts; add a unit-testable _published_task_name helper. - rl_games adapter: drop the inaccurate RNN-state-reset claim from the policy docstring.
Add PlayBundle to the attach_bundle and SchemaBundleFile.finalize bundle-type unions, and trim the over-verbose reshape comment in run_play_loop.
Remove the per-backend benchmark entry points (benchmark_non_rl.py, benchmark_startup.py, benchmark_rsl_rl.py, benchmark_rlgames.py), the run_*.sh runner shells, and the obsolete utils.py helper, all of which are replaced by the unified runtime.py / startup.py / training.py scripts added earlier in this series. Repoint benchmark_hydra_resolve.py at _common for get_backend_type and document the old->new command mapping in the 3.0 migration guide. This removal is split into its own PR so the unified scripts (Parts 1-4) can merge while the legacy scripts remain in place, letting downstream consumers (OmniPerf ingestion, job runners) migrate at their own pace before the old entry points disappear.
Greptile SummaryThis PR is Part 5/5 of the benchmark refactor series, removing legacy per-backend benchmark entry scripts (
Confidence Score: 4/5The PR is safe to merge. All changes are benchmark-tooling only with no impact on the simulation runtime or training logic. The refactor is thorough, well-tested, and the migration guide is complete. The _published_task_name issue with .replace is real but low-risk given Isaac Lab task naming conventions. The at_iteration_boundary false-positive at step_count=0 is not triggered by any existing caller. The output_file_path property silently returns a nonexistent path in multi-backend mode, which could surprise a future caller. scripts/benchmarks/_common.py (removesuffix fix), source/isaaclab/isaaclab/test/benchmark/metrics.py (at_iteration_boundary guard), source/isaaclab/isaaclab/test/benchmark/benchmark_core.py (output_file_path multi-backend correction) Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[training.py / play.py / runtime.py / startup.py] -->|--rl_library dispatch| B[bench_rsl_rl / bench_rl_games / bench_sb3 / bench_skrl]
A -->|no --rl_library| C[runtime.py direct loop]
B --> D[BaseIsaacLabBenchmark]
C --> D
D -->|backend_type list| E{Multi-backend?}
E -->|single| F[One backend: filename = prefix.json]
E -->|multi| G[N backends: filename = prefix_key.json each]
F --> H[SchemaBundleFile / OmniPerfKPIFile / JSONFileMetrics / OsmoKPIFile / SummaryMetrics]
G --> H
H -->|schema backend| I[write_bundle_file → typed JSON]
H -->|other backends| J[flat KPI JSON / stdout summary]
D --> K[BenchmarkMonitor background thread]
K -->|interval| L[update_manual_recorders]
L --> M[GPUInfoRecorder / CPUInfoRecorder / MemoryInfoRecorder / VersionInfoRecorder]
M --> D
D -->|attach_bundle| N[RuntimeBundle / TrainingBundle / PlayBundle / StartupBundle]
N -->|schema finalize| I
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
A[training.py / play.py / runtime.py / startup.py] -->|--rl_library dispatch| B[bench_rsl_rl / bench_rl_games / bench_sb3 / bench_skrl]
A -->|no --rl_library| C[runtime.py direct loop]
B --> D[BaseIsaacLabBenchmark]
C --> D
D -->|backend_type list| E{Multi-backend?}
E -->|single| F[One backend: filename = prefix.json]
E -->|multi| G[N backends: filename = prefix_key.json each]
F --> H[SchemaBundleFile / OmniPerfKPIFile / JSONFileMetrics / OsmoKPIFile / SummaryMetrics]
G --> H
H -->|schema backend| I[write_bundle_file → typed JSON]
H -->|other backends| J[flat KPI JSON / stdout summary]
D --> K[BenchmarkMonitor background thread]
K -->|interval| L[update_manual_recorders]
L --> M[GPUInfoRecorder / CPUInfoRecorder / MemoryInfoRecorder / VersionInfoRecorder]
M --> D
D -->|attach_bundle| N[RuntimeBundle / TrainingBundle / PlayBundle / StartupBundle]
N -->|schema finalize| I
|
| Returns: | ||
| The bare training-task name to query for a published checkpoint. | ||
| """ | ||
| return task.split(":")[-1].replace("-Play", "") |
There was a problem hiding this comment.
str.replace("-Play", "") removes every non-overlapping occurrence of the substring, not just the trailing suffix. A task whose name contains "-Play" as an infix — e.g. "Isaac-Play-Cube-Direct" — would be incorrectly transformed to "Isaac-Cube-Direct". Use str.removesuffix (Python 3.9+, already in scope throughout this codebase) to strip only the trailing occurrence.
| return task.split(":")[-1].replace("-Play", "") | |
| return task.split(":")[-1].removesuffix("-Play") |
| Integrations that call :meth:`record_step` more or fewer times per env step will | ||
| break iteration accounting. | ||
| """ | ||
| return self.num_steps_per_env > 0 and self._step_count % self.num_steps_per_env == 0 |
There was a problem hiding this comment.
_step_count starts at 0, so 0 % num_steps_per_env == 0 is always True for any positive num_steps_per_env. A freshly constructed SuccessRateTracker incorrectly reports at_iteration_boundary = True before a single step has been recorded. The wrapper guards against this in practice (it only checks after record_step), but the property is misleading and would cause an off-by-one if ever checked eagerly. Excluding _step_count == 0 fixes the edge case.
| return self.num_steps_per_env > 0 and self._step_count % self.num_steps_per_env == 0 | |
| return self.num_steps_per_env > 0 and self._step_count > 0 and self._step_count % self.num_steps_per_env == 0 |
| out: list[str] = [] | ||
| for tok in cli_backend.split(","): | ||
| tok = tok.strip() | ||
| if not tok: | ||
| continue | ||
| canon = get_backend_type(tok) | ||
| if canon not in out: | ||
| out.append(canon) | ||
| return out or ["omniperf"] |
There was a problem hiding this comment.
Silent fallback for unknown backend tokens
get_backend_types maps any unrecognised token through get_backend_type, which silently falls back to "omniperf". If a user passes --benchmark_backend schma (typo) they receive the omniperf backend with no diagnostic. Adding a logger.warning for unrecognised tokens would make misconfigurations detectable without changing existing behaviour for callers that rely on the fallback.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Description
Part 5 of 5 of the benchmark refactor series — removes the legacy benchmark scripts now superseded by the unified suite added in Parts 1–4.
Series: Part 1/5 core (#6197) → Part 2/5 runtime + startup (#6198) → Part 3/5 training (#6199) → Part 4/5 play (#6201) → Part 5/5 cleanup (this PR).
Parts 1–4 add the unified
runtime.py/startup.py/training.py/play.pysuite alongside the legacy scripts, so nothing breaks while they merge. This final PR removes the legacy scripts once downstream consumers (OmniPerf ingestion, job runners) have migrated — kept as a draft; merge it last, on the consumers' signal.Removes:
benchmark_non_rl.py→runtime.pybenchmark_startup.py→startup.pybenchmark_rsl_rl.py→training.py --rl_library rsl_rlbenchmark_rlgames.py→training.py --rl_library rl_gamesrun_non_rl_benchmarks.sh,run_physx_benchmarks.sh,run_training_benchmarks.sh— express their behavior with script args +presets=; run the PhysX micro-benchmarks undersource/isaaclab_physx/benchmark/directly.scripts/benchmarks/utils.py(obsolete helper) and the orphanedtest/test_training_metrics.py.Also repoints
benchmark_hydra_resolve.pyat_commonforget_backend_type, and adds the "Benchmark Scripts" section to the 3.0 migration guide documenting the full old → new command mapping (includingplay.py).Backward-compatibility note (outputs): the unified scripts emit the same OmniPerf / JSON / Osmo / Summary KPI files as the legacy scripts — every pre-existing KPI row is byte-identical in name, value, and unit. The only difference is four additive peak rows contributed by the shared recorders in Part 1 (
GPU [i] Memory Used peak,System Memory RSS/VMS/USS peak). Nothing is renamed, removed, or recomputed. Direct callers of the removed scripts must switch to the unified entry points per the migration guide.Fixes # (n/a)
Type of change
Checklist
pre-commitchecks with./isaaclab.sh --formatsource/<pkg>/changelog.d/for every touched packageCONTRIBUTORS.mdor my name already exists there