Add evaluation visualization gallery #802
Add an HTML gallery that lays out the per-(env, camera, episode) mp4s from a policy-runner output dir into a grid, with a standalone CLI (and optional --serve) for viewing.
- New isaaclab_arena/visualization package with gallery.py.
- --gallery flag on policy_runner renders index.html in the run's video dir after the rollout (rank 0 only).

Signed-off-by: alex <amillane@nvidia.com>
| if args_cli.gallery and get_local_rank() == 0: | ||
| output = build_gallery(video_cfg.video_base_dir) |
assert crash before graceful error message
build_gallery internally calls assert folder.is_dir(), so if a user runs with --gallery but without any recording flag (neither --record_camera_video nor --record_viewport_video), the timestamped video_base_dir is never created by any recorder. build_gallery then raises AssertionError: Not a directory: … before it can return None, meaning the intended user-friendly message on lines 250–253 ("did you pass --record_camera_video?") is never shown. A quick guard in policy_runner.py — checking Path(video_cfg.video_base_dir).is_dir() before calling build_gallery — or replacing the assert in build_gallery with an early return None would fix this.
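A minimal sketch of the second option (early return instead of the assert). `build_gallery_sketch` is a hypothetical stand-in, and treating every `*.mp4` in the folder as a video is an assumption for illustration; the real function filters on the recorder's naming pattern.

```python
from pathlib import Path
from typing import Optional


def build_gallery_sketch(folder: str) -> Optional[Path]:
    """Hypothetical stand-in for build_gallery that never asserts."""
    folder_path = Path(folder).resolve()
    # No recorder ran, so the timestamped video dir was never created.
    if not folder_path.is_dir():
        return None
    # Assumption: any mp4 directly in the folder counts as a video.
    videos = sorted(folder_path.glob("*.mp4"))
    if not videos:
        return None
    output = folder_path / "index.html"
    body = "\n".join(f'<video src="{v.name}" controls></video>' for v in videos)
    output.write_text(f"<html><body>\n{body}\n</body></html>\n")
    return output
```

With this shape, both the missing-directory and the empty-directory case come back as `None`, so the caller's existing "did you pass --record_camera_video?" branch is reached in either case.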
| folder = Path(folder).resolve() | ||
| assert folder.is_dir(), f"Not a directory: {folder}" | ||
| output = Path(output).resolve() if output else folder / "index.html" |
assert statements can be silently disabled when Python is run with the -O (optimize) flag. For input validation that should always fire — especially in a library function — a plain if + raise ValueError is more robust.
| folder = Path(folder).resolve() | |
| assert folder.is_dir(), f"Not a directory: {folder}" | |
| output = Path(output).resolve() if output else folder / "index.html" | |
| folder = Path(folder).resolve() | |
| if not folder.is_dir(): | |
| raise ValueError(f"Not a directory: {folder}") | |
| output = Path(output).resolve() if output else folder / "index.html" |
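For reference, a small self-contained check of the `-O` behaviour described above. The snippet string and the `exit_code` helper are made up for this illustration and are not part of the PR.

```python
import subprocess
import sys

# A failing assert, similar in spirit to the one in build_gallery.
SNIPPET = "assert False, 'Not a directory'"


def exit_code(*flags: str) -> int:
    """Run SNIPPET in a fresh interpreter with the given flags and return its exit code."""
    completed = subprocess.run([sys.executable, *flags, "-c", SNIPPET], capture_output=True)
    return completed.returncode


if __name__ == "__main__":
    print("default:", exit_code())       # non-zero: AssertionError is raised
    print("with -O:", exit_code("-O"))   # zero: the assert is compiled away
```

An explicit `if` + `raise ValueError` is unaffected by the flag, which is why it is the safer choice for input validation.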
| handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=str(directory)) | ||
| url = f"http://localhost:{port}/{filename}" | ||
| with socketserver.TCPServer(("0.0.0.0", port), handler) as httpd: |
TCPServer does not set allow_reuse_address, so if the server is stopped and restarted quickly (e.g., after Ctrl+C during a second evaluation run), the port may still be in TIME_WAIT and the bind will fail with "Address already in use". Setting allow_reuse_address = True avoids this.
| handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=str(directory)) | |
| url = f"http://localhost:{port}/{filename}" | |
| with socketserver.TCPServer(("0.0.0.0", port), handler) as httpd: | |
| handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=str(directory)) | |
| url = f"http://localhost:{port}/{filename}" | |
| socketserver.TCPServer.allow_reuse_address = True | |
| with socketserver.TCPServer(("0.0.0.0", port), handler) as httpd: |
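One possible variation on the suggestion: assigning to `socketserver.TCPServer.allow_reuse_address` changes the flag for every `TCPServer` in the process, whereas a small subclass keeps it local. `ReusableTCPServer` and `make_server` are hypothetical names used only for this sketch.

```python
import functools
import http.server
import socketserver


class ReusableTCPServer(socketserver.TCPServer):
    # Hypothetical subclass: the flag applies only to servers built from it,
    # leaving socketserver.TCPServer itself untouched.
    allow_reuse_address = True


def make_server(directory: str, port: int) -> socketserver.TCPServer:
    """Build (bind and listen) a static-file server for `directory` on `port`."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    return ReusableTCPServer(("0.0.0.0", port), handler)
```

The returned server can be used exactly like the one in the diff, e.g. `with make_server(str(directory), port) as httpd: httpd.serve_forever()`.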
alexmillane left a comment
Self-review. Good effort, Claude. Let's clean up.
| if args_cli.gallery and get_local_rank() == 0: | ||
| output = build_gallery(video_cfg.video_base_dir) | ||
| if output is None: | ||
| print( | ||
| f"[Rank {local_rank}/{world_size}] --gallery: no per-camera videos found in" | ||
| f" {video_cfg.video_base_dir}; nothing to render (did you pass --record_camera_video?)." | ||
| ) | ||
| else: | ||
| print( | ||
| f"[Rank {local_rank}/{world_size}] Gallery written to {output}. View it with:" | ||
| f" python isaaclab_arena/visualization/gallery.py {video_cfg.video_base_dir} --serve" | ||
| ) |
Can we remove the extra step and have the policy runner drop straight into serving the result itself, immediately?
| # Optionally render an HTML gallery of the per-camera per-episode videos. | ||
| # Only the local rank 0 writes it, to avoid races on a shared video dir. |
Shorten to: "Optionally render an HTML report"
| "After the rollout, generate an HTML gallery (index.html) of the per-camera per-episode" | ||
| " videos in --video_base_dir. Requires --record_camera_video to have produced videos." |
Update to: "After the rollout, generate and serve an evaluation report."
| ), | ||
| ) | ||
| parser.add_argument( | ||
| "--gallery", |
Rename to --evaluation_report.
| def main() -> None: | ||
| parser = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter) | ||
| parser.add_argument("folder", type=str, help="Folder of policy-runner output videos to scan.") | ||
| parser.add_argument( | ||
| "-o", | ||
| "--output", | ||
| type=str, | ||
| default=None, | ||
| help="Output HTML path. Defaults to <folder>/index.html.", | ||
| ) | ||
| parser.add_argument( | ||
| "--title", | ||
| type=str, | ||
| default="Evaluation Gallery", | ||
| help="Title and heading for the generated page.", | ||
| ) | ||
| parser.add_argument( | ||
| "--serve", | ||
| action="store_true", | ||
| help=( | ||
| "Host the gallery over HTTP instead of opening a file. Recommended inside the dev container " | ||
| "(reachable from the host browser at http://localhost:<port> thanks to --net=host)." | ||
| ), | ||
| ) | ||
| parser.add_argument( | ||
| "--port", | ||
| type=int, | ||
| default=8000, | ||
| help="Port for --serve. Defaults to 8000.", | ||
| ) | ||
| parser.add_argument( | ||
| "--no-open", | ||
| action="store_true", | ||
| help="Do not open the generated page in a web browser.", | ||
| ) | ||
| args = parser.parse_args() |
Separate parsing of args to another function.
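Roughly what that split could look like. This is a sketch using a subset of the flags from the hunk above; the `argv` parameter, the shortened help strings, and the placeholder body of `main` are assumptions added for illustration.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse CLI arguments; `argv=None` falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Build an HTML gallery from policy-runner videos.")
    parser.add_argument("folder", type=str, help="Folder of policy-runner output videos to scan.")
    parser.add_argument("--title", type=str, default="Evaluation Gallery", help="Title of the page.")
    parser.add_argument("--serve", action="store_true", help="Host the gallery over HTTP.")
    parser.add_argument("--port", type=int, default=8000, help="Port for --serve. Defaults to 8000.")
    parser.add_argument("--no-open", action="store_true", help="Do not open a web browser.")
    return parser.parse_args(argv)


def main(argv: Optional[Sequence[str]] = None) -> None:
    args = parse_args(argv)
    # Placeholder for the real work (build the page, optionally serve it).
    print(f"Would build gallery for {args.folder} (serve={args.serve}, port={args.port})")


if __name__ == "__main__":
    main()
```

Taking `argv` as a parameter also makes the parser easy to exercise without touching `sys.argv`, e.g. `parse_args(["outputs/", "--serve"])`.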
| parser.add_argument( | ||
| "-o", | ||
| "--output", | ||
| type=str, | ||
| default=None, | ||
| help="Output HTML path. Defaults to <folder>/index.html.", | ||
| ) |
Remove this option. Just write to the default path.
| """Generate a browsable HTML gallery from a policy-runner output folder. | ||
| | ||
| Scans a folder for the per-(env, camera, episode) mp4s written by | ||
| ``CameraObsVideoRecorder`` and writes an ``index.html`` laying them out in a | ||
| grid: one row per (env_idx, episode_idx), one column per camera. Videos are | ||
| referenced by relative path, so the html must stay next to the mp4s. | ||
| | ||
| Files that do not match the recorder's naming pattern (e.g. the kit-viewport | ||
| ``rl-video-step-*.mp4``) are ignored. | ||
| | ||
| Usage: | ||
| python isaaclab_arena/visualization/gallery.py outputs/ | ||
| python isaaclab_arena/visualization/gallery.py outputs/ -o gallery.html --title "Run 42" | ||
| | ||
| When running inside the dev container, the host browser cannot be launched | ||
| directly. Use ``--serve`` to host the gallery over HTTP; because the container | ||
| runs with ``--net=host``, the printed ``http://localhost:<port>`` URL opens | ||
| straight from the host browser. | ||
| """ |
Move this description into the argparse description and shorten it drastically; a summary is enough.
| parser.add_argument( | ||
| "--title", | ||
| type=str, | ||
| default="Evaluation Gallery", | ||
| help="Title and heading for the generated page.", | ||
| ) |
| parser.add_argument( | ||
| "--no-open", | ||
| action="store_true", | ||
| help="Do not open the generated page in a web browser.", | ||
| ) |
Could we also add the visualization to the eval_runner?
Address review feedback on the visualization module:
- Rename --gallery to --evaluation_report; policy_runner now builds and serves the report immediately instead of printing a follow-up command.
- Rename gallery.py -> report.py; move the HTML/CSS into report_template.html.
- Introduce EpisodeVideos/VideoGrid dataclasses for the parsed videos; split render into small helpers; scan recursively so eval per-job dirs become groups.
- Slim the standalone CLI (always serve, default output, no browser detection); rename serve_forever -> serve_until_ctrl_c with allow_reuse_address.
- build_report returns None (no assert) on a missing/empty dir.
- Also wire the report into eval_runner.

Signed-off-by: alex <amillane@nvidia.com>
Worked through the review in 5a483ea. Summary of changes:
- Reframed as an 'evaluation report'
- Structure / readability
- CLI slimmed
- greptile
- eval_runner

Verified the module end-to-end on synthetic videos (flat + nested layouts, escaping, graceful no-video case). One call worth confirming: serving now blocks until Ctrl+C at the end of the run. That's the 'drop into serving immediately' behaviour you asked for, but shout if you'd prefer it backgrounded.
| with socketserver.TCPServer(("0.0.0.0", port), handler) as httpd: | ||
| print(f"Serving evaluation report at {url} (Ctrl+C to stop).") | ||
| try: | ||
| httpd.serve_forever() | ||
| except KeyboardInterrupt: | ||
| print("\nStopping server.") |
Unhandled OSError when port is actively in use
allow_reuse_address = True resolves the TIME_WAIT case (rapid restart of this tool), but if port 8000 is already held by another running process — Jupyter, a dev server, a previous evaluation run that was never stopped — TCPServer(("0.0.0.0", port), handler) raises OSError: [Errno 98] Address already in use before the with block is entered. The exception is not caught anywhere in the call chain (serve_until_ctrl_c → build_and_serve_report → main), so the process crashes with a raw Python traceback after what may have been a long evaluation run. A try/except OSError around the TCPServer construction, printing a message like "Port {port} is already in use; you can view the report by opening {output} directly or re-run with --port N.", would let the user recover gracefully.
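A minimal sketch of that guard, assuming the report lives at `<directory>/index.html`. `ReusableTCPServer`, `try_start_server`, and the exact message wording are hypothetical; only `serve_until_ctrl_c` is a name taken from the PR, and its signature here is a guess.

```python
import functools
import http.server
import socketserver
from pathlib import Path
from typing import Optional


class ReusableTCPServer(socketserver.TCPServer):
    # Hypothetical subclass carrying the allow_reuse_address fix from earlier.
    allow_reuse_address = True


def try_start_server(directory: Path, port: int) -> Optional[socketserver.TCPServer]:
    """Return a bound server, or None (with a readable message) if the bind fails."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=str(directory))
    try:
        return ReusableTCPServer(("0.0.0.0", port), handler)
    except OSError as err:
        # Typically "Address already in use" when another process holds the port.
        print(
            f"Could not bind port {port} ({err}). Open {directory / 'index.html'} directly"
            " or re-run with --port N."
        )
        return None


def serve_until_ctrl_c(directory: Path, port: int = 8000) -> bool:
    """Serve `directory` until Ctrl+C; return False if the server could not start."""
    httpd = try_start_server(directory, port)
    if httpd is None:
        return False
    with httpd:
        print(f"Serving evaluation report at http://localhost:{port}/index.html (Ctrl+C to stop).")
        try:
            httpd.serve_forever()
        except KeyboardInterrupt:
            print("\nStopping server.")
    return True
```

Returning a status instead of raising lets the runner finish cleanly after a long evaluation even when the report cannot be served.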
- Model the run as EvaluationReport -> JobReport -> EpisodeVideos; render one per-job section (heading + env x episode grid) instead of a flat table.
- Derive a contiguous per-(job, env) episode index, parsing the rebuild index only to order/disambiguate episodes; rebuilds are no longer surfaced (fixes episode-index collisions across rebuilds).
- Build the report once at the end of the run whenever --record_camera_video is set; --evaluation_report now additionally serves it (no longer blocks by default).
- eval_runner builds a single run-wide report; policy_runner builds its single unnamed-job report.

Signed-off-by: alex <amillane@nvidia.com>
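To illustrate the contiguous episode index described in the commit message, here is one way the renumbering could work. This is a sketch only: `EpisodeKey`, its fields, and `contiguous_episode_indices` are hypothetical and not taken from report.py, and the way the rebuild index is parsed from filenames is left out entirely.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class EpisodeKey(NamedTuple):
    """Hypothetical parsed identity of one episode (shared by all its camera videos)."""
    job: str
    env_idx: int
    rebuild_idx: int
    episode_idx: int  # as recorded; assumed to restart after an environment rebuild


def contiguous_episode_indices(keys: Iterable[EpisodeKey]) -> Dict[EpisodeKey, int]:
    """Map each episode to a 0..N-1 index within its (job, env) group."""
    grouped: Dict[Tuple[str, int], Set[EpisodeKey]] = defaultdict(set)
    for key in keys:
        grouped[(key.job, key.env_idx)].add(key)

    result: Dict[EpisodeKey, int] = {}
    for members in grouped.values():
        # The rebuild index is used only for ordering; it is not shown in the report.
        ordered = sorted(members, key=lambda k: (k.rebuild_idx, k.episode_idx))
        for new_idx, key in enumerate(ordered):
            result[key] = new_idx
    return result
```

Under these assumptions, two episodes that both carry recorded index 0 but come from different rebuilds end up with distinct report indices, which is the collision the commit describes fixing.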
Summary
Add a simple viewer for evaluation results.
Details
--gallery flag on policy_runner renders the gallery in the run's dated video dir after the rollout.