Add evaluation visualization gallery by alexmillane · Pull Request #802 · isaac-sim/IsaacLab-Arena

alexmillane · 2026-06-16T13:58:37Z

Summary

Add a simple viewer for evaluation results.

Details

See attached image. For now, this is an html table with episode and associated video.
Note: It's likely that in the future we'll want something more sophisticated here. For the time being, this is a useful tool to get started.
--gallery flag on policy_runner renders the gallery in the run's dated video dir after the rollout.
TODO: eval runner.

Add an HTML gallery that lays out the per-(env, camera, episode) mp4s from a policy-runner output dir into a grid, with a standalone CLI (and optional --serve) for viewing. - New isaaclab_arena/visualization package with gallery.py. - --gallery flag on policy_runner renders index.html in the run's video dir after the rollout (rank 0 only). Signed-off-by: alex <amillane@nvidia.com>

greptile-apps · 2026-06-16T14:01:19Z

Greptile Summary

This PR adds a new isaaclab_arena/visualization package that scans a rollout video directory, groups mp4s into a per-job × per-episode grid, and writes a self-contained index.html; a --evaluation_report flag on both runners optionally serves it over HTTP after the run.

report.py implements build_report (scans + renders) and serve_until_ctrl_c (blocking HTTP server), wired into both policy_runner.py (rank-0 only, after env.close()) and eval_runner.py (after all jobs complete), each guarded by --record_camera_video.
The HTML template uses string.Template substitution for title/summary/sections; job names, camera names, and video paths are HTML-escaped throughout.
The --evaluation_report flag is additive: the report is always written to disk when camera recording is on, and the flag only controls whether the HTTP server is started afterward.

Confidence Score: 5/5

The change is additive — new visualization module plus two small call sites in existing runners, both guarded by the pre-existing --record_camera_video flag. No existing behaviour is altered.

All new code follows the established patterns in the repo, the report writing and serving paths are well-guarded, and no changes touch core simulation logic. The findings are all minor quality/robustness nits with no functional impact on the current codebase.

isaaclab_arena/visualization/report.py — the four style/robustness suggestions are all in this file.

Important Files Changed

Filename	Overview
isaaclab_arena/visualization/report.py	New module implementing HTML report generation and HTTP serving for evaluation videos; core logic is clean with a few minor robustness gaps.
isaaclab_arena/visualization/report_template.html	New HTML template with sticky headers and inline video player; correct use of $title/$summary/$sections substitution variables, no CSS dollar-signs that would conflict.
isaaclab_arena/evaluation/policy_runner.py	Calls write_report on the timestamped video_base_dir after env.close(), guarded to rank 0 only; straightforward integration.
isaaclab_arena/evaluation/eval_runner.py	Calls write_report on run_video_dir after all jobs complete, guarded by record_camera_video; straightforward integration.
isaaclab_arena/evaluation/policy_runner_cli.py	Adds --evaluation_report flag with accurate help text; no issues.
isaaclab_arena/evaluation/eval_runner_cli.py	Adds --evaluation_report flag parallel to the policy runner's; no issues.
isaaclab_arena/visualization/init.py	New package init with license header only; no issues.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Runner as policy_runner / eval_runner
    participant WR as write_report()
    participant BR as build_report()
    participant SJ as _scan_jobs()
    participant RR as render_report()
    participant Tmpl as report_template.html
    participant FS as Filesystem
    participant SV as serve_until_ctrl_c()

    Runner->>WR: write_report(video_dir, serve)
    WR->>BR: build_report(video_dir)
    BR->>FS: video_dir.is_dir()?
    alt directory missing or empty
        BR-->>WR: None
        WR-->>Runner: print hint, return None
    else videos found
        BR->>SJ: _scan_jobs(video_dir)
        SJ->>FS: "rglob(*.mp4), regex-filter"
        SJ-->>BR: list[JobReport]
        BR->>RR: render_report(EvaluationReport)
        RR->>Tmpl: read_text()
        RR->>RR: Template.safe_substitute(title, summary, sections)
        RR-->>BR: HTML string
        BR->>FS: write index.html
        BR-->>WR: Path(index.html)
        alt "serve=True"
            WR->>SV: serve_until_ctrl_c(dir, port, index.html)
            SV->>SV: TCPServer.serve_forever() blocks until Ctrl+C
        else "serve=False"
            WR-->>Runner: print To view hint
        end
    end

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant Runner as policy_runner / eval_runner
    participant WR as write_report()
    participant BR as build_report()
    participant SJ as _scan_jobs()
    participant RR as render_report()
    participant Tmpl as report_template.html
    participant FS as Filesystem
    participant SV as serve_until_ctrl_c()

    Runner->>WR: write_report(video_dir, serve)
    WR->>BR: build_report(video_dir)
    BR->>FS: video_dir.is_dir()?
    alt directory missing or empty
        BR-->>WR: None
        WR-->>Runner: print hint, return None
    else videos found
        BR->>SJ: _scan_jobs(video_dir)
        SJ->>FS: "rglob(*.mp4), regex-filter"
        SJ-->>BR: list[JobReport]
        BR->>RR: render_report(EvaluationReport)
        RR->>Tmpl: read_text()
        RR->>RR: Template.safe_substitute(title, summary, sections)
        RR-->>BR: HTML string
        BR->>FS: write index.html
        BR-->>WR: Path(index.html)
        alt "serve=True"
            WR->>SV: serve_until_ctrl_c(dir, port, index.html)
            SV->>SV: TCPServer.serve_forever() blocks until Ctrl+C
        else "serve=False"
            WR-->>Runner: print To view hint
        end
    end

_{Reviews (3): Last reviewed commit: "Expand evaluation report to cover the wh..." | Re-trigger Greptile}

greptile-apps · 2026-06-16T14:01:23Z

+        if args_cli.gallery and get_local_rank() == 0:
+            output = build_gallery(video_cfg.video_base_dir)


assert crash before graceful error message

build_gallery internally calls assert folder.is_dir(), so if a user runs with --gallery but without any recording flag (neither --record_camera_video nor --record_viewport_video), the timestamped video_base_dir is never created by any recorder. build_gallery then raises AssertionError: Not a directory: … before it can return None, meaning the intended user-friendly message on lines 250–253 ("did you pass --record_camera_video?") is never shown. A quick guard in policy_runner.py — checking Path(video_cfg.video_base_dir).is_dir() before calling build_gallery — or replacing the assert in build_gallery with an early return None would fix this.

greptile-apps · 2026-06-16T14:01:24Z

+    folder = Path(folder).resolve()
+    assert folder.is_dir(), f"Not a directory: {folder}"
+    output = Path(output).resolve() if output else folder / "index.html"


assert statements can be silently disabled when Python is run with the -O (optimize) flag. For input validation that should always fire — especially in a library function — a plain if + raise ValueError is more robust.

Suggested change

folder = Path(folder).resolve()

assert folder.is_dir(), f"Not a directory: {folder}"

output = Path(output).resolve() if output else folder / "index.html"

folder = Path(folder).resolve()

if not folder.is_dir():

raise ValueError(f"Not a directory: {folder}")

output = Path(output).resolve() if output else folder / "index.html"

greptile-apps · 2026-06-16T14:01:25Z

+    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=str(directory))
+    url = f"http://localhost:{port}/{filename}"
+    with socketserver.TCPServer(("0.0.0.0", port), handler) as httpd:


TCPServer does not set allow_reuse_address, so if the server is stopped and restarted quickly (e.g., after Ctrl+C during a second evaluation run), the port may still be in TIME_WAIT and the bind will fail with "Address already in use". Setting allow_reuse_address = True avoids this.

Suggested change

handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=str(directory))

url = f"http://localhost:{port}/{filename}"

with socketserver.TCPServer(("0.0.0.0", port), handler) as httpd:

handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=str(directory))

url = f"http://localhost:{port}/{filename}"

socketserver.TCPServer.allow_reuse_address = True

with socketserver.TCPServer(("0.0.0.0", port), handler) as httpd:

alexmillane

Self-review. Good effort claude. Let's clean up.

alexmillane · 2026-06-16T15:56:34Z

+        if args_cli.gallery and get_local_rank() == 0:
+            output = build_gallery(video_cfg.video_base_dir)
+            if output is None:
+                print(
+                    f"[Rank {local_rank}/{world_size}] --gallery: no per-camera videos found in"
+                    f" {video_cfg.video_base_dir}; nothing to render (did you pass --record_camera_video?)."
+                )
+            else:
+                print(
+                    f"[Rank {local_rank}/{world_size}] Gallery written to {output}. View it with:"
+                    f" python isaaclab_arena/visualization/gallery.py {video_cfg.video_base_dir} --serve"
+                )


Can we remove the extra step and have the policy runner drop back into serving the result itself/immediately.

alexmillane · 2026-06-16T15:56:59Z

+        # Optionally render an HTML gallery of the per-camera per-episode videos.
+        # Only the local rank 0 writes it, to avoid races on a shared video dir.


Shorten to: "Optionally render an HTML report"

alexmillane · 2026-06-16T15:57:33Z

+            "After the rollout, generate an HTML gallery (index.html) of the per-camera per-episode"
+            " videos in --video_base_dir. Requires --record_camera_video to have produced videos."


Update to: "after the generate and serve an evaluation report."

alexmillane · 2026-06-16T15:57:56Z

        ),
    )
+    parser.add_argument(
+        "--gallery",


evaluation_report

alexmillane · 2026-06-16T15:58:31Z

+def main() -> None:
+    parser = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter)
+    parser.add_argument("folder", type=str, help="Folder of policy-runner output videos to scan.")
+    parser.add_argument(
+        "-o",
+        "--output",
+        type=str,
+        default=None,
+        help="Output HTML path. Defaults to <folder>/index.html.",
+    )
+    parser.add_argument(
+        "--title",
+        type=str,
+        default="Evaluation Gallery",
+        help="Title and heading for the generated page.",
+    )
+    parser.add_argument(
+        "--serve",
+        action="store_true",
+        help=(
+            "Host the gallery over HTTP instead of opening a file. Recommended inside the dev container "
+            "(reachable from the host browser at http://localhost:<port> thanks to --net=host)."
+        ),
+    )
+    parser.add_argument(
+        "--port",
+        type=int,
+        default=8000,
+        help="Port for --serve. Defaults to 8000.",
+    )
+    parser.add_argument(
+        "--no-open",
+        action="store_true",
+        help="Do not open the generated page in a web browser.",
+    )
+    args = parser.parse_args()


Separate parsing of args to another function.

alexmillane · 2026-06-16T16:04:18Z

+    parser.add_argument(
+        "-o",
+        "--output",
+        type=str,
+        default=None,
+        help="Output HTML path. Defaults to <folder>/index.html.",
+    )


remove option. just write to the default.

alexmillane · 2026-06-16T16:04:51Z

+"""Generate a browsable HTML gallery from a policy-runner output folder.
+
+Scans a folder for the per-(env, camera, episode) mp4s written by
+``CameraObsVideoRecorder`` and writes an ``index.html`` laying them out in a
+grid: one row per (env_idx, episode_idx), one column per camera. Videos are
+referenced by relative path, so the html must stay next to the mp4s.
+
+Files that do not match the recorder's naming pattern (e.g. the kit-viewport
+``rl-video-step-*.mp4``) are ignored.
+
+Usage:
+    python isaaclab_arena/visualization/gallery.py outputs/
+    python isaaclab_arena/visualization/gallery.py outputs/ -o gallery.html --title "Run 42"
+
+When running inside the dev container, the host browser cannot be launched
+directly. Use ``--serve`` to host the gallery over HTTP; because the container
+runs with ``--net=host``, the printed ``http://localhost:<port>`` URL opens
+straight from the host browser.
+"""


Move this description into the argparse.

shorten drastically. summarize.

alexmillane · 2026-06-16T16:05:03Z

+    parser.add_argument(
+        "--title",
+        type=str,
+        default="Evaluation Gallery",
+        help="Title and heading for the generated page.",
+    )


alexmillane · 2026-06-16T16:05:13Z

+    parser.add_argument(
+        "--no-open",
+        action="store_true",
+        help="Do not open the generated page in a web browser.",
+    )


alexmillane · 2026-06-16T16:05:44Z

Could we also add the visualization to the eval_runner?

Address review feedback on the visualization module: - Rename --gallery to --evaluation_report; policy_runner now builds and serves the report immediately instead of printing a follow-up command. - Rename gallery.py -> report.py; move the HTML/CSS into report_template.html. - Introduce EpisodeVideos/VideoGrid dataclasses for the parsed videos; split render into small helpers; scan recursively so eval per-job dirs become groups. - Slim the standalone CLI (always serve, default output, no browser detection); rename serve_forever -> serve_until_ctrl_c with allow_reuse_address. - build_report returns None (no assert) on a missing/empty dir. - Also wire the report into eval_runner. Signed-off-by: alex <amillane@nvidia.com>

alexmillane · 2026-06-16T16:14:30Z

Worked through the review in 5a483ea. Summary of changes:

Reframed as an 'evaluation report'

--gallery → --evaluation_report; help reworded to generate and serve an evaluation report.
policy_runner now builds and serves immediately (no more 'run this command to view it' step); comment shortened to render an HTML report.
gallery.py → report.py, build_gallery → build_report/build_and_serve_report.

Structure / readability

HTML+CSS moved out into report_template.html (loaded via string.Template).
Parsed output is now EpisodeVideos/VideoGrid dataclasses instead of the nested dict[tuple,...]; the setdefault now has a clarifying comment.
render_report split into _render_row / _render_row_label / _render_video_cell.
import pathlib (module-qualified pathlib.Path).
Module docstring shortened to one line; the long description moved into the argparse description.

CLI slimmed

Always serve; removed --serve, --output (writes default index.html), --title, --no-open, and the in_container detection.
serve_forever → serve_until_ctrl_c; arg parsing extracted to _parse_args.

greptile

P1 (assert crash before the friendly message): build_report now returns None on a missing/empty dir — no assert.
P2 (assert under -O): assert removed entirely.
P2 (TIME_WAIT bind): allow_reuse_address = True.

eval_runner

Added --evaluation_report there too; the scan now recurses, so each job's sub-directory shows up as a row group.

Verified the module end-to-end on synthetic videos (flat + nested layouts, escaping, graceful no-video case). One call worth confirming: serving now blocks until Ctrl+C at the end of the run — that's the 'drop into serving immediately' behaviour you asked for, but shout if you'd prefer it backgrounded.

greptile-apps · 2026-06-16T16:19:15Z

+    with socketserver.TCPServer(("0.0.0.0", port), handler) as httpd:
+        print(f"Serving evaluation report at {url} (Ctrl+C to stop).")
+        try:
+            httpd.serve_forever()
+        except KeyboardInterrupt:
+            print("\nStopping server.")


Unhandled OSError when port is actively in use

allow_reuse_address = True resolves the TIME_WAIT case (rapid restart of this tool), but if port 8000 is already held by another running process — Jupyter, a dev server, a previous evaluation run that was never stopped — TCPServer(("0.0.0.0", port), handler) raises OSError: [Errno 98] Address already in use before the with block is entered. The exception is not caught anywhere in the call chain (serve_until_ctrl_c → build_and_serve_report → main), so the process crashes with a raw Python traceback after what may have been a long evaluation run. A try/except OSError around the TCPServer construction, printing a message like "Port {port} is already in use; you can view the report by opening {output} directly or re-run with --port N.", would let the user recover gracefully.

- Model the run as EvaluationReport -> JobReport -> EpisodeVideos; render one per-job section (heading + env x episode grid) instead of a flat table. - Derive a contiguous per-(job, env) episode index, parsing the rebuild index only to order/disambiguate episodes; rebuilds are no longer surfaced (fixes episode-index collisions across rebuilds). - Build the report once at the end of the run whenever --record_camera_video is set; --evaluation_report now additionally serves it (no longer blocks by default). - eval_runner builds a single run-wide report; policy_runner builds its single unnamed-job report. Signed-off-by: alex <amillane@nvidia.com>

alexmillane requested review from cvolkcvolk, peterd-NV, qianl-nv, viiik-inside, xyao-nv and zhx06 as code owners June 16, 2026 13:58

greptile-apps Bot reviewed Jun 16, 2026

View reviewed changes

alexmillane commented Jun 16, 2026

View reviewed changes

greptile-apps Bot reviewed Jun 16, 2026

View reviewed changes

		if args_cli.gallery and get_local_rank() == 0:
		output = build_gallery(video_cfg.video_base_dir)

		# Optionally render an HTML gallery of the per-camera per-episode videos.
		# Only the local rank 0 writes it, to avoid races on a shared video dir.

		"After the rollout, generate an HTML gallery (index.html) of the per-camera per-episode"
		" videos in --video_base_dir. Requires --record_camera_video to have produced videos."

Conversation

alexmillane commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Details

Uh oh!

greptile-apps Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

alexmillane left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alexmillane commented Jun 16, 2026

Uh oh!

greptile-apps Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

alexmillane commented Jun 16, 2026 •

edited

Loading

greptile-apps Bot commented Jun 16, 2026 •

edited

Loading