Skip to content

Add evaluation visualization gallery#802

Open
alexmillane wants to merge 3 commits into
alex/refactor/clean_up_env_camera_wrapperfrom
alex/feature/evaluation_visualization
Open

Add evaluation visualization gallery#802
alexmillane wants to merge 3 commits into
alex/refactor/clean_up_env_camera_wrapperfrom
alex/feature/evaluation_visualization

Conversation

@alexmillane

@alexmillane alexmillane commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

Summary

Add a simple viewer for evaluation results.

Details

  • See attached image. For now, this is an html table with episode and associated video.
  • Note: It's likely that in the future we'll want something more sophisticated here. For the time being, this is a useful tool to get started.
  • --gallery flag on policy_runner renders the gallery in the run's dated video dir after the rollout.
  • TODO: eval runner.

Add an HTML gallery that lays out the per-(env, camera, episode) mp4s from a
policy-runner output dir into a grid, with a standalone CLI (and optional
--serve) for viewing.

- New isaaclab_arena/visualization package with gallery.py.
- --gallery flag on policy_runner renders index.html in the run's video dir
  after the rollout (rank 0 only).

Signed-off-by: alex <amillane@nvidia.com>
@greptile-apps

greptile-apps Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds a new isaaclab_arena/visualization package that scans a rollout video directory, groups mp4s into a per-job × per-episode grid, and writes a self-contained index.html; a --evaluation_report flag on both runners optionally serves it over HTTP after the run.

  • report.py implements build_report (scans + renders) and serve_until_ctrl_c (blocking HTTP server), wired into both policy_runner.py (rank-0 only, after env.close()) and eval_runner.py (after all jobs complete), each guarded by --record_camera_video.
  • The HTML template uses string.Template substitution for title/summary/sections; job names, camera names, and video paths are HTML-escaped throughout.
  • The --evaluation_report flag is additive: the report is always written to disk when camera recording is on, and the flag only controls whether the HTTP server is started afterward.

Confidence Score: 5/5

The change is additive — new visualization module plus two small call sites in existing runners, both guarded by the pre-existing --record_camera_video flag. No existing behaviour is altered.

All new code follows the established patterns in the repo, the report writing and serving paths are well-guarded, and no changes touch core simulation logic. The findings are all minor quality/robustness nits with no functional impact on the current codebase.

isaaclab_arena/visualization/report.py — the four style/robustness suggestions are all in this file.

Important Files Changed

Filename Overview
isaaclab_arena/visualization/report.py New module implementing HTML report generation and HTTP serving for evaluation videos; core logic is clean with a few minor robustness gaps.
isaaclab_arena/visualization/report_template.html New HTML template with sticky headers and inline video player; correct use of $title/$summary/$sections substitution variables, no CSS dollar-signs that would conflict.
isaaclab_arena/evaluation/policy_runner.py Calls write_report on the timestamped video_base_dir after env.close(), guarded to rank 0 only; straightforward integration.
isaaclab_arena/evaluation/eval_runner.py Calls write_report on run_video_dir after all jobs complete, guarded by record_camera_video; straightforward integration.
isaaclab_arena/evaluation/policy_runner_cli.py Adds --evaluation_report flag with accurate help text; no issues.
isaaclab_arena/evaluation/eval_runner_cli.py Adds --evaluation_report flag parallel to the policy runner's; no issues.
isaaclab_arena/visualization/init.py New package init with license header only; no issues.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Runner as policy_runner / eval_runner
    participant WR as write_report()
    participant BR as build_report()
    participant SJ as _scan_jobs()
    participant RR as render_report()
    participant Tmpl as report_template.html
    participant FS as Filesystem
    participant SV as serve_until_ctrl_c()

    Runner->>WR: write_report(video_dir, serve)
    WR->>BR: build_report(video_dir)
    BR->>FS: video_dir.is_dir()?
    alt directory missing or empty
        BR-->>WR: None
        WR-->>Runner: print hint, return None
    else videos found
        BR->>SJ: _scan_jobs(video_dir)
        SJ->>FS: "rglob(*.mp4), regex-filter"
        SJ-->>BR: list[JobReport]
        BR->>RR: render_report(EvaluationReport)
        RR->>Tmpl: read_text()
        RR->>RR: Template.safe_substitute(title, summary, sections)
        RR-->>BR: HTML string
        BR->>FS: write index.html
        BR-->>WR: Path(index.html)
        alt "serve=True"
            WR->>SV: serve_until_ctrl_c(dir, port, index.html)
            SV->>SV: TCPServer.serve_forever() blocks until Ctrl+C
        else "serve=False"
            WR-->>Runner: print To view hint
        end
    end
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant Runner as policy_runner / eval_runner
    participant WR as write_report()
    participant BR as build_report()
    participant SJ as _scan_jobs()
    participant RR as render_report()
    participant Tmpl as report_template.html
    participant FS as Filesystem
    participant SV as serve_until_ctrl_c()

    Runner->>WR: write_report(video_dir, serve)
    WR->>BR: build_report(video_dir)
    BR->>FS: video_dir.is_dir()?
    alt directory missing or empty
        BR-->>WR: None
        WR-->>Runner: print hint, return None
    else videos found
        BR->>SJ: _scan_jobs(video_dir)
        SJ->>FS: "rglob(*.mp4), regex-filter"
        SJ-->>BR: list[JobReport]
        BR->>RR: render_report(EvaluationReport)
        RR->>Tmpl: read_text()
        RR->>RR: Template.safe_substitute(title, summary, sections)
        RR-->>BR: HTML string
        BR->>FS: write index.html
        BR-->>WR: Path(index.html)
        alt "serve=True"
            WR->>SV: serve_until_ctrl_c(dir, port, index.html)
            SV->>SV: TCPServer.serve_forever() blocks until Ctrl+C
        else "serve=False"
            WR-->>Runner: print To view hint
        end
    end
Loading

Reviews (3): Last reviewed commit: "Expand evaluation report to cover the wh..." | Re-trigger Greptile

Comment on lines +247 to +248
if args_cli.gallery and get_local_rank() == 0:
output = build_gallery(video_cfg.video_base_dir)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 assert crash before graceful error message

build_gallery internally calls assert folder.is_dir(), so if a user runs with --gallery but without any recording flag (neither --record_camera_video nor --record_viewport_video), the timestamped video_base_dir is never created by any recorder. build_gallery then raises AssertionError: Not a directory: … before it can return None, meaning the intended user-friendly message on lines 250–253 ("did you pass --record_camera_video?") is never shown. A quick guard in policy_runner.py — checking Path(video_cfg.video_base_dir).is_dir() before calling build_gallery — or replacing the assert in build_gallery with an early return None would fix this.

Comment thread isaaclab_arena/visualization/gallery.py Outdated
Comment on lines +171 to +173
folder = Path(folder).resolve()
assert folder.is_dir(), f"Not a directory: {folder}"
output = Path(output).resolve() if output else folder / "index.html"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 assert statements can be silently disabled when Python is run with the -O (optimize) flag. For input validation that should always fire — especially in a library function — a plain if + raise ValueError is more robust.

Suggested change
folder = Path(folder).resolve()
assert folder.is_dir(), f"Not a directory: {folder}"
output = Path(output).resolve() if output else folder / "index.html"
folder = Path(folder).resolve()
if not folder.is_dir():
raise ValueError(f"Not a directory: {folder}")
output = Path(output).resolve() if output else folder / "index.html"

Comment thread isaaclab_arena/visualization/gallery.py Outdated
Comment on lines +208 to +210
handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=str(directory))
url = f"http://localhost:{port}/{filename}"
with socketserver.TCPServer(("0.0.0.0", port), handler) as httpd:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 TCPServer does not set allow_reuse_address, so if the server is stopped and restarted quickly (e.g., after Ctrl+C during a second evaluation run), the port may still be in TIME_WAIT and the bind will fail with "Address already in use". Setting allow_reuse_address = True avoids this.

Suggested change
handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=str(directory))
url = f"http://localhost:{port}/{filename}"
with socketserver.TCPServer(("0.0.0.0", port), handler) as httpd:
handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=str(directory))
url = f"http://localhost:{port}/{filename}"
socketserver.TCPServer.allow_reuse_address = True
with socketserver.TCPServer(("0.0.0.0", port), handler) as httpd:

@alexmillane alexmillane left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self-review. Good effort claude. Let's clean up.

Comment on lines +247 to +258
if args_cli.gallery and get_local_rank() == 0:
output = build_gallery(video_cfg.video_base_dir)
if output is None:
print(
f"[Rank {local_rank}/{world_size}] --gallery: no per-camera videos found in"
f" {video_cfg.video_base_dir}; nothing to render (did you pass --record_camera_video?)."
)
else:
print(
f"[Rank {local_rank}/{world_size}] Gallery written to {output}. View it with:"
f" python isaaclab_arena/visualization/gallery.py {video_cfg.video_base_dir} --serve"
)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the extra step and have the policy runner drop back into serving the result itself/immediately.

Comment on lines +245 to +246
# Optionally render an HTML gallery of the per-camera per-episode videos.
# Only the local rank 0 writes it, to avoid races on a shared video dir.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shorten to: "Optionally render an HTML report"

Comment on lines +64 to +65
"After the rollout, generate an HTML gallery (index.html) of the per-camera per-episode"
" videos in --video_base_dir. Requires --record_camera_video to have produced videos."

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update to: "after the generate and serve an evaluation report."

),
)
parser.add_argument(
"--gallery",

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

evaluation_report

Comment thread isaaclab_arena/visualization/gallery.py Outdated
Comment on lines +221 to +256
def main() -> None:
parser = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter)
parser.add_argument("folder", type=str, help="Folder of policy-runner output videos to scan.")
parser.add_argument(
"-o",
"--output",
type=str,
default=None,
help="Output HTML path. Defaults to <folder>/index.html.",
)
parser.add_argument(
"--title",
type=str,
default="Evaluation Gallery",
help="Title and heading for the generated page.",
)
parser.add_argument(
"--serve",
action="store_true",
help=(
"Host the gallery over HTTP instead of opening a file. Recommended inside the dev container "
"(reachable from the host browser at http://localhost:<port> thanks to --net=host)."
),
)
parser.add_argument(
"--port",
type=int,
default=8000,
help="Port for --serve. Defaults to 8000.",
)
parser.add_argument(
"--no-open",
action="store_true",
help="Do not open the generated page in a web browser.",
)
args = parser.parse_args()

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate parsing of args to another function.

Comment thread isaaclab_arena/visualization/gallery.py Outdated
Comment on lines +224 to +230
parser.add_argument(
"-o",
"--output",
type=str,
default=None,
help="Output HTML path. Defaults to <folder>/index.html.",
)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove option. just write to the default.

Comment thread isaaclab_arena/visualization/gallery.py Outdated
Comment on lines +6 to +24
"""Generate a browsable HTML gallery from a policy-runner output folder.

Scans a folder for the per-(env, camera, episode) mp4s written by
``CameraObsVideoRecorder`` and writes an ``index.html`` laying them out in a
grid: one row per (env_idx, episode_idx), one column per camera. Videos are
referenced by relative path, so the html must stay next to the mp4s.

Files that do not match the recorder's naming pattern (e.g. the kit-viewport
``rl-video-step-*.mp4``) are ignored.

Usage:
python isaaclab_arena/visualization/gallery.py outputs/
python isaaclab_arena/visualization/gallery.py outputs/ -o gallery.html --title "Run 42"

When running inside the dev container, the host browser cannot be launched
directly. Use ``--serve`` to host the gallery over HTTP; because the container
runs with ``--net=host``, the printed ``http://localhost:<port>`` URL opens
straight from the host browser.
"""

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this description into the argparse.

shorten drastically. summarize.

Comment thread isaaclab_arena/visualization/gallery.py Outdated
Comment on lines +231 to +236
parser.add_argument(
"--title",
type=str,
default="Evaluation Gallery",
help="Title and heading for the generated page.",
)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

Comment thread isaaclab_arena/visualization/gallery.py Outdated
Comment on lines +251 to +255
parser.add_argument(
"--no-open",
action="store_true",
help="Do not open the generated page in a web browser.",
)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

retire

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also add the visualization to the eval_runner?

Address review feedback on the visualization module:
- Rename --gallery to --evaluation_report; policy_runner now builds and serves
  the report immediately instead of printing a follow-up command.
- Rename gallery.py -> report.py; move the HTML/CSS into report_template.html.
- Introduce EpisodeVideos/VideoGrid dataclasses for the parsed videos; split
  render into small helpers; scan recursively so eval per-job dirs become groups.
- Slim the standalone CLI (always serve, default output, no browser detection);
  rename serve_forever -> serve_until_ctrl_c with allow_reuse_address.
- build_report returns None (no assert) on a missing/empty dir.
- Also wire the report into eval_runner.

Signed-off-by: alex <amillane@nvidia.com>
@alexmillane

Copy link
Copy Markdown
Collaborator Author

Worked through the review in 5a483ea. Summary of changes:

Reframed as an 'evaluation report'

  • --gallery--evaluation_report; help reworded to generate and serve an evaluation report.
  • policy_runner now builds and serves immediately (no more 'run this command to view it' step); comment shortened to render an HTML report.
  • gallery.pyreport.py, build_gallerybuild_report/build_and_serve_report.

Structure / readability

  • HTML+CSS moved out into report_template.html (loaded via string.Template).
  • Parsed output is now EpisodeVideos/VideoGrid dataclasses instead of the nested dict[tuple,...]; the setdefault now has a clarifying comment.
  • render_report split into _render_row / _render_row_label / _render_video_cell.
  • import pathlib (module-qualified pathlib.Path).
  • Module docstring shortened to one line; the long description moved into the argparse description.

CLI slimmed

  • Always serve; removed --serve, --output (writes default index.html), --title, --no-open, and the in_container detection.
  • serve_foreverserve_until_ctrl_c; arg parsing extracted to _parse_args.

greptile

  • P1 (assert crash before the friendly message): build_report now returns None on a missing/empty dir — no assert.
  • P2 (assert under -O): assert removed entirely.
  • P2 (TIME_WAIT bind): allow_reuse_address = True.

eval_runner

  • Added --evaluation_report there too; the scan now recurses, so each job's sub-directory shows up as a row group.

Verified the module end-to-end on synthetic videos (flat + nested layouts, escaping, graceful no-video case). One call worth confirming: serving now blocks until Ctrl+C at the end of the run — that's the 'drop into serving immediately' behaviour you asked for, but shout if you'd prefer it backgrounded.

Comment on lines +181 to +186
with socketserver.TCPServer(("0.0.0.0", port), handler) as httpd:
print(f"Serving evaluation report at {url} (Ctrl+C to stop).")
try:
httpd.serve_forever()
except KeyboardInterrupt:
print("\nStopping server.")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Unhandled OSError when port is actively in use

allow_reuse_address = True resolves the TIME_WAIT case (rapid restart of this tool), but if port 8000 is already held by another running process — Jupyter, a dev server, a previous evaluation run that was never stopped — TCPServer(("0.0.0.0", port), handler) raises OSError: [Errno 98] Address already in use before the with block is entered. The exception is not caught anywhere in the call chain (serve_until_ctrl_cbuild_and_serve_reportmain), so the process crashes with a raw Python traceback after what may have been a long evaluation run. A try/except OSError around the TCPServer construction, printing a message like "Port {port} is already in use; you can view the report by opening {output} directly or re-run with --port N.", would let the user recover gracefully.

- Model the run as EvaluationReport -> JobReport -> EpisodeVideos; render one
  per-job section (heading + env x episode grid) instead of a flat table.
- Derive a contiguous per-(job, env) episode index, parsing the rebuild index
  only to order/disambiguate episodes; rebuilds are no longer surfaced (fixes
  episode-index collisions across rebuilds).
- Build the report once at the end of the run whenever --record_camera_video is
  set; --evaluation_report now additionally serves it (no longer blocks by default).
- eval_runner builds a single run-wide report; policy_runner builds its single
  unnamed-job report.

Signed-off-by: alex <amillane@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant