Update base image to CUDA 13.3 / Ubuntu 24.04 and raise CI timeouts by hannahli-nv · Pull Request #158 · NVIDIA/TileGym

hannahli-nv · 2026-06-24T10:16:22Z

Description

Move the test/benchmark image to a newer base and make the GPU CI scale as the
test and benchmark suites grow.

Base image

Update the transformers Dockerfile base from cuda:13.2.0-devel-ubuntu22.04 to
cuda:13.3.0-devel-ubuntu24.04 (newer default toolchain: GCC 13 + Python 3.12).
Update the nsight-systems devtools apt repo to the ubuntu2404 path and add
--break-system-packages to the pip install that bootstraps uv, because Ubuntu 24.04
marks the system Python as externally managed (PEP 668).
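
For context on the PEP 668 change, here is a minimal sketch of how an interpreter can be checked for the externally-managed marker that makes a plain pip install fail on Ubuntu 24.04. The helper name is_externally_managed and its stdlib_dir parameter are illustrative assumptions, not part of this PR.

```python
import sysconfig
from pathlib import Path


def is_externally_managed(stdlib_dir=None):
    """Return True if the PEP 668 marker file exists for this interpreter.

    PEP 668 places a file named EXTERNALLY-MANAGED in the interpreter's
    stdlib directory. stdlib_dir is a hypothetical override for testing.
    """
    if stdlib_dir is None:
        stdlib_dir = sysconfig.get_path("stdlib")
    return (Path(stdlib_dir) / "EXTERNALLY-MANAGED").is_file()


if __name__ == "__main__":
    # On an image where this prints True, pip refuses system-wide installs
    # unless --break-system-packages is passed (or a virtualenv is used).
    print(is_externally_managed())
```

Inside a throwaway CI image, passing --break-system-packages is a common way to keep a one-line bootstrap; a virtual environment is the usual alternative when the system Python has to stay untouched.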

test-ops (sharded + balanced)

Split test-ops across parallel GPU runners with pytest-split (--splits/--group),
balanced by a committed .github/.test_durations map; scale out by adding entries to
the matrix.
Run each shard at -n 8 and set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.8
to bound peak GPU memory and avoid intermittent CUDA OOM. (expandable_segments was tried
and dropped because it conflicts with cuTile kernel launches.)
Skip the known-failing chunk_gated_delta_rule cuTile case (tracked separately).
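
To illustrate what balancing by measured durations means, here is a minimal sketch of a greedy longest-first assignment. It is not pytest-split's own algorithm, and it assumes the durations map is a JSON object of test ID to seconds; the function name balance_shards is hypothetical.

```python
import heapq


def balance_shards(durations, num_shards):
    """Assign tests to shards, longest first, always to the lightest shard.

    durations: dict mapping test ID -> measured seconds (assumed format).
    Returns a list of num_shards lists of test IDs.
    """
    # Min-heap of (accumulated seconds, shard index).
    heap = [(0.0, i) for i in range(num_shards)]
    heapq.heapify(heap)
    shards = [[] for _ in range(num_shards)]
    # Sort by descending duration; break ties by test ID for determinism.
    for test_id, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        shards[idx].append(test_id)
        heapq.heappush(heap, (load + secs, idx))
    return shards


if __name__ == "__main__":
    example = {"a": 5, "b": 4, "c": 3, "d": 2}
    print(balance_shards(example, 2))  # [['a', 'd'], ['b', 'c']]
```

In the workflow itself the split is done by pytest-split through --splits and --group; the sketch only shows why a duration map produces more even shard runtimes than splitting by test count.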

test-benchmark (sharded + aggregate)

Split test-benchmark into a run-shard matrix (files assigned round-robin) plus a
benchmark-aggregate job that merges all shard results and runs the existing regression
check / summary / baseline update over the union.
Drop the cuTile backend from the layernorm/rmsnorm benchmarks: those kernels are
JIT-compiled per shape and sweeping the full shape grid made the benchmark run for tens
of minutes (the backend remains covered by the ops tests).
Fix the benchmark output parser to ignore non-data diagnostic stdout so malformed rows no
longer leak into the results table / summary.
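
The round-robin file assignment and the stricter row parsing described above can be sketched as follows. This is an illustration under assumptions, not the code in run_all_json.py: it assumes whitespace-separated benchmark rows, and the names assign_round_robin and parse_data_rows are hypothetical.

```python
def assign_round_robin(files, num_shards):
    """Deal benchmark files to shards in sorted order, one at a time."""
    ordered = sorted(files)
    return [ordered[i::num_shards] for i in range(num_shards)]


def parse_data_rows(stdout, num_columns):
    """Keep only lines that look like data rows.

    A line is accepted when it has exactly num_columns fields and every
    field parses as a number. Headers, warnings, and other diagnostic
    output are skipped instead of being emitted as malformed rows.
    """
    rows = []
    for line in stdout.splitlines():
        fields = line.split()
        if len(fields) != num_columns:
            continue  # wrong width: not a data row
        try:
            rows.append([float(f) for f in fields])
        except ValueError:
            continue  # non-numeric field: diagnostic or header text
    return rows


if __name__ == "__main__":
    out = "N cutile torch\n1024 1.5 2.0\nWARNING: compiling kernel\n2048 3.0\n"
    print(assign_round_robin(["c.py", "a.py", "b.py"], 2))
    print(parse_data_rows(out, 3))
```

Checking both the column count and numeric content matches the two parser commits in this PR ("Require exact column width" and "Require all-numeric columns"); a three-word warning line passes a width-only check but fails the numeric one.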

Nightly maintenance

Add a nightly-only update-test-durations job that opens a reviewable PR refreshing
.github/.test_durations, but only when the ops test set changes (tests added or removed);
timing drift alone does not trigger it.
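
The "test set changed" condition can be sketched as a comparison of test IDs only, ignoring the recorded timings. This is a minimal illustration assuming the durations map is a dict of test ID to seconds; diff_test_set is a hypothetical name, not the job's actual implementation.

```python
def diff_test_set(recorded, collected):
    """Compare recorded test IDs against the currently collected ones.

    recorded: dict of test ID -> seconds (the committed durations map).
    collected: iterable of test IDs from the current test collection.
    Returns (added, removed) as sorted lists; both empty means no refresh.
    """
    recorded_ids = set(recorded)
    collected_ids = set(collected)
    added = sorted(collected_ids - recorded_ids)
    removed = sorted(recorded_ids - collected_ids)
    return added, removed


if __name__ == "__main__":
    added, removed = diff_test_set({"a": 1.0, "b": 2.0}, ["b", "c"])
    # A refresh PR would be opened only if either list is non-empty.
    print(added, removed)  # ['c'] ['a']
```

Keying the decision on IDs rather than durations avoids a nightly PR for every small timing fluctuation.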

CI Configuration

config:
  build: true
  # valid options are "ops", "benchmark", and "sanity"
  test: ["ops", "benchmark"]

Checklist

Code formatted and imports sorted via repo specifications (./format.sh)
Documentation updated (if needed)
CI configuration reviewed

copy-pr-bot · 2026-06-24T10:16:26Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

hannahli-nv force-pushed the update-docker-cuda13.3 branch from 537eb30 to 9a56281 Compare June 28, 2026 15:26

xjmxyt approved these changes Jun 29, 2026

hannahli-nv added 19 commits June 30, 2026 00:15

Update base image to CUDA 13.3 / Ubuntu 24.04 and raise CI timeouts

ffd1641

Raise per-benchmark timeout to 40 minutes

a84a6d2

Skip chunk_gated_delta_rule tilecpp l2norm case

4be73ed

Drop tilecpp from layernorm/rmsnorm benchmarks

cc067e2

Shard test-ops across parallel runners with pytest-split

145cddf

Drop expandable_segments from test-ops

0be627c

Use garbage_collection_threshold to avoid cuTile OOM in test-ops

faf1de9

Increase test-ops shards 3 -> 5 to bound per-worker memory

0f8e36e

Try 3 shards with -n 8 and record test durations

12abea3

Balance test-ops shards by measured durations

8669250

Shard test-benchmark across parallel runners

281ba06

Fix ruff formatting in run_all_json.py

2aa5abd

Move .test_durations under .github/

46b8783

Skip non-data lines when parsing benchmark output

a797925

Require exact column width when parsing benchmark rows

3ab0f18

Require all-numeric columns when parsing benchmark rows

cabdd45

Auto-refresh .test_durations on nightly runs

c7098b8

Only refresh .test_durations via PR when the test set changes

a7d62c0

Use the repo PR template for the auto durations PR

969f17e

hannahli-nv force-pushed the update-docker-cuda13.3 branch from 8b8374d to 969f17e Compare June 29, 2026 16:16

hannahli-nv wants to merge 19 commits into main from update-docker-cuda13.3
