Add Triton TileIR kernels (rms_norm, rope, layer_norm_legacy, dropout) + TileIR CI pass#157

Open
hannahli-nv wants to merge 2 commits into main from triton-kernels-nvt-ci


@hannahli-nv

Collaborator

Summary

Add Triton implementations of rms_norm, rope, layer_norm_legacy, and dropout (wired into the dispatch and covered by test_op), and validate them under the Triton TileIR backend in CI.
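For readers unfamiliar with the op, here is a minimal pure-Python sketch of what an rms_norm kernel is generally expected to compute for a single row. This is not the Triton kernel from this PR; the function name `rms_norm_reference` and the default `eps` value are assumptions made for illustration, and the real kernel's signature may differ.

```python
import math


def rms_norm_reference(x, weight, eps=1e-6):
    """Scale one row by the inverse of its root mean square, then by weight.

    Hypothetical reference helper; the eps default is an assumption.
    """
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    # Mean of squares over the normalized dimension.
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    # Normalize each element and apply the learned per-element scale.
    return [v * inv_rms * w for v, w in zip(x, weight)]
```

A reference like this is the kind of thing a GPU kernel's output could be compared against within a floating-point tolerance.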

Changes

  • Kernels: src/tilegym/ops/triton/{rms_norm,rope,layer_norm_legacy,dropout}.py plus dispatch wiring (ops/ops.py, ops/__init__.py, ops/triton/__init__.py) and backend detection (backend/selector.py, backend/__init__.py).
  • Dockerfile (modeling/transformers/Dockerfile): build the TileIR Triton backend (github.com/triton-lang/Triton-to-tile-IR) from source, installed side-by-side at /opt/nvtriton.
  • CI (.github/workflows/tilegym-ci.yml): add an ops test pass that runs the Triton-backend tests with PYTHONPATH=/opt/nvtriton and ENABLE_TILE=1.

Notes

  • The TileIR backend is installed side-by-side so the default environment keeps the upstream Triton that torch depends on; the TileIR build is selected only at test time, via PYTHONPATH=/opt/nvtriton and ENABLE_TILE=1.
  • The existing (default) ops test pass is unchanged.
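The side-by-side selection described above can be sketched as a small helper that builds the environment for the TileIR test pass. This is an illustration only, not the workflow's actual step: the helper names and the `tests/ops` path are hypothetical, and prepending to an existing PYTHONPATH (rather than overwriting it) is an assumption.

```python
import os
import sys


def tileir_test_env(base_env=None, nvtriton_path="/opt/nvtriton"):
    """Return a copy of the environment with the TileIR Triton build selected."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    # Put the side-by-side install first so it is found before the default triton.
    env["PYTHONPATH"] = nvtriton_path + (os.pathsep + existing if existing else "")
    env["ENABLE_TILE"] = "1"
    return env


def tileir_test_command(test_path="tests/ops"):
    """Build a pytest invocation; test_path is a hypothetical location."""
    return [sys.executable, "-m", "pytest", test_path]
```

With these helpers, a local run could look like `subprocess.run(tileir_test_command(), env=tileir_test_env(), check=True)`; because the default environment is untouched, the existing ops test pass keeps using the upstream Triton.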

…) + TileIR CI pass

Add Triton implementations of rms_norm, rope, layer_norm_legacy and dropout
(wired into the dispatch + covered by test_op), and exercise them under the
Triton TileIR backend in CI:
- modeling/transformers/Dockerfile: build the TileIR Triton backend
  (github.com/triton-lang/Triton-to-tile-IR) from source side-by-side at
  /opt/nvtriton.
- .github/workflows/tilegym-ci.yml: add an ops test pass that runs the Triton
  backend tests with PYTHONPATH=/opt/nvtriton and ENABLE_TILE=1.
@copy-pr-bot

copy-pr-bot Bot commented Jun 23, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@hannahli-nv

Collaborator Author

/ok to test ad2b382

@hannahli-nv hannahli-nv force-pushed the triton-kernels-nvt-ci branch 3 times, most recently from c2f76da to 4524c74 Compare June 24, 2026 05:18
…uilt wheel on CUDA 13.3 / Ubuntu 24.04

- ruff --select I, ruff format, and end-of-file-fixer on the new triton
  kernels and the selector helper
- Base image -> nvcr cuda 13.3.0-devel-ubuntu24.04 (GCC 13, Python 3.12)
- Install the prebuilt nvtriton (TileIR) cp312 wheel at /opt/nvtriton instead
  of building from source, which was timing out the CI build job
- Pin uv sync to Python 3.12 so the venv matches the wheel
- test-ops: raise job timeout to 90m, default ops step to 45m, and run the
  TileIR ops step with -n 12 (the added triton params grew the matrix and the
  job cap was too low for two test steps)
@hannahli-nv hannahli-nv force-pushed the triton-kernels-nvt-ci branch from 4524c74 to b053eab Compare June 24, 2026 06:15