alibaba/atrex-kernel-agent

Atrex Kernel Agent (AKA)

AKA is an end-to-end Agent project for GPU kernel implementation, analysis, profiling, and iterative optimization. It helps an Agent turn PyTorch logic or an existing kernel into a high-performance GPU kernel through a structured, profile-driven workflow.

[Figure: Atrex architecture]

[Figure: Atrex optimization loop]

What It Does

  • Creates an isolated optimization workspace under /tmp/kernel_opt_<name>/.
  • Looks up target hardware specs from the local gpu-wiki knowledge base.
  • Runs Roofline analysis and sets auditable performance targets.
  • Implements a correct baseline kernel before entering optimization.
  • Runs the profile-driven optimization loop, repeating until the Stop Conditions are met:
      1. Profile with ncu or rocprofv3.
      2. Extract bottleneck evidence.
      3. Query gpu-wiki, reference projects, and web sources for relevant optimization knowledge.
      4. Write an evidence-based plan.
      5. Apply one optimization category.
      6. Validate correctness and performance.
      7. Record memory.
      8. Commit.
  • Records plans, profile artifacts, structured memory, reports, and Git commits for every accepted iteration.
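The Roofline step mentioned above can be illustrated with a minimal sketch. It is not AKA's own implementation: the function name `roofline_bound` and the peak numbers are hypothetical placeholders, whereas in the real workflow the hardware specs would come from gpu-wiki. The sketch computes a kernel's arithmetic intensity and the attainable throughput implied by the two hardware roofs.

```python
def roofline_bound(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Return (arithmetic_intensity, attainable_flops_per_s, regime)."""
    intensity = flops / bytes_moved          # FLOP per byte of memory traffic
    ridge = peak_flops / peak_bandwidth      # intensity where the two roofs meet
    attainable = min(peak_flops, intensity * peak_bandwidth)
    regime = "compute-bound" if intensity >= ridge else "memory-bound"
    return intensity, attainable, regime


# Placeholder hardware numbers for illustration only, NOT real device specs.
PEAK_FLOPS = 100e12   # FLOP/s
PEAK_BW = 2e12        # bytes/s

# Example workload: elementwise add of two bf16 vectors with n elements.
n = 1 << 20
flops = n                  # one add per element
bytes_moved = 3 * n * 2    # two reads and one write, 2 bytes per element

intensity, attainable, regime = roofline_bound(flops, bytes_moved, PEAK_FLOPS, PEAK_BW)

# Lower bound on runtime: the slower of the compute and memory limits.
min_time_s = max(flops / PEAK_FLOPS, bytes_moved / PEAK_BW)

print(f"intensity={intensity:.3f} FLOP/B, regime={regime}, "
      f"attainable={attainable:.3e} FLOP/s, min_time={min_time_s:.3e} s")
```

A performance target derived this way is auditable because every input (FLOP count, bytes moved, peak numbers) is written down and can be rechecked.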

For the full architecture and workflow design, see docs/design.md.

Requirements

Installation requires:

  • bash
  • git
  • jq
  • Codex or Claude Code installed

Running optimization tasks also requires platform-specific profiling tools:

  • NVIDIA: ncu
  • AMD: rocprofv3, wrapped by tools/profile_kernel.sh
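As a rough preflight check, the requirements above can be verified before installing or starting a task. This is a sketch, not part of the repository: the helper names `missing_tools` and `preflight` are hypothetical, and the check only covers the command-line tools named in the lists above (it does not look for Codex or Claude Code).

```python
import shutil

# Tool names taken from the requirement lists above.
INSTALL_TOOLS = ["bash", "git", "jq"]
PROFILERS = {"nvidia": "ncu", "amd": "rocprofv3"}


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def preflight(vendor, which=shutil.which):
    """Check install-time tools plus the profiler for the given vendor."""
    return missing_tools(INSTALL_TOOLS + [PROFILERS[vendor]], which)


missing = preflight("nvidia")
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All required tools found.")
```

The `which` parameter exists only so the lookup can be replaced in tests; by default it uses `shutil.which` from the standard library.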

Installation

./install.sh

Common options:

./install.sh --hooks-only          # Install or update hooks only
./install.sh --without-github      # Skip GitHub reference repositories listed by gpu-wiki
./install.sh --max-iterations N    # Configure hook stop behavior after memory/vN.json exceeds N
./install.sh --uninstall           # Remove hooks installed by this script
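One possible reading of the `--max-iterations N` stop rule is sketched below. The real logic lives in the installed hooks and may differ. This sketch assumes that iteration records are files named `memory/v<number>.json` inside the workspace and that the hook stops once the highest number exceeds N. The function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed naming scheme for iteration records: v1.json, v2.json, ...
_VERSION_RE = re.compile(r"^v(\d+)\.json$")


def latest_memory_version(workspace):
    """Return the highest N among memory/vN.json files, or 0 if none exist."""
    memory_dir = Path(workspace) / "memory"
    if not memory_dir.is_dir():
        return 0
    versions = []
    for path in memory_dir.iterdir():
        match = _VERSION_RE.match(path.name)
        if match:
            versions.append(int(match.group(1)))
    return max(versions, default=0)


def should_stop(workspace, max_iterations):
    """True once the latest recorded iteration exceeds the configured limit."""
    return latest_memory_version(workspace) > max_iterations
```

Parsing the number out of the file name, rather than counting files, keeps the check correct even if some intermediate records are missing.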

The installer detects:

  • Codex: $CODEX_HOME or ~/.codex
  • Claude Code: $CLAUDE_HOME or ~/.claude

It also prepares the default local knowledge base at /tmp/gpu-wiki/ and optional reference projects at /tmp/reference-projects/.

After installation, restart Codex / Claude Code or open a new session so the hooks and Skills are loaded.
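The detection rule listed above (an environment variable if set, otherwise a dot-directory in the user's home) can be sketched as follows. The helper name `resolve_home` is hypothetical, and the actual installer is a shell script whose behavior may differ in detail.

```python
import os
from pathlib import Path


def resolve_home(env_var, default_dirname, env=None):
    """Return the directory from env_var if set and non-empty, else ~/default_dirname."""
    env = os.environ if env is None else env
    override = env.get(env_var)
    if override:
        return Path(override).expanduser()
    return Path.home() / default_dirname


codex_home = resolve_home("CODEX_HOME", ".codex")
claude_home = resolve_home("CLAUDE_HOME", ".claude")
print("Codex config dir: ", codex_home)
print("Claude config dir:", claude_home)
```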

Quick Start

Ask the Agent to optimize a kernel, providing at least:

  • platform: target hardware platform, such as H20 or MI308X.
  • framework: target implementation framework, such as CuteDSL or FlyDSL.
  • kernel_demo: path to the initial PyTorch logic or kernel implementation file.

Example:

/gpu-kernel-optimizer Optimize /path/to/kernel_demo.py on MI308X with FlyDSL, dtype bf16, rel_err < 0.01.

The Agent will initialize a workspace, source hardware specs from gpu-wiki, write the workspace configuration, build a baseline, profile the kernel, and iterate until the configured Stop Conditions are met.
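The `rel_err < 0.01` constraint in the example request is a correctness tolerance. This README does not define how the error is computed, so the sketch below assumes one common choice: the L2 norm of the difference divided by the L2 norm of the reference. The function names are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def relative_error(actual, expected, eps=1e-12):
    """Assumed metric: ||actual - expected||_2 / (||expected||_2 + eps)."""
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    diff = math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, expected)))
    ref = math.sqrt(sum(e * e for e in expected))
    return diff / (ref + eps)


def passes(actual, expected, tol=0.01):
    """True if the candidate output is within the relative-error tolerance."""
    return relative_error(actual, expected) < tol


reference = [1.0, 2.0, 3.0, 4.0]      # e.g. output of the PyTorch logic
candidate = [1.001, 1.998, 3.002, 4.0]  # e.g. output of the optimized kernel
print("rel_err =", relative_error(candidate, reference))
print("within tolerance:", passes(candidate, reference))
```

With low-precision dtypes such as bf16, a norm-based relative error tends to be more forgiving of small per-element rounding differences than an exact elementwise comparison.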

Main Files

.
├── SKILL.md                         # Top-level gpu-kernel-optimizer Skill router
├── install.sh                       # Installer / uninstaller
├── docs/                            # Detailed project design docs
├── reference/                       # Workspace, plan, memory, and profiling templates
├── skills/                          # Baseline, optimizer, restart, and output-contract Skills
├── tools/                           # Profiling, utilization, memory, and measurement tools
└── gpu-wiki/                        # Local GPU knowledge base

License

Licensed under the Apache License 2.0.
