sabreshao/RocmKernelWiki


RocmKernelWiki — AMD CDNA3 / CDNA4 Kernel Optimization Knowledge Base

Knowledge cutoff: 2026-05-25. See data/refresh-cutoff.yaml.

A structured knowledge base of AMD GPU kernel optimization for MI300X (CDNA3, gfx942) and MI350 / MI355 (CDNA4, gfx950), packaged as a Claude Code skill. The repository root is the skill directory.

Architecture and conventions mirror KernelWiki (Blackwell/Hopper). Initial content distilled from Apex/toolsskills/, jsons/{hip_sheet,triton_sheet,rocm}.json, and mcps/.

Install as a Claude Code Skill

git clone <repo-url> ~/.claude/skills/RocmKernelWiki
pip install -r ~/.claude/skills/RocmKernelWiki/requirements.txt

Smoke test:

cd ~/.claude/skills/RocmKernelWiki
python3 scripts/query.py --tag mfma --type hardware --compact
python3 scripts/get_page.py hw-mfma-cdna3 --frontmatter-only

Optional override:

export ROCM_WIKI_ROOT=/path/to/RocmKernelWiki
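As an illustration of how an override like this is typically consumed, here is a minimal sketch of root resolution. The function name `resolve_wiki_root` and the fallback path are assumptions for the example; the repository's scripts may resolve the root differently.

```python
import os
from pathlib import Path


def resolve_wiki_root(default: Path) -> Path:
    """Return ROCM_WIKI_ROOT if it is set and non-empty, else the default.

    Hypothetical helper: the real scripts may use another lookup order.
    """
    override = os.environ.get("ROCM_WIKI_ROOT", "").strip()
    return Path(override).expanduser() if override else default


if __name__ == "__main__":
    # Assumed default: the install location used in the clone command above.
    default_root = Path.home() / ".claude" / "skills" / "RocmKernelWiki"
    print(resolve_wiki_root(default_root))
```

Treating an empty value as unset avoids resolving to the current directory by accident when the variable is exported but blank.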

What's Here

  • Hardware pages — MFMA (CDNA3 + CDNA4), SMFMAC + 4:2 sparsity, AccVGPR / dual register files, LDS + bank conflicts, GWS, buffer resource ops, async-copy direct-to-LDS, wave64, FP8/BF8/MX-FP4
  • Technique pages — coalesced/vectorized access, grid-stride loops, software pipelining, LDS double-buffering, swizzling, register budgeting, occupancy tuning, persistent + stream-K kernels, kernel/epilogue fusion, autotune knobs (num_stages, num_warps, matrix_instr_nonkdim, kpack, waves_per_eu)
  • Pattern pages — memory-bound / compute-bound / register-pressure / low-CU-utilization / lds-bank-conflict / lane-conflict diagnostics with candidate techniques
  • Kernel case studies — GEMM (HIP MFMA, Triton-AMD, CK), FlashAttention on MI300/MI350, fused MoE, paged attention, RMSNorm
  • Language guides — HIP C++, Triton ROCm backend, Composable Kernel, Gluon-AMD, FlyDSL
  • Migration guides — CUDA → HIP, Triton-NVIDIA → Triton-AMD, WGMMA → MFMA, TMA → buffer-load + async-copy, MI300 → MI350 (gfx942 → gfx950)
  • Source ledgers under candidates/triton-amd, vllm-rocm, sglang-rocm, aiter, magpie, flydsl
  • Auto-generated query indices under queries/

Query Tools

| Tool | Purpose |
| --- | --- |
| scripts/query.py | Unified search across all pages (keywords + filters + alias-aware) |
| scripts/get_page.py | Fetch any page by id or path; --follow-sources expands cited sources |
| scripts/grep_wiki.py | Regex text search across wiki bodies and source pages |
| scripts/validate.py | Schema/tag/source-id validator |
| scripts/generate-indices.py | Rebuilds queries/*.md from frontmatter |
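To make the "keywords + filters" idea concrete, the sketch below filters in-memory page records by tag, type, and keyword. The `Page` record, its fields, and the `search` function are hypothetical and are not the actual implementation of scripts/query.py. The second sample page id is also invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Page:
    # Hypothetical record; the real frontmatter schema lives in data/schemas.yaml.
    id: str
    type: str
    tags: Set[str] = field(default_factory=set)
    body: str = ""


def search(pages: List[Page],
           tag: Optional[str] = None,
           page_type: Optional[str] = None,
           keyword: Optional[str] = None) -> List[str]:
    """Return sorted ids of pages matching every filter that was supplied."""
    hits = []
    for page in pages:
        if tag is not None and tag not in page.tags:
            continue
        if page_type is not None and page.type != page_type:
            continue
        if keyword is not None and keyword.lower() not in page.body.lower():
            continue
        hits.append(page.id)
    return sorted(hits)


PAGES = [
    Page("hw-mfma-cdna3", "hardware", {"mfma"}, "MFMA on CDNA3"),
    # Invented id, used only to show a non-matching page.
    Page("example-technique-page", "technique", {"lds"}, "LDS double-buffering"),
]

if __name__ == "__main__":
    print(search(PAGES, tag="mfma", page_type="hardware"))
```

Filters combine with AND semantics here, which is an assumption. A real tool could equally rank by keyword relevance.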

Companion Docs

Architecture

Three layers (the Karpathy LLM Wiki pattern, same as KernelWiki):

  1. sources/ — raw data: PRs, vendor docs, blogs, contests.
  2. wiki/ — synthesized pages cross-referenced by id. All have YAML frontmatter.
  3. queries/ — auto-generated indices. Do not edit by hand.
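The relationship between layers 2 and 3 can be sketched as follows: read each page's frontmatter, then group page ids by type to form an index. This is a simplified illustration, not the logic of scripts/generate-indices.py. It assumes frontmatter delimited by `---` lines and handles only flat `key: value` pairs, where a real implementation would likely use a YAML parser.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a page into (frontmatter, body).

    Simplification: only flat `key: value` lines are understood, and the
    delimiters must be exactly `---` on their own line.
    """
    lines = text.splitlines()
    if not lines or lines[0] != "---":
        return {}, text
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, text
    meta = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])


def build_type_index(pages: List[str]) -> Dict[str, List[str]]:
    """Group page ids by their `type` field, skipping pages missing either."""
    index = defaultdict(list)
    for text in pages:
        meta, _ = split_frontmatter(text)
        if "id" in meta and "type" in meta:
            index[meta["type"]].append(meta["id"])
    return {page_type: sorted(ids) for page_type, ids in index.items()}


if __name__ == "__main__":
    sample = "---\nid: hw-mfma-cdna3\ntype: hardware\n---\nBody text"
    print(build_type_index([sample]))
```

Because the index is derived entirely from frontmatter, regenerating it is safe to repeat, which is why hand edits under queries/ would be overwritten.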

Supporting files:

  • data/schemas.yaml — required/optional fields per page type
  • data/tags.yaml — controlled vocabulary
  • data/aliases.yaml — canonical → synonym mappings (e.g. gfx950 → MI350, mfma → matrix-fma)
  • data/refresh-cutoff.yaml / data/tool-versions.yaml — versioning
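Alias-aware lookup can be pictured as a reverse map from every synonym back to its canonical term. The sketch below assumes the two example pairs read as gfx950 / MI350 and mfma / matrix-fma. The dictionary layout and the helper names are hypothetical and may not match the actual structure of data/aliases.yaml.

```python
from typing import Dict, List

# Assumed shape: canonical term -> list of synonyms.
ALIASES: Dict[str, List[str]] = {
    "gfx950": ["MI350"],
    "mfma": ["matrix-fma"],
}


def build_lookup(aliases: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every canonical term and synonym (lowercased) to the canonical term."""
    lookup = {}
    for canonical, synonyms in aliases.items():
        lookup[canonical.lower()] = canonical
        for synonym in synonyms:
            lookup[synonym.lower()] = canonical
    return lookup


def canonicalize(term: str, lookup: Dict[str, str]) -> str:
    """Return the canonical form of a term, or the term unchanged if unknown."""
    return lookup.get(term.lower(), term)


if __name__ == "__main__":
    lookup = build_lookup(ALIASES)
    print(canonicalize("mi350", lookup))
```

Lowercasing both sides makes matching case-insensitive, a design choice made for this sketch only.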

Scope Rules

  • CDNA3/CDNA4-first. gfx942 + gfx950 are primary. gfx90a (CDNA2) only with explicit cdna_relevance. RDNA out of scope.
  • Kernel-only. No RCCL / distributed-training / scheduler topics.
  • First-class DSLs. HIP, Triton-AMD, Composable Kernel, Gluon-AMD, FlyDSL.
  • English canonical.
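The architecture rule above could be expressed as a small predicate. In this sketch the `arch` frontmatter key and the `in_scope` function are hypothetical. Only `cdna_relevance` and the gfx targets come from the rules themselves, and how scripts/validate.py actually enforces scope is not shown here.

```python
from typing import Dict

PRIMARY_ARCHS = {"gfx942", "gfx950"}  # CDNA3 and CDNA4


def in_scope(meta: Dict[str, str]) -> bool:
    """Apply the architecture scope rule to one page's frontmatter.

    `arch` is an assumed field name used only for this illustration.
    """
    arch = meta.get("arch", "")
    if arch in PRIMARY_ARCHS:
        return True
    if arch == "gfx90a":
        # CDNA2 is admitted only with an explicit, non-empty cdna_relevance.
        return bool(meta.get("cdna_relevance"))
    # Everything else, including RDNA targets, is out of scope.
    return False


if __name__ == "__main__":
    print(in_scope({"arch": "gfx942"}))
    print(in_scope({"arch": "gfx90a"}))
```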

License

Summaries and wiki syntheses are derivative works citing upstream PRs, docs, and the Apex/KernelWiki projects. Tooling (scripts/, data/, references/) is MIT-style.
