
feat(format): gpu-weight-residency-v1 5-gate PARTIAL discharge #1377

Closed

noahgift wants to merge 2 commits into main from feat/gwr-001-005-partial-discharge


Conversation

Contributor

@noahgift noahgift commented May 2, 2026

Summary

  • Binds FALSIFY-GWR-001..005 from gpu-weight-residency-v1 at PARTIAL_ALGORITHM_LEVEL via 5 verdict functions.
  • 29 unit tests including a 7-bucket sweep over GWR-001's tolerance band.
  • Algorithm-level coverage advances by 5 gates; runtime ship % unchanged.

Gates bound

| Gate ID | Rule |
| --- | --- |
| GWR-001 | nvidia-smi residency within [model_bytes, model_bytes × 1.10] |
| GWR-002 | observed ≥ 180 tok/s (RTX 4090, Qwen2.5-1.5B Q4K) |
| GWR-003 | zero cudaMemcpyHtoD per inference in steady state |
| GWR-004 | GPU output == CPU output (token-id parity) |
| GWR-005 | Grace Blackwell unified-memory uses CU_MEM_ATTACH_GLOBAL |
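The throughput gate above reduces to a one-line predicate. A minimal sketch, assuming verdicts are plain boolean functions (the actual aprender-core function names are not shown in this PR; only the 180.0 floor is pinned):

```rust
// Illustrative sketch of the GWR-002 throughput gate. The function name
// and signature are assumptions; the 180.0 floor is the pinned constant.
const AC_GWR_MIN_TPS_RTX4090: f64 = 180.0;

/// PASS iff measured decode throughput meets the RTX 4090 floor.
fn gwr_002_tps_ok(observed_tps: f64) -> bool {
    observed_tps >= AC_GWR_MIN_TPS_RTX4090
}
```

Under this shape, the healthy fixture's 440 tok/s passes and the 50 tok/s PCIe-bound baseline fails.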

Pinned constants

  • AC_GWR_MIN_TPS_RTX4090 = 180.0
  • AC_GWR_RESIDENCY_TOLERANCE_PCT = 10.0
  • AC_GWR_MAX_HTOD_PER_INFERENCE = 0
  • AC_GWR_GRACE_ALLOC_FLAG = "CU_MEM_ATTACH_GLOBAL" (case-sensitive)
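The residency constant and the GWR-001 rule together imply a simple band check. A minimal sketch, assuming a free-function verdict shape (hypothetical name; the pinned tolerance value is from this PR):

```rust
// Sketch of the GWR-001 residency band [model_bytes, model_bytes * 1.10].
// The function name is hypothetical; the tolerance constant is pinned above.
const AC_GWR_RESIDENCY_TOLERANCE_PCT: f64 = 10.0;

/// PASS iff observed VRAM residency sits inside the tolerance band:
/// at least the model footprint, at most footprint + 10%.
fn gwr_001_residency_ok(model_bytes: u64, observed_bytes: u64) -> bool {
    let upper = model_bytes as f64 * (1.0 + AC_GWR_RESIDENCY_TOLERANCE_PCT / 100.0);
    observed_bytes >= model_bytes && (observed_bytes as f64) <= upper
}
```

The lower bound catches lazy-load (observed = 0); the upper bound catches double-load (observed ≈ 2× model_bytes).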

Five Whys

See commit message — captures why ±10% for residency, hard 0 for HtoD, and case-sensitive Grace flag.

Test plan

  • `cargo test -p aprender-core --lib gwr_001_005` — 29 passed
  • PMAT pre-commit gates green
  • CI green

🤖 Generated with Claude Code

Binds FALSIFY-GWR-001..005 from gpu-weight-residency-v1 at
PARTIAL_ALGORITHM_LEVEL via 5 verdict functions.

- GWR-001: nvidia-smi shows model_bytes resident at startup (±10%)
- GWR-002: ≥ 180 tok/s on Qwen2.5-1.5B Q4K (RTX 4090)
- GWR-003: zero `cudaMemcpyHtoD` per inference in steady state
- GWR-004: GPU-vs-CPU output token-id parity (slice equality)
- GWR-005: Grace Blackwell uses CU_MEM_ATTACH_GLOBAL flag

## Five Whys

1. Why does gpu-weight-residency-v1 list 5 falsification IDs without
   algorithm-level discharge? PMAT lints flagged FALSIFY-GWR-001..005
   as unbound at PARTIAL_ALGORITHM_LEVEL.
2. Why does that block ship? Coverage % cannot move while peripheral
   GPU-residency contracts have no algorithm-level verdict module.
3. Why ±10% tolerance for GWR-001 vs strict equality? CUDA contexts
   add ~50-200MB overhead; nvidia-smi reports total VRAM use, not
   just the model footprint. Strict equality would Fail every healthy
   server. AC_GWR_RESIDENCY_TOLERANCE_PCT=10 absorbs context+kernel
   overhead but still flags lazy-load (observed = 0) and double-load
   (observed = 2× model_bytes).
4. Why a hard 0 floor for GWR-003 not "≤ small N"? The contract is
   binary by design — even one HtoD per inference indicates the
   weights aren't truly resident, and the cost compounds at 100s of
   layers × 1000s of tokens. Sub-threshold tolerance would mask the
   regression class.
5. Why case-sensitive flag-name match for GWR-005? CUDA `#define`
   strings are case-significant; downstream tooling that grep-checks
   "ATTACH_GLOBAL" must see exactly that. Case-insensitive matching
   would let `cu_mem_attach_global` (typo) pass even though the
   underlying allocation flag would be a different macro on Grace.
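Whys 4 and 5 translate directly into tiny predicates. A sketch under the assumption that verdicts are plain boolean functions (names hypothetical), with the GWR-004 slice-equality parity check included; only the constant values and the comparison semantics come from this PR:

```rust
// Hypothetical verdict sketches for GWR-003/004/005; the constants and
// comparison semantics are from the PR, the function names are not.
const AC_GWR_MAX_HTOD_PER_INFERENCE: u64 = 0;
const AC_GWR_GRACE_ALLOC_FLAG: &str = "CU_MEM_ATTACH_GLOBAL";

/// GWR-003: binary by design; a single HtoD copy fails the gate.
fn gwr_003_htod_ok(htod_copies_per_inference: u64) -> bool {
    htod_copies_per_inference <= AC_GWR_MAX_HTOD_PER_INFERENCE
}

/// GWR-004: token-id parity as slice equality, no tolerance.
fn gwr_004_parity_ok(gpu_tokens: &[u32], cpu_tokens: &[u32]) -> bool {
    gpu_tokens == cpu_tokens
}

/// GWR-005: exact, case-sensitive match so the `cu_mem_attach_global`
/// typo cannot pass.
fn gwr_005_grace_flag_ok(flag: &str) -> bool {
    flag == AC_GWR_GRACE_ALLOC_FLAG
}
```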

Adds 29 unit tests, including a sweep in 5-percentage-point steps
across GWR-001's tolerance band and a 5-bucket TPS sweep on GWR-002.
The realistic-healthy scenario walks an RTX 4090 at 440 tok/s; the
pre-fix scenario walks 5 simultaneous regressions (zero residency, a
50 tok/s PCIe-bound baseline, 64 HtoD copies, sampler drift, and a
lazy Grace flag).
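A tolerance-band sweep of the kind described above can be sketched as follows. The bucket values are illustrative assumptions (the PR does not list them); the band logic mirrors GWR-001:

```rust
// Illustrative tolerance-band sweep for GWR-001. Bucket values are
// assumptions; the real test fixtures in aprender-core are not shown here.
fn gwr_001_residency_ok(model_bytes: u64, observed_bytes: u64) -> bool {
    observed_bytes >= model_bytes
        && (observed_bytes as f64) <= model_bytes as f64 * 1.10
}

/// Walk 7 buckets around the band edges; true iff every bucket
/// produces its expected verdict.
fn sweep_passes() -> bool {
    let model: u64 = 1_000_000_000; // assume a 1 GB weight footprint
    let buckets: [(u64, bool); 7] = [
        (0, false),             // lazy-load: nothing resident
        (999_999_999, false),   // just under the floor
        (1_000_000_000, true),  // exact footprint
        (1_050_000_000, true),  // mid-band: context overhead
        (1_100_000_000, true),  // exactly at the +10% ceiling
        (1_100_000_001, false), // just over the ceiling
        (2_000_000_000, false), // double-load
    ];
    buckets
        .iter()
        .all(|&(observed, expected)| gwr_001_residency_ok(model, observed) == expected)
}
```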

No runtime % shift; algorithm-level coverage advances by 5 gates.
@noahgift noahgift force-pushed the feat/gwr-001-005-partial-discharge branch from dbbc450 to 6b90819 on May 11, 2026 at 15:41
@noahgift noahgift enabled auto-merge (squash) May 11, 2026 15:41
@noahgift
Contributor Author

Superseded by #1637 (135-PR squash). The commit content is included verbatim in that PR's diff. Closing now to release runner slots; this PR would have auto-closed when #1637 merged.

@noahgift noahgift closed this May 12, 2026
auto-merge was automatically disabled May 12, 2026 09:21
