fix(task-148): Toyota Way 500-line refactor + FALSIFY-CORPUS-004 + QLoRA + GPU training backend #1003

Closed
noahgift wants to merge 2 commits into main from fix/task-148-toyota-way-500-line-bundle

Conversation

@noahgift
Contributor

Summary

Toyota Way file-size refactor (PMAT-689 / task #148) bundled with FALSIFY-CORPUS-004 pre-flight gate, QLoRA distillation contract (#137), and GPU training backend Phase 2 (#132).

Test plan

  • cargo fmt --all -- --check
  • cargo test -p apr-cli --features training --lib → 5307 passed, 12 ignored
  • cargo clippy -p apr-cli --features training --lib -- -D warnings
  • cargo clippy -p aprender-train --lib -- -D warnings
  • cargo test -p aprender-train --lib cpu_stepfn_exhaustion → 2 passed (PMAT-688 CPU peer)
  • CI green on push

🤖 Generated with Claude Code

noahgift and others added 2 commits April 22, 2026 07:49
Addresses 2026-04-22 outage where all 16 intel-clean-room runners went
offline because / on intel hit 100% (3.5T/3.6T). Runner diag logs
couldn't be written, so GitHub marked runners offline.

Two layers of defence (see the sketch after this commit message):
- pre-job hook: aggressive target/ prune when disk >= 85%
- nightly timer: prune target/ older than 7 days

Scripts are runner-host-agnostic — install path and deployment recipe in
scripts/runner-infra/README.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
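
A minimal Rust sketch of the pre-job threshold check described in this commit message. It is an illustration only: the real runner-infra scripts are not reproduced here, and the `df` invocation, parsing, and prune placeholder are all assumptions.

```rust
use std::process::Command;

/// Root filesystem usage as a percentage, parsed from `df --output=pcent /`
/// (a GNU coreutils flag). Returns None if df is missing or output changes.
fn root_disk_usage_percent() -> Option<u8> {
    let out = Command::new("df").args(["--output=pcent", "/"]).output().ok()?;
    let text = String::from_utf8(out.stdout).ok()?;
    // Output looks like "Use%\n 97%": skip the header row, strip the '%'.
    text.lines().nth(1)?.trim().trim_end_matches('%').parse().ok()
}

fn main() {
    // Same threshold as the pre-job hook: prune target/ when disk >= 85%.
    const PRUNE_THRESHOLD: u8 = 85;
    match root_disk_usage_percent() {
        Some(pct) if pct >= PRUNE_THRESHOLD => {
            eprintln!("disk at {pct}% >= {PRUNE_THRESHOLD}% -- pruning target/");
            // Placeholder: the real hook aggressively removes cargo target/
            // directories under the runner workspace here.
        }
        Some(pct) => eprintln!("disk at {pct}% -- below threshold, no prune"),
        None => eprintln!("could not determine disk usage; skipping prune"),
    }
}
```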
…or + FALSIFY-CORPUS-004 + QLoRA contract + GPU training backend

Toyota Way (PMAT-689): split 5 files over 500-line cap via include!() pattern (sketch after this list)
- distill.rs 1984→468 (4-way split: types/config_and_execute/train_and_write/text_generate)
- extended_commands.rs →497 (4 sibling sub-enum files: forensics/lints/runs/training)
- dispatch_analysis.rs →453 (+ dispatch_helpers.rs + dispatch_profiling.rs)
- lib_dispatch_coverage.rs 773→158 (3 sibling test files: analysis/profiling/train)
- pull.rs →374 (+ pull_sharded.rs)
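
As background on the split technique named above, a minimal sketch of the include!() pattern with hypothetical file names (the fragment's contents appear in the leading comment because include!() inherently spans two files). The fragment holds plain items with no `mod` wrapper, so the parent compiles exactly as if it were still one file while each file on disk stays under the 500-line cap.

```rust
// Fragment: src/distill_types.rs (hypothetical name). Plain items only,
// no `mod` wrapper, so they splice directly into the including file:
//
//     pub struct DistillConfig { pub temperature: f32 }
//     pub fn default_config() -> DistillConfig {
//         DistillConfig { temperature: 2.0 }
//     }

// Parent: src/distill.rs. include!() pastes the fragment's tokens here at
// compile time; the path is resolved relative to this file.
include!("distill_types.rs");

fn main() {
    let cfg = default_config();
    println!("temperature = {}", cfg.temperature);
}
```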

FALSIFY-CORPUS-004 pre-flight gate (#142/#144/#145/#146/#147):
- contracts/pretraining-corpus-v1.yaml v2.0.0 (INV-TRAIN-010/011)
- ShardBatchIter::count_tokens static counter
- cycling_iter.rs: BoxedShardIter + optional cycling (iterator sketch after this list)
- pretrain_preflight.rs + pretrain_report.rs module split
- --allow-shard-cycle CLI flag wired
- pretrain_tests.rs unit tests covering epoch/budget/cycle paths
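
A hedged sketch of the optional-cycling idea behind cycling_iter.rs and the --allow-shard-cycle flag; the alias, shard type, and constructor name below are assumptions, not the real signatures.

```rust
/// Illustrative alias; the real BoxedShardIter may differ in item type
/// and trait bounds.
type BoxedShardIter = Box<dyn Iterator<Item = Vec<u32>>>;

/// Hypothetical constructor mirroring --allow-shard-cycle: with cycling,
/// shards are re-read when the corpus is smaller than the token budget;
/// without it, a single pass lets the pre-flight gate fail fast on corpus
/// exhaustion instead of silently repeating data.
fn shard_iter(shards: Vec<Vec<u32>>, allow_cycle: bool) -> BoxedShardIter {
    if allow_cycle {
        Box::new(shards.into_iter().cycle())
    } else {
        Box::new(shards.into_iter())
    }
}

fn main() {
    let cycled: Vec<_> = shard_iter(vec![vec![1, 2], vec![3]], true)
        .take(5)
        .collect();
    println!("{cycled:?}"); // [[1, 2], [3], [1, 2], [3], [1, 2]]
}
```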

QLoRA distillation contract (#137; LoRA-math sketch after this list):
- contracts/entrenar/qlora-distillation-v1.yaml v1.1.0 PROPOSED
- distill/{preflight,driver,apr_writer}.rs wiring 5 INV-DISTILL invariants
- 14 harness tests at PARTIAL_ALGORITHM_LEVEL
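
For readers new to the technique, a minimal sketch of the LoRA update that QLoRA trains on top of a frozen quantized base. This illustrates the math only, not the aprender-train API; all names and dimensions are invented, and the base weight is shown dequantized to f32 for clarity.

```rust
/// Row-major matrix-vector product: m is rows x cols.
fn matvec(m: &[f32], rows: usize, cols: usize, x: &[f32]) -> Vec<f32> {
    (0..rows)
        .map(|i| (0..cols).map(|j| m[i * cols + j] * x[j]).sum())
        .collect()
}

/// LoRA forward pass: y = W*x + (alpha/r) * B*(A*x).
/// W (d_out x d_in) is frozen (4-bit quantized in QLoRA); only A (r x d_in)
/// and B (d_out x r) receive gradients during distillation.
fn lora_forward(
    w: &[f32], a: &[f32], b: &[f32], x: &[f32],
    d_out: usize, d_in: usize, r: usize, alpha: f32,
) -> Vec<f32> {
    let base = matvec(w, d_out, d_in, x);
    let ax = matvec(a, r, d_in, x);
    let bax = matvec(b, d_out, r, &ax);
    base.iter()
        .zip(bax)
        .map(|(y, delta)| y + (alpha / r as f32) * delta)
        .collect()
}

fn main() {
    // d_out = 2, d_in = 2, r = 1: tiny dims just to exercise the math.
    let w: [f32; 4] = [1.0, 0.0, 0.0, 1.0]; // frozen base (identity)
    let a: [f32; 2] = [1.0, 1.0]; // trainable A (1 x 2)
    let b: [f32; 2] = [0.5, -0.5]; // trainable B (2 x 1)
    let x: [f32; 2] = [2.0, 3.0];
    println!("{:?}", lora_forward(&w, &a, &b, &x, 2, 2, 1, 1.0)); // [4.5, 0.5]
}
```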

GPU training backend Phase 2 (#132):
- pretrain_real_cuda.rs CUDA dispatch wiring (dispatch sketch after this list)
- evidence/gpu-training-backend/ Phase 2 scaffold
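
A hedged sketch of what CPU/CUDA backend dispatch wiring can look like; every name below is invented, and the real pretrain_real_cuda.rs routing is certainly more involved.

```rust
// Invented types for illustration; not the apr-cli/aprender-train API.
enum TrainBackend {
    Cpu,
    #[cfg(feature = "cuda")]
    Cuda { device: usize },
}

fn select_backend(want_gpu: bool) -> TrainBackend {
    #[cfg(feature = "cuda")]
    if want_gpu {
        return TrainBackend::Cuda { device: 0 };
    }
    let _ = want_gpu; // without the cuda feature, always fall back to CPU
    TrainBackend::Cpu
}

fn train_step(backend: &TrainBackend, params: &mut [f32], grads: &[f32], lr: f32) {
    match backend {
        TrainBackend::Cpu => {
            // Reference SGD step: the CPU peer path that the CUDA kernels
            // must match (cf. the cpu_stepfn_exhaustion tests above).
            for (p, g) in params.iter_mut().zip(grads) {
                *p -= lr * g;
            }
        }
        #[cfg(feature = "cuda")]
        TrainBackend::Cuda { device } => {
            let _ = device; // Phase 2: launch the equivalent kernel here
            unimplemented!("CUDA path");
        }
    }
}

fn main() {
    let backend = select_backend(false);
    let mut params = vec![1.0_f32, 2.0];
    train_step(&backend, &mut params, &[0.5, 0.5], 0.1);
    println!("{params:?}"); // [0.95, 1.95]
}
```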

MODEL-2 spec updates:
- ship-two-models-spec.md v2.24.0 (INV-TRAIN-011 + corpus v2.0)
- roadmap.yaml phase tracking for tasks #142/#144/#145/#146/#147

Verification:
- cargo test -p apr-cli --features training --lib → 5307 passed
- cargo fmt --all -- --check → clean
- cargo clippy -p apr-cli --features training --lib -- -D warnings → clean
- cargo clippy -p aprender-train --lib -- -D warnings → clean
- All changed files ≤500 lines (pmat work complete invariant GREEN)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@codeinputmachine

Hey @noahgift, Code Input detected that this PR has a merge conflict. The conflict can be resolved with a semantic merge driver, and Code Input can do that automatically: https://codeinput.com/r/3rcKnuOwggR. Let me know if you need more help with this conflict or with how Code Input works.

@noahgift
Contributor Author

Triaged by autonomous sweep (2026-05-11): this PR is a 95-file / 128-commit bundle that pre-dates the current §17.5 cascade and the §50.4 polymorphic-preflight cascade now on main. Auto-arming was skipped because:

  1. The blast radius (95 files across multiple crates) makes a clean cherry-pick onto current main impractical.
  2. Several of the changes (FALSIFY-CORPUS-004, QLoRA, GPU training backend) are likely either superseded by or in conflict with PRs that landed during the §50.4 cascade and the SHIP-007 §22 fix cascade (M91-M103). The §50.4 cascade PRs in question:
     - feat(apr-cli): wire apr pretrain --init <model.apr> — §49 step 4 (#1471)
     - contract(apr-pretrain-arch-polymorphic-v1): v1.0.0 PROPOSED — §50.4 step 5a (#1473)
     - fix(aprender-train): qwen2_0_5b tie_word_embeddings true — §50.4 step 5b + DEFECT FIX (#1474)
     - feat(aprender-train): build_transformer_config polymorphic dispatch — §50.4 step 5c (#1475)
     - feat(apr-cli): polymorphic preflight_tokenizer_vocab_matches_target — §50.4 step 5d (#1476)
     - test(aprender-train): GQA-7:1 forward-pass smoke test — §50.4 step 5e (#1478)
     - feat(aprender-train): validate_pretrain_init_arch_compatible — §50.4 step 5f.1 (#1479)
     - feat(aprender-train): load_init_tensors_from_apr — §50.4 step 5f.2 (#1481)
     - contract(apr-pretrain-arch-polymorphic-v1): v1.0.0 → v1.1.0 PARTIAL_ALGORITHM_LEVEL (#1482)
     - feat(aprender-train): populate_trainer_from_init_tensors — §50.4 step 5f.3 (#1483)
     - spec(ship-two-models): v2.96.0 → v2.97.0 — §52 cascade ALGORITHM-COMPLETE + 5f.4 wireup gap (#1486)
     - feat(apr-cli + aprender-train): apr pretrain --init wireup — §50.4 step 5f.4 (#1494)

Recommended next step: split this into the still-relevant subset (probably the GPU training backend changes that aren't already on main) as one or two focused PRs, and close this one as superseded. Leaving as-is for human review.

@noahgift
Contributor Author

Triaged after rebase attempt (2026-05-13). Rebase against current main produces 3 structural compile errors:

  1. crate::models::llama_370m::Llama370MConfig — module removed during MODEL-2 architecture-coupling cleanup (§50 multi-PR cascade)
  2. config::Normalization — removed from tokenizer::config exports
  3. BPETokenizer::preprocess — method removed

The PR's 125-commit Toyota Way refactor pre-dates 22+ commits to main during this session. The refactor scope and code touched (95 files including tokenizer/bpe.rs, models/, distill/) overlap heavily with those commits. Re-authoring against current main is more tractable than a rebase. Closing as superseded; please re-open with a fresh PR focused on the still-applicable subset.

noahgift closed this May 13, 2026