[Ascend][RFC] Mainline Basic Capability Adaptation Checklist #279

@CalvinXKY

Summary

This RFC defines the basic capability adaptation checklist for the Ascend mainline (ascend branch) in Vime. It tracks near-term, verifiable milestones—reproducible images, accuracy-gated training modes, and platform capabilities—that must be validated before scaling to MoE, multi-node, multimodal, and agentic workloads.

This checklist does not replace the broader Ascend roadmaps. It focuses on what must be hardened on top of the current protected mainline.

Related RFCs

Background

The ascend branch is now the protected Ascend mainline. Two foundational merges have landed:

| PR | Description | Status |
| --- | --- | --- |
| #266 | Dockerfile.npu + docker/npu_patch/: reproducible NPU image build and colocate patches | Merged |
| #267 | Rebuild ascend on NPU implementation: training/rollout backend, weight transfer, NPU attention, scripts, unit tests | Merged |

Implication: further changes land on a protected branch under review policy (see below). This issue tracks what must be validated and hardened on top of #266 / #267 before basic capabilities can be considered complete.

Relationship to upstream RFCs

| Upstream RFC | Scope | This checklist |
| --- | --- | --- |
| #243 | vime-ascend ecosystem roadmap (A2/A3/A5, TQ, R3, P2P, multimodal/MoE, agentic) | Subset: infrastructure + basic acceptance gates |
| #51 | NPU support matrix (images, weight transfer, parallelism, VLM, R3, agentic) | Subset: items required for mainline basic capabilities |

Global Acceptance Gate

Every item below must run successfully and meet accuracy / alignment requirements:

| Gate | Criteria |
| --- | --- |
| Logprob alignment | logprob diff / KL within agreed thresholds; no spikes |
| Reproducibility | fixed seed + fixed data subset → reward / loss reproducible |
| Checkpointing | save → load → resume → metrics continuous |
| Weight sync | after IPC (colocate) or HCCL (non-colocate) sync, rollout logprob matches training forward |
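
The logprob alignment and weight sync gates both reduce to comparing two sets of per-token logprobs for the same sampled tokens: one from the training forward pass, one from the rollout engine. The following is a minimal sketch of such a check, not the project's implementation. The function name, the result type, and all three thresholds are assumptions for illustration; the agreed thresholds are not stated in this RFC. The KL value is a sample-based estimate (the form often called k3), which assumes the tokens were sampled from the rollout policy.

```python
import math
from dataclasses import dataclass


@dataclass
class LogprobGateResult:
    mean_abs_diff: float
    max_abs_diff: float
    kl_estimate: float
    passed: bool


def check_logprob_alignment(train_logprobs, rollout_logprobs,
                            max_mean_diff=1e-2, max_spike=1e-1, max_kl=1e-3):
    """Compare per-token logprobs of the sampled tokens.

    All thresholds are placeholders, not values agreed in the RFC.
    """
    if not train_logprobs or len(train_logprobs) != len(rollout_logprobs):
        raise ValueError("logprob sequences must be non-empty and equal length")

    # log ratio = log p_train(token) - log p_rollout(token)
    diffs = [t - r for t, r in zip(train_logprobs, rollout_logprobs)]
    abs_diffs = [abs(d) for d in diffs]
    mean_abs = sum(abs_diffs) / len(abs_diffs)
    max_abs = max(abs_diffs)  # a single large value is a "spike"

    # Sample-based estimate of KL(rollout || train): exp(d) - 1 - d >= 0.
    kl = sum(math.exp(d) - 1.0 - d for d in diffs) / len(diffs)

    passed = mean_abs <= max_mean_diff and max_abs <= max_spike and kl <= max_kl
    return LogprobGateResult(mean_abs, max_abs, kl, passed)
```

Running the same check immediately after an IPC or HCCL weight sync would cover the weight sync gate, since a stale or partially transferred weight shows up as a jump in the per-token difference.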

TODO List

Core Capabilities (Mainline Prerequisites)

| Status | Item | Acceptance Criteria | Notes | Owner |
| --- | --- | --- | --- | --- |
| [ ] | Reproducible image build; no runtime manual patching | Fresh container imports full stack; colocate smoke (1 step) passes; diff not worse than baseline | Built from #266; A2 / A3 | @Meihan-chen, @miracle0517 |
| [ ] | Qwen3-4B non-colocate (train / rollout split) | 1 full rollout + 1 training step; HCCL weight sync aligned; logprob diff acceptable | #51 | @Meihan-chen, @miracle0517 |
| [ ] | Qwen3-4B colocate | ≥10 steps; weight sync stable (~seconds); logprob diff acceptable | #243 A2 colocate | @CalvinXKY, @Fulin-Gao |
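
The "fresh container imports full stack" criterion can be pre-checked cheaply before the colocate smoke step runs. The sketch below only verifies that each module can be located in the container. It does not import anything, so a real import pass is still needed to catch broken shared libraries. The module list is a hypothetical placeholder and should be replaced with whatever the image built from #266 is expected to ship.

```python
import importlib.util


def missing_modules(required):
    """Return the names in `required` that cannot be located in this environment."""
    missing = []
    for name in required:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised for dotted names with a missing parent or a module
            # whose __spec__ is unset; treat both as "not available".
            found = False
        if not found:
            missing.append(name)
    return missing


# Hypothetical module list, not taken from the RFC.
REQUIRED = ["torch", "torch_npu"]

if __name__ == "__main__":
    gaps = missing_modules(REQUIRED)
    print("missing:", ", ".join(gaps) if gaps else "none")
```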

Ascend Platform Capabilities

| Status | Item | Acceptance Criteria | Notes | Owner |
| --- | --- | --- | --- | --- |
| [ ] | Save / resume | save → load → continue training; next-step loss continuous | | @yuxinshan |
| [ ] | R3 (Rollout Routing Replay) | MoE or designated model smoke; no numerical anomalies; reward curve + reference case | #51; #243 | wukaiyuan, @yuxinshan |
| [ ] | Multi-node 2×8 | ≥1 step; loss consistent across ranks; TP / DP / EP / PP supported | #243 | @miracle0517 |
| [ ] | Multimodal model support | VLM train/infer path working; reward or accuracy within tolerance vs GPU | #51 | @Meihan-chen, @floatlibai |
| [ ] | A5 hardware validation | Equivalent smoke / regression on A5 | #243 | @yuxinshan |
| [ ] | CI tests | NPU smoke / regression in CI; failures block merge | TBD | yangzeyu |
| [ ] | CI check adjustments | Aligned with NPU stack, patches, image tags | | |
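
Two of the criteria above, "next-step loss continuous" and "loss consistent across ranks", are simple numeric comparisons once the losses have been logged. A minimal sketch follows, assuming the losses are already collected as plain floats. The function names and tolerances are illustrative and not part of the RFC.

```python
import math


def losses_continuous(reference_losses, resumed_losses, rel_tol=1e-3, abs_tol=1e-6):
    """Compare an uninterrupted run with a run resumed from a checkpoint,
    over the steps that follow the save point."""
    if len(reference_losses) != len(resumed_losses):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(reference_losses, resumed_losses)
    )


def ranks_consistent(loss_by_rank, abs_tol=1e-6):
    """Check that every rank reports a finite loss within abs_tol of the others."""
    values = list(loss_by_rank.values())
    if not values or not all(math.isfinite(v) for v in values):
        return False
    return max(values) - min(values) <= abs_tol
```

For save / resume, `reference_losses` would come from a run that never stopped and `resumed_losses` from a run that loaded the checkpoint, both with the same seed and data subset.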

Merge policy (ascend protected branch)

  • Large PRs: merge via review meeting
  • Small PRs: ≥2 reviewers required

Model: Qwen3-30B-A3B (MoE)

| Status | Item | Acceptance Criteria | Notes | Owner |
| --- | --- | --- | --- | --- |
| [ ] | Qwen3-30B-A3B end-to-end | 1 rollout + 1 training step; MoE healthy; no NaN | A3 first; A2 follow-up | @floatlibai (A3), @yuxinshan (A2) |

TQ (Training Quality) — GPU / NPU Alignment

| Status | Item | Acceptance Criteria | Notes | Owner |
| --- | --- | --- | --- | --- |
| [ ] | GPU baseline | Same config + data subset; metrics archived | Reference for NPU | @miracle0517 |
| [ ] | NPU alignment | Same seed vs GPU baseline; diff / reward within tolerance | | @miracle0517 |
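
Once the GPU baseline metrics are archived, the NPU alignment item amounts to a step-by-step comparison against them. The sketch below assumes each run is stored as a list of per-step dictionaries such as `{"loss": ..., "reward": ...}`. That layout, the function name, and the 2% default tolerance are assumptions, since the RFC does not fix a format or a threshold.

```python
def compare_to_baseline(baseline, candidate, rel_tol=0.02):
    """Return (step, metric, baseline_value, candidate_value) for every metric
    whose relative deviation from the baseline exceeds rel_tol."""
    if len(baseline) != len(candidate):
        raise ValueError("runs must cover the same number of steps")

    violations = []
    for step, (ref, got) in enumerate(zip(baseline, candidate)):
        for name, ref_value in ref.items():
            if name not in got:
                # A metric missing from the candidate run is reported too.
                violations.append((step, name, ref_value, None))
                continue
            scale = max(abs(ref_value), 1e-12)  # avoid division by zero
            if abs(got[name] - ref_value) / scale > rel_tol:
                violations.append((step, name, ref_value, got[name]))
    return violations
```

An empty list means the NPU run stayed within tolerance at every step; a non-empty list points at the first step and metric worth inspecting.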

Scale-up (after multi-node + multimodal are ready):

  • Cluster size: >2 nodes
  • Dataset: production-scale volume
  • Deliverable: full training guide / reference recipe

Agentic Workloads (@Fulin-Gao, @momo609)

| Status | Item | Acceptance Criteria | Notes | Owner |
| --- | --- | --- | --- | --- |
| [ ] | ReTool: tool-call RL (Qwen3-4B) | Executable tool sandbox rollout; reward curve acceptable | #51; examples/retool | |
| [ ] | τ-bench: tool-call agentic eval (Qwen3-4B) | Agentic eval smoke passes; scoring matches GPU reference | #51, #243; examples/tau-bench, #142 | |
| [ ] | Search-R1: retrieval-augmented rollout (Qwen2.5-3B) | Search server integrated; rollout completes; reward reasonable | #51; examples/search-r1 | |
| [ ] | Search Agent | Search-integrated agentic rollout on NPU | #243 | |
| [ ] | Multi-Agent RL (Qwen3-30B-A3B) | Reward weighting matches GPU reference | #51; examples/multi_agent | |
| [ ] | Coding-Agent SWE | Subset grading correct; trainable token segments match GPU | #243 Code Agent; examples/coding_agent_rl | |
| [ ] | SWE-style on Modal | Agentic SWE rollouts with Modal cloud sandbox | #51 | |
| [ ] | RLinf | Enterprise agentic scenario smoke on NPU | #243 | |
| [ ] | Qwen3-VL Geo3K multi-turn | Reward curve acceptable; tool parsing correct; multi-turn loss_mask aligned | #51; examples/geo3k_vlm_multi_turn | |
| [ ] | Fully-async rollout | Same seed as sync; reward / loss within tolerance | | |
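
Several items above require that the trainable token segments or the multi-turn loss_mask match the GPU reference. A minimal sketch of that comparison follows, assuming each mask is available as a flat list of 0/1 values per token. The helper names are hypothetical and do not refer to functions in the repository.

```python
def first_mask_mismatch(reference_mask, candidate_mask):
    """Return the index of the first differing position, or None if the masks match.

    A length difference counts as a mismatch at the shorter length.
    """
    for i, (a, b) in enumerate(zip(reference_mask, candidate_mask)):
        if a != b:
            return i
    if len(reference_mask) != len(candidate_mask):
        return min(len(reference_mask), len(candidate_mask))
    return None


def trainable_segments(mask):
    """Collapse a 0/1 loss mask into half-open (start, end) spans of trainable tokens."""
    spans, start = [], None
    for i, bit in enumerate(mask):
        if bit and start is None:
            start = i                     # a trainable span begins
        elif not bit and start is not None:
            spans.append((start, i))      # the span ended at the previous token
            start = None
    if start is not None:
        spans.append((start, len(mask)))  # span runs to the end of the sequence
    return spans
```

Comparing the span lists from `trainable_segments` is usually easier to read in a failure report than two raw masks, because a shifted turn boundary shows up as a single changed span.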

Suggested Dependencies

#266 / #267 merged (protected ascend mainline)
  → Reproducible image validation (no runtime patching)
  → Qwen3-4B non-colocate / colocate
  → Accuracy gates (save-resume, weight sync)
  → Qwen3-30B-A3B MoE → Multi-node / R3 → Multimodal → Agentic → TQ at scale
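
The dependency list above can be encoded as a small graph so that tooling can answer "what can be validated next?". The sketch below reads the arrows as one strict chain, which is an assumption: the indentation could also be read as three parallel items under the merged mainline. The item names are shortened paraphrases, not identifiers used anywhere in the repository. It relies on `graphlib`, available in the Python standard library from 3.9.

```python
from graphlib import TopologicalSorter

# Each key depends on the items in its set.
DEPENDS_ON = {
    "image-validation": {"mainline-merged"},
    "qwen3-4b-modes": {"image-validation"},
    "accuracy-gates": {"qwen3-4b-modes"},
    "qwen3-30b-a3b-moe": {"accuracy-gates"},
    "multi-node-and-r3": {"qwen3-30b-a3b-moe"},
    "multimodal": {"multi-node-and-r3"},
    "agentic": {"multimodal"},
    "tq-at-scale": {"agentic"},
}


def validation_order(depends_on):
    """Return one ordering in which every item follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def ready_items(depends_on, done):
    """Return unfinished items whose dependencies are all in `done`."""
    done = set(done)
    return sorted(
        item for item, deps in depends_on.items()
        if item not in done and deps <= done
    )
```

For example, `ready_items(DEPENDS_ON, {"mainline-merged"})` returns only the image validation item, matching the first arrow in the list.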

One-Page Checklist (Appendix)

  • Reproducible image build; no runtime manual patching
  • Qwen3-4B non-colocate: 1 cycle, HCCL weight sync, logprob diff acceptable
  • Qwen3-4B colocate: ≥10 steps, logprob diff acceptable
  • Qwen3-30B-A3B: 1 rollout + 1 training step, MoE healthy
  • Save / resume: loss continuous
  • R3: MoE or designated model smoke, no numerical anomalies
  • Multi-node 2×8: 1 step, consistent loss across ranks, TP/DP/EP/PP supported
  • Multimodal model support
  • A5 hardware validation
  • CI tests + CI check adjustments
  • TQ: GPU baseline + NPU alignment
  • ReTool (Qwen3-4B): tool-call RL smoke
  • τ-bench (Qwen3-4B): agentic eval smoke
  • Search-R1 (Qwen2.5-3B): retrieval-augmented rollout
  • Search Agent
  • Multi-Agent (Qwen3-30B-A3B): reward weighting matches GPU
  • Coding-Agent SWE: subset grading correct
  • SWE-style on Modal
  • RLinf
  • Qwen3-VL Geo3K multi-turn: reward curve acceptable
  • Fully-async: same seed as sync, within tolerance
