Summary
This RFC defines the basic capability adaptation checklist for the Ascend mainline (ascend branch) in Vime. It tracks near-term, verifiable milestones—reproducible images, accuracy-gated training modes, and platform capabilities—that must be validated before scaling to MoE, multi-node, multimodal, and agentic workloads.
This checklist does not replace the broader Ascend roadmaps. It focuses on what must be hardened on top of the current protected mainline.
Related RFCs
Background
The ascend branch is now the protected Ascend mainline. Two foundational merges have landed:
| PR |
Description |
Status |
| #266 |
Dockerfile.npu + docker/npu_patch/ — reproducible NPU image build and colocate patches |
Merged |
| #267 |
Rebuild ascend on NPU implementation — training/rollout backend, weight transfer, NPU attention, scripts, unit tests |
Merged |
Implication: further changes land on a protected branch under review policy (see below). This issue tracks what must be validated and hardened on top of #266 / #267 before basic capabilities can be considered complete.
Relationship to upstream RFCs
| Upstream RFC |
Scope |
This checklist |
| #243 |
vime-ascend ecosystem roadmap (A2/A3/A5, TQ, R3, P2P, multimodal/MoE, agentic) |
Subset: infrastructure + basic acceptance gates |
| #51 |
NPU support matrix (images, weight transfer, parallelism, VLM, R3, agentic) |
Subset: items required for mainline basic capabilities |
Global Acceptance Gate
Every item below must run successfully and meet accuracy / alignment requirements:
| Gate |
Criteria |
| Logprob alignment |
logprob diff / KL within agreed thresholds; no spikes |
| Reproducibility |
fixed seed + fixed data subset → reward / loss reproducible |
| Checkpointing |
save → load → resume → metrics continuous |
| Weight sync |
after IPC (colocate) or HCCL (non-colocate) sync, rollout logprob matches training forward |
TODO List
Core Capabilities (Mainline Prerequisites)
| Status |
Item |
Acceptance Criteria |
Notes |
Owner |
| [ ] |
Reproducible image build; no runtime manual patching |
Fresh container imports full stack; colocate smoke (1 step) passes; diff not worse than baseline |
Built from #266; A2 / A3 |
@Meihan-chen, @miracle0517 |
| [ ] |
Qwen3-4B non-colocate (train / rollout split) |
1 full rollout + 1 training step; HCCL weight sync aligned; logprob diff acceptable |
#51 |
@Meihan-chen, @miracle0517 |
| [ ] |
Qwen3-4B colocate |
≥10 steps; weight sync stable (~seconds); logprob diff acceptable |
#243 A2 colocate |
@CalvinXKY, @Fulin-Gao |
Ascend Platform Capabilities
| Status |
Item |
Acceptance Criteria |
Notes |
Owner |
| [ ] |
Save / resume |
save → load → continue training; next-step loss continuous |
|
@yuxinshan |
| [ ] |
R3 (Rollout Routing Replay) |
MoE or designated model smoke; no numerical anomalies; reward curve + reference case |
#51; #243 |
wukaiyuan, @yuxinshan |
| [ ] |
Multi-node 2×8 |
≥1 step; loss consistent across ranks; TP / DP / EP / PP supported |
#243 |
@miracle0517 |
| [ ] |
Multimodal model support |
VLM train/infer path working; reward or accuracy within tolerance vs GPU |
#51 |
@Meihan-chen, @floatlibai |
| [ ] |
A5 hardware validation |
Equivalent smoke / regression on A5 |
#243 |
@yuxinshan |
| [ ] |
CI tests |
NPU smoke / regression in CI; failures block merge |
TBD |
yangzeyu |
| [ ] |
CI check adjustments |
Aligned with NPU stack, patches, image tags |
|
|
Merge policy (ascend protected branch)
- Large PRs: merge via review meeting
- Small PRs: ≥2 reviewers required
Model: Qwen3-30B-A3B (MoE)
| Status |
Item |
Acceptance Criteria |
Notes |
Owner |
| [ ] |
Qwen3-30B-A3B end-to-end |
1 rollout + 1 training step; MoE healthy; no NaN |
A3 first; A2 follow-up |
@floatlibai (A3), @yuxinshan (A2) |
TQ (Training Quality) — GPU / NPU Alignment
| Status |
Item |
Acceptance Criteria |
Notes |
Owner |
| [ ] |
GPU baseline |
Same config + data subset; metrics archived |
Reference for NPU |
@miracle0517 |
| [ ] |
NPU alignment |
Same seed vs GPU baseline; diff / reward within tolerance |
|
@miracle0517 |
Scale-up (after multi-node + multimodal are ready):
- Cluster size: >2 nodes
- Dataset: production-scale volume
- Deliverable: full training guide / reference recipe
| Status |
Item |
Acceptance Criteria |
Notes |
Owner |
| [ ] |
ReTool — tool-call RL (Qwen3-4B) |
Executable tool sandbox rollout; reward curve acceptable |
#51; examples/retool |
|
| [ ] |
τ-bench — tool-call agentic eval (Qwen3-4B) |
Agentic eval smoke passes; scoring matches GPU reference |
#51, #243; examples/tau-bench, #142 |
|
| [ ] |
Search-R1 — retrieval-augmented rollout (Qwen2.5-3B) |
Search server integrated; rollout completes; reward reasonable |
#51; examples/search-r1 |
|
| [ ] |
Search Agent |
Search-integrated agentic rollout on NPU |
#243 |
|
| [ ] |
Multi-Agent RL (Qwen3-30B-A3B) |
Reward weighting matches GPU reference |
#51; examples/multi_agent |
|
| [ ] |
Coding-Agent SWE |
Subset grading correct; trainable token segments match GPU |
#243 Code Agent; examples/coding_agent_rl |
|
| [ ] |
SWE-style on Modal |
Agentic SWE rollouts with Modal cloud sandbox |
#51 |
|
| [ ] |
RLinf |
Enterprise agentic scenario smoke on NPU |
#243 |
|
| [ ] |
Qwen3-VL Geo3K multi-turn |
Reward curve acceptable; tool parsing correct; multi-turn loss_mask aligned |
#51; examples/geo3k_vlm_multi_turn |
|
| [ ] |
Fully-async rollout |
Same seed as sync; reward / loss within tolerance |
|
|
Suggested Dependencies
#266 / #267 merged (protected ascend mainline)
→ Reproducible image validation (no runtime patching)
→ Qwen3-4B non-colocate / colocate
→ Accuracy gates (save-resume, weight sync)
→ Qwen3-30B-A3B MoE → Multi-node / R3 → Multimodal → Agentic → TQ at scale
One-Page Checklist (Appendix)
Summary
This RFC defines the basic capability adaptation checklist for the Ascend mainline (
ascendbranch) in Vime. It tracks near-term, verifiable milestones—reproducible images, accuracy-gated training modes, and platform capabilities—that must be validated before scaling to MoE, multi-node, multimodal, and agentic workloads.This checklist does not replace the broader Ascend roadmaps. It focuses on what must be hardened on top of the current protected mainline.
Related RFCs
Background
The
ascendbranch is now the protected Ascend mainline. Two foundational merges have landed:Dockerfile.npu+docker/npu_patch/— reproducible NPU image build and colocate patchesascendon NPU implementation — training/rollout backend, weight transfer, NPU attention, scripts, unit testsImplication: further changes land on a protected branch under review policy (see below). This issue tracks what must be validated and hardened on top of #266 / #267 before basic capabilities can be considered complete.
Relationship to upstream RFCs
Global Acceptance Gate
Every item below must run successfully and meet accuracy / alignment requirements:
TODO List
Core Capabilities (Mainline Prerequisites)
Ascend Platform Capabilities
Merge policy (
ascendprotected branch)Model: Qwen3-30B-A3B (MoE)
TQ (Training Quality) — GPU / NPU Alignment
Scale-up (after multi-node + multimodal are ready):
Agentic Workloads (@Fulin-Gao, @momo609)
examples/retoolexamples/tau-bench, #142examples/search-r1examples/multi_agentexamples/coding_agent_rlloss_maskalignedexamples/geo3k_vlm_multi_turnSuggested Dependencies
One-Page Checklist (Appendix)