[Ascend][RFC] Mainline Basic Capability Adaptation Checklist #279

## Summary

This RFC defines the **basic capability adaptation checklist** for the **Ascend mainline** (`ascend` branch) in Vime. It tracks near-term, verifiable milestones—reproducible images, accuracy-gated training modes, and platform capabilities—that must be validated before scaling to MoE, multi-node, multimodal, and agentic workloads.

This checklist does **not** replace the broader Ascend roadmaps. It focuses on what must be hardened on top of the current protected mainline.

## Related RFCs

- [\[Ascend\]\[RFC\] vime-ascend Build and Roadmap #243](https://github.com/vllm-project/vime/issues/243)
- [\[RFC\] NPU support roadmap #51](https://github.com/vllm-project/vime/issues/51)

## Background

The `ascend` branch is now the **protected Ascend mainline**. Two foundational merges have landed:

| PR | Description | Status |
|----|-------------|--------|
| [#266](https://github.com/vllm-project/vime/pull/266) | `Dockerfile.npu` + `docker/npu_patch/` — reproducible NPU image build and colocate patches | **Merged** |
| [#267](https://github.com/vllm-project/vime/pull/267) | Rebuild `ascend` on NPU implementation — training/rollout backend, weight transfer, NPU attention, scripts, unit tests | **Merged** |

**Implication:** further changes land on a **protected branch** and are subject to the merge policy listed under Ascend Platform Capabilities below. This issue tracks what must be **validated and hardened** on top of #266 / #267 before basic capabilities can be considered complete.

### Relationship to upstream RFCs

| Upstream RFC | Scope | This checklist |
|--------------|-------|----------------|
| #243 | vime-ascend ecosystem roadmap (A2/A3/A5, TQ, R3, P2P, multimodal/MoE, agentic) | Subset: infrastructure + basic acceptance gates |
| #51 | NPU support matrix (images, weight transfer, parallelism, VLM, R3, agentic) | Subset: items required for mainline basic capabilities |

## Global Acceptance Gate

Every item below must **run successfully** and meet **accuracy / alignment** requirements:

| Gate | Criteria |
|------|----------|
| Logprob alignment | logprob diff / KL within agreed thresholds; no spikes |
| Reproducibility | fixed seed + fixed data subset → reward / loss reproducible |
| Checkpointing | save → load → resume → metrics continuous |
| Weight sync | after IPC (colocate) or HCCL (non-colocate) sync, rollout logprob matches training forward |
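
As an illustration of the logprob alignment gate, the sketch below compares per-token logprobs of the sampled tokens from the training forward pass and from rollout. It is a minimal example, not the checker used in this repository: the function name `logprob_gate`, the threshold values, and the choice of the `(ratio - 1) - log(ratio)` KL estimator are all assumptions, since the RFC only says "agreed thresholds".

```python
import math


def logprob_gate(train_lp, rollout_lp, max_mean_abs_diff=0.01, max_kl=1e-3, spike=0.5):
    """Check rollout vs. training logprobs for the same sampled tokens.

    All thresholds are placeholders (assumptions), not values agreed in the RFC.
    """
    if not train_lp or len(train_lp) != len(rollout_lp):
        raise ValueError("need two non-empty logprob lists of equal length")

    # d = log p_rollout(token) - log p_train(token), one value per token
    diffs = [r - t for t, r in zip(train_lp, rollout_lp)]
    mean_abs_diff = sum(abs(d) for d in diffs) / len(diffs)
    max_abs_diff = max(abs(d) for d in diffs)

    # Low-variance estimate of KL(rollout || train) from tokens sampled by rollout:
    # with ratio = p_train / p_rollout = exp(-d), each term is (ratio - 1) - log(ratio).
    kl = sum(math.exp(-d) - 1.0 + d for d in diffs) / len(diffs)

    passed = (
        mean_abs_diff <= max_mean_abs_diff
        and kl <= max_kl
        and max_abs_diff <= spike  # "no spikes": no single token far off
    )
    return {
        "passed": passed,
        "mean_abs_diff": mean_abs_diff,
        "max_abs_diff": max_abs_diff,
        "kl": kl,
    }
```

The same check can serve the weight sync gate by feeding it rollout logprobs collected right after an IPC or HCCL sync.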

## TODO List

### Core Capabilities (Mainline Prerequisites)

| Status | Item | Acceptance Criteria | Notes | Owner |
|:------:|------|---------------------|-------|-------|
| [ ] | Reproducible image build; no runtime manual patching | Fresh container imports full stack; colocate smoke (1 step) passes; logprob diff not worse than baseline | Built from #266; A2 / A3 | @Meihan-chen, @miracle0517 |
| [ ] | Qwen3-4B non-colocate (train / rollout split) | 1 full rollout + 1 training step; HCCL weight sync aligned; logprob diff acceptable | #51 | @Meihan-chen, @miracle0517 |
| [ ] | Qwen3-4B colocate | ≥10 steps; weight sync time stable (on the order of seconds); logprob diff acceptable | #243 A2 colocate | @CalvinXKY, @Fulin-Gao |

### Ascend Platform Capabilities

| Status | Item | Acceptance Criteria | Notes | Owner |
|:------:|------|---------------------|-------|-------|
| [ ] | Save / resume | save → load → continue training; next-step loss continuous | | @yuxinshan |
| [ ] | R3 (Rollout Routing Replay) | MoE or designated model smoke; no numerical anomalies; reward curve + reference case | #51; #243 | wukaiyuan, @yuxinshan |
| [ ] | Multi-node 2×8 | ≥1 step; loss consistent across ranks; TP / DP / EP / PP supported | #243 | @miracle0517 |
| [ ] | Multimodal model support | VLM train/infer path working; reward or accuracy within tolerance vs GPU | #51 | @Meihan-chen, @floatlibai |
| [ ] | A5 hardware validation | Equivalent smoke / regression on A5 | #243 | @yuxinshan |
| [ ] | CI tests | NPU smoke / regression in CI; failures block merge | TBD | yangzeyu |
| [ ] | CI check adjustments | Aligned with NPU stack, patches, image tags | | |
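
The "loss continuous" and "loss consistent across ranks" criteria above could be checked along the following lines. This is a hedged sketch: the helper names, the tolerance values, and the assumption that per-step losses are available as plain Python floats (for example, parsed from training logs) are illustrative and not taken from the codebase.

```python
import math


def loss_is_continuous(reference_losses, resumed_losses, resume_step, rel_tol=1e-3):
    """Compare a resumed run against an uninterrupted reference run.

    reference_losses[i] is the loss at step i of the uninterrupted run;
    resumed_losses[0] is the first loss logged after loading the checkpoint
    saved before `resume_step`. rel_tol is a placeholder tolerance.
    """
    compared = 0
    for offset, resumed in enumerate(resumed_losses):
        step = resume_step + offset
        if step >= len(reference_losses):
            break
        # math.isclose is False for NaN, so a NaN loss fails the gate.
        if not math.isclose(resumed, reference_losses[step], rel_tol=rel_tol):
            return False
        compared += 1
    return compared > 0  # require at least one overlapping step


def ranks_agree(per_rank_loss, abs_tol=1e-6):
    """True if the reduced loss reported by every rank is (nearly) identical."""
    values = list(per_rank_loss.values())
    if not values or any(math.isnan(v) for v in values):
        return False
    return max(values) - min(values) <= abs_tol
```

For example, `loss_is_continuous(ref, resumed, resume_step=2)` compares the first resumed loss against `ref[2]`, which matches the "next-step loss continuous" wording of the save / resume row.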

**Merge policy (`ascend` protected branch)**

- Large PRs: merge via review meeting
- Small PRs: ≥2 reviewers required

### Model: Qwen3-30B-A3B (MoE)

| Status | Item | Acceptance Criteria | Notes | Owner |
|:------:|------|---------------------|-------|-------|
| [ ] | Qwen3-30B-A3B end-to-end | 1 rollout + 1 training step; MoE healthy; no NaN | A3 first; A2 follow-up | @floatlibai (A3), @yuxinshan (A2) |

### TQ (Training Quality) — GPU / NPU Alignment

| Status | Item | Acceptance Criteria | Notes | Owner |
|:------:|------|---------------------|-------|-------|
| [ ] | GPU baseline | Same config + data subset; metrics archived | Reference for NPU | @miracle0517 |
| [ ] | NPU alignment | Same seed vs GPU baseline; diff / reward within tolerance | | @miracle0517 |
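
One way to guarantee that the GPU baseline and the NPU run see the same data subset is to select samples by hashing a stable sample ID rather than by random sampling, so the selection does not depend on RNG implementation, shuffle order, or platform. The sketch below assumes each sample has a stable string ID; `select_subset`, `in_subset`, and the `salt` value are hypothetical names introduced for illustration.

```python
import hashlib


def in_subset(sample_id, fraction, salt="tq-baseline"):
    """Deterministically decide whether a sample belongs to the subset."""
    digest = hashlib.sha256(f"{salt}:{sample_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    # Integer comparison avoids float rounding at the edges of the range.
    return bucket < int(fraction * 2**64)


def select_subset(sample_ids, fraction, salt="tq-baseline"):
    """Keep roughly `fraction` of the samples, preserving input order."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    return [s for s in sample_ids if in_subset(s, fraction, salt)]
```

Because membership depends only on the ID and the salt, archiving the salt and fraction alongside the GPU metrics is enough to rebuild the identical subset on NPU.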

**Scale-up** (after multi-node + multimodal are ready):

- Cluster size: >2 nodes
- Dataset: production-scale volume
- Deliverable: full training guide / reference recipe

### Agentic Workloads (@Fulin-Gao, @momo609)

| Status | Item | Acceptance Criteria | Notes | Owner |
|:------:|------|---------------------|-------|-------|
| [ ] | ReTool — tool-call RL (Qwen3-4B) | Executable tool sandbox rollout; reward curve acceptable | #51; `examples/retool` | |
| [ ] | τ-bench — tool-call agentic eval (Qwen3-4B) | Agentic eval smoke passes; scoring matches GPU reference | #51, #243; `examples/tau-bench`, #142 | |
| [ ] | Search-R1 — retrieval-augmented rollout (Qwen2.5-3B) | Search server integrated; rollout completes; reward reasonable | #51; `examples/search-r1` | |
| [ ] | Search Agent | Search-integrated agentic rollout on NPU | #243 | |
| [ ] | Multi-Agent RL (Qwen3-30B-A3B) | Reward weighting matches GPU reference | #51; `examples/multi_agent` | |
| [ ] | Coding-Agent SWE | Subset grading correct; trainable token segments match GPU | #243 Code Agent; `examples/coding_agent_rl` | |
| [ ] | SWE-style on Modal | Agentic SWE rollouts with Modal cloud sandbox | #51 | |
| [ ] | RLinf | Enterprise agentic scenario smoke on NPU | #243 | |
| [ ] | Qwen3-VL Geo3K multi-turn | Reward curve acceptable; tool parsing correct; multi-turn `loss_mask` aligned | #51; `examples/geo3k_vlm_multi_turn` | |
| [ ] | Fully-async rollout | Same seed as sync; reward / loss within tolerance | | |

## Suggested Dependencies

```text
#266 / #267 merged (protected ascend mainline)
  → Reproducible image validation (no runtime patching)
  → Qwen3-4B non-colocate / colocate
  → Accuracy gates (save-resume, weight sync)
  → Qwen3-30B-A3B MoE → Multi-node / R3 → Multimodal → Agentic → TQ at scale
```
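
The ordering above can also be kept in machine-readable form, for example to let a tracking script report which item is unblocked next. The sketch below reads the diagram as a single linear chain, which is an assumption (the first three arrows could also be read as parallel children of the merge); the node names are shortened labels invented for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# item -> set of items it depends on (linear-chain reading of the diagram)
DEPENDS_ON = {
    "image_validation": {"mainline_merged"},
    "qwen3_4b_non_colocate_and_colocate": {"image_validation"},
    "accuracy_gates": {"qwen3_4b_non_colocate_and_colocate"},
    "qwen3_30b_a3b_moe": {"accuracy_gates"},
    "multi_node_and_r3": {"qwen3_30b_a3b_moe"},
    "multimodal": {"multi_node_and_r3"},
    "agentic": {"multimodal"},
    "tq_at_scale": {"agentic"},
}


def unblocked(done):
    """Return items that are not done yet but whose prerequisites all are."""
    done = set(done)
    order = TopologicalSorter(DEPENDS_ON).static_order()
    return [
        item
        for item in order
        if item not in done and DEPENDS_ON.get(item, set()) <= done
    ]
```

`TopologicalSorter` raises `graphlib.CycleError` if an edit to the table accidentally introduces a circular dependency, which is the main reason to prefer it over a hand-written ordering.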

## One-Page Checklist (Appendix)

- [ ] Reproducible image build; no runtime manual patching
- [ ] Qwen3-4B non-colocate: 1 rollout + 1 training step, HCCL weight sync aligned, logprob diff acceptable
- [ ] Qwen3-4B colocate: ≥10 steps, logprob diff acceptable
- [ ] Qwen3-30B-A3B: 1 rollout + 1 training step, MoE healthy
- [ ] Save / resume: loss continuous
- [ ] R3: MoE or designated model smoke, no numerical anomalies
- [ ] Multi-node 2×8: ≥1 step, consistent loss across ranks, TP/DP/EP/PP supported
- [ ] Multimodal model support
- [ ] A5 hardware validation
- [ ] CI tests + CI check adjustments
- [ ] TQ: GPU baseline + NPU alignment
- [ ] ReTool (Qwen3-4B): tool-call RL smoke
- [ ] τ-bench (Qwen3-4B): agentic eval smoke
- [ ] Search-R1 (Qwen2.5-3B): retrieval-augmented rollout
- [ ] Search Agent
- [ ] Multi-Agent (Qwen3-30B-A3B): reward weighting matches GPU
- [ ] Coding-Agent SWE: subset grading correct
- [ ] SWE-style on Modal
- [ ] RLinf
- [ ] Qwen3-VL Geo3K multi-turn: reward curve acceptable
- [ ] Fully-async: same seed as sync, within tolerance

