
[Feature] Add VP drafter training mode for DFlash #592

Open
catnanami wants to merge 7 commits into sgl-project:main from catnanami:feature/vp-drafter-training


@catnanami commented Jun 24, 2026

Motivation

This PR adds training support for the VP-Drafter used in D2SD (Dual Diffusion Draft Speculative Decoding). D2SD extends DFlash by using a first DFlash draft to estimate likely rejection boundaries, then training a second variable-prefix drafter to re-anchor at selected prefixes and generate alternative continuations.

The key training requirement is different from standard DFlash: the drafter must learn from variable-length visible prefixes instead of always seeing only the anchor token followed by masks. This PR implements that behavior as a DFlash training-mode branch, so the resulting model still uses the same DFlashDraftModel architecture and config format.
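The difference between the two input layouts can be illustrated with a minimal, framework-free sketch. It is not the code from this PR: `build_block_inputs` and `MASK_ID` are hypothetical names, and the only behavior taken from the description is that standard DFlash exposes just the anchor token while the VP-Drafter exposes a variable-length prefix and masks the rest.

```python
MASK_ID = -1  # hypothetical mask token id, chosen only for this illustration


def build_block_inputs(block_tokens, prefix_len, mask_id=MASK_ID):
    """Return (inputs, loss_mask) for one block of target tokens.

    The first `prefix_len` tokens stay visible as real tokens; the remaining
    suffix positions are replaced by the mask id and are the only positions
    that contribute to the loss.
    """
    if not 1 <= prefix_len < len(block_tokens):
        raise ValueError("prefix_len must leave at least one masked position")
    suffix_len = len(block_tokens) - prefix_len
    inputs = list(block_tokens[:prefix_len]) + [mask_id] * suffix_len
    loss_mask = [0] * prefix_len + [1] * suffix_len
    return inputs, loss_mask


block = [10, 11, 12, 13]
# Standard DFlash layout: only the anchor token is visible.
dflash_inputs, dflash_mask = build_block_inputs(block, prefix_len=1)
# Variable-prefix layout: here the first three tokens are visible.
vp_inputs, vp_mask = build_block_inputs(block, prefix_len=3)
```

In this sketch the standard DFlash case is simply the special case `prefix_len=1`, which is consistent with implementing the new behavior as a training-mode branch rather than a new architecture.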

[Figure: pipeline_contrast]

References:

Modifications

  • Added a vp_drafter training mode to OnlineDFlashModel.
  • The vp_drafter mode samples a variable visible prefix length per block.
  • Prefix tokens are fed as real token embeddings; suffix positions remain masked and contribute to loss.
  • Added exponential loss decay from the first masked suffix position, matching the VP-Drafter training recipe.
  • Wired scripts/train_dflash.py to read dflash_config.training_mode / dflash_config.loss_type when --loss-type is not explicitly provided.
  • Added prefix_weight_base support for variable-prefix sampling.
  • Added a Qwen3-8B VP-Drafter config and example script:
    • configs/qwen3-8b-dta.json
    • examples/run_qwen3_8b_dta_online.sh
  • Added formula-level unit coverage for VP-Drafter loss masking and decay behavior.
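
The prefix sampling and loss decay listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the implementation in `specforge/core/dflash.py`: the function names are hypothetical, the sampling probability is assumed to be proportional to `prefix_weight_base ** (k - 1)`, and the decay is assumed to have the form `exp(-distance / gamma)` with a hypothetical `gamma` parameter. The actual formulas in the PR may differ.

```python
import math
import random


def sample_prefix_len(block_size, prefix_weight_base=None, rng=random):
    """Sample a visible prefix length k in [1, block_size - 1].

    Assumption: P(k) is proportional to base ** (k - 1). A base of None is
    treated as 1.0 (uniform sampling) so that a missing config value does
    not raise a TypeError.
    """
    base = 1.0 if prefix_weight_base is None else float(prefix_weight_base)
    lengths = list(range(1, block_size))
    weights = [base ** (k - 1) for k in lengths]
    return rng.choices(lengths, weights=weights, k=1)[0]


def vp_loss_weights(block_size, prefix_len, gamma=4.0):
    """Per-position loss weights for one block.

    Visible prefix positions get weight 0. The first masked suffix position
    gets weight 1, and later positions decay exponentially with distance.
    """
    weights = []
    for pos in range(block_size):
        if pos < prefix_len:
            weights.append(0.0)
        else:
            weights.append(math.exp(-(pos - prefix_len) / gamma))
    return weights


def weighted_loss(token_losses, weights):
    """Weighted mean of per-token losses, normalized by the total weight."""
    total = sum(weights)
    return sum(l * w for l, w in zip(token_losses, weights)) / total


prefix_len = sample_prefix_len(block_size=8, prefix_weight_base=None,
                               rng=random.Random(0))
weights = vp_loss_weights(block_size=4, prefix_len=2, gamma=1.0)
loss = weighted_loss([5.0, 5.0, 2.0, 2.0], weights)
```

Because the prefix positions carry zero weight, the losses of 5.0 on the visible tokens do not affect the result; only the two masked positions do.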

Checklist

@catnanami changed the title from "Add VP drafter training mode for DFlash" to "[Feature] Add VP drafter training mode for DFlash" on Jun 24, 2026

@gemini-code-assist (bot) left a comment

Contributor

Code Review

This pull request introduces the vp_drafter training objective (variable-prefix drafter) to the DFlash framework, allowing for training with variable visible prefixes. It adds a configuration file for Qwen3-8B, an online training script, and updates the core DFlash model and training script to support prefix length sampling, VP noise embedding generation, and corresponding loss calculations. Unit tests are also added to validate the new loss implementation.

The review feedback highlights three key improvements:

  • Replacing torch.distributions.Categorical with torch.multinomial to prevent graph breaks under torch.compile.
  • Handling potential None values for prefix_weight_base to avoid a TypeError.
  • Using .reshape() instead of .view() on noise_ids to safely handle non-contiguous tensors.


Comment thread specforge/core/dflash.py
Comment thread specforge/core/dflash.py
Comment thread specforge/core/dflash.py Outdated
catnanami and others added 3 commits June 24, 2026 21:49
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@catnanami (Author)

Hi @claude @jiapingW @wenqf11, could you please review this PR?

@jiapingW (Collaborator)

Great work! We'll check it soon.

@jiapingW jiapingW self-requested a review June 24, 2026 14:29