[Bug] There is a gap between the acceptance rates of training and inference #533

Description

@xinanjiao

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/SpecForge/discussions/new/choose Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

I trained a DFlash draft model for Qwen3.5-4B using the latest training code. The draft configuration for 4B comes from the official documentation, while the target Qwen3.5-4B is a version I fine-tuned on internal data for an OCR task. Training used 70,000 samples and reached a 98% acceptance rate during training.

Image

I ran inference with a vLLM build that supports DFlash, but the average acceptance rate was only 10%. I have already aligned the chat template, and I did not use <think> during either training or inference.

Image
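
For reference, one way the template alignment mentioned above could be double-checked is to compare the token IDs that each side actually produces for the same conversation. The following is a minimal sketch, not part of SpecForge or vLLM: `first_mismatch` is a hypothetical helper, and the two ID lists are placeholders standing in for the IDs dumped from the training data pipeline and from the serving stack.

```python
from typing import Optional, Sequence


def first_mismatch(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return the index of the first differing position, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One sequence is a strict prefix of the other: they diverge at the shorter length.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Placeholder token IDs (hypothetical). In practice these would be the IDs of the
# same conversation, tokenized once on the training side and once on the serving side.
train_ids = [1, 15, 27, 99, 4]
serve_ids = [1, 15, 27, 98, 4]

idx = first_mismatch(train_ids, serve_ids)
if idx is None:
    print("prompts are token-identical")
else:
    print(f"first divergence at position {idx}")
```

If the two sequences diverge, decoding a few tokens around the reported position usually shows whether the difference is a system prompt, a special token, or whitespace.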

Are there any gaps I might be overlooking?
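
To put the two numbers side by side, here is a rough back-of-the-envelope sketch. It assumes (this is not confirmed for either the training code or vLLM) that the training figure is a per-position match rate, that positions match independently, and that at inference acceptance stops at the first mismatch within a block. The value `k = 15` drafted tokens per step is also a hypothetical choice, not taken from the configuration.

```python
def expected_accepted(p: float, k: int) -> float:
    """Expected number of accepted draft tokens out of k, when each position
    matches independently with probability p and acceptance stops at the
    first mismatch. This equals sum_{i=1..k} p**i."""
    return sum(p ** i for i in range(1, k + 1))


k = 15  # hypothetical number of drafted tokens per step
for p in (0.98, 0.90, 0.50):
    e = expected_accepted(p, k)
    print(f"p={p:.2f}: expected accepted {e:.2f} of {k} ({e / k:.0%})")
```

Under these assumptions a 98% per-position rate would still give a sequential acceptance rate well above 10%, so a difference in how the metric is defined would not by itself account for the gap.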

Reproduction

BUILD_DATASET_NUM_PROC=64
ATTENTION_BACKEND=${2:-flex_attention}
NUM_GPUS=4
# Use patched specforge (fixes circular import in original)
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun \
    --standalone \
    --nproc_per_node $NUM_GPUS \
    $ROOT_DIR/scripts/train_dflash.py \
    --target-model-path qwen3.5_4B \
    --draft-config-path Qwen3.5-4B-Dflash/config.json \
    --train-data-path data_filtered.jsonl \
    --output-dir deflash_outputs/qwen3.5-4b-dflash-opc \
    --num-epochs 10 \
    --batch-size 2 \
    --learning-rate 6e-4 \
    --warmup-ratio 0.04 \
    --max-grad-norm 1.0 \
    --max-length 4096 \
    --chat-template qwen3.5-nothink \
    --attention-backend $ATTENTION_BACKEND \
    --num-anchors 512 \
    --loss-decay-gamma 7.0 \
    --log-interval 50 \
    --save-interval 10000 \
    --target-model-backend hf \
    --block-size 16

Environment

specforge [latest]
vllm [0.19.1.rc.0] nightly
