[Bug] There is a gap between the acceptance rates of training and inference

### Checklist

- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- [x] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/SpecForge/discussions/new/choose Otherwise, it will be closed.
- [x] 5. Please use English, otherwise it will be closed.

### Describe the bug

I trained a DFlash draft model for Qwen3.5-4B using the latest training code. The draft config for the 4B model comes from the official documentation; the target Qwen3.5-4B is a version I fine-tuned on internal data for an OCR task. Training used 70,000 samples and reached a 98% acceptance rate during training.

<img width="2972" height="822" alt="Image" src="https://github.com/user-attachments/assets/243ba344-1075-46ba-ba28-e673d1db3163" />
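One thing that may be worth ruling out is whether the two numbers measure the same quantity. The sketch below is only an illustration under assumptions: it assumes the training metric is a per-position match rate under teacher forcing, and that the inference metric counts only the draft tokens accepted before the first mismatch in each block. The function names and the toy `blocks` data are hypothetical and are not taken from SpecForge or vLLM.

```python
from typing import Sequence


def per_position_accuracy(matches: Sequence[Sequence[bool]]) -> float:
    """Fraction of draft positions that match the target, each counted independently."""
    total = sum(len(block) for block in matches)
    hits = sum(sum(block) for block in matches)
    return hits / total if total else 0.0


def prefix_acceptance_rate(matches: Sequence[Sequence[bool]]) -> float:
    """Fraction of drafted tokens accepted when each block is cut at its first mismatch."""
    total = sum(len(block) for block in matches)
    accepted = 0
    for block in matches:
        for ok in block:
            if not ok:
                break  # every later token in this block is discarded
            accepted += 1
    return accepted / total if total else 0.0


# Toy data (hypothetical): two draft blocks of four tokens each.
blocks = [
    [True, True, True, True],
    [False, True, True, True],
]
print(per_position_accuracy(blocks))   # 7 of 8 positions match -> 0.875
print(prefix_acceptance_rate(blocks))  # 4 of 8 drafted tokens accepted -> 0.5
```

If the two metrics are defined this way, an early mismatch in a block costs far more at inference than it does in the training metric, although this alone may not account for a gap as large as 98% versus 10%.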

For inference I used a vLLM build that supports DFlash, but the average acceptance rate was only 10%. I have already aligned the chat template between training and inference, and `<think>` was not used in either.

<img width="1504" height="132" alt="Image" src="https://github.com/user-attachments/assets/9ea744d3-2dc1-4411-880b-9fff60e426b2" />
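To double-check the chat-template alignment at the token level, a minimal sketch like the following could compare the prompt token IDs seen by the training pipeline with those sent to the inference engine for the same sample. The helper `first_divergence` and the example IDs are hypothetical; how the two ID lists are obtained from SpecForge and vLLM is left out because it depends on the setup.

```python
from typing import Optional, Sequence


def first_divergence(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return the first index where two token-ID sequences differ, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One sequence is a strict prefix of the other.
        return min(len(a), len(b))
    return None


# Made-up token IDs standing in for the same prompt rendered by each pipeline.
train_ids = [101, 7, 7, 42, 9, 13]
infer_ids = [101, 7, 7, 42, 55, 9, 13]

idx = first_divergence(train_ids, infer_ids)
if idx is None:
    print("token IDs match")
else:
    print(f"first mismatch at position {idx}")
```

A mismatch near the start of the assistant turn would point to a template difference, such as an extra system prompt or a thinking tag inserted by only one side.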

Are there any gaps I might be overlooking?

### Reproduction
```
BUILD_DATASET_NUM_PROC=64
ATTENTION_BACKEND=${2:-flex_attention}
NUM_GPUS=4
# Use patched specforge (fixes circular import in original)
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun \
    --standalone \
    --nproc_per_node $NUM_GPUS \
    $ROOT_DIR/scripts/train_dflash.py \
    --target-model-path qwen3.5_4B \
    --draft-config-path Qwen3.5-4B-Dflash/config.json \
    --train-data-path data_filtered.jsonl \
    --output-dir deflash_outputs/qwen3.5-4b-dflash-opc \
    --num-epochs 10 \
    --batch-size 2 \
    --learning-rate 6e-4 \
    --warmup-ratio 0.04 \
    --max-grad-norm 1.0 \
    --max-length 4096 \
    --chat-template qwen3.5-nothink \
    --attention-backend $ATTENTION_BACKEND \
    --num-anchors 512 \
    --loss-decay-gamma 7.0 \
    --log-interval 50 \
    --save-interval 10000 \
    --target-model-backend hf \
    --block-size 16
```
### Environment

- specforge: latest
- vllm: 0.19.1.rc.0 (nightly)
