
EAGLE3.1 Support #568

Open
bluecoffee8 wants to merge 5 commits into sgl-project:main from bluecoffee8:eagle3.1

bluecoffee8 commented May 27, 2026

Motivation

Adds EAGLE3.1 support, based on https://github.com/lightseekorg/TorchSpec/pull/97, which added it to TorchSpec.

Validation: trained Qwen3-30B-A3B-Instruct-2507 draft models (EAGLE3 and EAGLE3.1) for a single epoch on the ShareGPT dataset, following https://github.com/sgl-project/SpecForge/blob/main/examples/run_qwen3_30b_a3b_eagle3_online.sh.

Modifications

Add EAGLE3.1 features: an output norm on the draft model's hidden states (post-norm) and an fc norm on the target hidden states.
Add an example EAGLE3.1 config for the Qwen3-30B-A3B model, along with an example script.
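A minimal sketch of where the two EAGLE3.1 norms could sit in the draft forward pass, written in plain Python so it runs without PyTorch. It assumes both norms are RMSNorm, that the fc norm is applied to the concatenated 3 x hidden_size target hidden states before the fc projection, and that the output norm is applied to the draft decoder's output. The names `rms_norm`, `linear`, and `draft_forward` and the stand-in decoder are hypothetical, not SpecForge's API.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over a 1-D list of floats, scaled by a learned weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def linear(x, w):
    """y = W x, with W given as a list of rows (out_features x in_features)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def draft_forward(aux_hidden, fc_w, fc_norm_w, out_norm_w, decoder, eagle31=True):
    # aux_hidden: target hidden states from 3 layers, concatenated (3 * hidden_size)
    # EAGLE3.1 fc norm (assumed placement: before the fc projection)
    x = rms_norm(aux_hidden, fc_norm_w) if eagle31 else aux_hidden
    h = linear(x, fc_w)  # project 3 * hidden_size -> hidden_size
    h = decoder(h)       # stand-in for the draft decoder layer
    if eagle31:
        h = rms_norm(h, out_norm_w)  # EAGLE3.1 output norm (post-norm)
    return h
```

With `eagle31=False` the function reduces to a plain EAGLE3-style projection; with it enabled, the draft output has unit RMS (before the learned scale), which is the point of the post-norm.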

Related Issues

Accuracy Test

Benchmark & Profiling

Ran the same benchmark against each draft model (EAGLE3 and EAGLE3.1).

Server launch (1 x H100):

SGLANG_ENABLE_SPEC_V2=1 SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python -m sglang.launch_server \
  --host 0.0.0.0 --port 8001 \
  --model-path Qwen/Qwen3-30B-A3B-Instruct-2507 \
  --speculative-algorithm EAGLE3 \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --speculative-draft-model-path /path/to/draft_model

Client launch:

python3 -m sglang.bench_serving --backend sglang --host 0.0.0.0 --port 8001 \
  --dataset-name random-ids \
  --warmup-requests 80 --num-prompts 128 \
  --random-input-len 1024 --random-output-len 256 --random-range-ratio 1.0 \
  --request-rate 1.0

Results:
EAGLE3

[bench_serving output screenshot for EAGLE3; image not preserved]

EAGLE3.1

[bench_serving output screenshot for EAGLE3.1; image not preserved]

Comparison (EAGLE3 => EAGLE3.1):

Accept length: 1.68 => 2.30 (+37%)
p50 E2E latency: 1550.76 ms => 823.57 ms (-47%)
p50 TPOT: 5.75 ms => 2.94 ms (-49%)
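The relative changes can be re-derived from the p50 values reported by bench_serving. A quick check in Python, using the numbers from the comparison (the metric names in the dict are just labels chosen here):

```python
def rel_change(before, after):
    """Percentage change from the EAGLE3 value to the EAGLE3.1 value."""
    return (after - before) / before * 100


metrics = {
    "accept_length": (1.68, 2.30),
    "p50_e2e_ms": (1550.76, 823.57),
    "p50_tpot_ms": (5.75, 2.94),
}

for name, (eagle3, eagle31) in metrics.items():
    print(f"{name}: {rel_change(eagle3, eagle31):+.1f}%")
```

This works out to roughly +36.9% accept length, -46.9% p50 E2E latency, and -48.9% p50 TPOT, so rounded to whole percentages the gains are about +37%, -47%, and -49%.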

Checklist

@gemini-code-assist

Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

bluecoffee8 mentioned this pull request May 27, 2026
bluecoffee8 marked this pull request as ready for review May 28, 2026 20:00

Comment thread specforge/core/eagle3.py

# Step 5.4: get logits
logits = self.draft_model.compute_logits(hidden_states)


compute_logits already applies the norm to hidden_states. We should gate it so the norm is not applied a second time when the hidden states are already normed.
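One way to gate it, sketched under assumptions: a hypothetical `compute_logits(hidden, lm_head, norm_weight, already_normed=...)` that skips the final norm when the caller has already applied it (for example, when EAGLE3.1's output norm is enabled). The RMSNorm and `lm_head` here are plain-Python stand-ins, not the real SpecForge draft model.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm with a learned per-feature weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def compute_logits(hidden, lm_head, norm_weight, already_normed=False):
    # Skip the final norm when the hidden states were already normed upstream,
    # so the output norm is not applied twice.
    h = hidden if already_normed else rms_norm(hidden, norm_weight)
    # lm_head: list of rows, one per vocabulary entry
    return [sum(w * v for w, v in zip(row, h)) for row in lm_head]
```

Because the norm carries a learned weight, applying it twice is not a no-op: the second pass rescales the features again and shifts the logits, which is why the gate matters.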


def project_hidden_states(self, hidden_states: torch.Tensor) -> torch.Tensor:
    # eagle 3 requires hidden states from 3 layers
    assert hidden_states.size(-1) == self.config.hidden_size * 3


Let's keep the assertion, but check against self.num_aux_hidden_states instead of the hard-coded 3.
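A minimal sketch of the suggested check, assuming the draft model exposes `num_aux_hidden_states` (defaulting to 3 for EAGLE3). `DraftProjector` is a hypothetical stand-in class, `hidden_size` stands in for `self.config.hidden_size`, and the "projection" just averages the aux chunks as a placeholder for the learned fc layer.

```python
class DraftProjector:
    """Toy stand-in for the draft model's fc projection (hypothetical)."""

    def __init__(self, hidden_size, num_aux_hidden_states=3):
        self.hidden_size = hidden_size
        # EAGLE3 concatenates hidden states from 3 target layers by default
        self.num_aux_hidden_states = num_aux_hidden_states

    def project_hidden_states(self, hidden_states):
        # hidden_states: flat list of num_aux_hidden_states concatenated vectors
        expected = self.hidden_size * self.num_aux_hidden_states
        assert len(hidden_states) == expected, (
            f"expected {expected} features "
            f"({self.num_aux_hidden_states} x {self.hidden_size}), "
            f"got {len(hidden_states)}"
        )
        # Placeholder projection: average the aux chunks instead of a learned fc
        chunks = [
            hidden_states[i * self.hidden_size:(i + 1) * self.hidden_size]
            for i in range(self.num_aux_hidden_states)
        ]
        return [sum(col) / self.num_aux_hidden_states for col in zip(*chunks)]
```

Tying the check to the attribute keeps the assertion meaningful if a config ever uses a different number of aux layers, while still catching shape mismatches early.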

Comment thread specforge/core/eagle3.py (outdated)
Comment on lines +252 to +254
# Apply output norm for EAGLE 3.1 post-norm architecture
if self.draft_model.norm_output:
hidden_states = self.draft_model.norm(hidden_states)


Similarly, I think it makes sense to compute these as a pair, hidden_states, hidden_states_for_logits = get_hidden_states(...), where by default the second is normed and the first is not. When norm_output is set, the method can return both normed.
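A minimal sketch of the suggested pair-returning helper, assuming an RMSNorm with a learned weight; `get_hidden_states` follows the name in the comment, but its signature and body here are hypothetical, not SpecForge code.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm with a learned per-feature weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def get_hidden_states(raw, norm_weight, norm_output):
    """Return (hidden_states, hidden_states_for_logits).

    By default only the logits branch is normed. With norm_output
    (EAGLE3.1 post-norm), both are normed, and the norm is computed
    once and shared.
    """
    normed = rms_norm(raw, norm_weight)
    hidden_states = normed if norm_output else raw
    return hidden_states, normed
```

With this split, the logits path always receives `hidden_states_for_logits` already normed, so the logits computation itself no longer needs to apply the norm.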

bluecoffee8 (Author)

@Dogacel I just pushed a commit that should address your comments. Could you please take a look? Thanks!

Dogacel left a comment

LGTM 👍

@xiangqianzsh


@bluecoffee8 Hi, I have trained an EAGLE3.1 model using this patch. Which SGLang branch or commit ID should I use when testing performance?

@bluecoffee8

Author

@xiangqianzsh I think you should be able to use any; the draft model trained here should run with the EAGLE3 speculative decoding option in SGLang.

@Dogacel could you confirm that no changes are required on the inference engine side?

@Dogacel

Dogacel commented Jun 8, 2026


Yeah, it should work out of the box.

@jiapingW

Collaborator

Hi @bluecoffee8, thank you for implementing EAGLE3.1. Why does your benchmark use random-ids? Comparing accept lengths with random token IDs as input doesn't make much sense, does it?

@bluecoffee8

Author

@jiapingW You can try another benchmark dataset, such as ShareGPT.

@jiapingW

Collaborator

Can you open-source your EAGLE3.1 draft model? We can test it.

@bluecoffee8

Author

@jiapingW I don't have the option to open-source the model, but I'll try to benchmark it when I have the bandwidth. In the meantime, anyone with the GPU resources could also try training and benchmarking it.


4 participants