EAGLE3.1 Support by bluecoffee8 · Pull Request #568 · sgl-project/SpecForge

bluecoffee8 · 2026-05-27T22:59:16Z

Motivation

Adds EAGLE3.1 support, based on https://github.com/lightseekorg/TorchSpec/pull/97, which added it to torchspec.

Validation: trained a Qwen3-30B-A3B-Instruct-2507 draft model (EAGLE3 vs EAGLE3.1) for a single epoch on the sharegpt dataset, based on https://github.com/sgl-project/SpecForge/blob/main/examples/run_qwen3_30b_a3b_eagle3_online.sh.

Modifications

Add EAGLE3.1 features (draft model output norm, fc norm on target hidden states).
Add an example EAGLE3.1 config and script for the qwen3-30b-a3b model.
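
As a rough illustration of where the two norms could sit, here is a minimal pure-Python sketch on toy lists. It assumes RMSNorm for both norms and assumes the fc norm is applied to the concatenated target hidden states before the fc projection; neither detail is stated in this PR description. All function names (rms_norm, project_target_features, draft_output) are hypothetical and are not the SpecForge API.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over a 1-D list: x / sqrt(mean(x^2) + eps) * weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]

def linear(x, rows):
    """Bias-free linear layer: one output value per weight row."""
    return [sum(a * b for a, b in zip(x, row)) for row in rows]

def project_target_features(aux_states, fc_rows, fc_norm_weight=None):
    """Concatenate aux hidden states, optionally norm them, then project."""
    concat = [v for state in aux_states for v in state]
    if fc_norm_weight is not None:  # assumed placement of the "fc norm"
        concat = rms_norm(concat, fc_norm_weight)
    return linear(concat, fc_rows)

def draft_output(hidden, norm_weight, norm_output):
    """Optionally norm the draft model's output hidden state."""
    return rms_norm(hidden, norm_weight) if norm_output else hidden

# Toy sizes: hidden_size = 2, three aux layers, so the fc input has width 6.
aux = [[1.0, 2.0], [10.0, 20.0], [100.0, 200.0]]
fc = [[1.0] * 6, [0.5] * 6]
plain = project_target_features(aux, fc)
normed = project_target_features(aux, fc, fc_norm_weight=[1.0] * 6)
out = draft_output([3.0, 4.0], [1.0, 1.0], norm_output=True)
```

In this toy setup the three aux layers have very different magnitudes, so the un-normed projection is dominated by the largest layer, while the normed input has unit RMS before it reaches fc.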

Related Issues

Accuracy Test

Benchmark & Profiling

Ran serving benchmarks for both draft models (EAGLE3 and EAGLE3.1).

Server launch (1 x H100):

SGLANG_ENABLE_SPEC_V2=1 SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python -m sglang.launch_server --host 0.0.0.0 --port 8001 --model-path Qwen/Qwen3-30B-A3B-Instruct-2507 --speculative-algorithm EAGLE3 --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --speculative-draft-model-path /path/to/draft_model

Client launch:

python3 -m sglang.bench_serving --backend sglang --host 0.0.0.0 --port 8001 --dataset-name random-ids --warmup-requests 80 --num-prompts 128 --random-input-len 1024 --random-output-len 256 --random-range-ratio 1.0 --request-rate 1.0

Results:
EAGLE3

[Image: bench_serving results for EAGLE3]

EAGLE3.1

[Image: bench_serving results for EAGLE3.1]

Comparison:

Acc length: 1.68 => 2.30, +37%
p50 e2e (ms): 1550.76 => 823.57, -47%
p50 tpot (ms): 5.75 => 2.94, -49%
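
The relative changes can be recomputed from the before/after values listed above. This is a small sketch; the dictionary keys are just labels chosen here.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0

# (EAGLE3, EAGLE3.1) pairs copied from the comparison above.
results = {
    "acc_length": (1.68, 2.30),
    "p50_e2e": (1550.76, 823.57),
    "p50_tpot": (5.75, 2.94),
}

for name, (eagle3, eagle31) in results.items():
    print(f"{name}: {pct_change(eagle3, eagle31):+.1f}%")
```

Rounded to whole percents, this gives about +37% for acceptance length, -47% for p50 e2e, and -49% for p50 tpot.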

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://sgl-fru7574.slack.com/archives/C09784E3EN6 to discuss your PR.

gemini-code-assist · 2026-05-27T22:59:20Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Dogacel · 2026-05-29T03:11:25Z

            # Step 5.4: get logits
            logits = self.draft_model.compute_logits(hidden_states)

compute_logits already applies the norm to hidden_states. We should gate it so the norm is not applied a second time when the hidden states are already normed.
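
A minimal sketch of the kind of gate this comment asks for, using a toy model in pure Python. ToyDraftModel, the already_normed parameter, and the RMSNorm choice are assumptions for illustration; the actual fix in this PR may look different.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over a 1-D list with a learned per-element weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    return [v * w / math.sqrt(mean_sq + eps) for v, w in zip(x, weight)]

class ToyDraftModel:
    """Hypothetical stand-in for the draft model; not the SpecForge class."""

    def __init__(self, norm_weight, lm_head_rows):
        self.norm_weight = norm_weight
        self.lm_head_rows = lm_head_rows

    def compute_logits(self, hidden_states, already_normed=False):
        # Gate the norm so callers that normed upstream do not norm twice.
        if not already_normed:
            hidden_states = rms_norm(hidden_states, self.norm_weight)
        return [sum(h * w for h, w in zip(hidden_states, row))
                for row in self.lm_head_rows]

model = ToyDraftModel(norm_weight=[2.0, 0.5],
                      lm_head_rows=[[1.0, 0.0], [0.0, 1.0]])
h = [3.0, 4.0]
once = model.compute_logits(h)                       # norm applied inside
pre_normed = rms_norm(h, model.norm_weight)          # normed upstream
gated = model.compute_logits(pre_normed, already_normed=True)
double = model.compute_logits(pre_normed)            # norm applied twice
```

With a non-trivial norm weight, the double-normed logits differ from the single-normed ones, which is the failure mode the gate avoids.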

Dogacel · 2026-05-29T03:12:34Z


    def project_hidden_states(self, hidden_states: torch.Tensor) -> torch.Tensor:
-        # eagle 3 requires hidden states from 3 layers
-        assert hidden_states.size(-1) == self.config.hidden_size * 3


Let's keep the assertion and check against self.num_aux_hidden_states instead of the hard-coded 3.
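
One way the retained assertion could look, sketched on a toy class without torch. ToyDraftModel and check_projection_input are hypothetical names; only hidden_size and num_aux_hidden_states come from the snippet and comment above.

```python
class ToyDraftModel:
    """Hypothetical holder for the two sizes the assertion depends on."""

    def __init__(self, hidden_size, num_aux_hidden_states=3):
        self.hidden_size = hidden_size
        self.num_aux_hidden_states = num_aux_hidden_states

    def check_projection_input(self, last_dim):
        """Validate the width of the concatenated target hidden states."""
        expected = self.hidden_size * self.num_aux_hidden_states
        assert last_dim == expected, (
            f"expected last dim {expected} "
            f"({self.num_aux_hidden_states} x {self.hidden_size}), "
            f"got {last_dim}"
        )
        return expected

model = ToyDraftModel(hidden_size=4, num_aux_hidden_states=3)
width = model.check_projection_input(12)  # passes: 3 layers x 4
```

Keeping the check parameterized preserves the early shape error for the default three-layer case while allowing other layer counts.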

Dogacel · 2026-05-29T03:13:49Z

+            # Apply output norm for EAGLE 3.1 post-norm architecture
+            if self.draft_model.norm_output:
+                hidden_states = self.draft_model.norm(hidden_states)


Similarly,

I think it makes sense to compute these as a pair, hidden_states, hidden_states_for_logits = get_hidden_states(...), where by default the right one is normed and the left one is not. When norm_output is set, the method can return both normed.
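
A small sketch of the pair-returning helper described in this comment, in pure Python on toy lists. The signature of get_hidden_states and the use of RMSNorm are assumptions; the comment only names the method and the norm_output flag.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over a 1-D list with a learned per-element weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    return [v * w / math.sqrt(mean_sq + eps) for v, w in zip(x, weight)]

def get_hidden_states(raw, norm_weight, norm_output):
    """Return (hidden_states, hidden_states_for_logits).

    The second element is always normed. The first is normed only when
    norm_output is set (the EAGLE 3.1 post-norm case).
    """
    normed = rms_norm(raw, norm_weight)
    hidden_states = normed if norm_output else raw
    return hidden_states, normed

raw = [3.0, 4.0]
weight = [1.0, 1.0]
hs_default, logits_in_default = get_hidden_states(raw, weight, norm_output=False)
hs_post, logits_in_post = get_hidden_states(raw, weight, norm_output=True)
```

Under this shape, the logits head would consume the second element without norming it again, and the first element is what gets fed to the next draft step.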

bluecoffee8

@Dogacel I just pushed a commit that should address your comments. Could you please take a look? Thanks!

Dogacel

LGTM 👍

xiangqianzsh · 2026-06-08T09:09:33Z

@bluecoffee8 Hi, I have trained an eagle3.1 model using this patch. Which branch or commit id should I use when testing performance in sglang?

bluecoffee8 · 2026-06-08T17:22:57Z

@xiangqianzsh I think you should be able to use any; the draft model trained here should just run with the EAGLE-3 spec decoding option in sglang.

@Dogacel could you confirm that no changes are required on the inference engine side?

Dogacel · 2026-06-08T17:23:48Z

Yeah it should work out of the box.

jiapingW · 2026-06-11T04:46:45Z

Hi, thank you for implementing eagle3.1. I'd like to know why your benchmark uses random-ids. Comparing accept lengths with random data as input doesn't make much sense, does it? @bluecoffee8

bluecoffee8 · 2026-06-11T04:48:19Z

@jiapingW you can try another benchmark setup, such as the sharegpt dataset.

jiapingW · 2026-06-11T05:06:01Z

Can you open-source your eagle3.1 draft model? We can test it.

bluecoffee8 · 2026-06-11T05:08:03Z

@jiapingW I don't have the option to open-source the model, but let me try to benchmark it when I have the bandwidth. If anybody else has the GPU resources, they could try to train and benchmark in the meantime as well.

eagle3.1 initial code

c74d815

bluecoffee8 mentioned this pull request May 27, 2026

[Feature] EAGLE3.1 support #567

Open

2 tasks

add eagle3.1 example

581664a

bluecoffee8 marked this pull request as ready for review May 28, 2026 20:00

bluecoffee8 requested review from FlamingoPg, FrankLeeeee, shuaills and sleepcoo as code owners May 28, 2026 20:00

fix linter

fd31088

Dogacel reviewed May 29, 2026

bluecoffee8 added 2 commits May 28, 2026 20:23

address comments

2508b38

fix

3e62972

Dogacel approved these changes May 29, 2026
