fix: include vocab sizes in EAGLE3 vocab mapping cache key by wiketool · Pull Request #602 · sgl-project/SpecForge

wiketool · 2026-06-25T11:45:05Z

Motivation

Both draft_model_config.draft_vocab_size and draft_model_config.vocab_size affect how the vocab mapping is generated. However, the vocab mapping cache key is currently derived only from the dataset cache key. As a result, a stale mapping can be reused when the dataset and tokenizer inputs are unchanged but the vocab dimensions differ.

Modifications

Keep the existing processed dataset cache key unchanged.
Add a vocab-mapping-specific cache key by appending draft_vocab_size and vocab_size to the existing dataset cache params string.
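The key split can be sketched as follows. This is a minimal illustration, not the code in scripts/train_eagle3.py. The helper names `make_cache_key` and `build_cache_keys`, the md5 hashing, and the example params string and vocab sizes are hypothetical choices made for the sketch.

```python
import hashlib
from typing import Tuple


def make_cache_key(params: str) -> str:
    # Hash a parameter string into a fixed-length cache key.
    # Assumption: an md5 hex digest, used here only for illustration.
    return hashlib.md5(params.encode("utf-8")).hexdigest()


def build_cache_keys(
    dataset_params: str, draft_vocab_size: int, vocab_size: int
) -> Tuple[str, str]:
    """Return (dataset_cache_key, vocab_cache_key).

    The dataset key is derived from the dataset params alone (unchanged).
    The vocab key appends both vocab sizes, so a different vocab
    configuration produces a different vocab mapping cache entry.
    """
    dataset_cache_key = make_cache_key(dataset_params)
    vocab_params = f"{dataset_params}-{draft_vocab_size}-{vocab_size}"
    vocab_cache_key = make_cache_key(vocab_params)
    return dataset_cache_key, vocab_cache_key


if __name__ == "__main__":
    # Hypothetical dataset params string (data path, max length, template, model).
    params = "train.jsonl-2048-llama3-target-model"
    ds_a, vocab_a = build_cache_keys(params, 32000, 128256)
    ds_b, vocab_b = build_cache_keys(params, 16000, 128256)
    # Same dataset inputs: the processed dataset cache is still shared.
    print(ds_a == ds_b)        # True
    # Different draft vocab size: the vocab mapping cache is not reused.
    print(vocab_a == vocab_b)  # False
```

Because the dataset key is left alone, the processed dataset cache stays valid across vocab changes; only the vocab mapping cache is keyed on the vocab dimensions.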

Checklist

Format your code according to Code Formatting with Pre-Commit.
Add unit tests as outlined in Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://sgl-fru7574.slack.com/archives/C09784E3EN6 to discuss your PR.

gemini-code-assist

Code Review

This pull request refactors the cache key generation in scripts/train_eagle3.py by separating the dataset cache key from the vocabulary mapping cache key. It introduces a new vocab_cache_key that incorporates the draft and target model vocabulary sizes, ensuring that vocabulary mapping caches are correctly invalidated when vocabulary sizes change. There are no review comments to evaluate, and the changes look correct.


Include vocab sizes in EAGLE3 vocab mapping cache key

2efb363

wiketool requested review from FlamingoPg, shuaills and sleepcoo as code owners June 25, 2026 11:45

gemini-code-assist Bot reviewed Jun 25, 2026


wiketool changed the title ~~Include vocab sizes in EAGLE3 vocab mapping cache key~~ fix: include vocab sizes in EAGLE3 vocab mapping cache key Jun 25, 2026

wiketool wants to merge 1 commit into sgl-project:main from wiketool:main
