DeepSeek-V4-Flash-DSpark

DeepSeek released the [DeepSeek-V4-Flash-DSpark](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash-DSpark) variant of the V4 Flash model, which has built-in speculative decoding, and described the methodology in a [companion paper](https://github.com/deepseek-ai/DeepSpec/blob/main/DSpark_paper.pdf). Given ds4's focus on fast local inference, I think it makes sense to explore how DSpark could be integrated into ds4.

DSpark appears to give a considerable tok/s improvement at fixed concurrency over the previous generation of speculative decoding (MTP, multi-token prediction):

<img width="801" height="317" alt="Image" src="https://github.com/user-attachments/assets/8fc963c0-a8ef-4ca4-812f-382378dc686d" />
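For context on why a better drafter shows up directly as tok/s, here is a minimal sketch of generic greedy draft-and-verify speculative decoding. This is **not** DSpark's method (see the paper for that). It assumes greedy decoding and, for the throughput estimate, that each draft token is accepted independently with probability `alpha`. The function names are hypothetical and do not correspond to anything in ds4.

```python
# Generic greedy speculative decoding (illustrative; NOT the DSpark algorithm).

def accept_greedy(draft: list[int], target: list[int]) -> list[int]:
    """Return the tokens emitted by one verification step.

    draft:  k tokens proposed by the drafter (e.g. an MTP head).
    target: k + 1 greedy tokens from the target model's single forward pass,
            where target[i] is its choice given the prefix plus draft[:i].
    """
    if len(target) != len(draft) + 1:
        raise ValueError("target must have exactly len(draft) + 1 tokens")
    out = []
    for d, t in zip(draft, target):
        if d != t:
            out.append(t)  # first mismatch: emit the target's token and stop
            return out
        out.append(d)      # draft token matches the target, accept it
    out.append(target[-1])  # every draft accepted: emit the extra "bonus" token
    return out


def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target forward pass with k draft tokens,
    assuming i.i.d. per-token acceptance probability alpha."""
    if alpha == 1.0:
        return k + 1
    return (1 - alpha ** (k + 1)) / (1 - alpha)
```

For example, `accept_greedy([5, 7, 9], [5, 7, 2, 4])` emits `[5, 7, 2]`, and with `alpha = 0.8` and `k = 3` the estimate is about 2.95 tokens per target pass versus 1 without drafting. Real speedups are lower, since drafting and verifying a longer sequence are not free, but the formula shows why raising the acceptance rate pays off quickly.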

DeepSeek-V4-Flash-DSpark #468