DeepSeek released DeepSeek-V4-Flash-DSpark, a variant of the V4 Flash model with built-in speculative decoding, and described the methodology in a companion paper. Given the focus on the performance of running the model locally, I think it makes sense to explore how DSpark could be integrated into ds4.
DSpark appears to provide a considerable improvement in tok/s at fixed concurrency over the previous generation of speculative decoding (MTP):
