### Description
This issue tracks MPK's explorations for supporting speculative decoding

### Multi-head speculation
1. Single draft sequence
   1. Support EAGLE3 for Qwen model series #692
   2. Refactor Multi-head pipeline for generality and performance
      1. Separate draft kvcache pool with unified page memory allocator
      2. Overlap draft generation and verification
      3. Reorganize KNGraph with additional dependency, e.g. h_norm / e_norm
2. Multiple draft candidates
   1. Tree attention kernel
   2. Attention planning
   3. Dynamic verification budgets