[MPK] Support Speculative Decoding #684

### Description

This issue tracks MPK's exploration of support for speculative decoding.

### Multi-head speculation
1. Single draft sequence
	1. Support EAGLE3 for the Qwen model series #692
	2. Refactor the multi-head pipeline for generality and performance
		1. Separate draft KV cache pool with a unified page memory allocator
		2. Overlap draft generation and verification
		3. Reorganize KNGraph with additional dependencies, e.g. `h_norm` / `e_norm`
2. Multiple draft candidates
	1. Tree attention kernel
	2. Attention planning
	3. Dynamic verification budgets
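
For the multiple-candidate item, the sketch below shows one common way a tree attention mask is derived when several draft candidates share a prefix: each draft token may attend only to itself and its ancestors in the draft tree. This is an illustration, not MPK's kernel or planner. The function name `tree_attention_mask` and the parent-index layout are hypothetical, and the code assumes nodes are listed so that every parent precedes its children.

```python
from typing import List


def tree_attention_mask(parents: List[int]) -> List[List[bool]]:
    """Build an ancestor mask for a draft token tree (hypothetical helper).

    parents[i] is the index of node i's parent, or -1 for a root.
    Assumes a topological order: every parent index is smaller than its child's.
    mask[i][j] is True when node i may attend to node j,
    i.e. when j is i itself or one of i's ancestors.
    """
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i, p in enumerate(parents):
        if p >= i:
            raise ValueError("parents must be topologically ordered")
        if p >= 0:
            # A child sees everything its parent sees.
            mask[i] = mask[p].copy()
        # Every node attends to itself.
        mask[i][i] = True
    return mask


# Node 0 is the root, nodes 1 and 2 are its children, node 3 is a child of node 1.
parents = [-1, 0, 0, 1]
mask = tree_attention_mask(parents)
```

Under these assumptions, sibling branches never see each other, so all candidates can be verified in a single forward pass while each one is scored as if it were its own sequence.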





  


