Fix slow decodes "poisoning" sleep times when using power throttling #464

Open
omnomburp wants to merge 1 commit into antirez:main from omnomburp:fix-power-throttle-decode-warmup

@omnomburp

When using --power <number>, decode could slow down dramatically after prefill
(~0.11 tok/s observed) even though unthrottled decode was normal.

The throttle used the first decode timing samples immediately to seed its EMA
(exponential moving average) and sleep duration. If those first samples were
cold-start outliers, the resulting long sleeps could keep the GPU cold and
reinforce the slow decode loop.

This change skips sleeping for the first few decode evals, uses the fastest
warmup sample to seed the decode average, and caps later outlier samples relative
to the current average.
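
To illustrate the three steps above, here is a minimal Python sketch. It is not the code from this PR: the class name `DecodeThrottle`, the constants `WARMUP_EVALS`, `EMA_ALPHA`, and `OUTLIER_CAP`, and the treatment of `--power` as a duty-cycle fraction in (0, 1] are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# All constants below are hypothetical values chosen for illustration.
WARMUP_EVALS = 4    # number of initial decode evals that never sleep
EMA_ALPHA = 0.2     # weight of the newest sample in the moving average
OUTLIER_CAP = 2.0   # a sample counts as at most 2x the current average


@dataclass
class DecodeThrottle:
    # Assumed meaning of --power: fraction of wall time spent computing.
    power: float
    seen: int = 0
    fastest_warmup: Optional[float] = None
    avg: Optional[float] = None

    def sleep_after(self, sample: float) -> float:
        """Record one decode eval time (seconds); return seconds to sleep."""
        self.seen += 1

        if self.seen <= WARMUP_EVALS:
            # Warmup: only track the fastest sample and do not sleep,
            # so a cold first eval cannot produce a long sleep.
            if self.fastest_warmup is None or sample < self.fastest_warmup:
                self.fastest_warmup = sample
            if self.seen == WARMUP_EVALS:
                # Seed the average from the fastest warmup sample.
                self.avg = self.fastest_warmup
            return 0.0

        # After warmup: cap outliers relative to the current average
        # before folding them into the EMA.
        capped = min(sample, self.avg * OUTLIER_CAP)
        self.avg = (1.0 - EMA_ALPHA) * self.avg + EMA_ALPHA * capped

        # Sleep long enough that busy / (busy + sleep) equals power.
        return self.avg * (1.0 / self.power - 1.0)
```

Under these assumptions, with `power=0.5` and warmup samples of 9.0, 0.1, 0.2, and 0.1 seconds, the average is seeded at 0.1 s rather than at the 9.0 s cold sample. A later 5.0 s outlier is capped to 0.2 s, so the average only moves to about 0.12 s and the sleep stays short.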
