fix(scheduler): relax AdaptivePolicy cohort watermark instead of ratcheting monotonically by evanc7007 · Pull Request #437 · pie-project/pie

evanc7007 · 2026-06-19T22:11:44Z

Summary

AdaptivePolicy::fired_high_water only ever ratchets up and never decays. Once a transient
burst lifts it above the steady-state cohort size, every later fire whose batch lands below
that peak fails the rule-3 firing test (B >= fired_high_water) and is parked on the rule-4
last_latency watchdog before it goes out — adding a full ~one-fire-time stall to a large
fraction of fires for the rest of the engine's lifetime.

This PR makes fired_high_water track the recent cohort instead of an all-time peak: it
still ratchets up immediately on a meet-or-exceed fire, but relaxes by one (floored at the
observed size) on a below-watermark fire, so it follows the live cohort back down.

Fixes #436

Why the current behavior is a bug

The watermark exists to avoid firing a half-empty batch when more peers are about to re-submit.
But because it never decays, any event that lifts it above the current steady state poisons
every subsequent sub-peak fire:

- Concurrency decline (no speculation needed): as in-flight requests finish, the active
  cohort falls from its peak (say 6) through 5, 4, 3… Each sub-peak cohort now fails rule 3 and
  waits out the watchdog, even though no more peers are coming.
- Pass-level speculation: speculative pre-fires transiently inflate a batch (e.g. 4 → 6–7),
  lifting the watermark; the normal cohort of ~4–5 is then permanently "below peak" and stalls.
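
The concurrency-decline case can be illustrated with a minimal sketch. It models only the rule-3 comparison (B >= fired_high_water) and the old upward-only update. The names `monotonic_update` and `count_below_watermark` are hypothetical and do not come from adaptive_policy.rs, and the real policy's other rules (cold start, structural cap, watchdog timing) are left out.

```rust
/// Old behavior as described above: the watermark only ever moves up.
/// Hypothetical helper for illustration, not the engine's code.
fn monotonic_update(current: usize, fired_size: usize) -> usize {
    current.max(fired_size)
}

/// Counts fires whose batch size is below the watermark at fire time,
/// i.e. fires that would fail rule 3 (B >= fired_high_water) and fall
/// through to the watchdog. `update` is the watermark policy under test.
fn count_below_watermark(sizes: &[usize], update: fn(usize, usize) -> usize) -> usize {
    let mut high_water = 0; // cold start
    let mut below = 0;
    for &size in sizes {
        if size < high_water {
            below += 1;
        }
        high_water = update(high_water, size);
    }
    below
}

fn main() {
    // Synthetic sequence: the cohort peaks at 6, then declines as requests finish.
    let sizes = [6, 6, 5, 5, 4, 4, 3, 3];
    let stalled = count_below_watermark(&sizes, monotonic_update);
    // Prints "6 of 8 fires are below the watermark".
    println!("{stalled} of {} fires are below the watermark", sizes.len());
}
```

Every fire after the peak lands below the pinned watermark in this toy sequence, which is the poisoning effect described in the list above.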

So this is a general scheduler-policy defect, not a speculation-specific one — speculation is
just one trigger. (It surfaced first under speculation because that re-triggers the condition
continuously; see the speculation throughput discussion in #434's companion thread.)

Reproduction & evidence

Workload: tau2-bench airline, --num-tasks 12 --max-concurrency 6 --seed 10,
max_tokens=16384, served via an OpenAI-compatible inferlet; 25-min budget per arm; engine
built with PIE_PORTABLE_CUDA=1 PIE_PORTABLE_CUDA_ARCH=89, NVIDIA L40S, Qwen3-30B-A3B-GGUF (Q4_K_M).
Per-fire scheduler trace parsed into inter-fire gap by batch size; "stall %" = fires whose
preceding gap exceeds 2× the dominant (at-watermark) cohort's median.

arm	watermark	stall %	inter-fire p95
spec-off (no speculation)	monotonic (today)	37.5%	104 ms
spec-on	monotonic (today)	23.9–25.2%	117–127 ms
spec-on	relaxing (this PR)	0.2–0.6%	55–69 ms

Note that the spec-off control already shows 37.5% stalled fires purely from concurrency
decline: the bug reproduces with speculation entirely disabled. With the fix, the sub-peak
cohorts that previously sat at 100–125 ms (batch sizes 4–5 while the watermark was pinned at 6)
drop to ~40–65 ms, and the relaxing-watermark spec-on arms end up cleaner than the monotonic
spec-off control. Result reproduced in both verbose and quiet logging configurations.
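
For readers who want to recompute the stall metric from their own traces, here is a rough sketch of the definition given above. It is not the script used for this PR. It assumes each fire has been parsed into a (batch size, preceding gap in ms) pair, and it assumes the "dominant" cohort is simply the most frequent batch size. `stall_percent` and `median` are hypothetical names, and the numbers in `main` are synthetic rather than taken from the recorded runs.

```rust
use std::collections::HashMap;

/// Median of a non-empty slice of gaps (ms); averages the two middle
/// values when the length is even.
fn median(values: &mut [f64]) -> f64 {
    values.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let mid = values.len() / 2;
    if values.len() % 2 == 0 {
        (values[mid - 1] + values[mid]) / 2.0
    } else {
        values[mid]
    }
}

/// Percentage of fires whose preceding gap exceeds 2x the median gap of
/// the dominant cohort. ASSUMPTION: "dominant" means the most frequent
/// batch size (ties go to the larger size). Returns None for empty input.
fn stall_percent(fires: &[(usize, f64)]) -> Option<f64> {
    let mut counts: HashMap<usize, usize> = HashMap::new();
    for &(size, _) in fires {
        *counts.entry(size).or_insert(0) += 1;
    }
    let dominant = counts
        .iter()
        .max_by_key(|&(&size, &n)| (n, size))
        .map(|(&size, _)| size)?;
    let mut gaps: Vec<f64> = fires
        .iter()
        .filter(|&&(size, _)| size == dominant)
        .map(|&(_, gap)| gap)
        .collect();
    let threshold = 2.0 * median(&mut gaps);
    let stalled = fires.iter().filter(|&&(_, gap)| gap > threshold).count();
    Some(100.0 * stalled as f64 / fires.len() as f64)
}

fn main() {
    // Synthetic (batch_size, gap_ms) pairs, for illustration only.
    let fires = [
        (6, 50.0), (6, 52.0), (6, 48.0), (5, 110.0),
        (4, 120.0), (6, 51.0), (5, 60.0), (6, 49.0),
    ];
    // Dominant size is 6 with median gap 50 ms, so the threshold is 100 ms.
    println!("{:?}", stall_percent(&fires));
}
```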

The change

runtime/src/inference/adaptive_policy.rs only:

fn relax_high_water(current: usize, fired_size: usize) -> usize {
    if fired_size >= current { fired_size } else { current.saturating_sub(1) }
}
// on_fired: self.fired_high_water = relax_high_water(self.fired_high_water, fired_size);

Decay-by-one (rather than e.g. snapping straight to the live size, or a sliding-window max) was
chosen against the recorded fire-size sequences: a window-max doesn't relax when large batches
recur, while snapping to the size of a single fire is too eager; stepping down by one removed
~99.7% of the watchdog stalls in replay and reacts within 1–2 fires, while ignoring a lone
anomalously small fire. (On the observed traces the cohort never fell by more than one between
consecutive fires, so a snap-to-size variant would have been indistinguishable here.)
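
To make the trade-off concrete, the sketch below replays two synthetic fire-size sequences through the decay-by-one rule and through a snap-to-size variant. `relax_high_water` is the helper shown above. `snap_to_size` and `trace` are hypothetical names introduced here, the sequences are made up rather than taken from the recorded traces, and the sliding-window-max variant is omitted for brevity.

```rust
/// Update rule from this PR: ratchet up on a meet-or-exceed fire,
/// otherwise step the watermark down by one.
fn relax_high_water(current: usize, fired_size: usize) -> usize {
    if fired_size >= current { fired_size } else { current.saturating_sub(1) }
}

/// Hypothetical alternative: follow the most recent fire exactly.
fn snap_to_size(_current: usize, fired_size: usize) -> usize {
    fired_size
}

/// Watermark value after each fire, starting from cold start (0).
fn trace(sizes: &[usize], update: fn(usize, usize) -> usize) -> Vec<usize> {
    let mut high_water = 0;
    sizes
        .iter()
        .map(|&size| {
            high_water = update(high_water, size);
            high_water
        })
        .collect()
}

fn main() {
    // A burst to 7 over a steady cohort of 5: decay-by-one is back at 5
    // two fires after the burst.
    let burst = [5, 5, 7, 5, 5, 5, 5];
    println!("relax: {:?}", trace(&burst, relax_high_water)); // [5, 5, 7, 6, 5, 5, 5]
    println!("snap:  {:?}", trace(&burst, snap_to_size));     // [5, 5, 7, 5, 5, 5, 5]

    // A lone small fire: decay-by-one dips by one, snap-to-size drops to 2.
    let dip = [5, 5, 2, 5, 5];
    println!("relax: {:?}", trace(&dip, relax_high_water)); // [5, 5, 4, 5, 5]
    println!("snap:  {:?}", trace(&dip, snap_to_size));     // [5, 5, 2, 5, 5]
}
```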

Behavior preserved: cold-start (fired_high_water == 0), the structural cap, and the
last_latency watchdog are unchanged. A uniform cohort keeps the watermark exactly at the
cohort size (verified by test), so steady-state batching and the existing coalescing benefit
are untouched — only the never-decaying tail behavior changes.

Tests

cargo test -p pie -- adaptive relax_high_water

Covers ratchet-up-on-exceed, relax-after-burst (7 → settles at the live cohort of 5),
steady-cohort stability (uniform load stays at the cohort size, never inflates or
under-fires), and the relax_high_water floor cases.
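
The scenarios listed above can be approximated outside the crate with a standalone sketch. This is not the PR's test code. `MiniPolicy` and `final_high_water` are hypothetical stand-ins that model only the watermark field, and the concrete sizes are chosen for illustration.

```rust
/// Same update rule as the helper in the PR description.
fn relax_high_water(current: usize, fired_size: usize) -> usize {
    if fired_size >= current { fired_size } else { current.saturating_sub(1) }
}

/// Hypothetical stand-in for the policy: holds only the watermark.
struct MiniPolicy {
    fired_high_water: usize,
}

impl MiniPolicy {
    fn on_fired(&mut self, fired_size: usize) {
        self.fired_high_water = relax_high_water(self.fired_high_water, fired_size);
    }
}

/// Replays fire sizes from cold start and returns the final watermark.
fn final_high_water(sizes: &[usize]) -> usize {
    let mut policy = MiniPolicy { fired_high_water: 0 };
    for &size in sizes {
        policy.on_fired(size);
    }
    policy.fired_high_water
}

fn main() {
    // Ratchet-up-on-exceed: a larger fire lifts the watermark immediately.
    assert_eq!(final_high_water(&[4, 6]), 6);
    // Relax-after-burst: a burst of 7 settles back at the live cohort of 5.
    assert_eq!(final_high_water(&[7, 5, 5, 5]), 5);
    // Steady cohort: uniform load keeps the watermark at the cohort size.
    assert_eq!(final_high_water(&[5; 100]), 5);
    // Floor: the result never drops below the observed size and never underflows.
    for current in 0..10 {
        for fired in 0..10 {
            assert!(relax_high_water(current, fired) >= fired);
        }
    }
    assert_eq!(relax_high_water(0, 0), 0);
    println!("all checks passed");
}
```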

Summary by CodeRabbit

Improvements
- Adaptive policy watermark behavior enhanced to dynamically respond to workload changes with both upward and downward adjustments, providing improved flexibility in system adaptation.
Documentation
- Updated documentation reflecting the new adaptive policy watermark adjustment patterns.
Tests
- Expanded unit tests to validate watermark behavior under various workload scenarios and adjustment patterns.

…heting monotonically

`fired_high_water` only ratcheted up and never decayed, so once a transient burst lifted it above the steady-state cohort (either concurrency peaking before in-flight requests finish, or pass-level speculation inflating a batch), every later fire whose batch landed below that peak failed the rule-3 firing test and was parked on the `last_latency` watchdog before going out. This added a ~one-fire-time stall to a large fraction of fires for the rest of the run.

Relax the watermark by one toward the live cohort on a below-watermark fire so it tracks the current steady-state cohort instead of an all-time peak. Ratchet-up, cold-start, structural-cap, and watchdog behavior are unchanged, and a uniform cohort holds the watermark exactly at the cohort size (so steady-state batching and coalescing are untouched).

Repro (tau2-bench airline, conc 6, 16384 max, 25-min budget): the monotonic watermark stalls 24-37% of fires (37.5% even with speculation OFF, from concurrency decline); relaxing it drops that to 0.2-0.6% and halves inter-fire p95 (104-127ms -> 55-69ms).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011YJWTtt36uwcfLcx7Ckubq

coderabbitai · 2026-06-19T22:11:57Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info


📥 Commits

Reviewing files that changed from the base of the PR and between 905498f and 12322c3.

📒 Files selected for processing (1)

runtime/src/inference/adaptive_policy.rs

Walkthrough

AdaptivePolicy's fired_high_water watermark is changed from strictly monotonic to a relaxable recent-cohort high-water. A new private relax_high_water helper ratchets up when fired_size >= current and steps the watermark down by one (saturating at 0) otherwise. on_fired now calls this helper, and the firing-rule comment, field documentation, and unit tests are updated accordingly.

Changes

AdaptivePolicy relaxable high-water watermark

Layer / File(s)	Summary
`relax_high_water` helper, docs, and `on_fired` call site `runtime/src/inference/adaptive_policy.rs`	Adds the private `relax_high_water` function implementing ratchet-up / step-down-by-one logic; expands `fired_high_water` field documentation to describe the non-monotonic recent high-water semantics; updates the firing-rule 3 inline comment; changes `on_fired` to call `relax_high_water` instead of the old upward-only conditional assignment.
Updated unit tests `runtime/src/inference/adaptive_policy.rs`	Removes the old monotonic-only watermark assertion; adds test cases for immediate ratcheting up, post-burst relaxation under sustained smaller cohorts, stability at a constant cohort size, and direct `relax_high_water` step-down behavior.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix(scheduler): relax AdaptivePolicy cohort watermark instead of ratcheting monotonically' directly and specifically describes the main code change: replacing monotonic (upward-only) watermark behavior with a relaxation mechanism that can step down.
Linked Issues check	✅ Passed	The PR addresses the monotonic watermark defect identified in issue `#436` as a root cause of scheduler stalls. The changes implement the core fix to the AdaptivePolicy logic, though the full issue resolution (gating speculation on spare capacity) requires additional changes outside this PR's scope.
Out of Scope Changes check	✅ Passed	All changes are focused on the AdaptivePolicy watermark mechanism in adaptive_policy.rs: the relax_high_water helper, on_fired logic update, and corresponding test updates. No unrelated or extraneous changes are present.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


ingim · 2026-06-19T22:19:07Z

Great work, @evanc7007 !

ingim merged commit c458a6b into pie-project:main Jun 19, 2026
7 checks passed

ingim merged 1 commit into pie-project:main from evanc7007:fix/adaptive-highwater-decay
