What
Make SLOs a first-class, enforced concern, not just a dashboard.
Declare SLO targets per InferenceService (e.g. ttft_p99_ms, error budget, min_throughput_tok_s). Use Pyrra (feat(observability): per-InferenceService SLO declaration via Pyrra integration #415) to generate the Prometheus recording + multi-window burn-rate alerting rules from the declared SLO. The operator surfaces an evaluated SLOBreached condition on InferenceService status.
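Pyrra generates the actual recording and alerting rules, so the operator does not have to compute burn rates itself. The sketch below only illustrates what a multi-window burn-rate check evaluates and how its result could map onto an `SLOBreached` status condition. `SLOSpec`, `BurnWindow` and `slo_breached_condition` are hypothetical names, and the 14.4 (1h/5m) and 6 (6h/30m) thresholds are illustrative, not Pyrra's configured defaults. In practice the operator would more likely read Pyrra's firing alerts from Prometheus than recompute the ratios.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SLOSpec:
    # Hypothetical declaration shape; field names follow the examples in this issue.
    ttft_p99_ms: float           # requests slower than this would count as "bad" events
    objective: float             # e.g. 0.999 leaves an error budget of 0.001
    min_throughput_tok_s: float  # would feed its own ratio; not evaluated in this sketch


@dataclass(frozen=True)
class BurnWindow:
    # One long/short window pair; threshold values used below are illustrative only.
    name: str
    threshold: float
    long_error_ratio: float   # observed bad-event ratio over the long window
    short_error_ratio: float  # observed bad-event ratio over the short window


def burn_rate(error_ratio: float, objective: float) -> float:
    """How many times faster than 'exactly on budget' the error budget is being spent."""
    budget = 1.0 - objective
    if budget <= 0.0:
        # A 100% objective has no budget: any error burns infinitely fast.
        return math.inf if error_ratio > 0 else 0.0
    return error_ratio / budget


def slo_breached_condition(spec: SLOSpec, windows: list[BurnWindow]) -> dict:
    """Fire only if BOTH windows of a pair exceed the threshold, so a spike that
    has already recovered (short window back under budget) clears the condition."""
    for w in windows:
        long_burn = burn_rate(w.long_error_ratio, spec.objective)
        short_burn = burn_rate(w.short_error_ratio, spec.objective)
        if long_burn > w.threshold and short_burn > w.threshold:
            return {"type": "SLOBreached", "status": "True",
                    "reason": f"BurnRateExceeded/{w.name}"}
    return {"type": "SLOBreached", "status": "False", "reason": "WithinErrorBudget"}


if __name__ == "__main__":
    spec = SLOSpec(ttft_p99_ms=500.0, objective=0.999, min_throughput_tok_s=50.0)
    windows = [
        BurnWindow("1h/5m", 14.4, long_error_ratio=0.02, short_error_ratio=0.03),
        BurnWindow("6h/30m", 6.0, long_error_ratio=0.004, short_error_ratio=0.001),
    ]
    print(slo_breached_condition(spec, windows))
```

Requiring both windows to exceed the threshold is what makes the condition safe to act on automatically: the long window avoids flapping on brief blips, and the short window lets `SLOBreached` clear soon after the backend recovers.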
ModelRouter circuit breaker consumes SLOBreached: a breached backend is treated as degraded and traffic redistributes. Today the breaker is HTTP-health only, so a slow-but-alive backend keeps getting traffic.
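A minimal sketch of the routing change, assuming the router can see both the existing HTTP health signal and the backend's `SLOBreached` condition. `Backend`, `routable` and `pick_backend` are hypothetical names, not the existing ModelRouter API. The fallback to breached-but-alive backends when nothing else is available is an assumed policy; shedding load entirely would be the other option.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    http_healthy: bool   # the only signal the breaker uses today
    slo_breached: bool   # mirrored from the InferenceService SLOBreached condition
    weight: float = 1.0


def routable(backends: list[Backend]) -> list[Backend]:
    healthy = [b for b in backends if b.http_healthy]
    preferred = [b for b in healthy if not b.slo_breached]
    # Assumed policy: if every healthy backend is breached, keep serving from the
    # degraded set rather than rejecting all traffic.
    return preferred or healthy


def pick_backend(backends: list[Backend], rng: random.Random | None = None) -> Backend:
    if rng is None:
        rng = random.Random()
    candidates = routable(backends)
    if not candidates:
        raise RuntimeError("no HTTP-healthy backends")
    # Weighted choice among the remaining candidates redistributes breached traffic.
    return rng.choices(candidates, weights=[b.weight for b in candidates], k=1)[0]


if __name__ == "__main__":
    fleet = [
        Backend("b200-a", http_healthy=True, slo_breached=True),   # slow but alive
        Backend("b200-b", http_healthy=True, slo_breached=False),
        Backend("edge-1", http_healthy=False, slo_breached=False),
    ]
    print(pick_backend(fleet).name)
```

With today's HTTP-only breaker, `b200-a` would still receive traffic; here it is skipped as long as a non-breached healthy backend exists.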
Foreman scheduler defers dispatching new AgenticTasks to a fleet/InferenceService that is SLO-breached (load-shed rather than pile on a degraded backend).
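One way the deferral could work, sketched under the assumption that Foreman is notified whenever a target's `SLOBreached` condition changes. `AgenticTask`, `Dispatcher` and `on_slo_condition` are hypothetical names, and appending to `dispatched` stands in for the real dispatch call. Tasks aimed at a breached target are held in FIFO order and released once the condition clears.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgenticTask:
    name: str
    target: str  # fleet/InferenceService the task would be dispatched to


@dataclass
class Dispatcher:
    breached: set[str] = field(default_factory=set)
    deferred: deque[AgenticTask] = field(default_factory=deque)
    dispatched: list[AgenticTask] = field(default_factory=list)

    def submit(self, task: AgenticTask) -> None:
        if task.target in self.breached:
            self.deferred.append(task)    # load-shed: hold the task, don't pile on
        else:
            self.dispatched.append(task)  # stand-in for the real dispatch call

    def on_slo_condition(self, target: str, breached: bool) -> None:
        if breached:
            self.breached.add(target)
            return
        self.breached.discard(target)
        # Release tasks held for the recovered target, preserving submission order.
        remaining: deque[AgenticTask] = deque()
        while self.deferred:
            task = self.deferred.popleft()
            if task.target == target:
                self.dispatched.append(task)
            else:
                remaining.append(task)
        self.deferred = remaining
```

Holding tasks rather than rerouting them keeps the sketch simple. Redirecting to another fleet would require knowing which targets can serve the same model.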
Why
This is the production-readiness piece for the B200 + edge deployment: SLO breach causes automatic traffic/work redistribution without a human reading Grafana. It turns SLOs from observation into enforcement.
Approach / dependencies
Definition of done
SLO targets declarable per InferenceService.
SLOBreached condition surfaced from Pyrra-evaluated state.
ModelRouter redistributes traffic off breached backends.
Foreman scheduler defers dispatch on breach.