[FEATURE] SLO contracts on InferenceService + SLO-aware routing and scheduling #629

@Defilan

What

Make SLOs a first-class, enforced concern, not just a dashboard.

  • Declare SLO targets per InferenceService (e.g. ttft_p99_ms, error budget, min_throughput_tok_s). Use Pyrra (#415, "feat(observability): per-InferenceService SLO declaration via Pyrra integration") to generate the Prometheus recording rules and multi-window burn-rate alerting rules from the declared SLO. The operator surfaces the evaluated result as an SLOBreached condition on InferenceService status.
  • ModelRouter circuit breaker consumes SLOBreached: a breached backend is treated as degraded and traffic redistributes. Today the breaker is HTTP-health only, so a slow-but-alive backend keeps getting traffic.
  • Foreman scheduler defers dispatching new AgenticTasks to a fleet/InferenceService that is SLO-breached (load-shed rather than pile on a degraded backend).
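To make the multi-window burn-rate idea concrete, here is a minimal Go sketch. It is an illustration only: in the proposal Pyrra generates the equivalent Prometheus rules, and the operator reads their evaluated state. The `SLOTargets` struct, its field names, the function names, the sample target values, and the 14.4 threshold are all assumptions for the example, not the project's API.

```go
package main

import "fmt"

// SLOTargets mirrors the example targets named above.
// Hypothetical shape: the real CRD fields are not defined in this issue.
type SLOTargets struct {
	TTFTP99Ms         int     // p99 time-to-first-token target, milliseconds
	ErrorBudget       float64 // allowed fraction of bad requests, e.g. 0.001
	MinThroughputTokS float64 // minimum throughput, tokens per second
}

// burnRate is the observed error ratio divided by the error budget.
// A value of 1.0 means the budget is being spent exactly at the allowed pace.
// A non-positive budget is treated as "not configured" (assumption).
func burnRate(errorRatio, budget float64) float64 {
	if budget <= 0 {
		return 0
	}
	return errorRatio / budget
}

// breached applies the multi-window rule: both the long and the short
// window must burn faster than the threshold, so a spike that has already
// recovered in the short window does not trip the condition.
func breached(longRatio, shortRatio float64, slo SLOTargets, threshold float64) bool {
	return burnRate(longRatio, slo.ErrorBudget) >= threshold &&
		burnRate(shortRatio, slo.ErrorBudget) >= threshold
}

func main() {
	// Illustrative values only.
	slo := SLOTargets{TTFTP99Ms: 500, ErrorBudget: 0.001, MinThroughputTokS: 40}

	fmt.Println(breached(0.02, 0.03, slo, 14.4))   // both windows burning fast
	fmt.Println(breached(0.02, 0.0005, slo, 14.4)) // short window has recovered
}
```

Requiring both windows to exceed the threshold is what keeps SLOBreached from flapping on short spikes, which matters once the router and scheduler act on it automatically.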

Why

This is the production-readiness piece for the B200 + edge deployment: SLO breach causes automatic traffic/work redistribution without a human reading Grafana. It turns SLOs from observation into enforcement.

Approach / dependencies

Definition of done

  • SLO targets declarable per InferenceService.
  • SLOBreached condition surfaced from Pyrra-evaluated state.
  • ModelRouter redistributes off breached backends.
  • Foreman scheduler defers dispatch on breach.
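The two enforcement behaviours (redistribution and deferral) could be sketched roughly as follows. This is a sketch under stated assumptions, not ModelRouter or Foreman code: the `Backend` type, `routable`, and `shouldDefer` are hypothetical names, and the fallback to all HTTP-healthy backends when every one is breached is a design choice made for the example, not something the issue specifies.

```go
package main

import "fmt"

// Backend is a hypothetical view of one InferenceService as seen by the
// router or scheduler.
type Backend struct {
	Name        string
	HTTPHealthy bool // what the breaker checks today
	SLOBreached bool // the proposed condition from InferenceService status
}

// routable returns the backends that should receive traffic: HTTP-healthy
// and not SLO-breached. If every healthy backend is breached, it falls back
// to all healthy backends so traffic is not black-holed (assumption).
func routable(backends []Backend) []Backend {
	var healthy, withinSLO []Backend
	for _, b := range backends {
		if !b.HTTPHealthy {
			continue
		}
		healthy = append(healthy, b)
		if !b.SLOBreached {
			withinSLO = append(withinSLO, b)
		}
	}
	if len(withinSLO) == 0 {
		return healthy
	}
	return withinSLO
}

// shouldDefer reports whether the scheduler should hold a new task instead
// of dispatching it to the target backend.
func shouldDefer(target Backend) bool {
	return target.SLOBreached
}

func main() {
	backends := []Backend{
		{Name: "b200-a", HTTPHealthy: true, SLOBreached: true}, // slow but alive
		{Name: "b200-b", HTTPHealthy: true, SLOBreached: false},
		{Name: "edge-a", HTTPHealthy: false, SLOBreached: false},
	}
	for _, b := range routable(backends) {
		fmt.Println("route to:", b.Name)
	}
	fmt.Println("defer dispatch to b200-a:", shouldDefer(backends[0]))
}
```

Whether the router should fail open (as here) or fail closed when all backends are breached is one of the questions the empty "Approach / dependencies" section would need to settle.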

Labels: area/observability, area/routing, component/controller, enhancement, kind/feature, priority/high