[Router] Back-compat conformance corpus (golden policy → Decision tests)

## Goal
Mechanically guarantee the milestone's hard back-compat rule: **a future server must never break a policy authored against an earlier schema major.** Today that rule is held by discipline (the "never redefine a shipped field / never delete a major's schema" principles documented in `src/cpp/resources/schemas/README.md`). This issue makes it a CI gate via a **frozen conformance corpus** of versioned policies + their expected `Decision`s, replayed against every server build.

## Why
Schema validation proves an old policy still *parses*; it does **not** prove it still *routes the same way*. The deterministic tiers (keyword/regex/char rules, the `routing.router` desugaring, band/first-match logic) are pure lemonade engine logic and MUST stay behavior-stable across versions — exactly the thing a golden corpus locks down. (Model-backed scores carry inherent backend/model numerical wobble; see Scope for how the corpus handles that.)

## Scope
- **Corpus layout:** `test/conformance/routing/<schema_major>/<case>/`, with each case directory holding `policy.json` (a versioned `collection.router` policy) and `cases.jsonl` (input → expected `Decision`, one case per line). Seed v1 from the existing `test/cpp/fixtures/routing/` L0a–L3 examples.
- **Deterministic cases (exact):** L1 keyword/regex/char rules, `any/all/not`, first-match-wins, `default_model` fallback, and `routing.router` desugaring — assert the full `Decision` (`route_to`, `matched_rule`, `default_used`, `outputs`) byte-for-byte.
- **Model-backed cases (stubbed):** L2/L3/L0a run against a **pinned fake `ClassifierServices`** (fixed embeddings / scores / chat reply) so the assertion tests the *engine's* threshold + selection logic, not the backend's floats. The fake fixture is committed alongside the case. (Live-backend tolerance bands are explicitly out of scope here — that's a separate, non-gating perf/accuracy concern.)
- **Runner:** a CTest target (C++, reusing the foundation `RoutingPolicyEngine` + fake services) and/or a Python harness under `test/`, run in CI on every PR.
- **Append-only discipline:** when a new schema major ships, its corpus dir is added; **existing major dirs are immutable**. A PR that has to edit a frozen case to get CI green is the signal that the change broke back-compat.
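
To make the replay idea concrete, here is a minimal Python sketch of what a harness under `test/` might do. Everything in it is an assumption made for illustration: the toy `route()` function, the policy keys (`rules`, `keyword`, `classifier`, `threshold`), the `score()` method on the fake services, and the `input`/`expected` keys in `cases.jsonl` are hypothetical stand-ins, not the real `RoutingPolicyEngine`, `ClassifierServices`, or schema. The only parts taken from this issue are the `Decision` field names and the approach: first-match-wins evaluation, pinned fake scores, and an exact comparison of the full `Decision`.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Decision:
    # Field names follow the issue text; the types are assumptions.
    route_to: str
    matched_rule: Optional[str]
    default_used: bool
    outputs: dict = field(default_factory=dict)


class FakeClassifierServices:
    """Hypothetical stub: returns pinned scores, never calls a backend."""

    def __init__(self, pinned):
        self._pinned = pinned

    def score(self, classifier, text):
        # A KeyError here means the committed fixture is missing a score.
        return self._pinned[classifier][text]


def route(policy, text, services):
    """Toy first-match-wins engine standing in for the real one."""
    outputs = {}
    for rule in policy["rules"]:
        if "keyword" in rule:
            hit = rule["keyword"] in text.lower()
        else:
            score = services.score(rule["classifier"], text)
            outputs[rule["classifier"]] = score
            hit = score >= rule["threshold"]
        if hit:
            return Decision(rule["route_to"], rule["id"], False, outputs)
    return Decision(policy["default_model"], None, True, outputs)


def canonical(obj):
    # Sorted keys and fixed separators give a stable byte-level comparison.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def replay(policy, cases_jsonl, services):
    """Return (line_no, expected, produced) for every mismatching case."""
    mismatches = []
    for line_no, line in enumerate(cases_jsonl.splitlines(), start=1):
        if not line.strip():
            continue
        case = json.loads(line)
        got = canonical(asdict(route(policy, case["input"], services)))
        want = canonical(case["expected"])
        if got != want:
            mismatches.append((line_no, want, got))
    return mismatches


# Hypothetical stand-ins for policy.json, the fake fixture, and cases.jsonl.
POLICY = {
    "default_model": "small-model",
    "rules": [
        {"id": "code", "keyword": "stack trace", "route_to": "coder-model"},
        {"id": "toxic", "classifier": "toxicity", "threshold": 0.8,
         "route_to": "guard-model"},
    ],
}
PINNED = {"toxicity": {"you are awful": 0.93, "hello there": 0.05}}
CASES = "\n".join([
    '{"input": "here is a stack trace from prod", "expected": {"route_to": "coder-model", "matched_rule": "code", "default_used": false, "outputs": {}}}',
    '{"input": "you are awful", "expected": {"route_to": "guard-model", "matched_rule": "toxic", "default_used": false, "outputs": {"toxicity": 0.93}}}',
    '{"input": "hello there", "expected": {"route_to": "small-model", "matched_rule": null, "default_used": true, "outputs": {"toxicity": 0.05}}}',
])
```

Calling `replay(POLICY, CASES, FakeClassifierServices(PINNED))` returns an empty list when every case still routes as recorded; a CI wrapper would exit non-zero on any entry. Comparing canonical JSON rather than individual fields keeps the check aligned with the "full `Decision`" requirement, since an unexpected extra key in `outputs` also counts as a mismatch.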

## Out of scope
- Live-backend numerical reproducibility / tolerance bands for model-backed scores.
- Component-resolution compatibility (an old policy naming a delisted model) — that's a model-registry concern, tracked separately.
- The migration shims themselves (per-major load-time upgraders) — this issue is the *test* that would guard them; the shim machinery lands when the second major is introduced.

## Acceptance
- A committed v1 corpus covering the deterministic paths + stubbed model-backed paths, seeded from the L0a–L3 fixtures.
- A CI-wired runner that replays every case and diffs the produced `Decision` against the expected one; any mismatch fails the build.
- A short doc note (in the schemas README) pointing to the corpus as the enforcement mechanism behind the "never redefine" rule.
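
One way the "existing major dirs are immutable" rule could be checked mechanically is a digest manifest: each frozen major's directory tree is hashed once when it ships, and CI recomputes the hash on every PR. The sketch below is only an illustration under that assumption; the function names and the idea of a committed manifest mapping major name to digest are hypothetical and not part of this issue's stated design.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> str:
    """SHA-256 over every file's relative path and content under root."""
    h = hashlib.sha256()
    files = [p for p in root.rglob("*") if p.is_file()]
    # Sort by POSIX-style relative path so the digest is platform-stable.
    for path in sorted(files, key=lambda p: p.relative_to(root).as_posix()):
        h.update(path.relative_to(root).as_posix().encode("utf-8"))
        h.update(b"\0")
        # Hash each file separately so file boundaries stay unambiguous.
        h.update(hashlib.sha256(path.read_bytes()).digest())
    return h.hexdigest()


def changed_majors(corpus_root: Path, manifest: dict) -> list:
    """Return the frozen majors whose current digest differs from the manifest.

    `manifest` maps a major directory name (for example "v1") to the digest
    recorded when that major was frozen. The manifest itself is hypothetical.
    """
    return sorted(
        major
        for major, expected in manifest.items()
        if tree_digest(corpus_root / major) != expected
    )
```

A CI step could call `changed_majors(Path("test/conformance/routing"), manifest)` and fail when the result is non-empty, which turns an edit to a frozen case into an explicit, reviewable event instead of a silent fixture update.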

## Dependencies
Needs the engine actually producing `Decision`s: the foundation interfaces (#2407), the evaluator/registry/conditions/classifiers, and the engine assembly (#2382). The corpus is best filled in once `route()` is implemented; the *fixtures* can be authored earlier. **Non-gating** for the rest of the milestone: it's the guard rail, not a build step on the critical path.


[Router] Back-compat conformance corpus (golden policy → Decision tests) #2425
