feat(llm): multi-model routing with availability fallback#217
Open
qiankunli wants to merge 4 commits into
Add an ordered model pool so a review falls over to another provider/model
when the primary is rate-limited, down, or timing out — instead of failing
the file.
- config: new `routing` namespace with `routing.models` ([{provider, model}],
priority order, reusing the existing `providers` map for credentials) and
`routing.policy` (only "priority" today; other values are reserved for future
policies, and an unknown value is rejected rather than silently ignored).
Namespacing under `routing` keeps it distinct from `providers.<name>.models`
(a provider's model catalog) and gives future routing knobs a home.
- LLMRouter implements LLMClient: tries members in order, advances on
availability errors (429/5xx/network), short-circuits on client-side errors
(400/413/422) and context cancellation. A per-run shared cooldown parks a
throttled model so concurrent per-file subtasks skip it.
- router members use a low SDK retry budget so a rate-limited model fails fast
to the next instead of burning the full backoff (MaxRetries now configurable;
default 5 preserved).
- docs: README.md / README.zh-CN.md config reference + Multi-model fallback.
No `routing.models` keeps the current single-model behavior; `--model` pins a
single endpoint. Tests cover fallover / short-circuit / exhaustion / cooldown,
error classification, config chain resolution, and policy validation.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- resolveModelRef: clear sub.Model so a top-level `model` cannot leak into a
routing entry that omits its own model (model now comes only from ref.Model or
the provider default).
- LLMRouter: when a call fails, stop and return ctx.Err() if the shared context
is canceled or past its deadline: every member uses that ctx, so none can
succeed; avoids wasted fallover attempts and misleading logs. A per-request
timeout (ctx still live) still falls over.
- order(): delete expired cooldown entries so the map stays bounded.
- ResolvedEndpoint.MaxRetries: clarify it is internal/router-set, not read from
config.
Adds a router test for the context-done short-circuit.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
A web edit (5bbe6e9) accidentally pasted the for/if/if header twice in LLMRouter.order(), leaving unbalanced braces that broke the build. Remove the duplicate; the intended if/else cooldown handling is preserved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>