feat(llm): multi-model routing with availability fallback #217

Open
qiankunli wants to merge 4 commits into alibaba:main from qiankunli:feat/llm-multi-model

qiankunli commented Jun 25, 2026

Add an ordered model pool so a review fails over to another provider/model
when the primary is rate-limited, down, or timing out, instead of failing
the file.

- config: new `routing` namespace with two keys. `routing.models` is a list of
  `{provider, model}` entries in priority order that reuses the existing
  `providers` map for credentials. `routing.policy` accepts only "priority"
  today and is reserved for future policies; an unknown value is rejected
  rather than silently ignored. Namespacing under `routing` keeps it distinct
  from `providers.<name>.models` (a provider's model catalog) and gives future
  routing knobs a home.
- LLMRouter implements LLMClient: tries members in order, advances on
  availability errors (429/5xx/network), short-circuits on client-side errors
  (400/413/422) and context cancellation. A per-run shared cooldown parks a
  throttled model so concurrent per-file subtasks skip it.
- router members use a low SDK retry budget so a rate-limited model fails fast
  to the next instead of burning the full backoff (MaxRetries now configurable;
  default 5 preserved).
- docs: README.md / README.zh-CN.md config reference + Multi-model fallback.

Omitting `routing.models` keeps the current single-model behavior; `--model`
pins a single endpoint. Tests cover failover / short-circuit / exhaustion /
cooldown, error classification, config chain resolution, and policy validation.
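
The config rules described above could be checked roughly as follows. This is a sketch under assumptions, not the PR's resolver: the `Routing` and `ModelRef` types, their field names, the `Validate` and `UseRouter` helpers, and the choice to treat an empty policy as "priority" are all hypothetical.

```go
package main

import "fmt"

// ModelRef mirrors one routing.models entry: {provider, model}.
type ModelRef struct {
	Provider string
	Model    string
}

// Routing mirrors the routing namespace; the Go field names are assumed.
type Routing struct {
	Models []ModelRef
	Policy string
}

// Validate rejects an unknown policy instead of silently ignoring it, and
// checks that each entry names a provider present in the providers map.
// Assumption: an empty policy defaults to "priority".
func (r Routing) Validate(providers map[string]bool) error {
	switch r.Policy {
	case "", "priority":
	default:
		return fmt.Errorf("routing.policy: unknown value %q", r.Policy)
	}
	for i, m := range r.Models {
		if m.Provider == "" {
			return fmt.Errorf("routing.models[%d]: provider is required", i)
		}
		if !providers[m.Provider] {
			return fmt.Errorf("routing.models[%d]: provider %q is not defined", i, m.Provider)
		}
	}
	return nil
}

// UseRouter reports whether the multi-model path applies. With no
// routing.models, or with a model pinned on the command line, the
// single-model path is kept.
func (r Routing) UseRouter(pinnedModel string) bool {
	return pinnedModel == "" && len(r.Models) > 0
}
```

Failing at load time on an unknown policy means a typo in `routing.policy` surfaces before any review starts rather than as unexpected routing behavior.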

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CLAassistant commented Jun 25, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ qiankunli
❌ liqiankun1111

github-actions bot left a comment

🔍 OpenCodeReview found 4 issue(s) in this PR.

  • ✅ 4 posted as inline comment(s)
  • 📝 0 posted as summary

Comment thread internal/llm/client.go
Comment thread internal/llm/client.go
Comment thread internal/llm/resolver.go Outdated
Comment thread internal/llm/resolver.go
liqiankun1111 and others added 3 commits June 25, 2026 14:03
- resolveModelRef: clear sub.Model so a top-level `model` cannot leak into a
  routing entry that omits its own model (model now comes only from ref.Model
  or the provider default).
- LLMRouter: when a call fails, stop and return ctx.Err() if the shared context
  is canceled or past its deadline. Every member uses that ctx, so none can
  succeed; stopping avoids wasted failover attempts and misleading logs. A
  per-request timeout (shared ctx still live) still fails over.
- order(): delete expired cooldown entries so the map stays bounded.
- ResolvedEndpoint.MaxRetries: clarify it is internal/router-set, not read from
  config.

Adds a router test for the context-done short-circuit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
A web edit (5bbe6e9) accidentally pasted the for/if/if header twice in
LLMRouter.order(), leaving unbalanced braces that broke the build. Remove
the duplicate; the intended if/else cooldown handling is preserved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>