Motivation
LAMB is designed to run on top of any LLM backend. For classic RAG / single-shot
assistant use, the de-facto "OpenAI-compatible" chat-completions API is a
good-enough common denominator. But as we push LAMB toward more agentic
patterns (tool-calling loops, multi-step reasoning), the assumption of a single
standard LLM API starts to leak.
Mario Zechner's write-up on building a minimal coding agent documents this well:
in practice there are ~4 distinct API families (OpenAI Completions, OpenAI
Responses, Anthropic Messages, Google Generative AI), and "OpenAI-compatible"
endpoints disagree on details that matter once you run a loop:
- field-name divergence (store; max_tokens vs max_completion_tokens)
- where reasoning/thinking content is returned
- when token usage is reported (breaks cost accounting on aborted calls)
- missing or ignored abort signals (a problem for any production loop)
- cross-provider context hand-off (thinking-trace conversion; replay of signed blobs)
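To make the field-name point concrete, here is a minimal sketch of how a connector could paper over two of the quirks listed above. `EndpointQuirks` and `build_chat_payload` are hypothetical names invented for this illustration (they are not taken from LAMB or pi-ai), and the sketch assumes the quirks for each endpoint are known up front from configuration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class EndpointQuirks:
    """Hypothetical per-endpoint flags (illustrative names only)."""
    # Which request field carries the output-token limit on this endpoint.
    token_limit_field: str = "max_completion_tokens"
    # Whether the endpoint accepts the `store` field at all.
    supports_store: bool = True


def build_chat_payload(
    model: str,
    messages: List[Dict[str, Any]],
    max_output_tokens: int,
    quirks: EndpointQuirks,
    store: Optional[bool] = None,
) -> Dict[str, Any]:
    """Build a chat-completions request body adjusted for one endpoint."""
    payload: Dict[str, Any] = {"model": model, "messages": messages}
    # Same intent, different field name depending on the endpoint.
    payload[quirks.token_limit_field] = max_output_tokens
    # Drop fields the endpoint does not accept instead of sending them blindly.
    if store is not None and quirks.supports_store:
        payload["store"] = store
    return payload
```

The caller states its intent once (`max_output_tokens`, `store`), and the per-endpoint differences live in data rather than in scattered conditionals.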
Proposal (to contemplate)
Consider a provider-agnostic connector layer — a multi-AI compliance mode —
inspired by pi-ai's approach: a thin abstraction over the major API families that
normalizes the quirks above, so a LAMB assistant configured for agentic use
behaves consistently regardless of backend.
Scope to discuss:
- which API families to target first
- normalizing tool-calling + streaming across providers
- abort + timeout handling as a first-class concern
- best-effort usage / cost accounting
- (stretch) cross-provider context hand-off for switching models mid-task
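As a starting point for that discussion, the sketch below shows one possible shape for such a layer: a small `Connector` protocol that emits normalized stream events, plus a `run_turn` helper that treats abort and timeout as first-class outcomes and reports usage only on a best-effort basis. Every name here (`Connector`, `Event`, `Usage`, `TurnResult`, `run_turn`, `FakeConnector`) is an assumption made for illustration, not an existing LAMB or pi-ai API, and the fake backend stands in for a real provider adapter.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List, Optional, Protocol


@dataclass
class Usage:
    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class Event:
    """Normalized stream event: a text delta and/or a usage report."""
    text: str = ""
    usage: Optional[Usage] = None


@dataclass
class TurnResult:
    text: str
    stop_reason: str        # "completed" | "aborted" | "timeout"
    usage: Optional[Usage]  # None if the backend never reported usage


class Connector(Protocol):
    """What each API-family adapter would implement (hypothetical)."""
    def stream(self, prompt: str) -> AsyncIterator[Event]: ...


async def run_turn(connector: Connector, prompt: str,
                   abort: asyncio.Event, timeout_s: float) -> TurnResult:
    parts: List[str] = []
    usage: Optional[Usage] = None

    async def consume() -> None:
        nonlocal usage
        async for ev in connector.stream(prompt):
            parts.append(ev.text)
            if ev.usage is not None:
                usage = ev.usage

    stream_task = asyncio.create_task(consume())
    abort_task = asyncio.create_task(abort.wait())
    done, pending = await asyncio.wait(
        {stream_task, abort_task},
        timeout=timeout_s,
        return_when=asyncio.FIRST_COMPLETED,
    )
    # Cancel whatever is still running so no request outlives the turn.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    if stream_task in done:
        stream_task.result()  # re-raise backend errors, if any
        stop_reason = "completed"
    elif abort_task in done:
        stop_reason = "aborted"
    else:
        stop_reason = "timeout"
    # Partial text is kept; usage stays None if it was never reported.
    return TurnResult("".join(parts), stop_reason, usage)


class FakeConnector:
    """Stand-in backend that reports usage only at the end of the stream."""
    def __init__(self, chunks: List[str], delay_s: float = 0.0) -> None:
        self.chunks = chunks
        self.delay_s = delay_s

    async def stream(self, prompt: str) -> AsyncIterator[Event]:
        for chunk in self.chunks:
            await asyncio.sleep(self.delay_s)
            yield Event(text=chunk)
        yield Event(usage=Usage(len(prompt.split()), len(self.chunks)))
```

Because the fake backend reports usage only in its final event, an aborted or timed-out turn comes back with `usage=None`. That is the cost-accounting gap noted in the motivation, and it is why the sketch models usage as optional rather than guaranteed.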
This is a forward-looking design discussion, not a committed deliverable. I am
opening it so that we can explore the design space before agentic features harden
the current single-API assumption into the codebase.
Reference