Skip to content

Harden LiteLLM provider flow and upgrade to 1.88.1#490

Merged
MaxEriksson2000 merged 8 commits into
developfrom
fix/litellm-provider-flow
Jun 12, 2026
Merged

Harden LiteLLM provider flow and upgrade to 1.88.1#490
MaxEriksson2000 merged 8 commits into
developfrom
fix/litellm-provider-flow

Conversation

@MaxEriksson2000

Copy link
Copy Markdown
Collaborator

Summary

  • upgrade LiteLLM from 1.83.10 to 1.88.1 and python-dotenv from 1.0.1 to 1.2.2
  • centralize LiteLLM transport, provider routing, credential/config resolution, and public error normalization
  • enforce tenant-scoped provider lookup for completion, embeddings, and transcription
  • replace dynamic stream mutation with an explicit PreparedModelStream
  • align streaming and non-streaming MCP tool execution with bounded multi-round flows
  • persist Eneo-owned model capability snapshots and keep LiteLLM deprecation metadata advisory
  • regenerate the TypeScript OpenAPI schema and add coverage for tenant isolation, provider errors, capabilities, and tool flows

Why

LiteLLM had accumulated transport, provider, orchestration, policy, and error-handling responsibilities inside the completion adapter. This duplicated provider logic across model types, caused streaming and non-streaming behavior to diverge, allowed dependency metadata to alter model availability, and risked exposing raw provider details.

The new structure keeps LiteLLM as a transport dependency while Eneo owns tenant boundaries, model policy, capabilities, orchestration, and public contracts.

Impact

  • existing API shapes remain compatible, with optional explicit model_kwargs_capabilities added to tenant completion model create/update
  • models without tool support no longer receive tool definitions
  • provider and credential failures return stable sanitized errors
  • package upgrades no longer automatically disable configured models through LiteLLM metadata

Validation

  • 72 focused unit tests passed
  • 66 relevant integration tests passed
  • broad backend suite: 3170 passed, with one pre-existing unrelated failure in tests/unit/test_route_error_contract.py::test_detects_positional_http_exception_status
  • Ruff passed
  • Pyright passed
  • frontend @intric/intric-js lint passed
  • OpenAPI schema drift check passed
  • commit and pre-push hooks passed

Stacked PR

This PR is intentionally based on #489 (feature/attachment-vision-and-token-accuracy) because the work was developed and verified on that branch. After #489 merges, this PR can be rebased or retargeted to develop.

… overhead

Token counting previously measured concatenated text only, with three
systematic errors:

- count_tokens received the bare model name while litellm is invoked
  with "{provider_type}/{name}", silently resolving the default
  tokenizer for Azure/vLLM/Mistral deployments
- images (current question and history) counted as 0 tokens
- tool definitions (generate_image + MCP tools) and per-message chat
  scaffolding counted as 0 tokens

Counting now mirrors the OpenAI-format payload the adapter builds:
litellm.token_counter(messages=...) includes image_url content and
message wrappers, tools are counted via the tools= parameter, and the
MCP proxy is created before context building so its tool definitions
shrink the history/knowledge budgets.

Also fixes add_knowledge subtracting the prompt tokens a second time
from a budget that already excluded them.
…ploads

PDF attachments previously reached the model as extracted text only —
embedded photos and scanned pages were silently dropped. Uploading a
PDF now also renders its embedded image regions (pdfplumber + pypdfium,
already transitive dependencies) into derived image files linked via
files.parent_file_id, capped per document and filtered by source size
to skip logos.

At ask time the attachment list is expanded with derived images only
when the model supports vision; without vision the PDF degrades to a
plain text attachment instead of erroring. History replay now also
strips images for non-vision models, which previously produced provider
errors after a model switch.

All vision images (direct uploads included) are downscaled to max
2048px and recompressed before storage, since blobs are base64-encoded
into every request and replayed for the rest of the session.
- PDF: per-page [PAGE N] markers and tables rendered as markdown
  (table text excluded from the running text to avoid duplication);
  image-only PDFs return empty text instead of bare markers
- PPTX: speaker notes are extracted (previously discarded)
- Attachment prompt block: compact 'FILE: name (mimetype)' format
  replaces JSON-per-line (less token overhead, gives the model the
  file type), and each file is capped at a configurable token budget
  with a visible truncation notice instead of failing the whole
  request via QueryException
iPhone photos default to HEIC, which was rejected. pillow-heif decodes
them and the existing downscale step converts to JPEG before storage —
providers never see image/heic. Formats outside PNG/JPEG/WEBP are now
always converted even when conversion does not shrink the blob.

The frontend accept list picks the new mimetypes up automatically via
the limits endpoint (ImageMimeTypes drives FormatLimit).
LiteLLM BadRequestError was mapped to OpenAIException, which the
embedding/transcription retry policies do not exempt: a deterministic
upstream 400 was retried three times (up to ~40s of backoff) and then
surfaced as HTTP 503, masking tenant configuration errors as
availability incidents.

Introduce ProviderRejectedRequestException (subclass of OpenAIException
so existing provider-error handling keeps catching it) mapped to 400
with error code 9041, and a shared NON_RETRYABLE_PROVIDER_ERRORS tuple
used by both adapter retry policies. Invalid API credentials
(APIKeyNotConfiguredException) are excluded from retries as well.
Capability snapshots ignored the model's reasoning flag: LiteLLM
discovery is name-based, so opaque routes (e.g. Azure deployment names)
missed reasoning_effort entirely, and the discovery-failure fallback
hardcoded reasoning=False. Since the persisted snapshot acts as an
explicit override at resolve time and is never widened again, reasoning
models could permanently lose the reasoning_effort control.

snapshot_supported_model_kwargs now takes the admin-declared reasoning
flag: the fallback respects it, and reasoning_effort is forced on when
discovery misses it. update() re-snapshots when name or reasoning
changes (previously only name), with an explicit capability payload
still winning over the refreshed snapshot.
@MaxEriksson2000 MaxEriksson2000 marked this pull request as ready for review June 12, 2026 19:24
Base automatically changed from feature/attachment-vision-and-token-accuracy to develop June 12, 2026 19:40
…er-flow

# Conflicts:
#	backend/alembic/versions/b4f2a9c1e7d3_add_parent_file_id_to_files.py
#	backend/src/intric/assistants/assistant_service.py
#	backend/src/intric/completion_models/infrastructure/adapters/base_adapter.py
#	backend/src/intric/completion_models/infrastructure/adapters/tenant_model_adapter.py
#	backend/src/intric/completion_models/infrastructure/completion_service.py
#	backend/src/intric/completion_models/infrastructure/context_builder.py
#	backend/src/intric/files/file_protocol.py
#	backend/src/intric/files/file_service.py
#	backend/src/intric/files/image_processing.py
#	backend/src/intric/main/config.py
#	backend/src/intric/tokens/token_utils.py
#	backend/tests/unit/test_completion_service_streaming.py
#	backend/tests/unittests/ai_models/test_context_builder.py
#	backend/tests/unittests/assistants/test_assistants_service.py
#	backend/tests/unittests/files/test_file_service.py
#	backend/tests/unittests/files/test_image_processing.py
#	backend/tests/unittests/tokens/test_token_utils.py
@github-actions

Copy link
Copy Markdown

🧹 Dead-code & unused-dependency report

Advisory — never gates the PR. Whole-repo scan, so some findings may be false positives (dynamic dispatch, framework hooks, runtime-resolved imports). Triage before removing.

No dead code or unused dependencies detected.

@github-actions

Copy link
Copy Markdown

📊 Patch coverage

Share of this PR's new/changed lines exercised by tests. Report-only — never gates the PR.

Area Changed Uncovered Coverage
Backend 297 69 76%
Uncovered lines — Backend (11 files)
  • …/infrastructure/adapters/litellm_embeddings.py — 163, 172
  • …/completion_models/infrastructure/completion_service.py — 114
  • …/model_providers/infrastructure/tenant_model_credential_resolver.py — 42–43
  • …/model_providers/infrastructure/litellm_transport.py — 77–78, 82–83, 87–88, 113, 129, 135, 141, 155
  • …/tenant_models/application/tenant_model_service.py — 107, 134–135, 140, 258, 342–344, 347, 351–352, 357–358
  • …/embedding_models/infrastructure/create_embeddings_service.py — 69, 102, 124–125, 128, 133, 136, 139
  • …/infrastructure/adapters/base_adapter.py — 32
  • …/infrastructure/adapters/litellm_transcription.py — 53, 67, 69–70, 140, 151
  • …/infrastructure/adapters/tenant_model_adapter.py — 147, 151, 329, 332–333, 338, 352, 366–368, 513, 519, 635, 695, 717, 807, 1061
  • …/model_providers/infrastructure/litellm_provider.py — 21–22, 81, 93
  • …/intric/files/transcriber.py — 118–119, 124, 129

@MaxEriksson2000 MaxEriksson2000 merged commit 71dba6f into develop Jun 12, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant