Harden LiteLLM provider flow and upgrade to 1.88.1 by MaxEriksson2000 · Pull Request #490 · eneo-ai/eneo

MaxEriksson2000 · 2026-06-11T19:08:33Z

Summary

upgrade LiteLLM from 1.83.10 to 1.88.1 and python-dotenv from 1.0.1 to 1.2.2
centralize LiteLLM transport, provider routing, credential/config resolution, and public error normalization
enforce tenant-scoped provider lookup for completion, embeddings, and transcription
replace dynamic stream mutation with an explicit PreparedModelStream
align streaming and non-streaming MCP tool execution with bounded multi-round flows
persist Eneo-owned model capability snapshots and keep LiteLLM deprecation metadata advisory
regenerate the TypeScript OpenAPI schema and add coverage for tenant isolation, provider errors, capabilities, and tool flows

Why

LiteLLM had accumulated transport, provider, orchestration, policy, and error-handling responsibilities inside the completion adapter. This duplicated provider logic across model types, caused streaming and non-streaming behavior to diverge, allowed dependency metadata to alter model availability, and risked exposing raw provider details.

The new structure keeps LiteLLM as a transport dependency while Eneo owns tenant boundaries, model policy, capabilities, orchestration, and public contracts.

Impact

existing API shapes remain compatible, with optional explicit model_kwargs_capabilities added to tenant completion model create/update
models without tool support no longer receive tool definitions
provider and credential failures return stable sanitized errors
package upgrades no longer automatically disable configured models through LiteLLM metadata

Validation

72 focused unit tests passed
66 relevant integration tests passed
broad backend suite: 3170 passed, with one pre-existing unrelated failure in tests/unit/test_route_error_contract.py::test_detects_positional_http_exception_status
Ruff passed
Pyright passed
frontend @intric/intric-js lint passed
OpenAPI schema drift check passed
commit and pre-push hooks passed

Stacked PR

This PR is intentionally based on #489 (feature/attachment-vision-and-token-accuracy) because the work was developed and verified on that branch. After #489 merges, this PR can be rebased or retargeted to develop.

… overhead Token counting previously measured concatenated text only, with three systematic errors: - count_tokens received the bare model name while litellm is invoked with "{provider_type}/{name}", silently resolving the default tokenizer for Azure/vLLM/Mistral deployments - images (current question and history) counted as 0 tokens - tool definitions (generate_image + MCP tools) and per-message chat scaffolding counted as 0 tokens Counting now mirrors the OpenAI-format payload the adapter builds: litellm.token_counter(messages=...) includes image_url content and message wrappers, tools are counted via the tools= parameter, and the MCP proxy is created before context building so its tool definitions shrink the history/knowledge budgets. Also fixes add_knowledge subtracting the prompt tokens a second time from a budget that already excluded them.

…ploads PDF attachments previously reached the model as extracted text only — embedded photos and scanned pages were silently dropped. Uploading a PDF now also renders its embedded image regions (pdfplumber + pypdfium, already transitive dependencies) into derived image files linked via files.parent_file_id, capped per document and filtered by source size to skip logos. At ask time the attachment list is expanded with derived images only when the model supports vision; without vision the PDF degrades to a plain text attachment instead of erroring. History replay now also strips images for non-vision models, which previously produced provider errors after a model switch. All vision images (direct uploads included) are downscaled to max 2048px and recompressed before storage, since blobs are base64-encoded into every request and replayed for the rest of the session.

- PDF: per-page [PAGE N] markers and tables rendered as markdown (table text excluded from the running text to avoid duplication); image-only PDFs return empty text instead of bare markers - PPTX: speaker notes are extracted (previously discarded) - Attachment prompt block: compact 'FILE: name (mimetype)' format replaces JSON-per-line (less token overhead, gives the model the file type), and each file is capped at a configurable token budget with a visible truncation notice instead of failing the whole request via QueryException

iPhone photos default to HEIC, which was rejected. pillow-heif decodes them and the existing downscale step converts to JPEG before storage — providers never see image/heic. Formats outside PNG/JPEG/WEBP are now always converted even when conversion does not shrink the blob. The frontend accept list picks the new mimetypes up automatically via the limits endpoint (ImageMimeTypes drives FormatLimit).

LiteLLM BadRequestError was mapped to OpenAIException, which the embedding/transcription retry policies do not exempt: a deterministic upstream 400 was retried three times (up to ~40s of backoff) and then surfaced as HTTP 503, masking tenant configuration errors as availability incidents. Introduce ProviderRejectedRequestException (subclass of OpenAIException so existing provider-error handling keeps catching it) mapped to 400 with error code 9041, and a shared NON_RETRYABLE_PROVIDER_ERRORS tuple used by both adapter retry policies. Invalid API credentials (APIKeyNotConfiguredException) are excluded from retries as well.

Capability snapshots ignored the model's reasoning flag: LiteLLM discovery is name-based, so opaque routes (e.g. Azure deployment names) missed reasoning_effort entirely, and the discovery-failure fallback hardcoded reasoning=False. Since the persisted snapshot acts as an explicit override at resolve time and is never widened again, reasoning models could permanently lose the reasoning_effort control. snapshot_supported_model_kwargs now takes the admin-declared reasoning flag: the fallback respects it, and reasoning_effort is forced on when discovery misses it. update() re-snapshots when name or reasoning changes (previously only name), with an explicit capability payload still winning over the refreshed snapshot.

…er-flow # Conflicts: # backend/alembic/versions/b4f2a9c1e7d3_add_parent_file_id_to_files.py # backend/src/intric/assistants/assistant_service.py # backend/src/intric/completion_models/infrastructure/adapters/base_adapter.py # backend/src/intric/completion_models/infrastructure/adapters/tenant_model_adapter.py # backend/src/intric/completion_models/infrastructure/completion_service.py # backend/src/intric/completion_models/infrastructure/context_builder.py # backend/src/intric/files/file_protocol.py # backend/src/intric/files/file_service.py # backend/src/intric/files/image_processing.py # backend/src/intric/main/config.py # backend/src/intric/tokens/token_utils.py # backend/tests/unit/test_completion_service_streaming.py # backend/tests/unittests/ai_models/test_context_builder.py # backend/tests/unittests/assistants/test_assistants_service.py # backend/tests/unittests/files/test_file_service.py # backend/tests/unittests/files/test_image_processing.py # backend/tests/unittests/tokens/test_token_utils.py

github-actions · 2026-06-12T19:50:38Z

🧹 Dead-code & unused-dependency report

Advisory — never gates the PR. Whole-repo scan, so some findings may be false positives (dynamic dispatch, framework hooks, runtime-resolved imports). Triage before removing.

✅ No dead code or unused dependencies detected.

github-actions · 2026-06-12T20:09:27Z

📊 Patch coverage

Share of this PR's new/changed lines exercised by tests. Report-only — never gates the PR.

Area	Changed	Uncovered	Coverage
Backend	297	69	76%

Uncovered lines — Backend (11 files)

…/infrastructure/adapters/litellm_embeddings.py — 163, 172
…/completion_models/infrastructure/completion_service.py — 114
…/model_providers/infrastructure/tenant_model_credential_resolver.py — 42–43
…/model_providers/infrastructure/litellm_transport.py — 77–78, 82–83, 87–88, 113, 129, 135, 141, 155
…/tenant_models/application/tenant_model_service.py — 107, 134–135, 140, 258, 342–344, 347, 351–352, 357–358
…/embedding_models/infrastructure/create_embeddings_service.py — 69, 102, 124–125, 128, 133, 136, 139
…/infrastructure/adapters/base_adapter.py — 32
…/infrastructure/adapters/litellm_transcription.py — 53, 67, 69–70, 140, 151
…/infrastructure/adapters/tenant_model_adapter.py — 147, 151, 329, 332–333, 338, 352, 366–368, 513, 519, 635, 695, 717, 807, 1061
…/model_providers/infrastructure/litellm_provider.py — 21–22, 81, 93
…/intric/files/transcriber.py — 118–119, 124, 129

MaxEriksson2000 added 7 commits June 11, 2026 19:47

fix(ai): harden LiteLLM provider flow

b672b8e

MaxEriksson2000 marked this pull request as ready for review June 12, 2026 19:24

Base automatically changed from feature/attachment-vision-and-token-accuracy to develop June 12, 2026 19:40

MaxEriksson2000 merged commit 71dba6f into develop Jun 12, 2026
15 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Harden LiteLLM provider flow and upgrade to 1.88.1#490

Harden LiteLLM provider flow and upgrade to 1.88.1#490
MaxEriksson2000 merged 8 commits into
developfrom
fix/litellm-provider-flow

MaxEriksson2000 commented Jun 11, 2026

Uh oh!

github-actions Bot commented Jun 12, 2026

Uh oh!

github-actions Bot commented Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

MaxEriksson2000 commented Jun 11, 2026

Summary

Why

Impact

Validation

Stacked PR

Uh oh!

github-actions Bot commented Jun 12, 2026

🧹 Dead-code & unused-dependency report

Uh oh!

github-actions Bot commented Jun 12, 2026

📊 Patch coverage

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant