[codex] Add Mistral OCR client resource #152

Merged: wachterjohannes merged 5 commits into 0.4 from codex/mistral-ocr-api on Jun 25, 2026
wachterjohannes (Member) commented Jun 24, 2026

What changed

  • Adds an OCR resource to the Mistral client for the ocr endpoint.
  • Adds Model::OCR (mistral-ocr-latest) and Model::OCR_4 (mistral-ocr-4-0).
  • Adds typed OCR response objects for pages, images, dimensions, tables, confidence scores, usage info, and OCR 4 blocks.
  • Adds OCR examples for basic Markdown extraction and OCR 4 block extraction.
  • Adds unit coverage for Client::ocr(), OCR request payload mapping, default model behavior, validation, and OCR 4 response block mapping.
  • Fixes an unrelated highest-deps PHPStan issue in the Google Gemini adapter that blocked the monorepo GitHub Actions matrix.

Why

Downstream consumers need first-class Mistral OCR support instead of carrying a Composer patch in application repositories. OCR 4 block extraction is included so clients can consume structured page regions from the API.

Validation

Ran locally in packages/mistral:

  • composer fix
  • composer lint
  • composer test

Ran locally in packages/google-gemini-adapter for the CI fix:

  • composer fix
  • composer lint
  • composer test

GitHub Actions:

  • 86/86 checks passing on this PR.

Summary by CodeRabbit

  • New Features

    • Added OCR support to the Mistral client, including a new OCR endpoint and response handling.
    • Added OCR model options and structured OCR result objects for pages, tables, images, blocks, confidence scores, and usage details.
    • Added CLI OCR examples for processing local files or document URLs, with optional page selection and richer extracted output.
  • Bug Fixes

    • Tightened response configuration handling for Gemini so schema-related output is only applied when the response format is valid.
  • Tests

    • Added unit coverage for OCR client access, OCR processing, and OCR response parsing.

coderabbitai Bot commented Jun 24, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f1b05390-a707-4564-944f-930e049679e3

📥 Commits

Reviewing files that changed from the base of the PR and between 3d407dd and bccdc82.

📒 Files selected for processing (7)
  • packages/mistral/src/Resources/Ocr.php
  • packages/mistral/src/Responses/Ocr/ConfidenceScore.php
  • packages/mistral/src/Responses/Ocr/ConfidenceScores.php
  • packages/mistral/src/Responses/Ocr/Image.php
  • packages/mistral/src/Responses/Ocr/Page.php
  • packages/mistral/tests/Unit/Resources/OcrTest.php
  • packages/mistral/tests/Unit/Responses/Ocr/ProcessResponseTest.php

Walkthrough

Adds Mistral OCR client support with request validation, typed OCR response objects, CLI examples, and unit tests, and narrows the Google Gemini chat adapter so response schema data is only included when the response MIME type is valid.

Changes

Mistral OCR feature

| Layer | File(s) | Summary |
|---|---|---|
| OCR surface and client wiring | packages/mistral/src/Client.php, packages/mistral/src/ClientInterface.php, packages/mistral/src/Model.php, packages/mistral/src/Resources/OcrInterface.php, packages/mistral/tests/Unit/ClientTest.php | Client and ClientInterface expose ocr(), Model adds OCR enum cases, and the client test covers the new entrypoint. |
| OCR request flow | packages/mistral/src/Resources/Ocr.php, packages/mistral/tests/Unit/Resources/OcrTest.php | The OCR resource adds a transport-backed constructor, validates input, infers document type, submits OCR payloads, and is covered by request tests. |
| OCR response value objects | packages/mistral/src/Responses/Ocr/UsageInfo.php, packages/mistral/src/Responses/Ocr/Dimensions.php, packages/mistral/src/Responses/Ocr/Image.php, packages/mistral/src/Responses/Ocr/ConfidenceScore.php, packages/mistral/src/Responses/Ocr/ConfidenceScores.php, packages/mistral/src/Responses/Ocr/Block.php, packages/mistral/src/Responses/Ocr/Table.php | The OCR namespace adds immutable value objects for usage, dimensions, images, confidence scores, blocks, and tables. |
| OCR page and response hydration | packages/mistral/src/Responses/Ocr/Page.php, packages/mistral/src/Responses/Ocr/ProcessResponse.php, packages/mistral/tests/Unit/Responses/Ocr/ProcessResponseTest.php | The OCR page and process-response models map nested OCR payloads into typed objects, with a unit test covering the full mapping. |
| OCR CLI examples | packages/mistral/examples/ocr.php, packages/mistral/examples/ocr-blocks.php | The new CLI examples normalize URL/file inputs, send OCR requests, and print page or block output. |
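The "infers document type" step in the OCR request flow can be illustrated with a minimal sketch. It is not the code from `Ocr.php`: the helper name `withDocumentType()` is hypothetical, and the sketch assumes the discriminator values `document_url` and `image_url` and that the type can be inferred from which of those keys is present.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the package): returns the document
 * array with an explicit `type` discriminator.
 *
 * @param array<string, mixed> $document
 * @return array<string, mixed>
 */
function withDocumentType(array $document): array
{
    if (isset($document['type'])) {
        return $document; // the caller already chose a discriminator
    }

    // Assumption: the type can be inferred from the key that is present.
    foreach (['document_url', 'image_url'] as $key) {
        if (isset($document[$key])) {
            // Write the inferred type back so it is actually sent.
            return ['type' => $key] + $document;
        }
    }

    throw new InvalidArgumentException('Cannot infer document type.');
}

$parameters = [
    'model' => 'mistral-ocr-latest',
    'document' => ['document_url' => 'https://example.com/sample.pdf'],
];

// Persist the detected type before the payload is built.
$parameters['document'] = withDocumentType($parameters['document']);
```

Returning a new array instead of only validating is the point of the sketch: a type that is detected but never written back would leave the request without `document.type`.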

Google Gemini generation config fix

| Layer | File(s) | Summary |
|---|---|---|
| Generation config branch | packages/google-gemini-adapter/src/Chat/GoogleGeminiChatAdapter.php | The chat adapter now requires a valid response MIME type before building a GenerationConfig that carries response schema data. |
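A rough sketch of the kind of guard described here, under stated assumptions: the function name `buildGenerationConfigSketch()`, the plain-array config shape, and the list of schema-capable MIME types are all illustrative and not taken from the adapter, which builds a typed GenerationConfig object.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: attach the schema only when the MIME type accepts one.
 *
 * @param array<string, mixed>|null $schema
 * @return array<string, mixed>
 */
function buildGenerationConfigSketch(?string $mimeType, ?array $schema): array
{
    // Assumption: MIME types for which a response schema is meaningful.
    $schemaCapable = ['application/json', 'text/x.enum'];

    $config = [];
    if (null !== $mimeType) {
        $config['responseMimeType'] = $mimeType;
    }

    // The schema is dropped unless the MIME type check passes first.
    if (null !== $schema && in_array($mimeType, $schemaCapable, true)) {
        $config['responseSchema'] = $schema;
    }

    return $config;
}
```

With this shape, a schema passed alongside `text/plain` or no MIME type at all is simply not sent, which is one way to read "only applied when the response format is valid".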

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • modelflow-ai/.github#107: Introduced the Google Gemini adapter implementation that this PR narrows in buildGenerationConfig.
  • modelflow-ai/.github#148: Added structured output and response-schema support on the same Gemini generation-config path adjusted here.

Poem

I hopped through OCR fields tonight,
With blocks and pages shining bright.
A Gemini twig was gently nibbled too,
Now schemas wait for MIME types true.
🐰 Hop, hop—my code smells crisp and light!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 52.63%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly matches the main change: adding a Mistral OCR client resource and related support. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

wachterjohannes marked this pull request as ready for review on June 25, 2026, 14:43

coderabbitai Bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/mistral/src/Resources/Ocr.php`:
- Around line 29-39: The OCR payload is missing the required document type
discriminator because `Ocr::process()` validates it via `detectDocumentType()`
but does not persist the result into `$parameters['document']`. Update the
normalization flow in `Ocr::process()` so the detected type is written back onto
the `document` array before calling `Payload::create('ocr', $parameters)`,
ensuring requests without an explicit type still send a valid `document.type`
value.

In `@packages/mistral/src/Responses/Ocr/ConfidenceScore.php`:
- Around line 33-37: The ConfidenceScore hydration logic rejects integer-valued
confidence inputs because `ConfidenceScore::fromArray` currently validates
`$confidence` with `Assert::float`; update this to accept numeric values
instead, then cast the value to float before constructing the object. Keep the
existing `$text` and `$startIndex` handling intact, and make the change in the
`ConfidenceScore` class where the `confidence` attribute is read and validated.

In `@packages/mistral/src/Responses/Ocr/ConfidenceScores.php`:
- Around line 40-41: The page confidence validation in ConfidenceScores::from is
too strict because Assert::float rejects whole-number values returned by the
API. Update the handling of averagePageConfidenceScore and
minimumPageConfidenceScore to accept any numeric value consistently with
ConfidenceScore::from, then normalize/cast to float before constructing the
object.

In `@packages/mistral/src/Responses/Ocr/Image.php`:
- Around line 36-42: The nullable OCR coordinate fields in Image::from are still
being read with direct array access, which can trigger undefined key warnings
before validation. Update the coordinate parsing in Image::from to use
null-coalescing defaults for top_left_x, top_left_y, bottom_right_x, and
bottom_right_y, matching the handling used in Block::from, while keeping the
existing Assert::nullOrInteger checks and constructor argument mapping intact.

In `@packages/mistral/src/Responses/Ocr/Page.php`:
- Around line 47-48: Page::fromArray treats optional OCR fields as if they were
always present, so a payload that omits them can trigger undefined-key warnings.
In the Page class, adjust the assignments for $rawImages and $rawDimensions so
they use null-coalescing defaults consistent with the other optional arrays in
this parser, while keeping the handling aligned with the Mistral OCR schema
expectations.
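
The two confidence-score comments above come down to the same point: `json_decode` returns an `int` for a JSON value such as `1`, so a strict float check rejects a valid score. A minimal sketch of "accept numeric, then cast", using plain PHP checks instead of the assertion library and a hypothetical helper name `toConfidence()`:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: accept int or float, reject everything else,
 * and always hand back a float.
 */
function toConfidence(mixed $value): float
{
    if (!is_int($value) && !is_float($value)) {
        throw new InvalidArgumentException('Confidence must be numeric.');
    }

    return (float) $value;
}

// A whole-number score decodes as int, a fractional one as float.
$decoded = json_decode('{"whole": 1, "fraction": 0.97}', true);

$whole = toConfidence($decoded['whole']);
$fraction = toConfidence($decoded['fraction']);
```

The sketch deliberately rejects numeric strings; whether the real value objects should be that strict is a separate design choice.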
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8b6032bf-6f29-4409-aedd-87cb77be4bc5

📥 Commits

Reviewing files that changed from the base of the PR and between 38621c3 and 3d407dd.

📒 Files selected for processing (20)
  • packages/google-gemini-adapter/src/Chat/GoogleGeminiChatAdapter.php
  • packages/mistral/examples/ocr-blocks.php
  • packages/mistral/examples/ocr.php
  • packages/mistral/src/Client.php
  • packages/mistral/src/ClientInterface.php
  • packages/mistral/src/Model.php
  • packages/mistral/src/Resources/Ocr.php
  • packages/mistral/src/Resources/OcrInterface.php
  • packages/mistral/src/Responses/Ocr/Block.php
  • packages/mistral/src/Responses/Ocr/ConfidenceScore.php
  • packages/mistral/src/Responses/Ocr/ConfidenceScores.php
  • packages/mistral/src/Responses/Ocr/Dimensions.php
  • packages/mistral/src/Responses/Ocr/Image.php
  • packages/mistral/src/Responses/Ocr/Page.php
  • packages/mistral/src/Responses/Ocr/ProcessResponse.php
  • packages/mistral/src/Responses/Ocr/Table.php
  • packages/mistral/src/Responses/Ocr/UsageInfo.php
  • packages/mistral/tests/Unit/ClientTest.php
  • packages/mistral/tests/Unit/Resources/OcrTest.php
  • packages/mistral/tests/Unit/Responses/Ocr/ProcessResponseTest.php

Comment thread packages/mistral/src/Resources/Ocr.php
Comment thread packages/mistral/src/Responses/Ocr/ConfidenceScore.php Outdated
Comment thread packages/mistral/src/Responses/Ocr/ConfidenceScores.php Outdated
Comment thread packages/mistral/src/Responses/Ocr/Image.php
Comment thread packages/mistral/src/Responses/Ocr/Page.php Outdated
- Persist detected document type into the OCR request payload so type-less
  requests still send the required document.type discriminator
- Accept integer-valued confidence scores by validating numeric and casting
  to float in ConfidenceScore and ConfidenceScores
- Use defensive null-coalescing for nullable Image coordinates and optional
  Page images/dimensions to avoid undefined-key warnings
- Add regression tests covering the above
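
The null-coalescing change for the Image coordinates might look roughly like the sketch below. The function name `readCoordinates()` and the returned array shape are illustrative only; the actual class maps these values onto constructor arguments and validates them with the assertion library.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: read nullable coordinates without undefined-key warnings.
 *
 * @param array<string, mixed> $image
 * @return array<string, int|null>
 */
function readCoordinates(array $image): array
{
    $coordinates = [];
    foreach (['top_left_x', 'top_left_y', 'bottom_right_x', 'bottom_right_y'] as $key) {
        // A missing key becomes null instead of raising a warning.
        $value = $image[$key] ?? null;

        if (null !== $value && !is_int($value)) {
            throw new InvalidArgumentException($key . ' must be an integer or null.');
        }

        $coordinates[$key] = $value;
    }

    return $coordinates;
}

// Only one coordinate is present; the others default to null.
$coordinates = readCoordinates(['id' => 'img-0', 'top_left_x' => 10]);
```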
wachterjohannes merged commit 92c381e into 0.4 on Jun 25, 2026
86 checks passed
wachterjohannes deleted the codex/mistral-ocr-api branch on June 25, 2026, 15:00