Fix #5930 - Add multimodal content formatting API to handle files as native provider content blocks#6241
Whning0513 wants to merge 2 commits into
Summary: This PR updates multimodal file formatting for OpenAI providers by passing MIME type information and rejecting unsupported PDF attachments for the Chat Completions API; no exploitable security vulnerabilities were identified.
Risk: Low risk. The changes narrow provider handling behavior and do not introduce new authentication, authorization, data exposure, or user-controlled execution paths.
Changes: OpenAI PDF handling for Chat Completions and Responses
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/crewai-files/src/crewai_files/formatting/openai.py`:
- Around line 140-143: The PDF detection at the is_pdf assignment is using an
exact string comparison against "application/pdf" which can be bypassed by MIME
types with parameters or different casing (e.g., "application/pdf;
charset=binary" or "Application/PDF"). Normalize the content_type before the
comparison by converting it to lowercase and extracting only the media type
portion (removing any parameters after a semicolon), then compare the normalized
value against "application/pdf" to ensure all PDF variants are properly detected
and rejected.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 78230643-21ec-4abe-ba2c-625860a2fcbb
📒 Files selected for processing (2)
- lib/crewai-files/src/crewai_files/formatting/api.py
- lib/crewai-files/src/crewai_files/formatting/openai.py
I independently validated the MIME normalization issue on PR head
Current behavior on that head:
A minimal local patch that normalizes once with: `media_type = content_type.split(";", 1)[0].strip().lower()` and uses
Focused validation after the local patch:
So the normalization concern still looks valid, and it may be worth sharing the same media-type helper across both formatter classes.
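For illustration, the normalization quoted in this comment could be wrapped in a small shared helper along these lines. This is only a sketch: the names `normalize_media_type` and `is_pdf_content_type` are hypothetical and are not taken from the PR's code.

```python
def normalize_media_type(content_type: str) -> str:
    """Return the bare media type: parameters dropped, whitespace trimmed, lowercased."""
    # Same expression as the one quoted in the comment above.
    return content_type.split(";", 1)[0].strip().lower()


def is_pdf_content_type(content_type: str) -> bool:
    """Hypothetical check that both formatter classes could share."""
    return normalize_media_type(content_type) == "application/pdf"


if __name__ == "__main__":
    samples = (
        "application/pdf",
        "Application/PDF",
        "application/pdf; charset=binary",
        "image/png",
    )
    for raw in samples:
        print(f"{raw!r}: {is_pdf_content_type(raw)}")
```

Keeping the check in one function means the Chat Completions and Responses formatters cannot drift apart in how they recognize PDFs.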
Addressed in the latest commit on
[BUG] input_files (PDFFile) are passed as base64 via read_file tool, causing context overflow and inconsistent LLM behavior
When using `input_files` with `PDFFile` (or `File`), CrewAI does not handle the file as a native file input at the provider level. Instead, the file is processed via the `read_file` tool and its content is returned as a binary/base64 representation that is injected into the agent execution context. This causes LLM context to become extremely large, responses to become inconsistent or fail due to context overflow, and the same file to be re-processed during agent execution via tools.
Reported by: Community user
Files passed via `input_files` (such as PDFs) were processed through the `read_file` tool which returned base64-encoded binary data. This base64 data was then injected directly into the LLM context as text, rather than being formatted as native provider content blocks (e.g., OpenAI's `input_file` type, Anthropic's image/file blocks). This caused massive context inflation and unreliable LLM behavior.
File 1: `lib/crewai-files/src/crewai_files/formatting/api.py` (new file)
- `format_multimodal_content()` function that converts text and files into provider-specific multimodal content blocks
- `aformat_multimodal_content()` async variant for parallel file resolution
- `_normalize_provider()` to map provider strings to standardized types (gemini, anthropic, bedrock, openai, etc.)
- `_format_text_block()` for provider-specific text block formatting
File 2: `lib/crewai-files/src/crewai_files/formatting/openai.py` (new file)
- `OpenAIFormatter` class for OpenAI Chat Completions API formatting (image_url blocks with base64 data)
- `OpenAIResponsesFormatter` class for OpenAI Responses API formatting (input_image and input_file types)
Medium - The change introduces a new API for multimodal content formatting that fundamentally changes how files are handled by LLM providers. Files are now formatted as native provider content blocks rather than being injected as raw base64 text. This is a significant architectural improvement but introduces new code paths for file handling. Proper integration with existing `input_files` flows is required to avoid breaking changes.
Summary by CodeRabbit