google-folder-upload-flow-added#6
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughReplaces the old single-step Google Drive file-listing import with a two-step folder-based batch upload wizard. The backend now recursively traverses Drive folders, enforces public-sharing, triggers Celery AI extraction tasks, and saves results via improved ChangesGoogle Drive Batch Upload Wizard
Document Type Regex Narrowing
Sequence DiagramsequenceDiagram
rect rgba(100, 149, 237, 0.5)
note over AdminBrowser,CeleryWorker: Step 1 – Folder Import Initialization
AdminBrowser->>GoogleDriveFileImportView: POST folder_url + org/company_bot
GoogleDriveFileImportView->>GoogleDriveAPI: check root folder permissions (public?)
GoogleDriveFileImportView->>GoogleDriveAPI: get_all_files_in_folder (recursive)
GoogleDriveAPI-->>GoogleDriveFileImportView: flat file list
loop each file
GoogleDriveFileImportView->>GoogleDriveAPI: download_drive_file (enforce public)
GoogleDriveFileImportView->>CacheManager: cache DummyFile
GoogleDriveFileImportView->>CeleryWorker: get_auto_extracted_data.delay
end
GoogleDriveFileImportView-->>AdminBrowser: session_id + task metadata list
end
rect rgba(60, 179, 113, 0.5)
note over AdminBrowser,BatchSaveAPI: Step 2 – Polling and Batch Save
loop until all tasks done
AdminBrowser->>BatchTaskStatusAPI: POST task ids
BatchTaskStatusAPI-->>AdminBrowser: task statuses (array or dict)
AdminBrowser->>AdminBrowser: copy AI fields (title/summary/key_values/…) into items
end
AdminBrowser->>BatchSaveAPI: POST consolidated extracted items
BatchSaveAPI-->>AdminBrowser: save confirmation
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
Actionable comments posted: 9
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.gitignore:
- Line 22: The .gitignore file at line 22 currently only includes
client_secret.json, but this removes the previous .venv/ entry which is needed
to prevent accidental virtualenv directory commits. Ensure both .venv/ and
client_secret.json are present in the .gitignore file to maintain proper ignore
rules for both the virtual environment directory and the secret file.
In `@chatbot/templates/google_drive_integration.html`:
- Around line 214-218: The showErrorModal function uses innerHTML to display
desc2 which could contain untrusted data from backend error messages, creating
an XSS vulnerability. Change line 217 where error-modal-desc2 is set to use
textContent instead of innerHTML, and do the same for the desc1 parameter on
line 216 (change innerText to textContent for consistency). Similarly, locate
the updateLoader function mentioned in the comment (around lines 230-235) and
apply the same fix to replace any innerHTML usage for untrusted backend error
content with textContent. Keep the SVG in the title as innerHTML since it is a
controlled hard-coded string.
In `@chatbot/views/Media/google_drive_integration.py`:
- Around line 324-328: The extension determination logic in the code block
starting with the download_drive_file call relies only on the original_name
filename, but the metadata parameter returned from download_drive_file contains
MIME type information that should be used first. Modify the extension derivation
to check the MIME type from metadata (likely accessible via metadata['mimeType']
or similar) and map it to the appropriate extension before falling back to
parsing the filename. This ensures Google-native files without reliable filename
extensions are correctly identified based on their MIME type from Drive.
- Around line 334-337: The CacheManager.cache_file() method in the
google_drive_integration.py file can return None when the cache write fails, but
the code at line 334 assigns the return value to file_key without checking for
None. Add a guard clause after the CacheManager.cache_file() call to check if
file_key is None and handle the failure appropriately (either skip the file or
raise an error to fail fast), ensuring that only successfully cached files are
appended at line 370 where file_key is used in downstream operations for batch
save. Apply the same guard pattern to any other locations where
CacheManager.cache_file() is called.
- Line 2: The line setting OAUTHLIB_INSECURE_TRANSPORT to '1' in the
google_drive_integration.py file enables HTTP transport for OAuth callbacks
unconditionally, which weakens security in production environments. Remove this
line entirely or modify it to be conditional so that insecure transport is only
enabled during local development (e.g., when a DEBUG flag or environment
variable indicates development mode). Ensure OAuth callbacks use HTTPS in all
server runtime environments outside of local testing.
- Around line 279-284: The code retrieves company_bot based on bot_id without
validating that it belongs to the selected company, creating a security issue
where extraction could route through the wrong tenant's bot. After fetching both
the company and company_bot objects (lines where company is filtered by slug and
company_bot is filtered by id), add validation to ensure that if both objects
exist, the company_bot's company field matches the selected company object. If
the company_bot does not belong to the specified company, treat it as invalid
and fall back to the default extraction bot logic.
- Around line 397-402: The download_drive_file function call can raise a
ValueError("not_public") exception which is not being caught, resulting in a 500
server error. Wrap the download_drive_file(service, file_id) call and the
FileResponse construction in a try-except block to catch the ValueError
exception and return an appropriate HTTP error response (such as HTTP 403
Forbidden or HTTP 404 Not Found) instead of letting the exception propagate.
- Around line 330-376: Wrap the get_auto_extracted_data.delay() call in a nested
try-except block to handle failures and clean up the temp file created earlier.
If the Celery task enqueue fails, delete the temp file using the tmp_path
variable before re-raising or handling the error. Additionally, replace the
blanket except Exception + pass pattern at the end with proper error handling
that either logs the specific error details or re-raises critical exceptions,
ensuring that only expected failures are silently skipped while unexpected
errors are properly reported.
In `@chatbot/views/Media/save_views.py`:
- Around line 293-296: The media_type default value is inconsistent between main
media creation and subdocument media creation. In the main media creation block
(around the filename assignment), the default is set to the string 'txt'
(extension format), but subdocument media creation uses
FileTypeChoices.TXT.value (MIME type format). Since FileTypeChoices enum values
are MIME types, update the media_type default in the main media creation to use
FileTypeChoices.TXT.value instead of the hardcoded 'txt' string to maintain
consistency across both places.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9d58833d-fe38-452b-932b-6672007a3d25
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (11)
.gitignorechatbot/admin/media_admin.pychatbot/serializer/media_serializer.pychatbot/templates/admin/change_list.htmlchatbot/templates/google_drive_integration.htmlchatbot/urls.pychatbot/views/Media/google_drive_integration.pychatbot/views/Media/media_api_views.pychatbot/views/Media/save_views.pychatbot/views/Media/status_views.pypyproject.toml
💤 Files with no reviewable changes (1)
- chatbot/urls.py
7d91716 to
10a78e3
Compare
|
|
||
| description = models.TextField(null=True, blank=True) | ||
| is_theme = models.BooleanField(default=False) | ||
| icon = models.CharField(max_length=1000, null=True, blank=True) |
There was a problem hiding this comment.
@MallanagoudaB check any other image url data type, keep the same
Summary by CodeRabbit
New Features
Improvements
Chores