google-folder-upload-flow-added #6

Open
MallanagoudaBiradar wants to merge 4 commits into ELEVATE-Project:release-3.0.0 from MallanagoudaBiradar:GoogleDriveFlowAdded

Conversation

MallanagoudaBiradar commented Jun 17, 2026

Summary by CodeRabbit

  • New Features

    • Redesigned Google Drive integration with a two-step batch upload wizard for seamless folder importing.
    • Added organization selection dropdown in Google Drive integration for metadata categorization.
    • Added progress tracking and status indicators during batch file processing.
  • Improvements

    • Enhanced metadata extraction with better fallbacks for missing document information.
    • Improved error handling for inaccessible or private Google Drive folders.
  • Chores

    • Updated dependencies to support Google Drive API operations.

coderabbitai Bot commented Jun 17, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8be82c68-a457-428d-8683-74043f13cc6b

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Replaces the old single-step Google Drive file-listing import with a two-step, folder-based batch upload wizard. The backend now recursively traverses Drive folders, enforces public sharing, triggers Celery AI extraction tasks, and saves results via improved save_views/status_views logic. The template becomes a polling-driven wizard. Separately, the character class in the document_type key regex is narrowed from [_\s] (underscore or any whitespace character) to [ _] (space or underscore) across all serializers and API views.

Changes

Google Drive Batch Upload Wizard

| Layer | File(s) | Summary |
| --- | --- | --- |
| Dependencies and config setup | pyproject.toml, .gitignore | Adds google-api-python-client and google-auth-oauthlib to project dependencies; replaces .venv/ with client_secret.json in .gitignore. |
| Backend Drive integration view: helpers and import pipeline | chatbot/views/Media/google_drive_integration.py | Enables OAuthlib insecure transport; adds get_default_extraction_bot(), get_all_files_in_folder() for recursive Drive traversal, stricter public-sharing enforcement in download_drive_file, and a rewritten GoogleDriveFileImportView.post that accepts a folder URL, caches files as DummyFile, triggers Celery extraction tasks sequentially, and returns task metadata with a session ID. |
| URL routing and admin wiring | chatbot/urls.py, chatbot/admin/media_admin.py, chatbot/templates/admin/change_list.html | Removes google-drive/files/ and download routes; keeps only google-drive/files/import/; removes GoogleDriveFilesView from admin imports/routes; renames the admin list button to "Add drive" with btn-success. |
| AI extraction data improvements | chatbot/views/Media/status_views.py, chatbot/views/Media/save_views.py | Adds get_summary_text helper to derive summary from ai_data or structured_content; updates save_views to use .get() with defaults for Media field population and adds description fallback logic for both main and subdocument records. |
| Frontend two-step batch upload wizard | chatbot/templates/google_drive_integration.html | Rewrites the template as a two-step wizard: step 1 collects org and folder URL; step 2 shows a spinner/progress. JavaScript implements window.goToStep, window.confirmOrgSelection, pollAITasks (polls batch-task-status, tolerantly maps array/dict responses, copies AI fields per item), and a "Fetch & Upload All Files" handler that posts to files/import/, polls until completion, and batch-saves to api/batch-save/. |
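
The recursive traversal attributed to get_all_files_in_folder() can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: `list_children` is a hypothetical callable standing in for a paginated Drive file listing, and `FAKE_DRIVE` is an in-memory stand-in so the example runs without credentials. The folder MIME type string is the one Drive uses to mark folders.

```python
FOLDER_MIME = "application/vnd.google-apps.folder"


def get_all_files_in_folder(folder_id, list_children):
    """Return a flat list of non-folder items under folder_id.

    list_children(folder_id) is a hypothetical callable that yields
    dicts with at least "id" and "mimeType" for one folder level.
    """
    files = []
    stack = [folder_id]
    seen = set()  # guards against visiting the same folder twice
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for item in list_children(current):
            if item.get("mimeType") == FOLDER_MIME:
                stack.append(item["id"])  # descend into the subfolder later
            else:
                files.append(item)
    return files


# In-memory stand-in for Drive, for illustration only.
FAKE_DRIVE = {
    "root": [
        {"id": "f1", "name": "a.pdf", "mimeType": "application/pdf"},
        {"id": "sub", "name": "nested", "mimeType": FOLDER_MIME},
    ],
    "sub": [{"id": "f2", "name": "b.txt", "mimeType": "text/plain"}],
}


def fake_list_children(folder_id):
    return FAKE_DRIVE.get(folder_id, [])


flat = get_all_files_in_folder("root", fake_list_children)
print(sorted(f["name"] for f in flat))
```

An explicit stack is used instead of recursion so that deeply nested folders cannot hit Python's recursion limit; whether the PR does the same is not stated in the summary.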

Document Type Regex Narrowing

| Layer | File(s) | Summary |
| --- | --- | --- |
| Regex fix in media serializers | chatbot/serializer/media_serializer.py | Changes key__iregex from ^document[_\s]type$ to ^document[ _]type$ in S3UrlMixin, MediaListSerializer, and MediaDetailSerializer methods. |
| Regex fix in media API views | chatbot/views/Media/media_api_views.py | Applies the same ^document[ _]type$ narrowing across all ORM subqueries and filters in media_api_views.py. |
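
To make the effect of the narrowing concrete, the two patterns can be compared with Python's `re` module. This is only an approximation: Django's `iregex` lookup is evaluated by the database engine, whose regex dialect may differ from Python's, so treat the sketch as an illustration of the character classes rather than of the exact query behavior. The sample keys are made up.

```python
import re

# Old pattern: underscore or ANY whitespace character between the words.
OLD = re.compile(r"^document[_\s]type$", re.IGNORECASE)
# New pattern: only a literal space or an underscore.
NEW = re.compile(r"^document[ _]type$", re.IGNORECASE)

sample_keys = ["document_type", "Document Type", "document\ttype", "documenttype"]

for key in sample_keys:
    print(repr(key), "old:", bool(OLD.match(key)), "new:", bool(NEW.match(key)))
```

Under Python's rules, the tab-separated key is accepted by the old pattern and rejected by the new one, while the underscore and single-space forms match both.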

Sequence Diagram

sequenceDiagram
  rect rgba(100, 149, 237, 0.5)
    note over AdminBrowser,CeleryWorker: Step 1 – Folder Import Initialization
    AdminBrowser->>GoogleDriveFileImportView: POST folder_url + org/company_bot
    GoogleDriveFileImportView->>GoogleDriveAPI: check root folder permissions (public?)
    GoogleDriveFileImportView->>GoogleDriveAPI: get_all_files_in_folder (recursive)
    GoogleDriveAPI-->>GoogleDriveFileImportView: flat file list
    loop each file
      GoogleDriveFileImportView->>GoogleDriveAPI: download_drive_file (enforce public)
      GoogleDriveFileImportView->>CacheManager: cache DummyFile
      GoogleDriveFileImportView->>CeleryWorker: get_auto_extracted_data.delay
    end
    GoogleDriveFileImportView-->>AdminBrowser: session_id + task metadata list
  end
  rect rgba(60, 179, 113, 0.5)
    note over AdminBrowser,BatchSaveAPI: Step 2 – Polling and Batch Save
    loop until all tasks done
      AdminBrowser->>BatchTaskStatusAPI: POST task ids
      BatchTaskStatusAPI-->>AdminBrowser: task statuses (array or dict)
      AdminBrowser->>AdminBrowser: copy AI fields (title/summary/key_values/…) into items
    end
    AdminBrowser->>BatchSaveAPI: POST consolidated extracted items
    BatchSaveAPI-->>AdminBrowser: save confirmation
  end
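
The "array or dict" tolerance in the polling step of the diagram can be sketched as below. The real logic lives in the template's JavaScript (pollAITasks); this Python version only illustrates the idea. The payload shape is an assumption: the keys `task_id`, `status`, and `result`, and the use of Celery-style state names, are hypothetical and not confirmed by the PR.

```python
AI_FIELDS = ("title", "summary", "key_values")
TERMINAL_STATES = {"SUCCESS", "FAILURE"}  # assumed Celery-style state names


def normalize_statuses(payload):
    """Return {task_id: entry} whether the API sent a list or a dict."""
    if isinstance(payload, dict):
        return {str(k): v for k, v in payload.items() if isinstance(v, dict)}
    if isinstance(payload, list):
        return {
            str(e["task_id"]): e
            for e in payload
            if isinstance(e, dict) and "task_id" in e
        }
    return {}


def copy_ai_fields(items, payload):
    """Copy finished AI results into items; return how many are still pending."""
    statuses = normalize_statuses(payload)
    pending = 0
    for item in items:
        entry = statuses.get(str(item["task_id"]))
        if entry is None or entry.get("status") not in TERMINAL_STATES:
            pending += 1
            continue
        result = entry.get("result")
        if not isinstance(result, dict):
            continue  # failed task or non-structured result: nothing to copy
        for field in AI_FIELDS:
            if field in result:
                item[field] = result[field]
    return pending


items = [{"task_id": "t1"}, {"task_id": "t2"}]
as_list = [{"task_id": "t1", "status": "SUCCESS", "result": {"title": "Report"}}]
print(copy_ai_fields(items, as_list), items)
```

A caller would repeat the request until the returned pending count reaches zero, then post the consolidated items to the batch-save endpoint.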

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 A rabbit once tunneled through Drive's winding halls,
Fetching each folder — no file ever falls!
With a spinner that spins and a wizard in two,
The AI extracts, and the data breaks through.
[ _] now narrows the regex so tight,
Every document type matched just right. 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 15.79% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'google-folder-upload-flow-added' accurately summarizes the main change - a Google folder upload flow feature has been added to the project. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

@rakeshSgr

@coderabbitai full review

coderabbitai Bot commented Jun 18, 2026

✅ Action performed

Full review finished.

coderabbitai Bot left a comment

Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.gitignore:
- Line 22: The .gitignore file at line 22 currently only includes
client_secret.json, but this removes the previous .venv/ entry which is needed
to prevent accidental virtualenv directory commits. Ensure both .venv/ and
client_secret.json are present in the .gitignore file to maintain proper ignore
rules for both the virtual environment directory and the secret file.

In `@chatbot/templates/google_drive_integration.html`:
- Around line 214-218: The showErrorModal function uses innerHTML to display
desc2 which could contain untrusted data from backend error messages, creating
an XSS vulnerability. Change line 217 where error-modal-desc2 is set to use
textContent instead of innerHTML, and do the same for the desc1 parameter on
line 216 (change innerText to textContent for consistency). Similarly, locate
the updateLoader function mentioned in the comment (around lines 230-235) and
apply the same fix to replace any innerHTML usage for untrusted backend error
content with textContent. Keep the SVG in the title as innerHTML since it is a
controlled hard-coded string.

In `@chatbot/views/Media/google_drive_integration.py`:
- Around line 324-328: The extension determination logic in the code block
starting with the download_drive_file call relies only on the original_name
filename, but the metadata parameter returned from download_drive_file contains
MIME type information that should be used first. Modify the extension derivation
to check the MIME type from metadata (likely accessible via metadata['mimeType']
or similar) and map it to the appropriate extension before falling back to
parsing the filename. This ensures Google-native files without reliable filename
extensions are correctly identified based on their MIME type from Drive.
- Around line 334-337: The CacheManager.cache_file() method in the
google_drive_integration.py file can return None when the cache write fails, but
the code at line 334 assigns the return value to file_key without checking for
None. Add a guard clause after the CacheManager.cache_file() call to check if
file_key is None and handle the failure appropriately (either skip the file or
raise an error to fail fast), ensuring that only successfully cached files are
appended at line 370 where file_key is used in downstream operations for batch
save. Apply the same guard pattern to any other locations where
CacheManager.cache_file() is called.
- Line 2: The line setting OAUTHLIB_INSECURE_TRANSPORT to '1' in the
google_drive_integration.py file enables HTTP transport for OAuth callbacks
unconditionally, which weakens security in production environments. Remove this
line entirely or modify it to be conditional so that insecure transport is only
enabled during local development (e.g., when a DEBUG flag or environment
variable indicates development mode). Ensure OAuth callbacks use HTTPS in all
server runtime environments outside of local testing.
- Around line 279-284: The code retrieves company_bot based on bot_id without
validating that it belongs to the selected company, creating a security issue
where extraction could route through the wrong tenant's bot. After fetching both
the company and company_bot objects (lines where company is filtered by slug and
company_bot is filtered by id), add validation to ensure that if both objects
exist, the company_bot's company field matches the selected company object. If
the company_bot does not belong to the specified company, treat it as invalid
and fall back to the default extraction bot logic.
- Around line 397-402: The download_drive_file function call can raise a
ValueError("not_public") exception which is not being caught, resulting in a 500
server error. Wrap the download_drive_file(service, file_id) call and the
FileResponse construction in a try-except block to catch the ValueError
exception and return an appropriate HTTP error response (such as HTTP 403
Forbidden or HTTP 404 Not Found) instead of letting the exception propagate.
- Around line 330-376: Wrap the get_auto_extracted_data.delay() call in a nested
try-except block to handle failures and clean up the temp file created earlier.
If the Celery task enqueue fails, delete the temp file using the tmp_path
variable before re-raising or handling the error. Additionally, replace the
blanket except Exception + pass pattern at the end with proper error handling
that either logs the specific error details or re-raises critical exceptions,
ensuring that only expected failures are silently skipped while unexpected
errors are properly reported.
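
As an illustration of the first finding in this file (prefer Drive's MIME type over the filename when deriving the extension), a minimal sketch follows. `derive_extension` and `MIME_TO_EXT` are hypothetical names, the `mimeType` key is assumed from the finding's own wording, and the export targets chosen for Google-native types are illustrative rather than what the PR uses.

```python
import os

MIME_TO_EXT = {
    "application/pdf": "pdf",
    "text/plain": "txt",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx",
    # Google-native types carry no file extension of their own; the
    # targets below are illustrative export choices (an assumption).
    "application/vnd.google-apps.document": "docx",
    "application/vnd.google-apps.spreadsheet": "xlsx",
}


def derive_extension(metadata, original_name, default="txt"):
    """Pick an extension from the MIME type first, then from the filename."""
    mime = (metadata or {}).get("mimeType")
    if mime in MIME_TO_EXT:
        return MIME_TO_EXT[mime]
    # Fall back to whatever follows the last dot in the filename, if anything.
    ext = os.path.splitext(original_name or "")[1].lstrip(".").lower()
    return ext or default


print(derive_extension({"mimeType": "application/vnd.google-apps.document"}, "Quarterly notes"))
print(derive_extension({}, "scan.PDF"))
print(derive_extension(None, "README"))
```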

In `@chatbot/views/Media/save_views.py`:
- Around line 293-296: The media_type default value is inconsistent between main
media creation and subdocument media creation. In the main media creation block
(around the filename assignment), the default is set to the string 'txt'
(extension format), but subdocument media creation uses
FileTypeChoices.TXT.value (MIME type format). Since FileTypeChoices enum values
are MIME types, update the media_type default in the main media creation to use
FileTypeChoices.TXT.value instead of the hardcoded 'txt' string to maintain
consistency across both places.
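
For the save_views.py finding, the consistent default could look roughly like the sketch below. The `FileTypeChoices` class here is a plain-Python stand-in for the project's choices class, its member values are assumed (the finding only states that the values are MIME types), and `resolve_media_type` is a hypothetical helper.

```python
from enum import Enum


class FileTypeChoices(Enum):
    """Stand-in for the project's choices class; values are assumed."""

    TXT = "text/plain"
    PDF = "application/pdf"


def resolve_media_type(file_data):
    """Use the supplied media_type, else the same MIME-style default everywhere."""
    return file_data.get("media_type") or FileTypeChoices.TXT.value


# Main and subdocument records go through the same helper, so the
# default can no longer drift between 'txt' and a MIME type.
print(resolve_media_type({}))
print(resolve_media_type({"media_type": FileTypeChoices.PDF.value}))
```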
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9d58833d-fe38-452b-932b-6672007a3d25

📥 Commits

Reviewing files that changed from the base of the PR and between 14b78e4 and 7d91716.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
  • .gitignore
  • chatbot/admin/media_admin.py
  • chatbot/serializer/media_serializer.py
  • chatbot/templates/admin/change_list.html
  • chatbot/templates/google_drive_integration.html
  • chatbot/urls.py
  • chatbot/views/Media/google_drive_integration.py
  • chatbot/views/Media/media_api_views.py
  • chatbot/views/Media/save_views.py
  • chatbot/views/Media/status_views.py
  • pyproject.toml
💤 Files with no reviewable changes (1)
  • chatbot/urls.py

Comment thread chatbot/migrations/0082_tag_is_theme_tag_icon.py

description = models.TextField(null=True, blank=True)
is_theme = models.BooleanField(default=False)
icon = models.CharField(max_length=1000, null=True, blank=True)

@MallanagoudaB, check which data type the other image URL fields use and keep this one the same.

Author

I will check this.
