google-folder-upload-flow-added by MallanagoudaBiradar · Pull Request #6 · ELEVATE-Project/mitra-service

MallanagoudaBiradar · 2026-06-17T08:36:10Z

Summary by CodeRabbit

New Features
- Redesigned Google Drive integration with a two-step batch upload wizard for seamless folder importing.
- Added organization selection dropdown in Google Drive integration for metadata categorization.
- Added progress tracking and status indicators during batch file processing.
Improvements
- Enhanced metadata extraction with better fallbacks for missing document information.
- Improved error handling for inaccessible or private Google Drive folders.
Chores
- Updated dependencies to support Google Drive API operations.

coderabbitai · 2026-06-17T09:06:43Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8be82c68-a457-428d-8683-74043f13cc6b

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

Replaces the old single-step Google Drive file-listing import with a two-step folder-based batch upload wizard. The backend now recursively traverses Drive folders, enforces public-sharing, triggers Celery AI extraction tasks, and saves results via improved save_views/status_views logic. The template becomes a polling-driven wizard. Separately, the document_type key regex is narrowed from [_\s] to [ _] across all serializers and API views.

Changes

Google Drive Batch Upload Wizard

Layer / File(s)	Summary
Dependencies and config setup `pyproject.toml`, `.gitignore`	Adds `google-api-python-client` and `google-auth-oauthlib` to project dependencies; replaces `.venv/` with `client_secret.json` in `.gitignore`.
Backend Drive integration view: helpers and import pipeline `chatbot/views/Media/google_drive_integration.py`	Enables OAuthlib insecure transport; adds `get_default_extraction_bot()`, `get_all_files_in_folder()` for recursive Drive traversal, stricter public-sharing enforcement in `download_drive_file`, and a rewritten `GoogleDriveFileImportView.post` that accepts a folder URL, caches files as `DummyFile`, triggers Celery extraction tasks sequentially, and returns task metadata with a session ID.
URL routing and admin wiring `chatbot/urls.py`, `chatbot/admin/media_admin.py`, `chatbot/templates/admin/change_list.html`	Removes `google-drive/files/` and download routes; keeps only `google-drive/files/import/`; removes `GoogleDriveFilesView` from admin imports/routes; renames the admin list button to "Add drive" with `btn-success`.
AI extraction data improvements `chatbot/views/Media/status_views.py`, `chatbot/views/Media/save_views.py`	Adds `get_summary_text` helper to derive summary from `ai_data` or `structured_content`; updates `save_views` to use `.get()` with defaults for `Media` field population and adds description fallback logic for both main and subdocument records.
Frontend two-step batch upload wizard `chatbot/templates/google_drive_integration.html`	Rewrites the template as a two-step wizard: step 1 collects org and folder URL; step 2 shows a spinner/progress. JavaScript implements `window.goToStep`, `window.confirmOrgSelection`, `pollAITasks` (polls `batch-task-status`, tolerantly maps array/dict responses, copies AI fields per item), and a "Fetch & Upload All Files" handler that posts to `files/import/`, polls until completion, and batch-saves to `api/batch-save/`.

Document Type Regex Narrowing

Layer / File(s)	Summary
Regex fix in media serializers `chatbot/serializer/media_serializer.py`	Changes `key__iregex` from `^document[_\s]type$` to `^document[ _]type$` in `S3UrlMixin`, `MediaListSerializer`, and `MediaDetailSerializer` methods.
Regex fix in media API views `chatbot/views/Media/media_api_views.py`	Applies the same `^document[ _]type$` narrowing across all ORM subqueries and filters in `media_api_views.py`.

Sequence Diagram

sequenceDiagram
  rect rgba(100, 149, 237, 0.5)
    note over AdminBrowser,CeleryWorker: Step 1 – Folder Import Initialization
    AdminBrowser->>GoogleDriveFileImportView: POST folder_url + org/company_bot
    GoogleDriveFileImportView->>GoogleDriveAPI: check root folder permissions (public?)
    GoogleDriveFileImportView->>GoogleDriveAPI: get_all_files_in_folder (recursive)
    GoogleDriveAPI-->>GoogleDriveFileImportView: flat file list
    loop each file
      GoogleDriveFileImportView->>GoogleDriveAPI: download_drive_file (enforce public)
      GoogleDriveFileImportView->>CacheManager: cache DummyFile
      GoogleDriveFileImportView->>CeleryWorker: get_auto_extracted_data.delay
    end
    GoogleDriveFileImportView-->>AdminBrowser: session_id + task metadata list
  end
  rect rgba(60, 179, 113, 0.5)
    note over AdminBrowser,BatchSaveAPI: Step 2 – Polling and Batch Save
    loop until all tasks done
      AdminBrowser->>BatchTaskStatusAPI: POST task ids
      BatchTaskStatusAPI-->>AdminBrowser: task statuses (array or dict)
      AdminBrowser->>AdminBrowser: copy AI fields (title/summary/key_values/…) into items
    end
    AdminBrowser->>BatchSaveAPI: POST consolidated extracted items
    BatchSaveAPI-->>AdminBrowser: save confirmation
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 A rabbit once tunneled through Drive's winding halls,
Fetching each folder — no file ever falls!
With a spinner that spins and a wizard in two,
The AI extracts, and the data breaks through.
[ _] now narrows the regex so tight,
Every document type matched just right. 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 15.79% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'google-folder-upload-flow-added' accurately summarizes the main change - a Google folder upload flow feature has been added to the project.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

rakeshSgr · 2026-06-18T12:04:45Z

@coderabbitai full review

coderabbitai · 2026-06-18T12:04:53Z

✅ Action performed

Full review finished.

coderabbitai

Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.gitignore:
- Line 22: The .gitignore file at line 22 currently only includes
client_secret.json, but this removes the previous .venv/ entry which is needed
to prevent accidental virtualenv directory commits. Ensure both .venv/ and
client_secret.json are present in the .gitignore file to maintain proper ignore
rules for both the virtual environment directory and the secret file.

In `@chatbot/templates/google_drive_integration.html`:
- Around line 214-218: The showErrorModal function uses innerHTML to display
desc2 which could contain untrusted data from backend error messages, creating
an XSS vulnerability. Change line 217 where error-modal-desc2 is set to use
textContent instead of innerHTML, and do the same for the desc1 parameter on
line 216 (change innerText to textContent for consistency). Similarly, locate
the updateLoader function mentioned in the comment (around lines 230-235) and
apply the same fix to replace any innerHTML usage for untrusted backend error
content with textContent. Keep the SVG in the title as innerHTML since it is a
controlled hard-coded string.

In `@chatbot/views/Media/google_drive_integration.py`:
- Around line 324-328: The extension determination logic in the code block
starting with the download_drive_file call relies only on the original_name
filename, but the metadata parameter returned from download_drive_file contains
MIME type information that should be used first. Modify the extension derivation
to check the MIME type from metadata (likely accessible via metadata['mimeType']
or similar) and map it to the appropriate extension before falling back to
parsing the filename. This ensures Google-native files without reliable filename
extensions are correctly identified based on their MIME type from Drive.
- Around line 334-337: The CacheManager.cache_file() method in the
google_drive_integration.py file can return None when the cache write fails, but
the code at line 334 assigns the return value to file_key without checking for
None. Add a guard clause after the CacheManager.cache_file() call to check if
file_key is None and handle the failure appropriately (either skip the file or
raise an error to fail fast), ensuring that only successfully cached files are
appended at line 370 where file_key is used in downstream operations for batch
save. Apply the same guard pattern to any other locations where
CacheManager.cache_file() is called.
- Line 2: The line setting OAUTHLIB_INSECURE_TRANSPORT to '1' in the
google_drive_integration.py file enables HTTP transport for OAuth callbacks
unconditionally, which weakens security in production environments. Remove this
line entirely or modify it to be conditional so that insecure transport is only
enabled during local development (e.g., when a DEBUG flag or environment
variable indicates development mode). Ensure OAuth callbacks use HTTPS in all
server runtime environments outside of local testing.
- Around line 279-284: The code retrieves company_bot based on bot_id without
validating that it belongs to the selected company, creating a security issue
where extraction could route through the wrong tenant's bot. After fetching both
the company and company_bot objects (lines where company is filtered by slug and
company_bot is filtered by id), add validation to ensure that if both objects
exist, the company_bot's company field matches the selected company object. If
the company_bot does not belong to the specified company, treat it as invalid
and fall back to the default extraction bot logic.
- Around line 397-402: The download_drive_file function call can raise a
ValueError("not_public") exception which is not being caught, resulting in a 500
server error. Wrap the download_drive_file(service, file_id) call and the
FileResponse construction in a try-except block to catch the ValueError
exception and return an appropriate HTTP error response (such as HTTP 403
Forbidden or HTTP 404 Not Found) instead of letting the exception propagate.
- Around line 330-376: Wrap the get_auto_extracted_data.delay() call in a nested
try-except block to handle failures and clean up the temp file created earlier.
If the Celery task enqueue fails, delete the temp file using the tmp_path
variable before re-raising or handling the error. Additionally, replace the
blanket except Exception + pass pattern at the end with proper error handling
that either logs the specific error details or re-raises critical exceptions,
ensuring that only expected failures are silently skipped while unexpected
errors are properly reported.

In `@chatbot/views/Media/save_views.py`:
- Around line 293-296: The media_type default value is inconsistent between main
media creation and subdocument media creation. In the main media creation block
(around the filename assignment), the default is set to the string 'txt'
(extension format), but subdocument media creation uses
FileTypeChoices.TXT.value (MIME type format). Since FileTypeChoices enum values
are MIME types, update the media_type default in the main media creation to use
FileTypeChoices.TXT.value instead of the hardcoded 'txt' string to maintain
consistency across both places.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9d58833d-fe38-452b-932b-6672007a3d25

📥 Commits

Reviewing files that changed from the base of the PR and between 14b78e4 and 7d91716.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (11)

.gitignore
chatbot/admin/media_admin.py
chatbot/serializer/media_serializer.py
chatbot/templates/admin/change_list.html
chatbot/templates/google_drive_integration.html
chatbot/urls.py
chatbot/views/Media/google_drive_integration.py
chatbot/views/Media/media_api_views.py
chatbot/views/Media/save_views.py
chatbot/views/Media/status_views.py
pyproject.toml

💤 Files with no reviewable changes (1)

chatbot/urls.py

rakeshSgr · 2026-06-19T04:58:37Z


    description = models.TextField(null=True, blank=True)
+    is_theme = models.BooleanField(default=False)
+    icon = models.CharField(max_length=1000, null=True, blank=True)


@MallanagoudaB check any other image url data type, keep the same

i will check this

google-folder-upload-flow-added

cbf7da3

google-folder-upload-flow-added

457e7a9

coderabbitai Bot reviewed Jun 18, 2026

View reviewed changes

global-tag-new-theme-api

10a78e3

MallanagoudaBiradar force-pushed the GoogleDriveFlowAdded branch from 7d91716 to 10a78e3 Compare June 18, 2026 16:41