feat(summary) Migrate visio integration to summary API v2#1362
feat(summary) Migrate visio integration to summary API v2#1362FloChehab wants to merge 3 commits into
Conversation
8dc08b2 to
21202e4
Compare
f5d2ee6 to
b8b5d01
Compare
️✅ There are no secrets present in this pull request anymore.If these secrets were true positive and are still valid, we highly recommend you to revoke them. 🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request. |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
WalkthroughThis PR modernizes the recording and transcription system by adding external process webhook integration to the backend and consolidating the summary service to v2-only with docs integration. The backend extends the Recording model with external_process_id tracking and adds a webhook endpoint to receive events from external services, updating recording status based on transcript success/failure outcomes. The notification service is enhanced to generate locale-specific titles, send cloud storage URLs, and persist job IDs for webhook correlation. The summary service eliminates v1 API support, makes authorized tenants required with per-tenant push_to_docs permissions, introduces typed data contracts with received_at timestamps for analytics, creates a docs integration service for document creation, and refactors task processing to include optional synchronous docs push. CI/CD workflows are updated to allow Docker image pushes on integration/* branches, and Helm templates are configured with v2 endpoints and multi-tenant authorization. Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 21
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/summary/summary/core/celery_worker.py (1)
436-450:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDuplicate analytics capture call.
metadata_manager.capture(job_id, settings.posthog_event_success)is called twice: once at line 436 and again at line 450. This will result in duplicate analytics events being recorded.🐛 Proposed fix: remove the first duplicate call
) - metadata_manager.capture(job_id, settings.posthog_event_success) - file_service.store_transcript( transcript=transcription_res, job_id=job_id,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/summary/summary/core/celery_worker.py` around lines 436 - 450, metadata_manager.capture(job_id, settings.posthog_event_success) is invoked twice causing duplicate analytics events; remove the earlier call and keep a single capture after you finish storing the transcript and scheduling the webhook. Specifically, delete the first metadata_manager.capture(...) that appears before file_service.store_transcript(...) so only the final metadata_manager.capture(job_id, settings.posthog_event_success) after call_webhook_v2_task.apply_async(...) remains; the other symbols to locate the section are file_service.store_transcript, TranscribeWebhookSuccessPayload, and call_webhook_v2_task.apply_async.
♻️ Duplicate comments (1)
src/summary/summary/core/celery_worker.py (1)
388-389:⚠️ Potential issue | 🟠 Major | ⚡ Quick winCatch
Exceptioninstead ofBaseException.Catching
BaseExceptionwill interceptSystemExitandKeyboardInterrupt, preventing proper process termination. Change toexcept Exception as e:to maintain resilience for operational errors while allowing process-control exceptions to propagate.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/summary/summary/core/celery_worker.py` around lines 388 - 389, Change the broad exception handler in celery_worker.py from "except BaseException as e:" to "except Exception as e:" so SystemExit/KeyboardInterrupt can propagate; locate the try/except around the speaker identity resolution (the block logging "Failed to resolve speaker identities, skipping: {e}") and update the exception type accordingly, leaving the logger.error call intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/backend/core/recording/event/authentication.py`:
- Line 106: Fix the typo in the logger message: replace the string
"Authentificating" passed to logger.info with the correct spelling
"Authenticating" (locate the logger.info(...) call in the authentication event
handling code and update the message accordingly).
- Around line 102-119: In authenticate, add checks to ensure
settings.SUMMARY_SERVICE_WEBHOOK_API_TOKEN is configured (non-empty/None) before
calling secrets.compare_digest, and validate the Authorization header format by
rejecting anything that does not start with the exact "Bearer " prefix (similar
to StorageEventAuthentication) so you only compare the extracted bearer token;
update the logger messages to reflect configuration or header format failures
and raise AuthenticationFailed on invalid/missing token or bad prefix, keeping
the final secrets.compare_digest call against the extracted bearer token and
returning MachineUser("external_process_user"), None on success.
In `@src/backend/core/recording/event/notification.py`:
- Around line 297-309: The metadata dict is being constructed when started_at
and ended_at exist even if metadata_filename is None; update the conditional
that builds "metadata" so it requires metadata_filename as well (e.g., require
started_at and ended_at and metadata_filename) before calling
generate_download_s3_file_url(metadata_filename, ...), ensuring metadata remains
None when metadata_filename is absent; refer to the "metadata" entry, the
variables metadata_filename, started_at, ended_at, and the function
generate_download_s3_file_url when locating and fixing the code.
- Around line 219-220: Wrap the ZoneInfo(owner_timezone) usage in a try/except
that catches zoneinfo.ZoneInfoNotFoundError (import it) around the block that
sets dt = recording_datetime.astimezone(ZoneInfo(owner_timezone)); on exception,
log a warning including the invalid owner_timezone (use the same logger used
nearby) and skip the astimezone conversion so dt remains as recording_datetime
(or apply a safe fallback), ensuring the code references owner_timezone and
recording_datetime and that ZoneInfoNotFoundError is imported from zoneinfo.
In `@src/backend/core/utils.py`:
- Around line 460-492: The function generate_download_s3_file_url lacks
validation for the key parameter, so callers passing None or an empty string
produce unclear boto3 errors; add a short guard at the top of
generate_download_s3_file_url that verifies key is a non-empty string (e.g.,
check isinstance(key, str) and key.strip() not empty) and raise a clear
ValueError (with a message like "key must be a non-empty string") before any
boto3/default_storage calls so failures are immediate and debuggable.
In `@src/backend/locale/de_DE/LC_MESSAGES/django.po`:
- Around line 149-155: Add German translations for the two msgid entries in the
locale file: replace the empty msgstr for "External process successful" with
"Externer Prozess erfolgreich" and replace the empty msgstr for "External
process failed" with "Externer Prozess fehlgeschlagen" so the msgid-to-msgstr
pairs read respectively for those strings.
- Around line 358-364: Replace the empty msgstr entries for the two msgid
strings by adding German translations: set msgstr for "External Process ID" to
"Externe Prozess‑ID" and set msgstr for "ID of the external process associated
with the recording." to "ID des externen Prozesses, der mit der Aufnahme
verknüpft ist."; update the corresponding entries in django.po so the two msgid
lines are followed by these msgstr translations.
- Around line 595-598: Update the German translation for msgid "Click the
\"Open\" button below" by removing the old HTML link markup in msgstr and
replacing it with wording that matches the new msgid (e.g., "Klicken Sie auf die
Schaltfläche „Öffnen“ unten"), and remove the "#, fuzzy" flag so the entry is no
longer marked fuzzy; edit the msgstr corresponding to that msgid and clear the
fuzzy marker.
In `@src/backend/locale/en_US/LC_MESSAGES/django.po`:
- Around line 591-594: The translation still contains the old link HTML and the
"#, fuzzy" marker; update the msgstr to match the new msgid wording ("Click the
\"Open\" button below ") (i.e., remove the <a href=...> markup) and delete the
"#, fuzzy" marker so msgid and msgstr are identical and no longer fuzzy.
In `@src/backend/locale/fr_FR/LC_MESSAGES/django.po`:
- Around line 151-157: Add French translations for the two msgid entries by
replacing the empty msgstr values for "External process successful" and
"External process failed" with their French equivalents; set msgstr for
"External process successful" to "Processus externe réussi" and set msgstr for
"External process failed" to "Échec du processus externe" (update the entries
where the msgid strings appear).
- Around line 359-365: Update the two empty msgstr entries for the given msgid
keys: replace msgstr "" for "External Process ID" with "ID du processus externe"
and replace msgstr "" for "ID of the external process associated with the
recording." with "ID du processus externe associé à l'enregistrement." so the
French .po contains the proper translations for those message IDs (msgid
"External Process ID" and msgid "ID of the external process associated with the
recording.").
- Around line 597-600: Update the French translation to match the new msgid by
removing the old HTML link markup and translating the button wording: change the
msgstr for msgid "Click the \"Open\" button below " so it reads something like
"Cliquez sur le bouton « Ouvrir » ci‑dessous" (remove the "<a
href=\"%(link)s\">…</a>" and the %(link)s placeholder) and also remove the "#,
fuzzy" marker so the entry is no longer marked fuzzy.
In `@src/backend/locale/nl_NL/LC_MESSAGES/django.po`:
- Around line 354-360: Update the empty msgstr entries for the two msgid
strings: set msgid "External Process ID" to the Dutch translation "Externe
proces-ID" and set msgid "ID of the external process associated with the
recording." to "ID van het externe proces dat aan de opname gekoppeld is." by
replacing the empty msgstr values for those exact msgid entries.
- Around line 149-155: The translation entries for msgid "External process
successful" and "External process failed" in the locale file are empty; update
the corresponding msgstr values to Dutch by replacing the empty msgstr for
"External process successful" with a suitable translation such as "Extern proces
voltooid" (or "Extern proces geslaagd") and for "External process failed" with
"Extern proces mislukt" (or "Extern proces gefaald") so the UI shows Dutch text
for these external process status messages.
- Around line 590-593: The Dutch translation for msgid "Click the \"Open\"
button below" still contains old HTML link markup in msgstr; update msgstr to
match the new msgid (e.g., change "Klik op de \"<a
href=\"%(link)s\">Openen</a>\"-link hieronder " to a button-oriented translation
such as "Klik op de knop \"Open\" hieronder") and remove the "#, fuzzy" marker
so the entry is no longer marked fuzzy; edit the entry that contains the shown
msgid/msgstr pair to make these changes.
In `@src/helm/env.d/dev-dinum/values.meet.yaml.gotmpl`:
- Around line 108-110: Add the missing AUTHORIZED_TENANTS environment entry to
the summary service block so v2 authorization succeeds; locate the
SUMMARY_SERVICE_* entries (SUMMARY_SERVICE_ENDPOINT, SUMMARY_SERVICE_API_TOKEN,
SUMMARY_SERVICE_WEBHOOK_API_TOKEN) in this values.meet.yaml.gotmpl and add an
AUTHORIZED_TENANTS key with the same tenant list format used in other dev
configs (e.g., a JSON/array string of allowed tenant IDs) so the summary service
accepts v2 tasks and passes startup validation.
In `@src/summary/summary/core/config.py`:
- Around line 119-123: The docs integration defaults currently enable a
dangerous placeholder configuration; change is_docs_integration_enabled default
to False and add a startup validation (e.g., a Pydantic root_validator or
post-init check in the same config class) that raises an error if
is_docs_integration_enabled is True while docs_base_url == "https://example.com"
or docs_server_to_server_api_key == "NO_API_KEY"; reference the fields
is_docs_integration_enabled, docs_base_url and docs_server_to_server_api_key so
the check runs early and fails fast instead of silently using placeholder
values.
In `@src/summary/summary/core/docs_service.py`:
- Around line 57-59: The debug logs in docs_service.py currently print full
request/response payloads (logger.debug calls around the "Submitting to" block
referencing data and later response), which may include sensitive fields like
email and sub; update the code to log only metadata by redacting sensitive keys
before logging (create/use a helper such as redact_payload and apply it to the
request variable `data` and the downstream `response` body), remove or avoid
logging full response bodies, and ensure the logger messages include
non-sensitive context (endpoint URL, payload size, status code) instead of raw
content.
- Around line 69-70: The except clause handling response.json() in the fallback
path should also catch ValueError to cover legacy/runtime variants; update the
except tuple in the fallback of the method that reads response.json() (the block
currently using "except (json.JSONDecodeError, AttributeError): document_id =
'Unable to parse response'") to "except (json.JSONDecodeError, ValueError,
AttributeError)" so plain ValueError decode failures are handled the same way.
In `@src/summary/summary/core/file_service.py`:
- Around line 250-258: read_cloud_storage_json currently opens the downloaded
temp file with local_path.open("r") without a context manager and never deletes
the temp file returned by _download_from_cloud_storage_url, causing a resource
leak; modify read_cloud_storage_json to open the file using a with-statement
(context manager) to ensure the file handle is closed and wrap the read in
try/except/finally where the finally block removes the temporary file (using
local_path.unlink() or equivalent) and re-raises FileServiceException on
JSON/Unicode errors, referencing the existing symbols read_cloud_storage_json,
_download_from_cloud_storage_url, and FileServiceException.
In `@src/summary/summary/core/models.py`:
- Around line 47-49: The download_link attribute is currently typed as optional
but lacks a default, making it required under Pydantic v2; update the model's
download_link declaration (the download_link Field in your model class) to
provide an explicit default of None (e.g., use Field(default=None,
title="Download Link", description="The link to download the recording.") or set
download_link: str | None = None with Field(...)) so clients can omit it without
validation errors.
---
Outside diff comments:
In `@src/summary/summary/core/celery_worker.py`:
- Around line 436-450: metadata_manager.capture(job_id,
settings.posthog_event_success) is invoked twice causing duplicate analytics
events; remove the earlier call and keep a single capture after you finish
storing the transcript and scheduling the webhook. Specifically, delete the
first metadata_manager.capture(...) that appears before
file_service.store_transcript(...) so only the final
metadata_manager.capture(job_id, settings.posthog_event_success) after
call_webhook_v2_task.apply_async(...) remains; the other symbols to locate the
section are file_service.store_transcript, TranscribeWebhookSuccessPayload, and
call_webhook_v2_task.apply_async.
---
Duplicate comments:
In `@src/summary/summary/core/celery_worker.py`:
- Around line 388-389: Change the broad exception handler in celery_worker.py
from "except BaseException as e:" to "except Exception as e:" so
SystemExit/KeyboardInterrupt can propagate; locate the try/except around the
speaker identity resolution (the block logging "Failed to resolve speaker
identities, skipping: {e}") and update the exception type accordingly, leaving
the logger.error call intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: fa956f9a-b003-4d24-9a4b-ed16d0bf9ea2
📒 Files selected for processing (41)
.github/workflows/docker-hub.yml.github/workflows/meet.ymlenv.d/development/common.distenv.d/development/summary.distsrc/backend/core/api/viewsets.pysrc/backend/core/migrations/0019_recording_external_process_id_alter_recording_status.pysrc/backend/core/models.pysrc/backend/core/recording/event/authentication.pysrc/backend/core/recording/event/notification.pysrc/backend/core/tests/recording/event/test_notification.pysrc/backend/core/tests/recording/test_api_recordings_external_process_hook.pysrc/backend/core/utils.pysrc/backend/locale/de_DE/LC_MESSAGES/django.posrc/backend/locale/en_US/LC_MESSAGES/django.posrc/backend/locale/fr_FR/LC_MESSAGES/django.posrc/backend/locale/nl_NL/LC_MESSAGES/django.posrc/backend/meet/settings.pysrc/helm/env.d/dev-dinum/values.meet.yaml.gotmplsrc/helm/env.d/dev-keycloak/values.meet.yaml.gotmplsrc/summary/summary/api/main.pysrc/summary/summary/api/route/tasks.pysrc/summary/summary/api/route/tasks_v2.pysrc/summary/summary/core/analytics.pysrc/summary/summary/core/celery_worker.pysrc/summary/summary/core/config.pysrc/summary/summary/core/docs_service.pysrc/summary/summary/core/file_service.pysrc/summary/summary/core/locales/de.pysrc/summary/summary/core/locales/en.pysrc/summary/summary/core/locales/fr.pysrc/summary/summary/core/locales/nl.pysrc/summary/summary/core/locales/strings.pysrc/summary/summary/core/models.pysrc/summary/summary/core/shared_models.pysrc/summary/summary/core/transcript_formatter.pysrc/summary/summary/core/user_assign.pysrc/summary/summary/core/webhook_service.pysrc/summary/summary/main.pysrc/summary/tests/api/test_api_tasks.pysrc/summary/tests/api/test_api_tasks_v2.pysrc/summary/tests/conftest.py
💤 Files with no reviewable changes (5)
- src/summary/tests/api/test_api_tasks.py
- src/summary/summary/api/route/tasks.py
- src/summary/tests/conftest.py
- .github/workflows/meet.yml
- src/summary/summary/core/webhook_service.py
3d30412 to
2389908
Compare
0bbc291 to
7cb536d
Compare
This commit cleans up most of the code related to v1 route that is used only by visio. This is first step before introducing an update to the v2 route.
* Update the v2 create transcribe payload to support publishing directly to docs and performing automatically a summary if requested * Note that title computation is to be handled by the caller now as it makes more sense and avoids throwing a bunch of parameters to the endpoint. * All files are send as signed URL now, we don't read directly from s3 anymore, * Docs integration is now explicit in summary settings * To make analytics work properly accross projects, we use the user sub as distinct id, this will require a new posthog project to be deployed to work properly. We create a post hog event at the creation request processing, to make sure feature flags work properly after that. * User email should now be provided to the API, it's not mandatory to avoid a breaking change. * Use a specific user agent for better tracking Finally note that existing helpers don't always make use of the pydantic models, so that's why there are model_dump in some places. To avoid bigger changes.
This commit introduces the compatibility with summary v2. It tecnically doesn't break the compatibility with v1 as v1 params are still sent. But we advise people using the transcribe feature in their own deployments to adapt to the new v2 API, as this compatibility will be removed in a future major version. * RecordingStatusChoices now has EXTERNAL_PROCESS_SUCCESSFUL & EXTERNAL_PROCESS_FAILED Values, which are changed by a new webhook that can be called by the transcribe service. This webhook is protected by its own bearer token. * Title for the document is computed in visio, * Tests are added / updated accordingly
7cb536d to
ea71fdf
Compare
|
| data = request.data | ||
| if not data.get("job_id"): | ||
| raise drf_exceptions.ValidationError(detail="No job_id provided") | ||
|
|
||
| job_id = data["job_id"] |
There was a problem hiding this comment.
wip: use a serializer? with a clear validation of what format should a job_id be
There was a problem hiding this comment.
I think we can be relaxed here, to avoid self holster being forced to used a specific uuid format, etc.
| external_process_id = models.CharField( | ||
| max_length=255, | ||
| null=True, | ||
| blank=True, | ||
| unique=True, | ||
| verbose_name=_("External Process ID"), | ||
| help_text=_("ID of the external process associated with the recording."), | ||
| ) |
There was a problem hiding this comment.
wip: CharField seems very open right now. Is that by design, or should we enforce a UUID format?
There was a problem hiding this comment.
I think we can be relaxed here, to avoid self holster being forced to used a specific uuid format, etc.
| # Docs service configuration | ||
| is_docs_integration_enabled: bool = False | ||
| docs_base_url: Url = "https://example.com" | ||
| docs_server_to_server_api_key: SecretStr = Field( | ||
| title="API key for using docs server to server api", default="NO_API_KEY" | ||
| ) |
There was a problem hiding this comment.
WIP: Docs is probably vague for external contributors; I would suggest being more precise, LaSuite Docs ?
There was a problem hiding this comment.
Yeah we should find something better.
| def validate_docs_config(self): | ||
| """Validate docs integration configuration.""" | ||
| if self.is_docs_integration_enabled: | ||
| if not self.docs_base_url: | ||
| raise ValueError( | ||
| "docs_base_url is required when docs integration is enabled" | ||
| ) | ||
| if self.docs_server_to_server_api_key.get_secret_value() in { | ||
| "", | ||
| "NO_API_KEY", | ||
| }: | ||
| raise ValueError( | ||
| "Valid docs_server_to_server_api_key is required when" | ||
| " docs integration is enabled" | ||
| ) | ||
| return self |
There was a problem hiding this comment.
nit: early return would save one nesting level
if not self.is_docs_integration_enabled:
return self
| webhook_backoff_factor: float = 0.1 | ||
|
|
||
| # Locale | ||
| default_context_language: Literal["de", "en", "fr", "nl"] = "fr" |
There was a problem hiding this comment.
Default should be handled at the caller level from my pov
| SUMMARY_SERVICE_API_TOKEN = SecretFileValue( | ||
| None, environ_name="SUMMARY_SERVICE_API_TOKEN", environ_prefix=None | ||
| ) | ||
| SUMMARY_SERVICE_WEBHOOK_API_TOKEN = SecretFileValue( | ||
| None, environ_name="SUMMARY_SERVICE_WEBHOOK_API_TOKEN", environ_prefix=None | ||
| ) |
There was a problem hiding this comment.
WIP: open question, since these two components already share a secret, would it make sense to reuse the same one here for simplicity? My intuition is that it’s probably not a good idea, as it would introduce tighter coupling and reduce separation of concerns.
But it would make configuration easier
There was a problem hiding this comment.
I agree with ease of configuration. However that wouldn't be ideal.
Maybe this could be adapted with a single SUMMARY_SERVICE_TOKEN, that we use to sign the payload with on the summary side, and on visio side we can check that signed payload is signed with something we know 🤷 Not sure though and would require changes in Transcripts too.
| def generate_download_file_url(file, *, expires_in: int, override_domain: bool = True): | ||
| """ | ||
| Generate a S3 signed download url for a given file. | ||
| """ | ||
|
|
||
| key = file.file_key | ||
| return generate_download_s3_file_url( | ||
| key, expires_in=expires_in, override_domain=override_domain | ||
| ) | ||
|
|
||
|
|
||
| def generate_download_s3_file_url( | ||
| key: str, *, expires_in: int, override_domain: bool = True | ||
| ): | ||
| """ | ||
| Generate a S3 signed download url for a given key. | ||
| """ | ||
| if not key: | ||
| raise ValueError("key cannot be empty") |
There was a problem hiding this comment.
wip: naming generate_download_s3_file_url and generate_download_file_url are close, and we don't get easily what's the difference between them.
There was a problem hiding this comment.
yeah, I need to delete one of those;
| locale = ( | ||
| locale | ||
| if locale in {"de-de", "en-us", "fr-fr", "nl-nl"} | ||
| else settings.LANGUAGES[0][0] | ||
| ) | ||
| default_template_by_locale = { | ||
| "de-de": "Transkription", | ||
| "en-us": "Transcription", | ||
| "fr-fr": "Transcription", | ||
| "nl-nl": "Transcriptie", | ||
| } | ||
| if recording_datetime is None: | ||
| return default_template_by_locale.get( | ||
| locale, default_template_by_locale["en-us"] | ||
| ) | ||
|
|
||
| template_by_locale = { | ||
| "de-de": 'Besprechung "{room}" am {room_recording_date} um {room_recording_time}', | ||
| "en-us": 'Meeting "{room}" on {room_recording_date} at {room_recording_time}', | ||
| "fr-fr": 'Réunion "{room}" du {room_recording_date} à {room_recording_time}', | ||
| "nl-nl": 'Vergadering "{room}" op {room_recording_date} om {room_recording_time}', | ||
| } |
There was a problem hiding this comment.
wip: can't we rely on Django i18n for these parts? to avoid these manipulations
There was a problem hiding this comment.
True, maybe, I am not familiar enough with that. I'll check.
| "language": recording.options.get("language"), | ||
| "owner_timezone": str(owner_access.user.timezone), | ||
| "download_link": f"{get_recording_download_base_url()}/{recording.id}", | ||
| "context_language": owner_access.user.language, |
There was a problem hiding this comment.
WIP: Removing these params is inherently a breaking change, don't you think?
There was a problem hiding this comment.
They are moved in the "V2" part, so not breaking from my pov.
There was a problem hiding this comment.
(they are shared between v2 and v1)
| class PushToDocsBaseConfig(BaseModel): | ||
| """Model containing information for pushing transcript and summaries to docs.""" | ||
|
|
||
| user_email: str = Field( |
There was a problem hiding this comment.
should it be a string?
There was a problem hiding this comment.
What else are you thinking ? (like a proper email check?)
| class PushToDocsTranscriptConfig(PushToDocsBaseConfig): | ||
| """Model for push to docs information for transcripts.""" | ||
|
|
||
| download_link: str | None = Field( |
There was a problem hiding this comment.
same should it be url?
| title="Download Link", | ||
| description="The link to download the recording.", | ||
| ) | ||
| form_link: str | None = Field( |
| is_v2_implementation = False | ||
| try: | ||
| response_json = response.json() | ||
| # We do not require a job_id to avoid a breaking change | ||
| if job_id := response_json.get("job_id"): | ||
| recording.external_process_id = job_id | ||
| recording.save() | ||
| is_v2_implementation = True | ||
| except json.JSONDecodeError: |
There was a problem hiding this comment.
Should we make it explicit via Django settings whether we use v1 or v2?
Right now, checking for the presence of job_id to infer the version feels a bit weak, it’s implicit that v1 has no job_id, but not very clear.
I think we’d benefit from an explicit setting-based switch, calling either a dedicated v1 or v2 method. That would make the logic easier to follow and avoid sending requests with both v1 and v2 params mixed together.
It would also make the v1 deprecation path cleaner: we could simply remove/vendor the v1 method instead of refactoring this one later.
There was a problem hiding this comment.
True I'll like that idea
| raise AuthenticationFailed("Webhook authentication is not configured.") | ||
|
|
||
| auth_header: str = request.headers.get("Authorization") or "" | ||
| if not auth_header.startswith("Bearer "): |
There was a problem hiding this comment.
RFC 7235 says the scheme is case-insensitive
| if not required_token: | ||
| raise AuthenticationFailed("Webhook authentication is not configured.") | ||
|
|
||
| auth_header: str = request.headers.get("Authorization") or "" |
There was a problem hiding this comment.
wip: probably, use the class attribute, AUTH_HEADER and TOKEN_TYPE?
| request.META.get("REMOTE_ADDR"), | ||
| ) | ||
| raise AuthenticationFailed("Invalid authorization header format.") | ||
| token = auth_header[7:] # len("Bearer ") == 7 |
There was a problem hiding this comment.
it doesn't seem robust to me,
Could smth like this be appropriate :
scheme, _, token = auth_header.partition(" ")
if scheme.lower() != "bearer" or not token.strip():
raise AuthenticationFailed("Invalid authorization header format.")
token = token.strip()
| return f"{self.TOKEN_TYPE} realm='Storage event API'" | ||
|
|
||
|
|
||
| class RecordingProcessWebhookAuthentication(BaseAuthentication): |
There was a problem hiding this comment.
Probably we should follow the implementation from https://github.com/suitenumerique/docs/blob/3fe8bd5cc0aa250e2513297cdfda34f219b180af/src/backend/core/authentication/__init__.py#L9, and maybe backport it to the django-lasuite library
| "allowed_push_to_docs": false | ||
| }, | ||
| { | ||
| "id": "visio", |
There was a problem hiding this comment.
| "id": "visio", | |
| "id": "meet", |
| class PushToDocsBaseConfig(BaseModel): | ||
| """Model containing information for pushing transcript and summaries to docs.""" | ||
|
|
||
| user_email: str = Field( |
| recording = None | ||
|
|
||
| changed = False | ||
| if recording and data.get("type") == "transcript": |
There was a problem hiding this comment.
what are the other possible types?
| response = requests.post( | ||
| tenant.webhook_url, | ||
| json=payload.model_dump(), | ||
| headers={ | ||
| "Authorization": f"Bearer {tenant.webhook_api_key.get_secret_value()}", | ||
| "User-Agent": settings.app_external_user_agent, | ||
| }, | ||
| timeout=(10, 20), | ||
| ) | ||
| response.raise_for_status() | ||
|
|
||
| try: | ||
| response_data = response.json() | ||
| document_id = response_data.get("id", "N/A") | ||
| except (json.JSONDecodeError, AttributeError): | ||
| document_id = "Unable to parse response" | ||
| response_data = response.text |
There was a problem hiding this comment.
Do we need a retry mechanism here?
It would be a shame for the transcription to fail entirely just because the final HTTP request failed once. Seems worth making this more resilient.
| # We do it synchronously for now | ||
| if _should_push_to_docs(payload): | ||
| # Format output | ||
| content = format_transcript( | ||
| transcription_res.model_dump(), | ||
| payload.context_language, | ||
| payload.language, | ||
| payload.push_to_docs_config.download_link, | ||
| payload.push_to_docs_config.form_link, | ||
| ) | ||
|
|
||
| create_document_in_docs( | ||
| content=content, | ||
| title=payload.push_to_docs_config.title, | ||
| email=payload.push_to_docs_config.user_email, | ||
| sub=payload.user_sub, | ||
| ) | ||
|
|
||
| if _should_auto_create_summary(payload): | ||
| summary = summarize_transcription_internals( | ||
| distinct_id=payload.user_sub, | ||
| transcript=content, | ||
| session_id=self.request.id, | ||
| ) | ||
| locale = get_locale(payload.context_language, payload.language) | ||
| create_document_in_docs( | ||
| content=summary, | ||
| title=locale.summary_title_template.format( | ||
| title=payload.push_to_docs_config.title | ||
| ), | ||
| email=payload.push_to_docs_config.user_email, | ||
| sub=payload.user_sub, | ||
| ) |
There was a problem hiding this comment.
That’s one of the major changes here: waiting for the summary output synchronously. I don’t think this is the right pattern.
Right now, because of an issue in the Albert API, the summarization job is failing, and I haven’t fixed it yet. With this setup, that means all transcription jobs will fail from a Celery standpoint. To me, that clearly shows that coupling these two tasks is not the right approach.
As soon as we start working again on the summary feature, we need to be able to iterate on it freely without any risk of impacting the transcription pipeline, which is production-ready.
We’re also regressing operationally: with the current design, we lose the ability to scale workers independently. Today, we can dedicate workers to summaries (which involve external API calls and can block), while keeping transcription workers isolated and efficient.
For me, this is a blocking concern for merge.
| payload.user_sub, | ||
| ) | ||
|
|
||
| job_id = self.request.id |
There was a problem hiding this comment.
| task_id = self.request.id |
| ).allowed_push_to_docs: | ||
| logger.info("Push to docs is not allowed for tenant, skipping push to docs.") | ||
| return False | ||
| return True |
There was a problem hiding this comment.
is the info the right level?
I’d go with this approach, since all those if statements really only exist to compute the reason that gets logged. Sharing a common prefix would also make the logs easier to parse when this situation occurs.
| return True | |
| def _should_push_to_docs() -> bool: | |
| """Wip.""" | |
| if not payload.push_to_docs_config: | |
| reason = "not requested in the payload" | |
| elif not settings.is_docs_integration_enabled: | |
| reason = "docs integration is disabled" | |
| elif not settings.get_authorized_tenant( | |
| tenant_id=payload.tenant_id | |
| ).allowed_push_to_docs: | |
| reason = "not allowed for tenant" | |
| else: | |
| return True | |
| logger.info("Skipping push to docs: %s.", reason) | |
| return False |



Overview
This PR migrates the visio transcription integration from the legacy v1 summary API to the streamlined v2 API, cleaning up old code and introducing several improvements to how transcription, summarization, and analytics are handled.
Changes
🔥 Breaking / Cleanup
=> Not considered as a real breaking change as the deployment of both services simultaneously is considered stable, and visio is still compatible with a summary v1 API.
✨ Summary Service — v2 Route Extensions
create transcribepayload now supports publishing directly to Docs and triggering automatic summarizationsubis now used as the PostHogdistinct_idfor consistent cross-project analytics; a PostHog event is emitted at request creation time to ensure feature flags are properly evaluated✨ Visio — v2 Compatibility
RecordingStatusChoiceshas two new values:EXTERNAL_PROCESS_SUCCESSFULandEXTERNAL_PROCESS_FAILED, updated via a new webhook called by the transcribe serviceNotes
model_dump()in places due to incomplete Pydantic model adoption — left as-is to avoid scope creepIntegration tests
TODO :