feat: add opt-in TwelveLabs video modality (Pegasus + Marengo) #301
Open
mohit-twelvelabs wants to merge 1 commit into
Add a video modality backed by the TwelveLabs platform:
- Pegasus (analyze) generates a transcript/description that flows through the existing knowledge-graph + chunk pipeline like every other modality.
- Marengo (embed) produces a 512-dim multimodal embedding returned on the item's entity_info for semantic video retrieval (with embed_text() for text queries in the same space).

TwelveLabsModalProcessor mirrors GenericModalProcessor and reuses BaseModalProcessor._create_entity_and_chunk. Registered as 'video' only when ENABLE_VIDEO_PROCESSING is set and the 'twelvelabs' package is installed (new [video] extra); disabled by default, so existing behaviour is unchanged.

Adds config flag, dispatch routing, optional package export, env.example entries, README docs, and tests (no-network wiring unit tests + a live Marengo 512-dim smoke test gated on TWELVELABS_API_KEY).
Hi! I'm Mohit, I work at TwelveLabs (@mohit-twelvelabs).
Description
This PR adds an opt-in `video` modality to RAG-Anything, backed by the TwelveLabs platform. It lets RAG-Anything ingest and retrieve video the same way it already handles images, tables and equations.

- Pegasus (`analyze`) generates a detailed transcript/description of a video. That text flows through the existing knowledge-graph + chunk pipeline via `BaseModalProcessor._create_entity_and_chunk`, with no new retrieval path and no special-casing downstream.
- Marengo (`embed`) produces a 512-dim multimodal embedding, returned on the item's `entity_info` (`tl_video_embedding`) for semantic video retrieval. A companion `embed_text()` lets you score a text query in the same 512-dim space.

Why it helps this project
RAG-Anything is explicitly multimodal, but video isn't yet a first-class modality. Pegasus + Marengo give a turnkey way to bring video understanding and video↔text semantic search into the existing graph/RAG pipeline, with no extra infra on the user's side.
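To make the video↔text search idea concrete, here is a minimal sketch of scoring a text-query embedding against stored video embeddings. It assumes cosine similarity as the metric, which this PR does not state, and it uses toy 512-dim vectors in place of real `embed_text()` output and `tl_video_embedding` values. The helper names `cosine_similarity` and `rank_videos` are hypothetical and not part of the PR.

```python
from __future__ import annotations

import math
from typing import Sequence

EMBED_DIM = 512  # embedding size stated in this PR for Marengo


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_videos(
    query_vec: Sequence[float],
    video_vecs: dict[str, Sequence[float]],
) -> list[tuple[str, float]]:
    """Return (video key, score) pairs sorted from most to least similar."""
    scored = [(key, cosine_similarity(query_vec, vec)) for key, vec in video_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy vectors only; real values would come from the processor.
    query = [1.0] + [0.0] * (EMBED_DIM - 1)
    videos = {
        "clip_a": [1.0] + [0.0] * (EMBED_DIM - 1),
        "clip_b": [0.0, 1.0] + [0.0] * (EMBED_DIM - 2),
    }
    for key, score in rank_videos(query, videos):
        print(key, round(score, 3))
```

Because the query and the videos live in the same embedding space, no extra projection step is needed before comparing them. Whether cosine similarity is the right metric for Marengo embeddings should be checked against the TwelveLabs documentation.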
Changes Made
- `raganything/twelvelabs.py`: new `TwelveLabsModalProcessor` (mirrors `GenericModalProcessor`). Accepts `video_url`, `video_path` (uploaded as a TwelveLabs asset), or `video_id`.
- `config.py`: `enable_video_processing` flag (env `ENABLE_VIDEO_PROCESSING`), default `False`.
- `raganything.py`: registers the `video` processor only when the flag is enabled, `twelvelabs` is installed, and a key is present; otherwise it logs a warning and skips, so initialization never breaks.
- `utils.py`: dispatch routes `"video"` to the processor (falls back to `generic` when unregistered, never raises `KeyError`).
- `__init__.py`: optional `TwelveLabsModalProcessor` export (same try/except pattern as other optional features).
- `pyproject.toml` / `setup.py`: new `[video]` extra (`twelvelabs>=1.2.8`), also folded into `[all]`.
- `env.example`, `requirements.txt`, `README.md`: docs + config entries.

Opt-in / non-breaking
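The opt-in registration and the dispatch fallback listed under Changes Made might be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `build_processors`, `dispatch`, the stand-in processor functions, and the set of accepted truthy flag values are all hypothetical. Only the `ENABLE_VIDEO_PROCESSING` and `TWELVELABS_API_KEY` names, the `twelvelabs` package name, and the three-condition gate come from the description.

```python
import importlib.util
import logging
from typing import Callable, Dict, Mapping, Optional

logger = logging.getLogger(__name__)

Processor = Callable[[dict], str]


def generic_processor(item: dict) -> str:
    # Hypothetical stand-in for the generic modal processor.
    return f"generic:{item.get('type')}"


def video_processor(item: dict) -> str:
    # Hypothetical stand-in for the TwelveLabs-backed processor.
    return f"video:{item.get('video_url')}"


def _env_flag(env: Mapping[str, str], name: str) -> bool:
    # Assumption: these strings count as "enabled"; the real parsing may differ.
    return env.get(name, "").strip().lower() in {"1", "true", "yes", "on"}


def build_processors(
    env: Mapping[str, str],
    sdk_installed: Optional[bool] = None,
) -> Dict[str, Processor]:
    """Register 'video' only if the flag, the SDK, and an API key are all present."""
    processors: Dict[str, Processor] = {"generic": generic_processor}
    if not _env_flag(env, "ENABLE_VIDEO_PROCESSING"):
        return processors  # disabled by default: nothing changes
    if sdk_installed is None:
        sdk_installed = importlib.util.find_spec("twelvelabs") is not None
    if not sdk_installed:
        logger.warning("Video enabled but 'twelvelabs' is not installed; skipping.")
        return processors
    if not env.get("TWELVELABS_API_KEY"):
        logger.warning("Video enabled but TWELVELABS_API_KEY is not set; skipping.")
        return processors
    processors["video"] = video_processor
    return processors


def dispatch(processors: Mapping[str, Processor], item: dict) -> str:
    """Route by item type, falling back to 'generic' instead of raising KeyError."""
    handler = processors.get(item.get("type", "generic"), processors["generic"])
    return handler(item)


if __name__ == "__main__":
    item = {"type": "video", "video_url": "https://example.com/clip.mp4"}
    print(dispatch(build_processors({}), item))  # handled by the generic fallback
```

Passing the environment in as a mapping, rather than reading `os.environ` directly, keeps the gate easy to exercise in no-network unit tests.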
Disabled by default. With `ENABLE_VIDEO_PROCESSING` unset, there are zero behavioural changes and no new hard dependency (`twelvelabs` only installs via the `[video]` / `[all]` extra).

How it was tested
`tests/test_twelvelabs_integration.py`:

- No-network wiring unit tests.
- A live smoke test (gated on `TWELVELABS_API_KEY`, skipped in CI without it) asserting a real Marengo text embedding is 512-dim; verified locally (9 passed).
Ran the repo's `ruff format --check` and `ruff check --ignore=E402` on all changed files: clean.

Checklist
Additional Notes
You can grab a free API key at https://twelvelabs.io — there's a generous free tier.