feat: add opt-in TwelveLabs video modality (Pegasus + Marengo) #301

Open
mohit-twelvelabs wants to merge 1 commit into HKUDS:main from mohit-twelvelabs:feat/twelvelabs-integration

@mohit-twelvelabs

Hi! I'm Mohit, and I work at TwelveLabs (@mohit-twelvelabs).

Description

This PR adds an opt-in video modality to RAG-Anything, backed by the TwelveLabs platform. It lets RAG-Anything ingest and retrieve video the same way it already handles images, tables and equations.

  • Pegasus (analyze) generates a detailed transcript/description of a video. That text flows through the existing knowledge-graph + chunk pipeline via BaseModalProcessor._create_entity_and_chunk — no new retrieval path, no special-casing downstream.
  • Marengo (embed) produces a 512-dim multimodal embedding, returned on the item's entity_info (tl_video_embedding) for semantic video retrieval. A companion embed_text() lets you score a text query in the same 512-dim space.
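
To illustrate what scoring a text query "in the same 512-dim space" could look like, here is a minimal sketch that ranks stored video embeddings against a query embedding by cosine similarity. The helper names (`cosine_similarity`, `rank_videos`) and the choice of cosine similarity are assumptions for illustration, not part of this PR; the vectors stand in for values that would come from `tl_video_embedding` and `embed_text()`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_videos(
    query_vec: Sequence[float],
    video_vecs: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Hypothetical helper: sort video ids by similarity to the query, best first."""
    scored = [
        (video_id, cosine_similarity(query_vec, vec))
        for video_id, vec in video_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In practice `query_vec` would be the result of `embed_text()` and `video_vecs` would map item ids to their stored `tl_video_embedding` values; the dimension check guards against mixing embeddings from different models.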

Why it helps this project

RAG-Anything is explicitly multimodal, but video isn't yet a first-class modality. Pegasus + Marengo give a turnkey way to bring video understanding and video↔text semantic search into the existing graph/RAG pipeline, with no extra infra on the user's side.

Changes Made

  • raganything/twelvelabs.py: new TwelveLabsModalProcessor (mirrors GenericModalProcessor). Accepts video_url, video_path (uploaded as a TwelveLabs asset), or video_id.
  • config.py: enable_video_processing flag (env ENABLE_VIDEO_PROCESSING), default False.
  • raganything.py: registers the video processor only when the flag is enabled, the twelvelabs package is installed, and an API key is present; otherwise it logs a warning and skips registration, so initialization never breaks.
  • utils.py: dispatch routes "video" to the processor and falls back to the generic processor when no video processor is registered, so it never raises KeyError.
  • __init__.py: optional TwelveLabsModalProcessor export (same try/except pattern as other optional features).
  • pyproject.toml / setup.py: new [video] extra (twelvelabs>=1.2.8), also folded into [all].
  • env.example, requirements.txt, README.md: docs + config entries.
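
The guarded registration and the dispatch fallback described above can be sketched roughly as follows. This is a simplified stand-in, not the code in this PR: `env_flag`, `video_available`, `build_registry`, and `dispatch` are hypothetical names, the accepted truthy strings are an assumption, and reading the key from `TWELVELABS_API_KEY` is assumed from the test gating mentioned later in this description.

```python
import importlib.util
import logging
from typing import Callable, Dict, Mapping

logger = logging.getLogger(__name__)

Processor = Callable[[dict], str]


def env_flag(name: str, env: Mapping[str, str], default: bool = False) -> bool:
    """Parse a boolean flag; the accepted truthy strings are an assumption."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def video_available(env: Mapping[str, str], package: str = "twelvelabs") -> bool:
    """True only if the flag is set, the package is importable, and a key exists."""
    if not env_flag("ENABLE_VIDEO_PROCESSING", env):
        return False  # disabled by default: stay silent, change nothing
    if importlib.util.find_spec(package) is None:
        logger.warning("video enabled but %r is not installed; skipping", package)
        return False
    if not env.get("TWELVELABS_API_KEY"):
        logger.warning("video enabled but TWELVELABS_API_KEY is missing; skipping")
        return False
    return True


def build_registry(
    env: Mapping[str, str],
    generic: Processor,
    video_factory: Callable[[], Processor],
    package: str = "twelvelabs",
) -> Dict[str, Processor]:
    """Always register the generic processor; add "video" only when usable."""
    registry: Dict[str, Processor] = {"generic": generic}
    if video_available(env, package):
        registry["video"] = video_factory()
    return registry


def dispatch(registry: Mapping[str, Processor], content_type: str) -> Processor:
    """Route by content type, falling back to generic instead of raising KeyError."""
    return registry.get(content_type, registry["generic"])
```

Passing the environment in as a mapping (rather than reading `os.environ` directly) keeps the sketch easy to test; a caller would typically pass `os.environ`.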

Opt-in / non-breaking

Disabled by default. With ENABLE_VIDEO_PROCESSING unset, there are zero behavioural changes and no new hard dependency (twelvelabs only installs via the [video]/[all] extra).

How it was tested

tests/test_twelvelabs_integration.py:

  • No-network unit tests: config default, dispatch routing/fallback, and Pegasus/Marengo request wiring against a mocked TwelveLabs client.
  • A live smoke test (gated on TWELVELABS_API_KEY, skipped in CI without it) asserting a real Marengo text embedding is 512-dim — verified locally (9 passed).
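
For readers unfamiliar with this split, the sketch below shows the general shape of a no-network wiring test against a mocked client plus a test class gated on `TWELVELABS_API_KEY`. It uses only the standard library and is not the contents of tests/test_twelvelabs_integration.py: the `embed_text` wrapper and the `client.embed_text` method are hypothetical stand-ins, not the real SDK surface.

```python
import os
import unittest
from unittest import mock

EXPECTED_DIM = 512  # embedding size stated in this PR


def embed_text(client, text):
    """Hypothetical wrapper; `client.embed_text` is a stand-in, not the real SDK call."""
    vector = list(client.embed_text(text))
    if len(vector) != EXPECTED_DIM:
        raise ValueError(f"expected {EXPECTED_DIM} dims, got {len(vector)}")
    return vector


class WiringTests(unittest.TestCase):
    """No-network tests: the client is a Mock, so only the wiring is checked."""

    def test_forwards_query_and_checks_dimension(self):
        client = mock.Mock()
        client.embed_text.return_value = [0.0] * EXPECTED_DIM
        self.assertEqual(len(embed_text(client, "a dog")), EXPECTED_DIM)
        client.embed_text.assert_called_once_with("a dog")

    def test_rejects_wrong_dimension(self):
        client = mock.Mock()
        client.embed_text.return_value = [0.0, 0.0, 0.0]
        with self.assertRaises(ValueError):
            embed_text(client, "a dog")


@unittest.skipUnless(os.environ.get("TWELVELABS_API_KEY"), "TWELVELABS_API_KEY not set")
class LiveSmokeTest(unittest.TestCase):
    """Runs only when a key is present; a real client would be built here."""

    def test_live_text_embedding_dimension(self):
        self.skipTest("placeholder: construct a real client and call embed_text")
```

Running `python -m unittest` on a file containing this sketch executes the two wiring tests everywhere and reports the live test as skipped.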

Ran the repo's ruff format --check and ruff check --ignore=E402 on all changed files — clean.

Note: full Pegasus/Marengo video runs are server-side and can be slow (Marengo video embedding is an async, polled task). The video paths are therefore verified only at the wiring level, against the SDK contract with a mocked client; the synchronous Marengo text-embedding path is verified end-to-end against the live API.
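
As a rough picture of what "async, polled task" means for a caller, here is a generic polling loop with a timeout. It is a sketch under stated assumptions: `wait_for_task` and `fetch_status` are hypothetical, and the status strings ("ready", "failed") are placeholders rather than the statuses the TwelveLabs API actually returns.

```python
import time
from typing import Callable, Dict, Sequence


def wait_for_task(
    fetch_status: Callable[[], Dict[str, str]],
    *,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    done_states: Sequence[str] = ("ready",),    # placeholder status names
    failed_states: Sequence[str] = ("failed",),  # placeholder status names
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, str]:
    """Poll `fetch_status` until the task finishes, fails, or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        task = fetch_status()
        status = task.get("status")
        if status in done_states:
            return task
        if status in failed_states:
            raise RuntimeError(f"task failed with status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"task still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

Injecting `sleep` and `clock` lets a unit test drive the loop without real waiting, which fits the no-network testing approach described above.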

Checklist

  • Changes tested locally
  • Documentation updated
  • Unit tests added

Additional Notes

You can grab a free API key at https://twelvelabs.io — there's a generous free tier.

Add a video modality backed by the TwelveLabs platform:
- Pegasus (analyze) generates a transcript/description that flows through the
  existing knowledge-graph + chunk pipeline like every other modality.
- Marengo (embed) produces a 512-dim multimodal embedding returned on the
  item's entity_info for semantic video retrieval (with embed_text() for
  text queries in the same space).

TwelveLabsModalProcessor mirrors GenericModalProcessor and reuses
BaseModalProcessor._create_entity_and_chunk. Registered as 'video' only when
ENABLE_VIDEO_PROCESSING is set and the 'twelvelabs' package is installed
(new [video] extra); disabled by default, so existing behaviour is unchanged.

Adds config flag, dispatch routing, optional package export, env.example
entries, README docs, and tests (no-network wiring unit tests + a live
Marengo 512-dim smoke test gated on TWELVELABS_API_KEY).