feat: add optional TwelveLabs Pegasus provider (--provider twelvelabs) for long-video, low-token analysis by mohit-twelvelabs · Pull Request #36 · bradautomates/claude-video

mohit-twelvelabs · 2026-06-22T01:53:49Z

Who I am

Hi! I'm Mohit, and I work @ TwelveLabs. I use /watch and wanted to contribute an integration with our video model as an optional parser.

What TwelveLabs / Pegasus is

TwelveLabs builds video-understanding foundation models. Pegasus is its generative video-language model: you point it at a video and it returns text grounded in the actual pixels and audio. It performs its own ASR and produces timestamped, temporally grounded output with no pre-indexing required.

What this adds

The default frames pipeline is great for short clips and pixel-level inspection, but every frame is an image. That means token cost and context limits can make long videos a sparse, expensive scan.

This PR adds --provider twelvelabs. Instead of extracting frames and a Whisper transcript, it hands the whole video to Pegasus, which analyzes it server-side and returns text: a verbatim, timestamped transcript plus a scene-by-scene visual walkthrough.

Claude then reads a few KB of text instead of 80–100 JPEGs, so there is no per-frame image-token cost, no context-length ceiling, and no Whisper key required because Pegasus does its own ASR.

Highlights

- Default unchanged: --provider defaults to frames; nothing about the existing path changes.
- Long videos: anything over --chunk-minutes is split with ffmpeg -c copy, analyzed per chunk, and merged into one report with absolute-timestamp segment headings.
- Auto chunk resizing: chunk length auto-shrinks and re-splits to stay under TwelveLabs' 200 MB direct-upload cap.
- Pure stdlib: mirrors whisper.py and adds no new pip dependencies.
- New scripts:
  - scripts/twelvelabs.py: REST client for asset upload, async analyze task creation, and polling.
  - scripts/chunk.py: segmenting and trimming utilities.
- Setup support: setup.py scaffolds an optional TWELVELABS_API_KEY, treats it as satisfying preflight, and reports has_twelvelabs_key in --json.
- New flags:
  - --provider
  - --tl-model
  - --tl-prompt
  - --tl-max-tokens
  - --tl-temperature
  - --chunk-minutes
- Safety/edge-case handling: max_tokens is clamped to the model range, sub-4s clips are guarded, and multipart filenames are sanitized.
- Tests: tests/test_provider.py uses stdlib unittest and covers chunk planning, segment offsets, prompt building, result extraction, token clamping, and filename sanitization.
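
For readers who want a feel for the chunking behavior listed above, here is a minimal sketch. It is not the code in scripts/chunk.py: the names `Segment`, `plan_chunks`, `shrink_for_cap`, and `ffmpeg_split_args` are hypothetical, and the halving strategy and the 1-minute floor are assumptions. The 200 MB cap and the use of `ffmpeg -c copy` come from this PR; the size estimate assumes a roughly constant bitrate, whereas the real implementation may measure the chunks it actually produces.

```python
import math
from dataclasses import dataclass

SIZE_CAP_BYTES = 200 * 1024 * 1024  # direct-upload cap stated in this PR


@dataclass
class Segment:
    index: int
    start_s: float   # absolute offset into the source video, in seconds
    length_s: float  # duration of this segment, in seconds


def plan_chunks(duration_s: float, chunk_minutes: float) -> list[Segment]:
    """Split [0, duration_s) into consecutive segments of at most chunk_minutes."""
    chunk_s = chunk_minutes * 60.0
    count = max(1, math.ceil(duration_s / chunk_s))
    return [
        Segment(i, i * chunk_s, min(chunk_s, duration_s - i * chunk_s))
        for i in range(count)
    ]


def shrink_for_cap(file_bytes: int, duration_s: float, chunk_minutes: float,
                   cap_bytes: int = SIZE_CAP_BYTES) -> float:
    """Halve the chunk length until the estimated chunk size fits under the cap.

    Assumption: size scales linearly with duration (constant bitrate).
    The 1-minute floor is hypothetical and only prevents an endless loop.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    bytes_per_s = file_bytes / duration_s
    while chunk_minutes * 60.0 * bytes_per_s > cap_bytes and chunk_minutes > 1.0:
        chunk_minutes /= 2.0
    return chunk_minutes


def ffmpeg_split_args(src: str, seg: Segment, out: str) -> list[str]:
    """Build an ffmpeg command that cuts one segment without re-encoding."""
    return ["ffmpeg", "-y", "-ss", f"{seg.start_s:.3f}", "-i", src,
            "-t", f"{seg.length_s:.3f}", "-c", "copy", out]
```

With these assumptions, a 75-minute video at the default 30-minute chunk length yields three segments starting at 0, 1800, and 3600 seconds, and a 1 GB, one-hour file would have its chunk length halved from 30 to 7.5 minutes before splitting.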

Testing

Verified end-to-end against the live TwelveLabs API on both:

- single-clip videos
- chunked videos with multi-segment analysis and merged absolute offsets

Also verified that the default frames path is unchanged.

All unit tests pass.
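
To illustrate what "merged absolute offsets" means in the chunked case, here is a small sketch of stitching per-chunk text into one report. The function names `hms` and `merge_reports` and the exact heading format are hypothetical; the PR only states that segment headings carry absolute timestamps.

```python
def hms(seconds: float) -> str:
    """Format a second count as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def merge_reports(chunks: list[tuple[float, float, str]]) -> str:
    """Join per-chunk analysis text under headings with absolute time ranges.

    Each item is (start_s, length_s, text), where start_s is the chunk's
    offset into the original video rather than into the chunk file.
    """
    parts = []
    for i, (start_s, length_s, text) in enumerate(chunks, 1):
        heading = f"## Segment {i} ({hms(start_s)} - {hms(start_s + length_s)})"
        parts.append(f"{heading}\n\n{text.strip()}")
    return "\n\n".join(parts)
```

Under this sketch, the second chunk of a 45-minute video split at 30 minutes would be headed `## Segment 2 (00:30:00 - 00:45:00)`, so a reader can jump to the right place in the source file.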

When to use this provider

This is useful for long videos, tight token budgets, summary/Q&A workflows, and transcript-heavy use cases.

The existing frames provider remains best for short clips and exact visual detail. This is an alternative, not a replacement.

Requirements

Requires only a TWELVELABS_API_KEY. A free tier is available.

Thanks for building /watch

Add an opt-in parser backend that hands the video to TwelveLabs' Pegasus video-language model for on-the-fly analysis instead of extracting frames and a Whisper transcript. Pegasus analyzes the video server-side (pixels + its own audio ASR) and returns a verbatim, timestamped transcript plus a scene-by-scene visual walkthrough as TEXT, so Claude reads a few KB instead of 80-100 frame images. This removes the per-frame image-token cost and the context-length ceiling that make long videos expensive in the default frames mode, and needs no Whisper key.

- Default provider stays `frames`; existing behavior is unchanged.
- Videos longer than --chunk-minutes (default 30) are split with `ffmpeg -c copy`, analyzed per-chunk, and merged into one report with absolute-timestamp segment headings. Chunk length auto-shrinks and re-splits to stay under the 200 MB direct-upload cap.
- New scripts/twelvelabs.py (pure-stdlib REST client: asset upload, async analyze task, polling; mirrors whisper.py) and scripts/chunk.py (segmenting + trimming).
- setup.py scaffolds an optional TWELVELABS_API_KEY, treats it as satisfying preflight, and reports has_twelvelabs_key in --json.
- New flags: --provider, --tl-model, --tl-prompt, --tl-max-tokens, --tl-temperature, --chunk-minutes. max_tokens is clamped to the model range; sub-4s clips and the multipart filename are guarded.
- tests/test_provider.py covers chunk planning, segment offsets, prompt building, result extraction, max_tokens clamping, and filename sanitization.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
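
The guards mentioned in the commit message could look roughly like the sketch below. This is an illustration under assumptions, not the code in scripts/twelvelabs.py: the function names are hypothetical, the token bounds are passed in because the actual model range is not stated here, the allowed filename character set is a guess, and raising `ValueError` for short clips is only one way a sub-4s clip might be handled.

```python
import re

MIN_CLIP_SECONDS = 4.0  # the PR says sub-4s clips are guarded


def clamp_max_tokens(requested: int, lo: int, hi: int) -> int:
    """Force a requested max_tokens value into the inclusive range [lo, hi]."""
    return max(lo, min(hi, requested))


def sanitize_filename(name: str, fallback: str = "video.mp4") -> str:
    """Reduce a path to a conservative name for a multipart upload field.

    Keeps only the last path component and replaces anything outside
    letters, digits, dot, underscore, and hyphen, so quotes or newlines
    cannot leak into the Content-Disposition header.
    """
    base = name.replace("\\", "/").rsplit("/", 1)[-1]
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", base).strip("._")
    return safe or fallback


def check_clip_length(duration_s: float) -> None:
    """Reject clips shorter than the minimum (hypothetical handling)."""
    if duration_s < MIN_CLIP_SECONDS:
        raise ValueError(f"clip is {duration_s:.1f}s; need at least {MIN_CLIP_SECONDS:.0f}s")
```

For example, with hypothetical bounds of 1 and 4096, a request for 99999 tokens would be reduced to 4096, and a path such as `/tmp/my clip.mp4` would be uploaded as `my_clip.mp4`.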

mohit-twelvelabs wants to merge 1 commit into bradautomates:main from mohit-twelvelabs:feat/twelvelabs-pegasus-provider
