moviendome/Talkforge

talkforge

Short AI-narrated promo videos with a lip-synced avatar of your face,
your cloned voice, and custom data-viz scenes.


💸 New to Runpod? Sign up via this referral link and get a random credit bonus between $5 and $500 — enough to render anywhere from ~30 to ~3,000 talkforge promos before paying a cent. Runpod is the only paid component of this pipeline (~$0.10–$0.20 per typical 45-second video, measured on the RTX-4090-class GPUs we use).

Driven end-to-end by Claude Code through the /marketing-video skill — you point Claude at a product URL, it does the research, drafts the script, designs the scenes, runs the pipeline, and hands back an mp4. The manual setup below is for the underlying tools; day-to-day you don't run any of it by hand.

talkforge is the pipeline; /marketing-video is the entry point. The skill (in ~/.claude/skills/marketing-video/, part of the solana.new toolkit) handles the research, questioning, and script drafting; talkforge handles the narration synthesis, lip-sync, Remotion render, and final mux.

The output is a single self-contained .mp4. Re-rendering after a text or asset tweak typically costs ~$0.10–$0.20 in cloud GPU and ~7 minutes wall time on a modest VPS, thanks to a content-hash cache that skips anything unchanged.

Where to start?

  • First time? Drop your user files → install the subprojects → render once manually to verify the rig.
  • Driving a single video? From Claude Code: /marketing-video <product url> — the skill handles research, drafting, render, and delivery.
  • Producing a series? Also bootstrap the Claude Project (one-time, ~10 min) — your ideation surface that drafts briefs to feed Claude Code.

See it end-to-end

The pipeline turns one selfie into a talking-head promo with your cloned voice + custom data-viz scenes:


1. Your selfie
A normal headshot — phone camera is fine.

2. AI-generated podcast scene
Cinematic studio shot generated from your selfie via ChatGPT image gen (prompt below). This becomes portrait.jpg.

3. Talkforge output
InfiniteTalk lip-syncs your cloned voice (OpenVoice v2) onto the portrait; Remotion covers the avatar with data-viz scenes between the on-camera bookends.

ChatGPT image-gen prompt used for step 2:

Use the character and clothing from the attached reference image. Create a realistic cinematic podcast studio scene with the same man seated at a modern wooden desk, speaking into a large professional black studio microphone mounted on a boom arm. Preserve his facial identity, hairstyle, light stubble, plain black crew-neck shirt, calm attentive expression, and direct eye contact with the camera.

The desk should be clean and uncluttered, with no audio mixer and no headphones. The man's forearms should rest naturally on the desk, with relaxed, believable hand posture, one hand lightly resting over the other.

The environment is a refined modern content-creator studio with dark acoustic wall panels or vertical slatted panels, a warm wooden desk, minimal modern decor, and a premium YouTube podcast aesthetic. Use cyberpunk-inspired but tasteful lighting: moody blue, cyan, purple, and magenta neon accents, soft cinematic key lighting on the face, subtle rim lighting, shallow depth of field, realistic photography, high detail, and a polished professional composition.

Keep the background clean and minimal — avoid clutter, busy shelves, and repeated branding. Place at most one tasteful focal element behind the subject (e.g. a single glowing neon wall sign with your brand's logo or symbol, or one large piece of geometric wall art). No screens, no framed photos, no plaques, no duplicate logos. Subtle decor only: a small plant, a few books, minimal geometric objects. The background should support the subject, never compete with them.

Modern podcast studio, clean desk, realistic hands, cinematic lighting, dark premium atmosphere, single tasteful neon focal element, shallow depth of field, warm wood and black materials, professional YouTube creator setup, 16:9 aspect ratio.

Tip: Where the prompt says "your brand's logo or symbol", swap in whatever fits your product — a neon "$" for a fintech, a stylized neuron for an AI tool, a logomark, even just a colored geometric shape. Leaving it generic gives the model creative latitude; specifying it gets you on-brand. The original generated portrait above used a Solana "S" logo.

Reuse the prompt with your own selfie. The pipeline doesn't care whether your portrait.jpg is a real photo or AI-generated — InfiniteTalk lip-syncs both equally well.

How you actually use this (Claude Code)

You:    /marketing-video Make a promo video for example.com
Claude: → loads the marketing-video skill from
          ~/.claude/skills/marketing-video/
        → screenshots the site with Playwright
        → reads the copy, identifies the hook + key features + CTA
        → asks 2-4 clarifying questions (audience, distribution,
          tone, must-haves) via AskUserQuestion
        → drafts a 45-55s narration in casual dev-to-dev voice,
          structured as 2 on-camera bookends + 4-6 voiceover middle
          segments
        → pastes the draft script back for your approval
You:    Approves (or tweaks)
Claude: → forks the example Remotion project to <product>-promo/
        → builds the data-viz scenes matching the product
        → invokes the talkforge pipeline: generate.py → 3-6 Runpod
          lip-sync calls + 5-6 local still-portrait builds
          (~7 min, ~$0.10-$0.20)
        → retimes scene boundaries to match measured chunk durations
        → renders Remotion + mixes SFX
        → uploads the final mp4 to litterbox + sends you the URL
You:    Watches, says "the stats segment sounds soft, EN-AU + EQ"
Claude: → swaps OpenVoice settings, busts the audio cache for that
          one segment, re-runs (~1 min, $0)
        → uploads the new version

The skill knows about story structure, research, asking the right questions. talkforge (this repo) knows about TTS, lip-sync, scene timing, caching. The skill calls into talkforge as its render backend.

The full operational runbook (recipe, cache invariants, gotchas, when to ask vs proceed) lives in CLAUDE.md. That file is loaded at the start of every Claude Code session and is the source of truth for how the pipeline is orchestrated. Read it if you want to understand the agent's playbook, or edit it to change Claude's behavior on future runs.

What the pipeline does under the hood

┌──────────────────┐    ┌──────────────────────┐    ┌────────────────────┐
│ Script segments  │───►│ TTS (local)          │───►│ Lip-sync (Runpod)  │
│ (claude-drafted) │    │  Kokoro or OpenVoice │    │  InfiniteTalk      │
└──────────────────┘    └──────────────────────┘    └─────────┬──────────┘
                                                              ▼
┌──────────────────┐    ┌──────────────────────┐    ┌────────────────────┐
│ Remotion scenes  │───►│ Remotion render      │◄───│ narration.mp4      │
│ (claude-coded)   │    │  (data viz / B-roll) │    │  (talking head)    │
└──────────────────┘    └──────────────────────┘    └────────────────────┘
                                  │
                                  ▼
                       ┌──────────────────────┐
                       │ ffmpeg mix + SFX     │
                       └─────────┬────────────┘
                                 ▼
                          final mp4

What's interesting about it

  • Per-segment mode switching. Mark each line of your script as either "video" (the viewer sees the avatar's mouth move; pay for lip-sync) or "audio" (voiceover only; the avatar is hidden by your data viz, lip-sync skipped, $0 spent on that segment). A typical bookend video has 2–4 "video" segments and the rest "audio" — most of the cloud-GPU cost is spent on the small portion the viewer actually sees.
  • Two-layer content-hash cache. Edit one sentence, and only that segment re-renders. Swap the portrait, and the audio waveforms cache-hit while only the mp4s rebuild. Edit voice settings, and the audio + mp4 both auto-bust via a worker-script hash. Cache makes iteration genuinely cheap.
  • Pluggable TTS engines. Kokoro for fast/polished/stock voices, OpenVoice v2 for cloning your own voice. Swap with a single config flag.
  • Remotion for the visuals. Code-defined React scenes, deterministic rendering, no editor required. Comes with one example project (Solana Bytes promo) that shows how to wire data viz over the narration.
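
The two-layer cache described above can be illustrated with a minimal sketch. This is not the code in generate.py; the function names (`audio_key`, `video_key`) and the exact set of hashed inputs are assumptions chosen to match the behavior the bullets describe: the audio key depends on text, engine, voice reference, and worker source, and the video key adds the portrait on top of the audio key.

```python
import hashlib


def _digest(*parts: bytes) -> str:
    """Hash the parts with length prefixes so part boundaries stay unambiguous."""
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


def audio_key(text: str, engine: str, voice_ref: bytes, worker_src: bytes) -> str:
    # Layer 1 (hypothetical): everything that changes the synthesized waveform.
    return _digest(text.encode(), engine.encode(), voice_ref, worker_src)


def video_key(audio: str, portrait: bytes, worker_src: bytes) -> str:
    # Layer 2 (hypothetical): the audio key plus the portrait, so a new image
    # rebuilds the mp4 while the audio layer still cache-hits.
    return _digest(audio.encode(), portrait, worker_src)


# Portrait swap: the audio key is reused, only the video key changes.
a = audio_key("Hello.", "openvoice", b"ref-wav-bytes", b"worker-v1")
assert video_key(a, b"portrait-old", b"worker-v1") != video_key(a, b"portrait-new", b"worker-v1")
```

Folding the worker source into both layers is what would make a voice-settings edit invalidate the audio and the mp4 together.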

Project layout

talkforge/
├── InfiniteTalk/         # Narration generator (TTS + Runpod lip-sync orchestration)
│   ├── generate.py       #   ← main script
│   ├── README.md         #   ← cache layers, modes, Runpod deployment
│   ├── voice_reference.wav  # 12-30s sample of the cloned voice
│   ├── portrait.jpg      # the face the avatar uses
│   ├── _work/            # per-chunk intermediates + cache
│   └── out.mp4           # raw narration (avatar video with audio)
│
├── videos/               # Each video lives in its own subdir
│   ├── solana-bytes-promo/   #   ← example Remotion project (committed)
│   │   ├── src/              #     React/TSX scenes
│   │   ├── scripts/          #     SFX mix script
│   │   ├── public/           #     audio SFX, screenshots
│   │   └── out/              #     final mp4 (rendered, gitignored)
│   └── <your-project>/       #   ← your videos go here (auto-gitignored)
│
└── openvoice-test/       # OpenVoice v2 + MeloTTS voice cloning (its own venv)
    ├── synth_worker.py   # long-lived TTS subprocess
    ├── README.md         # setup notes, patches, troubleshooting
    └── venv/             # isolated Python env (~720 MB, not in git)

Required user-provided files

The repo ships as a template, not a working video. Before your first render, drop these unique-to-you files into place. None of them are committed (all gitignored).

| Path | What | Format | How to get it |
| --- | --- | --- | --- |
| InfiniteTalk/portrait.jpg | The face the avatar uses | JPG or PNG (update IMAGE_PATH in generate.py for PNG). Aspect ratio close to OUTPUT_WIDTH × OUTPUT_HEIGHT (default 720×400 ≈ 16:9). Square works too; the pipeline center-crops. | Take a clean front-facing photo. Good lighting, neutral expression, head/shoulders framing. |
| InfiniteTalk/voice_reference.wav | Sample of the voice you want cloned | WAV, mono, 24 kHz, 16-bit, 12-30 seconds. | Read the script in docs/voice-reference-script.md*; it's designed to cover all phonemes the cloning model needs. Record on your phone, then convert to the right format with ffmpeg -i input.m4a -ac 1 -ar 24000 -sample_fmt s16 voice_reference.wav |
| InfiniteTalk/.env | Your Runpod credentials | RUNPOD_API_KEY=<your-key> + RUNPOD_ENDPOINT_ID=<your-endpoint-id> | Copy from InfiniteTalk/.env.example. Get the API key from your Runpod Settings page; get the endpoint ID by deploying the InfiniteTalk Runpod Hub template (1-click, see InfiniteTalk/README.md → Deploying your own Runpod endpoint). New to Runpod? Sign up here (referral link, get a $5-$500 random bonus = ~30-3,000 free videos). Both values are only needed if your script has "video" mode segments. |

* If docs/voice-reference-script.md doesn't exist yet, paste a 25-30 second monologue with varied vowels/consonants — see voice-a-style-guide.md for what the voice should sound like.
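
Before the first render it can help to confirm that voice_reference.wav really matches the format listed for it (mono, 24 kHz, 16-bit, 12-30 seconds). The sketch below is not part of the repo; `check_voice_reference` is a hypothetical helper that uses only Python's standard `wave` module.

```python
import wave


def check_voice_reference(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file matches the spec."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getframerate() != 24000:
            problems.append(f"expected 24000 Hz, got {wav.getframerate()} Hz")
        if wav.getsampwidth() != 2:  # sample width is reported in bytes
            problems.append(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        seconds = wav.getnframes() / wav.getframerate()
        if not 12 <= seconds <= 30:
            problems.append(f"expected 12-30 s, got {seconds:.1f} s")
    return problems
```

Calling `check_voice_reference("InfiniteTalk/voice_reference.wav")` after the ffmpeg conversion should return an empty list; anything else names the property to fix.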

What's already there for you

The repo ships with one example Remotion project wired up to the pipeline, so you can render a sample video on day one without writing React from scratch:

  • videos/solana-bytes-promo/ — the original Solana Bytes account-decoder promo (V1 in the Solana Concepts series). Committed as the canonical reference for the Remotion conventions this pipeline uses.

Your own videos go in videos/<your-project>/ and are automatically gitignored — fork the example, drop screenshots in public/screenshots/, edit the scenes, render. The .gitignore has a videos/* pattern with a !videos/solana-bytes-promo/ exception, so the example stays tracked while your personal videos don't bloat the repo.

Setup

1. InfiniteTalk (narration generator)

cd InfiniteTalk
pip install --user --break-system-packages python-dotenv requests
echo "RUNPOD_API_KEY=<your-key>" > .env
# Drop a portrait at portrait.jpg (square or 16:9 — will be cropped to
# OUTPUT_WIDTH × OUTPUT_HEIGHT, default 720×400)
# Drop a 12-30s voice sample at voice_reference.wav (24 kHz mono recommended)

You'll also need to deploy a Runpod endpoint that speaks the InfiniteTalk API the script expects — see InfiniteTalk/README.md → Deploying your own Runpod endpoint.

Put your endpoint ID in InfiniteTalk/.env as RUNPOD_ENDPOINT_ID=<your-endpoint-id> alongside the API key.
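
Because `echo ... > .env` overwrites the file, it is easy to end up with only one of the two values. As a rough sanity check, assuming a plain KEY=value file, a stdlib-only sketch like the following reports which keys are missing. `missing_env_keys` is a hypothetical helper, not something generate.py provides (the real script loads the file with python-dotenv).

```python
from pathlib import Path

REQUIRED_KEYS = ("RUNPOD_API_KEY", "RUNPOD_ENDPOINT_ID")


def missing_env_keys(env_path: str) -> list[str]:
    """Return required keys that are absent or empty in a KEY=value file."""
    values = {}
    for line in Path(env_path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return [k for k in REQUIRED_KEYS if not values.get(k)]
```

An empty result from `missing_env_keys("InfiniteTalk/.env")` means both credentials are present.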

2. Remotion (visuals)

cd videos/<your-project>
npm install

videos/solana-bytes-promo/ is the included example. Use it as a template for your own project — see Adapting for your project.

3. OpenVoice (voice cloning) — optional

The OpenVoice v2 ecosystem has tight version pins that conflict with modern Python/torch. We isolate it in its own venv with patched requirements — see openvoice-test/README.md for the full recipe (~15 min, includes a 117 MB checkpoint download).

If you don't need voice cloning, set VOICE_ENGINE = "kokoro" in generate.py and skip OpenVoice setup entirely.

4. Kokoro (stock voices) — optional

pip install --user --break-system-packages kokoro
# 10+ stock voices — see KOKORO_VOICE constant in generate.py.
# Fast, polished, not cloneable.

Quick start: running the pipeline manually

Day-to-day, Claude Code orchestrates these steps for you. This section is for: (a) initial bring-up, (b) debugging a stuck render, (c) running the pipeline outside Claude.

Assumes all three subprojects are set up (see Setup above) and you have a RUNPOD_API_KEY for a deployed InfiniteTalk endpoint (see Deploying the Runpod endpoint).

# 1. Generate narration (TTS + lip-sync). ~7 min, ~$0.10-$0.20 in Runpod
#    for a typical bookend video (3-6 visible video chunks + 5-6 voiceover).
cd InfiniteTalk
python3 generate.py
# Produces out.mp4 — the avatar talking.

# 2. Wire that narration into your Remotion composition.
cp out.mp4 ../videos/<your-project>/public/avatar/narration.mp4

# 3. Render the visuals over the narration.
cd ../videos/<your-project>
npx remotion render Master out/master-silent.mp4

# 4. Mix narration + SFX onto the final master (script lives in
#    videos/solana-bytes-promo/scripts/mix-audio-narration.ts as a template).
npx tsx scripts/mix-audio-narration.ts
# Final video: out/master.mp4
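
If you run these four steps often, they can be wrapped in a small driver. The sketch below is one possible wrapper, not a script shipped with the repo: `manual_steps` and `run_pipeline` are hypothetical names, and the commands and paths are copied from the steps above.

```python
import shutil
import subprocess
from pathlib import Path


def manual_steps(repo: Path, project: str) -> list[tuple[Path, list[str]]]:
    """The (cwd, argv) pairs for steps 1, 3 and 4; step 2 is a file copy."""
    proj = repo / "videos" / project
    return [
        (repo / "InfiniteTalk", ["python3", "generate.py"]),
        (proj, ["npx", "remotion", "render", "Master", "out/master-silent.mp4"]),
        (proj, ["npx", "tsx", "scripts/mix-audio-narration.ts"]),
    ]


def run_pipeline(repo: Path, project: str) -> None:
    narration = repo / "videos" / project / "public" / "avatar" / "narration.mp4"
    for i, (cwd, argv) in enumerate(manual_steps(repo, project)):
        subprocess.run(argv, cwd=cwd, check=True)  # stop on the first failure
        if i == 0:
            # Step 2: hand the fresh narration to the Remotion project.
            narration.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy(cwd / "out.mp4", narration)
```

`check=True` makes a failed step raise immediately, so a broken narration run never reaches the Remotion render.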

Common tweaks

Change the narration text

Edit InfiniteTalk/generate.py → SCRIPT_SEGMENTS, a list of (mode, text) tuples:

SCRIPT_SEGMENTS = [
    ("video", "Your opening line (on-camera)."),
    ("audio", "Voiceover line that plays under your data viz."),
    ("audio", "Another voiceover line."),
    ("video", "Your closing line (on-camera)."),
]

Re-run the pipeline. The cache keeps unchanged segments — only modified ones re-render. If you only tweak an "audio" segment, Runpod isn't called at all.
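
To make the last point concrete: whether a re-run touches Runpod depends only on the mode of the segments you changed. The sketch below is an illustration under that assumption; `runpod_needed` is a hypothetical helper and does not reflect how generate.py actually decides.

```python
SCRIPT_SEGMENTS = [
    ("video", "Your opening line (on-camera)."),
    ("audio", "Voiceover line that plays under your data viz."),
    ("audio", "Another voiceover line."),
    ("video", "Your closing line (on-camera)."),
]


def runpod_needed(segments, changed_indices):
    """True if any changed segment is "video" mode and so needs a lip-sync call."""
    return any(segments[i][0] == "video" for i in changed_indices)


# Editing only the two voiceover lines stays local.
assert runpod_needed(SCRIPT_SEGMENTS, {1, 2}) is False
# Editing the on-camera opener triggers lip-sync.
assert runpod_needed(SCRIPT_SEGMENTS, {0}) is True
```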

Swap the avatar

Drop a new portrait at InfiniteTalk/portrait.jpg. Re-run; the cache key includes the image hash, so all mp4 chunks rebuild (3-6 Runpod calls for the visible avatar segments + local still-video rebuilds for voiceover). The approved audio waveforms cache-hit, so no voice re-roll.

Re-roll one segment

rm InfiniteTalk/_work/job_005_audio.key   # bust audio cache for that seg
python3 InfiniteTalk/generate.py
# Only that one segment re-synthesizes. Useful when TTS produces a bad
# take for a single segment but the rest sound right.
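
The same cache bust can be done from Python, which is handy when scripting several re-rolls. This is a sketch only: `bust_audio_cache` is a hypothetical helper, and the zero-padded `job_NNN_audio.key` naming is inferred from the single filename shown above.

```python
from pathlib import Path


def bust_audio_cache(work_dir: str, job_index: int) -> bool:
    """Delete job_NNN_audio.key; return True if a key file was removed."""
    # Assumption: key files are named with a three-digit, zero-padded job index.
    key = Path(work_dir) / f"job_{job_index:03d}_audio.key"
    if key.exists():
        key.unlink()
        return True
    return False
```

For example, `bust_audio_cache("InfiniteTalk/_work", 5)` targets the same file as the `rm` command above; run generate.py afterwards to re-synthesize that segment.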

Switch TTS engine

Edit InfiniteTalk/generate.py: set VOICE_ENGINE = "kokoro" or "openvoice". Caches are keyed by engine — switching back and forth is instant (each engine keeps its own cache).

Change cloned-voice settings (OpenVoice)

Edit openvoice-test/synth_worker.py:

  • SPEAKER_KEY: "EN-US" / "EN-AU" / "EN-BR" / "EN-Default" / "EN_INDIA"
  • ENABLE_WATERMARK: False recovers presence (default)
  • PRESENCE_EQ: ffmpeg filter chain applied post-synthesis

Any edit auto-invalidates audio + mp4 caches via a script-source hash.

How the pipeline flows

SCRIPT_SEGMENTS in generate.py
    │
    │ build_jobs (clause-pack video to ≤3s sub-chunks):
    │
    ▼
┌────────────────────────────────┐
│ Per-job dispatch               │
│   audio mode → 1 wav synth     │
│   video mode → N wav sub-chunks│
└────────────────────────────────┘
    │
    │ for each job:
    │
    ▼
┌─────────────────────┐    ┌───────────────────┐
│ TTS synth           │    │ Cache check       │
│   kokoro CLI        │◄─► │  (text/voice/img/ │
│   or OpenVoice      │    │   worker hash)    │
│   worker subprocess │    └───────────────────┘
└─────────────────────┘
    │
    ▼
┌───────────────┐    ┌──────────────────────┐
│ audio mode    │    │ video mode           │
│ → still-video │    │ → Runpod InfiniteTalk│
│   (ffmpeg)    │    │   (lip-sync ~2 min)  │
└───────────────┘    └──────────────────────┘
    │                          │
    └────────────┬─────────────┘
                 ▼
        concat → out.mp4
                 │
                 ▼
        public/avatar/narration.mp4
                 │
                 ▼
        ┌────────────────┐
        │ Remotion render│  → master-silent.mp4
        └────────────────┘
                 │
                 ▼
        ┌────────────────┐
        │ ffmpeg mix SFX │  → master.mp4
        └────────────────┘

Adapting for your project

The example Remotion project (videos/solana-bytes-promo/) shows the pattern, but it's specific to one product. To repoint the pipeline at your project:

  1. Fork the Remotion subdir: cp -r videos/solana-bytes-promo videos/<your-project> and edit:
    • src/Master.tsx → SCENES map and scene composition
    • src/scenes/*.tsx — replace each scene's content
    • src/data/*.ts — your product's data
    • public/screenshots/ — your product's screenshots
    • src/theme.ts — colors and fonts
  2. Edit SCRIPT_SEGMENTS in generate.py to match your scene structure: "video" for visible avatar moments, "audio" for voiceover under data viz.
  3. Update SCENES frame boundaries in Master.tsx to land at the measured chunk endings (run generate.py once, check the printed per-chunk durations, multiply by fps to get frames).
  4. Update SFX timing in scripts/mix-audio-narration.ts to land on your scene cuts.
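
Step 3 is simple arithmetic: accumulate the per-chunk durations and multiply by the frame rate. A minimal sketch follows, assuming you copy the durations from generate.py's output by hand. `scene_boundaries` is a hypothetical helper, and the default of 30 fps is an assumption; use whatever fps your Remotion composition declares.

```python
def scene_boundaries(durations_s, fps=30):
    """Cumulative (start_frame, end_frame) per chunk, end exclusive."""
    boundaries = []
    elapsed = 0.0
    start = 0
    for d in durations_s:
        elapsed += d
        # Round the running total rather than each chunk, so rounding
        # error does not accumulate across scenes.
        end = round(elapsed * fps)
        boundaries.append((start, end))
        start = end
    return boundaries


print(scene_boundaries([2.5, 3.0, 1.2]))
# [(0, 75), (75, 165), (165, 201)]
```

The last end frame is also the value to use for durationInFrames, since it is the total length of the narration in frames.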

The InfiniteTalk and openvoice-test subdirs stay as-is — they're project-agnostic. The pipeline doesn't know or care what your scenes contain.

Cost reference

| Operation | Time | Cost |
| --- | --- | --- |
| Full re-render, 3-6 visible video chunks + 5-6 voiceover, fresh inputs | ~7 min | ~$0.10–$0.20 |
| Full re-render, cached audio (e.g. image-only swap) | ~5 min | ~$0.10–$0.20 (same Runpod count) |
| Single audio segment changed | ~1 min | $0 (no Runpod) |
| Single visible video segment changed | ~3 min | ~$0.03 |
| Same inputs (cache fully hits) | <30 s | $0 |
| Remotion render | ~1 min | $0 |
| SFX mix | <10 s | $0 |

Runpod is the only paid component. Each "video"-mode sub-chunk ≤3 s costs ~$0.03 on an RTX-4090-class GPU (≈60 s of GPU time at Runpod's serverless rate). Audio-mode segments are local-only. Picking a beefier GPU (A100, H100) makes the lip-sync render faster but doesn't improve the output for this model — stick with 4090/L40-class for the best cost/quality trade.

The chunk count per render isn't fixed — generate.py greedy-packs clauses up to 3 s each, so every comma inside a "video"-mode segment becomes another chunk. A natural-language outro and URL typically yield 4–6 visible chunks total (≈$0.12–$0.18); a bookend with no commas in either runs 3 chunks (≈$0.10).
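
The relationship between clauses, chunks, and cost can be sketched as follows. This is an illustration of greedy packing in general, not generate.py's actual build_jobs logic: `pack_clauses` and `runpod_cost` are hypothetical names, the clause durations are supplied by hand, and the $0.03 per chunk figure is the estimate quoted above.

```python
def pack_clauses(clause_seconds, cap=3.0):
    """Greedily group consecutive clause durations into chunks of at most `cap` seconds.

    A single clause longer than `cap` still becomes its own (oversized) chunk.
    """
    chunks, current, total = [], [], 0.0
    for d in clause_seconds:
        if current and total + d > cap:
            chunks.append(current)
            current, total = [], 0.0
        current.append(d)
        total += d
    if current:
        chunks.append(current)
    return chunks


def runpod_cost(n_chunks, per_chunk=0.03):
    """Estimated Runpod spend in dollars for n lip-synced chunks."""
    return round(n_chunks * per_chunk, 2)


chunks = pack_clauses([1.2, 1.5, 2.0, 0.8])
print(chunks)                    # [[1.2, 1.5], [2.0, 0.8]]
print(runpod_cost(len(chunks)))  # 0.06
```

With the chunk counts quoted above, `runpod_cost(3)` gives 0.09 and `runpod_cost(6)` gives 0.18, which matches the ≈$0.10 to ≈$0.18 range.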

💸 New to Runpod? Sign up via this referral link and get a random credit bonus between $5 and $500. At ~$0.15 per typical promo render, that's anywhere from ~30 to ~3,000 free videos before you pay anything.

Dependencies summary

System (all subprojects):

  • Python 3.12+
  • Node.js 18+
  • ffmpeg with libx264, AAC, libsoxr (default Ubuntu build is fine)

Python:

  • python-dotenv, requests (for generate.py)
  • kokoro (only if using kokoro engine)
  • Everything else lives in openvoice-test/venv/ (isolated)

Node:

  • Remotion 4.x + plugins (in your Remotion subproject's package.json)
  • tsx for the SFX mix script

Cloud:

  • Runpod serverless endpoint running InfiniteTalk (only needed for "video"-mode segments)

Files you'll iterate on

| What | Where |
| --- | --- |
| Narration script | InfiniteTalk/generate.py → SCRIPT_SEGMENTS |
| TTS engine + Kokoro voice | InfiniteTalk/generate.py → VOICE_ENGINE, KOKORO_VOICE |
| OpenVoice config (speaker / EQ / wavmark) | openvoice-test/synth_worker.py |
| Runpod endpoint ID + API key | InfiniteTalk/.env → RUNPOD_ENDPOINT_ID, RUNPOD_API_KEY |
| Scene timing (frame boundaries) | <your-project>/src/Master.tsx → SCENES |
| Scene visuals | <your-project>/src/scenes/*.tsx |
| Compose duration | <your-project>/src/Root.tsx → durationInFrames |
| SFX events | <your-project>/scripts/mix-audio-narration.ts |
| Portrait | InfiniteTalk/portrait.jpg |
| Voice reference | InfiniteTalk/voice_reference.wav |

Troubleshooting

  • "Could not load this library: ...libcudart.so..." — torchaudio version mismatch with system torch. Pin torchaudio to match (e.g. pip install "torchaudio==2.10.*" for torch 2.10.x).
  • "You need to install fugashi" — MeloTTS trying to import Japanese support. The patched melo/text/cleaner.py lazy-loads per language; if you reinstalled MeloTTS, re-apply the patch (see openvoice-test/README.md).
  • "averaged_perceptron_tagger_eng not found": run python -c "import nltk; nltk.download('averaged_perceptron_tagger_eng')" from inside the OpenVoice venv.
  • Mp4 has wrong audio after config change — check that you're on the latest generate.py; the mp4 cache key now folds in the worker source hash, but older builds didn't.
  • Lip-sync drifts late in a chunk — the chunk exceeded Runpod's 3.24 s cap; shorten the spoken text or split the segment.
  • Voice sounds too soft: adjust SPEAKER_KEY, ENABLE_WATERMARK, or PRESENCE_EQ as described under "Change cloned-voice settings (OpenVoice)".
  • {"error": "..."} from Runpod — the deployed endpoint isn't speaking the InfiniteTalk API the script expects. See Deploying your own Runpod endpoint for the contract.

Running a content series

If you're producing multiple videos around a theme (e.g. "Solana Concepts"), the natural setup is a two-surface workflow that separates creative ideation from pipeline execution:

  1. A Claude Project (web) owns the ideation surface — backlog tracking, concept selection, script drafting, "what next" loops. Lightweight, works on mobile, persistent context across sessions.
  2. This repo + Claude Code owns the execution surface — turns an approved brief into pixels via the /marketing-video skill.

One-time setup (~10 min)

Bootstrap the web Project:

  1. Open claude.ai/projects → Create Project. Name it whatever fits (e.g. "Solana Concepts — Video Series").

  2. Custom instructions → paste the entire block from docs/claude-project-instructions.md (everything inside the ``` block under Custom Instructions).

  3. Project knowledge → upload these files from this repo (drag-and-drop into the sidebar):

  4. Smoke test: in a new conversation in the Project, ask "What concept should we cover next?" — it should suggest 2-3 untouched topics from the backlog with one-line angles for each. If it asks "what's this project about?", the custom instructions didn't paste correctly.

Per-video workflow

1. (web)        Open the Project → "What concept next?"
                → ideation loop → fills brief template
                → outputs a ready-to-paste brief

2. (terminal)   Clear Claude Code context (/clear or new session)
                so the agent loads CLAUDE.md fresh

3. (terminal)   /marketing-video <paste the entire brief>

4. (terminal)   Watch Claude execute the runbook:
                  ✓ load CLAUDE.md (you'll see it cite the runbook)
                  ✓ validate brief against series-brief-template.md
                  ✓ if brief is incomplete, AskUserQuestion for gaps
                  ✓ create videos/<new-name>/ by forking the example
                  ✓ run generate.py → retime SCENES → render → mix
                  ✓ upload mp4 + send you the URL
                  ✓ append shipped entry to docs/concepts-covered.md

5. (web)        Refresh the concepts-covered.md upload in the
                Project so the next ideation session sees what
                shipped (in-place edit in the Knowledge sidebar)

Sanity checks during execution

Good signs that Claude is following the runbook:

  • References CLAUDE.md or the style guide by name in its narration
  • Quotes costs in $0.03/chunk or $0.10-$0.20/video (the latest numbers in CLAUDE.md). If you see "$0.25" or "$0.75", your uploaded docs are stale — refresh them.
  • Names the new project videos/<concept>-promo/ per the layout convention
  • Hyphenates 2-3 letter acronyms in SCRIPT_SEGMENTS (e.g. P-D-A, I-D) without being told to

Red flags that the docs need tightening (tell me what happened, I'll fix the docs):

  • Asks "where should I put the new video?" — layout convention isn't surfacing
  • Forgets to hyphenate acronyms — style guide isn't reaching Claude
  • Tries to run commands from the wrong cwd — recipe step needs absolute paths
  • Asks 4+ pipeline-mechanics questions — CLAUDE.md isn't being read

When the Project knowledge gets stale

The repo changes faster than you'll refresh Project Knowledge. As a rule, refresh after meaningful commits to:

  • docs/concepts-covered.md (every shipped video)
  • CLAUDE.md or docs/voice-a-style-guide.md (when conventions change)
  • README.md (when costs / structure / setup change)

Pull the latest from raw.githubusercontent.com/moviendome/Talkforge/main/ or git pull and re-upload via the Project's Knowledge sidebar.

See also

  • CLAUDE.md — operational runbook for the Claude agent. Recipe for the full URL → mp4 workflow, cache invariants, gotchas learned the hard way, decision rules for "ask the user vs proceed".
  • InfiniteTalk/README.md — narration generator, cache layers, Runpod endpoint deployment guide
  • openvoice-test/README.md — voice cloning setup, patches, dependencies, alternative engines
  • docs/ — Claude Project setup, series brief template, concepts-covered backlog
