feat: add Gemini text-to-speech service by osolmaz · Pull Request #134 · ManimCommunity/manim-voiceover

osolmaz · 2026-06-14T11:09:31Z

Opened on behalf of Onur Solmaz (osolmaz).

Summary

Gemini text-to-speech can now be used as a Manim Voiceover service.
This rebases and completes the Gemini service work from #119 on the current package layout.
It adds the service, documents both API-key and ADC authentication, updates the main demo to render with Gemini, and adds typed tests around the SDK boundary.

What Changed

The new service follows the existing Manim Voiceover service shape while keeping Google SDK details isolated at the boundary.
It supports Gemini Developer API keys by default and Vertex/ADC authentication when requested.
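
The two authentication modes can be pictured with a minimal sketch like the one below. It only builds the keyword arguments that could be passed to `genai.Client` from google-genai; the helper name `build_client_kwargs`, its parameters, and the `GEMINI_API_KEY` lookup are assumptions for illustration, not the code in this PR.

```python
import os
from typing import Any, Optional


def build_client_kwargs(
    use_adc: bool = False,
    api_key: Optional[str] = None,
    project: Optional[str] = None,
    location: Optional[str] = None,
) -> dict:
    """Return keyword arguments for genai.Client (hypothetical helper)."""
    if use_adc:
        # Vertex AI mode: credentials come from Application Default Credentials,
        # so no API key is passed.
        kwargs: dict = {"vertexai": True}
        if project is not None:
            kwargs["project"] = project
        if location is not None:
            kwargs["location"] = location
        return kwargs

    # Default mode: Gemini Developer API key, passed in or read from the
    # environment (the variable name is an assumption of this sketch).
    key: Any = api_key or os.environ.get("GEMINI_API_KEY")
    if not key:
        raise ValueError("No API key given and GEMINI_API_KEY is not set")
    return {"api_key": key}
```

With google-genai installed, the result would be used as `genai.Client(**build_client_kwargs(...))`. Keeping this as a plain dictionary means the wiring can be tested without importing the SDK.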

Added GeminiService with configurable model, voice, authentication mode, project, and location.
Added a gemini optional dependency for google-genai on supported Python versions.
Added WAV output for Gemini PCM audio and taught audio duration handling to read WAV metadata.
Updated the main voiceover demo to use Gemini and the current Manim Code API.
Added Gemini docs and API documentation entries.
Added focused tests for Gemini response handling, cache behavior, dotenv loading, API-key auth, and ADC auth wiring.
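
The PCM-to-WAV step and the WAV duration lookup listed above can be sketched with the standard-library `wave` module. This is an illustrative sketch, not the PR's implementation: the function names are hypothetical, and the defaults (24 kHz, mono, 16-bit samples) are an assumption about the format Gemini TTS returns.

```python
import wave


def write_pcm_as_wav(
    path: str,
    pcm: bytes,
    sample_rate: int = 24000,  # assumed Gemini TTS output rate
    channels: int = 1,         # assumed mono
    sample_width: int = 2,     # assumed 16-bit samples (2 bytes)
) -> None:
    """Wrap raw PCM bytes in a WAV container (hypothetical helper)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)


def wav_duration_seconds(path: str) -> float:
    """Compute duration from the WAV header: frame count / frame rate."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()
```

Reading the duration from the header avoids decoding the audio, which matters because voiceover timing depends on knowing each clip's length.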

Testing

The local gates passed, including the demo compile check requested before merge.
The main demo also rendered locally with Gemini ADC credentials and produced media/videos/voiceover-demo/480p15/VoiceoverDemo.mp4.

uv run ruff format --check .
uv run ruff check .
uv run ty check src/manim_voiceover
uv run mypy
uv run pytest --cov=manim_voiceover --cov-fail-under=85
uv run python -m compileall -q examples
uv run pytest tests/test_examples_render.py -q
uv run manim -ql examples/voiceover-demo.py VoiceoverDemo
uv run python -m compileall -q src/manim_voiceover tests examples
uv build
uv run pip-audit
uvx slophammer-py@0.3.0 dry .
uvx slophammer-py@0.3.0 check .
PATH="$PWD/.venv/bin:$PATH" uvx slophammer-py@0.3.0 check . --execute

Risks

The main risk is Google SDK/API drift because Gemini TTS preview models can change over time.
The code keeps SDK access in one service module and covers malformed responses in tests.
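
One way to handle malformed responses at the SDK boundary is sketched below. It assumes the audio arrives as `candidates[...].content.parts[...].inline_data.data`; the function and exception names are hypothetical and are not taken from this PR.

```python
from typing import Any


class MalformedResponseError(ValueError):
    """Raised when a response carries no usable audio (hypothetical name)."""


def extract_audio_bytes(response: Any) -> bytes:
    """Return the first non-empty inline audio payload, or raise."""
    # getattr with a default tolerates missing or None attributes at every
    # level, so a partially populated response never raises AttributeError.
    for candidate in getattr(response, "candidates", None) or []:
        content = getattr(candidate, "content", None)
        for part in getattr(content, "parts", None) or []:
            inline_data = getattr(part, "inline_data", None)
            data = getattr(inline_data, "data", None)
            if isinstance(data, (bytes, bytearray)) and data:
                return bytes(data)
    raise MalformedResponseError("Gemini response contained no inline audio data")
```

Because the function only uses attribute access, tests can feed it simple stand-in objects such as `types.SimpleNamespace` instead of real SDK response types.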

Supersedes #119.

osolmaz added 2 commits June 14, 2026 19:06

feat: add Gemini text-to-speech service

cd9335b

test: cover Gemini ADC configuration

90412f0

osolmaz merged commit 45150df into main Jun 14, 2026
1 check passed

osolmaz deleted the refactor/gemini-tts branch June 14, 2026 11:37
