feat: add Gemini text-to-speech service #134

Merged
osolmaz merged 2 commits into main from refactor/gemini-tts on Jun 14, 2026
@osolmaz osolmaz commented Jun 14, 2026

Collaborator

Opened on behalf of Onur Solmaz (osolmaz).

Summary

Gemini text-to-speech can now be used as a Manim Voiceover service.
This rebases and completes the Gemini service work from #119 on the current package layout.
It adds the service, documents both API-key and ADC authentication, updates the main demo to render with Gemini, and adds typed tests around the SDK boundary.

What Changed

The new service follows the existing Manim Voiceover service shape while keeping Google SDK details isolated at the boundary.
It supports Gemini Developer API keys by default and Vertex/ADC authentication when requested.
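The choice between the two authentication modes can be sketched as a small pure function that builds the keyword arguments for `google.genai.Client`. This is an illustration, not the code merged in this PR: `build_client_kwargs` and its parameter names are hypothetical, and reading the key from `GEMINI_API_KEY` is an assumption about the environment.

```python
import os
from typing import Any, Dict, Optional


def build_client_kwargs(
    use_vertex: bool = False,
    api_key: Optional[str] = None,
    project: Optional[str] = None,
    location: Optional[str] = None,
) -> Dict[str, Any]:
    """Return keyword arguments for google.genai.Client (hypothetical helper)."""
    if use_vertex:
        # Vertex AI mode: credentials are resolved through Application
        # Default Credentials, so no API key is passed.
        kwargs: Dict[str, Any] = {"vertexai": True}
        if project:
            kwargs["project"] = project
        if location:
            kwargs["location"] = location
        return kwargs

    # Developer API mode (the default): an API key is required.
    # GEMINI_API_KEY as the fallback variable is an assumption.
    key = api_key or os.environ.get("GEMINI_API_KEY")
    if not key:
        raise ValueError("No Gemini API key provided and GEMINI_API_KEY is unset")
    return {"api_key": key}
```

With the `gemini` extra installed, the result would be passed on as `genai.Client(**build_client_kwargs(...))`. Keeping the decision in a pure function lets the auth wiring be tested without importing the SDK.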

  • Added GeminiService with configurable model, voice, authentication mode, project, and location.
  • Added a gemini optional dependency for google-genai on supported Python versions.
  • Writes Gemini PCM audio as WAV files and extends audio duration handling to read WAV metadata.
  • Updated the main voiceover demo to use Gemini and the current Manim Code API.
  • Added Gemini docs and API documentation entries.
  • Added focused tests for Gemini response handling, cache behavior, dotenv loading, API-key auth, and ADC auth wiring.
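The PCM-to-WAV step and the WAV duration lookup can be sketched with the standard-library `wave` module. The function names are hypothetical, and the defaults (24 kHz, mono, 16-bit samples) are an assumption about the Gemini TTS output format, not something stated in this PR.

```python
import wave


def write_pcm_as_wav(
    path: str,
    pcm: bytes,
    sample_rate: int = 24000,  # assumed Gemini TTS sample rate
    channels: int = 1,         # assumed mono output
    sample_width: int = 2,     # assumed 16-bit samples (2 bytes each)
) -> None:
    """Wrap raw PCM bytes in a WAV container (hypothetical helper)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)


def wav_duration_seconds(path: str) -> float:
    """Compute the duration from the WAV header: frame count / frame rate."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()
```

Reading the duration from the header avoids decoding the audio, which is enough for timing animations against a voiceover clip.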

Testing

The local gates passed, including the demo compile check requested before merge.
The main demo also rendered locally with Gemini ADC credentials and produced media/videos/voiceover-demo/480p15/VoiceoverDemo.mp4.

  • uv run ruff format --check .
  • uv run ruff check .
  • uv run ty check src/manim_voiceover
  • uv run mypy
  • uv run pytest --cov=manim_voiceover --cov-fail-under=85
  • uv run python -m compileall -q examples
  • uv run pytest tests/test_examples_render.py -q
  • uv run manim -ql examples/voiceover-demo.py VoiceoverDemo
  • uv run python -m compileall -q src/manim_voiceover tests examples
  • uv build
  • uv run pip-audit
  • uvx slophammer-py@0.3.0 dry .
  • uvx slophammer-py@0.3.0 check .
  • PATH="$PWD/.venv/bin:$PATH" uvx slophammer-py@0.3.0 check . --execute

Risks

The main risk is Google SDK/API drift because Gemini TTS preview models can change over time.
The code keeps SDK access in one service module and covers malformed responses in tests.
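A minimal sketch of defensive response handling at the SDK boundary follows. `extract_audio_bytes` is a hypothetical helper, not the merged code, and the `candidates[].content.parts[].inline_data.data` shape is an assumption about the current google-genai response objects.

```python
from types import SimpleNamespace
from typing import Any


def extract_audio_bytes(response: Any) -> bytes:
    """Return the first non-empty audio payload, or raise ValueError."""
    # Every attribute is read with getattr so that a missing or None field
    # falls through to one clear error instead of an AttributeError.
    for candidate in getattr(response, "candidates", None) or []:
        content = getattr(candidate, "content", None)
        for part in getattr(content, "parts", None) or []:
            inline_data = getattr(part, "inline_data", None)
            data = getattr(inline_data, "data", None)
            if isinstance(data, bytes) and data:
                return data
    raise ValueError("Gemini response contained no audio data")


# Usage with stand-in objects in place of real SDK responses.
good = SimpleNamespace(
    candidates=[
        SimpleNamespace(
            content=SimpleNamespace(
                parts=[SimpleNamespace(inline_data=SimpleNamespace(data=b"\x00\x01"))]
            )
        )
    ]
)
malformed = SimpleNamespace(candidates=[SimpleNamespace(content=None)])
```

Because the helper depends only on attribute names, tests can feed it plain stand-in objects, and a change in the SDK response shape surfaces as a single `ValueError` in one module.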

Supersedes #119.

@osolmaz osolmaz merged commit 45150df into main Jun 14, 2026
1 check passed
@osolmaz osolmaz deleted the refactor/gemini-tts branch June 14, 2026 11:37

1 participant