An AI-powered audiobook generator that converts PDF books into high-quality audiobooks using LangGraph workflow orchestration with three TTS engine options: Edge TTS (free, online), Neuphonic (premium quality), and pyttsx3 (offline).
- PDF to Audiobook Conversion: Convert PDF books into natural-sounding audiobooks
- Intelligent Chapter Detection: Automatically detect and split books into chapters using regex patterns
- Multiple Audio Formats: Support for MP3, WAV, OGG, and FLAC output formats
- LangGraph Workflow: Robust workflow orchestration with conditional routing and error handling
- Triple TTS Engine Support:
  - Edge TTS (free, cloud-based, high-quality neural voices)
  - Neuphonic TTS (premium, highest quality neural voices - requires API key)
  - pyttsx3 (offline, system voices)
- CLI Interface: Beautiful command-line interface powered by Typer and Rich
- Configurable Settings: Customizable voice, speed, format, and chapter detection patterns
- Python 3.13 Compatible: Modern Python support with ffmpeg-based format conversion
Audiobook-Generator/
├── agents/                      # LangGraph agent implementation
│   ├── audiobook_agent.py       # Main agent workflow with parse/split/tts nodes
│   └── audiobook_state.py       # State management for the workflow
├── models/                      # Data models
│   ├── book.py                  # Book model with chapters
│   ├── chapter.py               # Chapter model with text and audio
│   └── config.py                # Configuration models (TTS, Output, ChapterDetection)
├── modules/                     # Core processing modules
│   ├── parser/                  # PDF parsing
│   │   ├── base_parser.py       # Abstract parser interface
│   │   └── pdf_parser.py        # PDF parser implementation
│   ├── splitter/                # Chapter splitting
│   │   └── chapter_splitter.py  # Regex-based chapter detection
│   └── tts/                     # Text-to-speech engines
│       ├── base_tts.py          # Abstract TTS interface
│       ├── edge_tts_provider.py # Edge TTS (default, online)
│       ├── neuphonic_tts.py     # Neuphonic TTS (premium quality)
│       └── pyttsx3_tts.py       # pyttsx3 TTS (offline)
├── cli/                         # Command-line interface
│   └── main.py                  # Typer CLI implementation
├── utils/                       # Utility functions
├── audiobook-gen-offline        # Offline wrapper script
├── requirements.txt             # Python dependencies
└── setup.py                     # Package installation
- Python 3.10+ (supports Python 3.10, 3.11, 3.12, 3.13)
- ffmpeg (for audio format conversion)
# On Ubuntu/Debian
sudo apt-get install ffmpeg
# On macOS
brew install ffmpeg
# On Windows
# Download from https://ffmpeg.org/download.html
- Clone the repository:
git clone https://github.com/yourusername/Audiobook-Generator.git
cd Audiobook-Generator
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install the package:
pip install -e .
This installs the audiobook-gen command in your virtual environment.
For premium quality audio using Neuphonic TTS:
- Get an API key from https://app.neuphonic.com
- Set the environment variable:
export NEUPHONIC_API_KEY="your-api-key-here"
# Or add to your .env file
echo "NEUPHONIC_API_KEY=your-api-key-here" >> .env
Option 1: Using audiobook-gen (requires activated venv)
# Make sure venv is activated first!
source venv/bin/activate
# Online Mode (Edge TTS - requires internet, free)
audiobook-gen generate /path/to/book.pdf
# --offline mode (uses Neuphonic if an API key is set, otherwise pyttsx3; Neuphonic itself still needs internet access)
audiobook-gen generate /path/to/book.pdf --offline
Option 2: Using the offline wrapper script (works without activating venv)
# High-quality offline mode - works directly
./audiobook-gen-offline generate /path/to/book.pdf
Option 3: Using Python module (works without activating venv)
# From project directory
python -m cli.main generate /path/to/book.pdf
# View help
audiobook-gen --help
# View version
audiobook-gen version
# List available voices (for pyttsx3)
audiobook-gen list-voices
# Generate audiobook with options
audiobook-gen generate --help

| Option | Short | Default | Description |
|---|---|---|---|
| --output-dir | -o | ./audiobooks/<book_name> | Output directory |
| --format | -f | mp3 | Audio format (mp3, wav, ogg, flac) |
| --voice | -v | default | TTS voice to use |
| --speed | -s | 1.0 | Speech speed (0.5 - 2.0) |
| --pattern | -p | ^(Chapter\|CHAPTER)\s+\d+ | Chapter detection regex |
| --offline | - | - | Use offline TTS mode (Neuphonic if API key set) |
Basic generation (Edge TTS, free, online):
audiobook-gen generate mybook.pdf
Custom output directory and speed:
audiobook-gen generate mybook.pdf -o ~/Audiobooks/mybook -s 1.5
WAV format for maximum quality:
audiobook-gen generate mybook.pdf -f wav
Offline mode with Neuphonic (premium quality, requires API key):
./audiobook-gen-offline generate mybook.pdf
Custom chapter pattern (for "Part 1", "Part 2" style):
audiobook-gen generate mybook.pdf -p "^Part \d+"
Full path with spaces:
audiobook-gen generate "/path/with spaces/My Book.pdf" -o ./output
Slow down speech for better comprehension:
audiobook-gen generate mybook.pdf -s 0.8

| Engine | Quality | Speed | Internet | Cost | Use Case |
|---|---|---|---|---|---|
| Edge TTS | High | Fast | Required | Free | Default, best for most users |
| Neuphonic | Highest | Medium | Required | Paid API | Premium audiobooks, professional use |
| pyttsx3 | Basic | Fast | Not required | Free | True offline, basic needs |
When to use each:
- Edge TTS (default): Great quality, free, requires internet
- Neuphonic (--offline with API key): Best quality for premium audiobooks
- pyttsx3 (--offline without API key): Truly offline, basic system voices
Generated audiobooks are organized as follows:
audiobooks/
└── mybook/
└── chapters/
├── chapter_01.mp3
├── chapter_02.mp3
├── chapter_03.mp3
└── ...
Each chapter is saved as a separate audio file for easy navigation.
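As a rough illustration of this layout, the sketch below builds chapter file paths with zero-padded numbering. The helper name `chapter_audio_path` and its parameters are assumptions made for this example and are not taken from the project's code.

```python
from pathlib import Path


def chapter_audio_path(output_dir: str, index: int, fmt: str = "mp3") -> Path:
    """Return the audio path for a 1-based chapter index.

    Hypothetical helper: mirrors the audiobooks/<book>/chapters/chapter_NN.<fmt>
    layout described above.
    """
    # Two-digit zero padding keeps files sorted correctly in most players.
    return Path(output_dir) / "chapters" / f"chapter_{index:02d}.{fmt}"


if __name__ == "__main__":
    for i in range(1, 4):
        print(chapter_audio_path("audiobooks/mybook", i))
```

Zero-padded names sort in reading order up to 99 chapters; a longer book would need wider padding.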
The audiobook generation process is orchestrated by a LangGraph workflow with three main nodes:
- Parse Node (parse_node): Extracts text and metadata from PDF files
  - Uses PyPDF2 for PDF parsing
  - Extracts title, author, and raw text
  - Validates content existence
  - Falls back to filename if no title metadata
- Split Node (split_node): Splits the book into chapters
  - Uses regex patterns for chapter detection
  - Falls back to single chapter if no matches found
  - Configurable chapter detection patterns
- TTS Node (tts_node): Generates audio for each chapter
  - Supports three TTS engines (Edge TTS, Neuphonic, pyttsx3)
  - Async audio generation for efficiency
  - Supports multiple audio formats (MP3, WAV, OGG, FLAC)
  - Handles partial failures gracefully
  - Generates individual audio files per chapter
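The split step can be pictured with a small standalone sketch. This is not the project's `chapter_splitter.py`; the function name `split_chapters`, the `(title, body)` return shape, and the "Full Book" fallback title are assumptions chosen for illustration. It uses the default pattern and falls back to a single chapter when nothing matches.

```python
import re

DEFAULT_PATTERN = r"^(Chapter|CHAPTER)\s+\d+"


def split_chapters(text: str, pattern: str = DEFAULT_PATTERN) -> list[tuple[str, str]]:
    """Split text into (title, body) pairs at lines matching the pattern.

    Assumption: any text before the first heading (front matter) is dropped.
    """
    matches = list(re.finditer(pattern, text, flags=re.MULTILINE))
    if not matches:
        # Fallback: treat the whole book as one chapter.
        return [("Full Book", text.strip())]

    chapters = []
    for i, match in enumerate(matches):
        # A chapter runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        line_end = text.find("\n", match.start())
        if line_end == -1 or line_end > end:
            line_end = end
        title = text[match.start():line_end].strip()
        body = text[line_end:end].strip()
        chapters.append((title, body))
    return chapters


if __name__ == "__main__":
    sample = "Preface\nChapter 1 Start\nHello.\nChapter 2\nWorld.\n"
    for title, body in split_chapters(sample):
        print(title, "->", body)
```

`re.MULTILINE` is what lets `^` match at the start of each line rather than only at the start of the whole text, which heading patterns like these depend on.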
The workflow includes robust error handling:
- Fatal Errors: Stop the workflow (e.g., file not found, all chapters failed)
- Partial Failures: Continue processing (e.g., some chapters failed to generate)
- Conditional Routing: route_after_step method routes based on error severity
- Graceful Degradation: Falls back to single chapter if chapter detection fails
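A minimal sketch of severity-based routing, assuming the workflow state marks fatal errors by prefixing the error message with "fatal". The `MiniState` type and the `"end"` / `"continue"` labels are hypothetical; the real `route_after_step` may use different names and may be a method rather than a function.

```python
from typing import Optional, TypedDict


class MiniState(TypedDict):
    """Cut-down stand-in for the workflow state (illustration only)."""
    error: Optional[str]


def route_after_step(state: MiniState) -> str:
    """Return a hypothetical routing label based on error severity."""
    error = state.get("error")
    if error and error.lower().startswith("fatal"):
        # Fatal errors stop the workflow.
        return "end"
    # No error, or a partial failure: keep going.
    return "continue"


if __name__ == "__main__":
    print(route_after_step({"error": None}))
    print(route_after_step({"error": "fatal: file not found"}))
```

In a LangGraph graph, a function of this shape would typically be passed to a conditional edge so that the returned label selects the next node.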
The AudiobookState TypedDict maintains workflow state:
{
"book_path": str, # Path to input PDF
"book": Optional[Book], # Book object with chapters
"config": AudiobookConfig, # Configuration settings
"output_dir": str, # Output directory path
"error": Optional[str] # Error message (prefixed with "fatal" for fatal errors)
}
- Provider: TTS engine (Edge TTS, Neuphonic, or pyttsx3)
- Voice: Voice model to use
  - Edge TTS: "en-US-AriaNeural", "en-GB-SoniaNeural", etc.
  - Neuphonic: voice_id from Neuphonic dashboard
  - pyttsx3: System voice names (use list-voices)
- Speed: Speech rate multiplier (0.5 - 2.0)
  - 0.5 = half speed
  - 1.0 = normal speed
  - 2.0 = double speed
- Language: Language code (default: en)
- Format: Audio format (mp3, wav, ogg, flac)
- Bitrate: Audio bitrate for compressed formats (default: 128k)
- Directory: Output directory path
- Method: Detection method (currently supports regex)
- Pattern: Regex pattern for matching chapter headings
  - Default: ^(Chapter|CHAPTER)\s+\d+
  - Also works with: ^Part \d+, ^Section \d+, etc.
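The Speed multiplier has to be translated into whatever each engine expects. edge-tts, for example, takes its rate as a signed percentage string such as "+50%". The sketch below shows one plausible conversion; the function name `speed_to_rate` is hypothetical, and how the project actually maps speed to each engine is an assumption here.

```python
def speed_to_rate(speed: float) -> str:
    """Convert a speed multiplier (0.5 - 2.0) to a signed percentage string.

    Hypothetical helper for illustration; 1.0 maps to "+0%".
    """
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    # round() absorbs floating-point noise, e.g. (0.8 - 1.0) * 100 is not exactly -20.
    percent = round((speed - 1.0) * 100)
    return f"{percent:+d}%"


if __name__ == "__main__":
    for speed in (0.5, 0.8, 1.0, 1.5, 2.0):
        print(speed, speed_to_rate(speed))
```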
Core dependencies:
- langgraph (>=0.2.0): Workflow orchestration
- langchain (>=0.3.0): LangChain framework
- edge-tts (>=7.2.0): Microsoft Edge TTS (online, free)
- pyneuphonic (>=1.8.0): Neuphonic TTS (premium quality)
- pyttsx3 (>=2.90): Offline TTS engine
- pypdf2 (>=3.0.0): PDF parsing
- typer (>=0.12.0): CLI framework
- rich (>=13.0.0): Beautiful CLI output
- pyyaml (>=6.0): Configuration management
- python-dotenv (>=1.0.0): Environment management
- ebooklib (>=0.18): EPUB support (future feature)
Note: Audio format conversion uses ffmpeg directly (Python 3.13+ compatible). The pydub library has been removed for Python 3.13 compatibility.
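Calling ffmpeg directly can be as simple as building an argument list and handing it to `subprocess`. The sketch below is an assumption about how such a conversion might look, not the project's implementation; the function names and the choice to apply the bitrate only to MP3 and OGG are illustrative.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: Path, dst: Path, bitrate: str = "128k") -> list[str]:
    """Build an ffmpeg argument list; the output format follows dst's extension."""
    cmd = ["ffmpeg", "-y", "-i", str(src)]  # -y overwrites an existing output file
    if dst.suffix.lower() in {".mp3", ".ogg"}:
        # Bitrate only matters for lossy formats in this sketch.
        cmd += ["-b:a", bitrate]
    cmd.append(str(dst))
    return cmd


def convert_audio(src: Path, dst: Path, bitrate: str = "128k") -> None:
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst, bitrate), check=True, capture_output=True)
```

Keeping command construction separate from execution makes the logic easy to test without ffmpeg installed.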
pytest tests/
- Models: Dataclasses for Book, Chapter, and configuration
- Modules: Pluggable parsers, splitters, and TTS providers following abstract base classes
- Agents: LangGraph workflow orchestration with conditional routing
- CLI: User-facing command-line interface with Typer
- Create a new class in modules/tts/ that extends BaseTTS
- Implement the generate_audio method
- Add the provider to the TTS factory or configuration
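The steps above might look roughly like the following. The `BaseTTS` class here is a stand-in written for this example; the real interface in `modules/tts/base_tts.py` may have a different `generate_audio` signature, and `SilentTTS` is a hypothetical provider that only writes a placeholder file.

```python
import asyncio
from abc import ABC, abstractmethod
from pathlib import Path


class BaseTTS(ABC):
    """Stand-in for the project's abstract TTS interface (signature assumed)."""

    @abstractmethod
    async def generate_audio(self, text: str, output_path: Path) -> Path:
        """Synthesize text to output_path and return the path."""


class SilentTTS(BaseTTS):
    """Hypothetical provider: writes an empty file instead of real audio."""

    async def generate_audio(self, text: str, output_path: Path) -> Path:
        output_path.parent.mkdir(parents=True, exist_ok=True)
        output_path.write_bytes(b"")  # a real provider would write audio data here
        return output_path


if __name__ == "__main__":
    result = asyncio.run(SilentTTS().generate_audio("Hello", Path("demo/chapter_01.mp3")))
    print(result)
```

A placeholder provider like this can also be handy in tests, since it exercises the workflow without calling any TTS service.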
- Edge TTS support (free, online)
- Neuphonic TTS support (premium quality)
- pyttsx3 TTS support (offline)
- Multiple audio formats (MP3, WAV, OGG, FLAC)
- Python 3.13 compatibility
- Add support for EPUB files
- AI-based chapter detection using LLMs
- OpenAI TTS support
- Playlist generation for full audiobook (M3U/PLS)
- Resume from partial generation
- Parallel chapter processing for faster generation
- Voice cloning support
- Audio post-processing (normalization, compression)
- Web UI for easier use
"ffmpeg not found"
- Install ffmpeg (see Prerequisites section)
- Make sure it's in your system PATH
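To confirm whether ffmpeg is visible on your PATH from Python, a quick check with the standard library is enough. The helper name `ffmpeg_available` is made up for this example.

```python
import shutil


def ffmpeg_available(binary: str = "ffmpeg") -> bool:
    """Return True if the given executable can be found on the system PATH."""
    return shutil.which(binary) is not None


if __name__ == "__main__":
    print("ffmpeg found" if ffmpeg_available() else "ffmpeg not found on PATH")
```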
"NEUPHONIC_API_KEY not set"
- Only needed if using offline mode (--offline or audiobook-gen-offline)
- Get an API key from https://app.neuphonic.com
- Set the environment variable or add to .env file
"No chapters detected"
- Check your chapter pattern with -p option
- The default pattern looks for "Chapter 1", "Chapter 2", etc.
- Use -p "^Part \d+" for "Part 1", "Part 2" style
- If no pattern matches, the entire book becomes one chapter
"ModuleNotFoundError"
- Make sure you activated the virtual environment: source venv/bin/activate
- Or use the wrapper scripts that don't require activation
"Permission denied: ./audiobook-gen-offline"
- Make the script executable: chmod +x audiobook-gen-offline
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Edge TTS for high-quality online text-to-speech
- Neuphonic for premium neural text-to-speech
- pyttsx3 for offline text-to-speech
- LangGraph for workflow orchestration
- Typer for the CLI framework
- Rich for beautiful terminal output
Current version: 0.1.0
Note: This project is under active development. Features and APIs may change.