Audiobook Generator

An AI-powered audiobook generator that converts PDF books into high-quality audiobooks using LangGraph workflow orchestration with three TTS engine options: Edge TTS (free, online), Neuphonic (premium quality), and pyttsx3 (offline).

Features

PDF to Audiobook Conversion: Convert PDF books into natural-sounding audiobooks
Intelligent Chapter Detection: Automatically detect and split books into chapters using regex patterns
Multiple Audio Formats: Support for MP3, WAV, OGG, and FLAC output formats
LangGraph Workflow: Robust workflow orchestration with conditional routing and error handling
Triple TTS Engine Support:
- Edge TTS (free, cloud-based, high-quality neural voices)
- Neuphonic TTS (premium, highest quality neural voices - requires API key)
- pyttsx3 (offline, system voices)
CLI Interface: Beautiful command-line interface powered by Typer and Rich
Configurable Settings: Customizable voice, speed, format, and chapter detection patterns
Python 3.13 Compatible: Modern Python support with ffmpeg-based format conversion

Project Structure

Audiobook-Generator/
├── agents/                    # LangGraph agent implementation
│   ├── audiobook_agent.py     # Main agent workflow with parse/split/tts nodes
│   └── audiobook_state.py     # State management for the workflow
├── models/                    # Data models
│   ├── book.py                # Book model with chapters
│   ├── chapter.py             # Chapter model with text and audio
│   └── config.py              # Configuration models (TTS, Output, ChapterDetection)
├── modules/                   # Core processing modules
│   ├── parser/                # PDF parsing
│   │   ├── base_parser.py     # Abstract parser interface
│   │   └── pdf_parser.py      # PDF parser implementation
│   ├── splitter/              # Chapter splitting
│   │   └── chapter_splitter.py  # Regex-based chapter detection
│   └── tts/                   # Text-to-speech engines
│       ├── base_tts.py        # Abstract TTS interface
│       ├── edge_tts_provider.py  # Edge TTS (default, online)
│       ├── neuphonic_tts.py   # Neuphonic TTS (premium quality)
│       └── pyttsx3_tts.py     # pyttsx3 TTS (offline)
├── cli/                       # Command-line interface
│   └── main.py                # Typer CLI implementation
├── utils/                     # Utility functions
├── audiobook-gen-offline      # Offline wrapper script
├── requirements.txt           # Python dependencies
└── setup.py                   # Package installation

Installation

Prerequisites

Python 3.10+ (supports Python 3.10, 3.11, 3.12, 3.13)

ffmpeg (for audio format conversion)

# On Ubuntu/Debian
sudo apt-get install ffmpeg

# On macOS
brew install ffmpeg

# On Windows
# Download from https://ffmpeg.org/download.html

Setup

Clone the repository:

git clone https://github.com/yourusername/Audiobook-Generator.git
cd Audiobook-Generator

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the package:

pip install -e .

This installs the audiobook-gen command in your virtual environment.

Optional: Neuphonic TTS Setup

For premium quality audio using Neuphonic TTS:

Get an API key from https://app.neuphonic.com
Set the environment variable:

export NEUPHONIC_API_KEY="your-api-key-here"
# Or add to your .env file
echo "NEUPHONIC_API_KEY=your-api-key-here" >> .env

Usage

Three Ways to Use the CLI

Option 1: Using audiobook-gen (requires activated venv)

# Make sure venv is activated first!
source venv/bin/activate

# Online Mode (Edge TTS - requires internet, free)
audiobook-gen generate /path/to/book.pdf

# --offline flag (uses Neuphonic TTS if NEUPHONIC_API_KEY is set, otherwise pyttsx3)
audiobook-gen generate /path/to/book.pdf --offline

Option 2: Using the offline wrapper script (works without activating venv)

# High-quality offline mode - works directly
./audiobook-gen-offline generate /path/to/book.pdf

Option 3: Using Python module (works without activating venv)

# From project directory
python -m cli.main generate /path/to/book.pdf

CLI Commands

# View help
audiobook-gen --help

# View version
audiobook-gen version

# List available voices (for pyttsx3)
audiobook-gen list-voices

# Generate audiobook with options
audiobook-gen generate --help

Command Options

| Option | Short | Default | Description |
| --- | --- | --- | --- |
| `--output-dir` | `-o` | `./audiobooks/<book_name>` | Output directory |
| `--format` | `-f` | `mp3` | Audio format (mp3, wav, ogg, flac) |
| `--voice` | `-v` | `default` | TTS voice to use |
| `--speed` | `-s` | `1.0` | Speech speed (0.5 - 2.0) |
| `--pattern` | `-p` | `^(Chapter\|CHAPTER)\s+\d+` | Chapter detection regex |
| `--offline` | - | - | Use offline TTS mode (Neuphonic if API key is set, otherwise pyttsx3) |

Examples

Basic generation (Edge TTS, free, online):

audiobook-gen generate mybook.pdf

Custom output directory and speed:

audiobook-gen generate mybook.pdf -o ~/Audiobooks/mybook -s 1.5

WAV format for maximum quality:

audiobook-gen generate mybook.pdf -f wav

Offline mode with Neuphonic (premium quality, requires API key):

./audiobook-gen-offline generate mybook.pdf

Custom chapter pattern (for "Part 1", "Part 2" style):

audiobook-gen generate mybook.pdf -p "^Part \d+"

Full path with spaces:

audiobook-gen generate "/path/with spaces/My Book.pdf" -o ./output

Slow down speech for better comprehension:

audiobook-gen generate mybook.pdf -s 0.8

TTS Engine Comparison

| Engine | Quality | Speed | Internet | Cost | Use Case |
| --- | --- | --- | --- | --- | --- |
| Edge TTS | High | Fast | Required | Free | Default, best for most users |
| Neuphonic | Highest | Medium | Required | Paid API | Premium audiobooks, professional use |
| pyttsx3 | Basic | Fast | Not required | Free | True offline, basic needs |

When to use each:

Edge TTS (default): Great quality, free, requires internet
Neuphonic (--offline with API key): Best quality for premium audiobooks
pyttsx3 (--offline without API key): Truly offline, basic system voices
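
The selection rules above can be sketched as a small function. This is only an illustration of the described behaviour, not the project's actual code: the function name `select_provider` and the returned engine labels are hypothetical.

```python
import os
from typing import Mapping, Optional


def select_provider(offline: bool, env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a TTS engine from the --offline flag and the API key (hypothetical helper)."""
    env = os.environ if env is None else env
    if not offline:
        return "edge-tts"      # default: free, requires internet
    if env.get("NEUPHONIC_API_KEY"):
        return "neuphonic"     # --offline with an API key
    return "pyttsx3"           # --offline without an API key: system voices


print(select_provider(False, {}))
print(select_provider(True, {"NEUPHONIC_API_KEY": "your-api-key-here"}))
print(select_provider(True, {}))
```

Passing the environment as a parameter keeps the rule easy to test without touching real environment variables.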

Output Structure

Generated audiobooks are organized as follows:

audiobooks/
└── mybook/
    └── chapters/
        ├── chapter_01.mp3
        ├── chapter_02.mp3
        ├── chapter_03.mp3
        └── ...

Each chapter is saved as a separate audio file for easy navigation.
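
As a rough sketch of how the per-chapter file names in the layout above could be built (the helper name `chapter_path` is hypothetical, and the two-digit zero padding is inferred from the example listing):

```python
from pathlib import Path


def chapter_path(output_dir: str, index: int, fmt: str = "mp3") -> Path:
    """Build <output_dir>/chapters/chapter_NN.<fmt> (hypothetical helper)."""
    return Path(output_dir) / "chapters" / f"chapter_{index:02d}.{fmt}"


for number in (1, 2, 3):
    print(chapter_path("audiobooks/mybook", number).as_posix())
```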

Architecture

LangGraph Workflow

The audiobook generation process is orchestrated by a LangGraph workflow with three main nodes:

Parse Node (parse_node): Extracts text and metadata from PDF files
- Uses PyPDF2 for PDF parsing
- Extracts title, author, and raw text
- Validates content existence
- Falls back to filename if no title metadata
Split Node (split_node): Splits the book into chapters
- Uses regex patterns for chapter detection
- Falls back to single chapter if no matches found
- Configurable chapter detection patterns
TTS Node (tts_node): Generates audio for each chapter
- Supports three TTS engines (Edge TTS, Neuphonic, pyttsx3)
- Async audio generation for efficiency
- Supports multiple audio formats (MP3, WAV, OGG, FLAC)
- Handles partial failures gracefully
- Generates individual audio files per chapter
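
The "async generation" and "partial failures" points of the TTS node can be illustrated with plain asyncio. This is a minimal sketch under assumptions: `synthesize_all` and `fake_tts` are hypothetical names, the chapters are plain strings, and the real node may process chapters differently (for example one at a time).

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


async def synthesize_all(
    chapters: List[str],
    synthesize: Callable[[str], Awaitable[str]],
) -> Tuple[List[str], Dict[str, str]]:
    """Run one TTS call per chapter and collect failures instead of aborting."""
    results = await asyncio.gather(
        *(synthesize(chapter) for chapter in chapters),
        return_exceptions=True,  # a failed chapter does not cancel the others
    )
    done: List[str] = []
    failed: Dict[str, str] = {}
    for chapter, result in zip(chapters, results):
        if isinstance(result, Exception):
            failed[chapter] = str(result)
        else:
            done.append(result)
    return done, failed


async def fake_tts(title: str) -> str:
    """Stand-in for a real engine call; fails on an empty chapter."""
    if not title:
        raise ValueError("empty chapter")
    await asyncio.sleep(0)
    return f"{title}.mp3"


done, failed = asyncio.run(synthesize_all(["ch1", "", "ch3"], fake_tts))
print(done, failed)
```

If `done` comes back empty, every chapter failed, which corresponds to the fatal case described under Error Handling.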

Error Handling

The workflow includes robust error handling:

Fatal Errors: Stop the workflow (e.g., file not found, all chapters failed)
Partial Failures: Continue processing (e.g., some chapters failed to generate)
Conditional Routing: route_after_step method routes based on error severity
Graceful Degradation: Falls back to single chapter if chapter detection fails
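
A routing function in the spirit of route_after_step might look like the sketch below. It relies only on the documented convention that fatal error messages are prefixed with "fatal"; the return labels "continue" and "end" are assumptions, and the real method may use different names.

```python
from typing import Any, Dict


def route_after_step(state: Dict[str, Any]) -> str:
    """Return "end" for fatal errors and "continue" otherwise (labels are hypothetical)."""
    error = state.get("error")
    if error and error.lower().startswith("fatal"):
        return "end"
    # No error, or a non-fatal one such as a single failed chapter.
    return "continue"


print(route_after_step({"error": None}))
print(route_after_step({"error": "fatal: file not found"}))
print(route_after_step({"error": "chapter 3 failed to generate"}))
```

In LangGraph, a function like this is typically attached to a node with `add_conditional_edges`, together with a mapping from each label to the next node or to the graph's end.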

State Management

The AudiobookState TypedDict maintains workflow state:

{
    "book_path": str,          # Path to input PDF
    "book": Optional[Book],    # Book object with chapters
    "config": AudiobookConfig, # Configuration settings
    "output_dir": str,         # Output directory path
    "error": Optional[str]     # Error message (prefixed with "fatal" for fatal errors)
}
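
Written out as Python, the same shape could be declared roughly as follows. The `Book` and `AudiobookConfig` classes here are simplified stand-ins so the example runs on its own; the real ones live in models/ and have more fields.

```python
from dataclasses import dataclass, field
from typing import List, Optional, TypedDict


@dataclass
class Book:  # stand-in for models/book.py
    title: str
    chapters: List[str] = field(default_factory=list)


@dataclass
class AudiobookConfig:  # stand-in for models/config.py
    format: str = "mp3"
    speed: float = 1.0


class AudiobookState(TypedDict):
    book_path: str            # path to input PDF
    book: Optional[Book]      # filled in by the parse step
    config: AudiobookConfig   # configuration settings
    output_dir: str           # output directory path
    error: Optional[str]      # "fatal"-prefixed for fatal errors


initial: AudiobookState = {
    "book_path": "mybook.pdf",
    "book": None,
    "config": AudiobookConfig(),
    "output_dir": "./audiobooks/mybook",
    "error": None,
}
print(initial["config"].format)
```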

Configuration

TTS Configuration

Provider: TTS engine (Edge TTS, Neuphonic, or pyttsx3)
Voice: Voice model to use
- Edge TTS: "en-US-AriaNeural", "en-GB-SoniaNeural", etc.
- Neuphonic: voice_id from Neuphonic dashboard
- pyttsx3: System voice names (use list-voices)
Speed: Speech rate multiplier (0.5 - 2.0)
- 0.5 = half speed
- 1.0 = normal speed
- 2.0 = double speed
Language: Language code (default: en)
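
Edge TTS expresses rate as a signed percentage string such as "+50%" rather than a multiplier, so a conversion along these lines is one plausible way to bridge the two. This is a sketch only: the helper name `speed_to_rate` is hypothetical, and how the project actually maps the speed setting for each engine is an assumption.

```python
def speed_to_rate(speed: float) -> str:
    """Convert a 0.5 - 2.0 speed multiplier to a signed percent string (hypothetical helper)."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    percent = round((speed - 1.0) * 100)
    return f"{percent:+d}%"


for value in (0.5, 0.8, 1.0, 1.5, 2.0):
    print(value, speed_to_rate(value))
```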

Output Configuration

Format: Audio format (mp3, wav, ogg, flac)
Bitrate: Audio bitrate for compressed formats (default: 128k)
Directory: Output directory path
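
Since format conversion goes through ffmpeg, the format and bitrate settings could translate into a command line roughly like this. The helper name `ffmpeg_args` is hypothetical and the project's real invocation may pass different flags; the sketch only builds the argument list and does not run ffmpeg.

```python
from typing import List

SUPPORTED = {"mp3", "wav", "ogg", "flac"}
LOSSY = {"mp3", "ogg"}  # bitrate only applies to compressed formats


def ffmpeg_args(src: str, dst: str, bitrate: str = "128k") -> List[str]:
    """Build an ffmpeg command that converts src to dst (hypothetical helper)."""
    fmt = dst.rsplit(".", 1)[-1].lower()
    if fmt not in SUPPORTED:
        raise ValueError(f"unsupported format: {fmt}")
    args = ["ffmpeg", "-y", "-i", src]  # -y overwrites an existing output file
    if fmt in LOSSY:
        args += ["-b:a", bitrate]       # audio bitrate
    return args + [dst]


print(" ".join(ffmpeg_args("chapter_01.mp3", "chapter_01.ogg")))
print(" ".join(ffmpeg_args("chapter_01.mp3", "chapter_01.wav")))
```

The resulting list can be handed to `subprocess.run(args, check=True)` when ffmpeg is installed.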

Chapter Detection Configuration

Method: Detection method (currently supports regex)
Pattern: Regex pattern for matching chapter headings
- Default: ^(Chapter|CHAPTER)\s+\d+
- Also works with: ^Part \d+, ^Section \d+, etc.
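
To see how a pattern behaves before running a full conversion, a small regex splitter can help. The sketch below is not the project's chapter_splitter.py: `split_chapters` is a hypothetical helper that assumes headings sit at the start of a line (hence `re.MULTILINE`), and it drops any text before the first heading.

```python
import re
from typing import List, Tuple

DEFAULT_PATTERN = r"^(Chapter|CHAPTER)\s+\d+"


def split_chapters(text: str, pattern: str = DEFAULT_PATTERN) -> List[Tuple[str, str]]:
    """Return (heading, body) pairs; fall back to one chapter if nothing matches."""
    matches = list(re.finditer(pattern, text, flags=re.MULTILINE))
    if not matches:
        return [("Full Book", text.strip())]
    chapters = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        line_end = text.find("\n", match.start())
        if line_end == -1 or line_end > end:
            line_end = end
        heading = text[match.start():line_end].strip()
        body = text[line_end:end].strip()
        chapters.append((heading, body))
    return chapters


sample = "Preface\nChapter 1\nHello.\nChapter 2\nWorld.\n"
print(split_chapters(sample))
print(split_chapters("Part 1\nA\nPart 2\nB", r"^Part \d+"))
```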

Dependencies

Core dependencies:

langgraph (>=0.2.0): Workflow orchestration
langchain (>=0.3.0): LangChain framework
edge-tts (>=7.2.0): Microsoft Edge TTS (online, free)
pyneuphonic (>=1.8.0): Neuphonic TTS (premium quality)
pyttsx3 (>=2.90): Offline TTS engine
pypdf2 (>=3.0.0): PDF parsing
typer (>=0.12.0): CLI framework
rich (>=13.0.0): Beautiful CLI output
pyyaml (>=6.0): Configuration management
python-dotenv (>=1.0.0): Environment management
ebooklib (>=0.18): EPUB support (future feature)

Note: Audio format conversion calls ffmpeg directly. The pydub library was removed for Python 3.13 compatibility.

Development

Running Tests

pytest tests/

Code Structure

Models: Dataclasses for Book, Chapter, and configuration
Modules: Pluggable parsers, splitters, and TTS providers following abstract base classes
Agents: LangGraph workflow orchestration with conditional routing
CLI: User-facing command-line interface with Typer

Adding a New TTS Engine

Create a new class in modules/tts/ that extends BaseTTS
Implement the generate_audio method
Add the provider to the TTS factory or configuration
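
The three steps above might look like this in outline. Everything here is a stand-in: the real BaseTTS in modules/tts/base_tts.py may declare generate_audio with a different signature (it may be async, for example), and `SilentTTS` and the `PROVIDERS` registry are hypothetical names used only for illustration.

```python
import tempfile
import wave
from abc import ABC, abstractmethod
from pathlib import Path


class BaseTTS(ABC):  # stand-in for the project's abstract interface
    @abstractmethod
    def generate_audio(self, text: str, output_path: str) -> str:
        """Write audio for text to output_path and return the path."""


class SilentTTS(BaseTTS):
    """Toy engine: writes one second of silence regardless of the text."""

    def generate_audio(self, text: str, output_path: str) -> str:
        with wave.open(output_path, "wb") as wav:
            wav.setnchannels(1)       # mono
            wav.setsampwidth(2)       # 16-bit samples
            wav.setframerate(16000)   # 16 kHz
            wav.writeframes(b"\x00\x00" * 16000)
        return output_path


# Hypothetical registry standing in for "the TTS factory or configuration".
PROVIDERS = {"silent": SilentTTS}

with tempfile.TemporaryDirectory() as tmp:
    target = str(Path(tmp) / "chapter_01.wav")
    engine = PROVIDERS["silent"]()
    print(engine.generate_audio("Hello", target) == target)
```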

Roadmap

Troubleshooting

Common Issues

"ffmpeg not found"

Install ffmpeg (see Prerequisites section)
Make sure it's in your system PATH
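
A quick way to check whether ffmpeg is visible on PATH from Python, using only the standard library (the helper name `find_tool` is just for illustration):

```python
import shutil
from typing import Optional


def find_tool(name: str = "ffmpeg") -> Optional[str]:
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


location = find_tool()
print(location if location else "ffmpeg not found on PATH")
```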

"NEUPHONIC_API_KEY not set"

Only needed if using offline mode (--offline or audiobook-gen-offline)
Get an API key from https://app.neuphonic.com
Set the environment variable or add to .env file

"No chapters detected"

Check your chapter pattern with -p option
The default pattern looks for "Chapter 1", "Chapter 2", etc.
Use -p "^Part \d+" for "Part 1", "Part 2" style
If no pattern matches, the entire book becomes one chapter

"ModuleNotFoundError"

Make sure you activated the virtual environment: source venv/bin/activate
Or use the audiobook-gen-offline wrapper script, which does not require activation

"Permission denied: ./audiobook-gen-offline"

Make the script executable: chmod +x audiobook-gen-offline

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Edge TTS for high-quality online text-to-speech
Neuphonic for premium neural text-to-speech
pyttsx3 for offline text-to-speech
LangGraph for workflow orchestration
Typer for the CLI framework
Rich for beautiful terminal output

Version

Current version: 0.1.0

Note: This project is under active development. Features and APIs may change.
