oueslati1990/Audiobook-Generator

Audiobook Generator

An AI-powered audiobook generator that converts PDF books into audiobooks. A LangGraph workflow orchestrates parsing, chapter splitting, and speech synthesis, with three TTS engines to choose from: Edge TTS (free, online), Neuphonic (premium quality), and pyttsx3 (offline).

Features

  • PDF to Audiobook Conversion: Convert PDF books into natural-sounding audiobooks
  • Intelligent Chapter Detection: Automatically detect and split books into chapters using regex patterns
  • Multiple Audio Formats: Support for MP3, WAV, OGG, and FLAC output formats
  • LangGraph Workflow: Robust workflow orchestration with conditional routing and error handling
  • Triple TTS Engine Support:
    • Edge TTS (free, cloud-based, high-quality neural voices)
    • Neuphonic TTS (premium, highest quality neural voices - requires API key)
    • pyttsx3 (offline, system voices)
  • CLI Interface: Beautiful command-line interface powered by Typer and Rich
  • Configurable Settings: Customizable voice, speed, format, and chapter detection patterns
  • Python 3.13 Compatible: Modern Python support with ffmpeg-based format conversion

Project Structure

Audiobook-Generator/
├── agents/                    # LangGraph agent implementation
│   ├── audiobook_agent.py     # Main agent workflow with parse/split/tts nodes
│   └── audiobook_state.py     # State management for the workflow
├── models/                    # Data models
│   ├── book.py                # Book model with chapters
│   ├── chapter.py             # Chapter model with text and audio
│   └── config.py              # Configuration models (TTS, Output, ChapterDetection)
├── modules/                   # Core processing modules
│   ├── parser/                # PDF parsing
│   │   ├── base_parser.py     # Abstract parser interface
│   │   └── pdf_parser.py      # PDF parser implementation
│   ├── splitter/              # Chapter splitting
│   │   └── chapter_splitter.py  # Regex-based chapter detection
│   └── tts/                   # Text-to-speech engines
│       ├── base_tts.py        # Abstract TTS interface
│       ├── edge_tts_provider.py  # Edge TTS (default, online)
│       ├── neuphonic_tts.py   # Neuphonic TTS (premium quality)
│       └── pyttsx3_tts.py     # pyttsx3 TTS (offline)
├── cli/                       # Command-line interface
│   └── main.py                # Typer CLI implementation
├── utils/                     # Utility functions
├── audiobook-gen-offline      # Offline wrapper script
├── requirements.txt           # Python dependencies
└── setup.py                   # Package installation

Installation

Prerequisites

  • Python 3.10+ (supports Python 3.10, 3.11, 3.12, 3.13)
  • ffmpeg (for audio format conversion)
    # On Ubuntu/Debian
    sudo apt-get install ffmpeg
    
    # On macOS
    brew install ffmpeg
    
    # On Windows
    # Download from https://ffmpeg.org/download.html

Setup

  1. Clone the repository:
git clone https://github.com/yourusername/Audiobook-Generator.git
cd Audiobook-Generator
  2. Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install the package:
pip install -e .

This installs the audiobook-gen command in your virtual environment.

Optional: Neuphonic TTS Setup

For premium quality audio using Neuphonic TTS:

  1. Get an API key from https://app.neuphonic.com
  2. Set the environment variable:
export NEUPHONIC_API_KEY="your-api-key-here"
# Or add to your .env file
echo "NEUPHONIC_API_KEY=your-api-key-here" >> .env

Usage

Three Ways to Use the CLI

Option 1: Using audiobook-gen (requires activated venv)

# Make sure venv is activated first!
source venv/bin/activate

# Online Mode (Edge TTS - requires internet, free)
audiobook-gen generate /path/to/book.pdf

# Offline Mode (Neuphonic TTS if NEUPHONIC_API_KEY is set, otherwise pyttsx3)
audiobook-gen generate /path/to/book.pdf --offline

Option 2: Using the offline wrapper script (works without activating venv)

# Same as --offline, but runs without activating the venv first
./audiobook-gen-offline generate /path/to/book.pdf

Option 3: Using Python module (works without activating venv)

# From project directory
python -m cli.main generate /path/to/book.pdf

CLI Commands

# View help
audiobook-gen --help

# View version
audiobook-gen version

# List available voices (for pyttsx3)
audiobook-gen list-voices

# Generate audiobook with options
audiobook-gen generate --help

Command Options

Option        Short  Default                     Description
--output-dir  -o     ./audiobooks/<book_name>    Output directory
--format      -f     mp3                         Audio format (mp3, wav, ogg, flac)
--voice       -v     default                     TTS voice to use
--speed       -s     1.0                         Speech speed (0.5 - 2.0)
--pattern     -p     ^(Chapter|CHAPTER)\s+\d+    Chapter detection regex
--offline     -      -                           Use offline TTS mode (Neuphonic if API key set)

Examples

Basic generation (Edge TTS, free, online):

audiobook-gen generate mybook.pdf

Custom output directory and speed:

audiobook-gen generate mybook.pdf -o ~/Audiobooks/mybook -s 1.5

WAV format for maximum quality:

audiobook-gen generate mybook.pdf -f wav

Offline mode with Neuphonic (premium quality, requires API key):

./audiobook-gen-offline generate mybook.pdf

Custom chapter pattern (for "Part 1", "Part 2" style):

audiobook-gen generate mybook.pdf -p "^Part \d+"

Full path with spaces:

audiobook-gen generate "/path/with spaces/My Book.pdf" -o ./output

Slow down speech for better comprehension:

audiobook-gen generate mybook.pdf -s 0.8

TTS Engine Comparison

Engine     Quality  Speed   Internet      Cost      Use Case
Edge TTS   High     Fast    Required      Free      Default, best for most users
Neuphonic  Highest  Medium  Required      Paid API  Premium audiobooks, professional use
pyttsx3    Basic    Fast    Not required  Free      True offline, basic needs

When to use each:

  • Edge TTS (default): Great quality, free, requires internet
  • Neuphonic (--offline with API key): Best quality for premium audiobooks; despite the flag name, it still requires internet
  • pyttsx3 (--offline without API key): Truly offline, basic system voices
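
The selection rules above can be sketched as a small function. This is an illustration only: the function name and the returned labels ("edge", "neuphonic", "pyttsx3") are assumptions, not identifiers taken from the project.

```python
import os
from typing import Mapping, Optional


def select_provider(offline: bool, env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a TTS engine label from the --offline flag and the environment.

    The labels returned here are hypothetical names used for illustration.
    """
    env = os.environ if env is None else env
    if not offline:
        return "edge"  # default: free, online
    # --offline: prefer Neuphonic when a non-empty API key is available
    if env.get("NEUPHONIC_API_KEY", "").strip():
        return "neuphonic"
    return "pyttsx3"  # no key: fall back to local system voices


print(select_provider(offline=False, env={}))
print(select_provider(offline=True, env={"NEUPHONIC_API_KEY": "your-api-key-here"}))
print(select_provider(offline=True, env={}))
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.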

Output Structure

Generated audiobooks are organized as follows:

audiobooks/
└── mybook/
    └── chapters/
        ├── chapter_01.mp3
        ├── chapter_02.mp3
        ├── chapter_03.mp3
        └── ...

Each chapter is saved as a separate audio file for easy navigation.
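
As a rough sketch of the naming scheme shown in the tree, a chapter's output path could be built as follows. The helper name chapter_audio_path is hypothetical, and the two-digit zero padding is inferred from the example file names.

```python
from pathlib import Path


def chapter_audio_path(output_dir: str, index: int, fmt: str = "mp3") -> Path:
    """Return <output_dir>/chapters/chapter_NN.<fmt> for a 1-based chapter index."""
    if index < 1:
        raise ValueError("chapter index is 1-based")
    # :02d pads to two digits, matching chapter_01, chapter_02, ...
    return Path(output_dir) / "chapters" / f"chapter_{index:02d}.{fmt}"


print(chapter_audio_path("audiobooks/mybook", 3).as_posix())
```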

Architecture

LangGraph Workflow

The audiobook generation process is orchestrated by a LangGraph workflow with three main nodes:

  1. Parse Node (parse_node): Extracts text and metadata from PDF files

    • Uses PyPDF2 for PDF parsing
    • Extracts title, author, and raw text
    • Validates content existence
    • Falls back to filename if no title metadata
  2. Split Node (split_node): Splits the book into chapters

    • Uses regex patterns for chapter detection
    • Falls back to single chapter if no matches found
    • Configurable chapter detection patterns
  3. TTS Node (tts_node): Generates audio for each chapter

    • Supports three TTS engines (Edge TTS, Neuphonic, pyttsx3)
    • Async audio generation for efficiency
    • Supports multiple audio formats (MP3, WAV, OGG, FLAC)
    • Handles partial failures gracefully
    • Generates individual audio files per chapter
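
The split step can be illustrated with a minimal regex-based splitter. It uses the default pattern from this README and falls back to a single chapter when nothing matches. The function name, the "Full Text" fallback title, and the choice to drop text before the first heading are assumptions made for the sketch, not the project's actual behavior.

```python
import re
from typing import List, Tuple

DEFAULT_PATTERN = r"^(Chapter|CHAPTER)\s+\d+"


def split_chapters(text: str, pattern: str = DEFAULT_PATTERN) -> List[Tuple[str, str]]:
    """Return (heading, body) pairs; a single pair for the whole text if nothing matches."""
    # MULTILINE makes ^ match at the start of every line, not only the start of the text
    matches = list(re.finditer(pattern, text, re.MULTILINE))
    if not matches:
        return [("Full Text", text.strip())]

    chapters = []
    for i, match in enumerate(matches):
        # The heading is the full line on which the pattern matched
        line_end = text.find("\n", match.start())
        if line_end == -1:
            line_end = len(text)
        heading = text[match.start():line_end].strip()
        # The body runs until the next heading (or the end of the text)
        body_end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters.append((heading, text[line_end:body_end].strip()))
    return chapters


sample = "Preface\nChapter 1 Start\nHello.\nChapter 2\nWorld.\n"
for heading, body in split_chapters(sample):
    print(heading, "->", body)
```

In this sketch, anything before the first matched heading (the "Preface" line above) is discarded; a real splitter might keep it as front matter instead.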

Error Handling

The workflow includes robust error handling:

  • Fatal Errors: Stop the workflow (e.g., file not found, all chapters failed)
  • Partial Failures: Continue processing (e.g., some chapters failed to generate)
  • Conditional Routing: route_after_step method routes based on error severity
  • Graceful Degradation: Falls back to single chapter if chapter detection fails

State Management

The AudiobookState TypedDict maintains workflow state:

{
    "book_path": str,          # Path to input PDF
    "book": Optional[Book],    # Book object with chapters
    "config": AudiobookConfig, # Configuration settings
    "output_dir": str,         # Output directory path
    "error": Optional[str]     # Error message (prefixed with "fatal" for fatal errors)
}
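
A minimal sketch of how the "fatal" prefix could drive conditional routing. The state class below is a trimmed stand-in (book and config are omitted), and the tts_error helper and the "end" / "continue" labels are assumptions; in LangGraph, a function like route_after_step would typically be passed to add_conditional_edges.

```python
from typing import Optional, TypedDict


class SketchState(TypedDict):
    """Trimmed-down stand-in for AudiobookState (book and config omitted)."""
    book_path: str
    output_dir: str
    error: Optional[str]


def tts_error(total: int, failed: int) -> Optional[str]:
    """Classify TTS results: None, a partial-failure note, or a fatal error."""
    if failed == 0:
        return None
    if failed >= total:
        return "fatal: all chapters failed to generate"
    return f"{failed} of {total} chapters failed to generate"


def route_after_step(state: SketchState) -> str:
    """Return "end" on a fatal error, otherwise "continue" (labels are hypothetical)."""
    error = state.get("error")
    if error and error.startswith("fatal"):
        return "end"
    return "continue"


state: SketchState = {"book_path": "mybook.pdf", "output_dir": "out", "error": tts_error(10, 2)}
print(route_after_step(state))
```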

Configuration

TTS Configuration

  • Provider: TTS engine (Edge TTS, Neuphonic, or pyttsx3)
  • Voice: Voice model to use
    • Edge TTS: "en-US-AriaNeural", "en-GB-SoniaNeural", etc.
    • Neuphonic: voice_id from Neuphonic dashboard
    • pyttsx3: System voice names (use list-voices)
  • Speed: Speech rate multiplier (0.5 - 2.0)
    • 0.5 = half speed
    • 1.0 = normal speed
    • 2.0 = double speed
  • Language: Language code (default: en)
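
Edge TTS expresses rate as a signed percentage string such as "+50%" rather than a multiplier. One plausible way to translate the speed setting is sketched below; the function name and the mapping itself are assumptions, and the project may convert the value differently.

```python
def speed_to_edge_rate(speed: float) -> str:
    """Convert a 0.5-2.0 speed multiplier into a signed percentage string."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    # 1.0 means no change; round() absorbs floating-point noise such as -19.999...
    percent = round((speed - 1.0) * 100)
    return f"{percent:+d}%"


for speed in (0.5, 0.8, 1.0, 1.5, 2.0):
    print(speed, "->", speed_to_edge_rate(speed))
```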

Output Configuration

  • Format: Audio format (mp3, wav, ogg, flac)
  • Bitrate: Audio bitrate for compressed formats (default: 128k)
  • Directory: Output directory path

Chapter Detection Configuration

  • Method: Detection method (currently supports regex)
  • Pattern: Regex pattern for matching chapter headings
    • Default: ^(Chapter|CHAPTER)\s+\d+
    • Also works with: ^Part \d+, ^Section \d+, etc.

Dependencies

Core dependencies:

  • langgraph (>=0.2.0): Workflow orchestration
  • langchain (>=0.3.0): LangChain framework
  • edge-tts (>=7.2.0): Microsoft Edge TTS (online, free)
  • pyneuphonic (>=1.8.0): Neuphonic TTS (premium quality)
  • pyttsx3 (>=2.90): Offline TTS engine
  • pypdf2 (>=3.0.0): PDF parsing
  • typer (>=0.12.0): CLI framework
  • rich (>=13.0.0): Beautiful CLI output
  • pyyaml (>=6.0): Configuration management
  • python-dotenv (>=1.0.0): Environment management
  • ebooklib (>=0.18): EPUB support (future feature)

Note: Audio format conversion uses ffmpeg directly (Python 3.13+ compatible). The pydub library has been removed for Python 3.13 compatibility.
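
For illustration, calling ffmpeg directly from Python might look like the sketch below. The function names and the set of formats that receive a bitrate are assumptions; only the ffmpeg flags (-y, -i, -b:a) are standard.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

# Assumption: only compressed (lossy) formats get an explicit bitrate
LOSSY_FORMATS = {"mp3", "ogg"}


def build_ffmpeg_command(src: str, dst: str, bitrate: str = "128k") -> List[str]:
    """Build an ffmpeg command; the output format is inferred from dst's extension."""
    fmt = Path(dst).suffix.lstrip(".").lower()
    cmd = ["ffmpeg", "-y", "-i", src]  # -y: overwrite the output without prompting
    if fmt in LOSSY_FORMATS:
        cmd += ["-b:a", bitrate]  # audio bitrate for compressed formats
    cmd.append(dst)
    return cmd


def convert_audio(src: str, dst: str, bitrate: str = "128k") -> None:
    """Run the conversion, failing early if ffmpeg is not installed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_ffmpeg_command(src, dst, bitrate), check=True, capture_output=True)


print(build_ffmpeg_command("chapter_01.wav", "chapter_01.mp3"))
```

Keeping command construction separate from execution makes the command easy to inspect or test on machines without ffmpeg.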

Development

Running Tests

pytest tests/

Code Structure

  • Models: Dataclasses for Book, Chapter, and configuration
  • Modules: Pluggable parsers, splitters, and TTS providers following abstract base classes
  • Agents: LangGraph workflow orchestration with conditional routing
  • CLI: User-facing command-line interface with Typer

Adding a New TTS Engine

  1. Create a new class in modules/tts/ that extends BaseTTS
  2. Implement the generate_audio method
  3. Add the provider to the TTS factory or configuration
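
The steps above might look roughly like this. The BaseTTS class here is a stand-in defined for the sketch: the real interface lives in modules/tts/base_tts.py, and its exact generate_audio signature (including whether it is async) is an assumption. SilenceTTS is a toy engine that ignores the text and writes one second of silence.

```python
import asyncio
import tempfile
import wave
from abc import ABC, abstractmethod
from pathlib import Path


class BaseTTS(ABC):
    """Stand-in for the project's abstract TTS interface (signature assumed)."""

    @abstractmethod
    async def generate_audio(self, text: str, output_path: str) -> str:
        """Synthesize text into output_path and return the path."""


class SilenceTTS(BaseTTS):
    """Toy engine: writes one second of 16 kHz mono silence, ignoring the text."""

    async def generate_audio(self, text: str, output_path: str) -> str:
        with wave.open(output_path, "wb") as wav:
            wav.setnchannels(1)       # mono
            wav.setsampwidth(2)       # 16-bit samples
            wav.setframerate(16000)   # 16 kHz
            wav.writeframes(b"\x00\x00" * 16000)  # 16000 silent frames = 1 second
        return output_path


out = Path(tempfile.mkdtemp()) / "chapter_01.wav"
asyncio.run(SilenceTTS().generate_audio("Hello", str(out)))
print(out.exists())
```

A real engine would replace the body of generate_audio with a call to its synthesis backend; how the new provider is registered depends on the project's factory or configuration code.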

Roadmap

Completed:

  • Edge TTS support (free, online)
  • Neuphonic TTS support (premium quality)
  • pyttsx3 TTS support (offline)
  • Multiple audio formats (MP3, WAV, OGG, FLAC)
  • Python 3.13 compatibility

Planned:

  • Add support for EPUB files
  • AI-based chapter detection using LLMs
  • OpenAI TTS support
  • Playlist generation for full audiobook (M3U/PLS)
  • Resume from partial generation
  • Parallel chapter processing for faster generation
  • Voice cloning support
  • Audio post-processing (normalization, compression)
  • Web UI for easier use

Troubleshooting

Common Issues

"ffmpeg not found"

  • Install ffmpeg (see Prerequisites section)
  • Make sure it's in your system PATH

"NEUPHONIC_API_KEY not set"

  • Only needed if using offline mode (--offline or audiobook-gen-offline)
  • Get an API key from https://app.neuphonic.com
  • Set the environment variable or add to .env file

"No chapters detected"

  • Check your chapter pattern with -p option
  • The default pattern looks for "Chapter 1", "Chapter 2", etc.
  • Use -p "^Part \d+" for "Part 1", "Part 2" style
  • If no pattern matches, the entire book becomes one chapter

"ModuleNotFoundError"

  • Make sure you activated the virtual environment: source venv/bin/activate
  • Or use the wrapper scripts that don't require activation

"Permission denied: ./audiobook-gen-offline"

  • Make the script executable: chmod +x audiobook-gen-offline

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Edge TTS for high-quality online text-to-speech
  • Neuphonic for premium neural text-to-speech
  • pyttsx3 for offline text-to-speech
  • LangGraph for workflow orchestration
  • Typer for the CLI framework
  • Rich for beautiful terminal output

Version

Current version: 0.1.0


Note: This project is under active development. Features and APIs may change.
