Materials for Codex Mentis Podcast

A Python tool to convert audio files (WAV, MP3, FLAC, M4A, AAC, OGG, WMA) into visually appealing MP4 videos with animated waveforms, logo animations and optional audio enhancement. This workflow was vibe-coded by Pablo Bernabeu for the Codex Mentis podcast, available on YouTube, Spotify, Apple Podcasts and iVoox.

Initial Workflow

This repository focuses on the video creation workflow. Before that step, the podcast itself is produced with artificial intelligence tools. A good option is to begin by writing a script with Google Gemini Pro, providing it with the key materials (an example prompt is available in Gemini prompt.txt). Next, the audio is generated with NotebookLM, using a prompt like the one in NotebookLM prompt.txt.

Video Creation Workflow

Audio Enhancement: Optional volume stabilisation optimised for AI-generated speech (use --enhance-audio to enable)
Professional Intro/Outro: Royalty-free musical intro and outro added to each episode
Audio Waveform: Continuous waveform visualisation representing real audio amplitude
Logo Animation: Subtle breathing animation effect for the podcast logo
Episode Images: Optional episode-specific images displayed on the right side
Professional Typography: Elegant serif fonts with proper text hierarchy
Smooth Effects: Professional transitions and animations
YouTube Ready: 1920x1080 HD resolution output
Batch Processing: Select one or more files from the input/ directory and process them in a single run
Smart Caching: Intelligent waveform analysis caching for faster repeat processing
Auto-Validation: Automatic logo file validation with helpful error guidance
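The logo's breathing effect can be modelled as a scale factor that oscillates gently around 1. The sketch below illustrates the idea and is not the tool's own code: the function name breathing_scale, the 3% amplitude and the 4-second period are all hypothetical values.

```python
import math


def breathing_scale(t: float, amplitude: float = 0.03, period: float = 4.0) -> float:
    """Return the logo scale factor at time t (in seconds).

    Hypothetical sketch: the factor follows a sine wave between
    1 - amplitude and 1 + amplitude, completing one 'breath' every
    `period` seconds. The defaults are illustrative, not the tool's values.
    """
    return 1.0 + amplitude * math.sin(2.0 * math.pi * t / period)


if __name__ == "__main__":
    base_size = 300  # hypothetical logo width in pixels
    for t in (0.0, 1.0, 2.0, 3.0):
        print(f"t={t:.1f}s -> {base_size * breathing_scale(t):.1f}px")
```

A sine wave keeps the motion smooth at both extremes, with no abrupt change of direction. In MoviePy 1.x, a function of time like this can be passed to a clip's resize method to animate its size.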

Setup Instructions

1. Install Dependencies

First, ensure you have FFmpeg installed on your system (required for audio format conversion):

Windows (easiest method using winget):

winget install Gyan.FFmpeg

After installation, restart your terminal for FFmpeg to be available.

Alternative Windows methods:

# Using chocolatey
choco install ffmpeg

# Or download manually from https://www.gyan.dev/ffmpeg/builds/
# Extract to C:\ffmpeg and add C:\ffmpeg\bin to your PATH

macOS:

brew install ffmpeg

Linux:

sudo apt-get install ffmpeg  # Ubuntu/Debian
sudo yum install ffmpeg      # CentOS/RHEL (requires the EPEL and RPM Fusion repositories)

Then, create and activate a virtual environment (recommended):

# Create virtual environment
python -m venv .venv

# Activate it (macOS/Linux)
source .venv/bin/activate

# Or on Windows Git Bash
source .venv/Scripts/activate

# Or on Windows Command Prompt
.venv\Scripts\activate

# Or on Windows PowerShell
.venv\Scripts\Activate.ps1

Then install the required packages:

pip install -r requirements.txt

2. Project Structure

Place your files in the following structure:

podcast/
├── assets/
│   └── podcast_logo.*            # Your podcast logo (any Pillow-supported image format, REQUIRED)
├── input/                        # Your audio files (WAV, MP3, FLAC, M4A, AAC, OGG, WMA)
│   └── .waveform_cache/          # Waveform analysis cache (created automatically)
├── output/                       # Generated MP4 files are saved here
├── src/
│   ├── audio_processor.py
│   ├── video_generator.py
│   ├── waveform_visualizer.py
│   └── main.py
├── docs/                         # Documentation, including hpc_usage.md
├── hpc/                          # Scripts for the Oxford ARC cluster
├── episode_titles.json           # Episode titles (created automatically)
├── run.sh                        # Helper script that adds FFmpeg to PATH
├── requirements.txt
└── README.md

3. Logo Requirements

File name: podcast_logo.* (with any supported image extension)
Location: assets/podcast_logo.[extension]
Recommended size: 500x500 pixels or larger (square format works best)
Format: Any format supported by Pillow (JPG, JPEG, PNG, GIF, BMP, TIFF, WEBP, AVIF, etc.)
Important: Only place ONE logo file in the assets/ directory to avoid conflicts
Required: A logo file is mandatory; the tool validates the logo setup automatically and exits with an error if no logo is found
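The logo check can be approximated with a few lines of standard-library Python. This is a hypothetical sketch (find_logo is not necessarily a function in src/): it looks for exactly one podcast_logo.* file in assets/ and, being stricter than a warning, treats several matches as an error.

```python
from pathlib import Path


def find_logo(assets_dir: Path) -> Path:
    """Return the single podcast_logo.* file in assets_dir.

    Raises FileNotFoundError if there is no logo and ValueError if there
    are several (hypothetical behaviour for illustration).
    """
    candidates = sorted(p for p in assets_dir.glob("podcast_logo.*") if p.is_file())
    if not candidates:
        raise FileNotFoundError(f"No podcast_logo.* file found in {assets_dir}")
    if len(candidates) > 1:
        names = ", ".join(p.name for p in candidates)
        raise ValueError(f"Multiple logo files found ({names}); keep only one")
    return candidates[0]
```

A fuller check could also open the file with Pillow and call Image.verify() to catch corrupted images before rendering starts.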

4. Usage

Run the main script:

# Using the helper script (recommended - handles FFmpeg PATH automatically)
./run.sh

# Or manually:
python src/main.py

# To enable audio enhancement (EQ, normalisation, click removal):
python src/main.py --enhance-audio

Note: If FFmpeg isn't found, restart your terminal or use the run.sh script, which adds FFmpeg to your PATH automatically.
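To check whether FFmpeg is visible from Python before running the tool, a small script such as the following may help. It is a sketch and not part of the repository; ffmpeg_version is a hypothetical helper name.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is not on PATH."""
    executable = shutil.which("ffmpeg")  # look up ffmpeg on the current PATH
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = ffmpeg_version()
    if version is None:
        print("FFmpeg not found on PATH: restart the terminal or use ./run.sh")
    else:
        print(version)
```

Run it from the same terminal (and virtual environment) you will use for src/main.py, since PATH can differ between terminals.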

The tool will:

Show available audio files in the input/ directory (supports WAV, MP3, FLAC, M4A, AAC, OGG, WMA)
Let you select which file(s) to process
Process each file, applying audio enhancement if --enhance-audio was given
Generate MP4 with animated waveform and logo
Save results in the output/ directory
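The file-selection step can be sketched as follows. The extension list comes from this README, but the function names and the "1,3" / "all" selection syntax are hypothetical; the tool's actual prompt may work differently.

```python
from pathlib import Path
from typing import List

# Extensions listed in this README; compared case-insensitively.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".aac", ".ogg", ".wma"}


def list_audio_files(input_dir: Path) -> List[Path]:
    """Return supported audio files in input_dir, sorted by name."""
    return sorted(
        p for p in input_dir.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def parse_selection(selection: str, files: List[Path]) -> List[Path]:
    """Turn a hypothetical menu answer such as '1,3' or 'all' into files."""
    selection = selection.strip().lower()
    if selection == "all":
        return list(files)
    chosen = []
    for part in selection.split(","):
        index = int(part) - 1  # menu numbers start at 1
        if not 0 <= index < len(files):
            raise ValueError(f"No file numbered {part.strip()}")
        chosen.append(files[index])
    return chosen


if __name__ == "__main__":
    input_dir = Path("input")
    if input_dir.is_dir():
        for number, path in enumerate(list_audio_files(input_dir), start=1):
            print(f"{number}. {path.name}")
```

Matching extensions case-insensitively means files such as Episode.MP3 are still picked up, while images, text files and the .waveform_cache/ directory are ignored.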

5. Audio Format Support

The tool supports multiple audio formats:

WAV - Uncompressed audio (recommended for best quality)
MP3 - Compressed audio
FLAC - Lossless compressed audio
M4A/AAC - Advanced Audio Coding
OGG - Ogg Vorbis
WMA - Windows Media Audio

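All formats are converted automatically with FFmpeg and then go through the same processing pipeline, so the video output has the same format and resolution whatever the input. Conversion cannot restore detail lost in lossy formats such as MP3, which is why WAV is recommended.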
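For reference, converting a compressed file to WAV with FFmpeg from Python might look like the sketch below. The flags are standard FFmpeg options, but the choice of 16-bit PCM at 44.1 kHz and the helper names are assumptions; the repository's own conversion code may use different settings or a different library.

```python
import subprocess
from pathlib import Path
from typing import List


def wav_conversion_command(src: Path, dst: Path, sample_rate: int = 44100) -> List[str]:
    """Build an FFmpeg command that decodes src into a 16-bit PCM WAV file."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input file (MP3, FLAC, M4A, ...)
        "-vn",                    # drop embedded cover art or other video streams
        "-ar", str(sample_rate),  # resample to a fixed sample rate
        "-c:a", "pcm_s16le",      # uncompressed 16-bit little-endian PCM
        str(dst),
    ]


def convert_to_wav(src: Path, dst: Path) -> None:
    """Run the conversion; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(wav_conversion_command(src, dst), check=True, capture_output=True)


if __name__ == "__main__":
    print(" ".join(wav_conversion_command(Path("input/episode.mp3"), Path("episode.wav"))))
```

Building the argument list separately from running it makes the command easy to inspect, and passing a list (rather than a single string) avoids shell-quoting problems with file names such as "My Episode.m4a".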

6. Episode Title Configuration

For each audio file, you can specify a custom episode title. The tool prompts you for it during processing; alternatively, edit the episode_titles.json file directly.
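If you prefer to edit titles programmatically, the sketch below reads and updates a JSON file of titles. The layout it assumes (a flat object mapping each audio file name to its title) is a guess rather than documented behaviour, so check an existing episode_titles.json before relying on it.

```python
import json
from pathlib import Path
from typing import Dict


def load_titles(path: Path) -> Dict[str, str]:
    """Return the stored titles, or an empty mapping if the file does not exist yet."""
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def set_title(path: Path, audio_name: str, title: str) -> None:
    """Add or replace the title for one audio file (assumed flat layout)."""
    titles = load_titles(path)
    titles[audio_name] = title
    with path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps accented characters readable in the file
        json.dump(titles, f, indent=2, ensure_ascii=False)
```

For example, set_title(Path("episode_titles.json"), "My Episode.wav", "My Episode Title") would store a title under the assumed layout.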

7. Optional Episode-Specific Images

You can add an optional image for each episode that will appear on the right side of the video:

Naming: The image's base name must match the audio file's (e.g., episode_name.png for episode_name.wav)
Location: Place it in the same input/ directory as the audio file
Formats: PNG, JPG, JPEG, WEBP, GIF, BMP
Display: Positioned automatically in the right half of the frame, vertically centred, at about 40% of the video height
Layering: Appears on top of the waveform, below text overlays

Example:

input/
├── My Episode.wav
├── My Episode.png          # Optional episode image
└── Another Episode.m4a
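The matching and placement rules above can be sketched as follows. The 1920x1080 frame, the image extensions and the ~40% height come from this README; centring the image horizontally within the right half, and the function names, are assumptions.

```python
from pathlib import Path
from typing import Optional, Tuple

VIDEO_W, VIDEO_H = 1920, 1080
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".webp", ".gif", ".bmp")


def find_episode_image(audio_path: Path) -> Optional[Path]:
    """Return an image sharing the audio file's base name, if one exists."""
    for ext in IMAGE_EXTENSIONS:
        for suffix in (ext, ext.upper()):
            candidate = audio_path.with_suffix(suffix)  # My Episode.wav -> My Episode.png
            if candidate.is_file():
                return candidate
    return None


def image_placement(img_w: int, img_h: int, height_fraction: float = 0.4) -> Tuple[int, int, int, int]:
    """Return (width, height, x, y) for an image scaled to height_fraction of the frame.

    The aspect ratio is preserved, the image is vertically centred, and it is
    centred horizontally within the right half of the frame (an assumption).
    """
    target_h = round(VIDEO_H * height_fraction)
    target_w = round(img_w * target_h / img_h)
    half_w = VIDEO_W // 2
    x = half_w + (half_w - target_w) // 2
    y = (VIDEO_H - target_h) // 2
    return target_w, target_h, x, y
```

For an 800x600 image this gives a 576x432 image whose top-left corner is at (1152, 324). Very wide images would spill into the left half, so a real implementation might also cap the width.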

Cache Management

Automatic Caching

The tool automatically creates cache files to improve performance:

Location: .waveform_cache/ directories next to your audio files
Purpose: Stores pre-computed waveform analysis data
Behaviour: Created automatically on first processing, reused on subsequent runs

Cache Benefits

Speed: Much faster video regeneration for audio files that have already been analysed
Reliability: Cache is invalidated automatically when source files change
Storage: Little disk space is needed, as analysis data is stored in binary pickle (.pkl) files

Manual Cache Management

If needed, you can manually manage cache files:

# Remove all cache files to force fresh analysis
find . -name ".waveform_cache" -type d -exec rm -rf {} +

# Remove the cache for a specific file
rm -f input/.waveform_cache/your_episode_waveform.pkl
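The find and rm commands above assume a Unix-like shell. On Windows Command Prompt or PowerShell, a short Python function can do the same job; the sketch below is an illustration, not a script shipped with the repository.

```python
import shutil
from pathlib import Path


def clear_waveform_caches(root: Path) -> int:
    """Delete every .waveform_cache directory under root; return how many were removed."""
    removed = 0
    # Materialise the matches first so deleting directories does not disturb the walk.
    for cache_dir in list(root.rglob(".waveform_cache")):
        if cache_dir.is_dir():  # may already be gone if it was nested in another cache
            shutil.rmtree(cache_dir)
            removed += 1
    return removed
```

Call clear_waveform_caches(Path(".")) from the project root. To remove a single cache file, Path("input/.waveform_cache/your_episode_waveform.pkl").unlink(missing_ok=True) (Python 3.8+) is the equivalent of rm -f.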

HPC Usage (Oxford ARC)

For batch processing on the Oxford ARC HPC cluster, see the dedicated HPC documentation:

docs/hpc_usage.md - Comprehensive HPC usage guide
hpc/README.md - Quick reference for HPC scripts

Quick Start on ARC

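Steps 1, 2 and 4, and the squeue command in step 5, run on ARC; the scp commands in steps 3 and 5 run on your local machine. Because $DATA in a local scp command would be expanded by your local shell, where it is usually undefined, replace the placeholder /path/to/DATA below with the output of echo $DATA on ARC.

# 1. Set up directory structure (one-time, on ARC)
cd ~/podcast/hpc
./setup_arc_structure.sh

# 2. Activate environment (on ARC)
source activate_project_env_arc.sh

# 3. Upload audio files (from your local machine)
scp episode.wav USER@arc-login.arc.ox.ac.uk:/path/to/DATA/podcast_env/input/

# 4. Submit batch job (on ARC)
./submit_video_conversion.sh

# 5. Monitor the job (on ARC), then retrieve the videos (from your local machine)
squeue -u $USER --clusters=htc
scp 'USER@arc-login.arc.ox.ac.uk:/path/to/DATA/podcast_env/output/*.mp4' ./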

Transcription (Enabled by Default)

The HPC workflow includes audio transcription by default using the secure_local_HPC_speech_transcription workflow:

# Default: video conversion + transcription
./submit_video_conversion.sh

# With name masking
./submit_video_conversion.sh --mask-personal-names

# Disable transcription (video only)
./submit_video_conversion.sh --no-transcription

See docs/hpc_usage.md for transcription setup instructions and all available options.

Troubleshooting

Logo Issues

Multiple logos detected: Remove extra files from assets/ directory, keep only one
Logo not found: The application exits with an error; make sure a podcast_logo.* file in a Pillow-supported image format is in the assets/ directory
Logo not displaying: Check file permissions and ensure file isn't corrupted

Performance Issues

Slow first run: Normal behaviour - waveform analysis cache is being built
Slow subsequent runs: Check if audio files were modified (triggers cache rebuild)
High memory usage: Normal for long audio files during initial processing

Technical Details

Output Resolution: 1920x1080 (YouTube HD)
Audio Enhancement (optional, --enhance-audio): Volume normalisation, gentle EQ and click removal optimised for AI speech
Intro/Outro Music: 3-second intro and 2.5-second outro with royalty-free harmonic progressions
Music Caching: Intro/outro music is cached during batch processing rather than recreated for each file
Waveform Style: Real-time amplitude visualisation with smooth scrolling animation
Logo Animation: Subtle scale animation (breathing effect) integrated with waveform
Typography: Professional serif fonts with episode title prominence
Performance: Optimised for long audio files with efficient memory usage
Smart Caching: Waveform analysis results cached for much faster regeneration of existing files
Cache Management: Automatic cache invalidation when source audio files are modified
Validation System: Comprehensive logo file checking with clear error messages
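The amplitude values behind a waveform like this can be derived by computing the root-mean-square (RMS) level of the samples that fall within each video frame. The sketch below assumes 30 frames per second and a mono NumPy array of samples (librosa.load returns mono audio by default); neither the frame rate nor the function name is taken from the repository.

```python
import numpy as np


def frame_rms_envelope(samples: np.ndarray, sample_rate: int, fps: int = 30) -> np.ndarray:
    """Return one RMS amplitude per video frame, scaled so the loudest frame is 1.0.

    samples: 1-D mono array of audio samples.
    fps: hypothetical video frame rate.
    """
    if samples.size == 0:
        return np.zeros(0)
    hop = int(round(sample_rate / fps))  # audio samples per video frame
    n_frames = int(np.ceil(samples.size / hop))
    padded = np.zeros(n_frames * hop)
    padded[: samples.size] = samples  # zero-pad the final partial frame
    frames = padded.reshape(n_frames, hop)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    peak = rms.max()
    return rms / peak if peak > 0 else rms
```

Normalising to the loudest frame keeps the drawn waveform within a fixed height regardless of recording level. librosa.feature.rms offers a similar calculation with overlapping windows, controlled by its frame_length and hop_length parameters.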

Performance Optimisations

Waveform Analysis Caching

The tool caches waveform analysis to speed up repeat video generation:

First Processing: Audio waveform analysis is performed and cached in .waveform_cache/ directories
Subsequent Processing: Cached analysis data is loaded from disk, skipping the expensive audio analysis
Smart Invalidation: Cache is automatically regenerated when source audio files are modified
Cache Validation: File hash verification ensures cache integrity and freshness
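A minimal sketch of hash-validated caching follows. It stores the analysis result together with a SHA-256 hash of the source audio and recomputes whenever the hash changes. The hash algorithm, the cache-entry layout and the function names are assumptions; only the .waveform_cache/ location and the _waveform.pkl naming follow the example path given under Manual Cache Management.

```python
import hashlib
import pickle
from pathlib import Path
from typing import Any, Callable


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so long recordings are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_analysis(audio_path: Path, analyse: Callable[[Path], Any]) -> Any:
    """Return analyse(audio_path), reusing a cached result if the audio is unchanged."""
    cache_dir = audio_path.parent / ".waveform_cache"
    cache_file = cache_dir / f"{audio_path.stem}_waveform.pkl"
    source_hash = file_sha256(audio_path)
    if cache_file.exists():
        with cache_file.open("rb") as f:
            entry = pickle.load(f)
        if entry.get("source_hash") == source_hash:
            return entry["data"]  # cache hit: the source audio is unchanged
    data = analyse(audio_path)  # cache miss or stale entry: recompute
    cache_dir.mkdir(exist_ok=True)
    with cache_file.open("wb") as f:
        pickle.dump({"source_hash": source_hash, "data": data}, f)
    return data
```

Hashing the file contents, rather than comparing modification times, means a file that is copied or touched without changing is still recognised. Because pickle can execute code when loading, only load cache files the tool itself created.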

Logo Validation System

Logo validation catches setup problems before any video is rendered:

Automatic Detection: Scans assets/ directory for logo files during startup
Conflict Prevention: Warns if multiple logo files are detected and provides guidance
Error Guidance: Clear instructions when logo setup is incorrect or missing
Seamless Integration: Validation runs automatically at startup and only stops the workflow when no logo is found

Dependencies

moviepy: Video editing and generation
librosa: Audio analysis and processing
numpy: Numerical computations
opencv-python: Image processing
pillow: Image manipulation
scipy: Signal processing
