Unreal Engine 5.5 C++ project featuring an AI-powered conversational MetaHuman with Text-to-Speech, facial animation via NVIDIA Audio2Face, and chatbot integration.
- Text-to-Speech Component: Blueprint-compatible component using OpenAI's TTS API (model: tts-1, voice: alloy)
- NVIDIA Audio2Face Integration: Real-time facial animation generation from audio via NVIDIA ACE
- Dify Chatbot Integration: AI conversation management with streaming responses
- MetaHuman Support: Full ARKit to MetaHuman blendshape mapping (55+ facial curves)
- Voice Capture: Real-time microphone input with Silero VAD (Voice Activity Detection)
- Runtime Audio: Uses USoundWaveProcedural for dynamic audio playback
| Requirement | Version | Notes |
|---|---|---|
| Unreal Engine | 5.5.4 | Required for MetaHuman and ACE plugin compatibility |
| Visual Studio | 2022 | For C++ compilation |
| NVIDIA MetaHuman | Hadley | Required for Audio2Face integration |
| NVIDIA ACE Plugin | 2.5+ | Required for Audio2Face integration |
| Operating System | Windows 10/11 | Linux not currently supported |
| Component | Minimum | Recommended |
|---|---|---|
| GPU | NVIDIA RTX 2080 | RTX 3080 or higher |
| RAM | 16 GB | 32 GB |
| VRAM | 8 GB | 10+ GB |
| Storage | 50 GB free | SSD recommended |
You must obtain the following API keys and set them as environment variables:
| Service | Environment Variable | Purpose | Get Key From |
|---|---|---|---|
| OpenAI | OPENAI_API_KEY | Text-to-Speech (TTS) | platform.openai.com |
| NVIDIA | NVIDIA_API_KEY | Audio2Face facial animation | build.nvidia.com |
| Dify | DIFY_API_KEY | Chatbot conversation management | Your Dify instance dashboard |
Windows PowerShell:
$env:OPENAI_API_KEY="sk-proj-your-key-here"
$env:NVIDIA_API_KEY="nvapi-your-key-here"
$env:DIFY_API_KEY="app-your-key-here"

Windows CMD:
set OPENAI_API_KEY=sk-proj-your-key-here
set NVIDIA_API_KEY=nvapi-your-key-here
set DIFY_API_KEY=app-your-key-here

Windows System Environment (Permanent):
- Open System Properties → Advanced → Environment Variables
- Add new User or System variables for each key
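
As a rough illustration of how the three variables listed above could be checked at startup, here is a minimal sketch using only the C++ standard library. The helper names `ReadApiKey` and `MissingApiKeys` are hypothetical and not part of this project; inside Unreal code, `FPlatformMisc::GetEnvironmentVariable` would be the more usual way to read the same values.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: returns the variable's value, or an empty string
// when the variable is not set.
std::string ReadApiKey(const char* Name)
{
    const char* Value = std::getenv(Name);
    return Value != nullptr ? std::string(Value) : std::string();
}

// Hypothetical helper: lists which of the three documented variables
// are unset or empty, so a caller can report them before any API call.
std::vector<std::string> MissingApiKeys()
{
    const char* Required[] = {"OPENAI_API_KEY", "NVIDIA_API_KEY", "DIFY_API_KEY"};
    std::vector<std::string> Missing;
    for (const char* Name : Required)
    {
        if (ReadApiKey(Name).empty())
        {
            Missing.push_back(Name);
        }
    }
    return Missing;
}
```

Reporting every missing key at once, rather than failing on the first, tends to make first-time setup quicker.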
- Download the ACE Plugin from NVIDIA:
- Install the plugin to your Unreal Engine installation
- Enable the plugin in your project settings
- Verify installation by checking for "NV ACE Reference" in Plugins menu
- Clone the repository
- Right-click on `FlopperamUnrealMCP.uproject` and select "Generate Visual Studio project files"
- Open the project in Unreal Engine 5.5
- Compile the C++ code
- Add a `TTSComponent` to your actor
- Set the OpenAI API key using the `SetAPIKey` node
- Call `PlayTTSAudio` with your text to speak
- Or use `PlayAudioFromFile` to play local WAV files
// Add component
UTTSComponent* TTS = CreateDefaultSubobject<UTTSComponent>(TEXT("TTSComponent"));
// Set API key
TTS->SetAPIKey("your-api-key");
// Play TTS
TTS->PlayTTSAudio("Hello world");
// Or play from file
TTS->PlayAudioFromFile("path/to/audio.wav");

The project includes comprehensive emotion parameter controls for NVIDIA Audio2Face facial animation generation. These parameters are fully validated against NVIDIA's official ACE plugin source code and native library documentation.
All parameters are configurable in the Details Panel when selecting an Audio2FaceClient component:
| Parameter | Range | Default | Description |
|---|---|---|---|
| PreferredEmotionStrength | 0.0 - 1.0 | 1.0 | Overall emotion intensity (maps to OverallEmotionStrength) |
| EmotionStrength | 0.0 - 1.0 | 1.0 | Legacy parameter for backward compatibility |
| Parameter | Range | Default | NVIDIA Mapping | Description |
|---|---|---|---|---|
| DetectedEmotionContrast | 0.3 - 3.0 | 1.0 | emotion_contrast | Increases spread between emotion values by pushing them higher or lower |
| MaxDetectedEmotions | 1 - 6 | 3 | max_emotions | Sets firm limit on quantity of emotion sliders engaged by A2E |
| DetectedEmotionSmoothing | 0.0 - 1.0 | 0.7 | live_blend_coef | Coefficient for smoothing emotions over time (0=jittery, 1=no updates) |
| bEnableEmotionOverride | boolean | true | enable_preferred_emotion | Activate blending between app-provided emotions and A2F detection |
| EmotionOverrideStrength | 0.0 - 1.0 | 0.5 | preferred_emotion_strength | Balance between app overrides (1.0) and A2F detection (0.0) |
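
The ranges in the tables above can be enforced before values are handed to the plugin. The following is a minimal sketch in plain C++, not the project's actual code: `EmotionSettings`, `ClampTo` and `ClampToDocumentedRanges` are hypothetical names, and the struct only mirrors the documented fields and defaults.

```cpp
#include <algorithm>

// Hypothetical plain struct mirroring the documented parameters and
// defaults; the real values live on the Audio2FaceClient component.
struct EmotionSettings
{
    float PreferredEmotionStrength = 1.0f;  // documented range 0.0 - 1.0
    float DetectedEmotionContrast  = 1.0f;  // documented range 0.3 - 3.0
    int   MaxDetectedEmotions      = 3;     // documented range 1 - 6
    float DetectedEmotionSmoothing = 0.7f;  // documented range 0.0 - 1.0
    bool  bEnableEmotionOverride   = true;
    float EmotionOverrideStrength  = 0.5f;  // documented range 0.0 - 1.0
};

// Restrict Value to the closed interval [Low, High].
template <typename T>
T ClampTo(T Value, T Low, T High)
{
    return std::max(Low, std::min(Value, High));
}

// Returns a copy with every numeric field pulled into its documented range.
EmotionSettings ClampToDocumentedRanges(EmotionSettings In)
{
    In.PreferredEmotionStrength = ClampTo(In.PreferredEmotionStrength, 0.0f, 1.0f);
    In.DetectedEmotionContrast  = ClampTo(In.DetectedEmotionContrast, 0.3f, 3.0f);
    In.MaxDetectedEmotions      = ClampTo(In.MaxDetectedEmotions, 1, 6);
    In.DetectedEmotionSmoothing = ClampTo(In.DetectedEmotionSmoothing, 0.0f, 1.0f);
    In.EmotionOverrideStrength  = ClampTo(In.EmotionOverrideStrength, 0.0f, 1.0f);
    return In;
}
```

In an Unreal component the same limits are commonly expressed with `ClampMin`/`ClampMax` property metadata; whether this project does so is not stated here.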
These parameters are validated against NVIDIA's official sources:
Source Files Verified:
- /Engine/Plugins/Marketplace/NV_ACE_Reference/Source/LegacyA2FRemote/Private/LegacyA2FRemote.cpp (lines 404-409)
- /Engine/Plugins/Marketplace/NV_ACE_Reference/Source/ThirdParty/Nvacl/Include/nvacl.h
Official Parameter Mapping:
NvACEEmotionParameters NvParams;
NvParams.emotion_contrast = InEmotionParameters->DetectedEmotionContrast;
NvParams.live_blend_coef = InEmotionParameters->DetectedEmotionSmoothing;
NvParams.enable_preferred_emotion = InEmotionParameters->bEnableEmotionOverride;
NvParams.preferred_emotion_strength = InEmotionParameters->EmotionOverrideStrength;
NvParams.emotion_strength = InEmotionParameters->OverallEmotionStrength;
NvParams.max_emotions = InEmotionParameters->MaxDetectedEmotions;

// Configure emotion parameters in Blueprint or C++
UAudio2FaceClient* A2FClient = GetComponentByClass<UAudio2FaceClient>();
// High emotion intensity with strong contrast
A2FClient->PreferredEmotionStrength = 1.0f; // Full emotion strength
A2FClient->DetectedEmotionContrast = 2.0f; // Strong emotion contrast
A2FClient->MaxDetectedEmotions = 4; // Allow up to 4 emotions
A2FClient->DetectedEmotionSmoothing = 0.5f; // Moderate smoothing
A2FClient->bEnableEmotionOverride = true; // Enable blending
A2FClient->EmotionOverrideStrength = 0.3f; // Favor A2F detection
// Process audio with these parameters
A2FClient->ProcessAudioToAnimation(AudioData, AudioDuration);

DetectedEmotionContrast (emotion_contrast):
- 0.3 = Subtle emotion differences
- 1.0 = Natural emotion spread (default)
- 3.0 = Extreme emotion differences

DetectedEmotionSmoothing (live_blend_coef):
- 0.0 = No smoothing (can be jittery)
- 0.7 = Good balance (default)
- 1.0 = Maximum smoothing (emotions barely change)

EmotionOverrideStrength (preferred_emotion_strength):
- 0.0 = Use only A2F-detected emotions
- 0.5 = Blend app and detected emotions (default)
- 1.0 = Use only app-provided emotion overrides
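
To make the endpoint behavior of the smoothing and override values concrete, here is a small numeric sketch. The formulas are assumptions chosen only because they reproduce the endpoints described above (an exponential blend for smoothing, a linear interpolation for the override); the actual computation happens inside NVIDIA's library and may differ. `SmoothEmotion` and `BlendOverride` are hypothetical names.

```cpp
// Assumed model: exponential smoothing of one emotion value over time.
// Coef = 0 returns the newly detected value (no smoothing, can jitter);
// Coef = 1 keeps the previous value (the emotion never updates).
float SmoothEmotion(float Previous, float Detected, float Coef)
{
    return Coef * Previous + (1.0f - Coef) * Detected;
}

// Assumed model: linear blend between detection and an app-provided value.
// Strength = 0 returns the detected value only;
// Strength = 1 returns the app-provided override only.
float BlendOverride(float Detected, float AppOverride, float Strength)
{
    return (1.0f - Strength) * Detected + Strength * AppOverride;
}
```

Under this assumed model, the default smoothing of 0.7 moves each value 30% of the way toward the newly detected emotion on every update.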
Enable bEnableDebugLogging to see parameter values in console:
Audio2FaceClient: Emotion Parameters:
  OverallEmotionStrength: 1.00
  DetectedEmotionContrast: 1.00
  MaxDetectedEmotions: 3
  DetectedEmotionSmoothing: 0.70
  bEnableEmotionOverride: true
  EmotionOverrideStrength: 0.50
Audio from OpenAI's TTS API was playing at double speed in UE5.5, making speech unintelligible.
| Session | Attempt | Result | Issue |
|---|---|---|---|
| Session 1 | Manual WAV parsing | 2x speed | Incorrect PCM offset/size calculation |
| Session 2 | Pitch adjustment (0.5x) | Correct speed | Voice became masculine |
| Session 3 | Sample duplication | Correct speed | Lowered pitch, masculine voice |
| Session 4 | Zero padding | Correct speed | Audio artifacts, messy sound |
| Session 5 | Linear interpolation | Crash | Array bounds error |
| Session 6 | Sample rate lie (48kHz) | 2x speed | UE5 ignored the lie |
| Session 7 | Cubic interpolation | Unknown | Over-engineered solution |
| FINAL | UE5 built-in parser | FIXED | Perfect playback |
The issue wasn't with UE5's ability to play 24kHz audio. The manual WAV parser was incorrectly reading the PCM data offset or size, causing UE5 to receive malformed audio data.
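
The document does not show which mistake the manual parser made. One common way to get the PCM offset or size wrong is to assume a fixed 44-byte header, which fails whenever extra chunks precede the `data` chunk. The sketch below, in plain C++ with the hypothetical names `WavInfo` and `FindWavData`, walks the RIFF chunks instead. It is an illustration of the failure mode, not the project's original parser.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct WavInfo
{
    uint16_t NumChannels = 0;
    uint32_t SampleRate = 0;
    uint16_t BitsPerSample = 0;
    size_t   DataOffset = 0;  // byte offset of the first PCM sample
    size_t   DataSize = 0;    // PCM payload size in bytes
};

// Little-endian reads that do not depend on host byte order or alignment.
static uint16_t ReadU16(const uint8_t* P)
{
    return static_cast<uint16_t>(P[0] | (P[1] << 8));
}
static uint32_t ReadU32(const uint8_t* P)
{
    return static_cast<uint32_t>(P[0]) | (static_cast<uint32_t>(P[1]) << 8) |
           (static_cast<uint32_t>(P[2]) << 16) | (static_cast<uint32_t>(P[3]) << 24);
}

// Walks the chunk list and fills Out once both "fmt " and "data" are found.
bool FindWavData(const std::vector<uint8_t>& Bytes, WavInfo& Out)
{
    if (Bytes.size() < 12 || std::memcmp(&Bytes[0], "RIFF", 4) != 0 ||
        std::memcmp(&Bytes[8], "WAVE", 4) != 0)
    {
        return false;
    }
    bool bHaveFmt = false;
    size_t Pos = 12;  // first chunk follows the 12-byte RIFF/WAVE header
    while (Pos + 8 <= Bytes.size())
    {
        const uint8_t* Chunk = &Bytes[Pos];
        const size_t Size = ReadU32(Chunk + 4);
        const size_t Body = Pos + 8;
        const size_t Remaining = Bytes.size() - Body;
        if (std::memcmp(Chunk, "fmt ", 4) == 0 && Size >= 16 && Remaining >= 16)
        {
            Out.NumChannels   = ReadU16(&Bytes[Body + 2]);
            Out.SampleRate    = ReadU32(&Bytes[Body + 4]);
            Out.BitsPerSample = ReadU16(&Bytes[Body + 14]);
            bHaveFmt = true;
        }
        else if (std::memcmp(Chunk, "data", 4) == 0)
        {
            Out.DataOffset = Body;
            // Clamp in case the header declares more bytes than were received.
            Out.DataSize = Size < Remaining ? Size : Remaining;
            return bHaveFmt;
        }
        if (Size > Remaining)
        {
            break;  // truncated chunk, nothing more to read
        }
        Pos = Body + Size + (Size & 1);  // chunk bodies are padded to even length
    }
    return false;
}
```

With a `LIST` chunk between `fmt ` and `data`, the payload starts later than byte 44, so a fixed-offset parser would hand metadata bytes to the audio system as if they were samples.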
Use Unreal Engine's built-in FWaveModInfo class from AudioDecompress.h:
#include "AudioDecompress.h"
bool UTTSComponent::ParseWav(const TArray<uint8>& WavBytes,
int32& OutSampleRate,
int32& OutNumChannels,
int32& OutBitsPerSample,
const uint8*& OutPCM,
int32& OutPCMSize) const
{
FWaveModInfo Info;
if (!Info.ReadWaveInfo((uint8*)WavBytes.GetData(), WavBytes.Num()))
return false;
OutNumChannels = *Info.pChannels;
OutSampleRate = *Info.pSamplesPerSec;
OutBitsPerSample = *Info.pBitsPerSample;
OutPCM = Info.SampleDataStart;
OutPCMSize = static_cast<int32>(Info.SampleDataSize);
return true;
}

- Always use framework tools: UE5's `FWaveModInfo` handles edge cases that manual parsing misses
- Don't manipulate audio data: Pass PCM directly without modification
- Use correct settings: `SOUNDGROUP_Voice` for TTS audio
- Trust the framework: UE5 handles 24kHz audio perfectly when given correct data
- Simplicity wins: The fix was simpler than all the workarounds attempted
- Framework knowledge: Knowing about `AudioDecompress.h` would have saved hours
- Wrong assumptions: The "bug" was in the parsing, not in UE5's audio system
- Over-engineering trap: Complex solutions (interpolation, resampling) were unnecessary
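
A cheap sanity check in the spirit of these lessons is to compute the expected playback length from the parsed header fields and compare it with what is actually heard. The sketch below is plain C++ with a hypothetical function name. The 24 kHz rate comes from the text above, while mono 16-bit samples are an assumption made only for the example figures.

```cpp
#include <cstdint>

// Expected playback length in seconds for uncompressed PCM.
// Returns 0.0 when the header fields are unusable.
double PcmDurationSeconds(uint64_t PcmBytes, uint32_t SampleRate,
                          uint16_t NumChannels, uint16_t BitsPerSample)
{
    const uint64_t BytesPerFrame =
        static_cast<uint64_t>(NumChannels) * (BitsPerSample / 8);
    if (SampleRate == 0 || BytesPerFrame == 0)
    {
        return 0.0;
    }
    return static_cast<double>(PcmBytes) /
           (static_cast<double>(BytesPerFrame) * SampleRate);
}
```

For example, 48,000 bytes of mono 16-bit PCM last 1.0 s at 24 kHz but only 0.5 s if the same bytes are treated as 48 kHz, which matches the double-speed symptom described earlier.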
FlopperamUnrealMCP/
├── Source/
│   └── FlopperamUnrealMCP/
│       ├── Public/
│       │   └── TTSComponent.h        # Component header
│       └── Private/
│           └── TTSComponent.cpp      # Implementation
├── Content/                          # Unreal assets
├── Config/                           # Project configuration
└── FlopperamUnrealMCP.uproject       # Project file
Please follow the version control protocol:
- Only modify files in the `Source/` directory
- Don't modify `.uasset` files
- Commit after each meaningful change
- Use descriptive commit messages
[Project License Information]
- OpenAI for the TTS API
- Unreal Engine community for audio system documentation
- Special thanks to the debugging sessions that led to the final fix