
feat: integrate RAG system with proactive tool calling #1

Open
diannt wants to merge 3 commits into main from feat/rag-proactive-tool-calling


@diannt diannt commented Nov 22, 2025

Owner

RAG System Integration with Proactive Tool Calling

Overview

This PR integrates a comprehensive RAG (Retrieval-Augmented Generation) system with proactive tool calling capabilities to enhance the VTuber experience with intelligent, context-aware responses.

Purpose

The RAG system analyzes user interests during a conversation and injects relevant retrieved information into the batch input. When RAG is disabled or its backing services are unavailable, the conversation flow is unchanged.

Files Modified

Core Implementation Files

  1. src/open_llm_vtuber/config_manager/main.py

    • Added RAGConfig import and integration
    • Added rag_config field with default factory
    • Added internationalized description for RAG configuration
  2. src/open_llm_vtuber/conversations/single_conversation.py

    • Implemented proactive tool calling integration
    • Added lazy initialization of proactive tool caller
    • Enhanced batch input with proactive information injection
    • Added proper error handling and logging
  3. src/open_llm_vtuber/service_context.py

    • Added RAG system state management (rag_enabled, rag_components)
    • Implemented comprehensive init_rag() method
    • Added connection validation for Ollama and Qdrant services
    • Implemented graceful degradation when RAG services are unavailable
  4. uv.lock

    • Added ElevenLabs TTS dependency (elevenlabs==2.21.0)
    • Updated dependency resolution
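
As a rough illustration of the `rag_config` field with a default factory, the sketch below uses stdlib dataclasses as a stand-in for the project's real config classes. The field names mirror the YAML under Configuration Requirements; the `AppConfig` name and the `enabled = False` default are assumptions, not taken from the PR.

```python
from dataclasses import dataclass, field


@dataclass
class OllamaConfig:
    base_url: str = "http://localhost:11434"
    embedding_model: str = "nomic-embed-text"


@dataclass
class QdrantConfig:
    host: str = "localhost"
    port: int = 6333
    collection_name: str = "vtuber_knowledge"


@dataclass
class RAGConfig:
    # Assumption: RAG stays off unless the user opts in.
    enabled: bool = False
    ollama: OllamaConfig = field(default_factory=OllamaConfig)
    qdrant: QdrantConfig = field(default_factory=QdrantConfig)


@dataclass
class AppConfig:  # hypothetical stand-in for the top-level config model
    # default_factory gives every instance its own RAGConfig, so a config
    # file that omits the rag_config section still loads with defaults.
    rag_config: RAGConfig = field(default_factory=RAGConfig)
```

With this shape, an existing `conf.yaml` without a `rag_config` section would keep working, which is the backward-compatibility property the PR describes.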

Impact on Existing Functionality

Enhanced Features

  • Proactive Information Injection: System can now detect user interests and proactively provide relevant information
  • RAG Integration: Full integration with document processing, embedding, and retrieval systems
  • Lazy Component Loading: Optimized initialization prevents service slowdowns
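
A minimal sketch of proactive information injection, assuming the batch input holds a list of text items. `TextData` and `TextSource` are named in the PR, but their fields here are guesses, and `BatchInput`, `inject_proactive_info`, and the `lookup` callable are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class TextSource(Enum):  # stand-in; the real enum's members may differ
    INPUT = "input"
    PROACTIVE = "proactive"


@dataclass
class TextData:  # stand-in for agent.input_types.TextData
    source: TextSource
    content: str


@dataclass
class BatchInput:  # hypothetical container for one conversation turn
    texts: List[TextData] = field(default_factory=list)


def inject_proactive_info(
    batch: BatchInput,
    user_text: str,
    lookup: Callable[[str], Optional[str]],
) -> BatchInput:
    """Append retrieved context to the batch without ever failing the turn."""
    try:
        result = lookup(user_text)
    except Exception:
        # A failing retriever must not break the normal conversation flow.
        return batch
    if result:
        batch.texts.append(TextData(TextSource.PROACTIVE, result))
    return batch
```

The user's own text stays first in the batch, and the retrieved text is tagged with its own source so downstream code can tell the two apart.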

Backward Compatibility

  • All changes are backward compatible
  • RAG system is optional and gracefully degrades if services are unavailable
  • Existing conversation flow remains unchanged when RAG is disabled
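
The graceful degradation described above could be sketched as follows. `rag_enabled`, `rag_components`, and `init_rag` are names from this PR; the method signature, the injected `checks` mapping, and the placeholder component values are assumptions made so the example is self-contained.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Dict

logger = logging.getLogger(__name__)


@dataclass
class ServiceContext:  # reduced stand-in for the real class
    rag_enabled: bool = False
    rag_components: Dict[str, object] = field(default_factory=dict)

    def init_rag(
        self,
        enabled_in_config: bool,
        checks: Dict[str, Callable[[], bool]],
    ) -> None:
        """Enable RAG only if every required service passes its check."""
        self.rag_enabled = False
        self.rag_components = {}
        if not enabled_in_config:
            return
        for name, check in checks.items():
            try:
                ok = check()
            except Exception as exc:
                logger.warning("RAG check %s raised: %s", name, exc)
                ok = False
            if not ok:
                logger.warning("%s unavailable; continuing without RAG", name)
                return
        # Hypothetical: the real components would be constructed here.
        self.rag_components = {name: "ready" for name in checks}
        self.rag_enabled = True
```

Because every failure path returns with `rag_enabled` left at `False`, callers only need to check that one flag before taking the RAG path.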

Performance Considerations

  • Lazy initialization prevents unnecessary resource usage
  • Connection validation prevents hanging on service unavailability
  • Error handling ensures system stability
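
One way to picture the lazy initialization mentioned above is a small wrapper that builds its value on first use. The `Lazy` class is a hypothetical helper, not code from this PR; the proactive tool caller would be the expensive object passed in as the factory.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class Lazy(Generic[T]):
    """Build a value on first access and cache it afterwards."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._value: Optional[T] = None
        self._built = False

    def get(self) -> T:
        if not self._built:
            # The factory runs once, on the first conversation that needs it.
            self._value = self._factory()
            self._built = True
        return self._value
```

Conversations that never trigger proactive tool calling would never pay the construction cost under this approach.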

Related Components

This PR integrates with several existing and new RAG system components:

  • src/open_llm_vtuber/rag/document_processor.py - Document processing
  • src/open_llm_vtuber/rag/retrieval_engine.py - Information retrieval
  • src/open_llm_vtuber/rag/session_manager.py - Session management
  • src/open_llm_vtuber/rag/report_generator.py - Report generation
  • src/open_llm_vtuber/rag/proactive_tool_caller.py - Proactive tool invocation

Testing Notes

Manual Testing Required

  1. RAG System Initialization

    • Verify RAG system initializes when enabled in config
    • Test graceful degradation when Ollama/Qdrant are unavailable
    • Confirm connection validation works correctly
  2. Proactive Tool Calling

    • Test proactive information injection during conversations
    • Verify lazy initialization of proactive tool caller
    • Test error handling in proactive tool calling path
  3. Configuration Management

    • Verify RAG configuration loads correctly
    • Test internationalized descriptions display properly
    • Confirm default configuration behavior

Integration Testing

  • Test with existing conversation flow
  • Verify no impact on non-RAG conversations
  • Test ElevenLabs TTS integration

Configuration Requirements

rag_config:
  enabled: true
  ollama:
    base_url: "http://localhost:11434"
    embedding_model: "nomic-embed-text"
  qdrant:
    host: "localhost"
    port: 6333
    collection_name: "vtuber_knowledge"
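
As a hedged sketch of connection validation against this configuration, the helper below tries a TCP connection with a timeout so startup cannot hang. It assumes a plain TCP connect is an acceptable proxy for service health; the PR's actual checks may use HTTP requests instead. `is_reachable` and `validate_rag_services` are hypothetical names.

```python
import socket
from urllib.parse import urlparse


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def validate_rag_services(rag_config: dict, timeout: float = 2.0) -> dict:
    """Check the Ollama and Qdrant endpoints named in rag_config."""
    ollama_url = urlparse(rag_config["ollama"]["base_url"])
    qdrant = rag_config["qdrant"]
    return {
        "ollama": is_reachable(ollama_url.hostname, ollama_url.port or 80, timeout),
        "qdrant": is_reachable(qdrant["host"], qdrant["port"], timeout),
    }
```

The returned mapping makes it easy to log which service was unreachable before falling back to non-RAG behavior.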

Dependencies

  • elevenlabs>=1.0.0 - For enhanced TTS capabilities (uv.lock pins elevenlabs==2.21.0)
  • Requires Ollama service for embeddings
  • Requires Qdrant vector database for RAG functionality

Breaking Changes

None - all changes maintain backward compatibility.

Review Checklist

  • Code follows project style guidelines
  • Error handling is comprehensive
  • Configuration management is robust
  • Performance implications are acceptable
  • Documentation is clear and complete
  • Testing coverage is adequate

Labels

  • enhancement - New feature implementation
  • rag - RAG system related changes
  • ai-integration - AI/ML integration work
  • configuration - Configuration management changes

Reviewers

Recommended reviewers based on affected areas:

  • Core VTuber Logic: For conversation processing changes
  • RAG System: For RAG integration components
  • Configuration: For config management changes
  • TTS Integration: For ElevenLabs TTS changes

CI/CD Status

Please verify the following checks pass:

  • All existing tests continue to pass
  • New functionality tests pass
  • Configuration validation tests pass
  • Integration tests with RAG components pass

… issues

This fix addresses the async issues mentioned in conf.yaml, lines 272-273:
'Note: TTS has async issues in v1.2.1, text responses work fine'

Changes:
- Use asyncio.to_thread() to wrap edge_tts.save_sync() preventing event loop conflicts
- Enhanced FFmpeg detection with WSL support for cross-platform audio conversion
- Added retry logic and file validation in TTS generation
- Improved error handling throughout TTS pipeline
- Added TTS configuration API endpoints and WebSocket message handling
- Created utility module for TTS config serialization
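
A minimal sketch of FFmpeg detection with a WSL fallback, assuming a native binary on PATH is preferred and that `wsl which ffmpeg` is a reasonable probe. The function name `find_ffmpeg` is hypothetical, and the search of common installation paths mentioned in this commit is not reproduced here.

```python
import shutil
import subprocess
from typing import Optional, Tuple


def find_ffmpeg() -> Optional[Tuple[str, ...]]:
    """Return the command prefix used to invoke FFmpeg, or None if not found."""
    native = shutil.which("ffmpeg")
    if native:
        return (native,)
    wsl = shutil.which("wsl")
    if wsl:
        try:
            # Ask the default WSL distribution whether ffmpeg is installed.
            probe = subprocess.run(
                [wsl, "which", "ffmpeg"],
                capture_output=True,
                text=True,
                timeout=10,
            )
        except (OSError, subprocess.TimeoutExpired):
            return None
        if probe.returncode == 0 and probe.stdout.strip():
            return (wsl, "ffmpeg")
    return None
```

Returning a command prefix rather than a single path lets callers build the final `subprocess` argument list the same way for native and WSL installs.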

Technical details:
- Edge TTS was experiencing async event loop conflicts when called from uvicorn's
  async context, causing MP3 files to never be created
- Solution uses asyncio.to_thread() to run synchronous save_sync() in separate
  thread, preventing conflicts with uvicorn's event loop
- FFmpeg detection now supports Windows PATH, WSL, and common installation paths
- WSL FFmpeg conversion uses explicit subprocess calls via 'wsl' executable
- Retry logic ensures robustness against transient network or file system issues
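
The thread offloading, retry, and file validation described above can be sketched roughly as below. `save_with_retry` and its parameters are hypothetical; for Edge TTS the `save_sync` argument would be the synchronous save call named in this commit message.

```python
import asyncio
import os
from typing import Callable, Optional


async def save_with_retry(
    save_sync: Callable[[str], None],
    file_name: str,
    max_retries: int = 3,
    delay: float = 0.5,
) -> str:
    """Run a blocking save in a worker thread and verify the output file."""
    last_error: Optional[Exception] = None
    for attempt in range(max_retries):
        try:
            # to_thread keeps the blocking call off the server's event loop.
            await asyncio.to_thread(save_sync, file_name)
            if os.path.isfile(file_name) and os.path.getsize(file_name) > 0:
                return file_name
            last_error = RuntimeError(f"{file_name} is missing or empty")
        except Exception as exc:
            last_error = exc
        if attempt < max_retries - 1:
            await asyncio.sleep(delay)
    raise RuntimeError(
        f"TTS save failed after {max_retries} attempts"
    ) from last_error
```

Checking that the file exists and is non-empty after each attempt guards against the failure mode described here, where the call returns but no MP3 is ever written.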
This PR improves image reception, validation, and transmission to vision models,
ensuring screen sharing images are properly processed and sent to LLM vision models.

Changes:
- Enhanced image format validation and conversion
- Better error handling for image processing
- Improved logging for debugging image transmission
- Support for image source variations (screen/screenshot)
- Enhanced image transmission logging for LLM API calls

Technical details:
- Added comprehensive image reception logging in conversation_handler
- Improved image format validation in conversation_utils with better error messages
- Enhanced image format handling in basic_memory_agent for LLM compatibility
- Added image transmission logging in openai_compatible_llm to track when images
  are sent to vision models
- Support for common image source variations (screen/screenshot/camera/upload)
- Automatic conversion of base64 data to data URL format when needed
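
The base64-to-data-URL conversion might look like the sketch below. The function name `to_data_url`, the magic-byte table, and the strict base64 validation are assumptions for illustration; the PR's actual validation rules are not shown on this page.

```python
import base64
import binascii

# Magic-byte prefixes for a few common image formats.
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF8": "image/gif",
}


def to_data_url(image: str) -> str:
    """Return a data URL, converting raw base64 when necessary."""
    if image.startswith("data:image/"):
        return image  # already in the expected format
    try:
        raw = base64.b64decode(image, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("image is neither a data URL nor valid base64") from exc
    for magic, mime in _SIGNATURES.items():
        if raw.startswith(magic):
            return f"data:{mime};base64,{image}"
    raise ValueError("unsupported or unrecognized image format")
```

Raising a descriptive `ValueError` gives the caller a clear message to log, in line with the improved error messages this commit describes.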
@gemini-code-assist


Summary of Changes

Hello @diannt, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement to the VTuber experience by integrating a Retrieval-Augmented Generation (RAG) system with proactive tool calling. The core purpose is to make VTuber interactions more engaging and informative by allowing the system to intelligently analyze user interests and proactively provide relevant, context-aware information. This integration is designed to be robust, with lazy loading of components, connection validation for external services, and graceful degradation to ensure system stability and backward compatibility.

Highlights

  • RAG System Integration: A comprehensive Retrieval-Augmented Generation (RAG) system has been integrated, enabling the VTuber to provide intelligent, context-aware responses by retrieving relevant information.
  • Proactive Tool Calling: The system now includes proactive tool calling capabilities, allowing it to detect user interests and inject relevant information into conversations automatically, enhancing engagement.
  • Robust Image Processing: Improved handling of image data in conversations, including conversion of base64 images to data URLs, validation, and comprehensive error handling for various image sources.
  • Dynamic FFmpeg Configuration: Enhanced audio processing by dynamically locating and configuring FFmpeg/FFprobe, including support for Windows Subsystem for Linux (WSL), to improve audio conversion reliability.
  • Lazy Initialization & Graceful Degradation: The RAG system components, including the proactive tool caller, are initialized lazily, and the system is designed for graceful degradation if required services (like Ollama or Qdrant) are unavailable.
  • ElevenLabs TTS Dependency: The uv.lock file has been updated to include elevenlabs==2.21.0, indicating support for ElevenLabs Text-to-Speech capabilities.

diannt added the enhancement (New feature or request) and documentation (Improvements or additions to documentation) labels on Nov 22, 2025

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a significant new feature by integrating a RAG system for proactive tool calling, which is a great step towards more intelligent and context-aware conversations. The related improvements to image handling, TTS robustness, and FFmpeg detection are also valuable additions. My review focuses on improving code quality by adhering to Python's standard practices, specifically regarding module imports and code deduplication. These changes will enhance the maintainability and readability of the new code.

    )
except Exception as e:
    logger.error(f"Error processing image {i + 1}: {e}")
    import traceback


medium

The import traceback statement is inside the except block. According to PEP 8, which this project follows, imports should always be placed at the top of the file.[1] This improves readability and makes dependencies clear.

Style Guide References

Footnotes

  1. The project's coding standards specify adherence to Python's PEP 8 style guide, which recommends placing all imports at the top of the file.
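
A sketch of the suggested fix, with the import moved to module level. It uses the stdlib `logging` module as a stand-in for the project's logger, and `process_images` with its `handler` argument is a hypothetical wrapper around the code under review.

```python
import logging
import traceback  # module-level import, as PEP 8 recommends

logger = logging.getLogger(__name__)


def process_images(images, handler):
    """Process each image, logging failures without aborting the batch."""
    results = []
    for i, image in enumerate(images):
        try:
            results.append(handler(image))
        except Exception as e:
            logger.error(f"Error processing image {i + 1}: {e}")
            # The traceback module is already imported at the top of the file.
            logger.debug(traceback.format_exc())
    return results
```

The same change applies to each of the later comments about `import traceback` inside an `except` block.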

    )
except Exception as e:
    logger.error(f"Error processing image {i}: {e}")
    import traceback


medium

The import traceback statement is inside the except block. To adhere to PEP 8, all imports should be at the top of the file. Please move this import to the top-level imports.

if proactive_tool_results:
    logger.info(f"Proactive tool results: {proactive_tool_results[:200]}...")
    # Inject proactive tool results into batch input
    from ..agent.input_types import TextData, TextSource


medium

This local import should be moved to the top of the file to follow standard Python conventions (PEP 8). While lazy imports can be useful, in this case TextData and TextSource are simple data classes and can be imported at the top without issue.

return

# Verify file exists before processing
import os


medium

The import os statement is inside the _process_tts method. According to PEP 8, imports should be at the top of the file. Please move this to the top-level imports for consistency and readability.


except Exception as e:
    logger.error(f"Error preparing audio payload: {e}")
    import traceback


medium

The import traceback statement is inside the except block. To adhere to PEP 8, all imports should be at the top of the file. Please move this import to the top-level imports.

f"Edge TTS: Error on attempt {attempt + 1}/{max_retries}: {type(e).__name__}: {e}"
)
logger.debug(f"Edge TTS: File path attempted: {file_name}")
import traceback


medium

The import traceback statement is inside the except block. To adhere to PEP 8, all imports should be at the top of the file. Please move this import to the top-level imports.

logger.critical(
    "It's possible that edge-tts is blocked in your region or there's a network issue."
)
import traceback


medium

The import traceback statement is inside the except block. To adhere to PEP 8, all imports should be at the top of the file. Please move this import to the top-level imports.

Comment on lines +21 to +22
import subprocess
import platform


medium

The imports for subprocess and platform are inside the _configure_ffmpeg function. According to PEP 8, imports should be at the top of the file. Please move these to the top-level imports.

global _wsl_ffmpeg, _wsl_ffprobe
if _wsl_ffmpeg:
    # Convert Windows absolute path to WSL path
    import subprocess


medium

The import subprocess statement is inside the prepare_audio_payload function. According to PEP 8, imports should be at the top of the file. Please move this to the top-level imports.

Comment on lines +33 to +43
def _build_tts_config_dict(tts_config) -> dict:
    """
    Build TTS configuration dictionary from TTSConfig object.

    Args:
        tts_config: TTSConfig instance

    Returns:
        dict: TTS configuration dictionary
    """
    return build_tts_config_dict(tts_config)


medium

This wrapper function _build_tts_config_dict is redundant as it just calls the imported build_tts_config_dict function from tts_config_utils. You can remove this wrapper and directly use the imported function where needed to simplify the code.
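
A sketch of what removing the wrapper could look like: call sites use the shared helper directly. The `TTSConfig` fields, the body of `build_tts_config_dict`, and the `handle_get_tts_config` call site are all hypothetical stand-ins, since the real implementations are not shown in this review.

```python
from dataclasses import asdict, dataclass


@dataclass
class TTSConfig:  # hypothetical fields, for illustration only
    tts_model: str = "edge_tts"
    voice: str = "example-voice"


def build_tts_config_dict(tts_config: TTSConfig) -> dict:
    """Stand-in for the helper imported from tts_config_utils."""
    return asdict(tts_config)


def handle_get_tts_config(tts_config: TTSConfig) -> dict:
    """Hypothetical call site that uses the shared helper directly."""
    return {"type": "tts-config", "config": build_tts_config_dict(tts_config)}
```

With no private wrapper in between, there is a single function to maintain and a single name to search for.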
