feat: integrate RAG system with proactive tool calling by diannt · Pull Request #1 · diannt/Claude_Game_Companion

diannt · 2025-11-22T05:17:49Z

RAG System Integration with Proactive Tool Calling

Overview

This PR integrates a comprehensive RAG (Retrieval-Augmented Generation) system with proactive tool calling capabilities to enhance the VTuber experience with intelligent, context-aware responses.

Purpose

The changes implement a sophisticated RAG system that can proactively analyze user interests and inject relevant information into conversations, creating a more engaging and informative VTuber interaction.

Files Modified

Core Implementation Files

src/open_llm_vtuber/config_manager/main.py
- Added RAGConfig import and integration
- Added rag_config field with default factory
- Added internationalized description for RAG configuration
src/open_llm_vtuber/conversations/single_conversation.py
- Implemented proactive tool calling integration
- Added lazy initialization of proactive tool caller
- Enhanced batch input with proactive information injection
- Added proper error handling and logging
src/open_llm_vtuber/service_context.py
- Added RAG system state management (rag_enabled, rag_components)
- Implemented comprehensive init_rag() method
- Added connection validation for Ollama and Qdrant services
- Implemented graceful degradation when RAG services are unavailable
uv.lock
- Added ElevenLabs TTS dependency (elevenlabs==2.21.0)
- Updated dependency resolution

Impact on Existing Functionality

Enhanced Features

Proactive Information Injection: System can now detect user interests and proactively provide relevant information
RAG Integration: Full integration with document processing, embedding, and retrieval systems
Lazy Component Loading: Optimized initialization prevents service slowdowns
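
The lazy initialization and proactive injection described above might look roughly like the following. This is a minimal sketch, not the code in this PR: `ProactiveToolCaller.analyze`, the `Conversation` class, and the question-mark heuristic are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Optional


class ProactiveToolCaller:
    """Hypothetical stand-in for the class in rag/proactive_tool_caller.py."""

    def analyze(self, user_text: str) -> Optional[str]:
        # Placeholder heuristic: only questions produce extra context.
        if user_text.strip().endswith("?"):
            return f"[retrieved context for: {user_text.strip()}]"
        return None


class Conversation:
    def __init__(
        self,
        rag_enabled: bool,
        factory: Callable[[], ProactiveToolCaller] = ProactiveToolCaller,
    ):
        self.rag_enabled = rag_enabled
        self._factory = factory
        self._caller: Optional[ProactiveToolCaller] = None  # built on first use

    def _get_caller(self) -> Optional[ProactiveToolCaller]:
        if not self.rag_enabled:
            return None
        if self._caller is None:
            self._caller = self._factory()
        return self._caller

    def build_batch_input(self, user_text: str) -> list[str]:
        batch = [user_text]
        caller = self._get_caller()
        if caller is None:
            return batch
        try:
            extra = caller.analyze(user_text)
        except Exception:
            # Proactive lookup is best-effort; fall back to the plain input.
            extra = None
        if extra:
            batch.append(extra)
        return batch
```

With `rag_enabled=False` the caller is never constructed and the batch contains only the user's text, which matches the claim that the existing flow is unchanged when RAG is disabled.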

Backward Compatibility

All changes are backward compatible
RAG system is optional and gracefully degrades if services are unavailable
Existing conversation flow remains unchanged when RAG is disabled

Performance Considerations

Lazy initialization prevents unnecessary resource usage
Connection validation prevents hanging on service unavailability
Error handling ensures system stability
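
One way to validate connections without hanging is a TCP probe with a short timeout, disabling RAG when either service is unreachable. The sketch below uses only the standard library; the function names, the 2-second timeout, and the `probe` parameter are assumptions for illustration and may differ from the actual `init_rag()` in service_context.py.

```python
import socket
from urllib.parse import urlparse


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False


def init_rag(ollama_url: str, qdrant_host: str, qdrant_port: int,
             probe=is_reachable) -> bool:
    """Return whether RAG should be enabled; never raises on unreachable services."""
    parsed = urlparse(ollama_url)
    ollama_ok = probe(parsed.hostname or "localhost", parsed.port or 11434)
    qdrant_ok = probe(qdrant_host, qdrant_port)
    return ollama_ok and qdrant_ok
```

A caller could store the result in `rag_enabled` and continue startup either way, so an unavailable Ollama or Qdrant instance costs at most one timeout per service.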

Related Components

This PR integrates with several existing and new RAG system components:

src/open_llm_vtuber/rag/document_processor.py - Document processing
src/open_llm_vtuber/rag/retrieval_engine.py - Information retrieval
src/open_llm_vtuber/rag/session_manager.py - Session management
src/open_llm_vtuber/rag/report_generator.py - Report generation
src/open_llm_vtuber/rag/proactive_tool_caller.py - Proactive tool invocation

Testing Notes

Manual Testing Required

RAG System Initialization
- Verify RAG system initializes when enabled in config
- Test graceful degradation when Ollama/Qdrant are unavailable
- Confirm connection validation works correctly
Proactive Tool Calling
- Test proactive information injection during conversations
- Verify lazy initialization of proactive tool caller
- Test error handling in proactive tool calling path
Configuration Management
- Verify RAG configuration loads correctly
- Test internationalized descriptions display properly
- Confirm default configuration behavior

Integration Testing

Test with existing conversation flow
Verify no impact on non-RAG conversations
Test ElevenLabs TTS integration

Configuration Requirements

rag_config:
  enabled: true
  ollama:
    base_url: "http://localhost:11434"
    embedding_model: "nomic-embed-text"
  qdrant:
    host: "localhost"
    port: 6333
    collection_name: "vtuber_knowledge"
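
For reviewers, the YAML above could map onto nested config objects with a default factory along these lines. This sketch uses standard-library dataclasses for self-containment; the project's real `RAGConfig` may be defined differently, and the `enabled: False` default and `from_dict` helper are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class OllamaConfig:
    base_url: str = "http://localhost:11434"
    embedding_model: str = "nomic-embed-text"


@dataclass
class QdrantConfig:
    host: str = "localhost"
    port: int = 6333
    collection_name: str = "vtuber_knowledge"


@dataclass
class RAGConfig:
    enabled: bool = False  # assumption: RAG is opt-in
    # default_factory gives each RAGConfig its own nested objects.
    ollama: OllamaConfig = field(default_factory=OllamaConfig)
    qdrant: QdrantConfig = field(default_factory=QdrantConfig)

    @classmethod
    def from_dict(cls, data: dict) -> "RAGConfig":
        """Build a config from a parsed rag_config mapping, filling defaults."""
        return cls(
            enabled=bool(data.get("enabled", False)),
            ollama=OllamaConfig(**data.get("ollama", {})),
            qdrant=QdrantConfig(**data.get("qdrant", {})),
        )
```

Omitting `rag_config` entirely would then yield a disabled configuration with the defaults shown, which is consistent with the backward-compatibility claim.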

Dependencies

elevenlabs>=1.0.0 - For enhanced TTS capabilities
Requires Ollama service for embeddings
Requires Qdrant vector database for RAG functionality

Breaking Changes

None - all changes maintain backward compatibility.

Review Checklist

Code follows project style guidelines
Error handling is comprehensive
Configuration management is robust
Performance implications are acceptable
Documentation is clear and complete
Testing coverage is adequate

Labels

enhancement - New feature implementation
rag - RAG system related changes
ai-integration - AI/ML integration work
configuration - Configuration management changes

Reviewers

Recommended reviewers based on affected areas:

Core VTuber Logic: For conversation processing changes
RAG System: For RAG integration components
Configuration: For config management changes
TTS Integration: For ElevenLabs TTS changes

CI/CD Status

Please verify the following checks pass:

All existing tests continue to pass
New functionality tests pass
Configuration validation tests pass
Integration tests with RAG components pass

… issues

This fix addresses the async issues mentioned in conf.yaml line 272-273: 'Note: TTS has async issues in v1.2.1, text responses work fine'

Changes:
- Use asyncio.to_thread() to wrap edge_tts.save_sync() preventing event loop conflicts
- Enhanced FFmpeg detection with WSL support for cross-platform audio conversion
- Added retry logic and file validation in TTS generation
- Improved error handling throughout TTS pipeline
- Added TTS configuration API endpoints and WebSocket message handling
- Created utility module for TTS config serialization

Technical details:
- Edge TTS was experiencing async event loop conflicts when called from uvicorn's async context, causing MP3 files to never be created
- Solution uses asyncio.to_thread() to run synchronous save_sync() in separate thread, preventing conflicts with uvicorn's event loop
- FFmpeg detection now supports Windows PATH, WSL, and common installation paths
- WSL FFmpeg conversion uses explicit subprocess calls via 'wsl' executable
- Retry logic ensures robustness against transient network or file system issues
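
The combination of `asyncio.to_thread()`, retries, and file validation described in this commit can be sketched as follows. The `save_sync` argument here is any blocking callable that writes a file to a given path; it stands in for the Edge TTS call and is not the library's API. The retry count, delay, and function name are assumptions.

```python
import asyncio
import os
from typing import Callable


async def save_with_retry(save_sync: Callable[[str], None], path: str,
                          max_retries: int = 3, delay: float = 0.1) -> bool:
    """Run a blocking save function in a worker thread, retrying on failure."""
    for attempt in range(1, max_retries + 1):
        try:
            # The blocking call runs off the event loop thread.
            await asyncio.to_thread(save_sync, path)
            # Validate the output: the file must exist and be non-empty.
            if os.path.isfile(path) and os.path.getsize(path) > 0:
                return True
        except Exception as exc:
            print(f"attempt {attempt}/{max_retries} failed: "
                  f"{type(exc).__name__}: {exc}")
        if attempt < max_retries:
            await asyncio.sleep(delay)
    return False
```

Because the blocking call runs in a worker thread, the server's event loop stays free to handle other requests while the audio file is being written.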

This PR improves image reception, validation, and transmission to vision models, ensuring screen sharing images are properly processed and sent to LLM vision models.

Changes:
- Enhanced image format validation and conversion
- Better error handling for image processing
- Improved logging for debugging image transmission
- Support for image source variations (screen/screenshot)
- Enhanced image transmission logging for LLM API calls

Technical details:
- Added comprehensive image reception logging in conversation_handler
- Improved image format validation in conversation_utils with better error messages
- Enhanced image format handling in basic_memory_agent for LLM compatibility
- Added image transmission logging in openai_compatible_llm to track when images are sent to vision models
- Support for common image source variations (screen/screenshot/camera/upload)
- Automatic conversion of base64 data to data URL format when needed
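
As an illustration of the base64-to-data-URL conversion mentioned above, the following sketch validates bare base64 input and infers the MIME type from the decoded bytes. The function name, the set of supported formats, and the error messages are assumptions, not the code in conversation_utils.

```python
import base64
import binascii

# Leading bytes ("magic numbers") of a few common image formats.
_MAGIC = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF8": "image/gif",
}


def to_data_url(image: str) -> str:
    """Return `image` as a data URL, validating bare base64 input first."""
    if image.startswith("data:image/"):
        return image  # already in data URL form
    try:
        raw = base64.b64decode(image, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError(f"image is not valid base64: {exc}") from exc
    for magic, mime in _MAGIC.items():
        if raw.startswith(magic):
            return f"data:{mime};base64,{image}"
    raise ValueError("unrecognized image format")
```

Raising a descriptive `ValueError` lets the caller log which image failed and continue with the remaining ones.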

gemini-code-assist · 2025-11-22T05:18:25Z

Summary of Changes

Hello @diannt, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement to the VTuber experience by integrating a Retrieval-Augmented Generation (RAG) system with proactive tool calling. The core purpose is to make VTuber interactions more engaging and informative by allowing the system to intelligently analyze user interests and proactively provide relevant, context-aware information. This integration is designed to be robust, with lazy loading of components, connection validation for external services, and graceful degradation to ensure system stability and backward compatibility.

Highlights

RAG System Integration: A comprehensive Retrieval-Augmented Generation (RAG) system has been integrated, enabling the VTuber to provide intelligent, context-aware responses by retrieving relevant information.
Proactive Tool Calling: The system now includes proactive tool calling capabilities, allowing it to detect user interests and inject relevant information into conversations automatically, enhancing engagement.
Robust Image Processing: Improved handling of image data in conversations, including conversion of base64 images to data URLs, validation, and comprehensive error handling for various image sources.
Dynamic FFmpeg Configuration: Enhanced audio processing by dynamically locating and configuring FFmpeg/FFprobe, including support for Windows Subsystem for Linux (WSL), to improve audio conversion reliability.
Lazy Initialization & Graceful Degradation: The RAG system components, including the proactive tool caller, are initialized lazily, and the system is designed for graceful degradation if required services (like Ollama or Qdrant) are unavailable.
ElevenLabs TTS Dependency: The uv.lock file has been updated to include elevenlabs==2.21.0, indicating support for ElevenLabs Text-to-Speech capabilities.
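
The FFmpeg lookup with a WSL fallback could be approximated as below. This is a simplified sketch: `find_ffmpeg` and its parameters are hypothetical, it does not search common installation paths, and it assumes ffmpeg is installed inside WSL whenever the `wsl` executable is present, which a real implementation would need to verify.

```python
import platform
import shutil
from typing import Optional


def find_ffmpeg(which=shutil.which,
                system: Optional[str] = None) -> Optional[list[str]]:
    """Return a command prefix for invoking ffmpeg, or None if not found."""
    system = system or platform.system()
    native = which("ffmpeg")
    if native:
        return [native]
    # On Windows, fall back to ffmpeg inside WSL if the `wsl` launcher exists.
    # Assumption: ffmpeg is installed in the default WSL distribution.
    if system == "Windows" and which("wsl"):
        return ["wsl", "ffmpeg"]
    return None
```

The returned list can be used as the start of a `subprocess.run` argument list, so native and WSL invocations share one call site.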

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a significant new feature by integrating a RAG system for proactive tool calling, which is a great step towards more intelligent and context-aware conversations. The related improvements to image handling, TTS robustness, and FFmpeg detection are also valuable additions. My review focuses on improving code quality by adhering to Python's standard practices, specifically regarding module imports and code deduplication. These changes will enhance the maintainability and readability of the new code.

gemini-code-assist · 2025-11-22T05:20:33Z

+                        )
+                except Exception as e:
+                    logger.error(f"Error processing image {i + 1}: {e}")
+                    import traceback


The import traceback statement is inside the except block. According to PEP 8, which this project follows, imports should always be placed at the top of the file.¹ This improves readability and makes dependencies clear.

Style Guide References

Footnotes

The project's coding standards specify adherence to Python's PEP 8 style guide, which recommends placing all imports at the top of the file. ↩
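
Applied to the flagged pattern, the suggested change might look like this. It is a minimal sketch: `process_images` and its size-based body are hypothetical, and only the placement of `import traceback` and the logging calls reflect the review comment.

```python
import logging
import traceback  # top-level import, as the review requests

logger = logging.getLogger(__name__)


def process_images(images: list[bytes]) -> list[int]:
    """Return the size of each image, logging (not raising) per-image failures."""
    sizes = []
    for i, image in enumerate(images):
        try:
            if not image:
                raise ValueError("empty image payload")
            sizes.append(len(image))
        except Exception as e:
            logger.error(f"Error processing image {i + 1}: {e}")
            # The traceback module is already imported at module level.
            logger.debug(traceback.format_exc())
    return sizes
```

An alternative that avoids the `traceback` import altogether is `logger.exception(...)`, which logs at ERROR level and appends the current traceback automatically.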

gemini-code-assist · 2025-11-22T05:20:34Z

+                )
+            except Exception as e:
+                logger.error(f"Error processing image {i}: {e}")
+                import traceback


The import traceback statement is inside the except block. To adhere to PEP 8, all imports should be at the top of the file.¹ Please move this import to the top-level imports.

gemini-code-assist · 2025-11-22T05:20:34Z

+                    if proactive_tool_results:
+                        logger.info(f"Proactive tool results: {proactive_tool_results[:200]}...")
+                        # Inject proactive tool results into batch input
+                        from ..agent.input_types import TextData, TextSource


This local import should be moved to the top of the file to follow standard Python conventions (PEP 8).¹ While lazy imports can be useful, in this case, TextData and TextSource are simple data classes and can be imported at the top without issue.

gemini-code-assist · 2025-11-22T05:20:34Z

+                return
+
+            # Verify file exists before processing
+            import os


The import os statement is inside the _process_tts method. According to PEP 8, imports should be at the top of the file.¹ Please move this to the top-level imports for consistency and readability.

gemini-code-assist · 2025-11-22T05:20:34Z


        except Exception as e:
            logger.error(f"Error preparing audio payload: {e}")
+            import traceback


The import traceback statement is inside the except block. To adhere to PEP 8, all imports should be at the top of the file.¹ Please move this import to the top-level imports.

gemini-code-assist · 2025-11-22T05:20:34Z

+                    f"Edge TTS: Error on attempt {attempt + 1}/{max_retries}: {type(e).__name__}: {e}"
+                )
+                logger.debug(f"Edge TTS: File path attempted: {file_name}")
+                import traceback


The import traceback statement is inside the except block. To adhere to PEP 8, all imports should be at the top of the file.¹ Please move this import to the top-level imports.

gemini-code-assist · 2025-11-22T05:20:34Z

+            logger.critical(
+                "It's possible that edge-tts is blocked in your region or there's a network issue."
+            )
+            import traceback


The import traceback statement is inside the except block. To adhere to PEP 8, all imports should be at the top of the file.¹ Please move this import to the top-level imports.

gemini-code-assist · 2025-11-22T05:20:35Z

+    import subprocess
+    import platform


The imports for subprocess and platform are inside the _configure_ffmpeg function. According to PEP 8, imports should be at the top of the file.¹ Please move these to the top-level imports.

gemini-code-assist · 2025-11-22T05:20:35Z

+        global _wsl_ffmpeg, _wsl_ffprobe
+        if _wsl_ffmpeg:
+            # Convert Windows absolute path to WSL path
+            import subprocess


The import subprocess statement is inside the prepare_audio_payload function. According to PEP 8, imports should be at the top of the file.¹ Please move this to the top-level imports.

gemini-code-assist · 2025-11-22T05:20:35Z

+def _build_tts_config_dict(tts_config) -> dict:
+    """
+    Build TTS configuration dictionary from TTSConfig object.
+
+    Args:
+        tts_config: TTSConfig instance
+
+    Returns:
+        dict: TTS configuration dictionary
+    """
+    return build_tts_config_dict(tts_config)


This wrapper function _build_tts_config_dict is redundant as it just calls the imported build_tts_config_dict function from tts_config_utils. You can remove this wrapper and directly use the imported function where needed to simplify the code.

diannt added 3 commits November 5, 2025 02:27

feat: integrate RAG system with proactive tool calling

74788f1

diannt added the enhancement and documentation labels Nov 22, 2025

gemini-code-assist Bot reviewed Nov 22, 2025

diannt wants to merge 3 commits into main from feat/rag-proactive-tool-calling
