feat: integrate RAG system with proactive tool calling #1
… issues

This fix addresses the async issues mentioned in conf.yaml line 272-273: 'Note: TTS has async issues in v1.2.1, text responses work fine'

Changes:
- Use asyncio.to_thread() to wrap edge_tts.save_sync() preventing event loop conflicts
- Enhanced FFmpeg detection with WSL support for cross-platform audio conversion
- Added retry logic and file validation in TTS generation
- Improved error handling throughout TTS pipeline
- Added TTS configuration API endpoints and WebSocket message handling
- Created utility module for TTS config serialization

Technical details:
- Edge TTS was experiencing async event loop conflicts when called from uvicorn's async context, causing MP3 files to never be created
- Solution uses asyncio.to_thread() to run synchronous save_sync() in separate thread, preventing conflicts with uvicorn's event loop
- FFmpeg detection now supports Windows PATH, WSL, and common installation paths
- WSL FFmpeg conversion uses explicit subprocess calls via 'wsl' executable
- Retry logic ensures robustness against transient network or file system issues
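The thread offloading, retry, and file validation described in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `save_sync` here is a local stand-in for the blocking TTS call, and `generate_audio`, `max_retries`, and the backoff values are hypothetical.

```python
import asyncio
import os


def save_sync(text: str, path: str) -> None:
    """Stand-in for a blocking TTS call (hypothetical; writes bytes to disk)."""
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))


async def generate_audio(text: str, path: str, max_retries: int = 3) -> str:
    """Run the blocking save in a worker thread, retrying on failure."""
    last_error = None
    for attempt in range(max_retries):
        try:
            # to_thread keeps the blocking call off the running event loop.
            await asyncio.to_thread(save_sync, text, path)
            # Validate the output: the file must exist and be non-empty.
            if os.path.exists(path) and os.path.getsize(path) > 0:
                return path
            last_error = RuntimeError("output file missing or empty")
        except Exception as e:  # any failure of the blocking call triggers a retry
            last_error = e
        await asyncio.sleep(0.1 * (attempt + 1))  # simple linear backoff
    raise RuntimeError(f"TTS failed after {max_retries} attempts") from last_error
```

Because the blocking call runs in a separate thread, it cannot conflict with the event loop that the web server is already running, which is the failure mode the commit message describes.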
This PR improves image reception, validation, and transmission to vision models, ensuring screen sharing images are properly processed and sent to LLM vision models.

Changes:
- Enhanced image format validation and conversion
- Better error handling for image processing
- Improved logging for debugging image transmission
- Support for image source variations (screen/screenshot)
- Enhanced image transmission logging for LLM API calls

Technical details:
- Added comprehensive image reception logging in conversation_handler
- Improved image format validation in conversation_utils with better error messages
- Enhanced image format handling in basic_memory_agent for LLM compatibility
- Added image transmission logging in openai_compatible_llm to track when images are sent to vision models
- Support for common image source variations (screen/screenshot/camera/upload)
- Automatic conversion of base64 data to data URL format when needed
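As an illustration of the base64-to-data-URL conversion and source normalization mentioned above, a minimal sketch might look like this. The function names, the alias table, and the default MIME type are assumptions for the example; the PR does not show its actual helpers here.

```python
import base64

# Hypothetical alias table; the commit only says that variations such as
# screen/screenshot/camera/upload are accepted.
SOURCE_ALIASES = {"screenshot": "screen"}


def normalize_source(source: str) -> str:
    """Lower-case and trim the source label, then map known aliases."""
    key = source.strip().lower()
    return SOURCE_ALIASES.get(key, key)


def to_data_url(data: str, mime_type: str = "image/jpeg") -> str:
    """Return data unchanged if it is already a data URL, else wrap raw base64."""
    if data.startswith("data:"):
        return data
    # validate=True rejects characters outside the base64 alphabet,
    # raising binascii.Error (a ValueError subclass) on bad input.
    base64.b64decode(data, validate=True)
    return f"data:{mime_type};base64,{data}"
```

Validating before wrapping means a malformed payload fails early with a clear error instead of being forwarded to the vision model.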
Summary of Changes
Hello @diannt, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a significant enhancement to the VTuber experience by integrating a Retrieval-Augmented Generation (RAG) system with proactive tool calling. The core purpose is to make VTuber interactions more engaging and informative by allowing the system to intelligently analyze user interests and proactively provide relevant, context-aware information. This integration is designed to be robust, with lazy loading of components, connection validation for external services, and graceful degradation to ensure system stability and backward compatibility.
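The lazy loading and graceful degradation mentioned in this summary could take a shape like the sketch below. `LazyComponent` is a hypothetical helper written for illustration; the PR's own loading mechanism is not shown in this conversation.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


class LazyComponent:
    """Import a module on first use; stay disabled if the import fails."""

    def __init__(self, module_name: str):
        self.module_name = module_name
        self._module = None
        self._failed = False

    def get(self):
        """Return the module, or None if it could not be imported."""
        if self._module is None and not self._failed:
            try:
                self._module = importlib.import_module(self.module_name)
            except ImportError as e:
                # Degrade gracefully: remember the failure and keep running.
                self._failed = True
                logger.warning(
                    "Optional component %s unavailable: %s", self.module_name, e
                )
        return self._module
```

Callers check for `None` and skip the optional feature, so a missing dependency disables RAG-style extras without taking down the rest of the application.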
Code Review
This pull request introduces a significant new feature by integrating a RAG system for proactive tool calling, which is a great step towards more intelligent and context-aware conversations. The related improvements to image handling, TTS robustness, and FFmpeg detection are also valuable additions. My review focuses on improving code quality by adhering to Python's standard practices, specifically regarding module imports and code deduplication. These changes will enhance the maintainability and readability of the new code.
        )
    except Exception as e:
        logger.error(f"Error processing image {i + 1}: {e}")
        import traceback
The import traceback statement is inside the except block. According to PEP 8, which this project follows, imports should always be placed at the top of the file [1]. This improves readability and makes dependencies clear.
Style Guide References
Footnotes
[1] The project's coding standards specify adherence to Python's PEP 8 style guide, which recommends placing all imports at the top of the file.
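A minimal sketch of the layout this comment asks for, with traceback imported once at module level and used inside the handler. `process_images` and `handler` are hypothetical names chosen for the example, not functions from the PR.

```python
import logging
import traceback  # module-level import, as PEP 8 recommends

logger = logging.getLogger(__name__)


def process_images(images, handler):
    """Apply handler to each image, logging failures instead of raising."""
    results = []
    for i, image in enumerate(images):
        try:
            results.append(handler(image))
        except Exception as e:
            logger.error(f"Error processing image {i + 1}: {e}")
            # format_exc() returns the current traceback as a string.
            logger.debug(traceback.format_exc())
    return results
```

The same pattern applies to the other flagged locations: the import moves to the top of the module and only the call stays inside the except block.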
        )
    except Exception as e:
        logger.error(f"Error processing image {i}: {e}")
        import traceback
The import traceback statement is inside the except block. To adhere to PEP 8, all imports should be at the top of the file [1]. Please move this import to the top-level imports.
    if proactive_tool_results:
        logger.info(f"Proactive tool results: {proactive_tool_results[:200]}...")
        # Inject proactive tool results into batch input
        from ..agent.input_types import TextData, TextSource
This local import should be moved to the top of the file to follow standard Python conventions (PEP 8) [1]. While lazy imports can be useful, in this case, TextData and TextSource are simple data classes and can be imported at the top without issue.
            return

        # Verify file exists before processing
        import os
The import os statement is inside the _process_tts method. According to PEP 8, imports should be at the top of the file [1]. Please move this to the top-level imports for consistency and readability.
    except Exception as e:
        logger.error(f"Error preparing audio payload: {e}")
        import traceback
The import traceback statement is inside the except block. To adhere to PEP 8, all imports should be at the top of the file [1]. Please move this import to the top-level imports.
            f"Edge TTS: Error on attempt {attempt + 1}/{max_retries}: {type(e).__name__}: {e}"
        )
        logger.debug(f"Edge TTS: File path attempted: {file_name}")
        import traceback
The import traceback statement is inside the except block. To adhere to PEP 8, all imports should be at the top of the file [1]. Please move this import to the top-level imports.
        logger.critical(
            "It's possible that edge-tts is blocked in your region or there's a network issue."
        )
        import traceback
The import traceback statement is inside the except block. To adhere to PEP 8, all imports should be at the top of the file [1]. Please move this import to the top-level imports.
    import subprocess
    import platform
The imports for subprocess and platform are inside the _configure_ffmpeg function. According to PEP 8, imports should be at the top of the file [1]. Please move these to the top-level imports.
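For context on the FFmpeg detection this comment touches, the "PATH first, then common installation paths" lookup described in the commit message might be sketched as below, with all imports at module level. `find_executable` and its parameters are hypothetical; the body of the PR's `_configure_ffmpeg` is not shown here.

```python
import os
import shutil


def find_executable(name, extra_paths=()):
    """Return the first usable path for name: PATH first, then extra_paths."""
    found = shutil.which(name)
    if found:
        return found
    for candidate in extra_paths:
        # Accept only existing files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

A caller could pass a list of typical install locations as `extra_paths`; a WSL fallback would be a further step after both lookups return nothing, for example by first checking `shutil.which("wsl")`.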
    global _wsl_ffmpeg, _wsl_ffprobe
    if _wsl_ffmpeg:
        # Convert Windows absolute path to WSL path
        import subprocess
The import subprocess statement is inside the prepare_audio_payload function. According to PEP 8, imports should be at the top of the file [1]. Please move this to the top-level imports.
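The Windows-to-WSL path conversion named in the quoted code comment can be illustrated with a pure-Python sketch. It assumes WSL's default drive mount point of `/mnt/<drive>`; `windows_to_wsl_path` is a hypothetical helper, and the PR itself appears to use a subprocess call instead.

```python
import re


def windows_to_wsl_path(path: str) -> str:
    """Map 'C:\\dir\\file' to '/mnt/c/dir/file' (default WSL drive mount)."""
    match = re.match(r"^([A-Za-z]):[\\/](.*)$", path)
    if not match:
        raise ValueError(f"Not an absolute Windows drive path: {path!r}")
    drive, rest = match.groups()
    # WSL mounts drives under lower-case letters and uses forward slashes.
    return f"/mnt/{drive.lower()}/" + rest.replace("\\", "/")
```

Inside WSL, the `wslpath` utility performs the same translation and also respects custom mount configuration, so a string-based version like this is mainly useful where spawning a process is undesirable.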
    def _build_tts_config_dict(tts_config) -> dict:
        """
        Build TTS configuration dictionary from TTSConfig object.

        Args:
            tts_config: TTSConfig instance

        Returns:
            dict: TTS configuration dictionary
        """
        return build_tts_config_dict(tts_config)
RAG System Integration with Proactive Tool Calling
Overview
This PR integrates a comprehensive RAG (Retrieval-Augmented Generation) system with proactive tool calling capabilities to enhance the VTuber experience with intelligent, context-aware responses.
Purpose
The changes implement a sophisticated RAG system that can proactively analyze user interests and inject relevant information into conversations, creating a more engaging and informative VTuber interaction.
Files Modified
Core Implementation Files
- src/open_llm_vtuber/config_manager/main.py
  - RAGConfig import and integration
  - rag_config field with default factory
- src/open_llm_vtuber/conversations/single_conversation.py
- src/open_llm_vtuber/service_context.py
  - … (rag_enabled, rag_components)
  - init_rag() method
- uv.lock
  - … (elevenlabs==2.21.0)
Impact on Existing Functionality
Enhanced Features
Backward Compatibility
Performance Considerations
Related Components
This PR integrates with several existing and new RAG system components:
- src/open_llm_vtuber/rag/document_processor.py - Document processing
- src/open_llm_vtuber/rag/retrieval_engine.py - Information retrieval
- src/open_llm_vtuber/rag/session_manager.py - Session management
- src/open_llm_vtuber/rag/report_generator.py - Report generation
- src/open_llm_vtuber/rag/proactive_tool_caller.py - Proactive tool invocation
Testing Notes
Manual Testing Required
RAG System Initialization
Proactive Tool Calling
Configuration Management
Integration Testing
Configuration Requirements
Dependencies
- elevenlabs>=1.0.0 - For enhanced TTS capabilities
Breaking Changes
None - all changes maintain backward compatibility.
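One common way to keep older configuration files loading unchanged is a config field with a default factory, which the file list above mentions for rag_config. The sketch below uses standard-library dataclasses to stay dependency-free; the class layout, the `enabled` and `top_k` fields, and `load_config` are all assumptions for illustration, and the project's real config classes may be built differently.

```python
from dataclasses import dataclass, field


@dataclass
class RAGConfig:
    # Hypothetical fields; the PR description does not list the real ones.
    enabled: bool = False
    top_k: int = 3


@dataclass
class AppConfig:
    # Stand-in for the real top-level config class.
    name: str = "default"
    # default_factory builds a fresh RAGConfig per instance when none is given.
    rag_config: RAGConfig = field(default_factory=RAGConfig)


def load_config(raw: dict) -> AppConfig:
    """Build AppConfig from a dict; a missing 'rag_config' key uses defaults."""
    data = dict(raw)
    rag_raw = data.pop("rag_config", None)
    if rag_raw is not None:
        data["rag_config"] = RAGConfig(**rag_raw)
    return AppConfig(**data)
```

A configuration written before the RAG section existed simply omits the key and gets a disabled default, which is consistent with the "no breaking changes" claim.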
Review Checklist
Labels
- enhancement - New feature implementation
- rag - RAG system related changes
- ai-integration - AI/ML integration work
- configuration - Configuration management changes
Reviewers
Recommended reviewers based on affected areas:
CI/CD Status
Please verify the following checks pass: