Axis — Voice-Driven Browser Agent, powered by Gemini. Built for accessibility and seamless voice-driven web navigation. #GeminiLiveAgentChallenge
Updated Mar 17, 2026 - Python
Cogni-Brain Omni: Gemma 4 12B on DGX Spark via vLLM. Multimodal input, Telegram voice notes, multilingual chat, tools, MTP, 196K context, and reproducible local benchmarks.
This project is a **Self-Guided AI Audio Tour Agent** that provides interactive, voice-enabled guided tours based on your location or interests. It uses a multimodal agent to narrate historical and cultural insights in real time for an immersive travel experience.
A curated literature resource hub for Medical Visual Question Answering, covering surveys, datasets, benchmarks, evaluation metrics, representative methods, and multimodal medical agents, with a focus on the shift from passive answer prediction to active, evidence-seeking clinical inquiry.
Production-grade AI agent for Revenue Cycle Management (RCM) using FastAPI and LangGraph. Features multi-agent orchestration, vision-augmented extraction, Agentic RAG, and LIMS-integrated forensic auditing with Human-in-the-Loop state management.
DESI-DIET: Personalized dietary companion based on an individual's target, disease, and health condition.
AI-powered wedding photo organizer using multimodal LLMs for intelligent image classification and automated photo organization.