👁️ AI Scene Narrator using Gemini & OpenCV

An AI-powered scene narration system that captures images from two cameras (front and back), analyzes the surroundings using Google's Gemini AI, and speaks the resulting descriptions aloud through text-to-speech.

This project is designed to enhance accessibility and environmental awareness by converting visual surroundings into spoken descriptions.

🎓 Academic Project Information

This repository is part of my B.Tech Final Year Project and represents Module 6 of 7 in the complete system.

This module implements the scene narration and environmental awareness assistant. It combines Gemini AI, OpenCV, image processing, and text-to-speech to give spoken descriptions of the user's surroundings for accessibility and assistive applications.

🚀 Features

📷 Dual Camera Image Capture
🤖 AI Scene Understanding using Gemini AI
🗣️ Real-Time Voice Narration
⚡ Fast and Lightweight Processing
🧠 Smart Image Validation System
🔊 Text-to-Speech Feedback
🛡️ Camera Failure Detection & Handling
♿ Accessibility-Focused Design
🎯 Obstacle & Object Awareness

🧠 Project Overview

The system captures images from both front and back cameras, validates the captured frames, and sends them to Gemini AI for intelligent scene understanding.

Gemini AI generates short and meaningful descriptions focused on important objects, surroundings, and obstacles. These descriptions are then converted into speech using pyttsx3, enabling hands-free environmental awareness.

The project demonstrates the practical integration of:

Computer Vision
Generative AI
Real-Time Image Processing
Voice Synthesis
Accessibility Technologies

🛠️ Technologies Used

Technology	Purpose
Python	Core Programming Language
OpenCV	Camera Access & Image Processing
Gemini AI	Scene Understanding & Description
Pillow (PIL)	Image Conversion
NumPy	Frame Validation
pyttsx3	Text-to-Speech Engine

📂 System Workflow

Capture images from front and back cameras
Process and extract image frames
Validate image quality using grayscale variance analysis
Send validated images to Gemini AI
Generate intelligent scene descriptions
Convert descriptions into speech using TTS
Provide real-time voice-based environmental awareness
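
The workflow above can be sketched as a single pass over both cameras. This is a minimal illustration, not the code in main.py: the function name `narrate_once` and its four callable parameters are hypothetical, chosen so that each step (capture, validation, description, speech) can be swapped or tested on its own.

```python
from typing import Callable, Dict, Optional

# All names below are hypothetical; main.py may organise these steps differently.
Frame = object  # stands in for an OpenCV image (a NumPy array)


def narrate_once(
    capture: Callable[[str], Optional[Frame]],
    is_valid: Callable[[Frame], bool],
    describe: Callable[[Frame], str],
    speak: Callable[[str], None],
    cameras: tuple = ("Front", "Back"),
) -> Dict[str, str]:
    """Run one capture -> validate -> describe -> speak pass over each camera."""
    results: Dict[str, str] = {}
    for name in cameras:
        frame = capture(name)
        if frame is None or not is_valid(frame):
            # A failed or unusable capture is reported instead of sent to the AI.
            message = f"{name} camera unavailable."
        else:
            message = f"{name} scene: {describe(frame)}"
        results[name] = message
        speak(message)
    return results
```

Passing the steps in as callables keeps the loop independent of the camera hardware, the Gemini API, and the speech engine.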

⚙️ Installation

1️⃣ Clone the Repository

git clone https://github.com/Adithya2369/AI-Scene-Narrator.git
cd AI-Scene-Narrator

2️⃣ Install Required Packages

pip install opencv-python google-generativeai pillow numpy pyttsx3

🔑 Configure Gemini API Key

Open main.py and replace the placeholder:

GEMINI_API_KEY = "YOUR_API_KEY"

with your own Gemini API key.
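
As an alternative to editing the file, the key could be read from an environment variable so it never ends up in version control. The sketch below is an assumption about how that might look, not what main.py currently does; the helper name `load_api_key` is hypothetical.

```python
import os


def load_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Read the Gemini API key from the environment and reject the placeholder."""
    key = os.environ.get(env_var, "").strip()
    if not key or key == "YOUR_API_KEY":
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Gemini API key."
        )
    return key
```

With this in place, `GEMINI_API_KEY = load_api_key()` would replace the hard-coded string.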

▶️ Running the Project

python main.py

📸 Example Output

=== SMART SCENE VOICE SYSTEM STARTED ===

Scanning surroundings

Front scene: A person standing near a table and laptop.

Back scene: A parked motorcycle beside a wall.

=== PROCESS COMPLETE ===

🧪 Core Functionalities

📷 Camera Capture

The system captures images from both front and back cameras independently.
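
A single-frame capture with OpenCV might look like the sketch below. The function names and the `open_camera` parameter are hypothetical, and which device index maps to the front or back camera depends on the machine (0 and 1 are only a common guess).

```python
from typing import Any, Callable, Optional


def _open_with_opencv(index: int) -> Any:
    import cv2  # imported lazily so the module loads without OpenCV installed
    return cv2.VideoCapture(index)


def capture_frame(
    index: int, open_camera: Callable[[int], Any] = _open_with_opencv
) -> Optional[Any]:
    """Grab a single frame from the camera at `index`, or None on failure."""
    cap = open_camera(index)
    try:
        if not cap.isOpened():
            return None  # camera missing or busy
        ok, frame = cap.read()
        return frame if ok else None
    finally:
        cap.release()  # always free the device, even on failure
```

Calling `capture_frame(0)` and `capture_frame(1)` separately means a failure on one camera does not prevent the other from being read.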

🧠 Frame Validation

Captured frames are validated using grayscale variance analysis to ensure image quality before AI processing.
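
The idea behind the variance check is that a blank, covered, or fully dark frame has almost no pixel variation, so its grayscale variance is close to zero. The sketch below illustrates this with NumPy; the function name and the threshold of 50.0 are assumptions and would need tuning against real camera output.

```python
import numpy as np


def is_frame_valid(frame, min_variance: float = 50.0) -> bool:
    """Return True when the frame's grayscale variance exceeds `min_variance`."""
    if frame is None or getattr(frame, "size", 0) == 0:
        return False
    pixels = np.asarray(frame, dtype=np.float64)
    if pixels.ndim == 3:
        # OpenCV frames are BGR; weight channels by standard luminance coefficients.
        pixels = pixels[..., :3] @ np.array([0.114, 0.587, 0.299])
    return float(pixels.var()) > min_variance
```

A uniformly black frame has zero variance and is rejected, while a frame with normal scene detail passes.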

🤖 Gemini AI Scene Analysis

Images are processed using Gemini AI to generate concise descriptions focused on:

Objects
Obstacles
Environmental context
Important surroundings
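
A call to Gemini through the google-generativeai package could be sketched as follows. The prompt text, the helper names, and the model name `gemini-1.5-flash` are assumptions for illustration; the README does not state which prompt or model main.py uses.

```python
from typing import Any

# Hypothetical prompt; the wording used in main.py is not shown in this README.
SCENE_PROMPT = (
    "Describe this scene in one short sentence. "
    "Mention important objects, obstacles, and surroundings."
)


def build_model(api_key: str, model_name: str = "gemini-1.5-flash") -> Any:
    """Create a Gemini client. The model name is an assumption."""
    import google.generativeai as genai  # lazy import: only needed for real calls
    genai.configure(api_key=api_key)
    return genai.GenerativeModel(model_name)


def frame_to_pil(frame: Any) -> Any:
    """Convert an OpenCV BGR frame to an RGB PIL image."""
    import cv2
    from PIL import Image
    return Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))


def describe_scene(model: Any, image: Any, prompt: str = SCENE_PROMPT) -> str:
    """Send the prompt and image to the model and return a trimmed description."""
    response = model.generate_content([prompt, image])
    return response.text.strip()
```

A typical use would be `describe_scene(build_model(key), frame_to_pil(frame))`. Keeping the prompt short and specific is what steers the model toward one-sentence descriptions suitable for speech.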

🔊 Voice Feedback

Generated descriptions are spoken aloud using pyttsx3 for real-time audio narration.
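
Speaking a description with pyttsx3 takes only a few calls. The sketch below is one possible wrapper, not the code in main.py; the `speak` name, the optional `engine` parameter, and the speech rate of 160 words per minute are assumptions.

```python
from typing import Any, Optional


def speak(text: str, engine: Optional[Any] = None, rate: int = 160) -> None:
    """Speak `text` aloud and block until playback finishes."""
    if not text.strip():
        return  # nothing to say
    if engine is None:
        import pyttsx3  # lazy import so a different engine can be passed in
        engine = pyttsx3.init()
    engine.setProperty("rate", rate)  # words per minute; 160 is an assumed value
    engine.say(text)
    engine.runAndWait()
```

Because `runAndWait()` blocks, the front description finishes playing before the back description starts.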

🎯 Applications

♿ Assistive Technology for Visually Impaired Users
🤖 Smart Robotics Systems
🚗 Autonomous Navigation Assistance
🏠 Smart Surveillance Systems
🧠 AI Accessibility Research
👓 Wearable AI Devices
📡 Real-Time Environmental Monitoring

📁 Project Structure

├── main.py
├── block_diagram.png
└── README.md

🔮 Future Enhancements

🎥 Continuous Real-Time Video Analysis
🧭 Indoor Navigation Assistance
🌍 Object Distance Estimation
📱 Mobile Application Integration
☁️ Cloud-Based AI Processing
🧑 Face & Emotion Recognition
🚨 Emergency Alert Detection

📜 License

This project is intended for educational and research purposes.

👨‍💻 Author

T. Adithya Reddy
