Vision Sentinel is a decoupled microservices architecture designed for high-efficiency object detection. It bridges the gap between raw AI models and scalable web services by utilizing asynchronous task processing and ONNX-optimized inference.
Traditional AI scripts block the main thread during heavy math operations, leading to timeouts and poor UX. Vision Sentinel solves this by decoupling the "Reception" from the "Execution."
- The Waiter (FastAPI): Receives image uploads and immediately returns a Task ID.
- The Bridge (Redis): Acts as a message broker to queue incoming detection requests.
- The Chef (Celery): Background workers that pull tasks from Redis and perform heavy CPU-bound inference.
- The Engine (ONNX Runtime): Runs a quantized YOLOv8 model optimized specifically for hardware without dedicated GPUs.
Most AI tutorials show you how to run a model in a Jupyter Notebook. This project addresses the "last mile" of AI deployment: Productionization.
- The "Blocking" Problem: In a standard web app, running an AI model freezes the server until the math is done. I overcame this by implementing a Distributed Task Queue, allowing the server to remain responsive while the Worker processes results in the background.
- Hardware Constraints: Developed on an Intel i3 (CPU-only) environment, I couldn't rely on expensive NVIDIA GPUs. I solved this by converting the pipeline to ONNX Runtime, achieving a 3x speedup over raw PyTorch and bringing inference time down to sub-250ms.
- Environment Drift: To ensure this works on any cloud provider, I containerized the entire environment. This eliminates the "it works on my machine" excuse by locking in specific system-level dependencies for OpenCV and Python.
By converting the model to ONNX format, I achieved significant latency improvements compared to standard PyTorch weights:
| Metric | Result |
|---|---|
| Model | YOLOv8n (Nano) |
| Precision | 95% (Benchmarked) |
| Inference Time | ~180ms - 250ms (CPU-only) |
| System Load | Non-blocking (Asynchronous) |
The pipeline doesn't just predict; it visualizes. The worker performs real-time Non-Maximum Suppression (NMS) and coordinate scaling to generate clean, high-precision bounding boxes.
Organized following Modular Python principles to separate concerns between the API, the Task Worker, and the AI Logic.
vision-sentinel/
├── data/ # Persistent volumes for uploads & results
├── models/
│ └── exported/ # Optimized ONNX model weights
├── src/
│ ├── main.py # FastAPI application (The Waiter)
│ ├── tasks.py # Celery background tasks (The Chef)
│ ├── model_factory.py # ONNX Inference & OpenCV visualization
│ └── __init__.py
├── Dockerfile # Custom image definition
├── docker-compose.yml # Multi-container orchestration
└── requirements.txt # Production dependencies
- Languages: Python 3.12
- Frameworks: FastAPI, Celery, OpenCV
- AI/ML: ONNX Runtime, YOLOv8
- Infrastructure: Docker, Docker Compose, Redis
Since the project is fully orchestrated with Docker, you only need one terminal and one command.
git clone https://github.com/YOUR_USERNAME/vision-sentinel.git
cd vision-sentineldocker-compose up --build- Open your browser and go to:
👉 http://localhost:8000/docs - Expand the
POST /detectendpoint - Upload an image
- The API will return a success response while the worker processes the detection asynchronously
- Check the output locally:
data/results/ - Or open in browser:
👉 http://localhost:8000/results/detected_[your_image_name].jpg
-
System Reliability:
Leveraged Celery and Redis to handle high-traffic workloads via asynchronous task queues -
Resource Efficiency:
Applied ONNX quantization to enable efficient inference on CPU-only environments -
Cloud Readiness:
Fully containerized with Docker, making it deployment-ready for AWS ECS, GCP Cloud Run, or similar platforms
Vision Sentinel is designed for continuous evolution. The following features are currently in development:
- 🔄 CI/CD Integration: Implementing GitHub Actions for automated linting, unit testing of the inference logic, and automated Docker builds on every push.
- 🧠 Generative AI (VLM) Adaptation: Integrating a lightweight Vision-Language Model (like Moondream) to provide natural language scene descriptions triggered by specific YOLO detections.
- ☁️ Full Cloud Deployment: Transitioning the stack to AWS (ECS/App Runner) with a stateless architecture using AWS S3 for result persistence.
- 📊 Metadata Persistence: Integrating a PostgreSQL database to log historical detection data and system performance metrics.