Real-time video captioning powered by the FastVLM-0.5B AI model, built with vanilla JavaScript (no frameworks!).

Requirements:
- WebGPU-enabled browser
- Camera/webcam
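
Both requirements can be checked from script before any model download starts. The following is a minimal sketch, not the app's actual code: `checkRequirements` is a hypothetical helper name, and it only tests that the browser exposes the relevant APIs (`navigator.gpu` and `navigator.mediaDevices.getUserMedia`), not that a camera is physically attached.

```javascript
// Hypothetical helper: returns a list of missing requirements ("webgpu", "camera").
// `nav` defaults to the browser's navigator; a stub can be passed for testing.
async function checkRequirements(nav = globalThis.navigator) {
  const missing = [];

  // WebGPU is exposed as navigator.gpu; requestAdapter() resolves to null
  // when no suitable GPU adapter is available.
  const adapter = nav?.gpu ? await nav.gpu.requestAdapter() : null;
  if (!adapter) missing.push("webgpu");

  // Camera access goes through navigator.mediaDevices.getUserMedia.
  // This checks only that the API exists, not that a device is present.
  if (typeof nav?.mediaDevices?.getUserMedia !== "function") {
    missing.push("camera");
  }

  return missing;
}
```

An empty array means both APIs are available; otherwise the page could show a message naming what is missing instead of attempting to load the model.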

1. Start a local server:

       # Using Python
       python -m http.server 8000
       # Or Node.js
       npx http-server -p 8000

2. Open in browser: http://localhost:8000
3. Grant camera permission when prompted.
4. Click "Start Live Captioning" to load the AI model.
5. Wait for the model to load (~1-2 minutes the first time; cached afterwards).
6. Start captioning! The AI will describe what it sees in real time.
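
Once the model is loaded, live captioning amounts to repeatedly grabbing a frame and describing it, without letting inference calls pile up. The sketch below shows one way to structure that loop; it is an illustration under assumptions, not the project's implementation. `createCaptionLoop`, `captureFrame`, `describeFrame`, and `onCaption` are hypothetical names, and the actual webcam and model calls are left to the caller.

```javascript
// Hypothetical caption loop. The caller supplies:
//   captureFrame():       returns (a promise of) the current video frame
//   describeFrame(frame): returns (a promise of) a caption string
//   onCaption(text):      receives each finished caption
function createCaptionLoop({ captureFrame, describeFrame, onCaption }) {
  let running = false; // desired state, toggled by start()/stop()
  let looping = false; // true while the async loop is actually executing

  async function run() {
    looping = true;
    try {
      // Each iteration awaits the previous inference, so calls never overlap.
      while (running) {
        const frame = await captureFrame();
        const caption = await describeFrame(frame);
        // Drop the result if captioning was paused while inference was in flight.
        if (running) onCaption(caption);
      }
    } finally {
      looping = false;
      running = false;
    }
  }

  return {
    // Returns a promise that settles when the loop ends,
    // or undefined if a loop is already executing.
    start() {
      running = true;
      return looping ? undefined : run();
    },
    stop() {
      running = false;
    },
  };
}
```

A Play/Pause control would map directly onto `start()` and `stop()`.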

Features:
- Real-time video captioning using AI
- Runs entirely in the browser - no backend needed for inference, and it works offline once the model is cached
- WebGPU acceleration for fast inference
- Modern glassmorphism UI
- Draggable interface elements
- Custom prompts - ask the AI anything about the video
- No install or build step - pure vanilla JavaScript

Use the prompt input (bottom-left) to ask specific questions:
- "What is the color of my shirt?"
- "Identify any text or written content visible."
- "What emotions or actions are being portrayed?"
Or click suggestion chips for quick prompts.
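
A custom prompt has to be turned into a chat-style message before it reaches the model. The sketch below shows one plausible shape for that step. It rests on two assumptions that the source does not confirm: that the model's chat template expects an `<image>` placeholder ahead of the question, and that an empty input falls back to a default prompt. `buildMessages` and `DEFAULT_PROMPT` are hypothetical names.

```javascript
// Hypothetical default used when the prompt input is left empty.
const DEFAULT_PROMPT = "Describe what you see in one sentence.";

// Hypothetical helper: wraps the user's question in a single-turn chat message.
function buildMessages(userPrompt) {
  const text = (userPrompt ?? "").trim() || DEFAULT_PROMPT;
  // Assumption: the chat template expects an "<image>" placeholder
  // immediately before the question text.
  return [{ role: "user", content: `<image>${text}` }];
}
```

For example, `buildMessages("What is the color of my shirt?")` yields one user message whose content is `<image>What is the color of my shirt?`. Passing that array on to the model is omitted here because it depends on how the app wires up Transformers.js.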

Controls:
- Play/Pause button (top-left) - Start/stop captioning
- Drag containers - Move prompt input and caption display anywhere
- Suggestion chips - Quick prompt selection
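
Draggable containers usually need their position constrained so they cannot be dropped off-screen. The source does not say whether this app does so; the following is only a sketch of the arithmetic, with `clampPosition` as a hypothetical helper name.

```javascript
// Hypothetical helper: keeps a dragged box fully inside the viewport.
// `box` and `viewport` are plain objects with width and height in pixels.
function clampPosition(x, y, box, viewport) {
  // Largest top-left coordinates at which the box still fits.
  const maxX = Math.max(0, viewport.width - box.width);
  const maxY = Math.max(0, viewport.height - box.height);
  return {
    x: Math.min(Math.max(x, 0), maxX),
    y: Math.min(Math.max(y, 0), maxY),
  };
}
```

In a pointer-move handler, the clamped `x` and `y` would be applied to the element's position in place of the raw pointer coordinates.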

Project structure:

    fastvlm-webgpu/
    ├── index.html            # Entry point
    ├── favicon.ico           # Favicon
    ├── styles/
    │   ├── main.css          # Base styles
    │   └── components.css    # Component styles
    └── js/
        ├── main.js           # App entry
        ├── utils/            # Helpers
        ├── services/         # Webcam & AI
        └── components/       # UI components

Browser support:

| Browser | Support | Notes |
|---|---|---|
| Chrome 113+ | ✅ | Full support |
| Edge 113+ | ✅ | Full support |
| Firefox 141+ | ✅ | Full support |
| Safari 26 Beta | ⚠️ | WebGPU experimental |

All loaded via CDN (no npm install needed!):
- @huggingface/transformers - AI model inference
- Model: FastVLM-0.5B-ONNX (quantized for browser)

Tech stack:
- Framework: Vanilla JavaScript (ES6 modules)
- AI Library: Transformers.js
- Acceleration: WebGPU
- Architecture: Event-driven component system

Credits:
- Built on Hugging Face Transformers.js
- Uses FastVLM-0.5B-ONNX model
- Rewritten in vanilla JS from React version
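
For readers unfamiliar with the "event-driven component system" listed above, the idea is that components communicate through named events instead of calling each other directly. The sketch below is a generic illustration of that pattern in vanilla JavaScript, not the project's actual code; `createEventBus`, `on`, `emit`, and the `"caption"` event name are all hypothetical.

```javascript
// Hypothetical minimal event bus for decoupling UI components.
function createEventBus() {
  const handlers = new Map(); // event name -> Set of callbacks

  return {
    // Subscribe to an event. Returns an unsubscribe function
    // so a component can clean up after itself.
    on(event, handler) {
      if (!handlers.has(event)) handlers.set(event, new Set());
      handlers.get(event).add(handler);
      return () => handlers.get(event)?.delete(handler);
    },
    // Notify every subscriber of `event`, passing along `payload`.
    emit(event, payload) {
      for (const handler of handlers.get(event) ?? []) handler(payload);
    },
  };
}
```

With this shape, a captioning service could call `bus.emit("caption", text)` while a caption display subscribes with `bus.on("caption", render)`, and neither needs a reference to the other.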

