See web/index.html for the dashboard; run the API with uvicorn edge.app:app --reload.
What it does / problem it solves
Small warehouses and fabrication shops rarely have the budget for enterprise safety analytics. SafeSight turns cheap IP cameras into a real-time safety co-pilot: it detects people, tools, forklifts, PPE compliance (hardhats/vests), unsafe postures, and blocked fire exits; summarizes incidents; and posts alerts with short video clips to Slack/Teams. Managers get a timeline, heat-maps, and trend reports that correlate with shift schedules and weather (e.g., more near-misses on rainy deliveries).
Built on key Hugging Face tasks:
- Object Detection (forklifts, pallets, fire extinguishers, spill kits, etc.)
- Keypoint Detection (human pose to flag bending/lifting risk)
- Video Classification (action recognition: “running forklift,” “person-in-no-go zone”)
- Zero-Shot Object Detection (quickly add “new” PPE types or signage without relabeling)
- Image-to-Text (generate incident summaries: “Person without hardhat near forklift aisle.”)
- Multi-task + multi-modal pipeline.
- Edge + cloud architecture that respects bandwidth & privacy (on-prem inference, cloud analytics).
- Zero-shot extensibility so customers can add new rules in natural language (“alert when a ladder is left unattended”).
- Operational ROI: fewer incidents, audit trails for OSHA/insurance, quantified “risk score per hour.”
- Python + FastAPI microservice; GStreamer or FFmpeg to read RTSP.
- PyTorch/ONNX Runtime (or OpenVINO on Intel NUC) for low-latency inference.
Models from Hugging Face:
- Object detection: YOLOv8/YOLOv10 or DETR variants hosted on HF.
- Pose / Keypoint Detection: YOLO-Pose or RTMPose.
- Video Classification: TimeSformer/ViViT distilled variants.
- Zero-Shot Detection: Grounding-DINO + OpenCLIP checkpoints.
- Image-to-Text: BLIP-2 / Qwen-VL / LLaVA-NeXT (small) for on-device captions.
- Local stream router: Redis Streams or NATS JetStream for frames + metadata.
Cloud for Analytics and UI:
- API: FastAPI + PostgreSQL/TimescaleDB (events/time-series).
- Message bus: Kafka (or managed: Redpanda Cloud) to ingest site events.
- Object storage: S3/Cloudflare R2 for clips & snapshots.
- Dashboard: Next.js (React) + WebRTC live thumbnails; Mapbox GL heatmaps; Recharts for trends.
- Alerts: Slack/Teams webhooks, Twilio SMS.
- Auth & tenancy: Auth0 / Clerk.
- Deployment: Docker + Kubernetes (k3s at edge, managed K8s in cloud) with ArgoCD.
- CI: GitHub Actions; model registry via Hugging Face Hub + release tags.
- OpenWeather (weather events → correlate with incident spikes).
- Workable/BambooHR (shift schedules to segment metrics by crew).
- Slack/Teams for alert delivery + feedback buttons to label false positives (active learning).
Rule Builder (no code)
- “If person without hardhat within 3m of forklift → High severity alert.”
- Zero-shot extensions: add objects with a phrase (“blue nitrile gloves”), matched via Grounding-DINO + CLIP text prompts.
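A rule like the hardhat example above could be stored as plain data and checked against pairs of detections. The following is a minimal sketch only: the Rule dataclass, its field names, the detection dict layout, and the matches function are all hypothetical and not SafeSight's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical declarative rule, as a no-code builder might emit it."""
    subject: str            # tracked class the rule is about, e.g. "person"
    missing_ppe: str        # PPE label that must be absent, e.g. "hardhat"
    near: str               # class of the nearby hazard, e.g. "forklift"
    max_distance_m: float   # proximity threshold in metres
    severity: str           # severity attached to the resulting alert


def matches(rule: Rule, subject: dict, other: dict, distance_m: float) -> bool:
    """Return True when a (subject, other) pair of detections violates the rule."""
    return (
        subject["label"] == rule.subject
        and rule.missing_ppe not in subject.get("ppe", [])
        and other["label"] == rule.near
        and distance_m <= rule.max_distance_m
    )


# "If person without hardhat within 3m of forklift -> High severity alert."
hardhat_rule = Rule("person", "hardhat", "forklift", 3.0, "high")
worker = {"label": "person", "ppe": ["vest"]}   # assumed detection format
forklift = {"label": "forklift"}
print(matches(hardhat_rule, worker, forklift, distance_m=2.2))
```

Keeping rules as data rather than code is what would let a builder UI or a natural-language front end create them without a redeploy.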
Real-time Alerts
- 10–15s clip, bounding boxes + pose overlay, image-to-text caption, rule match, confidence.
- Buttons: “Valid / False Positive / Escalate,” feeding a feedback topic for re-training.
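The alert contents and feedback buttons listed above could map onto a Slack Block Kit message roughly as follows. This sketch only builds the payload and makes no network call. The event field names, the action_id values, and the example clip URL are assumptions for illustration.

```python
import json


def build_slack_alert(event: dict) -> dict:
    """Build a Slack Block Kit message for one incident (payload only)."""
    summary = (
        f"*{event['severity'].upper()}* | {event['rule']}\n"
        f"{event['caption']}\n"
        f"Confidence: {event['confidence']:.0%} | Clip: {event['clip_url']}"
    )
    # One button per feedback choice; the value carries the event id so the
    # click handler can attach the label to the right incident.
    buttons = [
        {
            "type": "button",
            "text": {"type": "plain_text", "text": label},
            "action_id": f"feedback_{value}",
            "value": f"{event['event_id']}:{value}",
        }
        for label, value in [
            ("Valid", "valid"),
            ("False Positive", "false_positive"),
            ("Escalate", "escalate"),
        ]
    ]
    return {
        "text": f"{event['severity'].upper()}: {event['caption']}",  # fallback text
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": summary}},
            {"type": "actions", "elements": buttons},
        ],
    }


# Hypothetical incident; every value here is made up for the example.
event = {
    "event_id": "evt-123",
    "severity": "high",
    "rule": "person without hardhat within 3m of forklift",
    "caption": "Person without hardhat near forklift aisle.",
    "confidence": 0.87,
    "clip_url": "https://example.com/clips/evt-123.mp4",
}
payload = build_slack_alert(event)
print(json.dumps(payload, indent=2))
```

Receiving the button clicks needs a Slack app with interactivity enabled; a plain incoming webhook only delivers the message.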
Risk Analytics
- Heatmap of incidents by camera and floor plan; trend lines by shift, weather, and line.
- “Top 5 recurring hazards” with generated summaries.
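The aggregations behind these views can be approximated with simple counting. The sketch below assumes each stored event is a dict with rule, camera_id, and shift keys; those names and the sample events are hypothetical.

```python
from collections import Counter


def top_hazards(events: list, n: int = 5) -> list:
    """Return the n most frequent rule matches as (rule, count) pairs."""
    return Counter(e["rule"] for e in events).most_common(n)


def counts_by(events: list, key: str) -> Counter:
    """Count incidents per value of one dimension (camera, shift, ...)."""
    return Counter(e[key] for e in events)


# Made-up events for illustration only.
events = [
    {"rule": "no_hardhat", "camera_id": "cam-1", "shift": "day"},
    {"rule": "no_hardhat", "camera_id": "cam-1", "shift": "night"},
    {"rule": "blocked_exit", "camera_id": "cam-2", "shift": "day"},
    {"rule": "no_hardhat", "camera_id": "cam-2", "shift": "day"},
]
print(top_hazards(events))
print(counts_by(events, "camera_id"))
print(counts_by(events, "shift"))
```

In production the same grouping would more likely run as a SQL query against the time-series store than in application code.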
Privacy & Compliance: Optional on-edge blurring of faces; configurable retention; per-tenant encryption.
Capture: RTSP → GStreamer → frame sampler (e.g., 10 FPS for detection, 2 FPS for pose).
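The frame sampler step can be sketched independently of GStreamer: given the camera's native frame rate, pick which frame indices each worker should receive. The function name and the 30 FPS source rate are assumptions; the 10 FPS and 2 FPS targets come from the example rates above.

```python
def sample_indices(source_fps: float, target_fps: float, n_frames: int) -> list[int]:
    """Pick frame indices that downsample a stream from source_fps to target_fps."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if target_fps >= source_fps:
        return list(range(n_frames))  # nothing to drop
    step = source_fps / target_fps   # source frames per kept frame
    indices = []
    next_pick = 0.0
    for i in range(n_frames):
        # Keep the first frame at or past each sampling point.
        if i >= next_pick:
            indices.append(i)
            next_pick += step
    return indices


# One second of a camera assumed to run at 30 FPS.
detection_frames = sample_indices(30, 10, 30)
pose_frames = sample_indices(30, 2, 30)
print(detection_frames)
print(pose_frames)
```

Using a fractional step rather than a fixed integer stride keeps the output rate right for sources such as 25 FPS cameras, where 25/10 is not a whole number.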
Per-task workers:
- Detection → tracks with ByteTrack/DeepSORT to maintain identities across frames.
- Pose → compute unsafe-posture heuristics (back angle > X°, knees locked, etc.).
- Video classification → sliding windows (2–4 s) for “dangerous action” labels.
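Two of the worker computations above lend themselves to a short sketch: a back-angle estimate from pose keypoints, and the sliding windows fed to the video classifier. Assumptions: keypoints are (x, y) pixel coordinates with y growing downward, the angle is measured in the 2D image plane (a rough proxy that depends on camera viewpoint), and the window stride is a free parameter the text does not specify. The threshold used in the demo is a placeholder, since the source leaves X open.

```python
import math


def back_angle_deg(shoulder: tuple, hip: tuple) -> float:
    """Angle between the hip->shoulder line and vertical, in degrees (0 = upright)."""
    dx = shoulder[0] - hip[0]
    dy = shoulder[1] - hip[1]          # negative when the shoulder is above the hip
    return math.degrees(math.atan2(abs(dx), -dy))


def sliding_windows(n_frames: int, fps: float, window_s: float, stride_s: float) -> list:
    """Return (start, end) frame ranges of window_s seconds, advanced by stride_s."""
    size = int(round(window_s * fps))
    step = int(round(stride_s * fps))
    if size <= 0 or step <= 0:
        raise ValueError("window and stride must cover at least one frame")
    return [(s, s + size) for s in range(0, n_frames - size + 1, step)]


THRESHOLD_DEG = 45.0  # placeholder only; the real value of X must be tuned

upright = back_angle_deg(shoulder=(100, 50), hip=(100, 150))
bent = back_angle_deg(shoulder=(190, 140), hip=(100, 150))
print(upright > THRESHOLD_DEG, bent > THRESHOLD_DEG)

# 10 seconds at 10 FPS, 3 s windows, 1 s stride (stride is an assumption).
print(sliding_windows(n_frames=100, fps=10, window_s=3, stride_s=1))
```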
Rule engine: CEP (Complex Event Processing) over tracks (e.g., person id #37 and forklift id #8 distance < 2.5m for >1.5 s).
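The proximity-for-duration example can be sketched as a small stateful check over one pair of tracks. This assumes track positions have already been projected to floor-plane metres (for instance via a per-camera calibration, which is not shown) and that samples arrive in time order; the function names are hypothetical.

```python
import math


def distance_m(a: tuple, b: tuple) -> float:
    """Euclidean distance between two floor-plane positions in metres."""
    return math.dist(a, b)


def proximity_violation(samples, max_distance_m: float = 2.5,
                        min_duration_s: float = 1.5) -> bool:
    """True if distance stays below the limit for longer than min_duration_s.

    samples: iterable of (timestamp_s, distance_m), sorted by timestamp.
    """
    start = None                      # when the current close-proximity run began
    for t, d in samples:
        if d < max_distance_m:
            if start is None:
                start = t
            if t - start > min_duration_s:
                return True
        else:
            start = None              # run broken; reset the timer
    return False


# Person #37 and forklift #8 sampled at 10 FPS for two seconds.
sustained = [(i / 10, 2.0) for i in range(20)]
interrupted = [(i / 10, 3.0 if i == 10 else 2.0) for i in range(20)]
print(proximity_violation(sustained))
print(proximity_violation(interrupted))
```

Resetting the timer on any sample outside the limit makes the rule strict; a real engine might tolerate a few dropped detections before resetting.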
Summarizer: Image crop(s) → Image-to-Text → templated incident card.
Publisher: Event JSON + clip to Kafka → Cloud API → store, alert, and render in UI.
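The event JSON handed to the message bus might look like the sketch below. The IncidentEvent fields, the object-storage key layout, and the placeholder hash are assumptions; only the tenancy identifiers and the model tag format appear elsewhere in this document. The function returns bytes suitable as a message value and does not call any Kafka client.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class IncidentEvent:
    """Hypothetical event envelope published by an edge worker."""
    org_id: str
    site_id: str
    camera_id: str
    ts: str            # ISO 8601 timestamp in UTC
    rule: str
    severity: str
    confidence: float
    caption: str
    clip_key: str      # object-storage key of the uploaded clip
    model: str         # model tag that produced the detection
    model_hash: str    # weights hash, logged for reproducibility


def to_message(event: IncidentEvent) -> bytes:
    """Serialize an event to compact, key-sorted UTF-8 JSON."""
    return json.dumps(asdict(event), separators=(",", ":"), sort_keys=True).encode("utf-8")


event = IncidentEvent(
    org_id="org-1",
    site_id="site-7",
    camera_id="cam-3",
    ts=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc).isoformat(),
    rule="person without hardhat within 3m of forklift",
    severity="high",
    confidence=0.87,
    caption="Person without hardhat near forklift aisle.",
    clip_key="org-1/site-7/cam-3/evt-123.mp4",   # made-up key layout
    model="safesight/detector:v0.3",
    model_hash="sha256:placeholder",             # not a real digest
)
message = to_message(event)
print(message.decode("utf-8"))
```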
Scalability:
- Add cameras by adding edge workers; throughput scales horizontally.
- Models swapped via HF Hub tags (e.g., safesight/detector:v0.3), rolled out with canary on a subset of sites.
- Data schema designed for multi-tenant (org_id, site_id, camera_id) and time-series queries.
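A canary rollout over a subset of sites can be made deterministic by hashing the site identifier, so each site always gets the same decision for a given model tag. This is one possible approach and not necessarily how SafeSight selects sites; the function name and percentage parameter are hypothetical.

```python
import hashlib


def in_canary(site_id: str, model_tag: str, percent: int) -> bool:
    """Deterministically decide whether a site receives the canary model."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Hash tag and site together so different releases pick different subsets.
    digest = hashlib.sha256(f"{model_tag}:{site_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100   # stable bucket in 0..99
    return bucket < percent


sites = [f"site-{i}" for i in range(1, 11)]
canary_sites = [s for s in sites if in_canary(s, "safesight/detector:v0.3", 20)]
print(canary_sites)
```

Raising the percentage only ever adds sites to the canary set, because each site's bucket is fixed for a given tag.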
- Seed with public forklift/warehouse videos (YouTube-8M style), synthetic data from Roboflow Universe, and your own recordings.
- Create a labeled validation set of 200–400 clips; track precision/recall, false-alert rate/hour, latency p95.
- A/B compare detectors (DETR vs YOLOv10) and pose models; log the model name and hash per incident for reproducibility.
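The validation metrics named above are simple to compute once each clip is labeled. The sketch below assumes counts of true positives, false positives, and false negatives per run, plus a list of per-alert latencies; the function names and sample numbers are made up, and p95 uses the nearest-rank definition.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall); each is 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def false_alerts_per_hour(fp: int, footage_seconds: float) -> float:
    """False positives normalized by hours of footage evaluated."""
    if footage_seconds <= 0:
        raise ValueError("footage_seconds must be positive")
    return fp / (footage_seconds / 3600)


def p95(values: list) -> float:
    """95th percentile by the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100   # integer ceil(0.95 * n)
    return ordered[rank - 1]


# Illustrative numbers only, not measured results.
print(precision_recall(tp=45, fp=5, fn=15))
print(false_alerts_per_hour(fp=5, footage_seconds=7200))
print(p95(list(range(1, 101))))
```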