JoeyBobDev/SafeSight

SafeSight — Edge to Cloud Computer Vision Analytics for Workplace Safety (In Progress)

Open web/index.html for the dashboard. Start the API with: uvicorn edge.app:app --reload

SafeSight Proof of Concept

What it does / problem it solves

Small warehouses and fabrication shops rarely have the budget for enterprise safety analytics. SafeSight turns cheap IP cameras into a real-time safety co-pilot: it detects people, tools, forklifts, PPE compliance (hardhats/vests), unsafe postures, and blocked fire exits; summarizes incidents; and posts alerts with short video clips to Slack/Teams. Managers get a timeline, heatmaps, and trend reports that correlate with shift schedules and weather (e.g., more near-misses on rainy deliveries).

Built on key Hugging Face tasks

  • Object Detection (forklifts, pallets, fire extinguishers, spill kits, etc.)
  • Keypoint Detection (human pose to flag bending/lifting risk)
  • Video Classification (action recognition: “running forklift,” “person-in-no-go zone”)
  • Zero-Shot Object Detection (quickly add “new” PPE types or signage without relabeling)
  • Image-to-Text (generate incident summaries: “Person without hardhat near forklift aisle.”)

What it brings

  • Multi-task + multi-modal pipeline.
  • Edge + cloud architecture that respects bandwidth & privacy (on-prem inference, cloud analytics).
  • Zero-shot extensibility, so customers can add new rules in natural language (“alert when a ladder is left unattended”).
  • Operational ROI: fewer incidents, audit trails for OSHA/insurance, quantified “risk score per hour.”

Tech Stack considered

  • Python + FastAPI microservice; GStreamer or FFmpeg to read RTSP.
  • PyTorch/ONNX Runtime (or OpenVINO on Intel NUC) for low-latency inference.
  • Local stream router: Redis Streams or NATS JetStream for frames + metadata.

Models from Hugging Face:

  • Object detection: YOLOv8/YOLOv10 or DETR variants hosted on HF.
  • Pose / Keypoint Detection: YOLO-Pose or RTMPose.
  • Video Classification: TimeSformer/ViViT distilled variants.
  • Zero-Shot Detection: Grounding-DINO + OpenCLIP checkpoints.
  • Image-to-Text: BLIP-2 / Qwen-VL / LLaVA-NeXT (small) for on-device captions.

Cloud for Analytics and UI:

  • API: FastAPI + PostgreSQL/TimescaleDB (events/time-series).
  • Message bus: Kafka (or managed: Redpanda Cloud) to ingest site events.
  • Object storage: S3/Cloudflare R2 for clips & snapshots.
  • Dashboard: Next.js (React) + WebRTC live thumbnails; Mapbox GL heatmaps; Recharts for trends.
  • Alerts: Slack/Teams webhooks, Twilio SMS.
  • Auth & tenancy: Auth0 / Clerk.
  • Deployment: Docker + Kubernetes (k3s at edge, managed K8s in cloud) with ArgoCD.
  • CI: GitHub Actions; model registry via Hugging Face Hub + release tags.

Real-World Data Usage and APIs

  • OpenWeather (weather events → correlate with incident spikes).
  • Workable/BambooHR (shift schedules to segment metrics by crew).
  • Slack/Teams for alert delivery + feedback buttons to label false positives (active learning).

Core Features and User Experience

Rule Builder (no code)

  • “If person without hardhat within 3m of forklift → High severity alert.”
  • Zero-shot extensions: add objects with a phrase (“blue nitrile gloves”) using Grounding-DINO + CLIP text prompts.
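
A rule such as the hardhat example above could be stored as plain data and checked against the detections in a frame. The following is a minimal sketch under stated assumptions: the Rule and Detection classes, their field names, and the idea that positions are already projected to floor-plane metres are all hypothetical and not taken from the SafeSight code.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Detection:
    """One detected object in floor-plane coordinates (metres). Hypothetical shape."""
    label: str
    x: float
    y: float
    attributes: frozenset = frozenset()  # e.g. frozenset({"hardhat"})


@dataclass(frozen=True)
class Rule:
    """'If <subject> without <attribute> within <distance> of <other> -> severity'."""
    subject_label: str
    missing_attribute: str
    near_label: str
    max_distance_m: float
    severity: str


def evaluate(rule, detections):
    """Return (subject, other, distance_m) for every pair that violates the rule."""
    subjects = [d for d in detections
                if d.label == rule.subject_label
                and rule.missing_attribute not in d.attributes]
    others = [d for d in detections if d.label == rule.near_label]
    hits = []
    for s in subjects:
        for o in others:
            dist = hypot(s.x - o.x, s.y - o.y)
            if dist <= rule.max_distance_m:
                hits.append((s, o, dist))
    return hits


hardhat_rule = Rule("person", "hardhat", "forklift", 3.0, "high")
frame = [
    Detection("person", 1.0, 1.0),                          # no hardhat, 2 m away
    Detection("person", 2.0, 1.0, frozenset({"hardhat"})),  # compliant
    Detection("forklift", 3.0, 1.0),
]
violations = evaluate(hardhat_rule, frame)
for subject, other, dist in violations:
    print(f"{hardhat_rule.severity}: {subject.label} {dist:.1f} m from {other.label}")
```

Keeping rules as data rather than code is what would let a no-code builder create and edit them; a zero-shot phrase would simply become another label to match.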

Real-time Alerts

  • 10–15s clip, bounding boxes + pose overlay, image-to-text caption, rule match, confidence.
  • Buttons: “Valid / False Positive / Escalate,” feeding a feedback topic for re-training.
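
As a rough illustration of alert delivery, the sketch below formats one alert as text and posts it as a JSON body with a text field, which is the basic shape Slack incoming webhooks accept. The event field names, the example values, and the clip URL are made up for illustration. The feedback buttons would need interactive message support and are not attempted here.

```python
import json
import urllib.request


def format_alert_text(event):
    """Render one alert as a short plain-text message."""
    return (
        f"[{event['severity'].upper()}] {event['rule']}\n"
        f"Camera: {event['camera_id']} | confidence: {event['confidence']:.0%}\n"
        f"Caption: {event['caption']}\n"
        f"Clip: {event['clip_url']}"
    )


def post_to_webhook(webhook_url, text):
    """POST {"text": ...} as JSON; returns the HTTP status code."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


# Hypothetical event; every value here is illustrative only.
example_event = {
    "severity": "high",
    "rule": "person without hardhat within 3m of forklift",
    "camera_id": "cam-03",
    "confidence": 0.87,
    "caption": "Person without hardhat near forklift aisle.",
    "clip_url": "https://example.com/clips/abc.mp4",
}
alert_text = format_alert_text(example_event)
print(alert_text)
# post_to_webhook(<your webhook URL>, alert_text)  # not called here: needs a real URL
```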

Risk Analytics

  • Heatmap of incidents by camera and floor plan; trend lines by shift, weather, and line.
  • “Top 5 recurring hazards” with generated summaries.

Privacy & Compliance: Optional on-edge blurring of faces; configurable retention; per-tenant encryption.

High Level Architecture

Capture: RTSP → GStreamer → frame sampler (e.g., 10 FPS for detection, 2 FPS for pose).
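
One way to feed different tasks at different rates from a single stream is a timestamp-based sampler per task. This is a sketch only: the FrameSampler class is a hypothetical helper, and the 30 FPS camera rate in the usage lines is an assumed value, not something the project specifies.

```python
class FrameSampler:
    """Forward a frame only when the next slot at target_fps is due."""

    def __init__(self, target_fps):
        self.interval = 1.0 / target_fps
        self.next_due = 0.0

    def should_forward(self, timestamp_s):
        # Small epsilon so float rounding does not skip a frame that is exactly due.
        if timestamp_s + 1e-9 < self.next_due:
            return False
        if timestamp_s - self.next_due > self.interval:
            # Stream stalled or jumped: restart the schedule from this frame.
            self.next_due = timestamp_s + self.interval
        else:
            # Advance from the due time, not the arrival time, to avoid drift.
            self.next_due += self.interval
        return True


detection_sampler = FrameSampler(target_fps=10)
pose_sampler = FrameSampler(target_fps=2)

# One second of frames from an assumed 30 FPS camera.
timestamps = [i / 30.0 for i in range(30)]
detection_frames = [t for t in timestamps if detection_sampler.should_forward(t)]
pose_frames = [t for t in timestamps if pose_sampler.should_forward(t)]
print(len(detection_frames), len(pose_frames))
```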

Per-task workers:

  • Detection → tracks with ByteTrack/DeepSORT to maintain identities over frames.
  • Pose → compute unsafe-posture heuristics (back angle > X°, knees locked, etc.).
  • Video classification → sliding windows (2–4 s) for “dangerous action” labels.
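
The back-angle heuristic in the pose worker could be computed from two keypoints. The sketch below assumes 2D image coordinates with y growing downward and uses the shoulder and hip as a stand-in for the spine; the function names are hypothetical. The threshold is left as a parameter because the source does not fix a value, and the 60 in the usage lines is an arbitrary example. A 2D angle also depends on camera viewpoint, so this is at best a first approximation.

```python
from math import atan2, degrees


def back_angle_deg(shoulder_xy, hip_xy):
    """Angle between the hip->shoulder line and vertical, in degrees (0 = upright)."""
    dx = shoulder_xy[0] - hip_xy[0]
    dy = hip_xy[1] - shoulder_xy[1]  # image y grows downward, so flip the sign
    return abs(degrees(atan2(dx, dy)))


def is_unsafe_bend(shoulder_xy, hip_xy, threshold_deg):
    """True when the torso leans further from vertical than threshold_deg."""
    return back_angle_deg(shoulder_xy, hip_xy) > threshold_deg


upright = back_angle_deg(shoulder_xy=(100, 50), hip_xy=(100, 150))
bent_over = back_angle_deg(shoulder_xy=(200, 150), hip_xy=(100, 150))
print(round(upright), round(bent_over))

# 60 is an arbitrary illustrative threshold, not a value from the project.
print(is_unsafe_bend((200, 150), (100, 150), threshold_deg=60))
```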

Rule engine: CEP (Complex Event Processing) over tracks (e.g., person id #37 and forklift id #8 distance < 2.5m for >1.5 s).
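
The distance-and-duration pattern in the example can be sketched as a small stateful check per pair of tracks. This is an illustration rather than a full CEP engine: the ProximityDwellRule class is hypothetical, positions are assumed to be floor-plane metres, and the 10 FPS timestamps in the usage lines are assumed.

```python
from math import hypot


class ProximityDwellRule:
    """Fire once when two tracks stay closer than max_distance_m for longer than min_duration_s."""

    def __init__(self, max_distance_m, min_duration_s):
        self.max_distance_m = max_distance_m
        self.min_duration_s = min_duration_s
        self._since = {}     # pair -> timestamp when the pair first came within range
        self._fired = set()  # pairs already reported for the current approach

    def update(self, timestamp_s, pair, pos_a, pos_b):
        """Feed one observation; return True only on the update that crosses the threshold."""
        dist = hypot(pos_a[0] - pos_b[0], pos_a[1] - pos_b[1])
        if dist >= self.max_distance_m:
            # Pair separated: forget the dwell so a later approach starts fresh.
            self._since.pop(pair, None)
            self._fired.discard(pair)
            return False
        start = self._since.setdefault(pair, timestamp_s)
        if timestamp_s - start > self.min_duration_s and pair not in self._fired:
            self._fired.add(pair)
            return True
        return False


rule = ProximityDwellRule(max_distance_m=2.5, min_duration_s=1.5)
pair = (37, 8)  # (person track id, forklift track id)

# Person and forklift stay 2 m apart for two seconds of 10 FPS observations.
fired_at = [i for i in range(20)
            if rule.update(i / 10, pair, (0.0, 0.0), (2.0, 0.0))]
print(fired_at)
```

Firing once per approach, rather than on every frame past the threshold, keeps one near-miss from producing a burst of duplicate alerts.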

Summarizer: Image crop(s) → Image-to-Text → templated incident card.

Publisher: Event JSON + clip to Kafka → Cloud API → store, alert, and render in UI.
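
A possible shape for the event JSON is sketched below. Only org_id, site_id, and camera_id come from the schema described in this README; the other field names and all example values are assumptions. The sketch also assumes the clip is uploaded to object storage and referenced by key, and it stops at producing bytes, leaving the Kafka client out.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SafetyEvent:
    org_id: str
    site_id: str
    camera_id: str
    rule: str
    severity: str
    confidence: float
    occurred_at: str  # ISO 8601 timestamp, UTC
    clip_key: str     # object-storage key of the clip (assumed to be uploaded separately)
    model_tag: str    # which model version produced the detection

    def to_json(self):
        """Serialize to UTF-8 JSON bytes, ready to hand to a message producer."""
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    def partition_key(self):
        """Key by camera so events from one camera stay in order."""
        return f"{self.org_id}/{self.site_id}/{self.camera_id}".encode("utf-8")


# Illustrative values only.
event = SafetyEvent(
    org_id="acme",
    site_id="site-1",
    camera_id="cam-03",
    rule="person_forklift_proximity",
    severity="high",
    confidence=0.87,
    occurred_at="2024-01-01T08:05:00+00:00",
    clip_key="clips/acme/site-1/cam-03/example.mp4",
    model_tag="safesight/detector:v0.3",
)
payload = event.to_json()
print(event.partition_key(), len(payload))
```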

Scalability:

  • Add cameras by adding edge workers; throughput scales horizontally.
  • Models swapped via HF Hub tags (e.g., safesight/detector:v0.3), rolled out with canary on a subset of sites.
  • Data schema designed for multi-tenant (org_id, site_id, camera_id) and time-series queries.
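
To make the multi-tenant, time-series idea concrete, here is a small sketch that uses the standard-library sqlite3 module as a stand-in for PostgreSQL/TimescaleDB. The table name, the columns beyond org_id, site_id, and camera_id, and the sample rows are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        org_id      TEXT NOT NULL,
        site_id     TEXT NOT NULL,
        camera_id   TEXT NOT NULL,
        occurred_at TEXT NOT NULL,  -- ISO 8601, UTC
        rule        TEXT NOT NULL,
        severity    TEXT NOT NULL
    )
""")
# Tenant columns first, then time: every query is scoped to a tenant and a time range.
conn.execute("CREATE INDEX events_tenant_time ON events (org_id, site_id, occurred_at)")

sample_rows = [  # made-up data
    ("acme", "site-1", "cam-01", "2024-01-01T08:05:00", "no_hardhat", "high"),
    ("acme", "site-1", "cam-02", "2024-01-01T08:40:00", "blocked_exit", "medium"),
    ("acme", "site-1", "cam-01", "2024-01-01T09:10:00", "no_hardhat", "high"),
    ("other", "site-9", "cam-01", "2024-01-01T08:15:00", "no_hardhat", "high"),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)", sample_rows)


def events_per_hour(conn, org_id, site_id):
    """Count events per hour bucket for one tenant and site."""
    return conn.execute(
        """
        SELECT substr(occurred_at, 1, 13) AS hour_bucket, COUNT(*)
        FROM events
        WHERE org_id = ? AND site_id = ?
        GROUP BY hour_bucket
        ORDER BY hour_bucket
        """,
        (org_id, site_id),
    ).fetchall()


hourly = events_per_hour(conn, "acme", "site-1")
print(hourly)
```

In TimescaleDB the hour bucketing would normally be done by the database's own time functions; the string slice here only works because the timestamps are fixed-width ISO 8601 text.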

Data and Evaluation

  • Seed with public forklift/warehouse videos (YouTube-8M style), synthetic data from Roboflow Universe, and your own recordings.
  • Create a labeled validation set of 200–400 clips; track precision/recall, false-alert rate/hour, latency p95.
  • A/B compare detectors (DETR vs YOLOv10) and pose models; log the model name and hash for each incident for reproducibility.
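
The metrics listed above can be computed with a few small helpers. This is a sketch with hypothetical function names; the p95 uses the nearest-rank definition, which is one of several common percentile conventions, and the counts in the usage lines are made up.

```python
import math


def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall); a ratio with an empty denominator is reported as 0.0."""
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


def false_alerts_per_hour(false_pos, video_seconds):
    """False alerts normalized by the hours of video evaluated."""
    return false_pos / (video_seconds / 3600.0)


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up numbers for illustration.
precision, recall = precision_recall(true_pos=8, false_pos=2, false_neg=2)
rate = false_alerts_per_hour(false_pos=3, video_seconds=7200)
latency_p95_ms = percentile_nearest_rank(list(range(1, 101)), 95)
print(precision, recall, rate, latency_p95_ms)
```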

About

Early proof of concept of a computer vision project addressing workplace risk and safety.
