YOLO object detection pipeline for cleanroom equipment recognition. Covers the full workflow from raw video to trained model to ROI generation.
Raw Video
|
v
1. data_prep/vid_to_frame.py -- extract frames from videos
|
v
2. Annotate in CVAT -- draw bounding boxes (see docs/cvat-guide.md)
|
v
3. data_prep/train_test_split.py -- split CVAT export into train/val + cleanroom.yaml
|
v
4. training/train.py -- train YOLO model
|
v
5. training/eval.py -- evaluate on val split
|
v
6. inference/convert_video.py -- convert new video to 720p
|
v
7. inference/generate_roi.py -- run inference + tracking -> .roi JSON
Optional: data_prep/build_annotation_subset.py -- pre-label frames with an existing model
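`build_annotation_subset.py` itself is not shown here. The following is a hypothetical sketch of what pre-labeling involves. CVAT's YOLO 1.1 format stores one object per line as `class_id x_center y_center width height`, normalized to [0, 1], so predictions from an existing Ultralytics model have to be converted from pixel corners into that form. The helper names `xyxy_to_yolo_line` and `prelabel` are illustrative. `YOLO.predict`, `Results.orig_shape`, and `Boxes.xyxy`/`Boxes.cls` are Ultralytics APIs.

```python
# Hypothetical pre-labeling helpers (names are illustrative, not from build_annotation_subset.py).


def xyxy_to_yolo_line(class_id: int, box: tuple[float, float, float, float],
                      img_w: int, img_h: int) -> str:
    """Convert a pixel-space (x1, y1, x2, y2) box into a YOLO 1.1 label line."""
    x1, y1, x2, y2 = box
    # Clamp to the image so slightly out-of-frame predictions stay within [0, 1].
    x1, x2 = max(0.0, x1), min(float(img_w), x2)
    y1, y2 = max(0.0, y1), min(float(img_h), y2)
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


def prelabel(weights: str, image_paths: list[str], conf: float = 0.5) -> dict[str, list[str]]:
    """Run an existing Ultralytics model and return YOLO 1.1 label lines per image."""
    from ultralytics import YOLO  # lazy import: only needed when actually predicting

    model = YOLO(weights)
    labels: dict[str, list[str]] = {}
    for result in model.predict(image_paths, conf=conf, stream=True):
        img_h, img_w = result.orig_shape  # orig_shape is (height, width)
        labels[result.path] = [
            xyxy_to_yolo_line(int(c), tuple(b), img_w, img_h)
            for b, c in zip(result.boxes.xyxy.tolist(), result.boxes.cls.tolist())
        ]
    return labels
```

Writing each list of lines to a `.txt` file named after its frame gives files that CVAT can import as YOLO 1.1 annotations for review.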
pip install ultralytics opencv-python pyyaml
ffmpeg must be on PATH (used for frame extraction and video conversion).
Docker is required for running CVAT.
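The prerequisites can be checked before starting with a short script like the one below. The function name and module mapping are assumptions that mirror the pip command above: `opencv-python` and `pyyaml` are imported as `cv2` and `yaml`. `shutil.which` looks programs up on PATH.

```python
import importlib.util
import shutil

# pip package name -> import name, mirroring the pip install line above.
REQUIRED_MODULES = {"ultralytics": "ultralytics", "opencv-python": "cv2", "pyyaml": "yaml"}


def missing_prerequisites() -> list[str]:
    """Return readable names of anything the pipeline needs but cannot find."""
    missing = [pip_name for pip_name, module in REQUIRED_MODULES.items()
               if importlib.util.find_spec(module) is None]
    # ffmpeg is run as an external program, so it must resolve on PATH.
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg (not on PATH)")
    # Docker is only needed for the CVAT annotation step.
    if shutil.which("docker") is None:
        missing.append("docker (needed for CVAT)")
    return missing
```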
python data_prep/vid_to_frame.py \
--input-dir data/videos \
--output-dir data/frames \
--every 300
See docs/cvat-guide.md for the full annotation workflow: setting up CVAT, creating projects/tasks, distributing work across multiple annotators, drawing bounding boxes, and exporting in YOLO 1.1 format.
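For step 1, the exact ffmpeg call inside `vid_to_frame.py` is not shown. Assuming `--every 300` means keeping one frame out of every 300, an equivalent standalone call can use ffmpeg's `select` filter, as in the sketch below. The names `extract_cmd` and `extract_all`, the extension list, and the `frame_%06d.jpg` naming are illustrative. Output goes into one folder per video, matching the `data/frames/<video-folder>/` layout used later.

```python
import shutil
import subprocess
from pathlib import Path

VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv"}  # assumed input extensions


def extract_cmd(video: Path, out_dir: Path, every: int) -> list[str]:
    """Build an ffmpeg command that keeps frames 0, every, 2*every, ... as JPEGs."""
    if every < 1:
        raise ValueError("every must be >= 1")
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-i", str(video),
        # Keep frames whose index n is divisible by `every`; the comma inside
        # mod() is escaped so the filtergraph parser does not split on it.
        "-vf", f"select=not(mod(n\\,{every}))",
        # Emit only the selected frames (newer ffmpeg prefers -fps_mode vfr).
        "-vsync", "vfr",
        "-q:v", "2",  # high JPEG quality
        # Output numbering is sequential, not the original frame index.
        str(out_dir / "frame_%06d.jpg"),
    ]


def extract_all(input_dir: Path, output_dir: Path, every: int = 300) -> None:
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    for video in sorted(input_dir.iterdir()):
        if video.suffix.lower() not in VIDEO_EXTS:
            continue
        out = output_dir / video.stem  # one folder per video
        out.mkdir(parents=True, exist_ok=True)
        subprocess.run(extract_cmd(video, out, every), check=True)
```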
After exporting from CVAT, unzip the label archive and copy the matching frames into the export:
# Quote the pattern so unzip expands it itself and extracts every matching job archive
unzip 'job_*.zip' -d cvat_export
# Copy images into the export (one cp per video folder)
cp data/frames/<video-folder>/*.jpg cvat_export/obj_train_data/<video-folder>/
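`train_test_split.py`'s internals are not shown in this section. The sketch below is a hypothetical version of the next step. It assumes the CVAT YOLO 1.1 export keeps `obj.names` at its root and each frame's `.txt` label next to the copied `.jpg` under `obj_train_data/<video-folder>/`. It shuffles image/label pairs with a fixed seed, copies them into the `images/{train,val}` and `labels/{train,val}` layout that Ultralytics expects, and writes a dataset YAML with the standard `path`, `train`, `val`, and `names` keys. The 20% validation fraction and the function names are assumptions.

```python
import json
import random
import shutil
from pathlib import Path


def split_pairs(raw_dir: Path, val_frac: float = 0.2, seed: int = 0):
    """Pair each CVAT label file with its frame and split the pairs into train/val."""
    pairs = []
    for label in sorted((raw_dir / "obj_train_data").rglob("*.txt")):
        image = label.with_suffix(".jpg")
        if image.exists():  # labels whose frame was never copied in are skipped
            pairs.append((image, label))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(pairs)
    n_val = int(round(len(pairs) * val_frac))
    return pairs[n_val:], pairs[:n_val]


def write_dataset(raw_dir: Path, out_dir: Path, val_frac: float = 0.2, seed: int = 0) -> Path:
    """Copy pairs into images/{train,val} and labels/{train,val}, then write cleanroom.yaml."""
    train, val = split_pairs(raw_dir, val_frac, seed)
    for split, pairs in (("train", train), ("val", val)):
        for image, label in pairs:
            # Prefix with the video folder so same-named frames from different videos don't collide.
            stem = f"{image.parent.name}_{image.stem}"
            for kind, src, ext in (("images", image, ".jpg"), ("labels", label, ".txt")):
                dst = out_dir / kind / split / (stem + ext)
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dst)
    # obj.names lists one class name per line, in class-ID order.
    names = [n.strip() for n in (raw_dir / "obj.names").read_text().splitlines() if n.strip()]
    yaml_lines = [
        f"path: {json.dumps(str(out_dir.resolve()))}",  # JSON strings are valid YAML scalars
        "train: images/train",
        "val: images/val",
        "names:",
        *(f"  {i}: {json.dumps(name)}" for i, name in enumerate(names)),
    ]
    yaml_path = out_dir / "cleanroom.yaml"
    yaml_path.write_text("\n".join(yaml_lines) + "\n")
    return yaml_path
```

Writing the YAML by hand keeps the sketch dependency-free. With pyyaml installed, `yaml.safe_dump` would work equally well.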
python data_prep/train_test_split.py \
--raw-dir cvat_export \
--output-dir data/datasets/cleanroom
python training/train.py \
--data data/datasets/cleanroom/cleanroom.yaml \
--model yolov10m.pt \
--device 0
Best weights are saved to runs/cleanroom/<name>/weights/best.pt.
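How `training/train.py` wires its flags together is not shown here. Roughly the same training run can be started through the Ultralytics Python API, as the sketch below shows. The `project`/`name` values are assumptions chosen to match the `runs/cleanroom/v0` path used in the evaluation and inference commands. Ultralytics picks a suffixed run name if that directory already exists, so the sketch reads the actual run directory from `trainer.save_dir` instead of rebuilding it.

```python
from pathlib import Path


def best_weights_path(run_dir) -> Path:
    """Ultralytics stores checkpoints as <run_dir>/weights/best.pt (and last.pt)."""
    return Path(run_dir) / "weights" / "best.pt"


def train(data: str = "data/datasets/cleanroom/cleanroom.yaml",
          model: str = "yolov10m.pt",
          device: str = "0",
          project: str = "runs/cleanroom",  # assumed, to match runs/cleanroom/<name>
          name: str = "v0") -> Path:          # assumed run name
    """Train with the Ultralytics Python API and return the path to best.pt."""
    from ultralytics import YOLO  # lazy import so best_weights_path works without ultralytics

    yolo = YOLO(model)
    yolo.train(data=data, device=device, project=project, name=name)
    # Use the directory the trainer actually wrote to (it may differ from project/name).
    return best_weights_path(yolo.trainer.save_dir)
```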
python training/eval.py \
--weights runs/cleanroom/v0/weights/best.pt \
--eval-yaml data/datasets/cleanroom/cleanroom.yaml
Convert video to 720p, then generate the .roi file:
python inference/convert_video.py input.MOV -o video_720p.mp4
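`convert_video.py`'s exact ffmpeg arguments are not listed. A plausible equivalent, assuming H.264 output scaled to 720 pixels tall with the aspect ratio preserved, is sketched below. `to_720p_cmd`, the CRF and preset values, and dropping the audio track are illustrative choices.

```python
import subprocess
from pathlib import Path


def to_720p_cmd(src: Path, dst: Path, crf: int = 23) -> list[str]:
    """Build an ffmpeg command that rescales a video to 720p H.264."""
    return [
        "ffmpeg", "-hide_banner", "-y",
        "-i", str(src),
        # 720 px tall; -2 keeps the aspect ratio and rounds the width to an even number.
        "-vf", "scale=-2:720",
        "-c:v", "libx264", "-crf", str(crf), "-preset", "medium",
        "-pix_fmt", "yuv420p",  # broadly compatible 8-bit pixel format
        "-an",  # detection does not need audio
        str(dst),
    ]


def convert(src: str, dst: str) -> None:
    subprocess.run(to_720p_cmd(Path(src), Path(dst)), check=True)
```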
python inference/generate_roi.py video_720p.mp4 \
--weights runs/cleanroom/v0/weights/best.pt \
--conf 0.6
The .roi file is a JSON file with per-frame normalized bounding boxes, track IDs, and class labels.
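The exact key names inside the .roi JSON are not documented in this section. The reader below therefore assumes a hypothetical layout: a `frames` list whose entries carry a `frame` index and an `objects` list of `{track_id, label, bbox}`, with `bbox` as normalized `[x1, y1, x2, y2]`. It groups detections by track ID and converts boxes back to pixels. The default 1280x720 size assumes a 16:9 source. Adjust the keys to the real output of `generate_roi.py`.

```python
import json
from collections import defaultdict

# HYPOTHETICAL .roi layout (real key names may differ):
# {"frames": [{"frame": 0,
#              "objects": [{"track_id": 3, "label": "cart", "bbox": [x1, y1, x2, y2]}]}]}
# with bbox coordinates normalized to [0, 1].


def load_roi(path: str) -> dict:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def denormalize(bbox, width: int, height: int) -> tuple[int, int, int, int]:
    """Scale a normalized (x1, y1, x2, y2) box to integer pixel coordinates."""
    x1, y1, x2, y2 = bbox
    return round(x1 * width), round(y1 * height), round(x2 * width), round(y2 * height)


def tracks(roi: dict, width: int = 1280, height: int = 720) -> dict[int, list[tuple]]:
    """Group detections by track ID as (frame, label, pixel_box) tuples."""
    out: dict[int, list[tuple]] = defaultdict(list)
    for frame in roi["frames"]:
        for obj in frame["objects"]:
            out[obj["track_id"]].append(
                (frame["frame"], obj["label"], denormalize(obj["bbox"], width, height)))
    return dict(out)
```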