SMILE-YOLO

YOLO object detection pipeline for cleanroom equipment recognition. Covers the full workflow from raw video to trained model to ROI generation.

Pipeline Overview

Raw Video
    |
    v
1. data_prep/vid_to_frame.py          -- extract frames from videos
    |
    v
2. Annotate in CVAT                   -- draw bounding boxes (see docs/cvat-guide.md)
    |
    v
3. data_prep/train_test_split.py      -- split CVAT export into train/val + cleanroom.yaml
    |
    v
4. training/train.py                  -- train YOLO model
    |
    v
5. training/eval.py                   -- evaluate on val split
    |
    v
6. inference/convert_video.py         -- convert new video to 720p
    |
    v
7. inference/generate_roi.py          -- run inference + tracking -> .roi JSON

Optional: data_prep/build_annotation_subset.py -- pre-label frames with existing model

Prerequisites

pip install ultralytics opencv-python pyyaml

ffmpeg must be on PATH (used for frame extraction and video conversion).

Docker is required for running CVAT.
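
A quick way to confirm the external tools are reachable before starting is sketched below. The helper name missing_tools is hypothetical and not part of this repository; it only checks that the executables resolve on PATH, not that they work.

```python
import shutil


def missing_tools(tools=("ffmpeg", "docker")):
    """Return the names from `tools` that cannot be found on PATH."""
    # shutil.which returns None when no matching executable is found
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
```

An empty result means both executables were found.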

Quick Start

1. Extract frames

python data_prep/vid_to_frame.py \
    --input-dir data/videos \
    --output-dir data/frames \
    --every 300
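
For orientation, here is a minimal sketch of how keeping one frame out of every 300 could be expressed as an ffmpeg call. This is an assumption about the approach, not a copy of vid_to_frame.py: the function name, the per-video subfolder, and the frame_%06d.jpg naming pattern are all hypothetical.

```python
from pathlib import Path


def build_extract_cmd(video, out_dir, every=300):
    """Build an ffmpeg argument list that keeps one frame out of every `every`."""
    if every < 1:
        raise ValueError("every must be >= 1")
    video, out_dir = Path(video), Path(out_dir)
    # Assumed layout: one subfolder per video, zero-padded frame names.
    # ffmpeg does not create the folder, so it must exist before running.
    pattern = out_dir / video.stem / "frame_%06d.jpg"
    return [
        "ffmpeg", "-i", str(video),
        # select keeps frames whose index n is a multiple of `every`;
        # the backslash escapes the comma inside the filter expression
        "-vf", f"select=not(mod(n\\,{every}))",
        # vfr avoids duplicating frames to fill the gaps
        # (newer ffmpeg releases spell this option -fps_mode vfr)
        "-vsync", "vfr",
        str(pattern),
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) once the output folder exists.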

2. Annotate

See docs/cvat-guide.md for the full annotation workflow -- setting up CVAT, creating projects/tasks, distributing work across multiple annotators, drawing bounding boxes, and exporting in YOLO 1.1 format.

3. Split into train/val

After exporting from CVAT, unzip the labels and copy the extracted frames into the export directory:

unzip job_*.zip -d cvat_export

# Copy images into the export (one cp per video folder)
cp data/frames/<video-folder>/*.jpg cvat_export/obj_train_data/<video-folder>/

python data_prep/train_test_split.py \
    --raw-dir cvat_export \
    --output-dir data/datasets/cleanroom
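
The idea behind the split step can be sketched as follows. This is an illustration under stated assumptions, not the logic of train_test_split.py: the 80/20 ratio, the fixed seed, the images/train and images/val folder names, and the class names in the test data are hypothetical. The dict follows the layout Ultralytics dataset YAML files use (path, train, val, names).

```python
import random


def split_train_val(stems, val_fraction=0.2, seed=0):
    """Shuffle deterministically and return (train, val) lists."""
    items = sorted(stems)  # sort first so the result ignores input order
    random.Random(seed).shuffle(items)
    n_val = max(1, round(len(items) * val_fraction)) if items else 0
    return items[n_val:], items[:n_val]


def dataset_config(root, class_names):
    """Return a dict shaped like an Ultralytics dataset YAML."""
    return {
        "path": root,
        "train": "images/train",  # assumed folder names
        "val": "images/val",
        "names": {i: name for i, name in enumerate(class_names)},
    }
```

Writing the dict with yaml.safe_dump would produce a file comparable to cleanroom.yaml.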

4. Train

python training/train.py \
    --data data/datasets/cleanroom/cleanroom.yaml \
    --model yolov10m.pt \
    --device 0

Best weights are saved to runs/cleanroom/<name>/weights/best.pt, where <name> is the run name (v0 in the commands that follow).
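
The weights path follows from how Ultralytics composes its output folder from the project and name arguments. The sketch below assumes train.py passes project="runs/cleanroom" and a run name such as v0; that mapping is an assumption, and the helper names are hypothetical. If a run with the same name already exists, Ultralytics may append a suffix to the folder name.

```python
def train_kwargs(data_yaml, device="0", name="v0"):
    """Collect keyword arguments for the Ultralytics YOLO.train call."""
    return {
        "data": data_yaml,
        "device": device,
        "project": "runs/cleanroom",  # assumed project folder
        "name": name,                 # assumed run name
    }


def expected_best_weights(kwargs):
    """Path where the best checkpoint is expected for a fresh run."""
    return f'{kwargs["project"]}/{kwargs["name"]}/weights/best.pt'


def train(model_path, kwargs):
    """Run training; ultralytics is imported lazily so the helpers above work without it."""
    from ultralytics import YOLO
    return YOLO(model_path).train(**kwargs)
```

For example, train("yolov10m.pt", train_kwargs("data/datasets/cleanroom/cleanroom.yaml")) would start a run comparable to the command above.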

5. Evaluate

python training/eval.py \
    --weights runs/cleanroom/v0/weights/best.pt \
    --eval-yaml data/datasets/cleanroom/cleanroom.yaml

6. Generate ROI

Convert video to 720p, then generate the .roi file:

python inference/convert_video.py input.MOV -o video_720p.mp4

python inference/generate_roi.py video_720p.mp4 \
    --weights runs/cleanroom/v0/weights/best.pt \
    --conf 0.6

The .roi file is a JSON file containing per-frame normalized bounding boxes, track IDs, and class labels.
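
To make "normalized" concrete, the sketch below divides pixel coordinates by the frame size and groups detections by frame. The key names (frames, track_id, label, box), the corner-based box format, and the example label are assumptions for illustration; the actual schema written by generate_roi.py may differ.

```python
import json


def normalize_box(x1, y1, x2, y2, width, height):
    """Convert pixel corner coordinates to fractions of the frame size."""
    return [x1 / width, y1 / height, x2 / width, y2 / height]


def build_roi(detections, width, height):
    """Group detections by frame index into a JSON-serializable dict.

    `detections` yields (frame, track_id, label, x1, y1, x2, y2) in pixels.
    """
    frames = {}
    for frame, track_id, label, x1, y1, x2, y2 in detections:
        frames.setdefault(str(frame), []).append({
            "track_id": track_id,
            "label": label,
            "box": normalize_box(x1, y1, x2, y2, width, height),
        })
    return {"width": width, "height": height, "frames": frames}


# Hypothetical detection on a 1280x720 frame
roi = build_roi([(0, 1, "microscope", 320, 180, 640, 360)], 1280, 720)
print(json.dumps(roi, indent=2))
```

Normalized coordinates stay valid if the video is later displayed at a different resolution with the same aspect ratio.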
