Merged
19 commits
e8caea9
feat: Fine-tune SAM 2 / 2.1 on user annotations (bnsreenu#73)
cofade Jun 19, 2026
34a34a2
fix: Address senior review on SAM fine-tuning
cofade Jun 20, 2026
bc6ec58
fix: Restore SAM UI if training setup fails; review nits
cofade Jun 20, 2026
75df282
fix: ASCII arrow in export_sam_dataset print (cp1252 console safety)
cofade Jun 20, 2026
1fbc269
fix: Correct SAM fine-tuning mask shift + UI papercuts (PR #21)
cofade Jun 20, 2026
80e2a22
chore: Drop redundant "Project Opened" success dialog
cofade Jun 20, 2026
af1479b
fix: Light-mode contrast for DINO threshold-table headers
cofade Jun 20, 2026
ae7773e
feat: Canvas mask selection, multi-delete & visible selection styling…
cofade Jun 21, 2026
996fba7
feat: Editable bboxes + out-of-bounds clamp/clip (bnsreenu#40, #32, #36)
cofade Jun 21, 2026
f5eea48
fix: Address senior-review P1s — list-selection edit loss + move clam…
cofade Jun 21, 2026
96be56e
refactor: Address senior-review P2s (perf + clarifying comments)
cofade Jun 21, 2026
879960f
fix: #40 resize/move now works on any shape, not just bbox-typed anno…
cofade Jun 21, 2026
216fd6e
fix: Harden bbox-only hit-test + refresh stale shape-edit comments
cofade Jun 21, 2026
6df7544
feat: Reversible per-annotation mask complexity (Detail %) — closes b…
cofade Jun 22, 2026
3a983d9
fix: Address senior-review P1s at the #24×#40 seam
cofade Jun 22, 2026
5aad278
test+cleanup: Make P1b test a real guard; drop dead delete_annotation
cofade Jun 22, 2026
20f15d7
feat: Undo/redo, frictionless delete/merge, exclusive tools, splitter…
cofade Jun 25, 2026
34526a6
feat: Make polygon vertex edits undoable + fix their save/Esc discipline
cofade Jun 25, 2026
833dbe9
fix: Exit vertex-edit mode on slice switch (parity with image switch)
cofade Jun 25, 2026
26 changes: 22 additions & 4 deletions CLAUDE.md
@@ -47,14 +47,17 @@ src/digitalsreeni_image_annotator/
├── __init__.py # Public API re-exports
├── core/ # constants, annotation_utils, image_utils
├── controllers/ # 7 controllers (project, image, sam, dino,
│ # yolo, annotation, class) + io_controller
├── controllers/ # 8 controllers (project, image, sam,
│ # sam_train, dino, yolo, annotation,
│ # class) + io_controller
├── widgets/
│ ├── image_label.py # ImageLabel canvas widget (dispatcher)
│ ├── canvas_context.py # CanvasContext read accessor (ADR-018)
│ └── tools/ # Per-tool handlers (ADR-019): rectangle,
│ # polygon, paint, eraser
├── inference/ # sam_utils.py, dino_utils.py
├── training/ # SAM fine-tuning (ADR-021): sam_trainer.py
│ # (SAMFineTuner), sam_dataset.py
├── io/ # export_formats.py, import_formats.py
├── ui/ # menu_bar, sidebar, shortcuts, theme, stylesheets
└── dialogs/ # Standalone tool dialogs (statistics,
@@ -76,8 +79,10 @@ src/digitalsreeni_image_annotator/
| `SAMController` | controllers/sam_controller.py | SAM model picker, debounce, in-flight guard (ADR-013) |
| `DINOController` | controllers/dino_controller.py | DINO single/batch detection, batch review, temp-class workflow |
| `YOLOController` | controllers/yolo_controller.py | YOLO training menu + prediction wiring |
| `SAMUtils` | inference/sam_utils.py | Load SAM models, run inference |
| `SAMUtils` | inference/sam_utils.py | Load SAM models (built-in + fine-tuned), run inference |
| `DINOUtils` | inference/dino_utils.py | Grounding-DINO model load + inference |
| `SAMFineTuner` | training/sam_trainer.py | Fine-tune SAM 2 decoder/encoder via custom loop over Ultralytics SAM2Model (ADR-021) |
| `SAMTrainController` | controllers/sam_train_controller.py | SAM fine-tune menu, GPU gate, training thread, selector registration |

See [Building Block View](docs/05_building_block_view.md) for detailed class documentation.

@@ -169,6 +174,13 @@ See [Runtime View](docs/06_runtime_view.md#multi-dimensional-image-loading) for
| GPU model unload | `model.cpu()` → `gc.collect()` → `torch.cuda.empty_cache()` + `ipc_collect()` + `synchronize()` — full reclaim requires app restart due to per-process CUDA context | Setting refs to None alone leaves circular refs pinned and shows zero Task Manager drop. See [Releasing Model GPU Memory](docs/08_crosscutting_concepts.md#releasing-model-gpu-memory). |
| Export image-path lookup | Exact-key match first, substring fallback only | `"bee.jpg" in "honeybee.jpg"` is True — substring-only matching writes the wrong file. See [Export Format Filename Matching](docs/08_crosscutting_concepts.md#export-format-filename-matching). |
| F2 / global shortcuts | Use `QShortcut` with `Qt.ShortcutContext.ApplicationShortcut`, not `keyPressEvent` | `QTableWidget` consumes F2 for in-cell edit before it bubbles up. |
| Canvas ↔ list selection sync | Canvas selection (idle-mode click/Shift/rubber-band) drives the annotation list via `apply_canvas_selection`; mirror the list with `blockSignals(True/False)` and match annotations by **value-equality**, never identity | PyQt round-trips `UserRole` dicts as copies and `image_label.annotations` is a deepcopy, so identity is never stable; un-blocked `setSelected` recurses through `update_highlighted_annotations`. Multi-select uses **Shift** (Ctrl stays pan). See [ADR-022](docs/09_architecture_decisions.md#adr-022-canvas-mask-selection-unified-with-the-annotation-list). |
| Selection rendering | Don't recolour a selected mask. Keep its class colour; draw a class-colour-independent overlay (dashed `_SELECTION_COLOR` blue bounding box + bright handle squares at corners/edge-midpoints, OGP-style) in a final pass — `_draw_selection_overlay`. For a single selected shape those handles are draggable (resize/move, any shape). Default class colours come from `core/constants.py::default_class_color` (red last, muted) | Red selection was invisible on a red-class mask; a thin dashed outline alone was too faint; the handles carry the visibility. See ADR-022 amendment. |
| Shape editing (#40) | Direct manipulation on the selection handles for **any** single selected shape (`_single_selected_shape()` — most shapes are `"segmentation"`, not `"bbox"`; gating on a bbox key made it unreachable). `_begin_shape_edit` records `kind`: a `"seg"` polygon **scales** its vertices (`_scale_segmentation`) / translates them; a `"bbox"` edits `[x,y,w,h]`. `_draw_selection_overlay` + `_bbox_handle_at` share `_bbox_handle_points` (visual == grab); resize anchors the opposite side (`_resize_bbox`); interior drag moves, **drag-gated**. `_sync_bbox_key` keeps an imported bbox consistent. Clamp + `bboxEditCommitted` → `commit_bbox_edit` on release; Esc reverts | Handles drawn since #75 were visual-only; dispatch sits before the rubber-band branch but stays gated on `_is_select_mode()`. Names keep the `bbox_edit` prefix = "edit via the bounding-box handles". See [ADR-023](docs/09_architecture_decisions.md). |
| Bounds enforcement (#32/#36) | No commit may persist coords outside the image. **Clamp** manual edits (`clamp_segmentation`/`clamp_bbox`, per-coordinate, count-preserving) at edit commit; **clip** augmented polygons (`clip_polygon_to_bounds`, shapely intersection, may drop → `None`, augmenter must `continue`). Drawn shapes already clip in `finish_polygon`/`finish_rectangle` | Clamp keeps vertex correspondence mid-edit; clip is geometrically correct for batch augmentation. See [ADR-024](docs/09_architecture_decisions.md). |
| Annotations table + Detail % (#24) | The Annotations panel is a **`QTableWidget`** (ID \| Class \| Area \| Detail %), not a list — col 0's UserRole holds the annotation (the #75 value-equality marker). Selection-mirror uses **`setRangeSelected`** (additive); `selectRow()` replaces in `ExtendedSelection`. Per-row Detail % spinbox → `on_detail_pct_changed` resolves the live obj (`_live_annotation`), simplifies from a lazy-captured `segmentation_raw` (`simplify_polygon`, 100=raw), refreshes Area+UserRole in place, saves. Connect `valueChanged` **after** the initial `setValue` so building the table doesn't fire it | Re-homing the selection bridge onto a table is the risk; reversibility needs the raw preserved. See [ADR-025](docs/09_architecture_decisions.md). |
| Undo/redo (ADR-026) | **Snapshot** the whole per-image annotation dict, don't replay commands — restoring a deep copy sidesteps value-equality/renumber/`segmentation_raw`. `AnnotationController.record_history()` is the choke-point, called **before** each synchronous mutation (finish poly/rect, delete, merge, change-class, eraser, SAM/DINO accept); **don't** hook `save_current_annotations` (also fires on navigation, runs after mutation). Deferred gestures (bbox drag, paint, **polygon vertex edit**) capture the baseline at **start** via `editBaselineRequested` and push at commit (`commit_edit_baseline` via `commit_bbox_edit`/`commit_polygon_edit`/batch-saved); a deep-equal dedup in `AnnotationHistory.record` drops aborted ones. Vertex edit also gained a save-discipline fix — `commit_polygon_edit` now calls `save_current_annotations` and Esc reverts the in-place drags. Detail-% drags coalesce to one entry. Ctrl+Z/Y are `ApplicationShortcut`s; `_undo_blocked` no-ops during load/modal/text-focus/in-flight gesture | Delete/merge confirmation+success dialogs were **removed** (undo is the net); merge always deletes originals. See [ADR-026](docs/09_architecture_decisions.md#adr-026-snapshot-based-undoredo-for-annotation-edits). |
| Tool activation + Esc | **All six tools (manual + SAM) funnel through `ImageAnnotator.activate_tool(name)`** — the only place `current_tool`, `sam_*_active`, and button checks change, so they can't drift and a SAM tool can't be active with a manual one. Keep `tool_group` non-exclusive (need click-to-toggle-off); `activate_tool` unchecks the others (block-signals around `setChecked`). Esc cancels the in-progress shape **and** emits `selectModeRequested` → `activate_tool(None)`, so Esc always returns to selection mode | SAM toggles used to write state ad-hoc; the group was non-exclusive. See [Tool Activation](docs/08_crosscutting_concepts.md#tool-activation--one-choke-point-mutually-exclusive). |

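The "exact-key match first, substring fallback only" rule from the table above can be sketched as follows. This is a minimal illustration, assuming the image paths live in a dict keyed by file name; `find_image_path` is a hypothetical helper, not the project's actual function.

```python
from typing import Mapping, Optional


def find_image_path(file_name: str, image_paths: Mapping[str, str]) -> Optional[str]:
    """Resolve an annotation's file name to a path (hypothetical helper)."""
    # 1. An exact key match always wins.
    if file_name in image_paths:
        return image_paths[file_name]
    # 2. Substring fallback, reached only when no exact key exists.
    for key, path in image_paths.items():
        if file_name in key:
            return path
    return None


# "bee.jpg" is a substring of "honeybee.jpg", and "honeybee.jpg" is iterated
# first, so a substring-only lookup would return the wrong file here.
paths = {"honeybee.jpg": "/data/honeybee.jpg", "bee.jpg": "/data/bee.jpg"}
resolved = find_image_path("bee.jpg", paths)
print(resolved)
```

With the exact-match step in place, the fallback loop only runs for names that have no key of their own, which is the case the substring match was meant to cover.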
## Development Workflow

@@ -237,6 +249,7 @@ See [Risks and Technical Debt](docs/11_risks_and_technical_debt.md) for full lis
| Global | Action |
|--------|--------|
| Ctrl+N/O/S | New/Open/Save Project |
| Ctrl+Z / Ctrl+Y (or Ctrl+Shift+Z) | Undo / redo annotation edit (ADR-026) |
| Ctrl+Shift+= / Ctrl+Shift+- | UI font bigger/smaller (8-24pt, persisted via QSettings) |
| Ctrl+Shift+0 | Reset UI font size |
| F1 | Help |
@@ -245,8 +258,13 @@ See [Risks and Technical Debt](docs/11_risks_and_technical_debt.md) for full lis
|--------|--------|
| Ctrl+Wheel | Zoom |
| Ctrl+Drag | Pan |
| Click / Shift+Click (no tool) | Select / toggle mask |
| Drag / Shift+Drag (no tool) | Rubber-band select / add |
| Drag handle / inside (one shape selected) | Resize (scale) / move the shape |
| Double-click | Vertex-edit mode |
| Delete | Delete selected mask(s) — instant, undoable (no confirm dialog) |
| Enter | Finish/Accept |
| Esc | Cancel |
| Esc | Cancel in-progress shape **and** return to selection mode (deactivates the tool) |
| Up/Down | Navigate slices |
| -/= | Brush size |

20 changes: 19 additions & 1 deletion docs/05_building_block_view.md
@@ -156,6 +156,23 @@ The earlier subprocess approach is documented as
[ADR-011](09_architecture_decisions.md#adr-011-run-torch-based-workers-in-isolated-subprocesses)
(Superseded).

### SAM Fine-Tuning Subsystem (`training/`)

Lets users fine-tune SAM 2 / 2.1 on their own annotations, since
Ultralytics ships no SAM trainer (ADR-021). Distinct from `inference/`
because it is *training*, not inference.

| Module | Responsibility |
|--------|----------------|
| `training/sam_trainer.py` | `SAMFineTuner` — custom decoder (optionally encoder) fine-tuning loop reusing `SAM2Predictor.get_im_features` / `prompt_inference` under autograd, focal+dice loss, AdamW, checkpoint save+reload-verify. Also geometry helpers (`polygon_to_mask`, `mask_to_xyxy`, `mask_to_point`), `make_custom_filename`, `list_custom_models`, and the `SampleGroup` lazy-rasterising dataset item. |
| `training/sam_dataset.py` | `build_groups_from_project` (live `all_annotations`) and `build_groups_from_folder` (prepared dataset) → `list[SampleGroup]`, mirroring `export_yolo_v5plus` image resolution. |
| `io/export_formats.py::export_sam_dataset` | Writes `images/` + `manifest.json` (authoritative bbox/segmentation specs) for an inspectable, re-trainable on-disk dataset. |

Fine-tuned checkpoints save as `{"model": state_dict}` and reload
through the unchanged `SAM(path)` inference path; `SAMUtils` gains a
`custom_models` registry so they appear in the SAM selector alongside
the eight built-ins.

### DINO Subsystem (Grounding DINO + SAM pipeline)

LLM-assisted detection: the user gives free-form text phrases per class,
@@ -196,7 +213,7 @@ export time — see [Cross-cutting Concepts](08_crosscutting_concepts.md)).

## Level 3: Controllers

Seven `QObject` controllers plus an `io_controller` helper module
Eight `QObject` controllers plus an `io_controller` helper module
carve `ImageAnnotator` into single-responsibility owners that the
orchestrator delegates to. Each `QObject` controller holds `self.mw
= main_window` and owns one slice of behaviour; the
@@ -215,6 +232,7 @@ the controller graph.
| `SAMController` | SAM box/points tool lifecycle, debounce timer, `_sam_inference_in_flight` re-entrancy guard (ADR-013), model picker. |
| `DINOController` | Single + batch detection, batch review navigation, temp-annotation accept/reject, custom-model browse, `DINOReviewEventFilter` ownership (ADR-015). |
| `YOLOController` | Training menu, `TrainingThread`, prediction dialog, result processing. |
| `SAMTrainController` | SAM fine-tuning menu, GPU gate, `SAMTrainingThread`, config dialog, registers fine-tuned checkpoints into the SAM selector (ADR-021). |
| `io_controller` *(module-level functions, not a class)* | Thin UI wrappers around the pure `io/export_formats.py` and `io/import_formats.py` modules. |

Communication: `ImageLabel` does not import controllers directly —
125 changes: 125 additions & 0 deletions docs/06_runtime_view.md
@@ -66,6 +66,96 @@ User presses Enter
└─> update() to show final annotation
```

## Mask Selection & Deletion on the Canvas (issue #75)

Active only when no drawing/SAM tool is selected (`ImageLabel._is_select_mode()`).
Double-click still enters vertex-edit; Ctrl+drag still pans.

```
User clicks / drags on image (no tool active)
├─> ImageLabel mouse press/move/release
│ ├─> click → annotation_at(pos) (smallest mask, seg or bbox)
│ ├─> click empty → [] (clears selection)
│ ├─> drag → annotations_in_rect(rect) (rubber band, bounds-intersect)
│ └─> Shift → toggle (click) / add (drag)
├─> emit canvasSelectionChanged(annotations, mode) mode = replace|add|toggle
└─> AnnotationController.apply_canvas_selection()
├─> compute new set from highlighted_annotations + annotations per mode
├─> image_label.highlighted_annotations = new (blue selection overlay)
├─> mirror onto annotation_list (blockSignals while selecting)
└─> enable Merge (≥2) / Change Class (≥1)

User presses Delete (canvas focused)
├─> ImageLabel.keyPressEvent → deleteSelectionRequested
└─> AnnotationController.delete_selected_annotations() (record_history → remove → re-sort → autosave)
```

The canvas and the list share one selection (matched by dict value-equality), so
Delete/Merge/Change-Class behave the same from either surface. See ADR-022.
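The replace/add/toggle combination and the value-equality matching described above can be sketched in plain Python. This is an illustration only: `combine_selection` is a hypothetical function (the project's entry point is `apply_canvas_selection`), and annotations are assumed to be plain dicts.

```python
from typing import Any, Dict, List

Annotation = Dict[str, Any]


def combine_selection(
    current: List[Annotation], incoming: List[Annotation], mode: str
) -> List[Annotation]:
    """Return the new selection for a canvas gesture (hypothetical helper)."""
    if mode not in ("replace", "add", "toggle"):
        raise ValueError(f"unknown selection mode: {mode!r}")
    if mode == "replace":
        return list(incoming)
    result = list(current)
    for ann in incoming:
        # `in` and `remove` compare dicts with ==, i.e. by value, not identity.
        if ann not in result:
            result.append(ann)
        elif mode == "toggle":
            result.remove(ann)
    return result


mask = {"category_name": "cell", "segmentation": [0, 0, 10, 0, 10, 10]}
round_tripped = dict(mask)  # stands in for a copy handed back by the list widget

selected = combine_selection([], [mask], "replace")
toggled = combine_selection(selected, [round_tripped], "toggle")
added = combine_selection(selected, [round_tripped], "add")
print(len(selected), len(toggled), len(added))
```

Because the copy is a different object with equal contents, an identity check (`is`) would fail to find it; the equality check lets the toggle deselect it and stops the add from duplicating it.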

**Delete and merge are now frictionless and reversible.** Delete removes the
selection immediately — no "Are you sure?" confirmation and no "N deleted" success
dialog. Merge always replaces the originals with their union (no keep/delete prompt)
and shows no success dialog. Both snapshot the pre-edit state first, so **Ctrl+Z**
restores it; the removed confirmations are unnecessary now that undo is the net.
See [ADR-026](09_architecture_decisions.md#adr-026-snapshot-based-undoredo-for-annotation-edits).
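The snapshot approach (record before mutating, restore a deep copy on undo, drop gestures that changed nothing) can be sketched as below. `SnapshotHistory` and its method names are hypothetical stand-ins for illustration, not the project's `AnnotationHistory` API.

```python
import copy
from typing import Any, Dict, List, Optional

Snapshot = Dict[str, Any]


class SnapshotHistory:
    """Undo/redo stacks holding whole-state deep copies (hypothetical sketch)."""

    def __init__(self) -> None:
        self._undo: List[Snapshot] = []
        self._redo: List[Snapshot] = []

    def record(self, before: Snapshot) -> None:
        """Call BEFORE a synchronous mutation; a new edit invalidates redo."""
        self._undo.append(copy.deepcopy(before))
        self._redo.clear()

    def record_if_changed(self, baseline: Snapshot, current: Snapshot) -> bool:
        """Deferred gesture commit: skip the entry if nothing changed."""
        if baseline == current:  # deep equality on nested dicts/lists
            return False
        self.record(baseline)
        return True

    def undo(self, current: Snapshot) -> Optional[Snapshot]:
        if not self._undo:
            return None
        self._redo.append(copy.deepcopy(current))
        return self._undo.pop()

    def redo(self, current: Snapshot) -> Optional[Snapshot]:
        if not self._redo:
            return None
        self._undo.append(copy.deepcopy(current))
        return self._redo.pop()


history = SnapshotHistory()
annotations = {"cell": [{"number": 1}, {"number": 2}]}

history.record(annotations)   # snapshot first...
annotations["cell"].pop()     # ...then mutate (a delete)

restored = history.undo(annotations)   # both annotations are back
redone = history.redo(restored)        # the delete is applied again
aborted = history.record_if_changed(redone, copy.deepcopy(redone))
print(restored, redone, aborted)
```

Restoring a stored copy means undo never has to know what kind of edit happened, which is why delete and merge can drop their confirmation dialogs.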

## Shape Editing on the Canvas (issue #40)

When exactly one shape is selected (idle mode), its 8 selection handles become
draggable — direct manipulation, no separate mode, for **any** shape (polygon,
mask, or imported box). The geometry mutates in place so the canvas updates live;
release clamps it into the image and persists.

```
One shape selected → handles are grab targets (hover shows resize/move cursors)
├─> press on a handle → "resize" (anchor = opposite corner/edge)
├─> press inside the shape → "pending_move" → "move" once drag > 3px/zoom
│ (plain click, no drag → falls through to select)
├─> press outside → normal rubber-band selection (#75)
├─> drag → _update_bbox_drag(): mutate geometry in place
│ ├─ bbox kind → set [x,y,w,h] (resize trims; move translates)
│ └─ seg kind → scale vertices (resize) / translate (move);
│ _sync_bbox_key keeps an imported bbox consistent
├─> release → clamp into the image (ADR-024: move slides inside, resize clamps)
│ emit bboxEditCommitted
│ └─> AnnotationController.commit_bbox_edit()
│ save → rebuild list (area refreshes) → re-mirror selection → autosave
└─> Esc during drag → restore original geometry, cancel
```
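The "seg kind → scale vertices" step can be illustrated with a small function that maps a polygon from its old bounding box onto the resized one. This is a sketch under assumptions: the segmentation is taken to be a flat `[x1, y1, x2, y2, ...]` list, boxes are `(x, y, w, h)`, and `scale_segmentation` here is a hypothetical stand-in for the project's `_scale_segmentation`.

```python
from typing import List, Sequence


def scale_segmentation(
    seg: Sequence[float], old_box: Sequence[float], new_box: Sequence[float]
) -> List[float]:
    """Map flat polygon coords from old_box (x, y, w, h) onto new_box."""
    ox, oy, ow, oh = old_box
    nx, ny, nw, nh = new_box
    # A degenerate (zero-size) source box cannot be scaled; keep that axis as-is.
    sx = nw / ow if ow else 1.0
    sy = nh / oh if oh else 1.0
    out: List[float] = []
    for i in range(0, len(seg) - 1, 2):
        out.append(nx + (seg[i] - ox) * sx)
        out.append(ny + (seg[i + 1] - oy) * sy)
    return out


# Dragging the bottom-right handle keeps the opposite (top-left) corner fixed,
# so only the width and height of the box change.
triangle = [10, 10, 30, 10, 10, 20]
old_box = (10, 10, 20, 10)
new_box = (10, 10, 40, 30)
scaled = scale_segmentation(triangle, old_box, new_box)
print(scaled)
```

Every vertex keeps its relative position inside the box, so the vertex count and ordering are unchanged by a resize.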

Polygon vertex edits (double-click) are likewise clamped into the image on Enter.
See ADR-023 (shape editing) and ADR-024 (bounds enforcement).
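The difference between clamping a resize and sliding a move back inside the image can be sketched as follows. The function names echo the helpers mentioned in the docs, but the bodies are illustrative assumptions, including the choice of `[0, width]` and `[0, height]` as the valid coordinate range.

```python
from typing import List, Sequence


def clamp_segmentation(seg: Sequence[float], width: float, height: float) -> List[float]:
    """Clamp each coordinate independently; the vertex count is preserved."""
    out: List[float] = []
    for i, value in enumerate(seg):
        limit = width if i % 2 == 0 else height  # even index = x, odd = y
        out.append(min(max(value, 0), limit))
    return out


def clamp_bbox(bbox: Sequence[float], width: float, height: float) -> List[float]:
    """Resize case: trim the box edges to the image, shrinking it if needed."""
    x, y, w, h = bbox
    x0, y0 = min(max(x, 0), width), min(max(y, 0), height)
    x1, y1 = min(max(x + w, 0), width), min(max(y + h, 0), height)
    return [x0, y0, x1 - x0, y1 - y0]


def slide_bbox_inside(bbox: Sequence[float], width: float, height: float) -> List[float]:
    """Move case: keep the size, shift the origin so the box fits."""
    x, y, w, h = bbox
    x = min(max(x, 0), max(width - w, 0))
    y = min(max(y, 0), max(height - h, 0))
    return [x, y, w, h]


clamped_seg = clamp_segmentation([-5, 10, 120, 90], width=100, height=80)
clamped_box = clamp_bbox([90, -10, 20, 20], width=100, height=80)
slid_box = slide_bbox_inside([90, -10, 20, 20], width=100, height=80)
print(clamped_seg, clamped_box, slid_box)
```

Per-coordinate clamping never drops a vertex, which is what keeps vertex correspondence stable during an edit; clipping against the image rectangle, used for augmentation, can change the vertex count or remove the polygon entirely.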

## Adjusting Mask Complexity — Detail % (issue #24)

The Annotations table carries a per-row **Detail %** spinbox (100 = raw). Dialing
it down thins a dense SAM/DINO mask; dialing back to 100 restores it exactly.

```
User changes a row's Detail % spinbox (1..100)
└─> AnnotationController.on_detail_pct_changed(row, pct)
├─> resolve the live drawn object (value-equality, _live_annotation)
├─> pct == 100 → segmentation = segmentation_raw (restore)
│ pct < 100 → lazy-init segmentation_raw (first time);
│ segmentation = simplify_polygon(raw, pct) [Douglas-Peucker]
├─> recompute bbox key if present
├─> refresh the row's Area cell + UserRole in place (no rebuild)
└─> image_label.update() → save_current_annotations() → auto_save()
```

The effective (simplified) `segmentation` renders and exports; `segmentation_raw`
+ `detail_pct` persist in the `.iap`. See ADR-025.
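A reversible Detail % control can be sketched with a textbook Douglas-Peucker pass that always starts from the preserved raw polygon. How the project maps the percentage to a tolerance is not stated here, so the linear mapping and the `max_epsilon` parameter below are assumptions, and `simplify_for_detail` is a hypothetical name. The sketch also treats the ring as an open polyline from first to last vertex.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to the chord segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points: Sequence[Point], epsilon: float) -> List[Point]:
    """Keep the farthest point from the chord if it exceeds epsilon, recurse."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _distance_to_segment(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


def simplify_for_detail(raw: Sequence[float], pct: int, max_epsilon: float = 5.0) -> List[float]:
    """pct=100 returns the raw polygon; lower pct uses a larger tolerance."""
    if pct >= 100:
        return list(raw)
    points = list(zip(raw[0::2], raw[1::2]))
    epsilon = (100 - pct) / 100.0 * max_epsilon  # assumed linear mapping
    kept = douglas_peucker(points, epsilon)
    return [coord for point in kept for coord in point]


raw = [0, 0, 5, 0.2, 10, 0, 10, 10]
half = simplify_for_detail(raw, 50)
full = simplify_for_detail(raw, 100)
print(half, full)
```

Because each call simplifies from the raw list rather than from the previous result, dialing back to 100 returns the original vertices unchanged. A recursive version like this may hit Python's recursion limit on very dense masks; an explicit stack would avoid that.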

## SAM-Assisted Annotation (SAM-box / SAM-points)

```
@@ -370,3 +460,38 @@ User clicks "Export" > "YOLO v8/v11"
└─> Return
```

## SAM Fine-Tuning (annotate → train → use)

See [ADR-021](09_architecture_decisions.md#adr-021-sam-fine-tuning-via-a-custom-loop-over-the-ultralytics-sam2-module).

```
User: SAM Fine-Tune (beta) > Train on Current Project…
├─> build_groups_from_project(all_annotations, image_paths, slices, image_slices)
│ polygons/bboxes → SampleGroup(image_loader, specs) (masks rasterised lazily)
├─> _gpu_gate(): resolve_torch_device(); if "cpu" → warn + let user back out
├─> SAMTrainConfigDialog: base model, epochs, lr, batch, prompt (bbox/point),
│ "also fine-tune image encoder?"
├─> deactivate_sam_tools() + lock SAM inference UI (tools, selector, menu)
│ trainer loads its OWN SAM instance; locking avoids a 2nd model on the same CUDA context
└─> SAMTrainingThread → SAMFineTuner.train(...)
│ build predictor (one warmup predict), pin device, apply freeze policy
│ for each epoch / image / instance:
│ set_image → get_im_features (no_grad when encoder frozen)
│ prompt_inference(bbox|point) under enable_grad → mask logits
│ focal+dice loss → backward → AdamW step (every batch_size instances)
│ progress_signal → TrainingInfoDialog (Stop supported)
│ save {"model": state_dict} as <name>_<base_token>.pt → reload-verify via SAM()
└─> training_finished: register in SAMUtils.custom_models,
add "★ <name>" to the SAM selector and select it
→ SAM-box / SAM-points now use the fine-tuned model

Offline variant: "Prepare SAM Dataset…" → export_sam_dataset (images/ + manifest.json),
then "Train from Dataset Folder…" → build_groups_from_folder → same training path.
```
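The "focal+dice loss" step in the loop above combines a per-pixel term with an overlap term. The sketch below computes both on plain Python lists so the arithmetic is visible; a real training loop would use tensor operations under autograd. The `alpha`, `gamma`, and weight defaults are assumptions, not values taken from `SAMFineTuner`.

```python
import math
from typing import Sequence


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def focal_dice_loss(
    logits: Sequence[float],
    targets: Sequence[int],
    alpha: float = 0.25,
    gamma: float = 2.0,
    focal_weight: float = 1.0,
    dice_weight: float = 1.0,
    eps: float = 1e-6,
) -> float:
    """Weighted sum of mean binary focal loss and soft dice loss."""
    if not logits or len(logits) != len(targets):
        raise ValueError("logits and targets must be non-empty and equal length")
    probs = [_sigmoid(z) for z in logits]

    # Focal term: down-weights pixels the model already classifies well.
    focal = 0.0
    for p, t in zip(probs, targets):
        p_t = p if t == 1 else 1.0 - p
        a_t = alpha if t == 1 else 1.0 - alpha
        focal += -a_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))
    focal /= len(probs)

    # Dice term: 1 minus the soft overlap between prediction and target.
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = 1.0 - (2.0 * intersection + eps) / (sum(probs) + sum(targets) + eps)

    return focal_weight * focal + dice_weight * dice


target_mask = [1, 1, 0, 0]
good = focal_dice_loss([8.0, 8.0, -8.0, -8.0], target_mask)
bad = focal_dice_loss([-8.0, -8.0, 8.0, 8.0], target_mask)
print(good < bad)
```

The focal term handles the foreground/background pixel imbalance, while the dice term scores the mask as a whole, which is one common reason the two are paired for segmentation.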