feat: LLM-assisted detection with Grounding DINO#66

Merged
bnsreenu merged 9 commits into bnsreenu:master from cofade:feature/llm-dino-integration
May 19, 2026

Conversation

cofade commented May 18, 2026

Contributor

Summary

  • Adds an LLM-assisted detection workflow: pick a class, provide free-form text phrases, and Grounding DINO produces bounding boxes that SAM 2 refines into segmentation masks.
  • Supports per-class phrase lists and per-class thresholds (box / text / NMS); single-image and batch detection with an optional "review before accepting" mode.
  • DINO inference runs in a subprocess (the same pattern as sam_worker.py from #65, "feat: Python 3.14 compatibility with subprocess-based SAM"), so transformers/torch loads in a clean process, avoiding the Windows + Python 3.14 DLL load-order issue.
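
To illustrate how a per-class box threshold and NMS threshold might interact, here is a minimal pure-Python sketch. It is not the code from this PR: the function names, the `(x1, y1, x2, y2)` box layout, and the "suppress when IoU exceeds the NMS threshold" rule are assumptions, and the text threshold (which applies to phrase matching inside Grounding DINO) is left out.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    boxes: Sequence[Box],
    scores: Sequence[float],
    box_threshold: float,
    nms_threshold: float,
) -> List[int]:
    """Return indices of boxes that survive score filtering and greedy NMS."""
    # 1. Drop low-confidence boxes, then visit the rest from best to worst.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= box_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    # 2. Greedy NMS: keep a box only if it does not overlap a kept box too much.
    kept: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in kept):
            kept.append(i)
    return kept
```

With one threshold pair per class, a caller would run `filter_detections` once per class on that class's boxes, so a noisy class can use a stricter box threshold without affecting the others.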

What's new

User-facing

  • DINO panel under the SAM tools: model dropdown, class threshold table, phrase editor.
  • "Detect Current Image" and "Detect All Images" actions.
  • Lazy model loading: picking a model from the dropdown sets ready state immediately; download happens on first Detect (mirrors SAM behaviour).
  • Detection results render as orange preview masks; Enter accepts, Esc rejects.
  • models/sam/ and models/<dino-model>/ directories keep downloaded weights tidy.

Implementation

  • dino_worker.py is the DINO equivalent of sam_worker.py. Parent (DINOUtils.detect()) sends a JSON request, worker returns boxes; boxes then feed SAMUtils.apply_sam_predictions_batch() for masks.
  • DINO config (phrases + thresholds) persists in the .iap project file under dino_config.
  • UTF-8 forced on both sides of the SAM/DINO worker pipes (encoding="utf-8", errors="replace" on the parent, PYTHONIOENCODING=utf-8 in the worker env) so cp1252-locale Windows can decode torch warnings without crashing.
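
The worker pattern described above can be sketched roughly as follows. This is a stand-in, not the PR's dino_worker.py: the inline worker source, the request/reply keys (`image`, `phrases`, `boxes`, `labels`), and the `detect` helper are all hypothetical. Only the `encoding="utf-8"`, `errors="replace"`, and `PYTHONIOENCODING=utf-8` settings come from the description.

```python
import json
import os
import subprocess
import sys

# Stand-in worker: reads one JSON request from stdin and writes one JSON reply.
# A real worker would load the model and run inference here instead.
WORKER_SOURCE = r"""
import json, sys
request = json.loads(sys.stdin.readline())
# Emit a non-ASCII warning on stderr, similar to what torch/transformers may print.
print("warning: caf\u00e9 \u2713", file=sys.stderr)
reply = {"boxes": [[10, 20, 110, 220]], "labels": request["phrases"][:1]}
sys.stdout.write(json.dumps(reply) + "\n")
"""


def detect(request: dict) -> dict:
    """Send one JSON request to a fresh worker process and parse its reply."""
    # Worker side: force UTF-8 on the child's stdin/stdout/stderr.
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    proc = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=json.dumps(request) + "\n",
        capture_output=True,
        encoding="utf-8",   # parent side: decode pipes as UTF-8, not the locale codec
        errors="replace",   # undecodable bytes become U+FFFD instead of raising
        env=env,
        check=True,
    )
    return json.loads(proc.stdout)


response = detect({"image": "drone.jpg", "phrases": ["drone"]})
print(response["boxes"])
```

Because the heavy imports happen only in the child, the GUI process never loads torch, which is the isolation property the subprocess design is after.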

Dependencies added (in requirements.txt / setup.py)

  • `transformers >= 4.40` (Grounding DINO support)
  • `huggingface_hub >= 0.20` (model download)
  • `torch >= 2.0`, `torchvision >= 0.15` (already transitive via ultralytics; now pinned explicitly because DINO uses them directly)

Docs

  • Flatten `docs/arc42/` → `docs/` (extra nesting was redundant).
  • README: new GPU acceleration note (PyPI's default torch wheel is CPU-only on Windows; point GPU users at the PyTorch CUDA index).

Repo housekeeping

  • `.gitignore`: replace explicit `sam2_*.pt` list with `*.pt` wildcard; add `models/`, `*.iap`, IDE/OS/log/env patterns.
  • Untrack stale `__pycache__/*.cpython-310.pyc` files that pre-dated the `__pycache__/` ignore rule.

Test plan

There are no automated tests for this feature — DINO benefits from a GPU and the existing test environment is manual-only per `CLAUDE.md`. Manual verification performed:

  • Pick a DINO model from dropdown → no Load click needed, status shows "Ready: " or "will download on first detection"
  • First Detect downloads the model (HuggingFace progress in console); subsequent Detects use the local cache
  • Single-image detection on drone test images: DINO finds the drone, SAM masks the silhouette accurately
  • Enter accepts the mask; it lands in `image_label.annotations`, renders in the class colour, persists through save/load
  • Esc rejects without saving
  • Adding/removing phrases per class works
  • Adjusting per-class thresholds takes effect on next Detect
  • CUDA path: with `torch+cu128` installed, `[SAM] Using CUDA` and `[DINO] Using device: cuda` appear in console; detection is ~10–20× faster
  • No regression in existing SAM-box / SAM-points / polygon / rectangle flows
  • Project save/load round-trips with DINO config preserved

Status

Marked as draft — open to feedback on scope, naming, or anything that should be split into smaller follow-up PRs.

🤖 Generated with Claude Code

Sebastian Wienhold and others added 9 commits May 13, 2026 17:41
…thresholds

Ports Grounding DINO + SAM segmentation from the standalone annotation_tool_v4
into the DigitalSreeni annotator as a new sidebar section.

Core changes:
- dino_worker.py: standalone subprocess running HF transformers DINO inference
  (avoids PyQt5/torch DLL conflict on Windows+Python3.14)
- dino_utils.py: PyQt5-safe wrapper spawning the worker, mirroring SAMUtils
- dino_phrase_editor.py: ClassThresholdTable + PhraseEditorPanel widgets
- dino_merge_dialog.py: merge per-image COCO JSONs into train/val splits

SAM infrastructure:
- sam_worker.py: extended for batch bbox segmentation (efficiency for DINO results)
- sam_utils.py: new apply_sam_predictions_batch() for multi-bbox in one call

UI integration (annotator_window.py):
- New "LLM-Assisted Detection" sidebar section with model selector,
  per-class threshold table, phrase editor, detect buttons, batch mode
- DINO phrases/thresholds synced with existing class add/remove
- temp_annotations overlay: DINO results shown as orange masks,
  Enter to accept, Escape to discard
- Batch detection with progress dialog: auto-accept or review mode
- DINO config persisted in .iap project files (dino_config key)
- Tools menu: "Merge COCO for Training" entry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add .claude/agents/senior-reviewer.md adapted for this project
- Update CLAUDE.md with quality gate workflow:
  - Feature branch requirement (never commit to master)
  - Manual testing checklist (7 items)
  - arc42 documentation update rules
  - Senior reviewer agent as mandatory pre-PR quality gate

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Follow-up fixes to the DINO integration discovered during testing,
plus repo housekeeping.

DINO UX:
- Drop the explicit "Load" button; selecting a model from the
  dropdown sets ready state immediately, with download deferred
  to the first Detect call (mirrors SAM's lazy-load model).
- Focus the canvas after detection so Enter/Esc accept/reject
  without needing an extra click.

DINO + SAM rendering:
- Stop storing DINO's xyxy bbox alongside the SAM segmentation
  in temp/accepted annotations. The polygon is the actual mask;
  a bbox is derived from it at export. Fixes the rendering bug
  where the temp annotation showed a giant rectangle instead of
  the SAM polygon (xyxy treated as xywh by drawRect).
- Route accept_dino_results through image_label.annotations
  (matching accept_sam_prediction's contract) so accepted masks
  render immediately and save_current_annotations correctly
  syncs them to all_annotations.
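
The xyxy/xywh mix-up and the "derive the bbox from the polygon" approach can be illustrated with a small sketch. The helper names and the flat COCO-style polygon layout `[x0, y0, x1, y1, ...]` are assumptions for illustration, not the project's actual functions.

```python
from typing import List, Sequence


def xyxy_to_xywh(box: Sequence[float]) -> List[float]:
    """Convert corner format (x1, y1, x2, y2) to (x, y, width, height)."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def bbox_from_polygon(polygon: Sequence[float]) -> List[float]:
    """Derive an xywh box from a flat polygon [x0, y0, x1, y1, ...]."""
    xs = polygon[0::2]
    ys = polygon[1::2]
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


# A detector box in xyxy form. Passing it straight to a rect-drawing call that
# expects xywh would draw a 300x200 rectangle instead of the real 200x150 one.
dino_box = [100, 50, 300, 200]
print(xyxy_to_xywh(dino_box))
```

Storing only the polygon and computing the box on demand removes the chance of the two representations disagreeing.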

Subprocess workers:
- Force UTF-8 on both sides of the SAM/DINO worker pipes
  (encoding/errors on the parent, PYTHONIOENCODING=utf-8 in the
  worker env). On Windows the default cp1252 decode would crash
  on non-ASCII bytes in torch/transformers warnings, hiding the
  actual worker output.
- Move SAM weights to <project_root>/models/sam/ (parallel to
  models/<dino-model>/), with auto-mkdir on first download.

Dependencies:
- Add huggingface_hub, transformers, torch, torchvision to
  requirements.txt and setup.py — DINO failed at runtime
  without these.
- README: note that PyPI ships CPU-only torch on Windows and
  point GPU users at the PyTorch CUDA index.

Repo housekeeping:
- Flatten docs/arc42/* up to docs/ (removes redundant nesting);
  update CLAUDE.md and docs/README.md path references.
- .gitignore: replace explicit sam2_*.pt list with *.pt
  wildcard; add models/, *.iap, IDE/OS/log/env patterns.
- Untrack stale __pycache__/*.cpython-310.pyc files (the
  __pycache__/ rule only catches new additions).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

DINO config persistence (P0):
- Make the widgets (PhraseEditorPanel and ClassThresholdTable) the
  single source of truth for phrases and thresholds. Remove the
  shadow self.dino_phrases / self.dino_thresholds dicts that were
  never kept in sync with user edits.
- save_project snapshots both widgets at save time.
- load_project_data pushes phrases into the panel and thresholds
  into the table after classes are created.
- Add ClassThresholdTable.set_thresholds() and clear_classes(),
  and PhraseEditorPanel.clear() for these flows.

Batch auto-accept data loss (P0):
- _commit_dino_results now routes through image_label.annotations
  when committing to the currently-displayed image, so the canvas
  reflects the change and the next save_current_annotations() call
  does not overwrite it. Mirrors the single-image accept_dino_results
  fix from the previous commit.
- Also adds the missing add_class() call for unknown class names
  returned by DINO (matching single-mode behaviour).

Model-path resolution consolidation (P1):
- New utils.models_base_dir() is the single source of truth for the
  models/ directory used by both SAM and DINO. Resolves to
  <project_root>/models in editable installs and falls back to
  <cwd>/models for site-packages installs.
- sam_worker.py bootstraps sys.path so the standalone subprocess
  can import the helper.
- dino_utils.py and annotator_window.py route through the same
  helper instead of duplicating heuristics.

Other P1s:
- .gitignore: un-ignore .claude/agents/ explicitly so the tracked
  senior-reviewer.md isn't shadowed by the .claude rule.
- DINO detection paths now preflight-check that a SAM model is
  selected before launch (avoids the misleading "SAM segmentation
  failed" error).
- arc42 docs: add ADR-011 (subprocess workers for torch isolation)
  and ADR-012 (lazy model load on dropdown selection); document
  DINO subsystem in the building block view; add DINO+SAM workflow
  to the runtime view; add DINO/NMS/Phrase/Subprocess-Worker
  entries to the glossary.
- clear_all now clears the PhraseEditorPanel and ClassThresholdTable.

P2:
- Bump torch floor to >=2.2.0 (first NumPy-2-compatible release;
  prior floor was inconsistent with numpy>=2.4.0).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

P0 regression (introduced in the previous fix-up commit):
- sam_worker.py imported `digitalsreeni_image_annotator.utils`, which
  triggered the package __init__.py that imports ImageAnnotator from
  annotator_window — pulling PyQt5 into the subprocess and reproducing
  the WinError 1114 DLL load-order bug that ADR-011 exists to prevent.
- Revert to inlined `_models_base_dir()` in sam_worker.py (5 lines).
  Keep in sync with utils.models_base_dir(). Comment explains the
  invariant.
- Verified: importing sam_worker.py without going through the package
  no longer pulls in PyQt5.

P1: dead path-resolution in dino_utils.detect() removed — GDINO_MODEL_PATHS
now holds absolute paths, and custom_model_path from QFileDialog is also
always absolute, so the relative-path normalisation branch was unreachable.

P1: drop in-loop add_class() calls in batch _commit_dino_results and
single accept_dino_results. DINO can only return labels from the
class_configs the parent supplied (built from the class table), so any
unknown class is a bug, not a normal case. Skip with a warning instead
of fanning out auto_save() calls across the batch.

P2: _commit_dino_results now derives annotation "number" via
max(existing) + 1 (matching add_annotation_to_list) instead of
len(existing) + 1 (which would collide after deletes).
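
A short sketch of why `len(existing) + 1` collides after a delete while `max(existing) + 1` does not. The `next_annotation_number` helper and the dict shape are hypothetical; only the `"number"` key is taken from the commit message.

```python
from typing import Dict, List


def next_annotation_number(annotations: List[Dict]) -> int:
    """Next free number: one past the highest number currently in use."""
    return max((a["number"] for a in annotations), default=0) + 1


# Annotations 1, 2, 3 existed; number 2 was deleted.
existing = [{"number": 1}, {"number": 3}]

collides = len(existing) + 1                 # 3, already taken by the last entry
safe = next_annotation_number(existing)      # 4, unused
print(collides, safe)
```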

P2: _ensure_dino_model_downloaded preflight-checks for huggingface_hub
and shows the actionable "pip install huggingface_hub" message when
missing, instead of falling through to generic "Could not download".

P2: load_project_data now logs when a saved DINO threshold targets a
class that no longer exists (hand-edited .iap or deleted class), so
the user knows why their saved value didn't take effect.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

P1: .claude/agents/senior-reviewer.md referenced docs/arc42/ which this
branch deleted. Update to docs/.

P2: Filter orphan DINO phrase entries on load. Saved phrases for
classes that no longer exist in the project are now logged and dropped,
matching the threshold-loader behaviour instead of silently round-
tripping stale state.
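
The orphan filtering described here might look something like the sketch below. The function name, the shape of the saved mapping (class name to phrase list), and the log wording are assumptions, not the project's loader.

```python
from typing import Dict, Iterable, List, Tuple


def drop_orphans(
    saved: Dict[str, List[str]], class_names: Iterable[str]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split saved per-class entries into those to keep and orphaned class names."""
    known = set(class_names)
    kept = {name: value for name, value in saved.items() if name in known}
    dropped = sorted(set(saved) - known)
    return kept, dropped


saved_phrases = {"drone": ["small drone", "quadcopter"], "bird": ["flying bird"]}
kept, dropped = drop_orphans(saved_phrases, class_names=["drone"])
for name in dropped:
    # Log instead of silently discarding, so the user can see what happened.
    print(f"[DINO] Dropping saved phrases for unknown class: {name}")
```

The same helper shape would work for thresholds, since both are keyed by class name.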

P2: Batch DINO detection no longer drops multi-dimensional image
slices (which live in self.image_slices, not self.image_paths) without
explanation. Print a skip notice with the reason instead.

Tooling: bake the "no PyQt5 in worker subprocess" smoke check into the
senior-reviewer agent's manual checklist (one-liner with meta_path
guard). This is the regression that almost shipped in 94de9d5 — making
it part of the quality gate so it doesn't slip again.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

P1: The PyQt-isolation smoke check added in the previous commit used
Python's legacy `find_module` finder API, which 3.12+ removed. The
guard's `loaded` flag never tripped because modern imports go through
`find_spec` instead — meaning the check reported success regardless
of whether PyQt5 actually leaked, exactly the surface-compliance trap
the senior reviewer warned about.

Pull the snippet out of `.claude/agents/senior-reviewer.md` and into
`tools/check_worker_isolation.py`:

- Use `importlib.abc.MetaPathFinder.find_spec` (the modern API)
- Belt-and-braces: after each worker exec, sweep `sys.modules` for
  any `PyQt5*` entry and fail loudly if found
- Distinguish "PyQt5 leaked" (real failure, exit 1) from "missing
  third-party dep" (skip with note, exit 0) so the script is usable
  in a slim review env without torch/transformers
- Verified positive (real workers exit 0) and negative (a synthetic
  worker that does `import PyQt5` is caught and reported)

The agent doc now invokes the script by path instead of carrying a
copy-paste snippet that rots when the import system evolves.

P2: Batch DINO detection's skip-notice now distinguishes the two cases
it covered with one generic message: no entry in `image_paths` (the
multi-dim slice case) vs. file missing on disk (a different problem).

Two P2s from the fifth review pass:

- Snapshot PyQt5-related sys.modules keys before each worker exec and
  diff against them, instead of comparing to an absolute "no PyQt5 in
  sys.modules" state. The old approach false-positived when the script
  was invoked from an interpreter that already had PyQt5 loaded; the
  diff catches the same regressions without that footgun.
- Wrap the meta_path tripwire install in try/finally so main() cleans
  up after itself and is safe to call repeatedly (the tooling is
  exclusively a CLI today, but the cleanup is cheap and prevents
  surprises if anyone ever imports main() from another script).

Verified positive (real workers exit 0), idempotent (repeated main()
calls return the same exit code), and that the tripwire is removed
from sys.meta_path after main() returns.
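
The three ideas in these two commits (a `find_spec`-based tripwire, a before/after diff of `sys.modules`, and `try/finally` cleanup) can be sketched as below. This is an illustrative reconstruction, not the contents of `tools/check_worker_isolation.py`: the class and function names are hypothetical, and the worker is passed in as a source string for simplicity.

```python
import importlib.abc
import sys
from typing import List, Set


def _is_pyqt(name: str) -> bool:
    return name == "PyQt5" or name.startswith("PyQt5.")


class PyQtTripwire(importlib.abc.MetaPathFinder):
    """Records every attempt to import PyQt5 without resolving anything itself."""

    def __init__(self) -> None:
        self.hits: List[str] = []

    def find_spec(self, fullname, path, target=None):
        if _is_pyqt(fullname):
            self.hits.append(fullname)
        return None  # defer to the regular finders


def pyqt_modules() -> Set[str]:
    return {name for name in sys.modules if _is_pyqt(name)}


def leaks_pyqt(worker_source: str) -> bool:
    """Execute worker source and report whether it tried to import PyQt5."""
    tripwire = PyQtTripwire()
    before = pyqt_modules()  # snapshot, so a pre-loaded PyQt5 is not a false positive
    sys.meta_path.insert(0, tripwire)
    try:
        try:
            exec(compile(worker_source, "<worker>", "exec"), {"__name__": "worker"})
        except ImportError:
            pass  # a missing dependency is not itself a leak; the tripwire still fired
    finally:
        sys.meta_path.remove(tripwire)  # always restore the import system
    return bool(tripwire.hits) or bool(pyqt_modules() - before)
```

One limitation of any check like this: a module already cached in `sys.modules` is returned without consulting the finders, so if the calling interpreter has PyQt5 loaded, a worker's `import PyQt5` trips neither the tripwire nor the diff. Running the check from a fresh interpreter avoids that blind spot.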

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Two unrelated CI failures on PR bnsreenu#66 (9/9 jobs red), neither in the
DINO scope but both blocking the PR:

1. All Python 3.10 jobs failed at `pip install` because numpy>=2.4.0
   requires Python 3.11+. The strict floor came from Python 3.14 needing
   numpy 2.4, but the project also claims to support 3.10. Relax to
   numpy>=2.0.0 — pip resolves 2.4+ on 3.14 and 2.2.x on 3.10
   (last 3.10-compatible) automatically.

2. Two integration tests pre-existing from commit 8e549db (Phase 1
   testing infrastructure) asserted the wrong output directory names:

   - test_export_yolo_creates_directories expected `train/` and
     `valid/` at the output root. The export creates
     `images/train`, `images/val`, `labels/train`, `labels/val`
     (per the docstring on export_yolo_v5plus). Note: `val` not
     `valid`. Test now matches the actual layout.

   - test_export_pascal_voc_creates_directories expected
     `JPEGImages/`. The export creates `images/`. Test now asserts
     `Annotations/` and `images/`.

These tests had been failing on every CI run since they were added;
no one had noticed because no one had checked. The export functions
themselves are correct; only the assertions were wrong.

All 65 tests now pass locally on Windows + Python 3.12.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@cofade cofade marked this pull request as ready for review May 18, 2026 21:07

cofade commented May 18, 2026

Contributor Author

Dear @bnsreenu, I enjoyed your recent series on the DINO+SAM labelling method a lot. As a long-term user of your image annotator, which also offers advanced capabilities like batch processing, various export options and training dataset preparation steps, I think it would be great to have these new capabilities in your established tool as well, so I made this PR to combine the best of both.
Keep up the great work, always enjoying your efforts!
Best regards, Sebastian

@bnsreenu bnsreenu self-assigned this May 19, 2026
@bnsreenu bnsreenu merged commit fdfc862 into bnsreenu:master May 19, 2026
9 checks passed