feat: LLM-assisted detection with Grounding DINO by cofade · Pull Request #66 · bnsreenu/digitalsreeni-image-annotator

cofade · 2026-05-18T19:53:04Z

Summary

Adds an LLM-assisted detection workflow: pick a class, provide free-form text phrases, and Grounding DINO produces bounding boxes that SAM 2 refines into segmentation masks.
Supports per-class phrase lists and per-class thresholds (box / text / NMS); single-image and batch detection with an optional "review before accepting" mode.
DINO inference runs in a subprocess (same pattern as sam_worker.py from feat: Python 3.14 compatibility with subprocess-based SAM #65) so transformers/torch is loaded in a clean process, avoiding the Windows + Python 3.14 DLL load order issue.

What's new

User-facing

DINO panel under the SAM tools: model dropdown, class threshold table, phrase editor.
"Detect Current Image" and "Detect All Images" actions.
Lazy model loading: picking a model from the dropdown sets ready state immediately; download happens on first Detect (mirrors SAM behaviour).
Detection results render as orange preview masks; Enter accepts, Esc rejects.
models/sam/ and models/<dino-model>/ directories keep downloaded weights tidy.

Implementation

dino_worker.py is the DINO equivalent of sam_worker.py. Parent (DINOUtils.detect()) sends a JSON request, worker returns boxes; boxes then feed SAMUtils.apply_sam_predictions_batch() for masks.
DINO config (phrases + thresholds) persists in the .iap project file under dino_config.
UTF-8 forced on both sides of the SAM/DINO worker pipes (encoding=\"utf-8\", errors=\"replace\" on the parent, PYTHONIOENCODING=utf-8 in the worker env) so cp1252-locale Windows can decode torch warnings without crashing.

Dependencies added (in requirements.txt / setup.py)

`transformers >= 4.40` (Grounding DINO support)
`huggingface_hub >= 0.20` (model download)
`torch >= 2.0`, `torchvision >= 0.15` (already transitive via ultralytics; now pinned explicitly because DINO uses them directly)

Docs

Flatten `docs/arc42/` → `docs/` (extra nesting was redundant).
README: new GPU acceleration note (PyPI's default torch wheel is CPU-only on Windows; point GPU users at the PyTorch CUDA index).

Repo housekeeping

`.gitignore`: replace explicit `sam2_.pt` list with `.pt` wildcard; add `models/`, `*.iap`, IDE/OS/log/env patterns.
Untrack stale `pycache/*.cpython-310.pyc` files that pre-dated the `pycache/` ignore rule.

Test plan

There are no automated tests for this feature — DINO benefits from a GPU and the existing test environment is manual-only per `CLAUDE.md`. Manual verification performed:

Status

Marked as draft — open to feedback on scope, naming, or anything that should be split into smaller follow-up PRs.

🤖 Generated with Claude Code

…thresholds Ports Grounding DINO + SAM segmentation from the standalone annotation_tool_v4 into the DigitalSreeni annotator as a new sidebar section. Core changes: - dino_worker.py: standalone subprocess running HF transformers DINO inference (avoids PyQt5/torch DLL conflict on Windows+Python3.14) - dino_utils.py: PyQt5-safe wrapper spawning the worker, mirroring SAMUtils - dino_phrase_editor.py: ClassThresholdTable + PhraseEditorPanel widgets - dino_merge_dialog.py: merge per-image COCO JSONs into train/val splits SAM infrastructure: - sam_worker.py: extended for batch bbox segmentation (efficiency for DINO results) - sam_utils.py: new apply_sam_predictions_batch() for multi-bbox in one call UI integration (annotator_window.py): - New "LLM-Assisted Detection" sidebar section with model selector, per-class threshold table, phrase editor, detect buttons, batch mode - DINO phrases/thresholds synced with existing class add/remove - temp_annotations overlay: DINO results shown as orange masks, Enter to accept, Escape to discard - Batch detection with progress dialog: auto-accept or review mode - DINO config persisted in .iap project files (dino_config key) - Tools menu: "Merge COCO for Training" entry Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add .claude/agents/senior-reviewer.md adapted for this project - Update CLAUDE.md with quality gate workflow: - Feature branch requirement (never commit to master) - Manual testing checklist (7 items) - arc42 documentation update rules - Senior reviewer agent as mandatory pre-PR quality gate Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Follow-up fixes to the DINO integration discovered during testing, plus repo housekeeping. DINO UX: - Drop the explicit "Load" button; selecting a model from the dropdown sets ready state immediately, with download deferred to the first Detect call (mirrors SAM's lazy-load model). - Focus the canvas after detection so Enter/Esc accept/reject without needing an extra click. DINO + SAM rendering: - Stop storing DINO's xyxy bbox alongside the SAM segmentation in temp/accepted annotations. The polygon is the actual mask; a bbox is derived from it at export. Fixes the rendering bug where the temp annotation showed a giant rectangle instead of the SAM polygon (xyxy treated as xywh by drawRect). - Route accept_dino_results through image_label.annotations (matching accept_sam_prediction's contract) so accepted masks render immediately and save_current_annotations correctly syncs them to all_annotations. Subprocess workers: - Force UTF-8 on both sides of the SAM/DINO worker pipes (encoding/errors on the parent, PYTHONIOENCODING=utf-8 in the worker env). On Windows the default cp1252 decode would crash on non-ASCII bytes in torch/transformers warnings, hiding the actual worker output. - Move SAM weights to <project_root>/models/sam/ (parallel to models/<dino-model>/), with auto-mkdir on first download. Dependencies: - Add huggingface_hub, transformers, torch, torchvision to requirements.txt and setup.py — DINO failed at runtime without these. - README: note that PyPI ships CPU-only torch on Windows and point GPU users at the PyTorch CUDA index. Repo housekeeping: - Flatten docs/arc42/* up to docs/ (removes redundant nesting); update CLAUDE.md and docs/README.md path references. - .gitignore: replace explicit sam2_*.pt list with *.pt wildcard; add models/, *.iap, IDE/OS/log/env patterns. - Untrack stale __pycache__/*.cpython-310.pyc files (the __pycache__/ rule only catches new additions). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

DINO config persistence (P0): - Make the widgets (PhraseEditorPanel and ClassThresholdTable) the single source of truth for phrases and thresholds. Remove the shadow self.dino_phrases / self.dino_thresholds dicts that were never kept in sync with user edits. - save_project snapshots both widgets at save time. - load_project_data pushes phrases into the panel and thresholds into the table after classes are created. - Add ClassThresholdTable.set_thresholds() and clear_classes(), and PhraseEditorPanel.clear() for these flows. Batch auto-accept data loss (P0): - _commit_dino_results now routes through image_label.annotations when committing to the currently-displayed image, so the canvas reflects the change and the next save_current_annotations() call does not overwrite it. Mirrors the single-image accept_dino_results fix from the previous commit. - Also adds the missing add_class() call for unknown class names returned by DINO (matching single-mode behaviour). Model-path resolution consolidation (P1): - New utils.models_base_dir() is the single source of truth for the models/ directory used by both SAM and DINO. Resolves to <project_root>/models in editable installs and falls back to <cwd>/models for site-packages installs. - sam_worker.py bootstraps sys.path so the standalone subprocess can import the helper. - dino_utils.py and annotator_window.py route through the same helper instead of duplicating heuristics. Other P1s: - .gitignore: un-ignore .claude/agents/ explicitly so the tracked senior-reviewer.md isn't shadowed by the .claude rule. - DINO detection paths now preflight-check that a SAM model is selected before launch (avoids the misleading "SAM segmentation failed" error). - arc42 docs: add ADR-011 (subprocess workers for torch isolation) and ADR-012 (lazy model load on dropdown selection); document DINO subsystem in the building block view; add DINO+SAM workflow to the runtime view; add DINO/NMS/Phrase/Subprocess-Worker entries to the glossary. - clear_all now clears the PhraseEditorPanel and ClassThresholdTable. P2: - Bump torch floor to >=2.2.0 (first NumPy-2-compatible release; prior floor was inconsistent with numpy>=2.4.0). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

P0 regression (introduced in the previous fix-up commit): - sam_worker.py imported `digitalsreeni_image_annotator.utils`, which triggered the package __init__.py that imports ImageAnnotator from annotator_window — pulling PyQt5 into the subprocess and reproducing the WinError 1114 DLL load-order bug that ADR-011 exists to prevent. - Revert to inlined `_models_base_dir()` in sam_worker.py (5 lines). Keep in sync with utils.models_base_dir(). Comment explains the invariant. - Verified: importing sam_worker.py without going through the package no longer pulls in PyQt5. P1: dead path-resolution in dino_utils.detect() removed — GDINO_MODEL_PATHS now holds absolute paths, and custom_model_path from QFileDialog is also always absolute, so the relative-path normalisation branch was unreachable. P1: drop in-loop add_class() calls in batch _commit_dino_results and single accept_dino_results. DINO can only return labels from the class_configs the parent supplied (built from the class table), so any unknown class is a bug, not a normal case. Skip with a warning instead of fanning out auto_save() calls across the batch. P2: _commit_dino_results now derives annotation "number" via max(existing) + 1 (matching add_annotation_to_list) instead of len(existing) + 1 (which would collide after deletes). P2: _ensure_dino_model_downloaded preflight-checks for huggingface_hub and shows the actionable "pip install huggingface_hub" message when missing, instead of falling through to generic "Could not download". P2: load_project_data now logs when a saved DINO threshold targets a class that no longer exists (hand-edited .iap or deleted class), so the user knows why their saved value didn't take effect. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

P1: .claude/agents/senior-reviewer.md referenced docs/arc42/ which this branch deleted. Update to docs/. P2: Filter orphan DINO phrase entries on load. Saved phrases for classes that no longer exist in the project are now logged and dropped, matching the threshold-loader behaviour instead of silently round- tripping stale state. P2: Batch DINO detection no longer drops multi-dimensional image slices (which live in self.image_slices, not self.image_paths) without explanation. Print a skip notice with the reason instead. Tooling: bake the "no PyQt5 in worker subprocess" smoke check into the senior-reviewer agent's manual checklist (one-liner with meta_path guard). This is the regression that almost shipped in 94de9d5 — making it part of the quality gate so it doesn't slip again. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

P1: The PyQt-isolation smoke check added in the previous commit used Python's legacy `find_module` finder API, which 3.12+ removed. The guard's `loaded` flag never tripped because modern imports go through `find_spec` instead — meaning the check reported success regardless of whether PyQt5 actually leaked, exactly the surface-compliance trap the senior reviewer warned about. Pull the snippet out of `.claude/agents/senior-reviewer.md` and into `tools/check_worker_isolation.py`: - Use `importlib.abc.MetaPathFinder.find_spec` (the modern API) - Belt-and-braces: after each worker exec, sweep `sys.modules` for any `PyQt5*` entry and fail loudly if found - Distinguish "PyQt5 leaked" (real failure, exit 1) from "missing third-party dep" (skip with note, exit 0) so the script is usable in a slim review env without torch/transformers - Verified positive (real workers exit 0) and negative (a synthetic worker that does `import PyQt5` is caught and reported) The agent doc now invokes the script by path instead of carrying a copy-paste snippet that rots when the import system evolves. P2: Batch DINO detection's skip-notice now distinguishes the two cases it covered with one generic message: no entry in `image_paths` (the multi-dim slice case) vs. file missing on disk (a different problem).

Two P2s from the fifth review pass: - Snapshot PyQt5-related sys.modules keys before each worker exec and diff against them, instead of comparing to an absolute "no PyQt5 in sys.modules" state. The old approach false-positived when the script was invoked from an interpreter that already had PyQt5 loaded; the diff catches the same regressions without that footgun. - Wrap the meta_path tripwire install in try/finally so main() cleans up after itself and is safe to call repeatedly (the tooling is exclusively a CLI today, but the cleanup is cheap and prevents surprises if anyone ever imports main() from another script). Verified positive (real workers exit 0), idempotent (repeated main() calls return the same exit code), and that the tripwire is removed from sys.meta_path after main() returns. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Two unrelated CI failures on PR bnsreenu#66 (9/9 jobs red), neither in the DINO scope but both blocking the PR: 1. All Python 3.10 jobs failed at `pip install` because numpy>=2.4.0 requires Python 3.11+. The strict floor came from Python 3.14 needing numpy 2.4, but the project also claims to support 3.10. Relax to numpy>=2.0.0 — pip resolves 2.4+ on 3.14 and 2.2.x on 3.10 (last 3.10-compatible) automatically. 2. Two integration tests pre-existing from commit 8e549db (Phase 1 testing infrastructure) asserted the wrong output directory names: - test_export_yolo_creates_directories expected `train/` and `valid/` at the output root. The export creates `images/train`, `images/val`, `labels/train`, `labels/val` (per the docstring on export_yolo_v5plus). Note: `val` not `valid`. Test now matches the actual layout. - test_export_pascal_voc_creates_directories expected `JPEGImages/`. The export creates `images/`. Test now asserts `Annotations/` and `images/`. These tests had been failing on every CI run since they were added; no one had noticed because no one had checked. The export functions themselves are correct; only the assertions were wrong. All 65 tests now pass locally on Windows + Python 3.12. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

cofade · 2026-05-18T21:10:42Z

Dear @bnsreenu , I enjoyed your recent series on the DINO+SAM labelling method a lot. As long-term user of your image annotator, which also offers advanced capabilities like batch processing, various export options and training dataset preparation steps, I think it would be great to have these new capabilities also in your established tool, therefore I made this PR to combine the best of both.
Keep up the great work, always enjoying your efforts!
Best regards, Sebastian

Sebastian Wienhold and others added 9 commits May 13, 2026 17:41

cofade marked this pull request as ready for review May 18, 2026 21:07

bnsreenu self-assigned this May 19, 2026

bnsreenu merged commit fdfc862 into bnsreenu:master May 19, 2026
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: LLM-assisted detection with Grounding DINO#66

feat: LLM-assisted detection with Grounding DINO#66
bnsreenu merged 9 commits into
bnsreenu:masterfrom
cofade:feature/llm-dino-integration

cofade commented May 18, 2026

Uh oh!

cofade commented May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

cofade commented May 18, 2026

Summary

What's new

Test plan

Status

Uh oh!

cofade commented May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants