feat: LLM-assisted detection with Grounding DINO#66
Merged
Conversation
…thresholds Ports Grounding DINO + SAM segmentation from the standalone annotation_tool_v4 into the DigitalSreeni annotator as a new sidebar section. Core changes: - dino_worker.py: standalone subprocess running HF transformers DINO inference (avoids PyQt5/torch DLL conflict on Windows+Python3.14) - dino_utils.py: PyQt5-safe wrapper spawning the worker, mirroring SAMUtils - dino_phrase_editor.py: ClassThresholdTable + PhraseEditorPanel widgets - dino_merge_dialog.py: merge per-image COCO JSONs into train/val splits SAM infrastructure: - sam_worker.py: extended for batch bbox segmentation (efficiency for DINO results) - sam_utils.py: new apply_sam_predictions_batch() for multi-bbox in one call UI integration (annotator_window.py): - New "LLM-Assisted Detection" sidebar section with model selector, per-class threshold table, phrase editor, detect buttons, batch mode - DINO phrases/thresholds synced with existing class add/remove - temp_annotations overlay: DINO results shown as orange masks, Enter to accept, Escape to discard - Batch detection with progress dialog: auto-accept or review mode - DINO config persisted in .iap project files (dino_config key) - Tools menu: "Merge COCO for Training" entry Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add .claude/agents/senior-reviewer.md adapted for this project - Update CLAUDE.md with quality gate workflow: - Feature branch requirement (never commit to master) - Manual testing checklist (7 items) - arc42 documentation update rules - Senior reviewer agent as mandatory pre-PR quality gate Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Follow-up fixes to the DINO integration discovered during testing, plus repo housekeeping. DINO UX: - Drop the explicit "Load" button; selecting a model from the dropdown sets ready state immediately, with download deferred to the first Detect call (mirrors SAM's lazy-load model). - Focus the canvas after detection so Enter/Esc accept/reject without needing an extra click. DINO + SAM rendering: - Stop storing DINO's xyxy bbox alongside the SAM segmentation in temp/accepted annotations. The polygon is the actual mask; a bbox is derived from it at export. Fixes the rendering bug where the temp annotation showed a giant rectangle instead of the SAM polygon (xyxy treated as xywh by drawRect). - Route accept_dino_results through image_label.annotations (matching accept_sam_prediction's contract) so accepted masks render immediately and save_current_annotations correctly syncs them to all_annotations. Subprocess workers: - Force UTF-8 on both sides of the SAM/DINO worker pipes (encoding/errors on the parent, PYTHONIOENCODING=utf-8 in the worker env). On Windows the default cp1252 decode would crash on non-ASCII bytes in torch/transformers warnings, hiding the actual worker output. - Move SAM weights to <project_root>/models/sam/ (parallel to models/<dino-model>/), with auto-mkdir on first download. Dependencies: - Add huggingface_hub, transformers, torch, torchvision to requirements.txt and setup.py — DINO failed at runtime without these. - README: note that PyPI ships CPU-only torch on Windows and point GPU users at the PyTorch CUDA index. Repo housekeeping: - Flatten docs/arc42/* up to docs/ (removes redundant nesting); update CLAUDE.md and docs/README.md path references. - .gitignore: replace explicit sam2_*.pt list with *.pt wildcard; add models/, *.iap, IDE/OS/log/env patterns. - Untrack stale __pycache__/*.cpython-310.pyc files (the __pycache__/ rule only catches new additions). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
DINO config persistence (P0): - Make the widgets (PhraseEditorPanel and ClassThresholdTable) the single source of truth for phrases and thresholds. Remove the shadow self.dino_phrases / self.dino_thresholds dicts that were never kept in sync with user edits. - save_project snapshots both widgets at save time. - load_project_data pushes phrases into the panel and thresholds into the table after classes are created. - Add ClassThresholdTable.set_thresholds() and clear_classes(), and PhraseEditorPanel.clear() for these flows. Batch auto-accept data loss (P0): - _commit_dino_results now routes through image_label.annotations when committing to the currently-displayed image, so the canvas reflects the change and the next save_current_annotations() call does not overwrite it. Mirrors the single-image accept_dino_results fix from the previous commit. - Also adds the missing add_class() call for unknown class names returned by DINO (matching single-mode behaviour). Model-path resolution consolidation (P1): - New utils.models_base_dir() is the single source of truth for the models/ directory used by both SAM and DINO. Resolves to <project_root>/models in editable installs and falls back to <cwd>/models for site-packages installs. - sam_worker.py bootstraps sys.path so the standalone subprocess can import the helper. - dino_utils.py and annotator_window.py route through the same helper instead of duplicating heuristics. Other P1s: - .gitignore: un-ignore .claude/agents/ explicitly so the tracked senior-reviewer.md isn't shadowed by the .claude rule. - DINO detection paths now preflight-check that a SAM model is selected before launch (avoids the misleading "SAM segmentation failed" error). - arc42 docs: add ADR-011 (subprocess workers for torch isolation) and ADR-012 (lazy model load on dropdown selection); document DINO subsystem in the building block view; add DINO+SAM workflow to the runtime view; add DINO/NMS/Phrase/Subprocess-Worker entries to the glossary. - clear_all now clears the PhraseEditorPanel and ClassThresholdTable. P2: - Bump torch floor to >=2.2.0 (first NumPy-2-compatible release; prior floor was inconsistent with numpy>=2.4.0). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
P0 regression (introduced in the previous fix-up commit): - sam_worker.py imported `digitalsreeni_image_annotator.utils`, which triggered the package __init__.py that imports ImageAnnotator from annotator_window — pulling PyQt5 into the subprocess and reproducing the WinError 1114 DLL load-order bug that ADR-011 exists to prevent. - Revert to inlined `_models_base_dir()` in sam_worker.py (5 lines). Keep in sync with utils.models_base_dir(). Comment explains the invariant. - Verified: importing sam_worker.py without going through the package no longer pulls in PyQt5. P1: dead path-resolution in dino_utils.detect() removed — GDINO_MODEL_PATHS now holds absolute paths, and custom_model_path from QFileDialog is also always absolute, so the relative-path normalisation branch was unreachable. P1: drop in-loop add_class() calls in batch _commit_dino_results and single accept_dino_results. DINO can only return labels from the class_configs the parent supplied (built from the class table), so any unknown class is a bug, not a normal case. Skip with a warning instead of fanning out auto_save() calls across the batch. P2: _commit_dino_results now derives annotation "number" via max(existing) + 1 (matching add_annotation_to_list) instead of len(existing) + 1 (which would collide after deletes). P2: _ensure_dino_model_downloaded preflight-checks for huggingface_hub and shows the actionable "pip install huggingface_hub" message when missing, instead of falling through to generic "Could not download". P2: load_project_data now logs when a saved DINO threshold targets a class that no longer exists (hand-edited .iap or deleted class), so the user knows why their saved value didn't take effect. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
P1: .claude/agents/senior-reviewer.md referenced docs/arc42/ which this branch deleted. Update to docs/. P2: Filter orphan DINO phrase entries on load. Saved phrases for classes that no longer exist in the project are now logged and dropped, matching the threshold-loader behaviour instead of silently round- tripping stale state. P2: Batch DINO detection no longer drops multi-dimensional image slices (which live in self.image_slices, not self.image_paths) without explanation. Print a skip notice with the reason instead. Tooling: bake the "no PyQt5 in worker subprocess" smoke check into the senior-reviewer agent's manual checklist (one-liner with meta_path guard). This is the regression that almost shipped in 94de9d5 — making it part of the quality gate so it doesn't slip again. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
P1: The PyQt-isolation smoke check added in the previous commit used Python's legacy `find_module` finder API, which 3.12+ removed. The guard's `loaded` flag never tripped because modern imports go through `find_spec` instead — meaning the check reported success regardless of whether PyQt5 actually leaked, exactly the surface-compliance trap the senior reviewer warned about. Pull the snippet out of `.claude/agents/senior-reviewer.md` and into `tools/check_worker_isolation.py`: - Use `importlib.abc.MetaPathFinder.find_spec` (the modern API) - Belt-and-braces: after each worker exec, sweep `sys.modules` for any `PyQt5*` entry and fail loudly if found - Distinguish "PyQt5 leaked" (real failure, exit 1) from "missing third-party dep" (skip with note, exit 0) so the script is usable in a slim review env without torch/transformers - Verified positive (real workers exit 0) and negative (a synthetic worker that does `import PyQt5` is caught and reported) The agent doc now invokes the script by path instead of carrying a copy-paste snippet that rots when the import system evolves. P2: Batch DINO detection's skip-notice now distinguishes the two cases it covered with one generic message: no entry in `image_paths` (the multi-dim slice case) vs. file missing on disk (a different problem).
Two P2s from the fifth review pass: - Snapshot PyQt5-related sys.modules keys before each worker exec and diff against them, instead of comparing to an absolute "no PyQt5 in sys.modules" state. The old approach false-positived when the script was invoked from an interpreter that already had PyQt5 loaded; the diff catches the same regressions without that footgun. - Wrap the meta_path tripwire install in try/finally so main() cleans up after itself and is safe to call repeatedly (the tooling is exclusively a CLI today, but the cleanup is cheap and prevents surprises if anyone ever imports main() from another script). Verified positive (real workers exit 0), idempotent (repeated main() calls return the same exit code), and that the tripwire is removed from sys.meta_path after main() returns. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two unrelated CI failures on PR bnsreenu#66 (9/9 jobs red), neither in the DINO scope but both blocking the PR: 1. All Python 3.10 jobs failed at `pip install` because numpy>=2.4.0 requires Python 3.11+. The strict floor came from Python 3.14 needing numpy 2.4, but the project also claims to support 3.10. Relax to numpy>=2.0.0 — pip resolves 2.4+ on 3.14 and 2.2.x on 3.10 (last 3.10-compatible) automatically. 2. Two integration tests pre-existing from commit 8e549db (Phase 1 testing infrastructure) asserted the wrong output directory names: - test_export_yolo_creates_directories expected `train/` and `valid/` at the output root. The export creates `images/train`, `images/val`, `labels/train`, `labels/val` (per the docstring on export_yolo_v5plus). Note: `val` not `valid`. Test now matches the actual layout. - test_export_pascal_voc_creates_directories expected `JPEGImages/`. The export creates `images/`. Test now asserts `Annotations/` and `images/`. These tests had been failing on every CI run since they were added; no one had noticed because no one had checked. The export functions themselves are correct; only the assertions were wrong. All 65 tests now pass locally on Windows + Python 3.12. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Contributor
Author
|
Dear @bnsreenu , I enjoyed your recent series on the DINO+SAM labelling method a lot. As long-term user of your image annotator, which also offers advanced capabilities like batch processing, various export options and training dataset preparation steps, I think it would be great to have these new capabilities also in your established tool, therefore I made this PR to combine the best of both. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
sam_worker.pyfrom feat: Python 3.14 compatibility with subprocess-based SAM #65) so transformers/torch is loaded in a clean process, avoiding the Windows + Python 3.14 DLL load order issue.What's new
User-facing
models/sam/andmodels/<dino-model>/directories keep downloaded weights tidy.Implementation
dino_worker.pyis the DINO equivalent ofsam_worker.py. Parent (DINOUtils.detect()) sends a JSON request, worker returns boxes; boxes then feedSAMUtils.apply_sam_predictions_batch()for masks..iapproject file underdino_config.encoding=\"utf-8\",errors=\"replace\"on the parent,PYTHONIOENCODING=utf-8in the worker env) so cp1252-locale Windows can decode torch warnings without crashing.Dependencies added (in
requirements.txt/setup.py)Docs
Repo housekeeping
Test plan
There are no automated tests for this feature — DINO benefits from a GPU and the existing test environment is manual-only per `CLAUDE.md`. Manual verification performed:
Status
Marked as draft — open to feedback on scope, naming, or anything that should be split into smaller follow-up PRs.
🤖 Generated with Claude Code