feat: cascade video duplicate detection #237

Merged
Djaler merged 12 commits into master from feature/video-cascade-dedup on Jun 24, 2026

@Djaler Djaler commented Jun 24, 2026

No description provided.

Djaler added 12 commits June 24, 2026 15:43
Two-tier scheme: cheap metadata gate (thumbnail dHash + duration), then
ffmpeg keyframe comparison only on collisions. Scope: video only.
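
The two-tier scheme described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `SeenVideo`, `passesMetadataGate`, `findDuplicate`, and both thresholds are hypothetical names and values, and the expensive keyframe extraction is passed in as a lambda so that it runs only when the cheap gate finds a collision.

```kotlin
import kotlin.math.abs

// Hypothetical record of a previously seen video; field names are illustrative only.
data class SeenVideo(
    val thumbnailHash: Long,      // 64-bit dHash of the thumbnail
    val durationSeconds: Int,
    val frameHashes: List<Long>,  // keyframe hashes stored from an earlier run
)

// Assumed thresholds; the real values are not stated in the commit messages.
const val MAX_THUMBNAIL_DISTANCE = 5
const val MAX_DURATION_DELTA_SECONDS = 1

// Number of differing bits between two 64-bit hashes.
fun hammingDistance(a: Long, b: Long): Int = (a xor b).countOneBits()

// Tier 1: cheap metadata gate (thumbnail hash distance + duration window).
fun passesMetadataGate(candidate: SeenVideo, thumbnailHash: Long, durationSeconds: Int): Boolean =
    hammingDistance(candidate.thumbnailHash, thumbnailHash) <= MAX_THUMBNAIL_DISTANCE &&
        abs(candidate.durationSeconds - durationSeconds) <= MAX_DURATION_DELTA_SECONDS

// Tier 2 runs only for tier-1 survivors; extractFrameHashes stands in for the ffmpeg step.
fun findDuplicate(
    seen: List<SeenVideo>,
    thumbnailHash: Long,
    durationSeconds: Int,
    extractFrameHashes: () -> List<Long>,
    framesMatch: (List<Long>, List<Long>) -> Boolean,
): SeenVideo? {
    val collisions = seen.filter { passesMetadataGate(it, thumbnailHash, durationSeconds) }
    if (collisions.isEmpty()) return null    // no collision: ffmpeg is never invoked
    val frames = extractFrameHashes()        // expensive step, executed at most once
    return collisions.firstOrNull { framesMatch(it.frameHashes, frames) }
}
```

Keeping the extraction behind a lambda means the common case (no metadata collision) pays only for the hash and duration comparison.
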
Add FFMPEG_TIMEOUT_MS = 30_000L and replace unbounded waitFor() with
waitFor(timeout, TimeUnit.MILLISECONDS) + destroyForcibly() to prevent
IO thread hangs. Also add a comment in findVideoCandidates clarifying
that the hash-only fallback returns at most one candidate (LIMIT 1).
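
A minimal sketch of the bounded-wait pattern this commit describes, assuming the process is launched through `ProcessBuilder`. Only `FFMPEG_TIMEOUT_MS`, `waitFor(timeout, TimeUnit.MILLISECONDS)`, and `destroyForcibly()` come from the commit message; the helper name `runWithTimeout`, its return convention, and the one-second reap window are hypothetical.

```kotlin
import java.util.concurrent.TimeUnit

const val FFMPEG_TIMEOUT_MS = 30_000L

// Runs a command and returns its exit code, or null if it was killed after the timeout.
// Hypothetical helper; the PR's actual wrapper may look different.
fun runWithTimeout(command: List<String>, timeoutMs: Long = FFMPEG_TIMEOUT_MS): Int? {
    val process = ProcessBuilder(command)
        .redirectErrorStream(true)
        // Discard output so a full pipe buffer cannot block the child process.
        .redirectOutput(ProcessBuilder.Redirect.DISCARD)
        .start()

    // Bounded wait: returns false if the process is still running after timeoutMs.
    val finished = process.waitFor(timeoutMs, TimeUnit.MILLISECONDS)
    if (!finished) {
        // Kill the process, then wait briefly (also bounded) for it to be reaped.
        process.destroyForcibly().waitFor(1, TimeUnit.SECONDS)
        return null
    }
    return process.exitValue()
}
```

Returning `null` on timeout lets the caller treat a hung ffmpeg as "no frame hashes available" without blocking the IO thread indefinitely.
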
…atch edges

Add integration tests for the video-specific repository query (distance gate
and duration window) and the frame_hashes BIGINT[] Hibernate round-trip.
Add unit tests for the non-majority framesMatch edge case and hammingDistance
boundary values (0→0 and 0→0xFF).
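
The edge cases named in these tests can be illustrated with a small sketch. The function names `hammingDistance` and `framesMatch` come from the commit message, but the bodies below are assumptions: the per-frame threshold `MAX_FRAME_DISTANCE` is hypothetical, and the strict-majority rule is inferred from the phrase "non-majority", not confirmed by the source.

```kotlin
// Number of differing bits between two 64-bit hashes.
fun hammingDistance(a: Long, b: Long): Int = (a xor b).countOneBits()

// Assumed per-frame threshold; the real value is not given in the commit messages.
const val MAX_FRAME_DISTANCE = 5

// Assumed rule: two videos match when a strict majority of aligned keyframes are within
// the distance threshold. Exactly half matching is the "non-majority" case and fails.
fun framesMatch(a: List<Long>, b: List<Long>): Boolean {
    if (a.isEmpty() || a.size != b.size) return false
    val matching = a.zip(b).count { (x, y) -> hammingDistance(x, y) <= MAX_FRAME_DISTANCE }
    return matching * 2 > a.size
}

fun main() {
    // Boundary values from the commit message: 0→0 and 0→0xFF.
    println(hammingDistance(0L, 0L))     // identical hashes: 0 differing bits
    println(hammingDistance(0L, 0xFFL))  // lowest 8 bits differ: 8

    // Two of four frames match: exactly half, so not a majority.
    println(framesMatch(listOf(0L, 0L, 0L, 0L), listOf(0L, 0L, -1L, -1L)))  // false
    // Three of four frames match: majority.
    println(framesMatch(listOf(0L, 0L, 0L, 0L), listOf(0L, 0L, 0L, -1L)))   // true
}
```
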
Reduce DuplicateMediaChecker to thin DB ops (findImageCandidate, record,
shiftToCurrent) and orchestrate both the image and video paths uniformly
in SeenMemeHandler, splitting handleMessage into handleImage/handleVideo.

Rename MediaHash.duration to durationSeconds to make the unit explicit,
extract all keyframes in a single ffmpeg invocation, and trim
feature-specific and business-logic detail from CLAUDE.md and the
repository KDoc.
@Djaler Djaler merged commit 57385e2 into master Jun 24, 2026
1 check passed