Semantic duplicate-bug detection built on two Qwen3 0.6B models loaded via Transformers.js v4. The models and the corpus load into the browser tab and run on WebGPU, so queries are processed entirely on the user's device:
- Recall: `onnx-community/Qwen3-Embedding-0.6B-ONNX` (feature-extraction, `last_token` pooling, normalized). Cosine similarity → top candidates.
- Precision: `onnx-community/Qwen3-Reranker-0.6B-ONNX` (CausalLM). For each candidate it builds a yes/no judge prompt, runs one forward pass, and scores `P(yes) = softmax([yes_logit, no_logit])[yes]`. A threshold on that score decides duplicate vs new.
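The two scoring steps above can be sketched in a few lines. This is a minimal illustration, not the code in `src/lib/worker.js`: the function names (`dot`, `topK`, `pYes`) and the `{ id, vector }` corpus shape are assumptions made for the example. It relies on the embeddings being normalized, so cosine similarity reduces to a dot product.

```javascript
// Cosine similarity equals the dot product when both vectors have unit
// length, which holds for normalized embeddings.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Recall: return the k corpus entries most similar to the query, best first.
// `corpus` is a hypothetical array of { id, vector } objects.
function topK(queryVec, corpus, k) {
  return corpus
    .map(({ id, vector }) => ({ id, score: dot(queryVec, vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Precision: softmax over the two logits [yes, no], taking the "yes" entry.
// Algebraically this is 1 / (1 + exp(no - yes)), which avoids computing
// two separate exponentials.
function pYes(yesLogit, noLogit) {
  return 1 / (1 + Math.exp(noLogit - yesLogit));
}
```

Equal logits give `P(yes) = 0.5`, which is why a threshold of 0.5 corresponds to "the model leans yes".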
Everything runs in a Web Worker so the UI stays responsive during load and inference.
- Static deployment. Ships as static files, so hosting it is as simple as serving a folder.
- Local processing. The corpus and each query are handled in the browser tab, which suits cases where reports should not leave the device.
- Transparent pipeline. Recall and rerank are shown side by side, so each stage's contribution to the result is visible.
npm install
npm run dev      # http://localhost:5173
npm run build    # production build → dist/
npm run preview  # serve the production build locally

Install note: the repo ships an `.npmrc` with `ignore-scripts=true`. `@huggingface/transformers` pulls in `onnxruntime-node`, whose postinstall downloads a native binary that a browser build doesn't need; skipping scripts avoids that. The browser runtime is `onnxruntime-web`, which is bundled at build time regardless.
Everything tunable lives in plain constants:
| What | Where | Default |
|---|---|---|
| Duplicate threshold | `src/App.jsx` (`useState(0.5)`), also adjustable live via the UI slider | 0.5 |
| Candidates sent to the reranker | `src/constants.js` → `CANDIDATE_COUNT` | 8 |
| Embedding dtype | `src/hooks/useDuplicateScanner.js` → `embedDtype` | fp16 (q4f16 on mobile) |
| Reranker dtype | `src/hooks/useDuplicateScanner.js` → `rerankDtype` | q4 |
| Model IDs | `src/lib/worker.js` → `EMBED_ID` / `RERANK_ID` | Qwen3 0.6B ONNX (embedding + reranker) |
| Embed batch size / rerank token cap | `src/lib/worker.js` → `EMBED_BATCH`, `RERANK_MAX_TOKENS` | 8 / 2048 |
| Retrieval/judge prompts | `src/lib/worker.js` → `EMBED_TASK`, `RERANK_INSTRUCTION` | bug-report specific |
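As one illustration of how a constant like `EMBED_BATCH` might be consumed, the sketch below splits a corpus into fixed-size batches before embedding. The helper name `toBatches` is hypothetical, and the worker in this repo may batch differently; only the default value of 8 comes from the table.

```javascript
// Default taken from the table above; the helper itself is illustrative.
const EMBED_BATCH = 8;

// Split `items` into consecutive batches of at most `size` elements.
function toBatches(items, size = EMBED_BATCH) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('batch size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

With the default of 8, a 180-entry corpus yields 23 batches: 22 full ones and a final batch of 4. A larger batch size means fewer forward passes but more memory per pass.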
Using your own corpus: replace the BUGS array in src/data/bugs.js. Each entry
needs { id, title, text } — that's the only contract. SAMPLE_QUERIES (same file)
feeds the sample chips in the UI.
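A quick sanity check before swapping in your own data might look like the sketch below. The example entry and the `validateCorpus` helper are hypothetical and not part of the repo; the README only requires the three keys. The uniqueness check on `id` is an extra assumption, added because duplicate ids would make results ambiguous.

```javascript
// Example entry in the { id, title, text } shape; the values are made up.
const exampleBug = {
  id: 'BUG-1',
  title: 'Button label not announced by screen reader',
  text: 'Steps to reproduce and observed behaviour go here.',
};

// Return a list of human-readable problems; an empty list means the corpus
// satisfies the { id, title, text } contract.
function validateCorpus(bugs) {
  const problems = [];
  const seenIds = new Set();
  bugs.forEach((bug, index) => {
    for (const key of ['id', 'title', 'text']) {
      const value = bug[key];
      if (value === undefined || value === null || value === '') {
        problems.push(`entry ${index}: missing ${key}`);
      }
    }
    // Assumption: ids should be unique so a result maps to exactly one report.
    if (bug.id !== undefined && bug.id !== null && bug.id !== '') {
      if (seenIds.has(bug.id)) problems.push(`entry ${index}: duplicate id ${bug.id}`);
      seenIds.add(bug.id);
    }
  });
  return problems;
}
```

Running `validateCorpus(BUGS)` once at startup or in a test keeps a malformed entry from surfacing later as a confusing embedding or rendering error.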
- Push to a Git repo and import it in Vercel; the framework auto-detects as Vite.
- If the build fails on the `onnxruntime-node` postinstall, set Settings → General → Install Command to `npm install --ignore-scripts`.
- Deploy. Output is a static site; model weights stream from the Hugging Face CDN on first load and are cached by the browser afterwards.
The build output is a plain static bundle that any static host can serve as-is.
Weights download once (then cached). Defaults (set in src/hooks/useDuplicateScanner.js):
- embedding: `fp16` (`q4f16` in mobile lite mode)
- reranker: `q4`, measured ~33× faster than `q8` on WebGPU (optimized `MatMulNBits` shaders) with near-identical scores; fp16/fp32 don't exist for this model (they need external data files unsupported by ORT Web)
Combined first load is roughly 1 GB. For a screen-recorded walkthrough this is a one-time cost on your machine and a non-issue. For a public link, every new visitor downloads it; that is fine for a portfolio piece, but say so on the page or expect bounces.
WebGPU is required (Chrome/Edge desktop, Android Chrome with WebGPU, recent Safari Tech Preview). Browsers without WebGPU see an unsupported notice. Mobile devices get a lite mode (embedding-only recall) to stay within memory limits.
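The capability check described above could be sketched as follows. This is an assumption-laden illustration rather than the contents of `src/lib/environment.js`: the function names and the `mode` strings are hypothetical. The only platform API used is `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable adapter is available.

```javascript
// Resolves to true only if WebGPU is exposed and an adapter can be obtained.
// `nav` defaults to the global navigator; injecting it keeps the check testable.
async function hasWebGPU(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) return false;
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    // Treat any failure to obtain an adapter as "not supported".
    return false;
  }
}

// Map detected capabilities to a run mode. Mode names are hypothetical;
// the dtypes mirror the defaults listed in this README.
function chooseMode({ webgpu, mobile }) {
  if (!webgpu) return { mode: 'unsupported' };
  if (mobile) return { mode: 'lite', embedDtype: 'q4f16', rerankDtype: null };
  return { mode: 'full', embedDtype: 'fp16', rerankDtype: 'q4' };
}
```

Separating the asynchronous probe from the pure mode decision makes the decision table easy to unit-test without a browser.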
The corpus is a frozen snapshot of 180 real facebook/react issues containing
genuine duplicate clusters filed by different users.
- Load models: progress bar fills, "ready · 180 reports" appears.
- Sample 1 (screen reader doesn't announce emoji-picker buttons) → scan. Recall surfaces the accessibility neighbourhood; the reranker lifts the true duplicate pair (#36421/#36422) to the top with CANDIDATE badges (P(yes) ≈ 1.0).
- Sample 2 (SSR "text content does not match", never says "hydration") → scan. The embedding bridges the wording gap to the hydration-mismatch cluster; the reranker confirms #36241.
- Sample 4 (near-miss: date-input locale bug, no true duplicate exists) → scan. Recall still returns 8 candidates, but every reranker score stays near 0.0 → "NO STRONG DUPLICATE". That contrast is the story: embeddings find the neighbourhood, the reranker makes the call.
Sample 3 (eslint-plugin-react-hooks 7.0.1 resolution failure) finds its true duplicate
(#35045) at #1, and also illustrates the model's known limit: same-package cousins
score above threshold too — which is why flagged rows are labeled CANDIDATE rather
than DUP.
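The labelling behaviour in the walkthrough can be summarised in a short sketch. It assumes each reranked row carries a `score` equal to P(yes); the `summarize` helper and the non-empty banner text `'CANDIDATES FOUND'` are hypothetical, as is the choice of `>=` for the threshold comparison. Only the `CANDIDATE` badge and the "NO STRONG DUPLICATE" banner come from this README.

```javascript
// Sort rows by reranker score, badge every row at or above the threshold,
// and derive a banner from whether any row was flagged.
function summarize(rows, threshold = 0.5) {
  const ranked = [...rows]
    .sort((a, b) => b.score - a.score)
    .map((row) => ({
      ...row,
      // A flag marks a row worth a human look; it is not a final verdict.
      label: row.score >= threshold ? 'CANDIDATE' : null,
    }));
  const flaggedCount = ranked.filter((row) => row.label !== null).length;
  return {
    ranked,
    flaggedCount,
    // 'CANDIDATES FOUND' is a placeholder; the real positive banner may differ.
    banner: flaggedCount > 0 ? 'CANDIDATES FOUND' : 'NO STRONG DUPLICATE',
  };
}
```

Because several related reports can clear the threshold at once, the ranked order matters more than any single flag: the top row is the strongest match, and the remaining flagged rows are leads to review.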
index.html page shell, loads fonts + src/main.jsx
vite.config.js Vite + React, excludes transformers from pre-bundling
.npmrc ignore-scripts=true (skips onnxruntime-node postinstall)
src/main.jsx React entry point
src/App.jsx thin shell — wires the hook to the views
src/constants.js PHASE states + candidate count
src/hooks/useDuplicateScanner.js worker lifecycle, load/scan actions, all app state
src/components/Header.jsx brand + status pill
src/components/BootScreen.jsx WebGPU check + load button
src/components/LoadingView.jsx download progress
src/components/QueryPanel.jsx bug-report input + sample chips
src/components/Pipeline.jsx three-stage progress strip
src/components/Results.jsx verdict banner + recall/precision columns
src/lib/worker.js loads both models, runs embed → cosine → rerank off-thread
src/lib/environment.js WebGPU + mobile detection
src/data/bugs.js frozen snapshot of 180 real facebook/react issues
src/styles.css Spark design system (dark) — violet accent, Plus Jakarta Sans
The two-stage pipeline has been run end-to-end against both real models (embedding
fp16 + reranker q4, WebGPU) on the 180-issue corpus:
- Samples 1–3 each surface their true duplicate at #1 of PRECISION with P(yes) 0.999–1.000; a whole scan completes in ~2 s (embedding ~80 ms + rerank ~1.7 s for 8 candidates).
- Sample 4 (near-miss, no true duplicate) stays "NO STRONG DUPLICATE" — every reranker score ≤ 0.016.
- Known limitation: when all candidates share a package/component (sample 3), the 0.6B reranker scores related-but-different defects above threshold as well. The top-1 match has been correct in every test run; per-row flags are candidates, not verdicts.