fix(scanner): repair and re-align the JS fallback scanner #603
Open
IrosTheBeggar wants to merge 3 commits into
The JS fallback scanner (src/db/scanner.mjs) had drifted from the Rust
scanner in ways the parity suite never catches — it only ever runs the
Rust binary. Fixes, highest-severity first:
1. Config rejection (fallback was dead on arrival). task-queue.js sends
`waveformCacheDir` in every jsonLoad and no longer sends
`generateWaveforms`; the Joi schema listed only the latter and
rejected unknown keys, so the fallback exited with "Invalid JSON
Input" before doing any work. Accept `waveformCacheDir` and tolerate
unknown keys (.unknown(true)) so future Rust-only fields can't break
the fallback again.
2. Data loss on a vanished mount. With no accessibility guard, a
transient CIFS/NFS outage made the walk find nothing and the
stale-cleanup DELETE wipe every track for the library (cascading to
albums/artists/user_album_stars). Add the upfront + post-walk guards
the Rust scanner already has.
3. Album-art hash corruption + parity. Embedded and directory art were
hashed via buffer.toString('utf-8') — a lossy UTF-8 round-trip, so
the digest was neither the real MD5 nor what Rust produces; distinct
covers could collide onto one filename. Hash the raw bytes, and map
the embedded MIME type to an extension the way Rust does
(image/jpeg -> .jpeg, not mime-types' .jpg).
4. Sidecar stat storm. The fast-path issued ~22 statSync calls per file
on every scan. Add a per-directory listing cache (sidecarMtimeCached)
mirroring the Rust scanner's DirListing — one readdir per directory.
5. Double tree walk + symlink-cycle hang. Replace the separate
count-then-scan passes with a single collectFiles walk, and add
realpath-based cycle detection when followSymlinks is on (walkdir
gives the Rust scanner this for free).
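
The hash bug in item 3 is easy to reproduce in isolation. The following is a minimal sketch, not the code from `scanner.mjs`: the helper names `md5ViaUtf8String` and `md5OfRawBytes` are hypothetical, the two byte sequences stand in for real image data, and CommonJS `require` is used only to keep the snippet standalone. It shows two distinct buffers collapsing to one digest after a UTF-8 round-trip, and staying distinct when the raw bytes are hashed.

```javascript
const crypto = require('node:crypto');

// Two different "covers" that differ only in a byte that is not valid UTF-8.
const coverA = Buffer.from([0xff, 0x01]);
const coverB = Buffer.from([0xfe, 0x01]);

// Lossy: each invalid byte decodes to U+FFFD, so both inputs become the
// same string before they ever reach the hash.
function md5ViaUtf8String(buf) {
  return crypto.createHash('md5').update(buf.toString('utf-8')).digest('hex');
}

// Faithful: hash the bytes exactly as they were read.
function md5OfRawBytes(buf) {
  return crypto.createHash('md5').update(buf).digest('hex');
}

console.log(md5ViaUtf8String(coverA) === md5ViaUtf8String(coverB)); // true (collision)
console.log(md5OfRawBytes(coverA) === md5OfRawBytes(coverB));       // false
```

Real JPEG and PNG data contains many such bytes, which is why the lossy digest could map distinct covers onto one filename.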
Adds test/scanner-js-guard.test.mjs — the first test to drive
scanner.mjs directly (happy-path walk, fast-path rescan, both guards,
real task-queue-shaped config). Stale comment line-refs fixed in
scanner.mjs and rust-parser/src/main.rs (comment-only; no binary change).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
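
The two accessibility guards from item 2 could look roughly like the sketch below. This is an illustration under stated assumptions, not the scanner's actual logic: the function names `isLibraryAccessible` and `staleCleanupIsSafe` are hypothetical, and the "zero files seen means skip cleanup" rule is an assumed policy (the Rust scanner's exact conditions are not quoted in this PR).

```javascript
const fs = require('node:fs');

// Upfront guard (assumed shape): true only if the library root is a
// readable directory right now.
function isLibraryAccessible(root) {
  try {
    fs.accessSync(root, fs.constants.R_OK);
    return fs.statSync(root).isDirectory();
  } catch {
    return false;
  }
}

// Post-walk guard (assumed policy): only allow the stale-row DELETE when the
// root is still reachable AND the walk actually saw files. A dropped network
// mount can leave an empty but readable mountpoint behind, which the upfront
// check alone would not catch.
function staleCleanupIsSafe(root, filesSeen) {
  if (!isLibraryAccessible(root)) return false;
  return filesSeen > 0;
}
```

A caller would check `isLibraryAccessible(root)` before starting the walk, and run the stale-cleanup `DELETE` only if `staleCleanupIsSafe(root, filesSeen)` returns true afterwards.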
Bring the JS fallback scanner in line with the Rust scanner's #612
lock-contention + write-reduction work. (The synchronous=NORMAL / cache
PRAGMAs and the NOT EXISTS orphan cleanup already arrived via the rebase
onto master; these are the two that needed a JS-side port.)

- Keep the write lock free during extraction. The scan loop wrapped the
  whole walk, including parseMyFile (read + tag parse + hash + album-art
  I/O), in one batched transaction, so a concurrent API write could block
  for the length of a decode and blow past busy_timeout. Now the cheap
  unchanged-file scan_id bumps batch under one transaction, but
  processFile flushes that batch (releasing the lock) BEFORE parsing each
  changed file, parses with no transaction open, then writes that one
  track in its own tight BEGIN...COMMIT. A mid-write failure rolls back
  just that track (the JS analogue of the Rust writer's per-song
  savepoint) instead of leaving a partial row to be committed on the next
  batch flush. Mirrors the Rust serial-path / greedy-drain restructure.
- Guard the per-album updateAlbumTags UPDATE so it's a no-op (0 rows, no
  row rewrite) when album_artist/compilation are unchanged; otherwise
  every track of a shared album rewrites the album row identically. Same
  guard ported from the Rust find_or_create_album.

DB output unchanged: full suite green (1145 tests, 0 fail, 0 skipped)
incl. scanner-js-guard end-to-end. A direct JS-scanner smoke run (initial
/ no-op rescan / force-rescan) produces row counts identical to the Rust
scanner and leaves album rows byte-identical across a no-op rescan.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
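
The transaction ordering described in this commit (batch the cheap bumps, flush before parsing, then one tight transaction per changed track) can be sketched as follows. This is a simplified illustration, not the code from `scanner.mjs`: `createScanWriter`, `bumpUnchanged`, `processChanged`, `parse`, and `writeTrack` are hypothetical names, `db` is assumed to be any object exposing `exec(sql)`, and parsing is shown as synchronous for brevity.

```javascript
// Sketch only: `db` is any object with exec(sql); `parse` and `writeTrack`
// are hypothetical callbacks supplied by the caller.
function createScanWriter(db, parse, writeTrack) {
  let batchOpen = false;

  // Commit the pending batch of scan_id bumps, releasing the write lock.
  function flushBatch() {
    if (batchOpen) {
      db.exec('COMMIT');
      batchOpen = false;
    }
  }

  // Unchanged file: a cheap bump, batched under one shared transaction.
  function bumpUnchanged(bump) {
    if (!batchOpen) {
      db.exec('BEGIN');
      batchOpen = true;
    }
    bump();
  }

  // Changed file: flush first, parse with NO transaction open, then write
  // this one track in its own BEGIN...COMMIT, rolling back only that track
  // if the write fails.
  function processChanged(filePath) {
    flushBatch();
    const meta = parse(filePath);
    db.exec('BEGIN');
    try {
      writeTrack(meta);
      db.exec('COMMIT');
    } catch (err) {
      db.exec('ROLLBACK');
      throw err;
    }
  }

  return { bumpUnchanged, processChanged, flushBatch };
}
```

In this sketch the error is rethrown after the rollback; whether the real scanner rethrows or logs and continues is not stated in the commit message. The caller is also expected to call `flushBatch()` once more at the end of the walk so trailing bumps are committed.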
pull Bot pushed a commit to Spencerx/mStream that referenced this pull request on Jun 10, 2026:
task-queue.js sends one jsonLoad to both scanners, and it includes
waveformCacheDir (it stopped sending the older generateWaveforms
boolean). The JS fallback's Joi schema never listed the field, and Joi
rejects unknown keys by default, so every real launch of the fallback
died instantly with "Invalid JSON Input" / exit 1. Any install without a
working Rust binary has had a scanner that cannot run at all. CI never
caught it because the parity tests build their own jsonLoad by hand
(without waveformCacheDir) instead of going through task-queue.js.

Accept the field, and add .unknown(true) so the next Rust-only field
added to task-queue.js's jsonLoad can't re-break the fallback the same
way; the fields the JS scanner actually reads are still validated.
Matches the identical hunk in PR IrosTheBeggar#603 so the eventual rebase
auto-merges.

Verified: forking scanner.mjs with a task-queue-shaped payload
(waveformCacheDir + an unknown extra field) against a fresh migrated DB
exits 1 with '"waveformCacheDir" is not allowed' before the fix, and
exits 0 with a clean scanComplete event after it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
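
The difference between strict and tolerant handling of unknown keys can be shown without pulling in Joi. The validator below is a hand-rolled stand-in, not Joi and not the schema from `scanner.mjs`: the `validate` function, its `allowUnknown` option, and the `directory` field are all hypothetical, chosen only to mirror the behaviour the commit attributes to `.unknown(true)`.

```javascript
// Hypothetical field list; the real schema lives in src/db/scanner.mjs.
const knownFields = new Map([
  ['directory', (v) => typeof v === 'string'],
  ['waveformCacheDir', (v) => typeof v === 'string'],
]);

// Stand-in validator: known fields are always checked; unknown fields are
// either rejected (strict) or ignored (tolerant).
function validate(payload, { allowUnknown }) {
  for (const [key, value] of Object.entries(payload)) {
    const check = knownFields.get(key);
    if (!check) {
      if (allowUnknown) continue;
      return { error: `"${key}" is not allowed` };
    }
    if (!check(value)) return { error: `"${key}" is invalid` };
  }
  return { error: null };
}

const payload = { directory: '/music', futureRustOnlyField: 1 };
console.log(validate(payload, { allowUnknown: false }).error); // "futureRustOnlyField" is not allowed
console.log(validate(payload, { allowUnknown: true }).error);  // null
```

The tolerant mode still rejects a malformed known field, which is the property the commit relies on: new sender-side fields pass through, while the fields the fallback reads stay validated.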
Summary

Auditing the scanner surfaced a cluster of bugs in the JS fallback scanner (`src/db/scanner.mjs`). It had quietly diverged from the Rust scanner because the parity suite only ever exercises the Rust binary. Fixes, highest-severity first:

1. Config rejection: the fallback was dead on arrival. `task-queue.js` sends `waveformCacheDir` in every `jsonLoad` and no longer sends `generateWaveforms`, but the JS Joi schema listed only the latter and rejected unknown keys, so the fallback exited with `Invalid JSON Input` before doing any work. Now accepts `waveformCacheDir` and tolerates unknown keys (`.unknown(true)`) so future Rust-only config fields can't silently break it again.
2. Data loss on a vanished mount. With no accessibility guard, a transient CIFS/NFS outage made the walk find nothing and then `DELETE FROM tracks WHERE scan_id != ?` wipe every track for the library (cascading to albums / artists / `user_album_stars`). Added the upfront + post-walk guards the Rust scanner already has in `run_scan`.
3. Album-art hash corruption + JS/Rust parity. Embedded and directory art were hashed via `buffer.toString('utf-8')`, a lossy UTF-8 round-trip, so the digest was neither the real MD5 of the image nor what Rust produces (it hashes raw bytes). Distinct covers could collide onto one filename. Now hashes the raw bytes and maps the embedded MIME type to an extension the way Rust does (`image/jpeg` → `.jpeg`, not mime-types' `.jpg`).
4. Sidecar stat storm. The fast-path issued ~22 `statSync` calls per file on every scan (even unchanged files and sidecar-free libraries). Added a per-directory listing cache (`sidecarMtimeCached`) mirroring the Rust scanner's `DirListing`: one `readdir` per directory.
5. Double tree walk + symlink-cycle hang. Replaced the separate count-then-scan passes with a single `collectFiles` walk, and added realpath-based cycle detection when `followSymlinks` is enabled (walkdir gives the Rust scanner this for free).

Also fixes a stale comment line-reference in `rust-parser/src/main.rs` (comment-only).

Rebased onto master + ported the #612 Rust scanner optimizations

This branch is rebased onto current master, which now includes #612's scanner write-path overhaul. Most of #612 reaches the JS fallback automatically through the rebase: the `synchronous=NORMAL` / `cache_size` / `temp_store` PRAGMAs and the `NOT IN` → `NOT EXISTS` orphan cleanup. Two wins needed a JS-side port, added in a separate commit (`perf(scanner): port Rust write-path optimizations to the JS fallback`):

- Keep the write lock free during extraction. The scan loop wrapped the entire walk, including `parseMyFile` (read + tag parse + hash + album-art I/O), in one batched transaction, so a concurrent API write (playlist save, scrobble) could block for the length of a decode and blow past `busy_timeout`. Now the cheap unchanged-file `scan_id` bumps batch under one transaction, but `processFile` flushes that batch, releasing the write lock, before parsing each changed file, parses with no transaction open, then writes that one track in its own tight `BEGIN…COMMIT`. A mid-write failure rolls back just that track (the JS analogue of the Rust writer's per-song savepoint) instead of leaving a partial row for the next batch flush. Mirrors the Rust serial-path / greedy-drain restructure.
- Guard the per-album `updateAlbumTags` UPDATE so it's a no-op (0 rows, no row rewrite) when `album_artist` / `compilation` are unchanged; otherwise every track of a shared album rewrites the album row identically. Same guard ported from the Rust `find_or_create_album`.

Tests

New `test/scanner-js-guard.test.mjs`, the first test to drive `scanner.mjs` directly:

- … `waveformCacheDir`), so they regression-guard fix #1

After the rebase + port:

- … `scanner-js-guard`, `scanner-parity`, `audio-hash-parity`, `lyrics-parity`, `waveform`, `task-queue`).
- … `scan_progress` cleaned up.

Notes / out of scope

- … `OnceLock` dedup), which this branch is now rebased on top of.

🤖 Generated with Claude Code