feat(scoring): gate Rule 1's near-Nyquist 320 kbps signature on wall hardness #2
Merged
…hardness
A 320 kbps MP3 low-passes at ~20.5 kHz, where genuinely band-limited lossless (baroque, harpsichord, older mastering, world-music reissues) also rolls off. Rule 1 flagged both as a "320 kbps spectral" transcode from cutoff position alone; on a full-library audit that was ~65% of all FAKE_CERTAIN verdicts, most of them authentic.
Rule 1 now measures the residual spectral floor above the wall: a real 320k brickwall drops to digital silence (<= -55 dB vs the in-band reference), an authentic rolloff keeps a higher analog/dither floor (> -55 dB). Above the threshold the signature is dropped; at/below it the file stays FAKE_CERTAIN.
A single threshold rather than a 3-band scheme on purpose: an intermediate band that withholds the +50 lets the score fall below the fast FAKE_CERTAIN short-circuit, after which protective rules (e.g. R7 clean-silence -50) can wrongly clear a genuine transcode (observed on a confirmed real @320 file).
Calibrated on 50 synthetic FLAC->320k pairs plus a band-limited surrogate (residual ROC AUC 0.95) and verified end-to-end. Residual is computed only for near-Nyquist cutoffs (90-95% Nyquist) to keep the hot path fast; unknown/short inputs fall back to the previous behaviour, so only that zone changes.
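The gate described above can be sketched as a small predicate. This is only an illustration under stated assumptions, not the project's code: the name `keep_320_signature`, the constant names, and the inclusive zone bounds are hypothetical, and the function assumes the existing cutoff logic has already matched the 320 kbps signature. Only the -55 dB threshold, the 90-95% Nyquist zone, and the fallback for unknown residuals come from the description.

```python
from typing import Optional

RESIDUAL_THRESHOLD_DB = -55.0      # threshold quoted in the description
NEAR_NYQUIST_ZONE = (0.90, 0.95)   # fraction of Nyquist; inclusive bounds are an assumption


def keep_320_signature(cutoff_hz: float, sample_rate: int,
                       residual_db: Optional[float]) -> bool:
    """Return True if the 320 kbps signature (and its +50) should stay.

    Assumes the cutoff already matched the signature under the old rule.
    """
    lo, hi = NEAR_NYQUIST_ZONE
    ratio = cutoff_hz / (sample_rate / 2.0)
    if not (lo <= ratio <= hi):
        return True   # outside the gated zone: previous behaviour
    if residual_db is None:
        return True   # unknown/short input: previous behaviour
    # Digital-silence floor (at or below the threshold) keeps the signature;
    # a higher analog/dither floor drops it.
    return residual_db <= RESIDUAL_THRESHOLD_DB
```

Returning a plain boolean mirrors the single-threshold choice: a file either keeps the full +50 or loses the signature, with no intermediate state.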
What
A 320 kbps MP3 low-passes at ~20.5 kHz — exactly where genuinely band-limited lossless (baroque, harpsichord, older mastering, world-music reissues) also rolls off. Rule 1 used to flag both as a "320 kbps spectral" transcode from the cutoff position alone. On a full-library audit that single path accounted for ~65% of all FAKE_CERTAIN verdicts, the large majority of them authentic.
Rule 1 now measures the residual spectral floor above the wall, which cutoff position cannot provide: a real 320k brickwall drops to digital silence, while an authentic rolloff keeps an analog/dither floor.
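One way such a residual could be measured is sketched below with NumPy. The PR does not spell out the exact metric, so everything here is an assumption made for illustration: the function names, the Hann window, the 1 kHz lower edge of the in-band reference, the 500 Hz guard band around the cutoff, and the use of medians. The synthetic noise helper exists only to exercise the function.

```python
import numpy as np


def residual_floor_db(samples, sample_rate, cutoff_hz, guard_hz=500.0):
    """Median power above the cutoff, in dB relative to the in-band median.

    Returns None when either band contains no FFT bins.
    """
    n = len(samples)
    windowed = np.asarray(samples, dtype=float) * np.hanning(n)
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    in_band = power[(freqs >= 1000.0) & (freqs <= cutoff_hz - guard_hz)]
    above = power[freqs >= cutoff_hz + guard_hz]
    if in_band.size == 0 or above.size == 0:
        return None
    eps = 1e-20  # avoids log10(0) when the stopband is exactly silent
    ratio = (np.median(above) + eps) / (np.median(in_band) + eps)
    return float(10.0 * np.log10(ratio))


def shaped_noise(sample_rate, cutoff_hz, stopband_gain, seconds=1.0, seed=0):
    """White noise whose spectrum above cutoff_hz is scaled by stopband_gain."""
    rng = np.random.default_rng(seed)
    n = int(sample_rate * seconds)
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] *= stopband_gain
    return np.fft.irfft(spectrum, n=n)


# A hard wall (stopband zeroed) versus a stopband left 30 dB down.
brickwall = shaped_noise(44100, 20500.0, 0.0)
rolloff = shaped_noise(44100, 20500.0, 10 ** (-30 / 20))
wall_db = residual_floor_db(brickwall, 44100, 20500.0)
rolloff_db = residual_floor_db(rolloff, 44100, 20500.0)
```

In this toy setup the zeroed stopband lands far below -55 dB while the 30 dB-down stopband stays above it, which is the separation the threshold relies on. Real material and real encoder output will be less clean than shaped noise.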
residual > -55 dB → authentic band-limited → signature dropped (file reads AUTHENTIC)
residual <= -55 dB → digital-silence floor → stays FAKE_CERTAIN
Why a single threshold, not a 3-band WARNING
An intermediate band that withholds the +50 lets the score fall below the fast FAKE_CERTAIN short-circuit, after which protective rules (e.g. R7 clean-silence -50) can wrongly clear a genuine transcode; this was observed end-to-end on a confirmed real @320 file. Keeping real transcodes at +50 preserves that short-circuit.
Validation
wall205slope was tried and dropped (AUC 0.71, fails on 256k / band-limited).
@320 transcode stays FAKE_CERTAIN; audiophile baroque/classical (Jordi Savall, New World Records) that were FAKE_CERTAIN now read AUTHENTIC.
Caveats
Synthetic transcodes use LAME (ffmpeg) only; the band-limited surrogate is a gentle low-pass approximation. The threshold favours the project's "protect authentic first" rule — transcodes of already-band-limited material (shallow wall, floor > -55 dB) remain near-undetectable, as with any spectral tool.
This changes verdicts (documented in CHANGELOG under Unreleased).