Beatport/Beatsource match error and API decoding, Fix ID3 tag duplication, date mapping, Updated dependencies.#526
rosgr100 wants to merge 30 commits into
Beatport no longer exposes __NEXT_DATA__ on search pages, causing search failures. Updated Beatport search to use the v4 catalog API endpoint instead of scraping the website. Adjusted deserialization to support the new API response format (tracks array).
Added a note about a temporary Beatport fix for May 2026.
Updated user agent string and fixed a typo in the search query parameter. Changed the token fetching method to use OAuth API and updated the token expiration logic.
Updated GitHub Actions to use the latest versions of actions for checkout, cache, setup-node, and pnpm.
… and update platform statics
Summary of Dependency Modernization (Non-API Changes)
Audio Engine Upgrades (rodio 0.22): Migrated the entire playback module (onetagger-player) from i16 integers to f32 floats. Updated all local decoders (aiff, alac, flac, mp3, mp4, ogg, wav) to yield floating-point data streams, and wrapped sample rates/channels in strict NonZeroU32/NonZeroU16 structural safety types.
Shazam Fingerprinting Fixes (onetagger-autotag): Adapted shazam.rs to map floating-point streams from UniformSourceIterator. Implemented a down-sampling translation block to scale and convert the raw float sample buffer back to standard i16 PCM vectors to satisfy the underlying SongRec signature generator requirements.
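The float-to-PCM step described above can be sketched as follows. This is a minimal illustration, not the code in shazam.rs: the function name `f32_to_i16_pcm`, the clamping behaviour, and the choice of `i16::MAX` as the scale factor are all assumptions.

```rust
/// Convert normalized f32 samples (nominally in [-1.0, 1.0]) to 16-bit PCM.
/// Hypothetical helper: out-of-range values are clamped, NaN becomes silence.
fn f32_to_i16_pcm(samples: &[f32]) -> Vec<i16> {
    samples
        .iter()
        .map(|&s| {
            if s.is_nan() {
                return 0;
            }
            // Clamp first so the scaled value always fits in an i16.
            let clamped = s.clamp(-1.0, 1.0);
            (clamped * i16::MAX as f32).round() as i16
        })
        .collect()
}
```

Scaling by `i16::MAX` keeps the mapping symmetric around zero, at the cost of never producing `i16::MIN`.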
Window & UI Layer Adjustments (wry 0.55): Refactored the window lifecycle application loop in main.rs. Split the legacy webview navigation handler into separate modern asynchronous functions, updated the closures to support dual-argument signatures (handling url and NewWindowFeatures), and migrated target window states to use the modern NewWindowResponse::Allow and Deny enums.
Crate Configuration Updates (Cargo.toml): Swapped deprecated reqwest flags from "rustls-tls" to "rustls". Restored explicit "query" and "form" dependency compilation flags inside the platforms module to handle isolated network request actions. Replaced the legacy audio "cpal" identifier with the modernized "playback" flag to maintain access to local host speakers.
Wrapped the cover art file-writing loop in an explicit `config.album_art_file` check to prevent loose 'cover.jpg' files from being saved when the feature is disabled in settings.
id3.rs: Resolved the duplicate TXXX frame bug (e.g., UNIQUEFILEID, WWWAUDIOFILE) by explicitly clearing existing extended text frames before writing new ones. lib.rs (tag) & lib.rs (autotagger) : Fixed ID3 date mapping to ensure the standard YEAR frame strictly outputs a 4-digit year (YYYY). lib.rs (tag) & lib.rs (autotagger): Added logic to automatically inject the full YYYY-MM-DD date strings into custom RELEASETIME and PUBLISHTIME tags.
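A rough sketch of both fixes, under stated assumptions: frames are modelled as plain `(description, value)` pairs rather than the ID3 library's own types, and `set_txxx`, `year_only`, and `full_date` are hypothetical names that do not come from the PR.

```rust
/// Replace any existing frame with the same description, then append one.
/// Clearing first is what prevents duplicates from stacking on each write.
fn set_txxx(frames: &mut Vec<(String, String)>, description: &str, value: &str) {
    frames.retain(|(d, _)| !d.eq_ignore_ascii_case(description));
    frames.push((description.to_string(), value.to_string()));
}

/// Return the leading four-digit year of a date string such as "2024-05-17".
fn year_only(date: &str) -> Option<&str> {
    let year = date.trim().get(..4)?;
    year.bytes().all(|b| b.is_ascii_digit()).then_some(year)
}

/// Return the string only if it has the exact shape YYYY-MM-DD.
fn full_date(date: &str) -> Option<&str> {
    let d = date.trim();
    let bytes = d.as_bytes();
    let ok = bytes.len() == 10
        && bytes.iter().enumerate().all(|(i, c)| match i {
            4 | 7 => *c == b'-',
            _ => c.is_ascii_digit(),
        });
    ok.then_some(d)
}
```

With these helpers, the year frame would receive `year_only(date)` and the custom RELEASETIME and PUBLISHTIME tags would receive `full_date(date)` when it is available.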
beatsource.rs: Added clear_search_query helper to strip parentheses from search strings before hitting the v4/catalog/search endpoint. This mirrors the recent Beatport fix and prevents 400/403 API crashes when scraping tracks with bracketed metadata (e.g., mix names or featured artists) in the title
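One way such a sanitizer could look is sketched below. It assumes that "strip parentheses" means removing the bracket characters while keeping the words inside them; the name `strip_brackets` is hypothetical and this is not the PR's `clear_search_query` implementation.

```rust
/// Replace round and square brackets with spaces, then collapse whitespace.
/// Hypothetical stand-in for the query sanitizer described above.
fn strip_brackets(query: &str) -> String {
    let replaced: String = query
        .chars()
        .map(|c| if matches!(c, '(' | ')' | '[' | ']') { ' ' } else { c })
        .collect();
    // split_whitespace drops leading, trailing, and repeated spaces.
    replaced.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

For example, `strip_brackets("Forever (Extended Mix)")` yields `Forever Extended Mix`, which no longer contains characters the search endpoint might reject.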
Removed installation of nodejs and pnpm from dependencies. Added separate steps for installing NodeJS and pnpm.
Just tested this PR's CLI build against a ~3,800-track DJ library on Linux (Docker, … In context: ran 5 sequential passes over the library with … Just chiming in with real-world results in case it helps nudge the review. Thanks @rosgr100 for the fix!
… GitHub Actions workflow for build process
Updated macOS build job to create a universal binary for both Intel and Apple Silicon architectures. Adjusted steps for caching, installation, and artifact uploads.
+1 from me, Thank you @rosgr100
**The Problem:** Perfectly matched tracks were receiving ~56% accuracy scores because OneTagger's standard fuzzy matching compared long local titles like `Title (Extended Mix)` against Beatport's shorter base `Title` field.

**The Solution:**

* **Smart Fallback:** Added a secondary matching pass that only triggers if the initial score is < 80%.
* **Regex Extraction:** Safely splits the local title and mix name for independent grading.
* **Weighted Scoring:** Calculates a new accuracy score heavily weighted toward the base title (70%) but rewarding accurate mix names (30%).
* **False Positive Prevention:** Added a strict boolean check to immediately reject API tracks if their mix name directly contradicts the local file's mix name.
* **Deps:** Added `strsim` to `onetagger-platforms`.
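The fallback gate and the 70/30 weighting can be sketched like this. The similarity values are taken as inputs because the actual similarity function (the commit mentions `strsim`) is not shown here; `split_title`, `needs_fallback`, and `weighted_score` are hypothetical names, and the splitter uses plain string search rather than the regex the commit describes.

```rust
/// Threshold below which the secondary pass runs (80% per the description).
const FALLBACK_THRESHOLD: f64 = 0.80;

/// Split "Title (Mix Name)" into the base title and an optional mix name.
fn split_title(title: &str) -> (&str, Option<&str>) {
    let t = title.trim();
    if let (Some(open), true) = (t.rfind('('), t.ends_with(')')) {
        let base = t[..open].trim();
        let mix = t[open + 1..t.len() - 1].trim();
        if !base.is_empty() && !mix.is_empty() {
            return (base, Some(mix));
        }
    }
    (t, None)
}

/// The secondary pass only runs when the primary score is below the gate.
fn needs_fallback(initial_score: f64) -> bool {
    initial_score < FALLBACK_THRESHOLD
}

/// Combine independent similarities: 70% base title, 30% mix name.
fn weighted_score(title_similarity: f64, mix_similarity: f64) -> f64 {
    0.7 * title_similarity + 0.3 * mix_similarity
}
```

Under this weighting, a perfect base-title match with a completely different mix name scores 0.7, which stays below the 0.80 gate.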
…y and remix matching
This PR introduces a highly optimized, secondary fallback matching engine exclusively within beatport.rs. It is designed to resolve widespread false negatives caused by Beatport's inconsistent metadata formatting (e.g., artist reordering, alias variations, and arbitrary (Extended Mix) suffixes) without altering the primary MatchingUtils core logic. The fallback only triggers if the primary matcher returns a confidence score below 0.80, acting as a localized rescue mission for difficult DJ metadata.
Key Features & Improvements
* **Categorical Version Taxonomy (MixType):** Replaces brittle string equality with a strict enum matrix. Functionally different DJ mixes (e.g., Club Mix vs Extended Mix) are kept isolated and cannot falsely match, while tracks missing explicit tags are safely bridged (Unknown ↔ Original).
* **Jaccard (Token-Based) Similarity:** Transitions from Levenshtein distance to Jaccard set intersection for Artist arrays and Remix titles. This removes the score penalties previously caused by word reordering (e.g., "Guetta Remix" vs "Remix Guetta") and punctuation differences ("&" vs "and").
* **Remix Stopword Filtering:** Strips noise words (remix, rmx, mix, edit, vip, dub, etc.) prior to Jaccard comparison. This prevents false positives where two completely different remixers achieve a high similarity score simply because both strings contain the word "remix".
* **Deterministic Confidence Ceiling:** Enforces a strict scoring hierarchy where fuzzy Jaccard artist matches are capped at 0.9, so they can never outrank a verified exact match (1.0).
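A compact sketch of these ideas, assuming a reduced set of `MixType` variants and a short stopword list. The variant names, `compatible`, `tokens`, `jaccard`, and `artist_score` are illustrative and are not taken from beatport.rs.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MixType {
    Unknown,
    Original,
    Extended,
    Club,
}

/// Unknown and Original bridge to each other; everything else must be equal.
fn compatible(a: MixType, b: MixType) -> bool {
    use MixType::*;
    match (a, b) {
        (Unknown, Original) | (Original, Unknown) => true,
        _ => a == b,
    }
}

/// Assumed stopword list; the description ends its own list with "etc.".
const STOPWORDS: [&str; 6] = ["remix", "rmx", "mix", "edit", "vip", "dub"];

/// Lowercase, treat "&" as "and", split on non-alphanumerics, drop stopwords.
fn tokens(s: &str) -> HashSet<String> {
    s.to_lowercase()
        .replace('&', " and ")
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty() && !STOPWORDS.contains(t))
        .map(str::to_string)
        .collect()
}

/// Jaccard similarity: |A ∩ B| / |A ∪ B|, defined here as 0.0 for two empty sets.
fn jaccard(a: &str, b: &str) -> f64 {
    let (ta, tb) = (tokens(a), tokens(b));
    if ta.is_empty() && tb.is_empty() {
        return 0.0;
    }
    let shared = ta.intersection(&tb).count() as f64;
    let total = ta.union(&tb).count() as f64;
    shared / total
}

/// Exact matches score 1.0; fuzzy token matches are capped at 0.9.
fn artist_score(local: &str, remote: &str) -> f64 {
    if local.eq_ignore_ascii_case(remote) {
        1.0
    } else {
        jaccard(local, remote).min(0.9)
    }
}
```

Because token sets ignore order, "Guetta Remix" and "Remix Guetta" compare as identical, while "Alpha Remix" and "Beta Remix" share nothing once the stopword is removed.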
Performance Optimizations
* **O(N²) Prevention:** Eliminates nested vector iteration by utilizing an internal HashMap lookup during fallback score updates.
* **Regex Caching:** Migrates the primary Mix Regex to a static OnceLock so it compiles exactly once per application lifetime.
* **Closure Hoisting:** Moves allocation-heavy normalizer closures (normalize_punctuation, normalize_artists) outside the main iteration loop.
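The first two optimizations follow a common pattern, sketched here with the standard library only. A `HashSet` stands in for the compiled regex, since the regex type itself is not part of std; `stopwords`, `Candidate`, and `apply_fallback` are hypothetical names, and taking the maximum of old and new score is an assumption.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::OnceLock;

/// Build an expensive value once and reuse it for the process lifetime.
/// A compiled regex would be cached in a static OnceLock the same way.
fn stopwords() -> &'static HashSet<&'static str> {
    static SET: OnceLock<HashSet<&'static str>> = OnceLock::new();
    SET.get_or_init(|| ["remix", "rmx", "mix", "edit"].into_iter().collect())
}

struct Candidate {
    id: u64,
    accuracy: f64,
}

/// Apply rescored values by id with one map lookup per candidate,
/// instead of scanning the rescored list for every candidate.
fn apply_fallback(candidates: &mut [Candidate], rescored: &[(u64, f64)]) {
    let by_id: HashMap<u64, f64> = rescored.iter().copied().collect();
    for candidate in candidates.iter_mut() {
        if let Some(&score) = by_id.get(&candidate.id) {
            // Assumption: the fallback may raise a score but never lower it.
            candidate.accuracy = candidate.accuracy.max(score);
        }
    }
}
```

Building the map costs O(M) once, so the whole update is O(N + M) rather than O(N × M).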
Architectural Safety
Zero Core Impact: These changes are 100% fenced behind the < 0.80 fallback gate and contained entirely inside beatport.rs. The primary engine remains completely untouched, ensuring stability across other platform matchers.
…ag pagination

The Beatport module struggled to match tracks containing featured artists in the local title (e.g., "Forever Ft. Sabrina Johnston"). Beatport's search API frequently returns 0 results when featured artists are included in the search string. Even when a track was found, OneTagger's strict accuracy thresholds would fail the match because Beatport's base title didn't contain the featured artist. Furthermore, Beatport's Auto Tag fallback search was failing on valid tracks because the engine exited pagination prematurely (only checking Page 1) and didn't sort the fallback array by accuracy.

The Solution

This PR overhauls the Beatport matching logic and search sanitization so that they work without extra user configuration.

Key Changes:

* Hardcoded Search Sanitization: Baked a feature-stripping regex (`(?i)\s+(?:ft|feat|featuring)\.?\s+[^()]+`) directly into clear_search_query. The Beatport API now always receives a clean base title, preventing 0-result search failures.
* Universal Math Fallback: Added a fallback engine that strips features from both the local title and the Beatport API title during the Levenshtein calculation, so valid tracks can reach a 1.0 accuracy score even if the user has left the OneTagger title cleanup regex empty.
* Autotag Pagination Fix: Modified the search loop to respect the user's max_pages config during Auto Tag runs, rather than returning after Page 1 if a match wasn't found immediately.
* Array Sorting: Forced matched_tracks to sort by accuracy descending so Auto Tag's automated selection reliably picks the 100% match instead of an arbitrarily ordered lower-accuracy match.
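The pagination and sorting changes could take roughly the following shape. This is a sketch under assumptions: `Matched`, `search_pages`, the `threshold` parameter, and the `fetch_page` callback are hypothetical and stand in for the real search loop and its API call.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Matched {
    title: String,
    accuracy: f64,
}

/// Highest accuracy first, so an automated picker can take element 0.
fn sort_by_accuracy_desc(tracks: &mut [Matched]) {
    tracks.sort_by(|a, b| b.accuracy.total_cmp(&a.accuracy));
}

/// Walk up to `max_pages` pages, stopping early only when a page is empty
/// or an acceptable match has been collected.
fn search_pages<F>(max_pages: usize, threshold: f64, mut fetch_page: F) -> Vec<Matched>
where
    F: FnMut(usize) -> Vec<Matched>,
{
    let mut all = Vec::new();
    for page in 1..=max_pages {
        let results = fetch_page(page);
        if results.is_empty() {
            break;
        }
        all.extend(results);
        if all.iter().any(|t| t.accuracy >= threshold) {
            break;
        }
    }
    sort_by_accuracy_desc(&mut all);
    all
}
```

`f64::total_cmp` gives a total order over floats, which avoids the `partial_cmp(...).unwrap()` pattern and its panic on NaN.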
…ashes

Two edge-case bugs were causing the Beatport module to freeze or aggressively fail out of track matching:

* Token Deadlock: If the Beatport API token expired mid-session, the update_token function attempted a recursive call while still holding the Mutex guard, causing the application thread to permanently freeze.
* ISRC Hard Crash: The ISRC matching path used the ? operator on the track detail API fetch. If the API rate-limited or dropped the connection on that specific call, it hard-failed the entire matching process instead of safely falling back to the text-search engine.

The Solution

* Mutex Deadlock Fix: Explicitly added drop(token); to release the Mutex guard before the recursive update_token call, ensuring thread safety upon token expiry.
* Graceful ISRC Fallback: Replaced the ? operator in the ISRC search block with a match statement. If a track detail fetch fails, it now logs a warning and gracefully falls through to the standard text search, rather than crashing the tagger.
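The ISRC fallback pattern, replacing `?` with a `match` so an error is logged and not propagated, can be sketched as below. `Track`, `match_track`, and the two callbacks are hypothetical stand-ins for the real fetch and search functions.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Track {
    id: u64,
    title: String,
}

/// Try the ISRC lookup first; on failure, warn and fall through to text search.
fn match_track<D, S>(
    isrc: Option<&str>,
    fetch_by_isrc: D,
    text_search: S,
) -> Result<Vec<Track>, String>
where
    D: Fn(&str) -> Result<Track, String>,
    S: Fn() -> Result<Vec<Track>, String>,
{
    if let Some(code) = isrc {
        // Using `?` here would abort the whole match on a transient error.
        match fetch_by_isrc(code) {
            Ok(track) => return Ok(vec![track]),
            Err(e) => eprintln!("ISRC lookup failed ({e}); falling back to text search"),
        }
    }
    text_search()
}
```

Only a failure of the text search itself is returned to the caller as an error.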
Implemented a bounded retry loop in the update_token method to handle expired tokens more effectively by preventing infinite recursion. Added error handling for cases where the Beatport API continuously provides expired tokens.
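A bounded retry of this kind might look like the sketch below. It is an illustration only: `Client`, `Token`, `valid_token`, the `fetch` callback, and the limit of three attempts are assumptions, not the PR's update_token code.

```rust
use std::sync::Mutex;
use std::time::Instant;

struct Token {
    value: String,
    expires_at: Instant,
}

struct Client {
    token: Mutex<Option<Token>>,
}

/// Assumed retry limit; the PR does not state the actual bound.
const MAX_ATTEMPTS: usize = 3;

impl Client {
    /// Return a usable token, fetching a new one at most MAX_ATTEMPTS times.
    fn valid_token<F>(&self, mut fetch: F) -> Result<String, String>
    where
        F: FnMut() -> Result<Token, String>,
    {
        {
            let guard = self.token.lock().map_err(|e| e.to_string())?;
            if let Some(t) = guard.as_ref() {
                if t.expires_at > Instant::now() {
                    return Ok(t.value.clone());
                }
            }
        } // Guard is released here, before any fetch or re-lock.

        for attempt in 1..=MAX_ATTEMPTS {
            let fresh = fetch()?;
            if fresh.expires_at > Instant::now() {
                let value = fresh.value.clone();
                *self.token.lock().map_err(|e| e.to_string())? = Some(fresh);
                return Ok(value);
            }
            eprintln!("attempt {attempt}: received an already expired token");
        }
        Err("API kept returning expired tokens".to_string())
    }
}
```

A loop with a fixed bound replaces recursion, and scoping the guard in its own block means the mutex is never held across a fetch or a second lock.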
This PR introduces comprehensive stability improvements, dependency modernizations, and metadata formatting fixes across the core tagging engine, platform scrapers, and the CI/CD pipeline.
1. Dependency Modernization & Compatibility
2. Beatport and Beatsource API Fix (`beatport.rs`) (`beatsource.rs`)

The Issue: The Beatsource search API broke due to a frontend update that removed the `<script id="__NEXT_DATA__">` tag, which the old web scraper relied on to extract search results.

The Fix: HTML scraping logic and scraper dependencies were entirely removed. The code now securely fetches a client credentials OAuth token from account.beatport.com and directly queries the official v4 catalog API (`api.beatsource.com/v4/catalog/search`). This returns the exact same data natively without relying on fragile DOM parsing.

Fixes #518 #520

3. Core ID3 & Tagging Engine Fixes (`id3.rs`, `lib.rs` [tag], `lib.rs` [autotagger])

* Extended text (`TXXX`) frames (e.g., `UNIQUEFILEID`, `WWWAUDIOFILE`) would stack and duplicate infinitely upon overwriting. The ID3 writer now explicitly clears existing extended text frames before appending new ones.
* The `YEAR` frame is now strictly restricted to 4 digits (`YYYY`), while the full `YYYY-MM-DD` strings are injected into custom `RELEASETIME` and `PUBLISHTIME` tags.

4. API Decoding Stability (`beatport.rs`, `beatsource.rs`)

* Added a `clear_search_query` helper to strip parentheses from search strings before hitting the `v4/catalog/search` endpoints. This prevents `400`/`403` errors when scraping tracks with bracketed metadata (like mixes or features) in the title.

5. Workflow & CI/CD Updates

* Added `workflow_dispatch` to the `on:` block in `.github/workflows/build.yml` to allow the build process to be triggered manually from the GitHub Actions tab.