Audit cleanDiscogsBio: iOS and dj-site parse Discogs markup, so backend strip is lossy #1360

@jakebromberg

Context

Discovered while triaging #1354 (closed as not-a-bug). The cleanDiscogsBio helper at shared/metadata/src/helpers/clean-discogs-bio.ts strips Discogs markup tokens ([a=Name], [l=Name], [r=…], [m=…], [url=…][/url]) from artist_bio before persisting. It is called from every artist_bio writer:

  • apps/backend/services/metadata/metadata.service.ts:142
  • apps/enrichment-worker/enrich.ts:156 and :191
  • jobs/flowsheet-metadata-backfill/enrich.ts:182
  • jobs/flowsheet-artwork-repair/repair.ts:153
  • jobs/album-level-backfill/job.ts:254
  • shared/metadata/src/normalize-lookup.ts:97

The premise of the helper is "consumers render raw text, so strip the markup or it shows as literal tokens." That premise doesn't hold for the two known consumers.
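To make the loss concrete, here is a minimal sketch of a write-time strip. It is not the actual cleanDiscogsBio implementation: the function name stripDiscogsMarkup is hypothetical, and the choices to keep the display text of [a=Name], [l=Name], and [url=…][/url] while dropping [r=…] and [m=…] entirely are assumptions made for illustration.

```typescript
// Hypothetical stand-in for cleanDiscogsBio; the real helper may behave differently.
function stripDiscogsMarkup(bio: string): string {
  return bio
    // [url=...]label[/url] -> label (the link target is discarded)
    .replace(/\[url=[^\]]*\]([\s\S]*?)\[\/url\]/g, "$1")
    // [a=Name] and [l=Name] -> Name (the entity kind is discarded)
    .replace(/\[[al]=([^\]]+)\]/g, "$1")
    // [r=...] and [m=...] -> removed (assumption: no display text to keep)
    .replace(/\[[rm]=[^\]]*\]/g, "")
    // Collapse runs of spaces left behind by removed tokens.
    .replace(/ {2,}/g, " ")
    .trim();
}

const raw = "Member of [a=Stereolab]. See [url=https://example.com]site[/url].";
const stripped = stripDiscogsMarkup(raw);
// stripped is "Member of Stereolab. See site."
```

Once the stripped string is persisted, nothing in it says that "Stereolab" was an artist reference or that "site" had a link target, so a markup-aware client has nothing left to parse.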

Consumer evidence

iOS (wxyc-ios-64, Shared/Metadata/Sources/Metadata/DiscogsMarkupParser.swift):

  • Recognizes all four numeric-id prefixes ([a12345], [l…], [r…], [m…]), the named-equals form ([a=Name], [l=Name]), [url=…][/url], and [b]/[i]/[u].
  • ArtistBioSection.swift:51-60 calls the async resolver path: [a12345] resolves to a tappable Discogs artist link via DiscogsAPIEntityResolver.shared.
  • Sync fallback drops unresolved IDs at DiscogsMarkupParser.swift:354 (return nil // Skip unresolved). The literal [a12345] is never rendered, in either path.
  • When the backend serves bioTokens (V1 proxy at apps/backend/controllers/proxy.controller.ts:221,504), iOS renders pre-parsed tokens directly with no parsing needed.
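The parsing behavior described above can be sketched as a small tokenizer. This is a TypeScript illustration, not a port of DiscogsMarkupParser.swift: the BioToken shape, tokenizeBio, and renderPlain are hypothetical names, [b]/[i]/[u] handling is omitted for brevity, and the lookup table stands in for whatever the real resolver returns.

```typescript
type EntityKind = "a" | "l" | "r" | "m";

// Hypothetical token shape; the real bioTokens format may differ.
type BioToken =
  | { kind: "text"; text: string }
  | { kind: "entityId"; entity: EntityKind; id: number }
  | { kind: "entityName"; entity: EntityKind; name: string }
  | { kind: "url"; href: string; label: string };

function tokenizeBio(bio: string): BioToken[] {
  // Alternatives: [a12345] | [a=Name] | [url=href]label[/url]
  const re = /\[([alrm])(\d+)\]|\[([al])=([^\]]+)\]|\[url=([^\]]*)\]([\s\S]*?)\[\/url\]/g;
  const tokens: BioToken[] = [];
  let cursor = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(bio)) !== null) {
    const [whole, idKind, id, nameKind, name, href, label] = m;
    // Emit any plain text between the previous token and this one.
    if (m.index > cursor) {
      tokens.push({ kind: "text", text: bio.slice(cursor, m.index) });
    }
    if (idKind !== undefined) {
      tokens.push({ kind: "entityId", entity: idKind as EntityKind, id: Number(id) });
    } else if (nameKind !== undefined) {
      tokens.push({ kind: "entityName", entity: nameKind as EntityKind, name: name ?? "" });
    } else {
      tokens.push({ kind: "url", href: href ?? "", label: label ?? "" });
    }
    cursor = m.index + whole.length;
  }
  if (cursor < bio.length) {
    tokens.push({ kind: "text", text: bio.slice(cursor) });
  }
  return tokens;
}

// Synchronous fallback: IDs with no known name are skipped, so a literal
// "[a12345]" never reaches the rendered string.
function renderPlain(tokens: BioToken[], names: Record<number, string>): string {
  return tokens
    .map((t) => {
      switch (t.kind) {
        case "text":
          return t.text;
        case "entityName":
          return t.name;
        case "url":
          return t.label;
        case "entityId": {
          const resolved = names[t.id];
          return resolved === undefined ? "" : resolved;
        }
      }
    })
    .join("");
}

const bioTokensExample = tokenizeBio(
  "With [a12345] and [a=Low] at [url=https://example.com]home[/url]!"
);
const resolvedText = renderPlain(bioTokensExample, { 12345: "Yo La Tengo" });
// resolvedText is "With Yo La Tengo and Low at home!"
```

The point of the sketch is that every branch needs the original brackets to work from; a bio that was stripped at write time would tokenize to a single text token.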

dj-site (src/components/experiences/modern/Rightbar/panels/album/AlbumCard.tsx:143):

  • Prefers <DiscogsMarkup tokens={bioTokens} /> when available; falls back to the raw artistBio string. The fallback is the only consumer that benefits from the strip — and only as a cosmetic cleanup for a rare path.
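The preference order described above can be sketched without React. pickBioSource and its types are hypothetical names, and treating an empty token array as "not available" is an assumption; the actual AlbumCard.tsx condition may differ.

```typescript
// Hypothetical props subset; the real AlbumCard props may carry more fields.
interface AlbumBioProps {
  bioTokens?: ReadonlyArray<unknown> | null;
  artistBio?: string | null;
}

type BioSource =
  | { mode: "tokens"; tokens: ReadonlyArray<unknown> }
  | { mode: "raw"; text: string }
  | { mode: "none" };

function pickBioSource(props: AlbumBioProps): BioSource {
  // Primary path: pre-parsed tokens, rendered by a token-aware component.
  if (props.bioTokens && props.bioTokens.length > 0) {
    return { mode: "tokens", tokens: props.bioTokens };
  }
  // Fallback path: raw string, the only place a write-time strip is visible.
  if (props.artistBio) {
    return { mode: "raw", text: props.artistBio };
  }
  return { mode: "none" };
}

const primary = pickBioSource({ bioTokens: [{ kind: "text" }], artistBio: "ignored" });
const fallback = pickBioSource({ bioTokens: null, artistBio: "Member of Stereolab." });
// primary.mode is "tokens"; fallback.mode is "raw"
```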

Cost of the current strip

On the iOS V2 flowsheet path (no bioTokens served, raw artist_bio only), every strip destroys link information that the iOS parser is fully equipped to consume.

V2 flowsheet doesn't serve bioTokens yet

apps/backend/controllers/proxy.controller.ts:221,504 populates bioTokens from artwork.profile_tokens on the iOS V1 proxy path. V2 flowsheet (/v2/flowsheet) serves artist_bio only — no bioTokens. V2 clients therefore pay the strip's cost with no offsetting benefit.

Remediation options

  1. Serve bioTokens everywhere artist_bio is served. Extend the V2 flowsheet DTO with the pre-parsed token array LML already produces (profile_tokens). Stop calling cleanDiscogsBio at write time; let clients render tokens directly. Needs a @wxyc/shared DTO change and a coordinated iOS + dj-site consumer migration. Long-term right answer.
  2. Stop stripping, accept text-fallback loss. Parser-equipped clients get full fidelity; raw-text fallback paths (dj-site AlbumCard else branch, any internal admin view) show literal markup. Minor cosmetic regression for fallbacks in exchange for restored link info on primary paths.
  3. Keep the strip, document the trade-off. Status quo. Add a doc-comment to cleanDiscogsBio noting it's a fallback-friendly text representation and that primary clients should consume bioTokens instead.
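As a rough sketch of what option 1 could look like on the read path, the mapper below adds an optional bioTokens field next to artist_bio. Everything here is an assumption for illustration: FlowsheetRow, FlowsheetV2Entry, and toV2Entry are hypothetical names, the token type is left opaque, and the real @wxyc/shared DTO and the place where profile_tokens is persisted may look different.

```typescript
// Opaque token type; the real shape is whatever LML produces in profile_tokens.
type ProfileToken = Record<string, unknown>;

// Hypothetical persisted row, with the bio stored unstripped.
interface FlowsheetRow {
  artist_bio: string | null;
  profile_tokens: ProfileToken[] | null;
}

// Hypothetical V2 DTO. bioTokens is optional, so the change is additive and
// existing clients that only read artist_bio are unaffected.
interface FlowsheetV2Entry {
  artist_bio: string | null;
  bioTokens?: ProfileToken[];
}

function toV2Entry(row: FlowsheetRow): FlowsheetV2Entry {
  const entry: FlowsheetV2Entry = { artist_bio: row.artist_bio };
  // Only attach tokens when there is something to render.
  if (row.profile_tokens !== null && row.profile_tokens.length > 0) {
    entry.bioTokens = row.profile_tokens;
  }
  return entry;
}

const withTokens = toV2Entry({
  artist_bio: "Member of [a=Stereolab].",
  profile_tokens: [{ kind: "text", text: "Member of " }],
});
const withoutTokens = toV2Entry({ artist_bio: "Plain bio.", profile_tokens: null });
// withTokens.bioTokens has one element; withoutTokens has no bioTokens key
```

Keeping the field optional lets the migration be staged: the backend can ship first, and iOS and dj-site can switch to the token path on their own schedules.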

Scope of audit

  • Enumerate every consumer of artist_bio:
    • iOS V1 proxy (bioTokens served)
    • iOS V2 flowsheet (artist_bio only)
    • dj-site rightbar AlbumCard (bioTokens preferred, artistBio fallback)
    • Any other dj-site panel that reads artistBio
    • tubafrenzy mirror writes — does the mirror payload include the bio?
    • Internal/admin endpoints
  • For each, determine: parses markup, renders raw text, or uses bioTokens.
  • Decide between options 1 / 2 / 3.
  • If 1: extend @wxyc/shared DTO with bioTokens on the V2 flowsheet shape; propagate from flowsheet.profile_tokens (or wherever LML's tokens are persisted) through to the read path.

Out of scope

Metadata

Labels

  • concern:contract-change: Changes a published API contract
  • enhancement: New feature or request
  • lml: Touches library-metadata-lookup
