[FEATURE] Derive real release ages for synthetic torrents #7

@Zzackllack

Summary

  • Sonarr/Prowlarr currently see every AniBridge result as 0-days old because we stamp Torznab items with datetime.now() when the response is rendered.
  • Parse the “Veröffentlicht bei uns” timestamp published below each AniWorld episode player, persist it in our cache, and emit it as the RSS pubDate (and qBittorrent added_on) so downstream automation can prioritise fresh releases correctly.

Context & Findings

  • Torznab feed hard-codes now: _build_item is always called with pubdate=now (app/api/torznab/api.py:85-175, 285-400), so Sonarr/Prowlarr interpret every hit as brand new. There is no attempt to read AniWorld’s release metadata.

  • AniWorld UI exposes release time: Episode pages render a banner under the player such as:

    <div style="text-align: center; color: white; font-size: 14px; padding: 12px 0; ...">
      Veröffentlicht bei uns: <strong>Freitag, 29.08.2025 18:46</strong> Uhr
    </div>

    Example: https://aniworld.to/anime/stream/one-punch-man/staffel-3/episode-1

  • Library already fetches the HTML: aniworld.models.Episode downloads and caches the episode page when we call get_direct_link (.venv/lib/python3.13/site-packages/aniworld/models.py:675-790), so the markup is available without an extra HTTP request if we hook in before the response is discarded.

  • No field to store the timestamp: Our availability cache (EpisodeAvailability.extra, app/db/models.py:83-106) is empty today; we could reuse it to persist release_at per slug/season/episode/language. Client tasks (app/db/models.py:107-128) default added_on to utcnow(), which also leads to zero-age torrents in the qBittorrent shim.

  • Operational risk: AniWorld’s ToS bans automated scraping. We should minimise redundant requests, respect rate limits, and document the legal risk surfaced in LEGAL.md §5. Potential mitigation: piggyback on existing downloads instead of issuing an extra GET solely for the release date.
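
The banner shown above can be extracted from already-fetched HTML without an HTML parser. The following is a minimal sketch, assuming the markup always matches the quoted example (text, `<strong>` wrapper, numeric `dd.mm.yyyy HH:MM`). The name `parse_release_banner` and the regular expression are hypothetical and not part of AniBridge or the aniworld library; a BeautifulSoup selector, as suggested in this issue, would work just as well.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical pattern based only on the example banner quoted in this issue.
# The weekday name ("Freitag") is skipped; the numeric date and time are captured.
_RELEASE_RE = re.compile(
    r"Veröffentlicht bei uns:\s*<strong>\s*[^,<]*,\s*"
    r"(\d{2}\.\d{2}\.\d{4})\s+(\d{2}:\d{2})\s*</strong>"
)


def parse_release_banner(html: str) -> Optional[datetime]:
    """Return the banner time as a naive (local) datetime, or None.

    None covers both a missing banner and an impossible date, so callers
    can fall back to the current behaviour.
    """
    match = _RELEASE_RE.search(html)
    if match is None:
        return None
    try:
        return datetime.strptime(
            f"{match.group(1)} {match.group(2)}", "%d.%m.%Y %H:%M"
        )
    except ValueError:
        # The digits matched the pattern but do not form a real date.
        return None


sample = (
    '<div style="text-align: center;">'
    "Veröffentlicht bei uns: <strong>Freitag, 29.08.2025 18:46</strong> Uhr"
    "</div>"
)
print(parse_release_banner(sample))
```

The result is deliberately naive: which timezone the banner uses is still an open question, so attaching one is left to a separate step.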

Proposed Changes

  1. Parse & normalise release timestamp
    • Extend our downloader/availability probe pipeline to extract the “Veröffentlicht bei uns” text from the episode HTML (likely a simple BeautifulSoup selector on the cached Episode.html).
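    • Convert the (German) date string to a timezone-aware datetime. The sample banner consists of a weekday name, a numeric dd.mm.yyyy date, and a local time, so the weekday can be skipped; locale-aware parsing or a manual map of month names would only be needed if some pages spell the month out.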
  2. Persist per-episode metadata
    • Store the parsed timestamp in EpisodeAvailability.extra (e.g., { "release_at": iso8601 }) when we probe availability, and surface it through helper functions so both Torznab and qBittorrent layers can reuse it without re-fetching the page.
    • Add defensive logic when the banner is missing or malformed (fallback to current behaviour).
  3. Emit age-aware feed data
    • Update Torznab _build_item callers to pass pubdate=release_at when available; keep now as the fallback.
    • Mirror the value in the fake torrent payloads (ClientTask.added_on, qBittorrent added_on field) so Sonarr’s activity view matches the RSS age.
  4. Cache invalidation & data refresh
    • Ensure the cached timestamp respects our availability TTL (re-parse when the episode page is re-fetched) and optionally expose it via API/debug logs for observability.
  5. Documentation & guardrails
    • Document the new scraping behaviour, reference AniWorld’s ToS warning, and mention rate-limit/backoff expectations in the troubleshooting section. Consider adding a feature flag to disable the scraping if operators prefer the current synthetic age.
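
Steps 2 and 3 could look roughly like the sketch below: read `release_at` from the cached `extra` dict, fall back to `now` when it is missing or malformed, and format the result for both consumers. The helper names are hypothetical, the ISO 8601 storage format follows the `{ "release_at": iso8601 }` suggestion above, and emitting `added_on` as epoch seconds is an assumption about how the qBittorrent shim serialises the field.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Optional


def resolve_release_at(extra: Optional[dict], now: datetime) -> datetime:
    """Return the stored release time in UTC, or `now` as the fallback."""
    raw = (extra or {}).get("release_at")
    if not isinstance(raw, str):
        return now
    try:
        parsed = datetime.fromisoformat(raw)
    except ValueError:
        return now
    if parsed.tzinfo is None:
        # A naive value is treated as malformed rather than guessed at.
        return now
    return parsed.astimezone(timezone.utc)


def to_pubdate(dt: datetime) -> str:
    """Format an aware datetime as an RFC 2822 date for the RSS pubDate."""
    return format_datetime(dt)


def to_added_on(dt: datetime) -> int:
    """Convert an aware datetime to epoch seconds (assumed added_on format)."""
    return int(dt.timestamp())


now = datetime.now(timezone.utc)
release = resolve_release_at({"release_at": "2025-08-29T16:46:00+00:00"}, now)
print(to_pubdate(release))  # Fri, 29 Aug 2025 16:46:00 +0000
print(to_added_on(release))
```

Keeping the fallback inside one helper means Torznab and the qBittorrent layer cannot disagree about an episode's age.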

Testing Ideas

  • Unit tests for the HTML parser that feed captured AniWorld snippets (with and without the banner) and verify we obtain the correct UTC datetime.
  • Integration tests that run probe_episode_quality with a mocked Episode HTML and confirm EpisodeAvailability.extra["release_at"] is populated and reused by Torznab.
  • Torznab functional test asserting that pubDate reflects the captured timestamp and that the resulting RSS age matches the expected value.
  • qBittorrent sync test ensuring added_on is set to the release timestamp when available, and that the previous behaviour remains intact if parsing fails.

Open Questions

  • AniWorld sometimes removes older episodes or adjusts timestamps. Do we need to validate the stored value each time we refresh availability, or can we assume it is immutable after the first scrape?
  • How should we handle timezone/localisation? The banner appears to use Central European time; should we treat it as Europe/Berlin and convert to UTC, or surface it as-is?
  • Is the release date shown per language/provider or global? If language-specific, do we need to track separate timestamps per language variant?
  • Should we apply the same mechanism to other catalogues (e.g., future s.to support; see "Add s.to catalogue support alongside AniWorld", #6) to keep behaviour consistent across sources?
  • Do we need rate limiting/backoff logic around the initial HTML fetch to avoid triggering anti-bot systems, or is the existing library behaviour sufficient?
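
If the timezone question is settled in favour of Europe/Berlin, converting a naive datetime parsed from the banner to UTC needs only the standard library. This is a sketch under that assumption; `ASSUMED_SOURCE_TZ` and `banner_local_to_utc` are hypothetical names, and `zoneinfo` needs a timezone database on the host (the `tzdata` package on systems without one).

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: the banner shows German local time. This is an open question
# in this issue, not a confirmed fact about AniWorld.
ASSUMED_SOURCE_TZ = ZoneInfo("Europe/Berlin")


def banner_local_to_utc(naive_local: datetime) -> datetime:
    """Interpret a naive banner time as Europe/Berlin and convert it to UTC."""
    if naive_local.tzinfo is not None:
        raise ValueError("expected a naive datetime")
    return naive_local.replace(tzinfo=ASSUMED_SOURCE_TZ).astimezone(timezone.utc)


# The example banner falls in summer time (UTC+2).
print(banner_local_to_utc(datetime(2025, 8, 29, 18, 46)).isoformat())
# 2025-08-29T16:46:00+00:00
```

During the autumn clock change one local hour occurs twice; `replace(tzinfo=...)` picks the first occurrence by default, which is at most one hour off and probably acceptable for age calculations.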

Metadata

Labels

documentation (Improvements or additions to documentation), enhancement (New feature or request), stale (No activity for ~90 days)
