Feat/binary endpoint #349

Open
d7mtech wants to merge 4 commits into ThePhaseless:main from d7mtech:feat/binary-endpoint

d7mtech commented May 5, 2026

No description provided.

d7mtech and others added 4 commits May 6, 2026 00:45
…ssion

Adds GET /binary?url=... that fetches binary content (images, PDFs)
through the same Camoufox browser context that solves the CF challenge.

Strategy:
1. Try a direct subresource fetch via context.request.get(url) - uses
   the browser's TLS/H2 fingerprint, UA, and cookie jar.
2. If that returns a CF challenge (status >= 400 or HTML interstitial),
   navigate to the origin root, solve the challenge, then retry.
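
The two-step strategy above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `Fetched`, `looks_like_challenge`, `fetch_binary`, and the injected `direct_fetch` / `warm_up` callables are hypothetical names, and treating any `text/html` response as an interstitial is an assumption.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlsplit


@dataclass
class Fetched:
    status: int
    headers: dict[str, str]  # keys assumed lower-cased
    body: bytes


def looks_like_challenge(resp: Fetched) -> bool:
    # Heuristic from the commit message: status >= 400 or an HTML interstitial.
    # Assumption: an HTML content type on a binary URL means an interstitial.
    content_type = resp.headers.get("content-type", "").lower()
    return resp.status >= 400 or content_type.startswith("text/html")


def fetch_binary(
    url: str,
    direct_fetch: Callable[[str], Fetched],
    warm_up: Callable[[str], None],
) -> Fetched:
    # Step 1: direct subresource fetch through the browser context.
    first = direct_fetch(url)
    if not looks_like_challenge(first):
        return first
    # Step 2: navigate to the origin root so the challenge gets solved,
    # then retry the same URL with the refreshed cookie jar.
    parts = urlsplit(url)
    warm_up(f"{parts.scheme}://{parts.netloc}/")
    return direct_fetch(url)
```

In the real endpoint, `direct_fetch` would presumably wrap `context.request.get(url)` and `warm_up` would drive a page navigation; both are left as plain callables here so the control flow can be read on its own.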

Bytes are streamed back with the upstream Content-Type. cf_clearance
never leaves the browser process, which avoids the cookie-portability
problem that breaks out-of-browser HTTP clients.

Tested against httpbin PNG fixture (unit) and olympustaff.com (e2e).

Upstream uses ubuntu:latest which currently resolves to ubuntu:26.04
(resolute), and Playwright 1.58 does not yet ship dependency lists for
that release. The 'install-deps firefox' step warns 'Cannot install
dependencies for ubuntu26.04-x64' and no-ops, then Camoufox fails at
launch with libgtk-3.so.0 not found.

Fix:
1. Pin base image to ubuntu:24.04 (LTS, supported by Playwright 1.58).
2. Belt-and-suspenders: install the Firefox runtime libs explicitly so
   future Ubuntu bumps don't silently break us.

Also add image-fetcher.mjs reference: a Node sidecar that fronts byparr
/binary for the unimanga import pipeline. Documented in this repo for
ops convenience; not part of the byparr container build.

Adds a length-mismatch guard inside _do_fetch(): when the upstream
response carries a Content-Length header, the buffered body is
verified to match. If it doesn't, we treat the response as transient
(sentinel status -1) and retry once at the call site before escalating
to 502.
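
A minimal sketch of the guard and the single retry described above, assuming header keys are already lower-cased and that the fetch is exposed as a zero-argument callable returning `(status, headers, body)`. The names `check_length`, `guarded_fetch`, and `fetch_with_retry` are hypothetical; only the `-1` sentinel and the 502 escalation come from the commit message.

```python
from typing import Callable

TRUNCATED = -1  # sentinel status named in the commit message

Response = tuple[int, dict[str, str], bytes]


def check_length(status: int, headers: dict[str, str], body: bytes) -> int:
    # Only verify when Content-Length is present; chunked or unknown-length
    # responses pass through unchanged.
    declared = headers.get("content-length")
    if declared is None:
        return status
    try:
        expected = int(declared)
    except ValueError:
        return status  # assumption: an unparsable header is ignored
    return status if expected == len(body) else TRUNCATED


def guarded_fetch(raw_fetch: Callable[[], Response]) -> Response:
    status, headers, body = raw_fetch()
    return check_length(status, headers, body), headers, body


def fetch_with_retry(raw_fetch: Callable[[], Response]) -> Response:
    result = guarded_fetch(raw_fetch)
    if result[0] == TRUNCATED:
        result = guarded_fetch(raw_fetch)  # retry once at the call site
    if result[0] == TRUNCATED:
        return 502, {}, b""  # still short: escalate to 502
    return result
```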

Why: Playwright's APIRequestContext.body() can return early when
the upstream connection drops before the full body is delivered —
notably observed with large JPEGs from olympustaff over flaky CDN
edges. Sharp downstream then crashes with 'VipsJpeg: Premature end
of input file' on >50% of chapter pages, and the importer marks
tiles FAILED.

The check only triggers when Content-Length is present, so chunked
or unknown-length responses pass through unchanged.

Both Phase A (direct) and Phase B (post-warmup) paths get the same
truncation-retry behaviour for symmetry. Header dict is now always
lowercased inside _do_fetch so the downstream content-type lookup
no longer needs the dual-case fallback.
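
For illustration, normalising header names once removes the need to probe both `Content-Type` and `content-type` later. The helper names and the `application/octet-stream` default below are assumptions, not taken from the PR.

```python
def lowercase_headers(headers: dict[str, str]) -> dict[str, str]:
    # HTTP header names are case-insensitive, so fold them once up front.
    return {name.lower(): value for name, value in headers.items()}


def content_type_of(headers: dict[str, str]) -> str:
    # Single lookup; expects headers that went through lowercase_headers().
    # The fallback value is an assumption for this sketch.
    return headers.get("content-type", "application/octet-stream")
```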

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…d response

Root cause: undici (Node 24's fetch) auto-sends an Accept-Encoding
header, uvicorn honors it and returns a gzip-compressed response
with Content-Length set to the GZIP size (e.g. 562347B for an
image whose decompressed size is 563579B). undici transparently
decompresses while streaming, so upstream.body.getReader() yields
the larger decompressed body — but image-fetcher was forwarding
the smaller upstream Content-Length downstream. The client then
truncated the response at the gzip-size boundary, dropping the
final ~1KB of the JPEG (including the FF D9 EOI marker). Sharp
crashed with VipsJpeg: Premature end of input file on >50% of
chapter pages, marking tiles FAILED.

Two-pronged fix:
1. Send 'Accept-Encoding: identity' to byparr so it never
   compresses on this hop. Removes the mismatch entirely.
2. Defensively only forward Content-Length when content-encoding
   is absent or 'identity'. If a future intermediate re-introduces
   compression, we fall through to chunked transfer instead of
   sending a wrong length.
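
The forwarding rule in the two-pronged fix can be sketched as below. image-fetcher.mjs itself is a Node script; this Python version only illustrates the decision logic, and `REQUEST_HEADERS` and `downstream_headers` are hypothetical names. With the sizes quoted above, forwarding the gzip length would cut 563579 - 562347 = 1232 bytes from the end of the image.

```python
# Prong 1: ask the upstream hop not to compress at all.
REQUEST_HEADERS = {"Accept-Encoding": "identity"}


def downstream_headers(upstream_headers: dict[str, str]) -> dict[str, str]:
    up = {name.lower(): value for name, value in upstream_headers.items()}
    out: dict[str, str] = {}
    if "content-type" in up:
        out["content-type"] = up["content-type"]
    # Prong 2: Content-Length describes the encoded body, so forward it only
    # when the body was not compressed. Otherwise omit it and let the
    # response fall back to chunked transfer.
    encoding = up.get("content-encoding", "").strip().lower()
    if encoding in ("", "identity") and "content-length" in up:
        out["content-length"] = up["content-length"]
    return out
```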

Verified end-to-end:
- Local Contabo 8192/fetch: 563579B, JPEG ends FF D9 (was 562347B)
- Public tunnel flaresolverr.unimanga.net/fetch: 563579B, EOI present
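
A check like the one used for the verification above might look as follows. This is a strict sketch with a hypothetical function name: it assumes the file ends exactly at the EOI marker, so a JPEG with trailing bytes after FF D9 would be reported as incomplete.

```python
JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker


def jpeg_looks_complete(data: bytes) -> bool:
    # A truncated download usually loses the tail, and with it the EOI marker.
    return data.startswith(JPEG_SOI) and data.endswith(JPEG_EOI)
```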

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>