Feat/binary endpoint #349
Open
d7mtech wants to merge 4 commits into
…ssion

Adds GET /binary?url=..., which fetches binary content (images, PDFs) through the same Camoufox browser context that solves the CF challenge.

Strategy:
1. Try a direct subresource fetch via context.request.get(url). This uses the browser's TLS/H2 fingerprint, UA, and cookie jar.
2. If that returns a CF challenge (status >= 400 or an HTML interstitial), navigate to the origin root, solve the challenge, then retry.

Bytes are streamed back with the upstream Content-Type. cf_clearance never leaves the browser process, which avoids the cookie-portability problem that breaks out-of-browser HTTP clients.

Tested against an httpbin PNG fixture (unit) and olympustaff.com (e2e).
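The two-phase strategy above can be sketched roughly as follows. This is a minimal, hypothetical sketch rather than the byparr code: FetchResult, looks_like_cf_challenge, fetch_binary and origin_of are illustrative names, and the Playwright calls are hidden behind two injected callables (direct_fetch standing in for context.request.get(url), solve_challenge standing in for navigating to the origin root and solving the challenge) so the control flow runs without a browser.

```python
from dataclasses import dataclass
from typing import Callable, Dict
from urllib.parse import urlsplit


@dataclass
class FetchResult:
    status: int
    headers: Dict[str, str]  # assumption: header names already lowercased
    body: bytes


def origin_of(url: str) -> str:
    """Return the origin root (scheme://host[:port]/) that gets warmed up."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def looks_like_cf_challenge(result: FetchResult) -> bool:
    """Heuristic from the commit message: status >= 400 or an HTML interstitial."""
    if result.status >= 400:
        return True
    content_type = result.headers.get("content-type", "").lower()
    # A binary URL answering with HTML is most likely the challenge page.
    return content_type.startswith("text/html")


def fetch_binary(
    url: str,
    direct_fetch: Callable[[str], FetchResult],
    solve_challenge: Callable[[str], None],
) -> FetchResult:
    # Phase A: direct subresource fetch; in byparr this would go through the
    # browser context (TLS/H2 fingerprint, UA, cookie jar).
    result = direct_fetch(url)
    if not looks_like_cf_challenge(result):
        return result
    # Phase B: warm up the origin so cf_clearance lands in the browser's own
    # cookie jar, then retry the same subresource once.
    solve_challenge(origin_of(url))
    return direct_fetch(url)
```

Injecting the fetch and solve steps keeps the challenge heuristic testable on its own; in a real handler both callables would presumably close over the shared Camoufox context.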
Upstream uses ubuntu:latest, which currently resolves to ubuntu:26.04 (resolute), and Playwright 1.58 does not yet ship dependency lists for that release. The 'install-deps firefox' step warns 'Cannot install dependencies for ubuntu26.04-x64' and no-ops, and Camoufox then fails at launch with libgtk-3.so.0 not found.

Fix:
1. Pin the base image to ubuntu:24.04 (LTS, supported by Playwright 1.58).
2. Belt-and-suspenders: install the Firefox runtime libs explicitly so future Ubuntu bumps don't silently break us.

Also adds image-fetcher.mjs as a reference: a Node sidecar that fronts byparr /binary for the unimanga import pipeline. It is documented in this repo for ops convenience and is not part of the byparr container build.
Adds a length-mismatch guard inside _do_fetch(): when the upstream response carries a Content-Length header, the buffered body length is checked against it. If the two differ, the response is treated as transient (sentinel status -1) and retried once at the call site before escalating to 502.

Why: Playwright's APIResponse.body() can return early when the upstream connection drops before the full body is delivered, notably with large JPEGs from olympustaff over flaky CDN edges. Sharp downstream then crashes with 'VipsJpeg: Premature end of input file' on >50% of chapter pages, and the importer marks tiles FAILED.

The check only triggers when Content-Length is present, so chunked or unknown-length responses pass through unchanged. Both the Phase A (direct) and Phase B (post-warmup) paths get the same truncation-retry behaviour for symmetry.

The header dict is now always lowercased inside _do_fetch, so the downstream content-type lookup no longer needs the dual-case fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
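A minimal sketch of the guard and its single retry, assuming a raw fetch callable that returns (status, headers, body). guarded_fetch and fetch_with_retry are hypothetical names; only the sentinel -1, the Content-Length comparison, the lowercasing, and the retry-once-then-502 policy come from the commit message.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], bytes]
RawFetch = Callable[[], Response]  # hypothetical stand-in for the Playwright call

TRUNCATED = -1  # sentinel status meaning "transient, safe to retry"


def guarded_fetch(raw_fetch: RawFetch) -> Response:
    status, headers, body = raw_fetch()
    # Always lowercase header names so later lookups need no dual-case fallback.
    headers = {k.lower(): v for k, v in headers.items()}
    declared = headers.get("content-length", "").strip()
    # Only check when a numeric Content-Length is present; chunked or
    # unknown-length responses pass through unchanged.
    if declared.isdigit() and int(declared) != len(body):
        return TRUNCATED, headers, body
    return status, headers, body


def fetch_with_retry(raw_fetch: RawFetch) -> Response:
    result = guarded_fetch(raw_fetch)
    if result[0] == TRUNCATED:
        result = guarded_fetch(raw_fetch)  # retry once
    if result[0] == TRUNCATED:
        return 502, {}, b""  # still short: escalate to Bad Gateway
    return result
```

One case worth double-checking: if the upstream ever compresses the body and the client decodes it before this check, Content-Length describes the encoded size and the guard would flag every such response as truncated. That is the same class of mismatch the next commit addresses on the Node side.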
…d response

Root cause: undici (Node 24's fetch) automatically sends an Accept-Encoding header; uvicorn honors it and returns a gzip-compressed response with Content-Length set to the gzip size (e.g. 562347 B for an image whose decompressed size is 563579 B). undici transparently decompresses while streaming, so upstream.body.getReader() yields the larger decompressed body, but image-fetcher was forwarding the smaller upstream Content-Length downstream. The client then truncated the response at the gzip-size boundary, dropping the final ~1 KB (1232 B) of the JPEG, including the FF D9 EOI marker. Sharp crashed with 'VipsJpeg: Premature end of input file' on >50% of chapter pages, marking tiles FAILED.

Two-pronged fix:
1. Send 'Accept-Encoding: identity' to byparr so it never compresses on this hop. This removes the mismatch entirely.
2. Defensively forward Content-Length only when content-encoding is absent or 'identity'. If a future intermediate re-introduces compression, we fall through to chunked transfer instead of sending a wrong length.

Verified end-to-end:
- Local Contabo 8192/fetch: 563579 B, JPEG ends FF D9 (was 562347 B)
- Public tunnel flaresolverr.unimanga.net/fetch: 563579 B, EOI present

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
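The two header rules can be sketched as pure functions. The real sidecar is image-fetcher.mjs (Node); this Python version only illustrates the decision logic, and downstream_headers, has_jpeg_eoi, and UPSTREAM_REQUEST_HEADERS are hypothetical names. The numbers above are consistent with the symptom: 563579 - 562347 = 1232 bytes lost, and because a complete JPEG ends with the EOI marker FF D9, any tail truncation removes it.

```python
from typing import Dict, Mapping

# Hypothetical: request header the sidecar sends to byparr so this hop is never compressed.
UPSTREAM_REQUEST_HEADERS: Dict[str, str] = {"Accept-Encoding": "identity"}


def downstream_headers(upstream: Mapping[str, str]) -> Dict[str, str]:
    """Pick the headers to forward to the client."""
    lowered = {k.lower(): v for k, v in upstream.items()}
    out: Dict[str, str] = {}
    if "content-type" in lowered:
        out["content-type"] = lowered["content-type"]
    encoding = lowered.get("content-encoding", "").strip().lower()
    # Content-Length describes the encoded bytes. If the body was decoded in
    # transit, that number is wrong, so omit it and let the HTTP server fall
    # back to chunked transfer.
    if "content-length" in lowered and encoding in ("", "identity"):
        out["content-length"] = lowered["content-length"]
    return out


def has_jpeg_eoi(data: bytes) -> bool:
    """A complete JPEG ends with the EOI marker FF D9."""
    return data.endswith(b"\xff\xd9")
```

A check like has_jpeg_eoi mirrors the manual end-to-end verification above and could serve as a cheap sanity test before handing bytes to Sharp.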
No description provided.