The open-source page-intelligence API. Screenshot and extract product images + structured data from any site — even bot-protected ones. Self-hosted, free, fast.
Every commercial screenshot API charges $9–$79/mo, throttles you, and still gets blocked by modern bot protection. third-eye is the free, self-hostable alternative that goes further: point it at a product page and it returns the actual product images + structured data (title, brand, price, sizes), not just a PNG — purpose-built to feed similar-image search and catalog ingestion.
- 🛡️ Passes bot detection — Patchright (patched Chromium) defeats the headless/automation tells that block stock Playwright/Puppeteer. Captures Shopify, Next.js, SPAs, Uniqlo, and more where rivals 403.
- 🛍️ Product extraction — `/v1/extract` pulls product images + data via JSON-LD / OpenGraph / Shopify-JSON / DOM heuristics. Returns the image from the URL, with a screenshot fallback.
- 🧠 Readiness oracle — network-idle + `fonts.ready` + lazy-load scroll + animation freeze + canvas/Flutter first-frame detection (Flutter/WebGL apps render, not blank).
- 🖼️ Screenshots done right — full-page single-pass (no scroll-stitch seams), element clip, device emulation, dark mode, PNG/JPEG/WebP/PDF.
- ♻️ Warm browser pool — per-request isolation, recycle-after-N, crash self-heal.
- ⚡ Sync + async (webhooks) + bulk via Redis/BullMQ.
- 🔑 API-key auth, per-plan rate limits, Prometheus metrics, graceful shutdown.
- 🧩 Pluggable storage (`none`/`s3`/`local`), devices, readiness steps, output formats, extractors — see EXTENDING.md.
- ☁️ One Docker image → any VPS, published via Cloudflare Tunnel. Not serverless (cold starts + no GL stack break the warm pool and canvas apps).
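
The "recycle-after-N" and "crash self-heal" ideas from the warm-pool bullet can be illustrated with a small sketch. This is not third-eye's implementation: `RecyclingSlot`, `create`, `destroy`, and `maxUses` are hypothetical names, and resource creation is synchronous here for brevity, whereas launching a real browser is asynchronous.

```typescript
// Illustrative only: one pool slot that replaces its resource after maxUses requests.
class RecyclingSlot<T> {
  private resource: T | null = null;
  private uses = 0;
  private readonly create: () => T;
  private readonly destroy: (resource: T) => void;
  private readonly maxUses: number;

  constructor(create: () => T, destroy: (resource: T) => void, maxUses: number) {
    this.create = create;
    this.destroy = destroy;
    this.maxUses = maxUses;
  }

  // Returns a warm resource, recycling it once it has served maxUses requests.
  acquire(): T {
    if (this.resource !== null && this.uses >= this.maxUses) this.discard();
    let current = this.resource;
    if (current === null) {
      current = this.create();
      this.resource = current;
      this.uses = 0;
    }
    this.uses++;
    return current;
  }

  // Call after a crash so the next acquire() starts from a fresh resource.
  discard(): void {
    if (this.resource !== null) this.destroy(this.resource);
    this.resource = null;
    this.uses = 0;
  }
}
```

A real pool would hold several such slots and hand each request its own isolated browser context; the sketch only shows the recycling counter.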
| | third-eye | commercial screenshot APIs |
|---|---|---|
| Price | Free / self-hosted (MIT) | $9–$79/mo, then per-shot overage |
| Rate limits | yours to set | 40–100/min typical |
| Passes modern bot detection | ✅ Patchright | ❌ mostly blocked |
| Product image + data extraction | ✅ built-in | ❌ none |
| Canvas/Flutter/WebGL rendering | ✅ SwiftShader + first-frame wait | |
| Screenshots / PDF / full-page | ✅ | ✅ |
| Data ownership | 100% yours | vendor-hosted |
Compared against allscreenshots, pikwy, site-shot, screenshotapi, microlink, screenshotone, screenshotapi.net, urlbox. Hardest-tier marketplaces behind Akamai sensor-data / PerimeterX (e.g. H&M, Zara, Myntra) still require residential proxies — see Limitations.
```bash
cp .env.example .env
npm install
npm run browsers:install   # one-time: Chromium + OS deps

# Engine smoke test — no server/Redis needed:
npm run smoke -- https://example.com
npm run smoke -- https://flutter.dev --full   # exercises the canvas path

# Full stack (API + worker + Redis) via Docker:
docker compose up --build
```

Base URL `http://localhost:8080`. Auth via `x-api-key` (dev key: `te_dev_local`). Add `?response=binary|base64|json` (default `binary`).
```bash
curl -X POST http://localhost:8080/v1/screenshot \
  -H 'x-api-key: te_dev_local' -H 'content-type: application/json' \
  -d '{"url":"https://example.com","fullPage":true,"format":"png"}' \
  --output shot.png
```

Convenience GET (browser-friendly):

```text
GET /v1/screenshot?url=https://example.com&full_page=true&device=iphone-15
```
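
The two request shapes above can also be built from TypeScript. The sketch below only constructs the request description and leaves sending it to the caller (for example with `fetch`). The helper names `buildPostScreenshot` and `buildGetScreenshot` are hypothetical, and sending the `x-api-key` header on the GET form is an assumption; only the endpoint, header, query, and field names come from the examples above.

```typescript
// Hypothetical helpers; endpoint, header, and field names are taken from the curl examples.
interface ScreenshotRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

const BASE_URL = "http://localhost:8080"; // local dev default
const API_KEY = "te_dev_local";           // dev key; use a real key in production

// POST /v1/screenshot with a JSON body and an explicit ?response= mode.
function buildPostScreenshot(
  target: string,
  options: { fullPage?: boolean; format?: "png" | "jpeg" | "webp" } = {},
  response: "binary" | "base64" | "json" = "binary",
): ScreenshotRequest {
  const endpoint = new URL("/v1/screenshot", BASE_URL);
  endpoint.searchParams.set("response", response);
  return {
    url: endpoint.toString(),
    method: "POST",
    headers: { "x-api-key": API_KEY, "content-type": "application/json" },
    body: JSON.stringify({ url: target, ...options }),
  };
}

// GET /v1/screenshot with snake_case query parameters, as in the convenience form.
function buildGetScreenshot(target: string, fullPage: boolean, device: string): ScreenshotRequest {
  const endpoint = new URL("/v1/screenshot", BASE_URL);
  endpoint.searchParams.set("url", target); // percent-encoded by URLSearchParams
  endpoint.searchParams.set("full_page", String(fullPage));
  endpoint.searchParams.set("device", device);
  return { url: endpoint.toString(), method: "GET", headers: { "x-api-key": API_KEY } };
}
```

Note that the POST body uses camelCase (`fullPage`) while the GET query uses snake_case (`full_page`), mirroring the two examples.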
```bash
curl -X POST http://localhost:8080/v1/screenshot/async \
  -H 'x-api-key: te_dev_local' -H 'content-type: application/json' \
  -d '{"url":"https://example.com","webhookUrl":"https://you.dev/hook"}'
# → { "jobId": "...", "status": "queued" }

curl http://localhost:8080/v1/jobs/<jobId> -H 'x-api-key: te_dev_local'
```

```bash
curl -X POST http://localhost:8080/v1/bulk \
  -H 'x-api-key: te_dev_local' -H 'content-type: application/json' \
  -d '{"urls":["https://a.com","https://b.com"],"options":{"device":"desktop-hd"}}'
```

Point at a product page (PDP); get the product images + structured data back.
```bash
curl -X POST http://localhost:8080/v1/extract \
  -H 'x-api-key: te_dev_local' -H 'content-type: application/json' \
  -d '{"url":"https://bluorng.com/products/flyway-linen-shirt"}'
```

Return the primary product image bytes directly (screenshot fallback if none):

```bash
curl -X POST 'http://localhost:8080/v1/extract?response=image' \
  -H 'x-api-key: te_dev_local' -H 'content-type: application/json' \
  -d '{"url":"https://www.uniqlo.com/in/en/products/E482443-000/00"}' --output product.jpg
```

Listing (PLP) → every product card; plus async extraction:

```bash
curl -X POST http://localhost:8080/v1/extract/listing -H 'x-api-key: te_dev_local' \
  -H 'content-type: application/json' -d '{"url":"https://bluorng.com/collections/all"}'

curl -X POST http://localhost:8080/v1/extract/async -H 'x-api-key: te_dev_local' \
  -H 'content-type: application/json' -d '{"url":"...","webhookUrl":"https://you.dev/hook"}'
```

Extraction strategy (precedence): JSON-LD Product → OpenGraph → Shopify `.json` → DOM gallery heuristics, normalized to absolute URLs and deduped.
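
The precedence, normalization, and dedupe steps described above could be sketched as follows. This is a minimal illustration, not the project's extractor: `mergeImages`, `Candidate`, and the `"jsonld"` / `"dom"` source labels are assumptions (the example response only shows `"shopify"` and `"og"`), and the `maxImages` default of 10 is arbitrary.

```typescript
type Source = "jsonld" | "og" | "shopify" | "dom";

interface Candidate {
  url: string;
  source: Source;
}

// Precedence as stated: JSON-LD Product, then OpenGraph, then Shopify .json, then DOM heuristics.
const PRECEDENCE: Source[] = ["jsonld", "og", "shopify", "dom"];

function mergeImages(pageUrl: string, candidates: Candidate[], maxImages = 10): Candidate[] {
  // Stable sort keeps the original order within each source.
  const ranked = [...candidates].sort(
    (a, b) => PRECEDENCE.indexOf(a.source) - PRECEDENCE.indexOf(b.source),
  );
  const seen = new Set<string>();
  const merged: Candidate[] = [];
  for (const candidate of ranked) {
    let absolute: string;
    try {
      // Resolves relative ("/a.jpg") and protocol-relative ("//cdn/a.jpg") URLs.
      absolute = new URL(candidate.url, pageUrl).toString();
    } catch {
      continue; // skip values that cannot be parsed as a URL
    }
    if (seen.has(absolute)) continue; // keep the highest-precedence duplicate only
    seen.add(absolute);
    merged.push({ url: absolute, source: candidate.source });
    if (merged.length >= maxImages) break;
  }
  return merged;
}
```

Deduping after normalization matters: a relative DOM path and an absolute OpenGraph URL often point at the same file.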
Key options (see `src/core/schema.ts` for the full contract)

| field | type | notes |
|---|---|---|
| `url` | string (required) | target page |
| `format` | `png` \| `jpeg` \| `webp` | default `png` |
| `pdf` | bool | render PDF instead of an image |
| `fullPage` | bool | single-pass full-page capture |
| `selector` / `clip` | string / rect | capture one element or region |
| `device` | e.g. `iphone-15`, `desktop-hd` | preset viewport + DPR + UA |
| `viewport` / `deviceScaleFactor` | rect / number | manual surface |
| `darkMode`, `reducedMotion`, `locale`, `timezone` | emulation | |
| `waitStrategy` | `auto` \| `networkidle` \| `load` \| `domcontentloaded` | default `auto` |
| `waitForSelector` / `waitForFunction` / `delayMs` | extra readiness gates | |
| `blockAds`, `blockCookieBanners`, `hideSelectors`, `removeSelectors` | clean shots | |
| `injectCss` / `injectJs` / `headers` / `cookies` | page setup / auth | |

Extraction adds maxImages and includeScreenshot, and inherits the full
capture surface (device, stealth, waits, proxy).
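
For client code, a subset of these options can be mirrored as TypeScript types. The sketch below is illustrative only: `src/core/schema.ts` remains the authoritative contract, the types of `darkMode` and `delayMs` are guesses, and `applyDefaults` with its http(s) check is a hypothetical client-side helper rather than part of third-eye.

```typescript
// Illustrative subset of the documented options; field types not stated in the table are assumptions.
interface CaptureOptions {
  url: string;
  format?: "png" | "jpeg" | "webp";
  pdf?: boolean;
  fullPage?: boolean;
  selector?: string;
  device?: string;
  darkMode?: boolean; // assumed boolean
  waitStrategy?: "auto" | "networkidle" | "load" | "domcontentloaded";
  delayMs?: number;   // assumed milliseconds
}

// Extraction adds two fields on top of the capture surface.
interface ExtractOptions extends CaptureOptions {
  maxImages?: number;
  includeScreenshot?: boolean;
}

// Fills in the two documented defaults and rejects non-http(s) targets early.
function applyDefaults(opts: CaptureOptions): CaptureOptions {
  const protocol = new URL(opts.url).protocol; // throws on an unparseable URL
  if (protocol !== "http:" && protocol !== "https:") {
    throw new Error(`unsupported protocol: ${protocol}`);
  }
  return {
    ...opts,
    format: opts.format ?? "png",
    waitStrategy: opts.waitStrategy ?? "auto",
  };
}

// Example extraction request body built from the types above.
const extractBody: ExtractOptions = {
  ...applyDefaults({ url: "https://bluorng.com/products/flyway-linen-shirt" }),
  maxImages: 5,
  includeScreenshot: false,
};
```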
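`GET /v1/og` returns just the `og:image` URL. On the happy path no browser is involved: the page is fetched as a stream, parsed, and the result cached, which takes roughly 150–500 ms (versus seconds for a full render). Sites that block the fetch or expose no `og:image` fall back to the browser automatically.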
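
The fast path with a browser fallback can be sketched roughly as below. This is an assumption-laden illustration, not the project's code: `parseOgImage`, `resolveOgImage`, `fetchHtml`, and `viaBrowser` are hypothetical names, the parser is a simple regex over the whole HTML string rather than a streamed parse, HTML entities in the attribute are not decoded, and the cache is an unbounded in-memory `Map`.

```typescript
// Extracts the og:image URL from raw HTML, or returns null when no such meta tag exists.
function parseOgImage(html: string, pageUrl: string): string | null {
  for (const tag of html.match(/<meta\b[^>]*>/gi) ?? []) {
    // Exact match on og:image so that og:image:width and similar are ignored.
    if (!/\bproperty\s*=\s*["']og:image["']/i.test(tag)) continue;
    const content = /\bcontent\s*=\s*["']([^"']+)["']/i.exec(tag)?.[1];
    if (content) return new URL(content, pageUrl).toString();
  }
  return null;
}

type HtmlFetcher = (url: string) => Promise<string | null>;     // null = blocked or failed
type BrowserFallback = (url: string) => Promise<string | null>; // full render, slower

const ogCache = new Map<string, string | null>();

// Tries the cheap fetch-and-parse path first and only then the browser.
async function resolveOgImage(
  url: string,
  fetchHtml: HtmlFetcher,
  viaBrowser: BrowserFallback,
): Promise<string | null> {
  if (ogCache.has(url)) return ogCache.get(url) ?? null;
  const html = await fetchHtml(url);
  const fast = html === null ? null : parseOgImage(html, url);
  const image = fast ?? (await viaBrowser(url));
  ogCache.set(url, image);
  return image;
}
```

Passing the fetcher and the fallback in as functions keeps the sketch independent of any particular HTTP client or browser driver.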
```bash
curl -s "https://your-host/v1/og?url=https://bluorng.com/products/flyway-linen-shirt" \
  -H "x-api-key: $KEY"
# → {"image":"https://bluorng.com/cdn/shop/files/rvet5refd.jpg?v=…"}
```

`POST /v1/screenshot` · `GET /v1/screenshot` · `POST /v1/screenshot/async` · `POST /v1/extract` · `POST /v1/extract/async` · `POST /v1/extract/listing` · `GET /v1/og` · `GET /v1/jobs/:id` · `POST /v1/bulk` · `GET /healthz` · `GET /readyz` · `GET /metrics`
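
When a webhook is not practical, the async endpoints can be paired with polling on `GET /v1/jobs/:id`. The sketch below is hedged: only the `"queued"` status appears in this README, so treating `"active"` as the other non-terminal status is an assumption, and `waitForJob`, `backoffMs`, and the injected `getJob` function are hypothetical names.

```typescript
interface Job {
  jobId: string;
  status: string;
}

// Injected so the sketch does not depend on a specific HTTP client.
type GetJob = (jobId: string) => Promise<Job>;

// Exponential backoff capped at maxMs: 500, 1000, 2000, ... for attempt 0, 1, 2, ...
function backoffMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Polls until the job leaves a pending state or maxAttempts is exhausted.
async function waitForJob(jobId: string, getJob: GetJob, maxAttempts = 10): Promise<Job> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await getJob(jobId);
    // Assumption: any status other than "queued" or "active" is terminal.
    if (job.status !== "queued" && job.status !== "active") return job;
    await new Promise<void>((resolve) => setTimeout(resolve, backoffMs(attempt)));
  }
  throw new Error(`job ${jobId} still pending after ${maxAttempts} polls`);
}
```

A `getJob` implementation would issue the same request as the `curl http://localhost:8080/v1/jobs/<jobId>` example and parse the JSON body.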
The interesting engineering and the hard cases (Flutter/CanvasKit, when-is-a-page-ready, full-page stitching, memory) are documented in CLAUDE.md.
One Docker image, two roles (api + worker). Deploys to any Linux box with
Docker. Recommended: VPS / EC2 + Cloudflare Tunnel (TLS + public hostname,
zero inbound ports). Runbooks: DEPLOY.md (any VPS) ·
DEPLOY-AWS.md (EC2 + S3, with a one-paste bootstrap script).
```bash
git clone https://github.com/myselfshravan/third-eye.git && cd third-eye
cp .env.production.example .env   # set API_KEYS, TUNNEL_TOKEN, pool sizes
docker compose -f docker-compose.prod.yml --env-file .env up -d --build
docker compose -f docker-compose.prod.yml up -d --scale worker=3   # scale out
```

- Storage: Cloudflare R2 (zero egress) — set `STORAGE_DRIVER=s3` and the `S3_*` vars; point `S3_PUBLIC_BASE_URL` at your R2/CDN domain. `local` and `none` drivers are also built in (see EXTENDING.md).
Not built for serverless (Vercel/Lambda): cold starts kill the warm-pool advantage and there's no GL stack, so Flutter/WebGL pages render blank.
Drop URLs into examples/test-urls.json and run the
batch runner — captures each straight through the engine and prints a summary:
```bash
npm run batch                        # screenshot each URL → captures/
npm run batch -- --extract           # extract product data/images → *.product.json
npm run batch -- path/to/urls.json   # custom file
```

- Hardest-tier bot walls. Sites on Akamai sensor-data / PerimeterX / DataDome (e.g. H&M, Zara, Myntra) block at the edge on TLS/IP reputation + behavioral signals (before JS runs), so stealth alone won't pass them. third-eye detects this and reports `blocked: true` + `httpStatus` honestly rather than returning a fake "success". Cracking these needs residential proxies (set `PROXY_URL` / per-request `proxy`) and is a roadmap item. Most D2C/Shopify/Next.js storefronts work out of the box.
- No ML (yet). Extraction is structured-data + heuristics. A vision-model fallback and image embeddings (for direct similar-image search) are planned as a pluggable enrichment step.
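
Because blocked pages are reported explicitly, a caller can branch on the flag instead of trusting a 200. The sketch below shows one possible policy, under assumptions: `decide` and `withProxy` are hypothetical helpers, and passing the per-request `proxy` as a plain URL string is a guess at the field's shape.

```typescript
// Subset of the response fields mentioned in this README.
interface ExtractResult {
  blocked: boolean;
  httpStatus?: number;
  primaryImage?: string;
}

type Decision = "ok" | "retry-with-proxy" | "give-up";

// Retry through a proxy at most once; otherwise surface the block to the caller.
function decide(result: ExtractResult, proxyAvailable: boolean, alreadyProxied: boolean): Decision {
  if (!result.blocked) return "ok";
  if (proxyAvailable && !alreadyProxied) return "retry-with-proxy";
  return "give-up";
}

// Adds the per-request proxy to an existing request body (string form is an assumption).
function withProxy<T extends { url: string }>(body: T, proxyUrl: string): T & { proxy: string } {
  return { ...body, proxy: proxyUrl };
}
```

Capping the retry at one proxied attempt avoids burning proxy bandwidth on sites that block regardless.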
Pre-1.0 and under active development. Capture, extraction, stealth, API, worker, and deploy path are working and tested; see the CHANGELOG and open issues.
Contributions are welcome! See CONTRIBUTING.md for setup and guidelines, EXTENDING.md for how to add storage backends, devices, readiness steps, or output formats, and CLAUDE.md for the architecture. By participating you agree to our Code of Conduct.
Found a vulnerability? Please report it privately — see SECURITY.md. Note the SSRF hardening guidance there before exposing third-eye publicly (it renders arbitrary user-supplied URLs).
MIT © Shravan Revanna
Example `/v1/extract` response:

```jsonc
{
  "title": "Flyway Linen Shirt",
  "brand": "Bluorng",
  "price": 8200,
  "currency": "INR",
  "sizes": ["XS","S","M","L","XL","XXL"],
  "images": [
    { "url": "https://cdn.shopify.com/.../rvet5refd.jpg", "source": "shopify" },
    …
  ],
  "primaryImage": "https://cdn.shopify.com/.../rvet5refd.jpg",
  "confidence": "high",   // high = JSON-LD/Shopify · medium = OG · low = DOM heuristics
  "sources": ["shopify","og"],
  "blocked": false
}
```