feat(render): --extract-text flag for hybrid output (tiles + text.md) #106

Open
aafaq-rashid-comprinno wants to merge 2 commits into
StarTrail-org:main from
aafaq-rashid-comprinno:feat/output-hybrid

@aafaq-rashid-comprinno

Addresses #93.

Problem

Passing full screenshot tiles to LLMs costs many vision tokens. For text-heavy pages, extracting the text alongside the visual tiles lets users choose the cheaper representation when no charts/tables are present.

Solution

A new --extract-text flag for pixelshot extracts document.body.innerText via CDP after the page has rendered and saves it as text.md in the tile directory.

pixelshot https://comprinno.net/ -o ./tiles --extract-text --wait-network-idle

Output:

tiles/comprinno.net.png.tiles/
├── tile_0000.jpg   # visual screenshot
├── text.md         # extracted page text
└── tiles.json      # manifest

Design decisions

  • Near-zero-cost extraction: The DOM is already loaded for screenshotting, so reading innerText takes a single CDP call
  • Best-effort: Text extraction failures don't prevent tile capture
  • Opt-in: Default behavior unchanged (--extract-text is off by default)
  • No new dependencies: Uses existing CDP websocket connection
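
The single-call, best-effort behaviour described above can be sketched roughly as follows. This is an illustration, not the code in this PR: `build_evaluate_request`, `extract_text`, and the injected `send` callable (which stands in for the existing CDP websocket round trip) are hypothetical names. Only the `Runtime.evaluate` method, its `expression` and `returnByValue` parameters, and the reply shape come from the Chrome DevTools Protocol.

```python
import json
from typing import Callable, Optional


def build_evaluate_request(msg_id: int) -> str:
    """Build the CDP Runtime.evaluate message that reads the page text."""
    return json.dumps({
        "id": msg_id,
        "method": "Runtime.evaluate",
        "params": {
            "expression": "document.body.innerText",
            "returnByValue": True,
        },
    })


def extract_text(send: Callable[[str], str], msg_id: int = 1) -> Optional[str]:
    """Best-effort extraction: return the page text, or None on any failure.

    `send` is a hypothetical callable that writes one CDP message to the
    already-open websocket and returns the matching raw JSON reply.
    """
    try:
        reply = json.loads(send(build_evaluate_request(msg_id)))
        result = reply["result"]  # a CDP error reply has "error" instead
        if "exceptionDetails" in result:  # the JS expression itself threw
            return None
        value = result["result"].get("value")
        return value if isinstance(value, str) else None
    except Exception:
        # Swallow everything so tile capture can continue without text.md.
        return None
```

Returning `None` instead of raising keeps the screenshot path independent of text extraction: the caller can simply skip writing text.md when no text comes back.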

Tested

  • comprinno.net: text.md captures full page content (headings, paragraphs, nav)
  • All 23 existing tests pass
  • Lint clean

Adds an opt-in text extraction mode to pixelshot that saves a text.md
file alongside the screenshot tiles. Uses CDP Runtime.evaluate to grab
document.body.innerText after the page is rendered. The extra cost is
negligible because the DOM is already loaded.

Usage:
  pixelshot https://example.com -o ./tiles --extract-text

Output:
  tiles/example.com.png.tiles/
  ├── tile_0000.jpg   # visual tile (existing)
  ├── text.md         # extracted page text (new)
  └── tiles.json      # manifest

This enables hybrid workflows where LLMs receive text for text-heavy
pages (cheap tokens) and images only for charts/tables/diagrams
(expensive vision tokens).
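
On the consuming side, such a hybrid workflow might look like the sketch below. It assumes only the output layout shown above (text.md next to tile_*.jpg files). The `load_page` helper, its `prefer_text` switch, and the `min_chars` threshold are hypothetical and not part of pixelshot; deciding whether a page contains charts or tables is left to the caller.

```python
from pathlib import Path
from typing import Any, Dict


def load_page(tile_dir: Path, prefer_text: bool = True,
              min_chars: int = 200) -> Dict[str, Any]:
    """Pick the cheaper representation of a rendered page when one exists.

    Returns the contents of text.md when the caller prefers text and the
    file holds at least `min_chars` non-whitespace-trimmed characters;
    otherwise falls back to the sorted list of tile images.
    """
    text_path = tile_dir / "text.md"
    tiles = sorted(tile_dir.glob("tile_*.jpg"))
    if prefer_text and text_path.is_file():
        text = text_path.read_text(encoding="utf-8")
        if len(text.strip()) >= min_chars:
            return {"kind": "text", "text": text, "tiles": []}
    # No usable text (flag off, extraction failed, or page nearly empty).
    return {"kind": "tiles", "text": None, "tiles": tiles}
```

Because text extraction is opt-in and best-effort, the fallback to tiles also covers directories produced without --extract-text.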

Addresses StarTrail-org#93.
@vercel

vercel Bot commented Jun 24, 2026

@aafaq-rashid-comprinno is attempting to deploy a commit to the andylizf's projects Team on Vercel.

A member of the Team first needs to authorize it.

Verifies text.md is created with page content when extract_text=True,
and not created when the flag is off (default).
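
The two checks described in this commit could be written along these lines. This is a sketch only: `save_outputs` is a hypothetical stand-in for the render step's file writing, since the project's real entry point is not shown in this PR text. Only the `extract_text` keyword and the text.md and tile_0000.jpg file names are taken from the description.

```python
import tempfile
from pathlib import Path
from typing import Optional


def save_outputs(tile_dir: Path, page_text: Optional[str],
                 extract_text: bool = False) -> None:
    """Hypothetical stand-in: write a placeholder tile and, optionally, text.md."""
    tile_dir.mkdir(parents=True, exist_ok=True)
    (tile_dir / "tile_0000.jpg").write_bytes(b"")  # placeholder, not a real JPEG
    if extract_text and page_text:
        (tile_dir / "text.md").write_text(page_text, encoding="utf-8")


def test_text_md_created_when_flag_on() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "tiles"
        save_outputs(out, "Hello page", extract_text=True)
        assert (out / "text.md").read_text(encoding="utf-8") == "Hello page"


def test_text_md_absent_by_default() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "tiles"
        save_outputs(out, "Hello page")  # flag left at its default (off)
        assert not (out / "text.md").exists()
        assert (out / "tile_0000.jpg").exists()
```

Both functions follow the pytest naming convention, so they would be collected automatically if dropped into a test module.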