PIGS-809: Adopt regex query contract for search_text_in_pdf #82

Merged
RogerThomas merged 5 commits into main from PIGS-809-search-text-regex-contract on Jun 16, 2026
@RogerThomas

Contributor

Summary

Updates the search_text_in_pdf tool to match the platform's new extract-text-bounding-boxes contract (PIGS-809), which adds regular-expression search support.

The backend now:

  • takes a queries array (each query is literal text or a regex) instead of a flat texts array, and
  • returns results under textBoxes as per-query results (query + matches[] with matchedText/boxes[]/groups[]) instead of flat text boxes.

The old tool sent { texts: [...] } and parsed textBoxes[].text, so it would break against the updated platform on both the request and response sides.
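To make the contract change concrete, here is a rough sketch of the old and new shapes. The field names (`queries`, `text`, `isRegex`, `regexFlags`, `textBoxes`, `query`, `matches`, `matchedText`, `boxes`, `groups`) come from the summary above; the inner shape of `boxes`, the element type of `groups`, and the assumption that `query` echoes the query object are illustrative guesses, not taken from the platform spec.

```typescript
// Flags listed in this PR as allowed values for regexFlags.
type RegexFlag = "ignore-case" | "multiline" | "dot-all";

interface Query {
  text: string;
  isRegex?: boolean;
  regexFlags?: RegexFlag[];
}

// Hypothetical box shape: the PR does not describe the fields of a box.
interface Box {
  page: number;
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Match {
  matchedText: string;
  boxes: Box[];
  groups: (string | null)[]; // assumed: regex capture groups
}

// One entry per query; `query` is assumed to echo the request query.
interface QueryResult {
  query: Query;
  matches: Match[];
}

interface NewResponse {
  textBoxes: QueryResult[];
}

// Old request: a flat list of literal strings.
const oldRequest = { texts: ["Invoice", "Total"] };

// New request: literal and regex queries can be mixed in one call.
const newRequest: { queries: Query[] } = {
  queries: [
    { text: "Invoice" },
    { text: "Total:\\s+\\d+", isRegex: true, regexFlags: ["ignore-case"] },
  ],
};

// A response in the new shape, with no matches, just to show the nesting.
const emptyResponse: NewResponse = {
  textBoxes: newRequest.queries.map((query) => ({ query, matches: [] })),
};
```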

Changes

  • platformHandler.extractTextBoundingBoxes now accepts an array of query objects ({ text, isRegex?, regexFlags? }) and sends { queries }, omitting isRegex/regexFlags on the wire unless set.
  • search_text_in_pdf input is now a queries[] of { text, isRegex?, regexFlags? } objects (rather than texts[] + shared flags), giving full backend parity: a single call can mix literal and regex queries and assign distinct flags per query. Allowed regexFlags: ignore-case, multiline, dot-all. Flags without isRegex are rejected with a UserFacingError.
  • Match counts (totalMatches, uniqueTextsFound) are computed from the new per-query/matches shape.
  • Tests updated for the new request/response contract, including per-query regex and the flags-without-isRegex rejection path.
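As an illustration of the "omit unless set" behaviour described in the first bullet, a minimal sketch might look like the following. `toWireQueries` and `QueryInput` are hypothetical names used only for this example; the actual code in platformHandler.ts is not shown in this conversation and may differ.

```typescript
// Hypothetical input type, mirroring the { text, isRegex?, regexFlags? } objects above.
interface QueryInput {
  text: string;
  isRegex?: boolean | undefined;
  regexFlags?: string[] | undefined;
}

// Build the wire-level queries, adding the optional keys only when they
// were actually provided, so no undefined-valued keys end up in the payload.
function toWireQueries(queries: QueryInput[]): Record<string, unknown>[] {
  return queries.map((q) => {
    const wire: Record<string, unknown> = { text: q.text };
    if (q.isRegex !== undefined) wire["isRegex"] = q.isRegex;
    if (q.regexFlags !== undefined) wire["regexFlags"] = q.regexFlags;
    return wire;
  });
}

// Request body in the new { queries } shape.
const requestBody = JSON.stringify({
  queries: toWireQueries([
    { text: "Invoice" },
    { text: "total:\\s+\\d+", isRegex: true, regexFlags: ["ignore-case"] },
  ]),
});
```

Building the object key by key, rather than spreading the input, keeps a literal query down to `{ text }` even when the caller passes `isRegex: undefined` explicitly.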

Only node-version/ is touched (the root Python code is legacy per CLAUDE.md).

Testing

  • task n:check — format, types, lint all pass
  • task n:test — 164 passed, 1 skipped

🤖 Generated with Claude Code

RogerThomas and others added 4 commits June 15, 2026 15:41
The platform's extract-text-bounding-boxes method now takes a `queries`
array (each query is literal text or a regex) instead of a flat `texts`
array, and returns results under `textBoxes` as per-query results with
`query`/`matches` rather than flat text boxes.

- platformHandler.extractTextBoundingBoxes now builds `queries` from the
  texts plus optional isRegex/regexFlags and sends `{ queries }`.
- search_text_in_pdf exposes optional isRegex + regexFlags (ignore-case,
  multiline, dot-all), rejects flags without isRegex, and computes match
  counts from the new query-results shape.
- Tests updated for the new request/response contract.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Replace the flat texts[] + shared isRegex/regexFlags inputs with a
queries[] of {text, isRegex?, regexFlags?} objects, matching the backend
contract one-to-one. This lets a single call mix literal and regex
queries and assign distinct flags per query, rather than applying one
regex mode uniformly to every term.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

CI node-check fails at `npm audit --audit-level=moderate` due to a
high-severity advisory (GHSA-gv7w-rqvm-qjhr / GHSA-g7r4-m6w7-qqqr)
covering esbuild 0.17.0-0.28.0. Bump esbuild to ^0.28.1 and tsx to
^4.22.4 so its nested esbuild dedupes to the patched top-level copy.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The previous in-place install left the lockfile missing some @emnapi/*
optional deps, breaking `npm ci` on Linux CI. Regenerate cleanly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Updates the Node/TypeScript search_text_in_pdf tool and PlatformHandler.extractTextBoundingBoxes wrapper to align with the platform’s new extract-text-bounding-boxes contract that supports per-query regex searches (PIGS-809).

Changes:

  • Replaces texts: string[] with queries: { text, isRegex?, regexFlags? }[] on both the tool input schema and platform request payload.
  • Updates result parsing to use the new per-query textBoxes[].matches[] response shape and recomputes summary counts.
  • Adjusts and expands Vitest coverage for the new request/response contract and regex behavior; bumps a couple of dev dependencies.
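The recomputed summary counts could be derived from the per-query shape roughly as below. This is a sketch under assumptions: `summarize` is a hypothetical helper, and `uniqueTextsFound` is taken here to mean the number of distinct `matchedText` values, which this PR does not spell out.

```typescript
// Only the fields needed for counting are modelled here.
interface Match {
  matchedText: string;
}

interface QueryResult {
  matches: Match[];
}

// Walk every query result and every match once.
function summarize(textBoxes: QueryResult[]): { totalMatches: number; uniqueTextsFound: number } {
  let totalMatches = 0;
  const uniqueTexts = new Set<string>();
  for (const result of textBoxes) {
    totalMatches += result.matches.length;
    for (const match of result.matches) {
      // Assumed definition: distinct matched strings across all queries.
      uniqueTexts.add(match.matchedText);
    }
  }
  return { totalMatches, uniqueTextsFound: uniqueTexts.size };
}

const summary = summarize([
  { matches: [{ matchedText: "Total" }, { matchedText: "Total" }] },
  { matches: [{ matchedText: "Invoice" }] },
  { matches: [] },
]);
```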

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| node-version/src/tools/extractions.ts | Updates search_text_in_pdf tool schema + logic to send per-query literal/regex searches and parse new per-query results. |
| node-version/src/handlers/platformHandler.ts | Updates extractTextBoundingBoxes to send { queries } payload and optionally include regex fields. |
| node-version/tests/extractions.test.ts | Updates search_text_in_pdf tests for new input/output shape and adds regex-related scenarios. |
| node-version/tests/platformHandler.test.ts | Updates handler tests to assert new { queries } request payload, including regex flags. |
| node-version/package.json | Bumps esbuild and tsx dev dependency versions. |
| node-version/package-lock.json | Lockfile refresh reflecting dependency updates/hoisting. |
Files not reviewed (1)
  • node-version/package-lock.json: Generated file

Comment thread node-version/src/tools/extractions.ts
Comment on lines +283 to +286
async extractTextBoundingBoxes(
  fileBytes: Buffer,
  queries: { text: string; isRegex?: boolean | undefined; regexFlags?: string[] | undefined }[],
): Promise<Buffer> {

Contributor Author

Declining this one: the | undefined annotations are intentional. This codebase compiles with exactOptionalPropertyTypes: true, and the caller passes a zod-inferred value typed as { isRegex?: boolean | undefined; regexFlags?: string[] | undefined }. Removing | undefined from the target makes that argument non-assignable and fails tsc (TS2345). Keeping the explicit | undefined is what allows the optional properties to line up.
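For readers unfamiliar with the flag, the assignability issue described in this reply can be reproduced in isolation. The sketch below assumes `"exactOptionalPropertyTypes": true` in tsconfig.json; the type and function names are hypothetical, and `InferredQuery` is written by hand to stand in for the zod-inferred type mentioned above.

```typescript
// Stand-in for the zod-inferred input type: optional AND explicitly allowed to be undefined.
type InferredQuery = {
  text: string;
  isRegex?: boolean | undefined;
  regexFlags?: string[] | undefined;
};

// Target without `| undefined`: with exactOptionalPropertyTypes enabled, these
// properties may be absent but may not be present with the value undefined.
type StrictQuery = {
  text: string;
  isRegex?: boolean;
  regexFlags?: string[];
};

function acceptsLoose(queries: InferredQuery[]): number {
  return queries.length;
}

function acceptsStrict(queries: StrictQuery[]): number {
  return queries.length;
}

const parsed: InferredQuery[] = [{ text: "Total", isRegex: undefined }];

// Compiles with or without the flag, because source and target types line up.
const accepted = acceptsLoose(parsed);

// With exactOptionalPropertyTypes enabled, tsc rejects the call below because
// InferredQuery allows an explicit undefined that StrictQuery does not:
// acceptsStrict(parsed);

// Values that never carry an explicit undefined are fine for the strict target.
const strictOk = acceptsStrict([{ text: "Total" }]);
```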

Comment on lines +3176 to +3181
"node_modules/glob": {
"version": "7.2.3",
"resolved": "https://registry.npmjs.org/glob/-/glob-7.2.3.tgz",
"integrity": "sha512-nFR0zLpU2YCaRxwoCJvL6UvCH2JFyFVIvwTLsIf21AuHlMskA1hhTdk+LlYJtOlYt9v6dvszD2BGRqBL+iQK9Q==",
"deprecated": "Old versions of glob are not supported, and contain widely publicized security vulnerabilities, which have been fixed in the current version. Please update. Support for old versions may be purchased (at exorbitant rates) by contacting i@izs.me",
"license": "ISC",

Contributor Author

Declining within the scope of this PR: glob@7.2.3 is not introduced or hoisted by the dependency changes here. It is a long-standing transitive dependency of exceljs (via archiver-utils / rimraf / zip-stream) and is already present on main prior to this branch. The esbuild/tsx bump in this PR does not touch that chain. Replacing it would mean forcing an override on a deep transitive of a production dependency, which is unrelated to the PIGS-809 regex change and better handled separately (e.g. via Dependabot on main).

@Existency Existency left a comment

Contributor

Nice

Address Copilot review: validation previously allowed `{ regexFlags: [] }`
to pass without isRegex. Reject any provided regexFlags regardless of
length, and cover the empty-array case with a test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
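
The fix described in this commit amounts to checking whether regexFlags is present rather than whether it is non-empty. A minimal sketch, assuming a plain presence check: the `UserFacingError` class here is a local stand-in for the one in the codebase, and `assertFlagsRequireRegex` is a hypothetical helper name.

```typescript
// Stand-in for the codebase's UserFacingError; the real class is not shown in this PR.
class UserFacingError extends Error {}

interface QueryInput {
  text: string;
  isRegex?: boolean | undefined;
  regexFlags?: string[] | undefined;
}

function assertFlagsRequireRegex(queries: QueryInput[]): void {
  for (const q of queries) {
    // Check presence, not length: a length-based test would let
    // `regexFlags: []` through without isRegex.
    if (q.regexFlags !== undefined && q.isRegex !== true) {
      throw new UserFacingError(`regexFlags requires isRegex: true (query "${q.text}")`);
    }
  }
}

// Returns true when validation rejects the given queries.
function isRejected(queries: QueryInput[]): boolean {
  try {
    assertFlagsRequireRegex(queries);
    return false;
  } catch {
    return true;
  }
}

const emptyFlagsRejected = isRejected([{ text: "Total", regexFlags: [] }]);
const mixedQueriesRejected = isRejected([
  { text: "Invoice" },
  { text: "total:\\s+\\d+", isRegex: true, regexFlags: ["ignore-case"] },
]);
```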
@RogerThomas RogerThomas merged commit 029f1e5 into main Jun 16, 2026
5 checks passed
@RogerThomas RogerThomas deleted the PIGS-809-search-text-regex-contract branch June 16, 2026 09:39

3 participants