feat(engine): [YW-140] generic declarative RestConnector (HTTP/JSON any-source) #149

Merged
youhaowei merged 4 commits into main from charixandra/yw-140-generic-rest-connector on Jun 17, 2026
Conversation

youhaowei (Owner) commented Jun 17, 2026

Summary

  • Adds @dashframe/connector-rest — a new package implementing RestConnector, a generic declarative HTTP/JSON connector that interprets DataSource.config (endpoint, method, authRef, pagination, rowPath, fieldMap) to pull from any HTTP/JSON source. Config IS the mod — no executable code in config.
  • Implements all four pagination strategies: offset, cursor, page-number, and link-header (RFC 5988). Unsupported strategy strings are logged as declarative-ceiling gaps for the v0.4 code-plugin tier.
  • authRef resolves server-side through the SecretResolver scoped lease (vault's withSecret). Vault-absent + authRef-present → throw before any fetch (fail-closed). Plaintext never in config, formData, or any public interface.
  • connect() performs a live single-page test fetch (real-world side-effect — cannot ride a preview transaction).
  • query() returns { arrowBuffer (base64), fieldIds, fields } — fully serializable over IPC. Renderer materializes the DataFrame from arrowBuffer.
  • rowPath dot-path extraction and fieldMap key renaming, with Arrow IPC output via apache-arrow tableFromArrays/tableToIPC.
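The rowPath and fieldMap steps can be sketched as below. This is a minimal illustration, not the package's code: the helper signatures, the `Row` type, and the sample response body are assumptions made for the example (the names `extractRows` and `applyFieldMap` are borrowed from the sequence diagram in this PR).

```typescript
// Hypothetical helpers; shapes are assumed for illustration only.
type Row = Record<string, unknown>;

/** Walk a dot-separated path such as "data.items" into a parsed JSON body. */
function extractRows(body: unknown, rowPath?: string): Row[] {
  let node: unknown = body;
  if (rowPath) {
    for (const key of rowPath.split(".")) {
      if (node === null || typeof node !== "object") return [];
      node = (node as Record<string, unknown>)[key];
    }
  }
  // Anything that is not an array of rows yields an empty result.
  return Array.isArray(node) ? (node as Row[]) : [];
}

/** Rename keys per fieldMap (source key -> target key); unmapped keys pass through. */
function applyFieldMap(row: Row, fieldMap: Record<string, string> = {}): Row {
  const out: Row = {};
  for (const [key, value] of Object.entries(row)) {
    // Own-property check so inherited names like "constructor" are never used as a mapping.
    const mapped = Object.prototype.hasOwnProperty.call(fieldMap, key) ? fieldMap[key] : undefined;
    out[mapped ?? key] = value;
  }
  return out;
}

// Assumed sample response: rows nested under data.items.
const sampleBody = { data: { items: [{ user_id: 1, full_name: "Ada" }] } };
const mappedRows = extractRows(sampleBody, "data.items").map((r) =>
  applyFieldMap(r, { user_id: "id", full_name: "name" }),
);
console.log(mappedRows);
```

Returning an empty array for a missing path keeps the sketch short; whether the real connector treats a bad rowPath as an error is not stated in this PR.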

Tracked internally as YW-140.

Test plan

  • 24/24 tests pass (bunx vitest run in packages/connector-rest)
  • Full turbo typecheck clean (46/46 tasks pass)
  • Offset pagination: two-page fetch, stops on empty page
  • Cursor pagination: follows next_cursor until absent
  • Page-number pagination: increments page, stops on partial page
  • Link-header pagination: follows Link: rel="next" header, stops when absent
  • rowPath + fieldMap: nested extraction + key rename confirmed
  • authRef vault resolution: Bearer token injected into fetch headers via vault's withSecret; plaintext never in result or formData
  • Fail-closed: SecretResolver rejection → throw, fetch never called
  • Public endpoints (noopResolver): no Authorization header sent

🤖 Generated with Claude Code

Greptile Summary

This PR introduces @dashframe/connector-rest, a new declarative HTTP/JSON connector package that drives pagination, auth, row extraction, and Arrow serialization entirely from a RestConnectorConfig object — no executable code in config.

  • Implements four pagination strategies (offset, cursor, page-number, link-header/RFC 5988) with budget-aware early-stop, a same-origin SSRF guard on Link-header redirects, and fail-closed auth enforcement when authRef is configured.
  • Serializes results as Apache Arrow IPC (base64) with per-cell type coercion aligned to inferStringColumnType, null-prototype row objects to block prototype pollution, and fieldMap collision detection; 24 tests cover all strategies, security edges, and coercion edge-cases.
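A rough sketch of the null-prototype row defence mentioned above follows. The function name `sanitizeRow` and the exact key list are assumptions for illustration; the PR only states that `__proto__`, `constructor`, and `prototype` keys are dropped and that row objects have a null prototype.

```typescript
// Hypothetical sketch of the prototype-pollution defence; not the package's code.
const FORBIDDEN_KEYS = new Set(["__proto__", "constructor", "prototype"]);

function sanitizeRow(raw: Record<string, unknown>): Record<string, unknown> {
  // Object.create(null) yields an object with no inherited properties at all.
  const clean: Record<string, unknown> = Object.create(null);
  for (const key of Object.keys(raw)) {
    if (FORBIDDEN_KEYS.has(key)) continue; // cannot be a real column
    clean[key] = raw[key];
  }
  return clean;
}

// JSON.parse creates "__proto__" as an own property, so it shows up in Object.keys.
const hostileRow = JSON.parse('{"id": 1, "__proto__": {"polluted": true}}') as Record<string, unknown>;
const safeRow = sanitizeRow(hostileRow);
console.log(Object.keys(safeRow));
```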

Confidence Score: 3/5

Safe to merge after addressing the cursor "next" probe bug; the rest of the connector is well-guarded and thoroughly tested.

The cursor pagination auto-detection probes "next" as a fallback field name. On APIs that place the next-page URL in a response body next field (a very common pattern), the connector silently duplicates page-1 rows until the row budget is exhausted — no error is raised, so callers receive wrong data without knowing it.
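One possible guard against this failure mode is sketched below. It is an assumption-laden illustration, not the fix that landed: it reads only an explicitly configured cursor field and raises when the value looks like a full URL, so the caller learns the config is wrong instead of receiving duplicated rows. The signature and the `looksLikeUrl` helper are hypothetical.

```typescript
// Hypothetical sketch; the real resolveNextCursor may differ.
function looksLikeUrl(value: string): boolean {
  return /^https?:\/\//i.test(value);
}

function resolveNextCursor(body: Record<string, unknown>, cursorField: string): string | null {
  const value = body[cursorField];
  if (typeof value === "number") return String(value);
  if (typeof value !== "string" || value === "") return null; // no further pages
  if (looksLikeUrl(value)) {
    // A next-page URL is not an opaque cursor token. Sending it back as a cursor
    // parameter is likely to be ignored by the API, which would replay page 1.
    throw new Error(`"${cursorField}" holds a URL, not a cursor; this API needs a different strategy`);
  }
  return value;
}

console.log(resolveNextCursor({ next_cursor: "abc123" }, "next_cursor"));
```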

The code to review is in packages/connector-rest/src/connector.ts, specifically the resolveNextCursor function and the body-less fetchPage implementation for non-GET methods.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/connector-rest/src/connector.ts | Core connector implementation; contains the resolveNextCursor P1 bug (probing "next" as cursor name causes silent row duplication on DRF-style APIs) and a P2 gap where POST/PUT/PATCH methods have no body support. |
| packages/connector-rest/src/connector.test.ts | Comprehensive 857-line test suite covering all four pagination strategies, auth/vault resolution, SSRF guard, prototype-pollution defence, limit pushdown, type coercion, and fieldMap collision detection. |
| packages/connector-rest/src/types.ts | Declarative config types for RestConnectorConfig and PaginationStrategy; well-documented with no issues. |
| packages/connector-rest/src/index.ts | Re-exports public API surface from connector and types; no issues. |
| packages/connector-rest/package.json | New package manifest for @dashframe/connector-rest; dependencies and scripts look correct. |
| packages/connector-rest/vitest.config.ts | Vitest config with workspace aliases for engine and secret-vault; coverage thresholds set at 80%; no issues. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Client
    participant RestConnector
    participant SecretVault as SecretResolver / Vault
    participant RemoteAPI

    Client->>RestConnector: query(databaseId, tableId, options?)
    RestConnector->>RestConnector: assertFieldMapNoCollision(fieldMap)
    RestConnector->>SecretVault: auth(callback)
    SecretVault-->>RestConnector: token (throws if authRef set and token empty)
    RestConnector->>RestConnector: authHeaders(token) → headers map

    loop fetchAllPages (offset / page-number / cursor / link-header)
        RestConnector->>RemoteAPI: "fetch(url, { method, headers })"
        RemoteAPI-->>RestConnector: JSON response
        RestConnector->>RestConnector: extractRows(data, rowPath)
        RestConnector->>RestConnector: check budget (sliceOffset + limit)
    end

    RestConnector->>RestConnector: filter and applyFieldMap(rows)
    RestConnector->>RestConnector: slice(sliceOffset, sliceOffset+limit)
    RestConnector->>RestConnector: inferFieldsFromRows(limitedRows)
    RestConnector->>RestConnector: coerceValueToType() per cell
    RestConnector->>RestConnector: tableFromArrays() → tableToIPC() → base64
    RestConnector-->>Client: "{ arrowBuffer, fieldIds, fields }"

Comments Outside Diff (2)

  1. packages/connector-rest/src/connector.ts, lines 751-806

    P1 Unbounded pagination loops — no max-pages or max-rows guard

    The cursor and link-header cases have no upper bound on iterations. If an API always returns a non-empty cursor or a Link: rel="next" header (e.g., a bug on the remote end, a circular ref, or a misconfigured cursorPath), fetchAllPages will loop indefinitely and the query() call will never return. The offset and page-number cases are partially protected by the rows.length === 0 check but are similarly unbounded for row accumulation.

    A simple guard — e.g., const MAX_PAGES = 1_000; let pages = 0; at the top of fetchAllPages with if (++pages > MAX_PAGES) { console.warn(...); break; } at the top of each loop — would prevent this without changing normal behaviour.
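The suggested guard could look roughly like the sketch below, applied to a cursor loop. The `CursorPage` shape and the injected `fetchPage` callback are assumptions that stand in for the connector's real fetch path; only the `MAX_PAGES` idea comes from the comment above.

```typescript
// Hypothetical sketch of a page-count guard; the page fetcher is injected for illustration.
type Row = Record<string, unknown>;
interface CursorPage {
  rows: Row[];
  nextCursor: string | null;
}
type FetchCursorPage = (cursor: string | null) => Promise<CursorPage>;

const MAX_PAGES = 1_000;

async function fetchCursorPages(fetchPage: FetchCursorPage, maxPages: number = MAX_PAGES): Promise<Row[]> {
  const all: Row[] = [];
  let cursor: string | null = null;
  let pages = 0;
  do {
    if (++pages > maxPages) {
      console.warn(`pagination stopped after ${maxPages} pages`);
      break;
    }
    const page: CursorPage = await fetchPage(cursor);
    all.push(...page.rows);
    cursor = page.nextCursor;
  } while (cursor !== null);
  return all;
}

// A misbehaving remote that always hands back the same cursor.
const loopingRemote: FetchCursorPage = async () => ({ rows: [{ id: 1 }], nextCursor: "same" });
const guardedResult = fetchCursorPages(loopingRemote, 5);
guardedResult.then((rows) => console.log(`collected ${rows.length} rows`));
```

The guard bounds the number of requests, not the number of rows; a row budget would need a separate check.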


  2. packages/connector-rest/src/connector.ts, lines 966-983

    P2 Full remote dataset fetched before QueryOptions row limit is applied

    fetchAllPages downloads every page into rawRows before options?.pagination?.limit slices the result. If the remote API serves thousands of pages and the UI requests only 50 rows, the connector still fetches and materializes the full dataset in memory. For the v0.1 scope this may be acceptable, but the current options.pagination slice gives callers no way to avoid the full download — they cannot tell from the interface that specifying a small limit won't reduce network traffic.
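A budget-aware loop that avoids the full download might be sketched as follows for the offset strategy. The callback signature, the parameter names, and the synthetic remote are all assumptions; the idea of stopping once sliceOffset + limit rows are collected is taken from the later fix commit in this PR.

```typescript
// Hypothetical sketch of limit pushdown for offset pagination.
type Row = Record<string, unknown>;
type FetchOffsetPage = (offset: number, pageSize: number) => Promise<Row[]>;

async function fetchWithBudget(
  fetchPage: FetchOffsetPage,
  pageSize: number,
  sliceOffset: number,
  limit: number,
): Promise<Row[]> {
  if (pageSize <= 0) throw new Error("pageSize must be positive");
  const budget = sliceOffset + limit; // rows needed to satisfy the caller's slice
  const rows: Row[] = [];
  for (let offset = 0; rows.length < budget; offset += pageSize) {
    const page: Row[] = await fetchPage(offset, pageSize);
    if (page.length === 0) break; // remote is exhausted
    rows.push(...page);
  }
  return rows.slice(sliceOffset, budget);
}

// Synthetic remote that never runs out of rows and counts requests.
let requestCount = 0;
const endlessRemote: FetchOffsetPage = async (offset, pageSize) => {
  requestCount += 1;
  return Array.from({ length: pageSize }, (_, i) => ({ id: offset + i }));
};

const budgetedResult = fetchWithBudget(endlessRemote, 20, 10, 50);
budgetedResult.then((rows) => console.log(rows.length, requestCount));
```

With a page size of 20, an offset of 10, and a limit of 50, the budget is 60 rows, so the loop issues three requests instead of walking the whole remote chain.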


Reviews (3): Last reviewed commit: "fix(connector-rest): address security + ..."

youhaowei and others added 2 commits June 17, 2026 07:23
…or/page/link-header pagination

Implements RestConnector, a new @dashframe/connector-rest package that interprets
a declarative DataSource.config (endpoint, method, authRef, pagination, rowPath,
fieldMap) to pull from an arbitrary HTTP/JSON source.

- Four pagination strategies: offset, cursor, page-number, Link-header.
  Unsupported config patterns are logged as declarative-ceiling gaps for the
  v0.4 code-plugin tier.
- Auth-blind data pipeline: constructor takes a bound SecretResolver
  (capability-attenuated lease pre-bound to one credential ref); connect() and
  query() carry no credential arguments — enforced by type. noopResolver
  provided for public endpoints.
- rowPath dot-path extraction + fieldMap key renaming + Arrow IPC output via
  apache-arrow tableFromArrays/tableToIPC (base64-encoded, JSON-serializable
  across IPC boundary).
- connect() does a live single-page test fetch (real-world side-effect,
  cannot ride a preview transaction).
- Tests cover all four pagination strategies, rowPath/fieldMap mapping,
  vault-resolution of authRef via bound SecretResolver, and fail-closed
  resolver-rejection behaviour.
- vitest.config.ts adds @wystack/secret-vault src alias (Vite can't resolve
  the bun export condition; mirrors pattern needed for CI test runs).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…noopResolver)

When the secret resolver yields an empty token (noopResolver for public
endpoints), do not send an Authorization: Bearer header with an empty value; omit the header entirely.
Updated test expectation to match.
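The header behaviour described here, together with the fail-closed rule from the PR summary, can be sketched as below. The function signature and the `authRefConfigured` flag are assumptions for illustration; the actual method is private to the connector and may take different inputs.

```typescript
// Hypothetical sketch; parameter names are assumed, not taken from the package.
function authHeaders(token: string, authRefConfigured: boolean): Record<string, string> {
  if (token === "") {
    if (authRefConfigured) {
      // Fail closed: never fall back to an unauthenticated request.
      throw new Error("authRef is configured but the resolver yielded no token");
    }
    return {}; // public endpoint: send no Authorization header at all
  }
  return { Authorization: `Bearer ${token}` };
}

console.log(authHeaders("", false));
console.log(Object.keys(authHeaders("example-token", true)));
```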

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@linear-code

linear-code Bot commented Jun 17, 2026


YW-140

@coderabbitai

coderabbitai Bot commented Jun 17, 2026


Warning

Review limit reached

@youhaowei, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 27 seconds. Learn how PR review limits work.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c42dc87f-7b1b-4cdf-9e70-b5bcbb4a66d3

📥 Commits

Reviewing files that changed from the base of the PR and between 158b727 and 416a9ed.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • packages/connector-rest/package.json
  • packages/connector-rest/src/connector.test.ts
  • packages/connector-rest/src/connector.ts
  • packages/connector-rest/src/index.ts
  • packages/connector-rest/src/types.ts
  • packages/connector-rest/tsconfig.json
  • packages/connector-rest/vitest.config.ts

- cognitive-complexity: extracted per-strategy functions (fetchOffsetPages,
  fetchPageNumberPages, fetchCursorPages, fetchLinkHeaderPages,
  resolveNextCursor, parseLinkNext) so fetchAllPages is a thin dispatcher.
- slow-regex: replaced Link-header regex with a two-step split+indexOf
  approach (split on >,:, then indexOf < >) to avoid backtracking.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5b85058ba8

SECURITY (credential-path, fail-closed):
- SSRF + credential leak: link-header pagination now forwards the vault-
  resolved Bearer token ONLY to a same-origin rel="next" URL. A cross-origin
  (or unresolvable) next-link stops pagination — the token is never sent
  off-origin. Origin is anchored to the immutable config endpoint, so a
  link chain cannot walk the trusted origin forward.
- Fail-closed credentials: #authHeaders throws BEFORE any fetch when authRef
  is configured but the resolver yields no token — never issues an
  unauthenticated request as if it were fine. Routes both connect() and query().
- Prototype pollution: __proto__/constructor/prototype response keys are
  dropped (cannot be real columns; also crash apache-arrow); row + column
  objects are null-prototype.
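The same-origin rule for next-links can be sketched as below, assuming the standard `URL` class. The function name and parameters are hypothetical; the points taken from the commit message are that the trusted origin comes from the config endpoint and that a cross-origin or unresolvable link stops pagination.

```typescript
// Hypothetical sketch of the same-origin check for rel="next" links.
function resolveSameOriginNext(
  nextLink: string,
  currentPageUrl: string,
  configEndpoint: string,
): string | null {
  try {
    // Relative links resolve against the page that returned them.
    const next = new URL(nextLink, currentPageUrl);
    // The trusted origin is anchored to the configured endpoint, never to a followed link.
    const trustedOrigin = new URL(configEndpoint).origin;
    return next.origin === trustedOrigin ? next.href : null;
  } catch {
    return null; // unresolvable link: stop pagination rather than guess
  }
}

const endpoint = "https://api.example.com/items";
console.log(resolveSameOriginNext("/items?page=2", `${endpoint}?page=1`, endpoint));
console.log(resolveSameOriginNext("https://evil.example.net/steal", `${endpoint}?page=1`, endpoint));
```

A `null` result means the caller should stop paginating, so the Bearer token is not attached to any request outside the trusted origin.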

Correctness:
- parseLinkNext: RFC-5988-correct entry split (tracks <...> bracket depth so a
  comma inside a URL doesn't split the entry; per-entry rel check fixes the
  prev-before-next bug; tokenized rel match rejects rel=nextpage). No
  backtracking-prone regex.
- Query limit pushdown: pagination stops once offset+limit rows are collected
  instead of walking the whole remote chain then slicing.
- Relative Link headers resolved against the current page URL before fetch.
- pageSize coerced from declarative string config before offset arithmetic.
- Value coercion aligns with the engine's inference contract (string values via
  parseStringValueByType: "" → null not 0, yes/no → boolean; native primitives
  via parsePrimitiveValueByType) so the Arrow schema matches the inferred fields.
- fieldMap collisions rejected (config-static two-sources-one-target, and
  per-row target-onto-existing-key) before any network work.
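A regex-light Link-header parser along the lines described for parseLinkNext could look like the sketch below. It is an independent illustration under stated assumptions, not the merged code: entries are split on commas outside `<...>`, each entry's rel value is tokenized, and only an exact `next` token matches. It does not handle commas inside quoted parameter values.

```typescript
// Hypothetical sketch of Link-header parsing; helper names are assumed.
function splitLinkEntries(header: string): string[] {
  const entries: string[] = [];
  let inBrackets = false;
  let start = 0;
  for (let i = 0; i < header.length; i++) {
    const ch = header[i];
    if (ch === "<") inBrackets = true;
    else if (ch === ">") inBrackets = false;
    else if (ch === "," && !inBrackets) {
      // Only a comma outside <...> separates two link entries.
      entries.push(header.slice(start, i));
      start = i + 1;
    }
  }
  entries.push(header.slice(start));
  return entries;
}

function parseLinkNext(header: string): string | null {
  for (const entry of splitLinkEntries(header)) {
    const open = entry.indexOf("<");
    const close = entry.indexOf(">", open + 1);
    if (open === -1 || close === -1) continue;
    const url = entry.slice(open + 1, close);
    for (const param of entry.slice(close + 1).split(";")) {
      const eq = param.indexOf("=");
      if (eq === -1) continue;
      if (param.slice(0, eq).trim().toLowerCase() !== "rel") continue;
      const value = param.slice(eq + 1).trim().replace(/^"|"$/g, "");
      // rel may carry several space-separated relation types; match whole tokens only.
      if (value.split(/\s+/).some((t) => t.toLowerCase() === "next")) return url;
    }
  }
  return null;
}

const linkHeader =
  '<https://api.example.com/items?page=1>; rel="prev", <https://api.example.com/items?ids=1,2&page=3>; rel="next"';
console.log(parseLinkNext(linkHeader));
```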

Tests: +17 covering cross-origin token non-forwarding, fail-closed throw-before-
fetch, __proto__ non-pollution, limit pushdown, relative/comma-in-URL/prev-
before-next/rel=nextpage Link parsing, string pageSize, empty-string→null,
yes/no→boolean, and fieldMap collisions. 41 passing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@youhaowei youhaowei merged commit 0ea9a35 into main Jun 17, 2026
11 checks passed
@youhaowei youhaowei deleted the charixandra/yw-140-generic-rest-connector branch June 17, 2026 15:22