feat(engine): [YW-140] generic declarative RestConnector (HTTP/JSON any-source)#149
Conversation
…or/page/link-header pagination Implements RestConnector, a new @dashframe/connector-rest package that interprets a declarative DataSource.config (endpoint, method, authRef, pagination, rowPath, fieldMap) to pull from an arbitrary HTTP/JSON source. - Four pagination strategies: offset, cursor, page-number, Link-header. Unsupported config patterns are logged as declarative-ceiling gaps for the v0.4 code-plugin tier. - Auth-blind data pipeline: constructor takes a bound SecretResolver (capability-attenuated lease pre-bound to one credential ref); connect() and query() carry no credential arguments — enforced by type. noopResolver provided for public endpoints. - rowPath dot-path extraction + fieldMap key renaming + Arrow IPC output via apache-arrow tableFromArrays/tableToIPC (base64-encoded, JSON-serializable across IPC boundary). - connect() does a live single-page test fetch (real-world side-effect, cannot ride a preview transaction). - Tests cover all four pagination strategies, rowPath/fieldMap mapping, vault-resolution of authRef via bound SecretResolver, and fail-closed resolver-rejection behaviour. - vitest.config.ts adds @wystack/secret-vault src alias (Vite can't resolve the bun export condition; mirrors pattern needed for CI test runs). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…noopResolver) When the secret resolver yields an empty token (noopResolver for public endpoints), do not send Authorization: Bearer — omit the header entirely. Updated test expectation to match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Warning Review limit reached
More reviews will be available in 27 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (7)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
- cognitive-complexity: extracted per-strategy functions (fetchOffsetPages, fetchPageNumberPages, fetchCursorPages, fetchLinkHeaderPages, resolveNextCursor, parseLinkNext) so fetchAllPages is a thin dispatcher. - slow-regex: replaced Link-header regex with a two-step split+indexOf approach (split on >,:, then indexOf < >) to avoid backtracking. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5b85058ba8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
SECURITY (credential-path, fail-closed): - SSRF + credential leak: link-header pagination now forwards the vault- resolved Bearer token ONLY to a same-origin rel="next" URL. A cross-origin (or unresolvable) next-link stops pagination — the token is never sent off-origin. Origin is anchored to the immutable config endpoint, so a link chain cannot walk the trusted origin forward. - Fail-closed credentials: #authHeaders throws BEFORE any fetch when authRef is configured but the resolver yields no token — never issues an unauthenticated request as if it were fine. Routes both connect() and query(). - Prototype pollution: __proto__/constructor/prototype response keys are dropped (cannot be real columns; also crash apache-arrow); row + column objects are null-prototype. Correctness: - parseLinkNext: RFC-5988-correct entry split (tracks <...> bracket depth so a comma inside a URL doesn't split the entry; per-entry rel check fixes the prev-before-next bug; tokenized rel match rejects rel=nextpage). No backtracking-prone regex. - Query limit pushdown: pagination stops once offset+limit rows are collected instead of walking the whole remote chain then slicing. - Relative Link headers resolved against the current page URL before fetch. - pageSize coerced from declarative string config before offset arithmetic. - Value coercion aligns with the engine's inference contract (string values via parseStringValueByType: "" → null not 0, yes/no → boolean; native primitives via parsePrimitiveValueByType) so the Arrow schema matches the inferred fields. - fieldMap collisions rejected (config-static two-sources-one-target, and per-row target-onto-existing-key) before any network work. Tests: +17 covering cross-origin token non-forwarding, fail-closed throw-before- fetch, __proto__ non-pollution, limit pushdown, relative/comma-in-URL/prev- before-next/rel=nextpage Link parsing, string pageSize, empty-string→null, yes/no→boolean, and fieldMap collisions. 41 passing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
@dashframe/connector-rest— a new package implementingRestConnector, a generic declarative HTTP/JSON connector that interpretsDataSource.config(endpoint, method, authRef, pagination, rowPath, fieldMap) to pull from any HTTP/JSON source. Config IS the mod — no executable code in config.authRefresolves server-side through theSecretResolverscoped lease (vault'swithSecret). Vault-absent + authRef-present → throw before any fetch (fail-closed). Plaintext never in config, formData, or any public interface.connect()performs a live single-page test fetch (real-world side-effect — cannot ride a preview transaction).query()returns{ arrowBuffer (base64), fieldIds, fields }— fully serializable over IPC. Renderer materializes the DataFrame from arrowBuffer.apache-arrowtableFromArrays/tableToIPC.Tracked internally as YW-140.
Test plan
bunx vitest runinpackages/connector-rest)next_cursoruntil absentLink: rel="next"header, stops when absent🤖 Generated with Claude Code
Greptile Summary
This PR introduces
@dashframe/connector-rest, a new declarative HTTP/JSON connector package that drives pagination, auth, row extraction, and Arrow serialization entirely from aRestConnectorConfigobject — no executable code in config.authRefis configured.inferStringColumnType, null-prototype row objects to block prototype pollution, and fieldMap collision detection; 24 tests cover all strategies, security edges, and coercion edge-cases.Confidence Score: 3/5
Safe to merge after addressing the cursor
"next"probe bug; the rest of the connector is well-guarded and thoroughly tested.The cursor pagination auto-detection probes
"next"as a fallback field name. On APIs that place the next-page URL in a response bodynextfield (a very common pattern), the connector silently duplicates page-1 rows until the row budget is exhausted — no error is raised, so callers receive wrong data without knowing it.packages/connector-rest/src/connector.ts — specifically the
resolveNextCursorfunction and thefetchPagebody-less implementation for non-GET methods.Important Files Changed
Sequence Diagram
%%{init: {'theme': 'neutral'}}%% sequenceDiagram participant Client participant RestConnector participant SecretVault as SecretResolver / Vault participant RemoteAPI Client->>RestConnector: query(databaseId, tableId, options?) RestConnector->>RestConnector: assertFieldMapNoCollision(fieldMap) RestConnector->>SecretVault: auth(callback) SecretVault-->>RestConnector: token (throws if authRef set and token empty) RestConnector->>RestConnector: authHeaders(token) → headers map loop fetchAllPages (offset / page-number / cursor / link-header) RestConnector->>RemoteAPI: "fetch(url, { method, headers })" RemoteAPI-->>RestConnector: JSON response RestConnector->>RestConnector: extractRows(data, rowPath) RestConnector->>RestConnector: check budget (sliceOffset + limit) end RestConnector->>RestConnector: filter and applyFieldMap(rows) RestConnector->>RestConnector: slice(sliceOffset, sliceOffset+limit) RestConnector->>RestConnector: inferFieldsFromRows(limitedRows) RestConnector->>RestConnector: coerceValueToType() per cell RestConnector->>RestConnector: tableFromArrays() → tableToIPC() → base64 RestConnector-->>Client: "{ arrowBuffer, fieldIds, fields }"%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%% sequenceDiagram participant Client participant RestConnector participant SecretVault as SecretResolver / Vault participant RemoteAPI Client->>RestConnector: query(databaseId, tableId, options?) RestConnector->>RestConnector: assertFieldMapNoCollision(fieldMap) RestConnector->>SecretVault: auth(callback) SecretVault-->>RestConnector: token (throws if authRef set and token empty) RestConnector->>RestConnector: authHeaders(token) → headers map loop fetchAllPages (offset / page-number / cursor / link-header) RestConnector->>RemoteAPI: "fetch(url, { method, headers })" RemoteAPI-->>RestConnector: JSON response RestConnector->>RestConnector: extractRows(data, rowPath) RestConnector->>RestConnector: check budget (sliceOffset + limit) end RestConnector->>RestConnector: filter and applyFieldMap(rows) RestConnector->>RestConnector: slice(sliceOffset, sliceOffset+limit) RestConnector->>RestConnector: inferFieldsFromRows(limitedRows) RestConnector->>RestConnector: coerceValueToType() per cell RestConnector->>RestConnector: tableFromArrays() → tableToIPC() → base64 RestConnector-->>Client: "{ arrowBuffer, fieldIds, fields }"Comments Outside Diff (2)
packages/connector-rest/src/connector.ts, line 751-806 (link)The
cursorandlink-headercases have no upper bound on iterations. If an API always returns a non-empty cursor or aLink: rel="next"header (e.g., a bug on the remote end, a circular ref, or a misconfiguredcursorPath),fetchAllPageswill loop indefinitely and thequery()call will never return. Theoffsetandpage-numbercases are partially protected by therows.length === 0check but are similarly unbounded for row accumulation.A simple guard — e.g.,
const MAX_PAGES = 1_000; let pages = 0;at the top offetchAllPageswithif (++pages > MAX_PAGES) { console.warn(...); break; }at the top of each loop — would prevent this without changing normal behaviour.Prompt To Fix With AI
packages/connector-rest/src/connector.ts, line 966-983 (link)QueryOptionsrow limit is appliedfetchAllPagesdownloads every page intorawRowsbeforeoptions?.pagination?.limitslices the result. If the remote API serves thousands of pages and the UI requests only 50 rows, the connector still fetches and materializes the full dataset in memory. For the v0.1 scope this may be acceptable, but the currentoptions.paginationslice gives callers no way to avoid the full download — they cannot tell from the interface that specifying a smalllimitwon't reduce network traffic.Prompt To Fix With AI
Reviews (3): Last reviewed commit: "fix(connector-rest): address security + ..." | Re-trigger Greptile