feat(data-drains): add GCS, Azure Blob, BigQuery, Snowflake, and Datadog destinations #4552
waleedlatif1 wants to merge 10 commits into
PR Summary
High Risk
Overview
Adds new destination implementations with provider-specific auth and delivery semantics (object-store uploads, BigQuery
Updates the enterprise settings UI to configure these destinations (new form specs, destination labels/icons) and expands enterprise docs to describe the new destination options and their configuration.
Reviewed by Cursor Bugbot for commit 98ca6e5.
Greptile Summary
This PR adds five new data-drain destinations (GCS, Azure Blob Storage, BigQuery, Snowflake, and Datadog) with a shared
Confidence Score: 5/5
Safe to merge: all destination implementations are additive, the DB migration is a non-destructive enum extension, and the previous round of reviewer feedback has been addressed.
The remaining findings are non-blocking: the Snowflake retry loop skips jitter, the BigQuery 401 path leaves an unconsumed response body, and both the GCS and Snowflake API contract schemas omit the regex validators present in the runtime schemas. All of these produce suboptimal-but-recoverable behaviour rather than data loss or incorrect delivery.
The API contract file (
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Drain Run] --> B[NDJSON Body]
B --> C{Destination Type}
C -->|s3 / gcs / azure_blob| D[Object Store]
C -->|bigquery| E[BigQuery tabledata.insertAll]
C -->|snowflake| F[Snowflake SQL API v2]
C -->|datadog| G[Datadog v2 Logs]
C -->|webhook| H[HTTPS Webhook]
D -->|GCS| D1[google-auth-library JWT + fetchWithRetry]
D -->|Azure| D2[azure/storage-blob SDK]
D -->|S3| D3[AWS SDK PutObjectCommand]
E --> E1[google-auth-library JWT, 401 token refresh, sleepUntilAborted on 5xx/429]
F --> F1[jose SignJWT RS256, 202 poll loop, VARIANT 16MB guard]
G --> G1[gzipSync, 5MB uncompressed + 6MB wire guards, all sites]
D1 & D2 & D3 --> OBJ[date-partitioned NDJSON object key]
E1 & F1 & G1 & H --> DONE[DeliveryResult with locator]
OBJ --> DONE
Reviews (9): Last reviewed commit: "improvement(data-drains): consolidate ba..."
@greptile
@cursor review
…tKey contract
- BigQuery insertAll now wraps the fetch in try/catch inside the retry loop so DNS failures, socket resets, and timeouts are retried with backoff instead of propagating immediately.
- Align azureBlobCredentialsBodySchema with the runtime schema (min 64 / max 120 / base64 regex) so obviously invalid keys are rejected at the API boundary rather than at drain-run time.
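The first bullet above can be illustrated with a minimal sketch. It is not the PR's code: `insertWithRetry`, `MinimalResponse`, the attempt count, and the doubling delay are all hypothetical. It only shows the shape of the fix, where a thrown network error is caught inside the loop and treated like a retryable status instead of escaping on the first attempt.

```typescript
// Hypothetical sketch: names, attempt count, and delays are assumptions, not the PR's code.
type MinimalResponse = { status: number };

async function insertWithRetry(
  doFetch: () => Promise<MinimalResponse>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<MinimalResponse> {
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const res = await doFetch();
      // Anything other than 429 or 5xx is returned to the caller as-is.
      if (res.status !== 429 && res.status < 500) return res;
      lastError = new Error(`retryable HTTP status ${res.status}`);
    } catch (err) {
      // DNS failures, socket resets, and timeouts land here and are retried.
      lastError = err;
    }
    // Back off before the next attempt, but not after the final one.
    if (attempt < maxAttempts - 1) await sleep(500 * 2 ** attempt);
  }
  throw lastError;
}
```

Passing the fetch call and the sleep function in as parameters keeps the sketch testable without a network or real timers.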
@greptile
@cursor review
…JSON line context
- Extract a single parseRetryAfter helper (capped at 30s, returns number | null) into lib/data-drains/destinations/utils.ts and remove the five local copies in bigquery, datadog, gcs, snowflake, and webhook.
- Datadog parseNdjson now wraps JSON.parse in try/catch and surfaces the failing line index, matching BigQuery's parser.
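As a rough illustration of the two helpers this commit describes, here is a sketch under stated assumptions. Only the names `parseRetryAfter` and `parseNdjson`, the 30s cap, and the `number | null` return type come from the commit message. The millisecond unit, the handling of HTTP-date values, the zero-based line index, and the error wording are guesses and may differ from the PR.

```typescript
// Sketch only: units, date handling, and error text are assumptions.
const MAX_RETRY_AFTER_MS = 30_000;

function parseRetryAfter(header: string | null, now: number = Date.now()): number | null {
  if (header === null) return null;
  const value = header.trim();
  if (value === "") return null;
  // Retry-After may be a whole number of seconds...
  if (/^\d+$/.test(value)) {
    return Math.min(Number(value) * 1000, MAX_RETRY_AFTER_MS);
  }
  // ...or an HTTP-date, which is converted to a delay relative to `now`.
  const date = Date.parse(value);
  if (Number.isNaN(date)) return null;
  return Math.min(Math.max(date - now, 0), MAX_RETRY_AFTER_MS);
}

function parseNdjson(body: string): unknown[] {
  const rows: unknown[] = [];
  body.split("\n").forEach((raw, index) => {
    const line = raw.trim();
    if (line === "") return; // tolerate blank lines and a trailing newline
    try {
      rows.push(JSON.parse(line));
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      // Surface which line failed so a bad record can be located.
      throw new Error(`Invalid NDJSON at line index ${index}: ${reason}`);
    }
  });
  return rows;
}
```

Returning `null` rather than `0` for a missing or unparseable header lets the caller distinguish "no hint from the server" from "retry immediately".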
@greptile
@cursor review
- Datadog payload guard now checks the uncompressed size against the 5 MB limit and the wire size against the 6 MB compressed limit, so gzip cannot smuggle an oversized body past the client-side check.
- Snowflake VARIANT limit is 16 MiB (16,777,216 bytes), not 16,000,000 bytes; small payloads between 16 MB and 16 MiB were being rejected unnecessarily.
- Drop the unused apiKey field on Datadog PostInput; the key is already embedded in the prepared request headers.
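The size guards in the first two bullets might look roughly like the sketch below. The function and constant names are hypothetical, and the exact byte values chosen for "5 MB" and "6 MB" (decimal megabytes here) are an assumption; only the 16,777,216-byte Snowflake figure is stated in the commit message.

```typescript
import { gzipSync } from "node:zlib";

// Assumption: the commit says "5 MB" and "6 MB"; the exact byte values may differ.
const DATADOG_MAX_UNCOMPRESSED_BYTES = 5 * 1000 * 1000;
const DATADOG_MAX_WIRE_BYTES = 6 * 1000 * 1000;
// Stated in the commit: 16 MiB, not 16,000,000 bytes.
const SNOWFLAKE_VARIANT_MAX_BYTES = 16 * 1024 * 1024;

// Hypothetical helper: check the raw size first, then the gzipped wire size.
function prepareDatadogBody(json: string): Uint8Array {
  const raw = new TextEncoder().encode(json); // byte length, not string length
  if (raw.byteLength > DATADOG_MAX_UNCOMPRESSED_BYTES) {
    throw new Error(`uncompressed payload is ${raw.byteLength} bytes, over the limit`);
  }
  const wire = gzipSync(raw);
  if (wire.byteLength > DATADOG_MAX_WIRE_BYTES) {
    throw new Error(`compressed payload is ${wire.byteLength} bytes, over the limit`);
  }
  return wire;
}

// Hypothetical helper: a row fits if its UTF-8 encoding is at most 16 MiB.
function fitsSnowflakeVariant(row: string): boolean {
  return new TextEncoder().encode(row).byteLength <= SNOWFLAKE_VARIANT_MAX_BYTES;
}
```

Checking the uncompressed size before compressing is what stops a highly compressible but oversized body from passing on wire size alone.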
@greptile
@cursor review
…tils
Datadog, GCS, and webhook each had byte-identical backoff helpers (BASE 500ms, MAX 30s, jitter ±20%, Retry-After floor). Lift the helper into lib/data-drains/destinations/utils.ts alongside parseRetryAfter and sleepUntilAborted, and drop the per-file copies and their BASE_BACKOFF_MS/MAX_BACKOFF_MS constants.
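A helper with the parameters listed above could be sketched as follows. The commit does not name the function or show its body, so `backoffWithJitter`, the doubling growth curve, and the order in which the cap, jitter, and Retry-After floor are applied are all assumptions; only the 500ms base, 30s maximum, ±20% jitter, and the existence of a Retry-After floor come from the message.

```typescript
// Sketch only: function name, growth curve, and ordering of steps are assumptions.
const BASE_BACKOFF_MS = 500;
const MAX_BACKOFF_MS = 30_000;
const JITTER_RATIO = 0.2;

function backoffWithJitter(
  attempt: number,
  retryAfterMs: number | null = null,
  random: () => number = Math.random,
): number {
  // Exponential growth from the base, capped at the maximum.
  const exponential = Math.min(MAX_BACKOFF_MS, BASE_BACKOFF_MS * 2 ** attempt);
  // Scale by a factor in [0.8, 1.2) so concurrent retries spread out.
  const factor = 1 + (random() * 2 - 1) * JITTER_RATIO;
  const delay = exponential * factor;
  // Never wait less than the server asked for via Retry-After.
  return retryAfterMs === null ? delay : Math.max(delay, retryAfterMs);
}
```

Injecting `random` makes the jitter deterministic in tests; callers would normally rely on the `Math.random` default.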
@greptile
@cursor review
✅ Bugbot reviewed your changes and found no new issues!
Summary
- test() + openSession()/deliver() with provider-spec-correct auth, retry, byte-accurate size guards, and abort-signal forwarding
- tabledata.insertAll with drainId-prefixed insertId dedup, partial-failure surfacing, 401 token refresh + 5xx/429 retry
- ap2goog/google)
- @azure/storage-blob SDK with sovereign-cloud endpointSuffix support
- enterprise/data-drains.mdx
Type of Change
Testing
Tested manually. New unit tests cover schema validation, retry paths, byte-accurate size guards, gzip, sovereign-cloud routing, and partial-failure handling per destination.
Checklist