feat: capture managed-materialize output schema as asset metadata (#2a) #9812
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| DROP TABLE IF EXISTS materialized_asset_schema; |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,37 @@ | ||
| -- Captured output schema of a managed `// materialize` asset (gap #2a). | ||
| -- After a managed materialize commits, the worker DESCRIBEs the written table | ||
| -- and records its column list here as asset-level metadata. Schema is a | ||
| -- property of the asset/table, not of a partition slice, so it lives in its own | ||
| -- table keyed by (workspace, asset_kind, asset_path) rather than as a column on | ||
| -- materialized_partition (which would duplicate the identical schema across | ||
| -- every partition row). This is the producer-side capture that #2b (save-time | ||
| -- consumer-ref contract enforcement) reads back. | ||
| -- | ||
| -- Versioning across re-materializations: a new `version` row is inserted only | ||
| -- when the captured column set differs from the latest stored version for the | ||
| -- asset; an unchanged re-materialize re-affirms the latest row in place. So the | ||
| -- table is a compact schema-evolution history and MAX(version) is the current | ||
| -- contract. | ||
| CREATE TABLE IF NOT EXISTS materialized_asset_schema ( | ||
| workspace_id VARCHAR(50) NOT NULL REFERENCES workspace(id) ON DELETE CASCADE ON UPDATE CASCADE, | ||
| asset_kind ASSET_KIND NOT NULL, | ||
| asset_path VARCHAR(255) NOT NULL, | ||
| -- Monotonic per (workspace, asset_kind, asset_path), starting at 1; only | ||
| -- bumped when the schema actually changes. | ||
| version BIGINT NOT NULL, | ||
| -- The captured columns, ordered as the table presents them: | ||
| -- [{"name": "...", "type": "..."}, ...]. | ||
| columns JSONB NOT NULL, | ||
| -- DuckLake snapshot the schema was captured from (NULL for non-ducklake / | ||
| -- substrates without snapshots). | ||
| snapshot_id BIGINT, | ||
| job_id UUID, | ||
| captured_at TIMESTAMPTZ NOT NULL DEFAULT now(), | ||
| PRIMARY KEY (workspace_id, asset_kind, asset_path, version) | ||
| ); | ||
| | ||
| -- Default privileges (migration 20250205131523) only apply to objects created | ||
| -- by the role that set them, so grant explicitly — the API reads/writes this | ||
| -- table as the invoking role (same fix as script_trigger in 20260619112847). | ||
| GRANT ALL ON materialized_asset_schema TO windmill_user; | ||
| GRANT ALL ON materialized_asset_schema TO windmill_admin; |
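
The versioning rule described in the migration comment (insert a new `version` row only when the captured columns differ from the latest stored version, otherwise re-affirm the latest row) could be sketched roughly as below. This is a minimal illustration in Python, not the worker's actual code: the function name `next_version` and the in-memory `(version, columns)` pair are hypothetical, and the sketch assumes that column order counts as part of the schema, which the migration does not state explicitly.

```python
from typing import Optional

Column = dict[str, str]  # {"name": ..., "type": ...}


def next_version(latest: Optional[tuple[int, list[Column]]],
                 captured: list[Column]) -> tuple[int, bool]:
    """Return (version, is_new_row) for a freshly captured column list.

    `latest` is the (version, columns) pair with the highest stored version
    for the asset, or None if nothing has been recorded yet.
    Assumption: list equality is used, so a column reorder counts as a change.
    """
    if latest is None:
        return 1, True  # first capture starts at version 1
    version, columns = latest
    if columns == captured:
        return version, False  # unchanged: re-affirm the latest row in place
    return version + 1, True  # schema changed: insert a new version row


v1 = [{"name": "id", "type": "BIGINT"}]
v2 = v1 + [{"name": "email", "type": "VARCHAR"}]

first = next_version(None, v1)        # no history yet
same = next_version((1, v1), v1)      # identical re-materialize
changed = next_version((1, v1), v2)   # a column was added
```

Under this rule the stored rows stay a compact history: repeated identical materializations never grow the table, and the highest version is always the current contract.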
P1: for partitioned assets, the captured `output_schema` includes the synthetic `_wm_partition` column.

This `DESCRIBE`s the target table (`{target_qualified}`), not the user's output SELECT. For a `// partitioned` materialize, the managed table is bootstrapped with the extra partition column.

So `DESCRIBE SELECT * FROM {target_qualified}` returns the producer's columns plus `{"name":"_wm_partition","type":"VARCHAR"}`, and every partitioned asset's recorded schema carries that windmill-internal column. Since this is exactly the contract #2b enforcement will read back (`MAX(version)` = current contract), a consumer that declares only the real producer columns would mismatch on the leaked `_wm_partition`.

Two options: `DESCRIBE` the user's output (`DESCRIBE SELECT * FROM ({user_select})`) instead of the physical table, or filter the partition column out of the captured list. The unpartitioned path is unaffected. Worth confirming this is intentional before it sets the grain #2b depends on.
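
The second option, filtering the partition column out of the captured list, might look something like the following. This is only a sketch in Python under stated assumptions: `strip_internal_columns` and `INTERNAL_COLUMNS` are hypothetical names, and the input is assumed to already be in the `[{"name": ..., "type": ...}]` shape that the `columns` JSONB stores.

```python
# Hypothetical set of windmill-internal column names to hide from the contract.
INTERNAL_COLUMNS = frozenset({"_wm_partition"})


def strip_internal_columns(columns, internal=INTERNAL_COLUMNS):
    """Drop internal columns from a DESCRIBE result, preserving column order."""
    return [c for c in columns if c["name"] not in internal]


# What DESCRIBE on a partitioned target table would report, per the comment above.
described = [
    {"name": "id", "type": "BIGINT"},
    {"name": "amount", "type": "DOUBLE"},
    {"name": "_wm_partition", "type": "VARCHAR"},
]

# The list that would be recorded as the asset's schema.
contract = strip_internal_columns(described)
```

One caveat with a name-based filter: a producer column that is itself called `_wm_partition` would also be dropped, which the `DESCRIBE SELECT * FROM ({user_select})` option avoids.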