Fixes 29147: add support for databricks contraints in non-incremental ingestion#29148
Fixes 29147: add support for databricks contraints in non-incremental ingestion#29148mmigdiso wants to merge 4 commits into
Conversation
❌ PR checklist incompleteThis PR cannot be merged until the following are addressed on its linked issue:
The fields live on the linked issue in the Shipping project (open the issue → right sidebar → Projects). After you set them, re-run this check (or push a commit) — issue/project changes do not re-trigger it automatically. Maintainers can bypass this check by adding the |
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
| def _get_tables_with_constraints(self) -> set[tuple[str, str, str]]: | ||
| """ | ||
| Build and execute SQL query to fetch table constraints. | ||
| Handles cases where catalog_name and/or schema_name may be None. | ||
| """ | ||
| schema_name = self.context.get().database_schema | ||
| catalog_name = self.context.get().database | ||
|
|
||
| sql = UNITY_CATALOG_TABLE_CONSTRAINTS | ||
| params = {} | ||
| # Build WHERE clause with proper handling of None values | ||
|
|
||
| if catalog_name is not None: | ||
| sql += " AND table_catalog = :catalog_name" | ||
| params["catalog_name"] = catalog_name |
There was a problem hiding this comment.
💡 Quality: No tests added for new constraint-fetch logic
The PR adds a new code path (_get_tables_with_constraints and the conditional full-fetch in get_tables_name_and_type) but includes no unit tests, and the PR checklist for tests is unchecked. Worth adding a test that mocks client.tables.list/get and the constraints query to verify only tables present in the constraints set trigger the extra get() call, and that constraints are propagated into the resulting table request.
Was this helpful? React with 👍 / 👎
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
7bdbf65 to
2e62c1c
Compare
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
| msg = ( | ||
| f"Unexpected exception in fetching constraints " | ||
| f"(table [{table.full_name}]: {exc}. " | ||
| f"Contraints will be ignored." | ||
| ) |
There was a problem hiding this comment.
💡 Quality: Malformed warning message: unbalanced bracket and typo
The warning string built when client.tables.get() fails has a syntax/readability defect and a typo. The fragment f"(table [{table.full_name}]: {exc}. " opens ( and [ but never closes them, producing output like (table [cat.sch.tbl]: <err>. And "Contraints" should be "Constraints". Since this message is surfaced to users via self.status.warning, clean it up.
Fix the unbalanced brackets and the 'Contraints' typo.:
msg = (
f"Unexpected exception fetching constraints "
f"for table [{table.full_name}]: {exc}. "
f"Constraints will be ignored."
)
- Apply fix
Check the box to apply the fix or reply for a change | Was this helpful? React with 👍 / 👎
There was a problem hiding this comment.
This comment is based on an outdated version of the code. The typo and the paranthesis mentioned in the comment doesn't exist.
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
8ad7435 to
0d5a5f0
Compare
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
Code Review 👍 Approved with suggestions 4 resolved / 6 findingsAdds support for Databricks constraints during non-incremental ingestion by fetching full table details for tables identified via information_schema. Resolves previous issues with exception handling and query efficiency, though unit tests for the new logic should be added. 💡 Quality: No tests added for new constraint-fetch logic📄 ingestion/src/metadata/ingestion/source/database/unitycatalog/metadata.py:343-357 📄 ingestion/src/metadata/ingestion/source/database/unitycatalog/queries.py:140-146 The PR adds a new code path ( 💡 Quality: Malformed warning message: unbalanced bracket and typo📄 ingestion/src/metadata/ingestion/source/database/unitycatalog/metadata.py:402-406 The warning string built when Fix the unbalanced brackets and the 'Contraints' typo.✅ 4 resolved✅ Bug: Unhandled tables.get() exception aborts whole schema ingestion
✅ Quality: Obvious comments and unused selected columns reduce clarity
✅ Bug: Debug leftover '+"test"' breaks all constraint fetches
✅ Edge Case: Constraint query scans whole metastore when catalog/schema are None
🤖 Prompt for agentsOptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
Describe your changes:
Fixes #29147
I worked on #29147 29147 because starting from Databricks Runtime 18.2, databricks allows defining non-forced unique contraints. Previously, when this feature was not in place, databricks was allowing foregin keys only for relationships linking to a primary key which was a big limitation. Now that it is possible to defined foreign keys to unique keys, it is possible for unity catalog users to define all column relationships in unity catalog.
Considering that this information is key for AI agents to discover table relationships, it is important to support it not only in incremental ingestion but also for full ingestions.
Type of change:
High-level design:
For each schema, the code fetchs the list of the tables that has contraints on it.
When iterating over the tables returned by databricks /tables api, if any table has contraints, the code makes an extra call to the /tables/{table_name} API to fetch full details, which include the table contraints.
Tests:
Use cases covered
Unit tests
Backend integration tests
Ingestion integration tests
Playwright (UI) tests
Manual testing performed
UI screen recording / screenshots:
Not applicable.
Checklist:
Fixes <issue-number>: <short explanation>Fixes #<issue-number>above.Greptile Summary
This PR extends the Unity Catalog ingestion source to fetch table constraint details during non-incremental (full) ingestion, mirroring what already works for incremental ingestion. For each schema, a SQL query against
system.information_schema.table_constraintsidentifies tables with constraints, and a targetedclient.tables.get()call then retrieves the full constraint metadata for each of those tables._get_tables_with_constraints()is a new helper that pre-builds the set of(catalog, schema, table)tuples that need a full fetch; it guards againstNonecontext and wraps the SQL call in a broad exception handler.get_tables_name_and_type()is updated to upgrade the lightweight listing object to a fully-detailed object when constraints are known to be present; astatus.warningis issued on fetch failure and the table is still processed without constraint data.UNITY_CATALOG_TABLE_CONSTRAINTSSQL constant inqueries.pyfollows the existingsystem.information_schemacross-catalog view pattern used byUNITY_CATALOG_EXTERNAL_TABLES.Confidence Score: 4/5
The new constraint-fetch path is safe to merge; the worst case on failure is that a table is ingested without constraint metadata — existing data is not corrupted and full ingestion continues.
The debug 'test' artifact from an earlier version has been removed and the SQL query is correct. The exception handler deliberately falls through to _process_table rather than skipping the table, which differs from the incremental path's pattern; whether this is intentional or an oversight is worth confirming before the PR lands, since a silent constraint-fetch failure results in constraint-free ingestion with only a warning log.
ingestion/src/metadata/ingestion/source/database/unitycatalog/metadata.py — specifically the exception-handling block in get_tables_name_and_type (lines 401–411).
Important Files Changed
Sequence Diagram
%%{init: {'theme': 'neutral'}}%% sequenceDiagram participant G as get_tables_name_and_type participant H as _get_tables_with_constraints participant DB as sql_connection (system.information_schema) participant API as client.tables (Databricks SDK) participant P as _process_table G->>H: call H->>DB: SELECT DISTINCT ... FROM system.information_schema.table_constraints DB-->>H: set of (catalog, schema, table) tuples H-->>G: tables_with_constraints set loop for each table in client.tables.list() G->>G: check if table in tables_with_constraints alt table has constraints G->>API: client.tables.get(table.full_name) API-->>G: table with full constraint details end G->>P: _process_table(table, catalog, schema) end%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%% sequenceDiagram participant G as get_tables_name_and_type participant H as _get_tables_with_constraints participant DB as sql_connection (system.information_schema) participant API as client.tables (Databricks SDK) participant P as _process_table G->>H: call H->>DB: SELECT DISTINCT ... FROM system.information_schema.table_constraints DB-->>H: set of (catalog, schema, table) tuples H-->>G: tables_with_constraints set loop for each table in client.tables.list() G->>G: check if table in tables_with_constraints alt table has constraints G->>API: client.tables.get(table.full_name) API-->>G: table with full constraint details end G->>P: _process_table(table, catalog, schema) endReviews (6): Last reviewed commit: "short circuit contraints collection if n..." | Re-trigger Greptile