
Fixes 29147: add support for Databricks constraints in non-incremental ingestion #29148

Open
mmigdiso wants to merge 4 commits into open-metadata:main from mmigdiso:29147_unity_contraints_fix

Conversation

mmigdiso commented Jun 17, 2026

Contributor

Describe your changes:

Fixes #29147

I worked on #29147 because, starting with Databricks Runtime 18.2, Databricks allows defining non-enforced unique constraints. Previously, without this feature, Databricks allowed foreign keys only on relationships that referenced a primary key, which was a significant limitation. Now that foreign keys can reference unique keys, Unity Catalog users can define all of their column relationships in Unity Catalog.
Because this information is key for AI agents to discover table relationships, it should be supported not only in incremental ingestion but also in full ingestion.

Type of change:

  • Bug fix

High-level design:

For each schema, the code fetches the list of tables that have constraints defined on them.
While iterating over the tables returned by the Databricks /tables API, if a table has constraints, the code makes an extra call to the /tables/{table_name} API to fetch its full details, which include the table constraints.
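A minimal sketch of the per-schema loop described above, assuming a client that exposes `tables.list(catalog_name=..., schema_name=...)` and `tables.get(full_name)` in the style of the Databricks SDK `client.tables` API. `FakeTable`, `FakeTablesAPI`, and `iter_tables_with_constraints` are hypothetical in-memory stand-ins, not the connector's code. When the extra `get()` fails, the sketch records a warning and still yields the lightweight listing object, so the table is ingested without constraints instead of aborting the rest of the schema.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


@dataclass
class FakeTable:
    """Hypothetical stand-in for a Unity Catalog table listing/detail object."""

    catalog_name: str
    schema_name: str
    name: str
    table_constraints: List[str] = field(default_factory=list)

    @property
    def full_name(self) -> str:
        return f"{self.catalog_name}.{self.schema_name}.{self.name}"


class FakeTablesAPI:
    """Hypothetical in-memory stand-in for client.tables (list + get)."""

    def __init__(self, tables, detailed, failing=()):
        self._tables = tables
        self._detailed = detailed  # full_name -> fully detailed FakeTable
        self._failing = set(failing)
        self.get_calls: List[str] = []

    def list(self, catalog_name: str, schema_name: str) -> List[FakeTable]:
        return [
            t for t in self._tables
            if t.catalog_name == catalog_name and t.schema_name == schema_name
        ]

    def get(self, full_name: str) -> FakeTable:
        self.get_calls.append(full_name)
        if full_name in self._failing:
            raise RuntimeError("permission denied")
        return self._detailed[full_name]


def iter_tables_with_constraints(
    tables_api: FakeTablesAPI,
    catalog: str,
    schema: str,
    tables_with_constraints: Set[Tuple[str, str, str]],
    warnings: List[str],
) -> Iterator[FakeTable]:
    """Yield every listed table, upgrading only those flagged as having constraints."""
    for table in tables_api.list(catalog_name=catalog, schema_name=schema):
        if (table.catalog_name, table.schema_name, table.name) in tables_with_constraints:
            try:
                # The extra per-table call happens only for flagged tables.
                table = tables_api.get(table.full_name)
            except Exception as exc:  # one failing table must not abort the schema
                warnings.append(
                    f"Unexpected exception fetching constraints for table "
                    f"[{table.full_name}]: {exc}. Constraints will be ignored."
                )
        # On failure the lightweight listing object is still yielded (fall-through).
        yield table
```

Keying the lookup on the `(catalog, schema, table)` tuple keeps the extra API traffic proportional to the number of constrained tables rather than to the size of the schema.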

Tests:

  • Use cases covered
  • Unit tests
  • Backend integration tests
  • Ingestion integration tests
  • Playwright (UI) tests
  • Manual testing performed

UI screen recording / screenshots:

Not applicable.

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • My PR is linked to a GitHub issue via Fixes #<issue-number> above.
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.
  • For UI changes: I attached a screen recording and/or screenshots above.
  • I have added tests (unit / integration / Playwright as applicable) and listed them above.

Greptile Summary

This PR extends the Unity Catalog ingestion source to fetch table constraint details during non-incremental (full) ingestion, mirroring what already works for incremental ingestion. For each schema, a SQL query against system.information_schema.table_constraints identifies tables with constraints, and a targeted client.tables.get() call then retrieves the full constraint metadata for each of those tables.

  • _get_tables_with_constraints() is a new helper that pre-builds the set of (catalog, schema, table) tuples that need a full fetch; it guards against None context and wraps the SQL call in a broad exception handler.
  • Non-incremental loop in get_tables_name_and_type() is updated to upgrade the lightweight listing object to a fully-detailed object when constraints are known to be present; a status.warning is issued on fetch failure and the table is still processed without constraint data.
  • The new UNITY_CATALOG_TABLE_CONSTRAINTS SQL constant in queries.py follows the existing system.information_schema cross-catalog view pattern used by UNITY_CATALOG_EXTERNAL_TABLES.
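To illustrate why the query selects only the three identity columns with `SELECT DISTINCT`, here is a small sketch that runs a query of that shape against an in-memory SQLite table standing in for `system.information_schema.table_constraints` (SQLite has no three-part names, so the view is emulated as a local table). The query text, the `WHERE 1=1` base, and the helper `tables_with_constraints` are assumptions for illustration; the real constant lives in `queries.py` and is executed through the connector's SQL connection. A table with both a primary key and a foreign key still yields a single key.

```python
import sqlite3
from typing import Set, Tuple

TableKey = Tuple[str, str, str]

# Stand-in for system.information_schema.table_constraints (emulated locally).
SETUP = """
CREATE TABLE table_constraints (
    table_catalog TEXT, table_schema TEXT, table_name TEXT,
    constraint_name TEXT, constraint_type TEXT
);
"""

# Illustrative query shape: only the three identity columns, de-duplicated.
CONSTRAINTS_SQL = (
    "SELECT DISTINCT table_catalog, table_schema, table_name "
    "FROM table_constraints WHERE 1=1"
    " AND table_catalog = :catalog_name AND table_schema = :schema_name"
)


def tables_with_constraints(conn: sqlite3.Connection, catalog: str, schema: str) -> Set[TableKey]:
    """Return the (catalog, schema, table) keys of tables that have any constraint."""
    rows = conn.execute(CONSTRAINTS_SQL, {"catalog_name": catalog, "schema_name": schema})
    return {(c, s, t) for c, s, t in rows}


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
conn.executemany(
    "INSERT INTO table_constraints VALUES (?, ?, ?, ?, ?)",
    [
        ("main", "sales", "orders", "orders_pk", "PRIMARY KEY"),
        ("main", "sales", "orders", "orders_customer_fk", "FOREIGN KEY"),
        ("main", "sales", "customers", "customers_email_uq", "UNIQUE"),
        ("main", "hr", "employees", "employees_pk", "PRIMARY KEY"),
    ],
)
print(sorted(tables_with_constraints(conn, "main", "sales")))
# [('main', 'sales', 'customers'), ('main', 'sales', 'orders')]
```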

Confidence Score: 4/5

The new constraint-fetch path is safe to merge; the worst case on failure is that a table is ingested without constraint metadata — existing data is not corrupted and full ingestion continues.

The debug 'test' artifact from an earlier version has been removed and the SQL query is correct. The exception handler deliberately falls through to _process_table rather than skipping the table, which differs from the incremental path's pattern; whether this is intentional or an oversight is worth confirming before the PR lands, since a silent constraint-fetch failure results in constraint-free ingestion with only a warning log.

ingestion/src/metadata/ingestion/source/database/unitycatalog/metadata.py — specifically the exception-handling block in get_tables_name_and_type (lines 401–411).

Important Files Changed

  • ingestion/src/metadata/ingestion/source/database/unitycatalog/metadata.py: Adds _get_tables_with_constraints() to pre-fetch tables that have constraints and client.tables.get() upgrades inside the non-incremental loop. Minor dead-code issue: the if catalog_name is not None / if schema_name is not None inner checks are unreachable due to the early-return guard above them.
  • ingestion/src/metadata/ingestion/source/database/unitycatalog/queries.py: Adds UNITY_CATALOG_TABLE_CONSTRAINTS SQL constant using SELECT DISTINCT on the three identity columns from system.information_schema.table_constraints, consistent with the existing UNITY_CATALOG_EXTERNAL_TABLES pattern.
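The dead-code remark can be illustrated with a hypothetical pure function that keeps a single early-return guard and then adds both filters unconditionally. `build_constraints_query` and the exact SQL text are assumptions, not the connector's code; the PR's helper also executes the query and converts the rows into a set, which is omitted here.

```python
from typing import Dict, Optional, Tuple

# Assumed shape of the constant; the real text lives in queries.py.
UNITY_CATALOG_TABLE_CONSTRAINTS = (
    "SELECT DISTINCT table_catalog, table_schema, table_name "
    "FROM system.information_schema.table_constraints WHERE 1=1"
)


def build_constraints_query(
    catalog_name: Optional[str], schema_name: Optional[str]
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (sql, params) scoped to one schema, or None if the context is incomplete."""
    # Single guard: without both values the query would scan every constraint in the metastore.
    if catalog_name is None or schema_name is None:
        return None
    # Past the guard both values are known, so no per-filter None checks are needed.
    sql = (
        UNITY_CATALOG_TABLE_CONSTRAINTS
        + " AND table_catalog = :catalog_name"
        + " AND table_schema = :schema_name"
    )
    return sql, {"catalog_name": catalog_name, "schema_name": schema_name}
```

A caller would return an empty set when the function returns `None`, which also avoids the unscoped metastore-wide scan raised in the review.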

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant G as get_tables_name_and_type
    participant H as _get_tables_with_constraints
    participant DB as sql_connection (system.information_schema)
    participant API as client.tables (Databricks SDK)
    participant P as _process_table

    G->>H: call
    H->>DB: SELECT DISTINCT ... FROM system.information_schema.table_constraints
    DB-->>H: set of (catalog, schema, table) tuples
    H-->>G: tables_with_constraints set

    loop for each table in client.tables.list()
        G->>G: check if table in tables_with_constraints
        alt table has constraints
            G->>API: client.tables.get(table.full_name)
            API-->>G: table with full constraint details
        end
        G->>P: _process_table(table, catalog, schema)
    end

Reviews (6): Last reviewed commit: "short circuit contraints collection if n..."

mmigdiso requested a review from a team as a code owner June 17, 2026 22:44
github-actions Bot commented Jun 17, 2026

Contributor

❌ PR checklist incomplete

This PR cannot be merged until the following are addressed on its linked issue:

The fields live on the linked issue in the Shipping project (open the issue → right sidebar → Projects). After you set them, re-run this check (or push a commit) — issue/project changes do not re-trigger it automatically.

Maintainers can bypass this check by adding the skip-pr-checks label.

@github-actions

Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Comment thread ingestion/src/metadata/ingestion/source/database/unitycatalog/queries.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/database/unitycatalog/queries.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/database/unitycatalog/metadata.py Outdated
Comment on lines +343 to +357
def _get_tables_with_constraints(self) -> set[tuple[str, str, str]]:
    """
    Build and execute SQL query to fetch table constraints.
    Handles cases where catalog_name and/or schema_name may be None.
    """
    schema_name = self.context.get().database_schema
    catalog_name = self.context.get().database

    sql = UNITY_CATALOG_TABLE_CONSTRAINTS
    params = {}
    # Build WHERE clause with proper handling of None values

    if catalog_name is not None:
        sql += " AND table_catalog = :catalog_name"
        params["catalog_name"] = catalog_name


💡 Quality: No tests added for new constraint-fetch logic

The PR adds a new code path (_get_tables_with_constraints and the conditional full-fetch in get_tables_name_and_type) but includes no unit tests, and the PR checklist for tests is unchecked. Worth adding a test that mocks client.tables.list/get and the constraints query to verify only tables present in the constraints set trigger the extra get() call, and that constraints are propagated into the resulting table request.
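A sketch of the kind of test suggested here, using `unittest.mock.MagicMock` for the client. Because the real source class and its setup are not shown on this page, `iter_tables` is a hypothetical stand-in for the non-incremental loop, and `SimpleNamespace` objects stand in for SDK table objects; a real test would patch the source's `client` and the constraints query instead. It checks that `get()` is called only for the flagged table and that the detailed object, with its constraints, replaces the listing object.

```python
import unittest
from types import SimpleNamespace
from unittest.mock import MagicMock


def iter_tables(client, catalog, schema, constrained):
    """Hypothetical stand-in for the non-incremental loop under test."""
    for table in client.tables.list(catalog_name=catalog, schema_name=schema):
        if (catalog, schema, table.name) in constrained:
            table = client.tables.get(table.full_name)
        yield table


def make_table(name, constraints=None):
    return SimpleNamespace(
        name=name, full_name=f"main.sales.{name}", table_constraints=constraints
    )


class ConstraintFetchTest(unittest.TestCase):
    def test_only_constrained_tables_are_fetched(self):
        client = MagicMock()
        client.tables.list.return_value = [make_table("orders"), make_table("notes")]
        detailed = make_table("orders", constraints=["orders_customer_fk"])
        client.tables.get.return_value = detailed

        result = list(iter_tables(client, "main", "sales", {("main", "sales", "orders")}))

        client.tables.get.assert_called_once_with("main.sales.orders")
        self.assertIs(result[0], detailed)
        self.assertEqual(result[0].table_constraints, ["orders_customer_fk"])
        self.assertIsNone(result[1].table_constraints)

    def test_no_fetch_when_no_table_has_constraints(self):
        client = MagicMock()
        client.tables.list.return_value = [make_table("notes")]

        result = list(iter_tables(client, "main", "sales", set()))

        client.tables.get.assert_not_called()
        self.assertEqual(len(result), 1)
```

Run it with `python -m unittest` or pytest.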



Comment thread ingestion/src/metadata/ingestion/source/database/unitycatalog/metadata.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/database/unitycatalog/metadata.py Outdated
mmigdiso force-pushed the 29147_unity_contraints_fix branch from 7bdbf65 to 2e62c1c June 17, 2026 23:33

Comment on lines +402 to +406
msg = (
    f"Unexpected exception in fetching constraints "
    f"(table [{table.full_name}]: {exc}. "
    f"Contraints will be ignored."
)


💡 Quality: Malformed warning message: unbalanced bracket and typo

The warning string built when client.tables.get() fails has a syntax/readability defect and a typo. The fragment f"(table [{table.full_name}]: {exc}. " opens ( but never closes it, producing output like (table [cat.sch.tbl]: <err>. And "Contraints" should be "Constraints". Since this message is surfaced to users via self.status.warning, clean it up.

Fix the unbalanced brackets and the 'Contraints' typo:

msg = (
    f"Unexpected exception fetching constraints "
    f"for table [{table.full_name}]: {exc}. "
    f"Constraints will be ignored."
)

Contributor Author


This comment is based on an outdated version of the code. The typo and the parenthesis mentioned in the comment no longer exist.


mmigdiso force-pushed the 29147_unity_contraints_fix branch from 8ad7435 to 0d5a5f0 June 18, 2026 00:16

gitar-bot Bot commented Jun 18, 2026

Code Review 👍 Approved with suggestions 4 resolved / 6 findings

Adds support for Databricks constraints during non-incremental ingestion by fetching full table details for tables identified via information_schema. Resolves previous issues with exception handling and query efficiency, though unit tests for the new logic should be added.

💡 Quality: No tests added for new constraint-fetch logic

📄 ingestion/src/metadata/ingestion/source/database/unitycatalog/metadata.py:343-357 📄 ingestion/src/metadata/ingestion/source/database/unitycatalog/queries.py:140-146

The PR adds a new code path (_get_tables_with_constraints and the conditional full-fetch in get_tables_name_and_type) but includes no unit tests, and the PR checklist for tests is unchecked. Worth adding a test that mocks client.tables.list/get and the constraints query to verify only tables present in the constraints set trigger the extra get() call, and that constraints are propagated into the resulting table request.

💡 Quality: Malformed warning message: unbalanced bracket and typo

📄 ingestion/src/metadata/ingestion/source/database/unitycatalog/metadata.py:402-406

The warning string built when client.tables.get() fails has a syntax/readability defect and a typo. The fragment f"(table [{table.full_name}]: {exc}. " opens ( but never closes it, producing output like (table [cat.sch.tbl]: <err>. And "Contraints" should be "Constraints". Since this message is surfaced to users via self.status.warning, clean it up.

Fix the unbalanced brackets and the 'Contraints' typo.
msg = (
    f"Unexpected exception fetching constraints "
    f"for table [{table.full_name}]: {exc}. "
    f"Constraints will be ignored."
)
✅ 4 resolved
Bug: Unhandled tables.get() exception aborts whole schema ingestion

📄 ingestion/src/metadata/ingestion/source/database/unitycatalog/metadata.py:399-402
The new full-ingestion path calls table = self.client.tables.get(table.full_name) directly inside the for table in self.client.tables.list(...) loop, with no error handling. If this extra API call raises (network blip, permission error on a single table, rate limiting, etc.), the exception propagates out of the get_tables_name_and_type generator and aborts iteration over all remaining tables in the schema, silently dropping them from ingestion.

Note the incremental path already guards its tables.get() call with a try/except that records a StackTraceError and continues (see _get_incremental_tables, lines 429-439). The new code should do the same so a single problematic table does not break the whole schema.

Suggested fix: wrap the get() call in try/except, log/record the failure, and fall back to the list() version of the table (or skip it) instead of letting the exception bubble up.

Quality: Obvious comments and unused selected columns reduce clarity

📄 ingestion/src/metadata/ingestion/source/database/unitycatalog/metadata.py:353 📄 ingestion/src/metadata/ingestion/source/database/unitycatalog/metadata.py:365 📄 ingestion/src/metadata/ingestion/source/database/unitycatalog/metadata.py:369 📄 ingestion/src/metadata/ingestion/source/database/unitycatalog/queries.py:142
Per the repo's self-documenting-code guideline, several added comments restate what the code already says: # Build WHERE clause with proper handling of None values, # Collect unique tables with constraints, and the debug log inside the row loop. Also, the SQL selects constraint_name, constraint_type but the code only consumes table_catalog, table_schema, table_name to build the identifier tuple; selecting just those three (optionally SELECT DISTINCT) is clearer and slightly cheaper. An extra blank line was also introduced at line 263. These are minor cleanups.

Bug: Debug leftover '+"test"' breaks all constraint fetches

📄 ingestion/src/metadata/ingestion/source/database/unitycatalog/metadata.py:400
table = self.client.tables.get(table.full_name+"test") appends the literal string test to every table's fully-qualified name before calling the Databricks Tables API. No table named <catalog>.<schema>.<name>test exists, so tables.get() will raise for every table that has constraints. The exception is caught and downgraded to a warning, meaning constraint metadata is silently dropped for ALL tables and the feature this PR adds never works. This is clearly leftover debugging code that must be removed before merge.

Edge Case: Constraint query scans whole metastore when catalog/schema are None

📄 ingestion/src/metadata/ingestion/source/database/unitycatalog/metadata.py:351-364 📄 ingestion/src/metadata/ingestion/source/database/unitycatalog/queries.py:140-146
_get_tables_with_constraints only appends table_catalog/table_schema filters when the corresponding context value is not None. If either is None the query degrades to SELECT ... FROM system.information_schema.table_constraints WHERE 1=1, scanning every constraint across the entire metastore. On large deployments this can be an expensive, unbounded query. Consider short-circuiting and returning an empty set (or requiring both values) when catalog/schema are not available, since the result set is only meaningful when scoped to the current schema being processed.




Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

UnityCatalog ingestor doesn't ingest constraints in non-incremental mode

1 participant