
[Bug] [Postgres-CDC] Fix snapshot table schema lookup #10843

Draft
LeonYoah wants to merge 1 commit into apache:dev from LeonYoah:lyb/fix-pg-cdc-snapshot-tableid

Conversation


@LeonYoah LeonYoah commented Apr 30, 2026

Purpose of this pull request

Fix Postgres-CDC snapshot table schema lookup when the PostgreSQL JDBC driver returns a catalog value from DatabaseMetaData.

Starting with pgjdbc 42.7.5, DatabaseMetaData#getTables and getColumns may return the database name in TABLE_CAT. This follows the pgjdbc change described as "Fix PgDatabaseMetaData implementation of catalog as param and return value" in pgjdbc PR #3390.

With the same metadata test against a Docker PostgreSQL database named traffic:

pgjdbc 42.4.3:
getTables:  TABLE_CAT=null,    TABLE_SCHEM=public, TABLE_NAME=users
getColumns: TABLE_CAT=null,    TABLE_SCHEM=public, TABLE_NAME=users

pgjdbc 42.7.5:
getTables:  TABLE_CAT=traffic, TABLE_SCHEM=public, TABLE_NAME=users
getColumns: TABLE_CAT=traffic, TABLE_SCHEM=public, TABLE_NAME=users

The current Postgres snapshot reader always looks up the Debezium table schema with a catalog-less TableId:

new TableId(null, tableId.schema(), tableId.table())

That works with the old metadata shape, but when Debezium stores the table schema as database.schema.table, databaseSchema.tableFor(...) returns null. The next snapshot read path calls table.id() and fails with a NullPointerException in PostgresSnapshotSplitReadTask#createDataEventsForTable.

This patch keeps the old catalog-less lookup first for existing environments, then falls back to a catalog-aware TableId using the SnapshotSplit catalog or the configured database name. The generated PostgreSQL snapshot SQL is not changed because PostgresUtils#quoteSchemaAndTable still uses only schema and table names.
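The lookup order described above can be sketched as follows. This is a minimal, self-contained illustration rather than the PR's actual code: the string-keyed map stands in for Debezium's schema cache (databaseSchema.tableFor(...)), and the key() helper stands in for io.debezium.relational.TableId.

```java
import java.util.HashMap;
import java.util.Map;

public class ResolveTableSketch {

    // Hypothetical stand-in for io.debezium.relational.TableId's identity.
    static String key(String catalog, String schema, String table) {
        return catalog + "." + schema + "." + table;
    }

    // Stand-in for Debezium's in-memory schema cache (databaseSchema).
    static final Map<String, String> SCHEMA_CACHE = new HashMap<>();

    // Lookup order introduced by this patch: catalog-less first (old
    // behavior), then catalog-aware using the split catalog or the
    // configured database name.
    static String resolveTable(
            String splitCatalog, String schema, String table, String databaseName) {
        String hit = SCHEMA_CACHE.get(key(null, schema, table));
        if (hit == null) {
            String catalog = splitCatalog != null ? splitCatalog : databaseName;
            hit = SCHEMA_CACHE.get(key(catalog, schema, table));
        }
        if (hit == null) {
            throw new IllegalStateException(
                    "No schema found for " + key(splitCatalog, schema, table));
        }
        return hit;
    }

    public static void main(String[] args) {
        // pgjdbc 42.7.5 shape: only the catalog-qualified key exists.
        SCHEMA_CACHE.put("traffic.public.users", "users-schema");
        // The catalog-less lookup misses; the fallback resolves it.
        System.out.println(resolveTable(null, "public", "users", "traffic"));
    }
}
```

Under the old pgjdbc shape the catalog-less key is the one stored, so the first lookup hits and behavior is unchanged for existing environments.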

Does this PR introduce any user-facing change?

Yes. Postgres-CDC snapshot startup can now resolve the table schema when pgjdbc returns the database name as TABLE_CAT, avoiding a snapshot read NullPointerException for tables such as traffic.public.users.

How was this patch tested?

Added a focused unit test for the snapshot schema lookup fallback:

JAVA_HOME=/Library/Java/JavaVirtualMachines/temurin-8.jdk/Contents/Home \
PATH=/Library/Java/JavaVirtualMachines/temurin-8.jdk/Contents/Home/bin:$PATH \
./mvnw -pl seatunnel-connectors-v2/connector-cdc/connector-cdc-postgres \
  -Dskip.spotless=true -Dtest=PostgresSnapshotSplitReadTaskTest test

Result: BUILD SUCCESS, Tests run: 2, Failures: 0, Errors: 0, Skipped: 0.

JAVA_HOME=/Library/Java/JavaVirtualMachines/temurin-8.jdk/Contents/Home \
PATH=/Library/Java/JavaVirtualMachines/temurin-8.jdk/Contents/Home/bin:$PATH \
./mvnw -pl seatunnel-connectors-v2/connector-cdc/connector-cdc-postgres \
  -DskipTests -Dskip.spotless=true compile

Result: BUILD SUCCESS.

JAVA_HOME=/Library/Java/JavaVirtualMachines/temurin-8.jdk/Contents/Home \
PATH=/Library/Java/JavaVirtualMachines/temurin-8.jdk/Contents/Home/bin:$PATH \
./mvnw -pl seatunnel-connectors-v2/connector-cdc/connector-cdc-postgres \
  spotless:check -DskipTests

Result: BUILD SUCCESS.

I also tried compiling the module with upstream dependencies:

JAVA_HOME=/Library/Java/JavaVirtualMachines/temurin-8.jdk/Contents/Home \
PATH=/Library/Java/JavaVirtualMachines/temurin-8.jdk/Contents/Home/bin:$PATH \
./mvnw -pl seatunnel-connectors-v2/connector-cdc/connector-cdc-postgres -am \
  -DskipTests -Dskip.spotless=true compile

This failed before reaching the Postgres-CDC module, in seatunnel-config-shade, because the shaded org.apache.seatunnel.shade.com.typesafe.config.* classes were not found in this local checkout.



@DanielLeens DanielLeens left a comment


Hi @LeonYoah, I pulled the latest head locally as seatunnel-review-10843 at 16baf2609254 and reviewed the full diff against upstream/dev.

Local review basis:

  • git fetch upstream pull/10843/head:seatunnel-review-10843: passed
  • git log --oneline upstream/dev..seatunnel-review-10843: shows two commits, including e0c09508b [Feature][CLI] Add SeaTunnel CLI for natural language config generation (#10789)
  • git diff --stat upstream/dev...seatunnel-review-10843: 37 files, +9647 / -3
  • git diff --check upstream/dev...seatunnel-review-10843: passed
  • gh pr view 10843 --json mergeable,mergeStateStatus,statusCheckRollup: MERGEABLE / BLOCKED, and Build is still in progress
  • I did not run local Maven/tests in this pass; this is a source-level review plus current GitHub metadata

What This PR Fixes

  • User pain point: the Postgres CDC snapshot path needs a Table from databaseSchema before it can build the split scan query. The old code always converted the split table id into TableId(null, schema, table), so a catalog-bearing split id could fail schema lookup.
  • Fix approach: replace the direct no-catalog lookup with resolveTable(tableId), which first tries the catalog-less id and then falls back to a catalog-aware id using tableId.catalog() or connectorConfig.databaseName().
  • One-line summary: the lookup direction is reasonable, but the current PR head is not mergeable as a Postgres bugfix because it also carries a large unrelated CLI / metadata-export / dist feature set.

Runtime Chain Reviewed

Postgres CDC snapshot
  -> PostgresSnapshotSplitReadTask.execute() [92-113]
      -> doExecute() [116-149]
          -> createDataEvents(ctx, snapshotSplit.getTableId()) [135-136]
              -> resolveTable(tableId) [173-201]
                  -> try TableId(null, schema, table)
                  -> then try TableId(catalog-or-databaseName, schema, table)
              -> createDataEventsForTable(...) [203-220]
                  -> PostgresUtils.buildSplitScanQuery(table, ...)
                  -> executeQuery()
                  -> dispatcher.dispatchSnapshotEvent(...)

Findings

Issue 1: this PR is not a clean Postgres-CDC bugfix; it also pulls in a large unrelated CLI / metadata-export / dist feature set

  • Location: seatunnel-cli/pyproject.toml:17, seatunnel-core/seatunnel-starter/src/main/java/org/apache/seatunnel/core/starter/seatunnel/command/MetadataExportCommand.java:1
  • Severity: High

The runtime fix itself is a small Postgres CDC change in PostgresSnapshotSplitReadTask, but the current head also includes:

  • seatunnel-cli/**
  • seatunnel-core/.../MetadataExportCommand.java
  • seatunnel-dist/**

The local commit list confirms why: upstream/dev..seatunnel-review-10843 contains another commit, e0c09508b [Feature][CLI] Add SeaTunnel CLI for natural language config generation (#10789), in addition to the Postgres fix commit.

So merging this PR would not mean “merge a Postgres snapshot lookup fix”; it would also merge a separate starter / packaging / Python CLI feature set under the wrong review scope.

Recommended fix:

  • Option A (recommended): rebase or cherry-pick the Postgres CDC fix onto a clean dev head and reopen/update the PR with only the connector change.
  • Option B: if the CLI / metadata-export work is intentional, split it into its own PR and let each topic go through its own focused review.

Issue 2: the PR still does not add a regression test that proves the catalog-aware snapshot lookup path is fixed

  • Location: seatunnel-connectors-v2/connector-cdc/connector-cdc-postgres/src/main/java/org/apache/seatunnel/connectors/seatunnel/cdc/postgres/source/reader/snapshot/PostgresSnapshotSplitReadTask.java:164-201
  • Severity: Medium

The fix lives exactly on the snapshot runtime path between snapshotSplit.getTableId() and databaseSchema.tableFor(...), but the PR adds no focused test for a catalog-bearing TableId. The fallback logic looks directionally correct, but the exact failure mode this PR claims to fix is still not mechanically proven.

Recommended fix:

  • Add a focused regression test around resolveTable() / snapshot task setup with a catalog-bearing TableId, where the catalog-less lookup fails and the catalog-aware fallback succeeds.
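A regression test of the requested shape could look roughly like this. The class and helper names here are illustrative, not the PR's actual test code, and a plain map replaces Debezium's schema cache so the sketch stays self-contained:

```java
import java.util.HashMap;
import java.util.Map;

public class SnapshotLookupContractTest {

    // Illustrative lookup mirroring the patch: catalog-less key first,
    // then the catalog-qualified key via the split catalog or database name.
    static String lookup(Map<String, String> cache, String catalog,
                         String schema, String table, String databaseName) {
        String hit = cache.get("null." + schema + "." + table);
        if (hit == null) {
            String cat = catalog != null ? catalog : databaseName;
            hit = cache.get(cat + "." + schema + "." + table);
        }
        return hit;
    }

    static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }

    public static void main(String[] args) {
        // Case 1: old environments keep the catalog-less key.
        Map<String, String> oldCache = new HashMap<>();
        oldCache.put("null.public.users", "users");
        check("users".equals(lookup(oldCache, null, "public", "users", "traffic")),
                "catalog-less lookup should still win");

        // Case 2: only the catalog-qualified key exists (pgjdbc >= 42.7.5),
        // so the catalog-less lookup misses and the fallback must succeed.
        Map<String, String> newCache = new HashMap<>();
        newCache.put("traffic.public.users", "users");
        check("users".equals(lookup(newCache, null, "public", "users", "traffic")),
                "catalog-aware fallback should resolve the table");

        System.out.println("both lookup contracts hold");
    }
}
```

Covering both branches pins down the contract: old environments keep resolving through the catalog-less key, and catalog-qualified caches resolve through the fallback.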

Issue 3: the latest head still does not have a completed green Build

  • Location: GitHub checks / Build
  • Severity: Low

At review time, head 16baf2609254e18fb7254cd064e0578eb03d54fd still has Build in progress. Once the PR scope is cleaned up, I still want one green Build on the clean head before merge.

Merge Conclusion

Conclusion: merge after fixes

  1. Blocking items
  • Issue 1: split out the unrelated CLI / metadata-export / dist changes and keep this PR as a clean Postgres CDC fix.
  • Issue 2: add a focused regression test for the catalog-aware snapshot lookup path.
  • Issue 3: wait for a green Build on the cleaned-up latest head.
  2. Suggested non-blocking follow-up
  • No extra non-blocking code comment from this pass; the main concerns are PR scope and proof of the fix.

Overall assessment:
The resolveTable() direction itself makes sense, and I do think it is closer to the real problem than the old hardcoded no-catalog lookup. The blocker is that the current PR head is carrying a lot more than that one fix, so I would first reduce it back to a clean connector-only change and then recheck the focused regression coverage.


@DanielLeens DanielLeens left a comment


Thanks for working on this. I reviewed the current head locally against upstream/dev. I did not run Maven locally in this pass; this is a source-level review plus the current GitHub check metadata.

What This PR Fixes

  • User pain: the Postgres snapshot split reader currently looks up the table schema with TableId(null, schema, table) before it builds the real snapshot query. If the in-memory schema cache only keeps the catalog/database-qualified TableId, the normal snapshot-read path gets null here and then fails later on the real export flow.
  • Fix approach: move the lookup into resolveTable(...), try the catalog-less TableId first, then fall back to tableId.catalog() or connectorConfig.databaseName(), and fail with a clear exception if neither lookup works.
  • One-line value: this puts the schema lookup fix directly on the real snapshot-read path.

Runtime Chain I Checked

Postgres snapshot split
  -> PostgresSnapshotSplitReadTask.execute()
      -> doExecute()
          -> low watermark
          -> createDataEvents(snapshotContext, snapshotSplit.getTableId())
              -> resolveTable(tableId)
                  -> try TableId(null, schema, table)
                  -> if missing, try TableId(catalog-or-databaseName, schema, table)
              -> createDataEventsForTable(..., table)
                  -> buildSplitScanQuery(table, ...)
                  -> execute snapshot split SQL
          -> high watermark

Findings

  • createDataEvents(...) is the real normal-path handoff from watermark handling into snapshot table export, so this fix is placed on the correct execution path.
  • The old failure mode is real: if databaseSchema only contains the catalog-qualified key, the old databaseSchema.tableFor(new TableId(null, ...)) lookup returns null.
  • The new fallback logic is directionally correct, and the new exception message is much better than letting the code fail later through a null table object.
  • I do still see one gap before merge: there is no targeted regression test here for the catalog-qualified snapshot lookup case.

Merge Decision

Conclusion: merge after fixes

  1. Blocking items
  • Please add a focused regression test for this lookup behavior. This patch fixes a subtle but real normal-path snapshot bug, and without a test it is too easy to regress the TableId fallback logic later.
  • The current Build check is still failing, so merge should also wait until CI is back to green and the failure is confirmed.
  2. Suggested non-blocking follow-up
  • A short code comment near resolveTable(...) explaining why the catalog fallback is needed would make the maintenance intent easier to preserve.

Overall, the fix direction is right and it lands on the actual snapshot export path, but I would like to see the regression locked down with one focused test before merge.


@DanielLeens DanielLeens left a comment


Thanks for working on this. I re-reviewed the latest head locally against upstream/dev, focusing on the Postgres CDC snapshot read path that resolves the Debezium table schema before building the split query.

Runtime path I checked:

Postgres snapshot split
  -> PostgresSnapshotSplitReadTask.execute()
      -> createDataEvents(snapshotContext, snapshotSplit.getTableId())
          -> resolveTable(tableId)
              -> try TableId(null, schema, table)
              -> fallback to TableId(catalog-or-databaseName, schema, table)
          -> createDataEventsForTable(..., table)
              -> buildSplitScanQuery(...)
              -> read snapshot split SQL

Key findings:

  • The fix is on the real snapshot path, not on a dead helper path.
  • The old bug was real: if databaseSchema only retained the catalog-qualified TableId, the previous TableId(null, schema, table) lookup could return null and fail the main snapshot flow.
  • The current fallback direction is correct, and the failure mode is now clearer than an implicit downstream NullPointerException.

Issue 1: the current head still needs a focused regression test for the catalog-qualified lookup path

  • Location: seatunnel-connectors-v2/connector-cdc/connector-cdc-postgres/src/main/java/org/apache/seatunnel/connectors/seatunnel/cdc/postgres/source/reader/snapshot/PostgresSnapshotSplitReadTask.java:164-201
  • Why this matters:
    This fix sits on a very specific lookup boundary that is easy to regress later. I do not see a minimal regression test that pins down the contract: when the schema cache only has the catalog-qualified TableId, the snapshot path should still resolve the table correctly.
  • Severity: Medium
  • Recommended fix:
    Please add a focused regression test around resolveTable(...) or the surrounding snapshot path so this lookup behavior is locked down.

Issue 2: the current Build check is still red

  • Location: GitHub Build
  • Severity: Low
  • Recommended fix:
    Please rerun or fix the current CI failures before merge.

Conclusion: merge after fixes

  1. Blocking items
  • Issue 1: add a minimal regression test for the catalog-qualified snapshot lookup path.
  • Issue 2: get the latest Build back to green.
  2. Suggested non-blocking follow-ups
  • A short comment above the fallback path would also help future readers understand why the catalog retry is needed.

Overall, the fix direction is right and it does hit the real snapshot read path, but I would like to see the lookup contract pinned down with a regression test before merge.

@LeonYoah LeonYoah force-pushed the lyb/fix-pg-cdc-snapshot-tableid branch from 16baf26 to 5280949 on May 7, 2026 03:07

@DanielLeens DanielLeens left a comment


Thanks for the update. I re-reviewed the latest head from scratch against upstream/dev, focusing on the real Postgres snapshot path that resolves the Debezium table schema before building the split scan query.

What this PR fixes

  • User pain: the snapshot reader needs a Table from databaseSchema before it can build the real split query. The old code forced the lookup to TableId(null, schema, table), so a cache that only kept the catalog-qualified key could miss on the normal snapshot path.
  • Fix approach: the current head centralizes the lookup in resolveTable(...), first tries the catalog-less id, then falls back to tableId.catalog() or connectorConfig.databaseName(), and now includes focused regression tests for both fallback branches.
  • One-line summary: this puts the schema lookup fix on the real snapshot execution path and finally locks the catalog fallback contract down with tests.

Runtime chain I checked

Postgres snapshot split execution
  -> PostgresSnapshotSplitReadTask.execute() [L116-L149]
      -> createDataEvents(snapshotContext, snapshotSplit.getTableId()) [L135-L136]
          -> resolveTable(tableId) [L173-L200]
              -> first try TableId(null, schema, table)
              -> on miss, retry TableId(catalog-or-databaseName, schema, table)
          -> createDataEventsForTable(..., table) [L203-L235]
              -> PostgresUtils.buildSplitScanQuery(table, ...)
              -> executeQuery()
              -> dispatcher.dispatchSnapshotEvent(...)

Key findings

  • The normal snapshot path definitely hits this change. This is not a dead helper path.
  • The original bug is real: if databaseSchema only retains the catalog-qualified key, the old TableId(null, ...) lookup can miss on the main path.
  • The current head addresses the right boundary, and the two new tests now cover the exact fallback behavior requested in the previous review round.
  • I do not see a remaining code-side blocker on the current implementation.

Local verification I actually ran

  • git fetch upstream pull/10843/head:seatunnel-review-10843 --force: passed.
  • git diff --stat upstream/dev...seatunnel-review-10843: passed; current scope is now only 2 files, +135/-3.
  • git log --oneline upstream/dev..seatunnel-review-10843: passed; the unrelated CLI / dist commit from the earlier head is gone.
  • nl -ba PostgresSnapshotSplitReadTask.java / PostgresSnapshotSplitReadTaskTest.java: passed; re-checked the lookup path and the added regression tests.
  • gh pr checks 10843 -R apache/seatunnel: inspected; Build is still pending on the latest head.
  • Local Maven compile / tests: not run in this review round. This is a source-level review.

Issue 1: the latest Build gate is still pending

  • Location: GitHub checks / Build
  • Why it matters: I do not see a code blocker anymore, but I still do not want to merge a CDC-path change before the latest head has a completed green top-level gate.
  • Suggested fix: wait for the current Build result and only merge once the latest head is green.
  • Severity: low

Merge conclusion

Conclusion: can merge after fixes

  1. Blocking items
  • Issue 1: wait for the latest Build check to finish green.
  2. Suggested non-blocking follow-ups
  • None from my side on the current code path.

Overall assessment

  • Compared with the previously reviewed head, this version has already cleared the two real blockers: the PR scope is clean again, and the missing catalog-aware regression coverage is now in place.
  • I do not see a smaller or more accurate alternative than the current implementation.

