[Bug] [Postgres-CDC] Fix snapshot table schema lookup#10843
[Bug] [Postgres-CDC] Fix snapshot table schema lookup#10843LeonYoah wants to merge 1 commit intoapache:devfrom
Conversation
DanielLeens
left a comment
There was a problem hiding this comment.
Hi @LeonYoah, I pulled the latest head locally as seatunnel-review-10843 at 16baf2609254 and reviewed the full diff against upstream/dev.
Local review basis:
git fetch upstream pull/10843/head:seatunnel-review-10843: passedgit log --oneline upstream/dev..seatunnel-review-10843: shows two commits, includinge0c09508b [Feature][CLI] Add SeaTunnel CLI for natural language config generation (#10789)git diff --stat upstream/dev...seatunnel-review-10843: 37 files,+9647 / -3git diff --check upstream/dev...seatunnel-review-10843: passedgh pr view 10843 --json mergeable,mergeStateStatus,statusCheckRollup:MERGEABLE / BLOCKED, andBuildis still in progress- I did not run local Maven/tests in this pass; this is a source-level review plus current GitHub metadata
What This PR Fixes
- User pain point: the Postgres CDC snapshot path needs a
TablefromdatabaseSchemabefore it can build the split scan query. The old code always converted the split table id intoTableId(null, schema, table), so a catalog-bearing split id could fail schema lookup. - Fix approach: replace the direct no-catalog lookup with
resolveTable(tableId), which first tries the catalog-less id and then falls back to a catalog-aware id usingtableId.catalog()orconnectorConfig.databaseName(). - One-line summary: the lookup direction is reasonable, but the current PR head is not mergeable as a Postgres bugfix because it also carries a large unrelated CLI / metadata-export / dist feature set.
Runtime Chain Reviewed
Postgres CDC snapshot
-> PostgresSnapshotSplitReadTask.execute() [92-113]
-> doExecute() [116-149]
-> createDataEvents(ctx, snapshotSplit.getTableId()) [135-136]
-> resolveTable(tableId) [173-201]
-> try TableId(null, schema, table)
-> then try TableId(catalog-or-databaseName, schema, table)
-> createDataEventsForTable(...) [203-220]
-> PostgresUtils.buildSplitScanQuery(table, ...)
-> executeQuery()
-> dispatcher.dispatchSnapshotEvent(...)
Findings
Issue 1: this PR is not a clean Postgres-CDC bugfix; it also pulls in a large unrelated CLI / metadata-export / dist feature set
- Location:
seatunnel-cli/pyproject.toml:17,seatunnel-core/seatunnel-starter/src/main/java/org/apache/seatunnel/core/starter/seatunnel/command/MetadataExportCommand.java:1 - Severity: High
The runtime fix itself is a small Postgres CDC change in PostgresSnapshotSplitReadTask, but the current head also includes:
seatunnel-cli/**seatunnel-core/.../MetadataExportCommand.javaseatunnel-dist/**
The local commit list confirms why: upstream/dev..seatunnel-review-10843 contains another commit, e0c09508b [Feature][CLI] Add SeaTunnel CLI for natural language config generation (#10789), in addition to the Postgres fix commit.
So merging this PR would not mean “merge a Postgres snapshot lookup fix”; it would also merge a separate starter / packaging / Python CLI feature set under the wrong review scope.
Recommended fix:
- Option A (recommended): rebase or cherry-pick the Postgres CDC fix onto a clean
devhead and reopen/update the PR with only the connector change. - Option B: if the CLI / metadata-export work is intentional, split it into its own PR and let each topic go through its own focused review.
Issue 2: the PR still does not add a regression test that proves the catalog-aware snapshot lookup path is fixed
- Location:
seatunnel-connectors-v2/connector-cdc/connector-cdc-postgres/src/main/java/org/apache/seatunnel/connectors/seatunnel/cdc/postgres/source/reader/snapshot/PostgresSnapshotSplitReadTask.java:164-201 - Severity: Medium
The fix lives exactly on the snapshot runtime path between snapshotSplit.getTableId() and databaseSchema.tableFor(...), but the PR adds no focused test for a catalog-bearing TableId. The fallback logic looks directionally correct, but the exact failure mode this PR claims to fix is still not mechanically proven.
Recommended fix:
- Add a focused regression test around
resolveTable()/ snapshot task setup with a catalog-bearingTableId, where the catalog-less lookup fails and the catalog-aware fallback succeeds.
Issue 3: the latest head still does not have a completed green Build
- Location: GitHub checks /
Build - Severity: Low
At review time, head 16baf2609254e18fb7254cd064e0578eb03d54fd still has Build in progress. Once the PR scope is cleaned up, I still want one green Build on the clean head before merge.
Merge Conclusion
Conclusion: merge after fixes
- Blocking items
- Issue 1: split out the unrelated CLI / metadata-export / dist changes and keep this PR as a clean Postgres CDC fix.
- Issue 2: add a focused regression test for the catalog-aware snapshot lookup path.
- Issue 3: wait for a green Build on the cleaned-up latest head.
- Suggested non-blocking follow-up
- No extra non-blocking code comment from this pass; the main concerns are PR scope and proof of the fix.
Overall assessment:
The resolveTable() direction itself makes sense, and I do think it is closer to the real problem than the old hardcoded no-catalog lookup. The blocker is that the current PR head is carrying a lot more than that one fix, so I would first reduce it back to a clean connector-only change and then recheck the focused regression coverage.
DanielLeens
left a comment
There was a problem hiding this comment.
Thanks for working on this. I reviewed the current head locally against upstream/dev. I did not run Maven locally in this pass; this is a source-level review plus the current GitHub check metadata.
What This PR Fixes
- User pain: the Postgres snapshot split reader currently looks up the table schema with
TableId(null, schema, table)before it builds the real snapshot query. If the in-memory schema cache only keeps the catalog/database-qualifiedTableId, the normal snapshot-read path getsnullhere and then fails later on the real export flow. - Fix approach: move the lookup into
resolveTable(...), try the catalog-lessTableIdfirst, then fall back totableId.catalog()orconnectorConfig.databaseName(), and fail with a clear exception if neither lookup works. - One-line value: this puts the schema lookup fix directly on the real snapshot-read path.
Runtime Chain I Checked
Postgres snapshot split
-> PostgresSnapshotSplitReadTask.execute()
-> doExecute()
-> low watermark
-> createDataEvents(snapshotContext, snapshotSplit.getTableId())
-> resolveTable(tableId)
-> try TableId(null, schema, table)
-> if missing, try TableId(catalog-or-databaseName, schema, table)
-> createDataEventsForTable(..., table)
-> buildSplitScanQuery(table, ...)
-> execute snapshot split SQL
-> high watermark
Findings
createDataEvents(...)is the real normal-path handoff from watermark handling into snapshot table export, so this fix is placed on the correct execution path.- The old failure mode is real: if
databaseSchemaonly contains the catalog-qualified key, the olddatabaseSchema.tableFor(new TableId(null, ...))lookup returnsnull. - The new fallback logic is directionally correct, and the new exception message is much better than letting the code fail later through a null table object.
- I do still see one gap before merge: there is no targeted regression test here for the catalog-qualified snapshot lookup case.
Merge Decision
Conclusion: merge after fixes
- Blocking items
- Please add a focused regression test for this lookup behavior. This patch fixes a subtle but real normal-path snapshot bug, and without a test it is too easy to regress the
TableIdfallback logic later. - The current
Buildcheck is still failing, so merge should also wait until CI is back to green and the failure is confirmed.
- Suggested non-blocking follow-up
- A short code comment near
resolveTable(...)explaining why the catalog fallback is needed would make the maintenance intent easier to preserve.
Overall, the fix direction is right and it lands on the actual snapshot export path, but I would like to see the regression locked down with one focused test before merge.
DanielLeens
left a comment
There was a problem hiding this comment.
Thanks for working on this. I re-reviewed the latest head locally against upstream/dev, focusing on the Postgres CDC snapshot read path that resolves the Debezium table schema before building the split query.
Runtime path I checked:
Postgres snapshot split
-> PostgresSnapshotSplitReadTask.execute()
-> createDataEvents(snapshotContext, snapshotSplit.getTableId())
-> resolveTable(tableId)
-> try TableId(null, schema, table)
-> fallback to TableId(catalog-or-databaseName, schema, table)
-> createDataEventsForTable(..., table)
-> buildSplitScanQuery(...)
-> read snapshot split SQL
Key findings:
- The fix is on the real snapshot path, not on a dead helper path.
- The old bug was real: if
databaseSchemaonly retained the catalog-qualifiedTableId, the previousTableId(null, schema, table)lookup could returnnulland fail the main snapshot flow. - The current fallback direction is correct, and the failure mode is now clearer than an implicit downstream
NullPointerException.
Issue 1: the current head still needs a focused regression test for the catalog-qualified lookup path
- Location:
seatunnel-connectors-v2/connector-cdc/connector-cdc-postgres/src/main/java/org/apache/seatunnel/connectors/seatunnel/cdc/postgres/source/reader/snapshot/PostgresSnapshotSplitReadTask.java:164-201 - Why this matters:
This fix sits on a very specific lookup boundary that is easy to regress later. I do not see a minimal regression test that fixes the contract: when the schema cache only has the catalog-qualifiedTableId, the snapshot path should still resolve the table correctly. - Severity: Medium
- Recommended fix:
Please add a focused regression test aroundresolveTable(...)or the surrounding snapshot path so this lookup behavior is locked down.
Issue 2: the current Build check is still red
- Location: GitHub
Build - Severity: Low
- Recommended fix:
Please rerun or fix the current CI failures before merge.
Conclusion: merge after fixes
- Blocking items
- Issue 1: add a minimal regression test for the catalog-qualified snapshot lookup path.
- Issue 2: get the latest
Buildback to green.
- Suggested non-blocking follow-ups
- A short comment above the fallback path would also help future readers understand why the catalog retry is needed.
Overall, the fix direction is right and it does hit the real snapshot read path, but I would like to see the lookup contract pinned down with a regression test before merge.
16baf26 to
5280949
Compare
DanielLeens
left a comment
There was a problem hiding this comment.
Thanks for the update. I re-reviewed the latest head from scratch against upstream/dev, focusing on the real Postgres snapshot path that resolves the Debezium table schema before building the split scan query.
What this PR fixes
- User pain: the snapshot reader needs a
TablefromdatabaseSchemabefore it can build the real split query. The old code forced the lookup toTableId(null, schema, table), so a cache that only kept the catalog-qualified key could miss on the normal snapshot path. - Fix approach: the current head centralizes the lookup in
resolveTable(...), first tries the catalog-less id, then falls back totableId.catalog()orconnectorConfig.databaseName(), and now includes focused regression tests for both fallback branches. - One-line summary: this puts the schema lookup fix on the real snapshot execution path and finally locks the catalog fallback contract down with tests.
Runtime chain I checked
Postgres snapshot split execution
-> PostgresSnapshotSplitReadTask.execute() [L116-L149]
-> createDataEvents(snapshotContext, snapshotSplit.getTableId()) [L135-L136]
-> resolveTable(tableId) [L173-L200]
-> first try TableId(null, schema, table)
-> on miss, retry TableId(catalog-or-databaseName, schema, table)
-> createDataEventsForTable(..., table) [L203-L235]
-> PostgresUtils.buildSplitScanQuery(table, ...)
-> executeQuery()
-> dispatcher.dispatchSnapshotEvent(...)
Key findings
- The normal snapshot path definitely hits this change. This is not a dead helper path.
- The original bug is real: if
databaseSchemaonly retains the catalog-qualified key, the oldTableId(null, ...)lookup can miss on the main path. - The current head addresses the right boundary and the two new tests now cover the exact fallback behavior Daniel asked for in the previous round.
- I do not see a remaining code-side blocker on the current implementation.
Local verification I actually ran
git fetch upstream pull/10843/head:seatunnel-review-10843 --force: passed.git diff --stat upstream/dev...seatunnel-review-10843: passed; current scope is now only 2 files,+135/-3.git log --oneline upstream/dev..seatunnel-review-10843: passed; the unrelated CLI / dist commit from the earlier head is gone.nl -ba PostgresSnapshotSplitReadTask.java/PostgresSnapshotSplitReadTaskTest.java: passed; re-checked the lookup path and the added regression tests.gh pr checks 10843 -R apache/seatunnel: inspected;Buildis still pending on the latest head.- Local Maven compile / tests: not run in this review round. This is a source-level review.
Issue 1: the latest Build gate is still pending
- Location: GitHub checks /
Build - Why it matters: I do not see a code blocker anymore, but I still do not want to merge a CDC-path change before the latest head has a completed green top-level gate.
- Suggested fix: wait for the current
Buildresult and only merge once the latest head is green. - Severity: low
Merge conclusion
Conclusion: can merge after fixes
- Blocking items
- Issue 1: wait for the latest
Buildcheck to finish green.
- Suggested non-blocking follow-ups
- None from my side on the current code path.
Overall assessment
- Compared with the earlier Daniel-reviewed head, this version has already cleared the two real blockers: the PR scope is clean again, and the missing catalog-aware regression coverage is now in place.
- I do not see a smaller or more accurate alternative than the current implementation.
Purpose of this pull request
Fix Postgres-CDC snapshot table schema lookup when the PostgreSQL JDBC driver returns a catalog value from
DatabaseMetaData.Starting with pgjdbc 42.7.5,
DatabaseMetaData#getTablesandgetColumnsmay return the database name inTABLE_CAT. This follows the pgjdbc change described asFix PgDatabaseMetaData implementation of catalog as param and return valuein pgjdbc PR #3390.With the same metadata test against a Docker PostgreSQL database named
traffic:The current Postgres snapshot reader always looks up the Debezium table schema with a catalog-less
TableId:That works with the old metadata shape, but when Debezium stores the table schema as
database.schema.table,databaseSchema.tableFor(...)returnsnull. The next snapshot read path callstable.id()and fails with aNullPointerExceptioninPostgresSnapshotSplitReadTask#createDataEventsForTable.This patch keeps the old catalog-less lookup first for existing environments, then falls back to a catalog-aware
TableIdusing theSnapshotSplitcatalog or the configured database name. The generated PostgreSQL snapshot SQL is not changed becausePostgresUtils#quoteSchemaAndTablestill uses only schema and table names.Does this PR introduce any user-facing change?
Yes. Postgres-CDC snapshot startup can now resolve the table schema when pgjdbc returns the database name as
TABLE_CAT, avoiding a snapshot readNullPointerExceptionfor tables such astraffic.public.users.How was this patch tested?
Added a focused unit test for the snapshot schema lookup fallback:
Result:
BUILD SUCCESS,Tests run: 2, Failures: 0, Errors: 0, Skipped: 0.JAVA_HOME=/Library/Java/JavaVirtualMachines/temurin-8.jdk/Contents/Home \ PATH=/Library/Java/JavaVirtualMachines/temurin-8.jdk/Contents/Home/bin:$PATH \ ./mvnw -pl seatunnel-connectors-v2/connector-cdc/connector-cdc-postgres \ -DskipTests -Dskip.spotless=true compileResult:
BUILD SUCCESS.JAVA_HOME=/Library/Java/JavaVirtualMachines/temurin-8.jdk/Contents/Home \ PATH=/Library/Java/JavaVirtualMachines/temurin-8.jdk/Contents/Home/bin:$PATH \ ./mvnw -pl seatunnel-connectors-v2/connector-cdc/connector-cdc-postgres \ spotless:check -DskipTestsResult:
BUILD SUCCESS.I also tried compiling the module with upstream dependencies:
JAVA_HOME=/Library/Java/JavaVirtualMachines/temurin-8.jdk/Contents/Home \ PATH=/Library/Java/JavaVirtualMachines/temurin-8.jdk/Contents/Home/bin:$PATH \ ./mvnw -pl seatunnel-connectors-v2/connector-cdc/connector-cdc-postgres -am \ -DskipTests -Dskip.spotless=true compileThis failed before reaching the Postgres-CDC module, in
seatunnel-config-shade, because the shadedorg.apache.seatunnel.shade.com.typesafe.config.*classes were not found in this local checkout.Check list
New License Guide
incompatible-changes.mdto describe the incompatibility caused by this PR.