Skip to content

Transaction-aligned flush for cross-table read consistency #15

Description

@viggy28

Problem

Streambed flushes per-table based on row count (--flush-rows) or time thresholds (--flush-interval), not transaction boundaries. A single Postgres transaction's writes can land in different Iceberg snapshots — either across tables or even within the same table if a flush fires mid-transaction.

Example

Postgres transaction:

BEGIN;
INSERT INTO orders (...);
INSERT INTO order_items (...);
UPDATE inventory SET qty = qty - 1 WHERE ...;
COMMIT;

With per-table flush, a reader querying streambed between flushes could see:

  • orders with the new row, but order_items without it (cross-table inconsistency)
  • Half of a batch of inserts in one Iceberg snapshot, the other half in the next (within-table split)

The final state is always correct — all rows arrive eventually. But there is no point-in-time consistency guarantee across tables or across flush boundaries.

Why This Happens

pgoutput v1 streams events in transaction order (BEGIN → events → COMMIT), but streambed doesn't track transaction boundaries for flush decisions. The pipeline buffers rows per-table and flushes whenever thresholds are hit, regardless of whether the current transaction has committed yet.

Note: pgoutput v2 (streaming mode) doesn't help here — it adds streaming of large in-progress transactions but doesn't change the flush atomicity model.

Potential Approaches

1. Transaction-aware buffering

  • Track BEGIN/COMMIT messages
  • Only make rows visible to flush after their COMMIT arrives
  • Flush at transaction boundaries when thresholds are met
  • Trade-off: large transactions (e.g., bulk INSERT of 1M rows) must be fully buffered in memory before any flush

2. Cross-table atomic commit

  • Buffer all tables touched by a transaction, flush them together on COMMIT
  • Blocker: Iceberg has no cross-table atomic commit — each table is an independent snapshot. This would require a coordination layer (e.g., a "commit log" that readers consult)

3. Commit-timestamp watermark

  • Tag each Iceberg snapshot with the Postgres CommitMessage.CommitTime
  • Readers query "as of timestamp T" across all tables
  • Trade-off: requires reader-side logic; DuckDB/iceberg_scan doesn't support time-travel filtering natively in the current setup

Current Status

This is explicitly deferred in specs/rearchitecture.md:

Transaction-aligned flush (when per-table reader-visibility atomicity becomes a user-visible requirement)

For most analytics use cases (dashboards, reporting, aggregations on lagged data), eventual consistency within seconds is acceptable. This becomes important for workloads that need snapshot isolation across tables — e.g., financial reports joining accounts and transactions expecting consistency as of the same Postgres commit.

Scope

  • Not a correctness bug — final state is always correct
  • Optimization / feature — provides stronger read consistency guarantees
  • Likely requires changes to pipeline.go (transaction tracking), writer.go (flush coordination), and possibly the query server (time-travel support)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions