[CDC] DocDB: Per-phase latency and batch-size metrics for CDCSDK GetChanges#10
Draft
bdbene wants to merge 2 commits into
Draft
[CDC] DocDB: Per-phase latency and batch-size metrics for CDCSDK GetChanges#10bdbene wants to merge 2 commits into
bdbene wants to merge 2 commits into
Conversation
…hanges Adds per-tablet histograms and counters on CDCSDKTabletMetrics so operators can attribute slow GetChanges RPCs to specific phases. Today the only visibility is the server-wide handler_latency and a few aggregate counters, which is not enough to tell whether time is going to WAL reads, intent reads, schema lookups, the cdc_state checkpoint write, or somewhere else. Per-phase latency histograms (microseconds, per stream/tablet): - cdcsdk_get_changes_total_latency total RPC duration (per-tablet view) - cdcsdk_get_changes_preflight_latency RPC entry up to GetChangesForCDCSDK - cdcsdk_get_last_checkpoint_latency cdc_state read on missing from_op_id - cdcsdk_update_checkpoint_latency cdc_state write on EXPLICIT ack - cdcsdk_wal_read_latency GetConsistentWALRecords / GetWALRecords - cdcsdk_process_intents_latency ProcessIntentsWithInvalidSchemaRetry - cdcsdk_get_intents_latency tablet->GetIntentsForCDC alone - cdcsdk_populate_write_record_latency single-shard PopulateCDCSDKWriteRecord - cdcsdk_schema_lookup_latency client->GetTableSchemaFromSysCatalog - cdcsdk_safe_time_wait_latency GetConsistentStreamSafeTime Per-call batch-size histograms (one observation per successful response): - cdcsdk_wal_records_read / cdcsdk_wal_bytes_read - cdcsdk_intents_per_txn - cdcsdk_response_records / cdcsdk_response_bytes Per-optype counters incremented in the GetChangesForCDCSDK main switch: - cdcsdk_write_ops_seen, cdcsdk_update_txn_ops_seen, cdcsdk_change_metadata_ops_seen, cdcsdk_truncate_ops_seen, cdcsdk_split_ops_seen Plumbing: the existing CDCThroughputMetrics struct that already threads counters from the producer to the service layer is extended with a non-owning xrepl::CDCSDKTabletMetrics* pointer plus per-call WAL accumulators. CDCServiceImpl::GetChanges acquires the metric instance via the existing GetCDCSDKTabletMetrics() lookup (skipped for xCluster and the sys catalog tablet), populates the pointer before calling GetChangesForCDCSDK, and emits batch-size histograms at the end of the call. ScopedLatencyMetric (null-safe) is used at every observation site so the same code paths work on the xCluster path where the per-tablet pointer is null. Scope: CDCSDK only. xCluster path is unchanged.
The server-wide handler_latency_yb_cdc_CDCService_GetChanges already covers total RPC duration, and the per-phase histograms in this PR let operators attribute time per phase. The extra per-tablet total adds no signal and costs one EventStats per (stream, tablet).
Author
|
/gemini review |
egladysh
pushed a commit
that referenced
this pull request
Jun 18, 2026
…tion Summary: ## Problem A READ COMMITTED retry triggered by a deadlock / abort could SIGSEGV at `AfterTriggerEndSubXact` during the subsequent ROLLBACK. Stack trace: ``` Signal: SIGSEGV #0 GetMemoryChunkContext(pointer=0x0) memutils.h:141:12 (inlined) #1 pfree(pointer=0x0) mcxt.c:1500:26 (inlined) #2 AfterTriggerEndSubXact trigger.c:5657:4 #3 AbortSubTransaction xact.c:5726:3 #4 CommitTransactionCommand xact.c:3650:4 #5 CommitTransactionCommand xact.c:0 #6 yb_exec_simple_query_impl postgres.c:3023:3 (inlined finish_xact_command) #7 yb_exec_simple_query_impl postgres.c:1494:4 #8 yb_exec_simple_query_impl postgres.c:5804:2 #9 yb_exec_query_wrapper_one_attempt postgres.c:5764:3 #10 PostgresMain postgres.c:5796:3 #11 PostgresMain postgres.c:5821:2 (inlined yb_exec_simple_query) yugabyte#12 PostgresMain postgres.c:6623:8 ``` ### Reading the stack - **#2 `AfterTriggerEndSubXact` (trigger.c:5657)** -- the call site is `pfree(afterTriggers.state)` in the abort branch, gated on `trans_stack[my_level].state != NULL`. - **#3-#5 `CommitTransactionCommand` -> `AbortSubTransaction`** -- called because of ROLLBACK. ### Root cause The retry path before this change ran, in order: ``` yb_restart_transaction -> YBCRestartWriteTransaction -> AfterTriggerEndXact(false) // wipes trans_stack to NULL, maxtransdepth to 0 // and afterTriggers.state to NULL -> AfterTriggerBeginXact() -> RollbackAndReleaseCurrentSubTransaction -> YbBeginInternalSubTransactionForReadCommittedStatement -> AfterTriggerBeginSubXact -> MemoryContextAlloc(8 * sizeof(AfterTriggersTransData)) // NOT AllocZero: leaves N-1 slots uninitialized -> initializes trans_stack[my_level] only ``` Any live subxact with level < `my_level` -- a user `SAVEPOINT` and the per-statement RC internal subxact above it -- now points at an uninitialized slot. ROLLBACK reads garbage as `state`, the non-NULL gate passes, and `pfree(afterTriggers.state)` faults because `EndXact` already cleared that field. ## Fix `YBCRestartWriteTransaction` now rolls back every savepoint / subtransaction before recreating the top-level write state, so the PG-side `trans_stack` is empty by the time the surgical reset wipes the after-trigger state. Removed the now-redundant `RollbackAndReleaseCurrentSubTransaction()` from the else branch of `yb_restart_transaction`. Test Plan: Jenkins Reviewers: pjain, smishra Reviewed By: pjain Subscribers: ybase, yql Differential Revision: https://phorge.dev.yugabyte.com/D54387
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds per-tablet histograms and counters on
CDCSDKTabletMetricsso operators can attribute slow CDCSDKGetChangesRPCs to specific phases. Today the only visibility is the server-widehandler_latency_yb_cdc_CDCService_GetChangesand a few aggregate counters, which is not enough to tell whether the time is going to WAL reads, intent reads, schema lookups, thecdc_statecheckpoint write, or somewhere else.Scope: CDCSDK only. The xCluster path is unchanged.
Per-phase latency histograms (microseconds, per stream/tablet)
Total RPC duration is already covered by the server-wide
handler_latency_yb_cdc_CDCService_GetChanges; the per-phase metrics below let operators attribute time within that total.cdcsdk_get_changes_preflight_latencyGetChangesForCDCSDK(semaphore, validation, stream/tablet/leader lookup, schema/enum/composite caches)cdcsdk_get_last_checkpoint_latencyGetLastCheckpoint(cdc_stateread) when the client sends nofrom_op_idcdcsdk_update_checkpoint_latencyUpdateCheckpointAndActiveTime(cdc_statewrite) on EXPLICIT ackcdcsdk_wal_read_latencyGetConsistentWALRecords/GetWALRecordscdcsdk_process_intents_latencyProcessIntentsWithInvalidSchemaRetryper multi-shard txncdcsdk_get_intents_latencytablet->GetIntentsForCDCalone (separates IntentsDB read from record population)cdcsdk_populate_write_record_latencyPopulateCDCSDKWriteRecordfor single-shard writescdcsdk_schema_lookup_latencyclient->GetTableSchemaFromSysCatalog(master round-trips)cdcsdk_safe_time_wait_latencyGetConsistentStreamSafeTime(waiting for safe time / txn load)Per-call batch-size histograms (one observation per successful response)
cdcsdk_wal_records_read,cdcsdk_wal_bytes_read,cdcsdk_intents_per_txn,cdcsdk_response_records,cdcsdk_response_bytes.Per-optype counters
Incremented in the
GetChangesForCDCSDKmain switch:cdcsdk_write_ops_seen,cdcsdk_update_txn_ops_seen,cdcsdk_change_metadata_ops_seen,cdcsdk_truncate_ops_seen,cdcsdk_split_ops_seen.Implementation notes
CDCThroughputMetricsstruct that already threads counters from the producer to the service layer. Extended with a non-owningxrepl::CDCSDKTabletMetrics*pointer plus per-call WAL byte/record accumulators.CDCServiceImpl::GetChangesacquires the per-tablet metric via the existingGetCDCSDKTabletMetrics()lookup (skipped for xCluster and the sys catalog tablet), populates the pointer before callingGetChangesForCDCSDK, and emits batch-size histograms at the end of the call.ScopedLatencyMetric<EventStats>, which is null-safe — so the same code paths work when the per-tablet pointer is unset (xCluster, sys catalog, metric acquisition failure).(streams × tablets)scaling as the existingcdcsdk_sent_lag_microsetc., so no new cardinality regime, but worth measuring tserver heap impact on a busy cluster.Test plan
./yb_build.sh release daemons initdb --sj --skip-pg-parquet --no-odyssey --no-ybcbuilds clean./yb_build.sh release --cxx-test cdc_service-testandcdcsdk_producer-testcdcsdk_*_latency/cdcsdk_*_read/cdcsdk_*_per_txnseries appear athttp://<tserver>:9000/prometheus-metricswith non-zero counts and sensible percentilestable_name/tablet_id/stream_id(same labels ascdcsdk_sent_lag_micros)