feat(eap-items): ingest conversation_id and session_id#8100
Open
phacops wants to merge 5 commits into
Open
Conversation
Populate the conversation_id and session_id columns (added by migration 0060) in the eap-items processor from the new TraceItem proto fields. Both arrive as strings and are written to their non-nullable UUID columns, mapping absent or unparseable values to the nil UUID so a missing/malformed identifier never drops the whole item. Upgrade sentry-protos to 0.34.0 (Python and Rust), the first version that exposes conversation_id/session_id on TraceItem. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01TS8xhdkBBisbYXFeTo2EyA
The eap-items JSON insert path now emits conversation_id and session_id (nil UUID when absent), so refresh the insta snapshot to match. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01TS8xhdkBBisbYXFeTo2EyA
Rather than persisting the all-zero "magic" UUID when a TraceItem has no conversation_id/session_id (or sends an unparseable/nil value), generate a fresh random UUID. The all-zero value would be indistinguishable from a real all-zero id and would balloon the bloom-filter index with a single high-frequency value. This keeps the columns non-nullable (no migration). The schema-snapshot test redacts the two now-random fields to stay deterministic. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01TS8xhdkBBisbYXFeTo2EyA
The conversation_id/session_id columns are UUID, but the proto fields are unconstrained strings. Producers are expected to send UUID-formatted ids; a present value that fails to parse as a UUID now increments the `eap_items.non_uuid_id` metric (tagged by field) so non-UUID producers are detectable, instead of being silently swallowed. We still randomize (absent, explicit-nil, or non-UUID) so a valid UUID is always stored and an optional field never drops the whole item. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01TS8xhdkBBisbYXFeTo2EyA
…r_random The metric was emitted when a malformed conversation_id or session_id was encountered. Removing it in favour of silent randomization to keep the ingestion path lean.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Populates the
conversation_idandsession_idcolumns in the eap-items processor from the newTraceItemproto fields.The columns themselves already exist — they were added as non-nullable
UUIDcolumns (aftertrace_id) by migration0060_add_conversation_id_and_session_id(#8053) and are declared in theeap_itemsstorage schema — but nothing was writing to them. This change wires the ingestion path so the values actually land in ClickHouse when present on theTraceItem.Changes
sentry-protosto0.34.0for both Python (pyproject.toml+uv.lock) and Rust (Cargo.toml+Cargo.lock).0.33.0is the first version that exposesconversation_id(field 10) andsession_id(field 11) onTraceItem;0.34.0is the current latest and contains no breaking changes vs0.32.0.rust_snuba/src/processors/eap_items.rs:conversation_id/session_idto bothEAPItemand the RowBinaryEAPItemRow(with the existing ClickHouse UUID serde adapter), inserted right aftertrace_idto match the on-disk column order.EAPItemRow::COLUMN_NAMESso the explicitINSERT ... FORMAT RowBinarycolumn list stays aligned with the struct's wire order.parse_uuid_or_randomhelper parses them into UUIDs.Behavior — absent ids are randomized, not zero
The columns are non-nullable
UUID. Rather than persisting the all-zero "magic" UUID when aTraceItemhas noconversation_id/session_id:The all-zero value would be indistinguishable from a real (if absurd) all-zero id and would balloon the
bf_conversation_id/bf_session_idbloom-filter indexes with a single high-frequency value. Randomizing keeps the columns non-nullable, so no schema migration is required. (Nullable(UUID)was considered but rejected to avoid altering the just-merged migration0060.)Tests
test_column_names_match_struct_layoutfor the two new columns and shifted indices.test_schemasinsta snapshot and added redactions for the two now-random fields so it stays deterministic.cargo buildpasses and allprocessorsunit tests pass. The one exception istest_row_binary_clickhouse_insert, which requires a live ClickHouse instance (not available in the dev sandbox); it runs in CI and exercises the full RowBinary insert against the real table — including the new column ordering.🤖 Generated with Claude Code