feat(eap-items): ingest conversation_id and session_id #8100

Open: phacops wants to merge 5 commits into master from pierre/lucid-ramanujan-rqdn2f

phacops (Contributor) commented Jun 24, 2026

Summary

Populates the conversation_id and session_id columns in the eap-items processor from the new TraceItem proto fields.

The columns themselves already exist — they were added as non-nullable UUID columns (after trace_id) by migration 0060_add_conversation_id_and_session_id (#8053) and are declared in the eap_items storage schema — but nothing was writing to them. This change wires the ingestion path so the values actually land in ClickHouse when present on the TraceItem.

Changes

  • Upgrade sentry-protos to 0.34.0 for both Python (pyproject.toml + uv.lock) and Rust (Cargo.toml + Cargo.lock). 0.33.0 is the first version that exposes conversation_id (field 10) and session_id (field 11) on TraceItem; 0.34.0 is the latest release and contains no breaking changes relative to 0.32.0.
  • Ingest the new fields in rust_snuba/src/processors/eap_items.rs:
    • Add conversation_id / session_id to both EAPItem and the RowBinary EAPItemRow (with the existing ClickHouse UUID serde adapter), inserted right after trace_id to match the on-disk column order.
    • Add both to EAPItemRow::COLUMN_NAMES so the explicit INSERT ... FORMAT RowBinary column list stays aligned with the struct's wire order.
    • The proto fields arrive as strings. A parse_uuid_or_random helper parses them into UUIDs, substituting a fresh random UUID when the value is absent, malformed, or all-zero.
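
The column-order constraint described above can be illustrated with a small sketch. It is not the code from rust_snuba/src/processors/eap_items.rs: the struct name EapItemRowSketch, the item_id column, the u128 field types, and the "eap_items" table name passed in the usage below are all assumptions made for illustration. The only point is that the struct's field order and COLUMN_NAMES are kept in lockstep, with the two new ids directly after trace_id.

```rust
/// Hypothetical, trimmed-down stand-in for the real EAPItemRow.
/// Only the ordering of the three leading id fields mirrors the PR text;
/// `item_id` is a placeholder for "whatever columns come next".
#[allow(dead_code)]
struct EapItemRowSketch {
    trace_id: u128,
    conversation_id: u128, // new, directly after trace_id
    session_id: u128,      // new, directly after conversation_id
    item_id: u128,         // placeholder column (assumption)
}

impl EapItemRowSketch {
    /// Must list columns in exactly the order the fields are serialized,
    /// because RowBinary carries no column names on the wire.
    const COLUMN_NAMES: [&'static str; 4] =
        ["trace_id", "conversation_id", "session_id", "item_id"];

    /// Build an explicit-column INSERT statement for a given table.
    fn insert_statement(table: &str) -> String {
        format!(
            "INSERT INTO {} ({}) FORMAT RowBinary",
            table,
            Self::COLUMN_NAMES.join(", ")
        )
    }
}
```

Calling EapItemRowSketch::insert_statement("eap_items") yields a statement whose column list follows the struct order. A layout test in the spirit of test_column_names_match_struct_layout could then assert that conversation_id sits at the index immediately after trace_id.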

Behavior — absent ids are randomized, not zero

The columns are non-nullable UUID. When a TraceItem has no conversation_id/session_id, the processor does not persist the all-zero "magic" UUID. Instead:

  • Present and valid (non-nil) → stored as the parsed UUID.
  • Absent (empty string), malformed, or explicitly all-zero → replaced with a fresh random UUID.

The all-zero value would be indistinguishable from a real (if absurd) all-zero id and would balloon the bf_conversation_id/bf_session_id bloom-filter indexes with a single high-frequency value. Randomizing keeps the columns non-nullable, so no schema migration is required. (Nullable(UUID) was considered but rejected to avoid altering the just-merged migration 0060.)
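
The rules above can be sketched as follows. This is a minimal, std-only illustration, not the helper from the PR: the real parse_uuid_or_random presumably uses a UUID library type, whereas this sketch represents a UUID as a u128, hand-parses the hex, and derives randomness from the standard library's RandomState (a convenient source for a demo, not a recommendation). The helper names parse_uuid and random_uuid_v4 are hypothetical.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Parse a UUID given as 32 hex digits, or as the 36-character
/// hyphenated form (8-4-4-4-12), into its 128-bit value.
fn parse_uuid(s: &str) -> Option<u128> {
    let hex: String = match s.len() {
        32 => s.to_string(),
        36 => {
            let b = s.as_bytes();
            if b[8] != b'-' || b[13] != b'-' || b[18] != b'-' || b[23] != b'-' {
                return None;
            }
            s.chars().filter(|c| *c != '-').collect()
        }
        _ => return None,
    };
    // Reject stray hyphens, signs, or non-hex characters.
    if hex.len() != 32 || !hex.bytes().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    u128::from_str_radix(&hex, 16).ok()
}

/// Produce a random value with the UUID version (4) and variant bits set.
/// Each RandomState::new() carries fresh keys, so the two halves differ.
fn random_uuid_v4() -> u128 {
    let hi = RandomState::new().build_hasher().finish() as u128;
    let lo = RandomState::new().build_hasher().finish() as u128;
    let raw = (hi << 64) | lo;
    let versioned = (raw & !(0xF_u128 << 76)) | (0x4_u128 << 76);
    (versioned & !(0x3_u128 << 62)) | (0x2_u128 << 62)
}

/// Present and valid (non-nil) -> the parsed value.
/// Absent (empty), malformed, or all-zero -> a fresh random value.
fn parse_uuid_or_random(raw: &str) -> u128 {
    match parse_uuid(raw) {
        Some(id) if id != 0 => id,
        _ => random_uuid_v4(),
    }
}
```

One consequence worth noting: because absent and malformed inputs both fall through to the same random branch, a stored value alone cannot tell a reader whether the producer sent an id at all.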

Tests

  • Added unit tests covering present / absent / malformed / explicit-nil values on both the typed conversion and the RowBinary path.
  • Updated test_column_names_match_struct_layout for the two new columns and shifted indices.
  • Updated the test_schemas insta snapshot and added redactions for the two now-random fields so it stays deterministic.
  • cargo build passes, and all processors unit tests pass except test_row_binary_clickhouse_insert, which requires a live ClickHouse instance (not available in the dev sandbox). That test runs in CI and exercises the full RowBinary insert against the real table, including the new column ordering.

🤖 Generated with Claude Code

Populate the conversation_id and session_id columns (added by migration
0060) in the eap-items processor from the new TraceItem proto fields.
Both arrive as strings and are written to their non-nullable UUID
columns, mapping absent or unparseable values to the nil UUID so a
missing/malformed identifier never drops the whole item.

Upgrade sentry-protos to 0.34.0 (Python and Rust), the first version
that exposes conversation_id/session_id on TraceItem.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TS8xhdkBBisbYXFeTo2EyA

phacops requested a review from a team as a code owner June 24, 2026 23:27

claude added 3 commits June 24, 2026 23:42

The eap-items JSON insert path now emits conversation_id and session_id
(nil UUID when absent), so refresh the insta snapshot to match.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TS8xhdkBBisbYXFeTo2EyA

Rather than persisting the all-zero "magic" UUID when a TraceItem has no
conversation_id/session_id (or sends an unparseable/nil value), generate
a fresh random UUID. The all-zero value would be indistinguishable from a
real all-zero id and would balloon the bloom-filter index with a single
high-frequency value. This keeps the columns non-nullable (no migration).

The schema-snapshot test redacts the two now-random fields to stay
deterministic.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TS8xhdkBBisbYXFeTo2EyA

The conversation_id/session_id columns are UUID, but the proto fields are
unconstrained strings. Producers are expected to send UUID-formatted ids;
a present value that fails to parse as a UUID now increments the
`eap_items.non_uuid_id` metric (tagged by field) so non-UUID producers
are detectable, instead of being silently swallowed. We still randomize
(absent, explicit-nil, or non-UUID) so a valid UUID is always stored and
an optional field never drops the whole item.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TS8xhdkBBisbYXFeTo2EyA

…r_random

The metric was emitted when a malformed conversation_id or session_id was
encountered. Removing it in favour of silent randomization to keep the
ingestion path lean.