feat(eap-items): read array attributes from typed map columns for recent queries by phacops · Pull Request #8101 · getsentry/snuba

phacops · 2026-06-25T00:31:48Z

Summary

Array attributes have been double-written into the typed attributes_array_{string,int,float,bool} map columns since 2026-06-22 (#8079, follow-up to #8056). This PR switches the array-attribute filter read path to those typed columns when the queried window is new enough that the columns are fully populated, and otherwise keeps reading the legacy attributes_array JSON column.

The gating mirrors the existing use_sampling_factor pattern: read the typed columns only when the query's earliest bound is on/after 2026-06-23 00:00 UTC (the day after double-writing began), so we never read typed columns for data that predates the double-write.

Linear: EAP-562

What it does

USE_ARRAY_MAP_COLUMNS_TIMESTAMP_SECONDS (1782172800, 2026-06-23 UTC) + use_array_map_columns(meta) in web/rpc/common/common.py, mirroring use_sampling_factor. Overridable via the use_array_map_columns_timestamp_seconds state config; a value of 0 disables the typed-column read path entirely.
type_array_to_membership_array_expression_from_typed_columns in protos/common.py — the typed-column counterpart of type_array_to_membership_array_expression. It builds a normalized Array(String) via arrayConcat over all four typed columns (string as-is, int/float via toString, bool as 'true'/'false'), so the per-element comparisons (_type_array_membership_rhs_expression) keep matching the JSON-column behaviour. Array elements are routed to exactly one typed column by type (AttributeMap::insert_array) — including array ints, which live only in attributes_array_int — so all four columns are read and no element is duplicated.
Threaded use_array_map_columns through trace_item_filters_to_expression / _trace_item_filter_key_expression, and passed use_array_map_columns(meta) at every resolver/endpoint filter call site (trace-item table, time series, stats, heatmap, cross-item queries, get-traces, trace-item details).
Threaded it through the conditional-aggregation chain too (aggregation_to_expression, get_count_column, get_average_sample_rate_column, get_confidence_interval_column, the extrapolation/_get_ci_* helpers → _get_condition_in_aggregation), so countIf/sumIf predicates filtering on array attributes also use the typed columns. (Addresses the Seer review.)
get_field_existence_expression now treats the typed arrayConcat membership expression like arrayMap (notEmpty), so exists_filter on array keys works against the typed columns.

Scope / safety

This change covers the filter (WHERE / membership) read path, including conditional-aggregation filter predicates — the explicitly-named "read-path filter support" follow-up. The SELECT clause and the aggregate field itself continue to read the always-written attributes_array JSON column, so queries stay read-consistent for any window (both column families hold identical data for post-2026-06-22 rows). Migrating those representation-returning paths is left as a separate change.

Tests

tests/web/rpc/test_common.py:

test_use_array_map_columns — cutoff/boundary behaviour and the 0-disables config, mirroring test_use_sampling_factor.
TestTraceItemFiltersArrayMapColumns — LIKE / EQUALS / exists_filter on array keys route to arrayConcat over the four typed columns when enabled, and to arrayMap over attributes_array by default; the typed builder rejects non-array keys.

tests/web/rpc/test_aggregation.py:

test_conditional_aggregation_array_filter_uses_typed_columns — a conditional aggregation whose filter is on an array attribute reads the four typed columns when enabled, and the JSON column otherwise.

Verified locally (Python 3.13 venv + built Rust extension): all unit/expression tests in test_common.py and test_aggregation.py pass, ruff format --check + ruff check clean. (Integration tests needing a live ClickHouse run in CI.)

🤖 Generated with Claude Code

…ent queries Array attributes have been double-written into the typed `attributes_array_{string,int,float,bool}` map columns since 2026-06-22 (#8079). This switches the array-attribute filter read path to those typed columns for query windows new enough that the columns are fully populated, falling back to the legacy `attributes_array` JSON column otherwise. - Add `USE_ARRAY_MAP_COLUMNS_TIMESTAMP_SECONDS` (2026-06-23 00:00 UTC) and `use_array_map_columns(meta)`, mirroring `use_sampling_factor`. The cutoff is the day after the double-write began, so only data guaranteed to be present in the typed columns is read from them. A `use_array_map_columns_timestamp_seconds` state config of 0 disables the typed-column read path. - Add `type_array_to_membership_array_expression_from_typed_columns`, the typed-column counterpart of `type_array_to_membership_array_expression`. It builds a normalized `Array(String)` via `arrayConcat` over the string, float and bool columns so per-element comparisons keep matching. Ints are double-written into the float column (the read path resolves numerics to float), so the int column is not read to avoid duplicate elements. - Thread `use_array_map_columns` through `trace_item_filters_to_expression` and `_trace_item_filter_key_expression`, and pass `use_array_map_columns(meta)` at every resolver/endpoint filter call site. `get_field_existence_expression` treats the `arrayConcat` membership expression like `arrayMap` (notEmpty). The SELECT and aggregate array paths keep reading the always-written JSON column, so this change is read-consistent for any window. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01J9drsbsd3QyVPEFy8ALB7A

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 8e90787. Configure here.}

…array Array integers are written only to `attributes_array_int`, not also to the float column — `AttributeMap::insert_array` (eap_items Rust processor) routes each element to exactly one typed column by type, unlike the scalar `insert_int` double-write. Reading only string/float/bool therefore missed int-only arrays entirely: membership filters never matched and `exists_filter` reported the key as absent. Read all four typed columns in the `arrayConcat`, mapping ints to strings via `toString`. Element types never overlap across the typed columns, so no element is duplicated. Caught by Cursor Bugbot review. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01J9drsbsd3QyVPEFy8ALB7A

…regations Conditional-aggregation filters (the predicate inside countIf/sumIf, built by `_get_condition_in_aggregation`) on array attributes were still resolving against the legacy `attributes_array` JSON column, because the aggregation helpers didn't propagate `use_array_map_columns`. Results were correct (the JSON column is always written) but didn't use the typed columns. Thread `use_array_map_columns` through the aggregation chain alongside the existing `use_sampling_factor` parameter — `aggregation_to_expression`, `get_count_column`, `get_average_sample_rate_column`, `get_confidence_interval_column`, `get_extrapolated_function`, and the `_get_ci_*` helpers — down to `_get_condition_in_aggregation`, and pass `use_array_map_columns(meta)` at the trace-item-table and time-series call sites. The aggregate field itself (and SELECT) keep reading the JSON column, as before. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01J9drsbsd3QyVPEFy8ALB7A

phacops requested review from a team as code owners June 25, 2026 00:31

cursor Bot reviewed Jun 25, 2026

View reviewed changes

Comment thread snuba/protos/common.py Outdated

sentry Bot reviewed Jun 25, 2026

View reviewed changes

Comment thread snuba/web/rpc/common/common.py

claude added 2 commits June 25, 2026 00:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

feat(eap-items): read array attributes from typed map columns for recent queries#8101

feat(eap-items): read array attributes from typed map columns for recent queries#8101
phacops wants to merge 3 commits into
masterfrom
pierre/keen-noether-dn31x9

phacops commented Jun 25, 2026 •

edited

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Uh oh!

Conversation

phacops commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What it does

Scope / safety

Tests

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

phacops commented Jun 25, 2026 •

edited

Loading