Skip to content

feat(eap-items): read array attributes from typed map columns for recent queries#8101

Open
phacops wants to merge 3 commits into
masterfrom
pierre/keen-noether-dn31x9
Open

feat(eap-items): read array attributes from typed map columns for recent queries#8101
phacops wants to merge 3 commits into
masterfrom
pierre/keen-noether-dn31x9

Conversation

@phacops

@phacops phacops commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Summary

Array attributes have been double-written into the typed attributes_array_{string,int,float,bool} map columns since 2026-06-22 (#8079, follow-up to #8056). This PR switches the array-attribute filter read path to those typed columns when the queried window is new enough that the columns are fully populated, and otherwise keeps reading the legacy attributes_array JSON column.

The gating mirrors the existing use_sampling_factor pattern: read the typed columns only when the query's earliest bound is on/after 2026-06-23 00:00 UTC (the day after double-writing began), so we never read typed columns for data that predates the double-write.

Linear: EAP-562

What it does

  • USE_ARRAY_MAP_COLUMNS_TIMESTAMP_SECONDS (1782172800, 2026-06-23 UTC) + use_array_map_columns(meta) in web/rpc/common/common.py, mirroring use_sampling_factor. Overridable via the use_array_map_columns_timestamp_seconds state config; a value of 0 disables the typed-column read path entirely.
  • type_array_to_membership_array_expression_from_typed_columns in protos/common.py — the typed-column counterpart of type_array_to_membership_array_expression. It builds a normalized Array(String) via arrayConcat over all four typed columns (string as-is, int/float via toString, bool as 'true'/'false'), so the per-element comparisons (_type_array_membership_rhs_expression) keep matching the JSON-column behaviour. Array elements are routed to exactly one typed column by type (AttributeMap::insert_array) — including array ints, which live only in attributes_array_int — so all four columns are read and no element is duplicated.
  • Threaded use_array_map_columns through trace_item_filters_to_expression / _trace_item_filter_key_expression, and passed use_array_map_columns(meta) at every resolver/endpoint filter call site (trace-item table, time series, stats, heatmap, cross-item queries, get-traces, trace-item details).
  • Threaded it through the conditional-aggregation chain too (aggregation_to_expression, get_count_column, get_average_sample_rate_column, get_confidence_interval_column, the extrapolation/_get_ci_* helpers → _get_condition_in_aggregation), so countIf/sumIf predicates filtering on array attributes also use the typed columns. (Addresses the Seer review.)
  • get_field_existence_expression now treats the typed arrayConcat membership expression like arrayMap (notEmpty), so exists_filter on array keys works against the typed columns.

Scope / safety

This change covers the filter (WHERE / membership) read path, including conditional-aggregation filter predicates — the explicitly-named "read-path filter support" follow-up. The SELECT clause and the aggregate field itself continue to read the always-written attributes_array JSON column, so queries stay read-consistent for any window (both column families hold identical data for post-2026-06-22 rows). Migrating those representation-returning paths is left as a separate change.

Tests

tests/web/rpc/test_common.py:

  • test_use_array_map_columns — cutoff/boundary behaviour and the 0-disables config, mirroring test_use_sampling_factor.
  • TestTraceItemFiltersArrayMapColumns — LIKE / EQUALS / exists_filter on array keys route to arrayConcat over the four typed columns when enabled, and to arrayMap over attributes_array by default; the typed builder rejects non-array keys.

tests/web/rpc/test_aggregation.py:

  • test_conditional_aggregation_array_filter_uses_typed_columns — a conditional aggregation whose filter is on an array attribute reads the four typed columns when enabled, and the JSON column otherwise.

Verified locally (Python 3.13 venv + built Rust extension): all unit/expression tests in test_common.py and test_aggregation.py pass, ruff format --check + ruff check clean. (Integration tests needing a live ClickHouse run in CI.)

🤖 Generated with Claude Code

…ent queries

Array attributes have been double-written into the typed
`attributes_array_{string,int,float,bool}` map columns since 2026-06-22
(#8079). This switches the array-attribute filter read path to those typed
columns for query windows new enough that the columns are fully populated,
falling back to the legacy `attributes_array` JSON column otherwise.

- Add `USE_ARRAY_MAP_COLUMNS_TIMESTAMP_SECONDS` (2026-06-23 00:00 UTC) and
  `use_array_map_columns(meta)`, mirroring `use_sampling_factor`. The cutoff is
  the day after the double-write began, so only data guaranteed to be present in
  the typed columns is read from them. A `use_array_map_columns_timestamp_seconds`
  state config of 0 disables the typed-column read path.
- Add `type_array_to_membership_array_expression_from_typed_columns`, the
  typed-column counterpart of `type_array_to_membership_array_expression`. It
  builds a normalized `Array(String)` via `arrayConcat` over the string, float
  and bool columns so per-element comparisons keep matching. Ints are
  double-written into the float column (the read path resolves numerics to
  float), so the int column is not read to avoid duplicate elements.
- Thread `use_array_map_columns` through `trace_item_filters_to_expression` and
  `_trace_item_filter_key_expression`, and pass `use_array_map_columns(meta)` at
  every resolver/endpoint filter call site. `get_field_existence_expression`
  treats the `arrayConcat` membership expression like `arrayMap` (notEmpty).

The SELECT and aggregate array paths keep reading the always-written JSON column,
so this change is read-consistent for any window.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01J9drsbsd3QyVPEFy8ALB7A
@phacops phacops requested review from a team as code owners June 25, 2026 00:31

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 8e90787. Configure here.

Comment thread snuba/protos/common.py Outdated
Comment thread snuba/web/rpc/common/common.py
claude added 2 commits June 25, 2026 00:37
…array

Array integers are written only to `attributes_array_int`, not also to the
float column — `AttributeMap::insert_array` (eap_items Rust processor) routes
each element to exactly one typed column by type, unlike the scalar
`insert_int` double-write. Reading only string/float/bool therefore missed
int-only arrays entirely: membership filters never matched and `exists_filter`
reported the key as absent.

Read all four typed columns in the `arrayConcat`, mapping ints to strings via
`toString`. Element types never overlap across the typed columns, so no element
is duplicated. Caught by Cursor Bugbot review.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01J9drsbsd3QyVPEFy8ALB7A
…regations

Conditional-aggregation filters (the predicate inside countIf/sumIf, built by
`_get_condition_in_aggregation`) on array attributes were still resolving
against the legacy `attributes_array` JSON column, because the aggregation
helpers didn't propagate `use_array_map_columns`. Results were correct (the
JSON column is always written) but didn't use the typed columns.

Thread `use_array_map_columns` through the aggregation chain alongside the
existing `use_sampling_factor` parameter — `aggregation_to_expression`,
`get_count_column`, `get_average_sample_rate_column`,
`get_confidence_interval_column`, `get_extrapolated_function`, and the
`_get_ci_*` helpers — down to `_get_condition_in_aggregation`, and pass
`use_array_map_columns(meta)` at the trace-item-table and time-series call
sites. The aggregate field itself (and SELECT) keep reading the JSON column, as
before.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01J9drsbsd3QyVPEFy8ALB7A
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants