feat(eap-items): read array attributes from typed map columns for recent queries#8101
Open
phacops wants to merge 3 commits into
Open
feat(eap-items): read array attributes from typed map columns for recent queries#8101phacops wants to merge 3 commits into
phacops wants to merge 3 commits into
Conversation
…ent queries
Array attributes have been double-written into the typed
`attributes_array_{string,int,float,bool}` map columns since 2026-06-22
(#8079). This switches the array-attribute filter read path to those typed
columns for query windows new enough that the columns are fully populated,
falling back to the legacy `attributes_array` JSON column otherwise.
- Add `USE_ARRAY_MAP_COLUMNS_TIMESTAMP_SECONDS` (2026-06-23 00:00 UTC) and
`use_array_map_columns(meta)`, mirroring `use_sampling_factor`. The cutoff is
the day after the double-write began, so only data guaranteed to be present in
the typed columns is read from them. A `use_array_map_columns_timestamp_seconds`
state config of 0 disables the typed-column read path.
- Add `type_array_to_membership_array_expression_from_typed_columns`, the
typed-column counterpart of `type_array_to_membership_array_expression`. It
builds a normalized `Array(String)` via `arrayConcat` over the string, float
and bool columns so per-element comparisons keep matching. Ints are
double-written into the float column (the read path resolves numerics to
float), so the int column is not read to avoid duplicate elements.
- Thread `use_array_map_columns` through `trace_item_filters_to_expression` and
`_trace_item_filter_key_expression`, and pass `use_array_map_columns(meta)` at
every resolver/endpoint filter call site. `get_field_existence_expression`
treats the `arrayConcat` membership expression like `arrayMap` (notEmpty).
The SELECT and aggregate array paths keep reading the always-written JSON column,
so this change is read-consistent for any window.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01J9drsbsd3QyVPEFy8ALB7A
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 8e90787. Configure here.
…array Array integers are written only to `attributes_array_int`, not also to the float column — `AttributeMap::insert_array` (eap_items Rust processor) routes each element to exactly one typed column by type, unlike the scalar `insert_int` double-write. Reading only string/float/bool therefore missed int-only arrays entirely: membership filters never matched and `exists_filter` reported the key as absent. Read all four typed columns in the `arrayConcat`, mapping ints to strings via `toString`. Element types never overlap across the typed columns, so no element is duplicated. Caught by Cursor Bugbot review. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01J9drsbsd3QyVPEFy8ALB7A
…regations Conditional-aggregation filters (the predicate inside countIf/sumIf, built by `_get_condition_in_aggregation`) on array attributes were still resolving against the legacy `attributes_array` JSON column, because the aggregation helpers didn't propagate `use_array_map_columns`. Results were correct (the JSON column is always written) but didn't use the typed columns. Thread `use_array_map_columns` through the aggregation chain alongside the existing `use_sampling_factor` parameter — `aggregation_to_expression`, `get_count_column`, `get_average_sample_rate_column`, `get_confidence_interval_column`, `get_extrapolated_function`, and the `_get_ci_*` helpers — down to `_get_condition_in_aggregation`, and pass `use_array_map_columns(meta)` at the trace-item-table and time-series call sites. The aggregate field itself (and SELECT) keep reading the JSON column, as before. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01J9drsbsd3QyVPEFy8ALB7A
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Summary
Array attributes have been double-written into the typed
attributes_array_{string,int,float,bool}map columns since 2026-06-22 (#8079, follow-up to #8056). This PR switches the array-attribute filter read path to those typed columns when the queried window is new enough that the columns are fully populated, and otherwise keeps reading the legacyattributes_arrayJSON column.The gating mirrors the existing
use_sampling_factorpattern: read the typed columns only when the query's earliest bound is on/after 2026-06-23 00:00 UTC (the day after double-writing began), so we never read typed columns for data that predates the double-write.Linear: EAP-562
What it does
USE_ARRAY_MAP_COLUMNS_TIMESTAMP_SECONDS(1782172800, 2026-06-23 UTC) +use_array_map_columns(meta)inweb/rpc/common/common.py, mirroringuse_sampling_factor. Overridable via theuse_array_map_columns_timestamp_secondsstate config; a value of0disables the typed-column read path entirely.type_array_to_membership_array_expression_from_typed_columnsinprotos/common.py— the typed-column counterpart oftype_array_to_membership_array_expression. It builds a normalizedArray(String)viaarrayConcatover all four typed columns (string as-is, int/float viatoString, bool as'true'/'false'), so the per-element comparisons (_type_array_membership_rhs_expression) keep matching the JSON-column behaviour. Array elements are routed to exactly one typed column by type (AttributeMap::insert_array) — including array ints, which live only inattributes_array_int— so all four columns are read and no element is duplicated.use_array_map_columnsthroughtrace_item_filters_to_expression/_trace_item_filter_key_expression, and passeduse_array_map_columns(meta)at every resolver/endpoint filter call site (trace-item table, time series, stats, heatmap, cross-item queries, get-traces, trace-item details).aggregation_to_expression,get_count_column,get_average_sample_rate_column,get_confidence_interval_column, the extrapolation/_get_ci_*helpers →_get_condition_in_aggregation), socountIf/sumIfpredicates filtering on array attributes also use the typed columns. (Addresses the Seer review.)get_field_existence_expressionnow treats the typedarrayConcatmembership expression likearrayMap(notEmpty), soexists_filteron array keys works against the typed columns.Scope / safety
This change covers the filter (WHERE / membership) read path, including conditional-aggregation filter predicates — the explicitly-named "read-path filter support" follow-up. The SELECT clause and the aggregate field itself continue to read the always-written
attributes_arrayJSON column, so queries stay read-consistent for any window (both column families hold identical data for post-2026-06-22 rows). Migrating those representation-returning paths is left as a separate change.Tests
tests/web/rpc/test_common.py:test_use_array_map_columns— cutoff/boundary behaviour and the0-disables config, mirroringtest_use_sampling_factor.TestTraceItemFiltersArrayMapColumns— LIKE / EQUALS / exists_filter on array keys route toarrayConcatover the four typed columns when enabled, and toarrayMapoverattributes_arrayby default; the typed builder rejects non-array keys.tests/web/rpc/test_aggregation.py:test_conditional_aggregation_array_filter_uses_typed_columns— a conditional aggregation whose filter is on an array attribute reads the four typed columns when enabled, and the JSON column otherwise.Verified locally (Python 3.13 venv + built Rust extension): all unit/expression tests in
test_common.pyandtest_aggregation.pypass,ruff format --check+ruff checkclean. (Integration tests needing a live ClickHouse run in CI.)🤖 Generated with Claude Code