ref(options): migrate runtime config to sentry-options (Python RPC/query + Rust killswitches)#8096
Draft
phacops wants to merge 35 commits into
Adds the sentry-options Python client and uses it for the
`enable_any_attribute_filter` flag, which previously lived in
Redis-backed runtime config (`state.get_int_config`). This mirrors how
the Rust consumers already read the `snuba` options namespace.
- Add `sentry-options>=1.1.1` dependency (uv.lock updated)
- Declare `enable_any_attribute_filter` (boolean, default true) in the
snuba sentry-options schema
- Add `snuba/state/sentry_options.py` wrapping `init()` /
`options("snuba").get()` with a safe fallback to each call site's
default; initialized from `setup_sentry()`
- Swap the RPC call site to `get_option(...)`, preserving the
default-on kill-switch semantics
- Add unit + integration tests; point conftest at the in-repo schema
via SENTRY_OPTIONS_DIR so init() is cwd-independent
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
…ptions
Continues the migration of Redis-backed runtime config to sentry-options
(the Python counterpart to how the Rust consumers already read the `snuba`
namespace). Migrates 12 read-only feature flags / tuning knobs that have a
single source of truth and are safe to manage centrally:
boolean: aggregation_deprecation_enabled, enable_trace_pagination,
use.low.cardinality.processor, cross_item_queries_no_sample_outer
integer: default_tier, export_trace_items_default_page_size,
use_sampling_factor_timestamp_seconds,
ExecutionStage.max_query_size_bytes
number: EndpointGetTrace.apply_final_rollout_percentage,
rpc_logging_sample_rate, rpc_logging_flush_logs
string: ExecutionStage.disable_max_query_size_check_for_clusters
- Add typed accessors get_bool_option/get_int_option/get_float_option/
get_str_option to snuba/state/sentry_options.py (mirroring get_int_config
& friends) so call sites stay typed under strict mypy. Each falls back to
the call site default if sentry-options is unavailable, matching the Rust
`.ok()...unwrap_or(default)` semantics.
- Declare each key in the snuba sentry-options schema with the type and
default matching the previous runtime-config default (behavior-preserving).
- Swap each call site from state.get_*_config(...) to the typed accessor.
- Update tests that toggled these via state.set_config(...) to use
sentry_options.testing.override_options(...) instead.
Schema defaults match prior get_*_config defaults, so behavior is unchanged
until a value is set in sentry-options-automator.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
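The typed accessors described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code in `snuba/state/sentry_options.py`: the `_FAKE_VALUES` dict and `_read` helper are hypothetical stand-ins for the real client read, and the coercion rules are assumptions.

```python
from typing import Any, Dict

# Hypothetical in-memory stand-in for values served by the options client.
_FAKE_VALUES: Dict[str, Any] = {
    "enable_trace_pagination": True,
    "default_tier": "8",  # deliberately a string, to show coercion
    "rpc_logging_sample_rate": 0.25,
    "export_trace_items_default_page_size": "not-a-number",
}


def _read(key: str) -> Any:
    """Stand-in for the client read; raises KeyError for unknown keys."""
    return _FAKE_VALUES[key]


def get_bool_option(key: str, default: bool) -> bool:
    try:
        return bool(_read(key))
    except KeyError:
        return default


def get_int_option(key: str, default: int) -> int:
    try:
        return int(_read(key))
    except (KeyError, TypeError, ValueError):
        # Missing key, or a value that cannot be coerced: use the call-site default.
        return default


def get_float_option(key: str, default: float) -> float:
    try:
        return float(_read(key))
    except (KeyError, TypeError, ValueError):
        return default
```

Because every accessor returns exactly the declared type, call sites need no `cast` or `isinstance` assert to satisfy strict mypy.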
…options
The Rust consumers read some config via `runtime_config::get_str_config`,
which calls back into Python `snuba.state` (Redis) over PyO3. Migrate the
two static, non-parameterized boolean killswitches to sentry-options
instead, matching how the Rust consumers already read the `snuba` namespace
(see blq_router.rs). This also removes their PyO3/Redis round-trip.
- eap_items_drop_invalid_timestamps (utils.rs): drop messages with event
timestamps >1 week future / >30 days past.
- experimental_healthcheck (healthcheck.rs): treat commit-request progress
as healthy.
Both are declared in the snuba sentry-options schema (boolean, default
false) and read via `options("snuba").get(key).as_bool()` with a
fallback to false, identical to the existing BLQ pattern. Healthcheck
tests now use `sentry_options::testing::override_options` instead of
`runtime_config::patch_str_config_for_test`.
Not migrated (left on runtime config): the per-storage / per-consumer-group
parameterized keys (clickhouse_load_balancing:<storage>,
clickhouse_max_insert_block_size:<storage>,
eap_items_dlq_grace_period_min:<storage>,
quantized_rebalance_consumer_group_delay_secs__<group>) and the
string-valued generic_metrics_use_case_killswitch — dynamic keys cannot be
declared in a static sentry-options schema.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
Migrates the read-only query-cache feature flags read in web/db_query.py:
enable_cache_partitioning (bool, default true), randomize_query_id (bool,
default false), retry_duplicate_query_id (bool, default false), and
enable_bypass_cache_referrers (bool, default false). Swaps the call sites to
get_bool_option and converts the one test toggle to override_options.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
Aligns get_option with its docstring's "any reason" fallback contract.
NotInitializedError/SchemaError/UnknownNamespaceError/UnknownOptionError all
subclass OptionsError and were already handled, but a non-OptionsError escaping
the client would have propagated into hot query paths. Catch and log those,
returning the call-site default, and add a regression test.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
Migrates seven read-only operational knobs to sentry-options:
- debug_buffer_size_bytes, http_batch_join_timeout (clickhouse/http.py)
- project_quota_time_percentage, counter_window_size_minutes,
allows_skipping_single_project_replacements (utils/bucket_timer.py)
- use_sentry_metrics (utils/metrics/backends/dualwrite.py)
- ondemand_profiler_hostnames (utils/profiler.py)
None has a test toggle. debug_buffer_size_bytes maps to integer default 0
because the downstream check is `size < (value or 0)`, so None and 0 were
already equivalent; the redundant isinstance assert is dropped.
simultaneous_queries_sleep_seconds (read at two sites with different defaults)
and optimize_parallel_threads (caller-supplied default) are intentionally left
on runtime config: a single schema default cannot preserve their semantics.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
Migrates seven read-only flags and converts their test toggles to
override_options (using the context-manager-as-decorator form):
- throw_on_uniq_select_and_having (uniq_in_select_and_having)
- function-validator.enabled (query/validation/functions)
- mandatory_condition_enforce (conditions_enforcer)
- eap.reject_string_timestamp_filters (time_series_request_visitor)
- trace_ids_cross_item_query_limit (cross_item_queries)
- storage_routing.enable_get_cluster_loadinfo (storage_routing)
- max_spans_per_transaction (transactions_processor)
The max_spans_per_transaction try/except + isinstance assert is dropped since
get_int_option already coerces and falls back; mandatory_condition_enforce and
eap.reject_string_timestamp_filters become real booleans.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
…ptions
Migrates six read-only flags and converts their test toggles to
override_options (autouse fixtures become override_options yield-fixtures;
function-scoped toggles use the decorator form):
- admin.querylog_threads (admin/clickhouse/querylog.py)
- enable_eap_readonly_table (storage_selectors/eap_items.py)
- enable_events_readonly_table (storage_selectors/errors.py)
- use_cross_item_path_for_single_item_queries (endpoint_get_traces.py)
- executor_queue_size_factor (subscriptions/executor_consumer.py)
- snuba_api_cogs_probability (querylog/__init__.py)
admin.querylog_threads now reads via get_int_option, which always returns a
valid int, so the BadThreadsValue path (and its now-unreachable test) is
removed. Also fixes two pre-existing E712 lint errors in touched files.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
Revert the http_batch_join_timeout migration flagged by review. Its default is
settings.BATCH_JOIN_TIMEOUT = int(os.environ.get("BATCH_JOIN_TIMEOUT", 10)),
an env-var-derived value, not a constant. sentry-options returns the schema
default when an option is unset, so deployments that raised the timeout only
via the BATCH_JOIN_TIMEOUT env var would have silently dropped back to 10.
Same class of issue as optimize_parallel_threads/simultaneous_queries_sleep_seconds,
which were never migrated for the same reason. debug_buffer_size_bytes (constant
default) stays on sentry-options.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
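The regression described in the revert above can be reproduced with a small model. This is a sketch under stated assumptions: `read_option` is a hypothetical stand-in for an options read that yields the schema default for an unset key, and plain dicts stand in for `os.environ` and the configured values.

```python
from typing import Dict

SCHEMA_DEFAULT = 10  # constant baked into the (hypothetical) schema entry


def settings_default(environ: Dict[str, str]) -> int:
    # Mirrors int(os.environ.get("BATCH_JOIN_TIMEOUT", 10)) from the commit message.
    return int(environ.get("BATCH_JOIN_TIMEOUT", 10))


def read_option(configured: Dict[str, int], key: str) -> int:
    # An unset key yields the schema default, so the caller's
    # env-derived fallback is never consulted.
    return configured.get(key, SCHEMA_DEFAULT)


env = {"BATCH_JOIN_TIMEOUT": "30"}  # deployment raised the timeout via env only

before = settings_default(env)  # legacy path: unset key falls back to 30
after = read_option({}, "http_batch_join_timeout")  # migrated path: 10
```

The deployment's 30-second timeout silently becomes 10, which is why keys with env-derived defaults were kept off a plain schema default.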
Migrates cache_expiry_sec (int) and read_through_cache.short_circuit (bool) in
state/cache/redis/backend.py. Converts the short_circuit test toggles to
override_options across four test files: decorator form for function/method
toggles, and a class-level autouse override_options yield-fixture in
test_max_rows_enforcer where the flag was set in a shared _insert_event helper
and had to persist for the whole test. Also fixes two pre-existing E712 lint
errors and one latent mypy attr-defined error surfaced by touching these files.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
Migrates nine read-only deletion knobs and converts their test toggles to
override_options (decorators, a context-manager helper for the off-peak window
tests, and with-blocks where a flag flips mid-test):
- lw_deletions_offpeak_enabled/start/end (lw_deletions/off_peak.py)
- org_ids_delete_allowlist, max_parts_mutating_for_delete
(lw_deletions/strategy.py)
- permit_delete_by_attribute (web/bulk_delete_query.py)
- MAX_ONGOING_MUTATIONS_FOR_DELETE, storage_deletes_enabled,
enforce_max_rows_to_delete (web/delete_query.py)
settings.MAX_ONGOING_MUTATIONS_FOR_DELETE (5) and MAX_PARTS_MUTATING_FOR_DELETE
(20) are constants, so the schema defaults match. lightweight_deletes_sync is
intentionally left on runtime config: it uses `is not None` to decide whether to
set the ClickHouse setting at all, which a typed scalar option cannot express.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
…tions
Migrates six more read-only keys and converts their test toggles to
override_options (decorators, with-blocks for parametrized values, and a context
helper):
- ignore_clickhouse_settings_override (clickhouse_settings_override.py)
- enable_long_term_retention_downsampling (routing_strategies/outcomes_based.py)
- storage_routing_config_override, default_storage_routing_config
(routing_strategy_selector.py): JSON-blob configs kept as string options
- subscription_primary_task_builder (subscriptions/scheduler.py): stored as the
TaskBuilderMode value string, schema default "jittered"
- consistent_override (request/validation.py): the None/str tri-state becomes a
string option where empty means "no override"
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
Migrates the remaining replacement read-only knobs and converts their test
toggles to override_options (decorators for per-test values, with-blocks where a
value flips mid-test or has a pre-read):
- skip_final_subscriptions_projects,
post_replacement_consistency_projects_denylist, max_group_ids_exclude
(query/processors/physical/replaced_groups.py)
- max_group_ids_exclude (replacers/projects_query_flags.py): same key, both sites
- skip_seen_offsets, consumer_groups_to_reset_offset_check (replacer.py)
- write_node_replacements_global, replacements_bypass_projects
(replacers/errors_replacer.py)
settings.REPLACER_MAX_GROUP_IDS_TO_EXCLUDE (256) is a constant so the schema
default matches. Bracketed-list string configs keep their "[]" string form.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
…ptions (rust)
Reads the generic-metrics use-case killswitch from sentry-options instead of
Redis runtime config, matching the other Rust consumer killswitches. The string
is substring-matched against the message use_case_id. should_use_killswitch now
takes Option<String> (sentry-options reads yield an Option, no Result wrapper);
its unit tests are updated accordingly.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
Switches the enable_any_attribute_filter read from the raw get_option to the
typed get_bool_option, matching every other boolean key in the migration. The
schema already enforces a boolean so behavior is unchanged, but this keeps the
call sites consistent and adds the same defensive coercion as the rest.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 7659d7a.
…n-4njhnb
# Conflicts:
# pyproject.toml
# uv.lock
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
Some runtime-config keys were named dynamically — one Redis key per storage,
topic, dataset, or bucket (f"{prefix}_{name}") — which a static sentry-options
schema cannot enumerate. Collapse each family into a single object option (a
dict declared with additionalProperties, defaulting to {}) keyed by the dynamic
name, and read one entry via new get_mapped_{int,float,str}_option helpers.
Migrates the five remaining dynamic-name keys:
- lw_deletes_killswitch_<storage> -> lw_deletes_killswitch (dict[str,str])
- lw_deletes_split_by_partition_<storage> -> lw_deletes_split_by_partition (dict[str,int])
- validate_schema_<topic> -> validate_schema_sample_rate (dict[str,number])
- <dataset>_ignore_consistent_queries_..rate -> ignore_consistent_queries_sample_rate (dict[str,number])
- mem_rate_limit_per_sec_<bucket> -> mem_rate_limit_per_sec (dict[str,number])
An absent entry falls back to the call-site default, preserving the previous
per-key default. Test toggles converted from set_config to override_options
with the dict value.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
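The dict-option pattern from the commit above might look roughly like this. It is a sketch only: the real `get_mapped_int_option` signature is not shown in this conversation, so the explicit `options` argument and the storage name `example_storage` are hypothetical.

```python
from typing import Any, Mapping


def get_mapped_int_option(
    options: Mapping[str, Any], key: str, name: str, default: int
) -> int:
    """Read one entry of a dict-valued option, keyed by the dynamic name."""
    mapping = options.get(key, {})  # schema default for the family is {}
    if not isinstance(mapping, dict) or name not in mapping:
        return default  # absent entry: keep the previous per-key default
    try:
        return int(mapping[name])
    except (TypeError, ValueError):
        return default


# Before: one Redis key per storage, e.g. lw_deletes_split_by_partition_<storage>.
# After: a single option whose value is a dict keyed by storage name.
options = {"lw_deletes_split_by_partition": {"example_storage": 1}}

enabled = get_mapped_int_option(
    options, "lw_deletes_split_by_partition", "example_storage", 0
)
absent = get_mapped_int_option(
    options, "lw_deletes_split_by_partition", "other_storage", 0
)
```

The schema only has to declare the family name once; the set of storages stays open-ended through `additionalProperties`.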
…options migration
This migration touches files whose latent type errors were never surfaced
because mypy.ini excludes tests/datasets/ and tests/query/, while pre-commit
passes changed files to mypy explicitly (which bypasses that exclude). Fix the
errors properly rather than masking them:
- test_errors_replacer.py: narrow process_message() results (assert non-None),
narrow Replacement to the errors_replacer subclass that defines
get_query_time_flags/get_project_id, route re.sub through a helper that asserts
the query string is non-None, annotate args/parametrize.
- test_transaction_processor.py: correct serialize()/build_result() return types
to the concrete dicts they return; isinstance-narrow processed messages.
- test_replaced_groups.py: pass ReplacerState.<X>.value (the str the constructor
expects) instead of the enum member.
- test_db_query.py: narrow excinfo.value to QueryException before
.extra/__cause__.
- test_uniq_in_select_and_having.py: alias param is Optional[str].
- consumer.py / query_execution.py: type-only casts for confluent_kafka
produce() args and a QueryExtraData TypedDict field.
All changes are type-only / behavior-preserving. mypy is clean on the full
changed set and ruff passes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
The test conftest used os.environ.setdefault to point sentry-options at the
in-repo schemas, but the Docker test image sets ENV
SENTRY_OPTIONS_DIR=/etc/sentry-options (the production values mount, which
ships no schemas). setdefault is therefore a no-op in the container, so
sentry_options.init() raised SchemaError ("Failed to read file"), init_options()
swallowed it, and every override_options-based test failed with
NotInitializedError. Force-assign the in-repo path so init() reads the
committed schema.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
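The `setdefault` pitfall in the conftest fix above is easy to demonstrate. In this sketch a plain dict stands in for `os.environ` (which supports the same two operations), and `/repo/sentry-options` is a hypothetical in-repo schema path.

```python
from typing import Dict

IN_REPO_SCHEMAS = "/repo/sentry-options"  # hypothetical path, for illustration


def point_with_setdefault(environ: Dict[str, str]) -> None:
    # Only writes when the variable is absent.
    environ.setdefault("SENTRY_OPTIONS_DIR", IN_REPO_SCHEMAS)


def point_with_assignment(environ: Dict[str, str]) -> None:
    # Always wins, regardless of what the image preset.
    environ["SENTRY_OPTIONS_DIR"] = IN_REPO_SCHEMAS


# The Docker test image already sets the variable to the production mount.
container_env = {"SENTRY_OPTIONS_DIR": "/etc/sentry-options"}
point_with_setdefault(container_env)
after_setdefault = container_env["SENTRY_OPTIONS_DIR"]

point_with_assignment(container_env)
after_assignment = container_env["SENTRY_OPTIONS_DIR"]
```

On a developer machine with no preset variable both forms behave the same, which is why the problem only showed up in the container.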
…acer test
test_query_time_flags_bounded_size patched
settings.REPLACER_MAX_GROUP_IDS_TO_EXCLUDE to bound the excluded-group set, but
the production read was migrated to
get_int_option("max_group_ids_exclude", settings.REPLACER_MAX_GROUP_IDS_TO_EXCLUDE).
Once sentry-options initializes, that returns the schema default (256), ignoring
the patched settings fallback, so no bounding occurred and the test saw all 10
group ids instead of the most-recent 5. Override the sentry-option instead.
(This surfaced only after the conftest SENTRY_OPTIONS_DIR fix let init() succeed;
previously every override_options test died at NotInitializedError first.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
…sentry-options
This dynamic-name key (slicing_mega_cluster_partitions_<storage_set>) was missed
in the first dynamic-options pass because its key is built into a local variable
(key = f"{PREFIX}_{storage_set.value}") rather than passed as an f-string
literal directly to get_config. Migrate it to a dict option keyed by storage-set
name (value is the bracketed logical-partition list), via get_mapped_str_option;
also drops a redundant get_config call. Test toggles converted to
override_options (which also removes a latent bug in the old delete_config key).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
…ector tests
test_strategy_selector.py set default_storage_routing_config and
storage_routing_config_override through runtime state.set_config (via the
imported key constants), but RoutingStrategySelector reads them with
get_str_option. Once sentry-options initializes, those reads return the schema
default ("{}") and ignore the runtime config, so the configured routing was
never exercised (test_valid_config_is_parsed_correctly failed in CI; two
"expects default" tests passed only by coincidence). Convert all 11 set_config
sites to override_options.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
…-options
The MappingOptimizer reads a per-storage killswitch whose key name comes from
the storage YAML (self.__killswitch). The distinct names are a fixed set of four
static keys, so migrate them as boolean options (default true, matching the old
get_config(..., 1) "enabled unless explicitly disabled" behavior):
- tags_hash_map_enabled
- generic_metrics/tags_hash_map_enabled
- events_tags_hash_map_enabled
- events_flags_hash_map_enabled
Switch the read to get_bool_option(self.__killswitch, True); convert the one
test toggle to override_options. (Confirmed sentry-options accepts the '/' in
the generic_metrics key.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
…ions
Migrate the dynamic-by-storage runtime_config reads in the Rust consumers to
sentry-options dict options (object, additionalProperties keyed by storage
name), read via options("snuba") instead of the Redis runtime-config bridge:
- clickhouse_load_balancing (string dict, default "in_order")
- clickhouse_load_balancing_first_offset (string dict; read in the same
get_load_balancing_config(), migrated together to avoid splitting its reads)
- clickhouse_max_insert_block_size (integer dict; <1048449 still ignored)
- eap_items_dlq_grace_period_min (integer dict)
Test toggles in runtime_config and writer_v2 converted from
patch_str_config_for_test to sentry_options::testing::override_options + a
once-init of the embedded SNUBA_SCHEMA (matching the blq_router/healthcheck
pattern). cargo check/test/fmt all pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
_get_query_settings_from_config read ClickHouse query settings from runtime
config via get_all_configs() and prefix filtering. Migrate to four sentry-options
dicts (object, additionalProperties string):
- query_settings / async_query_settings: {setting: value}
- query_settings_by_prefix / query_settings_by_referrer: keyed by prefix/referrer,
each value a JSON-object string {setting: value} (sentry-options can't express
dict-of-dict, so the second level is JSON-in-string), preserving the
referrer > prefix > base precedence.
Values are string-typed (ClickHouse HTTP settings are strings on the wire). The
parametrized test now applies config via override_options (no longer needs redis)
and compares against stringified expected values.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
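One way to picture the referrer > prefix > base precedence from the commit above, with the second level stored as JSON-in-string. This is a sketch, not `_get_query_settings_from_config` itself: the function name, the use of `str.startswith` for prefix matching, and the example referrers and settings are all assumptions.

```python
import json
from typing import Dict, Mapping


def resolve_query_settings(
    base: Mapping[str, str],
    by_prefix: Mapping[str, str],
    by_referrer: Mapping[str, str],
    referrer: str,
) -> Dict[str, str]:
    settings = dict(base)  # lowest precedence
    for prefix, blob in by_prefix.items():
        if referrer.startswith(prefix):
            settings.update(json.loads(blob))  # each value is a JSON object string
    blob = by_referrer.get(referrer)
    if blob is not None:
        settings.update(json.loads(blob))  # highest precedence
    return settings


base = {"max_threads": "8"}
by_prefix = {"api.": '{"max_threads": "4", "max_execution_time": "30"}'}
by_referrer = {"api.trace": '{"max_threads": "2"}'}

exact = resolve_query_settings(base, by_prefix, by_referrer, "api.trace")
prefixed = resolve_query_settings(base, by_prefix, by_referrer, "api.other")
plain = resolve_query_settings(base, by_prefix, by_referrer, "subscriptions")
```

All values stay strings end to end, which is why the migrated test compares against stringified expectations.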
Add a test that overrides the eap_items_dlq_grace_period_min dict option and
asserts get_dlq_grace_period_min returns the per-storage value (Some(45)), plus
absent-key (None) and negative-value (rejected) cases. This exercises the nested
serde_json::Value get on the value returned by options("snuba").get(...), which
the other migrated reads (load balancing, max insert block size) already rely on.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
…ptions
A full audit of `state.get_config`/`get_configs` read sites surfaced seven
read-only runtime configs that were still backed by Redis. Migrate each to the
`snuba` sentry-options namespace:
- optimize_parallel_threads (clickhouse/optimize/util.py)
- http_batch_join_timeout (clickhouse/http.py)
- simultaneous_queries_sleep_seconds (clickhouse/native.py, two sites)
- max_days / date_align_seconds (query/snql/parser.py)
- snql_disabled_dataset__<dataset> -> snql_disabled_dataset dict
(request/validation.py)
- quantized_rebalance_consumer_group_delay_secs__<group> -> dict (rust_snuba
rebalancing)
- bypass_rate_limit / rate_history_sec / rate_limit_shard_factor
(state/rate_limit.py)
The two configs whose fallback came from a caller arg / env setting
(optimize_parallel_threads, http_batch_join_timeout) use a sentinel-0 schema
default and fall back to the original value, preserving the prior "option is
only an override" behavior. The two per-suffix keys collapse into single dict
options keyed by the dynamic part, matching the pattern used for the earlier
dynamic-name migrations; a new get_mapped_bool_option helper backs the boolean
dict. Migrating the rebalancing consumer was the last Rust caller of the
Python-bridge runtime_config::get_str_config, so that reader (plus its cache and
test-patch helper) is removed. Tests that previously set these via
state.set_config now use sentry_options.testing.override_options.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
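The sentinel-0 approach in the commit above can be sketched as follows. The helper name `effective_value` is hypothetical, and treating only an exact 0 as "unset" is an assumption about the real call sites.

```python
from typing import Dict


def effective_value(option_value: int, original: int) -> int:
    # 0 is the schema default and means "no override configured",
    # so the caller-supplied or env-derived value stays in effect.
    if option_value == 0:
        return original
    return option_value


def batch_join_timeout(environ: Dict[str, str], option_value: int) -> int:
    original = int(environ.get("BATCH_JOIN_TIMEOUT", 10))
    return effective_value(option_value, original)


env = {"BATCH_JOIN_TIMEOUT": "30"}
unset = batch_join_timeout(env, 0)        # option left at its schema default
overridden = batch_join_timeout(env, 45)  # explicit value set centrally
```

This keeps the env-var-derived timeout intact until someone deliberately sets the option, which a plain schema default of 10 could not do.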
…n-4njhnb
# Conflicts:
# pyproject.toml
# uv.lock
…c to sentry-options
The master merge combined #8106's new get_str_config("lightweight_delete_mode")
read with our branch's import block (which no longer imported get_str_config),
producing an undefined-name failure in pre-commit. Migrate both
lightweight-delete ClickHouse-setting reads in lw_deletions/strategy.py to
sentry-options instead of re-adding the legacy import:
- lightweight_deletes_sync -> integer option, schema default -1 ("unset", leave
ClickHouse's own default in place)
- lightweight_delete_mode -> string option, schema default "" (unset)
test_clickhouse_settings now drives the two flushes via override_options.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
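The "unset" sentinels from the commit above replace the old `is not None` check. A minimal sketch, assuming the ClickHouse setting names equal the option names; `build_delete_settings` is a hypothetical helper, and `"alter_update"` is only an illustrative mode value.

```python
from typing import Dict, Union

UNSET_SYNC = -1  # schema default for lightweight_deletes_sync
UNSET_MODE = ""  # schema default for lightweight_delete_mode


def build_delete_settings(sync: int, mode: str) -> Dict[str, Union[int, str]]:
    settings: Dict[str, Union[int, str]] = {}
    if sync != UNSET_SYNC:
        # Only send the setting when it was explicitly configured.
        settings["lightweight_deletes_sync"] = sync
    if mode != UNSET_MODE:
        settings["lightweight_delete_mode"] = mode
    return settings


defaults = build_delete_settings(UNSET_SYNC, UNSET_MODE)
configured = build_delete_settings(0, "alter_update")
```

Note that `sync == 0` is a real value here and must still be sent, which is why the sentinel is -1 rather than 0.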
…tions
Standalone read-only runtime config in the replacer's auto-replacements
bypass-expiry path; not part of the ConfigurableComponent system.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
…s to sentry-options
Migrate the remaining runtime-config reads in the allocation-policy / storage-
routing-strategy (ConfigurableComponent) subsystem to sentry-options:
- ConfigurableComponent.get_config_value now consults a single sentry-options
dict, `configurable_component_overrides`, keyed by the same fully-qualified
config key these configs have always used ({resource}.{ClassName}.{config}
[.{param}:{value},...]). Values are stored as strings and coerced to each
config's declared value_type. This is a single chokepoint, so it covers every
allocation policy and routing strategy at once, and it is the authoritative
source: a key absent from the option falls back to the legacy Redis runtime
config and then the code default. With the option defaulting to {}, behavior is
unchanged until the automator is populated.
- The storage-routing strategies' direct state.get_int_config reads
(time_budget_ms, sampled_too_low_threshold, max_items_before_downsampling,
min_timerange_to_query_outcomes) now read per-strategy dict options keyed by
class name, preserving the per-strategy -> global ("StorageRouting") -> default
fallback chain.
The legacy set_config_value / admin write path is left intact as the transitional
fallback; editing now happens centrally via the sentry-options-automator.
Tests that set these via state.set_config now use override_options; added
coverage for the new override precedence (incl. coercion and parameterized keys).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
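The two fallback chains described in the commit above can be sketched like this. Everything here is illustrative: the function names, the explicit `overrides` / `legacy` arguments, the boolean coercion rule, and the keys `events.ExamplePolicy.*` and `ExampleStrategy` are hypothetical and only follow the `{resource}.{ClassName}.{config}` shape named in the message.

```python
from typing import Any, Mapping


def _coerce(raw: str, value_type: type) -> Any:
    # Values are stored as strings and coerced to the declared type.
    if value_type is bool:
        return raw.strip().lower() in ("1", "true")
    return value_type(raw)


def get_config_value(
    key: str,
    value_type: type,
    overrides: Mapping[str, str],  # the configurable_component_overrides dict
    legacy: Mapping[str, Any],     # stand-in for the Redis runtime config
    default: Any,
) -> Any:
    if key in overrides:
        return _coerce(overrides[key], value_type)  # authoritative source
    if key in legacy:
        return legacy[key]                          # transitional fallback
    return default                                  # code default


def get_strategy_int(option: Mapping[str, int], class_name: str, default: int) -> int:
    # per-strategy -> global ("StorageRouting") -> default
    if class_name in option:
        return option[class_name]
    return option.get("StorageRouting", default)


overrides = {"events.ExamplePolicy.max_threads": "4"}
legacy = {"events.ExamplePolicy.is_enforced": 0}
```

With `overrides` left at `{}`, every read falls through to `legacy` and then the default, which is the "behavior is unchanged until the automator is populated" property.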
…aster merge
Master #8101 added use_array_map_columns() in web/rpc/common/common.py reading
state.get_int_config("use_array_map_columns_timestamp_seconds", ...) with a
typing.cast. This branch had already removed the state/cast imports from that
file when migrating the neighbouring use_sampling_factor read, so the merge
produced undefined-name failures (a semantic conflict with no textual clash).
Migrate the new read to get_int_option (mirroring use_sampling_factor), which
also removes the need for the dropped imports. Add the
use_array_map_columns_timestamp_seconds integer option (default 1782172800) and
convert its new test from snuba_set_config to override_options.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
…gration
test_expiry_window_changes mock.patched
snuba.replacers.replacements_and_expiry.get_int_config, but that read was
migrated to get_int_option, so the patch target no longer existed
(AttributeError at collection of the patched test). Patch get_int_option instead
(preserving the side_effect=[5, 10] per-call semantics) and read the class-level
baseline via get_int_option too.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U2Cu68uGZRcCVS14jcyd3E
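The stale patch target in the commit above, and the `side_effect=[5, 10]` replacement, can be reproduced in isolation. In this sketch a `SimpleNamespace` named `replacer_module` is a hypothetical stand-in for the real module, and `expiry_window` is an invented reader function.

```python
from types import SimpleNamespace
from unittest import mock

# Hypothetical module-like object: it exposes get_int_option but no longer
# has get_int_config, as after the migration.
replacer_module = SimpleNamespace(get_int_option=lambda key, default: default)


def expiry_window() -> int:
    return replacer_module.get_int_option("expiry_window", 1)


# Patching a name that no longer exists fails before the test body runs.
stale_target_failed = False
try:
    with mock.patch.object(replacer_module, "get_int_config"):
        pass
except AttributeError:
    stale_target_failed = True

# Patching the migrated accessor keeps the per-call semantics.
with mock.patch.object(
    replacer_module, "get_int_option", side_effect=[5, 10]
) as patched:
    first = expiry_window()
    second = expiry_window()

restored = expiry_window()  # original function is back after the with-block
```

A list `side_effect` hands out one value per call, so the first read sees 5 and the second sees 10, as in the original test.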

What
Migrates a batch of Redis-backed runtime config keys to sentry-options, and wires up the Python `sentry-options` client so Python code can read the same `snuba` namespace the Rust consumers already use (via the `sentry-options` crate, e.g. `blq_router.rs`). Values stay managed centrally in sentry-options-automator; Snuba reads them read-only.
Three commits:
enable_any_attribute_filter.
Infrastructure
- `sentry-options>=1.1.1` (`pyproject.toml`, `uv.lock`): the Python binding of the same client the Rust consumers use.
- `snuba/state/sentry_options.py`:
  - `init_options()`: idempotent, never raises (a missing/misconfigured options mount must not break startup); called from `setup_sentry()`, the chokepoint every entrypoint and `pytest_configure` already hits.
  - `get_option(key, default)` and typed `get_bool_option` / `get_int_option` / `get_float_option` / `get_str_option(key, default)`: return the configured value (or schema default), falling back to the caller's `default` on any `OptionsError`. Mirrors the Rust `.ok()...unwrap_or(default)` semantics so call sites behave exactly as before.
- `conftest.py` points `SENTRY_OPTIONS_DIR` at the in-repo schema so `init()` is cwd-independent.
Keys migrated
Python (RPC / query path): schema type / default match the prior `get_*_config` default, so behavior is unchanged until a value is set in automator:
- `enable_any_attribute_filter`
- `aggregation_deprecation_enabled`
- `enable_trace_pagination`
- `use.low.cardinality.processor`
- `cross_item_queries_no_sample_outer`
- `default_tier`
- `export_trace_items_default_page_size`
- `use_sampling_factor_timestamp_seconds`
- `ExecutionStage.max_query_size_bytes`
- `EndpointGetTrace.apply_final_rollout_percentage`
- `rpc_logging_sample_rate`
- `rpc_logging_flush_logs`
- `ExecutionStage.disable_max_query_size_check_for_clusters`
Rust consumers (previously read via `runtime_config::get_str_config`, i.e. a PyO3 round-trip into Python/Redis; now read natively, no PyO3):
- `eap_items_drop_invalid_timestamps`
- `experimental_healthcheck`
Tests that toggled these via `state.set_config(...)` / `patch_str_config_for_test(...)` now use `sentry_options.testing.override_options(...)` (Python) / `sentry_options::testing::override_options(...)` (Rust).
Deliberately NOT migrated
sentry-options requires every key to be declared in a static schema, so dynamically-named / parameterized keys can't move, and the runtime-config management plane must stay:
- `clickhouse_load_balancing:<storage>`, `clickhouse_max_insert_block_size:<storage>`, `eap_items_dlq_grace_period_min:<storage>`, `quantized_rebalance_consumer_group_delay_secs__<group>`, `validate_schema_<topic>`, allocation-policy configs, rate-limiter buckets.
- `snuba.state` CRUD and audit log: sentry-options has no in-Snuba write path, and it manages the keys that can't migrate.
- `generic_metrics_use_case_killswitch`): these don't map cleanly to a scalar option and are left for a follow-up.
- `enable_long_term_retention_downsampling`: migratable, but its test refactor needs care; deferred to a follow-up.
Operational note
After cutover these keys are read-only from Snuba and edited in sentry-options-automator (not the admin UI). For each, set the value in automator to match any current production override before this lands; effective defaults are otherwise unchanged. Several are operational killswitches; flagging so reviewers can veto moving any specific one off the live admin toggle. Deploy-infra detail to confirm: the options ConfigMap is mounted for the Python web/RPC pods (the Rust consumer pods already have it).
Verification
- `pytest tests/state/test_sentry_options.py` → 6 passed; `mypy` (strict) clean on changed source; `ruff check` + `ruff format --check` clean.
- `cargo check` clean; `cargo test healthcheck` (2) + `utils` (4) pass. The `runtime_config` PyO3 tests fail only in this sandbox (no Python bootstrap); they're unrelated to these changes and run in CI.
🤖 Generated with Claude Code