ref(rpc): always apply a limit to GetTrace queries#8108
Draft
phacops wants to merge 2 commits into
Draft
Conversation
EndpointGetTrace could issue unbounded ClickHouse queries: when a request arrived without a limit and no global cap was configured (the default, ENDPOINT_GET_TRACE_PAGINATION_MAX_ITEMS = 0), or whenever pagination was disabled, `limit` was left as None and the query ran with no LIMIT clause. These unbounded queries are the worst offenders for ClickHouse and granian memory. Ensure every GetTrace query is bounded: - `_get_pagination_limit` now always returns a positive int, falling back to a default row limit (10k, matching the sibling table/traces endpoints) when no user limit and no configured cap are present. - The limit is computed and applied even when pagination is disabled; the cross-item decrement/page-token logic stays gated on pagination being on. Also fix the misleading `eap_trace_request_without_limit` metric. It was emitted unconditionally on every request, so it tracked total call volume rather than requests without a limit. It now increments only when the request arrives without a user-supplied limit, making the name accurate and giving real signal on how often the default cap is relied upon. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01TheCLK7ZEBCnw5bc3Qboth
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
EndpointGetTracecould issue unbounded ClickHouse queries. When a request arrived without a limit and no global cap was configured — which is the default, sinceENDPOINT_GET_TRACE_PAGINATION_MAX_ITEMS = 0("0 means no limit") — or whenever pagination was disabled,limitwas left asNoneand the query ran with noLIMITclause at all. These unbounded queries are the worst offenders for both ClickHouse and granian memory.This came out of investigating the
snuba.rpc.eap_trace_request_without_limitmetric, which turned out to be doubly misleading:GetTracerequest, so it tracked total call volume, not requests without a limit (~3.5M/week, exclusivelyendpoint_name:endpointgettrace).GetTracesboth fall back to a default row limit), soGetTracewas the one genuinely emittingLIMIT-less SQL.Changes
_get_pagination_limitnow always returns a positiveint, falling back to a default row limit (_DEFAULT_ROW_LIMIT = 10_000, matching the sibling table/traces endpoints) when there is neither a user-supplied limit nor a configured cap.enable_trace_paginationflag; only the cross-item decrement / page-token logic stays gated on pagination being enabled. (Trade-off: with pagination disabled and a trace larger than the cap, results are truncated to the cap with no page token — bounded beats unbounded, and pagination is on by default.)eap_trace_request_without_limitnow increments only when the request arrives without a user-supplied limit (in_msg.limit <= 0), so the name is accurate and it gives real signal on how often the default cap is relied upon, per referrer.ENDPOINT_GET_TRACE_PAGINATION_MAX_ITEMSdoc comment to reflect that<= 0now means "fall back to the default" rather than "no limit".Out of scope (flagged for follow-up)
GetTraces(plural) cross-item path technically has aLIMIT, but its default fallback is_TRACE_LIMIT = 50_000_000— a nominal cap so high it's effectively unbounded for memory. Left untouched here since it does emit aLIMITclause and lowering it has broader trace-explorer implications worth a separate discussion.Test plan
test_get_pagination_limit_is_always_boundedcovering the full truth table (no/negative/positive configured cap × with/without user limit) — asserts the result is always a positive int.test_query_is_bounded_when_pagination_disabledspies on the built Snuba request and asserts the query carries_DEFAULT_ROW_LIMITeven with pagination disabled and no user limit.GetTrace/ pagination tests still describe the same behavior (data sets are far below the 10k cap).🤖 Generated with Claude Code
https://claude.ai/code/session_01TheCLK7ZEBCnw5bc3Qboth
Generated by Claude Code