fix(trace-stats)!: add grpc_method to aggregation key #2151
Spans with different gRPC methods were previously merged into the same stats group (only the first span's method was kept). Adding grpc_method to FixedAggregationKey ensures each method gets a separate bucket. The OtlpExactGroup.grpc_method field is now sourced from the key rather than a GroupedStats sidecar.

The agent /v0.6/stats protobuf wire format is unchanged (no grpc_method field in ClientGroupedStats).

SHM_VERSION bumped to 2 because FixedAggregationKey<StringRef> is #[repr(C)] and the new field changes the layout; mismatched sidecar/worker pairs will safely fail with a version-mismatch error.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
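The layout concern in this commit message can be illustrated with a minimal sketch. The struct definitions below are hypothetical stand-ins (the real `FixedAggregationKey` and `StringRef` have other fields), and `check_version` is an assumed helper name rather than the crate's API. The sketch only shows that adding a field to a `#[repr(C)]` struct changes its size, which is why a reader and writer built against different versions need an explicit version check.

```rust
use std::mem::size_of;

// Hypothetical stand-in for a string reference stored in shared memory.
#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
struct StringRef {
    offset: u32,
    len: u32,
}

// Hypothetical key layout before the change.
#[allow(dead_code)]
#[repr(C)]
struct KeyV1 {
    resource: StringRef,
    http_status_code: u32,
}

// Hypothetical key layout after adding grpc_method.
#[allow(dead_code)]
#[repr(C)]
struct KeyV2 {
    resource: StringRef,
    http_status_code: u32,
    grpc_method: StringRef,
}

// The version this build expects to find in the shared-memory header.
const SHM_VERSION: u32 = 2;

#[derive(Debug, PartialEq)]
struct VersionMismatch {
    expected: u32,
    found: u32,
}

// Assumed helper: reject a segment written with a different layout version.
fn check_version(found: u32) -> Result<(), VersionMismatch> {
    if found == SHM_VERSION {
        Ok(())
    } else {
        Err(VersionMismatch { expected: SHM_VERSION, found })
    }
}

fn main() {
    println!("KeyV1: {} bytes, KeyV2: {} bytes", size_of::<KeyV1>(), size_of::<KeyV2>());
    // A writer still on version 1 is rejected instead of being misread.
    println!("{:?}", check_version(1));
}
```

Failing fast on the version number avoids reinterpreting bytes written with the old layout as the new struct.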
Bincode encodes struct fields positionally. Inserting grpc_method before http_status_code shifted all subsequent field positions, breaking IPC fallback decoding (OwnedShmSpanInput) between mismatched worker/sidecar versions.

Moving it to the end of the struct preserves all existing field positions, so old-format IPC messages are decoded correctly up to the grpc_method field; the decode then fails with EOF rather than silently misinterpreting existing fields.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
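The effect of field position can be sketched without bincode itself. The snippet below is a hand-rolled positional codec over `u32` values, not the real `OwnedShmSpanInput` encoding; the field names and layouts are assumptions chosen for illustration. It decodes one old-format message against two candidate new layouts and reports which fields were read before the decoder ran out of bytes.

```rust
#[derive(Debug, PartialEq)]
struct UnexpectedEof;

// Hypothetical field orders; the real struct has different fields.
const NEW_INSERTED: [&str; 4] = ["resource_id", "grpc_method", "http_status_code", "is_error"];
const NEW_APPENDED: [&str; 4] = ["resource_id", "http_status_code", "is_error", "grpc_method"];

// Positional encoding: values are written back to back with no field tags.
fn encode(values: &[u32]) -> Vec<u8> {
    let mut out = Vec::new();
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

// Reads the next little-endian u32, advancing the cursor.
fn read_u32(buf: &[u8], pos: &mut usize) -> Result<u32, UnexpectedEof> {
    let end = *pos + 4;
    let bytes = buf.get(*pos..end).ok_or(UnexpectedEof)?;
    let mut arr = [0u8; 4];
    arr.copy_from_slice(bytes);
    *pos = end;
    Ok(u32::from_le_bytes(arr))
}

// Decodes fields in layout order; returns the fields read so far and the outcome.
fn decode(
    buf: &[u8],
    layout: &[&'static str],
) -> (Vec<(&'static str, u32)>, Result<(), UnexpectedEof>) {
    let mut pos = 0;
    let mut fields = Vec::new();
    for &name in layout {
        match read_u32(buf, &mut pos) {
            Ok(v) => fields.push((name, v)),
            Err(e) => return (fields, Err(e)),
        }
    }
    (fields, Ok(()))
}

fn main() {
    // Old-format message: resource_id = 7, http_status_code = 404, is_error = 1.
    let old = encode(&[7, 404, 1]);
    println!("inserted: {:?}", decode(&old, &NEW_INSERTED));
    println!("appended: {:?}", decode(&old, &NEW_APPENDED));
}
```

In this sketch both decodes end in `UnexpectedEof`, but with the inserted layout the value 404 is first read as `grpc_method` and 1 as `http_status_code`, while the appended layout reads every pre-existing field at its original position before failing.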
VianneyRuhlmann
left a comment
Some concerns regarding cardinality for the non-OTLP exporter; otherwise LGTM.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…xport

span.resource already carries the full gRPC method path for gRPC spans (e.g. /package.Service/Method). Adding grpc_method as a separate aggregation dimension or OTLP attribute is redundant and adds cardinality.

Removes:
- grpc_method from FixedAggregationKey
- grpc_method from OtlpExactGroup
- rpc.method from the OTLP data-point attributes emitted by build_attributes
- GRPC_METHOD_FIELD constant and get_grpc_method() helper

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
What does this PR do?
Removes `grpc_method` / `rpc.method` entirely from the OTLP trace metrics pipeline.
Previously `grpc_method` was a sidecar on `GroupedStats` — present in the OTLP metric payload as an explicit attribute, but not part of the aggregation key. This PR removes it from both:
Rationale
For gRPC spans, `span.resource` already carries the full method path (e.g. `/package.Service/Method`). Emitting `rpc.method` as a separate OTLP attribute is redundant — `span.resource` (mapped to `span.name` on the data point) is the authoritative carrier of gRPC method identity and is already an aggregation dimension.
Adding `rpc.method` as a separate dimension would also increase cardinality for gRPC services without providing any information not already present on the metric.
Note: `rpc.response.status_code` (from `grpc.status.code`) is kept as a first-class aggregation dimension — it is not redundant with any existing field.
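As a rough illustration of the rationale, the sketch below groups spans by a hypothetical key made of `resource` and the gRPC status code. The `AggregationKey` and `Span` types and the `aggregate` function are simplified assumptions, not the crate's real definitions. Because the resource string already contains the method path, spans for different methods fall into different buckets without a dedicated method field, while the status code still splits buckets within one method.

```rust
use std::collections::HashMap;

// Hypothetical, simplified key: the real aggregation key has more dimensions.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct AggregationKey {
    resource: String,
    grpc_status_code: Option<u32>,
}

// Hypothetical span carrying only the fields the sketch needs.
struct Span {
    resource: String,
    grpc_status_code: Option<u32>,
}

// Counts spans per key, one bucket per distinct (resource, status code) pair.
fn aggregate(spans: &[Span]) -> HashMap<AggregationKey, u64> {
    let mut buckets = HashMap::new();
    for span in spans {
        let key = AggregationKey {
            resource: span.resource.clone(),
            grpc_status_code: span.grpc_status_code,
        };
        *buckets.entry(key).or_insert(0) += 1;
    }
    buckets
}

fn span(resource: &str, status: u32) -> Span {
    Span { resource: resource.to_string(), grpc_status_code: Some(status) }
}

fn main() {
    let spans = vec![
        span("/package.Service/Get", 0),
        span("/package.Service/Get", 0),
        span("/package.Service/Put", 0),
        span("/package.Service/Put", 14),
    ];
    for (key, count) in aggregate(&spans) {
        println!("{:?} -> {}", key, count);
    }
}
```

The two methods are separated by `resource` alone, and the status code adds the third bucket for the failing `Put` calls.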
Motivation
Correctness and cleanliness for the OTLP trace metrics feature that landed in #2067.
Risk
Breaking changes (major semver):
Blast radius is limited because:
Urgency: low. No live production systems are affected. This must land before the next release that ships the OTLP metrics feature.
How to test the change?