enhancement(metrics): support for metrics v3 protocol#1175
Binary Size Analysis (Agent Data Plane)
Baseline: da59d73 · Comparison: e78a605
✅ Binary size difference within threshold
Regression Detector (Agent Data Plane)
Run ID:
Optimization Goals: ✅ No significant changes detected
Bounds Checks: ✅ Passed (5)
Explanation: A change is flagged as a regression when |Δ mean %| > 5.00% in the regressing direction for its optimization goal AND SMP marks the experiment as a regression ( |
|
This is temporarily blocked until there is a version of the Datadog Agent with up-to-date v3 metrics support that we can test against in correctness tests. Currently, we're hitting an issue where rate intervals are delta encoded when they shouldn't be. That bug is fixed in DataDog/datadog-agent#45825, but the fix won't be released until 7.77, and an RC is roughly 2 weeks away. We could potentially do a hacky image build to keep going in the meantime and then switch back to a proper Agent version once one is available; we'll see.
We've temporarily worked around the correctness test issue by using a "dev" container image ( We can't merge this as-is: we need to wait for at least an RC build of Datadog Agent 7.77 so we can pin to a non-development image. In the meantime, I'm going to make sure we've integrated all of the small fixes/changes that have steadily been made upstream in the Datadog Agent repository for V3 support.
Adds experimental V3 columnar encoding for series and sketch metrics behind the serializer_experimental_use_v3_api.* config keys, including a V2/V3 validation mode, V3 intake routing/filtering, intake-side V3 payload parsing for correctness testing, and the dsd-plain-v3 correctness cases. Existing V1/V2 series and sketch encoding is preserved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds V3 series shadow sampling support to ADP, matching the Core Agent config/defaults for:
- `serializer_experimental_use_v3_api.series.shadow_sample_rate`
- `serializer_experimental_use_v3_api.series.shadow_sites`
- `serializer_experimental_use_v3_api.series.beta_route`

When series V3 is not authoritative, ADP can now sample V2 series flushes and send a correlated V3 beta shadow payload with the same metrics validation batch headers. Shadowing is limited to V2 series baselines, matching the Core Agent behavior.

Also fixes the V3 correctness harness so the new `dsd-plain-v3` cases decode and compare V3 payloads correctly. Fake intake now handles V3 metric routes, and `stele` normalizes V3 columnar payloads, including host resources and Agent-compatible sketch summary ordering.

Todo / Follow Up: Shadow sampling is currently encoder-scoped in ADP. A follow-up should make it endpoint/resolver-scoped like the Core Agent for mixed endpoint and multi-site configurations.
- [ ] Bug fix
- [x] New feature
- [ ] Non-functional (chore, refactoring, docs)
- [ ] Performance

Unit Tests / CI
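The shadow sampling flow described above can be sketched roughly as follows. This is a minimal illustration, not ADP's implementation: `plan_series_flush`, the route labels, and the `X-Example-Validation-Batch-Id` header name are all hypothetical, and real payload encoding is replaced by a plain list copy. It only covers the case where series V3 is not authoritative.

```python
import random
import uuid

# Hypothetical header name; the real validation batch headers are not
# spelled out in this PR text.
BATCH_HEADER = "X-Example-Validation-Batch-Id"


def plan_series_flush(metrics, shadow_sample_rate, rng=random.random):
    """Plan the sends for one V2 series flush (V3 not authoritative).

    Returns a list of (route, payload, headers) tuples. The V2 send is
    always present; a V3 beta shadow send is added when the flush is
    sampled, and both sends then share the same batch header so they can
    be correlated downstream.
    """
    sampled = rng() < shadow_sample_rate
    headers = {BATCH_HEADER: str(uuid.uuid4())} if sampled else {}

    # The V2 payload stays the baseline; shadowing never replaces it.
    sends = [("v2", list(metrics), headers)]
    if sampled:
        # Same metrics, same correlation headers, different route.
        sends.append(("v3-beta", list(metrics), dict(headers)))
    return sends
```

Passing a deterministic `rng` (for example `lambda: 0.0`) makes the sampling decision reproducible in tests. A sample rate of `0.0` never produces a shadow send under this sketch.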
## Summary
Fixes logical mismatches found when testing the v3 pipeline in validation mode, where the v2 tag hash differed from the v3 tag hash.
## Change Type
- [x] Bug fix
- [ ] New feature
- [ ] Non-functional (chore, refactoring, docs)
- [ ] Performance
## How did you test this PR?
## References
Adds ADP telemetry needed for Metrics V3 rollout dashboards.
New/remapped telemetry:
- `serializer.v3_column_size{column,compressed}`
- `serializer.v3_values_count{type}`
- `serializer.v3_payload_split_reason{reason}`
- `sketch_series.sketch_too_big`
- transaction bytes by endpoint/domain/status
- [ ] Bug fix
- [ ] New feature
- [x] Non-functional (chore, refactoring, docs)
- [ ] Performance
Deployed custom image to staging
This PR adds ADP Metrics V3 enablement config while mirroring Datadog Agent V3 routing semantics.

Config keys added:
- `use_v3_api.series.enabled`
- `use_v3_api.series.endpoints`
- `observability_pipelines_worker.metrics.use_v3_api.series`
- `vector.metrics.use_v3_api.series`

ADP keeps V3 series off by default with the ADP-only safety gate:
- `data_plane.metrics.v3.series.enabled`

ADP also mirrors the Agent behavior that disables Metrics V3 when the compressor kind is `zlib`.
- [ ] Bug fix
- [ ] New feature
- [ ] Non-functional (chore, refactoring, docs)
- [ ] Performance

Will be doing a deploy utilizing the new config flags.
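One way to read the gating rules above is as a single boolean check. The sketch below is illustrative only: `v3_series_enabled` is a hypothetical helper, the config is modeled as a flat dict of dotted keys, and two things are assumptions rather than facts from this PR: that the Agent-style flag and the ADP-only gate are combined with a logical AND, and that the compressor kind is read from a key named `serializer_compressor_kind`.

```python
def v3_series_enabled(config):
    """Decide whether V3 series encoding should be active.

    `config` is a flat dict keyed by dotted config paths. Both enablement
    flags default to off, so V3 series stays disabled unless explicitly
    turned on.
    """
    agent_flag = bool(config.get("use_v3_api.series.enabled", False))
    adp_gate = bool(config.get("data_plane.metrics.v3.series.enabled", False))

    # Assumed key name for the compressor kind; V3 is disabled for zlib.
    compressor = config.get("serializer_compressor_kind", "zstd")

    return agent_flag and adp_gate and compressor != "zlib"
```

For example, a config with both flags set to `True` and a `zstd` compressor enables V3 series in this sketch, while switching the compressor to `zlib` disables it regardless of the flags.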
## Summary
This PR makes authoritative ADP Metrics V3 batching use V3-specific limits instead of following V2 flush boundaries.
Changes include:
- Tracking buffered V3 series/sketch point counts.
- Flushing authoritative V3 on the configured point limit, metric count limit, or timeout.
- Carrying the current metric into the next V3 batch when adding it would exceed the point limit.
- Recording `serializer.v3_payload_split_reason{reason:max_points}` for point-limit splits.
- Keeping V3 validation and shadow modes aligned with V2 flush boundaries.
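The batching rules in the list above can be sketched as a small buffer that flushes on a point limit, a metric count limit, or a caller-driven timeout. This is a simplified model under stated assumptions, not the ADP code: `V3Batcher` and its methods are hypothetical, the `max_metrics` and `timeout` reason labels are invented for illustration (only `max_points` appears in this PR), and the timer itself is left to the caller.

```python
class V3Batcher:
    """Buffers (name, points) metrics and reports why each batch was flushed."""

    def __init__(self, max_points, max_metrics):
        self.max_points = max_points
        self.max_metrics = max_metrics
        self.batch = []
        self.point_count = 0

    def _take(self):
        # Hand back the buffered batch and reset the counters.
        batch, self.batch, self.point_count = self.batch, [], 0
        return batch

    def push(self, name, points):
        """Add one metric; return a list of (reason, batch) flushes it caused."""
        flushed = []
        # If this metric would push the batch past the point limit, flush
        # what is buffered and carry the metric into the next batch.
        if self.batch and self.point_count + len(points) > self.max_points:
            flushed.append(("max_points", self._take()))
        self.batch.append((name, points))
        self.point_count += len(points)
        # Metric count limit: flush once the batch is full.
        if len(self.batch) >= self.max_metrics:
            flushed.append(("max_metrics", self._take()))
        return flushed

    def flush_timeout(self):
        """Called by the owner when the flush timer fires."""
        if self.batch:
            return ("timeout", self._take())
        return None
```

Each returned reason is where a counter such as `serializer.v3_payload_split_reason` could be incremented. A single metric larger than `max_points` is still accepted into an empty batch here; how the real encoder handles that case is not covered by this sketch.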
## Change Type
- [ ] Bug fix
- [ ] New feature
- [ ] Non-functional (chore, refactoring, docs)
- [ ] Performance
## How did you test this PR?
Will deploy a custom image to staging
## References
