enhancement(metrics): support for metrics v3 protocol#1175

Draft
tobz wants to merge 7 commits into main from tobz/datadog-metrics-v3-payload-support

@tobz tobz commented Feb 6, 2026

Member

Summary

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

How did you test this PR?

References

@dd-octo-sts dd-octo-sts Bot added area/core Core functionality, event model, etc. area/components Sources, transforms, and destinations. encoder/datadog-metrics Datadog Metrics encoder. forwarder/datadog Datadog forwarder. labels Feb 6, 2026

tobz commented Feb 6, 2026

Member Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@tobz tobz changed the title from "claude-generated v3 implementation" to "enhancement(datadog encoder): support for metrics v3 protocol" Feb 6, 2026
pr-commenter Bot commented Feb 6, 2026


Binary Size Analysis (Agent Data Plane)

Baseline: da59d73 · Comparison: e78a605 · diff
Analysis Configuration: stripped binaries · Pass/Fail Threshold: +5%
Sizes: 40.55 MiB (baseline) vs 40.75 MiB (comparison)
Size Change: +208.42 KiB (+0.50%)

✅ Binary size difference within threshold
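
As a sanity check on the figures above, the reported percentage can be reproduced from the baseline size and the delta. This is a minimal sketch: the function names are hypothetical, and whether the real check uses `<=` or `<` against the threshold is an assumption.

```rust
/// Percentage change of a binary size delta relative to the baseline size.
/// `baseline_mib` is in MiB and `delta_kib` is in KiB (1 MiB = 1024 KiB).
fn size_change_pct(baseline_mib: f64, delta_kib: f64) -> f64 {
    delta_kib / (baseline_mib * 1024.0) * 100.0
}

/// Pass/fail check against a maximum allowed growth, in percent.
/// Assumption: growth exactly equal to the threshold still passes.
fn within_threshold(change_pct: f64, threshold_pct: f64) -> bool {
    change_pct <= threshold_pct
}
```

With the reported values, `size_change_pct(40.55, 208.42)` comes out at roughly 0.50, well under the 5% threshold.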

Changes by Module
| Module | File Size | Symbols |
| --- | --- | --- |
| saluki_components::encoders::datadog | +115.30 KiB | 428 |
| core | +72.91 KiB | 15490 |
| figment | +58.52 KiB | 645 |
| saluki_components::common::datadog | +57.63 KiB | 539 |
| serde | -42.73 KiB | 94 |
| alloc | -35.21 KiB | 3064 |
| hashbrown | +33.45 KiB | 1481 |
| saluki_components::sources::dogstatsd | -21.29 KiB | 355 |
| anon.c631d4f28c2b2db28eff0b5e986a13c9.248.llvm.11665572444932606208 | -16.70 KiB | 1 |
| anon.a14e3b46faabe6cbf5d496845f50a791.1.llvm.5370817080494531371 | +16.70 KiB | 1 |
| anon.0beb901ed35d6fa3b889001aad262adc.649.llvm.782532261936460638 | +15.25 KiB | 1 |
| anon.661ef608d0e9e2423c9230617f2bc873.62.llvm.10199967635082326453 | -15.25 KiB | 1 |
| datadog_protos::trace_piecemeal_include::datadog | +15.18 KiB | 21 |
| piecemeal | -14.10 KiB | 43 |
| tokio | -13.62 KiB | 4749 |
| &mut serde_json | +12.61 KiB | 91 |
| hyper_util | -11.49 KiB | 161 |
| quick_cache | -8.91 KiB | 166 |
| serde_core | -8.42 KiB | 992 |
| saluki_core::topology::blueprint | -8.33 KiB | 89 |

Detailed Symbol Changes
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +0.6%  +148Ki  +0.7%  +122Ki    [55159 Others]
  [NEW] +57.7Ki  [NEW] +57.6Ki    saluki_components::common::datadog::io::run_endpoint_io_loop::_{{closure}}::h43bea40865104461
  [NEW] +40.3Ki  [NEW] +40.2Ki    agent_data_plane::cli::run::handle_run_command::_{{closure}}::hedea565e69c57390
  [NEW] +39.1Ki  [NEW] +38.9Ki    _<saluki_components::forwarders::otlp::OtlpForwarder as saluki_core::components::forwarders::Forwarder>::run::_{{closure}}::h4aaa6af49c9fde5b
  [NEW] +36.2Ki  [NEW] +35.9Ki    _<saluki_components::common::datadog::obfuscation::_::<impl serde_core::de::Deserialize for saluki_components::common::datadog::obfuscation::ObfuscationConfig>::deserialize::__Visitor as serde_core::de::Visitor>::visit_map::h85965b50bfe07de9
  [NEW] +30.3Ki  [NEW] +30.2Ki    agent_data_plane::cli::dogstatsd::handle_dogstatsd_command::_{{closure}}::h3f44af8f5c14ae9f
  [NEW] +29.0Ki  [NEW] +28.8Ki    _<saluki_components::transforms::aggregate::Aggregate as saluki_core::components::transforms::Transform>::run::_{{closure}}::h5b92aa19ac8a2b33
  [NEW] +28.5Ki  [NEW] +28.4Ki    saluki_components::sources::otlp::metrics::translator::OtlpMetricsTranslator::translate_metrics::he59c2dac62fdc695
  [NEW] +25.7Ki  [NEW] +25.5Ki    _<saluki_components::sources::dogstatsd::DogStatsDConfiguration as saluki_core::components::sources::builder::SourceBuilder>::build::_{{closure}}::h8b6bbe704f229baf
  [NEW] +25.1Ki  [NEW] +24.9Ki    core::ptr::drop_in_place<agent_data_plane::cli::run::handle_run_command::{{closure}}>::h93f3449c5a9aaa43
  [NEW] +24.8Ki  [NEW] +24.7Ki    saluki_components::sources::dogstatsd::drive_stream::_{{closure}}::h62ba5dfe67284c63
  [NEW] +24.6Ki  [NEW] +24.4Ki    agent_data_plane::internal::remote_agent::run_remote_agent_registration_loop::_{{closure}}::h0ee20f9c16ea5c40
  [DEL] -24.5Ki  [DEL] -24.3Ki    agent_data_plane::internal::remote_agent::run_remote_agent_registration_loop::_{{closure}}::hcf083b5830259c14
  [DEL] -25.0Ki  [DEL] -24.9Ki    saluki_components::sources::dogstatsd::drive_stream::_{{closure}}::h4618b6207a99eb2b
  [DEL] -25.1Ki  [DEL] -24.9Ki    core::ptr::drop_in_place<agent_data_plane::cli::run::handle_run_command::{{closure}}>::h9f679e5a6425f93c
  [DEL] -28.3Ki  [DEL] -28.2Ki    saluki_components::sources::otlp::metrics::translator::OtlpMetricsTranslator::translate_metrics::h6032372c5d685c9d
  [DEL] -30.4Ki  [DEL] -30.3Ki    agent_data_plane::cli::dogstatsd::handle_dogstatsd_command::_{{closure}}::hd47e947b534c018e
  [DEL] -33.7Ki  [DEL] -33.5Ki    _<saluki_components::transforms::aggregate::Aggregate as saluki_core::components::transforms::Transform>::run::_{{closure}}::hb4a84cbbc21a072f
  [DEL] -39.8Ki  [DEL] -39.7Ki    agent_data_plane::cli::run::handle_run_command::_{{closure}}::h772878347210b56d
  [DEL] -40.0Ki  [DEL] -39.8Ki    _<saluki_components::forwarders::otlp::OtlpForwarder as saluki_core::components::forwarders::Forwarder>::run::_{{closure}}::h622a7084ee8b9e54
  [DEL] -54.3Ki  [DEL] -54.1Ki    saluki_components::common::datadog::io::run_endpoint_io_loop::_{{closure}}::h2c37bd2f0a7c3fc7
  +0.5%  +208Ki  +0.5%  +182Ki    TOTAL

pr-commenter Bot commented Feb 6, 2026


Regression Detector (Agent Data Plane)

Run ID: 520349ea-7d68-4e04-9933-42aa0034b0e7
Baseline: da59d73b · Comparison: e78a6054 · diff

Optimization Goals: ✅ No significant changes detected

Fine details of change detection per experiment (5)

Experiments configured erratic: true are tagged (ignored) and skipped when determining which experiments regressed or improved. Experiments which are detected as erratic at runtime are tagged (erratic) to flag that the run's sample dispersion was high, but their regression / improvement signal still counts.

| experiment | goal | Δ mean % | links |
| --- | --- | --- | --- |
| quality_gates_rss_idle | memory | ⚪ +1.32 | metrics profiles logs |
| quality_gates_rss_dsd_low | memory | ⚪ +0.87 | metrics profiles logs |
| quality_gates_rss_dsd_medium | memory | ⚪ +0.68 | metrics profiles logs |
| quality_gates_rss_dsd_heavy | memory | ⚪ +0.41 | metrics profiles logs |
| quality_gates_rss_dsd_ultraheavy | memory | ⚪ +0.30 | metrics profiles logs |

Bounds Checks: ✅ Passed (5)

| experiment | check | replicates | observed | links |
| --- | --- | --- | --- | --- |
| quality_gates_rss_dsd_heavy | memory_usage | 10/10 | ✅ 131 MiB ≤ 140 MiB | metrics profiles logs |
| quality_gates_rss_dsd_low | memory_usage | 10/10 | ✅ 42.9 MiB ≤ 50 MiB | metrics profiles logs |
| quality_gates_rss_dsd_medium | memory_usage | 10/10 | ✅ 65.6 MiB ≤ 75 MiB | metrics profiles logs |
| quality_gates_rss_dsd_ultraheavy | memory_usage | 10/10 | ✅ 191 MiB ≤ 200 MiB | metrics profiles logs |
| quality_gates_rss_idle | memory_usage | 10/10 | ✅ 28.8 MiB ≤ 40 MiB | metrics profiles logs |

Explanation

A change is flagged as a regression when |Δ mean %| > 5.00% in the regressing direction for its optimization goal AND SMP marks the experiment as a regression (is_regression: true). Improvements use the matching criteria for the improving direction. Experiments configured erratic: true (tagged (ignored)) are skipped outright; experiments detected as erratic at runtime (tagged (erratic)) still count, since that flag describes sample dispersion rather than directional certainty. The Δ mean % cell is colored accordingly: 🟢 = improvement, 🔴 = regression, ⚪ = neutral. Reduction in CPU or memory is an improvement; reduction in ingress throughput is a regression.
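
The classification rule described above can be sketched in a few lines of Rust. This is an illustration only, not the detector's actual code: the `Goal` and `Verdict` types, the `classify` function, and the `smp_is_improvement` flag (a stand-in for "the matching criteria for the improving direction") are all hypothetical names.

```rust
/// Optimization goal of an experiment. For memory and CPU a higher value is
/// worse; for ingress throughput a lower value is worse.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Goal {
    Memory,
    Cpu,
    IngressThroughput,
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Improvement, // 🟢
    Regression,  // 🔴
    Neutral,     // ⚪
}

/// Threshold on |Δ mean %| taken from the explanation text.
const THRESHOLD_PCT: f64 = 5.0;

/// Hypothetical classifier: `smp_is_improvement` is an assumed counterpart
/// to the `is_regression` flag named in the explanation.
fn classify(
    goal: Goal,
    delta_mean_pct: f64,
    smp_is_regression: bool,
    smp_is_improvement: bool,
    configured_erratic: bool,
) -> Verdict {
    // Experiments configured `erratic: true` are skipped outright.
    if configured_erratic || delta_mean_pct.abs() <= THRESHOLD_PCT {
        return Verdict::Neutral;
    }
    let higher_is_worse = match goal {
        Goal::Memory | Goal::Cpu => true,
        Goal::IngressThroughput => false,
    };
    let moved_in_worse_direction = (delta_mean_pct > 0.0) == higher_is_worse;
    if moved_in_worse_direction && smp_is_regression {
        Verdict::Regression
    } else if !moved_in_worse_direction && smp_is_improvement {
        Verdict::Improvement
    } else {
        Verdict::Neutral
    }
}
```

Under this sketch, the `quality_gates_rss_idle` result (+1.32% on a memory goal) is neutral because it never crosses the 5% threshold, regardless of the SMP flags.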

tobz commented Feb 11, 2026

Member Author

This is temporarily blocked on there being a version of the Datadog Agent for us to test against in correctness tests that has up-to-date v3 metrics support.

Currently, we're hitting an issue related to rate intervals being delta encoded when they shouldn't be. That bug is fixed in DataDog/datadog-agent#45825, but the fix won't be released until 7.77, and an RC is roughly 2 weeks away. We can potentially do a hacky image build or something to keep going in the meantime and then switch back to a proper Agent version once available; we'll see.

@tobz tobz force-pushed the tobz/datadog-metrics-v3-payload-support branch from 79cdda1 to 59636cd Compare February 12, 2026 18:34
@dd-octo-sts dd-octo-sts Bot added area/io General I/O and networking. area/ci CI/CD, automated testing, etc. area/test All things testing: unit/integration, correctness, SMP regression, etc. labels Feb 12, 2026
tobz commented Feb 13, 2026

Member Author

We've temporarily handled the issue of correctness tests by using a "dev" container image (datadoghq/agent-dev) based on the latest fix for V3 support in the Datadog Agent. With this in place, both dsd-plain (V2 payloads) and dsd-plain-v3 (new, V3 payloads only) now pass.

We can't merge this as-is: we need to wait for at least an RC build of Datadog Agent 7.77 so we can pin to a non-development image. In the meantime, I'm going to work on making sure we've integrated all of the same small fixes/changes that have steadily landed upstream in the Datadog Agent repository for V3 support.

@tobz tobz force-pushed the tobz/datadog-metrics-v3-payload-support branch from 30ee642 to 898021d Compare February 25, 2026 13:23
@tobz tobz force-pushed the tobz/datadog-metrics-v3-payload-support branch from be9a81c to a9f5109 Compare March 9, 2026 13:28
@tobz tobz force-pushed the tobz/datadog-metrics-v3-payload-support branch from a9f5109 to 31b5f82 Compare March 30, 2026 17:50
Comment thread lib/saluki-components/src/encoders/datadog/metrics/v3/writer.rs Outdated
Comment thread lib/saluki-components/src/encoders/datadog/metrics/v3/writer.rs Outdated
Comment thread bin/correctness/ground-truth/src/analysis/metrics/types.rs Outdated
Comment thread lib/saluki-components/src/encoders/datadog/metrics/mod.rs Outdated
Comment thread lib/saluki-components/src/encoders/datadog/metrics/mod.rs Outdated
@tobz tobz force-pushed the tobz/datadog-metrics-v3-payload-support branch from ca5ebc6 to 16afa2e Compare April 9, 2026 19:44
Comment thread lib/saluki-components/src/encoders/datadog/metrics/mod.rs Outdated
Comment thread lib/saluki-components/src/encoders/datadog/metrics/mod.rs Outdated
@dd-octo-sts dd-octo-sts Bot removed the area/ci CI/CD, automated testing, etc. label May 19, 2026
@dd-octo-sts dd-octo-sts Bot added the area/ci CI/CD, automated testing, etc. label May 27, 2026
@rayz rayz changed the title from "enhancement(datadog encoder): support for metrics v3 protocol" to "enhancement(metrics): support for metrics v3 protocol" May 27, 2026
@tobz tobz force-pushed the tobz/datadog-metrics-v3-payload-support branch from f27ed86 to f4967da Compare June 1, 2026 19:04
@dd-octo-sts dd-octo-sts Bot added area/docs Reference documentation. and removed area/ci CI/CD, automated testing, etc. forwarder/datadog Datadog forwarder. labels Jun 1, 2026
@tobz tobz force-pushed the tobz/datadog-metrics-v3-payload-support branch from f4967da to d804b60 Compare June 1, 2026 19:49
@dd-octo-sts dd-octo-sts Bot removed the area/docs Reference documentation. label Jun 1, 2026
@tobz tobz force-pushed the tobz/datadog-metrics-v3-payload-support branch from 348f227 to d203dbf Compare June 1, 2026 20:48
@dd-octo-sts dd-octo-sts Bot added area/docs Reference documentation. forwarder/datadog Datadog forwarder. labels Jun 1, 2026
@rayz rayz force-pushed the tobz/datadog-metrics-v3-payload-support branch from ef6d87a to fc9e7f8 Compare June 11, 2026 16:00
@dd-octo-sts dd-octo-sts Bot added area/docs Reference documentation. and removed area/docs Reference documentation. labels Jun 11, 2026
@tobz tobz force-pushed the tobz/datadog-metrics-v3-payload-support branch from a0adf59 to f059af1 Compare June 11, 2026 17:33
tobz and others added 7 commits June 25, 2026 17:37
Adds experimental V3 columnar encoding for series and sketch metrics behind the serializer_experimental_use_v3_api.* config keys, including a V2/V3 validation mode, V3 intake routing/filtering, intake-side V3 payload parsing for correctness testing, and the dsd-plain-v3 correctness cases. Existing V1/V2 series and sketch encoding is preserved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Adds V3 series shadow sampling support to ADP, matching the Core Agent
config/defaults for:

  - `serializer_experimental_use_v3_api.series.shadow_sample_rate`
  - `serializer_experimental_use_v3_api.series.shadow_sites`
  - `serializer_experimental_use_v3_api.series.beta_route`

When series V3 is not authoritative, ADP can now sample V2 series
flushes and send a correlated V3 beta shadow payload with the same
metrics validation batch headers. Shadowing is limited to V2 series
baselines, matching the Core Agent behavior.

Also fixes the V3 correctness harness so the new `dsd-plain-v3` cases
decode and compare V3 payloads correctly. Fake intake now handles V3
metric routes, and `stele` normalizes V3 columnar payloads, including
host resources and Agent-compatible sketch summary ordering.

Todo / Follow Up: Shadow sampling is currently encoder-scoped in ADP. A
follow-up should make it endpoint/resolver-scoped like the Core Agent
for mixed endpoint and multi-site configurations.
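
The shadow-sampling decision described above can be sketched as a pure function. This is a hedged illustration under stated assumptions: `Baseline`, `ShadowConfig`, and `should_shadow` are hypothetical names, the random draw is passed in by the caller so the logic stays deterministic, and `shadow_sites` and `beta_route` are not modeled.

```rust
/// Which payload a flush produced as its baseline (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Baseline {
    SeriesV1,
    SeriesV2,
    Sketches,
}

/// Hypothetical view of the relevant settings.
struct ShadowConfig {
    /// Stand-in for `serializer_experimental_use_v3_api.series.shadow_sample_rate`.
    sample_rate: f64,
    /// True when series V3 is the authoritative payload.
    v3_authoritative: bool,
}

/// Decide whether a flush should also emit a V3 beta shadow payload.
/// `draw` is assumed to be a uniform random value in [0, 1).
fn should_shadow(cfg: &ShadowConfig, baseline: Baseline, draw: f64) -> bool {
    // Shadowing only applies when V3 is not authoritative and the
    // baseline is a V2 series payload.
    !cfg.v3_authoritative && baseline == Baseline::SeriesV2 && draw < cfg.sample_rate
}
```

Passing the draw in as a parameter keeps the decision testable; a caller would supply a value from whatever random source the encoder already uses.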

- [ ] Bug fix
- [x] New feature
- [ ] Non-functional (chore, refactoring, docs)
- [ ] Performance

Unit Tests / CI

## Summary

Fixes logical mismatches found when testing the v3 pipeline in
validation mode, where the v2 tag hash differed from the v3 tag hash.

## Change Type
- [x] Bug fix
- [ ] New feature
- [ ] Non-functional (chore, refactoring, docs)
- [ ] Performance


## How did you test this PR?

## References


  Adds ADP telemetry needed for Metrics V3 rollout dashboards.

  New/remapped telemetry:

  - `serializer.v3_column_size{column,compressed}`
  - `serializer.v3_values_count{type}`
  - `serializer.v3_payload_split_reason{reason}`
  - `sketch_series.sketch_too_big`
  - transaction bytes by endpoint/domain/status

- [ ] Bug fix
- [ ] New feature
- [x] Non-functional (chore, refactoring, docs)
- [ ] Performance

Deployed custom image to staging


This PR adds ADP Metrics V3 enablement config while mirroring Datadog
Agent V3 routing semantics.

Config keys added:

  - `use_v3_api.series.enabled`
  - `use_v3_api.series.endpoints`
  - `observability_pipelines_worker.metrics.use_v3_api.series`
  - `vector.metrics.use_v3_api.series`

ADP keeps V3 series off by default with the ADP-only safety gate:
  - `data_plane.metrics.v3.series.enabled`

ADP also mirrors the Agent behavior that disables Metrics V3 when the
compressor kind is `zlib`.
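
A rough sketch of how these gates might compose is shown below. It is not the actual ADP implementation: the type and function names are hypothetical, `Zstd` is included only as an example of a non-zlib compressor, and combining `enabled` with `endpoints` as "either one turns V3 on for that endpoint" is an assumption about semantics the description does not spell out.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum CompressorKind {
    Zlib,
    Zstd,
}

/// Hypothetical settings struct; `Default` gives everything off,
/// matching "ADP keeps V3 series off by default".
#[derive(Default)]
struct V3SeriesSettings {
    /// Stand-in for `data_plane.metrics.v3.series.enabled` (ADP-only gate).
    adp_gate_enabled: bool,
    /// Stand-in for `use_v3_api.series.enabled`.
    agent_enabled: bool,
    /// Stand-in for `use_v3_api.series.endpoints`.
    endpoints: Vec<String>,
}

/// Returns true if V3 series should be used for `endpoint`.
fn v3_series_enabled_for(
    settings: &V3SeriesSettings,
    compressor: CompressorKind,
    endpoint: &str,
) -> bool {
    // The ADP-only safety gate must be on before anything else applies.
    if !settings.adp_gate_enabled {
        return false;
    }
    // Metrics V3 is disabled when the compressor kind is zlib.
    if compressor == CompressorKind::Zlib {
        return false;
    }
    // Assumed semantics: globally enabled, or enabled for a listed endpoint.
    settings.agent_enabled || settings.endpoints.iter().any(|e| e == endpoint)
}
```

Checking the ADP gate first means an Agent-level flag alone cannot enable V3 in ADP, which matches the "safety gate" wording above.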

- [ ] Bug fix
- [ ] New feature
- [ ] Non-functional (chore, refactoring, docs)
- [ ] Performance

Will be doing a deploy utilizing the new config flags.

## Summary

This PR makes authoritative ADP Metrics V3 batching use V3-specific
limits instead of following V2 flush boundaries.

Changes include:
- Tracking buffered V3 series/sketch point counts.
- Flushing authoritative V3 on the configured point limit, metric count
  limit, or timeout.
- Carrying the current metric into the next V3 batch when adding it
  would exceed the point limit.
- Recording `serializer.v3_payload_split_reason{reason:max_points}` for
  point-limit splits.
- Keeping V3 validation and shadow modes aligned with V2 flush boundaries.
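
The point-limit and metric-count behavior listed above can be sketched as a small batcher. This is an illustrative model only: `V3Batcher` and `FlushReason` are hypothetical names, each metric is represented just by its point count, and the timeout-driven flush is left out.

```rust
#[derive(Debug, PartialEq)]
enum FlushReason {
    MaxPoints,
    MaxMetrics,
}

/// Hypothetical batcher that tracks buffered metrics by their point counts.
struct V3Batcher {
    max_points: usize,
    max_metrics: usize,
    buffered_points: usize,
    buffered: Vec<usize>,
}

impl V3Batcher {
    fn new(max_points: usize, max_metrics: usize) -> Self {
        Self { max_points, max_metrics, buffered_points: 0, buffered: Vec::new() }
    }

    /// Take the current batch and reset the point counter.
    fn take_batch(&mut self) -> Vec<usize> {
        self.buffered_points = 0;
        std::mem::take(&mut self.buffered)
    }

    /// Add a metric with `metric_points` points. Returns any batches that
    /// were completed as a result, each with the reason it was flushed.
    fn push(&mut self, metric_points: usize) -> Vec<(Vec<usize>, FlushReason)> {
        let mut flushed = Vec::new();
        // If adding this metric would exceed the point limit, flush what is
        // buffered and carry the metric into the next batch.
        if !self.buffered.is_empty() && self.buffered_points + metric_points > self.max_points {
            flushed.push((self.take_batch(), FlushReason::MaxPoints));
        }
        self.buffered.push(metric_points);
        self.buffered_points += metric_points;
        // Flush once the metric count limit is reached.
        if self.buffered.len() >= self.max_metrics {
            flushed.push((self.take_batch(), FlushReason::MaxMetrics));
        }
        flushed
    }
}
```

In this sketch a single metric larger than `max_points` is buffered on its own rather than split; how the real encoder treats that case is not stated in the description.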

## Change Type
- [ ] Bug fix
- [ ] New feature
- [ ] Non-functional (chore, refactoring, docs)
- [ ] Performance


## How did you test this PR?

Will deploy a custom image to staging

## References

@rayz rayz force-pushed the tobz/datadog-metrics-v3-payload-support branch from b4f8c37 to e78a605 Compare June 25, 2026 21:53
Labels

area/components Sources, transforms, and destinations. area/core Core functionality, event model, etc. area/docs Reference documentation. area/io General I/O and networking. area/test All things testing: unit/integration, correctness, SMP regression, etc. encoder/datadog-metrics Datadog Metrics encoder. forwarder/datadog Datadog forwarder.

3 participants