chore(config): generate datadog config deserializer#1863

Open
webern wants to merge 1 commit into main from m/config-gen-datadog-deserializer

@webern webern commented Jun 12, 2026

Contributor

Human Summary

This PR introduces a nested deserialization structure for Datadog configuration, generated from the specification. It isn't wired into the configuration system yet, so at the moment it is inert baggage. Later it will be placed in front of ADP's access to Datadog Agent configuration to prevent the leakage of GenericConfiguration throughout the system (stay tuned).

AI Summary

Generates a public, typed DatadogConfiguration deserializer in datadog-agent-config for the
Datadog Agent config keys ADP actually supports (support: full / support: partial in the
overlay), and frees up that name by renaming the forwarder's struct first.

This is the first PR of the mapless Configuration Translation System. The generated type is a typed
input surface for supported Agent config. It is expected to be mostly unused until the
translator PR (PR 3) — it is intentionally dead code for now, not an accidental leftover.

What it does:

  • Renames the forwarder DatadogConfiguration → DatadogForwarderConfiguration
    (saluki-components forwarders + the two call sites in agent-data-plane) so the generated type
    can own the DatadogConfiguration name.
  • Adds a build-time generator (config/build/datadog_config_gen.rs) that:
    1. Projects the vendored schema down to only the overlay-supported keys, preserving the YAML
      nesting (132 leaves across 93 top-level keys), and
    2. Hands that pruned JSON Schema to Oxide's typify
      to generate a nested struct tree (DatadogConfiguration,
      DatadogConfigurationApmConfig, ...; 31 structs).
  • Exposes datadog_agent_config::DatadogConfiguration (re-exported at the crate root).
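
The pruning step can be sketched roughly as follows. This is a minimal illustration, not the generator's code: it assumes a toy `Node` tree in place of real JSON Schema, dotted paths for overlay keys, and made-up leaf names under `apm_config`. `prune`, `leaf_count`, `example_schema`, and `example_supported` are hypothetical names introduced only for this sketch.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Tiny stand-in for a schema node (the real generator prunes JSON Schema).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Leaf,
    Section(BTreeMap<String, Node>),
}

/// Builds a section from (name, node) pairs.
fn section(entries: Vec<(&str, Node)>) -> Node {
    Node::Section(entries.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Keeps only leaves whose dotted path is in `supported`, preserving nesting.
/// Sections that end up empty after pruning are dropped.
fn prune(node: &Node, path: &str, supported: &BTreeSet<String>) -> Option<Node> {
    match node {
        Node::Leaf => supported.contains(path).then_some(Node::Leaf),
        Node::Section(children) => {
            let kept: BTreeMap<String, Node> = children
                .iter()
                .filter_map(|(name, child)| {
                    let child_path = if path.is_empty() {
                        name.clone()
                    } else {
                        format!("{path}.{name}")
                    };
                    prune(child, &child_path, supported).map(|n| (name.clone(), n))
                })
                .collect();
            (!kept.is_empty()).then(|| Node::Section(kept))
        }
    }
}

/// Counts the leaves in a tree.
fn leaf_count(node: &Node) -> usize {
    match node {
        Node::Leaf => 1,
        Node::Section(children) => children.values().map(leaf_count).sum(),
    }
}

/// Illustrative schema; the leaf names under `apm_config` are made up.
fn example_schema() -> Node {
    section(vec![
        ("additional_endpoints", Node::Leaf),
        ("aggregator_buffer_size", Node::Leaf),
        (
            "apm_config",
            section(vec![("enabled", Node::Leaf), ("made_up_unsupported", Node::Leaf)]),
        ),
    ])
}

/// Illustrative overlay selection (keys marked full or partial).
fn example_supported() -> BTreeSet<String> {
    ["additional_endpoints", "apm_config.enabled"]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

fn main() {
    let schema = example_schema();
    let pruned = prune(&schema, "", &example_supported()).expect("some key is supported");
    println!("{} of {} leaves kept", leaf_count(&pruned), leaf_count(&schema));
}
```

Dropping empty sections is a choice made for this sketch; whether the real generator keeps or drops them is not stated here.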

Design notes:

  • Overlay decides which keys; typify decides the Rust shape. Our domain knowledge (the overlay)
    drives the key selection via the pruning step; typify does the mechanical schema→Rust generation.
  • No schema type rewriting. The generated struct is a faithful mirror of the vendored schema.
    The Datadog schema types all numerics as number, so ports/sizes/timeouts surface as f64.
    Semantic refinement (e.g. port → u16) belongs in the downstream translator, where the
    overlay/semantic knowledge already lives — not in this mirror. typify has no per-field type knob
    (its conversions match by schema shape, and these keys share an identical {type: number}
    schema), so honoring "no ugly schema manipulation" means accepting f64 here.
  • Field optionality follows the schema's defaults. typify's native behavior: a key with a
    non-null default becomes a non-optional field with #[serde(default = "...")]; a key with no
    default (or a null default) becomes Option<T>. Object sections without a default are therefore
    Option<Section>. additional_endpoints becomes HashMap<String, Vec<String>>.
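
To make the f64 trade-off concrete, here is a hedged sketch of what a downstream refinement such as port → u16 could look like. `port_from_number` and `optional_port` are hypothetical helpers, not part of this PR or the planned translator, and the exact validation rules (for example whether port 0 is allowed) are assumptions.

```rust
/// Hypothetical translator-side helper: narrows a schema-typed `f64` to a port.
/// Rejects NaN, infinities, fractional values, and anything outside 0..=65535.
fn port_from_number(value: f64) -> Result<u16, String> {
    if !value.is_finite() || value.fract() != 0.0 {
        return Err(format!("port must be a whole number, got {value}"));
    }
    if value < 0.0 || value > f64::from(u16::MAX) {
        return Err(format!("port out of range for u16: {value}"));
    }
    // Safe: the value is a whole number within the u16 range at this point.
    Ok(value as u16)
}

/// Same refinement for a key without a schema default, which surfaces as `Option<f64>`.
fn optional_port(value: Option<f64>) -> Result<Option<u16>, String> {
    value.map(port_from_number).transpose()
}

fn main() {
    // 8125.0 is only an example input here.
    println!("{:?}", port_from_number(8125.0));
    println!("{:?}", optional_port(None));
}
```

Keeping this check out of the generated mirror means the struct stays a faithful copy of the schema, and range errors are reported where the semantic knowledge lives.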

New build-dependencies on datadog-agent-config (all permissive licenses): typify (Apache-2.0),
schemars (MIT), prettyplease, syn (plus transitive regress, dyn-clone, schemars_derive,
serde_derive_internals, typify-impl, recorded in LICENSE-3rdparty.csv). The generated
src/datadog_configuration.rs is checked in with a // @generated header (same pattern as
classifier_data.rs) and is rustfmt-canonicalized by the generator so it stays stable across
regenerations.

The generator post-processes typify's output for readability: strips the embedded
<details>...```json```...</details> schema blocks; converts multi-line /** */ doc blocks to
/// lines and normalizes leading spaces (on the syn AST); shortens fully qualified prelude paths
(::std::option::Option -> Option, ::std::collections::HashMap -> HashMap + a use, etc.) via
a syn VisitMut (serde derives and the boilerplate error module's ::std::fmt/Cow paths are
left qualified to avoid a fmt::Result vs prelude Result collision); and inserts blank lines
between fields and between top-level items. Because the generated doc comments are verbatim schema
prose (identifiers, units, etc.), the file is exempted from the Vale prose linter via the
check-docs glob in the Makefile.
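
As a rough illustration of the first post-processing step only, the sketch below strips `<details>` blocks from doc text with plain string handling. It is not the generator's implementation (which works on the syn AST for the other steps); `strip_details` is a hypothetical name, and the sketch assumes the tags start their own lines and are never nested.

```rust
/// Removes every `<details>` ... `</details>` span (inclusive) from doc text,
/// line by line, then trims trailing whitespace left behind.
/// Assumes tags start a line and are not nested.
fn strip_details(doc: &str) -> String {
    let mut out: Vec<&str> = Vec::new();
    let mut skipping = false;
    for line in doc.lines() {
        let trimmed = line.trim();
        if !skipping && trimmed.starts_with("<details>") {
            // Keep skipping unless the block also closes on this same line.
            skipping = !trimmed.contains("</details>");
            continue;
        }
        if skipping {
            if trimmed.starts_with("</details>") {
                skipping = false;
            }
            continue;
        }
        out.push(line);
    }
    out.join("\n").trim_end().to_string()
}

fn main() {
    // Made-up doc text standing in for a generated doc comment body.
    let doc = "Listen port.\n\n<details>\n{\"type\": \"number\"}\n</details>";
    println!("{}", strip_details(doc));
}
```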

Change Type

  • Non-functional (chore, refactoring, docs)

How did you test this PR?

  • make check-all passes (fmt + cargo sort, clippy, docs/Vale, deny, licenses, unused-deps,
    api-docs, features) and make test-all passes. (The generator shells the output through rustfmt,
    which auto-discovers the repo rustfmt.toml; the lone nightly-only option group_imports is a
    no-op since the generated file has no use imports.)

  • cargo build -p saluki-components -p agent-data-plane compiles after the rename; the renamed
    forwarder unit test passes.

  • Overlay-driven generation proof (local only): temporarily changed aggregator_buffer_size
    from support: none to support: full in schema_overlay.yaml, ran
    make build-schema-overlay, and verified the regenerated DatadogConfiguration gained
    pub aggregator_buffer_size: f64 plus its schema default function. The same run moved the key
    from unsupported to supported in the generated registry/docs outputs. Reverted the temporary
    overlay and generated-file changes afterward.

  • Supported-key audit: parsed the generated DatadogConfiguration struct tree and compared leaf
    paths against the overlay. Result: 132 generated leaf fields == 132 support: full / support: partial overlay keys; 0 generated leaves outside that set; 0 missing supported keys; 0
    unsupported or excluded keys present.

  • DROPME proof against a real Datadog Agent (removed before merge): a temporary block in
    run.rs (at the config-check point, where the Core Agent's authoritative config-stream config is
    in hand) deserializes the generated DatadogConfiguration and logs the result. A Panoramic
    integration case (test/integration/cases/adp-datadog-config-deserialize) runs ADP inside the
    bundled image alongside the real Datadog Agent (non-standalone, RAR registration, new config
    stream endpoint) and asserts the OK line. Run with
    panoramic run --tests adp-datadog-config-deserialize --runtime linux. Result: PASS. Observed log
    sequence:

    i.e. the generated type deserializes the config the real Agent actually delivers over the config
    stream, and ADP stays stable with no panics afterward. The DROPME runtime block and the test case
    are removed before review.
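
The supported-key audit above boils down to comparing two sets of leaf paths. A minimal sketch, assuming both sides have already been flattened to dotted paths (the parsing of the struct tree and the overlay is not shown, and `Audit`, `audit`, and `paths` are hypothetical names):

```rust
use std::collections::BTreeSet;

/// Result of comparing generated leaf paths against overlay-supported keys.
#[derive(Debug)]
struct Audit {
    /// Supported keys with no generated leaf.
    missing: BTreeSet<String>,
    /// Generated leaves that are not supported keys.
    unexpected: BTreeSet<String>,
}

impl Audit {
    /// True when both sets match exactly.
    fn is_clean(&self) -> bool {
        self.missing.is_empty() && self.unexpected.is_empty()
    }
}

/// Computes the two set differences between generated and supported paths.
fn audit(generated: &BTreeSet<String>, supported: &BTreeSet<String>) -> Audit {
    Audit {
        missing: supported.difference(generated).cloned().collect(),
        unexpected: generated.difference(supported).cloned().collect(),
    }
}

/// Convenience constructor for the examples below.
fn paths(items: &[&str]) -> BTreeSet<String> {
    items.iter().map(|s| s.to_string()).collect()
}

fn main() {
    // Illustrative paths only; the real audit compared 132 leaves.
    let generated = paths(&["additional_endpoints", "apm_config.enabled"]);
    let supported = paths(&["additional_endpoints", "apm_config.enabled"]);
    let result = audit(&generated, &supported);
    println!("clean: {}, details: {:?}", result.is_clean(), result);
}
```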

References

@dd-octo-sts dd-octo-sts Bot added labels area/components (Sources, transforms, and destinations.) and forwarder/datadog (Datadog forwarder.) Jun 12, 2026
@pr-commenter

pr-commenter Bot commented Jun 12, 2026

Binary Size Analysis (Agent Data Plane)

Baseline: 9c1abde · Comparison: 0b4737b · diff
Analysis Configuration: stripped binaries · Pass/Fail Threshold: +5%
Sizes: 40.37 MiB (baseline) vs 40.37 MiB (comparison)
Size Change: -2.00 KiB (-0.00%)

✅ Binary size difference within threshold

Changes by Module
| Module | File Size | Symbols |
| --- | --- | --- |
| anon.db13c838dff0ff8ee9d7f038b638fedf.281.llvm.9875580372017412076 | -16.69 KiB | 1 |
| anon.4971a138abcbf76d70dd2f0a87a5dffb.52.llvm.8870522310722751024 | +16.61 KiB | 1 |
| anon.377800c85754e3c34a43cb9f4c161761.44.llvm.10872382834989709062 | +15.25 KiB | 1 |
| anon.661ef608d0e9e2423c9230617f2bc873.62.llvm.10199967635082326453 | -15.25 KiB | 1 |
| anon.a9030c24ed0837a8b572f3c7e56188a8.26.llvm.9410433398480454732 | +13.73 KiB | 1 |
| anon.832b9be9c2c572291f8c640f6565567f.834.llvm.18149987771942695302 | -13.73 KiB | 1 |
| anon.6396380801bd336505fbbac2d6f4a6a2.103.llvm.12314569896552618589 | +8.15 KiB | 1 |
| anon.77d0d1ecb3d3a439681fa998b99095e3.101.llvm.14984887614861076458 | -8.15 KiB | 1 |
| anon.4ca187a8df3537fef265e620f4b32dbe.162.llvm.18445489589776564424 | -6.72 KiB | 1 |
| anon.261bb94b8d6de96de010fe9a3c608bbb.4.llvm.7628180700126258038 | +6.72 KiB | 1 |
| anon.2bdeff8a4d91f2fa2bb756f31c5e7f0b.5.llvm.11162411782438170453 | +4.57 KiB | 1 |
| anon.041c5d0df62dca0864405ccd9777d0ae.6.llvm.18097319022462627246 | -4.56 KiB | 1 |
| anon.0cdb6c17c6f65ce5d46409fbadd031b9.210.llvm.2224256004584807259 | +3.99 KiB | 1 |
| anon.781685e7ede517c1f7c2c1a483910085.72.llvm.2359613053287896254 | -3.98 KiB | 1 |
| anon.4d15634fe070e9b7fe1171631812c8b4.203.llvm.18414242758698911687 | -3.83 KiB | 1 |
| anon.516c76a6e1c24b1e55a00dcc0bda4cd4.206.llvm.5501150626697134242 | +3.83 KiB | 1 |
| anon.7d3b2a3484d0d5c519a8a050cc1efb51.66.llvm.5944714133933245223 | +3.71 KiB | 1 |
| anon.4aedc6703c41247bb17dcc65a65ea065.62.llvm.14003249708032674070 | -3.71 KiB | 1 |
| anon.4e30bd26b3561b4178ac48831e5a00b4.42.llvm.16543335135297065867 | +3.37 KiB | 1 |
| anon.596a1291f8a527b81c57a004e978eefc.317.llvm.13618964835189821112 | -3.37 KiB | 1 |

Detailed Symbol Changes
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  [NEW] +53.4Ki  [NEW] +53.3Ki    saluki_components::common::datadog::io::run_endpoint_io_loop::_{{closure}}::h26e8e9bf65e15955
  [NEW] +39.8Ki  [NEW] +39.7Ki    agent_data_plane::cli::run::handle_run_command::_{{closure}}::h3e40f4fc55cf096a
  [NEW] +35.4Ki  [NEW] +35.2Ki    _<saluki_components::transforms::aggregate::Aggregate as saluki_core::components::transforms::Transform>::run::_{{closure}}::hc3032d29823403c8
  [NEW] +34.6Ki  [NEW] +34.4Ki    _<saluki_components::forwarders::otlp::OtlpForwarder as saluki_core::components::forwarders::Forwarder>::run::_{{closure}}::h108d28e789c33579
  [NEW] +30.4Ki  [NEW] +30.3Ki    agent_data_plane::cli::dogstatsd::handle_dogstatsd_command::_{{closure}}::h60eee8631507ab78
  [NEW] +28.5Ki  [NEW] +28.4Ki    saluki_components::sources::otlp::metrics::translator::OtlpMetricsTranslator::translate_metrics::hc391aa61f403600b
  [NEW] +25.1Ki  [NEW] +24.9Ki    core::ptr::drop_in_place<agent_data_plane::cli::run::handle_run_command::{{closure}}>::h30d9c0d7a0d58a31
  [NEW] +25.1Ki  [NEW] +25.0Ki    saluki_components::sources::dogstatsd::drive_stream::_{{closure}}::hf50efc60c67609d2
  [NEW] +24.5Ki  [NEW] +24.3Ki    agent_data_plane::internal::remote_agent::run_remote_agent_registration_loop::_{{closure}}::h70368a6cf95c0d34
  [NEW] +24.3Ki  [NEW] +24.1Ki    saluki_env::workload::collectors::containerd::NamespaceWatcher::build_initial_metadata_operations::_{{closure}}::hfcdcc08aa12db62e
  -0.0% -2.55Ki  +0.0%    +816    [58857 Others]
  [DEL] -24.2Ki  [DEL] -24.0Ki    saluki_env::workload::collectors::containerd::NamespaceWatcher::build_initial_metadata_operations::_{{closure}}::hdd89207138347ceb
  [DEL] -24.5Ki  [DEL] -24.3Ki    agent_data_plane::internal::remote_agent::run_remote_agent_registration_loop::_{{closure}}::hcf083b5830259c14
  [DEL] -25.0Ki  [DEL] -24.9Ki    saluki_components::sources::dogstatsd::drive_stream::_{{closure}}::h4618b6207a99eb2b
  [DEL] -25.1Ki  [DEL] -24.9Ki    core::ptr::drop_in_place<agent_data_plane::cli::run::handle_run_command::{{closure}}>::h9f679e5a6425f93c
  [DEL] -28.4Ki  [DEL] -28.3Ki    saluki_components::sources::otlp::metrics::translator::OtlpMetricsTranslator::translate_metrics::h6032372c5d685c9d
  [DEL] -30.4Ki  [DEL] -30.3Ki    agent_data_plane::cli::dogstatsd::handle_dogstatsd_command::_{{closure}}::hd47e947b534c018e
  [DEL] -34.4Ki  [DEL] -34.2Ki    _<saluki_components::forwarders::otlp::OtlpForwarder as saluki_core::components::forwarders::Forwarder>::run::_{{closure}}::h622a7084ee8b9e54
  [DEL] -35.2Ki  [DEL] -35.1Ki    _<saluki_components::transforms::aggregate::Aggregate as saluki_core::components::transforms::Transform>::run::_{{closure}}::hb4a84cbbc21a072f
  [DEL] -39.8Ki  [DEL] -39.7Ki    agent_data_plane::cli::run::handle_run_command::_{{closure}}::h772878347210b56d
  [DEL] -53.4Ki  [DEL] -53.2Ki    saluki_components::common::datadog::io::run_endpoint_io_loop::_{{closure}}::h2c37bd2f0a7c3fc7
  -0.0% -2.00Ki  +0.0% +1.35Ki    TOTAL

@pr-commenter

pr-commenter Bot commented Jun 12, 2026

Regression Detector (Agent Data Plane)

Run ID: fe1e39eb-5993-4ebc-98f2-c5e9ce5ab59d
Baseline: 9c1abdeb · Comparison: 0b4737b3 · diff

Optimization Goals: ✅ No significant changes detected

Fine details of change detection per experiment (35)

Experiments configured erratic: true are tagged (ignored) and skipped when determining which experiments regressed or improved. Experiments which are detected as erratic at runtime are tagged (erratic) to flag that the run's sample dispersion was high, but their regression / improvement signal still counts.

| experiment | goal | Δ mean % | links |
| --- | --- | --- | --- |
| otlp_ingest_metrics_5mb_memory | memory | ⚪ +4.28 | metrics profiles logs |
| dsd_uds_100mb_3k_contexts_cpu (erratic) | cpu | ⚪ +3.77 | metrics profiles logs |
| dsd_uds_500mb_3k_contexts_throughput | throughput | ⚪ -1.32 | metrics profiles logs |
| otlp_ingest_logs_5mb_cpu (ignored) | cpu | ⚪ +1.10 | metrics profiles logs |
| otlp_ingest_traces_5mb_cpu (erratic) | cpu | ⚪ +0.97 | metrics profiles logs |
| dsd_uds_500mb_3k_contexts_memory | memory | ⚪ +0.38 | metrics profiles logs |
| quality_gates_rss_dsd_heavy | memory | ⚪ +0.37 | metrics profiles logs |
| dsd_uds_1mb_3k_contexts_cpu (erratic) | cpu | ⚪ +0.23 | metrics profiles logs |
| dsd_uds_100mb_3k_contexts_memory | memory | ⚪ +0.13 | metrics profiles logs |
| otlp_ingest_traces_ottl_transform_5mb_throughput | throughput | ⚪ -0.09 | metrics profiles logs |
| otlp_ingest_traces_ottl_filtering_5mb_cpu (erratic) | cpu | ⚪ +0.08 | metrics profiles logs |
| dsd_uds_10mb_3k_contexts_memory | memory | ⚪ +0.08 | metrics profiles logs |
| quality_gates_rss_dsd_medium | memory | ⚪ +0.06 | metrics profiles logs |
| otlp_ingest_traces_ottl_transform_5mb_memory | memory | ⚪ +0.05 | metrics profiles logs |
| quality_gates_rss_idle | memory | ⚪ +0.00 | metrics profiles logs |
| dsd_uds_512kb_3k_contexts_throughput | throughput | ⚪ -0.00 | metrics profiles logs |
| dsd_uds_100mb_3k_contexts_throughput | throughput | ⚪ -0.00 | metrics profiles logs |
| otlp_ingest_logs_5mb_throughput (ignored) | throughput | ⚪ -0.00 | metrics profiles logs |
| otlp_ingest_metrics_5mb_throughput | throughput | ⚪ +0.00 | metrics profiles logs |
| dsd_uds_1mb_3k_contexts_throughput | throughput | ⚪ +0.00 | metrics profiles logs |
| dsd_uds_10mb_3k_contexts_throughput | throughput | ⚪ +0.01 | metrics profiles logs |
| dsd_uds_1mb_3k_contexts_memory | memory | ⚪ -0.05 | metrics profiles logs |
| quality_gates_rss_dsd_ultraheavy | memory | ⚪ -0.06 | metrics profiles logs |
| otlp_ingest_traces_5mb_memory | memory | ⚪ -0.21 | metrics profiles logs |
| dsd_uds_512kb_3k_contexts_memory | memory | ⚪ -0.27 | metrics profiles logs |
| otlp_ingest_traces_ottl_filtering_5mb_memory | memory | ⚪ -0.32 | metrics profiles logs |
| otlp_ingest_traces_ottl_filtering_5mb_throughput | throughput | ⚪ +0.42 | metrics profiles logs |
| otlp_ingest_traces_5mb_throughput | throughput | ⚪ +0.53 | metrics profiles logs |
| quality_gates_rss_dsd_low | memory | ⚪ -0.68 | metrics profiles logs |
| otlp_ingest_metrics_5mb_cpu (erratic) | cpu | ⚪ -0.74 | metrics profiles logs |
| otlp_ingest_traces_ottl_transform_5mb_cpu (erratic) | cpu | ⚪ -1.17 | metrics profiles logs |
| dsd_uds_500mb_3k_contexts_cpu (erratic) | cpu | ⚪ -1.17 | metrics profiles logs |
| dsd_uds_512kb_3k_contexts_cpu (erratic) | cpu | ⚪ -2.61 | metrics profiles logs |
| otlp_ingest_logs_5mb_memory (ignored) | memory | ⚪ -3.34 | metrics profiles logs |
| dsd_uds_10mb_3k_contexts_cpu (erratic) | cpu | ⚪ -3.96 | metrics profiles logs |

Bounds Checks: ✅ Passed (5)

| experiment | check | replicates | observed | links |
| --- | --- | --- | --- | --- |
| quality_gates_rss_dsd_heavy | memory_usage | 10/10 | ✅ 133 MiB ≤ 140 MiB | metrics profiles logs |
| quality_gates_rss_dsd_low | memory_usage | 10/10 | ✅ 42.2 MiB ≤ 50 MiB | metrics profiles logs |
| quality_gates_rss_dsd_medium | memory_usage | 10/10 | ✅ 64.4 MiB ≤ 75 MiB | metrics profiles logs |
| quality_gates_rss_dsd_ultraheavy | memory_usage | 10/10 | ✅ 192 MiB ≤ 200 MiB | metrics profiles logs |
| quality_gates_rss_idle | memory_usage | 10/10 | ✅ 28.1 MiB ≤ 40 MiB | metrics profiles logs |

Explanation

A change is flagged as a regression when |Δ mean %| > 5.00% in the regressing direction for its optimization goal AND SMP marks the experiment as a regression (is_regression: true). Improvements use the matching criteria for the improving direction. Experiments configured erratic: true (tagged (ignored)) are skipped outright; experiments detected as erratic at runtime (tagged (erratic)) still count, since that flag describes sample dispersion rather than directional certainty. The Δ mean % cell is colored accordingly: 🟢 = improvement, 🔴 = regression, ⚪ = neutral. Reduction in CPU or memory is an improvement; reduction in ingress throughput is a regression.
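
The flagging rule in the explanation can be read as a small decision function. The sketch below is an interpretation of that paragraph, not the detector's code: `Experiment`, `Verdict`, `classify`, and the `smp_is_improvement` field are hypothetical names (only `is_regression` is named in the text), and the 5.00 threshold is taken from the explanation.

```rust
/// Threshold on |Δ mean %| from the explanation above.
const THRESHOLD_PCT: f64 = 5.0;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Goal {
    Cpu,
    Memory,
    Throughput,
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Improvement,
    Regression,
    Neutral,
    Ignored,
}

/// Hypothetical view of one experiment row.
struct Experiment {
    goal: Goal,
    delta_mean_pct: f64,
    /// Configured `erratic: true`, so the experiment is skipped outright.
    configured_erratic: bool,
    /// SMP's `is_regression` flag.
    smp_is_regression: bool,
    /// Assumed counterpart flag for improvements (name is a guess).
    smp_is_improvement: bool,
}

fn classify(e: &Experiment) -> Verdict {
    if e.configured_erratic {
        return Verdict::Ignored;
    }
    if e.delta_mean_pct.abs() <= THRESHOLD_PCT {
        return Verdict::Neutral;
    }
    let increase = e.delta_mean_pct > 0.0;
    // More CPU or memory is worse; less ingress throughput is worse.
    let regressing_direction = match e.goal {
        Goal::Cpu | Goal::Memory => increase,
        Goal::Throughput => !increase,
    };
    if regressing_direction && e.smp_is_regression {
        Verdict::Regression
    } else if !regressing_direction && e.smp_is_improvement {
        Verdict::Improvement
    } else {
        Verdict::Neutral
    }
}

fn main() {
    // The +4.28 memory row from the table stays below the threshold.
    let row = Experiment {
        goal: Goal::Memory,
        delta_mean_pct: 4.28,
        configured_erratic: false,
        smp_is_regression: false,
        smp_is_improvement: false,
    };
    println!("{:?}", classify(&row));
}
```

Runtime-detected erratic experiments are deliberately not modelled as a separate case, since the explanation says their signal still counts.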

@webern webern force-pushed the m/config-gen-datadog-deserializer branch 6 times, most recently from 239555c to b4e40b9 Compare June 19, 2026 10:58
@webern webern force-pushed the m/config-gen-datadog-deserializer branch from b4e40b9 to 5a9e285 Compare June 23, 2026 10:34
@webern webern force-pushed the m/config-gen-datadog-deserializer branch from 5a9e285 to 101a99e Compare June 24, 2026 10:03
@webern webern marked this pull request as ready for review June 24, 2026 15:30
@webern webern requested a review from a team as a code owner June 24, 2026 15:30

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2aa1e33820

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread lib/datadog-agent/config/build/datadog_config_gen.rs

@tobz tobz left a comment

Member

LGTM with one non-blocking nit.

/// Generated at build time from `core_schema.yaml` plus `schema_overlay.yaml`. Contains only keys
/// inventoried as `support: full` or `support: partial`. Mostly unused until the configuration
/// translator consumes it.
pub mod datadog_configuration;

Member

We might just want to call this module generated to really highlight that it's generated code.

@webern webern force-pushed the m/config-gen-datadog-deserializer branch from 2aa1e33 to 0b4737b Compare June 24, 2026 20:59
Labels

area/components Sources, transforms, and destinations. forwarder/datadog Datadog forwarder.

2 participants