dogstatsd: ADP silently drops metrics with invalid UTF-8 in tag block #1634

Description

@jszwedko

Summary

When a DogStatsD datagram carries invalid UTF-8 bytes in the tag block (e.g. \xff\xfe), ADP rejects the entire datagram silently while the Datadog Agent's built-in DogStatsD implementation accepts the same payload. This is a silent migration regression for any client that emits binary-tainted tag values (locale bytes, hashed identifiers, upstream encoding bugs).

Behavior

ADP: the datagram is dropped and adp__component_errors_total{component_id="dsd_in", error_type="decode", message_type="metrics"} increments by 1. The metric never appears downstream, and no log line identifies the dropped metric.

Core agent DogStatsD (ADP disabled): No error. Metric is accepted and forwarded normally.

Note: adp__component_events_dropped_total does not increment when this occurs — the drop is invisible to that counter (separate issue).

Root cause

lib/saluki-io/src/deser/codec/dogstatsd/helpers.rs — the tags() parser runs simdutf8::basic::from_utf8 over the entire tag block before any parsing:

move |input| match simdutf8::basic::from_utf8(input) {
    Ok(tags) => Ok((&[], RawTags::new(tags, max_tag_count, max_tag_len))),
    Err(_) => Err(nom::Err::Error(Error::new(input, ErrorKind::Verify))),
}

Any invalid UTF-8 byte anywhere in the tag block causes the whole datagram to be dropped.
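
To illustrate the all-or-nothing behavior, here is a minimal sketch that uses the standard library's std::str::from_utf8 as a stand-in for simdutf8::basic::from_utf8. The function names validate_tag_block and first_invalid_offset are hypothetical and do not exist in saluki-io. A single bad byte at the end of the block is enough to fail validation for every tag in it, including the valid env:prod tag:

```rust
use std::str;

/// Hypothetical stand-in for the whole-block check: validates the entire
/// tag block as UTF-8 and fails if any byte anywhere in it is invalid.
fn validate_tag_block(input: &[u8]) -> Result<&str, str::Utf8Error> {
    str::from_utf8(input)
}

/// Returns the byte offset of the first invalid sequence, or None if the
/// whole block is valid UTF-8.
fn first_invalid_offset(input: &[u8]) -> Option<usize> {
    validate_tag_block(input).err().map(|e| e.valid_up_to())
}
```

For the tag block from the reproduction in this issue, b"env:prod,tag:\xff\xfe", the first 13 bytes are valid and the check fails at offset 13, so the whole block is rejected.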

Reproduction

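# Send a gauge with invalid UTF-8 bytes (0xFF 0xFE) in the tag block.
# \x escapes need a printf that supports them (e.g. bash's builtin); POSIX sh only guarantees octal (\377\376).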
printf 'mymetric:1|g|#env:prod,tag:\xff\xfe' | nc -u -w1 localhost 8125

ADP logs:

WARN | Failed to parse frame. ... error:"encountered error 'Verify' while processing message '...'"

Core agent: metric is accepted, no error.

Expected behavior

ADP should match the core agent's behavior. The core agent's parseTags splits the raw tag bytes on commas and converts each slice to a string via Go's string([]byte) conversion, which performs no UTF-8 validation and passes invalid bytes through verbatim. There is no validity check on tag bytes anywhere in the core agent's parsing path.

ADP should accept datagrams with invalid UTF-8 tag bytes rather than dropping the entire metric. Invalid bytes should either be preserved or replaced with U+FFFD (\u{FFFD}), but the metric must not be silently discarded.
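
One possible shape for the replacement option is sketched below, assuming tags are split on commas first and each tag is then decoded leniently. parse_tags_lossy is a hypothetical name, and this is not the RawTags API. Rust's str must be valid UTF-8, so preserving the bytes verbatim, as the core agent does, would instead need a byte-based tag type.

```rust
use std::borrow::Cow;

/// Hypothetical helper: splits a raw tag block on commas and decodes each
/// tag with String::from_utf8_lossy, which replaces invalid sequences with
/// U+FFFD instead of returning an error.
fn parse_tags_lossy(tag_block: &[u8]) -> Vec<Cow<'_, str>> {
    tag_block
        .split(|&b| b == b',')
        // Skip empty segments such as those produced by a trailing comma.
        .filter(|tag| !tag.is_empty())
        // Valid tags are borrowed as-is; only invalid ones allocate.
        .map(|tag| String::from_utf8_lossy(tag))
        .collect()
}
```

With the reproduction payload, this yields env:prod unchanged and tag: followed by two U+FFFD characters, one per invalid byte, so the metric survives with a visibly marked tag value.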

Suggested mitigation

Regardless of how invalid UTF-8 is handled, add an error_subtype="invalid_utf8" label to the adp__component_errors_total increment so operators can distinguish UTF-8 issues from other decode errors when alerting.
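
A rough sketch of how the subtype could be derived and attached follows. The DecodeErrorSubtype enum, the "other" label value, and both function names are assumptions made for illustration; they are not existing ADP code, and the real metrics plumbing will look different.

```rust
/// Hypothetical classification of decode failures for labeling purposes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DecodeErrorSubtype {
    InvalidUtf8,
    Other,
}

impl DecodeErrorSubtype {
    /// Value for the proposed error_subtype label ("other" is an assumption).
    fn as_label(self) -> &'static str {
        match self {
            DecodeErrorSubtype::InvalidUtf8 => "invalid_utf8",
            DecodeErrorSubtype::Other => "other",
        }
    }
}

/// Returns Some(InvalidUtf8) when the tag block is not valid UTF-8.
fn classify_tag_block(tag_block: &[u8]) -> Option<DecodeErrorSubtype> {
    std::str::from_utf8(tag_block)
        .err()
        .map(|_| DecodeErrorSubtype::InvalidUtf8)
}

/// Label set for the adp__component_errors_total increment, with the
/// proposed error_subtype label appended to the existing three.
fn error_labels(subtype: DecodeErrorSubtype) -> [(&'static str, &'static str); 4] {
    [
        ("component_id", "dsd_in"),
        ("error_type", "decode"),
        ("message_type", "metrics"),
        ("error_subtype", subtype.as_label()),
    ]
}
```

Keeping the subtype as a small closed enum bounds the label's cardinality, which matters if operators alert on it.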

Metadata

Labels

area/components: Sources, transforms, and destinations.
effort/intermediate: Involves changes that can be worked on by non-experts but might require guidance.
source/dogstatsd: DogStatsD source.
type/bug: Bug fixes.
