Summary
When a DogStatsD datagram carries invalid UTF-8 bytes in the tag block (e.g. \xff\xfe), ADP rejects the entire datagram silently while the Datadog Agent's built-in DogStatsD implementation accepts the same payload. This is a silent migration regression for any client that emits binary-tainted tag values (locale bytes, hashed identifiers, upstream encoding bugs).
Behavior
ADP: datagram is dropped. adp__component_errors_total{component_id="dsd_in", error_type="decode", message_type="metrics"} increments by 1. The metric never appears downstream. No log line names the dropped metric.
Core agent DogStatsD (ADP disabled): No error. Metric is accepted and forwarded normally.
Note: adp__component_events_dropped_total does not increment when this occurs — the drop is invisible to that counter (separate issue).
Root cause
lib/saluki-io/src/deser/codec/dogstatsd/helpers.rs — the tags() parser runs simdutf8::basic::from_utf8 over the entire tag block before any parsing:
move |input| match simdutf8::basic::from_utf8(input) {
    Ok(tags) => Ok((&[], RawTags::new(tags, max_tag_count, max_tag_len))),
    Err(_) => Err(nom::Err::Error(Error::new(input, ErrorKind::Verify))),
}
Any invalid UTF-8 byte anywhere in the tag block causes the whole datagram to be dropped.
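To illustrate the all-or-nothing effect, the sketch below uses the standard library's std::str::from_utf8 as a stand-in for simdutf8::basic::from_utf8 (both reject a slice if any byte sequence in it is invalid). The function name validate_whole_block is hypothetical; it only mirrors the shape of the quoted check, not the real tags() parser.

```rust
/// Hypothetical stand-in for the quoted check: the whole tag block must be
/// valid UTF-8, otherwise nothing from it is kept.
fn validate_whole_block(tag_block: &[u8]) -> Option<&str> {
    // std::str::from_utf8 fails if *any* sequence in the slice is invalid,
    // matching the behavior of simdutf8::basic::from_utf8 in the quoted code.
    std::str::from_utf8(tag_block).ok()
}

fn main() {
    let tag_block: &[u8] = b"env:prod,tag:\xff\xfe";
    match validate_whole_block(tag_block) {
        Some(tags) => println!("accepted: {tags}"),
        // The perfectly valid `env:prod` tag is lost together with the bad one.
        None => println!("rejected entire tag block ({} bytes)", tag_block.len()),
    }
}
```

Because validation happens before the block is split, there is no way for the valid env:prod tag to survive.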
Reproduction
# Send a gauge with an invalid UTF-8 byte in the tag block
printf 'mymetric:1|g|#env:prod,tag:\xff\xfe' | nc -u -w1 localhost 8125
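The \xHH escape in printf is a bash/zsh feature and is not part of POSIX printf, so the command above may not emit the intended bytes under every /bin/sh. A small Rust sender that writes the exact bytes can make the reproduction shell-independent. This is a sketch assuming DogStatsD listens on UDP 127.0.0.1:8125, as in the nc command; build_datagram is a hypothetical helper name.

```rust
use std::net::UdpSocket;

/// Build the same datagram as the printf reproduction: a gauge whose tag
/// block ends in the invalid UTF-8 bytes 0xFF 0xFE.
fn build_datagram() -> Vec<u8> {
    let mut datagram = b"mymetric:1|g|#env:prod,tag:".to_vec();
    datagram.extend_from_slice(&[0xff, 0xfe]);
    datagram
}

fn main() -> std::io::Result<()> {
    // Bind an ephemeral local port and send a single datagram to DogStatsD.
    let socket = UdpSocket::bind("127.0.0.1:0")?;
    let sent = socket.send_to(&build_datagram(), "127.0.0.1:8125")?;
    println!("sent {sent} bytes");
    Ok(())
}
```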
ADP logs:
WARN | Failed to parse frame. ... error:"encountered error 'Verify' while processing message '...'"
Core agent: metric is accepted, no error.
Expected behavior
ADP should match the core agent's behavior. The core agent's parseTags splits the raw tag bytes on commas (',') and converts each slice to a string via Go's string([]byte), which performs no UTF-8 validation and passes invalid bytes through verbatim. There is no validity check on tag bytes anywhere in the core agent's parsing path.
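For comparison, the core agent's behavior can be approximated in Rust by splitting on b',' and keeping each tag as raw bytes, since a Go string may hold arbitrary bytes. This is a rough model under that assumption, not a port of parseTags; split_tags_raw is a hypothetical name.

```rust
/// Hypothetical model of the core agent's tag splitting: split on ',' and
/// keep each tag's bytes verbatim, with no UTF-8 validation.
fn split_tags_raw(tag_block: &[u8]) -> Vec<&[u8]> {
    tag_block.split(|&b| b == b',').collect()
}

fn main() {
    for tag in split_tags_raw(b"env:prod,tag:\xff\xfe") {
        // escape_ascii renders non-ASCII bytes as \xNN for display.
        println!("{}", tag.escape_ascii());
    }
}
```

The invalid bytes stay confined to the one tag that carries them; env:prod is unaffected.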
ADP should accept datagrams with invalid UTF-8 tag bytes rather than dropping the entire metric. Invalid bytes should be preserved or replaced with \u{FFFD} — but the metric must not be silently discarded.
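If downstream code needs str tags, the replacement option could look roughly like the sketch below: decode per tag instead of per block, and fall back to String::from_utf8_lossy, which substitutes U+FFFD for each invalid sequence. The name decode_tags_lossy and the Cow-based return type are illustrative assumptions and do not reflect the real RawTags API.

```rust
use std::borrow::Cow;

/// Hypothetical per-tag decoder: valid tags are borrowed unchanged, and
/// invalid bytes become U+FFFD instead of failing the whole datagram.
fn decode_tags_lossy(tag_block: &[u8]) -> Vec<Cow<'_, str>> {
    tag_block
        .split(|&b| b == b',')
        .map(String::from_utf8_lossy)
        .collect()
}

fn main() {
    for tag in decode_tags_lossy(b"env:prod,tag:\xff\xfe") {
        // Borrowed means the tag was already valid UTF-8 (no allocation).
        let kind = if matches!(tag, Cow::Borrowed(_)) { "borrowed" } else { "replaced" };
        println!("{kind}: {tag}");
    }
}
```

With this approach the common, valid case stays zero-copy, and only tags that actually contain bad bytes pay for an allocation.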
Suggested mitigation
Regardless of how invalid UTF-8 is handled, add an error_subtype="invalid_utf8" label to the adp__component_errors_total increment so operators can distinguish UTF-8 issues from other decode errors when alerting.
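A minimal sketch of how the subtype could be chosen, assuming the decoder can distinguish a UTF-8 failure from other decode errors. The enum DecodeErrorSubtype and the "other" label value are hypothetical; how ADP actually attaches labels to adp__component_errors_total is not shown here.

```rust
/// Hypothetical subtype for decode errors, exported as the proposed
/// `error_subtype` label.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DecodeErrorSubtype {
    InvalidUtf8,
    Other,
}

impl DecodeErrorSubtype {
    fn as_label(self) -> &'static str {
        match self {
            DecodeErrorSubtype::InvalidUtf8 => "invalid_utf8",
            DecodeErrorSubtype::Other => "other",
        }
    }
}

/// Classify a tag block that failed to decode: invalid_utf8 when the bytes
/// are not valid UTF-8, otherwise a generic subtype.
fn classify_tag_block_error(tag_block: &[u8]) -> DecodeErrorSubtype {
    match std::str::from_utf8(tag_block) {
        Err(_) => DecodeErrorSubtype::InvalidUtf8,
        Ok(_) => DecodeErrorSubtype::Other,
    }
}

fn main() {
    let subtype = classify_tag_block_error(b"env:prod,tag:\xff\xfe");
    // A real implementation would attach this label to the counter increment.
    println!("error_subtype=\"{}\"", subtype.as_label());
}
```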