Skip to content

Improvements to the ACR IPC protocol#1855

Draft
nathan-b wants to merge 3 commits into
mainfrom
nbaker/acr-ipc-protocol
Draft

Improvements to the ACR IPC protocol#1855
nathan-b wants to merge 3 commits into
mainfrom
nbaker/acr-ipc-protocol

Conversation

@nathan-b

Copy link
Copy Markdown

Summary

Implement improvements to the IPC protocol between ADP and ACR.

Adds new types (sketches for histograms, the new openmetrics histogram bucket, extend log type). Also tries to make the protocol more robust and build failure detection into it.

How did you test this PR?

Ran ACR against both ADP and the core agent to confirm data types are correctly sent

nathan-b added 3 commits June 10, 2026 09:39
Protocol-level messages:
 - Hello
 - HelloResp
 - ConfigData
 - CheckDataMsg
 - CheckDataAck
 - CheckResultMsg

Add Histogram and EventPlatformEvent data types to the CheckData oneof.

Extend MetricType with MonotonicCount and Historate.

Remove gRPC service definition as it is no longer needed.
Restore the AcrIpc gRPC service with four RPCs:
 - Handshake (unary)
 - SendCheckData (unary with ACK)
 - SendCheckResult (unary)
 - StreamConfig (server streaming for config push).

 Re-enable server and client codegen for the checks proto compilation.
…init_config

extend Log:
 - Add fields to the Log message to support all the metadata the agent
   and ADP support
 - The fields map 1:1 onto ADP's internal Log struct

Histogram / sketch:
 - Renamed histogram.proto to sketch.proto and replaced its contents
   with the DDSketch wire format used by the agent's SketchPayload
 - CheckData oneof slot 5 is repurposed to carry pre-aggregated sketches

Note: lib/saluki-components/src/sources/checks_ipc/mod.rs is left
broken — it still imports the old Checks service.

init_config:
 - The check API has both `config` and `init_config`, and many
   integrations rely on the init_config block for shared defaults
 - The previous CheckInstance message only carried the
   per-instance config, so an ACR-scheduled check would receive an
   empty init_config

openmetrics:
 - Add support for the new openmetrics histogram bucket type
@datadog-prod-us1-6

datadog-prod-us1-6 Bot commented Jun 11, 2026

Copy link
Copy Markdown

Pipelines

Fix all issues with BitsAI

⚠️ Warnings

🚦 3 Pipeline jobs failed

DataDog/saluki | check-docs   View in Datadog   GitLab

DataDog/saluki | check-fmt   View in Datadog   GitLab

Semantic PR Title Check | Check For Semantic PR Title   View in Datadog   GitHub Actions

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 40ce8a8 | Docs | Datadog PR Page | Give us feedback!

@nathan-b nathan-b requested a review from fabbing June 11, 2026 15:00
@dd-octo-sts dd-octo-sts Bot added area/io General I/O and networking. area/components Sources, transforms, and destinations. labels Jun 11, 2026
message SendCheckResultResponse {}

// gRPC service for ACR <-> ADP IPC.
service AcrIpc {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just call this CheckRunner, and drop the IPC part?

More generic, fits the fact that realistically anything can implement it, avoids the stuttering of "IPC" in the name when that's already implied by how it's used, etc.

Comment on lines +89 to +98
rpc Handshake(Hello) returns (HelloResp);

// Check data submission with acknowledgement.
rpc SendCheckData(CheckDataMsg) returns (CheckDataAck);

// Check result reporting.
rpc SendCheckResult(CheckResultMsg) returns (SendCheckResultResponse);

// Server pushes config updates to the client as a stream.
rpc StreamConfig(StreamConfigRequest) returns (stream ConfigData);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that this is unwritten, but: can we follow the existing naming scheme for gRPC request/response message types in the Agent ecosystem? FooRequest for requests and FooResponse for responses, such that they're always paired by prefix for the same RPC operation.

As well, don't truncate words like request, response, or message.

Comment on lines +19 to +23
// Slot 5 was previously `histogram` (an unused stub). Repurposed
// to carry pre-aggregated DDSketches; the wire bytes from any
// legacy producer of the old Histogram message would be
// mis-parsed here, so don't rely on cross-version compatibility
// for slot 5 specifically.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: comments that describe a change are a code smell of not reviewing LLM output.

We document to explain the current state of things, Git history is for everything that came before, unless there's a strong reason to couple them together.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, there are some commits that are mostly AI generated. Most of the code has been reviewed piecemeal, but I opened the PR as a draft so that I could do a higher-level review (I like the ability to mark files as reviewed because it helps me remember where I left off). Sorry for sending you the slop!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is in draft so I definitely don't see it as you asking us/me to review slop. 😅

If nothing else, this should mostly just serve as a reminder that LLMs love to do this sort of thing (code comments as a description of what they changed, rather than the thing they changed something to) and that we need to be vigilant in cleaning that up.

Comment on lines +44 to +47
// Distinguishes the two sender upcalls. False (the default)
// routes to `Sender.OpenmetricsBucket`; true routes to
// `Sender.HistogramBucket`. Python `submit_histogram_bucket`
// always lands as false.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: opaque comment that requires implicit cross-repository knowledge.

It's also... confusing at face value. The naming implies multiple buckets, but this type is, definitionally, a value representing a single bucket.

/// AcrIpc protocol version this server speaks. Bumped on
/// wire-incompatible changes; today the protocol is wire-additive so
/// we stay on 1.
const PROTOCOL_VERSION: u32 = 1;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we should probably define this as an enum in the checks proto stuff so that we have a consistent baseline of what protocol versions are valid or not.


/// Identifier surfaced in the Handshake response so a client can tell
/// which server flavor it connected to.
const SERVER_ID: &str = "saluki-checks-ipc";

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably be deriving the server ID through the information exposed in saluki_metadata (similarly for client ID on the ACR side) so that we can get things like the actual version of ADP for free.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/components Sources, transforms, and destinations. area/io General I/O and networking.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants