
fix: receive span covers broker call; add process span (#4085) #4091

Open
thomhurst wants to merge 5 commits into
BrighterCommand:master from
thomhurst:fix/4085-receive-span-broker-latency

Conversation

@thomhurst
Contributor

@thomhurst thomhurst commented Apr 26, 2026

Fixes #4085.

Problem

BrighterTracer.CreateSpan(Receive, ...) was called after Channel.Receive / ReceiveAsync returned, then ended after dispatch + ack. Net effect: the span named <topic> receive measured dispatch and ack work — not the broker call — and the messaging.client.operation.duration{operation=receive} histogram derived from it lied about broker latency.

The MessagePumpSpanOperation.Process enum value and the matching messaging.process.duration histogram had been defined for some time but were never produced anywhere in production code.

Fix

  • The receive span is now started before the broker call and enriched with message-derived tags after the call returns. Its Duration reflects only broker latency.
  • A new process span (using the existing MessagePumpSpanOperation.Process value) wraps dispatch — translation, handler invocation, and the ack decision. Feeds the previously-dead messaging.process.duration histogram.
  • Process span carries the producer's traceparent so handler spans descend from the producer trace; receive span has a local pump-span parent. Matches the OpenTelemetry messaging conventions of separate receive and process operations.
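
A minimal sketch of the ordering described in the list above, using only System.Diagnostics.Activity and assuming .NET 6 or later. PumpSpanSketch, ReceiveFromBroker and PumpOnce are hypothetical stand-ins for illustration, not Brighter APIs, and the traceparent value is a made-up example header.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class PumpSpanSketch
{
    // Hypothetical stand-in for Channel.Receive: sleeps to simulate broker
    // latency, then returns the producer's W3C traceparent header.
    public static string ReceiveFromBroker()
    {
        Thread.Sleep(20);
        return "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01";
    }

    public static (Activity Receive, Activity Process) PumpOnce(string topic)
    {
        // Start the receive span BEFORE the broker call so its Duration covers the call.
        var receive = new Activity($"{topic} receive").Start();
        string traceParent;
        try
        {
            traceParent = ReceiveFromBroker();
            // Message-derived tags can only be added once the call has returned.
            receive.SetTag("messaging.operation.type", "receive");
        }
        finally
        {
            receive.Stop(); // ended before dispatch, so dispatch time is excluded
        }

        // The process span is parented on the producer's trace context,
        // so handler spans descend from the producer trace.
        var process = new Activity($"{topic} process");
        if (ActivityContext.TryParse(traceParent, null, out var producer))
        {
            process.SetParentId(producer.TraceId, producer.SpanId, producer.TraceFlags);
        }
        process.Start();
        try
        {
            process.SetTag("messaging.operation.type", "process");
            Thread.Sleep(5); // stand-in for translation, handler invocation and ack
        }
        finally
        {
            process.Stop();
        }

        return (receive, process);
    }
}
```

In this sketch the receive span's Duration is roughly the simulated broker latency, while the process span shares the producer's trace id and has the producer span as its parent.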

Public API

Two new members on IAmABrighterTracer:

  • CreateReceiveSpan(RoutingKey, MessagingSystem, InstrumentationOptions) — starts the receive span before the broker call.
  • EnrichReceiveSpan(Activity?, Message, InstrumentationOptions) — adds message-derived tags and propagates producer tracestate/baggage after the broker call returns.

⚠️ Interface break

IAmABrighterTracer is no longer source-compatible: external implementations must add the two new members. Per review discussion the risk is acceptable — the interface is not expected to have external implementers (none exist in this repo), and the cleaner pattern (separation of concerns: span work belongs with the tracer, not the pump) outweighs preserving binary compatibility for an unlikely consumer. Release notes should call this out.

The earlier extension-method approach (BrighterTracerReceiveExtensions) is gone. Reasons interface members were chosen over extensions:

  • Separation of concerns — addresses original review feedback that span helpers belong on BrighterTracer, not the pump or a detached extensions class.
  • Symmetry with every other span-creating method on the interface (CreateSpan, CreateProducerSpan, CreateMessagePumpSpan, etc.).
  • Single TimeProvider source — the tracer's _timeProvider drives both span start and EndSpan, fixing an inconsistency where the extension took the pump's TimeProvider for start time but EndSpan used the tracer's for end time.
  • Drops the redundant TimeProvider? parameter — resolves the CodeScene "Excess Number of Function Arguments" advisory (5→3 args).
  • Mockable by test doubles; static extensions are not.
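
The single-TimeProvider point can be illustrated with a small sketch, assuming .NET 8 or later for System.TimeProvider. ManualTimeProvider and SketchTracer are hypothetical names invented here; the real BrighterTracer members may look different. The idea is only that one clock drives both the span start and the span end.

```csharp
using System;
using System.Diagnostics;

// Hypothetical test clock: time only moves when Advance is called.
public sealed class ManualTimeProvider : TimeProvider
{
    private DateTimeOffset _now;
    public ManualTimeProvider(DateTimeOffset start) => _now = start;
    public override DateTimeOffset GetUtcNow() => _now;
    public void Advance(TimeSpan delta) => _now += delta;
}

// Hypothetical tracer: both start and end timestamps come from _timeProvider.
public sealed class SketchTracer
{
    private readonly TimeProvider _timeProvider;
    public SketchTracer(TimeProvider timeProvider) => _timeProvider = timeProvider;

    public Activity CreateReceiveSpan(string topic) =>
        new Activity($"{topic} receive")
            .SetStartTime(_timeProvider.GetUtcNow().UtcDateTime)
            .Start();

    public void EndSpan(Activity span)
    {
        // Setting the end time explicitly fixes Duration before Stop is called.
        span.SetEndTime(_timeProvider.GetUtcNow().UtcDateTime);
        span.Stop();
    }
}
```

With a test clock injected into the tracer, advancing the clock by 250 ms between CreateReceiveSpan and EndSpan yields a Duration of exactly 250 ms, which would not hold if start and end were read from two different clocks.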

Pump call sites use the null-conditional operator (Tracer?.CreateReceiveSpan(...) / Tracer?.EnrichReceiveSpan(...)) since Tracer is IAmABrighterTracer?.

Behaviour preservation

Walked every exit path of the receive section to confirm no regressions vs the pre-fix code:

| Path | Receive span ended? | Status set? | Tags preserved? |
|---|---|---|---|
| BrokenCircuitException | ✓ via finally | Error + exception attached | tags from before broker call |
| ChannelFailureException | ✓ via finally | Error + exception attached | tags from before broker call |
| Generic Exception | ✓ via finally | Error + exception attached | tags from before broker call |
| message is null (throws) | ✓ via finally | Error + description | from broker call |
| MT_NONE (empty) | ✓ via finally | Ok (default via EndSpan) | enriched |
| MT_UNACCEPTABLE | ✓ via finally | Error + parse-failure description | enriched, incl. MessageHeaders and MessageBody for diagnostic |
| MT_QUIT | ✓ via finally | Ok (default via EndSpan) | enriched |
| Serviceable | ✓ via finally, before dispatch | Ok | enriched |
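
The exit paths listed above share one shape: the span is ended in a finally block, exceptions mark it as Error, and a clean return leaves it Ok. A rough sketch of that shape follows, assuming .NET 6 or later. ReceiveStatusSketch is a hypothetical helper, the "exception" event and its tag name are illustrative choices, and unlike the real pump this sketch swallows the exception so the caller can inspect the span.

```csharp
#nullable enable
using System;
using System.Diagnostics;

public static class ReceiveStatusSketch
{
    // `receive` is a hypothetical stand-in for the broker call.
    public static Activity Receive(Func<string?> receive)
    {
        var span = new Activity("orders receive").Start();
        try
        {
            var body = receive()
                ?? throw new InvalidOperationException("Could not receive message; null returned");
            // Enrichment happens only when a message actually arrived.
            span.SetTag("messaging.message.body.size", body.Length);
        }
        catch (Exception ex)
        {
            // Error status plus the exception recorded on the span as an event.
            span.SetStatus(ActivityStatusCode.Error, ex.Message);
            span.AddEvent(new ActivityEvent("exception",
                tags: new ActivityTagsCollection { ["exception.message"] = ex.Message }));
        }
        finally
        {
            // Default to Ok when nothing set an error, then always end the span.
            if (span.Status == ActivityStatusCode.Unset)
            {
                span.SetStatus(ActivityStatusCode.Ok);
            }
            span.Stop();
        }
        return span;
    }
}
```

Because the Stop call sits in finally, every path (success, null message, broker exception) leaves the span ended, which is the property the table checks row by row.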

Receive span continues to carry all the message tags it carried pre-fix (MessageHeaders, MessageBody, all CloudEvents fields, etc.) plus producer traceState and Baggage (with correlationId) — verified necessary for the MT_UNACCEPTABLE path where there is no process span and operators need full diagnostic context.

Acceptance (per #4085)

  • <topic> receive span's Duration covers the broker call
  • messaging.client.operation.duration{operation=receive} reflects broker latency
  • <topic> process span created around dispatch; messaging.process.duration records non-zero values for handled messages
  • Existing observability tests pass (tests/Paramore.Brighter.Core.Tests/Observability/MessageDispatch/*)

Tests

  • Updated three existing tests in Observability/MessageDispatch/:
    • Span counts adjusted for the new process span (6→7 dispatch, 7→8 channel failure cases).
    • ParentId assertion moved from receive span to process span (process span inherits the producer traceparent; receive span has a local parent).
    • Both spans now asserted to exist with their respective messaging.operation.type tag.
  • New test When_A_Message_Is_Processed_It_Should_Have_A_Process_Span covers the acceptance criteria explicitly.

Test results on net10.0: 62 passed (2 pre-existing skips) for Observability filter; 85 passed for MessageDispatch filter.

Follow-ups (separate issues)

Surfaced during review of this fix, filed separately to keep this PR focused:

  • #4089, "Perf: message header serialized twice per message in pump observability hot path": JsonSerializer.Serialize(message.Header, ...) runs twice per serviceable message (once for the receive span, once for the process span). This is an acceptable trade-off here for diagnostic completeness on MT_UNACCEPTABLE; the issue tracks recovering throughput with conditional enrichment based on MessageType.
  • #4090, "Pump observability: pumpSpan and processSpan can leak on exception paths": pumpSpan and processSpan can leak (started but never ended) on certain throw paths because their lifetimes aren't bounded by try/finally. This is pre-existing in master; the receive span now does the right thing, and the same pattern should be applied to the other two spans.
  • Per review on this PR: split BrighterTracer by concern (Producer / Consumer / Inbox / Outbox) once more receive-side helpers accumulate.


@thomhurst thomhurst force-pushed the fix/4085-receive-span-broker-latency branch from 0f6a2e3 to 2cbb18a on April 26, 2026 14:11

@iancooper
Member

I checked https://opentelemetry.io/docs/specs/semconv/messaging/messaging-spans, and this seems correct. The receive span should time the call to the broker, and the process span should time our own processing accordingly.

So this looks correct

Member

@iancooper iancooper left a comment


Can you push those tracer methods to BrighterTracer, not the Pump? We are mixing concerns. If we think that BrighterTracer is too large, let's raise a separate issue and split it by Producer/Consumer/Inbox/Outbox or similar convention.

}
}

// Pump-internal helpers for the receive span. Live here (not on BrighterTracer) to avoid growing the public API.
Member

I would disagree with that on the basis of the separation of concerns. These would be better on BrighterTracer. If BrighterTracer grows unwieldy, the answer is to break that up by concern, not to locate the helper in the Pump

Contributor Author

@iancooper In order to use the new methods on the interface, we'd either need to add those new methods to the interface, which would be breaking if anyone else implements it, or add them as extension methods.

I think the latter would be less disruptive?

Member

I doubt anyone else implements it, so I think the risk is acceptable, provided we add to the release notes.

@iancooper iancooper added the 2 - In Progress, Bug, .NET (Pull requests that update .net code), and V10.X labels Apr 26, 2026
thomhurst added a commit to thomhurst/Brighter that referenced this pull request Apr 26, 2026
…terTracer

Per review feedback (BrighterCommand#4091): tracer span work belongs with the tracer, not
the pump. Helpers reference only ActivitySource, BrighterSemanticConventions
and baggage propagation — pure tracer concerns.

Implemented as extension methods rather than interface members so existing
external IAmABrighterTracer implementations continue to compile (no source
break). Pump's PumpTimeProvider is passed through as an optional argument so
test-injected time still controls the receive span start time.

}

var activity = tracer.ActivitySource.StartActivity(
name: $"{topic} {operation.ToSpanName()}",
Contributor

Maybe we should use the subscription name, not the topic

Suggested change
- name: $"{topic} {operation.ToSpanName()}",
+ name: $"{subscriptionName} {operation.ToSpanName()}",


@iancooper
Member

@thomhurst Sorry, my Span fix has made this stale, can you fix (warnings as errors issue):

"Error: /home/runner/work/Brighter/Brighter/src/Paramore.Brighter/Observability/BrighterTracerReceiveExtensions.cs(118,70): error CS0618: 'MessageBody.Bytes' is obsolete: 'Use Memory for zero-copy access. This property allocates on every call.' [/home/runner/work/Brighter/Brighter/src/Paramore.Brighter/Paramore.Brighter.csproj::TargetFramework=netstandard2.0]"


thomhurst added 5 commits May 12, 2026 21:04
…righterCommand#4085)

`BrighterTracer.CreateSpan(Receive, ...)` was called AFTER `Channel.Receive` /
`ReceiveAsync` returned, so the receive span's Duration excluded broker latency.
The `messaging.client.operation.duration{operation=receive}` histogram derived
from it was consequently mis-bucketed.

Receive span is now started before the broker call and enriched with
message-derived tags after it returns, so its Duration reflects only broker
time. Dispatch is wrapped in a new "process" span (the
`MessagePumpSpanOperation.Process` enum was already defined and metered, just
never produced) — matches the OpenTelemetry messaging conventions of separate
`receive` and `process` operations and revives the previously-dead
`messaging.process.duration` histogram.

Implementation lives in two `internal static` helpers on `MessagePump` so no
public API is added (no changes to `IAmABrighterTracer` or `BrighterTracer`).

Producer `traceState` and `Baggage` (with correlationId injection) are propagated
onto the receive span — preserves consumer-side trace context for
`MT_UNACCEPTABLE` messages where there is no process span.
…ed instance

Addresses CodeScene advisory (CreateReceiveSpan exceeded 4-arg max).
Helpers now pull Tracer / InstrumentationOptions / PumpTimeProvider from
existing pump fields, leaving 2 args at the call sites.

`private protected` keeps them invisible to external subclassers — same
visibility guarantee as the previous `internal static`.
…terTracer

Per review feedback (BrighterCommand#4091): tracer span work belongs with the tracer, not
the pump. Helpers reference only ActivitySource, BrighterSemanticConventions
and baggage propagation — pure tracer concerns.

Implemented as extension methods rather than interface members so existing
external IAmABrighterTracer implementations continue to compile (no source
break). Pump's PumpTimeProvider is passed through as an optional argument so
test-injected time still controls the receive span start time.

Bytes was marked obsolete because it allocates on every call; Memory
exposes Length without the copy.

Replaces the static BrighterTracerReceiveExtensions with instance
members on IAmABrighterTracer (CreateReceiveSpan, EnrichReceiveSpan)
implemented on BrighterTracer. Pump call sites use null-conditional
invocation.

BREAKING: external implementations of IAmABrighterTracer must add the
two new members. Per review feedback the risk is accepted (no external
implementers known) in exchange for separation of concerns, symmetry
with other span methods, a single TimeProvider source for span
start/end, and mockability. Drops a redundant TimeProvider parameter,
also resolving the CodeScene "Excess Number of Function Arguments"
advisory.
@thomhurst thomhurst force-pushed the fix/4085-receive-span-broker-latency branch from ba7537e to 0f2c0eb on May 12, 2026 20:04
@thomhurst
Contributor Author

@iancooper should be fixed. Updated the tracer interface. Technically a breaking change, but like you say, probably no consumers

@codescene-delta-analysis codescene-delta-analysis Bot left a comment

Gates Passed
4 Quality Gates Passed

Quality Gate Profile: Clean Code Collective


Labels

3 - Done, Bug, .NET (Pull requests that update .net code), V10.X


Development

Successfully merging this pull request may close these issues.

Receive span starts after Channel.Receive — duration excludes broker call

3 participants