fix: receive span covers broker call; add process span (#4085)#4091
fix: receive span covers broker call; add process span (#4085)#4091thomhurst wants to merge 5 commits into
Conversation
0f6a2e3 to
2cbb18a
Compare
|
I checked https://opentelemetry.io/docs/specs/semconv/messaging/messaging-spans, and this seems correct. The receive span should check the time for the call to the broker and process our time accordingly. So this looks correct |
iancooper
left a comment
There was a problem hiding this comment.
Can you push those tracer methods to BrighterTracer, not the Pump? We are mixing concerns. If we think that BrighterTracer is too large, let's raise a separate issue and split it by Producer/Consumer/Inbox/Outbox or similar convention.
| } | ||
| } | ||
|
|
||
| // Pump-internal helpers for the receive span. Live here (not on BrighterTracer) to avoid growing the public API. |
There was a problem hiding this comment.
I would disagree with that on the basis of the separation of concerns. These would be better on BrighterTracer. If BrighterTracer grows unwieldy, the answer is to break that up by concern, not to locate the helper in the Pump
There was a problem hiding this comment.
@iancooper In order to use the new methods on the interface, we'd either need to add those new methods to the interface, which would be breaking if anyone else implements it, or add them as extension methods.
I think the latter would be less disruptive?
There was a problem hiding this comment.
I doubt anyone else implements it, so I think the risk is acceptable, provided we add to the release notes.
…terTracer Per review feedback (BrighterCommand#4091): tracer span work belongs with the tracer, not the pump. Helpers reference only ActivitySource, BrighterSemanticConventions and baggage propagation — pure tracer concerns. Implemented as extension methods rather than interface members so existing external IAmABrighterTracer implementations continue to compile (no source break). Pump's PumpTimeProvider is passed through as an optional argument so test-injected time still controls the receive span start time.
| } | ||
|
|
||
| var activity = tracer.ActivitySource.StartActivity( | ||
| name: $"{topic} {operation.ToSpanName()}", |
There was a problem hiding this comment.
Maybe we should use the subscription name, not the topic
| name: $"{topic} {operation.ToSpanName()}", | |
| name: $"{subscriptionName} {operation.ToSpanName()}", |
|
@thomhurst Sorry, my Span fix has made this stale, can you fix (warnings as errors issue): "Error: /home/runner/work/Brighter/Brighter/src/Paramore.Brighter/Observability/BrighterTracerReceiveExtensions.cs(118,70): error CS0618: 'MessageBody.Bytes' is obsolete: 'Use Memory for zero-copy access. This property allocates on every call.' [/home/runner/work/Brighter/Brighter/src/Paramore.Brighter/Paramore.Brighter.csproj::TargetFramework=netstandard2.0]" |
…righterCommand#4085) `BrighterTracer.CreateSpan(Receive, ...)` was called AFTER `Channel.Receive` / `ReceiveAsync` returned, so the receive span's Duration excluded broker latency. The `messaging.client.operation.duration{operation=receive}` histogram derived from it was consequently mis-bucketed. Receive span is now started before the broker call and enriched with message-derived tags after it returns, so its Duration reflects only broker time. Dispatch is wrapped in a new "process" span (the `MessagePumpSpanOperation.Process` enum was already defined and metered, just never produced) — matches the OpenTelemetry messaging conventions of separate `receive` and `process` operations and revives the previously-dead `messaging.process.duration` histogram. Implementation lives in two `internal static` helpers on `MessagePump` so no public API is added (no changes to `IAmABrighterTracer` or `BrighterTracer`). Producer `traceState` and `Baggage` (with correlationId injection) are propagated onto the receive span — preserves consumer-side trace context for `MT_UNACCEPTABLE` messages where there is no process span.
…ed instance Addresses CodeScene advisory (CreateReceiveSpan exceeded 4-arg max). Helpers now pull Tracer / InstrumentationOptions / PumpTimeProvider from existing pump fields, leaving 2 args at the call sites. `private protected` keeps them invisible to external subclassers — same visibility guarantee as the previous `internal static`.
…terTracer Per review feedback (BrighterCommand#4091): tracer span work belongs with the tracer, not the pump. Helpers reference only ActivitySource, BrighterSemanticConventions and baggage propagation — pure tracer concerns. Implemented as extension methods rather than interface members so existing external IAmABrighterTracer implementations continue to compile (no source break). Pump's PumpTimeProvider is passed through as an optional argument so test-injected time still controls the receive span start time.
Bytes was marked obsolete because it allocates on every call; Memory exposes Length without the copy.
Replaces the static BrighterTracerReceiveExtensions with instance members on IAmABrighterTracer (CreateReceiveSpan, EnrichReceiveSpan) implemented on BrighterTracer. Pump call sites use null-conditional invocation. BREAKING: external implementations of IAmABrighterTracer must add the two new members. Per review feedback the risk is accepted (no external implementers known) in exchange for separation of concerns, symmetry with other span methods, a single TimeProvider source for span start/end, and mockability. Drops a redundant TimeProvider parameter, also resolving the CodeScene "Excess Number of Function Arguments" advisory.
ba7537e to
0f2c0eb
Compare
|
@iancooper should be fixed. Updated the tracer interface. Technically a breaking change, but like you say, probably no consumers |
There was a problem hiding this comment.
Gates Passed
4 Quality Gates Passed
See analysis details in CodeScene
Quality Gate Profile: Clean Code Collective
Install CodeScene MCP: safeguard and uplift AI-generated code. Catch issues early with our IDE extension and CLI tool.
Fixes #4085.
Problem
BrighterTracer.CreateSpan(Receive, ...)was called afterChannel.Receive/ReceiveAsyncreturned, then ended after dispatch + ack. Net effect: the span named<topic> receivemeasured dispatch and ack work — not the broker call — and themessaging.client.operation.duration{operation=receive}histogram derived from it lied about broker latency.The
MessagePumpSpanOperation.Processenum value and the matchingmessaging.process.durationhistogram had been defined for some time but were never produced anywhere in production code.Fix
Durationreflects only broker latency.MessagePumpSpanOperation.Processvalue) wraps dispatch — translation, handler invocation, and the ack decision. Feeds the previously-deadmessaging.process.durationhistogram.traceparentso handler spans descend from the producer trace; receive span has a local pump-span parent. Matches the OpenTelemetry messaging conventions of separatereceiveandprocessoperations.Public API
Two new members on
IAmABrighterTracer:CreateReceiveSpan(RoutingKey, MessagingSystem, InstrumentationOptions)— starts the receive span before the broker call.EnrichReceiveSpan(Activity?, Message, InstrumentationOptions)— adds message-derived tags and propagates producer tracestate/baggage after the broker call returns.IAmABrighterTraceris no longer source-compatible: external implementations must add the two new members. Per review discussion the risk is acceptable — the interface is not expected to have external implementers (none exist in this repo), and the cleaner pattern (separation of concerns: span work belongs with the tracer, not the pump) outweighs preserving binary compatibility for an unlikely consumer. Release notes should call this out.The earlier extension-method approach (
BrighterTracerReceiveExtensions) is gone. Reasons interface members were chosen over extensions:BrighterTracer, not the pump or a detached extensions class.CreateSpan,CreateProducerSpan,CreateMessagePumpSpan, etc.).TimeProvidersource — the tracer's_timeProviderdrives both span start andEndSpan, fixing an inconsistency where the extension took the pump'sTimeProviderfor start time butEndSpanused the tracer's for end time.TimeProvider?parameter — resolves the CodeScene "Excess Number of Function Arguments" advisory (5→3 args).Pump call sites use the null-conditional operator (
Tracer?.CreateReceiveSpan(...)/Tracer?.EnrichReceiveSpan(...)) sinceTracerisIAmABrighterTracer?.Behaviour preservation
Walked every exit path of the receive section to confirm no regressions vs the pre-fix code:
BrokenCircuitExceptionfinallyError+ exception attachedChannelFailureExceptionfinallyError+ exception attachedExceptionfinallyError+ exception attachedmessage is null(throws)finallyError+ descriptionMT_NONE(empty)finallyOk(default viaEndSpan)MT_UNACCEPTABLEfinallyError+ parse-failure descriptionMessageHeadersandMessageBodyfor diagnosticMT_QUITfinallyOk(default viaEndSpan)finally, before dispatchOkReceive span continues to carry all the message tags it carried pre-fix (
MessageHeaders,MessageBody, all CloudEvents fields, etc.) plus producertraceStateandBaggage(withcorrelationId) — verified necessary for theMT_UNACCEPTABLEpath where there is no process span and operators need full diagnostic context.Acceptance (per #4085)
<topic> receivespan'sDurationcovers the broker callmessaging.client.operation.duration{operation=receive}reflects broker latency<topic> processspan created around dispatch;messaging.process.durationrecords non-zero values for handled messagestests/Paramore.Brighter.Core.Tests/Observability/MessageDispatch/*)Tests
Observability/MessageDispatch/:ParentIdassertion moved from receive span to process span (process span inherits the producertraceparent; receive span has a local parent).messaging.operation.typetag.When_A_Message_Is_Processed_It_Should_Have_A_Process_Spancovers the acceptance criteria explicitly.Test results on net10.0: 62 passed (2 pre-existing skips) for Observability filter; 85 passed for MessageDispatch filter.
Follow-ups (separate issues)
Surfaced during review of this fix, filed separately to keep this PR focused:
JsonSerializer.Serialize(message.Header, ...)runs twice per serviceable message (once for receive span, once for process span). Acceptable trade-off here for diagnostic completeness onMT_UNACCEPTABLE; the issue tracks recovering throughput with conditional enrichment based onMessageType.pumpSpanandprocessSpancan leak (started but never ended) on certain throw paths because their lifetimes aren't bounded bytry/finally. Pre-existing in master; receive span now does the right thing and the same pattern should be applied to the other two spans.BrighterTracerby concern (Producer / Consumer / Inbox / Outbox) once more receive-side helpers accumulate.