Skip to content

feat: add structuredErrorHandler middleware alongside errorMiddleware#117

Open
bheddens wants to merge 5 commits into
masterfrom
feat/structured-error-handler
Open

feat: add structuredErrorHandler middleware alongside errorMiddleware#117
bheddens wants to merge 5 commits into
masterfrom
feat/structured-error-handler

Conversation

@bheddens

@bheddens bheddens commented May 19, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds a new structuredErrorHandler export to src/sdk/middleware.ts that pairs with the existing errorMiddleware. Consumers can opt-in for structured logging without touching their current errorMiddleware(...) wiring.

This is the response to review feedback on screener-integration-service#87 — lift the fix into the shared SDK so every integration benefits with one change.

Why

Today's errorMiddleware(onError) is generic: it sets a 500 response and invokes a side-effect callback with the error. Several Envoy integration services pass console.log as the callback, which stringifies via error.toString() and produces bare INFO log lines in Datadog with:

  • no stack trace
  • no operation tag
  • no integration name / install id / company id context
  • no severity (INFO instead of ERROR)

Errors like "Error: Could not initialize plugin client." end up unsearchable, undebuggable.

API

import { structuredErrorHandler, StructuredErrorLogger } from '@envoy/envoy-integrations-sdk';

// Default: read req.logger (if upstream middleware attached one), else console.error.
app.use(structuredErrorHandler());

// Override: take full control of how the error is logged.
app.use(structuredErrorHandler((err, req) => {
  myLogger.error('Plugin pipeline error', err, { reqId: (req as any).id });
}));

The exported StructuredErrorLogger interface (just error(message, error, metadata?)) lets consumers type their req.logger against the contract the SDK consumes.

Why a new export instead of changing errorMiddleware

Per PR review feedback: a new name means existing consumers that switch over without first wiring req.logger upstream get an observable console.error(err) fallback instead of silent data loss. They can adopt at their own pace.

Default behavior

req.logger state Behavior
req.logger.error(...) is a function Calls it with (message, err, { operation, httpMethod, httpUrl })
req.logger missing or malformed console.error(err) so the raw error is at least surfaced
req.logger.error throws Caught; logged as console.error('structuredErrorHandler: req.logger.error threw', logErr) so a buggy logger never masks the original error

The 500 / application/json / { message } response shape is identical to errorMiddleware.

Test plan

  • tsc --noEmit clean
  • eslint clean (test file uses /* eslint-disable no-console */ around the spy block, matching the pattern in test/util/userAgent.test.ts)
  • All 80 SDK tests pass (72 existing + 8 new)
  • Once published: bump SDK in screener-integration-service and replace the inline implementation from Fix deprecated actions/cache v2 in package workflow #87 with the SDK export.

New test coverage

  • Logger-present happy path emits (message, err, metadata) with operation, httpMethod, httpUrl
  • Falls back to console.error when req.logger is missing or lacks an error method
  • Catches and surfaces exceptions thrown by req.logger.error without losing the original error
  • onError override receives (err, req) and bypasses the default logger lookup
  • Catches and surfaces exceptions thrown by the onError callback
  • 500 / JSON response when headers haven't been sent
  • Delegates to next(err) when headers have already been sent

Downstream

Once this merges and publishes, screener-integration-service#87 will be reworked to consume structuredErrorHandler from the SDK instead of duplicating the implementation locally. Same opportunity for pacs, presence, emno, and any other integration services that currently use errorMiddleware(console.log).


Architecture note — AsyncLocalStorage as a follow-up

This PR sticks with the established Envoy pattern of mutating req.logger across the middleware chain. That choice is deliberate — it matches screener-integration-service, pacs-integration-service, and every other Envoy service today, so there's no per-service migration cost beyond swapping errorMiddleware(console.log)structuredErrorMiddleware() and wiring loggerFactory / envoyLoggerContextMiddleware.

However, the modern Node (16+) best practice for request-scoped context is AsyncLocalStorage from async_hooks, and it's worth flagging that as a follow-up rather than letting this PR enshrine the req.logger pattern indefinitely.

What AsyncLocalStorage would change

import { AsyncLocalStorage } from 'async_hooks';

const loggerStore = new AsyncLocalStorage<StructuredLogger>();

export function getRequestLogger(): StructuredLogger | undefined {
  return loggerStore.getStore();
}

// One middleware does the whole job:
export function envoyMiddleware(options: { logger: StructuredLogger }): RequestHandler {
  return (req, res, next) => {
    // ...body parse + envoy init as today...
    const requestLogger = options.logger.child({
      requestId: makeId(),
      ...extractLoggerContext(req as EnvoyRequest),
    });
    req.logger = requestLogger;
    loggerStore.run(requestLogger, () => next());  // ← key line
  };
}

Then anywhere in the call stack — services, helpers, vendor clients — getRequestLogger().error(...) returns the correct request's logger with no req plumbing.

Advantages

  • No req plumbing. Services 4 layers deep into vendor clients can log with full context without each function signature growing a logger parameter. Today this is the single biggest source of context-less logs in our codebases.
  • Collapses the API surface. loggerFactory + envoyLoggerContextMiddleware + addRequestLoggerContext + the if-present-child-else-construct dance all reduce to one middleware and one getRequestLogger() helper.
  • Stronger isolation than a registry. Context binds to the async resource chain (Promise → Promise → timer → I/O callback), not to a Map<requestId, ctx>. There's no code path that could look up the wrong context for a concurrent request.
  • Automatic cleanup. When a request's async chain unwinds, the stored value becomes unreachable and is GC'd. No delete map[id], no "request finished" hook, no per-request teardown to forget.
  • Industry-standard. dd-trace (already in production here), OpenTelemetry, pino-http, NestJS's nestjs-cls, and Fastify's request.log all use ALS internally. It's well-trodden ground.

Disadvantages

  • Modest runtime overhead. Node 16 added ~2–3% overhead per async op; Node 20 brought that down further with V8-level hooks. Not a concern at our scale, but non-zero.
  • Leak vectors require discipline. Three real pitfalls — long-lived timers/intervals that outlive the request, module-level variables that capture the stored value, and unresolved Promises — all hold the context alive past request end. Same as any closure leak, but worth code-reviewing for.
  • Harder to reason about in stack traces. "Where did this logger come from?" has no syntactic answer with ALS — you have to know the chain. req.logger is grep-able; getRequestLogger() requires understanding what als.run() wrapped the handler in.
  • Test fixtures change. Today's tests stub req.logger. With ALS, tests have to wrap the unit under test in loggerStore.run(stubLogger, () => unitUnderTest()), or expose a test helper that does. Minor but pervasive.
  • Migration cost is real. Every existing service that reads req.logger directly keeps working (we'd keep req.logger set as a compatibility surface), but code that wants to benefit from ALS — i.e. deep helpers that today take a logger param — has to change to call getRequestLogger(). This is an opt-in, per-call-site migration.

Recommendation

Land this PR as the structured-error fix for today, and open a separate RFC issue for the ALS migration as a follow-up. Combining both in one PR would balloon scope and risk.

If we do move to ALS later, the StructuredLogger interface and EnvoyLoggerContext types from this PR carry over unchanged — the change is purely how req.logger gets propagated to deep code, not what it looks like.

Existing errorMiddleware accepts an onError(err) callback for side-effect
logging. When consumers pass console.log (as several integration services
do), errors are stringified via error.toString() and emitted as bare INFO
log lines with no attributes — no stack trace, no integration context,
nothing actionable in Datadog.

Adds a new structuredErrorHandler export that preserves the same response
shape (500 / application/json / { message }) but emits errors structurally:

  - Default: looks for a StructuredErrorLogger at req.logger (attached by
    upstream middleware such as plugin/integration setup) and calls
    req.logger.error(message, err, { operation, httpMethod, httpUrl }).
    Falls back to console.error(err) when no req.logger is present, so
    errors are never silently lost.
  - Override: optional onError(err, req) callback for full control. The
    callback receives both the error and the request so consumers can
    extract request-scoped context however they like.

Named separately from errorMiddleware so existing consumers that swap
without first wiring req.logger upstream get an observable console.error
fallback instead of silent data loss, per code-review feedback on
screener-integration-service#87.

Exports the StructuredErrorLogger interface so consumers can type their
request logger against the contract the SDK consumes.

Includes 8 unit tests covering: logger-present happy path, missing
req.logger fallback, malformed req.logger fallback, logger throw safety,
onError override, onError throw safety, 500 response handling, and
headers-already-sent delegation.

@JustWalters JustWalters left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to update package.json and package-lock.json

Comment thread src/sdk/middleware.ts Outdated
Comment on lines +162 to +164
// Don't let a logging callback failure swallow the original error.
// eslint-disable-next-line no-console
console.error('structuredErrorHandler: onError callback threw', cbErr);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're logging the error onError threw but the original error might be swallowed if onError throws before logging it. The comment suggests we'd log err.

Comment thread src/sdk/middleware.ts Outdated
if (requestLogger && typeof requestLogger.error === 'function') {
try {
requestLogger.error('Unhandled error in request pipeline', err, {
operation: 'structuredErrorHandler',

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this operation represent? I'd expect it to refer to something about the failed request. I could see 'structuredErrorHandler' being a useful value if operation is standard metadata we log, but I don't remember seeing it elsewhere.

} as Request;
}

function buildRes(headersSent = false): MockResponse {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the places we're using the result of this function, we're asserting it as Response. What do you think of asserting the return type as Response and only asserting the type where we need to treat it as a MockResponse. Or maybe we can use one of the jest utility types like jest.Mocked<> so the same type works in both contexts.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your first idea better, but I get the feeling jest.Mocked is probably the easier route. I'm open to either.

Zmatarasso and others added 4 commits May 19, 2026 15:56
Tightens the structured-error pipeline so consumers can rely on a typed
contract instead of inline casts:

- New StructuredLogger module exporting the minimum interface the SDK
  needs (`error(message, err, metadata?)`) plus an `isRequestWithLogger`
  type guard. structuredErrorMiddleware now narrows req.logger through
  the guard instead of an unchecked cast + ad-hoc typeof check.
- EnvoyBaseRequest declares `logger?: StructuredLogger`, so downstream
  handlers see it as typed without each consumer redeclaring the shape.
- envoyMiddleware accepts `loggerFactory?: (req, ctx) => StructuredLogger`
  and attaches the result to req.logger synchronously at the top of the
  handler, before body parsing and EnvoyPluginAPI.loginAsPlugin(). The
  ctx surfaces `integrationName` parsed from `?name=` — the dispatch
  convention used by Envoy monolith integration services — so factories
  can build a child logger pre-tagged with it. A throwing factory is
  caught and surfaced via console.error rather than breaking the
  request.
- Renamed structuredErrorHandler -> structuredErrorMiddleware to match
  the existing `errorMiddleware` naming convention in the same file.

Tests: 91 passing (added cases for the type guard, the
integrationName context wiring, factory-throw recovery, and the
null-logger fallback).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…actLoggingContext

Pulls every standard Envoy logging facet onto EnvoyLoggerContext so a
service can swap its bespoke pluginMiddleware logger setup for the SDK's
loggerFactory without losing any Datadog facet it already indexes on:

- integrationName (from req.query.name — Envoy monolith services convention)
- companyId, locationId, installId (from req.envoy.meta)
- recordId, recordType (from req.envoy.payload — event requests only)
- event, category (from req.envoy.meta.event + mirrored category mapping)

The loggerFactory now runs once per request, after req.envoy is
initialized, so the factory receives the full context in a single call.
Errors that fire before req.envoy is set up (body parser, plugin login)
fall through to console.error in structuredErrorMiddleware — same
trade-off the current errorMiddleware(console.log) pattern makes.

extractLoggerContext is exported so a downstream middleware can rebuild
the same context when replacing req.logger with a more enriched child
logger. The extractor is defensive against EnvoyPluginSDK's
meta/payload getters throwing on unverified requests — a bad signature
should not also cost us log context.

isRequestWithLogger reshaped from a multi-line && chain to an
early-return shape so prettier and eslint's operator-linebreak rule
agree on layout.

Tests: 95 passing (added direct unit tests for extractLoggerContext
across event/route/unknown-event/missing-envoy shapes, plus an
end-to-end test that drives envoyMiddleware with EnvoyPluginAPI.loginAsPlugin
stubbed and body-parser short-circuited via req._body + req[VERIFIED]).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The post-init-only design lost the original goal of replacing
errorMiddleware(console.log): errors that fire before req.envoy is set up
(plugin login, EnvoyPluginSDK construction) had no req.logger and fell
through to console.error, the same bare-string outcome we're trying to
get rid of.

Switch to a two-call pattern:

1. **Early pass** runs synchronously at the top of envoyMiddleware,
   before body parsing. integrationName from req.query.name is the only
   field available pre-parse, but that's enough to guarantee req.logger
   exists for any subsequent error (body-parser, loginAsPlugin,
   EnvoyPluginSDK init).

2. **Enrichment pass** runs after req.envoy is initialized, with the
   full EnvoyLoggerContext (companyId, installId, event, category, …),
   and replaces req.logger so route-handler errors carry the standard
   Envoy facets.

If the enrichment pass throws, req.logger is *not* downgraded —
the early logger is retained. console.error tells the caller which pass
blew up so the failure mode is debuggable.

The only error class still served by structuredErrorMiddleware's
console.error fallback is malformed-JSON body-parser errors, which have
no payload/meta to enrich with regardless.

Factory signature loosens back to `(req: Request, ctx: EnvoyLoggerContext)
=> StructuredLogger` — `req.envoy` is not set on the early call, so
typing the arg as EnvoyRequest would be a lie.

Tests: 97 passing (added early-pass observation, two-call replacement,
enrichment-throws-keeps-early-logger, and early-throws-leaves-logger-
undefined cases).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous design called the loggerFactory twice (once early with
integrationName, once after envoy init with the full context) to give
loginAsPlugin / EnvoyPluginSDK construction errors a structured logger.
Replaces that with a single factory invocation plus a chained
.child(context) enrichment pattern that any number of middleware can
participate in.

New surface in src/sdk/StructuredLogger.ts:

- StructuredLogger now requires `child(context): StructuredLogger`
  alongside `error(...)`. The internal SDK's StructuredLogger and every
  mainstream logging library (pino, bunyan, winston) already implement
  this — the SDK just makes it part of the contract.
- isRequestWithLogger checks for both methods so a partial logger fails
  closed (structuredErrorMiddleware → console.error fallback).
- addRequestLoggerContext(req, ctx, fallback?) is the new universal
  helper. If req.logger exists, it's replaced with req.logger.child(ctx);
  otherwise the optional fallback constructs one from scratch;
  otherwise no-op. Exceptions from either path are caught and surfaced
  via console.error — req.logger never downgrades to undefined.

envoyMiddleware:

- Calls loggerFactory exactly once, synchronously at the top of the
  handler, with `{ integrationName }`. That guarantees req.logger
  exists for body-parser / loginAsPlugin / EnvoyPluginSDK errors.
- No more post-init enrichment pass.

envoyLoggerContextMiddleware (new):

- Place after envoyMiddleware so req.envoy is set.
- Adds the full EnvoyLoggerContext (companyId, installId, event,
  category, …) via addRequestLoggerContext — i.e. req.logger.child(...).
- Optional `fallback` to construct from scratch if no upstream logger.

This is the recommended pattern for every downstream middleware: call
addRequestLoggerContext(req, myContext, optionalFallback). pluginMiddleware
in screener / pacs / etc. can drop their bespoke "rootLogger.child(...)
+ try/catch + envoyRequest.logger = logger" boilerplate in favor of one
line.

Tests: 105 passing (StructuredLogger.test.ts now covers .child contract
+ addRequestLoggerContext happy/fallback/throw paths; middleware.test.ts
covers single factory call, envoyLoggerContextMiddleware
replace/fallback/no-op behavior).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants