[middleman-workflows] processProviderStatus: improve activity payload + error specificity for debugging #294

@jorgecuesta

Problem

When debugging ProviderStatus workflows from the Temporal UI we lose two important pieces of information:

  1. Provider identity in activity I/O is opaque: processProviderStatus input is just the hex pubkey (66 chars), and on the unreachable/unhealthy path the result is the minimal upsert shape { statusResult: { id, status } }. No human-readable name/url anywhere, so figuring out which provider failed requires manually cross-referencing against listProviders output.

  2. Failure reason is collapsed to a parse error: apps/middleman-workflows/src/lib/provider/index.ts:30-63 calls await status.json() without checking response.ok or Content-Type. Any non-JSON body (4xx HTML page, plain-text 502, etc.) throws a SyntaxError that gets logged generically, then the activity returns Unreachable regardless of the real failure mode. For the unhealthy branch the response body is destructured into statusProps and silently dropped.

Evidence (mainnet, 2026-05-26)

| Provider | Status persisted | Logged error | Actual HTTP | Real body |
|---|---|---|---|---|
| Nodefleet | unreachable | SyntaxError: Unexpected non-whitespace character after JSON at position 4 | 404 | 404 page not found |
| vido.info | unreachable | SyntaxError: Unexpected token '<', "<!DOCTYPE "... | 401 | HTML 401 page |
| Poktpool | unreachable | SyntaxError: Unexpected token 'B', "Bad Gateway"... | 502 | Bad Gateway |
| WeaversNodes | unreachable | TypeError: fetch failed | timeout (>10s) | |
| Qspider | unhealthy | (nothing logged) | 500 | {"error":"Invalid request"} |

All five problems look the same in the Temporal UI: statusResult: { id, status }. Operators can't tell "the provider is down" from "we are misconfigured against this provider" without crawling app logs.
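The collapse can be reproduced without any network access. The sketch below assumes that an unguarded `response.json()` fails the same way `JSON.parse` does on the body text; the bodies are copied from the evidence above, and `parseOutcome` is a hypothetical helper used only for illustration.

```typescript
// Bodies taken from the evidence above. The HTTP status is deliberately
// absent: the parser never sees it, which is the point of this sketch.
const bodies: Record<string, string> = {
  Nodefleet: "404 page not found",
  Poktpool: "Bad Gateway",
  Qspider: '{"error":"Invalid request"}',
};

// Returns the name of the error thrown by JSON.parse, or "ok" when the
// body parses cleanly.
function parseOutcome(body: string): string {
  try {
    JSON.parse(body);
    return "ok";
  } catch (err) {
    return err instanceof Error ? err.name : "unknown";
  }
}

const outcomes: Record<string, string> = {};
for (const name of Object.keys(bodies)) {
  outcomes[name] = parseOutcome(bodies[name] ?? "");
}

console.log(outcomes);
```

A 404 and a 502 both surface as the same `SyntaxError`, while the 500 with a JSON body parses without any error at all, so the parse result alone carries no information about the real failure mode.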

Suggested change

1. Carry provider name/url through the activity boundary

  • Pass { identity, name, url } as the activity input instead of just identity. The workflow already has these from listProviders.
  • Always echo { name, identity, url } on the activity result, regardless of branch:
    {
      "name": "Nodefleet",
      "identity": "03df…aab1",
      "url": "https://igniter.nodefleet.org",
      "statusResult": { "id": 9, "status": "unreachable" },
      "failure": { "kind": "http_error", "httpStatus": 404, "bodySnippet": "404 page not found" }
    }
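One possible typing for this boundary, as a sketch only. The field names follow the JSON example above; the type names (`ProviderRef`, `Failure`, `ProcessProviderStatusResult`) and the `withProvider` helper are hypothetical and do not come from the existing code.

```typescript
// Hypothetical names throughout; only the field names come from the
// JSON example in this issue.
type ProviderStatus = "healthy" | "unhealthy" | "unreachable";

// Activity input: what the workflow already knows from listProviders.
interface ProviderRef {
  identity: string;
  name: string;
  url: string;
}

type Failure =
  | { kind: "http_error"; httpStatus: number; bodySnippet: string }
  | { kind: "network"; code?: string; message: string };

// Activity result: the provider is echoed on every branch.
interface ProcessProviderStatusResult extends ProviderRef {
  statusResult: { id: number; status: ProviderStatus };
  failure?: Failure;
}

// Builds the result so that no branch can forget to echo the provider.
function withProvider(
  provider: ProviderRef,
  statusResult: { id: number; status: ProviderStatus },
  failure?: Failure,
): ProcessProviderStatusResult {
  return {
    name: provider.name,
    identity: provider.identity,
    url: provider.url,
    statusResult,
    ...(failure ? { failure } : {}),
  };
}

const nodefleet: ProviderRef = {
  identity: "03df…aab1", // elided exactly as in the example above
  name: "Nodefleet",
  url: "https://igniter.nodefleet.org",
};

const unreachable = withProvider(
  nodefleet,
  { id: 9, status: "unreachable" },
  { kind: "http_error", httpStatus: 404, bodySnippet: "404 page not found" },
);
const healthy = withProvider(nodefleet, { id: 9, status: "healthy" });
```

Routing every branch through a single constructor keeps the echo out of the individual code paths, so the healthy, unhealthy, and unreachable results cannot drift apart in shape.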

2. Diagnose the response before parsing

In apps/middleman-workflows/src/lib/provider/index.ts status():

  • Check response.ok first. If not OK, capture { httpStatus, statusText, bodySnippet: text.slice(0, 200), contentType } and return Unreachable with that context.
  • Only call response.json() after confirming Content-Type is JSON-ish.
  • For network errors (the current catch), capture error.cause?.code (DNS failures such as ENOTFOUND, ECONNRESET, UND_ERR_HEADERS_TIMEOUT, etc.) and surface it as failure: { kind: "network", code: "...", message: "..." }.
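The three checks above could be sketched roughly as follows. This is not the current contents of status(): every name here (`diagnoseResponse`, `diagnoseNetworkError`, `fetchStatus`, `ResponseLike`) is hypothetical, and the `invalid_body` kind is an added assumption for responses that are OK but not JSON, a case the list above does not name.

```typescript
// Hypothetical failure shape; "invalid_body" is an assumption, not part of the proposal.
type Failure =
  | { kind: "http_error"; httpStatus: number; statusText: string; contentType: string | null; bodySnippet: string }
  | { kind: "invalid_body"; httpStatus: number; contentType: string | null; bodySnippet: string }
  | { kind: "network"; code?: string; message: string };

type Diagnosis = { ok: true; body: unknown } | { ok: false; failure: Failure };

// The subset of a fetch Response this sketch relies on.
interface ResponseLike {
  ok: boolean;
  status: number;
  statusText: string;
  headers: { get(name: string): string | null };
  text(): Promise<string>;
}

interface ResponseMeta {
  ok: boolean;
  status: number;
  statusText: string;
  contentType: string | null;
  bodyText: string;
}

// Pure classification step: response.ok first, then Content-Type, then parse.
function diagnoseResponse(meta: ResponseMeta): Diagnosis {
  const { status: httpStatus, statusText, contentType } = meta;
  const bodySnippet = meta.bodyText.slice(0, 200);
  if (!meta.ok) {
    return { ok: false, failure: { kind: "http_error", httpStatus, statusText, contentType, bodySnippet } };
  }
  const invalid: Diagnosis = { ok: false, failure: { kind: "invalid_body", httpStatus, contentType, bodySnippet } };
  if (contentType === null || !/\bjson\b/i.test(contentType)) return invalid;
  try {
    return { ok: true, body: JSON.parse(meta.bodyText) };
  } catch {
    return invalid;
  }
}

// Reads the low-level code that fetch attaches to error.cause, when present.
function diagnoseNetworkError(err: unknown): Failure {
  const message = err instanceof Error ? err.message : String(err);
  const cause = err instanceof Error ? (err as { cause?: unknown }).cause : undefined;
  if (typeof cause === "object" && cause !== null && "code" in cause) {
    return { kind: "network", code: String((cause as { code: unknown }).code), message };
  }
  return { kind: "network", message };
}

// Thin async wrapper; the fetch implementation is injected so it can be faked.
async function fetchStatus(
  url: string,
  fetchImpl: (url: string) => Promise<ResponseLike>,
): Promise<Diagnosis> {
  try {
    const response = await fetchImpl(url);
    return diagnoseResponse({
      ok: response.ok,
      status: response.status,
      statusText: response.statusText,
      contentType: response.headers.get("content-type"),
      bodyText: await response.text(),
    });
  } catch (err) {
    return { ok: false, failure: diagnoseNetworkError(err) };
  }
}
```

Reading the body once with `text()` and parsing it separately means the snippet is still available when parsing fails. Keeping the classification in pure functions also lets the cases from the evidence section be covered without a live provider.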

3. Preserve the unhealthy reason

On the !healthy branch, keep the provider's self-reported diagnostics (e.g. reason, failingChecks, whatever the response carries) and include them on the activity result and optionally on the providers.supplier_stats JSONB or a new last_status_reason column.
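A minimal sketch of keeping those diagnostics, assuming the status body has the shape `{ healthy, ...rest }` implied by the statusProps destructuring described in the Problem section. `splitHealth` and the `details` field are hypothetical names; the real response fields may differ.

```typescript
interface HealthSplit {
  healthy: boolean;
  // Everything the provider reported besides the healthy flag.
  details: Record<string, unknown>;
}

// Separates the healthy flag from the rest of the body instead of
// discarding the rest. Non-object bodies are kept under a "raw" key.
function splitHealth(body: unknown): HealthSplit {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return { healthy: false, details: { raw: body } };
  }
  const { healthy, ...details } = body as Record<string, unknown>;
  return { healthy: healthy === true, details };
}

// The Qspider body from the evidence section: no healthy flag at all,
// but the error message survives in details.
const qspider = splitHealth({ error: "Invalid request" });
console.log(qspider);
```

The `details` object can then be attached to the activity result as-is, and the same value is what would be written to a JSONB column if the optional persistence is adopted.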

Why

This is a pure observability change. It does not change which providers are marked healthy, unreachable, or unhealthy; it only makes the activity history self-describing in the Temporal UI and lets us tell provider-side outages apart from middleman-side misconfiguration without SSHing into the workflow pod.

Metadata

Labels: bug, enhancement