Skip to content

Metrics fix#65

Open
bd-mkt wants to merge 3 commits into
mesaglio:mainfrom
bd-mkt:metrics_fix
Open

Metrics fix#65
bd-mkt wants to merge 3 commits into
mesaglio:mainfrom
bd-mkt:metrics_fix

Conversation

@bd-mkt

@bd-mkt bd-mkt commented May 7, 2026

Copy link
Copy Markdown

fix: include metric & log services in /api/services (and unblock the dropdown)

Summary

The /api/services endpoint was wired to the MetricsHandler but only
queried the traces table. Deployments that ingest only metrics or
logs (e.g. an upstream collector forwarding docker_stats or Prometheus
scrapes) saw an empty list, even though the underlying data had
perfectly good service_name values. The frontend dropdowns that
consume /api/services therefore stayed empty.

This PR fixes both sides.

Backend (Go)

  • internal/store/metrics.go — add MetricsStore.GetServices()
    returning DISTINCT service_name FROM metrics (filtered for
    non-empty).
  • internal/store/logs.go — add LogsStore.GetServices() returning
    DISTINCT service_name FROM logs (filtered for non-empty).
  • internal/server/handlers/metrics.goGetServices now queries
    traces, metrics, and logs, dedupes via a set, and returns a sorted
    list. Per-store errors are logged as warnings rather than failing the
    whole request, so a single broken table doesn't black-hole the
    endpoint.

API contract is unchanged: GET /api/services still returns
{ "services": ["..."], "count": N } — but the array is now a true
union across signal types.

Frontend (TypeScript)

The frontend was incorrectly typing the response as
{ service_name: string }[] and calling .map(s => s.service_name),
which silently produced [undefined, ...] and rendered an empty
dropdown. The backend has always returned string[].

  • frontend/src/services/api.tsgetServices() typed as
    Promise<string[]>; remove now-unused Service import.
  • frontend/src/hooks/useServices.ts — consume the response directly
    (.then(setServices)); drop Service import and the bogus .map.
  • frontend/src/pages/Traces/TracesList.tsx — same fix as
    useServices (this was the second caller of apiClient.getServices()
    and was caught by tsc --noEmit after the type tightening).
  • frontend/src/types/api.ts — remove the unused Service interface.

Behaviour change

  • /api/services now reflects every signal in the store, not just
    traces.
  • Service dropdowns (Traces, Logs, Metrics, Dashboard) populate
    correctly when only metrics or logs are being ingested.

Test plan

  • go build ./... (or go vet ./...) on the Go side.
  • npm run build in frontend/ — verifies the type tightening
    (no tsc errors).
  • Ingest metrics-only against the collector: /api/services
    returns the docker container / vLLM service names; UI dropdowns
    populate.

@mesaglio

Copy link
Copy Markdown
Owner

The fix is correct — verified against main: the handler at internal/server/handlers/metrics.go:107-120 only queries Traces.GetServices, and the frontend mis-types the
response as Service[] when the backend has always returned string[] (see TracesStore.GetServices in internal/store/traces.go:349).

A few notes before merging:

  1. Third caller of getServices() not mentioned in the description. frontend/src/pages/Dashboard.tsx:29 also calls apiClient.getServices(). It only reads
    services.length, so tsc didn't flag it and behavior is unaffected — but the PR body says "two callers" when there are three. Worth updating so future readers aren't
    confused.

  2. No tests for the new store methods. traces_test.go:237 covers TracesStore.GetServices; replicating it for MetricsStore.GetServices and LogsStore.GetServices
    would keep coverage symmetrical.

  3. Error semantics changed silently. Per-store failures are now logged as `Warn` and dropped. If all three stores fail at once, the endpoint returns `200 {services: [],
    count: 0}` instead of the previous `500`, which hides outages from the UI. Consider returning `500` only when all three stores error — partial success stays `200`, total
    failure stays `5xx`.

  4. Minor consistency nit. `TracesStore.GetServices` doesn't filter `NULL`/empty `service_name`; the two new ones do. The handler's `add()` skips empty strings so
    visible behavior matches, but aligning the query would remove the implicit dependency.

None blocking — 2 and 3 are the ones I'd address before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants