Send X-Caller-Budget-Ms to LML on /lookup (pair with LML#370 + BS#1053)

## Problem

`services/lookup_client.py` calls `POST /api/v1/lookup` on LML without an `X-Caller-Budget-Ms` header. LML accepts this header per [LML#345](https://github.com/WXYC/library-metadata-lookup/issues/345) (closed) and uses it to bound its search-pipeline budget; when the header is absent, it falls back to the env default `LML_SEARCH_BUDGET_MS=4000` (4s).

After #158 (closed 2026-06-09) raised ROM's `per_attempt_timeout` to 20s, the mismatch widened: ROM gives LML up to 20s to find a match, but LML internally cuts off at 4s because it doesn't know the caller will wait longer. Conversely, when ROM is in a degraded budget posture, LML can't shorten its work either.

This is the ROM-side mirror of WXYC/Backend-Service#1053. Same problem, different repo, same header.

## Why this matters now

- [LML#370](https://github.com/WXYC/library-metadata-lookup/issues/370) is in flight: it adds a cascade-exhaustion hard cap that's *supposed to* respect the caller budget. Without the header, LML#370 will cap at `LML_SEARCH_BUDGET_MS` regardless of ROM's actual intent.
- #158's fix bought ROM more wall-clock to wait for a result, but that extra time is wasted if LML still gives up at 4s internally and returns "no match." User-visible symptom: cold-path lookups still degrade to `search_unavailable` even when LML *could* have found a match in the 5-15s window.

## Suggested fix

In the `__init__` in `services/lookup_client.py` (or wherever the lookup HTTP request is composed), expose a `caller_budget_ms` parameter. It should default to the effective per-attempt budget minus ~200ms transport overhead (matching LML#345's server-side subtraction), floored at 1000ms so a short timeout never produces a near-zero or negative budget. Set the result as the `X-Caller-Budget-Ms` request header on each `POST /api/v1/lookup`:

```python
# per_attempt_timeout is in seconds: convert to ms, subtract ~200ms overhead, floor at 1000ms.
caller_budget_ms = max(int(self.per_attempt_timeout * 1000) - 200, 1000)
headers["X-Caller-Budget-Ms"] = str(caller_budget_ms)
```

Send it alongside the `Authorization: Bearer <LML_API_KEY>` header from the existing `_auth_headers()` builder, so both headers go out on the same request.
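
A fuller sketch of how the default and the explicit override could fit together is below. It is illustrative only: `derive_caller_budget_ms` and `build_lookup_headers` are hypothetical names (the real client may compose headers differently), and the 200ms and 1000ms constants are taken from the snippet above.

```python
from typing import Optional

TRANSPORT_OVERHEAD_MS = 200  # the ~200ms allowance from the snippet above
MIN_BUDGET_MS = 1000         # the floor from the snippet above


def derive_caller_budget_ms(
    per_attempt_timeout: float,
    caller_budget_ms: Optional[int] = None,
) -> int:
    """Return the budget (in ms) to advertise to LML.

    Hypothetical helper: an explicit caller_budget_ms wins; otherwise the
    value is derived from the per-attempt timeout, which is in seconds.
    """
    if caller_budget_ms is not None:
        return caller_budget_ms
    derived = int(per_attempt_timeout * 1000) - TRANSPORT_OVERHEAD_MS
    return max(derived, MIN_BUDGET_MS)


def build_lookup_headers(
    api_key: str,
    per_attempt_timeout: float,
    caller_budget_ms: Optional[int] = None,
) -> dict:
    """Hypothetical stand-in for _auth_headers() plus the budget header."""
    headers = {"Authorization": f"Bearer {api_key}"}
    budget = derive_caller_budget_ms(per_attempt_timeout, caller_budget_ms)
    headers["X-Caller-Budget-Ms"] = str(budget)
    return headers
```

Keeping the derivation in a pure function, separate from the HTTP call, is one way to make both acceptance tests trivial to pin without mocking the transport.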

## Acceptance

- [ ] `LookupServiceClient` sets `X-Caller-Budget-Ms` from the effective per-attempt timeout.
- [ ] Unit test pins: when `per_attempt_timeout=20.0`, header is `19800`.
- [ ] Unit test pins: when caller passes an explicit `caller_budget_ms`, the explicit value wins.
- [ ] Confirm in Sentry post-deploy: ROM's LML `httpx` spans on prod include the `lml.caller_budget_ms` attribute that LML emits server-side (per LML#345).
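
The two unit-test items could look roughly like the pytest-style sketch below. `FakeLookupClient` is a hypothetical stand-in that models only the header logic; in the real suite the assertions would presumably target the headers on the outgoing request from `LookupServiceClient`, captured with whatever HTTP mock the repo already uses.

```python
from typing import Optional


class FakeLookupClient:
    """Hypothetical stand-in for LookupServiceClient (header logic only)."""

    def __init__(
        self,
        per_attempt_timeout: float,
        caller_budget_ms: Optional[int] = None,
    ):
        self.per_attempt_timeout = per_attempt_timeout
        self.caller_budget_ms = caller_budget_ms

    def request_headers(self) -> dict:
        budget = self.caller_budget_ms
        if budget is None:
            # Same derivation as the suggested fix: seconds -> ms, minus 200ms, floor 1000ms.
            budget = max(int(self.per_attempt_timeout * 1000) - 200, 1000)
        return {"X-Caller-Budget-Ms": str(budget)}


def test_header_defaults_to_per_attempt_timeout_minus_overhead():
    client = FakeLookupClient(per_attempt_timeout=20.0)
    assert client.request_headers()["X-Caller-Budget-Ms"] == "19800"


def test_explicit_caller_budget_wins():
    client = FakeLookupClient(per_attempt_timeout=20.0, caller_budget_ms=7500)
    assert client.request_headers()["X-Caller-Budget-Ms"] == "7500"
```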

## Related

- Pairs with: WXYC/Backend-Service#1053 (BS-side equivalent — exact same fix in `shared/lml-client/src/index.ts`).
- Pairs with: WXYC/library-metadata-lookup#345 (closed; the server-side header acceptance).
- Blocked-by (soft): WXYC/library-metadata-lookup#370 — without the cascade hard cap, the header is honored but no-result cascades still grind to LML's process-level timeout. The header is still useful on its own (it scopes the successful path), but the cost-saving punch requires both.
- Surfaced during 2026-06-10 issue triage; see #160.

