Skip to content

feat(request): enforce request-line bans via BS /auth/check-request-ban#154

Merged
jakebromberg merged 3 commits into
mainfrom
feature/150-enforce-request-bans
Jun 1, 2026
Merged

feat(request): enforce request-line bans via BS /auth/check-request-ban#154
jakebromberg merged 3 commits into
mainfrom
feature/150-enforce-request-bans

Conversation

@jakebromberg

Copy link
Copy Markdown
Member

Summary

Adds an optional pre-parse gate on POST /request that consults Backend-Service's new POST /auth/check-request-ban endpoint (BS#1261) so abusive listeners are blocked before consuming Groq TPM, LML cache budget, or Slack noise. Behind the ENFORCE_REQUEST_BANS feature flag (default off) so the code can land and deploy before iOS 3.2 (WXYC/wxyc-ios-64#351) reaches App Store rollout.

Behavior matrix

caller state result
valid JWT, unbanned proceed (LML + Slack as usual)
banned (user or fingerprint) 403, no Groq/LML/Slack, request_blocked PostHog event with user_id + fingerprint + ban_reason + ban_source
BS returns 401 (invalid/expired JWT) or 404 (user not found) proceed-as-unauth — caller MUST NOT see 401 on POST /request
neither header present skip BS call entirely (matches v3.1 iOS clients in prod)
BS unreachable fail-open: Sentry breadcrumb, degraded_mode=ban_check_unavailable, proceed
feature flag off dependency provider returns None; ban check never runs

Why not forward X-Internal-Key?

The ticket asks for it, but BS#1261's apps/auth/app.ts registration explicitly comments that the endpoint is "intentionally public (no X-Internal-Key gate)" — callers authenticate per-request via JWT + fingerprint, and BS bounds the cost with per-IP rate limiting. Forwarding the header would be a no-op at best and confusing at worst, so we don't. BS_INTERNAL_KEY is still parsed into settings because ROM holds the same secret for future internal calls (e.g. ROM#151's admin API path on BS).

Files

  • services/ban_check_client.py (new) — typed httpx wrapper. BanCheckResult (frozen dataclass), BanCheckUnavailableError. Treats BS 401/404 as proceed-as-unauth; raises on network/timeout/5xx/400 so the router can fail open.
  • routers/request.py — ban check runs after the empty-message guard and before parse, exactly per the ticket. New ban_check_unavailable degraded mode + always-on ban_check_degraded telemetry property so LML-down and ban-check-down show up together in PostHog.
  • core/dependencies.pyget_ban_check_client returns None unless both flag and URL are set, so the router has a single guard.
  • config/settings.pybs_check_request_ban_url, bs_internal_key, enforce_request_bans (default False).
  • tests/unit/test_ban_check_client.py (new, 12 tests) — client contract: happy paths, proceed-as-unauth, fail-open, request shape.
  • tests/unit/test_request_ban_enforcement.py (new, 11 tests) — router behavior matrix, including the LML-down × BS-down precedence case.
  • tests/unit/test_dependencies.pyget_ban_check_client provider gating (flag off, URL unset, both set).
  • tests/integration/test_ban_enforcement_e2e.py (new, external_api) — live BS contract smoke; skipped unless BS_CHECK_REQUEST_BAN_URL is set so it's safe to leave in the default CI matrix.
  • docs/architecture.md, docs/env-vars.md, .env.example — flow + env-var documentation.

Local checks

  • ruff check . clean
  • ruff format --check . clean
  • mypy . clean (63 files)
  • pytest tests/unit/ -q → 378 passed (+24 new)
  • check_marker_ci_sync.py → PASS

Test plan

  • Unit: valid + unbanned → proceed
  • Unit: valid + banned → 403, no Slack, no Groq, no LML, request_blocked emitted with all 4 properties
  • Unit: invalid JWT (BS 401) → proceed-as-unauth (200, not 401)
  • Unit: user not found (BS 404) → proceed-as-unauth
  • Unit: no headers → proceed AND don't call BS
  • Unit: fingerprint-only + banned → 403
  • Unit: BS unreachable → fail-open + degraded_mode=ban_check_unavailable
  • Unit: BS unreachable + LML down → degraded_mode=search_unavailable (more severe wins) + ban_check_degraded=true (independent property)
  • Unit: feature flag off → dependency provider returns None, BS never called
  • Unit: dependency provider gating (flag off, URL unset, both set)
  • Manual: against staging once BS#1261 is deployed there (BS_CHECK_REQUEST_BAN_URL=... TEST_ENV=staging pytest tests/integration/test_ban_enforcement_e2e.py -m external_api)
  • Manual: flip ENFORCE_REQUEST_BANS=true on staging and verify a known-banned fingerprint gets 403 from staging ROM

Related

Closes #150

jakebromberg added a commit that referenced this pull request May 31, 2026
… fail-open boundaries

Addresses the substantive bucket of the max-effort review (PR #154):

* **Ban bypass closed**: BanCheckClient now validates fingerprint against BS's UUID regex. A banned listener appending `X-Device-Fingerprint: not-a-uuid` would otherwise trigger BS 400 -> ROM treating it as `BanCheckUnavailableError` -> router failing open. Malformed fingerprint is dropped client-side; if no signal survives, the standard no-signal path engages and the router skips the BS call entirely.
* **Required-key validation**: BS 200 must carry the `banned` key. A regression returning `{}` previously coerced to `banned=False` silently disabling enforcement; now raises `BanCheckUnavailableError` so the router fails open AND the operator sees the Sentry breadcrumb instead of a quiet drop in `request_blocked` events.
* **InvalidURL fail-open**: `_NETWORK_ERRORS` broadened from the three explicit TransportError subclasses to `httpx.HTTPError` (the base) so a misconfigured `BS_CHECK_REQUEST_BAN_URL` (typo, missing scheme) fails open via the documented path instead of escaping as an unhandled `httpx.InvalidURL` 500-ing every /request.
* **Shadow-ban hardened**: `posthog_client.capture` on the banned path is wrapped in try/except so a PostHog ingest outage can't prevent the 403 and surface a 500 instead — which would tell a banned listener that the backend is doing extra work, defeating shadow-ban.

Tests: +9 cases (TestFingerprintValidation, TestMissingBannedKey, TestInvalidUrlFailsOpen) covering the new boundaries.
Adds an optional pre-parse gate on POST /request that consults Backend-Service's new POST /auth/check-request-ban endpoint (BS#1261) so abusive listeners are blocked before consuming Groq TPM, LML cache budget, or Slack noise. Behind the ENFORCE_REQUEST_BANS feature flag (default off) so the code can deploy before iOS 3.2 reaches App Store rollout.

services/ban_check_client.py is a typed httpx wrapper. It treats BS 401/404 as proceed-as-unauth (v3.1 clients send no Authorization header and must never see 401 here) and raises BanCheckUnavailableError on network/timeout/5xx/400 so the router can fail open.

routers/request.py runs the ban check after the empty-message guard and before parse. Banned callers get 403 with no Slack/Groq/LML and a request_blocked PostHog event (user_id, fingerprint, ban_reason, ban_source). BS outages log a Sentry breadcrumb and emit a new ban_check_unavailable degraded_mode plus an always-on ban_check_degraded telemetry property.

config/settings.py + core/dependencies.py add ENFORCE_REQUEST_BANS, BS_CHECK_REQUEST_BAN_URL, BS_INTERNAL_KEY (held for future internal calls; the public /auth/check-request-ban handler does not gate on it).

Unit coverage in tests/unit/test_ban_check_client.py and tests/unit/test_request_ban_enforcement.py for the full behavior matrix; contract smoke in tests/integration/ gated on BS_CHECK_REQUEST_BAN_URL (external_api marker, skipped by default). docs/architecture.md + docs/env-vars.md + .env.example updated.

Closes #150
… fail-open boundaries

Addresses the substantive bucket of the max-effort review (PR #154):

* **Ban bypass closed**: BanCheckClient now validates fingerprint against BS's UUID regex. A banned listener appending `X-Device-Fingerprint: not-a-uuid` would otherwise trigger BS 400 -> ROM treating it as `BanCheckUnavailableError` -> router failing open. Malformed fingerprint is dropped client-side; if no signal survives, the standard no-signal path engages and the router skips the BS call entirely.
* **Required-key validation**: BS 200 must carry the `banned` key. A regression returning `{}` previously coerced to `banned=False` silently disabling enforcement; now raises `BanCheckUnavailableError` so the router fails open AND the operator sees the Sentry breadcrumb instead of a quiet drop in `request_blocked` events.
* **InvalidURL fail-open**: `_NETWORK_ERRORS` broadened from the three explicit TransportError subclasses to `httpx.HTTPError` (the base) so a misconfigured `BS_CHECK_REQUEST_BAN_URL` (typo, missing scheme) fails open via the documented path instead of escaping as an unhandled `httpx.InvalidURL` 500-ing every /request.
* **Shadow-ban hardened**: `posthog_client.capture` on the banned path is wrapped in try/except so a PostHog ingest outage can't prevent the 403 and surface a 500 instead — which would tell a banned listener that the backend is doing extra work, defeating shadow-ban.

Tests: +9 cases (TestFingerprintValidation, TestMissingBannedKey, TestInvalidUrlFailsOpen) covering the new boundaries.
…eturn

pytest.fail() in mocked httpx handlers tripped mypy's [return] check on CI. AssertionError gives mypy the explicit NoReturn it wants and the test semantic is identical.
@jakebromberg jakebromberg force-pushed the feature/150-enforce-request-bans branch from cdf5dea to fe8d023 Compare June 1, 2026 04:53
@jakebromberg jakebromberg merged commit 7470c40 into main Jun 1, 2026
7 checks passed
jakebromberg added a commit that referenced this pull request Jun 1, 2026
… fail-open boundaries

Addresses the substantive bucket of the max-effort review (PR #154):

* **Ban bypass closed**: BanCheckClient now validates fingerprint against BS's UUID regex. A banned listener appending `X-Device-Fingerprint: not-a-uuid` would otherwise trigger BS 400 -> ROM treating it as `BanCheckUnavailableError` -> router failing open. Malformed fingerprint is dropped client-side; if no signal survives, the standard no-signal path engages and the router skips the BS call entirely.
* **Required-key validation**: BS 200 must carry the `banned` key. A regression returning `{}` previously coerced to `banned=False` silently disabling enforcement; now raises `BanCheckUnavailableError` so the router fails open AND the operator sees the Sentry breadcrumb instead of a quiet drop in `request_blocked` events.
* **InvalidURL fail-open**: `_NETWORK_ERRORS` broadened from the three explicit TransportError subclasses to `httpx.HTTPError` (the base) so a misconfigured `BS_CHECK_REQUEST_BAN_URL` (typo, missing scheme) fails open via the documented path instead of escaping as an unhandled `httpx.InvalidURL` 500-ing every /request.
* **Shadow-ban hardened**: `posthog_client.capture` on the banned path is wrapped in try/except so a PostHog ingest outage can't prevent the 403 and surface a 500 instead — which would tell a banned listener that the backend is doing extra work, defeating shadow-ban.

Tests: +9 cases (TestFingerprintValidation, TestMissingBannedKey, TestInvalidUrlFailsOpen) covering the new boundaries.
@jakebromberg jakebromberg deleted the feature/150-enforce-request-bans branch June 1, 2026 04:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enforce request-line ban on POST /request via BS /auth/check-request-ban

1 participant