Skip to content

feat(auth): OIDC client + signed session-cookie codec#69

Merged
jakebromberg merged 5 commits into
mainfrom
feat/oidc-core-auth
Jun 10, 2026
Merged

feat(auth): OIDC client + signed session-cookie codec#69
jakebromberg merged 5 commits into
mainfrom
feat/oidc-core-auth

Conversation

@jakebromberg

Copy link
Copy Markdown
Member

Closes #67.

Adds the pure foundation module the verifier-side wiring (in #68) will consume. Includes the design plan as the first commit so future readers can navigate the whole effort from one place.

Summary

  • core/auth.py: ReviewerSession dataclass, build_authorize_url / exchange_code (OIDC code + PKCE + id_token verify), encode_session / decode_session (12h signed cookie codec), sign_one_shot / verify_one_shot (10-min redirect-survival cookies). All env-configured; module-level lazy discovery doc + JWKS cache. Deviation from core/gemini.py's DI pattern is documented in the module docstring.
  • plans/oidc-auth.md: the design plan, with rationale for option B (OIDC) over the alternatives, file-by-file diff, rollout sequence, and the open questions resolved during planning.
  • Deps: authlib>=1.3, itsdangerous>=2.2. (python-jose was in the plan but is unused — authlib has its own JOSE.)
  • 16 new tests at tests/unit/test_auth.py covering session round-trip, tamper, expiry, PKCE URL shape, exchange happy path + aud/iss/signature failures + transport failure.

No call sites use this module yet; behavior is unchanged on main until #68 wires it into verifier/serve.py.

Test plan

  • pytest tests/unit/test_auth.py — 16 passed
  • Full suite (pytest) — 554 passed, 8 skipped
  • ruff check, ruff format --check, mypy core cli.py clean
  • CI green

Plan covers the two-repo rollout: a trustedClient entry on Backend-Service's oidcProvider plus a session-cookie gate in verifier/serve.py backed by an OIDC code + PKCE round-trip. Replaces the BasicAuth gate and gives every verified.json save a stable reviewer_id for future per-reviewer queues / leaderboards / quality flagging.

Includes file-by-file diff, rollout sequence, local-dev story, and the open questions resolved during planning (gating /api/lookup, keeping the discovery cache in-process).
New pure module powering the verifier's upcoming OIDC dance against api.wxyc.org/auth (Better Auth's oidcProvider). Public surface: build_authorize_url / exchange_code for the OIDC code+PKCE flow, encode_session / decode_session for the 12-hour signed reviewer cookie, sign_one_shot / verify_one_shot for the redirect-survival cookies (oidc_state, oidc_verifier, oidc_return_to), and the ReviewerSession dataclass. id_token verification pins aud and iss against the configured client_id and issuer so a Wiki.js-issued token can't authenticate to the verifier.

Module-level cached discovery doc + JWKS, lazy-fetched on first request — deliberate deviation from core/gemini.py's DI pattern. Rationale documented in the module docstring: discovery + JWKS are configuration, not per-call state, and re-fetching per login wastes a round-trip. Lazy (not eager-at-boot) keeps the verifier resilient to brief auth-server outages so /api/version healthchecks stay green for Railway. The function-level _load_metadata / _load_jwks boundary is the test seam; _reset_metadata_cache() is the explicit reset for parametrized tests.

Deps added: authlib (handles OIDC discovery, PKCE, token exchange, and id_token signature + claim validation as one library) and itsdangerous (signed cookies). python-jose was in the plan but is unused — authlib has its own JOSE implementation.

Routes, middleware, and reviewer_id threading land in a follow-up commit.

Plan: plans/oidc-auth.md (PR #2 scope).
Code review (max effort, 9 angles + verify + sweep) surfaced 15 substantive findings. This commit addresses the ones with concrete failure scenarios; the cosmetic/altitude items are deferred.

Correctness: `claims["sub"]` raised uncaught KeyError because `claims_options` only declared `iss` and `aud` essential and `JWTClaims.validate()` doesn't enforce `sub` by default — added `sub: {"essential": True}` so a missing sub raises JoseError as documented. `body["id_token"]` raised raw KeyError and `token_response.json()` raised raw JSONDecodeError; both bypassed the documented httpx.HTTPError/JoseError failure-mode contract — wrap with explicit ValueError messages so the route layer's 400-mapping is unambiguous, and add `Accept: application/json` on the token POST so the server can't return form-urlencoded by default. `metadata[<endpoint>]` accesses raised uncaught KeyError on a non-conformant discovery doc — pre-check with `.get()` and raise ValueError with the missing field name. JWKS cache had no invalidation on signature failure — an auth-server key rotation broke logins until the verifier process restarted; now BadSignatureError invalidates the JWKS cache once and retries decode against the refreshed JWKS, mirroring the rotation-graceful behavior the module docstring promised. `_load_metadata`/`_load_jwks` race: concurrent first-callers each passed `if cache is None` and each fetched — added `asyncio.Lock`s with double-checked locking. Authorize-URL assembly used `f"{endpoint}?{urlencode}"` which produces a malformed URL with two `?`s if the endpoint already has a query string — switched to `authlib.common.urls.add_params_to_uri`, which parses and merges correctly. JWKS trust pool included every key in the JWKS regardless of `use`/`alg` — added `_signing_keys_only` filter so an encryption key or unsupported-alg key in the same JWKS can't join the signature-verification trust pool.

Data quality: `claims.get("email", "")` silently produced ReviewerSession(email="") for users without an email claim — now `email: str | None` and absent claims surface as None. `claims.get("preferred_username") or claims.get("username") or None` falsy-collapsed empty strings, masking IdP misconfig — replaced with `_first_present(...)` which distinguishes absent claim (None) from explicitly-empty claim (preserved as "").

Efficiency: TimestampSigner constructed per `encode_session`/`decode_session` call rerouted HMAC key derivation on every gated request — cached via `lru_cache` keyed on (secret, salt) so the request-gating middleware doesn't pay the HMAC-init cost per request. `JsonWebToken(["RS256","ES256"])` hoisted to module scope (`_JWT`). `JsonWebKey.generate_key("RSA", 2048, ...)` test fixtures changed from function-scoped to module-scoped (was ~1-3s of pure RSA keygen per pytest run).

Reuse: hand-rolled S256 PKCE replaced with `authlib.oauth2.rfc7636.create_s256_code_challenge` (authlib is already a dep, the previous implementation was the kind of crypto code we shouldn't own).

Test gaps closed: SESSION_TTL expiry branch of `decode_session` (held the wall clock 12h+60s forward); token endpoint 4xx (`httpx.HTTPStatusError`) — previously only ConnectError was covered; 2xx response missing `id_token` field → ValueError; 2xx response with non-JSON body → ValueError; id_token missing `sub` claim → JoseError; JWKS rotation: BadSignatureError triggers refetch + retry, pinned by a test that flips the stub JWKS between calls.

Deferred: OIDC nonce (code+PKCE flow with a locally-consumed id_token gets equivalent replay protection from the verifier-binding; added a module-docstring section explaining why); ReviewerSession dataclass vs Pydantic divergence (style; PR #70 already wraps with VerifiedBy Pydantic at the storage boundary); env-accessor consolidation (5 near-identical helpers; cosmetic).
Iter 2 of /code-review max surfaced 2 substantive bugs introduced by iter 1's fixes plus a couple of test-quality gaps. Iter 1's claim-extraction helpers replaced the falsy-coalesce form to distinguish 'absent' from 'empty string', but `str(None)` silently produces the literal string 'None' — exactly the silent corruption the field-type change was meant to prevent. Switched both helpers to `claims.get(name)` + `is not None` check so JSON null is treated as 'not really there' rather than coerced to the four-character string 'None'.

Correctness: `_optional_str` and `_first_present` now reject both absent AND null claims (returning None), preserving 'present-with-content' values including empty string. Without this, an IdP that emits `preferred_username: null` would short-circuit `_first_present` and inject 'None' as the username instead of falling through to `username: "djradio"`. `decode_session` now also catches `RuntimeError` — a config-reload bug that unsets `WXYC_SESSION_SECRET` mid-process would otherwise 500 every gated request instead of redirecting to /auth/login (the docstring promises None → 'no session, send to login', not a 500 leak).

Tests: new tests pin the null-claim contract (`exchange_code` treats `email: null` as None on the ReviewerSession, not 'None'; falls through to the next field on first-present); pin the RuntimeError catch in decode_session (env-unset mid-process returns None); pin `_invalidate_jwks_cache` directly (the rotation integration test monkeypatches `_load_jwks` so it doesn't actually exercise the cache state). Sub-missing test tightened from `JoseError` to `MissingClaimError` so a regression that strips `sub` from claims_options is distinguishable from a signature/aud/iss failure (the former bypasses the JoseError family entirely with a raw KeyError; the test must catch that exactly).

Deferred (low impact): JWKS keyset memoization across rotation retry (saves microseconds), `kty` filter on `_signing_keys_only` (defense-in-depth only — auth server controls its own JWKS), lru_cache reference retention of rotated session secrets (cosmetic), pinning the token POST body shape in tests (would require enriching the test fakes to capture request kwargs).
Iter 3 of /code-review max found no remaining correctness bugs but surfaced two small follow-ups: the `decode_session` docstring didn't mention the `RuntimeError` branch (inviting a future reader to "clean up" the catch that protects against config-reload bugs), and the iter-2 cache-invalidation test added a redundant `_reset_metadata_cache()` call that contradicted the autouse fixture pattern.

Also: the 1-char tamper logic in `test_decode_session_returns_none_on_tamper` and `test_verify_one_shot_raises_on_tamper` was the source of an intermittent flake on this branch — base64url's 27-char signature segment has 2 padding bits in the last char, so 4-of-64 random last-char replacements decode to the same HMAC bytes and the tampered value verifies. The fix to swap 4 chars (touching multiple HMAC bytes deterministically) lived on PR #70's branch and didn't carry back to PR #69; applied it here too so this branch's CI is stable in isolation.

3 consecutive full-suite runs clean (532 passed each).
@jakebromberg jakebromberg merged commit 8ad503c into main Jun 10, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OIDC reviewer identity for the verifier UI

1 participant