Skip to content

fix: Harden webhook, vault, session, and base client paths#654

Merged
gjtorikian merged 13 commits into
mainfrom
updates
May 11, 2026
Merged

fix: Harden webhook, vault, session, and base client paths#654
gjtorikian merged 13 commits into
mainfrom
updates

Conversation

@gjtorikian
Copy link
Copy Markdown
Contributor

Summary

  • Webhook/action signature verification: reject future-skewed timestamps in webhooks/_resource, webhooks/_verification, and actions._verify_signature by comparing abs(seconds_since_issued); honor an explicit tolerance=0 (was silently coerced to the 180s default).
  • Vault crypto: preserve empty associated_data/aad (was dropped, causing encrypt/decrypt AAD divergence) and bound _decode_u32_leb128 to 32 bits so a malformed payload can't yield a >32-bit key_len. Empty-AAD ciphertexts written by older SDKs need migration — noted for release notes.
  • Public client isolation: create_public_client now forces api_key=None and sets is_public=True, so a PKCE/public client cannot pick up WORKOS_API_KEY from the process env.
  • Session refresh: map auth-flow/network exceptions to structured AuthenticateWithSessionCookieFailureReason members (MFA_CHALLENGE_REQUIRED, SSO_REQUIRED, EMAIL_VERIFICATION_REQUIRED, ORGANIZATION_SELECTION_REQUIRED, REFRESH_DENIED, REFRESH_NETWORK_ERROR) instead of stringifying. Additive — string consumers keep working.
  • Base client URL safety: percent-encode every path segment via a single _encode_path helper so user-supplied identifiers can't escape their segment with /, ?, #, or %. Covers all ~115 generated call sites.

Test plan

  • uv run ruff format --check .
  • uv run ruff check .
  • uv run pyright
  • uv run pytest (in particular tests/test_actions.py, tests/test_webhook_verification.py, vault, session, and base-client suites)
  • Manually verify a public client built via create_public_client does not include client_secret even when WORKOS_API_KEY is set in the environment
  • Confirm release notes call out the empty-AAD wire-format change for locally-persisted vault ciphertexts

gjtorikian and others added 9 commits May 7, 2026 14:35
Centralize path-segment encoding so user-supplied identifiers cannot
escape their intended URL segment via reserved characters such as
'/', '?', '#', or '%'. A new helper `_encode_path` splits on '/'
and applies `urllib.parse.quote(seg, safe='')` to each segment, and
is invoked from both the sync and async `request` methods. This
covers all ~115 generated call sites without per-site changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds new members to `AuthenticateWithSessionCookieFailureReason`
(`MFA_CHALLENGE_REQUIRED`, `SSO_REQUIRED`,
`EMAIL_VERIFICATION_REQUIRED`, `ORGANIZATION_SELECTION_REQUIRED`,
`REFRESH_DENIED`, `REFRESH_NETWORK_ERROR`) and maps the relevant
auth-flow / network exceptions raised during refresh to those
reasons instead of stringifying the exception. The `reason` field
is already typed `Union[..., str]`, so existing string-based
consumers keep working — additive only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`create_public_client` now passes `api_key=None` and `is_public=True`
to `WorkOSClient`. The base client honors `is_public` by forcing
`_api_key` to `None` and ignoring the `WORKOS_API_KEY` environment
variable, so a public/PKCE client cannot accidentally pick up an
API key from the process environment.

The existing `if self._client._api_key` guards in PKCE token-
exchange paths (`sso/_resource.py`, `user_management/_resource.py`)
keep `client_secret` out of the request body for public clients
without further changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the freshness window only rejected timestamps that were
older than the tolerance — clock skew that put the issued timestamp
in the future was silently accepted, opening the verifier to
replay-style attacks once the attacker's clock drifted forward.

Compare `abs(seconds_since_issued)` against `max_seconds_since_issued`
in both the sync and async `verify_header` implementations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`_decode_u32_leb128` previously kept reading continuation bytes
after the 4th byte (5+ continuations could yield a 35+ bit value
that would later be used as `key_len` for slicing the payload).
Tighten the loop to reject a 5th continuation byte (`i >= 4 and
b & 0x80`) and validate the decoded result against the 32-bit
ceiling (`> 0xFFFFFFFF`) before returning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the truthy `if aad:` / `if associated_data` guards with
explicit `is not None` checks in `_aes_gcm_encrypt`,
`_aes_gcm_decrypt`, and the four `aad_buffer` construction sites.
Previously an empty string passed as `associated_data` (or an empty
`bytes` aad) was silently dropped, so encrypt and decrypt paths
disagreed on the AAD bound to the GCM tag.

Wire-format risk: any locally-persisted vault ciphertext that was
encrypted by an earlier version of this SDK with `associated_data=""`
(or `aad=b""`) was actually written without AAD. After this fix the
SDK now binds an empty AAD, so re-encrypting with the same empty
string will produce a tag the old SDK cannot verify, and decrypting
old empty-AAD ciphertexts must continue to use no AAD. WorkOS
Vault ciphertexts live server-side, so production impact is bounded;
applications that have stashed ciphertexts locally must migrate.
Document in the next release migration note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`tolerance or DEFAULT_TOLERANCE` silently coerced an explicit
`tolerance=0` (caller wants no tolerance) to the default 180-second
window. Use `tolerance if tolerance is not None else DEFAULT_TOLERANCE`
so callers that ask for a strict zero window get one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ruff format --check flagged the ORGANIZATION_SELECTION_REQUIRED return as
exceeding line length.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 7f58e9f closed the future-timestamp replay gap in
`webhooks/_resource.verify_header`, but the same vulnerable
comparison still lived in `actions._verify_signature` and the
standalone `webhooks/_verification.verify_header` helper — both
rejected only past-skewed timestamps, silently accepting issued
timestamps in the future.

Compare `abs(seconds_since_issued)` in both helpers and add
regression tests covering the future-timestamp case for actions,
the webhook resource, and the standalone verifier.
@gjtorikian gjtorikian requested review from a team as code owners May 8, 2026 19:13
@gjtorikian gjtorikian requested a review from awolfden May 8, 2026 19:13
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 8, 2026

Greptile Summary

This PR hardens several security-critical code paths: webhook/action signature verification now rejects future-skewed timestamps with abs() and honours an explicit tolerance=0; create_public_client forces the server credential to None so the process-env value cannot leak into PKCE flows; the LEB128 decoder in vault is bounded to 32 bits; and session-refresh exceptions are mapped to structured enum members instead of raw strings.

  • Path encoding (_base_client.py): all ~115 call-sites converted from bare f-string paths to tuple segments; _encode_path percent-encodes each segment with quote(seg, safe=\"\") and rejects bare strings with a TypeError, preventing ?, #, and % injection into path segments.
  • Vault LEB128 (vault.py): _decode_u32_leb128 now caps the byte-count at 5 and validates the final value against 0xFFFFFFFF, preventing a crafted payload from yielding a key-blob length larger than 32 bits.
  • Session refresh (session.py): _map_refresh_exception_to_reason maps all auth-flow and network exceptions to AuthenticateWithSessionCookieFailureReason enum members; unknown exceptions fall back to str(exc) for backward compatibility.

Confidence Score: 5/5

Safe to merge; all security-relevant changes are internally consistent and the new tests confirm the intended behavior.

The changes are well-scoped and correctly implemented: timestamp skew is now rejected symmetrically with abs(), explicit zero tolerance is honoured, the public-client factory reliably clears the server credential regardless of environment, LEB128 decoding is bounded to 32 bits with two independent guards, and session-refresh exceptions are exhaustively mapped to structured enum members. No new logic errors were found.

No files require special attention beyond what was flagged in prior review rounds.

Important Files Changed

Filename Overview
src/workos/_base_client.py Adds _encode_path static method that percent-encodes tuple segments; all request/build_url methods now require Sequence[str] paths. is_public flag correctly suppresses env-var credential lookup.
src/workos/vault.py _decode_u32_leb128 now raises on 5-byte LEB128 with continuation bit set and checks final value against 0xFFFFFFFF. All KV paths converted to segment tuples.
src/workos/session.py Adds _map_refresh_exception_to_reason mapping all auth-flow and network exceptions to AuthenticateWithSessionCookieFailureReason enum members; unknown exceptions fall back to str(exc).
src/workos/webhooks/_verification.py abs() wraps seconds_since_issued to reject future-skewed timestamps. tolerance is now checked via is not None so an explicit tolerance=0 is honoured instead of falling back to 180s default.
src/workos/actions.py abs() added around seconds_since_issued in _verify_signature; future-dated action timestamps are now correctly rejected.
src/workos/public_client.py create_public_client now passes is_public=True so the client credential is forced to None regardless of what the process environment contains, preventing credential leakage in PKCE flows.
src/workos/webhooks/_resource.py Both sync and async Webhooks.verify_header now use abs() for timestamp comparison and tolerance is not None guard, matching the standalone _verification.py fixes.
tests/test_actions.py New tests cover future-timestamp rejection confirming the abs() fix is exercised.
tests/test_session.py Tests cover _map_refresh_exception_to_reason for all mapped exception types and the generic fallback.

Reviews (5): Last reviewed commit: "fix(client): pass path as sequence of se..." | Re-trigger Greptile

Comment thread src/workos/session.py
@gjtorikian gjtorikian changed the title Harden webhook, vault, session, and base client paths fix: Harden webhook, vault, session, and base client paths May 8, 2026
gjtorikian and others added 3 commits May 8, 2026 12:33
…tured reasons

Refresh failures from MfaEnrollmentError, OrganizationAuthMethodsRequiredError,
AuthenticationMethodNotAllowedError, and RadarChallengeError previously fell
through the AuthenticationError check (they inherit from AuthorizationError)
and surfaced as bare strings. Add enum members and isinstance branches so all
explicit AuthKit flow outcomes resolve to structured reasons.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Regression guard for the standalone _verification.verify_header tolerance
handling: a 1-second-old signature with tolerance=0 must be rejected, so
reverting to the prior `tolerance or DEFAULT_TOLERANCE` form would fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This reverts commit 0c12462.

The original truthy `if aad:` guard was self-consistent within the SDK
(encrypt and decrypt agreed: empty string and None both meant "no AAD"),
and matches the convention used by sibling WorkOS SDKs. Switching to
`is not None` made `aad=""` semantically distinct from `aad=None`, which
is cryptographically more precise but breaks decryption of every vault
ciphertext previously encrypted with `associated_data=""` once a caller
upgrades — the new SDK calls `authenticate_additional_data(b"")` and the
GCM tag (computed without AAD on the encrypt side) will not verify.

The threat being closed (a caller passing "" expecting an empty AAD to
be authenticated, but silently getting none) is a narrow integrity
misperception, not an exploitable hole, and there is no signal anyone
in the wild relies on it. The wire-format break is the larger harm.
A coordinated cross-SDK change with a versioned ciphertext header is
the right path if this ever needs to be revisited.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/workos/_base_client.py Outdated
Callers now pass each path component separately (e.g. ("organizations",
organization_id)) so _encode_path can URL-encode each element with
safe="" before joining. A "/" or ".." inside a caller-supplied id is
now encoded as %2F / %2E%2E and cannot escape its intended segment —
the structural fix Greptile flagged on PR #654.

- _base_client.py: request/request_raw/request_list/request_page and
  build_url (sync + async) take Sequence[str]; _encode_path raises
  TypeError on a bare str so a forgotten tuple wrapper fails loudly.
- Hand-maintained files (session, passwordless, vault, and PKCE/SSO
  helpers fenced inside generated resources) converted to tuples.
- Regenerated all resource files via oagen; the urllib.parse.quote
  import is no longer needed in generated code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gjtorikian gjtorikian merged commit d21f3b4 into main May 11, 2026
10 checks passed
@gjtorikian gjtorikian deleted the updates branch May 11, 2026 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant