Skip to content

feat(session): token refresh on 401 at the session layer (LIB-9)#114

Merged
andrewkoltsov merged 5 commits into
masterfrom
claude/lib-9-token-refresh
May 31, 2026
Merged

feat(session): token refresh on 401 at the session layer (LIB-9)#114
andrewkoltsov merged 5 commits into
masterfrom
claude/lib-9-token-refresh

Conversation

@andrewkoltsov

@andrewkoltsov andrewkoltsov commented May 31, 2026

Copy link
Copy Markdown
Owner

Summary

  • LibrusSession.forChild() clients now transparently refresh the Synergia bearer token and retry once on a 401, instead of surfacing the expired-token error to the caller
  • Concurrent 401s for the same child collapse to a single refresh, and a 401 that persists after refresh throws LibrusAuthenticationError with code AUTH_REFRESH_FAILED (the original 401 preserved as cause)
  • LibrusSession.refreshBearerToken(childId) is also available as a public refresh primitive
  • Standalone new SynergiaApiClient(token) with no callback is unchanged — still throws on 401

Closes LIB-9.

Design notes

  • The single-retry-on-401 loop lives in SynergiaApiClient.withAuthRefresh (where the mutable token slot and callback live). request.ts gains an explicit GET-only/idempotent invariant comment.
  • onAuthInvalidated is an internal-only option (not in exported SynergiaApiClientOptions, not in public .d.ts) — SynergiaApiClientInternalOptions is unexported.
  • 401-refresh and 5xx/backoff retry (LIB-10) remain separate and compose cleanly.
  • latestTokenPerChild map eliminates redundant portal refreshes when a delayed stale-token 401 fires after the in-flight map clears, and ensures forChild(childObject) uses the freshest known token regardless of what the caller's ChildAccount reference holds.
  • Token captured before run() in withAuthRefresh so parallel requests on the same client each report the token they actually used, even if a sibling handler already updated this.accessToken.

Test plan

  • npm run build — type-check passes
  • npm test — 408 tests pass (24 test files), including new tests:
    • Refresh once on 401 → retry with fresh Bearer
    • Stampede: parallel 401s → exactly one getFreshSynergiaAccount call
    • Second 401 after refresh → AUTH_REFRESH_FAILED + cause = original 401
    • Standalone client unchanged on 401
    • Secret-leak guard: no token/password appears in logs or errors
    • Delayed same-client parallel 401 (deterministic gate) collapses to one portal refresh
    • forChild(childObject) after refreshBearerToken starts with fresh token
    • refreshBearerToken always triggers a portal call regardless of cache
  • npm run lint — clean

🤖 Generated with Claude Code

Session-created child clients transparently refresh the Synergia bearer
token and retry once on a 401, instead of surfacing the raw error.
Concurrent 401s for the same child collapse to a single portal round-
trip (stampede protection). A 401 that persists after the fresh token
throws LibrusAuthenticationError with code AUTH_REFRESH_FAILED, with
the original 401 preserved as cause. Standalone SynergiaApiClient with
no callback is unchanged. Adds LibrusSession.refreshBearerToken(childId)
as a public refresh primitive.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@andrewkoltsov

andrewkoltsov commented May 31, 2026

Copy link
Copy Markdown
Owner Author

Code review

Several issues should be addressed before merging:

  1. [P2] forChildWiadomosci() does not receive the token refresh callback (src/sdk/LibrusSession.ts:459). The returned client is still used for API 3.0 methods such as getGrades(), message attachments, and receiver groups. Those methods currently surface a raw LibrusApiError on 401, unlike clients created by forChild(). Please pass onAuthInvalidated: () => this.refreshBearerToken(child.id) here as well.

  2. [P2] The refreshed token is not persisted in session-owned state (src/sdk/LibrusSession.ts:431). refreshBearerToken() returns a fresh token, but accountsCache still contains the stale value. As a result, after await session.refreshBearerToken(101), a subsequent await session.forChild("101") starts with the stale token again, incurs an unnecessary 401, and performs another portal refresh. The session should maintain the latest token per child id or update the cached account entry.

  3. [P3] cause does not match the documented contract (src/sdk/synergia/SynergiaApiClient.ts:338). The CHANGELOG and test state that the original 401 is preserved, but the implementation stores the second 401 (retryError) as cause. Please either use cause: error or update the contract and strengthen the test with an identity assertion.

  4. [P3] The internal callback is exposed in the public API, while the new public method is undocumented (src/sdk/synergia/SynergiaApiClient.ts:233, README.md). onAuthInvalidated is present in the generated public .d.ts despite being described as internal-only. Meanwhile, the public LibrusSession.refreshBearerToken() method is missing from the README session-method list.

Verified locally: npm test (405 tests), npm run build, npm run lint, npm run format:check, npm run pack:check, and git diff --check. All GitHub checks were green at review time.

- forChildWiadomosci() now receives the onAuthInvalidated callback so
  API 3.0 methods on those clients also benefit from token refresh
- performBearerTokenRefresh() updates accountsCache after a successful
  refresh so a subsequent forChild() starts with the fresh token instead
  of the now-stale one stored at login time
- withAuthRefresh: use cause: error (first/original 401) instead of
  cause: retryError (second 401), consistent with CHANGELOG and docs
- onAuthInvalidated moved from exported SynergiaApiClientOptions to an
  internal SynergiaApiClientInternalOptions interface so it no longer
  appears in the public .d.ts as a settable option
- refreshBearerToken() documented in README session methods list
- Test: identity-checks that cause was the pre-refresh 401 (stale token)
  not the post-refresh one (fresh token)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@andrewkoltsov

Copy link
Copy Markdown
Owner Author

Follow-up code review

The previous findings around forChildWiadomosci(), the preserved cause, the public .d.ts, and README documentation have been addressed. One session-state issue remains, with two observable cases:

  1. [P2] Delayed parallel 401s can still trigger multiple portal refreshes (src/sdk/LibrusSession.ts:415). refreshInFlight deduplicates only while the refresh promise is still pending. If two clients start requests with the same stale token, the first 401 can refresh to fresh-1 and clear the map before the second stale request returns 401. The second client then performs another portal round-trip and refreshes to fresh-2. This does not satisfy the stated single-refresh behavior for concurrent stale requests. A session-owned latest-token value or token generation/version check is needed so a delayed stale failure can reuse the token that was already refreshed.

    Reproduced request sequence: Bearer stale, Bearer stale, Bearer fresh-1, Bearer fresh-2; portal refresh count: 2.

  2. [P3] Reusing a previously returned ChildAccount still bypasses the refreshed cache (src/sdk/LibrusSession.ts:394). Updating accountsCache fixes forChild("101"), but forChild(child) continues to read child.accessToken directly. A common flow such as const [child] = await session.listChildren(); await session.refreshBearerToken(child.id); await session.forChild(child) therefore starts with the stale token and incurs another 401 plus portal refresh. Keeping the latest token in a session-owned map keyed by child id would cover both selector and object-based calls.

Verified locally on head 9ec7b06: npm test (405 tests), npm run build, npm run lint, npm run format:check, npm run pack:check, and git diff --check. The corrected forChildWiadomosci() refresh path was also exercised manually. All GitHub checks are green.

Add latestTokenPerChild map to LibrusSession so that:

- A delayed 401 that fires after the in-flight refresh has already
  cleared can detect the superseded token (staleToken arg passed by
  onAuthInvalidated) and return the cached fresh token without a second
  portal round-trip. Previously the in-flight dedup only helped while
  the promise was still pending.

- forChild(childObject) always starts with the freshest known token,
  even when the caller holds a stale ChildAccount reference from before
  a manual refreshBearerToken() call.

onAuthInvalidated now receives the client's stale token as an argument
so LibrusSession.refreshBearerToken() can compare it against the latest
known value. The public refreshBearerToken(childId) signature is
unchanged for external callers (staleToken is an optional second arg).

Two new tests cover both scenarios.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@andrewkoltsov

Copy link
Copy Markdown
Owner Author

Follow-up code review

The two previously reported cross-client stale-token cases are fixed on e6e311c: separate clients now reuse the session-owned latest token, and forChild(childObject) starts with the refreshed token. Three issues remain:

  1. [P2] Parallel requests on the same SynergiaApiClient can still perform redundant portal refreshes (src/sdk/synergia/SynergiaApiClient.ts:337). withAuthRefresh() reads this.accessToken after the failed request returns. If two requests were sent with stale, the first request can refresh the mutable field to fresh-1 before the second stale request returns 401. The second handler then incorrectly reports fresh-1 as its stale token, so refreshBearerToken() cannot recognize that the failure came from the superseded stale token and performs a second portal refresh to fresh-2.

    Reproduced with one client and two parallel getLuckyNumber() calls: request sequence Bearer stale, Bearer stale, Bearer fresh-1, Bearer fresh-2; portal refresh count: 2.

    Capture the token used for the initial attempt before invoking run(), then pass that captured value to onAuthInvalidated. The new delayed-401 test calls refreshBearerToken() directly, so it does not cover this client-level race.

  2. [P3] forChildBff(childObject) still ignores the session-owned latest token (src/sdk/LibrusSession.ts:527). After await session.refreshBearerToken(child.id), passing a previously returned stale ChildAccount to forChildBff(child) still constructs BffApiClient with child.accessToken. The BFF request is therefore sent with Bearer stale. Use latestTokenPerChild.get(String(child.id)) ?? child.accessToken, as the other child-client factories now do.

  3. [P3] The internal stale-token optimization is exposed as a public argument (src/sdk/LibrusSession.ts:426). refreshBearerToken(childId, staleToken?) is emitted in the public .d.ts, even though the second argument is documented as internal callback plumbing. It also changes the semantics of an explicit public refresh: a caller passing any token different from the cached one skips the portal round-trip. Keep the public method as refreshBearerToken(childId) and move the stale-token-aware path into a private helper used by onAuthInvalidated.

Verified locally on head e6e311c: npm test (407 tests), npm run build, npm run lint, npm run format:check, npm run pack:check, and git diff --check. All GitHub checks are green.

SynergiaApiClient.withAuthRefresh now captures this.accessToken before
run() so that concurrent sibling requests each report the token they
actually used, not the value a racing handler may have already updated.
This lets acquireFreshToken() correctly detect superseded stale tokens.

LibrusSession: split refreshBearerToken (public, no stale-token arg)
from acquireFreshToken (private helper used by onAuthInvalidated).
The stale-token parameter no longer appears in the public .d.ts.

forChildBff(childObject) now reads latestTokenPerChild so a previously
refreshed token is used instead of child.accessToken.

Test: replace the direct-refreshBearerToken stale-token dedup test with
a same-client parallel-request test that exercises the real code path;
add assertion that explicit double-refresh does two portal calls.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@andrewkoltsov

Copy link
Copy Markdown
Owner Author

Follow-up code review

No blocking findings on 0144512.

The three previously reported issues are addressed:

  • same-client parallel 401s now capture the token used before the request and collapse to one portal refresh;
  • forChildBff(childObject) now uses the session-owned latest token;
  • stale-token deduplication moved behind the private acquireFreshToken() helper, restoring the public refreshBearerToken(childId) signature.

I re-ran the delayed same-client race manually with a deliberately late second 401. The observed request sequence is now Bearer stale, Bearer stale, Bearer fresh-1, Bearer fresh-1, with exactly one portal refresh. I also verified that a BFF client created from a stale ChildAccount sends Bearer fresh-1 after a manual refresh.

Non-blocking test cleanup: the test named reuses the already-refreshed token when a delayed 401 fires after the in-flight refresh cleared now checks repeated explicit refresh semantics rather than a delayed 401, and the new same-client test does not intentionally delay the second response. Consider renaming the former and making the latter delay deterministic so the original race remains covered. The PR description also still says 405 tests; the current suite has 408.

Verified locally: npm run validate (including 408 tests with coverage), git diff --check, and the manual race/BFF scenarios above. All GitHub checks are green.

Rename "reuses the already-refreshed token when a delayed 401 fires
after the in-flight refresh cleared" → "refreshBearerToken always
triggers a portal call regardless of the latest-token cache" to match
what the test now verifies (explicit double-refresh semantics).

Make the same-client parallel-401 test deterministic: gate the second
stale-token 401 response with a Promise, release it via vi.waitFor
after getFreshSynergiaAccount has been called (macrotask boundary
guarantees latestTokenPerChild is set). This explicitly exercises the
"delayed 401 after in-flight cleared" path rather than relying on
Promise.all scheduling.

Also update PR description test count from 405 to 408.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@andrewkoltsov

Copy link
Copy Markdown
Owner Author

Follow-up code review

No findings on 89180b4.

The remaining non-blocking test cleanup from the previous review is addressed:

  • the explicit-refresh test now has an accurate name;
  • the same-client parallel-401 regression test deterministically holds the second stale response until the first refresh has populated latestTokenPerChild;
  • the PR description now reports the current 408-test suite and documents the new coverage.

I re-ran the delayed same-client race manually. The observed sequence remains Bearer stale, Bearer stale, Bearer fresh-1, Bearer fresh-1, with exactly one portal refresh.

Verified locally: npm run validate, git diff --check, and the manual delayed-race scenario. All GitHub checks are green.

@andrewkoltsov andrewkoltsov merged commit 33013f8 into master May 31, 2026
11 checks passed
@andrewkoltsov andrewkoltsov deleted the claude/lib-9-token-refresh branch May 31, 2026 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant