perf(editor): ttl-refresh traject index snapshots, bound body cache, implements index by tdjager · Pull Request #794 · MinBZK/regelrecht

tdjager · 2026-06-11T13:23:34Z

Problem (audit advice #9)

Three staleness/perf problems on the editor-api read paths:

Stale-forever traject caches — TrajectCorpusCache entries were only invalidated on traject create/delete. The per-traject source-map index snapshot never picked up upstream changes (new laws merged, re-harvests, saves on another replica) until process restart, and the lazily-fetched body cache was unbounded and never refreshed either.
O(corpus) display names per request — GET /corpus/laws called resolve_display_name per law per request: a line scan of every body, plus a full serde_yaml_ng parse for name: '#ref' laws. For the unfiltered list that is O(total corpus bytes) per page load.
O(corpus) implementor discovery with serialized fetches — implementors_in_scope lazily fetched every metadata-only law's body sequentially under the per-source mutex on every request, and silently skipped fetch failures, so a throttled scan returned a silently incomplete list.

Changes

TTL + refresh on the per-traject index snapshot (`TRAJECT_INDEX_TTL = 60s`)

Follows the existing CHANGED_LAWS_TTL precedent (same 60s convergence target). The first request past the deadline re-enumerates the sources and swaps in a fresh SourceMap; concurrent requests keep serving the stale snapshot while one refresh is in flight (single-flighted per traject), and a failed refresh re-arms the stale snapshot for another TTL window instead of erroring reads. A refresh where any source fails to enumerate is treated as failed — serving the previous complete snapshot beats swapping in one with thousands of laws missing. The failed-refresh warn is rate-limited (first failure and every 10th consecutive one; the rest log at debug) so a permanently broken source doesn't re-warn every minute.

Why this is safe w.r.t. saves: the refreshed instance carries over

the backend map (same Arc<Mutex<…>> instances — an in-flight save keeps excluding writers across the swap, and the writable-own → seed lock-ordering invariant from fix(editor): optimistic concurrency (ETag/If-Match) for law and scenario saves #790 is untouched);
the post-save overlay (shared Arc) — a refresh never resurrects pre-save content, and a record_save racing the swap on the old instance is visible through the new one;
the changed-laws cache (shared Arc) — a refresh cannot undo save_law's invalidation;
the implements index (served stale while rebuilt) and its cross-snapshot memo — see below.

Overlay reconciliation at refresh (read-your-writes-or-newer)

The overlay is carried across refreshes, but no longer unconditionally: each refresh compares every overlay entry against its write-target branch file (one per-entry backend lock — never held across the loop, and no seed lock, so the writable-own → seed lock order can't be violated) and drops the entry when the branch content no longer equals the saved body. That is the cross-replica convergence story: a law saved on this process serves the saved body until someone else (another replica, a direct push) overwrites it on the branch, after which reads converge to the newer branch content within a TTL window — instead of this process pinning its own save forever, which made every cross-replica If-Match save 412 against content the loser could never fetch. A read error keeps the entry (a network blip must not drop a save), and removal is compare-and-remove so a save landing concurrently with the reconcile pass is never dropped.

Bounded, per-snapshot body cache (`BODY_CACHE_MAX_ENTRIES = 1024`)

The save overlay and the lazy-read cache used to share one unbounded map. They are now split: saves stay in the overlay (bounded by laws edited in the traject, survives refreshes), lazily fetched bodies live in a FIFO-bounded per-snapshot cache that the refresh discards — that is how upstream content changes (not just index changes) converge within the TTL. FIFO instead of LRU keeps cache hits on the cheap read guard; at ~tens of KB per body the cap bounds the cache at low tens of MB per traject.

Implements index: stale-while-revalidate + content-addressed memo

LoadedLaw now carries display_name and implements, derived once (one shared YAML parse) at load / update_yaml_content time in regelrecht-corpus. The law list reads display_name straight off the entry; the global implementors lookup is an in-memory reverse scan.
Traject scopes build an implements index on the first lookup of the process — the one O(corpus) body scan. A TTL refresh does not repeat it: the previous index is carried over flagged stale, the first post-refresh lookup rebuilds it under a try_lock (same single-flight pattern as the snapshot refresh) while concurrent lookups keep serving the carried index — the federated implementors hot path (fix(editor): stop scenario dependency-scan request storm + un-gate result buttons #762) never queues behind a rescan again.
The rebuild itself is content-addressed: the GitHub Trees enumeration's blob shas are now plumbed through regelrecht-corpus onto metadata-only LoadedLaw entries (content_sha), and the editor keeps a cross-snapshot (source_id, blob_sha) → implements memo (shared Arc, like the overlay). A rebuild fetches/parses only bodies whose blob changed; an unchanged corpus rebuilds with zero Contents API calls. Overlay-saved laws are parsed from the overlay body (never memoized under the pre-save sha), loaded (local) laws reuse the corpus-parsed list, and each rebuild swaps the memo wholesale so it stays bounded by the current corpus size. record_save still updates the live index in place.
Fetch failures during a scan are recorded and reported (the build warns once with skipped/indexed counts; the per-request signal is debug) instead of being indistinguishable from "no implementors"; the failed set is retried at the next rebuild. The response is a bare id array, so there was no non-breaking field for a partiality flag — left as a possible follow-up if the frontend wants to surface it.

Small fix

The global scenario GET leaked raw backend error text via e.to_string(); it now uses the same generic-message/log-details pattern as corpus_write_error.

Possible follow-up

A partial refresh (keep healthy sources' fresh enumeration when one source fails, instead of all-or-nothing serve-stale) would tighten convergence further for multi-source trajects with one flaky seed; deliberately out of scope here.

Tests

corpus: display-name/implements precompute on local load, fetched files, metadata-only entries (incl. content_sha), and update_yaml_content recompute.
editor-api lib: body-cache FIFO bound, TTL freshness rule, lazy-fetch caching + overlay precedence, implements-index build/reuse, fetch-failure reporting, record_save in-place index updates, refresh carry-over semantics (shared backends/overlay/changed-cache), SWR (carried index served while a rebuild holds the build lock), memo (unchanged corpus rebuilds with zero fetches; a changed sha refetches only that body), and overlay reconciliation (entry kept while branch == saved body, dropped when the branch moved past the save).
editor-api integration (traject_reads_test, testcontainers): end-to-end TTL refresh through the real handlers — an upstream-added law appears after the refresh, a fresh save survives it (read-your-writes), and an external overwrite of a saved law converges to the external content instead of being pinned. All 11 tests green.

Note: save_annotations_test and trajects_test do not compile on origin/main either (stale imports like SESSION_KEY_ACTIVE_TRAJECT); untouched here.

just format, just lint, just build-check all green.

…load time Listing endpoints called resolve_display_name per law per request (a line scan of every body, plus a full serde_yaml parse for laws with 'name: #ref'), and implementor discovery re-parsed every body per request. Derive both fields once, with a single shared YAML parse, when a law is loaded (local scan or fetched file) and recompute them in update_yaml_content; metadata-only index entries keep them empty until their body is known.

…or lookups Three read-path problems in the per-traject corpus: - The cached TrajectCorpus was only invalidated on traject create or delete, so its index snapshot never picked up upstream changes (new laws merged, re-harvests, saves on another replica) until a process restart. The snapshot now expires after TRAJECT_INDEX_TTL (60s, the CHANGED_LAWS_TTL precedent): the first request past the deadline re-enumerates the sources and swaps in a fresh SourceMap while the backend mutexes, the post-save overlay and the changed-laws cache carry over, so an in-flight save keeps its lock and a refresh can never resurrect pre-save content. Concurrent requests serve the stale snapshot while one refresh runs; a failed refresh re-arms the stale snapshot for another TTL instead of erroring reads. - The lazily-fetched law-body cache was unbounded and shared a map with the read-your-writes save overlay. They are now split: saves stay in the (per-edit-bounded, refresh-surviving) overlay, lazy fetches live in a per-snapshot FIFO-bounded cache (1024 entries) that a refresh discards, so upstream body changes converge within the TTL. - implementors_in_scope fetched and parsed every law body sequentially on every request and silently skipped fetch failures. Traject scopes now build a per-snapshot implements index (one O(corpus) scan, updated in place by record_save, discarded with the snapshot) and report fetch-skipped laws via a warn log instead of passing off an incomplete scan as 'no implementors'; the global scope uses the implements lists precomputed at load time. Also routes the global scenario-read error through the generic-message/ log-details pattern instead of leaking raw backend error text.

github-actions · 2026-06-11T13:35:37Z

Preview Deployment — harvester-admin

Your changes have been deployed to a preview environment:

URL: https://harvester-admin-pr794-regel-k4c.rig.prd1.gn2.quattro.rijksapps.nl

This deployment will be automatically cleaned up when the PR is closed.

github-actions · 2026-06-11T13:37:00Z

Preview Deployment — harvester-worker

Your changes have been deployed to a preview environment:

URL: https://harvester-worker-pr794-regel-k4c.rig.prd1.gn2.quattro.rijksapps.nl

This deployment will be automatically cleaned up when the PR is closed.

github-actions · 2026-06-11T13:37:57Z

Preview Deployment — enrichworker

Your changes have been deployed to a preview environment:

URL: https://enrichworker-pr794-regel-k4c.rig.prd1.gn2.quattro.rijksapps.nl

This deployment will be automatically cleaned up when the PR is closed.

github-actions · 2026-06-11T13:40:57Z

Preview Deployment — editor — editor

Your changes have been deployed to a preview environment:

URL: https://editor-pr794-regel-k4c.rig.prd1.gn2.quattro.rijksapps.nl

This deployment will be automatically cleaned up when the PR is closed.

… save overlay Review follow-ups on the TTL-refresh snapshot caches: - Stale-while-revalidate for the implements index: a TTL refresh carries the previous index over flagged stale instead of discarding it. The first post-refresh lookup rebuilds it under a try_lock (snapshot pattern) while concurrent lookups keep serving the carried index, so the implementors hot path never queues behind a corpus rescan again. - Content-addressed rebuild memo: the Trees enumeration's blob shas are now plumbed through regelrecht-corpus onto metadata-only LoadedLaw entries (content_sha), and the editor keeps a cross-snapshot (source_id, sha) -> implements memo shared like the overlay. A rebuild only fetches/parses bodies whose blob actually changed; an unchanged corpus rebuilds with zero Contents API calls. Each rebuild swaps the memo wholesale so it stays bounded by the current corpus size. - Overlay reconciliation at refresh: every overlay entry is compared against its write-target branch file (one per-entry backend lock, no seed lock) and dropped when the branch moved past the save — another replica's edit or a direct push now converges within a TTL window instead of pinning this process's save forever and 412-looping every cross-replica edit. A read error keeps the entry, and removal is compare-and-remove so a concurrent save is never dropped. - Demote the per-request "implementors incomplete" warn to debug (the index build already warns once with counts). - Rate-limit the failed-refresh warn: first failure and every 10th consecutive one warn, the rest log at debug; the counter resets on a successful refresh. Tests: SWR serving while a rebuild holds the lock, memo skipping unchanged bodies (and refetching changed ones), overlay kept while the branch matches the save and dropped when it moved, plus the integration test now asserting cross-replica convergence instead of forever-pinned saves.

tdjager added 2 commits June 11, 2026 13:18

tdjager deployed to pr794 June 11, 2026 13:33 — with GitHub Actions View deployment

tdjager had a problem deploying to pr794 June 11, 2026 14:23 — with GitHub Actions Failure

tdjager changed the title ~~perf(editor): TTL-refresh traject index snapshots, bound body cache, implements index~~ perf(editor): ttl-refresh traject index snapshots, bound body cache, implements index Jun 11, 2026

tdjager had a problem deploying to pr794 June 11, 2026 14:37 — with GitHub Actions Failure

tdjager marked this pull request as ready for review June 11, 2026 15:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(editor): ttl-refresh traject index snapshots, bound body cache, implements index#794

perf(editor): ttl-refresh traject index snapshots, bound body cache, implements index#794
tdjager wants to merge 3 commits into
mainfrom
perf-editor-api-index-caches

tdjager commented Jun 11, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 11, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 11, 2026

Uh oh!

github-actions Bot commented Jun 11, 2026

Uh oh!

github-actions Bot commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tdjager commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem (audit advice #9)

Changes

TTL + refresh on the per-traject index snapshot (TRAJECT_INDEX_TTL = 60s)

Overlay reconciliation at refresh (read-your-writes-or-newer)

Bounded, per-snapshot body cache (BODY_CACHE_MAX_ENTRIES = 1024)

Implements index: stale-while-revalidate + content-addressed memo

Small fix

Possible follow-up

Tests

Uh oh!

github-actions Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Preview Deployment — harvester-admin

Uh oh!

github-actions Bot commented Jun 11, 2026

Preview Deployment — harvester-worker

Uh oh!

github-actions Bot commented Jun 11, 2026

Preview Deployment — enrichworker

Uh oh!

github-actions Bot commented Jun 11, 2026

Preview Deployment — editor — editor

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tdjager commented Jun 11, 2026 •

edited

Loading

TTL + refresh on the per-traject index snapshot (`TRAJECT_INDEX_TTL = 60s`)

Bounded, per-snapshot body cache (`BODY_CACHE_MAX_ENTRIES = 1024`)

github-actions Bot commented Jun 11, 2026 •

edited

Loading