Skip to content

kaizen: Embed smalltable#535

Merged
timbray merged 37 commits into
timbray:mainfrom
sayrer:embed-smalltable
Jun 10, 2026
Merged

kaizen: Embed smalltable#535
timbray merged 37 commits into
timbray:mainfrom
sayrer:embed-smalltable

Conversation

@sayrer

@sayrer sayrer commented May 31, 2026

Copy link
Copy Markdown
Contributor

This one carries #521 further, fixes some bugs that caused buildshellstyle quadratic behavior, and also speeds up Nfa2dfa.

sayrer and others added 7 commits May 30, 2026 14:33
Adds tableShareKey, a stable identifier for the share group of
slice-headers inside a smallTable. After smallTable is embedded
into faState by value, two states that hold copies of one source
smallTable still share the underlying steps/ceilings/epsilons
backing arrays — and a key built from SliceData(steps) + len(steps)
captures that equivalence. This replaces *smallTable-pointer-identity
as the dedup key in the next commit.

No behavioral change yet; this commit only adds the helper and unit
tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switches closureBuffers.tables from map[*smallTable]*tableMark to
map[tableShareKey]*tableMark. With *smallTable pointer identity
still intact (pre-embed), the new key is equivalent: two states
that share a *smallTable trivially share their steps backing.

No behavioral change. Sets up the next commit, which embeds
smallTable in faState by value — at which point pointer-identity
goes away but slice-backing identity remains.
faState.table changes from *smallTable to smallTable (inline). This
shrinks per-state memory:
  - faState 64B + smallTable 80B-class = 144B per state pair
  - embedded faState fits 128B size class = 128B per state
  - saves 16B/state on workloads without table-pointer sharing

vmFields.startTable becomes startState *faState (the start state
owns the start table inline). traverseNFA, epsilonClosure, and
related helpers take *faState instead of *smallTable.

Epsilon-closure dedup continues to work via tableShareKey (added in
the previous two commits) — value-copies of one source smallTable
still share their slice backings, and SliceData(steps) is the new
identity.

Size-assertion tests are not yet recalibrated; that follows in the
next commit.
Per-state size shrunk (faState now 128B with inline smallTable). Update
hand-calibrated constants in TestMcBasicSizes, TestQuaminaMemoryCost,
TestMcNfaSizes. Update TestPP s/t numbers and restore trailing whitespace
in wanted literal (producer still emits "[s/t X/Y] \n" with the trailing
space before the newline).

TestTablePointerDedup's tableSharing and totalEntries expectations also
needed updating — the dedup metric now uses slice-backing identity
(via tableShareKey from earlier commits) rather than *smallTable pointer
identity, so value-copies of a source smallTable register as shared.
Recalibrated to the new ground-truth values.

Minor cleanups: stale "startTable" reference in TestQuaminaMemoryCostSingleton
comment (renamed to startState); inline a dead local in copyShellNode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Port of the build-context-extract fix (6947edf) to the embedded-smallTable
design. embed-smalltable inherited the same two regressions from the shared
32dc2a9 ancestor; shellstyle build at 1000 words was ~7x slower than main.

1. closureForNfa's walk dedup was broken. The refactor collapsed two
   independent counters into a single bufs.gen that closureForState mutates,
   so the walk's visited check never matched after the first state and the
   heavily-shared shellstyle graph got re-traversed (O(V*E)). Restored via
   bufs.walkGen, a snapshot closureForState never touches.

2. Per-call allocation. epsilonClosure allocated fresh maps per call and
   tableMarkOf heap-allocated a *tableMark per share group. closureBuffers
   are now pooled (sync.Pool, GC-reclaimable so no steady-state cost), maps
   are reused via a monotonic generation (no clearing), and tableMark is
   stored by value.

Unlike the *smallTable-keyed bce branch, the walk here dedups by *faState
identity rather than tableShareKey. Share-key dedup is unsafe for the walk:
distinct states can share a steps backing array yet have different epsilons,
and the zero key collapses all no-byte tables. The faState pointer is the
natural unique identity now that smallTable is embedded by value. The
post-pass table-pointer dedup keeps tableShareKey (collision-safe there, as
it re-checks sameFieldTransitions) and now skips the zero key explicitly.

Shellstyle build at 1000 words: 1363ms -> 473ms. Full suite passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
nfa2Dfa/intern is slated to go live. Profiling BenchmarkNfa2Dfa showed it
~64% slower than main, entirely inside intern()'s dedup: clear(sl.seen) plus
a per-state map assign into a map[*faState]struct{} cost ~600ms where main's
faState.closureSetGen generation-counter compare cost ~50ms. That field was
removed to shrink steady-state memory, so the map was the fallback.

intern already sorts the state set by pointer to build a canonical key, so
duplicates are adjacent after the sort. Replacing the map-based dedup with
slices.Compact over the sorted buffer removes the map (and its clear()) with
no per-faState field and no extra sort — sorting was already happening.

Nfa2Dfa vs main (geomean, n=6): time +63.8% -> +2.65%, B/op -1.05% -> -1.65%,
allocs/op -7.8% -> -9.6%. Full suite passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts:
#	epsi_closure.go
#	memory_cost_test.go
#	state_lists.go
@timbray

timbray commented May 31, 2026

Copy link
Copy Markdown
Owner

(afk for a while, will take a closer look)

@sayrer sayrer changed the title Embed smalltable kaizen: Embed smalltable May 31, 2026
@timbray

timbray commented May 31, 2026

Copy link
Copy Markdown
Owner

OK, checked out 535 and ran the research-main CSV generator. I'll attach the xlsx files and a graph of sec/100-addP-calls

may30 pr535 [2026-05-31.xlsx](https://github.com/user-attachments/files/28443966/2026-05-31.xlsx) [2026-05-30.xlsx](https://github.com/user-attachments/files/28443967/2026-05-30.xlsx)

@timbray

timbray commented May 31, 2026

Copy link
Copy Markdown
Owner

Now, granted, all these AddPattern calls are wildcards, and the epsilon merging gets kind of ugly, so probably not a representative workload.

@timbray

timbray commented May 31, 2026

Copy link
Copy Markdown
Owner

I'm just trying to figure out whether I care about this. Don't have an opinion yet.

timbray and others added 2 commits May 31, 2026 14:34
Signed-off-by: Tim Bray <tbray@textuality.com>
(cherry picked from commit d89e1e4)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sayrer

sayrer commented May 31, 2026

Copy link
Copy Markdown
Contributor Author

I just got that workload, seems like I can do some damage to it with some small changes.

sayrer and others added 5 commits May 31, 2026 14:55
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Makes AddPattern's epsilon closure incremental. closureForNfa now owns the
closureForState call and prunes any state already closed by a prior
epsilonClosure call, so each add walks only the states it created instead of
re-walking the whole growing NFA (was O(N^2)). closureForState is moved out of
the caller loops into closureForNfa so the prune cannot fire on a state closed
earlier in the same walk.
@timbray

timbray commented May 31, 2026

Copy link
Copy Markdown
Owner

I gotta say, if those numbers are correct, the permanent memory saving is just not that dramatic and the AddPattern impact is pretty significant. So maybe the thing to do is back out the last PR and then cherrypick the good stuff from the faState/smallTable branch with the add-ons that made nfa2Dfa faster and so on?

@sayrer

sayrer commented May 31, 2026

Copy link
Copy Markdown
Contributor Author

I think I can get it with no regression, just measuring.

Per-walk reset of nfaWalkCount (was a pooled lifetime total); fix stale
'post-pass below' reference; document the mergeFAs ancestor invariant that
makes the prune safe. No logic change beyond the counter reset.
@sayrer

sayrer commented May 31, 2026

Copy link
Copy Markdown
Contributor Author

Even just the first step went well. The closure walk was improved, and had been going over the more space-efficient structures too much.

That's here:
5fe237b

  Incremental epsilon closure — results

  Plan executed (commits on embed-smalltable-research): spec 5fe237b → plan
  534327a → immutability gate 18547f6 → order-independence guard a6c44a9 →
  prune 5976a4f → review nits f96e39a (plus 20c980d research -cpuprofile
  flag).

  The win (10k-pattern research harness):

  ┌──────────────────────┬──────────────┬─────────────────────────────────┐
  │                      │ before prune │           after prune           │
  ├──────────────────────┼──────────────┼─────────────────────────────────┤
  │ patterns/sec         │ 133–140      │ 418 (3×)                        │
  ├──────────────────────┼──────────────┼─────────────────────────────────┤
  │ wall time            │ ~72 s        │ 24.6 s                          │
  ├──────────────────────┼──────────────┼─────────────────────────────────┤
  │ build-scale @ N=1000 │ 473 ms       │ 225 ms (~2.1×)                  │
  ├──────────────────────┼──────────────┼─────────────────────────────────┤
  │ closureForNfa flat   │ 8.46 s       │ 0.06 s (walk re-traversal gone) │
  └──────────────────────┴──────────────┴─────────────────────────────────┘

@sayrer

sayrer commented May 31, 2026

Copy link
Copy Markdown
Contributor Author

So, this is already getting close on AddPattern, and the footprint improvements improve the Matcher.

@timbray

timbray commented May 31, 2026

Copy link
Copy Markdown
Owner

Is this PR up-to-date with your latest?

@sayrer

sayrer commented Jun 1, 2026

Copy link
Copy Markdown
Contributor Author

No, this is on https://github.com/sayrer/quamina/tree/embed-smalltable-research, which has everything in this PR, your research PR, and further patches to see if the AddPattern cost is really necessary. I didn't want to pile them all in to one review, but I did want to see whether it's possible to avoid any unpleasant trade-offs, and that's looking promising.

@timbray

timbray commented Jun 1, 2026

Copy link
Copy Markdown
Owner

Oh, ok, I'll pull that and have a look around tomorrow sometime.

sayrer and others added 2 commits May 31, 2026 18:06
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@timbray

timbray commented Jun 5, 2026

Copy link
Copy Markdown
Owner

OK, this PR now stable from your POV?

@sayrer

sayrer commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

yes

@timbray timbray left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are three distinct things in here:

  1. Mostly have things start on faState rather than smallTable, including the field in valueMatcher. I have wanted to do this for like a year, so this is great. I think this job perhaps isn't 100% done (see comments on arguments to mergeFAs and valueMatcher generally) but probably don't need to block on that. I may be able to go back and drastically simplify {"exists":true} matching depending simply on whether vm.start is nil or not.
  2. Turn state.table from *smallTable to smallTable. Excellent.
  3. Refactor epsilonClosure. Frankly I don't understand what's happening here and there are a lot of fields to support it. I've suggested adding a large overview comment, probably sourced from your Claude dialog. I'd prefer not to add the Claude stuff in docs/ to the project as a thing in itself.

Anyhow, this is huge and almost ready, thanks!

Comment thread dedup_key.go
Comment thread dedup_key_test.go
Comment thread epsi_closure.go
Comment thread epsi_closure.go
Comment thread epsi_closure.go Outdated
Comment thread state_lists.go
@@ -38,23 +36,16 @@ func newStateLists() *stateLists {
// which either has already been computed for the set or is created and empty, and
// a boolean indicating whether the DFA state has already been computed or not.
func (sl *stateLists) intern(list []*faState) ([]*faState, *faState, bool) {

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be interested in what the profiler says. This used to dominate nfa2Dfa performance…

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Profiled BenchmarkNfa2Dfa (M5, CPU profile). Intern still dominates nfa2Dfa, ~65% of its CPU (1.94s of 3.0s cum).

But the cost has moved. After this branch's changes (sort+compact dedup, reusable scratch buffers, PutUint64 key-build, alloc-free string(keyBuf) lookup), the marshalling is now cheap and the hot line is the map lookup itself:

┌───────────────────────────────┬───────────────────────┬─────────────────────────┐
│             line              │         cost          │          what           │
├───────────────────────────────┼───────────────────────┼─────────────────────────┤
│ sl.entries[string(sl.keyBuf)] │ 1.08s — 56% of intern │ string-hash + map probe │
├───────────────────────────────┼───────────────────────┼─────────────────────────┤
│ slices.SortFunc               │ 280ms                 │ canonical-key sort      │
├───────────────────────────────┼───────────────────────┼─────────────────────────┤
│ append(sortBuf[:0], list...)  │ 250ms                 │ copy input to scratch   │
├───────────────────────────────┼───────────────────────┼─────────────────────────┤
│ key-build loop (PutUint64)    │ 130ms                 │ per-state               │
├───────────────────────────────┼───────────────────────┼─────────────────────────┤
│ rest (compact / miss alloc)   │ ~140ms                │                         │
└───────────────────────────────┴───────────────────────┴─────────────────────────┘

So the code around the lookup is about as tight as it gets; the residual is the irreducible set-dedup hash. Further wins would need a cheaper key/hash or fewer intern calls, not trimming the surrounding code. (The big madvise/GC in the raw profile is n2dNode building DFA states across nfa2Dfa, not intern.)

Comment thread value_matcher.go Outdated
Comment thread value_matcher.go
Comment thread value_matcher.go Outdated
Comment thread epsi_closure.go
sayrer and others added 8 commits June 7, 2026 17:57
Restore the explanation of what closureRep holds (the representative faState
for a table-share group that others collapse onto) and document closureGen
(the dedup generation a mark was last written in), addressing review feedback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Share groups only arise from copying a whole steps slice-header (the spinner
merges in nfa.go), so two tables sharing the data pointer always share the
length too; nothing in the package reslices steps. Pointer identity alone
identifies a share group, so the length never breaks a tie the pointer did
not already break. Reduce the key to just the steps backing-array pointer and
document why, addressing review feedback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
smallTable.epsilons only ever holds explicit, non-nil links: the build hot
path traverseEpsilons dereferences them unguarded, so a nil would be a
corrupted structure rather than a normal case. Remove the misleading
eps != nil guard (keeping the legitimate one on steps, whose entries can be
nil for no-transition byte ranges) and note the asymmetry, addressing review
feedback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A single AddPattern build is single-threaded, so the buffers are never shared
concurrently within a build. Reword the comment to say that, and explain the
actual reasons sync.Pool is used over a plain free list: closureBufferPool is
a package-level global drawn from by independent matcher builds in separate
goroutines, and sync.Pool drops its contents on GC. Addresses review feedback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The field now holds a *faState (the FA start state), and 'startState' was a
leftover from when it paired with startTable. Shorten to 'start' per review
feedback. Pure rename of the field's declaration, selectors, struct-literal
key, and comment prose; unrelated local variables named startState in
monocase.go/rune_range_test.go/regexp_nfa_test.go are untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a file-level comment explaining how epsilon-closure building works at a
high level: the eager architecture (closures consumed on the match hot path),
the three nested passes (epsilonClosure / closureForNfa / closureForState),
the three optimizations (incremental walk pruning, self-only sentinel,
table-pointer dedup) and their rationale, and the generation-counter scheme
that the closureBuffers fields encode. Addresses review feedback asking for
a high-level explanation of the otherwise dense scratch structures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The old comment justified the start state's trivial closure by claiming all
Quamina automata begin by matching the opening quote — but true/false/null and
numbers are unquoted, and traverseNFA can be fed an unquoted Q-number value
(value_matcher.go). The conclusion still holds for a different reason: epsilon
transitions are only ever introduced in states reached after the first input
byte, so the entry state carries only byte transitions and never epsilons.
Restate the comment accordingly and note the leading byte is usually, but not
always, the opening quote. Addresses review feedback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…timization

The 'dedup collapsed onto self' branch is flagged in review as never taken in
test coverage. It is a pure optimization, not a correctness guard: consumers
treat len==0 as self-only and otherwise iterate the closure (which includes
self), so a 1-element {self} slice would behave identically. Its only job is to
preserve the no-length-1 invariant on the post-dedup path. Add a brief note
explaining it is deliberately uncovered because dedup collapsing a >=2-member
closure entirely onto self appears structurally unreachable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sayrer

sayrer commented Jun 7, 2026

Copy link
Copy Markdown
Contributor Author

OK, I think I got everything or explained them at least.

@timbray

timbray commented Jun 7, 2026

Copy link
Copy Markdown
Owner

I'll go through this a little later, but a quick note on first glance: If you go through quamina.AddPattern() you can't have a spinner on the start state, because regexps can only apply to strings, so the first step in the regexp is always a literal " or 0x22. In testing, it's easy to build a wildcard or regexp that starts with * or .*, but not in actual use.

Arguably Quamina always retaining the enclosing quotations adds 2 extra steps to FA traversal, but it simplifies a bunch of other things.

@timbray timbray left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only remaining quibble is the sync.Pool() stuff. Also see earlier note about root spinners (can't happen)

Comment thread docs/superpowers/plans/2026-05-31-incremental-epsilon-closure.md Outdated
Comment thread epsi_closure.go Outdated
// A single build (everything under one AddPattern) is single-threaded, so the
// buffers are never shared concurrently within a build. sync.Pool is used
// rather than a plain free list for two reasons: this is a package-level
// global, so independent matcher builds running in separate goroutines draw

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean "independent matcher builds" on an entirely separate Quamina instance? In the same Go program? That seems wildly unlikely. As the comment makes clear, you can't have concurrent matcher builds on the same instance.

I still can't see why it wouldn't be simpler & probably faster just to have a single set of closureBuffers around per instance and have a resetClosureBuffers() call to clear the maps at the start of AddPattern. There's no reason ever to GC this stuff unless you know for sure you're never going to call AddPattern again.

@timbray

timbray commented Jun 7, 2026

Copy link
Copy Markdown
Owner

Now reading https://pkg.go.dev/sync#Pool and trying to figure out what I think.

sayrer and others added 2 commits June 8, 2026 08:14
Drop the Claude-generated design docs under docs/superpowers (incremental
epsilon closure and self-only closure sentinel plans/specs); they were working
artifacts and don't belong in the repo.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The branch version merged two leading-* wildcards (*z, *y), where * is the
first byte so the start state itself is a spinner. That baked a 'start is never
merged as a spinner' assumption into the test. Switch to the wildcard a*z (start
matches 'a', spinner interior) merged with the string az: the value az matches
both (* matching empty), routing through the merged splice and requiring a
multi-member epsilon closure, so clearing closures drops it 2->0.

Could not restore main's exact a*z/abc form: on this branch a cleared (nil)
closure has len 0 and reads as the self-only sentinel (process self), so main's
'nil closure => no match' premise no longer holds and that pair stays matched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sayrer

sayrer commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

I'll go through this a little later, but a quick note on first glance: If you go through quamina.AddPattern() you can't have a spinner on the start state, because regexps can only apply to strings, so the first step in the regexp is always a literal " or 0x22. In testing, it's easy to build a wildcard or regexp that starts with * or .*, but not in actual use.

Arguably Quamina always retaining the enclosing quotations adds 2 extra steps to FA traversal, but it simplifies a bunch of other things.

I wrote a buggy test that starts with *. will fix.

sayrer and others added 2 commits June 8, 2026 09:33
Add mergeStartStates, an faState-typed merge entry point, and use it for the
two value_matcher start merges so they stop reaching into .table and re-wrapping
the result — removing the faState->smallTable->faState round-trip. mergeFAs
(smallTable) stays for callers composing raw tables (regexp_nfa, makeStringFA,
tests). Addresses review feedback asking to pass states rather than tables.

Verified: full suite passes (incl. TestToxicStack/TestRegexpValidity); aggregate
FA structure and build time are unchanged. This is an API cleanup, not a
structural change — the prior blocker (TestEpsilonClosureRequired asserting on
leading-* root spinners) was first rewritten to an interior-spinner case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Same fix as traverseNFA: the old comment justified the start state's trivial
closure by claiming all Quamina automata begin by matching the opening quote,
which is false (true/false/null and numbers are unquoted, and the path can be
fed an unquoted Q-number). The real reason holds: epsilons are only introduced
in states reached after the first input byte, so the entry state carries only
byte transitions. Restate accordingly and note the first byte is usually, but
not always, the quote.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@timbray

timbray commented Jun 9, 2026

Copy link
Copy Markdown
Owner

The more I think about it, the less comfortable I get with sync.Pool. Apparently it randomly ejects things from cache in GC operations. But our pattern is super-deterministic. For each AddPattern, we

  1. Clear the maps in the cache
  2. add lots of stuff and consult it frequently while we build automata and epsilon closures
  3. Are done with it (could clear it here)

I'm not 100% sure what the effect is of syncPool() deciding to expire stuff while we're at stage 2 in the list above, but it's not going to be helpful.

Could we take out sync.Pool() and just have a simple use/clear rhythm?

…ed per AddPattern

The epsilon-closure scratch was drawn from a package-global sync.Pool. But the
build's use of it is deterministic and single-threaded per matcher: addPattern
holds coreMatcher.lock for the whole build, and within one AddPattern it makes
many epsilonClosure calls against the same buffers. sync.Pool can evict an idle
buffer during a GC that happens between those calls, forcing the maps to be
reallocated mid-build, and the pool's concurrency-safety was only needed because
it was a shared global.

Replace it with a single closureBuffers owned by each coreMatcher, created in
newCoreMatcher and threaded to epsilonClosureInto alongside the printer
(addPatternWithPrinter -> fieldMatcher.addTransition -> valueMatcher.addTransition).
Independent matchers each own theirs, so there is no shared state to race on
(addPattern serializes on lock anyway; verified with -race).

Clear the maps once per AddPattern (closureBuffers.reset). This is the key to
the win: with the incremental closure walk, each build only touches its new
states, so a cleared map stays small, whereas a never-cleared buffer grows to
O(all matcher states) and map lookups slow as the matcher gets large. Measured
on the 10k-shellstyle research harness (clean, repeated runs):

  sync.Pool                         ~599 patterns/sec
  matcher-owned, never cleared      ~555 patterns/sec  (-7%, unbounded maps)
  matcher-owned, cleared per add    ~732 patterns/sec  (+22% vs pool)

The generation counters stay and are still essential: a build makes many
closureForState/dedup passes that share the maps without clearing between them
(an O(1) gen bump replaces an O(states) clear per pass). reset() bounds the maps
across builds; the gen counters avoid clearing within a build.

Match path is untouched (the new param is build-only); match benchmarks unchanged.
epsilonClosure(start) is kept as a one-shot wrapper for tests and standalone use.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sayrer

sayrer commented Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

Could we take out sync.Pool() and just have a simple use/clear rhythm?

  - Full suite green; -race clean on the Copy()/concurrent-AddPattern paths 
    (the key risk of dropping the concurrency-safe pool).

  - Build +22% vs the pool (732 vs 599 patterns/sec, clean repeated harness runs) — and
    notably the naïve never-clear version was −7%, so the per-AddPattern clear() is what
    makes it a win (keeps the scratch maps holding one build's working set instead of
    growing to O(all states).

  - Match path untouched; match benchmarks unchanged.
  
  - Generation counters retained (they avoid clearing within a build;
    reset() bounds size across builds).

@timbray

timbray commented Jun 10, 2026

Copy link
Copy Markdown
Owner

This is about the fifth time I've observed sync.Pool being more expensive than seems reasonable.

Heh, Claude needs to really internalize the Quamina concurrency model. "-race clean" is not a surprise.

Anyhow, this looks good, I'll do one more sweep but I think we can merge.

One question: GitHub is whining about an unsigned commit. Is this an admin error you remember? if so, no problem. Otherwise, maybe an attacker?

@timbray timbray left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has become da big one, happy to land it.

At some future point it may be worth trying to simplify epsi_closure.go which is now maybe the most complicated piece of logic in the whole project. But it'll do for now.

Comment thread value_matcher.go
// value to be wrapped in an faState; makeRegexpNFA and a few NFA builders
// return *faState directly. After this switch, newFAState is the start
// faState for the new automaton.
var newFAState *faState

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some future point I'm going to rename this back to newFA but first I'm going to merge this.

@sayrer

sayrer commented Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

One question: GitHub is whining about an unsigned commit. Is this an admin error you remember? if so, no problem. Otherwise, maybe an attacker?

It's a new laptop (no Migration Assistant) and I forgot to do the ssh key. It's fixed now, but once one commit slips in, it won't verify the following ones in a PR.

@timbray

timbray commented Jun 10, 2026

Copy link
Copy Markdown
Owner

Got it, thanks. Merging…

@timbray timbray merged commit db17e11 into timbray:main Jun 10, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants