kaizen: Embed smalltable #535
Merged
37 commits
ee7c58f
dedup_key: add tableShareKey for post-embed smallTable identity
sayrer c6d4ef1
epsi_closure: dedup via tableShareKey instead of *smallTable
sayrer 2559315
nfa: embed smallTable into faState by value
sayrer b8d2149
tests: recalibrate size assertions after embedding smallTable
sayrer 69415ce
epsi_closure: pool buffers, restore two-counter dedup
sayrer 3242841
state_lists: dedup intern() via sort+compact, drop the seen map
sayrer 885141d
Merge remote-tracking branch 'origin/main' into embed-smalltable
sayrer edc66b7
kaizen: add research mainline which generates CSV data
timbray 5fe237b
docs: spec for incremental epsilon closure via walk pruning
sayrer 534327a
docs: implementation plan for incremental epsilon closure
sayrer 18547f6
docs: record epsilon/step immutability verification for closure prune
sayrer a6c44a9
test: order-independence guard for incremental epsilon closure
sayrer 20c980d
research: add -cpuprofile flag for profiling the harness
sayrer 5976a4f
epsi_closure: prune closure walk at already-closed states
sayrer f96e39a
epsi_closure: address review — reset walk counter, clarify comments
sayrer 00251fc
docs: spec for self-only epsilon-closure sentinel
sayrer 48844f5
docs: implementation plan for self-only closure sentinel
sayrer e5b1370
nfa: add len==0 self-only discriminator to closure consumers
sayrer 033aacb
epsi_closure: store {self} closures as a zero-alloc sentinel
sayrer ebd46bd
epsi_closure: document closure sentinel encoding; fix stale test guards
sayrer 8c8446a
docs: record self-only sentinel benchmark results
sayrer d855451
Merge branch 'main' into embed-smalltable
sayrer c2b6bbf
Resolve CPU profile merge conflict.
sayrer 90062c4
gofmt: fix struct field alignment in faState
sayrer 0049695
epsi_closure: document closureGen/closureRep tableMark fields
sayrer 28f1ff7
dedup_key: drop redundant stepsLen from tableShareKey
sayrer 03c455e
memory_cost: drop impossible nil guard on epsilons walk
sayrer 83492f8
epsi_closure: clarify why the buffer pool uses sync.Pool
sayrer a63957e
value_matcher: rename vmFields.startState to start
sayrer 2d472e8
epsi_closure: add high-level overview of closure construction
sayrer f088b2b
nfa: fix traverseNFA start-closure comment rationale
sayrer a43a651
epsi_closure: note the post-dedup self-only branch is an uncovered op…
sayrer fcce28d
docs: remove superpowers plan/spec design docs
sayrer 3b6e169
test: rewrite TestEpsilonClosureRequired with interior-spinner a*z
sayrer f234c4a
nfa: pass faStates to the value-matcher start merges
sayrer 76b2051
nfa: fix stale nfa2Dfa start-closure comment
sayrer 36486fd
epsi_closure: drop sync.Pool for a matcher-owned closureBuffers clear…
sayrer
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,31 @@ | ||
| package quamina | ||
|
|
||
| import "unsafe" | ||
|
|
||
| // tableShareKey returns a stable identifier for a smallTable's "share group". | ||
| // Two states whose smallTables hold slice-headers pointing at the same `steps` | ||
| // backing array (which is what happens when one smallTable struct value is | ||
| // copied into multiple faStates during construction) will produce equal | ||
| // keys. This replaces *smallTable-pointer identity as the dedup key in | ||
| // epsilon-closure computation after smallTable is embedded into faState | ||
| // by value. | ||
| // | ||
| // The key is just the steps backing-array pointer: share groups are only ever | ||
| // born by copying a whole steps slice-header (see the spinner merges in | ||
| // nfa.go), so two tables that share the data pointer always share the length | ||
| // too — nothing in the package reslices steps. Pointer identity is therefore | ||
| // sufficient to identify a share group; carrying the length as well would | ||
| // never break a tie the pointer didn't already break. | ||
| // | ||
| // A zero key (nil pointer) means "no share group" — used for tables with no | ||
| // byte transitions. Callers that want to dedup such tables should skip the | ||
| // zero key. | ||
| type tableShareKey struct { | ||
| stepsData unsafe.Pointer | ||
| } | ||
|
|
||
| func newTableShareKey(t *smallTable) tableShareKey { | ||
| return tableShareKey{ | ||
| stepsData: unsafe.Pointer(unsafe.SliceData(t.steps)), | ||
| } | ||
| } | ||
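The doc comment above argues that the `steps` backing-array pointer alone identifies a share group. A minimal standalone sketch of that idea follows; `state`, `table`, `shareKey`, and `keyOf` are hypothetical stand-ins for `faState`, `smallTable`, `tableShareKey`, and `newTableShareKey`, not the package's own definitions. It assumes Go 1.20 or later for `unsafe.SliceData`.

```go
package main

import (
	"fmt"
	"unsafe"
)

// state and table are hypothetical stand-ins for faState and smallTable.
type state struct{}

type table struct {
	ceilings []byte
	steps    []*state
}

// shareKey has the same shape as tableShareKey: one pointer to the
// first element of the steps backing array.
type shareKey struct {
	stepsData unsafe.Pointer
}

// keyOf extracts the backing-array pointer of t.steps.
func keyOf(t *table) shareKey {
	return shareKey{stepsData: unsafe.Pointer(unsafe.SliceData(t.steps))}
}

func main() {
	src := table{ceilings: []byte{'a', 'b'}, steps: []*state{nil, nil}}
	a, b := src, src // value copies duplicate the slice header, not the array
	other := table{ceilings: []byte{'a', 'b'}, steps: []*state{nil, nil}}
	var empty table // nil steps slice

	fmt.Println(keyOf(&a) == keyOf(&b))      // true: same backing array
	fmt.Println(keyOf(&a) == keyOf(&other))  // false: separate allocations
	fmt.Println(keyOf(&empty) == shareKey{}) // true: nil slice gives the zero key
}
```

One detail worth noting in this sketch: `unsafe.SliceData` returns nil only for a nil slice. An empty but non-nil slice yields a non-nil pointer, so the zero key corresponds specifically to a table whose `steps` was never allocated.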
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,57 @@ | ||
| package quamina | ||
|
|
||
| import ( | ||
| "testing" | ||
| ) | ||
|
|
||
| func TestTableShareKey_SharedBackings(t *testing.T) { | ||
| // Construct one smallTable, value-copy it (simulating post-embed share). | ||
| src := smallTable{ | ||
| ceilings: []byte{'a', 'b', byte(byteCeiling)}, | ||
| steps: []*faState{nil, nil, nil}, | ||
| } | ||
| copy1 := src | ||
| copy2 := src | ||
| if newTableShareKey(&copy1) != newTableShareKey(&copy2) { | ||
| t.Errorf("value-copied tables should share key; got %v vs %v", | ||
| newTableShareKey(&copy1), newTableShareKey(&copy2)) | ||
| } | ||
| } | ||
|
|
||
| func TestTableShareKey_DistinctBackings(t *testing.T) { | ||
| t1 := smallTable{ | ||
| ceilings: []byte{'a', byte(byteCeiling)}, | ||
| steps: []*faState{nil, nil}, | ||
| } | ||
| t2 := smallTable{ | ||
| ceilings: []byte{'a', byte(byteCeiling)}, | ||
| steps: []*faState{nil, nil}, | ||
| } | ||
| if newTableShareKey(&t1) == newTableShareKey(&t2) { | ||
| t.Errorf("independently-built tables should not share key") | ||
| } | ||
| } | ||
|
|
||
| // TestTableShareKey_AppendBreaksShare verifies that when a value-copy | ||
| // is mutated via append in a way that reallocates the backing array, | ||
| // the keys diverge. We force reallocation by starting at cap=1 and | ||
| // appending many entries. | ||
| func TestTableShareKey_AppendBreaksShare(t *testing.T) { | ||
| src := smallTable{ | ||
| ceilings: make([]byte, 0, 1), | ||
| steps: make([]*faState, 0, 1), | ||
| } | ||
| src.ceilings = append(src.ceilings, byte(byteCeiling)) | ||
| src.steps = append(src.steps, nil) | ||
| copy1 := src | ||
| // Appending 8 entries to a slice with cap=1 guarantees at least one | ||
| // realloc of the steps backing. | ||
| for i := 0; i < 8; i++ { | ||
| copy1.steps = append(copy1.steps, nil) | ||
| copy1.ceilings = append(copy1.ceilings, byte(i)) | ||
| } | ||
| if newTableShareKey(&src) == newTableShareKey(&copy1) { | ||
| t.Errorf("expected keys to diverge after append-with-realloc; got equal: %v", | ||
| newTableShareKey(&src)) | ||
| } | ||
| } | ||
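The last test relies on `append` reallocating once capacity is exceeded. A short sketch of both sides of that behavior may help: an append that fits within capacity keeps the same data pointer (so a pointer-only key would still match even though the lengths differ), while an append past capacity moves to a new array. The helper names `dataPtr` and `appendSharing` are made up for illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// dataPtr returns the address of the first element of s's backing array.
func dataPtr(s []*int) unsafe.Pointer {
	return unsafe.Pointer(unsafe.SliceData(s))
}

// appendSharing reports whether the data pointer is preserved by an
// append that fits in capacity (within) and by one that does not (beyond).
func appendSharing() (within, beyond bool) {
	base := make([]*int, 1, 4)

	// len 2 <= cap 4: append reuses the existing backing array.
	fits := append(base, nil)

	// len 5 > cap 4: append must allocate a new backing array.
	grown := append(base, nil, nil, nil, nil)

	return dataPtr(base) == dataPtr(fits), dataPtr(base) == dataPtr(grown)
}

func main() {
	within, beyond := appendSharing()
	fmt.Println(within) // true
	fmt.Println(beyond) // false
}
```

This is why the doc comment's claim that nothing in the package reslices `steps` matters: the pointer-only key is sufficient only as long as tables in a share group are never grown in place.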