FuzzyVN V3.0.0#3
Conversation
… thread-safe pooling
📝 Walkthrough
Refactors the search implementation to use byte-normalized indexes, adds a UnigramFilter for candidate reduction, implements Jaro–Winkler similarity, introduces in-memory frecency tracking (FileMemory) with decay and persistence, and provides parallel top‑K fuzzy matching with min‑heap merging. Search API now accepts SearchOptions for context boosts.
Sequence Diagram
sequenceDiagram
participant Client
participant Searcher
participant UnigramFilter
participant Worker as FuzzyWorker
participant FileMemory
Client->>Searcher: Search(query, opts)
Searcher->>Searcher: Normalize query (bytes)
Searcher->>FileMemory: GetBoostScores(query)
FileMemory-->>Searcher: boost map
Searcher->>UnigramFilter: Filter(query bytes)
alt candidates returned
UnigramFilter-->>Searcher: indices
Searcher->>FuzzyWorker: FuzzyFindFiltered(query, candidates)
else no candidates
Searcher->>FuzzyWorker: FuzzyFindParallel(query, all items)
end
FuzzyWorker->>FuzzyWorker: parallel scoring, local heaps
FuzzyWorker-->>Searcher: merged top‑K matches
Searcher->>Searcher: apply memory/context boosts, sort, trim
Searcher-->>Client: []string results
Client->>FileMemory: RecordSelection(query, selected_file)
FileMemory->>FileMemory: update counts, timestamps, evict if needed
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Code Review
This pull request introduces a new memory-based caching system (Frecency) to replace the previous query cache, adds a bitset-based filter for faster search performance, and includes comprehensive benchmark results. I have provided feedback on a critical bug in the string normalization logic, a performance improvement for typo-handling, and minor code optimizations for readability and idiomatic Go usage.
| if isASCII { | ||
| buf := make([]byte, len(s)) | ||
| for i, char := range []byte(s) { | ||
| if char >= 'A' && char <= 'Z' { | ||
| buf[i] = char + 32 | ||
| } else if (char >= 'a' && char <= 'z') || (char >= '0' && char <= '9') || char == '.' || char == '/' || char == '\\' || char == '_' || char == '-' || char == ' ' { | ||
| buf[i] = char | ||
| } | ||
| } | ||
| return string(buf) | ||
| } |
In the ASCII fast path of Normalize, buf is allocated with length len(s) and then written at index i:
buf := make([]byte, len(s))
for i, char := range []byte(s) {
// ... assign buf[i]
}
The problem appears when a character is not in the allowed set (for example #, !, or ?). In that case buf[i] is never assigned and keeps its zero value, \x00 (a null byte).
For example, Normalize("a#b") returns "a\x00b".
Null bytes in the normalized string can cause unexpected failures in the later comparison and search steps.
To fix this, allocate the buffer with len=0 and cap=len(s), then use append to add only the valid characters. Disallowed characters are then dropped entirely instead of being replaced with null bytes.
| if isASCII { | |
| buf := make([]byte, len(s)) | |
| for i, char := range []byte(s) { | |
| if char >= 'A' && char <= 'Z' { | |
| buf[i] = char + 32 | |
| } else if (char >= 'a' && char <= 'z') || (char >= '0' && char <= '9') || char == '.' || char == '/' || char == '\\' || char == '_' || char == '-' || char == ' ' { | |
| buf[i] = char | |
| } | |
| } | |
| return string(buf) | |
| } | |
| if isASCII { | |
| buf := make([]byte, 0, len(s)) | |
| for _, char := range []byte(s) { | |
| if char >= 'A' && char <= 'Z' { | |
| buf = append(buf, char+32) | |
| } else if (char >= 'a' && char <= 'z') || (char >= '0' && char <= '9') || char == '.' || char == '/' || char == '\\' || char == '_' || char == '-' || char == ' ' { | |
| buf = append(buf, char) | |
| } | |
| } | |
| return string(buf) | |
| } |
| */ | ||
| func (s *Searcher) RecordSelection(query, filePath string) { | ||
| if s.Cache != nil { | ||
| s.Cache.RecordSelection(query, filePath) | ||
| func (s *Searcher) findButTypo(query string) []core.FuzzyMatch { | ||
| var matches []core.FuzzyMatch | ||
| // compare against file names only, for the best typo accuracy | ||
| for i, filename := range s.FilenamesOnly { | ||
| dist := core.LevenshteinRatio(query, filename) | ||
| // allow 1 wrong character per 4 typed characters | ||
| threshold := len(query) / 4 | ||
| if threshold < 1 { | ||
| threshold = 1 | ||
| } | ||
| if dist <= threshold { | ||
| matches = append(matches, core.FuzzyMatch{ | ||
| Index: i, | ||
| Score: 100 - dist*10, // typo scores rank below fuzzy scores | ||
| }) | ||
| } | ||
| } | ||
| return matches |
findButTypo scans all of s.FilenamesOnly sequentially to compute the Levenshtein distance. With a large number of files (for example, more than 100k), this can become a performance bottleneck, especially compared with the fuzzy search functions that are already parallelized.
To improve performance, consider parallelizing this loop the same way FuzzyFindParallel does: split s.FilenamesOnly into chunks and process them on multiple goroutines.
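A minimal sketch of the chunked approach follows. It assumes a plain two-row edit distance in place of core.LevenshteinRatio, and typoMatch, editDistance, and findTypoParallel are hypothetical names chosen for illustration. The threshold and scoring follow the quoted findButTypo.

```go
package main

import (
	"fmt"
	"runtime"
	"sort"
	"sync"
)

// typoMatch is a hypothetical stand-in for core.FuzzyMatch.
type typoMatch struct {
	Index int
	Score int
}

// editDistance computes the byte-level Levenshtein distance with two rows.
func editDistance(a, b string) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			best := prev[j-1] // substitution (free when bytes match)
			if a[i-1] != b[j-1] {
				best++
			}
			if d := prev[j] + 1; d < best { // deletion
				best = d
			}
			if d := curr[j-1] + 1; d < best { // insertion
				best = d
			}
			curr[j] = best
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

// findTypoParallel splits names into one chunk per CPU, scores each chunk in
// its own goroutine, and merges the per-chunk results under a mutex.
func findTypoParallel(query string, names []string) []typoMatch {
	threshold := len(query) / 4
	if threshold < 1 {
		threshold = 1
	}
	workers := runtime.GOMAXPROCS(0)
	chunk := (len(names) + workers - 1) / workers
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out []typoMatch
	)
	for start := 0; start < len(names); start += chunk {
		end := start + chunk
		if end > len(names) {
			end = len(names)
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			var local []typoMatch
			for i := lo; i < hi; i++ {
				if d := editDistance(query, names[i]); d <= threshold {
					local = append(local, typoMatch{Index: i, Score: 100 - d*10})
				}
			}
			mu.Lock()
			out = append(out, local...)
			mu.Unlock()
		}(start, end)
	}
	wg.Wait()
	// Goroutines finish in arbitrary order, so sort for a stable result.
	sort.Slice(out, func(i, j int) bool { return out[i].Index < out[j].Index })
	return out
}

func main() {
	names := []string{"config.go", "main.go", "confgi.go", "readme.md"}
	fmt.Println(findTypoParallel("config.go", names))
}
```

Each worker collects into a local slice and takes the lock once, which keeps contention low. Whether this beats the sequential loop depends on the file count, so it is worth benchmarking before adopting.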
| // Reset the memory region in use | ||
| for i := 0; i < l1; i++ { buf[0][i] = false } | ||
| for i := 0; i < l2; i++ { buf[1][i] = false } |
You reset the buffer with for loops. That is correct, but the built-in clear() function, introduced in Go 1.21, makes the code shorter and more idiomatic.
If the project targets Go 1.21+, consider making this change.
| // Reset the memory region in use | |
| for i := 0; i < l1; i++ { buf[0][i] = false } | |
| for i := 0; i < l2; i++ { buf[1][i] = false } | |
| // Reset the memory region in use | |
| clear(buf[0][:l1]) | |
| clear(buf[1][:l2]) |
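As a quick check that clear on a sub-slice only touches the requested prefix, here is a small sketch. resetPrefix is a hypothetical helper, and the code needs Go 1.21 or newer.

```go
package main

import "fmt"

// resetPrefix zeroes the first n elements of flags using the clear builtin
// (Go 1.21+). Elements at index n and beyond are left untouched.
func resetPrefix(flags []bool, n int) {
	clear(flags[:n])
}

func main() {
	flags := []bool{true, true, true, true}
	resetPrefix(flags, 2)
	fmt.Println(flags)
}
```

Because flags[:n] shares the backing array with flags, the caller sees the reset without any copy.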
|
|
||
| // Tier 3: at least 1 match in the filename | ||
| if firstMatchIdx < baseStart { | ||
| totalScore += (totalScore * 200) / 100 |
The expression totalScore += (totalScore * 200) / 100 can be simplified for readability and efficiency.
It is equivalent to totalScore = totalScore + totalScore * 2, that is, totalScore *= 3.
Multiplying directly is clearer and avoids an unnecessary division.
| totalScore += (totalScore * 200) / 100 | |
| totalScore *= 3 |
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
fuzzyvn_test.go (1)
677-679: ⚠️ Potential issue | 🟠 Major
Same b.Loop() compatibility issue as in bench_linux_test.go.
Multiple benchmarks in this file use b.Loop() (lines 677, 685, 693, 702, 712, 721, 730, 742, 749, 764, 779, 795, 807, 821). testing.B.Loop only exists in Go 1.24 and later, so replace all occurrences with for i := 0; i < b.N; i++ to keep the benchmarks building on Go 1.21+ toolchains.
🔧 Example fix pattern
- for b.Loop() {
+ for i := 0; i < b.N; i++ {
  searcher.Search("config")
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@fuzzyvn_test.go` around lines 677 - 679, replace b.Loop(), which is available only from Go 1.24, with the standard Go benchmark loop for i := 0; i < b.N; i++ in this test file: locate each benchmark where b.Loop() is used (e.g., the block calling searcher.Search("config")) and change the loop to for i := 0; i < b.N; i++ so the benchmark also builds on Go 1.21+; apply the same replacement to all other occurrences listed in the comment (lines with b.Loop() around searcher.Search and similar benchmark bodies).
🧹 Nitpick comments (3)
core/utils.go (1)
281-295: Rename LevenshteinRatio to match its return value.
This API returns raw edit distance, not a ratio or percentage. LevenshteinDistance would be much less error-prone for callers, with LevenshteinRatio kept as a deprecated alias only if compatibility matters.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/utils.go` around lines 281 - 295, The function LevenshteinRatio currently returns a raw edit distance; rename the exported function to LevenshteinDistance (update its doc comment accordingly) and leave LevenshteinRatio as a thin deprecated wrapper that calls LevenshteinDistance to preserve backward compatibility; update all internal references/usages to call LevenshteinDistance and add a deprecation comment on LevenshteinRatio so callers can migrate smoothly.
fuzzyvn_test.go (1)
263-266: Consider reducing sleep duration or using alternative synchronization.
The time.Sleep(1100ms) calls make this test take ~3.3 seconds. While this works, it slows down the test suite. Consider using shorter intervals if the underlying implementation supports millisecond-level precision, or document why second-level precision is required.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@fuzzyvn_test.go` around lines 263 - 266, The current test uses long time.Sleep(1100 * time.Millisecond) to force distinct Unix-second timestamps; replace these sleeps with much shorter sleeps (e.g., 10-50 * time.Millisecond) if the code under test (mem.RecordSelection) supports millisecond precision, or better yet change the test to avoid real sleeping by using a controllable clock or by adding an overload to RecordSelection that accepts an explicit timestamp (e.g., RecordSelectionWithTime or passing time.Now() from test) so you can synthesize distinct timestamps deterministically; update calls to mem.RecordSelection("q2", "/b.go") and mem.RecordSelection("q3", "/c.go") accordingly and remove the long 1100ms sleeps.
fuzzyvn.go (1)
234-236: ClearCache may leave stale references if Memory was shared.
When Memory was passed via NewSearcherWithMemory, calling ClearCache creates a new FileMemory instance but doesn't clear the original shared memory. Other searchers sharing the same memory instance will retain the old data.
This may be intentional (each searcher gets independent memory after clear), but it's worth documenting the behavior.
📝 Consider adding documentation
/*
-ClearCache: Clears the history memory
+ClearCache: Clears the history memory of this Searcher.
+Note: If Memory is shared via NewSearcherWithMemory, the other Searchers
+keep their old data.
*/
func (s *Searcher) ClearCache() {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@fuzzyvn.go` around lines 234 - 236, The ClearCache method currently replaces s.Memory with a new core.NewFileMemory(nil) but does not mutate or clear the original Memory object passed via NewSearcherWithMemory, leaving other searchers that hold that shared instance unchanged; update the comment/docstring above ClearCache to explicitly state that ClearCache creates a fresh FileMemory for this Searcher and does not clear or modify any previously shared Memory instances, and if the intended behavior is to clear shared memory instead, implement and call a clear-style method on the Memory interface (e.g., Memory.Clear()) or detect and clear the existing instance rather than replacing it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/release.yml:
- Around line 5-9: The release action is running on PRs and non-tag pushes; add
a tag-only guard so it only runs for tag pushes (refs/tags/v*). Update the
release job or the specific release step (e.g., the job named "release" or the
step labeled "Create Release"/release publishing) to include an if condition
like startsWith(github.ref, 'refs/tags/v') so the job/step executes only when a
tag matching v* is pushed.
In `@bench_linux_test.go`:
- Around line 84-87: The benchmark uses b.Loop(), which only exists in Go 1.24
and later; to keep building on Go 1.21+ toolchains, update the loop around
NewSearcher(files) to use the standard b.N pattern (e.g., for i := 0; i < b.N;
i++ { NewSearcher(files) }) while keeping the existing b.ResetTimer() call so
the benchmark still measures NewSearcher(files) per iteration.
In `@core/filter.go`:
- Around line 117-124: UnigramFilter.DeleteFile currently fails for negative
docID and does a non-atomic read/modify/write on uf.Bin, so add a guard that
returns if docID < 0 || docID >= uf.NumTargets, compute blockIdx and bitPos as
before, and perform a thread-safe update: either use a sync.Mutex (e.g., a mutex
on UnigramFilter or per-block locks) around the uf.Bin[blockIdx] |= bitPos
mutation or use atomic operations (a CAS loop with
atomic.LoadUint64/atomic.CompareAndSwapUint64 on uf.Bin[blockIdx]) to atomically
OR the bit; ensure the same synchronization strategy is used wherever the bitmap
is read (e.g., the Filter reader that accesses uf.Bin) to avoid races.
In `@core/jaro.go`:
- Around line 33-37: The JaroWinkler implementation currently hard-rejects when
the first byte differs (the a[0] != b[0] return 0.0 check); remove that
early-return so the full JaroWinkler similarity is computed for all inputs
(i.e., delete the first-byte check in the JaroWinkler function in core/jaro.go)
and ensure the function proceeds with the regular matching/weighting logic
instead of returning 0.0 for differing first characters.
In `@core/memory.go`:
- Around line 97-111: The current update logic for record.Queries leaves
queryNorm in its old slot when found, which prevents it from being treated as
most-recent; instead, when queryNorm exists you should remove it from
record.Queries and re-append it to the end so it becomes the newest entry, and
still enforce the max length (3) by trimming the oldest element if needed;
update the loop that checks for equality to record.Queries to capture the found
index, splice out that index when found, then append queryNorm and if
len(record.Queries) > 3 drop the first element so the ring buffer always
contains the three most recent queries.
In `@core/score.go`:
- Around line 85-96: The bucket used for Tier 2 frequency counts, charBucket
declared in the score calculation (charBucket [256]int8) can overflow for long
filenames; change its type to a wider integer (e.g., int16 or int) in the same
scope where baseStart, lenP, target, pattern and filenameHits are used so
repeated bytes are counted correctly and the Tier 2 bonus logic still triggers
when appropriate.
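The overflow described in the core/score.go item can be reproduced with the sketch below. countByte is a hypothetical helper, not the project's scoring code; it counts the same byte in an int8 and an int16 side by side.

```go
package main

import "fmt"

// countByte counts occurrences of c in s twice: once in an int8 (as the
// review says charBucket does) and once in an int16. Go signed integers
// wrap on overflow, so the int8 counter goes negative past 127.
func countByte(s []byte, c byte) (narrow int8, wide int16) {
	for _, b := range s {
		if b == c {
			narrow++
			wide++
		}
	}
	return narrow, wide
}

func main() {
	// 130 repeated bytes, as might occur in a long generated filename.
	name := make([]byte, 130)
	for i := range name {
		name[i] = 'a'
	}
	narrow, wide := countByte(name, 'a')
	fmt.Println(narrow, wide)
}
```

Any comparison such as count > 0 silently fails once the int8 counter has wrapped, which is why a wider type is the safer choice here.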
In `@core/utils.go`:
- Around line 43-52: The current ASCII-normalization loop preallocates buf :=
make([]byte, len(s)) and only writes some indices, leaving NULs for dropped
chars; change it to build a compact buffer by either using var buf []byte and
append(buf, char) for allowed characters or keep buf := make([]byte, len(s)) but
maintain a write index j and assign buf[j] = char then return string(buf[:j]);
update the isASCII branch (variables: buf, s, char) accordingly so unsupported
ASCII characters are removed instead of becoming NUL bytes.
In `@core/worker.go`:
- Around line 118-178: FuzzyFindParallel currently scans the full items slice
and can return entries for files marked deleted in the UnigramFilter/ Bin
bitmap; fix by adding a deleted bitmap parameter (e.g., deletedBin) to
FuzzyFindParallel and use it to skip deleted indexes both inside the per-worker
loop (before calling fuzzyScoreGreedy for index j) and when merging results from
resultChan into finalHeap (skip any FuzzyMatch whose Index is marked deleted);
update callers accordingly so deleted files are never scored or returned
(references: function FuzzyFindParallel, fuzzyScoreGreedy, resultChan,
finalHeap, heapToSorted).
In `@demo/main.go`:
- Around line 103-105: The code reads globalMemory (recentFiles :=
globalMemory.GetRecentFiles(5); boostedFiles :=
globalMemory.GetBoostScores(query)) without nil-checking or synchronization,
risking a nil panic and a data race because globalMemory is set in indexFiles()
concurrently; fix by ensuring globalMemory is safely published before use
(either initialize it synchronously before the server accepts requests or
protect reads/writes with the same mutex used for searcher, e.g., acquire
searcherLock (or a new memoryLock) around writes in indexFiles and around reads
here), and add a nil-check/early return or HTTP 503 if globalMemory is not yet
ready to avoid dereferencing nil in GetRecentFiles/GetBoostScores.
In `@fuzzyvn.go`:
- Around line 177-185: The current conditional sort only runs when memoryBoosts
or opts[0].ContextBoosts exist, causing non-deterministic ordering when no
boosts are present; always sort rankedResults after they are produced to ensure
deterministic ordering (tie-break by rankedResults[i].Str), i.e. remove the
conditional and invoke sort.Slice unconditionally after the
FuzzyFindFiltered/FuzzyFindParallel results are assembled, keeping the existing
comparator that prefers higher Score and falls back to alphabetical Str.
- Around line 140-152: The code allocates and fills scoreBuf from
s.scorePool.Get() but never reads it afterwards (scoreBuf, s.scorePool.Get(),
matches), causing wasted work; either remove the scoreBuf allocation/population
block entirely or actually use scoreBuf in the subsequent ranking loop (replace
direct reads of m.Score from matches with lookups into scoreBuf by m.Index) and
keep the buffer reset/Put logic if you retain the pool; if scoreBuf was intended
for deduplication/additional lookup, implement that logic or add a clarifying
comment next to scoreBuf to avoid unnecessary allocation.
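For the first fuzzyvn.go item, an unconditional sort with an alphabetical tie-break could look like the sketch below. The ranked type and sortRanked are hypothetical; only the Score and Str field names come from the review text.

```go
package main

import (
	"fmt"
	"sort"
)

// ranked is a hypothetical stand-in for the rankedResults element type.
type ranked struct {
	Str   string
	Score int
}

// sortRanked orders by Score descending and breaks ties alphabetically by
// Str, so equal-score results always come out in the same order.
func sortRanked(rs []ranked) {
	sort.Slice(rs, func(i, j int) bool {
		if rs[i].Score != rs[j].Score {
			return rs[i].Score > rs[j].Score
		}
		return rs[i].Str < rs[j].Str
	})
}

func main() {
	rs := []ranked{{"b.go", 10}, {"a.go", 10}, {"c.go", 20}}
	sortRanked(rs)
	fmt.Println(rs)
}
```

sort.Slice is not stable, but with a total order over (Score, Str) that does not matter as long as paths are unique.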
In `@Makefile`:
- Around line 7-8: The Makefile defines targets named cli and gen that are not
listed in .PHONY, so an existing file/dir named cli or gen will prevent those
targets from running; add cli and gen to the .PHONY declaration (alongside
existing phony targets) to ensure they're always executed, updating the .PHONY
line to include "cli" and "gen" so targets cli and gen (and any others already
present) are treated as phony.
📒 Files selected for processing (17)
.github/workflows/release.yml
Makefile
README.md
bench_linux_test.go
core/filter.go
core/jaro.go
core/memory.go
core/score.go
core/utils.go
core/worker.go
demo/cli_search.go
demo/main.go
docs/bench.md
docs/bench_result_amd.txt
docs/bench_result_n2.txt
fuzzyvn.go
fuzzyvn_test.go
| func (uf *UnigramFilter) DeleteFile(docID int) { | ||
| if docID >= uf.NumTargets { | ||
| return | ||
| } | ||
| blockIdx := docID / 64 | ||
| bitPos := uint64(1) << (docID % 64) | ||
| // mark as deleted in the bin | ||
| uf.Bin[blockIdx] |= bitPos |
Guard DeleteFile against invalid IDs and concurrent bitmap updates.
docID < 0 will panic here, and uf.Bin[blockIdx] |= bitPos is a non-atomic read/modify/write. Two deletes in the same 64-file block can lose one tombstone, and Filter also reads the same bitmap on Line 195 with no synchronization.
Also applies to: 185-195
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@core/filter.go` around lines 117 - 124, UnigramFilter.DeleteFile currently
fails for negative docID and does a non-atomic read/modify/write on uf.Bin, so
add a guard that returns if docID < 0 || docID >= uf.NumTargets, compute
blockIdx and bitPos as before, and perform a thread-safe update: either use a
sync.Mutex (e.g., a mutex on UnigramFilter or per-block locks) around the
uf.Bin[blockIdx] |= bitPos mutation or use atomic operations (a CAS loop with
atomic.LoadUint64/atomic.CompareAndSwapUint64 on uf.Bin[blockIdx]) to atomically
OR the bit; ensure the same synchronization strategy is used wherever the bitmap
is read (e.g., the Filter reader that accesses uf.Bin) to avoid races.
| // Early exit: if the first bytes differ, the chance of Jaro > 0.7 is very low | ||
| // Lets us quickly skip unrelated files in the history | ||
| if a[0] != b[0] { | ||
| return 0.0 | ||
| } |
Don't hard-reject JaroWinkler when the first byte differs.
This turns a similarity algorithm into a heuristic. Strings like abc/xbc or main/pain can still clear the 0.7 threshold, so GetBoostScores will miss relevant history whenever the typo lands on the first character.
Minimal fix
- // Early exit: if the first bytes differ, the chance of Jaro > 0.7 is very low
- // Lets us quickly skip unrelated files in the history
- if a[0] != b[0] {
- return 0.0
- }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Early exit: if the first bytes differ, the chance of Jaro > 0.7 is very low | |
| // Lets us quickly skip unrelated files in the history | |
| if a[0] != b[0] { | |
| return 0.0 | |
| } |
| // Update the query list (ring buffer) | ||
| foundQuery := false | ||
| for _, q := range record.Queries { | ||
| if q == queryNorm { | ||
| foundQuery = true | ||
| break | ||
| } | ||
| } | ||
| if !foundQuery { | ||
| if len(record.Queries) >= 3 { | ||
| // Drop the oldest entry | ||
| record.Queries = record.Queries[1:] | ||
| } | ||
| record.Queries = append(record.Queries, queryNorm) | ||
| } |
Refresh an existing query instead of leaving it in place.
When queryNorm is already present, this path keeps it in its old slot. One later insert can evict a query that was just reused, so the "3 recent queries" list stops being recent.
Possible fix
- foundQuery := false
- for _, q := range record.Queries {
- if q == queryNorm {
- foundQuery = true
- break
- }
- }
- if !foundQuery {
- if len(record.Queries) >= 3 {
- // Drop the oldest entry
- record.Queries = record.Queries[1:]
- }
- record.Queries = append(record.Queries, queryNorm)
- }
+ pos := -1
+ for i, q := range record.Queries {
+ if q == queryNorm {
+ pos = i
+ break
+ }
+ }
+ if pos >= 0 {
+ copy(record.Queries[pos:], record.Queries[pos+1:])
+ record.Queries = record.Queries[:len(record.Queries)-1]
+ } else if len(record.Queries) >= 3 {
+ // Drop the oldest entry
+ record.Queries = record.Queries[1:]
+ }
+ record.Queries = append(record.Queries, queryNorm)
📝 Committable suggestion
| // Update the query list (ring buffer) | |
| foundQuery := false | |
| for _, q := range record.Queries { | |
| if q == queryNorm { | |
| foundQuery = true | |
| break | |
| } | |
| } | |
| if !foundQuery { | |
| if len(record.Queries) >= 3 { | |
| // Drop the oldest entry | |
| record.Queries = record.Queries[1:] | |
| } | |
| record.Queries = append(record.Queries, queryNorm) | |
| } | |
| // Update the query list (ring buffer) | |
| pos := -1 | |
| for i, q := range record.Queries { | |
| if q == queryNorm { | |
| pos = i | |
| break | |
| } | |
| } | |
| if pos >= 0 { | |
| copy(record.Queries[pos:], record.Queries[pos+1:]) | |
| record.Queries = record.Queries[:len(record.Queries)-1] | |
| } else if len(record.Queries) >= 3 { | |
| // Drop the oldest entry | |
| record.Queries = record.Queries[1:] | |
| } | |
| record.Queries = append(record.Queries, queryNorm) |
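The move-to-end behavior can also be written as a small pure function, which makes it easy to test in isolation. touchQuery is a hypothetical helper, not part of core/memory.go; it builds a new slice instead of shifting in place.

```go
package main

import "fmt"

// touchQuery returns queries with q moved (or added) to the newest slot,
// keeping at most limit entries. The input slice is not modified.
func touchQuery(queries []string, q string, limit int) []string {
	out := make([]string, 0, limit+1)
	for _, existing := range queries {
		if existing != q {
			out = append(out, existing)
		}
	}
	out = append(out, q)
	if len(out) > limit {
		out = out[len(out)-limit:] // drop the oldest entries
	}
	return out
}

func main() {
	qs := []string{"a", "b", "c"}
	qs = touchQuery(qs, "a", 3) // reuse "a": it becomes the newest entry
	fmt.Println(qs)
	qs = touchQuery(qs, "d", 3) // the oldest entry is now "b", not "a"
	fmt.Println(qs)
}
```

With the original logic, reusing "a" would leave it in the oldest slot, and the next insert would evict it immediately.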
| func FuzzyFindParallel(query []byte, items [][]byte, baseStarts []int) []FuzzyMatch { | ||
| numItems := len(items) | ||
| if numItems == 0 { | ||
| return nil | ||
| } | ||
|
|
||
| numCPUs := runtime.GOMAXPROCS(0) | ||
| chunkSize := (numItems + numCPUs - 1) / numCPUs | ||
|
|
||
| var wg sync.WaitGroup | ||
| resultChan := make(chan []FuzzyMatch, numCPUs) | ||
|
|
||
| for i := range numCPUs { | ||
| start := i * chunkSize | ||
| if start >= numItems { | ||
| break | ||
| } | ||
| end := start + chunkSize | ||
| if end > numItems { | ||
| end = numItems | ||
| } | ||
|
|
||
| wg.Add(1) | ||
| go func(s, e int) { | ||
| defer wg.Done() | ||
| h := &minHeap{} | ||
| heap.Init(h) | ||
| for j := s; j < e; j++ { | ||
| if score, matched := fuzzyScoreGreedy(query, items[j], baseStarts[j]); matched { | ||
| if h.Len() < topK { | ||
| heap.Push(h, FuzzyMatch{Index: j, Score: score}) | ||
| } else if score > (*h)[0].Score { | ||
| (*h)[0] = FuzzyMatch{Index: j, Score: score} | ||
| heap.Fix(h, 0) | ||
| } | ||
| } | ||
| } | ||
| resultChan <- heapToSorted(h) | ||
| }(start, end) | ||
| } | ||
|
|
||
| go func() { | ||
| wg.Wait() | ||
| close(resultChan) | ||
| }() | ||
|
|
||
| finalHeap := &minHeap{} | ||
| heap.Init(finalHeap) | ||
| for matches := range resultChan { | ||
| for _, m := range matches { | ||
| if finalHeap.Len() < topK { | ||
| heap.Push(finalHeap, m) | ||
| } else if m.Score > (*finalHeap)[0].Score { | ||
| (*finalHeap)[0] = m | ||
| heap.Fix(finalHeap, 0) | ||
| } | ||
| } | ||
| } | ||
|
|
||
| return heapToSorted(finalHeap) | ||
| } |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '== Search fallback call sites ==\n'
rg -n -C4 'FuzzyFindFiltered|FuzzyFindParallel' --type go
printf '\n== DeleteFile implementations and callers ==\n'
rg -n -C4 'DeleteFile\(' --type go
printf '\n== Tombstone bitmap usage ==\n'
rg -n -C2 '\bBin\b' --type go
Repository: versenilvis/fuzzyvn
Length of output: 3098
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '== Filter method implementation ==\n'
rg -n -A30 'func \(uf \*UnigramFilter\) Filter\(' --type go
printf '\n== Search method around FuzzyFindParallel call ==\n'
rg -n -B10 -A10 'FuzzyFindParallel' fuzzyvn.go --type go
printf '\n== Check if items are modified during deletion ==\n'
rg -n -C3 'DeleteFile|delete\|remove.*Normalized' --type go | head -40
Repository: versenilvis/fuzzyvn
Length of output: 2598
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '== Full Filter method ==\n'
rg -n -A80 'func \(uf \*UnigramFilter\) Filter\(' core/filter.go --type go
printf '\n== Check if Bin is checked in Filter logic ==\n'
rg -n -B5 -A5 'Bin\[blockIdx\]' core/filter.go --type go
Repository: versenilvis/fuzzyvn
Length of output: 3259
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '== Check post-processing after FuzzyFindParallel ==\n'
rg -n -A15 'FuzzyFindParallel' fuzzyvn.go --type go
printf '\n== Search method full implementation ==\n'
rg -n -B5 -A30 'func \(s \*Searcher\) Search\(' --type go | head -80
Repository: versenilvis/fuzzyvn
Length of output: 2191
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '== FuzzyMatch struct and result handling ==\n'
rg -n -B2 -A10 'type FuzzyMatch struct' --type go
printf '\n== How matches are converted to final results ==\n'
rg -n -A20 'if len\(matches\) == 0' fuzzyvn.go --type go | head -50
Repository: versenilvis/fuzzyvn
Length of output: 1384
Apply deletion filter to FuzzyFindParallel fallback results.
When Filter() returns nil (short query), FuzzyFindParallel scans the full items slice with no knowledge of the Bin deletion bitmap. Deleted files marked via UnigramFilter.DeleteFile() are excluded only in the filtered path (line 195), leaving deleted items exposed in the fallback path.
Either pass Bin to FuzzyFindParallel and filter deleted files during scoring, or post-filter results against the bitmap before returning them.
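The post-filter option could be sketched as follows. fuzzyMatch, isDeleted, and dropDeleted are hypothetical names, and the bitmap layout (one bit per index, 64 per word) is assumed from the quoted DeleteFile.

```go
package main

import "fmt"

// fuzzyMatch is a hypothetical stand-in for core.FuzzyMatch.
type fuzzyMatch struct {
	Index int
	Score int
}

// isDeleted reports whether idx is tombstoned in bin. Indexes outside the
// bitmap are treated as live.
func isDeleted(bin []uint64, idx int) bool {
	if idx < 0 || idx/64 >= len(bin) {
		return false
	}
	return bin[idx/64]&(uint64(1)<<(uint(idx)%64)) != 0
}

// dropDeleted returns the matches whose index is not tombstoned, preserving
// their order. The input slice is left unchanged.
func dropDeleted(matches []fuzzyMatch, bin []uint64) []fuzzyMatch {
	kept := make([]fuzzyMatch, 0, len(matches))
	for _, m := range matches {
		if !isDeleted(bin, m.Index) {
			kept = append(kept, m)
		}
	}
	return kept
}

func main() {
	bin := []uint64{1 << 2} // index 2 is marked deleted
	matches := []fuzzyMatch{{1, 90}, {2, 80}, {3, 70}}
	fmt.Println(dropDeleted(matches, bin))
}
```

One caveat with post-filtering: deleted items can occupy slots in the top-K heap, so fewer than K results may survive. Skipping tombstoned indexes inside the scoring loop avoids that at the cost of a wider function signature.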
recentFiles := globalMemory.GetRecentFiles(5)
boostedFiles := globalMemory.GetBoostScores(query)
matchedStrings := searcher.Search(query)
There was a problem hiding this comment.
Potential nil pointer dereference on globalMemory.
globalMemory is accessed at lines 103-104 without nil checking, but it's only initialized inside indexFiles() (line 72) which runs asynchronously in a goroutine. If a /search request arrives before indexing completes, globalMemory will be nil causing a panic.
Additionally, there's a potential data race: globalMemory is written in indexFiles (line 72) and read here without synchronization (the searcherLock only protects searcher, not globalMemory).
🐛 Proposed fix
searcherLock.RLock()
defer searcherLock.RUnlock()
if searcher == nil {
json.NewEncoder(w).Encode(SearchResponse{
RecentFiles: []string{},
Results: []SearchResult{},
})
return
}
+ if globalMemory == nil {
+ json.NewEncoder(w).Encode(SearchResponse{
+ RecentFiles: []string{},
+ Results: []SearchResult{},
+ })
+ return
+ }
+
recentFiles := globalMemory.GetRecentFiles(5)
boostedFiles := globalMemory.GetBoostScores(query)
For a more robust solution, consider protecting globalMemory with the same lock or ensuring it's always initialized before the server starts accepting requests.
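As one possible shape for that more robust solution, the sketch below publishes the memory through `sync/atomic`'s `atomic.Pointer` (Go 1.19+), so the handler never observes a half-published value and the nil case is handled in one place. The `FileMemory` stand-in, the `memory` variable, and `recentOrEmpty` are hypothetical names, not the project's code; guarding reads and writes with `searcherLock` would work as well.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// FileMemory is a stand-in for the real type; only one method is sketched.
type FileMemory struct{ recent []string }

// GetRecentFiles returns at most n recent files.
func (m *FileMemory) GetRecentFiles(n int) []string {
	if n > len(m.recent) {
		n = len(m.recent)
	}
	return m.recent[:n]
}

// memory replaces a bare global pointer: Load and Store are atomic,
// so the indexing goroutine and request handlers do not race.
var memory atomic.Pointer[FileMemory]

// recentOrEmpty returns ok=false while indexing has not yet stored a
// memory, letting the handler answer with an empty list or HTTP 503.
func recentOrEmpty(n int) ([]string, bool) {
	m := memory.Load()
	if m == nil {
		return []string{}, false
	}
	return m.GetRecentFiles(n), true
}

func main() {
	_, ready := recentOrEmpty(5)
	fmt.Println("ready before indexing:", ready)

	// What indexFiles would do once its work is complete.
	memory.Store(&FileMemory{recent: []string{"a.go", "b.go"}})

	files, ready := recentOrEmpty(5)
	fmt.Println(files, ready)
}
```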
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@demo/main.go` around lines 103 - 105, The code reads globalMemory
(recentFiles := globalMemory.GetRecentFiles(5); boostedFiles :=
globalMemory.GetBoostScores(query)) without nil-checking or synchronization,
risking a nil panic and a data race because globalMemory is set in indexFiles()
concurrently; fix by ensuring globalMemory is safely published before use
(either initialize it synchronously before the server accepts requests or
protect reads/writes with the same mutex used for searcher, e.g., acquire
searcherLock (or a new memoryLock) around writes in indexFiles and around reads
here), and add a nil-check/early return or HTTP 503 if globalMemory is not yet
ready to avoid dereferencing nil in GetRecentFiles/GetBoostScores.
// rank and apply boosts
scoreBuf := s.scorePool.Get().([]int)
defer func() {
// reset the buffer before returning it to the pool
for i := range scoreBuf {
scoreBuf[i] = math.MinInt
}
s.scorePool.Put(scoreBuf)
}()
for i, nameNorm := range s.FilenamesOnly {
// Instead of: runesName := []rune(nameNorm)
// check the byte length with len() first as a quick pre-filter
if len(nameNorm) < queryLen {
continue
}
// Compare against the beginning of the filename
targetStr1 := fastSubstring(nameNorm, queryLen)
// If the slice is still shorter than the query (because of UTF-8 characters), skip it
if len(targetStr1) < len(queryNorm) { // comparing byte lengths is fine because both are normalized
continue
}
dist := LevenshteinRatio(queryNorm, targetStr1)
// Also compare with one extra character (in case the typo added a character)
if len(nameNorm) > len(targetStr1) {
// Take a prefix that is one rune longer
targetStr2 := fastSubstring(nameNorm, queryLen+1)
d2 := LevenshteinRatio(queryNorm, targetStr2)
if d2 < dist {
dist = d2
}
}
/*
In the code above, take "mian" as an example: target 1 is "main" and target 2 is "maina"
Target 1 gives dist = d1 = 2, but target 2 gives dist = d2 = 3
if d2 < dist {
dist = d2
}
In other words, take d2 if it is smaller than d1, otherwise keep d1
Effectively min(d1, d2)
*/
// If the typo distance is within the allowed threshold, compute the score
// A robust solution for typos that stray too far (if it is not, please open a PR to help me out)
if dist <= baseThreshold {
// Base score 3000
score := 3000 - (dist * 400)
runeCountName := 0
for range nameNorm {
runeCountName++
}
lenDiff := runeCountName - queryLen
if lenDiff > 0 {
score -= (lenDiff * 15) // penalize name length
}
// Exact-match bonus
if lenDiff == 0 && dist == 0 {
score += 1000
}
// Penalize path length
score -= len(s.Originals[i]) / 5
for _, m := range matches {
scoreBuf[m.Index] = m.Score
}
There was a problem hiding this comment.
scoreBuf is populated but never used.
Lines 150-152 populate scoreBuf with match scores, but the subsequent ranking loop (lines 154-175) recalculates scores from matches directly and never reads from scoreBuf. This allocation and population is wasted work.
🔧 Proposed fix - remove unused scoreBuf logic
- // rank and apply boosts
- scoreBuf := s.scorePool.Get().([]int)
- defer func() {
- // reset the buffer before returning it to the pool
- for i := range scoreBuf {
- scoreBuf[i] = math.MinInt
- }
- s.scorePool.Put(scoreBuf)
- }()
-
- for _, m := range matches {
- scoreBuf[m.Index] = m.Score
- }
-
rankedResults := make([]MatchResult, 0, len(matches))
for _, m := range matches {
If scoreBuf was intended for deduplication or lookup purposes, consider adding a comment explaining its purpose or implementing the intended logic.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@fuzzyvn.go` around lines 140 - 152, The code allocates and fills scoreBuf
from s.scorePool.Get() but never reads it afterwards (scoreBuf,
s.scorePool.Get(), matches), causing wasted work; either remove the scoreBuf
allocation/population block entirely or actually use scoreBuf in the subsequent
ranking loop (replace direct reads of m.Score from matches with lookups into
scoreBuf by m.Index) and keep the buffer reset/Put logic if you retain the pool;
if scoreBuf was intended for deduplication/additional lookup, implement that
logic or add a clarifying comment next to scoreBuf to avoid unnecessary
allocation.
| cli: | ||
| @cd demo && go run cli_search.go |
There was a problem hiding this comment.
Add cli to .PHONY.
Right now a file or directory named cli will make this target a no-op. gen has the same problem while you're touching the target list.
Possible fix
-.PHONY: demo test bench
+.PHONY: demo cli test bench gen
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Makefile` around lines 7 - 8, The Makefile defines targets named cli and gen
that are not listed in .PHONY, so an existing file/dir named cli or gen will
prevent those targets from running; add cli and gen to the .PHONY declaration
(alongside existing phony targets) to ensure they're always executed, updating
the .PHONY line to include "cli" and "gen" so targets cli and gen (and any
others already present) are treated as phony.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
fuzzyvn.go (1)
205-223: Consider capping typo scores to avoid zero or low scores for valid matches.
For long queries (e.g., 40 chars), threshold becomes 10, and if dist == threshold, the score becomes 100 - 10*10 = 0. While technically correct, a zero-scored typo match may be unexpectedly filtered or ranked poorly.
♻️ Proposed fix - ensure minimum positive score
if dist <= threshold {
+ score := 100 - dist*10
+ if score < 10 {
+ score = 10 // ensure typo matches always have positive relevance
+ }
matches = append(matches, core.FuzzyMatch{
Index: i,
- Score: 100 - dist*10, // typo score is lower than the fuzzy score
+ Score: score,
})
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@fuzzyvn.go` around lines 205 - 223, The findButTypo function can produce zero
or very low Score values for long queries because Score is computed as 100 -
dist*10; change the scoring so typo matches are floored to a sensible minimum
(e.g., at least 1 or 5/10) instead of allowing 0 or negative. Locate
findButTypo, and after computing dist and threshold, compute the rawScore (100 -
dist*10) then clamp it with a minScore constant (e.g., minScore := 10) before
assigning to core.FuzzyMatch.Score so every valid typo match has a positive,
non-zero score.
bench_linux_test.go (1)
43-53: Avoid fmt.Printf in benchmarks; use b.Logf instead.
Using fmt.Printf unconditionally prints output on every benchmark run, cluttering results. b.Logf only prints when -v is passed or on failure, which is the idiomatic approach.
♻️ Proposed fix
- fmt.Printf("\n--- Linux Benchmark Check ---\n")
- fmt.Printf("Total files scanned: %d\n", len(files))
+ b.Logf("\n--- Linux Benchmark Check ---")
+ b.Logf("Total files scanned: %d", len(files))
if len(files) < limit {
b.Fatalf("ERROR: Hệ thống chỉ có %d files, không đủ %d để benchmark", len(files), limit)
}
- fmt.Println("Sample (first 10 files):")
+ b.Log("Sample (first 10 files):")
for i := 0; i < 10; i++ {
- fmt.Printf(" [%d] %s\n", i+1, files[i])
+ b.Logf(" [%d] %s", i+1, files[i])
}
- fmt.Printf("-----------------------------\n")
+ b.Log("-----------------------------")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bench_linux_test.go` around lines 43 - 53, Replace the direct fmt.* prints in
the Linux benchmark block with the testing.B logger so output only appears on -v
or failure: change fmt.Printf("\n--- Linux Benchmark Check ---\n"),
fmt.Printf("Total files scanned: %d\n", ...), fmt.Println("Sample (first 10
files):"), and the loop's fmt.Printf lines to b.Logf(...) (keep the same
messages/formatting), and replace the final
fmt.Printf("-----------------------------\n") with b.Logf as well; locate these
calls in the benchmark function in bench_linux_test.go and use b.Logf to emit
the same text.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@bench_linux_test.go`:
- Around line 43-53: Replace the direct fmt.* prints in the Linux benchmark
block with the testing.B logger so output only appears on -v or failure: change
fmt.Printf("\n--- Linux Benchmark Check ---\n"), fmt.Printf("Total files
scanned: %d\n", ...), fmt.Println("Sample (first 10 files):"), and the loop's
fmt.Printf lines to b.Logf(...) (keep the same messages/formatting), and replace
the final fmt.Printf("-----------------------------\n") with b.Logf as well;
locate these calls in the benchmark function in bench_linux_test.go and use
b.Logf to emit the same text.
In `@fuzzyvn.go`:
- Around line 205-223: The findButTypo function can produce zero or very low
Score values for long queries because Score is computed as 100 - dist*10; change
the scoring so typo matches are floored to a sensible minimum (e.g., at least 1
or 5/10) instead of allowing 0 or negative. Locate findButTypo, and after
computing dist and threshold, compute the rawScore (100 - dist*10) then clamp it
with a minScore constant (e.g., minScore := 10) before assigning to
core.FuzzyMatch.Score so every valid typo match has a positive, non-zero score.
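The clamping described for findButTypo can be sketched in isolation as below. The helper name `typoScore` and the constant `minTypoScore` are hypothetical; the formula `100 - dist*10` and the floor of 10 are taken from the suggestion itself.

```go
package main

import "fmt"

// minTypoScore is the floor suggested in the review (hypothetical name).
const minTypoScore = 10

// typoScore converts an edit distance into a typo-match score using the
// 100 - dist*10 formula, never returning less than minTypoScore.
func typoScore(dist int) int {
	score := 100 - dist*10
	if score < minTypoScore {
		return minTypoScore
	}
	return score
}

func main() {
	// dist 10 is the case from the review: the unclamped score would be 0.
	for _, d := range []int{0, 3, 10, 12} {
		fmt.Printf("dist=%d score=%d\n", d, typoScore(d))
	}
}
```

With the floor in place, distances of 9 and above all map to the same score, so ordering among very distant typo matches would have to come from other signals such as boosts.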
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ccad7bce-e3d8-403a-a027-495799168793
📒 Files selected for processing (6)
- .github/workflows/release.yml
- bench_linux_test.go
- demo/cli_search.go
- demo/main.go
- fuzzyvn.go
- go.mod
✅ Files skipped from review due to trivial changes (2)
- go.mod
- demo/cli_search.go
🚧 Files skipped from review as they are similar to previous changes (2)
- .github/workflows/release.yml
- demo/main.go
Note
Changes
Performance
- Replace sort.Slice, O(N log N), with a partial sort that uses a min-heap, O(N log 20), cutting sorting cost by roughly 10x on large result sets
- … FuzzyFindParallel
- Remove calls to unicode.IsLower/IsUpper, replacing them with a switch on bytes to detect word boundaries
- Switch from []rune to []byte, reducing CPU cycles and memory pressure
- Use a high-performance Jaro-Winkler implementation that uses sync.Pool for strings up to 128 bytes long
- Use sync.Pool for the score buffers to eliminate data contention during concurrent searches
New scoring system
- … (main -> mian).
- … mian.go when searching for main.go, and vice versa
- … FileMemory with time-based score decay to prioritize frequently selected files
Filtering
- DeleteFile marks deleted files in the Bin instead of rebuilding
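To illustrate the min-heap partial sort mentioned in the performance notes, here is a minimal top-K selection sketch built on the standard `container/heap` package. The `match`, `minHeap`, and `topK` names are assumptions for this example and do not reflect the actual FuzzyFindParallel code, which also merges per-worker heaps.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// match pairs an item index with its score (hypothetical type).
type match struct {
	Index, Score int
}

// minHeap keeps the lowest-scoring retained match at the root.
type minHeap []match

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].Score < h[j].Score }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(match)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// topK returns the k highest-scoring matches in descending order.
// Each of the N scores costs at most O(log k) heap work, so the
// total is O(N log k) instead of O(N log N) for a full sort.
func topK(scores []int, k int) []match {
	if k <= 0 {
		return nil
	}
	h := make(minHeap, 0, k)
	for i, s := range scores {
		if len(h) < k {
			heap.Push(&h, match{Index: i, Score: s})
		} else if s > h[0].Score {
			h[0] = match{Index: i, Score: s} // replace the weakest match
			heap.Fix(&h, 0)
		}
	}
	out := []match(h)
	sort.Slice(out, func(a, b int) bool { return out[a].Score > out[b].Score })
	return out
}

func main() {
	scores := []int{5, 42, 17, 99, 3, 64}
	fmt.Println(topK(scores, 3))
}
```

Only the final k survivors are fully sorted, which is where the saving on large result sets comes from when k is fixed at a small value such as 20.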