FuzzyVN V3.0.0 by versenilvis · Pull Request #3 · versenilvis/fuzzyvn

versenilvis · 2026-04-05T19:59:32Z

Note

Changes

Performance

Replaced sort.Slice, O(N log N), with a partial sort backed by a min-heap, O(N log 20), cutting sorting cost roughly 10x on large result sets
Upgraded filtered search from single-threaded to multi-core (parallel), reaching performance on par with FuzzyFindParallel
Skips the re-sort step entirely when no Memory/Context boosts are active
Removed the unicode.IsLower/IsUpper calls, replacing them with a switch on bytes to detect word boundaries
Converted the core search operations from []rune to []byte, reducing CPU cycles and memory pressure
Uses a high-performance Jaro-Winkler implementation backed by sync.Pool for strings up to 128 bytes long
Uses sync.Pool for score buffers to eliminate data contention during concurrent searches
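
The partial sort described in the first item can be sketched as follows. This is a minimal illustration, not the PR's code: the `match` type and the `topK` function are hypothetical names, and only the standard `container/heap` package is assumed. The heap never holds more than k entries, so each of the N candidates costs at most O(log k) work.

```go
package main

import "container/heap"

// match pairs a candidate index with its score (hypothetical type for this sketch).
type match struct {
	Index int
	Score int
}

// minHeap keeps the lowest score at the root so it can be replaced cheaply.
type minHeap []match

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].Score < h[j].Score }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(match)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// topK returns the k highest-scoring entries of scores, best first.
func topK(scores []int, k int) []match {
	if k <= 0 {
		return nil
	}
	h := &minHeap{}
	for i, s := range scores {
		if h.Len() < k {
			heap.Push(h, match{Index: i, Score: s})
		} else if s > (*h)[0].Score {
			// Replace the current minimum and restore the heap property.
			(*h)[0] = match{Index: i, Score: s}
			heap.Fix(h, 0)
		}
	}
	// Popping yields ascending scores, so fill the output from the back.
	out := make([]match, h.Len())
	for i := len(out) - 1; i >= 0; i-- {
		out[i] = heap.Pop(h).(match)
	}
	return out
}
```

With k fixed at 20, as in the description, the per-candidate cost is bounded by a small constant regardless of how many results match.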

New scoring system

Four-tier scoring system:
- Tier 1 (+1,000,000): Perfect prefix match (the query sits at the very start of the file name).
- Tier 2 (+500,000): Character-containment match (every character of the query appears in the short file name, which catches transposition errors such as main -> mian).
- Tier 3 (+200%): Partial file-name match (at least one character matches within the file-name portion).
- Tier 4 (penalty): Match only in the path (points are deducted based on path length).
Transposition handling: counting character frequencies in the file name regardless of order lets a search for main.go find mian.go, and vice versa
Integrated FileMemory with time-based score decay to prioritize frequently selected files
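
The order-independent check behind Tier 2 can be illustrated with a small frequency table. This is a sketch under assumptions: `containsAllBytes` is a hypothetical helper, and the PR's actual counting and bonus logic may differ.

```go
package main

// containsAllBytes reports whether every byte of query occurs in name at
// least as many times as it occurs in query, ignoring order.
func containsAllBytes(query, name []byte) bool {
	// int counters avoid wrap-around on long names with many repeated bytes.
	var counts [256]int
	for _, c := range name {
		counts[c]++
	}
	for _, c := range query {
		if counts[c] == 0 {
			return false
		}
		counts[c]--
	}
	return true
}
```

Because order is ignored, the query main is contained in mian.go, while it is not contained in mine.go, which has no 'a'.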

Filtering

Typo-tolerant bitset index (allows 1-2 missing characters for long queries)
Characters that appear in 85% of files are excluded from the index to reduce noise
DeleteFile marks deleted files in Bin instead of rebuilding the index
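
One way to picture a typo-tolerant character filter is a per-file bitmask compared against the query's bitmask. The sketch below is an assumption for illustration only: `charMask` and `passes` are hypothetical names, the one-mask-per-file layout is not necessarily how UnigramFilter stores its index, and the allowed number of missing characters is left as a parameter because the exact thresholds are not stated here.

```go
package main

import "math/bits"

// charMask sets one bit per distinct byte in a-z and 0-9 found in s.
// Other bytes are ignored in this sketch.
func charMask(s []byte) uint64 {
	var m uint64
	for _, c := range s {
		switch {
		case c >= 'a' && c <= 'z':
			m |= uint64(1) << uint(c-'a')
		case c >= '0' && c <= '9':
			m |= uint64(1) << (26 + uint(c-'0'))
		}
	}
	return m
}

// passes reports whether at most maxMissing distinct query characters are
// absent from the file's mask.
func passes(queryMask, fileMask uint64, maxMissing int) bool {
	return bits.OnesCount64(queryMask&^fileMask) <= maxMissing
}
```

For example, the query config against the name confg.yaml is missing only 'i', so it passes when one missing character is allowed and fails when none are.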

…n, loop cleanup

… thread-safe pooling

coderabbitai · 2026-04-05T19:59:47Z

Walkthrough

Refactors the search implementation to use byte-normalized indexes, adds a UnigramFilter for candidate reduction, implements Jaro–Winkler similarity, introduces in-memory frecency tracking (FileMemory) with decay and persistence, and provides parallel top‑K fuzzy matching with min‑heap merging. Search API now accepts SearchOptions for context boosts.

Changes

Cohort / File(s)	Summary
CI & Build `\.github/workflows/release.yml`, `Makefile`, `go.mod`	Workflow triggers expanded to push→main and pull_request→main; actions/setup-go toolchain updated to 1.24; release step gated to v* tags. Added `cli` Make target. go.mod toolchain set to `go 1.24`.
Core: filtering, scoring, utils, worker `core/filter.go`, `core/score.go`, `core/worker.go`, `core/jaro.go`, `core/utils.go`	New UnigramFilter (ASCII bitset index + deletions), greedy fuzzy scorer, parallel/top‑K fuzzy search using per‑goroutine min‑heaps, Jaro‑Winkler similarity, normalization (including Vietnamese mappings), Levenshtein ratio, and assorted text utilities.
Memory & frecency `core/memory.go`	New FileMemory/FileRecord types with RecordSelection, time‑decay frecency scoring (GetBoostScores), max‑entries eviction, Export/Import, and GetRecentFiles. Thread‑safe via RWMutex.
Public API & searcher refactor `fuzzyvn.go`	Replaced QueryCache with core.FileMemory and UnigramFilter; Searcher uses byte-normalized index and baseStarts; Search signature changed to `Search(query string, opts ...*SearchOptions)`; added SearchOptions, NewSearcherWithMemory, Normalize/LevenshteinRatio aliases, RecordSelection/ ClearCache updated to use FileMemory; candidate reduction + parallel scoring + typo fallback integrated.
Demo / server / CLI `demo/main.go`, `demo/cli_search.go`	Server switched from cache to memory, response renamed `RecentFiles`, removed `/cache-info`, updated error handling. Added demo CLI (`demo/cli_search.go`) and Makefile `cli` target.
Benchmarks & tests `bench_linux_test.go`, `fuzzyvn_test.go`	Added Linux 100k-file benchmarks and NewSearcher construction benchmark. Tests updated to exercise FileMemory (RecordSelection, GetBoostScores, Export/Import, concurrency, edge cases); removed QueryCache tests; benchmark adjustments to use core.FileMemory.
Documentation & bench results `README.md`, `docs/bench.md`, `docs/bench_result_amd.txt`, `docs/bench_result_n2.txt`	README rewritten to replace cache demo with memory/frecency demo, updated API docs for Search signature and ClearCache, removed QueryCache docs. Added detailed benchmark docs and results files.
Misc `fuzzyvn.go` (large refactor)	Major removal of old in-file fuzzy/cache code and replacement with core-based implementations; many lines reworked across package to adopt new architecture.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Searcher
    participant UnigramFilter
    participant Worker as FuzzyWorker
    participant FileMemory

    Client->>Searcher: Search(query, opts)
    Searcher->>Searcher: Normalize query (bytes)
    Searcher->>FileMemory: GetBoostScores(query)
    FileMemory-->>Searcher: boost map
    Searcher->>UnigramFilter: Filter(query bytes)
    alt candidates returned
        UnigramFilter-->>Searcher: indices
        Searcher->>FuzzyWorker: FuzzyFindFiltered(query, candidates)
    else no candidates
        Searcher->>FuzzyWorker: FuzzyFindParallel(query, all items)
    end
    FuzzyWorker->>FuzzyWorker: parallel scoring, local heaps
    FuzzyWorker-->>Searcher: merged top‑K matches
    Searcher->>Searcher: apply memory/context boosts, sort, trim
    Searcher-->>Client: []string results
    Client->>FileMemory: RecordSelection(query, selected_file)
    FileMemory->>FileMemory: update counts, timestamps, evict if needed

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🐰 From cache to memory, bytes hop and play,
Filters hum softly and Jaro leads the way,
Heaps gather winners while frecency grows,
A rabbit cheers quietly as performance shows. 🥕✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 27.27% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The PR title 'FuzzyVN V3.0.0' is vague and uses a generic version number that does not convey what the changeset actually accomplishes or what the primary changes are.	Provide a more descriptive title that highlights the main technical improvement or feature, such as 'Refactor search engine to use byte-based scoring with min-heap optimization' or 'Replace query cache with FileMemory system for frecency-based ranking'.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

gemini-code-assist

Code Review

This pull request introduces a new memory-based caching system (Frecency) to replace the previous query cache, adds a bitset-based filter for faster search performance, and includes comprehensive benchmark results. I have provided feedback on a critical bug in the string normalization logic, a performance improvement for typo-handling, and minor code optimizations for readability and idiomatic Go usage.

gemini-code-assist · 2026-04-05T20:04:36Z

+	if isASCII {
+		buf := make([]byte, len(s))
+		for i, char := range []byte(s) {
+			if char >= 'A' && char <= 'Z' {
+				buf[i] = char + 32
+			} else if (char >= 'a' && char <= 'z') || (char >= '0' && char <= '9') || char == '.' || char == '/' || char == '\\' || char == '_' || char == '-' || char == ' ' {
+				buf[i] = char
+			}
+		}
+		return string(buf)
+	}


In the ASCII fast path of Normalize, you allocate buf with length len(s) and then assign values at index i:

buf := make([]byte, len(s))
for i, char := range []byte(s) {
	// ... assign buf[i]
}

The problem appears when a character is not in the allowed list (for example #, !, ?). In that case buf[i] is never assigned and keeps its zero value, \x00 (a null byte).
For example, Normalize("a#b") returns "a\x00b".

A null byte in the normalized string can cause unexpected errors in later comparison and search steps.

To fix this, allocate the buffer with len=0 and cap=len(s), then use append to add only the valid characters. Unwanted characters are then dropped entirely instead of being replaced with null bytes.

Suggested change

-	if isASCII {
-		buf := make([]byte, len(s))
-		for i, char := range []byte(s) {
-			if char >= 'A' && char <= 'Z' {
-				buf[i] = char + 32
-			} else if (char >= 'a' && char <= 'z') || (char >= '0' && char <= '9') || char == '.' || char == '/' || char == '\\' || char == '_' || char == '-' || char == ' ' {
-				buf[i] = char
-			}
-		}
-		return string(buf)
-	}
+	if isASCII {
+		buf := make([]byte, 0, len(s))
+		for _, char := range []byte(s) {
+			if char >= 'A' && char <= 'Z' {
+				buf = append(buf, char+32)
+			} else if (char >= 'a' && char <= 'z') || (char >= '0' && char <= '9') || char == '.' || char == '/' || char == '\\' || char == '_' || char == '-' || char == ' ' {
+				buf = append(buf, char)
+			}
+		}
+		return string(buf)
+	}
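
A self-contained version of the append-based approach makes the behaviour easy to check in isolation. This is a sketch, assuming only the allowed-character set shown in the diff; `normalizeASCII` and `isAllowed` are hypothetical names, and the real Normalize also handles non-ASCII input, which is omitted here.

```go
package main

// isAllowed reports whether an ASCII byte is kept as-is by the normalizer.
func isAllowed(c byte) bool {
	switch {
	case c >= 'a' && c <= 'z', c >= '0' && c <= '9':
		return true
	}
	switch c {
	case '.', '/', '\\', '_', '-', ' ':
		return true
	}
	return false
}

// normalizeASCII lowercases A-Z, keeps allowed bytes, and drops everything
// else, so the result never contains placeholder null bytes.
func normalizeASCII(s string) string {
	buf := make([]byte, 0, len(s))
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c >= 'A' && c <= 'Z' {
			buf = append(buf, c+('a'-'A'))
		} else if isAllowed(c) {
			buf = append(buf, c)
		}
	}
	return string(buf)
}
```

With this shape, the input a#b normalizes to ab, with no \x00 left in the middle.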

gemini-code-assist · 2026-04-05T20:04:36Z

 */
-func (s *Searcher) RecordSelection(query, filePath string) {
-	if s.Cache != nil {
-		s.Cache.RecordSelection(query, filePath)
+func (s *Searcher) findButTypo(query string) []core.FuzzyMatch {
+	var matches []core.FuzzyMatch
+	// compare against file names only, for the best typo accuracy
+	for i, filename := range s.FilenamesOnly {
+		dist := core.LevenshteinRatio(query, filename)
+		// allow 1 wrong character per 4 typed characters
+		threshold := len(query) / 4
+		if threshold < 1 {
+			threshold = 1
+		}
+		if dist <= threshold {
+			matches = append(matches, core.FuzzyMatch{
+				Index: i,
+				Score: 100 - dist*10, // typo scores rank below fuzzy scores
+			})
+		}
 	}
+	return matches


findButTypo scans all of s.FilenamesOnly sequentially to compute the Levenshtein distance. With a large number of files (for example > 100k), this can become a performance bottleneck, especially compared with the fuzzy search functions that are already parallelized.

To improve performance, consider parallelizing this loop, similar to what FuzzyFindParallel does. You can split s.FilenamesOnly into chunks and process them on multiple goroutines.
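
A chunked version of the scan could look like the sketch below. It is an illustration under assumptions: `typoMatch` stands in for core.FuzzyMatch, `editDistance` is a plain byte-wise Levenshtein written for this example, and `findTypoParallel` is a hypothetical name. Each goroutine writes only to its own slot of `parts`, so no locking is needed.

```go
package main

import (
	"runtime"
	"sync"
)

// typoMatch is a hypothetical stand-in for core.FuzzyMatch.
type typoMatch struct {
	Index int
	Score int
}

// editDistance computes the byte-wise Levenshtein distance with two rolling rows.
func editDistance(a, b string) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			best := prev[j-1] // substitution, free when the bytes are equal
			if a[i-1] != b[j-1] {
				best++
			}
			if d := prev[j] + 1; d < best { // deletion
				best = d
			}
			if d := curr[j-1] + 1; d < best { // insertion
				best = d
			}
			curr[j] = best
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

// findTypoParallel scans names in contiguous chunks, one goroutine per chunk,
// and returns the matches within the threshold, ordered by index.
func findTypoParallel(query string, names []string) []typoMatch {
	threshold := len(query) / 4
	if threshold < 1 {
		threshold = 1
	}
	workers := runtime.GOMAXPROCS(0)
	chunk := (len(names) + workers - 1) / workers
	parts := make([][]typoMatch, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start := w * chunk
		if start >= len(names) {
			break
		}
		end := start + chunk
		if end > len(names) {
			end = len(names)
		}
		wg.Add(1)
		go func(w, start, end int) {
			defer wg.Done()
			for i := start; i < end; i++ {
				if d := editDistance(query, names[i]); d <= threshold {
					parts[w] = append(parts[w], typoMatch{Index: i, Score: 100 - d*10})
				}
			}
		}(w, start, end)
	}
	wg.Wait()
	var out []typoMatch
	for _, p := range parts {
		out = append(out, p...)
	}
	return out
}
```

Whether this pays off depends on the file count; for small indexes the goroutine overhead may outweigh the gain, so benchmarking both paths is advisable.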

gemini-code-assist · 2026-04-05T20:04:36Z

+		// Reset the part of the buffer that will be used
+		for i := 0; i < l1; i++ { buf[0][i] = false }
+		for i := 0; i < l2; i++ { buf[1][i] = false }


You reset the buffer with for loops. This is correct, but it can be replaced with the clear() built-in introduced in Go 1.21, which makes the code more concise and idiomatic.

If your project targets Go 1.21+, consider this change.

Suggested change

-		// Reset the part of the buffer that will be used
-		for i := 0; i < l1; i++ { buf[0][i] = false }
-		for i := 0; i < l2; i++ { buf[1][i] = false }
+		// Reset the part of the buffer that will be used
+		clear(buf[0][:l1])
+		clear(buf[1][:l2])
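
The point of slicing before calling clear is that only the prefix in use is zeroed. A minimal sketch, assuming two reusable []bool buffers as in the snippet above (`resetFlags` is a hypothetical helper, and clear requires Go 1.21 or newer):

```go
package main

// resetFlags zeroes the first n1 entries of a and the first n2 entries of b.
// clear on a slice sets every element of that slice to its zero value, so
// elements beyond the sliced prefix are left untouched.
func resetFlags(a, b []bool, n1, n2 int) {
	clear(a[:n1])
	clear(b[:n2])
}
```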

gemini-code-assist · 2026-04-05T20:04:36Z

+
+	// Tier 3: at least one match in the file name
+	if firstMatchIdx < baseStart {
+		totalScore += (totalScore * 200) / 100


The expression totalScore += (totalScore * 200) / 100 can be simplified to make it easier to read and more efficient.
It is equivalent to totalScore = totalScore + totalScore * 2, that is, totalScore *= 3.

Using the multiplication directly is clearer and avoids an unnecessary division.

Suggested change

-		totalScore += (totalScore * 200) / 100
+		totalScore *= 3

coderabbitai

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

fuzzyvn_test.go (1)
677-679: ⚠️ Potential issue | 🟠 Major

Same b.Loop() compatibility issue as in bench_linux_test.go.

Multiple benchmarks in this file use b.Loop() (lines 677, 685, 693, 702, 712, 721, 730, 742, 749, 764, 779, 795, 807, 821). b.Loop() was added in Go 1.24, so it does not compile on older toolchains; if Go 1.21 to 1.23 must be supported, replace all occurrences with for i := 0; i < b.N; i++.
🔧 Example fix pattern
-		for b.Loop() {
+		for i := 0; i < b.N; i++ {
 			searcher.Search("config")
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fuzzyvn_test.go` around lines 677 - 679, b.Loop() requires Go 1.24 or newer;
if the benchmarks must also build on Go 1.21 to 1.23, replace it with the
standard Go benchmark loop for i := 0; i < b.N; i++ in this test file: locate
each benchmark where b.Loop() is used (e.g., the block calling
searcher.Search("config")) and change the loop to for i := 0; i < b.N; i++;
apply the same replacement for all other occurrences listed in the comment
(lines with b.Loop() around searcher.Search and similar benchmark bodies).

🧹 Nitpick comments (3)

core/utils.go (1)
281-295: Rename LevenshteinRatio to match its return value.

This API returns raw edit distance, not a ratio or percentage. LevenshteinDistance would be much less error-prone for callers, with LevenshteinRatio kept as a deprecated alias only if compatibility matters.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/utils.go` around lines 281 - 295, The function LevenshteinRatio
currently returns a raw edit distance; rename the exported function to
LevenshteinDistance (update its doc comment accordingly) and leave
LevenshteinRatio as a thin deprecated wrapper that calls LevenshteinDistance to
preserve backward compatibility; update all internal references/usages to call
LevenshteinDistance and add a deprecation comment on LevenshteinRatio so callers
can migrate smoothly.
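
The rename-with-alias pattern suggested here can be sketched as below. The string-in, int-out signature is an assumption inferred from how the function is called elsewhere in this review; the body is a generic byte-wise Levenshtein written for illustration, not the project's implementation.

```go
package main

// LevenshteinDistance returns the number of single-byte insertions,
// deletions, and substitutions needed to turn a into b.
func LevenshteinDistance(a, b string) int {
	// row[j] holds the distance between the processed prefix of a and b[:j].
	row := make([]int, len(b)+1)
	for j := range row {
		row[j] = j
	}
	for i := 1; i <= len(a); i++ {
		diag := row[0] // value of row[j-1] from the previous iteration of i
		row[0] = i
		for j := 1; j <= len(b); j++ {
			best := diag
			if a[i-1] != b[j-1] {
				best++
			}
			if d := row[j] + 1; d < best {
				best = d
			}
			if d := row[j-1] + 1; d < best {
				best = d
			}
			diag = row[j]
			row[j] = best
		}
	}
	return row[len(b)]
}

// LevenshteinRatio returns the raw edit distance despite its name.
//
// Deprecated: use LevenshteinDistance.
func LevenshteinRatio(a, b string) int {
	return LevenshteinDistance(a, b)
}
```

The "Deprecated:" paragraph in the doc comment is the convention that Go tooling recognizes, so callers are warned without the old name breaking.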
fuzzyvn_test.go (1)
263-266: Consider reducing sleep duration or using alternative synchronization.

The time.Sleep(1100ms) calls make this test take ~3.3 seconds. While this works, it slows down the test suite. Consider using shorter intervals if the underlying implementation supports millisecond-level precision, or document why second-level precision is required.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fuzzyvn_test.go` around lines 263 - 266, The current test uses long
time.Sleep(1100 * time.Millisecond) to force distinct Unix-second timestamps;
replace these sleeps with much shorter sleeps (e.g., 10-50 * time.Millisecond)
if the code under test (mem.RecordSelection) supports millisecond precision, or
better yet change the test to avoid real sleeping by using a controllable clock
or by adding an overload to RecordSelection that accepts an explicit timestamp
(e.g., RecordSelectionWithTime or passing time.Now() from test) so you can
synthesize distinct timestamps deterministically; update calls to
mem.RecordSelection("q2", "/b.go") and mem.RecordSelection("q3", "/c.go")
accordingly and remove the long 1100ms sleeps.
fuzzyvn.go (1)
234-236: ClearCache may leave stale references if Memory was shared.

When Memory was passed via NewSearcherWithMemory, calling ClearCache creates a new FileMemory instance but doesn't clear the original shared memory. Other searchers sharing the same memory instance will retain the old data.

This may be intentional (each searcher gets independent memory after clear), but it's worth documenting the behavior.
📝 Consider adding documentation
 /*
-ClearCache: Clears the selection-history memory
+ClearCache: Clears the selection-history memory of this Searcher.
+Note: if Memory is shared via NewSearcherWithMemory, other Searchers
+keep their old data.
 */
 func (s *Searcher) ClearCache() {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fuzzyvn.go` around lines 234 - 236, The ClearCache method currently replaces
s.Memory with a new core.NewFileMemory(nil) but does not mutate or clear the
original Memory object passed via NewSearcherWithMemory, leaving other searchers
that hold that shared instance unchanged; update the comment/docstring above
ClearCache to explicitly state that ClearCache creates a fresh FileMemory for
this Searcher and does not clear or modify any previously shared Memory
instances, and if the intended behavior is to clear shared memory instead,
implement and call a clear-style method on the Memory interface (e.g.,
Memory.Clear()) or detect and clear the existing instance rather than replacing
it.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/release.yml:
- Around line 5-9: The release action is running on PRs and non-tag pushes; add
a tag-only guard so it only runs for tag pushes (refs/tags/v*). Update the
release job or the specific release step (e.g., the job named "release" or the
step labeled "Create Release"/release publishing) to include an if condition
like startsWith(github.ref, 'refs/tags/v') so the job/step executes only when a
tag matching v* is pushed.

In `@bench_linux_test.go`:
- Around line 84-87: The benchmark uses b.Loop(), which was added in Go 1.24 and
does not compile on older toolchains; if Go 1.21 to 1.23 must be supported,
update the loop around NewSearcher(files) to use the standard b.N pattern
(e.g., for i := 0; i < b.N; i++ { NewSearcher(files) }) while keeping the
existing b.ResetTimer() call so the benchmark still measures
NewSearcher(files) per iteration.

In `@core/filter.go`:
- Around line 117-124: UnigramFilter.DeleteFile currently fails for negative
docID and does a non-atomic read/modify/write on uf.Bin, so add a guard that
returns if docID < 0 || docID >= uf.NumTargets, compute blockIdx and bitPos as
before, and perform a thread-safe update: either use a sync.Mutex (e.g., a mutex
on UnigramFilter or per-block locks) around the uf.Bin[blockIdx] |= bitPos
mutation or use atomic operations (a CAS loop with
atomic.LoadUint64/atomic.CompareAndSwapUint64 on uf.Bin[blockIdx]) to atomically
OR the bit; ensure the same synchronization strategy is used wherever the bitmap
is read (e.g., the Filter reader that accesses uf.Bin) to avoid races.

In `@core/jaro.go`:
- Around line 33-37: The JaroWinkler implementation currently hard-rejects when
the first byte differs (the a[0] != b[0] return 0.0 check); remove that
early-return so the full JaroWinkler similarity is computed for all inputs
(i.e., delete the first-byte check in the JaroWinkler function in core/jaro.go)
and ensure the function proceeds with the regular matching/weighting logic
instead of returning 0.0 for differing first characters.

In `@core/memory.go`:
- Around line 97-111: The current update logic for record.Queries leaves
queryNorm in its old slot when found, which prevents it from being treated as
most-recent; instead, when queryNorm exists you should remove it from
record.Queries and re-append it to the end so it becomes the newest entry, and
still enforce the max length (3) by trimming the oldest element if needed;
update the loop that checks for equality to record.Queries to capture the found
index, splice out that index when found, then append queryNorm and if
len(record.Queries) > 3 drop the first element so the ring buffer always
contains the three most recent queries.

In `@core/score.go`:
- Around line 85-96: The bucket used for Tier 2 frequency counts, charBucket
declared in the score calculation (charBucket [256]int8) can overflow for long
filenames; change its type to a wider integer (e.g., int16 or int) in the same
scope where baseStart, lenP, target, pattern and filenameHits are used so
repeated bytes are counted correctly and the Tier 2 bonus logic still triggers
when appropriate.

In `@core/utils.go`:
- Around line 43-52: The current ASCII-normalization loop preallocates buf :=
make([]byte, len(s)) and only writes some indices, leaving NULs for dropped
chars; change it to build a compact buffer by either using var buf []byte and
append(buf, char) for allowed characters or keep buf := make([]byte, len(s)) but
maintain a write index j and assign buf[j] = char then return string(buf[:j]);
update the isASCII branch (variables: buf, s, char) accordingly so unsupported
ASCII characters are removed instead of becoming NUL bytes.

In `@core/worker.go`:
- Around line 118-178: FuzzyFindParallel currently scans the full items slice
and can return entries for files marked deleted in the UnigramFilter/ Bin
bitmap; fix by adding a deleted bitmap parameter (e.g., deletedBin) to
FuzzyFindParallel and use it to skip deleted indexes both inside the per-worker
loop (before calling fuzzyScoreGreedy for index j) and when merging results from
resultChan into finalHeap (skip any FuzzyMatch whose Index is marked deleted);
update callers accordingly so deleted files are never scored or returned
(references: function FuzzyFindParallel, fuzzyScoreGreedy, resultChan,
finalHeap, heapToSorted).

In `@demo/main.go`:
- Around line 103-105: The code reads globalMemory (recentFiles :=
globalMemory.GetRecentFiles(5); boostedFiles :=
globalMemory.GetBoostScores(query)) without nil-checking or synchronization,
risking a nil panic and a data race because globalMemory is set in indexFiles()
concurrently; fix by ensuring globalMemory is safely published before use
(either initialize it synchronously before the server accepts requests or
protect reads/writes with the same mutex used for searcher, e.g., acquire
searcherLock (or a new memoryLock) around writes in indexFiles and around reads
here), and add a nil-check/early return or HTTP 503 if globalMemory is not yet
ready to avoid dereferencing nil in GetRecentFiles/GetBoostScores.

In `@fuzzyvn.go`:
- Around line 177-185: The current conditional sort only runs when memoryBoosts
or opts[0].ContextBoosts exist, causing non-deterministic ordering when no
boosts are present; always sort rankedResults after they are produced to ensure
deterministic ordering (tie-break by rankedResults[i].Str), i.e. remove the
conditional and invoke sort.Slice unconditionally after the
FuzzyFindFiltered/FuzzyFindParallel results are assembled, keeping the existing
comparator that prefers higher Score and falls back to alphabetical Str.
- Around line 140-152: The code allocates and fills scoreBuf from
s.scorePool.Get() but never reads it afterwards (scoreBuf, s.scorePool.Get(),
matches), causing wasted work; either remove the scoreBuf allocation/population
block entirely or actually use scoreBuf in the subsequent ranking loop (replace
direct reads of m.Score from matches with lookups into scoreBuf by m.Index) and
keep the buffer reset/Put logic if you retain the pool; if scoreBuf was intended
for deduplication/additional lookup, implement that logic or add a clarifying
comment next to scoreBuf to avoid unnecessary allocation.

In `@Makefile`:
- Around line 7-8: The Makefile defines targets named cli and gen that are not
listed in .PHONY, so an existing file/dir named cli or gen will prevent those
targets from running; add cli and gen to the .PHONY declaration (alongside
existing phony targets) to ensure they're always executed, updating the .PHONY
line to include "cli" and "gen" so targets cli and gen (and any others already
present) are treated as phony.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 18c81e78-3d35-4968-a7ac-4ef34866aa47

📥 Commits

Reviewing files that changed from the base of the PR and between 2b42a24 and e3dce6d.

📒 Files selected for processing (17)

.github/workflows/release.yml
Makefile
README.md
bench_linux_test.go
core/filter.go
core/jaro.go
core/memory.go
core/score.go
core/utils.go
core/worker.go
demo/cli_search.go
demo/main.go
docs/bench.md
docs/bench_result_amd.txt
docs/bench_result_n2.txt
fuzzyvn.go
fuzzyvn_test.go

coderabbitai · 2026-04-05T20:13:17Z

+func (uf *UnigramFilter) DeleteFile(docID int) {
+	if docID >= uf.NumTargets {
+		return
+	}
+	blockIdx := docID / 64
+	bitPos := uint64(1) << (docID % 64)
+	// mark the file as deleted by setting its bit in Bin
+	uf.Bin[blockIdx] |= bitPos


⚠️ Potential issue | 🟠 Major

Guard DeleteFile against invalid IDs and concurrent bitmap updates.

docID < 0 will panic here, and uf.Bin[blockIdx] |= bitPos is a non-atomic read/modify/write. Two deletes in the same 64-file block can lose one tombstone, and Filter also reads the same bitmap on Line 195 with no synchronization.

Also applies to: 185-195
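
A guarded, atomic version of the tombstone update might look like this sketch. The `tombstones` type and its methods are hypothetical stand-ins for the Bin bitmap in UnigramFilter; only the standard `sync/atomic` functions are assumed. A mutex would work equally well and may be simpler if deletes are rare.

```go
package main

import "sync/atomic"

// tombstones is a hypothetical stand-in for the Bin bitmap in UnigramFilter.
type tombstones struct {
	bin        []uint64
	numTargets int
}

// newTombstones allocates one bit per target, rounded up to whole words.
func newTombstones(n int) *tombstones {
	return &tombstones{bin: make([]uint64, (n+63)/64), numTargets: n}
}

// Delete marks docID as deleted. Out-of-range IDs, including negative ones,
// are ignored instead of panicking.
func (t *tombstones) Delete(docID int) {
	if docID < 0 || docID >= t.numTargets {
		return
	}
	word := &t.bin[docID/64]
	bit := uint64(1) << (uint(docID) % 64)
	// CAS loop: retry until the bit is observed set or our update wins, so
	// concurrent deletes in the same 64-file block cannot lose a tombstone.
	for {
		old := atomic.LoadUint64(word)
		if old&bit != 0 || atomic.CompareAndSwapUint64(word, old, old|bit) {
			return
		}
	}
}

// IsDeleted reads the bitmap with an atomic load so it pairs with Delete.
func (t *tombstones) IsDeleted(docID int) bool {
	if docID < 0 || docID >= t.numTargets {
		return false
	}
	bit := uint64(1) << (uint(docID) % 64)
	return atomic.LoadUint64(&t.bin[docID/64])&bit != 0
}
```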

coderabbitai · 2026-04-05T20:13:17Z

+	// Early exit: if the first bytes differ, the chance of Jaro > 0.7 is very low
+	// Quickly skips unrelated files in the history
+	if a[0] != b[0] {
+		return 0.0
+	}


⚠️ Potential issue | 🟠 Major

Don't hard-reject JaroWinkler when the first byte differs.

This turns a similarity algorithm into a heuristic. Strings like abc/xbc or main/pain can still clear the 0.7 threshold, so GetBoostScores will miss relevant history whenever the typo lands on the first character.

Minimal fix

-	// Early exit: if the first bytes differ, the chance of Jaro > 0.7 is very low
-	// Quickly skips unrelated files in the history
-	if a[0] != b[0] {
-		return 0.0
-	}
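
To see why a differing first byte does not rule out a score above 0.7, a textbook Jaro-Winkler can be run on the examples from this comment. The sketch below is a standard formulation (prefix scaling 0.1, prefix capped at 4 bytes) written for illustration; it is not the pooled implementation in core/jaro.go, and `jaro` and `jaroWinkler` are hypothetical names.

```go
package main

// jaro returns the Jaro similarity of two byte strings, in [0, 1].
func jaro(a, b string) float64 {
	if len(a) == 0 && len(b) == 0 {
		return 1
	}
	if len(a) == 0 || len(b) == 0 {
		return 0
	}
	longest := len(a)
	if len(b) > longest {
		longest = len(b)
	}
	window := longest/2 - 1
	if window < 0 {
		window = 0
	}
	matchedA := make([]bool, len(a))
	matchedB := make([]bool, len(b))
	matches := 0
	for i := 0; i < len(a); i++ {
		lo, hi := i-window, i+window+1
		if lo < 0 {
			lo = 0
		}
		if hi > len(b) {
			hi = len(b)
		}
		for j := lo; j < hi; j++ {
			if !matchedB[j] && a[i] == b[j] {
				matchedA[i], matchedB[j] = true, true
				matches++
				break
			}
		}
	}
	if matches == 0 {
		return 0
	}
	// Count matched bytes that appear in a different order in the two strings.
	outOfOrder, k := 0, 0
	for i := 0; i < len(a); i++ {
		if !matchedA[i] {
			continue
		}
		for !matchedB[k] {
			k++
		}
		if a[i] != b[k] {
			outOfOrder++
		}
		k++
	}
	m := float64(matches)
	return (m/float64(len(a)) + m/float64(len(b)) + (m-float64(outOfOrder)/2)/m) / 3
}

// jaroWinkler boosts the Jaro score by the common prefix length (at most 4).
func jaroWinkler(a, b string) float64 {
	j := jaro(a, b)
	prefix := 0
	for prefix < len(a) && prefix < len(b) && prefix < 4 && a[prefix] == b[prefix] {
		prefix++
	}
	return j + float64(prefix)*0.1*(1-j)
}
```

For main and pain, three of four bytes match in order, giving (3/4 + 3/4 + 1)/3, about 0.83; for abc and xbc the result is about 0.78. Both clear a 0.7 threshold even though the first byte differs.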

coderabbitai · 2026-04-05T20:13:17Z

+	// Update the query list (ring buffer)
+	foundQuery := false
+	for _, q := range record.Queries {
+		if q == queryNorm {
+			foundQuery = true
+			break
+		}
+	}
+	if !foundQuery {
+		if len(record.Queries) >= 3 {
+			// Drop the oldest entry
+			record.Queries = record.Queries[1:]
+		}
+		record.Queries = append(record.Queries, queryNorm)
+	}


⚠️ Potential issue | 🟡 Minor

Refresh an existing query instead of leaving it in place.

When queryNorm is already present, this path keeps it in its old slot. One later insert can evict a query that was just reused, so the "3 recent queries" list stops being recent.

Possible fix

-	foundQuery := false
-	for _, q := range record.Queries {
-		if q == queryNorm {
-			foundQuery = true
-			break
-		}
-	}
-	if !foundQuery {
-		if len(record.Queries) >= 3 {
-			// Drop the oldest entry
-			record.Queries = record.Queries[1:]
-		}
-		record.Queries = append(record.Queries, queryNorm)
-	}
+	pos := -1
+	for i, q := range record.Queries {
+		if q == queryNorm {
+			pos = i
+			break
+		}
+	}
+	if pos >= 0 {
+		copy(record.Queries[pos:], record.Queries[pos+1:])
+		record.Queries = record.Queries[:len(record.Queries)-1]
+	} else if len(record.Queries) >= 3 {
+		// Drop the oldest entry
+		record.Queries = record.Queries[1:]
+	}
+	record.Queries = append(record.Queries, queryNorm)
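
The same move-to-newest behaviour can be isolated into a small helper, which makes the eviction order easy to test. This is a sketch: `touchQuery` is a hypothetical name, and the limit is a parameter here, whereas the code above hard-codes 3.

```go
package main

// touchQuery moves q to the newest slot of queries and keeps at most limit
// entries (limit must be at least 1). It may reuse the backing array of
// queries, so callers should use only the returned slice afterwards.
func touchQuery(queries []string, q string, limit int) []string {
	for i, existing := range queries {
		if existing == q {
			// Remove the old occurrence so q can be re-appended as newest.
			queries = append(queries[:i], queries[i+1:]...)
			break
		}
	}
	if len(queries) >= limit {
		// Drop the oldest entries to make room for q.
		queries = queries[len(queries)-limit+1:]
	}
	return append(queries, q)
}
```

Starting from a, b, c, touching a yields b, c, a; a later insert of d then evicts b and gives c, a, d, so the reused query survives.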

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@core/memory.go` around lines 97 - 111, The current update logic for record.Queries leaves queryNorm in its old slot when found, which prevents it from being treated as most-recent; instead, when queryNorm exists you should remove it from record.Queries and re-append it to the end so it becomes the newest entry, and still enforce the max length (3) by trimming the oldest element if needed; update the loop that checks for equality to record.Queries to capture the found index, splice out that index when found, then append queryNorm and if len(record.Queries) > 3 drop the first element so the ring buffer always contains the three most recent queries.
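
The refresh-on-reuse behaviour requested in this comment can be sketched as a small standalone helper. This is a minimal illustration that assumes queries are stored as strings; `touchQuery` and `maxQueries` are hypothetical names and are not taken from core/memory.go.

```go
package main

import "fmt"

// maxQueries mirrors the limit of 3 used in the reviewed code (assumed constant).
const maxQueries = 3

// touchQuery is a hypothetical helper: it moves q to the newest slot and
// evicts the oldest entry only when q is new and the list is already full.
func touchQuery(queries []string, q string) []string {
	pos := -1
	for i, existing := range queries {
		if existing == q {
			pos = i
			break
		}
	}
	if pos >= 0 {
		// Remove the existing entry so it can be re-appended as most recent.
		queries = append(queries[:pos], queries[pos+1:]...)
	} else if len(queries) >= maxQueries {
		// Drop the oldest entry.
		queries = queries[1:]
	}
	return append(queries, q)
}

func main() {
	qs := []string{"main", "util", "test"}
	qs = touchQuery(qs, "main") // reuse: "main" becomes the newest entry
	qs = touchQuery(qs, "heap") // new query: evicts "util", not "main"
	fmt.Println(qs)
}
```

With the original logic, the second call would have evicted "main" even though it had just been reused; here the oldest untouched query is dropped instead.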

coderabbitai · 2026-04-05T20:13:17Z

+func FuzzyFindParallel(query []byte, items [][]byte, baseStarts []int) []FuzzyMatch {
+	numItems := len(items)
+	if numItems == 0 {
+		return nil
+	}
+
+	numCPUs := runtime.GOMAXPROCS(0)
+	chunkSize := (numItems + numCPUs - 1) / numCPUs
+
+	var wg sync.WaitGroup
+	resultChan := make(chan []FuzzyMatch, numCPUs)
+
+	for i := range numCPUs {
+		start := i * chunkSize
+		if start >= numItems {
+			break
+		}
+		end := start + chunkSize
+		if end > numItems {
+			end = numItems
+		}
+
+		wg.Add(1)
+		go func(s, e int) {
+			defer wg.Done()
+			h := &minHeap{}
+			heap.Init(h)
+			for j := s; j < e; j++ {
+				if score, matched := fuzzyScoreGreedy(query, items[j], baseStarts[j]); matched {
+					if h.Len() < topK {
+						heap.Push(h, FuzzyMatch{Index: j, Score: score})
+					} else if score > (*h)[0].Score {
+						(*h)[0] = FuzzyMatch{Index: j, Score: score}
+						heap.Fix(h, 0)
+					}
+				}
+			}
+			resultChan <- heapToSorted(h)
+		}(start, end)
+	}
+
+	go func() {
+		wg.Wait()
+		close(resultChan)
+	}()
+
+	finalHeap := &minHeap{}
+	heap.Init(finalHeap)
+	for matches := range resultChan {
+		for _, m := range matches {
+			if finalHeap.Len() < topK {
+				heap.Push(finalHeap, m)
+			} else if m.Score > (*finalHeap)[0].Score {
+				(*finalHeap)[0] = m
+				heap.Fix(finalHeap, 0)
+			}
+		}
+	}
+
+	return heapToSorted(finalHeap)
+}
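
The diff above relies on `minHeap` and `heapToSorted`, which are not shown in this excerpt. The following is a minimal sketch of how such helpers are commonly written with `container/heap`; the definitions are assumptions for illustration rather than the code in core/worker.go, and `topKScores` is a hypothetical single-threaded stand-in for the per-worker selection loop.

```go
package main

import (
	"container/heap"
	"fmt"
)

// FuzzyMatch mirrors the two fields used in the diff.
type FuzzyMatch struct {
	Index int
	Score int
}

// minHeap keeps the lowest score at the root so it is the first to be evicted.
type minHeap []FuzzyMatch

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].Score < h[j].Score }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(FuzzyMatch)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	item := old[n-1]
	*h = old[:n-1]
	return item
}

// heapToSorted drains the heap into a slice ordered by descending score.
func heapToSorted(h *minHeap) []FuzzyMatch {
	out := make([]FuzzyMatch, h.Len())
	for i := len(out) - 1; i >= 0; i-- {
		out[i] = heap.Pop(h).(FuzzyMatch)
	}
	return out
}

// topKScores keeps only the k best scores while scanning the input once.
func topKScores(scores []int, k int) []FuzzyMatch {
	if k <= 0 {
		return nil
	}
	h := &minHeap{}
	for i, s := range scores {
		if h.Len() < k {
			heap.Push(h, FuzzyMatch{Index: i, Score: s})
		} else if s > (*h)[0].Score {
			// Replace the current minimum and restore the heap property.
			(*h)[0] = FuzzyMatch{Index: i, Score: s}
			heap.Fix(h, 0)
		}
	}
	return heapToSorted(h)
}

func main() {
	fmt.Println(topKScores([]int{5, 90, 12, 70, 33}, 3))
}
```

Because the heap never grows beyond k entries, each candidate costs O(log k) rather than contributing to a full O(N log N) sort.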


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
printf '== Search fallback call sites ==\n'
rg -n -C4 'FuzzyFindFiltered|FuzzyFindParallel' --type go
printf '\n== DeleteFile implementations and callers ==\n'
rg -n -C4 'DeleteFile\(' --type go
printf '\n== Tombstone bitmap usage ==\n'
rg -n -C2 '\bBin\b' --type go

Repository: versenilvis/fuzzyvn

Length of output: 3098

🏁 Script executed:

#!/bin/bash
set -euo pipefail
printf '== Filter method implementation ==\n'
rg -n -A30 'func \(uf \*UnigramFilter\) Filter\(' --type go
printf '\n== Search method around FuzzyFindParallel call ==\n'
rg -n -B10 -A10 'FuzzyFindParallel' fuzzyvn.go --type go
printf '\n== Check if items are modified during deletion ==\n'
rg -n -C3 'DeleteFile|delete\|remove.*Normalized' --type go | head -40

Repository: versenilvis/fuzzyvn

Length of output: 2598

🏁 Script executed:

#!/bin/bash
set -euo pipefail
printf '== Full Filter method ==\n'
rg -n -A80 'func \(uf \*UnigramFilter\) Filter\(' core/filter.go --type go
printf '\n== Check if Bin is checked in Filter logic ==\n'
rg -n -B5 -A5 'Bin\[blockIdx\]' core/filter.go --type go

Repository: versenilvis/fuzzyvn

Length of output: 3259

🏁 Script executed:

#!/bin/bash
set -euo pipefail
printf '== Check post-processing after FuzzyFindParallel ==\n'
rg -n -A15 'FuzzyFindParallel' fuzzyvn.go --type go
printf '\n== Search method full implementation ==\n'
rg -n -B5 -A30 'func \(s \*Searcher\) Search\(' --type go | head -80

Repository: versenilvis/fuzzyvn

Length of output: 2191

🏁 Script executed:

#!/bin/bash
set -euo pipefail
printf '== FuzzyMatch struct and result handling ==\n'
rg -n -B2 -A10 'type FuzzyMatch struct' --type go
printf '\n== How matches are converted to final results ==\n'
rg -n -A20 'if len\(matches\) == 0' fuzzyvn.go --type go | head -50

Repository: versenilvis/fuzzyvn

Length of output: 1384

Apply deletion filter to FuzzyFindParallel fallback results.

When Filter() returns nil (short query), FuzzyFindParallel scans the full items slice with no knowledge of the Bin deletion bitmap. Deleted files marked via UnigramFilter.DeleteFile() are excluded only in the filtered path (line 195), leaving deleted items exposed in the fallback path.

Either pass Bin to FuzzyFindParallel and filter deleted files during scoring, or post-filter results against the bitmap before returning them.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@core/worker.go` around lines 118 - 178, FuzzyFindParallel currently scans the full items slice and can return entries for files marked deleted in the UnigramFilter/ Bin bitmap; fix by adding a deleted bitmap parameter (e.g., deletedBin) to FuzzyFindParallel and use it to skip deleted indexes both inside the per-worker loop (before calling fuzzyScoreGreedy for index j) and when merging results from resultChan into finalHeap (skip any FuzzyMatch whose Index is marked deleted); update callers accordingly so deleted files are never scored or returned (references: function FuzzyFindParallel, fuzzyScoreGreedy, resultChan, finalHeap, heapToSorted).
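
The post-filter option mentioned in this comment can be sketched as follows. This is a hedged illustration: the `[]uint64` layout with 64 files per block is an assumption about how `Bin` is stored, and `isDeleted` and `dropDeleted` are hypothetical helpers, not functions from the repository.

```go
package main

import "fmt"

// FuzzyMatch mirrors the fields returned by FuzzyFindParallel.
type FuzzyMatch struct {
	Index int
	Score int
}

// isDeleted reports whether bit idx is set in a tombstone bitmap.
// Assumes idx is non-negative and that each uint64 block covers 64 files.
func isDeleted(bin []uint64, idx int) bool {
	block := idx / 64
	if block >= len(bin) {
		return false
	}
	return bin[block]&(uint64(1)<<uint(idx%64)) != 0
}

// dropDeleted filters matches in place, preserving their relative order.
func dropDeleted(matches []FuzzyMatch, bin []uint64) []FuzzyMatch {
	kept := matches[:0]
	for _, m := range matches {
		if !isDeleted(bin, m.Index) {
			kept = append(kept, m)
		}
	}
	return kept
}

func main() {
	bin := make([]uint64, 2)
	bin[1] |= 1 << 6 // mark file 70 as deleted (70 = 1*64 + 6)
	matches := []FuzzyMatch{{Index: 3, Score: 90}, {Index: 70, Score: 80}, {Index: 9, Score: 50}}
	fmt.Println(dropDeleted(matches, bin))
}
```

One trade-off worth noting: filtering after top-K selection can return fewer than `topK` results when deleted files occupied heap slots, whereas skipping deleted indexes inside the scoring loop avoids that at the cost of passing the bitmap to every worker.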

coderabbitai · 2026-04-05T20:13:17Z

+	recentFiles := globalMemory.GetRecentFiles(5)
+	boostedFiles := globalMemory.GetBoostScores(query)
 	matchedStrings := searcher.Search(query)


⚠️ Potential issue | 🔴 Critical

Potential nil pointer dereference on globalMemory.

globalMemory is accessed at lines 103-104 without nil checking, but it's only initialized inside indexFiles() (line 72) which runs asynchronously in a goroutine. If a /search request arrives before indexing completes, globalMemory will be nil causing a panic.

Additionally, there's a potential data race: globalMemory is written in indexFiles (line 72) and read here without synchronization (the searcherLock only protects searcher, not globalMemory).

🐛 Proposed fix

 	searcherLock.RLock()
 	defer searcherLock.RUnlock()
 	if searcher == nil {
 		json.NewEncoder(w).Encode(SearchResponse{
 			RecentFiles: []string{},
 			Results:     []SearchResult{},
 		})
 		return
 	}
+	if globalMemory == nil {
+		json.NewEncoder(w).Encode(SearchResponse{
+			RecentFiles: []string{},
+			Results:     []SearchResult{},
+		})
+		return
+	}
+
 	recentFiles := globalMemory.GetRecentFiles(5)
 	boostedFiles := globalMemory.GetBoostScores(query)

For a more robust solution, consider protecting globalMemory with the same lock or ensuring it's always initialized before the server starts accepting requests.


Verify each finding against the current code and only fix it if needed. In `@demo/main.go` around lines 103 - 105, The code reads globalMemory (recentFiles := globalMemory.GetRecentFiles(5); boostedFiles := globalMemory.GetBoostScores(query)) without nil-checking or synchronization, risking a nil panic and a data race because globalMemory is set in indexFiles() concurrently; fix by ensuring globalMemory is safely published before use (either initialize it synchronously before the server accepts requests or protect reads/writes with the same mutex used for searcher, e.g., acquire searcherLock (or a new memoryLock) around writes in indexFiles and around reads here), and add a nil-check/early return or HTTP 503 if globalMemory is not yet ready to avoid dereferencing nil in GetRecentFiles/GetBoostScores.
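
One way to publish `globalMemory` safely, as an alternative to reusing `searcherLock`, is an atomic pointer. The sketch below assumes Go 1.19 or newer for `atomic.Pointer`; the `FileMemory` stand-in and the `recentOrEmpty` helper are hypothetical and only illustrate the nil check plus race-free publication, not the demo's actual handler.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// FileMemory is a stand-in for the real type; only one method is sketched.
type FileMemory struct {
	recent []string
}

// GetRecentFiles returns at most n recent files.
func (m *FileMemory) GetRecentFiles(n int) []string {
	if n > len(m.recent) {
		n = len(m.recent)
	}
	return m.recent[:n]
}

// memory replaces a plain global pointer; Store and Load are race-free.
var memory atomic.Pointer[FileMemory]

// recentOrEmpty returns an empty list until the indexing goroutine
// has published a FileMemory via memory.Store.
func recentOrEmpty(n int) []string {
	m := memory.Load()
	if m == nil {
		return []string{}
	}
	return m.GetRecentFiles(n)
}

func main() {
	fmt.Println(len(recentOrEmpty(5))) // request arrives before indexing finishes
	memory.Store(&FileMemory{recent: []string{"a.go", "b.go"}})
	fmt.Println(recentOrEmpty(5))
}
```

This only covers publication of the pointer itself. Any mutable state inside the memory object still needs its own synchronization.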

coderabbitai · 2026-04-05T20:13:18Z

+	// rank and apply boosts
+	scoreBuf := s.scorePool.Get().([]int)
+	defer func() {
+		// reset the buffer before returning it to the pool
+		for i := range scoreBuf {
+			scoreBuf[i] = math.MinInt
 		}
+		s.scorePool.Put(scoreBuf)
+	}()

-		for i, nameNorm := range s.FilenamesOnly {
-			// Instead of: runesName := []rune(nameNorm)
-			// check the byte length with len() first as a cheap pre-filter
-			if len(nameNorm) < queryLen {
-				continue
-			}
-
-			// Compare against the start of the filename
-			targetStr1 := fastSubstring(nameNorm, queryLen)
-			// If the cut prefix is still shorter than the query (because of UTF-8 characters), skip it
-			if len(targetStr1) < len(queryNorm) { // comparing byte lengths is fine because both are normalized
-				continue
-			}
-
-			dist := LevenshteinRatio(queryNorm, targetStr1)
-
-			// Also compare with one extra character (in case the typo added a character)
-			if len(nameNorm) > len(targetStr1) {
-				// Take a prefix one rune longer
-				targetStr2 := fastSubstring(nameNorm, queryLen+1)
-
-				d2 := LevenshteinRatio(queryNorm, targetStr2)
-				if d2 < dist {
-					dist = d2
-				}
-			}
-			/*
-				For example, with the query "mian", target 1 is "main" and target 2 is "maina".
-				Scoring target 1 gives dist = d1 = 2, while target 2 gives dist = d2 = 3.
-				if d2 < dist {
-						dist = d2
-					}
-				In other words, take d2 only if it is smaller than d1; otherwise keep d1.
-				Effectively min(d1, d2).
-			*/
-
-			// If the typo distance is within the allowed threshold, compute a score
-			// A robust solution for when the typo goes too far (if not, please consider opening a PR to help)
-			if dist <= baseThreshold {
-				// Base score 3000
-				score := 3000 - (dist * 400)
-				runeCountName := 0
-				for range nameNorm {
-					runeCountName++
-				}
-				lenDiff := runeCountName - queryLen
-				if lenDiff > 0 {
-					score -= (lenDiff * 15) // Penalty for name length
-				}
-
-				// Bonus for an exact match
-				if lenDiff == 0 && dist == 0 {
-					score += 1000
-				}
-
-				// Penalty for path length
-				score -= len(s.Originals[i]) / 5
+	for _, m := range matches {
+		scoreBuf[m.Index] = m.Score
+	}


⚠️ Potential issue | 🟡 Minor

scoreBuf is populated but never used.

Lines 150-152 populate scoreBuf with match scores, but the subsequent ranking loop (lines 154-175) recalculates scores from matches directly and never reads from scoreBuf. This allocation and population is wasted work.

🔧 Proposed fix - remove unused scoreBuf logic

-	// rank and apply boosts
-	scoreBuf := s.scorePool.Get().([]int)
-	defer func() {
-		// reset the buffer before returning it to the pool
-		for i := range scoreBuf {
-			scoreBuf[i] = math.MinInt
-		}
-		s.scorePool.Put(scoreBuf)
-	}()
-
-	for _, m := range matches {
-		scoreBuf[m.Index] = m.Score
-	}
-
 	rankedResults := make([]MatchResult, 0, len(matches))
 	for _, m := range matches {

If scoreBuf was intended for deduplication or lookup purposes, consider adding a comment explaining its purpose or implementing the intended logic.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@fuzzyvn.go` around lines 140 - 152, The code allocates and fills scoreBuf from s.scorePool.Get() but never reads it afterwards (scoreBuf, s.scorePool.Get(), matches), causing wasted work; either remove the scoreBuf allocation/population block entirely or actually use scoreBuf in the subsequent ranking loop (replace direct reads of m.Score from matches with lookups into scoreBuf by m.Index) and keep the buffer reset/Put logic if you retain the pool; if scoreBuf was intended for deduplication/additional lookup, implement that logic or add a clarifying comment next to scoreBuf to avoid unnecessary allocation.
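
If the pooled buffer is kept rather than removed, it needs to be read back for the work to pay off. The sketch below shows one hypothetical use, deduplicating by index and keeping the best score; `newScorePool` and `bestScores` are illustrative names and the deduplication purpose is an assumption, since the source does not state what `scoreBuf` was intended for. The pool stores `*[]int` rather than `[]int`, which is the pattern staticcheck's SA6002 check recommends.

```go
package main

import (
	"fmt"
	"math"
	"sync"
)

// FuzzyMatch mirrors the fields used in the ranking loop.
type FuzzyMatch struct {
	Index int
	Score int
}

// newScorePool returns a pool of *[]int buffers sized for n items,
// pre-filled with math.MinInt as the "no score" sentinel.
func newScorePool(n int) *sync.Pool {
	return &sync.Pool{New: func() any {
		buf := make([]int, n)
		for i := range buf {
			buf[i] = math.MinInt
		}
		return &buf
	}}
}

// bestScores records the highest score per index and reads it back,
// so the buffer is actually used for lookups before being returned.
// Assumes every m.Index is within the buffer size.
func bestScores(pool *sync.Pool, matches []FuzzyMatch) map[int]int {
	bufPtr := pool.Get().(*[]int)
	buf := *bufPtr
	defer func() {
		// Reset only the slots that were written, then return the buffer.
		for _, m := range matches {
			buf[m.Index] = math.MinInt
		}
		pool.Put(bufPtr)
	}()

	for _, m := range matches {
		if m.Score > buf[m.Index] {
			buf[m.Index] = m.Score
		}
	}
	out := make(map[int]int, len(matches))
	for _, m := range matches {
		out[m.Index] = buf[m.Index]
	}
	return out
}

func main() {
	pool := newScorePool(8)
	matches := []FuzzyMatch{{Index: 2, Score: 40}, {Index: 5, Score: 10}, {Index: 2, Score: 75}}
	fmt.Println(bestScores(pool, matches))
}
```

Resetting only the touched slots keeps the cleanup proportional to the number of matches instead of the full index size.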

coderabbitai · 2026-04-05T20:13:18Z

+cli:
+	@cd demo && go run cli_search.go


⚠️ Potential issue | 🟡 Minor

Add cli to .PHONY.

Right now a file or directory named cli will make this target a no-op. gen has the same problem while you're touching the target list.

Possible fix

-.PHONY: demo test bench
+.PHONY: demo cli test bench gen

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@Makefile` around lines 7 - 8, The Makefile defines targets named cli and gen that are not listed in .PHONY, so an existing file/dir named cli or gen will prevent those targets from running; add cli and gen to the .PHONY declaration (alongside existing phony targets) to ensure they're always executed, updating the .PHONY line to include "cli" and "gen" so targets cli and gen (and any others already present) are treated as phony.

coderabbitai

🧹 Nitpick comments (2)

fuzzyvn.go (1)

205-223: Consider capping typo scores to avoid zero or low scores for valid matches.

For long queries (e.g., 40 chars), threshold becomes 10, and if dist == threshold, the score becomes 100 - 10*10 = 0. While technically correct, a zero-scored typo match may be unexpectedly filtered or ranked poorly.

♻️ Proposed fix - ensure minimum positive score

 		if dist <= threshold {
+			score := 100 - dist*10
+			if score < 10 {
+				score = 10 // ensure typo matches always have positive relevance
+			}
 			matches = append(matches, core.FuzzyMatch{
 				Index: i,
-				Score: 100 - dist*10, // typo scores rank below fuzzy scores
+				Score: score,
 			})
 		}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@fuzzyvn.go` around lines 205 - 223, The findButTypo function can produce zero
or very low Score values for long queries because Score is computed as 100 -
dist*10; change the scoring so typo matches are floored to a sensible minimum
(e.g., at least 1 or 5/10) instead of allowing 0 or negative. Locate
findButTypo, and after computing dist and threshold, compute the rawScore (100 -
dist*10) then clamp it with a minScore constant (e.g., minScore := 10) before
assigning to core.FuzzyMatch.Score so every valid typo match has a positive,
non-zero score.
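
The clamping described in this nitpick can be isolated into a tiny function to make the boundary behaviour easy to check. This is a sketch: `typoScore` and `minTypoScore` are hypothetical names, and the floor of 10 is the value proposed in the review rather than a tuned constant.

```go
package main

import "fmt"

// minTypoScore is the floor proposed in the review; the exact value is a judgment call.
const minTypoScore = 10

// typoScore applies the 100 - dist*10 formula and clamps the result
// so every accepted typo match keeps a positive score.
func typoScore(dist int) int {
	score := 100 - dist*10
	if score < minTypoScore {
		score = minTypoScore
	}
	return score
}

func main() {
	for _, dist := range []int{0, 3, 9, 10} {
		fmt.Printf("dist=%d score=%d\n", dist, typoScore(dist))
	}
}
```

With the floor in place, distances 9 and 10 both map to the same score, so ordering among the weakest typo matches falls back to whatever tie-breaking the caller applies.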

bench_linux_test.go (1)

43-53: Avoid fmt.Printf in benchmarks; use b.Logf instead.

Using fmt.Printf unconditionally prints output on every benchmark run, cluttering results. b.Logf only prints when -v is passed or on failure, which is the idiomatic approach.

♻️ Proposed fix

-	fmt.Printf("\n--- Linux Benchmark Check ---\n")
-	fmt.Printf("Total files scanned: %d\n", len(files))
+	b.Logf("\n--- Linux Benchmark Check ---")
+	b.Logf("Total files scanned: %d", len(files))
 	if len(files) < limit {
 		b.Fatalf("ERROR: Hệ thống chỉ có %d files, không đủ %d để benchmark", len(files), limit)
 	}
 	
-	fmt.Println("Sample (first 10 files):")
+	b.Log("Sample (first 10 files):")
 	for i := 0; i < 10; i++ {
-		fmt.Printf("  [%d] %s\n", i+1, files[i])
+		b.Logf("  [%d] %s", i+1, files[i])
 	}
-	fmt.Printf("-----------------------------\n")
+	b.Log("-----------------------------")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@bench_linux_test.go` around lines 43 - 53, Replace the direct fmt.* prints in
the Linux benchmark block with the testing.B logger so output only appears on -v
or failure: change fmt.Printf("\n--- Linux Benchmark Check ---\n"),
fmt.Printf("Total files scanned: %d\n", ...), fmt.Println("Sample (first 10
files):"), and the loop's fmt.Printf lines to b.Logf(...) (keep the same
messages/formatting), and replace the final
fmt.Printf("-----------------------------\n") with b.Logf as well; locate these
calls in the benchmark function in bench_linux_test.go and use b.Logf to emit
the same text.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ccad7bce-e3d8-403a-a027-495799168793

📥 Commits

Reviewing files that changed from the base of the PR and between e3dce6d and 820a740.

📒 Files selected for processing (6)

.github/workflows/release.yml
bench_linux_test.go
demo/cli_search.go
demo/main.go
fuzzyvn.go
go.mod

✅ Files skipped from review due to trivial changes (2)

go.mod
demo/cli_search.go

🚧 Files skipped from review as they are similar to previous changes (2)

.github/workflows/release.yml
demo/main.go

versenilvis added 30 commits April 4, 2026 11:27

refactor: separate into small files

7735107

feat(core): files filtering before search

fdfbd16

feat(score): bonus score for file base and directory

be1e91c

chore(utils): comment out unused containsRunes() function

2a982af

feat(worker): now we only search after filtering

da15cd9

chore(worker): simplify search flow

fdd9336

perf(utils): optimize normalization

393d337

perf(utils): convert to NFC by default and byte to rune in Levenshtei…

7c6a4bf

…n, loop cleanup

feat(core): replace cache with memory

9b6c13e

chore(score): update to new codebase

0816fbb

feat(worker): new parallel and filtered search strategies

333fc4c

chore(memory): convert query to byte slice

ae6ebe3

feat(core): JaroWinkler similarity

f4906ed

refactor(fuzzyvn): update to new code with memory-boosted ranking and…

5024d97

… thread-safe pooling

feat(test): some new edge cases

01c6553

refactor(demo): update new api

ed456eb

feat(makefile): cli search

f00f448

fix(makefile): need cd to recognize folder

b6886c4

chore(utils): comment out unused function

b49b884

fix(score): change logic to match use case

76c9352

feat(demo): cli search test

d00e6bd

chore(score): remove unused function

6a417e3

fix(cli search): correct dir path

08fe672

perf: optimize

861ed99

feat(bench): 100k files linux

a665b7c

docs(bench): results

1624025

docs(becnh): summary of benchmarks

1b33d32

docs(root): update README

89ed284

ci/cd: bump to version 1.26

18778b9

ci/cd: downgrade to 1.24

11e0539

ci/cd: run when merge into main

e3dce6d

versenilvis added 3 commits April 6, 2026 03:01

chore(pkg): downgrade to 1.24 to fit with ci/cd

503447b

fix(fuzzyvn): Fix SA6002

208869c

fix: errcheck

33724ce

gemini-code-assist Bot reviewed Apr 5, 2026

versenilvis added 3 commits April 6, 2026 03:05

fix(demo): err check

d9d44d5

ci/cd: skip create release

820a740

fix(utils): prevent null byte

f9f1cf6

coderabbitai Bot reviewed Apr 5, 2026

versenilvis added 2 commits April 6, 2026 03:15

perf(fuzzyvn): optimize findButTypo

06cc8b0

chore(jaro): use clear() funtion from go 1.21

49443e5

coderabbitai Bot reviewed Apr 5, 2026

versenilvis added 2 commits April 6, 2026 03:19

chore(test): use for i loop to fit lower version

598fdb9

chore(utils): comment out unused function

8098ac5

versenilvis merged commit b82d8df into main Apr 5, 2026
2 checks passed

-	recentFiles := globalMemory.GetRecentFiles(5)
-	boostedFiles := globalMemory.GetBoostScores(query)
-	matchedStrings := searcher.Search(query)
+	searcherLock.RLock()
+	defer searcherLock.RUnlock()
+	if searcher == nil {
+		json.NewEncoder(w).Encode(SearchResponse{
+			RecentFiles: []string{},
+			Results:     []SearchResult{},
+		})
+		return
+	}
+	if globalMemory == nil {
+		json.NewEncoder(w).Encode(SearchResponse{
+			RecentFiles: []string{},
+			Results:     []SearchResult{},
+		})
+		return
+	}
+	recentFiles := globalMemory.GetRecentFiles(5)
+	boostedFiles := globalMemory.GetBoostScores(query)
+	matchedStrings := searcher.Search(query)


❌ Failed checks (1 warning, 1 inconclusive)
