FuzzyVN V3.0.0#3

Merged
versenilvis merged 41 commits into main from v3
Apr 5, 2026

Conversation

@versenilvis

@versenilvis versenilvis commented Apr 5, 2026

Owner

Note

Changes

Performance

  • Replaced sort.Slice, O(N log N), with a partial sort backed by a min-heap, O(N log 20), cutting sorting cost by roughly 10x on large result sets
  • Upgraded filtered search from single-threaded to multi-core (parallel), reaching performance on par with FuzzyFindParallel
  • Skips the re-sort step entirely when no Memory/Context score boosts are active
  • Removed the unicode.IsLower/IsUpper calls in favor of a switch on bytes to detect word boundaries
  • Moved the core search operations from []rune to []byte, reducing CPU cycles and memory pressure
  • Uses a high-performance Jaro-Winkler implementation backed by sync.Pool for strings up to 128 bytes long
  • Uses sync.Pool for score buffers to eliminate data contention during concurrent searches
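
The min-heap partial sort mentioned in the first bullet can be sketched as follows. This is a minimal illustration, not the code from this PR: the `match` type, the `topK` function, and the sample scores are hypothetical, and the only assumption taken from the description is that a fixed-size heap (K = 20 there) replaces a full sort.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// match is a hypothetical stand-in for a scored search result.
type match struct {
	Index int
	Score int
}

// minHeap keeps the lowest score at the root so it can be replaced cheaply.
type minHeap []match

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].Score < h[j].Score }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(match)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// topK returns the k highest-scoring matches, best first, in O(N log k).
func topK(all []match, k int) []match {
	if k <= 0 {
		return nil
	}
	h := make(minHeap, 0, k)
	for _, m := range all {
		if len(h) < k {
			heap.Push(&h, m)
		} else if m.Score > h[0].Score {
			// Replace the current minimum and restore the heap property.
			h[0] = m
			heap.Fix(&h, 0)
		}
	}
	out := []match(h)
	// Only the k survivors are fully sorted.
	sort.Slice(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	all := []match{{0, 10}, {1, 50}, {2, 30}, {3, 70}, {4, 20}}
	fmt.Println(topK(all, 3)) // [{3 70} {1 50} {2 30}]
}
```

Each candidate costs at most one O(log k) heap operation, and only the k retained matches are sorted at the end, which is where the saving over sorting all N results comes from.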

New scoring system

  • Four-tier scoring system:
    • Tier 1 (+1,000,000): exact prefix match (the query sits at the very start of the file name).
    • Tier 2 (+500,000): character containment match (every query character appears in the short file name, which catches transpositions such as main -> mian).
    • Tier 3 (+200%): partial file-name match (at least one character matches inside the file name).
    • Tier 4 (penalty): path-only match (points are deducted based on path length).
  • Transposition handling: counting character frequencies in the file name, regardless of order, finds mian.go when searching for main.go and vice versa
  • Integrated FileMemory with time-based score decay to prioritize frequently selected files
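
The order-independent frequency check behind Tier 2 can be illustrated with a short sketch. The function name `containsAllBytes` is hypothetical and the real scorer in core/score.go may differ. The counter array uses `int` here so that long file names cannot overflow it.

```go
package main

import "fmt"

// containsAllBytes reports whether every byte of query occurs in name at
// least as many times as it occurs in query, regardless of order.
func containsAllBytes(query, name string) bool {
	var counts [256]int
	for i := 0; i < len(name); i++ {
		counts[name[i]]++
	}
	for i := 0; i < len(query); i++ {
		c := query[i]
		if counts[c] == 0 {
			return false
		}
		counts[c]--
	}
	return true
}

func main() {
	fmt.Println(containsAllBytes("main", "mian.go")) // true: same letters, swapped
	fmt.Println(containsAllBytes("main", "mien.go")) // false: no 'a' in the name
}
```

Because only counts are compared, a swapped pair of letters still satisfies the check, which is what lets a query for main reach mian.go.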

Filtering

  • Typo-tolerant bitset index (allows 1-2 missing characters for long queries)
  • Characters that appear in 85% of files are dropped from the index to reduce noise
  • DeleteFile marks deleted files in Bin instead of rebuilding the index
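
A per-character bitset index with a tombstone bitmap might look like the sketch below. It is a simplified illustration under stated assumptions: the `charIndex` type and its methods are hypothetical, and the typo tolerance and the 85% frequency cutoff from the list above are left out.

```go
package main

import (
	"fmt"
	"math/bits"
)

// charIndex is a hypothetical per-byte bitset index: bit d of postings[c]
// is set when document d contains byte c.
type charIndex struct {
	postings [256][]uint64
	bin      []uint64 // tombstones for deleted documents
	n        int
}

func newCharIndex(names []string) *charIndex {
	blocks := (len(names) + 63) / 64
	ix := &charIndex{bin: make([]uint64, blocks), n: len(names)}
	for c := range ix.postings {
		ix.postings[c] = make([]uint64, blocks)
	}
	for d, name := range names {
		for i := 0; i < len(name); i++ {
			ix.postings[name[i]][d/64] |= 1 << (uint(d) % 64)
		}
	}
	return ix
}

// remove marks a document as deleted without rebuilding the index.
func (ix *charIndex) remove(d int) {
	if d < 0 || d >= ix.n {
		return
	}
	ix.bin[d/64] |= 1 << (uint(d) % 64)
}

// filter returns the IDs of live documents that contain every byte of query.
func (ix *charIndex) filter(query string) []int {
	var out []int
	for b := range ix.bin {
		word := ^ix.bin[b] // start from all live documents in this block
		for i := 0; i < len(query); i++ {
			word &= ix.postings[query[i]][b]
		}
		for word != 0 {
			if d := b*64 + bits.TrailingZeros64(word); d < ix.n {
				out = append(out, d)
			}
			word &= word - 1 // clear the lowest set bit
		}
	}
	return out
}

func main() {
	ix := newCharIndex([]string{"main.go", "mian.go", "readme.md"})
	fmt.Println(ix.filter("main")) // [0 1]
	ix.remove(1)
	fmt.Println(ix.filter("main")) // [0]
}
```

Deleting a file only flips one bit in `bin`, so the postings stay untouched and no rebuild is needed until the tombstones are compacted.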

@coderabbitai

coderabbitai Bot commented Apr 5, 2026

📝 Walkthrough


Refactors the search implementation to use byte-normalized indexes, adds a UnigramFilter for candidate reduction, implements Jaro–Winkler similarity, introduces in-memory frecency tracking (FileMemory) with decay and persistence, and provides parallel top‑K fuzzy matching with min‑heap merging. Search API now accepts SearchOptions for context boosts.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| CI & Build | .github/workflows/release.yml, Makefile, go.mod | Workflow triggers expanded to push→main and pull_request→main; actions/setup-go toolchain updated to 1.24; release step gated to v* tags. Added cli Make target. go.mod toolchain set to go 1.24. |
| Core: filtering, scoring, utils, worker | core/filter.go, core/score.go, core/worker.go, core/jaro.go, core/utils.go | New UnigramFilter (ASCII bitset index + deletions), greedy fuzzy scorer, parallel/top‑K fuzzy search using per‑goroutine min‑heaps, Jaro‑Winkler similarity, normalization (including Vietnamese mappings), Levenshtein ratio, and assorted text utilities. |
| Memory & frecency | core/memory.go | New FileMemory/FileRecord types with RecordSelection, time‑decay frecency scoring (GetBoostScores), max‑entries eviction, Export/Import, and GetRecentFiles. Thread‑safe via RWMutex. |
| Public API & searcher refactor | fuzzyvn.go | Replaced QueryCache with core.FileMemory and UnigramFilter; Searcher uses byte-normalized index and baseStarts; Search signature changed to Search(query string, opts ...*SearchOptions); added SearchOptions, NewSearcherWithMemory, Normalize/LevenshteinRatio aliases; RecordSelection/ClearCache updated to use FileMemory; candidate reduction + parallel scoring + typo fallback integrated. |
| Demo / server / CLI | demo/main.go, demo/cli_search.go | Server switched from cache to memory, response renamed RecentFiles, removed /cache-info, updated error handling. Added demo CLI (demo/cli_search.go) and Makefile cli target. |
| Benchmarks & tests | bench_linux_test.go, fuzzyvn_test.go | Added Linux 100k-file benchmarks and NewSearcher construction benchmark. Tests updated to exercise FileMemory (RecordSelection, GetBoostScores, Export/Import, concurrency, edge cases); removed QueryCache tests; benchmark adjustments to use core.FileMemory. |
| Documentation & bench results | README.md, docs/bench.md, docs/bench_result_amd.txt, docs/bench_result_n2.txt | README rewritten to replace cache demo with memory/frecency demo, updated API docs for Search signature and ClearCache, removed QueryCache docs. Added detailed benchmark docs and results files. |
| Misc | fuzzyvn.go (large refactor) | Major removal of old in-file fuzzy/cache code and replacement with core-based implementations; many lines reworked across package to adopt new architecture. |

Sequence Diagram

sequenceDiagram
    participant Client
    participant Searcher
    participant UnigramFilter
    participant Worker as FuzzyWorker
    participant FileMemory

    Client->>Searcher: Search(query, opts)
    Searcher->>Searcher: Normalize query (bytes)
    Searcher->>FileMemory: GetBoostScores(query)
    FileMemory-->>Searcher: boost map
    Searcher->>UnigramFilter: Filter(query bytes)
    alt candidates returned
        UnigramFilter-->>Searcher: indices
        Searcher->>FuzzyWorker: FuzzyFindFiltered(query, candidates)
    else no candidates
        Searcher->>FuzzyWorker: FuzzyFindParallel(query, all items)
    end
    FuzzyWorker->>FuzzyWorker: parallel scoring, local heaps
    FuzzyWorker-->>Searcher: merged top‑K matches
    Searcher->>Searcher: apply memory/context boosts, sort, trim
    Searcher-->>Client: []string results
    Client->>FileMemory: RecordSelection(query, selected_file)
    FileMemory->>FileMemory: update counts, timestamps, evict if needed

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🐰 From cache to memory, bytes hop and play,
Filters hum softly and Jaro leads the way,
Heaps gather winners while frecency grows,
A rabbit cheers quietly as performance shows. 🥕✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 27.27%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The PR title 'FuzzyVN V3.0.0' is vague and uses a generic version number that does not convey what the changeset actually accomplishes or what the primary changes are. | Provide a more descriptive title that highlights the main technical improvement or feature, such as 'Refactor search engine to use byte-based scoring with min-heap optimization' or 'Replace query cache with FileMemory system for frecency-based ranking'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new memory-based caching system (Frecency) to replace the previous query cache, adds a bitset-based filter for faster search performance, and includes comprehensive benchmark results. I have provided feedback on a critical bug in the string normalization logic, a performance improvement for typo-handling, and minor code optimizations for readability and idiomatic Go usage.

Comment thread core/utils.go
Comment on lines +43 to +53
if isASCII {
	buf := make([]byte, len(s))
	for i, char := range []byte(s) {
		if char >= 'A' && char <= 'Z' {
			buf[i] = char + 32
		} else if (char >= 'a' && char <= 'z') || (char >= '0' && char <= '9') || char == '.' || char == '/' || char == '\\' || char == '_' || char == '-' || char == ' ' {
			buf[i] = char
		}
	}
	return string(buf)
}

critical

In the ASCII fast path of Normalize, buf is allocated with length len(s) and then written at index i.

buf := make([]byte, len(s))
for i, char := range []byte(s) {
    // ... assign buf[i]
}

The problem arises when a character is not in the allowed list (for example #, !, ?). In that case buf[i] is never assigned and keeps its zero value, \x00 (a null byte).
For example, Normalize("a#b") returns "a\x00b".

Null bytes in the normalized string can cause unexpected failures in the later comparison and search steps.

To fix this, allocate the buffer with len=0 and cap=len(s), then use append so that only valid characters are added. This removes unwanted characters entirely instead of replacing them with null bytes.

Suggested change
-if isASCII {
-	buf := make([]byte, len(s))
-	for i, char := range []byte(s) {
-		if char >= 'A' && char <= 'Z' {
-			buf[i] = char + 32
-		} else if (char >= 'a' && char <= 'z') || (char >= '0' && char <= '9') || char == '.' || char == '/' || char == '\\' || char == '_' || char == '-' || char == ' ' {
-			buf[i] = char
-		}
-	}
-	return string(buf)
-}
+if isASCII {
+	buf := make([]byte, 0, len(s))
+	for _, char := range []byte(s) {
+		if char >= 'A' && char <= 'Z' {
+			buf = append(buf, char+32)
+		} else if (char >= 'a' && char <= 'z') || (char >= '0' && char <= '9') || char == '.' || char == '/' || char == '\\' || char == '_' || char == '-' || char == ' ' {
+			buf = append(buf, char)
+		}
+	}
+	return string(buf)
+}
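
The effect of the append-based fix can be checked in isolation. The function below is a hypothetical, standalone stand-in for the ASCII fast path only (`normalizeASCII` is not a name from the PR), written with a switch instead of the long condition.

```go
package main

import "fmt"

// normalizeASCII lowercases A-Z and drops every byte outside the allowed set.
// It stands in for the ASCII branch of Normalize and nothing else.
func normalizeASCII(s string) string {
	buf := make([]byte, 0, len(s)) // len 0, cap len(s): no stray zero bytes
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch {
		case c >= 'A' && c <= 'Z':
			buf = append(buf, c+32)
		case c >= 'a' && c <= 'z', c >= '0' && c <= '9',
			c == '.', c == '/', c == '\\', c == '_', c == '-', c == ' ':
			buf = append(buf, c)
		}
	}
	return string(buf)
}

func main() {
	fmt.Printf("%q\n", normalizeASCII("a#b"))          // "ab"
	fmt.Printf("%q\n", normalizeASCII("Main_Test.GO")) // "main_test.go"
}
```

With the original index-based writes, the first call would have produced "a\x00b".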

Comment thread fuzzyvn.go
Comment on lines 204 to +222
*/
func (s *Searcher) RecordSelection(query, filePath string) {
if s.Cache != nil {
s.Cache.RecordSelection(query, filePath)
func (s *Searcher) findButTypo(query string) []core.FuzzyMatch {
	var matches []core.FuzzyMatch
	// compare against file names only, for the best typo accuracy
	for i, filename := range s.FilenamesOnly {
		dist := core.LevenshteinRatio(query, filename)
		// allow one wrong character per four typed characters
		threshold := len(query) / 4
		if threshold < 1 {
			threshold = 1
		}
		if dist <= threshold {
			matches = append(matches, core.FuzzyMatch{
				Index: i,
				Score: 100 - dist*10, // typo matches score lower than fuzzy matches
			})
		}
	}
	return matches

high

findButTypo scans all of s.FilenamesOnly sequentially to compute the Levenshtein distance. With a large number of files (for example, more than 100k), this can become a performance bottleneck, especially compared with the fuzzy search functions that are already parallelized.

To improve performance, consider parallelizing this loop in the same way as FuzzyFindParallel: split s.FilenamesOnly into chunks and process them on multiple goroutines.
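
One way to chunk the scan across goroutines is sketched below. Everything here is illustrative: `typoMatch`, `levenshtein`, and `findTypos` are hypothetical names, the threshold is passed in explicitly, and the real code would reuse core.LevenshteinRatio and core.FuzzyMatch instead.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// typoMatch is a hypothetical result type: file index plus edit distance.
type typoMatch struct {
	Index int
	Dist  int
}

// levenshtein returns the byte-level edit distance between a and b.
func levenshtein(a, b string) int {
	prev := make([]int, len(b)+1)
	cur := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		cur[0] = i
		for j := 1; j <= len(b); j++ {
			best := prev[j-1] // substitution (free when the bytes match)
			if a[i-1] != b[j-1] {
				best++
			}
			if d := prev[j] + 1; d < best { // deletion
				best = d
			}
			if d := cur[j-1] + 1; d < best { // insertion
				best = d
			}
			cur[j] = best
		}
		prev, cur = cur, prev
	}
	return prev[len(b)]
}

// findTypos scans names in parallel chunks and keeps entries whose distance
// to query is at most threshold. Results come back in index order.
func findTypos(query string, names []string, threshold int) []typoMatch {
	workers := runtime.NumCPU()
	if workers > len(names) {
		workers = len(names)
	}
	if workers == 0 {
		return nil
	}
	chunk := (len(names) + workers - 1) / workers
	parts := make([][]typoMatch, workers) // one slot per worker, no locking
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start, end := w*chunk, (w+1)*chunk
		if end > len(names) {
			end = len(names)
		}
		if start >= end {
			break
		}
		wg.Add(1)
		go func(w, start, end int) {
			defer wg.Done()
			for i := start; i < end; i++ {
				if d := levenshtein(query, names[i]); d <= threshold {
					parts[w] = append(parts[w], typoMatch{Index: i, Dist: d})
				}
			}
		}(w, start, end)
	}
	wg.Wait()
	var out []typoMatch
	for _, p := range parts {
		out = append(out, p...)
	}
	return out
}

func main() {
	names := []string{"main.go", "mian.go", "config.yaml"}
	fmt.Println(findTypos("main.go", names, 2)) // [{0 0} {1 2}]
}
```

Each worker writes only to its own slot in `parts`, so no mutex is needed, and concatenating the slots in worker order keeps the output deterministic.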

Comment thread core/jaro.go Outdated
Comment on lines +83 to +85
// Reset the memory region in use
for i := 0; i < l1; i++ { buf[0][i] = false }
for i := 0; i < l2; i++ { buf[1][i] = false }

medium

A for loop is used to reset the buffer. That is correct, but it can be replaced with the clear() builtin introduced in Go 1.21, which makes the code shorter and more idiomatic.

If the project targets Go 1.21+, consider making this change.

Suggested change
-// Reset the memory region in use
-for i := 0; i < l1; i++ { buf[0][i] = false }
-for i := 0; i < l2; i++ { buf[1][i] = false }
+// Reset the memory region in use
+clear(buf[0][:l1])
+clear(buf[1][:l2])

Comment thread core/score.go

// Tier 3: at least one match in the filename
if firstMatchIdx < baseStart {
	totalScore += (totalScore * 200) / 100

medium

The expression totalScore += (totalScore * 200) / 100 can be simplified to make it easier to read and more efficient.
It is equivalent to totalScore = totalScore + totalScore * 2, that is, totalScore *= 3.

Direct multiplication is clearer and avoids an unnecessary division.

Suggested change
-totalScore += (totalScore * 200) / 100
+totalScore *= 3

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
fuzzyvn_test.go (1)

677-679: ⚠️ Potential issue | 🟠 Major

Same b.Loop() compatibility issue as in bench_linux_test.go.

Multiple benchmarks in this file use b.Loop() (lines 677, 685, 693, 702, 712, 721, 730, 742, 749, 764, 779, 795, 807, 821), which was added in Go 1.24. Replace all occurrences with for i := 0; i < b.N; i++ so the benchmarks also build on Go 1.21+.

🔧 Example fix pattern
-		for b.Loop() {
+		for i := 0; i < b.N; i++ {
 			searcher.Search("config")
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fuzzyvn_test.go` around lines 677 - 679, Replace the b.Loop() usage (added in
Go 1.24, so it does not compile on older toolchains) with the standard Go
benchmark loop for i := 0; i < b.N; i++ in this test file: locate each benchmark
where b.Loop() is used (e.g., the block calling searcher.Search("config")) and
change the loop to for i := 0; i < b.N; i++ so the benchmark also builds on
Go 1.21+; apply the same replacement for all other occurrences listed in the
comment (lines with b.Loop() around searcher.Search and similar benchmark
bodies).
🧹 Nitpick comments (3)
core/utils.go (1)

281-295: Rename LevenshteinRatio to match its return value.

This API returns raw edit distance, not a ratio or percentage. LevenshteinDistance would be much less error-prone for callers, with LevenshteinRatio kept as a deprecated alias only if compatibility matters.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/utils.go` around lines 281 - 295, The function LevenshteinRatio
currently returns a raw edit distance; rename the exported function to
LevenshteinDistance (update its doc comment accordingly) and leave
LevenshteinRatio as a thin deprecated wrapper that calls LevenshteinDistance to
preserve backward compatibility; update all internal references/usages to call
LevenshteinDistance and add a deprecation comment on LevenshteinRatio so callers
can migrate smoothly.
fuzzyvn_test.go (1)

263-266: Consider reducing sleep duration or using alternative synchronization.

The time.Sleep(1100ms) calls make this test take ~3.3 seconds. While this works, it slows down the test suite. Consider using shorter intervals if the underlying implementation supports millisecond-level precision, or document why second-level precision is required.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fuzzyvn_test.go` around lines 263 - 266, The current test uses long
time.Sleep(1100 * time.Millisecond) to force distinct Unix-second timestamps;
replace these sleeps with much shorter sleeps (e.g., 10-50 * time.Millisecond)
if the code under test (mem.RecordSelection) supports millisecond precision, or
better yet change the test to avoid real sleeping by using a controllable clock
or by adding an overload to RecordSelection that accepts an explicit timestamp
(e.g., RecordSelectionWithTime or passing time.Now() from test) so you can
synthesize distinct timestamps deterministically; update calls to
mem.RecordSelection("q2", "/b.go") and mem.RecordSelection("q3", "/c.go")
accordingly and remove the long 1100ms sleeps.
fuzzyvn.go (1)

234-236: ClearCache may leave stale references if Memory was shared.

When Memory was passed via NewSearcherWithMemory, calling ClearCache creates a new FileMemory instance but doesn't clear the original shared memory. Other searchers sharing the same memory instance will retain the old data.

This may be intentional (each searcher gets independent memory after clear), but it's worth documenting the behavior.

📝 Consider adding documentation
 /*
-ClearCache: clears the history memory
+ClearCache: clears the history memory of this Searcher.
+Note: if Memory is shared via NewSearcherWithMemory, other Searchers
+still keep the old data.
 */
 func (s *Searcher) ClearCache() {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fuzzyvn.go` around lines 234 - 236, The ClearCache method currently replaces
s.Memory with a new core.NewFileMemory(nil) but does not mutate or clear the
original Memory object passed via NewSearcherWithMemory, leaving other searchers
that hold that shared instance unchanged; update the comment/docstring above
ClearCache to explicitly state that ClearCache creates a fresh FileMemory for
this Searcher and does not clear or modify any previously shared Memory
instances, and if the intended behavior is to clear shared memory instead,
implement and call a clear-style method on the Memory interface (e.g.,
Memory.Clear()) or detect and clear the existing instance rather than replacing
it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/release.yml:
- Around line 5-9: The release action is running on PRs and non-tag pushes; add
a tag-only guard so it only runs for tag pushes (refs/tags/v*). Update the
release job or the specific release step (e.g., the job named "release" or the
step labeled "Create Release"/release publishing) to include an if condition
like startsWith(github.ref, 'refs/tags/v') so the job/step executes only when a
tag matching v* is pushed.

In `@bench_linux_test.go`:
- Around line 84-87: The benchmark uses b.Loop(), which was added in Go 1.24 and
does not compile on earlier toolchains; update the loop around NewSearcher(files)
to use the standard b.N pattern (e.g., for i := 0; i < b.N; i++ {
NewSearcher(files) }) while keeping the existing b.ResetTimer() call so the
benchmark also builds on Go 1.21+ and still measures NewSearcher(files) per
iteration.

In `@core/filter.go`:
- Around line 117-124: UnigramFilter.DeleteFile currently fails for negative
docID and does a non-atomic read/modify/write on uf.Bin, so add a guard that
returns if docID < 0 || docID >= uf.NumTargets, compute blockIdx and bitPos as
before, and perform a thread-safe update: either use a sync.Mutex (e.g., a mutex
on UnigramFilter or per-block locks) around the uf.Bin[blockIdx] |= bitPos
mutation or use atomic operations (a CAS loop with
atomic.LoadUint64/atomic.CompareAndSwapUint64 on uf.Bin[blockIdx]) to atomically
OR the bit; ensure the same synchronization strategy is used wherever the bitmap
is read (e.g., the Filter reader that accesses uf.Bin) to avoid races.

In `@core/jaro.go`:
- Around line 33-37: The JaroWinkler implementation currently hard-rejects when
the first byte differs (the a[0] != b[0] return 0.0 check); remove that
early-return so the full JaroWinkler similarity is computed for all inputs
(i.e., delete the first-byte check in the JaroWinkler function in core/jaro.go)
and ensure the function proceeds with the regular matching/weighting logic
instead of returning 0.0 for differing first characters.

In `@core/memory.go`:
- Around line 97-111: The current update logic for record.Queries leaves
queryNorm in its old slot when found, which prevents it from being treated as
most-recent; instead, when queryNorm exists you should remove it from
record.Queries and re-append it to the end so it becomes the newest entry, and
still enforce the max length (3) by trimming the oldest element if needed;
update the loop that checks for equality to record.Queries to capture the found
index, splice out that index when found, then append queryNorm and if
len(record.Queries) > 3 drop the first element so the ring buffer always
contains the three most recent queries.

In `@core/score.go`:
- Around line 85-96: The bucket used for Tier 2 frequency counts, charBucket
declared in the score calculation (charBucket [256]int8) can overflow for long
filenames; change its type to a wider integer (e.g., int16 or int) in the same
scope where baseStart, lenP, target, pattern and filenameHits are used so
repeated bytes are counted correctly and the Tier 2 bonus logic still triggers
when appropriate.

In `@core/utils.go`:
- Around line 43-52: The current ASCII-normalization loop preallocates buf :=
make([]byte, len(s)) and only writes some indices, leaving NULs for dropped
chars; change it to build a compact buffer by either using var buf []byte and
append(buf, char) for allowed characters or keep buf := make([]byte, len(s)) but
maintain a write index j and assign buf[j] = char then return string(buf[:j]);
update the isASCII branch (variables: buf, s, char) accordingly so unsupported
ASCII characters are removed instead of becoming NUL bytes.

In `@core/worker.go`:
- Around line 118-178: FuzzyFindParallel currently scans the full items slice
and can return entries for files marked deleted in the UnigramFilter/ Bin
bitmap; fix by adding a deleted bitmap parameter (e.g., deletedBin) to
FuzzyFindParallel and use it to skip deleted indexes both inside the per-worker
loop (before calling fuzzyScoreGreedy for index j) and when merging results from
resultChan into finalHeap (skip any FuzzyMatch whose Index is marked deleted);
update callers accordingly so deleted files are never scored or returned
(references: function FuzzyFindParallel, fuzzyScoreGreedy, resultChan,
finalHeap, heapToSorted).

In `@demo/main.go`:
- Around line 103-105: The code reads globalMemory (recentFiles :=
globalMemory.GetRecentFiles(5); boostedFiles :=
globalMemory.GetBoostScores(query)) without nil-checking or synchronization,
risking a nil panic and a data race because globalMemory is set in indexFiles()
concurrently; fix by ensuring globalMemory is safely published before use
(either initialize it synchronously before the server accepts requests or
protect reads/writes with the same mutex used for searcher, e.g., acquire
searcherLock (or a new memoryLock) around writes in indexFiles and around reads
here), and add a nil-check/early return or HTTP 503 if globalMemory is not yet
ready to avoid dereferencing nil in GetRecentFiles/GetBoostScores.

In `@fuzzyvn.go`:
- Around line 177-185: The current conditional sort only runs when memoryBoosts
or opts[0].ContextBoosts exist, causing non-deterministic ordering when no
boosts are present; always sort rankedResults after they are produced to ensure
deterministic ordering (tie-break by rankedResults[i].Str), i.e. remove the
conditional and invoke sort.Slice unconditionally after the
FuzzyFindFiltered/FuzzyFindParallel results are assembled, keeping the existing
comparator that prefers higher Score and falls back to alphabetical Str.
- Around line 140-152: The code allocates and fills scoreBuf from
s.scorePool.Get() but never reads it afterwards (scoreBuf, s.scorePool.Get(),
matches), causing wasted work; either remove the scoreBuf allocation/population
block entirely or actually use scoreBuf in the subsequent ranking loop (replace
direct reads of m.Score from matches with lookups into scoreBuf by m.Index) and
keep the buffer reset/Put logic if you retain the pool; if scoreBuf was intended
for deduplication/additional lookup, implement that logic or add a clarifying
comment next to scoreBuf to avoid unnecessary allocation.

In `@Makefile`:
- Around line 7-8: The Makefile defines targets named cli and gen that are not
listed in .PHONY, so an existing file/dir named cli or gen will prevent those
targets from running; add cli and gen to the .PHONY declaration (alongside
existing phony targets) to ensure they're always executed, updating the .PHONY
line to include "cli" and "gen" so targets cli and gen (and any others already
present) are treated as phony.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 18c81e78-3d35-4968-a7ac-4ef34866aa47

📥 Commits

Reviewing files that changed from the base of the PR and between 2b42a24 and e3dce6d.

📒 Files selected for processing (17)
  • .github/workflows/release.yml
  • Makefile
  • README.md
  • bench_linux_test.go
  • core/filter.go
  • core/jaro.go
  • core/memory.go
  • core/score.go
  • core/utils.go
  • core/worker.go
  • demo/cli_search.go
  • demo/main.go
  • docs/bench.md
  • docs/bench_result_amd.txt
  • docs/bench_result_n2.txt
  • fuzzyvn.go
  • fuzzyvn_test.go

Comment thread .github/workflows/release.yml
Comment thread bench_linux_test.go
Comment thread core/filter.go
Comment on lines +117 to +124
func (uf *UnigramFilter) DeleteFile(docID int) {
	if docID >= uf.NumTargets {
		return
	}
	blockIdx := docID / 64
	bitPos := uint64(1) << (docID % 64)
	// mark the file as moved to the bin
	uf.Bin[blockIdx] |= bitPos

⚠️ Potential issue | 🟠 Major

Guard DeleteFile against invalid IDs and concurrent bitmap updates.

docID < 0 will panic here, and uf.Bin[blockIdx] |= bitPos is a non-atomic read/modify/write. Two deletes in the same 64-file block can lose one tombstone, and Filter also reads the same bitmap on Line 195 with no synchronization.

Also applies to: 185-195

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/filter.go` around lines 117 - 124, UnigramFilter.DeleteFile currently
fails for negative docID and does a non-atomic read/modify/write on uf.Bin, so
add a guard that returns if docID < 0 || docID >= uf.NumTargets, compute
blockIdx and bitPos as before, and perform a thread-safe update: either use a
sync.Mutex (e.g., a mutex on UnigramFilter or per-block locks) around the
uf.Bin[blockIdx] |= bitPos mutation or use atomic operations (a CAS loop with
atomic.LoadUint64/atomic.CompareAndSwapUint64 on uf.Bin[blockIdx]) to atomically
OR the bit; ensure the same synchronization strategy is used wherever the bitmap
is read (e.g., the Filter reader that accesses uf.Bin) to avoid races.

Comment thread core/jaro.go
Comment on lines +33 to +37
// Early exit: if the first bytes differ, a Jaro score above 0.7 is very unlikely
// This quickly skips unrelated files in the history
if a[0] != b[0] {
	return 0.0
}

⚠️ Potential issue | 🟠 Major

Don't hard-reject JaroWinkler when the first byte differs.

This turns a similarity algorithm into a heuristic. Strings like abc/xbc or main/pain can still clear the 0.7 threshold, so GetBoostScores will miss relevant history whenever the typo lands on the first character.

Minimal fix
-	// Early exit: if the first bytes differ, a Jaro score above 0.7 is very unlikely
-	// This quickly skips unrelated files in the history
-	if a[0] != b[0] {
-		return 0.0
-	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/jaro.go` around lines 33 - 37, The JaroWinkler implementation currently
hard-rejects when the first byte differs (the a[0] != b[0] return 0.0 check);
remove that early-return so the full JaroWinkler similarity is computed for all
inputs (i.e., delete the first-byte check in the JaroWinkler function in
core/jaro.go) and ensure the function proceeds with the regular
matching/weighting logic instead of returning 0.0 for differing first
characters.
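
The claim that a first-character typo can still clear the 0.7 threshold can be checked with a textbook Jaro-Winkler sketch. This is not the pooled implementation from core/jaro.go: `jaro` and `jaroWinkler` here are hypothetical, allocate on every call, and use the standard 0.1 prefix weight.

```go
package main

import "fmt"

// jaro returns the Jaro similarity of a and b, compared byte by byte.
func jaro(a, b string) float64 {
	if len(a) == 0 && len(b) == 0 {
		return 1
	}
	if len(a) == 0 || len(b) == 0 {
		return 0
	}
	longest := len(a)
	if len(b) > longest {
		longest = len(b)
	}
	window := longest/2 - 1
	if window < 0 {
		window = 0
	}
	matchedA := make([]bool, len(a))
	matchedB := make([]bool, len(b))
	matches := 0
	for i := 0; i < len(a); i++ {
		lo, hi := i-window, i+window+1
		if lo < 0 {
			lo = 0
		}
		if hi > len(b) {
			hi = len(b)
		}
		for j := lo; j < hi; j++ {
			if !matchedB[j] && a[i] == b[j] {
				matchedA[i], matchedB[j] = true, true
				matches++
				break
			}
		}
	}
	if matches == 0 {
		return 0
	}
	// Count matched bytes that appear in a different order in b.
	outOfOrder, k := 0, 0
	for i := 0; i < len(a); i++ {
		if !matchedA[i] {
			continue
		}
		for !matchedB[k] {
			k++
		}
		if a[i] != b[k] {
			outOfOrder++
		}
		k++
	}
	m := float64(matches)
	return (m/float64(len(a)) + m/float64(len(b)) + (m-float64(outOfOrder)/2)/m) / 3
}

// jaroWinkler adds the usual bonus for a shared prefix of up to 4 bytes.
func jaroWinkler(a, b string) float64 {
	j := jaro(a, b)
	prefix := 0
	for prefix < len(a) && prefix < len(b) && prefix < 4 && a[prefix] == b[prefix] {
		prefix++
	}
	return j + float64(prefix)*0.1*(1-j)
}

func main() {
	fmt.Printf("%.3f\n", jaroWinkler("main", "pain")) // 0.833
	fmt.Printf("%.3f\n", jaroWinkler("abc", "xbc"))   // 0.778
}
```

Both pairs differ in their first byte and still score above 0.7, so an early return of 0.0 would hide them from GetBoostScores.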

Comment thread core/memory.go
Comment on lines +97 to +111
// Update the query list (ring buffer)
foundQuery := false
for _, q := range record.Queries {
	if q == queryNorm {
		foundQuery = true
		break
	}
}
if !foundQuery {
	if len(record.Queries) >= 3 {
		// Drop the oldest entry
		record.Queries = record.Queries[1:]
	}
	record.Queries = append(record.Queries, queryNorm)
}

⚠️ Potential issue | 🟡 Minor

Refresh an existing query instead of leaving it in place.

When queryNorm is already present, this path keeps it in its old slot. One later insert can evict a query that was just reused, so the "3 recent queries" list stops being recent.

Possible fix
-	foundQuery := false
-	for _, q := range record.Queries {
-		if q == queryNorm {
-			foundQuery = true
-			break
-		}
-	}
-	if !foundQuery {
-		if len(record.Queries) >= 3 {
-			// Drop the oldest entry
-			record.Queries = record.Queries[1:]
-		}
-		record.Queries = append(record.Queries, queryNorm)
-	}
+	pos := -1
+	for i, q := range record.Queries {
+		if q == queryNorm {
+			pos = i
+			break
+		}
+	}
+	if pos >= 0 {
+		copy(record.Queries[pos:], record.Queries[pos+1:])
+		record.Queries = record.Queries[:len(record.Queries)-1]
+	} else if len(record.Queries) >= 3 {
+		// Drop the oldest entry
+		record.Queries = record.Queries[1:]
+	}
+	record.Queries = append(record.Queries, queryNorm)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/memory.go` around lines 97 - 111, The current update logic for
record.Queries leaves queryNorm in its old slot when found, which prevents it
from being treated as most-recent; instead, when queryNorm exists you should
remove it from record.Queries and re-append it to the end so it becomes the
newest entry, and still enforce the max length (3) by trimming the oldest
element if needed; update the loop that checks for equality to record.Queries to
capture the found index, splice out that index when found, then append queryNorm
and if len(record.Queries) > 3 drop the first element so the ring buffer always
contains the three most recent queries.
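The refresh behavior can be sketched in isolation as follows. This is a minimal illustration, assuming the query list is a plain `[]string` capped at 3 entries; `touchQuery` and `maxQueries` are hypothetical names, since the PR inlines this logic in core/memory.go.

```go
package main

import "fmt"

// maxQueries mirrors the limit of 3 used in the reviewed snippet.
const maxQueries = 3

// touchQuery moves q to the newest slot, evicting the oldest entry if the
// list is full. Hypothetical helper; not part of the PR.
func touchQuery(queries []string, q string) []string {
	for i, existing := range queries {
		if existing == q {
			// Remove the stale copy so the append below refreshes it.
			queries = append(queries[:i], queries[i+1:]...)
			break
		}
	}
	if len(queries) >= maxQueries {
		// Keep only the newest maxQueries-1 entries to make room for q.
		queries = queries[len(queries)-maxQueries+1:]
	}
	return append(queries, q)
}

func main() {
	var qs []string
	for _, q := range []string{"a", "b", "c", "a", "d"} {
		qs = touchQuery(qs, q)
	}
	fmt.Println(qs) // [c a d]
}
```

With the original logic the same sequence ends as `[b c d]`: the reused query `a` stays in the oldest slot and is evicted by the very next insert.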

Comment thread core/worker.go
Comment on lines +118 to +178
func FuzzyFindParallel(query []byte, items [][]byte, baseStarts []int) []FuzzyMatch {
	numItems := len(items)
	if numItems == 0 {
		return nil
	}

	numCPUs := runtime.GOMAXPROCS(0)
	chunkSize := (numItems + numCPUs - 1) / numCPUs

	var wg sync.WaitGroup
	resultChan := make(chan []FuzzyMatch, numCPUs)

	for i := range numCPUs {
		start := i * chunkSize
		if start >= numItems {
			break
		}
		end := start + chunkSize
		if end > numItems {
			end = numItems
		}

		wg.Add(1)
		go func(s, e int) {
			defer wg.Done()
			h := &minHeap{}
			heap.Init(h)
			for j := s; j < e; j++ {
				if score, matched := fuzzyScoreGreedy(query, items[j], baseStarts[j]); matched {
					if h.Len() < topK {
						heap.Push(h, FuzzyMatch{Index: j, Score: score})
					} else if score > (*h)[0].Score {
						(*h)[0] = FuzzyMatch{Index: j, Score: score}
						heap.Fix(h, 0)
					}
				}
			}
			resultChan <- heapToSorted(h)
		}(start, end)
	}

	go func() {
		wg.Wait()
		close(resultChan)
	}()

	finalHeap := &minHeap{}
	heap.Init(finalHeap)
	for matches := range resultChan {
		for _, m := range matches {
			if finalHeap.Len() < topK {
				heap.Push(finalHeap, m)
			} else if m.Score > (*finalHeap)[0].Score {
				(*finalHeap)[0] = m
				heap.Fix(finalHeap, 0)
			}
		}
	}

	return heapToSorted(finalHeap)
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '== Search fallback call sites ==\n'
rg -n -C4 'FuzzyFindFiltered|FuzzyFindParallel' --type go

printf '\n== DeleteFile implementations and callers ==\n'
rg -n -C4 'DeleteFile\(' --type go

printf '\n== Tombstone bitmap usage ==\n'
rg -n -C2 '\bBin\b' --type go

Repository: versenilvis/fuzzyvn

Length of output: 3098


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '== Filter method implementation ==\n'
rg -n -A30 'func \(uf \*UnigramFilter\) Filter\(' --type go

printf '\n== Search method around FuzzyFindParallel call ==\n'
rg -n -B10 -A10 'FuzzyFindParallel' fuzzyvn.go --type go

printf '\n== Check if items are modified during deletion ==\n'
rg -n -C3 'DeleteFile|delete\|remove.*Normalized' --type go | head -40

Repository: versenilvis/fuzzyvn

Length of output: 2598


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '== Full Filter method ==\n'
rg -n -A80 'func \(uf \*UnigramFilter\) Filter\(' core/filter.go --type go

printf '\n== Check if Bin is checked in Filter logic ==\n'
rg -n -B5 -A5 'Bin\[blockIdx\]' core/filter.go --type go

Repository: versenilvis/fuzzyvn

Length of output: 3259


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '== Check post-processing after FuzzyFindParallel ==\n'
rg -n -A15 'FuzzyFindParallel' fuzzyvn.go --type go

printf '\n== Search method full implementation ==\n'
rg -n -B5 -A30 'func \(s \*Searcher\) Search\(' --type go | head -80

Repository: versenilvis/fuzzyvn

Length of output: 2191


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '== FuzzyMatch struct and result handling ==\n'
rg -n -B2 -A10 'type FuzzyMatch struct' --type go

printf '\n== How matches are converted to final results ==\n'
rg -n -A20 'if len\(matches\) == 0' fuzzyvn.go --type go | head -50

Repository: versenilvis/fuzzyvn

Length of output: 1384


Apply deletion filter to FuzzyFindParallel fallback results.

When Filter() returns nil (short query), FuzzyFindParallel scans the full items slice with no knowledge of the Bin deletion bitmap. Deleted files marked via UnigramFilter.DeleteFile() are excluded only in the filtered path (line 195), leaving deleted items exposed in the fallback path.

Either pass Bin to FuzzyFindParallel and filter deleted files during scoring, or post-filter results against the bitmap before returning them.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/worker.go` around lines 118 - 178, FuzzyFindParallel currently scans the
full items slice and can return entries for files marked deleted in the
UnigramFilter/ Bin bitmap; fix by adding a deleted bitmap parameter (e.g.,
deletedBin) to FuzzyFindParallel and use it to skip deleted indexes both inside
the per-worker loop (before calling fuzzyScoreGreedy for index j) and when
merging results from resultChan into finalHeap (skip any FuzzyMatch whose Index
is marked deleted); update callers accordingly so deleted files are never scored
or returned (references: function FuzzyFindParallel, fuzzyScoreGreedy,
resultChan, finalHeap, heapToSorted).
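As a rough sketch of the post-filter option, the snippet below drops tombstoned indexes from a result list. The bitmap layout (a `[]uint64` with one bit per item, 64 items per block) is an assumption made for illustration, and `markDeleted`, `isDeleted`, and `liveMatches` are hypothetical names rather than functions from core/filter.go.

```go
package main

import "fmt"

// markDeleted sets the tombstone bit for index i.
// Assumes bin has at least i/64+1 blocks.
func markDeleted(bin []uint64, i int) {
	bin[i/64] |= 1 << uint(i%64)
}

// isDeleted reports whether index i is tombstoned. Indexes beyond the
// bitmap are treated as live.
func isDeleted(bin []uint64, i int) bool {
	block := i / 64
	if block >= len(bin) {
		return false
	}
	return bin[block]&(1<<uint(i%64)) != 0
}

// liveMatches returns the indexes that are not tombstoned, preserving order.
func liveMatches(indexes []int, bin []uint64) []int {
	live := make([]int, 0, len(indexes))
	for _, idx := range indexes {
		if !isDeleted(bin, idx) {
			live = append(live, idx)
		}
	}
	return live
}

func main() {
	bin := make([]uint64, 2) // covers indexes 0..127
	markDeleted(bin, 3)
	markDeleted(bin, 70)
	fmt.Println(liveMatches([]int{1, 3, 64, 70, 100}, bin)) // [1 64 100]
}
```

Checking the bitmap inside the per-worker loop instead would also keep deleted items from occupying top-K heap slots, which a post-filter alone cannot guarantee.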

Comment thread demo/main.go
Comment on lines +103 to 105
recentFiles := globalMemory.GetRecentFiles(5)
boostedFiles := globalMemory.GetBoostScores(query)
matchedStrings := searcher.Search(query)

⚠️ Potential issue | 🔴 Critical

Potential nil pointer dereference on globalMemory.

globalMemory is accessed at lines 103-104 without nil checking, but it's only initialized inside indexFiles() (line 72) which runs asynchronously in a goroutine. If a /search request arrives before indexing completes, globalMemory will be nil causing a panic.

Additionally, there's a potential data race: globalMemory is written in indexFiles (line 72) and read here without synchronization (the searcherLock only protects searcher, not globalMemory).

🐛 Proposed fix
 	searcherLock.RLock()
 	defer searcherLock.RUnlock()
 
 	if searcher == nil {
 		json.NewEncoder(w).Encode(SearchResponse{
 			RecentFiles: []string{},
 			Results:     []SearchResult{},
 		})
 		return
 	}
 
+	if globalMemory == nil {
+		json.NewEncoder(w).Encode(SearchResponse{
+			RecentFiles: []string{},
+			Results:     []SearchResult{},
+		})
+		return
+	}
+
 	recentFiles := globalMemory.GetRecentFiles(5)
 	boostedFiles := globalMemory.GetBoostScores(query)

For a more robust solution, consider protecting globalMemory with the same lock or ensuring it's always initialized before the server starts accepting requests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@demo/main.go` around lines 103 - 105, The code reads globalMemory
(recentFiles := globalMemory.GetRecentFiles(5); boostedFiles :=
globalMemory.GetBoostScores(query)) without nil-checking or synchronization,
risking a nil panic and a data race because globalMemory is set in indexFiles()
concurrently; fix by ensuring globalMemory is safely published before use
(either initialize it synchronously before the server accepts requests or
protect reads/writes with the same mutex used for searcher, e.g., acquire
searcherLock (or a new memoryLock) around writes in indexFiles and around reads
here), and add a nil-check/early return or HTTP 503 if globalMemory is not yet
ready to avoid dereferencing nil in GetRecentFiles/GetBoostScores.
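One way to combine the lock and the 503 fallback is sketched below. It is a minimal illustration under stated assumptions: the `memory` type stands in for the demo's FileMemory, and `memLock`, `setMemory`, and `statusOf` are hypothetical names that do not appear in demo/main.go.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// memory is a minimal stand-in for the demo's FileMemory type.
type memory struct{ recent []string }

var (
	memLock      sync.RWMutex
	globalMemory *memory
)

// setMemory publishes the value under the write lock (called by the indexer).
func setMemory(m *memory) {
	memLock.Lock()
	globalMemory = m
	memLock.Unlock()
}

// searchHandler replies 503 until indexing has published globalMemory.
func searchHandler(w http.ResponseWriter, r *http.Request) {
	memLock.RLock()
	m := globalMemory
	memLock.RUnlock()
	if m == nil {
		http.Error(w, "index not ready", http.StatusServiceUnavailable)
		return
	}
	fmt.Fprintf(w, "%d recent files", len(m.recent))
}

// statusOf runs the handler once and returns the HTTP status code.
func statusOf() int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/search?q=main", nil)
	searchHandler(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusOf()) // 503: nothing published yet
	setMemory(&memory{recent: []string{"main.go"}})
	fmt.Println(statusOf()) // 200
}
```

Copying the pointer into a local while holding the read lock keeps the critical section short and removes the unsynchronized read the comment describes.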

Comment thread fuzzyvn.go
Comment on lines +140 to +152
// rank and apply boosts
scoreBuf := s.scorePool.Get().([]int)
defer func() {
// reset the buffer before returning it to the pool
for i := range scoreBuf {
scoreBuf[i] = math.MinInt
}
s.scorePool.Put(scoreBuf)
}()

for i, nameNorm := range s.FilenamesOnly {
// Instead of: runesName := []rune(nameNorm)
// check the byte length with len() first as a cheap pre-filter
if len(nameNorm) < queryLen {
continue
}

// Compare against the start of the filename
targetStr1 := fastSubstring(nameNorm, queryLen)
// If the slice is still shorter than the query after cutting (because of UTF-8 characters), skip it
if len(targetStr1) < len(queryNorm) { // comparing byte lengths is fine because both are normalized
continue
}

dist := LevenshteinRatio(queryNorm, targetStr1)

// Also compare with one extra character (in case the typo inserted a character)
if len(nameNorm) > len(targetStr1) {
// Take a prefix one rune longer
targetStr2 := fastSubstring(nameNorm, queryLen+1)

d2 := LevenshteinRatio(queryNorm, targetStr2)
if d2 < dist {
dist = d2
}
}
/*
In the code above, take "mian" as an example: target 1 is "main", target 2 is "maina"
We score target 1, dist = d1 = 2, but for target 2, dist = d2 = 3
if d2 < dist {
dist = d2
}
That is, take d2 if it is smaller than d1, otherwise keep d1
In other words, min(d1, d2)
*/

// If the typo distance is within the allowed threshold, compute a score
// Meant as a robust guard for typos that drift too far (if it falls short, please open a PR to help me)
if dist <= baseThreshold {
// Base score 3000
score := 3000 - (dist * 400)
runeCountName := 0
for range nameNorm {
runeCountName++
}
lenDiff := runeCountName - queryLen
if lenDiff > 0 {
score -= (lenDiff * 15) // Penalize name length
}

// Exact-match bonus
if lenDiff == 0 && dist == 0 {
score += 1000
}

// Penalize path length
score -= len(s.Originals[i]) / 5
for _, m := range matches {
scoreBuf[m.Index] = m.Score
}

⚠️ Potential issue | 🟡 Minor

scoreBuf is populated but never used.

Lines 150-152 populate scoreBuf with match scores, but the subsequent ranking loop (lines 154-175) recalculates scores from matches directly and never reads from scoreBuf. This allocation and population is wasted work.

🔧 Proposed fix - remove unused scoreBuf logic
-	// rank and apply boosts
-	scoreBuf := s.scorePool.Get().([]int)
-	defer func() {
-		// reset the buffer before returning it to the pool
-		for i := range scoreBuf {
-			scoreBuf[i] = math.MinInt
-		}
-		s.scorePool.Put(scoreBuf)
-	}()
-
-	for _, m := range matches {
-		scoreBuf[m.Index] = m.Score
-	}
-
 	rankedResults := make([]MatchResult, 0, len(matches))
 	for _, m := range matches {

If scoreBuf was intended for deduplication or lookup purposes, consider adding a comment explaining its purpose or implementing the intended logic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fuzzyvn.go` around lines 140 - 152, The code allocates and fills scoreBuf
from s.scorePool.Get() but never reads it afterwards (scoreBuf,
s.scorePool.Get(), matches), causing wasted work; either remove the scoreBuf
allocation/population block entirely or actually use scoreBuf in the subsequent
ranking loop (replace direct reads of m.Score from matches with lookups into
scoreBuf by m.Index) and keep the buffer reset/Put logic if you retain the pool;
if scoreBuf was intended for deduplication/additional lookup, implement that
logic or add a clarifying comment next to scoreBuf to avoid unnecessary
allocation.
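If the buffer is kept rather than removed, it needs a reader. The sketch below shows one possible use, deduplicating matches by index and keeping the highest score, which is only a guess at the intent the comment mentions. The `match` type, `numItems`, and `bestScores` are hypothetical, and the pool stores `*[]int` instead of the `[]int` used in the PR.

```go
package main

import (
	"fmt"
	"math"
	"sync"
)

// match is a stand-in for core.FuzzyMatch.
type match struct{ Index, Score int }

// numItems is a hypothetical corpus size; buffers are indexed by item.
const numItems = 8

// scorePool hands out buffers pre-filled with math.MinInt ("no score yet").
var scorePool = sync.Pool{New: func() any {
	buf := make([]int, numItems)
	for i := range buf {
		buf[i] = math.MinInt
	}
	return &buf
}}

// bestScores keeps one entry per index, carrying its highest score, in
// first-seen order. The pooled buffer is the per-index lookup table.
func bestScores(matches []match) []match {
	bufPtr := scorePool.Get().(*[]int)
	buf := *bufPtr
	defer func() {
		// Reset only the slots touched in this call, then return the buffer.
		for _, m := range matches {
			buf[m.Index] = math.MinInt
		}
		scorePool.Put(bufPtr)
	}()

	for _, m := range matches {
		if m.Score > buf[m.Index] {
			buf[m.Index] = m.Score
		}
	}
	out := make([]match, 0, len(matches))
	for _, m := range matches {
		if s := buf[m.Index]; s != math.MinInt {
			out = append(out, match{Index: m.Index, Score: s})
			buf[m.Index] = math.MinInt // emit each index only once
		}
	}
	return out
}

func main() {
	fmt.Println(bestScores([]match{{2, 10}, {5, 7}, {2, 30}})) // [{2 30} {5 7}]
}
```

Resetting only the touched slots keeps cleanup proportional to the number of matches instead of the size of the whole buffer.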

Comment thread fuzzyvn.go Outdated
Comment thread Makefile
Comment on lines +7 to +8
cli:
@cd demo && go run cli_search.go

⚠️ Potential issue | 🟡 Minor

Add cli to .PHONY.

Right now a file or directory named cli will make this target a no-op. gen has the same problem while you're touching the target list.

Possible fix
-.PHONY: demo test bench
+.PHONY: demo cli test bench gen
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 7 - 8, The Makefile defines targets named cli and gen
that are not listed in .PHONY, so an existing file/dir named cli or gen will
prevent those targets from running; add cli and gen to the .PHONY declaration
(alongside existing phony targets) to ensure they're always executed, updating
the .PHONY line to include "cli" and "gen" so targets cli and gen (and any
others already present) are treated as phony.

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
fuzzyvn.go (1)

205-223: Consider capping typo scores to avoid zero or low scores for valid matches.

For long queries (e.g., 40 chars), threshold becomes 10, and if dist == threshold, the score becomes 100 - 10*10 = 0. While technically correct, a zero-scored typo match may be unexpectedly filtered or ranked poorly.

♻️ Proposed fix - ensure minimum positive score
 		if dist <= threshold {
+			score := 100 - dist*10
+			if score < 10 {
+				score = 10 // ensure typo matches always have positive relevance
+			}
 			matches = append(matches, core.FuzzyMatch{
 				Index: i,
-				Score: 100 - dist*10, // typo scores rank below fuzzy scores
+				Score: score,
 			})
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fuzzyvn.go` around lines 205 - 223, The findButTypo function can produce zero
or very low Score values for long queries because Score is computed as 100 -
dist*10; change the scoring so typo matches are floored to a sensible minimum
(e.g., at least 1, 5, or 10) instead of allowing 0 or negative. Locate
findButTypo, and after computing dist and threshold, compute the rawScore (100 -
dist*10) then clamp it with a minScore constant (e.g., minScore := 10) before
assigning to core.FuzzyMatch.Score so every valid typo match has a positive,
non-zero score.
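The clamp can be pulled into a small function to see its effect across distances. This is a sketch only: `typoScore` and `minTypoScore` are hypothetical names, and the floor of 10 is the value from the proposed fix, not a tuned constant.

```go
package main

import "fmt"

// minTypoScore is the floor suggested in the review; treat it as a placeholder.
const minTypoScore = 10

// typoScore maps an edit distance to a score, never dropping below the floor.
func typoScore(dist int) int {
	score := 100 - dist*10
	if score < minTypoScore {
		return minTypoScore
	}
	return score
}

func main() {
	for _, d := range []int{0, 3, 9, 10} {
		fmt.Println(d, typoScore(d))
	}
	// Prints:
	// 0 100
	// 3 70
	// 9 10
	// 10 10
}
```

A side effect of the floor is that distances 9 and above all receive the same score, so ties among very distant typo matches are no longer ordered by distance.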
bench_linux_test.go (1)

43-53: Avoid fmt.Printf in benchmarks; use b.Logf instead.

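Calling fmt.Printf writes straight to stdout on every benchmark run and bypasses the testing framework. b.Logf is the idiomatic choice because it routes the text through the benchmark's own log. Note that the testing package prints log output only on failure or with -v for tests, but always prints it for benchmarks, so this change tidies how the output is reported rather than silencing it.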

♻️ Proposed fix
-	fmt.Printf("\n--- Linux Benchmark Check ---\n")
-	fmt.Printf("Total files scanned: %d\n", len(files))
+	b.Logf("\n--- Linux Benchmark Check ---")
+	b.Logf("Total files scanned: %d", len(files))
 	if len(files) < limit {
 		b.Fatalf("ERROR: Hệ thống chỉ có %d files, không đủ %d để benchmark", len(files), limit)
 	}
 	
-	fmt.Println("Sample (first 10 files):")
+	b.Log("Sample (first 10 files):")
 	for i := 0; i < 10; i++ {
-		fmt.Printf("  [%d] %s\n", i+1, files[i])
+		b.Logf("  [%d] %s", i+1, files[i])
 	}
-	fmt.Printf("-----------------------------\n")
+	b.Log("-----------------------------")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bench_linux_test.go` around lines 43 - 53, Replace the direct fmt.* prints in
the Linux benchmark block with the testing.B logger so the output goes through
the testing framework (for benchmarks the log is always printed, not only on -v
or failure): change fmt.Printf("\n--- Linux Benchmark Check ---\n"),
fmt.Printf("Total files scanned: %d\n", ...), fmt.Println("Sample (first 10
files):"), and the loop's fmt.Printf lines to b.Logf(...) or b.Log(...) (keep
the same messages, dropping the trailing newlines), and replace the final
fmt.Printf("-----------------------------\n") as well; locate these calls in
the benchmark function in bench_linux_test.go.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ccad7bce-e3d8-403a-a027-495799168793

📥 Commits

Reviewing files that changed from the base of the PR and between e3dce6d and 820a740.

📒 Files selected for processing (6)
  • .github/workflows/release.yml
  • bench_linux_test.go
  • demo/cli_search.go
  • demo/main.go
  • fuzzyvn.go
  • go.mod
✅ Files skipped from review due to trivial changes (2)
  • go.mod
  • demo/cli_search.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • .github/workflows/release.yml
  • demo/main.go

@versenilvis versenilvis merged commit b82d8df into main Apr 5, 2026
2 checks passed