perf(hgraph): skip mutex acquisition on immutable index search#2154
perf(hgraph): skip mutex acquisition on immutable index search#2154jfeng18 wants to merge 3 commits into
Conversation
When an HGraph index is marked immutable via SetImmutable(), no writer (Add/Build/Remove/Tune) can modify shared state. The search paths previously acquired global_mutex_ and force_remove_mutex_ unconditionally, causing unnecessary atomic CAS cache-line contention across all search threads on read-only indexes. This change skips both locks when immutable_ is true, eliminating the contention on high-concurrency search workloads. Additionally fixes two pre-existing correctness issues: - immutable_ was a plain bool read without synchronization from search threads while written under exclusive lock by SetImmutable(). Changed to std::atomic<bool> with acquire/release semantics for ARM safety. - SetImmutable() now acquires add_mutex_ before global_mutex_ to prevent a TOCTOU race where Tune() could pass CHECK_IMMUTABLE_INDEX, then SetImmutable() completes, then search skips the lock while Tune() modifies basic_flatten_codes_. Signed-off-by: 水芝 <fengjiangtian.fjt@alibaba-inc.com> Assisted-by: Claude:claude-opus-4-6
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🟢 Require kind labelWonderful, this rule succeeded.
🟢 Require version labelWonderful, this rule succeeded.
|
There was a problem hiding this comment.
Code Review
This pull request makes the index immutability check thread-safe by converting immutable_ to an atomic boolean and using acquire-release memory ordering. It also optimizes search paths by skipping global lock acquisitions when the index is immutable. The review feedback highlights a critical TOCTOU race condition in HGraph::Tune() where the index could still be modified after being marked immutable, and suggests re-checking the immutability status after acquiring add_mutex_. Additionally, the reviewer recommends using std::defer_lock for std::shared_lock initialization in search methods to avoid move-assignment overhead.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| std::scoped_lock<std::shared_mutex> add_lock(this->add_mutex_); | ||
| std::scoped_lock<std::shared_mutex> wlock(this->global_mutex_); |
There was a problem hiding this comment.
While acquiring add_mutex_ in SetImmutable() is intended to prevent a TOCTOU race with Tune(), it is currently insufficient because CHECK_IMMUTABLE_INDEX is checked outside of any lock in IndexImpl::Tune().
If Tune() passes the check before SetImmutable() runs, it will block on add_mutex_ inside HGraph::Tune(). Once SetImmutable() completes and releases the locks, Tune() will proceed to modify the index even though it has been made immutable. Since search threads skip acquiring global_mutex_ when immutable_ is true, this will lead to concurrent reads and writes on basic_flatten_codes_ and other member variables, causing a severe data race and undefined behavior.
To completely resolve this, HGraph::Tune() must re-check immutable_ after acquiring add_mutex_:
std::scoped_lock lock(this->add_mutex_);
if (this->immutable_.load(std::memory_order_acquire)) {
return false;
}There was a problem hiding this comment.
Thanks for catching this! Already addressed in commit f895451 — Tune() now re-checks immutable_ after acquiring add_mutex_:
std::scoped_lock lock(this->add_mutex_);
if (this->immutable_.load(std::memory_order_acquire)) {
return false;
}| std::shared_lock<std::shared_mutex> force_remove_rlock; | ||
| if (this->support_force_remove()) { | ||
| force_remove_rlock = std::shared_lock<std::shared_mutex>(this->force_remove_mutex_); | ||
| std::shared_lock<std::shared_mutex> shared_lock; | ||
| if (!this->immutable_.load(std::memory_order_acquire)) { | ||
| if (this->support_force_remove()) { | ||
| force_remove_rlock = std::shared_lock<std::shared_mutex>(this->force_remove_mutex_); | ||
| } | ||
| shared_lock = std::shared_lock<std::shared_mutex>(this->global_mutex_); | ||
| } |
There was a problem hiding this comment.
Using std::defer_lock is more idiomatic and efficient than default-constructing std::shared_lock and then move-assigning a temporary lock object inside the conditional block. This avoids the overhead of move construction/assignment.
| std::shared_lock<std::shared_mutex> force_remove_rlock; | |
| if (this->support_force_remove()) { | |
| force_remove_rlock = std::shared_lock<std::shared_mutex>(this->force_remove_mutex_); | |
| std::shared_lock<std::shared_mutex> shared_lock; | |
| if (!this->immutable_.load(std::memory_order_acquire)) { | |
| if (this->support_force_remove()) { | |
| force_remove_rlock = std::shared_lock<std::shared_mutex>(this->force_remove_mutex_); | |
| } | |
| shared_lock = std::shared_lock<std::shared_mutex>(this->global_mutex_); | |
| } | |
| std::shared_lock<std::shared_mutex> force_remove_rlock(this->force_remove_mutex_, std::defer_lock); | |
| std::shared_lock<std::shared_mutex> shared_lock(this->global_mutex_, std::defer_lock); | |
| if (!this->immutable_.load(std::memory_order_acquire)) { | |
| if (this->support_force_remove()) { | |
| force_remove_rlock.lock(); | |
| } | |
| shared_lock.lock(); | |
| } |
There was a problem hiding this comment.
Thanks for the suggestion. We intentionally use default-construct + conditional move-assign because force_remove_rlock is only conditionally associated with its mutex (when support_force_remove() is true). Applying defer_lock to it would unconditionally bind it to force_remove_mutex_, which is semantically imprecise. The current approach handles both lock variables uniformly and is correct (default-constructed shared_lock destructor is a no-op).
| std::shared_lock<std::shared_mutex> force_remove_rlock; | ||
| if (this->support_force_remove()) { | ||
| force_remove_rlock = std::shared_lock<std::shared_mutex>(this->force_remove_mutex_); | ||
| std::shared_lock<std::shared_mutex> shared_lock; | ||
| if (!this->immutable_.load(std::memory_order_acquire)) { | ||
| if (this->support_force_remove()) { | ||
| force_remove_rlock = std::shared_lock<std::shared_mutex>(this->force_remove_mutex_); | ||
| } | ||
| shared_lock = std::shared_lock<std::shared_mutex>(this->global_mutex_); | ||
| } |
There was a problem hiding this comment.
Using std::defer_lock is more idiomatic and efficient than default-constructing std::shared_lock and then move-assigning a temporary lock object inside the conditional block. This avoids the overhead of move construction/assignment.
| std::shared_lock<std::shared_mutex> force_remove_rlock; | |
| if (this->support_force_remove()) { | |
| force_remove_rlock = std::shared_lock<std::shared_mutex>(this->force_remove_mutex_); | |
| std::shared_lock<std::shared_mutex> shared_lock; | |
| if (!this->immutable_.load(std::memory_order_acquire)) { | |
| if (this->support_force_remove()) { | |
| force_remove_rlock = std::shared_lock<std::shared_mutex>(this->force_remove_mutex_); | |
| } | |
| shared_lock = std::shared_lock<std::shared_mutex>(this->global_mutex_); | |
| } | |
| std::shared_lock<std::shared_mutex> force_remove_rlock(this->force_remove_mutex_, std::defer_lock); | |
| std::shared_lock<std::shared_mutex> shared_lock(this->global_mutex_, std::defer_lock); | |
| if (!this->immutable_.load(std::memory_order_acquire)) { | |
| if (this->support_force_remove()) { | |
| force_remove_rlock.lock(); | |
| } | |
| shared_lock.lock(); | |
| } |
There was a problem hiding this comment.
Thanks for the suggestion. We intentionally use default-construct + conditional move-assign because force_remove_rlock is only conditionally associated with its mutex (when support_force_remove() is true). Applying defer_lock to it would unconditionally bind it to force_remove_mutex_, which is semantically imprecise. The current approach handles both lock variables uniformly and is correct (default-constructed shared_lock destructor is a no-op).
| std::shared_lock<std::shared_mutex> force_remove_rlock; | ||
| if (this->support_force_remove()) { | ||
| force_remove_rlock = std::shared_lock<std::shared_mutex>(this->force_remove_mutex_); | ||
| std::shared_lock<std::shared_mutex> shared_lock; | ||
| if (!this->immutable_.load(std::memory_order_acquire)) { | ||
| if (this->support_force_remove()) { | ||
| force_remove_rlock = std::shared_lock<std::shared_mutex>(this->force_remove_mutex_); | ||
| } | ||
| shared_lock = std::shared_lock<std::shared_mutex>(this->global_mutex_); | ||
| } |
There was a problem hiding this comment.
Using std::defer_lock is more idiomatic and efficient than default-constructing std::shared_lock and then move-assigning a temporary lock object inside the conditional block. This avoids the overhead of move construction/assignment.
| std::shared_lock<std::shared_mutex> force_remove_rlock; | |
| if (this->support_force_remove()) { | |
| force_remove_rlock = std::shared_lock<std::shared_mutex>(this->force_remove_mutex_); | |
| std::shared_lock<std::shared_mutex> shared_lock; | |
| if (!this->immutable_.load(std::memory_order_acquire)) { | |
| if (this->support_force_remove()) { | |
| force_remove_rlock = std::shared_lock<std::shared_mutex>(this->force_remove_mutex_); | |
| } | |
| shared_lock = std::shared_lock<std::shared_mutex>(this->global_mutex_); | |
| } | |
| std::shared_lock<std::shared_mutex> force_remove_rlock(this->force_remove_mutex_, std::defer_lock); | |
| std::shared_lock<std::shared_mutex> shared_lock(this->global_mutex_, std::defer_lock); | |
| if (!this->immutable_.load(std::memory_order_acquire)) { | |
| if (this->support_force_remove()) { | |
| force_remove_rlock.lock(); | |
| } | |
| shared_lock.lock(); | |
| } |
There was a problem hiding this comment.
Thanks for the suggestion. We intentionally use default-construct + conditional move-assign because force_remove_rlock is only conditionally associated with its mutex (when support_force_remove() is true). Applying defer_lock to it would unconditionally bind it to force_remove_mutex_, which is semantically imprecise. The current approach handles both lock variables uniformly and is correct (default-constructed shared_lock destructor is a no-op).
Prevents a TOCTOU race where Tune() passes CHECK_IMMUTABLE_INDEX before SetImmutable() runs, then proceeds to modify basic_flatten_codes_ after SetImmutable() completes, while search threads have already stopped acquiring global_mutex_. Signed-off-by: 水芝 <fengjiangtian.fjt@alibaba-inc.com> Assisted-by: Claude:claude-opus-4-6
|
Hi maintainers, could you please add |
@jfeng18 Hi! Thanks for your contribution! I have already added labels. Please check the comments from the AI reviewers. |
|
Hi @jfeng18, the Format check has failed. Please format the codes via |
Signed-off-by: 水芝 <fengjiangtian.fjt@alibaba-inc.com> Assisted-by: Claude:claude-opus-4-6
|
Hi @wxyucs, thanks for adding the labels! I've addressed all review comments:
All CI checks should be green now. |
Summary
global_mutex_andforce_remove_mutex_acquisition in all three HGraph search paths (KnnSearch, RangeSearch, SearchWithRequest) when the index is immutableimmutable_frombooltostd::atomic<bool>with acquire/release semantics for ARM correctnessSetImmutable()acquireadd_mutex_beforeglobal_mutex_to prevent TOCTOU race withTune()Motivation
After
SetImmutable(), no writer can modify shared state (all mutations guarded byCHECK_IMMUTABLE_INDEX). Yet every search unconditionally acquiredshared_lock(global_mutex_), causing atomic CAS cache-line contention across all search threads on read-only indexes.Test plan
🤖 Generated with Claude Code