
refactor(hgraph): create hgraph directory and split cpp files #2030

Open: LHT129 wants to merge 13 commits into main from opencode/hgraph-refactor-helper

Conversation

Collaborator

@LHT129 LHT129 commented May 11, 2026

Summary

This PR is a refactor-only change for HGraph file organization. It does not intentionally change HGraph runtime logic or algorithm behavior.

Stage 1: Move HGraph files into a dedicated directory

  • Create src/algorithm/hgraph/
  • Move hgraph.h/cpp, hgraph_parameter.*, and hgraph_parameter_test.cpp into the new directory
  • Add src/algorithm/hgraph/CMakeLists.txt
  • Update include paths and parent CMake wiring for the new layout

Stage 2: Split hgraph.cpp into focused implementation files

  • hgraph.cpp: remaining core methods
  • hgraph_build.cpp: build-related methods
  • hgraph_modify.cpp: modify-related methods
  • hgraph_search.cpp: search-related methods
  • hgraph_serialize.cpp: serialize / deserialize related methods
  • hgraph_param_mapping.cpp: parameter mapping related methods
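
The split works because C++ lets the member functions of one class, declared in a single header, be defined across any number of translation units. The following is a minimal sketch of that layout, condensed into one listing with comments marking where each definition would live. `MiniGraph` and its members are hypothetical stand-ins for illustration, not the real HGraph API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// --- mini_graph.h (hypothetical): the single header keeps every declaration ---
class MiniGraph {
public:
    void Add(int64_t label);             // defined in mini_graph_build.cpp
    bool Remove(int64_t label);          // defined in mini_graph_modify.cpp
    bool Contains(int64_t label) const;  // defined in mini_graph_search.cpp
    std::size_t Size() const;            // stays in mini_graph.cpp (core)

private:
    std::vector<int64_t> labels_;
};

// --- mini_graph.cpp: remaining core methods ---
std::size_t MiniGraph::Size() const {
    return labels_.size();
}

// --- mini_graph_build.cpp: build-related methods ---
void MiniGraph::Add(int64_t label) {
    assert(!Contains(label));  // labels are unique in this sketch
    labels_.push_back(label);
}

// --- mini_graph_modify.cpp: modify-related methods ---
bool MiniGraph::Remove(int64_t label) {
    auto it = std::find(labels_.begin(), labels_.end(), label);
    if (it == labels_.end()) {
        return false;
    }
    labels_.erase(it);
    return true;
}

// --- mini_graph_search.cpp: search-related methods ---
bool MiniGraph::Contains(int64_t label) const {
    return std::find(labels_.begin(), labels_.end(), label) != labels_.end();
}
```

Callers keep including the one header, so moving definitions between .cpp files should not affect them as long as every file is added to the same library target.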

Build-system and layout adjustments included in this refactor

  • Add the new HGraph object-library CMake target under src/algorithm/hgraph/
  • Remove duplicate test target wiring from the subdirectory
  • Update affected include paths to match the new directory structure

Result

  • src/algorithm/hgraph.cpp is split into smaller implementation units under src/algorithm/hgraph/
  • src/algorithm/hgraph/hgraph.cpp is now 689 lines, down from 2694 lines in the original src/algorithm/hgraph.cpp
  • A single public header, src/algorithm/hgraph/hgraph.h, is still kept for HGraph declarations
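
The reduction figures quoted in this PR can be checked with a few lines of arithmetic. This is a small sketch; `reduction_percent` is a hypothetical helper name, and it rounds to the nearest whole percent.

```cpp
#include <cassert>
#include <cmath>

// Percentage by which a file shrank, rounded to the nearest integer.
long reduction_percent(long before, long after) {
    assert(before > 0 && after >= 0 && after <= before);
    const double removed = static_cast<double>(before - after);
    return std::lround(100.0 * removed / static_cast<double>(before));
}
```

For the overall change, `reduction_percent(2694, 689)` gives 74, since (2694 - 689) / 2694 is about 0.744.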

Testing

  • Build verified successfully with ninja
  • Format check passing

File Structure

src/algorithm/hgraph/
├── CMakeLists.txt
├── hgraph.h
├── hgraph.cpp
├── hgraph_build.cpp
├── hgraph_modify.cpp
├── hgraph_search.cpp
├── hgraph_serialize.cpp
├── hgraph_param_mapping.cpp
├── hgraph_parameter.h
├── hgraph_parameter.cpp
└── hgraph_parameter_test.cpp

Copilot AI review requested due to automatic review settings May 11, 2026 07:30
@mergify mergify Bot added the module/index label May 11, 2026
Contributor

mergify Bot commented May 11, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Require kind label

Wonderful, this rule succeeded.
  • label~=^kind/

🟢 Require version label

Wonderful, this rule succeeded.
  • label~=^version/

@LHT129 LHT129 added the kind/improvement (Code improvements: variable/function renaming, refactoring, etc.) and version/1.0 labels May 11, 2026
@LHT129 LHT129 force-pushed the opencode/hgraph-refactor-helper branch from 3fc0a8c to 20e6078 May 11, 2026 07:34
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the HGraph implementation by extracting its core logic into specialized classes: HGraphBuilder, HGraphModifier, and HGraphSerializer. This change involves moving serialization, building, and modification logic into separate files and updating include paths across the codebase. The review identified several critical issues, including a potential heap buffer overflow in HGraphModifier::UpdateVector, incorrect handling of non-float data types in both the builder and modifier, an unused variable, and a missing ShrinkToFit call for the label table.

Comment thread src/algorithm/hgraph/hgraph_modifier.cpp Outdated
Comment thread src/algorithm/hgraph/hgraph_builder.cpp Outdated
Comment thread src/algorithm/hgraph/hgraph_builder.cpp Outdated
Comment thread src/algorithm/hgraph/hgraph_modifier.cpp Outdated
Comment thread src/algorithm/hgraph/hgraph_modifier.cpp Outdated
Comment thread src/algorithm/hgraph/hgraph_modifier.cpp Outdated
Contributor

Copilot AI left a comment


Pull request overview

This PR refactors the HGraph implementation by moving it into a dedicated src/algorithm/hgraph/ directory, extracting serialization into a helper (HGraphSerializer), and introducing initial helper frameworks for building and modifying HGraph while updating include paths and build wiring accordingly.

Changes:

  • Updated include paths across the codebase to reflect the new algorithm/hgraph/ layout.
  • Extracted HGraph serialization/deserialization + memory-usage detail logic into HGraphSerializer and delegated HGraph::{Serialize,Deserialize,GetMemoryUsageDetail} to it.
  • Added new helper class scaffolding (HGraphBuilder, HGraphModifier) and hooked the new hgraph object library into CMake.
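
The delegation described above, where the index class forwards to a friend helper that owns the serialization logic, can be sketched roughly as follows. `MiniIndex`, `MiniIndexSerializer`, and the text-based format are hypothetical names and choices made for illustration. They are not the actual HGraph or HGraphSerializer interfaces.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

class MiniIndex;

// Hypothetical helper that owns the (de)serialization logic.
class MiniIndexSerializer {
public:
    static void Serialize(const MiniIndex& index, std::ostream& out);
    static void Deserialize(MiniIndex& index, std::istream& in);
};

class MiniIndex {
public:
    void SetDim(int64_t dim) { dim_ = dim; }
    int64_t GetDim() const { return dim_; }
    void Add() { ++count_; }
    int64_t GetCount() const { return count_; }

    // The public API stays on the index; the bodies only delegate.
    void Serialize(std::ostream& out) const { MiniIndexSerializer::Serialize(*this, out); }
    void Deserialize(std::istream& in) { MiniIndexSerializer::Deserialize(*this, in); }

private:
    friend class MiniIndexSerializer;  // grants the helper access to private state
    int64_t dim_ = 0;
    int64_t count_ = 0;
};

void MiniIndexSerializer::Serialize(const MiniIndex& index, std::ostream& out) {
    out << index.dim_ << ' ' << index.count_;
}

void MiniIndexSerializer::Deserialize(MiniIndex& index, std::istream& in) {
    in >> index.dim_ >> index.count_;
    assert(!in.fail());  // malformed input is treated as a bug in this sketch
}
```

A round trip through a `std::stringstream` restores both fields. The trade-off of this pattern is that the helper is coupled to the private layout of the class through `friend`.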

Reviewed changes

Copilot reviewed 20 out of 21 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/index/index_impl_test.cpp | Update HGraph include path to new directory layout. |
| src/factory/index_creators.cpp | Update HGraph include path for index factory wiring. |
| src/analyzer/hgraph_analyzer.h | Update HGraph include path for analyzer. |
| src/analyzer/analyzer.h | Update HGraph include path used by analyzer interfaces. |
| src/algorithm/pyramid_zparameters.h | Update include path for HGraphParameter header. |
| src/algorithm/ivf_partition/ivf_nearest_partition.cpp | Update HGraph include path from ivf_partition code. |
| src/algorithm/inner_index_interface.cpp | Update include path for HGraph from algorithm root compilation unit. |
| src/algorithm/inner_index_interface_test.cpp | Update include path for HGraph in tests. |
| src/algorithm/hgraph/hgraph.h | Add friend hooks for helpers; remove in-class serialization method declarations; adjust InnerIndexInterface include. |
| src/algorithm/hgraph/hgraph.cpp | Delegate serialization/deserialization/memory-usage detail to HGraphSerializer. |
| src/algorithm/hgraph/hgraph_serializer.h | New helper API for HGraph serialization logic. |
| src/algorithm/hgraph/hgraph_serializer.cpp | New extracted implementation for HGraph serialization logic. |
| src/algorithm/hgraph/hgraph_parameter.h | Adjust includes to match new folder layout. |
| src/algorithm/hgraph/hgraph_parameter.cpp | New compilation unit for HGraph parameter implementation (moved/split). |
| src/algorithm/hgraph/hgraph_parameter_test.cpp | Update include path to local hgraph.h. |
| src/algorithm/hgraph/hgraph_modifier.h | New helper API for modification operations (remove/update/recover). |
| src/algorithm/hgraph/hgraph_modifier.cpp | New helper implementation for modification operations. |
| src/algorithm/hgraph/hgraph_builder.h | New helper API for training/build/add/resize/optimize operations. |
| src/algorithm/hgraph/hgraph_builder.cpp | New helper implementation for builder operations. |
| src/algorithm/hgraph/CMakeLists.txt | New object library target for hgraph submodule sources. |
| src/algorithm/CMakeLists.txt | Add hgraph/ subdirectory + include hgraph object library in algorithm libs list. |

Comment thread src/algorithm/hgraph/hgraph_builder.h Outdated
Comment thread src/algorithm/hgraph/hgraph_modifier.h Outdated
Comment thread src/algorithm/hgraph/hgraph_builder.cpp Outdated
- Create src/algorithm/hgraph/ directory
- Move hgraph.h/cpp, hgraph_parameter.* to hgraph/
- Create hgraph/CMakeLists.txt with OBJECT library
- Update parent CMakeLists.txt: add_subdirectory(hgraph), add hgraph to ALGORITHM_LIBS
- Fix all include paths:
  - External: "hgraph.h" -> "hgraph/hgraph.h", "algorithm/hgraph.h" -> "algorithm/hgraph/hgraph.h"
  - Internal: "inner_index_interface.h" -> "../inner_index_interface.h", etc.

Stage 1 complete: directory creation and file migration.

Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
@LHT129 LHT129 self-assigned this May 11, 2026
@LHT129 LHT129 force-pushed the opencode/hgraph-refactor-helper branch from 20e6078 to 7f6b792 May 11, 2026 08:19
@LHT129 LHT129 changed the title from "refactor(hgraph): file split with helper pattern" to "refactor(hgraph): create hgraph directory and migrate files" May 11, 2026
- hgraph.cpp (1982 lines): core methods (constructor, search, serialize, etc.)
- hgraph_build.cpp (405 lines): Train, Build, Add, resize, elp_optimize
- hgraph_modify.cpp (374 lines): Remove, UpdateAttribute, UpdateVector, etc.

Stage 2 complete: cpp file split (single header hgraph.h kept unchanged)

Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
Copilot AI review requested due to automatic review settings May 11, 2026 08:34
@LHT129 LHT129 changed the title from "refactor(hgraph): create hgraph directory and migrate files" to "refactor(hgraph): create hgraph directory and split cpp files" May 11, 2026
Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 17 out of 18 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/algorithm/hgraph/hgraph_modify.cpp:367

  • codes is assigned but never used. This is dead code and can also trigger unused-variable warnings (and potentially break builds when ENABLE_WERROR is enabled). Remove the variable or use it for the update call if it’s intended to select which codes get updated.

Comment thread src/algorithm/hgraph/CMakeLists.txt Outdated
Comment thread src/algorithm/hgraph/hgraph_build.cpp
@LHT129 LHT129 force-pushed the opencode/hgraph-refactor-helper branch from 382a17c to bf75adf May 11, 2026 09:12
- Extract KnnSearch (3 overloads) including IteratorContext version
- Extract search_one_graph (2 template methods)
- Extract RangeSearch
- Extract SearchWithRequest
- Add necessary includes and helper function make_empty_dataset_with_stats
- hgraph.cpp reduced from 1982 to 1459 lines (26% reduction)

Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
Copilot AI review requested due to automatic review settings May 11, 2026 09:43
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 17 out of 18 changed files in this pull request and generated 8 comments.

Comment thread src/algorithm/hgraph/hgraph_build.cpp
Comment thread src/algorithm/hgraph/hgraph_build.cpp
Comment thread src/algorithm/hgraph/hgraph_build.cpp
Comment thread src/algorithm/hgraph/hgraph_modify.cpp Outdated
Comment thread src/algorithm/hgraph/hgraph_modify.cpp
Comment thread src/algorithm/hgraph/hgraph_modify.cpp
Comment thread src/algorithm/hgraph/hgraph_modify.cpp Outdated
Comment thread src/algorithm/hgraph/CMakeLists.txt Outdated
LHT129 added 2 commits May 11, 2026 18:25
- Extract serialize_basic_info_v0_14, deserialize_basic_info_v0_14
- Extract serialize_basic_info, deserialize_basic_info (with macros)
- Extract serialize_label_info, deserialize_label_info
- Extract Serialize, Deserialize, GetMemoryUsageDetail
- Include necessary serialization macros (TO_JSON_BASE64, FROM_JSON, FROM_JSON_BASE64)
- Add required includes: storage/serialization.h, storage/stream_reader.h
- hgraph.cpp reduced from 1453 to 1124 lines (23% reduction)

Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
…ng.cpp

- Extract map_hgraph_param (405 lines) - large external-to-internal param mapping
- Extract CheckAndMappingExternalParam (29 lines) - validate and map external params
- Includes static ConstParamMap mapping table (~200 lines)
- Includes JSON template string for default parameters (~100 lines)
- hgraph.cpp reduced from 1124 to 689 lines (39% reduction, total 74% from original)

Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
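
For readers unfamiliar with the pattern, an external-to-internal parameter mapping driven by a static table can be sketched as below. The keys, the `ParamMap` alias, and `map_external_params` are all hypothetical. The real `map_hgraph_param` and `ConstParamMap` are not shown in this thread and may be structured differently.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <unordered_map>

using ParamMap = std::map<std::string, std::string>;

// Hypothetical static table: external (user-facing) key -> internal key.
const std::unordered_map<std::string, std::string>& external_to_internal() {
    static const std::unordered_map<std::string, std::string> table = {
        {"max_degree", "graph.max_degree"},
        {"ef_construction", "build.ef_construction"},
    };
    return table;
}

// Validate external keys and translate them to their internal names.
ParamMap map_external_params(const ParamMap& external) {
    ParamMap internal;
    for (const auto& [key, value] : external) {
        const auto it = external_to_internal().find(key);
        if (it == external_to_internal().end()) {
            throw std::invalid_argument("unknown parameter: " + key);
        }
        internal[it->second] = value;
    }
    // A key-to-key translation can never produce more entries than it was given.
    assert(internal.size() <= external.size());
    return internal;
}
```

Keeping the table in its own translation unit, as this commit does for the real mapping, means edits to the table should only recompile that one file.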
Copilot AI review requested due to automatic review settings May 12, 2026 03:49
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 20 out of 21 changed files in this pull request and generated 6 comments.

Comment thread src/algorithm/hgraph/hgraph_search.cpp
Comment thread src/algorithm/hgraph/hgraph_param_mapping.cpp
Comment thread src/algorithm/hgraph/hgraph_build.cpp
Comment thread src/algorithm/hgraph/hgraph_build.cpp
Comment thread src/algorithm/hgraph/hgraph_build.cpp
Comment thread src/algorithm/inner_index_interface_test.cpp Outdated
Change include from algorithm/hgraph.h to algorithm/hgraph/hgraph.h
after directory migration in previous commit.

Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
@mergify mergify Bot added the module/tools label May 12, 2026
- Add missing semicolon after CHECK_ARGUMENT in hgraph_search.cpp
- Fix JSON placeholder: HGRAPH_USE_ENV_OPTIMIZER -> HGRAPH_USE_ELP_OPTIMIZER_KEY
- Remove unused codes variable in hgraph_modify.cpp
- Fix unused exception variable: use const reference without name
- Fix typo in comments: ture -> true
- Add [[maybe_unused]] to unused mode parameter
- Add missing <fmt/format.h> include
- Fix type mismatch: LabelType -> int64_t for local index in inner_ids

Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
Copilot AI review requested due to automatic review settings May 12, 2026 09:07
Collaborator Author

LHT129 commented May 12, 2026

Thank you for the detailed review! I have addressed most of the issues in commit 99831b0:

Fixed Issues:

  1. ✅ Added missing semicolon after CHECK_ARGUMENT in hgraph_search.cpp:284
  2. ✅ Fixed JSON placeholder: HGRAPH_USE_ENV_OPTIMIZER → HGRAPH_USE_ELP_OPTIMIZER_KEY
  3. ✅ Removed unused codes variable in hgraph_modify.cpp:353
  4. ✅ Fixed unused exception variable: changed to catch const reference without name
  5. ✅ Fixed typo in comments: ture → true
  6. ✅ Added [[maybe_unused]] to unused mode parameter
  7. ✅ Added missing <fmt/format.h> include
  8. ✅ Fixed type mismatch: LabelType → int64_t for local index in inner_ids
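
Items 4 and 6 use two standard C++ idioms for silencing unused-variable warnings without changing behavior. The following is a generic illustration. `parse_count` and `Mode` are hypothetical and are not taken from the HGraph sources.

```cpp
#include <stdexcept>
#include <string>

enum class Mode { kFast, kAccurate };

// The parameter is kept for interface compatibility but is not read yet.
// [[maybe_unused]] (C++17) suppresses the unused-parameter warning
// without changing the signature.
int parse_count(const std::string& text, [[maybe_unused]] Mode mode) {
    try {
        return std::stoi(text);
    } catch (const std::exception&) {
        // The handler is unnamed because the exception object is not inspected.
        return -1;
    }
}
```

Both forms matter mostly when warnings are promoted to errors, which is the situation the review comments describe for builds with ENABLE_WERROR enabled.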

Clarifications:

  • The extra_infos variable at hgraph_build.cpp:66 is actually used at line 206, not unused
  • The Vector<int8_t> cast issue is existing code requiring deeper refactor beyond this PR scope
  • All include paths now use algorithm/hgraph/hgraph.h format consistently
  • file(GLOB) pattern matches existing practice in codebase (hnswlib/CMakeLists.txt)

Please review the fixes. Ready to resolve addressed comments.

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (2)

src/algorithm/hgraph/hgraph.h:22

  • This header uses a relative include path ("../inner_index_interface.h"). Other algorithm headers include this as "inner_index_interface.h" (e.g., src/algorithm/ivf.h:25, src/algorithm/pyramid.h:29). Using ".." paths makes includes more brittle when include directories change; consider switching to the established include style used elsewhere in src/algorithm/.

src/algorithm/hgraph/hgraph_parameter.h:21

  • This header uses relative includes ("../index_search_parameter.h", "../inner_index_parameter.h"). The rest of src/algorithm/ typically includes these without ".." from configured include dirs. Consider aligning with the existing include style to avoid brittle relative paths.

Comment thread src/algorithm/hgraph/hgraph_modify.cpp Outdated
Comment thread src/algorithm/hgraph/hgraph_serialize.cpp
Comment thread src/algorithm/hgraph/hgraph_serialize.cpp
Comment thread src/algorithm/hgraph/hgraph_param_mapping.cpp
The parent CMakeLists.txt already collects all *_test.cpp into algorithm_test,
so hgraph_test causes duplicate compilation of test files.

Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
Collaborator Author

LHT129 commented May 12, 2026

Additional clarifications for remaining review comments:

Already Fixed in latest commit (0d8dd12):

  • hgraph_test library duplication - Removed hgraph_test definition from CMakeLists.txt to prevent duplicate compilation

False Positive (code already exists):

  • label_table_->ShrinkToFit - Already called at hgraph_modify.cpp:190, not missing

Out of Scope (existing code issues):

The following issues exist in the original hgraph.cpp on main branch and are not introduced by this refactoring PR:

  • heap buffer overflow (Vector<int8_t> cast to float*) - Existing code requiring deeper refactor
  • GetFloat32Vectors for non-float types - Existing issue in build_by_odescent
  • Pointer arithmetic on vectors - Existing issue
  • Invalid pointer cast for CalcDistanceById - Existing issue
  • Loop index type (signed/unsigned mismatch) - Existing code pattern
  • extra_infos unused in build_by_odescent - Exists on main branch too
  • file(GLOB) pattern - Consistent with other CMakeLists in repo (hnswlib)
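
As background for the first item, the general hazard of viewing an `int8_t` buffer through a `float*` can be illustrated as follows. This is a generic sketch with a hypothetical `to_float_vector` helper. The actual code path in hgraph.cpp is not shown in this thread, so the fix there may look different.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reading n floats through reinterpret_cast<const float*>(codes.data())
// would touch n * sizeof(float) bytes of a buffer that only holds n bytes,
// and it would also misinterpret the bit patterns.
// Converting element by element stays inside the buffer.
std::vector<float> to_float_vector(const std::vector<int8_t>& codes) {
    std::vector<float> out;
    out.reserve(codes.size());
    for (const int8_t code : codes) {
        out.push_back(static_cast<float>(code));
    }
    return out;
}
```

The conversion costs an extra allocation, which is one reason a proper fix may need the deeper refactor mentioned above instead of a local patch.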

Files no longer exist:

  • hgraph_builder.h, hgraph_modifier.h - These helper class headers were removed in earlier refactoring; we use direct method splitting instead

This PR focuses on file organization and splitting. The existing code issues should be addressed in separate PRs to avoid scope creep.

Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
Copilot AI review requested due to automatic review settings May 13, 2026 03:55
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 8 comments.

Comment thread src/algorithm/hgraph/hgraph_search.cpp
Comment thread src/algorithm/hgraph/hgraph_serialize.cpp
Comment thread src/algorithm/hgraph/hgraph_serialize.cpp
Comment thread src/algorithm/hgraph/hgraph_serialize.cpp Outdated
Comment thread src/algorithm/hgraph/hgraph_param_mapping.cpp
Comment thread src/algorithm/hgraph/hgraph_param_mapping.cpp
Comment thread src/algorithm/hgraph/hgraph_search.cpp
Comment thread src/algorithm/hgraph/hgraph_search.cpp Outdated
LHT129 added 2 commits May 13, 2026 16:16
Keep this PR scoped to file split refactoring only by reverting follow-up changes that altered behavior or semantics in build, modify, param mapping, and serialization paths.

Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
Copilot AI review requested due to automatic review settings May 14, 2026 07:50

Labels

kind/improvement (Code improvements: variable/function renaming, refactoring, etc.), module/index, module/tools, size/XXL, version/1.0


2 participants