refactor(hgraph): create hgraph directory and split cpp files#2030
refactor(hgraph): create hgraph directory and split cpp files#2030LHT129 wants to merge 13 commits into
Conversation
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🟢 Require kind labelWonderful, this rule succeeded.
🟢 Require version labelWonderful, this rule succeeded.
|
3fc0a8c to
20e6078
Compare
There was a problem hiding this comment.
Code Review
This pull request refactors the HGraph implementation by extracting its core logic into specialized classes: HGraphBuilder, HGraphModifier, and HGraphSerializer. This change involves moving serialization, building, and modification logic into separate files and updating include paths across the codebase. The review identified several critical issues, including a potential heap buffer overflow in HGraphModifier::UpdateVector, incorrect handling of non-float data types in both the builder and modifier, an unused variable, and a missing ShrinkToFit call for the label table.
There was a problem hiding this comment.
Pull request overview
This PR refactors the HGraph implementation by moving it into a dedicated src/algorithm/hgraph/ directory, extracting serialization into a helper (HGraphSerializer), and introducing initial helper frameworks for building and modifying HGraph while updating include paths and build wiring accordingly.
Changes:
- Updated include paths across the codebase to reflect the new
algorithm/hgraph/layout. - Extracted HGraph serialization/deserialization + memory-usage detail logic into
HGraphSerializerand delegatedHGraph::{Serialize,Deserialize,GetMemoryUsageDetail}to it. - Added new helper class scaffolding (
HGraphBuilder,HGraphModifier) and hooked the new hgraph object library into CMake.
Reviewed changes
Copilot reviewed 20 out of 21 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/index/index_impl_test.cpp | Update HGraph include path to new directory layout. |
| src/factory/index_creators.cpp | Update HGraph include path for index factory wiring. |
| src/analyzer/hgraph_analyzer.h | Update HGraph include path for analyzer. |
| src/analyzer/analyzer.h | Update HGraph include path used by analyzer interfaces. |
| src/algorithm/pyramid_zparameters.h | Update include path for HGraphParameter header. |
| src/algorithm/ivf_partition/ivf_nearest_partition.cpp | Update HGraph include path from ivf_partition code. |
| src/algorithm/inner_index_interface.cpp | Update include path for HGraph from algorithm root compilation unit. |
| src/algorithm/inner_index_interface_test.cpp | Update include path for HGraph in tests. |
| src/algorithm/hgraph/hgraph.h | Add friend hooks for helpers; remove in-class serialization method declarations; adjust InnerIndexInterface include. |
| src/algorithm/hgraph/hgraph.cpp | Delegate serialization/deserialization/memory-usage detail to HGraphSerializer. |
| src/algorithm/hgraph/hgraph_serializer.h | New helper API for HGraph serialization logic. |
| src/algorithm/hgraph/hgraph_serializer.cpp | New extracted implementation for HGraph serialization logic. |
| src/algorithm/hgraph/hgraph_parameter.h | Adjust includes to match new folder layout. |
| src/algorithm/hgraph/hgraph_parameter.cpp | New compilation unit for HGraph parameter implementation (moved/split). |
| src/algorithm/hgraph/hgraph_parameter_test.cpp | Update include path to local hgraph.h. |
| src/algorithm/hgraph/hgraph_modifier.h | New helper API for modification operations (remove/update/recover). |
| src/algorithm/hgraph/hgraph_modifier.cpp | New helper implementation for modification operations. |
| src/algorithm/hgraph/hgraph_builder.h | New helper API for training/build/add/resize/optimize operations. |
| src/algorithm/hgraph/hgraph_builder.cpp | New helper implementation for builder operations. |
| src/algorithm/hgraph/CMakeLists.txt | New object library target for hgraph submodule sources. |
| src/algorithm/CMakeLists.txt | Add hgraph/ subdirectory + include hgraph object library in algorithm libs list. |
- Create src/algorithm/hgraph/ directory - Move hgraph.h/cpp, hgraph_parameter.* to hgraph/ - Create hgraph/CMakeLists.txt with OBJECT library - Update parent CMakeLists.txt: add_subdirectory(hgraph), add hgraph to ALGORITHM_LIBS - Fix all include paths: - External: "hgraph.h" -> "hgraph/hgraph.h", "algorithm/hgraph.h" -> "algorithm/hgraph/hgraph.h" - Internal: "inner_index_interface.h" -> "../inner_index_interface.h", etc. Stage 1 complete: directory creation and file migration. Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
20e6078 to
7f6b792
Compare
- hgraph.cpp (1982 lines): core methods (constructor, search, serialize, etc.) - hgraph_build.cpp (405 lines): Train, Build, Add, resize, elp_optimize - hgraph_modify.cpp (374 lines): Remove, UpdateAttribute, UpdateVector, etc. Stage 2 complete: cpp file split (single header hgraph.h kept unchanged) Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 17 out of 18 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/algorithm/hgraph/hgraph_modify.cpp:367
codesis assigned but never used. This is dead code and can also trigger unused-variable warnings (and potentially break builds whenENABLE_WERRORis enabled). Remove the variable or use it for the update call if it’s intended to select which codes get updated.
382a17c to
bf75adf
Compare
- Extract KnnSearch (3 overloads) including IteratorContext version - Extract search_one_graph (2 template methods) - Extract RangeSearch - Extract SearchWithRequest - Add necessary includes and helper function make_empty_dataset_with_stats - hgraph.cpp reduced from 1982 to 1459 lines (26% reduction) Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
- Extract serialize_basic_info_v0_14, deserialize_basic_info_v0_14 - Extract serialize_basic_info, deserialize_basic_info (with macros) - Extract serialize_label_info, deserialize_label_info - Extract Serialize, Deserialize, GetMemoryUsageDetail - Include necessary serialization macros (TO_JSON_BASE64, FROM_JSON, FROM_JSON_BASE64) - Add required includes: storage/serialization.h, storage/stream_reader.h - hgraph.cpp reduced from 1453 to 1124 lines (23% reduction) Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
…ng.cpp - Extract map_hgraph_param (405 lines) - large external-to-internal param mapping - Extract CheckAndMappingExternalParam (29 lines) - validate and map external params - Includes static ConstParamMap mapping table (~200 lines) - Includes JSON template string for default parameters (~100 lines) - hgraph.cpp reduced from 1124 to 689 lines (39% reduction, total 74% from original) Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
Change include from algorithm/hgraph.h to algorithm/hgraph/hgraph.h after directory migration in previous commit. Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
- Add missing semicolon after CHECK_ARGUMENT in hgraph_search.cpp - Fix JSON placeholder: HGRAPH_USE_ENV_OPTIMIZER -> HGRAPH_USE_ELP_OPTIMIZER_KEY - Remove unused codes variable in hgraph_modify.cpp - Fix unused exception variable: use const reference without name - Fix typo in comments: ture -> true - Add [[maybe_unused]] to unused mode parameter - Add missing <fmt/format.h> include - Fix type mismatch: LabelType -> int64_t for local index in inner_ids Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
|
Thank you for the detailed review! I have addressed most of the issues in commit 99831b0: Fixed Issues:
Clarifications:
Please review the fixes. Ready to resolve addressed comments. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 21 out of 22 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (2)
src/algorithm/hgraph/hgraph.h:22
- This header uses a relative include path ("../inner_index_interface.h"). Other algorithm headers include this as "inner_index_interface.h" (e.g., src/algorithm/ivf.h:25, src/algorithm/pyramid.h:29). Using ".." paths makes includes more brittle when include directories change; consider switching to the established include style used elsewhere in src/algorithm/.
src/algorithm/hgraph/hgraph_parameter.h:21 - This header uses relative includes ("../index_search_parameter.h", "../inner_index_parameter.h"). The rest of src/algorithm/ typically includes these without ".." from configured include dirs. Consider aligning with the existing include style to avoid brittle relative paths.
The parent CMakeLists.txt already collects all *_test.cpp into algorithm_test, so hgraph_test causes duplicate compilation of test files. Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
|
Additional clarifications for remaining review comments: Already Fixed in latest commit (0d8dd12):
False Positive (code already exists):
Out of Scope (existing code issues):The following issues exist in the original
Files no longer exist:
This PR focuses on file organization and splitting. The existing code issues should be addressed in separate PRs to avoid scope creep. |
Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
Keep this PR scoped to file split refactoring only by reverting follow-up changes that altered behavior or semantics in build, modify, param mapping, and serialization paths. Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
Summary
This PR is a refactor-only change for HGraph file organization. It does not intentionally change HGraph runtime logic or algorithm behavior.
Stage 1: Move HGraph files into a dedicated directory
src/algorithm/hgraph/hgraph.h/cpp,hgraph_parameter.*, andhgraph_parameter_test.cppinto the new directorysrc/algorithm/hgraph/CMakeLists.txtStage 2: Split
hgraph.cppinto focused implementation fileshgraph.cpp: remaining core methodshgraph_build.cpp: build-related methodshgraph_modify.cpp: modify-related methodshgraph_search.cpp: search-related methodshgraph_serialize.cpp: serialize / deserialize related methodshgraph_param_mapping.cpp: parameter mapping related methodsBuild-system and layout adjustments included in this refactor
src/algorithm/hgraph/Result
src/algorithm/hgraph.cppis split into smaller implementation units undersrc/algorithm/hgraph/src/algorithm/hgraph/hgraph.cppis reduced to 689 lines from the original 2694 linessrc/algorithm/hgraph/hgraph.h, is still kept for HGraph declarationsTesting
ninjaFile Structure