Skip to content

refactor: split datacell factory interfaces#2002

Open
LHT129 wants to merge 3 commits into
antgroup:mainfrom
LHT129:opencode/split-datacell-factory-interface
Open

refactor: split datacell factory interfaces#2002
LHT129 wants to merge 3 commits into
antgroup:mainfrom
LHT129:opencode/split-datacell-factory-interface

Conversation

@LHT129
Copy link
Copy Markdown
Collaborator

@LHT129 LHT129 commented May 8, 2026

Summary

  • move graph and flatten instance creation out of the core interface headers into dedicated factory headers and source files
  • update all production and test call sites to use the new graph and flatten factory helpers
  • reduce core header coupling by removing dataset_impl.h from inner_index_interface.h while keeping downstream behavior unchanged

Testing

  • make fmt
  • targeted compilation via compile_commands.json entries for updated datacell, algorithm, and index translation units

Move graph and flatten instance construction out of core interface headers so factory dependencies no longer fan out through widely included abstractions. This keeps datacell creation logic in dedicated factory units while preserving existing behavior and serialization paths.

Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
Copilot AI review requested due to automatic review settings May 8, 2026 06:55
@LHT129 LHT129 added kind/improvement Code improvements (variable/function renaming, refactoring, etc. ) version/1.0 labels May 8, 2026
@LHT129 LHT129 self-assigned this May 8, 2026
@mergify mergify Bot added the module/index label May 8, 2026
@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented May 8, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Require kind label

Wonderful, this rule succeeded.
  • label~=^kind/

🟢 Require version label

Wonderful, this rule succeeded.
  • label~=^version/

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the instantiation of FlattenInterface and GraphInterface by moving the factory logic from the interface classes into dedicated factory files (flatten_factory.cpp and graph_factory.cpp). This involves replacing static MakeInstance methods with standalone MakeFlattenInstance and MakeGraphInstance functions across the codebase. Feedback identifies a potential null pointer dereference in the graph factory and suggests refactoring the IO type selection logic in the flatten factory using a map to improve maintainability.

Comment thread src/datacell/graph_factory.cpp
Comment thread src/datacell/flatten_factory.cpp
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Refactors datacell instance creation by extracting FlattenInterface/GraphInterface construction logic into dedicated factory headers/source files, reducing core header coupling and updating call sites accordingly.

Changes:

  • Introduce MakeFlattenInstance and MakeGraphInstance in new flatten_factory.* and graph_factory.* files.
  • Remove MakeInstance static constructors from flatten_interface.* and graph_interface.*.
  • Update production and test code to include the new factory headers and call the new factory functions.

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/index/index_impl.h Adds dataset_impl.h include after decoupling it from inner_index_interface.h.
src/index/hnsw_test.cpp Switches test call sites to MakeFlattenInstance / MakeGraphInstance.
src/index/hnsw.h Reduces header coupling by dropping logger include.
src/index/hnsw.cpp Uses new factory functions and includes factory headers.
src/index/diskann.cpp Uses MakeFlattenInstance and includes flatten factory headers.
src/impl/pruning_strategy_test.cpp Updates tests to use new factory headers/functions.
src/impl/odescent/odescent_graph_builder_test.cpp Updates tests to use new factory headers/functions.
src/datacell/sparse_vector_datacell_test.cpp Updates tests to use MakeFlattenInstance.
src/datacell/sparse_graph_datacell_test.cpp Updates tests to use MakeGraphInstance and adds factory include.
src/datacell/graph_interface.h Removes static MakeInstance and related header dependencies.
src/datacell/graph_interface.cpp Removes factory implementation from interface TU.
src/datacell/graph_factory.h Adds new public graph factory API declaration.
src/datacell/graph_factory.cpp Implements graph factory logic formerly in GraphInterface::MakeInstance.
src/datacell/graph_datacell_test.cpp Updates tests to use MakeGraphInstance.
src/datacell/flatten_interface.h Removes static MakeInstance and related header dependencies.
src/datacell/flatten_interface.cpp Removes factory implementation from interface TU.
src/datacell/flatten_factory.h Adds new public flatten factory API declaration.
src/datacell/flatten_factory.cpp Implements flatten factory logic formerly in FlattenInterface::MakeInstance.
src/datacell/flatten_datacell_test.cpp Updates tests to use MakeFlattenInstance.
src/datacell/compressed_graph_datacell_test.cpp Updates tests to use MakeGraphInstance.
src/algorithm/warp.cpp Uses MakeFlattenInstance for code storage creation.
src/algorithm/pyramid.h Uses MakeFlattenInstance and includes factory headers.
src/algorithm/ivf.cpp Uses MakeFlattenInstance and includes the flatten factory header.
src/algorithm/inner_index_interface.h Removes dataset_impl.h to reduce coupling.
src/algorithm/hgraph.cpp Uses both new factory functions for codes/graph creation.
src/algorithm/brute_force.cpp Uses MakeFlattenInstance and includes factory header.

Comment on lines +23 to +24
FlattenInterfacePtr
MakeFlattenInstance(const FlattenInterfaceParamPtr& param, const IndexCommonParam& common_param);
Comment on lines +25 to +34
GraphInterfacePtr
MakeGraphInstance(const GraphInterfaceParamPtr& graph_param, const IndexCommonParam& common_param) {
switch (graph_param->graph_storage_type_) {
case GraphStorageTypes::GRAPH_STORAGE_TYPE_SPARSE:
return std::make_shared<SparseGraphDataCell>(graph_param, common_param);
case GraphStorageTypes::GRAPH_STORAGE_TYPE_VALUE_COMPRESSED:
return std::make_shared<CompressedGraphDataCell>(graph_param, common_param);
case GraphStorageTypes::GRAPH_STORAGE_TYPE_VALUE_FLAT:
auto io_string = std::dynamic_pointer_cast<GraphDataCellParameter>(graph_param)
->io_parameter_->GetTypeName();

#include <fmt/format.h>

#include "graph_factory.h"
@@ -13,204 +13,3 @@
// limitations under the License.

LHT129 added 2 commits May 8, 2026 15:41
Add the concrete parameter and common parameter headers at the translation units that still relied on transitive includes after the datacell factory split. This keeps the refactor intact while restoring CI builds across TSAN, CircleCI, and test targets.

Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
Run the repository formatter after adding the direct include dependencies introduced by the factory split follow-up. This keeps the CI-only include fixes aligned with the project include ordering rules.

Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
Copilot AI review requested due to automatic review settings May 8, 2026 08:31
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 34 out of 34 changed files in this pull request and generated 1 comment.

Comment thread src/index/hnsw.cpp
Comment on lines 29 to +33
#include "datacell/flatten_datacell.h"
#include "datacell/flatten_datacell_parameter.h"
#include "datacell/flatten_factory.h"
#include "datacell/graph_datacell_parameter.h"
#include "datacell/graph_factory.h"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/improvement Code improvements (variable/function renaming, refactoring, etc. ) module/index size/XL version/1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants