fix: terminate label_propagation on bipartite / weight-symmetric graphs #1456
Open
kaihirota wants to merge 1 commit into
The synchronous label propagation loop in
`graphiti_core.utils.maintenance.community_operations.label_propagation`
(and its duplicate in `graphiti_core.driver.operations.graph_utils`)
reads labels from a snapshot and writes to a parallel map, then swaps.
On bipartite or weight-symmetric graphs this is mathematically prone to
oscillation: two label assignments can flip-flop between iterations
because no node ever sees a neighbour's updated label within the same
pass. The loop has no iteration cap, so `client.build_communities()`
never returns on certain inputs.
Reproduction: a 5-node graph X-Y(2), X-Z(2), Y-A(1), Z-B(1) — a shape
that arises naturally from `add_episode` over rich, dated prose with a
hub entity and competing role claims — drives the synchronous LPA into
an indefinite oscillation. py-spy stack of a hung run shows the main
thread spending 100% CPU in `label_propagation` for over an hour with
no forward progress.
Fix:
- Switch to **asynchronous** LPA: read and write to `community_map` in
place so later nodes within the same iteration see earlier nodes'
fresh labels. Async LPA still has no convergence proof on adversarial
inputs, but it converges in practice on graphs that arise from
realistic `add_episode` corpora.
- Add a `max_iterations=100` keyword arg as the hard termination
guarantee. When the cap is reached, log a warning so callers can see
that the returned clustering may not be a fixed point.
- Replace the asymmetric tiebreak with a two-rule deterministic
procedure: if the current community is among the candidates of
highest total weight, keep it (self-stickiness — preserves the
previous tiebreak's intent of avoiding gratuitous label churn on
stable graphs); otherwise pick the smallest community ID among
highest-weight candidates. Output is reproducible across runs.
The new keyword arg has a default, so all existing call sites
(the four database drivers under
`graphiti_core/driver/{neo4j,falkordb,kuzu,neptune}/operations/graph_ops.py`
and the legacy fallback in `community_operations.get_community_clusters`)
continue to work unchanged.
Tests at `tests/utils/maintenance/test_community_operations.py` cover
the oscillating regression case (asserts termination + node coverage),
output invariants (every input UUID appears exactly once), singleton
nodes, an isolated node alongside a connected component, complete
graph collapse, two-disjoint-component preservation, and run-to-run
determinism. Each test is parametrized over both copies of
`label_propagation` so any future divergence between them is caught.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This was referenced Apr 30, 2026
Summary
The synchronous label propagation loop in
`graphiti_core.utils.maintenance.community_operations.label_propagation` (and its
duplicate in `graphiti_core.driver.operations.graph_utils`) reads labels from a
snapshot and writes to a parallel map, then swaps. On bipartite or
weight-symmetric graphs this is mathematically prone to oscillation: two label
assignments can flip-flop between iterations because no node ever sees a
neighbour's updated label within the same pass. The loop has no iteration cap,
so `client.build_communities()` never returns on certain inputs.
This PR closes four separate reports of the same root cause:
`edge_count=2` minimal pure-Python repro)
Reproduction: a 5-node weighted undirected graph, X-Y(2), X-Z(2), Y-A(1),
Z-B(1), a shape that arises naturally from `add_episode` over rich, dated prose
with a hub entity and competing role claims, drives the synchronous LPA into an
indefinite oscillation. A py-spy stack of a hung run shows the main thread
spending 100% CPU in `label_propagation` for over an hour with no forward
progress.
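The flip-flop can be illustrated outside the library with a small standalone sketch. This is not the graphiti implementation: the adjacency-dict representation, the names `synchronous_step` and `find_cycle`, and the largest-label tiebreak are all assumptions made for illustration. It applies a synchronous update (read from a snapshot, write to a new map) to the 5-node graph X-Y(2), X-Z(2), Y-A(1), Z-B(1) and reports when a previously seen labelling recurs.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical representation: node -> {neighbour: edge weight}.
REPRO: dict[str, dict[str, int]] = {
    "X": {"Y": 2, "Z": 2},
    "Y": {"X": 2, "A": 1},
    "Z": {"X": 2, "B": 1},
    "A": {"Y": 1},
    "B": {"Z": 1},
}


def synchronous_step(graph: dict[str, dict[str, int]], labels: dict[str, int]) -> dict[str, int]:
    """One synchronous pass: every node reads only the previous snapshot."""
    new_labels: dict[str, int] = {}
    for node, neighbours in graph.items():
        totals: dict[int, int] = defaultdict(int)
        for other, weight in neighbours.items():
            totals[labels[other]] += weight
        if not totals:
            new_labels[node] = labels[node]  # isolated node keeps its label
            continue
        best = max(totals.values())
        # Assumed tiebreak for this sketch: the largest label wins.
        new_labels[node] = max(label for label, w in totals.items() if w == best)
    return new_labels


def find_cycle(graph: dict[str, dict[str, int]], max_rounds: int = 50) -> Optional[tuple[int, int]]:
    """Return (first_seen_round, repeat_round) if a labelling recurs, else None."""
    labels = {node: i for i, node in enumerate(graph)}
    seen = {tuple(labels.values()): 0}
    for round_no in range(1, max_rounds + 1):
        new_labels = synchronous_step(graph, labels)
        if new_labels == labels:
            return None  # fixed point reached, no oscillation
        labels = new_labels
        key = tuple(labels.values())
        if key in seen:
            return seen[key], round_no
        seen[key] = round_no
    return None


if __name__ == "__main__":
    print(find_cycle(REPRO))
```

Under these assumptions the sketch never reaches a fixed point on the repro graph: the labelling alternates between two states, which is the behaviour an uncapped `while True` loop cannot escape.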
Fix:
- Switch to **asynchronous** LPA: read and write to `community_map` in place so
later nodes within the same iteration see earlier nodes' fresh labels. Async
LPA still has no convergence proof on adversarial inputs, but it converges in
practice on graphs that arise from realistic `add_episode` corpora.
- Add a `max_iterations=100` keyword arg as the hard termination guarantee.
When the cap is reached, log a warning so callers can see that the returned
clustering may not be a fixed point.
- Replace the asymmetric tiebreak with a two-rule deterministic procedure: if
the current community is among the candidates of highest total weight, keep
it (self-stickiness, which preserves the previous tiebreak's intent of avoiding
gratuitous label churn on stable graphs); otherwise pick the smallest
community ID among highest-weight candidates. Output is reproducible across
runs.
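The three changes in the fix (in-place updates, an iteration cap with a warning, and the two-rule tiebreak) can be sketched together as follows. This is a minimal standalone illustration, not the code in the diff: the adjacency-dict input and the names `pick_label` and `label_propagation_async` are hypothetical, and only the `max_iterations=100` default is taken from the description.

```python
import logging
from collections import defaultdict

logger = logging.getLogger(__name__)

# Hypothetical representation: node -> {neighbour: edge weight}.
REPRO: dict[str, dict[str, int]] = {
    "X": {"Y": 2, "Z": 2},
    "Y": {"X": 2, "A": 1},
    "Z": {"X": 2, "B": 1},
    "A": {"Y": 1},
    "B": {"Z": 1},
}


def pick_label(current: int, totals: dict[int, int]) -> int:
    """Two-rule tiebreak: keep the current label if it ties for best, else take the smallest."""
    best = max(totals.values())
    winners = [label for label, weight in totals.items() if weight == best]
    return current if current in winners else min(winners)


def label_propagation_async(
    graph: dict[str, dict[str, int]], max_iterations: int = 100
) -> list[list[str]]:
    labels = {node: i for i, node in enumerate(graph)}
    for _ in range(max_iterations):
        changed = False
        for node, neighbours in graph.items():
            totals: dict[int, int] = defaultdict(int)
            for other, weight in neighbours.items():
                # Reads the live map, so labels updated earlier in this pass are visible.
                totals[labels[other]] += weight
            if not totals:
                continue  # isolated node keeps its own label
            new_label = pick_label(labels[node], totals)
            if new_label != labels[node]:
                labels[node] = new_label  # in-place (asynchronous) write
                changed = True
        if not changed:
            break
    else:
        # Runs only when the loop exhausted max_iterations without a stable pass.
        logger.warning(
            "label propagation stopped after max_iterations=%d; "
            "result may not be a fixed point",
            max_iterations,
        )
    clusters: dict[int, list[str]] = defaultdict(list)
    for node, label in labels.items():
        clusters[label].append(node)
    return list(clusters.values())


if __name__ == "__main__":
    print(label_propagation_async(REPRO))
```

The cap is what guarantees termination; the in-place update and the sticky tiebreak only make early convergence more likely. Because nodes are visited in insertion order and ties are resolved without randomness, repeated runs on the same input return the same clustering.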
The new keyword arg has a default, so all existing call sites (the four
database drivers under
`graphiti_core/driver/{neo4j,falkordb,kuzu,neptune}/operations/graph_ops.py`
and the legacy fallback in `community_operations.get_community_clusters`)
continue to work unchanged.
Tests at `tests/utils/maintenance/test_community_operations.py` cover the
oscillating regression case (asserts termination + node coverage), output
invariants (every input UUID appears exactly once), singleton nodes, an
isolated node alongside a connected component, complete graph collapse,
two-disjoint-component preservation, and run-to-run determinism. Each test is
parametrized over both copies of `label_propagation` so any future divergence
between them is caught.
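The "every input UUID appears exactly once" invariant could be checked with a helper along these lines. This is a sketch only: `partition_errors` is a hypothetical name and is not taken from the test file in this PR.

```python
from collections import Counter
from typing import Iterable


def partition_errors(nodes: Iterable[str], clusters: list[list[str]]) -> list[str]:
    """Describe every way `clusters` fails to be a partition of `nodes`."""
    expected = list(nodes)
    counts = Counter(node for cluster in clusters for node in cluster)
    errors = []
    for node in expected:
        if counts[node] != 1:
            errors.append(f"{node} appears {counts[node]} times")
    for node in counts:
        if node not in expected:
            errors.append(f"unexpected node {node}")
    return errors


if __name__ == "__main__":
    print(partition_errors(["a", "b"], [["a"], ["b"]]))
```

A test parametrized over both copies of `label_propagation` could then assert that this helper returns an empty list for each implementation's output.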
Relationship to #1388
#1388 is an alternative fix for the same bug, also using async LPA but with an
oscillation/cycle detector instead of a hard iteration cap, and a larger diff
(+734/-69 vs +334/-45 here). This PR offers a smaller, more targeted
alternative: async LPA + `max_iterations=100` cap + deterministic tiebreak,
with parametrized tests over both copies of `label_propagation`. Maintainers
should pick whichever shape they prefer; happy to close this in favor of
#1388, or vice versa.
Type of Change
Testing
Breaking Changes
Checklist
(`make lint` passes)
Related Issues
Closes #402
Closes #1355
Closes #1397
Closes #1400