feat(core): make semaphore concurrency configurable per Graphiti instance #1472
Open
prasmussen15 wants to merge 1 commit into
…ance

The concurrency cap for `semaphore_gather` was previously sourced exclusively from the `SEMAPHORE_LIMIT` environment variable, which is process-wide and difficult to vary per client. The `Graphiti(max_coroutines=...)` constructor parameter already existed but only propagated to a handful of `semaphore_gather` call sites, so most ingestion / dedup paths still used the env-var default regardless of the constructor value.

This change:
- Adds `max_coroutines: int | None = None` to `GraphitiClients` so the constructor value flows to every helper that already receives `clients`.
- Plumbs `max_coroutines` through helpers that take a bare `driver` / `llm_client` (`retrieve_previous_episodes_bulk`, `build_communities`, `build_community`, `get_community_clusters`, `_extract_entity_summaries_batch`).
- Forwards `self.max_coroutines` from `Graphiti` to those helpers and to its own `semaphore_gather` calls.
- Reads the value via `getattr(clients, 'max_coroutines', None)` (the same defensive pattern used for `clients.tracer`) so existing duck-typed mocks in the test suite continue to work.
- Renames the in-code default to `DEFAULT_SEMAPHORE_LIMIT = 20`. `SEMAPHORE_LIMIT` is still respected as a fallback for backwards compatibility but is no longer the primary configuration mechanism.
- Updates the `Graphiti.__init__` docstring to describe the new precedence (constructor > env var > built-in default).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
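The guarded read and the precedence described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `resolve_limit` and its `env` parameter are hypothetical names introduced here, and the sketch reads the environment at call time for easy testing, whereas the PR describes `SEMAPHORE_LIMIT` as a module-level value in `helpers.py`.

```python
import os
from types import SimpleNamespace

DEFAULT_SEMAPHORE_LIMIT = 20  # built-in default named in this change


def resolve_limit(clients, env=None):
    """Hypothetical helper: constructor value > SEMAPHORE_LIMIT env var > default."""
    env = os.environ if env is None else env
    # Guarded read: objects lacking the attribute behave as if it were None.
    configured = getattr(clients, 'max_coroutines', None)
    if configured is not None:
        return configured
    # Fall back to the env var, then to the built-in default.
    return int(env.get('SEMAPHORE_LIMIT', DEFAULT_SEMAPHORE_LIMIT))


# A duck-typed mock without max_coroutines still resolves without AttributeError.
legacy_mock = SimpleNamespace(tracer=None)
per_instance = SimpleNamespace(max_coroutines=5)
```

With these stand-ins, `per_instance` resolves to its own value regardless of the environment, while `legacy_mock` falls through to the env var or the default.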
Summary
Today the concurrency cap used by every `semaphore_gather` call inside `graphiti_core` is sourced from the `SEMAPHORE_LIMIT` env var. That makes the limit process-wide and impossible to vary per `Graphiti` instance: different graph clients in the same process are forced to share one global cap, even though they may have very different LLM tiers, latency budgets, or workload shapes.

`Graphiti(max_coroutines=...)` was already the documented override, but it only flowed to ~5 of the 25+ `semaphore_gather` call sites. Everywhere else (bulk ingest, dedup, edge resolution, attribute extraction, community building, search fan-out) silently fell back to the env var.

This PR makes the constructor parameter the actual primary mechanism, with the env var demoted to a backwards-compatible default.
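To illustrate why a per-call cap matters, here is a minimal sketch of a semaphore-bounded gather. It is an assumption about the general shape of such a helper, not the `semaphore_gather` implementation from `graphiti_core`; `bounded_gather` and `peak_concurrency` are hypothetical names used only for this example.

```python
import asyncio


async def bounded_gather(*coroutines, max_coroutines=20):
    """Run coroutines concurrently, with at most max_coroutines in flight."""
    semaphore = asyncio.Semaphore(max_coroutines)

    async def run(coroutine):
        # Each coroutine waits for a free slot before it starts executing.
        async with semaphore:
            return await coroutine

    return await asyncio.gather(*(run(c) for c in coroutines))


async def peak_concurrency(limit, jobs=10):
    """Measure the highest number of jobs that were running at once."""
    state = {'active': 0, 'peak': 0}

    async def job(i):
        state['active'] += 1
        state['peak'] = max(state['peak'], state['active'])
        await asyncio.sleep(0.01)  # stand-in for an LLM or database call
        state['active'] -= 1
        return i

    results = await bounded_gather(*(job(i) for i in range(jobs)), max_coroutines=limit)
    return results, state['peak']
```

Because the cap is an argument rather than a process-wide constant, two callers in the same process can run with different limits, which is the behavior this PR aims to provide per `Graphiti` instance.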
Changes
- `GraphitiClients` gains `max_coroutines: int | None = None`. `Graphiti.__init__` now passes `max_coroutines=self.max_coroutines` into the `GraphitiClients` it constructs, so every helper that already takes `clients` can read the per-instance limit.
- Helpers that take a bare `driver` / `llm_client` instead of `clients` (`retrieve_previous_episodes_bulk`, `build_communities`, `build_community`, `get_community_clusters`, `_extract_entity_summaries_batch`) gain a `max_coroutines: int | None = None` parameter, forwarded from `Graphiti`.
- `semaphore_gather` call sites in `graphiti.py`, `bulk_utils.py`, `node_operations.py`, `edge_operations.py`, `community_operations.py`, and `search/search.py` now forward the per-instance limit. Reads use `getattr(clients, 'max_coroutines', None)`, the same defensive pattern already used for `clients.tracer`, so duck-typed `SimpleNamespace` mocks in the test suite keep working.
- `helpers.py` introduces a named `DEFAULT_SEMAPHORE_LIMIT = 20` constant. `SEMAPHORE_LIMIT = int(os.getenv('SEMAPHORE_LIMIT', DEFAULT_SEMAPHORE_LIMIT))` keeps the env var as a fallback default, with a comment explaining that the constructor / `GraphitiClients.max_coroutines` is now the recommended override.
- The `Graphiti.__init__` docstring for `max_coroutines` is updated to describe the new precedence: constructor → env var → built-in default.

Precedence
After this change:
1. `Graphiti(max_coroutines=N)`: wins everywhere.
2. `SEMAPHORE_LIMIT` env var: fallback when the constructor value is `None`.
3. `DEFAULT_SEMAPHORE_LIMIT` (20): built-in default when neither is set.

Backwards compatibility
- The `SEMAPHORE_LIMIT` env var continues to work as before for users who rely on it.
- The new `max_coroutines` parameters have `None` defaults.
- Mocks built with `SimpleNamespace` / `model_construct` are unaffected because reads are guarded with `getattr`.

Tests
- `make format` / `make lint` (ruff, pyright on `graphiti_core`): clean.
- `pytest tests/utils/ tests/helpers_test.py -k 'not _int'`: 160 passed.
- `pytest tests/embedder tests/llm_client tests/cross_encoder tests/driver -k 'not _int'`: 127 passed.
- Other tests (`tests/test_graphiti_mock.py`, `tests/test_add_triplet.py`) were not run end-to-end here: they require live Neo4j/FalkorDB/Redis and exhibit the same connection errors on `main` without that infrastructure.

🤖 Generated with Claude Code