
feat(core): make semaphore concurrency configurable per Graphiti instance #1472

Open
prasmussen15 wants to merge 1 commit into main from feat/configurable-semaphore-limit


@prasmussen15
Collaborator

Summary

Today the concurrency cap used by every semaphore_gather call inside graphiti_core is sourced from the SEMAPHORE_LIMIT env var. That makes the limit process-wide and impossible to vary per Graphiti instance — different graph clients in the same process are forced to share one global cap, even though they may have very different LLM tiers, latency budgets, or workload shapes.

Graphiti(max_coroutines=...) was already the documented override, but it only flowed to ~5 of the 25+ semaphore_gather call sites. Everywhere else (bulk ingest, dedup, edge resolution, attribute extraction, community building, search fan-out) silently fell back to the env var.

This PR makes the constructor parameter the actual primary mechanism, with the env var demoted to a backwards-compatible default.

Changes

  • GraphitiClients gains max_coroutines: int | None = None. Graphiti.__init__ now passes max_coroutines=self.max_coroutines into the GraphitiClients it constructs, so every helper that already takes clients can read the per-instance limit.
  • Helpers that don't take clients (retrieve_previous_episodes_bulk, build_communities, build_community, get_community_clusters, _extract_entity_summaries_batch) gain a max_coroutines: int | None = None parameter, forwarded from Graphiti.
  • semaphore_gather call sites in graphiti.py, bulk_utils.py, node_operations.py, edge_operations.py, community_operations.py, and search/search.py now forward the per-instance limit. Reads use getattr(clients, 'max_coroutines', None) — the same defensive pattern already used for clients.tracer — so duck-typed SimpleNamespace mocks in the test suite keep working.
  • helpers.py introduces a named DEFAULT_SEMAPHORE_LIMIT = 20 constant. SEMAPHORE_LIMIT = int(os.getenv('SEMAPHORE_LIMIT', DEFAULT_SEMAPHORE_LIMIT)) keeps the env var as a fallback default, with a comment explaining that constructor / GraphitiClients.max_coroutines is now the recommended override.
  • Graphiti.__init__ docstring for max_coroutines is updated to describe the new precedence: constructor → env var → built-in default.

Precedence

After this change:

  1. Graphiti(max_coroutines=N) — wins everywhere.
  2. SEMAPHORE_LIMIT env var — fallback when the constructor value is None.
  3. DEFAULT_SEMAPHORE_LIMIT (20) — built-in default when neither is set.

Backwards compatibility

  • The SEMAPHORE_LIMIT env var continues to work as before for users who rely on it.
  • No existing public signatures were removed; new parameters are keyword-only with None defaults.
  • Test fixtures using SimpleNamespace / model_construct are unaffected because reads are guarded with getattr.

Tests

  • make format / make lint (ruff, pyright on graphiti_core): clean.
  • pytest tests/utils/ tests/helpers_test.py -k 'not _int': 160 passed.
  • pytest tests/embedder tests/llm_client tests/cross_encoder tests/driver -k 'not _int': 127 passed.
  • DB-fixture-dependent suites (tests/test_graphiti_mock.py, tests/test_add_triplet.py) were not run end-to-end here — they require live Neo4j/FalkorDB/Redis and exhibit the same connection errors on main without that infrastructure.

🤖 Generated with Claude Code

…ance

The concurrency cap for `semaphore_gather` was previously sourced
exclusively from the `SEMAPHORE_LIMIT` environment variable, which is
process-wide and difficult to vary per client. The `Graphiti(max_coroutines=...)`
constructor parameter already existed but only propagated to a handful
of `semaphore_gather` call sites, so most ingestion / dedup paths still
used the env-var default regardless of the constructor value.

This change:

- Adds `max_coroutines: int | None = None` to `GraphitiClients` so the
  constructor value flows to every helper that already receives
  `clients`.
- Plumbs `max_coroutines` through helpers that take a bare `driver` /
  `llm_client` (`retrieve_previous_episodes_bulk`, `build_communities`,
  `build_community`, `get_community_clusters`,
  `_extract_entity_summaries_batch`).
- Forwards `self.max_coroutines` from `Graphiti` to those helpers and
  to its own `semaphore_gather` calls.
- Reads the value via `getattr(clients, 'max_coroutines', None)` (the
  same defensive pattern used for `clients.tracer`) so existing
  duck-typed mocks in the test suite continue to work.
- Renames the in-code default to `DEFAULT_SEMAPHORE_LIMIT = 20`.
  `SEMAPHORE_LIMIT` is still respected as a fallback for backwards
  compatibility but is no longer the primary configuration mechanism.
- Updates the `Graphiti.__init__` docstring to describe the new
  precedence (constructor > env var > built-in default).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>