embedder: guard against long inputs by chunking, batching and averaging #1487

Open

Yuyuyaka wants to merge 2 commits into getzep:main from Yuyuyaka:fix/embedder-chunking-20260512

Conversation

@Yuyuyaka

Summary

This patch makes the OpenAI-based embedder resilient to very long text inputs
that previously triggered provider token-limit errors (causing 500s). It adds
character-based chunking, batching and averaging of per-chunk embeddings as a
safe default, and includes a minimal Dockerfile so maintainers can reproduce
and publish a patched image.

What changed

  • OpenAIEmbedder.create now:
    • chunks long str inputs into pieces of roughly DEFAULT_MAX_INPUT_CHARS
      characters each,
    • embeds the chunks in batches and averages the resulting vectors into a
      single representation (see the sketch after this list),
    • falls back to a direct call if chunked embedding fails.
  • create_batch delegates to create() so each item is guarded individually.
  • Added docker/Dockerfile + build_image.sh for a reproducible local image.
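
For reference, a minimal sketch of the guard logic. The constant values, the
BATCH_SIZE name, and the injected embed_batch callable are illustrative
assumptions, not the exact names or defaults in openai.py:

```python
from collections.abc import Callable, Sequence

DEFAULT_MAX_INPUT_CHARS = 8192  # illustrative value, not the repo's actual constant
BATCH_SIZE = 16                 # illustrative batch size for per-chunk calls


def chunk_text(text: str, max_chars: int = DEFAULT_MAX_INPUT_CHARS) -> list[str]:
    """Split text into pieces of roughly max_chars characters."""
    return [text[i : i + max_chars] for i in range(0, len(text), max_chars)]


def average_vectors(vectors: Sequence[Sequence[float]]) -> list[float]:
    """Element-wise mean of equal-length embedding vectors."""
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(len(vectors[0]))]


def guarded_embed(
    text: str,
    embed_batch: Callable[[list[str]], list[list[float]]],
) -> list[float]:
    """Embed text; chunk, batch, and average when it exceeds the limit."""
    if len(text) <= DEFAULT_MAX_INPUT_CHARS:
        return embed_batch([text])[0]  # short input: direct call
    try:
        chunks = chunk_text(text)
        vectors: list[list[float]] = []
        for i in range(0, len(chunks), BATCH_SIZE):  # batch the chunk calls
            vectors.extend(embed_batch(chunks[i : i + BATCH_SIZE]))
        return average_vectors(vectors)
    except Exception:
        # Conservative fallback: forward the original input unchanged.
        return embed_batch([text])[0]
```

Because averaging yields exactly one vector per input, existing storage and
retrieval paths are unchanged, which is what makes the default
backwards-compatible.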

Why

Long messages (exceeding the provider's token limit) caused the upstream
embedding API to reject requests and Graphiti to return HTTP 500, which
cascaded into OpenClaw gateway timeouts. This change prevents oversized inputs
from being forwarded directly to the provider and provides a conservative,
backwards-compatible default.

Testing done

  • Verified locally inside the running Graphiti container by copying the
    patched openai.py and restarting: POST /get-memory with ~20k chars now
    returns HTTP 200 and does not raise embedding errors.
  • Built a reproducible image locally via docker build (tag
    graphiti-local:embedder-chunking-20260512).
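
For anyone reproducing this, the steps roughly amount to the sketch below. The
service port and the oversized-payload.json file are assumptions about the
local setup; the request body must match your deployment's /get-memory schema:

```sh
# Build and run the patched image locally.
docker build -f docker/Dockerfile -t graphiti-local:embedder-chunking-20260512 .
docker run --rm -p 8000:8000 graphiti-local:embedder-chunking-20260512

# In another shell: POST a prepared ~20k-character payload and expect 200.
curl -s -o /dev/null -w '%{http_code}\n' \
  -X POST http://localhost:8000/get-memory \
  -H 'Content-Type: application/json' \
  -d @oversized-payload.json
```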

How to review

  • Review the new openai.py implementation for correctness and style.
  • Consider whether averaging embeddings is the desired aggregation strategy or
    whether the code should instead store multiple per-chunk embeddings and
    change retrieval logic.
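
If averaging is kept, one refinement to weigh during review: unit-normalizing
each chunk vector before taking the mean (and re-normalizing the result) makes
every chunk contribute equally and keeps the output cosine-comparable. A
sketch of that variant, not what the patch currently does:

```python
import math


def normalized_mean(vectors: list[list[float]]) -> list[float]:
    """Mean of unit-normalized vectors, re-normalized to unit length."""

    def unit(v: list[float]) -> list[float]:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard zero vectors
        return [x / norm for x in v]

    units = [unit(v) for v in vectors]
    mean = [sum(v[d] for v in units) / len(units) for d in range(len(units[0]))]
    return unit(mean)
```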

Notes for maintainers

  • This is a conservative, defensive change. Alternatives: per-chunk storage,
    more sophisticated summarization before embedding, or token-aware chunking.
  • If accepted, please publish a patch release and update downstream images.
