embedder: guard against long inputs by chunking, batching and averaging #1487

Open

Yuyuyaka wants to merge 2 commits into getzep:main from Yuyuyaka:fix/embedder-chunking-20260512

Conversation

@Yuyuyaka

Summary

This patch makes the OpenAI-based embedder resilient to very long text inputs
that previously triggered provider token-limit errors (causing 500s). It adds
character-based chunking, batching and averaging of per-chunk embeddings as a
safe default, and includes a minimal Dockerfile so maintainers can reproduce
and publish a patched image.

What changed

  • OpenAIEmbedder.create now:
    • chunks long str inputs into pieces of roughly DEFAULT_MAX_INPUT_CHARS
      characters each,
    • embeds the chunks in batches and averages the resulting vectors into a
      single representation (see the sketch after this list),
    • falls back to a direct call if chunked embedding fails.
  • create_batch delegates to create() so each item is guarded individually.
  • Added docker/Dockerfile + build_image.sh for a reproducible local image.
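
For reference, a minimal sketch of the guard logic. The constant values, the
BATCH_SIZE name, and the injected embed_batch callable are illustrative
assumptions, not the exact names or defaults in openai.py:

```python
from collections.abc import Callable, Sequence

DEFAULT_MAX_INPUT_CHARS = 8192  # illustrative value, not the repo's actual constant
BATCH_SIZE = 16                 # illustrative batch size for per-chunk calls


def chunk_text(text: str, max_chars: int = DEFAULT_MAX_INPUT_CHARS) -> list[str]:
    """Split text into pieces of roughly max_chars characters."""
    return [text[i : i + max_chars] for i in range(0, len(text), max_chars)]


def average_vectors(vectors: Sequence[Sequence[float]]) -> list[float]:
    """Element-wise mean of equal-length embedding vectors."""
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(len(vectors[0]))]


def guarded_embed(
    text: str,
    embed_batch: Callable[[list[str]], list[list[float]]],
) -> list[float]:
    """Embed text; chunk, batch, and average when it exceeds the limit."""
    if len(text) <= DEFAULT_MAX_INPUT_CHARS:
        return embed_batch([text])[0]  # short input: direct call
    try:
        chunks = chunk_text(text)
        vectors: list[list[float]] = []
        for i in range(0, len(chunks), BATCH_SIZE):  # batch the chunk calls
            vectors.extend(embed_batch(chunks[i : i + BATCH_SIZE]))
        return average_vectors(vectors)
    except Exception:
        # Conservative fallback: forward the original input unchanged.
        return embed_batch([text])[0]
```

Because averaging yields exactly one vector per input, existing storage and
retrieval paths are unchanged, which is what makes the default
backwards-compatible.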

Why

Long messages (exceeding the provider's token limit) caused the upstream
embedding API to reject requests and Graphiti to return HTTP 500, which
cascaded into OpenClaw gateway timeouts. This change prevents oversized inputs
from being forwarded directly to the provider and provides a conservative,
backwards-compatible default.

Testing done

  • Verified locally inside the running Graphiti container by copying the
    patched openai.py and restarting: POST /get-memory with ~20k chars now
    returns HTTP 200 and does not raise embedding errors.
  • Built a reproducible image locally via docker build (tag
    graphiti-local:embedder-chunking-20260512).
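
For anyone reproducing this, the steps roughly amount to the sketch below. The
service port and the oversized-payload.json file are assumptions about the
local setup; the request body must match your deployment's /get-memory schema:

```sh
# Build and run the patched image locally.
docker build -f docker/Dockerfile -t graphiti-local:embedder-chunking-20260512 .
docker run --rm -p 8000:8000 graphiti-local:embedder-chunking-20260512

# In another shell: POST a prepared ~20k-character payload and expect 200.
curl -s -o /dev/null -w '%{http_code}\n' \
  -X POST http://localhost:8000/get-memory \
  -H 'Content-Type: application/json' \
  -d @oversized-payload.json
```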

How to review

  • Review the new openai.py implementation for correctness and style.
  • Consider whether averaging embeddings is the desired aggregation strategy or
    whether the code should instead store multiple per-chunk embeddings and
    change retrieval logic.
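
If averaging is kept, one refinement to weigh during review: unit-normalizing
each chunk vector before taking the mean (and re-normalizing the result) makes
every chunk contribute equally and keeps the output cosine-comparable. A
sketch of that variant, not what the patch currently does:

```python
import math


def normalized_mean(vectors: list[list[float]]) -> list[float]:
    """Mean of unit-normalized vectors, re-normalized to unit length."""

    def unit(v: list[float]) -> list[float]:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard zero vectors
        return [x / norm for x in v]

    units = [unit(v) for v in vectors]
    mean = [sum(v[d] for v in units) / len(units) for d in range(len(units[0]))]
    return unit(mean)
```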

Notes for maintainers

  • This is a conservative, defensive change. Alternatives: per-chunk storage,
    more sophisticated summarization before embedding, or token-aware chunking.
  • If accepted, please publish a patch release and update downstream images.
