fix(openresponses): bound resume-stream buffer and enforce response ownership #10569
Merged
…wnership

The background=true resumable-stream path had two latent issues.

1. Unbounded resume buffer. AppendEvent grew StreamEvents without limit, so a long-running or abandoned background generation could consume process memory without bound. The store now caps the buffer (event count and total bytes, mirroring llama.cpp's byte-capped slot ring), evicting the oldest events from the front and advancing a droppedThrough watermark. GetEventsAfter returns ErrOffsetLost when the requested starting_after is below the watermark, and handleStreamResume surfaces that as HTTP 409 before committing to the SSE response, so a resuming client gets a clear error instead of a silently truncated stream.

2. Missing ownership check (IDOR). GET /responses/:id, its stream resume, and /cancel looked up responses purely by ID, letting any caller who knows or guesses an ID read or cancel another caller's response. Responses now carry the creating caller's identity (auth.GetUser), stamped at creation and compared on read/cancel/resume; a mismatch returns 404 (not 403) so existence is not leaked. Backward compatible: responses with no owner (single-key / no-auth deployments) remain accessible.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
mudler approved these changes on Jun 28, 2026
Description
Two latent issues in the OpenResponses `background=true` resumable-stream path, found during a design review.

1. Unbounded resume-stream buffer (memory risk)
`AppendEvent` appended to `StoredResponse.StreamEvents` without any limit. A long-running or abandoned background generation (especially one that no client ever resumes) grew this buffer indefinitely and could exhaust process memory.

The store now caps the per-response resume buffer by event count (default 8192) and total serialized bytes (default 64 MiB), mirroring llama.cpp's byte-capped slot ring. When either cap is exceeded, the oldest events are evicted from the front (their payloads are set to nil so they can be garbage-collected) and a `droppedThrough` watermark advances.

Offset-lost handling: `GetEventsAfter` now returns the sentinel `ErrOffsetLost` when the requested `starting_after` is below the dropped watermark, i.e. the events the client expects next were evicted. `handleStreamResume` checks for this before writing SSE headers and returns HTTP 409 with a clear message, so a resuming client gets an explicit error instead of a stream that silently skips the gap.

Caps live on the store (defaults come from package constants and can be lowered in tests). A package constant is used for v1 rather than new config plumbing, to keep the PR small.
2. Response-ownership check (latent IDOR)
`GET /responses/:id`, its `?stream=true` resume, and `/cancel` looked up responses purely by ID with no ownership check, so any caller who knew or guessed an ID could read or cancel another caller's response or stream.

Responses now carry the creating caller's identity (`auth.GetUser(c).ID`), stamped at creation via `store.SetOwner` before the ID is handed back to the client, and compared on read, cancel, and resume. On a mismatch the handlers return 404 (not 403), so the existence of another caller's response is not leaked.

Backward compatible: the check is gated on a non-empty owner. In single-key / no-auth deployments `auth.GetUser` returns nil, the owner is empty, and existing behavior is preserved (the response stays accessible).

Notes for Reviewers
Both fixes are included. The ownership fix was assessed as small and clean before implementing:
- `auth.GetUser(c)` (the same source the usage/billing middleware uses).
- … `StoreBackground` + the three synchronous `Store` calls) already has the echo context, so owner stamping needed no new plumbing.
- … `Owner string` and a `SetOwner` setter; the identity comparison (`accessAllowed`) and the `auth` dependency live in the handler layer.

TDD:
`store_test.go` gains two Ginkgo specs: one that appends past a lowered cap and asserts that the buffer stays bounded, that the oldest events are evicted, and that `GetEventsAfter` below the watermark returns `ErrOffsetLost`; and one that asserts owner stamping plus the allow/deny decision (including the empty-owner backward-compat path). Both fail before the change.

`gofmt`, `go vet`, `golangci-lint` (new-from-merge-base), and `go test ./core/http/endpoints/openresponses/...` all pass.

Signed commits