Skip to content

fix: use BUFFERED request body mode for reliable ClearRouteCache#1201

Open
maleck13 wants to merge 1 commit into
Kuadrant:mainfrom
maleck13:fix/hairpin-session-flake
Open

fix: use BUFFERED request body mode for reliable ClearRouteCache#1201
maleck13 wants to merge 1 commit into
Kuadrant:mainfrom
maleck13:fix/hairpin-session-flake

Conversation

@maleck13

@maleck13 maleck13 commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

STREAMED body mode causes Envoy to silently ignore header mutations and ClearRouteCache in the ext_proc body phase. This led to intermittent (~15-25%) routing failures where hairpin initialize requests were handled by the broker instead of the backend MCP server.

BUFFERED mode guarantees the route cache clear takes effect before Envoy selects an upstream. Limited to 1MB request body (Envoy default) which is sufficient for MCP JSON-RPC payloads.

Also adds x-client-id header to broker and lazyinit requests, and improves server2 test server logging.

Summary by CodeRabbit

  • Bug Fixes

    • Improved request handling so routing mutations are applied more reliably, reducing intermittent tools/call routing failures.
    • Added default client identifiers to outbound MCP requests for more consistent tracking and handling.
  • Documentation

    • Expanded routing guidance with details on request body buffering and its impact on routing behavior and request size limits.
  • Chores

    • Enhanced server-side logging for MCP requests and responses to make session and client activity easier to trace.

STREAMED body mode causes Envoy to silently ignore header mutations
and ClearRouteCache in the ext_proc body phase. This led to
intermittent (~15-25%) routing failures where hairpin initialize
requests were handled by the broker instead of the backend MCP server.

BUFFERED mode guarantees the route cache clear takes effect before
Envoy selects an upstream. Limited to 1MB request body (Envoy default)
which is sufficient for MCP JSON-RPC payloads.

Also adds x-client-id header to broker and lazyinit requests, and
improves server2 test server logging.

Signed-off-by: craig <cbrookes@redhat.com>
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

The PR switches Envoy ext_proc request-body handling to BUFFERED in generated and checked-in config, adds x-client-id headers for broker and lazy-init MCP requests, updates routing documentation, and expands test-server logging for MCP calls and /mcp responses.

Changes

Buffered ext_proc routing and client headers

Layer / File(s) Summary
Buffered ext_proc mode
internal/controller/mcpgatewayextension_controller.go, config/istio/envoyfilter.yaml, docs/design/routing.md
The generated EnvoyFilter and checked-in Istio config now use BUFFERED request-body mode, and the routing doc describes its effect on header mutations, ClearRouteCache, and request-size limits.
MCP client headers
internal/broker/upstream/mcp.go, internal/clients/clients.go
NewUpstreamMCP adds x-client-id: broker, and Initialize(...) sets passThroughHeaders["x-client-id"] to lazyinit before creating the hairpin client.
Server call logging
internal/tests/server2/server2.go
RunServer adds before-any and on-success MCP hooks, and /mcp now uses logResponse with a response recorder to log status plus request/response session IDs and X-Client-Id.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Kuadrant/mcp-gateway#802: Touches the same Envoy ext_proc request-body mode setting in config/istio/envoyfilter.yaml and buildEnvoyFilter, but switches it in the opposite direction.

Suggested labels

review-effort/medium, high-risk

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change to switch ext_proc request bodies to BUFFERED for reliable ClearRouteCache.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot added high-risk Touches concurrency, auth, sessions, CRDs, ext_proc, or routing review-effort/medium Medium review effort (3): few files, moderate logic labels Jun 26, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/clients/clients.go`:
- Line 97: The assignment to passThroughHeaders in Initialize can panic when the
map argument is nil. Add a defensive nil guard before setting the x-client-id
entry, and only write the lazyinit value when passThroughHeaders is non-nil; use
the Initialize function and the passThroughHeaders map name to locate the
change.

In `@internal/tests/server2/server2.go`:
- Around line 225-253: responseRecorder is missing http.Flusher forwarding,
which breaks StreamableHTTPServer SSE streaming when logResponse wraps the
handler. Update responseRecorder to preserve streaming behavior by forwarding
Flush to the underlying ResponseWriter when it implements http.Flusher, and
ensure logResponse still uses responseRecorder without interfering with
streaming responses.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 006be1ed-f7bd-4cd9-840b-00ffa976e6e9

📥 Commits

Reviewing files that changed from the base of the PR and between 8e81b9d and e12f95e.

📒 Files selected for processing (7)
  • config/istio/envoyfilter.yaml
  • docs/design/routing.md
  • internal/broker/upstream/mcp.go
  • internal/clients/clients.go
  • internal/controller/mcpgatewayextension_controller.go
  • internal/mcp-router/response_handlers.go
  • internal/tests/server2/server2.go
💤 Files with no reviewable changes (1)
  • internal/mcp-router/response_handlers.go


url := buildHairpinURL(gatewayHost, mcpPath)
hairpinHTTPClient := hairpinClientPool.Get(conf.Hostname)
passThroughHeaders["x-client-id"] = "lazyinit"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# find call sites of clients.Initialize and inspect the map arg
rg -nP -C4 '\bInitialize\s*\(' --type=go -g '!**/*_test.go'

Repository: Kuadrant/mcp-gateway

Length of output: 158


Defensive guard for passThroughHeaders assignment

The write at line 97 passThroughHeaders["x-client-id"] = "lazyinit" will panic if the argument is nil. While the current caller in request_handlers.go initializes the map, Initialize is exported and any future caller passing nil will crash the service.

Add a nil check to prevent runtime panics:

Defensive guard
 	url := buildHairpinURL(gatewayHost, mcpPath)
 	hairpinHTTPClient := hairpinClientPool.Get(conf.Hostname)
+	if passThroughHeaders == nil {
+		passThroughHeaders = map[string]string{}
+	}
 	passThroughHeaders["x-client-id"] = "lazyinit"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
passThroughHeaders["x-client-id"] = "lazyinit"
if passThroughHeaders == nil {
passThroughHeaders = map[string]string{}
}
passThroughHeaders["x-client-id"] = "lazyinit"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/clients/clients.go` at line 97, The assignment to passThroughHeaders
in Initialize can panic when the map argument is nil. Add a defensive nil guard
before setting the x-client-id entry, and only write the lazyinit value when
passThroughHeaders is non-nil; use the Initialize function and the
passThroughHeaders map name to locate the change.

Comment on lines +225 to +253
type responseRecorder struct {
http.ResponseWriter
status int
}

func (r *responseRecorder) WriteHeader(code int) {
r.status = code
r.ResponseWriter.WriteHeader(code)
}

func logResponse(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
rec := &responseRecorder{ResponseWriter: w, status: http.StatusOK}
next.ServeHTTP(rec, r)
reqSession := r.Header.Get("Mcp-Session-Id")
if reqSession == "" {
reqSession = "-"
}
respSession := rec.Header().Get("Mcp-Session-Id")
if respSession == "" {
respSession = "-"
}
clientID := r.Header.Get("X-Client-Id")
if clientID == "" {
clientID = "-"
}
log.Printf("%s %s %d req-session=%s resp-session=%s x-client-id=%s", r.Method, r.URL.Path, rec.status, reqSession, respSession, clientID)
})
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# check mcp-go streamable server for Flusher usage
fd -t f -e go . | xargs rg -nP 'http\.Flusher|\.Flush\(\)' -g '*streamable*' 2>/dev/null
rg -nP 'http\.Flusher|\.Flush\(\)' $(go env GOMODCACHE 2>/dev/null)/github.com/mark3labs/mcp-go* 2>/dev/null | head

Repository: Kuadrant/mcp-gateway

Length of output: 158


responseRecorder breaks streaming by missing http.Flusher support

responseRecorder embeds http.ResponseWriter but lacks a Flush() method. The underlying StreamableHTTPServer requires http.Flusher for SSE responses; wrapping it causes streaming calls to stall or fail.

Fix: Forward `Flush`
 func (r *responseRecorder) WriteHeader(code int) {
 	r.status =code
 	r.ResponseWriter.WriteHeader(code)
+}
+
+func (r *responseRecorder) Flush() {
+	if f, ok := r.ResponseWriter.(http.Flusher); ok {
+		f.Flush()
+	}
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
type responseRecorder struct {
http.ResponseWriter
status int
}
func (r *responseRecorder) WriteHeader(code int) {
r.status = code
r.ResponseWriter.WriteHeader(code)
}
func logResponse(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
rec := &responseRecorder{ResponseWriter: w, status: http.StatusOK}
next.ServeHTTP(rec, r)
reqSession := r.Header.Get("Mcp-Session-Id")
if reqSession == "" {
reqSession = "-"
}
respSession := rec.Header().Get("Mcp-Session-Id")
if respSession == "" {
respSession = "-"
}
clientID := r.Header.Get("X-Client-Id")
if clientID == "" {
clientID = "-"
}
log.Printf("%s %s %d req-session=%s resp-session=%s x-client-id=%s", r.Method, r.URL.Path, rec.status, reqSession, respSession, clientID)
})
}
type responseRecorder struct {
http.ResponseWriter
status int
}
func (r *responseRecorder) WriteHeader(code int) {
r.status = code
r.ResponseWriter.WriteHeader(code)
}
func (r *responseRecorder) Flush() {
if f, ok := r.ResponseWriter.(http.Flusher); ok {
f.Flush()
}
}
func logResponse(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
rec := &responseRecorder{ResponseWriter: w, status: http.StatusOK}
next.ServeHTTP(rec, r)
reqSession := r.Header.Get("Mcp-Session-Id")
if reqSession == "" {
reqSession = "-"
}
respSession := rec.Header().Get("Mcp-Session-Id")
if respSession == "" {
respSession = "-"
}
clientID := r.Header.Get("X-Client-Id")
if clientID == "" {
clientID = "-"
}
log.Printf("%s %s %d req-session=%s resp-session=%s x-client-id=%s", r.Method, r.URL.Path, rec.status, reqSession, respSession, clientID)
})
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/tests/server2/server2.go` around lines 225 - 253, responseRecorder
is missing http.Flusher forwarding, which breaks StreamableHTTPServer SSE
streaming when logResponse wraps the handler. Update responseRecorder to
preserve streaming behavior by forwarding Flush to the underlying ResponseWriter
when it implements http.Flusher, and ensure logResponse still uses
responseRecorder without interfering with streaming responses.

@jasonmadigan jasonmadigan left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. the docs CI failure looks like something that was broken on main already - found because router.md was touched

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

high-risk Touches concurrency, auth, sessions, CRDs, ext_proc, or routing review-effort/medium Medium review effort (3): few files, moderate logic

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants