fix(broker): report served tool/prompt count on upstream status error path#1165
fix(broker): report served tool/prompt count on upstream status error path#1165namansh70747 wants to merge 1 commit into
Conversation
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughIn ChangessetStatus error-path count fix
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR fixes broker status reporting so that when an upstream goes unhealthy, the MCP manager reports the tool/prompt counts that are actually being served (instead of leaving stale counts from the last healthy cycle). This aligns the Ready=False state with the TotalTools/TotalPrompts values the controller surfaces on MCPServerRegistration.
Changes:
- Update
MCPManager.setStatus()to populateTotalTools/TotalPromptson the error path by reading the currently served cached sets undertoolsLock. - Add a unit test covering both error scenarios: tools removed (counts drop to 0) vs transient list failure (counts remain accurate).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
internal/broker/upstream/manager.go |
On setStatus error path, report served tool/prompt counts under toolsLock to avoid stale status values. |
internal/broker/upstream/manager_test.go |
Adds coverage to ensure error-path status reports 0 after removal and preserves counts when cached tools remain served. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/broker/upstream/manager_test.go`:
- Around line 418-429: The test in the "keeps count when tools are still served"
function validates that TotalTools count is preserved during a transient error,
but does not validate the same behavior for TotalPrompts. Add initialization of
manager.prompts with a slice containing a specific number of elements (similar
to how manager.tools is initialized) and add an assertion to verify that
manager.status.TotalPrompts matches the expected count after calling setStatus,
ensuring the prompt count is also preserved when errors occur.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8c810d80-ac1a-48b9-9b3f-36446277225a
📒 Files selected for processing (2)
internal/broker/upstream/manager.gointernal/broker/upstream/manager_test.go
| t.Run("keeps count when tools are still served", func(t *testing.T) { | ||
| mock := newMockMCP("test-server", "test_") | ||
| manager, err := NewUpstreamMCPManager(mock, newMockToolsAdderDeleter(), nil, logger, 0, mcpv1alpha1.InvalidToolPolicyFilterOut) | ||
| require.NoError(t, err) | ||
| // a transient tools/list error leaves the previously served set in place | ||
| manager.tools = make([]mcp.Tool, 3) | ||
|
|
||
| manager.setStatus(fmt.Errorf("list failed"), 3, 0, nil, nil) | ||
|
|
||
| assert.False(t, manager.status.Ready) | ||
| assert.Equal(t, 3, manager.status.TotalTools, "still-served tools should keep their count") | ||
| }) |
There was a problem hiding this comment.
Cover prompt-count behavior in transient-error case
Line 418-429 validates only TotalTools, but the fixed branch also updates TotalPrompts. Seed manager.prompts and assert manager.status.TotalPrompts to lock this contract down.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/broker/upstream/manager_test.go` around lines 418 - 429, The test in
the "keeps count when tools are still served" function validates that TotalTools
count is preserved during a transient error, but does not validate the same
behavior for TotalPrompts. Add initialization of manager.prompts with a slice
containing a specific number of elements (similar to how manager.tools is
initialized) and add an assertion to verify that manager.status.TotalPrompts
matches the expected count after calling setStatus, ensuring the prompt count is
also preserved when errors occur.
|
oops, this was closed prematurely - I think this is fine as a follow on to #1187 when that lands. keeping open |
… path When an upstream connection or ping fails, manage() removes the server's tools and prompts and then calls setStatus to mark it not ready. setStatus returned early on the error path without updating TotalTools/TotalPrompts, so the status kept the count from the last healthy cycle. The controller surfaces that value as status.discoveredTools (the Tools printer column), so kubectl get mcpserverregistration showed a server as not ready while still listing its old tool count. Report the count actually being served on the error path: 0 when the tools were removed (connect/ping failure, rejected server) and the cached count when a transient tools/list error leaves the previously served set in place. Signed-off-by: Naman Sharma <namsh70747@gmail.com>
78565c0 to
9300947
Compare
|
referencing #1164 as the linked bug — that one's still also addressed coderabbit's comment: seeded |
Fixes #1164
When an upstream connection or ping fails,
manage()removes the server's tools and prompts and then callssetStatusto mark it not ready.setStatusreturned early on the error path without updatingTotalTools/TotalPrompts, so the status kept the count from the last healthy cycle. The controller surfaces that value asstatus.discoveredTools(theToolsprinter column), sokubectl get mcpserverregistrationshowed a server asReady=Falsewhile still listing its old tool count.This reports the count actually being served on the error path:
0when the tools were removed (connect/ping failure, rejected server)tools/listerror leaves the previously served set in place (defaultFilterOutpolicy), so a momentary list blip doesn't wrongly report 0 while tools are still servedThe read is guarded by the existing
toolsLock, which nosetStatuscaller holds, so there's no deadlock. It lines up with the success path, wheretoolCount == len(man.tools).Test:
TestMCPManager_setStatus_ErrorReportsServedCountcovers both cases (removed → 0, still served → keeps count). It fails on the old behaviour and passes with the fix.Noted on the issue that the MCPServerRegistration/Status endpoint coupling is changing soon — happy for this to be superseded by that work; sending it as a small interim fix.
Summary by CodeRabbit