fix(broker): report served tool/prompt count on upstream status error path by namansh70747 · Pull Request #1165 · Kuadrant/mcp-gateway

namansh70747 · 2026-06-19T07:06:44Z

When an upstream connection or ping fails, manage() removes the server's tools and prompts and then calls setStatus to mark it not ready. setStatus returned early on the error path without updating TotalTools/TotalPrompts, so the status kept the count from the last healthy cycle. The controller surfaces that value as status.discoveredTools (the Tools printer column), so kubectl get mcpserverregistration showed a server as Ready=False while still listing its old tool count.

This reports the count actually being served on the error path:

- 0 when the tools were removed (connect/ping failure, rejected server)
- the cached count when a transient tools/list error leaves the previously served set in place (default FilterOut policy), so a momentary list blip doesn't wrongly report 0 while tools are still served

The read is guarded by the existing toolsLock, which no setStatus caller holds, so there's no deadlock. It lines up with the success path, where toolCount == len(man.tools).

Test: TestMCPManager_setStatus_ErrorReportsServedCount covers both cases (removed → 0, still served → keeps count). It fails on the old behaviour and passes with the fix.

Noted on the issue that the MCPServerRegistration/Status endpoint coupling is changing soon — happy for this to be superseded by that work; sending it as a small interim fix.

Summary by CodeRabbit

Bug Fixes
- Improved accuracy of tool and prompt counts during error states, ensuring proper status reporting when the system encounters operational issues.

coderabbitai · 2026-06-19T07:06:58Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d79cb766-ee17-41cd-a2f3-9dfa264257de

📝 Walkthrough

In MCPManager.setStatus, the error path now reads len(man.tools) and len(man.prompts) under toolsLock read lock and assigns them to status.TotalTools/TotalPrompts before returning. A new test covers both the cleared-count and preserved-count scenarios.

Changes

setStatus error-path count fix

| Layer / File(s) | Summary |
|---|---|
| Error-path count update and test `internal/broker/upstream/manager.go`, `internal/broker/upstream/manager_test.go` | `setStatus` now assigns `TotalTools`/`TotalPrompts` from the live cached lengths under `toolsLock` on the error path instead of leaving stale values. New test asserts counts clear to `0` when tools are removed and remain accurate on transient errors. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested labels

review-effort/medium

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title clearly describes the main change: updating setStatus to report accurate tool/prompt counts on error paths, matching the PR's core objective. |
| Linked Issues check | ✅ Passed | Code changes fully address `#1164` by updating setStatus to report actual served counts (0 when removed, cached count when still served) under toolsLock protection. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to fixing the identified bug: setStatus error path logic and corresponding test coverage. |

Copilot

Pull request overview

This PR fixes broker status reporting so that when an upstream goes unhealthy, the MCP manager reports the tool/prompt counts that are actually being served (instead of leaving stale counts from the last healthy cycle). This aligns the Ready=False state with the TotalTools/TotalPrompts values the controller surfaces on MCPServerRegistration.

Changes:

Update MCPManager.setStatus() to populate TotalTools/TotalPrompts on the error path by reading the currently served cached sets under toolsLock.
Add a unit test covering both error scenarios: tools removed (counts drop to 0) vs transient list failure (counts remain accurate).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| `internal/broker/upstream/manager.go` | On `setStatus` error path, report served tool/prompt counts under `toolsLock` to avoid stale status values. |
| `internal/broker/upstream/manager_test.go` | Adds coverage to ensure error-path status reports 0 after removal and preserves counts when cached tools remain served. |

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/broker/upstream/manager_test.go`:
- Around line 418-429: The test in the "keeps count when tools are still served"
function validates that TotalTools count is preserved during a transient error,
but does not validate the same behavior for TotalPrompts. Add initialization of
manager.prompts with a slice containing a specific number of elements (similar
to how manager.tools is initialized) and add an assertion to verify that
manager.status.TotalPrompts matches the expected count after calling setStatus,
ensuring the prompt count is also preserved when errors occur.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8c810d80-ac1a-48b9-9b3f-36446277225a

📥 Commits

Reviewing files that changed from the base of the PR and between f7dfb26 and 78565c0.

📒 Files selected for processing (2)

internal/broker/upstream/manager.go
internal/broker/upstream/manager_test.go

coderabbitai · 2026-06-19T07:09:09Z

+	t.Run("keeps count when tools are still served", func(t *testing.T) {
+		mock := newMockMCP("test-server", "test_")
+		manager, err := NewUpstreamMCPManager(mock, newMockToolsAdderDeleter(), nil, logger, 0, mcpv1alpha1.InvalidToolPolicyFilterOut)
+		require.NoError(t, err)
+		// a transient tools/list error leaves the previously served set in place
+		manager.tools = make([]mcp.Tool, 3)
+
+		manager.setStatus(fmt.Errorf("list failed"), 3, 0, nil, nil)
+
+		assert.False(t, manager.status.Ready)
+		assert.Equal(t, 3, manager.status.TotalTools, "still-served tools should keep their count")
+	})


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Cover prompt-count behavior in transient-error case

Lines 418-429 validate only TotalTools, but the fixed branch also updates TotalPrompts. Seed manager.prompts and assert manager.status.TotalPrompts to lock this contract down.

namansh70747 · 2026-06-19T09:34:30Z

Thanks again for the steer on #1164, @maleck13. I've put the small fix up here; take a look whenever you get a chance. No rush.

jasonmadigan · 2026-06-23T10:20:07Z

Thanks for filing the issue and the fix. This is part of a broader change we're planning internally (#629) which will remove the status syncing that causes this staleness. Since that work is already assigned to a maintainer, we'll address it there. Generally, if an issue is assigned it's best to check before sending a PR, saves duplicated effort. Appreciate you flagging it though.

oops, this was closed prematurely - I think this is fine as a follow on to #1187 when that lands. keeping open

… path

When an upstream connection or ping fails, manage() removes the server's tools and prompts and then calls setStatus to mark it not ready. setStatus returned early on the error path without updating TotalTools/TotalPrompts, so the status kept the count from the last healthy cycle. The controller surfaces that value as status.discoveredTools (the Tools printer column), so kubectl get mcpserverregistration showed a server as not ready while still listing its old tool count.

Report the count actually being served on the error path: 0 when the tools were removed (connect/ping failure, rejected server) and the cached count when a transient tools/list error leaves the previously served set in place.

Signed-off-by: Naman Sharma <namsh70747@gmail.com>

namansh70747 · 2026-06-23T16:40:43Z

referencing #1164 as the linked bug — that one's still triage/needs-triage; if accepting it unblocks the triage/needs-issue label here, happy to wait. otherwise let me know if a separate issue would be cleaner.

also addressed coderabbit's comment: seeded manager.prompts and added the TotalPrompts assertion in the transient-error sub-test to lock that contract down too.

Copilot AI review requested due to automatic review settings June 19, 2026 07:06

Copilot started reviewing on behalf of namansh70747 June 19, 2026 07:07

coderabbitai Bot added the review-effort/medium Medium review effort (3): few files, moderate logic label Jun 19, 2026

Copilot AI reviewed Jun 19, 2026

coderabbitai Bot reviewed Jun 19, 2026

david-martin added the triage/needs-issue PR needs a linked issue label Jun 22, 2026

namansh70747 marked this pull request as draft June 23, 2026 09:07

namansh70747 mentioned this pull request Jun 23, 2026

broker: protocolValidation stays valid on the status after an upstream goes down #1184

Closed

jasonmadigan closed this Jun 23, 2026

jasonmadigan reopened this Jun 23, 2026

namansh70747 force-pushed the fix/stale-tool-count-on-status-error branch from 78565c0 to 9300947 Compare June 23, 2026 16:40

namansh70747 wants to merge 1 commit into Kuadrant:main from namansh70747:fix/stale-tool-count-on-status-error
