
httpserver: make /-/reload handler cancellation-safe#5234

Open
kimjune01 wants to merge 1 commit into
prometheus:mainfrom
kimjune01:fix/reload-handler-cancellation-safe

Conversation

@kimjune01

@kimjune01 kimjune01 commented May 12, 2026

Summary

  • Fix race condition in the /-/reload HTTP handler where a cancelled client request could leave the reload channel in a stuck state
  • Use select with context.Done() to detect client disconnection before sending on the reload channel
  • Drain the error channel on cancellation to prevent goroutine leaks

Fixes #5103

Test plan

  • New tests verify reload handler behavior under normal conditions and client cancellation
  • Existing httpserver tests pass unchanged

Summary by CodeRabbit

  • Bug Fixes

    • Improved reload endpoint reliability with better handling of client disconnections and request timeouts.
  • Tests

    • Added comprehensive reload handler tests covering success, error handling, and cancellation scenarios.

Review Change Stack

Fix three bugs in the reload handler identified in prometheus#5103:

1. Use a buffered error channel (size 1) so the reloader goroutine
   never blocks if the HTTP client disconnects before reading the
   result.

2. Remove the incorrect `defer close(errc)` — the handler is the
   receiver, not the sender; closing from the receiver side could
   panic the sender.

3. Wrap both the enqueue (send to reloadCh) and the reply (receive
   from errc) in select statements that also listen on
   req.Context().Done(), so a client disconnect unblocks the handler
   at either stage.

Fixes prometheus#5103

Signed-off-by: June Kim <kimjune01@gmail.com>
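
The first two points of the commit message rest on two channel rules in Go: a send on a buffered channel with free capacity completes without a receiver, and a send on a closed channel panics. The following is a minimal standalone sketch of those rules, not code from the PR; the helper names `sendWithoutReceiver` and `sendPanics` are hypothetical.

```go
package main

import "fmt"

// sendWithoutReceiver (hypothetical helper) mimics the reloader replying on
// errc when the handler has already gone away. It reports whether the send
// completed immediately instead of blocking.
func sendWithoutReceiver(errc chan error) bool {
	select {
	case errc <- nil:
		return true
	default:
		return false // the send would block: no receiver and no free buffer slot
	}
}

// sendPanics (hypothetical helper) reports whether sending on ch panics,
// which is what happens if the receiver side has closed the channel.
func sendPanics(ch chan error) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	ch <- nil
	return false
}

func main() {
	fmt.Println(sendWithoutReceiver(make(chan error)))    // false: unbuffered, nobody receiving
	fmt.Println(sendWithoutReceiver(make(chan error, 1))) // true: the buffer absorbs the reply

	closed := make(chan error, 1)
	close(closed)
	fmt.Println(sendPanics(closed)) // true: send on a closed channel panics
}
```

This is why a buffer of size 1 is enough here: each request creates its own `errc` and the reloader sends on it at most once.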
@kimjune01 kimjune01 requested a review from a team as a code owner May 12, 2026 08:48
@coderabbitai

coderabbitai Bot commented May 12, 2026

📝 Walkthrough

Walkthrough

The /-/reload POST handler is hardened against client disconnection deadlocks. A buffered error channel replaces the unsafe unbuffered approach, and context-aware select statements protect both the enqueue and result-receive checkpoints. Four new tests validate the handler behavior across success, error, and cancellation scenarios.

Changes

Reload Handler Cancellation Safety

| Layer / File(s) | Summary |
|---|---|
| Handler cancellation-safety implementation (httpserver/httpserver.go) | Handler creates a buffered errc and uses two select statements: one to send the channel to reloadCh (aborting with HTTP 422 on context cancellation) and one to receive the result (also aborting with HTTP 422 if context is done). The unsafe defer close(errc) is removed. |
| Test coverage for reload scenarios (httpserver/httpserver_test.go) | Imports context, errors, and time. Four tests verify: successful reload returning 200 OK, error propagation returning 500, client disconnect before enqueue returning 422, and client disconnect during reload allowing buffered send without blocking. |

Sequence Diagram

sequenceDiagram
  participant Client
  participant Handler
  participant ReloadQueue as reloadCh
  participant Reloader
  Client->>Handler: POST /-/reload
  Handler->>Handler: Create buffered errc chan
  alt Context done before send
    Handler-->>Client: 422 UnprocessableEntity
  else Send errc to queue
    Handler->>ReloadQueue: Send errc
    ReloadQueue->>Reloader: Receive errc
    alt Reloader completes
      Reloader->>Handler: Send result on errc
      Handler-->>Client: 200 OK (nil) or 500 (error)
    else Context done during reload
      Client-xHandler: Context cancelled
      Handler-->>Client: 422 UnprocessableEntity
    end
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: making the /-/reload handler safe for cancellation. |
| Description check | ✅ Passed | The description covers the problem, solution, test plan, and issue reference, matching the template requirements for a bugfix with tests. |
| Linked Issues check | ✅ Passed | The code changes directly address all objectives from issue #5103: using buffered channel, removing unsafe defer close, adding context-aware select statements, and comprehensive tests. |
| Out of Scope Changes check | ✅ Passed | All changes are focused on fixing the /-/reload handler cancellation issue and adding corresponding tests; no unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@httpserver/httpserver_test.go`:
- Around line 34-40: The tests spawn goroutines that call require.* (which calls
t.FailNow) — replace these with assert.* or move the assertions out of the
goroutine: capture the httptest.ResponseRecorder (w) and any response values
inside the goroutine, send them on a channel or close(done) and then after
<-done perform require.Equal(t, http.StatusOK, w.Code) in the main test
goroutine; update the four occurrences referencing the done channel,
router.ServeHTTP, httptest.NewRequest and httptest.NewRecorder so assertions run
in the test goroutine or use assert.* inside the goroutine instead.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 40886f45-c83a-456b-a48c-5b6f09629879

📥 Commits

Reviewing files that changed from the base of the PR and between 7806d6e and 508c0b3.

📒 Files selected for processing (2)
  • httpserver/httpserver.go
  • httpserver/httpserver_test.go

Comment on lines +34 to +40
	go func() {
		defer close(done)
		req := httptest.NewRequest("POST", "/-/reload", nil)
		w := httptest.NewRecorder()
		router.ServeHTTP(w, req)
		require.Equal(t, http.StatusOK, w.Code)
	}()

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

testify require FailNow from goroutine other than test goroutine

💡 Result:

In Go’s testing and in stretchr/testify, you must not call require.* / t.FailNow (which it uses under the hood) from a goroutine other than the goroutine running the test.

Evidence:

  • Go’s testing.T.FailNow documentation says FailNow “must be called from the goroutine running the test or benchmark function, not from other goroutines created during the test.” It calls runtime.Goexit in the current goroutine, so it only stops that goroutine, not the whole test. [1]
  • testify’s require package documentation states that require “terminate current test” and “must be called from the goroutine running the test or benchmark function, not from other goroutines created during the test.” [2]

What to do instead:

  • Use assert.* inside goroutines (it records an error via t.Errorf-style behavior rather than invoking FailNow). This is the common workaround suggested in testify discussions: “use assert in goroutines, not require.” [3]
  • Or funnel the result back to the main test goroutine (e.g., via a channel) and perform require/assert in that main goroutine, so FailNow happens on the correct goroutine. [4]

Why this matters:

  • If require calls t.FailNow from the wrong goroutine, runtime.Goexit will exit only that goroutine, so the expected assertion/termination semantics for the test can break (and can even lead to hangs in some patterns). [5][6]

Citations:


🏁 Script executed:

# Check the actual test file at the mentioned line ranges
cat -n httpserver/httpserver_test.go | sed -n '30,70p'

Repository: prometheus/alertmanager

Length of output: 1246


🏁 Script executed:

# Also check the other two ranges
cat -n httpserver/httpserver_test.go | sed -n '82,125p'

Repository: prometheus/alertmanager

Length of output: 1618


Use assert.* or move assertions outside goroutines.

require.* calls t.FailNow(), which per Go's testing documentation must be called from the goroutine running the test—calling it from another goroutine exits only that goroutine via runtime.Goexit(), leaving the test in a misleading state instead of properly failing.
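
The underlying mechanism can be seen without the testing package at all. This is a small illustrative sketch (the function name `goexitStopsOnlyCaller` is hypothetical) showing that `runtime.Goexit` ends only the goroutine that calls it, runs that goroutine's deferred calls, and leaves the caller running:

```go
package main

import (
	"fmt"
	"runtime"
)

// goexitStopsOnlyCaller (hypothetical helper) starts a goroutine that calls
// runtime.Goexit and reports whether the statement after Goexit ever ran.
func goexitStopsOnlyCaller() (reachedAfter bool) {
	done := make(chan struct{})
	go func() {
		defer close(done) // deferred calls still run when Goexit is called
		runtime.Goexit()
		reachedAfter = true // never executed
	}()
	<-done
	return reachedAfter
}

func main() {
	fmt.Println(goexitStopsOnlyCaller()) // false
	fmt.Println("caller still running")  // the calling goroutine was not stopped
}
```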

All four new tests have this issue (lines 34–40, 58–65, 86–92, 115–121). Fix by either:

  1. Moving assertions outside the goroutine and capturing response objects in the main goroutine after <-done, or
  2. Using assert.* inside the goroutines to record failures without violating the threading contract.


Development

Successfully merging this pull request may close these issues.

weboperations: make /-/reload handler cancellation-safe

1 participant