playlist-proxy stopPlaylistProxy doesn't prevent queued reconnects #1132

Description

@jakebromberg

Problem

The EventSource 'error' handler in playlist-proxy.service.ts unconditionally schedules a reconnect timer with no guard against stopPlaylistProxy having been called. Two failure modes:

  1. If stopPlaylistProxy() runs before a pending 'error' event is delivered, the handler still schedules reconnectTimer = setTimeout(() => connectSSE(), reconnectDelay). stopPlaylistProxy has already cleared whatever reconnectTimer held at that point, so nothing cancels the new timer; when it fires, a fresh SSE connection opens with no operator instruction to do so.

  2. If multiple 'error' events fire before any reconnect runs (rare but possible during cascading TCP failure), each handler reassigns reconnectTimer to a fresh setTimeout — but the prior setTimeout handle is still scheduled. Each prior setTimeout invokes connectSSE() independently. The stopPlaylistProxy clearTimeout(reconnectTimer) only clears the most recent handle. Stacked parallel SSE connections result.

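The second pattern also escalates reconnectDelay toward MAX_RECONNECT_DELAY faster than intended: Math.min(reconnectDelay * 2, MAX_RECONNECT_DELAY) runs on every error, even for errors whose timers are superseded or cancelled right away. The clamp keeps the delay from exceeding the cap, but a short burst of errors can push it to the cap without a single reconnect having been attempted.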

Evidence

apps/backend/services/playlist-proxy.service.ts:240-246:

es.addEventListener('error', () => {
  console.error(`[playlist-proxy] SSE error, reconnecting in ${reconnectDelay}ms`);
  es.close();
  if (currentEventSource === es) currentEventSource = null;
  reconnectTimer = setTimeout(() => connectSSE(), reconnectDelay);
  reconnectDelay = Math.min(reconnectDelay * 2, MAX_RECONNECT_DELAY);
});

apps/backend/services/playlist-proxy.service.ts:168-178 (stop path only clears the latest handle):

export function stopPlaylistProxy(): void {
  if (currentEventSource) { currentEventSource.close(); currentEventSource = null; }
  if (reconnectTimer) { clearTimeout(reconnectTimer); reconnectTimer = null; }
  connected = false;
}

Reproduction

Pattern 1: call stopPlaylistProxy() while an 'error' event is still queued in the event loop. Pattern 2: a pre-deploy disconnect cascade in which multiple 'error' events fire in rapid succession before any reconnect runs.
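
Both patterns can be modelled in isolation, without a real EventSource. The sketch below is not the service code: FakeTimers, simulateUnguarded, and the numeric delays are hypothetical stand-ins chosen for illustration, and only the scheduling logic mirrors the handler and stop path quoted under Evidence.

```typescript
// Hypothetical in-memory replacement for setTimeout/clearTimeout so the
// behaviour can be observed synchronously. Not part of the service.
class FakeTimers {
  private nextId = 1;
  private pending = new Map<number, () => void>();

  set(fn: () => void): number {
    const id = this.nextId++;
    this.pending.set(id, fn);
    return id;
  }

  clear(id: number): void {
    this.pending.delete(id);
  }

  // Fire every timer that is still scheduled.
  runAll(): void {
    const callbacks = [...this.pending.values()];
    this.pending.clear();
    for (const fn of callbacks) fn();
  }
}

const MAX_RECONNECT_DELAY = 30_000; // assumed value, for illustration only

type Step = 'error' | 'stop';

// Replays a sequence of events against the unguarded logic and returns
// how many times a reconnect (connectSSE in the service) would run.
function simulateUnguarded(steps: Step[]): number {
  const timers = new FakeTimers();
  const state = {
    reconnectTimer: null as number | null,
    reconnectDelay: 1_000, // assumed initial delay
    connects: 0,
  };

  for (const step of steps) {
    if (step === 'error') {
      // Same shape as the quoted handler: reassign the handle, never clear the old one.
      state.reconnectTimer = timers.set(() => { state.connects++; });
      state.reconnectDelay = Math.min(state.reconnectDelay * 2, MAX_RECONNECT_DELAY);
    } else if (state.reconnectTimer !== null) {
      // Same shape as stopPlaylistProxy: only the latest handle is cleared.
      timers.clear(state.reconnectTimer);
      state.reconnectTimer = null;
    }
  }

  timers.runAll();
  return state.connects;
}

console.log(simulateUnguarded(['stop', 'error']));                   // 1
console.log(simulateUnguarded(['error', 'error', 'error', 'stop'])); // 2
```

In both runs the expected number of reconnects after a stop is 0; the first run shows the late error slipping past the stop, the second shows the two orphaned handles firing.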

Acceptance criteria

  • Introduce a module-level stopped flag; set in stopPlaylistProxy, cleared in startPlaylistProxy/connectSSE. The 'error' handler must check it before scheduling.
  • Before assigning a new reconnectTimer, clearTimeout any prior value (prevents stacking).
  • Only escalate reconnectDelay when a reconnect is actually scheduled (move the line inside the guard).
  • Unit test: cancellation during pending-reconnect; multiple-error stacking.
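
One possible shape for the first three criteria is sketched below. It is an assumption-laden illustration, not a patch: createReconnector, TimerApi, and the default delays are hypothetical names and values, and the real module keeps this state in module-level variables rather than a closure. The timer functions are injected only so that the unit tests in the last criterion can substitute fakes.

```typescript
// Hypothetical seam for tests; in the service this would simply wrap
// setTimeout and clearTimeout.
interface TimerApi<H> {
  set(fn: () => void, ms: number): H;
  clear(handle: H): void;
}

function createReconnector<H>(
  timers: TimerApi<H>,
  connect: () => void,   // stands in for connectSSE
  initialDelay = 1_000,  // assumed value
  maxDelay = 30_000,     // assumed value
) {
  let stopped = false;
  let reconnectTimer: H | null = null;
  let reconnectDelay = initialDelay;

  return {
    // Scheduling part of the 'error' handler.
    onError(): void {
      // Criterion 1: never schedule once stop has been requested.
      if (stopped) return;

      // Criterion 2: cancel any earlier handle so timers cannot stack.
      if (reconnectTimer !== null) timers.clear(reconnectTimer);

      reconnectTimer = timers.set(() => {
        reconnectTimer = null;
        connect();
      }, reconnectDelay);

      // Criterion 3: escalate only when a reconnect was actually scheduled.
      reconnectDelay = Math.min(reconnectDelay * 2, maxDelay);
    },

    // Counterpart of startPlaylistProxy: re-enable scheduling.
    start(): void {
      stopped = false;
    },

    // Counterpart of stopPlaylistProxy.
    stop(): void {
      stopped = true;
      if (reconnectTimer !== null) {
        timers.clear(reconnectTimer);
        reconnectTimer = null;
      }
    },

    // Exposed for tests only.
    get delay(): number {
      return reconnectDelay;
    },
  };
}
```

With this shape, a test for "multiple-error stacking" can call onError() several times against a fake TimerApi and assert that at most one timer is pending, and a test for "cancellation during pending-reconnect" can call stop() and assert that none remain and that a later onError() schedules nothing.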

Related

  • The file's existing self-documenting comment block (lines 189-200) about why there is no app-level heartbeat — relevant context for whoever fixes this.

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working), resilience (Prevents prod regressions or surfaces them earlier)
