fix(cli): kill spawned claude binary when launcher exits (orphan / 'two cursors' bug) #1190

Open
johnmarktaylor91 wants to merge 1 commit into slopus:main from johnmarktaylor91:fix/launcher-orphan-claude-on-parent-death


Symptom

When happy is installed against a binary claude (Homebrew or the native installer at ~/.local/share/claude/versions/<v>/), users intermittently report that typing produces characters at two alternating cursor positions in the same pane, making input impossible. Often described as "two cursors" or "tmux is broken." It isn't tmux.

Root cause

runClaudeCli in packages/happy-cli/scripts/claude_version_utils.cjs has two branches:

  • .js/.cjs claude: import(importUrl) -- runs in-process, no orphan possible.
  • Binary claude: spawn(cliPath, args, { stdio: 'inherit' }) with only a child.on('exit', ...) handler. No signal forwarders, no parent-death detection.

When the launcher process dies for any reason -- terminal SIGHUP on pane close, kill -TERM, an OOM, an uncaught exception in the launcher's fetch interceptor, or the parent (happy) crashing without sending signals down the chain -- the spawned claude binary keeps running. Linux reparents it to PID 1, but it stays in the TTY's foreground process group (STAT=Sl+).

Next time happy launches a session in the same pane, a second claude binds to the same /dev/pts/N. Both are in the foreground PG, so the kernel race-distributes each keystroke between them. That's the "two cursors" symptom.

Reproduction on a machine with a binary claude install:

ps -eo pid,ppid,stat,tty,cmd | grep -E 'claude/versions' | grep -v grep
# look for: PPID=1, STAT=Sl+, multiple instances on same pts/N
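
The same check can be scripted. The following is a minimal sketch, not part of this PR: findOrphans is a hypothetical helper that parses output in the `ps -eo pid,ppid,stat,tty,cmd` column layout and groups, per TTY, the processes that match all three conditions above (PPID=1, a `+` in STAT, command path containing claude/versions).

```javascript
// Hypothetical helper (not in the PR): group suspected orphans by TTY.
// Input is the text printed by `ps -eo pid,ppid,stat,tty,cmd`.
function findOrphans(psOutput, pattern = /claude\/versions/) {
  const byTty = new Map();
  for (const line of psOutput.split('\n')) {
    // Columns: PID, PPID, STAT, TTY, then the full command line.
    const m = line.trim().match(/^(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(.*)$/);
    if (!m) continue; // skips the header row and blank lines
    const [, pid, ppid, stat, tty, cmd] = m;
    const orphaned = Number(ppid) === 1;   // reparented to PID 1
    const foreground = stat.includes('+'); // in the TTY's foreground process group
    if (orphaned && foreground && pattern.test(cmd)) {
      if (!byTty.has(tty)) byTty.set(tty, []);
      byTty.get(tty).push(Number(pid));
    }
  }
  return byTty;
}
```

To run it against the live process table, pass it the string returned by `require('node:child_process').execFileSync('ps', ['-eo', 'pid,ppid,stat,tty,cmd'], { encoding: 'utf8' })`. Any TTY that appears in the result while a fresh session is also running in that pane is a candidate for the symptom.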

Fix

In the binary branch of runClaudeCli:

  1. Forward SIGINT/SIGTERM/SIGHUP/SIGQUIT from launcher to the spawned child.
  2. On launcher 'exit', send SIGTERM to the child (covers natural exit).
  3. Run a 1Hz ppidWatch interval that fires when the launcher's process.ppid === 1, i.e. when the launcher itself has been reparented to PID 1 (covers ungraceful parent death where no signal is propagated). The interval is unref()'d so it doesn't keep the event loop alive after normal child exit.

21 lines added, no behavior change for the existing happy path.
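
For readers who want the shape of the change without opening the diff, here is a rough sketch of the three steps. It is not the patch itself: guardChild, runBinary, checkParent and FORWARDED_SIGNALS are names made up for this illustration, the `proc` parameter exists only so the logic can be exercised with a stand-in for `process`, and the listener cleanup on child exit is an extra that may or may not be in the actual change.

```javascript
const { spawn } = require('node:child_process');

const FORWARDED_SIGNALS = ['SIGINT', 'SIGTERM', 'SIGHUP', 'SIGQUIT'];

// Hypothetical helper: tie a spawned child's lifetime to the launcher's.
function guardChild(child, proc = process) {
  const killChild = (signal) => {
    // Only signal a child that has not exited yet.
    if (child.exitCode === null && child.signalCode === null) {
      try { child.kill(signal); } catch { /* child already gone */ }
    }
  };

  // 1. Forward termination signals from the launcher to the child.
  const forwarders = FORWARDED_SIGNALS.map((sig) => {
    const handler = () => killChild(sig);
    proc.on(sig, handler);
    return [sig, handler];
  });

  // 2. On launcher exit, send SIGTERM to the child.
  const onExit = () => killChild('SIGTERM');
  proc.on('exit', onExit);

  // 3. Poll once per second for reparenting to PID 1.
  const checkParent = () => {
    if (proc.ppid === 1) {
      killChild('SIGTERM');
      clearInterval(ppidWatch);
    }
  };
  const ppidWatch = setInterval(checkParent, 1000);
  ppidWatch.unref(); // never keeps the event loop alive by itself

  // Assumption: tidy up once the child is gone.
  child.on('exit', () => {
    clearInterval(ppidWatch);
    for (const [sig, handler] of forwarders) proc.removeListener(sig, handler);
    proc.removeListener('exit', onExit);
  });

  return { killChild, checkParent };
}

// Illustrative wiring that mirrors the spawn pattern described above.
function runBinary(cliPath, args) {
  const child = spawn(cliPath, args, { stdio: 'inherit' });
  guardChild(child);
  child.on('exit', (code) => process.exit(code ?? 1));
  return child;
}
```

Once handlers are registered for these signals, Node no longer applies their default terminate action, so in this sketch the launcher exits through the existing child 'exit' handler after the child dies.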

Verified locally

Synthetic launcher mirroring runClaudeCli's spawn pattern, with a long-lived /bin/sleep as the fake child:

| Test | Scenario | Result |
|------|----------|--------|
| 1 | UNPATCHED control: kill -TERM launcher | child alive, PPID=1 (bug reproduced) |
| 2 | PATCHED + kill -TERM launcher | child dies |
| 3 | PATCHED + kill -HUP launcher (terminal-close case) | child dies |
| 4 | PATCHED + kill -KILL launcher's parent shell | child dies within 1s via ppid watch |
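
A harness along these lines could look roughly like the sketch below. It is a reconstruction under assumptions, not the script used for the results above: isAlive, launcherSource and runScenario are hypothetical names, the stand-in launcher is an inline `node -e` script, only the signal-forwarding part of the patch is mirrored, and the 200 ms settle delay is arbitrary. Scenario 4 (killing the launcher's parent shell) is not covered.

```javascript
const { spawn } = require('node:child_process');

// True if a process with this pid still exists (signal 0 only probes).
function isAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return err.code === 'EPERM'; // exists, but owned by another user
  }
}

// Source for a stand-in launcher: spawns /bin/sleep, prints its pid and,
// when `patched` is true, forwards SIGTERM/SIGHUP to it.
function launcherSource(patched) {
  return `
    const { spawn } = require('node:child_process');
    const child = spawn('/bin/sleep', ['60'], { stdio: 'inherit' });
    child.on('exit', (code) => process.exit(code ?? 1));
    ${patched ? "for (const s of ['SIGTERM', 'SIGHUP']) process.on(s, () => child.kill(s));" : ''}
    console.log(child.pid);
  `;
}

// Start the launcher, signal it, then report whether the child survived.
async function runScenario({ patched, signal }) {
  const launcher = spawn(process.execPath, ['-e', launcherSource(patched)], {
    stdio: ['ignore', 'pipe', 'inherit'],
  });
  const childPid = await new Promise((resolve) => {
    launcher.stdout.once('data', (buf) => resolve(Number(String(buf).trim())));
  });
  launcher.kill(signal);
  await new Promise((resolve) => launcher.once('exit', resolve));
  await new Promise((resolve) => setTimeout(resolve, 200)); // let the child react
  const childAlive = isAlive(childPid);
  if (childAlive) process.kill(childPid, 'SIGKILL'); // clean up the orphan
  return { childPid, childAlive };
}
```

Calling `runScenario({ patched: false, signal: 'SIGTERM' })` corresponds to row 1 of the table, and `patched: true` with 'SIGTERM' or 'SIGHUP' to rows 2 and 3; the returned `childAlive` flag is what would be asserted on in a test suite.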

Notes

  • Doesn't change the .js/.cjs import-path branch, since orphans are impossible there.
  • Could alternatively use prctl(PR_SET_PDEATHSIG) via FFI for instant detection, but that's Linux-only and adds a native dep. The 1Hz polling is portable and adds negligible CPU.
  • I'm happy to add a test if you want one -- the synthetic harness above is straightforward to port into the test suite.

The binary-file branch of runClaudeCli spawns claude with stdio:'inherit'
and propagates child->parent exit, but installs no parent->child signal
handlers and no parent-death detection. When the launcher process dies
(terminal close, happy crash, kill -TERM, OOM, etc.) the spawned claude
process keeps running, gets reparented to PID 1, and stays bound to the
TTY's foreground process group.

The next time happy starts a session in the same pane, a second claude
joins the same /dev/pts/N. Both have '+' in STAT (Sl+) meaning both are
in the foreground process group; the kernel race-distributes each
keystroke between them, producing the 'two cursors' / 'characters
alternating' symptom users have reported. Typing becomes impossible
until the orphan is killed manually.

Fix: in runClaudeCli's binary branch, install signal forwarders for
SIGINT/SIGTERM/SIGHUP/SIGQUIT, an exit handler that sends SIGTERM to
the child, and a 1Hz ppid watcher that fires when the launcher itself
gets reparented to PID 1 (catches the case where the launcher's parent
dies ungracefully without sending a signal down the chain). The
ppid-watch interval is unref()'d so it doesn't keep the event loop
alive past normal child exit.

Verified locally with a synthetic launcher that mirrors runClaudeCli's
spawn pattern. Without the patch: SIGTERM to launcher leaves child
alive with PPID=1. With the patch: child dies on SIGTERM, SIGHUP, or
parent SIGKILL within ~1s.