Files
hermes-webui/api
nesquena-hermes 71ba863ce5 fix(terminal): drop PR_SET_PDEATHSIG preexec_fn that killed every Linux shell
Fixes #2853. The `_terminal_shell_preexec_fn` added in `71d8a8fb` called
`prctl(PR_SET_PDEATHSIG, SIGTERM)` so orphaned PTY shells would die when
the WebUI process crashed. But that signal is **per-thread**, not
per-process, and WebUI runs `ThreadingHTTPServer`: every HTTP request is
handled in its own short-lived worker thread.

Flow that broke every Linux user:

1. User clicks the terminal toggle → frontend hits `POST /api/terminal/start`.
2. ThreadingHTTPServer spins up a worker thread to handle that one request.
3. The worker thread calls `subprocess.Popen(..., preexec_fn=...)`.
4. The shell calls `prctl(PR_SET_PDEATHSIG, SIGTERM)` in its preexec_fn.
   Its registered "parent" is now the WebUI worker thread that called Popen.
5. The handler returns its JSON response and the worker thread exits.
6. The kernel sees the pdeathsig-parent thread has died and sends SIGTERM
   to the PTY shell. The shell dies within ~10 ms of being created.
7. The reader loop sees EIO on the master FD, emits `terminal_closed`, and
   the frontend writes `[terminal closed]`.

macOS users were unaffected because `libc.prctl` doesn't exist there —
`ctypes.CDLL(None)` returns a libc handle, `libc.prctl` raises
`AttributeError`, the bare-`except` swallows it, and the shell starts
with no pdeathsig configured.

Empirical verification on this Linux host (real PTY + `subprocess.Popen`
inside a `threading.Thread` that joins immediately):

  with    preexec_fn → proc.poll() == -15 (SIGTERM), master FD returns EIO
  without preexec_fn → proc.poll() == None (alive), master FD returns "HELLO\\r\\n"

Same shell, same PTY, same threading topology as WebUI.

Fix
---

Drop the `preexec_fn` entirely. The orphan-shell-on-crash case the original
PR was navigating is rare for self-hosted single-user installs, and the
existing `atexit.register(close_all_terminals)` + explicit `close_terminal`
paths cover graceful shutdown. A future fix (option B in the issue) can
re-introduce pdeathsig pinned to a long-lived supervisor thread, but that
is a follow-up — this PR is the smallest unbricks-Linux-today change.

Tests
-----

- Invert `test_terminal_shell_uses_parent_death_signal_preexec` →
  `test_terminal_shell_does_not_use_pdeathsig_preexec`: asserts
  `preexec_fn` is NOT in the Popen kwargs.
- Add `test_pty_shell_survives_when_spawning_thread_exits`: spawns a
  real PTY shell via `start_terminal` from a worker thread, waits for
  the worker to join, asserts the shell is still alive after a half-second
  grace window. This is the contract the original tests never exercised.
- Update `test_terminal_module_registers_graceful_shutdown_reaper` to
  refuse re-introduction of the preexec_fn or the `libc.prctl(1, SIGTERM)`
  call (treats either as a regression).

All 27 terminal-related tests pass locally.

Refs #2853
2026-05-24 17:13:31 +00:00
..
2026-05-18 07:14:26 +08:00
2026-05-24 15:52:34 +00:00
2026-04-29 19:54:07 -07:00
2026-05-20 22:24:40 +00:00
2026-05-15 16:39:45 -07:00
2026-05-24 15:52:34 +00:00