mirror of
https://github.com/nesquena/hermes-webui.git
synced 2026-05-25 19:20:16 +00:00
# Running Hermes Web UI under a process supervisor

Use a process supervisor (launchd, systemd, supervisord, runit, s6) when you
want the Web UI to start at boot, restart on crash, or be managed alongside
other services.

## TL;DR

Pass ``--foreground`` to ``bootstrap.py`` (or ``bash start.sh``):

```bash
bash start.sh --foreground
```

Or set ``HERMES_WEBUI_FOREGROUND=1`` in the environment. The Web UI will
auto-detect launchd / systemd / supervisord even without the flag, but being
explicit is safer.

## Why ``--foreground`` matters

Without it, ``bootstrap.py`` does this:

1. Spawn ``server.py`` as a detached subprocess (``start_new_session=True``)
2. Probe ``/health`` until the server is up
3. Exit 0

That works for an interactive shell run (``./start.sh`` returns to your
prompt with the server alive in the background). It is **broken** under any
process supervisor: the supervisor sees its tracked PID exit, marks the job
as completed, and respawns ``bootstrap.py``. The respawn fails to bind port
8787 (the orphaned server still holds it) and exits non-zero, so the
supervisor respawns it again, in an endless loop.

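A minimal Python sketch of that default (non-foreground) flow, for illustration only: `wait_for_health` and `start_detached` are hypothetical names, and the real `bootstrap.py` may resolve the host, port, and interpreter differently. What it shows is why supervisors get confused: the server runs under its own PID and session, while the PID the supervisor tracks (the wrapper) exits as soon as `/health` answers.

```python
import subprocess
import time
import urllib.request
from typing import Sequence


def wait_for_health(url: str, timeout_s: float = 30.0, interval_s: float = 0.25) -> bool:
    """Poll `url` until it answers HTTP 200 or `timeout_s` seconds pass."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=1.0) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, timeouts and HTTP errors all land here
            # (URLError and HTTPError subclass OSError): not up yet.
            pass
        time.sleep(interval_s)
    return False


def start_detached(argv: Sequence[str], health_url: str) -> int:
    """Default mode: spawn the server detached, probe it, return an exit code."""
    # start_new_session=True gives the server its own session (setsid), so it
    # keeps running after this wrapper exits. A supervisor tracks this
    # wrapper's PID, which disappears as soon as the caller exits with 0.
    subprocess.Popen(list(argv), start_new_session=True)
    return 0 if wait_for_health(health_url) else 1
```

Called as `start_detached([python, "server.py"], "http://127.0.0.1:8787/health")`, it returns 0 once the server is healthy, which is the early exit a supervisor misreads as the job finishing.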
In foreground mode, ``bootstrap.py`` does its setup work and then calls
``os.execv`` to replace its own process with ``server.py``. The supervisor
sees the long-lived server as the original child (it keeps the same PID), so
``KeepAlive=true`` / ``Restart=always`` work correctly.

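The foreground handoff can be sketched the same way. `exec_server` and its `exec_fn` parameter (a seam so the logic can be exercised without actually replacing the process) are illustrative assumptions, not the project's API. The sketch checks that the interpreter is an executable file before calling `os.execv`, so a bad path is reported as one clear `RuntimeError` rather than as an `OSError` from inside `execv`.

```python
import os
import sys
from typing import Callable, List, Sequence

ExecFn = Callable[[str, Sequence[str]], None]


def exec_server(
    server_script: str,
    python_exe: str = sys.executable,
    exec_fn: ExecFn = os.execv,
) -> None:
    """Foreground mode: replace this process with server.py (same PID)."""
    argv: List[str] = [python_exe, server_script]
    # Guard before exec: os.access alone would also accept a directory, so
    # require a regular file that is executable.
    if not (os.path.isfile(python_exe) and os.access(python_exe, os.X_OK)):
        raise RuntimeError(f"launcher python is missing or not executable: {python_exe}")
    # os.execv does not return on success: the PID the supervisor tracks now
    # runs server.py, so KeepAlive / Restart= watch the real server.
    exec_fn(python_exe, argv)
```

In production the default `exec_fn=os.execv` is used and nothing after the call ever runs; a test can pass a recording stub instead.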
## launchd (macOS)

``~/Library/LaunchAgents/com.example.hermes-webui.plist``:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.hermes-webui</string>

    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>/Users/yourname/hermes-webui/start.sh</string>
        <string>--foreground</string>
    </array>

    <key>WorkingDirectory</key>
    <string>/Users/yourname/hermes-webui</string>

    <key>RunAtLoad</key>
    <true/>

    <key>KeepAlive</key>
    <true/>

    <key>StandardOutPath</key>
    <string>/Users/yourname/.hermes/webui/launchd-stdout.log</string>

    <key>StandardErrorPath</key>
    <string>/Users/yourname/.hermes/webui/launchd-stderr.log</string>

    <key>EnvironmentVariables</key>
    <dict>
        <key>HOME</key>
        <string>/Users/yourname</string>
        <key>PATH</key>
        <string>/usr/local/bin:/usr/bin:/bin</string>
    </dict>
</dict>
</plist>
```

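If you'd rather not hand-edit the `yourname` paths, the same plist can be generated with Python's standard `plistlib` module. This is a sketch: `build_plist` is a hypothetical helper, and the label and `~/hermes-webui` checkout location are the same placeholders used in the plist above.

```python
import plistlib
from pathlib import Path

# Placeholder label, matching the example plist; change it for your install.
LABEL = "com.example.hermes-webui"


def build_plist(home: Path, checkout: Path) -> bytes:
    """Return the launchd job definition above as XML plist bytes."""
    log_dir = home / ".hermes" / "webui"
    job = {
        "Label": LABEL,
        "ProgramArguments": ["/bin/bash", str(checkout / "start.sh"), "--foreground"],
        "WorkingDirectory": str(checkout),
        "RunAtLoad": True,
        "KeepAlive": True,
        "StandardOutPath": str(log_dir / "launchd-stdout.log"),
        "StandardErrorPath": str(log_dir / "launchd-stderr.log"),
        "EnvironmentVariables": {
            "HOME": str(home),
            "PATH": "/usr/local/bin:/usr/bin:/bin",
        },
    }
    # plistlib writes XML by default and serializes True as <true/>.
    return plistlib.dumps(job)
```

Write the result to `~/Library/LaunchAgents/com.example.hermes-webui.plist`, e.g. with `Path.write_bytes(build_plist(Path.home(), Path.home() / "hermes-webui"))`, then load it with `launchctl`.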
Load:

```bash
launchctl load ~/Library/LaunchAgents/com.example.hermes-webui.plist
launchctl print gui/$(id -u)/com.example.hermes-webui  # check state
```

Reload after editing the plist:

```bash
launchctl unload ~/Library/LaunchAgents/com.example.hermes-webui.plist
launchctl load ~/Library/LaunchAgents/com.example.hermes-webui.plist
```

launchd sets ``XPC_SERVICE_NAME`` to the job's Label automatically, so even
without the ``--foreground`` argument the Web UI will auto-promote to
foreground mode, as long as the Label is an ordinary reverse-DNS name (see
the noise filter below). The flag is still recommended as documentation of
intent.

## systemd (Linux)

``~/.config/systemd/user/hermes-webui.service``:

```ini
[Unit]
Description=Hermes Web UI
After=network.target

[Service]
Type=simple
WorkingDirectory=%h/hermes-webui
ExecStart=/bin/bash %h/hermes-webui/start.sh --foreground
Restart=on-failure
RestartSec=5

# Optional: route stdout/stderr to journald instead of files
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=default.target
```

Enable + start:

```bash
systemctl --user daemon-reload
systemctl --user enable --now hermes-webui.service
journalctl --user -u hermes-webui.service -f
```

systemd sets ``INVOCATION_ID`` and ``JOURNAL_STREAM`` (the latter only when
stdio is wired to the journal); either one auto-promotes to foreground mode.

## supervisord (cross-platform)

``/etc/supervisor/conf.d/hermes-webui.conf``:

```ini
[program:hermes-webui]
command=/bin/bash /home/youruser/hermes-webui/start.sh --foreground
directory=/home/youruser/hermes-webui
user=youruser
autostart=true
autorestart=true
stopsignal=TERM
stopwaitsecs=10
stdout_logfile=/var/log/hermes-webui.out.log
stderr_logfile=/var/log/hermes-webui.err.log
environment=HOME="/home/youruser",PATH="/usr/local/bin:/usr/bin:/bin"
```

Reload + start:

```bash
sudo supervisorctl reread
sudo supervisorctl update
sudo supervisorctl status hermes-webui
```

supervisord sets ``SUPERVISOR_ENABLED``, which auto-promotes to foreground
mode.

## Auto-detected env vars (full list)

These trigger ``--foreground`` behavior even when the flag is not passed:

| Env var | Set by | Notes |
|---|---|---|
| ``INVOCATION_ID`` | systemd | Set on every service activation |
| ``JOURNAL_STREAM`` | systemd | Set when stdio is wired to journald |
| ``NOTIFY_SOCKET`` | systemd ``Type=notify`` / s6 | sd_notify-style notification socket |
| ``XPC_SERVICE_NAME`` | launchd | Set to the plist Label; the noise values ``0`` and ``application.*`` are ignored (see below) |
| ``SUPERVISOR_ENABLED`` | supervisord | Always set under supervisord |
| ``HERMES_WEBUI_FOREGROUND`` | you | Explicit opt-in; accepts ``1`` / ``true`` / ``yes`` / ``on`` |

### XPC_SERVICE_NAME noise filter

macOS launchd sets ``XPC_SERVICE_NAME`` in **every Terminal-spawned shell**,
not just real services. Typical noise values:

- ``0``: set on launchd descendants generally
- ``application.com.apple.Terminal.<UUID>``: Terminal.app shells
- ``application.com.googlecode.iterm2``: iTerm2
- ``application.com.microsoft.VSCode``: VSCode integrated terminal

A bare existence check on this var would auto-promote interactive
``./start.sh`` runs to foreground mode on every Mac dev machine, breaking
the most common installation path (no ``/health`` probe, no browser open,
and a shell that never returns to the prompt). We narrow detection by
ignoring ``0`` and any value starting with ``application.``, so only launchd
**Label-style** names (conventionally reverse-DNS, like ``com.example.foo``)
trigger it. Real launchd plists use this form. If you ever see
``XPC_SERVICE_NAME=0`` in your service environment, the auto-detect will
ignore it; set ``HERMES_WEBUI_FOREGROUND=1`` or pass ``--foreground``
explicitly to be safe.

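Putting the table and the noise filter together, the detection logic amounts to roughly the following. This is a hedged sketch: the function names, the check order, and the return values are assumptions, and `bootstrap.py` may differ in detail (for example, in how it treats `HERMES_WEBUI_FOREGROUND=0`).

```python
import os
from typing import Mapping, Optional

# (env var, supervisor) pairs mirroring the table above; order is assumed.
SUPERVISOR_ENV_VARS = (
    ("INVOCATION_ID", "systemd"),
    ("JOURNAL_STREAM", "systemd"),
    ("NOTIFY_SOCKET", "systemd/s6"),
    ("XPC_SERVICE_NAME", "launchd"),
    ("SUPERVISOR_ENABLED", "supervisord"),
)
TRUTHY = {"1", "true", "yes", "on"}


def is_real_supervisor_value(name: str, value: Optional[str]) -> bool:
    """Return True if `value` looks like a genuine supervisor marker."""
    if not value:
        return False
    if name == "XPC_SERVICE_NAME":
        # Terminal-spawned shells get "0" or "application.<bundle id>...";
        # a real launchd job gets its Label, e.g. "com.example.hermes-webui".
        return value != "0" and not value.startswith("application.")
    return True


def detect_foreground(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return what triggered foreground mode, or None for interactive mode."""
    # In this sketch, values outside TRUTHY simply fall through to auto-detection.
    if env.get("HERMES_WEBUI_FOREGROUND", "").strip().lower() in TRUTHY:
        return "explicit"
    for name, supervisor in SUPERVISOR_ENV_VARS:
        if is_real_supervisor_value(name, env.get(name)):
            return supervisor
    return None
```

With no other supervisor variables set, a Terminal.app shell (`XPC_SERVICE_NAME=0` or `application.*`) yields `None` here, while a launchd job labeled `com.example.hermes-webui` yields `"launchd"`.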
### Supervisors that are NOT auto-detected

The following set no env var that we can reliably detect. Pass
``--foreground`` (or ``HERMES_WEBUI_FOREGROUND=1``) explicitly:

- **runit** (pure runit chains, without sd_notify)
- **daemontools** / ``svc``
- **PM2** (Node.js process manager occasionally repurposed for Python)
- **Foreman** / **Honcho** (Procfile-style)
- **Docker** with a custom CMD entrypoint that doesn't already use ``exec``
- **Custom shell-script supervisors** that fork-and-wait

If your supervisor isn't in the auto-detect list and you see the orphan-PID
respawn loop, set ``HERMES_WEBUI_FOREGROUND=1`` in the service environment.

## Diagnostic recipe

If the Web UI keeps getting respawned and you suspect the double-fork loop:

```bash
# Check the running PID for the server
lsof -iTCP:8787 -sTCP:LISTEN

# Get its parent: should be the supervisor itself, NOT init (PID 1),
# unless the supervisor is PID 1 (see the note below).
# "args" is the POSIX column name, so this works on both Linux and macOS ps.
PID=$(lsof -tiTCP:8787 -sTCP:LISTEN)
ps -p "$PID" -o pid,ppid,args
ps -p "$(ps -o ppid= -p "$PID" | tr -d ' ')" -o pid,args
```

A healthy foreground-mode setup looks like:

```
  PID  PPID COMMAND
12345  6789 /path/to/python /path/to/server.py
 6789     1 /usr/lib/systemd/systemd --user   # or supervisord, etc.
```

If PPID is ``1`` (init) when it should be the supervisor, the orphan-server
loop is happening; re-check that ``--foreground`` (or one of the env vars)
is reaching the process.

A PPID of ``1`` is only a red flag when the supervisor is not PID 1 itself.
On macOS, launchd *is* PID 1, so a healthy launchd job also shows PPID ``1``
(the same holds for system-level systemd units). There, compare the
listener's PID with the PID that
``launchctl print gui/$(id -u)/com.example.hermes-webui`` reports for the job.

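To check whether the variables actually reach the server on Linux, you can read the process's initial environment from `/proc/<pid>/environ`. A Linux-only sketch (macOS has no `/proc`): `parse_environ` and `foreground_vars_of` are hypothetical helpers, and reading another user's process requires matching privileges. The file reflects the environment the process was exec'd with, which for a foreground-mode server is what `bootstrap.py` passed along to `os.execv`.

```python
from typing import Dict

# Env vars from the auto-detect table that can switch on foreground mode.
WATCHED = (
    "HERMES_WEBUI_FOREGROUND",
    "INVOCATION_ID",
    "JOURNAL_STREAM",
    "NOTIFY_SOCKET",
    "XPC_SERVICE_NAME",
    "SUPERVISOR_ENABLED",
)


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Parse the NUL-separated KEY=VALUE records of /proc/<pid>/environ."""
    env: Dict[str, str] = {}
    for record in raw.split(b"\0"):
        if b"=" not in record:
            continue  # skips the empty trailing record and malformed entries
        key, _, value = record.partition(b"=")  # split on the first "=" only
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def foreground_vars_of(pid: int) -> Dict[str, str]:
    """Return the watched env vars a running process was started with."""
    with open(f"/proc/{pid}/environ", "rb") as fh:
        env = parse_environ(fh.read())
    return {name: env[name] for name in WATCHED if name in env}
```

Pass it the PID printed by `lsof -tiTCP:8787 -sTCP:LISTEN`; an empty result means none of the foreground triggers reached the server.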