Commit Graph

10 Commits

Author SHA1 Message Date
Hermes Bot 6a26e82c22 fix(bootstrap): address Opus pre-merge review feedback (#1478)
Three changes from the pre-merge Opus review:

**MUST-FIX** — XPC_SERVICE_NAME false-positive on macOS Terminal

macOS launchd sets `XPC_SERVICE_NAME` in EVERY Terminal-spawned shell, not
just real services. Typical noise values: `"0"` (truthy in Python!) and
`"application.com.apple.Terminal.<UUID>"`. A bare `os.environ.get(name)`
existence check would auto-promote interactive `./start.sh` runs to
foreground mode on every Mac dev machine — silently breaking the most
common installation path (no /health probe, no browser open, no log file,
hanging shell).

Fix: new `_is_real_supervisor_value()` helper that filters noise. For
`XPC_SERVICE_NAME` specifically, reject `"0"` and any `"application.*"`
prefix. Real launchd plists use reverse-DNS Label form (`com.<rdns>.<svc>`)
which still triggers correctly.

7 new tests in `TestXPCServiceNameNoiseFilter`:
- 4 noise values (`0`, Terminal.app, iTerm2, VSCode) → no detection
- 3 real Label forms → correct detection
- Mixed env with XPC noise + real INVOCATION_ID → falls through to systemd

**SHOULD-FIX 1** — Test env leakage

The original `clean_env` fixture stripped supervisor-detection env vars
but not the resolved bootstrap vars (HERMES_WEBUI_HOST/PORT/AGENT_DIR)
that `main()` mutates onto `os.environ`. After
`test_foreground_exports_resolved_env_vars` ran, later tests would import
bootstrap with polluted defaults (DEFAULT_HOST="0.0.0.0" instead of
"127.0.0.1"). Existing assertions still passed (tautological vs DEFAULT_*),
but it was a footgun for future tests.

Fix: extend `clean_env` to also `delenv` the three resolved vars before
each test.

**SHOULD-FIX 2** — Pre-execv executability guard

If `discover_launcher_python` returns a path that doesn't exist or isn't
executable, `os.execv` raises OSError → wrapper catches → SystemExit(1)
→ supervisor restarts → loop forever. That's exactly the failure mode
this PR is supposed to eliminate.

Fix: `os.access(python_exe, os.X_OK)` check before execv. Converts
infinite supervisor loop into a single visible RuntimeError.

1 new test in `TestForegroundExecutabilityGuard` pinning that the guard
fires before execv when the python path is non-executable.

**Docs** — supervisor.md updates

- New section explaining the XPC_SERVICE_NAME noise filter and what
  values trigger / don't trigger detection
- New section listing supervisors that are NOT auto-detected (runit,
  daemontools, PM2, Foreman/Honcho, custom shell-script supervisors)
  with explicit recommendation to set HERMES_WEBUI_FOREGROUND=1

Verification

- 3820 tests pass (+9 from this commit's new tests vs the original PR
  push of 3811)
- Filter manually verified end-to-end with the live os.environ:
  XPC=0 → None, XPC=application.* → None, XPC=com.example.foo → triggers
- run-browser-tests.sh ALL CHECKS PASSED on the worktree

Items deferred from the Opus review

- #4 chdir target may not exist: REPO_ROOT comes from __file__.resolve()
  so it's stable; not a real concern in practice
- #6 two startup messages in foreground mode: cosmetic, useful for
  diagnostics
- #7 stricter explicit-only mode: leaves user the override of just not
  passing --foreground (current behavior)
- #8 test stub return value: trivial, can fix later if regression surface
- #9 argparse positional-after-option ordering: test reads fine

These can be follow-up issues if anyone hits them.
2026-05-02 17:52:13 +00:00
Hermes Bot f84b6a4e2f fix(bootstrap): add --foreground mode for process supervisors (#1458 Bug #1)
Issue #1458 reports persistent-host crashes (≥1/day) when running the WebUI
under launchd KeepAlive on macOS. Root cause: `bootstrap.py` calls
`subprocess.Popen([python, "server.py"], start_new_session=True)`, probes
/health, then exits 0. Under any process supervisor (launchd, systemd,
supervisord, runit, s6), the supervisor sees its tracked PID exit, marks
the program as "completed," and respawns it. The new bootstrap fails to
bind port 8787 (orphaned server still has it), exits non-zero, supervisor
respawns again — loop until the orphan crashes for some other reason and
the next respawn finds the port free.

This PR addresses Bug #1 of the three failure modes tracked in #1458:
the `bootstrap.py` double-fork breaking process supervisors. Bug #2
(state.db FD leak) and Bug #3 (HTTP-unhealthy wedge) remain open under
the same issue — they need diagnosis data before a fix can land.

Changes
-------

1. `bootstrap.py`:
   - New `--foreground` argparse flag with help text mentioning launchd /
     systemd / supervisord.
   - New `_detect_supervisor()` that returns the env var name for any
     supervisor it detects: `INVOCATION_ID` / `JOURNAL_STREAM` /
     `NOTIFY_SOCKET` (systemd, s6), `XPC_SERVICE_NAME` (launchd),
     `SUPERVISOR_ENABLED` (supervisord), or `HERMES_WEBUI_FOREGROUND` for
     the explicit user opt-in. Truthy values for the explicit opt-in:
     `1` / `true` / `yes` / `on` (case-insensitive).
   - `main()` branches on `args.foreground or _detect_supervisor()`:
     - **Foreground path:** chdir to `agent_dir or REPO_ROOT`, then
       `os.execv(python, [python, server_path])` to replace the bootstrap
       process image with the server. The supervisor sees the long-lived
       server as the original child. No `wait_for_health` probe — the
       supervisor's KeepAlive / Restart=on-failure handles liveness.
     - **Default path:** unchanged. Spawn server as detached child via
       `Popen + start_new_session=True`, probe /health, return 0. This
       still works for interactive `bash start.sh` invocations.
   - Resolved env vars (HOST/PORT/STATE_DIR/AGENT_DIR) are now mutated on
     `os.environ` directly instead of into a local `env` copy so they
     are inherited across `os.execv`.

2. `docs/supervisor.md` (new): runnable launchd plist, systemd .service,
   and supervisord conf examples + a diagnostic recipe (`lsof` + ppid
   chain) for catching the orphan-loop in production.

3. `.gitignore`: allowlist `docs/supervisor.md` (the directory uses an
   opt-in pattern; matches the existing `!docs/docker.md` precedent).

4. `tests/test_bootstrap_foreground.py` (new): 35 regression tests
   covering the argparse flag, `_detect_supervisor()` behavior across all
   five supervisor env vars, the explicit opt-in's truthy/falsy values,
   and `main()`'s execv-vs-Popen routing decision under each input
   combination. `os.execv` is monkeypatched in the routing tests — we
   pin the structural choice (which call is made, with which args, in
   which cwd, with which env) not the post-exec behavior.

Why this scope and no more
--------------------------

Bug #2 (state.db FD leak) lists 5 candidate paths and asks the reporter
for `lsof -p <pid> | sort | uniq -c | sort -rn | head -20` output to
disambiguate. Until that data lands, any "fix" would be speculative —
explicitly out of scope per the contributor-pickup comment on the issue.

Bug #3 (launchd-running, port-listening, HTTP-unhealthy) was added in
@stefanpieter's reply comment. Diagnosis is in flight; no concrete fix
shape yet. Also out of scope.

Running locally end-to-end verifies the behavior:

```
[bootstrap] Starting Hermes Web UI on http://127.0.0.1:8789 (foreground mode: --foreground)
$ pgrep -af 'server.py'
2997632 /home/.../python /tmp/wt-fix-1458/server.py
$ ps -o ppid -p 2997632
2997581   ← bash that ran bootstrap.py — same PID as the original bootstrap
$ ps -p 2997581 -o cmd
... bootstrap.py ...   ← but exec'd into server.py
```

The same PID that bash forked for `bootstrap.py` is now `server.py`.
A supervisor watching that PID would correctly observe the long-lived
server. No double-fork.

Verification
------------

- 3811 tests pass (`pytest tests/` — full suite, +51 from this PR plus
  master-merge-in)
- All 35 new bootstrap-foreground tests pass
- `bash scripts/run-browser-tests.sh` PASS (HTTP API checks against worktree)
- `bash scripts/webui_qa_agent.sh 8789` PASS (23/23 visual QA)
- Live verified: server starts cleanly under both `--foreground` and
  `HERMES_WEBUI_FOREGROUND=1`; PID lineage confirms no double-fork

Closes #1458 (Bug #1 only). Bugs #2 and #3 remain tracked under the
issue.
2026-05-02 17:37:54 +00:00
nesquena-hermes 4ee9368464 Opus pre-release follow-ups for PR #1445
REQUIRED:
- _fully_unquote_path range(3) -> range(10) — defense-in-depth so quadruple-
  encoded .. is rejected by validator instead of slipping through (not
  exploitable but contract violation)
- docs/EXTENSIONS.md trust-model callout moved to top of file with explicit
  'don't enable in untrusted env / don't point at user-writable dir' guidance

NICE-TO-HAVE (taken since Nathan asked for all fixes big and small):
- URL list cap at _MAX_URL_LIST=32 to avoid pathological rendering
- One-shot WARNING log for rejected URLs (silent drop now visible to admin)
- One-shot WARNING log for URL list truncation
- MIME map: ttf (font/ttf), otf (font/otf), wasm (application/wasm)

5 regression tests in tests/test_pr1445_opus_followups.py pin all invariants.
2026-05-02 03:49:40 +00:00
Ryan Jones 9de61a0b9a feat: add opt-in webui extension hooks 2026-05-02 03:36:54 +00:00
nesquena-hermes b57525241b v0.50.260: Docker reliability batch - PR #1428 + broader UX/docs improvements + Opus advisor fixes
Combines PR #1428 (UID/GID alignment) with a broader Docker reliability pass
that addresses recurring user reports about compose files not working.

Constituent PR:
- #1428 sunnysktsang - Align agent UID/GID with webui (fixes #1399).
  Two- and three-container compose files had agent at UID 10000 (image
  default) and webui at UID 1000 (WANTED_UID default), causing permission
  denied on shared hermes-home volume. All services now use ${UID:-1000}.

Plus broader Docker UX overhaul:
- All 3 compose files document HERMES_SKIP_CHMOD/HERMES_HOME_MODE escape
  hatches inline (the v0.50.254 fix wasn't surfaced for Docker users).
- New .env.docker.example template covering UID/GID, paths, password,
  permission handling. UID/GID are uncommented with placeholder values
  per Opus advisor (so macOS users don't skim past).
- New docs/docker.md - comprehensive guide: 5-min quickstart, failure
  mode table with one-line fixes, bind-mount migration, multi-container
  architecture diagram, macOS Docker Desktop VirtioFS note, link to
  community sunnysktsang/hermes-suite all-in-one image.
- README Docker section rewritten - clearer quickstart, failure-mode
  table, link to docs/docker.md. Stale /root/.hermes references removed.

Plus Opus pre-release advisor MUST-FIX:
- HERMES_HOME_MODE has DIFFERENT semantics in the WebUI vs the agent
  image. WebUI: credential-file mode threshold (0640 allows group bits).
  Agent: HERMES_HOME directory mode (default 0700). 0640 on a directory
  has no owner-execute bit, so the agent can't traverse its own home and
  bricks. My initial draft recommended HERMES_HOME_MODE=0640 in agent
  service blocks - corrected to 0750 across all 4 surfaces (compose
  files, .env.docker.example, docs/docker.md). 3 regression tests pin
  the asymmetry.

12 regression tests total in test_v050260_docker_invariants.py.
Full suite: 3627 passed, 0 failed.

Nathan explicitly authorized merge with my own review + Opus only, no
independent review needed.
2026-05-01 23:10:52 +00:00
bergeouss f14280e2c4 fix(#1195): route sessions to profile dir even when dir doesn't exist yet (#1373)
When a user switched profiles and created a new session, the session
was saved to the default profile directory instead of the active
profile directory — because get_hermes_home_for_profile() silently
fell back to _DEFAULT_HERMES_HOME when the profile directory didn't
exist yet on disk.

Root cause: api/profiles.py:156 had `if profile_dir.is_dir(): return
profile_dir; return _DEFAULT_HERMES_HOME`. New profiles (no session
yet, so no dir) routed every session back to default.

Fix: remove the is_dir() guard, return the profile path
unconditionally. The profile directory is created on first use by
the agent/session layer.

5 regression tests in tests/test_issue1195_session_profile_routing.py:
existing-profile, non-existent-profile (the core fix), None, empty-
string, 'default' all return the expected path.

Co-authored-by: bergeouss <bergeouss@users.noreply.github.com>
2026-04-30 23:24:31 +00:00
nesquena-hermes dca8624454 fix(ui): restore rail-era app titlebar state (v0.50.226) (#1163)
Merged as v0.50.226.

Integration branch absorbed @aronprins's original PR #1141 with one reviewer fix from @nesquena (`1d11646`: queue hide tooltip updated to reference the queue pill, not the removed titlebar badge).

**Full gate results:**
- 2595 tests passing 
- Browser QA 21/21 (desktop 1440×900 + mobile iPhone 14) 
- Independent review: APPROVED by @nesquena 

Thank you @aronprins for the clean PR — the titlebar is properly restored.
2026-04-27 11:43:32 -07:00
nesquena-hermes 76e602af25 feat: remove bubble_layout setting end-to-end (#777)
Removes the bubble_layout toggle from Settings, all persistence, CSS, i18n strings, and the UI docs demo. The CSS was already effectively dead. Users with a saved bubble_layout value in settings.json get a clean migration via _SETTINGS_LEGACY_DROP_KEYS.

Credit: @aronprins (PR #760 / #777)

Co-authored-by: aronprins <aronprins@users.noreply.github.com>
2026-04-20 22:34:45 +00:00
Aron Prins 9a3dc10d93 feat: redesign chat transcript + fix streaming/persistence lifecycle — v0.50.70 (PR #587 by @aronprins)
Redesign chat transcript + fix streaming/persistence lifecycle — v0.50.70

Squash-merges PR #587 by @aronprins (Aron Prins). Full credit to @aronprins for all feature and fix work.

Transcript redesign: unified --msg-rail/--msg-max CSS variables, user turns as tinted cards, thinking cards as bordered panels, error card treatment, day-change separators, composer fade.

Approval/clarify as composer flyouts: cards slide up from behind composer top, overflow:hidden + translateY clip prevents travel visibility, focus({preventScroll:true}).

Streaming lifecycle: DOM order user→thinking→tool cards→response, no mid-stream jump. Live tool cards inserted before [data-live-assistant].

Persistence: reasoning attached before s.save(), _restore_reasoning_metadata on reload, role=tool rows preserved in S.messages, CLI-session tool-result fallback.

Workspace panel FOUC fix: [data-workspace-panel] set at parse time.

Docs: docs/ui-ux/index.html + two-stage-proposal.html.

Maintainer additions (433b867): CHANGELOG v0.50.70, version badge, usage badge loop simplification.

Reviewed and approved by @nesquena (independent review). 1361 tests passing.
2026-04-16 14:04:42 -07:00
nesquena-hermes 57a4f573f6 docs: HERMES.md deep-dive, Why Hermes in README, screenshot layout
* docs: add HERMES.md deep-dive, Why Hermes section in README, and screenshot layout

- HERMES.md: full why-Hermes document -- assistant vs. agent mental
  model, three pillars (memory/scheduling/reach), four-category
  taxonomy of AI tools, per-tool comparison sections with tables
  (Claude Code, Codex CLI, OpenCode, Cursor/Copilot, Claude.ai),
  compounding advantage, who it's for, what it's not, quick reference
- README: hero screenshot stays full-width; two new UI screenshots in
  side-by-side HTML table with captions below
- README: new Why Hermes section with 6-bullet summary, comparison
  table, and link to HERMES.md
- README: HERMES.md added to Docs section
- docs/images/: two UI screenshots (workspace browser, sessions view)

* docs: fact-check and update all comparisons; add Open Interpreter section

Researched current state of each tool before updating:

Claude Code:
- Scheduled jobs: now Partial (has /loop session-scoped, cloud-managed
  /schedule via claude.ai/code, and desktop app automations); updated
  table to reflect this with footnotes distinguishing self-hosted cron
- Persistent memory: Partial (CLAUDE.md, MEMORY.md, rolling auto-memory
  but not full automatic cross-session recall)
- Provider-agnostic: No -- supports Bedrock/Vertex but Claude models only
- Web UI: Yes but Anthropic-hosted (not self-hosted)

Codex CLI:
- Persistent memory: Partial (session history + AGENTS.md since v0.100.0)
- Scheduled jobs: Partial (desktop app Automations only; CLI has no native
  scheduling as of early 2026, open feature request)
- Provider-agnostic: Yes (10+ providers)

OpenCode:
- Web UI: now Yes (embedded in binary + official desktop app)
- Persistent memory: Partial (SQLite sessions + AGENTS.md, not semantic)
- Messaging: community Telegram bot only, not first-party

Open Interpreter: added as new comparison section
- Most common 'why not just use this' question; addressed head-on
- Session-scoped, no persistent memory by their own docs, no scheduler,
  no messaging integration; powerful for one-shot tasks, not always-on

README Why Hermes table: updated to include Open Interpreter column,
fixed Claude Code self-hosted row (No -- scheduling runs on Anthropic
cloud), added footnotes for partial entries

* docs: add OpenClaw comparison; update category framework and quick reference table

OpenClaw (openclaw.ai, MIT, 347k stars) is the most direct Hermes
competitor -- both are open-source, self-hosted, always-on agents with
persistent memory, cron, and messaging integration. Added:

- Full OpenClaw section in HERMES.md with honest comparison: where it
  wins (15+ messaging platforms incl. iMessage/WeChat, native Chrome CDP
  browser control, voice wake words, ClawHub marketplace) and where
  Hermes differs (self-improving skills system, Python/ML ecosystem,
  web UI, multi-profile, sub-agent orchestration)
- Category 4 framework updated: now lists both Hermes and OpenClaw,
  with the key architectural distinction called out
- Quick reference table expanded to include OpenClaw column (now 8 tools)
- New rows added: self-improving skills, browser/computer control,
  Python/ML ecosystem
- README Why Hermes table updated: OpenClaw replaces OpenCode column,
  self-improving skills row replaces generic skills row, callout line
  at bottom addresses OpenClaw head-on

* docs: major accuracy pass -- OpenClaw deep-dive, Claude Code corrections, drop Open Interpreter

OpenClaw:
- Expanded comparison from a table to a full prose section with
  'Where OpenClaw wins' / 'Where Hermes wins' structure
- Honest about OpenClaw strengths: 15+ messaging platforms, native
  Chrome CDP browser control, voice wake words, 13k+ ClawHub skills
- Hermes advantages called out clearly: self-improving skills as a
  first-class automatic loop (vs marketplace-install model), stability
  (documented OpenClaw update regressions, Telegram breakage in early
  2026, WhatsApp protocol instability), security (156 CVEs and 1,184
  malicious skills found in ClawHub audit vs Hermes's no marketplace
  attack surface), Python/ML ecosystem, full web UI vs dashboard-only,
  and first-class multi-profile support
- Category 4 framework updated to name both Hermes and OpenClaw
- Table updated: added stability/security rows, corrected web UI row
  (OpenClaw has a gateway dashboard but not a full chat UI)

Claude Code corrections (researched against official docs at code.claude.com):
- Skills/Hooks: changed from No to Yes -- has a full Hooks system (13
  event types, 4 handler types) and a Plugin/Skills marketplace since
  v2.0.12; unified with slash commands in v2.1.0
- Messaging: changed from No to Partial -- Channels feature (Telegram,
  Discord, iMessage, Webhooks) in research preview since v2.1.80; deep
  Slack integration that triggers cloud sessions and creates PRs
- Added Claude Cowork row: separate product with 38+ connectors
  (Slack, Gmail, Teams, Notion, Jira, Salesforce, etc.)
- Scheduling footnote updated: cloud-managed has 1-hour minimum interval
- Provider-agnostic clarified: routes through Bedrock/Vertex but always
  Claude models; cannot swap to GPT or Gemini

Open Interpreter removed:
- Less relevant comparison than OpenClaw for the 'always-on agent' frame
- Kept coverage focused on the tools people actually compare Hermes to

Quick reference table:
- Now 7 tools wide (added OpenClaw, kept Claude Code, Codex, OpenCode,
  Cursor, Claude.ai, Hermes)
- New rows: self-improving skills, browser/computer control, stability
- Updated: Claude Code messaging to Partial, OpenClaw web UI to
  'Dashboard only', skills rows differentiated by type

* docs: apply full editorial pass from hermes-edit-list.md

Writing patterns fixed:
- Em dashes reduced by ~80%; replaced with commas, periods, parens
- All 'Not X, it's Y' negative parallelism rewritten as positive
  statements; 'What Hermes Is Not' section renamed 'Scope and Limits'
  and reframed positively throughout
- 'It compounds.' standalone flourish removed
- 'meaningfully' removed everywhere (was appearing 3+ times)
- 'leverages' -> 'uses' in README
- 'remembers everything' softened to 'retains context across sessions'
- Bolded Hermes column in Quick Reference table un-bolded (only genuine
  differentiator cells kept bold: self-improving skills, always-on,
  orchestrates other agents)
- 'The honest summary' framing removed from OpenClaw section
- 'Hermes is different.' cliche transition cut from README
- Rule-of-three slogans trimmed (e.g. 'Same agent, same memory...')
- 'tired of re-explaining' -> 'don't want to re-explaining'

Duplicate content removed:
- 'day one / day one hundred' comparison kept only in Compounding
  Advantage section; removed from Pillar 1

Factual accuracy fixes:
- Claude.ai comparison updated: memory now auto-generated from history
  (not just user-curated); code execution and file read/write noted
  as sandboxed (Artifacts), not flat No
- Category 2: Windsurf framed as 'earliest' on memory, Copilot
  'catching up'; removed overconfident 'most mature' claim
- Category 4 qualifier: 'as of early 2026' added
- '1-hour minimum' for Claude Code cloud scheduling softened to
  'minimum interval applies' (specific claim unverified)
- Claude Code scheduling table note: 'cloud or desktop-app only'
  (was just 'cloud-managed or session-scoped')
- README claim 'No other open-source tool combines...' removed;
  was false because OpenClaw does combine all three
- OpenClaw self-improving skills: 'No' -> 'Partial' with clarification
- README OpenClaw callout: 'relies on a marketplace' softened to
  'skill system centers on a community marketplace'
- 'meaningfully more stable' -> 'more stable'; 'supply chain issues'
  -> 'security incidents involving malicious skills'
- OpenClaw star count: '347k+' -> '~347k' (moving fast)
- Stability row added to OpenClaw table; bold removed from table

---------

Co-authored-by: Hermes <hermes@localhost>
2026-04-03 22:03:19 -07:00