feat: probe Ollama runtime via lore status --full and richer hook warning by attila · Pull Request #67 · attila/lore

attila · 2026-06-04T19:35:54Z

Summary

lore status previously reported Ollama as healthy whenever the daemon answered and the model
manifest was on disk — even when the bundled runner subprocess was missing or unloadable. The
current Homebrew Ollama bottle reproduces this exact gap: status shows ✓✓✓ while hooks silently
degrade to FTS with an opaque one-line warning.

This adds:

lore status --full — opt-in deeper check that probes Ollama with a real embed call against
the configured model. Default lore status stays sub-second and shows a discoverability hint
pointing at --full. Header lines print before the probe so the multi-second wait isn't a
frozen terminal.
lore init auto-probes — provisioning verifies inference end-to-end before declaring success.
Upgraded hook FTS-fallback warning — names the failure class (inference error, timed out,
transport error, HTTP <status>) and points at lore status --full for full diagnosis.
Per-process rate limit prevents repeat warnings of the same class; suppressions leave a
LORE_DEBUG=1 breadcrumb.
MCP lore_status parity — exposes Ollama state (installed/running/model_available/model)
in the metadata fence by default, with optional probe_runtime: true argument that runs the
probe and returns the structured ProbeOutcome (state + short_reason + status_line). Agents
now have first-class parity with lore status --full without shelling out and parsing stderr.

The probe returns a structured ProbeError (RunnerFailed / InferenceError / Timeout / Transport /
HttpStatus) and a single render_failure function centralises the variant→strings match so all
user-facing surfaces (status line, provision errors, provision actions, hook short-reason, MCP
metadata) stay synchronised when the enum gains a variant. ProbeError, ProbeOutcome, and
RenderedFailure carry #[non_exhaustive] so future variants aren't breaking changes for
downstream match consumers.

Discovered during implementation

ureq 3.x http_status_as_error defaults to true. The plan asserted ureq 3 returned
Ok(Response) for HTTP 5xx. Manual verification revealed the opposite — 5xx came back as
Error::StatusCode(_) without the body, rendering "transport error — http status: 500" instead of
Ollama's actual diagnostic. Reconfigured OllamaClient's agent with http_status_as_error(false)
and made is_healthy / has_model / pull_model check response.status() explicitly to
preserve their semantics. Sibling-path audit during code-review confirmed all five self.agent.*
callsites are covered.
Body truncation bumped 80 → 240 chars. The 80-char limit chopped Ollama's runner-not-found
errors mid-path, exactly where the diagnostic value lives. 240 fits multiple full Homebrew Cellar
paths on typical terminals while still bounding pathological multi-paragraph bodies.
Body read uses from_utf8_lossy. Non-UTF8 sequences in a 5xx body would otherwise collapse
the whole diagnostic to empty string. Lossy decode preserves what's decodable.
Plan correction: lore init aborts on probe failure. An earlier plan draft claimed init
continues to completion. The pre-existing cmd_init wrapper has always called
process::exit(1) on any non-empty result.errors; letting init continue past a broken embedder
would create a database with empty embeddings (silently broken vector search). Aborting forces
the user to fix Ollama and re-run; the manifest is already on disk so the re-run is fast. Plan
amended; code is right.

Code-review fixes applied

PR went through a multi-agent code review (10 reviewers). All four P1 findings addressed:

~~lore init aborts on probe failure~~ — plan amended (code is right).
#[non_exhaustive] added to ProbeError, ProbeOutcome, RenderedFailure.
~~check_status signature break~~ — not advertised as breaking; project isn't published to
crates.io yet, no external consumers.
HTTP 200 empty-embeddings now wraps in ProbeError::InferenceError (was anyhow!), so the
hook downcast classifies correctly. probe() also rejects {} bodies that prove nothing.

Plus all approved P2/P3 follow-ups: MCP parity, header-before-probe rendering, from_utf8_lossy,
13 new unit tests for extract_body_message / render_failure / ProbeError downcast,
keep_alive bumped 0→30s on --full for the "fix Ollama, re-run" loop, visible Skipped Runtime
line under --full, lore_debug! breadcrumb on rate-limit suppression.

Compounded learnings

Five durable learnings landed in docs/solutions/ so future sessions hit lookups instead of
research. All five include explicit "When to Apply" / "Skip when" guidance:

best-practices/ureq-3-http-status-as-error-defaults-true-2026-06-04.md
— ureq 3.x kept the status-as-error default; 5xx loses its body unless reconfigured.
best-practices/bin-and-lib-are-separate-crates-for-non-exhaustive-2026-06-04.md
— #[non_exhaustive] semantics across the lib/bin boundary in the same Cargo package.
design-patterns/multi-surface-render-function-pattern-2026-06-04.md
— one render(&Enum) -> Rendered function returning an N-field struct keeps every
surface in sync when variants change.
best-practices/http-error-body-extraction-and-char-boundary-safe-truncation-2026-06-04.md
— four-layer body extraction (lossy decode → JSON error → first line → char-bounded
truncation) with the multi-byte-codepoint safety property pinned by unit tests.
conventions/expensive-cli-checks-go-behind-opt-in-flag-with-rescue-mechanisms-2026-06-04.md
— the four-mechanism design that emerged from the default-on/opt-in product pivot:
fast default + discoverability hint + push-side warning + install-time auto-probe.

Verification (currently-broken Homebrew Ollama bottle)

Before this PR: lore status shows ✓✓✓, hook prints
Warning: Ollama unreachable (...), falling back to text search.

After this PR:

$ lore status        # sub-second, hint surfaces deeper check
  Model:        ✓ nomic-embed-text
  Runtime:      —  (run 'lore status --full' to verify inference)

$ lore status --full # header prints first, then a brief wait, then result
  Model:        ✓ nomic-embed-text
  Verifying inference runtime (may take a few seconds)…
  Runtime:      ✗ inference failed — error starting llama-server: llama-server binary not found
                  (checked: /opt/homebrew/Cellar/ollama/0.30.4/libexec/lib/ollama/llama-server,
                  /opt/homebrew/Cellar/ollama/0.30.4/libexec/llama-server, ...)

$ echo '{"hook_event_name":"PreToolUse",...}' | lore hook
Warning: Ollama embed failed (inference error); falling back to text search. Run 'lore status --full' for details.
{ ... FTS-fallback results ... }

Once Ollama is repaired (e.g. brew reinstall ollama) these flip back to green automatically — no
code change needed.

Pre-existing failure flagged

hook_transcript_tail_toggle_populates_field_when_enabled fails on main independently of this
PR. It uses FakeEmbedder and tests trace-record transcript_tail logic — completely untouched by
these changes. Root cause is a macOS-only tempdir/symlink (/var → /private/var) interaction that
the test setup doesn't canonicalise; CI is Linux-only so it never bites there. Tracked separately
for follow-up.

Plan

docs/plans/2026-06-04-001-feat-ollama-runtime-probe-plan.md
— went through three rounds of doc-review (coherence, feasibility, adversarial) plus a product
pivot from default-on probing to opt-in via --full. Plan status completed, with the ureq-3.x,
truncation, and lore init corrections recorded under "Discovered during implementation".

Test plan

cargo build --release
cargo clippy --all-targets --features test-support -- -D warnings
cargo test --features test-support --lib (792 passed, was 779 — 13 new unit tests added)
just ci (fmt, clippy, test, deny, doc) — passes except the pre-existing macOS hook test
Manual lore status against broken Homebrew Ollama (sub-second, hint shown)
Manual lore status --full against broken Homebrew Ollama (header prints first, then ✗
with Ollama's actual error body showing multiple full Cellar paths)
Manual hook invocation against broken Homebrew Ollama (informative warning + FTS results)
Run probe_succeeds_against_real_ollama integration test once Ollama is fixed (currently
expected-fail against broken bottle, which is the natural acceptance test)
Manual MCP lore_status call with probe_runtime: true against repaired Ollama (verify
runtime: ok in metadata fence)

Three-unit plan to expose Ollama inference breakage in lore status via an opt-in --full probe (U1+U2), and to upgrade the hook FTS-fallback warning so the push-side surface is informative (U3). Plan went through three rounds of doc-review (coherence, feasibility, adversarial) plus a product pivot from default-on probing to opt-in via --full.

Adds opt-in runtime verification to lore status via a new --full flag. The existing is_healthy + has_model checks only confirm the daemon answers and the manifest is on disk — they do not exercise the runner subprocess. The broken Homebrew bottle reproduces this exact gap: status reports ✓✓✓ while real inference fails, and lore silently degrades to FTS. Default lore status stays sub-second and shows a discoverability hint pointing at --full; --full runs a minimal embed against the configured model with keep_alive=0 so it does not pin the model in RAM. Provisioning (lore init) runs the probe automatically so fresh setups verify inference. ProbeError carries five variants (RunnerFailed, InferenceError, Timeout, Transport, HttpStatus) so failure modes render differently — a slow disk does not get misdiagnosed as a runner bundle bug. render_failure centralises the variant→strings match so adding a variant touches one site, not four. Body extraction parses Ollama's JSON error field when present and falls back to first-line+truncate. Spells out ureq 3.x mechanics: send_json returns Ok(Response) for HTTP 5xx in ureq 3, so probe and embed must check response.status() before read_json. The previous Embedder::embed code swallowed the broken-runner body because read_json failed parsing the error response as EmbedResponse. check_status now takes a full: bool parameter with no default — the explicit-no-default rule prevents a future Default impl from accidentally turning default lore status into the slow path.

Hook FTS-fallback warning at src/hook.rs now classifies the embed error via ProbeError, names the failure class via render_failure().short_reason, and points users at 'lore status --full' for full diagnosis. Per-process rate limit keyed on failure class prevents the same warning from repeating within one process (typically a no-op since each hook is its own process, but documents the intent if call shape changes). ureq 3.x defaults http_status_as_error to true — Error::StatusCode(_) is returned for HTTP 5xx without the response body. Manually testing the broken-Homebrew bottle revealed the runtime probe was rendering the error as 'transport error — http status: 500' instead of surfacing Ollama's actual diagnostic. Reconfigured OllamaClient's agent with http_status_as_error(false) so classify_embed_response can read the 5xx body and produce a RunnerFailed variant carrying Ollama's actual error string. is_healthy, has_model, and pull_model now check response.status() manually since they can no longer rely on the agent short-circuiting on non-2xx. Tested against the currently-broken Homebrew Ollama bottle: lore status — sub-second, hint line shown lore status --full — Runtime ✗ inference failed — error starting llama-server: llama-server binary not found... lore hook (PreToolUse) — Warning: Ollama embed failed (inference error); falling back to text search. Run 'lore status --full' for details. Followed by FTS results. Adds probe_succeeds_against_real_ollama integration test gated by 'just test-integration'. Pre-existing unrelated test failure: hook_transcript_tail_toggle_populates_ field_when_enabled fails on main without these changes; uses FakeEmbedder and tests trace-record transcript_tail logic, untouched by this PR.

Manual verification against the broken Homebrew bottle revealed the 80-char limit chopped Ollama's runner-not-found errors mid-path, exactly where the diagnostic value lives. 240 chars fits multiple full Homebrew Cellar paths on typical terminals while still bounding pathological multi-paragraph bodies. Plan KTD updated to match. Also flips plan status: active → completed and records the ureq 3.x http_status_as_error discovery under Discovered during implementation, so the plan reaches its post-merge resting shape.

Code-review surfaced a mismatch between the plan and the code. The plan claimed lore init "continues to completion on probe failure; does not abort", but cmd_init (pre-existing wrapper) calls process::exit(1) on any non-empty result.errors — before the ingest phase. The current code (abort on probe failure) is the right behaviour: letting init continue past a broken embedder would run full_ingest, fail per-chunk on embed calls, and create a database with empty embeddings — silently broken vector search. Aborting forces the user to fix Ollama and re-run; the manifest is already on disk so the re-run is fast. The plan's claim was based on a misread of pre-existing provision() semantics. Amend the plan to acknowledge this rather than change the code.

Multi-agent code review surfaced several issues; this commit lands the fixes the user approved. #2 (api-contract): mark ProbeError, ProbeOutcome, RenderedFailure with #[non_exhaustive] so adding a variant later isn't a breaking change for downstream match consumers. Main.rs (separate crate from the library) gets a wildcard arm in render_runtime_line for future variants. #4 (correctness + adversarial cross-reviewer): Embedder::embed and OllamaClient::probe both wrap the 2xx-with-empty-or-missing-embeddings case in ProbeError::InferenceError instead of a plain anyhow!. Without this, the hook downcast fell through to "unreachable" and the probe declared OK on {} bodies that provided no proof the runner ran. #5 (agent-native): expose Ollama state via MCP lore_status. New `ollama` metadata object carries installed/running/model_available/model. New optional `probe_runtime: true` argument runs the probe and exposes the ProbeOutcome (state + short_reason + status_line) under `runtime` — agents now have first-class parity with `lore status --full` without shelling out and parsing stderr. #6 (reliability): render the status header before the blocking probe so `lore status --full` doesn't show a frozen terminal during the 30 s probe wait. A visible "Verifying inference runtime..." line bridges the gap; the actual Runtime line lands when the probe returns. #7 (adversarial): use read_to_vec + String::from_utf8_lossy instead of read_to_string().unwrap_or_default(). Non-UTF8 sequences in a 5xx body no longer collapse the entire diagnostic to empty string. #10 (learnings): ureq sibling-path audit verified clean. All 5 self.agent.* call sites (is_healthy, has_model, pull_model, probe, embed) check response.status() or route through classify_embed_response. Plus three smaller follow-ups: - bump --full keep_alive from Some(0) to Some(30) so "fix Ollama, re-run --full to verify" hits a warm cache on the second invocation rather than paying cold load every time. - render ProbeOutcome::Skipped as a visible line under --full (user explicitly asked for runtime info; silent omission was surprising). - emit a lore_debug! breadcrumb when the hook FTS-fallback warning rate limit suppresses a repeat warning, so operators wondering "why didn't I see it?" have a thread to pull via LORE_DEBUG=1. Plus a snapshot accept for the lore_status tool input schema (new `probe_runtime` field).

Capture durable learnings surfaced during PR #67 so the next time someone hits these problems, the cost drops from research to lookup. best-practices/ureq-3-http-status-as-error-defaults-true — ureq 3.x kept the status-as-error default (renamed to http_status_as_error, true by default). 5xx responses surface as Error::StatusCode(u16) without the body. The plan and a feasibility reviewer both got this wrong; only manual verification surfaced it. Documents the reconfiguration pattern + sibling-callsite audit. best-practices/bin-and-lib-are-separate-crates-for-non-exhaustive — #[non_exhaustive] on a public enum in src/lib.rs requires a wildcard arm in src/main.rs matches but not in same-module-tree matches. bin and lib in the same Cargo package compile as separate crates. The asymmetry is non-obvious from the Rust reference. design-patterns/multi-surface-render-function-pattern — when an enum must render across N user-facing surfaces, centralise the variant-to- strings match in one function returning an N-field struct. Adding a variant touches one match arm; the struct fields are the discovery mechanism. Reference implementation is render_failure(&ProbeError) -> RenderedFailure in src/embeddings.rs. best-practices/http-error-body-extraction-and-char-boundary-safe- truncation — four-layer body extraction for arbitrary HTTP error bodies: lossy UTF-8 decode, JSON error field unwrap, first-non-empty- line fallback, control-character strip, char-count truncation with ellipsis. The char-boundary safety property is the one that prevents multi-byte slice panics; unit tests pin it. conventions/expensive-cli-checks-go-behind-opt-in-flag-with-rescue- mechanisms — the four-mechanism CLI design that emerged from the default-on-vs-opt-in product pivot: fast default + discoverability hint line + push-side warning + install-time auto-probe. Each mechanism catches a different user path; together they solve silent degradation without penalising routine status calls. All five include "When to Apply" and "Skip when" guidance so future agents can reason about applicability rather than copying blindly.

The macOS tempdir lives under /var/folders/... which symlinks to /private/var/folders/.... validate_transcript_path canonicalises the transcript path before the startswith(HOME) check, so the test's non-canonical HOME override failed the check on macOS while passing on Linux. Canonicalise the tempdir path before setting HOME. Also backfill the PR #67 (Ollama runtime probe) entry in ROADMAP.md under Completed. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

attila added 3 commits June 4, 2026 20:26

attila marked this pull request as ready for review June 4, 2026 19:52

attila added 4 commits June 4, 2026 20:54

attila merged commit 08713f0 into main Jun 4, 2026
10 checks passed

attila deleted the feat/ollama-runtime-probe branch June 4, 2026 20:56

attila mentioned this pull request Jun 4, 2026

fix: canonicalise HOME in transcript-tail test for macOS #68

Merged

2 tasks

attila mentioned this pull request Jun 4, 2026

chore(release): cut v0.5.0 #69

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: probe Ollama runtime via lore status --full and richer hook warning#67

feat: probe Ollama runtime via lore status --full and richer hook warning#67
attila merged 7 commits into
mainfrom
feat/ollama-runtime-probe

attila commented Jun 4, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

attila commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Discovered during implementation

Code-review fixes applied

Compounded learnings

Verification (currently-broken Homebrew Ollama bottle)

Pre-existing failure flagged

Plan

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

attila commented Jun 4, 2026 •

edited

Loading