feat: probe Ollama runtime via lore status --full and richer hook warning#67
Merged
Conversation
Three-unit plan to expose Ollama inference breakage in lore status via an opt-in --full probe (U1+U2), and to upgrade the hook FTS-fallback warning so the push-side surface is informative (U3). Plan went through three rounds of doc-review (coherence, feasibility, adversarial) plus a product pivot from default-on probing to opt-in via --full.
Adds opt-in runtime verification to lore status via a new --full flag. The existing is_healthy + has_model checks only confirm the daemon answers and the manifest is on disk — they do not exercise the runner subprocess. The broken Homebrew bottle reproduces this exact gap: status reports ✓✓✓ while real inference fails, and lore silently degrades to FTS. Default lore status stays sub-second and shows a discoverability hint pointing at --full; --full runs a minimal embed against the configured model with keep_alive=0 so it does not pin the model in RAM. Provisioning (lore init) runs the probe automatically so fresh setups verify inference. ProbeError carries five variants (RunnerFailed, InferenceError, Timeout, Transport, HttpStatus) so failure modes render differently — a slow disk does not get misdiagnosed as a runner bundle bug. render_failure centralises the variant→strings match so adding a variant touches one site, not four. Body extraction parses Ollama's JSON error field when present and falls back to first-line+truncate. Spells out ureq 3.x mechanics: send_json returns Ok(Response) for HTTP 5xx in ureq 3, so probe and embed must check response.status() before read_json. The previous Embedder::embed code swallowed the broken-runner body because read_json failed parsing the error response as EmbedResponse. check_status now takes a full: bool parameter with no default — the explicit-no-default rule prevents a future Default impl from accidentally turning default lore status into the slow path.
Hook FTS-fallback warning at src/hook.rs now classifies the embed error via
ProbeError, names the failure class via render_failure().short_reason, and
points users at 'lore status --full' for full diagnosis. Per-process rate
limit keyed on failure class prevents the same warning from repeating within
one process (typically a no-op since each hook is its own process, but
documents the intent if call shape changes).
ureq 3.x defaults http_status_as_error to true — Error::StatusCode(_) is
returned for HTTP 5xx without the response body. Manually testing the
broken-Homebrew bottle revealed the runtime probe was rendering the error
as 'transport error — http status: 500' instead of surfacing Ollama's
actual diagnostic. Reconfigured OllamaClient's agent with
http_status_as_error(false) so classify_embed_response can read the 5xx
body and produce a RunnerFailed variant carrying Ollama's actual error
string. is_healthy, has_model, and pull_model now check response.status()
manually since they can no longer rely on the agent short-circuiting on
non-2xx.
Tested against the currently-broken Homebrew Ollama bottle:
lore status — sub-second, hint line shown
lore status --full — Runtime ✗ inference failed — error starting
llama-server: llama-server binary not found...
lore hook (PreToolUse) — Warning: Ollama embed failed (inference error);
falling back to text search. Run 'lore status
--full' for details. Followed by FTS results.
Adds probe_succeeds_against_real_ollama integration test gated by
'just test-integration'.
Pre-existing unrelated test failure: hook_transcript_tail_toggle_populates_
field_when_enabled fails on main without these changes; uses FakeEmbedder
and tests trace-record transcript_tail logic, untouched by this PR.
Manual verification against the broken Homebrew bottle revealed the 80-char limit chopped Ollama's runner-not-found errors mid-path, exactly where the diagnostic value lives. 240 chars fits multiple full Homebrew Cellar paths on typical terminals while still bounding pathological multi-paragraph bodies. Plan KTD updated to match. Also flips plan status: active → completed and records the ureq 3.x http_status_as_error discovery under Discovered during implementation, so the plan reaches its post-merge resting shape.
Code-review surfaced a mismatch between the plan and the code. The plan claimed lore init "continues to completion on probe failure; does not abort", but cmd_init (pre-existing wrapper) calls process::exit(1) on any non-empty result.errors — before the ingest phase. The current code (abort on probe failure) is the right behaviour: letting init continue past a broken embedder would run full_ingest, fail per-chunk on embed calls, and create a database with empty embeddings — silently broken vector search. Aborting forces the user to fix Ollama and re-run; the manifest is already on disk so the re-run is fast. The plan's claim was based on a misread of pre-existing provision() semantics. Amend the plan to acknowledge this rather than change the code.
Multi-agent code review surfaced several issues; this commit lands the fixes the user approved. #2 (api-contract): mark ProbeError, ProbeOutcome, RenderedFailure with #[non_exhaustive] so adding a variant later isn't a breaking change for downstream match consumers. Main.rs (separate crate from the library) gets a wildcard arm in render_runtime_line for future variants. #4 (correctness + adversarial cross-reviewer): Embedder::embed and OllamaClient::probe both wrap the 2xx-with-empty-or-missing-embeddings case in ProbeError::InferenceError instead of a plain anyhow!. Without this, the hook downcast fell through to "unreachable" and the probe declared OK on {} bodies that provided no proof the runner ran. #5 (agent-native): expose Ollama state via MCP lore_status. New `ollama` metadata object carries installed/running/model_available/model. New optional `probe_runtime: true` argument runs the probe and exposes the ProbeOutcome (state + short_reason + status_line) under `runtime` — agents now have first-class parity with `lore status --full` without shelling out and parsing stderr. #6 (reliability): render the status header before the blocking probe so `lore status --full` doesn't show a frozen terminal during the 30 s probe wait. A visible "Verifying inference runtime..." line bridges the gap; the actual Runtime line lands when the probe returns. #7 (adversarial): use read_to_vec + String::from_utf8_lossy instead of read_to_string().unwrap_or_default(). Non-UTF8 sequences in a 5xx body no longer collapse the entire diagnostic to empty string. #10 (learnings): ureq sibling-path audit verified clean. All 5 self.agent.* call sites (is_healthy, has_model, pull_model, probe, embed) check response.status() or route through classify_embed_response. Plus three smaller follow-ups: - bump --full keep_alive from Some(0) to Some(30) so "fix Ollama, re-run --full to verify" hits a warm cache on the second invocation rather than paying cold load every time. - render ProbeOutcome::Skipped as a visible line under --full (user explicitly asked for runtime info; silent omission was surprising). - emit a lore_debug! breadcrumb when the hook FTS-fallback warning rate limit suppresses a repeat warning, so operators wondering "why didn't I see it?" have a thread to pull via LORE_DEBUG=1. Plus a snapshot accept for the lore_status tool input schema (new `probe_runtime` field).
Capture durable learnings surfaced during PR #67 so the next time someone hits these problems, the cost drops from research to lookup. best-practices/ureq-3-http-status-as-error-defaults-true — ureq 3.x kept the status-as-error default (renamed to http_status_as_error, true by default). 5xx responses surface as Error::StatusCode(u16) without the body. The plan and a feasibility reviewer both got this wrong; only manual verification surfaced it. Documents the reconfiguration pattern + sibling-callsite audit. best-practices/bin-and-lib-are-separate-crates-for-non-exhaustive — #[non_exhaustive] on a public enum in src/lib.rs requires a wildcard arm in src/main.rs matches but not in same-module-tree matches. bin and lib in the same Cargo package compile as separate crates. The asymmetry is non-obvious from the Rust reference. design-patterns/multi-surface-render-function-pattern — when an enum must render across N user-facing surfaces, centralise the variant-to- strings match in one function returning an N-field struct. Adding a variant touches one match arm; the struct fields are the discovery mechanism. Reference implementation is render_failure(&ProbeError) -> RenderedFailure in src/embeddings.rs. best-practices/http-error-body-extraction-and-char-boundary-safe- truncation — four-layer body extraction for arbitrary HTTP error bodies: lossy UTF-8 decode, JSON error field unwrap, first-non-empty- line fallback, control-character strip, char-count truncation with ellipsis. The char-boundary safety property is the one that prevents multi-byte slice panics; unit tests pin it. conventions/expensive-cli-checks-go-behind-opt-in-flag-with-rescue- mechanisms — the four-mechanism CLI design that emerged from the default-on-vs-opt-in product pivot: fast default + discoverability hint line + push-side warning + install-time auto-probe. Each mechanism catches a different user path; together they solve silent degradation without penalising routine status calls. All five include "When to Apply" and "Skip when" guidance so future agents can reason about applicability rather than copying blindly.
2 tasks
attila
added a commit
that referenced
this pull request
Jun 4, 2026
The macOS tempdir lives under /var/folders/... which symlinks to /private/var/folders/.... validate_transcript_path canonicalises the transcript path before the startswith(HOME) check, so the test's non-canonical HOME override failed the check on macOS while passing on Linux. Canonicalise the tempdir path before setting HOME. Also backfill the PR #67 (Ollama runtime probe) entry in ROADMAP.md under Completed. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
lore statuspreviously reported Ollama as healthy whenever the daemon answered and the modelmanifest was on disk — even when the bundled runner subprocess was missing or unloadable. The
current Homebrew Ollama bottle reproduces this exact gap: status shows ✓✓✓ while hooks silently
degrade to FTS with an opaque one-line warning.
This adds:
lore status --full— opt-in deeper check that probes Ollama with a real embed call againstthe configured model. Default
lore statusstays sub-second and shows a discoverability hintpointing at
--full. Header lines print before the probe so the multi-second wait isn't afrozen terminal.
lore initauto-probes — provisioning verifies inference end-to-end before declaring success.inference error,timed out,transport error,HTTP <status>) and points atlore status --fullfor full diagnosis.Per-process rate limit prevents repeat warnings of the same class; suppressions leave a
LORE_DEBUG=1breadcrumb.lore_statusparity — exposes Ollama state (installed/running/model_available/model)in the metadata fence by default, with optional
probe_runtime: trueargument that runs theprobe and returns the structured
ProbeOutcome(state + short_reason + status_line). Agentsnow have first-class parity with
lore status --fullwithout shelling out and parsing stderr.The probe returns a structured
ProbeError(RunnerFailed / InferenceError / Timeout / Transport /HttpStatus) and a single
render_failurefunction centralises the variant→strings match so alluser-facing surfaces (status line, provision errors, provision actions, hook short-reason, MCP
metadata) stay synchronised when the enum gains a variant.
ProbeError,ProbeOutcome, andRenderedFailurecarry#[non_exhaustive]so future variants aren't breaking changes fordownstream
matchconsumers.Discovered during implementation
http_status_as_errordefaults totrue. The plan asserted ureq 3 returnedOk(Response)for HTTP 5xx. Manual verification revealed the opposite — 5xx came back asError::StatusCode(_)without the body, rendering "transport error — http status: 500" instead ofOllama's actual diagnostic. Reconfigured
OllamaClient's agent withhttp_status_as_error(false)and made
is_healthy/has_model/pull_modelcheckresponse.status()explicitly topreserve their semantics. Sibling-path audit during code-review confirmed all five
self.agent.*callsites are covered.
errors mid-path, exactly where the diagnostic value lives. 240 fits multiple full Homebrew Cellar
paths on typical terminals while still bounding pathological multi-paragraph bodies.
from_utf8_lossy. Non-UTF8 sequences in a 5xx body would otherwise collapsethe whole diagnostic to empty string. Lossy decode preserves what's decodable.
lore initaborts on probe failure. An earlier plan draft claimed initcontinues to completion. The pre-existing
cmd_initwrapper has always calledprocess::exit(1)on any non-emptyresult.errors; letting init continue past a broken embedderwould create a database with empty embeddings (silently broken vector search). Aborting forces
the user to fix Ollama and re-run; the manifest is already on disk so the re-run is fast. Plan
amended; code is right.
Code-review fixes applied
PR went through a multi-agent code review (10 reviewers). All four P1 findings addressed:
lore init aborts on probe failure— plan amended (code is right).#[non_exhaustive]added toProbeError,ProbeOutcome,RenderedFailure.check_status signature break— not advertised as breaking; project isn't published tocrates.io yet, no external consumers.
ProbeError::InferenceError(wasanyhow!), so thehook downcast classifies correctly.
probe()also rejects{}bodies that prove nothing.Plus all approved P2/P3 follow-ups: MCP parity, header-before-probe rendering,
from_utf8_lossy,13 new unit tests for
extract_body_message/render_failure/ProbeErrordowncast,keep_alivebumped 0→30s on--fullfor the "fix Ollama, re-run" loop, visible Skipped Runtimeline under
--full,lore_debug!breadcrumb on rate-limit suppression.Compounded learnings
Five durable learnings landed in
docs/solutions/so future sessions hit lookups instead ofresearch. All five include explicit "When to Apply" / "Skip when" guidance:
best-practices/ureq-3-http-status-as-error-defaults-true-2026-06-04.md— ureq 3.x kept the status-as-error default; 5xx loses its body unless reconfigured.
best-practices/bin-and-lib-are-separate-crates-for-non-exhaustive-2026-06-04.md—
#[non_exhaustive]semantics across the lib/bin boundary in the same Cargo package.design-patterns/multi-surface-render-function-pattern-2026-06-04.md— one
render(&Enum) -> Renderedfunction returning an N-field struct keeps everysurface in sync when variants change.
best-practices/http-error-body-extraction-and-char-boundary-safe-truncation-2026-06-04.md— four-layer body extraction (lossy decode → JSON error → first line → char-bounded
truncation) with the multi-byte-codepoint safety property pinned by unit tests.
conventions/expensive-cli-checks-go-behind-opt-in-flag-with-rescue-mechanisms-2026-06-04.md— the four-mechanism design that emerged from the default-on/opt-in product pivot:
fast default + discoverability hint + push-side warning + install-time auto-probe.
Verification (currently-broken Homebrew Ollama bottle)
Before this PR:
lore statusshows ✓✓✓, hook printsWarning: Ollama unreachable (...), falling back to text search.After this PR:
Once Ollama is repaired (e.g.
brew reinstall ollama) these flip back to green automatically — nocode change needed.
Pre-existing failure flagged
hook_transcript_tail_toggle_populates_field_when_enabledfails onmainindependently of thisPR. It uses
FakeEmbedderand tests trace-recordtranscript_taillogic — completely untouched bythese changes. Root cause is a macOS-only tempdir/symlink (
/var→/private/var) interaction thatthe test setup doesn't canonicalise; CI is Linux-only so it never bites there. Tracked separately
for follow-up.
Plan
docs/plans/2026-06-04-001-feat-ollama-runtime-probe-plan.md— went through three rounds of doc-review (coherence, feasibility, adversarial) plus a product
pivot from default-on probing to opt-in via
--full. Plan statuscompleted, with the ureq-3.x,truncation, and
lore initcorrections recorded under "Discovered during implementation".Test plan
cargo build --releasecargo clippy --all-targets --features test-support -- -D warningscargo test --features test-support --lib(792 passed, was 779 — 13 new unit tests added)just ci(fmt, clippy, test, deny, doc) — passes except the pre-existing macOS hook testlore statusagainst broken Homebrew Ollama (sub-second, hint shown)lore status --fullagainst broken Homebrew Ollama (header prints first, then ✗with Ollama's actual error body showing multiple full Cellar paths)
probe_succeeds_against_real_ollamaintegration test once Ollama is fixed (currentlyexpected-fail against broken bottle, which is the natural acceptance test)
lore_statuscall withprobe_runtime: trueagainst repaired Ollama (verifyruntime: ok in metadata fence)