Skip to content

feat: probe Ollama runtime via lore status --full and richer hook warning#67

Merged
attila merged 7 commits into
mainfrom
feat/ollama-runtime-probe
Jun 4, 2026
Merged

feat: probe Ollama runtime via lore status --full and richer hook warning#67
attila merged 7 commits into
mainfrom
feat/ollama-runtime-probe

Conversation

@attila

@attila attila commented Jun 4, 2026

Copy link
Copy Markdown
Owner

Summary

lore status previously reported Ollama as healthy whenever the daemon answered and the model
manifest was on disk — even when the bundled runner subprocess was missing or unloadable. The
current Homebrew Ollama bottle reproduces this exact gap: status shows ✓✓✓ while hooks silently
degrade to FTS with an opaque one-line warning.

This adds:

  • lore status --full — opt-in deeper check that probes Ollama with a real embed call against
    the configured model. Default lore status stays sub-second and shows a discoverability hint
    pointing at --full. Header lines print before the probe so the multi-second wait isn't a
    frozen terminal.
  • lore init auto-probes — provisioning verifies inference end-to-end before declaring success.
  • Upgraded hook FTS-fallback warning — names the failure class (inference error, timed out,
    transport error, HTTP <status>) and points at lore status --full for full diagnosis.
    Per-process rate limit prevents repeat warnings of the same class; suppressions leave a
    LORE_DEBUG=1 breadcrumb.
  • MCP lore_status parity — exposes Ollama state (installed/running/model_available/model)
    in the metadata fence by default, with optional probe_runtime: true argument that runs the
    probe and returns the structured ProbeOutcome (state + short_reason + status_line). Agents
    now have first-class parity with lore status --full without shelling out and parsing stderr.

The probe returns a structured ProbeError (RunnerFailed / InferenceError / Timeout / Transport /
HttpStatus) and a single render_failure function centralises the variant→strings match so all
user-facing surfaces (status line, provision errors, provision actions, hook short-reason, MCP
metadata) stay synchronised when the enum gains a variant. ProbeError, ProbeOutcome, and
RenderedFailure carry #[non_exhaustive] so future variants aren't breaking changes for
downstream match consumers.

Discovered during implementation

  • ureq 3.x http_status_as_error defaults to true. The plan asserted ureq 3 returned
    Ok(Response) for HTTP 5xx. Manual verification revealed the opposite — 5xx came back as
    Error::StatusCode(_) without the body, rendering "transport error — http status: 500" instead of
    Ollama's actual diagnostic. Reconfigured OllamaClient's agent with http_status_as_error(false)
    and made is_healthy / has_model / pull_model check response.status() explicitly to
    preserve their semantics. Sibling-path audit during code-review confirmed all five self.agent.*
    callsites are covered.
  • Body truncation bumped 80 → 240 chars. The 80-char limit chopped Ollama's runner-not-found
    errors mid-path, exactly where the diagnostic value lives. 240 fits multiple full Homebrew Cellar
    paths on typical terminals while still bounding pathological multi-paragraph bodies.
  • Body read uses from_utf8_lossy. Non-UTF8 sequences in a 5xx body would otherwise collapse
    the whole diagnostic to empty string. Lossy decode preserves what's decodable.
  • Plan correction: lore init aborts on probe failure. An earlier plan draft claimed init
    continues to completion. The pre-existing cmd_init wrapper has always called
    process::exit(1) on any non-empty result.errors; letting init continue past a broken embedder
    would create a database with empty embeddings (silently broken vector search). Aborting forces
    the user to fix Ollama and re-run; the manifest is already on disk so the re-run is fast. Plan
    amended; code is right.

Code-review fixes applied

PR went through a multi-agent code review (10 reviewers). All four P1 findings addressed:

  1. lore init aborts on probe failure — plan amended (code is right).
  2. #[non_exhaustive] added to ProbeError, ProbeOutcome, RenderedFailure.
  3. check_status signature break — not advertised as breaking; project isn't published to
    crates.io yet, no external consumers.
  4. HTTP 200 empty-embeddings now wraps in ProbeError::InferenceError (was anyhow!), so the
    hook downcast classifies correctly. probe() also rejects {} bodies that prove nothing.

Plus all approved P2/P3 follow-ups: MCP parity, header-before-probe rendering, from_utf8_lossy,
13 new unit tests for extract_body_message / render_failure / ProbeError downcast,
keep_alive bumped 0→30s on --full for the "fix Ollama, re-run" loop, visible Skipped Runtime
line under --full, lore_debug! breadcrumb on rate-limit suppression.

Compounded learnings

Five durable learnings landed in docs/solutions/ so future sessions hit lookups instead of
research. All five include explicit "When to Apply" / "Skip when" guidance:

Verification (currently-broken Homebrew Ollama bottle)

Before this PR: lore status shows ✓✓✓, hook prints
Warning: Ollama unreachable (...), falling back to text search.

After this PR:

$ lore status        # sub-second, hint surfaces deeper check
  Model:        ✓ nomic-embed-text
  Runtime:      —  (run 'lore status --full' to verify inference)

$ lore status --full # header prints first, then a brief wait, then result
  Model:        ✓ nomic-embed-text
  Verifying inference runtime (may take a few seconds)…
  Runtime:      ✗ inference failed — error starting llama-server: llama-server binary not found
                  (checked: /opt/homebrew/Cellar/ollama/0.30.4/libexec/lib/ollama/llama-server,
                  /opt/homebrew/Cellar/ollama/0.30.4/libexec/llama-server, ...)

$ echo '{"hook_event_name":"PreToolUse",...}' | lore hook
Warning: Ollama embed failed (inference error); falling back to text search. Run 'lore status --full' for details.
{ ... FTS-fallback results ... }

Once Ollama is repaired (e.g. brew reinstall ollama) these flip back to green automatically — no
code change needed.

Pre-existing failure flagged

hook_transcript_tail_toggle_populates_field_when_enabled fails on main independently of this
PR. It uses FakeEmbedder and tests trace-record transcript_tail logic — completely untouched by
these changes. Root cause is a macOS-only tempdir/symlink (/var/private/var) interaction that
the test setup doesn't canonicalise; CI is Linux-only so it never bites there. Tracked separately
for follow-up.

Plan

docs/plans/2026-06-04-001-feat-ollama-runtime-probe-plan.md
— went through three rounds of doc-review (coherence, feasibility, adversarial) plus a product
pivot from default-on probing to opt-in via --full. Plan status completed, with the ureq-3.x,
truncation, and lore init corrections recorded under "Discovered during implementation".

Test plan

  • cargo build --release
  • cargo clippy --all-targets --features test-support -- -D warnings
  • cargo test --features test-support --lib (792 passed, was 779 — 13 new unit tests added)
  • just ci (fmt, clippy, test, deny, doc) — passes except the pre-existing macOS hook test
  • Manual lore status against broken Homebrew Ollama (sub-second, hint shown)
  • Manual lore status --full against broken Homebrew Ollama (header prints first, then ✗
    with Ollama's actual error body showing multiple full Cellar paths)
  • Manual hook invocation against broken Homebrew Ollama (informative warning + FTS results)
  • Run probe_succeeds_against_real_ollama integration test once Ollama is fixed (currently
    expected-fail against broken bottle, which is the natural acceptance test)
  • Manual MCP lore_status call with probe_runtime: true against repaired Ollama (verify
    runtime: ok in metadata fence)

attila added 3 commits June 4, 2026 20:26
Three-unit plan to expose Ollama inference breakage in lore status via an
opt-in --full probe (U1+U2), and to upgrade the hook FTS-fallback warning so
the push-side surface is informative (U3). Plan went through three rounds of
doc-review (coherence, feasibility, adversarial) plus a product pivot from
default-on probing to opt-in via --full.
Adds opt-in runtime verification to lore status via a new --full flag. The
existing is_healthy + has_model checks only confirm the daemon answers and
the manifest is on disk — they do not exercise the runner subprocess. The
broken Homebrew bottle reproduces this exact gap: status reports ✓✓✓ while
real inference fails, and lore silently degrades to FTS.

Default lore status stays sub-second and shows a discoverability hint
pointing at --full; --full runs a minimal embed against the configured
model with keep_alive=0 so it does not pin the model in RAM. Provisioning
(lore init) runs the probe automatically so fresh setups verify inference.

ProbeError carries five variants (RunnerFailed, InferenceError, Timeout,
Transport, HttpStatus) so failure modes render differently — a slow disk
does not get misdiagnosed as a runner bundle bug. render_failure
centralises the variant→strings match so adding a variant touches one site,
not four. Body extraction parses Ollama's JSON error field when present and
falls back to first-line+truncate.

Spells out ureq 3.x mechanics: send_json returns Ok(Response) for HTTP 5xx
in ureq 3, so probe and embed must check response.status() before
read_json. The previous Embedder::embed code swallowed the broken-runner
body because read_json failed parsing the error response as EmbedResponse.

check_status now takes a full: bool parameter with no default — the
explicit-no-default rule prevents a future Default impl from accidentally
turning default lore status into the slow path.
Hook FTS-fallback warning at src/hook.rs now classifies the embed error via
ProbeError, names the failure class via render_failure().short_reason, and
points users at 'lore status --full' for full diagnosis. Per-process rate
limit keyed on failure class prevents the same warning from repeating within
one process (typically a no-op since each hook is its own process, but
documents the intent if call shape changes).

ureq 3.x defaults http_status_as_error to true — Error::StatusCode(_) is
returned for HTTP 5xx without the response body. Manually testing the
broken-Homebrew bottle revealed the runtime probe was rendering the error
as 'transport error — http status: 500' instead of surfacing Ollama's
actual diagnostic. Reconfigured OllamaClient's agent with
http_status_as_error(false) so classify_embed_response can read the 5xx
body and produce a RunnerFailed variant carrying Ollama's actual error
string. is_healthy, has_model, and pull_model now check response.status()
manually since they can no longer rely on the agent short-circuiting on
non-2xx.

Tested against the currently-broken Homebrew Ollama bottle:
  lore status         — sub-second, hint line shown
  lore status --full  — Runtime ✗ inference failed — error starting
                        llama-server: llama-server binary not found...
  lore hook (PreToolUse) — Warning: Ollama embed failed (inference error);
                           falling back to text search. Run 'lore status
                           --full' for details. Followed by FTS results.

Adds probe_succeeds_against_real_ollama integration test gated by
'just test-integration'.

Pre-existing unrelated test failure: hook_transcript_tail_toggle_populates_
field_when_enabled fails on main without these changes; uses FakeEmbedder
and tests trace-record transcript_tail logic, untouched by this PR.
@attila attila marked this pull request as ready for review June 4, 2026 19:52
attila added 4 commits June 4, 2026 20:54
Manual verification against the broken Homebrew bottle revealed the 80-char
limit chopped Ollama's runner-not-found errors mid-path, exactly where the
diagnostic value lives. 240 chars fits multiple full Homebrew Cellar paths
on typical terminals while still bounding pathological multi-paragraph
bodies. Plan KTD updated to match.

Also flips plan status: active → completed and records the ureq 3.x
http_status_as_error discovery under Discovered during implementation, so
the plan reaches its post-merge resting shape.
Code-review surfaced a mismatch between the plan and the code. The plan
claimed lore init "continues to completion on probe failure; does not
abort", but cmd_init (pre-existing wrapper) calls process::exit(1) on any
non-empty result.errors — before the ingest phase.

The current code (abort on probe failure) is the right behaviour: letting
init continue past a broken embedder would run full_ingest, fail per-chunk
on embed calls, and create a database with empty embeddings — silently
broken vector search. Aborting forces the user to fix Ollama and re-run;
the manifest is already on disk so the re-run is fast.

The plan's claim was based on a misread of pre-existing provision()
semantics. Amend the plan to acknowledge this rather than change the code.
Multi-agent code review surfaced several issues; this commit lands the
fixes the user approved.

#2 (api-contract): mark ProbeError, ProbeOutcome, RenderedFailure with
  #[non_exhaustive] so adding a variant later isn't a breaking change for
  downstream match consumers. Main.rs (separate crate from the library)
  gets a wildcard arm in render_runtime_line for future variants.

#4 (correctness + adversarial cross-reviewer): Embedder::embed and
  OllamaClient::probe both wrap the 2xx-with-empty-or-missing-embeddings
  case in ProbeError::InferenceError instead of a plain anyhow!. Without
  this, the hook downcast fell through to "unreachable" and the probe
  declared OK on {} bodies that provided no proof the runner ran.

#5 (agent-native): expose Ollama state via MCP lore_status. New `ollama`
  metadata object carries installed/running/model_available/model. New
  optional `probe_runtime: true` argument runs the probe and exposes the
  ProbeOutcome (state + short_reason + status_line) under `runtime` —
  agents now have first-class parity with `lore status --full` without
  shelling out and parsing stderr.

#6 (reliability): render the status header before the blocking probe so
  `lore status --full` doesn't show a frozen terminal during the 30 s
  probe wait. A visible "Verifying inference runtime..." line bridges the
  gap; the actual Runtime line lands when the probe returns.

#7 (adversarial): use read_to_vec + String::from_utf8_lossy instead of
  read_to_string().unwrap_or_default(). Non-UTF8 sequences in a 5xx body
  no longer collapse the entire diagnostic to empty string.

#10 (learnings): ureq sibling-path audit verified clean. All 5
  self.agent.* call sites (is_healthy, has_model, pull_model, probe,
  embed) check response.status() or route through classify_embed_response.

Plus three smaller follow-ups:
- bump --full keep_alive from Some(0) to Some(30) so "fix Ollama, re-run
  --full to verify" hits a warm cache on the second invocation rather
  than paying cold load every time.
- render ProbeOutcome::Skipped as a visible line under --full (user
  explicitly asked for runtime info; silent omission was surprising).
- emit a lore_debug! breadcrumb when the hook FTS-fallback warning rate
  limit suppresses a repeat warning, so operators wondering "why didn't
  I see it?" have a thread to pull via LORE_DEBUG=1.

Plus a snapshot accept for the lore_status tool input schema (new
`probe_runtime` field).
Capture durable learnings surfaced during PR #67 so the next time
someone hits these problems, the cost drops from research to lookup.

best-practices/ureq-3-http-status-as-error-defaults-true — ureq 3.x
kept the status-as-error default (renamed to http_status_as_error,
true by default). 5xx responses surface as Error::StatusCode(u16)
without the body. The plan and a feasibility reviewer both got this
wrong; only manual verification surfaced it. Documents the
reconfiguration pattern + sibling-callsite audit.

best-practices/bin-and-lib-are-separate-crates-for-non-exhaustive —
#[non_exhaustive] on a public enum in src/lib.rs requires a wildcard
arm in src/main.rs matches but not in same-module-tree matches. bin
and lib in the same Cargo package compile as separate crates. The
asymmetry is non-obvious from the Rust reference.

design-patterns/multi-surface-render-function-pattern — when an enum
must render across N user-facing surfaces, centralise the variant-to-
strings match in one function returning an N-field struct. Adding a
variant touches one match arm; the struct fields are the discovery
mechanism. Reference implementation is render_failure(&ProbeError) ->
RenderedFailure in src/embeddings.rs.

best-practices/http-error-body-extraction-and-char-boundary-safe-
truncation — four-layer body extraction for arbitrary HTTP error
bodies: lossy UTF-8 decode, JSON error field unwrap, first-non-empty-
line fallback, control-character strip, char-count truncation with
ellipsis. The char-boundary safety property is the one that prevents
multi-byte slice panics; unit tests pin it.

conventions/expensive-cli-checks-go-behind-opt-in-flag-with-rescue-
mechanisms — the four-mechanism CLI design that emerged from the
default-on-vs-opt-in product pivot: fast default + discoverability
hint line + push-side warning + install-time auto-probe. Each
mechanism catches a different user path; together they solve silent
degradation without penalising routine status calls.

All five include "When to Apply" and "Skip when" guidance so future
agents can reason about applicability rather than copying blindly.
@attila attila merged commit 08713f0 into main Jun 4, 2026
10 checks passed
@attila attila deleted the feat/ollama-runtime-probe branch June 4, 2026 20:56
attila added a commit that referenced this pull request Jun 4, 2026
The macOS tempdir lives under /var/folders/... which symlinks to
/private/var/folders/.... validate_transcript_path canonicalises
the transcript path before the startswith(HOME) check, so the test's
non-canonical HOME override failed the check on macOS while passing
on Linux. Canonicalise the tempdir path before setting HOME.

Also backfill the PR #67 (Ollama runtime probe) entry in ROADMAP.md
under Completed.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@attila attila mentioned this pull request Jun 4, 2026
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant