Releases: Fortemi/fortemi
v2026.5.13
Security maintenance release for the dependency advisory sweep after 2026.5.12. This release updates vulnerable Rust and npm transitive artifacts, removes obsolete advisory allowlists, and includes the post-2026.5.12 documentation/issue-tracking cleanup commits.
Security
- Rust advisory updates - Updated `openssl` from `0.10.79` to `0.10.80`, `rand` `0.8.5` to `0.8.6`, and `rand` `0.9.2` to `0.9.3` in `Cargo.lock`.
- npm advisory update - Pinned the MCP server dependency graph to `qs` `6.15.2` via `mcp-server/package.json` overrides and lockfile update.
- Advisory allowlist cleanup - Removed the temporary `RUSTSEC-2026-0097` ignores from `cargo audit` and `cargo deny` configuration now that the patched `rand` lines are present.
- Supply-chain review completed - Verified crates.io/npm provenance and artifact integrity for the patched packages, diffed prior and patched artifacts, and closed Fortemi issue #857 with the evidence summary.
Changed
- Issue tracking preference - Updated project guidance to prefer the canonical Gitea tracker.
- Planning documentation - Captured licensing and storage planning updates for the upcoming distribution work.
Verification
- `cargo audit`
- `cargo deny check advisories`
- `npm ls qs`
- `npm audit --omit=dev`
v2026.5.12
Realtime provider integration milestone. This release adds the standards-shaped call transport foundation, the first Twilio Voice adapter, Deepgram live ASR, provider-neutral call event outbox contracts, and the batch transcription bridge for completed Twilio recordings. It also includes fail-closed authentication defaults and the incoming webhook receiver foundation used by provider control-plane callbacks.
Highlights
| What Changed | Why You Care |
|---|---|
| Twilio Voice realtime adapter | Fortemi can accept signed Twilio Voice webhooks and Twilio Media Streams WebSocket audio for live call transcription. |
| Standards-shaped realtime contracts | Provider-specific wire formats stay inside adapters while call lifecycle, media frames, ASR events, and outbox rows stay provider-neutral. |
| Deepgram streaming ASR | Live call audio can produce partial/final transcript events with reconnect, failover accounting, and health metrics. |
| Recording-completed batch bridge | Twilio recordings are imported as audio attachments and queued through the existing AudioTranscriptionHandler for higher-quality post-call transcripts. |
| Fail-closed auth default | /api/v1/* endpoints now require auth by default unless an operator explicitly opts into anonymous local mode. |
Added
- Incoming webhook receivers - `POST /api/v1/webhooks/incoming`, receiver lookup, payload validation, HMAC verification, and Twilio Voice schema support for provider control-plane callbacks.
- Realtime call sessions - persisted `call_sessions` metadata, call detail lookup at `GET /api/v1/calls/{call_id}`, realtime session metrics, and database coverage for active/completed call aggregation.
- Realtime transport foundation - provider-neutral `MediaFrame`, codec normalization, mock adapter fixtures, mock ASR backend, and Twilio adapter mapping helpers.
- Twilio Voice + Media Streams support - signed Voice webhooks create/update call sessions, `/api/v1/realtime/twilio/{CallSid}` accepts Twilio media streams, and recent-session gating rejects stale or unknown WebSocket attempts.
- Deepgram streaming backend - WebSocket ASR client with secure API-key handling, event parsing, reconnect/backoff behavior, failover to a configured fallback backend, and health metrics.
- Transcript outbox foundation - `event_outbox` table and helpers for realtime transcript/call events, including high-volume transcript emission coverage.
- Twilio call-event outbox contract - adapter-owned mapping for `call_started`, `state_change`, `recording_available`, and `ended` events so downstream consumers are insulated from Twilio status strings.
- Twilio recording transcription bridge - completed recording callbacks download the recording into file storage, create an audio attachment note, queue `AudioTranscription`, and link batch transcript policy metadata back to the call session.
- Realtime provider setup documentation - `docs/deployment/realtime-providers.md` now covers Twilio + Deepgram setup, consent/disclosure, troubleshooting, and contract completeness for local, single-tenant, multi-tenant, regulated, and future provider deployments.
Changed
- Authentication defaults are fail-closed - `REQUIRE_AUTH` now defaults to `true`. Anonymous mode requires both `REQUIRE_AUTH=false` and `I_UNDERSTAND_NO_AUTH=true`; multi-tenant deployments reject anonymous mode regardless of acknowledgment.
- CI and pre-commit provider-boundary checks - realtime provider-specific imports are rejected outside adapter modules.
- Docs site workflows — docsite clone steps use the configured build token for authenticated source access.
- Call-session schema stability — call sessions no longer keep an archive foreign key that could deadlock archive registry operations.
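The fail-closed rule above can be read as a small decision function. The following is a minimal sketch, not Fortemi's actual startup code: the names `AuthMode` and `resolve_auth_mode` are hypothetical, and treating a missing acknowledgment as a startup error (rather than silently falling back to authenticated mode) is an assumption.

```rust
/// Outcome of resolving the auth mode at startup (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum AuthMode {
    Required,
    Anonymous,
}

/// Parse an env-style boolean; unset or unrecognised values use `default`.
fn parse_flag(value: Option<&str>, default: bool) -> bool {
    match value.map(|v| v.trim().to_ascii_lowercase()).as_deref() {
        Some("true") => true,
        Some("false") => false,
        _ => default,
    }
}

/// `require_auth` and `ack_no_auth` stand for the raw values of
/// REQUIRE_AUTH and I_UNDERSTAND_NO_AUTH.
pub fn resolve_auth_mode(
    require_auth: Option<&str>,
    ack_no_auth: Option<&str>,
    multi_tenant: bool,
) -> Result<AuthMode, String> {
    // REQUIRE_AUTH defaults to true, so an unset variable means "authenticated".
    if parse_flag(require_auth, true) {
        return Ok(AuthMode::Required);
    }
    // Multi-tenant deployments reject anonymous mode regardless of acknowledgment.
    if multi_tenant {
        return Err("anonymous mode is not allowed in multi-tenant deployments".into());
    }
    // Assumption: opting out without the acknowledgment is reported as an error.
    if !parse_flag(ack_no_auth, false) {
        return Err("REQUIRE_AUTH=false also needs I_UNDERSTAND_NO_AUTH=true".into());
    }
    Ok(AuthMode::Anonymous)
}
```

Under these assumptions, `resolve_auth_mode(None, None, false)` yields `AuthMode::Required`, and only the explicit two-variable opt-out on a non-multi-tenant deployment yields `AuthMode::Anonymous`.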
Fixed
- Call API response shape — call detail responses now align with the persisted session/transcript shape used by the realtime pipeline.
- Prepared test database use — call API tests now use the prepared DB path consistently.
- Syntactic chunker performance guard — relaxed an over-strict guard that could fail under noisy CI timing.
Security
- Fail-closed API authentication (ADR-094, fixes Gitea fortemi/fortemi#709). Existing single-user desktop and local-dev deployments that intentionally run anonymous must add `I_UNDERSTAND_NO_AUTH=true` when setting `REQUIRE_AUTH=false`. Stock bundled compose files include that acknowledgment for local profiles.
- Twilio webhook verification - Twilio signatures are validated against the externally visible URL using the receiver secret; proxy deployments must preserve `X-Forwarded-Host` and `X-Forwarded-Proto` for validation to succeed.
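To illustrate why the forwarded headers matter: the signature covers the URL the provider called, not the address the API process sees behind the proxy. Below is a rough sketch of rebuilding that URL. The function names are hypothetical, the signature check itself is omitted, and comma-separated header values from proxy chains are not handled.

```rust
use std::collections::HashMap;

/// Case-insensitive header lookup that ignores empty values.
fn header<'a>(headers: &'a HashMap<String, String>, name: &str) -> Option<&'a str> {
    headers
        .iter()
        .find(|(k, _)| k.eq_ignore_ascii_case(name))
        .map(|(_, v)| v.trim())
        .filter(|v| !v.is_empty())
}

/// Rebuild the externally visible URL, falling back to the local scheme and
/// host when the proxy did not forward the original values.
pub fn external_url(
    headers: &HashMap<String, String>,
    local_scheme: &str,
    local_host: &str,
    path_and_query: &str,
) -> String {
    let scheme = header(headers, "x-forwarded-proto").unwrap_or(local_scheme);
    let host = header(headers, "x-forwarded-host").unwrap_or(local_host);
    format!("{scheme}://{host}{path_and_query}")
}
```

If a proxy drops either header, the rebuilt URL falls back to the internal address, and a signature computed over the public URL will presumably no longer match.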
Migration Notes
- Database migrations included — this release adds incoming webhook receiver, realtime call-session, and event-outbox tables. Migrations run through the normal startup/migration path.
- Auth opt-out must be explicit — deployments that depended on the old implicit anonymous default will now start authenticated unless they set the two-variable local-mode acknowledgment.
- Realtime recording batch transcripts require file storage - Twilio `recording.completed` callbacks can only queue the batch transcription bridge when the API process has file storage configured and can reach the recording URL.
v2026.5.11
Maintenance tag. No functional change since 2026.5.10. Cuts a clean release tag that lands correctly on both the Gitea (origin) and GitHub (github) remotes — the 2026.5.7, 2026.5.8, and 2026.5.10 tags exist on GitHub but point at the pre-LLM-wizard commit (a stale mirror artifact). This release is the first post-workstation tag that is byte-identical across both remotes.
What the 2026.5.7 → 2026.5.11 sequence delivered (roll-up)
For consumers landing on this changelog and wanting the short version of what shipped under the "workstation" theme:
- 2026.5.7 - `./workstation` wrapper, `docker-compose.workstation.yml` unified stack (Fortemi + HotM + Ollama), QUICKSTART.md, WORKSTATION-SETUP.md, three profiles (`--backend-only`, `--no-ui`, default-ui).
- 2026.5.8 - Pluggable LLM backend selector: `.env.workstation.example` template + `./workstation configure-llm` wizard covering ollama, vllm, openai, openrouter, llamacpp. `extra_hosts: host.docker.internal:host-gateway` wired into compose so the same URL works on Linux/macOS/Windows. Doctor probes the configured backend.
- 2026.5.9 - Correctness patch after validating against the `qwen36_vllm_autodeploy_basic.sh` reference: wizard now prompts for the served-model-name (not the HF path), defaults to `qwen3.5:9b` so model strings stay stable across backends.
- 2026.5.10 - Docs surfacing: WORKSTATION-SETUP.md LLM backend section, README workstation-block callout, docs/content/quickstart.md "Building features on a dev box?" sibling callout, docs/content/inference-backends.md top callout.
No code changes
Bumped Cargo workspace + mcp-server to 2026.5.11. No schema, no behavior, no docs differ from 2026.5.10.
v2026.5.10
Docs-only release. The configure-llm wizard and the .env.workstation override layer landed in 2026.5.8–2026.5.9, but they were only discoverable via the wrapper's help output. This release threads them through the canonical docs so new users actually find them.
Documentation
- WORKSTATION-SETUP.md → new "LLM backend selection" section - Canonical ops reference for the backend layer. Covers the `env_file:` mechanism, the wizard flow, the five backends with per-backend gotcha table, the served-name-vs-HF-path distinction for vLLM (with both the light Qwen2.5-7B and heavy 35B-on-3×A100 patterns), the doctor's backend-probe check, and the switch-backends procedure (file edit + restart, or runtime hot-swap via `/api/v1/inference/config`).
- README.md → Local Workstation block - Added a "Want a different LLM than Ollama?" callout pointing at `configure-llm`. Surfaces the backend flexibility at the discovery layer so users picking the workstation path see it from the start, not after `up`.
- docs/content/quickstart.md → new "Building features on a dev box?" callout - Sibling block to the existing "Looking for a desktop app?" callout. Routes developers at the workstation flow (with the `configure-llm` invocation) instead of the Docker-bundle server path that the rest of this guide covers.
- docs/content/inference-backends.md → top callout - One-line redirect for workstation users so they reach the wizard rather than reading the by-hand env-var guide. The guide itself remains the authoritative reference for Docker-bundle and from-source paths.
No code changes
No version-affecting code, schema, or behavior changes. The version bump is for tag-gated CI release jobs (release notes are generated from the CHANGELOG entry on tag push).
v2026.5.8
Picking a different LLM backend on the workstation no longer requires editing the compose file. New interactive wizard plus an .env.workstation override layer cover the five common cases (Ollama, vLLM, OpenAI, OpenRouter, llama.cpp) without Docker expertise.
Added
- `.env.workstation.example` - Template with five copy-paste-ready provider blocks: `ollama-local` (default), `vllm-local`, `openai-cloud`, `openrouter`, `llamacpp-local`. Each block sets exactly the env vars the matric-api inference router needs, with inline guidance on host-to-container networking and the embedding pairing (cloud providers without embedding support automatically pair with the containerized Ollama). Copy to `.env.workstation` and uncomment one block.
- `./workstation configure-llm` - Interactive wizard that walks through the five options. Prompts API keys silently (no terminal echo), prompts host ports for local-on-host backends, writes `.env.workstation` with mode 600, backs up any existing file to `.env.workstation.bak`. Three aliases: `configure-llm`, `config-llm`, `llm`.
- Doctor check #8: LLM backend - Reports which backend is selected (from `.env.workstation` if present, else "ollama containerized default"), probes the configured endpoint, surfaces friendly remediation when the probe fails. Catches the most common "wrong port" / "vLLM not started" mistakes before `up`.
- QUICKSTART Step 3.5 - New optional step with a decision table (have vLLM? have an OpenAI key? …) and a recommendation tree. Stays a skip-by-default step so users who just want the working ollama path see no extra friction.
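The wizard's "back up, then write with mode 600" step can be sketched as follows. The actual wizard lives in the `./workstation` wrapper script; this Rust version is only an illustration of the same file handling, is Unix-only, and `write_env_file` is a hypothetical name.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
use std::path::Path;

/// Write `contents` to `path` with owner-only permissions, keeping the
/// previous file (if any) as `<path>.bak`.
pub fn write_env_file(path: &Path, contents: &str) -> io::Result<()> {
    if path.exists() {
        // Append ".bak" to the full file name, e.g. ".env.workstation.bak".
        let mut backup = path.as_os_str().to_owned();
        backup.push(".bak");
        fs::copy(path, &backup)?;
    }
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .mode(0o600) // only applied when the file is newly created
        .open(path)?;
    // Tighten permissions explicitly in case the file already existed.
    fs::set_permissions(path, fs::Permissions::from_mode(0o600))?;
    file.write_all(contents.as_bytes())
}
```

Setting the mode at creation time avoids a window in which a freshly written API key is readable by other users; the explicit `set_permissions` call covers the overwrite case.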
Changed
- `docker-compose.workstation.yml` matric-api service - Now loads `.env.workstation` via `env_file:` with `required: false` (compose 2.24+ spec), so it's silently no-op for users who don't create the file. When present, the override values supersede the inline `environment:` block. Also adds `extra_hosts: "host.docker.internal:host-gateway"` so vLLM/llama.cpp on the host work on Linux without IP juggling; the same URL works on macOS, Windows, and Linux.
Migration
- No-op for existing users on Ollama. `.env.workstation` is gitignored and not auto-created; if it doesn't exist, the workstation behaves exactly as in 2026.5.7.
- Switching to a cloud provider now takes ~30 seconds. `./workstation configure-llm` → pick option 3 or 4 → paste API key → done. No `docker-compose.yml` edits, no Dockerfile rebuild.
v2026.5.7
Local developer workstation: one command brings up Fortemi + HotM + Ollama in containers, with a friendly wrapper, pre-flight doctor, and step-by-step quickstart for users who have never touched Docker.
Added
- `./workstation` wrapper script (#708) - Named subcommands for the full dev-box workflow: `up`, `down`, `status`, `doctor`, `models pull`, `open`, `logs`, `shell`, `psql`, `reset`, `help`. The `up` command waits for healthy state and prints the URL; `doctor` runs 7 pre-flight checks (Docker, compose, ports, native ollama, HotM sibling repo, GPU passthrough, models) with explicit remediation text for each failure.
- `docker-compose.workstation.yml` (#708) - Single unified stack replacing the fragmented per-repo compose files for local dev. Includes ollama (GPU passthrough, bind-mounted `~/.ollama/`), postgres (pg18 + pgvector + PostGIS), matric-api (auth off, permissive CORS, rate limit disabled), HotM agent-proxy, and the HotM UI. Three profiles select which services come up:
  - default (`./workstation up --backend-only`) - ollama + postgres + matric-api. HotM repo not required.
  - `hotm` profile (`./workstation up --no-ui`) - adds agent-proxy. Useful for API-only integrations.
  - `ui` profile (`./workstation up`, the default) - full stack including HotM UI at `http://localhost:4180`.
- QUICKSTART.md - Five-step walkthrough for users new to Docker. Covers cloning both repos as siblings (or `--backend-only` if you don't want HotM), running doctor, bringing the stack up, pulling models, and verifying in a browser. Includes "what if something breaks?" section for the six most common first-run failures.
- WORKSTATION-SETUP.md - Operations reference manual: full command list, day-2 troubleshooting beyond the happy path, native-ollama removal (the one step that requires `sudo` and a human), volume management, GPU verification.
- README "Local workstation" section - Surfaces the workstation path alongside the bundle and HotM-desktop options so new users know they have three install paths.
Changed
- `agent-proxy` is now profile-gated - Was always-on in earlier workstation drafts; now only starts under `--profile hotm` or `--profile ui`. Users who don't have the HotM sibling repo can run `./workstation up --backend-only` and get a working API without ever pulling HotM.
- Workstation host ports remapped - Postgres on `5434` (was `5432`) and agent-proxy on `3011` (was `3001`) to avoid collisions with native postgres and the sysops dashboard commonly running on `3001`. matric-api stays on `3000`; UI stays on `4180`; ollama stays on `11434`.
Notes
The workstation stack is for local development only. Production deployments continue to use docker-compose.bundle.yml (single-host headless backend) or the per-service ghcr.io/fortemi/* images. The workstation does not replace either path; it sits alongside them as the third option, optimized for "developer with a GPU laptop who wants to iterate end-to-end without setting up postgres by hand."
The full stack reaches healthy state in roughly 45 seconds on a clean host with the docker base images already pulled. First-time clean-install (everything pulled from scratch) is dominated by the ollama image (~10.6 GB) and the matric-api Rust build.
v2026.5.6
Two small but high-impact fixes to support-archive seeding: imported notes now have titles, and the seed no longer pins the GPU for hours.
Added
- `defer_inference` flag on `POST /api/v1/backup/import` (#677) - When `true`, imported notes land as raw content only; the full NLP pipeline (embeddings, metadata, NER, linking, title generation) is skipped. FTS works immediately via the insert-trigger-maintained tsvector. Semantic backfill is on-demand via `POST /api/v1/notes/reprocess`. Default `false` preserves prior behavior.
- `title` field on `CreateNoteRequest` and `POST /api/v1/notes` (#675) - Optional explicit title. When provided, the AI title-generation pipeline step is skipped (caller's value is authoritative). Bulk-create accepts it on every item. Threaded through to the underlying `INSERT INTO note`.
- `SEED_WITH_INFERENCE` env var on `seed-support-archive.sh` (#677) - Operators who want immediate inference at seed time set `SEED_WITH_INFERENCE=true`. Default `false`: the seed now passes `skip_embedding_regen=true` via env-driven toggle rather than the previously hard-coded flag.
Fixed
- Support archive notes had no titles (#675) - `scripts/rebuild-docs-shard.sh` now derives a title for each doc: first H1 (skipping YAML front-matter) when available, otherwise the filename stem with hyphens/underscores normalised to spaces. Each note in the shard JSON now carries a `title` field that the API persists on insert.
- Legacy `/backup/import` enqueued the full NLP pipeline unconditionally (#677) - In the prior release a manual `/backup/import` of the support archive could produce ~965 background jobs for 193 notes and pin Ollama for hours on edge hardware. The newer `/knowledge-shard/upload` endpoint had `skip_embedding_regen`; `/backup/import` now has the equivalent `defer_inference` gate.
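The title rule described for #675 (first H1 after any YAML front-matter, otherwise the filename stem with separators turned into spaces) could look roughly like this. The real logic lives in a shell script; this is an illustrative Rust sketch with a hypothetical `derive_title` function, and it deliberately ignores edge cases such as `#` lines inside fenced code blocks or setext-style headings.

```rust
use std::path::Path;

/// Derive a note title from Markdown content, falling back to the file name.
pub fn derive_title(markdown: &str, file_name: &str) -> String {
    let mut lines = markdown.lines().peekable();

    // Skip a leading YAML front-matter block delimited by `---` lines.
    if lines.peek().map(|l| l.trim_end()) == Some("---") {
        lines.next();
        for line in lines.by_ref() {
            if line.trim_end() == "---" {
                break;
            }
        }
    }

    // First ATX-style H1 ("# Title") wins.
    for line in lines {
        if let Some(rest) = line.strip_prefix("# ") {
            let title = rest.trim();
            if !title.is_empty() {
                return title.to_string();
            }
        }
    }

    // Fallback: filename stem with hyphens/underscores normalised to spaces.
    let stem = Path::new(file_name)
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or(file_name);
    stem.replace(|c: char| c == '-' || c == '_', " ")
}
```

For example, a file `inference_backends-guide.md` without an H1 would get the title `inference backends guide`.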
v2026.5.5
Docker bundle behavior change: the bundled fortemi-docs support archive is now opt-in to mirror the native build path.
Changed
- Support archive is opt-in by default (#672) - The Docker bundle no longer auto-seeds the bundled Fortémi documentation on first boot. Behavior now matches the native `cargo run` path (which never auto-seeded). Two opt-in routes:
  - Auto-seed on first boot: set `LOAD_SUPPORT_MEMORY=true` in `.env` before `docker compose ... up`.
  - One-command seed on a running instance: `docker compose -f docker-compose.bundle.yml exec fortemi /app/seed-support-archive.sh` (idempotent; flag file on the persistent `pgdata` volume tracks state).
  - Legacy `DISABLE_SUPPORT_MEMORY=true` still wins as a force-skip; it is kept for back-compat with bundles that pre-date the flip. Used by `docker-compose.minimal.yml` to guarantee skip regardless of upstream config.
- `docker/seed-support-archive.sh` is now safe to invoke manually at any time inside a running container. The `MANUAL_INVOCATION` flag (default `true`) distinguishes operator-invoked from entrypoint-invoked runs; the entrypoint sets it to `false` so auto-seed requires explicit `LOAD_SUPPORT_MEMORY=true`.
- README "Quick Start" updated - no longer claims the bundle auto-seeds the support archive; points operators at the new dedicated section.
Added
- README "Support Archive (fortemi-docs)" section - what the archive is, both opt-in paths, querying via the `X-Fortemi-Memory: fortemi-docs` header, the additional `POST /api/v1/notes/reprocess` opt-in for semantic search, and the refresh-on-upgrade procedure (drop archive + remove flag file + re-seed).
- `.env.example` Support Memory Archive block rewritten to document both opt-in paths with copy-paste recipes plus a quick search example.
Migration notes
| Existing `.env` | Behavior after upgrade |
|---|---|
| `DISABLE_SUPPORT_MEMORY=true` | Unchanged: still skipped |
| `DISABLE_SUPPORT_MEMORY=false` (prior default) | Changed: no longer auto-seeds. Add `LOAD_SUPPORT_MEMORY=true` to restore. |
| `DISABLE_SUPPORT_MEMORY` unset | Was never explicit; still no auto-seed |
| Already-seeded instance (flag file present) | Unchanged: flag file persists across restarts; the existing archive stays |
No data loss for anyone. Worst case is a previously-default operator notices the docs aren't loaded on a fresh deploy and runs the one-command opt-in.
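The interaction between the three flags and the flag file can be summarised as a small decision function. This is a sketch of the behavior as described in these notes, not the contents of `seed-support-archive.sh`; the type names are hypothetical, and both the order of checks and the assumption that the force-skip also applies to manual runs are guesses.

```rust
/// Inputs the seed script is described as consulting (hypothetical struct).
#[derive(Debug, Clone, Copy, Default)]
pub struct SeedInputs {
    pub disable_support_memory: bool, // DISABLE_SUPPORT_MEMORY=true
    pub load_support_memory: bool,    // LOAD_SUPPORT_MEMORY=true
    pub manual_invocation: bool,      // MANUAL_INVOCATION (entrypoint sets false)
    pub flag_file_present: bool,      // flag file on the pgdata volume
}

#[derive(Debug, PartialEq)]
pub enum SeedDecision {
    Seed,
    SkipForced,
    SkipAlreadySeeded,
    SkipNotOptedIn,
}

pub fn decide(i: &SeedInputs) -> SeedDecision {
    if i.disable_support_memory {
        // Assumption: the legacy force-skip wins over every opt-in path.
        return SeedDecision::SkipForced;
    }
    if i.flag_file_present {
        // Idempotence: an existing archive is left alone.
        return SeedDecision::SkipAlreadySeeded;
    }
    if i.manual_invocation || i.load_support_memory {
        SeedDecision::Seed
    } else {
        SeedDecision::SkipNotOptedIn
    }
}
```

An entrypoint-invoked run with no variables set maps to `SkipNotOptedIn`, which corresponds to the "no longer auto-seeds" row of the migration table.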
Fixed (CI infrastructure, already in v2026.5.4 via re-tag but documented here for completeness)
- `Create Gitea Release` and `Create GitHub Release` tag-gating (#669) - Belt-and-suspenders `startsWith(github.ref, 'refs/tags/v')` clause on both job conditions. Gitea Actions' `needs.X.result == 'success'` evaluator doesn't propagate `skipped` upstream the way GitHub Actions does, leading to release-creation jobs firing on push-to-main with `tag_name: "main"` (rejected by both registries).
- `Publish Dev Image` (Gitea + GitHub) tag-gating (#670) - Same evaluator quirk; added `!startsWith(github.ref, 'refs/tags/')` to keep dev-publish jobs from running on tag pushes and contending with the proper tag-only release publishes. Also bounded the `Create GitHub Release` curl with `--connect-timeout 10 --max-time 60` so future hangs fail fast.
- Shard-rebuild host-port collision (#671) - `scripts/ci/rebuild-shard-in-ci.sh` now picks a unique host port (`30000 + ($$ % 5000)`) instead of fixed `3000`. The parallel `publish-release` (Gitea) and `publish-github` (ghcr.io) jobs share a runner; both invoke this script; the second-to-arrive previously crashed with "Bind for 0.0.0.0:3000 failed: port is already allocated".
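The port formula from #671 maps a process ID into the range 30000 to 34999. A small Rust rendering of the same arithmetic is shown below; the script itself uses the shell's `$$`, and `shard_rebuild_port` is a hypothetical name.

```rust
/// Map a process ID onto a host port in 30000..=34999,
/// mirroring the shell expression 30000 + ($$ % 5000).
pub fn shard_rebuild_port(pid: u32) -> u16 {
    // pid % 5000 is at most 4999, so the sum always fits in a u16.
    30_000 + (pid % 5_000) as u16
}
```

In a real process the input would be `std::process::id()`. Two PIDs that differ by a multiple of 5000 (for example 12345 and 17345) still map to the same port, so the scheme makes collisions between concurrent jobs unlikely rather than impossible.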
v2026.5.4
First-class provider profiles for all advertised inference platforms (#654 series), three runtime-config follow-ups (#655 #656 #657), and CI hardening for the auto-shard-rebuild and release publication paths.
Added — Inference: first-class provider parity (#654)
- Provider profile catalog (#658) - `crates/matric-inference/src/provider_profiles.rs` ships a `&'static [ProviderProfile]` describing the four v1 providers (Ollama, OpenAI, OpenRouter, llama.cpp). Each entry carries the wire protocol family (`BackendKind::Ollama` or `BackendKind::OpenAICompatible`), default base URL, required-vs-optional API key, capability list, env-var conventions, recommended default models, extra-header injection rules, and health/models endpoints. Future providers (vLLM, LiteLLM, LocalAI, Groq, Together, …) become 5-line additions to the catalog with no enum touching, no parser surface.
- Catalog-driven `/api/v1/inference/providers` (#659) - Replaces hard-coded match arms with a single loop over `provider_profiles::iter()`. Response gains a `supports_embeddings` field so BYOK UIs can render the OpenRouter-style "chat only" case correctly.
- Profile-aware `/api/v1/inference/test-connection` (#659) - Hints like `openrouter` or `llamacpp` route to the right wire-protocol probe (`BackendKind::Ollama` vs `OpenAICompatible`) instead of falling through to URL auto-detection.
- OpenRouter native runtime config (#660) - `POST /api/v1/inference/config` accepts an `openrouter` block alongside `ollama`/`openai`/`llamacpp`. `HTTP-Referer`/`X-Title` headers default to `https://fortemi.io`/`Fortemi`; overridable per-deployment via `OPENROUTER_HTTP_REFERER`/`OPENROUTER_APP_NAME` env vars or the runtime `http_referer`/`app_name` fields.
- Independent embedding/generation routing (#661) - `MATRIC_EMBEDDING_PROVIDER` env var and `embedding_backend` field on `POST /api/v1/inference/config` route embedding calls through a different provider than the active default. Killer use case: OpenRouter for chat (no embedding API), local Ollama or llama.cpp for embeddings. Validated against the catalog: pointing at a provider without the Embedding capability returns 400 with a descriptive error before persisting.
- Atomic-swap and dry-run modes (#659) - `POST /api/v1/inference/config?dry_run=true` validates the merged config and returns the would-be effective state without persisting or hot-swapping. `?atomic=true` probes every backend the request touches before committing; on any probe failure, abort with 503 + structured `failures: [...]` array. Avoids the brief error window where a half-applied config serves bad creds.
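As an illustration of the catalog idea, the sketch below models a heavily reduced profile table and the embedding-capability check described for #661. It is not the contents of `provider_profiles.rs`: the real `ProviderProfile` carries many more fields, the field names here are assumptions, and the capability list is collapsed to a single boolean.

```rust
/// Wire protocol family, as named in the release notes.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BackendKind {
    Ollama,
    OpenAICompatible,
}

/// Reduced, hypothetical profile shape for illustration only.
#[derive(Debug)]
pub struct ProviderProfile {
    pub id: &'static str,
    pub backend: BackendKind,
    pub supports_embeddings: bool,
}

/// The four v1 providers; a new provider is one more entry here.
pub static PROFILES: &[ProviderProfile] = &[
    ProviderProfile { id: "ollama", backend: BackendKind::Ollama, supports_embeddings: true },
    ProviderProfile { id: "openai", backend: BackendKind::OpenAICompatible, supports_embeddings: true },
    // OpenRouter is the "chat only" case: no embedding API.
    ProviderProfile { id: "openrouter", backend: BackendKind::OpenAICompatible, supports_embeddings: false },
    ProviderProfile { id: "llamacpp", backend: BackendKind::OpenAICompatible, supports_embeddings: true },
];

pub fn find(id: &str) -> Option<&'static ProviderProfile> {
    PROFILES.iter().find(|p| p.id == id)
}

/// Reject an embedding provider that is unknown or lacks the capability,
/// before anything is persisted (the API is described as answering 400).
pub fn validate_embedding_provider(id: &str) -> Result<&'static ProviderProfile, String> {
    let profile = find(id).ok_or_else(|| format!("unknown provider '{id}'"))?;
    if profile.supports_embeddings {
        Ok(profile)
    } else {
        Err(format!("provider '{id}' has no embedding capability"))
    }
}
```

Because lookups and validation only iterate over the table, adding a provider does not require touching an enum or a parser, which is the property the catalog is meant to provide.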
Added — Runtime-config follow-ups
- `InferenceConfigChanged` SSE event on hot-swap (#657, #663) - New variant on `ServerEvent` emitted from `POST` and `DELETE` `/api/v1/inference/config`. Carries `default_backend`, `embedding_backend`, and a `changed_fields` array of dotted field names (`openrouter.api_key`, `embedding_backend`). API keys never appear in event payloads, only field names. Reactive UIs (HotM provider pill, MCP-tool clients, dashboards) can update without polling. `DELETE` events use the sentinel `changed_fields: ["__reset__"]`.
- Inference config audit log (#656, #664) - New `inference_config_audit` table records every operator-driven mutation: actor, timestamp, action (`set`/`reset`/`set_archive`/`reset_archive`), redacted before/after JSON blobs, source IP. New `GET /api/v1/inference/config/audit?limit=50&changed_by=&action=` endpoint returns recent entries with filter support. Best-effort writer: a DB failure logs at `warn` but never blocks the live config change.
- Per-archive inference provider override (#655, #665) - Storage + API surface for multi-tenant routing: new `archive_inference_override` table keyed by `schema_name`. `GET`/`POST`/`DELETE` `/api/v1/inference/config` honor `X-Fortemi-Memory`; archive overrides shallow-merge on top of the global config (precedence: `archive_override > db_override > env > default`). Audit log distinguishes archive operations via `set_archive`/`reset_archive` actions. Live runtime routing (per-archive `ProviderRegistry` cache + request-time resolver) is filed as #666 (substantial scope, follow-up).
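The precedence chain `archive_override > db_override > env > default` amounts to applying the layers from lowest to highest priority, with later layers overwriting top-level keys. A minimal sketch follows, assuming flat string maps in place of the real JSON config; `Layer`, `layer`, and `effective_config` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// One configuration layer: top-level key -> value (simplified to strings).
pub type Layer = BTreeMap<String, String>;

/// Convenience constructor for small layers.
pub fn layer(pairs: &[(&str, &str)]) -> Layer {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

/// Shallow merge: each higher-precedence layer replaces whole top-level keys.
pub fn effective_config(
    default: &Layer,
    env: &Layer,
    db_override: &Layer,
    archive_override: &Layer,
) -> Layer {
    let mut merged = Layer::new();
    // Lowest precedence first, so later inserts win.
    for l in [default, env, db_override, archive_override] {
        for (k, v) in l {
            merged.insert(k.clone(), v.clone());
        }
    }
    merged
}
```

With `default_backend` set to `openai` by the environment and `openrouter` by the DB override, and an archive override that only sets `embedding_backend`, the effective config keeps `default_backend = openrouter` and takes the archive's `embedding_backend`.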
Changed — Inference docs (#662)
- README "Multi-Provider Inference" rewritten with the catalog-driven profile table (backend protocol, API key requirement, embedding support, default models per profile), runtime reconfiguration recipes (`?dry_run=true`, `?atomic=true` curl examples), and the independent embedding/generation routing story.
- README "Bring Your Own LLM" uses native profile names (`MATRIC_INFERENCE_DEFAULT=llamacpp`/`openrouter`) instead of the legacy `MATRIC_INFERENCE_DEFAULT=openai` escape hatch. Legacy recipe preserved for unknown OpenAI-compatible endpoints (vLLM, LiteLLM, on-prem).
- CLAUDE.md "Inference Providers" expanded: dedicated OpenRouter section with all five env vars, new "Independent Embedding/Generation Routing" subsection, runtime hot-swap recipes for `embedding_backend` set/clear, `dry_run`, and `atomic`.
- `.env.example` - `MATRIC_INFERENCE_DEFAULT` lists all 4 valid ids; new `MATRIC_EMBEDDING_PROVIDER` block; `OPENROUTER_GEN_MODEL`/`OPENROUTER_APP_NAME` (renamed from `_X_TITLE` for runtime-field consistency); new llama.cpp section.
Fixed
- `matric-core` event variant count assertions (#667) - Two test assertions hard-coded `47` for the variant count; #663 added a 48th. Failing CI runs from the prior release surfaced this. Fixed both `events.rs::test_all_variants_metadata_is_complete` and `asyncapi.rs::build_spec_produces_valid_structure` to expect 48.
- Auto-shard-rebuild rate-limit (#668) - `scripts/ci/rebuild-shard-in-ci.sh` now passes `RATE_LIMIT_ENABLED=false` to the transient API container. The rebuild fires ~200 `POST /api/v1/notes` calls in ~4 s; the bundle's default rate limiter (100 req / 60 s) was 429-ing the second half. Companion: `scripts/rebuild-docs-shard.sh` now aborts with `exit 1` if more than 5% of imports fail, so this kind of partial failure aborts loudly instead of silently emitting a half-empty `.shard`.
- Release-job tag gating (#669) - `Create Gitea Release` and `Create GitHub Release` jobs in `ci-builder.yaml` had `needs: publish-release|github` + `if: needs.X.result == 'success'`; on Gitea Actions this didn't propagate `skipped` correctly and `Create GitHub Release` fired on push-to-main, attempting a release with `tag_name: "main"`. Belt-and-suspenders fix adds explicit `startsWith(github.ref, 'refs/tags/v')` to both job conditions.
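The numbers in #668 can be checked with a little arithmetic: roughly 200 requests inside one 60-second window against a limit of 100 leaves about 100 rejected, i.e. 50% failures, far above the new 5% abort threshold. The sketch below assumes a simple fixed-window limiter (the actual limiter algorithm is not stated here), and both function names are hypothetical.

```rust
/// Requests rejected when `requests` arrive within a single window
/// that admits at most `limit` (fixed-window assumption).
pub fn rejected_in_window(requests: usize, limit: usize) -> usize {
    requests.saturating_sub(limit)
}

/// True when strictly more than 5% of imports failed.
/// Uses integer math: failed / total > 5/100  <=>  failed * 100 > total * 5.
pub fn should_abort(failed: usize, total: usize) -> bool {
    total > 0 && failed * 100 > total * 5
}
```

For the case described, `rejected_in_window(200, 100)` is 100 and `should_abort(100, 200)` is true, so the rebuild would now stop instead of emitting a half-empty shard. With 200 imports, up to 10 failures are still tolerated.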
v2026.5.2
Three deployment-experience fixes plus a diagnostic update on a misfiled regression.
Added
- `docker-compose.llamacpp.yml` - bundled llama.cpp inference sidecar (#646, PR #649) - `ghcr.io/ggerganov/llama.cpp:server` exposed on `:8080/v1` (OpenAI-compatible protocol). Brought up alongside the bundle via `docker compose -f docker-compose.bundle.yml -f docker-compose.llamacpp.yml up -d`. Tunable through `LLAMACPP_MODEL_FILE`, `LLAMACPP_CTX_SIZE`, `LLAMACPP_GPU_LAYERS`. NVIDIA GPU stanza commented out for opt-in. Unblocks operators who already run llama.cpp on the host and don't want Ollama.
- `docker-compose.minimal.yml` - minimal-footprint overlay (#648, PR #651) - Reduces idle bundle footprint to ~2 GB by disabling support-archive seeding, swapping the fast extraction model from `qwen3.5:9b` (~8 GB) to `qwen2.5:3b` (~2 GB), capping `JOB_MAX_CONCURRENT=1` + `GPU_MAX_CONCURRENT=1`, and trimming `MAX_MEMORIES=2`. Brought up via `docker compose -f docker-compose.bundle.yml -f docker-compose.minimal.yml up -d`. Trade-off: `qwen2.5:3b` chat quality is materially lower than the default; it is meant for resource-constrained hosts where "make it run" beats "make it good".
- README "Bring Your Own LLM" subsection (#646, PR #649) - Copy-paste `.env` recipes for routing inference to llama.cpp, OpenAI proper, or OpenRouter. Explicit instructions for disabling Ollama entirely (set `MATRIC_INFERENCE_DEFAULT=openai`; the Ollama backend is not constructed when it isn't the default, so `host.docker.internal:11434` is not probed). Ollama is now framed as one option among several, not the assumed default.
- README "Resource Requirements" subsection (#648, PR #651) - Per-component idle RAM table, prominent `DISABLE_SUPPORT_MEMORY=true` callout, minimal-overlay invocation, and the `qwen2.5:3b` chat-quality trade-off note. Surfaces resource expectations for operators sizing hosts.
- CONTRIBUTING.md "sqlx compile-time query checks" subsection (#647, PR #650) - Names the `missing graph`-class failure mode as a class, documents both resolution paths (live `DATABASE_URL` or `cargo sqlx prepare --workspace` + `SQLX_OFFLINE=true`), and explains when to refresh `.sqlx/`. README "From Source" section gets a callout pointing here so users who hit the error in the wild find the fix.
Investigated, no code change required
- #647 "missing graph" regression - Reproducer test showed the codebase does not use `sqlx::query!`/`sqlx::query_as!`/`sqlx::query_scalar!` compile-time macros (only the runtime `sqlx::query("…")` API). `cargo check --workspace` succeeds on a fresh clone with no `DATABASE_URL`, no `.sqlx/`, and no `SQLX_OFFLINE`. `cargo sqlx prepare --workspace` against a clean migrated Postgres reports "no queries found". The error class the issue describes does not reproduce on `main`. Documentation from PR #650 stays merged as preventative coverage; awaiting the verbatim error text from the reporter to identify the actual source. See `crates/matric-jobs/src/pause.rs:173` and `crates/matric-db/src/schema_context.rs:108` for the runtime-only pattern this codebase uses.
Changed
- `Cargo.lock` refreshed for the `2026.5.1` workspace version bump that landed in the prior release commit.