Releases: Fortemi/fortemi

v2026.5.13

25 May 20:28
7f74632

Security maintenance release for the dependency advisory sweep after 2026.5.12. This release updates vulnerable Rust and npm transitive artifacts, removes obsolete advisory allowlists, and includes the post-2026.5.12 documentation/issue-tracking cleanup commits.

Security

  • Rust advisory updates - Updated openssl from 0.10.79 to 0.10.80, rand 0.8.5 to 0.8.6, and rand 0.9.2 to 0.9.3 in Cargo.lock.
  • npm advisory update - Pinned the MCP server dependency graph to qs 6.15.2 via mcp-server/package.json overrides and lockfile update.
  • Advisory allowlist cleanup - Removed the temporary RUSTSEC-2026-0097 ignores from cargo audit and cargo deny configuration now that the patched rand lines are present.
  • Supply-chain review completed - Verified crates.io/npm provenance and artifact integrity for the patched packages, diffed prior and patched artifacts, and closed Fortemi issue #857 with the evidence summary.

Changed

  • Issue tracking preference - Updated project guidance to prefer the canonical Gitea tracker.
  • Planning documentation - Captured licensing and storage planning updates for the upcoming distribution work.

Verification

  • cargo audit
  • cargo deny check advisories
  • npm ls qs
  • npm audit --omit=dev

v2026.5.12

25 May 07:21
5829555

Realtime provider integration milestone. This release adds the standards-shaped call transport foundation, the first Twilio Voice adapter, Deepgram live ASR, provider-neutral call event outbox contracts, and the batch transcription bridge for completed Twilio recordings. It also includes fail-closed authentication defaults and the incoming webhook receiver foundation used by provider control-plane callbacks.

Highlights

| What Changed | Why You Care |
| --- | --- |
| Twilio Voice realtime adapter | Fortemi can accept signed Twilio Voice webhooks and Twilio Media Streams WebSocket audio for live call transcription. |
| Standards-shaped realtime contracts | Provider-specific wire formats stay inside adapters while call lifecycle, media frames, ASR events, and outbox rows stay provider-neutral. |
| Deepgram streaming ASR | Live call audio can produce partial/final transcript events with reconnect, failover accounting, and health metrics. |
| Recording-completed batch bridge | Twilio recordings are imported as audio attachments and queued through the existing AudioTranscriptionHandler for higher-quality post-call transcripts. |
| Fail-closed auth default | /api/v1/* endpoints now require auth by default unless an operator explicitly opts into anonymous local mode. |

Added

  • Incoming webhook receivers - POST /api/v1/webhooks/incoming, receiver lookup, payload validation, HMAC verification, and Twilio Voice schema support for provider control-plane callbacks.
  • Realtime call sessions — persisted call_sessions metadata, call detail lookup at GET /api/v1/calls/{call_id}, realtime session metrics, and database coverage for active/completed call aggregation.
  • Realtime transport foundation — provider-neutral MediaFrame, codec normalization, mock adapter fixtures, mock ASR backend, and Twilio adapter mapping helpers.
  • Twilio Voice + Media Streams support — signed Voice webhooks create/update call sessions, /api/v1/realtime/twilio/{CallSid} accepts Twilio media streams, and recent-session gating rejects stale or unknown WebSocket attempts.
  • Deepgram streaming backend — WebSocket ASR client with secure API-key handling, event parsing, reconnect/backoff behavior, failover to a configured fallback backend, and health metrics.
  • Transcript outbox foundation - event_outbox table and helpers for realtime transcript/call events, including high-volume transcript emission coverage.
  • Twilio call-event outbox contract — adapter-owned mapping for call_started, state_change, recording_available, and ended events so downstream consumers are insulated from Twilio status strings.
  • Twilio recording transcription bridge — completed recording callbacks download the recording into file storage, create an audio attachment note, queue AudioTranscription, and link batch transcript policy metadata back to the call session.
  • Realtime provider setup documentation - docs/deployment/realtime-providers.md now covers Twilio + Deepgram setup, consent/disclosure, troubleshooting, and contract completeness for local, single-tenant, multi-tenant, regulated, and future provider deployments.

Changed

  • Authentication defaults are fail-closed - REQUIRE_AUTH now defaults to true. Anonymous mode requires both REQUIRE_AUTH=false and I_UNDERSTAND_NO_AUTH=true; multi-tenant deployments reject anonymous mode regardless of acknowledgment.
  • CI and pre-commit provider-boundary checks — realtime provider-specific imports are rejected outside adapter modules.
  • Docs site workflows — docsite clone steps use the configured build token for authenticated source access.
  • Call-session schema stability — call sessions no longer keep an archive foreign key that could deadlock archive registry operations.
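
The fail-closed rule described in this list can be sketched as a small decision function. This is an illustration only, not Fortemi's implementation: the function names, the accepted boolean spellings, and the choice to keep auth on when REQUIRE_AUTH=false is set without the acknowledgment are all assumptions.

```python
from typing import Mapping


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean env var; unset or unrecognised values use the default.

    The accepted spellings are an assumption for this sketch.
    """
    raw = env.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in ("true", "1", "yes"):
        return True
    if value in ("false", "0", "no"):
        return False
    return default


def auth_required(env: Mapping[str, str], multi_tenant: bool = False) -> bool:
    """Return True when the API must enforce authentication.

    Raises ValueError when anonymous mode is requested for a multi-tenant
    deployment, which rejects it regardless of acknowledgment.
    """
    if _flag(env, "REQUIRE_AUTH", True):
        return True  # fail-closed default
    if multi_tenant:
        raise ValueError("anonymous mode is not allowed in multi-tenant deployments")
    # Assumption: without the explicit acknowledgment, auth stays enabled.
    return not _flag(env, "I_UNDERSTAND_NO_AUTH", False)
```

Only the combination REQUIRE_AUTH=false plus I_UNDERSTAND_NO_AUTH=true on a single-tenant deployment yields anonymous mode; every other input keeps authentication on.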

Fixed

  • Call API response shape — call detail responses now align with the persisted session/transcript shape used by the realtime pipeline.
  • Prepared test database use — call API tests now use the prepared DB path consistently.
  • Syntactic chunker performance guard — relaxed an over-strict guard that could fail under noisy CI timing.

Security

  • Fail-closed API authentication (ADR-094, fixes Gitea fortemi/fortemi#709). Existing single-user desktop and local-dev deployments that intentionally run anonymous must add I_UNDERSTAND_NO_AUTH=true when setting REQUIRE_AUTH=false. Stock bundled compose files include that acknowledgment for local profiles.
  • Twilio webhook verification — Twilio signatures are validated against the externally visible URL using the receiver secret; proxy deployments must preserve X-Forwarded-Host and X-Forwarded-Proto for validation to succeed.
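
To see why the forwarded headers matter, the sketch below follows Twilio's published request-validation scheme for form-encoded webhooks: HMAC-SHA1 over the full URL followed by the POST parameters sorted by name, then base64-encoded. The helper names, the plain-dict header lookup, and the example host are assumptions; this is not Fortemi's receiver code.

```python
import base64
import hashlib
import hmac
from typing import Mapping


def external_url(headers: Mapping[str, str], path_and_query: str,
                 fallback_host: str, fallback_scheme: str = "http") -> str:
    """Rebuild the URL Twilio actually called, as seen from outside the proxy."""
    scheme = headers.get("X-Forwarded-Proto", fallback_scheme)
    host = headers.get("X-Forwarded-Host", fallback_host)
    return f"{scheme}://{host}{path_and_query}"


def twilio_signature(secret: str, url: str, params: Mapping[str, str]) -> str:
    """Compute the expected X-Twilio-Signature value for a form-encoded POST."""
    payload = url + "".join(f"{key}{params[key]}" for key in sorted(params))
    digest = hmac.new(secret.encode("utf-8"), payload.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_twilio_signature(secret: str, url: str, params: Mapping[str, str],
                            provided: str) -> bool:
    """Constant-time comparison of the provided and expected signatures."""
    expected = twilio_signature(secret, url, params)
    return hmac.compare_digest(expected.encode("utf-8"), provided.encode("utf-8"))
```

If a proxy drops X-Forwarded-Host or X-Forwarded-Proto, `external_url` falls back to the internal address, the signed payload differs from what Twilio signed, and verification fails even with the correct secret.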

Migration Notes

  • Database migrations included — this release adds incoming webhook receiver, realtime call-session, and event-outbox tables. Migrations run through the normal startup/migration path.
  • Auth opt-out must be explicit — deployments that depended on the old implicit anonymous default will now start authenticated unless they set the two-variable local-mode acknowledgment.
  • Realtime recording batch transcripts require file storage — Twilio recording.completed callbacks can only queue the batch transcription bridge when the API process has file storage configured and can reach the recording URL.

v2026.5.11

19 May 03:58
bb80740

Maintenance tag. No functional change since 2026.5.10. Cuts a clean release tag that lands correctly on both the Gitea (origin) and GitHub (github) remotes — the 2026.5.7, 2026.5.8, and 2026.5.10 tags exist on GitHub but point at the pre-LLM-wizard commit (a stale mirror artifact). This release is the first post-workstation tag that is byte-identical across both remotes.

What the 2026.5.7 → 2026.5.11 sequence delivered (roll-up)

For readers landing on this changelog who want the short version of what shipped under the "workstation" theme:

  • 2026.5.7 - ./workstation wrapper, docker-compose.workstation.yml unified stack (Fortemi + HotM + Ollama), QUICKSTART.md, WORKSTATION-SETUP.md, three profiles (--backend-only, --no-ui, default-ui).
  • 2026.5.8 — Pluggable LLM backend selector: .env.workstation.example template + ./workstation configure-llm wizard covering ollama, vllm, openai, openrouter, llamacpp. extra_hosts: host.docker.internal:host-gateway wired into compose so the same URL works on Linux/macOS/Windows. Doctor probes the configured backend.
  • 2026.5.9 — Correctness patch after validating against the qwen36_vllm_autodeploy_basic.sh reference: wizard now prompts for the served-model-name (not the HF path), defaults to qwen3.5:9b so model strings stay stable across backends.
  • 2026.5.10 — Docs surfacing: WORKSTATION-SETUP.md LLM backend section, README workstation-block callout, docs/content/quickstart.md "Building features on a dev box?" sibling callout, docs/content/inference-backends.md top callout.

No code changes

Bumped the Cargo workspace and mcp-server to 2026.5.11. Schema, behavior, and docs are identical to 2026.5.10.

v2026.5.10

19 May 03:08

Docs-only release. The configure-llm wizard and the .env.workstation override layer landed in 2026.5.8–2026.5.9, but they were only discoverable via the wrapper's help output. This release threads them through the canonical docs so new users actually find them.

Documentation

  • WORKSTATION-SETUP.md → new "LLM backend selection" section — Canonical ops reference for the backend layer. Covers the env_file: mechanism, the wizard flow, the five backends with per-backend gotcha table, the served-name-vs-HF-path distinction for vLLM (with both the light Qwen2.5-7B and heavy 35B-on-3×A100 patterns), the doctor's backend-probe check, and the switch-backends procedure (file edit + restart, or runtime hot-swap via /api/v1/inference/config).
  • README.md → Local Workstation block — Added a "Want a different LLM than Ollama?" callout pointing at configure-llm. Surfaces the backend flexibility at the discovery layer so users picking the workstation path see it from the start, not after up.
  • docs/content/quickstart.md → new "Building features on a dev box?" callout - Sibling block to the existing "Looking for a desktop app?" callout. Routes developers to the workstation flow (with the configure-llm invocation) instead of the Docker-bundle server path that the rest of this guide covers.
  • docs/content/inference-backends.md → top callout — One-line redirect for workstation users so they reach the wizard rather than reading the by-hand env-var guide. The guide itself remains the authoritative reference for Docker-bundle and from-source paths.

No code changes

No version-affecting code, schema, or behavior changes. The version bump is for tag-gated CI release jobs (release notes are generated from the CHANGELOG entry on tag push).

v2026.5.8

18 May 21:38

Picking a different LLM backend on the workstation no longer requires editing the compose file. New interactive wizard plus an .env.workstation override layer cover the five common cases (Ollama, vLLM, OpenAI, OpenRouter, llama.cpp) without Docker expertise.

Added

  • .env.workstation.example — Template with five copy-paste-ready provider blocks: ollama-local (default), vllm-local, openai-cloud, openrouter, llamacpp-local. Each block sets exactly the env vars the matric-api inference router needs, with inline guidance on host-to-container networking and the embedding pairing (cloud providers without embedding support automatically pair with the containerized Ollama). Copy to .env.workstation and uncomment one block.
  • ./workstation configure-llm - Interactive wizard that walks through the five options. Prompts for API keys silently (no terminal echo), prompts for host ports for local-on-host backends, writes .env.workstation with mode 600, and backs up any existing file to .env.workstation.bak. Three aliases: configure-llm, config-llm, llm.
  • Doctor check #8: LLM backend — Reports which backend is selected (from .env.workstation if present, else "ollama containerized default"), probes the configured endpoint, surfaces friendly remediation when the probe fails. Catches the most common "wrong port" / "vLLM not started" mistakes before up.
  • QUICKSTART Step 3.5 — New optional step with a decision table (have vLLM? have an OpenAI key? …) and a recommendation tree. Stays a skip-by-default step so users who just want the working ollama path see no extra friction.
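
The file-handling behaviour described for the wizard (back up an existing file, then write the new one with mode 600) can be sketched as follows. The wizard's actual implementation is not shown in these notes, so the function name and the key=value output format are assumptions; the sketch targets POSIX hosts.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping


def write_env_file(path: Path, settings: Mapping[str, str]) -> None:
    """Write KEY=value lines to `path` with owner-only permissions.

    An existing file is first copied to `<name>.bak`, mirroring the
    .env.workstation -> .env.workstation.bak behaviour described above.
    """
    if path.exists():
        shutil.copy2(path, path.with_name(path.name + ".bak"))
    body = "".join(f"{key}={value}\n" for key, value in settings.items())
    # Create with 0o600 so a new file is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(body)
    # The creation mode is ignored for pre-existing files, so enforce it again.
    os.chmod(path, 0o600)
```

A caller that collects an API key interactively could read it with `getpass.getpass()` from the standard library, which suppresses terminal echo, before passing it in `settings`.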

Changed

  • docker-compose.workstation.yml matric-api service — Now loads .env.workstation via env_file: with required: false (compose 2.24+ spec), so it's silently no-op for users who don't create the file. When present, the override values supersede the inline environment: block. Also adds extra_hosts: "host.docker.internal:host-gateway" so vLLM/llama.cpp on the host work on Linux without IP juggling — same URL works on macOS, Windows, and Linux.

Migration

  • No-op for existing users on Ollama. .env.workstation is gitignored and not auto-created; if it doesn't exist, the workstation behaves exactly as in 2026.5.7.
  • Switching to a cloud provider now takes ~30 seconds. ./workstation configure-llm → pick option 3 or 4 → paste API key → done. No docker-compose.yml edits, no Dockerfile rebuild.

v2026.5.7

18 May 16:30

Local developer workstation: one command brings up Fortemi + HotM + Ollama in containers, with a friendly wrapper, pre-flight doctor, and step-by-step quickstart for users who have never touched Docker.

Added

  • ./workstation wrapper script (#708) — Named subcommands for the full dev-box workflow: up, down, status, doctor, models pull, open, logs, shell, psql, reset, help. The up command waits for healthy state and prints the URL; doctor runs 7 pre-flight checks (Docker, compose, ports, native ollama, HotM sibling repo, GPU passthrough, models) with explicit remediation text for each failure.
  • docker-compose.workstation.yml (#708) — Single unified stack replacing the fragmented per-repo compose files for local dev. Includes ollama (GPU passthrough, bind-mounted ~/.ollama/), postgres (pg18 + pgvector + PostGIS), matric-api (auth off, permissive CORS, rate limit disabled), HotM agent-proxy, and the HotM UI. Three profiles select which services come up:
    • default (./workstation up --backend-only) — ollama + postgres + matric-api. HotM repo not required.
    • hotm profile (./workstation up --no-ui) — adds agent-proxy. Useful for API-only integrations.
    • ui profile (./workstation up, the default) — full stack including HotM UI at http://localhost:4180.
  • QUICKSTART.md — Five-step walkthrough for users new to Docker. Covers cloning both repos as siblings (or --backend-only if you don't want HotM), running doctor, bringing the stack up, pulling models, and verifying in a browser. Includes "what if something breaks?" section for the six most common first-run failures.
  • WORKSTATION-SETUP.md — Operations reference manual: full command list, day-2 troubleshooting beyond the happy path, native-ollama removal (the one step that requires sudo and a human), volume management, GPU verification.
  • README "Local workstation" section — Surfaces the workstation path alongside the bundle and HotM-desktop options so new users know they have three install paths.

Changed

  • agent-proxy is now profile-gated — Was always-on in earlier workstation drafts; now only starts under --profile hotm or --profile ui. Users who don't have the HotM sibling repo can run ./workstation up --backend-only and get a working API without ever pulling HotM.
  • Workstation host ports remapped — Postgres on 5434 (was 5432) and agent-proxy on 3011 (was 3001) to avoid collisions with native postgres and the sysops dashboard commonly running on 3001. matric-api stays on 3000; UI stays on 4180; ollama stays on 11434.

Notes

The workstation stack is for local development only. Production deployments continue to use docker-compose.bundle.yml (single-host headless backend) or the per-service ghcr.io/fortemi/* images. The workstation does not replace either path; it sits alongside them as the third option, optimized for "developer with a GPU laptop who wants to iterate end-to-end without setting up postgres by hand."

The full stack reaches healthy state in roughly 45 seconds on a clean host with the docker base images already pulled. First-time clean-install (everything pulled from scratch) is dominated by the ollama image (~10.6 GB) and the matric-api Rust build.

v2026.5.6

11 May 05:22
4c62e5d

Two small but high-impact fixes to support-archive seeding: imported notes now have titles, and the seed no longer pins the GPU for hours.

Added

  • defer_inference flag on POST /api/v1/backup/import (#677) — When true, imported notes land as raw content only; the full NLP pipeline (embeddings, metadata, NER, linking, title generation) is skipped. FTS works immediately via the insert-trigger-maintained tsvector. Semantic backfill is on-demand via POST /api/v1/notes/reprocess. Default false preserves prior behavior.
  • title field on CreateNoteRequest and POST /api/v1/notes (#675) — Optional explicit title. When provided, the AI title-generation pipeline step is skipped (caller's value is authoritative). Bulk-create accepts it on every item. Threaded through to the underlying INSERT INTO note.
  • SEED_WITH_INFERENCE env var on seed-support-archive.sh (#677) — Operators who want immediate inference at seed time set SEED_WITH_INFERENCE=true. Default false — the seed now passes skip_embedding_regen=true via env-driven toggle rather than the previously hard-coded flag.

Fixed

  • Support archive notes had no titles (#675) — scripts/rebuild-docs-shard.sh now derives a title for each doc: first H1 (skipping YAML front-matter) when available, otherwise the filename stem with hyphens/underscores normalised to spaces. Each note in the shard JSON now carries a title field that the API persists on insert.
  • Legacy /backup/import enqueued the full NLP pipeline unconditionally (#677) — In the prior release a manual /backup/import of the support archive could produce ~965 background jobs for 193 notes and pin Ollama for hours on edge hardware. The newer /knowledge-shard/upload endpoint had skip_embedding_regen; /backup/import now has the equivalent defer_inference gate.
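
The title-derivation rule for support-archive docs (first H1 after any YAML front-matter, otherwise the filename stem with hyphens and underscores turned into spaces) can be sketched in a few lines. scripts/rebuild-docs-shard.sh is a shell script whose source is not shown here, so this Python version is an illustrative equivalent, and `derive_title` is a hypothetical name.

```python
import re
from pathlib import Path


def derive_title(markdown: str, filename: str) -> str:
    """Return the first H1 after any YAML front-matter, else a title from the filename."""
    lines = markdown.splitlines()
    start = 0
    # Skip a leading front-matter block delimited by '---' lines, so that
    # YAML comments starting with '#' are not mistaken for headings.
    if lines and lines[0].strip() == "---":
        for index in range(1, len(lines)):
            if lines[index].strip() == "---":
                start = index + 1
                break
    for line in lines[start:]:
        match = re.match(r"^#\s+(.+?)\s*$", line)
        if match:
            return match.group(1)
    # Fallback: filename stem with runs of hyphens/underscores as single spaces.
    return re.sub(r"[-_]+", " ", Path(filename).stem).strip()
```

One limitation of this sketch: a line starting with `# ` inside a fenced code block would be treated as a heading.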

v2026.5.5

10 May 22:22
4c62e5d

Docker bundle behavior change: the bundled fortemi-docs support archive is now opt-in to mirror the native build path.

Changed

  • Support archive is opt-in by default (#672) — The Docker bundle no longer auto-seeds the bundled Fortémi documentation on first boot. Behavior now matches the native cargo run path (which never auto-seeded). Two opt-in routes:
    • Auto-seed on first boot: set LOAD_SUPPORT_MEMORY=true in .env before docker compose ... up.
    • One-command seed on a running instance: docker compose -f docker-compose.bundle.yml exec fortemi /app/seed-support-archive.sh (idempotent; flag file on the persistent pgdata volume tracks state).
    • Legacy DISABLE_SUPPORT_MEMORY=true still wins as a force-skip — kept for back-compat with bundles that pre-date the flip. Used by docker-compose.minimal.yml to guarantee skip regardless of upstream config.
  • docker/seed-support-archive.sh is now safe to invoke manually at any time inside a running container. The MANUAL_INVOCATION flag (default true) distinguishes operator-invoked from entrypoint-invoked runs; the entrypoint sets it to false so auto-seed requires explicit LOAD_SUPPORT_MEMORY=true.
  • README "Quick Start" updated — no longer claims the bundle auto-seeds the support archive; points operators at the new dedicated section.
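
The interaction between the seeding flags above can be sketched as a single decision function. This is a reading of the release notes, not the contents of docker/seed-support-archive.sh: the function name and boolean parsing are assumptions, and so is the choice that a manual run is stopped only by the idempotency flag file (the notes do not say whether DISABLE_SUPPORT_MEMORY also blocks manual runs).

```python
from typing import Mapping


def _is_true(value) -> bool:
    """Treat only the literal string 'true' (any case) as enabled; an assumption."""
    return value is not None and value.strip().lower() == "true"


def should_seed(env: Mapping[str, str], manual_invocation: bool = True,
                flag_file_present: bool = False) -> bool:
    """Decide whether the support archive should be seeded on this run."""
    if flag_file_present:
        return False  # already seeded; the script is idempotent
    if manual_invocation:
        return True  # operator ran the script explicitly (assumed to override env)
    if _is_true(env.get("DISABLE_SUPPORT_MEMORY")):
        return False  # legacy force-skip wins over LOAD_SUPPORT_MEMORY
    return _is_true(env.get("LOAD_SUPPORT_MEMORY"))  # opt-in; default is no seed
```

With `manual_invocation=False` (the entrypoint case) and an empty environment the function returns False, which is the new opt-in default.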

Added

  • README "Support Archive (fortemi-docs)" section — what the archive is, both opt-in paths, querying via the X-Fortemi-Memory: fortemi-docs header, the additional POST /api/v1/notes/reprocess opt-in for semantic search, and the refresh-on-upgrade procedure (drop archive + remove flag file + re-seed).
  • .env.example Support Memory Archive block rewritten to document both opt-in paths with copy-paste recipes plus a quick search example.

Migration notes

| Existing .env | Behavior after upgrade |
| --- | --- |
| DISABLE_SUPPORT_MEMORY=true | Unchanged: still skipped |
| DISABLE_SUPPORT_MEMORY=false (prior default) | Changed: no longer auto-seeds. Add LOAD_SUPPORT_MEMORY=true to restore. |
| DISABLE_SUPPORT_MEMORY unset | Was never explicit; still no auto-seed |
| Already-seeded instance (flag file present) | Unchanged: flag file persists across restarts; the existing archive stays |

No data loss for anyone. Worst case is a previously-default operator notices the docs aren't loaded on a fresh deploy and runs the one-command opt-in.

Fixed (CI infrastructure, already in v2026.5.4 via re-tag but documented here for completeness)

  • Create Gitea Release and Create GitHub Release tag-gating (#669) — Belt-and-suspenders startsWith(github.ref, 'refs/tags/v') clause on both job conditions. Gitea Actions' needs.X.result == 'success' evaluator doesn't propagate skipped upstream the way GitHub Actions does, leading to release-creation jobs firing on push-to-main with tag_name: "main" (rejected by both registries).
  • Publish Dev Image (Gitea + GitHub) tag-gating (#670) — Same evaluator quirk; added !startsWith(github.ref, 'refs/tags/') to keep dev-publish jobs from running on tag pushes and contending with the proper tag-only release publishes. Also bounded the Create GitHub Release curl with --connect-timeout 10 --max-time 60 so future hangs fail fast.
  • Shard-rebuild host-port collision (#671) — scripts/ci/rebuild-shard-in-ci.sh now picks a unique host port (30000 + ($$ % 5000)) instead of fixed 3000. The parallel publish-release (Gitea) and publish-github (ghcr.io) jobs share a runner; both invoke this script; the second-to-arrive previously crashed with "Bind for 0.0.0.0:3000 failed: port is already allocated".
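
The port-selection arithmetic from the shard-rebuild fix, 30000 + ($$ % 5000), maps the process ID into the range 30000 to 34999. A Python equivalent of that shell expression, with a hypothetical function name, looks like this:

```python
import os
from typing import Optional


def pick_host_port(pid: Optional[int] = None, base: int = 30000, span: int = 5000) -> int:
    """Derive a host port from the process ID, mirroring 30000 + ($$ % 5000)."""
    if pid is None:
        pid = os.getpid()
    return base + (pid % span)
```

Two jobs on the same runner collide only if their PIDs differ by an exact multiple of 5000, which is far less likely than two jobs both binding the fixed port 3000. The scheme does not check whether an unrelated process already holds the chosen port.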

v2026.5.4

10 May 16:41
4c62e5d

First-class provider profiles for all advertised inference platforms (#654 series), three runtime-config follow-ups (#655 #656 #657), and CI hardening for the auto-shard-rebuild and release publication paths.

Added — Inference: first-class provider parity (#654)

  • Provider profile catalog (#658) — crates/matric-inference/src/provider_profiles.rs ships a &'static [ProviderProfile] describing the four v1 providers (Ollama, OpenAI, OpenRouter, llama.cpp). Each entry carries the wire protocol family (BackendKind::Ollama or BackendKind::OpenAICompatible), default base URL, required-vs-optional API key, capability list, env-var conventions, recommended default models, extra-header injection rules, and health/models endpoints. Future providers (vLLM, LiteLLM, LocalAI, Groq, Together, …) become 5-line additions to the catalog with no enum touching, no parser surface.
  • Catalog-driven /api/v1/inference/providers (#659) — Replaces hard-coded match arms with a single loop over provider_profiles::iter(). Response gains a supports_embeddings field so BYOK UIs can render the OpenRouter-style "chat only" case correctly.
  • Profile-aware /api/v1/inference/test-connection (#659) — Hints like openrouter or llamacpp route to the right wire-protocol probe (BackendKind::Ollama vs OpenAICompatible) instead of falling through to URL auto-detection.
  • OpenRouter native runtime config (#660) — POST /api/v1/inference/config accepts an openrouter block alongside ollama/openai/llamacpp. HTTP-Referer / X-Title headers default to https://fortemi.io / Fortemi; overridable per-deployment via OPENROUTER_HTTP_REFERER / OPENROUTER_APP_NAME env vars or the runtime http_referer / app_name fields.
  • Independent embedding/generation routing (#661) — MATRIC_EMBEDDING_PROVIDER env var and embedding_backend field on POST /api/v1/inference/config route embedding calls through a different provider than the active default. Killer use case: OpenRouter for chat (no embedding API), local Ollama or llama.cpp for embeddings. Validated against the catalog: pointing at a provider without the Embedding capability returns 400 with a descriptive error before persisting.
  • Atomic-swap and dry-run modes (#659) — POST /api/v1/inference/config?dry_run=true validates the merged config and returns the would-be effective state without persisting or hot-swapping. ?atomic=true probes every backend the request touches before committing; on any probe failure, abort with 503 + structured failures: [...] array. Avoids the brief error window where a half-applied config serves bad creds.
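
The catalog-driven design can be illustrated with a small data-only model: one static list of profiles, a provider listing derived from a loop over it, and capability validation for the embedding provider. The real catalog lives in Rust (crates/matric-inference/src/provider_profiles.rs); the Python types and function names below, the capability sets other than OpenRouter's lack of embeddings, and the 400 status for an unknown provider id are assumptions made for this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List, Tuple


class Capability(Enum):
    GENERATION = "generation"
    EMBEDDING = "embedding"


@dataclass(frozen=True)
class ProviderProfile:
    id: str
    backend_kind: str  # "ollama" or "openai_compatible"
    capabilities: FrozenSet[Capability]


_BOTH = frozenset({Capability.GENERATION, Capability.EMBEDDING})

# Illustrative catalog: adding a provider is one more entry, with no other code changes.
CATALOG: Tuple[ProviderProfile, ...] = (
    ProviderProfile("ollama", "ollama", _BOTH),
    ProviderProfile("openai", "openai_compatible", _BOTH),
    ProviderProfile("openrouter", "openai_compatible", frozenset({Capability.GENERATION})),
    ProviderProfile("llamacpp", "openai_compatible", _BOTH),
)


def list_providers() -> List[dict]:
    """Build a providers listing by looping over the catalog."""
    return [
        {"id": p.id, "supports_embeddings": Capability.EMBEDDING in p.capabilities}
        for p in CATALOG
    ]


def validate_embedding_backend(provider_id: str) -> Tuple[int, str]:
    """Return (status, message); reject providers that cannot embed before persisting."""
    profile = next((p for p in CATALOG if p.id == provider_id), None)
    if profile is None:
        return 400, f"unknown provider '{provider_id}'"
    if Capability.EMBEDDING not in profile.capabilities:
        return 400, f"provider '{provider_id}' does not support embeddings"
    return 200, "ok"
```

Under these assumptions, pointing the embedding backend at `openrouter` is rejected up front, while `ollama` or `llamacpp` passes validation.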

Added — Runtime-config follow-ups

  • InferenceConfigChanged SSE event on hot-swap (#657, #663) — New variant on ServerEvent emitted from POST and DELETE /api/v1/inference/config. Carries default_backend, embedding_backend, and a changed_fields array of dotted field names (openrouter.api_key, embedding_backend). API keys never appear in event payloads — only field names. Reactive UIs (HotM provider pill, MCP-tool clients, dashboards) can update without polling. DELETE events use the sentinel changed_fields: ["__reset__"].
  • Inference config audit log (#656, #664) — New inference_config_audit table records every operator-driven mutation: actor, timestamp, action (set / reset / set_archive / reset_archive), redacted before/after JSON blobs, source IP. New GET /api/v1/inference/config/audit?limit=50&changed_by=&action= endpoint returns recent entries with filter support. Best-effort writer — DB failure logs at warn but never blocks the live config change.
  • Per-archive inference provider override (#655, #665) — Storage + API surface for multi-tenant routing: new archive_inference_override table keyed by schema_name. GET / POST / DELETE /api/v1/inference/config honor X-Fortemi-Memory; archive overrides shallow-merge on top of the global config (precedence: archive_override > db_override > env > default). Audit log distinguishes archive operations via set_archive / reset_archive actions. Live runtime routing (per-archive ProviderRegistry cache + request-time resolver) is filed as #666 — substantial scope, follow-up.
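
The stated precedence (archive_override > db_override > env > default) with a shallow merge can be sketched as successive dictionary updates, lowest priority first. The function name and the example values are hypothetical; only the layer order and the shallow-merge behaviour come from the notes above.

```python
from typing import Any, Mapping, Optional


def effective_config(default: Mapping[str, Any], env: Mapping[str, Any],
                     db_override: Mapping[str, Any],
                     archive_override: Optional[Mapping[str, Any]] = None) -> dict:
    """Shallow-merge config layers; later layers win on each top-level key."""
    merged: dict = {}
    for layer in (default, env, db_override, archive_override or {}):
        merged.update(layer)  # shallow: a nested block is replaced wholesale
    return merged
```

Because the merge is shallow, an archive override that supplies an `openrouter` block replaces the whole block from lower layers rather than merging individual fields inside it.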

Changed — Inference docs (#662)

  • README "Multi-Provider Inference" rewritten with the catalog-driven profile table (backend protocol, API key requirement, embedding support, default models per profile), runtime reconfiguration recipes (?dry_run=true, ?atomic=true curl examples), and the independent embedding/generation routing story.
  • README "Bring Your Own LLM" uses native profile names (MATRIC_INFERENCE_DEFAULT=llamacpp / openrouter) instead of the legacy MATRIC_INFERENCE_DEFAULT=openai escape hatch. Legacy recipe preserved for unknown OpenAI-compatible endpoints (vLLM, LiteLLM, on-prem).
  • CLAUDE.md "Inference Providers" expanded: dedicated OpenRouter section with all five env vars, new "Independent Embedding/Generation Routing" subsection, runtime hot-swap recipes for embedding_backend set/clear, dry_run, and atomic.
  • .env.example - MATRIC_INFERENCE_DEFAULT lists all 4 valid ids; new MATRIC_EMBEDDING_PROVIDER block; OPENROUTER_GEN_MODEL / OPENROUTER_APP_NAME (renamed from _X_TITLE for runtime-field consistency); new llama.cpp section.

Fixed

  • matric-core event variant count assertions (#667) — Two test assertions hard-coded 47 for the variant count; #663 added a 48th. Failing CI runs from the prior release surfaced this. Fixed both events.rs::test_all_variants_metadata_is_complete and asyncapi.rs::build_spec_produces_valid_structure to expect 48.
  • Auto-shard-rebuild rate-limit (#668) — scripts/ci/rebuild-shard-in-ci.sh now passes RATE_LIMIT_ENABLED=false to the transient API container. The rebuild fires ~200 POST /api/v1/notes calls in ~4 s; the bundle's default rate limiter (100 req / 60 s) was 429-ing the second half. Companion: scripts/rebuild-docs-shard.sh now aborts with exit 1 if more than 5% of imports fail, so this kind of partial failure aborts loudly instead of silently emitting a half-empty .shard.
  • Release-job tag gating (#669) — Create Gitea Release and Create GitHub Release jobs in ci-builder.yaml had needs: publish-release|github + if: needs.X.result == 'success'; on Gitea Actions this didn't propagate skipped correctly and Create GitHub Release fired on push-to-main, attempting a release with tag_name: "main". Belt-and-suspenders fix adds explicit startsWith(github.ref, 'refs/tags/v') to both job conditions.
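
The failure-threshold guard added to the shard rebuild (exit 1 when more than 5% of imports fail) comes down to one comparison. The sketch below is a Python rendering with a hypothetical function name; integer arithmetic avoids floating-point edge cases at exactly 5%.

```python
def import_exit_code(total: int, failed: int, max_failure_percent: int = 5) -> int:
    """Return 1 when strictly more than `max_failure_percent` of imports failed, else 0."""
    if failed * 100 > total * max_failure_percent:
        return 1
    return 0
```

In the rate-limit incident described above, roughly half of about 200 imports were rejected with 429, which is far over the threshold, so a run like that now aborts instead of emitting a half-empty shard.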

v2026.5.2

10 May 03:27
4c62e5d

Three deployment-experience fixes plus a diagnostic update on a misfiled regression.

Added

  • docker-compose.llamacpp.yml — bundled llama.cpp inference sidecar (#646, PR #649) — ghcr.io/ggerganov/llama.cpp:server exposed on :8080/v1 (OpenAI-compatible protocol). Brought up alongside the bundle via docker compose -f docker-compose.bundle.yml -f docker-compose.llamacpp.yml up -d. Tunable through LLAMACPP_MODEL_FILE, LLAMACPP_CTX_SIZE, LLAMACPP_GPU_LAYERS. NVIDIA GPU stanza commented out for opt-in. Unblocks operators who already run llama.cpp on the host and don't want Ollama.
  • docker-compose.minimal.yml — minimal-footprint overlay (#648, PR #651) — Reduces idle bundle footprint to ~2 GB by disabling support-archive seeding, swapping the fast extraction model from qwen3.5:9b (~8 GB) to qwen2.5:3b (~2 GB), capping JOB_MAX_CONCURRENT=1 + GPU_MAX_CONCURRENT=1, and trimming MAX_MEMORIES=2. Brought up via docker compose -f docker-compose.bundle.yml -f docker-compose.minimal.yml up -d. Trade-off: qwen2.5:3b chat quality is materially lower than the default — meant for resource-constrained hosts where "make it run" beats "make it good".
  • README "Bring Your Own LLM" subsection (#646, PR #649) — Copy-paste .env recipes for routing inference to llama.cpp, OpenAI proper, or OpenRouter. Explicit instructions for disabling Ollama entirely (set MATRIC_INFERENCE_DEFAULT=openai; the Ollama backend is not constructed when it isn't the default, so host.docker.internal:11434 is not probed). Ollama is now framed as one option among several, not the assumed default.
  • README "Resource Requirements" subsection (#648, PR #651) — Per-component idle RAM table, prominent DISABLE_SUPPORT_MEMORY=true callout, minimal-overlay invocation, and the qwen2.5:3b chat-quality trade-off note. Surfaces resource expectations for operators sizing hosts.
  • CONTRIBUTING.md "sqlx compile-time query checks" subsection (#647, PR #650) - Names the "missing graph" class of failure, documents both resolution paths (live DATABASE_URL or cargo sqlx prepare --workspace + SQLX_OFFLINE=true), and explains when to refresh .sqlx/. The README "From Source" section gets a callout pointing here so users who hit the error in the wild find the fix.

Investigated, no code change required

  • #647 "missing graph" regression — Reproducer test showed the codebase does not use sqlx::query! / sqlx::query_as! / sqlx::query_scalar! compile-time macros (only the runtime sqlx::query("…") API). cargo check --workspace succeeds on a fresh clone with no DATABASE_URL, no .sqlx/, and no SQLX_OFFLINE. cargo sqlx prepare --workspace against a clean migrated Postgres reports "no queries found". The error class the issue describes does not reproduce on main. Documentation from PR #650 stays merged as preventative coverage; awaiting the verbatim error text from the reporter to identify the actual source. See crates/matric-jobs/src/pause.rs:173 and crates/matric-db/src/schema_context.rs:108 for the runtime-only pattern this codebase uses.

Changed

  • Cargo.lock refreshed for the 2026.5.1 workspace version bump that landed in the prior release commit.