refactor(backends): self-describing WrappedServer backends (#2287) #2320

Draft: jeremyfowers wants to merge 40 commits into main from feat/self-describing-backends


jeremyfowers commented Jun 19, 2026

Member

Implements the plan in #2287: each inference backend describes itself with a plain-data descriptor + a server class + a stateless behavior object, and every scattered if (recipe == "...") site is rewritten to read a registry built from those descriptors. Backend-specific logic no longer leaks into the router, model manager, system-info, CLI, or docs.

Layout — a backend is a folder

include/lemon/backends/<stem>/<stem>.h          # descriptor (header-only inline const, CLI-safe)
include/lemon/backends/<stem>/<stem>_server.h   # WrappedServer subclass + create()/spec()/ops() decls
server/backends/<stem>/<stem>_server.cpp        # server class impl, BackendOps subclass, create/spec/ops
                                                # (+ any backend-private helpers, e.g. llamacpp_gguf.cpp)

Adding a backend = one LEMON_BACKENDS line in CMakeLists.txt + that folder + a backend_versions.json pin + server_models.json entries. No router, CLI, doc, or support-matrix edits — those are all derived. CMake globs each backend folder (CONFIGURE_DEPENDS), so backend-private helper files need no build edit.

What changed

  • Descriptor (backend_descriptor.h) — plain data describing what a backend is: recipe, display name, binary, config section, default device, SlotPolicy, selectable_backend, uses_ctx_size, dynamic_models, declarative options[], OS/GPU support[], default labels, required checkpoints, plus editorial/policy fields (modality, experimental, web_priority, rocm_channels, version_policy, exposes_prometheus_metrics, rocm_requires_cwsr_fix, self_manages_downloads).

  • Two-tier registry, generated from LEMON_BACKENDS at CMake configure time — a CLI-safe data registry (descriptors only; links into both lemonade and lemond) and a server-only factory registry (binds each descriptor to its class's create(), spec(), ops()). This split lets the CLI read recipe options/flags from descriptors without linking server classes.

  • BackendOps — stateless per-backend behavior (backend_ops.h): the model-management logic that happens without a running subprocess. The base class is the shared Hugging Face behavior; each backend overrides only its policy points, so shared download/cache logic is inherited, not copied. Methods: populate_metadata, resolve_checkpoint_path, find_imported_checkpoint, validate_registration_checkpoint, select_checkpoint_files, discover_models, is_downloaded, validate_checkpoint_file, download_model, invalidates_cache_after_download, resolve_version, check_install, classify_unavailable. This is what let model_manager.cpp and system_info.cpp shed their per-recipe switchboards (resolve_model_path went from a ~290-line if/else to one ops_for(recipe)->… call).

  • Descriptor/ops-driven sites — router creation, NPU/slot eviction & cloud LRU exemption (SlotPolicy, no recipe literals left in router.cpp), device type, recipe options / CLI flags / defaults, config-section identity, ROCm channels (recipe_has_rocm_channels), the support matrix (RECIPE_DEFS deleted from system_info.cpp), recipe→label inference, FLM dynamic discovery, the FLM install-state machine, cloud availability + discovery, and the install-state UI hints.

  • Registration helpers: make_server<T> / make_spec<T> / single_ops<T> keep the per-backend create()/spec()/ops() one-liners DRY (the irregular backends, namely cloud, ryzenai, and vllm, keep bespoke bodies).

  • /system-info recipes entries enriched with display_name / selectable_backend / uses_ctx_size / options / support. The desktop app reads recipe display names from /system-info instead of hardcoded TypeScript.

  • Docs generation: docs/tools/gen_backend_docs.py boots lemond, reads /system-info + server_models.json, and rewrites marker-delimited regions of six docs (README.md support matrix, guide/cli.md, guide/configuration/README.md, guide/configuration/multi-model.md, custom-models.md, dev/backends-reference.md) plus assets/models.js. A CI job (backend-docs-drift) fails on drift. An authoring guide is added at dev/adding-a-backend.md; it and the generated dev/backends-reference.md are both wired into the mkdocs nav.

  • ModelInfo::extras — generic map<string, json> populated from unknown server_models.json keys, so a new backend adds per-model fields without editing shared structs.
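The descriptor + BackendOps + registry pattern in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the struct layouts, the `registry()` function, the display names, and the placeholder `is_downloaded` default are all assumptions, and the real descriptor carries many more fields.

```cpp
#include <map>
#include <string>

// Hypothetical, trimmed-down descriptor: the real one carries many more fields.
struct BackendDescriptor {
    std::string recipe;
    std::string display_name;
    bool dynamic_models = false;
};

// Shared default behavior lives in the base class; a backend overrides
// only the policy points where it differs.
struct BackendOps {
    virtual ~BackendOps() = default;
    virtual bool is_downloaded(const std::string& checkpoint) const {
        // Placeholder for the shared Hugging Face completeness check.
        return !checkpoint.empty();
    }
};

struct CloudOps : BackendOps {
    // Cloud models have nothing to download, so they always count as present.
    bool is_downloaded(const std::string&) const override { return true; }
};

struct BackendRegistration {
    BackendDescriptor descriptor;
    const BackendOps* ops;
};

// Stand-in for the generated registry (the PR generates it from LEMON_BACKENDS).
inline const std::map<std::string, BackendRegistration>& registry() {
    static const BackendOps default_ops{};
    static const CloudOps cloud_ops{};
    static const std::map<std::string, BackendRegistration> entries = {
        {"llamacpp", {{"llamacpp", "llama.cpp (illustrative)", false}, &default_ops}},
        {"cloud", {{"cloud", "Cloud (illustrative)", true}, &cloud_ops}},
    };
    return entries;
}

// Callers dispatch through the registry instead of branching on the recipe.
inline const BackendOps* ops_for(const std::string& recipe) {
    const auto it = registry().find(recipe);
    return it == registry().end() ? nullptr : it->second.ops;
}
```

A call site then reads `ops_for(recipe)->is_downloaded(checkpoint)` with no recipe literal; in this sketch an unknown recipe yields `nullptr`, which a caller would need to check.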

Verification

Local: lemond + lemonade CLI + web-app build clean; C++ unit tests ctest 5/5 (incl. GgufCapabilities, AutoTune, LatestVersionFallback, InstallAtomicity); server_endpoints 71/71; /system-info carries the enriched fields; docs --check clean; a registry smoke confirms all backends register and route. Cross-platform + clean-environment validation via CI.

One pre-existing local failure unrelated to this change (reproduced on main): server_cli2 test_020_list — a built-in collection name with a space ("Lite Collection") breaks the test's whitespace-based table parser.

Notes for reviewers

  • recipeOptionsConfig.ts (the TypeScript-typed per-recipe option forms) is intentionally left to maintainers per AGENTS.md; the schema is now exposed via /system-info for a future dynamic migration.
  • Backend install still goes through each backend's BackendSpec (install params are class-side behavior); the descriptor supplies the binary name.
  • Deliberately left as documented exceptions (not oversights): cloud recipe checks (the dynamic-models exception), collection.omni (the orchestrator exception, not a WrappedServer), inspect_repo repo→recipe detection (its collection branch is that same exception), and defaults.json generation (its variant *_args/*_bin keys aren't in the descriptor options, so generating it would need a config-schema expansion that risks the config contract).

🤖 Generated with Claude Code

Make each inference backend describe itself with a plain-data descriptor plus
a server class, and rewrite the scattered `if (recipe == "...")` sites to read
a registry built from those descriptors. Adding a backend becomes one
LEMON_BACKENDS line plus a descriptor + factory file — no router, CLI, docs, or
support-matrix edits.

- Descriptor types (BackendDescriptor/BackendOption/SlotPolicy) + a CLI-safe
  data registry and a server-only factory registry, generated from the
  LEMON_BACKENDS list at CMake configure time.
- All 9 backends carry a descriptor (device, slot policy, options, support
  matrix, labels, binary) and a create().
- Descriptor-driven: router creation, NPU/slot eviction, device type, recipe
  options/CLI flags, config-section identity, support matrix, recipe labels,
  cloud availability.
- /system-info recipes enriched with display_name/selectable_backend/options/
  support; the app reads recipe display names from it instead of hardcoded TS.
- docs/tools/gen_backend_docs.py generates docs/dev/backends-reference.md from
  /system-info; a CI step fails on drift. Authoring guide in
  docs/dev/adding-a-backend.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
github-actions bot added the enhancement label Jun 19, 2026
@jeremyfowers (Member, Author)

CI status

All cross-platform builds pass (MSVC, AppleClang, GCC, Arch, openSUSE, Fedora rpm), which validates that the descriptor aggregate-init, the CMake LEMON_BACKENDS codegen, and the CLI-safe/server-only registry split compile everywhere. The functional jobs that exercise this change pass: CLI/Endpoints (ubuntu + macOS), Test .exe (whisper, moonshine, stable-diffusion, text-to-speech), and backend-docs-drift. Local runs of endpoints (69), pinning (6), and app-regression (37) also pass.

The single red — Test CLI/Endpoints (windows-latest) → test_026_anthropic_messages_tool_calling — is a pre-existing flaky timeout, not from this PR. It's a 500 s ReadTimeout on a tool-calling inference request that the Windows runner intermittently can't finish in time:

  • main run 27765794877: same job fails on the same test with the identical read timeout=500 signature.
  • main run 27795912134: same job passes.

This PR touches backend construction, not inference, anthropic_api.cpp, or the tool-calling loop, so it can't change that test's latency. Re-running the job.

jeremyfowers and others added 27 commits June 19, 2026 16:25
Restructure the self-describing backends to the layout the issue #2287 plan
specified — one folder per backend — instead of the flat file layout I used
before. This also folds the earlier _descriptor/_factory split into the spec's
cleaner shape: the descriptor is a header-only `inline const` and create() lives
with the server class.

Each backend now lives in its own folder, in namespace lemon::backends::<stem>:
  include/lemon/backends/<stem>/<stem>.h         inline const descriptor (CLI-safe data)
  include/lemon/backends/<stem>/<stem>_server.h  WrappedServer subclass + create() decl
  server/backends/<stem>/<stem>_server.cpp       implementation + create() def

Shared registry/util files stay at the top of backends/. The CMake foreach over
LEMON_BACKENDS compiles each <stem>/<stem>_server.cpp and generates the registry
headers from the folder paths. Removes the per-backend *_descriptor.{h,cpp} and
*_factory.{h,cpp} files. Behavior is unchanged (same descriptors, same create()).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make the existing curated docs generate from the backend descriptors instead of
just shipping a separate reference file — closing appendix rows 14 and 22.

- Expand the descriptor with the editorial fields the curated docs need:
  `modality`, `experimental`, `web_display_name`, and a per-support-row
  `device_summary` (RecipeBackendDef). These keep the descriptor the single
  source of truth.
- /system-info exposes them plus a registry `order` index and `slot_policy`.
- gen_backend_docs.py now targets multiple docs and renders:
    * README.md "Supported Configurations" HTML matrix (grouped by modality,
      merged rows, rowspans, experimental tag) — wrapped in GENERATED markers;
    * docs/guide/configuration/multi-model.md NPU-exclusivity list.
  The backend-docs-drift CI job's --check now covers all three docs.

The generated README matrix is also more complete than the hand-written one
(it now includes whispercpp rocm/metal, kokoro metal, sd-cpp metal). Footnotes
and prose outside the markers are preserved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wrap cli.md's "Recipe-Specific Options" tables in GENERATED markers and render
them from the descriptor options. This also fixes pre-existing drift: the section
documented `--steps`/`--cfg-scale`/`--width`/`--height` flags that the CLI no
longer registers, and omitted the moonshine and vllm recipes. Now covered by the
backend-docs-drift CI check.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add inline-marker support to the generator and wrap the `--recipe` "Common
values" list in custom-models.md so it renders from the descriptor recipe set
(plus collection.omni). Now covered by the backend-docs-drift CI check.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Close the last two cleanly-derivable doc touchpoints (appendix rows 16 and 21).

- configuration/README.md "Example config.json": generated from a fresh lemond's
  GET /internal/config (the real canonical config). This also fixes pre-existing
  drift — the hand-written block had `config_version: 1` (now 2), `prefer_system:
  false` (now true), a stray `device` key, and an invalid trailing comma. `port`
  is normalized to the documented default 13305.
- docs/assets/models.js RECIPE_PRIORITY + RECIPE_DISPLAY_NAMES: generated from
  descriptors. A new `web_priority` editorial field preserves the curated website
  ordering (so the order is descriptor-sourced, not a silent reorder); legacy
  `oga-*` recipes are dropped as agreed. Adds the correct `vllm` display name.

The generator now drives 7 docs and supports both `<!-- -->` (Markdown) and
`/* */` (JS) GENERATED markers. backend-docs-drift --check covers all of them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ive spec; drop device map)

Two agreed plan touchpoints were left incomplete; this finishes them.

Row 4 — try_get_spec_for_recipe was still a hand-written 8-branch if-ladder in
backend_utils.cpp, which also forced it to #include all 8 server headers. Each
backend now exposes a uniform `spec()` accessor (alongside create()); the
generated factory registry binds it, and `backends::spec_for(recipe)` /
try_get_spec_for_recipe iterate the registry. backend_utils.cpp now includes
ZERO server headers. Also reroute the two leaking `Server::SPEC` references
(model_manager find_flm_binary) through the registry.

Row 5 — get_device_type_from_recipe still carried the full recipe->device map,
redundant with BackendDescriptor::default_device. Reduced to a DEVICE_NONE
fallback for non-descriptor recipes (collections/unknown); the descriptor is the
single source via ModelManager::device_type_for_recipe.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Introduce a stateless per-backend behavior interface for model management that
happens WITHOUT a running subprocess (checkpoint-path resolution, download,
dynamic discovery, per-model metadata, version detection, availability) — the
home for the recipe switchboards currently scattered through model_manager and
system_info.

- BackendOps base class (lemon/backends/backend_ops.h): shared default behavior;
  backends override only the policy points they need (inherit shared logic, don't
  copy it). Methods are added incrementally as switchboards migrate; each has a
  default so adding one never forces edits to backends that don't override it.
- Each backend folder exposes a uniform ops() singleton (alongside create()/spec()),
  bound into BackendRegistration; backends::ops_for(recipe) returns it.
- Purely additive: every backend uses the default base ops for now, so there is
  no behavior change yet. Migrations follow in subsequent commits.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…readers into folders

Replace the populate_model_metadata recipe switchboard with
ops_for(recipe)->populate_metadata(). The backend-specific readers move into
their folders:

- GGUF metadata reader (read_gguf_metadata + byte parsers) -> backends/llamacpp/
  llamacpp_gguf.{h,cpp}; LlamaCppOps::populate_metadata reads arch + capability
  labels there.
- FLM model-file helpers (config.json ctx window, model-dir discovery) ->
  backends/fastflowlm/fastflowlm_models.{h,cpp}; FlmOps::populate_metadata uses it.

model_manager no longer knows how either backend stores or introspects models.
CMake now globs each backend folder's *.cpp (CONFIGURE_DEPENDS) so backend-private
helper files need no CMake edit; the backend LIST stays explicit.

Verified: GGUF context windows still populate (131072/128000/32768 for sample
models) and test_gguf_capabilities passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…llamacpp||sd-cpp)&&rocm)

Add a `rocm_channels` descriptor field (llamacpp {"stable","nightly"}, sd-cpp
{"stable"}) and a recipe_has_rocm_channels() registry helper. Replace the
hardcoded `(recipe=="llamacpp"||recipe=="sd-cpp") && rocm` predicate — copied
across backend_utils.cpp (3×), backend_manager.cpp (2×), and system_info.cpp —
with the descriptor check. rocm_channel_for_recipe() now clamps a requested
channel to one the backend publishes (so sd-cpp's missing "nightly" -> "stable"
falls out of the data instead of a per-recipe special case).
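The clamping behavior described in this commit could look roughly like the sketch below. The function names come from the commit message, but the signatures (taking a descriptor rather than a recipe string) and the fallback to the first listed channel are assumptions made for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical slice of a descriptor: only the field this sketch needs.
struct BackendDescriptor {
    std::string recipe;
    std::vector<std::string> rocm_channels;  // empty = no ROCm channels published
};

inline bool recipe_has_rocm_channels(const BackendDescriptor& d) {
    return !d.rocm_channels.empty();
}

// Clamp the requested channel to one the backend publishes.
// Assumption: when the request is not published, fall back to the first listed channel.
inline std::string rocm_channel_for_recipe(const BackendDescriptor& d,
                                           const std::string& requested) {
    if (!recipe_has_rocm_channels(d)) return "";
    const std::vector<std::string>& channels = d.rocm_channels;
    if (std::find(channels.begin(), channels.end(), requested) != channels.end()) {
        return requested;
    }
    return channels.front();
}
```

With `{"stable", "nightly"}` for llamacpp and `{"stable"}` for sd-cpp, a "nightly" request stays "nightly" for llamacpp and becomes "stable" for sd-cpp without any per-recipe branch.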

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rst leak)

Replace the ~290-line recipe switchboard in ModelManager::resolve_model_path
with ops_for(recipe)->resolve_checkpoint_path(). The model manager now only does
the generic prefix (collections, local_path/local_upload, HF cache-dir
computation) and hands off to the backend.

- New BackendOps::resolve_checkpoint_path; base = the shared HF behavior
  (active-snapshot variant/aux resolution, main-repo fallback, directory
  fallback). Backends override only their artifact layout:
    * llamacpp -> GGUF resolver (sharding/folder/quant-token), moved into
      backends/llamacpp/llamacpp_gguf (resolve_gguf_path).
    * ryzenai -> genai_config.json directory; kokoro -> index.json;
      whispercpp -> first .bin; cloud -> ""; flm -> checkpoint passthrough.
- New shared backends/hf_cache_util (exists/dir_options/active_snapshot_path/
  repo_id_to_cache_dir_name) so ops reuse the same HF-cache mechanics.

model_manager.cpp -362 lines; resolve_model_path 365 -> 34. Verified all recipes
still resolve as downloaded (llamacpp variants, whisper .bin, kokoro index,
sd-cpp, ryzenai, flm) via /models.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…FLM cluster → folder

Dynamic discovery, download status, and downloading now flow through BackendOps
instead of recipe switchboards in model_manager:

- discover_models: build_cache loops descriptors with dynamic_models=true and
  merges ops->discover_models(). FLM (`flm list`) and cloud (per-provider) both
  implement it — the two bespoke discovery blocks collapse to one generic loop.
- is_downloaded: base = shared HF completeness (ModelManager::checkpoints_complete);
  CloudOps → true; FlmOps → installed-set membership. Replaces the flm_set/cloud/
  else branches in build_cache and add_model_to_cache.
- validate_checkpoint_file: LlamaCppOps does the GGUF-magic check (was an inline
  llamacpp branch in are_required_checkpoints_complete).
- download_model: base = shared HF engine (download_from_huggingface_engine);
  FlmOps → flm pull; CloudOps → no-op. download_registered_model just dispatches.
  invalidates_cache_after_download() replaces the recipe=="flm" cache-reset.

The whole FLM cluster (find_flm_binary, flm_installed_checkpoints, flm_discover_models,
flm_download) moves into backends/fastflowlm/fastflowlm_models. model_manager keeps
only the generic HF engine.

Verified: server_endpoints 69 pass; download status correct for every recipe.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s hook

get_recipe_version now reads version.txt generically and lets the backend ops
override, instead of branching on recipe. The per-backend version commands move
into their folders:

- system llama-server version (`llama-server --version` + regex) -> backends/
  llamacpp; LlamaCppOps::resolve_version returns it for the "system" backend.
- flm version (`flm version --json`) -> backends/fastflowlm (flm_version());
  FlmOps::resolve_version returns it when no version.txt is present.

Removes SystemInfo::get_system_llamacpp_version / get_flm_version and the
llamacpp-system / flm branches from system_info.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
config_section duplicated the recipe string in 8 descriptors; it defaults to the
recipe via effective_config_section(), so set those to "". Only sd-cpp ("sdcpp")
and ryzenai-llm ("ryzenai") keep an explicit section because theirs genuinely
differ from the recipe.
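As a small illustration of the defaulting rule, here is a sketch of `effective_config_section()`. The struct is a hypothetical two-field slice of the descriptor; only the "empty means use the recipe" rule is taken from the commit message.

```cpp
#include <string>

// Hypothetical slice of the descriptor with just the two fields involved.
struct BackendDescriptor {
    std::string recipe;
    std::string config_section;  // "" = same as the recipe

    // An explicit section wins; otherwise the recipe doubles as the section name.
    const std::string& effective_config_section() const {
        return config_section.empty() ? recipe : config_section;
    }
};
```

Under this sketch, a descriptor for sd-cpp with `config_section = "sdcpp"` resolves to "sdcpp", while one for llamacpp with an empty section resolves to "llamacpp".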

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…_metrics descriptor flag

prometheus_metrics.cpp hardcoded `recipe == "llamacpp"` to decide whether to
scrape a backend subprocess's /metrics. Replace with a descriptor flag
(exposes_prometheus_metrics; llamacpp = true) so a new backend that exposes
Prometheus metrics opts in via its descriptor, not by editing the metrics code.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
These backend-specific per-model fields no longer sit on the shared ModelInfo
struct: llamacpp reads info.extra<bool>("hf_load", false) and moonshine reads
info.extra<int>("moonshine_arch", -1). Removed the typed fields, their explicit
parse sites, and their kKnownKeys entries; added parse_extras() to the two
ModelInfo-building paths that lacked it (add_model_to_cache, get_model_info_
unfiltered) so extras populate everywhere a model is built from JSON.
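A rough sketch of the typed `extra<T>(key, fallback)` accessor follows. The PR stores JSON values in `ModelInfo::extras`; to stay self-contained, this sketch substitutes a `std::variant` for the JSON type, so the `ExtraValue` alias and the lookup details are assumptions rather than the actual implementation.

```cpp
#include <map>
#include <string>
#include <variant>

// Stand-in for a JSON value; the PR uses a JSON type here instead.
using ExtraValue = std::variant<bool, int, std::string>;

struct ModelInfo {
    // Backend-specific per-model fields that the shared struct does not know about.
    std::map<std::string, ExtraValue> extras;

    // Return the stored value when the key exists and holds a T; otherwise the fallback.
    template <typename T>
    T extra(const std::string& key, T fallback) const {
        const auto it = extras.find(key);
        if (it == extras.end()) return fallback;
        if (const T* value = std::get_if<T>(&it->second)) return *value;
        return fallback;
    }
};
```

A backend would then read `info.extra<bool>("hf_load", false)` or `info.extra<int>("moonshine_arch", -1)` without the shared struct declaring either field.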

Verified: llamacpp models still resolve/download (hf_load path intact).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the hardcoded (sd-cpp||llamacpp||vllm)&&rocm recipe-list in
is_recipe_installed and build_recipes_info with a rocm_requires_cwsr_fix
descriptor flag (set on those three backends). The kernel CWSR detection
(needs_gfx1151_cwsr_fix) stays in system_info as generic hardware detection;
only "which backends' rocm build needs it" is now descriptor data.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ps hook

is_recipe_installed now finds the managed binary generically and asks the
backend's ops whether it's actually installed, instead of hardcoding the
llamacpp-system HIP check and the flm PATH fallback:

- check_install(backend, binary_found) ops hook; base = installed iff binary
  found. LlamaCppOps adds the ggml HIP-plugin requirement for the "system"
  build on AMD GPUs; FlmOps treats a PATH-installed flm as present.
- is_ggml_hip_plugin_available moves into backends/llamacpp; find_flm_executable
  and run_flm_validate move into backends/fastflowlm. Removed from path_utils
  (+ their orphaned decls/comments).

system_info no longer carries llamacpp/flm-specific availability knowledge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… vs AtLeast)

The update-required check special-cased recipe=="flm" to allow an installed
version newer than the pin. Replace with a version_policy descriptor field
(Exact default; flm = AtLeast for its system-managed package). system_info no
longer names flm in the version-comparison logic.
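The Exact versus AtLeast distinction can be sketched as below. This is an illustration only: it assumes purely numeric dotted versions such as "1.2.3" (optionally prefixed with "v"), and `parse_version` and `update_required` are hypothetical helper names, not functions confirmed by the PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

enum class VersionPolicy { Exact, AtLeast };

// Split "1.2.3" (or "v1.2.3") into numeric components.
// Assumption: every component is a plain decimal number.
inline std::vector<int> parse_version(std::string text) {
    if (!text.empty() && text[0] == 'v') text.erase(0, 1);
    std::vector<int> parts;
    std::stringstream stream(text);
    std::string piece;
    while (std::getline(stream, piece, '.')) {
        parts.push_back(piece.empty() ? 0 : std::stoi(piece));
    }
    return parts;
}

// Exact: any mismatch with the pin requires an update.
// AtLeast: only an installed version older than the pin requires one.
inline bool update_required(VersionPolicy policy, const std::string& installed,
                            const std::string& pinned) {
    if (policy == VersionPolicy::Exact) return installed != pinned;
    std::vector<int> have = parse_version(installed);
    std::vector<int> want = parse_version(pinned);
    const std::size_t width = std::max(have.size(), want.size());
    have.resize(width, 0);  // pad so "1.2" and "1.2.0" compare equal
    want.resize(width, 0);
    return have < want;     // lexicographic comparison, component by component
}
```

Under AtLeast, a system-managed package that is newer than the pin is accepted, which matches the behavior the commit describes for flm.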

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The `flm remove` subprocess orchestration moves out of ModelManager::delete_model
into backends/fastflowlm (flm_remove). model_manager keeps only the generic
HF-cache deletion path; the flm branch is now a thin call into the backend.

Verified: server_endpoints 69 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…cipe blocks

RuntimeConfig::recipe_options() had a hardcoded nested→flat translation block per
recipe (llamacpp/whispercpp/moonshine/sdcpp/vllm). Replace with a single loop
over the descriptors: each option's config.json key is derived from its name
role (*_backend → "backend", *_args → variant "<backend>_args"/"args",
*_device → "device", else the option name verbatim for sd-cpp's steps/cfg_scale/
width/height). Adding a backend no longer requires editing this function.
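One way to read the name-role rule above is sketched here. The helper names, the `selected_backend` parameter, and the interpretation of the `*_args` variant (use "<backend>_args" when a backend variant is selected, else "args") are assumptions; the option names in the usage note are illustrative.

```cpp
#include <string>

inline bool ends_with(const std::string& text, const std::string& suffix) {
    return text.size() >= suffix.size() &&
           text.compare(text.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Derive the nested config.json key from a flat option name.
// selected_backend is the chosen variant (for example "vulkan"); empty means none.
inline std::string config_key_for_option(const std::string& option,
                                         const std::string& selected_backend) {
    if (ends_with(option, "_backend")) return "backend";
    if (ends_with(option, "_device")) return "device";
    if (ends_with(option, "_args")) {
        return selected_backend.empty() ? "args" : selected_backend + "_args";
    }
    return option;  // e.g. steps, cfg_scale, width, height pass through verbatim
}
```

For example, `config_key_for_option("llamacpp_backend", "")` gives "backend" and `config_key_for_option("steps", "")` gives "steps", so one loop over descriptor options can replace the per-recipe translation blocks.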

Verified: server_endpoints 69 pass (config/params translation unchanged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… across descriptor↔server.h)

The backend binary name (and recipe) were duplicated between the descriptor
(<stem>.h) and the BackendSpec literal (<stem>_server.h) — the cross-file
redundancy. Remove the static SPEC member; each backend's spec() now builds the
BackendSpec lazily from descriptor.binary (+ descriptor.recipe, or the explicit
"ryzenai-server" install id where it differs) plus the class's get_install_params
and split flag. In-class binary lookups go through spec(); server.cpp's sd upscale
uses try_get_spec_for_recipe.

Net: the binary name now lives in exactly one place (the descriptor). Lazy
function-local statics also avoid any static-init-order coupling between the
descriptor and the spec.
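The lazy function-local static can be illustrated with the sketch below. The struct layouts, the `lemon_sketch::demo` namespace, and the "demo-server" binary name are hypothetical; the point is only that the spec is built from the descriptor on first use rather than during static initialization. It needs C++17 for the inline variable.

```cpp
#include <string>

// Hypothetical minimal shapes; the real structs have more members.
struct BackendDescriptor {
    std::string recipe;
    std::string binary;
};

struct BackendSpec {
    std::string recipe;
    std::string binary;
};

namespace lemon_sketch::demo {

// Header-only descriptor: the single place where the binary name is written.
inline const BackendDescriptor descriptor{"demo", "demo-server"};

// Built on first call, so it never depends on cross-translation-unit
// static initialization order; every call returns the same object.
inline const BackendSpec& spec() {
    static const BackendSpec instance{descriptor.recipe, descriptor.binary};
    return instance;
}

}  // namespace lemon_sketch::demo
```

Since C++11, initialization of a function-local static is thread-safe, so concurrent first calls to `spec()` do not race.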

Verified: builds green; system-info install detection unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The recipe was repeated on every support row (6x in llamacpp.h). Introduce
a recipe-free BackendSupport struct; the owning descriptor's recipe is filled
in by recipe_defs() when flattening to RecipeBackendDef.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The preceding generic block already handles backend_versions[recipe] for any
recipe, so the recipe=="llamacpp" branch was unreachable duplicate code.
Removing it also drops a hardcoded backend name from shared code.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
find_flm_server_by_type -> find_coexisting_server_by_type matches on
SlotPolicy::CoexistByType; count_pinned_servers_by_type skips
SlotPolicy::Unmetered instead of recipe=="cloud". router.cpp now holds
zero backend-name string literals; both behaviors are unchanged (flm is the
only CoexistByType backend, cloud the only Unmetered one).
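A sketch of how the two router helpers might key off SlotPolicy instead of recipe strings is shown below. The `LoadedServer` struct, the `Standard` enumerator, the `model_type` labels, and the function signatures are assumptions; only the function names and the CoexistByType/Unmetered policies come from the commit message.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// CoexistByType and Unmetered are named in the PR; Standard is a placeholder.
enum class SlotPolicy { Standard, CoexistByType, Unmetered };

// Hypothetical view of a loaded server as the router might track it.
struct LoadedServer {
    std::string recipe;
    std::string model_type;  // e.g. "llm" or "audio" (illustrative labels)
    SlotPolicy slot_policy;
    bool pinned;
};

// First loaded server of the given type whose backend coexists by type, or nullptr.
inline const LoadedServer* find_coexisting_server_by_type(
    const std::vector<LoadedServer>& servers, const std::string& model_type) {
    for (const LoadedServer& server : servers) {
        if (server.slot_policy == SlotPolicy::CoexistByType &&
            server.model_type == model_type) {
            return &server;
        }
    }
    return nullptr;
}

// Count pinned servers of a type, skipping backends that do not consume a slot.
inline std::size_t count_pinned_servers_by_type(
    const std::vector<LoadedServer>& servers, const std::string& model_type) {
    std::size_t count = 0;
    for (const LoadedServer& server : servers) {
        if (server.slot_policy == SlotPolicy::Unmetered) continue;
        if (server.pinned && server.model_type == model_type) ++count;
    }
    return count;
}
```

Neither function mentions "flm" or "cloud"; a new backend opts into either behavior by setting its descriptor's slot policy.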

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ipe==flm

Add BackendDescriptor::self_manages_downloads (true only for flm) and
ModelManager::backend_self_manages_downloads(). The two load-time
auto-download guards in server.cpp/ollama_api.cpp now consult it instead of
hardcoding recipe != "flm". flm is the only backend with the flag set, so
behavior is identical.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
resolve_and_register_local_model() had a recipe if/else scanning the imported
directory for each backend's primary artifact (.gguf / .bin / genai_config.json
dir). Replace with BackendOps::find_imported_checkpoint(dir): default ""
registers the directory (sd-cpp/kokoro/moonshine); llamacpp reuses
resolve_gguf_path, whisper finds the .bin, ryzenai finds genai_config.json's
dir (and its resolve_checkpoint_path now reuses the same scan). server.cpp
holds no per-recipe import logic. Verified via local_import smoke tests for
llamacpp (ignores mmproj), whisper, and a default backend.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reconcile the self-describing-backends refactor with main's divergence:
- backend_manager.cpp: keep both includes (main's backend_version_policy.h for
  resolve_latest_pin + our backend_descriptor_registry.h); the two version
  concerns are orthogonal.
- model_manager.cpp resolve_model_path: keep the ops-based one-liner
  (backends::ops_for(recipe)->resolve_checkpoint_path) over main's inline
  recipe switchboard.
- Port main's #2300 GGUF resolver improvements into llamacpp resolve_gguf_path:
  factor cases 0-5 into a resolve_gguf_variant lambda, resolve against the
  active refs/main snapshot first, then broaden to all snapshots when the
  active one lacks the variant. Restores test_034.
- Regenerate backend docs/models.js for main's new server_models entries.

Verified: C++ build clean; ctest 4/4 (incl. GgufCapabilities, LatestVersionFallback,
InstallAtomicity); server_endpoints 70/70 (incl. main's #2300 test_034);
server_cli2 only the pre-existing test_020 collection-name parsing failure.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jeremyfowers and others added 8 commits June 22, 2026 19:18
On Windows the merged include chain pulls in the windows.h max() macro into
this TU, turning std::numeric_limits<T>::max() into a syntax error (C2589).
Wrap the calls as (std::numeric_limits<T>::max)() so the macro cannot expand.
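The fix works because a function-like macro expands only when its name is immediately followed by an opening parenthesis. The sketch below reproduces the situation without `<windows.h>` by defining a similar `max` macro by hand; the `largest<T>()` helper is a hypothetical name used only for this illustration.

```cpp
#include <limits>

// Simulate the function-like macro that <windows.h> defines unless NOMINMAX is set.
#define max(a, b) (((a) > (b)) ? (a) : (b))

template <typename T>
constexpr T largest() {
    // Writing std::numeric_limits<T>::max() here would invoke the macro with
    // zero arguments and fail to compile. With the extra parentheses, the
    // token after `max` is ')' rather than '(', so the macro does not expand.
    return (std::numeric_limits<T>::max)();
}

// Remove the simulated macro so later code is unaffected.
#undef max
```

Defining `NOMINMAX` before including `<windows.h>` is the other common way to avoid the clash; the parenthesized call has the advantage of working regardless of include order.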

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
flm models come from flm's model_list.json at runtime (0 entries in
server_models.json), but the descriptor had dynamic_models=false, so
build_cache skipped flm's ops->discover_models() and flm models (e.g.
llama3.2-1b-FLM) never registered -> 404. The build_cache comment already
documents flm as a dynamic-discovery backend alongside cloud; align the
descriptor with that intent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
model_manager's download path hardcoded recipe == "moonshine" to fetch a
variant directory of files. Add BackendOps::select_checkpoint_files (default
nullopt = the GGUF/direct-file defaults) and override it in MoonshineOps. The
download path no longer names a backend.
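The "default nullopt, override where needed" shape of this hook might look like the following sketch. The parameter list, the "files under `<variant>/`" selection rule, and the file names in the usage note are assumptions; the PR only states that the default returns nullopt and that MoonshineOps overrides it to fetch a variant directory.

```cpp
#include <optional>
#include <string>
#include <vector>

struct BackendOps {
    virtual ~BackendOps() = default;
    // nullopt means "no opinion": the caller applies its generic file-selection defaults.
    virtual std::optional<std::vector<std::string>> select_checkpoint_files(
        const std::vector<std::string>& /*repo_files*/,
        const std::string& /*variant*/) const {
        return std::nullopt;
    }
};

struct MoonshineOps : BackendOps {
    // Assumed rule: keep every repo file that lives under "<variant>/".
    std::optional<std::vector<std::string>> select_checkpoint_files(
        const std::vector<std::string>& repo_files,
        const std::string& variant) const override {
        const std::string prefix = variant + "/";
        std::vector<std::string> picked;
        for (const std::string& file : repo_files) {
            if (file.compare(0, prefix.size(), prefix) == 0) picked.push_back(file);
        }
        return picked;
    }
};
```

Given a listing such as `tiny/encoder.onnx`, `tiny/decoder.onnx`, `base/encoder.onnx`, `README.md` and the variant "tiny", the override in this sketch returns the two `tiny/` files, while the base class returns nullopt and leaves the choice to the shared download path.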

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
system_info hardcoded a recipe == "flm" block to classify FLM's
supported-but-unavailable state (.deb/driver manual setup) and emit
troubleshoot links. Add BackendOps::classify_unavailable (default nullopt =
the generic installable/no-fetch path) and implement it in FlmOps. system_info
no longer names a backend in its install-state machine.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…==llamacpp

bench hardcoded recipe == "llamacpp" to send the llamacpp_backend override.
Use the CLI-safe descriptor registry: any recipe with selectable_backend gets
its <config_section>_backend override (llamacpp and vllm today).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… ops

model_manager hardcoded actual_recipe == "llamacpp" to require a :variant on
GGUF checkpoints at registration. Add BackendOps::validate_registration_checkpoint
(default accept) and implement the GGUF rule in LlamaCppOps. Verified: a GGUF
checkpoint without :variant is still rejected; other recipes are unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
DRY pass across the backend folders:
- Add backends::make_server<T>(ctx) for the standard (log_level, model_manager,
  backend_manager) construction; the 6 plain create() bodies now call it instead
  of repeating the three context fields. cloud/ryzenai keep bespoke create().
- Each *_server.h closed and re-opened namespace lemon::backends just to nest the
  per-backend namespace; nest it inline instead (8 headers). ryzenai is left as-is
  (its legacy RyzenAIServer lives in namespace lemon, not lemon::backends).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
main advanced ~15 commits with its own GGUF-reader consolidation
(lemon/gguf_reader.h + gguf_capabilities.h, ModelInfo::gguf) and a cloud
discovery security gate. Reconcile:
- model_manager.cpp (6 conflicts): keep the ops-based forms (populate_metadata,
  validate_checkpoint_file, discover_models, resolve_model_path, download
  file-selection) over main's inline recipe switchboards.
- Consolidate GGUF reading on main's shared lemon::gguf_reader: drop the now
  -redundant reader from backends/llamacpp/llamacpp_gguf.{h,cpp} (~240 lines),
  keeping only the unique resolve_gguf_path; LlamaCppOps::populate_metadata now
  fills ModelInfo::gguf via the shared reader.
- Port main's cloud-discovery allow_insecure_http gate into CloudOps::discover_models.
- Regenerate docs for main's new server_models entries.

Verified: build clean; ctest 5/5 (incl. GgufCapabilities, AutoTune); endpoints
71/71; cli only the pre-existing test_020; docs drift clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread docs/guide/cli.md
Comment on lines -332 to +333
- | `--ctx-size SIZE` | Context size for the model | `4096` |
+ | `--ctx-size SIZE` | Context size for the model | auto |

Member Author


Nice, it's already catching some stale docs.

cc @bitgamma

jeremyfowers self-assigned this Jun 25, 2026
jeremyfowers and others added 4 commits June 25, 2026 16:51
The per-backend spec()/ops() are the name-based adapter the CMake codegen binds
(<stem>::spec/ops), so the functions must exist — but their bodies were
repetitive. Add make_spec<T>(descriptor[, split]) (backend_utils.h, where
BackendSpec is complete) and single_ops<T>() (backend_registry.h, next to
make_server) so the 7 standard spec() and 7 custom ops() collapse to one line
each. ryzenai (install key != recipe) and cloud (no spec) keep bespoke spec();
sd-cpp/vllm keep default_backend_ops(). Pure refactor — registry binding,
71/71 endpoints, and all-backends-registered smoke unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The two backend dev docs added by this work (dev/adding-a-backend.md and the
generated dev/backends-reference.md) were not wired into the Development nav.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…onfig/defaults

Per-recipe config defaults are now declared in each backend descriptor
(takes_args / arg_variants / bin_variants / config_extra -> config_defaults())
instead of hand-maintained blocks in defaults.json. The committed
resources/defaults.json stays fully populated (so it remains the discoverable
reference for factory defaults) but is now generated:

- New GET /internal/config/defaults emits the canonical default config
  (ConfigFile::base_defaults(): global keys + descriptor-derived per-recipe
  sections, host/deployment-independent). Documented alongside /internal/config.
- gen_backend_docs.py -> gen_backend_boilerplate.py, which mirrors that endpoint
  verbatim into resources/defaults.json (whole-file) in addition to the doc
  regions. The existing CI --check now also fails if defaults.json drifts.

config_file keeps reading defaults.json at runtime; base_defaults() re-seeds the
descriptor blocks so the descriptor stays authoritative even if the file lags.
Verified: a fresh config.json reproduces every prior default; endpoints 71/71;
generator --check clean; black clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The single-installable-unit path keyed off recipe != "llamacpp"; switch it to
repo_kind != "gguf", the same server-provided classification the function
already uses for the collection branch. Behavior-equivalent (collections are
handled earlier, so by here repo_kind is gguf or onnx-ryzenai), and it drops the
last backend-name literal from hf_pull.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>