refactor(backends): self-describing WrappedServer backends (#2287) #2320
jeremyfowers wants to merge 40 commits into
Make each inference backend describe itself with a plain-data descriptor plus a server class, and rewrite the scattered `if (recipe == "...")` sites to read a registry built from those descriptors. Adding a backend becomes one LEMON_BACKENDS line plus a descriptor + factory file, with no router, CLI, docs, or support-matrix edits.
- Descriptor types (BackendDescriptor/BackendOption/SlotPolicy) + a CLI-safe data registry and a server-only factory registry, generated from the LEMON_BACKENDS list at CMake configure time.
- All 9 backends carry a descriptor (device, slot policy, options, support matrix, labels, binary) and a create().
- Descriptor-driven: router creation, NPU/slot eviction, device type, recipe options/CLI flags, config-section identity, support matrix, recipe labels, cloud availability.
- /system-info recipes enriched with display_name/selectable_backend/options/support; the app reads recipe display names from it instead of hardcoded TS.
- docs/tools/gen_backend_docs.py generates docs/dev/backends-reference.md from /system-info; a CI step fails on drift. Authoring guide in docs/dev/adding-a-backend.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
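The registry idea in this commit can be illustrated with a minimal sketch. The struct below is a hypothetical, trimmed-down stand-in for `BackendDescriptor`; the field set, the `kRegistry` name, the helper names, and all values are assumptions for illustration, not the PR's actual code.

```cpp
#include <array>
#include <cassert>
#include <string_view>

// Hypothetical, trimmed-down descriptor; the real BackendDescriptor has more fields.
struct BackendDescriptor {
    std::string_view recipe;
    std::string_view display_name;
    std::string_view default_device;
};

// Stand-in for the generated registry. Entries and values are illustrative only.
inline constexpr std::array<BackendDescriptor, 3> kRegistry{{
    {"llamacpp", "Llama.cpp", "gpu"},
    {"flm", "FastFlowLM", "npu"},
    {"cloud", "Cloud", "none"},
}};

// A single lookup replaces each `if (recipe == "...")` chain.
constexpr const BackendDescriptor* find_descriptor(std::string_view recipe) {
    for (const auto& d : kRegistry) {
        if (d.recipe == recipe) return &d;
    }
    return nullptr;
}

// Call sites read descriptor data instead of naming backends.
constexpr std::string_view device_for(std::string_view recipe) {
    const BackendDescriptor* d = find_descriptor(recipe);
    return d != nullptr ? d->default_device : std::string_view{"none"};
}

// Usage sketch.
inline void registry_example() {
    assert(device_for("flm") == "npu");
    assert(find_descriptor("not-a-backend") == nullptr);
}
```

With this shape, adding a backend means adding one registry entry; no call site changes.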
CI status: All cross-platform builds pass (MSVC, AppleClang, GCC, Arch, openSUSE, Fedora rpm), validating the descriptor aggregate-init, CMake […] The single red, Test CLI/Endpoints (windows-latest) → […]
This PR touches backend construction, not inference, […]
Restructure the self-describing backends to the layout the issue #2287 plan specified (one folder per backend) instead of the flat file layout I used before. This also folds the earlier _descriptor/_factory split into the spec's cleaner shape: the descriptor is a header-only `inline const` and create() lives with the server class.
Each backend now lives in its own folder, in namespace lemon::backends::<stem>:
  include/lemon/backends/<stem>/<stem>.h          inline const descriptor (CLI-safe data)
  include/lemon/backends/<stem>/<stem>_server.h   WrappedServer subclass + create() decl
  server/backends/<stem>/<stem>_server.cpp        implementation + create() def
Shared registry/util files stay at the top of backends/. The CMake foreach over LEMON_BACKENDS compiles each <stem>/<stem>_server.cpp and generates the registry headers from the folder paths. Removes the per-backend *_descriptor.{h,cpp} and *_factory.{h,cpp} files. Behavior is unchanged (same descriptors, same create()).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make the existing curated docs generate from the backend descriptors instead of
just shipping a separate reference file — closing appendix rows 14 and 22.
- Expand the descriptor with the editorial fields the curated docs need:
`modality`, `experimental`, `web_display_name`, and a per-support-row
`device_summary` (RecipeBackendDef). These keep the descriptor the single
source of truth.
- /system-info exposes them plus a registry `order` index and `slot_policy`.
- gen_backend_docs.py now targets multiple docs and renders:
* README.md "Supported Configurations" HTML matrix (grouped by modality,
merged rows, rowspans, experimental tag) — wrapped in GENERATED markers;
* docs/guide/configuration/multi-model.md NPU-exclusivity list.
The backend-docs-drift CI job's --check now covers all three docs.
The generated README matrix is also more complete than the hand-written one
(it now includes whispercpp rocm/metal, kokoro metal, sd-cpp metal). Footnotes
and prose outside the markers are preserved.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wrap cli.md's "Recipe-Specific Options" tables in GENERATED markers and render them from the descriptor options. This also fixes pre-existing drift: the section documented `--steps`/`--cfg-scale`/`--width`/`--height` flags that the CLI no longer registers, and omitted the moonshine and vllm recipes. Now covered by the backend-docs-drift CI check. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add inline-marker support to the generator and wrap the `--recipe` "Common values" list in custom-models.md so it renders from the descriptor recipe set (plus collection.omni). Now covered by the backend-docs-drift CI check. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Close the last two cleanly-derivable doc touchpoints (appendix rows 16 and 21).
- configuration/README.md "Example config.json": generated from a fresh lemond's GET /internal/config (the real canonical config). This also fixes pre-existing drift: the hand-written block had `config_version: 1` (now 2), `prefer_system: false` (now true), a stray `device` key, and an invalid trailing comma. `port` is normalized to the documented default 13305.
- docs/assets/models.js RECIPE_PRIORITY + RECIPE_DISPLAY_NAMES: generated from descriptors. A new `web_priority` editorial field preserves the curated website ordering (so the order is descriptor-sourced, not a silent reorder); legacy `oga-*` recipes are dropped as agreed. Adds the correct `vllm` display name.
The generator now drives 7 docs and supports both `<!-- -->` (Markdown) and `/* */` (JS) GENERATED markers. backend-docs-drift --check covers all of them.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ive spec; drop device map)
Two agreed plan touchpoints were left incomplete; this finishes them.
Row 4: try_get_spec_for_recipe was still a hand-written 8-branch if-ladder in backend_utils.cpp, which also forced it to #include all 8 server headers. Each backend now exposes a uniform `spec()` accessor (alongside create()); the generated factory registry binds it, and `backends::spec_for(recipe)` / try_get_spec_for_recipe iterate the registry. backend_utils.cpp now includes ZERO server headers. Also reroute the two leaking `Server::SPEC` references (model_manager find_flm_binary) through the registry.
Row 5: get_device_type_from_recipe still carried the full recipe->device map, redundant with BackendDescriptor::default_device. Reduced to a DEVICE_NONE fallback for non-descriptor recipes (collections/unknown); the descriptor is the single source via ModelManager::device_type_for_recipe.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Introduce a stateless per-backend behavior interface for model management that happens WITHOUT a running subprocess (checkpoint-path resolution, download, dynamic discovery, per-model metadata, version detection, availability). It is the home for the recipe switchboards currently scattered through model_manager and system_info.
- BackendOps base class (lemon/backends/backend_ops.h): shared default behavior; backends override only the policy points they need (inherit shared logic, don't copy it). Methods are added incrementally as switchboards migrate; each has a default so adding one never forces edits to backends that don't override it.
- Each backend folder exposes a uniform ops() singleton (alongside create()/spec()), bound into BackendRegistration; backends::ops_for(recipe) returns it.
- Purely additive: every backend uses the default base ops for now, so there is no behavior change yet. Migrations follow in subsequent commits.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
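A rough sketch of the "defaults in the base, overrides per backend" shape described above. The two hook names come from later commits in this PR, but the signatures, the lookup table, and the fallback for unknown recipes are assumptions made for this illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical reduction of BackendOps: stateless hooks, each with a default.
struct BackendOps {
    virtual ~BackendOps() = default;
    // Default: the backend counts as installed iff its binary was found.
    virtual bool check_install(bool binary_found) const { return binary_found; }
    // Default: a download does not require resetting the model cache.
    virtual bool invalidates_cache_after_download() const { return false; }
};

// A backend overrides only the policy point it cares about.
struct FlmOps : BackendOps {
    bool invalidates_cache_after_download() const override { return true; }
};

inline const BackendOps* default_backend_ops() {
    static const BackendOps ops{};
    return &ops;
}

// Assumption: recipes without a custom ops object fall back to the base ops.
inline const BackendOps* ops_for(const std::string& recipe) {
    static const FlmOps flm{};
    static const std::unordered_map<std::string, const BackendOps*> table{
        {"flm", &flm},
    };
    const auto it = table.find(recipe);
    return it != table.end() ? it->second : default_backend_ops();
}

// Usage sketch: the caller never names a backend.
inline void ops_example() {
    assert(ops_for("flm")->invalidates_cache_after_download());
    assert(!ops_for("llamacpp")->invalidates_cache_after_download());
}
```

Because every hook has a default, adding a new hook to the base class does not force edits to backends that do not override it.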
…readers into folders
Replace the populate_model_metadata recipe switchboard with
ops_for(recipe)->populate_metadata(). The backend-specific readers move into
their folders:
- GGUF metadata reader (read_gguf_metadata + byte parsers) -> backends/llamacpp/
llamacpp_gguf.{h,cpp}; LlamaCppOps::populate_metadata reads arch + capability
labels there.
- FLM model-file helpers (config.json ctx window, model-dir discovery) ->
backends/fastflowlm/fastflowlm_models.{h,cpp}; FlmOps::populate_metadata uses it.
model_manager no longer knows how either backend stores or introspects models.
CMake now globs each backend folder's *.cpp (CONFIGURE_DEPENDS) so backend-private
helper files need no CMake edit; the backend LIST stays explicit.
Verified: GGUF context windows still populate (131072/128000/32768 for sample
models) and test_gguf_capabilities passes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…llamacpp||sd-cpp)&&rocm)
Add a `rocm_channels` descriptor field (llamacpp {"stable","nightly"}, sd-cpp
{"stable"}) and a recipe_has_rocm_channels() registry helper. Replace the
hardcoded `(recipe=="llamacpp"||recipe=="sd-cpp") && rocm` predicate — copied
across backend_utils.cpp (3×), backend_manager.cpp (2×), and system_info.cpp —
with the descriptor check. rocm_channel_for_recipe() now clamps a requested
channel to one the backend publishes (so sd-cpp's missing "nightly" -> "stable"
falls out of the data instead of a per-recipe special case).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
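The clamping behavior can be sketched as follows. The function name and signature are hypothetical (the PR's `rocm_channel_for_recipe()` takes a recipe and reads the descriptor), and falling back to the first published channel is an assumption that happens to reproduce the sd-cpp case.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: clamp a requested ROCm channel to one the backend publishes.
inline std::string clamp_rocm_channel(const std::vector<std::string>& published,
                                      const std::string& requested) {
    if (published.empty()) return "";  // backend publishes no ROCm channels
    if (std::find(published.begin(), published.end(), requested) != published.end()) {
        return requested;
    }
    return published.front();  // assumption: the first entry is the fallback
}

// Usage sketch with the channel lists quoted in the commit message.
inline void rocm_channel_example() {
    assert(clamp_rocm_channel({"stable", "nightly"}, "nightly") == "nightly");  // llamacpp
    assert(clamp_rocm_channel({"stable"}, "nightly") == "stable");              // sd-cpp
}
```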
…rst leak)
Replace the ~290-line recipe switchboard in ModelManager::resolve_model_path
with ops_for(recipe)->resolve_checkpoint_path(). The model manager now only does
the generic prefix (collections, local_path/local_upload, HF cache-dir
computation) and hands off to the backend.
- New BackendOps::resolve_checkpoint_path; base = the shared HF behavior
(active-snapshot variant/aux resolution, main-repo fallback, directory
fallback). Backends override only their artifact layout:
* llamacpp -> GGUF resolver (sharding/folder/quant-token), moved into
backends/llamacpp/llamacpp_gguf (resolve_gguf_path).
* ryzenai -> genai_config.json directory; kokoro -> index.json;
whispercpp -> first .bin; cloud -> ""; flm -> checkpoint passthrough.
- New shared backends/hf_cache_util (exists/dir_options/active_snapshot_path/
repo_id_to_cache_dir_name) so ops reuse the same HF-cache mechanics.
model_manager.cpp -362 lines; resolve_model_path 365 -> 34. Verified all recipes
still resolve as downloaded (llamacpp variants, whisper .bin, kokoro index,
sd-cpp, ryzenai, flm) via /models.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…FLM cluster → folder
Dynamic discovery, download status, and downloading now flow through BackendOps instead of recipe switchboards in model_manager:
- discover_models: build_cache loops descriptors with dynamic_models=true and merges ops->discover_models(). FLM (`flm list`) and cloud (per-provider) both implement it, so the two bespoke discovery blocks collapse to one generic loop.
- is_downloaded: base = shared HF completeness (ModelManager::checkpoints_complete); CloudOps → true; FlmOps → installed-set membership. Replaces the flm_set/cloud/else branches in build_cache and add_model_to_cache.
- validate_checkpoint_file: LlamaCppOps does the GGUF-magic check (was an inline llamacpp branch in are_required_checkpoints_complete).
- download_model: base = shared HF engine (download_from_huggingface_engine); FlmOps → flm pull; CloudOps → no-op. download_registered_model just dispatches. invalidates_cache_after_download() replaces the recipe=="flm" cache-reset.
The whole FLM cluster (find_flm_binary, flm_installed_checkpoints, flm_discover_models, flm_download) moves into backends/fastflowlm/fastflowlm_models. model_manager keeps only the generic HF engine.
Verified: server_endpoints 69 pass; download status correct for every recipe.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s hook
get_recipe_version now reads version.txt generically and lets the backend ops override, instead of branching on recipe. The per-backend version commands move into their folders:
- system llama-server version (`llama-server --version` + regex) -> backends/llamacpp; LlamaCppOps::resolve_version returns it for the "system" backend.
- flm version (`flm version --json`) -> backends/fastflowlm (flm_version()); FlmOps::resolve_version returns it when no version.txt is present.
Removes SystemInfo::get_system_llamacpp_version / get_flm_version and the llamacpp-system / flm branches from system_info.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
config_section duplicated the recipe string in 8 descriptors; it defaults to the
recipe via effective_config_section(), so set those to "". Only sd-cpp ("sdcpp")
and ryzenai-llm ("ryzenai") keep an explicit section because theirs genuinely
differ from the recipe.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
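The defaulting rule is small enough to show directly. This is a sketch under assumptions: the two-field struct and the free-function signature are hypothetical, and only the "empty means recipe" rule and the sd-cpp value are taken from the commit message.

```cpp
#include <cassert>
#include <string>

// Hypothetical two-field view of a descriptor.
struct BackendDescriptor {
    std::string recipe;
    std::string config_section;  // "" means "same as the recipe"
};

// An empty config_section falls back to the recipe name.
inline std::string effective_config_section(const BackendDescriptor& d) {
    return d.config_section.empty() ? d.recipe : d.config_section;
}

// Usage sketch.
inline void config_section_example() {
    assert(effective_config_section({"llamacpp", ""}) == "llamacpp");
    assert(effective_config_section({"sd-cpp", "sdcpp"}) == "sdcpp");
}
```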
…_metrics descriptor flag prometheus_metrics.cpp hardcoded `recipe == "llamacpp"` to decide whether to scrape a backend subprocess's /metrics. Replace with a descriptor flag (exposes_prometheus_metrics; llamacpp = true) so a new backend that exposes Prometheus metrics opts in via its descriptor, not by editing the metrics code. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
These backend-specific per-model fields no longer sit on the shared ModelInfo
struct: llamacpp reads info.extra<bool>("hf_load", false) and moonshine reads
info.extra<int>("moonshine_arch", -1). Removed the typed fields, their explicit
parse sites, and their kKnownKeys entries; added parse_extras() to the two
ModelInfo-building paths that lacked it (add_model_to_cache, get_model_info_
unfiltered) so extras populate everywhere a model is built from JSON.
Verified: llamacpp models still resolve/download (hf_load path intact).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
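A minimal sketch of a typed extras accessor with a fallback, in the spirit of `info.extra<T>(key, default)`. The PR describes extras as JSON values; `std::variant` is swapped in here only to keep the example dependency-free, and returning the fallback on a type mismatch is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>

// Stand-in for a JSON value (assumption: the real map stores JSON).
using ExtraValue = std::variant<bool, int, std::string>;

struct ModelInfo {
    std::string recipe;
    std::map<std::string, ExtraValue> extras;  // filled from unknown per-model keys

    // Typed read with a fallback for missing keys.
    template <typename T>
    T extra(const std::string& key, T fallback) const {
        const auto it = extras.find(key);
        if (it == extras.end()) return fallback;
        if (const T* value = std::get_if<T>(&it->second)) return *value;
        return fallback;  // assumption: a type mismatch also yields the fallback
    }
};

// Usage sketch: each backend reads only the keys it knows about.
inline void extras_example() {
    const ModelInfo info{"llamacpp", {{"hf_load", true}}};
    assert(info.extra<bool>("hf_load", false));
    assert(info.extra<int>("moonshine_arch", -1) == -1);
}
```

The shared struct stays free of backend-specific fields; a new backend adds a key in its model entries and reads it through the accessor.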
Replace the hardcoded (sd-cpp||llamacpp||vllm)&&rocm recipe-list in is_recipe_installed and build_recipes_info with a rocm_requires_cwsr_fix descriptor flag (set on those three backends). The kernel CWSR detection (needs_gfx1151_cwsr_fix) stays in system_info as generic hardware detection; only "which backends' rocm build needs it" is now descriptor data. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ps hook
is_recipe_installed now finds the managed binary generically and asks the backend's ops whether it's actually installed, instead of hardcoding the llamacpp-system HIP check and the flm PATH fallback:
- check_install(backend, binary_found) ops hook; base = installed iff binary found. LlamaCppOps adds the ggml HIP-plugin requirement for the "system" build on AMD GPUs; FlmOps treats a PATH-installed flm as present.
- is_ggml_hip_plugin_available moves into backends/llamacpp; find_flm_executable and run_flm_validate move into backends/fastflowlm. Removed from path_utils (+ their orphaned decls/comments).
system_info no longer carries llamacpp/flm-specific availability knowledge.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… vs AtLeast) The update-required check special-cased recipe=="flm" to allow an installed version newer than the pin. Replace with a version_policy descriptor field (Exact default; flm = AtLeast for its system-managed package). system_info no longer names flm in the version-comparison logic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
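The two policies can be sketched as below. The enum values come from the commit message; the parsing helper, the `update_required` name, and the assumption that versions are dotted numbers with the same number of components are illustrative only.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class VersionPolicy { Exact, AtLeast };

// Collect the numeric components of a version string, e.g. "v0.9.28" -> {0, 9, 28}.
inline std::vector<int> parse_version(const std::string& text) {
    std::vector<int> parts;
    int current = 0;
    bool in_number = false;
    for (const char c : text) {
        if (c >= '0' && c <= '9') {
            current = current * 10 + (c - '0');
            in_number = true;
        } else if (in_number) {
            parts.push_back(current);
            current = 0;
            in_number = false;
        }
    }
    if (in_number) parts.push_back(current);
    return parts;
}

// Hypothetical check: does the installed version need to change to satisfy the pin?
inline bool update_required(VersionPolicy policy, const std::string& installed,
                            const std::string& pinned) {
    const std::vector<int> have = parse_version(installed);
    const std::vector<int> want = parse_version(pinned);
    if (policy == VersionPolicy::Exact) return have != want;
    return have < want;  // AtLeast: std::vector compares lexicographically
}

// Usage sketch: a newer install is acceptable only under AtLeast.
inline void version_policy_example() {
    assert(update_required(VersionPolicy::Exact, "0.9.30", "0.9.28"));
    assert(!update_required(VersionPolicy::AtLeast, "0.9.30", "0.9.28"));
}
```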
The `flm remove` subprocess orchestration moves out of ModelManager::delete_model into backends/fastflowlm (flm_remove). model_manager keeps only the generic HF-cache deletion path; the flm branch is now a thin call into the backend. Verified: server_endpoints 69 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…cipe blocks RuntimeConfig::recipe_options() had a hardcoded nested→flat translation block per recipe (llamacpp/whispercpp/moonshine/sdcpp/vllm). Replace with a single loop over the descriptors: each option's config.json key is derived from its name role (*_backend → "backend", *_args → variant "<backend>_args"/"args", *_device → "device", else the option name verbatim for sd-cpp's steps/cfg_scale/ width/height). Adding a backend no longer requires editing this function. Verified: server_endpoints 69 pass (config/params translation unchanged). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
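One possible reading of the name-role rule, as a sketch. The function name is hypothetical, and the interpretation of the `*_args` case (a per-backend `<backend>_args` key when a backend variant is selected, plain `args` otherwise) is an assumption; the backend name `vulkan` in the usage is likewise illustrative.

```cpp
#include <cassert>
#include <string>

inline bool ends_with(const std::string& text, const std::string& suffix) {
    return text.size() >= suffix.size() &&
           text.compare(text.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical derivation of a config.json key from an option name.
// selected_backend is empty when the recipe has no backend variants.
inline std::string config_key_for_option(const std::string& option,
                                         const std::string& selected_backend) {
    if (ends_with(option, "_backend")) return "backend";
    if (ends_with(option, "_device")) return "device";
    if (ends_with(option, "_args")) {
        return selected_backend.empty() ? "args" : selected_backend + "_args";
    }
    return option;  // e.g. sd-cpp's steps/cfg_scale/width/height stay verbatim
}

// Usage sketch.
inline void config_key_example() {
    assert(config_key_for_option("llamacpp_backend", "") == "backend");
    assert(config_key_for_option("llamacpp_args", "vulkan") == "vulkan_args");
    assert(config_key_for_option("steps", "") == "steps");
}
```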
… across descriptor↔server.h) The backend binary name (and recipe) were duplicated between the descriptor (<stem>.h) and the BackendSpec literal (<stem>_server.h) — the cross-file redundancy. Remove the static SPEC member; each backend's spec() now builds the BackendSpec lazily from descriptor.binary (+ descriptor.recipe, or the explicit "ryzenai-server" install id where it differs) plus the class's get_install_params and split flag. In-class binary lookups go through spec(); server.cpp's sd upscale uses try_get_spec_for_recipe. Net: the binary name now lives in exactly one place (the descriptor). Lazy function-local statics also avoid any static-init-order coupling between the descriptor and the spec. Verified: builds green; system-info install detection unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The recipe was repeated on every support row (6x in llamacpp.h). Introduce a recipe-free BackendSupport struct; the owning descriptor's recipe is filled in by recipe_defs() when flattening to RecipeBackendDef. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The preceding generic block already handles backend_versions[recipe] for any recipe, so the recipe=="llamacpp" branch was unreachable duplicate code. Removing it also drops a hardcoded backend name from shared code. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
find_flm_server_by_type -> find_coexisting_server_by_type matches on SlotPolicy::CoexistByType; count_pinned_servers_by_type skips SlotPolicy::Unmetered instead of recipe=="cloud". router.cpp now holds zero backend-name string literals; both behaviors are unchanged (flm is the only CoexistByType backend, cloud the only Unmetered one). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
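A sketch of how the two router helpers might key off `SlotPolicy` instead of recipe names. The `LoadedServer` struct, the `Exclusive` enumerator, and the type strings are hypothetical; only `CoexistByType`, `Unmetered`, and the two function names appear in the commit message.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Exclusive is a hypothetical name for "every other policy".
enum class SlotPolicy { Exclusive, CoexistByType, Unmetered };

// Hypothetical view of a loaded backend server as the router sees it.
struct LoadedServer {
    std::string name;
    std::string type;
    SlotPolicy policy;
    bool pinned;
};

// Find a server of the given type whose backend may coexist with others.
inline const LoadedServer* find_coexisting_server_by_type(
    const std::vector<LoadedServer>& servers, const std::string& type) {
    for (const auto& s : servers) {
        if (s.policy == SlotPolicy::CoexistByType && s.type == type) return &s;
    }
    return nullptr;
}

// Count pinned servers of a type, skipping backends that do not consume a slot.
inline int count_pinned_servers_by_type(const std::vector<LoadedServer>& servers,
                                        const std::string& type) {
    int count = 0;
    for (const auto& s : servers) {
        if (s.policy == SlotPolicy::Unmetered) continue;
        if (s.pinned && s.type == type) ++count;
    }
    return count;
}

// Usage sketch.
inline void slot_policy_example() {
    const std::vector<LoadedServer> servers{
        {"a", "llm", SlotPolicy::Exclusive, true},
        {"b", "llm", SlotPolicy::Unmetered, true},
    };
    assert(count_pinned_servers_by_type(servers, "llm") == 1);
    assert(find_coexisting_server_by_type(servers, "llm") == nullptr);
}
```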
…ipe==flm Add BackendDescriptor::self_manages_downloads (true only for flm) and ModelManager::backend_self_manages_downloads(). The two load-time auto-download guards in server.cpp/ollama_api.cpp now consult it instead of hardcoding recipe != "flm". flm is the only backend with the flag set, so behavior is identical. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
resolve_and_register_local_model() had a recipe if/else scanning the imported directory for each backend's primary artifact (.gguf / .bin / genai_config.json dir). Replace with BackendOps::find_imported_checkpoint(dir): default "" registers the directory (sd-cpp/kokoro/moonshine); llamacpp reuses resolve_gguf_path, whisper finds the .bin, ryzenai finds genai_config.json's dir (and its resolve_checkpoint_path now reuses the same scan). server.cpp holds no per-recipe import logic. Verified via local_import smoke tests for llamacpp (ignores mmproj), whisper, and a default backend. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reconcile the self-describing-backends refactor with main's divergence:
- backend_manager.cpp: keep both includes (main's backend_version_policy.h for resolve_latest_pin + our backend_descriptor_registry.h); the two version concerns are orthogonal.
- model_manager.cpp resolve_model_path: keep the ops-based one-liner (backends::ops_for(recipe)->resolve_checkpoint_path) over main's inline recipe switchboard.
- Port main's #2300 GGUF resolver improvements into llamacpp resolve_gguf_path: factor cases 0-5 into a resolve_gguf_variant lambda, resolve against the active refs/main snapshot first, then broaden to all snapshots when the active one lacks the variant. Restores test_034.
- Regenerate backend docs/models.js for main's new server_models entries.
Verified: C++ build clean; ctest 4/4 (incl. GgufCapabilities, LatestVersionFallback, InstallAtomicity); server_endpoints 70/70 (incl. main's #2300 test_034); server_cli2 only the pre-existing test_020 collection-name parsing failure.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On Windows the merged include chain pulls in the windows.h max() macro into this TU, turning std::numeric_limits<T>::max() into a syntax error (C2589). Wrap the calls as (std::numeric_limits<T>::max)() so the macro cannot expand. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
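The parenthesization trick can be reproduced on any platform by simulating the macro. In this sketch the `#define` stands in for what `windows.h` provides when `NOMINMAX` is not set, and `largest_value` is a hypothetical wrapper name.

```cpp
#include <cassert>
#include <limits>

// Stand-in for the function-like macro that windows.h defines without NOMINMAX.
#define max(a, b) (((a) > (b)) ? (a) : (b))

template <typename T>
constexpr T largest_value() {
    // `max` is followed by ')' rather than '(', so the preprocessor does not
    // treat it as a macro invocation. Writing std::numeric_limits<T>::max()
    // here would fail to compile while the macro is active.
    return (std::numeric_limits<T>::max)();
}

#undef max  // keep the simulated macro from leaking past this sketch

// Usage sketch.
inline void max_macro_example() {
    assert(largest_value<unsigned char>() == 255);
}
```

Defining `NOMINMAX` before including `windows.h` is the other common remedy; the parenthesized call has the advantage of being local to the affected translation unit.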
flm models come from flm's model_list.json at runtime (0 entries in server_models.json), but the descriptor had dynamic_models=false, so build_cache skipped flm's ops->discover_models() and flm models (e.g. llama3.2-1b-FLM) never registered -> 404. The build_cache comment already documents flm as a dynamic-discovery backend alongside cloud; align the descriptor with that intent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
model_manager's download path hardcoded recipe == "moonshine" to fetch a variant directory of files. Add BackendOps::select_checkpoint_files (default nullopt = the GGUF/direct-file defaults) and override it in MoonshineOps. The download path no longer names a backend. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
system_info hardcoded a recipe == "flm" block to classify FLM's supported-but-unavailable state (.deb/driver manual setup) and emit troubleshoot links. Add BackendOps::classify_unavailable (default nullopt = the generic installable/no-fetch path) and implement it in FlmOps. system_info no longer names a backend in its install-state machine. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…==llamacpp bench hardcoded recipe == "llamacpp" to send the llamacpp_backend override. Use the CLI-safe descriptor registry: any recipe with selectable_backend gets its <config_section>_backend override (llamacpp and vllm today). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… ops model_manager hardcoded actual_recipe == "llamacpp" to require a :variant on GGUF checkpoints at registration. Add BackendOps::validate_registration_checkpoint (default accept) and implement the GGUF rule in LlamaCppOps. Verified: a GGUF checkpoint without :variant is still rejected; other recipes are unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
DRY pass across the backend folders:
- Add backends::make_server<T>(ctx) for the standard (log_level, model_manager, backend_manager) construction; the 6 plain create() bodies now call it instead of repeating the three context fields. cloud/ryzenai keep bespoke create().
- Each *_server.h closed and re-opened namespace lemon::backends just to nest the per-backend namespace; nest it inline instead (8 headers). ryzenai is left as-is (its legacy RyzenAIServer lives in namespace lemon, not lemon::backends).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
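A sketch of what a `make_server<T>(ctx)` helper could look like. All types here are hypothetical stubs (including `BackendContext` and `KokoroServer`); only the helper name and the three forwarded fields come from the commit message.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stubs standing in for the real classes.
struct ModelManager {};
struct BackendManager {};

struct BackendContext {
    std::string log_level;
    ModelManager* model_manager;
    BackendManager* backend_manager;
};

struct WrappedServer {
    WrappedServer(std::string level, ModelManager* mm, BackendManager* bm)
        : log_level(std::move(level)), model_manager(mm), backend_manager(bm) {}
    virtual ~WrappedServer() = default;

    std::string log_level;
    ModelManager* model_manager;
    BackendManager* backend_manager;
};

struct KokoroServer : WrappedServer {
    using WrappedServer::WrappedServer;  // standard three-argument construction
};

// Forward the three standard context fields to T's constructor.
template <typename T>
std::unique_ptr<WrappedServer> make_server(const BackendContext& ctx) {
    return std::make_unique<T>(ctx.log_level, ctx.model_manager, ctx.backend_manager);
}

// A plain backend's create() collapses to one line.
inline std::unique_ptr<WrappedServer> create(const BackendContext& ctx) {
    return make_server<KokoroServer>(ctx);
}

// Usage sketch.
inline void make_server_example() {
    ModelManager mm;
    BackendManager bm;
    const auto server = create(BackendContext{"info", &mm, &bm});
    assert(server != nullptr);
    assert(server->log_level == "info");
}
```

Backends whose constructors need more than these three fields would keep a hand-written `create()`, which matches the cloud/ryzenai exception noted above.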
main advanced ~15 commits with its own GGUF-reader consolidation
(lemon/gguf_reader.h + gguf_capabilities.h, ModelInfo::gguf) and a cloud
discovery security gate. Reconcile:
- model_manager.cpp (6 conflicts): keep the ops-based forms (populate_metadata,
validate_checkpoint_file, discover_models, resolve_model_path, download
file-selection) over main's inline recipe switchboards.
- Consolidate GGUF reading on main's shared lemon::gguf_reader: drop the now
-redundant reader from backends/llamacpp/llamacpp_gguf.{h,cpp} (~240 lines),
keeping only the unique resolve_gguf_path; LlamaCppOps::populate_metadata now
fills ModelInfo::gguf via the shared reader.
- Port main's cloud-discovery allow_insecure_http gate into CloudOps::discover_models.
- Regenerate docs for main's new server_models entries.
Verified: build clean; ctest 5/5 (incl. GgufCapabilities, AutoTune); endpoints
71/71; cli only the pre-existing test_020; docs drift clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| | `--ctx-size SIZE` | Context size for the model | `4096` | | ||
| | `--ctx-size SIZE` | Context size for the model | auto | |
Nice, it's already catching some stale docs.
cc @bitgamma
The per-backend spec()/ops() are the name-based adapter the CMake codegen binds (<stem>::spec/ops), so the functions must exist — but their bodies were repetitive. Add make_spec<T>(descriptor[, split]) (backend_utils.h, where BackendSpec is complete) and single_ops<T>() (backend_registry.h, next to make_server) so the 7 standard spec() and 7 custom ops() collapse to one line each. ryzenai (install key != recipe) and cloud (no spec) keep bespoke spec(); sd-cpp/vllm keep default_backend_ops(). Pure refactor — registry binding, 71/71 endpoints, and all-backends-registered smoke unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The two backend dev docs added by this work (dev/adding-a-backend.md and the generated dev/backends-reference.md) were not wired into the Development nav. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…onfig/defaults
Per-recipe config defaults are now declared in each backend descriptor (takes_args / arg_variants / bin_variants / config_extra -> config_defaults()) instead of hand-maintained blocks in defaults.json. The committed resources/defaults.json stays fully populated (so it remains the discoverable reference for factory defaults) but is now generated:
- New GET /internal/config/defaults emits the canonical default config (ConfigFile::base_defaults(): global keys + descriptor-derived per-recipe sections, host/deployment-independent). Documented alongside /internal/config.
- gen_backend_docs.py -> gen_backend_boilerplate.py, which mirrors that endpoint verbatim into resources/defaults.json (whole-file) in addition to the doc regions. The existing CI --check now also fails if defaults.json drifts.
config_file keeps reading defaults.json at runtime; base_defaults() re-seeds the descriptor blocks so the descriptor stays authoritative even if the file lags.
Verified: a fresh config.json reproduces every prior default; endpoints 71/71; generator --check clean; black clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The single-installable-unit path keyed off recipe != "llamacpp"; switch it to repo_kind != "gguf", the same server-provided classification the function already uses for the collection branch. Behavior-equivalent (collections are handled earlier, so by here repo_kind is gguf or onnx-ryzenai), and it drops the last backend-name literal from hf_pull. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements the plan in #2287: each inference backend describes itself with a plain-data descriptor + a server class + a stateless behavior object, and every scattered `if (recipe == "...")` site is rewritten to read a registry built from those descriptors. Backend-specific logic no longer leaks into the router, model manager, system-info, CLI, or docs.

Layout: a backend is a folder

Adding a backend = one `LEMON_BACKENDS` line in `CMakeLists.txt` + that folder + a `backend_versions.json` pin + `server_models.json` entries. No router, CLI, doc, or support-matrix edits; those are all derived. CMake globs each backend folder (`CONFIGURE_DEPENDS`), so backend-private helper files need no build edit.

What changed

- Descriptor (`backend_descriptor.h`): plain data describing what a backend is: recipe, display name, binary, config section, default device, `SlotPolicy`, `selectable_backend`, `uses_ctx_size`, `dynamic_models`, declarative `options[]`, OS/GPU `support[]`, default labels, required checkpoints, plus editorial/policy fields (`modality`, `experimental`, `web_priority`, `rocm_channels`, `version_policy`, `exposes_prometheus_metrics`, `rocm_requires_cwsr_fix`, `self_manages_downloads`).
- Two-tier registry, generated from `LEMON_BACKENDS` at CMake configure time: a CLI-safe data registry (descriptors only; links into both `lemonade` and `lemond`) and a server-only factory registry (binds each descriptor to its class's `create()`, `spec()`, `ops()`). This split lets the CLI read recipe options/flags from descriptors without linking server classes.
- `BackendOps`, stateless per-backend behavior (`backend_ops.h`): the model-management logic that happens without a running subprocess. The base class is the shared Hugging Face behavior; each backend overrides only its policy points, so shared download/cache logic is inherited, not copied. Methods: `populate_metadata`, `resolve_checkpoint_path`, `find_imported_checkpoint`, `validate_registration_checkpoint`, `select_checkpoint_files`, `discover_models`, `is_downloaded`, `validate_checkpoint_file`, `download_model`, `invalidates_cache_after_download`, `resolve_version`, `check_install`, `classify_unavailable`. This is what let `model_manager.cpp` and `system_info.cpp` shed their per-recipe switchboards (`resolve_model_path` went from a ~290-line `if/else` to one `ops_for(recipe)->…` call).
- Descriptor/ops-driven sites: router creation, NPU/slot eviction & cloud LRU exemption (`SlotPolicy`, no recipe literals left in `router.cpp`), device type, recipe options / CLI flags / defaults, config-section identity, ROCm channels (`recipe_has_rocm_channels`), the support matrix (`RECIPE_DEFS` deleted from `system_info.cpp`), recipe→label inference, FLM dynamic discovery, the FLM install-state machine, cloud availability + discovery, and the install-state UI hints.
- Registration helpers: `make_server<T>` / `make_spec<T>` / `single_ops<T>` keep the per-backend `create()` / `spec()` / `ops()` one-liners DRY (irregular backends such as cloud, ryzenai, and vllm keep bespoke bodies).
- `/system-info` `recipes` entries enriched with `display_name` / `selectable_backend` / `uses_ctx_size` / `options` / `support`. The desktop app reads recipe display names from `/system-info` instead of hardcoded TypeScript.
- Docs generation: `docs/tools/gen_backend_docs.py` boots `lemond`, reads `/system-info` + `server_models.json`, and rewrites marker-delimited regions of six docs (`README.md` support matrix, `guide/cli.md`, `guide/configuration/README.md`, `guide/configuration/multi-model.md`, `custom-models.md`, `dev/backends-reference.md`) plus `assets/models.js`. A CI job (`backend-docs-drift`) fails on drift. Authoring guide added at `dev/adding-a-backend.md` (both wired into the mkdocs nav).
- `ModelInfo::extras`: generic `map<string, json>` populated from unknown `server_models.json` keys, so a new backend adds per-model fields without editing shared structs.

Verification

Local: `lemond` + `lemonade` CLI + web-app build clean; C++ unit tests ctest 5/5 (incl. GgufCapabilities, AutoTune, LatestVersionFallback, InstallAtomicity); server_endpoints 71/71; `/system-info` carries the enriched fields; docs `--check` clean; a registry smoke confirms all backends register and route. Cross-platform + clean-environment validation via CI.

One pre-existing local failure unrelated to this change (reproduced on `main`): `server_cli2` `test_020_list`, where a built-in collection name with a space ("Lite Collection") breaks the test's whitespace-based table parser.

Notes for reviewers

- `recipeOptionsConfig.ts` (the TypeScript-typed per-recipe option forms) is intentionally left to maintainers per `AGENTS.md`; the schema is now exposed via `/system-info` for a future dynamic migration.
- […] `BackendSpec` (install params are class-side behavior); the descriptor supplies the binary name.
- […] `cloud` recipe checks (the dynamic-models exception), `collection.omni` (the orchestrator exception, not a `WrappedServer`), `inspect_repo` repo→recipe detection (its collection branch is that same exception), and `defaults.json` generation (its variant `*_args` / `*_bin` keys aren't in the descriptor `options`, so generating it would need a config-schema expansion that risks the config contract).

🤖 Generated with Claude Code