refactor(backends): self-describing WrappedServer backends (#2287) by jeremyfowers · Pull Request #2320 · lemonade-sdk/lemonade

jeremyfowers · 2026-06-19T16:48:45Z

Implements the plan in #2287: each inference backend describes itself with a plain-data descriptor + a server class + a stateless behavior object, and every scattered if (recipe == "...") site is rewritten to read a registry built from those descriptors. Backend-specific logic no longer leaks into the router, model manager, system-info, CLI, or docs.

Layout — a backend is a folder

include/lemon/backends/<stem>/<stem>.h          # descriptor (header-only inline const, CLI-safe)
include/lemon/backends/<stem>/<stem>_server.h   # WrappedServer subclass + create()/spec()/ops() decls
server/backends/<stem>/<stem>_server.cpp        # server class impl, BackendOps subclass, create/spec/ops
                                                # (+ any backend-private helpers, e.g. llamacpp_gguf.cpp)

Adding a backend = one LEMON_BACKENDS line in CMakeLists.txt + that folder + a backend_versions.json pin + server_models.json entries. No router, CLI, doc, or support-matrix edits — those are all derived. CMake globs each backend folder (CONFIGURE_DEPENDS), so backend-private helper files need no build edit.

What changed

Descriptor (backend_descriptor.h) — plain data describing what a backend is: recipe, display name, binary, config section, default device, SlotPolicy, selectable_backend, uses_ctx_size, dynamic_models, declarative options[], OS/GPU support[], default labels, required checkpoints, plus editorial/policy fields (modality, experimental, web_priority, rocm_channels, version_policy, exposes_prometheus_metrics, rocm_requires_cwsr_fix, self_manages_downloads).
Two-tier registry, generated from LEMON_BACKENDS at CMake configure time — a CLI-safe data registry (descriptors only; links into both lemonade and lemond) and a server-only factory registry (binds each descriptor to its class's create(), spec(), ops()). This split lets the CLI read recipe options/flags from descriptors without linking server classes.
BackendOps — stateless per-backend behavior (backend_ops.h): the model-management logic that happens without a running subprocess. The base class is the shared Hugging Face behavior; each backend overrides only its policy points, so shared download/cache logic is inherited, not copied. Methods: populate_metadata, resolve_checkpoint_path, find_imported_checkpoint, validate_registration_checkpoint, select_checkpoint_files, discover_models, is_downloaded, validate_checkpoint_file, download_model, invalidates_cache_after_download, resolve_version, check_install, classify_unavailable. This is what let model_manager.cpp and system_info.cpp shed their per-recipe switchboards (resolve_model_path went from a ~290-line if/else to one ops_for(recipe)->… call).
Descriptor/ops-driven sites — router creation, NPU/slot eviction & cloud LRU exemption (SlotPolicy, no recipe literals left in router.cpp), device type, recipe options / CLI flags / defaults, config-section identity, ROCm channels (recipe_has_rocm_channels), the support matrix (RECIPE_DEFS deleted from system_info.cpp), recipe→label inference, FLM dynamic discovery, the FLM install-state machine, cloud availability + discovery, and the install-state UI hints.
Registration helpers — make_server<T> / make_spec<T> / single_ops<T> keep the per-backend create()/spec()/ops() one-liners DRY (irregular backends — cloud, ryzenai, vllm — keep bespoke bodies).
/system-info recipes entries enriched with display_name / selectable_backend / uses_ctx_size / options / support. The desktop app reads recipe display names from /system-info instead of hardcoded TypeScript.
Docs generation — docs/tools/gen_backend_docs.py boots lemond, reads /system-info + server_models.json, and rewrites marker-delimited regions of six docs (README.md support matrix, guide/cli.md, guide/configuration/README.md, guide/configuration/multi-model.md, custom-models.md, dev/backends-reference.md) plus assets/models.js. A CI job (backend-docs-drift) fails on drift. Authoring guide added at dev/adding-a-backend.md (both wired into the mkdocs nav).
ModelInfo::extras — generic map<string, json> populated from unknown server_models.json keys, so a new backend adds per-model fields without editing shared structs.
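To make the descriptor and data-registry idea concrete, here is a minimal sketch. It assumes a heavily trimmed `BackendDescriptor`: the field subset, the `Standard` slot policy, the display names, the sd-cpp binary name, and the `kDescriptors` / `find_descriptor` names are illustrative, not the PR's actual definitions. The empty-section fallback mirrors the `effective_config_section()` behavior described in the commits.

```cpp
#include <array>
#include <string_view>

// Hypothetical, trimmed-down descriptor; the real one carries many more fields.
enum class SlotPolicy { Standard, CoexistByType, Unmetered };  // "Standard" is an assumed name

struct BackendDescriptor {
    std::string_view recipe;
    std::string_view display_name;
    std::string_view binary;
    std::string_view config_section;  // "" means "same as recipe"
    SlotPolicy slot_policy;
    bool selectable_backend;
};

// An empty config_section falls back to the recipe name.
constexpr std::string_view effective_config_section(const BackendDescriptor& d) {
    return d.config_section.empty() ? d.recipe : d.config_section;
}

// Stand-in for the generated, CLI-safe data registry (values are illustrative).
inline constexpr std::array<BackendDescriptor, 2> kDescriptors{{
    {"llamacpp", "Llama.cpp", "llama-server", "", SlotPolicy::Standard, true},
    {"sd-cpp", "stable-diffusion.cpp", "sd-server", "sdcpp", SlotPolicy::Standard, false},
}};

// Look a descriptor up by recipe; returns nullptr for unknown recipes.
constexpr const BackendDescriptor* find_descriptor(std::string_view recipe) {
    for (const auto& d : kDescriptors) {
        if (d.recipe == recipe) return &d;
    }
    return nullptr;
}
```

Because the table is plain data, a CLI could read option and section names from it without linking any server class, which is the point of the two-tier split.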

Verification

Local: lemond + lemonade CLI + web-app build clean; C++ unit tests ctest 5/5 (incl. GgufCapabilities, AutoTune, LatestVersionFallback, InstallAtomicity); server_endpoints 71/71; /system-info carries the enriched fields; docs --check clean; a registry smoke confirms all backends register and route. Cross-platform + clean-environment validation via CI.

One pre-existing local failure unrelated to this change (reproduced on main): server_cli2 test_020_list — a built-in collection name with a space ("Lite Collection") breaks the test's whitespace-based table parser.

Notes for reviewers

recipeOptionsConfig.ts (the TypeScript-typed per-recipe option forms) is intentionally left to maintainers per AGENTS.md; the schema is now exposed via /system-info for a future dynamic migration.
Backend install still goes through each backend's BackendSpec (install params are class-side behavior); the descriptor supplies the binary name.
Deliberately left as documented exceptions (not oversights): cloud recipe checks (the dynamic-models exception), collection.omni (the orchestrator exception, not a WrappedServer), inspect_repo repo→recipe detection (its collection branch is that same exception), and defaults.json generation (its variant *_args/*_bin keys aren't in the descriptor options, so generating it would need a config-schema expansion that risks the config contract).

🤖 Generated with Claude Code

Make each inference backend describe itself with a plain-data descriptor plus a server class, and rewrite the scattered `if (recipe == "...")` sites to read a registry built from those descriptors. Adding a backend becomes one LEMON_BACKENDS line plus a descriptor + factory file, with no router, CLI, docs, or support-matrix edits.
- Descriptor types (BackendDescriptor/BackendOption/SlotPolicy) + a CLI-safe data registry and a server-only factory registry, generated from the LEMON_BACKENDS list at CMake configure time.
- All 9 backends carry a descriptor (device, slot policy, options, support matrix, labels, binary) and a create().
- Descriptor-driven: router creation, NPU/slot eviction, device type, recipe options/CLI flags, config-section identity, support matrix, recipe labels, cloud availability.
- /system-info recipes enriched with display_name/selectable_backend/options/support; the app reads recipe display names from it instead of hardcoded TS.
- docs/tools/gen_backend_docs.py generates docs/dev/backends-reference.md from /system-info; a CI step fails on drift. Authoring guide in docs/dev/adding-a-backend.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jeremyfowers · 2026-06-19T17:45:39Z

CI status

All cross-platform builds pass (MSVC, AppleClang, GCC, Arch, openSUSE, Fedora rpm), validating the descriptor aggregate-init, CMake LEMON_BACKENDS codegen, and the CLI-safe/server-only registry split compile everywhere. Functional jobs exercising this change pass: CLI/Endpoints (ubuntu + macOS), Test .exe (whisper, moonshine, stable-diffusion, text-to-speech), backend-docs-drift, plus locally endpoints (69), pinning (6), app-regression (37).

The single red — Test CLI/Endpoints (windows-latest) → test_026_anthropic_messages_tool_calling — is a pre-existing flaky timeout, not from this PR. It's a 500 s ReadTimeout on a tool-calling inference request that the Windows runner intermittently can't finish in time:

main run 27765794877: same job fails on the same test with the identical read timeout=500 signature.
main run 27795912134: same job passes.

This PR touches backend construction, not inference, anthropic_api.cpp, or the tool-calling loop, so it can't change that test's latency. Re-running the job.

Restructure the self-describing backends to the layout the issue #2287 plan specified (one folder per backend) instead of the flat file layout I used before. This also folds the earlier _descriptor/_factory split into the spec's cleaner shape: the descriptor is a header-only `inline const` and create() lives with the server class.
Each backend now lives in its own folder, in namespace lemon::backends::<stem>:
  include/lemon/backends/<stem>/<stem>.h          inline const descriptor (CLI-safe data)
  include/lemon/backends/<stem>/<stem>_server.h   WrappedServer subclass + create() decl
  server/backends/<stem>/<stem>_server.cpp        implementation + create() def
Shared registry/util files stay at the top of backends/. The CMake foreach over LEMON_BACKENDS compiles each <stem>/<stem>_server.cpp and generates the registry headers from the folder paths.
Removes the per-backend *_descriptor.{h,cpp} and *_factory.{h,cpp} files. Behavior is unchanged (same descriptors, same create()).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Make the existing curated docs generate from the backend descriptors instead of just shipping a separate reference file, closing appendix rows 14 and 22.
- Expand the descriptor with the editorial fields the curated docs need: `modality`, `experimental`, `web_display_name`, and a per-support-row `device_summary` (RecipeBackendDef). These keep the descriptor the single source of truth.
- /system-info exposes them plus a registry `order` index and `slot_policy`.
- gen_backend_docs.py now targets multiple docs and renders:
  * README.md "Supported Configurations" HTML matrix (grouped by modality, merged rows, rowspans, experimental tag), wrapped in GENERATED markers;
  * docs/guide/configuration/multi-model.md NPU-exclusivity list.
The backend-docs-drift CI job's --check now covers all three docs. The generated README matrix is also more complete than the hand-written one (it now includes whispercpp rocm/metal, kokoro metal, sd-cpp metal). Footnotes and prose outside the markers are preserved.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Wrap cli.md's "Recipe-Specific Options" tables in GENERATED markers and render them from the descriptor options. This also fixes pre-existing drift: the section documented `--steps`/`--cfg-scale`/`--width`/`--height` flags that the CLI no longer registers, and omitted the moonshine and vllm recipes. Now covered by the backend-docs-drift CI check. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add inline-marker support to the generator and wrap the `--recipe` "Common values" list in custom-models.md so it renders from the descriptor recipe set (plus collection.omni). Now covered by the backend-docs-drift CI check. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Close the last two cleanly-derivable doc touchpoints (appendix rows 16 and 21).
- configuration/README.md "Example config.json": generated from a fresh lemond's GET /internal/config (the real canonical config). This also fixes pre-existing drift: the hand-written block had `config_version: 1` (now 2), `prefer_system: false` (now true), a stray `device` key, and an invalid trailing comma. `port` is normalized to the documented default 13305.
- docs/assets/models.js RECIPE_PRIORITY + RECIPE_DISPLAY_NAMES: generated from descriptors. A new `web_priority` editorial field preserves the curated website ordering (so the order is descriptor-sourced, not a silent reorder); legacy `oga-*` recipes are dropped as agreed. Adds the correct `vllm` display name.
The generator now drives 7 docs and supports both `<!-- -->` (Markdown) and `/* */` (JS) GENERATED markers. backend-docs-drift --check covers all of them.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ive spec; drop device map)
Two agreed plan touchpoints were left incomplete; this finishes them.
Row 4: try_get_spec_for_recipe was still a hand-written 8-branch if-ladder in backend_utils.cpp, which also forced it to #include all 8 server headers. Each backend now exposes a uniform `spec()` accessor (alongside create()); the generated factory registry binds it, and `backends::spec_for(recipe)` / try_get_spec_for_recipe iterate the registry. backend_utils.cpp now includes ZERO server headers. Also reroute the two leaking `Server::SPEC` references (model_manager find_flm_binary) through the registry.
Row 5: get_device_type_from_recipe still carried the full recipe->device map, redundant with BackendDescriptor::default_device. Reduced to a DEVICE_NONE fallback for non-descriptor recipes (collections/unknown); the descriptor is the single source via ModelManager::device_type_for_recipe.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Introduce a stateless per-backend behavior interface for model management that happens WITHOUT a running subprocess (checkpoint-path resolution, download, dynamic discovery, per-model metadata, version detection, availability): the home for the recipe switchboards currently scattered through model_manager and system_info.
- BackendOps base class (lemon/backends/backend_ops.h): shared default behavior; backends override only the policy points they need (inherit shared logic, don't copy it). Methods are added incrementally as switchboards migrate; each has a default so adding one never forces edits to backends that don't override it.
- Each backend folder exposes a uniform ops() singleton (alongside create()/spec()), bound into BackendRegistration; backends::ops_for(recipe) returns it.
- Purely additive: every backend uses the default base ops for now, so there is no behavior change yet. Migrations follow in subsequent commits.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
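The "override only the policy points" pattern can be sketched roughly as follows. This is a simplified illustration, not the PR's code: the real `BackendOps` has many more methods with richer signatures, `is_downloaded` is reduced here to a single flag, and the `Registration` table is a hypothetical stand-in for the generated factory registry. The `CloudOps` and `FlmOps` overrides follow behavior described in later commits.

```cpp
#include <string_view>

// Base class: shared default behavior for every backend.
struct BackendOps {
    virtual ~BackendOps() = default;
    // Default: a model counts as downloaded iff its checkpoint files are complete.
    virtual bool is_downloaded(bool checkpoints_complete) const { return checkpoints_complete; }
    // Default: a download does not invalidate the model cache.
    virtual bool invalidates_cache_after_download() const { return false; }
};

// Cloud models have nothing to download locally.
struct CloudOps : BackendOps {
    bool is_downloaded(bool) const override { return true; }
};

// FLM manages its own model store, so the cache is rebuilt after a pull.
struct FlmOps : BackendOps {
    bool invalidates_cache_after_download() const override { return true; }
};

// One stateless instance per ops type, created on first use.
template <typename T>
const BackendOps* single_ops() {
    static const T ops{};
    return &ops;
}

// Hypothetical stand-in for the generated registry binding recipe -> ops().
struct Registration {
    std::string_view recipe;
    const BackendOps* (*ops)();
};

inline const Registration kRegistry[] = {
    {"cloud", &single_ops<CloudOps>},
    {"flm", &single_ops<FlmOps>},
};

// Backends without a custom ops type get the shared base behavior.
inline const BackendOps* ops_for(std::string_view recipe) {
    for (const auto& r : kRegistry) {
        if (r.recipe == recipe) return r.ops();
    }
    return single_ops<BackendOps>();
}
```

A caller then writes `ops_for(recipe)->is_downloaded(...)` and never names a backend, which is what lets the per-recipe branches disappear from shared code.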

…readers into folders
Replace the populate_model_metadata recipe switchboard with ops_for(recipe)->populate_metadata(). The backend-specific readers move into their folders:
- GGUF metadata reader (read_gguf_metadata + byte parsers) -> backends/llamacpp/llamacpp_gguf.{h,cpp}; LlamaCppOps::populate_metadata reads arch + capability labels there.
- FLM model-file helpers (config.json ctx window, model-dir discovery) -> backends/fastflowlm/fastflowlm_models.{h,cpp}; FlmOps::populate_metadata uses it.
model_manager no longer knows how either backend stores or introspects models. CMake now globs each backend folder's *.cpp (CONFIGURE_DEPENDS) so backend-private helper files need no CMake edit; the backend LIST stays explicit. Verified: GGUF context windows still populate (131072/128000/32768 for sample models) and test_gguf_capabilities passes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…llamacpp||sd-cpp)&&rocm) Add a `rocm_channels` descriptor field (llamacpp {"stable","nightly"}, sd-cpp {"stable"}) and a recipe_has_rocm_channels() registry helper. Replace the hardcoded `(recipe=="llamacpp"||recipe=="sd-cpp") && rocm` predicate — copied across backend_utils.cpp (3×), backend_manager.cpp (2×), and system_info.cpp — with the descriptor check. rocm_channel_for_recipe() now clamps a requested channel to one the backend publishes (so sd-cpp's missing "nightly" -> "stable" falls out of the data instead of a per-recipe special case). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
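A rough sketch of the clamping idea, under stated assumptions: `clamp_rocm_channel` is a hypothetical name, `published` stands in for the descriptor's `rocm_channels` field, and falling back to the first published channel is an assumption about how the clamp picks a substitute.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Clamp a requested ROCm channel to one the backend actually publishes.
// Returns "" when the backend publishes no ROCm channels at all.
inline std::string clamp_rocm_channel(const std::vector<std::string>& published,
                                      const std::string& requested) {
    if (published.empty()) return "";
    if (std::find(published.begin(), published.end(), requested) != published.end()) {
        return requested;  // the backend publishes the requested channel
    }
    return published.front();  // assumption: the first entry is the fallback
}
```

With the channel lists quoted in the commit, a "nightly" request stays "nightly" for llamacpp and becomes "stable" for sd-cpp without any per-recipe branch.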

…rst leak)
Replace the ~290-line recipe switchboard in ModelManager::resolve_model_path with ops_for(recipe)->resolve_checkpoint_path(). The model manager now only does the generic prefix (collections, local_path/local_upload, HF cache-dir computation) and hands off to the backend.
- New BackendOps::resolve_checkpoint_path; base = the shared HF behavior (active-snapshot variant/aux resolution, main-repo fallback, directory fallback). Backends override only their artifact layout:
  * llamacpp -> GGUF resolver (sharding/folder/quant-token), moved into backends/llamacpp/llamacpp_gguf (resolve_gguf_path).
  * ryzenai -> genai_config.json directory; kokoro -> index.json; whispercpp -> first .bin; cloud -> ""; flm -> checkpoint passthrough.
- New shared backends/hf_cache_util (exists/dir_options/active_snapshot_path/repo_id_to_cache_dir_name) so ops reuse the same HF-cache mechanics.
model_manager.cpp -362 lines; resolve_model_path 365 -> 34. Verified all recipes still resolve as downloaded (llamacpp variants, whisper .bin, kokoro index, sd-cpp, ryzenai, flm) via /models.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…FLM cluster → folder
Dynamic discovery, download status, and downloading now flow through BackendOps instead of recipe switchboards in model_manager:
- discover_models: build_cache loops descriptors with dynamic_models=true and merges ops->discover_models(). FLM (`flm list`) and cloud (per-provider) both implement it, so the two bespoke discovery blocks collapse to one generic loop.
- is_downloaded: base = shared HF completeness (ModelManager::checkpoints_complete); CloudOps → true; FlmOps → installed-set membership. Replaces the flm_set/cloud/else branches in build_cache and add_model_to_cache.
- validate_checkpoint_file: LlamaCppOps does the GGUF-magic check (was an inline llamacpp branch in are_required_checkpoints_complete).
- download_model: base = shared HF engine (download_from_huggingface_engine); FlmOps → flm pull; CloudOps → no-op. download_registered_model just dispatches. invalidates_cache_after_download() replaces the recipe=="flm" cache-reset.
The whole FLM cluster (find_flm_binary, flm_installed_checkpoints, flm_discover_models, flm_download) moves into backends/fastflowlm/fastflowlm_models. model_manager keeps only the generic HF engine. Verified: server_endpoints 69 pass; download status correct for every recipe.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…s hook
get_recipe_version now reads version.txt generically and lets the backend ops override, instead of branching on recipe. The per-backend version commands move into their folders:
- system llama-server version (`llama-server --version` + regex) -> backends/llamacpp; LlamaCppOps::resolve_version returns it for the "system" backend.
- flm version (`flm version --json`) -> backends/fastflowlm (flm_version()); FlmOps::resolve_version returns it when no version.txt is present.
Removes SystemInfo::get_system_llamacpp_version / get_flm_version and the llamacpp-system / flm branches from system_info.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

config_section duplicated the recipe string in 8 descriptors; it defaults to the recipe via effective_config_section(), so set those to "". Only sd-cpp ("sdcpp") and ryzenai-llm ("ryzenai") keep an explicit section because theirs genuinely differ from the recipe. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…_metrics descriptor flag prometheus_metrics.cpp hardcoded `recipe == "llamacpp"` to decide whether to scrape a backend subprocess's /metrics. Replace with a descriptor flag (exposes_prometheus_metrics; llamacpp = true) so a new backend that exposes Prometheus metrics opts in via its descriptor, not by editing the metrics code. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

These backend-specific per-model fields no longer sit on the shared ModelInfo struct: llamacpp reads info.extra<bool>("hf_load", false) and moonshine reads info.extra<int>("moonshine_arch", -1). Removed the typed fields, their explicit parse sites, and their kKnownKeys entries; added parse_extras() to the two ModelInfo-building paths that lacked it (add_model_to_cache, get_model_info_unfiltered) so extras populate everywhere a model is built from JSON. Verified: llamacpp models still resolve/download (hf_load path intact).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
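The typed accessor can be pictured with the sketch below. To stay self-contained it substitutes a `std::variant` for the JSON value type that the PR describes (`map<string, json>`), so `ExtraValue` and this cut-down `ModelInfo` are assumptions for illustration only.

```cpp
#include <map>
#include <string>
#include <variant>

// Stand-in for a JSON value; the real extras map is described as map<string, json>.
using ExtraValue = std::variant<bool, int, std::string>;

struct ModelInfo {
    std::string name;
    std::map<std::string, ExtraValue> extras;  // unknown server_models.json keys land here

    // Typed read; returns `fallback` when the key is absent or holds another type.
    template <typename T>
    T extra(const std::string& key, T fallback) const {
        const auto it = extras.find(key);
        if (it == extras.end()) return fallback;
        if (const T* value = std::get_if<T>(&it->second)) return *value;
        return fallback;
    }
};
```

A backend then reads its own field with a call such as `info.extra<bool>("hf_load", false)`, and the shared struct never has to learn the key.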

Replace the hardcoded (sd-cpp||llamacpp||vllm)&&rocm recipe-list in is_recipe_installed and build_recipes_info with a rocm_requires_cwsr_fix descriptor flag (set on those three backends). The kernel CWSR detection (needs_gfx1151_cwsr_fix) stays in system_info as generic hardware detection; only "which backends' rocm build needs it" is now descriptor data. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ps hook
is_recipe_installed now finds the managed binary generically and asks the backend's ops whether it's actually installed, instead of hardcoding the llamacpp-system HIP check and the flm PATH fallback:
- check_install(backend, binary_found) ops hook; base = installed iff binary found. LlamaCppOps adds the ggml HIP-plugin requirement for the "system" build on AMD GPUs; FlmOps treats a PATH-installed flm as present.
- is_ggml_hip_plugin_available moves into backends/llamacpp; find_flm_executable and run_flm_validate move into backends/fastflowlm. Removed from path_utils (+ their orphaned decls/comments).
system_info no longer carries llamacpp/flm-specific availability knowledge.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… vs AtLeast) The update-required check special-cased recipe=="flm" to allow an installed version newer than the pin. Replace with a version_policy descriptor field (Exact default; flm = AtLeast for its system-managed package). system_info no longer names flm in the version-comparison logic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
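One way to read the Exact versus AtLeast distinction is the sketch below. `parse_version` and `update_required` are hypothetical helpers, and the dotted-number parsing is deliberately naive (non-numeric parts count as 0, and "1.2" is treated as different from "1.2.0"); the PR's actual comparison logic is not shown in the source.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

enum class VersionPolicy { Exact, AtLeast };

// Split "1.2.3" (or "v1.2.3") into {1, 2, 3}. Sketch-level parsing only.
inline std::vector<int> parse_version(std::string v) {
    if (!v.empty() && v[0] == 'v') v.erase(0, 1);
    std::vector<int> parts;
    std::stringstream ss(v);
    std::string item;
    while (std::getline(ss, item, '.')) {
        parts.push_back(std::atoi(item.c_str()));
    }
    return parts;
}

// Exact: any difference from the pin requires an update.
// AtLeast: only an installed version older than the pin requires one.
inline bool update_required(VersionPolicy policy,
                            const std::string& installed,
                            const std::string& pinned) {
    const std::vector<int> have = parse_version(installed);
    const std::vector<int> want = parse_version(pinned);
    if (policy == VersionPolicy::Exact) return have != want;
    return have < want;  // lexicographic comparison of the numeric parts
}
```

Under AtLeast, a system-managed package that is newer than the pin is accepted as-is, which matches the flm behavior the commit describes.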

The `flm remove` subprocess orchestration moves out of ModelManager::delete_model into backends/fastflowlm (flm_remove). model_manager keeps only the generic HF-cache deletion path; the flm branch is now a thin call into the backend. Verified: server_endpoints 69 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…cipe blocks
RuntimeConfig::recipe_options() had a hardcoded nested→flat translation block per recipe (llamacpp/whispercpp/moonshine/sdcpp/vllm). Replace with a single loop over the descriptors: each option's config.json key is derived from its name role (*_backend → "backend", *_args → variant "<backend>_args"/"args", *_device → "device", else the option name verbatim for sd-cpp's steps/cfg_scale/width/height). Adding a backend no longer requires editing this function. Verified: server_endpoints 69 pass (config/params translation unchanged).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
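The name-role rule can be sketched as a small suffix match. `config_key_for_option` and its `variant` parameter are hypothetical names, and treating an empty variant as the plain "args" key is one reading of the `"<backend>_args"/"args"` wording above.

```cpp
#include <string>
#include <string_view>

inline bool ends_with(std::string_view s, std::string_view suffix) {
    return s.size() >= suffix.size() &&
           s.substr(s.size() - suffix.size()) == suffix;
}

// Derive the config.json key for a descriptor option from the role in its name.
// `variant` is the selected backend variant; empty means "no variant".
inline std::string config_key_for_option(std::string_view option,
                                         std::string_view variant = {}) {
    if (ends_with(option, "_backend")) return "backend";
    if (ends_with(option, "_device")) return "device";
    if (ends_with(option, "_args")) {
        return variant.empty() ? std::string("args")
                               : std::string(variant) + "_args";
    }
    return std::string(option);  // e.g. steps, cfg_scale, width, height
}
```

Since the rule depends only on the option name, a new backend's options translate without a new per-recipe block.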

… across descriptor↔server.h) The backend binary name (and recipe) were duplicated between the descriptor (<stem>.h) and the BackendSpec literal (<stem>_server.h) — the cross-file redundancy. Remove the static SPEC member; each backend's spec() now builds the BackendSpec lazily from descriptor.binary (+ descriptor.recipe, or the explicit "ryzenai-server" install id where it differs) plus the class's get_install_params and split flag. In-class binary lookups go through spec(); server.cpp's sd upscale uses try_get_spec_for_recipe. Net: the binary name now lives in exactly one place (the descriptor). Lazy function-local statics also avoid any static-init-order coupling between the descriptor and the spec. Verified: builds green; system-info install detection unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
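The lazy construction mentioned here relies on a function-local static, which C++ initializes on first call. A minimal sketch, assuming two-field stand-ins for `BackendDescriptor` and `BackendSpec` (the real types carry install params and a split flag as well):

```cpp
#include <string>

// Simplified stand-ins for the real types.
struct BackendDescriptor {
    const char* recipe;
    const char* binary;
};

struct BackendSpec {
    std::string install_id;
    std::string binary;
};

// The descriptor is the single place the binary name is written down.
inline const BackendDescriptor descriptor{"llamacpp", "llama-server"};

// Built on the first call rather than during static initialization, so the
// spec cannot be constructed before the descriptor it reads from.
inline const BackendSpec& spec() {
    static const BackendSpec instance{descriptor.recipe, descriptor.binary};
    return instance;
}
```

Every call returns the same object, so callers can hold the reference for the lifetime of the program.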

The recipe was repeated on every support row (6x in llamacpp.h). Introduce a recipe-free BackendSupport struct; the owning descriptor's recipe is filled in by recipe_defs() when flattening to RecipeBackendDef. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
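The flattening step might look like the sketch below. The struct fields (`backend`, `os`) are an assumed subset, and this `recipe_defs` takes the descriptor list as a parameter for testability, which may differ from the real signature.

```cpp
#include <string>
#include <vector>

// Support row without a recipe field: the owning descriptor implies it.
struct BackendSupport {
    std::string backend;
    std::string os;
};

// Flattened row, as consumed by the support matrix.
struct RecipeBackendDef {
    std::string recipe;
    std::string backend;
    std::string os;
};

struct BackendDescriptor {
    std::string recipe;
    std::vector<BackendSupport> support;
};

// Fill in each row's recipe from the descriptor that owns it.
inline std::vector<RecipeBackendDef> recipe_defs(
    const std::vector<BackendDescriptor>& descriptors) {
    std::vector<RecipeBackendDef> rows;
    for (const auto& d : descriptors) {
        for (const auto& s : d.support) {
            rows.push_back({d.recipe, s.backend, s.os});
        }
    }
    return rows;
}
```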

The preceding generic block already handles backend_versions[recipe] for any recipe, so the recipe=="llamacpp" branch was unreachable duplicate code. Removing it also drops a hardcoded backend name from shared code. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

find_flm_server_by_type -> find_coexisting_server_by_type matches on SlotPolicy::CoexistByType; count_pinned_servers_by_type skips SlotPolicy::Unmetered instead of recipe=="cloud". router.cpp now holds zero backend-name string literals; both behaviors are unchanged (flm is the only CoexistByType backend, cloud the only Unmetered one). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
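As an illustration of matching on policy instead of backend name, here is a simplified sketch. `LoadedServer` and the `Standard` enumerator are hypothetical; only `CoexistByType` and `Unmetered` are named in the PR, and the real router functions take different arguments.

```cpp
#include <string>
#include <vector>

enum class SlotPolicy { Standard, CoexistByType, Unmetered };  // "Standard" is assumed

// Hypothetical view of a loaded server as the router might track it.
struct LoadedServer {
    std::string model;
    std::string type;
    SlotPolicy policy;
    bool pinned;
};

// Pinned servers of a type, skipping any that do not consume a slot.
inline int count_pinned_servers_by_type(const std::vector<LoadedServer>& servers,
                                        const std::string& type) {
    int count = 0;
    for (const auto& s : servers) {
        if (s.policy == SlotPolicy::Unmetered) continue;  // no recipe name check
        if (s.pinned && s.type == type) ++count;
    }
    return count;
}

// First server of this type whose policy allows coexistence, or nullptr.
inline const LoadedServer* find_coexisting_server_by_type(
    const std::vector<LoadedServer>& servers, const std::string& type) {
    for (const auto& s : servers) {
        if (s.policy == SlotPolicy::CoexistByType && s.type == type) return &s;
    }
    return nullptr;
}
```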

…ipe==flm Add BackendDescriptor::self_manages_downloads (true only for flm) and ModelManager::backend_self_manages_downloads(). The two load-time auto-download guards in server.cpp/ollama_api.cpp now consult it instead of hardcoding recipe != "flm". flm is the only backend with the flag set, so behavior is identical. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

resolve_and_register_local_model() had a recipe if/else scanning the imported directory for each backend's primary artifact (.gguf / .bin / genai_config.json dir). Replace with BackendOps::find_imported_checkpoint(dir): default "" registers the directory (sd-cpp/kokoro/moonshine); llamacpp reuses resolve_gguf_path, whisper finds the .bin, ryzenai finds genai_config.json's dir (and its resolve_checkpoint_path now reuses the same scan). server.cpp holds no per-recipe import logic. Verified via local_import smoke tests for llamacpp (ignores mmproj), whisper, and a default backend. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Reconcile the self-describing-backends refactor with main's divergence:
- backend_manager.cpp: keep both includes (main's backend_version_policy.h for resolve_latest_pin + our backend_descriptor_registry.h); the two version concerns are orthogonal.
- model_manager.cpp resolve_model_path: keep the ops-based one-liner (backends::ops_for(recipe)->resolve_checkpoint_path) over main's inline recipe switchboard.
- Port main's #2300 GGUF resolver improvements into llamacpp resolve_gguf_path: factor cases 0-5 into a resolve_gguf_variant lambda, resolve against the active refs/main snapshot first, then broaden to all snapshots when the active one lacks the variant. Restores test_034.
- Regenerate backend docs/models.js for main's new server_models entries.
Verified: C++ build clean; ctest 4/4 (incl. GgufCapabilities, LatestVersionFallback, InstallAtomicity); server_endpoints 70/70 (incl. main's #2300 test_034); server_cli2 only the pre-existing test_020 collection-name parsing failure.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

On Windows the merged include chain pulls in the windows.h max() macro into this TU, turning std::numeric_limits<T>::max() into a syntax error (C2589). Wrap the calls as (std::numeric_limits<T>::max)() so the macro cannot expand. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
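The parenthesization trick works because a function-like macro only expands when its name is immediately followed by `(`. The sketch below reproduces the situation on any platform by defining a stand-in `max` macro (similar in shape to the one `<windows.h>` provides unless `NOMINMAX` is set); `largest_value` is a hypothetical helper name.

```cpp
#include <cstdint>
#include <limits>

// Stand-in for the function-like max() macro from <windows.h>.
#define max(a, b) (((a) > (b)) ? (a) : (b))

template <typename T>
constexpr T largest_value() {
    // In "(...::max)()" the token after `max` is ")" rather than "(", so the
    // preprocessor leaves it alone and the member function is called normally.
    return (std::numeric_limits<T>::max)();
}

// Keep the stand-in macro from leaking into code that follows this sketch.
#undef max
```

Writing `std::numeric_limits<T>::max()` in the same position would hand the macro zero arguments, which is what produces the compile error described above.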

flm models come from flm's model_list.json at runtime (0 entries in server_models.json), but the descriptor had dynamic_models=false, so build_cache skipped flm's ops->discover_models() and flm models (e.g. llama3.2-1b-FLM) never registered -> 404. The build_cache comment already documents flm as a dynamic-discovery backend alongside cloud; align the descriptor with that intent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

model_manager's download path hardcoded recipe == "moonshine" to fetch a variant directory of files. Add BackendOps::select_checkpoint_files (default nullopt = the GGUF/direct-file defaults) and override it in MoonshineOps. The download path no longer names a backend. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

system_info hardcoded a recipe == "flm" block to classify FLM's supported-but-unavailable state (.deb/driver manual setup) and emit troubleshoot links. Add BackendOps::classify_unavailable (default nullopt = the generic installable/no-fetch path) and implement it in FlmOps. system_info no longer names a backend in its install-state machine. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…==llamacpp bench hardcoded recipe == "llamacpp" to send the llamacpp_backend override. Use the CLI-safe descriptor registry: any recipe with selectable_backend gets its <config_section>_backend override (llamacpp and vllm today). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… ops model_manager hardcoded actual_recipe == "llamacpp" to require a :variant on GGUF checkpoints at registration. Add BackendOps::validate_registration_checkpoint (default accept) and implement the GGUF rule in LlamaCppOps. Verified: a GGUF checkpoint without :variant is still rejected; other recipes are unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

DRY pass across the backend folders:
- Add backends::make_server<T>(ctx) for the standard (log_level, model_manager, backend_manager) construction; the 6 plain create() bodies now call it instead of repeating the three context fields. cloud/ryzenai keep bespoke create().
- Each *_server.h closed and re-opened namespace lemon::backends just to nest the per-backend namespace; nest it inline instead (8 headers). ryzenai is left as-is (its legacy RyzenAIServer lives in namespace lemon, not lemon::backends).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
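A helper in the spirit of `make_server<T>(ctx)` could look like this sketch. `ServerContext`, the `void*` manager handles, and this bare-bones `WrappedServer` and `KokoroServer` are placeholders; only the three-field construction pattern comes from the commit message.

```cpp
#include <memory>
#include <string>
#include <utility>

// Minimal stand-in for the server base class.
struct WrappedServer {
    virtual ~WrappedServer() = default;
    virtual std::string name() const = 0;
};

// Hypothetical context carrying the three standard constructor arguments.
struct ServerContext {
    std::string log_level;
    void* model_manager;    // placeholder for the real manager type
    void* backend_manager;  // placeholder for the real manager type
};

// Shared construction for servers that take exactly the standard arguments.
template <typename T>
std::unique_ptr<WrappedServer> make_server(const ServerContext& ctx) {
    return std::make_unique<T>(ctx.log_level, ctx.model_manager, ctx.backend_manager);
}

struct KokoroServer : WrappedServer {
    KokoroServer(std::string log_level, void* /*model_manager*/, void* /*backend_manager*/)
        : log_level_(std::move(log_level)) {}
    std::string name() const override { return "kokoro"; }
    std::string log_level_;
};

namespace kokoro {
// The per-backend create() collapses to a single line.
inline std::unique_ptr<WrappedServer> create(const ServerContext& ctx) {
    return make_server<KokoroServer>(ctx);
}
}  // namespace kokoro
```

Backends whose constructors need something else simply keep a hand-written `create()`.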

main advanced ~15 commits with its own GGUF-reader consolidation (lemon/gguf_reader.h + gguf_capabilities.h, ModelInfo::gguf) and a cloud discovery security gate. Reconcile:
- model_manager.cpp (6 conflicts): keep the ops-based forms (populate_metadata, validate_checkpoint_file, discover_models, resolve_model_path, download file-selection) over main's inline recipe switchboards.
- Consolidate GGUF reading on main's shared lemon::gguf_reader: drop the now-redundant reader from backends/llamacpp/llamacpp_gguf.{h,cpp} (~240 lines), keeping only the unique resolve_gguf_path; LlamaCppOps::populate_metadata now fills ModelInfo::gguf via the shared reader.
- Port main's cloud-discovery allow_insecure_http gate into CloudOps::discover_models.
- Regenerate docs for main's new server_models entries.
Verified: build clean; ctest 5/5 (incl. GgufCapabilities, AutoTune); endpoints 71/71; cli only the pre-existing test_020; docs drift clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jeremyfowers · 2026-06-25T20:44:47Z

-| `--ctx-size SIZE` | Context size for the model | `4096` |
+| `--ctx-size SIZE` | Context size for the model | auto |


Nice, it's already catching some stale docs.

cc @bitgamma

The per-backend spec()/ops() are the name-based adapter the CMake codegen binds (<stem>::spec/ops), so the functions must exist — but their bodies were repetitive. Add make_spec<T>(descriptor[, split]) (backend_utils.h, where BackendSpec is complete) and single_ops<T>() (backend_registry.h, next to make_server) so the 7 standard spec() and 7 custom ops() collapse to one line each. ryzenai (install key != recipe) and cloud (no spec) keep bespoke spec(); sd-cpp/vllm keep default_backend_ops(). Pure refactor — registry binding, 71/71 endpoints, and all-backends-registered smoke unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The two backend dev docs added by this work (dev/adding-a-backend.md and the generated dev/backends-reference.md) were not wired into the Development nav. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…onfig/defaults
Per-recipe config defaults are now declared in each backend descriptor (takes_args / arg_variants / bin_variants / config_extra -> config_defaults()) instead of hand-maintained blocks in defaults.json. The committed resources/defaults.json stays fully populated (so it remains the discoverable reference for factory defaults) but is now generated:
- New GET /internal/config/defaults emits the canonical default config (ConfigFile::base_defaults(): global keys + descriptor-derived per-recipe sections, host/deployment-independent). Documented alongside /internal/config.
- gen_backend_docs.py -> gen_backend_boilerplate.py, which mirrors that endpoint verbatim into resources/defaults.json (whole-file) in addition to the doc regions. The existing CI --check now also fails if defaults.json drifts.
config_file keeps reading defaults.json at runtime; base_defaults() re-seeds the descriptor blocks so the descriptor stays authoritative even if the file lags. Verified: a fresh config.json reproduces every prior default; endpoints 71/71; generator --check clean; black clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The single-installable-unit path keyed off recipe != "llamacpp"; switch it to repo_kind != "gguf", the same server-provided classification the function already uses for the collection branch. Behavior-equivalent (collections are handled earlier, so by here repo_kind is gguf or onnx-ryzenai), and it drops the last backend-name literal from hf_pull. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions Bot added the enhancement New feature or request label Jun 19, 2026

jeremyfowers and others added 27 commits June 19, 2026 16:25

jeremyfowers and others added 8 commits June 22, 2026 19:18


jeremyfowers self-assigned this Jun 25, 2026

jeremyfowers and others added 4 commits June 25, 2026 16:51


jeremyfowers wants to merge 40 commits into main from feat/self-describing-backends
