Add MLX backend adapter for lemon-mlx-engine inference by bong-water-water-bong · Pull Request #2419 · lemonade-sdk/lemonade

bong-water-water-bong · 2026-06-25T13:09:22Z

Summary

Adds a new mlx recipe backend to lemond that spawns lemon-mlx-engine's OpenAI-compatible HTTP inference server as a subprocess.

Architecture

lemond spawns mlx-server as a subprocess, health-checks it, and proxies /v1/chat/completions with automatic model name → filesystem path rewriting.
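The health-check step can be pictured as a bounded polling loop. This is a minimal sketch only: `wait_until_healthy` and the `probe` callable are hypothetical names, not taken from lemond, and a real probe would presumably issue an HTTP GET against mlx-server's /health endpoint.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Polls `probe` until it reports success or `max_attempts` is exhausted.
// `probe` is a hypothetical stand-in for "GET /health returned OK".
bool wait_until_healthy(const std::function<bool()>& probe,
                        int max_attempts,
                        std::chrono::milliseconds delay) {
    assert(max_attempts > 0);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (probe()) {
            return true;
        }
        // Sleep only if another attempt will follow.
        if (attempt + 1 < max_attempts) {
            std::this_thread::sleep_for(delay);
        }
    }
    return false;
}
```

A caller would pass a lambda that performs the HTTP request and treat a `false` result as a failed load.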

Files Changed

| File | Change |
| --- | --- |
| src/cpp/server/backends/mlx_server.cpp | New: backend adapter |
| src/cpp/include/lemon/backends/mlx_server.h | New: header |
| src/cpp/server/router.cpp | Added mlx case |
| src/cpp/server/backends/backend_utils.cpp | Recipe registration |
| src/cpp/server/system_info.cpp | GPU defs (gfx115x/110X/120X + metal) |
| CMakeLists.txt | Source file |

Verified

Model appears in /v1/models with recipe=mlx
/v1/load spawns mlx-server (704ms startup)
Health check passes, backend watchdog active
Tested on gfx1151 (Strix Halo)

…de-sdk#2402)

The macOS Metal asset embeds the macOS runner version (e.g. sd-...-bin-Darwin-macOS-15.7.7-arm64.zip), which changes on every upstream build. PR lemonade-sdk#2102 replaced the hardcoded version with a `*` wildcard but never added code to resolve it, so the literal `*` went into the download URL and 404'd.

Resolve the wildcard against the GitHub Releases-by-tag API before building the download URL. No-op (zero network cost) for any asset name without a wildcard.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
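The wildcard resolution described in this commit message can be sketched as a glob match of the asset-name pattern against the asset names listed for a release. The function names and the asset list here are hypothetical, and fetching the list from the GitHub Releases API is left out; this is not the commit's actual code.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Returns true if `name` matches `pattern`, where '*' matches any run of
// characters (including none). Iterative glob match with backtracking.
bool matches_wildcard(const std::string& pattern, const std::string& name) {
    std::size_t p = 0, n = 0;
    std::size_t star = std::string::npos, resume = 0;
    while (n < name.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;      // remember the wildcard position
            resume = n;      // and where in `name` it started matching
        } else if (p < pattern.size() && pattern[p] == name[n]) {
            ++p;
            ++n;
        } else if (star != std::string::npos) {
            p = star + 1;    // let the last '*' absorb one more character
            n = ++resume;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') ++p;
    return p == pattern.size();
}

// Names without a wildcard are returned unchanged (the "no-op" case).
// Otherwise the first matching release asset is returned, if any.
std::optional<std::string> resolve_asset_name(
        const std::string& pattern,
        const std::vector<std::string>& release_assets) {
    if (pattern.find('*') == std::string::npos) {
        return pattern;
    }
    for (const auto& asset : release_assets) {
        if (matches_wildcard(pattern, asset)) {
            return asset;
        }
    }
    return std::nullopt;
}
```

Returning `std::nullopt` when nothing matches lets the caller report a clear error instead of requesting a URL that still contains a literal `*`.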

Adds mlx_server backend that spawns lemon-mlx-engine's OpenAI-compatible HTTP inference server as a lemond subprocess backend.

Files:
- src/cpp/server/backends/mlx_server.cpp: backend adapter
- src/cpp/include/lemon/backends/mlx_server.h: header
- Modified CMakeLists.txt: add source file
- Modified backend_utils.cpp: register 'mlx' recipe
- Modified system_info.cpp: recipe defs (Linux ROCm + macOS Metal)

Recipe name: 'mlx'
Binary: 'mlx-server' (built from source, system package install)
Protocol: OpenAI-compatible HTTP on localhost (health at /health)
Supports: AMD GPU (gfx1150/gfx1151/gfx110X/gfx120X) on Linux, Metal on macOS

- Add MLX case in router.cpp create_backend_server()
- Include mlx_server.h in router.cpp
- Fix mlx_server.cpp to remove unused options parsing
- Add test model entry in server_models.json
- Update backend_versions.json with mlx entry

Part of integrating lemon-mlx-engine as a lemond backend.

🔴 CRITICAL fixes:
- Store model_path_ and rewrite model field in forwarded requests (mlx-server requires filesystem path, not public model name)
- Map responses() to /v1/chat/completions (mlx-server lacks /v1/responses)

🟡 WARNING fixes:
- Pass --no-download flag to prevent unintended HF downloads
- Propagate LD_LIBRARY_PATH and ROCm_DIR to subprocess environment
- Always inherit stderr for debugging
- Add fs::is_directory() validation for model path

Verified: lemond spawns mlx-server, health check passes (908ms startup)

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0609837537

chatgpt-codex-connector · 2026-06-25T13:20:20Z

+    "size": 19.3
+  },
+  "MLX-Llama-3.2-1B-Instruct": {
+    "checkpoint": "/home/bcloud/models/llama-1b",


Replace the machine-local MLX checkpoint path

This built-in suggested model points at /home/bcloud/models/llama-1b, which only exists on the author's machine. In Lemonade's normal load path, the downloaded flag from server_models.json is not trusted; the model manager recomputes download status and will try to treat this checkpoint as a Hugging Face repo/cache entry before failing on other users' systems. Use a portable checkpoint or leave local MLX models to user/extra-model registration.

chatgpt-codex-connector · 2026-06-25T13:20:20Z

+    std::string model_path = model_info.checkpoint();
+    if (model_path.empty()) {
+        model_path = model_info.resolved_path();


Prefer resolved MLX paths after downloads

For a normal registered MLX model with a Hugging Face checkpoint, checkpoint() stays as the repo id while resolved_path() is the local cache directory populated by the model manager. Because this code always uses the non-empty checkpoint first, a successfully pulled MLX model still checks fs::exists("org/repo") and fails to load. Prefer resolved_path() when it is populated, and only fall back to checkpoint() for explicit local-path registrations.
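The suggested ordering can be sketched as a small helper that prefers the resolved cache directory and falls back to the checkpoint only when it is an actual local directory. `pick_mlx_model_path` is a hypothetical name, and the two string parameters stand in for the values returned by `resolved_path()` and `checkpoint()`; this is not the adapter's real code.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: choose the directory to hand to mlx-server.
// Prefers the resolved local path; falls back to the checkpoint only if
// the checkpoint itself is a local directory (explicit local registration).
std::optional<std::string> pick_mlx_model_path(const std::string& resolved_path,
                                               const std::string& checkpoint) {
    std::error_code ec;  // use the non-throwing overloads
    if (!resolved_path.empty() && fs::is_directory(resolved_path, ec)) {
        return resolved_path;
    }
    if (!checkpoint.empty() && fs::is_directory(checkpoint, ec)) {
        return checkpoint;
    }
    // Neither is a usable directory, e.g. checkpoint is a repo id like "org/repo".
    return std::nullopt;
}
```

With this ordering, a Hugging Face repo id in the checkpoint field is never probed as a filesystem path when a resolved directory is available.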

chatgpt-codex-connector · 2026-06-25T13:20:20Z

+    if (!model_path_.empty()) {
+        modified["model"] = model_path_;
+    }
+    return forward_request("/v1/chat/completions", modified);


Rewrite MLX model fields for streaming requests

This model-path rewrite only runs for non-streaming calls. With stream: true, the server uses Router::chat_completion_stream/completion_stream, which forwards the original request body directly through forward_streaming_request; mlx-server then receives the public Lemonade model name instead of the filesystem path it requires. Override forward_streaming_request for MLX and apply the same model_path_ rewrite before forwarding.
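One way to keep the two paths consistent is to route both through a single rewrite helper. The sketch below is illustrative only: `Request` is a plain map standing in for the JSON body, and `MlxRequestPreparer` with its two methods is a hypothetical name, not part of lemond.

```cpp
#include <map>
#include <string>

// Stand-in for the JSON request body (assumption: the real adapter uses a JSON type).
using Request = std::map<std::string, std::string>;

// Replaces the public model name with the filesystem path, if one is known.
Request rewrite_model_field(Request request, const std::string& model_path) {
    if (!model_path.empty()) {
        request["model"] = model_path;
    }
    return request;
}

// Hypothetical wrapper: both the non-streaming and the streaming path
// apply the same rewrite before the body is forwarded.
struct MlxRequestPreparer {
    std::string model_path;

    Request for_chat(const Request& request) const {
        return rewrite_model_field(request, model_path);
    }
    Request for_chat_stream(const Request& request) const {
        return rewrite_model_field(request, model_path);
    }
};
```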

chatgpt-codex-connector · 2026-06-25T13:20:20Z

+    if (!model_path_.empty()) {
+        modified["model"] = model_path_;
+    }
+    return forward_request("/v1/chat/completions", modified);


Translate or reject MLX Responses API requests

A /v1/responses request uses the Responses schema (input, max_output_tokens, response-shaped output), not the Chat Completions schema (messages, chat-shaped output). Forwarding it unchanged to /v1/chat/completions makes OpenAI responses.create(...) calls fail or return the wrong response shape for MLX models; the streaming path also still targets /v1/responses, which this backend says mlx-server lacks. Either perform a full request/response translation or return an unsupported-operation error.
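The request half of such a translation might look like the sketch below. It is a deliberately reduced illustration: the structs are hypothetical stand-ins for the JSON bodies, only a plain-string `input` is handled, and the response would still need to be converted back into the Responses shape.

```cpp
#include <optional>
#include <string>
#include <vector>

// Reduced stand-ins for the two request schemas (assumption: string input only).
struct ResponsesRequest {
    std::string model;
    std::string input;
    std::optional<std::string> instructions;
    std::optional<int> max_output_tokens;
};

struct ChatMessage {
    std::string role;
    std::string content;
};

struct ChatRequest {
    std::string model;
    std::vector<ChatMessage> messages;
    std::optional<int> max_tokens;
};

// Maps a Responses-style request onto a Chat Completions-style request.
ChatRequest to_chat_request(const ResponsesRequest& in) {
    ChatRequest out;
    out.model = in.model;
    if (in.instructions) {
        out.messages.push_back({"system", *in.instructions});
    }
    out.messages.push_back({"user", in.input});
    out.max_tokens = in.max_output_tokens;  // max_output_tokens -> max_tokens
    return out;
}
```

If the full translation is out of scope, returning an explicit unsupported-operation error is the simpler of the two options the review names.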

chatgpt-codex-connector · 2026-06-25T13:20:20Z

-}
+  "clear_bin_if_lemonade_below": "9.4.0",
+  "mlx": {
+    "system": "26aad7e"


Avoid pinning a version for the system MLX backend

For a system backend, get_recipe_version() returns unknown because there is no Lemonade-managed version.txt, but this new expected version makes system_info mark an otherwise available mlx-server as update_required. The advertised action is lemonade backends install mlx:system, which BackendManager::install_backend() immediately no-ops for system, so users are left with a permanently update-required backend even when the binary is in PATH.
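The status logic the review is asking for can be sketched as follows. `BackendState` and `classify_backend` are hypothetical names used for illustration; the actual system_info code is not shown in this PR page, so this is an assumption about its shape rather than a description of it.

```cpp
#include <string>

enum class BackendState { not_installed, installed, update_required };

// Hypothetical classification: system backends have no Lemonade-managed
// version, so an expected-version pin is never compared for them.
BackendState classify_backend(bool is_system_backend,
                              bool binary_available,
                              const std::string& installed_version,
                              const std::string& expected_version) {
    if (!binary_available) {
        return BackendState::not_installed;
    }
    if (is_system_backend) {
        return BackendState::installed;  // nothing Lemonade could update
    }
    if (!expected_version.empty() && installed_version != expected_version) {
        return BackendState::update_required;
    }
    return BackendState::installed;
}
```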

chatgpt-codex-connector · 2026-06-25T13:20:20Z

+#endif
+
+    // Start process (always inherit stderr for debugging)
+    process_handle_ = ProcessManager::start_process(executable, args, env_vars, true, true);


Pass MLX environment overrides via env_vars

Here env_vars is passed as the third argument, but ProcessManager::start_process() treats that parameter as working_dir; the actual environment override parameter is the sixth vector<pair<...>>. On Linux installs that rely on LD_LIBRARY_PATH or ROCm_DIR for ROCm libraries, the spawned mlx-server will not receive those values and may fail to start even though lemond's environment is correct.
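Positional mix-ups like this are easier to spot when the launch parameters are carried in named fields. The sketch below is not `ProcessManager`'s API: `LaunchOptions`, `EnvLookup`, and `collect_env_overrides` are hypothetical names showing how the environment overrides could be gathered and kept separate from the working directory.

```cpp
#include <cstdlib>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using EnvVars = std::vector<std::pair<std::string, std::string>>;
using EnvLookup = std::function<const char*(const std::string&)>;

// Hypothetical named-field launch description: the working directory and
// the environment overrides cannot be swapped by argument position.
struct LaunchOptions {
    std::string executable;
    std::vector<std::string> args;
    std::string working_dir;  // empty means "inherit the parent's"
    EnvVars env_overrides;
};

// Copies the listed variables from the parent environment, skipping unset ones.
// `lookup` defaults to std::getenv and can be replaced in tests.
EnvVars collect_env_overrides(
        const std::vector<std::string>& names,
        const EnvLookup& lookup = [](const std::string& name) -> const char* {
            return std::getenv(name.c_str());
        }) {
    EnvVars out;
    for (const auto& name : names) {
        if (const char* value = lookup(name)) {
            out.emplace_back(name, value);
        }
    }
    return out;
}
```

For this backend the list of names would be LD_LIBRARY_PATH and ROCm_DIR, the two variables the PR says it propagates.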

chatgpt-codex-connector · 2026-06-25T13:20:21Z

+    if (process_handle_.pid > 0) {
+#endif
+        ProcessManager::stop_process(process_handle_);
+        process_handle_ = {nullptr, 0};


Use WrappedServer handle helpers during MLX unload

MLX directly reads and writes process_handle_ while the backend watchdog/status paths access the same field through process_mutex_, and unload() does not stop the watchdog before killing the process. Once wait_for_ready() has started the watchdog, unloading an MLX model can race with get_process_handle_snapshot()/watchdog reset logic and expose a stale PID; use set_process_handle()/consume_process_handle_for_cleanup() and stop the watchdog like the other subprocess backends.
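The pattern behind those helpers is to take the handle out of the shared field while holding the mutex, then stop the process using the private copy. The class below is a simplified stand-in written for illustration; it is not the real WrappedServer, and `ProcessHandle` here only mirrors the `{nullptr, 0}` shape visible in the diff.

```cpp
#include <mutex>
#include <utility>

// Mirrors the {handle, pid} shape seen in the diff (assumption).
struct ProcessHandle {
    void* handle;
    int pid;
};

// Simplified stand-in: every access to the handle goes through the mutex.
class HandleOwner {
public:
    void set(ProcessHandle h) {
        std::lock_guard<std::mutex> lock(mutex_);
        handle_ = h;
    }

    // Copy for readers such as a watchdog or status endpoint.
    ProcessHandle snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return handle_;
    }

    // Atomically hands the handle to the caller and clears the stored one,
    // so no other thread can observe a PID that is about to be killed.
    ProcessHandle consume_for_cleanup() {
        std::lock_guard<std::mutex> lock(mutex_);
        return std::exchange(handle_, ProcessHandle{nullptr, 0});
    }

private:
    mutable std::mutex mutex_;
    ProcessHandle handle_{nullptr, 0};
};
```

An unload routine built on this would stop the watchdog first, call `consume_for_cleanup()`, and terminate the process only if the returned PID is non-zero.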

jeremyfowers · 2026-06-25T17:55:41Z

keep in mind #2287 is coming

jeremyfowers and others added 4 commits June 24, 2026 22:25

github-actions Bot added the enhancement New feature or request label Jun 25, 2026

chatgpt-codex-connector Bot reviewed Jun 25, 2026

bong-water-water-bong wants to merge 4 commits into lemonade-sdk:main from bong-water-water-bong:feat/mlx-backend-10.8.1

2 participants
