Skip to content

feat: implement safe model pinning to prevent auto-eviction#2226

Merged
fl0rianr merged 4 commits into
lemonade-sdk:mainfrom
abn:feature/model-pinning
Jun 16, 2026
Merged

feat: implement safe model pinning to prevent auto-eviction#2226
fl0rianr merged 4 commits into
lemonade-sdk:mainfrom
abn:feature/model-pinning

Conversation

@abn

@abn abn commented Jun 13, 2026

Copy link
Copy Markdown
Contributor
  • Exclude pinned models from the LRU auto-eviction policy.
  • Fail loads with 400 Bad Request (slots_pinned_error) when all slots are pinned.
  • Register /api/v1/pin endpoint to dynamically toggle pin states.
  • Expose pinned status in /health response and lemonade status outputs.
  • Add lemonade pin/unpin CLI subcommands and --pinned options.
  • Check slots capacity pre-emptively on CLI load commands.
  • Support Pin Model load checkbox and sidebar dynamic toggle buttons in the desktop app and web UI.

@github-actions github-actions Bot added enhancement New feature or request area::api HTTP REST API surface and route handlers app web ui labels Jun 13, 2026
@jeremyfowers jeremyfowers requested a review from kpoineal June 14, 2026 17:02
@jeremyfowers

Copy link
Copy Markdown
Member

icymi: the existing GUI is being fully replaced by the GUI working group, led by @kpoineal. He has discretion for whether he wants to let anything else merge into the old GUI.

@abn

abn commented Jun 14, 2026

Copy link
Copy Markdown
Contributor Author

@jeremyfowers I did miss it; have not been actively following the updates - my bad. However, is the feature itself something of interest? If it is, I can separate out the GUI change and leave it in a separate PR so a decision on it does not block the feature itself.

@kpoineal would be amazing to get your 2c on here as well.

@abn abn force-pushed the feature/model-pinning branch from 8640864 to d0db27f Compare June 14, 2026 20:03
@abn

abn commented Jun 14, 2026

Copy link
Copy Markdown
Contributor Author

Preemptively separated out the server, cli and ui commits.

@kpoineal

Copy link
Copy Markdown
Collaborator

Thanks for the heads up @jeremyfowers. We're good letting this one merge — model pinning is a valuable backend capability and the C++ / API work belongs on main regardless. The new UI working group will implement the pin/unpin toggles natively in the prototype once this lands. Approving for main. ✅

@kpoineal

Copy link
Copy Markdown
Collaborator

Hey @abn — really appreciate this feature, model pinning is exactly the kind of thing that makes the multi-model workflow feel solid. Once this lands in main you're very welcome to port the UI side over to our kpoin/ui-testing branch (the new web UI working group branch) if you're interested. Would love to have it in the new UI.

@abn abn force-pushed the feature/model-pinning branch from d0db27f to d4f0421 Compare June 14, 2026 22:02
@abn

abn commented Jun 14, 2026

Copy link
Copy Markdown
Contributor Author

@kpoineal awesome thanks for that - born out of personal pain 😄. Once this lands on main, I can try make the change on the ui branch. Looking forward to the new UI 🎉

@kpoineal

Copy link
Copy Markdown
Collaborator

@kpoineal awesome thanks for that - born out of personal pain 😄. Once this lands on main, I can try make the change on the ui branch. Looking forward to the new UI 🎉

You can actually try it now if you want :D it's in the kpoin/ui-testing branch under the prototype folder. There's a readme with instructions on how to get it started. Once we're happy with it we will move it into the app, but for now it's side by side.

@abn

abn commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

@kpoineal created #2249 as a companion PR for this.

@jeremyfowers jeremyfowers left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to review the new endpoint before this merges

@jeremyfowers jeremyfowers added this to the Lemonade v10.8 milestone Jun 15, 2026
@jeremyfowers jeremyfowers requested a review from fl0rianr June 15, 2026 16:53

@fl0rianr fl0rianr left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this — model pinning is a useful capability, I like this. But I think we should fix a few semantics issues before merging:

  1. pinned currently only protects the LRU slot-eviction path. The existing EvictionEngine can still downsize or evict pinned models on idle timeout / VRAM pressure because it does not check server->is_pinned(). If the user-facing contract is “prevent auto-eviction”, pinned models should be skipped there too.

  2. The pin state is lost in some reload/retry paths. The normal load path sets new_server->set_pinned(pinned), but the retry server does not get the pinned flag. Watchdog reloads also call load_model(...) without preserving an existing pin state.

  3. /pin should validate that pinned is present and boolean rather than defaulting missing pinned to true, and should ideally return the same structured error shape as other API endpoints.

  4. The CLI load path removed the defensive validation for downloaded but still calls model_info["downloaded"].get().

Could we add tests for pinned models under LRU capacity, idle/VRAM auto-eviction, retry reload, and /pin validation?

@ckuethe

ckuethe commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Bikeshed: maybe use 409 or 422 when all slots are full, rather than 400? The request wasn't malformed or malicious, but it's a request that the server isn't willing to do.

The HTTP 409 Conflict response status code indicates a request conflict with the current state of the target resource.

Conflicts are most likely to occur in response to a PUT request. For example, you may get a 409 response when uploading a file that is older than the existing one on the server, resulting in a version control conflict.

or

The HTTP 422 Unprocessable Content response status code indicates that the server understands the content type of the request entity, and the syntax of the request entity is correct, but it was unable to process the contained instructions.

@abn abn force-pushed the feature/model-pinning branch from 51fe765 to 2056d4c Compare June 15, 2026 19:12
@abn

abn commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

@fl0rianr thank you for the review.

  1. Eviction Engine Evictions & Downsizing:

    • Updated EvictionEngine::evaluate_servers (src/cpp/server/eviction_engine.cpp) to check server->is_pinned(). Pinned models are now completely skipped and protected from both idle timeout downsizing (Tier 1) and full unloads under VRAM pressure (Tier 2).
  2. Pin State Preservation on Reloads/Watchdogs:

    • Updated Router::load_model to extract and preserve the pinned state of the existing dead server before it gets pruned by prune_unavailable_servers_locked().
    • The preserved pin state is now safely forwarded when initializing the replacement server instance in both normal retry/reload paths and watchdog reloads (reload_model_after_watchdog_reset).
  3. API Validation & Structured Error Responses:

    • Standardized REST payload validation in /api/v1/pin (src/cpp/server/server.cpp). It now strictly validates that pinned is present and is a boolean, returning standard structured JSON errors (type invalid_request_error, code slots_pinned_error) on failure instead of defaulting to true.
    • Updated the CLI load handler (handle_load_command) to defensively check for the "downloaded" key to handle any cases where it might not be present in the server's response.
  4. Integration Testing:

    • Added a new test suite (test/server_pinning.py) verifying all pinning behavior under the Eviction Engine, watchdog recovery, idle timer bypasses, slot constraints, and validation errors.
    • Fixed a minor path bug in the existing test/server_eviction.py tests where internal endpoint requests were prepending /api/v1 and throwing 404s.

@abn abn force-pushed the feature/model-pinning branch 2 times, most recently from 06765c6 to b36481a Compare June 15, 2026 19:22
@abn

abn commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

Bikeshed: maybe use 409 or 422 when all slots are full, rather than 400? The request wasn't malformed or malicious, but it's a request that the server isn't willing to do.

@ckuethe fair point. Updated to use 409.

@jeremyfowers jeremyfowers left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the new API to be /internal as discussed https://discordapp.com/channels/1392562559122407535/1516171356733968484/1516171364157882378

After that @fl0rianr should approve as well and then we can merge :)

@abn abn force-pushed the feature/model-pinning branch from b36481a to 7d133de Compare June 15, 2026 20:33
@fl0rianr

Copy link
Copy Markdown
Collaborator

I think there is still one correctness issue around pin semantics before I can approve.

/load currently defaults missing pinned to false, and Router::load_model() applies the computed final_pinned even when the model is already loaded. The CLI also always sends "pinned": false for a plain lemonade load MODEL.

That means a model pinned via lemonade pin or lemonade load --pinned can be silently unpinned by a later plain lemonade load MODEL. Since pin/unpin are explicit state-changing operations, I’d expect a load without explicit pin intent to preserve the existing pin state.

Could we distinguish “pinned omitted” from “pinned explicitly false”, and add a regression test for:

  1. load/pin model so /health reports pinned: true
  2. call /load again without explicit pin intent
  3. verify it remains pinned
  4. explicitly unpin and verify it becomes unpinned

The /internal/pin move looks good; the remaining public docs reference to /v1/pin should also be cleaned up, but the load/idempotency behavior is the main thing I’d like fixed before approval.

Comment thread docs/api/lemonade.md Outdated
@abn abn force-pushed the feature/model-pinning branch from 7d133de to 8f11bed Compare June 15, 2026 21:02
@abn

abn commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

@fl0rianr addressed your comments.

@jeremyfowers jeremyfowers left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the feedback @abn !

I think this will be a very popular feature.

@fl0rianr fl0rianr left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the follow-up fixes — this addresses my remaining concern. Great!

I noticed a couple of small follow-ups that do not block my approval:

  • the multi-model docs still mention 400 Bad Request for slots_pinned_error, while the implementation/tests use 409;
  • pinned_models may still be worth aligning with the same filtering used by slot accounting.

But the correctness issues I raised are addressed. Approving from my side.

@jeremyfowers jeremyfowers enabled auto-merge June 15, 2026 21:14
abn added 4 commits June 15, 2026 23:27
- Exclude pinned models from the LRU auto-eviction policy and from idle timeout/VRAM downsizing in EvictionEngine.
- Throw SlotsPinnedException (returning 409 Conflict with slots_pinned_error) when all loaded model slots of a type are occupied by pinned models.
- Register quad-prefixed /api/v1/pin REST endpoint to dynamically toggle pinned status.
- Expose pinned status in /api/v1/health and all model status outputs.
- Document /pin API endpoints and multi-model configuration concepts.
- Preserve pinning status in reload and watchdog-crashed recovery flows.
- Implement robust /pin schema validation and standardized error payloads.
- Update CLI client SDK to serialize 'pinned' parameter and make requests to /pin.
- Add 'lemonade pin <model>' and 'lemonade unpin <model>' subcommands.
- Add '--pinned' flag to 'load' and 'run' subcommands.
- Check capacity pre-emptively on load commands and print warning on stderr.
- Document pin/unpin commands and --pinned options in the CLI reference.
- Handle missing "downloaded" field safely in load response.
- Export PinIcon component and import it in ModelManager.
- Track pinned models state using a react state Set in ModelManager.
- Add handleTogglePin callback sending requests to /pin and refreshing loaded status.
- Render dynamic pin toggle buttons next to eject buttons in the active models sidebar.
- Add 'pinned' property to all recipe options in recipeOptionsConfig to display the checkbox in the load options modal.
- Update sidebar styles to fit action buttons side-by-side cleanly.
- Implement a 5-case integration test suite for model pinning (LRU slots, VRAM eviction, watchdog reloads, idle timers, and API validation).
- Fix route prefixes in test/server_eviction.py to strip /api/v1 for root /internal/* routes.
- Expect 409 Conflict status code on slots pinned loading failures.
auto-merge was automatically disabled June 15, 2026 21:39

Head branch was pushed to by a user without write access

@abn abn force-pushed the feature/model-pinning branch from 8f11bed to 84912d3 Compare June 15, 2026 21:39
@fl0rianr fl0rianr enabled auto-merge June 15, 2026 21:48
@fl0rianr fl0rianr added this pull request to the merge queue Jun 15, 2026
Merged via the queue into lemonade-sdk:main with commit 13b9de6 Jun 16, 2026
66 checks passed
@abn abn deleted the feature/model-pinning branch June 16, 2026 08:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

app area::api HTTP REST API surface and route handlers enhancement New feature or request web ui

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants