feat: implement safe model pinning to prevent auto-eviction #2226
abn commented Jun 13, 2026
- Exclude pinned models from the LRU auto-eviction policy.
- Fail loads with 400 Bad Request (slots_pinned_error) when all slots are pinned.
- Register /api/v1/pin endpoint to dynamically toggle pin states.
- Expose pinned status in /health response and lemonade status outputs.
- Add lemonade pin/unpin CLI subcommands and --pinned options.
- Check slots capacity pre-emptively on CLI load commands.
- Support Pin Model load checkbox and sidebar dynamic toggle buttons in the desktop app and web UI.
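The eviction rule in the bullets above could be sketched roughly as follows. This is a minimal Python illustration, not the PR's C++ code: `SlotManager` and `SlotsPinnedError` are hypothetical names, and the sketch assumes a single pool of slots where the least recently used unpinned model is the eviction victim.

```python
from collections import OrderedDict
from typing import List


class SlotsPinnedError(Exception):
    """Hypothetical stand-in for the slots_pinned_error failure."""


class SlotManager:
    """Toy LRU slot manager that never evicts pinned models."""

    def __init__(self, max_slots: int) -> None:
        self.max_slots = max_slots
        # Maps model name -> pinned flag; order is least to most recently used.
        self._models: "OrderedDict[str, bool]" = OrderedDict()

    def set_pinned(self, name: str, pinned: bool) -> None:
        """Toggle the pin flag of an already loaded model."""
        if name not in self._models:
            raise KeyError(name)
        self._models[name] = pinned

    def load(self, name: str, pinned: bool = False) -> List[str]:
        """Load a model and return the names evicted to make room."""
        if name in self._models:
            # Already loaded: refresh recency, leave the pin flag untouched.
            self._models.move_to_end(name)
            return []
        evicted: List[str] = []
        if len(self._models) >= self.max_slots:
            # The oldest unpinned model is the only legal victim.
            victim = next((n for n, p in self._models.items() if not p), None)
            if victim is None:
                raise SlotsPinnedError("all slots are occupied by pinned models")
            del self._models[victim]
            evicted.append(victim)
        self._models[name] = pinned
        return evicted
```

With two slots, loading a third model evicts the unpinned one; once both remaining models are pinned, a further load fails instead of evicting anything.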
icymi: the existing GUI is being fully replaced by the GUI working group, led by @kpoineal. He has discretion for whether he wants to let anything else merge into the old GUI.
@jeremyfowers I did miss it; have not been actively following the updates - my bad. However, is the feature itself something of interest? If it is, I can separate out the GUI change and leave it in a separate PR so a decision on it does not block the feature itself. @kpoineal would be amazing to get your 2c on here as well.
8640864 to d0db27f
Preemptively separated out the server, cli and ui commits.
Thanks for the heads up @jeremyfowers. We're good letting this one merge: model pinning is a valuable backend capability and the C++ / API work belongs on main regardless. The new UI working group will implement the pin/unpin toggles natively in the prototype once this lands. Approving for main. ✅
Hey @abn, really appreciate this feature. Model pinning is exactly the kind of thing that makes the multi-model workflow feel solid. Once this lands in main you're very welcome to port the UI side over to our
d0db27f to d4f0421
@kpoineal awesome, thanks for that - born out of personal pain 😄. Once this lands on main, I can try to make the change on the ui branch. Looking forward to the new UI 🎉
You can actually try it now if you want :D it's in the kpoin/ui-testing branch under the prototype folder. There's a readme with instructions on how to get it started. Once we're happy with it we will move it into the app, but for now it's side by side.
252ac29 to 51fe765
jeremyfowers
left a comment
I want to review the new endpoint before this merges
fl0rianr
left a comment
Thanks for adding this — model pinning is a useful capability, I like this. But I think we should fix a few semantics issues before merging:
- `pinned` currently only protects the LRU slot-eviction path. The existing EvictionEngine can still downsize or evict pinned models on idle timeout / VRAM pressure because it does not check server->is_pinned(). If the user-facing contract is “prevent auto-eviction”, pinned models should be skipped there too.
- The pin state is lost in some reload/retry paths. The normal load path sets new_server->set_pinned(pinned), but the retry server does not get the pinned flag. Watchdog reloads also call load_model(...) without preserving an existing pin state.
- /pin should validate that `pinned` is present and boolean rather than defaulting missing `pinned` to `true`, and should ideally return the same structured error shape as other API endpoints.
- The CLI load path removed the defensive validation for `downloaded` but still calls model_info["downloaded"].get().
Could we add tests for pinned models under LRU capacity, idle/VRAM auto-eviction, retry reload, and /pin validation?
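As an illustration of the /pin validation point, strict checking might look like the sketch below. It is written in Python rather than the server's C++, and the `model_name` field, the `invalid_request` code, and the error envelope are all assumptions made for the example; the project's real request and error shapes may differ.

```python
from typing import Any, Dict, Tuple


def _error(status: int, code: str, message: str) -> Tuple[int, Dict[str, Any]]:
    # Assumed error envelope; the real server's shape may differ.
    return status, {"error": {"type": code, "message": message}}


def validate_pin_request(body: Any) -> Tuple[int, Dict[str, Any]]:
    """Validate a /pin request body and return (HTTP status, JSON payload)."""
    if not isinstance(body, dict):
        return _error(400, "invalid_request", "request body must be a JSON object")

    model = body.get("model_name")  # hypothetical field name
    if not isinstance(model, str) or not model:
        return _error(400, "invalid_request", "'model_name' must be a non-empty string")

    if "pinned" not in body:
        # A missing flag is rejected rather than silently treated as true.
        return _error(400, "invalid_request", "'pinned' is required")

    pinned = body["pinned"]
    # Only real booleans pass; 0/1 and strings such as "true" are rejected.
    if not isinstance(pinned, bool):
        return _error(400, "invalid_request", "'pinned' must be a boolean")

    return 200, {"model_name": model, "pinned": pinned}
```

Requiring the key to be present keeps an accidental empty body from pinning a model, which is the failure mode the review comment describes.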
Bikeshed: maybe use 409 or 422 when all slots are full, rather than 400? The request wasn't malformed or malicious, but it's a request that the server isn't willing to do.
51fe765 to 2056d4c
@fl0rianr thank you for the review.
06765c6 to b36481a
@ckuethe fair point. Updated to use
jeremyfowers
left a comment
Please change the new API to be /internal as discussed https://discordapp.com/channels/1392562559122407535/1516171356733968484/1516171364157882378
After that @fl0rianr should approve as well and then we can merge :)
b36481a to 7d133de
I think there is still one correctness issue around pin semantics before I can approve. /load currently defaults missing `pinned` to `false`. That means a model pinned via lemonade pin or lemonade load --pinned can be silently unpinned by a later plain lemonade load MODEL. Since pin/unpin are explicit state-changing operations, I’d expect a load without explicit pin intent to preserve the existing pin state. Could we distinguish “pinned omitted” from “pinned explicitly false”, and add a regression test for:
The /internal/pin move looks good; the remaining public docs reference to /v1/pin should also be cleaned up, but the load/idempotency behavior is the main thing I’d like fixed before approval.
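One way to distinguish an omitted flag from an explicit false is to carry the request value as an optional boolean. The Python sketch below is only an illustration of that idea; `requested_pin` and `resolve_pin_state` are hypothetical helpers, and type validation of the field is left out for brevity.

```python
from typing import Any, Dict, Optional


def requested_pin(body: Dict[str, Any]) -> Optional[bool]:
    """Return the explicit 'pinned' value, or None when the key is omitted."""
    return body.get("pinned")


def resolve_pin_state(existing: bool, requested: Optional[bool]) -> bool:
    """Decide the pin state after a load.

    existing:  current pin state (False if the model was not loaded).
    requested: value from the load request, None when it was omitted.
    """
    if requested is None:
        # No explicit intent: keep whatever state the model already had.
        return existing
    return requested


# A plain load keeps an existing pin; only an explicit false removes it.
print(resolve_pin_state(True, requested_pin({})))                   # True
print(resolve_pin_state(True, requested_pin({"pinned": False})))    # False
```

The design choice is that only the three-valued request (true, false, absent) reaches the state update, so a later plain load cannot be mistaken for an unpin.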
7d133de to 8f11bed
@fl0rianr addressed your comments.
jeremyfowers
left a comment
Thanks for taking the feedback @abn !
I think this will be a very popular feature.
fl0rianr
left a comment
Thanks for the follow-up fixes — this addresses my remaining concern. Great!
I noticed a couple of small follow-ups that do not block my approval:
- the multi-model docs still mention `400 Bad Request` for `slots_pinned_error`, while the implementation/tests use `409`;
- `pinned_models` may still be worth aligning with the same filtering used by slot accounting.
But the correctness issues I raised are addressed. Approving from my side.
- Exclude pinned models from the LRU auto-eviction policy and from idle timeout/VRAM downsizing in EvictionEngine.
- Throw SlotsPinnedException (returning 409 Conflict with slots_pinned_error) when all loaded model slots of a type are occupied by pinned models.
- Register quad-prefixed /api/v1/pin REST endpoint to dynamically toggle pinned status.
- Expose pinned status in /api/v1/health and all model status outputs.
- Document /pin API endpoints and multi-model configuration concepts.
- Preserve pinning status in reload and watchdog-crashed recovery flows.
- Implement robust /pin schema validation and standardized error payloads.

- Update CLI client SDK to serialize 'pinned' parameter and make requests to /pin.
- Add 'lemonade pin <model>' and 'lemonade unpin <model>' subcommands.
- Add '--pinned' flag to 'load' and 'run' subcommands.
- Check capacity pre-emptively on load commands and print warning on stderr.
- Document pin/unpin commands and --pinned options in the CLI reference.
- Handle missing "downloaded" field safely in load response.

- Export PinIcon component and import it in ModelManager.
- Track pinned models state using a react state Set in ModelManager.
- Add handleTogglePin callback sending requests to /pin and refreshing loaded status.
- Render dynamic pin toggle buttons next to eject buttons in the active models sidebar.
- Add 'pinned' property to all recipe options in recipeOptionsConfig to display the checkbox in the load options modal.
- Update sidebar styles to fit action buttons side-by-side cleanly.

- Implement a 5-case integration test suite for model pinning (LRU slots, VRAM eviction, watchdog reloads, idle timers, and API validation).
- Fix route prefixes in test/server_eviction.py to strip /api/v1 for root /internal/* routes.
- Expect 409 Conflict status code on slots pinned loading failures.
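The idle-timer case listed above comes down to skipping pinned models when choosing what to unload. A minimal Python sketch of that selection follows; it is not taken from the PR's test suite or from EvictionEngine, and `LoadedModel` and `idle_eviction_candidates` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LoadedModel:
    name: str
    pinned: bool
    idle_seconds: float


def idle_eviction_candidates(models: List[LoadedModel], idle_timeout: float) -> List[str]:
    """Return names of models that are idle past the timeout and not pinned."""
    return [
        m.name
        for m in models
        if not m.pinned and m.idle_seconds >= idle_timeout
    ]


models = [
    LoadedModel("chat", pinned=True, idle_seconds=900.0),
    LoadedModel("embed", pinned=False, idle_seconds=900.0),
    LoadedModel("vision", pinned=False, idle_seconds=10.0),
]
# Only the unpinned model that exceeded the timeout is selected.
print(idle_eviction_candidates(models, idle_timeout=300.0))  # ['embed']
```

A regression test in this spirit would assert that the pinned model stays loaded however long it has been idle.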
Head branch was pushed to by a user without write access
8f11bed to 84912d3