Platform
Linux/Arch
Lemonade Version
10.8.1
GPU / APU Model
AMD Ryzen AI Max 395+
Component
llama.cpp
Bug Description
Lemonade now rejects the -md argument for llama.cpp but provides no way to specify a separate head in user_models.json. As a result, a custom Gemma 4 model, or any other model that ships with a separate MTP head, can no longer be used.
Steps to Reproduce
- Download the QAT versions of Gemma 4 from Unsloth
- Download the associated MTP checkpoints
- Try to load the model, passing the MTP head with the -md argument
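For reference, the steps above aim at a llama.cpp invocation along the lines of the sketch below. It only builds and prints the command line; the file paths and the helper name `build_llama_server_cmd` are hypothetical placeholders, and the sketch assumes the MTP head is passed through llama.cpp's `-m` / `-md` options as described in this report.

```python
import shlex


def build_llama_server_cmd(model_path, draft_path, binary="llama-server"):
    """Return the argv for llama-server with a separate head passed via -md.

    The helper name and default binary name are illustrative assumptions.
    """
    return [binary, "-m", model_path, "-md", draft_path]


# Hypothetical paths: replace with the downloaded QAT model and MTP checkpoint.
cmd = build_llama_server_cmd("/models/gemma-qat.gguf", "/models/gemma-mtp.gguf")

# Print the command as a single shell-safe string.
print(shlex.join(cmd))
```

This is the argument pair that Lemonade currently refuses to forward for user models.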
Expected vs Actual Behavior
Lemonade should allow -md for user models, or at least use the same format in both server_models.json and user_models.json, so that there is at least one way to pass an MTP head.
Log Output
Additional Context
No response