10.8.1 breaks MTP with external MTP heads on user models #2435

Description

@alucryd

Platform

Linux/Arch

Lemonade Version

10.8.1

GPU / APU Model

AMD Ryzen AI Max 395+

Component

llama.cpp

Bug Description

As of 10.8.1, Lemonade forbids passing -md to llama.cpp but provides no way to specify MTP heads in user_models.json. This makes it impossible to use a custom Gemma 4, or any other model with a separate MTP head, as a user model.

Steps to Reproduce

  1. Download the QAT versions of Gemma 4 from Unsloth
  2. Download the associated MTP checkpoints
  3. Try to load the model in Lemonade as a user model, passing the MTP head to llama.cpp with -md
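
For reference, the steps above amount to asking Lemonade to start llama.cpp with both a main model and a separate head file. A minimal sketch of the equivalent argument list follows. It assumes the llama.cpp server binary is named llama-server and that the head is passed through -md (the draft-model flag) as this report describes; the helper name and the file names are hypothetical placeholders, not the actual Unsloth file names.

```python
import shlex


def build_llama_server_cmd(model_path: str, mtp_head_path: str) -> list[str]:
    """Build an argv list for llama-server with a separate head file.

    Hypothetical helper for illustration; it only assembles arguments
    and does not start a server.
    """
    return [
        "llama-server",
        "-m", model_path,      # main model GGUF
        "-md", mtp_head_path,  # separate head, passed as the draft model
    ]


# Placeholder file names (assumptions, not real download names).
cmd = build_llama_server_cmd("gemma-4-qat.gguf", "gemma-4-mtp.gguf")
print(shlex.join(cmd))
```

It is the -md argument in this list that Lemonade 10.8.1 refuses to forward for user models.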

Expected vs Actual Behavior

Expected: Lemonade should allow passing -md with user models, or at least use the same format in both server_models.json and user_models.json, so there is at least one way to pass an MTP head.

Actual: Lemonade rejects -md, and user_models.json offers no way to specify the head.

Log Output

Additional Context

No response

Labels

documentation: Improvements or additions to documentation
engine::llamacpp: llama.cpp backend (LlamaCppServer); GPU/CPU LLM inference (Vulkan, ROCm, Metal)
