Feat/low end models#26

Merged
Aatricks merged 3 commits into main from feat/low-end-models on Jun 3, 2026

@Aatricks Aatricks commented Jun 3, 2026

Owner

Added support for Bonsai conversion to GGUF and fixed CI issues.

Aatricks and others added 3 commits June 2, 2026 23:52
… tokenizer

Wires the Bonsai adapter end to end so ModelSpec.safetensors("deepgrove/Bonsai",
adapter = ConversionAdapter.BONSAI_QLINEAR) converts on-device — no host tool.

convert_llama_dir gains an `adapter` argument selecting a conversion profile:
  - "bonsai-qlinear": for each weight with a sibling `<name>.scales`, fold the
    per-output-row scale into the weight in f32 (before the Q/K permute + f16
    cast, matching the host's full-precision fold), then drop the scales; and
    bake the Llama-style tokenizer.
  - "" / "none": unchanged stock path (GPT2-BPE tokenizer via tokenizerPre).
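
The scale fold described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the helper name `fold_row_scales`, the row-major layout, and the reading of "fold" as a plain per-row multiplication are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): multiply each output row of a
// row-major [rows x cols] f32 weight matrix by that row's scale, in place.
// Assumes exactly one scale per output row. Doing this in f32 before any
// f16 cast keeps the product at full precision.
void fold_row_scales(std::vector<float>& weight,
                     const std::vector<float>& scales,
                     std::size_t rows, std::size_t cols) {
    assert(weight.size() == rows * cols);
    assert(scales.size() == rows);
    for (std::size_t r = 0; r < rows; ++r) {
        const float s = scales[r];
        for (std::size_t c = 0; c < cols; ++c) {
            weight[r * cols + c] *= s;
        }
    }
}
```

After the fold the `.scales` tensor carries no extra information, which is presumably why the adapter can drop it from the output.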

tokenizer_bake gains bake_llama_tokenizer: emits tokenizer.ggml.model="llama",
tokens (verbatim SentencePiece vocab from tokenizer.json), constant -1000
scores, token_type (BYTE for <0xNN>, CONTROL for special, else NORMAL), the
special ids/flags, pre="default", no merges — the family Bonsai (LlamaTokenizer
without a tokenizer.model) lands in.
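
The token_type rule above can be sketched like this. The names `TokenType`, `is_byte_token`, and `classify_token` are hypothetical, and checking the byte pattern before the special flag is an assumption about precedence. The numeric values follow the GGUF token-type convention (NORMAL = 1, CONTROL = 3, BYTE = 6).

```cpp
#include <cctype>
#include <string>

// Subset of the GGUF token types used by the rule in the commit message.
enum class TokenType : int { Normal = 1, Control = 3, Byte = 6 };

// True for byte-fallback tokens of the exact form "<0xNN>" (two hex digits).
bool is_byte_token(const std::string& t) {
    return t.size() == 6 && t.compare(0, 3, "<0x") == 0 && t[5] == '>' &&
           std::isxdigit(static_cast<unsigned char>(t[3])) != 0 &&
           std::isxdigit(static_cast<unsigned char>(t[4])) != 0;
}

// Hypothetical classifier: BYTE for <0xNN>, CONTROL for special tokens,
// NORMAL otherwise. `is_special` is assumed to come from tokenizer.json.
TokenType classify_token(const std::string& text, bool is_special) {
    if (is_byte_token(text)) return TokenType::Byte;
    if (is_special) return TokenType::Control;
    return TokenType::Normal;
}
```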

Threaded through nativeConvertSafetensors + SmolLM.convertSafetensorsToGguf +
DefaultModelRepository (which now skips the tokenizerPre fail-fast for the Bonsai
adapter, since it bakes a self-contained tokenizer).

Verified against the host tool (tools/safetensors-convert --adapter bonsai-qlinear)
on the real deepgrove/Bonsai checkpoint:
  - compare_gguf.py: 147/147 tensors match (proves the f32 fold + arch map).
  - compare_tokenizer_kv.py: 12/12 tokenizer KVs match (model=llama, scores,
    token_type, tokens, ids).
  - generation: both the host ref and the on-device-converted GGUF emit
    "Paris. Paris is the capital of France..." (greedy).
  - B2ConvertE2ETest: full resolve(safetensorsLocal(Bonsai, adapter=BONSAI_QLINEAR))
    → convert → load → generate "Paris" through the production path.
Model-package unit tests stay green (ConvertedModelResolveTest updated: the
stock-spec-without-tokenizerPre case still fails fast with instructions).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

usage.md "Converting safetensors models": Bonsai (ConversionAdapter.BONSAI_QLINEAR)
now converts on-device (no tokenizerPre needed); only non-Llama arches / other
tokenizer families / sharded safetensors still need the host tool. B2 plan doc
marks the Bonsai adapter done + verified.

CI "Build Native Libraries" failed under GCC/libstdc++ with
"'uint16_t' does not name a type" in convert/hf_to_gguf.h — the header declares
uint16_t f32_to_f16(...) and a size_t-returning function but only included
<string>. macOS clang/libc++ pulls <cstdint> in transitively; GCC/libstdc++ does
not. Add <cstdint> + <cstddef> to the header, and add an explicit <cstdint> to
smollm_jni_convert.cpp (uses uint32_t).
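
A header that names fixed-width or size types should include their defining headers itself, along these lines. This is only a sketch: `f16_buffer_bytes` and `f32_to_f16_decl_only` are hypothetical names, and the real parameter list of `f32_to_f16` is not shown in this PR text.

```cpp
#include <cstddef>  // std::size_t
#include <cstdint>  // std::uint16_t, std::uint32_t
#include <string>

// Hypothetical helper (not from the PR): bytes needed to store n f16 values.
inline std::size_t f16_buffer_bytes(std::size_t n) {
    return n * sizeof(std::uint16_t);
}

// Hypothetical declaration shaped like the one the commit mentions.
// Declared only, never called here, so no definition is needed.
std::uint16_t f32_to_f16_decl_only(float value);
```

With both includes spelled out, the header no longer depends on which standard library happens to pull in `<cstdint>` transitively.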

Verified clean under real GCC 15.2 (g++ -std=c++17 -fsyntax-only) for all four
convert/*.cpp; the other three convert headers already included <cstdint>.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Aatricks Aatricks self-assigned this Jun 3, 2026
@Aatricks Aatricks merged commit 4131a09 into main Jun 3, 2026
1 check passed
@Aatricks Aatricks linked an issue Jun 3, 2026 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

bonsai 1 bit image and LLM support Microsoft lens also
