Feat/low end models #26
Merged
… tokenizer
Wires the Bonsai adapter end to end so ModelSpec.safetensors("deepgrove/Bonsai",
adapter = ConversionAdapter.BONSAI_QLINEAR) converts on-device, with no host tool.
convert_llama_dir gains an `adapter` argument selecting a conversion profile:
- "bonsai-qlinear": for each weight with a sibling `<name>.scales`, fold the
per-output-row scale into the weight in f32 (before the Q/K permute + f16
cast, matching the host's full-precision fold), then drop the scales; and
bake the Llama-style tokenizer.
- "" / "none": unchanged stock path (GPT2-BPE tokenizer via tokenizerPre).
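
The "bonsai-qlinear" fold can be sketched roughly as below. This is an illustration, not the converter's code: it assumes the weight is row-major [rows x cols], already decoded to f32, with exactly one scale per output row, and that folding means multiplying each row by its scale. The name `fold_row_scales` is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not from the PR): multiply every element of output
// row r by scales[r]. Assumes `weight` is row-major [rows x cols] and is
// already f32, so the fold runs at full precision before any permute or
// f16 cast is applied.
inline void fold_row_scales(std::vector<float>& weight,
                            const std::vector<float>& scales,
                            std::size_t rows, std::size_t cols) {
    if (weight.size() != rows * cols || scales.size() != rows) {
        throw std::invalid_argument("shape mismatch between weight and scales");
    }
    for (std::size_t r = 0; r < rows; ++r) {
        const float s = scales[r];
        for (std::size_t c = 0; c < cols; ++c) {
            weight[r * cols + c] *= s;
        }
    }
}
```

Folding in f32 first and casting to f16 afterwards rounds each product once; scaling values that were already cast to f16 would round twice and could diverge from the host output.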
tokenizer_bake gains bake_llama_tokenizer: emits tokenizer.ggml.model="llama",
tokens (verbatim SentencePiece vocab from tokenizer.json), constant -1000
scores, token_type (BYTE for <0xNN>, CONTROL for special, else NORMAL), the
special ids/flags, pre="default", and no merges. This is the tokenizer family
that Bonsai (LlamaTokenizer without a tokenizer.model) falls into.
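
The token_type rule above can be sketched as follows. The names `TokenType`, `is_byte_token` and `classify_token` are hypothetical, the numeric values follow llama.cpp's token-type numbering (NORMAL = 1, CONTROL = 3, BYTE = 6), and checking the byte pattern before the special flag is an assumption taken from the order in which the rule is stated.

```cpp
#include <cctype>
#include <string>

// Hypothetical enum; values mirror llama.cpp's token-type numbering.
enum class TokenType : int { Normal = 1, Control = 3, Byte = 6 };

// True only for tokens of the exact form "<0xNN>" with two hex digits.
inline bool is_byte_token(const std::string& t) {
    return t.size() == 6 && t.compare(0, 3, "<0x") == 0 && t[5] == '>' &&
           std::isxdigit(static_cast<unsigned char>(t[3])) &&
           std::isxdigit(static_cast<unsigned char>(t[4]));
}

// `is_special` is assumed to come from the special/added-token flags in
// tokenizer.json; how the PR derives it is not shown here.
inline TokenType classify_token(const std::string& token, bool is_special) {
    if (is_byte_token(token)) return TokenType::Byte;
    if (is_special) return TokenType::Control;
    return TokenType::Normal;
}
```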
Threaded through nativeConvertSafetensors + SmolLM.convertSafetensorsToGguf +
DefaultModelRepository (which now skips the tokenizerPre fail-fast for the Bonsai
adapter, since it bakes a self-contained tokenizer).
Verified against the host tool (tools/safetensors-convert --adapter bonsai-qlinear)
on the real deepgrove/Bonsai checkpoint:
- compare_gguf.py: 147/147 tensors match (proves the f32 fold + arch map).
- compare_tokenizer_kv.py: 12/12 tokenizer KVs match (model=llama, scores,
token_type, tokens, ids).
- generation: both the host ref and the on-device-converted GGUF emit
"Paris. Paris is the capital of France..." (greedy).
- B2ConvertE2ETest: full resolve(safetensorsLocal(Bonsai, adapter=BONSAI_QLINEAR))
→ convert → load → generate "Paris" through the production path.
Model-package unit tests stay green (ConvertedModelResolveTest updated: the
stock-spec-without-tokenizerPre case still fails fast with instructions).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
usage.md "Converting safetensors models": Bonsai (ConversionAdapter.BONSAI_QLINEAR) now converts on-device (no tokenizerPre needed); only non-Llama arches / other tokenizer families / sharded safetensors still need the host tool. B2 plan doc marks the Bonsai adapter done + verified.
CI "Build Native Libraries" failed under GCC/libstdc++ with "'uint16_t' does not name a type" in convert/hf_to_gguf.h: the header declares uint16_t f32_to_f16(...) and a size_t-returning function but only included <string>. macOS clang/libc++ pulls <cstdint> in transitively; GCC/libstdc++ does not.
Add <cstdint> + <cstddef> to the header, and add an explicit <cstdint> to smollm_jni_convert.cpp (uses uint32_t).
Verified clean under real GCC 15.2 (g++ -std=c++17 -fsyntax-only) for all four convert/*.cpp; the other three convert headers already included <cstdint>.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
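
A minimal sketch of the include fix, assuming f32_to_f16 takes a single float (the real signature is elided above). The function body and the `f16_buffer_bytes` helper are illustrative only and are not the project's implementation; the point is that every fixed-width and size type the header names comes from an include in the header itself.

```cpp
#include <cstddef>  // std::size_t
#include <cstdint>  // std::uint16_t, std::uint32_t, std::int32_t
#include <cstring>  // std::memcpy
#include <string>   // the only include the header had before the fix

// Illustrative f32 -> f16 conversion with round-to-nearest-even.
inline std::uint16_t f32_to_f16(float value) {
    std::uint32_t x;
    std::memcpy(&x, &value, sizeof x);
    const std::uint32_t sign = (x >> 16) & 0x8000u;
    const std::uint32_t exp = (x >> 23) & 0xFFu;
    std::uint32_t mant = x & 0x7FFFFFu;
    if (exp == 0xFFu) {  // infinity or NaN
        return static_cast<std::uint16_t>(sign | 0x7C00u | (mant ? 0x200u : 0u));
    }
    const std::int32_t e = static_cast<std::int32_t>(exp) - 127 + 15;
    if (e >= 31) {  // too large for f16: becomes infinity
        return static_cast<std::uint16_t>(sign | 0x7C00u);
    }
    if (e <= 0) {  // f16 subnormal or zero
        if (e < -10) return static_cast<std::uint16_t>(sign);
        mant |= 0x800000u;  // restore the implicit leading 1
        const std::uint32_t shift = static_cast<std::uint32_t>(14 - e);
        std::uint32_t half = mant >> shift;
        const std::uint32_t rem = mant & ((1u << shift) - 1u);
        const std::uint32_t tie = 1u << (shift - 1u);
        if (rem > tie || (rem == tie && (half & 1u))) ++half;
        return static_cast<std::uint16_t>(sign | half);
    }
    std::uint32_t half = (static_cast<std::uint32_t>(e) << 10) | (mant >> 13);
    const std::uint32_t rem = mant & 0x1FFFu;
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u))) ++half;
    return static_cast<std::uint16_t>(sign | half);
}

// Hypothetical size_t-returning helper: bytes needed for n f16 values.
inline std::size_t f16_buffer_bytes(std::size_t n) {
    return n * sizeof(std::uint16_t);
}
```

A syntax-only pass such as the `g++ -std=c++17 -fsyntax-only` check mentioned above is a cheap way to catch a missing include on a toolchain whose standard headers pull in less transitively.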
Added support for converting Bonsai to GGUF and fixed CI issues.