Feat/low end models by Aatricks · Pull Request #26 · Aatricks/llmedge

Aatricks · 2026-06-03T04:07:09Z

Added support for Bonsai conversion to GGUF and fixed CI issues

… tokenizer

Wires the Bonsai adapter end to end so ModelSpec.safetensors("deepgrove/Bonsai", adapter = ConversionAdapter.BONSAI_QLINEAR) converts on-device, with no host tool.

convert_llama_dir gains an `adapter` argument selecting a conversion profile:
- "bonsai-qlinear": for each weight with a sibling `<name>.scales`, fold the per-output-row scale into the weight in f32 (before the Q/K permute + f16 cast, matching the host's full-precision fold), then drop the scales; and bake the Llama-style tokenizer.
- "" / "none": unchanged stock path (GPT2-BPE tokenizer via tokenizerPre).

tokenizer_bake gains bake_llama_tokenizer: emits tokenizer.ggml.model="llama", tokens (verbatim SentencePiece vocab from tokenizer.json), constant -1000 scores, token_type (BYTE for <0xNN>, CONTROL for special, else NORMAL), the special ids/flags, pre="default", no merges. This is the family Bonsai (LlamaTokenizer without a tokenizer.model) lands in.

Threaded through nativeConvertSafetensors + SmolLM.convertSafetensorsToGguf + DefaultModelRepository (which now skips the tokenizerPre fail-fast for the Bonsai adapter, since it bakes a self-contained tokenizer).

Verified against the host tool (tools/safetensors-convert --adapter bonsai-qlinear) on the real deepgrove/Bonsai checkpoint:
- compare_gguf.py: 147/147 tensors match (proves the f32 fold + arch map).
- compare_tokenizer_kv.py: 12/12 tokenizer KVs match (model=llama, scores, token_type, tokens, ids).
- generation: both the host ref and the on-device-converted GGUF emit "Paris. Paris is the capital of France..." (greedy).
- B2ConvertE2ETest: full resolve(safetensorsLocal(Bonsai, adapter=BONSAI_QLINEAR)) → convert → load → generate "Paris" through the production path.

Model-package unit tests stay green (ConvertedModelResolveTest updated: the stock-spec-without-tokenizerPre case still fails fast with instructions).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
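For readers who want a concrete picture of the "bonsai-qlinear" fold described above, here is a minimal sketch. It is not the PR's code: the function name `fold_row_scales` is hypothetical, and it assumes a row-major weight of shape [rows x cols] with one f32 scale per output row, where folding means multiplying each row by its scale. The actual tensor layout and scale shape in the converter may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): fold one scale per output row into
// a row-major [rows x cols] weight matrix, in place, entirely in f32.
// Assumption: "fold" means weight[r][c] *= scales[r].
void fold_row_scales(std::vector<float>& weight,
                     const std::vector<float>& scales,
                     std::size_t rows, std::size_t cols) {
    // Guard the assumed shapes before touching any data.
    assert(weight.size() == rows * cols);
    assert(scales.size() == rows);

    for (std::size_t r = 0; r < rows; ++r) {
        const float s = scales[r];
        for (std::size_t c = 0; c < cols; ++c) {
            weight[r * cols + c] *= s;
        }
    }
}
```

Doing the multiplication in f32 and only afterwards applying any permutation and the f16 cast mirrors the ordering the commit message calls out; once the fold is done, the `<name>.scales` tensor carries no further information and can be dropped.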

usage.md "Converting safetensors models": Bonsai (ConversionAdapter.BONSAI_QLINEAR) now converts on-device (no tokenizerPre needed); only non-Llama arches / other tokenizer families / sharded safetensors still need the host tool. B2 plan doc marks the Bonsai adapter done + verified.

CI "Build Native Libraries" failed under GCC/libstdc++ with "'uint16_t' does not name a type" in convert/hf_to_gguf.h: the header declares uint16_t f32_to_f16(...) and a size_t-returning function but only included <string>. macOS clang/libc++ pulls <cstdint> in transitively; GCC/libstdc++ does not.

Add <cstdint> + <cstddef> to the header, and add an explicit <cstdint> to smollm_jni_convert.cpp (uses uint32_t). Verified clean under real GCC 15.2 (g++ -std=c++17 -fsyntax-only) for all four convert/*.cpp; the other three convert headers already included <cstdint>.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
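The include fix above follows the usual "include what you name" rule. The sketch below illustrates the pattern only; it is not the contents of convert/hf_to_gguf.h, and `f16_buffer_bytes` is a hypothetical helper invented for the example.

```cpp
// Sketch of a self-sufficient header: every type it names is backed by an
// explicit include, so it does not depend on what <string> happens to pull in.
#include <cstddef>  // std::size_t
#include <cstdint>  // std::uint16_t, std::uint32_t
#include <string>

// Hypothetical helper (not from the PR): bytes needed to store n f16 values,
// each held in a 16-bit integer.
inline std::size_t f16_buffer_bytes(std::size_t n_elements) {
    return n_elements * sizeof(std::uint16_t);
}
```

A header written this way can be checked on its own with a syntax-only compile such as the `g++ -std=c++17 -fsyntax-only` invocation mentioned in the commit message, which catches missing includes regardless of which standard library the CI runner uses.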

Aatricks and others added 3 commits June 2, 2026 23:52

Aatricks self-assigned this Jun 3, 2026

Aatricks merged commit 4131a09 into main Jun 3, 2026
1 check passed

Aatricks linked an issue Jun 3, 2026 that may be closed by this pull request

bonsai 1 bit image and LLM support Microsoft lens also #24

Closed

Aatricks merged 3 commits into main from feat/low-end-models
