FLUX.2 Klein 4B image gen + Bonsai ternary low-end (sequential) by Aatricks · Pull Request #27 · Aatricks/llmedge

Aatricks · 2026-06-03T18:57:44Z

On-device FLUX.2 Klein 4B image generation, plus PrismML Bonsai (ternary QAT) support tuned for low-end Android. Addresses the image half of the "bonsai 1-bit image" GitHub issue.

What's here

- FLUX.2 Klein 4B via stable-diffusion.cpp: a split model (DiT + Qwen3-4B encoder + VAE). New JNI slots diffusion_model_path + llm_path, plus a Flux2Klein helper. Targets phones with ~6–8 GB of RAM.
- Bonsai QAT → Q2_K: scripts/convert_bonsai_flux2_to_bfl.py converts Bonsai's diffusers transformer to the BFL layout; quantizing that to Q2_K gives a coherent ~1.3 GB DiT. (ggml's ternary tq1_0/tq2_0 types are too coarse for Bonsai's per-128 scales; Q2_K's per-16 sub-scales preserve quality.) Published as Aatricks/bonsai-image-ternary-4B-FLUX2-klein-GGUF and wired into Flux2Klein.bonsaiImageRequest.
- Sequential low-memory mode: run the Qwen3 encoder and the DiT in separate phases (precompute → free → generate) so peak RAM ≈ max(encoder, DiT) ≈ 2.6 GB instead of ~4.0 GB, which brings 4 GB phones into range. New sdcpp sd_precompute_condition / sd_generate_image_with_precomputed_condition, a FLUX.2 conditioner-skip, and an encoder-only Qwen3 handle; auto-orchestrated by ImageGenerationExecutor.
- Build fix: prepare_sdcpp_mods nameref bug (SD_ROOT_OVERRIDE never reached cmake); the stale mods/ overlay is now opt-in.
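The scale-granularity point in the Bonsai bullet can be illustrated with a toy round trip. This is only a sketch: it uses a plain absmax ternary quantizer with made-up scales, not ggml's actual tq1_0/tq2_0 or Q2_K formats, and it isolates a single effect, namely that one scale shared by 256 weights cannot represent two 128-weight groups trained with different scales, while a block size that divides 128 can.

```python
def ternary_roundtrip(weights, block_size):
    """Quantize to {-1, 0, +1} with one absmax scale per block, then dequantize."""
    out = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        scale = max(abs(w) for w in block)
        for w in block:
            # Guard against an all-zero block, then clamp to the ternary range.
            q = 0 if scale == 0 else max(-1, min(1, round(w / scale)))
            out.append(q * scale)
    return out


def max_abs_error(a, b):
    """Largest element-wise absolute difference between two equal-length lists."""
    return max(abs(x - y) for x, y in zip(a, b))


# Toy weights: 256 ternary values whose (made-up) trained scale changes
# every 128 weights, mimicking per-128 QAT scales.
pattern = [-1, 0, 1]
group_scales = [0.5, 2.0]
weights = [pattern[i % 3] * group_scales[i // 128] for i in range(256)]

err_256 = max_abs_error(weights, ternary_roundtrip(weights, 256))
err_128 = max_abs_error(weights, ternary_roundtrip(weights, 128))
err_16 = max_abs_error(weights, ternary_roundtrip(weights, 16))
print(err_256, err_128, err_16)
```

With a 256-wide block the small-scale group collapses to zero, whereas 128-wide and 16-wide blocks reproduce the toy weights exactly. Real Q2_K is a 2-bit format with its own scale and minimum encoding, so this shows the direction of the effect rather than its measured size.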

Verification (host JNI, no emulator)

- sd-cli + Flux2KleinLinuxE2ETest: the base, Bonsai-Q2K, and sequential paths all generate coherent images through the real JNI.
- ImageClientTest asserts split-model argument routing; the image unit suite is green.
- Measured peaks: encoder-only 2621 MB → freed → DiT-only 1393 MB (DiT 1298 MB + VAE 95 MB).
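The measured peaks line up with the ~2.6 GB vs ~4.0 GB figures quoted earlier. A minimal arithmetic sketch, assuming the all-resident peak is simply the sum of the per-component measurements (the PR reports that figure only as "~4.0 GB"); the function names are illustrative:

```python
# Host-measured component sizes in MB, taken from the PR's verification notes.
ENCODER_MB = 2621  # Qwen3 encoder-only phase
DIT_MB = 1298      # FLUX.2 Klein DiT (Bonsai Q2_K)
VAE_MB = 95        # VAE, resident alongside the DiT


def peak_all_resident(components):
    """Everything loaded at once: the peak is the sum of all components."""
    return sum(components)


def peak_sequential(phases):
    """Phases run one after another: the peak is the largest single phase."""
    return max(sum(phase) for phase in phases)


together = peak_all_resident([ENCODER_MB, DIT_MB, VAE_MB])
sequential = peak_sequential([[ENCODER_MB], [DIT_MB, VAE_MB]])
print(together, sequential)
```

The sequential peak is set entirely by the encoder phase, so shrinking the DiT further would not lower it; only a smaller encoder would.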

Notes

- Bumps the stable-diffusion.cpp submodule to a fork (Aatricks/stable-diffusion.cpp, branch llmedge-flux2-sequential) that carries the precomputed-condition API; upstream leejet doesn't have it.
- TQ types load and run on CPU, but Metal/Vulkan lack ternary kernels; the low-end CPU path is the target.
- The llmedge-examples demo-toggle edits are separate (not in this PR).

Add support for FLUX.2 Klein 4B, the distilled diffusion-transformer architecture behind PrismML's binary/ternary "Bonsai Image". Bonsai's own 1-bit/ternary weights ship only as MLX (Apple) and GemLite (CUDA) packings, neither of which loads on Android; this GGUF build via stable-diffusion.cpp is the Android-runnable equivalent.

FLUX.2 is a split model (separate diffusion transformer, Qwen3-4B text encoder, and VAE) rather than a single checkpoint, so the sdcpp JNI bridge gains diffusion_model_path + llm_path slots alongside the existing model_path/t5xxl_path. A new Flux2Klein helper + ImageGenerationRequest.splitDiffusionModel route the DiT to diffusion_model_path and the Qwen3 encoder to llm_path (model_path left empty) and offload weights to CPU.

- JNI: nativeCreate gains diffusionModelPath/llmPath (appended after preferredBackend in loadWithRuntimeBackend to preserve positional mock order)
- Kotlin: thread the two slots through the load request/support, runtime planner, and ImageClient request
- Flux2Klein: public preset (DiT + Qwen3-4B Q3_K_M encoder + VAE, CFG 1.0/4 steps)
- desktop CMake: compile the full sdcpp_jni_*.cpp split set so the host JNI library exports nativeCreate (enables host image/video E2E)
- tests: Flux2KleinLinuxE2ETest (real JNI split-model generation) + ImageClientTest split-routing assertion
- docs: README + docs/index.md

Verified on host (sd-cli reference + JNI E2E generate coherent images; full unit suite green). On-device viability is high-RAM-only and unverified; true low-end remains the deferred binary/ternary Bonsai conversion.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
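The routing rule in this commit (DiT to diffusion_model_path, encoder to llm_path, model_path left empty) can be sketched as follows. The real code is Kotlin and JNI; this Python version is only an illustration. The LoadPaths type, both function names, the vae_path and offload_to_cpu fields, and the file names are all hypothetical; only the model_path, diffusion_model_path, and llm_path slot names come from the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadPaths:
    """Hypothetical stand-in for the path slots passed to the native loader."""
    model_path: str = ""            # single-checkpoint slot (existing)
    diffusion_model_path: str = ""  # new slot: standalone DiT
    llm_path: str = ""              # new slot: Qwen3 text encoder
    vae_path: str = ""              # assumed slot name for the VAE
    offload_to_cpu: bool = False


def route_split_model(dit, encoder, vae):
    """Split model: each component gets its own slot, model_path stays empty."""
    return LoadPaths(
        model_path="",
        diffusion_model_path=dit,
        llm_path=encoder,
        vae_path=vae,
        offload_to_cpu=True,
    )


def route_single_checkpoint(checkpoint):
    """Classic single-file model: only model_path is populated."""
    return LoadPaths(model_path=checkpoint)


# Placeholder file names, not the published artifacts.
split = route_split_model("dit.gguf", "qwen3-encoder.gguf", "vae.safetensors")
single = route_single_checkpoint("sd15.safetensors")
print(split)
print(single)
```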

…e gen

Add scripts/convert_bonsai_flux2_to_bfl.py: converts PrismML Bonsai Image "-unpacked" transformers (dense bf16, Flux2KleinPipeline diffusers naming) into the BFL tensor layout stable-diffusion.cpp expects, so the QAT ternary DiT can be quantized to GGUF and run on-device.

The mapping is mostly renames; the only structural op is concatenating the separate double-block to_q/k/v (and add_{q,k,v}_proj) into the fused *_attn.qkv tensors (raw byte concat, row-major, no numpy/safetensors dep). 169 -> 149 tensors, oracle-matched to the base FLUX.2 Klein GGUF layout.

Verified end to end on the CPU JNI path (no emulator): convert -> sd -M convert --type q2_K -> loads as "Flux.2 klein" -> generates a coherent image at ~1.3 GB DiT (vs ~2.5 GB base Q4_0).

ggml's literal ternary types (tq1_0/tq2_0, ~0.8-1.0 GB) load and run on CPU, but their per-256-weight scale is too coarse for Bonsai's per-128 trained scales (degraded output); Q2_K's per-16 sub-block scales preserve quality. README documents the recipe.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
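The raw byte concat works because a row-major matrix stores its rows contiguously, so stacking matrices along the row axis is the same as appending their buffers. A minimal sketch of that idea, not the script's actual code: RawTensor and fuse_rows are hypothetical names, and the (out_features, in_features) weight shape and the Q, K, V ordering are assumptions.

```python
from dataclasses import dataclass


@dataclass
class RawTensor:
    shape: tuple  # (rows, cols), stored row-major
    data: bytes   # raw element bytes, e.g. 2 bytes per bf16 value


def fuse_rows(tensors, bytes_per_elem=2):
    """Stack row-major matrices along axis 0 by appending their raw buffers."""
    cols = tensors[0].shape[1]
    for t in tensors:
        if t.shape[1] != cols:
            raise ValueError("all tensors must share the same column count")
        if len(t.data) != t.shape[0] * t.shape[1] * bytes_per_elem:
            raise ValueError("buffer size does not match the declared shape")
    rows = sum(t.shape[0] for t in tensors)
    return RawTensor((rows, cols), b"".join(t.data for t in tensors))


# Tiny fake 2x3 bf16-sized weights standing in for to_q / to_k / to_v.
q = RawTensor((2, 3), bytes(range(0, 12)))
k = RawTensor((2, 3), bytes(range(12, 24)))
v = RawTensor((2, 3), bytes(range(24, 36)))

qkv = fuse_rows([q, k, v])
print(qkv.shape, len(qkv.data))
```

Working on bytes keeps the converter free of array libraries, at the cost of having to validate shapes by hand as the size check above does.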

Run the Qwen3 text encoder and the diffusion transformer in separate phases so peak RAM is max(encoder, DiT) instead of their sum, unlocking ~4GB-RAM phones for Bonsai/FLUX.2 Klein image generation.

JNI:
- SdHandle.llm_ctx + try_create_llm_only_handle: an encoder-only (Qwen3) context (VERSION_FLUX2_KLEIN), mirroring the T5-only handle, for the precompute phase; routed in nativeCreate when only an llm path is given; freed in nativeDestroy.
- nativePrecomputeCondition: handle the llm_ctx (Qwen3) branch.
- Drop the image precompute nullptr shims (now real in stable-diffusion.cpp); remove the duplicate raw structs from sdcpp_jni_shared.h.

Kotlin:
- ImageGenerationRequest.sequential + Flux2Klein.imageRequest(sequential=).
- DiffusionRuntimeSpec.encoderOnly + loader routes it to llm_path only.
- StableDiffusionLoadSupport: encoder-only resolve branch (llm path only).
- ImageRuntimeRequestPlanner.imageSequentialPlan + ImageGenerationExecutor.generateSequential: phase 1 precompute on the encoder runtime, free it, phase 2 generate on the DiT-only runtime via the precomputed condition.

Bumps the stable-diffusion.cpp submodule to the precomputed-condition commit.

Verified on host (CPU JNI): encoder-only precompute (qwen3 2621MB, no DiT) -> DiT-only generate (flux 1298 + vae 95MB, no encoder) -> coherent image.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
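The two-phase lifecycle described above (precompute on the encoder runtime, free it, then generate on the DiT-only runtime) can be sketched with a toy memory tracker. This is an illustration, not the Kotlin executor: MemoryTracker, runtime, and generate_sequential are hypothetical names, the strings stand in for the real condition and image, and the sizes are the host measurements from this commit.

```python
from contextlib import contextmanager


class MemoryTracker:
    """Toy bookkeeping of which runtimes are resident and the peak total."""

    def __init__(self):
        self.resident = {}
        self.peak_mb = 0

    def load(self, name, size_mb):
        self.resident[name] = size_mb
        self.peak_mb = max(self.peak_mb, sum(self.resident.values()))

    def free(self, name):
        del self.resident[name]


@contextmanager
def runtime(tracker, name, size_mb):
    """Load a runtime for the duration of the block, then always free it."""
    tracker.load(name, size_mb)
    try:
        yield name
    finally:
        tracker.free(name)


def generate_sequential(tracker, prompt):
    # Phase 1: the encoder-only runtime turns the prompt into a condition.
    with runtime(tracker, "qwen3-encoder", 2621):
        condition = f"cond({prompt})"  # stand-in for the precomputed condition
    # The encoder is freed here, before the DiT is loaded.
    # Phase 2: the DiT-only runtime consumes the precomputed condition.
    with runtime(tracker, "dit+vae", 1298 + 95):
        assert "qwen3-encoder" not in tracker.resident
        return f"image[{condition}]"


tracker = MemoryTracker()
image = generate_sequential(tracker, "a bonsai tree")
print(image, tracker.peak_mb)
```

The cost of this design is reload latency: each generation pays for loading two runtimes in turn, which is the trade made to fit the smaller RAM tier.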

prepare_sdcpp_mods bound a nameref named out_args_ref to the caller's nameref of the same name (a circular self-reference), so bash silently dropped its appends: SD_ROOT_OVERRIDE never reached cmake and the mods overlay was always bypassed. Bind a distinct nameref name and pass the underlying array name.

The mods/ overlay is also currently stale vs the pinned stable-diffusion.cpp submodule (won't compile), and the active sdcpp customizations now live in the submodule directly. So default the overlay OFF (opt in with LLMEDGE_SDCPP_USE_MODS=1 after refreshing it); the default build now uses the submodule directly and stays green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add Flux2Klein.bonsaiDiffusionModel pointing at the published Aatricks/bonsai-image-ternary-4B-FLUX2-klein-GGUF (Q2_K, ~1.3 GB) and a Flux2Klein.bonsaiImageRequest(...) convenience that defaults to sequential loading for ~4 GB-RAM devices. README documents the hosted model + sequential.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The submodule now carries the precomputed-condition (Lever 1) commit, which lives on the fork (leejet upstream doesn't have it). Repoint the submodule URL so the recorded commit is fetchable on a fresh clone.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Aatricks and others added 6 commits June 3, 2026 11:02

Aatricks self-assigned this Jun 3, 2026

Aatricks merged commit 7d3121a into main Jun 3, 2026
1 check passed

Aatricks linked an issue Jun 3, 2026 that may be closed by this pull request

bonsai 1 bit image and LLM support Microsoft lens also #24

Closed

Aatricks merged 6 commits into main from feat/flux2-klein-image
