Feature request: BitNet 1.58-bit ternary inference on ROCm (gfx1151) #2

Description

Request

Enable BitNet 1.58-bit (ternary weight) inference on ROCm builds. The architecture is listed as supported in the README, but the pre-built gfx1151 binary cannot run BitNet models: the model loads, then inference crashes during warmup.

What We Tested

Binary: b1004-tech-preview (gfx1151)
Model: mlx-community/bitnet-b1.58-2B-4T
Result: warmup failed — model loads but inference crashes

Also tested mlx-community/Falcon-E-3B-Instruct-1.58bit — same result.

Why This Matters

BitNet 1.58-bit uses ternary weights {-1, 0, 1}, which carry log2(3) ≈ 1.58 bits of information per weight. On unified memory hardware like Strix Halo (128GB), ternary models would:

  • Dramatically reduce memory bandwidth pressure, the main bottleneck on unified memory
  • Enable much larger effective model sizes: 70B weights at FP16 need about 140GB and do not fit in 128GB, while the same weights at 1.58 bits need roughly 14GB
  • Pair with the MLX speed advantage: MLX is already 29-85% faster than vLLM on this hardware. Ternary weights would compound that.
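
The memory argument above can be checked with a quick back-of-the-envelope calculation. This is a rough sketch that counts weight storage only (no KV cache, activations, or per-tensor scales) and assumes decimal gigabytes. The helper name `weight_gb` is made up for illustration, and the 2-bit row is an assumed practical packing, not a statement about how any particular runtime stores ternary weights.

```python
import math

GB = 1e9  # decimal gigabytes (assumption)


def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB for a given bit width."""
    return n_params * bits_per_weight / 8 / GB


n_params = 70e9  # the 70B example from the list above

fp16_gb = weight_gb(n_params, 16)               # FP16 baseline
ternary_gb = weight_gb(n_params, math.log2(3))  # information-theoretic 1.58 bits
packed2_gb = weight_gb(n_params, 2)             # assumed 2-bit packing, 4 weights/byte

for label, size in [("FP16", fp16_gb), ("1.58-bit", ternary_gb), ("2-bit packed", packed2_gb)]:
    fits = "fits" if size < 128 else "does not fit"
    print(f"{label:>12}: {size:6.1f} GB ({fits} in 128 GB)")
```

Under these assumptions the FP16 weights come to 140 GB, the ideal ternary encoding to about 13.9 GB, and a simple 2-bit packing to 17.5 GB.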

Microsoft open-sourced BitNet. PrismML has Bonsai 1-bit MLX models (Apple Silicon). The architecture support is in the codebase — the ternary matmul kernels just need the ROCm/HIP path.

Available Models

mlx-community/bitnet-b1.58-2B-4T           — Microsoft official
mlx-community/Falcon-E-3B-Instruct-1.58bit — Falcon extreme quant
prism-ml/Bonsai-8B-mlx-1bit                — PrismML (Apple Silicon)
prism-ml/Bonsai-4B-mlx-1bit
prism-ml/Bonsai-1.7B-mlx-1bit

Test Environment

Hardware: AMD Strix Halo, 128GB unified, gfx1151
Binary:   b1004-tech-preview
OS:       CachyOS (Arch), kernel 7.0.0-1-mainline

Context

We have comprehensive MLX benchmarks on this hardware — 6 models passing at 21-151 tok/s (4-bit). Full results: https://github.com/stampby/bleeding-edge

Happy to test 1-bit builds when available.
