unsloth-rs

Rust implementations of transformer building blocks for LLM inference and fine-tuning.

Overview

unsloth-rs provides Rust implementations of common transformer operations built on the Candle ML framework:

Multi-head attention with grouped-query attention (GQA) support
Rotary position embeddings (RoPE)
RMS normalization
SwiGLU activation
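
The element-wise operations listed above can be sketched in plain Rust without Candle. This is an illustrative reference under stated assumptions, not the crate's implementation: the function names `rms_norm`, `swiglu`, and `rope` are hypothetical, and the RoPE variant shown rotates interleaved pairs, which may differ from the layout unsloth-rs uses.

```rust
/// RMS normalization: y_i = x_i / sqrt(mean(x^2) + eps) * weight_i.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len());
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}

/// SiLU (swish) activation: x * sigmoid(x).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// SwiGLU gating: silu(gate) * up, element-wise.
fn swiglu(gate: &[f32], up: &[f32]) -> Vec<f32> {
    assert_eq!(gate.len(), up.len());
    gate.iter().zip(up).map(|(g, u)| silu(*g) * u).collect()
}

/// RoPE on one head vector: rotate each pair (x[2i], x[2i+1]) by the
/// angle pos * base^(-2i/d). Interleaved-pair layout is an assumption.
fn rope(x: &[f32], pos: usize, base: f32) -> Vec<f32> {
    let d = x.len();
    assert!(d % 2 == 0, "head dimension must be even");
    let mut out = vec![0.0; d];
    for i in 0..d / 2 {
        let theta = pos as f32 * base.powf(-(2.0 * i as f32) / d as f32);
        let (sin, cos) = theta.sin_cos();
        out[2 * i] = x[2 * i] * cos - x[2 * i + 1] * sin;
        out[2 * i + 1] = x[2 * i] * sin + x[2 * i + 1] * cos;
    }
    out
}

fn main() {
    let x = [1.0f32, 2.0, 3.0, 4.0];
    println!("rms_norm: {:?}", rms_norm(&x, &[1.0; 4], 1e-6));
    println!("swiglu:   {:?}", swiglu(&x, &x));
    println!("rope:     {:?}", rope(&x, 5, 10000.0));
}
```

With unit weights, the output of `rms_norm` has a mean square close to 1, and `rope` at position 0 returns its input unchanged, which makes both easy to sanity-check against a tensor implementation.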

Status

Version 1.0.0 - Core functionality is stable. The current implementations are CPU reference implementations; GPU execution is dispatched through Candle's CUDA backend rather than custom kernels.

Implemented

✅ Multi-head attention (CPU reference, Candle CUDA backend)
✅ Rotary position embeddings (RoPE)
✅ RMS normalization
✅ SwiGLU activation
✅ Memory estimation utilities
✅ Ternary quantization (5-15x compression achieved)
✅ Mixed precision training utilities (FP32/FP16/BF16)
✅ Benchmarking suite (CPU)
✅ 160 passing tests (100% pass rate)
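
To make the ternary quantization entry above concrete, here is a minimal sketch of one common scheme: weights are mapped to {-1, 0, +1} with a single per-tensor scale. The threshold rule (0.7 times the mean absolute weight) and the names `ternarize` and `dequantize` are assumptions chosen for illustration; the crate's actual algorithm and packing format may differ.

```rust
/// Quantize weights to {-1, 0, +1} plus one f32 scale.
/// Threshold heuristic (0.7 * mean |w|) is an assumption for this sketch.
fn ternarize(w: &[f32]) -> (Vec<i8>, f32) {
    let mean_abs = w.iter().map(|v| v.abs()).sum::<f32>() / w.len() as f32;
    let threshold = 0.7 * mean_abs;

    let q: Vec<i8> = w
        .iter()
        .map(|&v| {
            if v > threshold {
                1
            } else if v < -threshold {
                -1
            } else {
                0
            }
        })
        .collect();

    // Scale = mean magnitude of the weights that were not zeroed out.
    let mut sum = 0.0f32;
    let mut count = 0usize;
    for (v, t) in w.iter().zip(&q) {
        if *t != 0 {
            sum += v.abs();
            count += 1;
        }
    }
    let scale = if count > 0 { sum / count as f32 } else { 0.0 };
    (q, scale)
}

/// Reconstruct approximate weights from ternary codes and the scale.
fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&t| t as f32 * scale).collect()
}

fn main() {
    let w = [0.9f32, -1.1, 0.05, 0.0, 1.0, -0.02];
    let (q, scale) = ternarize(&w);
    println!("codes = {:?}, scale = {}", q, scale);
    println!("restored = {:?}", dequantize(&q, scale));
}
```

With a simple 2-bit encoding per weight, the ceiling relative to FP32 is 16x before scales and other metadata are counted, so real compression ratios land below that figure.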

In Progress

🚧 Flash Attention CubeCL GPU kernel (Phase 1 complete, Phase 2 ready for RTX 5080 validation)
🚧 Ternary GPU kernels (Phase 2-4 implemented, awaiting GPU profiling)
🚧 CI/CD pipeline setup

Planned

⏳ Gradient checkpointing (configuration exists, implementation planned)
⏳ GPU performance validation on RTX 5080/3090 Ti
⏳ RoPE, RMSNorm, SwiGLU GPU kernels
⏳ Advanced sparsity optimizations
⏳ Multi-GPU support

Installation

[dependencies]
unsloth-rs = "1.0.0"

For CUDA support (uses Candle's CUDA backend):

[dependencies]
unsloth-rs = { version = "1.0.0", features = ["cuda"] }

Usage

Attention

use unsloth_rs::kernels::{FusedAttention, FusedAttentionConfig};
use candle_core::{Device, Tensor};

fn main() -> anyhow::Result<()> {
    let device = Device::Cpu;
    
    let config = FusedAttentionConfig {
        hidden_size: 768,
        num_heads: 12,
        head_dim: 64,
        num_kv_heads: Some(4),  // GQA: 12 query heads share 4 key/value heads
        ..Default::default()
    };
    
    let attention = FusedAttention::new(config, &device)?;
    
    // Random input of shape (batch, seq_len, hidden_size) = (1, 128, 768).
    // Tensor::randn takes (mean, std_dev, shape, device); the f32 suffix on
    // 0.0f32 fixes the element type to 32-bit floats.
    let hidden_states = Tensor::randn(0.0f32, 1.0, (1, 128, 768), &device)?;
    let output = attention.forward(&hidden_states, None, None)?;
    
    Ok(())
}
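
The configuration above pairs 12 query heads with 4 key/value heads. A minimal sketch of how grouped-query attention typically assigns query heads to shared KV heads follows. The helper `kv_head_for` is hypothetical and the contiguous grouping is the usual convention, assumed here rather than confirmed for this crate.

```rust
/// Index of the KV head that a given query head reads from, assuming
/// contiguous groups of num_heads / num_kv_heads query heads.
fn kv_head_for(query_head: usize, num_heads: usize, num_kv_heads: usize) -> usize {
    assert!(num_kv_heads > 0 && num_heads % num_kv_heads == 0);
    assert!(query_head < num_heads);
    query_head / (num_heads / num_kv_heads)
}

fn main() {
    let (num_heads, num_kv_heads, head_dim) = (12, 4, 64);

    for q in 0..num_heads {
        println!("query head {:2} -> kv head {}", q, kv_head_for(q, num_heads, num_kv_heads));
    }

    // Width of the query projection versus the (smaller) K and V projections.
    println!("q width  = {}", num_heads * head_dim);
    println!("kv width = {}", num_kv_heads * head_dim);
}
```

Under this grouping, query heads 0 to 2 share KV head 0, heads 3 to 5 share KV head 1, and so on, so the K and V projections are a third of the width of the query projection.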

Memory Estimation

use unsloth_rs::memory::{estimate_forward_memory, CheckpointConfig};

fn main() {
    let checkpoint = CheckpointConfig {
        enabled: true,
        checkpoint_every: 2,
    };
    
    let mem_bytes = estimate_forward_memory(
        4,     // batch_size
        2048,  // seq_len
        4096,  // hidden_size
        32,    // num_layers
        &checkpoint,
    );
    
    println!("Estimated memory: {:.2} GB", mem_bytes as f64 / 1e9);
}
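
For intuition about the order of magnitude involved, the arithmetic below sizes a single hidden-state tensor for the same parameters and shows how storing only every second layer halves the number of retained tensors. This is a back-of-the-envelope sketch, not the formula used by `estimate_forward_memory`: the helpers `hidden_state_bytes` and `stored_layers` are hypothetical, FP32 storage is assumed, and attention scores and intermediate buffers are ignored.

```rust
/// Bytes for one tensor of shape (batch_size, seq_len, hidden_size).
fn hidden_state_bytes(batch_size: u64, seq_len: u64, hidden_size: u64, bytes_per_elem: u64) -> u64 {
    batch_size * seq_len * hidden_size * bytes_per_elem
}

/// Layer inputs retained when only every `checkpoint_every`-th layer
/// is stored (rounded up).
fn stored_layers(num_layers: u64, checkpoint_every: u64) -> u64 {
    assert!(checkpoint_every > 0);
    (num_layers + checkpoint_every - 1) / checkpoint_every
}

fn main() {
    // Same shape as the example above; 4 bytes per element assumes FP32.
    let per_tensor = hidden_state_bytes(4, 2048, 4096, 4);
    let full = per_tensor * 32;
    let checkpointed = per_tensor * stored_layers(32, 2);

    println!("one hidden state:   {:.2} GB", per_tensor as f64 / 1e9);
    println!("all 32 layers:      {:.2} GB", full as f64 / 1e9);
    println!("checkpoint every 2: {:.2} GB", checkpointed as f64 / 1e9);
}
```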

Benchmarks

Run benchmarks with:

cargo bench

The benchmarks measure CPU performance across a range of configurations. GPU benchmarks require the cuda feature, enabled with cargo bench --features cuda.

Development Roadmap

For detailed development plans and task breakdowns, see:

ROADMAP.md - Strategic development plan with phases and timelines
TASKS.md - Actionable task list with priorities and estimates
SUMMARY.md - Project review summary and execution guide

Contributing

Contributions are welcome, particularly:

GPU kernel implementations using CubeCL
Performance optimizations
Additional transformer operations

See TASKS.md for specific tasks that need implementation.

License

Licensed under the MIT License. See LICENSE for details.
