xadupre/mbext

ModelBuilder for onnxruntime-genai

The code base comes from https://github.com/microsoft/onnxruntime-genai/tree/main/src/python/py/models. It adds fast unit tests that check for discrepancies and end-to-end tests with a trained model, and it supports more architectures.

Convert a model

Example converting Qwen/Qwen3-8B to ONNX for CPU with int4 precision:

python -m modelbuilder.builder \
    -m Qwen/Qwen3-8B \
    -o qwen3-8b-cpu-int4 \
    -p int4 \
    -e cpu \
    -c cache_dir

The arguments are:

  • -m/--model_name: model name on Hugging Face (use -i/--input instead for a local folder).
  • -o/--output: folder where the ONNX model and additional files are written.
  • -p/--precision: precision of the model (int4, bf16, fp16 or fp32).
  • -e/--execution_provider: execution provider to target (cpu here).
  • -c/--cache_dir: cache directory for Hugging Face files and temporary ONNX external data files.
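The same invocation can be assembled from Python, which may be convenient when converting several models in a loop. This is a minimal sketch that only uses the flags listed above; the helper names `build_command` and `run_conversion` are hypothetical and are not part of modelbuilder.

```python
import subprocess
import sys


def build_command(model_name, output, precision="int4",
                  execution_provider="cpu", cache_dir="cache_dir"):
    """Return the argument list for the documented modelbuilder.builder call.

    Hypothetical helper: it only mirrors the command line shown above.
    """
    return [
        sys.executable, "-m", "modelbuilder.builder",
        "-m", model_name,          # model name on Hugging Face
        "-o", output,              # output folder for the ONNX model
        "-p", precision,           # int4, bf16, fp16 or fp32
        "-e", execution_provider,  # execution provider, cpu here
        "-c", cache_dir,           # cache for Hugging Face and temporary files
    ]


def run_conversion(command):
    """Run the conversion and raise if the builder exits with an error."""
    subprocess.run(command, check=True)


command = build_command("Qwen/Qwen3-8B", "qwen3-8b-cpu-int4")
print(" ".join(command))
# run_conversion(command)  # downloads the model and can take a long time
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` turns a failed conversion into a Python exception.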

Style

black . && ruff check .

Development

pip install -e .[dev]

Fast Unit tests

pytest tests/fast

Long Unit tests

python tests/trained/test_trained_tiny_llm.py

On a more powerful machine:

LONGTEST=1 pytest tests/trained

Llama.cpp tests

tests/fast_llama_cpp compares llama.cpp and onnxruntime-genai on the same model. The conversion script convert_hf_to_gguf.py comes from the llama.cpp repository and its requirements pin a specific torch version (for example torch==2.11.0). Installing them downgrades torch from the version installed for the rest of the test suite.

torchaudio (and torchvision) compiled against the previous torch build is then incompatible with the downgraded torch. Because transformers imports torchaudio lazily, the mismatch surfaces while loading the model as:

ModuleNotFoundError: Could not import module 'LlamaForCausalLM'.

Uninstalling torchaudio (and torchvision) removes the mismatch, since these tests only need torch itself:

pip uninstall -y torchaudio torchvision
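Before running these tests, it can help to check which of the three packages are present in the environment. The sketch below only reads installed package metadata through the standard library and does not import torch; the function names are hypothetical and not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("torch", "torchaudio", "torchvision")):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def leftover_extensions(versions):
    """Return the torch companion packages that are still installed."""
    return [name for name in ("torchaudio", "torchvision")
            if versions.get(name) is not None]


versions = installed_versions()
print(versions)
print("candidates for uninstall:", leftover_extensions(versions))
```

An empty list on the last line means neither torchaudio nor torchvision is installed, so the mismatch described above should not occur.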

You can see the results in stats/end2end_results.json. Example:

{'first_diff': 0, 'delta_length': 4, 'expected_length': 16, 'total_diff': 16, 'precision': 'fp32', 'model_id': 'HuggingFaceTB/SmolLM3-3B', 'experiment': 'generate', 'provider': 'cpu'}
{'max_abs_err': 1.6875, '%_gt_0.1': np.float64(0.5278832959081836), '%_gt_0.01': np.float64(0.962624750499002), 'avg_abs_discrepancy': 0.1749267578125, 'shape': (1, 5, 128256), 'dtype': dtype('float16'), 'precision': 'fp16', 'model_id': 'HuggingFaceTB/SmolLM3-3B', 'experiment': 'forward'}
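The README does not define the fields of the forward record. Judging by their names and values, they could be element-wise absolute-error statistics between the reference output and the ONNX output, with `%_gt_0.1` and `%_gt_0.01` stored as fractions between 0 and 1. The following sketch computes statistics under that assumption, in plain Python on flat lists; it is an illustration, not the code used by the test suite.

```python
def discrepancy_stats(expected, got):
    """Absolute-error statistics between two equally long flat sequences.

    Assumed interpretation of the field names in the example record;
    the project's actual computation may differ.
    """
    diffs = [abs(e - g) for e, g in zip(expected, got)]
    count = len(diffs)
    return {
        "max_abs_err": max(diffs),
        "%_gt_0.1": sum(d > 0.1 for d in diffs) / count,    # fraction above 0.1
        "%_gt_0.01": sum(d > 0.01 for d in diffs) / count,  # fraction above 0.01
        "avg_abs_discrepancy": sum(diffs) / count,
    }


stats = discrepancy_stats([0.0, 1.0, 2.0, 3.0], [0.0, 1.5, 2.0, 3.0625])
print(stats)
```

Under this reading, the fp16 example above would mean that about 53% of the logits differ by more than 0.1 and about 96% by more than 0.01.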
