Benchmarks: Add Mixture of Experts Model #679

Merged: polarG merged 27 commits into microsoft:main from dpower4:feat/mixtral on Jun 30, 2025

@dpower4 (Contributor) commented Dec 19, 2024

Added a Mixture of Experts (MoE) model benchmark using MixtralConfig.

  1. Added 8x7b and 8x22b variants.
  2. The model requires high VRAM because all experts are loaded in memory. Training is therefore disabled due to the memory constraint on the test worker.
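
To see why keeping every expert resident is costly, a rough parameter count helps. The sketch below is not taken from this PR: it assumes the publicly documented Mixtral-8x7B hyperparameters (hidden size 4096, expert FFN size 14336, 32 layers, 8 experts, 32 attention heads with 8 key/value heads, vocabulary 32000), and the benchmark's own configuration may differ. The function name `mixtral_param_count` is hypothetical.

```python
def mixtral_param_count(hidden=4096, ffn=14336, layers=32, experts=8,
                        heads=32, kv_heads=8, vocab=32000):
    """Estimate total parameters of a Mixtral-style MoE decoder (assumed shapes)."""
    head_dim = hidden // heads
    kv_dim = kv_heads * head_dim
    # Attention: q and o projections are hidden x hidden; k and v use grouped-query heads.
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim
    # Each expert is a gated MLP with three bias-free weight matrices.
    expert = 3 * hidden * ffn
    # Router maps each token's hidden state to one logit per expert.
    router = hidden * experts
    # Two normalization weight vectors per layer.
    norms = 2 * hidden
    per_layer = attention + experts * expert + router + norms
    # Add input embedding, output head, and the final norm.
    return layers * per_layer + 2 * vocab * hidden + hidden


total = mixtral_param_count()
# Weights alone at 2 bytes per parameter (fp16/bf16);
# training would add gradients and optimizer state on top of this.
weight_gib = total * 2 / 2**30
print(f'{total / 1e9:.1f}B parameters, about {weight_gib:.0f} GiB of 16-bit weights')
```

Mixtral routes each token to only two experts, but all eight must still be held in memory, which is what drives the footprint under these assumptions.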

@dpower4 requested review from a team, cp5555, and guoshzhao as code owners December 19, 2024 17:11

codecov bot commented Dec 19, 2024

Codecov Report

Attention: Patch coverage is 88.23529% with 16 lines in your changes missing coverage. Please review.

Project coverage is 86.47%. Comparing base (deef9a3) to head (434e442).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...enchmarks/model_benchmarks/pytorch_mixtral_impl.py | 86.20% | 16 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #679      +/-   ##
==========================================
+ Coverage   86.44%   86.47%   +0.03%     
==========================================
  Files         100      102       +2     
  Lines        7406     7541     +135     
==========================================
+ Hits         6402     6521     +119     
- Misses       1004     1020      +16     

| Flag | Coverage Δ |
|---|---|
| cpu-python3.10-unit-test | 71.59% <38.80%> (-0.61%) ⬇️ |
| cpu-python3.12-unit-test | 71.59% <38.80%> (-0.61%) ⬇️ |
| cpu-python3.7-unit-test | 70.65% <9.55%> (-1.15%) ⬇️ |
| cuda-unit-test | 83.98% <85.82%> (+0.03%) ⬆️ |

@dpower4 added the benchmarks, micro-benchmarks, and model-benchmarks labels Dec 31, 2024
@dpower4 requested a review from abuccts December 31, 2024 01:46
@abuccts requested a review from Copilot April 21, 2025 22:32

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR introduces a new Mixture of Experts (MoE) model variant using MixtralConfig with two parameter sets (8x7b and 8x22b), along with associated tests and documentation updates. It adds Python version checks, conditional imports, benchmark registrations, and ONNX export support for the new model.

Reviewed Changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| tests/helper/decorator.py | Added a Python version check decorator for tests. |
| tests/benchmarks/model_benchmarks/test_pytorch_mixtral.py | Introduced tests for the new Mixtral MoE benchmark (8x7b variant). |
| superbench/benchmarks/model_benchmarks/pytorch_mixtral.py | Implemented the Mixtral benchmark model and registered two variants. |
| superbench/benchmarks/model_benchmarks/`__init__.py` | Updated module imports and `__all__` to conditionally include MoE model. |
| superbench/benchmarks/micro_benchmarks/_export_torch_to_onnx.py | Extended ONNX export support to include Mixtral models. |
| docs/user-tutorial/benchmarks/model-benchmarks.md | Added documentation for MoE models. |

Files not reviewed (1)
  • docs/superbench-config.mdx: Language not supported
Comments suppressed due to low confidence (1)

superbench/benchmarks/micro_benchmarks/_export_torch_to_onnx.py:28

  • [nitpick] Consider renaming this class to Torch2ONNXExporter to adhere to standard Python CamelCase naming conventions.
class torch2onnxExporter():

Comment thread superbench/benchmarks/model_benchmarks/pytorch_mixtral.py Outdated
Comment thread superbench/benchmarks/model_benchmarks/__init__.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/_export_torch_to_onnx.py Outdated
Comment thread tests/helper/decorator.py Outdated
@guoshzhao mentioned this pull request May 14, 2025
40 tasks
@polarG enabled auto-merge (squash) June 28, 2025 05:13
@polarG merged commit 44e35cd into microsoft:main Jun 30, 2025
21 of 22 checks passed
polarG added a commit that referenced this pull request Aug 11, 2025
Description

Add release note for v0.12.0

# Main Features
## SuperBench Improvement
1. - [x] Update Image Build Pipeline (#659)
2. - [x] Add support for arm64 build (#660)
3. - [x] Upgrade dependency versions in pipeline (#671)
4. - [x] Fix installation and lint issues (#684)
5. - [x] Update Flake8 repo (#683)
6. - [x] Init latest python support. (#687)
7. - [x] Add image build on arm64 arch (#690)
8. - [x] Enhancement of ignoring errors for import pkg_resources (#692)
9. - [x] Update label in the ROCm image build (#693)
10. - [x] Support cuda12.8 for Blackwell arch (#682)
11. - [x] Merge multi-arch image (#696)
12. - [x] Update OS of runner to the latest. (#702)
13. - [x] cuda arch flag for cublaslt (#701)


## Micro-benchmark Improvement
1. - [x] Bug Fix - Fix numa error on grace cpu in gpu-copy (#658)
2. - [x] Dependency - Bump onnxruntime-gpu version from 1.10.0 to 1.12.0 (#663)
3. - [x] Benchmarks: micro benchmarks - add general CPU bandwidth and latency benchmark (#662)
4. - [x] Benchmarks: micro benchmarks - add nvbandwidth build and benchmark (#665 and #669)
5. - [x] Fix stderr message in gpu-copy benchmark (#673)
6. - [x] Add arch support for 10.0 in gemm-flops (#680)
7. - [x] Fix tensorrt-inference parsing (#674)
8. - [x] nvbandwidth benchmark need to handle N/A value (#675)
9. - [x] Avoid Unintended nvbandwidth Function Calls in All Benchmarks (#685)
10. - [x] Add GPU Stream Micro Benchmark (#697)
11. - [x] Cuda arch flag for cublaslt (#701)
12. - [x] Support autotuning in cublaslt gemm (#706)
14. - [x] Add FP4 GEMM FLOPS support for cublaslt_gemm benchmark (#711)
15. - [x] CPU Stream Benchmark Revise (#712)
16. - [x] Add cuda12.9 docker image (#716)
17. - [x] Add Grace CPU support for CPU Stream (#719)


## Model Benchmark Improvement
1. - [x] Add LLaMA-2 Models (#668)
2. - [x] Fix typos in documentation and code files (#686)
3. - [x] Add Mixture of Experts Model (#679) 
4. - [ ] Add DeepSeek Training Benchmark
5. - [x] Add DeepSeek Inference Benchmark (AMD GPU) (#713)


## Documentation
1. - [x] Update CODEOWNERS (#670)
2. - [x] Update CODEOWNERS (#718)

## Result Analysis
1. - [x] Enhance logging information for diagnosis rule op baseline errors. (#689)
polarG added a commit that referenced this pull request Aug 12, 2025

5 participants