# AIMET Quantized Model Experiences Severe Accuracy Drop After QAIRT Conversion #4094

## Environment & Versions
- **Chip**: Snapdragon XR2 Gen 2 (DSP architecture 69)
- **QAIRT Version**: 2.43.0
- **AIMET-ONNX Version**: 2.29.1

## Problem Description

### Scenario
I am trying to apply **W8A8** quantization to a modified MobileNetV2 model using AIMET, then deploy it to a Qualcomm device (Android).

- After quantization, the accuracy evaluated locally using `QuantizationSimModel` is **as expected** (good).
- However, after converting the quantized model to `.bin` format using `qairt-converter` + `qnn-context-binary-generator`, the accuracy **drops dramatically**.
- As a control experiment, converting and quantizing the **same original model directly** with `qnn-onnx-converter` yields **good accuracy** on the Qualcomm platform.

This indicates that the accuracy evaluation logic on the Qualcomm HTP backend itself is working correctly. The issue appears to be specific to the combination of **AIMET quantization + QAIRT conversion**.
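
For context on what a W8A8 min-max setup computes, here is a minimal sketch of asymmetric 8-bit quantize-dequantize for a single tensor. It is illustrative only and not taken from AIMET or QAIRT. The helper names `minmax_encoding` and `quantize_dequantize` are hypothetical, and the offset convention (a non-positive integer added back before scaling) is an assumption that a given toolchain may not share.

```python
def minmax_encoding(values, bitwidth=8):
    """Derive an asymmetric (scale, offset) pair from the observed min/max.

    Assumed convention: the range is widened to include 0.0, and the offset
    is a non-positive integer.
    """
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    levels = 2 ** bitwidth - 1
    scale = (hi - lo) / levels
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero tensor; any positive scale works
    offset = round(lo / scale)
    return scale, offset


def quantize_dequantize(x, scale, offset, bitwidth=8):
    """Map x onto the integer grid, clamp it, and map it back to float."""
    levels = 2 ** bitwidth - 1
    q = round(x / scale) - offset
    q = max(0, min(levels, q))
    return (q + offset) * scale


calibration_values = [-1.0, 0.0, 0.5, 2.0]  # made-up calibration data
scale, offset = minmax_encoding(calibration_values)
for x in calibration_values:
    print(x, "->", quantize_dequantize(x, scale, offset))
```

If two tools read the same scale/offset pair under different conventions (for example a different offset sign, or a signed versus unsigned grid), every dequantized value shifts, so it may be worth comparing how each pipeline interprets the encodings.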

### 1. AIMET Quantization Script
I used the following script to perform W8A8 quantization on the original ONNX model. The accuracy evaluated with `QuantizationSimModel` after quantization meets expectations.

```python
import os

import aimet_onnx
import onnx

# MODEL_PATH, model_name, model_format, dataset, the CALIBRATION_* constants and
# the _create_calibration_generator / _evaluate_quantized_model helpers are
# not shown in this excerpt.


def main() -> None:
    config_name = "W8A8_minmax"

    # Load the original ONNX model and wrap it in a quantization simulator
    model = onnx.load_model(MODEL_PATH)
    sim = aimet_onnx.QuantizationSimModel(
        model=model,
        config_file="htp_v69",          # Note: using htp_v69 config
        providers=["CUDAExecutionProvider"],
    )
    print("QuantizationSimModel created successfully")

    # Calibration: compute encodings from calibration samples
    print(f"Starting calibration (using {CALIBRATION_SAMPLES} samples)...")
    sim.compute_encodings(
        _create_calibration_generator(dataset, CALIBRATION_SAMPLES, CALIBRATION_BATCH_SIZE)
    )
    print("Calibration completed")

    # Export the quantized model (.onnx) together with its .encodings file
    quantized_model_path = os.path.join(
        "models", model_name, config_name, model_format, f"{model_name}.onnx"
    )
    os.makedirs(os.path.dirname(quantized_model_path), exist_ok=True)

    sim.export(
        path=os.path.dirname(quantized_model_path),
        filename_prefix=model_name,
    )

    # Evaluate accuracy with the simulated quantized model
    metrics = _evaluate_quantized_model(sim, dataset)
```
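
Before handing the exported files to the converter, it can help to confirm what the `.encodings` file actually contains. The sketch below summarizes how many tensors carry encodings and at which bitwidths. It assumes the file is JSON with `activation_encodings` and `param_encodings` sections, stored either as a name-to-list mapping with a `bitwidth` key or as a list of entries with `name` and `bw` keys. Those key names are assumptions about the export format, and `summarize_encodings` is a hypothetical helper.

```python
import json


def iter_encodings(section):
    """Yield (tensor_name, encoding) pairs from either assumed layout."""
    if isinstance(section, dict):
        # Assumed mapping layout: {tensor_name: [encoding, ...]}
        for name, encs in section.items():
            for enc in encs if isinstance(encs, list) else [encs]:
                yield name, enc
    else:
        # Assumed list layout: [{"name": tensor_name, ...}, ...]
        for enc in section or []:
            yield enc.get("name", "<unnamed>"), enc


def summarize_encodings(encodings):
    """Count encoded tensors and collect the bitwidths found per section."""
    summary = {}
    for key in ("activation_encodings", "param_encodings"):
        entries = list(iter_encodings(encodings.get(key, {})))
        bitwidths = {enc.get("bw", enc.get("bitwidth")) for _, enc in entries}
        bitwidths.discard(None)
        summary[key] = {
            "count": len({name for name, _ in entries}),
            "bitwidths": sorted(bitwidths),
        }
    return summary


def load_and_summarize(path):
    """Read an encodings file from disk and summarize it."""
    with open(path, "r", encoding="utf-8") as f:
        return summarize_encodings(json.load(f))
```

Calling `load_and_summarize` on the exported `.encodings` path should report 8-bit entries in both sections for a W8A8 export; an empty section or an unexpected bitwidth would point at the export step rather than the conversion.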
### 2. QAIRT Conversion Script
I then used the following script to convert the AIMET-quantized model and generate the context binary:
```bash
# Set system parameters
TARGET="aarch64-android"

# Set target model
MODEL_NAME=1225-600
OPT_FLAG="W8A8_minmax"
MODEL_ROOT_DIR="models/$MODEL_NAME/$OPT_FLAG"

# Initialize file paths
ONNX_MODEL_DIR="$MODEL_ROOT_DIR/onnx"
QUALCOMM_MODEL_DIR="$MODEL_ROOT_DIR/qualcomm"

mkdir -p "$QUALCOMM_MODEL_DIR"

ONNX_MODEL_PATH="$ONNX_MODEL_DIR/${MODEL_NAME}.onnx"
ENCODING_PATH="$ONNX_MODEL_DIR/${MODEL_NAME}.encodings"
DLC_MODEL_PATH="$QUALCOMM_MODEL_DIR/${MODEL_NAME}.dlc"

# Convert the ONNX model to DLC, passing the AIMET encodings as overrides
qairt-converter \
  --input_network "$ONNX_MODEL_PATH" \
  --quantization_overrides "$ENCODING_PATH" \
  --output_path "$DLC_MODEL_PATH" \
  --source_model_input_shape 'inputImg' 1,3,224,224

# Generate the HTP context binary from the DLC
qnn-context-binary-generator \
  --model libQnnModelDlc.so \
  --backend libQnnHtp.so \
  --dlc_path "$DLC_MODEL_PATH" \
  --output_dir "$QUALCOMM_MODEL_DIR" \
  --binary_file "${MODEL_NAME}"
```
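
Since the encodings reach the converter through `--quantization_overrides`, one possible sanity check is to confirm that every tensor name in the encodings file also exists in the ONNX graph being converted. The sketch below is a hypothetical helper, not part of AIMET or QAIRT. It assumes the same two encodings layouts as above (name-keyed mapping, or list of entries with a `name` key) and takes the graph tensor names as a plain collection, so it runs without `onnx` installed.

```python
def encoding_names(encodings):
    """Collect every tensor name referenced by the encodings (assumed layouts)."""
    names = set()
    for key in ("activation_encodings", "param_encodings"):
        section = encodings.get(key, {})
        if isinstance(section, dict):
            names.update(section.keys())
        else:
            names.update(enc["name"] for enc in section if "name" in enc)
    return names


def missing_from_graph(encodings, graph_tensor_names):
    """Return encoding names that do not appear among the graph's tensor names."""
    return sorted(encoding_names(encodings) - set(graph_tensor_names))


# With the onnx package available, the graph names could be gathered like this:
#   model = onnx.load(onnx_model_path)
#   graph_tensor_names = (
#       {i.name for i in model.graph.input}
#       | {init.name for init in model.graph.initializer}
#       | {out for node in model.graph.node for out in node.output}
#   )
```

An empty result only shows that the names line up; it says nothing about whether the converter applies each override, which would have to be checked in the converter's own output.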
## Expected Behavior
The accuracy of the model after QAIRT conversion + context binary generation should be close to the accuracy observed with `QuantizationSimModel`.

## Actual Behavior
There is a severe accuracy collapse after the QAIRT conversion pipeline, while direct quantization via `qnn-onnx-converter` works fine.
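
To put a number on the gap for a single input, the raw outputs of `QuantizationSimModel` and of the on-device run can be compared directly rather than only through the final accuracy metric. The following is a minimal sketch using plain Python lists. The function names are hypothetical, and how the two output tensors are dumped and loaded is left out because it depends on the evaluation setup.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two flattened output tensors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def sqnr_db(reference, candidate):
    """Signal-to-quantization-noise ratio in dB, treating `reference` as ground truth."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, candidate))
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


# Made-up values standing in for one sample's flattened outputs
sim_output = [0.10, 2.30, -1.20, 0.75]
device_output = [0.12, 2.25, -1.18, 0.70]
print("cosine similarity:", cosine_similarity(sim_output, device_output))
print("SQNR (dB):", sqnr_db(sim_output, device_output))
```

Running the same comparison against the outputs of the working `qnn-onnx-converter` pipeline gives a baseline for how much deviation is normal on this device.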
