AIMET Quantized Model Experiences Severe Accuracy Drop After QAIRT Conversion #4094

Description

@slatbox

Environment & Versions

  • Chip: Snapdragon XR2 Gen 2 (DSP architecture 69)
  • QAIRT Version: 2.43.0
  • AIMET-ONNX Version: 2.29.1

Problem Description

Scenario

I am trying to apply W8A8 quantization to a modified MobileNetV2 model using AIMET, then deploy it to a Qualcomm device (Android).

  • After quantization, the accuracy evaluated locally using QuantizationSimModel is as expected (good).
  • However, after converting the quantized model to .bin format using qairt-converter + qnn-context-binary-generator, the accuracy drops dramatically.
  • As a control experiment, converting and quantizing the same original model directly with qnn-onnx-converter yields good accuracy on the Qualcomm platform.

Because the control experiment reaches good accuracy on the device, the on-device evaluation pipeline and the Qualcomm HTP backend appear to be working correctly. The problem therefore seems specific to the combination of AIMET quantization and QAIRT conversion.

1. AIMET Quantization Script

I used the following script to perform W8A8 quantization on the original ONNX model. The accuracy evaluated with QuantizationSimModel after quantization meets expectations.

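import os

import aimet_onnx
import onnx

# MODEL_PATH, CALIBRATION_SAMPLES, CALIBRATION_BATCH_SIZE, model_name, model_format,
# dataset, and the two helper functions are not shown in this excerpt.


def main() -> None: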
    config_name = "W8A8_minmax"

    model = onnx.load_model(MODEL_PATH)
    sim = aimet_onnx.QuantizationSimModel(
        model=model,
        config_file="htp_v69",          # Note: using htp_v69 config
        providers=["CUDAExecutionProvider"],
    )
    print("QuantizationSimModel created successfully")

    # Step 4: Calibration
    print(f"Starting calibration (using {CALIBRATION_SAMPLES} samples)...")
    sim.compute_encodings(
        _create_calibration_generator(dataset, CALIBRATION_SAMPLES, CALIBRATION_BATCH_SIZE)
    )
    print("Calibration completed")

    # Step 5: Export quantized model
    quantized_model_path = os.path.join(
        "models", model_name, config_name, model_format, f"{model_name}.onnx"
    )
    os.makedirs(os.path.dirname(quantized_model_path), exist_ok=True)

    sim.export(
        path=os.path.dirname(quantized_model_path), 
        filename_prefix=model_name
    )

    # Evaluate accuracy
    metrics = _evaluate_quantized_model(sim, dataset)

2. QAIRT Conversion Script

I then used the following script to convert the AIMET-quantized model:

# set system parameters
TARGET="aarch64-android"

# set target model
MODEL_NAME=1225-600
OPT_FLAG="W8A8_minmax"
MODEL_ROOT_DIR=models/$MODEL_NAME/$OPT_FLAG

# init file path
ONNX_MODEL_DIR=$MODEL_ROOT_DIR/onnx
QUALCOMM_MODEL_DIR=$MODEL_ROOT_DIR/qualcomm

mkdir -p $QUALCOMM_MODEL_DIR

ONNX_MODEL_PATH=$ONNX_MODEL_DIR/${MODEL_NAME}.onnx
ENCODING_PATH=$ONNX_MODEL_DIR/${MODEL_NAME}.encodings
DLC_MODEL_PATH=$QUALCOMM_MODEL_DIR/${MODEL_NAME}.dlc


qairt-converter \
  --input_network $ONNX_MODEL_PATH \
  --quantization_overrides $ENCODING_PATH \
  --output_path $DLC_MODEL_PATH \
  --source_model_input_shape 'inputImg' 1,3,224,224

qnn-context-binary-generator \
  --model libQnnModelDlc.so \
  --backend libQnnHtp.so \
  --dlc_path $DLC_MODEL_PATH \
  --output_dir $QUALCOMM_MODEL_DIR \
  --binary_file ${MODEL_NAME}
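
Before converting, it may help to confirm that the file passed to `--quantization_overrides` actually contains 8-bit encodings for both activations and parameters. The following is a minimal sketch, assuming the exported `.encodings` file is JSON with top-level `activation_encodings` and `param_encodings` sections, each either a list of entries or a mapping from tensor name to entries, and that each entry stores its bit width under `bw` or `bitwidth`. That layout is an assumption and may differ between AIMET encoding versions; `summarize_encodings` is a hypothetical helper, not part of AIMET or QAIRT.

```python
import json
from collections import Counter


def summarize_encodings(path):
    """Count encoding entries and their bit widths per section of an encodings file."""
    with open(path, "r", encoding="utf-8") as f:
        doc = json.load(f)

    summary = {}
    for section in ("activation_encodings", "param_encodings"):
        entries = doc.get(section, [])
        # Assumption: a section is either a list of entries, or a dict mapping
        # tensor name -> entry (or list of per-channel entries).
        if isinstance(entries, dict):
            flat = []
            for value in entries.values():
                flat.extend(value if isinstance(value, list) else [value])
        else:
            flat = list(entries)
        # Assumption: the bit width is stored under "bw" or "bitwidth".
        bitwidths = Counter(
            e.get("bw", e.get("bitwidth")) for e in flat if isinstance(e, dict)
        )
        summary[section] = {"count": len(flat), "bitwidths": dict(bitwidths)}
    return summary
```

An empty section, or bit widths other than 8, would suggest that the overrides file does not describe the W8A8 model that was evaluated with QuantizationSimModel. Note that per-channel parameters are counted once per channel entry in the mapping layout.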

Expected Behavior

The accuracy of the model after QAIRT conversion + context binary generation should be close to the accuracy observed with QuantizationSimModel.
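
To quantify "close", one option is to compare raw output tensors for the same input instead of relying only on the end-to-end accuracy metric. The sketch below assumes that the QuantizationSimModel output and the on-device output have both been dumped as little-endian float32 `.raw` files; the file names in the usage note are hypothetical. It computes the signal-to-quantization-noise ratio (SQNR) and cosine similarity using only the Python standard library.

```python
import math
import struct
from pathlib import Path


def read_float32_raw(path):
    """Read a little-endian float32 .raw tensor dump into a list of floats."""
    data = Path(path).read_bytes()
    count = len(data) // 4
    return list(struct.unpack(f"<{count}f", data[: count * 4]))


def sqnr_db(reference, test):
    """SQNR in dB of `test` relative to `reference`; higher means closer."""
    if len(reference) != len(test):
        raise ValueError("tensors must have the same number of elements")
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


def cosine_similarity(a, b):
    """Cosine similarity of two flattened tensors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("tensors must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For example, `sqnr_db(read_float32_raw("sim_output.raw"), read_float32_raw("device_output.raw"))` would give a single number per output tensor. A value far below what the control model achieves on the same input would point to a mismatch introduced during conversion rather than in the evaluation code.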

Actual Behavior

There is a severe accuracy collapse after the QAIRT conversion pipeline, while direct quantization via qnn-onnx-converter works fine.

Labels: aimet-onnx (New feature or bug fix for AIMET ONNX)
