AIMET Quantized Model Experiences Severe Accuracy Drop After QAIRT Conversion
Environment & Versions
- Chip: Snapdragon XR2 Gen 2 (DSP architecture 69)
- QAIRT Version: 2.43.0
- AIMET-ONNX Version: 2.29.1
Problem Description
Scenario
I am trying to apply W8A8 quantization to a modified MobileNetV2 model using AIMET, then deploy it to a Qualcomm device (Android).
- After quantization, the accuracy evaluated locally using QuantizationSimModel is as expected (good).
- However, after converting the quantized model to .bin format using qairt-converter + qnn-context-binary-generator, the accuracy drops dramatically.
- As a control experiment, converting and quantizing the same original model directly with qnn-onnx-converter yields good accuracy on the Qualcomm platform.
This indicates that the accuracy evaluation logic on the Qualcomm HTP backend itself is working correctly. The issue appears to be specific to the combination of AIMET quantization + QAIRT conversion.
1. AIMET Quantization Script
I used the following script to perform W8A8 quantization on the original ONNX model. The accuracy evaluated with QuantizationSimModel after quantization meets expectations.
import os

import onnx
import aimet_onnx

# MODEL_PATH, CALIBRATION_SAMPLES, CALIBRATION_BATCH_SIZE, dataset, model_name,
# model_format, and the underscore-prefixed helpers are defined elsewhere in the
# full script (not shown here).


def main() -> None:
    config_name = "W8A8_minmax"
    model = onnx.load_model(MODEL_PATH)
    sim = aimet_onnx.QuantizationSimModel(
        model=model,
        config_file="htp_v69",  # Note: using htp_v69 config
        providers=["CUDAExecutionProvider"],
    )
    print("QuantizationSimModel created successfully")

    # Step 4: Calibration
    print(f"Starting calibration (using {CALIBRATION_SAMPLES} samples)...")
    sim.compute_encodings(
        _create_calibration_generator(dataset, CALIBRATION_SAMPLES, CALIBRATION_BATCH_SIZE)
    )
    print("Calibration completed")

    # Step 5: Export quantized model (writes the .onnx and .encodings files)
    quantized_model_path = os.path.join(
        "models", model_name, config_name, model_format, f"{model_name}.onnx"
    )
    os.makedirs(os.path.dirname(quantized_model_path), exist_ok=True)
    sim.export(
        path=os.path.dirname(quantized_model_path),
        filename_prefix=model_name,
    )

    # Evaluate accuracy of the simulated quantized model
    metrics = _evaluate_quantized_model(sim, dataset)
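Before handing the exported files to the converter, it may help to confirm that the .encodings file really describes an 8-bit weight / 8-bit activation setup. The following is a minimal sketch, not part of the original script. It assumes the encodings file is JSON with top-level "activation_encodings" and "param_encodings" sections, each either a list of entries or a dict keyed by tensor name, and that the bitwidth is stored under "bw" or "bitwidth". These key names are assumptions and may differ between AIMET versions; summarize_encodings and summarize_encodings_file are hypothetical helper names.

```python
import json
from collections import Counter


def summarize_encodings(encodings: dict) -> dict:
    """Count tensors per bitwidth in each section of a parsed encodings dict.

    Assumed layouts (hypothetical, check against your AIMET version):
      - list layout: [{"name": ..., "bw": 8, ...}, ...]
      - dict layout: {tensor_name: [{"bitwidth": 8, ...}, ...], ...}
    """
    summary = {}
    for section in ("activation_encodings", "param_encodings"):
        entries = encodings.get(section, [])
        if isinstance(entries, dict):
            # One representative encoding per tensor (per-channel tensors
            # carry several encodings that share a bitwidth).
            per_tensor = [
                encs[0] if isinstance(encs, list) else encs
                for encs in entries.values()
                if encs
            ]
        else:
            per_tensor = list(entries)
        counts = Counter(enc.get("bw", enc.get("bitwidth")) for enc in per_tensor)
        summary[section] = dict(counts)
    return summary


def summarize_encodings_file(path: str) -> dict:
    """Load an encodings file from disk and summarize it."""
    with open(path, "r", encoding="utf-8") as f:
        return summarize_encodings(json.load(f))
```

Calling summarize_encodings_file on the exported .encodings path returns a per-section mapping of bitwidth to tensor count. Any bitwidth other than 8, or a None key (meaning the assumed field names were not found), would be worth examining before conversion.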
2. QAIRT Conversion Script
I then used the following script to convert the AIMET-quantized model:
# Set system parameters
TARGET="aarch64-android"

# Set target model
MODEL_NAME=1225-600
OPT_FLAG="W8A8_minmax"
MODEL_ROOT_DIR="models/$MODEL_NAME/$OPT_FLAG"

# Initialize file paths
ONNX_MODEL_DIR="$MODEL_ROOT_DIR/onnx"
QUALCOMM_MODEL_DIR="$MODEL_ROOT_DIR/qualcomm"
mkdir -p "$QUALCOMM_MODEL_DIR"
ONNX_MODEL_PATH="$ONNX_MODEL_DIR/${MODEL_NAME}.onnx"
ENCODING_PATH="$ONNX_MODEL_DIR/${MODEL_NAME}.encodings"
DLC_MODEL_PATH="$QUALCOMM_MODEL_DIR/${MODEL_NAME}.dlc"

# Convert the ONNX model to DLC, applying the AIMET encodings
qairt-converter \
    --input_network "$ONNX_MODEL_PATH" \
    --quantization_overrides "$ENCODING_PATH" \
    --output_path "$DLC_MODEL_PATH" \
    --source_model_input_shape 'inputImg' 1,3,224,224

# Generate the HTP context binary from the DLC
qnn-context-binary-generator \
    --model libQnnModelDlc.so \
    --backend libQnnHtp.so \
    --dlc_path "$DLC_MODEL_PATH" \
    --output_dir "$QUALCOMM_MODEL_DIR" \
    --binary_file "${MODEL_NAME}"
Expected Behavior
The accuracy of the model after QAIRT conversion + context binary generation should be close to the accuracy observed with QuantizationSimModel.
Actual Behavior
There is a severe accuracy collapse after the QAIRT conversion pipeline, while direct quantization via qnn-onnx-converter works fine.
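To put a number on the gap between the two pipelines, the raw outputs for the same inputs can be compared directly. The following is a minimal sketch, assuming the outputs from QuantizationSimModel and from the device have already been dumped as flat lists of floats per sample (for example, classification-style logits). The function names and the data layout are hypothetical and not part of the original scripts.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equally sized output vectors."""
    if len(a) != len(b):
        raise ValueError("output vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined for zero vectors; report 0 as a conservative value
    return dot / (norm_a * norm_b)


def _argmax(values: Sequence[float]) -> int:
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])


def top1_agreement(
    reference: Sequence[Sequence[float]],
    device: Sequence[Sequence[float]],
) -> float:
    """Fraction of samples where both pipelines pick the same top-1 index."""
    if len(reference) != len(device) or not reference:
        raise ValueError("need two non-empty batches of equal size")
    matches = sum(_argmax(r) == _argmax(d) for r, d in zip(reference, device))
    return matches / len(reference)
```

A cosine similarity close to 1.0 with low top-1 agreement would hint at small numeric drift near decision boundaries, whereas a low cosine similarity on most samples would suggest the encodings are being applied differently (or not at all) in the converted model.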