Deploy AIMET quantized model with quantized encodings to QNN #3878

Anurag Ranjan (anuragranj) · 2025-03-05T22:00:12Z

I am currently working on deploying an AIMET-quantized ONNX model with its quantization encodings on an Android device using QNN. However, I am facing discrepancies in the output, and I am unable to find any tutorial or documentation that outlines the complete workflow for this process.

Is there any official guide or example demonstrating how to take a standard model (such as MobileNet) through AIMET quantization and encodings, and successfully run it on an Android device using QNN? Any pointers or documentation on this would be greatly appreciated.
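To make "discrepancies in the output" concrete, it can help to put numbers on the gap between the AIMET simulation output and the on-device output. The following is a minimal sketch, assuming both outputs have already been loaded as flat sequences of floats; the function name `compare_outputs` and the sample values are hypothetical and not part of AIMET or QNN.

```python
import math


def compare_outputs(reference, candidate):
    """Return (max absolute error, SQNR in dB) for two equal-length float sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    if not reference:
        raise ValueError("outputs must not be empty")

    max_abs_err = max(abs(r - c) for r, c in zip(reference, candidate))
    signal = sum(r * r for r in reference)
    noise = sum((r - c) ** 2 for r, c in zip(reference, candidate))

    if noise == 0.0:
        sqnr_db = math.inf       # identical outputs
    elif signal == 0.0:
        sqnr_db = -math.inf      # all-zero reference, non-zero error
    else:
        sqnr_db = 10.0 * math.log10(signal / noise)
    return max_abs_err, sqnr_db


# Hypothetical values standing in for the QAT simulation and device outputs.
sim_out = [0.10, 0.70, 0.20]
device_out = [0.10, 0.65, 0.25]
max_err, sqnr = compare_outputs(sim_out, device_out)
print(f"max abs error = {max_err:.4f}, SQNR = {sqnr:.2f} dB")
```

Reporting both numbers per output tensor makes it easier to tell a small quantization-noise difference from a structural mismatch.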

cc: Abhi Khobare (@quic-akhobare)

Bhushan Sonawane (quic-bhushans) · 2025-03-06T00:29:58Z

Hi Anurag Ranjan (@anuragranj)

This is a two-step process:

1. Quantize with aimet-onnx.
   - Refer to the example notebooks here: https://quic.github.io/aimet-pages/releases/latest/examples/onnx/quantization/quantsim.html
2. Once you have the .aimet model, there are multiple ways to compile it for on-device execution.
   - Set up the QNN SDK, then compile and run using the QNN APIs.
   - Refer to this section for a guide to running on device: https://quic.github.io/aimet-pages/releases/latest/userguide/on_target_inference.html

If you are not happy with the on-device performance or see gaps, update the quantization scheme and revisit steps 1 and 2.

We are actively working on improving our docs to make them easier to use. Any feedback would be appreciated.
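The compile step described above can be scripted. The following is a rough sketch that only assembles the two command lines quoted later in this thread (`qairt-converter` followed by `qairt-quantizer`); it uses only the flags that appear there. The helper names and the DLC file names are hypothetical, and the commands are printed rather than executed, since running them assumes the SDK tools are on `PATH`.

```python
import shlex


def converter_cmd(model_path, encodings_path, dlc_path):
    """Build the qairt-converter invocation with AIMET encodings as overrides."""
    return [
        "qairt-converter",
        "--input_network", model_path,
        "--quantization_overrides", encodings_path,
        "--output_path", dlc_path,
    ]


def quantizer_cmd(input_dlc, output_dlc, float_fallback=True):
    """Build the qairt-quantizer invocation; --float_fallback is optional here."""
    cmd = ["qairt-quantizer", "--input_dlc", input_dlc, "--output_dlc", output_dlc]
    if float_fallback:
        cmd.append("--float_fallback")
    return cmd


# Input file names as used in this thread; the .dlc names are made up.
steps = [
    converter_cmd(
        "aimet_exported_file.onnx",
        "aimet_exported_json_encodings.encoding",
        "model_float.dlc",
    ),
    quantizer_cmd("model_float.dlc", "model_quantized.dlc"),
]

for argv in steps:
    # Print a copy-pasteable command line (shlex.join needs Python 3.8+).
    # To execute instead: subprocess.run(argv, check=True)
    print(shlex.join(argv))
```

Keeping the commands as argument lists avoids shell-quoting problems when paths contain spaces.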

3 replies

Anurag Ranjan (anuragranj) Mar 6, 2025
Author

Thanks Bhushan Sonawane (@quic-bhushans). We have the aimet_exported_file.onnx and aimet_exported_json_encodings.encoding file after doing QAT using AIMET. Now we convert it following the doc you linked (https://quic.github.io/aimet-pages/releases/latest/userguide/on_target_inference.html).

qairt-converter --input_network <AIMET_exported_model_path> --quantization_overrides <AIMET_exported_model.encodings> \
                --output_path <non-quantized_dlc>

Do we still need to run the quantization step, i.e.

qairt-quantizer --input_dlc <non-quantized_dlc> --output_dlc <quantized_dlc> \
                --float_fallback

given that our model has already been tuned using AIMET's Quantization Aware Training? If so, do we need to provide an input_list in this step?

We noticed that when using qnn-onnx-converter with --input_list input_list.txt --quantization_overrides aimet_exported_json_encodings.encoding, the deployed model produces substantially different results from the QAT model. It seems that the converter has modified the graph, so that the QNN graph has new or missing nodes that do not correspond to the overrides generated by AIMET's QAT. We want to confirm that this will not be the case with qairt-converter, and that we can achieve parity between the results of the model before and after conversion.
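One way to test the suspicion that the converter renamed or dropped nodes is to diff the tensor names in the AIMET encodings file against the tensor names in the converted graph. The following is a minimal sketch under two assumptions: that the encodings JSON has `activation_encodings` and `param_encodings` sections, stored either as a dict keyed by tensor name or as a list of entries with a `"name"` field, and that the converted graph's tensor names are already available as a set. How to extract those names from the converted model is not covered here, and the helper names and sample data are hypothetical.

```python
import json


def encoding_names(encodings):
    """Collect every tensor name that the encodings file provides an override for."""
    names = set()
    for section in ("activation_encodings", "param_encodings"):
        entries = encodings.get(section, {})
        if isinstance(entries, dict):
            # Assumed layout A: {"tensor_name": [...encoding...], ...}
            names.update(entries.keys())
        else:
            # Assumed layout B: [{"name": "tensor_name", ...}, ...]
            names.update(entry["name"] for entry in entries)
    return names


def unmatched_encodings(encodings, graph_tensor_names):
    """Return encodings whose tensor name does not exist in the converted graph."""
    return sorted(encoding_names(encodings) - set(graph_tensor_names))


# Hypothetical data: the converter has replaced "conv1_out" with "conv1_out_fused".
example = json.loads(
    '{"activation_encodings": {"input": [], "conv1_out": []},'
    ' "param_encodings": {"conv1.weight": []}}'
)
converted_graph = {"input", "conv1.weight", "conv1_out_fused"}
print(unmatched_encodings(example, converted_graph))
```

A non-empty result would indicate overrides that no longer have a tensor to attach to in the converted graph, which is one possible source of a mismatch with the QAT results.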

Anurag Ranjan (anuragranj) Mar 6, 2025
Author

Furthermore, following all of the steps leads to a "Device Creation failure" at the execution stage.

adb shell "cd /data/local/tmp/netrun ; export LD_LIBRARY_PATH=/data/local/tmp/netrun \
export ADSP_LIBRARY_PATH=/data/local/tmp/netrun ; ./qnn-net-run --backend libQnnHtp.so --input_list native_input_list.txt \
 --retrieve_context model.serialized.bin --use_native_input_files --use_native_output_files" 

qnn-net-run pid:12013

Device Creation failure

Any help would be really appreciated. Thanks.

Petros Toupas (ptoupas) Jul 18, 2025

Hi Anurag Ranjan (@anuragranj),
I am interested in your question about whether --input_list is still needed when --quantization_overrides is also provided. I have been wondering the same thing, so I wanted to check whether you eventually found an answer.
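Whether --input_list is still required alongside --quantization_overrides is left open in this thread. For experiments that do pass one, the following is a small sketch that generates the list file, assuming a single-input model and a list format of one raw input file path per line. The helper name `write_input_list` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def write_input_list(raw_dir, list_path, suffix=".raw"):
    """Write one input file path per line (assumed single-input list format)."""
    files = sorted(p for p in Path(raw_dir).iterdir() if p.suffix == suffix)
    Path(list_path).write_text("".join(f"{p.as_posix()}\n" for p in files))
    return len(files)


# Demo in a temporary directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    for name in ("b.raw", "a.raw", "notes.txt"):
        (tmp / name).write_bytes(b"")
    count = write_input_list(tmp, tmp / "input_list.txt")
    lines = (tmp / "input_list.txt").read_text().splitlines()

print(count, [Path(line).name for line in lines])
```

Sorting the paths keeps the list deterministic, so two conversion runs can be compared on the same calibration inputs.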
