Do AIMET and the QNN HTP backend support weight-only quantization? #4017

xiaoxiaosuaxuan · 2025-07-25T08:53:33Z

Do AIMET and the QNN HTP backend support weight-only quantization? For example, the activations stay in fp16 while the weights are quantized to 4 or 8 bits.

Alternatively, does the QNN HTP backend support fp16 matmul? If so, it may be feasible to dequantize the weights manually and then perform the matmul in fp16. The QNN operation documentation says that the HTP backend supports fp16 FullyConnected, but I failed to run it on device.
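To make the fallback idea concrete, here is a minimal sketch of what "dequantize the weights manually, then do an fp16 matmul" means numerically. It is plain Python and does not use AIMET or QNN at all, so it says nothing about what the HTP backend actually supports. The helper names (`to_fp16`, `quantize_per_channel`, `dequantize_fp16`, `matmul_fp16`) are made up for illustration, and symmetric per-output-channel scaling is an assumption, not something taken from the AIMET or QNN docs.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_per_channel(weight, bits=4):
    """Symmetric quantization with one scale per output channel (row).

    Assumed scheme for illustration only: q = round(w / scale),
    scale = max|w| / (2**(bits - 1) - 1).
    """
    qmax = 2 ** (bits - 1) - 1
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(w) for w in row)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        q_rows.append([max(-qmax - 1, min(qmax, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def dequantize_fp16(q_rows, scales):
    """Turn integer weights back into fp16-representable floats."""
    return [[to_fp16(q * to_fp16(s)) for q in row] for row, s in zip(q_rows, scales)]


def matmul_fp16(x, w):
    """Compute y = x @ w.T with fp16 inputs and fp16 outputs.

    Accumulation happens in a Python float (higher precision), which is a
    simplification and may not match any particular hardware.
    """
    return [
        [to_fp16(sum(to_fp16(a) * b for a, b in zip(x_row, w_row))) for w_row in w]
        for x_row in x
    ]


# Toy layer: 2 output channels, 4 input features.
weight = [[0.5, -1.0, 0.25, 0.75], [2.0, 0.0, -2.0, 1.0]]
x = [[1.0, 2.0, 3.0, 4.0]]

q_weight, scales = quantize_per_channel(weight, bits=4)  # stored as 4-bit integers
w_fp16 = dequantize_fp16(q_weight, scales)               # expanded back to fp16
y = matmul_fp16(x, w_fp16)                               # fp16 activation x fp16 weight
```

In this sketch only `q_weight` and `scales` would need to be stored, and the dequantize step runs before the matmul. Whether that step can be expressed as ops that the HTP backend accepts in fp16 is exactly the open question above.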

Replies: 0 comments
