Do AIMET and the QNN HTP backend support weight-only quantization? #4017
xiaoxiaosuaxuan asked this question in Q&A
Do AIMET and the QNN HTP backend support weight-only quantization? For example, activations stay in FP16 while weights are quantized to 4-bit or 8-bit.
Alternatively, does the QNN HTP backend support FP16 matmul? If so, it may be feasible to dequantize the weights manually and then run the matmul in FP16. The QNN operation documentation says that the HTP backend supports FP16 FullyConnected, but I could not get it to run on a device.
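To make the manual-dequantization idea concrete, here is a minimal NumPy sketch of the arithmetic only. It assumes symmetric per-output-channel quantization, with 4-bit codes stored in an int8 container. The function names are hypothetical; this is not AIMET's or QNN's API, and it says nothing about how the HTP backend would actually execute the graph.

```python
import numpy as np


def quantize_weights_per_channel(w: np.ndarray, num_bits: int = 8):
    """Hypothetical helper: symmetric per-output-channel weight quantization.

    w has shape (out_features, in_features). Returns integer codes (int8
    container, also used for 4-bit) and one FP16 scale per output channel.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    max_abs = np.max(np.abs(w), axis=1, keepdims=True)
    # Avoid a zero scale for an all-zero row.
    scale = np.where(max_abs > 0, max_abs / qmax, 1.0)
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale.astype(np.float16)


def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Map integer codes back to FP16 weights: w_hat = q * scale."""
    return q.astype(np.float16) * scale


def fc_weight_only(x_fp16: np.ndarray, q: np.ndarray, scale: np.ndarray, bias=None):
    """Fully connected layer with FP16 activations and quantized weights.

    The weights are dequantized to FP16 first, then a plain FP16 matmul runs.
    """
    w_fp16 = dequantize(q, scale)
    y = x_fp16 @ w_fp16.T
    if bias is not None:
        y = y + bias.astype(np.float16)
    return y


# Usage: 4-bit weights, FP16 activations.
rng = np.random.default_rng(0)
w = rng.standard_normal((16, 32)).astype(np.float32)
x = rng.standard_normal((4, 32)).astype(np.float16)
q4, s4 = quantize_weights_per_channel(w, num_bits=4)
y = fc_weight_only(x, q4, s4)
```

Storing only `q` and `scale` is what saves memory and bandwidth; the compute itself still happens in FP16, which is why this workaround depends on the backend accepting an FP16 FullyConnected or matmul.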