(1) As shown in the figure below, I want to disable quantization for the specified Concat node. How should I write the corresponding code or configuration rules?
(2) The quantization tool TinyQ, for example, can explicitly bind multiple operators through surgeon.connect so that they share one set of quantization parameters (scale/zero_point). During operator fusion, only one set of QDQ nodes is then retained, which reduces computation and speeds up inference.
Take the Add operation as an example (shown in the figure below): for operator fusion and acceleration to work in the subsequent inference stage, the quantization parameters (scale) of all its input branches must be exactly the same. How can this be implemented in AIMET?
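To make the requirement in (2) concrete, here is a minimal sketch in plain Python of what I mean by "the same scale on every input branch". It does not use the TinyQ or AIMET APIs. The helper names `symmetric_scale` and `quantize` and the sample values are hypothetical, and I am assuming per-tensor symmetric int8 quantization (zero_point = 0) for simplicity.

```python
def symmetric_scale(values, num_bits=8):
    """Per-tensor symmetric scale: the largest |value| maps to the largest int."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Round to the nearest integer step and clamp to the signed integer range."""
    qmax = 2 ** (num_bits - 1) - 1
    qmin = -qmax - 1
    return [min(qmax, max(qmin, round(v / scale))) for v in values]


# Hypothetical activations of the two branches feeding one Add node.
branch_a = [0.5, -1.0, 0.9]
branch_b = [2.54, -0.3, 1.1]

# Shared parameters: calibrate once over the union of both branches
# instead of computing a separate scale per branch.
shared_scale = symmetric_scale(branch_a + branch_b)

qa = quantize(branch_a, shared_scale)
qb = quantize(branch_b, shared_scale)

# Because both inputs use the same scale, the Add can run directly on the
# integers and be dequantized once, with no per-branch requantization.
int_sum = [a + b for a, b in zip(qa, qb)]
approx_sum = [s * shared_scale for s in int_sum]
```

With separate per-branch scales, the integer values of the two inputs would not be directly addable, which is why I need a way to tie the encodings of the Add inputs together.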
Looking forward to your reply to clarify my questions. Thank you.