It involves training the model at high precision and then quantizing the weights and activations to lower precision for inference. This reduces model size while preserving high accuracy. Because quantization represents model parameters with lower-bit integers (e.g., int8), both the model size and runtime memory footprint are reduced.
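As a concrete illustration, below is a minimal sketch of symmetric per-tensor int8 quantization in NumPy. The function names, the single per-tensor scale, and the round-trip error check are assumptions for demonstration, not a specific library's API.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric int8 quantization: map float weights into [-127, 127].
    Assumes a single per-tensor scale (a common, simple scheme)."""
    scale = np.max(np.abs(w)) / 127.0  # scale so the largest weight maps to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights to measure quantization error."""
    return q.astype(np.float32) * scale

# Toy example: quantize a random full-precision weight matrix.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("int8 storage is 4x smaller than float32")
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

Each int8 value occupies one byte instead of the four bytes of a float32, which is where the 4x reduction in model size comes from; the reconstruction error printed at the end is the precision cost that quantization schemes try to keep small.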