Quantization Aware Training
Implement Quantization Aware Training (QAT), a technique that simulates the effects of quantization during the training process. By inserting fake quantization operations (rounding weights and activations to their nearest quantization levels), the model learns to be robust to the precision loss that occurs when deploying to low-bit hardware (e.g., INT8).
Fake Quantization:
quantized_value = round(original_value)
QAT Loss:
L = MSE(round(weights), weights) + MSE(round(activations), activations)
= mean((round(w) - w)^2) + mean((round(a) - a)^2)
Where:
weights = model weight values (float)
activations = model activation values (float)
round() = rounding to nearest integer (simulating quantization)
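The loss above can be sketched directly in Python; this is a minimal reference implementation under the problem's simplified assumption that quantization is plain rounding to the nearest integer (note that `np.round` rounds halves to even, which matters only for values exactly at .5):

```python
import numpy as np

def qat_loss(weights, activations):
    """Quantization distortion: MSE between values and their rounded
    (fake-quantized) counterparts, summed over weights and activations."""
    w = np.asarray(weights, dtype=float)
    a = np.asarray(activations, dtype=float)
    weight_loss = np.mean((np.round(w) - w) ** 2)
    act_loss = np.mean((np.round(a) - a) ** 2)
    return float(weight_loss + act_loss)

print(qat_loss([1.1, 1.2], [1.3, 1.4]))  # ≈ 0.15, matching the example below
```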
Example:
Input: weights = [1.1, 1.2], activations = [1.3, 1.4]
round(weights) = [1.0, 1.0]
round(activations) = [1.0, 1.0]
weight_loss = mean([(1.0-1.1)^2, (1.0-1.2)^2]) = mean([0.01, 0.04]) = 0.025
act_loss = mean([(1.0-1.3)^2, (1.0-1.4)^2]) = mean([0.09, 0.16]) = 0.125
Total loss = 0.025 + 0.125 = 0.15
Output: 0.15
The QAT loss quantifies the distortion introduced by rounding. During training, the model adjusts its weights to minimize this distortion, resulting in weights that are naturally closer to quantization-friendly values. This leads to minimal accuracy loss when the model is actually quantized for deployment.
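To make the training dynamic above concrete, here is a small sketch (an illustration, not part of the problem statement) of gradient descent on the weight term of the QAT loss alone. Since round() has zero gradient almost everywhere, the sketch treats round(w) as a constant when differentiating, a straight-through-style approximation; the helper name `qat_grad` is hypothetical:

```python
import numpy as np

def qat_grad(x):
    # Treating round(x) as constant (straight-through assumption),
    # d/dx mean((round(x) - x)^2) = 2 * (x - round(x)) / x.size
    return 2.0 * (x - np.round(x)) / x.size

w = np.array([1.1, 1.2])        # example weights from above
for _ in range(100):
    w = w - 0.5 * qat_grad(w)   # plain gradient descent, lr = 0.5

print(w)  # weights settle at the nearest quantization levels: [1. 1.]
```

Each step halves the distance to the nearest integer, so the weights converge to quantization-friendly values, which is exactly the behavior the paragraph above describes.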
Constraints:
Test Cases:
Input: weights = [1.1, 1.2], activations = [1.3, 1.4] -> Output: 0.15
Input: weights = [1.5, 1.6], activations = [1.7, 1.8] -> Output: 0.27