## Quantization
Quantization is a technique used to reduce the precision of a model's weights and activations, making it more efficient to compute and store. This is particularly useful in deep learning models where high precision can lead to slower inference times and larger model sizes.
**Example:** Suppose we have a neural network with weights that are represented as 32-bit floating point numbers. We can quantize these weights to 8-bit integers, reducing the memory usage and computation time.
**Constraints:** The input to the function will be a numpy array representing the weights of the model. The function should return the quantized weights.
Test Cases
Test Case 1
Input:
[[1.0, 2.0], [3.0, 4.0]]Expected:
[[1, 2], [3, 4]]Test Case 2
Input:
[[-10.0, 10.0], [-5.0, 5.0]]Expected:
[[-10, 10], [-5, 5]]+ 3 hidden test cases