Gradient Clipping
Implement gradient clipping to prevent the exploding gradient problem in deep neural networks. During backpropagation, gradients can grow exponentially large as they are propagated through many layers, causing unstable training. Gradient clipping rescales the gradient vector when its norm exceeds a specified threshold.
The gradient clipping formula is:
norm = ||grad||                      (L2 norm of the gradient vector)
if norm > max_norm:
    grad = grad * max_norm / norm
else:
    grad = grad                      (unchanged)
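The rule above can be sketched for a single gradient vector in plain Python (the helper name `clip_vector` is an assumption, not part of the problem's required interface):

```python
import math

def clip_vector(grad, max_norm):
    # L2 norm of a single gradient vector
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:
        # rescale so the clipped vector has norm exactly max_norm
        return [g * max_norm / norm for g in grad]
    return grad  # unchanged when already within the threshold
```

Note that clipping preserves the direction of the vector: every component is multiplied by the same scalar max_norm / norm.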
Your function gradient_clipping(grads, max_norm) should compute the L2 norm of each row in the gradient matrix, and if it exceeds max_norm, rescale that row so its norm equals max_norm.
Example:
Input: grads = [[1, 2], [3, 4]], max_norm = 1.0
Row 0 norm: sqrt(1^2 + 2^2) = sqrt(5) ≈ 2.236 > 1.0 -> clip
Row 1 norm: sqrt(3^2 + 4^2) = sqrt(25) = 5.0 > 1.0 -> clip
Clipped row 0: [1, 2] * 1.0 / 2.236 ≈ [0.4472, 0.8944]
Clipped row 1: [3, 4] * 1.0 / 5.0 = [0.6, 0.8]
Output: [[0.4472, 0.8944], [0.6, 0.8]]
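A minimal row-wise implementation of `gradient_clipping`, sketched in NumPy (the epsilon guard against zero-norm rows is an added safety detail, not part of the problem statement):

```python
import numpy as np

def gradient_clipping(grads, max_norm):
    grads = np.asarray(grads, dtype=float)
    # per-row L2 norms, kept as a column so broadcasting scales each row
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    # scale factor: max_norm / norm for rows over the threshold, 1.0 otherwise;
    # the tiny epsilon avoids division by zero for all-zero rows
    scale = np.minimum(max_norm / np.maximum(norms, 1e-12), 1.0)
    return grads * scale
```

On the example above, `gradient_clipping([[1, 2], [3, 4]], 1.0)` reproduces `[[0.4472, 0.8944], [0.6, 0.8]]`.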
Gradient clipping is especially important in recurrent neural networks (RNNs), where the repeated application of weight matrices during backpropagation through time can cause gradients to explode. It preserves the direction of the gradient while limiting its magnitude.
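In practice, deep-learning frameworks usually clip the global norm across all parameter gradients at once rather than clipping each row independently; this is what PyTorch's `torch.nn.utils.clip_grad_norm_` and TensorFlow's `tf.clip_by_global_norm` do. A sketch of that variant (the function name here echoes TensorFlow's and is an assumption, not the function this problem asks for):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    # grads: list of arrays, one per parameter tensor
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        # one shared scale factor keeps the relative sizes of all tensors
        scale = max_norm / total_norm
        return [g * scale for g in grads]
    return grads
```

Because a single scale factor is applied to every tensor, the overall update direction is preserved exactly, just as in the per-vector case.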
Test Cases
Input: grads = [[1, 2], [3, 4]], max_norm = 1.0 -> Output: [[0.4472136, 0.8944272], [0.6, 0.8]]
Input: grads = [[5, 10], [15, 20]], max_norm = 1.0 -> Output: [[0.4472136, 0.8944272], [0.6, 0.8]]
(The second input's rows are parallel to the first's, so both tests produce the same clipped output.)