L2 Regularization (Ridge)
Implement L2 regularization, a technique that adds a penalty proportional to the **sum of squared values** of the model weights to the loss function. Unlike L1, which induces sparsity, L2 regularization smoothly penalizes large weights, encouraging the model to spread small weights across all features rather than relying heavily on a few.
Formula:
L2_penalty = lambda * sum(w_i^2)
Total loss = original_loss + L2_penalty
Where:
lambda (strength) = regularization coefficient (hyperparameter)
w_i^2 = square of each weight
sum = sum over all weights in the model
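The formula above can be sketched directly in Python (the function name `l2_penalty` is illustrative, not part of the problem's required API):

```python
def l2_penalty(weights, strength):
    """Compute the L2 penalty: strength * sum of squared weights."""
    return strength * sum(w * w for w in weights)

# Matches the worked example below:
print(l2_penalty([1, 2, 3], 0.5))  # 7.0
```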
Example:
Input: weights = [1, 2, 3], strength = 0.5
L2_penalty = 0.5 * (1^2 + 2^2 + 3^2)
= 0.5 * (1 + 4 + 9)
= 0.5 * 14
= 7.0
Output: 7.0
**Explanation:** L2 regularization penalizes the squared magnitude of weights, creating a smooth, differentiable penalty that grows quadratically with weight size. The gradient of w^2 is 2w, which means larger weights receive proportionally larger penalties, unlike L1, where every nonzero weight receives the same constant push. As a result, weights shrink smoothly toward zero but rarely reach exactly zero. L2 is equivalent to weight decay under plain SGD (the equivalence breaks down for adaptive optimizers like Adam) and is also called Ridge regularization. It helps prevent overfitting by encouraging the model to spread its capacity across many small weights.
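The weight-decay equivalence can be shown with a small numerical sketch (the helper `sgd_step_with_l2` and its parameters are illustrative): adding the gradient of the L2 penalty, 2 * lambda * w, to a plain SGD update is the same as multiplying each weight by a constant decay factor each step.

```python
def sgd_step_with_l2(w, grad_loss, lr, lam):
    """One plain-SGD step with an L2 penalty of lam * sum(w_i^2).

    The penalty's gradient is 2 * lam * w_i, so each weight is
    pulled toward zero in proportion to its current size.
    """
    return [wi - lr * (gi + 2 * lam * wi) for wi, gi in zip(w, grad_loss)]

# With a zero loss gradient, the update reduces to pure decay:
# each step multiplies every weight by (1 - 2 * lr * lam) = 0.9 here.
w = [1.0, -2.0]
for _ in range(3):
    w = sgd_step_with_l2(w, [0.0, 0.0], lr=0.1, lam=0.5)
print(w)  # each weight scaled by 0.9**3 = 0.729
```

Note how the weights shrink geometrically but never become exactly zero, matching the explanation above.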
Constraints:
Test Cases:
Input: weights = [1, 2, 3], strength = 0.5 -> Output: 7.0
Input: weights = [4, 5, 6], strength = 1.0 -> Output: 77.0