Implement **Ridge Regression** (Linear Regression with L2 regularization) using gradient descent.
Loss Function:
Loss = (1/n) * sum((y_pred - y)^2) + alpha * sum(w^2)
Gradient Descent Update Rules:
dw = (2/n) * (X^T @ (y_pred - y)) + 2 * alpha * w
db = (2/n) * sum(y_pred - y)
w = w - learning_rate * dw
b = b - learning_rate * db
Note: The L2 penalty **only applies to weights**, not the bias term.
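The update rules above translate directly into code. Below is a minimal NumPy sketch of one possible implementation; the class name and constructor arguments follow the example later in this problem, and starting the weights at zero is an assumption, not a requirement.

```python
import numpy as np

class RidgeRegression:
    """Ridge regression trained with batch gradient descent (sketch)."""

    def __init__(self, alpha=0.1, learning_rate=0.01):
        self.alpha = alpha                  # L2 penalty strength
        self.learning_rate = learning_rate
        self.w = None
        self.b = 0.0

    def fit(self, X, y, n_iterations=1000):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n, d = X.shape
        self.w = np.zeros(d)                # assumed zero initialization
        self.b = 0.0
        for _ in range(n_iterations):
            y_pred = X @ self.w + self.b
            error = y_pred - y
            # Gradients from the update rules above; the penalty term
            # 2 * alpha * w applies only to the weights, not the bias.
            dw = (2.0 / n) * (X.T @ error) + 2.0 * self.alpha * self.w
            db = (2.0 / n) * np.sum(error)
            self.w -= self.learning_rate * dw
            self.b -= self.learning_rate * db
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.w + self.b
```

With `alpha=0` this reduces to plain linear regression, since the penalty term in `dw` vanishes.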
Your class should have:
- `fit(X, y, n_iterations)` — train the model with gradient descent
- `predict(X)` — return predictions for new samples
Example:
X = [[1, 2], [2, 3], [3, 4], [4, 5]]
y = [5, 8, 11, 14]
alpha = 0.1
model = RidgeRegression(alpha=0.1, learning_rate=0.01)
model.fit(X, y, n_iterations=1000)
# Weights will be slightly shrunk toward 0 compared to unregularized regression
**Explanation:** L2 regularization adds a penalty proportional to the square of weights, discouraging large weight values and reducing overfitting.
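The shrinkage can be seen directly by sweeping `alpha` on the second test case's data. A small standalone sketch (assuming NumPy; `ridge_gd` is an illustrative helper function, not part of the required class interface):

```python
import numpy as np

def ridge_gd(X, y, alpha, lr=0.01, n_iter=5000):
    """Fit ridge regression by gradient descent; returns (w, b)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        err = X @ w + b - y
        w -= lr * ((2.0 / n) * (X.T @ err) + 2.0 * alpha * w)  # penalized
        b -= lr * (2.0 / n) * err.sum()                        # unpenalized
    return w, b

X = [[1], [2], [3], [4], [5]]
y = [2, 4, 6, 8, 10]
for alpha in (0.0, 0.1, 10.0):
    w, _ = ridge_gd(X, y, alpha)
    print(alpha, w[0])  # the fitted weight shrinks as alpha grows
```

At `alpha=0.0` the weight recovers the true slope of 2; larger `alpha` trades fit accuracy for smaller weights.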
Constraints:
Test Cases
Test Case 1
Input:
X=[[1,2],[2,3],[3,4],[4,5]], y=[5,8,11,14], alpha=0.1
Expected:
weights shrunk toward 0 vs unregularized; predictions still close to y
Test Case 2
Input:
X=[[1],[2],[3],[4],[5]], y=[2,4,6,8,10], alpha=0.0
Expected:
equivalent to standard linear regression (weight ~ 2.0)
Test Case 3
Input:
X=[[1],[2],[3],[4],[5]], y=[2,4,6,8,10], alpha=10.0
Expected:
weights significantly shrunk; predictions less accurate
+ 2 hidden test cases