Implement **Lasso Regression** (Linear Regression with L1 regularization) using gradient descent.
Loss Function:
Loss = (1/n) * sum((y_pred - y)^2) + alpha * sum(|w|)
Gradient Descent Update Rules:
dw = (2/n) * (X^T @ (y_pred - y)) + alpha * sign(w)
db = (2/n) * sum(y_pred - y)
w = w - learning_rate * dw
b = b - learning_rate * db
Where sign(w) is applied element-wise: it returns -1 for negative entries, +1 for positive entries, and 0 for entries that are exactly zero (a common subgradient convention, since |w| is not differentiable at 0).
Note: The L1 penalty **only applies to weights**, not the bias term.
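The update rules above can be sketched as a single NumPy step; the values here are illustrative, not from the test cases:

```python
import numpy as np

# One (sub)gradient descent step under the update rules above.
X = np.array([[1.0, 2.0], [2.0, 3.0]])  # n=2 samples, 2 features
y = np.array([5.0, 8.0])
w = np.array([0.5, -0.5])               # illustrative current weights
b = 0.0
alpha, lr, n = 0.1, 0.01, len(y)

y_pred = X @ w + b
# L1 term (alpha * sign(w)) is added to the weight gradient only...
dw = (2 / n) * (X.T @ (y_pred - y)) + alpha * np.sign(w)
# ...while the bias gradient carries no penalty term.
db = (2 / n) * np.sum(y_pred - y)

w -= lr * dw
b -= lr * db
```

Note that `np.sign` already returns 0 for zero-valued weights, matching the convention stated above.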
Your class should have:
- `__init__(self, alpha, learning_rate)` — store the regularization strength and step size
- `fit(X, y, n_iterations)` — run gradient descent with the update rules above
- `predict(X)` — return predictions for new inputs
Example:
X = [[1, 2, 0], [2, 3, 0], [3, 4, 0], [4, 5, 0]]
y = [5, 8, 11, 14]
model = LassoRegression(alpha=0.1, learning_rate=0.01)
model.fit(X, y, n_iterations=1000)
# L1 tends to push irrelevant feature weights to exactly 0 (sparsity)
**Explanation:** L1 regularization encourages sparsity -- it can drive some weights to exactly zero, effectively performing feature selection. This is useful when you suspect that only a subset of the features is truly important.
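A minimal sketch of the full class, assuming the interface shown in the example above (`alpha`, `learning_rate`, `fit(X, y, n_iterations)`, plus a `predict` method implied by the test cases):

```python
import numpy as np

class LassoRegression:
    """Linear regression with an L1 penalty, trained by (sub)gradient descent.

    Sketch implementation of the update rules in the problem statement;
    zero initialization of w and b is an assumption, not specified above.
    """

    def __init__(self, alpha=0.1, learning_rate=0.01):
        self.alpha = alpha
        self.learning_rate = learning_rate
        self.w = None
        self.b = 0.0

    def fit(self, X, y, n_iterations=1000):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n, d = X.shape
        self.w = np.zeros(d)
        self.b = 0.0
        for _ in range(n_iterations):
            y_pred = X @ self.w + self.b
            error = y_pred - y
            # L1 subgradient applies to the weights only, not the bias.
            dw = (2 / n) * (X.T @ error) + self.alpha * np.sign(self.w)
            db = (2 / n) * np.sum(error)
            self.w -= self.learning_rate * dw
            self.b -= self.learning_rate * db
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.w + self.b

# Usage, following the example above:
X = [[1, 2, 0], [2, 3, 0], [3, 4, 0], [4, 5, 0]]
y = [5, 8, 11, 14]
model = LassoRegression(alpha=0.1, learning_rate=0.01)
model.fit(X, y, n_iterations=1000)
```

Because the third feature column is all zeros, both its data gradient and `sign(0)` vanish, so its weight never moves from 0 -- the sparsity behavior the comment in the example points out.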
Constraints:
Test Cases:

| X | y | alpha | Expected behavior |
|---|---|-------|-------------------|
| [[1,2],[2,3],[3,4],[4,5]] | [5,8,11,14] | 0.1 | predictions close to true values; some weight shrinkage |
| [[1,0],[2,0],[3,0],[4,0]] | [2,4,6,8] | 0.5 | weight for irrelevant feature (col 2) pushed toward 0 |
| [[1],[2],[3],[4],[5]] | [2,4,6,8,10] | 0.0 | equivalent to standard linear regression (weight ~2.0) |