Click4Ai

106. SGD with Momentum (Medium)

Implement Stochastic Gradient Descent (SGD) with Momentum for updating neural network weights. Standard SGD can be slow and tends to oscillate in narrow valleys of the loss surface. Momentum accelerates SGD by accumulating a velocity vector in directions where the gradient consistently points, dampening oscillations and speeding convergence.

The SGD with Momentum update rules are:

velocity = momentum * velocity - learning_rate * gradient

weights = weights + velocity

Your function sgd_with_momentum(weights, gradients, learning_rate, momentum) should initialize the velocity to zero (if not provided), compute one step of the momentum update, and return the updated weights.
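A minimal NumPy sketch of one update step is below. Note that the optional `velocity` parameter and the extra velocity return value are assumptions added here so that steps can be chained; the problem statement itself only requires returning the updated weights.

```python
import numpy as np

def sgd_with_momentum(weights, gradients, learning_rate, momentum, velocity=None):
    """One SGD-with-momentum step; returns (updated_weights, velocity)."""
    weights = np.asarray(weights, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    if velocity is None:
        # Initialize velocity to zeros if not provided
        velocity = np.zeros_like(weights)
    # velocity = momentum * velocity - learning_rate * gradient
    velocity = momentum * velocity - learning_rate * gradients
    # weights = weights + velocity
    return weights + velocity, velocity
```

Running it on the worked example above (`learning_rate = 0.01`, `momentum = 0.9`, zero initial velocity) reproduces `[[0.999, 1.998], [2.997, 3.996]]`.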

Example:

Input: weights = [[1, 2], [3, 4]], gradients = [[0.1, 0.2], [0.3, 0.4]]

learning_rate = 0.01, momentum = 0.9

velocity (initial) = [[0, 0], [0, 0]]

velocity = 0.9 * [[0,0],[0,0]] - 0.01 * [[0.1,0.2],[0.3,0.4]]

= [[-0.001, -0.002], [-0.003, -0.004]]

weights = weights + velocity

Output: [[0.999, 1.998], [2.997, 3.996]]

The momentum term acts like a heavy ball rolling downhill: it accumulates velocity in directions where the gradient consistently points, letting the optimizer move faster through flat regions and roll past shallow local minima. A typical momentum value is 0.9.
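The heavy-ball effect can be seen numerically with a small made-up 1-D example (not part of the problem): under a constant gradient, plain SGD always moves learning_rate * |gradient| per step, while the momentum step grows toward learning_rate / (1 - momentum) times that, i.e. a 10x larger effective step at momentum 0.9.

```python
import numpy as np

lr, mom = 0.1, 0.9
g = np.array([1.0])          # constant gradient: a long, uniform slope
v = np.zeros_like(g)
steps = []
for _ in range(50):
    v = mom * v - lr * g     # velocity accumulates in the gradient's direction
    steps.append(-v[0])      # size of the step taken this iteration

# Plain SGD would step lr * |g| = 0.1 every time; with momentum the
# step size grows toward lr / (1 - mom) = 1.0, about 10x larger.
print(steps[0], steps[-1])
```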

Constraints:

  • The learning rate should be between 0 and 1
  • The momentum should be between 0 and 1 (typically 0.9)
  • Initialize velocity to zeros if not provided
  • The inputs (weights and gradients) should be 2D NumPy arrays
Test Cases

Test Case 1
Input: [[1, 2], [3, 4]]
Expected: [[0.99, 1.98], [2.97, 3.96]]

Test Case 2
Input: [[5, 6], [7, 8]]
Expected: [[4.95, 5.94], [6.93, 7.92]]

+ 3 hidden test cases