Adam Optimizer
Implement the Adam (Adaptive Moment Estimation) optimizer for updating neural network weights. Adam combines the benefits of two other optimizers: AdaGrad, which adapts the learning rate per parameter, and RMSProp, which uses a moving average of squared gradients. It is one of the most widely used optimizers in deep learning.
The Adam update rules are:
# First moment estimate (momentum):
m = beta1 * m + (1 - beta1) * gradient
# Second moment estimate (velocity):
v = beta2 * v + (1 - beta2) * gradient^2
# Bias-corrected estimates:
m_hat = m / (1 - beta1^t)
v_hat = v / (1 - beta2^t)
# Weight update:
weights = weights - learning_rate * m_hat / (sqrt(v_hat) + epsilon)
Your function adam_optimizer(weights, gradients, learning_rate, beta1, beta2, epsilon) should initialize m and v to zeros, compute one step of Adam, and return the updated weights.
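The update rules above can be sketched in NumPy as follows. This is a minimal single-step implementation, assuming t = 1 (which follows from initializing m and v to zeros and taking one step); the function name and default hyperparameters come from the problem statement.

```python
import numpy as np

def adam_optimizer(weights, gradients, learning_rate=0.01,
                   beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam step with m and v initialized to zeros (so t = 1)."""
    weights = np.asarray(weights, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    t = 1
    m = np.zeros_like(weights)
    v = np.zeros_like(weights)
    # First moment (momentum) and second moment (velocity) estimates
    m = beta1 * m + (1 - beta1) * gradients
    v = beta2 * v + (1 - beta2) * gradients ** 2
    # Bias correction compensates for the zero initialization of m and v
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter update
    return weights - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)

updated = adam_optimizer([[1, 2], [3, 4]], [[0.1, 0.2], [0.3, 0.4]])
print(updated)  # ≈ [[0.99 1.99] [2.99 3.99]]
```

Note that at t = 1 the bias-corrected terms reduce to m_hat = gradients and v_hat = gradients², so each parameter moves by roughly learning_rate regardless of gradient magnitude.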
Example:
Input: weights = [[1, 2], [3, 4]], gradients = [[0.1, 0.2], [0.3, 0.4]]
learning_rate = 0.01, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8
m = 0.9 * 0 + 0.1 * gradients = 0.1 * gradients
v = 0.999 * 0 + 0.001 * gradients^2 = 0.001 * gradients^2
Since t = 1, bias correction gives m_hat = gradients and v_hat = gradients^2, so each parameter moves by learning_rate * gradient / (|gradient| + epsilon) ≈ 0.01.
Output: approximately [[0.99, 1.99], [2.99, 3.99]]
Adam maintains per-parameter learning rates that are adapted based on the first moment (mean) and second moment (uncentered variance) of the gradients. The bias correction terms compensate for the initialization of m and v at zero, which biases them toward zero during the initial time steps.
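A small numerical illustration of the bias-correction point above (values chosen for illustration, using beta1 = 0.9 at t = 1):

```python
import numpy as np

# With m initialized to 0 and beta1 = 0.9, the raw first-moment
# estimate after one step is only 10% of the gradient magnitude.
beta1, t = 0.9, 1
g = np.array([0.1, 0.2])

m = (1 - beta1) * g           # raw estimate, biased toward zero
m_hat = m / (1 - beta1 ** t)  # bias-corrected: recovers g exactly at t = 1

print(m)      # [0.01 0.02]
print(m_hat)  # [0.1 0.2]
```

As t grows, 1 - beta1^t approaches 1 and the correction fades, which is why the bias only matters during the initial time steps.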
Test Cases
Input: weights = [[1, 2], [3, 4]] -> Output: [[0.99, 1.98], [2.97, 3.96]]
Input: weights = [[5, 6], [7, 8]] -> Output: [[4.95, 5.94], [6.93, 7.92]]