Click4Ai

135. Vanishing Gradient Problem (Medium)

Demonstrate the vanishing gradient problem by computing the gradient magnitude at each layer of a deep sigmoid network. In deep networks with sigmoid activations, gradients shrink exponentially as they are backpropagated through many layers because the maximum value of sigmoid's derivative is 0.25. This makes early layers learn extremely slowly.

Demonstration:

For a network with N sigmoid layers, the gradient magnitude at layer k (counting back from the output) is approximately:

gradient_magnitude[k] ≈ (max_sigmoid_derivative)^k = 0.25^k

This means:

Layer 1 (closest to output): gradient ≈ 0.25^1 = 0.25

Layer 2: gradient ≈ 0.25^2 = 0.0625

Layer 5: gradient ≈ 0.25^5 = 0.000977

Layer 10: gradient ≈ 0.25^10 ≈ 0.00000095

The gradient vanishes exponentially with depth!
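The 0.25 bound itself is easy to verify numerically: the sigmoid derivative is σ'(x) = σ(x)(1 − σ(x)), which peaks at x = 0 where σ(0) = 0.5. A quick sketch (helper names are illustrative, not part of the problem):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# The derivative is maximal at x = 0: sigmoid(0) = 0.5, so 0.5 * 0.5 = 0.25.
x = np.linspace(-10.0, 10.0, 10001)
print(sigmoid_derivative(0.0))      # 0.25
print(sigmoid_derivative(x).max())  # never exceeds 0.25
```

Since every backpropagated factor is at most 0.25, a product of k such factors is at most 0.25^k.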

Example:

Input: num_layers = 5

sigmoid_derivative_max = 0.25

Gradient magnitudes per layer (from output to input):

Layer 5 (output): 0.25^1 = 0.25

Layer 4: 0.25^2 = 0.0625

Layer 3: 0.25^3 = 0.015625

Layer 2: 0.25^4 = 0.00390625

Layer 1 (input): 0.25^5 = 0.0009765625

Output: [0.0009765625, 0.00390625, 0.015625, 0.0625, 0.25]
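One minimal solution sketch that reproduces this example (the function name `compute_gradient_magnitudes` is an assumption; the problem statement does not fix one):

```python
import numpy as np

def compute_gradient_magnitudes(num_layers, max_derivative=0.25):
    """Approximate per-layer gradient magnitudes in a deep sigmoid network.

    Layer i (1-indexed from the input) sits (num_layers - i + 1) derivative
    multiplications away from the output, so its gradient magnitude is
    max_derivative ** (num_layers - i + 1).
    """
    # Exponents [num_layers, num_layers-1, ..., 1] give input-to-output order.
    exponents = np.arange(num_layers, 0, -1)
    return max_derivative ** exponents

print(compute_gradient_magnitudes(5))
```

For `num_layers = 5` this returns the array `[0.0009765625, 0.00390625, 0.015625, 0.0625, 0.25]`, matching the example above.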

**Explanation:** The vanishing gradient problem occurs because during backpropagation, gradients are multiplied by the derivatives of activation functions at each layer. For sigmoid, the maximum derivative is 0.25 (at x=0), so each layer multiplication shrinks the gradient by at least 75%. After many layers, the gradient reaching the early layers is essentially zero, preventing those layers from learning. This is why modern deep networks use ReLU activations (derivative = 1 for positive inputs) and skip connections (ResNet) to maintain gradient flow.
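The contrast with ReLU can be seen by multiplying per-layer derivative factors along a backprop path (a toy sketch of the factor product, not a full backpropagation implementation):

```python
num_layers = 10

# Best case for sigmoid: every pre-activation sits at x = 0, where the
# derivative is maximal (0.25). Even then the product decays as 0.25^k.
sigmoid_gradient = 0.25 ** num_layers

# ReLU on a positive pre-activation has derivative exactly 1, so the
# product of derivative factors does not shrink with depth.
relu_gradient = 1.0 ** num_layers

print(f"sigmoid, {num_layers} layers: {sigmoid_gradient:.2e}")  # 9.54e-07
print(f"relu,    {num_layers} layers: {relu_gradient:.2e}")     # 1.00e+00
```

This is the best case for sigmoid; off-center activations have even smaller derivatives, so real networks decay faster than 0.25^k.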

Constraints:

  • `num_layers` is a positive integer (1 to 20)
  • Return a 1D numpy array of gradient magnitudes for each layer, ordered from input layer to output layer
  • Use 0.25 as the maximum sigmoid derivative per layer
  • The gradient at layer i (1-indexed from the input) = 0.25^(num_layers - i + 1)
Test Cases:

    Test Case 1
    Input: 5
    Expected: [0.000977, 0.003906, 0.015625, 0.0625, 0.25]

    Test Case 2
    Input: 3
    Expected: [0.015625, 0.0625, 0.25]

    Test Case 3
    Input: 1
    Expected: [0.25]

    + 2 hidden test cases