He Initialization
Implement He (Kaiming) initialization for a neural network layer. He initialization is a weight initialization strategy specifically designed for layers using ReLU (Rectified Linear Unit) activation functions. Since ReLU sets all negative values to zero, it effectively halves the variance of the output, and He initialization compensates for this by using a larger initial variance than Xavier initialization.
The He initialization formula is:
std = sqrt(2 / fan_in)
weights ~ N(0, std)
where:
fan_in = number of input neurons
N(0, std) = normal distribution with mean 0 and standard deviation std
Your function he_initialization(num_inputs, num_outputs) should generate a weight matrix of shape (num_inputs, num_outputs) with values sampled from a normal distribution using the computed standard deviation.
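A minimal NumPy sketch of the function as specified (the type hints and docstring are my additions, not part of the original spec):

```python
import numpy as np

def he_initialization(num_inputs: int, num_outputs: int) -> np.ndarray:
    """He (Kaiming) initialization for a ReLU layer.

    Draws weights from a normal distribution with mean 0 and
    standard deviation sqrt(2 / fan_in), where fan_in = num_inputs.
    """
    std = np.sqrt(2.0 / num_inputs)  # He std: compensates for ReLU zeroing half the inputs
    return np.random.normal(loc=0.0, scale=std, size=(num_inputs, num_outputs))
```

With enough samples, the empirical standard deviation of the returned matrix should be close to sqrt(2 / num_inputs).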
Example:
Input: num_inputs = 256, num_outputs = 128
std = sqrt(2 / 256) = sqrt(0.0078125) = 0.0884
Output: A (256, 128) matrix with values sampled from N(0, 0.0884)
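A quick check of the arithmetic in this example:

```python
import math

# fan_in = 256, so std = sqrt(2 / 256)
std = math.sqrt(2 / 256)
print(round(std, 4))  # → 0.0884
```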
He initialization was introduced in the paper "Delving Deep into Rectifiers" by Kaiming He et al. It accounts for the fact that ReLU neurons output zero for half of their inputs (the negative half), so the variance needs to be doubled compared to Xavier initialization to maintain the signal magnitude through deep ReLU networks.
Test Cases
Input: [256, 128]
Output: [[0.089, -0.061, ...], ...]

Input: [512, 256]
Output: [[0.063, -0.043, ...], ...]