Xavier Initialization
Implement Xavier (Glorot) initialization for a neural network layer. Xavier initialization is a weight initialization strategy designed to keep the variance of activations and gradients consistent across layers. It is particularly suited for layers using sigmoid or tanh activation functions.
The Xavier initialization formula is:
std = sqrt(2 / (fan_in + fan_out))
weights ~ N(0, std)
where:
fan_in = number of input neurons
fan_out = number of output neurons
N(0, std) = normal distribution with mean 0 and standard deviation std
Your function xavier_initialization(num_inputs, num_outputs) should generate a weight matrix of shape (num_inputs, num_outputs) with values sampled from a normal distribution with the computed standard deviation.
Example:
Input: num_inputs = 256, num_outputs = 128
std = sqrt(2 / (256 + 128)) = sqrt(2 / 384) = 0.0722
Output: A (256, 128) matrix with values sampled from N(0, 0.0722)
Proper weight initialization is crucial for training deep networks. If weights are too large, activations can explode and gradients can vanish (or vice versa). Xavier initialization ensures that the variance of the outputs of each layer is approximately equal to the variance of its inputs, maintaining a stable signal flow through the network.
Constraints:
Test Cases
[256, 128][[0.047, -0.031, ...], ...][512, 256][[0.033, -0.023, ...], ...]