# Global Average Pooling
Global Average Pooling (GAP) is a pooling operation that reduces each feature map (channel) to a single scalar value by computing the **mean of all spatial positions**. It replaces fully connected layers at the end of CNN architectures, reducing the total number of parameters and helping prevent overfitting.
Formula:
For a feature map of shape (channels, height, width):
output[c] = (1 / (H * W)) * sum(input[c, i, j]) for all i, j
Output shape: (batch_size, channels) -- one value per channel
For a single-channel input of shape (1, H, W):
output = mean of all H * W values
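The formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not a framework API; the function name `global_avg_pool` is my own, and the input is assumed to be a (batch, channels, H, W) array:

```python
import numpy as np

def global_avg_pool(x: np.ndarray) -> np.ndarray:
    """Reduce a (batch, channels, H, W) tensor to (batch, channels)
    by averaging over the two spatial dimensions."""
    return x.mean(axis=(2, 3))

# A (1, 1, 2, 2) input: one sample, one channel, a 2x2 feature map.
x = np.array([[[[1.0, 2.0], [3.0, 4.0]]]])
print(global_avg_pool(x))  # [[2.5]]
```

Note that the spatial size (H, W) can be anything: because the output is always one scalar per channel, GAP lets the same network head accept variable-sized inputs, unlike a fully connected layer.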
Example:
Input tensor (shape 1x2x2):
[[[1, 2],
  [3, 4]]]
GAP computation:
output = (1 + 2 + 3 + 4) / 4 = 10 / 4 = 2.5
Output: 2.5
Global Average Pooling was popularized by the Network in Network (NIN) paper and is widely used in modern architectures like GoogLeNet and ResNet. It acts as a structural regularizer that enforces correspondences between feature maps and categories, making the network more interpretable.
Constraints:
Test Cases
Input: [[[1, 2], [3, 4]]]
Output: 2.5

Input: [[[5, 6], [7, 8]]]
Output: 6.5
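Both test cases can be checked with a short dependency-free sketch (the helper name `gap_single` is illustrative; it takes a single sample as a channels x H x W nested list):

```python
def gap_single(feature_maps):
    """Average each 2-D feature map (H x W nested list) down to one
    scalar per channel, returning a list of per-channel means."""
    return [
        sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
        for fm in feature_maps
    ]

print(gap_single([[[1, 2], [3, 4]]]))  # [2.5]
print(gap_single([[[5, 6], [7, 8]]]))  # [6.5]
```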