1x1 Convolution (Pointwise Convolution)
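A 1x1 convolution (also called pointwise convolution) applies a 1x1 kernel at every spatial position of the input, so each output value depends only on the input values at that same position and never on spatial neighbors. For a single-channel input this reduces to multiplying every element by one scalar. For a multi-channel input, each output channel is a **linear combination of the input channels** at that position, which mixes information across channels while preserving the spatial dimensions.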
Formula:
For a single-channel input:
output[i, j] = input[i, j] * kernel[0, 0]
For multi-channel input (C_in channels -> C_out channels):
output[c_out, i, j] = sum(kernel[c_out, c_in] * input[c_in, i, j] for c_in in range(C_in))
Output spatial dimensions = Input spatial dimensions (unchanged)
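The multi-channel formula can be sketched in plain Python as follows. This is a minimal illustration, not a reference solution: the function name `conv1x1_multi`, the nested-list layout (`inputs` as C_in x H x W, `kernel` as C_out x C_in), and the sample values are all assumptions made for this example.

```python
def conv1x1_multi(inputs, kernel):
    """Apply a 1x1 convolution to a multi-channel input.

    inputs: nested list of shape C_in x H x W (assumed layout)
    kernel: nested list of shape C_out x C_in (assumed layout)
    Returns a nested list of shape C_out x H x W.
    """
    c_in = len(inputs)
    height = len(inputs[0])
    width = len(inputs[0][0])

    output = []
    for weights in kernel:  # one row of weights per output channel
        if len(weights) != c_in:
            raise ValueError("each kernel row needs one weight per input channel")
        # Every output pixel is a weighted sum over channels at the same (i, j).
        channel = [
            [
                sum(weights[c] * inputs[c][i][j] for c in range(c_in))
                for j in range(width)
            ]
            for i in range(height)
        ]
        output.append(channel)
    return output


# Hypothetical 2-channel, 2x2 input mapped to 3 output channels.
x = [
    [[1, 2], [3, 4]],  # channel 0
    [[5, 6], [7, 8]],  # channel 1
]
w = [
    [1, 1],   # output channel 0: sum of both input channels
    [2, 0],   # output channel 1: twice channel 0
    [0, -1],  # output channel 2: negated channel 1
]
y = conv1x1_multi(x, w)
print(y)
```

The output keeps the 2x2 spatial size and only the channel count changes, here from 2 to 3.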
Example:
Input (2x2):
[[1, 2],
 [3, 4]]
Kernel (1x1):
[[5]]
output[0,0] = 1 * 5 = 5
output[0,1] = 2 * 5 = 10
output[1,0] = 3 * 5 = 15
output[1,1] = 4 * 5 = 20
Output:
[[5, 10],
 [15, 20]]
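The single-channel case worked through above can be reproduced with a short sketch. The function name `conv1x1_single` and the nested-list input format are assumptions for illustration; the problem statement does not prescribe a signature.

```python
def conv1x1_single(image, kernel):
    """Scale every element of a 2D input by the single value in a 1x1 kernel."""
    scale = kernel[0][0]
    # No neighbors are involved, so each output element depends on one input element.
    return [[value * scale for value in row] for row in image]


result = conv1x1_single([[1, 2], [3, 4]], [[5]])
print(result)  # [[5, 10], [15, 20]]
```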
1x1 convolutions were popularized by the Network in Network paper and are widely used in modern architectures. They serve three key purposes: (1) dimensionality reduction, by mapping the input to fewer channels, (2) adding non-linearity when followed by an activation function, and (3) cross-channel feature combination. GoogLeNet (Inception) places them before its more expensive 3x3 and 5x5 convolutions to cut the channel count those layers have to process, which reduces computation.
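To make the computation savings concrete, the sketch below counts multiplications for a 5x5 convolution applied directly versus one preceded by a 1x1 reduction. The layer sizes (28x28 feature map, 192 input channels, 16 reduced channels, 32 output channels) are illustrative assumptions rather than figures from this document, and the count assumes stride 1 with padding that keeps the spatial size unchanged. The helper `conv_multiplies` is hypothetical.

```python
def conv_multiplies(height, width, c_in, c_out, k):
    """Multiplications for a k x k convolution whose output is height x width x c_out.

    Assumes stride 1 and padding that preserves the spatial size.
    """
    return height * width * c_out * k * k * c_in


# Illustrative sizes only (assumed, not taken from the text).
H, W = 28, 28
C_IN, C_REDUCED, C_OUT = 192, 16, 32

# 5x5 convolution applied directly to all 192 input channels.
direct = conv_multiplies(H, W, C_IN, C_OUT, 5)

# 1x1 convolution down to 16 channels, then the 5x5 convolution on those.
bottleneck = (
    conv_multiplies(H, W, C_IN, C_REDUCED, 1)
    + conv_multiplies(H, W, C_REDUCED, C_OUT, 5)
)

print(direct)      # 120422400
print(bottleneck)  # 12443648
```

Under these assumed sizes the 1x1 reduction cuts the multiplication count by roughly a factor of ten.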
Constraints:
Test Cases
Input: [[1,2],[3,4]]
Expected output: [[5,10],[15,20]]

Input: [[5,6],[7,8]]
Expected output: [[25,30],[35,40]]