# Depthwise Separable Convolution
Depthwise separable convolution, popularized by MobileNet, factorizes a standard convolution into two steps: (1) a **depthwise convolution** that applies a separate filter to each input channel independently, and (2) a **pointwise (1x1) convolution** that combines the outputs across channels. This dramatically reduces computational cost.
Algorithm:

Step 1 - Depthwise Convolution:
  For each input channel c:
    output_dw[c] = input[c] convolved with kernel_dw[c]  (one K x K filter per channel)
  (In the simplified single-channel example below, the kernel is the same size
  as the input, so this step reduces to an element-wise product.)

Step 2 - Pointwise Convolution (1x1):
  For each output channel c_out:
    output[c_out] = sum over c_in of kernel_pw[c_out, c_in] * output_dw[c_in]
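The two steps above can be sketched in NumPy as follows (shapes, names, and the valid-padding/stride-1 choice are illustrative assumptions, not part of the exercise):

```python
import numpy as np

def depthwise_separable_conv(x, k_dw, k_pw):
    """x: (C_in, H, W); k_dw: (C_in, K, K); k_pw: (C_out, C_in).

    Valid padding, stride 1 -- an illustrative sketch, not a library API.
    """
    c_in, h, w = x.shape
    _, kh, kw = k_dw.shape
    oh, ow = h - kh + 1, w - kw + 1

    # Step 1: depthwise -- each channel is convolved with its own filter.
    dw = np.zeros((c_in, oh, ow))
    for c in range(c_in):
        for i in range(oh):
            for j in range(ow):
                dw[c, i, j] = np.sum(x[c, i:i + kh, j:j + kw] * k_dw[c])

    # Step 2: pointwise -- a 1x1 conv mixes channels at each position.
    out = np.einsum('oc,chw->ohw', k_pw, dw)
    return out
```

Note that no single filter ever sees both spatial and cross-channel context at once; that factorization is exactly where the savings come from.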
Cost comparison (K x K kernel, C_in input channels, C_out output channels, per output position):
  Standard convolution:  K^2 * C_in * C_out multiply-adds
  Depthwise separable:   K^2 * C_in + C_in * C_out multiply-adds
  Reduction factor:      1/C_out + 1/K^2, i.e. roughly 1/K^2 when C_out is large
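Plugging representative numbers into the formulas above makes the savings concrete (the specific values K=3, C_in=64, C_out=128 are illustrative):

```python
# Illustrative cost check for the two formulas above.
K, C_in, C_out = 3, 64, 128
standard = K**2 * C_in * C_out           # 73728 multiply-adds per position
separable = K**2 * C_in + C_in * C_out   # 576 + 8192 = 8768
print(round(standard / separable, 1))    # 8.4  (about an 8x reduction)
```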
Example (single channel; the kernel is the same size as the input, so the
depthwise step is an element-wise product):

  Input (2x2):        Kernel (2x2):
  [[1, 2],            [[5, 6],
   [3, 4]]             [7, 8]]

  Depthwise (element-wise multiplication):
    output[0,0] = 1 * 5 = 5
    output[0,1] = 2 * 6 = 12
    output[1,0] = 3 * 7 = 21
    output[1,1] = 4 * 8 = 32

  Output: [[5, 12],
           [21, 32]]
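The worked example above can be reproduced with a few lines of Python (the function name is made up for illustration; it implements only the simplified same-size-kernel case, not a general sliding-window convolution):

```python
# Simplified depthwise step: kernel size equals input size, so the
# "convolution" reduces to an element-wise (Hadamard) product.
def depthwise_elementwise(inp, kernel):
    return [[inp[i][j] * kernel[i][j] for j in range(len(inp[0]))]
            for i in range(len(inp))]

print(depthwise_elementwise([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[5, 12], [21, 32]]
```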
Depthwise separable convolutions reduce the number of parameters and computations by a factor of approximately K^2 (the kernel size squared) compared to standard convolutions when the output channel count is large. They are the backbone of efficient architectures such as MobileNet, Xception, and EfficientNet, making deep learning feasible on mobile and edge devices.
Constraints:
Test Cases (input -> expected output, using the kernel [[5, 6], [7, 8]] from the example):
  [[1, 2], [3, 4]]  ->  [[5, 12], [21, 32]]
  [[5, 6], [7, 8]]  ->  [[25, 36], [49, 64]]
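Both test pairs can be verified directly, assuming the kernel [[5, 6], [7, 8]] from the worked example applies to each input (the helper below is the same illustrative element-wise simplification used in that example):

```python
# Check both test pairs with the simplified element-wise depthwise step.
def depthwise_elementwise(inp, kernel):
    return [[a * b for a, b in zip(ri, rk)] for ri, rk in zip(inp, kernel)]

kernel = [[5, 6], [7, 8]]
assert depthwise_elementwise([[1, 2], [3, 4]], kernel) == [[5, 12], [21, 32]]
assert depthwise_elementwise([[5, 6], [7, 8]], kernel) == [[25, 36], [49, 64]]
print("all test cases pass")
```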