Transposed Convolution
Transposed convolution (sometimes called deconvolution, although it does not invert a convolution, or fractionally-strided convolution) is an **upsampling operation** that increases the spatial dimensions of the input. It works by placing a copy of the kernel, scaled by each input value, at a stride-spaced position in the output and summing the overlapping contributions.
Formula (no padding, no output padding):
output_size = (input_size - 1) * stride + kernel_size
For stride = 2, input (N x N), kernel (K x K):
output_size = (N - 1) * 2 + K
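The size formula can be checked with a one-line helper (the function name here is illustrative):

```python
def transposed_conv_output_size(input_size, kernel_size, stride):
    # No padding or output padding: (N - 1) * stride + K
    return (input_size - 1) * stride + kernel_size
```

For a 2x2 input, 2x2 kernel, and stride 2 this gives (2 - 1) * 2 + 2 = 4.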
Algorithm:
For each input position (i, j):
output[i*stride : i*stride+K, j*stride : j*stride+K] += input[i,j] * kernel
Overlapping regions are summed together.
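The scatter-and-accumulate loop above can be written directly in Python. This is a minimal sketch, assuming a single channel, a square input and kernel, and no padding:

```python
def transposed_conv2d(inp, kernel, stride):
    """2D transposed convolution (single channel, square input/kernel, no padding)."""
    n, k = len(inp), len(kernel)
    out_size = (n - 1) * stride + k
    out = [[0] * out_size for _ in range(out_size)]
    for i in range(n):
        for j in range(n):
            # Scatter inp[i][j] * kernel into a K x K window of the output;
            # overlapping windows accumulate via +=.
            for di in range(k):
                for dj in range(k):
                    out[i * stride + di][j * stride + dj] += inp[i][j] * kernel[di][dj]
    return out
```

When stride equals the kernel size, as in the example below, the K x K windows tile the output without overlap; with stride smaller than the kernel size, the += accumulation is what sums the overlapping regions.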
Example:
Input (2x2):
[[1, 2],
 [3, 4]]

Kernel (2x2):
[[5, 6],
 [7, 8]]

Stride = 2
output_size = (2-1)*2 + 2 = 4 -> 4x4 output
Place input[0,0]=1 * kernel at (0,0): [[5,6,0,0],[7,8,0,0],[0,0,0,0],[0,0,0,0]]
Place input[0,1]=2 * kernel at (0,2): add [[0,0,10,12],[0,0,14,16],[0,0,0,0],[0,0,0,0]]
Place input[1,0]=3 * kernel at (2,0): add [[0,0,0,0],[0,0,0,0],[15,18,0,0],[21,24,0,0]]
Place input[1,1]=4 * kernel at (2,2): add [[0,0,0,0],[0,0,0,0],[0,0,20,24],[0,0,28,32]]
Output:
[[ 5,  6, 10, 12],
 [ 7,  8, 14, 16],
 [15, 18, 20, 24],
 [21, 24, 28, 32]]
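The four placements above can be reproduced step by step; `place` is a small helper introduced here purely for illustration:

```python
def place(value, kernel, top, left, size=4):
    # Build a size x size grid containing value * kernel at offset (top, left).
    out = [[0] * size for _ in range(size)]
    for di, row in enumerate(kernel):
        for dj, w in enumerate(row):
            out[top + di][left + dj] = value * w
    return out

kernel = [[5, 6], [7, 8]]
steps = [place(1, kernel, 0, 0), place(2, kernel, 0, 2),
         place(3, kernel, 2, 0), place(4, kernel, 2, 2)]
# Sum the four partial grids element-wise to get the final output.
output = [[sum(s[r][c] for s in steps) for c in range(4)] for r in range(4)]
# output == [[5, 6, 10, 12], [7, 8, 14, 16], [15, 18, 20, 24], [21, 24, 28, 32]]
```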
Transposed convolutions are essential in architectures that require upsampling, such as autoencoders, GANs (generators), and semantic segmentation networks (e.g., U-Net decoder). They learn upsampling parameters rather than using fixed interpolation.
Test Cases

Test 1:
  Input: [[1,2],[3,4]]
  Expected Output: [[12,21,12,21],[21,30,21,30],[12,21,12,21],[21,30,21,30]]

Test 2:
  Input: [[5,6],[7,8]]
  Expected Output: [[60,81,60,81],[81,108,81,108],[60,81,60,81],[81,108,81,108]]