Click4Ai

122. VGG Block

Difficulty: Medium

A VGG block is the fundamental building unit of the VGG network architecture (Simonyan & Zisserman, 2014). It follows a simple pattern: **one or more 3x3 convolution + ReLU layers** followed by a **2x2 max pooling layer**. The design showed that a deep stack of small filters can outperform a shallower stack of large ones: two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution, with fewer parameters and an extra nonlinearity in between.
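One way to see why small filters help is to compare parameter counts and receptive fields directly. The sketch below is illustrative only: the helper names `conv_params` and `stacked_receptive_field` are hypothetical, biases are ignored, and the channel count of 64 is an arbitrary choice.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Weight count of one conv layer (biases ignored for simplicity)."""
    return kernel_size * kernel_size * in_channels * out_channels


def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of num_layers stacked stride-1 convolutions.

    Each stride-1 layer widens the receptive field by (kernel_size - 1).
    """
    return 1 + num_layers * (kernel_size - 1)


channels = 64  # assumed channel count, same for input and output

two_3x3 = 2 * conv_params(3, channels, channels)  # two stacked 3x3 layers
one_5x5 = conv_params(5, channels, channels)      # one 5x5 layer

print(stacked_receptive_field(3, 2))  # 5, same coverage as a single 5x5
print(two_3x3, one_5x5)               # 73728 102400
```

With equal channel counts C, the two 3x3 layers need 18·C² weights against 25·C² for the single 5x5 layer, and they apply ReLU twice instead of once.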

VGG Block Pattern:

x = input

For each conv layer (repeated num_conv_layers times):

    x = ReLU(Conv3x3(x))

output = MaxPool2x2(x)

Where:

Conv3x3: Convolution with a 3x3 kernel, stride 1, and same padding (height and width are preserved)

ReLU: max(0, x) activation

MaxPool2x2: 2x2 max pooling with stride 2
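The pattern above can be sketched in plain NumPy. This is a minimal forward pass under stated assumptions, not a reference solution: the problem statement does not say how weights are supplied, so the `kernels` and `biases` arguments, the helper names, and the random initialization in the usage lines are all hypothetical.

```python
import numpy as np


def conv3x3_same(x, kernel, bias):
    """3x3 convolution, stride 1, zero 'same' padding.

    x: (N, H, W, C_in), kernel: (3, 3, C_in, C_out), bias: (C_out,)
    """
    n, h, w, _ = x.shape
    # Pad one pixel of zeros on each spatial side so H and W are preserved.
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1), (0, 0)))
    out = np.zeros((n, h, w, kernel.shape[-1]))
    for i in range(3):
        for j in range(3):
            # Shifted (N, H, W, C_in) window times a (C_in, C_out) slice.
            out += padded[:, i:i + h, j:j + w, :] @ kernel[i, j]
    return out + bias


def max_pool2x2(x):
    """2x2 max pooling with stride 2; odd trailing rows/columns are dropped."""
    n, h, w, c = x.shape
    h2, w2 = h // 2, w // 2
    x = x[:, :h2 * 2, :w2 * 2, :]
    return x.reshape(n, h2, 2, w2, 2, c).max(axis=(2, 4))


def vgg_block(x, kernels, biases):
    """Apply Conv3x3 + ReLU once per (kernel, bias) pair, then MaxPool2x2."""
    for kernel, bias in zip(kernels, biases):
        x = np.maximum(0, conv3x3_same(x, kernel, bias))
    return max_pool2x2(x)


# Usage with assumed random weights: 2 conv layers, 64 filters each.
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4, 4, 3))
kernels = [0.1 * rng.standard_normal((3, 3, 3, 64)),
           0.1 * rng.standard_normal((3, 3, 64, 64))]
biases = [np.zeros(64), np.zeros(64)]
out = vgg_block(x, kernels, biases)
print(out.shape)  # (1, 2, 2, 64)
```

Note that each convolution consumes the previous layer's output, so the first kernel maps the input channels to `num_filters` and every later kernel maps `num_filters` to `num_filters`.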

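Spatial dimensions after block (the convolutions preserve height and width; only the pooling halves them, rounding down for odd sizes):

output_height = floor(input_height / 2)

output_width = floor(input_width / 2)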

Example:

Input shape: (1, 4, 4, 3)

num_conv_layers = 2, num_filters = 64

Step 1: Conv3x3 + ReLU -> (1, 4, 4, 64)

Step 2: Conv3x3 + ReLU -> (1, 4, 4, 64)

Step 3: MaxPool2x2 -> (1, 2, 2, 64)

Output shape: (1, 2, 2, 64)
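The shape bookkeeping in this example can be reproduced without running any convolutions. The helper below is a hypothetical sketch (the name `vgg_block_shapes` is not part of the problem) that assumes same-padded, stride-1 convolutions and unpadded 2x2 pooling with stride 2.

```python
def vgg_block_shapes(input_shape, num_conv_layers, num_filters):
    """Return the tensor shape after each step of one VGG block."""
    batch, height, width, _ = input_shape
    shapes = []
    # Each same-padded conv keeps H and W and sets channels to num_filters.
    for _ in range(num_conv_layers):
        shapes.append((batch, height, width, num_filters))
    # The pooling step halves H and W (rounding down) and keeps channels.
    shapes.append((batch, height // 2, width // 2, num_filters))
    return shapes


for step, shape in enumerate(vgg_block_shapes((1, 4, 4, 3), 2, 64), start=1):
    print(f"Step {step}: {shape}")
# Step 1: (1, 4, 4, 64)
# Step 2: (1, 4, 4, 64)
# Step 3: (1, 2, 2, 64)
```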

VGG demonstrated that network depth is critical for good performance. The VGG-16 and VGG-19 variants stack five VGG blocks with filter counts of 64, 128, 256, 512, and 512, achieving strong results on ImageNet classification.
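To illustrate how stacked blocks trade spatial resolution for channels, the sketch below traces the output shape of each block for a 224x224 RGB input. The per-block layer counts follow the VGG-16 configuration from the original paper; the names `VGG16_BLOCKS` and `trace_block_outputs` are hypothetical, and the fully connected classifier layers are left out.

```python
# (num_conv_layers, num_filters) for each of the five VGG-16 blocks.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def trace_block_outputs(input_shape, blocks):
    """Return the output shape of every block for an NHWC input shape."""
    batch, height, width, _ = input_shape
    shapes = []
    for _num_conv_layers, num_filters in blocks:
        # Convs keep H and W; the trailing 2x2 pool halves both.
        height, width = height // 2, width // 2
        shapes.append((batch, height, width, num_filters))
    return shapes


for shape in trace_block_outputs((1, 224, 224, 3), VGG16_BLOCKS):
    print(shape)
# (1, 112, 112, 64)
# (1, 56, 56, 128)
# (1, 28, 28, 256)
# (1, 14, 14, 512)
# (1, 7, 7, 512)
```

The 13 convolution layers in these five blocks, plus three fully connected layers, give VGG-16 its name.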

Constraints:

  • Input shape is (batch_size, height, width, channels)
  • Convolution uses 3x3 kernels with same padding
  • ReLU activation is applied after each convolution
  • Max pooling uses 2x2 windows with stride 2
  • num_conv_layers and num_filters are positive integers

Test Cases:

    Test Case 1
    Input: [[[[1,2],[3,4]],[[5,6],[7,8]]]]
    Expected: [[[[7],[8]],[[15],[16]]]]
    Test Case 2
    Input: [[[[1,2,3],[4,5,6],[7,8,9]],[[10,11,12],[13,14,15],[16,17,18]]]]
    Expected: [[[[27],[36]],[[54],[72]]]]
    + 3 hidden test cases