
121. LeNet-5 Implementation (Hard)

LeNet-5, proposed by Yann LeCun in 1998, is one of the earliest and most influential CNN architectures. It was designed for handwritten digit recognition (MNIST) and consists of **2 convolutional layers** followed by **3 fully connected layers**. In this problem, compute the total number of output activations produced by the first convolutional layer.

LeNet-5 Architecture Overview:

Layer 1: Conv (input_size x input_size) -> (input_size - filter_size + 1)^2 * num_filters

Layer 2: Average Pooling (subsampling)

Layer 3: Conv

Layer 4: Average Pooling

Layer 5-7: Fully Connected Layers
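The layer-by-layer shapes above can be traced with two small helpers. This is a sketch assuming the original paper's dimensions (32x32 input, 5x5 filters, 2x2 non-overlapping pooling, 6 then 16 filters); the helper names `conv_out` and `pool_out` are illustrative, not part of the problem.

```python
def conv_out(size, filter_size):
    # Valid convolution, stride 1: each side shrinks by filter_size - 1
    return size - filter_size + 1

def pool_out(size, window=2):
    # Non-overlapping 2x2 average pooling halves each spatial dimension
    return size // window

# Trace the original LeNet-5 feature-map sizes from a 32x32 input
size = 32
size = conv_out(size, 5)   # C1: 6 filters  -> 28x28x6
size = pool_out(size)      # S2:            -> 14x14x6
size = conv_out(size, 5)   # C3: 16 filters -> 10x10x16
size = pool_out(size)      # S4:            -> 5x5x16
print(size)                # -> 5 (feeds the fully connected layers)
```

Note how each conv layer shrinks the map by 4 pixels per side (5x5 filter, no padding) and each pooling layer halves it, ending at 5x5x16 = 400 values before the fully connected stages.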

Formula for first conv layer output:

spatial_output = input_size - filter_size + 1

total_output = spatial_output^2 * num_filters

Example:

Input: input_size = 32, filter_size = 5, num_filters = 6

Original LeNet-5 first layer:

spatial_output = 32 - 5 + 1 = 28

total_output = 28^2 * 6 = 784 * 6 = 4704

Output: 4704
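The formula and worked example translate directly into a few lines of Python; the function name `first_conv_activations` is an illustrative choice, not mandated by the problem.

```python
def first_conv_activations(input_size, filter_size, num_filters):
    # Valid convolution, stride 1: output is (input - filter + 1) per side
    spatial_output = input_size - filter_size + 1
    # One activation per output pixel per filter
    return spatial_output ** 2 * num_filters

print(first_conv_activations(32, 5, 6))  # -> 4704, matching the example
```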

LeNet-5 pioneered many concepts used in modern CNNs: local receptive fields, shared weights, and spatial subsampling. Despite its simplicity compared to modern architectures, it demonstrated that gradient-based learning could be applied to train multi-layer networks for visual pattern recognition.

Constraints:

  • input_size, filter_size, and num_filters are positive integers
  • filter_size <= input_size
  • No padding is applied (valid convolution)
  • Stride is 1
Test Cases:

    Test Case 1
    Input: [28, 5, 6]
    Expected: 3456
    Test Case 2
    Input: [32, 5, 10]
    Expected: 7840
    + 3 hidden test cases