# LeNet-5 Implementation
LeNet-5, proposed by Yann LeCun and colleagues in 1998, is one of the earliest and most influential CNN architectures. It was designed for handwritten digit recognition (MNIST) and consists of **2 convolutional layers** (each followed by subsampling) and **3 fully connected layers**. In this problem, compute the total number of output activations of the first convolutional layer.
## LeNet-5 Architecture Overview
- Layer 1 (C1): Convolution — an (input_size x input_size) input yields (input_size - filter_size + 1)^2 * num_filters output activations (stride 1, no padding)
- Layer 2 (S2): Average pooling (subsampling)
- Layer 3 (C3): Convolution
- Layer 4 (S4): Average pooling
- Layers 5-7: Fully connected layers
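As a sketch of how the spatial sizes evolve through these layers (assuming the classic 32x32 input, 5x5 filters, stride 1, no padding, and 2x2 non-overlapping pooling), the shapes can be traced in plain Python:

```python
def conv_out(size, filter_size):
    # Valid convolution, stride 1: each spatial dimension shrinks by filter_size - 1.
    return size - filter_size + 1

def pool_out(size, window=2):
    # Non-overlapping 2x2 subsampling halves each spatial dimension.
    return size // window

size = 32                 # 32x32 input image
size = conv_out(size, 5)  # C1: 28x28 (6 feature maps)
size = pool_out(size)     # S2: 14x14
size = conv_out(size, 5)  # C3: 10x10 (16 feature maps)
size = pool_out(size)     # S4: 5x5, then flattened into the fully connected layers
print(size)  # -> 5
```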
## Formula for the First Conv Layer
spatial_output = input_size - filter_size + 1
total_output = spatial_output^2 * num_filters
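The formula translates directly into a small helper (a sketch; the function name `first_conv_activations` is chosen here for illustration, not part of the problem statement):

```python
def first_conv_activations(input_size, filter_size, num_filters):
    """Total output activations of the first conv layer (valid conv, stride 1)."""
    spatial_output = input_size - filter_size + 1
    return spatial_output ** 2 * num_filters

print(first_conv_activations(32, 5, 6))  # -> 4704, the original LeNet-5 C1 layer
```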
## Example
Input: input_size = 32, filter_size = 5, num_filters = 6 (the original LeNet-5 first layer)
spatial_output = 32 - 5 + 1 = 28
total_output = 28^2 * 6 = 784 * 6 = 4704
Output: 4704
LeNet-5 pioneered many concepts used in modern CNNs: local receptive fields, shared weights, and spatial subsampling. Despite its simplicity compared to modern architectures, it demonstrated that gradient-based learning could be applied to train multi-layer networks for visual pattern recognition.
## Test Cases
Input: [28, 5, 6] -> Output: 3456
Input: [32, 5, 10] -> Output: 7840
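Applying the formula to the test inputs can be checked with a minimal self-contained harness (the function name is illustrative):

```python
def first_conv_activations(input_size, filter_size, num_filters):
    # total activations = (input_size - filter_size + 1)^2 * num_filters
    spatial = input_size - filter_size + 1
    return spatial ** 2 * num_filters

for case in [(28, 5, 6), (32, 5, 10)]:
    print(list(case), "->", first_conv_activations(*case))
```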