
**89. Medium**

Implement **Principal Component Analysis (PCA)** for dimensionality reduction.

Algorithm:

1. **Center** the data by subtracting the mean of each feature

2. Compute the **covariance matrix** of the centered data

3. Compute **eigenvalues and eigenvectors** of the covariance matrix

4. Sort eigenvectors by eigenvalues in descending order

5. Select the top n_components eigenvectors as principal components

6. **Transform:** Project data onto the selected components
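The six steps above can be sketched as a small NumPy class. This is a minimal sketch, not a reference solution; the attribute names (`mean_`, `components_`) follow scikit-learn convention but are assumptions here:

```python
import numpy as np

class PCA:
    """Minimal PCA via eigendecomposition of the covariance matrix."""

    def __init__(self, n_components):
        self.n_components = n_components
        self.mean_ = None
        self.components_ = None  # stored as rows: (n_components, n_features)

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # 1. Center the data by subtracting each feature's mean
        self.mean_ = X.mean(axis=0)
        X_centered = X - self.mean_
        # 2. Covariance matrix of the centered data (features as columns)
        cov = np.cov(X_centered, rowvar=False)
        # 3. Eigendecomposition; eigh is appropriate for symmetric matrices
        #    and returns eigenvalues in ascending order
        eigvals, eigvecs = np.linalg.eigh(cov)
        # 4. Sort eigenvectors by eigenvalue, descending
        order = np.argsort(eigvals)[::-1]
        # 5. Keep the top n_components eigenvectors, transposed into rows
        self.components_ = eigvecs[:, order[: self.n_components]].T
        return self

    def transform(self, X):
        # 6. Project centered data onto the selected components
        X_centered = np.asarray(X, dtype=float) - self.mean_
        return X_centered @ self.components_.T

    def fit_transform(self, X):
        return self.fit(X).transform(X)
```

Note that `eigh` returns eigenvalues in ascending order, so the sort in step 4 is still needed to put the largest-variance directions first.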

Example:

```python
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)  # X was (100, 10) → X_reduced is (100, 2)
```

**Explanation:** PCA finds the directions of maximum variance in the data. By projecting onto the top principal components, we reduce dimensionality while preserving as much information as possible.

Constraints:

  • Center the data (subtract mean) before computing covariance
  • Use np.linalg.eigh for symmetric matrix eigendecomposition
  • Store components as rows (shape: n_components x n_features)
Test Cases:

  Test Case 1
  Input: X shape (100, 5), n_components=2, fit_transform
  Expected: output shape (100, 2)

  Test Case 2
  Input: components shape after fit
  Expected: (n_components, n_features)

  Test Case 3
  Input: n_components=1, 2D data along a line
  Expected: captures the main direction of variance

  + 2 hidden test cases