
**82. K-Means++ Initialization** (Medium)

Implement **K-Means++ Initialization**, a smart centroid-initialization strategy that leads to better clustering results.

Algorithm:

1. Choose the first centroid uniformly at random from the data points.

2. For each remaining centroid:

   a. Compute the squared distance from each point to its nearest existing centroid.

   b. Choose the next centroid with probability proportional to that squared distance.

3. Repeat until k centroids have been selected.

Example:

```python
centroids = kmeans_plus_plus_init(X, k=3)
# centroids are spread out, not clustered together
print(centroids.shape)  # (3, n_features)
```

**Explanation:** Standard K-Means randomly picks initial centroids, which can lead to poor clustering if centroids start close together. K-Means++ spreads them out by preferring points far from existing centroids, leading to faster convergence and better results.

Constraints:

  • Return a numpy array of shape (k, n_features)
  • Use squared distances for probability weighting
  • Normalize probabilities so they sum to 1
Test Cases:

    Test Case 1
    Input: X=random(100,2), k=3
    Expected: shape (3, 2)
    Test Case 2
    Input: k=1
    Expected: shape (1, n_features), a single data point
    Test Case 3
    Input: X=[[0,0],[10,10],[20,20]], k=3
    Expected: All 3 points selected (they are maximally spread)
    + 2 hidden test cases
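The algorithm above can be sketched as follows. This is one possible solution, not the official one; the optional `rng` seed parameter is an addition for reproducibility and is not part of the stated signature `kmeans_plus_plus_init(X, k)`:

```python
import numpy as np

def kmeans_plus_plus_init(X, k, rng=None):
    """Pick k initial centroids from X using K-Means++ seeding.

    rng is an illustrative extra parameter (seed or Generator) for
    reproducible sampling; omit it for the behavior in the problem.
    """
    rng = np.random.default_rng(rng)
    n_samples = X.shape[0]

    # Step 1: first centroid chosen uniformly at random.
    centroids = [X[rng.integers(n_samples)]]

    for _ in range(k - 1):
        # Step 2a: squared distance from each point to its nearest centroid.
        diffs = X[:, None, :] - np.asarray(centroids)[None, :, :]
        d2 = np.min((diffs ** 2).sum(axis=-1), axis=1)

        # Step 2b: sample with probability proportional to distance^2,
        # normalized so the probabilities sum to 1.
        probs = d2 / d2.sum()
        centroids.append(X[rng.choice(n_samples, p=probs)])

    return np.array(centroids)  # shape (k, n_features)
```

Note that an already-chosen point has squared distance 0 to its nearest centroid, so it can never be re-selected; on maximally spread data like Test Case 3 this guarantees all k distinct points are picked.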