Implement **K-Means++ Initialization**, a smart centroid initialization strategy that leads to better clustering results.
Algorithm:
1. Choose the first centroid uniformly at random from the data points.
2. For each remaining centroid:
   a. Compute the squared distance from each point to its nearest existing centroid.
   b. Choose the next centroid with probability proportional to that squared distance.
3. Repeat until k centroids have been selected.
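The steps above can be sketched as follows. This is a minimal NumPy implementation of the procedure; the function name matches the example call below, and the `rng` seed parameter is an added convenience for reproducibility.

```python
import numpy as np

def kmeans_plus_plus_init(X, k, rng=None):
    """K-Means++ initialization: pick k well-spread centroids from X."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    # Step 1: choose the first centroid uniformly at random.
    centroids = [X[rng.integers(n)]]
    for _ in range(k - 1):
        # Step 2a: squared distance from each point to its nearest chosen centroid.
        diffs = X[:, None, :] - np.asarray(centroids)[None, :, :]
        d2 = np.min((diffs ** 2).sum(axis=-1), axis=1)
        # Step 2b: sample the next centroid with probability proportional to distance^2.
        # Already-chosen points have d2 == 0, so they are never re-selected.
        centroids.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centroids)
```

Note that the weighting in step 2b makes far-away points exponentially more likely to be picked than nearby ones, which is what spreads the centroids out.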
Example:
```python
centroids = kmeans_plus_plus_init(X, k=3)
# Centroids are spread out, not clustered together
print(centroids.shape)  # (3, n_features)
```
**Explanation:** Standard K-Means randomly picks initial centroids, which can lead to poor clustering if centroids start close together. K-Means++ spreads them out by preferring points far from existing centroids, leading to faster convergence and better results.
Constraints:
Test Cases:

| Input | Expected Output |
| --- | --- |
| `X=random(100,2), k=3` | shape `(3, 2)` |
| `k=1` | shape `(1, n_features)`, a single data point |
| `X=[[0,0],[10,10],[20,20]], k=3` | all 3 points selected (they are maximally spread) |