Click4Ai

81.

Medium

Implement the **K-Means Clustering** algorithm from scratch.

Algorithm:

1. Randomly initialize k centroids from the data points

2. **Assign:** Each point to the nearest centroid (using Euclidean distance)

3. **Update:** Recompute each centroid as the mean of its assigned points

4. Repeat steps 2-3 until centroids stop changing or max_iterations reached

Example:

km = KMeans(n_clusters=3)

km.fit(X)

labels = km.predict(X) # Cluster assignments for each point

print(km.centroids) # Final centroid positions

**Explanation:** K-Means partitions n data points into k clusters by minimizing within-cluster sum of squared distances. Each iteration reduces the total distance, guaranteeing convergence (to a local minimum).

Constraints:

  • Initialize centroids by randomly selecting k data points (no duplicates)
  • Use Euclidean distance for assignment
  • Convergence check: stop when centroids don't change between iterations
  • Test Cases

    Test Case 1
    Input: 3 well-separated clusters, k=3
    Expected: 3 distinct labels assigned
    Test Case 2
    Input: centroids shape after fit = (k, n_features)
    Expected: True
    Test Case 3
    Input: predict returns array of length n_samples
    Expected: True
    + 2 hidden test cases