Click4Ai

87.

Hard

Implement the **Mean Shift** clustering algorithm from scratch.

Mean Shift is a non-parametric, density-based clustering algorithm. It works by iteratively shifting each data point toward the **weighted mean** of its neighbors within a given bandwidth, using a Gaussian kernel for weighting.

Algorithm:

1. Initialize each data point as a candidate center.

2. For each candidate, find all points within the bandwidth radius.

3. Compute the Gaussian-weighted mean of those neighbors:

K(x) = exp(-0.5 * (||x|| / bandwidth)^2)

new_center = sum(K(x_i - center) * x_i) / sum(K(x_i - center))

4. Shift the candidate to the new weighted mean.

5. Repeat until convergence (shift distance < 1e-5) or max_iterations reached.

6. Merge candidate centers that are within bandwidth / 2 of each other.

7. Assign each original point to the nearest final cluster center.

Example:

X = [[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]]

ms = MeanShift(bandwidth=3.0)

ms.fit(X)

# Points [1,2], [1.5,1.8], [1,0.6] cluster together

# Points [5,8], [8,8], [9,11] cluster together

print(ms.labels_) # e.g., [0, 0, 1, 1, 0, 1]

Constraints:

  • Input X is a 2D numpy array of shape (n_samples, n_features)
  • bandwidth > 0
  • The class must expose `cluster_centers_` (array of final centers) and `labels_` (cluster label per point)
  • Use Euclidean distance
  • Test Cases

    Test Case 1
    Input: Example input 1
    Expected: Expected output 1
    Test Case 2
    Input: Example input 2
    Expected: Expected output 2
    + 1 hidden test cases