Click4Ai

86.

Hard

Implement **DBSCAN** (Density-Based Spatial Clustering of Applications with Noise).

Key Concepts:

  • **Core point:** Has at least `min_samples` points within `eps` radius
  • **Border point:** Within eps of a core point but not itself a core point
  • **Noise point:** Neither core nor border (labeled as -1)
  • Algorithm:

    1. For each unvisited point, find all neighbors within eps distance

    2. If the point has >= min_samples neighbors, it's a core point — start a new cluster

    3. Expand the cluster by recursively adding all density-reachable points

    4. Points not reachable from any core point are labeled as noise (-1)

    Example:

    db = DBSCAN(eps=0.5, min_samples=5)

    db.fit(X)

    print(db.labels_) # -1 for noise, 0,1,2... for clusters

    **Explanation:** Unlike K-Means, DBSCAN doesn't require specifying the number of clusters. It discovers clusters of arbitrary shape and identifies outliers as noise.

    Constraints:

  • Use Euclidean distance
  • Noise points labeled as -1
  • Store labels in self.labels_
  • Test Cases

    Test Case 1
    Input: Two dense clusters with outliers, eps=1.0, min_samples=3
    Expected: Two clusters found, outliers labeled -1
    Test Case 2
    Input: All points within eps of each other
    Expected: Single cluster (label 0 for all)
    Test Case 3
    Input: All points far apart (distance > eps)
    Expected: All labeled -1 (noise)
    + 2 hidden test cases