Click4Ai

91.

Hard

Implement a **simplified t-SNE** (t-distributed Stochastic Neighbor Embedding) for 2D visualization.

Simplified Algorithm:

1. Compute pairwise squared Euclidean distances in high-dimensional space

2. Convert distances to symmetric probabilities using Gaussian kernel:

p_ij = exp(-d_ij^2 / (2 * sigma^2)), then normalize

3. Initialize low-dimensional points randomly

4. For each iteration:

a. Compute low-dimensional pairwise affinities using Student-t kernel:

q_ij = (1 + ||y_i - y_j||^2)^(-1), then normalize

b. Compute gradient: dY_i = 4 * sum_j((p_ij - q_ij) * (y_i - y_j) * (1 + ||y_i-y_j||^2)^(-1))

c. Update: Y = Y - learning_rate * gradient

Example:

tsne = TSNE(n_components=2, perplexity=30, n_iterations=500)

X_2d = tsne.fit_transform(X) # X was (100, 50) → X_2d is (100, 2)

**Explanation:** t-SNE preserves local structure by ensuring similar points in high-D space remain close in low-D space. The Student-t distribution in low-D allows dissimilar points to spread far apart.

Constraints:

  • Output shape: (n_samples, n_components)
  • Use fixed sigma (simplified version -- no perplexity optimization)
  • Use learning rate 100 and basic gradient descent
  • Test Cases

    Test Case 1
    Input: X shape (50,10), n_components=2
    Expected: output shape (50, 2)
    Test Case 2
    Input: fit_transform returns numpy array
    Expected: True
    Test Case 3
    Input: 3 clusters in high-D should separate in 2D
    Expected: Visually separated groups
    + 2 hidden test cases