Click4Ai

74. Decision Tree Classifier (Hard)

Implement a **Decision Tree Classifier** from scratch using information gain (entropy-based splitting).

Algorithm:

1. Recursively split the data by finding the feature and threshold that maximize **information gain**

2. **Entropy:** H(y) = -sum(p_i * log2(p_i))

3. **Information Gain:** IG = H(parent) - (n_left/n * H(left) + n_right/n * H(right))

4. Stop splitting when max_depth is reached, the node is pure, or the node has fewer than min_samples_split samples
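The entropy and information-gain formulas in steps 2 and 3 translate directly into code. A minimal sketch, assuming NumPy; the helper names `entropy` and `information_gain` are my own, not part of the required API:

```python
import numpy as np

def entropy(y):
    """H(y) = -sum(p_i * log2(p_i)) over the class proportions p_i."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()          # class proportions, all strictly positive
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """IG = H(parent) - (n_left/n * H(left) + n_right/n * H(right))."""
    n = len(parent)
    return entropy(parent) - (len(left) / n * entropy(left)
                              + len(right) / n * entropy(right))
```

For example, a 50/50 binary node has entropy 1.0, and a split that separates the classes perfectly has an information gain equal to the parent's entropy.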

Example:

```python
dt = DecisionTree(max_depth=5)
dt.fit(X_train, y_train)
predictions = dt.predict(X_test)
```

**Explanation:** At each node, we try all possible feature-threshold combinations and choose the split that reduces entropy the most. Leaf nodes store the majority class.
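The exhaustive search over feature-threshold combinations can be sketched as below. It is self-contained; the function name `best_split`, its `(feature, threshold)` return convention, and the choice of observed feature values as candidate thresholds are assumptions, not part of the problem statement:

```python
import numpy as np

def best_split(X, y):
    """Try every (feature, threshold) pair; return the one with maximum IG."""
    def H(labels):  # entropy, as in the formula above
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    n, parent_h = len(y), H(y)
    best_gain, best = 0.0, (None, None)
    for feat in range(X.shape[1]):
        for thr in np.unique(X[:, feat]):      # candidate thresholds
            mask = X[:, feat] <= thr
            if mask.all() or not mask.any():
                continue                        # degenerate split, skip
            child_h = (mask.sum() / n * H(y[mask])
                       + (~mask).sum() / n * H(y[~mask]))
            gain = parent_h - child_h
            if gain > best_gain:
                best_gain, best = gain, (feat, thr)
    return best  # (None, None) if no split reduces entropy
```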

Constraints:

  • Use entropy (not Gini) for the splitting criterion
  • Support max_depth and min_samples_split stopping criteria
  • Use majority class for leaf node predictions

Test Cases:

Test Case 1
Input: XOR data [[0,0],[0,1],[1,0],[1,1]], y=[0,1,1,0], max_depth=3
Expected: Model fits and predicts without error

Test Case 2
Input: Linearly separable data, predict training set
Expected: 100% accuracy on training data (with sufficient depth)

Test Case 3
Input: All same class [0,0,0,0]
Expected: Always predicts 0

+ 2 hidden test cases
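Putting the pieces together, one possible end-to-end sketch satisfying the constraints is shown below. Only `DecisionTree`, `fit`, and `predict` come from the problem; the node class, helper names, and the decision to permit zero-gain splits are my own design choices:

```python
import numpy as np
from collections import Counter

def _entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

class _Node:
    """Internal node (feature, threshold) or leaf (value)."""
    def __init__(self, feature=None, threshold=None, left=None, right=None, value=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right, self.value = left, right, value

class DecisionTree:
    def __init__(self, max_depth=5, min_samples_split=2):
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self._root = self._build(X, y, depth=0)
        return self

    def predict(self, X):
        return np.array([self._walk(x) for x in np.asarray(X, dtype=float)])

    def _majority(self, y):
        return Counter(y.tolist()).most_common(1)[0][0]

    def _build(self, X, y, depth):
        # Stopping criteria: depth limit, pure node, or too few samples.
        if (depth >= self.max_depth or len(np.unique(y)) == 1
                or len(y) < self.min_samples_split):
            return _Node(value=self._majority(y))
        feat, thr = self._best_split(X, y)
        if feat is None:                 # no non-degenerate split exists
            return _Node(value=self._majority(y))
        mask = X[:, feat] <= thr
        return _Node(feat, thr,
                     self._build(X[mask], y[mask], depth + 1),
                     self._build(X[~mask], y[~mask], depth + 1))

    def _best_split(self, X, y):
        # Design choice: zero-gain splits are allowed (best_gain starts at -1),
        # so patterns like XOR can still be separated at deeper levels.
        n, parent_h = len(y), _entropy(y)
        best_gain, best = -1.0, (None, None)
        for feat in range(X.shape[1]):
            for thr in np.unique(X[:, feat]):
                mask = X[:, feat] <= thr
                if mask.all() or not mask.any():
                    continue
                child_h = (mask.sum() / n * _entropy(y[mask])
                           + (~mask).sum() / n * _entropy(y[~mask]))
                gain = parent_h - child_h
                if gain > best_gain:
                    best_gain, best = gain, (feat, thr)
        return best

    def _walk(self, x):
        node = self._root
        while node.value is None:       # descend until a leaf is reached
            node = node.left if x[node.feature] <= node.threshold else node.right
        return node.value
```

With zero-gain splits permitted, this sketch also classifies the XOR data from Test Case 1 perfectly at max_depth=3, although that test only requires fit and predict to run without error.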