Implement a **Decision Tree Classifier** from scratch using information gain (entropy-based splitting).
Algorithm:
1. Recursively split the data by finding the feature and threshold that maximize **information gain**
2. **Entropy:** H(y) = -sum(p_i * log2(p_i))
3. **Information Gain:** IG = H(parent) - (n_left/n * H(left) + n_right/n * H(right))
4. Stop splitting when: max_depth reached, node is pure, or too few samples
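The entropy and information-gain formulas above can be sketched as small helpers. This is a minimal illustration (the function names `entropy` and `information_gain` and the boolean-mask interface are assumptions, not part of the problem statement):

```python
import numpy as np

def entropy(y):
    # H(y) = -sum(p_i * log2(p_i)) over the class proportions p_i
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(y, left_mask):
    # IG = H(parent) - (n_left/n * H(left) + n_right/n * H(right))
    left, right = y[left_mask], y[~left_mask]
    n = len(y)
    return entropy(y) - (len(left) / n * entropy(left)
                         + len(right) / n * entropy(right))
```

For example, a perfectly balanced parent has H = 1 bit, and a split that produces two pure children yields IG = 1, the maximum possible here.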
Example:

    dt = DecisionTree(max_depth=5)
    dt.fit(X_train, y_train)
    predictions = dt.predict(X_test)
**Explanation:** At each node, we try all possible feature-threshold combinations and choose the split that reduces entropy the most. Leaf nodes store the majority class.
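The node-splitting loop described above can be sketched end to end. This is one possible minimal implementation under stated assumptions: splits use `<=` on the threshold, candidate thresholds are the unique feature values, and `min_samples_split` is an assumed extra parameter; class labels are assumed to be non-negative integers (for `np.bincount`).

```python
import numpy as np

class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, value=None):
        self.feature = feature      # feature index to split on (internal nodes)
        self.threshold = threshold  # go left if x[feature] <= threshold
        self.left = left
        self.right = right
        self.value = value          # majority class (leaf nodes only)

class DecisionTree:
    def __init__(self, max_depth=5, min_samples_split=2):
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split  # assumed extra stopping rule

    @staticmethod
    def _entropy(y):
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def _best_split(self, X, y):
        # Try every feature-threshold combination; keep the split with the
        # highest information gain (strictly positive, else no split).
        best_gain, best_feat, best_thr = 0.0, None, None
        parent, n = self._entropy(y), len(y)
        for feat in range(X.shape[1]):
            for thr in np.unique(X[:, feat]):
                mask = X[:, feat] <= thr
                if mask.all() or not mask.any():
                    continue  # split leaves one side empty
                gain = parent - (mask.sum() / n * self._entropy(y[mask])
                                 + (~mask).sum() / n * self._entropy(y[~mask]))
                if gain > best_gain:
                    best_gain, best_feat, best_thr = gain, feat, thr
        return best_feat, best_thr

    def _build(self, X, y, depth):
        # Stop at max depth, pure node, or too few samples -> majority leaf.
        if (depth >= self.max_depth or len(np.unique(y)) == 1
                or len(y) < self.min_samples_split):
            return Node(value=np.bincount(y).argmax())
        feat, thr = self._best_split(X, y)
        if feat is None:  # no split reduces entropy
            return Node(value=np.bincount(y).argmax())
        mask = X[:, feat] <= thr
        return Node(feat, thr,
                    self._build(X[mask], y[mask], depth + 1),
                    self._build(X[~mask], y[~mask], depth + 1))

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=int)
        self.root = self._build(X, y, 0)
        return self

    def _predict_one(self, x, node):
        while node.value is None:
            node = node.left if x[node.feature] <= node.threshold else node.right
        return node.value

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return np.array([self._predict_one(x, self.root) for x in X])
```

One design note: because `_best_split` requires strictly positive gain, a root split on XOR data (where every single split has zero gain) falls back to a majority-class leaf; the model still fits and predicts without error, which is all Test Case 1 requires.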
Constraints:
Test Cases
Test Case 1
Input: XOR data [[0,0],[0,1],[1,0],[1,1]], y=[0,1,1,0], max_depth=3
Expected: Model fits and predicts without error

Test Case 2
Input: Linearly separable data, predict training set
Expected: 100% accuracy on training data (with sufficient depth)

Test Case 3
Input: All same class [0,0,0,0]
Expected: Always predicts 0

+ 2 hidden test cases