
112. Cross-Validation (K-Fold)

Difficulty: Medium

Implement k-fold cross-validation to evaluate model performance. K-fold cross-validation is a robust model evaluation technique that splits the dataset into k equally sized folds, then iteratively uses each fold as a validation set while training on the remaining k-1 folds. This provides a more reliable estimate of model performance than a single train-test split.

The k-fold cross-validation algorithm is:

    fold_size = len(data) // k
    for i in range(k):
        val_data = data[i * fold_size : (i+1) * fold_size]
        train_data = all data except val_data
        model.train(train_data)
        scores[i] = model.evaluate(val_data)
    final_score = mean(scores)
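The "all data except val_data" step can be sketched in NumPy by concatenating the slices before and after the current fold. This is an illustrative helper, not part of the required solution; the name `kfold_splits` is hypothetical:

```python
import numpy as np

def kfold_splits(data, k):
    """Yield (train, val) pairs for k contiguous, non-overlapping folds."""
    fold_size = len(data) // k
    for i in range(k):
        val = data[i * fold_size : (i + 1) * fold_size]
        # "all data except val_data": join the slices before and after the fold
        train = np.concatenate([data[: i * fold_size],
                                data[(i + 1) * fold_size :]])
        yield train, val
```

For `data = np.arange(6)` and `k = 3`, the first split uses `[0, 1]` for validation and `[2, 3, 4, 5]` for training.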

Your function cross_validation(X, y, k) should split the data into k folds, and for each fold, compute the accuracy of a dummy model (that always predicts 0) on the validation set. Return an array of scores for each fold.

Example:

    Input: X = [[1,2],[3,4],[5,6],[7,8]], y = [0, 1, 0, 1], k = 2
    Fold 1: val = X[0:2], y_val = [0, 1] -> accuracy of predicting 0 = 0.5
    Fold 2: val = X[2:4], y_val = [0, 1] -> accuracy of predicting 0 = 0.5
    Output: [0.5, 0.5]
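A minimal NumPy sketch of the required function, assuming the dummy model always predicts class 0 and accuracy is the fraction of matching labels:

```python
import numpy as np

def cross_validation(X, y, k):
    """Score a dummy always-predict-0 classifier on each of k
    contiguous validation folds; return one accuracy per fold."""
    X = np.asarray(X)
    y = np.asarray(y)
    fold_size = len(X) // k
    scores = np.empty(k)
    for i in range(k):
        y_val = y[i * fold_size : (i + 1) * fold_size]
        preds = np.zeros_like(y_val)          # dummy model: predict 0
        scores[i] = np.mean(preds == y_val)   # fold accuracy
    return scores
```

On the example above, `cross_validation([[1,2],[3,4],[5,6],[7,8]], [0,1,0,1], 2)` returns `[0.5, 0.5]`, since each validation fold contains one 0 and one 1. Note the dummy model never looks at the training fold, so no training step is needed here.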

Cross-validation reduces the variance of the performance estimate by averaging over multiple train-test splits. It is especially useful when the dataset is small, as it ensures every data point is used for both training and validation. Common choices for k are 5 and 10.

Constraints:

  • Use NumPy for all array operations
  • Split data into k contiguous, non-overlapping folds
  • Each fold should have `len(X) // k` samples
  • Return an array of k scores (one per fold)
Test Cases:

    Test Case 1
    Input: [[[1,2],[3,4]],[[5,6],[7,8]]]
    Expected: [0.5,0.5]

    Test Case 2
    Input: [[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]]
    Expected: [0.5,0.5]

    + 3 hidden test cases