How to generate train and test sets for 5-fold cross validation

Question

How to generate train and test sets for 5-fold cross validation

1 Answer

answered Dec 29, 2020 by pkumar81 (348k points)
selected Jul 16, 2023 by pkumar81

Best answer

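You can use scikit-learn's StratifiedKFold class to split the data into train and test sets for 5-fold cross-validation. It produces five non-overlapping test sets, one per fold, so every sample is used for testing exactly once, and it keeps the class proportions in each fold close to those of the full dataset.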

Here is an example:

import numpy as np
from sklearn.model_selection import StratifiedKFold

# sample data
X = np.array([[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12], [5, 10, 15],
              [6, 12, 18], [7, 14, 21], [8, 16, 24], [9, 18, 27], [10, 20, 30]])
y = np.array([0, 1, 1, 1, 1, 0, 0, 0, 1, 0])

# shuffle=True with a fixed random_state makes the splits reproducible
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=1001)

# split() yields a (train, test) pair of index arrays for each fold
for k, (train_idx, test_idx) in enumerate(kf.split(X, y)):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # print('X_train:', X_train)
    # print('y_train:', y_train)
    print('fold: {0}, X_test: \n{1}'.format(k, X_test))
    print('fold: {0}, y_test: {1}'.format(k, y_test))

The above code prints the following output. Each fold has a different test set, and each test set contains one sample from each class.

fold: 0, X_test:
[[ 5 10 15]
 [ 6 12 18]]
fold: 0, y_test: [1 0]
fold: 1, X_test:
[[ 2  4  6]
 [10 20 30]]
fold: 1, y_test: [1 0]
fold: 2, X_test:
[[ 3  6  9]
 [ 7 14 21]]
fold: 2, y_test: [1 0]
fold: 3, X_test:
[[ 1  2  3]
 [ 9 18 27]]
fold: 3, y_test: [0 1]
fold: 4, X_test:
[[ 4  8 12]
 [ 8 16 24]]
fold: 4, y_test: [1 0]
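
If you want to confirm these properties rather than read them off the printed output, a small check along the following lines may help. It is only a sketch: summarize_folds is a hypothetical helper written for this illustration (it is not part of scikit-learn), and it assumes the same sample data and splitter settings as the example above.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Same sample data as in the example above
X = np.array([[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12], [5, 10, 15],
              [6, 12, 18], [7, 14, 21], [8, 16, 24], [9, 18, 27], [10, 20, 30]])
y = np.array([0, 1, 1, 1, 1, 0, 0, 0, 1, 0])

kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=1001)


def summarize_folds(splitter, X, y):
    """Hypothetical helper: collect test indices and test-set class counts per fold."""
    test_indices, class_counts = [], []
    for train_idx, test_idx in splitter.split(X, y):
        # Within a fold, no sample may appear in both train and test
        assert np.intersect1d(train_idx, test_idx).size == 0
        test_indices.append(test_idx)
        # Count how many samples of class 0 and class 1 are in this test set
        class_counts.append(np.bincount(y[test_idx], minlength=2))
    return test_indices, class_counts


test_indices, class_counts = summarize_folds(kf, X, y)

# If the test sets are disjoint and complete, their union is 0..len(y)-1
all_test = np.sort(np.concatenate(test_indices))
print('every sample tested exactly once:',
      np.array_equal(all_test, np.arange(len(y))))

for k, counts in enumerate(class_counts):
    print('fold: {0}, test class counts: {1}'.format(k, counts))
```

With five samples per class and five folds, each test set should hold exactly one sample of each class, which matches the y_test values shown above. If you only need scores rather than the splits themselves, the same kf object can also be passed as the cv argument of cross_val_score.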
