How to divide a given dataset into train and test sets

Question

How to divide a given dataset into train and test sets

1 Answer

answered May 1, 2022 by pythonuser (73.8k points)
selected Sep 18, 2022 by pkumar81

Best answer

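If you are familiar with the scikit-learn library, you can use its train_test_split() function to split your data into train and test sets for your classification model. The test_size argument controls how much of the data goes into the test set: a float between 0 and 1 is treated as a fraction of the samples, and an integer as an absolute number of samples.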

Here is an example. It generates random data and labels, then applies the function to produce train and test sets.

import numpy as np
from sklearn.model_selection import train_test_split
# generate random data: 25 samples with 4 features each, plus binary labels
n_samples = 25
n_features = 4
np.random.seed(1234)
X = np.random.random(n_samples * n_features).reshape((n_samples, n_features))
y = [np.random.randint(0, 2) for _ in range(n_samples)]
print("data shape: {0}".format(X.shape))
# hold out 25% of the samples for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1234)
print("train set shape: {0}".format(X_train.shape))
print("test set shape: {0}".format(X_test.shape))

The above code prints the following output:

data shape: (25, 4)
train set shape: (18, 4)
test set shape: (7, 4)
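
Note that the test set has 7 rows rather than 6.25: a fractional test_size is rounded up to a whole number of samples (0.25 * 25 = 6.25, which becomes 7), and the remaining 18 rows form the train set. To make that arithmetic concrete, here is a minimal pure-Python sketch of a shuffle-and-cut split. The function name simple_train_test_split and the toy rows are my own, made up for illustration; this is not scikit-learn's implementation, and its shuffle order will differ from what train_test_split() produces.

```python
import math
import random


def simple_train_test_split(rows, labels, test_size=0.25, seed=1234):
    """Hypothetical helper: shuffle the row indices, then cut them in two."""
    n_samples = len(rows)
    # A fractional test size is rounded up: 0.25 * 25 = 6.25 becomes 7.
    n_test = math.ceil(test_size * n_samples)
    indices = list(range(n_samples))
    # A seeded generator makes the shuffle reproducible.
    random.Random(seed).shuffle(indices)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # Index rows and labels with the same indices so they stay aligned.
    rows_train = [rows[i] for i in train_idx]
    rows_test = [rows[i] for i in test_idx]
    labels_train = [labels[i] for i in train_idx]
    labels_test = [labels[i] for i in test_idx]
    return rows_train, rows_test, labels_train, labels_test


# Toy data: 25 rows of 4 features, with alternating 0/1 labels.
rows = [[i, i + 1, i + 2, i + 3] for i in range(25)]
labels = [i % 2 for i in range(25)]

X_train, X_test, y_train, y_test = simple_train_test_split(rows, labels)
print(len(X_train), len(X_test))  # prints: 18 7
```

In practice you would still call train_test_split(); the sketch is only meant to show where the 18 and 7 come from.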
