How to impute missing values in training datasets

Question

How can I impute missing values in a training dataset?

1 Answer

answered Apr 5, 2021 by pkumar81 (348k points)
selected Oct 26, 2023 by pkumar81

Best answer

Imputing noisy or missing feature values is still an open research question, but several established methods are available. The sklearn library (scikit-learn) provides both univariate and multivariate imputers in its sklearn.impute module.

Here is an example of univariate imputation using SimpleImputer. It fills missing values either with a constant you provide or with a statistic (mean, median, or most frequent value) of the column that contains them. Note that the imputer is fitted on X_train, so the values filled into X_test are the column means of the training data.

>>> import numpy as np
>>> from sklearn.impute import SimpleImputer
>>> imp = SimpleImputer(missing_values=np.nan, strategy='mean')
>>> X_train = np.array([[4, 2, 3], [6, 1, 1], [7, 6, 5], [4, 9, 10]])
>>> X_train
array([[ 4,  2,  3],
       [ 6,  1,  1],
       [ 7,  6,  5],
       [ 4,  9, 10]])
>>> X_test = np.array([[np.nan, 2, 3], [6, np.nan, 1], [7, 6, 5], [4, 9, np.nan]])
>>> X_test
array([[nan,  2.,  3.],
       [ 6., nan,  1.],
       [ 7.,  6.,  5.],
       [ 4.,  9., nan]])
>>> imp.fit(X_train)
SimpleImputer()
>>> imp.transform(X_test)
array([[5.25, 2.  , 3.  ],
       [6.  , 4.5 , 1.  ],
       [7.  , 6.  , 5.  ],
       [4.  , 9.  , 4.75]])
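
The other strategies mentioned above (median, most frequent, constant) are selected the same way, through the strategy parameter. The following is a minimal sketch that reuses the X_train and X_test arrays from the example. Casting X_train to float and using -1.0 as the constant are assumptions made only for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Same data as in the example above; X_train is cast to float for consistency
X_train = np.array([[4, 2, 3], [6, 1, 1], [7, 6, 5], [4, 9, 10]], dtype=float)
X_test = np.array([[np.nan, 2, 3], [6, np.nan, 1], [7, 6, 5], [4, 9, np.nan]])

# Median of each training column
median_imp = SimpleImputer(missing_values=np.nan, strategy="median").fit(X_train)
X_median = median_imp.transform(X_test)

# Most frequent value of each training column
mode_imp = SimpleImputer(missing_values=np.nan, strategy="most_frequent").fit(X_train)
X_mode = mode_imp.transform(X_test)

# A fixed constant; -1.0 is an arbitrary placeholder chosen for this sketch
const_imp = SimpleImputer(missing_values=np.nan, strategy="constant", fill_value=-1.0).fit(X_train)
X_const = const_imp.transform(X_test)

# The per-column values learned during fit are stored in statistics_
print(median_imp.statistics_)
print(X_median)
print(X_mode)
print(X_const)
```

Whichever strategy you choose, fit the imputer on the training data only and then reuse it on the test data, so that no information from the test set leaks into the imputed values.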

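For multivariate imputation, one option in sklearn.impute is KNNImputer. Instead of a single column statistic, it fills a missing entry using the rows of the fitted data that are closest on the features that are present. Here is a minimal sketch on the same arrays as the univariate example. The choice of n_neighbors=2 is an assumption for illustration only and would normally be tuned.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Same data as in the univariate example
X_train = np.array([[4, 2, 3], [6, 1, 1], [7, 6, 5], [4, 9, 10]], dtype=float)
X_test = np.array([[np.nan, 2, 3], [6, np.nan, 1], [7, 6, 5], [4, 9, np.nan]])

# Each missing entry is replaced by the average of that feature over the
# 2 nearest training rows, measured on the features that are not missing
knn_imp = KNNImputer(missing_values=np.nan, n_neighbors=2, weights="uniform")
knn_imp.fit(X_train)
X_knn = knn_imp.transform(X_test)

print(X_knn)
```

Because the neighbours are chosen per row, two rows with a missing value in the same column can receive different imputed values, unlike with SimpleImputer.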