
I am using the BreastCancer data from the R package mlbench to run an XGBoost classifier. The data frame has a header, and all values are numeric. I have to convert the data frame to a matrix before passing it to the classifier. I am using the as.matrix() function for the conversion, but it turns every value into a character string, and the classifier needs numeric data. How can I convert the data frame to a numeric matrix?

Here is my code:

library(mlbench)
data("BreastCancer")
# generate data
X <- subset(BreastCancer, select = -c(Class, Id)) # remove Id and Class
X <- as.matrix(X)

1 Answer


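The header is not the cause. The values look numeric when printed, but the predictor columns of BreastCancer are stored as factors, not as numbers. When a data frame contains non-numeric columns, as.matrix() returns a character matrix.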
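To see what as.matrix() will produce, it helps to check the column classes first. Here is a minimal sketch on a hypothetical toy data frame (the names toy, a and b are made up for illustration, so the snippet runs without mlbench):

```r
# Hypothetical toy data: one factor column and one numeric column.
toy <- data.frame(a = factor(c("1", "5", "10")), b = c(2, 4, 6))

# Column classes show which columns are not numeric.
col_classes <- sapply(toy, class)

# A single non-numeric column makes the whole matrix character.
m <- as.matrix(toy)
matrix_type <- typeof(m)
```

Running str(X) on your own data gives the same kind of overview, one line per column.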

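The columns of X are factors, so convert each one to numeric with sapply(), which returns a numeric matrix directly. Go through as.character() first: calling as.numeric() straight on a factor returns its internal level codes, and those can differ from the labels when a value is absent from the levels or the levels are not in numeric order.

Change the last line of your code as shown here, and it should run without errors:

library(mlbench)
data("BreastCancer")
# generate data
X <- subset(BreastCancer, select = -c(Class, Id)) # remove Id and Class
# factor -> character -> numeric keeps the labelled values, not the level codes
X <- sapply(X, function(col) as.numeric(as.character(col)))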
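One caveat with as.numeric() on factor columns: it returns the internal level codes, which are not always the same as the labels. Here is a minimal sketch of the difference, using a hypothetical toy data frame and a hypothetical helper named to_numeric (neither comes from the original code):

```r
# Hypothetical toy data: factor columns whose labels are numbers.
toy <- data.frame(
  a = factor(c("1", "5", "10")),                        # levels sort as "1", "10", "5"
  b = factor(c("2", "2", "10"), levels = c("2", "10"))  # only two levels present
)

# as.numeric() on a factor returns level codes, not the labels.
codes <- sapply(toy, as.numeric)

# Going through as.character() recovers the labelled values.
to_numeric <- function(col) as.numeric(as.character(col))
values <- sapply(toy, to_numeric)

# Both results are numeric matrices, but only `values` holds the original numbers.
same_result <- isTRUE(all.equal(codes, values))
```

If the codes and the labels agree for every column in your data, the two approaches give the same matrix; comparing them once, as same_result does here, is a cheap check.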

...