+5 votes
in Programming Languages by (40.5k points)

I am using the BreastCancer dataset from R's mlbench package to run a machine learning algorithm. However, when I try to convert the label "benign" to 0 and "malignant" to 1, R throws a warning and the labels are not converted. The warning message is:

Warning message:
In `[<-.factor`(`*tmp*`, y == "benign", value = 0) :
  invalid factor level, NA generated

How can I fix this warning message?

library(xgboost)
library(mlbench)

# Wisconsin Breast Cancer Database
data("BreastCancer")

y <- BreastCancer$Class
y[y=="benign"] = 0
y[y=="malignant"] = 1

1 Answer

+1 vote
by (349k points)
Best answer

As the warning message indicates, the variable "y" is a factor. A factor can only hold values that are among its levels (here "benign" and "malignant"), so assigning 0 or 1 is rejected and R inserts NA instead. Convert the factor to a character vector with the as.character() function first; after that, the comparisons and assignments work as expected.
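
To see where the warning comes from, here is a minimal sketch that uses a small hand-built factor in place of BreastCancer$Class (the vectors y, bad and ok are illustrative names, not part of the dataset). It suggests that the comparison itself is fine and that only the assignment of a non-level value fails:

```r
# Hypothetical toy factor standing in for BreastCancer$Class
y <- factor(c("benign", "malignant", "benign"))
levels(y)                    # the only values this factor may hold

# Comparing a factor with a string works without any warning
y == "benign"

# Assigning a value that is not a level triggers the warning and yields NA
bad <- y
bad[bad == "benign"] <- 0    # Warning: invalid factor level, NA generated
is.na(bad)

# After as.character(), the assignment succeeds (0 is stored as the string "0")
ok <- as.character(y)
ok[ok == "benign"] <- 0
ok
```

Because the recoded values are strings at this point, a final as.integer() is still needed to obtain numeric labels.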

How to check the class of a variable:

> library(xgboost)
> library(mlbench)
> data("BreastCancer")
> y <- BreastCancer$Class
> class(y)
[1] "factor"
> y <- as.character(BreastCancer$Class)
> class(y)
[1] "character"

So, wrap BreastCancer$Class in as.character() before recoding the labels and finish with as.integer(), and your code should work:

library(xgboost)
library(mlbench)

# Wisconsin Breast Cancer Database
data("BreastCancer")

# generate data labels
y <- as.character(BreastCancer$Class)
y[y=="benign"] = 0
y[y=="malignant"] = 1
y <- as.integer(y)
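
As an alternative to recoding in several steps, the 0/1 labels can probably be produced in one line. The sketch below again uses a hypothetical toy factor (cls) rather than the real dataset, and the second option assumes the levels are ordered "benign", "malignant", which is worth checking with levels() on your own data:

```r
# Hypothetical toy factor standing in for BreastCancer$Class
cls <- factor(c("benign", "malignant", "malignant", "benign"),
              levels = c("benign", "malignant"))

# Option 1: logical comparison coerced to integer (TRUE -> 1, FALSE -> 0)
y1 <- as.integer(cls == "malignant")

# Option 2: underlying integer codes (1, 2) shifted to 0/1;
# this relies on the assumed level order benign, malignant
y2 <- as.integer(cls) - 1L

y1
identical(y1, y2)
```

Option 1 does not depend on the level order, so it is the safer choice when you are unsure how the factor was built.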


...