+2 votes
in Machine Learning by (73.8k points)
I want to convert the labels of my data from characters to numbers because XGBoost does not accept string labels. Is there a library that can do this conversion?

E.g. Y=['a','b','c','a','a','c','c','b','b','a'] should be converted to [0, 1, 2, 0, 0, 2, 2, 1, 1, 0]

Since I do not know in advance how many distinct values Y contains, I would like to use an existing library for the conversion.

1 Answer

+1 vote
by (349k points)
 
Best answer

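You can use the LabelEncoder class from sklearn.preprocessing for the conversion. It learns the distinct labels from the data, so you do not need to know them beforehand. Check the following example: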

>>> from sklearn.preprocessing import LabelEncoder
>>> Y=['a','b','c','a','a','c','c','b','b','a']
>>> le = LabelEncoder().fit(Y)
>>> encoded_Y = le.transform(Y)
>>> encoded_Y
array([0, 1, 2, 0, 0, 2, 2, 1, 1, 0])
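Once the model is trained, you will usually want to turn its integer predictions back into the original labels. The following is a minimal sketch of that round trip, assuming scikit-learn is installed. The variable names (classes, decoded_Y) are only illustrative. It uses fit_transform to fit and encode in one step, the classes_ attribute to inspect the learned mapping, and inverse_transform to decode.

```python
from sklearn.preprocessing import LabelEncoder

Y = ['a', 'b', 'c', 'a', 'a', 'c', 'c', 'b', 'b', 'a']

le = LabelEncoder()

# fit_transform learns the distinct labels and encodes Y in a single call
encoded_Y = le.fit_transform(Y)

# classes_ stores the distinct labels in sorted order;
# the position of a label in classes_ is its integer code
classes = list(le.classes_)

# inverse_transform maps integer codes back to the original labels,
# which is useful for decoding model predictions
decoded_Y = list(le.inverse_transform(encoded_Y))

print(encoded_Y)
print(classes)
print(decoded_Y)
```

Keep the fitted le object and reuse it for any new data, so the same label always maps to the same integer. Note that transform raises an error for a label that was not seen during fitting.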
 

