in Programming Languages by (8.2k points)
I want to convert the labels of my data from strings to integers, because XGBoost does not accept string labels. Is there a library that can do this conversion?

E.g. Y=['a','b','c','a','a','c','c','b','b','a'] should be converted to [0, 1, 2, 0, 0, 2, 2, 1, 1, 0]

Since I do not know in advance how many distinct values Y contains, I would like to use an existing library rather than build the mapping by hand.

1 Answer

by (16.1k points)

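You can use the LabelEncoder class from sklearn.preprocessing for this conversion. It finds the distinct labels on its own, sorts them, and assigns each one an integer from 0 to n_classes-1. Check the following example: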

>>> from sklearn.preprocessing import LabelEncoder
>>> Y=['a','b','c','a','a','c','c','b','b','a']
>>> le = LabelEncoder().fit(Y)
>>> encoded_Y = le.transform(Y)
>>> encoded_Y
array([0, 1, 2, 0, 0, 2, 2, 1, 1, 0])
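
To make the mapping less of a black box, here is a minimal pure-Python sketch of what the encoding above amounts to: collect the distinct labels, sort them, and use each label's position as its code. The helper names fit_labels, encode and decode are hypothetical and only for illustration; they are not part of sklearn.

```python
def fit_labels(labels):
    """Return the sorted distinct labels; a label's position is its code."""
    return sorted(set(labels))


def encode(labels, classes):
    """Map each label to its integer code (its index in `classes`)."""
    index = {label: code for code, label in enumerate(classes)}
    return [index[label] for label in labels]


def decode(codes, classes):
    """Map integer codes back to the original labels."""
    return [classes[code] for code in codes]


Y = ['a', 'b', 'c', 'a', 'a', 'c', 'c', 'b', 'b', 'a']

classes = fit_labels(Y)                  # no need to know the labels in advance
encoded_Y = encode(Y, classes)
decoded_Y = decode(encoded_Y, classes)   # round-trips to the original labels

print(classes)    # ['a', 'b', 'c']
print(encoded_Y)  # [0, 1, 2, 0, 0, 2, 2, 1, 1, 0]
```

With the fitted encoder itself, le.classes_ holds the sorted labels and le.inverse_transform(...) maps integer predictions back to the original labels, which is useful once the model returns class indices.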
 

...