
Should I always use label encoding while doing binary classification?

2 Answers


If by label encoding you mean one-hot encoding, no, it's not necessary. In fact, it's not a good idea, because it would create two target variables instead of one, a setting which corresponds to multi-label classification.

The standard way is to simply represent the label as an integer 0 or 1, for example with LabelEncoder.
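For illustration, a minimal sketch assuming scikit-learn and a made-up list of string labels:

from sklearn.preprocessing import LabelEncoder

# hypothetical binary string labels
y = ["spam", "ham", "ham", "spam", "ham"]

le = LabelEncoder()
y_int = le.fit_transform(y)   # array([1, 0, 0, 1, 0]): 'ham' -> 0, 'spam' -> 1
le.classes_                   # array(['ham', 'spam']), keeps the original labels
le.inverse_transform([0, 1])  # array(['ham', 'spam']), maps predictions back to strings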

Erwan
  • thanks, but why use a label encoder on target values? Will it make any difference, or just speed up algorithms? – Rus Pylypyuk Apr 15 '22 at 10:46
  • it's essentially a technical constraint: most algorithms require an integer as the label; they simply wouldn't work with categorical labels as strings, for example. – Erwan Apr 15 '22 at 10:57

For binary classification, you need your target variable to take only the values 0 and 1 to be able to compute a loss and train your model. If your raw target values are strings, you can use LabelBinarizer to transform them into 0s and 1s. You should not use OneHotEncoder, which is meant to transform features, not target variables, and can perform a number of other things, such as handling unknown or rare classes, that are irrelevant for target transformation.

You could also use LabelEncoder for binary classification, since it can handle any number of classes, but it is more natural to use LabelBinarizer to be explicit about the fact that you have only two classes.

Also note that the output will have different dimensions:

from sklearn.preprocessing import LabelBinarizer, LabelEncoder
lb = LabelBinarizer()
le = LabelEncoder()
y = ["a", "a", "b", "a", "b"]

lb.fit_transform(y).shape, le.fit_transform(y).shape

will return ((5, 1), (5,))
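Concretely, running the two transforms from the snippet above gives the same 0/1 values, just with different shapes:

lb.fit_transform(y)  # array([[0], [0], [1], [0], [1]]) -- a 2D column vector
le.fit_transform(y)  # array([0, 0, 1, 0, 1])           -- a flat 1D array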

I suggest you read the great documentation on Transforming the prediction target.

Just trying