I am reading a presentation that recommends against using leave-one-out encoding but is fine with one-hot encoding. I thought the two were the same thing. Can anyone describe the differences between them?
- It's not clear (from just your question) what leave-one-out even is. You should edit this to give a pointer and explain briefly your understanding of the two, and why you think they are the same. – Sean Owen Mar 23 '16 at 13:31
- [leave one out, from scikit learn contrib categorical project](https://contrib.scikit-learn.org/categorical-encoding/leaveoneout.html) – mork Mar 18 '19 at 08:34
- OHE and LOO are #2 and #10 in [11 Categorical Encoders and Benchmark](https://www.kaggle.com/code/subinium/11-categorical-encoders-and-benchmark/notebook#10.-Leave-one-out-Encoder-(LOO-or-LOOE)) respectively. – smci Apr 21 '22 at 07:20
1 Answer
They are probably using "leave one out encoding" to refer to Owen Zhang's strategy. From here:

> The encoded column is not a conventional dummy variable, but instead is the mean response over all rows for this categorical level, excluding the row itself. This gives you the advantage of having a one-column representation of the categorical while avoiding direct response leakage.
– Dex Groves
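To make the mechanics concrete, here is a minimal sketch of that computation in pandas (not from the original answer); the DataFrame and the column names `cat` and `y` are made up for illustration.

```python
import pandas as pd

# Toy data: 'cat' is the categorical feature, 'y' is a binary target (made-up example).
df = pd.DataFrame({
    "cat": ["a", "a", "a", "b", "b", "c"],
    "y":   [1,   0,   1,   0,   1,   1],
})

# For each row: mean of y over the *other* rows that share the same category.
grp = df.groupby("cat")["y"]
sum_y = grp.transform("sum")    # per-category sum of y, aligned back to rows
n = grp.transform("count")      # per-category row count, aligned back to rows
df["cat_loo"] = (sum_y - df["y"]) / (n - 1)

# A category seen only once gives 0/0 = NaN; fall back to the global mean of y.
df["cat_loo"] = df["cat_loo"].fillna(df["y"].mean())

print(df)
```

At prediction time there is no target to leave out, so test rows are typically encoded with the plain per-category mean from the training data (and the global mean for categories never seen in training). By contrast, one-hot encoding adds one 0/1 column per category and never looks at `y`, which is why the two behave very differently with respect to target leakage.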
- Your explanation is better than wacax's in the referred link, thank you – Allan Ruin Aug 12 '16 at 15:00
- Hi @Dex Groves, so the leave_one_out encoding for the test is always .5? – user7117436 Mar 24 '17 at 20:29
- Hi! As seen from the picture, this particular example relates to a classification problem. Does anybody have experience with LOO encoding within a regression problem? The main question is how to aggregate the target variable. I am now running experiments and get huge overfitting with mean(y). – Alexey Trofimov Jun 19 '17 at 12:49
- For a clustering (unsupervised) problem, is it possible to use this kind of encoding? – enneppi Sep 13 '18 at 10:26
- @AlexeyTrofimov - try an aggregation with a lower variance. I'd start with different binning (like 1K, 2K, 2M, .. for large y int values, or some rounding to a decimal place for y float values) => mean(bin_f(y)); see the sketch after this comment thread. – mork Mar 18 '19 at 08:40
- @enneppi - the whole idea is to "tie" your categorical feature to the target "y", which you're missing in your unsupervised ML. You could try "tying" your categorical feature into other X features (a kind of feature engineering). – mork Mar 18 '19 at 08:46
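Following the regression discussion above, one way to read the `mean(bin_f(y))` suggestion is: bin (or round) the target first, then apply the same leave-one-out mean to the binned values, which lowers the variance of the encoding. The binning width and data below are made up for illustration.

```python
import pandas as pd

# Made-up regression data: 'cat' is the categorical feature, 'y' is a continuous target.
df = pd.DataFrame({
    "cat": ["a", "a", "b", "b", "b", "c"],
    "y":   [1200.0, 1900.0, 150.0, 300.0, 250.0, 5000.0],
})

# One possible bin_f: round y to the nearest 500 (pick a width that suits your data).
df["y_bin"] = (df["y"] / 500).round() * 500

# Same leave-one-out mean as before, but computed over the binned target.
grp = df.groupby("cat")["y_bin"]
df["cat_loo"] = (grp.transform("sum") - df["y_bin"]) / (grp.transform("count") - 1)
df["cat_loo"] = df["cat_loo"].fillna(df["y_bin"].mean())

print(df[["cat", "y", "y_bin", "cat_loo"]])
```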
