
I have been reading an early paper on pre-training in NLP (https://arxiv.org/abs/1511.01432) and I can't understand what random word dropout means. The authors don't explain the method at all, as if it were a standard thing. Can someone explain what they actually do and what the purpose of it is?

WoofDoggy
  • Does this answer your question? [Meaning of dropout](https://datascience.stackexchange.com/questions/37835/meaning-of-dropout) – Tom M. Feb 24 '20 at 22:04
  • @Tom M. Not exactly. I know what dropout is, but how do I apply it to words? Do I just remove some of them at random (sampling uniformly or by frequency?), or set them to some special token? If one sentence is 5 words long and another is 150 words, removing 50% of the words at random may have a very different effect in the two cases. In standard dropout the size of the layer is the same for every training example. – WoofDoggy Feb 24 '20 at 22:11

1 Answer


It is not uncommon that we can make sense of a sentence without reading it completely. Likewise, when you have a quick look at a document, you tend to skip over some words and still understand the main point. This is the intuition behind word dropout.

Generally this is done by dropping each word in a sequence independently at random, for example following a Bernoulli distribution:

$X \leftarrow X \odot \vec{e}, \quad e_i \sim \mathrm{Bernoulli}(1-p), \; i = 1, \dots, n$

where $X$ is the sequence of word token indices, $n$ is the length of the sequence, $p$ is the dropout probability, and $\vec{e}$ is a binary mask indicating the dropout state of each word.

This is usually done after computing the word embeddings, and the words selected to be dropped are typically replaced with the embedding of the `<UNK>` token.
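
For concreteness, here is a minimal NumPy sketch of that masking step (not the paper's exact implementation; the names `word_dropout` and `unk_id` are just for illustration). Each token id is independently replaced by the `<UNK>` id with probability `p`:

```python
import numpy as np

def word_dropout(token_ids, p=0.1, unk_id=0, rng=None):
    """Replace each token id with `unk_id` independently with probability p."""
    rng = np.random.default_rng() if rng is None else rng
    token_ids = np.asarray(token_ids)
    # e_i ~ Bernoulli(1 - p): True keeps the word, False drops it
    keep = rng.random(token_ids.shape) >= p
    return np.where(keep, token_ids, unk_id)

# Example: a 5-token "sentence"; with p=0.2 each token has a 20% chance of becoming <UNK>
print(word_dropout([12, 57, 3, 901, 44], p=0.2))
```

Note that the length of the sequence is unchanged; dropped positions are masked rather than deleted, which addresses the concern about sentences of different lengths.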

By doing this, we allow our model to learn more flexible ways of writing and conveying meaning.

TitoOrt
  • All right then, so I choose words at random from the sequence and remove them. Yoav Goldberg in his NLP book says: "word dropout may also be beneficial for preventing overfitting and improving robustness by not letting the model rely too much on any single word being present". – WoofDoggy Feb 25 '20 at 11:38
  • that's the idea – TitoOrt Feb 25 '20 at 12:10