Questions tagged [masking]

7 questions
2
votes
1 answer

Decoder Transformer feedforward

I have a question about the decoder transformer feed-forward pass during training. Let's pick an example: the input data is "i love the sun", and the translation I want to predict (the Italian translation) is "io amo il sole". Now I feed the encoder with the input "i love the…
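A minimal sketch of the setup this question describes, assuming PyTorch and invented sizes: during training the whole shifted target ("io amo il sole") is fed to the decoder at once, and a causal mask keeps each position from attending to later ones.

```python
import torch

def causal_mask(size: int) -> torch.Tensor:
    # True above the diagonal = positions the decoder may NOT attend to,
    # matching the bool-mask convention of torch.nn.Transformer.
    return torch.triu(torch.ones(size, size, dtype=torch.bool), diagonal=1)

# Teacher forcing: the full (right-shifted) target is fed in one pass;
# the mask makes position t blind to positions > t.
tgt_len = 4  # "io amo il sole"
print(causal_mask(tgt_len))
# tensor([[False,  True,  True,  True],
#         [False, False,  True,  True],
#         [False, False, False,  True],
#         [False, False, False, False]])
```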
2
votes
1 answer

Why shouldn't we mask [CLS] and [SEP] in preparing inputs for an MLM?

I know that an MLM is trained to predict the index of the [MASK] token in the vocabulary list, and I also know that [CLS] stands for the beginning of the sentence and [SEP] tells the model that the end of the sentence has been reached or that another sentence will come soon, but…
Jie
  • 21
  • 1
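For context, a sketch of the standard masking recipe that skips the special tokens, assuming the Hugging Face tokenizer API (the 80/10/10 corruption split is left out for brevity):

```python
import torch
from transformers import BertTokenizerFast

tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
enc = tok("the sun is shining", return_special_tokens_mask=True, return_tensors="pt")

input_ids = enc["input_ids"].clone()
special = enc["special_tokens_mask"].bool()   # 1 at [CLS], [SEP] (and padding)
prob = torch.full(input_ids.shape, 0.15)      # BERT's default masking rate
prob.masked_fill_(special, 0.0)               # never select special tokens
selected = torch.bernoulli(prob).bool()

labels = input_ids.clone()
labels[~selected] = -100                      # compute loss only on masked tokens
input_ids[selected] = tok.mask_token_id
```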
1
vote
1 answer

Dealing with high-frequency tokens during masked language modelling?

Suppose I am working with a masked language model to pre-train on a specific dataset. In that dataset, most sequences contain a particular token at high frequency. Sample sequence: tok1, tok2, tok3, tok4, tok4, tok4 ---> here tok4 is very…
neel g
  • 207
  • 4
  • 11
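One commonly suggested mitigation, sketched with invented ids and frequencies (word2vec-style subsampling, not something the question itself settles): scale the masking probability down for very frequent tokens so a token like tok4 stops dominating the MLM loss.

```python
import torch

def masking_probs(input_ids, token_freq, base_p=0.15, t=1e-3):
    freq = token_freq[input_ids]                    # relative frequency per position
    keep = torch.clamp((t / freq).sqrt(), max=1.0)  # word2vec-style subsampling factor
    return base_p * keep                            # rare tokens keep p ~= 0.15

vocab_freq = torch.tensor([1e-4, 1e-4, 1e-4, 1e-4, 0.4])  # id 4 is very frequent
ids = torch.tensor([0, 1, 2, 4, 4, 4])
mask = torch.bernoulli(masking_probs(ids, vocab_freq)).bool()
```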
1
vote
1 answer

Anonymize continuous variable for masking purposes

I am about to kick off a large hackathon event. We have a dataset that comprises one continuous variable with high precision, and a number of categorical variables qualifying these data three levels deep. The data provider wants to 'mask' the data…
HEITZ
  • 911
  • 4
  • 7
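A sketch of one masking option for this kind of request, on invented data: a rank-based transform plus a little jitter hides the true high-precision values while preserving their ordering, which is often all a hackathon analysis needs.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.lognormal(mean=10, sigma=1, size=1000)  # stand-in for the real variable

ranks = x.argsort().argsort()                   # 0..n-1, order preserved
masked = (ranks + 0.5) / len(x)                 # uniform scores in (0, 1)
masked += rng.normal(scale=0.01, size=len(x))   # jitter so exact ranks can't be read off
```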
0
votes
1 answer

Could there be a problem with the linear layer after the attention inside a transformer?

My question regards this image: it seems that after the multi-head attention there is a linear layer, as they also mention here: the linearity is given by the weights W^{O}. My question is: for the decoder, doesn't this linear layer mess up…
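A short check of the worry raised here, with invented shapes: the W^{O} projection is applied per position (the same matrix at every time step), so it cannot move information between positions; causality is enforced only by the mask inside the attention itself.

```python
import torch

d_model, seq_len = 8, 4
attn_out = torch.randn(1, seq_len, d_model)     # per-position attention results
W_o = torch.nn.Linear(d_model, d_model, bias=False)

out = W_o(attn_out)                             # acts on the feature dim only
assert torch.allclose(out[0, 2], W_o(attn_out[0, 2]))  # position t depends only on position t
```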
0
votes
0 answers

Keras masking with MultiHeadAttention

I am following the Keras example on classifying time series using transformers, "Timeseries classification with a Transformer model". The creation of the model is presented in the following code snippet: def transformer_encoder(inputs): # Normalization and…
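A sketch of one way to make the tutorial's encoder block padding-aware, assuming padded time steps are all-zero rows and illustrative layer sizes: build a boolean mask and pass it via MultiHeadAttention's attention_mask argument.

```python
import tensorflow as tf
from tensorflow.keras import layers

def transformer_encoder(inputs, pad_mask, head_size=64, num_heads=2, ff_dim=4, dropout=0.1):
    # attention_mask: True where attention is allowed, shape (batch, T, T)
    attn_mask = pad_mask[:, :, None] & pad_mask[:, None, :]
    x = layers.LayerNormalization(epsilon=1e-6)(inputs)
    x = layers.MultiHeadAttention(num_heads=num_heads, key_dim=head_size,
                                  dropout=dropout)(x, x, attention_mask=attn_mask)
    x = layers.Dropout(dropout)(x)
    res = x + inputs
    x = layers.LayerNormalization(epsilon=1e-6)(res)
    x = layers.Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(x)
    x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
    return x + res

# pad_mask can be derived from the data, e.g.:
# pad_mask = tf.reduce_any(tf.not_equal(inputs, 0.0), axis=-1)  # (batch, T), bool
```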
0
votes
0 answers

Embedding: Can I use it in a time-series problem?

I'm trying to do feature extraction on some discretized time series of variable length; to do that, I'm creating an RNN autoencoder. My main problem is finding a way to let the model train on variable-length time sequences. I read in the TF guides…
Nathaldien
  • 21
  • 3
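Possibly relevant here, a sketch assuming the discretized values are integer symbols with 0 reserved for padding (all sizes invented): an Embedding with mask_zero=True propagates a mask so the LSTM and the reconstruction loss both skip padded steps, which handles variable lengths.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

vocab_size = 32                         # number of discrete levels + 1 (0 = padding)
inp = layers.Input(shape=(None,), dtype="int32")
x = layers.Embedding(vocab_size, 8, mask_zero=True)(inp)  # mask skips padded steps
x = layers.LSTM(16, return_sequences=True)(x)             # mask is propagated
out = layers.Dense(vocab_size, activation="softmax")(x)
model = Model(inp, out)                 # reconstruct each symbol per step
model.compile("adam", "sparse_categorical_crossentropy")

seqs = [[3, 7, 7, 2], [5, 1]]           # variable-length toy sequences
padded = tf.keras.preprocessing.sequence.pad_sequences(seqs, padding="post")
model.fit(padded, padded, epochs=1, verbose=0)
```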