I am following the Keras example "Timeseries classification with a Transformer model" to classify time series with a Transformer. The model is created in the following code snippet:
from tensorflow import keras
from tensorflow.keras import layers


def transformer_encoder(inputs):
    # Normalization and Attention
    x = layers.LayerNormalization(epsilon=1e-6)(inputs)
    x = layers.MultiHeadAttention(key_dim=256, num_heads=4, dropout=0.25)(x, x)
    x = layers.Dropout(0.25)(x)
    res = x + inputs

    # Feed Forward Part
    x = layers.LayerNormalization(epsilon=1e-6)(res)
    x = layers.Conv1D(filters=4, kernel_size=1, activation="relu")(x)
    x = layers.Dropout(0.25)(x)
    x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
    return x + res
def build_model(input_shape, num_transformer_blocks):
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for _ in range(num_transformer_blocks):
        x = transformer_encoder(x)
    x = layers.GlobalAveragePooling1D(data_format="channels_first")(x)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.4)(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)  # n_classes is defined elsewhere
    return keras.Model(inputs, outputs)
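For completeness, this is roughly how I compile and train the model afterwards (a sketch: x_train, y_train, n_classes and the hyperparameters here are placeholders for my setup, not the values from the example):

n_classes = 3  # placeholder

model = build_model(input_shape=x_train.shape[1:], num_transformer_blocks=4)
model.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),
    metrics=["sparse_categorical_accuracy"],
)
model.fit(x_train, y_train, validation_split=0.2, epochs=100, batch_size=64)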
I am using a different dataset, and the shape of my data is different too. My sequences also do not have a fixed length, so I am trying to add masking to the model so that it ignores the padded (missing) time steps.
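This is roughly how the padded inputs are produced (a sketch; the 0.0 padding value, post-padding, and the maximum length of 15 are assumptions I use here for illustration):

import numpy as np
from tensorflow import keras

# Each sample has shape (timesteps_i, 7) with a variable number of time steps.
raw_samples = [np.random.rand(np.random.randint(3, 16), 7) for _ in range(32)]

# Pad every sample to 15 time steps with 0.0 so they stack into one array.
x_train = keras.preprocessing.sequence.pad_sequences(
    raw_samples, maxlen=15, dtype="float32", padding="post", value=0.0
)
print(x_train.shape)  # (32, 15, 7) -> each sample matches input_shape = (15, 7)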
So far I have tried a few options, but none of them have worked. First, I tried to add a Masking layer right after the input layer:
def build_model(input_shape, num_transformer_blocks):
    inputs = keras.Input(shape=input_shape)
    x = layers.Masking()(inputs)
    ...
I also tried to compute the mask of the data manually and pass it as the attention_mask argument of the MultiHeadAttention layer.
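That attempt looked roughly like this (a sketch; the 0.0 padding value and the way the mask is broadcast to (batch, 15, 15) are my assumptions, written against tf.keras 2.x where TensorFlow ops on Keras tensors are wrapped automatically):

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


def transformer_encoder_with_mask(inputs):
    # True for real time steps, False where the whole step is padding (all zeros).
    step_mask = tf.reduce_any(tf.not_equal(inputs, 0.0), axis=-1)          # (batch, 15)
    # Broadcast to the (batch, query_len, key_len) shape MultiHeadAttention expects.
    attn_mask = step_mask[:, tf.newaxis, :] & step_mask[:, :, tf.newaxis]  # (batch, 15, 15)

    x = layers.LayerNormalization(epsilon=1e-6)(inputs)
    x = layers.MultiHeadAttention(key_dim=256, num_heads=4, dropout=0.25)(
        x, x, attention_mask=attn_mask
    )
    x = layers.Dropout(0.25)(x)
    res = x + inputs

    x = layers.LayerNormalization(epsilon=1e-6)(res)
    x = layers.Conv1D(filters=4, kernel_size=1, activation="relu")(x)
    x = layers.Dropout(0.25)(x)
    x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
    return x + res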
To check whether the masking actually worked, I replaced all the real (non-padded) values in my dataset with a constant number (e.g. 500), and the model could still classify the samples into the correct classes. I suspect it simply learned the length of the padding.
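Concretely, the check looked something like this (a sketch; x_train and y_train are the padded array and labels from above, and 0.0 is assumed to be the padding value):

import numpy as np

# Overwrite every real value with a constant while keeping the 0.0 padding.
# If the padded steps were properly masked, accuracy should drop to chance level.
x_constant = np.where(x_train != 0.0, 500.0, 0.0).astype("float32")
print(model.evaluate(x_constant, y_train))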
The shape of my data, with padding, is (15, 7).
How do I correctly apply masking to this model so that the padded time steps are ignored?