Has anyone seen this model's implementation using Keras?
inb4: tensorflow, pytorch
Update for anyone googling this in 2021: Keras now has a built-in MultiHeadAttention layer (tf.keras.layers.MultiHeadAttention). If you pass the same tensor as query, key, and value, it computes self-attention.
One example from Kaggle is available.
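For a quick look at what the self-attention case looks like, here is a minimal sketch. It assumes TensorFlow 2.4 or later (where the layer is available), and the batch size, sequence length, feature size, num_heads, and key_dim are arbitrary values I picked for illustration, not anything from the model in question.

```python
import tensorflow as tf

# Toy input: 2 sequences, 5 tokens each, 16 features per token (arbitrary sizes).
x = tf.random.uniform((2, 5, 16))

# num_heads and key_dim are illustrative choices, not recommended settings.
mha = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=8)

# Self-attention: the same tensor is used as query, value, and key.
out, scores = mha(query=x, value=x, key=x, return_attention_scores=True)

print(out.shape)     # (2, 5, 16): same shape as the query by default
print(scores.shape)  # (2, 4, 5, 5): (batch, heads, query positions, key positions)
```

Passing a different tensor as value/key would turn this into cross-attention instead.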