Questions tagged [encoder]

27 questions
3 votes, 2 answers

What is the difference between the BERT architecture and the vanilla Transformer architecture?

I'm doing some research on the summarization task and found out that BERT is derived from the Transformer model. Every blog about BERT that I have read focuses on explaining what a bidirectional encoder is, so I think this is what made BERT…
2 votes, 1 answer

Why transform embedding dimension in sin-cos positional encoding?

Positional encoding using sine-cosine functions is often used in transformer models. Assume that $X \in \mathbb{R}^{l \times d}$ is the embedding of an example, where $l$ is the sequence length and $d$ is the embedding size. This positional encoding layer…
kyc12 · 155
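For reference, here is a minimal NumPy sketch of the sine-cosine encoding the question describes, using its $l$ and $d$ notation (it assumes $d$ is even; the resulting matrix is added to $X$, not concatenated):

```python
import numpy as np

def sinusoidal_positional_encoding(l: int, d: int) -> np.ndarray:
    """Return P with P[pos, 2i] = sin(pos / 10000^(2i/d)) and
    P[pos, 2i+1] = cos(pos / 10000^(2i/d)), as in Attention Is All You Need."""
    positions = np.arange(l)[:, None]                # (l, 1)
    div = np.power(10000.0, np.arange(0, d, 2) / d)  # (d/2,)
    pe = np.zeros((l, d))
    pe[:, 0::2] = np.sin(positions / div)
    pe[:, 1::2] = np.cos(positions / div)
    return pe

X = np.random.randn(10, 16)                     # toy embedding: l=10, d=16
X = X + sinusoidal_positional_encoding(10, 16)  # added to X, not concatenated
```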
2 votes, 2 answers

Role of the decoder in a Transformer?

I understand the mechanics of the encoder-decoder architecture used in the Attention Is All You Need paper. My question is more high-level, about the role of the decoder. Say we have a sentence translation task: Je suis étudiant -> I am a student. The…
kyc12 · 155
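As a rough intuition for the decoder's role, here is a hedged PyTorch sketch of greedy autoregressive decoding with `nn.Transformer`: the source sentence is encoded once, then the decoder generates target tokens one at a time while attending to that encoding. The vocabulary, sizes, and token ids are illustrative, not from the paper.

```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=32, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2)
embed = nn.Embedding(100, 32)   # toy shared vocabulary of 100 token ids
proj = nn.Linear(32, 100)       # maps decoder states back to vocab logits

src = embed(torch.randint(0, 100, (6, 1)))  # source ids, shape (S, batch, E)
memory = model.encoder(src)                 # source is encoded exactly once

ys = torch.tensor([[1]])                    # decoder starts from a <bos> id
for _ in range(5):                          # greedy decoding loop
    out = model.decoder(embed(ys), memory)  # attends to the encoder memory
    next_id = proj(out[-1]).argmax(-1)      # most likely next token
    ys = torch.cat([ys, next_id.unsqueeze(0)], dim=0)
```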
2 votes, 1 answer

Encoding correlation

I have a rather theory-based question, as I'm not that experienced with encoders, embeddings, etc. Scientifically I'm mostly oriented around novel evolutionary model-based methods. Let's assume we have a data set with highly correlated attributes. Usually…
Piotr Rarus · 814
2 votes, 1 answer

What to do with Transformer Encoder output?

I'm in the middle of learning about Transformer layers, and I feel like I've got enough of the general idea behind them to be dangerous. I'm designing a neural network and my team would like to include them, but we're unsure how to proceed with the…
Rstan · 23
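One common pattern, sketched here in PyTorch under illustrative sizes: pool the encoder's per-token outputs into a single vector (mean pooling below; a learned [CLS]-style token is a frequent alternative) and feed that to a task head.

```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
head = nn.Linear(32, 3)        # e.g. a 3-class classification head

x = torch.randn(8, 20, 32)     # (batch, tokens, d_model)
out = encoder(x)               # (8, 20, 32): one vector per token

pooled = out.mean(dim=1)       # average over the token dimension
logits = head(pooled)          # (8, 3)
```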
2 votes, 2 answers

Is it vital to do label encoding with the target variable?

Should I always use label encoding while doing binary classification?
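For a binary target, LabelEncoder simply maps the two classes to 0/1; many scikit-learn estimators also accept string labels directly, so encoding the target is often a convenience rather than a strict requirement. A minimal sketch:

```python
from sklearn.preprocessing import LabelEncoder

y = ["spam", "ham", "ham", "spam"]
le = LabelEncoder().fit(y)     # classes are sorted: ham -> 0, spam -> 1
y_enc = le.transform(y)        # array([1, 0, 0, 1])
le.inverse_transform(y_enc)    # recovers the original string labels
```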
1 vote, 1 answer

Encode time-series of different lengths with keras

I have time-series as my data (one time-series per training example). I would like to encode the data within these series in a fixed-length vector of features using a keras model. The problem is that my different examples' time-series don't have the…
Contestosis · 171
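A minimal sketch of one standard answer, assuming `series` is a list of 1-D sequences of varying length: pad to a common length, then let a Masking layer tell the LSTM encoder to skip the padded steps when producing the fixed-length vector (the padding value must not occur in the real data):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

series = [np.array([0.1, 0.4, 0.2]), np.array([0.5, 0.9])]
padded = tf.keras.preprocessing.sequence.pad_sequences(
    series, padding="post", dtype="float32")  # (2, 3), zeros appended
padded = padded[..., None]                    # (2, 3, 1): one feature per step

encoder = tf.keras.Sequential([
    layers.Masking(mask_value=0.0),  # padded steps are ignored downstream
    layers.LSTM(32),                 # final state = fixed-length encoding
])
codes = encoder(padded)              # (2, 32) regardless of input lengths
```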
1 vote, 1 answer

How to add a Decoder & Attention Layer to a Bidirectional Encoder with TensorFlow 2.0

I am a beginner in machine learning and I'm trying to create a spelling correction model that spell-checks a small vocabulary (approximately 1000 phrases). Currently, I am referring to the TensorFlow 2.0 tutorials for 1. NMT with Attention,…
Dom · 11
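One way to wire this up, sketched with Keras layers rather than the tutorial's exact code: a bidirectional LSTM encoder whose concatenated states initialize an LSTM decoder, with dot-product attention (`layers.Attention`) over the encoder outputs. Vocabulary size, dimensions, and sequence length are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

vocab, emb, units, T = 1000, 64, 128, 20   # illustrative sizes

enc_in = layers.Input(shape=(T,))
enc_emb = layers.Embedding(vocab, emb)(enc_in)
enc_out, fh, fc, bh, bc = layers.Bidirectional(
    layers.LSTM(units, return_sequences=True, return_state=True))(enc_emb)
state_h = layers.Concatenate()([fh, bh])   # merge forward/backward states
state_c = layers.Concatenate()([fc, bc])

dec_in = layers.Input(shape=(T,))
dec_emb = layers.Embedding(vocab, emb)(dec_in)
dec_out, _, _ = layers.LSTM(2 * units, return_sequences=True,
                            return_state=True)(
    dec_emb, initial_state=[state_h, state_c])

context = layers.Attention()([dec_out, enc_out])  # dot-product attention
logits = layers.Dense(vocab)(layers.Concatenate()([dec_out, context]))
model = tf.keras.Model([enc_in, dec_in], logits)
```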
1 vote, 1 answer

sklearn serialize label encoder for multiple categorical columns

I have a model with several categorical features that need to be converted to numeric format. I am using a combination of LabelEncoder and OneHotEncoder to achieve this. Once in production, I need to apply the same encoding to new incoming data…
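A minimal sketch, with hypothetical column names: fit one LabelEncoder per column, persist the dict with joblib, and reload it in production so new data gets the identical mapping.

```python
import joblib
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"color": ["red", "blue", "red"],
                   "size": ["S", "M", "S"]})

encoders = {col: LabelEncoder().fit(df[col]) for col in ["color", "size"]}
joblib.dump(encoders, "encoders.joblib")   # persist the fitted encoders

# in production: load and apply the exact same mapping to new data
encoders = joblib.load("encoders.joblib")
new_df = pd.DataFrame({"color": ["blue"], "size": ["M"]})
for col, enc in encoders.items():
    new_df[col] = enc.transform(new_df[col])
```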
1 vote, 1 answer

How do I implement a dual-encoder model in PyTorch?

I am trying to implement the paper titled Learning Cross-lingual Sentence Representations via a Multi-task Dual-Encoder Model. Here the encoder and decoder share the same weights, but I am unable to put it in code. Any links?
gaurus · 341
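The usual trick for weight sharing in PyTorch is simply to apply the same module instance to both inputs, so gradients from both sides accumulate in one set of parameters. A minimal sketch; the architecture and sizes are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

class DualEncoder(nn.Module):
    def __init__(self, vocab_size=1000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)

    def encode(self, tokens):
        _, h = self.encoder(self.embed(tokens))  # h: (1, batch, dim)
        return h.squeeze(0)

    def forward(self, src, tgt):
        # both sides pass through the *same* parameters -> shared weights
        return self.encode(src), self.encode(tgt)

model = DualEncoder()
src = torch.randint(0, 1000, (4, 12))
tgt = torch.randint(0, 1000, (4, 9))
u, v = model(src, tgt)
sim = torch.cosine_similarity(u, v)  # per-pair similarity scores
```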
1 vote, 2 answers

What does the output of an encoder in an encoder-decoder model represent?

In most blogs or books touching on encoder-decoder architectures, the authors usually say that the last hidden state(s) of the encoder are passed as input to the decoder and the encoder output is discarded. They skim over that topic…
Marek M. · 63
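The distinction is easy to see in code. A minimal PyTorch sketch with illustrative sizes: `outputs` holds the hidden state at every time step, while `h_n` is only the last step's state. A plain seq2seq decoder is initialized from `h_n`, whereas attention-based models keep `outputs` so the decoder can attend to every encoder step.

```python
import torch
import torch.nn as nn

encoder = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(2, 5, 8)      # batch of 2 sequences, 5 steps each
outputs, h_n = encoder(x)
print(outputs.shape)          # (2, 5, 16): hidden state at every step
print(h_n.shape)              # (1, 2, 16): final step's hidden state only
```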
1 vote, 1 answer

Encode categorical data for unsupervised learning

What is the best encoder for categorical data in unsupervised learning? I am using unsupervised learning (such as K-means) on mixed data. Before running my unsupervised algorithm, I am reducing the dimension of my data using FAMD (PCA for mixed…
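A minimal sketch of one common baseline (not necessarily the best encoder, and the column names are hypothetical): one-hot encode the categorical columns, scale the numeric ones, then cluster.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({"city": ["a", "b", "a", "c"],
                   "income": [30.0, 52.0, 41.0, 60.0]})

pre = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ("num", StandardScaler(), ["income"]),
])
pipe = make_pipeline(pre, KMeans(n_clusters=2, n_init=10))
labels = pipe.fit_predict(df)   # cluster assignment per row
```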
1 vote, 0 answers

Motivation for an LSTM with no input

I have read this paper where the authors use an LSTM to learn the attention applied to several sets. They use an LSTM without input or output; the LSTM just uses the hidden state and evolves it. My question is: what is the motivation for using an LSTM without any…
Oculu · 11
1 vote, 1 answer

How to visualize attention weights in an attention-based encoder-decoder network for time series forecasting

Below is an example attention-based encoder-decoder network for a multivariate time series forecasting task. I want to visualize the attention weights. input_ = Input(shape=(TIME_STEPS, N)); x = attention_block(input_); x = LSTM(512,…
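A self-contained sketch of the usual recipe: give the attention-weight layer a name, build a second `Model` that outputs that layer, and draw its predictions as a heatmap. The toy architecture and the layer name `att_weights` are assumptions, not the question's actual `attention_block`.

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers

TIME_STEPS, N = 20, 4
inp = layers.Input(shape=(TIME_STEPS, N))
scores = layers.Dense(1)(inp)                          # (batch, T, 1)
weights = layers.Softmax(axis=1, name="att_weights")(scores)
context = layers.Dot(axes=1)([weights, inp])           # weighted sum over T
out = layers.Dense(1)(layers.Flatten()(context))
model = tf.keras.Model(inp, out)

# sub-model that returns the attention weights for any input batch
att_model = tf.keras.Model(model.input,
                           model.get_layer("att_weights").output)
w = att_model.predict(np.random.randn(8, TIME_STEPS, N))  # (8, T, 1)

plt.imshow(w[..., 0], aspect="auto", cmap="viridis")
plt.xlabel("time step"); plt.ylabel("sample"); plt.colorbar()
plt.show()
```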
1 vote, 1 answer

Get Hidden Layers in PyTorch TransformerEncoder

I am trying to access the hidden layers when using TransformerEncoder and TransformerEncoderLayer. I could not find anything like that in the source code for these classes. I am not using Hugging Face, but I know one can get hidden_states and…
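One way that works with the stock classes, sketched under illustrative sizes: register a forward hook on each `TransformerEncoderLayer` so every layer's output is captured during the encoder's forward pass.

```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=32, nhead=4)
encoder = nn.TransformerEncoder(layer, num_layers=3)

hidden_states = []
def save_output(module, inputs, output):
    hidden_states.append(output.detach())  # one entry per layer

for lyr in encoder.layers:
    lyr.register_forward_hook(save_output)

src = torch.randn(10, 2, 32)   # (seq_len, batch, d_model)
out = encoder(src)
print(len(hidden_states))      # 3: the output of each encoder layer
```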