
Are there any standard or recommended SOS (start-of-sequence) and EOS (end-of-sequence) tokens for seq2seq encoding with RNNs/LSTMs/Transformers applied to real-valued and/or complex-valued 1D signals with few samples (i.e. fewer than 100 samples per input signal)? The encoding is intended for signal discrimination; in other words, we are generating fixed-size 1D signal embeddings to be classified afterwards by an arbitrary discrimination method.

It may be that SOS and EOS are not necessary for fixed-size signal reconstruction by a seq2seq. EOS matters in translation, where part of the task is learning to predict the length of the translated sentence, i.e. choosing when to stop, and where being at the end of a sentence can be key to extracting tone, etc. In contrast, fixed-size signal reconstruction avoids the need to predict the output sequence length. Assuming the frequency content is constant over the scale of an input sample and constitutes the discriminative information we are after, it does not really matter where in the sequence that information is found. I would still gladly receive pointers to references developing such intuitions.
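To make that intuition concrete, here is a minimal sketch (in PyTorch; class and parameter names are my own, not from any reference) of an LSTM autoencoder for fixed-length signals that uses neither SOS nor EOS: the decoder is unrolled for a known, constant number of steps, so there is nothing to signal.

```python
import torch
import torch.nn as nn

class SignalAutoencoder(nn.Module):
    """LSTM autoencoder for fixed-length 1D signals; no SOS/EOS tokens."""
    def __init__(self, seq_len=100, hidden=32):
        super().__init__()
        self.seq_len = seq_len
        self.encoder = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.decoder = nn.LSTM(input_size=hidden, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, seq_len, 1)
        _, (h, _) = self.encoder(x)       # h: (1, batch, hidden)
        z = h[-1]                         # fixed-size embedding, (batch, hidden)
        # Feed the embedding to the decoder at every one of the seq_len
        # steps: the unroll length is fixed, so no EOS prediction is needed.
        dec_in = z.unsqueeze(1).repeat(1, self.seq_len, 1)
        dec_out, _ = self.decoder(dec_in)
        return self.out(dec_out), z

model = SignalAutoencoder(seq_len=100)
x = torch.rand(8, 100, 1)
recon, emb = model(x)
print(recon.shape, emb.shape)  # torch.Size([8, 100, 1]) torch.Size([8, 32])
```

After training on reconstruction loss, `z` would serve as the fixed-size embedding passed to the downstream classifier.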

This datascience.SE answer comes close to my question without answering it. More interestingly, the single comment right below it partially asks what I am asking here:

Just to follow up, will the encoder EOS token be necessary if the input sequence is padded and has the same length? I have seen examples where input sequence doesn't have the EOS token and it still works.

I have a hard time finding good references and examples for this. I know of wav2vec (convolutions) and wav2vec 2.0 (convolutions plus a Transformer), but I am not sure such approaches are suited to short signals never exceeding, say, a hundred samples. I also looked at articles such as “LSTM-based auto-encoder model for ECG arrhythmias classification”, without finding a proper explanation of the choice of SOS and EOS values. Perhaps I missed a typical convention (e.g. for a real-valued signal in [0, 1], SOS could be 2 and EOS −1?).
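For what it's worth, if one does use tokens with a continuous signal, the trick hinted at above is simply to pick marker values outside the signal's range so they cannot be confused with data. A tiny sketch (the specific values 2 and −1 are arbitrary illustrations, not a standard):

```python
import numpy as np

SOS, EOS = 2.0, -1.0  # arbitrary out-of-range markers for a signal in [0, 1]

def add_tokens(signal):
    """Prepend SOS and append EOS to a real-valued 1D signal in [0, 1]."""
    return np.concatenate(([SOS], signal, [EOS]))

x = np.array([0.1, 0.5, 0.9])
print(add_tokens(x).tolist())  # [2.0, 0.1, 0.5, 0.9, -1.0]
```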

I would also definitely be interested in a more elaborate encoding reference in which the network takes into account a diversity of sampling frequencies across the input signals, in addition to clearly stating which SOS and EOS values are used and why.
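Absent such a reference, one conceivable way to expose the sampling frequency to the encoder (a hypothetical construction of mine, not taken from any paper) is to append the normalized sampling rate as an extra constant input channel at every time step:

```python
import numpy as np

def with_rate_channel(signal, fs, fs_max=48000.0):
    """Stack a constant normalized-sampling-rate channel onto a 1D signal.

    Returns shape (len(signal), 2): column 0 holds the sample values,
    column 1 holds fs / fs_max repeated at every time step.
    """
    rate = np.full_like(signal, fs / fs_max, dtype=float)
    return np.stack([signal, rate], axis=-1)

x = np.linspace(0.0, 1.0, 5)
feat = with_rate_channel(x, fs=8000.0)
print(feat.shape)  # (5, 2)
```

The encoder's `input_size` would then be 2 instead of 1; whether this actually helps discrimination is exactly the kind of question I would hope a reference addresses.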

Disclaimer: this is a slightly modified version of my unanswered question on the PyTorch discuss forum; since I was getting no answer there, I searched a little more, found seq2seq questions on this SE site, and decided to post a more informed version of my question here.

Blupon