
I have read a paper in which the authors use an LSTM to learn the attention applied to several sets. They use the LSTM without any input or output: the LSTM just takes its hidden state and evolves it:

[image: the paper's numbered equations for the recurrent attention / process block]

My question is: what is the motivation for using an LSTM without any input? Doesn't it defeat the purpose of an LSTM if there are no inputs?

So initially the LSTM uses a random hidden state; with that hidden state (2) they compute the dot product with each vector/embedding in the memory to yield a similarity scalar (3), use those scalars to calculate an attention readout over the given vectors/embeddings (5), concatenate that attention readout with the hidden state (6), and feed the result back into the LSTM (2).
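The loop described above can be sketched roughly like this. This is a minimal NumPy sketch of the data flow only: the affine-plus-tanh `rnn_step`, the shapes, and the exact scoring function are my assumptions standing in for the paper's actual LSTM cell, not the authors' code.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def rnn_step(q_star, W):
    # hypothetical stand-in for the paper's input-less LSTM step:
    # the only "input" is the previous concatenated state q*_{t-1}
    return np.tanh(W @ q_star)

rng = np.random.default_rng(0)
n, d = 5, 4                          # 5 memory vectors of dimension d
M = rng.normal(size=(n, d))          # the set of embeddings (the "memory")
W = rng.normal(size=(d, 2 * d)) * 0.1
q_star = rng.normal(size=2 * d)      # initial [hidden, readout] state

for t in range(3):                   # the "processing steps"
    q = rnn_step(q_star, W)          # (2) evolve hidden state, no external input
    e = M @ q                        # (3) dot-product similarity scalars
    a = softmax(e)                   # attention weights over the memory
    r = a @ M                        # (5) attention readout
    q_star = np.concatenate([q, r])  # (6) feed [q, r] back into the recurrence
```

Note that the memory `M` never changes; only the query state `q_star` is refined across steps, which is what lets the attention weights shift from one processing step to the next.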

Will concatenating the attention readout $r_t$ with the hidden state to form $q^*_t$ eventually make $r_t$ and $q^*_t$ the same? Or what is the motivation for using an LSTM in this way?

Their indirect explanation doesn't make sense to me, since the recurrent step is based only on the hidden context, which has no logical meaning without any concrete input. They refer to each recurrent step as a processing step:

We can see that for each processing step, the attention mechanism has access to the original inputs. It can then refine its choices as to which inputs matter, conditioned on the information from the previous steps.

Oculu
  • An LSTM can be thought of as a means to add and remove information via gates given some input $x$, but since here you only have the context vector $h$ (in this case $q^*_t$), I am not sure how the adding or removing is orchestrated. Perhaps the attention is orchestrating the add/remove, but the objective doesn't seem direct! – user0193 Nov 13 '22 at 06:33
  • @user0193 This is why I am concerned about why recurrence is used at all! – Oculu Nov 13 '22 at 14:16

0 Answers