I have read a paper where the authors use an LSTM to learn attention over several sets. They use the LSTM without any input or output; the LSTM just takes its hidden state and evolves it:
My question is: what is the motivation for using an LSTM without any input? Doesn't it defeat the purpose of an LSTM if there are no inputs?
So initially the LSTM uses a random hidden state. With that hidden state (2), they compute dot products with the vectors/embeddings in the memory as similarity scores, yielding one scalar per vector (3); they use those to compute an attention readout over the vectors/embeddings (5), concatenate that attention readout with the hidden state (6), and feed the result back into the LSTM (2).
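To make the loop concrete, here is a minimal numpy sketch of those steps as I understand them. The dimensions, the random weights, and the single-matrix LSTM cell are my own toy assumptions, not the paper's actual parameterisation; the step numbers in the comments refer to the equations cited above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d = 4                               # toy hidden size (assumption)
n = 5                               # number of set elements (assumption)
M = rng.standard_normal((n, d))     # memory of set embeddings m_i

# An LSTM cell with no external input: its only "input" is the
# concatenation q*_t = [q_t; r_t], so the gate weights act on 2*d values.
W = rng.standard_normal((4 * d, 2 * d)) * 0.1
b = np.zeros(4 * d)

q = np.zeros(d)                     # hidden state q_t
c = np.zeros(d)                     # cell state

for step in range(3):               # a few "processing steps"
    e = M @ q                       # (3) similarity scalars e_i = m_i . q_t
    a = np.exp(e - e.max())
    a /= a.sum()                    # softmax attention weights
    r = a @ M                       # (5) attention readout r_t = sum_i a_i m_i
    x = np.concatenate([q, r])      # (6) q*_t = [q_t; r_t]
    g = W @ x + b                   # (2) LSTM update driven only by q*_t
    i, f, o, u = np.split(g, 4)     # input, forget, output, candidate gates
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(u)
    q = sigmoid(o) * np.tanh(c)
```

Note that with `q` initialised to zeros, the first attention step is uniform over the memory, and each later step re-weights the same memory conditioned on what the hidden state has accumulated so far.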
Will concatenating the attention readout $r_t$ with the hidden state $q^*_t$ eventually make $r_t$ and $q^*_t$ the same? Or what is the motivation for using an LSTM in this way?
Their indirect explanation doesn't make sense to me, since each recurrent step is based only on the hidden context, which, without any concrete input, has no obvious meaning. They refer to each recurrent step as a processing step:
We can see that for each processing step, the attention mechanism has access to the original inputs. It can then refine its choices as to which inputs matter, conditioned on the information from the previous steps.
