
I'm trying to find an algorithm that would fit this use case:

My data: a bunch of fixed-size integer arrays, e.g.

[0,2,3,4,5]
[1,2,3,1,5]
[4,1,2,4,5]
...

Input: a partial array of integers; output: a prediction for the rest of the array, e.g.

[0,1] -> [2,4,2]
[3] -> [1,3,2,5,5]
[3,2,4,1] -> [4]
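
For example, training pairs could be built from the full arrays by splitting them at every position (just a sketch of what I mean, in Python):

    # Sketch: build (prefix, rest-of-array) pairs from the full arrays.
    sequences = [
        [0, 2, 3, 4, 5],
        [1, 2, 3, 1, 5],
        [4, 1, 2, 4, 5],
    ]

    pairs = []
    for seq in sequences:
        for split in range(1, len(seq)):
            pairs.append((seq[:split], seq[split:]))

    print(pairs[0])  # ([0], [2, 3, 4, 5])
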
nwaldo
  • cool question! wish i had the answer... – Dave Kielpinski Apr 02 '20 at 01:51
  • This is really too broad a question to answer without some knowledge of the domain (i.e. how the data was constructed). For example, is there significance to the position in the array - is the probability of 2 following an initial zero related to the probability of 2 after a zero in second place? – Josh Friedlander Apr 02 '20 at 21:18
  • If so, this sounds parallel to next-word / next-n-gram models in NLP, except with a vocab of only 10 words. HMMs and LSTMs could work for this. – Josh Friedlander Apr 02 '20 at 21:20
  • Are you using a rolling window as your input data? – Donald S Jun 14 '20 at 05:04
  • If so, do you expect the same performance for each test example? – Donald S Jun 14 '20 at 05:05

2 Answers


So, the question is how to model the following problem: given a subsection of a sequence as input, output the rest of the sequence. This is a sequence-to-sequence problem.

For this, as Derek O has alluded to, I would suggest an encoder-decoder architecture. The encoder encodes your input sequence (using an RNN/LSTM) into a "hidden representation"; the decoder (another RNN/LSTM) then decodes that hidden representation into the remaining sequence of integers.

Here is an article that provides more detail around encoder-decoder models: https://towardsdatascience.com/understanding-encoder-decoder-sequence-to-sequence-model-679e04af4346
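
A minimal sketch of what this could look like in Keras (the layer sizes, the vocabulary of 10 integers, and the teacher-forcing setup are assumptions on my part, not requirements):

    # Minimal encoder-decoder sketch (assumed sizes; integers 0..9 as the vocabulary).
    from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
    from tensorflow.keras.models import Model

    vocab_size, emb_dim, hidden = 10, 32, 64

    # Encoder: read the known prefix and keep the final LSTM state.
    enc_in = Input(shape=(None,))
    enc_emb = Embedding(vocab_size, emb_dim)(enc_in)
    _, state_h, state_c = LSTM(hidden, return_state=True)(enc_emb)

    # Decoder: generate the remaining integers, conditioned on the encoder state.
    # During training it receives the target suffix shifted by one step (teacher forcing).
    dec_in = Input(shape=(None,))
    dec_emb = Embedding(vocab_size, emb_dim)(dec_in)
    dec_seq = LSTM(hidden, return_sequences=True)(dec_emb, initial_state=[state_h, state_c])
    dec_out = Dense(vocab_size, activation="softmax")(dec_seq)

    model = Model([enc_in, dec_in], dec_out)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    # At inference time the decoder is run one step at a time, feeding each
    # predicted integer back in until the full array length is reached.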

shepan6

It depends on what these numbers represent.

If there is a sequential / time dependence (i.e. subsequent outputs depend on previous inputs), I would suggest an LSTM-based RNN.
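
As a rough sketch of that idea (Keras here; the layer sizes and the greedy decoding loop are only illustrative assumptions): train the LSTM to predict the next integer from the prefix, then feed each prediction back in until the array is complete.

    # Sketch: predict the next integer from a prefix, then extend greedily.
    import numpy as np
    from tensorflow.keras.layers import Embedding, LSTM, Dense
    from tensorflow.keras.models import Sequential

    vocab_size = 10  # assumed: integers 0..9

    model = Sequential([
        Embedding(vocab_size, 16),
        LSTM(32),
        Dense(vocab_size, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    # model.fit(...) would be trained on (padded prefix, next integer) pairs.

    def predict_rest(prefix, total_len=5):
        """Greedily extend `prefix` until it reaches the fixed array length."""
        seq = list(prefix)
        while len(seq) < total_len:
            probs = model.predict(np.array([seq]), verbose=0)[0]
            seq.append(int(probs.argmax()))
        return seq[len(prefix):]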

Derek O