Are there any rules of thumb (or actual rules) for the minimum, maximum, and "reasonable" number of LSTM cells I should use? Specifically, I am referring to BasicLSTMCell from TensorFlow and its num_units parameter.
Please assume that I have a classification problem defined by:
- t - number of time steps
- n - length of the input vector at each time step
- m - length of the output vector (number of classes)
- i - number of training examples
Is it true, for example, that the number of training examples should be larger than

4*((n+1)*m + m*m)*c

where c is the number of cells? I based this on How to calculate the number of parameters of an LSTM network? As I understand it, that formula gives the total number of trainable parameters, which should be less than the number of training examples.
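To make the comparison concrete, here is a minimal sketch (plain Python, hypothetical helper names) of the count my formula gives versus the count I believe BasicLSTMCell actually allocates, namely a kernel of shape [n + c, 4*c] plus a bias of shape [4*c]:

```python
def params_from_question(n, m, c):
    """The bound from my formula: 4*((n+1)*m + m*m)*c."""
    return 4 * ((n + 1) * m + m * m) * c

def params_basic_lstm_cell(n, c):
    """What I believe BasicLSTMCell allocates:
    kernel [n + c, 4*c] plus bias [4*c], i.e. 4*c*(n + c + 1)."""
    return (n + c) * 4 * c + 4 * c

# Example sizes (made up for illustration)
n, m, c, i = 10, 3, 20, 5000
print(params_from_question(n, m, c))      # 3360
print(params_basic_lstm_cell(n, c))       # 2480
print(i > params_from_question(n, m, c))  # True for these sizes
```

So part of my question is whether the bound should use the first count or the second.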