Multivariate and multi-series LSTM

Question

I am trying to create a pollution prediction LSTM. I've seen an example on the web of a multivariate LSTM that predicts the pollution levels for one city (Beijing), but what about more than one city? I don't really want a separate network for every city; I'd like a single generalised model/network for all x cities. But how do I feed that data into the LSTM?

Say I have the same data for each city, do I...

1) Train on all data for one city, then the next city, and so on until all cities are done.

2) Train data for all cities on date t, then data for all cities on t+1, then t+2 etc.

3) Something completely different.

Any thoughts?

score 2 · Accepted Answer · answered Jun 20 '18 at 08:07

I would say that option 1 will not work out too well: in my experience, the model will end up being good only for the first or the last city you train on, depending on how much freedom you give the algorithm to change weights as training goes on (e.g. via the learning rate).

You really need to decide what you are going to be predicting. Is it the pollution level for a single city? And which features do you have for each city?

It can make sense to train on all cities at the same time if the features you have are general ones that really can explain the target variable. So if you have temperature, humidity, some transport statistics for that city etc., then training everything together could make sense.

I would think about each sample leading to one target pollution level, and if that sample has enough information (based on the features) to distinguish itself from samples of the other cities, the model should pick up on and leverage those subtleties in the data.
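To make the "one sample, one target" idea concrete, here is a minimal sketch in plain Python. It assumes each city's data is a time-ordered list of feature rows with the pollution level in column 0; the helper names `make_windows` and `mixed_batches`, the city names and the toy values are all made up for illustration. Windows are cut per city, so no sequence straddles two cities, and the pooled samples are shuffled so that a batch can contain several cities.

```python
import random

def make_windows(series, n_steps):
    """Slice one city's time-ordered rows into (window, target) pairs.

    series: list of rows, each row a list of features with the
            pollution level in column 0 (an assumption of this sketch).
    """
    samples = []
    for start in range(len(series) - n_steps):
        window = series[start:start + n_steps]   # n_steps consecutive rows
        target = series[start + n_steps][0]      # pollution at the next step
        samples.append((window, target))
    return samples

def mixed_batches(city_series, n_steps, batch_size, seed=0):
    """Pool windows from every city, shuffle them, and yield mini-batches."""
    pooled = []
    for city, series in city_series.items():
        # Windows are built per city, so none spans two cities.
        for window, target in make_windows(series, n_steps):
            pooled.append((city, window, target))
    random.Random(seed).shuffle(pooled)          # interleave the cities
    for i in range(0, len(pooled), batch_size):
        yield pooled[i:i + batch_size]

# Toy data: 3 hypothetical cities, 10 time steps, rows are [pollution, temperature].
city_series = {
    name: [[float(10 * k + t), 20.0 + t] for t in range(10)]
    for k, name in enumerate(["A", "B", "C"])
}
batches = list(mixed_batches(city_series, n_steps=4, batch_size=6))
```

In a real pipeline each batch would be converted to an array of shape (batch, timesteps, features) before being passed to the LSTM; the city label is kept here only to make the mixing visible.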

answered Jun 20 '18 at 08:07

n1k31t4


Good point. I can understand why the model would be biased towards more recent data: the weights are adjusted based on the loss of the data presented in an epoch, and the epoch would only consist of the data of the current city. So I need a way for 1 epoch to contain *all* of the data (all cities for all days) so the loss is calculated more accurately. I could load all data into memory, and in my batch generator iterate through each city and create sequences from all data for that one city before progressing to the next. Not mini-batch, but I don't know how else to do it. – BigBadMe Jun 20 '18 at 10:39
Actually, I guess I could make it mini-batch / more stochastic by only creating sequences for some of the data for each city. In the next epoch I'd get different sequences for that city, and so on. That would make it mini-batch and faster to train per epoch. – BigBadMe Jun 20 '18 at 10:46
I would create each single batch from a mixture of samples from each city. Regarding your first comment, the model could also be biased to the first data it saw, if you e.g. reduced the learning rate significantly before the model saw the rest of that data. This is because the later epochs wouldn't be able to change the weights much, given a tiny learning rate. – n1k31t4 Jun 20 '18 at 10:52
Great points. Many thanks for your input with this. – BigBadMe Jun 20 '18 at 10:54

score 1 · Answer 2 · answered Jun 20 '18 at 08:17

I can think of two alternatives:

1) Multiple Inputs / Multiple Outputs model: If the cities are close to each other, the pollution in one of them might affect the pollution in the others. In this case, it makes sense to account for this mutual influence by feeding the measurements of each city as a separate input to your LSTM-RNN. You train the network on these time series, and at test time you feed in the pollution of each city at time t and the network predicts the pollution of all cities at t+n (n is an arbitrary horizon; the longer it is, the lower the accuracy of the prediction).

2) Single Input / Single Output model: Another way would be to use the training data of every city to train a single-input network, assuming that all the data come from the same source (or at least similar sources). This network is trained to output the t+n pollution prediction given the pollution at t for any city. But this relies on your network generalizing well. To help your LSTM-RNN generalize, you should consider adding Dropout during training. See this and this.
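The difference between the two alternatives is mostly in how the training samples are laid out. The following is a rough sketch in plain Python, assuming each city is a list of pollution readings on a shared time axis; the function names, the toy values and the `n_steps` window length are illustrative assumptions, not part of any library.

```python
def multi_city_samples(city_series, n_steps, horizon):
    """Alternative 1: each sample holds all cities side by side.

    Input window: n_steps rows, each row = [pollution of city 1, city 2, ...].
    Target: pollution of every city `horizon` steps after the window ends.
    """
    cities = sorted(city_series)
    length = min(len(city_series[c]) for c in cities)
    # One row per time step, one column per city.
    joint = [[city_series[c][t] for c in cities] for t in range(length)]
    samples = []
    for start in range(length - n_steps - horizon + 1):
        window = joint[start:start + n_steps]
        target = joint[start + n_steps + horizon - 1]
        samples.append((window, target))
    return samples

def single_series_samples(city_series, n_steps, horizon):
    """Alternative 2: every city contributes univariate samples to one pool."""
    samples = []
    for series in city_series.values():
        for start in range(len(series) - n_steps - horizon + 1):
            # One feature (the pollution level) per time step.
            window = [[v] for v in series[start:start + n_steps]]
            target = series[start + n_steps + horizon - 1]
            samples.append((window, target))
    return samples

# Toy pollution readings for two hypothetical cities, 8 time steps each.
city_series = {"A": [1, 2, 3, 4, 5, 6, 7, 8],
               "B": [10, 20, 30, 40, 50, 60, 70, 80]}
joint = multi_city_samples(city_series, n_steps=3, horizon=2)
pooled = single_series_samples(city_series, n_steps=3, horizon=2)
```

With the first layout the network sees every city at once and outputs one value per city, so the number of cities is fixed by the architecture. With the second layout the same weights are reused for every city, which gives more samples per parameter but only works if the cities behave similarly.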

Thanks for the suggestion. What do you think of the idea in my comment to n1k31t4? – BigBadMe Jun 20 '18 at 10:49
