I am using Keras for time series forecasting and I am trying to understand the time series forecasting tutorial on the official Keras site (https://keras.io/examples/timeseries/timeseries_weather_forecasting/).

They use a Keras function called keras.preprocessing.timeseries_dataset_from_array, which they call with the following parameters (documentation: https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/timeseries_dataset_from_array):

    dataset_train = keras.preprocessing.timeseries_dataset_from_array(
        x_train,
        y_train,
        sequence_length=sequence_length,
        sampling_rate=step,
        batch_size=batch_size,
    )

So my question is: what is the difference between the sequence length and the batch size? I think the sequence length is the size of the sliding window (the x-features and one target y-value). But what is the batch size? Unfortunately I can't look at the output of this method, because

    print(dataset_train)
    # or
    print(dataset_train.head())

does not show me the data, and I do not know of any other function that would let me look at the output of the method.

Has anyone here had experience with this method, or with sequences and batches in general? I'd appreciate any comment.

Valentin Calomme
PeterBe
  • May I ask why you deleted ncasas' answer? I read it and it fully answers your question in more length than most would have cared to provide. – Valentin Calomme Jan 11 '21 at 21:09
  • Hi Valentin. I personally did not delete ncasas' answer, or at least I had no intention of doing so. Maybe I pressed the wrong button (without knowing it), or maybe someone else (maybe he himself) deleted the answer. – PeterBe Jan 12 '21 at 15:58
  • Oh, it's my mistake. It said it was deleted by the post author. I assumed it was you since this is your question. But I guess it must mean it was him. – Valentin Calomme Jan 12 '21 at 16:26

1 Answer

Let's take a time series, data = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]
Call the function with these parameters:
sequence_length=5, sampling_rate=1, sequence_stride=1, shuffle=False, batch_size=2

shuffle and batch_size play no role in how the windows are created. They only take effect when you iterate over the returned Dataset.

In this case, we will have the following data points:
[ 1, 2, 3, 4, 5 ]
[ 2, 3, 4, 5, 6 ]
[ 3, 4, 5, 6, 7 ]
[ 4, 5, 6, 7, 8 ]
[ 5, 6, 7, 8, 9 ]
[ 6, 7, 8, 9, 10 ]
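
To make the windowing concrete, here is a minimal pure-Python sketch that reproduces the six data points listed above. The helper `sliding_windows` is a hypothetical name made up for this illustration; it is not part of Keras and only mimics how `sequence_length`, `sampling_rate` and `sequence_stride` interact, under the assumption that every window must fit completely inside the data.

```python
def sliding_windows(data, sequence_length, sampling_rate=1, sequence_stride=1):
    """Return every full window of `sequence_length` items taken from `data`.

    Hypothetical helper for illustration only; it is not a Keras function.
    """
    # Distance between the first and the last item of one window.
    span = (sequence_length - 1) * sampling_rate
    windows = []
    # A window may only start where its last item still lies inside `data`.
    for start in range(0, len(data) - span, sequence_stride):
        windows.append(data[start : start + span + 1 : sampling_rate])
    return windows


data = list(range(1, 11))  # [1, 2, ..., 10]
windows = sliding_windows(data, sequence_length=5)
for window in windows:
    print(window)
```

Note that neither batch_size nor shuffle appears anywhere in this step; they only matter once you start iterating.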

batch_size
When you iterate over this dataset, you will receive 2 records in each iteration.
If shuffle=True, the records are shuffled before batching.

    for batch in dataset:
        inputs, targets = batch

In the above snippet, inputs will be a batch of records, not just one record. You can set batch_size=1 if required.
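
As a rough illustration of what "2 records in each iteration" means, the following pure-Python sketch groups the six windows from the example into batches. The generator `batches` is a hypothetical stand-in for the iteration behaviour described here, not the Keras or tf.data implementation; in particular, letting the last batch be shorter when the records do not divide evenly is an assumption of this sketch.

```python
import random


def batches(records, batch_size, shuffle=False, seed=None):
    """Yield lists of up to `batch_size` records (hypothetical helper)."""
    records = list(records)
    if shuffle:
        # Shuffle the records first, then cut them into batches.
        random.Random(seed).shuffle(records)
    for i in range(0, len(records), batch_size):
        yield records[i : i + batch_size]


# The six windows from the example: [1..5], [2..6], ..., [6..10].
records = [list(range(start, start + 5)) for start in range(1, 7)]

all_batches = list(batches(records, batch_size=2))
for step, batch in enumerate(all_batches, start=1):
    print(f"iteration {step}: {batch}")
```

So six records with batch_size=2 are delivered over three iterations, two records at a time.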

targets

Targets corresponding to timesteps in data. It should have same length as data. targets[i] should be the target corresponding to the window that starts at index i (see example 2 below). Pass None if you don't have target data (in this case the dataset will only yield the input data)

This is a general-purpose function.
It does not derive the targets by any logic of its own (e.g. an autoregressive approach). It expects the targets to be provided; otherwise, it will just return the predictors.
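
One possible way to provide the targets yourself is sketched below in plain Python, assuming you want each window to predict the value that immediately follows it. That choice of target is an assumption for illustration, and the variable names are made up; the only rule taken from the documentation quoted above is that targets[i] belongs to the window starting at index i.

```python
sequence_length = 5
data = list(range(1, 11))  # [1, 2, ..., 10]

# The final value (10) is only ever used as a target, never as an input.
inputs = data[:-1]
# targets[i] is the value right after the window that starts at index i.
targets = data[sequence_length:]

pairs = []
for i in range(len(inputs) - sequence_length + 1):
    window = inputs[i : i + sequence_length]
    pairs.append((window, targets[i]))

for window, target in pairs:
    print(window, "->", target)
```

The same idea carries over to the real function: you build the target array so that its index lines up with the start index of each window, and pass it as the second argument.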

10xAI
  • Thanks 10xAI for your answer (I upvoted it). Although I understand some parts of it, I still have several questions: 1) Why does the batch_size have no role in TS data creation? In the Keras tutorial they choose a number (256). If the batch_size did not have any role in time series, you could just use any random number (which I doubt). 2) What do you mean by saying "When you will iterate on this dataset, you will receive 2 records"? Why will I receive just 2 records? In your example we have 6 subsequences, hence I should receive 6 records – PeterBe Jan 12 '21 at 16:48
  • 3) What about the labels in your example? How do you have to provide them? The method "dataset_train = keras.preprocessing.timeseries_dataset_from_array" needs these target values. How can I link the targets to certain inputs? 4) In your code snippet you assign one batch to both the inputs and the targets. Does this mean that a batch includes both input data and its labels? I'd appreciate any further comment from you. – PeterBe Jan 12 '21 at 16:55
  • No, I didn't mean that these are unimportant. I meant that this function will not take care of all of this for you; it is just a facilitator. You need to work out the best batch_size, sequence step, and all other hyperparameters yourself. On your second question: yes, you will receive 6 records, but over 3 iterations. – 10xAI Jan 12 '21 at 16:58
  • Thanks 10xAI for your comment. What do you mean by 6 records after 3 iterations? What is an iteration in this context? And what about my questions 3) and 4)? – PeterBe Jan 12 '21 at 17:27
  • Any further comments on my questions? I'd highly appreciate it – PeterBe Jan 13 '21 at 13:43
  • 4. If you pass the target, then the batch will be a tuple of X, Y. Otherwise just X. `inputs= batch` – 10xAI Jan 13 '21 at 14:15
  • 3. Label - You will have to engineer it, e.g. let's say the last element of each sequence is my target; then I may split X into X and Y inside the loop, i.e. `x, y = x[:-1], x[-1]` – 10xAI Jan 13 '21 at 14:18
  • Iteration - It is the passing of one batch of data through a Model during training with Gradient Descent(Or any such approach). – 10xAI Jan 13 '21 at 14:43