I have the following code for time series prediction with RNNs, and I would like to know whether, during testing, I am predicting one day in advance:

# -*- coding: utf-8 -*-
"""
Time Series Prediction with  RNN

"""
import pandas as pd
import numpy as np
from tensorflow import keras


#%%  Configure parameters

epochs = 5
batch_size = 50

steps_backwards = int(1* 4 * 24)
steps_forward = int(1* 4 * 24)

split_fraction_trainingData = 0.70
split_fraction_validatinData = 0.90


#%%  "Reading the data"

dataset = pd.read_csv(
    'C:/User1/Desktop/TestValues.csv', sep=';', header=0, low_memory=False,
    infer_datetime_format=True, parse_dates={'datetime': [0]}, index_col=['datetime'])

df = dataset
data = df.values
indexWithYLabelsInData = 0
data_X = data[:, 0:2]
data_Y = data[:, indexWithYLabelsInData].reshape(-1, 1)

#%%   Prepare the input data for the RNN

series_reshaped_X =  np.array([data_X[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
series_reshaped_Y =  np.array([data_Y[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])


timeslot_x_train_end = int(len(series_reshaped_X)* split_fraction_trainingData)
timeslot_x_valid_end = int(len(series_reshaped_X)* split_fraction_validatinData)

X_train = series_reshaped_X[:timeslot_x_train_end, :steps_backwards] 
X_valid = series_reshaped_X[timeslot_x_train_end:timeslot_x_valid_end, :steps_backwards] 
X_test = series_reshaped_X[timeslot_x_valid_end:, :steps_backwards] 


indexWithYLabelsInSeriesReshapedY = 0
lengthOfTheYData = len(data_Y)-steps_backwards -steps_forward
Y = np.empty((lengthOfTheYData, steps_backwards, steps_forward))
for step_ahead in range(1, steps_forward + 1):
    Y[..., step_ahead - 1] = series_reshaped_Y[..., step_ahead:step_ahead + steps_backwards, indexWithYLabelsInSeriesReshapedY]
 
Y_train = Y[:timeslot_x_train_end] 
Y_valid = Y[timeslot_x_train_end:timeslot_x_valid_end] 
Y_test = Y[timeslot_x_valid_end:]


#%%  Build the model and train it

model = keras.models.Sequential([
    keras.layers.SimpleRNN(90, return_sequences=True, input_shape=[None, 2]),
    keras.layers.SimpleRNN(60, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(steps_forward))
    #keras.layers.Dense(steps_forward)
])

model.compile(loss="mean_squared_error", optimizer="adam", metrics=['mean_absolute_percentage_error'])
history = model.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size,
                    validation_data=(X_valid, Y_valid))


#%%    #Predict the test data
Y_pred = model.predict(X_test)

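prediction_lastValues_list = []

# Y_pred has shape (n_windows, steps_backwards, steps_forward); for each window this
# keeps the output at time step 0 for the last horizon (index steps_forward - 1).
for i in range(0, len(Y_pred)):
    prediction_lastValues_list.append(Y_pred[i][0][steps_forward - 1])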

#%% Create the dataframe for the whole data

wholeDataFrameWithPrediciton = pd.DataFrame((X_test[:,0]))
wholeDataFrameWithPrediciton.rename(columns = {indexWithYLabelsInData:'actual'}, inplace = True)
wholeDataFrameWithPrediciton.rename(columns = {1:'Feature 1'}, inplace = True)
wholeDataFrameWithPrediciton['predictions'] = prediction_lastValues_list
wholeDataFrameWithPrediciton['difference'] = (wholeDataFrameWithPrediciton['predictions'] - wholeDataFrameWithPrediciton['actual']).abs()
wholeDataFrameWithPrediciton['difference_percentage'] = ((wholeDataFrameWithPrediciton['difference'])/(wholeDataFrameWithPrediciton['actual']))*100

So I define steps_forward = int(1 * 4 * 24), which is one full day (at 15-minute resolution, 1 * 4 * 24 = 96 time stamps). I predict on the test data with Y_pred = model.predict(X_test) and then build a list of predicted values with for i in range(0, len(Y_pred)): prediction_lastValues_list.append(Y_pred[i][0][steps_forward-1])
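To make the indexing concrete, here is a minimal NumPy sketch that rebuilds the target tensor Y the same way as the code above, but on a toy series whose value equals its time index, so every entry of Y shows which time stamp it refers to. The short windows (4 and 4 instead of 96 and 96) and the name series are assumptions chosen for illustration, and no model is trained.

```python
import numpy as np

# Toy stand-ins (assumptions): short windows instead of 96/96, and a series
# whose value equals its time index so targets can be read off directly.
steps_backwards = 4
steps_forward = 4
series = np.arange(30, dtype=float).reshape(-1, 1)  # plays the role of data_Y

window = steps_backwards + steps_forward
series_reshaped_Y = np.array(
    [series[i:i + window] for i in range(len(series) - window)]
)

# Same target construction as in the question.
Y = np.empty((len(series_reshaped_Y), steps_backwards, steps_forward))
for step_ahead in range(1, steps_forward + 1):
    Y[..., step_ahead - 1] = series_reshaped_Y[..., step_ahead:step_ahead + steps_backwards, 0]

i = 5  # an arbitrary window; its inputs cover time indices 5..8

# General rule: Y[i, t, k] holds the value at time index i + t + k + 1.
first_step_last_horizon = Y[i, 0, steps_forward - 1]  # the entry the question's loop reads from Y_pred
last_step_all_horizons = Y[i, -1, :]                  # the steps_forward values right after the window

print(first_step_last_horizon)  # time index 9
print(last_step_all_horizons)   # time indices 9, 10, 11, 12
```

Under this reading, Y_pred[i][0][steps_forward-1] targets the value steps_forward steps after the first input of window i, and a SimpleRNN produces that output after seeing only that first input. The slice Y_pred[i, -1, :] instead targets the steps_forward values that follow the complete input window.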

Since I find the input and output data of RNNs quite confusing, I am not sure whether, for the test dataset, I am predicting one day in advance, meaning 96 time steps into the future. What I actually want is to read historic data and then predict the next 96 time steps based on the previous 96 time steps. Can anyone tell me whether this code does that?
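If the goal is one forecast of the next steps_forward values per window of steps_backwards historic values, a sequence-to-sequence model like the one above would typically be read out at its last time step and compared against the values that follow the window. The sketch below shows only that bookkeeping, not a trained model: Y_pred is a stand-in array (the true future values plus noise, an assumption so the example runs without TensorFlow), and the toy sizes and the name series are likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

steps_backwards = 4  # toy value; 96 in the question
steps_forward = 4    # toy value; 96 in the question
series = np.arange(40, dtype=float)  # toy target series, value == time index

n_windows = len(series) - (steps_backwards + steps_forward)

# Actual future values per window: the steps_forward points that come
# directly after the steps_backwards input points.
actual_next = np.array([
    series[i + steps_backwards:i + steps_backwards + steps_forward]
    for i in range(n_windows)
])

# Stand-in for model.predict(X) with shape (n_windows, steps_backwards, steps_forward).
# Only the last time step has seen the whole input window, so only it is filled here.
Y_pred = np.zeros((n_windows, steps_backwards, steps_forward))
Y_pred[:, -1, :] = actual_next + rng.normal(scale=0.1, size=actual_next.shape)

# One forecast of length steps_forward per window.
forecast = Y_pred[:, -1, :]

# Mean absolute error for each forecast horizon (1 .. steps_forward steps ahead).
mae_per_horizon = np.abs(forecast - actual_next).mean(axis=0)
print(forecast.shape)
print(mae_per_horizon)
```

Comparing the forecast against actual_next, rather than against the first input step of each window, keeps predictions and actual values aligned on the same time stamps.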

Here is a link to some test data that I generated randomly. The actual values do not matter; only the structure of the prediction does: Download Test Data

Reminder: My bounty is expiring soon and I have not yet received an answer to my basic question. I have uploaded a minimal reproducible example and even some test data, so I would be glad if you could answer my basic question of whether I am forecasting 96 steps in advance with the given code. I would highly appreciate it. If you need further information, just let me know.

PeterBe

2 Answers


With neural networks you would usually use LSTM layers to deal with time. Time steps can be a little confusing in TF/Keras. However, there is a great tutorial using the Jena data. Maybe this helps: https://blogs.rstudio.com/ai/posts/2017-12-20-time-series-forecasting-with-recurrent-neural-networks/

Peter
  • Thanks for your comment, Peter. Part of my question was also about the code I used and whether I am predicting 96 time steps in advance for every time step. Did you have a look at my code, and can you tell me something about my question? – PeterBe Oct 25 '21 at 09:49
  • I think this particular question would be better suited for SO since it is a programming issue. Since I'm not so much into time series with Keras, I cannot give you a clear answer to this question. – Peter Oct 25 '21 at 16:42
  • Thank you for your comment, Peter. I posted this question here because I think it is data science related and quite specific to RNNs, which are widely used in data science. – PeterBe Oct 26 '21 at 07:17

Your code implements the time series split from scratch, which can introduce subtle bugs. Another option is to use an established package. Examples include scikit-learn's TimeSeriesSplit and Keras' TimeseriesGenerator.
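As one illustration of leaning on a library routine for the windowing step, the sketch below uses NumPy's sliding_window_view rather than either package named above (a deliberate substitution, to keep the example free of heavier dependencies). The toy array, the short window sizes, and the variable names are assumptions for illustration only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

steps_backwards = 4  # toy value; 96 in the question
steps_forward = 4    # toy value; 96 in the question
window = steps_backwards + steps_forward

# Toy data with two feature columns, like data_X in the question.
data = np.column_stack([np.arange(20.0), np.arange(20.0) * 10])

# sliding_window_view puts the window axis last: (n_windows, n_features, window).
windows = sliding_window_view(data, window_shape=window, axis=0)
windows = windows.transpose(0, 2, 1)  # -> (n_windows, window, n_features)

X = windows[:, :steps_backwards, :]  # inputs: the first steps_backwards rows of each window
y = windows[:, steps_backwards:, 0]  # targets: the next steps_forward values of column 0

print(X.shape, y.shape)
```

Note that this yields len(data) - window + 1 windows, one more than the list comprehension in the question, which stops one window early.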

Brian Spiering
  • Thanks for your comment, Brian. I actually tried to use the Keras package for the time series split, but I had great difficulty understanding it (see for example this question I asked some time ago: https://datascience.stackexchange.com/questions/86857/difference-between-sequence-length-and-batch-size-in-time-series-forecasting). Apart from that, I would still like to know whether I am forecasting 96 steps in advance at every time step with the code posted above and, if not, how I can modify the code so that it does. Do you have any suggestions on this issue? – PeterBe Oct 26 '21 at 07:15
  • Any comments on my last comment? I'll highly appreciate every further comment from you. – PeterBe Oct 29 '21 at 10:04
  • Any further comments Brian? I'll be quite thankful for further comments from you. – PeterBe Nov 05 '21 at 08:19