I have a binary classification task for time series data. Every 14 rows in my CSV is relevant to one time slot. How should I prepare this data for use in an LSTM? In other words, how do I feed this data to the model?
Can you be more specific about what variables the rows and columns of your CSV represent? In particular, are there only 14 features for each time step (equivalently, is there only one column in your CSV)? – liangjy Mar 29 '17 at 01:11
For each time step (every 14 rows in the CSV) I have 12 features, and the task is binary classification. How should I load this data into an LSTM? So the number of columns is 12. – Kaggle Mar 29 '17 at 08:48
3 Answers
I assume the dataset also includes labels, meaning there is a one-to-one mapping from each sample to its class, e.g. dog > good, cat > bad, kittens > bad, puppies > good, etc.
Separate the data into X (training data) and Y (labels). Then use a vectorizer and train on X and Y. Once those steps work, use methods such as a train/test split, cross-validation folds, etc.
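As a rough illustration of the X/Y separation and a held-out test set, here is a minimal sketch with NumPy. The toy arrays, the 80/20 ratio, and the variable names are assumptions for the example, not something from the question. The split is chronological (no shuffling), which is a common choice for time series.

```python
import numpy as np

# Hypothetical toy data: 10 sequences, 14 time steps each, 12 features per step
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 14, 12))
Y = rng.integers(0, 2, size=10)  # binary labels, one per sequence

# Chronological split: first 80% for training, the rest for testing
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
Y_train, Y_test = Y[:split], Y[split:]

print(X_train.shape, X_test.shape)  # (8, 14, 12) (2, 14, 12)
```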
Friendly suggestion: Try seq2seq layers before LSTM (they require more resources).
I'm not sure about the statement "Every 14 rows in my CSV is relevant to one time slot.", as it's not clear to me.
But going by your comment "How should I load this data to LSTM?So the number of column is 12 ", I believe you are asking how to load multiple features (12 in your case) into a time series model.
If my understanding is correct, this is a "Multiple Parallel Timeseries" problem. I have built a similar model in TensorFlow and pushed it to GitHub: Github Source Code for Multiple Parallel TimeSeries
Note: Here instead of 12 features, I have used 3 features.
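For what it's worth, if each block of 14 consecutive rows is meant to be one sequence with 12 features per row (that is my reading of the question, so treat it as an assumption), the flat table can be grouped into the 3-D layout (samples, time steps, features) that LSTM layers usually expect. A minimal sketch with made-up data standing in for the CSV:

```python
import numpy as np

timesteps, n_features = 14, 12

# Hypothetical stand-in for the CSV contents: 5 sequences stacked row-wise,
# i.e. 5 * 14 = 70 rows with 12 feature columns each
flat = np.arange(5 * timesteps * n_features, dtype=float).reshape(-1, n_features)

# Drop trailing rows that do not fill a whole block, then group every 14 rows
n_sequences = len(flat) // timesteps
X = flat[: n_sequences * timesteps].reshape(n_sequences, timesteps, n_features)

print(X.shape)  # (5, 14, 12)
```

With this layout there would be one label per block of 14 rows, so Y would have length `n_sequences`.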
Here is the pseudo code for this:
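import numpy as np
import pandas as pd

data = pd.read_csv(filename)  # filename: path to your CSV file
lag = 14  # number of past rows in each input window

# Assuming the target column is the last one
X = []
Y = []
for x in range(lag, len(data)):
    # Window of the previous `lag` rows (all columns, including past targets)
    X.append(data.iloc[x - lag:x, :].to_numpy())
    # The label is the target value at the current row
    Y.append(data.iloc[x, -1])
X = np.array(X)  # shape: (samples, lag, n_columns)
Y = np.array(Y)  # shape: (samples,)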
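The windowing loop above can also be written without an explicit Python loop. This is only a sketch on a made-up 20 x 3 table (target assumed to be the last column), using `sliding_window_view`, which needs NumPy 1.20 or newer:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

lag = 14
# Hypothetical toy table: 20 rows, 3 columns, target in the last column
values = np.arange(60, dtype=float).reshape(20, 3)

# Windows along the row axis; result has shape (n - lag + 1, n_columns, lag)
windows = sliding_window_view(values, lag, axis=0)
# Put the time axis in the middle and drop the last window,
# which has no following row to serve as its label
X = windows.transpose(0, 2, 1)[:-1]
Y = values[lag:, -1]

print(X.shape, Y.shape)  # (6, 14, 3) (6,)
```

The result is a read-only view on the original array, so copy it (for example with `np.array(X)`) if you need to modify the windows.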