
I've created an LSTM model to predict 1 output value from 8 features. My loss constantly decreases and my val loss also decreases from the start, however it begins to increase after so many epochs. Here's a picture of what's going on.

[Plot: training loss decreases steadily over the epochs, while validation loss decreases at first and then starts to increase.]

Also here is my code:

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam

file = r'/content/drive/MyDrive/only force/only_force_pt1.csv'

df = pd.read_csv(file)

df.head()

X = df.iloc[:,1:9]
y = df.iloc[:,9]
#X.head()
print(type(X))

WIN_LEN = 5

def window_size(size, inputdata, targetdata):
  # slide a window of `size` rows over the features; the row right after
  # each window is the corresponding target
  X = []
  y = []
  i = 0
  while (i + size) <= len(inputdata) - 1:
    X.append(inputdata.iloc[i: i + size])
    y.append(targetdata.iloc[i + size])
    i += 1
  assert len(X) == len(y)
  return (X, y)

X_series, y_series = window_size(WIN_LEN, X, y)

data_split = int(len(X_series)*0.8)
X_train, X_test = X_series[:data_split], X_series[data_split:]
y_train, y_test = y_series[:data_split], y_series[data_split:]

X_train = np.array(X_train)
X_test = np.array(X_test)
y_train = np.array(y_train)
y_test = np.array(y_test)

n_timesteps, n_features, n_outputs = X_train.shape[1], X_train.shape[2], 1

[verbose, epochs, batch_size] = [1, 500, 32]

input_shape = (n_timesteps, n_features)

model = Sequential()
model.add(LSTM(64,input_shape = input_shape,return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(32, activation = 'relu', kernel_regularizer=keras.regularizers.l2(0.001)))
model.add(Dropout(0.2))
model.add(Dense(32, activation = 'relu', kernel_regularizer=keras.regularizers.l2(0.001)))
model.add(Dense(1))

earlystopper = EarlyStopping(monitor='val_loss', min_delta=0, patience = 60, verbose =1, mode = 'auto')

model.summary()

model.compile(loss = 'mse', optimizer = Adam(learning_rate = 0.00005), metrics = [tf.keras.metrics.RootMeanSquaredError()])

history = model.fit(X_train, y_train, batch_size = batch_size, epochs=epochs, verbose=verbose, validation_data=(X_test, y_test), callbacks = [earlystopper],shuffle = True)

I do get much better results when I use train_test_split and shuffle my training and testing data, but that leads to major overfitting problems. Also, since I'm working with time series data, I don't want to shuffle anyway.

Does anyone have any suggestions?

  • it's called overfitting, and there are tons of things you can do about it –  Aug 29 '22 at 11:34

2 Answers


The point where the validation loss starts to grow is where the model starts to overfit, i.e. it is memorizing your training data and getting worse at generalizing to new data.

There are multiple things that can be done to combat overfitting, among them:

  • early stopping, i.e. checking when the validation loss starts to increase and stopping training / restoring the best-fitting model. You actually have that in your code, but a patience of 60 seems way too high imho.
  • adding dropout layers, which will randomly remove certain features by setting them to zero. Again, there is dropout in your code, so you may want to increase the parameter to 0.5 for the dropout after the dense layers. Also check this post for a discussion on where to place the dropout layers. You could try removing the one after the LSTM and placing one after the second dense layer.
  • adding regularization, i.e. penalizing large weights in the loss function. Your code also has L2 regularization, so you might try increasing the parameter to 0.01.
  • reducing the batch size may help, since larger batch sizes tend to have a negative impact on generalization (see the sketch after this list for how these tweaks could be combined).
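
To illustrate, here is a minimal sketch of how these suggestions could look when applied to your model. It reuses the imports and variables from your code; the exact numbers (patience of 10, dropout of 0.5, L2 of 0.01, batch size of 16) are just starting points to tune, not definitive settings:

model = Sequential()
model.add(LSTM(64, input_shape = input_shape, return_sequences = False))
model.add(Dense(32, activation = 'relu', kernel_regularizer = keras.regularizers.l2(0.01)))
model.add(Dense(32, activation = 'relu', kernel_regularizer = keras.regularizers.l2(0.01)))
model.add(Dropout(0.5))   # single, stronger dropout after the second dense layer
model.add(Dense(1))

# stop much earlier once val_loss stops improving and keep the best weights
earlystopper = EarlyStopping(monitor = 'val_loss', patience = 10, restore_best_weights = True, verbose = 1)

model.compile(loss = 'mse', optimizer = Adam(learning_rate = 0.00005), metrics = [tf.keras.metrics.RootMeanSquaredError()])

history = model.fit(X_train, y_train, batch_size = 16, epochs = 500, verbose = 1, validation_data = (X_test, y_test), callbacks = [earlystopper])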

There is more you can play with, like modifying your learning rate, but I suggest you familiarize yourself with overfitting in general and with overfitting in LSTMs in particular.
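
If you do want to experiment with the learning rate, one common option is to lower it automatically once the validation loss plateaus. A minimal sketch using Keras' ReduceLROnPlateau callback (the factor and patience values here are just assumptions to tune):

# reduce the learning rate when val_loss stops improving
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor = 'val_loss', factor = 0.5, patience = 5, min_lr = 1e-6)
# then pass it along with early stopping: callbacks = [earlystopper, reduce_lr]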

buddemat
  • I implemented your suggestions and it resulted in much better looking loss curves, thank you for the suggestions! Loss and val loss pretty much followed each other until early stopping occurred (after 15 epochs). However, the losses tend to plateau at a certain value, around 0.014 on a scale from 0 to 1, and the model doesn't perform very well on either training or testing data. Training and testing values are pretty much neck and neck and both RMSE values are close, so I feel like the overfitting problem has settled down. But do you have suggestions for getting the loss down in general? – ahy Sep 04 '22 at 04:59
  • Glad to be of help. If you found my answer useful, feel free to accept and/or upvote it. Concerning your follow-up question, it is often the case that the loss will plateau or even rise again at some point. How is your accuracy developing in parallel? In general, it's difficult to give recommendations on what to try without knowing your data and problem better... – buddemat Sep 05 '22 at 21:38
  • It's a regression model so I'm not measuring accuracy. Here's the link to the data I'm using. You really only need to look at the files that include _test in the names: https://drive.google.com/drive/folders/1HaFk8dURQRUXmZ2J3kXCcujKKcEqh2pJ?usp=sharing. This data is not all continuous though. The data I had collected had correlation issues because I was using a force plate that only collected data at certain points in the whole trial. So, the data I'm using is spliced up at the points where the force plate collected the data. – ahy Sep 05 '22 at 23:32

Overfitting in recurrent NNs has been a unique issue for a while now. Aside from what's already been posted, the simplest thing you can try is to increase the dropout of the LSTM layer to closer to 0.5. (https://arxiv.org/abs/1512.05287, https://arxiv.org/abs/1409.2329)
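
For reference, Keras lets you apply dropout both to the LSTM inputs and to the recurrent connections (the latter is what the variational-dropout paper above proposes). A minimal sketch with example rates, reusing the input_shape from the question:

model.add(LSTM(64, input_shape = input_shape,
               dropout = 0.5,            # dropout on the layer inputs
               recurrent_dropout = 0.5,  # dropout on the recurrent state
               return_sequences = False))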

It looks like you are working with time series, so you can also try to split the data in a rolling-origin fashion; scikit-learn has an implementation of it here. And I think not shuffling is the right idea. Good luck!
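
A minimal sketch of what such a rolling-origin split could look like with scikit-learn's TimeSeriesSplit (assuming X_series and y_series from the question have been converted to NumPy arrays):

from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits = 5)  # each fold trains on the past and validates on the future
for train_idx, val_idx in tscv.split(X_series):
    X_tr, X_val = X_series[train_idx], X_series[val_idx]
    y_tr, y_val = y_series[train_idx], y_series[val_idx]
    # fit and evaluate the model on each fold here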

cLwill