I'm implementing a neural network with Keras, but the Sequential model returns nan as loss value. I have sigmoid activation function in the output layer to squeeze output between 0 and 1, but maybe doesn't work properly.

This is the code:

import tensorflow as tf
from tensorflow import keras

def data_generator(batch_count, training_dataset, training_dataset_labels):
  while True:
    start_range = 0
    for batch in batch_count:  # each batch is a (group_id, size) pair
      end_range = (start_range + batch[1])
      batch_dataset = training_dataset[start_range:end_range]
      batch_labels = training_dataset_labels[start_range:end_range]
      start_range = end_range
      yield batch_dataset, batch_labels  # yield the labels, not the dataset twice

mlp = keras.models.Sequential()

# add input layer
mlp.add(
    keras.layers.Input(
        shape = (training_dataset.shape[1], )
    )
)
# add hidden layer (the input shape is already defined by the Input layer above)
mlp.add(
    keras.layers.Dense(
        units=training_dataset.shape[1] + 10,
        kernel_initializer='random_uniform',
        bias_initializer='zeros',
        activation='relu')
    )
# add output layer
mlp.add(
    keras.layers.Dense(
        units=1,
        kernel_initializer='glorot_uniform',
        bias_initializer='zeros',
        activation='sigmoid')
    )

print('Compiling model...\n')

mlp.compile(
    optimizer='adam',
    loss=listnet_loss
)

mlp.summary() # print model settings


# Training
with tf.device('/GPU:0'):
  print('Start training')
  #mlp.fit(training_dataset, training_dataset_labels, epochs=50, verbose=2, batch_size=3, workers=10)
  mlp.fit_generator(data_generator(groups_id_count, training_dataset, training_dataset_labels),
                    steps_per_epoch=len(groups_id_count),  # one step per group-batch, not per sample
                    epochs=50, verbose=2, workers=10, use_multiprocessing=True)

How can I fix this?

pairon
  • have you checked for nan in your data set? – Lucas Morin Feb 19 '20 at 13:24
  • For how many epochs did you train and see? – Sharan Feb 19 '20 at 13:33
  • @lcrmorin I'm pretty sure that my dataset doesn't contain nan elements. However, I noticed that the loss turned to nan when I changed the training method: I was using plain fit and the loss wasn't nan; now I'm using fit_generator and it is. – pairon Feb 19 '20 at 14:01
  • @Sharan for 10 epochs. – pairon Feb 19 '20 at 14:01
  • @Sharan @lcrmorin, another thing I notice is that training goes slower with `fit_generator()` than with `fit()`. The batch size with `fit()` was 3. – pairon Feb 19 '20 at 15:01
  • @Sharan @lcrmorin Maybe I solved it by using a generator extending `Sequence`. However, I have small batches, so the `ETA` in the Keras progress bar indicates that one epoch takes 25 minutes. Is that because of the batch size? – pairon Feb 19 '20 at 17:08
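
For reference, a minimal sketch of the `Sequence`-based generator mentioned in the last comment, assuming the `training_dataset` / `training_dataset_labels` arrays from the question; the class name and batch size are illustrative, not from the original post:

import math
from tensorflow import keras

class RankingSequence(keras.utils.Sequence):
    # Serves (features, labels) batches to fit(); safe with multiprocessing
    def __init__(self, data, labels, batch_size):
        self.data = data
        self.labels = labels
        self.batch_size = batch_size

    def __len__(self):
        # number of batches per epoch
        return math.ceil(len(self.data) / self.batch_size)

    def __getitem__(self, idx):
        start = idx * self.batch_size
        end = start + self.batch_size
        return self.data[start:end], self.labels[start:end]

# hypothetical usage:
# mlp.fit(RankingSequence(training_dataset, training_dataset_labels, batch_size=32), epochs=50)

A larger batch size here should also shorten the per-epoch ETA mentioned in the comment, since fewer steps are needed per epoch.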

3 Answers


To sum up the different solutions, from both Stack Overflow and GitHub, which will of course depend on your particular situation:

  • Check the validity of your inputs (no NaNs, and sometimes no 0s), e.g. `df.isnull().any()` (see the sketch after this list).
    • Some float encoders (e.g. `StandardScaler`) allow the use of NaN.
  • Add regularization to apply l1 or l2 penalties to the weights. Otherwise, try a smaller l2 penalty, e.g. `l2(0.001)`, or remove it if it already exists.
  • Try a smaller dropout rate.
  • Clip the gradients to prevent them from exploding. For instance, in Keras you can pass `clipnorm=1.` or `clipvalue=1.` as parameters to your optimizer (also shown in the sketch below).
  • Replace the optimizer with Adam, which is easier to handle. Sometimes replacing sgd with rmsprop also helps.
  • Use RMSProp with heavy regularization to prevent gradient explosion.
  • Try normalizing your data, or inspect your normalization process for any bad values it introduces.
  • Verify that you are using the right activation function (e.g. softmax instead of sigmoid for multi-class classification).
  • Try increasing the batch size (e.g. from 32 to 64 or 128) to increase the stability of your optimization.
  • Check the size of your last batch, which may differ from the batch size.
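
A short sketch combining the input check and the gradient-clipping suggestion from the list above; the data, shapes, and model here are placeholder stand-ins, not the question's actual setup:

import numpy as np
from tensorflow import keras

# Hypothetical stand-ins for the real features and labels
X = np.random.rand(100, 20).astype('float32')
y = np.random.randint(0, 2, size=(100, 1)).astype('float32')

# 1. Check the validity of the inputs before training
assert not np.isnan(X).any(), 'NaNs found in the features'
assert not np.isnan(y).any(), 'NaNs found in the labels'

# 2. Clip gradients in the optimizer (clipnorm or clipvalue both work)
optimizer = keras.optimizers.Adam(clipnorm=1.0)

model = keras.models.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(20,)),
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer=optimizer, loss='binary_crossentropy')
model.fit(X, y, batch_size=32, epochs=1, verbose=0)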
Othmane
  • This is what I got for the first 3 epochs after I replaced relu with tanh (high loss!): Epoch 1/10 1/1 - 9s - loss: 91189.1953 Epoch 2/10 1/1 - 0s - loss: 91176.1953 Epoch 3/10 1/1 - 0s - loss: 91164.1172 ... When I deleted the 0s and 1s from each row, the results got better (loss around 0.9). But deleting those values is not a good idea, since they represent the off and on states of switches. Any idea about that, please? – Avv Jul 09 '21 at 03:32
  • Thank you very much! It works, but should I add such a regularizer to every layer, given that I have an LSTM autoencoder structure? I added it to every layer and the loss is still around 0.9 for my model. I don't know why that is. – Avv Jul 09 '21 at 03:37
  • If batch size fixes your problem, you may have a naive normalization function that doesn't account for zero-division when there is 0 variance in a batch. `z = (value - mean) / (std + 1E-7)` or any other small value should actually fix the root cause, whereas changing the batch size just makes it less likely to occur. +1 for this being the most comprehensive answer out of about a dozen of these questions. So many answers amount to "I changed X to Y and it worked!", which 9/10 times doesn't address the real problem (which could be any of these and more). – Brendano257 Jul 20 '21 at 17:00
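
A sketch of the epsilon-guarded normalization described in the comment above; the function name is invented for illustration:

import numpy as np

def safe_standardize(batch, eps=1e-7):
    # Standardize a batch without dividing by zero when a feature has 0 variance
    mean = batch.mean(axis=0)
    std = batch.std(axis=0)
    return (batch - mean) / (std + eps)

# A constant batch has zero variance; without eps this would produce NaNs
constant_batch = np.ones((4, 3))
print(safe_standardize(constant_batch))  # all zeros, no NaN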

A similar problem was reported here: Loss being outputed as nan in keras RNN. In that case, there were exploding gradients due to incorrect normalisation of values.
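
Relatedly, Keras ships a `TerminateOnNaN` callback that stops training the moment the loss becomes NaN, which makes the offending batch easier to isolate; a minimal sketch with placeholder data:

import numpy as np
from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Dense(1, activation='sigmoid', input_shape=(4,)),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

X = np.random.rand(16, 4).astype('float32')
y = np.random.randint(0, 2, size=(16, 1)).astype('float32')

# Stop immediately on a NaN loss instead of wasting the remaining epochs
model.fit(X, y, epochs=2, verbose=0,
          callbacks=[keras.callbacks.TerminateOnNaN()])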

Vincent Yong

If you found this via Google and are using keras.preprocessing.sequence.pad_sequences to pad sequences for training RNNs:

Make sure that keras.preprocessing.sequence.pad_sequences() is not called with the argument value=None, but with value=0.0 or some other number that does not occur in your normal data.
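
A minimal sketch of that advice; the sequences here are invented for illustration:

from tensorflow.keras.preprocessing.sequence import pad_sequences

sequences = [[1, 2, 3], [4, 5], [6]]

# An explicit numeric padding value avoids accidentally passing value=None;
# pick a value that never occurs in the real data
padded = pad_sequences(sequences, maxlen=4, padding='post', value=0.0)
print(padded)
# [[1 2 3 0]
#  [4 5 0 0]
#  [6 0 0 0]]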