I'm implementing a neural network with Keras, but the Sequential model returns nan as loss value. I have sigmoid activation function in the output layer to squeeze output between 0 and 1, but maybe doesn't work properly.

This is the code:

import tensorflow as tf
from tensorflow import keras

def data_generator(batch_count, training_dataset, training_dataset_labels):
  while True:
    start_range = 0
    for batch in batch_count:  # each batch is a (group_id, size) pair
      end_range = (start_range + batch[1])
      batch_dataset = training_dataset[start_range:end_range]
      batch_labels = training_dataset_labels[start_range:end_range]
      start_range = end_range
      yield batch_dataset, batch_labels  # yield the labels, not the dataset twice

mlp = keras.models.Sequential()

# add input layer
mlp.add(
    keras.layers.Input(
        shape = (training_dataset.shape[1], )
    )
)
# add hidden layer (the input shape is already defined by the Input layer above)
mlp.add(
    keras.layers.Dense(
        units=training_dataset.shape[1] + 10,
        kernel_initializer='random_uniform',
        bias_initializer='zeros',
        activation='relu')
    )
# add output layer
mlp.add(
    keras.layers.Dense(
        units=1,
        kernel_initializer='glorot_uniform',
        bias_initializer='zeros',
        activation='sigmoid')
    )

print('Compiling model...\n')

mlp.compile(
    optimizer='adam',
    loss=listnet_loss
)

mlp.summary() # print model settings


# Training
with tf.device('/GPU:0'):
  print('Start training')
  #mlp.fit(training_dataset, training_dataset_labels, epochs=50, verbose=2, batch_size=3, workers=10)
  mlp.fit_generator(data_generator(groups_id_count, training_dataset, training_dataset_labels),
                    steps_per_epoch=len(groups_id_count),  # one step per group-batch, not per sample
                    epochs=50, verbose=2, workers=10, use_multiprocessing=True)

How can I fix this?

pairon
  • have you checked for nan in your data set? – Lucas Morin Feb 19 '20 at 13:24
  • For how many epochs did you train and see? – Sharan Feb 19 '20 at 13:33
  • @lcrmorin I'm pretty sure that my dataset doesn't contain nan elements. However, I noticed that the loss turned to nan when I changed the training method: I was using plain fit and the loss wasn't nan; now I'm using fit_generator and it is. – pairon Feb 19 '20 at 14:01
  • @Sharan for 10 epochs. – pairon Feb 19 '20 at 14:01
  • @Sharan @lcrmorin, another thing I notice is that training goes slower with `fit_generator()` than with `fit()`. The batch size with `fit()` was 3. – pairon Feb 19 '20 at 15:01
  • @Sharan @lcrmorin Maybe I solved it by using a generator extending `Sequence`. However, I have small batches, so the `ETA` in the Keras progress bar indicates that one epoch takes 25 minutes. Is that because of the batch size? – pairon Feb 19 '20 at 17:08
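
For reference, a minimal sketch of the `Sequence`-based generator mentioned in the last comment, assuming the `training_dataset` / `training_dataset_labels` arrays from the question; the class name and batch size are illustrative, not from the original post:

import math
from tensorflow import keras

class RankingSequence(keras.utils.Sequence):
    # Serves (features, labels) batches to fit(); safe with multiprocessing
    def __init__(self, data, labels, batch_size):
        self.data = data
        self.labels = labels
        self.batch_size = batch_size

    def __len__(self):
        # number of batches per epoch
        return math.ceil(len(self.data) / self.batch_size)

    def __getitem__(self, idx):
        start = idx * self.batch_size
        end = start + self.batch_size
        return self.data[start:end], self.labels[start:end]

# hypothetical usage:
# mlp.fit(RankingSequence(training_dataset, training_dataset_labels, batch_size=32), epochs=50)

A larger batch size here should also shorten the per-epoch ETA mentioned in the comment, since fewer steps are needed per epoch.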

3 Answers


To sum up the different solutions, from both Stack Overflow and GitHub, which will of course depend on your particular situation:

  • Check the validity of your inputs (no NaNs, and sometimes no 0s), e.g. `df.isnull().any()` (see the sketch after this list).
    • Some float encoders (e.g. `StandardScaler`) allow the use of NaN.
  • Add regularization to apply l1 or l2 penalties to the weights. Otherwise, try a smaller l2 penalty, e.g. `l2(0.001)`, or remove it if it already exists.
  • Try a smaller dropout rate.
  • Clip the gradients to prevent them from exploding. For instance, in Keras you can pass `clipnorm=1.` or `clipvalue=1.` as parameters to your optimizer (also shown in the sketch below).
  • Replace the optimizer with Adam, which is easier to handle. Sometimes replacing sgd with rmsprop also helps.
  • Use RMSProp with heavy regularization to prevent gradient explosion.
  • Try normalizing your data, or inspect your normalization process for any bad values it introduces.
  • Verify that you are using the right activation function (e.g. softmax instead of sigmoid for multi-class classification).
  • Try increasing the batch size (e.g. from 32 to 64 or 128) to increase the stability of your optimization.
  • Check the size of your last batch, which may differ from the batch size.
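
A short sketch combining the input check and the gradient-clipping suggestion from the list above; the data, shapes, and model here are placeholder stand-ins, not the question's actual setup:

import numpy as np
from tensorflow import keras

# Hypothetical stand-ins for the real features and labels
X = np.random.rand(100, 20).astype('float32')
y = np.random.randint(0, 2, size=(100, 1)).astype('float32')

# 1. Check the validity of the inputs before training
assert not np.isnan(X).any(), 'NaNs found in the features'
assert not np.isnan(y).any(), 'NaNs found in the labels'

# 2. Clip gradients in the optimizer (clipnorm or clipvalue both work)
optimizer = keras.optimizers.Adam(clipnorm=1.0)

model = keras.models.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(20,)),
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer=optimizer, loss='binary_crossentropy')
model.fit(X, y, batch_size=32, epochs=1, verbose=0)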
Othmane
  • This is what I got for the first 3 epochs after I replaced relu with tanh (high loss!): Epoch 1/10 1/1 - 9s - loss: 91189.1953 Epoch 2/10 1/1 - 0s - loss: 91176.1953 Epoch 3/10 1/1 - 0s - loss: 91164.1172 ... When I deleted the 0s and 1s from each row, the results got better (loss around 0.9). But deleting those values is not a good idea, since they represent the off and on states of switches. Any idea about that, please? – Avv Jul 09 '21 at 03:32
  • Thank you very much! It works, but should I add such a regularizer to every layer, given that I have an LSTM autoencoder structure? I added it to every layer and the loss is still around 0.9 for my model. I don't know why that is. – Avv Jul 09 '21 at 03:37
  • If batch size fixes your problem, you may have a naive normalization function that doesn't account for zero-division when there is 0 variance in a batch. `z = (value - mean) / (std + 1E-7)` or any other small value should actually fix the root cause, whereas changing the batch size just makes it less likely to occur. +1 for this being the most comprehensive answer out of about a dozen of these questions. So many answers amount to "I changed X to Y and it worked!", which 9/10 times doesn't address the real problem (which could be any of these and more). – Brendano257 Jul 20 '21 at 17:00
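
A sketch of the epsilon-guarded normalization described in the comment above; the function name is invented for illustration:

import numpy as np

def safe_standardize(batch, eps=1e-7):
    # Standardize a batch without dividing by zero when a feature has 0 variance
    mean = batch.mean(axis=0)
    std = batch.std(axis=0)
    return (batch - mean) / (std + eps)

# A constant batch has zero variance; without eps this would produce NaNs
constant_batch = np.ones((4, 3))
print(safe_standardize(constant_batch))  # all zeros, no NaN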

A similar problem was reported here: Loss being outputed as nan in keras RNN. In that case, there were exploding gradients due to incorrect normalisation of values.
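
Relatedly, Keras ships a `TerminateOnNaN` callback that stops training the moment the loss becomes NaN, which makes the offending batch easier to isolate; a minimal sketch with placeholder data:

import numpy as np
from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Dense(1, activation='sigmoid', input_shape=(4,)),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

X = np.random.rand(16, 4).astype('float32')
y = np.random.randint(0, 2, size=(16, 1)).astype('float32')

# Stop immediately on a NaN loss instead of wasting the remaining epochs
model.fit(X, y, epochs=2, verbose=0,
          callbacks=[keras.callbacks.TerminateOnNaN()])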

Vincent Yong

If you found this via Google and are using keras.preprocessing.sequence.pad_sequences to pad sequences for training RNNs:

Make sure that keras.preprocessing.sequence.pad_sequences() is not called with the argument value=None, but with value=0.0 or some other number that does not occur in your normal data.
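
A minimal sketch of that advice; the sequences here are invented for illustration:

from tensorflow.keras.preprocessing.sequence import pad_sequences

sequences = [[1, 2, 3], [4, 5], [6]]

# An explicit numeric padding value avoids accidentally passing value=None;
# pick a value that never occurs in the real data
padded = pad_sequences(sequences, maxlen=4, padding='post', value=0.0)
print(padded)
# [[1 2 3 0]
#  [4 5 0 0]
#  [6 0 0 0]]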