Why does my training loss oscillate while training the final layer of AlexNet with pre-trained weights?

Question

I am working on texture classification and based on previous works, I am trying to modify the final layer of AlexNET to have 20 classes, and train only that layer for my multi class classification problem. I am using Tensorflow-GPU on an NVIDIA GTX 1080, Python3.6 on Ubuntu 16.04. I am using the Gradient Descent Optimiser and the class Estimator to build this. I am also using two dropout layers for regularization. Therefore, my hyper parameters are the learning rate, batch_size, and weight_decay. I have tried using batch_size of 50,100,200,weight_decays of 0.005 and 0.0005, and learning rates of 1e-3,1e-4,and 1e-5. All the training loss curves for the above values follow similar trends.

My training loss curve does not monotonically decrease and instead seems to oscillate. I have provided a tensorboard visualization for learning rate=1e-5, weight decay=0.0005, and batch_size=200.

Please assist in understanding what went wrong and how I could possibly rectify it. The Tensorboard Visualization for the case I specified

  # Create the Estimator
  classifier = tf.estimator.Estimator(model_fn=cnn_model)
  # Set up logging for predictions
  tensors_to_log = {"probabilities": "softmax_tensor"}
  logging_hook = tf.train.LoggingTensorHook(tensors=tensors_to_log, every_n_iter=10)
 # Train the model
  train_input_fn = tf.estimator.inputs.numpy_input_fn(x={"x": train_data},y=train_labels,batch_size=batch_size,num_epochs=None,shuffle=True)
  classifier.train(input_fn=train_input_fn, steps=200000, hooks=[logging_hook])
  # Evaluate the model and print results
  eval_input_fn = tf.estimator.inputs.numpy_input_fn(
x={"x": eval_data},
  y=eval_labels,
  num_epochs=1,
  shuffle=False)
  eval_results = classifier.evaluate(input_fn=eval_input_fn)
  print(eval_results)

#Sections of the cnn_model
 #Output Config
 predictions = { "classes": tf.argmax(input=logits, axis=1),# Generate predictions (for PREDICT and EVAL mode)
"probabilities": tf.nn.softmax(logits, name="softmax_tensor")}  # Add `softmax_tensor` to the graph. It is used for PREDICT and by the `logging_hook`.
if mode == tf.estimator.ModeKeys.PREDICT:
  return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

 # Calculate Loss (for both TRAIN and EVAL modes)
  onehot_labels = tf.one_hot(indices=tf.cast(labels,tf.int32),depth=20)
  loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)


#Training Config
  if mode == tf.estimator.ModeKeys.TRAIN:
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
    tf.summary.scalar('training_loss',loss)
    summary_hook = tf.train.SummarySaverHook(save_steps=10,output_dir='outputs',summary_op=tf.summary.merge_all())
    train_op = optimizer.minimize(loss=loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op,training_hooks=[summary_hook])



  # Evaluation Metric- Accuracy
  eval_metric_ops = {"accuracy": tf.metrics.accuracy(labels=labels, predictions=predictions["classes"])}
   print(time.time()-t)
   tf.summary.scalar('eval_loss',loss)
   ac=tf.metrics.accuracy(labels=labels,predictions=predictions["classes"])
   tf.summary.scalar('eval_accuracy',ac)
   evaluation_hook= tf.train.SummarySaverHook(save_steps=10,output_dir='outputseval',summary_op=tf.summary.merge_all())
  return tf.estimator.EstimatorSpec(mode=mode, loss=loss, eval_metric_ops=eval_metric_ops,evaluation_hooks=[evaluation_hook])

Use Adam optimizer and don't set the decay in your first epochs. Moreover, oscillating is normal for batch approaches. Let me know what happens. — Green Falcon, Feb 15 '18 at 10:45
Hey. Thanks for the advice. Changing pre-processing helped as did switching to Adam. I am still trying to create a more effective grid search for the model's hyper parameters, which I hope will help train the network. — RSSharma, Feb 22 '18 at 06:10
So, did that do the job? oscillating is normal in batch approaches. — Green Falcon, Feb 22 '18 at 14:03
Sorry, it doesn't work still. I am getting the oscillating behavior, and what's more is that the loss increases or stays constant. It just doesn't go down. — RSSharma, Feb 23 '18 at 09:05
I am retraining the final layer of AlexNET for my problem as per a previous work. The difference is in our data-sets(they use microscopic image inputs, and I used scanner image inputs), and dataset size(mine is around 40000 without data-augmentation, theirs is 30,000 without data augmentation, and around 1.5 million with data augmentation), and thus I believe only my hyperparameters would be different,right? — RSSharma, Feb 24 '18 at 08:45
Have you tried to fine tune the learning rate? Moreover, I guess you have to try to use transform learning for the previous layers as well based on your description of your data-set. — Green Falcon, Feb 24 '18 at 19:27
I tried decreasing the learning rate as you suggested. Will try fine-tuning all layers. But I fine-tuned last layer of VGGNet and it worked fine with Keras's pretrained model. — RSSharma, Feb 25 '18 at 14:20
It was just for getting better performance. Did you solve your problem? — Green Falcon, Feb 26 '18 at 08:40
Thanks for the encouragement and advice. Not yet. I am still trying to find the right hyperparameters. Also, for AlexNET, the only preprocessing as per the original paper is mean subtraction and channel reversal. Is there something else that needs to be done? Have I made any pre-processing mistakes? — RSSharma, Feb 26 '18 at 23:03
I recommend you trying new models like `VGG16`. It may help :) — Green Falcon, Feb 27 '18 at 17:56

Why does my training loss oscillate while training the final layer of AlexNet with pre-trained weights?

0 Answers0