Overfitting in CNN

Question

I am training a VGG net on the STL-10 dataset.

I am getting a Top-5 validation accuracy of about 98% and a Top-1 validation accuracy of about 83%.

But both the Top-1 and Top-5 training accuracies are reaching 100%.

Does this mean that the network is overfitting, or not?

Code::

def conv2d(inp,name,kshape,s):
    with tf.variable_scope(name) as scope:
        kernel = get_weights('weights',shape=kshape)
        conv = tf.nn.conv2d(inp,kernel,[1,s,s,1],'SAME')
        bias = get_bias('biases',shape=kshape[3])
        preact = tf.nn.bias_add(conv,bias)
        convlayer = tf.nn.relu(preact,name=scope.name)
    return convlayer

def maxpool(inp,name,k,s):
    return tf.nn.max_pool(inp,ksize=[1,k,k,1],strides=[1,s,s,1],padding='SAME',name=name)

def loss(logits,labels):
    labels = tf.reshape(tf.cast(labels,tf.int64),[-1])
    #print labels.get_shape().as_list(),logits.get_shape().as_list()
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,logits=logits,name='cross_entropy_per_example')
    cross_entropy_mean = tf.reduce_mean(cross_entropy,name='cross_entropy')
    total_loss = tf.add(tf.reduce_sum(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)),cross_entropy_mean,name='total_loss')
    return total_loss

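def top_1_acc(logits,true_labels):
    # Index of the highest-scoring class for each example
    pred_labels = tf.argmax(logits,1)
    # Flatten labels to shape [N], as loss() does, so tf.equal compares element-wise
    true_labels = tf.reshape(tf.cast(true_labels,tf.int64),[-1])
    #print pred_labels.get_shape().as_list(),true_labels
    correct_pred = tf.cast(tf.equal(pred_labels, true_labels), tf.float32)
    # Mean of the 0/1 hits is the Top-1 accuracy
    accuracy = tf.reduce_mean(correct_pred)
    return accuracy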

with tf.device('/gpu:0'):
    conv1 = conv2d(feed_images,'conv1',[3,3,3,64],1)
    conv2 = conv2d(conv1,'conv2',[3,3,64,64],1)
    pool1 = maxpool(conv2,'pool1',2,2)
    #size = [N,48,48,64]
    conv3 = conv2d(pool1,'conv3',[3,3,64,128],1)
    conv4 = conv2d(conv3,'conv4',[3,3,128,128],1)
    pool2 = maxpool(conv4,'pool2',2,2)
    #size = [N,24,24,128]
    conv5 = conv2d(pool2,'conv5',[3,3,128,256],1)
    conv6 = conv2d(conv5,'conv6',[3,3,256,256],1)
    pool3 = maxpool(conv6,'pool3',2,2)
    #size = [N,12,12,256]
    conv7 = conv2d(pool3,'conv7',[3,3,256,512],1)
    conv8 = conv2d(conv7,'conv8',[3,3,512,512],1)
    pool4 = maxpool(conv8,'pool4',2,2)
    #size = [N,6,6,512]
    conv9 = conv2d(pool4,'conv9',[3,3,512,512],1)
    conv10 = conv2d(conv9,'conv10',[3,3,512,512],1)
    pool5 = maxpool(conv10,'pool5',2,2)
    #size = [N,3,3,512]
    flattened_pool5 = tf.contrib.layers.flatten(pool5)
    fc1 = tf.contrib.layers.fully_connected(flattened_pool5,1024,weights_regularizer=tf.contrib.layers.l2_regularizer(tf.constant(0.001, dtype=tf.float32)))
    dropout1 = tf.nn.dropout(fc1,keep_prob)
    fc2 = tf.contrib.layers.fully_connected(dropout1,1024,weights_regularizer=tf.contrib.layers.l2_regularizer(tf.constant(0.001, dtype=tf.float32)))
    dropout2 = tf.nn.dropout(fc2,keep_prob)
    logits = tf.contrib.layers.fully_connected(dropout2,10,activation_fn=None,weights_regularizer=tf.contrib.layers.l2_regularizer(tf.constant(0.001, dtype=tf.float32)))

    cost = loss(logits,feed_labels)

    opt_mom = tf.train.MomentumOptimizer(learning_rate=lr,momentum=0.9)
    opt = opt_mom.minimize(cost)

    acc = top_1_acc(logits,feed_labels)
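
The code above only shows the Top-1 metric, while the question also reports Top-5 accuracy. As a rough illustration of what a Top-k accuracy measures, here is a minimal sketch in plain Python. It is not the asker's implementation; the function name `top_k_accuracy` and the example scores are hypothetical. In TensorFlow 1.x, `tf.nn.in_top_k` performs a similar per-example check.

```python
def top_k_accuracy(scores, true_labels, k=5):
    """Fraction of examples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, true_labels):
        # Indices of the k largest scores in this row.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(true_labels)


# Hypothetical scores for 3 examples over 4 classes (illustration only).
scores = [
    [0.1, 0.6, 0.2, 0.1],  # true class 1: hit at k=1
    [0.5, 0.1, 0.3, 0.1],  # true class 2: missed at k=1, hit at k=2
    [0.4, 0.3, 0.2, 0.1],  # true class 3: missed at k=1 and k=2
]
labels = [1, 2, 3]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print("top-1:", top1, "top-2:", top2)
```

Because Top-k with a larger k counts more predictions as correct, a Top-5 score can stay high while Top-1 reveals the gap between training and validation.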

You have to test it using data that has not been used in your training set. — Green Falcon, Jul 08 '18 at 16:31
Yes I am using separate data for validation and training @Media — Siladittya, Jul 08 '18 at 16:31
83% versus 100% is a high-variance problem. You are overfitting. Try using dropout in your fully connected layers. — Green Falcon, Jul 08 '18 at 16:34
@Media I have used dropout of 50% and also data augmentation — Siladittya, Jul 08 '18 at 16:36
So, essentially, overfitting is measured using the Top-1 accuracy. I have tried all methods except batch normalization, but this difference between the two accuracy values still remains — Siladittya, Jul 08 '18 at 16:37
Try to decrease the number of parameters by reducing the number of filters and the number of nodes in your fully connected layers. Batch normalisation does not have much effect on overfitting. — Green Falcon, Jul 08 '18 at 16:38
Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/79897/discussion-between-siladittya-and-media). — Siladittya, Jul 08 '18 at 16:40
@Media as I was using VGG 13 architecture already, I decreased the number of nodes in FC layers from 4096 to 2048 and also decreased the number of filters in the 7th and 8th conv layers from 512 to 256. But I see that the Top-1 Validation Accuracy is not increasing above 72% but Top-1 Training accuracy has already crossed 95% — Siladittya, Jul 09 '18 at 04:18
@Media I tried decreasing the number of filters and nodes, both, but the difference between the Top-1 Training and Validation accuracy does not decrease — Siladittya, Jul 09 '18 at 05:33
Increase the dropout hyperparameter. Decrease 1024 to 512. Use the Adam optimiser and tell me again what happens. — Green Falcon, Jul 09 '18 at 07:21
@Media Top-1 training accuracy did not increase above 20% and validation accuracy did not increase above 10%. I used keep_prob 0.7 and also the Adam optimizer — Siladittya, Jul 09 '18 at 09:09
0.7 is too much. Set it to something like 0.55. Moreover, try changing the learning rate. This is somewhat peculiar behaviour. You should train it for at least a few hours. — Green Falcon, Jul 09 '18 at 10:15
keep_prob -> 0.55, Adam optimizer with starting learning rate = 0.0001, and FC layers with 512 nodes: working perfectly. Top-1 training and validation accuracy are 88% and 77% respectively, and Top-5 is 98% for both @Media — Siladittya, Jul 09 '18 at 11:05
Sorry, you have used a lot of sentence fragments. I didn't understand what you mean. — Green Falcon, Jul 09 '18 at 11:06
@Media I used keep_prob = 0.55; Adam optimizer with starting learning rate = 0.0001; and number of nodes in the last two fully connected layers = 512. After 47 epochs, I obtained Top-1 training and validation accuracy of 86% and 80% respectively, and Top-5 training and validation accuracy of 100% and 98% respectively — Siladittya, Jul 09 '18 at 11:20
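
To make the progression in the comments above easier to compare, the following sketch tabulates the Top-1 train/validation gap for each configuration that was reported. The numbers are taken from the thread and are approximate (for example, "crossed 95%" is recorded as 95); the run labels are my own shorthand, not the posters' wording.

```python
# (label, training %, validation %) Top-1 accuracies as reported in the thread.
# Labels are shorthand added for this sketch; values are approximate.
reported_runs = [
    ("original network", 100, 83),
    ("fewer filters, smaller FC layers", 95, 72),
    ("keep_prob=0.55, Adam, 512-node FC (first report)", 88, 77),
    ("keep_prob=0.55, Adam, 512-node FC (47 epochs)", 86, 80),
]

# Gap between training and validation accuracy, in percentage points.
gaps = [train - val for _, train, val in reported_runs]

for (name, train, val), gap in zip(reported_runs, gaps):
    print(f"{name:50s} train={train:3d}%  val={val:3d}%  gap={gap:2d}")

# Run with the smallest gap, i.e. the least evidence of overfitting.
smallest_gap_run = min(zip(gaps, reported_runs))[1][0]
print("smallest gap:", smallest_gap_run)
```

A smaller gap on its own is not the goal. The last configuration is notable because the gap shrank while validation accuracy stayed close to the original 83%.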

score 1 · Answer 1 · answered Jul 09 '18 at 11:38

Based on your accuracies, the $12 \%$ difference indicates a high-variance problem, which means you are overfitting. A VGG network has a very large number of parameters, and your dataset is moderate in size and much smaller than ImageNet, so overfitting is to be expected. Try to decrease the number of parameters at the bottlenecks of your model: the connections between the convolutional layers and the fully connected layers, and the connections among the fully connected layers themselves. Moreover, try using AdamOptimizer, which often works better. Also try training for more epochs.
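
To see where the parameters sit, here is a minimal sketch that counts weights and biases for the architecture posted in the question. It assumes 3x3 kernels, one bias per output channel, and a 3x3x512 tensor after `pool5` (as the size comments in the question indicate). The helper names `conv_params`, `fc_params`, and `fc_total` are hypothetical.

```python
def conv_params(k, c_in, c_out):
    """Weights plus biases of a k x k convolution layer."""
    return k * k * c_in * c_out + c_out


def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# (input channels, output channels) for conv1..conv10 in the question's code.
conv_channels = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256),
    (256, 512), (512, 512),
    (512, 512), (512, 512),
]
conv_total = sum(conv_params(3, c_in, c_out) for c_in, c_out in conv_channels)


def fc_total(hidden, flat=3 * 3 * 512, classes=10):
    """Parameters of the three FC layers for a given hidden width."""
    return (fc_params(flat, hidden)
            + fc_params(hidden, hidden)
            + fc_params(hidden, classes))


print("conv layers:      ", conv_total)
print("FC layers (1024): ", fc_total(1024))
print("FC layers (512):  ", fc_total(512))
```

Under these assumptions, halving the fully connected width from 1024 to 512 removes more than half of the fully connected parameters, with most of the saving coming from the first layer after flattening.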

Green Falcon


If I train for more epochs then the difference between the accuracies starts increasing, so I stopped training – Siladittya Jul 09 '18 at 11:42
It depends. By choosing an appropriate dropout alpha, it may not. – Green Falcon Jul 09 '18 at 11:44
I used 0.55 as you said. I will try a different alpha then, but using 0.55 I obtained 86% and 80% respectively – Siladittya Jul 09 '18 at 11:46
There should be a point where you stop the training phase. I meant that you should let your model train long enough. It's customary to use grid search for hyper-parameter tuning. – Green Falcon Jul 09 '18 at 11:48
Okay, I understand. Thank you for your help. I had not been able to use the Adam optimizer before today; I always ran into problems. – Siladittya Jul 09 '18 at 11:56
@Siladittya `tf.train.AdamOptimizer` – Green Falcon Jul 09 '18 at 12:01
Yeah, I know that, but I wanted to say, I always had problems using Adam Optimizer because I always had problems I mentioned in the comments – Siladittya Jul 10 '18 at 06:08
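
The exchange above touches on choosing a point at which to stop training. One common way to do this is patience-based early stopping on the validation accuracy. The sketch below is a generic illustration and not something either poster used; the function name `early_stopping_epoch`, the `patience` parameter, and the validation curve are hypothetical.

```python
def early_stopping_epoch(val_accuracies, patience=3):
    """Return the index of the best epoch seen before stopping.

    Training is considered stopped once validation accuracy has failed to
    improve for `patience` consecutive epochs.
    """
    best_epoch, best_acc, waited = 0, float("-inf"), 0
    for epoch, acc in enumerate(val_accuracies):
        if acc > best_acc:
            # New best: remember it and reset the patience counter.
            best_epoch, best_acc, waited = epoch, acc, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


# Hypothetical validation accuracy per epoch (illustration only).
val_curve = [0.60, 0.70, 0.76, 0.80, 0.79, 0.78, 0.79, 0.81]

stop_at = early_stopping_epoch(val_curve, patience=3)
print("best epoch with patience=3:", stop_at)
```

The choice of `patience` is a trade-off: a small value stops as soon as the validation curve plateaus, while a larger value tolerates short dips and may find a later improvement.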
