4

I have trained my ANN on the MNIST dataset. The hidden layer has 128 neurons and the input layer has 784 neurons. This gave me an accuracy of 94%. However, when I added one more hidden layer with 64 neurons, the accuracy dropped significantly, to 35%. What could be the reason behind this?

Edit: Activation function: sigmoid. 521 epochs.
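For reference, a minimal Keras sketch of the setup described above (784 inputs, a 128-neuron hidden layer, the newly added 64-neuron layer, sigmoid activations). The 10-unit softmax output, the SGD optimizer, the learning rate and the shortened epoch count are assumptions for illustration; the question does not specify them.

```python
# Minimal sketch of the architecture described above. The output layer,
# optimizer, learning rate and epoch count are assumptions for illustration.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation="sigmoid"),
    tf.keras.layers.Dense(64, activation="sigmoid"),   # the extra layer that hurt accuracy
    tf.keras.layers.Dense(10, activation="softmax"),   # assumed output layer
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))
```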

Pink

2 Answers

2

The reason is that by adding more layers you have added more trainable parameters to your model, so you have to train it for longer. Also consider that MNIST is a very easy-to-learn dataset: you can use two layers with far fewer neurons in each. Try $10$ neurons per layer to facilitate learning; you can still reach close to $100\%$ accuracy.
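A rough sketch of that suggestion, assuming a Keras implementation (the optimizer, loss and softmax output are illustrative choices the answer does not specify): two hidden layers of 10 neurons each, with `model.summary()` showing how much smaller the parameter count is than with the 128- and 64-neuron layers.

```python
# Sketch of the suggestion above: two small hidden layers of 10 neurons each.
# Keras, the optimizer, the loss and the softmax output are assumptions here.
import tensorflow as tf

small_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(10, activation="sigmoid"),
    tf.keras.layers.Dense(10, activation="sigmoid"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
small_model.compile(optimizer="adam",
                    loss="sparse_categorical_crossentropy",
                    metrics=["accuracy"])
small_model.summary()   # ~8k trainable parameters vs. ~109k for 784-128-64-10
```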

Green Falcon
0

The problem in your case (as I thought previously) is the sigmoid activation function. It suffers from many problems; of those, your performance decrease is likely due to two:

NOTE: The link provided for 'Vanishing Gradient' explains beautifully why adding layers makes your network more susceptible to saturation of learning.

The vanishing gradient problem ensures that your neural net gets trapped in a non-optimal solution, while a high learning rate ensures that it stays trapped there: after a few oscillations, the high learning rate pushes your network into saturation.
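To make the vanishing-gradient point concrete, here is a small NumPy sketch (the layer widths and weight scale are arbitrary assumptions made only for this illustration): because $\sigma'(z) \le 0.25$, the gradient that reaches the earlier layers shrinks multiplicatively with every extra sigmoid layer.

```python
# Rough numeric illustration of the vanishing-gradient point above.
# Layer widths and the weight scale are arbitrary assumptions for this sketch.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

widths = [784, 128, 64, 64, 10]          # input -> hidden layers -> output
weights = [rng.normal(scale=0.05, size=(m, n)) for m, n in zip(widths, widths[1:])]

# Forward pass, remembering sigmoid'(z) for each layer.
a, derivs = rng.normal(size=(1, 784)), []
for w in weights:
    a = sigmoid(a @ w)
    derivs.append(a * (1 - a))           # sigmoid'(z)

# Backward pass: start with a gradient of ones at the output.
g = np.ones_like(a)
for w, d in zip(reversed(weights), reversed(derivs)):
    g = (g * d) @ w.T                    # chain rule through one sigmoid layer
    print(f"mean |gradient| after backprop through one more layer: {np.abs(g).mean():.2e}")
```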

Solution:

  • The best solution is to use the ReLU activation function, with perhaps the last layer as sigmoid (a sketch follows this list).
  • Use an adaptive optimizer such as AdaGrad, Adam or RMSProp.
  • Alternatively, decrease the learning rate to $10^{-6}$ to $10^{-7}$, but compensate by increasing the number of epochs to $10^6$ to $10^7$.
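A sketch of the first two suggestions combined, assuming a Keras model (the layer sizes mirror the question; the softmax output and the default Adam settings are illustrative assumptions):

```python
# Sketch of the suggested fix: ReLU hidden layers plus an adaptive optimizer.
# Keras, the softmax output layer and the default Adam settings are assumptions.
import tensorflow as tf

relu_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # softmax assumed; the answer suggests sigmoid for the last layer
])
relu_model.compile(optimizer=tf.keras.optimizers.Adam(),  # or RMSprop / Adagrad
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
```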
DuttaA
    Has nothing to do with this, it's the size of the network compared to the amount of data available. – Matthieu Brucher Oct 28 '18 at 13:41
  • @MatthieuBrucher what exactly do you mean by size? – DuttaA Oct 28 '18 at 13:43
  • The size of the network (number of layers + number of nodes per layer). – Matthieu Brucher Oct 28 '18 at 13:43
  • @MatthieuBrucher Yes, adding a layer makes it more prone to poor learning via the vanishing gradient; check the vanishing-gradient link. I did not add it to my answer because the answer given was great, but I will indicate it in my answer. – DuttaA Oct 28 '18 at 13:44
  • I know what a vanishing gradient is... You don't know what the OP uses for the training, and the number of epochs is proof that you didn't read the question. – Matthieu Brucher Oct 28 '18 at 13:46
  • @MatthieuBrucher What the OP uses for the training? What do you mean by that? Also, what does the number of epochs indicate? I am at a loss here; can you explain your position more clearly? – DuttaA Oct 28 '18 at 13:48
  • The number of epochs is 521, not 1 million. And you don't know what kind of optimizer is used. If you need to know, ask as a comment first, same for learning rate. – Matthieu Brucher Oct 28 '18 at 13:50
  • @MatthieuBrucher I am pretty sure the learning rate is not of the order I provided. Second, I provided it as a general solution, not one localised to just the OP's problem; nitpicking to downvote is not something I would suggest. Also, the link tells why stacking layers might lead to bad learning; I do not think you checked the link. – DuttaA Oct 28 '18 at 13:53