2

I have tried to train a neural network on a simple x^2 function.

  1. I generated training data in Excel. The first column (X) is =RANDBETWEEN(-5,5), i.e. a random integer between -5 and 5.
  2. The second column simply squares the first column.
  3. The third column, my output 'y', is 0 or 1: 0 if the second column is less than 12.5, else 1.

I made 850 training examples and used the first column as 'X' and the third column as 'y'.
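
For reference, here is a minimal Python/NumPy sketch of equivalent data generation (illustrative only; it mirrors the Excel columns described above, not the asker's actual file):

```python
import numpy as np

rng = np.random.default_rng(0)

# Column 1: random integers between -5 and 5, like =RANDBETWEEN(-5,5)
X = rng.integers(-5, 6, size=850)

# Column 2: the square of column 1
X_squared = X ** 2

# Column 3: the label y is 1 if the square exceeds 12.5, else 0
y = (X_squared > 12.5).astype(int)
```

Worth noting: for integer inputs the square exceeds 12.5 only when |X| >= 4, i.e. for 4 of the 11 possible values, so a model that always predicts 0 already scores about 64%, which is suspiciously close to the reported 63%.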

However, I am only able to get a training accuracy of 63%!

I set input_layer to 1, tried between 5 and 35 hidden units, and tried regularization lambda from 0 to 2, but still get only 63% accuracy! Where could I have gone wrong?

My predict function is p = 1 if h2(i) > 0.5, else 0.
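
In Python terms, that thresholding step is just the following (a sketch; h2 stands for the sigmoid output of the final layer, as in the question):

```python
import numpy as np

def predict(h2, threshold=0.5):
    """Threshold sigmoid outputs into hard 0/1 class labels."""
    return (np.asarray(h2) > threshold).astype(int)

print(predict([0.1, 0.7, 0.5]))  # [0 1 0]
```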

Any help will be much appreciated! :-)

I also noticed that my neural network's output is 0.3XXX for all training examples... how is this possible??

Vin
  • What is the architecture of your neural network? How many layers, what types of activations, number of nodes, etc.? – Armen Aghajanyan Mar 14 '16 at 15:03
  • I used an input layer of 1 unit, one hidden layer of 15 units (tried up to 25 units), and an output layer of 1 unit. For activation I used the sigmoid function. – Vin Mar 14 '16 at 15:43
  • Is the sigmoid activation function applied to the output layer as well? – Armen Aghajanyan Mar 15 '16 at 00:20
  • Yes for the output layer as well – Vin Mar 15 '16 at 00:45
  • Thanks Armen. I will try without the sigmoid this evening. However, in my training data set the output y is not linear; I have converted y to 0 or 1 based on whether the linear output is greater than or less than 12.5. Don't you think it should work for such a case with the sigmoid function? – Vin Mar 15 '16 at 02:10
  • 1
    Can you post the code somewhere? Have you scaled the input data to [-1, 1]? How did you initialize the weights, and what learning rates did you try? Can you plot the learning curves? If they don't decrease, the learning rate might be too low; if they jump around a lot, it might be too high. You should definitely keep the sigmoid on the output too, don't remove it. (A minimal scaling sketch follows these comments.) – stmax Mar 15 '16 at 07:59
  • I'm voting to close this question as off-topic because we generally close questions as not useful to future readers if they were ultimately due to a typo or other local error – Sean Owen Mar 18 '16 at 13:20
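
For reference, a minimal sketch of the input scaling stmax suggests (Python/NumPy for illustration; the asker's actual pipeline is in Octave):

```python
import numpy as np

def scale_to_unit_range(X):
    """Linearly rescale features into the range [-1, 1]."""
    X = np.asarray(X, dtype=float)
    return 2 * (X - X.min()) / (X.max() - X.min()) - 1

print(scale_to_unit_range([-5, -2, 0, 3, 5]))  # [-1.  -0.4  0.   0.6  1. ]
```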

2 Answers

3

I re-implemented your set-up in Python using Keras. I used a hidden layer size of 25, and all my activations were sigmoids. I reached an accuracy of 99.88%. Try running your algorithm for a greater number of epochs, use binary cross entropy as the loss function, and try decreasing the learning rate of your gradient descent algorithm. This should help increase your accuracy. My only explanation for the poor performance is that you are getting stuck in a local minimum; if that is the case, different initializations of your weights should fix the problem.
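
A minimal sketch of that kind of set-up (assumptions: a current Keras API; the "adam" optimizer, epoch count, and batch size are illustrative choices, not Armen's exact code):

```python
import numpy as np
from tensorflow import keras

# Toy data matching the question's set-up
rng = np.random.default_rng(0)
X_raw = rng.integers(-5, 6, size=(850, 1)).astype(float)
y = (X_raw ** 2 > 12.5).astype(int)
X = X_raw / 5.0  # scale inputs to [-1, 1], per the comments above

# 1 input -> 25 sigmoid hidden units -> 1 sigmoid output
model = keras.Sequential([
    keras.layers.Input(shape=(1,)),
    keras.layers.Dense(25, activation="sigmoid"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Binary cross entropy as the loss, as recommended above
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Plenty of epochs; this toy problem converges quickly
model.fit(X, y, epochs=300, batch_size=32, verbose=0)
print(model.evaluate(X, y, verbose=0))  # [loss, accuracy]
```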

  • Thanks Armen! I implemented my code in Octave and I am not aware of Keras and epochs; I will try to explore these concepts. – Vin Mar 15 '16 at 08:26
  • @NeilSlater Done – Armen Aghajanyan Mar 15 '16 at 20:30
  • Please find my code here; I have still not been able to solve it in Octave. Main program @ http://pastebin.com/v1LCqYqT Cost function @ http://pastebin.com/TNQJ59dm Predict function @ http://pastebin.com/T51WABvk Sigmoid function @ http://pastebin.com/gvp2SPH9 Please see where I could have gone wrong. – Vin Mar 16 '16 at 16:13
1

Problem solved! There was a mistake in my cost formula... lambda was not multiplied by both theta components due to a missing bracket! I fixed that and things are working fine now. :-)
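
For anyone hitting the same bug, here is what that missing bracket looks like in NumPy terms (illustrative; the original code is Octave, and Theta1/Theta2 follow the usual Coursera-style naming, not necessarily Vin's exact variables):

```python
import numpy as np

m, lam = 850, 1.0            # number of examples, regularization strength
Theta1 = np.ones((15, 2))    # hidden-layer weights, bias term in column 0
Theta2 = np.ones((1, 16))    # output-layer weights, bias term in column 0

# Buggy: without the bracket, lambda/(2m) scales only the first sum
reg_buggy = lam / (2 * m) * np.sum(Theta1[:, 1:] ** 2) + np.sum(Theta2[:, 1:] ** 2)

# Fixed: the bracket makes lambda/(2m) scale both theta components
reg_fixed = lam / (2 * m) * (np.sum(Theta1[:, 1:] ** 2)
                             + np.sum(Theta2[:, 1:] ** 2))
```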

Vin