
I'm trying to implement gradient descent in Python, following Andrew Ng's course so I can follow the math. However, my implementation isn't working as I expected. It would be great if the community could help me identify my mistake.

When I increase the range from 3 to a higher number, it does not converge; instead the thetas swing from very positive to very negative and eventually become nan as they blow up.

Code is given below:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston

# Load the Boston housing data and add a constant column for the intercept term
boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)
X['theta0'] = 1
y = pd.DataFrame(boston.target, columns=['target'])

# One randomly initialised coefficient per feature (plus the intercept)
theta = pd.DataFrame(np.random.randn(X.shape[1]), columns=['target'], index=X.columns.values)

print('theta shape', theta.shape)
print('X shape', X.shape)
print('y shape', y.shape)
print(theta)

def predict(X, theta):
    return X.dot(theta)

mse_values = []
alpha = 0.01
for i in range(10000):
    error = predict(X, theta) - y
    # Batch gradient descent update: theta := theta - alpha * (1/m) * X^T * (X*theta - y)
    theta = theta - alpha * (1 / len(X)) * X.T.dot(error)
    mse = np.sum(error ** 2) / len(X)
    print('mse: ', mse.values)
    mse_values.append(mse)
    print('+' * 5)

plt.plot(mse_values)
plt.show()
Shoaibkhanz

2 Answers


I kept doubting my implementation, but the problem was the learning rate. After a lot of experimentation I found one that works, though I'm very surprised by how small it had to be, i.e. alpha = 0.000001.
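For reference, this is just the update loop from the question rerun with that much smaller step size (nothing else changed); the cost is printed every 1000 iterations so the decrease is visible:

import numpy as np
import pandas as pd
from sklearn.datasets import load_boston

boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)
X['theta0'] = 1
y = pd.DataFrame(boston.target, columns=['target'])
theta = pd.DataFrame(np.random.randn(X.shape[1]), columns=['target'], index=X.columns.values)

alpha = 0.000001  # the much smaller learning rate that made the loop converge
m = len(X)
for i in range(10000):
    error = X.dot(theta) - y
    theta = theta - alpha * (1 / m) * X.T.dot(error)
    if i % 1000 == 0:
        print(i, float((error.values ** 2).mean()))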

Shoaibkhanz

If you use the backtracking method (details in my answer at this link: Does gradient descent always converge to an optimum?), then you can avoid spending time manually searching for the "right" learning rate, as you had to do here.
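Not taken from that linked answer, just a minimal NumPy sketch of what backtracking line search looks like on the same least-squares cost; the Armijo constant and the shrink factor of 0.5 are my own choices:

import numpy as np
from sklearn.datasets import load_boston

boston = load_boston()
X = np.c_[boston.data, np.ones(len(boston.data))]  # append a column of ones for the intercept
y = boston.target
m = len(X)

def cost(theta):
    return np.sum((X @ theta - y) ** 2) / (2 * m)

def grad(theta):
    return X.T @ (X @ theta - y) / m

theta = np.random.randn(X.shape[1])
for i in range(1000):
    g = grad(theta)
    step = 1.0
    # Shrink the step until the Armijo sufficient-decrease condition holds,
    # so no learning rate has to be tuned by hand
    while cost(theta - step * g) > cost(theta) - 0.5 * step * np.dot(g, g):
        step *= 0.5
    theta = theta - step * g
    if i % 100 == 0:
        print(i, cost(theta), step)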

Tuyen