
I'm trying to implement gradient descent in Python, following Andrew Ng's course so I can follow the math. However, my implementation isn't working as I expected. It would be great if the community could help me identify my mistake.

When I increase the range from 3 to a higher number, it does not converge; instead the thetas swing from very positive to very negative and eventually become nan as they blow up.

Code is given below:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston

# Load the Boston housing data and add a constant column for the intercept term
boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)
X['theta0'] = 1
y = pd.DataFrame(boston.target, columns=['target'])

# One randomly initialised coefficient per feature (plus the intercept)
theta = pd.DataFrame(np.random.randn(X.shape[1]), columns=['target'], index=X.columns.values)

print('theta shape', theta.shape)
print('X shape', X.shape)
print('y shape', y.shape)
print(theta)

def predict(X, theta):
    return X.dot(theta)

mse_values = []
alpha = 0.01
for i in range(10000):
    error = predict(X, theta) - y
    # Batch gradient descent update: theta := theta - alpha * (1/m) * X^T * (X*theta - y)
    theta = theta - alpha * (1 / len(X)) * X.T.dot(error)
    mse = np.sum(error ** 2) / len(X)
    print('mse: ', mse.values)
    mse_values.append(mse)
    print('+' * 5)

plt.plot(mse_values)
plt.show()
Shoaibkhanz

2 Answers


I kept doubting my implementation, but the problem was the learning rate. After a lot of experimentation I found one that works, though I'm very surprised by how small it had to be, i.e. alpha = 0.000001.
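For reference, this is just the update loop from the question rerun with that much smaller step size (nothing else changed); the cost is printed every 1000 iterations so the decrease is visible:

import numpy as np
import pandas as pd
from sklearn.datasets import load_boston

boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)
X['theta0'] = 1
y = pd.DataFrame(boston.target, columns=['target'])
theta = pd.DataFrame(np.random.randn(X.shape[1]), columns=['target'], index=X.columns.values)

alpha = 0.000001  # the much smaller learning rate that made the loop converge
m = len(X)
for i in range(10000):
    error = X.dot(theta) - y
    theta = theta - alpha * (1 / m) * X.T.dot(error)
    if i % 1000 == 0:
        print(i, float((error.values ** 2).mean()))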

Shoaibkhanz

If you use the backtracking method (details in my answer at this link: Does gradient descent always converge to an optimum?), then you can avoid spending time manually searching for the "right" learning rate, as you had to do here.
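Not taken from that linked answer, just a minimal NumPy sketch of what backtracking line search looks like on the same least-squares cost; the Armijo constant and the shrink factor of 0.5 are my own choices:

import numpy as np
from sklearn.datasets import load_boston

boston = load_boston()
X = np.c_[boston.data, np.ones(len(boston.data))]  # append a column of ones for the intercept
y = boston.target
m = len(X)

def cost(theta):
    return np.sum((X @ theta - y) ** 2) / (2 * m)

def grad(theta):
    return X.T @ (X @ theta - y) / m

theta = np.random.randn(X.shape[1])
for i in range(1000):
    g = grad(theta)
    step = 1.0
    # Shrink the step until the Armijo sufficient-decrease condition holds,
    # so no learning rate has to be tuned by hand
    while cost(theta - step * g) > cost(theta) - 0.5 * step * np.dot(g, g):
        step *= 0.5
    theta = theta - step * g
    if i % 100 == 0:
        print(i, cost(theta), step)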

Tuyen