
I have been running a few ML models on the same data set for a binary classification problem with a class proportion of 33:67.

I used the same algorithms and the same set of hyperparameters in yesterday's and today's runs, yet the results differ.

Please note that I also set the random_state parameter in each estimator function, as shown below:

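import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

np.random.seed(42)
svm = SVC()  # I replace the estimator here for different algorithms
svm_cv = GridSearchCV(svm, op_param_grid, cv=10, scoring='f1')
svm_cv.fit(X_train_std, y_train)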

q1) Why do the results change even though I have random_state configured?

q2) Is there anything else that I should do to reproduce the same results every time I run?

The results that differ are shown below. Here auc-Y denotes yesterday's run.

[Image: results of yesterday's and today's runs]

desertnaut
The Great

1 Answer


Not every seed is the same.

Here is a function that sets all of the common seeds (Python, NumPy and PyTorch), which should make your runs reproducible:

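import os
import random

import numpy as np
import torch


def seed_everything(seed=42):
    """Seed the Python, NumPy and PyTorch random number generators."""
    random.seed(seed)
    # Note: this only affects subprocesses started afterwards; for the
    # current process PYTHONHASHSEED must be set before Python starts.
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True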

This requires importing random, os, numpy (as np) and torch.
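To illustrate why more than one seed is needed, here is a minimal sketch (not part of the original answer) showing that Python's built-in random module and NumPy keep separate generator states. The variable names are only illustrative.

```python
import random

import numpy as np

# Seeding Python's built-in generator makes its own draws repeatable...
random.seed(42)
py_first = random.random()
np_first = np.random.rand()   # NumPy's global generator was never seeded

random.seed(42)
py_second = random.random()
np_second = np.random.rand()  # continues NumPy's own, unseeded stream

print(py_first == py_second)  # the built-in generator repeats
print(np_first == np_second)  # NumPy was unaffected by random.seed
```

Each library that draws random numbers has to be seeded through its own interface, which is what the function above does.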

UPDATE: How to set a global random seed for sklearn models:

sklearn does not have its own global random seed; when an estimator's random_state is left as None, it uses NumPy's global random state. We can therefore set it globally with the same line used in the function above:

np.random.seed(seed)
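As an alternative to relying on the global seed, random_state can be passed explicitly to both the estimator and the cross-validation splitter. The following is only a sketch under assumptions: the synthetic data from make_classification, the RandomForestClassifier, the small parameter grid and the helper name run_search are placeholders, not the asker's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold


def run_search(seed=42):
    # Synthetic stand-in for the real training data (assumption),
    # with roughly the 67:33 class proportion from the question.
    X, y = make_classification(
        n_samples=200, n_features=8, weights=[0.67, 0.33], random_state=seed
    )
    # Fix the fold assignment as well as the estimator's internal randomness.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    model = RandomForestClassifier(n_estimators=20, random_state=seed)
    grid = GridSearchCV(model, {"max_depth": [2, 4]}, cv=cv, scoring="f1")
    grid.fit(X, y)
    return grid.best_score_, grid.best_params_


first = run_search()
second = run_search()
print(first == second)
```

Passing an integer random_state keeps each component independent of whatever else has drawn from NumPy's global state before it.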

Here is a small experiment with the scipy library; sklearn behaves analogously when it generates random numbers (usually weights):

import numpy as np
from scipy.stats import norm
print('Without seed')
print(norm.rvs(100, size = 5))
print(norm.rvs(100, size = 5))

print('With the same seed')
np.random.seed(42) 
print(norm.rvs(100, size = 5))
np.random.seed(42) # reset the random seed back to 42
print(norm.rvs(100, size = 5))

print('Without seed')
np.random.seed(None)
print(norm.rvs(100, size = 5))
print(norm.rvs(100, size = 5))

This outputs the following, confirming the behavior:

Without seed
[100.27042599 100.9258397  100.20903163  99.88255017  99.29165699]
[100.53127275 100.17750482  98.38604284 100.74109598 101.54287085]
With the same seed
**[101.36242188 101.13410818 102.36307449  99.74043318  98.83044407]**
**[101.36242188 101.13410818 102.36307449  99.74043318  98.83044407]**
Without seed
[101.2933838  100.52176902 101.38602156 100.72865231  99.02271004]
[100.19080241  99.11010957  99.51578106 101.56403284 100.37350788]
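One caveat with the global seed is that it only reproduces results if the same sequence of random draws happens after seeding. The sketch below (an illustration added here, with made-up variable names) shows that a single extra draw shifts everything that follows, whereas a dedicated np.random.RandomState object is not affected by draws from the global state.

```python
import numpy as np

# Same global seed, but an extra draw in between shifts the stream.
np.random.seed(42)
a = np.random.rand(3)

np.random.seed(42)
_ = np.random.rand(1)   # an unrelated draw advances the global state
b = np.random.rand(3)

# A dedicated generator is isolated from the global state.
rng1 = np.random.RandomState(42)
c = rng1.rand(3)
np.random.rand(1)       # global draw; does not touch rng1 or rng2
rng2 = np.random.RandomState(42)
d = rng2.rand(3)

print(np.array_equal(a, b))  # False: b starts one draw later than a
print(np.array_equal(c, d))  # True: both generators start from seed 42
```

This is one reason results can differ between runs even when np.random.seed(42) is called at the top of a script.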
Noah Weber