
I am trying to build a model in scikit-learn. I used RandomForestClassifier as my method for classification. In order to improve the score and efficiency of my model, I thought about using GridSearchCV.

Here is the code:

import pandas as pd
import numpy as np
from sklearn.cross_validation import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score,roc_auc_score
from sklearn.grid_search import GridSearchCV

## ... code for cleaning data ...

X_train, X_test, y_train, y_test = train_test_split(train,output, test_size=0.2, random_state =7)



clf = RandomForestClassifier(n_estimators =100)
param_grid = {'max_depth' : [None, 10,20],
              'max_features' : ['auto',None],
              'n_estimators' :[100,200,300],
              'random_state': 7}
## This line is throwing the error shown below
validator = GridSearchCV(clf, param_grid= param_grid) 
validator.fit(X_train,y_train)

The error being thrown by my code is:

ValueError                                Traceback (most recent call last)
<ipython-input-22-3711af477b0c> in <module>()
      3          "max_depth" : [5,10,50],
      4          "random_state" : 7}
----> 5 grid = GridSearchCV(clf, param_grid=param, n_jobs=1)
      6 grid.fit(X_train,y_train)

C:\Anaconda3\envs\DeepLearning\lib\site-packages\sklearn\grid_search.py in __init__(self, estimator, param_grid, scoring, fit_params, n_jobs, iid, refit, cv, verbose, pre_dispatch, error_score)
    785             refit, cv, verbose, pre_dispatch, error_score)
    786         self.param_grid = param_grid
--> 787         _check_param_grid(param_grid)
    788 
    789     def fit(self, X, y=None):

C:\Anaconda3\envs\DeepLearning\lib\site-packages\sklearn\grid_search.py in _check_param_grid(param_grid)
    326             check = [isinstance(v, k) for k in (list, tuple, np.ndarray)]
    327             if True not in check:
--> 328                 raise ValueError("Parameter values should be a list.")
    329 
    330             if len(v) == 0:

ValueError: Parameter values should be a list.

Please help me figure out what the above error means and why it is happening.
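The traceback itself points at the cause: before anything is fitted, `_check_param_grid` loops over the grid and rejects any value that is not a list, tuple or NumPy array. Below is a simplified plain-Python stand-in for that check. The function name `check_param_grid` is hypothetical, this is not scikit-learn's actual code, and NumPy arrays are left out only to keep the sketch dependency-free. It shows that the bare integer in `'random_state': 7` is what gets rejected:

```python
def check_param_grid(param_grid):
    """Hypothetical, simplified version of the validation seen in the traceback.

    Every value in the grid must be a sequence of candidate values, not a scalar.
    """
    for name, values in param_grid.items():
        # The real check also accepts numpy arrays; omitted to stay dependency-free.
        if not isinstance(values, (list, tuple)):
            raise ValueError(
                f"Parameter values should be a list (got {values!r} for {name!r})."
            )
        if len(values) == 0:
            raise ValueError(f"Parameter values for {name!r} should not be empty.")


bad_grid = {
    "max_depth": [None, 10, 20],
    "max_features": ["auto", None],
    "n_estimators": [100, 200, 300],
    "random_state": 7,  # a bare int: rejected
}
good_grid = dict(bad_grid, random_state=[7])  # a one-element list: accepted

check_param_grid(good_grid)  # passes silently
try:
    check_param_grid(bad_grid)
except ValueError as err:
    print(err)  # Parameter values should be a list (got 7 for 'random_state').
```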

Shayan Shafiq
enterML

2 Answers


Well, the error message is quite clear: GridSearchCV accepts only lists as parameter values (the check in the traceback also allows tuples and NumPy arrays). Therefore, writing 'random_state': [7] will solve the issue.
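To confirm the fix, the corrected grid can be passed to `ParameterGrid`, the class GridSearchCV uses to enumerate its candidates. This is a minimal sketch assuming a recent scikit-learn, where these classes live in `sklearn.model_selection`. The value `'auto'` is swapped for `'sqrt'` because newer releases removed `'auto'` for `max_features`; for classifiers the two meant the same thing.

```python
from sklearn.model_selection import ParameterGrid

# The question's grid, with the single random_state value wrapped in a list.
param_grid = {
    "max_depth": [None, 10, 20],
    "max_features": ["sqrt", None],  # 'sqrt' replaces the removed 'auto'
    "n_estimators": [100, 200, 300],
    "random_state": [7],             # one-element list instead of a bare int
}

candidates = list(ParameterGrid(param_grid))
print(len(candidates))  # 18 candidates: 3 * 2 * 3 * 1
print(candidates[0])    # one concrete parameter combination
```

Wrapping the single seed in a list does not add any candidates. It only attaches `random_state=7` to every combination.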

However, when a parameter has only one value, it makes more sense to put it directly into the classifier, as you did with n_estimators.
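Following that advice, here is a sketch of the whole search with `random_state=7` set on the estimator and left out of the grid. It assumes a recent scikit-learn, where `train_test_split` and `GridSearchCV` are imported from `sklearn.model_selection`; the old `sklearn.cross_validation` and `sklearn.grid_search` modules were removed in 0.20. The question's cleaned `train`/`output` data are not shown, so `make_classification` generates a hypothetical stand-in dataset. The grid (`n_estimators`, `cv=3`) is trimmed only so the example runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical stand-in for the question's cleaned `train` / `output` data.
X, y = make_classification(n_samples=300, n_features=10, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7
)

# The fixed seed goes on the estimator, so every candidate shares it.
clf = RandomForestClassifier(random_state=7)

# Only hyperparameters that should actually vary belong in the grid,
# and every value is a list. n_estimators is trimmed here for speed.
param_grid = {
    "max_depth": [None, 10, 20],
    "max_features": ["sqrt", None],
    "n_estimators": [50, 100],
}

validator = GridSearchCV(clf, param_grid=param_grid, cv=3)
validator.fit(X_train, y_train)  # refit=True by default, so predict() works

acc = accuracy_score(y_test, validator.predict(X_test))
print(validator.best_params_)
print(acc)
```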

HonzaB

I would say that you have to remove random_state from the parameter grid. Either that, or put something like [7, X], which will work, but I don't think it makes sense. If you want to use a fixed random_state = 7, you should set it when you instantiate the estimator, just like any other hyperparameter (next to n_estimators).

I can't test it right now but I'd say that's the problem.
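Here is one way to see why a list such as `[7, X]` "works but doesn't make sense": every extra value for `random_state` multiplies the number of candidates, and so the number of fits. The only thing gained is the chance to pick whichever seed happened to score best, not a better model. The small plain-Python sketch below counts grid combinations with `itertools.product`, the same way an exhaustive grid search enumerates them. The helper `count_candidates` is hypothetical, and `42` is an arbitrary stand-in for the answer's placeholder `X`.

```python
from itertools import product


def count_candidates(param_grid):
    """Number of parameter combinations an exhaustive grid search would try."""
    return len(list(product(*param_grid.values())))


base = {
    "max_depth": [None, 10, 20],
    "max_features": ["auto", None],
    "n_estimators": [100, 200, 300],
}

one_seed = dict(base, random_state=[7])       # seed effectively fixed
two_seeds = dict(base, random_state=[7, 42])  # 42 is an arbitrary stand-in for X

print(count_candidates(base))       # 18
print(count_candidates(one_seed))   # 18
print(count_candidates(two_seeds))  # 36
```

With k-fold cross-validation each candidate is fitted k times, so the second seed doubles the total work.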

Shayan Shafiq
hipoglucido