I am taking a course that introduced me to sklearn.ensemble.RandomForestClassifier. At first it uses n_estimators with the default value of 10, and the resulting accuracy is around 0.28. If I change n_estimators to 15, the accuracy goes up to 0.32.
Here's some of the code:
pl = Pipeline([
    ('union', FeatureUnion(
        transformer_list=[
            ('numeric_features', Pipeline([
                ('selector', get_numeric_data),
                ('imputer', Imputer())
            ])),
            ('text_features', Pipeline([
                ('selector', get_text_data),
                ('vectorizer', CountVectorizer())
            ]))
        ]
    )),
    ('clf', RandomForestClassifier())
])
I thought that increasing the number of trees (n_estimators) in the RandomForestClassifier would give better accuracy, but with a value of 100 I sometimes get only between 0.30 and 0.32, which is no better than with 15. Could someone please explain why? And how do I find the smallest value of n_estimators that gives the highest possible accuracy?
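To make the question concrete, here is a minimal sketch of the kind of comparison I have in mind. It runs on synthetic data from make_classification instead of the course dataset, and the helper name sweep_n_estimators, the candidate list, and the 0.01 tolerance are placeholders I made up. It fixes random_state so that differences between runs should come from n_estimators and not from the random seed:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption: this is NOT the course dataset).
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)


def sweep_n_estimators(X, y, candidates, seed=0, cv=3):
    """Return {n_estimators: mean cross-validated accuracy} (hypothetical helper)."""
    scores = {}
    for n in candidates:
        # Fixed random_state so repeated runs give the same forest.
        clf = RandomForestClassifier(n_estimators=n, random_state=seed)
        scores[n] = cross_val_score(clf, X, y, cv=cv, scoring="accuracy").mean()
    return scores


candidates = [10, 15, 50, 100]
scores = sweep_n_estimators(X, y, candidates)
for n, acc in scores.items():
    print(f"n_estimators={n:>3}  mean accuracy={acc:.3f}")

# Smallest candidate whose score is within an (arbitrary) tolerance of the best.
tolerance = 0.01
best = max(scores.values())
smallest_good = min(n for n, acc in scores.items() if acc >= best - tolerance)
print("smallest n_estimators within tolerance of best:", smallest_good)
```

Is a sweep like this, averaged over cross-validation folds, a reasonable way to pick the value, or is there a more standard approach?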