Questions tagged [decision-trees]

A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm.

740 questions
110
votes
9 answers

When should I use Gini Impurity as opposed to Information Gain (Entropy)?

Can someone explain, in practical terms, the rationale behind Gini impurity versus information gain (based on entropy)? Which metric is better to use in which scenarios when building decision trees?
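Both criteria can be computed directly from a node's class counts; a minimal sketch (toy class counts, standard formulas) makes the comparison concrete:

```python
# Sketch comparing the two split criteria on toy class-count vectors.
from math import log2

def gini(counts):
    """Gini impurity: 1 - sum(p_i^2) over class proportions."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    """Shannon entropy: -sum(p_i * log2(p_i)), skipping empty classes."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c)

# A 50/50 node maximizes both measures; a pure node scores 0 under both.
print(gini([5, 5]))     # 0.5
print(entropy([5, 5]))  # 1.0
print(gini([10, 0]))    # 0.0
```

In practice the two usually pick very similar splits; Gini avoids the logarithm, which is one reason CART-style implementations default to it.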
87
votes
6 answers

strings as features in decision tree/random forest

I am doing some problems on an application of decision trees/random forests. I am trying to fit a problem that has numbers as well as strings (such as country names) as features. The scikit-learn library accepts only numeric features, but I…
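The usual fix is to one-hot encode the string column before fitting (scikit-learn's OneHotEncoder or pandas.get_dummies do this for real data). A hand-rolled sketch, with made-up rows and a hypothetical "country" column, shows the idea:

```python
# Minimal hand-rolled one-hot encoding sketch; the rows and the
# "country" column are made up for illustration.
rows = [
    {"age": 25, "country": "DE"},
    {"age": 40, "country": "US"},
    {"age": 31, "country": "DE"},
]

# One indicator column per distinct category value.
categories = sorted({r["country"] for r in rows})  # ['DE', 'US']

def encode(row):
    # Numeric features pass through; the string becomes 0/1 indicators.
    return [row["age"]] + [1 if row["country"] == c else 0 for c in categories]

X = [encode(r) for r in rows]
print(X)  # [[25, 1, 0], [40, 0, 1], [31, 1, 0]]
```

The resulting all-numeric matrix is what scikit-learn's tree estimators expect.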
40
votes
4 answers

Why do we need XGBoost and Random Forest?

I wasn't clear on a couple of concepts: XGBoost converts weak learners into strong learners. What's the advantage of doing this? Combining many weak learners instead of just using a single tree? Random Forest uses various samples to create…
37
votes
5 answers

Are decision tree algorithms linear or nonlinear?

Recently a friend of mine was asked whether decision tree algorithms are linear or nonlinear algorithms in an interview. I tried to look for answers to this question but couldn't find any satisfactory explanation. Can anyone answer and explain the…
32
votes
3 answers

Is it necessary to normalize data for XGBoost?

MinMaxScaler() in scikit-learn is used for data normalization (a.k.a. feature scaling). Data normalization is not necessary for decision trees. Since XGBoost is based on decision trees, is it necessary to do data normalization using MinMaxScaler()…
user781486
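Tree splits depend only on the ordering of a feature, not its scale, so a monotonic rescaling such as min-max leaves the learned partition unchanged. A toy sketch (made-up data, a one-split "stump" as a stand-in for a full tree) illustrates this; it does not cover every XGBoost detail, but it shows why scaling the inputs does not change the splits:

```python
# Sketch: a monotonic rescaling (here min-max) preserves feature order,
# so a stump finds the "same" split and identical predictions.
def best_stump(xs, ys):
    """Exhaustively pick the threshold minimising misclassifications."""
    xs_sorted = sorted(xs)
    best = None
    for a, b in zip(xs_sorted, xs_sorted[1:]):
        t = (a + b) / 2
        pred = [1 if x > t else 0 for x in xs]
        err = sum(p != y for p, y in zip(pred, ys))
        err = min(err, len(ys) - err)  # allow the flipped labelling
        if best is None or err < best[0]:
            best = (err, t)
    return best[1]

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]   # made-up feature
ys = [0, 0, 0, 1, 1, 1]                  # made-up labels
lo, hi = min(xs), max(xs)
scaled = [(x - lo) / (hi - lo) for x in xs]  # what MinMaxScaler would do

t_raw, t_scaled = best_stump(xs, ys), best_stump(scaled, ys)
preds_raw = [x > t_raw for x in xs]
preds_scaled = [x > t_scaled for x in scaled]
print(preds_raw == preds_scaled)  # True: same partition, different units
```

The threshold itself moves with the units (6.5 versus 0.5 here), but the partition of the samples, and hence every prediction, is identical.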
30
votes
1 answer

How is a splitting point chosen for continuous variables in decision trees?

I have two questions related to decision trees: If we have a continuous attribute, how do we choose the splitting value? Example: Age = (20, 29, 50, 40, ...). Imagine that we have a continuous attribute $f$ that takes values in $\mathbb{R}$. How can I write an…
WALID BELRHALMIA
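The standard CART-style recipe: sort the distinct values, take the midpoints between consecutive ones as candidate thresholds, and keep the threshold with the lowest weighted impurity of the two children. A sketch with made-up ages and binary labels, scored with Gini impurity:

```python
# Candidate thresholds are midpoints between consecutive sorted values;
# the best one minimises the size-weighted Gini of the two children.
def gini(labels):
    """Gini impurity for binary 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1 - p) * (1 - p)

def best_threshold(xs, ys):
    vals = sorted(set(xs))
    best_t, best_score = None, float("inf")
    for a, b in zip(vals, vals[1:]):
        t = (a + b) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t

ages = [20, 29, 40, 50]
ys   = [0, 0, 1, 1]       # made-up labels
print(best_threshold(ages, ys))  # 34.5, the midpoint between 29 and 40
```

So only n - 1 thresholds ever need to be evaluated per feature, however many real values the attribute can take.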
25
votes
4 answers

How to predict probabilities in xgboost using R?

The predict call below is also giving negative values, so they cannot be probabilities. param <- list(max.depth = 5, eta = 0.01, objective = "binary:logistic", subsample = 0.9) bst <- xgboost(param, data = x_mat, label = y_mat, nround = 3000) pred_s <-…
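The question is in R, but the underlying point is language-independent: with objective "binary:logistic", predictions are already in (0, 1); negative outputs suggest raw margin scores (log-odds) are being returned instead, and the sigmoid maps those back to probabilities. A Python sketch with made-up margin values:

```python
# Raw margin scores (log-odds) can be any real number; the logistic
# sigmoid maps them into (0, 1). The margin values here are made up.
from math import exp

def sigmoid(margin):
    return 1.0 / (1.0 + exp(-margin))

margins = [-2.0, 0.0, 1.5]          # hypothetical raw scores
probs = [sigmoid(m) for m in margins]
print(probs)  # every value lies in (0, 1); sigmoid(0) is exactly 0.5
```

If the outputs are genuinely negative, it is worth checking that the objective really is "binary:logistic" and that margin output has not been requested.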
25
votes
4 answers

How to make a decision tree with both continuous and categorical variables in the dataset?

Let's say I have 3 categorical and 2 continuous attributes in a dataset. How do I build a decision tree using these 5 variables? Edit: For categorical variables, it is easy to say that we will split them just by {yes/no} and calculate the total gini…
Sahil Chaturvedi
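For a categorical attribute with more than two values, a binary tree considers splits of the form "category in S versus not in S". With few categories every non-trivial subset can simply be enumerated and scored, here with Gini impurity on made-up colour data (real libraries use smarter orderings to avoid the exponential enumeration):

```python
# Brute-force the best "category in S vs not in S" split by Gini.
from itertools import combinations

def gini(labels):
    """Gini impurity for binary 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1 - p) * (1 - p)

def best_subset_split(cats, ys):
    values = sorted(set(cats))
    best = None
    for r in range(1, len(values)):            # every non-trivial subset size
        for subset in combinations(values, r):
            left = [y for c, y in zip(cats, ys) if c in subset]
            right = [y for c, y in zip(cats, ys) if c not in subset]
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(ys)
            if best is None or score < best[0]:
                best = (score, set(subset))
    return best[1]

colors = ["red", "red", "blue", "green", "blue"]   # made-up data
labels = [1, 1, 0, 1, 0]
print(best_subset_split(colors, labels))  # {'blue'}: a perfect split here
```

Continuous attributes in the same tree are handled with the usual threshold search, so both kinds of split compete on the same impurity score at each node.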
22
votes
1 answer

XGBRegressor vs. xgboost.train huge speed difference?

If I train my model using the following code: import xgboost as xg params = {'max_depth':3, 'min_child_weight':10, 'learning_rate':0.3, 'subsample':0.5, 'colsample_bytree':0.6, 'obj':'reg:linear', 'n_estimators':1000, 'eta':0.3} features =…
21
votes
1 answer

Decision trees: leaf-wise (best-first) and level-wise tree traverse

Issue 1: I am confused by LightGBM's description of the way the tree is expanded. They state: "Most decision tree learning algorithms grow trees level (depth)-wise, like the following image." Question 1: Which "most" algorithms…
kkk
17
votes
5 answers

Decision tree vs. KNN

In which cases is it better to use a decision tree, and in which a KNN? Why use one in certain cases and the other in different cases? (Looking at functionality, not at the algorithm internals.) Does anyone have explanations or references…
gchavez1
17
votes
5 answers

Should I use a decision tree or logistic regression for classification?

I am working on a classification problem. I have a dataset containing equal numbers of categorical variables and continuous variables. How do I decide which technique to use, between a decision tree and logistic regression? Is it right to assume…
Arun
16
votes
2 answers

When to choose linear regression or Decision Tree or Random Forest regression?

I am working on a project and am having difficulty deciding which algorithm to choose for regression. I want to know under what conditions one should choose linear regression, decision tree regression, or random forest regression. Are there…
15
votes
1 answer

Can gradient boosted trees fit any function?

For neural networks we have the universal approximation theorem, which states that neural networks can approximate any continuous function on a compact subset of $\mathbb{R}^n$. Is there a similar result for gradient boosted trees? It seems reasonable since…
Imran
13
votes
3 answers

Unbalanced classes -- How to minimize false negatives?

I have a dataset that has a binary class attribute. There are 623 instances with class +1 (cancer positive) and 101,671 instances with class -1 (cancer negative). I've tried various algorithms (Naive Bayes, Random Forest, AODE, C4.5) and all of them…
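Beyond class-weighting options (e.g. scale_pos_weight in XGBoost or class_weight in scikit-learn), one common lever for a rare positive class is to keep the model but lower the decision threshold on its predicted probabilities, trading extra false positives for fewer false negatives. A sketch with made-up scores and labels:

```python
# Count false negatives / false positives at a given probability threshold.
def confusion(probs, ys, threshold):
    preds = [p >= threshold for p in probs]
    fn = sum(1 for p, y in zip(preds, ys) if y == 1 and not p)
    fp = sum(1 for p, y in zip(preds, ys) if y == 0 and p)
    return fn, fp

probs = [0.9, 0.4, 0.2, 0.15, 0.05]   # hypothetical classifier scores
ys    = [1,   1,   0,   1,    0]      # 1 = cancer positive

print(confusion(probs, ys, 0.5))   # (2, 0): default cutoff misses 2 positives
print(confusion(probs, ys, 0.1))   # (0, 1): no misses, one false alarm
```

Sweeping the threshold and plotting the two error counts (effectively a point-by-point ROC curve) makes the trade-off explicit for the 623-versus-101,671 imbalance described above.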