I'm a newbie writing a decision tree from scratch using entropy and information gain. I understand that entropy measures the impurity of a data set, and I have already calculated entropy for categorical features. For continuous data, my intuition is that I have to take a range of values for each class label, but how do I choose that range? Is my intuition correct? What is the standard way of finding the entropy of a feature with numerical data?
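For context, here is a minimal sketch of the approach I have been trying, assuming the usual strategy of sorting the feature, taking midpoints between consecutive distinct values as candidate thresholds, and keeping the one with the highest information gain. The helper names (`entropy`, `best_threshold`) and the use of NumPy are just my own choices for illustration:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_threshold(values, labels):
    """Try midpoints between consecutive distinct sorted values and
    return the threshold with the highest information gain."""
    order = np.argsort(values)
    values, labels = values[order], labels[order]
    parent = entropy(labels)
    n = len(labels)
    best_gain, best_t = -1.0, None
    for i in range(1, n):
        if values[i] == values[i - 1]:
            continue  # identical values cannot be separated
        t = (values[i] + values[i - 1]) / 2.0
        left, right = labels[:i], labels[i:]
        child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        gain = parent - child
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

# Example usage with made-up data:
x = np.array([2.1, 3.5, 1.0, 4.2, 3.9])
y = np.array([0, 1, 0, 1, 1])
print(best_threshold(x, y))
```

Is this the standard way, or is there a better rule for picking the candidate ranges/thresholds?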
- So the midpoint values can be used as ranges. Thanks – Siddharth Murari May 15 '20 at 14:00
- For N data points, there are N-1 possible splits. It is only worth checking splits between data points of different classes because, mathematically, the optimal split will never fall between consecutive data points of the same class. – Valentin Calomme May 15 '20 at 14:03
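Building on that comment, a hedged sketch of generating candidate thresholds only at class boundaries might look like this (reusing the hypothetical `entropy`-style sorting from the snippet above; the function name is my own):

```python
import numpy as np

def class_boundary_thresholds(values, labels):
    """Return midpoints only where the class label changes between
    consecutive sorted points, per the comment above."""
    order = np.argsort(values)
    v, y = values[order], labels[order]
    return [(v[i] + v[i - 1]) / 2.0
            for i in range(1, len(v))
            if y[i] != y[i - 1] and v[i] != v[i - 1]]
```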