
I'm a newbie writing a decision tree from scratch using entropy and information gain. I understand that entropy measures the impurity of a data set, and I have already calculated entropy for categorical features. For continuous data, my intuition is that I have to take a range of values for each class label, but how do I choose that range? Is my intuition correct? What is the standard way of finding the entropy of a feature with numerical data?
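For reference, here is a minimal sketch of the entropy computation the question describes for a set of class labels. It assumes NumPy and a made-up binary label array; the function name and data are illustrative, not from the question.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

# Example: a perfectly mixed binary node has entropy 1 bit.
print(entropy(np.array(["yes", "no", "yes", "no"])))  # -> 1.0
```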

  • So the midpoint values can be used as ranges. Thanks – Siddharth Murari May 15 '20 at 14:00
  • For N data points, there are N-1 possible splits. It is only worth checking splits between data points of different classes because, mathematically, the optimal split will never fall between consecutive data points of the same class. – Valentin Calomme May 15 '20 at 14:03
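A minimal sketch of the approach described in the comments: sort the feature, take midpoints between consecutive values where the class label changes, and keep the threshold with the highest information gain. It assumes NumPy; `best_threshold` and the example data are hypothetical, not part of the question.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def best_threshold(feature, labels):
    """Return (threshold, information_gain) for one continuous feature."""
    order = np.argsort(feature)
    x, y = feature[order], labels[order]
    parent_entropy = entropy(y)
    n = len(y)

    best_gain, best_t = 0.0, None
    for i in range(1, n):
        # Only consider boundaries between points of different classes,
        # as noted in the comment above.
        if y[i] == y[i - 1] or x[i] == x[i - 1]:
            continue
        t = (x[i] + x[i - 1]) / 2  # midpoint as candidate threshold
        left, right = y[:i], y[i:]
        # Weighted average entropy of the two child nodes.
        child_entropy = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        gain = parent_entropy - child_entropy
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

# Hypothetical example: a clean class change between 2.0 and 3.0
# should give a threshold of 2.5 with maximal gain.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])
print(best_threshold(x, y))  # -> (2.5, 1.0)
```

In a full decision tree, this search would be repeated for every continuous feature at every node, and the feature/threshold pair with the highest gain would define the split.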

0 Answers