2

I have two samples of customers, and each customer has hundreds of features. For a single sample, I would use a decision tree to find subgroups that have a high churn rate. That's easy.

However, my requirement is different: given the two samples (below), find the segment(s) whose churn rate is high in one sample and low in the other. In other words, find the subgroup with the largest difference in churn rate between the two samples.

What is an appropriate algorithm to solve this?

Thanks.

enter image description here

Arslán
  • I think you can do it using entropy and information gain. Do you know how they work? – Francesco Pegoraro Sep 24 '18 at 20:43
  • You could use clustering and find the groups with high and low churn rates. – user2974951 Sep 25 '18 at 06:26
  • I usually use a decision tree to find the subgroups, because I also need to explain those groups. My naive approach was to find all subgroups in sample 1, then apply the same decision tree rules to sample 2, and vice versa, with the goal of maximizing the difference in churn rate between corresponding subgroups. This approach didn't seem efficient to me. – Arslán Sep 25 '18 at 13:41
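
The naive approach from the last comment can be sketched roughly as below. This is only an illustration under assumptions: the data is synthetic, the helper names `per_leaf_rates` and `gaps`, the feature names `x0`..`x4`, and the tree settings are all made up for the example, and scikit-learn's `DecisionTreeClassifier` stands in for whatever tree implementation is actually used. The tree is fit on sample 1, both samples are routed through it with `apply`, and leaves are ranked by the gap in churn rate.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data (assumption): in sample 1, churn is high when x0 > 0.5;
# in sample 2, churn is low everywhere.
rng = np.random.default_rng(0)
X1 = rng.random((2000, 5))
y1 = (rng.random(2000) < np.where(X1[:, 0] > 0.5, 0.8, 0.1)).astype(int)
X2 = rng.random((2000, 5))
y2 = (rng.random(2000) < 0.1).astype(int)
feature_names = [f"x{i}" for i in range(X1.shape[1])]

# Fit the tree on sample 1 only; its leaves define the candidate subgroups.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=50, random_state=0)
tree.fit(X1, y1)


def per_leaf_rates(fitted_tree, X, y, suffix):
    """Churn rate and row count per leaf for one sample."""
    df = pd.DataFrame({"leaf": fitted_tree.apply(X), "churn": np.asarray(y)})
    out = df.groupby("leaf")["churn"].agg(["mean", "size"])
    out.columns = [f"rate_{suffix}", f"n_{suffix}"]
    return out


# Route both samples through the same rules and compare leaf by leaf.
gaps = per_leaf_rates(tree, X1, y1, "1").join(
    per_leaf_rates(tree, X2, y2, "2"), how="inner"
)
gaps["gap"] = gaps["rate_1"] - gaps["rate_2"]
gaps = gaps.reindex(gaps["gap"].abs().sort_values(ascending=False).index)

print(gaps.head())
print(export_text(tree, feature_names=feature_names))  # human-readable rules
```

Repeating the same steps with the tree fit on sample 2 covers the "vice versa" direction. Note that the splits are chosen to separate churners within one sample, not to maximize the gap between samples, which may be part of why this felt inefficient.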

1 Answer

-1

You can frame this as a feature importance problem: which features have the greatest influence on the target, churn?

There are many ways to estimate feature importance. For decision trees, one option is permutation importance, which measures how much a fitted model's score drops when the values of a single feature are randomly shuffled.
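
As a minimal sketch of permutation importance for a decision tree, assuming scikit-learn and synthetic data (the dataset, the choice of `x2` as the informative feature, and the tree settings are all assumptions made for illustration):

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): only feature x2 drives churn.
rng = np.random.default_rng(0)
X = rng.random((2000, 5))
y = (rng.random(2000) < np.where(X[:, 2] > 0.5, 0.8, 0.1)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn on held-out data and record the drop in accuracy.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)

# Features ordered from most to least important.
ranking = np.argsort(result.importances_mean)[::-1]
for i in ranking:
    print(f"x{i}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```

This ranks features rather than segments, so for the two-sample question it would presumably be run once per sample and the rankings compared.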

Brian Spiering