
I've been trying to follow this paper on Bayesian Risk Pruning. I'm not very familiar with this type of pruning, but I'm wondering a few things:

(1) The paper defines risk-rates per example as $R_k(a_i|x)=\sum\limits_{j=1,j \neq i}^{T_c} L_k(a_i|C_j)p_k(C_j|x)$, where $L_k(a_i|C_j)$ is the loss of taking action $a_i$ (i.e. predicting class $C_i$) when the true class is $C_j$, and $p_k(C_j|x)$ is the estimated probability of an example belonging to $C_j$.

[image: decision tree produced by the C4.5 algorithm]

Above is the decision tree produced by a C4.5 algorithm. Pruning occurs from left to right, bottom-up. My main question: how are the risk-rates computed at a node of this decision tree, such as Node 3?
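My current understanding of the risk-rate, sketched in code. This is my own reading, not the paper's implementation; the function names (`class_probs`, `risk_rate`) and the convention that a missing entry in the loss table defaults to 1 (0-1 loss) are my assumptions.

```python
def class_probs(node_counts):
    """Estimate p_k(C_j|x) as class proportions in the node's partition."""
    total = sum(node_counts.values())
    return {c: n / total for c, n in node_counts.items()}

def risk_rate(action, node_counts, loss):
    """R_k(a_i|x) = sum over j != i of L_k(a_i|C_j) * p_k(C_j|x).

    `loss[(action, true)]` is L_k(a_i|C_j); missing entries default to 1,
    which reduces to 0-1 loss (an assumption on my part).
    """
    p = class_probs(node_counts)
    return sum(loss.get((action, c), 1.0) * p[c]
               for c in p if c != action)

# Toy check: a node holding [Yes, No, No], predicting the majority class No.
counts = {"Yes": 1, "No": 2}
print(risk_rate("No", counts, {}))  # only the Yes example contributes, ~1/3
```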

(2) There are also conflicting statements:

[images: two excerpts of the paper's pruning rule]

The first excerpt states that if the parent's risk-rate exceeds the total risk-rate of its leaves, the parent is pruned to a leaf. However, the second claims that pruning occurs if the leaf risk-rate exceeds the parent's. To confirm: if the risk-rate of the parent is less than the total risk-rate of the leaves under the parent's subtree, should I collapse the parent into a leaf?

(3) From (1), loss would be 0-1 in the binary case. What could be a reasonable loss for multi-class output?
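To make (3) concrete, here is my guess at two options: keep 0-1 loss (any misclassification costs 1), or use a cost matrix that penalizes some confusions more than others. The class names and matrix values below are purely illustrative, not from the paper.

```python
def zero_one_loss(pred, true):
    """0-1 loss generalizes directly to the multi-class case."""
    return 0.0 if pred == true else 1.0

# A hypothetical asymmetric cost matrix over classes A, B, C
# (values made up for illustration only).
cost = {
    ("A", "B"): 1.0, ("A", "C"): 2.0,
    ("B", "A"): 1.0, ("B", "C"): 1.0,
    ("C", "A"): 2.0, ("C", "B"): 1.0,
}

print(zero_one_loss("A", "A"))  # 0.0
print(cost[("A", "C")])         # 2.0
```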

(4) From (1), would the estimated probability of $C_j$ be the proportion of $C_j$ in the partitioned output classes at a node? For instance, at Node 3, we're looking at output = [No].
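If my reading of (4) is right, $p_k(C_j|x)$ at a node is just the class proportion in that node's partition. A quick sketch (the `proportions` helper is my own name):

```python
from collections import Counter

def proportions(partition):
    """Class proportions in a node's partitioned output."""
    counts = Counter(partition)
    return {c: n / len(partition) for c, n in counts.items()}

print(proportions(["No"]))               # Node 3: {'No': 1.0}
print(proportions(["Yes", "No", "No"]))  # Node 2: Yes -> 1/3, No -> 2/3
```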

(5) From (1), would the risk-rate be computed over all training examples?

nrael

1 Answer


I'm going to try to answer my own questions. To (4), I would say yes. For (5), I believe that each node has a specific partition of examples that reaches it when following the branches down the decision tree. For instance, Node 2 receives 3 instances (those with a=1), giving the partitioned output classes [Yes, No, No]. For (1), I would calculate the risk over each of these relevant examples and sum them. At Node 2, using 0-1 loss, the Bayes risk is $\frac{2}{3}(1)+\frac{1}{3}(1)+\frac{1}{3}(1) = \frac{4}{3}$. I would then use these estimated risks in the pruning algorithm. Can anyone corroborate?
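Reproducing my Node 2 arithmetic as a sketch, under my assumption that each example's risk is the summed probability of the other classes (0-1 loss), and the node risk is the sum over the examples in its partition:

```python
from collections import Counter

partition = ["Yes", "No", "No"]  # examples reaching Node 2 (a=1)
counts = Counter(partition)
n = len(partition)
p = {c: k / n for c, k in counts.items()}  # class proportions at the node

# Per-example risk under 0-1 loss: 1 - p(true class); sum over the partition.
node_risk = sum(1.0 - p[y] for y in partition)  # (1 - 1/3) + 2*(1 - 2/3)
print(node_risk)  # 4/3, matching the hand calculation above
```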

(2) was answered in the comments above, and for (3) I'm assuming that cross-entropy can be used as the multi-class loss.
