
I've been trying to follow this paper on Bayesian Risk Pruning. I'm not very familiar with this type of pruning, but I'm wondering a few things:

(1) The paper defines risk-rates per example as $R_k(a_i|x)=\sum\limits_{j=1,j \neq i}^{T_c} L_k(a_i|C_j)p_k(C_j|x)$, where $L_k(a_i|C_j)$ is the loss of taking action $a_i$ (i.e. predicting class $C_i$) when the true class is $C_j$, and $p_k(C_j|x)$ is the estimated probability of an example belonging to $C_j$.

[image: decision tree produced by the C4.5 algorithm]

Above is the decision tree produced by a C4.5 algorithm. Pruning occurs from left to right, bottom-up. My main question: how are the risk-rates computed at a node of this decision tree, such as Node 3?
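My current understanding of the risk-rate, sketched in code. This is my own reading, not the paper's implementation; the function names (`class_probs`, `risk_rate`) and the convention that a missing entry in the loss table defaults to 1 (0-1 loss) are my assumptions.

```python
def class_probs(node_counts):
    """Estimate p_k(C_j|x) as class proportions in the node's partition."""
    total = sum(node_counts.values())
    return {c: n / total for c, n in node_counts.items()}

def risk_rate(action, node_counts, loss):
    """R_k(a_i|x) = sum over j != i of L_k(a_i|C_j) * p_k(C_j|x).

    `loss[(action, true)]` is L_k(a_i|C_j); missing entries default to 1,
    which reduces to 0-1 loss (an assumption on my part).
    """
    p = class_probs(node_counts)
    return sum(loss.get((action, c), 1.0) * p[c]
               for c in p if c != action)

# Toy check: a node holding [Yes, No, No], predicting the majority class No.
counts = {"Yes": 1, "No": 2}
print(risk_rate("No", counts, {}))  # only the Yes example contributes, ~1/3
```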

(2) There are also conflicting statements:

[images: two excerpts of the paper's pruning rule]

The first excerpt states that if the parent's risk-rate exceeds the total risk-rate of its leaves, the parent is pruned to a leaf. However, the second claims that pruning occurs if the leaf risk-rate exceeds the parent's. To confirm: if the risk-rate of the parent is less than the total risk-rate of the leaves under the parent's subtree, should I collapse the parent into a leaf?

(3) From (1), loss would be 0-1 in the binary case. What could be a reasonable loss for multi-class output?
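To make (3) concrete, here is my guess at two options: keep 0-1 loss (any misclassification costs 1), or use a cost matrix that penalizes some confusions more than others. The class names and matrix values below are purely illustrative, not from the paper.

```python
def zero_one_loss(pred, true):
    """0-1 loss generalizes directly to the multi-class case."""
    return 0.0 if pred == true else 1.0

# A hypothetical asymmetric cost matrix over classes A, B, C
# (values made up for illustration only).
cost = {
    ("A", "B"): 1.0, ("A", "C"): 2.0,
    ("B", "A"): 1.0, ("B", "C"): 1.0,
    ("C", "A"): 2.0, ("C", "B"): 1.0,
}

print(zero_one_loss("A", "A"))  # 0.0
print(cost[("A", "C")])         # 2.0
```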

(4) From (1), would the estimated probability of $C_j$ be the proportion of $C_j$ in the partitioned output classes at a node? For instance, at Node 3, we're looking at output = [No].
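If my reading of (4) is right, $p_k(C_j|x)$ at a node is just the class proportion in that node's partition. A quick sketch (the `proportions` helper is my own name):

```python
from collections import Counter

def proportions(partition):
    """Class proportions in a node's partitioned output."""
    counts = Counter(partition)
    return {c: n / len(partition) for c, n in counts.items()}

print(proportions(["No"]))               # Node 3: {'No': 1.0}
print(proportions(["Yes", "No", "No"]))  # Node 2: Yes -> 1/3, No -> 2/3
```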

(5) From (1), would the risk-rate be computed over all training examples?

nrael

1 Answer


I'm going to try to answer my own questions. To (4), I would say yes. For (5), I believe that each node has a specific partition of examples that reaches it when following the branches down the decision tree. For instance, Node 2 receives 3 instances (those with a=1), giving the partitioned output classes [Yes, No, No]. For (1), I would calculate the risk over each of these relevant examples and sum them. At Node 2, using 0-1 loss, the Bayes risk is $\frac{2}{3}(1)+\frac{1}{3}(1)+\frac{1}{3}(1) = \frac{4}{3}$. I would then use these estimated risks in the pruning algorithm. Can anyone corroborate?
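Reproducing my Node 2 arithmetic as a sketch, under my assumption that each example's risk is the summed probability of the other classes (0-1 loss), and the node risk is the sum over the examples in its partition:

```python
from collections import Counter

partition = ["Yes", "No", "No"]  # examples reaching Node 2 (a=1)
counts = Counter(partition)
n = len(partition)
p = {c: k / n for c, k in counts.items()}  # class proportions at the node

# Per-example risk under 0-1 loss: 1 - p(true class); sum over the partition.
node_risk = sum(1.0 - p[y] for y in partition)  # (1 - 1/3) + 2*(1 - 2/3)
print(node_risk)  # 4/3, matching the hand calculation above
```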

(2) was answered in the comments above, and for (3) I'm assuming that cross-entropy can be used as the multi-class loss.
