
I have 130 records in a CSV dataset and I'm using the J48 decision tree. I used the whole training set for testing, and the result was 79 correctly classified and 51 incorrectly classified records. Now I want a result between 95% and 100%, and I'm not able to add or remove any records from those 130. But I'm allowed to play a little with my dataset, for example by testing on 10 of the 130 records, chosen so that all 10 are classified correctly.

  • The quality of the tree is not important
  • The number of test samples is not very important (10-15 is fine)
  • The only important thing is an accuracy between 95% and 100%

First, I tried moving 10 correctly classified samples to the bottom of the dataset and using "split by percentage" (92.30%), but it didn't help.

Second, I tried to choose them by trial and error. I picked samples at random and tested only the last sample (the 130th); when I found one that was classified correctly, I kept it there. Then I placed another sample next to the last successful one and tested the last two lines. The result should have been 50% or 100%, but surprisingly it was 0% (Total instances: 2, correct: 0, incorrect: 2).

Can anyone help me, please? Thanks a lot in advance, dear readers.

1 Answer


I don't understand the purpose if the quality of the tree is not important, but you should be able to build a tree with 100% accuracy on the training data set easily. Just avoid any pruning and let the tree grow as large as possible. As far as I remember, Weka enables pruning for J48 by default, so you should disable it (the unpruned option). Also check that the minimum number of instances per leaf (minNumObj) is set as low as possible, so that nodes keep splitting.

On second thought, there are cases where full accuracy is impossible. Consider, for example, a simple toy data set with 10 instances. Suppose all instances have identical input variables, but the target variable is positive for 5 of them and negative for the other 5. Whichever prediction you make, the accuracy on the training data will be 0.5. In this case you simply do not have enough information to discriminate between the classes.
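To check whether this applies to your 130 records, you can compute the best training accuracy any deterministic classifier could reach. The following is only a minimal sketch in plain Python, not something Weka provides: the function name `max_training_accuracy` and the toy rows are made up for illustration, and it assumes each record is given as a (features, label) pair with hashable features. It groups records with identical inputs and counts the majority label in each group.

```python
from collections import Counter, defaultdict


def max_training_accuracy(rows):
    """Upper bound on training accuracy for any deterministic classifier.

    rows: non-empty list of (features, label) pairs; features must be hashable.
    Records with identical features must all receive the same prediction,
    so at best the majority label of each group is classified correctly.
    """
    if not rows:
        raise ValueError("rows must not be empty")

    # Count the labels seen for every distinct feature combination.
    groups = defaultdict(Counter)
    for features, label in rows:
        groups[features][label] += 1

    # In each group, only the most frequent label can be predicted correctly.
    best_correct = sum(max(counts.values()) for counts in groups.values())
    return best_correct / len(rows)


# Toy data set from the answer: identical inputs, 5 positive and 5 negative.
toy = [(("a", 1), "pos")] * 5 + [(("a", 1), "neg")] * 5
print(max_training_accuracy(toy))  # 0.5
```

If this bound is below 1.0 for your data, even an unpruned tree cannot reach 100% on the training set, because some records with identical inputs carry different labels.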

rapaio