
I am trying to interpret this chart.

I am not sure how to interpret it because, for example, the LGBM validation error boxplot is wide and similar to the training boxplot, which would suggest there is no overfitting problem; but when I look at other charts from the same LGBM run, I can see that the model really is overfitted. So I don't know how to interpret this chart correctly.

[Chart: boxplots of training and validation F1 scores for the LightGBM, bagging, and RandomForest models]

But I don't know how to interpret it beyond this:

LightGBM is maybe the best option because it is faster and you can still get enough accuracy with it; and compared with the other two, bagging overfits less because the difference between the errors is smaller.

Any idea?

Thanks

Tlaloc-ES
    What do you not understand? If you created the chart, you must have had a reason. Can you elaborate on what confuses you? – Valentin Calomme Jun 27 '20 at 17:09
  • I want to know if it is possible to analyze overfitting and underfitting with this type of chart – Tlaloc-ES Jun 27 '20 at 17:13
  • Is this homework or something like that? – noe Jun 27 '20 at 17:17
  • Where does this diagram from? And the one from [another of your questions](https://datascience.stackexchange.com/q/76755/14675) that also came with no explanation at all? – noe Jun 27 '20 at 17:18
  • Yes, this is homework. The task was to draw conclusions from the plot, but I think my own interpretation of the plot is not enough. My interpretation is the following: LightGBM is maybe the best option because it is faster and you can still get enough accuracy with it, and compared with the other two, bagging overfits less because the differences between the errors are smaller. – Tlaloc-ES Jun 27 '20 at 17:27
  • And I couldn't find any literature about this kind of exercise to get an idea of how to approach the task. – Tlaloc-ES Jun 27 '20 at 17:28
  • For this kind of situation, I would suggest: 1) saying that it is homework right away, 2) Post what you understand from it, and ask for confirmation, refutation, or alternative interpretations. Otherwise, it seems that you come here asking people to invest their time doing your homework for you (unfair), instead of helping you understand stuff (fair). – noe Jun 27 '20 at 17:48
  • Thanks @ncasas for your suggestion, I updated the question. – Tlaloc-ES Jun 27 '20 at 17:57
  • First of all, I think you are mixing concepts such as error and score. They are different things, basically, you want to minimize errors and increase scores. In a classification task, a score can be f1 or accuracy, and an error could be a log-loss function. So what is the thing you are calling error? – Victor Oliveira Jun 27 '20 at 19:24

1 Answer


Your chart seems to show that the LightGBM models are very inconsistent in terms of F1 score. The other two types of model tend to have lower validation accuracy than training accuracy, suggesting overfitting is occurring to some extent (but this is ubiquitous in machine learning, so it's not a deal-breaker by any means). The best median validation performance is from RandomForest, although some outliers underperformed the bagging models. Possibly a good approach would be to use an ensemble of RandomForest models.