
XGBoost and standard gradient boosting train their learners to fit the residuals rather than the observations themselves. I understand that this matches the boosting mechanism, which iteratively fits the errors made by the previous learners.
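For concreteness, here is a minimal sketch of what I mean by fitting residuals (the toy data and the choice of shallow regression trees are my own, not taken from XGBoost itself):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# The first weak learner is fit to the observations y themselves.
f0 = DecisionTreeRegressor(max_depth=2).fit(X, y)

# The second learner is fit to the residuals y - f0(X), i.e. the errors of f0,
# rather than to y directly.
residuals = y - f0.predict(X)
f1 = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
```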

  • Which other algorithms also train single or multiple learners to fit residuals?
  • Does this approach only make sense for learners built in a sequence, or does it also apply to other ensemble methods?
  • Is there a deep significance to fitting residuals, or is it a mathematical convenience that allows for iterative improvement from learners built in succession?
Bobby
  • I think the answer to the last question is that the deep significance is the convenience of minimising error. – Nikos M. Jul 27 '20 at 20:09
  • Have you read the original papers introducing those algorithms, and the theorem stating that a strong classifier can be built from many weaker classifiers? – Nikos M. Jul 27 '20 at 20:11
  • @NikosM. It would certainly be interesting to read the original papers, but I'm not sure that they would answer my question. A review of many types of ML algorithms might answer whether the models fit the residuals instead of the observations directly, but then again, that level of detail may not be the focus of a paper comparing many types of algorithms. – Bobby Jul 27 '20 at 21:12

1 Answer


XGBoost is trained with boosting, a general ensemble method that creates a strong model from a number of weak models. A first model is built on the training dataset; then a second model is created that attempts to correct the errors of the first. Models are added until the training set is predicted perfectly or a maximum number of models is reached.
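As a rough illustration of that procedure, here is a minimal hand-rolled boosting loop for squared error; the weak learner, number of rounds and learning rate are arbitrary choices for the sketch, not XGBoost's actual algorithm (which adds regularisation and second-order information):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosted(X, y, n_rounds=50, learning_rate=0.1, max_depth=2):
    base = y.mean()                          # start from a constant model
    prediction = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):
        residuals = y - prediction           # errors of the ensemble built so far
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        prediction += learning_rate * tree.predict(X)  # correct the previous models
        trees.append(tree)
    return base, trees

def predict_boosted(base, trees, X, learning_rate=0.1):
    out = np.full(X.shape[0], base)
    for tree in trees:
        out += learning_rate * tree.predict(X)
    return out
```

Each new tree is trained on what the current ensemble still gets wrong, which is exactly the "correct the errors of the previous model" step described above.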

Almost all machine learning models based on boosting follow the above procedure. For example, AdaBoost is one of the earliest boosting algorithms, developed for binary classification, and libraries such as LightGBM and CatBoost are also built on this boosting scheme.
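As a usage sketch (the dataset and parameters here are placeholders of my own; xgboost, lightgbm and catboost are separate packages), all of these boosting libraries expose the same scikit-learn style fit/predict interface:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

X, y = make_classification(n_samples=500, random_state=0)

models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=100),
    "XGBoost": XGBClassifier(n_estimators=100),
    "LightGBM": LGBMClassifier(n_estimators=100),
    "CatBoost": CatBoostClassifier(n_estimators=100, verbose=0),
}

for name, model in models.items():
    model.fit(X, y)                    # each library builds its trees by boosting
    print(name, model.score(X, y))     # training accuracy, just for illustration
```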

In the bagging algorithm, unlike boosting, the models are independent and each model is fitted directly to a bootstrap subset of the original training data.
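For contrast, here is a minimal bagging sketch (again my own illustration rather than any particular library's implementation): each model is fitted independently to a bootstrap resample of the original targets, never to residuals, and the predictions are simply averaged:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_bagged(X, y, n_models=50, seed=0):
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(y), size=len(y))  # bootstrap sample of the data
        models.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
    return models

def predict_bagged(models, X):
    # average of independent predictions, not a sum of successive corrections
    return np.mean([m.predict(X) for m in models], axis=0)
```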

nimar
  • It makes sense that all algorithms based on boosting fit residuals rather than observations directly. A random forest, however, fits the observations directly. Are there any models not based on boosting that fit residuals rather than observations directly? – Bobby Jul 27 '20 at 21:10
  • 1
    @Bobby as far as I understand, fitting residuals *is* boosting. – Itamar Mushkin Jul 28 '20 at 06:13
  • As far as I know, only boosting methods are fitted in this way. – nimar Jul 28 '20 at 17:37