
Is it possible to use XGBoost regressor to do non-linear regressions?

I know of the linear and logistic objectives.
The linear objective works very well with the gblinear booster.
This made me wonder whether it is possible to use XGBoost for non-linear regressions such as logarithmic or polynomial regression.

a) Is it generally possible to do polynomial regression, as with a CNN, where XGBoost approximates the data by generating a polynomial function of degree n?
b) If a) is generally not possible, would it be possible to declare a curve with its parameters and let XGBoost figure out the values of those parameters? For example, assume we guess that the curve can be approximated with:

$$ 10^{a\log_{k}({x})-b} $$

XGBoost would have to figure out $a$, $k$, and $b$. $x$ would be a given feature.

Harrys Kavan
    As a heads up, polynomial regression is a type of linear regression. You might want to watch [this](https://youtube.com/watch?v=rVviNyIR-fI) video by MathematicalMonk (Jeff Miller). – Dave Dec 04 '21 at 15:08

1 Answer


Boosting is just a special way to fit some model by trying to successively/repeatedly "explain" the residual. See a minimal example for a linear booster here. So essentially the xgboost model with gblinear will be a "normal" linear model.
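To illustrate why boosting a linear base learner ends up as an ordinary linear model, here is a minimal sketch using only NumPy. It is not XGBoost's actual gblinear implementation (which also applies regularization and uses its own updater); the data, `learning_rate`, and number of rounds are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-2.0, 2.0, size=n)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=n)  # synthetic linear data

X = np.column_stack([np.ones(n), x])  # design matrix with intercept column

# Reference: ordinary least squares fitted in one shot.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Boosting: start from zero, repeatedly fit a linear model to the current
# residual, and add a shrunken copy of that fit to the running model.
learning_rate = 0.3  # assumed shrinkage factor
beta_boost = np.zeros(2)
for _ in range(100):
    residual = y - X @ beta_boost
    step, *_ = np.linalg.lstsq(X, residual, rcond=None)
    beta_boost += learning_rate * step

print("OLS coefficients:    ", beta_ols)
print("Boosted coefficients:", beta_boost)
```

Because a sum of linear functions is again linear, the boosted coefficients converge to the least-squares solution; no amount of boosting rounds adds curvature.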

From your question, I would not expect a linear booster to give good results for your problem. If you want to use models other than neural networks, you have several options.

  • Use boosting with a tree-based booster (gbtree). This fits a model that is essentially "non-parametric". However, the success of this strategy will depend on the explanatory power of your "x" variables (which you did not mention in the question).
  • Use linear-style models with a more general structure (i.e. generalised additive models, GAM). This model family is extremely well suited to fitting highly non-linear functions. Find a minimal example here. There are GAM implementations for Python and R. My minimal example would yield the following result (see figure). The blue line is a "normal" linear model, the black line is a fitted GAM (red is the ground truth).
  • If you know the parameterization of your model (more or less), you could also define a linear model (with a proper parameterization) to fit it. However, this seems to be a less attractive solution, since it can be daunting to find a proper representation for the data.
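As a sketch of the last option applied to the curve from the question: since $\log_k(x) = \log_{10}(x) / \log_{10}(k)$, taking $\log_{10}$ of $y = 10^{a\log_k(x) - b}$ gives $\log_{10} y = \frac{a}{\log_{10} k}\log_{10} x - b$, which is linear in $\log_{10} x$. Note that only the ratio $a / \log_{10} k$ can be recovered, not $a$ and $k$ separately. The example below uses NumPy only; the "true" parameter values and the noise level are hypothetical and chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth parameters, for illustration only.
a_true, k_true, b_true = 2.0, 100.0, 0.5

x = rng.uniform(1.0, 50.0, size=300)
log_k_x = np.log(x) / np.log(k_true)               # log base k of x
y = 10.0 ** (a_true * log_k_x - b_true)
y *= 10.0 ** rng.normal(scale=0.01, size=x.size)   # multiplicative noise

# In log-log space the curve is a straight line:
#   log10(y) = slope * log10(x) + intercept
# with slope = a / log10(k) and intercept = -b.
slope, intercept = np.polyfit(np.log10(x), np.log10(y), deg=1)
b_hat = -intercept

print("estimated a / log10(k):", slope)
print("estimated b:           ", b_hat)
```

With this transformation an ordinary linear fit (or a gblinear booster on the transformed feature and target) is enough; the non-linearity lives entirely in the chosen parameterization.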

Introduction to Statistical Learning (ISL) provides a good overview of GAM models if you want to have a further look. There are also Python examples.

[Figure: linear model (blue), fitted GAM (black), ground truth (red)]

Peter
  • Thank you for your answer, Peter. I asked because we got an assignment at university. Everybody got an ML task. I got XGBoost, which I used successfully for classification. I've talked with a colleague who has LSTM for a time series problem. Then I saw that XGBoost can do regression and gave it a shot with simple functions (`kx+d` style). After some success with those, I tried to get into more complicated ones and hit a wall. So that was my motivation for this question. I'll upload a notebook somewhere to make my question more detailed. Your answer is still highly appreciated. – Harrys Kavan Dec 05 '21 at 09:14