Is there a way to add more importance to points which are more recent when analyzing data with xgboost?
3 Answers
Just add weights based on your time labels to your xgb.DMatrix. The following example is written in R, but the same principle applies to xgboost in Python or Julia.
library(xgboost)

data <- data.frame(feature = rep(5, 5),
                   year = seq(2011, 2015),
                   target = c(1, 0, 1, 0, 0))

# Older years get linearly smaller weights: 2015 -> 1.00, ..., 2011 -> 0.80
weightsData <- 1 + (data$year - max(data$year)) * 5 * 0.01

# Now create the xgboost matrix with your data and weights
xgbMatrix <- xgb.DMatrix(as.matrix(data$feature),
                         label = data$target,
                         weight = weightsData)
– wacax
Thanks for your answer - it's really helpful to see a coded example. How does the magnitude of the weighting function coefficients affect the model? I looked through the xgboost docs, but I can't find information about the significance of these numerical values. – kilojoules Dec 23 '15 at 19:29
Didn't know this trick, nice. There's a little tidbit in the xgboost doc under the function `setinfo()`, though it's not very descriptive – TBSRounder Dec 24 '15 at 15:39
In Python you have a nice scikit-learn wrapper, so you can write just like this:
import xgboost as xgb

# The scikit-learn wrapper accepts per-row weights via the sample_weight argument
exgb_classifier = xgb.XGBClassifier()
exgb_classifier.fit(X, y, sample_weight=sample_weights_data)
You can find more information here: http://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBClassifier.fit
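For instance, sample_weights_data could be derived from a year column in the same spirit as the R example above; a minimal sketch, where the years array and the 0.05 decay rate are purely illustrative assumptions:

import numpy as np

# Hypothetical year of each training row (must align with the rows of X)
years = np.array([2011, 2012, 2013, 2014, 2015])

# Most recent year gets weight 1.0, each earlier year 0.05 less
sample_weights_data = 1 + (years - years.max()) * 0.05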
– lucidyan
That should be `xgb.XGBClassifier()` in the second line of code, but stackexchange does not allow edits of less than six characters... – Andre Holzner Jul 18 '17 at 10:05
You could try building multiple xgboost models, with some of them limited to more recent data, and then weight their predictions together. Another idea would be a customized evaluation metric that penalizes errors on recent points more heavily, which would give them more importance.
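As a rough sketch of the second idea, here is a recency-weighted error used as a custom evaluation metric with the native Python API. The toy data, the year column, and the 0.05 decay rate are all assumptions for illustration, and note that such a metric only guides evaluation and early stopping rather than the training gradients themselves:

import numpy as np
import xgboost as xgb

# Toy data; `years` is assumed to accompany the training rows
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = rng.randint(0, 2, size=100)
years = rng.choice(np.arange(2011, 2016), size=100)

# Most recent year counts most in the metric
recency_w = 1 + (years - years.max()) * 0.05

dtrain = xgb.DMatrix(X, label=y)

def recency_weighted_error(preds, dmat):
    # Classification error where mistakes on recent rows cost more
    labels = dmat.get_label()
    wrong = ((preds > 0.5).astype(float) != labels).astype(float)
    return 'recency_error', float(np.sum(recency_w * wrong) / np.sum(recency_w))

# Valid here because the eval set is the training set, so recency_w lines up with it
booster = xgb.train({'objective': 'binary:logistic'}, dtrain,
                    num_boost_round=20,
                    evals=[(dtrain, 'train')],
                    feval=recency_weighted_error)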
– TBSRounder
The OP can simply give higher sample weights to more recent observations. Most packages allow this, [as does xgboost.](http://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBClassifier.fit) – Ricardo Magalhães Cruz Aug 11 '17 at 08:55