10

I have a very imbalanced dataset, with a ratio of positive to negative samples of 1:496. The scoring metric is the F1 score, and the model I want to use is LightGBM through its scikit-learn API (LGBMClassifier).

I have read the docs on the class_weight parameter in LightGBM:

class_weight : dict, 'balanced' or None, optional (default=None) Weights associated with classes in the form {class_label: weight}. Use this parameter only for multi-class classification task; for binary classification task you may use is_unbalance or scale_pos_weight parameters. The 'balanced' mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)). If None, all classes are supposed to have weight one. Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

When I used the class_weight parameter on my dataset, which is a binary classification problem, I got a much better F1 score (0.7899) than when I used the recommended scale_pos_weight parameter (0.2388). Should I use class_weight or scale_pos_weight to balance the classes?
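A minimal sketch of the kind of comparison described above, assuming an LGBMClassifier and cross-validated F1 as the metric; the dataset here is synthetic and the class ratio is only illustrative, not the asker's actual 1:496 data:

    import numpy as np
    from lightgbm import LGBMClassifier
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score

    # Synthetic stand-in for an imbalanced binary problem (roughly 1:99 here).
    X, y = make_classification(n_samples=20000, n_features=20,
                               weights=[0.99, 0.01], random_state=0)
    n_neg, n_pos = np.bincount(y)

    # Reweighting via class_weight='balanced' ...
    clf_cw = LGBMClassifier(class_weight="balanced", random_state=0)

    # ... versus an explicit scale_pos_weight for the positive class.
    clf_spw = LGBMClassifier(scale_pos_weight=n_neg / n_pos, random_state=0)

    for name, clf in [("class_weight", clf_cw), ("scale_pos_weight", clf_spw)]:
        scores = cross_val_score(clf, X, y, scoring="f1", cv=5)
        print(name, scores.mean())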

Ethan
  • I haven't tried this myself, but the docs say to use 'is_unbalance', which I presume is the "class_weight" equivalent. Note that your probabilities will be haywire after this, and you will want to calibrate if you intend to use the probabilities. – ngopal Nov 26 '19 at 23:24
  • Cross-posted at https://stats.stackexchange.com/q/413596/232706 – Ben Reiniger Feb 24 '20 at 00:55
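Following up on ngopal's comment about calibrating the probabilities after reweighting, a minimal sketch of wrapping a reweighted LGBMClassifier in scikit-learn's CalibratedClassifierCV; the data and parameter values are illustrative assumptions:

    from lightgbm import LGBMClassifier
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=20000, n_features=20,
                               weights=[0.99, 0.01], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=0)

    # The reweighted model ranks well, but its raw probabilities are shifted
    # by the class weights, so calibrate before treating them as probabilities.
    base = LGBMClassifier(class_weight="balanced", random_state=0)
    calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=3)
    calibrated.fit(X_train, y_train)
    proba = calibrated.predict_proba(X_test)[:, 1]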

1 Answer

13

For binary classification on an unbalanced dataset, you can achieve the same results with any of class_weight, scale_pos_weight, or is_unbalance.

Setting

class_weight = {0: 1.0, 
                1: (number of negative samples / number of positive samples)}

is the same as setting is_unbalance = True or scale_pos_weight = (number of negative samples / number of positive samples): in each case the positive class ends up weighted (number of negative samples / number of positive samples) times as heavily as the negative class.
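A minimal sketch of the three equivalent parametrisations, assuming the class counts are taken from the training labels; the dataset below is synthetic and only for illustration:

    import numpy as np
    from lightgbm import LGBMClassifier
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=20000, n_features=20,
                               weights=[0.99, 0.01], random_state=0)
    n_neg, n_pos = np.bincount(y)

    # Each of these weights the positive class n_neg / n_pos times as
    # heavily as the negative class.
    clf_cw = LGBMClassifier(class_weight={0: 1.0, 1: n_neg / n_pos}, random_state=0)
    clf_ub = LGBMClassifier(is_unbalance=True, random_state=0)
    clf_spw = LGBMClassifier(scale_pos_weight=n_neg / n_pos, random_state=0)

    for clf in (clf_cw, clf_ub, clf_spw):
        clf.fit(X, y)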

Mayank Mahawar