
As per my understanding from books and Google searches:

GOSS (Gradient-based One-Side Sampling) is a novel sampling method that downsamples the instances on the basis of their gradients. As we know, instances with small gradients are already well trained (small training error), while those with large gradients are undertrained. A naive way to downsample would be to discard the instances with small gradients and focus solely on the instances with large gradients, but this would alter the data distribution. In a nutshell, GOSS retains the instances with large gradients while performing random sampling on the instances with small gradients. (Source)

LightGBM uses a novel technique, Gradient-based One-Side Sampling (GOSS), to filter the data instances used for finding a split value, while XGBoost uses a pre-sorted algorithm and a histogram-based algorithm for computing the best split.
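To make my current understanding concrete, here is a rough sketch of the sampling idea as I picture it (the array names and fractions are just illustrative placeholders, not LightGBM's actual implementation):

```python
import numpy as np

# Rough sketch of the GOSS idea as I understand it: keep every instance
# whose gradient is large, and only a random sample of the rest.
gradients = np.abs(np.random.randn(1000))      # stand-in for per-instance gradients
order = np.argsort(-gradients)                 # largest gradient first

keep_frac, sample_frac = 0.2, 0.1              # illustrative fractions
n_keep = int(keep_frac * len(gradients))
large_idx = order[:n_keep]                     # always kept
small_idx = np.random.choice(order[n_keep:],
                             size=int(sample_frac * len(gradients)),
                             replace=False)    # randomly sampled
subset = np.concatenate([large_idx, small_idx])
```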

Can someone please explain the math behind GOSS?


2 Answers


Wang et al. (2019) provide a nice and clear explanation. Please check out their paper to find the answer you are looking for:

Part II. BAYESIAN OPTIMIZED LIGHTGBM

Section: A. The Principle of the LightGBM

Wang, R., Liu, Y., Ye, X., Tang, Q., Gou, J., Huang, M., & Wen, Y. (2019). Power System Transient Stability Assessment Based on Bayesian Optimized LightGBM. 2019 IEEE 3rd Conference on Energy Internet and Energy System Integration (EI2). doi: 10.1109/ei247390.2019.9062027


GOSS is a technique used in LightGBM to select a subset of the data to use when training a gradient-boosting model. It works by first sorting the training instances by the absolute values of the gradients of the loss function with respect to the current model's predictions, and then selecting a subset of the data based on the magnitude of these gradients.

To understand the math behind GOSS in more detail, it helps to first review some key concepts behind gradient boosting. In gradient boosting, a model is trained by iteratively adding weak learners (e.g., decision trees), with each new learner being fit to the negative gradients of the loss with respect to the current ensemble's predictions (for squared-error loss, these are just the residual errors of the previous learners). This process continues until the model reaches a predefined stopping criterion.
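As a minimal sketch of that loop, assuming squared-error loss (so the negative gradients are simply the residuals) and using scikit-learn's DecisionTreeRegressor purely for convenience:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbm(X, y, n_rounds=100, learning_rate=0.1, max_depth=3):
    """Toy gradient boosting with squared-error loss: each new tree is fit
    to the negative gradients of the loss, which here are just the residuals."""
    base = float(np.mean(y))                  # initial constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred                  # negative gradient of 0.5 * (y - pred)^2
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        pred += learning_rate * tree.predict(X)
        trees.append(tree)
    return base, trees

def predict_gbm(base, trees, X, learning_rate=0.1):
    """Sum the base value and the scaled contribution of every tree."""
    return base + learning_rate * sum(tree.predict(X) for tree in trees)
```

GOSS plugs into the step where each tree is fit: instead of fitting on all the rows, it fits on a gradient-aware subsample.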

At each iteration, GOSS selects a subset of the training data based on the gradients of the loss function with respect to the current model.

In detail, GOSS uses a two-step process to determine the subset of data instances to consider for split points (a rough code sketch follows the list):

  1. For each data instance, the algorithm computes the absolute value of its gradient and sorts the instances by it. The sorted list is then divided into two parts: the top (i.e. largest) $K$ gradients and the bottom (i.e. smallest) $N-K$ gradients. The top $K$ are treated as the large gradients and the bottom $N-K$ as the small gradients.

  2. For the large gradients, the algorithm includes all of the corresponding data instances in the subset considered for split points. For the small gradients, it randomly samples a fixed fraction of the data instances to include in the subset. The two fractions are controlled by sampling-ratio parameters (in LightGBM, top_rate for the large-gradient share and other_rate for the small-gradient share), and the sampled small-gradient instances are up-weighted by a constant factor $\frac{1-a}{b}$, where $a$ and $b$ are the two fractions, so that the estimated information gain stays approximately unbiased.
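Here is a minimal numpy sketch of that selection step, along the lines of Algorithm 2 in the LightGBM paper (the function name and defaults are mine; `top_rate` and `other_rate` mirror LightGBM's GOSS parameters of the same names):

```python
import numpy as np

def goss_select(gradients, top_rate=0.2, other_rate=0.1, rng=None):
    """Keep the top_rate fraction of instances with the largest |gradient|,
    randomly sample an other_rate fraction of the remainder, and up-weight
    the sampled small-gradient instances by (1 - top_rate) / other_rate."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(gradients)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)

    order = np.argsort(-np.abs(gradients))        # largest |gradient| first
    top_idx = order[:n_top]                       # large gradients: always kept
    other_idx = rng.choice(order[n_top:], size=n_other, replace=False)

    idx = np.concatenate([top_idx, other_idx])
    weights = np.ones(len(idx))
    weights[n_top:] = (1.0 - top_rate) / other_rate   # amplification factor
    return idx, weights
```

The returned indices and weights would then be handed to the tree learner, which uses the weighted gradient sums when scoring candidate split points.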

In summary, the basic idea behind GOSS is to sample instances in a way that takes into account the gradient of the loss function with respect to the predictions made by the model. This is done by retaining a certain percentage of instances with the largest gradients, while randomly sampling the remaining instances. This allows the model to focus on instances that are more important for training, while still maintaining a diverse training set.

The exact details of the GOSS algorithm can be found in the LightGBM paper, LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In particular, GOSS is described in Section 3 of the paper, where Algorithm 2 gives the sampling procedure and Section 3.2 the theoretical analysis.
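For the math itself, the key quantity in that analysis is the estimated variance gain computed on the sampled subset. Writing $A$ for the retained large-gradient instances, $B$ for the randomly sampled small-gradient instances, $a$ and $b$ for the two sampling fractions, and $g_i$ for the gradients, the gain of splitting feature $j$ at point $d$ is estimated (as I read Section 3.2) as

$$\tilde{V}_j(d) = \frac{1}{n}\left(\frac{\left(\sum_{x_i \in A_l} g_i + \frac{1-a}{b}\sum_{x_i \in B_l} g_i\right)^2}{n_l^j(d)} + \frac{\left(\sum_{x_i \in A_r} g_i + \frac{1-a}{b}\sum_{x_i \in B_r} g_i\right)^2}{n_r^j(d)}\right),$$

where the subscripts $l$ and $r$ denote the instances falling to the left and right of the split point, and the $\frac{1-a}{b}$ factor re-weights the sampled small-gradient instances so that $\tilde{V}_j(d)$ approximates the gain that would have been computed on the full data.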
