GOSS (Gradient-based One-Side Sampling) is a technique used in LightGBM to select a subset of the training data at each iteration of a gradient boosting model. It works by ranking the training instances by the absolute values of their gradients of the loss function with respect to the current model's predictions, and then selecting a subset of the data based on the magnitude of these gradients.
To understand the math behind GOSS in more detail, it helps to first review some key concepts of gradient boosting. In gradient boosting, a model is trained by iteratively adding weak learners (e.g., decision trees), with each new learner fit to the negative gradients (pseudo-residuals) of the loss function with respect to the current predictions; for squared-error loss these are simply the residual errors of the previous learners. This process continues until the model reaches a predefined stopping criterion, such as a maximum number of trees.
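As a rough illustration of that loop, here is a minimal sketch of gradient boosting with decision trees under a squared-error loss. The synthetic dataset, learning rate, and tree depth are arbitrary choices for the example, not anything prescribed by LightGBM:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=500)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start from a constant model
trees = []

for _ in range(100):
    # For squared-error loss, the negative gradient is simply the residual.
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)
```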
At each iteration, GOSS selects a subset of the training data based on the gradients of the loss function with respect to the current model's predictions.
In detail, GOSS uses a two-step process to determine the subset of data instances to consider for split points:
First, the algorithm computes the gradient for every data instance and sorts the instances by the absolute value of their gradients in descending order. This sorted list is then divided into two parts: the top (i.e., largest-magnitude) $K$ gradients and the bottom (i.e., smallest-magnitude) $N-K$ gradients, where $N$ is the total number of instances. The top $K$ gradients are considered large gradients and the bottom $N-K$ gradients are considered small gradients.
Second, for the large gradients, the algorithm keeps all of the corresponding data instances in the subset used to evaluate split points. For the small gradients, the algorithm randomly samples a fixed fraction of the corresponding instances to include in the subset. In the paper these two fractions are denoted $a$ (large-gradient instances kept) and $b$ (small-gradient instances sampled); in LightGBM they correspond to the top_rate and other_rate parameters, not the bagging fraction. To compensate for the down-sampling, the gradients of the sampled small-gradient instances are amplified by the constant $\frac{1-a}{b}$ when the information gain of a split is computed, which keeps the gain estimate approximately unbiased. A sketch of this sampling step is shown below.
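Here is a minimal NumPy sketch of that sampling step, using the paper's notation: `a` is the fraction of large-gradient instances kept and `b` the fraction of small-gradient instances sampled. The function name and interface are invented for illustration:

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=None):
    """Select a GOSS subset: keep the top a fraction of instances by
    absolute gradient, randomly sample a b fraction of the rest, and
    return the chosen indices plus per-instance weights."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(gradients)
    top_k = int(a * n)
    rand_k = int(b * n)

    # Sort instances by |gradient|, largest first.
    order = np.argsort(np.abs(gradients))[::-1]
    top_idx = order[:top_k]          # large-gradient instances: keep all
    rest_idx = order[top_k:]         # small-gradient instances: sample
    sampled_idx = rng.choice(rest_idx, size=rand_k, replace=False)

    # Sampled small-gradient instances are up-weighted by (1 - a) / b
    # so that split gains estimated on the subset stay roughly unbiased.
    weights = np.concatenate([
        np.ones(top_k),
        np.full(rand_k, (1.0 - a) / b),
    ])
    return np.concatenate([top_idx, sampled_idx]), weights
```

In the paper's formulation, these weights scale the contribution of the sampled small-gradient instances when the variance gain of a candidate split is evaluated, which is what keeps the gain computed on the subset close to the gain that would be computed on the full data.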
In summary, the basic idea behind GOSS is to sample instances in a way that takes into account the gradient of the loss function with respect to the model's predictions: instances with large gradients are under-trained and contribute more to the information gain, so they are all retained, while the remaining instances are randomly down-sampled and re-weighted. This lets the model focus on the instances that are most important for training while still maintaining a representative sample of the full data distribution.
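For context, this is roughly how GOSS is enabled through LightGBM's Python API. The parameter spelling differs across versions (older releases use boosting_type="goss", while LightGBM 4.x prefers data_sample_strategy="goss"), so treat this as a sketch rather than a version-specific recipe:

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=1000)

params = {
    "objective": "regression",
    "boosting_type": "goss",  # newer versions: data_sample_strategy="goss"
    "top_rate": 0.2,          # fraction of large-gradient instances kept (a)
    "other_rate": 0.1,        # fraction of small-gradient instances sampled (b)
    "verbose": -1,
}

train_set = lgb.Dataset(X, label=y)
booster = lgb.train(params, train_set, num_boost_round=100)
```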
The exact details of the GOSS algorithm can be found in the LightGBM paper, "LightGBM: A Highly Efficient Gradient Boosting Decision Tree" (NeurIPS 2017). In particular, GOSS is described in Section 3 of the paper (Algorithm 2).