Questions tagged [probability-calibration]
48 questions
18
votes
4 answers
XGBoost outputs tend towards the extremes
I am currently using XGBoost for risk prediction. It seems to be doing a good job in the binary classification department, but the probability outputs are way off, i.e., changing the value of a feature in an observation by a very small amount can…
alwayslearning
- 181
- 4
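The standard first remedy for over-confident boosted-tree scores is post-hoc calibration on held-out folds. A minimal sketch, with synthetic data standing in for the asker's risk-prediction features (the data and model settings below are illustrative, not from the question):

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Illustrative stand-in for the asker's risk-prediction data.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Re-map the booster's scores to probabilities on internal CV folds;
# "isotonic" needs plenty of data, "sigmoid" (Platt) is safer on small sets.
calibrated = CalibratedClassifierCV(XGBClassifier(n_estimators=200), method="isotonic", cv=5)
calibrated.fit(X_train, y_train)
probs = calibrated.predict_proba(X_test)[:, 1]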
12
votes
1 answer
Are the raw probabilities obtained from XGBoost representative of the true underlying probabilities?
1) Is it feasible to use the raw probabilities obtained from XGBoost, e.g., probabilities obtained within the range of 0.4-0.5, as a true representation of an approximately 40%-50% chance of an event occurring? (assuming we have an accurate model)
2)…
Gale
- 403
- 1
- 4
- 13
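Whether scores in the 0.4-0.5 range really correspond to a 40%-50% event rate is an empirical question, answerable with a reliability diagram on held-out data. A sketch using a toy, perfectly calibrated score vector standing in for real model output:

import numpy as np
from sklearn.calibration import calibration_curve

# Toy held-out scores; by construction the labels match the probabilities.
rng = np.random.default_rng(0)
y_prob = rng.uniform(size=2000)
y_true = (rng.uniform(size=2000) < y_prob).astype(int)

# Observed event rate vs. mean predicted probability per bin;
# for a calibrated model the two columns track each other.
frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)
for mp, fp in zip(mean_pred, frac_pos):
    print(f"predicted ~{mp:.2f} -> observed {fp:.2f}")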
7
votes
1 answer
Are calibrated probabilities always more reliable?
EDIT: Based on the answer below, I have updated the question and added more detail.
I have applied Dirichlet calibration to my fast-bert sentiment classification model, and I am struggling to really understand why (or if) it is actually more reliable.…
Danyal Andriano
- 131
- 3
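One concrete way to decide whether a calibration step helped is to compute the expected calibration error (ECE) before and after it. A sketch of the usual equal-width-bin estimator; the helper name and binning choice are ours, not from the question:

import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Weighted mean |confidence - accuracy| over equal-width bins."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    bin_ids = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            ece += mask.mean() * abs(y_prob[mask].mean() - y_true[mask].mean())
    return ece

# A lower ECE after Dirichlet calibration would support "more reliable".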
6
votes
2 answers
Probability Calibration: role of hidden layer in Neural Network
I am trying a simple neural network (logistic regression) to play with Keras.
As input I have 5,000 features (the output of a simple tf-idf vectorizer), and in the output layer I just use a random uniform initialization and an $\alpha = 0.0001$ for $L_{2}$…
BimBimBap
- 81
- 1
- 3
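For reference, logistic regression in Keras is a single sigmoid unit with no hidden layer; adding any hidden layer makes the model strictly more expressive than logistic regression. A minimal sketch matching the question's setup (5,000 tf-idf features, $L_{2}$ strength 0.0001); the optimizer and loss choices are assumptions:

from tensorflow import keras
from tensorflow.keras import layers, regularizers

# Logistic regression: one sigmoid unit, no hidden layer.
model = keras.Sequential([
    keras.Input(shape=(5000,)),  # tf-idf feature vector
    layers.Dense(1, activation="sigmoid",
                 kernel_regularizer=regularizers.l2(1e-4)),  # the question's alpha
])
model.compile(optimizer="adam", loss="binary_crossentropy")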
5
votes
1 answer
XGBoost: how to adjust the probabilities of a binary classifier to match training data?
Training and testing data have around 1% positives, but the model predicts only around 0.1% as positives.
The model is an xgboost classifier.
I’ve tried calibration but it didn’t improve much. I also don’t want to pick thresholds since the final…
Henrique Nader
- 511
- 2
- 5
- 15
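Part of the puzzle may be that "predicts only around 0.1% as positives" counts hard predictions at the default 0.5 cut-off; with a 1% base rate, a well-calibrated model should rarely cross 0.5 at all. A more threshold-free check is whether the probabilities themselves sum to the expected number of positives. A sketch with stand-in scores (the Beta draw is purely illustrative):

import numpy as np

# p = model.predict_proba(X_test)[:, 1]  # the asker's fitted classifier
p = np.random.default_rng(0).beta(0.3, 25.0, size=10_000)  # stand-in scores

print("hard positives at 0.5:", (p >= 0.5).mean())  # often (near) zero ...
print("mean predicted probability:", p.mean())      # ... while this sits near 1%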
5
votes
2 answers
Converting predict_proba results when using class_weight in training
As my dataset is unbalanced (class 1: 5%, class 0: 95%), I have used the class_weight="balanced" parameter to train a random forest classification model. In this way I penalize the misclassification of the rare positive cases.
rf =…
srl
- 51
- 1
- 2
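Class weighting shifts the effective training prior toward the rare class, so predict_proba comes out inflated for it. If the weighting acted like resampling (an assumption; for random forests it is only approximate, and cross-validated recalibration is the more robust fix), the Bayes prior-shift correction maps the scores back:

import numpy as np

def prior_correct(p, train_prior, true_prior):
    """Map P(y=1|x) from the training prior back to the true prior."""
    num = p * true_prior / train_prior
    den = num + (1 - p) * (1 - true_prior) / (1 - train_prior)
    return num / den

p_balanced = np.array([0.30, 0.50, 0.80])  # outputs under balanced weights
# class_weight="balanced" behaves like a 50/50 prior; the true rate is 5%.
print(prior_correct(p_balanced, train_prior=0.5, true_prior=0.05))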
3
votes
1 answer
Why does my calibration curve for Platt scaling and isotonic regression have fewer points than my uncalibrated model?
I train a model using grid search, then use the best parameters from it to define my chosen model.
model = XGBClassifier()
pipeline = make_pipeline(model)
kfolds = StratifiedKFold(3)
clf = GridSearchCV(pipeline, parameters,…
Maths12
- 496
- 5
- 14
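The usual explanation: isotonic regression (and, more mildly, Platt scaling) maps many raw scores onto a small set of output values, so several calibration-curve bins end up empty and calibration_curve silently drops them. A small sketch of the collapse, on illustrative data:

import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
raw = rng.uniform(size=1000)                       # stand-in for raw scores
y = (rng.uniform(size=1000) < raw ** 2).astype(int)

iso = IsotonicRegression(out_of_bounds="clip").fit(raw, y)
calibrated = iso.predict(raw)

# Isotonic output is a step function: far fewer distinct values,
# hence fewer non-empty bins on the calibration curve.
print(len(np.unique(raw)), "raw values ->", len(np.unique(calibrated)), "calibrated values")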
3
votes
1 answer
How to determine the correct target for classification probability when the observed samples are probabilities of each class?
I have data in which each event's outcome can be described by a probability of a categorical occurrence. For example, if all of the possible class outcomes are A, B, C, or D, suppose in one event 7/10 people selected category A, 2/10 selected…
user3327134
- 33
- 3
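When each observation's outcome is itself a class distribution, the common approach is to train against those soft targets directly with cross-entropy rather than collapsing them to a single hard label. A hedged Keras sketch (shapes and data are illustrative):

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Soft targets: each row is an observed distribution over classes A-D.
X = np.random.default_rng(0).normal(size=(100, 8))
y_soft = np.tile([0.7, 0.2, 0.1, 0.0], (100, 1))  # e.g. 7/10, 2/10, 1/10, 0/10

model = keras.Sequential([
    keras.Input(shape=(8,)),
    layers.Dense(4, activation="softmax"),
])
# categorical_crossentropy accepts probability vectors, not just one-hot labels.
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.fit(X, y_soft, epochs=5, verbose=0)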
3
votes
1 answer
I have 3 graphs of a binary logistic regression; I want to understand what is happening and find a strategy to make the model better
My problem is the following: I have a binary logistic regression model with a very imbalanced dataset that outputs the prediction as a percentage. As can be seen in the images, as the threshold is increased there is a certain point at which it stops…
Gabriel Almeida
- 31
- 2
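Without the images, one can only sketch the standard diagnostic: sweep the threshold and plot precision and recall together, which makes the point where one curve flattens easy to locate. The variable names below are assumed, not from the question:

import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

def plot_threshold_sweep(y_test, y_prob):
    """Precision and recall as functions of the decision threshold."""
    precision, recall, thresholds = precision_recall_curve(y_test, y_prob)
    plt.plot(thresholds, precision[:-1], label="precision")
    plt.plot(thresholds, recall[:-1], label="recall")
    plt.xlabel("decision threshold")
    plt.legend()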
3
votes
0 answers
How to explain a Calibration Plot for many models?
I have a heavily imbalanced dataset with a classification problem. I am trying to plot the calibration curve using the sklearn.calibration package. Specifically, I try the following models:
rft = RandomForestClassifier(n_estimators=1000)
svc = SVC(probability…
Tasos
- 3,860
- 4
- 22
- 54
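With several models, the readable layout is one set of axes with every reliability curve drawn against the diagonal. A sketch assuming each fitted model exposes predict_proba (note SVC needs probability=True) and that X_test, y_test are held out:

import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

def plot_reliability(models, X_test, y_test, n_bins=10):
    """Overlay reliability curves; a calibrated model hugs the diagonal."""
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1], "k--", label="perfectly calibrated")
    for name, model in models.items():
        prob = model.predict_proba(X_test)[:, 1]
        frac_pos, mean_pred = calibration_curve(y_test, prob, n_bins=n_bins)
        ax.plot(mean_pred, frac_pos, marker="o", label=name)
    ax.set_xlabel("mean predicted probability")
    ax.set_ylabel("observed fraction of positives")
    ax.legend()
    return ax

# plot_reliability({"rft": rft, "svc": svc}, X_test, y_test)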
3
votes
1 answer
Which loss function (if any) optimizes the calibration graph?
The calibration graph plots predicted versus actual probability (see http://scikit-learn.org/stable/modules/generated/sklearn.calibration.calibration_curve.html). Is it possible to optimize the linearity of that curve in terms of a loss function?…
Hanan Shteingart
- 329
- 1
- 7
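The answer usually given: strictly proper scoring rules such as log loss and the Brier score are minimized in expectation only by the true probabilities, so optimizing them pushes the reliability curve toward the diagonal. A sketch comparing both on the same toy predictions:

import numpy as np
from sklearn.metrics import brier_score_loss, log_loss

rng = np.random.default_rng(0)
y_prob = rng.uniform(size=1000)
y_true = (rng.uniform(size=1000) < y_prob).astype(int)

# Both rules are strictly proper: distorting calibrated probabilities
# (here, inflating them) worsens the score.
print("Brier:", brier_score_loss(y_true, y_prob))
print("log loss:", log_loss(y_true, y_prob))
print("Brier, overconfident:", brier_score_loss(y_true, np.clip(y_prob * 1.5, 0, 1)))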
2
votes
1 answer
Calibrating probability thresholds for multiclass classification
I have built a network for the classification of three classes. The network consists of a CNN followed by two fully-connected layers. The CNN consists of convolutional layers, followed by batch normalization, a ReLU activation, max pooling and drop…
machinery
- 236
- 2
- 9
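For multiclass networks, the usual post-hoc fix is temperature scaling: divide the logits by one scalar T fitted on a validation set by minimizing the negative log-likelihood. A hedged NumPy/SciPy sketch; it assumes access to validation logits and labels, and the names and bounds are ours:

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import log_softmax

def fit_temperature(val_logits, val_labels):
    """Find T > 0 minimizing the NLL of softmax(logits / T)."""
    def nll(T):
        logp = log_softmax(val_logits / T, axis=1)
        return -logp[np.arange(len(val_labels)), val_labels].mean()
    return minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x

# Toy validation set with 3 classes and deliberately overconfident logits.
rng = np.random.default_rng(0)
val_logits = rng.normal(scale=4.0, size=(500, 3))
val_labels = rng.integers(0, 3, size=500)
print("fitted temperature:", fit_temperature(val_logits, val_labels))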
2
votes
1 answer
How can I tell if my model is overfitting from the distribution of predicted probabilities?
All,
I am training light gradient boosting and have used all of the necessary parameters to help with overfitting. I plot the distribution of predicted probabilities (i.e., the probability of having cancer) from the model (after calibrating using calibrated…
Maths12
- 496
- 5
- 14
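A histogram of test-set probabilities alone cannot show overfitting; comparing the train and test distributions, together with the train/test log-loss gap, can. A sketch assuming a fitted classifier clf with predict_proba (names are illustrative):

import matplotlib.pyplot as plt
from sklearn.metrics import log_loss

def compare_probability_distributions(clf, X_train, y_train, X_test, y_test):
    """Overlaid probability histograms plus the train/test log-loss gap."""
    p_train = clf.predict_proba(X_train)[:, 1]
    p_test = clf.predict_proba(X_test)[:, 1]
    plt.hist(p_train, bins=30, alpha=0.5, density=True, label="train")
    plt.hist(p_test, bins=30, alpha=0.5, density=True, label="test")
    plt.xlabel("predicted probability")
    plt.legend()
    # A large gap between these two numbers suggests overfitting.
    print("train log loss:", log_loss(y_train, p_train))
    print("test log loss:", log_loss(y_test, p_test))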
2
votes
0 answers
Imbalanced text classification by oversampling: correcting the predicted class probability by the prior probability
My dataset has 3 classes and 900 examples for training. The class distribution is 255, 185, and 460.
I found that if I oversample (random) the training data then I have to correct/calibrate the predicted probability of the test data because after…
user3363813
- 261
- 2
- 6
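The Bayes prior-shift correction extends directly to several classes: scale each predicted probability by the ratio of true to training prior, then renormalize each row. A sketch using the question's class counts and assuming random oversampling produced a uniform training prior:

import numpy as np

def correct_priors(probs, train_priors, true_priors):
    """Reweight class probabilities from the oversampled training prior
    back to the true prior, then renormalize rows."""
    adjusted = probs * (np.asarray(true_priors) / np.asarray(train_priors))
    return adjusted / adjusted.sum(axis=1, keepdims=True)

true_priors = np.array([255, 185, 460]) / 900  # observed class rates
train_priors = np.full(3, 1 / 3)               # uniform after oversampling
p_model = np.array([[0.5, 0.3, 0.2]])          # one test-time prediction
print(correct_priors(p_model, train_priors, true_priors))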
2
votes
0 answers
Platt Scaling vs Isotonic Regression for reliability curve
I am learning classifier probability calibration and have calibrated an elastic net model using both Platt scaling and isotonic regression. As you can see in the attached image, Platt scaling (on the bottom) better approximates the diagonal line…
yl637
- 21
- 2
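Which method "wins" is data-dependent: Platt scaling fits a two-parameter sigmoid (low variance, assumes a sigmoid-shaped distortion), while isotonic regression fits an arbitrary monotone step function (more flexible, but it needs more data). A sketch fitting both to the same elastic-net-style model; the data, model, and settings are illustrative (loss="log_loss" assumes scikit-learn 1.1+):

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.8], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = SGDClassifier(loss="log_loss", penalty="elasticnet", random_state=0)
for method in ("sigmoid", "isotonic"):  # Platt scaling vs. isotonic regression
    cal = CalibratedClassifierCV(base, method=method, cv=5).fit(X_tr, y_tr)
    print(method, "Brier:", brier_score_loss(y_te, cal.predict_proba(X_te)[:, 1]))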