Probability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur or how likely it is that a proposition is true.
Questions tagged [probability]
298 questions
18 votes, 4 answers
XGBoost outputs tend towards the extremes
I am currently using XGBoost for risk prediction; it seems to be doing a good job on the binary classification itself, but the probability outputs are way off, i.e., changing the value of a feature in an observation by a very small amount can…
alwayslearning
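A standard response to this kind of miscalibration is post-hoc probability calibration. Below is a minimal sketch using scikit-learn's `CalibratedClassifierCV`; the synthetic data and all settings are illustrative, not taken from the question.

```python
# Sketch: wrapping an XGBoost classifier in isotonic calibration so its
# probability outputs better match observed event frequencies.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Isotonic regression learns a monotone map from raw scores to probabilities.
calibrated = CalibratedClassifierCV(XGBClassifier(), method="isotonic", cv=3)
calibrated.fit(X_train, y_train)
proba = calibrated.predict_proba(X_test)[:, 1]   # calibrated P(y = 1)
```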
12 votes, 1 answer
Are the raw probabilities obtained from XGBoost representative of the true underlying probabilities?
1) Is it feasible to use the raw probabilities obtained from XGBoost, e.g. probabilities in the range 0.4-0.5, as a true representation of an approximately 40%-50% chance of an event occurring? (assuming we have an accurate model)
2)…
Gale
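One way to answer this empirically is a reliability (calibration) curve: bin the predicted probabilities and compare each bin's mean prediction with the observed event rate. A sketch with scikit-learn's `calibration_curve` on synthetic stand-in data:

```python
# Sketch: if the model is calibrated, points predicted ~0.4-0.5 should have
# events roughly 40-50% of the time. The arrays below are synthetic.
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
proba = rng.uniform(size=10_000)                          # stand-in model outputs
y_true = (rng.uniform(size=10_000) < proba).astype(int)   # perfectly calibrated toy labels

frac_pos, mean_pred = calibration_curve(y_true, proba, n_bins=10)
for p, f in zip(mean_pred, frac_pos):
    print(f"predicted ~{p:.2f} -> observed {f:.2f}")      # should lie near the diagonal
```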
9 votes, 4 answers
Loss Function for Probability Regression
I am trying to predict a probability with a neural network, but I am having trouble figuring out which loss function is best. Cross entropy was my first thought, but other resources always talk about it in the context of a binary classification problem…
ahbutfore
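Cross entropy is not restricted to hard 0/1 labels: with a soft target $t \in [0,1]$ the loss is minimized exactly at $p = t$, which is what probability regression needs. A minimal NumPy sketch (the target values are made up):

```python
# Sketch: binary cross-entropy evaluated against soft targets in [0, 1].
import numpy as np

def soft_bce(p, t, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)                 # avoid log(0)
    return -(t * np.log(p) + (1 - t) * np.log(1 - p)).mean()

t = np.array([0.1, 0.5, 0.9])                    # target probabilities
print(soft_bce(t.copy(), t))                     # predicting the target: minimal loss
print(soft_bce(np.full(3, 0.5), t))              # constant 0.5: strictly higher loss
```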
8 votes, 2 answers
Confidence intervals for binary classification probabilities
When evaluating a trained binary classification model we often evaluate the misclassification rates, precision-recall, and AUC.
However, one useful feature of classification algorithms is the probability estimates they give, which support the…
berrypy
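A model-agnostic way to attach uncertainty to the probability estimates themselves is the bootstrap: refit on resampled training sets and take percentiles of the predictions. A sketch in which logistic regression and synthetic data stand in for the actual model:

```python
# Sketch: percentile-bootstrap intervals for a classifier's probability estimates.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
x_new = X[:5]                      # points we want intervals for (illustrative)
rng = np.random.default_rng(0)

preds = []
for _ in range(200):               # 200 bootstrap refits
    idx = rng.integers(0, len(X), size=len(X))
    model = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    preds.append(model.predict_proba(x_new)[:, 1])

lo, hi = np.percentile(np.array(preds), [2.5, 97.5], axis=0)
print(np.c_[lo, hi])               # 95% interval per point
```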
6 votes, 1 answer
Data-generating probability distribution, probability distribution of a dataset, in ML
In Goodfellow I, Bengio Y, Courville A. Deep Learning. MIT Press; 2016.
http://thuvien.thanglong.edu.vn:8081/dspace/bitstream/DHTL_123456789/4227/1/10.4-1.pdf
p. 102 (for example), it is said that with Unsupervised Learning, one usually wants…
SheppLogan
6 votes, 3 answers
Why does the naive Bayes algorithm make the naive assumption that features are independent of each other?
Naive Bayes is called naive because it makes the naive assumption that features have zero correlation with each other. They are independent of each other. Why does naive Bayes want to make such an assumption?
user781486
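The payoff of the assumption is tractability: instead of estimating one $n$-dimensional joint conditional, naive Bayes estimates $n$ one-dimensional ones. In symbols:

```latex
% Conditional independence of features given the class:
P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)
% which reduces classification to
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```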
6 votes, 2 answers
XGBoost predict probabilities
When using the Python / sklearn API of XGBoost, are the probabilities obtained via the predict_proba method "real probabilities", or do I have to use `binary:logitraw` and manually apply the sigmoid function?
I wanted to experiment with different cutoff…
Georg Heiler
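For the default objective `binary:logistic`, `predict_proba` already returns sigmoid-transformed scores; the raw margin (log-odds) is what `output_margin=True` exposes. A sketch verifying the relationship on synthetic data:

```python
# Sketch: with objective="binary:logistic", predict_proba already applies the
# sigmoid; the raw margin (log-odds) is recoverable via output_margin=True.
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, random_state=0)
clf = xgb.XGBClassifier(objective="binary:logistic").fit(X, y)

proba = clf.predict_proba(X)[:, 1]
margin = clf.get_booster().predict(xgb.DMatrix(X), output_margin=True)
manual = 1.0 / (1.0 + np.exp(-margin))        # sigmoid of the raw score

print(np.allclose(proba, manual, atol=1e-6))  # True: they are the same numbers
```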
5 votes, 2 answers
How useful is Bayesian inference?
For the last few months, I have been exposed to Bayesian inference in an ML course.
Investigating further, I came across the MCMC technique for simulating the posterior distribution.
It seems interesting. However, I am not sure if it is really…
chris tan
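For intuition about what MCMC actually does, here is a minimal random-walk Metropolis sampler targeting a density known only up to a normalizing constant (a standard normal, chosen purely for illustration):

```python
# Sketch: random-walk Metropolis sampling from an unnormalized 1-D posterior.
import numpy as np

def log_unnormalized_posterior(theta):
    return -0.5 * theta**2          # log N(0, 1), up to an additive constant

rng = np.random.default_rng(0)
samples, theta = [], 0.0
for _ in range(10_000):
    proposal = theta + rng.normal(scale=0.5)      # random-walk proposal
    log_ratio = log_unnormalized_posterior(proposal) - log_unnormalized_posterior(theta)
    if np.log(rng.uniform()) < log_ratio:         # accept with prob min(1, ratio)
        theta = proposal
    samples.append(theta)

print(np.mean(samples[2000:]), np.std(samples[2000:]))  # ~0 and ~1 after burn-in
```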
5 votes, 4 answers
How to get probability values with Keras?
tensorflow version = '1.12.0'
keras version = '2.1.6-tf'
I'm using Keras with the TensorFlow backend.
I want to get the probability values of the prediction.
I want the probabilities to sum to 1.
I tried using 'softmax' and…
KarmaPl
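With a `softmax` activation on the final layer, each predicted row is a distribution over the classes and sums to 1. A minimal sketch using the TensorFlow 2.x Keras API (the layer sizes and input shape here are arbitrary):

```python
# Sketch: a softmax output layer makes each prediction a probability vector.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(3, activation="softmax"),   # 3 mutually exclusive classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

probs = model.predict(np.random.rand(4, 20))
print(probs.sum(axis=1))   # each row sums to 1.0 (up to float error)
```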
5 votes, 3 answers
How to convert an array of numbers into probability values?
I would like some help with a certain numerical computation. I have certain arrays which look like:
Array 1:
[0.81893085, 0.54768653, 0.14973508]
Array 2:
[0.48078357, 0.92219683, 1.02359911]
Each of the three numbers in the array…
Krishna Shiva
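Two common mappings from scores to probabilities are simple normalization (divide by the sum, valid only for nonnegative scores) and the softmax. A sketch using the first array from the question:

```python
# Sketch: turning a score vector into a probability vector.
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))       # shift for numerical stability
    return e / e.sum()

a = np.array([0.81893085, 0.54768653, 0.14973508])
print(softmax(a))                   # exponential weighting; sums to 1
print(a / a.sum())                  # simple normalization; also sums to 1
```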
5 votes, 1 answer
How does binary cross entropy work?
Let's say I'm trying to classify some data with logistic regression.
Before passing the weighted sum to the logistic function (which normalizes it to the range $[0,1]$), the weights must be optimized for a desirable outcome. In order to find optimal weights for…
ShellRox
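For reference, the quantity being minimized: with $\hat{y}_i = \sigma(w^\top x_i)$, binary cross entropy over $n$ examples is

```latex
\mathcal{L}(w) = -\frac{1}{n}\sum_{i=1}^{n}
\left[\, y_i \log \hat{y}_i + (1 - y_i)\log\left(1 - \hat{y}_i\right) \right]
```

Each term is the negative log-probability the model assigned to the true label, so confident wrong answers are penalized heavily.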
4 votes, 1 answer
Using softmax for multilabel classification (as per Facebook paper)
I came across this paper by some Facebook researchers where they found that using a softmax and CE loss function during training led to improved results over sigmoid + BCE. They do this by changing the one-hot label vector such that each '1' is…
Steve Ahlswede
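A sketch of the label transform the excerpt describes, as I read it (this is a hedged interpretation, not code from the paper): each multi-hot target is rescaled so its positive entries sum to 1, and the result is trained with softmax plus cross entropy on soft targets.

```python
# Sketch (interpretation, not the paper's code): rescale multi-hot labels
# so each target row is a valid probability distribution.
import numpy as np

y = np.array([[1, 0, 1, 0],        # two positive labels
              [0, 1, 0, 0]], dtype=float)
soft_targets = y / y.sum(axis=1, keepdims=True)
print(soft_targets)                 # [[0.5, 0, 0.5, 0], [0, 1, 0, 0]]
```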
4 votes, 3 answers
Notation for features (general notation for continuous and discrete random variables)
I'm looking for the right notation for features from different types.
Let us say that my samples have $m$ features that can be modeled with $X_1,...,X_m$. The features don't share the same distribution (i.e., some are categorical, some numerical, etc.).…
Yael M
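One common convention, offered as a hedged sketch rather than a fixed standard: treat a sample as a realization of a random vector whose components may follow laws of different types.

```latex
% A sample as a realization of a mixed-type random vector:
x = (x_1, \dots, x_m), \qquad x \sim X = (X_1, \dots, X_m)
% where each component follows its own law:
X_j \sim P_j, \quad P_j \text{ given by a pmf } p_j \text{ (categorical)}
\text{ or a pdf } f_j \text{ (numerical)}
```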
4 votes, 3 answers
What is the meaning of likelihood?
I am studying Bayes probability applied to machine learning, and I have encountered the concept of likelihood, which I don't understand.
I have seen that the Bayes rule is:
$P(A|B)=\frac{P(B|A)P(A)}{P(B)}$
where $P(B|A)P(A)$ is the conditional…
J.D.
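The key point is that the likelihood is the same expression as the conditional probability of the data, read as a function of the parameter instead. A small worked example (Bernoulli coin flips, chosen for illustration):

```latex
% Likelihood: P(x \mid \theta) viewed as a function of \theta for fixed data x.
\mathcal{L}(\theta \mid x) = P(x \mid \theta)
% Example: 7 heads in 10 flips of a coin with unknown bias \theta:
\mathcal{L}(\theta \mid x) = \binom{10}{7}\,\theta^{7}(1 - \theta)^{3},
\qquad \hat{\theta}_{\mathrm{MLE}} = \arg\max_\theta \mathcal{L}(\theta \mid x) = 0.7
```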
4 votes, 1 answer
Relation between an underlying function and the underlying probability distribution function of data
I have heard and read the following statements many times, and they have caused me a lot of confusion over time.
Statement 1: The goal of machine learning is to get a function from the given data.
Statement 2: The goal of machine learning is to find the…
hanugm