Questions tagged [variance]

122 questions
31
votes
5 answers

Why is underfitting called high bias and overfitting called high variance?

I have been using terms like underfitting/overfitting and bias-variance tradeoff for quite a while in data science discussions, and I understand that underfitting is associated with high bias and overfitting is associated with high variance. But…
Vaibhav Thakur
  • 2,333
  • 3
  • 11
  • 9
23
votes
3 answers

What is the meaning of the term Variance in a Machine Learning model?

I am familiar with the terms high bias and high variance and their effect on a model. Basically, your model has high variance when it is too complex and sensitive even to outliers. But recently I was asked the meaning of the term variance in machine…
Sociopath
  • 1,223
  • 2
  • 11
  • 27
9
votes
2 answers

RL Advantage function: why A = Q - V instead of A = V - Q?

In RL Course by David Silver - Lecture 7: Policy Gradient Methods, David explains what an advantage function is, and how it's the difference between Q(s,a) and V(s). Preliminary, from this post: First recall that a policy $\pi$ is a mapping…
Kari
  • 2,686
  • 1
  • 17
  • 47
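For reference, the difference the excerpt describes can be written as follows. The superscript $\pi$ (values taken under the policy $\pi$) is an assumption about notation, since the excerpt is cut off before it defines its symbols:

$$A^{\pi}(s,a) = Q^{\pi}(s,a) - V^{\pi}(s)$$

With this sign convention, and taking $V^{\pi}(s)$ to be the expectation of $Q^{\pi}(s,a)$ over actions drawn from $\pi$, a positive $A^{\pi}(s,a)$ means that action $a$ does better in state $s$ than the policy does on average.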
9
votes
3 answers

How to estimate the variance of regressors in scikit-learn?

Every classifier in scikit-learn has a method predict_proba(x) that predicts class probabilities for x. How can I do the same thing for regressors? The only regressor for which I know how to estimate the variance of the predictions is Gaussian process…
Vladislav Gladkikh
  • 1,086
  • 9
  • 18
8
votes
1 answer

Question on bias-variance tradeoff and means of optimization

So I was wondering how one can best optimize the model they are trying to build when confronted with issues caused by high bias or high variance. Now, of course, you can play with the regularization parameter to get to a…
Zer0k
  • 155
  • 5
8
votes
3 answers

Overfitting Naive Bayes

My question is: what are potential reasons for Naive Bayes to perform well on a training set but poorly on a test set? I am working with a variation of the 20news dataset. The dataset has documents, which are represented as "bag of words" with no…
6
votes
3 answers

What are bias and variance in machine learning?

I am studying machine learning, and I have encountered the concept of bias and variance. I am a university student, and in my professor's slides the bias is defined as: $bias = E[error_s(h)]-error_d(h)$ where $h$ is the hypothesis and…
J.D.
  • 841
  • 4
  • 15
  • 29
5
votes
2 answers

Elimination of features based on high covariance without affecting performance?

I ran into a question whose answer left me with a big doubt. Suppose we have a dataset $A=\{x_1,x_2,y\}$ in which $x_1$ and $x_2$ are our features and $y$ is the label. Also, suppose that the covariance matrix between these three random variables is…
5
votes
2 answers

Bagging vs Boosting, Bias vs Variance, Depth of trees

I understand the main principle of bagging and boosting for classification and regression trees. My doubts are about the optimization of the hyperparameters, especially the depth of the trees. First question: why are we supposed to use weak learners…
K.Hua
  • 153
  • 6
5
votes
1 answer

Variance in statistics vs machine learning

In basic statistics, variance is a measure of the variability of the data about its mean. In machine learning, variance is a measure of learning the training data too well / capturing the noise in the data / oversensitivity to the small local fluctuations…
MAA
  • 151
  • 1
5
votes
2 answers

Evaluation of regression models with different evaluations (MSE, variance, VAF etc.)

When comparing several regression models in terms of quality, it seems like most have agreed on the MSE. There are also papers comparing "variance" and "variance accounted for (VAF)". However, there seems to be a controversial opinion about the…
5
votes
2 answers

How to decide what threshold to use for removing low-variance features?

How to decide what threshold to use for removing low-variance features? Particularly, I have 100000 features and the variances look like: Could I e.g. take the average and use it to split this to ~half? Or some other method of grouping?
mavavilj
  • 416
  • 1
  • 3
  • 12
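One way to make the "split this to ~half" idea concrete is to threshold at the median of the per-feature variances rather than the average, since a few very large variances can pull the average well away from the middle. The sketch below is only an illustration under that assumption: the function name `split_by_median_variance` and the list-of-columns input layout are hypothetical, and it uses the population variance from Python's standard library.

```python
from statistics import median, pvariance


def split_by_median_variance(columns):
    """Split feature columns at the median of their variances.

    `columns` is a list of feature columns, each a list of numbers
    (a hypothetical layout chosen for this sketch).
    Returns (kept, dropped, threshold), where `kept` and `dropped`
    are lists of column indices.
    """
    # Population variance of every feature column.
    variances = [pvariance(col) for col in columns]

    # The median leaves roughly half of the features on each side.
    threshold = median(variances)

    kept = [i for i, var in enumerate(variances) if var > threshold]
    dropped = [i for i, var in enumerate(variances) if var <= threshold]
    return kept, dropped, threshold


if __name__ == "__main__":
    features = [
        [1, 1, 1, 1],    # constant feature
        [0, 1, 0, 1],
        [0, 10, 0, 10],
        [5, 5, 5, 6],
    ]
    kept, dropped, threshold = split_by_median_variance(features)
    print("threshold:", threshold)
    print("kept:", kept, "dropped:", dropped)
```

Variances depend on the scale of each feature, so a single threshold is only meaningful when the features are measured on comparable scales.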
4
votes
2 answers

Trade off between Bias and Variance

What are the best ideas or approaches for trading off bias and variance in machine learning models?
deepguy
  • 1,441
  • 7
  • 18
  • 38
4
votes
2 answers

How can I calculate mean and variance incrementally?

Say I have a set S of values, and want to store in a database some summary information about that set, so that later when I acquire a new value v I can make a reasonable estimate of what the summary information would be about the set S ∪ {v} ---…
dubiousjim
  • 181
  • 6
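A common approach to this kind of incremental update is Welford's online algorithm, which stores only a count, a running mean, and a running sum of squared deviations. The sketch below is a minimal illustration, not necessarily the approach the answers to this question take; the class name `RunningStats` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Summary of a set S that can be updated one value at a time."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, v: float) -> None:
        """Fold a new value v into the summary (Welford's update)."""
        self.count += 1
        delta = v - self.mean            # deviation from the old mean
        self.mean += delta / self.count
        self.m2 += delta * (v - self.mean)  # uses the updated mean

    @property
    def variance(self) -> float:
        """Population variance; 0.0 if no values have been seen."""
        return self.m2 / self.count if self.count else 0.0

    @property
    def sample_variance(self) -> float:
        """Sample variance (n - 1 denominator); 0.0 for fewer than two values."""
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


if __name__ == "__main__":
    stats = RunningStats()
    for value in [2, 4, 4, 4, 5, 5, 7, 9]:
        stats.update(value)
    print(stats.count, stats.mean, stats.variance)
```

Only the three stored fields need to go in the database; the original values of S are not required to process a later value v.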
4
votes
2 answers

How do you set sigma for the Gaussian similarity kernel?

Let's say we have $n$ two-dimensional vectors: $$\mathbf{x}_1,\dots,\mathbf{x}_i,\dots,\mathbf{x}_n=(x_{1_1},x_{1_2})^T,\dots,(x_{i_1},x_{i_2})^T,\dots,(x_{n_1},x_{n_2})^T$$ How do you set $\sigma$ for the Gaussian similarity…
Diego Sacconi
  • 45
  • 1
  • 6