
I have a model like y = mx. Since the adjusted R² tells you the percentage of variation explained by only the independent variables that actually affect the dependent variable, and I have only one independent variable, do I need to consider my adjusted R² value? Or is R² good enough for this type of model?

Peter

3 Answers


They're going to be very similar (practically the same) for a model with only one independent variable. So I'd say it doesn't matter much, at least without understanding better what you want to use R² / adjusted R² for.
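For a quick sanity check, here is a minimal sketch in R using the built-in cars dataset (any single-predictor model would do); the two values printed at the end should come out nearly identical:

# One-predictor model on R's built-in cars data (illustrative only)
fit = lm(dist ~ speed, data = cars)

# With a single regressor, R^2 and adjusted R^2 barely differ
summary(fit)$r.squared
summary(fit)$adj.r.squared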


Your interpretation of R² is not correct.

R2 tells you the percentage of variation explained by only the independent variables that actually affect the dependent variable

R² does not perform any variable selection. It is the proportion of the variance in the dependent variable that is predictable from the independent variable(s).

However, there is a common misconception about R²: it does not tell you whether your model is correctly specified (e.g. homoscedasticity, no autocorrelation, etc.), nor does it tell you whether your regressor is significant.

An extremely high R² can also be the result of a spurious regression (i.e. the model is not correctly specified).
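As a rough illustration of that point, regressing one random walk on another, completely independent random walk often produces a surprisingly large R² (the exact value depends on the seed), even though there is no true relationship:

# Spurious regression sketch: two independent random walks (illustrative only)
set.seed(1)
x = cumsum(rnorm(200))   # random walk
y = cumsum(rnorm(200))   # independent random walk, unrelated to x

# R^2 is often sizeable here despite x having no real effect on y
summary(lm(y ~ x))$r.squared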

Nonetheless, whether to use adjusted R² or R² depends somewhat on your sample size. If you have enough observations (and only a small number of regressors, i.e. you lose few degrees of freedom), then adjusted R² and R² are almost identical. Prefer adjusted R² if you have only a few data points to estimate your model.
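To see the small-sample case, here is a minimal sketch with purely simulated noise (the variable names are made up for illustration): with only a dozen observations and three regressors, R² and adjusted R² can diverge noticeably.

# Few observations, several pure-noise regressors (illustrative only)
set.seed(2)
dat = data.frame(y = rnorm(12), x1 = rnorm(12), x2 = rnorm(12), x3 = rnorm(12))
fit = lm(y ~ x1 + x2 + x3, data = dat)

# R^2 picks up noise; adjusted R^2 is typically much lower, often near zero or negative
summary(fit)$r.squared
summary(fit)$adj.r.squared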

Maeaex1

Your question boils down to what the difference between $R^2$ and $\bar{R^2}$ is.

R-squared is given by: $$ R^2 = 1 - \frac{SSR/n}{SST/n}. $$

The adjusted R-squared is given by: $$ \bar{R^2} = 1 - \frac{SSR/(n-k-1)}{SST/(n-1)}. $$

  • $SSR$ is the sum of squared residuals $\sum u_i^2$,

  • $SST$ is the total sum of squares $\sum_i (y_i-\bar{y})^2$,

  • $n$ is the number of observations,

  • and $k$ is the number of independent variables (the number of $x$ variables).

So essentially, the adjusted R-squared "adjusts" for the degrees of freedom in your model. This is done by introducing a "penalty" for adding more independent variables $k$.
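Combining the two formulas above, the adjustment can also be written as an explicit penalty on $R^2$: $$ \bar{R^2} = 1-(1-R^2)\,\frac{n-1}{n-k-1}, $$ so for fixed $n$, every additional regressor increases $k$ and pulls $\bar{R^2}$ down unless $R^2$ rises enough to compensate.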

It is easy to write this in R:

# Regression using mtcars data
reg = lm(mpg ~ cyl, data = mtcars)

# Define n (number of observations) and k (number of regressors)
n = length(mtcars$mpg)
k = n - 1 - df.residual(reg)   # df.residual(reg) = n - k - 1, so solve for k

# Calculate SSR, SST
ssr = sum(resid(reg)^2)
sst = sum((mtcars$mpg - mean(mtcars$mpg))^2)

# Calculate r2, r2_bar
r2  = 1-(ssr/n)/(sst/n)
r2_bar = 1-(ssr/(n-k-1))/(sst/(n-1))

# Compare results
r2
summary(reg)$r.squared
r2_bar
summary(reg)$adj.r.squared

The adjustment for degrees of freedom matters because, when you add more $x$ variables to your model, the new variables may well not help to explain $y$ (so there is no real improvement). Yet adding variables never increases $SSR$: $SSR$ falls (or at worst stays the same), and the residual degrees of freedom fall along with it.

So $R^2$ can be a little misleading, while $\bar{R^2}$, thanks to the degrees-of-freedom adjustment, provides better guidance when comparing (nested) models with different $k$.

In the little exercise below, I add a "noisy" variable ($x_2$) which does not help much to explain $y$. After adding $x_2$, $R^2$ goes up while $\bar{R^2}$ goes down. This is essentially what $\bar{R^2}$ is supposed to do: show whether the loss of degrees of freedom is worth the improvement from adding a new variable.

# Use simulated data to compare r2, r2_bar
# Set seed for reproducible results
set.seed(81)

# Draw y, x1 from normal distribution
y = rnorm(100, mean = 0, sd = 1)
x1 = rnorm(100, mean = 0, sd = 1)

# Draw from uniform distribution 
# Lot of noise, little explanatory power
x2 = runif(100, min = 0, max = 1)

# Compare r2, r2_bar
summary(lm(y~x1))$r.squared
summary(lm(y~x1))$adj.r.squared
summary(lm(y~x1+x2))$r.squared
summary(lm(y~x1+x2))$adj.r.squared
Peter