Is plotting the log_odds vs independent variables an appropriate way to check the linearity in logistic regression with multiple predictors?

I feel the assumption should be that linearity between the dependent variable and an independent variable holds when the other independent variables are kept constant. But since the other independent variables also vary in the data, is such a scatter plot an appropriate way of checking linearity?

Dave
Umang Garg
  • How would you calculate the log-odds that you want to plot? – Dave Oct 08 '22 at 13:04
  • Hi Dave, I have time-series data with monthly observations for 10 years, so I was thinking of calculating proportions grouped by month. That is, we have default rates on a bank loan portfolio; the data is by client, 0 for no default and 1 for default. We observe these each month, so currently we calculate the default rate as sum(default_ind)/count(default_ind) grouped by month. – Umang Garg Oct 08 '22 at 13:10

1 Answer

The trouble with your plan is that you don’t know the log-odds. You can predict the log-odds with a method like the one you suggested in the comments, grouping your time series by month, but then you’re comparing one set of predictions to another. Even setting aside the influence of multiple features, if your plot shows nonlinearity, it will be hard to tell whether that comes from regression misspecification or from error in your other method of predicting.
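As a rough illustration of why the grouped values are themselves estimates, here is a minimal sketch of the monthly calculation from the comments using only the Python standard library. The function name, the toy data, and the 0.5 adjustment (added so the logit stays finite when a month has no defaults) are assumptions for illustration, not something from the question.

```python
import math
from collections import defaultdict


def monthly_empirical_logit(records):
    """records: iterable of (month, default_ind) pairs with default_ind in {0, 1}.

    Returns {month: (n, default_rate, empirical_logit)}.
    """
    counts = defaultdict(lambda: [0, 0])  # month -> [defaults, total]
    for month, default_ind in records:
        counts[month][0] += default_ind
        counts[month][1] += 1

    out = {}
    for month, (d, n) in sorted(counts.items()):
        rate = d / n  # sum(default_ind) / count(default_ind) for the month
        # Assumption: 0.5 is added to both counts so that the logit is
        # finite even when d == 0 or d == n.
        logit = math.log((d + 0.5) / (n - d + 0.5))
        out[month] = (n, rate, logit)
    return out


# Hypothetical toy data: 100 clients in January (5 defaults),
# 10 clients in February (no defaults).
records = [("2022-01", 0)] * 95 + [("2022-01", 1)] * 5 + [("2022-02", 0)] * 10
result = monthly_empirical_logit(records)
```

With only ten clients and no defaults in the second month, the raw rate is 0 and the unadjusted log-odds would be undefined, so the plotted value depends on how the estimate is constructed rather than on any observed log-odds.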

Dave
  • Is that a prediction or an actual dataset? How would you suggest dealing with this? Is a Box-Tidwell test for each explanatory variable, one by one, appropriate? – Umang Garg Oct 08 '22 at 13:20
  • Is what a prediction or an actual dataset? – Dave Oct 08 '22 at 13:23
  • the calculation of sum(default_ind)/count(default_ind) for each month? – Umang Garg Oct 08 '22 at 13:24
  • If you’re content to assess the probability of default just by looking at the number of defaults each month, why are you considering the features? – Dave Oct 08 '22 at 13:26
  • Umm, in general, a logistic regression model with macroeconomic variables as predictors is mostly used to predict the probability of default. Now I want to check whether the linearity assumption is met here, so I need to estimate default rates like this to plot the log_odds. – Umang Garg Oct 08 '22 at 13:27
  • Then those are two different approaches to predicting the probability of default. Why should they coincide? – Dave Oct 08 '22 at 13:31
  • That's actually one of the ways I read about to check linearity of the logit: group your dataset (they said to choose 3-5 groups), calculate log_odds like this, plot them, and check whether they nearly fall on a line. But I think it should be appropriate only for the univariate case. – Umang Garg Oct 08 '22 at 13:33
  • Now that you’re seeing cracks in this approach, you might consider posting a more general question either here or on the [statistics Stack](https://stats.stackexchange.com) about assessing the linearity assumption in a logistic regression. – Dave Oct 08 '22 at 13:44
  • Sure. Doing the same – Umang Garg Oct 08 '22 at 13:59
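
The grouping check mentioned in the comments (a handful of groups, log-odds per group, then a look at whether the points fall near a line) could be sketched roughly as below for a single predictor. The function name, the equal-size binning, the 0.5 adjustment, and the simulated data are all assumptions for illustration. The sketch ignores every other predictor, which is exactly the limitation raised in the thread.

```python
import math
import random


def binned_empirical_logit(x, y, n_bins=4):
    """Split observations into n_bins equal-size groups ordered by x.

    Returns a list of (mean_x, n, empirical_logit) tuples, one per group.
    Assumes len(x) >= n_bins and y values in {0, 1}.
    """
    pairs = sorted(zip(x, y))
    size = len(pairs) // n_bins
    rows = []
    for b in range(n_bins):
        start = b * size
        # The last group absorbs any remainder.
        end = (b + 1) * size if b < n_bins - 1 else len(pairs)
        chunk = pairs[start:end]
        n = len(chunk)
        d = sum(yi for _, yi in chunk)
        mean_x = sum(xi for xi, _ in chunk) / n
        # Assumption: 0.5 adjustment keeps the logit finite for empty cells.
        logit = math.log((d + 0.5) / (n - d + 0.5))
        rows.append((mean_x, n, logit))
    return rows


# Hypothetical simulated data with a logit that is linear in x.
rng = random.Random(0)
x = [rng.uniform(-2, 2) for _ in range(2000)]
y = [1 if rng.random() < 1 / (1 + math.exp(-(-1.0 + 0.8 * xi))) else 0 for xi in x]
rows = binned_empirical_logit(x, y)
```

Plotting the second against the first element of each tuple in `rows` gives the scatter in question. Each point is still an estimate from a finite group, so some scatter around a line is expected even when the model is correctly specified.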