
So I'm doing a logistic regression with statsmodels and sklearn. My result confuses me a bit. I used a feature-selection algorithm in a previous step, which tells me to use only feature1 for my regression.

The results are the following:

[screenshot: confusion matrix and statsmodels Logit summary]

So the model predicts everything as 1, and my p-value is < 0.05, which to me means it's a pretty good indicator. But the accuracy score is < 0.6, which means it basically says nothing.

Can you give me a hint on how to interpret this? It's my first data science project with difficult data.

My code:

import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

X = df_n_4["feat1"]
y = df_n_4['Survival']

# use train/test split with different random_state values
# changing the random_state changes the accuracy scores
# the scores change a lot, which is why the test score is a high-variance estimate
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)
print(len(y_train), "training samples")

# check classification scores of logistic regression
logit_model = sm.Logit(y_train, X_train).fit()
y_pred = logit_model.predict(X_test)
print('Train/Test split results:')
plt.title('Accuracy Score: {}, variables: feat1'.format(round(accuracy_score(y_test, y_pred.round()), 3)))
cf_matrix = confusion_matrix(y_test, y_pred.round())
sns.heatmap(cf_matrix, annot=True)
plt.ylabel('Actual scenario')
plt.xlabel('Predicted scenario')
plt.show()
print(logit_model.summary2())
grumpyp
    Does this answer your question? [How to interpret my logistic regression result?](https://datascience.stackexchange.com/questions/88069/how-to-interpret-my-logistic-regression-result) – Oxbowerce Jan 17 '21 at 10:11
  • 1
    No, unfortunately not @Oxbowerce – grumpyp Jan 17 '21 at 10:26
  • Make sure you add an intercept to the model (it is not added automatically in statsmodels). If this does not help, use "shrinkage" (e.g. from sklearn) or switch to a method other than Logit. https://stats.stackexchange.com/questions/440242/statsmodels-logistic-regression-adding-intercept – Peter Jan 17 '21 at 18:24
  • https://datascience.stackexchange.com/a/74445/71442 – Peter Jan 17 '21 at 18:25
  • Read here, why a constant in linear models is usually needed: https://datascience.stackexchange.com/questions/80812/removing-constant-from-the-regression-model/80822?noredirect=1#comment92035_80822 – Peter Jan 17 '21 at 18:27
  • @Peter I use a `logistic` model, is that the same as what you mean by `linear`? – grumpyp Jan 18 '21 at 09:53
  • Logit is not exactly the same as a pure linear (OLS) model, but both would usually need an intercept term. One reason for this is that the models are linear in the parameters, e.g. $y = a + bx + u$ – Peter Jan 18 '21 at 20:21
  • @grumpyp your model is only predicting class 1, which should tell you that something is wrong with the way you trained your model. It is not predicting class 2. What kind of feature selection have you done? Could you add that part of the code as well? – spectre Oct 24 '21 at 11:26
  • Check the probability outputs of your model, not just the classes. Remember that a logistic regression does not explicitly perform classification; logistic regression gives you probability values that you can compare to a threshold (often $0.5$ is the software default) to get a category, though this may not be what you want to do [(1)](https://www.fharrell.com/post/classification/) [(2)](https://www.fharrell.com/post/class-damage/). – Dave Oct 24 '21 at 12:56

1 Answer


Something's wrong with your feature-selection tool: the p-value is NaN and the confidence interval includes $0$. The confusion matrix shows that all observations are predicted as class 1. How many explanatory variables do you have? Try using all of them instead of just one. Are you sure

logit_model = sm.Logit(y_train, X_train).fit()

is correct? Shouldn't it be the other way around, `logit_model = sm.Logit(X_train, y_train).fit()`?

Alex
  • I think it is correct as `logit_model = sm.Logit(y_train, X_train).fit()`. What do you mean by the `confidence interval`? In the model where I use all `features` it works better. But if I use `sklearn` and one `feature` it works as well. It's all so confusing! – grumpyp Jan 17 '21 at 13:02
  • Obviously, from what you wrote, your model with a single feature doesn't work at all – Alex Jan 17 '21 at 13:02
  • Can you tell me why? @Alex – grumpyp Jan 18 '21 at 09:43
  • I don't know, but the confusion matrix shows it – Alex Jan 18 '21 at 18:45