
I built an SVM classifier but got an inverted ROC curve: the AUC is only 0.08. I've used the same datasets to build a Logistic Regression classifier and a Decision Tree classifier, and their ROC curves look good.

Here is my SVM code:

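import matplotlib.pyplot as plt
from sklearn.metrics import plot_roc_curve  # removed in scikit-learn 1.2 (use RocCurveDisplay.from_estimator there)
from sklearn.svm import SVC

# train_x_sm, train_y_sm, test_x, test_y are my own training/test data
svm = SVC(max_iter=12, probability=True)
svm.fit(train_x_sm, train_y_sm)
svm_test_y = svm.predict(X=test_x)
svm_roc = plot_roc_curve(svm, test_x, test_y)
plt.show()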

Could anyone tell me what is wrong with my code?

MMMMMay

2 Answers


For any binary classification problem, an AUC below 0.5 means the classifier ranks examples worse than random guessing (AUC = 0.5).

Possible reasons:

  • Your classifier is over-fitted on the training set and performs very poorly on the test set.
  • Your test sample might be very small.
  • Your classifier is giving you the probability that the class is -1. You then get a score close to 0 for class +1 examples and close to 1 for class -1 examples. If your ROC method expects positive (+1) examples to score higher than negative (-1) ones, you get a reversed curve.
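One way to check the third point is to compare the AUC computed from each `predict_proba` column and from `decision_function`. This is a sketch that assumes scikit-learn's breast cancer dataset (labels 0 and 1) as a stand-in for the asker's data. The column order of `predict_proba` follows `svm.classes_`, and for binary problems `roc_auc_score` expects scores for the larger label.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data (assumption): scikit-learn's breast cancer set, labels 0 and 1
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

svm = SVC(probability=True, random_state=0)  # default max_iter=-1 (no limit)
svm.fit(X_train, y_train)

# predict_proba columns follow svm.classes_ (sorted labels, here [0 1])
print("classes_:", svm.classes_)
proba = svm.predict_proba(X_test)

# roc_auc_score treats the larger label (classes_[1]) as positive,
# so column 1 is the matching score; column 0 gives the mirrored curve
auc_col1 = roc_auc_score(y_test, proba[:, 1])
auc_col0 = roc_auc_score(y_test, proba[:, 0])
# decision_function is positive for classes_[1] and skips the probability calibration
auc_dec = roc_auc_score(y_test, svm.decision_function(X_test))

print(f"AUC column 1: {auc_col1:.3f}")
print(f"AUC column 0: {auc_col0:.3f}")
print(f"AUC decision_function: {auc_dec:.3f}")
```

If the column 0 AUC is the high one on your own data, the scores are being taken from the wrong column. If `predict_proba` and `decision_function` disagree strongly, the calibrated probabilities (which `SVC` fits with an internal cross-validation when `probability=True`) are a likely suspect.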

When the scores are systematically reversed like this, a simple fix is to invert them:

invert_prob = 1 - prob
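A minimal illustration of why inverting works, using hypothetical toy labels and scores (not from the question). For any scores `s`, the AUC of `1 - s` equals `1 - AUC(s)`, so an AUC of 0.08 would become 0.92.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical toy data: positives (1) mostly get LOWER scores than negatives (0)
y_true = [0, 0, 1, 1, 0, 1]
prob = [0.8, 0.35, 0.3, 0.4, 0.6, 0.1]

auc = roc_auc_score(y_true, prob)

# Flip every score so the ranking is reversed
invert_prob = [1 - p for p in prob]
auc_inverted = roc_auc_score(y_true, invert_prob)

print(f"AUC: {auc:.3f}, inverted: {auc_inverted:.3f}")  # AUC: 0.111, inverted: 0.889
```

Inverting only helps when the ranking is systematically reversed. It does not improve a model whose AUC is near 0.5, and the underlying cause (for example, scoring the wrong class) is still worth tracking down.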

Reference: ROC

prashant0598

One potential fix is to remove max_iter=12, which falls back to scikit-learn's default max_iter=-1 (no iteration limit). Such a low value can lead to poor scores, as the following example shows:

import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import plot_roc_curve  # scikit-learn < 1.2; newer versions: RocCurveDisplay.from_estimator
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2)

# Deliberately tiny iteration budget, as in the question
model = SVC(max_iter=12, probability=True)
model.fit(X_train, y_train)

plot_roc_curve(model, X_test, y_test)
plt.show()

results in

[Figure: ROC curve with max_iter=12]

However, running exactly the same code again (still with max_iter=12) gives a completely different result:

[Figure: ROC curve from a second run, max_iter=12]

After removing max_iter=12, the code consistently produces higher AUCs, around 0.95 to 0.99.
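To follow up on the suggestion in the comments of fitting the model several times, here is a sketch that refits `SVC` with different `random_state` values on one fixed split and reports the spread of test AUCs for `max_iter=12` versus the default `max_iter=-1`. It reuses the breast cancer dataset from the example above. The helper name `repeated_test_aucs` and `n_runs=5` are illustrative choices. With a fixed split, `random_state` only changes the shuffling used for the probability calibration, so any spread should come from that step.

```python
import warnings

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
# Fixed split so that only the model's own randomness varies between runs
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)


def repeated_test_aucs(max_iter, n_runs=5):
    """Hypothetical helper: fit SVC n_runs times and return the test AUCs."""
    aucs = []
    for seed in range(n_runs):
        model = SVC(max_iter=max_iter, probability=True, random_state=seed)
        with warnings.catch_warnings():
            # A small max_iter stops the solver early and emits ConvergenceWarning
            warnings.simplefilter("ignore", ConvergenceWarning)
            model.fit(X_train, y_train)
        scores = model.predict_proba(X_test)[:, 1]  # probability of label 1
        aucs.append(roc_auc_score(y_test, scores))
    return np.array(aucs)


for max_iter in (12, -1):
    aucs = repeated_test_aucs(max_iter)
    print(f"max_iter={max_iter}: AUC min={aucs.min():.3f} max={aucs.max():.3f}")
```

A wide min/max range for `max_iter=12` next to a narrow, high range for `max_iter=-1` would point to the iteration cap rather than the kernel as the main problem.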

Jonathan
  • I set a small value for max_iter because with max_iter = -1 the program takes a very long time, and I don't know whether it will ever finish. After posting this question, I changed the SVM kernel from the default rbf to sigmoid, and this time I got a good ROC curve with an AUC of 0.94. So maybe the kernel is the issue? – MMMMMay Jul 28 '20 at 18:27
  • @MMMMMay Have a look at my example above: with `max_iter=12` your results can fluctuate a lot. What happens if you use rbf as kernel and fit the model 10 times? Do you always get a low AUC? – Jonathan Jul 28 '20 at 18:40
  • Yes, the AUC is always low. – MMMMMay Jul 28 '20 at 19:06