
I have an imbalanced data problem (class proportions: 0.8571429 and 0.1428571), and because of this our sensitivity and PPV are very low. What do you recommend to fix this problem, in R or in general? See the ML code below.

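```r
library(ROSE)   # ovun.sample()
library(e1071)  # svm()

new_data <- data  # after the feature selection

# 70/30 train/test split
n_rows <- nrow(new_data)  # n_rows was not defined in the original post
index <- sample(n_rows, floor(n_rows * 0.7))
trainset <- new_data[index, ]
testset <- new_data[-index, ]

# Imbalanced data problem: oversample the minority class in the training set only.
# x5 was not defined in the original post; it is ASSUMED here to be the number of
# majority-class rows in trainset, so that N = x5 * 2 yields balanced classes.
x5 <- max(table(trainset$primary2))
train_sample <- ovun.sample(primary2 ~ ., data = trainset, method = "over", N = x5 * 2)$data
dim(train_sample)

# ML model
mod_svm <- svm(primary2 ~ .,
               data = train_sample,
               type = "C-classification",
               cost = 15, kernel = "radial")

# Model summary
summary(mod_svm)
svm_pre <- predict(mod_svm, testset)

# Confusion matrix: rows = actual class, columns = predicted class
xx <- table(testset$primary2, svm_pre)

# Column-major indexing; assumes the second factor level is the positive class
TN <- xx[1]
FN <- xx[2]
FP <- xx[3]
TP <- xx[4]

CV.SVM.Sensitivity <- TP / (TP + FN)
CV.SVM.Specifity <- TN / (TN + FP)
CV.SVM.PPV <- TP / (TP + FP)
CV.SVM.NPV <- TN / (TN + FN)
CV.SVM.Accuracy <- (TP + TN) / (TP + TN + FP + FN)
CV.SVM.Sensitivity
```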
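The positional indexing in the code above (`xx[1]` to `xx[4]`) gives the right cells only if the factor levels happen to be ordered negative first, positive second. A minimal sketch of the same five metrics computed by naming the positive class explicitly is shown here. The helper name `confusion_metrics` and the toy labels are hypothetical, and the positive class is assumed to be labelled `"1"`.

```r
# Hypothetical helper: compute the metrics from actual and predicted labels,
# with the positive class passed in by name instead of by table position.
confusion_metrics <- function(actual, predicted, positive) {
  actual <- as.character(actual)
  predicted <- as.character(predicted)
  TP <- sum(actual == positive & predicted == positive)
  FN <- sum(actual == positive & predicted != positive)
  FP <- sum(actual != positive & predicted == positive)
  TN <- sum(actual != positive & predicted != positive)
  c(Sensitivity = TP / (TP + FN),
    Specificity = TN / (TN + FP),
    PPV         = TP / (TP + FP),
    NPV         = TN / (TN + FN),
    Accuracy    = (TP + TN) / (TP + TN + FP + FN))
}

# Toy data (made up for illustration): 6 negatives, 2 positives
actual    <- factor(c(0, 0, 0, 0, 0, 0, 1, 1))
predicted <- factor(c(0, 0, 0, 0, 0, 1, 1, 0))

m <- confusion_metrics(actual, predicted, positive = "1")
print(m)
```

With the question's objects this would be called as `confusion_metrics(testset$primary2, svm_pre, positive = ...)`, where the positive label depends on how `primary2` is coded.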
Melih Aras
  • For a short overview of different types of approaches to this problem (not specific to SVMs or R), you might find the second part of this answer helpful: https://datascience.stackexchange.com/a/90564/84891 – Jonathan Jun 30 '21 at 08:47
  • Probably not the answer you're looking for, but the best way to improve performance in general is to improve the features. The main reason imbalance causes many FN errors is that the features don't give the model sufficiently strong indications to take the risk of predicting the minority class. – Erwan Jun 30 '21 at 15:12
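The "risk" mentioned in the last comment can be illustrated with a decision threshold: a classifier that predicts the minority class only when its score is high misses positives, and lowering the threshold trades false negatives for false positives. The following is a minimal base-R sketch of that trade-off, not something proposed in the comments. The `actual` and `scores` vectors are made up, and the helper `sens_ppv` is hypothetical.

```r
# Made-up labels (1 = minority class) and made-up classifier scores
actual <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 0)
scores <- c(0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.40, 0.55, 0.80, 0.60)

# Hypothetical helper: sensitivity and PPV when predicting 1 for scores >= threshold
sens_ppv <- function(actual, scores, threshold) {
  pred <- as.integer(scores >= threshold)
  TP <- sum(actual == 1 & pred == 1)
  FN <- sum(actual == 1 & pred == 0)
  FP <- sum(actual == 0 & pred == 1)
  c(threshold = threshold,
    Sensitivity = TP / (TP + FN),
    PPV = TP / (TP + FP))
}

# One row per threshold
res <- t(sapply(c(0.5, 0.4), function(th) sens_ppv(actual, scores, th)))
print(res)
```

In this toy data, moving the threshold from 0.5 to 0.4 recovers the one missed positive at the cost of one extra false positive, so sensitivity rises while PPV falls. With weak features the scores of the two classes overlap more, and that trade-off gets worse.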
