Questions tagged [binary-classification]

126 questions
5
votes
1 answer

How to choose the right threshold for binary classification?

I am currently working on the Titanic dataset from Kaggle. The dataset is imbalanced, with roughly 61.5% negative and 38.5% positive examples. I divided my training dataset into an 85% training set and a 15% validation set. I chose a support vector classifier as…
4
votes
3 answers

Timing of applying random oversampling on the dataset

I tried to learn classification using machine learning algorithms. I went through the notebook "Breast Cancer - EDA, Balancing and ML". In this notebook, random oversampling was implemented. However, when the author did the oversampling, he did it…
4
votes
2 answers

Meaningfully compare target vs observed TPR & FPR

Suppose I have a binary classifier $f$ which acts on an input $x$. Given a threshold $t$, the predicted binary output is defined as: $$ \widehat{y} = \begin{cases} 1, & f(x) \geq t \\ 0, & f(x) < t \end{cases} $$ I then compute the $TPR$…
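The thresholding rule in this excerpt, and the TPR and FPR it leads to, can be sketched in a few lines. This is a minimal illustration only: the function names `predict` and `tpr_fpr`, the scores, and the labels are all made up for the example and do not come from the question.

```python
def predict(scores, t):
    """Apply the rule from the excerpt: 1 if f(x) >= t, else 0."""
    return [1 if s >= t else 0 for s in scores]


def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) computed from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    # Guard against a class being absent from y_true.
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return tpr, fpr


# Hypothetical scores f(x) and true labels, for illustration only.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]

y_hat = predict(scores, t=0.5)
print(y_hat)                   # [0, 0, 0, 1]
print(tpr_fpr(labels, y_hat))  # (0.5, 0.0)
```

Sweeping `t` over the observed scores and recording each `(TPR, FPR)` pair gives the points of an empirical ROC curve.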
3
votes
1 answer

What do the precision-recall curve and ROC curve tell us about threshold invariance

Consider a binary classification problem. Intuitively, a value for the area under the curve (for both curves) very close to 1 shows that the curve is almost L-shaped. This means that the value on the y axis stays rather consistent despite changes…
3
votes
1 answer

How to combine binary classification with patient stratification?

I am working on a binary classification model (healthy/diseased) based on gene expression data of different patients. As a second task, I would like to stratify these patients and find subgroups. I expect that the summary pattern of different genes…
vhio
3
votes
3 answers

How are scores calculated for each class of binary classification

The formula for precision is TP / (TP + FP), but how is it applied individually to each class of a binary classification problem? For example, here the precision, recall and F1 scores are calculated for class 0 and class 1 individually. I am not able…
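One common way to read per-class scores is to treat each class in turn as the "positive" class and recompute TP, FP and FN from that point of view. The sketch below assumes that convention. The helper name `per_class_metrics` and the toy labels are hypothetical and are not taken from the question.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall and F1 with `positive` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels, made up for illustration.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0]

for cls in (0, 1):
    p, r, f = per_class_metrics(y_true, y_pred, positive=cls)
    print(f"class {cls}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

With these toy labels, class 1 has TP = 1, FP = 2, FN = 1, while class 0 has TP = 2, FP = 1, FN = 2, so the two classes get different precision values from the same predictions.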
3
votes
1 answer

How do you add negative class sample for binary classification?

How do you prepare the negative dataset for binary classification? Let us say that I am building a classifier that has to classify whether the input image is of a car or not. I already have a dataset that consists of thousands of cars. But what…
2
votes
1 answer

Changing model architecture doesn't impact results

I am currently learning binary classification. The problem is classifying positive and negative movie reviews. The dataset is 25,000 reviews, with each review represented by 10,000 of the most used words. Each review is transformed into multi-hot…
2
votes
1 answer

Finding research papers for a dataset

I found a breast cancer dataset on Kaggle. Here is the link: https://www.kaggle.com/datasets/reihanenamdari/breast-cancer I would like to know how I could find out which research papers use this dataset for binary classification. So far I got only one…
Encipher
2
votes
2 answers

Binary Classification with Very Small Dataset (<40 samples)

I'm trying to perform binary classification on a very small dataset, consisting of 3 negative samples and 36 positive samples. I've been testing different models from scikit-learn (logistic regression, random forest, svc, mlp). Depending on…
2
votes
0 answers

Obtaining threshold based rules for classification problem

Suppose there are numerical variables X1...Xn predicting a target variable Y (0 or 1). Objective: to obtain the best possible thresholds and combinations of X1...Xn that can predict Y. Example: (X1>60 and X3<20) predicts Y=1 with 90%…
2
votes
2 answers

Is it vital to do label encoding with target variable

Should I always use label encoding while doing binary classification?
2
votes
2 answers

Which machine learning algorithms are more suitable for binary classification?

We know that there are many different types of classification algorithms. But among the different categories of classification algorithms, which algorithms are suitable for binary classification and which are suitable for more classes, and why?
AMZ
2
votes
3 answers

What could go wrong if I sample before classification?

I have a million entries in a table that I can use to train a binary classifier. Only 30 thousand of them are positive. Is there anything fundamentally wrong with selecting around 30 thousand negative cases uniformly and then training a binary…
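The sampling step described in this excerpt (keep every positive, draw an equal number of negatives uniformly at random) could look roughly like the sketch below. The function name `undersample_negatives`, the fixed seed, and the scaled-down toy data are assumptions made for illustration.

```python
import random


def undersample_negatives(rows, labels, seed=0):
    """Keep all positives and an equally sized uniform sample of negatives."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Uniform sampling without replacement from the negative indices.
    keep_neg = rng.sample(neg, k=min(len(pos), len(neg)))
    keep = sorted(pos + keep_neg)
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Scaled-down toy data: 1,000 rows with 30 positives (3%), mirroring the
# 30 thousand in a million ratio from the excerpt.
labels = [1] * 30 + [0] * 970
rows = list(range(len(labels)))

X_bal, y_bal = undersample_negatives(rows, labels)
print(len(y_bal), sum(y_bal) / len(y_bal))  # 60 0.5
```

Note that the positive rate goes from 3% in the full table to 50% in the sampled set, so the class ratio seen during training no longer matches the ratio in the original data.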
Bruce
2
votes
1 answer

Top 2% of scores of a binary classifier are 100% class 1

I have a binary classification model (XGBoost) that is supposed to predict whether a customer will purchase a service. Overall the metrics are satisfactory (~0.67 AUC, ~30% precision and ~40% recall at max F1), and performance holds well out of…
Mouad_S