Questions tagged [smote]

Synthetic Minority Oversampling Technique (SMOTE) is an approach for dealing with imbalanced datasets before running them through machine learning models. Common techniques for handling class imbalance include oversampling the minority class or undersampling the majority class. As its name suggests, SMOTE oversamples the minority class, generating synthetic examples rather than duplicating existing ones. SMOTE can thereby create more balanced datasets that are less dominated by the majority class.
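
As a minimal illustration of the technique, here is a sketch using the imbalanced-learn (imblearn) package on a synthetic 90:10 toy dataset:

    from collections import Counter

    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification

    # Toy dataset: roughly 90% majority class, 10% minority class.
    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
    print(Counter(y))

    # SMOTE synthesises new minority samples by interpolating between a
    # minority point and one of its k nearest minority-class neighbours.
    X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
    print(Counter(y_res))  # both classes now equally represented
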

99 questions
25 votes · 3 answers

How do you apply SMOTE on text classification?

Synthetic Minority Oversampling Technique (SMOTE) is an oversampling technique used for imbalanced dataset problems. So far I have an idea of how to apply it to generic, structured data. But is it possible to apply it to a text classification problem?…
catris25 (369)
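
The pattern the answers converge on is to vectorise the text first and apply SMOTE to the resulting numeric matrix; a minimal sketch, assuming TF-IDF features and the imblearn package (the toy documents and labels are invented):

    from imblearn.over_sampling import SMOTE
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ["win money now", "cheap pills online", "hello old friend",
            "meeting at noon", "see you tomorrow", "lunch next week"]
    labels = [1, 1, 0, 0, 0, 0]  # 2 spam vs 4 ham: imbalanced

    # SMOTE needs numeric features, so vectorise the raw text first.
    X = TfidfVectorizer().fit_transform(docs)  # sparse TF-IDF matrix

    # k_neighbors must be smaller than the minority class size (here 2).
    X_res, y_res = SMOTE(k_neighbors=1, random_state=0).fit_resample(X, labels)

Note that the synthetic rows are points in TF-IDF space, not readable sentences, which is the main caveat the answers raise.
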
20 votes · 4 answers

Train/Test Split after performing SMOTE

I am dealing with a highly unbalanced dataset, so I used SMOTE to resample it. After SMOTE resampling, I split the resampled dataset into training/test sets, using the training set to build a model and the test set to evaluate it. However, I am…
Edamame (2,705)
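
The consensus in the answers is to split first, so that the test set contains only real, untouched samples; a sketch with imblearn and a toy dataset:

    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

    # Split FIRST: if SMOTE runs before the split, synthetic points built
    # from training rows leak into the test set and inflate the scores.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, test_size=0.25, random_state=0)

    # Oversample only the training portion.
    X_train_res, y_train_res = SMOTE(random_state=0).fit_resample(X_train, y_train)

    model = RandomForestClassifier(random_state=0).fit(X_train_res, y_train_res)
    print(model.score(X_test, y_test))  # evaluated on real, imbalanced data
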
11 votes · 2 answers

Oversampling/Undersampling only the train set or both the train and validation sets

I am working on a dataset with a class imbalance problem. Now, I know one needs to oversample or undersample only the train set and not the test set. But my issue is: whether to oversample the train set and then split it into train and validation sets, or…
yamini goel (711)
11 votes · 4 answers

Why is SMOTE not used in prize-winning Kaggle solutions?

Synthetic Minority Over-sampling Technique (SMOTE) is a well-known method to tackle imbalanced datasets. There are many highly cited papers out there claiming that it is used to boost accuracy in unbalanced data scenarios. But then, when I…
Carlos Mougan (6,011)
9 votes · 1 answer

Why you shouldn't upsample before cross-validation

I have an imbalanced dataset and I am trying different methods to address the data imbalance. I found this article that explains the correct way to cross-validate when oversampling data with the SMOTE technique. I have created a model using AdaBoost…
sums22 (407)
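
The article's point is usually demonstrated with imblearn's Pipeline, which refits SMOTE inside each fold and applies it to the training folds only; a sketch (AdaBoost as in the question, toy data):

    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline  # imblearn's Pipeline, not sklearn's
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

    # Upsampling before cross_val_score would let synthetic copies of a
    # point sit in the validation fold while the original is trained on.
    # Inside the pipeline, SMOTE is re-fit on the training folds only.
    pipe = Pipeline([("smote", SMOTE(random_state=0)),
                     ("clf", AdaBoostClassifier(random_state=0))])
    print(cross_val_score(pipe, X, y, scoring="f1", cv=5).mean())
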
8 votes · 1 answer

What is the best performance metric when balancing a dataset using the SMOTE technique?

I used the SMOTE technique to oversample my dataset, and now I have a balanced dataset. The problem I face is that the performance metrics (precision, recall, F1 measure, accuracy) on the imbalanced dataset are better than with the balanced…
Rawia Sammout (199)
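
Whatever metric is chosen, it has to be computed on an untouched, still-imbalanced test set; a sketch of such an evaluation, reporting per-class precision/recall/F1 and PR-AUC instead of plain accuracy (toy data, logistic regression as a stand-in model):

    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import average_precision_score, classification_report
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    # Train on the oversampled data, but always score on the real test set:
    # accuracy is misleading at 90:10, so look at the minority-class row of
    # the report and at a ranking metric such as average precision (PR-AUC).
    X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
    model = LogisticRegression(max_iter=1000).fit(X_res, y_res)

    print(classification_report(y_te, model.predict(X_te)))
    print(average_precision_score(y_te, model.predict_proba(X_te)[:, 1]))
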
7 votes · 2 answers

Why is class weight outperforming oversampling?

I am applying both the class_weight and oversampling (SMOTE) techniques to a multiclass classification problem and getting better results with the class_weight technique. Could someone please explain the cause of this difference?
Sarah (601)
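
For context, class_weight reweights the training loss instead of adding synthetic points; a sketch of what sklearn's 'balanced' mode computes on an invented multiclass label vector:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.utils.class_weight import compute_class_weight

    y = np.array([0] * 90 + [1] * 8 + [2] * 2)  # toy 90:8:2 label vector

    # 'balanced' uses n_samples / (n_classes * class_count), so rarer
    # classes get proportionally larger weight in the loss.
    weights = compute_class_weight(class_weight="balanced",
                                   classes=np.unique(y), y=y)
    print(dict(zip(np.unique(y), weights)))  # approx. {0: 0.37, 1: 4.17, 2: 16.67}

    # Passing class_weight='balanced' to a classifier has the same effect,
    # without the neighbourhood assumptions SMOTE makes when interpolating.
    clf = LogisticRegression(class_weight="balanced", max_iter=1000)
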
6 votes · 1 answer

SMOTE vs SMOTE-NC for binary classifier with categorical and numeric data

I am using XGBoost for classification. My y is 0 or 1 (true or false). I have categorical and numeric features, so theoretically I need to use SMOTE-NC instead of SMOTE. However, I get better results with SMOTE. Could anyone explain why this is…
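
For reference, SMOTE-NC (imblearn's SMOTENC) takes the positions of the categorical columns, interpolates only the numeric ones, and fills each categorical feature from the nearest neighbours; a sketch on invented mixed-type data:

    import numpy as np
    from imblearn.over_sampling import SMOTENC

    # Toy data: column 0 is numeric, column 1 is a categorical code.
    X = np.array([[1.2, 0], [0.8, 1], [1.1, 0], [3.5, 2], [0.9, 1],
                  [1.0, 0], [3.4, 2], [3.6, 2], [1.3, 1], [3.7, 0]])
    y = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])

    # categorical_features marks columns SMOTE-NC must NOT interpolate;
    # plain SMOTE would average the codes into meaningless fractions.
    sm = SMOTENC(categorical_features=[1], k_neighbors=2, random_state=0)
    X_res, y_res = sm.fit_resample(X, y)
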
6 votes · 1 answer

How to avoid resampling part of pipeline on test data (imblearn package, SMOTE)

I am using the imblearn package to resample some data before applying other transformation/prediction techniques. Specifically, I am using SMOTE in a slightly unconventional way, as a data augmentation technique to upsample all classes rather than…
asher1213 (91)
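
The standard answer: samplers placed in an imblearn pipeline run during fit only, so scoring or predicting on test data bypasses the resampling step automatically; a sketch:

    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import make_pipeline
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    pipe = make_pipeline(SMOTE(random_state=0), LogisticRegression(max_iter=1000))

    pipe.fit(X_tr, y_tr)           # SMOTE resamples the training data here...
    print(pipe.score(X_te, y_te))  # ...but is skipped at predict/score time
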
6 votes · 3 answers

SMOTE and multi-class oversampling

I have read that the SMOTE package is implemented for binary classification. In the case of n classes, it creates additional examples for the smallest class. Can I balance all the classes by running the algorithm n-1 times?
atos (81)
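
With current imbalanced-learn there is no need for n-1 manual runs: SMOTE handles multiclass targets directly, and sampling_strategy controls which classes get resampled; a sketch on a toy three-class problem:

    from collections import Counter

    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification

    # Three classes with roughly an 80:15:5 split.
    X, y = make_classification(n_samples=1000, n_classes=3, n_informative=4,
                               weights=[0.8, 0.15, 0.05], random_state=0)
    print(Counter(y))

    # 'not majority' oversamples every class except the largest, so all
    # classes end up the same size in a single call.
    sm = SMOTE(sampling_strategy="not majority", random_state=0)
    X_res, y_res = sm.fit_resample(X, y)
    print(Counter(y_res))
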
6 votes · 1 answer

How does SMOTE work for a dataset with only categorical variables?

I have a small dataset of 977 rows with a class proportion of 77:23. For the sake of metrics improvement, I have kept my minority class ('default') as class 1 (and 'not default' as class 0). My input variables are categorical in nature. So, the…
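
When every feature is categorical, imbalanced-learn offers SMOTEN (nominal-only SMOTE), which measures distance with a value-difference metric and assigns each synthetic sample the most common category among its neighbours instead of interpolating; a sketch on invented data:

    import numpy as np
    from imblearn.over_sampling import SMOTEN

    # All-categorical toy data; SMOTEN accepts raw string categories.
    X = np.array([["owner", "urban"], ["renter", "urban"], ["owner", "rural"],
                  ["renter", "rural"], ["owner", "urban"], ["renter", "urban"],
                  ["owner", "rural"], ["renter", "urban"], ["owner", "urban"],
                  ["renter", "rural"]], dtype=object)
    y = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])  # roughly 70:30

    # No fractional category codes are ever produced: new samples take the
    # most frequent value among the minority neighbours, feature by feature.
    X_res, y_res = SMOTEN(k_neighbors=2, random_state=0).fit_resample(X, y)
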
5 votes · 2 answers

SMOTE for multilabel classification

I have a dataset with 77 different labels. Each sample has one or more of these labels. I did some data analysis and found out that the dataset is highly imbalanced - there are a large number of examples that have a particular label, whereas the…
4 votes · 1 answer

Why removing rows with NA values from the majority class improves model performance

I have an imbalanced dataset like so: df['y'].value_counts(normalize=True) * 100 gives No 92.769441, Yes 7.230559. The dataset consists of 13194 rows and 37 features. I have made numerous attempts to improve the…
sums22 (407)
4 votes · 1 answer

SMOTE for regression

I am looking into upsampling an imbalanced dataset for a regression problem (numerical target variable) in Python. I attached a paper and an R package that implement SMOTE for regression; can anyone recommend a similar package in Python? Otherwise, what…
thereandhere1 (715)
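
For the record, imbalanced-learn itself has no SMOTE for regression; third-party Python packages such as smogn implement the SMOGN variant. The core idea can also be sketched by hand: interpolate the target along with the features between "rare" samples. The helper below (smote_regression, with its rare_mask argument) is hypothetical, not a library function:

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def smote_regression(X, y, rare_mask, n_synthetic, k=5, seed=0):
        """Hypothetical sketch: SMOTE-style interpolation of features AND
        target between rare samples (e.g. extreme y values) and their
        neighbours. Requires more than k rare samples."""
        rng = np.random.default_rng(seed)
        X_rare, y_rare = X[rare_mask], y[rare_mask]
        # k + 1 because each point is its own nearest neighbour.
        _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X_rare).kneighbors(X_rare)
        X_new, y_new = [], []
        for _ in range(n_synthetic):
            i = rng.integers(len(X_rare))       # random rare seed point
            j = idx[i, rng.integers(1, k + 1)]  # one of its k rare neighbours
            gap = rng.random()                  # interpolation factor in [0, 1)
            X_new.append(X_rare[i] + gap * (X_rare[j] - X_rare[i]))
            y_new.append(y_rare[i] + gap * (y_rare[j] - y_rare[i]))
        return np.vstack([X, np.array(X_new)]), np.concatenate([y, y_new])

Here rare_mask could flag, say, the top decile of the target: rare_mask = y > np.quantile(y, 0.9).
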
4 votes · 1 answer

Combining 'class_weight' with SMOTE

This might sound like a weird question, but I could not find enough details in the sklearn documentation about 'class_weight'. Can we first oversample the dataset using SMOTE and then call the classifier with the 'class_weight' option? As my testing set is…
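
Nothing stops the two from being combined: the classifier's class_weight applies to whatever training data it receives, including SMOTE output. A sketch follows; note that on a fully balanced resample, 'balanced' weights collapse to 1, so the combination mainly matters with partial oversampling:

    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

    # Partially oversample: bring the minority class up to half the
    # majority's size rather than all the way to parity.
    X_res, y_res = SMOTE(sampling_strategy=0.5, random_state=0).fit_resample(X, y)

    # class_weight then reweights whatever imbalance remains.
    clf = LogisticRegression(class_weight="balanced", max_iter=1000)
    clf.fit(X_res, y_res)
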