Questions tagged [oversampling]

15 questions
4
votes
3 answers

Timing of applying random oversampling on the dataset

I am trying to learn classification using machine learning algorithms. I went through the notebook Breast Cancer - EDA, Balancing and ML, in which random oversampling is implemented. However, when the author did the oversampling, he did it…
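The timing question comes up often enough that a minimal sketch helps: split first, then oversample only the training portion, so the test set keeps its original class ratio. The data below is a hypothetical toy set, and the oversampling is done by hand with NumPy rather than any particular library.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy imbalanced data: 90 negatives, 10 positives.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)

# 1) Split FIRST, so the test set stays untouched.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 2) Randomly oversample the minority class in the TRAINING set only,
#    drawing minority indices with replacement until the classes match.
minority = np.where(y_train == 1)[0]
majority = np.where(y_train == 0)[0]
extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
keep = np.concatenate([np.arange(len(y_train)), extra])
X_train_bal, y_train_bal = X_train[keep], y_train[keep]

print(np.bincount(y_train_bal))  # training classes now equal
print(np.bincount(y_test))       # test set keeps the original ~9:1 ratio
```

Doing the oversampling before the split would leak copies of the same minority rows into both halves, which inflates test scores.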
2
votes
3 answers

Unbalanced data on the train set and test set

I already have two datasets: one for training and one for testing. Both are unbalanced (with similar percentages), with around 90% of instances labelled 1. Would it be useful to balance the data if the test set is very unbalanced anyway? Instances…
1
vote
2 answers

How to properly use oversampling without inflating results?

I am working with a tiny private dataset (192 samples) with 4 classes. A preprocessing step is necessary before doing any classification. Alongside feature selection and extraction techniques, I decided to apply oversampling (SMOTE). Here is what I…
1
vote
1 answer

Should synthetic data be oversampled as well?

I'm building a binary text classifier; the ratio between the positives and negatives is 1:100 (100 / 10000). By using back translation as augmentation, I was able to get 400 more positives. Then I decided to do upsampling to balance the data. Do…
1
vote
0 answers

Oversampling multivariate time series data

For a classification task, I have multivariate time series data composed of 4 satellite images in the form (145521 pixels, 4 dates, 2 bands). I used a tempCNN to classify the data into 5 classes. However, there is a big gap…
0
votes
2 answers

How to increase the Accuracy after Oversampling?

The accuracy before oversampling: 98.54% on training, 98.21% on testing. The accuracy after oversampling: 77.92% on training, 90.44% on testing. What does this mean, and how can I increase the accuracy? Edit: Classes before…
Mimi
  • 45
  • 7
0
votes
0 answers

How to use SMOTE to rebalance multiclass dataset when the target is one hot encoded with pd.get_dummies?

I'm using a multiclass dataset (cic-ids-2017), which is very imbalanced. I have already encoded the categorical feature (which is the target) using OneHotEncoder. I tried to use the SMOTE oversampling method to balance the data with a pipeline: X =…
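The usual fix here is that SMOTE expects a 1-D vector of class labels, not an n_classes-wide indicator matrix. A small sketch with a hypothetical toy target: collapse the `pd.get_dummies` columns back to labels with `idxmax` before resampling, then re-encode afterwards if the model needs one-hot targets.

```python
import pandas as pd

# Hypothetical mini target column, one-hot encoded with pd.get_dummies
# the way the question describes.
y_raw = pd.Series(["benign", "dos", "benign", "portscan", "benign"])
y_onehot = pd.get_dummies(y_raw)

# Collapse the indicator columns back to a single 1-D label vector:
# idxmax(axis=1) returns, per row, the column name where the 1 sits.
y_labels = y_onehot.idxmax(axis=1)

# ...sm.fit_resample(X, y_labels) would go here, followed by
# pd.get_dummies(y_resampled) if one-hot targets are needed again.
print(list(y_labels))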
Mimi
  • 45
  • 7
0
votes
2 answers

Is it good practice to use SMOTE on an imbalanced data set when using a BERT model for text classification?

I had a question related to SMOTE. If you have a data set that is imbalanced, is it correct to use SMOTE when you are using BERT? I believe I read somewhere that you do not need to do this since BERT takes this into account, but I'm unable to find…
QMan5
  • 133
  • 5
0
votes
0 answers

LSTM, sequence to classification: why does training on a balanced data set yield such a good result?

I am using LSTM to classify the origin of people's names. The input data is not balanced over target classes, so I used oversampling to balance it. Now, I defined a simple LSTM model as follows: LSTM_with_Embedding( (embedding): Embedding(32, 10,…
0
votes
1 answer

SMOTENC oversampling without one-hot encoding

I'm using SMOTENC to oversample an imbalanced-dataset. I thought the point of SMOTENC was to give the option to oversample categorical features without one-hot encoding them. The reason I don't want to one-hot encode is to avoid Curse of…
0
votes
1 answer

A question about overfitting and SMOTE

So I understand that overfitting is when you have, for example, good accuracy on the training dataset and bad accuracy on the testing dataset, but why would I even check the accuracy on the training dataset? If I have good accuracy on the testing…
FjkgB
  • 89
  • 7
0
votes
0 answers

Prior probability shift vs oversampling/undersampling imbalanced datasets

I'm trying to understand what prior probability shift (label drift) in data means. If I understand it correctly, it means that the distribution of labels in the training dataset differs from the distribution of labels in the production environment. This…
user60175
  • 113
  • 3
0
votes
0 answers

Oversampling SMOTE sampling strategy ratio

I have 36168 samples with an imbalanced target: 88.3% is 0 (31970 samples) and 11.7% is 1 (4198 samples). I want to apply oversampling using SMOTE. Is it ideal to equalize the classes so that both the 0 and 1 targets contain 31970 samples? Because I think in the…
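A full 1:1 balance is not the only option: in imblearn's SMOTE, passing a float as `sampling_strategy` sets the desired minority/majority ratio after resampling. A quick arithmetic sketch using the question's own counts shows what each ratio would produce.

```python
# Counts from the question: 31970 negatives (0) and 4198 positives (1).
n_majority, n_minority = 31970, 4198

# sampling_strategy (as a float) = minority/majority ratio AFTER resampling,
# so the post-SMOTE minority count is ratio * n_majority.
results = {ratio: int(ratio * n_majority) for ratio in (0.25, 0.5, 1.0)}

for ratio, target in results.items():
    added = max(0, target - n_minority)
    print(f"sampling_strategy={ratio}: minority -> {target} "
          f"(+{added} synthetic samples)")
```

Milder ratios like 0.25 or 0.5 add far fewer synthetic points, which can reduce the risk of SMOTE inventing samples in regions the minority class does not really occupy.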
0
votes
1 answer

Explaining the logic behind the pipeline method for cross-validation of imbalanced datasets

Reading the following article: https://kiwidamien.github.io/how-to-do-cross-validation-when-upsampling-data.html There is an explanation of how to use from imblearn.pipeline import make_pipeline in order to perform cross-validation on an…
-1
votes
1 answer

Is my model classification overfitting?

Is it possible that this is just a bad draw on the 20%, or is it overfitting? I'd appreciate some tips on what's going on.
user135670