Questions tagged [imbalanced-learn]

Imbalanced-learn is a Python package for dealing with imbalanced data in machine learning. It provides a range of over-sampling and under-sampling techniques for data sets, including the popular SMOTE over-sampling method. The package is fully compatible with scikit-learn.
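To give a feel for what SMOTE-style over-sampling does, here is a minimal pure-Python sketch of the interpolation idea: pick a minority sample, pick one of its nearest minority neighbours, and create a new point somewhere on the segment between them. This is a simplified illustration and not the package's implementation; the function name `smote_like_oversample`, its parameters, and the toy data are all hypothetical. In the package itself, the corresponding class is `imblearn.over_sampling.SMOTE`, used through `fit_resample`.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points from a list of minority-class samples.

    Hypothetical helper for illustration only: each synthetic point is a
    random interpolation between a minority sample and one of its k nearest
    minority neighbours (squared Euclidean distance).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by distance to the chosen base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # gap in [0, 1) fixes the position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(minority, n_new=6)
print(len(new_points))
```

Because every synthetic point lies between two existing minority samples, the new data stays inside the region the minority class already occupies, which is the main difference from simply duplicating rows.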

50 questions

3 answers

For imbalanced classification, should the validation dataset be balanced?

I am building a binary classification model for imbalanced data (e.g., 90% positive class vs. 10% negative class). I have already balanced my training dataset to reflect a 50/50 class split, while my holdout (validation) dataset was kept similar to the original…

classification class-imbalance imbalanced-learn

asked Jun 15 '20 at 18:39

thereandhere1

1 answer

Difference between sklearn make_pipeline and imblearn make_pipeline

Can anybody please explain the difference between sklearn.pipeline.make_pipeline and imblearn.pipeline.make_pipeline?

predictive-modeling pandas class-imbalance pipelines imbalanced-learn

asked Aug 21 '19 at 06:45

boredaf

2 answers

Can we specify the number of samples generated (minority class) using SMOTE?

I am trying to improve classification on an imbalanced credit card fraud dataset using SMOTE from imbalanced-learn. However, it generates data until the classes are split 50/50; can we specify the number of samples to be generated? I want to track the classifier…

machine-learning python class-imbalance imbalanced-learn

asked Aug 20 '19 at 06:44

somorjit leichombam

1 answer

Why is oversampling outperforming class weight?

I have a dataset that is highly imbalanced. Class 0 has 412 samples while class 1 has 67,215 samples. I am using an MLP for classification. When I use a class weight of 165 for class 0 and 1 for class 1 (in Keras), I am…

classification predictive-modeling preprocessing class-imbalance imbalanced-learn

asked Apr 19 '20 at 04:05

girl101

1 answer

SMOTE for regression

I am looking into upsampling an imbalanced dataset for a regression problem (numerical target variable) in Python. I attached a paper and an R package that implement SMOTE for regression; can anyone recommend a similar package in Python? Otherwise, what…

r sampling smote imbalanced-learn smotenc

asked Mar 03 '20 at 17:51

thereandhere1

3 answers

How to Split And Resample Imbalanced Dataset Into Train, Validation and Test

I want to understand how to split an imbalanced data set with a binary target variable where 87% of the samples are negative and 13% are positive. Now, I know that you should always split the data into train and test sets before doing…

python classification scikit-learn class-imbalance imbalanced-learn

asked Oct 10 '19 at 10:21

Krishnang K Dalal

1 answer

Combining 'class_weight' with SMOTE

This might sound like a weird question, but I could not find enough details in the sklearn documentation about 'class_weight'. Can we first oversample the dataset using SMOTE and then call the classifier with the 'class_weight' option? As my testing set is…

scikit-learn multiclass-classification class-imbalance smote imbalanced-learn

asked Aug 30 '19 at 17:55

Sarah

3 answers

Reproducible examples where balancing the training data demonstrably improves accuracy

I asked this question on the Statistics SE, but there were no answers, even when a modest bonus was available, so I am asking here to see if any examples can be given. I have been looking into the imbalanced learning problem, where a classifier is…

class-imbalance smote imbalanced-learn

asked Apr 18 '23 at 11:51

Dikran Marsupial

1 answer

What does IBA mean in imblearn classification report?

imblearn is a Python library for handling imbalanced data. Code for generating a classification report is given below: import numpy as np from imblearn.metrics import classification_report_imbalanced y_true = [0, 1, 2, 2, 2] y_pred = [0, 0, 2, 2, 1]…

python classification class-imbalance imbalanced-learn

asked Jan 21 '21 at 17:49

codeczar

1 answer

The most informative curve for imbalanced datasets

For imbalanced datasets: can we say the Precision-Recall curve is more informative, and thus more accurate, than the ROC curve? Can we rely on the F1-score to evaluate the skill of the resulting model in this case?

machine-learning classification class-imbalance imbalanced-learn

asked May 07 '20 at 17:53

Dave

0 answers

Balancing the dataset using imblearn undersampling, oversampling and combine?

I have an imbalanced dataset: data['Class'].value_counts() Out[22]: 0 137757 1 4905 Name: Class, dtype: int64 X_train, X_valid, y_train, y_valid = train_test_split(input_x, input_y, test_size=0.20,…

python class-imbalance smote imbalanced-learn smotenc

asked Mar 05 '20 at 00:11

hanzgs

3 answers

Imbalanced dataset in text classification

I have a data set collected from Facebook that consists of 10 classes with 2,500 posts each, but when I count the number of unique words in each class, the counts differ, as shown in the figure. Is this an imbalance problem due to word count, or…

python nlp class-imbalance imbalanced-learn

asked Feb 06 '19 at 12:31

mtesta010

1 answer

How to use SMOTENC inside the Pipeline?

I would greatly appreciate it if you could let me know how to use SMOTENC. I wrote: num_indices1 = list(X.iloc[:,np.r_[0:94,95,97,100:123]].columns.values) cat_indices1 =…

python scikit-learn imbalanced-learn smotenc

asked Jan 18 '19 at 20:08

ebrahimi

1 answer

Using SMOTENC in a pipeline

I am trying to figure out the appropriate way to build a pipeline to train a model which includes using the SMOTENC algorithm: Given that the N-Nearest Neighbors algorithm and Euclidean distance are used, should the data be normalized (Scale input…

class-imbalance smote imbalanced-learn smotenc

asked Jun 22 '20 at 15:34

thereandhere1

1 answer

Positively skewed target label in regression

I have a dataset where the target label is positively skewed and produces a long tail, and currently I have a high residual on these values when experimenting with some linear, tree-based and neural-network regression models. I see the same problem…

machine-learning regression preprocessing imbalanced-learn

asked Jul 05 '19 at 15:53

Ellio
