Questions tagged [imbalance]

9 questions
8 votes, 1 answer

Categorization of approaches to deal with imbalanced classes

What is the best way to categorize the approaches that have been developed to deal with the class imbalance problem? This article categorizes them into: Preprocessing: includes oversampling, undersampling, and hybrid methods; Cost-sensitive learning:…
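To make the preprocessing category concrete, here is a minimal sketch of random oversampling and random undersampling in plain Python. The function names and the toy data are illustrative assumptions, not taken from the article the question refers to, and libraries built for this purpose offer more refined variants.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect the rows of X under their class label."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_oversample(X, y, seed=0):
    """Duplicate minority rows at random until every class matches the largest."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(rows + extra)
        y_out.extend([label] * target)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Drop majority rows at random until every class matches the smallest."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        X_out.extend(rng.sample(rows, target))
        y_out.extend([label] * target)
    return X_out, y_out


# Toy data: 8 samples of class 0 and 2 samples of class 1.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over), Counter(y_under))
```

Hybrid methods combine the two ideas, for example by oversampling the minority class and then removing some majority samples.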
3 votes, 1 answer

Clustering with imbalanced data and groups

My problem is about identifying clusters of highly correlated items. I initially focused on building a model and features that put similar data items close to each other. The main challenge is that I have a case of imbalanced data, as…
DED
3 votes, 1 answer

Unbalanced data classification

I used XGBoost to predict company bankruptcy from an extremely unbalanced dataset. Although I tried a weighting method as well as parameter tuning, the best result I could obtain is as follows: Best Parameters: {'clf__gamma': 0.1,…
ebrahimi
2 votes, 3 answers

Imbalanced Dataset (Transformers): How to Decide on Class Weights?

I'm using SimpleTransformers to train and evaluate a model. Since the dataset I am using is severely imbalanced, it is recommended that I assign weights to each label. An example of assigning weights for SimpleTransformers is given here. My question,…
Aventinus
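One common starting point for choosing label weights is inverse class frequency: each class gets `n_samples / (n_classes * count)`, the same heuristic scikit-learn uses for `class_weight="balanced"`. The sketch below is only that heuristic in plain Python. The 90/10 label split is made up for illustration, and whether these weights suit a given model is something to check on a validation set.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * class_count)}."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label column: 90 examples of class 0, 10 of class 1.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)

# Many training APIs expect a list ordered by label index.
weight_list = [weights[cls] for cls in sorted(weights)]
print(weight_list)
```

Rarer classes receive larger weights, so their errors count more in the loss. Milder variants, such as taking the square root of these weights, are sometimes tried when the raw values make training unstable.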
1 vote, 0 answers

Stratified K Fold Cross Validation in Orange: Python script

I am using Orange to predict customer churn and compare different learners based on accuracy, F1, etc. As my problem is unbalanced (10% churn, 90% no churn), I want to oversample. However, when using Orange, this is not possible to do the…
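Independent of Orange's own API, the general pattern is to stratify the folds first and then oversample only the training part of each split, so that duplicated minority samples never leak into the test fold. Below is a minimal sketch in plain Python. The helper names and the synthetic 10%/90% labels are assumptions for illustration, and it does not use any Orange calls.

```python
import random
from collections import defaultdict


def stratified_folds(y, k, seed=0):
    """Assign sample indices to k folds, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def oversample_indices(train_idx, y, seed=0):
    """Repeat minority-class indices at random until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in train_idx:
        by_class[y[idx]].append(idx)
    target = max(len(indices) for indices in by_class.values())
    out = []
    for indices in by_class.values():
        out.extend(indices)
        out.extend(rng.choice(indices) for _ in range(target - len(indices)))
    return out


# Synthetic labels: 10 churners (1) and 90 non-churners (0).
y = [1] * 10 + [0] * 90
folds = stratified_folds(y, k=5)

splits = []
for i, test_idx in enumerate(folds):
    train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
    train_idx = oversample_indices(train_idx, y, seed=i)  # training part only
    splits.append((train_idx, test_idx))
    # Fit the learner on train_idx and score it on the untouched test_idx here.
```

The test folds keep the original 10% churn rate, so metrics such as F1 reflect the distribution the model will face in practice.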
1 vote, 1 answer

How and where to set weights in case of imbalanced cost-sensitive learning in machine learning?

I am facing a binary classification machine learning task that is both slightly imbalanced and cost-sensitive. I wonder what the best way is (and where in the modeling pipeline, say, in sklearn) to take all these considerations into…
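One place where misclassification costs can enter is the decision threshold, applied after training. If the model outputs calibrated probabilities and correct predictions cost nothing, predicting positive minimizes expected cost whenever `p * cost_fn > (1 - p) * cost_fp`, which rearranges to `p > cost_fp / (cost_fp + cost_fn)`. The sketch below assumes exactly that setting. The cost values and probabilities are made up for illustration.

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability above which predicting positive has lower expected cost."""
    return cost_fp / (cost_fp + cost_fn)


def decide(probabilities, cost_fp, cost_fn):
    """Turn predicted positive-class probabilities into 0/1 decisions."""
    threshold = cost_threshold(cost_fp, cost_fn)
    return [1 if p > threshold else 0 for p in probabilities]


# Hypothetical costs: a missed positive is 9 times as costly as a false alarm.
probs = [0.05, 0.15, 0.30, 0.60]
preds = decide(probs, cost_fp=1.0, cost_fn=9.0)
print(cost_threshold(1.0, 9.0), preds)
```

Sample or class weights passed at training time are the other usual entry point. The two can be combined, but the threshold rule above relies on the probabilities staying calibrated, which reweighting or resampling tends to disturb.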
0 votes, 2 answers

GridSearch on imbalanced datasets

I'm trying to use grid search to find the best parameters for my model. Knowing that I have to apply the NearMiss undersampling method during cross-validation, should I fit my grid search on my undersampled dataset (no matter which undersampling…
0 votes, 0 answers

How to balance an imbalanced dataset

I'm using the UNSW-NB15 dataset, which is really imbalanced, to train a multi-class classification MLP. I encode the categorical features of the dataset, which leads to a dataframe of 2+ million rows x 205 columns. After creating a sequential model with…
0 votes, 1 answer

How to deal with imbalanced categorical variables in regression tasks?

I want to predict real estate prices using several machine learning algorithms. My dataset contains numerical and categorical predictors. I have already eliminated the outliers of the numerical variables. Now I'm wondering how to deal with "outliers"…