Questions tagged [smotenc]

8 questions
6 votes, 1 answer

SMOTE vs SMOTE-NC for binary classifier with categorical and numeric data

I am using XGBoost for classification. My target y is 0 or 1 (true or false). I have categorical and numeric features, so in theory I should use SMOTE-NC instead of SMOTE. However, I get better results with SMOTE. Could anyone explain why this is…
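The theoretical argument for SMOTE-NC shows up in a single plain SMOTE step. SMOTE builds a synthetic row as x + gap·(neighbour − x), with gap drawn from [0, 1), so a label-encoded categorical column is interpolated like any other number. A minimal sketch with made-up values; `smote_interpolate` is a hypothetical helper, not a library function:

```python
def smote_interpolate(a, b, gap):
    """Plain SMOTE step: move from sample a towards neighbour b by a fraction gap."""
    return [x + gap * (y - x) for x, y in zip(a, b)]


# Made-up rows: [numeric feature, label-encoded category (0, 1 or 2)]
a = [1.0, 0]
b = [3.0, 2]
print(smote_interpolate(a, b, 0.25))  # → [1.5, 0.5]
```

The category code 0.5 matches no real category. SMOTE-NC avoids this by taking categorical values from the neighbours instead of interpolating them.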
4 votes, 1 answer

SMOTE for regression

I am looking into upsampling an imbalanced dataset for a regression problem (numerical target variable) in Python. I attached a paper and an R package that implement SMOTE for regression; can anyone recommend a similar package in Python? Otherwise, what…
thereandhere1
3 votes, 0 answers

Balancing the dataset using imblearn undersampling, oversampling and combine?

I have an imbalanced dataset. data['Class'].value_counts() returns 0: 137757 and 1: 4905 (Name: Class, dtype: int64). X_train, X_valid, y_train, y_valid = train_test_split(input_x, input_y, test_size=0.20,…
hanzgs
3 votes, 1 answer

How to use SMOTENC inside the Pipeline?

I would greatly appreciate it if you could let me know how to use SMOTENC. I wrote: num_indices1 = list(X.iloc[:,np.r_[0:94,95,97,100:123]].columns.values) cat_indices1 =…
ebrahimi
2 votes, 1 answer

Using SMOTENC in a pipeline

I am trying to figure out the appropriate way to build a pipeline to train a model that uses the SMOTENC algorithm. Given that the k-nearest neighbors algorithm and Euclidean distance are used, should the data be normalized (Scale input…
thereandhere1
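Scaling matters for this question because SMOTENC chooses neighbours by Euclidean distance, so a continuous feature with a large numeric range can dominate the neighbour search. A minimal standard-library illustration with made-up rows; `zscore_columns` and `nearest` are hypothetical helpers, and this shows the general effect of scaling on nearest neighbours rather than imbalanced-learn's internals:

```python
import math
import statistics


def zscore_columns(rows):
    """Standardise each column to mean 0 and population standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def nearest(rows, i):
    """Index of the Euclidean nearest neighbour of rows[i], excluding rows[i] itself."""
    return min((j for j in range(len(rows)) if j != i),
               key=lambda j: math.dist(rows[i], rows[j]))


# Made-up rows: [age in years, income in dollars]
rows = [
    [25, 50_000],
    [60, 50_500],
    [26, 53_000],
    [40, 70_000],
]
print(nearest(rows, 0))                  # → 1 (raw values: income dominates)
print(nearest(zscore_columns(rows), 0))  # → 2 (standardised values)
```

On the raw values the income column decides the neighbour, so the 25-year-old is paired with the 60-year-old. After standardisation, the 26-year-old with a similar income is the closest row.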
2 votes, 1 answer

SMOTE and oversampling with constraints

I'm trying to apply SMOTE to a dataset that has time constraints. I have information about users visiting a website. Some features have time constraints, e.g. having the first visit and the last visit at the website, the first visit…
Titus Pullo
0 votes, 1 answer

SMOTE-NC does not help to oversample my mixed continuous/categorical dataset

When I use SMOTE-NC to oversample three classes of a 4-class classification problem, the precision, recall, and F1 scores for the minority classes are still very low (~3%). I have 32 categorical and 30 continuous variables in my dataset. All the categorical…
Sarah
0 votes, 1 answer

SMOTENC oversampling without one-hot encoding

I'm using SMOTENC to oversample an imbalanced dataset. I thought the point of SMOTENC was to make it possible to oversample categorical features without one-hot encoding them. The reason I don't want to one-hot encode is to avoid the curse of…
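SMOTE-NC, as described by Chawla et al. in the original SMOTE paper, works on raw categorical values without one-hot encoding. Neighbours are found with a Euclidean distance over the continuous features plus, for each categorical feature that differs, a penalty equal to the median of the continuous features' standard deviations in the minority class. Each synthetic row interpolates the continuous features and takes the most frequent category among the k neighbours. The following is a minimal pure-Python sketch of that idea under those assumptions; `smote_nc` and its parameters are hypothetical, and this is not imbalanced-learn's `SMOTENC` implementation:

```python
import math
import random
import statistics
from collections import Counter


def smote_nc(X, cat_idx, k=5, n_new=10, seed=0):
    """Hypothetical SMOTE-NC sketch: return n_new synthetic rows built from the
    minority-class rows X (lists mixing numbers and category labels).
    cat_idx is the set of categorical column indices; at least one column must
    be continuous, and X should have more than k rows."""
    rng = random.Random(seed)
    num_idx = [j for j in range(len(X[0])) if j not in cat_idx]
    # Penalty for a categorical mismatch: median of the continuous columns' std devs.
    med = statistics.median(statistics.pstdev([row[j] for row in X]) for j in num_idx)

    def dist(a, b):
        d2 = sum((a[j] - b[j]) ** 2 for j in num_idx)
        d2 += sum(med ** 2 for j in cat_idx if a[j] != b[j])
        return math.sqrt(d2)

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X))
        base = X[i]
        # k nearest minority neighbours of the chosen seed row, excluding itself.
        others = [row for m, row in enumerate(X) if m != i]
        neighbours = sorted(others, key=lambda row: dist(base, row))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()
        new = list(base)
        for j in num_idx:
            # Continuous features: interpolate between the seed row and one neighbour.
            new[j] = base[j] + gap * (nn[j] - base[j])
        for j in cat_idx:
            # Categorical features: most frequent value among the k neighbours.
            new[j] = Counter(row[j] for row in neighbours).most_common(1)[0][0]
        synthetic.append(new)
    return synthetic


# Made-up minority-class rows: two continuous features and one colour label.
minority = [
    [1.0, 2.0, "red"],
    [1.2, 1.9, "red"],
    [0.9, 2.2, "blue"],
    [1.1, 2.1, "red"],
]
new_rows = smote_nc(minority, cat_idx={2}, k=2, n_new=5)
print(new_rows)
```

Because categorical values are copied from neighbours, every synthetic row keeps a category that already exists in the data, and the column count stays the same as the input.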