Questions tagged [smotenc]
8 questions
6 votes
1 answer
SMOTE vs SMOTE-NC for binary classifier with categorical and numeric data
I am using XGBoost for classification. My y is 0 or 1 (true or false). I have both categorical and numeric features, so in theory I should use SMOTE-NC instead of SMOTE. However, I get better results with SMOTE.
Could anyone explain why this is…
RafalQA
4 votes
1 answer
SMOTE for regression
I am looking into upsampling an imbalanced dataset for a regression problem (numerical target variable) in Python.
I have attached a paper and an R package that implement SMOTE for regression. Can anyone recommend a similar package in Python? Otherwise, what…
thereandhere1
3 votes
0 answers
Balancing the dataset using imblearn undersampling, oversampling and combine?
I have an imbalanced dataset:
data['Class'].value_counts()
Out[22]:
0 137757
1 4905
Name: Class, dtype: int64
X_train, X_valid, y_train, y_valid = train_test_split(input_x, input_y, test_size=0.20,…
hanzgs
3 votes
1 answer
How to use SMOTENC inside the Pipeline?
I would greatly appreciate it if you could let me know how to use SMOTENC. I wrote:
num_indices1 = list(X.iloc[:,np.r_[0:94,95,97,100:123]].columns.values)
cat_indices1 =…
ebrahimi
2 votes
1 answer
Using SMOTENC in a pipeline
I am trying to figure out the appropriate way to build a pipeline to train a model which includes using the SMOTENC algorithm:
Given that the k-nearest neighbors algorithm and Euclidean distance are used, should the data be normalized (scale input…
thereandhere1
2 votes
1 answer
SMOTE and oversampling with constraints
I'm trying to apply SMOTE to a dataset that has time constraints. I have information about users visiting a website. For some features, there are time constraints, e.g., having the first visit and the last visit to the website, the first visit…
Titus Pullo
0 votes
1 answer
SMOTE-NC does not help to oversample my mixed continuous/categorical dataset
When I use SMOTE-NC to oversample three classes of a 4-class classification problem, the precision, recall, and F1 metrics for the minority classes are still VERY low (~3%). I have 32 categorical and 30 continuous variables in my dataset. All the categorical…
Sarah
0 votes
1 answer
SMOTENC oversampling without one-hot encoding
I'm using SMOTENC to oversample an imbalanced dataset.
I thought the point of SMOTENC was to give the option to oversample categorical features without one-hot encoding them. The reason I don't want to one-hot encode is to avoid the curse of…
Boots
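Several of the questions above turn on how SMOTE-NC treats categorical columns differently from numeric ones. The following is a simplified, standard-library-only sketch of that idea, not imbalanced-learn's implementation: numeric features are interpolated between a sample and one of its nearest neighbours, categorical features take the most frequent value among those neighbours, and a categorical mismatch adds a fixed penalty (the median standard deviation of the numeric features) to the distance. The function name `smotenc_style_sample`, its parameters, and the toy data are all hypothetical.

```python
import random
import statistics
from collections import Counter


def smotenc_style_sample(rows, cat_idx, base, k=3, rng=None):
    """Build one synthetic row from rows[base], in the spirit of SMOTE-NC.

    rows    : list of minority-class rows (lists mixing numbers and labels)
    cat_idx : positions of the categorical features (assumed known)
    base    : index of the row to oversample
    """
    rng = rng or random.Random(0)
    num_idx = [j for j in range(len(rows[0])) if j not in cat_idx]

    # Penalty for a categorical mismatch: median of the numeric
    # features' standard deviations over the minority rows.
    penalty = statistics.median(
        [statistics.pstdev([r[j] for r in rows]) for j in num_idx]
    )

    def distance(a, b):
        # Euclidean distance on numeric features, plus the squared
        # penalty for every categorical feature that differs.
        num = sum((a[j] - b[j]) ** 2 for j in num_idx)
        cat = sum(penalty ** 2 for j in cat_idx if a[j] != b[j])
        return (num + cat) ** 0.5

    x = rows[base]
    others = [r for i, r in enumerate(rows) if i != base]
    neighbours = sorted(others, key=lambda r: distance(x, r))[:k]

    # Numeric features: interpolate towards one randomly chosen neighbour.
    partner = rng.choice(neighbours)
    gap = rng.random()
    new_row = list(x)
    for j in num_idx:
        new_row[j] = x[j] + gap * (partner[j] - x[j])

    # Categorical features: most frequent value among the neighbours,
    # so no one-hot encoding is needed and no new category is invented.
    for j in cat_idx:
        new_row[j] = Counter(r[j] for r in neighbours).most_common(1)[0][0]
    return new_row


# Toy minority class: two numeric features and one categorical feature.
minority = [
    [1.0, 10.0, "red"],
    [1.2, 11.0, "red"],
    [0.9, 9.5, "blue"],
    [1.1, 10.5, "red"],
]
synthetic = smotenc_style_sample(minority, cat_idx=[2], base=0, k=3)
print(synthetic)
```

Because the numeric part of the distance is plain Euclidean, numeric features on very different scales will dominate the neighbour search in this sketch, which is the concern behind the scaling question above.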