I'm using a multiclass dataset (CIC-IDS-2017), which is very imbalanced. I have already encoded the categorical feature (which is the target) using OneHotEncoder. I tried to use the SMOTE oversampling method to balance the data with a pipeline:

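from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Features are every column except the target
X = df.drop(columns=['Label'])
y = df.Label

steps = [('onehot', OneHotEncoder()), ('smt', SMOTE())]
pipeline = Pipeline(steps=steps)

X, y = pipeline.fit_resample(X, y)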

When I used pd.get_dummies instead of OneHotEncoder, I could not use the pipeline (get_dummies cannot be added as a pipeline step).

How can I balance the dataset using SMOTE (and use get_dummies for one-hot encoding)?

Mimi
  • There is a prerequisite, i.e. _Intermediate steps of the pipeline must be ‘transforms’, that is, they must implement fit and transform methods_. So it will not work. Why do you want to do that? – 10xAI Jun 04 '21 at 07:17
  • I used get_dummies because scikit-learn's OneHotEncoder() gives me NaN values. I tried to use SMOTE without a Pipeline, but it does not work, so I also tried the Pipeline. – Mimi Jun 04 '21 at 09:14

0 Answers