Can anybody please explain the difference between sklearn.pipeline.make_pipeline and imblearn.pipeline.make_pipeline?
boredaf
1 Answer
The imblearn package contains a lot of different samplers for easy over- or under-sampling of data.
These samplers cannot be placed in a standard sklearn pipeline.
To allow for using a pipeline with these samplers, the imblearn package also implements an extended pipeline. This pipeline is very similar to the sklearn one with the addition of allowing samplers.
If you want to include samplers in the pipeline, use the imblearn pipeline. Otherwise, use the sklearn one.
The source code for both pipelines is available in the respective imblearn and sklearn repositories. Note that make_pipeline is just a convenience function for creating a new pipeline; the difference here is actually in the pipelines themselves.
Shaido
-
But why is a sampler not implemented using the fit/transform paradigm of sklearn? What's different about this? – Shihab Shahriar Khan Dec 01 '19 at 12:56
-
@ShihabShahriarKhan: Without any deeper investigation, it could be due to the different treatment of training and testing data (see e.g. https://github.com/EpistasisLab/tpot/issues/547#issuecomment-451115193), or because the sampler adds or removes data, which the standard `sklearn` API may not support (since it has no need to). – Shaido Dec 01 '19 at 13:45
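A rough sketch of the second point in the comment above, using only sklearn and NumPy. `ToyOverSampler` is a hypothetical class written for this illustration, not part of either library; it imitates the `fit_resample` method that imblearn samplers expose. Because it changes the number of rows and has to return a new `y`, it has no `transform` method, and the sklearn pipeline refuses it as an intermediate step.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


class ToyOverSampler:
    """Hypothetical sampler: duplicates minority rows until classes are balanced."""

    def fit(self, X, y):
        return self

    def fit_resample(self, X, y):
        classes, counts = np.unique(y, return_counts=True)
        target = counts.max()
        rng = np.random.default_rng(0)
        idx = []
        for cls, count in zip(classes, counts):
            cls_idx = np.flatnonzero(y == cls)
            # Draw extra rows (with replacement) for classes below the target size.
            extra = rng.choice(cls_idx, size=target - count, replace=True)
            idx.extend(cls_idx)
            idx.extend(extra)
        idx = np.array(idx)
        # Both X and y change length, which transform(X) alone cannot express.
        return X[idx], y[idx]


X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)

X_res, y_res = ToyOverSampler().fit_resample(X, y)

# The sklearn pipeline requires intermediate steps to implement transform.
try:
    make_pipeline(ToyOverSampler(), LogisticRegression()).fit(X, y)
    rejected = False
except TypeError:
    rejected = True

print(X.shape, X_res.shape, rejected)
```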
-
In case someone wants to understand more about Samplers, I found this article useful: https://machinelearningmastery.com/random-oversampling-and-undersampling-for-imbalanced-classification/ – loadbox Sep 30 '20 at 13:51