I want to apply pd.cut as a transformer in a pipeline, like this:

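import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
numerical_preprocessing = Pipeline([
    ('cut_into_bins', FunctionTransformer(pd.cut, kw_args={'bins': [10, 100, 1000]})),
])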

However, I get an error: ValueError("Input array must be 1 dimensional"). I could call the same function on each column separately, but that seems like poor coding practice. Any thoughts on this?

JohnnyQ

1 Answer

I solved the problem by writing a wrapper around pd.cut that uses the apply method of DataFrame to run pd.cut on each column separately:

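def cut_wrapper(x, bins_final, labels=None, **kwargs):  # name and signature assumed; not shown in the original snippet
    if isinstance(x, pd.Series):
        # 1-D input: pd.cut can handle it directly
        return pd.cut(x, bins_final, labels=labels, **kwargs)
    elif isinstance(x, pd.DataFrame):
        # 2-D input: axis=0 passes each column to pd.cut as a Series
        return x.apply(pd.cut, args=(bins_final,), axis=0, labels=labels, **kwargs)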
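
For completeness, here is a minimal, self-contained sketch of how such a wrapper might be plugged into the pipeline from the question. The function name `cut_columns`, its signature, and the sample DataFrame are made up for illustration; they are not part of the original wrapper, which may have computed its bins differently.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def cut_columns(x, bins, labels=None, **kwargs):
    """Apply pd.cut to a Series, or to every column of a DataFrame.

    Hypothetical helper: the name and signature are assumptions.
    """
    if isinstance(x, pd.Series):
        return pd.cut(x, bins, labels=labels, **kwargs)
    if isinstance(x, pd.DataFrame):
        # axis=0 hands each column to pd.cut as a 1-D Series
        return x.apply(pd.cut, args=(bins,), axis=0, labels=labels, **kwargs)
    raise TypeError("expected a pandas Series or DataFrame")


numerical_preprocessing = Pipeline([
    ("cut_into_bins",
     FunctionTransformer(cut_columns, kw_args={"bins": [10, 100, 1000]})),
])

# Made-up sample data, only to show the call
df = pd.DataFrame({"a": [20, 500], "b": [150, 50]})
binned = numerical_preprocessing.fit_transform(df)
```

This relies on FunctionTransformer passing the DataFrame through unchanged (its default, validate=False), so the wrapper still sees a DataFrame rather than a NumPy array. If an earlier pipeline step converts the data to an array, the isinstance checks would not match and the wrapper would need adjusting.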
JohnnyQ