
I have a DNN in Keras, which includes a custom metric function and which I want to pipeline with some SKlearn preprocessing. I further want to persist the model using MLFlow for easy deployment. The requirement to pipeline with sklearn means that I can't use the mlflow.keras versions of .log_model() and .load_model(), and have to instead use the mlflow.pyfunc versions, which is fine.

Saving the model seems to work fine, but when I try to use mlflow.pyfunc.load_model() to reimport the saved model I get this error message (full stack trace at link):

ValueError: Unknown metric function:custom_mse

To try and make sure that the custom function makes its way through to MLFlow I'm persisting it in a helper_functions.py file and passing that file to the code_path parameter of .log_model(), and then attempting to import that function in .load_context() before using keras.models.load_model() to reimport the saved keras model.

helper_functions.py:

import keras.backend as K

def custom_mse(y_true, y_pred):
    return K.mean((y_pred - y_true) ** 2)

The PythonModel I'm trying to persist is this:

class ProductRecommender(PythonModel):

    def __init__(self, pipeline):
        self.pipeline = pipeline

    def load_context(self, context):

        from helper_functions import custom_mse

        self.keras_model = keras.models.load_model(context.artifacts["keras_model"], custom_objects={'custom_mse': custom_mse})
        self.sklearn_preprocessor = joblib.load(context.artifacts["sklearn_preprocessor"])

        self.sklearn_model = KerasModelRegressor(self.keras_model, epochs=5, validation_split=0.2)

        self.pipeline = Pipeline(steps=[
            ('preprocessor', self.sklearn_preprocessor),
            ('estimator', self.sklearn_model)
        ])

    def fit(self, X, y):
        self.pipeline.fit(X, y)
        self.pipeline.named_steps.estimator.model.save('artifacts/keras_model.h5')
        joblib.dump(self.pipeline.named_steps.preprocessor, 'artifacts/sklearn_preprocessor.joblib')

    def predict(self, context, X):
        return self.pipeline.predict(X)

Note that I import the custom_mse function from the helper_functions module and pass it via custom_objects to keras.models.load_model().

Here's the mlflow.pyfunc.log_model() call:

with mlflow.start_run() as run:

    run_id = run.info.run_id

    conda_env = {
        'name': 'mlflow-env',
        'channels': [
            'defaults',
            'anaconda',
            'conda-forge'
        ],
        'dependencies': [
            'python=3.7.0',
            'cloudpickle',
            'keras==2.2.5',
            'joblib==0.13.2',
            'scikit-learn==0.20.3'
        ]
    }

    artifacts = {
        'keras_model':'artifacts/keras_model.h5',
        'sklearn_preprocessor':'artifacts/sklearn_preprocessor.joblib'
    }

    mlflow.pyfunc.log_model(
        artifact_path='Model',
        code_path=['artifacts/example_sklearn_wrapper.py', 'artifacts/helper_functions.py'],
        python_model=pr,
        conda_env=conda_env,
        artifacts=artifacts
    )

What's happening here? Why isn't keras seeing my custom_mse function?

Dan Scally
  • 1,724
  • 6
  • 23

1 Answer


Long story short: use cloudpickle instead of joblib or pickle to dump things to disk, and this all works much more cleanly. cloudpickle serializes functions by value, so custom_mse is captured alongside the model rather than looked up by name at load time.
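A minimal sketch of why this helps (using the plain cloudpickle API, not the MLFlow internals; the metric here is a plain-Python stand-in for the Keras-backed custom_mse so the sketch runs without TensorFlow installed):

```python
import cloudpickle

# Stand-in for the Keras-backed metric: plain Python, no K.mean,
# so this sketch is self-contained.
def custom_mse(y_true, y_pred):
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# cloudpickle captures the function body itself, whereas pickle/joblib
# would only store a reference like "helper_functions.custom_mse" and
# fail if that module isn't importable at load time.
payload = cloudpickle.dumps(custom_mse)
restored = cloudpickle.loads(payload)

print(restored([0.0, 0.0], [1.0, 3.0]))  # → 5.0
```

Because the restored function carries its own code, nothing in load_context() needs to re-import helper_functions for Keras to resolve the metric.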
