The following is a simplified code snippet for storing Keras LSTM models in MLflow.

with mlflow.start_run() as run:
    mlflow.keras.log_model(model,"lstm")
    mlflow.log_params(model_parameters)
    mlflow.log_metrics(model_metrics)

However, suppose that each model has a corresponding data preprocessing function that must be applied to new data before prediction.

processed_data = custom_processing_function(new_data)
predictions = model.predict(processed_data)

Because each model may have a different preprocessing function, I want to keep track of each (preprocessing function, model) pair. Ideally, I am looking for something like this:

with mlflow.start_run() as run:
    mlflow.keras.log_model(model,"lstm")
    mlflow.log_params(model_parameters)
    mlflow.log_metrics(model_metrics)
    mlflow.log_function(custom_preprocessing)  # <--------- what I would like to be able to do

  1. Is it possible to store a preprocessing function in MLflow, and what is a standard or appropriate way to do it?

  2. During the prediction step, how can I "call" the stored preprocessing function on new data?

Enk9456

1 Answer

One possible approach is to use MLflow's pyfunc Python model (mlflow.pyfunc.PythonModel) and run the preprocessing inside the model's predict call, so that the preprocessing function and the model are logged together.

E.g.

import mlflow.pyfunc


class ServingModel(mlflow.pyfunc.PythonModel):
    def __init__(self, model, preprocessing_transform):
        # Keep the trained model and its matching preprocessing function together.
        self._model = model
        self._preprocessing_transform = preprocessing_transform

    def predict(self, context, model_input):
        """
        Perform a transformation and predict on input of (batch, sequence, features)
        """
        # Preprocess each sequence in the batch (this overwrites model_input in place).
        for i in range(model_input.shape[0]):
            model_input[i, :, :] = self._preprocessing_transform(model_input[i, :, :])

        return self._model.predict(model_input)
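
To see the pattern on its own, here is a minimal sketch that runs without MLflow or Keras. StubModel, scale_by_ten, and PreprocessingWrapper are hypothetical names used only for illustration; in real use the wrapper would subclass mlflow.pyfunc.PythonModel as above and hold a trained Keras model. Unlike the class above, this sketch builds a new batch instead of overwriting the input.

```python
class StubModel:
    """Hypothetical stand-in for a trained Keras model."""

    def predict(self, batch):
        # Return the sum of all feature values in each sequence.
        return [sum(sum(step) for step in sequence) for sequence in batch]


def scale_by_ten(sequence):
    """Hypothetical preprocessing step: divide every feature value by 10."""
    return [[value / 10 for value in step] for step in sequence]


class PreprocessingWrapper:
    """Mirrors the pyfunc signature predict(context, model_input)."""

    def __init__(self, model, preprocessing_transform):
        self._model = model
        self._preprocessing_transform = preprocessing_transform

    def predict(self, context, model_input):
        # Preprocess each sequence, leaving the caller's data untouched.
        processed = [self._preprocessing_transform(seq) for seq in model_input]
        return self._model.predict(processed)


wrapper = PreprocessingWrapper(StubModel(), scale_by_ten)
batch = [[[10, 20], [30, 40]], [[50, 60], [70, 80]]]  # (batch, sequence, features)
predictions = wrapper.predict(None, batch)
```

The caller only ever calls predict on the wrapper, so each model always gets the preprocessing it was paired with.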

Saving the model

mlflow.pyfunc.log_model("model", python_model=ServingModel(model, preprocessing_transform))

Loading the model

loaded_model = mlflow.pyfunc.load_model(model_path)
results = loaded_model.predict(feature_input)
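
To tie this back to the run from the question, the sketch below logs the wrapped model together with its parameters and metrics, then reloads it by run ID. It assumes the MLflow 2.x-style API, where log_model accepts artifact_path and logged models can be addressed with a runs:/ URI; build_run_model_uri and log_and_reload are hypothetical helper names, not part of MLflow.

```python
def build_run_model_uri(run_id, artifact_path="model"):
    """Build a 'runs:/<run_id>/<artifact_path>' URI for a model logged in a run."""
    return f"runs:/{run_id}/{artifact_path}"


def log_and_reload(serving_model, params, metrics):
    """Log the wrapped model in one run, then load it back by run ID."""
    # Imported lazily so the URI helper above works without MLflow installed.
    import mlflow
    import mlflow.pyfunc

    with mlflow.start_run() as run:
        mlflow.pyfunc.log_model(artifact_path="model", python_model=serving_model)
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)

    return mlflow.pyfunc.load_model(build_run_model_uri(run.info.run_id))
```

Calling predict on the returned model runs the stored preprocessing first and then the wrapped model, so no separate call to the preprocessing function is needed at prediction time.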
Cole Murray