
I'm using AWS SageMaker to build my model, and I want to store the trained model in S3 for later use. How do you save a model to S3 from Amazon SageMaker? I know this seems trivial, but I couldn't follow the sources and documentation I've read.

Pluviophile
Laurent

2 Answers


You can use pickle (or any other serialization format) together with the boto3 library to save your model to S3.

To save your model as a pickle file, you can use:

import pickle
import numpy as np

from sklearn.linear_model import LinearRegression

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3

model = LinearRegression().fit(X, y)

# save the model to disk
pkl_filename = 'pickle_model.pkl'
with open(pkl_filename, 'wb') as file:
    pickle.dump(model, file)

To save the pickled model directly to S3, rather than to the SageMaker instance's local disk:

# to save the model to s3
import boto3

# For aws credentials, if ~/.aws/credentials is missing
# access_key_id =  '...'
# secret_access_key = '...'

# session = boto3.Session(
#     aws_access_key_id=access_key_id ,
#     aws_secret_access_key=secret_access_key,)

# s3_resource = session.resource('s3')

s3_resource = boto3.resource('s3')

bucket = 'your_bucket'
key = 'pickle_model.pkl'

# serialize the model to bytes in memory (no local file needed)
pickle_byte_obj = pickle.dumps(model)

s3_resource.Object(bucket, key).put(Body=pickle_byte_obj)
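
Since the question asks about storing the model "for later use", here is a minimal sketch of the round trip: the same `put` call as above, plus the matching `get` to load the model back. The helper names `save_model_to_s3` and `load_model_from_s3` are hypothetical (they are not part of boto3), and the bucket and key are placeholders. Only unpickle objects from a bucket you trust, because `pickle.loads` can execute arbitrary code.

```python
import pickle


def save_model_to_s3(model, s3_resource, bucket, key):
    """Pickle `model` in memory and write the bytes to s3://bucket/key."""
    s3_resource.Object(bucket, key).put(Body=pickle.dumps(model))


def load_model_from_s3(s3_resource, bucket, key):
    """Read s3://bucket/key and unpickle the bytes back into an object."""
    response = s3_resource.Object(bucket, key).get()
    # response["Body"] is a streaming body; read() returns the raw bytes
    return pickle.loads(response["Body"].read())


# Example usage (assumes AWS credentials are configured and the bucket exists):
#   import boto3
#   s3_resource = boto3.resource('s3')
#   save_model_to_s3(model, s3_resource, 'your_bucket', 'pickle_model.pkl')
#   model = load_model_from_s3(s3_resource, 'your_bucket', 'pickle_model.pkl')
```

Passing the S3 resource in as an argument keeps the helpers independent of how credentials are set up, so either the default `boto3.resource('s3')` or the explicit `session.resource('s3')` shown above can be used.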

To expand on the other answer: this is a problem I've run into several times myself, so I've built an open-source library, modelstore, that automates this step. It also does other things, such as versioning the model and storing it in S3 under structured paths.

The code to use it looks like this (there is a full example here):

from modelstore import ModelStore

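from sklearn.linear_model import LinearRegression

# Train your model, as usual
# (X and y are your training data, e.g. the arrays from the other answer)
model = LinearRegression()
model.fit(X, y)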

# Create a model store that points to your s3 bucket
bucket_name = "your-bucket-name"
modelstore = ModelStore.from_aws_s3(bucket_name)

# Upload your model; the call returns a dictionary of metadata
model_domain = "your-model-domain"
meta_data = modelstore.sklearn.upload(model_domain, model=model)

This will dump your model to a file, create a tar archive from it, and then upload that archive to S3 for you. The function returns some metadata as a dictionary, which includes the version ID for your model.
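
To make the "dump to a file, then create a tar archive" steps concrete, here is a rough sketch using only the Python standard library. It is an illustration of the idea, not modelstore's actual implementation; the function names, the `model.pkl` member name, and the use of pickle as the serialization format are all assumptions.

```python
import os
import pickle
import tarfile
import tempfile


def archive_model(model, archive_path, member_name="model.pkl"):
    """Pickle `model` to a temporary file and wrap it in a gzipped tar archive."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        model_path = os.path.join(tmp_dir, member_name)
        with open(model_path, "wb") as f:
            pickle.dump(model, f)
        with tarfile.open(archive_path, "w:gz") as tar:
            # arcname stores the file under a clean name instead of the temp path
            tar.add(model_path, arcname=member_name)
    return archive_path


def load_model_from_archive(archive_path, member_name="model.pkl"):
    """Read the pickled model back out of the archive without extracting to disk."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return pickle.load(tar.extractfile(member_name))


# The archive could then be uploaded with boto3, for example:
#   boto3.resource('s3').Object('your-bucket-name', 'model.tar.gz').upload_file(archive_path)
```

As with any pickle-based approach, only load archives that come from a source you trust.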

neal