3

I built a front end from which I would like to make REST calls to an AWS Lambda function exposed through AWS API Gateway.

I trained my model locally and dumped it as a pickle file (along with my encoders), then stored these files in an S3 bucket.

The problem is that I cannot import libraries such as pandas and sklearn to make model predictions, because the Lambda runtime is unable to find them.

Does anyone have any suggestions to help solve this issue?

Stephen Rauch
3nomis

2 Answers

3

Using Lambda layers makes the dependencies more reusable and potentially easier to maintain and deploy.

The short version is:

  1. Create a requirements.txt (from pip freeze or similar) that looks like this:
pandas==0.23.4
pytz==2018.7
  2. Create a get_layer_packages.sh bash script, to be run by Docker, which looks like this:
#!/bin/bash

export PKG_DIR="python"

rm -rf ${PKG_DIR} && mkdir -p ${PKG_DIR}

docker run --rm -v $(pwd):/foo -w /foo lambci/lambda:build-python3.6 \
    pip install -r requirements.txt --no-deps -t ${PKG_DIR}
  3. Run the above in the terminal and zip the result:
chmod +x get_layer_packages.sh
./get_layer_packages.sh
zip -r my-Python36-Pandas23.zip .
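
With the zip built, you can publish it as a layer and attach it to your function. A minimal sketch using the AWS CLI; the layer name, function name, and ARN below are placeholders, not part of the steps above:

# Publish the zip (which contains a top-level python/ directory) as a layer
aws lambda publish-layer-version \
    --layer-name pandas-layer \
    --zip-file fileb://my-Python36-Pandas23.zip \
    --compatible-runtimes python3.6

# Attach the layer using the LayerVersionArn returned by the command above
aws lambda update-function-configuration \
    --function-name my-prediction-function \
    --layers arn:aws:lambda:eu-west-1:123456789012:layer:pandas-layer:1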

I'm not that experienced with Python, had spent a decent chunk of time messing around with zipping up pandas and virtual envs, and have never really used Docker for anything IRL, but this process is actually far more accessible (and better documented) than the venv > zip > upload process I was using before.
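
Once pandas (and anything else you bake into layers) is importable, the function code itself can stay small. For illustration, a handler along these lines would load the pickled model from S3 and predict; the bucket name, key, and JSON shape below are just placeholders:

import json
import pickle

import boto3
import pandas as pd

s3 = boto3.client("s3")

# Hypothetical bucket/key: replace with wherever the pickled model lives
BUCKET = "my-model-bucket"
MODEL_KEY = "model.pkl"

# Load once per container, outside the handler, so warm invocations reuse it
model = pickle.loads(s3.get_object(Bucket=BUCKET, Key=MODEL_KEY)["Body"].read())

def lambda_handler(event, context):
    # With an API Gateway proxy integration the request body arrives as a JSON string
    payload = json.loads(event["body"])
    features = pd.DataFrame([payload])
    prediction = model.predict(features)
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": prediction.tolist()}),
    }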

DaveRGP
1

You need to create a deployment package which includes the packages you want to use in Lambda (sklearn and pandas).

You can then either upload that deployment package to S3 and reference it from the Lambda function, or upload it directly within the Lambda console.

The Lambda function code will have to be written outside of AWS Lambda and be included in the deployment package. Here's a guide on how to do it.
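
A minimal sketch of what building that package might look like on a Linux machine; the handler filename, function name, and bucket below are placeholders, and the Python version should match your Lambda runtime:

# Install the dependencies into a build directory (run this on Linux,
# e.g. an EC2 instance or an Amazon Linux Docker container)
mkdir -p package
pip install pandas scikit-learn -t package/

# Add the handler code, then zip everything up
cp lambda_function.py package/
cd package && zip -r ../deployment-package.zip . && cd ..

# Either upload the zip in the Lambda console, or push it to S3
# and point the function at it:
aws lambda update-function-code \
    --function-name my-prediction-function \
    --s3-bucket my-deployment-bucket \
    --s3-key deployment-package.zip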

Dan Carter
  • I tried this solution; however, when I reference the zip on S3 it shows a size exceeded error. Did you face the same issue, or am I doing anything wrong? – 3nomis Mar 26 '19 at 08:53
  • How big is your deployment package uncompressed? – Dan Carter Mar 26 '19 at 09:44
  • About 45 MB. I am using Windows, though. Could it be that numpy and sklearn download Windows-compatible files which are not compatible with AWS's Linux? – 3nomis Mar 29 '19 at 13:39
  • 1
    Yes, you need to create the package in a Linux environment. – Dan Carter Mar 29 '19 at 14:07
  • 1
    Do you recommend using a Docker Amazon Linux container? Or a normal Linux OS is ok? – 3nomis Mar 29 '19 at 15:56
  • 1
    I'd guess any Linux OS would be okay, but not sure on that one. I ssh'd into an Amazon EC2 instance and created the package there... – Dan Carter Mar 29 '19 at 16:14
  • Have you ever tried making packages with sklearn numpy and so on? – 3nomis Mar 29 '19 at 16:18
  • 1
    Yes, numpy, pandas, sklearn. It will work :) Just make sure you only add the packages you need; with Lambda you pay for memory used per second, and package size may affect that. – Dan Carter Mar 29 '19 at 16:23
  • How did you manage to create a small enough zip file for both pandas and sklearn that would satisfy the less than 50 mb limit? – justanewb Jun 07 '21 at 23:00
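
As a general note on the size question above (not something from this thread): a common way to shrink a pandas + sklearn deployment package before zipping is to strip files the runtime never needs, for example:

# Run inside the build directory before zipping
find package -type d -name "__pycache__" -exec rm -rf {} +
find package -type d -name "tests" -exec rm -rf {} +
rm -rf package/*.dist-info

# The compiled shared libraries are most of numpy/scipy's size;
# stripping debug symbols usually saves a lot
find package -name "*.so" -exec strip {} \;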