Transcript
Hi, I'm Nicola Pietroluongo, an AWS senior solutions architect, and one of my responsibilities is to help customers innovate and unlock new possibilities with machine learning. Today I'm going to talk about how to deploy your machine learning model as a serverless API.
Let's start by saying that only a small fraction of a real-world machine learning system is composed of the machine learning code. There is a vast and sometimes complex infrastructure surrounding a machine learning workload, with its own challenges. What if we could minimize or remove some of those concerns? I'm talking about issues around model deployment and serving: the part where, once you have done all the work of creating a machine learning model, you face questions such as where to host your model to serve predictions at scale and cost effectively, and how to do that in the easiest way. The answer to where to host is AWS
Lambda, a serverless compute service that lets you run code without
provisioning or managing servers. AWS Lambda has many benefits, such as automatic scaling, high availability, and a pay-as-you-go model. But more importantly for this session, it supports functions deployed as container images. In other words, you can bundle a machine learning model and its code in a container image and deploy it as a serverless API.
Now that we know the where, let's see how to deploy a container on Lambda. The answer to how is to use the AWS Serverless Application Model, or AWS SAM. It is an open-source framework that you can use to build serverless applications on AWS. It provides a single deployment configuration, built-in best practices, local debugging, testing, and more. AWS SAM allows you to define your serverless application using a template. SAM templates are an extension of AWS CloudFormation templates. AWS CloudFormation is a service that allows you to provision infrastructure as code. Here you can see a sample template composed
of three blocks. The first block instructs CloudFormation to perform the serverless transformation. The central block creates a Lambda function connected to Amazon API Gateway, with its code and all the necessary permissions. API Gateway is a fully managed service that handles all the tasks involved in accepting and processing API calls. The last block helps AWS SAM manage the container image. To summarize, with a few lines of code this template creates everything we need to run a serverless machine learning container. It exposes a public endpoint we can call with a POST request to run inferences. We don't need to worry about all the configuration, roles, and permissions those resources need to run; SAM will take care of that.
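To give an idea of the shape of such a template, here is a minimal sketch; the resource name, path, memory, and tag values are illustrative, not the exact quick-start ones.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31        # first block: the serverless transform

Resources:
  InferenceFunction:                          # central block: Lambda function plus API event
    Type: AWS::Serverless::Function
    Properties:
      PackageType: Image
      MemorySize: 2048
      Timeout: 60
      Events:
        Inference:
          Type: Api
          Properties:
            Path: /classify
            Method: post
    Metadata:                                 # last block: how SAM builds the container image
      Dockerfile: Dockerfile
      DockerContext: ./app
      DockerTag: python3.8-v1
```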
This is effectively what we're going to create in this talk, but without writing a single line of code, because AWS SAM provides a command line tool, the SAM CLI, which makes it easy to create, manage, and test serverless applications, and it's available for Linux, Windows, and macOS. You can use the CLI to build, validate, and test serverless applications and integrate them with other resources such as databases. Let's see what we can run with the SAM CLI.
We can use sam init to generate a pre-configured template; sam package to create a deployment package; sam build and sam deploy, as you can imagine, to build and deploy an application; and finally sam local to test the application locally.
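In shell form, the workflow we're about to walk through looks roughly like this (the event file path is the one the quick start typically generates):

```bash
# Generate a pre-configured project from a template
sam init

# Build the application; for an image package this builds the Docker image locally
sam build

# Invoke the function locally with a sample event
sam local invoke --event events/event.json

# Package and deploy to AWS, answering the guided prompts the first time
sam deploy --guided
```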
Now it's time to see this tool in action and deploy a machine learning model as a serverless API. I'm going to show how to use the SAM CLI to create a machine learning container running on AWS Lambda and exposed via Amazon API Gateway. In this way, a client application can make requests to API Gateway, which will invoke the Lambda function and return the inference output. We are going to generate the template with sam init, build the solution with sam build, test locally with sam local, create a container registry (this is optional if you already have one), deploy with sam deploy, and finally test the deployment. The first step is to run sam init to initialize the project.
First we need to choose between an AWS quick start template and a custom one, and I'm going to choose the quick start template. The next question is about the package type we would like to use: the choice is between a zip file and a container image, and the choice will be an image. AWS already provides base images for common tasks, but you have the possibility to create your own. For this demo, I'm going to use the Python 3.8 base image. Now it's time to select a name for the project; I'm going to accept the default, sam-app. At this point, SAM is fetching the required files to create the app. The final step is to select which type of application we would like to run. As you can see, there are some pre-configured scenarios to get started quickly: Hello World, PyTorch, scikit-learn, TensorFlow, XGBoost. I'm going to choose the PyTorch machine learning inference API. At this stage, SAM generates a directory with all the required files to run the application.
Let's move inside the application directory. I'm going to use the tree command to show the directory structure. Here you can see all the files generated by SAM. The quick start I've chosen contains a sample model to identify handwritten digits. There is a template.yaml, like the one we saw before, to generate the infrastructure; a sample training file to train the model; an event file to test the API; and more. Two files that are particularly important are the Dockerfile, used to build the container, and the app file, which contains the code.
Let's inspect the Dockerfile. You can see the base image we previously selected in the FROM statement, and other statements to bundle and run the application. At the very bottom you can see the Docker CMD instruction, which specifies what is executed when the container starts. In this case, it will execute the function called lambda_handler inside the app file.
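As a rough illustration, a Dockerfile for this kind of quick start might look like the following; the file paths and base image tag are assumptions and depend on the generated project.

```dockerfile
# AWS-provided base image for Python 3.8 Lambda functions
FROM public.ecr.aws/lambda/python:3.8

# Copy the inference code, dependency list, and bundled model into the image
COPY app.py requirements.txt ./
COPY model /opt/ml/model

# Install the Python dependencies next to the function code
RUN python3.8 -m pip install -r requirements.txt -t .

# Tell Lambda which handler to run when the container starts
CMD ["app.lambda_handler"]
```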
So let's have a look at the app file. As you can see, it's a Python file with all the statements required to run inference: the preprocessing steps, the model load, and, here, the lambda_handler function, which runs the inference and returns a JSON output with the prediction.
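A minimal sketch of what such a handler might look like, assuming a TorchScript model bundled at model/model.pth and a base64-encoded image in the request body; the generated app.py differs in the details.

```python
import base64
import io
import json

import torch
from PIL import Image
from torchvision import transforms

# Load the bundled model once, at container start (path is an assumption)
model = torch.jit.load("model/model.pth")
model.eval()

# Simple preprocessing: grayscale, then convert to a tensor
preprocess = transforms.Compose([transforms.Grayscale(), transforms.ToTensor()])

def lambda_handler(event, context):
    # The request body carries the image as a base64-encoded string
    image_bytes = base64.b64decode(event["body"])
    image = Image.open(io.BytesIO(image_bytes))

    # Run the forward pass and pick the most likely digit
    with torch.no_grad():
        scores = model(preprocess(image).unsqueeze(0))
    predicted_label = int(scores.argmax(dim=1).item())

    # Return a JSON response that API Gateway passes back to the caller
    return {
        "statusCode": 200,
        "body": json.dumps({"predicted_label": predicted_label}),
    }
```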
Back in the main folder, now it's time to build the application with sam build. In short, this operation builds and tags a Docker container locally.
Earlier we saw a file called event.json, which can be used to test the application. Let's zoom out a bit. The file contains a JSON request with a payload body, which is the base64 representation of an image. I can decode the body and show you the image with this statement. As you can see, it's the representation of a handwritten three.
of a ramp tree. Let's try to test the application locally
with some invoke using the event JSON file which
contains the representation of the number three as we saw before.
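For instance, something along these lines (the function's logical ID is illustrative and can be omitted when the template defines a single function):

```bash
# Run the container locally and feed it the sample event
sam local invoke InferenceFunction --event events/event.json
```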
As you can see, the response is exactly what we expected: the predicted label is the number three. To recap, we used the quick start to generate the code and related assets, built the Docker image, and tested the container locally.
Now we are entering the deployment phase. To deploy the solution, we need to make our local container available to cloud resources. One way is to create a repository in Amazon Elastic Container Registry, or ECR, and push the Docker image there. ECR is a fully managed service to store, manage, and deploy container images.
We need to authenticate before creating the repository in ECR, and this statement allows us to retrieve temporary credentials. This series of commands is run with the AWS CLI, not the SAM CLI. You might notice that we need to substitute the region and account ID in this statement. I've redacted some parts, but this is how the request might look. The authentication is successful. Now we can create a repository called ml-demo with the aws ecr create-repository command. This is the output, and we need to copy the repository URI, which will be used during the deployment phase.
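Those two steps might look roughly like this, with the region and account ID replaced by your own values:

```bash
# Retrieve a temporary token and use it to authenticate Docker against your ECR registry
aws ecr get-login-password --region <region> \
  | docker login --username AWS --password-stdin <account-id>.dkr.ecr.<region>.amazonaws.com

# Create the repository for the demo image; note the repositoryUri in the output
aws ecr create-repository --repository-name ml-demo
```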
The final step is to run sam deploy --guided and follow the instructions.
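For reference, the command is simply:

```bash
# Start a guided deployment; SAM prompts for a stack name, region, image repository,
# permission to create IAM roles, and whether to save the answers for later runs
sam deploy --guided
```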
The first prompt asks us to give a name to the stack. The second is about choosing the region in which the stack will be deployed. In the next step we need to specify the Docker image repository, and we're going to use the URI we saw before. Then there is a confirmation step to apply the changes. SAM will need permissions to set up the resources. This step tells us that our API doesn't have any authorization method; this is okay for this demo, but it's good practice to secure the access. All those choices are going to be saved in a configuration file, which will make the next deployments faster. Let's accept the default name for the config file and keep the default environment for this configuration. At this stage, SAM is pushing the Docker image we built locally into the ECR repository. Finally, everything is ready to be deployed.
As you can see, the stack has been created successfully. That's wonderful, job done. SAM created all the resources for us: the API Gateway and the Lambda function. We saw earlier that the API Gateway is publicly exposed and can be used by an application to run inferences. So if we want to test our cloud stack, we need to grab the API Gateway endpoint, which is actually part of this output. Scrolling up a bit, we can see the API endpoint.
Let's clean up the terminal a bit, and as the very final step we can test our serverless machine learning model by making a request against the API endpoint. You can see we can use curl to create a POST request and send the base64 representation of the number three.
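Such a request might look roughly like this; the endpoint URL comes from the stack outputs, and the exact path and payload shape depend on the generated app:

```bash
# POST the base64-encoded digit from the sample event to the deployed endpoint
curl -X POST "https://<api-id>.execute-api.<region>.amazonaws.com/Prod/classify/" \
  --data "$(jq -r '.body' events/event.json)"
```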
Let's send the request and celebrate: the output of the inference is what we expected. To recap, we used the SAM CLI to generate and deploy a serverless machine learning model. The model and the application code have been bundled in a container and deployed in a Lambda function, which resides in a private network, while an API Gateway has been deployed to handle public API requests.
As a final consideration, before getting into serverless machine learning solutions, you need to carefully validate your use case and define clear KPIs. With the serverless approach you run code with zero administration, and with the pay-as-you-go model you don't pay for unused server time. Moreover, you benefit from continuous scaling. To date, serverless machine learning solutions are better suited when performance is not a big concern and when you work with batch processing, since everything runs independently and in parallel. It's also important to be aware of the service quotas and whether they affect your use case; for instance, AWS Lambda supports container images of up to ten gigabytes in size. And this leads us to the final conclusion, which is to continuously test and validate your assumptions, and AWS SAM surely gives an advantage in terms of fast prototyping and experimentation. That's great if you want to innovate faster. Thank you.