Transcript
This transcript was autogenerated. To make changes, submit a PR.
Welcome. In this session I will discuss how you can take your data science and machine learning projects from idea to production by automating your machine learning workflows with pipelines.
Before I start, I want to point out two great learning resources
to follow up on this topic after today's session.
Besides working as a developer advocate, I'm also an O'Reilly author
and Coursera instructor. The O'Reilly book Data Science on AWS, which I coauthored, discusses in over 500 pages and hundreds of code samples how to implement end-to-end, continuous AI and machine learning pipelines. Another great resource is the newly launched Practical Data Science Specialization, built in partnership with DeepLearning.AI and Coursera. This three-course specialization teaches you practical skills in how to take your data science and ML projects from idea to production using purpose-built tools in the AWS cloud, and it also includes on-demand, hands-on labs for you to practice.
So we're talking about automating machine learning.
Hmm. I have an idea.
"Alexa, deploy my model."
"Which multi-armed bandit strategy would you like to use? Thompson sampling, epsilon-greedy, or online cover?"
Well, I'm pretty sure someone already thought about developing
this Alexa skill, but unfortunately,
getting your machine learning projects ready for production is
not just about technology. A term you
will likely hear in this context of getting your ML applications
ready for production is MLOps. MLOps builds on DevOps practices that encompass people, process, and technology. However, MLOps also includes considerations
and practices that are really unique to machine learning
workflows. So while most of the time we
tend to focus on the technology, people and process
are equally, if not more important.
Let's take a look at a few key considerations in ensuring
your models and machine learning workloads have
a path to production.
First of all, the machine learning development lifecycle is very
different from a software development lifecycle.
For example, model development includes longer experimentation
cycles compared to what you would typically see
in an agile software development process.
You need to consider choosing the right datasets, performing data transformations, and feature engineering.
So besides the actual model training code, you also
need to develop the data processing code.
Next, the model is typically only a small part
of an overall machine learning solution, and there are often
more components that need to be built or integrated.
For example, maybe the model needs to be integrated into an existing
application to trigger further processing depending on the prediction results. This leads to the next
consideration.
There are typically multiple personas involved in the
machine learning development lifecycle, often with competing needs and
priorities. A data scientist might feel comfortable building a model that meets the expected
model performance metrics, but might not know how
to host that model in a way that it can be consumed
by another system or application.
This part might require a DevOps engineer or the infrastructure
team. You also need to integrate
the projects with existing IT systems and practices, such as change management. This could mean that, as part of
the pipelines, you automatically open a change ticket
anytime a new model is ready to get deployed
into production. Or you might want to
add a manual approval step in your pipeline before deploying any model into production.
If we look at the goal of MLOps, you want to move away
from manually building models, which is often still
the status quo. In the first phase, you can accelerate the path to production by building and managing pipelines, instead of building and managing individual models. You also want
to improve the quality of deployed models. To do
this, you need to be able to detect model decay,
maybe due to a drift in the statistical data distributions. You should also monitor
the models for any drifts in bias or explainability
from a set baseline. This can be accomplished
in a second phase, and ultimately
this should lead into building AI ML solutions
that are resilient, secure, performant,
operationally efficient, and cost optimized.
Let's have a closer look at each phase.
Today we often still manually build and manage individual
models. We also execute
each step in the model development workflow individually.
Here is an example workflow where a data engineer may create
a raw data set and manually send it to a data scientist.
Then the data scientist iteratively performs data preparation and feature engineering, and runs multiple experiments until a trained model is actually performing well according to the objective metrics. Then the data scientist
may hand it off to a deployment team or an
ML engineer who is then responsible for deploying the model.
If there has been limited communication between teams,
this part could result in a lot of delays because the model is
essentially not transparent to the deployment engineer or
the DevOps team, meaning there is limited visibility into
how the model is built or how you consume that model.
Then a software engineer potentially needs to make
changes to the application, which consumes that
model for prediction. And finally,
someone ultimately needs to operate the model in production,
which includes making sure the right level of monitoring is
set up. We can see the challenges
in this setup. The workflow includes multiple handoffs between teams and personas who might not all be familiar with machine learning workloads. Limited cross-team collaboration could lead to limited visibility and transparency, causing increased code rework, and ultimately slows down the ability to
get the model to production quickly. So what can we do? In a first phase, we can improve the situation by orchestrating the individual steps as a pipeline. We can also look at automating tasks in each step. For example,
we can build a model training pipeline that orchestrates the data preparation, model training, and model evaluation steps. We could also build a deployment pipeline which grabs a model from a model registry and
deploys it into a staging environment.
The software engineers could then use this model to
run unit or integration tests before approving
the model for production deployment.
Let's see a demo of a model training pipeline.
All right, here I am in my AWS demo account,
and I want to show you how you can leverage Amazon SageMaker Pipelines to automate the individual steps of building a machine learning model. Amazon SageMaker is
a fully managed service that helps you build, train,
tune, and deploy your machine learning models.
All right, so the use case we're going to build is
I want to train a natural language processing model to classify product reviews. So I'm going to pass in raw product review text, for example, "I really enjoyed reading this book," and my NLP model should classify this into a star rating. So in this case, hopefully a star rating of five, the best, or a star rating of four, three, two, or one, with one being the worst.
And the way we're doing this, I'm going to use a pre-trained BERT model. BERT is a very popular model architecture in the NLP space, and what you can leverage is actually pre-trained models that have been trained on millions of documents already, for example Wikipedia. And then you can fine-tune it to your specific dataset, which I will do in the training step in this pipeline, fine-tuning it to my specific product review text.
So the DAG we're going to build here first processes the raw text data to generate the embeddings that the BERT model expects as inputs. Then in the training step, I'm going to fine-tune the model to my dataset, and I'm also evaluating the model performance. So in this case, I'm checking the validation accuracy of my model, and I'm defining a threshold, a condition. And if my model performs above this threshold, then I'm going to register it in a model registry. Think of it as a catalog of your models to compare the different versions. And I'm also preparing for deployment by creating a model object here in SageMaker. All right,
so let's see how we can build this.
First of all, here I'm importing a couple of SDKs and libraries. One of them is the SageMaker Python SDK, and a couple of additional libraries used here in the
AWS environment. All right, first of all,
I'm going to import or set the location of the raw dataset, which is my reviews data here, hosted in a public S3 bucket.
And what I'm going to do here is I'm pulling a subset of the
data just for demo purposes here into my own AWS account.
So I'm setting a path here to my own bucket,
and I'm going to pull in just a subset here so the model training doesn't
run for too long, and I'm just pulling in three different categories
of the product reviews data.
All right, let's start building the actual pipeline.
So first of all, I'm creating a name for the pipeline.
Let's call this my BERT pipeline, and add a timestamp.
And then one nice thing about pipelines is that you
can define parameters to parameterize individual executions.
And now let's start with building the first step, which is the
feature engineering. Here's a little bit of explanation of what we're going to do. So my raw dataset here on the left has star ratings and the review text. So for example, "this is a great item" or "I love this book," and the corresponding label, which is the star rating. And what I'm going to do in this first feature engineering step is use a SageMaker processing job, which helps me to execute code on data, so it's specifically suited if you want to run feature engineering. And I'm converting this raw input data into embeddings that the BERT model expects as inputs. All right, so what
I'm going to do here again, I'm making sure I have access
to my input data, which is now in my own bucket.
And I start by preparing a couple of those parameters which I
want to be able to parameterize. So one
is definitely where to find the input data in case the location changes,
or I want to use a different data set.
I'm also specifying the processing job instance count.
So what I could do, depending on how much data I need to process,
I can run this distributed, so I can run it across one
AWS cloud instance. I could run it across two instances, five instances,
et cetera. And it's as easy as just setting the value to one
or five or ten. And the processing job will make sure to
distribute the data across the instances and work in parallel.
In this case, I have a small subset of the data, so I'll stick to
one instance. I can also specify here
the instance type. So this is the AWS EC2 instance type managed by SageMaker to run this processing job,
and I'm just specifying one particular instance type here.
Then I'm also defining a couple of parameters, which my feature
engineering script requires.
For example, I could set parameters such as whether I want to balance the dataset beforehand, and the percentages for how to split the data into a training, validation, and test dataset. So in this case, I'm taking 90% for my training data, and I keep a split of 5% for validation and another 5% for test.
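As a rough sketch, this is roughly what the pipeline name and those execution parameters can look like with the SageMaker Python SDK; the S3 path, instance type, and split values below are illustrative placeholders, not the exact values from the demo.

```python
import time

from sagemaker.workflow.parameters import (
    ParameterFloat,
    ParameterInteger,
    ParameterString,
)

# Unique pipeline name per demo run
pipeline_name = "BERT-pipeline-{}".format(int(time.time()))

# Placeholder S3 location of the raw reviews data (adjust to your bucket)
raw_input_data_s3_uri = "s3://<your-bucket>/amazon-reviews/raw/"

# Parameters that can be overridden for each pipeline execution
input_data = ParameterString(name="InputData", default_value=raw_input_data_s3_uri)
processing_instance_count = ParameterInteger(name="ProcessingInstanceCount", default_value=1)
processing_instance_type = ParameterString(name="ProcessingInstanceType", default_value="ml.c5.2xlarge")
train_split_percentage = ParameterFloat(name="TrainSplitPercentage", default_value=0.90)
validation_split_percentage = ParameterFloat(name="ValidationSplitPercentage", default_value=0.05)
test_split_percentage = ParameterFloat(name="TestSplitPercentage", default_value=0.05)
```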
All right, then the step actually needs to perform the
feature engineering. So what I've done in preparation is
I wrote a Python script which performs the actual transformations,
and this is here in this Python file.
So I'm not going to go into all of the gory details. If you're curious
how to do this, have a look at the GitHub repo.
All right, then I can start creating this
processing job, and for that I'll
define a processor. Here I'm using a prebuilt processor based on scikit-learn. I'm defining the framework version, passing in my IAM role, the instance type and the instance count, and also the region I'm operating
in. And I need one more thing,
because before I wrap it into the official workflow and pipeline step, I need to define the inputs and the outputs for the job. The input here is the raw input data, and the outputs are S3 folders where the generated features will be stored. And I'm also going to split this again into training, validation, and test datasets. So I'm putting here the three locations where the data will be written to, and those internal container paths will get mapped to an S3 location later. All right,
and with that I can define the official step as part of
my pipeline. So here you can see I'm defining a
processing step. I'll give it a name; this is what you saw in the DAG. I point to the actual Python code to execute my feature engineering, and then I'm passing in the scikit-learn processor which I defined, the inputs, and the outputs. And here I'm passing the specific job arguments that my script requires, so for example, the training, validation, and test split percentages, et cetera. All right, this defines my processing step.
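A minimal sketch of that processor and processing step, assuming the parameters from the earlier sketch and an IAM execution role; the script name and container paths are illustrative.

```python
import sagemaker
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.steps import ProcessingStep

role = sagemaker.get_execution_role()  # IAM role with SageMaker permissions

# Prebuilt scikit-learn processor that runs the feature engineering script
processor = SKLearnProcessor(
    framework_version="0.23-1",
    role=role,
    instance_type="ml.c5.2xlarge",
    instance_count=1,
)

processing_step = ProcessingStep(
    name="Processing",
    processor=processor,
    code="preprocess-scikit-text-to-bert.py",  # feature engineering script (name assumed)
    inputs=[
        ProcessingInput(
            input_name="raw-input-data",
            source=input_data,  # the InputData pipeline parameter
            destination="/opt/ml/processing/input/data/",
        )
    ],
    outputs=[
        ProcessingOutput(output_name="bert-train", source="/opt/ml/processing/output/bert/train"),
        ProcessingOutput(output_name="bert-validation", source="/opt/ml/processing/output/bert/validation"),
        ProcessingOutput(output_name="bert-test", source="/opt/ml/processing/output/bert/test"),
    ],
    job_arguments=[
        "--train-split-percentage", "0.90",
        "--validation-split-percentage", "0.05",
        "--test-split-percentage", "0.05",
    ],
)
```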
Now let's move on to the second step, which is fine-tuning the model with the help of a SageMaker training job. And this is pretty similar. So here you can see again I'm defining parameters, for example, the training instance type and count. And then I'm setting up the hyperparameters, which will depend on the model you're using and the use case. So in my case, some general parameters: the number of epochs, which is the number of runs through the whole dataset, and I'm just going to keep this at one for this demo purpose. I'm setting a learning rate and then additional values, for example, the epsilon value, the train batch size, the validation batch size, et cetera. Again, this will highly depend on the type of model and use case you are training. All right,
next, what I'm going to do is I'm also going to capture
the performance of my model during training.
So I can specify here regex expressions that match what the model training code will output in the logs. So for example, my script will output the validation loss and the validation accuracy, and I'm using those regex expressions to capture them from the logs. They are then also available in the UI and for me to check later on in the evaluation step.
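Those metric definitions are just name/regex pairs; a small sketch of what they might look like, assuming the training script prints Keras-style `val_loss` and `val_accuracy` lines to the logs.

```python
# Regexes SageMaker uses to scrape metrics from the training job logs.
# The patterns must match whatever your training script actually prints.
metrics_definitions = [
    {"Name": "train:loss", "Regex": r"loss: ([0-9\.]+)"},
    {"Name": "train:accuracy", "Regex": r"accuracy: ([0-9\.]+)"},
    {"Name": "validation:loss", "Regex": r"val_loss: ([0-9\.]+)"},
    {"Name": "validation:accuracy", "Regex": r"val_accuracy: ([0-9\.]+)"},
]
```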
All right, and I've talked about the training script.
So again, I've prepared a Python file which contains the
code to train my model. This is here in the tf_bert_reviews.py file. And again, I'm not going into details. If you're interested in seeing how to do this, in particular how to use this pre-trained Hugging Face model and then just fine-tune it to the data, please check out
the code in the GitHub repo. All right,
so we can now prepare the training estimator and
then build the model training step. So first
of all, I actually need to define the estimator which performs
the training. And I'm using a built-in TensorFlow estimator with SageMaker, which has optimizations to run TensorFlow on AWS. And as you can see here, I'm defining this TensorFlow estimator, pointing to the training script which I just highlighted, and also passing in the additional parameters: the role, instance count and type, the Python version, and the TensorFlow framework version I want to use. And again, my hyperparameters, which I defined earlier.
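A sketch of that estimator, assuming the role and metric definitions from the earlier snippets; the framework versions, instance size, and hyperparameter values are illustrative only.

```python
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point="tf_bert_reviews.py",        # training script shown above (name assumed)
    role=role,                               # IAM role from the earlier snippet
    instance_count=1,
    instance_type="ml.c5.9xlarge",
    py_version="py37",
    framework_version="2.3.1",
    hyperparameters={
        "epochs": 1,
        "learning_rate": 0.00001,
        "epsilon": 0.00000001,
        "train_batch_size": 128,
        "validation_batch_size": 128,
    },
    metric_definitions=metrics_definitions,  # regexes defined earlier
)
```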
One more thing that I'm actually going to do is activate step caching. With step caching, what you can do is make sure that, if you're rerunning the pipeline and individual steps have not changed, the previous results are reused. So SageMaker will apply this caching to help you accelerate and run the different executions more efficiently. All right, and with that, I can define the training step.
So here you can see I define the official training step. I give it a name; again, this name will appear in the DAG as you could see before. Then I'm passing in this estimator that has all of the TensorFlow and training script configuration, and I'm also passing in the inputs. And here you can see I'm referring to the previous processing step output and using it as an input for the training. So those are the features that I generated for BERT training, for BERT validation, and for BERT test.
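Putting the caching and the training step together might look roughly like this; the channel names and output names assume the processing step sketched earlier.

```python
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import CacheConfig, TrainingStep

# Reuse results of unchanged steps on re-runs; cache entries expire after one hour here
cache_config = CacheConfig(enable_caching=True, expire_after="PT1H")

training_step = TrainingStep(
    name="Train",
    estimator=estimator,
    inputs={
        # Feed the features generated by the processing step into training
        "train": TrainingInput(
            s3_data=processing_step.properties.ProcessingOutputConfig.Outputs["bert-train"].S3Output.S3Uri
        ),
        "validation": TrainingInput(
            s3_data=processing_step.properties.ProcessingOutputConfig.Outputs["bert-validation"].S3Output.S3Uri
        ),
        "test": TrainingInput(
            s3_data=processing_step.properties.ProcessingOutputConfig.Outputs["bert-test"].S3Output.S3Uri
        ),
    },
    cache_config=cache_config,
)
```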
All right, after the training, there comes the model evaluation.
So let's see how I can do this.
The model evaluation I can also execute as a processing
job again. So I'm going to use this scikit-learn processor again, specifying the framework version, instance type, and count. The difference here is that I do have another script to execute. So instead of the feature engineering, I've now written a script that evaluates the model performance, and again, here is a link to the Python script. Basically, what I'm going to do is use the pre-trained and fine-tuned model from the previous step, and I'm running some test predictions and seeing how the performance is. So in this case, I'm specifically looking at the validation performance of my model.
All right, and the results get written into a JSON file, which I call the evaluation JSON; this is the official evaluation report from the step. And here's the official definition. So this is actually implemented as another processing step; in this case, I call it "evaluate the model." I'm pointing to this new script which runs the validation. And again, I'm pointing it here to the inputs, which in this case are the fine-tuned model, the model artifact from the training step, and also, again, input data which I can use to run the validation. And I'm also pointing to an output location to store the evaluation results.
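A sketch of the evaluation step, reusing the objects from the earlier snippets; the evaluation script name, the report file name, and the container paths are assumptions.

```python
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.properties import PropertyFile
from sagemaker.workflow.steps import ProcessingStep

evaluation_processor = SKLearnProcessor(
    framework_version="0.23-1",
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

# The evaluation script writes evaluation.json; the PropertyFile lets
# later steps (the condition) read values out of that report.
evaluation_report = PropertyFile(
    name="EvaluationReport",
    output_name="metrics",
    path="evaluation.json",
)

evaluation_step = ProcessingStep(
    name="EvaluateModel",
    processor=evaluation_processor,
    code="evaluate_model_metrics.py",  # evaluation script (name assumed)
    inputs=[
        ProcessingInput(
            source=training_step.properties.ModelArtifacts.S3ModelArtifacts,  # fine-tuned model artifact
            destination="/opt/ml/processing/input/model",
        ),
        ProcessingInput(
            source=input_data,  # raw data used for the test predictions
            destination="/opt/ml/processing/input/data",
        ),
    ],
    outputs=[
        ProcessingOutput(output_name="metrics", source="/opt/ml/processing/output/metrics"),
    ],
    property_files=[evaluation_report],
)
```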
All right, the official model metrics are defined in an object, which I do here, which contains the evaluation JSON file. All right,
and the last part now is to define the
condition step, checking whether my model meets the expected quality gate. And if yes, then register the model in the model registry and also prepare it for deployment. And what I'm going to do here is first create those steps that come afterwards, and then I can reference them in the actual condition step. So let's see this.
And one nice thing that you can do with SageMaker Pipelines is to actually set a model approval status. So when you get a model, you evaluate it, and you can specify whether it has, for example, a pending manual approval, so somebody has to look at the metrics and then approve it for deployment. You can set it to always approved, but in this case, to show you how it works, I keep it at manual approval only. All right, then I'm
going to define again the instance types and counts, where to later
deploy my model and host it for live predictions.
And I'm also specifying the model package group, which is
registered in the model registry. And I'm
defining an image that is used to later deploy
the endpoint and then run the model and the
inference code. So in this case, it's going to be a TensorFlow-based Docker image again. All right, so here is the step that registers
the model. It's taking the estimator object and
the information about the inference image to use.
It actually points to the S3 location of the fine-tuned model, and it also defines specific input format types. For example, you know that your model expects JSON Lines as input, and the response is also going to be in JSON Lines. This might vary, of course, depending on your model. And you also set here the model package group to register it with, and the approval status. This one will be pending manual approval.
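This registration step could be sketched roughly as follows, assuming the estimator and training step from before; the inference image URI, instance types, package group name, and the S3 path of the evaluation report are placeholders.

```python
from sagemaker.model_metrics import MetricsSource, ModelMetrics
from sagemaker.workflow.step_collections import RegisterModel

# TensorFlow inference image, e.g. looked up via sagemaker.image_uris.retrieve(...)
inference_image_uri = "<tensorflow-inference-image-uri>"

# Attach the evaluation report as the official model metrics
model_metrics = ModelMetrics(
    model_statistics=MetricsSource(
        s3_uri="s3://<your-bucket>/<evaluation-output-prefix>/evaluation.json",  # placeholder
        content_type="application/json",
    )
)

register_step = RegisterModel(
    name="RegisterModel",
    estimator=estimator,
    image_uri=inference_image_uri,
    model_data=training_step.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["application/jsonlines"],
    response_types=["application/jsonlines"],
    inference_instances=["ml.m5.4xlarge"],
    transform_instances=["ml.m5.4xlarge"],
    model_package_group_name="BERT-Reviews",  # placeholder group name
    approval_status="PendingManualApproval",
    model_metrics=model_metrics,
)
```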
All right, then what I'm also going to do here is create a
step to prepare the model for deployment later.
So I'm preparing a model object in SageMaker, again passing in the inference image and also the model artifact. All right, and then I define here the official create model step for the pipeline and pass in the model which I just created.
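And the create-model preparation could be sketched like this, reusing the inference image and training step output from above; the instance type is just an example.

```python
from sagemaker.inputs import CreateModelInput
from sagemaker.model import Model
from sagemaker.workflow.steps import CreateModelStep

# Wrap the fine-tuned artifact and the inference image into a SageMaker Model object
model = Model(
    image_uri=inference_image_uri,
    model_data=training_step.properties.ModelArtifacts.S3ModelArtifacts,
    role=role,
)

create_model_step = CreateModelStep(
    name="CreateModel",
    model=model,
    inputs=CreateModelInput(instance_type="ml.m5.4xlarge"),
)
```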
So now we can start creating the condition check that comes before those steps. So what I'm going to do here is import the conditions
and corresponding functions that are available. For example,
a condition greater than or equal to, and I'm
defining a minimum accuracy value I want to check
against. As this is a demo and I'm just training on a little bit of
data, I'll keep this low so all the model training runs will actually
pass. But obviously in other use cases,
you definitely want to bump up the accuracy threshold. In this case, I'm using
20% as my minimum accuracy to check against.
Then here is the definition. So it's going to execute this check, condition greater than or equal to. And here is my evaluation step, and I'm also pointing to the report file that gets generated and to the threshold value I created. The official step is then running this condition check. And if I meet the condition, so if I'm passing my quality threshold, I'm going to register the model, and I'm also going to create the model in preparation for later deployment. In the else branch, you can say what to do if I fail the test, right, send a message to the data scientist or whatever you want to do. In my case, I'm just keeping it empty, so it will fail and end the pipeline.
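A sketch of that condition step, assuming the evaluation step and property file from before; note that the JSON path depends on how the evaluation script structures its report, and in newer SDK versions JsonGet lives in sagemaker.workflow.functions and takes a step name instead of a step object.

```python
from sagemaker.workflow.condition_step import ConditionStep, JsonGet
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo

# Compare the accuracy from evaluation.json against the minimum threshold
minimum_accuracy_condition = ConditionGreaterThanOrEqualTo(
    left=JsonGet(
        step=evaluation_step,
        property_file=evaluation_report,
        json_path="metrics.accuracy.value",  # path inside the report (assumed)
    ),
    right=0.20,  # intentionally low threshold for the demo
)

condition_step = ConditionStep(
    name="AccuracyCondition",
    conditions=[minimum_accuracy_condition],
    if_steps=[register_step, create_model_step],  # register and prepare for deployment
    else_steps=[],  # nothing on failure; the pipeline simply ends
)
```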
So what we've done now is define each and every step, from the preprocessing, to the training, to the condition and preparation for deployment if I pass my quality threshold. So now I can wrap this in the end-to-end pipeline definition. First of all, again, I'm importing some of the needed functions and objects from the SDK. And as you can see here, I'm now creating the official pipeline object, passing in the name that we created and all of the parameters which I specified in the above code. And then the steps here will actually line up the individual steps in this DAG. So let's start with the processing, then move to the training step, do the evaluation step, and then you have this condition step to evaluate the model, which will in itself trigger the two different paths depending on whether I pass the quality check. I'm also adding this to official experiment tracking, so I can keep track of my pipeline runs, and that's the definition of my pipeline. And then I can submit the pipeline for execution.
So I'm calling the pipeline create, passing a role that has the permissions to execute everything. And then what I can do is call the pipeline start, which will start an individual execution run of this pipeline. And you can see here you can also pass in your parameter values now, and that will kick off the pipeline in the background. This will now run on AWS with the SageMaker processing job, the training job, and also the model evaluation.
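The end-to-end definition, creation, and start might look roughly like this, using the steps and parameters sketched above.

```python
from sagemaker.workflow.pipeline import Pipeline

pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        input_data,
        processing_instance_count,
        processing_instance_type,
        train_split_percentage,
        validation_split_percentage,
        test_split_percentage,
        # ...plus any training and deployment parameters
    ],
    steps=[processing_step, training_step, evaluation_step, condition_step],
)

# Register the pipeline definition, then kick off one execution
pipeline.create(role_arn=role)
execution = pipeline.start(
    parameters={"InputData": raw_input_data_s3_uri}  # override a parameter for this run
)
# execution.wait()        # optionally block until the run finishes
# execution.list_steps()  # inspect the status of each step
```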
And what I also want to show you is that SageMaker keeps track of the individual artifacts that are generated in each step. So what I'm doing here is listing all artifacts generated by this pipeline, and this is super helpful if you want to keep track of the individual steps. For example, for the inputs of the processing job, I do have the raw data and I do have the Docker image that's used to process the data. And the output is the generated features, and you can see here whether each artifact contributed to or was produced by the step.
Then in the training step, you have the generated training features as the input and the image to
execute the training job. And the output here would be
the model artifact. So really nice to keep track of
those artifacts for each step. And with that,
we can come back to actually checking on our pipeline.
So I'm going to go back here into the pipelines. You can check on the graph, and this is the overall graph that we defined, so it matches our steps. You can check the parameters again and the settings you set. You can click into the individual graph, and the color coding here shows green, meaning it completed successfully. You can
click into each step and again see the parameters here
that were used to run and execute the step.
All right, what I also want to show you is the model registry.
So let me go here in the navigation also to the model
registry. And we do see here our model group, the BERT reviews, and here is my model version. And again, I've set this to manual approval, so this one here will still need my approval to be deployed into production. I can update the status, set it here to approved, say "this is good for deployment into staging," and hit update. And with that, the model is now approved for deployment, and this completes the first demo.
In the second phase, we could automatically run pipelines
and include automated quality gates.
So here the model training pipeline could automatically
evaluate the model in terms of model performance
or bias metrics and thresholds.
Only models that fall into acceptable performance
metrics get registered in the model registry and
approved for deployment. The deployment pipeline
could leverage deployment strategies such as A/B testing
or bandits to evaluate the model in comparison
to existing models. The software engineer can
automate the integration tests with pass/fail quality gates, and only models that pass get deployed into production. And finally,
the operations team sets up model monitoring and analyzes it for any data drift or model drift. If the drift violates defined threshold values,
this could actually trigger a model retraining pipeline.
Again, another trigger to rerun a model
training and deployment pipeline could be code changes
as part of continuous integration and continuous delivery, CI/CD for short, automation. Let's see another demo of a code-change-triggered pipeline run. All right, I'm back in my AWS demo account. Now I want to
show you how you can leverage SageMaker Projects to automate workflow pipeline runs. SageMaker Projects helps you to set up all of the automation needed in the AWS account to build a continuous integration, continuous deployment workflow.
The easiest way to get started is if you navigate here
in the menu to projects and then
create project. You can see that it already comes with a
couple of prebuilt templates which you can use.
So one template will set up all of the automation for a model building and training pipeline. The other one is for model deployment. And what I've pre-provisioned here is a CI/CD environment for model building, training, and the actual model deployment. I've already pre-built everything, so let me walk you through the steps here programmatically. A SageMaker project is based on an AWS Service Catalog product, so the first step is to enable SageMaker Projects here in the Studio environment.
Then I'm again importing all of the needed SDKs and libraries and setting up the clients, and I'm doing this here programmatically. But again, if you're creating the project through the UI, this is done for you. I've provisioned the Service Catalog item and the template here programmatically through my notebook.
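If you create the project programmatically rather than through the Studio UI, the call might look roughly like this with boto3; the project name and the Service Catalog product and provisioning artifact IDs below are placeholders that come from your own account.

```python
import boto3

sm = boto3.client("sagemaker")

# Create a SageMaker project from an MLOps template provisioned via AWS Service Catalog
response = sm.create_project(
    ProjectName="bert-reviews-mlops",  # placeholder project name
    ProjectDescription="Model build, train, and deploy project",
    ServiceCatalogProvisioningDetails={
        "ProductId": "prod-xxxxxxxxxxxxx",             # placeholder product ID
        "ProvisioningArtifactId": "pa-xxxxxxxxxxxxx",  # placeholder template version ID
    },
)
print(response["ProjectArn"])
```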
Once the project is created,
you can see here on the left that I do have this entry now for projects, and I can select it. And what will happen here in the first step is that SageMaker Projects will create two CodeCommit repos for you in the AWS account you're using. So here you can see I do have two repos, one for the model build and training pipeline, and one for the model deployment pipeline. And all of the automation is set up, so whenever I commit code or push code to those repos, that will actually trigger a pipeline execution. And I'll show you what this looks like. So, back here in my main notebook, the first thing I'm doing in this notebook is to clone those CodeCommit repos locally into my SageMaker Studio environment.
So this is what I'm running here for both code repos.
I'm also removing some of the sample code that is in here. And I'm triggering the first execution of the pipeline by actually copying over my sample pipeline that I built before in the demo. So if I go to the file browser here on the left, you can see I've cloned down those model-build and model-deploy CodeCommit repos, and I can click into those. And I do have all of the needed code already set up, and all of the triggers in the AWS account. So what I can do here is, in pipelines, I just need to add my own pipeline execution code that I want to run on the code change. So if I go in here, you will see I'm using the exact same Python scripts for preprocessing, for model training, and for inference that I've shown you before when I was manually building this pipeline, especially the pipeline.py file. If I open this one, it should look pretty familiar to you. Hopefully, this contains exactly the code that I've been showing you before when I was building the pipeline in the notebook. The only difference here is that I'm now running this in the Python file programmatically when I trigger the pipeline run. Back to my notebook
here. So what I've done is clone the repos here, and I'm copying over my sample code into those repos, and then I'm committing them into CodeCommit. And you can see here it detected that I have removed some sample code, and I've also added my own pipeline code. So that should hopefully be enough changes to the repo. I'm making sure I keep track of all the variables I'm using here. And again, here's the pipeline.py file, which I just showed you, which contains the pipeline code to run, and this is exactly what I have set up before.
And when I did the code commit and the code push to the repo, this triggered all of the CI/CD automation that the template set up for you in the AWS account and started the pipeline run. So if I go down here into my projects again and click on pipelines, I can actually see that here is a pipeline that got started, and I can select it, and it has a succeeded execution run already. I started this some time before the session. All right, so let me show
you how this automation works. So what happens
is that projects integrate with the developer tools, for example with CodePipeline, and have this automation built in. So the first step, as you can see here, is getting the source code needed, and then it's going to build and start the pipeline execution run. And this is done with CodeBuild. So if I now jump into the build projects, you can see here that our build projects are already in place. And the first one is the model training pipeline, which just succeeded. You can also see that the model deploy is currently in a failed state, and this is because it doesn't have an approved model yet to actually deploy. This template is also set up to have a two-phase deployment.
Phase one is deploying in a staging environment,
which for example a data scientist could approve after
evaluating the model. And then also it comes with a
second stage to deploy into a production environment, which would most likely be approved by another team, for example the integrations team, the DevOps team, or the infrastructure team. So I can click here into the CodePipeline, and I can see that my latest run here succeeded. The first one is the one I stopped, coming from the sample code. So let's go back
here to my environment.
So once the pipeline has executed, I can list the steps again here, and I can see everything looks good. I can also list the artifacts again, and this looks familiar to the one I showed before; it's the exact same pipeline, with all the artifacts that contributed to each step. What I do have here now is a last step: an approval that is needed to actually deploy this model into the staging environment.
And if you remember, in the previous demo I showed you how to approve the model through the Studio UI. What I can do here as well is to approve it through the API programmatically, and this is what I'm going to do here. So, in here, I'm looking for the executions and I'm grabbing the model package ARN where we registered the model. And then I'm going to update the model package here and approve it for deployment into this first stage, which is the staging environment.
So here I'm going to update the model package, setting the status to approved, and then I can check here for the model name, and we'll see that the model starts to get deployed into the staging environment.
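A sketch of that programmatic approval with boto3, assuming the model package group name created by the project; approving the latest package is what kicks off the staging deployment.

```python
import boto3

sm = boto3.client("sagemaker")

# Find the latest model package the pipeline registered in the group (name is a placeholder)
packages = sm.list_model_packages(
    ModelPackageGroupName="bert-reviews-mlops-models",
    SortBy="CreationTime",
    SortOrder="Descending",
)
model_package_arn = packages["ModelPackageSummaryList"][0]["ModelPackageArn"]

# Approving the package triggers the deploy pipeline for the staging environment
sm.update_model_package(
    ModelPackageArn=model_package_arn,
    ModelApprovalStatus="Approved",
)
```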
Let's see the deployment pipeline. So what happens here is
once I've approved the model for staging,
it actually started the second pipeline here, which is the model deploy.
And you can see here it started building the source, and it's
currently in progress deploying the model into the defined staging environment.
I can also have a look here. So I'm looking at the endpoint that this pipeline will set up. If I click here, I will see now that there is an endpoint being created in SageMaker for the staging environment. It will take a few minutes for the endpoint to be ready. All right, the endpoint is now in service. And if I click in here, I can see the REST API I could call to get predictions from my model. Now, let's check
this. In the notebook here, you can see the endpoint is in
service. And what I do here is create a TensorFlow predictor object, and then I'm going to pass in some sample reviews. Let's say "This is great!", and I can run this, pass it to the predictor, and you can see I get a prediction result back from my model deployed in the staging environment, predicting this is a five-star rating.
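Invoking the staging endpoint from the notebook might look roughly like this; the endpoint name is a placeholder, and the JSON Lines request format depends on what the inference script expects.

```python
from sagemaker.deserializers import JSONLinesDeserializer
from sagemaker.serializers import JSONLinesSerializer
from sagemaker.tensorflow.model import TensorFlowPredictor

predictor = TensorFlowPredictor(
    endpoint_name="<staging-endpoint-name>",  # endpoint created by the deploy pipeline
    serializer=JSONLinesSerializer(),
    deserializer=JSONLinesDeserializer(),
)

# Send raw review texts and get back predicted star ratings
reviews = [
    {"features": ["This is great!"]},        # request shape assumed by the inference script
    {"features": ["This is not good."]},
]
predictions = predictor.predict(reviews)
print(predictions)
```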
Let's have a look at the CodePipeline that we executed.
So you can see here that the staging
succeeded. But there is one more approval
needed here for the actual deployment into a
production environment. So this could really be something
that another team handles. So if I'm the DevOps engineer or the integration engineer, I could make sure I'm running all of the tests that I need with this model, now that it is hosted in the staging environment. And if I agree that it's good to be deployed into production, I could either use CodePipeline here to approve the model and deploy to production, or I can also do this programmatically, which is what I'm doing here now in the notebook. So again, I review the pipeline, and what I'm doing here is the exact same thing: I'm programmatically approving this for deployment into production.
You can see this succeeded. Let's actually check our pipeline, and there we go. You can see it took the approval and is currently working on deploying this model into the production environment. As pointed out earlier, the ultimate goal is to build AI/ML solutions that are secure, resilient, cost-optimized, performant, and operationally efficient. So in addition to the operational excellence which we discussed in the context of MLOps, you also need to incorporate standard
practices in each of these areas.
Here are a few links to get you started. First,
a link to the Data Science on AWS resources and the GitHub repo which contains all of the code samples I've shown. Also, here are links to Amazon SageMaker Pipelines and a great blog post. Again, if you are looking for more comprehensive learning resources, check out the O'Reilly book Data Science on AWS, which covers how to implement end-to-end, continuous AI and machine learning pipelines in over twelve chapters, 500 pages, and hundreds of additional code samples. Another great training resource is our newly launched Practical Data Science Specialization, in partnership with DeepLearning.AI and Coursera. This three-course specialization teaches you practical skills in how to take your data science and machine learning projects from idea to production using purpose-built tools in the AWS cloud. It also includes on-demand, hands-on labs for you to practice.
This concludes the session. Thanks for watching.