Transcript
This transcript was autogenerated. To make changes, submit a PR.
Getting real-time feedback into the behavior of your distributed systems, and observing changes, exceptions, and errors in real time, allows you to not only experiment with confidence, but respond instantly to get things working again.

Hello, friends. Thank you, everybody, for taking the time to join our session today.
It's me, Soumen Chatterjee, Partner Solutions Architect at AWS, and my colleague Natalie, Senior Applied Scientist at AWS, taking you through chaos engineering in the age of AI. We know that AI is omnipresent in our everyday life. Embedded AI is embraced across almost every type of business fabric; AI is at the heart of everything we do. Chaos engineering, on the other hand, has become a no-brainer testing approach. Adversarial ML techniques and attack spectrums raise a unique question: if ML model testing still benefits from a chaos engineering approach, how do we adopt chaos engineering strategies within the ML lifecycle? This session will guide you through building an approach to chaos engineering in the age of AI.
Let me quickly take you through the key topics on our agenda today: everything fails all the time; introducing machine learning and its complexity; models perform and predict but may fail, with adversarial inputs introducing a significant business impact; AWS SageMaker model debugging and monitoring; and chaos engineering as a continuum, product thinking versus every time, everywhere.
Can you guess what will happen here? There is a man walking, who then decides to go over the fence. Let's see what happens. Still time to guess. See? Yes.
So that is very familiar, isn't it? In our everyday life, when we write our programs and build our models, we run into very similar situations. Sometimes it is very unpredictable: one simple thing can shut down your whole system or your program, or produce a completely incorrect prediction. So, everything fails all the time. That's a great quote from the CTO of Amazon.com.
What about machine learning? Can it fail too?
Let's have a quick review of what ML does. So what is machine learning? We all know what machine learning is, in a simple way: using technology to discover trends and patterns, and using complex mathematical computation to build models that predict based on factual past data. Past data, statistics, and probability theory are the key tools used to build machine learning models and make predictions. Where traditional business analytics aims at answering questions about past events, machine learning aims at answering questions about the possibilities or probabilities of future events. So when
to use machine learning? There are four key categories, I would say, where we tend to use machine learning more than anything else. Category one: use ML when you can't code it, for complex tasks where deterministic solutions don't suffice, like recognizing speech or images. Category two: use ML when you can't scale it, to replace repetitive tasks needing human-like expertise; examples would be recommendations, spam and fraud detection, and machine translation. Category three is where you need to adapt or personalize, like a recommendation or personalization engine. And the fourth category is where you can't track it, like automated driving. And as you know, all these categories are very much dependent on the data; your model will only be as good as your data.
In category one, for example, you are in a manufacturing unit or a car manufacturing company, or it could be a production line for fast-moving goods or a food supply company. They are heavily dependent on machine learning, especially computer vision and images, for production quality detection or finding faults in the system. And guess what? If you get a program which manipulates your model through a different set of data inputs, that will change the prediction or the behavior of your model, and sometimes that could be fatal, right? If it is a manufacturing company and you are not able to detect your production issues through the images, that could be fatal. These kinds of things we call data drift: if someone manages to drift your data to a different segment, that can change the behavior of your model. Similarly, in category two, a lot of content can bypass the filter; instead of spam, it can look like legitimate, good content and end up in your normal folders, like email spam, right? And we are all facing that every day: you will see a lot of content bypassing the filter and coming into our main folder that is a good candidate for spam. Now imagine
the fourth category, where you are building your automated driving. A lot of things, I would say, if not everything, are AI/ML dependent, right? If your model is not able to detect the right speed limit from the speed signs or the stop signs, or not able to detect the objects on the way, that could be another fatal example. It could be life threatening for the users of that particular automated car.
So these are scenarios where machine learning is a great example and has been adopted significantly across the industry. But at the same time, adversarial input, that threat vector, introduces a lot of different kinds of challenges for your industry and impacts your business. That's one of the key things we want to introduce today. A little later, Natalie will take you through some great examples of adversarial input data and how it impacts your machine learning, its prediction quality, and how it disturbs the model prediction.
So AI and ML systems are actually very complex. It's not because the algorithm is complex; it is because the process is iterative and has many steps, especially stage one itself, the prepare stage, where you collect and prepare training data. One of the key criteria for this data is that, for a successful, accurate, or high-performing model, you need to collect representative data or samples. And that's not an easy stage, actually. Building the model, which you call the algorithm, is relatively easy. It's the heart of your code, but it's not the heart of the whole thing. The model or algorithm is just one part of a successful ML ecosystem. The data, how you train and tune, how you manage your versions, and how you train and debug are the key parts of any successful model delivery, and then there is how you manage, scale, and monitor its predictions and accuracy. That is another complex aspect of a successful model.
So, traditional software and program testing. Traditional programs are deterministic, as you all know, right? They are based on a fixed set of heuristic rules. Generally, testing traditional software includes unit testing, regression testing, and integration testing. But in the world of ML, systems are not heuristic; they are stochastic and probabilistic. What that means is that, going from left to right, every stage, like pre-train, post-train, and integrate, takes in a flow of data and produces the model as its output. Every stage, every different data set or data flow, changes and refines the model as the final outcome. The model learns from the data provided and used for training, all the time.
Now, coming to the context of chaos engineering. In chaos engineering, we follow all the usual steps, right? But there are two key stages worth noting which are quite different between traditional software and machine learning. There is a stage called steady state, but in an ML system there is no such steady state, because models are not steady. If you think your model is just a static model and it is functioning at, say, 98% accuracy and everybody is happy, it may not continue to function like that. Similarly, in the verify stage, we inspect metrics and plots summarizing model performance, rather than verifying that certain tests passed or failed. So it is quite different, especially in these two stages, compared to any other software and models.
Moving to the next one: model evaluation and model testing. ML systems, unlike traditional software, do not produce a report of specific behaviors and metrics, such as how many tests passed or failed, or whether there is 100% code coverage, and so on. Instead, what do we do here? We perform model evaluation for performance analysis, whereas model testing is an approach to error analysis; developing model tests for ML systems can offer a systematic approach to error analysis. For a machine learning model, we inspect metrics and plots summarizing the model's performance over evaluation data sets.
So now we are entering the adversarial machine learning stage, and it can get more complicated. Obviously, we are talking here about machine learning when adversarial input gets introduced, and ML models are vulnerable to such inputs. Adversarial machine learning is a method that crafts inputs to trick machine learning models and strategically alter the model output. There are three different kinds of attacks predominantly observed: evasion, poisoning, and model extraction. The types of detection methods look at individual input samples and at distribution shifts.
So, chaos in practice. Watch this for another few seconds and you will really enjoy it.
So chaos engineering is the discipline of experimenting
on a distributed system in order to build confidence in
the system's capability to withstand turbulent
conditions in production.
Break your systems on purpose, find their weaknesses, and fix them before they break when least expected. So I'm going to hand over to
my colleague Natalie now. She will take you through how
to detect adversarial samples. She has prepared a few
fantastic examples from our lab. She will also take you through
our AWS SageMaker tools for model monitoring and debugging, and how to handle adversarial input in our model management and model lifecycle.
Over to you, Natalie. Thank you. Let's start with
an example first.
Quite often, adversarial samples are very sophisticated and difficult to distinguish from normal samples. On this slide here, I'm showing you an image that I have taken from the Caltech 101 test data set.
We can see that the model correctly predicts the image
class starfish. Next, I use the same
image and I apply an attack on this image.
I'm using the attack technique PGD, which stands for
projected gradient descent. This technique
uses an epsilon parameter to define the amount of noise that
is added to the input. The higher the amount of noise,
the more likely the attack is going to be successful.
What we see now is that we can barely distinguish the original test
image on the left from the adversarial sample on the right.
But the model no longer predicts the correct image class.
Next, I increase the epsilon parameter.
Again, the model does not predict the correct image class, and we
cannot see much of a difference between the images.
When I increase the epsilon parameter even further, to 0.5, we finally see some artifacts that have been introduced into the input image.
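For readers who want to reproduce this kind of experiment, a minimal PGD sketch in PyTorch looks roughly like the following; the model, images, labels, epsilon, and step sizes are placeholders, not the exact setup used for these slides.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, label, epsilon=0.03, alpha=0.007, steps=10):
    # Projected gradient descent: repeatedly step in the direction that
    # increases the loss, then project back into an epsilon-ball around x.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), label)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # project back into the L-infinity ball and the valid pixel range
        x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon).clamp(0.0, 1.0)
    return x_adv.detach()
```

As on the slides, larger epsilon values make the perturbation more visible but also more likely to flip the model's prediction.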
Adversarial samples can have a devastating business impact. Imagine, for instance, you have an autonomous driving application, and as part of this application you have a traffic sign classification model. The image that I'm showing here is taken from the German Traffic Sign data set. We see on the left side the original input image, which shows a speed limit sign of 80 km/h. Again, in the image in the middle we can barely see a difference, but the model can no longer correctly predict it; it now predicts the traffic sign as a stop sign. And when we compare the difference between the two, which is indicated here on the right, we see some difference in the inputs.
So now, how can we detect these adversarial samples? Let's take a look at the model input distributions. To do that, we use t-SNE, which stands for t-distributed stochastic neighbor embedding. It's a technique that allows you to take high-dimensional data and map it into a two- or three-dimensional space. For the image that I'm showing here, I have taken a set of test images, indicated as orange data points, and then applied the PGD attack on them; these adversarial samples are indicated as blue data points. Then, for each of these images, I compute the t-SNE embedding and visualize all of them in this two-dimensional space. Each data point that you see here represents the embedding for one input image. What we see is that there is no difference between the orange and the blue data points. That means if we were to use a technique to distinguish adversarial and normal inputs in the input space, that is, on the images themselves, we would not be able to distinguish them, because the distributions look very similar. When you look at deep neural networks, these are models that consist of multiple layers. Each layer learns different kinds of features of the inputs and creates different representations.
So, the same analysis that I have shown you on the previous slide, I'm now repeating for the different representations produced by the different layers in the model. Layer zero corresponds to our input layer, so that is basically the input images, and again we don't see much of a difference between adversarial and normal samples. Next, I take the activation outputs of layer four and repeat this analysis, and again there is not much of a difference. In layer eight we now see that normal and adversarial samples cluster slightly differently, and we observe the same in layer twelve. When we go to layer 14 we see an even larger difference, and in layer 15 we now see a clear distinction between the adversarial and the normal samples.
I have done this analysis on a ResNet-18 model, so layer 15 was the penultimate layer in my model. The penultimate layer is the layer before the classification is done. And when we create adversarial samples, the goal is to change the model prediction. That means that before the outputs go into the classification layer, they have to create different representations that lead to a different classification. What we also observe is that in the initial layers we cannot distinguish well between adversarial and normal samples, because the initial layers learn mainly basic features of your inputs, while the deeper layers of your model learn more complex patterns of the input data. So this analysis shows us that if we want to detect adversarial samples, we need to use the representations that are produced by the deeper layers of a deep neural network.
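A minimal sketch of this per-layer analysis, assuming a trained torchvision ResNet-18 (`model`) and tensors of normal and PGD-perturbed test images (`normal_batch`, `adversarial_batch`), might look like this; the chosen layer and the plot styling are illustrative, not the exact code behind the slides.

```python
import torch
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

activations = []

def capture(module, inputs, output):
    # store the flattened activation output of the chosen layer
    activations.append(output.detach().flatten(start_dim=1).cpu())

model.eval()
# "layer4" is a placeholder for whichever deep layer you want to inspect
hook = model.layer4.register_forward_hook(capture)
with torch.no_grad():
    model(normal_batch)       # assumed tensor of normal test images
    model(adversarial_batch)  # assumed tensor of PGD-perturbed images
hook.remove()

feats = torch.cat(activations).numpy()
# note: t-SNE needs more samples than its perplexity (default 30)
emb = TSNE(n_components=2).fit_transform(feats)

n = normal_batch.shape[0]
plt.scatter(emb[:n, 0], emb[:n, 1], c="orange", label="normal")
plt.scatter(emb[n:, 0], emb[n:, 1], c="blue", label="adversarial")
plt.legend()
plt.show()
```

Repeating this for earlier and later layers reproduces the effect described above: the clusters separate only in the deeper layers.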
What we can do now is apply a statistical test to distinguish between these distributions. We can use a two-sample test using MMD, which stands for maximum mean discrepancy. MMD is a kernel-based metric that allows you to measure the similarity between two distributions. The distributions we are going to compare are the layer representations captured from the intermediate layers of your deep neural network. We capture them during the validation phase of training, because these representations represent the normal samples, and then during inference we capture the same layer representations and try to see whether the inference data matches the data seen during training.
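A simple way to sketch this two-sample test, assuming `train_repr` and `infer_repr` are NumPy arrays of layer representations captured during validation and during inference, is an RBF-kernel MMD estimate like the one below; the bandwidth heuristic and any decision threshold are assumptions, not the exact statistic used in the demo.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    # pairwise squared Euclidean distances, then Gaussian kernel
    d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-gamma * d2)

def mmd2(x, y, gamma=None):
    """Biased estimate of squared maximum mean discrepancy between samples x and y."""
    if gamma is None:
        # simple variant of the median heuristic on the pooled sample (assumed default)
        z = np.vstack([x, y])
        d2 = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
        gamma = 1.0 / (np.median(d2) + 1e-12)
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2 * rbf_kernel(x, y, gamma).mean())

# An mmd2(train_repr, infer_repr) value well above what you see between two clean
# validation splits suggests the inference distribution has drifted, possibly adversarially.
```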
Now I would like to show you how you can detect adversarial inputs using Amazon SageMaker Model Monitor and Debugger. Basically, the analysis that I have shown on the previous slides we are now going to deploy on Amazon SageMaker and run in production. First, let me give just a brief overview of what Amazon SageMaker is. SageMaker is our fully managed machine learning service that you can use to more easily build, train, and deploy machine learning models. When you think about machine learning, it's not just about creating and training a model; there are many different steps involved. For instance, you need to create a training data set, you need to build and train models, you need to perform hyperparameter tuning, then you may want to compile the model for faster inference, and then you need to deploy the model to the cloud or to the edge. Amazon SageMaker provides features for each step in this machine learning lifecycle. As part of the workflow that I'm going to show in a few slides, the main features I'm going to use are SageMaker Debugger and SageMaker Model Monitor. Let's take a brief look at what SageMaker Model Monitor is.
Model Monitor is a feature of Amazon SageMaker that allows you to detect data drift. Once you have trained a model on Amazon SageMaker, you can deploy it as an endpoint. Now, when users are interacting with your endpoint, Model Monitor will automatically capture requests and predictions and upload them to an Amazon S3 bucket.
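Enabling that capture when you deploy the endpoint looks roughly like this with the SageMaker Python SDK; the bucket path and instance type are placeholders, and `model` is assumed to be an existing SageMaker model object.

```python
from sagemaker.model_monitor import DataCaptureConfig

# capture all requests and responses sent to the endpoint (S3 path is a placeholder)
capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri="s3://my-bucket/endpoint-data-capture",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    data_capture_config=capture_config,
)
```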
Model Monitor will also create a baseline processing job. It basically takes the training data and computes some statistics on it. For instance, assume you have a tabular data set: this baseline processing job will check the different columns in your tabular data set and compute statistics such as the min, max, and average values. Then, in the deployment phase, you can specify a scheduled monitoring job that will run once an hour or once a day; you can specify the monitoring interval. This job will basically take the requests and predictions and compare them against the baseline. As an example, let's assume that in your tabular data set, column one was always between zero and ten during training, and now, during inference, it has values from -100 to +100. Model Monitor would automatically detect this problem, record these violations and statistics in an output file that is uploaded to Amazon S3, and also publish some metrics to Amazon CloudWatch.
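Setting up the built-in baseline job and an hourly monitoring schedule with the SageMaker Python SDK looks roughly like the sketch below; the S3 paths, endpoint name, and execution role are placeholders, and the custom MMD monitor described later would replace this default analysis with its own container.

```python
from sagemaker.model_monitor import DefaultModelMonitor, CronExpressionGenerator
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role=role,                      # assumed existing SageMaker execution role
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# baseline job: compute statistics and constraints from the training data
monitor.suggest_baseline(
    baseline_dataset="s3://my-bucket/train/training.csv",
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-bucket/monitor/baseline",
)

# hourly job: compare captured requests/predictions against the baseline
monitor.create_monitoring_schedule(
    monitor_schedule_name="drift-monitor",
    endpoint_input="my-endpoint",
    output_s3_uri="s3://my-bucket/monitor/reports",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
```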
Let's take a brief look at what SageMaker Debugger is. SageMaker Debugger is a feature that provides utilities to record and load tensors from your model training. It's typically used for training, but you can also use it for inference. It comes with an API called smdebug: an open-source, framework-agnostic, and concise API to record and load tensors. It supports the major machine learning frameworks, and it also provides the concept of rules to automatically detect issues, which is very useful as part of training. You can also customize and extend SageMaker Debugger. If you use Debugger as part of Amazon SageMaker, you can use built-in rules, offload the rule analysis to separate instances, and specify rule actions and notifications. As part of the workflow to detect adversarial inputs, I mainly use the smdebug API to capture the layer representations.
Let's take a brief look at the smdebug API. With just a few lines of code, you can enable Debugger to capture certain layers from your model. As shown on the previous slides, I was analyzing the t-SNE embeddings that I computed on the different representations produced by the deep neural network; I use the activation outputs. So what I do is specify a Debugger hook configuration, where I can give a regular expression for all the tensors that I want to have collected, and I specify the output path where this data should be uploaded to. With just a few lines of code, I can enable Debugger and capture this data. And once you have captured the data, you can easily access it using the same API: you specify where the data has been recorded, and with the object that is created, the trial object, you can start iterating over all the different inference requests that have been recorded, access the tensors, and then do computations on them.
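A rough sketch of both halves, assuming the SageMaker Python SDK for configuring the hook and the open-source smdebug library for reading the data back; the regular expression and S3 paths are placeholders, not the exact ones from the demo.

```python
from sagemaker.debugger import DebuggerHookConfig, CollectionConfig
from smdebug.trials import create_trial

# 1) when training (or hosting) the model, tell Debugger which tensors to save
hook_config = DebuggerHookConfig(
    s3_output_path="s3://my-bucket/debugger-tensors",
    collection_configs=[
        CollectionConfig(
            name="layer_outputs",
            parameters={"include_regex": ".*ReLU_output.*"},  # placeholder regex
        )
    ],
)
# pass hook_config as debugger_hook_config=... when creating the SageMaker estimator

# 2) afterwards, load and iterate over the recorded tensors
trial = create_trial("s3://my-bucket/debugger-tensors")
for name in trial.tensor_names(regex=".*ReLU_output.*"):
    for step in trial.tensor(name).steps():
        value = trial.tensor(name).value(step)  # numpy array for this step/request
        # ... compute t-SNE embeddings or the MMD statistic on `value`
```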
So now I would like to show you the system design for detecting adversarial inputs using SageMaker Model Monitor and Debugger. First, I train the model on Amazon SageMaker and enable Debugger to capture the layer representations. These tensors are then uploaded to Amazon S3. Once the model has been trained, I deploy it as an endpoint on Amazon SageMaker, and in the endpoint I also have Debugger enabled to capture the layer representations during inference. Now my users may interact with the model and send some inference requests, and my model performs predictions. The layer representations as well as the model inputs are recorded in Amazon S3. Then I use a custom model monitor to run the two-sample test using MMD. I run this every hour: every hour it takes the layer representations that were recorded during training and compares them with the layer representations recorded during inference. It then runs the two-sample test, and if an issue is found, it records some metrics and writes an output file to Amazon S3 with the recorded violations. This custom model monitor also outputs which images were most likely adversarial, so as a user, you can download this file from Amazon S3 and perform further investigation.
You can also use Amazon SageMaker Studio to get some further insights. Amazon SageMaker Studio is a machine learning IDE, and you can check, for instance, the execution of each of these model monitoring jobs. What we see here is that in the first hour the model monitoring job did not find any issue, and in the subsequent hour I sent adversarial samples against the endpoint, so an issue was detected by Model Monitor. The custom model monitoring container also outputs some metrics to Amazon CloudWatch, such as the number of inference requests processed, and a detection rate, indicated as the orange line here, that shows how many of these inference requests were detected as adversarial. This is computed for every hour, so you can use it to determine whether an attacker was active in a specific time frame. What we see here is that the detection rate was roughly 100% from 05:00 a.m. to 06:00 a.m., and then it started dropping, so around 07:00 a.m. the attacker was no longer active.
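As an illustration of how such a metric could be published, a custom monitoring container might call CloudWatch roughly like this with boto3; the namespace, metric name, and endpoint dimension are assumptions, not the exact metrics emitted by the demo container.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# publish the hourly detection rate as a custom metric (names are placeholders)
cloudwatch.put_metric_data(
    Namespace="CustomModelMonitor",
    MetricData=[
        {
            "MetricName": "AdversarialDetectionRate",
            "Dimensions": [{"Name": "Endpoint", "Value": "my-endpoint"}],
            "Value": 97.0,
            "Unit": "Percent",
        }
    ],
)
```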
SageMaker Debugger stores the tensors in Amazon S3, so for further analysis you can, with just a few lines of code, use the smdebug API. You just create the trial object to access the tensors that have been recorded during inference, iterate over each inference request, and visualize, for instance, the t-SNE embeddings for each of these tensors to see how the distribution of the representations during inference compares with the ones that were recorded during training.
With that, I would like to conclude my session.
Thank you, Natalie, for taking us through those great examples of adversarial input and how to deal with it. We need to build systems that embrace failure as a natural occurrence; that's another fantastic example and quote I always refer to from our Amazon CTO.
So, chaos engineering as a continuum: let's build confident ML systems that withstand turbulent conditions and adversarial inputs every time they run, not just in production or at any particular moment in time.
I would like to thank you all for your time today and your interest in our session. And this is not the end: if you want to know more or want to be in touch, please feel free to reach out, or you can connect with us through LinkedIn. Once again, thanks, everybody.