Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello, I'm Karl Weinmeister with Google's developer advocacy team.
Today we're going to talk about ten things that can go wrong with ML projects.
Let's get started. So machine learning practitioners are
solving important problems every day, running into a
set of unique challenges. Today we're going to talk about the best practices
and tools that can help address them. The issues we're
going to discuss fall into four categories: building a model,
model accuracy, transparency and fairness,
and MLOps. So let's start with our first problem.
It's all about the business problem you're trying to solve.
So many organizations are transforming and
really changing how they do things with machine learning, but we see
other companies that are really struggling to get value out of those machine learning
projects. So it's key that your machine learning model is
aligned with your goals. It's also important that
when you're figuring out if you're doing well, you have a baseline
where you can evaluate how your model is doing. You need to know
how existing approaches are working, whether they're manual,
whether they're implemented by traditional software development
systems, or in a previous form of machine learning.
You need to know what your starting point is, to know how much you're improving
based on that. So it's key to know what that baseline is and the goals.
So let's talk a little more about that. I recommend watching
this video from DeepMind, Google's research
organization; it's a great talk on product management
for AI, and the speaker covers a lot of topics.
It's about a 30-minute video. Some of the key takeaways that I took from
it were staying focused on your goals
as a project, but being flexible on the tactics.
It's inevitable that you're going to run into things that don't
work along the way. So you just need to keep adapting but not losing sight
of what your ML solution is aiming to achieve
for your users. Secondly, scheduling can be hard
with machine learning because there's often a large
research and discovery part of the project, where you're starting
things out and you're not sure exactly how they're
going to work. You might not have an answer.
You're learning a lot as you go, and that's hard to plan around. So setting
milestones that you can adapt and change
as you go is fine, but it's important to plan things
out with milestones so that you have some structure
on your project. Finally, having users involved
at each stage of your project is important. So you
may still be doing a lot of research work, but having
some early insights and lessons to share with your users to
ensure you're on track is often very valuable. So some
of these insights will help ensure that you're getting value out of your AI
projects. The next thing I want to focus on here is
that your problem needs to be a good fit for machine learning.
This isn't an exhaustive list, but here are a few things that
you might want to keep in mind when you're wondering, is this an
area where machine learning can help? The first group is
predictive analytics, and these are problems where you have historical
data and you want to look at trends that are going to happen in the
future. Whether it's looking at past
transactions, say, and trying to figure out if a new one
is fraud, or looking at whether equipment
is going to fail over time based on information about
heat or vibration, et cetera, being able to extract those
patterns for when the systems are going to fail means being able to fix them beforehand.
There are all kinds of situations like this, where you're using historical data to predict
some future outcome. There's another class of problems around unstructured data.
This is where you have data that's not in the tabular
format that fits into, say, databases. This is where you
have images, text, et cetera. There's a
variety of different use cases here, but just a few examples
of things like maybe triaging emails if
there's a large load for the customer service organization, being able to
cluster and move items to the right
place. Another area could be automation, where you have
a manual process and you're trying to automatically fulfill
some step of that process, and you see a few examples of that.
Finally, personalization, where you want to understand how your
users tick and you want to be able to provide
them useful information and next steps that help them,
in your application, achieve what they're trying to do faster and more easily.
All right, so let's move on to the next
part of building a model. A huge problem can be jumping straight into
model development without first building a prototype. A machine learning project
is an iterative process, so you start with something simple and
you refine it as you go, and many times something
simple, if it achieves your goals, is fine; it's
always good to start small and expand from there. A quick prototype
can tell you a lot about challenges. Those could be access
to data as you start to build out that initial model. Maybe there are teams
you need to work with to request data or integration points
that you weren't aware of. Maybe you really struggle with getting
a decent model accuracy. There's all kinds
of other questions you can find out and that will help with scoping out
the length of the project and be able to understand what you're
getting into. Starting with that prototype, there are
a couple of tools that can help. So let's first start with BigQuery ML.
BigQuery ML allows you to create machine
learning models directly from the data warehouse in BigQuery.
You'll see a little bit in this animation where you can
write SQL statements for deep neural network
models, logistic regression, all these different
types of models, even time series forecasting. It allows you
to get started quickly. By the way, this is a full-fledged production
system as well; if you want to run your models out
of BigQuery completely, it's a great option for that.
But it also allows you to move quickly, try some things out, and see what your
baseline accuracy is.
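To illustrate the idea, here's a minimal sketch of what that might look like from Python using the BigQuery client library; the project, dataset, table, and column names are hypothetical placeholders.

```python
# A minimal sketch: train and evaluate a BigQuery ML model from Python.
# The project, dataset, table, and column names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

create_model_sql = """
CREATE OR REPLACE MODEL `my_dataset.fraud_model`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['is_fraud']) AS
SELECT amount, merchant_category, hour_of_day, is_fraud
FROM `my_dataset.transactions`
"""
client.query(create_model_sql).result()  # waits for training to finish

# Inspect the baseline metrics that BigQuery ML computes automatically.
eval_sql = "SELECT * FROM ML.EVALUATE(MODEL `my_dataset.fraud_model`)"
for row in client.query(eval_sql).result():
    print(dict(row))
```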
Similarly, AutoML is another great option. If you want to build
a custom model, you take your own data and use AutoML to handle
the training, deployment, and serving steps, and then generate a REST API.
It can wrap all of that into one user
interface or SDK to enable you to do all these steps quickly,
so that can serve as a performance baseline.
It can also help with explainability, where you can get feature importances,
so you know where AutoML is focusing and where
there's signal. Maybe that's an area where, with your data engineering, you can dig
in a little bit to extract some more features.
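As a rough sketch of those training, deployment, and serving steps from the Vertex AI Python SDK rather than the UI, something like the following could produce a quick baseline; the project, BigQuery table, and column names are made up for illustration.

```python
# A hedged sketch of an AutoML tabular baseline on Vertex AI: create a dataset,
# train a model, deploy it, and request a prediction. Names are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

dataset = aiplatform.TabularDataset.create(
    display_name="flights",
    bq_source="bq://my-project.my_dataset.flights",
)

job = aiplatform.AutoMLTabularTrainingJob(
    display_name="flight-delay-baseline",
    optimization_prediction_type="classification",
)
model = job.run(
    dataset=dataset,
    target_column="delayed",
    budget_milli_node_hours=1000,  # roughly one node hour, for a quick baseline
)

endpoint = model.deploy(machine_type="n1-standard-4")
print(endpoint.predict(instances=[{"carrier": "AA", "dep_hour": "17"}]))
```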
Problem number three: model training taking a long time.
I'm not sure if you've ever run into this, but if you're working with a larger model,
it can take days or even weeks in some
cases to run, and that just really bogs down your team, where
you might reach a point where you just have to wait until tomorrow
to find out what you can do next. So that slows down the process of
innovation and can really harm
the success of your project. Serverless training with Vertex
AI can be very helpful for this because it allows you to submit
training jobs across a distributed infrastructure using
GPUs and other custom
chips that you'd like to use, and building models
in all different kinds of frameworks, using a container image as your
base. It allows you to do even things like hyperparameter tuning,
where you can create multiple models and find out
which one is working best. And this really allows you
to speed up that training time. As
you see here in this screenshot, things like getting access
to logs, downloading your model,
storing it in the cloud, and managing it on the cloud will be taken care of
for you as well.
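For example, a serverless training job might be submitted from the Python SDK along these lines; the training script, bucket, prebuilt container image, and machine settings here are illustrative assumptions.

```python
# A minimal sketch of submitting a serverless custom training job to Vertex AI.
# The script, bucket, container image, and machine settings are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1",
                staging_bucket="gs://my-bucket")

job = aiplatform.CustomTrainingJob(
    display_name="tf-training",
    script_path="train.py",  # your local training script
    container_uri="us-docker.pkg.dev/vertex-ai/training/tf-gpu.2-9:latest",
)

# Vertex AI provisions the machines, runs the job, streams the logs,
# and tears the infrastructure down again when training completes.
job.run(
    machine_type="n1-standard-8",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
    replica_count=1,
)
```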
Another option as far as training quickly goes is Cloud TPUs, or
Tensor Processing Units. Those are custom
chips that are built for machine learning workloads.
They allow you to create models
very quickly at high scale, and they can speed up that training
even more if that's something you want to use. All right, let's move on to
the next group of issues around model accuracy. So,
first type of issue could happen when you have an imbalanced data set.
So many machine learning tasks have many
more examples that fall into one category
than the other, right? So let's take fraud detection as an example
where, fortunately, say, most of the data is benign,
that is, not fraudulent transactions, and you have a few that
are. It's like finding a needle in the haystack. So we
could have a trivial model that simply predicts that everything is benign,
and it would have good accuracy, but that's not going to add
any value, right? So what you
want to do is apply some techniques to make sure
that your accuracy is good across both of the
classes, even if there's not a lot of data for them. So let's
look at this next resource here,
which is a tutorial for dealing with imbalanced data.
This is on the TensorFlow website, and it provides several
different techniques, things like weighting different classes differently,
basically applying a greater penalty for
mistakes on, say, the class with
fewer examples. Other things
you could do are oversampling and undersampling. With oversampling,
you take the existing data you have and basically duplicate some
of those records so that there's a more balanced
number between the classes; conversely, with undersampling,
you remove examples from the class with more.
Finally, you could consider generating synthetic data. There
are packages in Python, like one called SMOTE, for example,
that can look at the distributions of your data and generate
data that's similar to what's in your training
set. These are all things to try. Personally, I've had
the best success with weighting classes
to help with that issue.
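As a small sketch of that weighting approach with Keras, assuming a simple binary classifier and made-up data, it might look something like this:

```python
# Illustrative sketch: penalize mistakes on the rare (fraud) class more heavily
# by passing class weights to Keras. Data and model shapes are made up.
import numpy as np
import tensorflow as tf
from sklearn.utils.class_weight import compute_class_weight

y_train = np.array([0] * 990 + [1] * 10)     # highly imbalanced labels
x_train = np.random.rand(1000, 8)            # stand-in features

weights = compute_class_weight(class_weight="balanced",
                               classes=np.array([0, 1]), y=y_train)
class_weight = {0: weights[0], 1: weights[1]}  # rare class gets a larger weight

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(curve="PR")])

model.fit(x_train, y_train, epochs=5, class_weight=class_weight)
```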
And AutoML has some ability to help with this
as well, without even applying the class
weighting that we discussed previously. So when you're training
a model, there's something called an optimization objective, and this is
what you're optimizing for. And so
you can see here there are several different options.
And if you switch to something called the
area under the precision-recall curve, that is
generally better for helping with the
class that has fewer examples in it. But you see there's
a spectrum of possibilities, depending on whether you're trying to maximize
accuracy for all of the data
or create a more balanced result,
et cetera. You can just customize this;
it's under the advanced options.
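The same setting is also exposed in the Vertex AI Python SDK; as a hedged sketch, an AutoML tabular job might specify it roughly like this, where the display name is just a placeholder:

```python
# Sketch: choosing the precision-recall AUC objective, which tends to serve
# the minority class better than optimizing for overall accuracy.
from google.cloud import aiplatform

job = aiplatform.AutoMLTabularTrainingJob(
    display_name="fraud-model",
    optimization_prediction_type="classification",
    optimization_objective="maximize-au-prc",  # area under the PR curve
)
```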
For model evaluation, there is something you can do, too,
which is where you create a model, and then you can review the accuracy
at different thresholds. You can then review the confusion matrix.
And if you see that below, this is an example of flight delays.
And again, very good to see that most of
the time, flights are on time, although it might not always feel that way.
We can see here that there's definitely a difference in the
accuracy for flights that are delayed; it's
a little bit harder to pick those out with this particular model.
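Outside the UI, the same kind of per-class check can be done with a few lines of scikit-learn; here's a small sketch with made-up labels and predicted probabilities:

```python
# Illustrative sketch: inspect the confusion matrix and per-class metrics at a
# chosen threshold. The labels and probabilities below are made up.
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])   # 1 = delayed flight
y_prob = np.array([0.1, 0.2, 0.05, 0.3, 0.4, 0.15, 0.6, 0.55, 0.8, 0.25])

threshold = 0.5
y_pred = (y_prob >= threshold).astype(int)

print(confusion_matrix(y_true, y_pred))        # rows = actual, columns = predicted
print(classification_report(y_true, y_pred))   # precision and recall per class
```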
So you could go through this process,
look at the results for each of the different classes,
and then perhaps come back and adjust that optimization objective
if you'd like. So, model accuracy, this is a huge
one. And this really can be where projects might
not even succeed. They get completely stuck, right? So you are
creating a model and you just get to a point where you can't improve the
model accuracy anymore, and it's not good enough. It's not going to really add any
value for the business. And sometimes
it just is what it is, that it's a hard problem to solve,
and there's not that signal available in the data,
but often, with some creative thinking,
you can move past those obstacles. So let's go back to this example
of flight delays. So, on the left, I have a
research paper that talks about historically, what are
the reasons for flight delays. So I was actually looking at modeling
this problem, and I started with data around
the start and end times of the different flights,
the carrier, et cetera. And that gave
me some information. But what I did was I augmented
my data with weather information. I took a BigQuery
public dataset of weather that
had lat/long coordinates, and I joined
that against the airport, so I'd know, okay, the arrival airport
is going to have hail, things like that. And that definitely
improved my model.
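As a very rough sketch of that kind of data augmentation, the join might look something like this in BigQuery; the table and column names here are hypothetical stand-ins, not the actual public dataset schema.

```python
# Hypothetical sketch: join flight records to a weather table on airport and
# date so the model can see conditions at the arrival airport.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

augment_sql = """
SELECT
  f.*,
  w.precipitation,
  w.wind_speed
FROM `my_dataset.flights` AS f
LEFT JOIN `my_dataset.airport_weather` AS w
  ON f.arrival_airport = w.airport_code
  AND DATE(f.scheduled_arrival) = w.observation_date
"""
flights_with_weather = client.query(augment_sql).to_dataframe()
```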
Now, it actually wasn't a huge increase, and this data kind of shows why:
6% of the root
cause of flight delays and cancellations is due to extreme weather,
but the biggest one is aircraft arriving late. And this would be an
example of if you have better data,
you might be able to work on some data engineering to
look at the whole flight graph and
what flight is coming into the flight that you're trying to predict,
even multiple flights back, using that as information. So the
point here is that really understanding the problem is
key. It's not just about the algorithm and the math;
you can often make a much bigger difference by
understanding the domain that you're in. So that's my number one
tip is look at improving domain expertise for the
problem you're trying to solve. Really dig in, ensure you have the right experts
on the team and as a data scientist learn
the domain the best you can and you'll probably think
of some things that are going to help you. Secondly, and sort of
related to this, including more data always
helps improve your accuracy, of course, as does varied training data of
different types from different tables that are going
to add some diversity for your model.
Feature engineering of course is always useful to
unlock information from some features. Maybe, and I'm
making this up, you have a date field and you want to extract
whether it's a weekend or a weekday; there might be
different patterns for that. There are all kinds of different things
that you can do with your data to ensure that
your model can take advantage of it, like the small sketch below.
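This is a hedged, minimal example with pandas; the column name is made up.

```python
# Tiny sketch of that date example: derive weekend/weekday features so the
# model can pick up on weekly patterns. The column name is made up.
import pandas as pd

df = pd.DataFrame({"scheduled_departure": ["2024-01-05", "2024-01-06", "2024-01-08"]})
df["scheduled_departure"] = pd.to_datetime(df["scheduled_departure"])

df["day_of_week"] = df["scheduled_departure"].dt.dayofweek    # 0 = Monday
df["is_weekend"] = (df["day_of_week"] >= 5).astype(int)       # Saturday/Sunday
print(df)
```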
So looking at some of the other things, consider removing some of the features that
are causing overfitting where you're sort of locking into
noise. In line with what we talked
about earlier, I might suggest starting with a smaller model and then incrementally
adding features back. Also try different model architectures.
This is traditionally what we do in data science. Try a
different model architecture, different number of layers,
hyperparameter tuning, ensembling your model after
you've done some of these more fundamental things. Finally,
just for a gut check, it never hurts to try AutoML
to see what kind of performance is possible
with the input data that you have. Let's move on.
Transparency and fairness. Our sixth issue:
your model doesn't serve all of your users well.
This is a very important and complex
issue, and when you look at how you're approaching AI
development, you want to use a responsible AI framework to help with
it. So here are a few questions to ask as you're building your
model. The
first is around some of the business questions we asked before.
What is the problem you're trying to solve? Who's your user?
Moving on, there are things like the risks and success factors. Then we
start to move on to data: how was the training data
collected, sampled, and labeled? There are all kinds
of issues that can pop up in this phase of the project.
It's key to ensure that the way you're collecting data
and those sampling and labeling processes give you
a representative sample. I'm not going to
go through all the points here, but as you go through the training
process and the evaluation process,
it's worth considering all of these important questions. The
model might have some limitations
that you want to document, and you want to document how
you collected your data from end to end, so that you
can continue to assess where you're
at from a responsibility perspective and keep improving on
it. There are a couple of tools that I'd like to mention here
that can help. The What-If Tool allows you to
slice your data by various factors
to see why predictions happened the way that
they did. This can give you a much more
detailed understanding of your model accuracy versus a simple statistic
like "it was 98% precision"
or something like that. And you can also, and this is why
it's called the What-If Tool, actually change some values and
see what happens. If I change a value slightly,
does that change my prediction? So it's almost like a debugging
tool for your model.
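In a notebook, wiring up the What-If Tool might look roughly like this; the feature names, data, and predict function here are placeholders for your own model.

```python
# A hedged sketch of launching the What-If Tool in a notebook with a custom
# predict function. The features, data, and model call are placeholders.
import numpy as np
from witwidget.notebook.visualization import WitWidget, WitConfigBuilder

feature_names = ["distance", "dep_hour", "carrier_code", "delayed"]
test_examples = np.random.rand(200, 4).tolist()  # stand-in rows, label included

def custom_predict(examples):
    # Call your own model here and return one probability pair per example.
    return [[0.9, 0.1] for _ in examples]

config_builder = (WitConfigBuilder(test_examples, feature_names)
                  .set_custom_predict_fn(custom_predict)
                  .set_target_feature("delayed"))
WitWidget(config_builder, height=600)  # renders the interactive tool inline
```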
TensorFlow Model Analysis is also helpful. What this can do is
produce estimates of
performance by slice of the data. So let's take a look at
what that really means. Here is an example of Chicago
taxi trip data, where we're estimating what
the tip is going to be for a taxi trip. And you
can see some statistics here. The bar graph shows
you the number of samples at different hours of the day.
So we're seeing that at 5:00 in the morning,
6:00 in the morning, there's a much lower number of trips.
Well, that might impact the accuracy. It's an example of not
having a balanced data set, so it's actually going to slice per hour
and give you statistics on that accuracy.
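As a rough sketch of that slicing with TensorFlow Model Analysis, assuming a DataFrame that already holds the labels and model predictions, and with made-up column names for the taxi example:

```python
# Hedged sketch: compute metrics sliced by hour of day with TensorFlow Model
# Analysis on precomputed predictions. Column names and values are made up.
import pandas as pd
import tensorflow as tf
import tensorflow_model_analysis as tfma

df = pd.DataFrame({
    "trip_start_hour": [5, 5, 6, 17, 17, 17],
    "big_tip":         [0, 1, 0, 1, 0, 1],        # ground-truth label
    "prediction":      [0.2, 0.7, 0.4, 0.9, 0.3, 0.8],
})

eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(label_key="big_tip", prediction_key="prediction")],
    slicing_specs=[tfma.SlicingSpec(),                             # overall
                   tfma.SlicingSpec(feature_keys=["trip_start_hour"])],
    metrics_specs=tfma.metrics.specs_from_metrics(
        [tf.keras.metrics.BinaryAccuracy(name="accuracy"),
         tf.keras.metrics.AUC(name="auc", curve="PR")]),
)

result = tfma.analyze_raw_data(df, eval_config)
tfma.view.render_slicing_metrics(result, slicing_column="trip_start_hour")
```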
So this will allow you to see by different
dimensions of your data how balanced the errors
are. So it can be a very useful tool. So we talked
about assessing the accuracy and equitability
of the model. Now let's look at how to document
it. So ML models often get distributed
as is: here's the model, it just works, okay?
And think about it: with software development, you always
have tutorials, documentation explaining each
part of the user interface, a glossary, et cetera.
Let's think about the same concept applied to ML models.
Right? So first is being able to explain
what's happening under the hood. So for a variety of different
data types, it's important to look at why
the predictions happened the way they did, and explainable
AI can tell you which features are
driving the decisions. So you see some examples
here: maybe in an image, what is it about the image that was critical to
making that determination; or, say, in tabular data,
we see that distance was the most important factor in our
model's predictions. Today on Google Cloud, we support explainability
across multiple layers of the
platform, from AutoML to prediction to using
our SDKs to perform explainability
from your notebook.
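For instance, if a model was uploaded to Vertex AI with an explanation configuration and deployed, requesting feature attributions from a notebook could be sketched roughly like this; the endpoint ID and feature values are placeholders.

```python
# Hedged sketch: request explanations (feature attributions) from a deployed
# Vertex AI endpoint. The endpoint resource name and instance are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")
endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890")

response = endpoint.explain(instances=[{"distance": 12.3, "dep_hour": 17}])
for explanation in response.explanations:
    for attribution in explanation.attributions:
        # Per-feature contributions to this particular prediction.
        print(attribution.feature_attributions)
```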
And model cards allow you to document your model. You can specify information
such as how you collected the data, and you can
put in graphs of your performance curves
and the model architecture. This is what we've done for the
Object Detection API.
And there's a Model Card Toolkit that allows you to generate
these model cards based on information about your model, or even
attach it to your TensorFlow Extended pipeline to generate one
automatically.
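A minimal sketch with the model card toolkit, assuming placeholder names and descriptions, might look like this:

```python
# Hedged sketch: scaffold a model card, fill in a few fields, and export HTML.
# The output directory, name, and overview text are placeholders.
import model_card_toolkit as mct

toolkit = mct.ModelCardToolkit("model_card_output")
model_card = toolkit.scaffold_assets()

model_card.model_details.name = "Flight delay classifier"
model_card.model_details.overview = (
    "Predicts whether a flight will arrive late, trained on historical "
    "flight and weather data.")

toolkit.update_model_card(model_card)
html = toolkit.export_format()  # writes and returns the HTML model card
```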
All right, so our final class of problems: MLOps, or machine learning operations.
So what if you built a model that just
was a bad model? It was built on some training data
that had just a bunch of
data quality issues, and it somehow got into production. All kinds
of users were impacted by that. Definitely not something
that you want to have happen. Vertex pipelines can help with that by
allowing you to codify the set of steps to build a machine
learning model and providing guardrails
around deployment, so you can implement
processes like continuous integration and
continuous deployment, and include tests along
the way for each of the different steps.
Pipelines allow you to build a custom machine learning pipeline, like the sketch below.
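Here's a minimal sketch of what such a pipeline could look like with the Kubeflow Pipelines SDK running on Vertex AI, with a guardrail that only deploys when an evaluation threshold is met; the components, threshold, and names are placeholders rather than a complete implementation.

```python
# Hedged sketch: a tiny train-evaluate-deploy pipeline for Vertex AI Pipelines.
# The components are stubs; names, threshold, and project setup are placeholders.
from kfp import dsl, compiler
from google.cloud import aiplatform

@dsl.component
def train_model() -> float:
    # Train a model, write it to storage, and return its evaluation accuracy.
    return 0.91

@dsl.component
def deploy_model():
    # Upload and deploy the trained model to an endpoint.
    pass

@dsl.pipeline(name="train-evaluate-deploy")
def pipeline():
    train_task = train_model()
    # Guardrail: only deploy if the accuracy clears a threshold.
    with dsl.Condition(train_task.output >= 0.9):
        deploy_model()

compiler.Compiler().compile(pipeline, "pipeline.json")
job = aiplatform.PipelineJob(display_name="train-evaluate-deploy",
                             template_path="pipeline.json")
job.run()
```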
Next issue, your model accuracy is drifting downward. So most
software projects can have, or should have, unit
tests on them, where you can evaluate whether your code is working or
not, and the unit tests will tell you if it's working or not.
It's binary: it works, or it's broken. It's a little
more subtle with machine learning, where things might drift
slightly; the data distributions
for whatever you're modeling
may change over time due to outside conditions changing.
So how do you detect and manage that?
There are a couple of processes to consider. One is continuous evaluation,
where you're regularly sampling your model's predictions, comparing those
to ground truth, and then assessing
the accuracy. Another thought is continuous
training, where you deploy an
ML pipeline that extracts the data, trains a new model,
tests the model, of course, and then deploys it to production.
Each of these different processes can
work together to look for issues
when you've drifted away from a certain threshold, and
then to prevent the issue by training models
on a recurring basis or when a
certain amount of data changes. So you want to find that right sweet spot:
not create too many models with every small change that's happening,
but not let your models get too stale where their
performance starts getting impacted. Vertex Model Monitoring
can help with detecting these issues, as
far as drift or training-serving skew.
So maybe you're starting to see that your users
are making a lot of predictions with data that's much
different from what you originally trained on. That might be a warning
signal that your model isn't quite as applicable
as it could be. So this gives you an additional layer of confidence
in your model reliability. Now,
our final issue is around model inference.
So this is when you've built a model, it's deployed and
people are using it, or they're making inferences or predictions on your
model. So it's a success. You solved the problem at
the accuracy level you were looking for. It's integrated
into a widely used application. Now, how do you handle the
spiky workloads that may result? And how do you avoid
over provisioning infrastructure while preventing errors
or high latency if you don't have enough infrastructure
set up? Vertex Prediction can help with that, because it allows you
to set up an online endpoint where you can
serve your model, and it will scale automatically based on
your traffic. So you can set up, say, a minimum number of nodes
and a maximum number of nodes, and it will scale those up and down based on various
utilization thresholds. It will help you with logging,
and it will provide you the option of using
some powerful GPU chips,
so you'll be able to ensure that you're serving the
right amount of requests with Vertex
Prediction at an optimized cost.
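A short sketch of that setup with the Python SDK, where the model resource name and machine settings are placeholders, might look like this:

```python
# Hedged sketch: deploy a model to a Vertex AI online endpoint with
# autoscaling bounds. The model resource name and settings are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/1234567890")
endpoint = model.deploy(
    machine_type="n1-standard-4",
    min_replica_count=1,   # always keep at least one node warm
    max_replica_count=5,   # scale out under spiky traffic
)

print(endpoint.predict(instances=[{"distance": 12.3, "dep_hour": 17}]))
```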
So that wraps up the ten different issues. I hope that was helpful. Let's look
at a few resources. Vertex AI is
the AI platform that we discussed today that can help with several of
these problems. Codelabs are a way to dive in,
use notebooks, build models,
and basically get some training. They're free resources
at codelabs.developers.google.com, and if
you're into learning via video, like this one,
AI Adventures is one of our video series that has a lot
of different resources around using Google Cloud for
AI. And that concludes our
presentation today. So I thank you for watching.
I hope you have a great day.