Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone.
My name is Shraddha.
In this talk, my co-speaker and I will be sharing some practical tips
for breaking into the AI/ML space.
This session is perfect for those who are new to the field or have
recently graduated, industry professionals who are planning to transition
into a new role, or anyone interested in learning more about the AI/ML space.
I hope you find it valuable and learn something new.
If at any point in the talk you have any questions, feel free to reach
out to us on LinkedIn.
You can find our IDs tagged in the slide.
So let's start with high level objectives for this talk.
We will start by talking about why debugging is an invaluable skill.
You will clearly stand out if you can identify and solve complex
issues that others cannot.
Then we will discuss the gap that exists between academic ML and practical ML.
This is particularly interesting because new graduates are often
surprised when they discover that machine learning is more than building models.
The scale of model serving and delivery is not something fully
covered in graduate schools.
Lastly, as promised, we will share some practical tips and
tricks to break into the space.
Debugging as a skill is valuable for engineers in any space, and
specifically in AI, it is definitely an invaluable skill in today's world of automation.
And in this slide, I'm going to talk about why.
So, first of all, AI is everywhere.
From social media algorithms to recommendation systems
and even self-driving cars, AI is all around us.
As companies automate more tasks using ML, the demand for people who can
troubleshoot these systems will go up.
If an AI makes a mistake or isn't working as expected, it can impact
millions, even billions, of users all around the world.
So having the ability to spot and fix the issues is actually
very crucial and valuable.
Secondly, models aren't perfect.
ML models are built on data, and data can be messy, incomplete, and biased.
Debugging helps identify when a model isn't learning what it should,
or when it's making incorrect predictions due to underlying data issues.
For example, think about a recommendation system on a streaming
platform like YouTube or Netflix.
If it starts recommending shows and videos that users don't like because
of a bug in the model, the platform could lose engagement and customers.
So that directly impacts the revenue that the platform generates.
And ultimately, revenue is everything.
Debugging the system keeps it running smoothly and makes sure that
we have the engagement we need from the users.
Thirdly, keep the systems running efficiently.
So as automation grows, performance matters.
There is a direct resource and monetary cost involved with
serving these heavy models.
A well-debugged system runs faster, uses fewer resources, and scales better.
So if you can debug and optimize a model, you can save the company a
lot of money and time by improving performance and reducing downtime.
So continuing from the last slide, my fourth point is that it's
all about continuous improvement.
ML models evolve over time.
They need to be retrained with new data.
As that happens, new bugs and issues can emerge.
Think of it like tuning a car engine.
Even after the initial build, you need to keep making adjustments
to make sure it runs smoothly, especially as conditions change.
And the conditions do keep changing all the time.
Next point is we want to prevent big mistakes from happening.
So automation means that systems are often making decisions
without human intervention.
If an AI system makes a mistake, then the consequences can be large scale.
For example, think of AI used in finance.
If it incorrectly starts flagging transactions as fraudulent, this can
cause chaos for customers.
So debugging ensures that such critical systems make accurate
decisions by minimizing the error.
And the last point is, it's a rare and high-demand skill.
So debugging ML and AI models is a highly specialized skill.
As automation continues to grow, companies will need experts who can
dive into these complex models and data pipelines to find and fix these issues.
Overall, in an age where automation is taking over, having the ability
to debug these complex models will become like a superpower.
It's a skill that's already in high demand and can open up a lot
of career opportunities in the tech world.
Let's move on to the next topic.
We will talk about the gaps that exist between academic ML and practical ML.
Let's take a pause and look at these numbers.
These numbers are mind blowing.
Netflix, over 277 million daily active users worldwide.
YouTube, 500 million daily active users.
Instagram, 2 billion monthly active users, with 500 million of them
engaging with the app daily.
Think about the scale.
Mind you, each request here is actually a user
asking for a customized feed, a personalized set of recommendations
made specifically for them.
But this customization isn't magic.
It happens by ranking millions and billions of options available
in the dataset for a given user.
Think about the number of combinations out of a million options.
How do you pick and choose the top 10 most suitable for a given user?
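To make that concrete, here is a minimal, hypothetical sketch of that top-k step: score every candidate with a stand-in model and keep only the ten highest-scoring items. The candidate count, embedding sizes, and dot-product scorer are illustrative assumptions, not any platform's actual ranker.

```python
# Hypothetical sketch: score a pool of candidates for one user, keep the top 10.
import numpy as np

rng = np.random.default_rng(0)
num_candidates = 100_000                                   # stand-in for "millions of options"
user_embedding = rng.normal(size=16)                       # made-up 16-dim user representation
item_embeddings = rng.normal(size=(num_candidates, 16))    # one row per candidate item

# Stand-in for the ranking model: dot-product relevance scores.
scores = item_embeddings @ user_embedding

# argpartition finds the 10 best without fully sorting every candidate.
top_10 = np.argpartition(scores, -10)[-10:]
top_10 = top_10[np.argsort(scores[top_10])[::-1]]          # order best-first
print("Top 10 candidate ids:", top_10)
```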
How do you pick such that the user actually ends up taking an action
that we want them to take?
Now compare these numbers with the datasets that are used in graduate schools.
Top of my mind, the most popular datasets that I can remember,
at least the ones that I used during my time, included the Iris dataset.
Iris had about 150 samples; it consisted of iris flowers from three
different species.
And the second one was MNIST. It contained grayscale images
of handwritten digits from 0 to 9.
It had about 60,000 training samples, and each image was 28 by 28 pixels in size.
So we are talking about 60,000 to 70,000 samples versus millions of daily
active users and billions of requests on an hourly basis.
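For a sense of how small that classroom scale is, here is a quick look using scikit-learn's bundled copy of Iris; this is just an illustration, not anything shown on the slides.

```python
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
print(X.shape)   # (150, 4): 150 samples, 4 features
print(set(y))    # {0, 1, 2}: three species
```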
So the scale is mind blowing.
So next we'll talk about the heavy duty models that work behind the scenes.
So ranking is done using complex models. They have very complicated
architectures and they are trained on a recurring basis.
The recurring schedule can be hourly, daily, or custom, for example
every six hours; it depends on the type of model, the availability of
data, and how the entire data pipelines are set up.
So it depends on all these parameters.
Every time a model trains, it doesn't start training from scratch.
Instead, it begins training from the last snapshot and
learns new weights based on the latest trends and data patterns.
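As a rough illustration of that warm-start pattern, here is a minimal PyTorch sketch: restore the last snapshot if one exists, train on the newest data, and write out a fresh snapshot. The model, checkpoint path, and stand-in batch are all hypothetical.

```python
import os
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

checkpoint_path = "snapshots/latest.pt"                    # hypothetical snapshot location
if os.path.exists(checkpoint_path):
    # Don't start from scratch: restore the weights learned in the previous run.
    state = torch.load(checkpoint_path)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])

# Train on the freshest slice of data (random stand-in batch shown here).
x, y = torch.randn(256, 32), torch.randn(256, 1)
loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Persist the new snapshot so the next recurring run can warm-start from it.
os.makedirs("snapshots", exist_ok=True)
torch.save({"model": model.state_dict(), "optimizer": optimizer.state_dict()}, checkpoint_path)
```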
So, BERT, GPT-3, T5, as you can see on your screen: these models are
used for use cases like text summarization, chatbots,
recommendation systems, and sentiment analysis.
And each of them has millions to billions of parameters.
And ResNet and YOLO, they are used for image classification and object
detection in computer vision tasks.
YOLO is actually used for real-time object detection in video streams.
All of them have very specific use cases, complex model architectures,
and millions to billions of parameters.
What do these parameters mean? Each parameter is actually a weight that needs
to be updated every time you retrain a model.
Every time new data comes into the pipeline, it triggers a new training run,
and the model updates all of these millions or billions of parameters.
This process involves calculating and optimizing weights, and all these
parameters are learned during the backpropagation step of training.
Compare this with your graduate school models: the parameters are not
in the billions.
That's what I'm trying to say.
It's the scale that matters here.
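One way to make the notion of a parameter concrete: every weight in a model is a parameter that backpropagation updates, and you can count them directly. The toy model below is illustrative; the sizes quoted in the comment (roughly 110 million for BERT-base, about 175 billion for GPT-3) are the commonly cited published figures.

```python
import torch.nn as nn

toy_model = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 10))
num_params = sum(p.numel() for p in toy_model.parameters())
print(f"toy classroom model: {num_params:,} parameters")   # a few thousand weights
# For comparison: BERT-base has roughly 110 million parameters, GPT-3 about 175 billion.
```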
So next I want to highlight the privacy and sensitivity filtering
that takes place in the training data pipelines.
Unlike in university, where you have a dataset and you directly train
your model on it, performing all kinds of data augmentation and
preprocessing, you don't filter out the data based on privacy and
sensitive information.
You already have a dataset available and you directly use it for your use case.
But unlike in school, production actually has a lot of rules, which
change from country to country and region to region.
We have a privacy and sensitive data filtering phase where you have to
filter out data based on the specific rules in that region or country.
These laws are developed by individual countries or unions.
For example, GDPR is considered one of the strictest and most
comprehensive privacy laws globally.
It was devised by the European Union and protects European residents:
it mandates user consent for data processing, gives users the right to
data access and deletion, and imposes heavy fines for noncompliance.
In fact, other countries and regions, like the ones listed in the table
here, the CCPA in California, Bill C-27 in Canada, the LGPD in Brazil,
and Australia's Privacy Act reform, have all in some sense been inspired
by GDPR.
GDPR took the lead as the first of its kind, and then California, Canada,
Brazil, Australia, and many other countries followed and came up with
their own laws to protect their residents, so that residents have more
control over what data is being collected about them.
They can delete their data and they can opt out of it.
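A minimal sketch of what such a privacy filtering stage might look like, assuming a toy table and made-up region rules; real pipelines encode the actual legal requirements of GDPR, CCPA, LGPD, and so on.

```python
import pandas as pd

raw = pd.DataFrame({
    "user_id":   [1, 2, 3, 4],
    "region":    ["EU", "US-CA", "BR", "EU"],
    "consented": [True, True, False, False],
    "feature_x": [0.4, 0.9, 0.1, 0.7],
})

# Toy rule set: these regions require explicit consent before data can be used for training.
consent_required = {"EU", "US-CA", "BR"}

keep = ~raw["region"].isin(consent_required) | raw["consented"]
training_data = raw[keep].drop(columns=["user_id"])   # also drop direct identifiers
print(training_data)
```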
So far we have talked about what models and parameters look like in the
industry, what kind of data filtering happens, and why debugging is complex.
Now let's talk about what happens once the data is filtered and ready,
and you have fresh models trained on the clean data.
Now what happens?
I think in graduate school, that's when your task ends, right?
You have your model, which is trained and giving out predictions.
You look at the confusion matrix, at the true positives, false negatives,
and everything else, you look at the recall and precision metrics, and
you decide if the model is good enough.
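That graduate-school style of evaluation can be sketched in a few lines with scikit-learn; the labels below are made up purely to show the metrics.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))          # counts of TN/FP/FN/TP
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
```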
But this is not where things end; after the model training finishes,
that's just the start of the delivery process.
So the model will produce a prediction value for each item, for example
a video, a post, or a product: it'll predict what kind of engagement
this item can get if it is shown to a given user.
But what happens after that?
There are other business rules that apply after the prediction value
is put out.
For example, on YouTube, you cannot have more than two videos from the
same channel in a user's timeline.
Don't quote me on the two videos part, but I'm sure there is definitely
some sort of restriction on how many videos from the same channel can be
shown in a user's timeline.
Next, you have ads, which are shown alongside the organic YouTube content.
Ads are a whole different space, where advertisers are putting in money
so that these advertisements can be shown alongside the organic content
coming out of other channels.
So how do you create a timeline where the advertiser's budget gets
exhausted and the advertiser gets what they want, like engagement on
their website, so that whatever they are investing in, we give them
enough return on investment?
And at the same time, we want the users to engage organically on the
platform and like the content that is created by other creators, not
just advertisements.
No one likes advertisements, but we want to make sure that advertisements
get enough engagement while the organic videos also keep the users engaged.
So how do you create a timeline?
How do you diversify the timeline such that the user does not feel
overwhelmed?
At the same time, not all videos should be on the same topic, and you
want to exhaust the advertiser's budget.
You don't want to show more than a certain number of videos from the
same channel in the user's timeline.
And there are so many other business rules that come into the picture
when it comes to ranking and showing the timeline.
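Here is a simplified, hypothetical sketch of that post-prediction stage: take items already sorted by predicted engagement, cap how many come from one channel, and interleave ads at fixed intervals. The cap of two, the ad spacing, and the data are illustrative assumptions only.

```python
ranked = [  # already sorted by predicted engagement, highest first
    {"id": "v1",  "channel": "A", "is_ad": False},
    {"id": "v2",  "channel": "A", "is_ad": False},
    {"id": "v3",  "channel": "A", "is_ad": False},
    {"id": "ad1", "channel": "X", "is_ad": True},
    {"id": "v4",  "channel": "B", "is_ad": False},
    {"id": "v5",  "channel": "C", "is_ad": False},
]

MAX_PER_CHANNEL = 2      # illustrative business rule
AD_EVERY_N_SLOTS = 3     # illustrative ad-spacing rule

ads = [item for item in ranked if item["is_ad"]]
organic = [item for item in ranked if not item["is_ad"]]

timeline, per_channel = [], {}
for item in organic:
    if per_channel.get(item["channel"], 0) >= MAX_PER_CHANNEL:
        continue                                   # rule: cap items per channel
    timeline.append(item)
    per_channel[item["channel"]] = per_channel.get(item["channel"], 0) + 1
    if len(timeline) % AD_EVERY_N_SLOTS == 0 and ads:
        timeline.append(ads.pop(0))                # rule: interleave an ad periodically

print([item["id"] for item in timeline])   # ['v1', 'v2', 'v4', 'ad1', 'v5']
```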
Next, I want to talk about continuous monitoring and real world validation.
So unlike in school, where you measure accuracy using simple test
datasets, real-time performance validation is not really possible.
Even if your model predicts some value, you can't say with confidence
whether it is true or not. Let's say it predicts with high confidence
that if this user sees this video, he will click on it and watch it
fully. Even if your model is saying that, you can't fully know if it is
accurate. In an ideal world you'd be able to see it right away, but in
practice you can only confirm it after weeks or months of monitoring
performance and revenue metrics.
And even if you could have engagement metrics at the individual user
level, no one in the industry has the time to go through every user,
because we are talking about millions of users, right?
So we only look at the aggregated metrics.
So creating those aggregated metrics, understanding whether engagement
is improving or staying neutral, and monitoring them on a daily basis
is really important.
And then there also exists bias within the models.
For example, the early versions of ChatGPT were biased towards certain
political ideologies, and that did not come up until the model was
actually put out for users to use.
And that's when the model owners started getting feedback from the users
and the media that this is what was happening.
Because of that, they had to iterate on the model and create a new
version which was not biased. That brings me to the next point, which
is A/B testing.
A/B testing is the process of running a small test of your change on a
small subset of users before landing the change in production.
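A minimal sketch of how users might be assigned to such a test, assuming hash-based bucketing so the same user always lands in the same group; the 5% exposure and the experiment name are illustrative choices.

```python
import hashlib

def bucket(user_id: str, experiment: str, test_fraction: float = 0.05) -> str:
    """Deterministically place a user in 'test' or 'control' for one experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "test" if int(digest, 16) % 10_000 < test_fraction * 10_000 else "control"

print(bucket("user_42", "new_ranking_model_v2"))   # stable across calls and machines
```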
And even after you run an A/B test, it's not like you can just deploy
your model.
You need to prove value to your stakeholders.
Stakeholders here will mean product managers, engineering managers, directors.
You have to make sure that your model is actually doing what you expect
it to do.
For example, if you stated that your model will improve engagement
metrics, then you need to show that it actually improved engagement metrics.
The same goes if you said it will improve prediction performance.
You also have to prove that it does not cause any other disruption in
the system, and that all the top-line and prediction metrics look normal.
And when you're running A/B tests, you can have all sorts of issues.
Your A/B test can suffer from dilution because of parallel experiments.
Or you have trained a certain model which is supposed to be picked up in
your test version, but it doesn't get picked because there is a heavy
fallback: it keeps falling back to the production model, so your test
version is actually not set up correctly and is not doing what you
expect it to do.
These are the kinds of big issues that you run into when you're running
an A/B test and trying to build a new model architecture and put it in
production.
Lastly, I want to say that in theory, the goal is to
maximize accuracy and minimize loss.
But in production, you need to consider system efficiencies, privacy, real
world constraints, and business metrics.
This is why real-world ML is so important: it's all about making
trade-offs and ensuring that the model delivers value, not just about
high accuracy.
With that, I will pass the mic to my co-speaker, Sunandan, who
will share some practical tips and tricks to break into the ML space.
Thank you, Shraddha, for the wonderful introduction about the vast
difference between the scale of ML in graduate school versus in production.
Hi, I'm Sunandan, and I'm going to talk about the practical debugging
skills that upcoming talent can use in a production environment to
debug AI or ML systems.
We need both short term and long term strategies to avoid the
situation of on call fatigue when dealing with these systems.
Before we do that, let's see what a typical ML system
looks like from a high level.
Any typical machine learning system in a production environment has several
sub components which work together to give us the desired predictions.
These steps, such as model training, snapshot creation, snapshot
validation, and inference systems, are not simple components; each is a
large distributed system with its own components and levels of complexity.
For instance, we don't use raw data to train a model directly.
Instead, we first clean, filter, apply privacy rules, and make transformations
to prepare the data for model training.
If there is failure in the data system, we can end up with corrupted features.
This can lead to low quality snapshots being created during model training.
If these snapshots are deployed to production, they will produce poor
predictions, which in turn can affect the behavior of software products.
This can result in a different set of events and outcomes, which can
generate more bad quality data.
In other words, data corruption can have a ripple effect, impacting not
only the model's performance, but also the overall behavior of the
software products that rely on it.
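One way teams guard against that ripple effect is a sanity check on features before a training run is allowed to produce a snapshot. The sketch below is hypothetical: the thresholds, column names, and failure policy are assumptions, not a description of any specific system.

```python
import pandas as pd

def validate_features(df: pd.DataFrame, max_null_rate: float = 0.05) -> list:
    """Return a list of problems; an empty list means the data looks sane."""
    problems = []
    for col in df.columns:
        null_rate = df[col].isna().mean()
        if null_rate > max_null_rate:
            problems.append(f"{col}: {null_rate:.1%} nulls exceeds {max_null_rate:.0%}")
    if "watch_time_sec" in df.columns and (df["watch_time_sec"] < 0).any():
        problems.append("watch_time_sec: negative values found")
    return problems

features = pd.DataFrame({"watch_time_sec": [10, -5, None], "age_bucket": ["18-24", None, None]})
issues = validate_features(features)
if issues:
    # Block snapshot creation so corrupted features can't cascade downstream.
    raise ValueError("Aborting training run: " + "; ".join(issues))
```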
This impact is similar for any other components in this ecosystem.
Visually, these failures, represented by fire icons, cascade to different
parts of the system pretty quickly.
For example, issues in training can lead to bad predictions, which can
corrupt the latest snapshot deployed.
Since the outcome of ML models is also used in future ML predictions,
issues in training data today, if not mitigated on time, will lead to
corrupted snapshots and predictions in the future too.
These interdependencies make backtracing issues in ML models really hard.
As Shraddha explained in her session, due to the scale of ML systems, and
consequently the adverse impact a bad ML system can have, it is imperative
that on-calls respond to issues quickly and with proven strategies so that
these issues are mitigated as quickly as possible.
So what should the junior engineers do to make sure that they don't get overwhelmed?
There are six invaluable debugging techniques which, in the short term
and the long term, will not only help junior engineers tackle their
on-call issues faster, but also make sure that they're progressing in
their careers too.
In the next few slides, let's see what these techniques help us achieve.
The most critical issues that engineers need to deal with, called SEVs
or site events, also need the highest level of attention.
These are critical incidents where a system is malfunctioning, causing
service disruption or degraded performance, and the severity depends on
how bad the impact is.
The training focus for engineers in this area should be to recognize
the high-priority issues that impact fairness.
This is typically done by creating clear guidelines for deciding which
incident qualifies as a SEV and what the severity of the SEV should be.
The junior engineer should also feel empowered to file follow-up tasks
to prevent future SEVs, even if the changes are difficult to make immediately.
Why is it important?
Because knowing when to escalate issues is crucial in industries like
finance or healthcare, where biases can have serious implications.
A pro tip I can provide: we should encourage junior engineers to
document the impact and the debugging steps they are taking, which will
help senior engineers respond faster when they join the incident debugging.
This also creates a virtuous cycle, as these steps can then be integrated
into the runbook for future reference.
Filing incident reports on time and effectively is
just one part of the puzzle.
Engineers should also take time to understand the underlying
model and serving architecture.
It is not necessary for engineers to understand each and every part of
the system.
However, it is crucial that they understand the common failure points,
which will help in narrowing down issues faster.
For example, Amazon's AI hiring tool favored male candidates due to a
bias introduced in the training data pipeline, which was not identified
early in an architecture review.
be diagnosed can be useful here.
For example, a fire drill can be conducted.
Diagnosing issues like data latency between feature stores and serving layers
could be an example of a walkthrough.
Once the engineer has a good grasp of the model's architecture, the next
step is to learn the end-to-end training and deployment workflows.
Concepts like data preprocessing, model training, evaluation, and
deployment are things they will face when they're on call, or when
they're called upon to help mitigate production issues in their models.
Understanding the end-to-end flow helps in quicker identification of the
root cause of real-life issues.
To do this, having the engineers sharpen their skills on a simpler
dataset will teach them how to manage deployments, and those skills will
be invaluable in showing them the ropes of the system; a small drill of
that kind is sketched below.
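For instance, a drill like the following, using scikit-learn's small Iris dataset, walks the same stages in miniature: preprocess, train, evaluate on a holdout, and save the artifact that would be deployed. The file name and pipeline choices are purely illustrative.

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))   # preprocessing + model
pipeline.fit(X_train, y_train)                                                  # "training" stage
print("holdout accuracy:", accuracy_score(y_test, pipeline.predict(X_test)))    # "evaluation" stage
joblib.dump(pipeline, "model_snapshot.joblib")                                  # artifact to "deploy"
```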
An example of how understanding the end-to-end workflow can be useful is
when Google Photos started mislabeling Black individuals as gorillas;
after some investigation, it was found that the problems were in the
training data and model evaluation stages.
Another example: if a junior engineer is retraining a chatbot model,
they should understand where the new data is coming from, how the model
is updated, and what downstream services are impacted.
Running effective queries is another skill set the junior engineer should
continue to work on iteratively as they work on their projects.
Knowing what to query is a human skill, and it takes lots of practice
and trial and error to build it.
An example of something that often gets debugged in production is
figuring out why a recommendation system is returning irrelevant results.
This could be due to missing user preferences, bad training data, a bad
model architecture, some issue in the inference system, or something else.
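As one hypothetical example of such a query, the pandas sketch below joins users who reported irrelevant recommendations against a feature table and checks how many are missing the preference data the ranker depends on. The tables and column names are invented for illustration.

```python
import pandas as pd

complaints = pd.DataFrame({"user_id": [1, 2, 3, 4]})          # users who reported bad recommendations
user_features = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5],
    "preferred_topics": [None, ["music"], None, ["sports"], ["news"]],
})

joined = complaints.merge(user_features, on="user_id", how="left")
missing = joined["preferred_topics"].isna()
print(f"{missing.mean():.0%} of complaining users have no stored preferences")
print(joined[missing])   # narrow the investigation to these users first
```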
Junior engineers can shadow senior engineers debugging the issue and
learn from the experts.
What are the smoking guns and signs to look for?
Building the intuition for figuring out the failures is a skill set which will
help the engineers in the long run.
Mentors play a crucial role in helping individuals develop their
skills and knowledge in any field.
In the context of debugging complex systems, mentors can provide valuable
guidance and support to help individuals overcome challenges and improve
their problem solving abilities.
Finding a mentor can be done through various channels, such as within
the organization, or by connecting with professionals on LinkedIn or in
other engineering communities.
It is important to find a mentor who has experience and expertise in the specific
area of interest and who can provide constructive feedback and support.
One example of how mentors can help is by encouraging shadowing sessions
or attendance in debugging forums, where engineers can learn from
real-world incidents and gain practical experience in debugging complex systems.
In addition to finding a mentor, it is also important to join relevant
communities and forums where the individuals can connect with peers
and learn from their experiences.
Communities like KDnuggets and AI Breakfast Club offer valuable
resources and networking opportunities for engineers interested in AI
and machine learning.
To become a successful AI/ML engineer, it is also important to develop
the key skills that are necessary for success in this field.
Skills like debugging expertise, understanding system design, and strong
querying skills are just one part of the skill set.
They should also look to specialize in a specific area.
Now that area could be MLOps, data engineering, or model interpretability.
It's also very important that the AI/ML engineer stays ahead of the
curve by engaging in continuous learning, as new tools and frameworks
are being developed all the time.
Finally, it is also very important to understand the real
world impact of the systems.
Models like the COMPAS system have had significant real-world
consequences, and fixing issues with these models can have a major
impact on a company's reputation and bottom line.
And by understanding the potential impact of their work, the AI and ML
engineers can ensure that they're making a positive contribution.
I want to mention that none of this is possible without the combination
of a great culture, a great team, and great tools.
A great culture serves as the foundation for open communication
and values diverse perspectives.
Junior engineers should feel comfortable knowing that the company culture
respects the diversity of background and technical skills and values the mantra
that there are no stupid questions.
Complementing this is a great team, where individuals leverage their
unique strengths and collaborate effectively towards common goals.
Managers and tech leads should ensure that the team is taking care of
each other during on-call situations, and that on-calls are getting the
help they need to progress through issue mitigation.
Lastly, great tooling enables the team to avoid repetitive tasks and enables team
members to focus on high impact work.
Automation, wherever possible, to reduce the possibility of human error
should be a prime focus of the team roadmap to build a great debugging
culture for the product in the long run.
And that concludes my talk.
I hope the listeners gained some valuable insights and understanding
of debugging techniques required to succeed in ML production systems.
Please don't hesitate to reach out to us on LinkedIn for any questions.
Thank you.