Transcript
This transcript was autogenerated. To make changes, submit a PR.
Everyone, thank you so much for joining me. My name is Noa Goldman,
and this is the Hitchhiker's Guide to the MLOps Experience.
This talk is all about getting into the mind of an ML engineer,
trying to understand their challenges and
trying to think of approaches that will help them in their day-to-day jobs. And when I say approaches, I don't mean specific tools or technical integrations; I mean in terms of UX and in terms of what their experience should be, so that they can have a better and easier workflow.
So let's step into the mind of an ML engineer and see what their challenges are. Data quality and availability is one challenge. They need the highest-quality data to create better models, and they need it to be available whenever they use it. They always have to select the best-fitting model, the one with the best results, and they constantly have to fine-tune it. So it's not about developing one model and that's it; they have to constantly improve it. They have to explain their experiments and those models; they have to explain the results to people who are not ML engineers and to other stakeholders.
They need to work in a team. Me personally, I'm coming from the world of software development, where it's relatively easy to work in a team: there are a lot of standards and best practices to use. But I feel that for ML projects, those standards are not yet set, so it's kind of hard, and it's a challenge to work in a team of ML engineers. And I think the main challenge is just keeping everything up to date. This world is constantly evolving, especially in the past few years, and it's really hard to keep track of everything and understand what the new tools and new technologies are. So those are some of the challenges that ML engineers are experiencing in their day-to-day lives.
And just to give you a glimpse of what it looks like to try to find solutions for those challenges, this is a very high-level, not-that-technical flow of the components a model is built out of. You have the data, you have experiments, and you have models, which are almost the end result. There are a lot of other things along the way, such as deployments and the code, but I'm trying to keep it really high level.
And for each component, there are a lot of different steps you have to take in order to reach your goals. And for each step, currently, you have tons of tools, tons of options, tons of solutions. Some of them are open source, some of them are paid products, and it's really hard to choose and assemble the best-fitting solution for ML engineers and their operations. It's almost impossible: you have to integrate everything, you have to choose the best tool for each specific step, and it takes a lot of time and a lot of effort.
And I think that thinking about ML engineers' problems and challenges will help you understand which tools you need to use or which approach you need to take. I'm here to offer some approaches for how this experience of developing a model should look and feel in ML engineers' daily work, so that it will be simpler. So this is what we're here to do. Before we deep dive into everything, a little bit about who I am.
So who am I? I'm Noa Goldman. As I said, I'm the lead product manager at DagsHub. DagsHub is a platform for managing and hosting experiments, code, data, annotations, and models all in one place; we're in the MLOps tools space. I'm an ex-software developer, so I'm coming from the world of software engineering. I was a developer for around eight years, doing both front end and back end, and then I transitioned to lead product manager. My favorite thing in the world is to take complex technologies and create simple, easy-to-use tools for tech-savvy people. And a personal thing about me, so we can connect better: I love CrossFit, and I recently adopted my new dog, my new puppy. His name is Hippo. He's really friendly and really energetic. So this is about me.
And let's deep dive into those approaches that can help ML engineers improve their day-to-day lives. Before we do that, let's talk about what MLOps is. MLOps is just a set of practices and technologies that are supposed to help ML engineers develop, deploy, and manage their models across their lifecycle. It's here to ensure that those models are reliable, because those models are supposed to serve users and eventually have to be trustworthy; that they are scalable, so that we can work at scale, for example with models that use other models or large data sets; and to help update and maintain the whole lifecycle of developing a model over time. So MLOps is here to help ML engineers focus on what is really important, which is developing their models.
And it's important for these reasons. It helps productivity: it's supposed to help ML engineers focus, again, on improving their models and not on the DevOps or operations part, and to help them collaborate better and be more productive. It's supposed to help with scale and with reliability, and to be future-proof: in this constantly evolving world, an MLOps approach will help you constantly adapt and stay up to date with the new solutions that are out there.
So as I said before, these are the main components that MLOps should cover, specifically data, experiments, and models. Again, this is a very high-level flow, and we'll focus on that. So let's zoom in for a second, starting with data. What are the challenges when it comes to data for ML engineers? The main challenge is to keep track of, manage, and organize all this constantly collected data from various sources.
I believe, and this is going to be a buzzword here, so I warned you, that to improve your model you have to be data-centric. You have to constantly train your models on new use cases and use as many use cases and as many data points as you can to improve your model. It's less about the code. It's also about the code, but it's less about the code and more about the data that you use to train and the variety of use cases that you cover. And when you have tons of data and you collect it all the time, it's really hard to keep track of it, manage it, and keep it organized, especially when you do it manually. And another issue with data, again in the spirit of being data-centric, is that you have to be able to quickly double down on the relevant use cases, on the data points that matter, and you have to provide an easy approach, an easy solution in terms of experience, for your ML engineers to do so.
When it comes to the first issue, I think the best approach, or at least the approach that worked for me coming from software development, is a data versioning approach. It can help ML engineers with their daily challenges around data, because if they can treat their data like they treat their code, they will be a lot more productive. They will be able to reproduce specific data sets that they see are relevant. If, for example, they see a teammate working on a specific data set and getting a different result, and they want to test that, they can use that exact data set. With versioning, they should also have a clear display of the changes made to a data set over time, and can better make sense of the progress made while evolving a model.
Another thing that versioning is supposed to help with, again for productivity, is teamwork. Teams will be able to collaborate faster if we use this approach from software development: being able to create pull requests or comments or issues, and having a really organized way to work together within a team, with everything organized and managed in one place rather than manually. It's supposed to help ML engineers and ML teams work a lot more productively, and to create a cleaner, simpler environment. It will help them focus on developing better models and not on the operations around them, not on manually changing names and changing data points.
So data versioning is the approach we choose. For example, showing diffs in your data like you would show diffs in your code within a software management project: this is one way to do it, one approach. Or, again, just using comments or issues on your data sets, which is supposed to help keep things really organized.
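To make that concrete, here is a minimal sketch of reproducing a teammate's exact data set version with DVC, one popular open-source data versioning tool; the repository URL, file path, and tag are hypothetical placeholders:

```python
# A sketch of reproducing a specific data set version with DVC's
# Python API. Repo URL, path, and revision are hypothetical.
import dvc.api

# Read the data set exactly as it was at the "v1.2" Git tag,
# instead of whatever happens to be on disk right now.
with dvc.api.open(
    "data/train.csv",                        # hypothetical versioned file
    repo="https://github.com/org/project",   # hypothetical repo
    rev="v1.2",                              # Git tag, branch, or commit
) as f:
    train_v12 = f.read()

# The same call with rev="v1.3" returns the newer version, so results
# can be compared against clearly identified data states.
```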
Another thing that is supposed to help when it comes to picking and choosing specific data points and relevant use cases is visualization, which is one of my favorite tools to use. Data visualization, creating a very clear display of your data, is supposed to help data scientists and ML engineers focus on the right data points and the relevant use cases. And this display should not just show the data itself; it should also show what matters, which I like to call enrichment. For example, that can be metadata, annotations, predictions: everything according to which an ML engineer can pick and choose what is most relevant for improving the model.
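To illustrate what an enriched view might look like, here is a small pandas sketch; all names and values are made up for the example:

```python
# A sketch of an "enriched" data view with pandas: each data point is
# shown next to its metadata, annotations, and model predictions.
# All names and values are hypothetical.
import pandas as pd

images = pd.DataFrame({
    "path": ["img_001.jpg", "img_002.jpg"],
    "source": ["camera-A", "camera-B"],        # metadata
})
annotations = pd.DataFrame({
    "path": ["img_001.jpg"],
    "label": ["cat"],                          # human annotation
})
predictions = pd.DataFrame({
    "path": ["img_001.jpg", "img_002.jpg"],
    "predicted": ["cat", "dog"],
    "confidence": [0.97, 0.41],                # model prediction
})

# One row per data point, with everything an engineer picks by.
enriched = (
    images
    .merge(annotations, on="path", how="left")
    .merge(predictions, on="path", how="left")
)
print(enriched)
```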
When thinking of how to solve the problem of quickly picking, choosing, and creating training-ready data sets, you have to provide a way, and I'm not saying exactly which tools, but you have to provide a way to filter and sort the data sets really fast, so engineers can create the best, most relevant sub-data sets possible. And after you help them create those sub-data sets and isolate the relevant use cases, you need to give them a way to use those data sets. It's not just about filtering and visually seeing a specific slice of a data set; it's also about what to do with it. For example, a thing that would be really helpful is a quick way to send those data sets to annotation, or the ability to download a snippet of the data set to use for retraining your model. So constantly think, first of all, about how to help ML engineers focus and zoom in on the most relevant use cases when it comes to huge data sets, but also about helping them take action easily, and about helping them go to the next step of the flow, to again generate experiments a lot faster.
So, for example, show the data side by side with the metadata that's relevant to it, show the annotations and, on top of them, the predictions, and give engineers a clear way to use all of that for filtering. Some data scientists like to use a Python client, which is cool; they like the syntax and running all the commands there. But also give them a very clear, intuitive way to just filter and sort things, to focus on the relevant items, and to see them visually. This is one way to help data scientists create experiments a lot faster, by creating relevant data sets or sub-data sets a lot faster with those abilities, and to help them move to the next step a lot faster, by letting them send data to annotation in a very clear and easy way, by having behind-the-scenes integrations with labeling tools that save them that operational process, or by just helping them download the data set.
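Here is a hedged sketch of that filter-then-act flow; the `send_to_annotation` helper is a hypothetical stand-in for whatever labeling integration your stack provides:

```python
# A sketch of filter-then-act: zoom in on the weak spots of a model,
# then immediately queue them for labeling or download them to retrain.
import pandas as pd

# Tiny inline stand-in for the enriched table from the previous sketch.
enriched = pd.DataFrame({
    "path": ["img_001.jpg", "img_002.jpg", "img_003.jpg"],
    "label": [None, "cat", None],
    "confidence": [0.41, 0.97, 0.55],
})

def send_to_annotation(paths, project):
    # Hypothetical stub: a real version would call your labeling
    # tool's API to create tasks in the given project.
    print(f"Queued {len(paths)} items for labeling in '{project}'")

# Focus on low-confidence predictions that have no human label yet.
weak_spots = enriched[(enriched["confidence"] < 0.6) & (enriched["label"].isna())]

send_to_annotation(weak_spots["path"].tolist(), project="low-confidence-fixes")
weak_spots.to_csv("retrain_candidates.csv", index=False)  # snippet for retraining
```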
Yeah, so we spoke about data, the challenges data has, and the approaches that should be taken to help ML engineers use their time wisely and avoid those challenges when it comes to data. Now let's deep dive into experiments. What are the challenges with experiments? Again, there are a lot of them. The main one is that ML engineers have to experiment fast, and they have to keep track of those changes. It's a lot like the challenges of the data component: they need to constantly experiment, and they need to make sure they are keeping track of those changes, because eventually you want to create the best model for you, and to do that, you need to understand what got you there, what got you to the point where the model produces those results. And another main challenge is that they have to communicate the results to non-ML engineers, or just to their stakeholders. So if they got to a specific result from a specific model and they believe this is the right way to go, that this is the best option, it's not enough. They have to tell their bosses, or their managers, or their managers' managers, that this is the right thing to do, and communicating, explaining, and convincing others that this is the best model possible is not that easy. So we need to think of the approach that will make it easy for them.
Experiment tracking, obviously, is the way to go; this is the approach. And experiment tracking, again much like data, is all about display, comparison, and visual output. An experiment display is not just about the end result; it's about showing the hyperparameters, the metrics, the results, everything the data scientist cares about when it comes to how they got to those results. All of that needs to be really well displayed, and you have to have the ability to compare results, obviously, and to get a visual output for non-ML stakeholders.
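As one concrete illustration, here is a minimal tracking sketch with MLflow, one common open-source experiment tracker; the run name, parameters, and metric values are made up:

```python
# A minimal experiment-tracking sketch with MLflow: each run records its
# hyperparameters and metrics, so runs can later be displayed and
# compared side by side in the tracking UI. Values are illustrative.
import mlflow

with mlflow.start_run(run_name="baseline-lr-0.01"):
    # Hyperparameters: the "how we got here" part of the story.
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("batch_size", 32)

    # Metrics: the results you will compare across runs.
    mlflow.log_metric("train_loss", 0.42)
    mlflow.log_metric("val_accuracy", 0.87)
```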
I feel like experiment tracking is the component with the most standard approaches these days, so I'm not going to talk about the tools too long. But I really think the most important part here is the visual output for non-ML stakeholders. You have to find a way to help ML engineers convince their bosses that the job they have been doing up until now has paid off, and to show them the results clearly.
So obviously, for experiment tracking, you want to show the experiments, the hyperparameters, the metrics, and all that; this is very basic. But you also want to make sure you have a way to explain to stakeholders what happened, how different features affected those models. And you have to think of a way that an ML engineer won't have to go through too much in order to explain those results. So perhaps find a good way to export images that explain what happened in an experiment. And always think about the non-ML engineers, because data scientists will pretty much understand; but what about the managers and their managers? How do you convince them? Think about a good way to convince others that this is the best-fitting model.
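One hedged way to do that with the same MLflow setup from before: render a simple chart and log it as a run artifact, so there is a ready-made image to share with stakeholders. The candidates and numbers are made up:

```python
# A sketch of a stakeholder-friendly export: a bar chart of validation
# accuracy across model candidates, logged as a run artifact so it can
# be shared as an image without opening the tracking tool. Numbers are
# made up for illustration.
import matplotlib.pyplot as plt
import mlflow

candidates = ["baseline", "more-data", "tuned"]
val_accuracy = [0.81, 0.87, 0.90]

fig, ax = plt.subplots()
ax.bar(candidates, val_accuracy)
ax.set_ylabel("Validation accuracy")
ax.set_title("Which model candidate should we ship?")

with mlflow.start_run(run_name="stakeholder-report"):
    mlflow.log_figure(fig, "val_accuracy_comparison.png")
```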
Okay, zooming out again. The final component is models, and the approach, spoiler alert, is pretty similar to data and experiments. What are the challenges with models? Pretty similar. You have to keep track of your models, you have to be able to access them really easily, and you have to be able to compare the different versions, especially at scale. When we are developing models, it's not just one model that we want to develop; it's often a pipeline, often tons of different versions of a model, so we want to be able to do all this at scale. And again, much like data and much like experiments, we want to collaborate, and we want an easy way to reproduce a model we want to use. But it's not just about reproducing in terms of collaboration; there's also reproducing in terms of production. This is an approach I've taken from software development: okay, we deployed a model to production, that's cool, but it's in production now, meaning that if something happens, we need a really fast way to go back and get the model that fits, with no problems, especially when it comes to production. ML engineers have to have a very clear way to do that without having to go through DevOps. So the MLOps purpose here is really important: it's supposed to help ML engineers do all of that, compare, access, and collaborate on models at scale, but also reproduce to production easily, without having to involve anyone else.
Obviously the solution is a model registry, but it's not just about a list of your models. When you build a model registry, or when you use a specific tool that implements one, you want to help ML engineers collaborate effectively. So give them easy access to those models; give them one location where all those models are managed, with all the relevant details there. You don't want them to have to look for anything: think of ways to filter, and think about all the relevant data that's supposed to be in this registry.
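For instance, here is a minimal sketch with MLflow's model registry; the model name, run ID, and version numbers are hypothetical. One name collects every version of the model, and any pinned version can be loaded back, which is also the fast rollback path mentioned above:

```python
# A minimal model-registry sketch with MLflow. One registered name
# collects every version; a pinned version can always be loaded back,
# which doubles as a fast production rollback path. Names and IDs are
# hypothetical.
import mlflow
from mlflow.tracking import MlflowClient

run_id = "<run-id-of-a-finished-training-run>"  # hypothetical placeholder

# Register the model produced by that run under a single name.
mlflow.register_model(f"runs:/{run_id}/model", name="churn-classifier")

# One place to see every version and its details.
client = MlflowClient()
for mv in client.search_model_versions("name = 'churn-classifier'"):
    print(mv.version, mv.current_stage, mv.run_id)

# Rollback: load a known-good pinned version, no DevOps ticket needed.
model = mlflow.pyfunc.load_model("models:/churn-classifier/2")
```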
And I think the most important thing, when you're deploying your models to production and you want to do this at scale and fast, is this: when you integrate deployment tools or think about a deployment solution, think about what the easiest and most intuitive way to deploy models to production would be, from the perspective of an ML engineer. I know that usually this process involves a DevOps or software engineer, but we do want to move past that, and we do eventually want ML engineers to be able to do this on their own, or at least do most of it on their own, or at least understand what's happening there. So we need to think of an approach that makes it really easy and intuitive for ML engineers to deploy their models, and if not do the full process, at least do most of it and understand it. This is at least my take on this, coming from software development.
There are a lot of tools that do this, obviously, but you have to think of a display that shows the most relevant details: the status of the model, where it is, who changed it last, and an easy, intuitive way to deploy it.
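As a sketch of what that one-step deploy could look like with the same hypothetical registry from above, MLflow's stage transition promotes a version with a single call; the name, version, and stage are illustrative:

```python
# Sketch: promoting a registered model version to production in one
# call, using the hypothetical "churn-classifier" registry from above.
from mlflow.tracking import MlflowClient

client = MlflowClient()
client.transition_model_version_stage(
    name="churn-classifier",  # hypothetical registered model
    version="3",              # hypothetical version to promote
    stage="Production",
)
```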
This is what I would think about to make ML engineers' lives a lot easier. And the final thing: this is not part of the components, it's just something I bring with me as a product manager for dev tools. I think a lot of the tools available today are missing simplification: they have tons of functionality, tons of abilities, they claim to do a lot, and they forget to keep things simple. As an ML engineer, you're not always aware of all those abilities, or you don't always want to use them, or you don't want the complexity. You just want to keep it simple.
So I would say: when you build an internal tool for your ML team, or when you choose to integrate a specific tool into your workflow, make sure you keep things clean. Make sure you don't add extra abilities and features just because someone thinks they're cool. Make sure you think about the process and add only the things that need to be there. Another thing: when you integrate tools, make sure your ML engineers love them. Don't just force them to use them; make them feel at home. If they like using pandas, integrate things that behave similarly. Give them the look and feel of home, and have them love it and find it easy to use.
This is more of a product manager's approach: think about the flows. Don't think about the features, don't think about the tools; think about the flow, think about the ML engineer. What are they trying to do when they wake up in the morning and have a task? What is the step-by-step flow they are supposed to go through? Think about that when it comes to creating your MLOps workflow. So those are some examples. Make them look and feel at home.
That's it. I am Noa Goldman. Here is my email. Feel free to tell me about your MLOps experience, how you overcame those challenges, and what the challenges of your ML engineers are in their daily jobs. Thank you so much.