Transcript
This transcript was autogenerated. To make changes, submit a PR.
Everyone, thank you so much for joining me. My name is Noa Goldman,
and this is the Hitchhiker's Guide to the MLOps Experience.
This talk is all about getting into the mind of an ML engineer,
trying to understand their challenges and
trying to think of approaches that will help them in their day-to-day jobs. And when I say approaches, I don't mean specific tools or technical integrations; I mean in terms of UX and in terms of what their experience should be, so that they can have a better and easier workflow.
So let's step into the mind of an ML engineer and see what their challenges are. Data quality and availability is one challenge. They need the highest-quality data to create better models, and they need it to be available whenever they use it. They always have to select the best-fitting model, the one with the best results, and they constantly have to fine-tune it. So it's not about developing one model and that's it; they have to constantly improve it. They have to explain their experiments and those models; they have to explain the results to people who are not ML engineers and to other stakeholders.
They need to work in a team. Me personally, I'm coming from the world of software development, where it's relatively easy to work in a team: there are a lot of standards and best practices to use. But I feel that for ML projects, those standards are not yet set, so it's kind of hard, and it's a challenge to work in a team of ML engineers. And I think the main challenge is just keeping everything up to date. This world is constantly evolving, especially in the past few years, and it's really hard to keep track of everything and understand what the new tools and new technologies are. So those are some of the challenges that ML engineers are experiencing in their day-to-day lives.
And just to give you a glimpse of what it looks like to try to find solutions for those challenges, this is a very high-level, not-that-technical flow of the components a model is built out of. You have the data, you have experiments, and you have models, which are almost the end result. There are a lot of other things along the way, such as deployments and the code, but I'm trying to keep it really high level.
And for each component, there are a lot of different steps you have to take in order to reach your goals. And for each step, currently, you have tons of tools, tons of options, tons of solutions. Some of them are open source, some of them are paid products, and it's really hard to choose and assemble the best-fitting solution for ML engineers and their operations. It's almost impossible: you have to integrate everything, you have to choose the best tool for each specific step, and it takes a lot of time and a lot of effort.
And I think that thinking about ML engineers' problems and challenges will help you understand which tools you need to use or which approach you need to take. I'm here to offer some approaches for how this experience of developing a model should look and feel in ML engineers' daily work, so that it will be simpler. So this is what we're here to do. Before we deep dive into everything, a little bit about who I am.
So who am I? I'm Noa Goldman. As I said, I'm the lead product manager at DagsHub. DagsHub is a platform for managing and hosting experiments, code, data, annotations, and models all in one place; we're in the MLOps tools space. I'm an ex-software developer, so I'm coming from the world of software engineering. I was a developer for around eight years, doing both front end and back end, and then I transitioned to lead product manager. My favorite thing in the world is to take complex technologies and create simple, easy-to-use tools for tech-savvy people. And a personal thing about me, so we can connect better: I love CrossFit, and I recently adopted my new dog, my new puppy. His name is Hippo. He's really friendly and really energetic. So this is about me.
And let's deep dive into those approaches that can help ML engineers improve their day-to-day lives. Before we do that, let's talk about what MLOps is. MLOps is just a set of practices and technologies that are supposed to help ML engineers develop, deploy, and manage their models across their lifecycle. It's here to ensure that those models are reliable, because those models are supposed to serve users and eventually have to be trustworthy; that they are scalable, so that we can work at scale, for example with models that use other models or large data sets; and to help update and maintain the whole lifecycle of developing a model over time. So MLOps is here to help ML engineers focus on what is really important, which is developing their models.
And it's important for these reasons. It helps productivity: it's supposed to help ML engineers focus, again, on improving their models and not on the DevOps or operations part, and to help them collaborate better and be more productive. It's supposed to help with scale and with reliability, and to be future-proof: in this constantly evolving world, an MLOps approach will help you constantly adapt and stay up to date with the new solutions that are out there.
So as I said before, these are the main components that MLOps should cover, specifically data, experiments, and models. Again, this is a very high-level flow, and we'll focus on that. So let's zoom in for a second, starting with data. What are the challenges when it comes to data for ML engineers? The main challenge is to keep track of, manage, and organize all this constantly collected data from various sources.
I believe, and this is going to be a buzzword here, so I warned you, that to improve your model you have to be data-centric. You have to constantly train your models on new use cases and use as many use cases and as many data points as you can to improve your model. It's less about the code. It's also about the code, but it's less about the code and more about the data that you use to train and the variety of use cases that you cover. And when you have tons of data and you collect it all the time, it's really hard to keep track of it, manage it, and keep it organized, especially when you do it manually. And another issue with data, again in the spirit of being data-centric, is that you have to be able to quickly double down on the relevant use cases, on the data points that matter, and you have to provide an easy approach, an easy solution in terms of experience, for your ML engineers to do so.
When it comes to the first issue, I think the best approach, or at least the approach that worked for me coming from software development, is a data versioning approach. It can help ML engineers with their daily challenges around data, because if they can treat their data like they treat their code, they will be a lot more productive. They will be able to reproduce specific data sets that they see are relevant. If, for example, they see a teammate working on a specific data set and getting a different result, and they want to test that, they can use that exact data set. With versioning, they should also have a clear display of the changes made to a data set over time, and can better make sense of the progress made while evolving a model.
Another thing that versioning is supposed to help with, again for productivity, is teamwork. Teams will be able to collaborate faster if we use this approach from software development: being able to create pull requests or comments or issues, and having a really organized way to work together within a team, with everything organized and managed in one place rather than manually. It's supposed to help ML engineers and ML teams work a lot more productively, and to create a cleaner, simpler environment. It will help them focus on developing better models and not on the operations around them, not on manually changing names and changing data points.
So data versioning is the approach we choose. For example, showing diffs in your data like you would show diffs in your code within a software management project: this is one way to do it, one approach. Or, again, just using comments or issues on your data sets, which is supposed to help keep things really organized.
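To make that concrete, here is a minimal sketch of reproducing a teammate's exact data set version with DVC, one popular open-source data versioning tool; the repository URL, file path, and tag are hypothetical placeholders:

```python
# A sketch of reproducing a specific data set version with DVC's
# Python API. Repo URL, path, and revision are hypothetical.
import dvc.api

# Read the data set exactly as it was at the "v1.2" Git tag,
# instead of whatever happens to be on disk right now.
with dvc.api.open(
    "data/train.csv",                        # hypothetical versioned file
    repo="https://github.com/org/project",   # hypothetical repo
    rev="v1.2",                              # Git tag, branch, or commit
) as f:
    train_v12 = f.read()

# The same call with rev="v1.3" returns the newer version, so results
# can be compared against clearly identified data states.
```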
Another thing that is supposed to help when it comes to picking and choosing specific data points and relevant use cases is visualization, which is one of my favorite tools to use. Data visualization, creating a very clear display of your data, is supposed to help data scientists and ML engineers focus on the right data points and the relevant use cases. And this display should not just show the data itself; it should also show what matters, which I like to call enrichment. For example, that can be metadata, annotations, predictions: everything according to which an ML engineer can pick and choose what is most relevant for improving the model.
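To illustrate what an enriched view might look like, here is a small pandas sketch; all names and values are made up for the example:

```python
# A sketch of an "enriched" data view with pandas: each data point is
# shown next to its metadata, annotations, and model predictions.
# All names and values are hypothetical.
import pandas as pd

images = pd.DataFrame({
    "path": ["img_001.jpg", "img_002.jpg"],
    "source": ["camera-A", "camera-B"],        # metadata
})
annotations = pd.DataFrame({
    "path": ["img_001.jpg"],
    "label": ["cat"],                          # human annotation
})
predictions = pd.DataFrame({
    "path": ["img_001.jpg", "img_002.jpg"],
    "predicted": ["cat", "dog"],
    "confidence": [0.97, 0.41],                # model prediction
})

# One row per data point, with everything an engineer picks by.
enriched = (
    images
    .merge(annotations, on="path", how="left")
    .merge(predictions, on="path", how="left")
)
print(enriched)
```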
When thinking of how to solve the problem of quickly picking, choosing, and creating training-ready data sets, you have to provide a way, and I'm not saying exactly which tools, but you have to provide a way to filter and sort the data sets really fast, so engineers can create the best, most relevant sub-data sets possible. And after you help them create those sub-data sets and isolate the relevant use cases, you need to give them a way to use those data sets. It's not just about filtering and visually seeing a specific slice of a data set; it's also about what to do with it. For example, a thing that would be really helpful is a quick way to send those data sets to annotation, or the ability to download a snippet of the data set to use for retraining your model. So constantly think, first of all, about how to help ML engineers focus and zoom in on the most relevant use cases when it comes to huge data sets, but also about helping them take action easily, and about helping them go to the next step of the flow, to again generate experiments a lot faster.
So, for example, show the data side by side with the metadata that's relevant to it, show the annotations and, on top of them, the predictions, and give engineers a clear way to use all of that for filtering. Some data scientists like to use a Python client, which is cool; they like the syntax and running all the commands there. But also give them a very clear, intuitive way to just filter and sort things, to focus on the relevant items, and to see them visually. This is one way to help data scientists create experiments a lot faster, by creating relevant data sets or sub-data sets a lot faster with those abilities, and to help them move to the next step a lot faster, by letting them send data to annotation in a very clear and easy way, by having behind-the-scenes integrations with labeling tools that save them that operational process, or by just helping them download the data set.
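Here is a hedged sketch of that filter-then-act flow; the `send_to_annotation` helper is a hypothetical stand-in for whatever labeling integration your stack provides:

```python
# A sketch of filter-then-act: zoom in on the weak spots of a model,
# then immediately queue them for labeling or download them to retrain.
import pandas as pd

# Tiny inline stand-in for the enriched table from the previous sketch.
enriched = pd.DataFrame({
    "path": ["img_001.jpg", "img_002.jpg", "img_003.jpg"],
    "label": [None, "cat", None],
    "confidence": [0.41, 0.97, 0.55],
})

def send_to_annotation(paths, project):
    # Hypothetical stub: a real version would call your labeling
    # tool's API to create tasks in the given project.
    print(f"Queued {len(paths)} items for labeling in '{project}'")

# Focus on low-confidence predictions that have no human label yet.
weak_spots = enriched[(enriched["confidence"] < 0.6) & (enriched["label"].isna())]

send_to_annotation(weak_spots["path"].tolist(), project="low-confidence-fixes")
weak_spots.to_csv("retrain_candidates.csv", index=False)  # snippet for retraining
```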
Yeah, so we spoke about data, the challenges data has, and the approaches that should be taken to help ML engineers use their time wisely and avoid those challenges when it comes to data. Now let's deep dive into experiments. What are the challenges with experiments? Again, there are a lot of them. The main one is that ML engineers have to experiment fast, and they have to keep track of those changes. It's a lot like the challenges of the data component: they need to constantly experiment, and they need to make sure they are keeping track of those changes, because eventually you want to create the best model for you, and to do that, you need to understand what got you there, what got you to the point where the model produces those results. And another main challenge is that they have to communicate the results to non-ML engineers, or just to their stakeholders. So if they got to a specific result from a specific model and they believe this is the right way to go, that this is the best option, it's not enough. They have to tell their bosses, or their managers, or their managers' managers, that this is the right thing to do, and communicating, explaining, and convincing others that this is the best model possible is not that easy. So we need to think of the approach that will make it easy for them.
Experiment tracking, obviously, is the way to go; this is the approach. And experiment tracking, again much like data, is all about display, comparison, and visual output. An experiment display is not just about the end result; it's about showing the hyperparameters, the metrics, the results, everything the data scientist cares about when it comes to how they got to those results. All of that needs to be really well displayed, and you have to have the ability to compare results, obviously, and to get a visual output for non-ML stakeholders.
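As one concrete illustration, here is a minimal tracking sketch with MLflow, one common open-source experiment tracker; the run name, parameters, and metric values are made up:

```python
# A minimal experiment-tracking sketch with MLflow: each run records its
# hyperparameters and metrics, so runs can later be displayed and
# compared side by side in the tracking UI. Values are illustrative.
import mlflow

with mlflow.start_run(run_name="baseline-lr-0.01"):
    # Hyperparameters: the "how we got here" part of the story.
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("batch_size", 32)

    # Metrics: the results you will compare across runs.
    mlflow.log_metric("train_loss", 0.42)
    mlflow.log_metric("val_accuracy", 0.87)
```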
I feel like experiment tracking is the component with the most standard approaches these days, so I'm not going to talk about the tools too long. But I really think the most important part here is the visual output for non-ML stakeholders. You have to find a way to help ML engineers convince their bosses that the job they have been doing up until now has paid off, and to show them the results clearly.
So obviously, for experiment tracking, you want to show the experiments, the hyperparameters, the metrics, and all that; this is very basic. But you also want to make sure you have a way to explain to stakeholders what happened, how different features affected those models. And you have to think of a way that an ML engineer won't have to go through too much in order to explain those results. So perhaps find a good way to export images that explain what happened in an experiment. And always think about the non-ML engineers, because data scientists will pretty much understand; but what about the managers and their managers? How do you convince them? Think about a good way to convince others that this is the best-fitting model.
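One hedged way to do that with the same MLflow setup from before: render a simple chart and log it as a run artifact, so there is a ready-made image to share with stakeholders. The candidates and numbers are made up:

```python
# A sketch of a stakeholder-friendly export: a bar chart of validation
# accuracy across model candidates, logged as a run artifact so it can
# be shared as an image without opening the tracking tool. Numbers are
# made up for illustration.
import matplotlib.pyplot as plt
import mlflow

candidates = ["baseline", "more-data", "tuned"]
val_accuracy = [0.81, 0.87, 0.90]

fig, ax = plt.subplots()
ax.bar(candidates, val_accuracy)
ax.set_ylabel("Validation accuracy")
ax.set_title("Which model candidate should we ship?")

with mlflow.start_run(run_name="stakeholder-report"):
    mlflow.log_figure(fig, "val_accuracy_comparison.png")
```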
Okay, zooming out again. The final component is models, and the approach, spoiler alert, is pretty similar to data and experiments. What are the challenges with models? Pretty similar. You have to keep track of your models, you have to be able to access them really easily, and you have to be able to compare the different versions, especially at scale. When we are developing models, it's not just one model that we want to develop; it's often a pipeline, often tons of different versions of a model, so we want to be able to do all this at scale. And again, much like data and much like experiments, we want to collaborate, and we want an easy way to reproduce a model we want to use. But it's not just about reproducing in terms of collaboration; there's also reproducing in terms of production. This is an approach I've taken from software development: okay, we deployed a model to production, that's cool, but it's in production now, meaning that if something happens, we need a really fast way to go back and get the model that fits, with no problems, especially when it comes to production. ML engineers have to have a very clear way to do that without having to go through DevOps. So the MLOps purpose here is really important: it's supposed to help ML engineers do all of that, compare, access, and collaborate on models at scale, but also reproduce to production easily, without having to involve anyone else.
Obviously the solution is a model registry, but it's not just about a list of your models. When you build a model registry, or when you use a specific tool that implements one, you want to help ML engineers collaborate effectively. So give them easy access to those models; give them one location where all those models are managed, with all the relevant details there. You don't want them to have to look for anything: think of ways to filter, and think about all the relevant data that's supposed to be in this registry.
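For instance, here is a minimal sketch with MLflow's model registry; the model name, run ID, and version numbers are hypothetical. One name collects every version of the model, and any pinned version can be loaded back, which is also the fast rollback path mentioned above:

```python
# A minimal model-registry sketch with MLflow. One registered name
# collects every version; a pinned version can always be loaded back,
# which doubles as a fast production rollback path. Names and IDs are
# hypothetical.
import mlflow
from mlflow.tracking import MlflowClient

run_id = "<run-id-of-a-finished-training-run>"  # hypothetical placeholder

# Register the model produced by that run under a single name.
mlflow.register_model(f"runs:/{run_id}/model", name="churn-classifier")

# One place to see every version and its details.
client = MlflowClient()
for mv in client.search_model_versions("name = 'churn-classifier'"):
    print(mv.version, mv.current_stage, mv.run_id)

# Rollback: load a known-good pinned version, no DevOps ticket needed.
model = mlflow.pyfunc.load_model("models:/churn-classifier/2")
```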
And I think the most important thing, when you're deploying your models to production and you want to do this at scale and fast, is this: when you integrate deployment tools or think about a deployment solution, think about what the easiest and most intuitive way to deploy models to production would be, from the perspective of an ML engineer. I know that usually this process involves a DevOps or software engineer, but we do want to move past that, and we do eventually want ML engineers to be able to do this on their own, or at least do most of it on their own, or at least understand what's happening there. So we need to think of an approach that makes it really easy and intuitive for ML engineers to deploy their models, and if not do the full process, at least do most of it and understand it. This is at least my take on this, coming from software development.
There are a lot of tools that do this, obviously, but you have to think of a display that shows the most relevant details: the status of the model, where it is, who changed it last, and an easy, intuitive way to deploy it.
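As a sketch of what that one-step deploy could look like with the same hypothetical registry from above, MLflow's stage transition promotes a version with a single call; the name, version, and stage are illustrative:

```python
# Sketch: promoting a registered model version to production in one
# call, using the hypothetical "churn-classifier" registry from above.
from mlflow.tracking import MlflowClient

client = MlflowClient()
client.transition_model_version_stage(
    name="churn-classifier",  # hypothetical registered model
    version="3",              # hypothetical version to promote
    stage="Production",
)
```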
This is what I would think about to make ML engineers' lives a lot easier. And the final thing: this is not part of the components, it's just something I bring with me as a product manager for dev tools. I think a lot of the tools available today are missing simplification: they have tons of functionality, tons of abilities, they claim to do a lot, and they forget to keep things simple. As an ML engineer, you're not always aware of all those abilities, or you don't always want to use them, or you don't want the complexity. You just want to keep it simple.
So I would say: when you build an internal tool for your ML team, or when you choose to integrate a specific tool into your workflow, make sure you keep things clean. Make sure you don't add extra abilities and features just because someone thinks they're cool. Make sure you think about the process and add only the things that need to be there. Another thing: when you integrate tools, make sure your ML engineers love them. Don't just force them to use them; make them feel at home. If they like using pandas, integrate things that behave similarly. Give them the look and feel of home, and have them love it and find it easy to use.
This is more of a product manager's approach: think about the flows. Don't think about the features, don't think about the tools; think about the flow, think about the ML engineer. What are they trying to do when they wake up in the morning and have a task? What is the step-by-step flow they are supposed to go through? Think about that when it comes to creating your MLOps workflow. So those are some examples. Make them look and feel at home.
That's it. I am Noa Goldman. Here is my email. Feel free to tell me about your MLOps experience, how you overcame those challenges, and what the challenges of your ML engineers are in their daily jobs. Thank you so much.