Transcript
Welcome to my session on accelerating the AI lifecycle.
Today, I will talk about how we at Meta work towards making the AI lifecycle,
essentially the research-to-production cycle, faster for scientists and
engineers to iterate and deliver their work.
But first, who am I?
I'm Pari Gupta, and I'm a Senior Production Engineer at Meta.
I work on the Python Language Foundation team, which manages the entire
Python ecosystem within the company.
For the longest time, I worked in the interactive computing team
focused on Bento, Meta's internal Jupyter notebook distribution.
Outside of Meta, I like to paint and dance.
Before diving into all the exciting and creative improvements that
we've made so far, let's go back to basics and start with the AI lifecycle.
AI is an iterative process.
When I first started out, I thought it was linear.
You collect the data, you train a fancy AI model, and do the testing.
Then you deploy that model in production, and then you're magically done
and can reap the fruits of the AI system.
Easy and dreamy, right?
What a noob I was.
I realized very soon that the AI lifecycle is more of a cycle with a lot
of back and forth than a linear graph.
Here's an oversimplified version of what an AI lifecycle looks like.
It contains different stages such as product scoping, data engineering,
actual model development, deployment of that developed model,
monitoring of how it is performing,
and finally, business analysis and insights.
You'll notice a lot of back and forth arrows between different stages.
These indicate that the lifecycle is not a steady loop, but
involves a lot of revisions and iterations for better results.
Out of all of these stages, the data engineering and model development
components together are called ML prototyping.
This phase is very core to the cycle, and its effectiveness can really
speed up the overall delivery of AI.
The most commonly used ML prototyping tools are Jupyter
Notebooks and Conda environments.
Let's dive into them.
Jupyter Notebook is an open source, web based, interactive computing platform.
Notebooks allow you to create and share documents that contain live
code, equations, visualizations, narrative text, and images.
It's widely used in academia and industry alike.
One of its main benefits can be seen in data science and engineering.
Notebooks can help you massage the data into the format that's
expected by your model, develop data features, do scientific
computations, and even visualize your data to derive insights from it.
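To make that concrete, here's a minimal sketch of the kind of cell a notebook might hold for this, using the open source pandas and matplotlib libraries on a hypothetical events.csv file (none of this is Meta-internal code):

    import pandas as pd
    import matplotlib.pyplot as plt

    # Load raw data; events.csv is a made-up file used only for illustration
    df = pd.read_csv("events.csv", parse_dates=["timestamp"])

    # Massage the data into the shape a model expects:
    # drop incomplete rows and derive a simple feature
    df = df.dropna(subset=["user_id", "duration_ms"])
    df["duration_s"] = df["duration_ms"] / 1000.0

    # Visualize it right in the notebook to derive insights
    df.groupby(df["timestamp"].dt.date)["duration_s"].mean().plot(kind="line")
    plt.title("Average session duration per day")
    plt.show()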
Another is ML modeling and testing.
To prove a hypothesis, a scientist would develop an experiment, code a hacky
prototype, and run it on their notebook iteratively to train and test their model.
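A hacky prototype of that kind, purely as an illustration with the open source scikit-learn library on synthetic data rather than any real experiment, might look like this; you tweak a cell like it and re-run it over and over:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Synthetic data standing in for features produced by the data engineering stage
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # Train a quick baseline, test it, then adjust and re-run the cell to iterate
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, model.predict(X_test)))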
Jupyter notebooks have several other benefits.
They're language agnostic, which means that you can easily run not just
Python, but any other language such as R, Hack, JavaScript, etc.
Lastly, they are backend agnostic.
This means that you can run your code locally or on a remote server or a
GPU or a CPU cluster fairly easily.
Conda.
Conda is a popular package manager used for managing software environments
and dependencies in the data science and scientific computing world.
It was developed as part of the Anaconda distribution and supports Python
and other programming languages used for major scientific computing tooling.
Conda allows its users to create isolated environments where different versions
of packages can be installed and used without interfering with one another
or with the base system environment.
This makes it very easy to manage dependencies and avoid conflicts
between different packages and versions.
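As a rough sketch of what that looks like with open source Conda (the environment and package names here are just examples):

    # Create an isolated environment with specific Python and package versions
    conda create -n experiment-a python=3.10 numpy=1.24 pandas

    # Switch into it without touching the base system environment
    conda activate experiment-a

    # A second environment can hold conflicting versions side by side
    conda create -n experiment-b python=3.11 numpy=1.26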
Now I want to share some of the ideas that we at Meta have explored
and executed to make ML prototyping faster within the company, so that our
researchers can deploy their models
and make them work for the fast growing world of artificial intelligence.
Bento.
Bento is the internal distribution of Jupyter notebooks.
It is particularly optimized for Meta's needs to make it compatible
with the internal codebase.
It allows users to execute their code on different servers within Meta.
Now, let's see how we've improved this interactive platform.
Have you ever wanted to rerun and visualize
data on a regular basis?
Maybe make reports or see how the model is performing to make informed decisions?
Inspired by Papermill's approach, Bento introduced powerful
tooling to schedule notebooks to run code at any cadence, on any
server, in a privacy aware manner.
This helps us ensure that all the permissions stay intact, for users' ease.
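For reference, the open source Papermill pattern that inspired this looks roughly like the following; Bento's scheduler and its privacy checks are separate internal systems, so this is only an analogy, and the notebook and parameter names are made up:

    import papermill as pm

    # Execute a notebook with parameters; an external scheduler can invoke this at any cadence
    pm.execute_notebook(
        "daily_report.ipynb",          # input notebook
        "daily_report_output.ipynb",   # executed copy with outputs saved
        parameters={"report_date": "2024-01-01"},
    )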
Next is persistent sessions.
Ever come across a situation where you wanted to run code remotely
and lost internet, the wifi got disconnected, or something as simple
as the laptop going to sleep? You lose all the long running executions and
you have to start all over again.
It's sad, right?
We implemented persistent sessions and sophisticated notebook recovery
mechanisms where users can access their data and computations in any Jupyter
session across different servers.
You can simply kick off a long running execution in your notebook and come
back to the Jupyter session at a later point in time, essentially resuming
the session with all the data and computations already available.
Another idea that made a difference in developer productivity and improved
ML prototyping was cross platform debugging between VS Code at
Meta and Bento. Here, one click on any Python file would push you into
a debugging session in a notebook, to iterate on your code with
all the dependencies that you may need.
Not only this, we also introduced notebooks within VS Code itself,
which makes it very easy to import and export in and out of the IDE at any time.
There are several other innovations that we've made so far that have
influenced ML development and really delivered some quantifiable speedups:
Integrations to enable data privacy and security reviews, support for serverless
servers, enhanced SQL cell support, ability to use multi language kernels,
auto suggestions using internally developed Gen AI agents, and so much more.
Conda is a relatively newer introduction to our developer tooling.
We want to bring most of the open source experience internally
in a controlled manner.
This enables faster onboarding of researchers to the ML development cycle,
lowering the learning curve and hence making the product delivery faster.
Researchers can play with their GitHub projects more freely while they are
experimenting with their ML models.
One thing to note here is that Meta's codebase is a large monorepo.
This means that there is one version of each dependency across all the
projects that exist within Meta.
This introduces a lot of difficulties when it comes to quicker
experimentation and version switching.
Conda, on the other end of the spectrum, provides full version
control of the dependencies that a user may want to experiment with before
marrying themselves to one specific version.
Conda at Meta is packaged into a portable, self contained virtual environment,
which can be accessed from anywhere the user desires.
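The closest open source analogue to this packaging step is conda-pack; the commands below are only a sketch of the idea, not Meta's internal tooling, and the environment name is hypothetical:

    # Requires the conda-pack package (conda install conda-pack)
    # Archive the environment into a single relocatable file
    conda pack -n experiment-a -o experiment-a.tar.gz

    # On any other machine, unpack and activate it without reinstalling packages
    mkdir -p ~/envs/experiment-a
    tar -xzf experiment-a.tar.gz -C ~/envs/experiment-a
    source ~/envs/experiment-a/bin/activate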
And finally, as we plug Conda environments into specific MLE use cases, we also
enable better telemetry, accountability, and tracking in terms of errors,
package usage, and vulnerability checks, to ensure that researchers can work
without worrying about any breaches.
The last interesting idea that I want to share today is the power
of combining these two tools.
Basically, I mean running Conda as a Jupyter kernel in the
back end of a notebook.
This in particular has accelerated the development of AI models and
experimentation, bringing research and production development
much closer to one another.
It has genuinely reduced the cost of moving between the two.
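In open source terms, wiring a Conda environment up as a Jupyter kernel looks roughly like this; the Bento integration itself is internal, and the environment name below is just an example:

    # Register an existing Conda environment as a kernel that notebooks can select
    conda activate experiment-a
    conda install ipykernel
    python -m ipykernel install --user --name experiment-a --display-name "Python (experiment-a)"

    # Any notebook that picks this kernel now runs its cells against that environment's dependencies
    jupyter notebook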
Thank you.
Here's the link to my LinkedIn profile.
I'll make the slides public in case anyone wants to go through these resources.
Thank you.