Transcript
Hi everybody, my name is Julien, I'm chief evangelist for Hugging Face. In this presentation, I would like to
introduce you to building natural language processing applications
with transformers. A few years ago,
deep learning exploded onto the stage.
And this was based on an alignment of planets, so to speak. So the first planet was the resurrection of neural networks, a pretty old technology, but brought back and applied
to computer vision, natural language processing
and generally working with unstructured data.
And that proved to be very efficient.
What made it possible also for companies to use
deep learning was the availability of a few open
data sets. As we know, deep learning is very data hungry. You need a
lot of data to train deep learning models. And obviously having
those freely available data sets like ImageNet
for computer vision was a big boost.
Compute is critical when working with deep learning. And GPUs became available and applicable to other things than 3D games, and they became available
on the cloud as well. So all of a sudden it was easier
to grab the compute power that you needed and apply
it to deep learning problems. And putting everything together, a collection of tools, open source tools mostly, also became available. Libraries like Theano and Torch, and later TensorFlow, became available for experts mostly, because if you remember those
days, it was not so easy to build your
models and train them, et cetera. So you still
needed to know quite a bit about machine learning.
And developers and generally people
without a machine learning background found it difficult to
actually train models. But still, this was all a very nice step
forward. And so a few years later,
this is what a typical machine learning and deep
learning project looks like. And although we like to
pretend it's very agile and it looks like a flywheel,
again, let's be honest, it's really a waterfall project where
a lot of time is spent preparing data, cleaning data,
and that's going to be at least 50%, maybe up to 80%
of the time you spend on your project, and then
you move on to training and evaluating results,
training again and again, managing infrastructure in the
process, and then eventually deploying your model
in production, which is typically the hardest part, because now
you have to live with that model in production, monitor it, et cetera, et cetera,
scale it. Not an easy thing.
So quite a few hurdles to clear. And unfortunately,
the inevitable happened, which is that a lot
of companies, a lot of organizations found it really
difficult to actually deliver deep learning projects in production.
And you can see those numbers. Over 80% of data
science projects actually don't make it into production,
which is really a shame. A POC is nice,
but business value means you have to deploy
in prod, and very few companies manage to
do that. Again, only a fraction
of companies today actually say that they get business
value and adoption on deep learning.
So that's a shame because it's really cool technology, but it's
still very challenging to work with, so something a
little different is needed. And this is what I call deep learning 2.0.
Hopefully it's not deep learning 1.1, but I guess we'll find out.
And so we see similar planets
aligning, except the technologies have evolved.
So neural networks and neural network
architectures like CNNs and LSTMs are
actually being replaced by a new type of neural architecture
called transformers. And I'm sure you've heard about BERT, released by Google a few years ago, in 2018 to be precise.
Well, this is pretty much the birth of transformers,
and now transformers are evolving and we'll
see some examples. Instead of building data sets,
practitioners now rely more and more on this technique called transfer learning, which we'll discuss in a little more detail. In a nutshell, transfer learning means starting from pretrained models and applying the knowledge, so to speak, that those models
have learned to your own business problem, potentially training
a little bit in the process. But that's a much simpler thing than
just building a huge data set from scratch.
GPUs, of course, are still around, but now
we see some companies building machine learning hardware.
So chips for prediction and training that
have been built for machine learning from the ground up.
And as you can guess, these deliver quite a few benefits.
And now tools have become friendlier. You don't
need to be an expert to get good results.
If you're a developer, an application developer, back end developer,
you can train and deploy these models
in a much easier way than before
without having to know all the nitty gritty details
that used to come with deep learning. So definitely the learning curve is much flatter now.
So let's look at all those four new planets. So Transformers
is both a new model architecture, as I mentioned, and also an open source library that Hugging Face, my company, is the steward for, with the help of the community, obviously. And in fact, it's one of the fastest growing open source projects in history. You can see the GitHub stars on this slide. Hugging Face Transformers is actually the yellow line, and you can see it
has the steepest slope,
which we're really proud of. And it's pretty funny to
see that we're actually growing faster than very
popular projects like Kubernetes or Node
or PyTorch. So we're really grateful. We see
a ton of adoption from the community. And it's not
just the community. We also see analysts
and generally the IT community acknowledging
that transformers are becoming a thing.
So transformers are not just for NLP. They started for
NLP, but now they're expanding into computer vision,
speech and audio and reinforcement learning and all kinds
of areas. And the Kaggle
report shows, as mentioned, that traditional deep learning
architectures, if there is such a thing, like RNNs and CNNs, are actually less and less popular, while transformers are more and more popular. So all these point at the fact that transformers are really rising and are becoming the next standard way for a lot of machine learning problems.
And just to give you some numbers, on our website, huggingface.co, we see about 1 million model downloads every day. That's a good number and it's rising. So transformers are the next big thing, we think. Transfer learning is the second planet. So transfer learning again means instead
of training from scratch on a huge data set
that was very painful to build and clean,
you start from a pretrained model that
matches the business problem you're trying to solve. And you can see
on this slide the list of task types that are available today
on the hugging face hub. So as mentioned, lots of NLP,
but also computer vision, audio and some newer task
types. So you find something here that
matches your business problem, you go and select
a few models for this. They've been pretrained on
a very, very large data set, I think Wikipedia
or even bigger, billions and billions of words,
millions and millions of images, and you can test it in
a few seconds. I'll show you how on the next slide. So you can very
quickly run some tests and figure out does this model work
for me out of the box? And a lot of times
it will. So for example, if you need to extract, let's say, organizations, or you need to do sentiment analysis,
most of the time it's going to work out of the box and it's going
to be just fine, right? So that's it.
You're done. You can take the model and move on
to deploying it. So that was fast.
Now sometimes you will need to fine tune the model.
So you will need to specialize the model on your data.
And that's the transfer learning part.
Okay, you're going to say, well, now I'm training
again, right? So how is that simpler? Well, it is simpler because
a, you need just a little bit of data,
right? It's one or two orders of magnitude
less data than training from scratch. And so
that's going to be faster to build, faster to train, less expensive. And you
need just a few lines of code, thanks to
the transformers library. We'll see an example in a minute.
Okay. Transfer learning is much, much faster than training from
scratch because you don't have to build that huge data set,
basically. So here's an example of
working with the Hugging Face Transformers library, using the high level object
called pipeline. And you can see in one line of code,
I can build a model for
translation. And it's a multi language model
in this case. So you can see the first token is actually the name of the target language, starting from English.
So here I'm translating from English to Hungarian.
And all it takes is that one line of code here and I can
see the result. And then I can build a second pipeline to classify
the tokens in my translation. Again, I'm using
an off the shelf model. So this one is built for token classification in Hungarian. Okay. I did not train
anything here. So that shows you the depth of models that
we have on the hugging face hub. And again, I can
predict and you can see the results. Right. So dates,
persons, ordinals, and GPE means geopolitical
entity. So it's a country name in this case.
So five lines of code. And I'm doing entity extraction with translation from English to Hungarian. Right. So that's pretty cool. That's not going
to take a lot of time to try and not a lot of time to
deploy either. Okay, so pretty nice.
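For reference, those few lines look roughly like the sketch below. The checkpoint names are assumptions, since the exact models from the slide aren't named in the talk, and models in the illustrative multilingual family expect a target-language token at the start of the input:

```python
from transformers import pipeline

# Assumed multilingual translation checkpoint (not necessarily the one from the demo).
# Models in this family expect a target-language token, e.g. ">>hun<<" for Hungarian.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-mul")
hungarian = translator(">>hun<< My name is Julien and I live in Paris.")[0]["translation_text"]
print(hungarian)

# Token classification (NER) pipeline for Hungarian; the model id is a placeholder,
# pick any Hungarian token-classification model from the hub.
ner = pipeline("token-classification",
               model="some-org/hungarian-ner",
               aggregation_strategy="simple")
print(ner(hungarian))
```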
Let me show you a more complex demo. So here
I'm still using off the shelf models, no training involved.
And I'm going to do voice queries on financial documents.
Okay, so the two models I'm using are: first,
a speech to text model with built in translation.
This is a very cool model from Facebook. So I'm
going to record a sentence in French.
It's going to be translated to English, and that
text query is going to be used by the second model
to run semantic search.
Right, trying to match the closest sentences in that document corpus, which is built from SEC filings, annual reports from large American companies. Okay, so here's my app.
Let's give it a try. Okay, so I'm going to record something here in French
and we're going to run the query and then I'll show you
the code real quick. Okay, so let's try this
"Qui est le CFO de Gap ?" Okay, so I have my clip now: "Qui est le CFO de Gap ?" Okay. And now if I click on submit here again, this speech is going to be turned into text and translated, and we're using it to run the query. All right,
so we can see the clock ticking. This should
take a few seconds. And if I scroll down,
I can see. So I can see what I actually
said, which is, who's the CFO at gap? And I can see the top
matching documents here, which obviously are the annual reports for Gap. Right? And we see the top matching
sentences in decreasing order. Okay.
And that ran for just a few seconds. Right. So this is actually public.
You can try it for yourself and have fun with it.
Let me show you what it entails. So,
a Space is a git repo where I store code, and that code is automatically run in a Docker container. So if we look at the app here,
we can see it's about 100 lines of code,
right? And half of that is really for the user
interface. So what I'm doing here is I'm
loading my models. I'm loading my document
corpus, which I processed
for semantic search using that sentence transformers
model. And then basically, I just grab the WAV speech and do speech to text and translation on it. And then I run my semantic search on the text. Right? And that's all there is to it, as you can see.
Process the speech and find sentences based on
the text. Nothing hidden and no
training whatsoever. So that's
a pretty cool app. Imagine what you would have to do to build everything yourself. It would definitely take a little more than 100 lines of code. Okay.
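For illustration, the core of that kind of app looks roughly like the sketch below. The model ids are assumptions rather than the exact ones from the Space; the demo itself uses a Facebook speech-to-text model with built-in translation and a sentence-transformers model:

```python
from transformers import pipeline
from sentence_transformers import SentenceTransformer, util

# Assumed stand-ins for the demo's models; ids are illustrative only.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Document corpus built offline from the SEC filings, one embedding per sentence.
corpus = ["Sentence one from an annual report.", "Sentence two from another filing."]
corpus_embeddings = embedder.encode(corpus, convert_to_tensor=True)

def query(audio_path):
    # Speech in French -> English text (Whisper can translate via generate_kwargs).
    text = asr(audio_path, generate_kwargs={"task": "translate"})["text"]
    query_embedding = embedder.encode(text, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)[0]
    return text, [(corpus[h["corpus_id"]], h["score"]) for h in hits]
```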
All right, let's keep exploring transformers. So, the next planet that's
aligning is machine learning hardware.
So, so far, we've mostly relied on GPUs for
training, and they're still very nice. But I
guess it's good to have more options. And we see companies like
Habana, Graphcore, Intel, Qualcomm, AWS, and a few more building
specialized chips for training or
inference. And in fact, accelerating both makes
sense, because if you accelerate training, you can obviously
iterate quicker, right? During the same day, you can
run your series of training jobs. Instead of
having to wait for 12 hours or 24 hours, you can
make decisions quicker and converge
quicker to a great model that creates
business value. Accelerating inference, obviously,
is critical for low latency applications like conversational apps
or search. But generally, everybody wants to go
fast. And of course, if you can predict faster,
you increase throughput, you decrease latency, and you
can just predict more with the same amount of infrastructure.
So your cost performance ratio will be quite a bit nicer as well. So we
at hugging face are partnering with those companies,
and we actually have a dedicated library, which you can find on GitHub, called Optimum, which makes it really easy to
work with those chips. You can start
from your vanilla Hugging Face code. Generally it's going to use the Trainer API, which is the high level API to fine tune models.
Again, very little code and you can just
replace a few objects with the hardware specific objects and accelerate training or
accelerate inference. So that's pretty cool because no one wants to rewrite
everything. Go and take a look at the optimum repo. You'll find
some code samples and we also have getting started posts for all those chips on our blog.
Okay.
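To give a rough idea of that object swap, here is a sketch for one target, Habana Gaudi with Optimum Habana; the class names come from that flavor of Optimum, and the model, datasets and configuration name are illustrative, not taken from the talk:

```python
# Sketch, assuming Optimum Habana; other hardware targets have similar drop-in classes.
from optimum.habana import GaudiTrainer, GaudiTrainingArguments  # instead of Trainer / TrainingArguments
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

training_args = GaudiTrainingArguments(
    output_dir="./out",
    use_habana=True,            # run on the Gaudi accelerator
    use_lazy_mode=True,
    gaudi_config_name="Habana/distilbert-base-uncased",  # illustrative Gaudi configuration
    num_train_epochs=1,
)

trainer = GaudiTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # assumed to be prepared elsewhere, as with the regular Trainer
    eval_dataset=eval_dataset,
)
trainer.train()
```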
And the last planet is basically
putting everything together with developer tools, right?
Don't get me wrong, we still need experts for the really hard problems, but for a lot of
problems for a lot of projects, we think developers
can build it all by themselves, right? So we're trying to come up
with tools and solutions that are developer friendly and don't
require a lot of machine learning expertise,
if any. So again, as mentioned, start from the Hugging Face hub, huggingface.co.
You can go and look for data sets if you
need to start from scratch because you don't
have data for your problem, or maybe you want to augment the
data that you have with third party data. So we
have over 4000 data sets out there. So likely you'll find something that you can use. And then, as mentioned before, you can go and look for
the models that make sense for your task
type and your business problems. We have over 40,000. The number changes
every day. By the time you're watching this, it's going to be more than
40,000. And from then
on you can obviously test these models as
is, fine tune the models either on a
hugging face data set, on your own data, maybe both.
And you can do this in a number of ways. Of course you can run
this on your own servers, in your Jupyter notebooks, if you have on prem infrastructure. You can run it in AutoTrain, which is our AutoML service that lets you very easily train on tabular data and NLP data. And this is totally no code, right? You can just click in the UI or use the simple CLI; zero lines of code needed. And as mentioned before,
you can use Transformers, the Trainer API, you can use Optimum to accelerate training, et cetera.
Once you have a model that you like, you can, as mentioned,
very easily showcase it in Spaces, you just saw an example of that. And then you can deploy it; again, you can deploy it anywhere you like on your infrastructure.
You can deploy it on the inference API, which is our very own managed
API with hardware acceleration.
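As a quick illustration, querying a model behind the Inference API is just an HTTP call; the model id and token below are placeholders:

```python
import requests

# Placeholder model id and access token; substitute your own.
API_URL = "https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english"
headers = {"Authorization": "Bearer hf_xxx"}

response = requests.post(API_URL, headers=headers,
                         json={"inputs": "I really enjoyed this presentation."})
print(response.json())
```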
And you can still use Optimum if you'd like to optimize for your own underlying platform. Okay, and then you have a model in prod. The last thing I want
to mention is we have a deep engineering partnership with
AWS. We collaborate at the product
level, at the engineering level on Amazon SageMaker, which, if you're not familiar with it, is the managed machine learning service at AWS, and we make it pretty easy to train and deploy your Hugging Face code on SageMaker using managed infrastructure.
Okay? So either way, whether you want to go on prem, or on EC2, or on other virtual machine services, or on SageMaker, we think we have a solution and we think we can help you fly
through that development cycle much faster than before.
Okay, so let me quickly show you how to do this on
SageMaker. In the interest of time, I won't go through all the details,
I'll just show you the highlights, but you can find the URL
to this repo in my slides and replay everything.
Okay, so what I'm doing here is I'm
fine-tuning a DistilBERT model. DistilBERT is a condensed, smaller version of BERT. I'm fine tuning this model on
a product review data set that I found on the hub. And you can see
the URL to this data set here.
Okay, so installing some dependencies,
downloading the data set, and in fact, this data set has English reviews and a Thai language translation with a flag saying whether the Thai translation is correct. So I'll just ignore the Thai part, I'll just keep the English part and the star rating. Okay,
so here I'm just simplifying the problem by
mapping sentiment to positive or
negative. So anything that's four and five stars is a positive
review. Anything lower than four is a negative review.
So I'm just changing the label here
and using some of the APIs in the data sets library
to get this done really quickly, right, so you
can see after a few steps,
this is what my data set looks like. The text and
a label that says zero or one. And text and labels are exactly the feature names that DistilBERT expects, which is why I renamed them.
Then I'm tokenizing that text,
turning words into integer tokens,
and finally uploading the training set and the validation set to S3.
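Put together, those preprocessing steps look roughly like this sketch with the datasets library; the dataset id and column names are assumptions, not the exact ones from the notebook:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Hypothetical dataset id and column names, for illustration only.
dataset = load_dataset("some-org/english-product-reviews", split="train")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Map 4-5 stars to positive (1), everything below to negative (0).
dataset = dataset.map(lambda e: {"label": 1 if e["stars"] >= 4 else 0})

# Keep the English text and the binary label, with the names the model script expects.
dataset = dataset.rename_column("review_body", "text")
dataset = dataset.remove_columns([c for c in dataset.column_names if c not in ("text", "label")])

# Turn words into integer tokens.
dataset = dataset.map(lambda e: tokenizer(e["text"], truncation=True), batched=True)

splits = dataset.train_test_split(test_size=0.1)
# The train/validation splits would then be uploaded to S3 (the exact call depends on your setup).
```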
Okay, so by now I've got a data set ready to go in S3, and I can actually run my Hugging Face code. Okay,
so I've got a training script. You can see it here.
It's vanilla transformers code.
I could actually run this script on my local machine,
passing the appropriate hyperparameters or command-line arguments, et cetera. This is a SageMaker feature called script mode, which is
really handy because you can write the code locally on your machine,
test it, and then you can move it as is to SageMaker. Okay? So if you're not familiar with this, just look it up: script mode in SageMaker. Okay,
and then I'm loading the data sets inside
the script from the training and validation locations in
S3. And then
using the trainer API, I'm setting up the training arguments.
So where's the data, how many epochs to train for,
where to log learning rate, et cetera.
And then the Trainer object is where I put everything together,
the model, the arguments, and the location of the
data sets. And then I call train to
fine tune the model. I call evaluate to compute
the validation metrics, and then I save the
model and I'm done. Okay, so that code runs inside a Hugging Face container on SageMaker managed infrastructure.
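For reference, a script of that kind typically looks like the sketch below; the SM_* environment variables are the standard SageMaker channel variables, and the hyperparameters are illustrative rather than the exact ones from the repo:

```python
import argparse
import os

from datasets import load_from_disk
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int, default=1)
parser.add_argument("--model_name", type=str, default="distilbert-base-uncased")
args, _ = parser.parse_known_args()

# SageMaker script mode exposes the S3 channels as local directories via these variables.
train_dataset = load_from_disk(os.environ["SM_CHANNEL_TRAIN"])
eval_dataset = load_from_disk(os.environ["SM_CHANNEL_TEST"])

model = AutoModelForSequenceClassification.from_pretrained(args.model_name, num_labels=2)

training_args = TrainingArguments(
    output_dir="/opt/ml/checkpoints",
    num_train_epochs=args.epochs,
    per_device_train_batch_size=32,
    evaluation_strategy="epoch",
)

trainer = Trainer(model=model, args=training_args,
                  train_dataset=train_dataset, eval_dataset=eval_dataset)
trainer.train()
print(trainer.evaluate())

# SM_MODEL_DIR is where SageMaker picks up the final model artifact.
trainer.save_model(os.environ["SM_MODEL_DIR"])
```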
Okay, so I just set those
hyperparameters one epoch, batch size, name of the model,
and then I use this really central object in the SageMaker SDK, which is called the estimator. And here I'm using obviously the Hugging Face estimator, passing my script, passing versions of Transformers and PyTorch, and the infrastructure that I want here. So I'm running on a p3.2xlarge instance, which is a single GPU instance,
and that's all I have to do, right? Then I call fit on this estimator, passing the location of the training and validation sets. The training starts automatically, the instance starts, code is downloaded, data is downloaded, and then it trains,
okay? And after a little while, training is complete.
And then in one line of code I can just deploy my model.
And here I'm deploying on an m5.xlarge, so a CPU instance,
okay, so after a few minutes, the endpoint is up, I can test it, as you can see here, and when I'm done, I can delete it, right? And then it's gone.
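Here is a hedged sketch of that notebook flow with the Hugging Face estimator from the SageMaker SDK; the bucket paths, framework versions and role are placeholders:

```python
import sagemaker
from sagemaker.huggingface import HuggingFace

role = sagemaker.get_execution_role()  # assumes you're running inside SageMaker

estimator = HuggingFace(
    entry_point="train.py",              # the script-mode training script
    instance_type="ml.p3.2xlarge",       # single-GPU training instance
    instance_count=1,
    role=role,
    transformers_version="4.26",         # illustrative framework versions
    pytorch_version="1.13",
    py_version="py39",
    hyperparameters={"epochs": 1, "model_name": "distilbert-base-uncased"},
)

# Training runs on managed infrastructure; channels point at the S3 locations prepared earlier.
estimator.fit({"train": "s3://my-bucket/train", "test": "s3://my-bucket/test"})

# Deploy the trained model behind a real-time endpoint on a CPU instance.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
print(predictor.predict({"inputs": "This product is fantastic!"}))
predictor.delete_endpoint()
```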
And if I want to redeploy the model, assuming that I pushed
the model to the Hugging Face hub, I can do
this very easily, right? So I can just refer
to the model on the hub,
create this Hugging Face model object with the SageMaker SDK,
and call deploy again, right? And then my endpoint
is up again and I can predict again, right?
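That redeployment step looks roughly like this sketch; the model id, task and versions are illustrative:

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# HF_MODEL_ID / HF_TASK tell the inference container which hub model and task to serve.
hub_model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",  # any supported hub model
        "HF_TASK": "text-classification",
    },
    role=role,
    transformers_version="4.26",  # illustrative versions
    pytorch_version="1.13",
    py_version="py39",
)

predictor = hub_model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
print(predictor.predict({"inputs": "I love this!"}))
predictor.delete_endpoint()
```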
So you could even deploy straight from
the hub any of the models that are there, right. For the
supported task types, that works. So that's a
super simple way to deploy models on AWS. If you don't want to
manage any infrastructure, and if you want to fine tune the model,
then you can run an example like this one. Just fine tune,
deploy, predict, take the endpoint down, redeploy, et cetera, et cetera. Super simple. Okay, again, I went a little
faster, but go and check out the repo, run the example.
It's very straightforward. All right, well, I think it's
time to wrap up. So the key takeaways here are
that ML tends to be complicated because
we love to make it complicated. Right. We love to build complex
solutions when they're not really needed, and we're
all guilty of this, myself included. So let's
focus on the right things and keep machine learning simple. So first, find a
pretrained model that fits the task and the business problem we're
trying to solve. Identify a business KPI
that will show success. Machine learning KPIs are nice.
You need them, but metrics will only go so far.
You need to have some kind of business KPI that tells
you: yes, this predictive application works, and it's actually performing better than whatever we had before. That's really important, and your business stakeholders
will want to see that anyway. You can measure the model on
real life data, so go and grab whatever data you have. It shouldn't be
too clean, it shouldn't be too neat. Sandbox data
test sets. They always look nice, they always perform
in a pleasant way, but that's not what you're going to get in real life.
So run your real life data on the model, see what happens there.
If accuracy or whatever metric you're interested in is good enough,
then fine, you're done. Move on to deployment,
and that's it. That's the end of the project. If you need to fine tune, because maybe you have an NLP
application and you have very domain specific vocabulary
that doesn't work well enough with the pretrained model, then go and fine tune. You've seen how to do
it. It's not complicated.
And once you have the accuracy that you like, then you can deploy
the model. And for many workloads, you need to pay attention
to prediction latency. So make sure you have some form
of hardware acceleration. Either you use the inference API or
you use ML hardware, or you have your own solution with Optimum, maybe, but you
probably cannot ignore that optimization task.
And once you have the latency that you're good with, then you're done and you can move on to the next
project. Tools,
libraries, machine learning platforms and infrastructure,
I think they're all there, right? So I don't think it's needed
that you go and reinvent that stuff and spend months,
sometimes more, rebuilding stuff that's just readily
available. And again, we love to build stuff. We love
to say that, oh, it's different here. And no, we can't use off the
shelf stuff. But seriously, that usually doesn't hold.
So focus on the business problem. Focus on creating
value for customers and
users and just go straight to
the result, which is, hey, I'm going to use whatever's available now. I'm going to
find models, fine tune them and deploy. And if you do that, you can be
in production in a matter of, I'm not going to say days.
That would be boasting, even though I know some folks who do that.
But in a matter of weeks, you have a production ready solution out there,
right? And it won't take again months or years to solve
that problem, which is great.
So if you want to get started, if you're completely new to transformers,
I recommend you join our community at huggingface.co. You can sign up in minutes. It's totally free. All you need is a username and an email. So super simple. If you want to learn,
I recommend following the Hugging Face
course, which again is completely free. You don't need to be
a machine learning expert at all. It's really targeted at developers.
You can ask questions in the forums. The team will be happy
to help. And for companies out there who have
strong business use cases and ongoing projects
and need help with transformers, they should
take a look at what we call the expert acceleration program, which basically
is advanced consulting that we provide end to end on
your projects. From modeling all the way to production concerns.
And for companies who have very strong privacy security
concerns, who cannot run on the public cloud or
on multitenant platforms, we can also do
private deployments. So we can deploy the Hugging Face hub
with models and data sets and the tools that you've heard about today on your
own infra. Okay? So talk to us and we can
see how to do that. Nice and easy, right?
Thank you very much, in every language out there. Now we know how to do translation. If you
have questions, if I can help you with projects,
if you need anything from Hugging Face, you can contact me at this email address and you'll find more content
on Twitter, medium, YouTube, et cetera. Okay,
hope this was useful. Hope you had a good time too. And thanks
again for listening to me today. And I hope to see you maybe on
the road at some point. All right, have a great day.