Transcript
Hi everybody, my name is Julien, I'm chief evangelist for Hugging Face. In this presentation, I would like to
introduce you to building natural language processing applications
with transformers. A few years ago,
deep learning exploded onto the stage.
And this was based on an alignment of planets, so to speak. So the first planet was the resurrection of neural networks, a pretty old technology, but brought back and applied
to computer vision, natural language processing
and generally working with unstructured data.
And that proved to be very efficient.
What made it possible also for companies to use
deep learning was the availability of a few open
data sets. As we know, deep learning is very data hungry. You need a
lot of data to train deep learning models. And obviously having
those freely available data sets like ImageNet
for computer vision was a big boost.
Compute is critical when working with deep learning. And GPUs became available and applicable to other things than 3D games, and they became available
on the cloud as well. So all of a sudden it was easier
to grab the compute power that you needed and apply
it to deep learning problems. And putting everything together, a collection of tools, open source tools mostly, also became available. Libraries like Theano and Torch, and later TensorFlow, became available for experts mostly, because if you remember those
days, it was not so easy to build your
models and train them, et cetera. So you still
needed to know quite a bit about machine learning.
And developers and generally people
without a machine learning background found it difficult to
actually train models. But still, this was all a very nice step
forward. And so a few years later,
this is what a typical machine learning and deep
learning project looks like. And although we like to
pretend it's very agile and it looks like a flywheel,
again, let's be honest, it's really a waterfall project where
a lot of time is spent preparing data, cleaning data,
and that's going to be at least 50%, maybe up to 80%
of the time you spend on your project, and then
you move on to training and evaluating results,
training again and again, managing infrastructure in the
process, and then eventually deploying your model
in production, which is typically the hardest part, because now
you have to live with that model in production, monitor it, et cetera, et cetera,
scale it. Not an easy thing.
So quite a few hurdles to clear. And unfortunately,
the inevitable happened, which is that a lot
of companies, a lot of organizations found it really
difficult to actually deliver deep learning projects in production.
And you can see those numbers. Over 80% of data
science projects actually don't make it into production,
which is really a shame. A POC is nice,
but business value means you have to deploy
in prod, and very few companies manage to
do that. Again, only a fraction
of companies today actually say that they get business
value and adoption on deep learning.
So that's a shame because it's really cool technology, but it's
still very challenging to work with, so something a
little different is needed. And this is what I call deep learning 2.0.
Hopefully it's not deep learning 1.1, but I guess we'll find out.
And so we see similar planets
aligning, except the technologies have evolved.
So neural networks and neural network
architectures like CNNs and LSTMs are
actually being replaced by a new type of neural architecture
called transformers. And I'm sure you've heard about BERT, released by Google a few years ago, in 2018 to be precise.
Well, this is pretty much the birth of transformers,
and now transformers are evolving and we'll
see some examples. Instead of building data sets,
practitioners now rely more and more on this technique called transfer learning, which we'll discuss in a little more detail. In a nutshell, transfer learning means starting from pretrained models and applying the knowledge, so to speak, that those models
have learned to your own business problem, potentially training
a little bit in the process. But that's a much simpler thing than
just building a huge data set from scratch.
GPUs, of course, are still around, but now
we see some companies building machine learning hardware.
So chips for prediction and training that
have been built for machine learning from the ground up.
And as you can guess, these deliver quite a few benefits.
And now tools have become friendlier. You don't
need to be an expert to get good results.
If you're a developer, an application developer, back end developer,
you can train and deploy these models
in a much easier way than before
without having to know all the nitty gritty details
that used to come with deep learning. So definitely the learning curve is much flatter now.
So let's look at all those four new planets. So Transformers
is both a new model architecture, as I mentioned, and also an open source library that Hugging Face, my company, is the steward for, with the help of the community, obviously. And in fact, it's one of the fastest growing open source projects in history. You can see the GitHub stars on this slide. Hugging Face Transformers is actually the yellow line, and you can see it
has the steepest slope,
which we're really proud of. And it's pretty funny to
see that we're actually growing faster than very
popular projects like Kubernetes or Node
or PyTorch. So we're really grateful. We see
a ton of adoption from the community. And it's not
just the community. We also see analysts
and generally the IT community acknowledging
that transformers are becoming a thing.
So transformers are not just for NLP. They started for
NLP, but now they're expanding into computer vision,
speech and audio and reinforcement learning and all kinds
of areas. And the Kaggle
report shows, as mentioned, that traditional deep learning
architectures, if there is such a thing, like RNNs and CNNs, are actually less and less popular, while transformers are more and more popular. So all these point at the fact that transformers are really rising and are becoming the next standard way for a lot of machine learning problems.
And just to give you some numbers, on our website, huggingface.co, we see about 1 million model downloads every day. That's a good number and it's rising. So transformers are the next big thing, we think. Transfer learning is the second planet. So transfer learning again means instead
of training from scratch on a huge data set
that was very painful to build and clean,
you start from a pretrained model that
matches the business problem you're trying to solve. And you can see
on this slide the list of task types that are available today
on the hugging face hub. So as mentioned, lots of NLP,
but also computer vision, audio and some newer task
types. So you find something here that
matches your business problem, you go and select
a few models for this. They've been pretrained on
a very, very large data set, I think Wikipedia
or even bigger, billions and billions of words,
millions and millions of images, and you can test it in
a few seconds. I'll show you how on the next slide. So you can very
quickly run some tests and figure out does this model work
for me out of the box? And a lot of times
it will. So for example, if you need to extract, let's say, organizations, or you need to do sentiment analysis,
most of the time it's going to work out of the box and it's going
to be just fine, right? So that's it.
You're done. You can take the model and move on
to deploying it. So that was fast.
Now sometimes you will need to fine tune the model.
So you will need to specialize the model on your data.
And that's the transfer learning part.
Okay, you're going to say, well, now I'm training
again, right? So how is that simpler? Well, it is simpler because
a, you need just a little bit of data,
right? It's one or two orders of magnitude
less data than training from scratch. And so
that's going to be faster to build, faster to train, less expensive. And you
need just a few lines of code, thanks to
the transformers library. We'll see an example in a minute.
Okay. Transfer learning is much, much faster than training from
scratch because you don't have to build that huge data set,
basically. So here's an example of
working with the Hugging Face Transformers library, using the high level object
called pipeline. And you can see in one line of code,
I can build a model for
translation. And it's a multi language model
in this case. So you can see the first token is actually the name of the target language, starting from English.
So here I'm translating from English to Hungarian.
And all it takes is that one line of code here and I can
see the result. And then I can build a second pipeline to classify
the tokens in my translation. Again, I'm using
an off the shelf model. So this one is built for token classification in Hungarian. Okay. I did not train
anything here. So that shows you the depth of models that
we have on the hugging face hub. And again, I can
predict and you can see the results. Right. So dates,
persons, ordinals, and GPE means geopolitical
entity. So it's a country name in this case.
So five lines of code. And I'm doing entity extraction with translation from English to Hungarian. Right. So that's pretty cool. That's not going
to take a lot of time to try and not a lot of time to
deploy either. Okay, so pretty nice.
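For reference, those few lines look roughly like the sketch below. The checkpoint names are assumptions, since the exact models from the slide aren't named in the talk, and models in the illustrative multilingual family expect a target-language token at the start of the input:

```python
from transformers import pipeline

# Assumed multilingual translation checkpoint (not necessarily the one from the demo).
# Models in this family expect a target-language token, e.g. ">>hun<<" for Hungarian.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-mul")
hungarian = translator(">>hun<< My name is Julien and I live in Paris.")[0]["translation_text"]
print(hungarian)

# Token classification (NER) pipeline for Hungarian; the model id is a placeholder,
# pick any Hungarian token-classification model from the hub.
ner = pipeline("token-classification",
               model="some-org/hungarian-ner",
               aggregation_strategy="simple")
print(ner(hungarian))
```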
Let me show you a more complex demo. So here
I'm still using off the shelf models, no training involved.
And I'm going to do voice queries on financial documents.
Okay, so the two models I'm using are: first,
a speech to text model with built in translation.
This is a very cool model from Facebook. So I'm
going to record a sentence in French.
It's going to be translated to English, and that
text query is going to be used by the second model
to run semantic search.
Right, trying to match the closest sentences in that document corpus, which is built from SEC filings, annual reports from large American companies. Okay, so here's my app.
Let's give it a try. Okay, so I'm going to record something here in French
and we're going to run the query and then I'll show you
the code real quick. Okay, so let's try this
"Qui est le CFO de Gap ?" Okay, so I have my clip now: "Qui est le CFO de Gap ?" Okay. And now if I click on submit here again, this speech is going to be turned into text and translated, and we're using it to run the query. All right,
so we can see the clock ticking. This should
take a few seconds. And if I scroll down,
I can see. So I can see what I actually
said, which is, who's the CFO at gap? And I can see the top
matching documents here, which obviously are the annual reports for Gap. Right? And we see the top matching
sentences in decreasing order. Okay.
And that ran for just a few seconds. Right. So this is actually public.
You can try it for yourself and have fun with it.
Let me show you what it entails. So,
a Space is a git repo where I store code, and that code is automatically run in a Docker container. So if we look at the app here,
we can see it's about 100 lines of code,
right? And half of that is really for the user
interface. So what I'm doing here is I'm
loading my models. I'm loading my document
corpus, which I processed
for semantic search using that sentence transformers
model. And then basically, I just grab the WAV speech and do speech to text and translation on it. And then I run my semantic search on the text. Right? And that's all there is to it, as you can see.
Process the speech and find sentences based on
the text. Nothing hidden and no
training whatsoever. So that's
a pretty cool app. Imagine what you would have to do to build everything yourself. It would definitely take a little more than 100 lines of code. Okay.
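For illustration, the core of that kind of app looks roughly like the sketch below. The model ids are assumptions rather than the exact ones from the Space; the demo itself uses a Facebook speech-to-text model with built-in translation and a sentence-transformers model:

```python
from transformers import pipeline
from sentence_transformers import SentenceTransformer, util

# Assumed stand-ins for the demo's models; ids are illustrative only.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Document corpus built offline from the SEC filings, one embedding per sentence.
corpus = ["Sentence one from an annual report.", "Sentence two from another filing."]
corpus_embeddings = embedder.encode(corpus, convert_to_tensor=True)

def query(audio_path):
    # Speech in French -> English text (Whisper can translate via generate_kwargs).
    text = asr(audio_path, generate_kwargs={"task": "translate"})["text"]
    query_embedding = embedder.encode(text, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)[0]
    return text, [(corpus[h["corpus_id"]], h["score"]) for h in hits]
```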
All right, let's keep exploring transformers. So, the next planet that's
aligning is machine learning hardware.
So, so far, we've mostly relied on GPUs for
training, and they're still very nice. But I
guess it's good to have more options. And we see companies like
Habana, Graphcore, Intel, Qualcomm, AWS, and a few more building
specialized chips for training or
inference. And in fact, accelerating both makes
sense, because if you accelerate training, you can obviously
iterate quicker, right? During the same day, you can
run your series of training jobs. Instead of
having to wait for 12 hours or 24 hours, you can
make decisions quicker and converge
quicker to a great model that creates
business value. Accelerating inference, obviously,
is critical for low latency applications like conversational apps
or search. But generally, everybody wants to go
fast. And of course, if you can predict faster,
you increase throughput, you decrease latency, and you
can just predict more with the same amount of infrastructure.
So your cost performance ratio will be quite a bit nicer as well. So we
at hugging face are partnering with those companies,
and we actually have a dedicated library, which you can find on GitHub, called Optimum, which makes it really easy to
work with those chips. You can start
from your vanilla Hugging Face code. Generally it's going to use the Trainer API, which is the high level API to fine tune models.
Again, very little code and you can just
replace a few objects with the hardware specific objects and accelerate training or
accelerate inference. So that's pretty cool because no one wants to rewrite
everything. Go and take a look at the optimum repo. You'll find
some code samples and we also have getting started posts for all those chips on our blog.
Okay.
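To give a rough idea of that object swap, here is a sketch for one target, Habana Gaudi with Optimum Habana; the class names come from that flavor of Optimum, and the model, datasets and configuration name are illustrative, not taken from the talk:

```python
# Sketch, assuming Optimum Habana; other hardware targets have similar drop-in classes.
from optimum.habana import GaudiTrainer, GaudiTrainingArguments  # instead of Trainer / TrainingArguments
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

training_args = GaudiTrainingArguments(
    output_dir="./out",
    use_habana=True,            # run on the Gaudi accelerator
    use_lazy_mode=True,
    gaudi_config_name="Habana/distilbert-base-uncased",  # illustrative Gaudi configuration
    num_train_epochs=1,
)

trainer = GaudiTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # assumed to be prepared elsewhere, as with the regular Trainer
    eval_dataset=eval_dataset,
)
trainer.train()
```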
And the last planet is basically
putting everything together with developer tools, right?
Don't get me wrong, we still need experts for the really hard problems, but for a lot of
problems for a lot of projects, we think developers
can build it all by themselves, right? So we're trying to come up
with tools and solutions that are developer friendly and don't
require a lot of machine learning expertise,
if any. So again, as mentioned, start from the Hugging Face hub, huggingface.co.
You can go and look for data sets if you
need to start from scratch because you don't
have data for your problem, or maybe you want to augment the
data that you have with third party data. So we
have over 4000 data sets out there. So likely you'll find something that you can use. And then, as mentioned before, you can go and look for
the models that make sense for your task
type and your business problems. We have over 40,000. The number changes
every day. By the time you're watching this, it's going to be more than
40,000. And from then
on you can obviously test these models as
is, fine tune the models either on a
hugging face data set, on your own data, maybe both.
And you can do this in a number of ways. Of course you can run
this on your own servers, in your Jupyter notebooks, if you have on prem infrastructure. You can run it in AutoTrain, which is our AutoML service that lets you very easily train on tabular data and NLP data. And this is totally no code, right? You can just click in the UI or use the simple CLI; zero lines of code needed. And as mentioned before,
you can use Transformers, the Trainer API, you can use Optimum to accelerate training, et cetera.
Once you have a model that you like, you can, as mentioned,
very easily showcase it in Spaces, you just saw an example of that. And then you can deploy it; again, you can deploy it anywhere you like on your infrastructure.
You can deploy it on the inference API, which is our very own managed
API with hardware acceleration.
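As a quick illustration, querying a model behind the Inference API is just an HTTP call; the model id and token below are placeholders:

```python
import requests

# Placeholder model id and access token; substitute your own.
API_URL = "https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english"
headers = {"Authorization": "Bearer hf_xxx"}

response = requests.post(API_URL, headers=headers,
                         json={"inputs": "I really enjoyed this presentation."})
print(response.json())
```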
And you can still use Optimum if you'd like to optimize for your own underlying platform. Okay, and then you have a model in prod. The last thing I want
to mention is we have a deep engineering partnership with
AWS. We collaborate at the product
level, at the engineering level on Amazon SageMaker, which, if you're not familiar with it, is the managed machine learning service at AWS, and we make it pretty easy to train and deploy your Hugging Face code on SageMaker using managed infrastructure.
Okay? So either way, whether you want to go on prem, or on EC2, or on other virtual machine services, or on SageMaker, we think we have a solution and we think we can help you fly
through that development cycle much faster than before.
Okay, so let me quickly show you how to do this on
SageMaker. In the interest of time, I won't go through all the details,
I'll just show you the highlights, but you can find the URL
to this repo in my slides and replay everything.
Okay, so what I'm doing here is I'm
fine-tuning a DistilBERT model. DistilBERT is a condensed, smaller version of BERT. I'm fine tuning this model on
a product review data set that I found on the hub. And you can see
the URL to this data set here.
Okay, so installing some dependencies,
downloading the data set, and in fact, this data set has English reviews and a Thai language translation with a flag saying whether the Thai translation is correct. So I'll just ignore the Thai part, I'll just keep the English part and the star rating. Okay,
so here I'm just simplifying the problem by
mapping sentiment to positive or
negative. So anything that's four and five stars is a positive
review. Anything lower than four is a negative review.
So I'm just changing the label here
and using some of the APIs in the data sets library
to get this done really quickly, right, so you
can see after a few steps,
this is what my data set looks like. The text and
a label that says zero or one. And text and labels are exactly the feature names that DistilBERT expects, which is why I renamed them.
Then I'm tokenizing that text,
turning words into integer tokens,
and finally uploading the training set and the validation set to S3.
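Put together, those preprocessing steps look roughly like this sketch with the datasets library; the dataset id and column names are assumptions, not the exact ones from the notebook:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Hypothetical dataset id and column names, for illustration only.
dataset = load_dataset("some-org/english-product-reviews", split="train")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Map 4-5 stars to positive (1), everything below to negative (0).
dataset = dataset.map(lambda e: {"label": 1 if e["stars"] >= 4 else 0})

# Keep the English text and the binary label, with the names the model script expects.
dataset = dataset.rename_column("review_body", "text")
dataset = dataset.remove_columns([c for c in dataset.column_names if c not in ("text", "label")])

# Turn words into integer tokens.
dataset = dataset.map(lambda e: tokenizer(e["text"], truncation=True), batched=True)

splits = dataset.train_test_split(test_size=0.1)
# The train/validation splits would then be uploaded to S3 (the exact call depends on your setup).
```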
Okay, so by now I've got a data set ready to go in S3, and I can actually run my Hugging Face code. Okay,
so I've got a training script. You can see it here.
It's vanilla transformers code.
I could actually run this script on my local machine,
passing the appropriate hyperparameters or command-line arguments, et cetera. This is a SageMaker feature called script mode, which is
really handy because you can write the code locally on your machine,
test it, and then you can move it as is to SageMaker. Okay? So if you're not familiar with this, just look it up: script mode in SageMaker. Okay,
and then I'm loading the data sets inside
the script from the training and validation locations in
S3. And then
using the trainer API, I'm setting up the training arguments.
So where's the data, how many epochs to train for,
where to log learning rate, et cetera.
And then the Trainer object is where I put everything together,
the model, the arguments, and the location of the
data sets. And then I call train to
fine tune the model. I call evaluate to compute
the validation metrics, and then I save the
model and I'm done. Okay, so that code runs inside a Hugging Face container on SageMaker managed infrastructure.
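For reference, a script of that kind typically looks like the sketch below; the SM_* environment variables are the standard SageMaker channel variables, and the hyperparameters are illustrative rather than the exact ones from the repo:

```python
import argparse
import os

from datasets import load_from_disk
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int, default=1)
parser.add_argument("--model_name", type=str, default="distilbert-base-uncased")
args, _ = parser.parse_known_args()

# SageMaker script mode exposes the S3 channels as local directories via these variables.
train_dataset = load_from_disk(os.environ["SM_CHANNEL_TRAIN"])
eval_dataset = load_from_disk(os.environ["SM_CHANNEL_TEST"])

model = AutoModelForSequenceClassification.from_pretrained(args.model_name, num_labels=2)

training_args = TrainingArguments(
    output_dir="/opt/ml/checkpoints",
    num_train_epochs=args.epochs,
    per_device_train_batch_size=32,
    evaluation_strategy="epoch",
)

trainer = Trainer(model=model, args=training_args,
                  train_dataset=train_dataset, eval_dataset=eval_dataset)
trainer.train()
print(trainer.evaluate())

# SM_MODEL_DIR is where SageMaker picks up the final model artifact.
trainer.save_model(os.environ["SM_MODEL_DIR"])
```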
Okay, so I just set those
hyperparameters one epoch, batch size, name of the model,
and then I use this really central object in the SageMaker SDK, which is called the estimator. And here I'm using obviously the Hugging Face estimator, passing my script, passing versions of Transformers and PyTorch, and the infrastructure that I want here. So I'm running on a p3.2xlarge instance, which is a single GPU instance,
and that's all I have to do, right? Then I call fit on this estimator, passing the location of the training and validation sets. The training starts automatically, the instance starts, code is downloaded, data is downloaded, and then it trains,
okay? And after a little while, training is complete.
And then in one line of code I can just deploy my model.
And here I'm deploying on an m5.xlarge, so a CPU instance,
okay, so after a few minutes, the endpoint is up, I can test it, as you can see here, and when I'm done, I can delete it, right? And then it's gone.
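Here is a hedged sketch of that notebook flow with the Hugging Face estimator from the SageMaker SDK; the bucket paths, framework versions and role are placeholders:

```python
import sagemaker
from sagemaker.huggingface import HuggingFace

role = sagemaker.get_execution_role()  # assumes you're running inside SageMaker

estimator = HuggingFace(
    entry_point="train.py",              # the script-mode training script
    instance_type="ml.p3.2xlarge",       # single-GPU training instance
    instance_count=1,
    role=role,
    transformers_version="4.26",         # illustrative framework versions
    pytorch_version="1.13",
    py_version="py39",
    hyperparameters={"epochs": 1, "model_name": "distilbert-base-uncased"},
)

# Training runs on managed infrastructure; channels point at the S3 locations prepared earlier.
estimator.fit({"train": "s3://my-bucket/train", "test": "s3://my-bucket/test"})

# Deploy the trained model behind a real-time endpoint on a CPU instance.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
print(predictor.predict({"inputs": "This product is fantastic!"}))
predictor.delete_endpoint()
```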
And if I want to redeploy the model, assuming that I pushed
the model to the Hugging Face hub, I can do
this very easily, right? So I can just refer
to the model on the hub,
create this Hugging Face model object with the SageMaker SDK,
and call deploy again, right? And then my endpoint
is up again and I can predict again, right?
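That redeployment step looks roughly like this sketch; the model id, task and versions are illustrative:

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# HF_MODEL_ID / HF_TASK tell the inference container which hub model and task to serve.
hub_model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",  # any supported hub model
        "HF_TASK": "text-classification",
    },
    role=role,
    transformers_version="4.26",  # illustrative versions
    pytorch_version="1.13",
    py_version="py39",
)

predictor = hub_model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
print(predictor.predict({"inputs": "I love this!"}))
predictor.delete_endpoint()
```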
So you could even deploy straight from
the hub any of the models that are there, right. For the
supported task types, that works. So that's a
super simple way to deploy models on AWS. If you don't want to
manage any infrastructure, and if you want to fine tune the model,
then you can run an example like this one. Just fine tune,
deploy, predict, take the endpoint down, redeploy, et cetera, et cetera. Super simple. Okay, again, I went a little
faster, but go and check out the repo, run the example.
It's very straightforward. All right, well, I think it's
time to wrap up. So the key takeaways here are
that ML tends to be complicated because
we love to make it complicated. Right. We love to build complex
solutions when they're not really needed, and we're
all guilty of this, myself included. So let's
focus on the right things and keep machine learning simple. So first, find a
pretrained model that fits the task and the business problem we're
trying to solve. Identify a business KPI
that will show success. Machine learning KPIs are nice.
You need them, but metrics will only go so far.
You need to have some kind of business KPI that tells
you: yes, this predictive application works, and it's actually performing better than whatever we had before. That's really important, and your business stakeholders
will want to see that anyway. You can measure the model on
real life data, so go and grab whatever data you have. It shouldn't be
too clean, it shouldn't be too neat. Sandbox data
test sets. They always look nice, they always perform
in a pleasant way, but that's not what you're going to get in real life.
So run your real life data on the model, see what happens there.
If accuracy or whatever metric you're interested in is good enough,
then fine, you're done. Move on to deployment,
and that's it. That's the end of the project. If you need to fine tune, because maybe you have an NLP
application and you have very domain specific vocabulary
that doesn't work well enough with the pretrained model, then go and fine tune. You've seen how to do
it. It's not complicated.
And once you have the accuracy that you like, then you can deploy
the model. And for many workloads, you need to pay attention
to prediction latency. So make sure you have some form
of hardware acceleration. Either you use the inference API or
you use ML hardware, or you have your own solution with Optimum, maybe, but you
probably cannot ignore that optimization task.
And once you have the latency that you're good with, then you're done and you can move on to the next
project. Tools,
libraries, machine learning platforms and infrastructure,
I think they're all there, right? So I don't think it's needed
that you go and reinvent that stuff and spend months,
sometimes more, rebuilding stuff that's just readily
available. And again, we love to build stuff. We love
to say that, oh, it's different here. And no, we can't use off the
shelf stuff. But seriously, that usually doesn't hold.
So focus on the business problem. Focus on creating
value for customers and
users and just go straight to
the result, which is, hey, I'm going to use whatever's available now. I'm going to
find models, fine tune them and deploy. And if you do that, you can be
in production in a matter of, I'm not going to say days.
That would be boasting, even though I know some folks who do that.
But in a matter of weeks, you have a production ready solution out there,
right? And it won't take again months or years to solve
that problem, which is great.
So if you want to get started, if you're completely new to transformers,
I recommend you join our community at huggingface.co. You can sign up in minutes. It's totally free. All you need is a username and an email. So super simple. If you want to learn,
I recommend following the Hugging Face
course, which again is completely free. You don't need to be
a machine learning expert at all. It's really targeted at developers.
You can ask questions in the forums. The team will be happy
to help. And for companies out there who have
strong business use cases and ongoing projects
and need help with transformers, they should
take a look at what we call the expert acceleration program, which basically
is advanced consulting that we provide end to end on
your projects. From modeling all the way to production concerns.
And for companies who have very strong privacy security
concerns, who cannot run on the public cloud or
on multitenant platforms, we can also do
private deployments. So we can deploy the Hugging Face hub
with models and data sets and the tools that you've heard about today on your
own infra. Okay? So talk to us and we can
see how to do that. Nice and easy, right?
Thank you very much, in every language out there. Now we know how to do translation. If you
have questions, if I can help you with projects,
if you need anything from Hugging Face, you can contact me at this email address and you'll find more content
on Twitter, medium, YouTube, et cetera. Okay,
hope this was useful. Hope you had a good time too. And thanks
again for listening to me today. And I hope to see you maybe on
the road at some point. All right, have a great day.