Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello all. Thank you for joining the presentation.
I'm Deepak, and I work as an associate director for data science and machine learning here at Novartis. I'm also responsible for generative AI product deliverables and for building novel machine learning algorithms and deploying them into production. Today I'm going to talk about developing a machine learning model and productionizing the ML algorithm in a cloud environment. In the recent era, data scientists spend most of their time creating and developing machine learning models, whether that is linear regression, logistic regression, naive Bayes, random forests, or decision trees. By now, the era has moved from traditional or classical machine learning algorithms to large language models. So deploying, or productionizing, large language models in the cloud is the biggest challenge we have. And it is not only productionizing: how we scale for high inference load is another important aspect to consider.
All right, let's move to the next slide. Before getting into the development of algorithms, let's first understand the development lifecycle of any machine learning algorithm.
Now, first we have to clearly define the problem statement we are trying to solve. It could be image classification, text classification, language translation, question answering, predicting the next sentence, or document or text summarization; there are many tasks a problem statement can involve. So before starting any machine learning algorithm development, we define the problem statement. Once we have defined it, we start collecting data, which is called data collection or data acquisition and gathering.
Typically, in a domain-based setting, when we are working in an organization specific to healthcare, infrastructure, or financial and investment banking, there are many scenarios where we could get real data for model training. But when we are doing research, we go for open-source data sets. Either way, we have to collect the data set and understand the data. As part of understanding the data, we may have to apply some data preprocessing techniques, which we'll see later. After that, we have to decide which performance metrics we are going to define to measure the objective we set.
Let's take a text classification problem here. The classification result can be measured with performance metrics such as recall, precision, and accuracy, all of which come from the confusion matrix. So we have to define all the performance metrics before performing any model training activity. Further, we have to define the evaluation procedure. For a model training process, we divide the data set into training, testing, and validation splits, typically around 80/20, 60/20/20, or 70/15/15 (these numbers are percentages). We split the data into train, test, and validation, and we check whether the model performs well on the validation and test data. Again, the metrics we use are based on the confusion matrix.
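As a rough illustration of what this split and these metrics look like in code, here is a minimal sketch using scikit-learn; the tiny texts and labels lists, and the faked predictions, are placeholders for a real data set and a real model, not the code from the presentation.

```python
# Toy illustration of splitting data and computing confusion-matrix metrics.
# The tiny `texts`/`labels` lists are placeholders for a real data set.
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

texts = ["great product", "terrible service", "loved it", "awful", "fine", "bad"] * 10
labels = [1, 0, 1, 0, 1, 0] * 10

# ~60/20/20 split: hold out 20% for test, then 25% of the remainder for validation.
X_tmp, X_test, y_tmp, y_test = train_test_split(texts, labels, test_size=0.20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=42)

# Suppose `y_pred` holds the model's predictions on the test split;
# here they are faked just to show the metric calls.
y_pred = y_test
print(accuracy_score(y_test, y_pred), precision_score(y_test, y_pred), recall_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```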
Next, coming to data preprocessing and cleaning. Once we have the data, in the case of text, what process do we follow? For natural language processing, we remove non-ASCII and special characters and do stop-word removal, followed by stemming and lemmatization. If required, we may go for part-of-speech tagging or some kind of embeddings, which may be needed before building a model.
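To make those preprocessing steps concrete, here is a minimal sketch using NLTK; the exact cleaning rules and library choices are illustrative assumptions, not the pipeline used in the presentation.

```python
# Minimal text-cleaning sketch: strip special characters, remove stop words,
# then show stemming and lemmatization. Assumes the NLTK data has been downloaded.
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("stopwords")
nltk.download("wordnet")  # one-time setup

stop_words = set(stopwords.words("english"))
stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()

def preprocess(text):
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text).lower()        # drop special characters
    tokens = [t for t in text.split() if t not in stop_words]  # stop-word removal
    stems = [stemmer.stem(t) for t in tokens]                   # stemming
    lemmas = [lemmatizer.lemmatize(t) for t in tokens]          # lemmatization
    return tokens, stems, lemmas

print(preprocess("This is an amazing model, isn't it?"))
```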
Now, once we come to the construction of a baseline model: when we are using models like random forest, Bayesian networks, generative adversarial networks, AdaBoost, or XGBoost, we don't really need to think about a baseline model, because those are already base models where we use a data set to train further, and we do prediction or classification with that algorithm. But in the approach we are going to talk about, the baseline model is a bit different. Once I walk you through the slides, you will understand what I mean by the typical base models. Then we have to fine-tune a model. Fine-tuning a model comes with hyperparameter tuning and related techniques, which I'll talk about later; ideally, we take a pretrained model and fine-tune it to perform a specific task. Then comes how we deploy and operationalize, or industrialize, the model in a cloud environment.
All right, so we have discussed the lifecycle of an ML algorithm. Let's move to the next slide. Before getting into the model training activity, I want to clearly define the problem. What we are going to solve here is a natural language processing task. As I said, natural language processing could be language translation, entity recognition, spam detection, part-of-speech tagging, text generation, document summarization, or question answering; there are many natural language processing tasks. But for this use case, or for this demo, I'm going to take you through text classification. Right, let's move on.
Now let's come to natural language processing as a concept. Humans understand English, or any other language they have known from birth. But when it comes to natural language processing, the machine has to understand the language, right? Just as humans interpret language and respond, we are building a machine learning or AI system to perform a specific job; that is the natural language understanding we give to the machine. Humans gain understanding by reading text; similarly, by building a model, we want it to behave like a human: it forms a natural language understanding and, based on that, determines the answer. The machine says, in effect, "This is the natural language understanding I got; now what response do I have to make, in human-readable format?" The output can be natural language generation, which could be text abstraction or text summarization, or we can do a natural language classification job, which is what I'm going to walk you through. Right now I have put a BERT model here, which you can see in the center of the picture, but I'll talk about it in detail over the next couple of slides. Ideally, we pass an input and ask the model to classify it; here it could be spam or ham, and the model performs that classification.
Let's move on to the next slide. Okay, Hugging Face. You would have heard that this is getting very popular. Hugging Face is a framework, or library, for solving most NLP problems. They have built around 40,000 models by now, all available as pretrained models, with a number of fine-tuned models also available. We are going to use the Hugging Face platform to perform our model training. As I said, Hugging Face is the most popular framework in use right now. It has around 40,000 models that can be deployed in the cloud, based on PyTorch or TensorFlow; even the Keras library is supported in Hugging Face. Ideally, we use transformer-based architecture models to develop our models. When you are talking about Hugging Face, as I said, there are thousands of pretrained models, and for each task they have separate models. When we want to perform text classification, they have BERT, RoBERTa, DistilBERT, and XLM-RoBERTa. Similarly, for language translation they have MarianMT, BART, and T5. For chatbots they have GPT and GPT-2, and by now we have GPT-3 and GPT-4 as well. When it comes to named entity recognition, again we can use the BERT model. Ideally, I'm going to talk more about the BERT model. The reason I've kept BERT here is that BERT stands for Bidirectional Encoder Representations from Transformers. It is based on the transformer architecture from the "Attention Is All You Need" paper. Once it came out in 2018 or 2019, it shook the industry, taking the whole of machine learning development to the next level.
Okay, so we are going to use the BERT model and fine-tune it. The BERT model comes with its own strengths: it is based on masked language modeling and next sentence prediction. If you want to know more about the BERT model, I have a separate video; please go and have a look at that. Coming to BERT now: BERT can perform multiple tasks, but as a general model, you do a downstream job to make it specific to a domain or to a task which has to be performed. So yes, BERT can perform text classification, text generation, next sentence prediction, and question answering; it can also behave like a chatbot. But how do we fine-tune the model? We take a pretrained model, fine-tune it on our data set, then deploy the model. We deploy the fine-tuned model in production in a cloud environment.
All right, let me walk you through it. As I said, we are going to take the text classification example. Our objective is to understand the sentiment. Let's say the input is "this is an amazing model"; then we are going to say whether it is positive or negative. That's what the classification job does. I'm taking a binary classification here, with the classes positive and negative.
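As a small illustration of that kind of binary sentiment classification, here is a sketch using the Hugging Face pipeline API; the public DistilBERT sentiment checkpoint shown is only a stand-in, not the model trained in this demo.

```python
# Minimal sketch of binary sentiment classification with a BERT-family model,
# using the Hugging Face pipeline API and a public fine-tuned checkpoint.
from transformers import pipeline

classifier = pipeline("text-classification",
                      model="distilbert-base-uncased-finetuned-sst-2-english")
print(classifier("This is an amazing model"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```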
So this is the example I'm going to take, and now I'm going to walk you through how we can perform model training. But before getting into model training, I want to tell you about MLflow. What is MLflow? MLflow is a platform, or an API library, which can be injected into your model development process to handle model tracking and model experiments. What do I mean by that? We can build many models and many iterations of models, because we have to fine-tune the model and change its parameters. Every time we change the parameters of the model, the model will have different outputs. What could the outputs be here? They could be precision, recall, accuracy, F1 score, or F2 score; there are many elements we consider as part of the model development activity.
There could be a scenario, in the case of text classification with positive and negative classes, where I care more about the positive class. If I never want to miss a positive, then I should not miss any false negatives: if the algorithm says it is negative but it is actually positive, I should not miss those cases, right? So if I must not miss any false negatives, I will focus more on recall. Similarly, if the algorithm says it is positive but it is actually negative, that comes under false positives, and that misses another crucial element. Now, to handle these kinds of scenarios, we need a tracking platform, which is MLflow, used to record and track all the experiments along with their results. I can also show you a quick sample of how the code looks with MLflow and without MLflow. Before that, this is how a model experiment looks.
When I talk about model experiments: let's say I'm going to train an algorithm, and I may train it n number of times. I want to know with which seed and which parameters my model really performed well. Considering that scenario, all the historical experiments I performed would be tracked in MLflow, where you can see each run along with its timestamp, the model I ran, and, if you go deeper, the features I configured, plus the accuracy, precision, and recall values and the confusion matrix metrics. This really helps when performing multiple iterations of experiments and keeping track of the models. All right, now coming
to the code. Typically, to train a model, we load the input data and extract some features; here I'm using n-grams to extract the features. Then I train the model and compute the accuracy. Now, which version of my code did this result come from? No idea. To answer that, we need MLflow tracking, which is used to track all the experiment results. Now let's see how the code looks with
MLflow. In a Pythonic way, we import the mlflow package (and its TensorFlow integration, mlflow.tensorflow), then we call mlflow.start_run() as a run and start to log the metrics, while we keep training our model and fine-tuning the parameters. In this way, everything gets stored in a tracking store; in a cloud environment, we typically configure it with an S3 bucket from AWS (Amazon Web Services). Once the setup is done, we add the implementation around mlflow.start_run() accordingly, iterate the model multiple times, and keep storing the experiment results.
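A minimal sketch of what that wrapping looks like is below; the `train_model` and `evaluate` functions are placeholders for your own training and evaluation code, and the parameter values are only examples.

```python
# Minimal sketch of wrapping one training iteration in an MLflow run so the
# parameters and metrics of every experiment are tracked.
import mlflow

def train_model(learning_rate, epochs):     # placeholder for the real training code
    return "model"

def evaluate(model):                        # placeholder for the real evaluation code
    return {"accuracy": 0.91, "recall": 0.88, "f1": 0.89}

params = {"learning_rate": 2e-5, "epochs": 3}

with mlflow.start_run():
    mlflow.log_params(params)               # record which configuration produced this run
    model = train_model(**params)
    mlflow.log_metrics(evaluate(model))     # record accuracy / recall / F1 for this run
```

Running `mlflow ui` afterwards brings up the tracking UI mentioned next.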
MLflow comes with a default UI where all the model experiments can be visualized, which I showed you on the earlier slide. Now coming to the model training. The reason I kept explaining about experiments and tracking is that when you start a model training framework, all the experiments need to be tracked somehow.
Now I've taken a small example of code showing how we perform the model training activity. Ideally, we are going to use a BERT model; you can see I am using a pretrained BERT model, and I'm using AutoModelForSequenceClassification. The reason is that instead of the BERT model I can try RoBERTa, XLM-RoBERTa, DistilBERT, BioBERT, GPT-2, or GPT-3; any of the pretrained models can be put here. So when I build the framework, I call the transformers library, then use AutoTokenizer and AutoModelForSequenceClassification to load the pretrained model, and in the next lines I'll show the code further.
But before that: we are using a GPU machine to run this model training activity, because this is a large language model. When we are doing fine-tuning, it requires a GPU machine with the CUDA library to perform the work. Calling model.to(device) with a torch device backed by CUDA specifies which GPU to use; if I'm using, say, eight GPUs, the processing gets split across the multiple GPUs and the model training activity starts. The reason we're using the auto model classes is that this becomes a framework where we can pass the model name in from the command line and, based on that, further train the model.
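A minimal sketch of that setup might look like this; the checkpoint name and number of labels are illustrative assumptions, not the exact configuration from the presentation.

```python
# Minimal sketch of loading a pretrained checkpoint for sequence classification
# and moving it to GPU; the checkpoint name is configurable, so BERT could be
# swapped for RoBERTa, DistilBERT, etc. via a command-line argument.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "bert-base-uncased"                      # could also be "roberta-base", etc.
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)                                      # training/inference happens on the GPU
```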
Now coming to the Trainer, which you can see is a class for performing the model training, provided by the transformers library. I can start training and the model gets trained; then I can keep changing my training hyperparameters, based on learning rate, number of epochs, and batch size. There are additional parameters we can use, but mainly we use the learning rate and number of epochs, which is what is used for fine-tuning the model parameters.
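A minimal fine-tuning sketch with the Hugging Face Trainer is shown below; it assumes the `model` from the previous snippet plus already-tokenized `train_ds` and `eval_ds` datasets, and the hyperparameter values are only examples.

```python
# Minimal sketch of fine-tuning with the Hugging Face Trainer.
# `model` comes from the loading step above; `train_ds`/`eval_ds` are
# placeholder tokenized datasets you would prepare from your own data.
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,               # the main knobs we keep tuning:
    num_train_epochs=3,               # learning rate, number of epochs,
    per_device_train_batch_size=16,   # and batch size
)

trainer = Trainer(model=model, args=training_args,
                  train_dataset=train_ds, eval_dataset=eval_ds)
trainer.train()   # each run of this can be logged as an MLflow experiment
```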
Right? Now, to train the model, we use a data loader and a fit/train call, along with hyperparameter tuning. The fine-tuning job is nothing but taking a pretrained model and fine-tuning it to a specific task. With BERT we are going to perform text classification: give it an input data set and keep training the model until the accuracy, or recall and precision, reach a certain benchmark, say 85% or 95%, whatever you require based on the problem definition or problem statement you defined. Now that we
have finished the model training, we have to evaluate the model. As I said at the beginning of the conversation, when we build the model we have to define the model evaluation metrics and split the data into train, test, and validation. Once training has been performed on the training data, we can use the evaluation or validation data set to evaluate the model. Further, we apply prediction logic, which behind the scenes could be based on logits, a softmax classifier, or a neural network head; I don't want to go deeper into that. Our idea is to define a framework, do the model training, and productionize or deploy the model in the cloud. That's where our focus is. If anything is unclear, please feel free to reach out to me after the presentation or after this live event, and we can discuss or take the conversation
further. We have now talked about model training and model evaluation: we run model prediction, and from those predictions we compute the confusion matrix scores that can be used to decide whether the model is ready to take to production.
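A minimal evaluation sketch, assuming the `tokenizer`, `model`, and `device` from the training steps above, might look like this; the validation texts and labels are tiny placeholders.

```python
# Minimal evaluation sketch: run the fine-tuned model over a validation batch,
# take a softmax over the logits, and compute the confusion matrix.
import torch
from sklearn.metrics import confusion_matrix, classification_report

val_texts = ["this is an amazing model", "this model is terrible"]
val_labels = [1, 0]

inputs = tokenizer(val_texts, padding=True, truncation=True, return_tensors="pt").to(device)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)       # prediction logic via softmax over the logits
preds = probs.argmax(dim=-1).cpu().numpy()

print(confusion_matrix(val_labels, preds))
print(classification_report(val_labels, preds))
```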
Now, how do we take the model to production? That is another interesting problem, which can be solved by using PyTorch serving, i.e. TorchServe. So now,
when we say PyTorch serving of ML models, it means we have built a model in PyTorch, and the question is how we serve that model in production to solve the low-latency and scalability problem, right? As you can see, once you train the model and evaluate it, and if you feel the model is good enough to take to higher environments, then you have to convert the model into a .mar file. This is done with TorchServe, which is what we have been using: we have a model store, we convert the model into a .mar file, and the model is deployed into a TorchServe inference instance. So there is a flow we have to follow: we have to build the Docker image, and once we have a model, we deploy that into ECS, which we'll take up in the next slide. Overall, TorchServe helps us deploy the .mar files inside a model store; it then exposes an inference API, so by invoking the API we can get the model prediction results. Internally, the architecture is such that multiple models can be served in a single TorchServe instance.
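A minimal sketch of invoking that inference API from Python is shown below; the host, model name, and payload format depend on how the model and its handler were registered, so they are assumptions here.

```python
# Minimal sketch of calling the TorchServe inference API once a .mar file is
# registered in the model store; model name and payload format are hypothetical.
import requests

url = "http://localhost:8080/predictions/text_classifier"   # default inference port is 8080
payload = {"text": "This is an amazing model"}               # input format depends on the handler

response = requests.post(url, json=payload)
print(response.json())   # e.g. a label and score, depending on your handler
```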
Right. Ideally, this can be used for API invocation to call any of the models deployed inside the TorchServe instance. Again, for more questions please feel free to reach out to me after the event. Now, once we have TorchServe in place, this is the most interesting piece. So we
have trained and, sorry, productionized the model in the cloud. Now, whenever you talk about the model training activity, once the .mar file is generated, we have to push it into the S3 bucket, because all the models can be stored in an S3 bucket: the file is huge, and S3 is blob or file storage. So an S3 bucket from Amazon can be used to store the models.
Then we can use ECR (Elastic Container Registry) to push the model image, and from there we push the image to an ECS or EC2 instance. There you can see that we build every model as a Docker image. Once we have a Docker image, an ELB (Elastic Load Balancing) server, or EBS storage used in the backend, connects to the ECS, and the model can be stored over there in the model store. From there, via the inference API, we can pick the model by writing a Python function or Python code that is deployed as a Docker image or Docker container; it can then pull the model and perform the inference logic, or do the prediction. So now you can see how the whole model is developed, from the time we start the model development to productionizing the algorithm.
Now let's talk more about the AWS cloud environment. AWS already has SageMaker, but let's not use SageMaker or a SageMaker endpoint; ideally, we are literally saving cost by using ECS, ECR, and an S3 bucket, and performing
the model training on a GPU machine. Once the model has been trained, we can write a small script to push the trained model file into the S3 bucket, as sketched below.
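A minimal version of such a script using boto3 might look like this; the bucket name, key, and file path are hypothetical.

```python
# Minimal sketch of a script that pushes the trained model archive to S3 with boto3.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="model_store/text_classifier.mar",           # archive produced for TorchServe
    Bucket="my-model-artifacts",                           # hypothetical bucket name
    Key="models/text_classifier/v1/text_classifier.mar",  # hypothetical object key
)
```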
Once that is done, we can use a Jenkins CI/CD pipeline to push the Docker image into an ECS container, which underneath uses Fargate or EC2; in my case, I refer to EC2,
right? As I said, once a model has been built, all the models get stored under the model store. Once the models are stored under the model store, there is a management API and an inference API, started with the TorchServe command, through which we can provide the inference API for applications to consume. I think these are the holistic steps involved in creating the model, developing the machine learning algorithm, and deploying the model in a cloud environment,
right. Any further questions? As I said, you can always reach out to me after the event. All right, thank you for watching my video. Have a good day.