Transcript
Hi, I'm Deepak. I work as an associate director for data science and machine learning projects at Novartis, and I'm also responsible for generative AI project initiatives and deliverables. Today I'm going to talk about the evolution of natural language processing by leveraging generative AI and the transformer architecture in healthcare. I'm going to take a use case and walk you through the problems we had using a traditional machine learning model or a transformer-architecture model, and how, by leveraging generative AI, we can solve the business use case efficiently. All right, without further ado, let me take you to the next slide.
Okay, the problem statement we have here is anomaly detection. Across various domains, various kinds of anomalies can be identified, in finance and in healthcare. When it comes to healthcare, an anomaly can be an instrument failure or a medical device failure that has to be reported. But there are also scenarios, like a patient monitoring system, where any kind of potential safety risk has to be reported to the FDA, the Food and Drug Administration. So what is a patient safety risk? A patient takes a medicine, and any untoward medical experience it causes the patient is an anomaly, or adverse effect, that has to be reported to the FDA. Anomaly detection plays a significant role in patient healthcare, because without identifying the anomaly, the patient may go through severe problems from consuming the medicine over a long period of time.
Right. So now we are trying to uncover the hidden patterns and trends by using the transformer architecture. Let me take a scenario from before the evolution of social media. If there was an anomaly, it would be reported to the physician via email correspondence or by a telephone call. The physician would take note of the adverse event, and it would then be reported to the FDA. That is the traditional process which has been followed. But with the recent evolution of social media, a huge amount of data is posted online, and social media can be considered a platform for detecting anomalies.
Right? Now, let's think about a social media scenario where we get a humongous amount of data that has to be reviewed manually to check whether any adverse events or anomalies are present. Think of any pharmaceutical company using a social media platform for digital advertising of a new product launch, say a new medicine launched in the market. After consuming that product, many patients may report adverse effects from taking the medicine. Typically that gets reported on social media channels like Twitter, Facebook, Instagram, LinkedIn, or many other platforms. Now, reviewing all those platforms and identifying whether there is any unusual behavior would be a humongous task for a human to perform.
That is a practical challenge we have in the healthcare industry, and that is why building a technology with an artificial intelligence solution in place helps us reduce the manual effort and report the adverse effects to the FDA immediately. Now, we follow a process. Let me tell you what that process is.
So, anomaly detection at a glance. As I was saying, the data can be pulled from various channels or platforms. It could be email communication, or it could be Facebook posts or Twitter tweets. All of this data has to be pulled in via a data connector and preprocessed to understand whether any of the reported text contains an anomaly. Again, as I said, if we have a manual reviewer who has to go through humongous amounts of data, it is an impossible task for a human being to perform, but by leveraging a machine learning platform, the effort can be significantly reduced. Let me talk about the machine learning platform, the ML algorithm we developed to identify an anomaly. Like any ML algorithm, there is a development process involved, and many organizations have taken an agile approach to machine learning algorithm development and its industrialization. So we defined the problem here: it's essentially text classification, where we have to understand whether the text contains an anomaly that has to be reported to the FDA.
Now let's discuss the machine learning algorithm development process. For any machine learning algorithm to be developed, we need a training data set to train or fine-tune the model. Here we are considering transformer-based architecture models, so we train or fine-tune the model and then deploy it in production. Once we train the model, we have to evaluate it with validation and test data sets. As part of fine-tuning the models, there can be changes in the hyperparameter tuning, which helps us improve the model's precision, recall, and accuracy; these terms in data science refer to how efficaciously the model identifies the anomaly. So this is the process involved as part of algorithm development.
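To make that concrete, here is a minimal sketch of the split step, assuming the labeled reports sit in a CSV with text and label columns; the file name, column names, and ratios are illustrative assumptions, not our exact setup.

```python
# A minimal sketch of the train/validation/test split described above.
# "labeled_reports.csv" and its columns ("text", "label") are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("labeled_reports.csv")

# Hold out 20% for evaluation, then split that evenly into validation and test,
# stratifying so the rare "anomaly" class stays balanced across splits.
train_df, eval_df = train_test_split(
    df, test_size=0.2, stratify=df["label"], random_state=42
)
val_df, test_df = train_test_split(
    eval_df, test_size=0.5, stratify=eval_df["label"], random_state=42
)
print(len(train_df), len(val_df), len(test_df))
```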
Now let's talk about the data annotation process. Why am I talking about data annotation? Because to train the algorithm, that is, to train or fine-tune a large language model to perform a specific task, we need a huge data set. Data annotation is nothing but a labeling job performed by annotators. The process involves collecting the training data set: typically, for fine-tuning large language models, we would require around 20,000 to 30,000 records. Once we procure the data set, we have to manually label whether each record falls under anomaly or not anomaly. For this we need a set of annotators, and ideally they should be skilled annotators, who then perform the labeling of this data set.

Now let's understand the data annotation process a little more. Data annotators follow certain guidelines to understand the sample data, and during the training phase they build inter-annotator agreement on how they want to label something as an anomaly. We then pick the good annotators: once we identify them through an interview process, we onboard them and share the data set for annotation. In our case, assume we procured a data set of around 20,000 to 30,000 records; we then have to label manually whether each record is an anomaly or not, and to perform that activity the annotator should have a good amount of knowledge in identifying an anomaly. Once we identify the good annotators, we deploy them to do the manual job of labeling the data set.

The process involves annotator one and annotator two, who manually label the data. It then goes to another reviewer, who validates the manually labeled data. If there is a contradiction between the two annotators, it goes to a fourth annotator for the final annotation, and based on majority voting we assign the final labels. This is the data annotation process involved in training or fine-tuning the machine learning algorithm.
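As a rough illustration of the adjudication logic just described, here is a small Python sketch; the field names and the escalation rule are simplified assumptions, not the exact annotation workflow.

```python
# Simplified sketch of majority-vote adjudication across annotators.
from collections import Counter

def final_label(annotations: list[str]) -> str:
    """Return the strict-majority label; a tie means the record must be escalated."""
    counts = Counter(annotations)
    label, votes = counts.most_common(1)[0]
    if votes <= len(annotations) / 2:  # no strict majority -> another annotator
        raise ValueError("No majority; escalate to an additional annotator")
    return label

record = {"text": "Severe rash after dose two",
          "labels": ["anomaly", "anomaly", "not_anomaly"]}
print(final_label(record["labels"]))  # -> "anomaly"
```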
So now we have talked about the data annotation process. Then there is the comprehensive AI/ML journey. First, we have to develop a well-balanced data set to be given for model fine-tuning. Then we have to segregate the data into training, test, and validation sets for training and evaluating the model. See, the model should understand the problem the way a human understands it; that is how the data is curated and given for model training. As part of the AI implementation journey, we use large language models based on the transformer architecture: BERT, XLM-RoBERTa, or GPT models can be used. As part of the whole process, we ensure repeatability and auditability are captured in our system design. The reason: once we say something is an anomaly, then if you run inference or prediction again, these large language models should still say it is an anomaly; there should not be any deviation in the prediction or the output. The whole thing is a complete risk management framework for identifying the anomaly.
Right? Now, let's take a real example. Once we identify the large language model, we perform fine-tuning on it. Then we evaluate the model against a benchmark on recall, precision, and accuracy; these are called performance metrics. Ideally, we should not have any false negatives as part of the model development: when there is an anomaly, we should not miss it, because a missed anomaly is a false negative. At the same time, we are skewed towards false positives, but that is okay. The reason is that we should not miss any anomaly in the real data. So this is the holistic process in the artificial intelligence implementation journey.
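Here is a hedged sketch of that evaluation step using scikit-learn, with placeholder predictions; the point is that recall is the metric we guard most closely, since a false negative is a missed adverse event.

```python
# Sketch of the metric check on a held-out test set. y_true / y_pred are
# placeholders: 1 = anomaly, 0 = not an anomaly.
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 1, 1, 0, 0]  # one false positive, zero false negatives

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))  # this is the gate we watch
```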
Now let's talk about the framework we built, called the anomaly identification framework. We use the Hugging Face library and framework to fine-tune the models, and we used models like BERT and BioBERT for fine-tuning. Ideally, we give an input text and see whether BERT can predict the anomaly correctly. If it predicts it, that is good. If not, we have another rule engine, a safeguard, or what we call guardrails: a complete list of heuristic rules that is invoked to detect whether there is any further anomaly in the text. This is the typical process we follow.
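For illustration, here is a minimal fine-tuning sketch along these lines with the Hugging Face transformers library; the model choice, hyperparameters, and the assumption that the label column is already encoded as 0/1 are mine, not the exact production pipeline. train_df and val_df are the pandas splits from the earlier sketch.

```python
# Minimal sequence-classification fine-tuning sketch with Hugging Face.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-uncased"  # or a biomedical variant such as BioBERT
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

# Assumes "label" is already 0/1-encoded (0 = not anomaly, 1 = anomaly).
train_ds = Dataset.from_pandas(train_df).map(tokenize, batched=True)
val_ds = Dataset.from_pandas(val_df).map(tokenize, batched=True)

args = TrainingArguments(output_dir="anomaly-bert", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
Trainer(model=model, args=args,
        train_dataset=train_ds, eval_dataset=val_ds).train()
```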
But as part of the whole algorithm development, a single large language model may not be sufficient to identify the anomaly. That is why we use an ensemble of models: BERT, BioBERT, XLM-RoBERTa, or multiple other models are used to identify the anomaly. That is the somewhat novel technique here. By using multiple models, even if any one of the models identifies the text as an anomaly in the aggregate vote, we report it as an anomaly.
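A small sketch of that ensemble vote, assuming three hypothetical fine-tuned checkpoints and an "ANOMALY" label name; flagging the record as soon as a single model votes positive reflects the recall-first stance described above.

```python
# Sketch of the ensemble decision over several fine-tuned checkpoints.
# Checkpoint paths and the label string are illustrative assumptions.
from transformers import pipeline

checkpoints = ["anomaly-bert", "anomaly-biobert", "anomaly-xlm-roberta"]
classifiers = [pipeline("text-classification", model=c) for c in checkpoints]

def is_anomaly(text: str) -> bool:
    votes = [clf(text)[0]["label"] == "ANOMALY" for clf in classifiers]
    return any(votes)  # skewed toward recall: one positive vote flags the record

print(is_anomaly("Patient reports chest pain two days after the new tablet."))
```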
So, as I said, we have an ML algorithm, and on top of that we have a rule system: even if the algorithm misses something, the rule system identifies the anomaly. There is also a scale consideration. Assume the volume of social media records is humongous; the data flowing in could be in the millions per month or per week. With this framework, the number of records left for the manual job is drastically reduced: all the junk data, everything that is not an anomaly, is identified and never reaches the human reviewer. That is the whole anomaly identification framework we have built.
Now, let's talk more about the machine learning model framework here. We have built a framework where multiple pre-trained models can be plugged in for fine-tuning. Today it is BERT; tomorrow it could be XLM-RoBERTa, GPT-1, GPT-2, or GPT-3, or Llama 2 from Meta. These models can be plugged in as the pre-trained model, and we provide the training data set, the roughly 20,000 to 30,000 records manually labeled by the annotators, for model training. As I said, once the models are trained, we have a pool of models, the ensemble, to identify the anomaly. Further, we use a process called hyperparameter optimization, where the learning rate, number of epochs, and multiple other training parameters can be modified to achieve much better precision, recall, and accuracy in the whole algorithm development.
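As a sketch, that search can be as simple as a grid over learning rate and epoch count; train_and_evaluate here is a hypothetical stand-in for one fine-tuning run, like the Trainer sketch earlier, returning validation recall.

```python
# Hedged sketch of a grid search over two hyperparameters.
def train_and_evaluate(learning_rate: float, num_train_epochs: int) -> float:
    """Hypothetical helper: fine-tune once and return validation recall."""
    ...  # plug in the Trainer setup from the fine-tuning sketch above

best = None
for lr in (1e-5, 2e-5, 5e-5):
    for epochs in (2, 3, 4):
        recall = train_and_evaluate(lr, epochs)
        if best is None or recall > best[0]:
            best = (recall, lr, epochs)

print(f"best recall {best[0]:.3f} at lr={best[1]}, epochs={best[2]}")
```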
So as part of the whole framework, we feed multiple large language models into the framework, and finally we have a process to identify which models classify the anomaly correctly. We pick the top three or four models that identify the anomaly correctly. This whole thing has been developed with the Hugging Face framework. We use transformer-based pre-trained models like BERT and XLM-RoBERTa, and for languages other than English we may use a different variant, for example a Chinese RoBERTa model for Chinese. We also have MLflow for tracking all the experiments, to see which model is performing well; we pick that model and deploy it into production.
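A minimal MLflow tracking sketch for one such run; the run name, parameter values, and metric names are illustrative.

```python
# Sketch of logging one fine-tuning run to MLflow for later comparison.
import mlflow

with mlflow.start_run(run_name="biobert-lr2e-5"):
    mlflow.log_param("model", "biobert")
    mlflow.log_param("learning_rate", 2e-5)
    mlflow.log_param("epochs", 3)
    mlflow.log_metric("val_recall", 0.987)
    mlflow.log_metric("val_precision", 0.941)
```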
Again, in addition to the large language models, we also have guardrails, which are nothing but heuristic business rules used on top of them. So we take the large language models and say, okay, this model is performing well. Then we use some amount of operational-quality and production-quality test data to evaluate the model identified by the model selection process. Once the model crosses a specific benchmark, around 98 or 99% recall in identifying the anomaly, it is moved to higher environments, like the production environment.
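The guardrail layer can be pictured as a handful of keyword and regex rules OR-ed with the model's decision, reusing the is_anomaly helper from the ensemble sketch; this rule list is illustrative only, not the production rule set.

```python
# Sketch of the heuristic guardrail layer on top of the model output.
import re

RULES = [
    r"\b(hospitali[sz]ed|emergency room)\b",
    r"\b(severe|serious) (rash|pain|reaction|bleeding)\b",
    r"\bstopped taking\b.*\b(side effects?|reaction)\b",
]

def guardrail_flags(text: str) -> bool:
    return any(re.search(rule, text, flags=re.IGNORECASE) for rule in RULES)

def report_anomaly(text: str) -> bool:
    # Flag if either the model ensemble or the heuristic rules fire.
    return is_anomaly(text) or guardrail_flags(text)

print(report_anomaly("I was hospitalised a week after starting the drug."))
```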
All right, now let's talk about the challenges with the BERT model. BERT stands for Bidirectional Encoder Representations from Transformers. This is the model released by Google in 2018 that set the benchmark for natural language understanding. This pre-trained model is really capable of understanding text by processing it bidirectionally: given a text, it understands the context of each word and tries to predict the masked word, a technique called masked language modeling, alongside next sentence prediction. These are the techniques used in pre-training BERT, but we fine-tuned the model to perform a classification task, in our case to identify the anomaly.
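A tiny illustration of masked language modeling with the off-the-shelf bert-base-uncased checkpoint: the model fills in the hidden word from both directions of context.

```python
# Fill-mask demo of BERT's pretraining objective (example sentence is mine).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill_mask("The patient developed a severe [MASK] after the injection.")[:3]:
    print(pred["token_str"], round(pred["score"], 3))
```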
Now, as we know, BERT first set the industry benchmark for using the transformer architecture in a bidirectional way to understand word context in a sentence, and that can really help in predicting or classifying the anomaly based on the text we give it. But to use this BERT model, as I have said, we need a huge training data set, and then there is a process involved in deploying the model: we have to use something like TorchServe to serve the machine learning models in production. So the training time for the BERT model is huge. There is also the possibility of overfitting due to its complexity and capacity, because BERT can overfit on smaller data sets. That's why I was stressing the data annotation process and the data set: a huge amount of data is required for model training or fine-tuning.
Now, okay, we have built or fine-tuned the BERT model, and we have to deploy it in production for identifying the anomaly. This is where generative AI comes in. When I talk about generative AI, it is primarily used for content generation, question answering, or building chatbots, but it can also be used for a discriminative task like classification, in our case identifying the anomaly in the text. How can generative AI be utilized for the classification task? Via prompt engineering techniques. We don't need to fine-tune the model, because generative AI models like GPT-4 have been trained on huge public data sets; I believe even GPT-3 has around 175 billion parameters and was trained on an enormous corpus of documents. GPT-4 went further, with far more extensive training, and that model is available. With prompt engineering, without any fine-tuning, by providing in-context learning to GPT-4, we can perform the classification job of identifying the anomaly.
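As a sketch of that prompt-engineering approach, here is a few-shot classification call against the OpenAI chat API; the model name, system instruction, and examples are illustrative assumptions, not our production prompt.

```python
# Few-shot (in-context) anomaly classification with no fine-tuning.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM = ("You classify social media posts about medicines. "
          "Answer with exactly one word: ANOMALY or NORMAL.")
FEW_SHOT = [
    {"role": "user", "content": "Post: Got a terrible rash after my second dose."},
    {"role": "assistant", "content": "ANOMALY"},
    {"role": "user", "content": "Post: Picked up my prescription today, easy process."},
    {"role": "assistant", "content": "NORMAL"},
]

def classify(post: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system", "content": SYSTEM}, *FEW_SHOT,
                  {"role": "user", "content": f"Post: {post}"}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

print(classify("Three days on the new tablets and my heart keeps racing."))
```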
Right. That is the huge benefit of having generative AI and prompt engineering techniques. Now, what am I describing? There is a small solution architecture, a picture of what I mean by generative AI with the prompt engineering approach. We already have foundation models like GPT-4, Claude, or Llama 2, or you can even bring your own models. These models don't necessarily require any amount of fine-tuning, which is not advised as per the pyramid anyway. So once you have the foundation model, you can pass all the data to it, or use it as an inference endpoint, to classify whether there is an anomaly or not. On top of the foundation model we apply prompt engineering techniques, following a process like the anatomy of a prompt: instructions, in-context classification examples, or chain-of-thought prompting can be used. Once the prompt engineering is built effectively, we can classify the anomaly. Fine-tuning or training a foundation model is not much recommended by the industry unless and until prompt engineering cannot achieve the job, but in most cases prompt engineering techniques will help you solve your problem of identifying the anomaly.
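For completeness, here is an illustrative anatomy-of-a-prompt template combining an instruction, in-context examples, and a chain-of-thought cue; the wording is my assumption, not the exact production prompt.

```python
# Illustrative prompt anatomy: instruction + examples + chain-of-thought cue.
PROMPT = """\
Instruction: Decide whether the post below describes a potential adverse
event that must be reported. Think step by step, then give a final answer
of ANOMALY or NORMAL.

Example: "My migraines stopped since starting the tablets." -> NORMAL
Example: "I fainted twice in the week after the injection." -> ANOMALY

Post: "{post}"
Reasoning:"""

print(PROMPT.format(post="The new inhaler leaves me dizzy and short of breath."))
```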
Right, so that's all I have for now. I talked about the use case, the problem statement of identifying the anomaly; how traditional machine learning models and transformer-based architectures have been used to identify the anomaly; and all the challenges we faced as part of the whole model development process, which includes training, validation, testing, and deploying the model into production, and which requires a huge amount of training data for models like BERT. But with generative AI in place, you move completely away from the whole model training phase, and you just have to provide a prompt, not a simple one, but effective prompt engineering techniques, to get the most out of foundation models like GPT-4. With that, we can meet the business objective of classifying and identifying the anomaly. With that, I end my session. Thank you for listening. I wish you all success in your career and your path ahead. Thank you so much.