Conf42 Python 2024 - Online

The Evolution of Natural Language Processing: Leveraging Generative AI and Transformers in Healthcare

Abstract

Dive into the future of AI with “Neural Networks Unleashed: Python’s Role.” Join me in exploring Python’s pivotal role in shaping advanced neural architectures. From cutting-edge frameworks to real-world applications, discover how Python is driving the evolution of artificial intelligence.

Summary

  • Deepak: I am working as an associate director for data science and machine learning projects at Novartis. Today I'm going to talk about the evolution of natural language processing by leveraging generative AI and transformer architecture in healthcare.
  • Anomaly detection plays a significant role in patient healthcare. Social media, where huge amounts of data are posted, can be used as a platform to detect anomalies. Building a technology with an artificial intelligence solution in place helps us reduce the manual effort and report adverse effects immediately to the FDA.
  • To train the algorithm, we need a huge data set to train or fine-tune the model. Data annotation is nothing but a labeling job performed by annotators. This is a holistic process in the artificial intelligence implementation journey.
  • BERT is a bidirectional encoder representation from transformers. It can be used for discriminative tasks like classification or identifying anomalies in text. Generative AI can also be utilized for the classification task via prompt engineering techniques.
  • With generative AI in place, you move completely away from the whole model training phase. With that, we can solve the business objective of classifying and identifying the anomaly. Thank you for listening.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi, I'm Deepak. I am working as an associate director for data science and machine learning projects at Novartis, and I'm also responsible for generative AI project initiatives and deliverables. Today I'm going to talk about the evolution of natural language processing by leveraging generative AI and transformer architectures in healthcare. I'm going to take a use case and walk you through the problems we had in using a traditional machine learning model or a transformer architecture model, and how, by leveraging generative AI, we can solve the business use case efficiently. All right, without further ado, let me take you to the next slide. The problem statement we have here is anomaly detection. Various kinds of anomalies can be identified across domains such as finance and healthcare. When it comes to healthcare, an anomaly can be an instrument failure or a medical device failure that has to be reported. But there are also scenarios like patient monitoring, where any potential safety risk has to be reported to the FDA (the Food and Drug Administration). So what is a patient safety risk? When a patient takes a medicine, any untoward medical experience caused to the patient is an anomaly, or adverse effect, that has to be reported to the FDA. Anomaly detection plays a significant role in patient healthcare: without identifying the anomaly, a patient could suffer a severe problem by consuming the medicine for a long period of time. So we are trying to uncover the hidden patterns and trends by using transformer architectures. Now, let me take a scenario from before the evolution of social media. Back then, if there was an anomaly, it would be reported to the physician via email correspondence or by a telephone call, and the anomaly or adverse event would be reported to the physician.
The physician would then take note of the adverse events, and those would be reported to the FDA. That is the traditional process which has been followed. But with the recent evolution of social media, a huge amount of data gets posted online, and social media can be considered a platform for detecting anomalies. Now, think about a social media scenario where we get a humongous amount of data that has to be reviewed manually to see whether any adverse events or anomalies are present. Any pharmaceutical company might take to a social media platform for digital advertising when a new product or a new medicine is launched in the market. After consuming that product, many patients may report adverse effects from taking the medicine, and those can be reported on social media channels like Twitter, Facebook, Instagram, LinkedIn, or many other platforms. Reviewing all those platforms and identifying whether there is any unusual behavior would be a humongous task for a human to perform. That is a practical challenge we have in the healthcare industry, and that is where building a technology with an artificial intelligence solution in place helps us reduce the manual effort and report the adverse effects immediately to the FDA. Now we follow a process; let me tell you what that process is. Looking at anomaly detection at a glance, as I was saying, the data can be pulled from various channels or platforms. It could be email communication, Facebook posts, or Twitter tweets. All this data has to be pulled via a data connector and pre-processed to understand whether any of the reported text contains an anomaly. Again, as I said, if we have a manual reviewer who has to go through humongous amounts of data, it is an impossible task for a human being to perform.
But by leveraging a machine learning platform, the effort can be significantly reduced. Let me talk about the machine learning platform, or the ML algorithm, which we developed to identify an anomaly. Like any ML algorithm, there is a development process involved, and many organizations have taken an agile approach to machine learning algorithm development and industrialization. So we defined the problem here: it is essentially a text classification task where we have to understand whether the text contains any anomaly, and that has to be reported to the FDA. Now let's discuss the machine learning algorithm development process. For any machine learning algorithm to be developed, we need a training data set to train or fine-tune the model; here we are considering transformer-based architecture models for training or fine-tuning and then deploying the model to production. Once we train the model, we have to evaluate it with validation and test data sets. As part of fine-tuning the models, there could be changes in the hyperparameters, and hyperparameter tuning helps us improve the model's precision, recall, and accuracy. These data science terms refer to how efficaciously the model identifies the anomaly. That is the process involved in algorithm development. Now let's talk about the data annotation process, and why I bring it up: to train the algorithm, we need a huge data set to train or fine-tune the large language model to perform a specific task. Data annotation is nothing but a labeling job which would be performed by annotators. The process involves collecting the training data set; typically, for fine-tuning the large language models, we would require around 20,000 to 30,000 records.
Once we procure the data set, we have to manually label whether each record falls under anomaly or not anomaly. For this process we need a set of annotators, and ideally they should be skilled annotators; they then perform the labeling of this data set. Now let's understand the data annotation process a little more. Data annotators follow certain guidelines to understand the sample data, and during the training phase they reach inter-annotator agreement on how they want to label something as an anomaly. We then pick the good annotators, onboard them, and share the data set for annotation. Let's assume we procured a data set of around 20,000 to 30,000 records; we then have to manually label whether each record is an anomaly or not. To perform that activity, the annotators should have a good amount of knowledge in identifying an anomaly, so as part of the interview process, we identify the good annotators and deploy them to manually label the data set. The process involves annotator one and annotator two manually labeling the data, and finally it goes to another reviewer who validates the manually labeled data. If there is a contradiction between the two annotators, it goes to a fourth annotator for a final annotation, and based on the majority voting we decide the final labels. This is the whole data annotation process involved in training or fine-tuning the machine learning algorithm. Now that we have talked about the data annotation process, there is a comprehensive AI/ML journey. First we have to develop a well-balanced data set, which is given for model fine-tuning; then we segregate the data into training, test, and validation sets for model training and evaluation.
The model should understand the problem the way a human understands it; that is how the data is curated and given for model training. As part of the AI implementation journey, we use large language models based on the transformer architecture; BERT, XLNet, RoBERTa, or GPT models could be used. As part of the whole process, we ensure repeatability and auditability are captured in our system design, because once we say something is an anomaly, the model, if you run inference or prediction again, should still say it is an anomaly; there should not be any deviation in the prediction or the output. The whole thing is a complete risk management framework to identify the anomaly. Now, let's take a real example. Once we identify the large language model, we perform fine-tuning on it. Then we evaluate whether the model is performing according to benchmarks called recall, precision, and accuracy; these are the performance metrics. Ideally, we should not have any false negatives as part of the model development: when there is an anomaly, we should not miss it, because that would be a false negative. At the same time, we are skewed towards false positives, but that is okay, because we should not miss any anomaly in the real data. So this is the holistic process in the artificial intelligence implementation journey. Now let's talk about the framework which we built, called the anomaly identification framework. We used the Hugging Face library and framework to fine-tune the models, and we used models like BERT and BioBERT for fine-tuning. Ideally, we give an input text and see whether BERT can predict the anomaly correctly. If it predicts it, that is good.
If not, we have another rule engine, or safeguard, which we call guardrails: a complete list of heuristic rules which are invoked to detect whether there is any further anomaly in the text. This is the typical process we follow. But as part of the whole algorithm development, a single large language model may not be sufficient to identify the anomaly. That is why we use an ensemble of models: BERT, BioBERT, XLNet, or multiple other models have been used to identify the anomaly. That is a somewhat novel technique here: by using multiple models, if any of the models identifies the text as an anomaly based on aggregate voting, we report it as an anomaly. So as I said, we have an ML algorithm, and on top of that we have a rule system, so that even if the algorithm misses something, the rule system identifies the anomaly. There is also a practical benefit to this process. Assume the volume of social media records flowing in is humongous, possibly in the millions per week or per month. The number of records requiring a manual job is drastically reduced, since all the junk data, the records which are not anomalies, would be identified and kept out of the human review queue. That is the whole anomaly identification framework we have built. Now, let's talk more about the machine learning model framework. We have built a framework where multiple pre-trained models can be plugged in for fine-tuning: today it is BERT, tomorrow it could be XLM-RoBERTa, GPT-1, 2, or 3, or Llama 2 from Meta. We provide the training data set, the roughly 20,000 to 30,000 records manually labeled by the annotators, for model training. As I said, once the models are trained, we have a pool of models, an ensemble, to identify the anomaly.
Further, we use a process called hyperparameter optimization, where the learning rate, number of epochs, and multiple other learning parameters can be modified to achieve much better precision, recall, and accuracy in the whole algorithm development. As part of the framework, we feed multiple large language models in, and finally we have a process to identify which models detect the anomaly correctly; we pick the top three or four models which identify the anomaly correctly. This whole thing has been developed with the Hugging Face framework. We use transformer-based pre-trained models like BERT and XLM-RoBERTa, and for languages other than English we may use a variant such as a Chinese RoBERTa model. We also use MLflow for tracking all the experiments to see which model is performing well, and we pick that model and deploy it to production. Again, in addition to the large language models, we also have guardrails, which are nothing but heuristic business rules used on top of them. Once we have large language models that are performing well, we use some amount of operational-quality and production-quality test data to evaluate the model identified in the model selection process. Once the model crosses a specific benchmark, around 98 or 99% recall in identifying the anomaly, it is moved to higher environments such as production. All right, now let's talk about the challenges with the BERT model. BERT stands for Bidirectional Encoder Representations from Transformers. This model, released by Google in 2018, set the benchmark for natural language understanding.
This pre-trained model is really capable of understanding text through a bidirectional flow: given a text, it understands each word's context and predicts masked words, which is called masked language modeling, alongside next sentence prediction. These are the techniques used in developing BERT, but we fine-tuned the model to perform a classification task, in our case to identify the anomaly. As we know, BERT first set the industry benchmark for using the transformer architecture in a bidirectional way to understand word context in a sentence, and that really helps in predicting or classifying the anomaly based on the text we give. Now, to use this BERT model, as I have said, we need a huge amount of training data to train the model. Then there is a process involved in deploying the model: we have to use PyTorch serving to serve the machine learning models in production. So the training time is huge with the BERT model. There is also the possibility of overfitting due to its complexity and capacity, because BERT can overfit on smaller data sets; that is why I was stressing the data annotation process and the data set, since a huge amount of data is required for model training or fine-tuning. Okay, so we have built or fine-tuned BERT, and we have to deploy the model to production for identifying the anomaly. Now comes generative AI. When I talk about generative AI, it is primarily used for content generation, question answering, or building chatbots, but it can also be used for a discriminative task like classification, or identifying the anomaly in the text. And the way generative AI can be utilized for the classification task is via prompt engineering techniques.
So we do not need to fine-tune the model, because these generative AI models, such as GPT-4, have been trained on huge public data sets; even GPT-3 has around 175 billion parameters. GPT-4 went through even more extensive training, and that model is now available. With prompt engineering techniques, without any fine-tuning, by providing in-context learning to GPT-4, we can perform the classification job of identifying the anomaly. That is the huge benefit of having generative AI and prompt engineering techniques. What I'm describing is a small solution architecture, a picture of what I mean by generative AI with the prompt engineering approach. We already have foundation models like GPT-4, Claude, or Llama 2, or you can even bring your own models. These models do not necessarily require any amount of fine-tuning, which is not advised as per the pyramid anyway. Once you have the foundation model, you can pass all the data to it, or use it as an inference point, to classify whether there is an anomaly or not. If needed, you apply prompt engineering techniques, following processes like the anatomy of a prompt, instruction tuning, few-shot classification, or chain of thought, on top of the foundation model. Once the prompt engineering is built effectively, we can classify the anomaly. Fine-tuning and training a foundation model is not well advised by the industry unless and until prompt engineering cannot achieve the job, but in most cases a prompt engineering technique will help you solve your problem of identifying the anomaly. So that's all I have for now.
I talked about the use case, the problem statement of identifying the anomaly, how traditional machine learning models and transformer-based architectures have been used to identify the anomaly, and the challenges we faced as part of the whole model development process, which includes training, validation, testing, and deploying the model to production, and which requires a huge amount of data for models like BERT. By having generative AI in place, you move completely away from the whole model training phase: you just have to give a prompt, not a simple one, but effective prompt engineering techniques, to get the most out of foundation models like GPT-4. With that, we can solve the business objective, the business problem of classifying and identifying the anomaly. With that, I end my session. Thank you for listening. I wish you all success in your career and your path ahead. Thank you so much.

Deepak Karunanidhi

Associate Director - Data Science & Machine Learning @ Novartis



