Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. Today we'll be discussing
how to unleash the power of retrieval augmented generation
to enhance AI-powered applications.
We'll quickly introduce ourselves. I am Sophie Sullivan,
and I am the director of operations for Edamama.
I have over nine years of experience in e-commerce,
fintech, and retail in the Philippines.
Over the years I have built my expertise in process
management, AI, and everything else in between.
And I am Joshua Arvin Lat, the chief technology officer of
NuWorks Interactive Labs. I am also an AWS Machine Learning
Hero, and I am the author of the following books.
Here on the screen we can see the three books I've written so far these past
three years. The first one is Machine Learning with Amazon
SageMaker Cookbook, the second one is Machine Learning
Engineering on AWS, and finally Building and
Automating Penetration Testing Labs.
So right now we are probably wondering what
this talk is all about. And it's about generative AI.
Of course we'll dive deeper into retrieval augmented
generation later. But no AI talk
is complete without a few examples of how AI really works.
So here in our chat playground, we basically ask
the generative AI service, what is the meaning
of life? Our generative AI service simply answers that
the meaning of life is a subjective question that has
been debated throughout time, and that different people have different beliefs
and perspectives on this matter. As you can see,
the generative AI service answered our question, and of
course the answer is basically its own interpretation of
what the meaning of life is. Now, let's try a different
example. This time we have a text input,
and the generative AI tool gives us
an image response instead. So here we input
the prompt. Here the generative AI service simply
returns how it interprets our prompt in the
form of an image. Now let's try a similar
example, but this time let's input the prompt "cat flying
with wings." So even if it's not really possible at this point
in time for a cat to have wings, the generative
AI solution is still able to provide us with an image.
It generated an image with a cat, and of course this cat has wings,
and it basically lets us know that this cat can probably fly
because it has wings. Then finally,
let's replace the word cat with dog. And here,
surprisingly, we have an image of a dog with wings,
and it's also flying. There are a lot of different possible
applications of generative AI, and you'll
be surprised at how the recent
innovations and findings have
helped this field progress further these
last couple of months. So before we start, I
want to pose a question to everyone. Do you think
you can build a generative AI-powered application in
just 24 hours? If you were to deploy your
own self-hosted LLM, then yes, it's possible.
However, setting up a RAG-powered generative AI system
may take longer. We'll see this in action later in our presentation.
We can categorize and group AI into artificial narrow
intelligence, artificial general intelligence, and artificial
superintelligence. Currently, what we have is ANI.
We're still in the infancy of AI, wherein it hasn't advanced yet
to the point where usage is widespread. It's definitely a stage where
there are already some limited practical applications of it, but there's
still room to further improve in terms of integrating AI in
a broader way. AGI is what we are hoping to achieve,
where AI is used across a broad and wide range of domains.
Lastly, there is ASI, or artificial superintelligence. This means that
AI has surpassed even human intelligence, to the point that technology can
even solve all the world's problems. Again, we're still in ANI.
Then, after knowing the different types of AI, we also need to know
how to get there. Machine learning is a subset
of AI, where the focus is on building systems
that can learn from data. ML involves
deciphering patterns and trends in order to make predictions
or decisions based on its learning. Under ML,
there are three main types: supervised learning,
unsupervised learning, and reinforcement learning.
Supervised learning involves labeling data, or providing
data the machine can learn from. One of the most
common examples is identifying whether an email is spam.
As for unsupervised learning, the machine is given the data,
but we don't inform the machine what it is.
The machine will have to figure it out itself and make sense of the data
that it was given. An example would be cohorting customers
based on purchase behavior without being told how these
groups should be categorized. Lastly, for reinforcement
learning, just like Pavlov's theory, the machine is either
given penalties or rewards in order to make the best
decision based on this reward system.
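To make the first two types a bit more concrete, here is a minimal sketch in Python, assuming scikit-learn is available; the tiny spam and purchase datasets are made-up illustrations, not examples from the talk.

    # Minimal illustration of supervised vs. unsupervised learning (hypothetical data).
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.cluster import KMeans

    # Supervised learning: emails come with labels (1 = spam, 0 = not spam).
    emails = ["win a free prize now", "meeting at 3pm tomorrow",
              "claim your free reward", "project update attached"]
    labels = [1, 0, 1, 0]
    features = TfidfVectorizer().fit_transform(emails)
    spam_classifier = LogisticRegression().fit(features, labels)

    # Unsupervised learning: purchase data has no labels; KMeans finds the
    # customer cohorts on its own.
    purchases = np.array([[5, 200.0], [6, 220.0], [1, 15.0], [2, 30.0]])
    cohorts = KMeans(n_clusters=2, n_init=10).fit_predict(purchases)
    print(cohorts)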
Then we have deep learning. Deep learning is a subset of machine
learning. It involves neural networks with many layers,
hence the term deep. These deep neural networks
are designed to mimic the way human brains operate, to recognize
patterns and make decisions based on data.
Deep learning especially excels at processing large
amounts of complex, high dimensional data such
as images, sound, and text. From deep
learning comes generative AI, where models can generate new
text or data. This can include images,
audio, videos, and other forms of media or content.
For generating text based on a vast corpus of data,
we have what is called a large language model, or LLM.
It is a type of artificial intelligence system designed to understand,
generate, and interact with human language at
a large scale. These models can grasp the nuances,
context, and complexity of human knowledge or language.
There are numerous limitations to
LLMs, but I'll share just a few of them, so I have five here.
The first one is fairness and bias.
LLMs can amplify biases present in their training data.
Since these models learn from a vast corpus of text,
which may contain bias or discriminatory viewpoints,
the models can produce outputs that reflect these biases.
This issue raises concerns about fairness and the potential
perpetuation of stereotypes. Second is hallucination
or the lack of true understanding. These models
can provide outputs that are plausible sounding but factually incorrect
or nonsensical. Third, training
LLMs requires substantial computational resources,
which makes them very, very expensive to run.
Processing a single page of text requires computations
across billions of parameters, which can result
in high response times, especially for longer input documents.
Fourth is security and misuse. The advanced capabilities
of LLMs can be misused for malicious purposes,
such as generating deceptive content like
deepfakes and fake news, automating spam
or phishing attacks, and creating propaganda. The potential
for misuse raises ethical and security concerns that
need to be addressed to ensure the responsible development
and deployment of these technologies. Lastly,
interpretability and explainability. It is often difficult
to understand or explain why an LLM
produces a specific output. The complexity and
opacity of these models make it challenging to trace
the decision making process, which is a significant issue in
applications where transparency and accountability are
crucial, such as in healthcare,
finance and legal applications. I'll now discuss
what foundation models are and how integral they are in
the realm of AI. Previously, AI was
used and created to solve specific tasks. For example,
an AI application before would be trained using a specific
library to perform a specific action.
But now we have foundation models that have the capability
to generate output encompassing a multitude
of applications and use cases. These are trained on
a wider range of data, billions and trillions of
data points, in order to provide the best outcome,
and with this we are able to apply the model to any
number and variety of tasks. This also isn't
limited to just text, but encompasses other
media like audio, video and images,
unlike LLMs, which are focused mainly on
language understanding and generation.
An example of a foundation model is OpenAI's DALL-E,
which generates images from textual descriptions.
What makes foundation models incredibly powerful is that these
models are trained using unstructured data in an unsupervised
manner. You could build on top of foundation models
too. You could introduce new data to the model to
tune it to do specific tasks, or NLP
(natural language processing) tasks like sentiment analysis
and classification. This is called fine-tuning.
We also have RAG, or retrieval augmented generation,
where you can augment knowledge without changing pretrained
model weights. Usually this external knowledge
source pertains to data related to internal company
knowledge bases. So again, we're not changing anything in
the foundation model itself, but we're simply retrieving the
data from a different source in order to obtain the necessary context
and generate the proper response. You don't
need to fine-tune all the time to get the output you require.
For certain scenarios, you could provide a sentence and ask a question
to existing models. This is called prompting or prompt
engineering. On the right side of the screen you
can see I've also illustrated the different methods based on the
difficulty level of the implementation. The easiest to do
is prompt engineering, and the hardest would be, of course, if you built your own
foundation model. But why even bother customizing
your own foundation models? It's precisely the fact that you can
adapt them to domain-specific language. So for example, in e-commerce,
you would need the model to understand all of the products you want to
sell on the site. You might also want these models to perform
better at really unique tasks specific to your company.
Another reason would be if you want to improve these models' context
and awareness of your own company data. So, for example,
you might want to train your customer service team based on the specific policies
and rules that you have in the company. Let's now
focus on retrieval augmented generation.
So what is RAG? From the name itself,
it's about retrieving relevant context from external knowledge
bases and then augmenting it with your original
query, passing that to the foundation model to generate
an accurate response. There are a number of use
cases for RAG, one of which is being able to improve content
quality to reduce hallucinations with internal sources that
are up to date. Another would be to create a context-based
chatbot for enterprise-related questions. So instead of
sifting through hundreds of company documents or FAQs, it will
now be easier for employees to look up the relevant information based
on their prompt. Lastly, you could integrate this with
online retail by implementing personalized search.
Since the system should know customer purchase behaviors, it could
more accurately provide personalized recommendations to increase
relevance and conversion. There are three different
types of RAG. The first one is naive RAG, which is the easiest
to implement and the most straightforward. There are three steps
involved for this type: indexing, retrieval,
and generation. Indexing is where chunking happens,
and where data is transformed into vector representations through an
embedding model.
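To make these three steps concrete, here is a minimal, self-contained sketch in Python; the toy embedding function, the sample chunks, and the stubbed LLM call are placeholders of our own, not code from an actual RAG framework.

    # Naive RAG in three steps: indexing, retrieval, generation (toy example).
    import numpy as np

    def embed(text):
        # Toy embedding: hashed bag-of-words, normalized. A real system would
        # use an embedding model instead.
        vec = np.zeros(64)
        for word in text.lower().split():
            vec[hash(word) % 64] += 1.0
        return vec / (np.linalg.norm(vec) + 1e-9)

    # 1. Indexing: chunk the documents and store their vector representations.
    chunks = ["Orders are shipped within three business days.",      # hypothetical chunk
              "Returns are accepted within thirty days of purchase."]
    index = np.stack([embed(chunk) for chunk in chunks])

    # 2. Retrieval: embed the query and pick the most similar chunk.
    query = "How long does shipping take?"
    context = chunks[int(np.argmax(index @ embed(query)))]

    # 3. Generation: augment the query with the retrieved context and pass it
    # to the foundation model (stubbed out here).
    prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"
    # answer = llm(prompt)  # call your LLM of choice here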
There are a couple of challenges with naive RAG.
The first one is that it usually has low precision,
which leads to misaligned retrieved chunks, and hallucination
usually happens. Secondly, it has low recall,
which means that it's unable to retrieve all the relevant chunks.
Thirdly, it could have outdated information, which means that
there might be inaccurate retrieval results. And lastly,
the generated responses risk being repetitive.
As for advanced RAG, this was created to solve some
of the shortcomings of naive RAG. In terms of retrieval
quality, there are pre- and post-retrieval strategies
in order to improve quality. Some of these strategies are
sliding windows, fine-grained segmentation, and metadata.
Lastly, there's modular RAG, which is an offshoot of
the previous types, but this time it provides greater versatility
and flexibility. The great thing about modular RAG
is its organization: its structure allows substitution and rearrangement
of modules within the model to fit your requirements.
I'm now handing you over to Josh for the next steps.
So now that we have a better understanding of the concepts involved
in generative AI, large language models,
and even retrieval augmented generation,
let's now talk about a quick example of how to implement this in
production. So there are various services and
solutions available, and here we can see how we're able to use
a managed machine learning service called Amazon SageMaker,
and how we're able to use it to deploy a large language
model in our own cloud environment. Here we
are able to use an SDK called the SageMaker
Python SDK. And inside a notebook
instance environment, or maybe in SageMaker Studio, we're able
to use this to deploy our own self-hosted
large language model in its own inference endpoint.
When we say inference endpoint, we basically have some sort of
web server where the model lives,
and we're able to use that server to perform inference.
That means that our questions and answers
pass through that web server, and that server is
used for our generative AI applications.
In order for us to deploy a model using the SageMaker
Python SDK, we simply use a few
lines of code, and these lines of code include the following.
So here we have our model, and we just call
deploy and specify the initial instance
count as well as the instance type, along
with the other parameters.
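As a rough sketch of what those lines look like with the SageMaker Python SDK, assuming a Hugging Face LLM container; the model ID, container versions, and instance type below are examples of our own, not necessarily the ones used in the demo.

    # Deploying a self-hosted LLM to a SageMaker inference endpoint (illustrative values).
    import sagemaker
    from sagemaker.huggingface import HuggingFaceModel

    role = sagemaker.get_execution_role()

    model = HuggingFaceModel(
        role=role,
        transformers_version="4.26",              # example container versions
        pytorch_version="1.13",
        py_version="py39",
        env={"HF_MODEL_ID": "google/flan-t5-xl",  # example model to host
             "HF_TASK": "text2text-generation"},
    )

    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.2xlarge",            # choose an instance large enough for the model
    )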
Here we can definitely choose a large instance type depending on the type
of model that we're trying to deploy, and of course if we use something
like LangChain, we're able to utilize the large language model deployed inside
that inference endpoint. So if we are to
complete this large language model
setup, of course we need to have some sort of front-end
application. This front-end application then
points to a backend API, and
this backend API server or serverless
system makes use of
the large language model, and it basically processes
the question and then returns a response
back to the front-end application. In some cases you would need
a database, but of course that depends on
your type of application as well as the users using
it.
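As a minimal sketch of how that backend piece might call the deployed endpoint, here is a hypothetical serverless handler using boto3; the endpoint name and payload shape depend on the model container you deployed, so treat them as placeholders.

    # Hypothetical backend (e.g., AWS Lambda) handler that forwards a question
    # to the LLM inference endpoint and returns the answer to the front end.
    import json
    import boto3

    runtime = boto3.client("sagemaker-runtime")

    def handler(event, context):
        question = json.loads(event["body"])["question"]
        response = runtime.invoke_endpoint(
            EndpointName="my-llm-endpoint",          # placeholder endpoint name
            ContentType="application/json",
            Body=json.dumps({"inputs": question}),   # payload shape depends on the container
        )
        answer = json.loads(response["Body"].read())
        return {"statusCode": 200, "body": json.dumps(answer)}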
What if we have different files, let's say PDF files, and we store them inside
a folder or directory on our machine called sources,
and we decide to upload them to a
storage bucket like S3? Here we use a command
to upload the different files
from our local directory up to an
S3 bucket.
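The exact command isn't shown here, but something like the AWS CLI's aws s3 cp with the --recursive flag would do the job; a Python equivalent with boto3 might look like the following, with a placeholder bucket name.

    # Upload every PDF in the local "sources" directory to an S3 bucket (placeholder name).
    import os
    import boto3

    s3 = boto3.client("s3")
    bucket = "my-rag-demo-bucket"  # placeholder bucket name

    for filename in os.listdir("sources"):
        if filename.lower().endswith(".pdf"):
            s3.upload_file(os.path.join("sources", filename), bucket, f"sources/{filename}")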
Now, once this S3 bucket has these PDF files, we're able
to use different solutions and services, for example
Textract, as well as LangChain, SageMaker, and
FAISS, where we're able to extract
the needed info from these PDF files and
convert them into a format which is easily processed
with what we have in LLMs. So of course
this time we're no longer limited
to what the large language model has to offer.
We're now able to utilize what's also stored
inside the documents. Let's say, using LangChain,
we now have a new chain which makes
use of document information
extracted and processed from the different files.
If we were to ask it some questions,
it will now utilize the content of
the PDF files, and we will now have a set of answers
which is definitely more relevant compared to the
previous setup where we didn't use the PDF files
at all.
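A condensed sketch of that kind of document-aware chain, assuming a classic (0.0.x) LangChain setup with FAISS and a SageMaker endpoint, could look like the following; the file name, endpoint name, embedding model, and payload handling are all illustrative placeholders rather than the exact code from the demo.

    # Build a retrieval-augmented QA chain over the extracted PDF content (illustrative sketch).
    import json
    from langchain.document_loaders import PyPDFLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.vectorstores import FAISS
    from langchain.chains import RetrievalQA
    from langchain.llms import SagemakerEndpoint
    from langchain.llms.sagemaker_endpoint import LLMContentHandler

    # Indexing: load, chunk, embed, and store the document content in FAISS.
    documents = PyPDFLoader("sources/policies.pdf").load()          # placeholder file
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    vectorstore = FAISS.from_documents(splitter.split_documents(documents),
                                       HuggingFaceEmbeddings())

    # Wrap the self-hosted LLM endpoint so LangChain can call it.
    class ContentHandler(LLMContentHandler):
        content_type = "application/json"
        accepts = "application/json"

        def transform_input(self, prompt, model_kwargs):
            return json.dumps({"inputs": prompt, **model_kwargs}).encode("utf-8")

        def transform_output(self, output):
            # Response parsing depends on the model container.
            return json.loads(output.read())[0]["generated_text"]

    llm = SagemakerEndpoint(
        endpoint_name="my-llm-endpoint",   # the endpoint deployed earlier (placeholder name)
        region_name="us-east-1",
        content_handler=ContentHandler(),
    )

    # Retrieval + generation: fetch the relevant chunks, then ask the LLM.
    qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=vectorstore.as_retriever())
    print(qa_chain.run("What does the return policy say?"))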
Going back to the question: can we build a generative AI-powered application in
24 hours? Definitely yes.
But once we have to implement a RAG-powered
generative AI application, of course it may
take a bit more time, because you would need to set
up the necessary resources and services,
as well as make sure that the
data and the files needed for this RAG
setup are in place. We hope you learned something. So thank you
again, and have a great day ahead.