Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. Today we'll be discussing
how to unleash the power of retrieval augmented generation
to enhance AI-powered applications.
We'll quickly introduce ourselves. I am Sophie Sullivan,
and I am the director of operations for Edamama.
I have over nine years of experience in e-commerce,
fintech, and retail in the Philippines.
Over the years I have built my expertise in process
management, AI, and everything else in between.
And I am Joshua Arvin Lat, the chief technology officer of
NuWorks Interactive Labs. I am also an AWS Machine Learning
Hero, and I am the author of the following books.
Here on the screen we can see the three books I've written so far these past
three years. The first one is Machine Learning with Amazon
SageMaker Cookbook, the second one is Machine Learning
Engineering on AWS, and finally Building and
Automating Penetration Testing Labs.
So right now we are probably wondering what
this talk is all about. And it's about generative AI.
Of course we'll dive deeper into retrieval augmented
generation later. But no AI talk
is complete without a few examples of how AI really works.
So here in our chat playground, we basically ask
the generative AI service, what is the meaning
of life? Our generative AI service simply answers that
the meaning of life is a subjective question that has
been debated throughout time, and that different people have different beliefs
and perspectives on this matter. As you can see,
the generative AI service answered our question, and of
course the answer is basically its own interpretation of
what the meaning of life is. Now, let's try a different
example. This time we have a text input,
and the generative AI tool gives us
an image response instead. So here we input
the prompt. Here the generative AI service simply
returns how it interprets our prompt in the
form of an image. Now let's try a similar
example, but this time let's input the prompt "cat flying
with wings." So even if it's not really possible at this point
in time for a cat to have wings, the generative
AI solution is still able to provide us with an image.
It generated an image with a cat, and of course this cat has wings,
and it basically lets us know that this cat can probably fly
because it has wings. Then finally,
let's replace the word cat with dog. And here,
surprisingly, we have an image of a dog with wings,
and it's also flying. There are a lot of different possible
applications of generative AI, and you'll
be surprised at how the recent
innovations and findings have
helped this field progress further these
last couple of months. So before we start, I
want to pose a question to everyone. Do you think
you can build a generative AI-powered application in
just 24 hours? If you were to deploy your
own self-hosted LLM, then yes, it's possible.
However, setting up a RAG-powered generative AI system
may take longer. We'll see this in action later in our presentation.
We can categorize and group AI into artificial narrow
intelligence, artificial general intelligence, and artificial
superintelligence. Currently, what we have is ANI.
We're still in the infancy of AI, wherein it hasn't advanced yet
to the point where usage is widespread. It's definitely a stage where
there are already some limited practical applications of it, but there's
still room to further improve in terms of integrating AI in
a broader way. AGI is what we are hoping to achieve,
where AI is used across a broad and wide range of domains.
Lastly, there is ASI, or artificial superintelligence. This means that
AI has surpassed even human intelligence, to the point that technology can
even solve all the world's problems. Again, we're still in ANI.
Then, after knowing the different types of AI, we also need to know
how to get there. Machine learning is a subset
of AI, where the focus is on building systems
that can learn from data. ML involves
deciphering patterns and trends in order to make predictions
or decisions based on its learning. Under ML,
there are three main types: supervised learning,
unsupervised learning, and reinforcement learning.
Supervised learning involves labeling data, or providing
data the machine can learn from. One of the most
common examples is identifying whether an email is spam.
As for unsupervised learning, the machine is given the data,
but we don't inform the machine what it is.
The machine will have to figure it out itself and make sense of the data
that it was given. An example would be cohorting customers
based on purchase behavior without being told how these
groups should be categorized. Lastly, for reinforcement
learning, just like Pavlov's theory, the machine is either
given penalties or rewards in order to make the best
decision based on this reward system.
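To make the first two types a bit more concrete, here is a minimal sketch in Python, assuming scikit-learn is available; the tiny spam and purchase datasets are made-up illustrations, not examples from the talk.

    # Minimal illustration of supervised vs. unsupervised learning (hypothetical data).
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.cluster import KMeans

    # Supervised learning: emails come with labels (1 = spam, 0 = not spam).
    emails = ["win a free prize now", "meeting at 3pm tomorrow",
              "claim your free reward", "project update attached"]
    labels = [1, 0, 1, 0]
    features = TfidfVectorizer().fit_transform(emails)
    spam_classifier = LogisticRegression().fit(features, labels)

    # Unsupervised learning: purchase data has no labels; KMeans finds the
    # customer cohorts on its own.
    purchases = np.array([[5, 200.0], [6, 220.0], [1, 15.0], [2, 30.0]])
    cohorts = KMeans(n_clusters=2, n_init=10).fit_predict(purchases)
    print(cohorts)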
Then we have deep learning. Deep learning is a subset of machine
learning. It involves neural networks with many layers,
hence the term deep. These deep neural networks
are designed to mimic the way human brains operate, to recognize
patterns and make decisions based on data.
Deep learning especially excels at processing large
amounts of complex, high dimensional data such
as images, sound, and text. From deep
learning comes generative AI, where models can generate new
text or data. This can include images,
audio, videos, and other forms of media or content.
For generating text based on a vast corpus of data,
we have what is called a large language model, or LLM.
It is a type of artificial intelligence system designed to understand,
generate, and interact with human language at
a large scale. These models can grasp the nuances,
context, and complexity of human knowledge or language.
There are numerous limitations to
LLMs, but I'll share just a few of them, so I have five here.
The first one is fairness and bias.
LLMs can amplify biases present in their training data.
Since these models learn from a vast corpus of text,
which may contain bias or discriminatory viewpoints,
the models can produce outputs that reflect these biases.
This issue raises concerns about fairness and the potential
perpetuation of stereotypes. Second is hallucination
or the lack of true understanding. These models
can provide outputs that are plausible sounding but factually incorrect
or nonsensical. Third, training
LLMs requires substantial computational resources,
which makes them very, very expensive to run.
Processing a single page of text requires computations
across billions of parameters, which can result
in high response times, especially for longer input documents.
Fourth is security and misuse. The advanced capabilities
of LLMs can be misused for malicious purposes,
such as generating deceptive content like
deepfakes and fake news, automating spam
or phishing attacks, and creating propaganda. The potential
for misuse raises ethical and security concerns that
need to be addressed to ensure the responsible development
and deployment of these technologies. Lastly,
interpretability and explainability. It is often difficult
to understand or explain why an LLM
produces a specific output. The complexity and
opacity of these models make it challenging to trace
the decision making process, which is a significant issue in
applications where transparency and accountability are
crucial, such as in healthcare,
finance and legal applications. I'll now discuss
what foundation models are and how integral they are in
the realm of AI. Previously, AI was
used and created to solve specific tasks. For example,
an AI application before would be trained using a specific
library to perform a specific action.
But now we have foundation models that have the capability
to generate output encompassing a multitude
of applications and use cases. These are trained on
a wider range of data, billions and trillions of
data points, in order to provide the best outcome,
and with this we are able to apply the model to any
number and variety of tasks. This also isn't
limited to just text, but encompasses other
media like audio, video and images,
unlike LLMs, which are focused mainly on
language understanding and generation.
An example of a foundation model is OpenAI's DALL-E,
which generates images from textual descriptions.
What makes foundation models incredibly powerful is that these
models are trained using unstructured data in an unsupervised
manner. You could build on top of foundation models
too. You could introduce new data to the model to
tune it to do specific tasks, or NLP
(natural language processing) tasks like sentiment analysis
and classification. This is called fine-tuning.
We also have RAG, or retrieval augmented generation,
where you can augment knowledge without changing pretrained
model weights. Usually this external knowledge
source pertains to data related to internal company
knowledge bases. So again, we're not changing anything in
the foundation model itself, but we're simply retrieving the
data from a different source in order to obtain the necessary context
and generate the proper response. You don't
need to fine-tune all the time to get the output you require.
For certain scenarios, you could provide a sentence and ask a question
to existing models. This is called prompting or prompt
engineering. On the right side of the screen you
can see I've also illustrated the different methods based on the
difficulty level of the implementation. The easiest to do
is prompt engineering, and the hardest would be, of course, if you built your own
foundation model. But why even bother customizing
your own foundation models? It's precisely the fact that you can
adapt them to domain-specific language. So for example, in e-commerce,
you would need the model to understand all of the products you want to
sell on the site. You might also want these models to perform
better at really unique tasks specific to your company.
Another reason would be if you want to improve these models' context
and awareness of your own company data. So, for example,
you might want to train your customer service team based on the specific policies
and rules that you have in the company. Let's now
focus on retrieval augmented generation.
So what is RAG? From the name itself,
it's about retrieving relevant context from external knowledge
bases and then augmenting it with your original
query, passing that to the foundation model to generate
an accurate response. There are a number of use
cases for RAG, one of which is being able to improve content
quality to reduce hallucinations with internal sources that
are up to date. Another would be to create a context-based
chatbot for enterprise-related questions. So instead of
sifting through hundreds of company documents or FAQs, it will
now be easier for employees to look up the relevant information based
on their prompt. Lastly, you could integrate this with
online retail by implementing personalized search.
Since the system should know customer purchase behaviors, it could
more accurately provide personalized recommendations to increase
relevance and conversion. There are three different
types of RAG. The first one is naive RAG, which is the easiest
to implement and the most straightforward. There are three steps
involved for this type: indexing, retrieval,
and generation. Indexing is where chunking happens,
and where data is transformed into vector representations through an
embedding model.
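To make these three steps concrete, here is a minimal, self-contained sketch in Python; the toy embedding function, the sample chunks, and the stubbed LLM call are placeholders of our own, not code from an actual RAG framework.

    # Naive RAG in three steps: indexing, retrieval, generation (toy example).
    import numpy as np

    def embed(text):
        # Toy embedding: hashed bag-of-words, normalized. A real system would
        # use an embedding model instead.
        vec = np.zeros(64)
        for word in text.lower().split():
            vec[hash(word) % 64] += 1.0
        return vec / (np.linalg.norm(vec) + 1e-9)

    # 1. Indexing: chunk the documents and store their vector representations.
    chunks = ["Orders are shipped within three business days.",      # hypothetical chunk
              "Returns are accepted within thirty days of purchase."]
    index = np.stack([embed(chunk) for chunk in chunks])

    # 2. Retrieval: embed the query and pick the most similar chunk.
    query = "How long does shipping take?"
    context = chunks[int(np.argmax(index @ embed(query)))]

    # 3. Generation: augment the query with the retrieved context and pass it
    # to the foundation model (stubbed out here).
    prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"
    # answer = llm(prompt)  # call your LLM of choice here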
There are a couple of challenges with naive RAG.
The first one is that it usually has low precision,
which leads to misaligned retrieved chunks, and hallucination
usually happens. Secondly, it has low recall,
which means that it's unable to retrieve all the relevant chunks.
Thirdly, it could have outdated information, which means that
there might be inaccurate retrieval results. And lastly,
the generated responses risk being repetitive.
As for advanced RAG, this was created to solve some
of the shortcomings of naive RAG. In terms of retrieval
quality, there are pre- and post-retrieval strategies
in order to improve quality. Some of these strategies are
sliding windows, fine-grained segmentation, and metadata.
Lastly, there's modular RAG, which is an offshoot of
the previous types, but this time it provides greater versatility
and flexibility. The great thing about modular RAG
is its organization: its structure allows substitution and rearrangement
of modules within the model to fit your requirements.
I'm now handing you over to Josh for the next steps.
So now that we have a better understanding of the concepts involved
in generative AI, large language models,
and even retrieval augmented generation,
let's now talk about a quick example of how to implement this in
production. So there are various services and
solutions available, and here we can see how we're able to use
a managed machine learning service called Amazon SageMaker,
and how we're able to use it to deploy a large language
model in our own cloud environment. Here we
are able to use an SDK called the SageMaker
Python SDK. And inside a notebook
instance environment, or maybe in SageMaker Studio, we're able
to use this to deploy our own self-hosted
large language model in its own inference endpoint.
When we say inference endpoint, we basically have some sort of
web server where the model lives,
and we're able to use that server to perform inference.
That means that our questions and answers
pass through that web server, and that server is
used for our generative AI applications.
In order for us to deploy a model using the SageMaker
Python SDK, we simply use a few
lines of code, and these lines of code include the following.
So here we have our model, and we just call
deploy and specify the initial instance
count as well as the instance type, along
with the other parameters.
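As a rough sketch of what those lines look like with the SageMaker Python SDK, assuming a Hugging Face LLM container; the model ID, container versions, and instance type below are examples of our own, not necessarily the ones used in the demo.

    # Deploying a self-hosted LLM to a SageMaker inference endpoint (illustrative values).
    import sagemaker
    from sagemaker.huggingface import HuggingFaceModel

    role = sagemaker.get_execution_role()

    model = HuggingFaceModel(
        role=role,
        transformers_version="4.26",              # example container versions
        pytorch_version="1.13",
        py_version="py39",
        env={"HF_MODEL_ID": "google/flan-t5-xl",  # example model to host
             "HF_TASK": "text2text-generation"},
    )

    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.2xlarge",            # choose an instance large enough for the model
    )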
Here we can definitely choose a large instance type depending on the type
of model that we're trying to deploy, and of course if we use something
like LangChain, we're able to utilize the large language model deployed inside
that inference endpoint. So if we are to
complete this large language model
setup, of course we need to have some sort of front-end
application. This front-end application then
points to a backend API, and
this backend API server or serverless
system makes use of
the large language model, and it basically processes
the question and then returns a response
back to the front-end application. In some cases you would need
a database, but of course that depends on
your type of application as well as the users using
it.
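As a minimal sketch of how that backend piece might call the deployed endpoint, here is a hypothetical serverless handler using boto3; the endpoint name and payload shape depend on the model container you deployed, so treat them as placeholders.

    # Hypothetical backend (e.g., AWS Lambda) handler that forwards a question
    # to the LLM inference endpoint and returns the answer to the front end.
    import json
    import boto3

    runtime = boto3.client("sagemaker-runtime")

    def handler(event, context):
        question = json.loads(event["body"])["question"]
        response = runtime.invoke_endpoint(
            EndpointName="my-llm-endpoint",          # placeholder endpoint name
            ContentType="application/json",
            Body=json.dumps({"inputs": question}),   # payload shape depends on the container
        )
        answer = json.loads(response["Body"].read())
        return {"statusCode": 200, "body": json.dumps(answer)}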
What if we have different files, let's say PDF files, and we store them inside
a folder or directory on our machine called sources,
and we decide to upload them to a
storage bucket like S3? Here we use a command
to upload the different files
from our local directory up to an
S3 bucket.
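The exact command isn't shown here, but something like the AWS CLI's aws s3 cp with the --recursive flag would do the job; a Python equivalent with boto3 might look like the following, with a placeholder bucket name.

    # Upload every PDF in the local "sources" directory to an S3 bucket (placeholder name).
    import os
    import boto3

    s3 = boto3.client("s3")
    bucket = "my-rag-demo-bucket"  # placeholder bucket name

    for filename in os.listdir("sources"):
        if filename.lower().endswith(".pdf"):
            s3.upload_file(os.path.join("sources", filename), bucket, f"sources/{filename}")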
Now, once this S3 bucket has these PDF files, we're able
to use different solutions and services, for example
Textract, as well as LangChain, SageMaker, and
FAISS, where we're able to extract
the needed info from these PDF files and
convert them into a format which is easily processed
with what we have in LLMs. So of course
this time we're no longer limited
to what the large language model has to offer.
We're now able to utilize what's also stored
inside the documents. Let's say, using LangChain,
we now have a new chain which makes
use of document information
extracted and processed from the different files.
If we were to ask it some questions,
it will now utilize the content of
the PDF files, and we will now have a set of answers
which is definitely more relevant compared to the
previous setup where we didn't use the PDF files
at all.
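A condensed sketch of that kind of document-aware chain, assuming a classic (0.0.x) LangChain setup with FAISS and a SageMaker endpoint, could look like the following; the file name, endpoint name, embedding model, and payload handling are all illustrative placeholders rather than the exact code from the demo.

    # Build a retrieval-augmented QA chain over the extracted PDF content (illustrative sketch).
    import json
    from langchain.document_loaders import PyPDFLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.vectorstores import FAISS
    from langchain.chains import RetrievalQA
    from langchain.llms import SagemakerEndpoint
    from langchain.llms.sagemaker_endpoint import LLMContentHandler

    # Indexing: load, chunk, embed, and store the document content in FAISS.
    documents = PyPDFLoader("sources/policies.pdf").load()          # placeholder file
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    vectorstore = FAISS.from_documents(splitter.split_documents(documents),
                                       HuggingFaceEmbeddings())

    # Wrap the self-hosted LLM endpoint so LangChain can call it.
    class ContentHandler(LLMContentHandler):
        content_type = "application/json"
        accepts = "application/json"

        def transform_input(self, prompt, model_kwargs):
            return json.dumps({"inputs": prompt, **model_kwargs}).encode("utf-8")

        def transform_output(self, output):
            # Response parsing depends on the model container.
            return json.loads(output.read())[0]["generated_text"]

    llm = SagemakerEndpoint(
        endpoint_name="my-llm-endpoint",   # the endpoint deployed earlier (placeholder name)
        region_name="us-east-1",
        content_handler=ContentHandler(),
    )

    # Retrieval + generation: fetch the relevant chunks, then ask the LLM.
    qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=vectorstore.as_retriever())
    print(qa_chain.run("What does the return policy say?"))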
Going back to the question: can we build a generative AI-powered application in
24 hours? Definitely yes.
But once we have to implement a RAG-powered
generative AI application, of course it may
take a bit more time, because you would need to set
up the necessary resources and services,
as well as make sure that the
data and the files needed for this RAG
setup are in place. We hope you learned something. So thank you
again, and have a great day ahead.