Transcript
Thanks for joining my session. My name is Samuel Baruffi. I am a solutions architect with AWS, and I'm very excited to talk about Vectorizing into the Future: AWS retrieval augmented generation systems using large language models. So, a quick agenda for today will be the following. We're going to quickly talk about what foundational models and large language models are. Then we're going to talk about some of the capabilities that are very easy and important to use when it comes to generative AI.
Then we're going to talk about some limitations of those foundational models. Those models are amazing. They have revolutionized, and are still revolutionizing, many, many industries across the world, but they have limitations. So we're going to talk about what those limitations are, and we're going to talk about potential solutions, especially using retrieval augmented generation, or RAG for short. Then we're going to talk about what types of databases can help us, you know, improve those foundational models with RAG. So we're going to go through the list of currently supported vector database offerings on AWS. We're going to explain at a high level the capabilities and the differentiators across those offerings.
And then after that we're going to talk about Amazon Bedrock, which is a managed generative AI service on AWS that allows you to very easily consume different foundational models: image generation, text to text, and also embeddings. And then we're going to talk about Knowledge Bases for Amazon Bedrock, which brings powerful RAG systems into the Bedrock ecosystem and allows users to very easily configure retrieval augmented generation systems using Bedrock foundational models and managed AWS vector databases. So we're going to talk about how those two worlds can come together to really empower a lot of companies and users to create very powerful generative AI solutions. And then at the end, we're going to do a quick demo just showcasing the capabilities of Bedrock and OpenSearch.
So without further ado, let's get started. So what are foundational models? Before transformers and generative AI, traditional machine learning models were trained and deployed for specific tasks. You might have some models for a specific generation task, some models that could do Q&A or bots, some models that could do some type of prediction. They were really task-specific models, and you needed to deploy all these different models to achieve the combination, the collection, of different tasks. Thinking quickly about traditional machine learning models: you'd have a lot of labeled data, and you'd train those models on that specific labeled data, right? With generative AI and transformers, foundational models enable users to do all of those tasks within a single model that has been trained on unlabeled data. So foundational models are sometimes also referred to as general models: they have good word representations and can do a lot of different tasks that, in the past, you would have needed to select different models for. It is very powerful because with a single model you can now perform a combination of different tasks that in the past wouldn't have been possible.
So generative AI you can use for many, many different use cases. Here it's just demonstrated in four different categories, the capabilities of generative AI. You can enhance customer experience by having, you know, agent assistance or personalizations or chatbots that will help enhance your customer experience. You can also help boost employee productivity with conversational search. Let's say you have a vast amount of internal data and you want to make it very easy for users internally to consume that data to improve productivity; foundational models can help solve that problem very well. You can also improve business operations. So if you're doing a lot of document processing that maybe before was done by manual labor, you can use those foundational models for things like entity extraction, document processing, or generation of documents. And then, of course, creativity. With Stable Diffusion models you can create many different images; you can do video enhancements; you can create music. So those generative AI models are not only for text generation: you can generate images and videos. And this is a very fast-paced, evolving technology, so what is possible today might advance very quickly in the near future. The text-to-text models are really, really powerful today. Images have become very powerful. And now we can see that video generation is just starting to get more powerful than ever.
So what does AWS offer in terms of generative AI? AWS is very quickly growing the list of services and capabilities that support customers using generative AI. We have Amazon SageMaker, which is the platform for any machine learning and AI requirements: training, inference, evaluation, data ingestion, data cleaning, you name it. When it comes to generative AI, Amazon SageMaker has a foundation model hub called JumpStart, where you can deploy many, many different foundational models within SageMaker. SageMaker is going to deploy the infrastructure for you, and it's going to manage that infrastructure.
But then we also have Amazon Bedrock, which is a completely managed service with a pay-as-you-go approach, where you can select from a variety of different model providers and models within those providers. We're going to look at some of the available models in a couple of slides. And Amazon has also done a lot of innovation at the hardware level. You can see Amazon EC2 Trn1, the Trainium-based training instances, which are instances that have proprietary, innovative accelerators for machine learning training from Amazon that really optimize cost performance for companies that want to train their own models. Those could be foundational models or traditional machine learning models. But we also have Amazon EC2 Inf2, which is short for Inferentia2, a chip that is optimized for accelerating inference from your machine learning models. Those could be foundational models or any other type of model. And then, last but not least, Amazon CodeWhisperer, which is a generative AI-powered coding assistant that helps developers with code completion and security scans. You can chat with your code and receive recommendations and help with fixing bugs and so forth. So those are the things that AWS offers for generative AI capabilities.
And, you know, there is a lot more that goes within those services in terms of functionality and features. As powerful as those foundational models really are, there are limitations to large language models. And, you know, the terms large language model and foundational model are sometimes used interchangeably, but large language models are just models that can generate text or embeddings, and they really made it possible to use generative AI as we know it today. But what are some of those limitations? So, first of all, there is really limited contextual understanding. Because the model has been pre-trained, it only knows information up to the date it was trained, and it's not going to know proprietary, you know, private information. So it has limited contextual understanding of what you are asking. You might be asking some question that is ambiguous, and it might have, you know, a contextual limitation in that sense. It also lacks domain-specific knowledge. So if you work for company A, and company A has a lot of private documentation that was not on the Internet, or even if it was on the Internet, the model might not be an expert on that specific domain. So they are known to not be super good in specific domains, especially if those domains don't have a lot of data on the Internet, which is what most of those models get trained on.
So this is a big one: lack of explainability and interpretability. It's very common that those large language models might hallucinate. And hallucination just means that a response, an output generated from one of those models, states inaccurate, not factually correct information, right? And there is very little explainability of why that information came to be. The way those models work is by just predicting the next word, and they might just output a lot of data that is not factually accurate. And it's really hard to know why they have done that. So they lack explainability and interpretability. And again, inaccurate information is what we just described: you might ask a question, and the model might give you an answer that sounds very, very confident that it is correct, but in fact it's just a made-up answer that is neither accurate nor factual. So with that said,
knowing the limitations, what are the solutions we can put in place to help solve this problem? So, there is something called vector embeddings. And what are vector embeddings? Vector embeddings are created using these foundational models. Embeddings are semantic representations of words, translated into mathematical vectors, vectors of floats. An embedding model is just a large language model that is able to convert text into an array of float numbers, a vector. So you can think about a user inputting, you know, "New York" and running it through an embedding model; the vector representation of New York might be the one you see here. There are different dimensions of vector embeddings. The bigger the dimension, the more data and the more float numbers you're going to have in the vector array.
And why are vector embeddings important? Because with those numbers, with these mathematical arrays of float numbers, they carry the semantic understanding behind the text that you are embedding. We're going to talk in a moment about why that matters. But it's really important if you have, you know, terabytes of data that you want to store and you want to very easily retrieve based on semantic understanding. You're not doing just an exact-match search; you're asking a question, and that question might be related to the context in your text. That is also known as semantic search. So the numbers carry a representation of the text itself.
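To make that concrete, here is a minimal sketch of generating an embedding through the Bedrock runtime in Python with boto3; the region and the Titan model ID are assumptions for illustration, not details from this talk:

    import json
    import boto3

    # Bedrock runtime client; the region here is an assumption for this sketch
    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    # Ask the Titan text embeddings model to vectorize a piece of text
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": "New York"}),
    )

    payload = json.loads(response["body"].read())
    vector = payload["embedding"]  # an array of floats
    print(len(vector), vector[:5])  # dimension of the vector, first few values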
Now, when you have those types of limitations in large language models that we just described, one of the most common and best approaches to solve them is to add the ability to retrieve context from your vector space: the text chunks associated with the matching vectors are added as context to your large language model.
But one of the challenges, once you create all these embeddings, let's say you have multiple documents internally and you want to translate all those documents, maybe PDFs, into vectors: what do you do with those vectors? And here is where vector databases play a big role. You want to make sure you can store those vector representations, those vector embeddings, in a database. And after you store them in the database, you have the ability to retrieve, by doing semantic search, chunks of text that are similar to the question or topic you are trying to retrieve. So how does a vector database, or this vector embedding system, work?
If you look at this diagram, you're going to have some raw data. It could be images, it could be documents, it could be audio. For the sake of simplicity, for today's presentation, let's just focus on text. So let's say you have a Word document and you want to create embeddings, which behind the scenes are going to create the vectors, the arrays of floats, for you. What do you do? You chunk that document into different pieces, because there are limits on how much text you can put into a single vector, and that of course depends on the embedding model, the foundation embedding model, that you use. Once you have created the chunks, you go to the model and you say: hey, here is the chunk of text, can you create a dense vector encoding for me? That is where it creates the vector embedding, returning an array of float numbers for you, and you have your vector embeddings.
You can also create sparse vector encodings, which are a different, more optimized way to perform retrieval. Once you have those vectors, you can store them in a database, and we're going to talk about some databases that AWS offers with the ability to store those vectors. And then, finally, you can build applications that can query your database using semantic understanding, using techniques like k-NN and a few other ones, so that you can just ask a question and find the closest similar vectors to the query that you've provided. After that, you retrieve the matching chunks from your database; the stored vectors map back to their text, and then you can consume that text in the text-to-text foundational models that you might have available.
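As a minimal sketch of that retrieval step, assuming the chunks and their embeddings are already in memory (a real system would use one of the vector databases discussed next), the closest chunks can be ranked with cosine similarity:

    import numpy as np

    def cosine_similarity(a, b):
        # Similarity of two embedding vectors; 1.0 means identical direction
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def retrieve(query_vector, chunks, embeddings, k=3):
        # Rank every stored chunk by similarity to the query embedding
        scores = [cosine_similarity(query_vector, e) for e in embeddings]
        top = np.argsort(scores)[::-1][:k]
        return [chunks[i] for i in top]

    # The retrieved chunks are then prepended as context to the LLM prompt:
    # prompt = "Context:\n" + "\n".join(top_chunks) + "\n\nQuestion: " + question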
Now let's talk about the capabilities and databases that AWS offers you for storing those vectors.
There is a wide array of databases that AWS provides that have vector capabilities; you can see them here on the list. We are going to go through most of them, and I'm just going to talk at a high level about why and how they are different from each other. We're going to have search engines like OpenSearch. We're going to have relational databases like Aurora PostgreSQL and RDS PostgreSQL. You're going to have document databases like DocumentDB, in-memory databases like MemoryDB, and graph databases like Neptune. And all of those databases now have capabilities to run and store vector functionality.
So let's start with our first database. Amazon Aurora is a managed relational database on AWS. The Amazon Aurora PostgreSQL flavor now has the capability to run vectors using an open source extension called pgvector. What it allows you to do is have vector embeddings stored in your relational database. So if you're already storing your data using a relational approach and you just want to store an additional vector representation of the data, you can install pgvector on both Amazon Aurora and the RDS PostgreSQL flavor. And once you store those embeddings, it supports different algorithms such as k-NN, ANN, HNSW, and IVFFlat. Those are just different approaches and solutions for how to retrieve the closest similar chunks of embeddings and text for you. And for PostgreSQL apps, the good thing is you don't need to make any driver change. You can literally just install the extension on Amazon Aurora or RDS PostgreSQL and continue to use your database. So this is a very good solution for existing PostgreSQL users, or any users that prefer a relational database.
It's really powerful, and there are a lot of integrations. So if you have an ML background but you are focused on relational databases, I would recommend taking a look at Amazon Aurora with pgvector. And talking about pgvector: pgvector is an open source PostgreSQL extension that is designed for efficient vector similarity search, perfect for leveraging machine learning with your databases. It supports storing vector data along with your traditional data types while maintaining PostgreSQL's robustness features such as ACID compliance and point-in-time recovery. pgvector handles exact and approximate nearest neighbor search and accommodates various distance measures like L2, inner product, and cosine distance. Those are just different mathematical expressions that are going to perform the semantic similarity search for you. As you can see here, pgvector with Aurora is also integrated with Amazon Bedrock Knowledge Bases; we're going to talk about that in a moment. You have configurable recall rates using parameters like HNSW ef_search and IVFFlat probes. The good thing about pgvector is that it can scale to support over 1 billion vectors, and it can support vectors with up to 16,000 dimensions. So if you have relational databases and you want to store vectors, this could be the place you go.
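As a minimal sketch of what that looks like with pgvector, assuming a hypothetical items table and placeholder connection details (none of this comes from the talk itself):

    import psycopg2

    # Placeholder connection details for an Aurora PostgreSQL endpoint
    conn = psycopg2.connect(host="my-aurora-endpoint", dbname="docs",
                            user="admin", password="...")
    cur = conn.cursor()

    # One-time setup: enable the extension, create a table with a vector column
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    cur.execute("CREATE TABLE IF NOT EXISTS items "
                "(id serial PRIMARY KEY, chunk text, embedding vector(1536));")

    # Nearest-neighbor retrieval; <-> is pgvector's L2 distance operator
    query_vector = [0.1] * 1536  # placeholder: the embedded user question
    cur.execute("SELECT chunk FROM items ORDER BY embedding <-> %s::vector LIMIT 5;",
                ("[" + ",".join(str(x) for x in query_vector) + "]",))
    for (chunk,) in cur.fetchall():
        print(chunk)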
Next, let's talk about a very powerful service, which is Amazon OpenSearch. Amazon OpenSearch is a NoSQL database that has been built from the beginning for scalability, as a distributed database for search, so you can use a search and analytics engine on top of OpenSearch. You have different types of deployment for OpenSearch. You can have the managed service, which manages the different instances behind the scenes for you. But you also have the capability to deploy OpenSearch Serverless, where you don't need to manage any servers at all; the service abstracts that away from you. OpenSearch also has the capability to store vectors using the k-NN plugin. It also supports different algorithms such as k-NN, ANN, HNSW, and IVFFlat. So you can see that, similar to Aurora PostgreSQL, OpenSearch has similar algorithm capabilities. And if you have DynamoDB tables, you can actually use zero-ETL from DynamoDB to move the data into OpenSearch Service, and you can vectorize that data as well. So who is OpenSearch Service a good fit for? If you are already an OpenSearch user, or if you prefer NoSQL, and you want to do hybrid search as well. Let's say you have a piece of text and you want to search both on a keyword or field from the text and also use the vector semantic capability; OpenSearch supports that for you.
I really like OpenSearch, and we'll do a demo later on, because you can very easily and cost-efficiently deploy an OpenSearch Serverless vector database that will, behind the scenes, manage all the index sharding and manipulation of the data for you, and it can scale to over a billion vectors with very high performance, with the same dimensionality as Aurora. You can also configure recall rates via parameters like ef_search. And similar to Amazon Aurora, OpenSearch is one of the main vector databases on AWS and integrates very well with Knowledge Bases on Bedrock. OpenSearch also has a plugin called Neural Search that can provide very seamless integration between your text ingestion and the vector embedding creation. It can talk to Bedrock, it can talk to OpenAI, it can talk with Cohere. By using Neural Search, it can automatically do all the generation of the embeddings for you.
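As a minimal sketch of the k-NN capability with the opensearch-py client, with placeholder endpoint, index, and field names (the real demo later uses an index created by Bedrock):

    from opensearchpy import OpenSearch

    # Placeholder endpoint and credentials for an OpenSearch domain or collection
    client = OpenSearch(hosts=["https://my-opensearch-endpoint"],
                        http_auth=("user", "password"))

    # Create an index whose vector field holds k-NN vectors; the dimension
    # must match the embedding model you use
    client.indices.create(index="demo-vectors", body={
        "settings": {"index.knn": True},
        "mappings": {"properties": {
            "vector_field": {"type": "knn_vector", "dimension": 1536},
            "text_field": {"type": "text"},
        }},
    })

    # Semantic search: find the 3 stored vectors closest to the query embedding
    query_vector = [0.1] * 1536  # placeholder: the embedded user question
    results = client.search(index="demo-vectors", body={
        "size": 3,
        "query": {"knn": {"vector_field": {"vector": query_vector, "k": 3}}},
    })
    for hit in results["hits"]["hits"]:
        print(hit["_source"]["text_field"])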
Continuing to the next segment, vector support on AWS is also made available through DocumentDB. DocumentDB is a very fast, cloud-native document database. So again, it's a NoSQL database, one that has MongoDB API compatibility. You have different provisioned deployment options, and it's a managed service. It also supports the same algorithms that I mentioned before: k-NN, ANN, IVFFlat. Through the MongoDB API you can just elevate the capability of your vector search if you're already using DocumentDB or MongoDB. And the good thing about DocumentDB: if you're very familiar with document databases and JSON-specific usage, because document databases are really powerful with JSON, then by just enabling vector capabilities on your DocumentDB and vectorizing that information, it becomes very, very powerful.
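As a rough sketch of what a vector query can look like through the MongoDB-compatible API with pymongo; the connection string and field names are placeholders, and you should check the DocumentDB documentation for the exact vector search syntax supported by your engine version:

    from pymongo import MongoClient

    # Placeholder connection string for a DocumentDB cluster with vector search
    client = MongoClient("mongodb://user:password@my-docdb-endpoint:27017/?tls=true")
    collection = client["docs"]["chunks"]

    # Vector similarity search via the aggregation pipeline; "embedding" is the
    # field that stores the vectors, "chunk" the original text
    query_vector = [0.1] * 1536  # placeholder: the embedded user question
    pipeline = [{
        "$search": {
            "vectorSearch": {
                "vector": query_vector,
                "path": "embedding",
                "similarity": "cosine",
                "k": 5,
            }
        }
    }]
    for doc in collection.aggregate(pipeline):
        print(doc["chunk"])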
And continuing, this is a very interesting service. Amazon MemoryDB for Redis now also has a feature, currently in preview and hopefully very soon becoming generally available (GA), that adds the ability for MemoryDB, which is already a very popular and performant database, to handle vector storage, indexing, and search capabilities. MemoryDB, like the name says, is a database that stores all the data in memory and is Redis API compatible. It's a fully managed service. You can see it supports the different vector search algorithms that we mentioned. It can support vectors with up to 32,000 dimensions. And this is ideal if you have a workload that requires single-digit millisecond latencies and high throughput for your vectors. So let's say you are building a chatbot that should respond really quickly, or you're trying to do retrieval augmented generation that is super fast: MemoryDB might be the best place to look, because of that very powerful capability.
And then, last but not least, you also have Amazon Neptune Analytics. Amazon Neptune is the Amazon graph database, and Amazon Neptune Analytics gives you an analytics-focused, memory-optimized graph database engine. You have different discrete capacity deployments for this database. It supports the HNSW similarity algorithm. The vector dimensions this database supports are much bigger, up to 65,000, and it complements your Amazon Neptune database as an additional capability on top of it. Why would you use Neptune Analytics for your vector database? If you have graph use cases where you need to combine vector search with graph traversals, this would be a very good approach. You can also use the Neptune database with a serverless deployment, but Neptune Analytics only supports discrete capacity levels at this time. So if you're curious to learn more, I know I just covered these databases very quickly; I would highly recommend that you do a quick Google search or check the AWS documentation on how they work.
But now I want to talk about Amazon Bedrock.
I mentioned in the beginning of my presentation that Amazon Bedrock is the easiest way for you to build generative AI applications on AWS. And the amazing thing about Bedrock: it is a completely managed service for GenAI models. You have a choice of multiple models from industry-leading foundational model providers that are available with a single API call. If you want, you can also customize and fine-tune models using your own organization's data. And Bedrock has taken security as job number zero: it has all the encryption capabilities and privacy capabilities, not using your data to train any of those models. So it's an enterprise-grade, secure, and private service. With Amazon Bedrock you have a broad choice of models, as you can see here. This list is as of today, March 30, 2024, as I'm recording this session. Right now there are seven different model providers: AI21 Labs, Amazon, Anthropic, Cohere, Meta, Mistral AI, and Stability AI.
Those models have different capabilities. You're going to have text-to-text models, where it's just a foundational model that you send text to and it returns text back by predicting the next word. But you also have embedding models, such as Amazon Titan Text Embeddings and Amazon Titan Multimodal Embeddings, plus an embedding model from Cohere, Cohere Embed Multilingual. And on top of that, you also have the ability to use Bedrock to generate images, with Stability AI's Stable Diffusion XL 1.0 and also with the Titan Image Generator. It's pay-as-you-go: you pay per token that you consume, and you can choose the models you're going to have access to. The demo that we're going to show you today is Knowledge Bases for Amazon Bedrock.
And this is where I'm trying to bring my whole presentation together in a single place. Knowing the limitations of large language models that I discussed in the beginning, one of the ways you can work around them is by creating a RAG system, retrieval augmented generation. What retrieval augmented generation does is bring pieces of text data into your context before sending it to a foundational model. To be able to do that, the first thing you need is a vector database where you can store all the vector embeddings from your specific domain data. You retrieve the data at query time, that data is mapped back from vectors to text, and then that data is put in as the context of your query to the foundational model. It can be very cumbersome to build this complete RAG solution yourself. So what Knowledge Bases for Amazon Bedrock does is automate all the ingestion and retrieval for you in this RAG system. You connect your knowledge base with a database; there are currently several supported vector databases, which I'm going to show in a moment. Then you select an embedding model. Then you put your data on S3, the Simple Storage Service.
And as soon as the data lands in that S3 bucket, you can sync the knowledge base, which behind the scenes is going to create the embeddings and store them in the database. Then, when you make a call to Knowledge Bases for Bedrock, you can decide if that call just retrieves the data from your database, or if you want to do retrieve and generate, which means: retrieve the data from my vector database, send it to the foundation model, generate a response with my context-aware information, and then give the answer back to the customer. And you can select the model that you want to use as the foundational model, and the embedding model as well.
So Knowledge Bases on Bedrock currently has support for different databases. Right now it supports the vector engine for OpenSearch Serverless, Redis Enterprise Cloud, Pinecone, and Amazon Aurora. There are more capabilities coming soon; for example, MongoDB is coming as one of the supported vector databases on Amazon Bedrock, and hopefully in the future more of the databases that I talked about today will also be available on Bedrock. And the last thing I want to show is that, with Knowledge Bases for Bedrock,
you can use a single API call to do the retrieval and generation. If you look at this diagram, with a single API call, at number one you have a search query. Let's give an example: you are asking a proprietary question about your company, right? And, you know, the foundation model doesn't know the answer. So you do a search query; what Bedrock knowledge base does is realize that it needs to do a retrieval against your vector database. So at number two it's going to go there and do the retrieval: behind the scenes it's going to call your vector database and retrieve the matching vector embeddings, which then map back from vectors to text. Then, at number four, it's going to send that text as context to your Bedrock foundational model, the text-to-text generation model. And then finally it's going to send back the answer that was generated. And you can see here, soon, you know, it's also going to support S3.
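As a sketch of that single API call in Python with boto3; the region, knowledge base ID, and model ARN here are placeholders, not values from the talk:

    import boto3

    # The Bedrock agent runtime client exposes the Knowledge Bases APIs
    client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

    # One call does the whole loop: embed the query, retrieve the chunks,
    # and generate an answer with the chosen foundational model
    response = client.retrieve_and_generate(
        input={"text": "How many new features and services did AWS launch in 2022?"},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": "KB1234567890",  # placeholder
                "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-instant-v1",
            },
        },
    )
    print(response["output"]["text"])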
And now let's jump in and do a quick demo of Knowledge Bases for Bedrock. Awesome. So let's just jump into the demo. The demo will be very straightforward. I have downloaded some files, the Amazon shareholder letters. You can see here I have the 2019, the 2020, the 2021, and the 2022 Amazon shareholder letters. What I want to show is: I have already created an OpenSearch database, I have linked that database into a Bedrock knowledge base, and I want to show you that it created the vectors automatically from S3. So first, let me show you.
I have S3 here. So I created an S3 bucket, and in that S3 bucket I just uploaded those four files. I could have as many files here as I wanted. And then, of course, I've created an OpenSearch database. This OpenSearch database you can see here is an OpenSearch Serverless database. I have a collection; let me just close these. I have a collection here I call bedrock-sample. So I created this database, and there is a dashboard also created for this database that I'm going to show in a moment. But the interesting part here is if I go on Bedrock, which is the service that allows an easy and scalable way to create generative AI applications.
The first thing we're going to do is ask a very specific question to a foundational model without a RAG system, so without using the knowledge base. You can go here on the text playground. First, let's look for something very specific; I think in the 2020 letter there is a mention. Let me just find the mention. There is a mention of 3,000-something, just bear with me, let me see if I can find it in the document. It's somewhere here; I just need to find it: AWS has released over 3,000 features and services. I don't think it's highlighting here, so just bear with me. Let me download this file. Let me open the file here, and I think if I search now for "features", here, so 33 hits, I was searching wrong. You can see here: AWS continues to deliver new capabilities, with over 3,300 new features and services launched in 2022.
So what we're going to ask the foundational model, without RAG, is this. Let's go here, let's go on Bedrock. Let's choose one of the Bedrock models from Anthropic. Let's go with Claude Instant, because I know this is a fast model. So let's ask the model: how many new features and services did AWS launch in 2022? There's the question. I'm going to run the model, and the model says: I do not have... let's just wait for it to finish. And it says: I do not have the exact count of new features and services launched in 2022. So what this means is that the foundational model itself doesn't have that information, right? But the document that we have knows it. So how do you put these two pieces together?
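For reference, asking the model directly with no RAG looks roughly like this via the Bedrock runtime API; the region is an assumption, and the prompt format shown is Claude's pre-Messages text-completion format:

    import json
    import boto3

    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    # Plain model call with no retrieved context
    body = json.dumps({
        "prompt": "\n\nHuman: How many new features and services did AWS"
                  " launch in 2022?\n\nAssistant:",
        "max_tokens_to_sample": 300,
    })
    response = bedrock.invoke_model(modelId="anthropic.claude-instant-v1", body=body)
    print(json.loads(response["body"].read())["completion"])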
Well, the first thing we can do is go to the knowledge base. Let me show you how I've created a knowledge base; I've already created one on Bedrock. And let me show you how the knowledge base works, so let me just scroll. You create a knowledge base, then you choose a data source. In this case the data source is S3, and you can see that is the S3 bucket I showed you; if I go here, you can see we have these files. So you choose the data source first, which is just an S3 bucket. Then you choose the model that you want Bedrock knowledge base to use to create the vectors for you. We are using a model that is offered within Bedrock, which is the Titan Embeddings model version 1.2. Then, after that, you choose a database where you want to store those vectors, right? You want to have a database where the vectors can be stored so you can retrieve them after the fact. So if you look here, we have a vector database: we're using the vector engine for Amazon OpenSearch Serverless. We have created the index name. OpenSearch works with multiple indexes, and within those indexes you can have a combination of items and documents. And we said: when you create new vectors, please add the vector into the vector field on that item, on that document, and add the text itself into the text field. Because remember, OpenSearch can do hybrid search; in this case we're just going to do semantic search, which is running a similarity algorithm on top of your vectors.
So before I ask a question here, let me show you. This is the OpenSearch dashboard, where you can run some OpenSearch commands to see the data. What is this query going to return? It's just going to return all the different documents, all the different IDs, within that specific index. You can see this index, called bedrock-sample-index-665, is the same index that we set here; if you see here, it's the same vector index, right? And this OpenSearch Serverless vector database contains nothing more than the vectors from the S3 files that we uploaded. You can see here this specific item has a chunk of this file here. So what we can do is just copy any chunk; in this case I've already selected this chunk, and I want to show you how the vector is stored. So if you query this, you can see that it has the index, it has the ID. The sequence number is there because this specific file has been chunked, has been parsed, into multiple chunks, and this is sequence number 13. And here you can see the vector, right? You can see a bunch of numbers; I'm just going to minimize this. But this is the vector; this is where the Titan embedding model has been called to generate this vector. And here is the text.
So what Bedrock knowledge base automatically did for me was copy this chunk, run this chunk of text through my embedding model, and then generate the vector. So this is the vector. So now, what we can do, and you can see here, I think this is the one that I want to show, if I'm not mistaken. Let me see. Yeah, here. This is the chunk that I want to show you that Bedrock knowledge base will automatically retrieve and use to generate an answer for me. Remember, we tried with just the foundation model, and it didn't know, right? But now I have this piece of text, with the vector embedding itself, that has this information. So what we can do: if you go back to Bedrock, you can go on this tab to test usage, and you can select a model. Let's use the same model as before.
Let me just do this, give me one second. Let's go here; I remember I copied this "new features and services launched in 2022". And if you go back to Bedrock, you can see here that Knowledge Bases for Bedrock allows you to just retrieve the data, or to retrieve and generate. I'm going to show you both. So if I go and ask this: how many new features and services did AWS launch in 2022? Remember, this is exactly the same question I asked the model before, and it said it didn't know. So what I'm going to do first is generate an answer. This is going to retrieve the piece of text, then it's going to send the piece of text to Claude Instant as the model, and then finally it's going to generate an answer based on that. You can see here, it's saying retrieving and generating the response. And voila, it worked: over 3,300 new features and services were launched by AWS in 2022. And you can see that I have the source details. So if I click the source details here, you can see that it actually retrieved from my database a chunk, the same chunk that I was showing before, that has this piece of data. So what Bedrock did automatically, with a single API call, was retrieve the chunk, you know, map it back to text, add that text as the context of my question, and then finally send it to my Claude Instant model to give the answer that you can see here. Now let me show what you can also do. If you clear this,
we can disable the generate response option; I'm just going to ask the same question: how many new features and services did AWS launch in 2022? When I click run, you see that I disabled generation and just said: do not generate a response, just do the retrieval. You can see, if you go to source details, that it has retrieved multiple chunks for me, and I would expect some of them, see here, one of the chunks has the 3,300 answer. So in this case it returned multiple chunks. You can decide how many chunks you want to retrieve here, right? You can see here the maximum number of chunks.
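The retrieval-only behavior just shown in the console maps to the retrieve API on the Bedrock agent runtime client; as a sketch, with a placeholder knowledge base ID and region:

    import boto3

    client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

    # Retrieval only: get the raw matching chunks back, with no generation step
    response = client.retrieve(
        knowledgeBaseId="KB1234567890",  # placeholder
        retrievalQuery={"text": "How many new features and services did AWS launch in 2022?"},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
    )
    for result in response["retrievalResults"]:
        print(result["content"]["text"])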
And finally, what I want to show you: everything that I'm doing in the console, you can actually also run via APIs. Let me just run this for you. What you see here, this retrieve and generate, is the API that I'm calling. And here we can ask the same question; apologies, let me just copy the same question: how many new features and services did AWS launch in 2022? And you can see this is just going to call this specific function, which is this function here that is calling a Bedrock agent runtime client API call, retrieve and generate. I pass some information like my knowledge base ID, the model ID that I want to use, and the session ID, and then it's actually going to generate the information back for me. So if I run this, you can see it's running, and the answer is back here. So what I wanted to show is that you don't need to only use the console; of course, there are a lot of APIs that you can use. And we can actually see the citations; you can see the citations here. Again, it's the same citation that I had; it comes from the API, and you can see the response comes with a citations part automatically. And this is pretty good, because the combination of Knowledge Bases for Bedrock and OpenSearch Serverless is super powerful: it pretty much removes all the cumbersome manual actions that you would need to do in order to create an advanced, very powerful RAG system. So I hope you enjoyed it. Please feel free to reach out if you have any questions. Have a great conference, and talk to you soon.