Transcript
Hello everyone. Welcome to my session. My name is Samuel Baruffi, and today I'm here to present a session called Mastering Generative AI: Harnessing AWS GenAI for Your Solutions. Let's look at a quick agenda.
Here's what I'm going to be covering in my session. I'm going to start the presentation by talking at a very high level about generative AI, its big impact on the world, and the applications being built with it today. Then I'm going to talk at the infrastructure level about what AWS is doing with our own chipsets, called Inferentia and Trainium, and also about the wide variety of EC2 instances with NVIDIA graphics cards. After that, I'm going to talk more about the application services and platforms where you can build and use models. So I'm going to talk about Amazon SageMaker and Amazon SageMaker JumpStart, and then I'm going to spend the majority of the time talking about Bedrock. Bedrock is our foundational-models-as-a-service offering: the ability for users, companies, and organizations to call a single API, choose among different models, from large language models for text generation to embedding models, easily receive the response, and actually build solutions on top of that. Bedrock is a very exciting service that has a lot of features baked in and being shipped as we speak, and I'm going to cover some of those features as well.
Then I'm going to jump into a single slide about vector databases: what a vector database is and why it's important for generative AI solutions. And of course I'm going to talk about the solutions within the AWS platform that allow you to run, create, and operate vector databases. Then, to finalize the presentation piece of my session, I'm going to quickly talk about CodeWhisperer, which is a very exciting tool for developers: a companion that helps generate code in many different programming languages, and infrastructure as code as well.
I've done quite a few talks with Conf42 in the past, and I always like to end the session with a demo. So I'm going to do a demo using Bedrock, showing you how easy it is to use Bedrock and some of its functionality, and hopefully showing you, with a simple piece of Python code, how easily you can call Bedrock using the AWS SDK and receive a response from the specific large language model that you choose. So let's just get started.
So what is generative AI, and how are generative AI applications powered? The most important thing to be familiar with is what is actually powering this generative AI explosion and evolution behind the scenes: what we call foundational models. You can think of a foundational model as an AI model that has been pretrained on vast amounts of data. The state-of-the-art models that I'm going to be talking about later in my presentation are known to be trained on pretty much the whole Internet; if you think about all the data the Internet has, that is essentially the data those models are trained on. A foundational model contains a very large number of parameters, and that is what makes it capable: by learning all this data through those parameters, the models form their own interpretations of words and can learn very complex concepts. And there is a technology, invented in 2017 by a group of researchers at Google, called the transformer; that is the neural network architecture that allows generative AI and large language models to actually generate text. The good thing about those large language models is that they are generic and general on their own. They are not trained for only one specific use case; they can solve a lot of different use cases. I'm going to talk in a moment about some of the use cases companies are starting to build, and the number of possible use cases is pretty much infinite, because you can think of those large language models as models that can answer any type of question. Of course, we need to evaluate performance, how big those models are, and how smart they are; not every model is exactly the same as the others. So there is that, and we are going to talk about how AWS helps you choose different models for your use case.
Another thing a lot of companies are doing now is customizing foundational models with their own data. Let's say you are a financial company and you really want a specific model, super well trained, that knows every single piece of information your company has. There are different strategies: you can use retrieval-augmented generation, you can use agents to retrieve the data, or you can actually customize the foundational model. Customizing foundational models is also known as fine-tuning: a capability where you retrain the model, recalculating its weights, to add your data, data that was private and not available on the Internet, so the model also becomes an expert on your own data. We are not going to talk in depth about those strategies, but you need to be aware that on top of having foundational models, you can customize them to be even more expert on the data that you, as a person or potentially an organization, have control of.
that we've seen the industry actually building.
So there are kind of three main categories, and this is just a very small
amount of use cases that companies are building. But the first category is
you want to enhance customer experience. So using
chat bots, virtual assistants,
conversational analytics, you can personalize a specific
user interaction, because those models are really good at
behaving and answering like humans, even though they are not
humans, but they answer like a human. It makes really
good to actually enhance customer experience, right.
Especially with chatbots. The other section is you can
boost employee productivity and creativity. So you
can think about conversational search. If an employee or
a customer wants to really retrieve an information from
a wide variety of data set, instead of
just doing a quick, simple search, you can actually have
conversational with those documents. You can do summarization,
you can do content creation, you can do code generation like we are
going to be talking about code Whisper. And the other category is
you can also optimize business processes. So let's say you
want to do a lot of document processing, you want to do data documentation.
Let's say you have a specific form that needs to be filled.
Given a specific data, you can join those two
sets of data and ask the model to potentially feel in
a specific way, right? So there is a lot of use cases
that companies are building or have built already to
enhance customer experience, boost employee productivity,
or optimize business processes. So that is just something you need to keep in
mind. Now, in order to run those
Now, in order to run those models, you need a lot of compute, specifically parallel compute. One good thing about GPUs, graphics processing units, is that they are inherently good at calculations that can be run in parallel. The reason I'm saying that is that it's really important to understand the wide variety of instances AWS offers that you can choose from to run those models. AWS has had, since 2010, a very good partnership with NVIDIA. NVIDIA has been the leader in very powerful GPUs, and those GPUs are really good for large language models. You can see here a wide variety of EC2 instances that offer specific GPUs from NVIDIA. The most powerful one is the P5 EC2 instance, which comes with the NVIDIA H100 Tensor Core GPU. Those are really big GPUs that can run very big large language models, and they have been used by many companies, specifically for generative AI solutions. But apart from the good partnership AWS has with NVIDIA, AWS is also at the forefront of innovating with our own chipsets. We have Graviton, which is a general-purpose CPU, but we also have accelerators called Trainium and Inferentia. Trainium is the accelerator focused on giving you better price performance, up to 40% better than other comparable GPUs, for training your models. It's a very specific chip that AWS built to give you better performance and price when you're training machine learning models. And once you have those models, you need to run inference on top of them, so AWS also created the Inferentia chipsets. We have two different chipsets, Inferentia 1 and Inferentia 2, that also give you better price performance for running inference on top of those models. I highly encourage you to quickly search for those; there is a lot of good documentation. Feel free to reach out to me on LinkedIn as well if you have any questions.
Now, let's move from the infrastructure level to the platforms and services AWS offers customers to actually run those models. Because at the end of the day, organizations are asking how they can run those models and use them for all the use cases we've just discussed. The first platform I want to talk about is SageMaker. SageMaker is the AWS machine learning and AI platform that encompasses a wide variety of features: building, cleaning, and enriching data sets, and training different machine learning models, neural networks, you name it; you can run it all on SageMaker. Once you've trained those models, you can deploy them at scale, and SageMaker manages the compute for you. You have monitoring, you have evaluation, and you can do automatic fine-tuning and distributed training for big models on top of that. And after you've trained, you can operate all those models on SageMaker. In the six years since SageMaker was launched, we have introduced many, many new innovations, like automatic model tuning: you deploy and train the model, and as you train it, SageMaker finds the right parameters to tune the model for the best performance on what you're trying to achieve. And when I say model, I'm not only talking about large language models; I'm talking about any type of machine learning model that you want to build, train, and deploy.
Now, when it comes to generative AI, Amazon SageMaker has a very specific feature that really helps developers quickly get to testing different large language models. The name of the feature is Amazon SageMaker JumpStart. SageMaker JumpStart is a machine learning hub with foundational models made available that you can literally just click and deploy. It gives you the recommended instance types those models should run on, depending on their size. Right now there are hundreds of different models with built-in algorithms and pretrained foundational models. A lot of those models are available on Hugging Face, and some of the Amazon Alexa models are also available for you to deploy. The good thing is that this is all UI and API based: with a single click of a button, or simple API calls, you can have an EC2 machine running on SageMaker hosting that model, and everything you would otherwise need to do manually to run that model and do inference on it is taken care of for you. We have a lot of Jupyter notebooks with examples of how you can actually do that.
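In code, that flow looks roughly like this. This is only a sketch using the SageMaker Python SDK, and the Falcon model ID is just an example; browse JumpStart for the exact ID you want:

```python
# A minimal sketch using the SageMaker Python SDK (pip install sagemaker).
# The model ID below is an example, not necessarily the one from my slides.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-llm-falcon-7b-instruct-bf16")

# deploy() provisions an endpoint on a recommended instance type in YOUR
# account, so you pay for that instance every minute it is running.
predictor = model.deploy()

response = predictor.predict({"inputs": "What is generative AI?"})
print(response)

# Tear the endpoint down when you're done experimenting to stop the billing.
predictor.delete_endpoint()
```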
And another good thing about SageMaker JumpStart: some of the models available there also have the capability for fine-tuning. So if you want to customize a model, let's say the Falcon model with 180 billion parameters, with your own data, you can go on SageMaker JumpStart and you have an easy, guided way of fine-tuning those models. One thing to note here, which is very important for differentiating it from other services like Bedrock, which I'm going to talk about in a moment: SageMaker JumpStart helps you run those models, but where you're running them is an EC2 instance that you pay for every single minute or second it's running. Even if you're not using it, you are still paying for that instance. Those models run in your account, on your SageMaker environment, which behind the scenes runs EC2 instances, and you're paying for those. So it's kind of pay as you go, but for the whole instance the model requires: you're not paying per token, you're paying for the whole instance. That is something to watch out for, because depending on the model, it can be very expensive.
But let's continue our journey through generative AI. When we look at what customers are asking about generative AI, it's: which model should I use for a specific use case, and how can I move quickly? Most customers are not looking to take on the exercise of training large language models or fine-tuning them; they have use cases where they want to boost customer experience or increase employee productivity, for example, and they just want to iterate and run POCs very quickly. And most important: how can they keep the data they're going to run through those models secure and private? That is a very important point. It's your data; your data shouldn't be used to train new models if you don't want it to be, and it should be kept secure and encrypted by default.
So, with all three of those questions being asked by customers, AWS introduced last year a service called Amazon Bedrock, which is the easiest way to build and scale generative AI applications with foundational models. And we talked about foundational models: those large language models that are very big and can do a lot of general tasks. What does Amazon Bedrock offer you? First, it offers you choice, a democratization of leading foundational models behind a single API. This is one of the most amazing things about Bedrock: you use the same service and the same API, and you just choose, via a parameter of your API call, which model you want. The model list has been growing every single month; you'll see in a moment which models and model providers Bedrock currently offers, and you can expect the list and the capabilities to keep growing as we speak. You can also run retrieval-augmented generation on top of it; I'm going to keep that on hold for now, because I'll be talking about a Bedrock feature that helps you do exactly that. You can also have agents that execute multi-step tasks by running Lambda and calling your own APIs or outside APIs automatically. And most important, Bedrock is secure, private, and safe. Any data you put into Bedrock is not going to be used to train the underlying models; it's encrypted by default, and nobody else has access to it. This is really important to keep in mind. You can also use VPC endpoints with Bedrock, so the data never leaves your VPC: it goes through your VPC to a VPC endpoint and on to Bedrock, where the service is hosted.
One important thing to note about Bedrock, different from SageMaker JumpStart: you pay as you go. There are different pricing modes on Bedrock, but the one you typically start with is called on-demand. Depending on the large language model, the foundational model you pick, you pay a price per input token and per output token. That's for text; there are different pricing mechanisms for image generation, but for now let's keep it simple. So it's just the traditional AWS cloud approach of pay as you go. And that becomes very promising, because instead of paying for big instances to run those large language models, you can experiment, iterate, and create new products very easily while still keeping your applications and solutions very cost-conscious.
So now let's quickly talk about the model providers available on Bedrock. The way Bedrock is architected, you have model providers, companies that have trained foundational models, and each model provider makes different foundational models available. Here you can see a list of seven different model providers you can pick from on Amazon Bedrock as of the date of this presentation; today is March 8th, 2024, as I'm recording this session, and we currently have seven model providers available on Bedrock. You have AI21, which has the Jurassic-2 models. Then you have Anthropic; I'm going to talk about Anthropic in a moment, but they are state-of-the-art models with very strong performance. Then you have Cohere; with Cohere you have both text large language models and embedding models, so if you want to create embeddings for your vector database, Cohere offers very performant embedding models. Then, introduced just a couple of weeks ago, I believe, Mistral AI; you have two different models from Mistral AI, Mistral 7B and Mixtral, a mixture-of-experts model that puts eight expert models together behind a single API, with very good performance. Then of course you have Meta with Llama 2, which is an openly available model. Then you have Stability AI, one of the leading research labs for image generation; Stable Diffusion XL 1.0 is a model that generates images, so instead of generating text, it actually generates an image: you input a text, "a cat walking in the park", and it will generate an image for you of a cat walking in the park. And then you have our own models from Amazon, called Titan. The Titan family offers a text-to-text model, a traditional large language model; it also offers embedding models; and it also offers image generation models. So it's a whole set of models available to you. The very important thing to keep in mind with Bedrock is that we are democratizing the ability for people to consume different models for a specific use case.
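To make that concrete, here is a small sketch of how you could list the providers and models programmatically with boto3; note that the control-plane client is called "bedrock", while the inference calls later in the demo use a separate "bedrock-runtime" client:

```python
import boto3

# The "bedrock" client is the control plane (listing models, jobs, etc.);
# "bedrock-runtime" is what you use later to actually run inference.
bedrock = boto3.client("bedrock", region_name="us-east-1")

for summary in bedrock.list_foundation_models()["modelSummaries"]:
    print(summary["providerName"], "-", summary["modelId"])
```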
And you can go right now to the Bedrock webpage, click on the pricing, and see the different pricing for each model. Depending on the performance and the size of the model, you pay a specific price. That is really important, because you can now decide how you want to build your applications. Remember, it's a single API call, and you do not need to manage any infrastructure. That is something I want to highlight as well: all the infrastructure and GPUs needed to actually run those models, which is very complex and takes a lot of capacity, are taken care of by AWS for you, and that is something very beautiful that you should be using and taking advantage of.
Now, I really want to talk about the partnership we have with Anthropic. Anthropic is a longtime AWS customer, and one of the top leading AI research companies in the world. I'm going to talk about some of the models they have published over the last few years. AWS, Amazon, has invested heavily; I think we announced a $4 billion investment last year. So Anthropic has a very good partnership with Amazon. And to show the results of that partnership, well, first let's quickly go through the story of Anthropic. If you look at the timeline here, we are in a very fast-paced environment. In 2019, GPT-2 was launched by OpenAI. Then researchers published papers about the performance of transformers, and GPT-3 was launched sometime in 2020, with Codex as well.
Most of the people who founded Anthropic were employees of OpenAI. They left OpenAI in 2021 and founded Anthropic, and you can see how quickly they went from founding a new research lab to publishing and making available very good models. They published some papers in 2021. Then in 2022 they finished training Claude, and they have something called Constitutional AI, which I would highly recommend you search for: it's their whole approach to taking very careful care of safety and alignment for those models. Then in 2023 they released the first Claude 1 model, and they were one of the first companies to release a 100,000-token context window; context window just means how much text you can put into a single request. After that, still in 2023, they released Claude 2, which was a big improvement over Claude 1. A couple of months later they released Claude 2.1, with more improvements and better performance. Then this year, actually last week or this week? To be honest, Monday this week. They have, I would say, shocked the industry with a very performant set of three different models called Claude 3, and I'm going to talk about those.
Claude 3 comes with three different models. The first one is Claude 3 Haiku, and you can see it on this graph: Claude 3 Haiku is a very performant model in terms of intelligence, but most important, it is a very low-cost and very fast inference model. Then they also released Claude 3 Sonnet, their mid-tier model, which is very, very intelligent; it beats all the previous Claude models on benchmarks, and it sits in the middle when it comes to cost. As a matter of fact, Claude 3 Sonnet is much more performant than any previous Claude model, yet it's actually cheaper than the Claude 2 and Claude 2.1 models. And then, of course, Claude 3 Opus is the most intelligent model, and it has actually beaten state-of-the-art models on most benchmarks. You can see this data if you search for the Claude 3 report paper; you'll see all the benchmarks there.
So how do this incredible Anthropic performance and innovation with the Claude 3 models impact Bedrock? Well, because of the relationship Amazon has with Anthropic, Claude 3 is already arriving on Bedrock as we speak: Claude 3 Sonnet, their mid-tier model, with very strong performance at a very good price, is available right now. All these models are multimodal. Multimodal means you can input not only text but also images. Previous Claude models could only receive text as input; these models can receive images as input too. So you can put in an image and ask questions about that image, and you can even include multiple images per input; all three models have that capability. And all these models now have an even bigger context window. Not only is the context window bigger, but the claim is that with this bigger context window, it doesn't matter where in the context window you put the text; the performance remains very similar and very good, which was not true of previous models, and actually not true across the industry either. Claude 3 Opus and Claude 3 Haiku are going to be made available on Bedrock very soon. They are currently not available, but they will be very soon.
Now that I've talked about the models, let's talk about some of the functionality Bedrock gives you. First, you can use those foundational models as they are. But maybe you really want to fine-tune and customize a model with your data, because you've tried RAG, you've tried prompt engineering, and you're not achieving the performance your use case requires. In my opinion this should be the last resort, but if you need to privately customize models, Bedrock supports automatically customizing them today. You put your data in a private S3 bucket, you connect that S3 bucket to Bedrock, and Bedrock will automatically fine-tune and customize those models for you. Currently you can customize models from Titan, Cohere, and Llama 2; very soon we are going to open up the ability to also customize Claude models, and potentially models from other providers. The good thing about this functionality: if you have done fine-tuning on your own in the past, you know it requires a lot of science and can be very complex. Bedrock completely removes that. You just put your data on S3, you go on Bedrock, you point Bedrock at the data on S3, and you choose the model. Behind the scenes, Bedrock customizes a new model and notifies you when your specific model has been trained. No one else will be able to use this model, and none of the data you provided to Bedrock on S3 will be used by anyone else or to train other models. It's just your model. You can then consume that model and run inference on it by making an API call: the same way you call the Bedrock API for the base models, you can call the Bedrock API to use your own customized model.
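As a rough sketch of that flow, assuming your JSON Lines training data is already sitting in a private S3 bucket (the names, ARNs, and hyperparameters below are placeholders, not values from my demo):

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# All names and ARNs here are placeholders. The training file is JSON Lines
# with prompt/completion pairs in your own private bucket; the IAM role must
# grant Bedrock read access to it.
bedrock.create_model_customization_job(
    jobName="my-finetune-job",
    customModelName="my-custom-titan",
    roleArn="arn:aws:iam::123456789012:role/MyBedrockCustomizationRole",
    baseModelIdentifier="amazon.titan-text-express-v1",
    trainingDataConfig={"s3Uri": "s3://my-private-bucket/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-private-bucket/output/"},
    hyperParameters={"epochCount": "2", "batchSize": "1"},
)
```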
Now, another very good option, if you don't need to customize your model, is running retrieval-augmented generation. Retrieval-augmented generation, for folks who are not familiar, is the idea that, let's say, you have a big data set of documents, those documents describe the way your company operates, and you want a chatbot that answers questions about that documentation. Well, those documents are likely private, so the foundational models that providers make available on Bedrock don't know about your company's operational procedures. But when you're creating a chatbot, you want to make that information available for the model to consume. So what you can do is use what we call a vector database, and I'm going to talk in a moment about which vector databases are available on AWS. Bedrock has a feature called knowledge bases that makes this whole process of running retrieval-augmented generation very simple. The way it works: on Bedrock, you first create an S3 bucket and put all your documents in it. It's your S3 bucket; nobody else has access. Then, on Bedrock, you choose which model you want to run embeddings with; for now you can choose between Titan and Cohere embeddings. The embedding step goes through those documents, converting the text into numerical vector representations. Finally, you choose a vector database. Right now there is a variety of databases you can select from, I think four options, and that number is going to increase in the future; you can select, for example, the OpenSearch Serverless vector database. Then Bedrock will automatically run the embeddings on the data in S3 and store the vectors in your OpenSearch vector database. And finally, let's say you want to run this as a chatbot. When you ask a question, say, "What is the HR policy for vacation in New York?", what Bedrock can do is query your vector database using what we call semantic search. It finds the specific chunks of text that are most likely to answer the question, copies those chunks into the prompt, and sends your question plus the retrieved chunks of text from the vector database to your foundational model. Then the foundational model, let's say Claude 3, sees all the chunks of text that talk about the vacation policy in New York, and it sees your question. Based on the information provided, because the model now has access to the chunks of data containing the answer, it is able to provide an answer for you. That is what is called retrieval-augmented generation.
And you can run this very simply with knowledge bases. It's all managed for you, and you can choose different models to run it with, as the sketch below shows.
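As a rough sketch, assuming a knowledge base has already been created in the console (the knowledge base ID and model ARN below are placeholders):

```python
import boto3

# Knowledge bases are queried through the bedrock-agent-runtime client.
agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.retrieve_and_generate(
    input={"text": "What is the HR policy for vacation in New York?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB123EXAMPLE",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2:1",
        },
    },
)

# Bedrock runs the semantic search, stuffs the retrieved chunks into the
# prompt, calls the model, and returns the grounded answer.
print(response["output"]["text"])
```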
Another functionality is the ability to enable generative AI applications to execute steps outside your model. Let's say someone asks your chatbot about the current price of a stock. The model is not going to be able to answer that question, or if it does answer, it's likely to hallucinate, meaning the answer won't be accurate, because the model's training data is probably from months or years ago. What you can do on Bedrock is use Agents for Bedrock. You select a model, let's say a Claude model, you provide a basic set of instructions, you choose different data sources, maybe different APIs, and then you specify the actions it can take. So in the example I gave, you can say: if someone asks about the price of a stock, you need to call this API; here is the OpenAPI spec of my API, and this is how you call it. With Agents for Bedrock, you ask your model a question, and the model realizes it needs to take an action on that request. Behind the scenes, Bedrock will call Lambda, which is a serverless compute platform; the model triggers a Lambda function whose code is already prebuilt, that Lambda calls the API you told Bedrock about, the API comes back with a response, let's say the value of your stock, and the result is returned to the large language model so it can give you the final answer. This is just one example, but with agents Bedrock can break down and orchestrate tasks and invoke APIs on your behalf, so you can do a lot of automation. The capabilities here are really infinite; it's just a matter of configuring those agents properly, and you can do a lot of chain-of-thought reasoning on top of that as well.
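To give a feel for the calling side, here is a rough sketch of invoking an agent that has already been configured in the console; the agent IDs below are placeholders:

```python
import boto3

# Assumes an agent already exists with an action group wired to your stock
# API; the IDs below are placeholders.
agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.invoke_agent(
    agentId="AGENT123EXAMPLE",
    agentAliasId="ALIAS123EXAMPLE",
    sessionId="demo-session-1",
    inputText="What is the current price of this stock?",
)

# The answer streams back as completion events; behind the scenes the agent
# may have triggered your Lambda/API to fetch the live price.
for event in response["completion"]:
    if "chunk" in event:
        print(event["chunk"]["bytes"].decode("utf-8"), end="")
```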
So, moving on: what capabilities do we have for making the responses secure and safe? On Bedrock we have a functionality that is currently in preview called Guardrails. Guardrails lets you create consistent safeguards on your models, no matter whether they are fine-tuned models or agents. You can create filters for harmful content, both on the input you send to Bedrock and on the output Bedrock returns. I'll give you an example, the one you see here on the screen. Let's say someone asks your chatbot for investment advice, and you don't want that input, or the resulting output, to reach the customer. What you can do is create those filters: you deny those topics, and you define the response that should be given back if someone tries to ask for investment advice, so you don't run into legal complaints or problems down the road. So that is one of the capabilities available on Bedrock, and it's called Guardrails.
Another functionality I'm going to talk about is batch. Everything I've covered so far is: you make an API call and you receive a response, pretty much synchronously; the API call goes in, the response comes back. But there are use cases that don't require live interaction, where you want to run a lot of inference over a lot of documents in batch mode. Bedrock's batch mode lets you efficiently run inference on large volumes of data. You put the data on S3, as JSON files with the prompts and the data you want to run, and behind the scenes Bedrock grabs those files, runs the inference, and saves the results in another S3 location. It's completely managed for you, and once the batch is completed, you can get notified and trigger a lot of different automation. You don't need to write any code to handle failures or restarts; Bedrock takes care of that for you. And you can run batch with base foundational models or with your custom trained models as well.
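A rough sketch of kicking off a batch job, with placeholder ARNs and bucket names; the input is a JSONL file where each line is a request body in the chosen model's native format:

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Placeholders throughout; each line of prompts.jsonl is one request body in
# the chosen model's format, and the role must grant Bedrock S3 access.
bedrock.create_model_invocation_job(
    jobName="my-batch-inference-job",
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    roleArn="arn:aws:iam::123456789012:role/MyBedrockBatchRole",
    inputDataConfig={"s3InputDataConfig": {"s3Uri": "s3://my-bucket/prompts.jsonl"}},
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://my-bucket/results/"}},
)
```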
One last nice feature of Bedrock is model evaluation. It's still in preview, but model evaluation is a really good feature. As you saw, Bedrock offers a wide variety of model providers and models from those providers, and it can be really complex to evaluate those foundational models and select the best one. Model evaluation on Bedrock lets you choose different tests. Those evaluation tests can be automatic benchmarks that the industry uses and that Bedrock makes available, but you can also set up your own human evaluation: actual humans evaluating the responses from different models and rating them without knowing which model is which, so no bias enters the process. You can bring your own data sets of questions, and you can create your own custom metrics or use the metrics that come built in; some of the metrics available are accuracy, toxicity, and robustness of the responses. You can see here a screenshot of a human evaluation report across different models being tested, and an automatic evaluation report.
So, I've talked a lot about the different features Bedrock makes available to you. One thing that is important to highlight: right now, thousands and thousands of customers are using Bedrock, because the capability, the democratization, the flexibility, and the feature set of Bedrock let them build generative AI in pretty much every single industry. You can see big names like Adidas, the BMW Group, Salesforce, and many, many others. So I highly encourage you to test Bedrock, because it's a really cool service.
Two more things before we go to the demo. We talked about retrieval-augmented generation and the need for vector databases, and I just want to quickly tell you the story of vector databases on AWS. AWS has a wide variety of databases that support vectors. As you can see here, we have six databases that now support vectors, and depending on the use case, you might choose one over another. The important thing is to understand that AWS gives you the flexibility to pick and choose the database that makes the most sense for you. A very popular database on AWS for vectors is OpenSearch: OpenSearch has vector functionality, and you can even run OpenSearch Serverless as a vector database with very good performance and price. You can also see here that DocumentDB now has vector support, and so do MemoryDB for Redis and RDS for PostgreSQL. So if you're running a SQL database with a relational use case and you also want vectors, you can use pgvector, a Postgres extension that runs on top of both RDS and Aurora PostgreSQL. If you're doing graph databases, you can run vectors on top of Neptune. And as I mentioned, right now the direct integration for knowledge bases on Bedrock supports OpenSearch, Redis Enterprise Cloud, and Pinecone, but very soon Amazon Aurora and MongoDB are going to be available as well. So that is it for vector databases.
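Since embeddings are what actually land in those vector databases, here is a minimal sketch of generating one with the Titan embeddings model on Bedrock:

```python
import json

import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Embed a piece of text; the resulting vector is what you would store in,
# and later semantically search against, your vector database of choice.
body = json.dumps({"inputText": "What is the HR policy for vacation in New York?"})
response = runtime.invoke_model(modelId="amazon.titan-embed-text-v1", body=body)

embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding))  # dimensionality of the vector; 1536 for this model
```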
The last thing I want to talk about is the capability for code generation and a code assistant for developers. AWS has a service called CodeWhisperer, which provides AI-powered code suggestions, as you can see here in the small video. Let me play it again. The small video demonstrates, in this case, JavaScript code. You provide a single comment, in this case: parse the CSV string and return a list of songs with position, original chart date, artist, and title, and ignore lines starting with a hash. Then you just hit tab and it automatically returns the generated code. This is pretty cool. The way it works: you just use your IDE, and there is support for a variety of them: VS Code, JetBrains, Cloud9, the Lambda console, Jupyter notebooks; pretty much all the popular IDEs out there are supported. You install the AWS plugin that has CodeWhisperer support, and then you can receive code suggestions. And CodeWhisperer can do more than that. You receive real-time code suggestions for a variety of programming languages like Java, JavaScript, Go, .NET, and many, many others, and not only programming languages: if you're building infrastructure as code, with Terraform or CloudFormation, you can get suggestions for that too. On top of being an assistant for developers and improving productivity quite significantly, you also get a security scan, so the code being suggested to you can come with security recommendations to make sure you're writing secure code. And you have a reference tracker for the different open-source licenses in the data the model was trained on: if a suggestion given to you was learned from an open-source repository with a potentially prohibitive license, you get a warning telling you so. And with the enterprise version of CodeWhisperer, you can say that developers should never receive recommendations for code that was generated from a specific license that is prohibitive for your business use case. One of the great things about CodeWhisperer is that CodeWhisperer for individuals is free. We are one of the only companies with an enterprise-grade product where, as an individual user rather than a company, you can install CodeWhisperer, create an AWS Builder ID, and use it for free; you don't even need an AWS account, just a login with what we call a Builder ID. Some features are enterprise-only, but most features, including the most important one, code suggestions, are available for free. So I highly encourage everyone to take a look at this.
And hopefully this was a good overview of the offerings on AWS for generative AI, most importantly Bedrock. So I'll pause here, and I'll come back sharing my screen to do a demo of how you can utilize some of those functionalities in the real world, actually clicking buttons, making API calls, and writing some code.
Awesome. So let's quickly jump into the demo. Very simple: I have logged in to my AWS account, and I can search here for the Bedrock service. I'll go ahead and jump inside the Bedrock service. Right now Bedrock is available in a few AWS regions; in this example we are using the North Virginia, us-east-1 region. If I look here on the left side, you can see the menu, and you can see some examples of how to get started; you can just open those in the playground. If you click on a provider, you can see the providers I showed you in the presentation, and you can see some of the base models. Each provider, for example Anthropic here, has its models; you can see all the different Claude models that are currently available. I have, for example, Claude 3 Sonnet, the mid-tier model that just got released this week, and I have Claude 2.1, Claude 2, and Claude Instant 1.2.
In this case I don't have any custom models, but if I had trained a custom model before, I would see the list of trained models here. If I wanted to customize a new model, I could just go here and create a new fine-tuning job, or continue a fine-tuning job. But the thing I want to show you: you can get started, play around, and test some of the models by just going to the playground. If you look here at the playground, you have the chat option. And what I really like about the chat option, let me first give you an example of how you can talk to a model: let's say you want to talk to Claude, and to the new Claude 3 model, one of the most performant in the industry. So let's just say: write me a poem about AWS and its ecosystem. Just a simple poem. I can put the entry here, and because this is a multimodal model, I could also attach an image; I'll do an image demo in a moment. You can see all the configurations of the hyperparameters, like temperature, top P, and top K, and the length of the output can be controlled here; in this case I'm just capping it at 2,000 tokens maximum. You can see this is on-demand. If I click run, it actually calls the model, and then it generates, in this case, a poem about AWS and its ecosystem.
You can see that it's pretty cool. Some things I really like about the playground: as it's finishing generating here, if we click on the three dots in the top menu, you can export it as JSON, and you can see the streaming preference, because it is streaming. The other thing I like: you can go down and see some model metrics. You can see this actually took 15,000 milliseconds; it tells me how many input tokens, how many output tokens, and the cost, which shows as 0.0 here because it's less than a cent; it's really zero point something. What I also really like about the chat playground: we can compare models. Let's say I want to compare Claude 3 versus Claude 2.1, and I'll ask here, let's see: talk about the world economy in the year 1999. I can go and run it, it will run both models at the same time, and I'll be able to compare the performance of both models just by reading them. So let's wait a little bit. You can see here it has output; Claude 2.1 has output. I can compare the responses, and I can see down below how many tokens each one of them used, and so forth.
Now, you also have a text playground instead of a chat: you just send one request and you get the response. What I like about this: you go here and you select the model. Let me show you what I like about it. Let's say: write a small poem about New York City, and let's run this. What I really like about this: it's streaming back, but the best thing is that if I click on the three dots, I can actually see the API request, which is how I would call this model through the API. In this case it's using the AWS CLI, but you can see the message and all the formats get properly configured for me. In a moment I'll show you some Python code for how you can do that.
A few more things I want to quickly show. If you want to generate images, you can do that with Stable Diffusion from Stability AI and with Amazon Titan. So I can say: create an image of a cat on the moon. Let's ask the model and see what it outputs for me. And then you could do whatever you want with that image. You can see it's a cat with the moon; a very simple image. We can say: create a picture of a cat that is super realistic. Let's see if it does more; what you saw before was more like a painting with the moon in the background. What I'm doing here is just prompt engineering; I'm not using anything specific. There you go: you can see a more realistic cat image here. Remember I talked about some of the other features, like guardrails: you can see Guardrails here, you can create guardrails, you can create knowledge bases, you can create agents. Those are all things you can do.
One thing I want to highlight: if you were to start using Bedrock for the first time, the first thing I'd recommend is going to model access and enabling those models on your account. You don't pay anything for just enabling them, but if you don't enable them, you can't use them, and it's as simple as going to manage model access and selecting the models you want. In my account you can see I have access to all those models; it's pretty straightforward. But now let's jump into some code. Most people watching my session are probably developers or people who write code, so how can I actually call those models on Bedrock in a programmatic way?
The first example I'll show you is just calling Claude. You can see I import boto3, which is the AWS SDK for Python, and then I instantiate the Bedrock runtime client from the SDK; you can see the Bedrock runtime client in the region. Here is the payload: I'm providing the model version, in this case Claude 3 Sonnet, and then the body. Each model has a specific body format that the model provider has defined, and you can see that in the documentation; I'll show you the link in a moment. Once you have that, you create a prompt, in this case just saying: write a text about going to the moon and its technical challenges. Then I serialize that payload into JSON, and finally I call the single API I've been talking about: Bedrock invoke model. This one is not using streaming; I'll show a streaming version in a moment. Then I wait for the response, parse the response as JSON, and print the part of the response where the text actually is.
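For readers of the transcript, the script on screen looks roughly like this; it's a sketch rather than the exact file, with the Claude 3 Sonnet model ID current as of this recording:

```python
import json

import boto3

# The runtime client is what you use for inference.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Claude 3 models on Bedrock use the Anthropic "messages" body format.
payload = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1000,
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Write a text about going to the moon and its technical challenges."}
            ],
        }
    ],
}

response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps(payload),
)

# Parse the response body as JSON and print where the text actually is.
result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```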
So if I go ahead and run python3 claude.py, behind the scenes that's calling Bedrock, sending my payload to Claude 3 and running the prompt. Remember, my prompt asked for a text about the technical challenges of going to the moon. Once Bedrock returns the text, I'll be able to just see it. And there you go. It wasn't streaming, so it's a pretty long text: "embarking on a journey to the moon presents a multitude of technical challenges". I'm not going to evaluate it, but you get the gist. So that is example one. The second example calls the same Bedrock API but for a different model. You can see here I'm also invoking Bedrock, and I'm just asking: can you write me a poem about apples?
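Again as a sketch for the transcript, the Titan version differs mainly in the model ID and the body format:

```python
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Titan text models take an "inputText" field plus a generation config.
body = json.dumps({
    "inputText": "Can you write me a poem about apples?",
    "textGenerationConfig": {"maxTokenCount": 512, "temperature": 0.7},
})

response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-text-express-v1",
    body=body,
)

result = json.loads(response["body"].read())
print(result["results"][0]["outputText"])
```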
Right? So let's run this: python3 titan.py. Now this is calling the Amazon Titan text model, and you can see it produced a very simple poem about apples.
Now, there may be applications you're trying to build that require streaming. With a chatbot, you don't want to make the user wait for, I don't know, a minute to get a response back, and sometimes those models take a while to finish the whole text completion, so you can use streaming. In this case, very similar to what I did before, this is a demonstration of using Claude 3 Sonnet, but with streaming. I'm not using multimodality yet; this is just text. I have an input text, and you can see this just creates the input payload. Then the API I call is just a little bit different: it is invoke model with response stream. What Bedrock does is, as soon as it starts receiving chunks of text from the model, it sends them back to you. And here it's just: as you get the response, display it, print it to the console. Then in my main, I'm providing the model ID and the prompt, in this case: what can you tell me about the Brazilian economy? And then I'm initializing boto3 with Bedrock and calling the function I created above.
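The streaming variant looks roughly like this; again a sketch of what's on screen:

```python
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

payload = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1000,
    "messages": [
        {"role": "user",
         "content": [{"type": "text",
                      "text": "What can you tell me about the Brazilian economy?"}]}
    ],
}

# The only real difference from before: invoke_model_with_response_stream
# returns an event stream instead of a single response body.
response = bedrock_runtime.invoke_model_with_response_stream(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps(payload),
)

# Print each text delta as soon as it arrives.
for event in response["body"]:
    chunk = json.loads(event["chunk"]["bytes"])
    if chunk["type"] == "content_block_delta":
        print(chunk["delta"]["text"], end="", flush=True)
```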
So if I go here, clear the screen, and run the Claude streaming script... oh, sorry, python3, apologies for that: python3 claude_streaming.py. So it's invoking the model, and you can see it's now streaming the response back; it's actually getting the response about the Brazilian economy. And you can see here it finalized. It even tells me why it stopped, because it was the end of the turn; it finalized the response, and also how many output tokens I got. So that is just one example.
The other example: I want to use Claude 3 Sonnet as a multimodal large language model that also accepts images as input. I have this image of a very cute cat, and I want to provide it to the model. You see what I'm doing here, very similar again, but now I'm reading the cat image and encoding it in base64. Then, in my messages for Claude 3, as the content, I'm now providing both an image and a text: first the image as the base64 string, and then, as the prompt: write me a detailed description of this photo, and then a poem about it. That's the request. And if you look down below, I'm invoking the model. This one is not using streaming; remember, the previous example was. In this case it sends everything, processes the response, and once the response is finalized, it shows it to me and just prints the result.
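And a sketch of the multimodal version, with the image file name assumed for this demo:

```python
import base64
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Read the local image and base64-encode it (file name assumed for the demo).
with open("cat.jpeg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Claude 3 content blocks can mix images and text in the same message.
payload = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1000,
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/jpeg",
                            "data": image_b64}},
                {"type": "text",
                 "text": "Write me a detailed description of this photo, "
                         "and then a poem about it."},
            ],
        }
    ],
}

response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps(payload),
)

print(json.loads(response["body"].read())["content"][0]["text"])
```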
Let's quickly run this Claude multimodal example. Again, I keep using Python 2 instead of Python 3; apologies for that. Now I'll run it with python3, and hopefully it's going to start printing the whole description. And you can see here: "the image shows a close-up portrait of a cat with striking green eyes and a soft brownish-gray fur coat. The cat's face has a slightly stern, yet alert and attentive expression." So it describes the cat in the image, and it's very accurate. And then, like I said, it writes a poem: "emerald depths gaze, so keen", and so on. So you can see Bedrock is amazing, because with very simple API calls I can call different models with different configurations and different hyperparameters. And this is pay as you go: everything I've done here probably cost less than a penny, because it's all on-demand; I'm not paying for any provisioned capacity, because I don't need it in this example.
One last thing: if you want to look at some of the code I've used, I based it on the public GitHub repository called amazon-bedrock-samples. You can go to the introduction to Bedrock there and see some examples, for example for Claude 3; you can see the example for Claude with the image, which is the one I used. So I highly recommend checking it out. And lastly, if you really want to look in more detail at each model and the hyperparameters and how you call it, you can go to the AWS Bedrock documentation. Within the foundational models submenu there are the model inference parameters, and when you click on specific models, for example Claude, you can see the different Claude completions and Claude messages APIs. On the messages API you can see some code examples. So there is very descriptive documentation for you as a developer to take a look at and deep dive into.
So that is all I had to show for today. Hopefully it was very useful. Feel free to connect with me via LinkedIn or Twitter if you have any questions. Happy coding, happy GenAI applications, and I hope you found this useful. Thank you so much.