Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, welcome to today's session, Unleash the Power of
GenAI: Generate Growth and Innovation with Data.
My name is Akanksha Sharon. I am principal data lead
at AWS for UK public sector.
Now, generative AI has taken the world by storm.
I'm sure all of you have heard about applications
like ChatGPT, and it just shows how powerful
the latest machine learning models have become.
The true power of generative AI goes beyond a search
engine or a chatbot like ChatGPT. It will essentially
transform how companies and organizations operate, now
and in the future. Just to share some
perspective here, Goldman Sachs
forecasts a 7 trillion dollar increase in global
GDP. They also predict that GenAI
will lift productivity growth by
one and a half percentage points over a ten-year period. This is
just a small glimpse of the potential of GenAI.
Now, I've worked with a lot of customers, enterprise customers,
public sector customers, and I think it is safe to say
that everyone acknowledges the power of GenAI. They
are comfortable thinking big with GenAI,
or they are actually making plans for how GenAI can be utilized
in their organization. But what I find is that nearly
everyone I speak with focuses on foundation
models and LLMs more broadly. So in the
iceberg that you see, the tip of the iceberg is the
GenAI application, right? There is more
beneath the surface, there is more than
meets the eye at first glance, right? And this is my favorite slide.
So what enables you to drive the value of
GenAI? Now, GenAI applications
are still applications, and like any other application,
you need a database underneath. So you eventually
need an operational database to support your user experience
and your GenAI applications. So if
you look at the right of the slide, you need a storage
layer, and you need a database layer that will have purpose-built
databases like document DB, graph DB,
and vector-enabled databases, right? And then there
are data integrations. You need the source of
your data, whether the data is going to come in batches
or via streaming, and you set up
your pipelines to keep up with data changes.
Finally, you also have to consider governance,
the processes to ensure data quality, privacy,
and security, right? So while it is very tempting to
think about generative AI at a surface level, as the
tip of the iceberg, you really need to
nail down the data effectively
and use a modern data architecture,
right? And data is, I could say,
the foundation for building your GenAI application.
And in upcoming slides we're going to
talk more about why data is so important. Right?
Next slide. Yeah. Another data point
that we have is from a McKinsey report, and
there's a link to that report that you can see.
Companies that have not yet found ways
to effectively harmonize and provide
access to their data will be unable to
fine-tune their generative AI, which means
that eventually they won't be able to use
GenAI for their customers or unlock the full
potential of GenAI. Doing this requires
a very clear data and infrastructure
strategy. Now, why does data matter so
much? I've been talking about data for a few slides now, right?
So let's see why data matters a lot.
Now, when you want to build your GenAI
application, you want one that is unique
to your business needs and unique to your customer base,
right? Your data is your differentiator.
As the name suggests, the data is your key differentiator.
And let me give you more thoughts on this. When you think
about it, every company has access to some foundation
models, right? Some of them are easily available in the marketplace.
Some of them are easily available on GitHub, where you can download
and use them, right? But the companies that will
be successful are those that build a GenAI application with
real business data, or business value data,
that will help them to build an
amazing GenAI application which caters to their
customers' history, their needs, their utilization patterns
and whatnot. Right? So data is the difference between
a generic GenAI application and
one that knows your business and your customers deeply.
And I've seen it with many customers where they have taken
off-the-shelf foundation models and not really taken care
of the data, and eventually they don't see much benefit from those applications,
right? Whereas I've seen organizations who
work backwards from their data to build the
GenAI application, or even if they take an off-the-shelf GenAI
application, they actually embed it with their own business data.
And that really helps them to
serve their customers in a better way. Right? Now,
using data for GenAI doesn't actually mean that you have to
go and build your own model, right?
Now, some companies will build their own,
and there are a couple of types of company here. There could be
one organization that will build and train its own
large language models with vast amounts of data,
and many will use their organizational data to
fine-tune foundation models for their unique business
use case or their unique needs,
right? But underlying all of this, the key
message is that your data is your differentiator.
Now, in a recent survey of CDOs,
we found that 93% of CDOs
said that the data strategy
and its role in making generative AI custom to
their business is one of the most important things that they
can work on, right? And on the right-hand side,
37% of CDOs agreed that the lack of
the right data foundation or a data strategy was one
of the top challenges to implementing generative AI.
So the data foundation matters for generative AI because
access to high quality data about your organization
and your customers improves the accuracy
and reliability of these GenAI models and their
responses as well. Now this is another example.
Now, I would like to share an example of an online travel agency
here that wants to generate personalized travel
itineraries. So when you want to do this personalized itinerary,
what you would like to use as an organization is the customer profile data
in your databases. And based on this data,
you would like to tailor the recommendation based on
things like past trip history,
travel preferences, hotel preferences, preferences of
family members, ages of the family members and things like that,
right? So what you will do is you will marry that
data with other company data like flight
details, hotel inventory, promotions and things like that.
So if you look at this, there are a couple of kinds of data
that you're using. There are two different data sets that you're using.
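As a minimal sketch of marrying those two data sets, the Python below combines a customer profile with company inventory into a single prompt. The record shapes (`CustomerProfile`, `Offer`), the field names, and the sample values are all assumptions made up for illustration; they are not from the talk or from any AWS service.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerProfile:          # hypothetical shape of a customer profile record
    name: str
    past_trips: list
    hotel_preference: str
    family_ages: list = field(default_factory=list)


@dataclass
class Offer:                    # hypothetical row from the company's inventory
    destination: str
    hotel_type: str
    price: int


def build_itinerary_prompt(profile, offers):
    """Combine customer data and company data into one prompt string."""
    # Keep only offers that match the customer's stated hotel preference.
    matching = [o for o in offers if o.hotel_type == profile.hotel_preference]
    offer_lines = [f"- {o.destination} ({o.hotel_type}, {o.price})" for o in matching]
    return "\n".join([
        f"Plan a trip for {profile.name}.",
        f"Past trips: {', '.join(profile.past_trips)}.",
        f"Family member ages: {profile.family_ages}.",
        "Available offers:",
        *offer_lines,
    ])


profile = CustomerProfile("Sam", ["Lisbon", "Rome"], "resort", [36, 34, 6])
offers = [Offer("Crete", "resort", 900), Offer("Berlin", "hostel", 300)]
prompt = build_itinerary_prompt(profile, offers)
print(prompt)
```

The resulting prompt would then be sent to whichever model the application uses; that call is left out here because it depends on the provider.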
So again, it is very important where this data set resides
and how easily you can access it. The more easily
you can access it and secure it, the more easily your
application can generate the personalized travel
itinerary. Now there's another example here that we
have. Now, I've been talking about the powerful
capabilities of GenAI to create content,
right? But to make this content
relevant to your organization, you would definitely like to
customize it, and customize it with things like your
own brand logo, your own brand guidebook,
your previous ad content from your data
lake, as well as company data like the real-time
inventory in your transactional database and so forth,
right? So you are eventually going to use GenAI,
but underneath you're using the data from all your different
traditional or transactional databases as well.
Now, to get high quality data for GenAI,
you need a strong data foundation,
right? In fact, I'm sure many of you who are listening
to me will have already talked about or will already
have a data strategy in your
organization. Whether you're working towards it
or running into some issues with it is a different matter, but we can definitely help you
with that process. Right. But GenAI makes this
data foundation more critical than ever because your
data is your differentiator. Right? I've met so many organizations
that were not really ready to adopt cloud,
that were not thinking about data strategy. But now with GenAI,
it is becoming more and more critical for them to
make this a priority, right? So your data has
to be up to date, complete,
accurate, discoverable and available.
Right? So those are the key things for your data
strategy. Now, these are a couple of approaches
to GenAI that we have. Obviously we have purpose-built LLMs,
then we have fine-tuning of LLMs, and then we have RAG,
right? And for the purpose of this presentation, I'll pick
the RAG use case and work through it.
Now, with RAG, the external data used to augment
your prompts can come from multiple data sources.
It could include documents, different repositories,
databases, APIs. Right. And RAG
helps the model to adjust its output with data retrieved
as and when needed, so that it can respond
with the right information. So this is just a quick overview of
what RAG is. And this is a very high
level reference architecture for RAG.
Right. Now you'll notice two sides to
the story. On the left-hand side you have the processes that
occur in the end user critical path. That is,
the end user interacts with the application and is
waiting for a response. And on the right-hand side are
the processes that happen behind the scenes,
right, like ingestion from data sources,
batch and stream processing, and data integration
with pipelines. So you need these for
populating your vector databases and various enterprise
databases or data warehouses.
Now notice the data governance, the data warehouse
and the vector data store. These are very critical, right?
And what I am seeing more and more customers doing
is modernizing their entire infrastructure
by moving it to the cloud. And this includes
relational databases and non-relational databases,
right? And let's talk more about the vector
data store that is there on the screen. Right.
Before we go there, let's look at the critical
path for the end user here. Right. Now,
this is again a use case, a scenario
that we have. I don't have animation right now,
but I'll go by the numbers on the screen here. So,
the first one is that the end user interacts
with the GenAI application, typically by posing
a question. And this is just to give you an example of what
happens underneath, right? So an end
user interacting with the GenAI application is number one. The second is that
the application loads the relevant prompt template, and
you can create your own templates based on the different rules
that you have and things like that. Then
number three is the question posed by the user.
Right. It could be a new question or it could be an ongoing conversation.
Right. In that case, what we have to do is
look into the history data
store to allow the user to pick up where they left off.
Let's say this is in the middle of a conversation. And
a very good example of this is when you go online and you go for
a chat option or an online help
option, right? This is a critical workflow for that.
So the application needs to pick up where the customer left off.
We need to pull that state into the right context. Right.
What was the context in which we were asking that question?
Number four is that the application needs to query for profile
or any other situational data, right? And this typically would
come out of a data store. For example, if you're returning something,
right, it would go back to your historical data store and ask,
when did you purchase this, and get the details of the order and all of that,
right? Number five is that it tokenizes the original question
to get a set of embeddings from the LLM.
And number six is that, with those question embeddings,
it performs a similarity search in the vector data store.
This uses some form of
nearest neighbor search
algorithm, right? And it searches
with that, along with some context. Right.
Number seven is that once
all that data is synthesized into a prompt, it is
then sent to the LLM to get a response,
right? And number eight is that
it updates the conversation state and history according to the new
interaction. And number nine, finally, is the
response that you see on the screen. So
in this flow I've used data stores for things like
retrieving the historical information and all of that.
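The nine steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `embed` is a toy word-count embedding standing in for a real embedding model, `fake_llm` is a stub standing in for a real LLM call, and the in-memory `VECTOR_STORE`, `HISTORY` and `PROFILES` stand in for the vector, history and profile data stores. None of these names come from the talk or from any AWS API.

```python
import math
import re

PROMPT_TEMPLATE = (
    "Customer profile: {profile}\n"
    "Conversation so far: {history}\n"
    "Relevant documents: {context}\n"
    "Question: {question}"
)

VOCAB = ["return", "refund", "shipping", "delivery", "order", "policy"]


def embed(text):
    """Toy embedding: count how often each vocabulary word appears."""
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(term)) for term in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


DOCUMENTS = [
    "Refund policy: you can return an order within 30 days.",
    "Shipping and delivery usually take three to five days.",
]
VECTOR_STORE = [(embed(doc), doc) for doc in DOCUMENTS]   # populated offline
HISTORY = {}                                              # user id -> past turns
PROFILES = {"u1": {"last_order": "running shoes"}}        # situational data


def fake_llm(prompt):
    """Stub standing in for the call to a real LLM."""
    return f"(stub answer based on {len(prompt)} prompt characters)"


def answer(user_id, question):                  # 1. the user poses a question
    template = PROMPT_TEMPLATE                  # 2. load the prompt template
    history = HISTORY.setdefault(user_id, [])   # 3. load conversation history
    profile = PROFILES.get(user_id, {})         # 4. query profile data
    q_vec = embed(question)                     # 5. embed the question
    # 6. nearest neighbor search (brute force here) in the vector store
    _, context = max(VECTOR_STORE, key=lambda item: cosine(q_vec, item[0]))
    prompt = template.format(profile=profile, history=history,
                             context=context, question=question)
    response = fake_llm(prompt)                 # 7. send the prompt to the LLM
    history.append((question, response))        # 8. update conversation state
    return response, prompt                     # 9. return the response


response, prompt = answer("u1", "How do I return my order?")
print(prompt)
```

A production system would typically replace the brute-force `max` with an approximate nearest neighbor index, which is what lets the search in step six stay fast as the vector store grows.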
Right. If you dive deep into the
different layers, you can see which data services you should
consider for this architecture. Right? And one of
these is vector databases. So let me go to the next
slide and talk more about what a vector data
store is. Right, perfect. Now, vector embeddings
basically represent words, phrases
and entities as numerical vectors in a multi-dimensional
space. Now, in this example that you see,
the words or items with similar meanings are
mapped closer to each other in this space.
This kind of representation, this kind of semantic
relationship, actually enables GenAI to
understand similarities and relationships between words
and entities, right? So for example, if you
say sandals, high heels,
color, comfort, fit,
all of these are related things, and this will help the model
relate them. Next slide.
Okay, now, vector embeddings are essentially
numerical representations of your text, audio
or video data, right?
Humans can understand the meaning of all these words,
right, but a machine cannot. A machine will
only understand numbers. So
to make it understand, we have to translate them into a format that
is suitable for machine learning or for the GenAI application.
And this is essentially what is called vector embeddings.
This is a very good example of vector embeddings.
Now, by assigning numbers to different words,
we can view them as vectors in a multi-dimensional space, as I said.
Right? And then you can measure the distance between them.
For instance, if you look at this graph, cat is
closer to kitten, whereas dog is closer to puppy.
So now by comparing these embeddings in this way,
the GenAI model will produce more relevant
and contextual responses for the question
that was asked or for a matching word, right?
So this is how the whole vector
assignment works, basically.
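To make the cat and kitten example concrete, here is a small sketch that measures closeness with cosine similarity, which is one common choice of measure and not necessarily the one behind the slide. The three-dimensional vectors are invented for illustration; real embeddings come from a model and usually have hundreds or thousands of dimensions.

```python
import math

# Made-up 3-dimensional embeddings, chosen by hand for this example only.
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.1],
    "kitten": [0.85, 0.15, 0.6],
    "dog": [0.1, 0.9, 0.1],
    "puppy": [0.15, 0.85, 0.6],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word):
    """Return the other word whose embedding is most similar to `word`."""
    others = (w for w in EMBEDDINGS if w != word)
    return max(others, key=lambda w: cosine_similarity(EMBEDDINGS[word], EMBEDDINGS[w]))


print(nearest("cat"))   # kitten
print(nearest("dog"))   # puppy
```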
Another example, just to give you more insight:
vectors can power semantic search for
use cases like rich media search or
product recommendations. So when you go on websites, you get some
product recommendations based on your previous purchase history,
or what you have typed in, or what you have seen
in that specific portal and things like that.
Right? Now, in this scenario, in the screenshot that you see on the screen,
you can see that semantic search greatly enhances the
accuracy of the output of the query,
right? Like one of the things that you search for is bright color
golf shoes, right? So that is very, very specific,
and that is how attaching vectors and numbers
to the search query makes it very, very precise
in scenarios like this, right?
Okay, so I've spoken about vectors, and now let's talk about how vectors and
data work together. Right? Now, many of our customers
are using vectors for their GenAI applications.
And one piece of feedback that we have got from them is that their existing
databases should be vector enabled,
because that will make them more confident that it will
meet the requirements of being scalable and available and
providing durability,
storage and high compute, right?
And what we have done is make sure that
your vectors and business data can be stored in the same
place, so your application will run faster,
because when they are in the same place,
you don't have to worry about data sync or data movement or
data silos at all. So we store vectors
and business data together. And that is why we have
enabled vector search across multiple
services that you see. We have Amazon OpenSearch,
we have Aurora PostgreSQL, we have RDS for PostgreSQL,
Neptune, DocumentDB, and DynamoDB also has
zero-ETL for faster retrieval.
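As a rough illustration of why keeping vectors next to business data helps, the sketch below stores an embedding in the same record as the price and stock fields, so one query can filter on business columns and rank by vector similarity without moving data between systems. The in-memory `PRODUCTS` list, the field names and the hand-made embeddings are assumptions for illustration; a vector-enabled database would run the equivalent query inside the database engine.

```python
import math

# Each record holds business columns and its embedding side by side.
PRODUCTS = [
    {"name": "Neon yellow golf shoes", "in_stock": True, "price": 80,
     "embedding": [0.9, 0.8, 0.1]},
    {"name": "Black leather golf shoes", "in_stock": True, "price": 120,
     "embedding": [0.9, 0.1, 0.1]},
    {"name": "Bright orange golf shoes", "in_stock": False, "price": 75,
     "embedding": [0.9, 0.9, 0.1]},
    {"name": "Bright pink sandals", "in_stock": True, "price": 30,
     "embedding": [0.1, 0.9, 0.8]},
]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(query_embedding, max_price, k=2):
    """Filter on business data, then rank the survivors by vector similarity."""
    candidates = [p for p in PRODUCTS if p["in_stock"] and p["price"] <= max_price]
    ranked = sorted(candidates,
                    key=lambda p: cosine(query_embedding, p["embedding"]),
                    reverse=True)
    return [p["name"] for p in ranked[:k]]


# Hypothetical embedding for the query "bright color golf shoes".
query = [0.9, 0.9, 0.1]
print(search(query, max_price=100, k=1))
```

If the vectors lived in a separate system, the stock and price filter would need either a second lookup or a copy of the business data kept in sync, which is the data movement the talk describes avoiding.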
So this is our famous flywheel. We start
off with unify, where you make sure that you
break down your data silos; you innovate by building
new GenAI applications; and you modernize your
data infrastructure. Now, the beauty of this flywheel
is that you can essentially start your data modernization or
data strategy from anywhere, right? Like,
I've met customers who would say, I'm going to start off with innovate,
where I'm going to build a GenAI application,
work on my LLMs and work on my use cases,
and then go on to modernizing my data
infrastructure, and then think about not
having data silos or making more use of that data. And then
the flywheel goes on. Whereas I have come across customers
who would say, yes, we would like to go to the cloud first,
have a modern data infrastructure
on the cloud, have a great data strategy,
utilize all the benefits of the cloud, and then go around the flywheel
and innovate and all of that. So the beauty of this flywheel is that
you can start from anywhere, and once you
are in the flywheel, it will just power itself and go from there.
Right. Also, this whole flywheel
avoids the risk of getting locked into a proprietary
format. It will help you break down data silos
and empower your team to build GenAI
applications, building a data foundation to
fuel your generative AI applications.
AWS provides a wide variety of services,
comprehensive services for each use case.
We have integrated vector databases and
zero-ETL, so you can easily connect to your different data
stores, right? If you're already in the cloud and using any
of our services, we have zero-ETL in most of our services, which will
help you to easily connect to and access your
data all around. And then we have some
very good data governance services available as well to
secure your data in the cloud and
also apply policies or user access controls.
So we have a lot of services aligned to that as well.
Now, there are a couple of places where we can help.
You can obviously go to our Generative AI Innovation
Center page and request a conversation, or if you're
already one of our customers, reach out to your respective
account team. One of the ways we can help you is
with getting buy-in from your executives on data strategy.
The next one is that we can help your organization to
envision how data can
drive some of the business outcomes, maybe do a POC, maybe do a
first pilot. And then we also have options to
modernize your data foundation as well. So there are
a couple of ways we can help you, right? And you can reach out to the
AWS Generative AI Innovation Center to help you more.
These are two very good workshops. They are
very technical, deep-dive workshops. So if
you are interested in learning more and getting your hands dirty,
scan the code, register for these amazing
workshops and go from there.
So this is my last slide. Thank you so much everyone
for joining today's session, and
have a great day. Thank you.