Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi there, thanks for joining the session. Today I'm going to be sharing how you
can get started on AWS for building and
orchestrating serverless workflows for generative AI. Generative AI has taken the world by storm. We are seeing a massive shift in the
way applications are being built. A lot of this is through consumer-facing
services that have come out, like ChatGPT by OpenAI and
Claude by Anthropic, and we are able to
see and experience how powerful the latest machine learning models have
become. Generative AI is a type of AI that can create new
content and ideas, including conversations, stories, images,
videos and music. Like all AI, generative AI is powered
by machine learning models. But generative AI is powered by
very large models that are pretrained on vast amounts of data and
commonly referred to as foundational models. Now, throughout the session, and
also in conversations that you'll have out there, you'll see foundational models
used interchangeably with LLMs, large language models.
Just to be clear, LLMs are a subset of foundational models: LLMs focus
specifically on text. There have been some amazing breakthroughs
through using foundational models in different industries. A couple
of these are where we see impacts in life sciences,
with drug discovery being powered by Gen AI. This has
enabled researchers to understand things like protein synthesis.
In financial services, we see Gen AI being used to help create
highly tailored investment strategies that are aligned to individuals,
their risk appetite, and the financial goals that they want to achieve.
In healthcare, we have seen how physicians and
clinicians can use this to enhance medical images and
also to aid in better diagnosis. Think like a medical assistant.
And in the retail space we see teams generating high
quality product descriptions and listings based on product data that they already
have. Now, you'll notice a lot of the use cases
for generative AI are about enhancing existing processes or experiences
that are already there. A question that usually comes up is: we
already have services and applications out there, so how do we take generative AI
and add it to enhance the experience, versus rewriting everything from
scratch? To understand this, what you also need to
understand is how you view generative AI.
So from AWS's perspective, Gen AI
has three macro layers and these three are equally important to us and we are
investing in all of them. The bottom layer is the infrastructure. This is used to
train foundational models and then run these models in production.
Then you have the middle layer that provides access to these large language models
and other FMs that you need and the tools that you need to build and
scale generative AI applications which then use the LLMs under
the hood. Then at the top layer you have applications
that are built leveraging foundational models so they take advantage
of Gen AI quickly and you don't need to have any
specialized knowledge. Now, when you take this and map it against the
services that we provide from AWS, you
see that the three layers are neatly segregated. At the
lowest layer of the stack is the infrastructure. This is where you get
to build cost-effective foundational models: you train
them and then you can deploy them at scale. This gives you access to our
hardware accelerators and GPUs, and you also get access to services
like Amazon SageMaker that enable ML practitioners and your teams to
build, train and deploy LLMs and foundational models.
Then at the middle layer we have Amazon Bedrock. This provides access to
all the LLMs and other foundational models that you need to build
and scale generative AI applications without you managing the whole
infrastructure behind it, without you actually managing the scaling side of
things. Think serverless, but for machine learning models, for FMs
basically. Then at the top layer are applications that help you
take advantage of Gen AI quickly as part of your day to day operations.
This includes services like Amazon Q, our new generative AI
powered assistant that is tailored to your business. So think of personas
like business users, data users, or even developers.
You could use Q as part of AWS, as a plugin that's
already available for certain services, and then use that to get
an enhanced operational capability.
Each of these layers builds on the other, and you may need some or all
of these capabilities at different points in your generative AI journey. A lot
of what you see is that in an organization, you'll have a mix of personas that
would use all three layers, using specific services from those layers
to enhance productivity.
Now, Amazon Bedrock is the easiest way to build and scale generative AI applications
with foundational models. This is a fully managed service so you can get started
quickly and you can find the right model based on the use case that you
have. You can then also customize a model with your own data,
and you can do this privately: nothing feeds your data back to the base models
that other customers would also have access to.
That doesn't happen, and you have the tools that you need to combine the
power of foundational models with your organization's data and execute
complex tasks. All of this comes with the security, privacy and
responsible AI safety which you need to put generative AI
into production for your users. Now, there's a lot of
models out there, and from Amazon Bedrock these are
a couple of the models that we provide. One of the reasons we went
with this approach is because everything's moving fast.
Experimenting and learning is the key right now, and generative
AI as a technology is also evolving quickly with new developments.
Now when things are moving so fast, the ability to adapt is the most valuable
capability that you can have. There is not going to be one
model to rule them all, and certainly not one company providing the models that
everyone uses. So you don't want a cloud provider who is beholden
primarily to one model provider. You need to be able to try out
different models. You should be able to switch between them rapidly
based on the use cases, or even combine multiple models within a
certain use case. You need a real choice of model providers; you decide
who has the best technology. This is what we have
seen from building services: we want to provide that choice
to customers, which is you. This is why we provide,
through Bedrock, access to a wide range of foundational models from
leaders like AI21 Labs, Anthropic, Cohere and Stability AI,
and also access to our own foundational models like Amazon
Titan. And the idea is that we provide an API as
part of this. So there is a layer, an API layer, that provides
you access to the large language models, or the foundational models, under the hood.
And all you do is as a user or probably as a
developer, you create the prompts in a certain format based on what
the foundational model expects. You take that prompt or text embeddings
if you want to tune that model a bit more, and then afterwards send
that to the API layer and you can then get your responses
back and then use that as part of your applications. Now there are a couple
of ways you can use Bedrock. One of the ways customers
usually start is by writing code, and the way you
integrate with Amazon Bedrock is that you can use the SDK: you
use the APIs and then access the foundational models.
So you load the libraries that include the Bedrock API, and you
can also access data in other places, like an
S3 bucket. If you have data that's bigger than normal,
you can point to an S3 bucket for input and even for output.
You then prepare the input, handle the JSON to convert and
decode the responses, and if the returned data
is an image of sorts, you can store
that in an S3 bucket. Then if you need retries,
you'll have to write retry logic, and
if you have any errors, you may have to handle certain conditions, so on
and so forth. You kind of get an idea of what happens with code in
general; this is roughly what that code would look like.
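Here is a minimal sketch of that kind of hand-written integration code in Python with boto3; the bucket, key, model ID, and request body shape are illustrative assumptions, since each model family expects its own format.

```python
import json
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
bedrock = boto3.client("bedrock-runtime")  # runtime client used for InvokeModel

BUCKET = "my-input-bucket"                 # placeholder bucket
INPUT_KEY = "prompts/transcript.txt"       # placeholder object key
MODEL_ID = "meta.llama2-13b-chat-v1"       # example model ID; use any model you have access to

def summarize(event=None, context=None):
    # Pull a large prompt from S3 instead of passing it inline
    prompt = s3.get_object(Bucket=BUCKET, Key=INPUT_KEY)["Body"].read().decode("utf-8")

    # Each model family expects its own request body shape
    body = json.dumps({"prompt": prompt, "max_gen_len": 512, "temperature": 0.5})

    # Hand-rolled retry loop for transient errors
    for attempt in range(3):
        try:
            response = bedrock.invoke_model(modelId=MODEL_ID, body=body)
            result = json.loads(response["body"].read())
            # Store the response back in S3 for downstream steps
            s3.put_object(Bucket=BUCKET, Key="outputs/summary.json",
                          Body=json.dumps(result).encode("utf-8"))
            return result
        except ClientError:
            if attempt == 2:
                raise  # give up after the final attempt
```

Even this simplified version carries imports, parameter setup, retries, and S3 handling, which is exactly the kind of glue code the rest of the session tries to remove.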
But how do we provide simpler integration without
writing a lot of code? For this, you also need to
understand the whole idea of sequencing: how do you coordinate between multiple services?
Because a lot of organizations don't just have one specific app,
they would have probably a plethora of apps that power their business.
And you want to understand how these services are going to talk to each other
in a reliable and understandable way, because business processes
usually exhibit different patterns based on the inputs that are coming
in and what needs to be accomplished. Sometimes things need to be done
sequentially. So in this case, let's say you have a number of Lambda functions;
we'll use Lambda as a proxy for different services.
You have Lambda one, and then you have Lambda two. Now,
this is easy enough because you can have these in sequence:
Lambda one invokes Lambda two. But what if you have more than two
Lambda functions? What if, instead of calling Lambda two, you need
Lambda one to also call Lambda seven before calling another service, or before
calling a foundational model in this case? Now,
if one of these services or functions fails, there's no easy
recovery mechanism, and reprocessing previously executed
steps becomes difficult. So we add some persistence inside.
That's the next step. You have persistence because you have all these executions
happening behind the scenes. And this way we can
deal with state: try to manage some kind of coordination, try to understand
which service is being executed at a point in time for the
whole execution flow that's happening. Now, because of this,
you also have to coordinate all these functions, you need to manage this persistence mechanism,
and there's no elegant way of coordinating flow or error handling between these
services. And not every process is sequential.
So, for example, you could also have certain processes that need to run in parallel,
or perhaps it can follow different paths based on the input or what happens in
an earlier step. That's a lot harder to do, and it gets even harder
the more successful you are, because more people want to use the flow processes you've
built out. You need to be able to handle errors as they occur.
And this could be things like retrying calls, or it
could be something as simple as following a different path in your workflow.
All that said, these are all things that you can still do in code.
This is something that has been done in code for quite some time. But what
if your flow also needs a human as part of the process? For example,
you need a human to review the output of a previous task to see if
it's accurate, like a spot check for example. Or you've
built out an application processing flow where the customer has requested a
credit limit that exceeds the specified auto approved threshold.
And then you need somebody else to come in and review
that request, and then say okay, yes or no, depending on other
data that they have. So that application needs to be routed to a
human for this to work, and this continues.
So as long as you have business processes that need to emulate what happens in
the real world, you're going to have this amount of complexity that you
need to build as part of your applications. So one approach to
managing this complexity is to not write a lot of coordination code
and instead visualize your sequences as
part of a workflow. And this is where AWS Step Functions comes
in. Step Functions is a service that allows you to create workflows. These are
workflows that allow you to move the output of one step to the input of the
next step. You can arrange these in a workflow with conditional logic,
branches, parallel states, loops, a map state, or even
specify wait states, for example if you're running a job and you need
to wait for a certain period. Over here
you can see a bit of an animation that shows you
how you can choose a service. You can then
drag it from the left and put it in the design view, and the
logic gets added. Then each step or action of the workflow is configured.
This also helps you visualize how you can provide error handling
and also specify a retry and backoff strategy.
Step Functions is serverless, so you only pay for what you use. It scales
automatically, which also means that you can scale to zero;
you're not paying when it's not being invoked. It's fully managed and provides
a visual building experience using a drag and drop interface called Workflow Studio.
The visualization experience extends beyond building, because when you
run your workflow you can also visualize its progress, with each step
changing colors as it moves forward. Under
the hood, what happens is this is using code written in the Amazon
States Language, ASL. ASL is a domain specific language
and it's JSON based, so you can declaratively create your
workflows. You provide that, and we'll show some examples later.
You can then take that ASL and add it as part of your deployment
pipelines, so you can commit it to your repositories. You can also make pull requests
on it so that other team members can collaborate.
Now one of the things customers have told us with step functions, because step functions
has been there for a few years, is that it integrates natively with 220 services
and you can choose a service that you need to use as part of your
workflow and take advantage of the benefits. Now, Step Functions
integrates with these services in two ways:
the first is SDK integrations and the second is optimized integrations.
SDK integrations, as the name implies, are provided
by Step Functions directly integrating with the AWS SDK.
That's over 10,000 API actions that you can use directly
from your workflow without the need to write any custom integration code.
Think glue code, which a lot of folks tend to write when they build serverless
applications with Lambda; you can remove a
lot of that just by using Step Functions. The other one is optimized
integrations. Now, the way they differ from SDK integrations is that each
action has been customized to provide additional functionality for your workflow.
So beyond just the API call, you also get certain things, like for
example an API output being converted from
escaped JSON to a JSON object. Depending on the kind of integration
that's being provided, those optimized integrations have that added value
so that you don't have to write extra code for
those manipulations. Now with any workflow
and orchestration, you need certain patterns,
and these integration patterns are
provided for the API actions.
When you specify your workflow, by default
it is asynchronous, so the workflow doesn't wait or block for the action
to complete. This is what you call a standard request-response call pattern:
you start the task or the work to be done, and the workflow doesn't
wait for it to complete, it moves on to the next step. This is great because it's
efficient. You can continue moving quickly, but sometimes
there are cases where you may need to wait until the request is complete and
then you progress. For that, there is an optimized integration pattern
called run a job, also called sync, because of
the .sync suffix that's added to the end of the API action. Then you
also have a callback pattern. This is what helps us introduce a human into our
flow, and we're going to see a bit of that in the architecture later.
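To make those three patterns a bit more concrete, here is a small sketch of how they show up in a task's resource field in the Amazon States Language, written here as Python dicts that you would serialize to JSON; the ARNs, queue URLs, and parameters are illustrative placeholders.

```python
import json

# 1. Request-response (default): start the work and move straight on.
request_response = {
    "Type": "Task",
    "Resource": "arn:aws:states:::sqs:sendMessage",
    "Parameters": {
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/jobs",
        "MessageBody.$": "$.payload",
    },
    "Next": "NextState",
}

# 2. Run a job (.sync): the workflow waits for the job to finish.
run_a_job = {
    "Type": "Task",
    "Resource": "arn:aws:states:::bedrock:createModelCustomizationJob.sync",
    "Parameters": {"JobName": "demo-customization"},  # plus the real job parameters
    "Next": "NextState",
}

# 3. Callback (.waitForTaskToken): the workflow pauses until something
#    calls SendTaskSuccess (or SendTaskFailure) with this token.
callback = {
    "Type": "Task",
    "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
    "Parameters": {
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/approvals",
        "MessageBody": {"taskToken.$": "$$.Task.Token", "title.$": "$.title"},
    },
    "Next": "NextState",
}

print(json.dumps([request_response, run_a_job, callback], indent=2))
```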
Now with these integrations that are available,
you then have an idea of how you can take a business process
and then afterwards integrate that across. But just to understand
why this is important, let's take an example of a standard
serverless application and show you why direct integration
actually makes more sense. So here's a classic example. You're querying
a database: we have a Lambda function that needs to get an item
from a DynamoDB table. So from a code perspective,
what do I need to get started? I need to import the AWS SDK
to interact with the table. Then I need to set up my parameters
to tell DynamoDB what table I want to interact with: the
table name, the partition key, the sort key. Then
I set up my query inside a try/catch block and
I return any errors. Above that I also need
to add the Lambda handler with my event object and my context
object, and add another try/catch block to catch other errors.
I may also need to convert data
structures, for example an object to a string.
But you can see there are a lot of lines of code just to get one
item from a DynamoDB table.
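As a rough sketch, assuming a hypothetical Orders table with pk and sk keys, that Lambda handler could look something like this in Python:

```python
import json
import boto3

dynamodb = boto3.client("dynamodb")

TABLE_NAME = "Orders"  # placeholder table name

def handler(event, context):
    # Parameters tell DynamoDB which table and which item we want
    params = {
        "TableName": TABLE_NAME,
        "Key": {
            "pk": {"S": event["pk"]},  # partition key from the incoming event
            "sk": {"S": event["sk"]},  # sort key from the incoming event
        },
    }
    try:
        result = dynamodb.get_item(**params)
        # Convert the item object to a JSON string for the caller
        return {"statusCode": 200, "body": json.dumps(result.get("Item", {}))}
    except Exception as err:
        # Surface the error; real code would log and classify it
        return {"statusCode": 500, "body": str(err)}
```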
Each of those lines is an area where something can go wrong, because one thing
you have to understand is that code is also a liability. When you write code,
you are responsible for the way it functions: you have to make sure you're
writing it securely, using the right set of dependencies, ensuring there are
no memory leaks, and so on and so forth. Now when you look at it
from a Step Functions perspective, what you can do is have a
single step that makes that GetItem call to the DynamoDB table, and it's
just as scalable. I can still configure things like retries, I can still
catch any errors and send them to a dead letter queue if I
need to, so that I can retry later. And if you notice,
this diagram isn't just a visual representation;
it actually shows how you can take a certain action
from start to finish.
You can show this to other folks in your engineering team, and you can also
show it to business stakeholders so that they can understand what the flow looks like.
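For comparison, here is a sketch of that same lookup as a single direct-integration state; the table name, key names, and the next states are placeholders, not something prescribed in the session.

```python
import json

# The same lookup as a single Step Functions state, no Lambda function needed.
get_order = {
    "Type": "Task",
    "Resource": "arn:aws:states:::dynamodb:getItem",
    "Parameters": {
        "TableName": "Orders",
        "Key": {"pk": {"S.$": "$.pk"}, "sk": {"S.$": "$.sk"}},
    },
    "Retry": [{"ErrorEquals": ["States.TaskFailed"], "MaxAttempts": 2}],
    "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "SendToDeadLetterQueue"}],
    "Next": "SendMessage",
}

print(json.dumps(get_order, indent=2))
```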
So there's added value, of course, with the whole idea of errors and
retries. And the way it looks when you
actually add the nodes with certain integrations is like this:
you have DynamoDB with the GetItem step, you have SQS
SendMessage, and so on and so forth. One other thing, during development
or even when you deploy a step function to production, is that
you need to understand what's happening in the workflow and when things go wrong.
And the way you do that is you have the execution flow where you can
see different parts of the execution and then you can go within a specific execution,
see the different states, what's happening within each state, what's the input
and what's the output, and also look at things like how much time it takes
to execute a certain state. This is really critical when there are
issues, so it's a great way to get all of that together and see it
in a single pane. Now let's dive into an
actual use case. We have a demo towards the end; I'll show a
couple of demos in the middle as well, about Bedrock and integration,
and then one that looks at an application that uses all
of this together. So let's say you
have an application that has videos being
uploaded, and then these videos need to be transcribed,
right? We already have a service available for that called Amazon
Transcribe. And in Step Functions, all I need to do is
drag in a StartTranscriptionJob node
and say, okay, for any
video that comes in, trigger that step function
and do a transcription of
that video. So automatic speech recognition
happens. And this makes it easy for developers to add speech to text capability
to their applications. This integration is super powerful.
This allows you to just have this without any code that's needed. Now let's
say I want to also do something beyond this, right? So I want to take
that transcription and I want to add some additional stuff.
And this is where generative AI can help us. So I want to create multiple
titles and descriptions for a video. I want to ask a human
to provide feedback based on what choice they want to
have from the titles and then also create an avatar for the video.
So you have text generation and also image generation happening. And the
way you do this with Step Functions is through the
optimized integrations for Amazon Bedrock. There are
two new optimized integrations that we have provided, and more have
been added since. The first one is InvokeModel.
This InvokeModel API integration allows you to orchestrate
interactions with foundational models: you call the API directly
through Step Functions, you give it the parameters that are needed, you provide the
prompt, and that gets sent to the foundation model. You get
the response back and can continue using it. The second one
is CreateModelCustomizationJob. This supports
the run a job, .sync call pattern that we saw
earlier, which means it waits for the asynchronous
job to complete before progressing to the next step in your workflow.
So say for example, you're trying to create a certain customization on top of the
foundational model. It'll wait for that and then it'll go to the next step and
then afterwards continue with that process. This is useful especially in
data processing pipelines because you are trying to do some kind of fine tuning to
the model. I'll quickly jump into a demo
so that you can actually see what happens with a standard
implementation with Bedrock. Just quickly, to understand: if
you're getting started with Bedrock, you need to make sure that you have access to
the models. Right now you have access to foundational models
in two regions, that's North Virginia and Oregon.
When you go to the Bedrock screen you will see there's
a section called Model access, and this gives you a list of all the
models that are available right now in those two regions.
If you're doing it for the first time, you will have to go and
manage your model access and grant access; you'll get
that immediately unless it's a brand new model, which takes a bit of time
and where you may have to submit certain use cases. In my case right
now I have Claude 3 in the pipeline; I'm waiting for the
details to get approved so that I can get access to it. Claude 3 just
got announced a few days ago, with support in Bedrock,
so I don't have that immediately ready. Now let me jump directly
into a workflow. When you go to step function and you create a new step
function, you're greeted with a blank canvas. You have a state box that's empty
over here. In my case I already dragged in
the Bedrock API, and if you want to see
the list of Bedrock APIs that are currently available, there are many more now
where you can also manage operations on foundational models if
you need to, things like custom models and listings,
especially for processing pipelines, MLOps, and so on.
In our case I just want to do an InvokeModel, so I'm going to
show you what the configuration looks like. I have foundation models already selected,
and these are the foundation models that are already available, as you saw
in the previous screen. In this case I have
selected Llama, so Llama 2 is already selected,
and now you can configure the parameters that
need to be sent. What I'm doing over here is just hard coding the
prompt; in another demo right after this I'm going to show
how you can customize the prompts based on input that you
may get from other applications or maybe from the user. In my case,
all I'm saying is, okay, there's a transcript from a video in a paragraph.
This is the same video you're going to see in the last demo.
This is an interview between Amazon's CTO Werner
Vogels and former Amazon CEO Jeff Bezos. It's from 2012,
so eleven years old, and all this does is
use that transcript; I'm asking it to provide
a summary of the transcript. So what I'll
do quickly is just run an execution, and we're going to
see what it looks like. I'm not passing any input;
it's optional, because I've already hard coded the prompt over there.
Once I run this, and within a certain execution history
or a certain point of execution, you can see the actual path.
You can see the different steps that are being executed. And with
the Bedrock model step already done in this case,
you can see that the input was just the optional input that got
sent, and here is the summary that's come back from Llama 2. It's
basically a summary of the transcript: it gives an example of what
Jeff Bezos mentioned and what the whole organization was working
towards. This was eleven years ago. You also get other parameters, like how many
prompt tokens and generation tokens were used. All in all, without provisioning any
large language models, without actually managing the scaling side or
even provisioning a large language model yourself. So pretty cool. The
other thing you'll realize is that with Step Functions you're also able to
view the different states and how much time they took to execute.
Really useful, especially if you want to debug certain things; if there are any failures,
you can also see those errors over here.
Now another powerful way of showing what Bedrock
is capable of through Step Functions is chaining.
And this is another demo application. What this does is this emulates
a certain conversation that you can have with
an LLM, with anything that's doing text, right? So, for example, you have a chat
interface, and with any large language model, you have to always
provide the context of, especially the history of the conversation that's happening,
so that the next one can then understand the next conversation,
or the response can be based on that conversation from before.
So in our case, what we are doing is creating a chain, and in
this case I'm leveraging another foundational model,
the Command text model from Cohere. What this does is
read a prompt from the input.
So when you invoke the step function, you can have a look
at the different parameters that are
in the object, in the JSON body, and then pick those out.
In our case, what I'm doing is saying, okay, $.prompt1,
send this as the prompt, and these are the maximum tokens. Now, in this case,
you'll see this is a different syntax
for this model versus what was there for Llama 2.
And all I'm doing is adding the result
of this conversation back to the initial prompts
that are coming in, so that we have context throughout this conversation.
And now if I just go in and execute this,
I'll just copy this from a previous one, because I want to pass a similar
input. I'll just do an execution over here.
In my case, I'm passing three prompts, if you notice, in the state
also, I had three of them. And all I'm doing is, I'm saying, okay,
name a random city from Southeast Asia. I just want it to give some information,
provide some description for it, and then provide some more description for it.
So let's start the execution, and as you'll see,
as the execution progresses, you're going to see all these states
changing the colors based on how the
foundational model is responding. So the first result is already
in: it says, okay, here is a random city from Southeast Asia,
and it picks Ho Chi Minh City in Vietnam.
It packages that in, result one is added
in, and then it's sent on with the conversation history to the second call. You'll see
conversation result two, here are two interesting aspects of the city,
and it mentions certain parts of it. And then invoke model
number three runs, and the output over here takes in a certain
part of that. Now, with large language models being
nondeterministic, a lot of times you have to be careful with how you send
your prompts and ensure that the context remains. Now, in a previous execution
of the same workflow, I was able to get the third prompt and
also make sure that it continues with the city, which was previously Ho
Chi Minh. So what I would probably want to do is I would create my
third prompt in such a way that I emphasize it clearly that this is
the city that you're supposed to use. And probably the way I
would do that is I would have certain parts in my inputs, which would probably
take certain things like the city or other things and then enforce that as part
of different prompts. But in a nutshell, you kind of see how you can
do chaining in this case, and you can also bring that within
this and have a bigger application that is using this. And we're going to talk
about the architecture of this for the rest of the session.
So let's continue with that use case of generating titles and descriptions
for the videos in this case. What happens is that,
like you saw earlier in the demo that I showed, you can select the
large language model. In this case, Titan is selected.
And what happens under the hood is that the ASL for Amazon Bedrock
looks something like this: there's an invoke model action that's happening, and
there is a model that's being selected. It could be Llama,
it could be anything else. Then there is a dynamic input coming in,
$.prompt, which basically means something else is invoking the
step function and providing this prompt. You also have inference
parameters that allow you to tweak the response
that comes back from an LLM, for things like probability and so
on. And when you look at
invoking the model, you can also provide input and output locations.
For example, if your input is larger than 256 KB, because a
Step Functions payload can only take 256 KB of text,
what you can do is point to an S3
bucket for input and for output. It's a good way to ensure
that you're able to scale this application without hitting
the constraints of Step Functions.
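A rough sketch of that state, written as a Python dict you would serialize into the ASL definition; the model ID, prompt path, and bucket locations are illustrative, and the commented-out lines show where the S3 input and output pointers would go.

```python
import json

# Sketch of the Bedrock InvokeModel task described above.
invoke_model_state = {
    "Type": "Task",
    "Resource": "arn:aws:states:::bedrock:invokeModel",
    "Parameters": {
        "ModelId": "meta.llama2-13b-chat-v1",
        # Dynamic prompt taken from the workflow input ($.prompt)
        "Body": {"prompt.$": "$.prompt", "temperature": 0.5, "max_gen_len": 512},
        # For payloads larger than the 256 KB state payload limit, point
        # input and output at S3 instead of passing them inline:
        # "Input": {"S3Uri": "s3://my-bucket/prompts/input.json"},
        # "Output": {"S3Uri": "s3://my-bucket/responses/"},
    },
    "ResultPath": "$.modelResult",  # keep the original input alongside the response
    "Next": "NextStep",
}

print(json.dumps(invoke_model_state, indent=2))
```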
So this input and output is then used; you can chain this and
continue using it in different states within Step
Functions. One thing you'll recall is that the first requirement
mentioned creating multiple titles. Now,
we could continue using just the foundational models within AWS,
but what if we want to access something that's outside, let's say, for example,
Hugging Face? You want to access a foundational model from
outside AWS. We want to then
get the data, send that across, and then get the response back and
continue in our execution. Now, when you look at accessing
a public API in general, it might look simple, but then the first question
that comes up is: what kind of authentication do you need? Is there basic
authentication? Are there API keys? Is there OAuth?
Is there anything else, token management for example? Then you
also want to ensure that you're storing the secrets, because maybe
there's an access key for accessing the API and you want to keep that somewhere.
Then you have input and output handling. You also
need graceful retries if something goes wrong, then
rate control and so many other things. Now, the way you would do that
with AWS Lambda for example, or maybe a container
or a virtual machine on EC2, is that you would have your code running, and
you would have these different services: you would fetch the credentials, manage
the token, retrieve the request data, then
invoke and get back the data, maybe store it somewhere else if needed.
This is what a resilient application would look like.
One other way you can do this without writing code is the public
HTTPS API integration in Step Functions. Step
Functions has the ability to call virtually any SaaS application from a workflow
through its integration with HTTPS endpoints. So without
using a Lambda function, you can use Hugging Face, for
example; you can invoke an API on Hugging Face, or maybe other APIs
like Stripe, Salesforce, GitHub, or Adobe.
And Step Functions, with this low-code approach,
provides you a way to connect AWS services with services that are outside.
You can then take advantage of Workflow Studio, because now you're dragging and dropping
all of these things and putting them together as part of the workflow
without writing or managing any code as part of
this. With such
requests you can put in your JSON object, and
in the request body you can say, okay, this is the kind of data
that we are sending and this is what we are trying to retrieve back as
part of that transformation. One of the ways that you can actually
use this for integrating with HTTP APIs is that
you can also manage the errors through Step Functions; like we saw, you
have that ability to do error handling. You can also
manage authorization as part of that integration, and you can
specify transformation of data, because Step Functions already provides that for
optimized integrations, so you can leverage that if needed for things like
URL encoding of the request body. There's also a test
state capability that allows you to execute a specific state
without deploying the step function, so you can just
execute that specific state as a test and make sure
that you're getting the kind of response that is needed. So with
the Task state that's available, you have that single unit of work: you can
do an HTTP invoke, you can see the
resource field that's available, and you also have the new
options. You can also specify things like what method
is being invoked, what the authentication field is,
the parameters block, and so on. Under the hood, this is actually using another
service called Amazon EventBridge. EventBridge is used for API destinations because it
has the ability to send requests
to an API destination, and the same connection resource is being
used as part of this. A lot of these parameters are actually optional;
when you're invoking a certain API, you may just be getting a response back
and don't need to pass any query parameters. In this case, what we're doing
is adding request headers and anything
else that's needed as part of the request.
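As a sketch of what that HTTP Task can look like in the workflow definition, again written as a Python dict you would serialize to JSON; the Hugging Face endpoint and the EventBridge connection ARN are placeholders for whatever API and connection you set up.

```python
import json

# Sketch of an HTTP Task state calling an external API.
call_hugging_face = {
    "Type": "Task",
    "Resource": "arn:aws:states:::http:invoke",
    "Parameters": {
        "ApiEndpoint": "https://api-inference.huggingface.co/models/some-model",
        "Method": "POST",
        # Credentials live in an EventBridge connection, not in your code
        "Authentication": {
            "ConnectionArn": "arn:aws:events:us-east-1:123456789012:connection/hf/abc"
        },
        "Headers": {"Content-Type": "application/json"},
        "RequestBody": {"inputs.$": "$.transcript"},
    },
    "Retry": [{"ErrorEquals": ["States.TaskFailed"], "MaxAttempts": 2}],
    "Next": "NextStep",
}

print(json.dumps(call_hugging_face, indent=2))
```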
Now let's go back to the requirement. In our case, since we want to generate
multiple titles, we want one title from a model we access
ourselves in Bedrock
and one from Hugging Face, so we have a parallel state.
Now through step functions, this allows us to use both the foundational
models and you simply configure the endpoints on the right hand side.
This way the parallel state will then execute and
it will invoke each task in parallel. It then requires that
each branch completes successfully for the parallel state to be considered successful.
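A compact sketch of that parallel state, with Pass states standing in for the InvokeModel and HTTP tasks shown earlier; the state names are illustrative.

```python
import json

# Sketch of the parallel state with one branch per model.
generate_titles = {
    "Type": "Parallel",
    "Branches": [
        {"StartAt": "BedrockTitle",
         "States": {"BedrockTitle": {"Type": "Pass", "End": True}}},
        {"StartAt": "HuggingFaceTitle",
         "States": {"HuggingFaceTitle": {"Type": "Pass", "End": True}}},
    ],
    # Both branches must succeed for the parallel state to succeed
    "Next": "HumanReview",
}

print(json.dumps(generate_titles, indent=2))
```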
Now what happens if one of these branches doesn't complete successfully?
Right? So what if something goes wrong? Maybe there's an issue in the call
for one of our two FMs, and errors happen for
various reasons. If it's a transient issue such as a network interruption,
you want to make sure that you're able to do a retry, and maybe you
want to retry a couple of times. You also configure
something called a backoff rate to ensure that you don't overload the
third-party system. For these momentary blips,
you just need to make sure that you have a retry mechanism
of sorts.
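Here is a rough sketch of what that retry and error handling can look like on a single task in the definition; the model ID, the backoff numbers, and the state names it points to are illustrative.

```python
import json

# A task with retries, exponential backoff, and a catch-all error route.
invoke_model_with_retries = {
    "Type": "Task",
    "Resource": "arn:aws:states:::bedrock:invokeModel",
    "Parameters": {"ModelId": "amazon.titan-text-express-v1",
                   "Body": {"inputText.$": "$.prompt"}},
    "Retry": [{
        "ErrorEquals": ["States.TaskFailed"],  # retry transient task failures
        "IntervalSeconds": 2,                  # wait before the first retry
        "MaxAttempts": 3,                      # retry up to three times
        "BackoffRate": 2.0,                    # double the wait on each attempt
    }],
    "Catch": [{
        "ErrorEquals": ["States.ALL"],         # anything still failing after retries
        "Next": "NotifyAndStop",               # hand off to an error-handling state
    }],
    "Next": "TransformResponse",
}

print(json.dumps(invoke_model_with_retries, indent=2))
```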
But what if that underlying error requires a longer investigation or a longer
resolution time, because maybe it's not under your control, it's independent of your team,
or it's even a third party who's managing it?
And what may happen is you may exhaust your retry strategy and then eventually
that workflow step will actually fail. So you
want to make sure that you're able to run this entire workflow, but then at
the same time, if you can't, then you want to move it to an error
state, or then move it somewhere else so that you can retry it later.
So if you want to visualize this, this is basically what it looks
like. So you have a step that then kicks off a parallel workflow.
This parallel workflow has two branches: you
have Bedrock on the left and Hugging Face on the right. Let's say we invoke
the foundation model, and we have some transformations we want to do using another
AWS service, and that is an extra step. But let's
say there is some failure in the transformation, because we have invoked something on Hugging
Face and when we get it back, something's not working.
And this job needs to continue, right? There's some transformation that
needs to happen, and we have stopped over here before actually moving
to the next step, which is a human review.
This is where you have the option of Redrive. Now, Redrive allows
you to easily restart workflows, maybe because you have figured out,
okay, there's a problem, and then maybe it's got resolved, and then you want to
retry that workflow all over again so you can recover from failure
faster, and you only pay for what you need. So you don't have
to keep retrying it unless it's really necessary. So the way this
works is you will have these two branches, Bedrock on
the left and Hugging Face on the right. Let's say that when
we invoke, we do the transformation, but it fails in the transformation step.
The issue gets fixed, you come back again and
do a retry once more, and this time it carries on
because your transformations are completed, and now it goes into the human review
step if needed. So one of the other things
you want to do also as a part of workflows is you want to have
observability. Execution event history is very important as
part of this because you have different states that are coming in, you have events
being fired. You want to make sure that you're able to filter and
drill down to what's actually happening within your workflow. This is
where you can see that an execution has been redriven, and it also shows
a count, a redrive count of how many times you have retried that
execution through Redrive. So cool, I think it's a great way
to understand how you can actually manage events,
especially errors in this case, and then ensure that your workflows
are able to then continue properly.
Now, with multiple titles out of the way,
let's talk about asking a human to provide feedback.
Now, having a human approval step in an automated business process is super
common. You have this as part of any approvals that are happening,
probably in the banking space, in the financial space. You
probably also have a human in the loop as part of a foundational model that
you have created or custom built, or maybe fine-tuned,
where you want to make sure that you're able to check the responses that
are coming in. Maybe you have a flow where, for a few
requests that come in, you want a human response,
a human review, to happen. So the requirement is super
simple, but the possibilities are endless when you need to do this.
So Step Functions integrates with services in multiple ways,
and one of the ways you can do this is by waiting for a long-running piece of
work to complete; we'll use this callback
integration pattern to achieve this requirement.
What you want to do is make a call to a
service integration. This passes a unique token, and this token then gets
embedded in an email. It could go to a server, maybe an
on-premises or legacy server, or to a long-running task
in a container, for example. And then once that response
is reviewed, and they click on going ahead or
not, it returns using the Step Functions API SendTaskSuccess,
and the workflow continues from there. So as
part of this send-response-and-wait workflow, there will be
a token that's sent out, like I mentioned earlier, and this email notification is
already there, at least as part of this use case.
What will happen is there will be options being set: choose the
title that's been generated by Amazon Bedrock, or choose the title that's
been generated by Hugging Face, and then regenerate that.
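As a minimal sketch of the callback side, here is a hypothetical handler that the review page's backend could call once the reviewer makes a choice; it assumes the task token and the chosen title arrive in the event, which is an assumption rather than something shown in the session.

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

def handler(event, context):
    # The token was handed out by the .waitForTaskToken task earlier in the flow
    task_token = event["taskToken"]
    chosen_title = event["chosenTitle"]  # e.g. the Bedrock or Hugging Face title

    # Resume the paused workflow and pass the reviewer's choice back into it
    sfn.send_task_success(
        taskToken=task_token,
        output=json.dumps({"chosenTitle": chosen_title}),
    )
    return {"statusCode": 200}
```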
Now the last part of this requirement is creating an avatar for the video,
which basically is an image in this case. With machine learning models,
especially in the foundational model space, you have built-in algorithms,
and we also have pre-built ML solutions that are already available. You could probably invoke
a third-party API again for this case, and there are multiple
ways you can do this. As part of Bedrock, you also have access to
Stability AI's Stable Diffusion models, so you
can use those as part of this step. What this does is, in the
end, once you have the feedback, you can generate that video,
sorry, the avatar for the video, and then you can store
that in an S3 bucket and share that link later.
Now, one of the things you'll realize when you want to create such a complex
workflow is that especially with foundation models, you want to have this whole
idea of creating chains of prompts, which we kind of saw in the demo.
This is a technique of writing multiple prompts and responses as a sequence of steps.
A Step Functions workflow is a great way to
leverage chaining:
Step Functions simplifies the way you invoke your foundational models,
and you have state and dependency management already in place, so you can
create chains easily. You can pass the state, as we saw earlier,
taking the response of a
state and passing it to the next one, maybe only specific parts of the prompt
if you need to. And all of this is again serverless. So think of
use cases like writing blogs and articles, response validation, and the conversational LLMs
that we see a lot these days.
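A rough sketch of a two-step chain in the definition, assuming the first state parks its response under $.firstResult; the state names, model ID, and the exact response path are illustrative.

```python
import json

# Two chained InvokeModel states: the second prompt is built from the
# first state's response.
chain_fragment = {
    "GenerateTitle": {
        "Type": "Task",
        "Resource": "arn:aws:states:::bedrock:invokeModel",
        "Parameters": {"ModelId": "amazon.titan-text-express-v1",
                       "Body": {"inputText.$": "$.prompt"}},
        "ResultPath": "$.firstResult",  # keep the response for the next state
        "Next": "RefineTitle",
    },
    "RefineTitle": {
        "Type": "Task",
        "Resource": "arn:aws:states:::bedrock:invokeModel",
        "Parameters": {
            "ModelId": "amazon.titan-text-express-v1",
            # Feed the previous answer back in as context for the next prompt
            "Body": {"inputText.$":
                     "States.Format('Improve this title: {}', $.firstResult.Body.results[0].outputText)"},
        },
        "End": True,
    },
}

print(json.dumps(chain_fragment, indent=2))
```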
Now, if you want to take all of what we have seen and put that into an architecture, this is what
it looks like. So for example, you have an API Gateway
that a user would invoke through an application,
and then that would put an event into
a queue. This
event in the queue then gets picked up by a Lambda function, which would
trigger this Step Functions workflow. In this case,
you have a lot of these steps already in place
as part of the workflow. It sends the title and description back to the user,
and then you can send the chosen title and description
as part of the human workflow, if needed,
for the review part. Then, as part of the final
part where you generate the avatar,
you actually get an S3 presigned URL, because that avatar
image gets generated and then put in an S3 bucket.
So here's a demonstration of this final architecture.
So there's a short video of an interview between Jeff Bezos and
Werner Vogels. What's going to happen is that
we want to generate a title and a description and an avatar for this video.
So there's a simple UI that you saw earlier.
This uses WebSocket communication to talk to the AWS
IoT Core service. Once you select the button, it sends
that video's details, and then the workflow gets executed
from the Lambda function, and you see that step start kicking in.
This gives a nice view of the execution; you have the color coding of the
states. With Transcribe being used initially, you get the text
back for the speech that is in the video. This Transcribe
job is asynchronous, so there is a wait loop there to make sure
that we can wait for it to complete.
That's the wait loop at the beginning, and once that
wait loop is done, using
the GetTranscriptionJob API we get the final response
from the transcription job, and then we read that transcript.
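A rough sketch of that poll-and-wait pattern as it might appear in the definition; the job names, paths, and the 30-second wait are illustrative.

```python
import json

# Start the transcription job, wait, check the status, loop until complete.
poll_transcribe = {
    "StartAt": "StartTranscriptionJob",
    "States": {
        "StartTranscriptionJob": {
            "Type": "Task",
            "Resource": "arn:aws:states:::aws-sdk:transcribe:startTranscriptionJob",
            "Parameters": {"TranscriptionJobName.$": "$.jobName",
                           "Media": {"MediaFileUri.$": "$.videoUri"},
                           "LanguageCode": "en-US"},
            "ResultPath": None,          # keep the original input for later states
            "Next": "WaitForJob",
        },
        "WaitForJob": {"Type": "Wait", "Seconds": 30, "Next": "GetTranscriptionJob"},
        "GetTranscriptionJob": {
            "Type": "Task",
            "Resource": "arn:aws:states:::aws-sdk:transcribe:getTranscriptionJob",
            "Parameters": {"TranscriptionJobName.$": "$.jobName"},
            "ResultPath": "$.job",       # park the job status next to the input
            "Next": "JobComplete?",
        },
        "JobComplete?": {
            "Type": "Choice",
            "Choices": [{"Variable": "$.job.TranscriptionJob.TranscriptionJobStatus",
                         "StringEquals": "COMPLETED", "Next": "ReadTranscript"}],
            "Default": "WaitForJob",
        },
        "ReadTranscript": {"Type": "Pass", "End": True},  # placeholder for the S3 read
    },
}

print(json.dumps(poll_transcribe, indent=2))
```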
That transcript is already available in an S3 bucket,
because the transcription job puts it there. And then
once that is read, it is passed down to this parallel execution.
As part of the parallel execution, we have these two calls being made:
one to Bedrock, to one of the foundational models
in Bedrock, and one to Hugging Face. In this case, we're keeping it
simple; we just want to make sure that we're able to execute this and get something back,
and then we want to get user feedback. Now
quickly, just to show you what the outputs look like:
these are the inputs that are coming in, this is the transcript that's
there, the prompts. You'll notice the models that are being invoked are
also there as part of that. Here are the parameters,
here's the conversation that's happening, with the video and the
S3 bucket URL for the video and other things. And there's
a task token that's already there; this is part of the review flow that
is being invoked. So we have this task token being sent
to the page, and this page is basically where someone
can go in and say, okay, do they want to select this title,
the first one or the second one? So we select one of the titles,
and then it goes on and creates an avatar based on
that chosen title. It sends that as part of a prompt to
one of the foundation models, Stability AI in this case,
and once you have that, it gets displayed over here, and that's the
avatar that is used. Now, for example, the team that uses it can
copy this image and put it wherever they need, because it's already there in the S3
bucket, or it probably gets picked up by another flow that's used as part
of their content publishing pipeline. Now, to know more about how
you can build applications like this, there is a sample that's already available that has
different use cases. It also covers things like error retries,
prompt chaining, and all
the other parts of creating a complex workflow with Step Functions for generative
AI. So have a look at this resource; it's a great way to get started,
along with the blog posts that are linked as part of it.
With that, I would like to thank you for attending the session. Have
a good day and enjoy the rest of Conf42. Thank you so much.