Transcript
            
            
              This transcript was autogenerated. To make changes, submit a PR.
            
            
            
            
Welcome to LLM 2024, organized by Conf42. My name is Indika Wimalasuriya, and I'll walk you through how you can leverage an observability maturity model to improve the end-user experience of the apps you are going to develop using LLMs. We will touch on how to start, which is the foundation, and then take it up to using AI to support your operations.

As you might be aware, around 2022 the hype started with ChatGPT. ChatGPT was a hit, it went mainstream, and it resulted in a lot of people who were not into AI starting to create generative AI apps. It has already taken over the world; everyone is looking at which use cases they can leverage. There are lots of developers building apps connecting to LLMs, so there's a need: the apps we develop must be capable of providing a full end-user experience, because we all know how it can end otherwise. While generative AI is opening up a lot of new opportunities, we also want to ensure that the apps being developed are deployed properly in production environments and served to end users as expected, and we don't want them to become an ops problem. So we want to build solid observability into our LLM applications as well.
            
            
            
As part of today's presentation, I'll provide a quick intro to what observability is, and we'll discuss what observability means for LLMs. There are two kinds of observability we are going to discuss: direct observability and indirect observability. I'll be focusing more on indirect observability when discussing the maturity model I'm going to walk you through. Then I'll cover some of the pillars, give a quick intro to what an LLM workflow looks like, and then we'll jump into my main focus, a maturity model for LLM observability. We'll look at some implementation guidelines and the services we can leverage. And of course, like every other maturity model, this should not be something people just follow blindly; we want to tie it to business outcomes so we have the ability to measure progress. We'll wrap up with some best practices and some pitfalls I think you should avoid.
            
            
            
Before we start, a quick intro about myself. My name is Indika Wimalasuriya, and I'm based out of Colombo. I'm a site reliability engineering advocate and practitioner, and a solution architect specializing in SRE, observability, AIOps, and generative AI, working at Virtusa as a senior systems engineering manager. I'm a passionate technical trainer; I have trained hundreds of people in SRE, observability, and AIOps. I'm an energetic technical blogger, a very proud AWS Community Builder under cloud operations, and a very proud ambassador at the DevOps Institute, which is now also known as PeopleCert since the acquisition. So that's about me.
            
            
            
I am very passionate about this topic, observability. Whether it's distributed systems or LLMs, at the end of the day I look at things from a customer-experience angle: how we can provide a better experience to our end users, and how we can drive better business outcomes. For this presentation I'm mainly focused on AWS, so I'm looking at LLMs deployed in and accessed through AWS.
            
            
            
One of the fantastic services AWS offers is Amazon Bedrock, a managed service where you are able to use APIs to access foundation models. It's really fast to get started; you just have to ensure you have the ability to connect. The key features: it gives access to foundation models for use cases such as text generation, image generation, and the use cases around those. It also provides private customization with your own data, using techniques like retrieval-augmented generation, which we call RAG. And it provides the ability to build agents that execute tasks using your enterprise systems and other data sources. One good thing is that there's no infrastructure to manage; AWS takes care of it, which is why we call it fully managed. It's very secure, and it's a go-to tool if you want to develop generative AI apps. It already includes some of the most widely used foundation models, provided by AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon as well, and they are continuously adding models to the catalog. With that, our observability maturity model, or the approach, is mainly focused on applications developed using Amazon Bedrock.
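To make that concrete, here is a minimal sketch of calling a Bedrock foundation model through the boto3 runtime API; the model ID, region, and request body are illustrative, since each model family defines its own body schema:

```python
import json

import boto3

# Bedrock's inference API is exposed through the "bedrock-runtime" client.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Model ID and body are illustrative; Anthropic, Cohere, Meta, etc.
# each define their own request schema.
response = client.invoke_model(
    modelId="anthropic.claude-v2",
    body=json.dumps({
        "prompt": "\n\nHuman: Summarize what observability means.\n\nAssistant:",
        "max_tokens_to_sample": 300,
        "temperature": 0.5,
    }),
)

print(json.loads(response["body"].read())["completion"])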
            
            
            
Moving on, I just want to give a quick idea of what a generative AI app workflow looks like. A typical user enters a query, which comes into our query interface, perhaps through an API or a user interface. We start processing the user query and connect it to vector encoding, trying to find similar queries and similar patterns in our vector database. Then we retrieve the top-k most relevant context from the vector database and combine it with the user input when we provide it to the LLM. The key thing to note is that we generally combine the user input with the retrievals we receive from the vector database. With that, we start inferencing with the LLM: we send the LLM the combined input, take the output, apply our RAG integration, and finally, after customization, send it to the end user. This is the typical workflow of a generative AI application, and this is what we want to enable observability over.
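As a sketch of that workflow (not code from the talk), here is roughly how the pieces could fit together; embed_query and vector_db are hypothetical stand-ins for your embedding function and vector database client:

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

def answer(user_query: str, embed_query, vector_db, top_k: int = 3) -> str:
    """Hypothetical RAG flow: embed, retrieve top-k context, then infer."""
    # 1. Encode the query and retrieve the top-k most relevant chunks.
    query_vector = embed_query(user_query)
    context_chunks = vector_db.search(query_vector, top_k=top_k)

    # 2. Combine the user input with the retrieved context.
    prompt = (
        "\n\nHuman: Use this context to answer.\n"
        + "\n".join(context_chunks)
        + f"\n\nQuestion: {user_query}\n\nAssistant:"
    )

    # 3. Inference with the LLM via Bedrock (model ID illustrative).
    response = bedrock.invoke_model(
        modelId="anthropic.claude-v2",
        body=json.dumps({"prompt": prompt, "max_tokens_to_sample": 500}),
    )
    return json.loads(response["body"].read())["completion"]
```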
            
            
            
What is observability? I'm sure most of you are aware, but just to make sure we are on the same page, I'll spend a short amount of time giving my perspective. Observability is nothing but the ability to infer or understand a system's internal state by looking at its external outputs. What are the external outputs? Typically logs, metrics, and traces. I like to think of observability as looking at the big picture, the entire iceberg, not only the part above the water. What we are trying to do is ask questions like: what is happening in my system right now, how are the systems performing, what anomalies are in my system, how are the different components interacting with each other, and what caused a particular issue or failure? When it comes to monitoring versus observability, there are a lot of good things about observability, because it's a proactive, active approach instead of a passive one; it looks at the big picture and draws on both qualitative and quantitative data.
            
            
            
Now I want to have a quick discussion and agree on something: what we mean when we say observability and LLMs. When it comes to observability in LLMs, or in the apps being developed using LLMs, we can divide it into two parts. One is what we call direct LLM observability, or observability of the LLM itself: in this scenario we monitor, evaluate, and look at the large language model directly. The other is indirect LLM observability, or observability of the applications and systems using the LLM: here we are not looking at the LLM directly, but at the applications or systems connecting to and utilizing the LLM. Both ways, we are able to provide some really good benefits to the end users. Both have their place, and the techniques we use are pretty much the same standard ones: we leverage logs, metrics, traces, and other signals.
            
            
            
Now, quickly, what do we mean by direct LLM observability? Here we integrate observability capabilities during the training and deployment of the LLM and while it's being used; it's at the level of the LLM itself. The main objective is to gain insight into how the LLM is functioning, identify anomalies and other issues directly related to the LLM, and understand the LLM's decision-making process. In terms of approach, we activate logging and look at things like attention weights and other internal states of the LLM during inference; we implement probes or instrumentation within the model architecture; and we track performance metrics such as latency and memory usage, along with techniques like attention visualization. As I said, this is at the LLM level: fully fledged visibility into how the LLM itself is performing.
            
            
            
When it comes to indirect LLM observability, we are mainly looking at the applications or systems we have developed connecting with the LLM. Here we are not looking at the LLM in isolation; we are fully focused on the application side. This is about understanding how our application is behaving, what observability signals we can enable, and how we can interpret its internal state. This makes sense because, just like with any other application, for GenAI we also want to understand how our application is performing, since any number of issues can come up. And again, at the end of the day it's about end-user customer experience: it's the users who are using our solution. What we look at here is logging the inputs and outputs related to the LLM, monitoring metrics, enabling anomaly detection on some of the LLM outputs, and obviously we need human feedback loops as well. Then we look at a lot of metrics such as error rate and latency. The key objective, as you would have already guessed, is to understand how our application is behaving, how it is leveraging the LLM, and how good the output is that we provide to our end users.
            
            
            
In this presentation, when I say LLM observability, I mean indirect LLM observability. I am looking at coming up with a maturity model catering to applications developed connecting to Amazon Bedrock, because AWS is what I am focusing on. We are trying to see how we can integrate observability practices into generative AI applications: how we can understand these applications' internal states, while also focusing on some aspects of the LLM and prompt engineering. We'll look at indirect oversight of LLM functionality and try to make sure that our generative AI applications are reliable, that they provide what they were designed to provide, and that end users are happy with the performance.
            
            
            
So we want to answer this question: why observability for LLMs? Just like any other application, generative apps developed using LLMs also require observability, because we need it to ensure our generative applications are correct, that they provide correctness and accuracy, that they perform well, and that they deliver a great customer experience. But LLMs have their own challenges; they are sometimes complex. We might have to look at what kind of anomalies there are, or whether the model has bias, or whether there's model drift. Model drift means the model can work fine during testing, and for a considerable period of time, but then it starts failing; this can have an adverse impact on the end-user experience. And sometimes models can develop biases, which again can create bad customer experiences. Then there are the other standard things: debugging, troubleshooting, how well we are using our resources, and ethics, data privacy, and security. Looking at all of these, observability for LLMs is very important because it is what allows us to deliver great end-user experiences.
            
            
            
Now we'll focus on understanding the pillars shaping LLM observability. I'd like to split them into a few parts. The first is LLM-specific metrics. One is LLM inference latency: here we track the latency of requests coming to our Bedrock application, monitoring latency at the different stages of the request, such as the API Gateway, the Lambda functions, and the LLM itself. However we have defined the flow, we try to find potential bottlenecks and see how we can improve or optimize performance. Then there's LLM inference success rate: we monitor the success rate of requests going to and coming from the LLM, and then look at the errors, whether errors are increasing, what the reasons for the errors are, and all the troubleshooting aspects as well. We also have LLM output quality, where we try to understand the quality of the LLM outputs, which is again important because it gives us the ability to improve those areas. Another important one is LLM prompt effectiveness, which tracks the effectiveness of the prompts we send to the LLM: we monitor the quality of LLM outputs across various kinds of prompts, see how the outputs deviate, and continuously refine from there. Some of the other things are LLM model drift: we monitor the distribution of LLM outputs within the application to understand whether there are significant shifts in the output distribution over time, and we keep tracking performance. Of course we also have to look at cost, and at whether there are integration issues when we integrate with the LLMs, especially since we are integrating through Amazon Bedrock. Then we look at ethical considerations as well: we monitor the LLM outputs from Bedrock for potential ethical violations, to ensure that the generative AI apps we have developed are safe to use, with no harmful, illegal, or discriminatory content. Those are the key things when it comes to LLM-specific metrics.
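As one way to capture those signals, here is a hedged sketch that wraps an LLM call and publishes latency and error counts as CloudWatch custom metrics; the GenAIApp namespace, the metric names, and the invoke_fn callable are all illustrative:

```python
import time

import boto3

cloudwatch = boto3.client("cloudwatch")

def invoke_with_metrics(invoke_fn, prompt: str):
    """Wrap an LLM call and publish latency/success as custom metrics.

    invoke_fn is a hypothetical callable that sends the prompt to Bedrock.
    """
    start = time.perf_counter()
    error = None
    try:
        result = invoke_fn(prompt)
    except Exception as exc:  # record the failure, then re-raise below
        error = exc
    latency_ms = (time.perf_counter() - start) * 1000

    cloudwatch.put_metric_data(
        Namespace="GenAIApp",  # namespace and metric names are illustrative
        MetricData=[
            {"MetricName": "LLMInferenceLatency", "Value": latency_ms,
             "Unit": "Milliseconds"},
            {"MetricName": "LLMInferenceErrors",
             "Value": 1.0 if error else 0.0, "Unit": "Count"},
        ],
    )
    if error:
        raise error
    return result
```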
            
            
            
Then, when it comes to prompt engineering properties, we look at temperature, which controls randomness in the model: the higher the temperature, the more diverse the outputs; the lower the temperature, the more focused the outputs. We look at top-p sampling, which lets us control output diversity, then top-k sampling, and things like max tokens and stop tokens, which signal the model to stop generating text when they are encountered. We also look at repetition penalties, presence penalties, and batch sizes. All of these we can extract via logs, send to CloudWatch Logs, turn into custom metrics, and then start visualizing.
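For example, something as simple as emitting the prompt properties as one structured JSON log line makes them available to CloudWatch Logs metric filters; the field names here are illustrative:

```python
import json
import logging

logger = logging.getLogger("genai-app")
logging.basicConfig(level=logging.INFO)

def log_prompt_properties(model_id: str, params: dict) -> None:
    """Emit prompt parameters as one JSON line. In Lambda this lands in
    CloudWatch Logs, where a metric filter can turn fields such as
    "temperature" into custom metrics. Field names are illustrative."""
    logger.info(json.dumps({
        "event": "llm_invocation",
        "model_id": model_id,
        "temperature": params.get("temperature"),
        "top_p": params.get("top_p"),
        "top_k": params.get("top_k"),
        "max_tokens": params.get("max_tokens"),
        "stop_sequences": params.get("stop_sequences"),
    }))
```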
            
            
            
Two other things: for inference latency, we can check the time taken for the model to generate output for the given inputs, and we can look at model accuracy metrics as well. For these we would typically use AWS X-Ray, publish the results into CloudWatch, and then create alarms around them.
            
            
            
A few other specific things: when it comes to RAG, what are the key signals? We again have metrics like query latency, where we want to understand the time it takes the RAG pipeline to process a query and generate a response. Then we look at the success rate: how successful the queries are and how often they fail. We look at resource utilization, and if you are using caching, we can look at cache hit rates as well. When it comes to logs, we look at query logs, error logs, and audit logs, which give us a comprehensive basis for auditing and troubleshooting. Then we enable traces with X-Ray, which provides end-to-end tracing so that we have complete observability into the data store and data retriever.
            
            
            
The other pillars are tracing and visualization. We will use X-Ray, which enables us to integrate traces, and we'll look at the other AWS services it integrates with as well. Then we use CloudWatch as the visualization tool; we can also use Grafana, via AWS Managed Grafana, or other options.
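A minimal tracing sketch, assuming a Lambda-based app with active tracing enabled so X-Ray supplies the parent segment; retrieve_context and invoke_llm are hypothetical helpers:

```python
from aws_xray_sdk.core import patch_all, xray_recorder

# Patch boto3 (and other supported libraries) so calls to Bedrock,
# DynamoDB, etc. show up as traced downstream calls.
patch_all()

def handler(event, context):
    # Subsegments split the request into the stages we care about.
    with xray_recorder.in_subsegment("vector-retrieval"):
        context_chunks = retrieve_context(event["query"])  # hypothetical

    with xray_recorder.in_subsegment("llm-inference"):
        answer = invoke_llm(event["query"], context_chunks)  # hypothetical

    return {"answer": answer}
```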
            
            
            
One other key thing is to be mindful about alerting and incident management. We can use CloudWatch alarms, and we can leverage AWS Systems Manager as well.
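For instance, a CloudWatch alarm on the latency metric from earlier might look like this; the names, threshold, and SNS topic ARN are illustrative:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when p99 inference latency breaches 2 seconds for three
# consecutive 5-minute periods; all values are illustrative.
cloudwatch.put_metric_alarm(
    AlarmName="genai-llm-latency-p99",
    Namespace="GenAIApp",
    MetricName="LLMInferenceLatency",
    ExtendedStatistic="p99",
    Period=300,
    EvaluationPeriods=3,
    Threshold=2000.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:genai-oncall"],
)
```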
            
            
            
One important thing is security. We will leverage AWS CloudTrail to audit and monitor the API calls and ensure that compliance with security and regulatory requirements is being tracked. We can integrate CloudTrail logs with CloudWatch Logs for centralization, and then we use AWS Config so that we can continuously monitor and assess the configuration of our AWS resources and ensure we stay in line with best practices and our compliance standards.
            
            
            
One key aspect is cost as well. The more we use our LLMs, the more the cost factor comes in, so we can leverage AWS Cost Explorer and AWS Budgets.
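A small sketch of pulling spend through the Cost Explorer API; the date range and the service filter value are illustrative:

```python
import boto3

ce = boto3.client("ce")  # Cost Explorer

# Pull one month of Bedrock spend; the SERVICE filter value must match
# the service name as it appears in your Cost Explorer data.
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-04-01", "End": "2024-05-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    Filter={"Dimensions": {"Key": "SERVICE", "Values": ["Amazon Bedrock"]}},
)

for result in response["ResultsByTime"]:
    print(result["TimePeriod"], result["Total"]["UnblendedCost"]["Amount"])
```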
            
            
            
And finally, one other important thing is AIOps capability building. For all the metrics, whether LLM-specific, application-specific, or RAG-specific, we enable anomaly detection, and for all the logs we are putting into CloudWatch, we can enable log anomaly detection as well. We can also use AWS DevOps Guru, a machine-learning service provided by AWS that helps us detect and resolve issues in our system, especially by identifying anomalies and other issues we probably would not be able to uncover manually. Then we look at leveraging Amazon CodeGuru as well, because it integrates with the application so that we can do profiling and understand resource utilization in our applications. Another very important thing is to use AWS forecasting capabilities: for all the metrics we are bringing to the table, forecasting lets us understand things in advance so that we can make better decisions and plan ahead.
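As a rough sketch, enabling a CloudWatch anomaly detection model on one of the custom metrics could look like this; the namespace and metric name match the earlier illustrative examples:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Train a CloudWatch anomaly detection model on the latency metric;
# alarms can then be created against the resulting expected-value band.
cloudwatch.put_anomaly_detector(
    Namespace="GenAIApp",
    MetricName="LLMInferenceLatency",
    Stat="Average",
)
```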
            
            
            
You might ask why we need a maturity model. I am a big fan of maturity models because I think they act as a north star. We all want to start someplace and then take our systems on an observability journey. If you do that without a maturity model or framework, you may end up anywhere; by using a maturity model you can be sure you start with the basic steps, finish with some of the advanced capabilities, and have better control over how you get there.
            
            
            
My indirect LLM observability maturity model has three levels. Level one is foundational observability. Level two is proactive observability. And at level three we are looking at advanced LLM observability with AIOps.
            
            
            
At level one we start capturing some of the basic LLM metrics, collecting the logs, and monitoring the basic prompt properties; we implement basic logging and distributed tracing, and we put up visualization and basic alerts. This gives you foundational observability into your generative AI application.
            
            
            
The next step is making the system more proactive. Here we start capturing and analyzing advanced LLM metrics, leveraging the logs further, and tracking the more advanced prompt properties. We enhance alerts and the incident-management workflow so that we can identify and resolve issues much faster. We bring in the security and compliance aspects, and we start leveraging AWS forecasting so that we can look ahead at some of the LLM-specific metrics and prompt properties. For the logs, we can also set up log anomaly detection.
            
            
            
Level three is the advanced level, the place where you all want to be, but you have to be mindful that it's a journey: you have to start with level one, go to level two, and then you can be at level three. At level three we start by integrating DevOps Guru and CodeGuru: DevOps Guru provides the AI/ML capabilities, and CodeGuru gives us insight into the quality of our code. Then we implement AIOps capabilities such as noise reduction, intelligent root-cause analysis, and business impact assessment. The forecasting features allow us to understand, if the model can drift, when that might happen; if the model can start showing bias, when that might happen; plus response-time predictions and so on. This AI-driven layer gives you full control over the predictability of your generative AI application.
            
            
            
Now let's focus on the implementation angle. At the foundational level we can use CloudWatch metrics to capture the basic LLM metrics: inference time, model size, prompt length, and so on. For the prompt properties, we send those logs into CloudWatch Logs so that we can start monitoring basic properties like prompt content and prompt sources; any other logs we ship into CloudWatch as well, so we can start getting the basic details. Then we integrate AWS X-Ray, based on the technology we are using to develop our generative AI app, so that we have the ability to look at traces. For visualization and dashboards, we can use CloudWatch dashboards and, if required, go to AWS Managed Grafana dashboards as well. For alerting and incident management we leverage CloudWatch, which helps us cover basic to medium-complexity monitors so that we have good control over how the LLM is behaving, how successful our prompts are, how our generative application is behaving overall, and, more than that, how our end users are feeling about it. And then we wrap this up with cost, using AWS Cost Explorer: because LLMs can be costly, we have to track usage and start monitoring that as well.
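To illustrate the dashboard piece, here is a hedged sketch that creates a CloudWatch dashboard over the earlier illustrative custom metrics; names and layout are illustrative:

```python
import json

import boto3

cloudwatch = boto3.client("cloudwatch")

# A minimal one-widget dashboard over the two earlier custom metrics.
dashboard_body = {
    "widgets": [{
        "type": "metric",
        "x": 0, "y": 0, "width": 12, "height": 6,
        "properties": {
            "title": "LLM inference latency / errors",
            "region": "us-east-1",
            "metrics": [
                ["GenAIApp", "LLMInferenceLatency"],
                ["GenAIApp", "LLMInferenceErrors"],
            ],
        },
    }]
}

cloudwatch.put_dashboard(
    DashboardName="genai-foundational-observability",
    DashboardBody=json.dumps(dashboard_body),
)
```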
            
            
            
At level two we go a little more advanced. For metrics, we start looking at advanced signals like model performance and output quality. For prompt properties, we look at advanced aspects like prompt performance and prompt versioning. Then we keep improving the incident workflows, we address security compliance, and we keep uplifting the cost side as well. One of the key things we bring in here is AWS forecasting: using forecasting we want the ability to forecast all the key metrics, even every metric, related to LLM performance, inference, accuracy, and the prompt properties. We also look at enabling metric anomaly detection and log anomaly detection, so that we start using those capabilities. Finally, we bring in AWS DevOps Guru and CodeGuru, which let us bring AI/ML capabilities into the picture so that we can look at things holistically; DevOps Guru is a perfect tool here. Then we bring in AIOps practices and ensure our incident workflows move toward self-healing, along with a lot of other AI-driven improvements we can add.
            
            
            
While we do all this, we want to ensure that we measure progress. Once we enable observability, we want to see how the LLM output quality is improving, how we are optimizing our prompt engineering, and that we are able to detect model drift in advance and take the necessary actions. We look at the ethical side and how our models are behaving against those expectations. We look at interpretability and explainability and keep a close eye on them. And in general, we look at the end-user experience: we clearly define some end-user-specific service level objectives, track the metrics and the improvements, and make sure that whatever we do aligns and correlates with customer experience, so that we see customer experience increasing as well, and overall we develop and provide world-class services to our end users.
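To show what a measurable service level objective can mean in practice, here is a toy error-budget calculation, assuming an availability-style SLO over LLM request success; the numbers are purely illustrative:

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Toy error-budget calculation for an availability SLO.

    slo_target: e.g. 0.999 for "99.9% of LLM requests succeed".
    Returns the fraction of the period's error budget still unspent.
    """
    allowed_failures = (1 - slo_target) * total
    if allowed_failures == 0:
        return 0.0
    return max(0.0, 1 - failed / allowed_failures)

# Example: 100k requests at a 99.9% SLO allows 100 failures;
# 40 failures leaves 60% of the error budget.
print(error_budget_remaining(0.999, 100_000, 40))  # 0.6
```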
            
            
            
Then some best practices: use structured logging, and if you are heavily using Lambda, go for Lambda Powertools. Instrument the code to ensure you capture all the critical LLM-specific metrics, and obviously use X-Ray to enable traces as well. The metrics we extract have to be meaningful and add value; they should be aligned with our business objectives as well.
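A minimal structured-logging sketch with Powertools for AWS Lambda (Python), as suggested above; the service name and logged fields are illustrative:

```python
from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.typing import LambdaContext

logger = Logger(service="genai-app")  # service name is illustrative

@logger.inject_lambda_context
def handler(event: dict, context: LambdaContext) -> dict:
    # Keys appended here show up as searchable JSON fields in
    # CloudWatch Logs Insights alongside Powertools' standard fields.
    logger.append_keys(model_id="anthropic.claude-v2")
    logger.info({"event": "llm_invocation",
                 "prompt_length": len(event.get("prompt", ""))})
    return {"status": "ok"}
```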
            
            
            
To wrap up, some of the pitfalls: make sure you plan security in advance, and compliance as well, because that's a key thing in the modern day when we are running generative AI applications. Clearly define the goals, whatever objectives you are going to achieve with this, and ideally have some numbers, some measurable targets, so that you can track how you are performing and actually get the benefit.
            
            
            
With this, we are at the close, so thank you very much. Here I have taken AWS as the example, especially Amazon Bedrock. We have looked at the general architecture and workflow of a generative AI application, the key LLM-related observability pillars we have to enable, and then the three levels: foundational observability, proactive observability, and advanced observability with AIOps. We have looked at some of the best practices and pitfalls, and, more importantly, how to look at this from an ROI perspective. Thank you very much for taking the time. I hope you enjoyed this, and that you have taken away a few things you can apply to your generative AI application to make it observable and leverage that to deliver great customer experiences. With this, thank you very much.