Transcript
Hello and welcome. My name is Gareth and I'm pleased to be here today to
talk about exploring ChatGPT for improved observability.
So why do we care about observability? Well, Werner Vogels,
the CTO of Amazon, famously said, "Everything
fails, all the time." He was emphasizing
the importance of designing systems that can handle failures.
With this in mind, it's critical to plan and deploy a comprehensive
observability solution when designing,
building, and operating our software solutions. We need to take into account the fact that
modern platforms are largely ephemeral in nature.
They are highly dynamic and constantly evolving. Modern platforms
like the hyperscalers make customers responsible for ensuring
their solutions are architected in a way that achieves their required
reliability. Let's take a look at some of the major outages
which occurred in 2022 which may have affected customers.
In January 2022, Google Cloud performed a
routine maintenance event which in the end went wrong
in the US west region. It caused increased
latency for 3 hours and 22 minutes. This affected Google
Cloud networking, DNS, Cloud Run, Spanner,
and Compute Engine. In March 2022,
Google Cloud again experienced an outage.
This time it was Google's Traffic Director, which experienced
elevated service errors for 2 hours and 35 minutes.
This was caused by a change in the Traffic Director code that processes
the configuration. It also impacted a number of large customers,
like Spotify and Discord, amongst others.
In June 2022, Azure had an outage
where customers had trouble connecting to resources hosted in
East US 2. According to Microsoft, this was due to
an unplanned power oscillation. The issue lasted
for around 12 hours. It affected Application Insights,
Log Analytics, Managed Identity Service, Media Services,
and NetApp Files. In July, a heat
wave caused cooling systems to malfunction at data centers in
London. This affected both Google Cloud and
Oracle. Also in July, AWS suffered a
power failure in one of the zones of its US East 1 (us-east-1) region.
The outage affected connectivity
to and from the region and brought down Amazon EC2
instances. This impacted the applications of
customers such as Webex,
Okta, Splunk, and BambooHR, amongst others.
In September, Google Cloud again suffered an
outage, this time with its Cloud Filestore ListInstances
API, which started to fail globally with error
code 429. Apparently,
this outage was triggered by an internal Google service which
was managing a large number of Google projects. It malfunctioned
and overloaded the Filestore API with requests. These are
just some of the major outages, but every day
there are minor outages occurring on the hyperscalers,
so keep in mind that you're responsible for architecting your
solution to achieve your required reliability goals.
Gartner estimated the average cost
of IT downtime at $5,600 per minute
in 2014.
Over the years, this number has been steadily rising. According to
Pingdom, today it can cost as much as $260,000
an hour in the manufacturing
industry, $450,000 in the IT industry,
$3 million in the auto industry,
and up to $5 million in enterprise industries.
So a number of factors play a role in these costs,
those being the size of the business, the industry vertical, as well
as the business model. The first of today's observability challenges is
complexity. So we've introduced multicloud environments,
which are increasingly complex, and many
legacy observability platforms are not able to keep up.
The volume of data and subsequent alerts has also exploded
in recent years, resulting in lost signals
as well as alert fatigue for operations teams.
We also have challenges around silos within organizations:
because infrastructure, DevOps, and business teams don't talk
to each other, many key insights become lost or are surfaced
too late. Correlation is also a challenge for many
customers. So we need to realize what actions,
features, apps and experiences actually drive business impact.
For most customers, this is a very challenging thing to do.
Unfortunately, the risks of IT outages are expected to
increase in 2023. This is largely due
to the current economic
climate, as well as the tech layoffs which are occurring.
So in 2023 alone,
715 companies have laid off around
200,000 employees. This results
in a loss of institutional knowledge and expertise
for these companies, on top of other challenges due to
economic concerns and cost reductions.
So what do customers actually look for in their modern observability
solutions? Customers are increasingly looking for cost-effective and
unified observability platforms that can help them monitor and manage
their complex IT environments. They also expect their monitoring
solution to not only provide
real-time visibility into their systems,
but also to leverage machine learning and AI to
predict potential issues before they occur.
AI alerting is also something that businesses are looking
for, essentially to reduce
alert fatigue, and this also allows them to
proactively address potential issues before they become major problems.
Correlation to causation analysis also helps customers to identify
the root cause of issues and incidents, allowing them to quickly resolve problems
and minimize downtime. So there's a lot of innovation happening
in the AI and large language model space.
So what are large language models? So large language
models are a type of AI that can process and understand
human language. These models are trained on massive amounts of text
data, such as books, articles, and websites. They use
complex algorithms and neural networks to learn the
patterns and structures of the language, allowing them to essentially generate
human-like responses to text-based queries.
Large language models have a wide range of applications, including natural language
processing, chatbots, and language translation,
amongst others. Some of the most well-known language models
include GPT-3, BERT, and some
others, like ChatGPT. These models have been used to create
chatbots that can hold conversations with humans,
generate realistic text, and even write news articles. However,
there are some concerns around the ethical implications of large
language models, such as their potential to perpetuate biases
and misinformation. Okay, so large
language models leverage deep neural networks.
Deep neural networks attempt to imitate brain-like functionality.
Now, neural networks have been around for some time.
They are algorithms, essentially modeled after the
brain, that are designed to recognize patterns.
They interpret sensory data through machine perception.
The patterns they recognize are numerical in nature.
This is what all real-world data must be translated into.
In the diagram on the left hand side, you can see a deep neural network.
So deep learning occurs when you use
stacked neural networks, that is, networks composed
of several layers. The layers are made of nodes,
and on the right hand side, we can see one of those nodes magnified.
A node is a place where computation happens.
It's loosely patterned on a neuron, and it fires when it
encounters sufficient stimuli. A node combines input
from the data with a set of weights that
either amplify or dampen that input,
therefore assigning significance to inputs.
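To make that node computation concrete, here's a minimal sketch in Python; the weights, inputs, and sigmoid activation are illustrative assumptions, not taken from any particular model.

```python
import numpy as np

def node(inputs: np.ndarray, weights: np.ndarray, bias: float) -> float:
    # Combine each input with its weight, amplifying or dampening it,
    # then apply an activation so the node "fires" on sufficient stimuli.
    stimulus = np.dot(inputs, weights) + bias
    return 1.0 / (1.0 + np.exp(-stimulus))  # sigmoid squashes to (0, 1)

inputs = np.array([0.9, 0.1, 0.4])      # three incoming signals
weights = np.array([2.0, -1.5, 0.5])    # learned significance of each input
print(node(inputs, weights, bias=-0.5))  # ~0.79: enough stimulus, it fires
```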
So we can see that we feed it an image of a dog.
It runs through the neural network and it
detects that it's a dog. Large deep
neural network models are pretrained on data from across the whole Internet.
This requires a significant amount of effort and engineering.
Initially, you need to prepare the data. This includes
actually selecting the data that you would use, filtering it,
deduplication of the data,
and redaction, so essentially removing PII,
and finally, tokenization. We then adjust the billions
of parameters to ensure
that our model returns the expected results. As you can
see, GPT-1 had 117 million
parameters, but GPT-3, which is a fairly new
model, has 175 billion parameters.
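To make those data-preparation steps concrete, here's a minimal, hypothetical sketch in Python of filtering, deduplication, PII redaction, and tokenization; real pretraining pipelines are vastly larger and use proper subword tokenizers rather than whitespace splitting.

```python
import re

def prepare(documents: list[str]) -> list[list[str]]:
    # Filtering: keep only documents of a reasonable length (a toy rule).
    docs = [d.strip() for d in documents if len(d.strip()) > 20]
    # Deduplication: drop exact duplicates while preserving order.
    seen, unique = set(), []
    for d in docs:
        if d not in seen:
            seen.add(d)
            unique.append(d)
    # Redaction: mask PII such as email addresses (a toy pattern).
    redacted = [re.sub(r"\S+@\S+\.\S+", "<EMAIL>", d) for d in unique]
    # Tokenization: split each document into tokens.
    return [d.split() for d in redacted]

corpus = prepare([
    "Contact me at jane@example.com about the outage report.",
    "Contact me at jane@example.com about the outage report.",  # duplicate
    "too short",
])
print(corpus)  # one deduplicated, redacted, tokenized document
```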
Finally, we need to actually reinforce the learning with human feedback.
So we see in the diagram on the left hand side that outputs
from the neural network are rewarded by human labelers. This incentivizes
the model, or the neural network, to
favor those outputs over others. These models
are extremely expensive to train: energy
costs alone can run in excess of $20 million.
This is an important caveat.
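As a rough illustration of that reward signal, here's a toy pairwise-preference loss of the kind used to train reward models; the scores are made up, and real RLHF additionally involves a policy-optimization step (for example PPO) that this sketch omits.

```python
import math

def preference_loss(score_chosen: float, score_rejected: float) -> float:
    # The reward model is penalized unless it scores the output the human
    # labeler preferred above the one they rejected (Bradley-Terry style).
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A labeler preferred output A over output B; the model's current scores
# give a small but non-zero loss, nudging it to favor A-like outputs.
print(preference_loss(1.2, 0.4))  # ~0.37
```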
Okay, here we can see some of the data sets
which were used to train the various OpenAI models
in the GPT family. GPT-1 took around
4.8 GB of unfiltered data; GPT-2,
40 GB of human-filtered data.
And finally, GPT-3 has 570 GB
of filtered data, from 45 terabytes of raw data.
ChatGPT, which is the focus of this session,
is a version of GPT-3.5. It's fine-tuned
on dialogue, using 175 billion parameters,
so ChatGPT can understand and respond to a wide range of
user queries, from simple questions to complex conversations.
It can also generate human-like responses and
adapt to the human's language and
tone as well. ChatGPT has been used in various applications, such as
customer service, education, and entertainment.
So typical language models use next-token
prediction, or masked language modeling,
to predict the next word in a sequence. There are limitations
to these two approaches: they
are unable to fully understand the context, and
inputs are processed sequentially on an individual basis.
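For intuition, the next-token objective just described can be sketched with a toy bigram model in Python; the training text is made up, and real models learn such probabilities over enormous vocabularies and contexts.

```python
from collections import Counter, defaultdict

def train_bigram(text: str) -> dict:
    # Count, for each word, which words tend to follow it.
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

model = train_bigram("the system failed the system recovered the system failed")
# Predict the next word after "system": the most frequent follower wins.
print(model["system"].most_common(1))  # [('failed', 2)]
```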
What OpenAI and GPT essentially
brought to the table was that they
were designed as an
autoregressive language model,
which uses previous words to predict the next word
in the sequence. They also leverage the transformer
architecture, a deep learning model that
adopts the mechanism of self-attention,
differentially weighting the significance of
each part of the input data.
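Here's a minimal sketch of the scaled dot-product self-attention computation at the heart of the transformer; the random matrices are purely illustrative, and production implementations add learned projections, multiple heads, and masking.

```python
import numpy as np

def self_attention(Q, K, V):
    # Every token attends to every other token at once: compute pairwise
    # relevance scores, softmax them, and mix the values accordingly.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over tokens
    return weights @ V

rng = np.random.default_rng(0)
tokens, dim = 4, 8  # a 4-token input with 8-dimensional embeddings
Q, K, V = (rng.normal(size=(tokens, dim)) for _ in range(3))
print(self_attention(Q, K, V).shape)  # (4, 8): all tokens processed together
```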
In fact, ChatGPT is able to process
all input data simultaneously. So ChatGPT
is a generative AI which has the ability to
learn and make decisions.
But this does not mean that it's Skynet.
There are some key differences between the two. Skynet is
a fictional AI system that was created to control military weapons and defense systems.
It became self-aware and decided that humans were a threat to its existence.
This led to a war between humans and machines. Skynet is often
portrayed as a malevolent force that seeks to destroy humanity.
So ChatGPT and generative AI models are used in a variety
of applications, such as image and text generation, a far
cry from what Skynet supposedly could do in
Terminator. They are trained on large data sets
and can generate new content that is similar to the original data. Generative AI
models are not inherently good or evil,
but their use can have ethical implications.
While there are some similarities between the two, they are fundamentally
different, and we
still have a way to go to reach Skynet's level.
It's always important to consider ethical implications when leveraging AI
in any application, including with generative AI
models. So luckily, ChatGPT is not perfect. As I mentioned,
it can on occasion return nonsensical responses.
It's sensitive to minor changes in prompting,
it's excessively verbose and overuses
phrases, and it's challenged by ambiguity.
Another issue is that it's susceptible to prompt hacking
or injection, so we have to take care when
designing our prompts.
Okay, so how can we use ChatGPT in our
observability solutions? Well,
there are many things that we could use it for,
for instance, conversational UI. So using
natural language is a very comfortable way for users to query data.
Also code generation: ChatGPT
could support developers and operations engineers when writing scripts and code.
Another area which is very interesting for us is intelligent
problem remediation, with ChatGPT
suggesting ways to resolve problems in custom
code. Finally, we could look at enriching
observability context using ChatGPT. This means we would enrich
problem tickets or alerts using ChatGPT to
provide additional context, essentially driving more effective
remediation.
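As an illustration of that enrichment idea, here's a hypothetical sketch using the pre-1.0 openai Python client; the alert fields, prompt wording, and model choice are all assumptions rather than a production design.

```python
import openai  # pip install openai; assumes OPENAI_API_KEY is set in the env

def enrich_alert(alert: dict) -> str:
    # Ask the model for remediation context to attach to a problem ticket.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You are an SRE assistant. Given an alert, briefly "
                        "suggest likely causes and first remediation steps."},
            {"role": "user", "content": f"Alert: {alert}"},
        ],
        temperature=0,  # keep the enrichment as repeatable as possible
    )
    return response.choices[0].message.content

ticket_note = enrich_alert({"service": "checkout", "metric": "p99_latency",
                            "value_ms": 2400, "threshold_ms": 500})
print(ticket_note)
```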
Keep in mind that ChatGPT's responses are non-deterministic.
You can see on the left hand side,
I prompted ChatGPT with a question,
and then I prompted it again with the same question.
Although at first glance it looks like it produced the same output,
we can see a number of differences.
I would say that humans expect IT
systems, or computers, to be deterministic.
Now, if we were to integrate this into our solutions,
operations engineers might be thrown
off by the fact that it produces different guidance
based on the same input. We need to do
some work to make sure that users of our systems
understand and receive the correct
output every time.
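One knob behind this variability is the sampling temperature: the model samples from a probability distribution over next tokens rather than always picking the most likely one. A small Python sketch with made-up scores shows how temperature reshapes that distribution.

```python
import numpy as np

def token_distribution(logits: np.ndarray, temperature: float) -> np.ndarray:
    # Softmax with temperature: low values sharpen the distribution toward
    # the top token (more deterministic); high values flatten it (more varied).
    scaled = logits / temperature
    exp = np.exp(scaled - scaled.max())
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5])      # scores for three candidate tokens
print(token_distribution(logits, 0.2))  # ~[0.99, 0.01, 0.00]: near-greedy
print(token_distribution(logits, 1.5))  # ~[0.53, 0.27, 0.20]: more varied
```

Many APIs expose this setting directly, and a low temperature is one practical way to make responses more repeatable, though it does not make them fully deterministic.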
Also, to make informed decisions, ChatGPT needs to
build up a lot of context, and that's in the form of essentially
prompt and completion, or question and answer. You'll see this
chat thread that I had with ChatGPT,
where I asked it why I was experiencing
additional latency between layers in my application.
Now, it was very verbose initially,
and as I drilled down into specific areas,
it asked more and more questions, right? Eventually you would get to the
answer, but you can't expect engineers
or operations teams to do this every single time to solve every
problem. It just takes too long. As a result,
we need to ensure that we engineer our prompts very
well, using guidance you'll see on the next slide, to
ensure that we get the right answers as quickly as possible.
So as I mentioned, prompt engineering is important. This is,
in my mind, a new discipline, and there are a
number of things that you should keep in mind.
There's a lot of guidance out there. In this case, I'm looking at
guidance from Microsoft.
At the top of the screen, we can see some basics for
designing your prompts. Be specific:
leave as little to the imagination as possible.
Use analogies, and be as descriptive as possible.
Provide samples. Double down: you may need to
remind the model what you actually
want, because that may
be lost as you proceed in
your chat thread.
Make sure the
order of the prompts prioritizes
what you actually want, so order matters.
And give the model an alternative: give
it the option to say "I don't know" or "I
don't have enough information to do that",
those kinds of things.
And that translates into essentially a number of different implementation
techniques. So priming the model, for example,
you make sure that the model has sufficient context, instructions and other
information which is relevant to what you're trying to achieve,
and you use a system prompt to prime the model.
Providing examples is a very good
technique, and this also provides
additional context to the model; it's called few-shot
learning. LLMs
can be susceptible to recency bias, so make
sure that you repeat yourself, with the most
important points at the end of your prompt.
Use a few words or phrases at the end of the prompt to obtain
the model response that you want, in the format that you want.
So if you want JSON, you can tell the model that yes,
I want JSON in this particular format, or CSV or whatever
you want. Keep in mind that
large language models often perform better if the task
is broken down into smaller steps.
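Pulling those techniques together, here's a hypothetical prompt layout in the chat-message format used by OpenAI's chat models; the role text and example telemetry are invented for illustration.

```python
# Priming, a few-shot example, an escape hatch, output format, and recency
# all in one message list (illustrative content throughout).
messages = [
    # Prime the model with a system prompt: role, context, and constraints.
    {"role": "system",
     "content": "You are an observability assistant. Answer only from the "
                "telemetry provided. If you lack information, say "
                "'I don't know' rather than guessing."},
    # Few-shot learning: show one worked example of the task.
    {"role": "user",
     "content": "Telemetry: CPU 95%, error rate 0.2%. Diagnosis?"},
    {"role": "assistant",
     "content": '{"severity": "warning", "cause": "CPU saturation"}'},
    # The real question, with the output format spelled out and the most
    # important instruction repeated last to counter recency bias.
    {"role": "user",
     "content": "Telemetry: memory 99%, OOM kills rising. Diagnosis? "
                "Respond as JSON with keys 'severity' and 'cause' only."},
]
```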
In recent months, we've seen a very big push by the hyperscalers
to incorporate generative AI into their platforms,
Azure has invested heavily in OpenAI,
to the tune of $10 billion.
They have also announced a number of different services, like
Prompt Flow, and support for various foundation
models. They also announced that they will be
adopting the ChatGPT plugin standard
by OpenAI. AWS also
announced a number of different models. They also announced
new hardware and infrastructure for training models,
to make it more effective and efficient and to
reduce training costs. They also
recently made CodeWhisperer generally available. Google Cloud,
at their Google I/O conference, announced more than 25 products
powered by the PaLM 2 and Gemini
models. They
also announced next-generation A3 GPUs
for training models as well.
So I want to leave you with some thoughts.
Do I think that large language models are a panacea?
No, I don't. I think we need to use them in
the right way. I think prompt engineering is critical.
We need this new discipline, which may
require reskilling, and also the correct tooling, which may
not even exist today, to support engineers.
Protecting intellectual property and data,
and addressing security concerns, can be difficult;
we need to think carefully about how we do that when we engineer our prompts.
And we need to understand, in general, the risks of the
GPT family of models, as well as generative AI, before
we actually use them in our systems.
Thank you for joining my session. I really enjoyed presenting
to you. Please feel free to reach out to
me anytime for further discussions.