Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, and welcome to my talk, titled "GPT: Revolutionizing Monitoring and Logging Systems."
So the AI you've been hearing about on the news mostly refers to LLMs, which include ChatGPT, Bard, My AI from Snapchat, and so on. LLM stands for large language model, and an LLM can be roughly thought of as a system that generates likely strings, in the same grammar, that follow an input string.
It's important to note that these are not magic, they are not self aware, and, at the time of this recording, they don't retain any memory of anything that's been said to them beyond what was in the prompts you provided. The big LLM providers are OpenAI, which is the market leader in the space and offers the most sophisticated models available today; Google, with their Bard AI; and Stability AI, whose models are open source and can run on your own infrastructure.
It's important to note that LLMs are liars, often, or sometimes, rather. While they generate likely strings, those are not necessarily accurate strings, so they're best suited to workflows where there's some human review. And creating better prompts gives you dramatically better results.
So how do we create those better prompts?
Most people use GPT and other LLMs by giving them what are called single-shot prompts, where you give the model a question and expect some output back. Here is an example of a single-shot prompt, where we ask it to create a list of characters from the book Dune in the format of the first letter of the first name concatenated with the last name, as you would usually do for usernames. As you can see, it didn't quite give us the results we wanted, though it gave us something that loosely looks like them. So how can we improve this? We can improve it by giving it some examples.
Here we give it the example of Paul Atreides, for whom we want the username PAtreides. As you can see, it then generated a list of names that better fit the output we're looking for.
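For illustration, a few-shot version of that prompt might look something like this (a reconstruction of the idea, not the slide verbatim):

    List the characters in the book Dune as usernames, where a username
    is the first letter of the first name concatenated with the last name.

    Example: Paul Atreides -> PAtreides

    Usernames: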
Knowing that providing examples gives us better results, how can we use that knowledge to improve the applications we're building? With OpenAI, we can also do what is called tuning. A tuned model can be thought of as many-shot.
OpenAI allows you to provide JSON-formatted data of example prompts and the example outputs we want for those prompts. We can send many, many prompts and example outputs this way to OpenAI to tune our model. It's highly recommended that you do this, as it creates much higher quality results and also reduces costs significantly, because we're no longer putting those examples into our prompts, which counts against our token limit.
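As a sketch, the GPT-3-era fine-tuning format was JSONL, with one example prompt and its desired completion per line; the data below is hypothetical, and OpenAI's fine-tuning formats have changed over time:

    {"prompt": "Paul Atreides ->", "completion": " PAtreides"}
    {"prompt": "Duncan Idaho ->", "completion": " DIdaho"}
    {"prompt": "Gurney Halleck ->", "completion": " GHalleck"}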
Here's how we can use GPT to improve our monitoring and logging. Say we have a raw log file. As you can see, it's not too easy to work with as is. We can give GPT a handful of examples of the specific fields we want it to extract from that raw log event, and as you can see, it pulled out all of those individual fields. ChatGPT's response is highlighted in green here. Once we have those, we can then say, let's write some regexes that pull out the field values for each of those fields. You can see it didn't always quite do the best job, but this is a great jumping off point for building regexes that we can then use in our application.
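Here's a minimal sketch of that kind of extraction call, assuming the official openai Node.js package; the log line, the field names, and the prompt wording are all illustrative:

    import OpenAI from "openai";

    // A hypothetical raw event; in practice this comes from your log pipeline.
    const rawEvent =
      "2023-05-01T12:34:56Z sshd[812]: Failed password for admin from 10.0.0.5 port 22 ssh2";

    async function extractFields(): Promise<void> {
      const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
      const resp = await client.chat.completions.create({
        model: "gpt-3.5-turbo",
        temperature: 0, // extraction benefits from deterministic output
        messages: [
          {
            role: "user",
            content:
              "Extract timestamp, process, user, and source_ip from this " +
              "log line, one field: value pair per line.\n\n" + rawEvent,
          },
        ],
      });
      console.log(resp.choices[0].message.content);
    }

    extractFields().catch(console.error);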
Here's an example of where we can use GPT to help us out with our monitoring and logging. This is an event from Azure AD describing a user being disabled. Using that few-shot method discussed earlier, we can give it some fields and their field values and ask it to get the rest of the fields and field values from that raw log event. As you can see, it did that. We can then ask it to write regexes that extract each of those fields, which it did with mixed results, but it's a great jumping off point to build the rest of your field extractions.
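As a quick usage sketch, a regex like the ones it produces can drop straight into application code; the pattern and the field here are hypothetical:

    // Hypothetical GPT-suggested pattern for pulling userPrincipalName
    // out of an Azure AD event rendered as JSON text.
    const upnPattern = /"userPrincipalName"\s*:\s*"([^"]+)"/;

    const raw = '{"userPrincipalName":"jdoe@example.com","result":"success"}';
    const match = raw.match(upnPattern);
    console.log(match?.[1]); // -> jdoe@example.com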
Where GPT really shines is summarizing large amounts of data into something human readable. So here's that raw log event from Azure AD that describes a user being disabled. As you can see, it's not too human readable as is, unless you've spent quite a bit of time looking at Azure logs. So what can we do to make it immediately recognizable to an analyst?
We can ask GPT to summarize it for us, and as you can see, it did a fantastic job of that. It described the Core Directory service that log came from, it turned that ISO 8601 timestamp into something human readable, and it described who the acting user was, who the target user was, and the result of that operation. This was done using very little training, and it was immediately useful to us.
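A minimal summarization sketch, under the same assumptions as the earlier snippet (the openai Node.js package; the prompt wording is illustrative):

    import OpenAI from "openai";

    async function summarize(rawAzureEvent: string): Promise<string | null> {
      const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
      const resp = await client.chat.completions.create({
        model: "gpt-3.5-turbo",
        temperature: 0.7, // a little variety reads better in prose summaries
        max_tokens: 150,
        messages: [
          {
            role: "user",
            content:
              "Summarize this Azure AD audit event in one short paragraph " +
              "for a security analyst:\n\n" + rawAzureEvent,
          },
        ],
      });
      return resp.choices[0].message.content;
    }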
How can we do this automatically with Kibana? Kibana plugins are written in TypeScript, so they're pretty easy to work with, and Elastic offers a template plugin on their GitHub page. I highly recommend you take a look at that and, building off that template, add the API integrations you want to use.
There are also helpful guides out there if you want to create your own Kibana plugin from scratch. With the Kibana plugin that we put together, we were able to give it that original log file and then have GPT add a description to the log, which we then stored. This is much more readable to a human analyst who would be reading through it.
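As a rough sketch of the server side of such a plugin, assuming Elastic's plugin template; the route path, schema, and prompt are hypothetical, and import paths vary between Kibana versions:

    // server/routes/summarize.ts -- a hypothetical route in a template-based plugin
    import { schema } from '@kbn/config-schema';
    import type { IRouter } from '@kbn/core/server';
    import OpenAI from 'openai';

    export function registerSummarizeRoute(router: IRouter) {
      router.post(
        {
          path: '/api/gpt_logs/summarize',
          validate: { body: schema.object({ rawEvent: schema.string() }) },
        },
        async (context, request, response) => {
          const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
          const completion = await client.chat.completions.create({
            model: 'gpt-3.5-turbo',
            messages: [
              {
                role: 'user',
                content: 'Summarize this log event for an analyst:\n\n' +
                  request.body.rawEvent,
              },
            ],
          });
          return response.ok({
            body: { description: completion.choices[0].message.content },
          });
        }
      );
    }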
Some caveats with building plugins that interface with OpenAI: there is a token limit, which we'll talk about later, that basically specifies that we can only send some amount of data to the API and get some amount back from OpenAI. So the raw event we're sending may not fit, and we want to be careful to trim what data we send to only what's necessary. It's also worth noting that this raw event may contain information you don't want to send, including client IDs, specific usernames, and so on.
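A small sketch of that kind of pre-processing; the redaction patterns and the character cap are purely illustrative:

    // Trim and redact an event before it ever leaves your infrastructure.
    const SENSITIVE = [
      /"client_?id"\s*:\s*"[^"]*"/gi,
      /"userPrincipalName"\s*:\s*"[^"]*"/gi,
    ];

    function sanitize(rawEvent: string, maxChars = 4000): string {
      let out = rawEvent;
      for (const pattern of SENSITIVE) {
        // Keep the field name but blank out its value.
        out = out.replace(pattern, (m) =>
          m.replace(/:\s*"[^"]*"/, ': "[REDACTED]"'));
      }
      return out.slice(0, maxChars); // crude guard against the token limit
    }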
So, working with the OpenAI API, there are a number of different parameters we can look at: the model, the prompt itself, temperature, max tokens, top P, frequency penalty, and presence penalty. Let's talk about each of these; for each, there are often analogous parameters in other LLMs.
Temperature is a value that describes how randomly we want the model to behave. It's a float between zero and two, where zero instructs the model to behave completely deterministically, so that one prompt will always give us the same answer back, and higher values will get more random and varied answers back. It's advisable to use a low temperature for things where determinism is valuable, like field extractions and creating regexes, and higher values where the model is providing more human readable responses back, like summarizations, and so on.
Tokens describe the max amount of data that can be sent to, and received in response from, OpenAI. Both the data sent and the data you get back are summed together to see if they hit that max token limit or not. One token is loosely one word, though a single word can also be split across several tokens.
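For budgeting purposes, a common rough heuristic is about four characters per token for English text; here's a sketch (an approximation only; for exact counts use a real tokenizer such as the tiktoken library):

    // Very rough token estimate based on the ~4 characters/token heuristic.
    function approxTokens(text: string): number {
      return Math.ceil(text.length / 4);
    }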
Top P also shapes how probabilistic the answers are: the lower the value, the more probable the returned answers must be. So, for example, 0.1 means the model will only sample from the top 10% of possible answers OpenAI might generate, and it will only give you results that came out of that top 10%. It's recommended that you set either top P or temperature if you want to increase or decrease the randomness of your responses, but not both. Frequency penalty, a float between -2 and 2, decreases the likelihood of OpenAI repeating itself, and presence penalty increases the likelihood of OpenAI talking about new topics.
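Putting those together, one request might set these parameters like so; the values are illustrative, not recommendations:

    import OpenAI from "openai";

    async function demoParameters(): Promise<void> {
      const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
      const resp = await client.chat.completions.create({
        model: "gpt-3.5-turbo",   // which model to use
        temperature: 0,           // deterministic: good for extractions/regexes
        max_tokens: 256,          // cap on the size of the generated response
        // top_p: 0.1,            // alternative to temperature; set one, not both
        frequency_penalty: 0.5,   // discourage the model from repeating itself
        presence_penalty: 0.5,    // nudge the model toward new topics
        messages: [
          { role: "user", content: "Extract the source IP from: <log line>" },
        ],
      });
      console.log(resp.choices[0].message.content);
    }

    demoParameters().catch(console.error);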
Finally, there's the model, of which OpenAI has many different ones worth taking a look at. Davinci (text-davinci-003) is the most sophisticated GPT-3 model out right now; it can provide the most detailed and creative responses, though other models might be faster, lower cost, or better suited to specific subjects such as code generation, and so on.
All right, here are some OpenAI models that would be of interest to somebody viewing this presentation. GPT-4 is by a wide margin the most capable large language model out today. At the time of this recording, it is a bit expensive to use, so I wouldn't recommend using it for bulk tasks; maybe for summarization, if a high degree of sophistication is needed in the response. For the most part, I recommend using GPT-3.5 Turbo, which is a very capable model. It's optimized for the sort of work that this presentation has covered and can be used very cheaply. However, if you're looking to perform tasks that would require a large number of tokens, say summarizing very large log files, or if you're looking for very detailed responses back, GPT-4 32K is also fantastic. As you can see on this table here, you can use far more tokens with it than you can with any other model. Finally, let's talk about some privacy and confidentiality considerations.
With many large language models out there today, data sent to them is used to train future models, which you would not want them to do with a lot of the data you'd be sending them for what we talked about in this video. OpenAI, in their confidentiality agreement, say that they do not use any content sent to them via their API to train future models. However, they do not say that they do not retain logs or other properties about what was sent to them. So be mindful: sending sensitive data to a third party is always a risk, especially if that data might include secrets, tokens, usernames, client IDs, or other data like that, if that's not something you wish to disclose. If you are very concerned about the sensitivity of your data and who has it, I highly recommend looking at Stability AI and at one of their models that you can run locally on your own hardware. And that is all. My contact info is clangston@oak9.io; feel free to reach out to me if you have any questions about anything talked about today, or if you just want to say hello. All right, thanks everyone.