Transcript
Hello everyone and welcome to my talk about measuring expression complexity for PromQL and MetricsQL.
A few words about me. My name is Roman. I'm a software engineer with experience in distributed systems, and for the last five years I've been working on an open source project called VictoriaMetrics. VictoriaMetrics is an open source time series database. It's available on GitHub under the Apache 2.0 license, so I encourage everyone to check it out.
Okay, what is this talk about? Maybe many of you know what Prometheus or VictoriaMetrics is, but just in case: both of them are used for monitoring and alerting, for collecting metrics, storing them and processing them. Both of them are open source. Prometheus uses PromQL as a query language, and VictoriaMetrics uses MetricsQL as a query language. These two languages are very similar, with the same syntax, so this talk will be beneficial if you are familiar with either of these systems. And both of them are well integrated with Grafana, so you can plot dashboards with VictoriaMetrics or Prometheus as data sources.
And this is exactly the case where things can go bad. Imagine yourself in the middle of an incident, trying to understand why your system is on fire. You go to Grafana, you open your dashboard, and then it takes 60 seconds or something like that to load. Everything is so slow, and this just adds more stress to your working day. So I hope with this talk we can reduce that stress level, because hopefully you will learn how to make your queries faster, or at least to understand why they are slow.
Okay, on the screen we have two examples, two similar examples: a slow query and a fast query. The first query on the screen is the slow one; it takes 3.5 seconds to execute. The second query is very similar, almost the same, but it has a different metric name inside, and it takes 50 milliseconds. So why is it so fast? Or the first query, why is it so slow? Normally, at an offline conference, I would ask people in the room to answer, but we are online, so I will answer myself.
Well, because those queries are not identical: they scan different amounts of data. Those metric names have different numbers of time series behind them. So in order to understand this difference, why we scan that much data, or why queries are slow because they scan so much data, let's try to build a mental model of what is going on inside a time series database. Maybe not all of you have this mental model of how Prometheus and VictoriaMetrics work, but it is very likely you understand how traditional relational databases work, like MySQL for example. So let's try to refer to that experience of optimizing MySQL queries and try to apply it to PromQL and MetricsQL expressions.
To do this, I asked ChatGPT: how can I optimize this SQL query? And ChatGPT provided really good advice, so that was pretty helpful. For example, I have this non-optimized query which selects everything from table bar, and ChatGPT gave me this advice: use indexes, but avoid using too many indexes; select only what you need; try to filter data as early as possible; think of your database design and maybe apply partitioning if you need it.
Okay, so I followed that advice. I reworked my table. I'm selecting only the columns that I need. I partitioned my table by time, so I am adding a filter by date. And I indexed my table by user id, so I'm selecting only the user I want. This query will become faster after I modified the table, after I modified my database design, after I changed my query to be efficient.
Okay, but can I apply the same tips, the same strategy
to prompt QL and metrics QL expression? Well,
not really, because neither the Prometheus
nor Victoria metrics provide you that flexibility,
actually. In fact, you can use only filtering
data as your main leverage on improving the performance of the queries.
Okay, let's now try to build a mental model of how Prometheus and VictoriaMetrics work. A very simplified mental model. On the screen you have four time series with the same metric name, foo. Each time series has a value and the timestamp when that value was observed. So we talk here about four attributes: the metric name and the metadata, which identify the time series, and then a value and a timestamp, basically when the observation was made and what the observation was. If you want to learn more about the internals of the data model, please visit docs.victoriametrics.com/keyconcepts, and I hope you will find a lot of useful information there.

Okay, when we ingest these time series into the database, both Prometheus and VictoriaMetrics will use the metric name and metadata combination as an ID, and that ID will be added to the index. So next time we can identify whether we have already seen this time series, and at read time we will also know where to find the data the user asked for. Values and timestamps will be stored as separate columns. Those separate columns will be sorted differently, and different compression algorithms will be applied to them. All of this is done automatically by Prometheus or VictoriaMetrics.
And knowing this, we can now say that the data model in Prometheus and VictoriaMetrics can't be changed. The user doesn't have control over it. Indexes are created automatically; you can't change that. Data blocks are already partitioned by time, because we are talking about a time series database. And the stored data columns are already predefined: values and timestamps, names and metadata. You don't have control over this, it is already predefined. So there is not much we can do. Let's now understand better what a time series
is. So we have a metric called foo, and it has four time series under this name; they differ only by labels. The dots on the screen are basically our data samples. A data sample is a pair of a value and the timestamp when this data sample was observed. And this is exactly why time series are called time series: we have a series of these observations, a series of data samples ordered by time, and each time series can have an arbitrary number of these observations at different resolutions.
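To make that concrete, here is a rough sketch of the data model (the metric name and label names are just placeholders):

    # one time series = metric name + labels (its identity) plus a sequence of samples
    # foo{instance="host-1", job="app"}  ->  (12.3, t1) (12.7, t2) (13.0, t3) ...
    # foo{instance="host-2", job="app"}  ->  (41.0, t1) (40.8, t2) (41.2, t3) ...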
Okay, so what makes our PromQL and MetricsQL queries slow? There are two main reasons why they can be slow: when we select a big number of time series, and when we select a big number of data samples within those time series. Let's see it on an example. Here we have a query which doesn't have any filters in it: no time filter, no label filter. So we select everything: all the time series and all the data samples that we have. This query is likely to be slow.
Now we add a filter to this query. We want only label one and label two to be present in the response, so we scan half as much data. This query will be faster than the previous one. And in this example we also add a time filter: we want not all the data samples that are in the database, but only the last five minutes. So we reduce the number of samples we need to select, and this query will be faster than the previous one. And yeah, the most valuable advice I can give you in this talk is that selecting less data is the most effective way to optimize query performance. If you remember only this from this talk, that will already be very good.
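Roughly, that progression looks like this (the metric and label names here are placeholders, not the exact queries from my slides):

    foo                               # no filters: every series, every sample in the selected range
    foo{label=~"label1|label2"}       # label filter: fewer time series
    foo{label=~"label1|label2"}[5m]   # plus a lookbehind window: only the last five minutes of samples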
Okay, so the question now is: how can I, as a developer or DevOps engineer, understand the volume of data my query selects? How can I estimate the complexity of my query? Well, for this we need to answer two questions: how many series do we select, and how many samples do we select? On the screen right now you have advice on how to estimate the number of series your query selects. For this, you basically take your series selector and wrap it in the function called count. This will calculate the number of time series your query will select. Here's an example with our first two queries from the beginning of the presentation; the first query is slow, the second is fast. The slow query selects about 1000 time series, and the second query selects only one. So no surprise the slow query is slow: it selects a thousand times more time series than the second one.
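In case it helps, this is roughly what that check looks like (the selector is a placeholder; use your own query's selector):

    count(foo{job="app"})   # returns how many time series the selector matches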
Okay, those are series. What about samples? For samples, you need to apply a different function called count_over_time, which calculates the number of samples on the specified interval. Now let's try it on our example, the slow and the fast query. Well, the slow query selects 16 million samples and the fast query selects 17,000. So a pretty huge difference.
Again, no wonder the first query was so slow and the second query was so fast.
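Again, a rough sketch of how to measure that (placeholder selector and window):

    # total number of raw samples the selector would read over the last hour
    sum(count_over_time(foo{job="app"}[1h]))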
Cool. So conclusion, no matter what your query
is, the more series you select and the
more data samples you select and process,
the slower this query will be. That's the rule of thumb.
And one more trick to this: selected samples aren't always equal to processed samples. You know, in Grafana you use these panels called time series, which plot many data points ordered by time on a time series graph. To plot such a panel, Grafana issues a range query. A range query executes the given query with the specified step parameter; basically, it executes it many times over the selected interval. This is how it looks. Here we have an example: max_over_time over a 1 hour window for a metric name, and we execute it on a 1 hour time range with 1 second resolution. What it actually means is that max_over_time will be executed 3600 times, and each time it will be selecting 1 hour of data, as specified in the square brackets. So here we will select about 2 hours of data samples, but we will process them up to 3600 times. This is the difference between selected samples and processed samples. This query will be very slow because of the step param. If we increase the step param, the query will be more lightweight; if we set the step to 30 minutes, it will be pretty lightweight.
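To make the step effect concrete, here is a sketch of the range query Grafana sends behind the scenes (the endpoint is the standard Prometheus query_range API; the timestamps are placeholders):

    # /api/v1/query_range?query=max_over_time(foo[1h])&start=<t0>&end=<t0+1h>&step=1s
    #   step=1s  over a 1h range -> 3600 evaluations, each scanning a 1h lookbehind window
    #   step=30m over the same range -> only 3 evaluations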
Okay, what about functions? How slow are the functions that we use in our expressions? Well, on the screen you have a list ordered from the least expensive to the most expensive functions, and I also used font size to emphasize complexity: small font size means lightweight functions, big font size means expensive functions. Label manipulation functions are the least expensive. They don't do calculations, they just attach meta information to the response, that's all. Pretty lightweight. Transform functions are lightweight as well; they're not doing anything complex. Aggregate functions are more complex: they have some logic in the processing, they need to calculate min, max, sum, for example. And the most expensive functions are rollup functions, such as rate, increase, min_over_time: the functions that need a lookbehind window specified in square brackets next to the metric name. And by coincidence, rate and increase are the most popular functions in PromQL and MetricsQL; it is very likely that most of your dashboards in Grafana use these functions. So yeah, rate and increase are the most expensive. Oh wait, I forgot.
Subqueries are so big that they didn't fit on the previous slide. So I'm changing my mind: subqueries are the most expensive. What is a subquery? You can see on the slide an example of how a subquery looks. Subqueries allow you to combine functions: to calculate the inner query many times and then feed the results of those calculations to the outer query. And this is why they are so expensive: the inner query can be calculated many, many times, and this inner query can be not very lightweight.
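Since the slide isn't visible in the transcript, a typical subquery looks roughly like this (the metric name is a placeholder):

    # the inner rate(...[5m]) is evaluated every 1m over the last 1h,
    # then max_over_time picks the maximum of those results
    max_over_time(rate(http_requests_total[5m])[1h:1m])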
So you will end up with a
pretty expensive query in the end. And another bad
thing about subqueries, they are very complicated,
they are complex to write, they are hard
to understand, they are hard to read. So if
you have an option to not use
subqueries, I would stick to that. If you want
queries to be fast, don't use subqueries.
Okay, what other performance improvement tips can I give you?
Query caching, of course; caching makes everything faster. Prometheus doesn't really support caching out of the box, so the community implemented a fix for that: reverse proxies to the rescue, reverse proxies such as Promxy or Thanos Query Frontend. What are reverse proxies? A reverse proxy is a middleware between the user and the Prometheus server. You install it, the user communicates with the proxy, and the proxy sends the query to Prometheus. This proxy can cache the responses from Prometheus, so a repeated query will be served without touching the Prometheus data source.
How does this look in reality? Here we have a range query which calculates rate over a metric, and in response to this query we get data points ordered by time, with a 30 minute step or something like that. So we execute this query and we get this ordered list of values with a 30 minute step. Okay, if we execute this query one more time, 30 minutes later, it is likely that we can reuse our previous result, right? Because those data points are in the past, and the time series didn't change in the past. So what these reverse proxies do is cache those previous data points, and when you repeat your query again, they serve most of the data points from the cache and ask the data source only for the data points they don't have yet, because time has already shifted a bit: 30 minutes later we issue the same query, so the cache doesn't have the most recent data points. Those reverse proxies understand that and query the database only for the missing data points. So that's cool. For repeating queries, it reduces pressure on your Prometheus server very, very much, so I recommend using them. They are pretty handy and will improve the query performance.
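A rough sketch of that incremental caching, with illustrative timestamps:

    # first request:   range [t0 .. t0+6h]          -> computed by Prometheus, stored in the cache
    # 30 minutes later: range [t0+30m .. t0+6h30m]  -> [t0+30m .. t0+6h] served from the cache,
    #                                                   only [t0+6h .. t0+6h30m] fetched from Prometheus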
What else? Filters pushdown. We have two queries on the screen. The first query takes 3 seconds to load, the second query takes 70 milliseconds to load, but they will return you the same, equal result. The reason why the first query is expensive and takes so much time is that Prometheus executes the two parts of the expression one by one: it executes the first part with the filter and the second part without the filter. And then, when we do the division, Prometheus will keep only those time series from the first and the second expression that match perfectly by labels. So basically it will drop a lot of time series that do not satisfy the filter in the denominator. This is why, if we specify the filter in both the numerator and the denominator, the query will be faster: Prometheus will not select the data it doesn't need for this calculation. This is called filters pushdown. So if you have something like this somewhere in your queries, make sure that you propagate the filter to each part of the query. This will improve the performance.
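To illustrate the idea with placeholder metric and label names (not the queries from the slides):

    # slower: the denominator selects every series of bar, most are dropped after the division
    rate(foo{instance="host-1"}[5m]) / rate(bar[5m])

    # faster: the same filter pushed down into both sides of the expression
    rate(foo{instance="host-1"}[5m]) / rate(bar{instance="host-1"}[5m])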
What else? Recording rules. This is also common advice from the Prometheus community: for expensive, mission-critical queries, you can precompute some time series via recording rules. A recording rule usually generates less data in the output than it receives on the input, and this output data is also time series. So if you have less data to query back, then a query over the recording rule results will be much faster, if you didn't mess up the recording rule expression, of course. This is a really cool thing for dashboards, because dashboards always run the same queries, and if you precompute those queries, your dashboards will load faster.
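As a sketch of the idea (the expression and the recorded series name are illustrative, following the usual Prometheus naming convention, not taken from the slides):

    # expensive dashboard query, evaluated on every refresh:
    sum by (job) (rate(http_requests_total[5m]))

    # the same expression precomputed by a recording rule into a new series,
    # named something like job:http_requests:rate5m, so the dashboard
    # only has to select one precomputed series per job:
    job:http_requests:rate5m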
But there are also disadvantages of using recording rules. The first disadvantage is that recording rules put constant pressure on your database, because they constantly reevaluate the same expression over and over, and your data source needs to calculate it over and over. Also, recording rules produce extra time series, and those time series need to be stored in the database. So every time you set up a new recording rule in your Prometheus, you will have more time series stored in the database after that, and this requires resources, of course. And recording rules are something that needs to be maintained. You need to define them and you need to remember about them, because if you want to change the query in your dashboard, or you want to change metrics in the application which exposes them, you need to remember that somewhere you have this recording rule configured, and it is very likely you need to go and change the expression there as well, otherwise it will not be correct. So yeah,
recording rules. So let's sum up quickly what we can do. To optimize PromQL and MetricsQL queries, a very good strategy is to measure their complexity: to understand how many time series and how many samples we select. Can we select less? Can we optimize this query somehow? We can also use a caching frontend to reduce the pressure on the database for queries that repeat themselves over the time interval. We also need to carefully craft queries to always use the right filters, and to not have too small a step or too big a lookbehind window, to get optimal performance. And of course, use recording rules for performance-critical queries.
Okay, and now the second part: can VictoriaMetrics make it easier for developers, for DevOps, for users? Yes, it can. VictoriaMetrics provides a set of optimizations and instruments which will help you to understand the complexity of a query and improve its performance.
And the first thing that VictoriaMetrics provides is a cardinality explorer. It's basically a built-in page in the VictoriaMetrics UI which shows you the most expensive metric names, the most expensive time series, et cetera. The cardinality explorer helps you to understand what is actually stored in your database, how many time series are behind a metric name. It can also help you to decide whether it's worth storing those super expensive metrics: do you actually use them? Because if you get rid of them, your database will just become faster and you will need fewer resources. Yeah, so this is available in the VictoriaMetrics UI components. And the cool thing about the cardinality explorer is that you can use it with Prometheus as well: you can specify a Prometheus URL in the cardinality explorer and it will give you some of this functionality for Prometheus.
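As far as I know, this view is built on the TSDB stats endpoint that both systems expose, so you can also peek at the raw numbers directly (the host is a placeholder):

    # top metric names by series count, top label name/value pairs, etc.
    # http://<prometheus-or-victoriametrics-host>/api/v1/status/tsdb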
Okay, query tracing. There is also a built-in feature which helps you to understand what happens at every stage of query execution. This is very similar to what Postgres has with the EXPLAIN ANALYZE statement. Here's how it looks: you execute the query with the trace query option enabled, and it will provide you a tree-like structure with every stage of query execution as it was executed inside VictoriaMetrics. It will show how much time was spent on each stage, plus additional information about what's happening at each stage of the execution.
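For reference, a sketch of how to trigger it (if I remember correctly, the same tracing can also be requested over the HTTP API with a query arg; treat the exact parameter as an assumption and check the docs):

    # in vmui: enable the "Trace query" option next to the query input
    # via the API: /api/v1/query_range?query=rate(foo[5m])&start=<t0>&end=<t1>&step=30s&trace=1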
So for example, in the query trace you can see how many series and samples were selected for this query. The trace will say: well, we selected 500 series, 66,000 samples and 273 data blocks overall. The trace can also contain information like how much RAM was used while processing this query, so that's an estimation for you. And with query tracing you can understand exactly what's happening when you execute the query. Or, for example, query caching. Yeah, by the way, the caching thing that I explained a few slides ago is built into the VictoriaMetrics components, so you don't need to think of reverse proxies. It is already there.
It is already used for range and instant queries; you don't even need to think about it. And in the query trace you can see if your query hit the cache, and maybe it hit the cache only partially. You will see exactly how many entries were fetched from the cache and what query was sent to the data source to fetch the missing data. All this information is there in the trace. Filters pushdown: you don't need to think about this in VictoriaMetrics either, because it does it automatically. It optimizes the query when it can, and the trace will show you this as well. In the first row of the trace we see the exact query as we sent it, and on the second step VictoriaMetrics applied the optimization and pushed down the filter to the second part of the expression, to execute it in the most efficient way.
Okay, recording rules. Remember this recording rules concept, where you need to evaluate expressions and persist the results back to the time series database? VictoriaMetrics can do that, but it can also do a better thing: it supports the streaming aggregation concept. You can do these aggregations, these pre-calculations of the data, before the data gets into the time series database. And this is super cool, because you don't need to store raw data if you don't want to. You can store only aggregates, and this will tremendously reduce the amount of resources your time series database needs. And by the way, this streaming aggregation concept is available in vmagent, which is one of the VictoriaMetrics tools. You can use vmagent with your Prometheus, and you can use vmagent with any other database which supports the remote write protocol. So yes, I really like this feature, because it helps to reduce the pressure on the database many, many times. It's super cool.
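To illustrate the idea (the series names below are hypothetical, just to show the before/after shape; the exact output naming depends on the streaming aggregation configuration):

    # without streaming aggregation: all raw series are stored, and dashboards compute
    sum by (job) (rate(http_requests_total[5m]))

    # with streaming aggregation in vmagent: only a pre-aggregated series per job reaches
    # the database, e.g. something like http_requests_total:5m_by_job_total, and dashboards
    # select that series instead of the raw ones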
Yeah, so that was mostly it, my talk. You can find more information in the additional materials: if you scan the QR code, you will go to the slides of this presentation, and all the links there will be available for you. I recommend reading the blog post about optimizing PromQL and MetricsQL expressions, and also a very detailed blog post about how subqueries work in VictoriaMetrics and in Prometheus. You can also find information about query tracing, stream aggregation, et cetera. Yep, so that's me. Thank you very much for your attention, and I will be happy to answer any questions if you have them.
Thank you.