Conf42 Observability 2024 - Online

How to Measure PromQL / MetricsQL Expression Complexity

Abstract

Dive into the complexities of PromQL/MetricsQL expressions with Roman! Learn about query processing stages, identify bottlenecks, and discover speed optimizations for distributed processing.

Summary

  • Roman is working on an open source project called VictoriaMetrics. The project supports the PromQL and MetricsQL query languages. In this talk, Roman explains how to make your queries faster. He also explains why some queries are so slow.
  • The more series you select and the more data samples you select, the slower the query will be. And one more trick to this: selected samples aren't always equal to processed samples. No matter what your query is, the more volume it selects, the slower it will work.
  • Subqueries are the most expensive. If you want queries to be fast, don't use subqueries. Reverse proxies are middleware between a user and the Prometheus server; they can cache responses and are pretty handy for improving query performance.
  • VictoriaMetrics provides a set of optimizations and instruments which help you understand the complexity of a query and improve its performance. With query tracing you can see exactly what happens when you execute a query. Streaming aggregation can tremendously reduce the amount of resources your time series database needs to have.
  • You can find more information in the additional materials: if you scan the QR code, you will go to the slides of this presentation, and all the links there will be available to you. I will be happy to answer any questions if you have them.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, and welcome to my talk about measuring expression complexity for PromQL and MetricsQL. A few words about me. My name is Roman. I'm a software engineer with experience in distributed systems, and for the last five years I've been working on an open source project called VictoriaMetrics. VictoriaMetrics is an open source time series database. It's available on GitHub under the Apache 2 license, so I encourage everyone to check it out.

Okay, what is this talk about? Maybe many of you know what Prometheus is or what VictoriaMetrics is, but just in case: both of them are used for monitoring and alerting, for collecting metrics, storing them and processing them. Both of them are open source. Prometheus uses PromQL as its query language and VictoriaMetrics uses MetricsQL as its query language. These two languages are very similar, with the same syntax, so this talk will be beneficial if you are familiar with either of these systems. And both of them are well integrated with Grafana, so you can plot dashboards with VictoriaMetrics or Prometheus as data sources. And this is exactly the case where things can go bad. So imagine yourself in the middle of an incident, trying to understand why your system is on fire. You go to Grafana, you open your dashboard, and then it takes 60 seconds or something like that to load. Everything is so slow, and this just adds more stress to your working day. So I hope with this talk we will try to address this stress level and reduce it, because hopefully you will learn how to make your queries faster, or at least to understand why they are slow.

Okay, on the screen we have two examples, two similar examples of a slow query and a fast query. The first query on the screen is the slow one: it takes 3.5 seconds to execute. The second query is very similar, it's almost the same, but it has a different metric name inside, and it takes 50 milliseconds. So why is it so fast? Or the first query, why is it so slow? Normally, at an offline conference, I would ask people in the room to answer, but we are online, so I will answer myself. Well, because those queries are not identical: they scan different amounts of data, and those metric names have different amounts of time series behind them.

So in order to understand this difference, why we scan that much data, or why queries become slow from scanning that much data, let's try to build a mental model of what is going on inside the time series database. Maybe not all of you have this mental model of how Prometheus or VictoriaMetrics works, but it is very likely you understand how traditional relational databases work, like MySQL for example. So let's try to refer to that experience of optimizing MySQL queries and try to apply it to PromQL and MetricsQL expressions. To do this, I asked ChatGPT how I can optimize an SQL query, and ChatGPT provided really good advice, so that was pretty helpful. For example, I have this non-optimized query which selects everything from the table, and ChatGPT gave me this advice: use indexes, but avoid using too many indexes; select only what you need; try to filter data as early as possible; think about your database design and maybe apply partitioning if you need it. Okay, so I followed that advice. I reworked my table. I'm selecting only the columns that I need. I partition my table by time, so I am adding a filter by date, and I index my table by user id.
So I'm selecting only the user I want, and this query will become faster after I modified the table, modified my database design and changed my query to be efficient. Okay, but can I apply the same tips, the same strategy, to PromQL and MetricsQL expressions? Well, not really, because neither Prometheus nor VictoriaMetrics provides you that flexibility. In fact, filtering data is the only real leverage you have for improving the performance of the queries.

Okay, let's now try to build a mental model of how Prometheus or VictoriaMetrics works. A very simplified mental model. On the screen you have four time series with the same metric name foo. Each time series has a value and the timestamp when that value was observed. So we talk here about four attributes: the metric name and metadata, which identify the time series, and then a value and a timestamp, basically when the observation was made and what the observation is. If you want to learn more about the internals of the data model, please visit docs.victoriametrics.com/keyconcepts and I hope you will find a lot of useful information there.

Okay, when we ingest these time series into the database, both Prometheus and VictoriaMetrics will use the metric name and metadata combination as an ID, and that ID will be added to the index. So next time we can identify whether we have already seen this time series, and at read time we will also know where to find the data the user asked for. Values and timestamps will be stored as separate columns. Those separate columns will be sorted differently, and different compression algorithms will be applied to them. This is done automatically by Prometheus or VictoriaMetrics. Knowing this, we can now say that the data model in Prometheus and VictoriaMetrics can't be changed. The user doesn't have control over the indexes: the indexes are created automatically, and you can't change that. Data blocks are already partitioned by time, because we are talking about a time series database, and the stored data columns are already predefined: values and timestamps, names and metadata. You don't have control over this, it is already predefined. So there's not much we can do.

Let's now understand better what a time series is. We have a metric called foo and it has four time series under this name; they differ only by labels. The dots on the screen are basically our data samples. A data sample is a pair of a value and the timestamp when that value was observed. And this is exactly why time series are called time series: we have a series of these observations, a series of data samples ordered by time, and each time series can have an arbitrary number of these observations at different resolutions.

Okay, and what makes our PromQL and MetricsQL queries slow? There are two main reasons why they can be slow: when we select a big number of time series, and when we select a big number of data samples within those time series. Let's see it on an example. Here we have a query which doesn't have any filters in it, no time filter, no label filter. So we select everything: all the time series we have, all the data samples we have. This query is likely to be slow. Now we add a filter to this query: we want only label one and label two to be present in the response. So we scan half as much data, and this query will be faster than the previous one. And in this example we also had a time filter. So we want not all the data samples that are in the database, but only the last five minutes. We reduce the number of samples we need to select, and this query will be faster than the previous query. And yeah, the most valuable advice I can give you in this talk is: selecting less data is the most effective way to optimize query performance. If you remember only this from this talk, that will already be very good.
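The exact queries from the slides aren't visible in the transcript, but a minimal sketch of this progression might look like the following (the metric name foo comes from the talk; the label names and values are hypothetical):

    # no filters: selects every series and every sample of metric foo - likely slow
    foo
    # label filter: only series with these two labels are selected - faster
    foo{label1="value1", label2="value2"}
    # label filter plus a 5-minute lookbehind window: far fewer samples - faster still
    foo{label1="value1", label2="value2"}[5m]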
Okay, so the question now is: how can I, as a developer or a DevOps engineer, understand the volume of data my query selects? How can I estimate the complexity of my query? Well, for this we need to answer two questions: how many series do we select, and how many samples do we select.

On the screen right now you have advice on how to estimate the number of series your query selects. For this, you basically need to apply a function called count: you take your series selector and wrap it in the count function. This will calculate the number of time series your query will select. Here's an example with our first two queries from the beginning of the presentation, the slow one and the fast one. The slow query selects about 1000 time series and the fast query selects only one. So no surprise why the slow query is slow: it selects a thousand times more time series than the second one.

Okay, that's series. What about samples? For samples, you need to apply a different function called count_over_time, which calculates the number of samples on the specified interval. Let's try it on our example, the slow and the fast query. Well, the slow query selects 16 million samples and the fast query selects 17,000. A pretty huge difference. Again, no wonder the first query was so slow and the second query was so fast.

Cool. So, the conclusion: no matter what your query is, the more series you select and the more data samples you select and process, the slower the query will be. That's the rule of thumb. And one more trick to this: selected samples aren't always equal to processed samples. You know, in Grafana you use these panels called time series, which plot many data points ordered by time on a graph, and to plot such a panel you need to issue a range query. A range query executes the given query with the specified step param; basically, it executes it many times over the selected interval. This is how it looks. Here we have an example of a max_over_time query over a 1 hour interval for a metric name, and we execute it on a 1 hour time range with 1 second resolution. What this actually means is that max_over_time will be executed 3600 times, and each time it will be selecting 1 hour of data, as specified in the square brackets. So here we will select about 2 hours of data samples, but we will process them thousands of times. This is the difference between selected samples and processed samples. This query will be very slow because of the step param. If we increase the step param, the query will be more lightweight; if we set the step param to 30 minutes, it will be pretty lightweight.
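As a hedged sketch of the estimation technique described above, assuming a metric named foo (any series selector works the same way):

    # how many time series does the selector match?
    count(foo)
    # how many raw samples per series would be read for the last hour?
    count_over_time(foo[1h])
    # total samples across all matching series
    sum(count_over_time(foo[1h]))
    # selected vs processed samples: run as a range query with step=1s over a
    # 1 hour range, this expression is evaluated ~3600 times, each time over
    # a 1 hour lookbehind window
    max_over_time(foo[1h])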
Okay, what about functions? How slow are the functions that we use in our expressions? Well, on the screen you have an ordered list from the least expensive to the most expensive functions, and I also used font size to emphasize complexity: small font size means lightweight functions, big font size means expensive functions. Label manipulation functions are the least expensive. They don't do calculations, they just attach meta information to the response, that's all, pretty lightweight. Transform functions are pretty lightweight as well; they're not doing anything complex. Aggregate functions are more complex: they have some logic in their processing, they need to calculate a min or a max over samples, for example. And the most expensive functions are the rollup functions, such as rate, increase or min_over_time — the functions that need a lookbehind window specified in square brackets next to the metric name. And by coincidence, rate and increase are the most popular functions in PromQL and MetricsQL; it is very likely that most of your dashboards in Grafana use these functions. So yeah, rate and increase are the most expensive.

Oh wait, I forgot. Subqueries are so big that they didn't fit on the previous slide. So I'm changing my mind: subqueries are the most expensive. What is a subquery? You can see an example of what a subquery looks like. Subqueries allow you to combine functions: the inner query is calculated many times, and the results of those calculations are fed to the outer query. And this is why they are so expensive: the inner query can be calculated many, many times, and this inner query can itself be not very lightweight, so you end up with a pretty expensive query in the end. Another bad thing about subqueries is that they are very complicated: they are complex to write, they are hard to understand, they are hard to read. So if you have an option to not use subqueries, I would stick to that. If you want queries to be fast, don't use subqueries.
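The subquery from the slide isn't visible in the transcript, but a minimal, hypothetical example of the pattern being warned about could be:

    # subquery: the inner rate(...) is evaluated at every 1-minute step across
    # the outer 1 hour window, then max_over_time runs over those results
    max_over_time(rate(http_requests_total[5m])[1h:1m])

Each level of nesting multiplies the number of inner evaluations, which is why the advice above is to avoid subqueries where you can.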
Okay, what other performance improvement tips can I give you? Query caching, of course: caching makes everything faster. Prometheus doesn't support caching out of the box, really, so the community implemented a fix for that: reverse proxies to the rescue, reverse proxies such as Promxy or Thanos Query Frontend. What is a reverse proxy? It's a middleware between a user and the Prometheus server. You install it, the user communicates with the proxy, the proxy sends the query to Prometheus, and the proxy can cache the responses from Prometheus. So a repeated query will be served without touching the Prometheus data source.

How does this look in reality? Here we have a range query which calculates a rate over a metric, and in response to this query we get data points ordered by time with a 30 minute step or something like that. So we execute this query and we get this ordered list of values with a 30 minute step. Okay, if we execute this query one more time, 30 minutes later, it is likely that we can reuse our previous result, right? Because the data points in the past didn't change; the time series didn't change in the past. So what these reverse proxies do is cache those previous data points, and when you repeat your query they serve most of the data points from the cache and ask the data source only for the data points they don't have yet, because time has shifted a bit: 30 minutes later we issued the same query, so we don't have the most recent data point. Those reverse proxies can understand that, and they query only the missing data points from the database. So that's cool. For repeating queries, it reduces the pressure on your Prometheus server very, very much, so I recommend using them. They are pretty handy and will improve the query performance.

What else? Filters pushdown. We have two queries on the screen. The first query takes 3 seconds to load, the second query takes 70 milliseconds to load, but they will return you the same result. The reason why the first query is expensive and takes so much time is that Prometheus executes the first and the second parts of the query one by one: we execute the first part with the filter and the second part without the filter, and then, when we do the division, Prometheus will keep only those time series that match by labels perfectly between the first expression and the second expression. So basically it will drop a lot of time series from the unfiltered part, the ones that do not satisfy the filter. This is why, if we specify the filter in both the numerator and the denominator, the query will be faster: Prometheus will not select the data it doesn't need for this calculation. This is called filters pushdown (see the sketch below). So if you have this somewhere in your queries, make sure that you propagate the filter to each part of the query; this will improve the performance.

What else? Recording rules. This is also common advice from the Prometheus community: for expensive queries, for mission-critical queries, you can precompute some time series via recording rules. A recording rule usually generates less data on the output than it receives on the input, and this data is also time series. So you have less data to query back, and the query over the recording rule results will be much faster — most likely, if you didn't mess up the recording rule expression, of course. This is a really cool thing for dashboards, because dashboards always run the same queries, and if you precompute those queries, your dashboards will load faster. But there are also disadvantages of using recording rules. The first disadvantage is that recording rules put a constant pressure on your database, because they constantly reevaluate the same expression over and over, and your data source needs to calculate it over and over. Also, recording rules produce extra time series in response, and those time series need to be stored in the database. So every time you set up a new recording rule in your Prometheus, you will have more time series stored in the database, and this requires resources, of course. And recording rules are also something that needs to be maintained: you need to define them and you need to remember about them, because if you want to change the query in your dashboard, or you want to change the metrics in your application which exposes these metrics, you need to remember that somewhere you have this recording rule configured, and it is very likely you need to go and change the expression there as well, otherwise it will not be correct. So yeah, recording rules.

So let's sum up quickly what we can do. To optimize PromQL and MetricsQL queries, a very good strategy is to measure their complexity, to understand how many time series and how many samples we select. Can we select less? Can we optimize this query somehow? We can also use a caching frontend to reduce the pressure on the database for queries that repeat themselves over a time interval. We also need to carefully craft queries to always use the right filters, and to not have too small a step or too big lookbehind windows, to get the optimal performance for these queries. And of course, use recording rules for performance-critical queries.
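As a rough, hypothetical sketch of the filters pushdown and recording rule tips above (the metric names, label filter and recorded metric name are made up for illustration):

    # filter only on the left-hand side: the right-hand side selects every series,
    # and the division then keeps only the series whose labels match the left side
    rate(errors_total{job="api"}[5m]) / rate(requests_total[5m])
    # filters pushdown: the same filter on both sides, so no extra series are read
    rate(errors_total{job="api"}[5m]) / rate(requests_total{job="api"}[5m])

    # recording rules: precompute an expensive expression into a new series,
    # e.g. a rule that records sum(rate(errors_total[5m])) by (job)
    # as job:errors:rate5m (hypothetical name); dashboards then query just:
    job:errors:rate5m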
Okay, and now the second part: can VictoriaMetrics make it easier for developers, for DevOps, for users? Yes, it can. VictoriaMetrics provides a set of optimizations and instruments which will help you to understand the complexity of the query and improve its performance.

And the first thing that VictoriaMetrics provides is the cardinality explorer. It's basically a built-in page in the VictoriaMetrics UI which shows you the most expensive metric names, the most expensive time series, et cetera. So the cardinality explorer helps you to understand what is actually stored in your database, how many time series are behind a metric name. It also can help you to decide whether it's worth storing those super expensive metrics: do you actually use them? Because if you get rid of them, your database will just become faster and you will need fewer resources. Yeah, so this is available in the VictoriaMetrics UI components. And the cool thing about the cardinality explorer is that you can use it with Prometheus as well: you can specify a Prometheus URL in the cardinality explorer and it will give you some of this explorer's functionality for Prometheus.

Okay, query tracing. There is also a built-in feature which helps you to understand what happens on every stage of query execution. This is very similar to what Postgres has with the EXPLAIN ANALYZE statement. Here's how it looks: you execute the query with the trace query button enabled, and it will provide you a tree-like structure with every stage of query execution as it was executed inside VictoriaMetrics. It will show how much time was spent on each stage, plus additional information about what's happening on each stage of the execution. For example, in the query trace you can see how many series and samples were selected for the query: the trace will say, well, we selected 500 series, 66,000 samples and 273 data blocks in total. The trace can also contain information like how much RAM was used during the processing of this query, so that's an estimation for you. And with query tracing you can understand exactly what's happening when you execute the query.

Or, for example, query caching. By the way, the caching that I explained a few slides ago is built into the VictoriaMetrics components, so you don't need to think of reverse proxies: it is already there, it is already used for range and instant queries, you don't even need to think about this. And in a query trace you can see whether your query hit the cache, and maybe it hit the cache only partially: you will see how many entries exactly were fetched from the cache, and what query was sent to the data source to fetch the missing data. So all this information is there in the trace.

Filters pushdown: you don't need to think about this in VictoriaMetrics either, because it does it automatically. It optimizes the query when it can, and the trace will show you this as well. So in the first row of the trace we see the exact query as we sent it, and on the second step VictoriaMetrics applied the optimization and pushed down the filter to the second part of the expression to execute it in the most efficient way.

Okay, recording rules. Remember the recording rules concept, where you evaluate expressions and persist the results back to the time series database? VictoriaMetrics can do that, but it can also do a better thing: it supports the streaming aggregation concept. You can do these aggregations, these precalculations of the data, before the data gets into the time series database. And this is super cool, because you don't need to store the raw data if you don't want to.
You can store only aggregates, and this will tremendously reduce the amount of resources your time series database needs to have. And by the way, this streaming aggregation concept is available in vmagent, which is one of the VictoriaMetrics tools, and you can use vmagent with your Prometheus, or with any other database which supports the remote write protocol. So yes, I really like this feature, because it helps to reduce the pressure on the database many, many times. It's super cool.

Yeah, so that was mostly it for my talk. You can find more information in the additional materials: if you scan the QR code, you will go to the slides of this presentation, and all the links there will be available to you. I recommend reading the blog post about optimizing PromQL and MetricsQL expressions, and also a very detailed blog post about how subqueries work in VictoriaMetrics and in Prometheus. You can also find information there about query tracing, streaming aggregation, et cetera. Yep, so that's me. Thank you very much for your attention, and I will be happy to answer any questions if you have them. Thank you.
...

Roman Khavronenko

Co-Founder @ VictoriaMetrics
