Transcript
Are you an SRE, a developer, or a quality engineer who wants to tackle the challenge of improving reliability in your DevOps? You can enable your DevOps for reliability with ChaosNative. Create your free account at ChaosNative Litmus Cloud.
Hello and welcome to this talk on shifting left your performance testing.
My name is Hari Krishnan. I'm a consultant and a coach. I help companies with
cloud transformation, extreme programming, agile and lean.
My interests include distributed systems and high performance application
architecture. Below are some of the conferences that I've spoken at.
So let's quickly get started with the talk. Before we
jump into the topic about why we need to shift left in performance
testing, let's understand the context. So, a quick show of hands: in which environment do you identify most of your performance issues? Would that be the local machine? Not likely, right?
What about the development environment? I haven't seen many teams identify issues there, but if you are, that's great. Most of the performance issues, at least in my experience, we start identifying in the staging environment, because that's where we can at least generate some load against the particular
environment. And lastly, for the teams that I have worked
with, a big part of the performance issues are
identified in the production replica because that's where you have the infrastructure
that's close to production. And whatever we are able to reproduce
or generate as a load is something that
is representative of the real environment. And for those issues
which we cannot identify up till this point, our users
identify it for us and then we have to fix them. The intensity of
the red color in this particular diagram here kind of represents
like the darker the shade of red, the longer it takes to fix the
issue, and that's not desirable.
So let's first understand what the current setup of performance testing is and why we have that issue.
So usually the developer starts writing code on their local machine
and then starts pushing it to the development environment and eventually
makes its way into staging. And that's probably the first point in time
when the performance tester comes in and has
to set up the performance testing environment,
write the test script, set up the load generator, the agents and
whatnot, send the load to the staging environment and maybe
even to the production environment, generate the report
out of the test run and then ultimately share it back with
the developer. The developer makes sense of it and then incorporates
the feedback into the code base. So that's the usual cycle.
So what's wrong with this setup? The first problem itself is
the fact that if we are identifying the issues as late as staging or production, it's already quite late in the cycle, right? And what's worse: if we identify the issue at staging or production, then it
takes an equal amount of time to again fix the issue,
iterate over it, and then bring it all the way from local machine
to development to staging. And that's not it,
right? The higher environments are highly contested. I'm not the
only developer who's trying to fix
performance issues or trying to see if my fix is even working. There are other developers who would like to verify their changes too. So in the higher environments there's a lot of timesharing going on, and we tend to have a difficult time trying to identify which feature we need to test or which issue we can put through its paces. So that's the difficulty there.
And that's what leads to this kind of a graph
where we identify most of the issues very late in the cycle,
or leave it up to the users to find them. What should it look like instead? We'd like to identify most of the issues on our local machine. And then there are those issues which we cannot discount: some of them can only be tested or figured out in the higher environments because of the nature of the issue itself. But largely we'd like to identify as much as possible on our local machine, or towards the left hand side. That's what we mean by shift left here. Should be easy,
right? The performance tester already has a robust setup, and as a developer, all one needs to do is wear the performance testing hat, borrow the performance testing setup from them, and start sending load to the local machine or development environment. And we can start depending less on the performance tester and make their lives a little easier as well.
This is easier said than done. There are multiple issues,
or rather challenges, in applying shift left
to performance testing. Let's understand those.
The first issue itself is a scale challenge. So how
do you create a truly representative environment
on your local machine or even development environment for
something that's going to be running in production?
Production environments are usually sophisticated. You have a hybrid cloud, you have sophisticated distributed systems, microservice architectures, load balancing and whatnot. The production replica mimics a lot of it too. As you come down the environments, staging has a little less complexity, and ultimately on the local machine it's just a humble
laptop. Now, how do I say that I have verified the performance
characteristics of an application on my local machine, and so
I'm confident that it's going to work on production? Or how do I even
take a problem that I'm identifying on the production environment and bring it down to
my local machine and replicate it? Quite hard to
do that, right? Second is obviously the network
topology itself. Production environments might have a lot more complicated
firewalls and whatnot. The latency levels, the proxies
and practically everything that's involved there is
a lot more complex and there's not much you can replicate.
I mean you can try, but there is only an extent to which we can do all of that fancy networking on a local machine.
And leaving the application aside, the performance setup itself tends to become quite complicated. In the higher environments you need multiple machines to generate sufficient load, and it's a fairly sophisticated setup. Stuffing that into lower environments is hard in itself, and if you try to push it onto my local machine as a developer, this is what I have to deal with. This is my 8 GB MacBook Pro and the memory pressure I'm dealing with while running a Gatling test. That's not desirable.
All right, so how do we solve this? So, shift left
equals scale down. You can't go to sea trials for every single design change, right? That wouldn't be practical.
We have to figure out a way to scale down the problem and try
to figure it out on our local machines or on the left hand side.
But that's not exactly straightforward, is it? So let's take
two parameters and see if we can scale down a performance
issue. So let's say I just want to deal with request
per second and response time, throughput and latency.
These are two very popular metrics, so let's see how to scale them down. If I have X and Y as the two KPIs for my production, do I scale them down maintaining the aspect ratio, or do I just say I'll maintain
my RPS, but I will probably discount a little bit on response time?
Or do I do something else? Like what is the accurate way
to scale down and still validate the performance KPIs
in a truly representative manner?
The reality is we cannot. So what do we do about it?
Even if we cannot scale down the real problem, we can scale down
the trend and invalidate hypotheses.
Since I've spoken about hypotheses, let's understand what hypothesis invalidation is. With a hypothesis, as our old school lessons have taught us, you have to make a statement, and then there is a verifiability aspect
to it and a falsifiability aspect to it. So for example, I could
say all green apples are sour. In order to prove that statement, there are two ways. One, I can go eat all the green apples in this world and say not even one of them is sweet, so all of them are sour. That's the verifiability aspect of it. Now, it's not very practical for me to eat all the green apples in this world. So the falsifiability aspect is this: if I identify even one green apple that's sweet, I've proven the entire hypothesis wrong. That's the hypothesis invalidation aspect. Now how
does that apply to software engineering and particularly to performance
testing? Let's take a look.
So recently I was tasked with a somewhat
similar problem wherein as you can see on the x axis, I have
the throughput. As the throughput increases exponentially,
I want my response time to degrade only at a logarithmic scale.
So this is what was expected of the system behavior,
and they wanted me to figure out if this
has been satisfied by the application. Now, how do I scale
this problem down? It's very hard, right? Because 10,000 RPS is
not something I can figure out on my local machine. Not practical.
But what can I do? Like I said, I can scale
down the trend. The clue here was in the problem itself, which is the exponential increase in throughput and a degradation of the response time only on a logarithmic scale. I can take that, scale the entire problem down, and just figure it out at 1 RPS, 10, 100. Now when I tried it on my machine, I saw that that's not what was happening: the latency was in fact degrading at an exponential scale. This is not desirable. So I can very quickly realize that, through this experiment, I have disproved the hypothesis; it does not hold with the current application code, and I don't need to go testing on the higher environment.
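As a sketch of what scaling down the trend can look like in practice, you could jot the check down as a small note like the one below. The numbers and field names here are purely illustrative assumptions, not figures from this engagement.

    trend_check:                          # illustrative only
      hypothesis: latency degrades no worse than logarithmically as throughput grows exponentially
      load_steps_rps: [1, 10, 100]        # exponential increase, scaled down to laptop size
      observed_latency_ms: [20, 45, 400]  # hypothetical readings; the last jump is super-logarithmic
      verdict: hypothesis falsified locally, fix before touching higher environments

If the observed latencies had tracked the logarithmic expectation instead, the result would still only be inconclusive, which is exactly the verifiability step described next.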
I can quickly apply a fix on my local machine, see if that's working,
and then, only if that works, I can push it to the higher environment. Let's say I made that fix
and I even got a better performance than what was expected.
Maybe I'm even tracking very much under
the logarithmic scale. But is this sufficient for me to say that it's
going to work on the higher environment? No, it's inconclusive,
because on my local machine I have proven it at a lower scale.
But that does not mean it will necessarily work in the higher environment; it is inconclusive. For this part, I have to go with the verifiability experimentation. So I validate it in the higher environment, and I realize that what I found on my local machine does hold good for some more time, but after 1000 RPS it tends to fall off. So now
I need to again identify what's going on beyond 1000 RPS and why it is falling off. And for that I come back to my local machine, figure out a way to replicate the issue, and then I can fix it on my local machine, verify it on the left hand side,
and then move to the right. Now this is not all bad,
right? In the first part, I identified the issue at the very low scale itself, and that one fix I could do on my local machine. I didn't have to go to the higher environment; I only went to the higher environment where it was absolutely necessary.
So I did save that one point
where I had to depend on the higher environment.
Now, as you can see on the left hand side we're learning through falsifiability,
and on the right we are confirming our learning through verifiability.
So that's the beauty of this whole idea.
Now there is one more example of this scaling
down I'd like to share. This is one of my favorites. I call it a
with and without experiment, or the A/B experiment. So again, the current characteristic of the system I was dealing with was, I was told, 10,000 RPS at 80% CPU utilization. What was expected is that once we introduce a cache, I'd be able to hit 1 million RPS at the same CPU utilization. Fairly ambitious, but let's see if we can scale this down. Now, 10,000 RPS itself was out of bounds for me; 1 million RPS is really, really hard.
How do I figure this out on my local machine?
Now the interesting piece here is I
don't really need to worry about 10,000, 1 million and whatnot.
What can I do on my left hand side? Let's say at some max CPU, let's not even say 80%, at some max CPU I have X RPS without the cache. And at the same CPU level on my local machine, with the cache, I have Y RPS. The only pattern that matters to me is the fact that X is less than Y. It doesn't matter what X RPS is, it doesn't matter what Y RPS is; the fact that the cache does make a difference is what is important. Because if we don't do this, I've seen many times that the cache is just there for moral support and really doesn't make any discernible difference. So for me, this was a really important, eye opening experiment to run on my local machine: to say the cache makes a difference, in fact it's working, and then I can take it to the higher environment. I don't need to go to the higher environment to realize that my cache settings are somehow not working. So I could save that one aspect. So that's the with and without experiment.
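Written down in the same spirit, the with and without experiment is just a two-row comparison. The note below is purely illustrative, with X and Y standing in for whatever your laptop happens to produce.

    with_without_experiment:              # illustrative only
      constraint: same laptop, same max CPU for both runs
      without_cache_rps: X                # absolute value does not matter
      with_cache_rps: Y                   # absolute value does not matter
      decision: proceed to higher environments only if Y is greater than X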
Let's get to the next challenge now with shifting left.
Now, there's a problem with trying to run performance tests within your sprint cycle. Take a typical sprint: the developer starts writing code for feature one and she's done with it. Based on what we've already spoken about, we know that performance testing is a time consuming activity, so we don't want to do it right within the sprint; it becomes difficult. So we create a task out of it and we hand it over to the performance tester to look into it, put the feature through its paces and come back with some feedback. Meanwhile, we churn out feature number two and then hand that also to the performance tester. Now, based on what we have done in the previous sprints, we start churning out feature three. And that's the first point when the performance tester comes back saying, hey, that feature one which you developed, I found a few performance issues; it's not really conforming to the NFRs that were expected of it.
However, we are in the middle of the sprint, we cannot change stuff. So we press on with feature number four, and by then the performance tester comes back with more feedback for feature two. Now what happens in sprint number three? The developer has to sit through and look at the performance issues that have been identified on feature one and feature two, and meanwhile the performance tester is putting feature three through its paces. So practically what has happened now is that sprint three is a complete washout. And this is a very typical anti-pattern which we call the hardening sprint, wherein practically we have not done any meaningful work; we're just fixing issues from some of the previous sprints. So this is not desirable.
And this is not something new either.
Every time we call a feature done before it's actually
fully done, this is what happens. We don't complete the testing, we call the feature done and move on to the next one, whereas the performance testing actually comes back with issues that we have to fix. How do we solve this? The better way for us to do this is to collaboratively look into the problem. The developer and the performance tester work together, finish the feature, put it through its paces, identify the performance issues and fix them. And we have to come to terms with the reality of it: it does take two sprints. Practically, this is a best case scenario where we identified the issues and fixed them in the very first iteration, and that itself took two sprints.
Now, obviously this is not desirable. We'd like to become
a lot more efficient at churning out features, right? So the better way to look at this problem, instead of siphoning off performance testing, is to reduce the cost of performance testing itself. How do we do that? We reduce effort, we reduce complexity, we reduce repetition, and we automate all the way. That's the only way we can solve this problem. So let's look at how we took some of those aspects and incorporated them into our daily activities. So let's look
at repetition. The developer writes application code; the performance tester is responsible for the performance testing setup. The developer writes API tests and the performance tester writes the performance test script. API tests generate requests; performance tests also generate requests. Very similar. API tests assert on the response; performance tests not so much, they assert more on the response time. As long as the request itself comes back positively, that's all we care about in performance testing. If you think about it, the API tests and performance test scripts are practically duplicates of each other.
One of the examples I'd like to take here: in one of the teams I was working with, the developer was writing API tests in Karate, and the performance tester was writing the perf test script in Scala with Gatling. So obviously that was repeat effort. And wherever there is duplication, there is inconsistency, because we are repeating it and we don't know whether the performance test script is in line with what the API tests are doing and what the feature really is. So there's that inconsistency, and then there is a disconnect. The developer is developing the feature and the API test, and the performance tester probably is not privy to all the details of it. So, without understanding the intricacies of the architecture, the performance tester may not necessarily identify loopholes which the developer could have. So how do we solve this?
Shift left equals repurpose. Take a perfectly good road car, chop it up, remove all the unnecessary stuff, put in a big engine and go rallying. Very much the same with API tests, right? We can reuse them as perf tests. That's essentially what we ended up doing. This helps with reduced maintenance, because there is no inconsistency; the perf test will not go out of line with the API test. And from the point of view of collaboration, it promotes a better working style within the team, because the perf tester is now leveraging the API tests that the developer has written, and they both can collaborate on the design of the perf test itself. Next, reducing complexity.
Now this is a hard one. Performance testing tools have come a long way in the recent past, and there are a ton of tools out there, all of which are great in their own way, and we are often at a loss choosing which is the best tool for the job, right? So that itself is quite a choice among so many great tools. And then once you have chosen your tool, there come the metrics. It's not useful to just have a performance test report without the application metrics, right? So we'd like to put the application metrics and the performance test results in one metric store and visualize them. And talking of visualization, we have Kibana, Grafana and the like. So that's a lot more tools in the mix. And ultimately there's the custom code which has to come in to put all this tooling, infrastructure and orchestration together. Now this is a fairly sizable piece of work; add to that the costing, the licensing, the preferences of the team and the performance tester, and whatnot.
Based on multiple other factors, ultimately we end up arriving at one stack, and the stack is fairly unique for almost every individual team, even within a company. So that's a lot of complexity and repeat investment into getting perf testing going. That's hard enough for the performance tester to handle and set up on the staging environment. If you hand it to a developer and say, hey, stuff that into your laptop, that's not going to be very easy, right? Because as a developer I'd be at a loss: I don't understand many of these tools, I'm a newbie to them, how do I install them? It's going to be quite difficult.
So whenever there is this kind of fragmentation, a good answer is containerizing the problem, right? That's exactly what we ended up doing. We containerized the performance testing setup itself and developed it more like code. Instead of creating the performance test setup on the higher environment, bringing it down to the lower environment and finding it doesn't fit, we develop the perf test setup on the local machine and containerize it so that it works well on the local machine as well as on the higher environment.
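To give a feel for what containerizing the performance testing setup can look like, here is a minimal Docker Compose sketch that bundles a metrics store and a dashboard next to a load generator container. This is only an illustration of the idea, not Perfiz's actual stack definition, and the load generator image name in particular is hypothetical.

    # Illustrative docker-compose.yml sketch; not the actual Perfiz stack definition.
    version: "3.8"
    services:
      prometheus:
        image: prom/prometheus            # metrics store for test and application metrics
        ports:
          - "9090:9090"
      grafana:
        image: grafana/grafana            # dashboards for visualizing results in real time
        ports:
          - "3000:3000"
        depends_on:
          - prometheus
      load-generator:
        image: my-org/karate-gatling-runner   # hypothetical image wrapping the Karate/Gatling tests
        depends_on:
          - prometheus

The same definition runs on a laptop and on a beefier host in a higher environment, which is exactly the property we are after.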
That's the objective. So with that, I'd like to jump into a quick demo of Perfiz.
Perfiz is a tool that we came up with based on some of these principles that we've been speaking about for some time now. It helps perf testers and developers collaborate: the developer can write API tests in Karate, and the same tests can be leveraged as perf tests through Gatling, via the Karate-Gatling integration, without actually writing Scala code. I'll get to that in a bit. It ultimately gathers the metrics in Prometheus and visualizes the results in Grafana in real time, and the entire orchestration is handled by Perfiz. It's dockerized, which means I can install it on my local machine or a higher environment alike. So that's the whole idea. Now with that, let me jump right into the
demo.
So you download Perfiz as a tool and then extract it to a location of your choice. I have already done that. Now all I need to do is set up the PERFIZ_HOME environment variable. I'll do that, and with that we have pretty much installed Perfiz.
Now I need a project which already has a Karate API test that I can leverage as a performance test. For this purpose, I'm going to use the Karate demo project, which is on the Karate GitHub account, and I'm going to take one of their Karate feature files and run it as a perf test. So let's take a look at how we can do that.
I have already cloned this application, the Karate demo project, so let me boot it up; it's a Spring Boot application. While that application is booting up, I'm going to quickly init our project. What I'm going to do here is run perfiz.sh init.
What init has done is essentially create two extra things: the perfiz.yaml configuration, which you will see right here, and a folder called perfiz, which I'll get to in a bit in terms of the details. Now what I'm going to do is replace the contents of the perfiz.yaml file (not the feature file). This is a template which Perfiz has dropped in for us. I'm going to get rid of whatever is there; instead I'm going to copy-paste something that I already have and
let me walk you through from the top. So the first
point, we have the Karate features directory. I need to let Perfiz know where the feature file is sitting, the greeting feature which I'd like to run as a performance test. So the first parameter, the features directory, and the location: both of these put together help Perfiz locate where the feature file is sitting. Then I also have this simulation name, which I'm going to call Greeting, because that's what I'm testing. And ultimately there is one other variable called karateEnv, which is set to perfiz. Now why do I need this? Because the application, the Spring Boot application, is running on my local machine, whereas Perfiz is going to be running in Docker, which means I need to tell Perfiz to point to my local machine. That's why I have said that whenever the environment is perfiz, the application is running on host.docker.internal. So that's the only bit; it could be any other URL. With that, we are pretty much ready to get started with the performance test.
The rest of the lines in this file are basically the load pattern, and this load pattern is very similar to the DSL that Gatling exposes through Scala. Like I said, you don't have to write any Scala code at all to get this going; you can just configure the YAML file and it will practically do it for you. And the last bit is the URI patterns. I don't need those for this test, so I'm going to get rid of them.
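For readers following along, the configuration being described looks roughly like the sketch below. The key names are assumptions reconstructed from this walkthrough rather than copied from the real file, so treat the Perfiz documentation as the source of truth; the load pattern matches the one used later in this demo (one constant user for 30 seconds, then three for 15 seconds).

    # Illustrative perfiz.yaml sketch; key names are assumptions from this walkthrough.
    karateFeaturesDir: src/test/java                    # where the Karate feature files live
    karateFeatures:
      - karateFeature: demo/greeting/greeting.feature   # hypothetical path to the greeting feature
        gatlingSimulationName: GreetingSimulation
        loadPattern:
          - patternType: constantUsersPerSec            # mirrors the Gatling DSL term
            userCount: 1
            duration: 30 seconds
          - patternType: constantUsersPerSec
            userCount: 3
            duration: 15 seconds
    karateEnv: perfiz   # karate-config.js can key off this to point the baseUrl at host.docker.internal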
With that, let's meanwhile look at whether the application has booted up. Yes, it has.
I'm going to hit the Spring Boot application's greeting endpoint, and it responds with hello world. So the Spring Boot application is indeed running now.
Now I need to get started with Perfiz to run the performance test. I have the perfiz.yaml, I've done my init, so what do I do next? I need to start Perfiz: perfiz.sh start. Why do I need to do a start here? Because remember the stack which I showed you in the slide deck: this entire stack is now going to get booted up in my Docker, and for that I need to run perfiz.sh start for it to spin up everything and bring the entire stack up here. As you can see, I have my InfluxDB and Prometheus here, Grafana, the whole lot running right here.
Okay, so now that means I am good to
get started with the testing itself. After it has booted, it says that Grafana is running at localhost:3000. Let's go take a look. Yes, it is up. The default username and password is admin/admin. We're not going to change that now; I'll just continue
with that,
and you will see that there is already a default dashboard that Perfiz has dropped in for us. It's obviously empty; there's no data in there. So I need to run a test in order for it to populate some data. Now I'll say perfiz.sh test. When I say test, by default the first parameter it's going to take is the perfiz.yaml file. It's going to take this file, interpret the load pattern that I have given, and also understand that I need to run the greeting feature as a performance test. It's going to create a Gatling simulation out of this and then run that as a perf test against the Spring Boot application that we saw earlier. Right now, this whole thing is spinning up inside of Docker. Let me just show you: you have the Docker instance which is running the Gatling test inside, and eventually you'll end up seeing all of
this data in
Grafana. As you can see, the bottom two panels have already started showing some data. Now, why do these panels have CPU and memory? This is practically just a demonstration of the fact that I'm monitoring the Perfiz infrastructure here through cAdvisor, and I am plotting it as a way to show how my performance test setup is behaving in terms of CPU and memory as the test itself is progressing. So I'll be able to see that side by side, and eventually you'll be able to see some data at the top as the test ramps up.
Now, as you can see, Grafana is showing us the graph of what the load pattern was. This is very much in line with the load pattern we configured, which is one constant user for the first 30 seconds and then three constant users for the next 15 seconds. That's what's happening here. You can also get a sense of how we can correlate this load pattern, which we're seeing on Grafana, with the application metrics such as CPU, memory, et cetera. So that's a very quick performance test. In a matter of just five minutes, we were able to convert our Karate API test into a Gatling performance test with Perfiz. All right, so let's jump back
into our deck. So how does all this help in
terms of shifting left our performance testing? Let's understand
that in a bit. So, quick recap
of what we just saw. You have your laptop and you have your
API code and the karate test corresponding to that. You deploy your
code to your local environment, and that could be dockerized,
and that's your application. Now you install Perfiz on your machine, which is again dockerized. All you need to do now is create the Perfiz configuration file; this config file is read by the Perfiz CLI, which is going to convert the Karate API test into a Gatling performance test, gather metrics both from Gatling and your application, and visualize them on Grafana so that you can analyze the results and, based on that, make changes to your code. Once you're satisfied with the code changes that you made, and you know that it's working fairly to your satisfaction on your local machine, you promote your code to the higher environment. Likewise you promote your performance test setup to the higher environment. Perfiz being dockerized, it doesn't matter; it can just be deployed into the higher environment as well, and you could use a perf test pipeline if you wish. Even for the higher environment, it uses pretty much a similar configuration file. It's able to generate load against the higher environment based on your Karate API tests, gather metrics from that environment into the metric store, and make them available to you for analysis through the Grafana dashboard, and then you can make the change. So on the left hand side, with your local machine, Perfiz can enable you to shift left and test very much on your lower environment, and the same setup can pretty much be promoted to the higher environment, and you don't have to deal with additional complexity.
So that's the whole idea here.
Well, with all the topics that
we've spoken about today and all the tooling that we've seen, I think
the biggest challenge of it all, in terms of shifting left performance testing, is the mindset itself: performance testing versus performance engineering. Let's understand this in a little more detail. Because of the word testing, we may tend to believe that performance testing is more like a verification activity that is done towards the tail end of the development cycle, just to see if everything is working fine. Rather, it'd be nice if we could use perf testing more like a learning activity, through multiple spikes, which helps us avoid guesswork in our system architecture by helping us learn quickly about what we are trying to design. How do we do that?
The only way to avoid guesswork is to bring in some scientific rigor
into our day to day activities. So in
line with this, I was wondering whether all the principles that we've spoken about so far can be put into a template, so to speak, which will help us think through shifting left more consciously. And that's where I came up with this idea called the continuous evaluation template. On the left hand side you have some basic information: the problem statement, the baseline KPI, the target KPI you'd like to achieve, and the hypotheses that you're trying to learn about or verify. On the right is the more interesting piece, which is the experiment design itself: for every hypothesis we have to design one falsifiability experiment and one verifiability experiment, which I'll get to in a bit, and ultimately capture the validated learning that we get out of the experiment.
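As a rough sketch, the template can be captured as a simple structured note like the one below. The field names are my own rendering of the slide rather than a prescribed format, and the example values in the comments come from the cache scenario discussed next.

    continuous_evaluation_template:       # illustrative rendering of the slide
      problem_statement: "what are we trying to improve?"
      baseline_kpi: "current measurement, e.g. 10K RPS at 80% CPU"
      target_kpi: "desired measurement, e.g. 100K RPS at 80% CPU"
      hypotheses:
        - statement: "what we believe will get us there, e.g. adding a cache"
          falsifiability_experiment: "cheap local check that could quickly disprove it"
          verifiability_experiment: "confirmation run on a higher environment"
          validated_learning: "what the experiments actually taught us"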
So let's look at this template in
the context of a more concrete example. I'm going to leverage
the example that we saw earlier, which is to increase the RPS at a certain CPU. Let's say at 80% CPU I'm currently able to achieve 10K RPS, and I'd like a tenfold increase to 100K RPS. As any developer would immediately jump to an idea, I also came up with the same one: why don't we just drop in a cache and things should become better? Now obviously the
first aspect is I need to have a falsifiability experiment for this hypothesis
that the cache will indeed help reduce repetitive computations
and thereby reduce CPU. So what can I do on my local machine? Remember the with and without experiment, or rather the thought process, that we saw earlier. First I'll verify whether adding a cache makes any discernible change at all to the CPU usage; that would be my first check. If that doesn't work, there's no point in me even taking it to the higher environment and verifying whether the cache is actually helping me achieve that tenfold increase in RPS. Now say it does not work: I need to understand what went wrong, and I realize that the cache is not working because the miss rate is almost 100%. For this 100% miss rate, I hypothesize that the reason the miss rate is so high is that the TTL on the cache keys is too small, so before they can even be accessed the keys are expiring, and therefore I'm not able to hit the cache. So there's the 100% miss rate.
Now I fix the TTL and then try the same falsifiability experiment.
And this time I see that there is some discernible difference between having a cache and not having a cache. So now I can go to the staging environment, the higher environment, and figure it out. Now again, staging is not equivalent to production, so I probably cannot hit 100K RPS there. But what can I do? It's a tenfold increase, right? Remember the trend scaling example that I showed you earlier? Very much on those lines: what I'm going to do now is say, whatever staging is able to achieve today, can I achieve ten x of that by leveraging the cache? That's the experiment I'd like to run.
Suppose even that doesn't work: I'm able to achieve only about a four x improvement, and I realize the issue. The miss rate is still about 40%, and I still have a high eviction rate. Now what do I do about that? I further hypothesize that I have a high eviction rate because the cache size is too small: as the cache grows, keys are getting pushed out, and thereby again I'm having a high miss rate and whatnot. Now, how do I design a falsifiability experiment for this? Obviously I do not have as much memory on my machine, so I cannot increase the cache size and experiment with that. But if I cannot increase the memory and increase the cache size, can I decrease the cache size and see if the eviction rate becomes higher? That should prove that there is a correlation between cache size and eviction rate, right? At least I will know that this is the problem, and the hypothesis is at least not falsified yet.
Now if that works, then I can take it to the higher environment, increase the cache size adequately, and see if I can achieve my number. And yes, this time I'm able to achieve ten x of staging. Great. Now all this experimentation has put me in a position to suggest what the TTL and the cache size should be on production, or a production replica, for us to achieve the 100K RPS, right? And even then I don't jump to a conclusion. I first verify that if I just deploy this setting, there should be some change in the CPU; at least it should drop a little bit, which means the deployment itself is successful. Only when that is done will I go for a full blown test with 100K RPS itself, and if that passes, then I can ship the code. Now,
as you can see, this template sort of forced me to think hard about solving the problem through falsifiability on the left hand side, rather than depending very much on the right, instead of, for every single piece, taking the cache, deploying it to production, realizing the settings are incorrect, and then coming back and making a change. By shifting left, I've reduced my dependency on the higher environment, and that's where I find most value in this template. Obviously, for the purpose of demonstration, I've simplified this; in real life there were a lot more steps which I had to go through before I could ship. But you get the idea of how this could be valuable for thinking through a problem in a lot more detail with scientific rigor. With that, thank you very much for being a very kind audience. These are my handles; I'd love to keep in touch and answer any questions that come my way.