Transcript
Are you an SRE, a developer, or a quality engineer who wants to tackle the challenge of improving reliability in your DevOps? You can enable your DevOps for reliability with ChaosNative. Create your free account at ChaosNative Litmus Cloud.
Hello and welcome to this talk on shifting left your performance testing.
My name is Hari Krishnan. I'm a consultant and a coach. I help companies with
cloud transformation, extreme programming, agile and lean.
My interests include distributed systems and high performance application
architecture. Below are some of the conferences that I've spoken at.
So let's quickly get started with the talk. Before we
jump into the topic about why we need to shift left in performance
testing, let's understand the context. So, a quick show of hands: in which environment do you identify most of your performance issues? Would that be the local machine? Not likely, right?
What about the development environment? I haven't seen many teams identify issues there, but if you are, that's great. Most of the performance issues, at least in my experience, we start identifying in the staging environment, because that's where we can at least generate some load against the particular
environment. And lastly, for the teams that I have worked
with, a big part of the performance issues are
identified in the production replica because that's where you have the infrastructure
that's close to production. And whatever we are able to reproduce
or generate as a load is something that
is representative of the real environment. And for those issues
which we cannot identify up till this point, our users
identify it for us and then we have to fix them. The intensity of
the red color in this particular diagram here kind of represents
like the darker the shade of red, the longer it takes to fix the
issue, and that's not desirable.
So let's first understand what the current setup of performance testing is and why we have that issue.
So usually the developer starts writing code on their local machine
and then starts pushing it to the development environment and eventually
makes its way into staging. And that's probably the first point in time
when the performance tester comes in and has
to set up the performance testing environment,
write the test script, set up the load generator, the agents and
whatnot, send the load to the staging environment and maybe
even to the production environment, generate the report
out of the test run and then ultimately share it back with
the developer. The developer makes sense of it and then incorporates
the feedback into the code base. So that's the usual cycle.
So what's wrong with this setup? The first problem itself is
the fact that if we are identifying the issues as late as staging or production, it's already quite late in the cycle, right? And what's worse: if we identify the issue at staging or production, then it
takes an equal amount of time to again fix the issue,
iterate over it, and then bring it all the way from local machine
to development to staging. And that's not it,
right? The higher environments are highly contested. I'm not the
only developer who's trying to fix
performance issues or trying to see if my fix is even working. There are other developers who would like to verify their changes too. So in the higher environments there's a lot of timesharing going on, and we tend to have a difficult time trying to identify which feature we need to test or which issue we can put through its paces. So that's the difficulty there.
And that's what leads to this kind of a graph
where we identify most of the issues very late in the cycle,
or leave it up to the users to find them. What should it look like instead? We'd like to identify most of the issues on our local machine. And then there are those issues which we cannot discount: some of them can only be tested or figured out in the higher environments because of the nature of the issue itself. But largely we'd like to identify as much as possible on our local machine, or towards the left hand side. That's what we mean by shift left here. Should be easy,
right? The performance tester already has a robust setup, and as a developer, all one needs to do is wear the performance testing hat, borrow the performance testing setup from them, and start sending load to the local machine or development environment. And we can start depending less on the performance tester and make their lives a little easier as well.
This is easier said than done. There are multiple issues,
or rather challenges, in applying shift left
to performance testing. Let's understand those.
The first issue itself is a scale challenge. So how
do you create a truly representative environment
on your local machine or even development environment for
something that's going to be running in production?
Production environments are usually sophisticated. You have a hybrid cloud, you have sophisticated distributed systems, microservice architectures, load balancing and whatnot. The production replica mimics a lot of it too. As you come down the environments, staging has a little less complexity, and ultimately on the local machine it's just a humble
laptop. Now, how do I say that I have verified the performance
characteristics of an application on my local machine, and so
I'm confident that it's going to work on production? Or how do I even
take a problem that I'm identifying on the production environment and bring it down to
my local machine and replicate it? Quite hard to
do that, right? Second is obviously the network
topology itself. Production environments might have a lot more complicated
firewalls and whatnot. The latency levels, the proxies
and practically everything that's involved there is
a lot more complex and there's not much you can replicate.
I mean you can try, but there is only an extent to which we can do all of that fancy networking on a local machine.
And leaving the application aside, the performance setup itself tends to become quite complicated. In the higher environments you need multiple machines to generate sufficient load, and it's a fairly sophisticated setup. Stuffing that into lower environments is hard in itself, and if you try to push it onto my local machine as a developer, this is what I have to deal with. This is my 8 GB MacBook Pro and the memory pressure I'm dealing with while running a Gatling test. That's not desirable.
All right, so how do we solve this? So, shift left
equals scale down. You can't go to sea trials for every single design change, right? That wouldn't be practical.
We have to figure out a way to scale down the problem and try
to figure it out on our local machines or on the left hand side.
But that's not exactly straightforward, is it? So let's take
two parameters and see if we can scale down a performance
issue. So let's say I just want to deal with request
per second and response time, throughput and latency.
These are two very popular metrics, so let's see how to scale them down. If I have X and Y as the two KPIs for my production, do I scale them down maintaining the aspect ratio, or do I just say I'll maintain
my RPS, but I will probably discount a little bit on response time?
Or do I do something else? Like what is the accurate way
to scale down and still validate the performance KPIs
in a truly representative manner?
The reality is we cannot. So what do we do about it?
Even if we cannot scale down the real problem, we can scale down
the trend and invalidate hypotheses.
Since I've spoken about hypotheses, let's understand what hypothesis invalidation is. With a hypothesis, as our old school lessons have taught us, you have to make a statement, and then there is a verifiability aspect
to it and a falsifiability aspect to it. So for example, I could
say all green apples are sour. In order to prove that statement, there are two ways. One, I can go eat all the green apples in this world and say not even one of them is sweet, so all of them are sour. That's the verifiability aspect of it. Now, it's not very practical for me to eat all the green apples in this world. So the falsifiability aspect is this: if I identify even one green apple that's sweet, I've proven the entire hypothesis wrong. That's the hypothesis invalidation aspect. Now how
does that apply to software engineering and particularly to performance
testing? Let's take a look.
So recently I was tasked with a somewhat
similar problem wherein as you can see on the x axis, I have
the throughput. As the throughput increases exponentially,
I want my response time to degrade only at a logarithmic scale.
So this is what was expected of the system behavior,
and they wanted me to figure out if this
has been satisfied by the application. Now, how do I scale
this problem down? It's very hard, right? Because 10,000 RPS is
not something I can figure out on my local machine. Not practical.
But what can I do? Like I said, I can scale
down the trend. The clue here was in the problem itself, which is the exponential increase in throughput and a degradation of the response time only on a logarithmic scale. I can take that, scale the entire problem down, and just figure it out at 1 RPS, 10, 100. Now when I tried it on my machine, I saw that that's not what was happening: the latency was in fact degrading at an exponential scale. This is not desirable. So I can very quickly realize that, through this experiment, I have disproved the hypothesis; it does not hold with the current application code, and I don't need to go testing on the higher environment.
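As a sketch of what scaling down the trend can look like in practice, you could jot the check down as a small note like the one below. The numbers and field names here are purely illustrative assumptions, not figures from this engagement.

    trend_check:                          # illustrative only
      hypothesis: latency degrades no worse than logarithmically as throughput grows exponentially
      load_steps_rps: [1, 10, 100]        # exponential increase, scaled down to laptop size
      observed_latency_ms: [20, 45, 400]  # hypothetical readings; the last jump is super-logarithmic
      verdict: hypothesis falsified locally, fix before touching higher environments

If the observed latencies had tracked the logarithmic expectation instead, the result would still only be inconclusive, which is exactly the verifiability step described next.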
I can quickly apply a fix on my local machine, see if that's working,
and then, only if that works, I can push it to the higher environment. Let's say I made that fix
and I even got a better performance than what was expected.
Maybe I'm even tracking very much under
the logarithmic scale. But is this sufficient for me to say that it's
going to work on the higher environment? No, it's inconclusive,
because on my local machine I have proven it at a lower scale.
But that does not mean it will necessarily work in the higher environment; it is inconclusive. For this part, I have to go with the verifiability experimentation. So I validate it in the higher environment, and I realize that what I found on my local machine does hold good for some more time, but after 1000 RPS it tends to fall off. So now
I need to again identify what's going on beyond 1000 RPS and why it is falling off. And for that I come back to my local machine, figure out a way to replicate the issue, and then I can fix it on my local machine, verify it on the left hand side,
and then move to the right. Now this is not all bad,
right? In the first part, I identified the issue at the very low scale itself, and that one fix I could do on my local machine. I didn't have to go to the higher environment; I only went to the higher environment where it was absolutely necessary.
So I did save that one point
where I had to depend on the higher environment.
Now, as you can see on the left hand side we're learning through falsifiability,
and on the right we are confirming our learning through verifiability.
So that's the beauty of this whole idea.
Now there is one more example of this scaling
down I'd like to share. This is one of my favorites. I call it a
with and without experiment, or the A/B experiment. So again, the current characteristic of the system I was dealing with was, I was told, 10,000 RPS at 80% CPU utilization. What was expected is that once we introduce a cache, I'd be able to hit 1 million RPS at the same CPU utilization. Fairly ambitious, but let's see if we can scale this down. Now, 10,000 RPS itself was out of bounds for me; 1 million RPS is really, really hard.
How do I figure this out on my local machine?
Now the interesting piece here is I
don't really need to worry about 10,000, 1 million and whatnot.
What can I do on my left hand side? Let's say at some max CPU, let's not even say 80%, at some max CPU I have X RPS without the cache. And at the same CPU level on my local machine, with the cache, I have Y RPS. The only pattern that matters to me is the fact that X is less than Y. It doesn't matter what X RPS is, it doesn't matter what Y RPS is; the fact that the cache does make a difference is what is important. Because if we don't do this, I've seen many times that the cache is just there for moral support and really doesn't make any discernible difference. So for me, this was a really important, eye opening experiment to run on my local machine: to say the cache makes a difference, in fact it's working, and then I can take it to the higher environment. I don't need to go to the higher environment to realize that my cache settings are somehow not working. So I could save that one aspect. So that's the with and without experiment.
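Written down in the same spirit, the with and without experiment is just a two-row comparison. The note below is purely illustrative, with X and Y standing in for whatever your laptop happens to produce.

    with_without_experiment:              # illustrative only
      constraint: same laptop, same max CPU for both runs
      without_cache_rps: X                # absolute value does not matter
      with_cache_rps: Y                   # absolute value does not matter
      decision: proceed to higher environments only if Y is greater than X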
Let's get to the next challenge now with shifting left.
Now, there's a problem with trying to run performance tests within your sprint cycle. Take a typical sprint: the developer starts writing code for feature one and she's done with it. Based on what we've already spoken about, we know that performance testing is a time consuming activity, so we don't want to do it right within the sprint; it becomes difficult. So we create a task out of it and we hand it over to the performance tester to look into it, put the feature through its paces and come back with some feedback. Meanwhile, we churn out feature number two and then hand that also to the performance tester. Now, based on what we have done in the previous sprints, we start churning out feature three. And that's the first point when the performance tester comes back saying, hey, that feature one which you developed, I found a few performance issues; it's not really conforming to the NFRs that were expected of it.
However, we are in the middle of the sprint, we cannot change stuff. So we press on with feature number four, and by then the performance tester comes back with more feedback for feature two. Now what happens in sprint number three? The developer has to sit through and look at the performance issues that have been identified on feature one and feature two, and meanwhile the performance tester is putting feature three through its paces. So practically what has happened now is that sprint three is a complete washout. And this is a very typical anti-pattern which we call the hardening sprint, wherein practically we have not done any meaningful work; we're just fixing issues from some of the previous sprints. So this is not desirable.
And this is not something new either.
Every time we call a feature done before it's actually
fully done, this is what happens. We don't complete the testing, we call the feature done and move on to the next one, whereas the performance testing actually comes back with issues that we have to fix. How do we solve this? The better way for us to do this is to collaboratively look into the problem. The developer and the performance tester work together, finish the feature, put it through its paces, identify the performance issues and fix them. And we have to come to terms with the reality of it: it does take two sprints. Practically, this is a best case scenario where we identified the issues and fixed them in the very first iteration, and that itself took two sprints.
Now, obviously this is not desirable. We'd like to become
a lot more efficient at churning out features, right? So the better way to look at this problem, instead of siphoning off performance testing, is to reduce the cost of performance testing itself. How do we do that? We reduce effort, we reduce complexity, we reduce repetition, and we automate all the way. That's the only way we can solve this problem. So let's look at how we took some of those aspects and incorporated them into our daily activities. So let's look
at repetition. The developer writes application code; the performance tester is responsible for the performance testing setup. The developer writes API tests and the performance tester writes the performance test script. API tests generate requests; performance tests also generate requests. Very similar. API tests assert on the response; performance tests not so much, they assert more on the response time. As long as the request itself comes back positively, that's all we care about in performance testing. If you think about it, the API tests and performance test scripts are practically duplicates of each other.
One of the examples I'd like to take here: in one of the teams I was working with, the developer was writing API tests in Karate, and the performance tester was writing the perf test script in Scala with Gatling. So obviously that was repeat effort. And wherever there is duplication, there is inconsistency, because we are repeating it and we don't know whether the performance test script is in line with what the API tests are doing and what the feature really is. So there's that inconsistency, and then there is a disconnect. The developer is developing the feature and the API test, and the performance tester probably is not privy to all the details of it. So, without understanding the intricacies of the architecture, the performance tester may not necessarily identify loopholes which the developer could have. So how do we solve this?
Shift left equals repurpose. Take a perfectly good road car, chop it up, remove all the unnecessary stuff, put in a big engine and go rallying. Very much the same with API tests, right? We can reuse them as perf tests. That's essentially what we ended up doing. This helps with reduced maintenance, because there is no inconsistency; the perf test will not go out of line with the API test. And from the point of view of collaboration, it promotes a better working style within the team, because the perf tester is now leveraging the API tests that the developer has written, and they both can collaborate on the design of the perf test itself. Next, reducing complexity.
Now this is a hard one. Performance testing tools have come a long way in the recent past, and there are a ton of tools out there, all of which are great in their own way, and we are often at a loss choosing which is the best tool for the job, right? So that itself is quite a choice among so many great tools. And then once you have chosen your tool, there come the metrics. It's not useful to just have a performance test report without the application metrics, right? So we'd like to put the application metrics and the performance test results in one metric store and visualize them. And talking of visualization, we have Kibana, Grafana and the like. So that's a lot more tools in the mix. And ultimately there's the custom code which has to come in to put all this tooling, infrastructure and orchestration together. Now this is a fairly sizable piece of work; add to that the costing, the licensing, the preferences of the team and the performance tester, and whatnot.
Based on multiple other factors, ultimately we end up arriving at one stack, and the stack is fairly unique for almost every individual team, even within a company. So that's a lot of complexity and repeat investment into getting perf testing going. That's hard enough for the performance tester to handle and set up on the staging environment. If you hand it to a developer and say, hey, stuff that into your laptop, that's not going to be very easy, right? Because as a developer I'd be at a loss: I don't understand many of these tools, I'm a newbie to them, how do I install them? It's going to be quite difficult.
So whenever there is this kind of fragmentation, a good answer is containerizing the problem, right? That's exactly what we ended up doing. We containerized the performance testing setup itself and developed it more like code. Instead of creating the performance test setup on the higher environment, bringing it down to the lower environment and finding it doesn't fit, we develop the perf test setup on the local machine and containerize it so that it works well on the local machine as well as on the higher environment.
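To give a feel for what containerizing the performance testing setup can look like, here is a minimal Docker Compose sketch that bundles a metrics store and a dashboard next to a load generator container. This is only an illustration of the idea, not Perfiz's actual stack definition, and the load generator image name in particular is hypothetical.

    # Illustrative docker-compose.yml sketch; not the actual Perfiz stack definition.
    version: "3.8"
    services:
      prometheus:
        image: prom/prometheus            # metrics store for test and application metrics
        ports:
          - "9090:9090"
      grafana:
        image: grafana/grafana            # dashboards for visualizing results in real time
        ports:
          - "3000:3000"
        depends_on:
          - prometheus
      load-generator:
        image: my-org/karate-gatling-runner   # hypothetical image wrapping the Karate/Gatling tests
        depends_on:
          - prometheus

The same definition runs on a laptop and on a beefier host in a higher environment, which is exactly the property we are after.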
That's the objective. So with that, I'd like to jump into a quick demo of Perfiz.
Perfiz is a tool that we came up with based on some of these principles that we've been speaking about for some time now. It helps perf testers and developers collaborate: the developer can write API tests in Karate, and the same tests can be leveraged as perf tests through Gatling, via the Karate-Gatling integration, without actually writing Scala code. I'll get to that in a bit. It ultimately gathers the metrics in Prometheus and visualizes the results in Grafana in real time, and the entire orchestration is handled by Perfiz. It's dockerized, which means I can install it on my local machine or a higher environment alike. So that's the whole idea. Now with that, let me jump right into the
demo.
So you download Perfiz as a tool and then extract it to a location of your choice. I have already done that. Now all I need to do is set up the PERFIZ_HOME environment variable. I'll do that, and with that we have pretty much installed Perfiz.
Now I need a project which already has a Karate API test that I can leverage as a performance test. For this purpose, I'm going to use the Karate demo project, which is on the Karate GitHub account, and I'm going to take one of their Karate feature files and run it as a perf test. So let's take a look at how we can do that.
I have already cloned this application, the Karate demo project, so let me boot it up; it's a Spring Boot application. While that application is booting up, I'm going to quickly init our project. What I'm going to do here is run perfiz.sh init.
What init has done is essentially create two extra things: the perfiz.yaml configuration, which you will see right here, and a folder called perfiz, which I'll get to in a bit in terms of the details. Now what I'm going to do is replace the contents of the perfiz.yaml file (not the feature file). This is a template which Perfiz has dropped in for us. I'm going to get rid of whatever is there; instead I'm going to copy-paste something that I already have and
let me walk you through from the top. So the first
point, we have the Karate features directory. I need to let Perfiz know where the feature file is sitting, the greeting feature which I'd like to run as a performance test. So the first parameter, the features directory, and the location: both of these put together help Perfiz locate where the feature file is sitting. Then I also have this simulation name, which I'm going to call Greeting, because that's what I'm testing. And ultimately there is one other variable called karateEnv, which is set to perfiz. Now why do I need this? Because the application, the Spring Boot application, is running on my local machine, whereas Perfiz is going to be running in Docker, which means I need to tell Perfiz to point to my local machine. That's why I have said that whenever the environment is perfiz, the application is running on host.docker.internal. So that's the only bit; it could be any other URL. With that, we are pretty much ready to get started with the performance test.
The rest of the lines in this file are basically the load pattern, and this load pattern is very similar to the DSL that Gatling exposes through Scala. Like I said, you don't have to write any Scala code at all to get this going; you can just configure the YAML file and it will practically do it for you. And the last bit is the URI patterns. I don't need those for this test, so I'm going to get rid of them.
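For readers following along, the configuration being described looks roughly like the sketch below. The key names are assumptions reconstructed from this walkthrough rather than copied from the real file, so treat the Perfiz documentation as the source of truth; the load pattern matches the one used later in this demo (one constant user for 30 seconds, then three for 15 seconds).

    # Illustrative perfiz.yaml sketch; key names are assumptions from this walkthrough.
    karateFeaturesDir: src/test/java                    # where the Karate feature files live
    karateFeatures:
      - karateFeature: demo/greeting/greeting.feature   # hypothetical path to the greeting feature
        gatlingSimulationName: GreetingSimulation
        loadPattern:
          - patternType: constantUsersPerSec            # mirrors the Gatling DSL term
            userCount: 1
            duration: 30 seconds
          - patternType: constantUsersPerSec
            userCount: 3
            duration: 15 seconds
    karateEnv: perfiz   # karate-config.js can key off this to point the baseUrl at host.docker.internal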
With that, let's meanwhile look at whether the application has booted up. Yes, it has.
I'm going to hit the Spring Boot application's greeting endpoint, and it responds with hello world. So the Spring Boot application is indeed running now.
Now I need to get started with Perfiz to run the performance test. I have the perfiz.yaml, I've done my init, so what do I do next? I need to start Perfiz: perfiz.sh start. Why do I need to do a start here? Because remember the stack which I showed you in the slide deck: this entire stack is now going to get booted up in my Docker, and for that I need to run perfiz.sh start for it to spin up everything and bring the entire stack up here. As you can see, I have my InfluxDB and Prometheus here, Grafana, the whole lot running right here.
Okay, so now that means I am good to
get started with the testing itself. After it has booted, it says that Grafana is running at localhost:3000. Let's go take a look. Yes, it is up. The default username and password is admin/admin. We're not going to change that now; I'll just continue
with that,
and you will see that there is already a default dashboard that Perfiz has dropped in for us. It's obviously empty; there's no data in there. So I need to run a test in order for it to populate some data. Now I'll say perfiz.sh test. When I say test, by default the first parameter it's going to take is the perfiz.yaml file. It's going to take this file, interpret the load pattern that I have given, and also understand that I need to run the greeting feature as a performance test. It's going to create a Gatling simulation out of this and then run that as a perf test against the Spring Boot application that we saw earlier. Right now, this whole thing is spinning up inside of Docker. Let me just show you: you have the Docker instance which is running the Gatling test inside, and eventually you'll end up seeing all of
this data in
Grafana. As you can see, the bottom two panels have already started showing some data. Now, why do these panels have CPU and memory? This is practically just a demonstration of the fact that I'm monitoring the Perfiz infrastructure here through cAdvisor, and I am plotting it as a way to show how my performance test setup is behaving in terms of CPU and memory as the test itself is progressing. So I'll be able to see that side by side, and eventually you'll be able to see some data at the top as the test ramps up.
Now, as you can see, Grafana is showing us the graph of what the load pattern was. This is very much in line with the load pattern we configured, which is one constant user for the first 30 seconds and then three constant users for the next 15 seconds. That's what's happening here. You can also get a sense of how we can correlate this load pattern, which we're seeing on Grafana, with the application metrics such as CPU, memory, et cetera. So that's a very quick performance test. In a matter of just five minutes, we were able to convert our Karate API test into a Gatling performance test with Perfiz. All right, so let's jump back
into our deck. So how does all this help in
terms of shifting left our performance testing? Let's understand
that in a bit. So, quick recap
of what we just saw. You have your laptop and you have your
API code and the karate test corresponding to that. You deploy your
code to your local environment, and that could be dockerized,
and that's your application. Now you install Perfiz on your machine, which is again dockerized. All you need to do now is create the Perfiz configuration file; this config file is read by the Perfiz CLI, which is going to convert the Karate API test into a Gatling performance test, gather metrics both from Gatling and your application, and visualize them on Grafana so that you can analyze the results and, based on that, make changes to your code. Once you're satisfied with the code changes that you made, and you know that it's working fairly to your satisfaction on your local machine, you promote your code to the higher environment. Likewise you promote your performance test setup to the higher environment. Perfiz being dockerized, it doesn't matter; it can just be deployed into the higher environment as well, and you could use a perf test pipeline if you wish. Even for the higher environment, it uses pretty much a similar configuration file. It's able to generate load against the higher environment based on your Karate API tests, gather metrics from that environment into the metric store, and make them available to you for analysis through the Grafana dashboard, and then you can make the change. So on the left hand side, with your local machine, Perfiz can enable you to shift left and test very much on your lower environment, and the same setup can pretty much be promoted to the higher environment, and you don't have to deal with additional complexity.
So that's the whole idea here.
Well, with all the topics that
we've spoken about today and all the tooling that we've seen, I think
the biggest challenge of it all, in terms of shifting left performance testing, is the mindset itself: performance testing versus performance engineering. Let's understand this in a little more detail. Because of the word testing, we may tend to believe that performance testing is more like a verification activity that is done towards the tail end of the development cycle, just to see if everything is working fine. Rather, it'd be nice if we could use perf testing more like a learning activity, through multiple spikes, which helps us avoid guesswork in our system architecture by helping us learn quickly about what we are trying to design. How do we do that?
The only way to avoid guesswork is to bring in some scientific rigor
into our day to day activities. So in
line with this, I was wondering whether all the principles that we've spoken about so far can be put into a template, so to speak, which will help us think through shifting left more consciously. And that's where I came up with this idea called the continuous evaluation template. On the left hand side you have some basic information: the problem statement, the baseline KPI, the target KPI you'd like to achieve, and the hypotheses that you're trying to learn about or verify. On the right is the more interesting piece, which is the experiment design itself: for every hypothesis we have to design one falsifiability experiment and one verifiability experiment, which I'll get to in a bit, and ultimately capture the validated learning that we get out of the experiment.
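As a rough sketch, the template can be captured as a simple structured note like the one below. The field names are my own rendering of the slide rather than a prescribed format, and the example values in the comments come from the cache scenario discussed next.

    continuous_evaluation_template:       # illustrative rendering of the slide
      problem_statement: "what are we trying to improve?"
      baseline_kpi: "current measurement, e.g. 10K RPS at 80% CPU"
      target_kpi: "desired measurement, e.g. 100K RPS at 80% CPU"
      hypotheses:
        - statement: "what we believe will get us there, e.g. adding a cache"
          falsifiability_experiment: "cheap local check that could quickly disprove it"
          verifiability_experiment: "confirmation run on a higher environment"
          validated_learning: "what the experiments actually taught us"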
So let's look at this template in
the context of a more concrete example. I'm going to leverage
the example that we saw earlier, which is to increase the RPS at a certain CPU. Let's say at 80% CPU I'm currently able to achieve 10K RPS, and I'd like a tenfold increase to 100K RPS. As any developer would immediately jump to an idea, I also came up with the same one: why don't we just drop in a cache and things should become better? Now obviously the
first aspect is I need to have a falsifiability experiment for this hypothesis
that the cache will indeed help reduce repetitive computations
and thereby reduce CPU. So what can I do on my local machine? Remember the with and without experiment, or rather the thought process, that we saw earlier. First I'll verify whether adding a cache makes any discernible change at all to the CPU usage; that would be my first check. If that doesn't work, there's no point in me even taking it to the higher environment and verifying whether the cache is actually helping me achieve that tenfold increase in RPS. Now say it does not work: I need to understand what went wrong, and I realize that the cache is not working because the miss rate is almost 100%. For this 100% miss rate, I hypothesize that the reason the miss rate is so high is that the TTL on the cache keys is too small, so before they can even be accessed the keys are expiring, and therefore I'm not able to hit the cache. So there's the 100% miss rate.
Now I fix the TTL and then try the same falsifiability experiment.
And this time I see that there is some discernible difference between having a cache and not having a cache. So now I can go to the staging environment, the higher environment, and figure it out. Now again, staging is not equivalent to production, so I probably cannot hit 100K RPS there. But what can I do? It's a tenfold increase, right? Remember the trend scaling example that I showed you earlier? Very much on those lines: what I'm going to do now is say, whatever staging is able to achieve today, can I achieve ten x of that by leveraging the cache? That's the experiment I'd like to run.
Suppose even that doesn't work: I'm able to achieve only about a four x improvement, and I realize the issue. The miss rate is still about 40%, and I still have a high eviction rate. Now what do I do about that? I further hypothesize that I have a high eviction rate because the cache size is too small: as the cache grows, keys are getting pushed out, and thereby again I'm having a high miss rate and whatnot. Now, how do I design a falsifiability experiment for this? Obviously I do not have as much memory on my machine, so I cannot increase the cache size and experiment with that. But if I cannot increase the memory and increase the cache size, can I decrease the cache size and see if the eviction rate becomes higher? That should prove that there is a correlation between cache size and eviction rate, right? At least I will know that this is the problem, and the hypothesis is at least not falsified yet.
Now if that works, then I can take it to the higher environment, increase the cache size adequately, and see if I can achieve my number. And yes, this time I'm able to achieve ten x of staging. Great. Now all this experimentation has put me in a position to suggest what the TTL and the cache size should be on production, or a production replica, for us to achieve the 100K RPS, right? And even then I don't jump to a conclusion. I first verify that if I just deploy this setting, there should be some change in the CPU; at least it should drop a little bit, which means the deployment itself is successful. Only when that is done will I go for a full blown test with 100K RPS itself, and if that passes, then I can ship the code. Now,
as you can see, this template sort of forced me to think hard about solving the problem through falsifiability on the left hand side, rather than depending very much on the right, instead of, for every single piece, taking the cache, deploying it to production, realizing the settings are incorrect, and then coming back and making a change. By shifting left, I've reduced my dependency on the higher environment, and that's where I find most value in this template. Obviously, for the purpose of demonstration, I've simplified this; in real life there were a lot more steps which I had to go through before I could ship. But you get the idea of how this could be valuable for thinking through a problem in a lot more detail with scientific rigor. With that, thank you very much for being a very kind audience. These are my handles; I'd love to keep in touch and answer any questions that come my way.