Transcript
This transcript was autogenerated.
Hello and welcome to Run Fast: Catch Performance Regressions in Python. I'm Everett Pompeii. I'm the founder and maintainer of a tool called Bencher,
and today we're going to be talking about how to catch
performance regressions. Now, in order to catch a performance
regression, you have to first detect it. Detection is a prerequisite
to prevention. So when are we able to detect performance regressions?
Well, we can do that in development or we
could do that in CI or in production,
and more often than not, that ends up being in production,
which is unfortunate because that means it's already impacting our users. Whether or not we have an observability tool that lets us see it before anyone complains, they're nonetheless probably experiencing it.
So, as developers, we would want to shift the detection of those performance regressions as far left as we can. So that's what we're going to start off with today: how to detect those performance regressions
using benchmarks. But before
we get into that, I'm going to kind of tell a little
tale that may or may not
be reflective or similar to some personal experiences,
but for all intents and purposes, it's fictitious. So we've got
an app, I've got an app, v zero of the app. It's a
basic calendar API, right? And so it allows
people to schedule things and create events and things like that.
And so this is created in Flask, but it could very well have been in Django or FastAPI.
Take your pick. So, got this calendar
app, it's working great. Got the v0 app. Minimal, lovable product,
right? And then I decide, hey, I want to
add an additional feature to this, right? And so a
fun notification feature. So every few days it gives our users
a fun notification. Just kind of out of the blue,
right? Helps keep engagement. So we're calling
this the Fizz feature. And so with the Fizz feature,
it returns Fizz if the day is divisible by three,
otherwise it returns none. And it's pretty simple
feature to implement. The business logic looks like
this. It's the fun notification
function. It just takes the modulus of three, and if
that's zero, then it returns fizz, otherwise it returns none.
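As a rough sketch of that v1 business logic (the names here are illustrative, not the exact source):

```python
def fun_notification(day: int):
    # Return "Fizz" on days divisible by three, otherwise nothing.
    if day % 3 == 0:
        return "Fizz"
    return None
```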
So that's great. Releasing it to the customers, they love our fun notification feature.
And so I'm super happy and I'm like, hey,
I'm going to make this even better. And I think you guys might kind of
know where this is going here, but I decide
to improve the fun notification feature, right? And I add
buzz, so return fizz if the day is divisible
by three, return buzz if it's divisible by five,
or fizz buzz if it's divisible by both. So otherwise,
still the same, return none. And again, this business logic
is pretty simple, right? It's just that same modulus operator, but this time we've got both fizz and buzz, or fizz buzz.
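A sketch of what that v2 logic might look like (again, illustrative names rather than the exact source):

```python
def fun_notification(day: int):
    # "FizzBuzz" when divisible by both three and five,
    # otherwise "Fizz" or "Buzz", otherwise nothing.
    if day % 15 == 0:
        return "FizzBuzz"
    if day % 3 == 0:
        return "Fizz"
    if day % 5 == 0:
        return "Buzz"
    return None
```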
So I ship that to my customers, and they also
love it. And so I've got version two out and things are going great,
and they love it so much, I'm like, hey, you know what? I think I'm
going to add something else to it. And I do
my full desired implementation of the fun notification feature,
right? Which I call FizzBuzz Fibonacci. FizzBuzz Fibonacci, though, is quite a mouthful.
It starts the same as the good old fizz buzz feature,
with the three, the five, or both.
Except if the day is divisible by seven, then it returns
the nth step of the Fibonacci sequence.
Otherwise, return none. And still
that business logic looks pretty simple. I just have that
extra two lines up top where I'm checking for the modulus seven,
and then I just do the Fibonacci sequence.
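Sketched out, the v3 logic might look like this, with fibonacci() being the naive helper we'll look at shortly (illustrative, not the exact source):

```python
def fun_notification(day: int):
    # The extra check up top: days divisible by seven return a
    # step of the Fibonacci sequence.
    if day % 7 == 0:
        return fibonacci(day)
    if day % 15 == 0:
        return "FizzBuzz"
    if day % 3 == 0:
        return "Fizz"
    if day % 5 == 0:
        return "Buzz"
    return None
```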
And I moved on with my day and shipped it out to customers, and they loved it.
And things were going great until three
weeks later when all of a sudden
production was on fire and I was like, what's going on?
What's happening? Right? I shipped a bunch of code between then
and the past three weeks, right? And so I had
to come in here and spend all day coming back to try and
figure out what was going on before I figured out it was this darn
Fibonacci feature that I had
done three weeks prior. And so I started looking at
this and said I should investigate a little bit more. And that's what I did,
went and looked at my Fibonacci sequence function, and I had done a very naive implementation of it.
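Something along the lines of this classic, exponential-time recursion (a sketch of the kind of naive implementation described):

```python
def fibonacci(n: int) -> int:
    # Naive recursion: recomputes the same subproblems over and over,
    # so the running time blows up as n grows.
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)
```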
And so I think you guys are probably smarter than I am and know that I shouldn't
have done this to begin with. But before we dig into all that, we're going
to kind of do an aside and look at benchmarking
in Python and how I could have taken this as a learning experience and gone and benchmarked my naive implementation as I tried to find a better solution.
So pytest-benchmark is a very popular benchmarking suite within the Python ecosystem. There is also another one called Airspeed Velocity, which isn't quite as popular, but is also pretty well known. We're going to be working with pytest-benchmark here because it works and integrates so well with
pytest. So in order to install pytest-benchmark, it's super easy: just open a pipenv shell and pip install pytest-benchmark. So I have my
naive Fibonacci implementation
here in my fun_notification.py,
and so I'm adding a benchmark to it that
basically cycles through every 7th day of the month and checks
to see how long this takes to run. Now, the key part in this, and how pytest-benchmark works, is that you're passing in this benchmark argument, which expects to take a function, and then it basically just times that function. So however
long it takes that function to run is
your benchmarking time, essentially. And so here we're going
through every 7th day of the month just to kind of get a feel of the overall scope of the time that it's going to use.
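A minimal sketch of such a benchmark, added to the same fun_notification.py alongside the implementation (pytest-benchmark provides the benchmark fixture once it's installed with pip install pytest-benchmark):

```python
def test_fun_notification_sevens(benchmark):
    def run_sevens():
        # Every 7th day of the month hits the expensive Fibonacci branch.
        for day in range(7, 32, 7):
            fun_notification(day)

    # pytest-benchmark's `benchmark` fixture times however long this
    # function takes to run.
    benchmark(run_sevens)
```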
So in order to run this, you just run Pytest
and then your file with your functions,
just like you normally do with Pytest. And this
is the output that I got for this naive version,
right. It's pretty high. It takes over a 10th of a second to run,
which, at scale, is not a good thing when I had a lot of people using my calendar app. So then if we wanted to,
which is going to be important later, we can use the save option on our benchmark output (pytest-benchmark's --benchmark-autosave flag). So this will save it
to a directory which we can add to git, which then means that over time
we can track these benchmarks, even just kind of running
them locally here. And so we've
got my saved benchmarks for the naive implementation here. And so now I'm
going to go and add some memoization, which will hopefully help improve the performance of my function, and
do the exact same test. Notice the test has not changed, the benchmark
has not changed, but just the Fibonacci implementation
has.
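One way to add that memoization is with functools.lru_cache (a sketch; the exact implementation in the talk may differ):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n: int) -> int:
    # Each Fibonacci number is computed once and then served from the cache.
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)
```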
And so I run that again, the same exact call to pytest, and I get this output, which takes a small fraction of the time to run, because memoization caches results we've already computed.
So that is a definite huge performance improvement.
Now if we wanted to compare those, we could copy and paste or
whatever, but pytest-benchmark does actually give us a really nice way to compare. You can just pass the number of the previously saved run (via the --benchmark-compare option) to compare against those saved benchmarks that we just did.
So we run that and we get this output, which lets us
see our version now versus that previous naive version
and how drastically improved things are,
which is pretty great.
That is a great example of how to run and compare
with Pytest. Now, a little note
on micro versus macro benchmarks.
So far we've been doing micro benchmarks. I think of these as analogous to unit tests,
unit level benchmarking. And so what
this does is it's really about just like a single function versus
what are called macro benchmarks, which are much more like integration tests.
They're kind of full end to end. So with my flask app
that I'm using for my calendar API here,
here's my endpoint, right?
And this is the fun notification endpoint. It gets
the date time, it gets the day from there, and then it calls my
fun notification function and then jsonifies things and then sends
it out.
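A sketch of that endpoint and a macro benchmark that drives it through Flask's test client (the route, module, and function names here are assumptions for illustration):

```python
import datetime

from flask import Flask, jsonify

from fun_notification import fun_notification

app = Flask(__name__)

@app.route("/api/fun-notification")
def fun_notification_endpoint():
    # Get the current day, run the business logic, and jsonify the result.
    day = datetime.datetime.now().day
    return jsonify(notification=fun_notification(day))

def test_fun_notification_endpoint(benchmark):
    client = app.test_client()
    # The macro benchmark times the whole request path, not just our function.
    benchmark(lambda: client.get("/api/fun-notification"))
```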
And so the thing is, I will be benchmarking all of that. So if there are any changes outside of my code, it's great, because I can also detect if there are any regressions in libraries I use and things like that. But it is a bit more noisy
because of that. And you're also just going to have larger values,
just a thing to know.
But they work pretty much exactly the same way, much like unit and integration tests are very similar. So back
to our fizzbuzz Fibonacci feature. I have implemented
my memoization, and I was very silly before, having originally implemented it very naively. So now that we have that fix in,
things should be good to go. Right. And so I'm
able to come in, play firefighter and put out the fire
that I caused in production, which is good.
Things aren't on fire anymore. But why
do I have to play firefighter?
It'd be preferable if I didn't, in the same way that
it's preferable that you catch your performance regressions, or your feature regressions, before they make it to production and impact your customers. It would be great to be able to catch your performance regressions before they make it to production and impact your customers. And so you could have observability tools, and they can help, but still, that is too late, right? You're still impacting your customers and users, and so was I in my calendar app here. So production
is just too late to catch things and then development is
local only. So you've got those saved benchmarks and things like that, but they're tied to only the one environment they're run in, and that makes it very hard to share them across a code base with multiple people on a development team. So it's great for local benchmark comparison, in both pytest-benchmark and Airspeed Velocity, but it really is local only.
And then in CI, continuous benchmarking
is the thing that we're going to talk about next, which is what
allows you to detect and prevent these performance regressions.
In CI, we are going to talk about Bencher,
but I will note that Airspeed Velocity also has some kind of rudimentary, basic continuous benchmarking functionality, which, if you're looking for a simpler tool to use, might be worth checking out. But Bencher definitely has a lot more features and is much more robust in this category.
So we can go forward taking a look at Bencher.
So what if we had continuous benchmarking?
What if I had continuous benchmarking when I was doing my calendar
API? Right? So let's go time travel back. But, rule number one: don't set it to 2020. And so this is Bencher. It's the GitHub repo of the open source tool.
As we time travel back here with my calendar app,
and I'm at that first version with the fizz feature,
what would this have looked like? So I
would have gone ahead and written a benchmark at that point in time; doing that proactively, I think, is the best way to put it. And so it's
a very similar sort of test function as what we wrote
before, but it goes over every single day of the month,
right. And tests to see how our
function performs for our business logic. And so it
records that.
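A sketch of what that CI benchmark could look like, along with an assumed Bencher invocation (check the Bencher docs for the exact flags for your setup):

```python
# In CI, something like the following could feed results to Bencher
# (assumed invocation, not a verbatim command from the talk):
#
#   bencher run --adapter python_pytest --file results.json \
#       "pytest --benchmark-json results.json benchmarks/"

from fun_notification import fun_notification

def test_fun_notification_month(benchmark):
    def run_month():
        # Exercise every single day of the month, not just every 7th day.
        for day in range(1, 32):
            fun_notification(day)

    benchmark(run_month)
```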
And then in order to run that in CI, we would need to download the Bencher CLI and install it, which is a simple Debian package and super
quick and easy. And then we'd run that as part of our
CI process. And here we're keeping with the pytest
example, the pytest benchmark example. And we are
going to output our results to a
JSON file. And then the Bencher CLI will read that in and take that information and store
our results. So that's
great. We've got our version one instrumented. So then as we move on to version
two, we don't really have to do anything, it just picks it
up and runs it in CI. And we don't have to manually test anything locally
or do any work. It's just automatically picking up and doing the
work for us. So that's great. And things
seem to be going well. So then as we move on to version
three, right, with that naive Fibonacci implementation
there, we'll get an alert,
which is great, because things are running incredibly slowly compared to how they used to. But how does this happen? Let's take a look at that. As you track your benchmarks with Bencher, you'll kind of have that first version, right? And then in the second version you'll add a little bit more functionality. And that third version is when you get
a huge performance spike, right? And so
that is what triggers the alert. And so
don't worry, we're not going to get too much into the statistics here,
but this is just a probability distribution, average kind
of distribution of what you'd expect. Even if you reran the results
multiple times, there'd be some variation. Right.
And so with that, that first test would be right there in the middle, and then as you run your second version of the code, those are going to be clustered; the averages of those two are going to be very close together. But that third Fibonacci version is going to be all the way out in the extreme. And you set a threshold in Bencher, and if anything's outside of that threshold,
which more than likely that Fibonacci sequence
naive implementation would have been, then that is what triggers
the alert.
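Bencher's actual statistics are more involved, but as a minimal sketch of the idea: keep a history of results, model it as a distribution, and alert when a new result falls outside a chosen threshold.

```python
from statistics import mean, stdev

def exceeds_threshold(history: list[float], latest: float, sigmas: float = 3.0) -> bool:
    # Alert when the latest result is more than `sigmas` standard deviations
    # above the mean of the historical results.
    if len(history) < 2:
        return False  # not enough data to establish a baseline
    return latest > mean(history) + sigmas * stdev(history)

# The first two versions cluster together; the naive Fibonacci spikes.
print(exceeds_threshold([0.012, 0.011, 0.013, 0.012], 0.135))  # True
```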
So that helps you catch performance regressions in CI. Instead of
you manually having to do it yourself, you're able to have
CI do it for you and rely on that to catch it.
So we looked at and talked about trying to
catch things retroactively in production.
And all of the work that that takes, putting out the fires
and things like that, it's just simply too late. We took a look
and learned how to run benchmarks locally with the very useful tools that the Python ecosystem offers: pytest-benchmark, and also Airspeed Velocity. And then we took a look at using continuous benchmarking with Bencher and how that could have helped
prevent all of that anguish and pain before.
And yeah, that's just awesome. So in
review, detection is required for
prevention, production is too late,
development is local only, and continuous benchmarking can
save us a lot of pain. So with that, thank you
all so much. This has been Run Fast: Catch Performance Regressions in Python. That's the GitHub repository for Bencher if you
want to check it out. And if you wouldn't mind, please give
us a star. It really does help the project. And if
that GitHub link is too long to type out, then you can just go to bencher.dev/repo, and it'll redirect you right
there. All right. Thank you all so much.