Transcript
Good day everyone. I'm Ranjan Mohan, a senior software developer at Alice,
and today this session will be about best practices
with respect to Python development. A few things before we get into
the session. I'll be using a few metrics to highlight
why certain suggestions are better, and those metrics should be taken with a pinch of
salt, simply because metrics like the execution time
don't exclude the thread sleep time, as in the
time taken to context switch on the CPU. The eventual goal
is to exclude that from the measurements in a future iteration of
this, but as of now it doesn't. But because of the steep
gap in the time difference, we work under the assumption that
the context switch time is not
playing a big role here, simply because I'm not running anything CPU intensive,
and because the time taken to run the code is
pretty close to the overall thread execution time. The second thing
is that this is a fragment of my experience, and I
have only covered best practices that have resonated with me closely.
So it's not the whole list altogether; I've just
cherry-picked a few and would like to share them with you over the next
few minutes. So let's get into the session, starting with the general
section. The first suggestion I have is to use built-in methods.
Here we have a collection, numbers, which contains
values from 100 down to 1, as in 100, 99, and so on
all the way till 1; zero itself is
excluded. That's followed by squared numbers, which stores
the squared values of all the numbers over here.
So we could use a for loop for it. But we have chosen to use
a map which is an inbuilt function, and we give it a lambda to say
hey, apply this lambda for all numbers within the numbers collection.
So this lambda takes in a single input, squares it, and returns
that as the output. Similarly, when we want to get the sum of the
squared numbers, we just make use of the sum function, and when we
want to find the product of all the squared numbers, we use a reduce with
a lambda that multiplies the two inputs that it has.
So this lambda on consecutive application to the
collection of squared numbers will continuously reduce it one
by one, and eventually at the end, it'll reduce it to a single value,
which ends up being the product of all the squared numbers.
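Roughly, the code being described looks something like this; a minimal sketch with illustrative names, not necessarily the exact code on screen:

from functools import reduce

# Values from 100 down to 1; zero itself is excluded.
numbers = range(100, 0, -1)

# Square every number using the built-in map() with a lambda.
squared_numbers = list(map(lambda n: n * n, numbers))

# Built-in sum() for the total, reduce() for the running product.
total = sum(squared_numbers)
product = reduce(lambda a, b: a * b, squared_numbers)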
So like I said before, all these goals
can be achieved using for loops. But there are two key reasons
why using inbuilt functions would be much better for these scenarios.
Number one is that most of the time it offers a very good
performance boost, especially when you're dealing with a large collection of
numbers or values. The second thing is that it simplifies
the code and makes it more elegant, as opposed to a for loop, where you
need to maintain variable names, add other references and so
on. Using a map or a sum or a reduce makes it clear
what operation you're performing, and it
also takes care of efficiency, like point number one. So those are the two main
reasons: number one, efficiency in terms of performance, memory and
CPU; number two, it results in
cleaner code when you use inbuilt functions, as opposed to
trying to reinvent the wheel. Now, the second suggestion over here is to
fail fast. Consider the requirement where I need to convert
years into months and the number of years is the
input over here. One approach I can take is to check
the success or validity criterion to see if it's a valid year,
go ahead and compute the number of months and return it, else throw an exception
and fail. That's basically the fail-late solution,
because you're checking for validity and computing,
and only if that's not the case are you failing, so you're failing later.
The alternative is to check for the invalid condition first and, if
that's true, fail immediately, else compute the
necessary output.
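The two shapes being compared are roughly the following; the function names and the exact validity check here are illustrative assumptions:

def years_to_months_fail_late(years):
    # Check the valid case first and only fail at the end.
    if years >= 0:
        return years * 12
    raise ValueError("years must be non-negative")

def years_to_months_fail_fast(years):
    # Check the invalid case first and fail immediately.
    if years < 0:
        raise ValueError("years must be non-negative")
    return years * 12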
Let's try running both of these methods for invalid values,
from -1000 to -1, so the range actually goes from -1000 to -2.
I'm going to run both methods for all those values and then find the average execution
time for each. Let's see what is more efficient,
failing first or failing last. As you can see over
here, failing first takes about 0.005 milliseconds,
whereas failing late takes about 0.013,
so there's clearly more than a 50% cut
in the time taken when you're failing fast. This is
under the premise that the condition you use to check the
failure criterion and the one you use to check the success or validity criterion
take pretty much the same time. If the condition that
evaluates the success or validity criterion takes way more or way
less time than the one that evaluates the failure criterion,
then you'll obviously see the opposite. So this is under the assumption
that the conditions to check for failure and for validity take
the same amount of resources in terms of CPU time and
memory. Now, the third suggestion over here
is to import only when necessary.
I have a baseline function here which does absolutely nothing, it just
returns. And I have another function which imports two commonly
used libraries, numpy and urllib. So what I'm going
to do is call both those functions and time them to see
how much time they take. As you can see, the baseline
function did practically nothing, and I think the execution time might have been
in microseconds, so it rounded off to zero milliseconds. But on
the other hand, the import call function took about 264 milliseconds.
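The experiment has roughly this shape; a sketch, with a simple perf_counter harness standing in for whatever timing code is on screen:

import time

def baseline():
    # Does absolutely nothing; just returns.
    return

def import_call():
    # Imports two commonly used libraries inside the function body.
    import numpy
    import urllib.request
    return

for func in (baseline, import_call):
    start = time.perf_counter()
    func()
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{func.__name__}: {elapsed_ms:.3f} ms")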
Now imagine a scenario where you're calling a function which calls another function,
which calls another function in another file, and so on. For each file
it accesses, if it has a global import
right at the top that imports things not
needed for the function being called, it keeps adding several hundred milliseconds
to the runtime, and that can end up being your performance bottleneck. It can add
a few seconds to your overall execution time. So one
thing to note: especially for expensive imports,
please do them only when necessary. I understand there is a trade-off,
in that the convenience of managing all imports at the top is lost,
but with the convenience of IDEs, where you can refactor
and analyze things, and all the tools you can use to analyze dependencies,
maintaining everything at the top becomes a bit of a moot point.
So for expensive imports, use them only
in the context where they're needed, be it within a method, within a class,
or within a code block. That's for you to decide. But that
could end up saving a few seconds of runtime, or at least a few hundred
milliseconds. Now, the fourth suggestion over
here is to use caches for methods where, for a unique
sequence of inputs, you always get the same specific output,
right? The output is not going to change if the input is the same.
For such scenarios, it would be wise to use a cache.
What the cache does is map the input the method
was previously called with to the output it generated, so the next time the
same input is supplied and the method is called, it returns the output from
the cache; it doesn't need to do the computation again. So for this
we're basically trying to compute the sum of the first n
Fibonacci sequence numbers and we're doing it
using recursion over here. And both of the method logics
are the same, except that the first method doesn't use a cache,
whereas the second method uses an LRU cache, and it stores
up to 128 elements, as in 128 unique combinations of inputs
and outputs. LRU stands for least
recently used: when the cache becomes full, it evicts the entries
that haven't been used recently. That is
the strategy of this particular caching technique. So let's try running both
of these and try to see what the performance gain in terms
of execution time is. So as you can see, with the LRU cache,
it's pretty much close to zero. We don't see any noticeable increase
in time as the number of terms to sum increases,
whereas for the method without a cache, we see
it exponentially rising. So as you can see, this is a scenario
where using an LRU cache, or any cache for that matter,
would significantly help. There is one caveat, though:
even in scenarios where the same input
gives the same output, I wouldn't recommend using a cache
if the input objects are very
expensive memory-wise, because in that
case your cache can end
up occupying a lot of your memory and reduce the amount of free memory
available for other parts of the code to execute. So that is the only
caution you need to exercise. But in scenarios like this, where the
method just takes in a measly integer value, you could jolly
well use a cache to improve performance manyfold.
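The two variants being compared look roughly like this, assuming a plain recursive Fibonacci underneath the sum; the exact code on screen may differ:

from functools import lru_cache

def fib(n):
    # Plain recursive Fibonacci, no cache.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

@lru_cache(maxsize=128)
def fib_cached(n):
    # Same logic, but up to 128 input/output pairs are memoized.
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)

def fib_sum(n, f=fib):
    # Sum of the first n Fibonacci numbers using the given implementation.
    return sum(f(i) for i in range(n))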
Now that we're done with the general category, let's move on to the loops category
and compare the performance of loops, list comprehensions,
and map. The test code over here
accepts a list of numbers and returns a list
containing the square of each of those numbers.
When I run this, I call all three methods
for input sizes between zero and 1 million, in increments of 10,000.
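The three methods look roughly like this (illustrative names):

def squares_loop(numbers):
    # Explicit for loop with an accumulator list.
    result = []
    for n in numbers:
        result.append(n * n)
    return result

def squares_listcomp(numbers):
    # List comprehension.
    return [n * n for n in numbers]

def squares_map(numbers):
    # Built-in map with a lambda, materialized into a list.
    return list(map(lambda n: n * n, numbers))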
So it's going to take a bit of time for the execution to complete,
but we'll get a very good idea of how the
three approaches to the same problem compare. So as you
can see here, list comprehension is the clear winner. It uses much
less time than the others, especially as the size of the
input increases dramatically, whereas the loop is somewhere
in between and map ends up taking the most time.
So one thing I would strongly recommend here is that even if map ends
up using a bit more time, in the order of a few tens of milliseconds,
I would still consider using map over a for loop,
simply because it's cleaner code, and over
time, especially for very large inputs, that's when you really start seeing
the efficacy of map as opposed to a for loop.
And over here, list comprehension is the clear winner for simple
scenarios like this, so go ahead and knock yourselves out with list comprehension.
But when you start generating values in a much more complicated manner, with
several conditions or multiple objects, then reading a list
comprehension is not very developer friendly; such
code can end up adding more cognitive complexity
and take the developer more time to understand. At that point it would make more
sense to use a map or even a loop.
So performance-wise, list comprehension is the winner, and even though the for
loop performs better than map up to 1 million numbers, I would
personally still prefer using map, simply because it generates cleaner
code and gives a much better performance
at a much, much higher scale. Now that we're done with the loop section,
let's move on to the string section.
In any PR where I see string concatenation, about 90%
of the time I see a comment that says use join,
don't use plus. And there's a very, very good reason why.
Let's take a look. There's a method over here which uses plus to concatenate
a given list of strings, and another method which uses
join to concatenate the same list of strings. Let's try running
this across a sequence of lists, where the first list
has one value, the second has two values, and so on,
up to 1000 values, and see what the performance difference is when we
use concat, as in the plus operator, versus join.
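The two methods are roughly the following sketch:

def concat_with_plus(strings):
    # Repeated += builds a brand-new string on every iteration.
    result = ""
    for s in strings:
        result += s
    return result

def concat_with_join(strings):
    # join allocates the final string once.
    return "".join(strings)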
So if you see, the execution time for using plus is a
bit erratic. It goes up, goes down, goes up, goes down,
and as we move to larger numbers of strings,
the trend of the execution time increases,
whereas with join it's almost close to zero, it's not even observable.
So join is way more efficient
in terms of performance, and that is the preferred
method of concatenating strings as opposed to plus,
for obvious reasons, as we can see. Now that we are done with the
string section, let's move on to the collection section.
So in this case, whenever we maintain a
dictionary for a particular purpose, we might have to initialize values that
are not present inside it. Say we want to
keep a dictionary that keeps track of the number of each fruit,
and when we say add a fruit and we pass a fruit argument, it should
check if the fruit is there in the dictionary. If it is not there,
you basically initialize it to zero and then increment the value by one.
So this keeps track of fruits that have been added.
This is the primitive approach to go about it.
But one of the more elegant approaches, and not only
elegant but also a performance gain,
is to use the dictionary's get function. What it does is
try to get the value for fruit; if fruit doesn't exist, it returns
the default value that we have specified, which is zero, and then it increments
it by one. So by doing such a thing, we're not only finishing it
elegantly within a line, we're also gaining performance. So that's what we're
going to test over here. There are two operations that I'm testing.
Adding a fruit for the first time versus updating an existing fruit.
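The two versions of the add-fruit method look roughly like this (hypothetical names):

fruit_counts = {}

def add_fruit_regular(fruit):
    # Primitive approach: check for the key, initialize it, then increment.
    if fruit not in fruit_counts:
        fruit_counts[fruit] = 0
    fruit_counts[fruit] += 1

def add_fruit_with_get(fruit):
    # get() returns the stored count, or the default 0 if the fruit is new.
    fruit_counts[fruit] = fruit_counts.get(fruit, 0) + 1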
So let's try testing both of them.
When we try to add a new fruit the regular way, it takes about
0.001299 milliseconds,
whereas when we try to add a new fruit using get,
it's about 0.000797. But to be fair, for adding a new fruit
there's not much of a performance difference. If I run it again, you'll see
the gap shrink, or even reverse.
But where you'll see a noticeable
difference in the opposite direction is updating an existing fruit:
the regular way takes less time in
this scenario than using get, simply because
when you use the get function there is an extra indirection
to pass a default value as
well. So that is a bit of an overhead, not much. Let
me try running this again and we'll see what's happening. In
this run, updating the fruit using get actually comes out lower
than updating it the regular way.
In a nutshell, if you take a look,
the average time taken across both operations, adding and updating,
will generally be lower using get than
the average time taken the regular way.
That's the reason why comparing individual operations may not yield meaningful
results, but comparing the totals across all the operations done
the regular way versus using get will
make more sense in this scenario. The second suggestion for the collections
category is to use generators as opposed to collections when you
want to generate a list or collection of numbers. One way is
to use a list comprehension to generate a list,
or even a for loop. The other approach is to just
set up a generator. The key difference over here is that when you use a
list comprehension, it actually generates the list, and
however many numbers you're generating are all generated and stored in memory,
whereas with a generator you're just storing a generator object
that lazily produces a number whenever you need it,
when you iterate over it. So in that case, the memory
used by the generator is significantly lower than the memory needed
for the list. Let's try to run this piece of
code and see whether it gives us any gain.
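A sketch of the comparison, using sys.getsizeof to look at the size of each object; the value of n here is arbitrary:

import sys

n = 1_000_000

# List comprehension: all n values are generated up front and stored in memory.
numbers_list = [i * i for i in range(n)]

# Generator expression: values are produced lazily, one at a time, on iteration.
numbers_gen = (i * i for i in range(n))

print(sys.getsizeof(numbers_list))  # grows with n
print(sys.getsizeof(numbers_gen))   # small and roughly constant regardless of n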
So if we look at the size occupied, especially as the number of values increases,
we see a steady increase when we generate the numbers
list, whereas for the generator the memory usage is constant
and very minimal compared to the list,
because it's not generating and storing all the values; it only
generates them when we iterate through the generator.
So that is also one of the best practices
that I would like to suggest. Now that we're done with collections,
let's move on to the condition section. Whenever we have a
huge list of values, there are two ways we can go about checking whether a value is present in it.
One is to use a for loop and check each element.
The other approach is simply to use the in keyword.
This is not only more elegant, it comes back to using the built-in methods
and keywords, and it's way more efficient
in most scenarios. So let's take a look to see
how much of a time gain we get by using in,
as opposed to just iterating through the list and checking.
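The two checks being timed are roughly:

def contains_with_loop(values, target):
    # Iterate manually and compare each element.
    for v in values:
        if v == target:
            return True
    return False

def contains_with_in(values, target):
    # The in keyword performs the membership check for us.
    return target in values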
So as you can see, there are four series on the chart,
primarily: the red,
which is matching using in, and the orange,
which is matching using the for loop; the red is partly
overlapped by the green and the other
colors over here. But in a nutshell, if you take a look,
the red shows one momentary spike for in,
which may be because of context switching in the CPU,
whereas for the for loop we see way more of those spikes
and a much larger increase in the time taken as opposed
to using in. So that is one of the key things that
we observe over here, and the reason why we would recommend using in:
it's cleaner code and there is also a performance gain as opposed
to using the for loop. Now that we're done with conditions, we have successfully
concluded the session. I will be sharing the repository
containing this code, as well as a best-practices markdown file
that contains all the best practices we have discussed and
more, in fact. Due to time constraints, I had to restrict
the number of best practices I could discuss with you folks, but I would love
to hear feedback from you and any other best practices that you may recommend;
I would be glad to include them in a future session.
Any constructive criticism or feedback is more
than welcome. Thank you so much for investing your time in this presentation.
I hope to see you in another one soon.
Thank you.