Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi everyone, myself Nisaraksha and today I'll be talking
about Python memory and rabbit collection and so
how it works and all. Myself Nisaksha and I am an undergraduate
can student and right now I'm interning at fountain
and startup. You guys can connect with me on the following social media
links. And yeah, if you have any feedback
for me on this presentation then also feel free to give
me and if you have any doubt or if you want to discuss
any topic then also, yeah, I'm open to that.
Now let's have a look at what will be the main points covered in
this talk. What will be the giveaway of this talk?
So first of all, I'll start with the Python objects. Then we'll shift
to the memory storage, that how memory is being stored and all. And after
that we'll have a look at a couple of garbage collection algorithms, what it
does, how they works and all that. After this presentation
you'll have a thorough understanding how the memory deallocation
and deallocation works. And so you might
be able to write some efficient Python code because now you know how the objects
are getting deallocation. So you might consider that thing while designing
your application. Also, as a Python developer, you might have only heard of
objects. So yeah, in Python everything is object only.
Now let's say here I have assigned nine to a variable,
and after that if I do id of variable, then I'll get an id,
and if I do type of variable, then I'll get a type which is like
class of int. So yeah, as soon as I assign this nine,
then Python. So object will be created which will have
unique id, its type value, and a reference count.
So like as soon as any assignment is done, an object will
be created which will have all these properties. Now, how the
memory storage actually works in Python, the objects
and the instance. So the objects and the instance variables are actually
created in a heap memory. So as soon as the functions
are written, so at that point there
is no need of that function or variable so
that objects will be collected by a garbage Collector. Now what's
garbage collection? We'll see that in upcoming slides.
And the methods and the variables, like the local variables and
the methods are created in the stack memory. So a stack frame
is created. Then as soon as the method returns,
those frames get destroyed. In Python,
you can have an object into an object, right? So a list into
list or a dictionary into dictionary, or a list into dictionary, dictionary into list,
any kind of thing, right? Because of this Python's dynamic
nature, we need to have smart memory allocation
scheme and also the speedy memory allocation speed. Now,
to speed up all these operations, Python has a manager which
is called Pymeloc, and it actually sits on the top of the
general memory allocator. So the main
work of Pymeloc is to allocate the memory
and speed up the memory operations. Now let's say
in any Python application you might have hundreds of
objects and a couple of long running Python processes,
right? So there might be need of huge
memory. Now if you as a developer are
handling the deallocation of memory, then it might be
a case that by mistake you might deallocation some
wrong object, some wrong variable which may
be in use in further program. Then your program will be crashed
and debugging will take a lot of time. So who
does this garbage collection thing? So who actually
deallocation the memory? So for that, the garbage collection
has a bag. It collects the objects which are of no
longer use and it de allocates them. It actually
releases the memory whenever the object is of no longer use. So you
can think this system as a kind of trash bin in our computer.
So if you don't need any folder, then you'll just shift
it into the trash and then it will be deleted. So it's a kind
of a similar system. The garbage collection automatically
tracks all the objects and it deallocation the memory automatically.
So it's a huge relief for a programmer that the programmer does
not need to deallocation the memory by himself. And the
thing is, the garbage collection algorithms actually
track these objects and it finds an optimal time at
which to deallocation. There are these algorithms
which does all this, and it is very useful for us also.
So as a programmer, we don't need to do anything.
Now let's look at the GC algorithms. So one is the reference
counting algorithm, and the second is a generational GC reference counting algorithm
is pretty much straightforward, it's easy and efficient also. And it
works as like whenever there is no reference. So as the name suggests,
reference fountain. Right? So whenever there is no reference of an object
at that moment, it delocates the memory of that object.
Now, to keep track of all the references, every object has
an extra field called reference count, which we saw
in the first slide that each python object, so each object
has its id, its type and
reference count and a couple of other fields, right? So actually
that reference count is increased or decreased based on
the allocation and all that. And the generational GC algorithm.
So it is based on a trace based algorithm so it actually
scans a list of objects and all that. So we'll look into much
detail in the upcoming slides. But these two algorithms
are there which work in different cases.
And so actually generational DC
algorithm detects the circular references,
cycle of references, and it is used in that case.
So let's have a look what all this means and let's understand it.
So first of all, we'll see the reference count. Now here I have assigned 19
to variable a. So if I do id of a, then I'll get the id
of this object. So as soon as I do this assignment,
like this object will be created, and of course the id will also be
there. So this object will be created, which has its type,
value and reference count. And all these properties
are automatically detected by the python. So we don't need to give it here you
can see the reference count is one, so we can say that
a is referencing towards this object. So like this
kind of thing. Now let's say if I introduce like new variable b, and if
I do b equal to a or b equal to 19, then what will happen?
If I do b equal to a, then you can see when
I do id of b, then I'll get the same id, which means b is
also referencing to the same id to the same object.
And as soon as I do b equal to a, the reference count will be
incremented by one. So now it will be two. Now as soon as I will
do c equal to 19, at that moment also the reference count will be incremented
by one, which means here reference count will be three and C
will be also pointing towards this object. You can get all this count
by sys get ref count. You just need to pass the object
in that and you'll get the reference count. But here note
one point that passing a variable into the function,
it also increments the reference count by one because
it creates a local copy of that, and that also
is referred to the same object. So if I do sys get
ref count of c, then I'll get the reference
count as four, not three. Now let's have a look how
the list works. So let's say I have one list, which is l one equal
to one two three, then I'll do id. I got an id.
If I do l two equal to l one, then I'll get the same id,
right? But now let's say I copy the list, like copy using copy
copy. If I copy the list one into l three, then you can see that
the id of l three and id of l one are not same. It means
the copy actually creates another object and it is referred to
that object. Of course, if I do l one equal to one three, four,
then the id of l one gets changed because now it is referencing
to some other object. Now let's see what circular reference
means because like generational dc algorithm detects it. So as the name
suggests, some kind of a circular. So like here it's a circle. Here it's a
circle, right? So circle reference means either an object is
referencing towards itself or like two objects are differing
towards. Here, let's say I have one list l one,
or list one equal to this, and I am appending list one.
Append list one. I am appending the object
into the same object. Or else let's say I have two objects,
object one and object two, and I am referencing object two into object
one and then object one into object two. Now, in this
kind of scenarios, if you look, then the reference
count will always be greater than one.
So the reference count won't be equal to zero at any moment.
Because if I am referencing towards myself only
then the reference never be equal to zero. So in this moment,
the reference fountain algorithm won't work.
And so the generational DC algorithm comes into picture.
Now let's see how the reference counting garbage
collection algorithm works and how it handles the
reference counting algorithm. It actually always counts the reference
number to the objects, right? And the reference counts
are kept in the memory so that the programs are executed
effectively. Now, first thing, we need to have a look at
that, in which case the count will be increased, right? So one will be,
if I pass an argument into the function, at that moment, the count
will be incremented. If I append an object to an object,
then also the reference count will be incremented. And if I
do any assignment, operator, then also the count will be incremented.
The reference counting algorithm scans the reference count. And if it
finds any reference count equal to equal to zero, then it deallocation the memory
for that object. Now, let's say I have one list, which is one,
two, three, and if I'm appending that list into list two,
if I have four, five, six, and I am appending that list one to list
two, at this point the reference count will be one of l one.
But now, as soon as I do l two equal to four, five, six,
and I am append l one into the l two, at that moment, the reference
count will be incremented by one, and now it will be two.
And if I delete the l two. Then at that moment the reference count
of l one will be decremented by one.
Also. Now for global variables. So what
about global variables? Right, so if the reference count
of global variables becomes zero, then this
algorithm will deallocation the memory, and after that we won't be
able to use that global variable. But the main
use of the global variable is that if that variable is
not in use at any state, then also I'll be able to access that anywhere
in my program. Right? So here in this case,
it is made sure that the reference count of the global variable never
drops to zero, which means they won't be de
allocated and they will be deallocation only
once the Python program execution is finished.
So that's of no issue here. Now let's look at one example
here. I have one function named poo here. So I have assigned one string
to my name variable. So at this moment the reference count will be
one. As soon as I pass this, my name into this poo function, the reference
count of this object will become two. And as soon as the execution
is completed, the reference count will become one.
You can delete the objects using del method, right. When you use
del method in Python, it actually decreases or
it actually decrements the reference count by one.
So if you delete an object, then also at that moment the reference
count will be decremented by one for that particular object.
Now let's see, what are the issues with this algorithm?
Right. First thing is, it can't detect circular reference,
because the reference count for the circular references,
or that cycles won't be equal to zero at any point.
And the other thing is that this algorithm actually has many
memory and performance issues. And also
it's kind of a big algorithm, but the main advantage is that
it is real. So the main advantage of this approach
of scanning the reference count and check if it is equal to equal to zero
is that the objects can be immediately and easily destroyed
after they are of no longer use. So as soon as the count is
found equal, equal to, to zero, at that moment the object will be deleted,
destroyed. So yeah, that's a plus point. But the thing is
that this algorithm can't detect the circular reference. Also,
there are some performance issues and all that. Now for
that, the generational GC algorithm comes into picture.
So it can actually detect the cycle. It could reach
to the object which was unreachable by the reference
counting, and it could delete that objects and it could
free the memory. One more thing to
know that in what cases, reference cycles could occur,
right? If we refer any object into objects. So let's
say a list into a list, dictionary into dictionary,
the same thing, or let's say two objects pointing towards each
other, something like that. So in this kind of scenario,
the reference cycles could be created, and these kind of scenarios
are found by the generational DC. And if they are of no
use or something, then the memory deallocation
will be done. Now, one thing to note is that this algorithm does not
run in real time. It runs periodically while the reference counting
was running in real time. So in real time it would scan
if the reference count is equal to equal to zero, and then all the things
were done there. But the generational Gc algorithm does
not run in real time. It runs periodically. So let's say at
x moment of time, I have declared like five variables, and so I've
created five objects, let's say then after some moment,
so let's say now at y period of time, the generational DC algorithm
started running. Now, between that period, let's say
at y moment, now only three objects are induced, and the two objects
which I had declared, like their executes or their function in
the program, has been completed. So now at that y moment,
first of all, it will check if the objects are newly created. If it is
newly created, it will insert into the generational zero list.
Actually, GC algorithm, it defines three
generation list, and it inserts the objects into that.
And then it checks whether the object is of use or no use,
right? So at that now by moment, it will see if the
objects are newly created. If it is so, then it will insert into the generation
zero list. Now it will check for the references. So it
will scan, and it will check whether all the
inserted objects are of use or of no use.
We'll see how it detects that these objects are of user,
of no use, or if it has circular references, let's say. Now the
algorithm found that there is a circular reference,
or two objects are of no use, it will discard
those objects and then insert the remaining objects into the
generation one list. Now, the same thing will happen with the generation
one list. So it will also check for the references.
It will discard the objects which are of no use, and then it will insert
the remaining objects into the generation tool list. And then
this generation tool list, this generation two list,
also scans it, and it discards the necessary objects. And now the objects
which are there in this second generation list, it remains until the
Python execution is not, until the Python code
execution is not completed. Now one thing is each
and every generation list maintains its own individual
counter and its own threshold. So let's say if it
exceeds the threshold, then also the scan will be done and it will try
to find some objects which are of no use or something, and it will try
to discard it, right? So now let's see how it
detects the cycle or how it detects which objects
to discard and how the circular reference is detected, and to
discard it, right? Which is like this step,
discard the objects. Now this algorithm has
two containers. One is the objects to can container. Second is the unreachable
container. And each and every object that supports this
garbage collection will also have an extra reference count. So this
reference count is the basic count of Python memory,
which is incremented on any assignment or like
something. But this now GC underscore ref will be an extra
reference count initialized by this. So as soon
as the algorithm starts, GC underscore reference is initialized.
So how the initialization takes place. So this link
one object is referenced by the outer
variable, which is a so the reference can will be incremented
to one. Then like this will be one because this link two is referenced by
link one. Then link three will be also one, and then link one will be
incremented to two because it is also referenced by link three. And here like link
for the GC underscore RF will be one. Now just imagine
that this link one, link two, link three are objects and let's
say a equal to link one object. And then internally this
link two and link three objects are referenced by this object.
After this initialization is done, algorithm makes
one more iteration, and then it finds that which objects
are truly referenced by the outer world. So here
link one object is directly referenced by the outer
world, while link two, link three, and link four are not
directly referenced by the outer world. Now, it will scan
all these objects, it will find this and it will
decrement the GC underscore reference field.
It will actually decrement the GC underscore reference field by
one if it does not find the outer reference directly.
So here link one object has direct reference from outside world,
so this will be one and this will be zero,
zero and zero because this object does not have the direct
reference from the outside world. Now as soon as this object
has been scanned and this number has been allocated, so let's
see what happens next. So now this will be the state,
but now notice that GC underscore reference equal, equal to zero
does not mean that these objects are not reachable. Of course,
GC underscore reference greater than zero means that it is actually
directly accessible from the outside world. But GC underscore
reference equal equal to zero does not mean that they are not reachable. Actually,
here link two is reachable by link one. Link three is reachable by link two.
So now one more scan will be done.
And now for the objects
which has GC underscore reference equal to equal to zero count, they will be marked
as temporarily unreachable, and they will be shifted to the
unreachable container like this. So let's say it saw
that GCN square reference of link four is zero.
It will shift this into this container. Then for three, it will shift this into
this container. Now, let's say it scanned link one. Then the
GC underscore reference was one, but so as it is greater
than zero, then it will not do anything. So this
processing is actually done by TP underscore traverse function,
and it actually marks all this temporarily unreachable
tag and all that. Now as soon as this is done, then the TP underscore
traverse again can and it finds all
the objects which has an immediate object from the object
which is actually referenced to the outside world. So here link two
has an immediate reference from the link one, and the link one
is actually referenced by the outside world. So now
this link two will be shifted to this objects to scan, and its
reference count will be incremented by one. Then it will scan link
three. Yes, now it will check whether the link to object has
an immediate reference or not. So yes, it has the immediate reference.
So it will get this link three object into this
object to scan container, and it will increment its GC
underscore reference count. And after that, like link
three, immediate reference is link two. So here also the GCNs
reference will be incremented. But now if you can see this
link four is all alone by itself. So it is neither
referenced by any object, and also it has no reference from
any object which is accessible from outside world. So now
in this kind of scenarios, this will be now marked as
finally unreachable. And at the end, all the
objects of this unreachable container will be deleted
by the algorithm. So this is
just an overview of the working if we
have three objects which are referenced by each other, and then
if we have fourth object which is referenced by itself,
now all these three objects are.
So the first object has direct reference by some variable,
and now the circular object will be
referencing to itself only, right? So in this kind of
situation, these cycles will be detected
and it will be discarded. So this is just an overview. I would
highly prefer to have a look at the design of the c Python's garbage
collector. It will give you more clarity and also they have explained
by some code, snippets and all. Also, I would highly recommend you
to have a look at the GC garbage function, the GC collect function.
And once you'll see, you'll get
some idea about how the garbage collection so how
the scanning and all that takes place. So actually,
I have referred these things, and I also referred to some
of the python memory model blocks at medium. So they are also
good. So yeah. Now if you have more interest
in knowing this, then I would highly recommend you to refer these
references. Thank you. If you have any questions or
anything then feel free to connect with me. Drop me an email,
or if you have any suggestions for me then also you are more
than welcome to have a chat with me. Thank you.