Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello, everybody, and welcome. I hope you're all enjoying the conference,
and welcome to my talk. Today I'm going to be
focusing on memory management in go, and I've namely called it the good,
the bad, and the ugly. And that's because we're going to be covering a whole
bunch of topics around memory management. So a quick agenda.
What is this session going to be about? I'm going to tell you a little
bit about me, who I am, what I do, and where I came from.
The introduction to memory management itself,
how go's memory model works, and how you manage memory in
go. Going to look at a few code examples, some good and some bad,
and help explain them. And then we're going to look at some memory management in
some other languages, namely Java,
Python, and some rust. Also going to look
at some top tips, and I might even throw in a live demo at the
end as well. So who am I? Well, my name's
Liam Hampton and I'm a Microsoft senior cloud advocate.
I'm an all ser ambassador because security is something I always think
about, and I feel whenever I speak to developers, it's not always top of their
list. I'm a dev network advisory board member and I like
to write a lot of go code. That's my background. I also like to travel
the world, as you can see, with a few pictures here on the slide.
So with everything I do, I like to have some learning goals.
If there's anything that I would like you to take away from my talk today,
it is to understand the Go memory model and understand how to manage
memory in go. Let's talk a little bit about memory
management holistically. What is it?
Well, memory management is a way of keeping track of
memory locations in your program and on your system, regardless of
whether they are allocated or referenced, shall we
say, or unreferenced. Memory management is
a holistic view of what's going on and how your program can
run perfectly without falling over, without failing, and without
running out of memory on your system. Well,
why is that important? Well, for a number of reasons, but namely to
prevent memory leaks, stop security vulnerabilities appearing,
and to stop the slowdown of your system and programs from running.
You need to understand how to work efficiently on your system, and you
want to be the most cost effective you can be. Therefore, understanding how
to manage memory is really important, not just
programmatically, but also in terms of how you understand the way
the system works. So let's take a step
back and look at the general concepts of memory management,
everything that we know and love already. Stacks and heaps,
what are they? Well, a stack stores local variables
and function call frames. So whenever you kick off a new function,
whenever you call it, it creates a call frame and that is pushed
onto the stack. Now that
brings me to the second point: a stack is last in, first out.
So let's look at this as if you're stacking
some books on the diagram. We have got one and
you're pushing number two, pushing number three, four, five and six.
And it creates a lovely stack, quite aptly named, if I say so myself.
And then let's say you want to get the first book from the stack.
You need to take each one off the top, one
at a time. So it would go 6, 5, 4, 3, 2, and then 1.
That's pretty typical for a stack because it's supposed to be quick,
it's supposed to be agile and it's supposed to be fast.
That's pretty typical. However,
if we're going to be looking at the sizes of it, they're typically fixed sizes.
If we look at a standard typical Linux distribution,
the default size is about eight megabytes.
That's how big a thread's stack is by default. However, if we
look at how go manages that, and we'll talk a little bit about this later
on, but we use something called goroutines,
as you may or may not be aware. These are
another layer of abstraction away from the operating system, which typically deals
with threads, but this is something inside the language already.
Now this actually helps us with stack
allocation, and a goroutine's stack starts at about 2 kilobytes of
memory, quite small. So that's really good, really fast and really
efficient. Now let's look at a heap.
What does a heap look like? Well, imagine it's like a cloud as we can
see here on the diagram. And next to it we have got two stacks.
The heap stores dynamically allocated memory. It is
not for quick allocation, and it's definitely not for grabbing quick bits of memory
or quick bits of data out of memory. It is there for longevity.
These grow and shrink during the execution of a program, which makes it dynamic,
but that also makes it really slow and really less efficient.
So anything that cannot be stored in a stack is typically put
into the heap, which is good, but that can become a
problem later down the line. So as we said,
stacks are for short lived data and heaps are for
long lived data. So longevity, how does
Go manage all of this? So I've kind of alluded to this already,
but what does Go's memory model look like?
Well, it has a garbage collector. It is pretty famous for its
garbage collector. It's one of the key features of the language,
and it is where it automatically goes around after you,
after your program to reclaim memory, which was allocated.
So it does it automatically for you, which gives you a hands-off
approach to memory management. Unlike some other languages,
like I said, it's not manual, and it reduces security
and leak risk. Now, this is really important. One of the
key problems that we have with memory management as it stands is security
vulnerabilities. Understanding how to close connections,
how to close sockets, is important.
Understanding what happens to your memory when you're allocating global
versus locally scoped variables is also really important.
This helps to prevent the leaks and security risks that you may
see. The garbage collector, like I said, runs around after you and helps
to free up the dereferenced memory for you.
Pretty cool, and it's really, really important.
Next we have goroutines and channels.
Well, what is a goroutine? I've kind of already spoken about it a little bit,
but it is a lightweight execution thread. It's a layer
abstracted away from the operating system, which typically deals with your threads,
but it's a function that executes concurrently with the rest of the program.
It helps to create parallelism, and it
helps you run your code asynchronously,
and that in turn is very
cheap. It helps you have a lightweight program that runs
asynchronously. It's brilliant. And that means less overheads, which is
even better. So when you actually go to
write a goroutine, or when you want to declare it
in your program, you would just put the word
go and then the function call. And I'll show you an example
of this afterwards. What's a channel? Well, a channel
is the transportation between two
or more goroutines that you may have. It's a communication means;
it actually allows you to take data or send data to a channel
from a goroutine and then pull it off again, so they talk to one
another. Because what's the point in having a goroutine that's processing
some data and another goroutine processing some more data and them
not being able to communicate? Well, this is exactly what a channel is
for, and it helps to prevent race conditions, locks and other synchronization problems
that you may face. And again with the syntax,
you declare it with the chan keyword, which, I mean, I think Go
has about 25 keywords. It's a really awesome language like that.
But when you want to write to it, let's say we have c.
C is the channel and we want to take x and we want to put
the value of x onto the channel. Well, you'd use the arrow notation: c <- x.
And likewise when you want to read from the channel, you put
the arrow on the other side to take the value off: <-c. Again,
quite simple. So in essence, what does the memory
model look like and what does it do? Well, it ensures the program doesn't
run out of memory by using the garbage collector. And it really helps
you a lot. It allows go routines to communicate safely
and keep good state. Therefore it is perfect to run your parallel code.
Brilliant. Now that's a whistle-stop tour. There's obviously a lot more to it,
but we're going to keep it at that high level just for now.
How do you manage that memory? So, we have looked at the garbage
collector, we have looked at goroutines and channels. How do you manage
memory in Go? Well, there are two ways
that we can help you do that. There's one called the new function.
Now this is allocating memory of a variable for a given type,
and it is typically zero valued at this point. So let's
take an example. We've got pointer at the bottom and
we are going to be calling the function new and we're going to give it
the type of an integer. Therefore, if we want to get the
value of pointer, well then it's just going to be zero,
the zero value from the beginning. And the second way is to use
the make function. Now this is allocating memory for data structures.
So if you want arrays or slices, maps,
channels, and you then want to use them straight away with a
default value, this is when you'd use the make function.
Again, let's look at an example. We want to make a slice.
So we say we want to declare a slice and we want to
give it a default value of an integer slice with the values three
and five. Now this is how you would do that. And this
is where you'd use make. So you'd use new when you want
to initialize a variable and then use it later on in your program, which is totally
fine. And then you can use make when you want to create a data structure
and use it straight away. Also fine.
Two really good ways to help with memory management in Go.
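As a rough sketch of both (the variable names here are just for illustration, not necessarily the ones on the slide):

```go
package main

import "fmt"

func main() {
	// new allocates zeroed memory for an int and returns a pointer to it.
	pointer := new(int)
	fmt.Println(*pointer) // 0, the zero value

	// make allocates and initialises a slice that is ready to use straight away.
	slice := make([]int, 3, 5) // length 3, capacity 5
	fmt.Println(slice)         // [0 0 0]

	// You can also declare a slice with initial values directly.
	numbers := []int{3, 5}
	fmt.Println(numbers) // [3 5]
}
```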
So what about memory leaks? I've said it a few times and
I'm going to say it again: memory leaks are bad. Now,
how can we avoid them? What is a memory leak? Well,
a memory leak is typically when you have a memory allocation
that is referenced but it's no longer needed and it's not freed
up. So this can eventually cause your program to crash or
slow down your system significantly. This is
really bad and we don't want to run into any of these, and there are ways
to avoid them. But let's look at some typical scenarios of when you'd come across
this. So if you're not terminating a goroutine completely
or, rather, properly, it can continue to hold on
to allocated memory and it just holds it. It doesn't do anything
with it, it's just there. But that memory block is now stuck.
You're not freeing it up. Now imagine if you had a number of goroutines
doing the same thing. Well then you'd have
a memory allocation of this much and it would eventually fill up bit by bit,
causing a problem later down the line.
So if another goroutine wanted to use a piece
of memory, well then it can't because it's already taken up.
What do you do? Well, it's just going to fall over.
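To illustrate that first scenario, here's a minimal sketch (not the slide code) of a goroutine that blocks forever on a channel nobody ever writes to, so it can never release what it holds:

```go
package main

import "time"

// leak starts a goroutine that waits on a channel that is never written to
// and never closed, so the goroutine, and any memory it references, can
// never be reclaimed.
func leak() {
	ch := make(chan int)
	go func() {
		<-ch // blocks forever: this goroutine is stuck holding its memory
	}()
	// the function returns, but the goroutine above lives on
}

func main() {
	for i := 0; i < 1000; i++ {
		leak() // every call leaves another goroutine stranded
	}
	time.Sleep(time.Second)
}
```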
Secondly, another really common mistake
that I see is assigning global variables and never using them again.
And I'm going to show you an example in a moment. But you never want
to assign a global variable and do nothing with it.
You always want to clean up after yourself, and it doesn't always
happen. Again, a scenario of a memory leak. And
of course the famous infinite loop. You're going to be taking memory
upon memory upon memory and you're not going to be doing anything with it.
It's just going to be dormant. That is a classic
example of causing memory leaks. What tools can we
use then? Well, there's a few that we can use and the go
toolchain has a really powerful one,
pprof. And this is basically a built-in package that can be
used to analyze and understand your Go program
and your functions. And I'm going to show you an example,
hopefully at the end. How can you personally help
as a developer? Well, there's a few things now. Number one, it's pretty
obvious. Be vigilant, don't use global variables if you're not
going to allocate them and deallocate them efficiently and properly.
You want to understand the code that you're writing and you need to understand
it properly. And secondly, the defer keyword.
Now this will help to reduce leaks with files, sockets, database connections,
anything that you do not want to leave open.
If you are opening, I don't know, say a file, and I'm going to show
you this in a moment, you're going to want to close that file regardless of
whether the function is going to pass or not or whether it's going to complete.
So let's have a little look at some code.
Here's a good example of the defer keyword. In
this bit of code, we're opening file.txt.
We're then going to check for an error, which is pretty typical,
and then we're going to defer the closure. Now this
is the really important part. This file
close will execute after the surrounding
function, which means that even if it fails, even if that function fails
or doesn't complete processing properly,
the file is still closed. Really important because
we don't want to leave that open. That's when we use the defer
keyword: network connections, database connections, sockets, files,
things that you don't want to leave open, as they can also create security vulnerabilities
at that point. Again, not something we want to do.
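As a minimal sketch of that pattern, assuming the file is simply called file.txt as on the slide:

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	f, err := os.Open("file.txt")
	if err != nil {
		fmt.Println("error opening file:", err)
		return
	}
	// defer schedules the Close to run when the surrounding function
	// returns, even if something later in the function fails.
	defer f.Close()

	// ... work with the file here ...
}
```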
The next one, garbage collector. What are we doing here?
Well, we're creating a struct called mystruct of type
struct. And inside there we
have got data, and that is of type []byte,
so it's a byte slice. In the main function, we are declaring
it as a variable. We are then going to allocate it
with about 100 megabytes. Really, really good
example of the garbage collector because once this is completed,
once this function ends, it's then going to clear up after itself.
The garbage collector can reclaim that memory that my struct has
been allocated at this point here. So once it's completed,
done, and it will clear it off. Perfect. Now let's
look at some bad examples. In here, we are declaring a
global variable called data, and it's of type []byte.
Inside the main function,
we are giving it, again, 100 megabytes.
However, once it's completed, and once it's finished,
we have still got a problem because that global variable is
still there. So how do you fix it? Well, you move that
data allocation into the function,
you give it a local scope. At this point,
we don't want it to be global. We don't want it to exist outside of
this function because it's never going to get cleared up. The garbage collector is just
going to look at it and think, you know what, it's still there, it's still
referenced, it's still being used, but it's completely dormant.
So you make it locally scoped so that when it does complete that function,
it is then cleared up. All of that data is freed up for the next
one.
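A rough sketch of the difference (the names and the exact size are illustrative rather than the exact slide code):

```go
package main

// Bad: a package-level variable stays referenced for the whole life of the
// program, so the garbage collector can never reclaim it.
var data []byte

func allocateGlobal() {
	data = make([]byte, 100*1024*1024) // roughly 100 megabytes, held forever
}

// Good: a locally scoped variable becomes unreachable once the function
// returns, so the garbage collector is free to reclaim it.
func allocateLocal() {
	localData := make([]byte, 100*1024*1024) // reclaimable after the function ends
	_ = localData
}

func main() {
	allocateGlobal()
	allocateLocal()
}
```

Let's look at another bad example.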
Recursive functions, everyone's favorite. Right in
here we are calling recursion and we're
giving it a big number that is then going to
call the recursion function a number of times.
Inside here we have an if statement. If n is not equal to zero,
then call the function again, and again, and
again. That is really bad because you are continuously taking
up memory and you're continuously iterating over this loop and you
don't want to be there. So it's
eventually going to run out of memory at some point. Okay.
Or it might not run out of memory. It might slow down significantly
during its execution. Admittedly, you're going to need a much bigger number
than this to really bring something down on modern-day
systems, but this is still a very valid example.
How do you fix it? Well, we reduce the recursive call
number. That's a pretty obvious one. Okay. There is no golden nugget
for this one. However, reducing the call number or using a controlled
iterative solution such as a range or
a for loop, something that we can control specifically
as a developer. Okay. You don't want to just leave it to its own devices
and continue doing it time after time after time.
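As a loose sketch of that fix (the numbers and names here are arbitrary), the iterative version keeps the developer in control instead of letting the call stack grow:

```go
package main

import "fmt"

// recursion keeps calling itself until n reaches zero, adding a new
// call frame to the stack every time.
func recursion(n int) {
	if n != 0 {
		recursion(n - 1)
	}
}

// iterative does the same amount of work with a plain for loop, using a
// single stack frame no matter how big n is.
func iterative(n int) {
	for i := 0; i < n; i++ {
		// do the work for this step here
	}
}

func main() {
	recursion(10000)
	iterative(10000)
	fmt.Println("done")
}
```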
So now let's look at some goroutines, let's look at some code and how
it really works. So what
do we have? Let's go over to the browser,
into the go playground and let's have a look, shall we?
All righty. So over here we have
got a goroutines function.
We have got go task number one and go task number
two. And then we're going to wait for a little bit. We're going to wait
for those tasks to finish. In here, all we're doing
is printing
task number one and task number two.
That's all we're doing. Nothing too complicated. However,
when we're running task number one and task number two in goroutines, there is
absolutely no guarantee that task one will finish before
task two. And we're going to see if we can get it to do it
today. So let's run this function a couple of times
and see what we get. No guarantee it'll even work this time,
but we're going to see if we can try and find out.
So we have one and two, we've got two and one. Straight away,
two has finished before number one. Perfect example
of a goroutine and parallel execution.
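The playground code being described looks roughly like this (a reconstruction, so the exact names and the wait mechanism may differ):

```go
package main

import (
	"fmt"
	"time"
)

func task(n int) {
	fmt.Println("task number", n)
}

func main() {
	// Kick off the two tasks as goroutines; there is no guarantee
	// which one finishes (or prints) first.
	go task(1)
	go task(2)

	// Wait a little bit so the goroutines get a chance to finish.
	time.Sleep(100 * time.Millisecond)
}
```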
Let's have a look at channels. I briefly spoke about them.
Let's have a look at them in action. So first of all we are creating
a new channel. We're calling it c and we're
saying make chan of type int, so integers.
We just want to be working with numbers here. Then we're going to send a
value to that channel. We're going to say go, so we're
going to kick off another goroutine, and in there we've got
the channel and we're going to pass 42 onto the
channel. Then what we're going to do is we want to read
from that channel. So we want to take the data off of it
at this point. We want to pull the data out of that channel.
So once this executes we would expect to see 42.
So let's go ahead and run it.
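Again as a rough reconstruction, the snippet looks something like this:

```go
package main

import "fmt"

func main() {
	// Create a new channel of ints.
	c := make(chan int)

	// Send 42 onto the channel from another goroutine.
	go func() {
		c <- 42
	}()

	// Read the value back off the channel in main.
	value := <-c
	fmt.Println(value) // 42
}
```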
And of course we do. Now the next one
is where we're taking it just a little bit deeper. And we're
going to look at pointers and referencing with memory addresses.
We are saying Liam's number is equal to 27
as an integer. We are then saying the
variable of pointer is equal to the memory
location of Liam's number.
And then we want to print them out. We want to print out the pointer
which is going to be the memory reference or the memory location. And then we
want to dereference that pointer, so we actually
want to print out the value. Then we're
going to look at a little bit of arithmetic. We're going to say the dereferenced
pointer is equal to the dereferenced pointer
plus two. And we would expect to get a numerical
value. So let's print that out and see what we get.
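Roughly, the playground code being described looks like this (reconstructed, so the line numbers mentioned below refer to the playground screen, not this sketch):

```go
package main

import "fmt"

func main() {
	liamsNumber := 27

	// pointer holds the memory address of liamsNumber.
	pointer := &liamsNumber

	fmt.Println(pointer)  // the memory address
	fmt.Println(*pointer) // 27, the value stored at that address

	// A little arithmetic through the pointer: 27 + 2 = 29.
	*pointer = *pointer + 2
	fmt.Println(*pointer) // 29
}
```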
So we go ahead and run it. And down here we can straight away see
the value of pointer. So this line here,
line number 16, is the memory address location.
And then we have another one. Line number 17
is 27. The actual data at that memory
location. And then we're just going to do a bit of arithmetic. So, substituting
the values in, we're basically saying the new value is
27 plus two, which equals 29.
That is how we're going to be working with memory locations and pointers
and references. And you do that with ampersands and little stars.
So that's a really good way to look at how you can dictate and manage
memory in your program. Okay, you don't want to be overwriting the
wrong values. You also want to keep a true value at some point, and this
is how you work with them. So let's go back to the slideshow.
Okay, so what are we doing next? Let's have a look
at some memory management in some other languages.
Let's go and have a look at Rust.
How does Rust manage memory? Well, it's a little bit different to Go.
Rust uses ownership and borrowing, a completely
different approach to how Go works with its memory.
So ownership is where a piece of data basically has a
single owner that is responsible for managing its lifespan.
So it's deallocated by the rust compiler automatically. Completely different.
And borrowing is the idea that a piece of
data can temporarily be borrowed by another piece of
code, but it must be done in a way that guarantees its
safety and integrity. So again, it's got a real tight security
on its data of how it's passed around and where it's passed around.
But it's a really different concept of what we're used to
in go. So let's have a look at an example on
the slide. I've got a main function, and we have got a
string. So let's say we're giving s the value of string,
and it's going to be hello.
Then we're saying we want to calculate the length, but we're going to pass
string into that function, but we're only going to give it a
reference of string. Now we're borrowing the
initial value of s with the ampersand. We're passing
in a reference to calculate length.
From there, we're going to take it in. We're still using it. It's still locally
scoped, and it's still only a reference. We're just borrowing that
version of s, and then we want to give it an output
so it passes it back into the main function, which is still accessible
because it's owned by the main function, and then we're going to print out the
value. So it says hello is five,
so five characters. That is how Rust is
working with its memory allocation. It passes
by borrowing and ownership. Completely different to
Go. Then we're going to have a little look at Python. Now,
Python's somewhat similar. It uses a garbage collector, and it
uses a technique called reference counting.
So a reference is a way for a program to access
an object in memory. Okay? So when a variable is assigned to an object,
a reference of that object is created. This is
how Python works. It also has a cyclic garbage collector,
which basically periodically checks for unreachable
or unreferenced objects in memory,
and that's how it then frees them up. So if it can't call it,
if it can't reach it, it will free it. And it also
has built-in memory management. This cyclic
garbage collector is sometimes a bit of a pain
in Python because it runs in the background, and when it
executes or tries to free up memory, it typically pauses the
execution of your code, at least as how I understand it to
be. That can then cause security vulnerabilities
and memory leaks. So it's maybe not the most efficient way,
but it certainly works well in Python. And then there's everyone's old trusty
Java. It's actually really similar to Go in a
lot more ways than people may think. It uses
a stack and a heap in a very similar way. It has a garbage
collector which manages the memory on the stack and with the
heap, and it uses a technique called mark and sweep,
very similar to what I just said. It will go around and try and find
all the referenced and all the allocated memory, and it makes a note
of it. Everything that it finds that is not referenced, it will
then sweep away and get rid of. A little bit different
to how Go works, because Go actually uses a stop-the-world technique. But that
is a conversation for another day.
So what are my top tips for effective memory management?
Use the defer keyword. It schedules a function to
execute later, typically when the surrounding function returns.
So always use it. This will help clean up your files, close connections,
and release any locks that are still there.
Number two, use the garbage collector wisely.
It's important to understand you have it. It's important to
know it's there and it works in your favor. But if
you are consistently creating and discarding lots
of short-lived objects, it's going to
make your program slower. Too much of something
good can sometimes lead to something bad, and in this case you can
run into that problem with Go. So be
very selective and be very pragmatic when you
are making decisions with your code. And number three,
monitor memory utilization on your system.
Use the tools that are available to you as a developer, such as the utilities
on a Mac, which is what I use. Use pprof,
use the built-in tools, use the Go toolchain to help you create a
more dynamic program. Just understand it
from a lower level and it will help you write good
code. In conclusion, there are a
few things that I want to say. Memory management
is complicated. It is by no means easy, and this talk
very much scratches the surface, but hopefully opens the doors
and ignites some inspiration for you to want to dig a little bit
deeper into the management of memory in Go.
The garbage collector handles most of it for you.
Like I said, it's important to know it's there. It's important to know it works
in your favor. Just understand it properly.
Thirdly, memory management is different across languages. You may be comfortable
in one language and it can be completely different in another, which ultimately
leads to a different style of coding and understanding. So there's more
to it than just knowing how to create a hello world.
Understanding how to manage it once you create more complex programs
is super important. And of course leaks are really bad.
So with that, before I close off, let's go see
if we can see pprof in action.
So I am going to come out of my slides and
open up VS Code. In VS Code, I have got a little
program that is running. We've got a
couple of imports, pretty standard. We've got pprof
also imported, but with the little underscore in front of it; if
you write Go, you'll know that means it's a blank import, pulled in
just for its side effects. We have a main function which is spinning up a
goroutine, and it's going to create a server for me
and it's on port 6060. Now that's really
interesting because port 6060 is where pprof conventionally runs.
We're going to print out hello world and we're going to have a WaitGroup.
Now this basically blocks and allows goroutines to complete at
this point. Then we're going to add one to the wait
group, and we're just going to call the leaky function.
Now, leaky function is a function I've got written at the bottom, which is
going to do some fun stuff for us. It's just going to
go around a for loop a lot of times.
So let's spin this up and see what we get.
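The demo program is roughly along these lines (a reconstruction; the name leakyFunction and the exact loop body are my best guess from the description):

```go
package main

import (
	"fmt"
	"net/http"
	_ "net/http/pprof" // blank import: registers the /debug/pprof handlers
	"sync"
	"time"
)

var leaked [][]byte

// leakyFunction keeps appending to a package-level slice so the heap
// profile has something obvious to show, and sleeps briefly each time
// so the program stays alive while memory grows.
func leakyFunction(wg *sync.WaitGroup) {
	defer wg.Done()
	for i := 0; i < 10_000_000; i++ {
		leaked = append(leaked, make([]byte, 1024))
		time.Sleep(time.Millisecond)
	}
}

func main() {
	// Serve pprof's handlers on port 6060 in a goroutine.
	go func() {
		http.ListenAndServe("localhost:6060", nil)
	}()

	fmt.Println("hello world")

	var wg sync.WaitGroup
	wg.Add(1)
	go leakyFunction(&wg)
	wg.Wait()
}
```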
So let's start the main function with go run main.go.
Now hopefully we should see an output here.
Cool. So we got hello world. Now let's go and check out pprof.
Let's run go tool pprof,
and we're going to give it the heap endpoint at localhost:6060, which is /debug/pprof/heap.
Right, we're in now. Just like you would do in
your terminal if you want to follow some log or something, we're going
to use the command top. This
is going to help show us where our memory is being utilized.
And straight away we can see leaky function right
at the top with a use rate of 91%.
So it's using a lot of memory and
the longer I leave it, the more it's going to fill up. And of course
we could see a lot more of it. Now, there is also another way that
we can see this. We can go into a browser and we can go and
get a full heap profile there as well. But that's
a completely separate kettle of fish. This is just a really quick way
in your terminal to see how much memory you're using and where it's being
used in your functions. So let's head back to the slides
and we will just finish off with that. I would
like to say a massive thank you to everybody for joining my talk today.
And if you have any questions, please do reach out to me on social media.
I would love to answer your questions. I'd love to have conversations around this.
I'm learning just like you. So please do connect
and thank you very much and goodbye.