Transcript
This transcript was autogenerated. To make changes, submit a PR.
What if you could work with some of the world's most innovative companies,
all from the comfort of a remote workplace? Andela has matched
thousands of technologists across the globe to their next career
adventure. We're empowering new talent worldwide,
from Sao Paulo to Egypt and Lagos to Warsaw.
Now the future of work is yours to create.
Anytime, anywhere. The world is at your fingertips.
This is Andela.
Welcome to this session: slices, maps and channels in Go. We are going to talk about the most commonly used data structures in Go: slices, maps and channels. We probably all know how to use them, but not necessarily how they work under the hood. In this talk, we are going to follow an experimental approach to analyze how these structures behave in memory when we create, modify or access them.
Okay, we are going to need some materials for this class: the scalpel, the microscope, and the subject. Let's start with the slices. What is our scalpel for a slice? We are going to use a function called scalpel that receives a slice. In this case, I'm going to use a slice of integers because it's simpler. And I'm going to use unsafe to get access to the memory address of the slice and store that data in a structure.
Then I have a microscope function. The microscope function
is going to show me the data in a readable way to analyze
what is going on. And the subject in this case
is a slice. A slice is nothing more than an array, or rather a view over one that one or more slices can share. We are going to see what I mean. This is the structure of a slice. The array field is a pointer to an array in memory, just a chunk of memory in the heap; in this case, a chunk of memory that stores integers. Then I have the length and the capacity of the slice.
What happens when I create a new slice of integers? In this case, I'm creating an empty slice of integers. The memory address is not especially important here, because we are not storing anything yet. The slice length and the slice capacity are zero, and the stored data is just empty. Okay, what happens if I append a new integer? The slice code is going to reserve a chunk of memory to store that integer and store the integer value there, in this case a one. The slice length is one and the slice capacity is one.
Let's see what happens when I add more data. If I add four more elements, the memory address is going to change. Why? It changes because I don't have enough space in the originally reserved memory. If I look at the slice capacity here: when I added one element, the slice capacity was one. So I'm not able to store more data in that array, because the slice capacity tells you how much data that array in memory can hold.
In this case, it's one. So every time we reach the capacity limit of the slice, a resize happens. A resize is just reserving another, bigger chunk of memory in the heap and migrating all the data to this new chunk. In this case we added five elements, and in that process the slice has been growing.
The slice is not going to grow one element at a time. The slice capacity normally grows by more than one element, because you don't want to resize the slice on every single insertion. So in this case, at some point I added enough data to grow the slice, and the slice grew to a capacity of eight. That means I reserved memory for eight elements, but I'm only using five of them. So the stored data in memory is 1, 2, 3, 4, 5, 0, 0, 0, because I have eight positions in memory: space for eight integers.
Okay, what happens if I create a super slice? A super slice is just a slice of a slice. In this case, I'm creating a super slice from position one to position four.
What is interesting here is the memory address. You can see that the memory address here ends in 40, and the memory address here ends in 48. That difference of eight is because we are using integers, and on my architecture that means int64, which is eight bytes. It's actually the same chunk of memory, but eight bytes further on, because I'm starting at position one; I'm skipping position zero of the original array, of the original slice. The slice capacity is also interesting. It's exactly the same chunk of memory, but the slice capacity is seven, where before it was eight. I'm not able to use the whole array, just the rest of it: the memory that is reserved from the address I'm using onwards. Okay, let's see what happens if I modify a super slice. If I modify, for example, position zero of the super slice, which is position one of the original slice, what is going to happen? Does the original slice also get modified? It does: it gets modified at position one, and the super slice is modified at position zero, just as I wrote it in the code.
Okay, what happens if I append something? That is even more interesting. I'm appending the value six to the super slice. But the super slice went from position one to position four: it's just three elements, and I'm adding a fourth element at the end. And the end of the super slice is not the end of the original slice. So from the original array's point of view we are overwriting its fifth position, while from the super slice's point of view we are appending. The behavior looks different from each perspective, but as you can see there, the value six affects both the super slice and the original slice. Even more interesting: what if I append more data to the original slice and then modify something in the super slice? Because I added enough data to trigger a resize of the original slice, the two slices are no longer sharing the same array; they point to different chunks of memory. That creates a disconnection between both slices. So a super slice can be considered a window into another slice, but only as long as neither the original slice nor the super slice has been resized. That is probably a very unexpected behavior, so I wouldn't try to rely on it for anything.
Okay, I have here the code that I used to run this experiment. There is a link at the end of the talk with all the code, so if you want to reproduce this experiment, you can; just check out the link at the end of the talk. Okay,
let's talk about maps. The scalpel for maps is going to be pretty much the same: I'm just using unsafe to access the memory and store it in a structure. In this case, as an example, I'm using a map with integer keys and integer values because it's simpler. The microscope is more complex, because the data structure behind maps is much more complex than the one behind slices. And the subject is the map. A map is just a set of map metadata plus a set of buckets, which is where the real data, the data stored in the map, actually lives.
Okay, what is the structure that stores a map? Well, we have the count, which is the number of elements in the map. We have the flags, which are not important for this talk. We have the B value, which grows as the map resizes; it's related to the number of buckets. Not super important, but yes, it is related to the number of buckets. The number of overflows gives you an approximation of the number of overflow buckets. The hash0 is a completely random number that is generated whenever you create a new map, and this random number is used when hashing the keys to decide where each key is stored in the map. This is interesting because it gives maps a particular property: it is not predictable which bucket a key is going to fall into. It's not predictable before this random number is created, before you instantiate the map. So you can't rely on the order of a map or things like that; the iteration order of a map is simply not reliable in Go, and one of the reasons is this hash0.
The buckets field is just a pointer to a chunk of memory that stores a set of buckets; we are going to see what a bucket is right away. Then we have the old buckets, another pointer to another chunk of memory, again storing buckets. And nevacuate is the number of evacuated buckets; we are going to talk more about that later. Plus some extra metadata that is not super important for this talk.
Then we have the bucket struct. That is the structure stored in the heap that holds the real data. Each bucket contains eight elements, and it has a top hash, which is just a set of fragments of the hashes of the keys stored in the bucket; not especially important for this talk either. Then we have the keys and the values: two arrays of eight elements storing each key and each value, where each position in the keys corresponds to the same position in the values. And then we have the overflow pointer, another pointer that points to another memory address containing a bucket, an overflow bucket. We are going to see what an overflow bucket is later.
Well, let's start by creating a map, an empty map. In this case, I'm creating an empty map with integer keys and integer values. The map size, because the map is empty, is going to be zero; the flags are zero; the B is zero. What does B zero mean? It means that there's only one bucket. The hash0 is a random number, and the number of overflow buckets is zero. The buckets pointer points to a chunk of memory storing an empty bucket. An empty bucket basically means the top hash is all zeros, all the keys and values are zero, and the overflow pointer is null. In this case, the old buckets pointer is not pointing anywhere, and the number of evacuated buckets is zero for now.
Okay, what happens if I add one element to this map? If I, for example, store the value ten under the key one, the map size is now one, and everything else stays the same except for the bucket. If I go to the bucket, I see that I stored one key, the key one, and one value, the value ten, both at the same position of the keys and values arrays. The top hash is just a fragment of the hash of the key, but it's not super important. Okay, let's keep going. If I add more data,
in this case, I'm adding eight more elements. Eventually this is going to trigger a resize. Now the map size is nine and the B is one, because this number of elements has triggered a resize. When a resize happens, the runtime reserves another chunk of memory and migrates all the data from the original bucket to the new set of buckets in memory. Normally it doubles the number of buckets each time. And in this migration process, it's going to evacuate the old bucket. That means taking all the keys and values and storing them in the corresponding bucket in the new set of buckets, which implies recalculating the hash of each key to decide where it falls in the new buckets. Okay. And the number of evacuated buckets is going to be one in this case, because the old bucket was evacuated and moved to the new buckets, as you can see here.
Note that the elements don't need to sit in the same positions as in the old set of buckets. Now, what is an overflow bucket? The map gets resized when it reaches a certain threshold, but it can happen that you have a map with some data, not enough to justify a resize, yet one of the buckets is completely full, and then you try to add something that falls into that bucket. Instead of triggering a resize, the simplest approach, and the better one to optimize the usage of the map, is to just create an overflow bucket: reserve a chunk of memory to store a new bucket, store the new key there, and link it to the existing bucket. In this case, as you can see, bucket one has an overflow pointer that points to another bucket at another position in memory, which just holds more information belonging to bucket one. And you can also see that the number of overflow buckets is now one. What happens when you have big resizes? When a big resize happens, Go uses these buckets and old buckets fields, which point to memory addresses. It reserves a new set of buckets, doubling the number of buckets, and migrates the data to those new buckets. The new set becomes buckets, and the previously existing set becomes old buckets.
And it's not going to migrate all the data in one go, because that would take extra time and could have a performance impact. Instead, it starts migrating the data gradually, with each operation. This way you don't have one big blocking pause when you need to resize a map; the resize doesn't block that much, you just get some performance degradation for a certain time. And once everything is migrated, the map goes back to normal. The old buckets field and the number of evacuated buckets are used for this migration process. Once all the buckets in old buckets have been evacuated, the system just releases that memory.
Okay, here is the code if you want to reproduce this; you are going to find it at the end of the talk. Now, channels. For channels, I'm going to use a channel of int32. Again, I'm just using unsafe to access the memory and store it in a structure. The microscope is more or less the same: it takes that structure and prints it in a readable way.
The subject in this case is a channel. More specifically, I'm going to use a buffered channel, because it's more interesting, and you can infer how an unbuffered channel works from it: it basically behaves like a channel with a buffer of size zero, with exactly the same mechanics. A channel has a qcount, which is the number of elements stored in the buffer; a dataqsiz, which is the number of elements the buffer can hold; and the buffer itself, which is just a chunk of memory. In this case, it's a chunk of four int32 positions, because it's a buffered channel of size four and the channel type is int32. The element size is the size of each element in the buffer. The closed field says whether the channel is open or closed. The element type is a pointer to the representation of the type stored in the channel buffer, in this case int32. The send index and the receive index are two indexes that point to the position in the buffer where the next send will write and where the next receive will read. The receive queue and the send queue are two wait queues. A wait queue is basically a place where you can store goroutines that are waiting for something. So in this case, the receive queue and the send queue are where goroutines wait for the channel to have data, or to have space for data.
Okay, let's see an example. If I create a channel of int32 with four positions, with a buffer of four, what I'm going to have is a qcount of zero and a dataqsiz of four, because I have space for four elements. The element size is going to be four, because an int32 is four bytes. The buffer is going to contain four zeros for now, because it's just a four-position buffer of int32s with no elements yet. Okay, what happens if I add one element to the channel? If I add one element, the qcount is going to increase to one, because I'm adding a five. That five is stored in the buffer where the send index was pointing, which was the first position in the buffer. So I store five in the first position of the buffer and increment the send index to the next position.
What happens if I add more data than the buffer can store? In this case, I'm adding 4, 3, 2, 1. The four goes in at the send index position, which was the second position; the send index is incremented, then I add the three, then the two, and so on. But when I try to add the one, the qcount is already four, so there's not enough space in the buffer for a new element. So what the goroutine does is store itself in the send queue and park itself, to wait until there is some space in the buffer. What happens if somebody reads something? They are going to receive that five, because the read takes the data at the receive index position, which was the first element in the buffer. So it reads that data, returns it to the requester, and increments the receive index. But the qcount hasn't changed. It hasn't changed because, once we read that data, we see that the send queue has somebody waiting there. So we wake up that goroutine, and that goroutine automatically inserts the one, the value it was waiting to insert, at the position where the send index was. And that's the reason why we still have a four in the qcount; the first element in the buffer is now the one, and the receive index and the send index have each moved one position.
Okay, what if I read more data from the channel? Now it's going to decrease the qcount to three, because nothing is adding more data and nobody is waiting to send. It reads where the receive index was, which is the second position, so it returns a four, which is what was in the second position, and it increments the receive index. And that's it. What happens if I read more data than I have in the buffer? Well, it starts reading data: it reads the three, the two, the one. And when it reaches the point where there's no more data, the goroutine that is trying to get data from the empty channel adds itself to the receive queue and parks itself, waiting for new data to come into the channel.
If somebody sends something to the channel, it's going to wake up that goroutine. But what happens if I close the channel? If I close the channel, everybody that is waiting to receive data from the channel is going to receive the zero value. The close notification goes through the receive queue, reaching every receiver waiting there, and the closed field of the channel is set to one. And that's it. Here is the code if you want to try it; it's interesting.
Okay, some references. If you want to know more about slices, maps and channels, try reading the Go runtime code. It's not super straightforward, but it's not especially complicated either; you can get more or less what is going on. Actually, I wrote this talk just by reading that code. If you want to reproduce these experiments, you can go to my GitHub repo, dissecting go, and just reproduce them; it's a lot of fun. And now, some conclusions.
Well, understanding the building blocks of the language is going to help you understand the implications those building blocks come with. Sometimes these building blocks carry trade-offs, decisions that were made by the Go team, and if you understand why those decisions are there and how these structures really behave, you can use them better.
Also, take into consideration that there are some unexpected behaviors that can be surprising; the slices one is really interesting. And this knowledge probably won't be needed for anything you do in your day-to-day work, but it can help in very specific situations. So that's it. Thank you.