Transcript
This transcript was autogenerated. To make changes, submit a PR.
Concurrency primitives of Golang: when to, and when not to, use some of them. Over the years I've seen a lot of code being
refactored every time there's a marginal change in
complexity because of some concurrency primitive, and I really wish
I had known some of the things which I will talk about.
Knowing that could actually save a lot of churn and
a lot of toil in going about this refactoring.
While speaking about concurrency, one of the important things to discuss
is what exactly is a race? A data
race? To be fairly honest, the most common definition
that comes to mind is reading and writing to the same memory
location. But before we get to that, there's an important concept
that needs to be covered, which is memory ordering.
I'll show a small snippet of code, and
I want you to guess what the output would be.
Here's a simple piece of code. There's a declaration of var a
and b of type int. There's a function f which assigns
a equal to one and b equal to two. There's a function g which does
print b and print a. You can
ignore the fact that there will be a data race in here,
because the interesting bit is it is possible
that g prints two first and then zero.
Pay close attention to this. How can it possibly
print two and zero? Print b observed
the value of b as two, whereas print
a, which happens after print b, did not yet
observe the value of a equal to one, which was
executed before b equal to two.
Why does this happen? It is because compilers, at
compile time or at runtime, may
reorder instructions.
So it's not guaranteed that what you see as a equal to one
and b equal to two would, outside the
scope of that function, be observed in the same order,
and this is a memory ordering problem. And this leads
into everything else, where we now need to start serializing,
because what we're trying to do is solve this unpredictability.
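In Go, the snippet looks roughly like this (it is essentially the classic example from the Go memory model document):

```go
package main

// With no synchronization, the compiler and CPU are free to reorder
// the two stores in f, so g may observe b == 2 while a is still 0.
var a, b int

func f() {
	a = 1
	b = 2
}

func g() {
	print(b) // may observe 2...
	print(a) // ...while a is still observed as 0
}

func main() {
	go f()
	g()
}
```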
Here's a sample piece of code. It's pseudocode:
to some it may look like Python, to others it may look like Go.
We have a bunch of URLs. There is an
array of responses where we will gather all the responses that
we get by fetching the URLs. There's a small counter of total
bytes received. Simply put: for i in URLs,
fetch the URL using a pool. So we're trying to restrict
the number of parallel connections that can be made outbound.
Once we receive the data, we want to write it to the response array
and also update total bytes received. The control
must block until all the URLs are fetched
and return the responses altogether. The total
bytes received are also being sent to telemetry, but that is not
relevant. There are four aspects of
concurrency control which are happening here.
First is the handoff. Once
this parallelism is done (and I'm using the word parallel
liberally here, because parallel and
concurrent are confusingly interchanged), the control must pass
on to the main routine, and that's a handoff. So somebody
has to wait to see if all the workers
are done working on what they have done, and only then it must
proceed forward. There's a concept of data sharing here.
Each of the URL fetches we want to do in a
non-blocking manner, and each should be able to send
its data back to the response array.
There's serialization happening as well, because we do
not want all the URLs to be executed together.
There is a control mechanism which will allow only
a certain number of URLs to be hit, and that's the property of
a pool. Then there's also a data consistency problem here, because
we want to tally the total bytes received; but there
are multiple threads, multiple goroutines,
or multiple processes (depending on the
language implementation) that may be performing this, and the value of total bytes received
must increase collectively. That's the data consistency
part as well. So to highlight these again, what are the four challenges?
A control handoff, which is passing on
the control from a set of worker
routines to the main routine. Serialization: we want
the ability to control what gets executed
at what point of time and what gets to wait, and a control
to block and unblock. Data sharing: while work
may be happening in parallel or in a non-blocking,
asynchronous manner, the data must be receivable
by some other process as well. And data consistency at
all points of time: if we have agreed to increment the
integer, then everybody should see the fresh value of the
incremented integer. This is also
a derivative of saying that while multiple processes are coordinating with
each other, they will usually have a consensus problem. To solve
a consensus problem, the control needs
to be offloaded to somebody outside the system.
And this is exactly why we need synchronization primitives.
I'm going to call it a marshal. Not to be confused with marshalling JSON;
more like a game marshal. What kinds of
marshals exist to achieve this synchronization?
There is sync.WaitGroup. There's sync.Mutex,
sync/atomic, channels, and lockless data structures.
I'll not go into the depths of
defining what these are, because that is pretty widely available.
I want to narrow this down to which semantics can be used
for which aspect of the problem statement we discussed
just a few slides back: control handoff,
serialization, data sharing, and data consistency. The first
one is sync.WaitGroup. Fairly straightforward; you must have seen
it all over Golang blogs, et cetera. Pretty straightforward to use,
but what is it used for? It's used precisely and only
for control handoff. What is it not used for? It's
not used for serialization. What I mean by that is that using wait
groups you are, more often than not, only
waiting for control to reach the main routine while some other parallel
routines are doing their job; as the name suggests, it's a wait group.
It's not used for data sharing:
there is no way you can share data. And it's not used for data consistency
either. Using this definition, as I go through
the slides there'll always be a fun section where we talk about some
fun mistakes. These fun mistakes are usually
things that we would miss out on, and they
give us an insight into how these things work at depth.
Now here's a simple program. What is wrong here? To walk
you through the code: there's a func main which declares a wait
group. There is a for loop from zero to 10 with a wait group Add,
a simple construct. There's a goroutine being spawned which calls hello;
inside hello, we do a print, and then we just wait.
A classic example of control handoff where we are waiting.
What could possibly go wrong here? What goes
wrong is that when you run this code,
it runs into the "all goroutines are asleep" problem.
It runs into a deadlock. But a wait group
is supposed to solve a concurrency problem, not create another problem.
And the reason is that even concurrency semantics need to
respect the principles of the language. Now what is the principle
at play here? I'm
passing the wait group by value and not as
a reference. So the goroutines see a copy of
that synchronization primitive and call
Done on that copy, and hence it runs into a deadlock.
So even the synchronization primitives need
to cater to the principles of the language.
The next one is sync.Mutex. What is sync.Mutex
used for? It's used for serialization.
By using mutexes, also called locks,
we have the ability to control which routine should work and which routine
should not. When I say which, you're not identifying
a routine in particular, but you have the ability to write code so that
only one gets executed while the other is waiting
for a read or a write operation to happen. So the ability to serialize
operations is what you get, because that's what locks do effectively.
And it also allows data consistency,
because while inside a lock you can increment a
value so that another reader cannot read it mid-update. But what
doesn't it do? It doesn't do a control handoff, because
it's used as a mutual exclusion
mechanism between two processes, and usually not for a handoff, which is where
a wait group comes into the picture. And it's also not used for data sharing,
because there is no construct of it carrying its own data.
What are the fun mistakes around mutexes?
A simple counter which has an embedded mutex inside it,
a pretty well known construct in Golang; it also has an
integer n. There are two methods. There's an Add method which
acquires a lock and increments n by the integer passed. There's another
Value method which also acquires the lock and
returns n. So effectively there should
not be any race condition here, because we are judiciously using locks and unlocks.
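A reconstruction of that counter (names assumed); note which method is bound to the pointer and which to the value:

```go
package main

import "sync"

// Counter embeds the mutex alongside the integer it guards.
type Counter struct {
	mu sync.Mutex
	n  int
}

// Add is bound to the pointer, so it locks the one real mutex.
func (c *Counter) Add(delta int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n += delta
}

// Value is bound to the value receiver: every call copies the whole
// Counter, mutex included, and that copy is itself an unguarded read
// of n. Changing the receiver to *Counter removes the race.
func (c Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}
```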
Now what can go wrong here? When I
run this code, I surprisingly run into a race condition.
Why is that? Because, as we just discussed, even synchronization
primitives need to respect the principles of the language.
So what's happening here? What's happening is that
Value is not a method which has been bound to the pointer receiver.
What that means is that when Value is invoked,
the state of the counter is copied, and so is the
sync.Mutex. And while it is getting copied, it
reads n even though somebody else may still hold the lock
on that n. Effectively, it doesn't even matter
what you have inside the body of the method: just accessing Value,
because the process of calling Value causes
a copy to happen, and by virtue of that a
read to happen on n, results in a race condition.
So even if I were to replace that entire body of the
method with a return 1, it would be the exact same
race condition. How do we solve this? We just
make sure that we actually use a pointer receiver here:
the Value method should be bound to the pointer to
Counter, not to the value. It's a fair conclusion
to say that while resolving a consensus problem,
the marshals themselves are not immune to the consensus
problem. They themselves have to adhere to all of these principles.
Underneath all these mutexes and wait groups is the
third marshal, which is the bedrock of all synchronization:
sync/atomic. Now surprisingly, atomic has very
limited applicability directly,
and most of the time we end up using the constructs
built on top of it. But it is the underlying construct
used behind most of these other, higher-order primitives.
Now, what is an atomic used for?
Data consistency. That's all, through and through, nothing else.
There is a value; I want to atomically change that value,
update an integer, and do nothing else.
It is not used for control handoff. It cannot be used for serialization.
I mean, there can be derivatives of it, but by direct virtue of its own
properties it doesn't do this itself.
A wait group is a derivative of this, because using atomics you
can create such higher-order primitives as well,
like the lock-free data structures we'll discuss later. And it doesn't
do data sharing, because it doesn't carry
data along with it like channels do.
Now here's an application of atomically performing
a load and store. There's a simple map of type
map[string]string. There's an atomic.Value; via m.Store
we basically save a map into it. Then there's a read function and there's an
insert function. The read function just atomically does
an m.Load and asserts the value to the
map type. Insert, however, tries to update
this value, so it performs an m.Store.
This is a simple piece of code,
copied from the Golang documentation.
So what can possibly go wrong here? What can go wrong
is that if multiple writers
try to invoke the insert method, the construct is not immune
to it. Now what we do is a
mix and match: while we are discussing atomics, we have started using
other constructs to solve our problems. So we introduce
a mutex here, and on insert we
guarantee serialization by performing a mutex
lock and then deferring the unlock. What this does is that if multiple
routines are trying to invoke the insert method, only
one of them runs at a time, because mutexes are great at serialization.
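Putting the two together, roughly (the Registry wrapper is my own naming; the Load/Store shape follows the copy-on-write example in the sync/atomic docs):

```go
package main

import (
	"sync"
	"sync/atomic"
)

// Registry is a copy-on-write map: readers use only the atomic Load,
// while writers serialize through a mutex, copy the map, and Store
// the new one.
type Registry struct {
	mu sync.Mutex   // serializes writers
	v  atomic.Value // holds a map[string]string
}

func NewRegistry() *Registry {
	r := &Registry{}
	r.v.Store(make(map[string]string))
	return r
}

// Read is lock-free: a single atomic load.
func (r *Registry) Read(key string) string {
	m := r.v.Load().(map[string]string)
	return m[key]
}

// Insert holds the mutex so that concurrent writers cannot interleave
// their copy-then-store sequences and lose updates.
func (r *Registry) Insert(key, val string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	old := r.v.Load().(map[string]string)
	next := make(map[string]string, len(old)+1)
	for k, v := range old {
		next[k] = v
	}
	next[key] = val
	r.v.Store(next)
}
```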
We just agreed on that, and now we simply
perform this under the mutex lock. Next up,
what would you expect the output to be here? There's
a func a and there's a main method. Inside the main method
there are two integers, x and y. There is
a Println which is basically calling the a
function, and we pass in the pointer to
the integer. In the body of a, it accepts
an integer pointer and returns a channel. Just before it returns,
it spawns a goroutine which
does an atomic increment of the integer value.
There are two variations of this. In one variation we
simply call log.Println, call the
two functions inline, and read the channel values right
there. In the other variation we store the channels into two
variables, av and bv: a is
called twice on one line, and the second line
reads from both. They don't look entirely different if
you think about it, because all we have done in the second
variation is take the initialization one step ahead.
So it's two stage. Now it's not just one stage. What would you expect
the output to be? It's interesting that when I
run this code the first execution
gives me a two and a four, whereas the second
execution gives me a four and a four. Now, the four and four
part is understandable. Why? Because it's the same atomic
operation. And remember, atomic operations were about data
consistency. So if it's the same integer
which has been incremented twice and read on the channel it will
obviously be four. But why did the two happen? To answer
this, let's add some log statements. Just
after the atomic add I've added a log.Println,
and there's a time.Sleep of 2 seconds. So
before the value is returned, the goroutine actually sleeps for 2 seconds.
Now when we run this, the first execution
prints one, one and two, four. The second also prints one, one and
four, four. But interestingly, if you observe the timestamps,
in the first execution the first log happens at the 16th second, then
it waits 2 seconds and prints the next one, and waits another
2 seconds and returns two and four. Whereas in the second implementation,
at the 20th second both of them print one, and then after
2 seconds (because both waited in parallel) it returns four and four.
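A reconstruction of the two variations (the exact snippet isn't shown in the transcript, so the names and details here are assumptions):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// a increments the shared integer and returns a channel carrying the
// value observed after the increment.
func a(p *int32) <-chan int32 {
	ch := make(chan int32, 1)
	go func() {
		atomic.AddInt32(p, 2)
		ch <- atomic.LoadInt32(p)
	}()
	return ch
}

func main() {
	var x int32
	// Variation 1: function arguments are evaluated serially, left to
	// right, and each receive blocks until its goroutine has run, so
	// the increments cannot overlap.
	fmt.Println(<-a(&x), <-a(&x)) // prints 2 4

	var y int32
	// Variation 2: both goroutines are started before either receive,
	// so both increments can complete before either load; this often
	// prints 4 4 (the interleaving is not deterministic).
	av, bv := a(&y), a(&y)
	fmt.Println(<-av, <-bv)
}
```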
So effectively what's happening is that the arguments of
the function call are evaluated serially. That's a classic
mistake we make when dealing with this. A simple way to solve
it is to use the second construct, which is actually concurrent,
and not the first one, because there everything happens one after the other. The first form
can also result in a situation where you
get blocked on the second value because the first one hasn't completed,
clearly evident in this code here, where one value came as
two and the other came as four. We also discussed that
atomics and mutexes can both be used for data
consistency. So a common question is: are they interchangeable?
Atomics, as we said, are the bedrock on which
the others are built. So, yes and no:
there's only one case where you can actually call
them interchangeable, performing the same role. Assume there was
an integer operation where we had to increment something. I may use
atomic.AddInt64, or I may actually take a lock, apply the
increment, and unlock. Both will give me the same behavior at the end of the day.
And this is the only time where we say that, yes, there may be
overlapping concerns. But interestingly, atomics are actually way faster,
so you may be able to do a larger number of atomic operations compared with
locks, because atomics are sent directly to the CPU and
there are no language- or runtime-managed locks
being performed. So you would actually see a drastic performance gain. But all
over Golang we've always read and seen: do not communicate
by sharing memory; instead, share memory by communicating. Which takes
us to channels. Now, what are channels used for? Where
atomics are only and primarily used for data consistency,
channels are used for everything other than that: control
handoff, serialization, data sharing. It's surprising how much you can
start doing with channels; data consistency is the only thing
they're not used for. Now, because channels are so widely used,
I'm going to cover a lot of these fun mistakes.
Fun for me, maybe not fun for others. Look at
this piece of code. What could be going wrong here?
There's a func request which returns an int. There's a channel;
there is an i; it loops from zero to
five, incrementing i++, and
writes the value to the channel from a goroutine. What could go wrong
here? What can go wrong is that
those goroutines are waiting to write on
a channel. If the receiver, the caller of
request, only reads it once, there are multiple
goroutines which are just blocked forever,
because they are waiting for the channel to clear up so they can write.
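A sketch of that leaky shape (details assumed):

```go
package main

import "fmt"

// request fans out to five goroutines, but the caller receives only
// once, so four goroutines stay blocked forever on the unbuffered
// channel: a goroutine leak.
func request() int {
	ch := make(chan int)
	for i := 0; i < 5; i++ {
		go func(v int) { ch <- v }(i)
	}
	return <-ch // only one send ever completes
}

func main() {
	fmt.Println(request())
}
```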
To solve for this, we have constructs. What we do is
wrap the send inside a select clause which tries to write to
the channel, and if it can't, it falls through to the default,
which is empty. Surprisingly, this is also not the correct
way, because this can run into a unique condition where the
last line can block forever. It's possible that
even before the channel is read from
outside of request, the goroutine tried
writing, found no receiver, and fell through to
the default section. So now the read of the return channel
waits forever. One of the most interesting questions
that I always face when
I'm writing code as well is: when, and who, should close an
open channel? It's one of those things that I never thought about, and it always teaches
me the hard way. A simple piece of code again: it's a for loop,
and inside the for loop a value is produced and sent to a channel,
so you can think of it like a producer. And underneath there
is a receiver which basically receives the value and appends it to an array.
Once we are done, we print. When I print, I find that
one value is missing. Now, this would happen if
my program exited right after this loop.
Because wg.Wait was a wait
group on the producers, once the producers were
done producing, the channel was closed immediately;
it's possible that the receiver had not yet
performed the receive operation when the function exited and
the program exited. And hence that's the reason why I did not see
one of the values. So how do I guarantee this: that if ten were
sent, ten were received? In the
very first implementation, what we do is
detach the close responsibility from the sender.
What we do is basically make the wg.Wait happen
in another goroutine, and that is the one which closes the channel.
The reading of the channel now
happens in the main routine, in
which I range over the channel; even after it is closed
by the goroutine, the range will still
drain whatever remains in the channel. This is one
way of implementing it. Another way, which is actually
far more prevalent, is when producers
and receivers are two different goroutines which need
to communicate. And this is where we introduce the done channel. So you're
producing and the consumer is receiving, but at
some point of time the control flow must not leave
the parent function. Hence we have two
wait constructs: we wait via the wait
group, we close the channel, and then we wait on
another channel, which is triggered after
the receivers are done receiving. So those are the two
constructs. What else can go wrong with channels?
Let's look at this long-running piece of code. It accepts
a messages channel. What we're trying to achieve here is to read from
messages, and if any of those reads takes more than
a minute, just return and abort. It's
fairly straightforward code, and it compiles. But what's going wrong here?
What's going wrong is that this can still leak memory, because inside each
loop iteration we are creating a new time.After,
and this keeps on getting created, created, created.
All we wanted to do was wait for
a minute. A better way to deal with this is to
initialize a timer once, save the allocation, and keep reusing it:
keep selecting on the timer's channel, and if it fires after a minute,
return; otherwise do the fmt.Println. Once it's
done with the select, just do a Reset. What this does is avoid
creating those channels and timers in
every single iteration. Now, all of these were about two parties
competing for a resource;
every such construct was about that. One of the best ways to
resolve a conflict is to avoid it from happening. Now this is where
we introduce the notion of lockless data structures. It's a study
in itself on how lockless data structures are implemented.
The reason I stress lockless data structures, specifically, is that generalized
lock-free algorithms are very hard to implement, while designing lock-free
data structures is relatively easier. My slides have a bunch of
these links which are from Microsoft research and other
research papers which talk about building these common data structures which
are actually lockless. And most of them use atomic underneath. Because as
we discussed earlier, atomic is almost the bedrock of all concurrency control.
Many a time, what also happens is that a simple
solution is actually way better than a smart solution. If you look
at this piece of code, there's an input array of integers
and there's an output array of integers. We need to run over the input array,
spawn a goroutine which
does something, get the output of it, and append it to the
output array. The obvious way to do this would be using a lock,
because we're trying to serialize access to the result output
array. But do we really need a lock here?
A simpler way would just be to use a pre-sized array and
positional indexes. Arrays, as long as each goroutine
writes only to its own position, can actually live without
a lock. We simply run
with an index; once we get the output, we write to the
array at that positional index, and it does
the job. So many times we may not even need a
very sophisticated concurrency control mechanism, because this itself will do the
job. Just to recap, what is the
need for concurrency constructs? Because we want to solve
the unpredictability problem; we want control.
There are two kinds: there are control constructs,
which are used for control handoff, and there are data constructs, which are
used for data sharing and data consistency. Constructs also
need to adhere to language principles. We saw how wait groups,
counters, and channels as well can actually misbehave
if we're not careful about how we apply them. And now
we probably also understand which one to use when. To recap, if we
were to look at the same problem again,
we are probably better equipped to handle it.
Thank you. I'm Piyush, CTO and founder at
Last9. We are an observability company.