Conf42 Golang 2021 - Online

Concurrency primitives of Golang - how to use and how not to

Abstract

Every marginally sophisticated piece of Golang code that I write needs refactoring to fix races and deadlocks. The concurrency primitives like atomic, sync, channels, and WaitGroups each have their uses and their failure modes. A lack of understanding of them ends up causing a lot of rewrites. This talk aims to solve that.

I have seen more Go code shipped as glue programs and binaries than as actual web servers, usually intended to do the heavy lifting of data transfers quickly and reliably. With such clinical needs comes a need to handle concurrency and parallelism. And each time more than one worker is doing a job, there will be contention and starvation. A lack of useful design-principle guides ends up with engineers writing non-performant, race-condition-prone code, or refactoring it later.

To provide an example, some of the things I wish I had known:

  • the real difference of when to use a channel, a WaitGroup, an atomic, or a mutex
  • why one shouldn't call wg.Add() inside a goroutine
  • how to solve the problem of multiple listeners waiting on an event
  • how atomic integer operations can synchronize multiple workers signaling completion of work

The talk is not limited to the above scenarios. It intends to cover a few of these constructs to reduce the refactoring that Golang engineers have to go through while also building a better understanding of distributed systems’ design patterns.

Summary

  • Concurrency primitives of Golang: when to use some of them, and when not to. What exactly is a race? A data race? There are four aspects of concurrency control at play: control handoff, serialization, data sharing, and data consistency.
  • The next one is sync.Mutex, used for serialization: mutexes are also called locks. Even synchronization primitives need to respect the principles of the language; while resolving a consensus problem, the marshals themselves are not immune to the consensus problem.
  • Underneath all these mutexes and wait groups is the third marshal, which is the bedrock of all synchronization. atomic has very limited direct applicability; it is not used for data sharing because it doesn't carry data the way channels do, but using atomics you can create higher-order primitives.
  • Channels are used for everything other than data consistency: control handoff, serialization, data sharing. It's surprising how much you can actually start doing with channels. I'm going to cover a lot of these fun mistakes.
  • Lockless data structures: generalized lock-free algorithms are very hard to implement, but designing lock-free data structures is relatively easier. And a simple solution is often way better than a smart solution.

Transcript

This transcript was autogenerated.
Concurrency primitives of Golang, and when to and when not to use some of them. Over the years I've seen a lot of code being refactored every time there's a marginal change in complexity because of some concurrency primitive, and I really wish I had known some of the things I'm going to talk about; that could have saved a lot of churn and a lot of toil of going about refactoring this. While speaking about concurrency, one of the important things to discuss is: what exactly is a race? A data race? To be fairly honest, the most common definition that comes to mind is reading from and writing to the same memory location. But before we get to that, there's an important concept that needs to be covered, which is memory ordering. I'll show a small snippet of code, and I want you to guess what the output would be. Here's a simple piece of code. There's a declaration of var a and b, both of type int. There's a function f that assigns a = 1 and then b = 2. There's a function g which prints b and then prints a. You can ignore the fact that there will be a data race here, because the interesting bit is this: it is possible that g prints 2 first and then 0. Pay slight attention to this. How can it print 2 and then 0? Print b observed the value of b as 2, whereas print a, which happens after print b, did not yet observe the value a = 1, even though that assignment was executed before b = 2. Why does this happen? It is because compilers, at compile time or at runtime, may reorder instructions. So it's not guaranteed that what you see as a = 1 and then b = 2 inside the function will be observed in the same order outside the scope of that function, and this is a memory ordering problem. This leads into everything else, where we need to start serializing, because what we're trying to do is solve this unpredictability.
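A minimal reconstruction of that snippet, assuming the same shape as the example in the Go memory model document (the exact slide code is not in the transcript):

```go
package main

import "fmt"

var a, b int

func f() {
	a = 1
	b = 2
}

func g() {
	// With no synchronization, g may observe b == 2
	// while still observing a == 0: the compiler and the
	// CPU are free to reorder or delay the writes in f.
	fmt.Println(b)
	fmt.Println(a)
}

func main() {
	go f()
	g()
}
```

Without a synchronizing operation between f and g, nothing guarantees that the write to a becomes visible before the write to b.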
Here's a sample piece of code. It's pseudocode: to some it may look like Python, to others it may look like Go. We have a bunch of URLs. There is an array of responses where we will gather all the responses that we get by fetching the URLs. There's a small counter of total bytes received. Simply put: for i in URLs, fetch the URL using a pool, so we're restricting the number of parallel outbound connections that can be made. Once we receive the data, we want to write it to the response array and also update the total bytes received. The control must block until all the URLs are fetched, and then return the responses altogether. The total bytes received are also being sent to telemetry, but that is not relevant here. There are four aspects of concurrency control happening here. First is the handoff. The handoff is: once this parallelism is done (I'm using the word parallel liberally here, because parallel and concurrent are confusingly interchanged), the control must pass on to the main routine, and that's a handoff. Somebody has to wait to see whether all the workers are done with their jobs, and only then must it proceed forward. There's a concept of data sharing here: each of the URL fetches, which we want to do in a non-blocking manner, should be able to send its data back into the response array. There's serialization happening as well, because we do not want all the URLs to be fetched at once. There is a control mechanism which allows only a certain number of URLs to be hit, and that's the property of a pool. Then there's also a data consistency problem here, because we want to count the total bytes received, but there are multiple threads, multiple goroutines, or multiple processes (depending on the language implementation) that may be performing this, and the value of total bytes received must increase collectively. That's the data consistency part. So to highlight these again, what are the four challenges? Control handoff, which is passing on the work, the control, from a set of worker routines to the main routine. Serialization: the ability to control what gets executed at which point in time and what has to wait, and a control to block and unblock. Data sharing: while work may be happening in parallel, or in a non-blocking, asynchronous manner, the data must be receivable by some other process as well. And data consistency at all points in time: if we have agreed to increment an integer, then everybody should see a fresh value of that integer after it is incremented. This is another way of saying that whenever multiple processes coordinate with each other, they will usually have a consensus problem. To solve a consensus problem, the control needs to be offloaded to somebody outside the system, and this is exactly why we need synchronization primitives. I'm going to call them marshals: not to be confused with marshaling JSON, more like a game marshal. What kind of marshals exist to achieve this synchronization? There is sync.WaitGroup, there is sync.Mutex, there is atomic, there are channels, and there are lockless data structures. I'll not go into the depths of defining what these are, because that is pretty much available everywhere. I want to narrow this down to which primitive can be used for which aspect of the problem statement we discussed a few slides back: control handoff, serialization, data sharing, and data consistency. The first one is sync.WaitGroup. Fairly straightforward; you must have seen it all over Golang blogs, et cetera. Pretty straightforward to use, but what is it used for? It is precisely, only and only, used for control handoff. What is it not used for? It's not used for serialization. What I mean by that is that with wait groups you are, more often than not, only waiting for control, as the name suggests, to reach the main routine while some other parallel routines are doing their job. It's not used for data sharing: there is no way you can share data through it. And it's not used for data consistency either. With that definition, as I go through the slides there will always be a fun section where we talk about some fun mistakes. These fun mistakes are usually things we would miss, and they give us an insight into how these things work at depth. Now here's a simple program; what is wrong here? To walk you through the code (reconstructed below): there's a func main which declares a WaitGroup. There is a for loop from zero to ten, with a wg.Add, a simple construct. A goroutine is spawned which calls hello; inside hello we do a print, and then in main we just wait. A classic example of control handoff, where we are waiting. What could possibly go wrong here? When you run this code, it runs into the problem of "all goroutines are asleep": it runs into a deadlock. But a WaitGroup is supposed to solve a concurrency problem, not create another problem.
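Here is a minimal reconstruction of the program just described; the function name hello and the loop are from the talk, while the argument i is added for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// hello receives the WaitGroup by value, so Done is called on a
// copy and the WaitGroup in main never reaches zero.
func hello(wg sync.WaitGroup, i int) {
	defer wg.Done()
	fmt.Println("hello", i)
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go hello(wg, i)
	}
	wg.Wait() // fatal error: all goroutines are asleep - deadlock!
}
```

go vet flags this ("hello passes lock by value"); the fix is for hello to take a *sync.WaitGroup and for main to call go hello(&wg, i), or to capture wg in a closure.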
And the reason for that is that even concurrency semantics need to respect the principles of the language. Now, what is the principle of the language here? The principle is that I'm passing the WaitGroup by value and not as a reference. So the goroutines see a copy of that synchronization primitive and call Done on the copy, and hence main runs into a deadlock. So even the synchronization primitives need to cater to the principles of the language. The next one is sync.Mutex. What is sync.Mutex used for? It's used for serialization: with mutexes, also called locks, we have the ability to control which routine gets to work and which routine does not. When I say which, you're not identifying a particular routine, but you have the ability to write code so that only one gets executed while another waits for a read or a write operation to finish. So the ability to serialize operations is what you get, because that's what locks effectively do. And it also gives data consistency, because while inside a lock you can increment a value so that another reader cannot read it mid-update. But what it doesn't do: it doesn't do control handoff, because it is a mutual-exclusion mechanism between routines and is usually not used for a handoff, which is where a WaitGroup comes into the picture. And it's not used for data sharing, because there is no construct for it to carry its own data. What are the fun mistakes around mutexes? A simple Counter which has an embedded mutex inside it (sketched below), a pretty well-known construct in Golang; it also has an integer n. There are two methods. There's an Add method which acquires the lock and increments n by the integer passed. There's a Value method which also acquires the lock and returns the current n. So effectively there should not be any race condition here, because we are judiciously using locks and unlocks. Now, what can go wrong here? When I run this code, I surprisingly do run into a race condition. Why is that? Because, as we just discussed, even synchronization primitives need to respect the principles of the language. So what's happening here? Value is not a method bound to the pointer receiver. What that means is that when Value is invoked, the state of the Counter is copied, and so is the sync.Mutex. And while it is getting copied, n is read even though somebody else may have the lock on that n. Effectively, it doesn't even matter what you have inside the body of the method: just calling Value causes a copy to happen, and by virtue of that a read of n, and hence it results in a race condition. So even if I were to replace the entire body of the method with a return 1, it would be the exact same race condition. How do we solve this? We make sure that we actually pass by reference here: the pointer to the Counter should be bound to the Value method, not the other way around. It's a fair conclusion to say that while resolving a consensus problem, the marshals themselves are not immune to the consensus problem; they themselves have to adhere to all of these principles.
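A sketch of the Counter just discussed, with the buggy value receiver left in place; run it with -race to see the report:

```go
package main

import (
	"fmt"
	"sync"
)

// Counter embeds a mutex, mirroring the example from the talk.
type Counter struct {
	mu sync.Mutex
	n  int
}

// Add is bound to the pointer, so the lock really protects the shared n.
func (c *Counter) Add(delta int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n += delta
}

// Value is bound to the value: every call copies the whole Counter,
// mutex included. The copy itself is an unsynchronized read of n, so
// -race reports a race even if the body were just `return 1`.
func (c Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

func main() {
	var c Counter
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Add(1)
		}()
	}
	fmt.Println(c.Value()) // races with the Add calls above
	wg.Wait()
	fmt.Println(c.n)
}
```

go vet also warns that Value passes the lock by value; binding Value to *Counter, as the talk suggests, removes both the copy and the race.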
Underneath all these mutexes and wait groups is the third marshal, which is the bedrock of all synchronization: sync/atomic. Now, surprisingly, atomic has very limited applicability directly as such, and most of the time we end up using constructs built on top of it. But it is the underlying construct behind most of the other, higher-order primitives. So what is an atomic used for? Data consistency, all in, all through, nothing else. There is a value; I want to atomically change that value, update an integer, and do nothing else. It is not used for control handoff. It cannot be used for serialization; there can be derivatives of it, but by direct virtue of its own properties it doesn't do this. A wait group is itself such a derivative, and using atomics you can create higher-order primitives as well, like the lock-free data structures we'll discuss later. And it doesn't do data sharing: it's not used for data sharing because it doesn't carry data the way channels do. Now here's an application of atomically performing a load and a store (sketched below). There's a simple map of type map[string]string. There's an atomic.Value; into it we Store, basically saving the map. Then there's a read function and there's an insert function. The read function just atomically does a Load and asserts the value back to a map. Insert, however, tries to update this value, so it performs a Store. Now, just in this simple piece of code (and this is a piece of code copied from the Golang documentation), what can possibly go wrong? What can go wrong is that if multiple writers invoke the insert method, the construct is not immune to that. So what do we do? We mix and match: while we are discussing atomics, we start using other constructs to solve our problems. We introduce a mutex here, and in insert we guarantee serialization by taking the mutex lock and deferring the unlock. What this does is that if multiple routines try to invoke the insert method, only one of them runs at a time, because mutexes are great at serialization; we just agreed on that, and now we simply perform the update under the mutex.
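A sketch of that load/store pattern, restructured here into a small type (the names Registry, Read, and Insert are mine, not from the talk); readers stay lock-free while the mutex serializes the copy-on-write writers:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Registry keeps a read-mostly map in an atomic.Value, close to the
// copy-on-write example in the sync/atomic documentation.
type Registry struct {
	mu sync.Mutex   // serializes writers only
	v  atomic.Value // holds a map[string]string
}

func NewRegistry() *Registry {
	r := &Registry{}
	r.v.Store(make(map[string]string))
	return r
}

// Read is lock-free: it atomically loads the current snapshot.
func (r *Registry) Read(key string) (string, bool) {
	m := r.v.Load().(map[string]string)
	val, ok := m[key]
	return val, ok
}

// Insert copies the old map, adds the key, and stores the new map.
// The mutex stops two writers from copying the same old map and
// silently dropping each other's updates.
func (r *Registry) Insert(key, val string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	old := r.v.Load().(map[string]string)
	next := make(map[string]string, len(old)+1)
	for k, v := range old {
		next[k] = v
	}
	next[key] = val
	r.v.Store(next)
}

func main() {
	r := NewRegistry()
	r.Insert("region", "blr")
	if v, ok := r.Read("region"); ok {
		fmt.Println(v)
	}
}
```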
Next up, what would you expect the output to be here? There's a func a and there's a main method. Inside the main method there are x and y, two integers. There is a Println which calls the a function, passing the pointer of the integer. The body of a accepts an integer pointer and returns a channel; just before it returns, it spawns a goroutine which does an atomic increment of the integer value and sends it on the channel. There are two variations of this. In one variation we simply call log.Println, calling the two functions inline and reading the channel values from them. In the other variation we first store the channels into two variables, av and bv, with two calls to a, and the second line reads from those channels. They don't look very different if you think about it, because all we have done in the second variation is take the initialization one step ahead, so it's two stages now instead of one. What would you expect the output to be? It's interesting that when I run this code, the first execution gives me a two and a four, whereas the second execution gives me a four and a four. The four-and-four part is understandable. Why? Because it's the same atomic operation, and remember, atomic operations are about data consistency; so if it's the same integer which has been incremented twice and read on the channels, it will obviously be four. But why did the two happen? To answer this, let's add some log statements. Just after the atomic add I've added a log.Println, and there's a time.Sleep of two seconds, which means that before the value is returned, the goroutine sleeps for two seconds. Now when we run this, the first execution prints one, one and then two, four. The second also prints one, one and then four, four. But interestingly, if you observe the timestamps: in the first one, the print happens at the 16th second, then it waits two seconds and prints the next one, and after another two seconds it returns two and four. Whereas in the second implementation, at the 20th second both of them print one, and then after two seconds, because both waited concurrently, it returns four and four. So effectively, the arguments inside the function call are evaluated serially. This is a classic mistake we make when dealing with this. A simple way to solve it is to use the lower construct, the two-stage one, which is actually concurrent, and not the first one, where the calls happen one after the other. The first form can also leave you blocked on one value because the other hasn't arrived yet, clearly evident in this code, where one came out as two and the other value came out as four. We also discussed that atomics and mutexes can both be used for data consistency, so a common question is: are they interchangeable? Atomics, as we said, are the bedrock on which the others are built. The answer is yes and no, because there's only one case where they genuinely perform the same role: assume there is an integer we have to increment. I may use an atomic add, or I may take a lock, increment, and unlock; both give the same behavior at the end of the day. And this is the only time we can say there are overlapping concerns. Interestingly, though, atomics are actually way faster, so you may be able to do many more atomic operations compared with locks, because atomics go directly to the CPU and there are no language- or runtime-managed locks being taken. You would actually see a drastic performance gain. But all over Golang we've always read: do not communicate by sharing memory; instead, share memory by communicating. Which takes us to channels. What are channels used for? Where atomics are only and only used for data consistency, channels are used for everything other than that: control handoff, serialization, data sharing. It's surprising how much you can actually start doing with channels; it's only data consistency they are not used for. Now, because channels are so widely used, I'm going to cover a lot of these fun mistakes. Fun for me, maybe not so fun for others. Look at this piece of code (reconstructed below). What could be going wrong here? There's a func request which returns an int. There's a channel; there is an i that loops from zero to five, and each iteration spawns a goroutine that writes its value to the channel. What can go wrong here is that those goroutines are left waiting to write to the channel. If the receiver, the caller of request, only reads once, there are multiple goroutines which are blocked forever, just waiting for the channel to free up so they can write. To solve for this, we have constructs.
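Before the fix, here is a minimal reconstruction of the leaky version just described; the "work" each goroutine does is invented (it just sends its index):

```go
package main

import "fmt"

// request fires five goroutines but its caller only ever receives one
// value, so the remaining four block on the unbuffered send forever:
// a goroutine (and memory) leak.
func request() int {
	ch := make(chan int)
	for i := 0; i < 5; i++ {
		go func(i int) {
			ch <- i // blocks until someone receives
		}(i)
	}
	return <-ch // first value wins; the other senders are stranded
}

func main() {
	fmt.Println(request())
}
```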
What we do is wrap the send inside a select clause which tries to write to the channel, and if it can't, it falls through to the default case, which is empty. Surprisingly, this is also not the correct way, because it can run into a peculiar condition: the last line, the receive, can now block forever. It's possible that even before the channel is read from outside of request, the goroutines tried writing, found nobody waiting, and fell through to the default section. So now the receive on the returned channel waits forever. One of the most interesting questions that I always face when I'm writing code is: when, and by whom, should an open channel be closed? It's one of those things I never think about, and it always gets me the hard way. A simple piece of code again: it's a for loop, and inside the for loop a value is written to a channel, so you can think of it as a producer. Underneath there is a receiver which reads the value and appends it to an array. Once we are done, we print. When I print, I find that one value is missing. Now, this happens because my program exits right after this loop; after the close, my program exits. Because wg.Wait was a wait group over the producers, once the producers were done producing, the channel was closed immediately; it's possible that the receiver had not yet performed the receive when the function exited and the program exited, and hence I did not see one of the values. So how do I guarantee that if ten were sent, ten are received? In the first implementation (sketched below), we detach the close responsibility from the sender: we make the wg.Wait run in another goroutine, and that goroutine is the one which closes the channel. The reading is now done in the main routine, where I range over the channel; even once it has been closed by that goroutine, the range still drains whatever was sent. That is one way of implementing it. Another way, which is actually far more prevalent, is when producers (senders) and receivers are two different sets of goroutines which need to communicate, and this is where we introduce the done channel. The producer is producing and the consumer is receiving, but at some point the control flow must not leave the parent function, and hence we have two wait constructs: we wait via the wait group, we close the channel, and then we wait on another channel, which is triggered after the receivers are done receiving. So those are two constructs for it.
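A sketch of the first pattern, with the close detached from the senders and the main routine ranging over the channel; the producers here just send their index, which is an assumption:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	ch := make(chan int)
	var wg sync.WaitGroup

	// Producers: each sends one value, then signals the WaitGroup.
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			ch <- i
		}(i)
	}

	// The close is detached from the senders: a separate goroutine
	// waits for every producer and only then closes the channel.
	go func() {
		wg.Wait()
		close(ch)
	}()

	// Ranging in the main routine drains everything that was sent,
	// so none of the ten values is lost.
	var results []int
	for v := range ch {
		results = append(results, v)
	}
	fmt.Println(len(results), results)
}
```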
What else can go wrong with a channel? Let's look at this long-running piece of code. It accepts a messages channel. What we're trying to achieve is to read from messages, and if any of those reads takes more than a minute, then just return and abort. It's fairly straightforward code, and it compiles. But what's going wrong here? What's going wrong is that this can still leak memory, because inside each loop iteration we are creating a new time.After, and those timers keep getting created, again and again, when all we wanted was to wait for a minute. A better way to deal with this is to initialize the timer once, saving the allocation, keep reusing it, and keep selecting on the timer's channel: if it fires after a minute, we return; otherwise we perform the Println. Once we're done with the select, we just do a Reset. What this does is avoid creating those timer channels with time.After in every single iteration. Now, all of these were about two parties competing for a resource; every such construct was about that. One of the best ways to resolve a conflict is to avoid it from happening in the first place. This is where we introduce the notion of lockless data structures. It's a study in itself how lockless data structures are implemented. The reason I stress lockless data structures is that generalized lock-free algorithms are very hard to implement, whereas designing lock-free data structures is relatively easier. My slides have a bunch of links, from Microsoft Research and other research papers, which talk about building common data structures that are actually lockless. Most of them use atomics underneath, because, as we discussed earlier, atomic is almost the bedrock of all concurrency control. Many a time, what also happens is that a simple solution is way better than a smart solution. If you look at this piece of code, there's an input array of integers and there's an output array of integers. We need to run over the input array, spawn a goroutine which does something, get its output, and append it to the output array. The obvious way to do this would be to use a lock, because we're trying to serialize access to the output array. Or do we really need a lock here? A simpler way would be to use a pre-sized array and a positional index. Arrays, as long as they are accessed positionally, can actually live without a lock. We simply run with an index; once we get the output, we write it into the array at that positional index, and it does the job (a small sketch follows at the end of this transcript). So many times we may not even need a very sophisticated concurrency-control mechanism, because this by itself does the job. Just to recap: what is the need for concurrency constructs? Because we want to solve the unpredictability problem, we want control. There are two kinds: control constructs, which are used for control handoff, and data constructs, which are used for data sharing and data consistency. The constructs also need to adhere to language principles; we saw how wait groups, how counters, and how channels can misbehave if we're not careful about how we apply them. And now we probably also understand which one to use when. To recap, were we to look at the same problem again, we would probably be better equipped to handle it. Thank you. I'm CTO and founder at Last9; we are an observability company, and I'm Piyush Verma.
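For reference, a minimal sketch of the positional-index approach from the recap above; process is a stand-in for the real per-item work and is not from the talk:

```go
package main

import (
	"fmt"
	"sync"
)

// process stands in for whatever work each goroutine performs.
func process(n int) int { return n * n }

func main() {
	input := []int{1, 2, 3, 4, 5}

	// Pre-sized output: each goroutine writes only to its own index,
	// so no two goroutines touch the same element and no lock is needed.
	output := make([]int, len(input))

	var wg sync.WaitGroup
	for i, v := range input {
		wg.Add(1)
		go func(i, v int) {
			defer wg.Done()
			output[i] = process(v)
		}(i, v)
	}
	wg.Wait()
	fmt.Println(output)
}
```

Because each goroutine writes only to its own element of the pre-sized slice, no two goroutines share memory and no lock is required.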
...

Piyush Verma

CTO @ last9.io



