Conf42 Golang 2024 - Online

Common Mistakes in Golang and How to Avoid Them

Abstract

Go, unlike many other low-level languages, is a fairly developer-friendly language, but even in it there are situations in which it is very easy to make a non-obvious mistake. In this talk we will look at such mistakes and learn how to avoid them.

Summary

  • Dmitry Korolev tells you about popular mistakes in Golang. Slices are a layer on top of arrays with the ability to change the length. Go has a special built-in copy function that allows you to safely copy any slice. But these are not all the problems you may encounter when working with slices.
  • Next we will talk about channels. Channels are a synchronization primitive that lets one goroutine send data to another. When working with channels, two questions always arise: who should close them, and whether this should be done at all.
  • time.After in Go is a function that returns a channel on which a value is delivered after a specified delay. It is commonly used to create timers or set timeouts for certain logic to be executed in programs. It can lead to some unexpected situations.
  • A goroutine is a lightweight thread of execution in user space. There is a common trap that is very easy to fall for if you are not aware of it. The discrepancy arises because of closures, which are functions that capture variables from their surroundings. Since version 1.22 this problem has completely stopped reproducing.
  • Next we will talk about the sync and atomic packages. In the previous example, we used sync.WaitGroup to wait for the goroutines to finish. Now let's talk about the noCopy field in the WaitGroup structure. It is possible to mark any structure as impossible to copy.
  • Atomics provide safe access to shared memory for reading, writing and modifying variables. Atomic operations are generally faster than mutex operations because they use a specific set of processor instructions. However, with this advantage also comes a disadvantage that is periodically forgotten.
  • defer allows you to postpone execution of a block of code until the end of the function in which it was called. It is typically used to ensure that resources are released, such as closing a file or unlocking a mutex.
  • Interfaces in Go provide code flexibility by allowing you to write generic functions. However, not everything is smooth with them. Next we will talk about vendoring peculiarities. We have taken a close look at several common mistakes when programming in Go.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, my name is Dmitry Korolev, and today I will tell you about popular mistakes in Golang, show you the reasons why they occur, and help you understand how to avoid them. We will start with one of the basic concepts: arrays and slices. An array is a sequence of elements of a certain type and a fixed length. An array cannot be resized, and its capacity is always equal to its length. Slices, in turn, are a layer on top of arrays with the ability to change the length. To better understand how slices work, you need to know what the slice structure itself looks like. In the structure we see the fields holding the length and capacity of the slice, so both are obtained in O(1), as well as a pointer to the array on which the slice is built. There are two things to remember about length and capacity. The first is that when creating a new slice, its length equals its capacity unless you specify a different value with the make function. The second is the rate at which slice capacity grows.
In Golang, all arguments are passed to functions by value. When passing a slice, the value of the slice structure itself, which is now visible on the slide, is passed as an argument. In other words, only the reference to the array on which the slice is built is copied, not the array data itself. So you might end up with unexpected results if you are not aware that only the reference is copied. Consider this example. Here we have a slice of one element declared in main. We print this slice and, as expected, see zero. After that we call the changeSliceValues function, where we write one to the zero index of the slice. Then we print the slice in main again and, as expected, see one. So far nothing unexpected has happened. The value of the first element of the slice changed as we wanted. Now let's play a little with what happens in the changeSliceValues function.
As before, we will write a one into the zero index, then append a two, and finally write a three into the zero index again. Now what happens may seem more unexpected. The second print in main displays the same value that it displayed in the previous example, that is, one. The size of the slice did not change despite the append, and the second write to the zero element of the slice did not take effect. In fact, everything becomes quite simple if you remember the information about slices that we discussed above. At the very beginning we created a slice with length and capacity both equal to one. When the changeSliceValues function is called, the value of the slice structure itself is passed as an argument, and the slice inside the function points to the same underlying array as the slice outside. For this reason, the first write to the zero index is applied to the original array, which was created when the slice was initialized in main. Next we do append. Since the slice inside the function has its length equal to its capacity, the Go runtime allocates memory for a new array, and the slice inside the function begins to point to it. This has no effect on the original slice in the main function. The next write already occurs in the new array pointed to by the slice inside the function, which again does not affect the original data.
You can also encounter the same problem when trying to copy a slice. In this example, data from the structure of the original slice was copied into the new slice variable, including the pointer to the array with data. Thus, when executing append, we overwrite the data in the original array. Go has a special built-in copy function that allows you to safely copy any slice. On the slide, we can see that by using copy we transferred the elements from the original slice to the new one, and now we can safely append elements to the new slice without fear of erasing the data.
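The two slice examples above can be approximated with the sketch below. The slides are not part of the transcript, so the function bodies and the names `cloneAndAppend`, `orig`, and `alias` are assumptions reconstructed from the spoken description rather than the speaker's exact code.

```go
package main

import "fmt"

// changeSliceValues receives a copy of the slice header (pointer, len, cap).
func changeSliceValues(s []int) {
	s[0] = 1         // written to the array shared with the caller
	s = append(s, 2) // len == cap, so append allocates a new backing array
	s[0] = 3         // written to the new array; the caller never sees it
}

// cloneAndAppend (hypothetical helper) copies src into a fresh slice before
// appending, so the caller's backing array is never touched.
func cloneAndAppend(src []int, v int) []int {
	dst := make([]int, len(src), len(src)+1)
	copy(dst, src)
	return append(dst, v)
}

func main() {
	s := make([]int, 1) // len 1, cap 1
	fmt.Println(s)      // [0]
	changeSliceValues(s)
	fmt.Println(s) // [1]

	orig := make([]int, 2, 4) // [0 0] with spare capacity
	alias := orig             // copies only the header, not the data
	alias = append(alias, 7)  // fits in cap: writes into orig's array
	_ = append(orig, 9)       // overwrites the 7 that alias appended
	fmt.Println(alias)        // [0 0 9]

	fmt.Println(cloneAndAppend(orig, 5)) // [0 0 5]
}
```

Giving the destination its own backing array with `make` plus `copy` is what breaks the link between the two slices; a plain assignment only duplicates the header.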
However, these are not all the problems you may encounter when working with slices. Let's look at one more example. Suppose we aim to parse a news portal and, for each new article, store the first 100 characters of the news content in an in-memory cache. This would allow us to provide users with a preview of the article. In this example, we see the described logic in a continuous loop. We fetch new articles, extract the first 100 runes from each, and pass them to a specific function, storeArticlePreview. This function is responsible for storing these previews in the in-memory cache. However, the problem is that when we launch our service, to our surprise, it will eat up much more RAM than we planned, all because we have allowed a memory leak here. The operation of obtaining the first 100 runes from an article creates a slice 100 elements long. However, its capacity remains the same as that of the original slice. The entire array with the news text remains in memory, even if ultimately only a reference to its first 100 elements is stored.
By the way, why use runes instead of directly slicing the first 100 elements from the string? Why the need to convert to a rune slice? Let's examine a few examples and compare slicing a rune slice with slicing the string directly. To understand the differences, here we take the standard string "hello world", make a separate variable with the runes of this string, and print slices of the first five characters. The idea is that it should be the word hello in three forms: as runes, as runes converted back to a string, and as a direct string slice. In the output, nothing unusual appears. As we expected, hello is represented in both runes and bytes. Now let's try to say hello in Chinese and print the same thing. As planned, the first two characters should be displayed, meaning hello. But something goes wrong: in the option with regular string slicing, the characters are not displayed.
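A minimal sketch of the comparison described here, assuming `你好世界` as the Chinese greeting (the exact string on the slide is not in the transcript). It also shows one possible way to keep only the preview in memory: the hypothetical `preview` helper copies the first n runes into a right-sized slice, so the full rune array is no longer referenced once the function returns.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// preview (hypothetical helper) returns the first n runes of text in a
// slice whose capacity equals its length, so the large rune array created
// by the conversion is not kept alive by the result.
func preview(text string, n int) []rune {
	runes := []rune(text)
	if len(runes) < n {
		n = len(runes)
	}
	out := make([]rune, n)
	copy(out, runes[:n])
	return out
}

func main() {
	s := "你好世界"
	fmt.Println(len(s))                    // 12: len counts bytes
	fmt.Println(utf8.RuneCountInString(s)) // 4
	fmt.Println(utf8.ValidString(s[:2]))   // false: two bytes of a three-byte character
	fmt.Println(string([]rune(s)[:2]))     // 你好

	p := preview(s, 2)
	fmt.Println(string(p), cap(p)) // 你好 2
}
```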
This happens because strings in Go are UTF-8 encoded, and a single character can be more than one byte long. Slicing a string means you are working with bytes, not the characters themselves. So when we tried to get the first two characters of the string, we actually just got the first two bytes. In general, working with strings may bring a few surprises, since string slicing and the len function work with bytes, while a for range loop over a string yields the byte index i, but the variable c will contain the rune that starts at this index. Therefore, it is often much easier to convert a string to a slice of runes and work with it. But don't forget about the overhead we get in this case. For each string there will be two variables, one of which stores the original string and the second stores an array of runes. If there are a lot of strings and they are long, this may matter.
Next we will talk about channels. Channels are a synchronization primitive that lets one goroutine send data to another goroutine and provides safe concurrent access to shared data. When working with channels, two questions always arise: who should close them, and whether this should be done at all. To answer them correctly, you need to know what can happen when working with a closed channel. There is a wonderful table that describes what we get when performing various operations on a channel in different states. Let's pay attention to the operations on a channel in the closed state. Reading from a closed channel works fine, but trying to write to it or closing it again causes a panic. This leads to a clear guideline: the goroutine responsible for writing should be the one to close the channel. This way we minimize the risk of attempting to write to a closed channel, which would lead to a panic. Now let's try to answer the question: why close the channel? To do this, let's turn to the documentation and see this line.
A sender can close a channel to indicate that no more values will be sent. If the sender closes the channel, it means that someone other than the sender may need this signal. For example, a channel reader. Let's look at an example where this might be useful. Here we can see the function writeToChan, in which writing is done to a channel. Surprise, surprise. Then, in the main part of the code, there is a loop that reads values from this channel. If you forget to close the channel, the loop will wait forever, causing a deadlock. It is worth remembering that you should close the channel only in situations where the reader must somehow react to it. There is nothing wrong with leaving the channel unclosed. The garbage collector will be able to get rid of it in this state.
Since we have discussed working with channels, it is also worth discussing the traps that structures using channels have prepared for us. One such structure is time.After. It can lead to some unexpected situations. time.After in Go is a function that returns a channel on which a value is delivered after a specified delay. It is commonly used to create timers or set timeouts for certain logic to be executed in programs. Imagine a basic scenario where we receive events from a channel. If we don't receive any events from this channel within 15 minutes, we print a warning saying that we haven't received any events for a while. While this code may seem fine and run without issues, if we are monitoring memory consumption and there is a large number of events, we might detect a memory leak. With an average flow of a million events in 15 minutes, the leak will be about 200 megabytes. Considering that a single channel in Go consumes around 200 bytes, simple calculations show that a new channel is created for each event. You might wonder how this is possible, given that after each select statement the channel from time.After should go out of scope and be cleaned up by the garbage collector.
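A sketch of the event loop being described, written with one reusable `time.Timer` in place of a `case <-time.After(15 * time.Minute)` evaluated on every pass of the loop. The function name `consume`, its signature, and the `onIdle` callback are assumptions made for illustration; the original slide code is not in the transcript.

```go
package main

import (
	"fmt"
	"time"
)

// consume (hypothetical) handles events until the channel is closed and
// calls onIdle whenever no event arrives within timeout. It reuses a single
// timer instead of creating a new one for every event.
func consume(events <-chan int, timeout time.Duration, onIdle func()) int {
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	handled := 0
	for {
		select {
		case _, ok := <-events:
			if !ok {
				return handled
			}
			handled++
			// Stop the timer and drain a value that may already have been
			// delivered, then start the idle countdown again.
			if !timer.Stop() {
				select {
				case <-timer.C:
				default:
				}
			}
			timer.Reset(timeout)
		case <-timer.C:
			onIdle()
			timer.Reset(timeout)
		}
	}
}

func main() {
	events := make(chan int)
	go func() {
		defer close(events) // the writer closes the channel
		for i := 0; i < 3; i++ {
			events <- i
		}
	}()
	n := consume(events, 15*time.Minute, func() {
		fmt.Println("no events received for a while")
	})
	fmt.Println("handled", n) // handled 3
}
```

The memory growth discussed in the talk concerns the Go releases current at the time; from Go 1.23 the runtime can collect unreferenced timers before they fire, so measuring on your own Go version is advisable.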
As we discussed earlier, the garbage collector can get rid of unused channels. However, Go, while logically structured, still holds surprises. On closer inspection of the documentation, you will find lines that shed light on this behavior: the underlying Timer is not recovered by the garbage collector until the timer fires. If efficiency is a concern, use NewTimer instead and call Timer.Stop if the timer is no longer needed. Thus, the channel that we create with time.After will remain hanging in memory as dead weight for the time we set, that is, for 15 minutes.
Next, we will talk about goroutines. A goroutine is a lightweight thread of execution in user space, while operating system threads live in kernel space. The fact that goroutines run in user space means that they are controlled by the Go runtime. Goroutines are designed to be more efficient than traditional operating system threads. There is a common trap that is very easy to fall for if you are not aware of it. Although it is not directly related to goroutines, it is most often encountered when creating goroutines in a loop. Let's look at an example. In this scenario we generate a slice of numbers from one to five, and within a loop we create goroutines. Each goroutine adds its corresponding number from the slice to a sum variable. You might expect the output to display the number 15, which is the sum of the numbers from one to five. However, the actual output will be different. The discrepancy arises because of closures, which are functions that capture variables from their surroundings. The peculiarity of their work is in how the captured variable is used. Goroutines do not capture the values of variables at the time they are created; they capture a reference to the variable. Therefore, when the goroutine starts executing, the loop has often already finished, and the value variable holds the last value from the slice we are iterating over. Also, there is no guarantee that the loop will end before one of the goroutines starts working.
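One way to make the summing example behave the same on every Go version is to pass the loop value to the goroutine as an argument, so each goroutine works on its own copy. The sketch below is an assumption about how such a fix could look; `sumConcurrently` is a hypothetical name, and the mutex is added only so that the shared `sum` is updated safely.

```go
package main

import (
	"fmt"
	"sync"
)

// sumConcurrently (hypothetical) adds every number from its own goroutine.
func sumConcurrently(numbers []int) int {
	var (
		wg  sync.WaitGroup
		mu  sync.Mutex
		sum int
	)
	for _, n := range numbers {
		wg.Add(1)
		// n is passed as an argument, so the goroutine receives a copy of
		// the current value instead of a reference to the loop variable.
		go func(v int) {
			defer wg.Done()
			mu.Lock()
			sum += v
			mu.Unlock()
		}(n)
	}
	wg.Wait()
	return sum
}

func main() {
	fmt.Println(sumConcurrently([]int{1, 2, 3, 4, 5})) // 15
}
```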
As a result, the value in the sum variable is not 15. This is such a common problem that the Go maintainers decided to change the semantics of for loop variables to prevent them from being unintentionally shared in per-iteration closures and goroutines. In version 1.21 a corresponding experiment appeared, and since version 1.22 this problem has completely stopped reproducing. But since version 1.22 is fresh, and probably not everyone has managed to update to it, take note of this feature of how closures work.
Next we will talk about the sync and atomic packages. In the previous example, we used sync.WaitGroup to wait for the goroutines to finish. And by the way, we did it wrong. Admit it, who didn't notice? It is worth paying attention to where we call WaitGroup.Add and thinking about what the risks are. Let's look at the WaitGroup struct. There are interesting things in it: a semaphore and a certain noCopy. First, let's talk about the semaphore. Or more precisely, about the fact that WaitGroup is essentially a simple wrapper over a semaphore with three methods. Add increases the semaphore value by the passed value, Done decreases the semaphore value by one, and Wait blocks execution until the semaphore value becomes zero. So the problem is that Add is called inside the goroutines we launched, and there is no guarantee that those goroutines start before Wait is called. This means Wait might finish before Add runs. And since goroutines can launch in any order, we might wrongly assume that they are all done when some haven't even started.
Now let's talk about the noCopy field in the WaitGroup structure. It's a type that suggests the structure can't be copied. We will find a similar field in many structures of the sync package. Let's see what happens if we do copy one. In this program we have a Counter structure that stores a map as well as a mutex, which is supposed to protect the map from parallel writes. The mutex, just like WaitGroup, has noCopy.
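Returning to the WaitGroup point above: a minimal sketch of the safe ordering, in which the counter is raised before any goroutine is launched, so Wait cannot return early. The helper name `runAll` and the task functions are hypothetical and only illustrate the ordering.

```go
package main

import (
	"fmt"
	"sync"
)

// runAll (hypothetical) starts every task in its own goroutine and returns
// only after all of them have finished.
func runAll(tasks []func()) {
	var wg sync.WaitGroup
	wg.Add(len(tasks)) // the counter is set before any goroutine starts
	for _, task := range tasks {
		go func(t func()) {
			defer wg.Done()
			t()
		}(task)
	}
	wg.Wait()
}

func main() {
	results := make(chan int, 10)
	tasks := make([]func(), 10)
	for i := range tasks {
		tasks[i] = func() { results <- 1 }
	}
	runAll(tasks)
	fmt.Println(len(results)) // 10
}
```

Calling `wg.Add(1)` inside the loop, immediately before each `go` statement, works equally well; what matters is that Add runs in the launching goroutine rather than in the launched one.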
There are two methods defined on the Counter structure. One increases the value of a specific key by one, and the other increases the value by a passed amount. Finally, there is a main in which we initialize the Counter structure and launch two goroutines to increase the value of the same key, sleep to wait for the goroutines to complete, and print the values that end up in the counter's map. But unfortunately we will never see the print, because we will fall into a panic. The problem with this code is that whenever increment is called, our counter c is copied into it, since increment is defined on the type Counter, not on a pointer to Counter. In other words, it is a value receiver, not a pointer receiver. Therefore increment cannot change the original counter variable that we created in main. Thus, with each call to increment, the counter was copied with all its contents, including the mutex. Now remember that a mutex is essentially just a wrapper over a semaphore, and when we copy it, we also copy the semaphore. In this case, the copy and the original can live their own separate lives, and nothing will prevent them from competing for operations on the same memory block. Therefore copying a mutex is incorrect. So thanks to this very noCopy, it is possible to mark any structure as impossible to copy. Many structures from the sync package are marked as such. Then, using the go vet command, you can detect places where a marked structure is copied and find a potential problem in your application code.
Now let's move on to another common synchronization primitive: atomics. They provide safe access to shared memory for reading, writing and modifying variables. In addition, atomic operations are generally faster than mutex operations because they use a specific set of processor instructions. However, with this advantage also comes a disadvantage that is periodically forgotten.
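Returning to the Counter example above: a sketch of the version with pointer receivers, so that every call locks the same mutex. The type layout and the method names `Add`, `Increment`, and `Get` are assumptions reconstructed from the description, not the code shown on the slide.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter guards a map with a mutex. It must not be copied after first use,
// because a copy would carry its own independent mutex.
type Counter struct {
	mu     sync.Mutex
	values map[string]int
}

// NewCounter returns a pointer so callers share one Counter.
func NewCounter() *Counter {
	return &Counter{values: make(map[string]int)}
}

// Add uses a pointer receiver, so all calls lock the same mutex.
func (c *Counter) Add(key string, delta int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values[key] += delta
}

// Increment adds one to key.
func (c *Counter) Increment(key string) { c.Add(key, 1) }

// Get returns the current value for key.
func (c *Counter) Get(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.values[key]
}

func main() {
	c := NewCounter()
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Increment("hits")
		}()
	}
	wg.Wait()
	fmt.Println(c.Get("hits")) // 1000
}
```

Waiting with a WaitGroup instead of a sleep is a deliberate change in this sketch, since a sleep gives no guarantee that the goroutines have finished.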
Operations with atomics are atomic individually, but not atomic collectively. Let's look at an example. In this program, a goroutine continuously increases a variable num by one in an endless loop. Meanwhile, in the main function there is another infinite loop that checks if num is even, and if so prints it on the screen. However, we see that the number 287 is displayed, and surprisingly it is odd. This occurs because after num passes the parity check, its value isn't protected from further changes. Consequently, the goroutine incrementing num can alter its value before it is printed to the screen.
And next we will talk about another cool Go concept called defer. defer allows you to postpone execution of a block of code until the end of the function in which it was called. It is typically used to ensure that resources are released, such as closing a file or unlocking a mutex, regardless of whether the function exits due to a normal return, a panic or an error. Consider an example. Here we see the Profile structure and several possible types for it, as well as the GetBalance method, in which, depending on the profile type, one or another balance calculation method is selected. Let's say we now want to add a log with the final balance obtained during the calculation. With such a log, we will always see the entry "profile balance: 0". Why is this so? Let's take a closer look at what is written about defer in the language documentation. The arguments to the deferred function, which include the receiver if the function is a method, are evaluated when the defer executes, not when the call executes. In our example, at the time the defer statement executes, the balance variable holds its default value, zero. This is the value with which our print is executed. To achieve the result we wanted, that is, for the final calculated amount to appear in the print, we can use a concept that we have already met: closures. An anonymous function does not accept any arguments.
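The defer behaviour can be reproduced with a small sketch. The functions `calcEager` and `calcLazy`, the fixed balance of 100, and the `logBalance` callback are hypothetical stand-ins for the Profile example, whose code is not in the transcript.

```go
package main

import "fmt"

// calcEager mirrors the problem: the argument of the deferred call is
// evaluated at the defer statement, while balance is still zero.
func calcEager(logBalance func(int)) (balance int) {
	defer logBalance(balance) // balance == 0 is evaluated here
	balance = 100
	return balance
}

// calcLazy wraps the call in a closure, which reads balance only when the
// deferred function actually runs.
func calcLazy(logBalance func(int)) (balance int) {
	defer func() { logBalance(balance) }()
	balance = 100
	return balance
}

func main() {
	logBalance := func(b int) { fmt.Println("profile balance:", b) }
	calcEager(logBalance) // profile balance: 0
	calcLazy(logBalance)  // profile balance: 100
}
```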
The balance variable is used within the body of this function. As we discussed earlier, a reference to this variable will be stored, and the actual value will be retrieved through the stored reference when the anonymous function is executed.
Now it's time to talk about interfaces. Interfaces in Go provide code flexibility by allowing you to write generic functions that can work with different data types that implement the same interface. However, not everything is smooth with them. Let's look at an example. Here we see the Requester interface and the concreteRequester type, which implements the MakeRequest method of the interface. In addition, we see the makeRequester constructor with a return value of the interface type, and finally we see main, in which the constructor is called and several prints occur. Unexpectedly, when starting, we get the following output: "got requester nil" and "requester is not nil". An interesting result. To figure it out, we need to take a closer look at interfaces, or more precisely, at how they are arranged under the hood. Under the hood there are two structures for interfaces: eface for an empty interface, and iface for an interface with a defined set of methods that the type must adhere to. We are interested in the common fields, namely the type of the data the interface holds and the reference to the memory location where its value is stored. For an interface variable to be considered equal to nil, both of these fields must be nil, and for two interface variables to be equal, both fields must match. Now let's see what exactly lies in these fields for our requester variable. Yes, this is where the problem comes from. Despite the fact that the actual value of the variable is nil, the type is not, which leads to the fact that requester is not equal to nil.
And next we will talk about vendoring peculiarities. Let's assume that you have created a library in Go in which some network request must be made.
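Returning to the interface example above: a minimal sketch of a typed nil, under the assumption that the constructor returned a nil pointer of the concrete type through the interface. The identifiers follow the names mentioned in the talk, but the bodies, `newTypedNil`, and `newNil` are reconstructions for illustration.

```go
package main

import "fmt"

// Requester is the interface from the example.
type Requester interface {
	MakeRequest() string
}

type concreteRequester struct{}

func (r *concreteRequester) MakeRequest() string { return "ok" }

// newTypedNil returns a nil *concreteRequester wrapped in the interface:
// the interface value has a type but no data, so it is not equal to nil.
func newTypedNil() Requester {
	var r *concreteRequester
	return r
}

// newNil returns a true nil interface: no type and no data.
func newNil() Requester {
	return nil
}

func main() {
	r := newTypedNil()
	fmt.Printf("%v %T\n", r, r)  // <nil> *main.concreteRequester
	fmt.Println(r == nil)        // false
	fmt.Println(newNil() == nil) // true
}
```

Returning a literal `nil` from functions whose result type is an interface avoids the surprise, because then neither the type nor the data field is set.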
Inside this library you have implemented a certain client that can make requests, receive some data in response, and pass it on in the form of a structure described in the models. Let's try to integrate this library into a service. We added it to the go.mod file and ran go mod tidy and go mod vendor in the console. However, after inspecting the vendor directory, we were surprised to find only a portion of the library's files and folders present. For those who have not studied how vendoring works, and in my experience this is more than half of developers, this will seem strange. Well, for answers we go to the language documentation, and again everything falls into place. The vendor directory receives only those packages that are necessary for the successful build and testing of the application. That is, if we initialize a client from the library somewhere in the service to which we connected this library, the packages required for this will be pulled in. In itself, this situation may simply seem like an unexpected feature of the language, but in fact it is a subtle hint about placing the logic for calling an external service inside the library. This is not the best idea, because in this way we increase the coupling of the logic and reduce the ability of consumer services to customize how the library interacts with external services.
And that's all. We have taken a close look at several common mistakes when programming in Go and discussed how you can avoid them. I hope this brief excursion through the complexities of slices, channels, goroutines and other aspects helped you strengthen your knowledge of the language and provided valuable insights. But don't forget that Go as a language does not stand still and is constantly evolving. So I wish you to develop with it, and I hope that you liked this talk. Thank you.

Dmitry Korolev

Senior Software Engineer @ Avito



