Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, my name is Dmitry Karolev and today I will tell you
about popular mistakes in Golang and show you the reasons why
they occur and help you to understand how to avoid them.
We will start with one of the basic concepts: arrays and
slices. An array is a
sequence of elements of a certain type and fixed length.
The size of an array cannot change, and its capacity is
always equal to its length. Slices, in turn, are a
layer on top of arrays with the ability to change the length.
To better understand the principles of how slices work, you need to know
what the slice structure itself looks like.
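As a rough sketch, the slice structure can be pictured as three fields. The sliceHeader type below is a hypothetical mirror written only for illustration (the names Data, Len and Cap are assumptions); real code should simply call len and cap.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sliceHeader is an illustrative mirror of a slice value: a pointer to the
// backing array, a length and a capacity. It is an assumption made for
// explanation only, not a type to use in real code.
type sliceHeader struct {
	Data unsafe.Pointer
	Len  int
	Cap  int
}

// lenAndCap returns the length and capacity. Both are read from the slice
// value itself, without walking the elements.
func lenAndCap(s []int) (int, int) {
	return len(s), cap(s)
}

func main() {
	literal := []int{1, 2, 3}  // length and capacity are both 3
	made := make([]int, 2, 10) // length 2, capacity 10 set through make

	fmt.Println(lenAndCap(literal)) // 3 3
	fmt.Println(lenAndCap(made))    // 2 10

	// A slice value has the same size however many elements it refers to.
	fmt.Println(unsafe.Sizeof(made) == unsafe.Sizeof(sliceHeader{})) // true
}
```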
In the structure we see the fields dedicated to the length and
capacity of the slice, so they are both obtained
in O(1), as well as a pointer to the array
on which the slice is built. There are two things to remember about
length and capacity. First, when creating a new slice, its length
equals its capacity, unless you specify a different value
with the make function. The second thing is the rate
of growth of slice capacity. Since in Golang all
arguments are passed to functions by value, when passing a slice, the value
of the slice structure itself, which is now visible on the slide,
is passed as an argument. In other words, only the reference
to the array on which the slice is built is copied, and
not the array data itself. So you
might end up with unexpected results if you are not aware
that only the reference is copied, not the data.
Consider this example. Here we have a slice
consisting of one element, declared in main.
We print this slice and, as expected, see zero.
After that we call the changeSliceValues function, where we write
one to the zero index of the slice. Then we print
the slice in main again and, as expected, see
one. So far nothing unexpected has happened.
The value of the first element of the slice changed as we wanted.
Now let's play a little with what happens
in the changeSliceValues function. As before, we will write a one
into the zero index, then append a two, and
finally write a three into the zero index
again. And now the things that are happening may
seem more unexpected. For some reason, the second print
in main displays the same value that it displayed in the previous
example, that is, one. The size of
the slice did not change despite the append,
and the second write to the zero element of the slice did not occur.
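The two variants just described can be sketched as follows. The slide code is not part of this transcript, so the function names changeFirst and changeAppendChange are hypothetical stand-ins for the two versions of changeSliceValues.

```go
package main

import "fmt"

// changeFirst writes through the shared backing array, so the caller sees
// the new value.
func changeFirst(s []int) {
	s[0] = 1
}

// changeAppendChange writes, appends, then writes again. The append exceeds
// the capacity, so the local slice moves to a new array and the last write
// is invisible to the caller.
func changeAppendChange(s []int) {
	s[0] = 1
	s = append(s, 2)
	s[0] = 3
}

func main() {
	a := make([]int, 1)
	fmt.Println(a) // [0]
	changeFirst(a)
	fmt.Println(a) // [1]

	b := make([]int, 1)
	changeAppendChange(b)
	fmt.Println(b, len(b)) // [1] 1
}
```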
In fact, everything becomes quite simple if you remember the
information about slices that we discussed above. At the very
beginning we created a slice with length equal to capacity equal
to one, and when the changeSliceValues function is called, the value
of the slice structure itself is passed as an argument, and
the slice inside the function points to the same underlying
array as the slice outside. For this reason, the first
write to the zero index is applied to the original array,
which was created when the slice was initialized
in main. Next we do the append. Since the slice inside the
function has its length equal to its capacity, the Golang runtime
allocates memory for a new array, and the slice inside
the function begins to point to it. This has no
effect on the original slice in the main function. The next
write already occurs in the new array pointed to by
the slice inside the function, which again does not affect the original
data. You can also encounter the same problem
when trying to copy a slice. In this example,
data from the structure of the original slice was copied into the
new slice variable, including a pointer to the array
with data.
Thus, when executing append, we overwrite the data in the original
array. Go has a special built-in copy function
that allows you to safely copy any slice. On the
slide, we can see that by using copy
we transferred the elements from the original slice to the new one,
and now we can safely append elements to the new slice without
fear of erasing the data. However, these are
not all the problems you may encounter when working with slices.
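A minimal sketch of the difference between sharing the array and using copy. The slide code is not in the transcript, so the re-slicing used here to leave spare capacity, and the names aliasAndAppend and copyAndAppend, are assumptions made for illustration.

```go
package main

import "fmt"

// aliasAndAppend takes a new slice variable over the same backing array and
// appends to it. There is spare capacity, so append writes into the shared
// array and overwrites data of the original.
func aliasAndAppend(backing []int) []int {
	alias := backing[:2] // length 2, capacity still covers the whole array
	alias = append(alias, 99)
	return alias
}

// copyAndAppend copies the elements into a fresh array first, so the append
// cannot touch the original data.
func copyAndAppend(backing []int) []int {
	dup := make([]int, 2)
	copy(dup, backing[:2])
	dup = append(dup, 99)
	return dup
}

func main() {
	data := []int{1, 2, 3}
	aliasAndAppend(data)
	fmt.Println(data) // [1 2 99]: the third element was overwritten

	data = []int{1, 2, 3}
	copyAndAppend(data)
	fmt.Println(data) // [1 2 3]: the original is untouched
}
```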
Let's look at one more example.
Suppose we aim to parse a news portal and,
for each new article, store the first 100 characters
of the news content in an in-memory cache. This would allow us to
provide users with a preview of the article. In this
example, we see the logic described as a continuous
loop. We fetch new articles, extract the first 100
runes from each and pass them to a specific
function, storeArticlePreview. This function
is responsible for storing these previews in the in-memory cache.
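A sketch of the preview step, under stated assumptions: previewCache, leakyPreview and copiedPreview are hypothetical names, and the article is a stand-in built with make. It compares re-slicing the article with copying the first runes into a new array, which is one common way to store only the preview.

```go
package main

import "fmt"

// previewCache is a hypothetical in-memory cache of article previews.
var previewCache [][]rune

// storeArticlePreview keeps the given preview in the in-memory cache.
func storeArticlePreview(preview []rune) {
	previewCache = append(previewCache, preview)
}

// leakyPreview re-slices the article: the result has length n but shares,
// and keeps alive, the whole backing array of the article.
// It assumes the article has at least n runes.
func leakyPreview(article []rune, n int) []rune {
	return article[:n]
}

// copiedPreview copies the first n runes into a new array of exactly that
// size, so the full article can be garbage collected.
// It assumes the article has at least n runes.
func copiedPreview(article []rune, n int) []rune {
	preview := make([]rune, n)
	copy(preview, article[:n])
	return preview
}

func main() {
	// Stand-in for a fetched article: 10,000 runes of text.
	article := make([]rune, 10000)
	for i := range article {
		article[i] = 'a'
	}

	leaky := leakyPreview(article, 100)
	copied := copiedPreview(article, 100)
	storeArticlePreview(copied)

	fmt.Println(len(leaky), cap(leaky))   // 100 10000
	fmt.Println(len(copied), cap(copied)) // 100 100
}
```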
However, the problem is that when we launch our service,
to our surprise, it will eat up much more
RAM than we planned, all because we have allowed
a memory leak here. The operation of obtaining the first 100
runes from a news text creates a slice 100 elements long.
However, its capacity remains the same as that of the original
slice. The entire array with the news text remains in memory,
even if ultimately only a reference to the first 100 of
its elements is stored. By the way, why use runes
instead of directly slicing the first 100 elements from
the string? Why the need to convert to a rune array?
Let's examine a few examples and compare slicing
on a rune array versus directly on the string.
To understand the differences, here we take
the standard line hello world, make a separate
variable with the runes of this line, and print slices
of the first five characters. The idea is
that it should be the word hello in three forms:
as runes, as runes converted back to a string,
and as a direct string slice. In the output,
nothing unusual appears. As we expected, hello is represented
in both runes and bytes. Now let's
try to say hello in Chinese and print the same thing.
As planned, the first two characters should be displayed,
meaning hello. But something goes wrong: in
the option with regular string slicing, the characters are
not displayed. This is because strings in Go are made
up of UTF-8 encoded characters. These characters can
be more than one byte long. Slicing a string means you
are working with bytes, not the characters themselves.
So when we tried to get the first two characters of the string, we actually
just got the first two bytes. In general,
working with strings may bring a few surprises, since string
slicing and the len function work with bytes,
while a for range loop over a string s yields the byte index
i, but the variable c will contain the rune
which starts at this index.
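The byte versus rune behavior can be checked with a short sketch. The helper names firstRunes and firstBytes are hypothetical, and the Chinese string is an example chosen for illustration, not necessarily the one on the slide.

```go
package main

import "fmt"

// firstRunes returns the first n characters (runes) of s.
// It assumes s contains at least n runes.
func firstRunes(s string, n int) string {
	return string([]rune(s)[:n])
}

// firstBytes returns the first n bytes of s, which may cut a multi-byte
// character in half. It assumes s contains at least n bytes.
func firstBytes(s string, n int) string {
	return s[:n]
}

func main() {
	hello := "你好世界"

	fmt.Println(len(hello))           // 12: len counts bytes
	fmt.Println(len([]rune(hello)))   // 4: four characters
	fmt.Println(firstRunes(hello, 2)) // 你好

	// Two bytes of a three-byte character: not valid text on its own.
	fmt.Printf("%q\n", firstBytes(hello, 2))

	// range yields the byte index i and the rune c that starts there.
	for i, c := range hello {
		fmt.Println(i, string(c))
	}
}
```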
Therefore, it is often much easier to convert a string to
a slice of runes and work with it. But don't forget about the
overhead which we get in this case.
For each line there will be two variables, one of which
stores the original line and the second stores an array
of runes. If there are a lot of lines and they are long,
this may matter.

Next we will
talk about channels. Channels are a synchronization primitive that
provides the ability for one goroutine to send data to another goroutine
and provides safe concurrent access to shared data.
When working with channels, two questions always arise: who should close
them, and whether this should be done at all. To answer
them correctly, you need to know what can happen when working with a closed
channel. There is a wonderful table that describes what
we get when performing various operations on a channel in different
states. Let's pay attention to the operations on a channel in
the closed state: reading from a closed channel works fine,
but trying to write to one or closing it again causes
a panic. This leads to a clear guideline. The goroutine
responsible for writing should be the one to close the channel.
This way we minimize the risk of attempting to write to
a closed channel, which would lead to a panic. Now let's
try to answer the question: why close the channel? To do this,
let's turn to the documentation and see this line.
A sender can close a channel to indicate that no more values
will be sent. If the sender closes the channel, it means that
someone other than the sender may need it. For example, a channel reader.
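The closed-channel behavior described above can be observed directly. This is a small sketch; the panics helper is a hypothetical function written here only to catch the panic and report it.

```go
package main

import "fmt"

// panics reports whether calling f causes a panic.
func panics(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	ch := make(chan int, 1)
	ch <- 42
	close(ch)

	v, ok := <-ch
	fmt.Println(v, ok) // 42 true: buffered values are still delivered
	v, ok = <-ch
	fmt.Println(v, ok) // 0 false: zero value, ok reports the channel is closed

	fmt.Println(panics(func() { ch <- 1 }))   // true: send on a closed channel
	fmt.Println(panics(func() { close(ch) })) // true: close of a closed channel
}
```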
Let's look at an example where
this might be useful. Here we can see the function
writeToChan, in which writing is done to a channel.
Surprise, surprise. Then, in the main part of
the code, there is a loop that reads values from this channel. If you forget
to close the channel, the loop will wait forever, causing a
deadlock. It is worth remembering that you should close the channel only
in situations where the reader must somehow react to it.
There is nothing wrong with leaving the channel unclosed. The garbage collector
will be able to get rid of it in this state.
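A sketch of the writer closing the channel so that the reading loop can finish. The function name writeToChan comes from the talk; its signature, the readAll helper and the values sent are assumptions.

```go
package main

import "fmt"

// writeToChan sends n values and then closes the channel. It is the writer,
// and the reader's range loop needs to learn that no more values will come.
func writeToChan(ch chan<- int, n int) {
	defer close(ch)
	for i := 0; i < n; i++ {
		ch <- i
	}
}

// readAll drains the channel. The range loop ends only when the channel
// is closed; without close it would block forever.
func readAll(ch <-chan int) []int {
	var got []int
	for v := range ch {
		got = append(got, v)
	}
	return got
}

func main() {
	ch := make(chan int)
	go writeToChan(ch, 3)
	fmt.Println(readAll(ch)) // [0 1 2]
}
```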
Since we have discussed working with channels, it is also worth discussing
the traps that constructs using channels have prepared for us.
One such construct is time.After. It can
lead to some unexpected situations. time.After in Go is
a function that returns a channel on which the current time is sent
after a specified delay. It is commonly used to create timers or set
timeouts for certain logic to be executed in programs.
Imagine a basic scenario where we receive
events from a channel. If we don't receive any
events from this channel within 15 minutes,
we print a warning saying that we haven't received any
events for a while. While this code may seem fine and
run without issues, if we are monitoring memory consumption and
there is a large number of events, we might detect a
memory leak. With an average flow of a million events
in 15 minutes, the leak will be about 200 megabytes.
Considering that a single channel in Go consumes
around 200 bytes, simple calculations show
that a new channel is created for each event.
You might wonder how this is possible given that after
each select statement, the time.After channel should go out of scope and
be cleaned up by the garbage collector. As we discussed
earlier, however, Go, while logically structured,
still holds surprises. With closer inspection of the documentation,
you will find lines that shed light on this behavior.
The underlying Timer is not recovered by the garbage collector
until the timer fires. If efficiency is a concern,
use NewTimer instead and call Timer.Stop if the timer
is no longer needed. Thus, the channel that we create with
time.After will remain hanging in memory as dead weight
for the time we set, that is, for 15 minutes.

Next, we will talk about goroutines. A goroutine is
a lightweight thread of execution in user space, while operating
system threads are in kernel space. The fact that they run in user
space means that they are controlled by the Go runtime. Goroutines
are designed to be more efficient than traditional operating
system threads. There is a common trap that is very easy to
fall for if you are not aware of it. Although it is not
directly related to goroutines, it is most often encountered when
creating goroutines in a loop. Let's look at an example.
In this scenario we generate a slice of numbers from one to
five, and within a loop we create goroutines.
Each goroutine adds its corresponding number from the slice
to a sum variable. You might expect the
output to display the number 15, which is the sum of the numbers
from one to five. However, the actual output will be
different. The discrepancy arises because of closures,
which are functions that capture variables from their surroundings.
The peculiarity of their work is in how the captured variable is
used. Goroutines do not capture the values of variables
at the time they are created; they capture a reference to the
variable. Therefore, when the goroutine starts executing, the loop has
often already finished and the value variable has the
last value from the slice through which we are iterating.
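Capture by reference can be shown in a way that does not depend on the Go version, using a variable declared outside the loop. The sketch below also shows one common remedy, passing the value to the goroutine as an argument. All names are hypothetical, and the mutex around sum is an addition made here to keep the example free of data races.

```go
package main

import (
	"fmt"
	"sync"
)

// capturedByReference builds closures that all refer to the same variable v.
// Every closure reports the value v holds when it runs, not the value it had
// when the closure was created.
func capturedByReference() []int {
	var v int
	var readers []func() int
	for i := 1; i <= 3; i++ {
		v = i
		readers = append(readers, func() int { return v })
	}
	out := make([]int, 0, len(readers))
	for _, read := range readers {
		out = append(out, read())
	}
	return out
}

// sumWithArgument passes each number to the goroutine as an argument, so
// every goroutine works with its own copy.
func sumWithArgument(numbers []int) int {
	var (
		wg  sync.WaitGroup
		mu  sync.Mutex
		sum int
	)
	for _, n := range numbers {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			mu.Lock()
			sum += n
			mu.Unlock()
		}(n)
	}
	wg.Wait()
	return sum
}

func main() {
	fmt.Println(capturedByReference())                 // [3 3 3]
	fmt.Println(sumWithArgument([]int{1, 2, 3, 4, 5})) // 15
}
```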
Also, there is no guarantee that the loop will end before one of
the goroutines starts working. It leads to the fact
that the value in the sum variable is not 15. This
is such a common problem that the Go maintainers decided to change the semantics
of for loop variables to prevent them from being unintentionally
shared between closures and goroutines at every iteration. In version
1.21 a corresponding experiment
appeared, and since version 1.22 this
problem has completely stopped reproducing. But since version
1.22 is fresh, and probably not everyone
has managed to update to it, take note of this feature of how closures
work.

Next we will talk about
the sync and atomic packages.
In the previous example, we used sync.WaitGroup to wait for
the goroutines to execute. And by the way, we did
it wrong. Admit it, who didn't notice? It is
worth paying attention to where we call WaitGroup.Add and thinking
about what the risks are. To figure it
out, let's look at the WaitGroup struct. In the WaitGroup
structure there are interesting things: a semaphore and
a certain noCopy. First, let's talk about the semaphore.
Or more precisely, about the fact that essentially WaitGroup
is a simple wrapper over a semaphore with three methods.
Add increases the semaphore value by the passed value, Done
decreases the semaphore value by one, and Wait blocks
execution until the semaphore value becomes zero.
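Given these three methods, a minimal sketch of a usage in which the counter is raised before any goroutine can reach Done and before Wait is called. The runWorkers function and the finished counter are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runWorkers starts n goroutines and returns how many of them had finished
// by the time Wait returned.
func runWorkers(n int) int64 {
	var wg sync.WaitGroup
	var finished int64

	wg.Add(n) // raise the counter in the launching goroutine, before Wait
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done() // lower the counter by one
			atomic.AddInt64(&finished, 1)
		}()
	}
	wg.Wait() // blocks until the counter is back to zero
	return atomic.LoadInt64(&finished)
}

func main() {
	fmt.Println(runWorkers(10)) // 10
}
```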
So the problem is that we call Add inside the goroutines we launched, and
there is no guarantee that the goroutines start before Wait
is called. This means Wait might finish before Add
runs. And since goroutines can launch in any
order, we might wrongly assume that they are all done when
some haven't even started. Now let's talk about
the noCopy field in the WaitGroup structure.
It's a type that suggests it can't be copied.
We will find a similar field in many structures of the sync
package. Let's see what happens if we do a copy of it.
In this program we have a Counter structure that stores a map as
well as a mutex, which is supposed to protect the map from parallel writing.
The mutex, just like WaitGroup, has noCopy.
There are two methods defined on the Counter structure. One increases
the value of a specific key by one,
and the other increases the value immediately by
the passed value. Finally, there is a main
in which we initialize the Counter structure
and launch two goroutines to increase the value of the same key,
sleep to wait for the goroutines
to complete, and print the values that will end
up in the map of the counter. But unfortunately we
will never see the print, because we will fall into a panic.
The problem with this code is that whenever Increment is
called, our counter c is copied into it,
since Increment is defined on the type
Counter, not pointer to Counter. In other words,
it is a value receiver, not a pointer receiver.
Therefore Increment cannot change the original counter variable
that we created in main. Thus, with each call to
Increment, the counter was copied with all its contents, including the mutex.
Now remember that a mutex is essentially just a wrapper over
a semaphore, and when we copy it, we also copy the
semaphore. In this case, the copy and the original can live their
own separate lives, and nothing will prevent them from competing
for operations on the same memory block.
Therefore copying a mutex is incorrect.
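A sketch of the counter with pointer receivers, so that every call locks the same mutex. The field names, the Get method and the use of sync.WaitGroup in place of the sleep are assumptions; the slide code is not part of the transcript.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter guards a map with a mutex.
type Counter struct {
	mu     sync.Mutex
	values map[string]int
}

// Increment uses a pointer receiver, so every call locks the same mutex.
// With a value receiver (func (c Counter) ...), each call would lock its own
// copy of the mutex while still writing to the shared map.
func (c *Counter) Increment(key string) {
	c.IncrementBy(key, 1)
}

// IncrementBy adds delta to the value stored under key.
func (c *Counter) IncrementBy(key string, delta int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values[key] += delta
}

// Get returns the current value stored under key.
func (c *Counter) Get(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.values[key]
}

func main() {
	c := &Counter{values: make(map[string]int)}

	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				c.Increment("key")
			}
		}()
	}
	wg.Wait()

	fmt.Println(c.Get("key")) // 2000
}
```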
So thanks to this very noCopy it is possible to
mark any structure as impossible to copy.
Many structures from the sync package are marked as such.
Then, using the go vet command, you can detect places where the marked
structure is copied and find
a potential problem in your application code. Now let's
move on to another common synchronization primitive: atomics.
They provide safe access to shared memory for reading, writing and modifying
variables. In addition, atomic operations are generally faster than
mutex operations due to the use of a specific set of processor instructions.
However, with this advantage also comes a disadvantage that
is periodically forgotten. Operations with
atomics are atomic individually, but not atomic collectively.
Let's look at an example.
In this program, a goroutine continuously increases a variable num by one
in an endless loop. Meanwhile, in the main function there is another
infinite loop that checks if num is even, and if so,
prints it on the screen. However, we see that
the number 287 is displayed, and surprisingly
it's odd. This occurs because after num passes the
parity check, its value isn't protected from further changes.
Consequently, the goroutine incrementing num can alter its
value before it's printed to the screen.
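One way to keep the check and the use consistent is to load the value once and work with that local copy. This is a sketch under assumptions: evenSnapshot is a hypothetical helper, and the loops are bounded here so that the program terminates.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// evenSnapshot loads num exactly once and checks that same value, so the
// parity check and the later use cannot disagree.
func evenSnapshot(num *int64) (int64, bool) {
	v := atomic.LoadInt64(num)
	return v, v%2 == 0
}

func main() {
	var num int64
	var stop int32
	done := make(chan struct{})

	// Writer goroutine: keeps incrementing num until asked to stop.
	go func() {
		defer close(done)
		for atomic.LoadInt32(&stop) == 0 {
			atomic.AddInt64(&num, 1)
		}
	}()

	odd := 0
	for i := 0; i < 100000; i++ {
		if v, ok := evenSnapshot(&num); ok && v%2 != 0 {
			odd++ // never reached: v is a local copy nobody else can change
		}
	}

	atomic.StoreInt32(&stop, 1)
	<-done
	fmt.Println("odd values that passed the even check:", odd) // 0
}
```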

And next we will talk about another cool Go
concept called defer. Defer allows you to defer execution
of a block of code until the end of the function in which it was
called. It is typically used to ensure that resources
are freed, for example by closing a file or unlocking a mutex,
regardless of whether the function exits due to a
normal return, panic or error.
Consider an example. Here we see the Profile
structure and several possible types for it, as well as the GetBalance
method, in which, depending on the profile type, one or another balance
calculation method is selected. Let's say now we want to
add a log with the final balance obtained during the calculation.
As a result of such a log, we will always see the entry profile
balance zero. Why is this so?
Let's take a closer look at what is written about defer in
the language documentation. The arguments to the
deferred function, which include the receiver if the function is a method, are evaluated
when the defer executes, not when the call executes.
In our example, at the time the defer executes, the default value
in the balance variable is zero. This is the value with which our print
is executed. In order to achieve the result we wanted to get,
that is, for the final calculated amount to appear in the print,
we can use a concept that we have already met: closures. An
anonymous function does not accept any arguments.
The balance variable is used within the body of this
function. As we discussed earlier, a reference to this variable will
be stored, and the actual value will be retrieved when the anonymous function
is executed using the stored reference.
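Both variants can be compared in a small sketch. The Profile type is left out for brevity, and logBalance, calculate, balanceEager and balanceClosure are hypothetical names. Log lines are collected in a slice so that the order is easy to see.

```go
package main

import "fmt"

// logs collects log lines so the example can show what was recorded.
var logs []string

// logBalance records the balance it was given.
func logBalance(balance int) {
	logs = append(logs, fmt.Sprintf("profile balance: %d", balance))
}

// calculate is a hypothetical stand-in for the real balance calculation.
func calculate() int {
	return 100
}

// balanceEager defers the call directly: the argument is evaluated when the
// defer statement executes, while balance is still zero.
func balanceEager() int {
	balance := 0
	defer logBalance(balance)
	balance = calculate()
	return balance
}

// balanceClosure defers an anonymous function: balance is read only when the
// closure runs, after the calculation.
func balanceClosure() int {
	balance := 0
	defer func() { logBalance(balance) }()
	balance = calculate()
	return balance
}

func main() {
	balanceEager()
	balanceClosure()
	fmt.Println(logs) // [profile balance: 0 profile balance: 100]
}
```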
Now it's time to talk about interfaces.
Interfaces in Go provide code flexibility by allowing you
to write generic functions that can work with different data types
that implement the same interface. However, not everything
is smooth with them. Let's look at an example.
Here we see the Requester interface and the
ConcreteRequester type, which implements the MakeRequest
method of the interface. In addition to it, we see
the MakeRequester constructor with a return value of the
interface type, and finally we see main, in which the
constructor is called and several prints occur.
Unexpectedly, when starting, we get the following output:
got requester nil, and the requester is not nil.
It turns out interesting. To figure it out, we need
to take a closer look at interfaces, or more precisely,
at how they are arranged under the hood.
Under the hood there are two structures for interfaces:
eface for an empty interface, and iface
for an interface with a defined set of methods
that the type must adhere to. We are interested
in the common fields, namely the concrete data type
stored in the interface and the reference to the memory location
where its value is stored. For two interface
variables to be considered equal, both of these fields must match.
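The program described above can be reconstructed roughly like this. The method signature and the body of the constructor are assumptions, and MakeNilRequester is a hypothetical addition for contrast: it returns an interface in which neither the type nor the value is set.

```go
package main

import "fmt"

// Requester is the interface from the example.
type Requester interface {
	MakeRequest() int
}

// ConcreteRequester implements Requester with a pointer receiver.
type ConcreteRequester struct{}

// MakeRequest is a placeholder implementation.
func (r *ConcreteRequester) MakeRequest() int {
	return 0
}

// MakeRequester returns a nil *ConcreteRequester wrapped in the interface:
// the value is nil, but the interface still carries the concrete type.
func MakeRequester() Requester {
	var r *ConcreteRequester // nil pointer
	return r
}

// MakeNilRequester returns an interface with neither type nor value set.
func MakeNilRequester() Requester {
	return nil
}

func main() {
	r := MakeRequester()
	fmt.Printf("got requester %v of type %T\n", r, r)
	fmt.Println(r == nil)                  // false
	fmt.Println(MakeNilRequester() == nil) // true
}
```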
Now let's see what exactly lies in these fields
for our requester variable.
Yes, this is where the problem comes from. Despite the fact
that the actual value of the variable is nil, the type
is not, which leads to the fact that the requester is not equal
to nil.

And next we will talk about vendoring peculiarities.
Let's assume that you have created a library in Go in
which some network request must be made. Inside this library
you have implemented a certain client that can make requests, receive some
data in response, and return it in the form of a structure
described in the models. Let's try to
integrate this library into a service. We added it to
the go.mod file and ran go mod tidy and go mod vendor
in the console. However, after inspecting the vendor directory,
we were surprised to find only a portion of the library's
files and folders present. For those who have not studied
how vendoring works, and in my experience this is more than half of
developers, this will seem strange. Well, for answers
we go to the language documentation, and
again everything falls into place. The vendor directory
receives only those packages that are necessary for the successful
build and testing of the application. That is, if we
initialize a client from the library somewhere in the service in which we connected
this library, the packages required for this will be pulled in.
In itself, this situation may seem just like an unexpected
feature of the language, but in fact it is a subtle hint
about bringing the implementation of the logic of going
to an external service inside the library. This is not
the best idea, because in this way we increase the
coupling of the logic and reduce the capabilities of
consumer services in terms of customizing the interaction
of the library with external services.

And that's all. We have taken a close look at several common mistakes
when programming in Go and discussed how you can avoid them.
I hope this brief excursion through the complexities of slices,
channels, goroutines and other aspects helped you strengthen
your knowledge of the language and provided valuable insights.
But don't forget that Go as a language does not stand still
and is constantly evolving. So I wish you to
develop with it, and I hope that you liked this talk. Thank you.