Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi everybody, my name is Jesus Espino. I'm a software engineer at Mattermost, and I'm going to talk about struct embedding, instrumentation and code generation.
Well, what is Mattermost? Mattermost is a communication platform. We write the backend in Go, we write the frontend in TypeScript and React, and we are focused on security and performance. We are an open source project with an open core model: we have a self-hosted version, we provide features to deploy on the cloud, like Kubernetes operators and things like that, and of course we have our own SaaS service.
Well, what are the main pieces, and why am I talking to you about Mattermost? Because I'm going to explain something that we did here at Mattermost. This is the Mattermost architecture. We have the client, a React/TypeScript application, that calls the API and the WebSocket API. The API and the WebSocket API call the app layer, which is where our business logic lives. Our app layer leverages a set of services to provide the final functionality. One of these services, for example, is the file service, which allows us to store files in S3 or in the local file system, or the email service, which allows us to send email notifications and all that stuff.
The important piece here is the store. The store service is an abstraction that provides all the storage mechanisms: database access, database storage, database queries, all that stuff is inside the store. The app layer doesn't know anything about SQL, doesn't know anything about how the data is actually stored. The app layer only knows that the store is going to take care of the entities, store them, and return them whenever they are needed.
Well, what does our store look like? Our store is an interface, a huge interface that has a lot of superstores. Each superstore has a single responsibility over a certain part of the data. For example, the team store takes care of the team model: how it's stored in the database, how we query the teams, all that stuff. Then we have the user store that does the same for the users, the bot store for the bots, and so on. This is how it looks in the code. We have the store interface, which has a set of methods that each return a superstore interface. In this case, the team superstore interface has a set of methods related to the team model.
If we want to implement this interface, we have to implement each of these methods. So our SQL store, the store implementation that we have, implements each of these methods by accessing the database using SQL. But we would be able to build a completely different store using MongoDB or any other database.
What's the problem that we are trying to solve? We want to add caches to our system, to our store, but we don't want to mix responsibilities in the same code. We want a very clear separation of concerns, and we decided to build something completely separated. We don't want to see cache checking, cache invalidation and cache insertion in our SQL-related code. We want our SQL code to generate the queries and query the database, and we want the cache logic in another place: where you insert things in the cache, retrieve things from the cache, invalidate the cache. All that logic should be separated.
Well, our initial approach was to use a well-known pattern, the middleware pattern. We created a new set of interfaces and structs to implement this pattern and, well, the result wasn't easy to understand. This is how it looked. We had the SQL store that implements the store interface that we already saw. The SQL store also had to implement a new layered store supplier interface. The layered store supplier has a SetChainNext method that sets the next element in the chain of middlewares, and a Next method that provides the next middleware, which is responsible for the rest of the logic. The cache layer implements only the layered store supplier. And then we have the layered store, another struct that holds the database store (the SQL store), the cache layer, and any other layer that you want to add. Then we have a set of superstores that are either overridden and delegated to the layered store, or delegated directly to the SQL store. This approach worked, but it's not easy to understand and not easy to reason about. Well, this is how it looks in the code. I don't want to explain it much; it's more or less the same as what we saw in the previous slide.
What went well and what didn't work? What went well is the middleware pattern itself. It's something well known, something that is kind of easy to think about, because you already know the concept of a middleware and how it's expected to work. So from the concept perspective, it was really easy to understand. Also, we had the opportunity to provide extra information without affecting the layers beneath. For example, we were able to add hints to the cache layer, allowing the app layer to pass certain extra information, certain context, to decide if we want to cache something or not, or if we want to invalidate the cache or not. This is because in the app layer we have way more context and we have the big picture of what we want to do with the data. In the store we only know that we are adding a new team, we are removing a team, we are adding people to certain teams, or something like that. But we don't have the big picture.
We don't know why we are adding that, and we don't know if, right before that, we added something else and we don't need to cache anything, or we don't need to invalidate the cache, or whatever. So that was an interesting thing to have, but we weren't really using it, so it was a great feature that we weren't taking advantage of. What didn't work well: it was a bit hard to understand and follow all the code there, and at the same time it was a bit hard to add new caches, because you had to modify different places. We had to modify the cache layer, we had to modify the SQL layer, and we had to modify other parts. The layered store was complicated to add things to, and there was a lot of code in a lot of different places, so it was really error prone and wasn't the best approach in terms of maintenance.
Well, our current approach: what we use now is struct embedding. Instead of creating all this middleware logic with all these layers and all that stuff, we take advantage of the struct embedding feature of Go to create these layers: a store is going to be embedded in another store. So we can create a layer that embeds the store, and it is automatically a store because it's embedding a store. This is a great feature of the language, and you can build these kinds of layers really easily. Well, we rely on the existing interface, this store interface. We removed the layered store, the layered suppliers, all that stuff is gone, and we rely only on the existing store interface.
We created this local cache store that embeds the other store, the SQL store, in it, and we override the methods that we need; everything else is transparent. Well, this is how it looks, way simpler, right? The SQL store implements the store. We write all the methods, we write all the SQL code, we write a lot of stuff for the SQL store, but that is going to be needed anyway. For the cache layer, we embed the store, so automatically the cache layer, without any method in it, implements the store. We only need to override the places where the cache needs to take some action. For example, whenever I add a new post, I'm going to cache certain information. Or whenever I add a new user, I'm going to cache certain information. Whenever I get some user, I'm going to cache that information. Whenever I modify the user, I have to invalidate that cache. So we only need to find the places where we need to modify and update our cache, and override them. And that is what we do.
We create this local cache store that embeds a store. Any store can be embedded, but let's think about it as the SQL store. So we have the local cache store and we embed the SQL store in it. Then we have a set of superstores that are the specific cache implementations. In this case it's the team superstore. We override the method that gets the team store; in this case it returns its own implementation of the team superstore. And how do we implement this superstore? The superstore embeds, again, a team store, any team store, but we are going to think about it as the SQL team store. Then we add the root store, a private attribute that we are going to use to share some data and some methods, and it's going to be just the local cache store instance.
Finally, we need to add methods to this superstore. This method, for example, is the Get method. The Get method checks the cache and returns the cached value if there's a hit. If there's not a hit, it takes the embedded store and uses whatever is there; it uses the store underneath, so the SQL store. It gets the data from the SQL store, checks if there's any error, and if there's no error, it caches that information and returns the result.
Well, what went well and what didn't work? One of the things that went really well was the simplicity of the solution. This is very simple, very straightforward, a very clear pattern. It's really easy to understand, really easy to think about, and really easy to think about other things we can build with it. Also, adding new caches was super straightforward: it was overriding methods and just delegating everything else to the store underneath. You don't need to think about adding code in three different places. You only have to add code in the cache layer, and that's all you need to do.
It's a really general approach, so it can easily be reused for other things, as we are going to see soon. What didn't work well: there are some subtleties around struct embedding. Struct embedding is struct embedding; it is not inheritance. You are embedding a struct in another struct. Think about it as something you could do manually: you have a struct and you embed something inside that struct. Go provides some syntax sugar to make struct embedding more comfortable, and allows you to call methods from the embedded struct on the parent struct. The parent struct can override those methods, can define those methods, and if the methods are defined, they are called on the parent; if they are not defined, they are called on the embedded struct.
But there is the problem. Whenever you call a struct that embeds another struct, if the method is not defined, it's going to call the underlying method, the embedded struct's method. And once you call the embedded struct's method, it doesn't know anything about the parent. The context of that method is the embedded object, so there's no information about the parent at all. So if that method calls another method of the struct, it doesn't matter if you override that method on the parent, because you are in the context of the embedded struct; you are always going to call the methods of the embedded struct. That means this can lead to some subtle errors that are really hard to track down and really hard to find. But if you really know what struct embedding is, it's really easy to avoid them. So one of the problems is these subtle errors that can happen. You have to be sure that your team knows what struct embedding means, and for sure knows that struct embedding is not inheritance.
Okay. Another problem is that the interface has to be homogeneous. That means some flexibility is removed: it doesn't allow you to add these kinds of hints, or specific parameters for certain layers. All that is the price that you have to pay to have a homogeneous interface that you can wrap in layers.
But this was just the first solution. We built this for the cache, and it went really well, actually. But we started thinking: well, we have this new layers architecture that we can leverage for other things. For example, we can leverage it for instrumentation, to add instrumentation to our store without modifying anything in the store, just having the instrumentation in a well-defined layer and separating all that logic from the rest of the store. It's a great separation of concerns. You can have logging, if you want to log all the actions that you are doing, or log specific actions in specific places. Auditing, for example, if you want to audit when something gets accessed, removed, modified or something like that.
Well, something that is really interesting is storage and query delegation. For example, you have your SQL store that stores things in SQL. SQL is great, but it's not the best option for every single problem out there. Sometimes you want to store unstructured data; sometimes you want to store data where losing it over time is not so important, or where it's not necessary to be 100% sure that the data is stored and 100% consistent. For example, some temporary data related to the status of the user, whether the user is typing something, or the last channel that the user viewed, things like that. That information is important for our users, but it's not critical. So you can leverage some in-memory database, you can leverage a search-specific engine like Elasticsearch or Bleve, or you can store unstructured data in CouchDB or MongoDB, for example, which can give you certain performance improvements or certain extra features for certain usage patterns. Well, we can also add extra validation: if we want to be sure that certain things stay consistent in the database, we can add extra validation in a layer. We can add extra error handling. For example, if you have an unreliable network connection to your database, there can be some timeouts or some network connection problems, and maybe you want to handle that at the store level and abstract the app layer from all the logic needed to retry on a database timeout, or retry certain situations under certain errors; or you want to track certain kinds of errors and store that information in Sentry or something like that.
We started with instrumentation.
We added the timer layer. The timer layer is just a layer that wraps every single method in the store, adds a timer, and calculates how much time it takes to execute the query in the store. Yeah, it wraps everything with an almost identical method. So this is a lot of code, and a very annoying kind of code to write, and then the maintenance of that is really boring, error prone and complicated. So: generators to the rescue. We were writing the same thing, one method after another, a lot of times, without a good reason. Well, Go provides generators, and we are going to use them to build this timer layer.
This is an example of a method wrapped in the timer layer, in this case the Save method of the audit store. We start the timer, we execute the underlying store call, and we calculate the elapsed time, the time that has been spent in that method. Then, if metrics are enabled, we check whether the query succeeded, and whether it succeeded or not, we store that information in Prometheus.
This is really great because it helps a lot to investigate bottlenecks. We have all the information on how much time it takes to execute every method in our store. This is a histogram, so we have the average time, and we have information like how many times each method has been called, so we know how much time it takes in a cumulative way. So we can decide: okay, this method is called just a few times, but it's taking a lot of time each time; that is something that we have to handle. But at the same time you can think: oh, this method is really fast, but it's getting called millions of times, so if you are able to improve the performance there, you are getting a very important performance improvement. Sometimes the time taken by each call of a method is not that important, and what matters is the total time, not the time per call. All this information is in Grafana and we can explore it, and we can set alerts on it. So we can decide, for example, that if a method's execution time increases by 10% within a certain period, we trigger an alarm and send an email saying: okay, this method got degraded on that date, maybe because you upgraded to a new version. Maybe that degradation is acceptable, or is explained by some necessary changes in the code, but you don't degrade it without noticing.
The other thing that we did is adding OpenTracing. OpenTracing is great and gives you a lot of information about what is going on in your system. But adding OpenTracing means that you have to add a lot of small details here and there in your code, and that was something we didn't want to do, because we don't want to contaminate all our methods with OpenTracing information. So what did we do? We created a layer that is almost the same as the timer layer, but for OpenTracing. We also replicated that in other places. We use OpenTracing in the API, using the middleware of the API, so that was already covered. And then we had to add OpenTracing to the app layer. The app layer is a big structure that has a lot of methods; well, that is the way we organize those methods. So what we did is automatically generate an interface that matches that structure, and with that interface we created the layer for the app, using code generation again. So now, whenever we change something in the app layer, or whenever we change something in the store, we only have to execute the code generation and it generates all the OpenTracing code for us. We don't have any OpenTracing-related code in the app layer, and we don't have any OpenTracing-related code in the rest of the store. We only have that information in the specific set of auto-generated code. Well, this is how it looks in the code: in the OpenTracing layer method, we set the OpenTracing information, we execute the underlying method in the store, we add more information to the OpenTracing span, and that's it.
Okay, the retries: the retry layer in the database.
We want to use the serializable isolation level in the database, and that has a problem. When you use read committed, basically you try to execute the queries and most of the time it's going to work pretty well; when the load is pretty low, it's really hard to even see a transaction failing there. But when you are using the serializable isolation level, the isolation is so high that whenever you run two transactions, and one of them modifies certain data while the other one is querying some part of that modified data, one of them is going to fail. But it's not going to fail in the sense that the query is broken or something like that. The database is just saying: okay, I'm not able to execute this transaction because something was modified in the meantime, so you need to execute the transaction again. That is what a retryable error means in the database. So whenever the database returns a retryable error, it means: retry, it's probably going to work, you only need to retry it. But a transaction is not something the database can re-execute automatically by itself, because you are able to do things between the queries and do calculations inside the transaction, so it's not easy for the database to infer that the transaction is repeatable on its own. You need to repeat the transaction from the outside. Well, because we need to repeat the transactions whenever we receive a retryable error, that was pretty easy to do with a layer. We just automatically generate a layer that catches any retryable error and tries again. Well, this also helps us with deadlocks. Whenever a deadlock happens in the database, one of the transactions is going to succeed and the other is going to get killed with a retryable error. That is something that happens really rarely, but it can happen in very loaded environments. And what was happening before is, well, the store returned an error to the app layer, which returned an error to the API, and probably the client would retry again. Now we retry directly in the SQL store.
This is how we did that. For example, in this case we have the Get method. We enter a loop and try to execute the query. If that works, great. If it doesn't work and it is not a retryable error, I'm going to return the error. But if it is a retryable error, I'm going to try again; I'm going to repeat and repeat until it succeeds or it fails three times. After three times we give up and return an error.
What is really interesting here is that we have the timer layer, we have the OpenTracing layer, we have the retry layer, and all those layers are auto-generated. Everything that we change in the store is going to be automatically up to date with just a make generate. That is awesome. So if you have this kind of code, it's really great to have generators.
And how do we do that? We use the AST to analyze this store interface and all the sub-interfaces, and we build a data struct where we have all the superstores that are defined, all the methods of the superstores, all the parameters of the methods, all the return values of the methods. All that information goes into a new struct, we pass that information to a template, and that template generates the code. We have different templates: we have the same AST code that analyzes the store, and then we use that same structure to populate three different templates, one for the timer layer, one for the OpenTracing layer, and one for the retry layer. Those templates generate a certain amount of code, and on top of that we use the go/format package to reformat it. Why do we use the go/format package? Because we don't want to be super correct when we generate the code. Generating the code is already a complicated task, and generating code that gofmt likes is even harder. So we just delegate that to go/format: we generate the code and reformat it with the go/format package, so the developers are happy and the Go compiler is happy.
This is an example of the timer layer template. As you can see, we range over the superstores, we range over the methods of the superstores, and we generate the functions there. We generate the start := time.Now() and all that stuff. We are generating all the code there.
It's not easy to understand, but once you write it, it keeps working: this has been working really well for a long time with almost no maintenance. Okay, but not everything can be automatically generated, so we had to build something else. I already talked about the storage and query delegation pattern, and in this case we used it to build the search layer. For the searches in Mattermost we use full-text search in the database, but we also support other search mechanisms like Elasticsearch or Bleve. If you want to use Elasticsearch or Bleve, what we do is just add a search layer on top of our SQL database layer, and every search in the store is delegated to Elasticsearch or Bleve. Every action on the store that needs to update the indexes executes an update of the index in Elasticsearch or Bleve. And any time you try to search for something, it's going to hit Elasticsearch or Bleve, but it's not going to hit the database. So you are probably going to have better performance, because you are searching in a search-specific backend; actually we have more features and a better search using these engines than with the database one. And you are going to free some database cycles for other stuff. So that is another interesting thing.
Well, we want to make this transparent to any store user, like the app layer. If the app layer is using the store, it doesn't need to know whether it is using Elasticsearch or Bleve or something like that. It only needs to know that it is searching for users, and if Elasticsearch is enabled, the search is handled by Elasticsearch.
But the app layer doesn't need to know anything. Well, this time we created the layer writing the code by hand, and here is an example. In this case we are talking about the post store. We override the Save method of the post store, and we just save the post using the SQL store underneath. If there is an error, I do nothing; but if there's no error, I'm going to index that post, I'm going to update the index of that post in Elasticsearch or Bleve. If I search for a set of posts, I'm going to check which engines are enabled and try to search in those engines. If one of the engines fails, I'm going to try the next one until I find an engine that works. If none of our Elasticsearch or Bleve engines works, we fall back to the database search. We can disable this fallback, and if we disable it, the search returns an empty list. But if we don't disable the fallback, we just call the underlying SQL store to return the results.
This works well: if you have, for example, downtime in Elasticsearch, you can just use the database search as a fallback.
And this is the final onion. This is how it looks in our system. We have the app calling the store, passing through all the layers down to the SQL store, and going back through the layers again to the app. The SQL store is at the bottom, taking care of all the SQL queries and all that stuff. The retry layer takes care of the retryable errors. The cache layer caches things, invalidates the caches, and takes care of maintaining and using the cache. The search layer takes care of maintaining and using the search indexes in Elasticsearch or Bleve. The timer layer takes care of all the timing, collecting information about the times and sending it to Prometheus. And it's not here, but optionally you can have the OpenTracing layer. The OpenTracing layer is optional because it has an important performance impact, so we can enable and disable it, and usually it's disabled. But if you enable it, it wraps the store entirely and provides that information to the OpenTracing service.
This is how we build the onion. We instantiate the SQL store, we wrap that in the retry layer, we wrap that in the local cache layer, we wrap that in the search layer, and we wrap that in the timer layer. And finally we return that final store. Because everything there implements the store interface, we can just say that they are all stores. The SQL store is a store, the retry layer wrapping a SQL store is a store, and the SQL store wrapped by a retry layer and wrapped by a local cache layer is a store.
We can reorganize all this and change where the layers are. For example, I can move the timer layer right after the SQL store, and that way we measure only the time that the SQL store is taking. If you consider that the cache layer is contaminating the data, because you are interested in how much time the database is taking and you don't care about how much time the store is taking in general, only about the database, you can move the timer layer there. You can even create another timer layer and have both kinds of information, the SQL store timing and the whole store timing. You can play with this concept of everything being a store, to move the layers around and make decisions about how we set up the layers. And disabling any of these layers is just not adding the wrapping. So if you want to enable or disable the search layer, you can just decide by a config setting whether you want to have a search layer or not, and if not, you just don't wrap the store with that layer, and that's it.
Well, there are some drawbacks. As I said already, all the layers have to share the same interface. That is a problem because you don't have enough flexibility to add certain things, like the hints for the cache, without modifying the whole store. You have to modify the whole store interface if you want to add these hints for the cache; and if you want to add other kinds of hints for the search, or other kinds of extra information for OpenTracing or for timing, you have to add more and more information to the store interface. And that is something that doesn't scale well.
So I think this is the price to pay: you have to accept that you have to use the same interface if you want to build this layer-based approach. Probably there are some tricks that you can try, but it's not something that, by design, is going to fit well.
Then the other problem is that embedding is not inheritance. It is not a problem per se, but it's something that can generate problems if people don't understand it well. The team that is touching the store needs to understand that embedding is not inheritance; embedding is struct embedding. So you need to understand well how embedding works, to not end up with weird bugs that are really hard to debug.
Well, some references. If you want to see how we implemented the store, the store layers, the generators and all that stuff, it's publicly available in our mattermost-server repo, in the store directory. If you want to see our old version, with the middlewares and all that stuff, you can check version 5.0; it's a bit old already, but it can be interesting. If you want to know more about struct embedding, there's a really interesting talk from GopherCon UK, and if you want to know more about code generation, there's another really interesting talk from GopherCon UK too. So thank you.