Conf42 Rustlang 2023 - Online

Turning smart contracts into indexers with cross-compilation in Rust

Abstract

In this talk I describe how functional programming patterns and conditional compilation in Rust can result in the same code being compiled to both native and wasm targets. This has applications in some blockchain-powered software because smart contracts can be reused as indexers.

Summary

  • Michael Birch is a senior Rust engineer working at Aurora Labs. His goal is to show how patterns in functional programming can be used in Rust to write code that is easier to test, maintain, and reuse in multiple applications. He hopes this talk will be both accessible and interesting to blockchain developers, Rust experts, and functional programming enthusiasts.
  • A blockchain is an append-only data structure with immutable history. Once something has been added to the blockchain, it cannot be changed. A smart contract is a program that can run in the virtual machine of whatever blockchain you're talking about.
  • Getting data off of the blockchain is slow because it requires multiple network calls. Indexers help address this by having a specialized view of the state which is optimized for particular queries. Because a smart contract and its indexer are so closely related, the idea is to build both from a single Rust code base.
  • Aurora is an Ethereum scaling solution built on top of the NEAR blockchain platform. Aurora claims to be compatible with many tools in the Ethereum ecosystem, so it must expose the same RPC interface that Ethereum nodes do. All of the code is available on GitHub, it's all open source.
  • On Ethereum, calls to other contracts are synchronous, but on NEAR they are asynchronous, so they are another platform-specific effect to abstract over. You can write business logic as pure, reusable code by abstracting away platform-specific effects, and Rust can accomplish this using traits and type generics along with a little bit of conditional compilation.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello and welcome to my talk. My name is Michael Birch. I'm a senior Rust engineer working at Aurora Labs. I've been working with Rust since 2018, but before that I was doing functional programming in Scala. I love how Rust is an imperative programming language but takes inspiration from some of the best features of functional programming, some of which we'll talk about today. My goal with this talk is to show how patterns in functional programming can be used in Rust to write code that is easier to test, maintain, and reuse in multiple applications in your tech stack. In particular, I'm focusing on how we applied these ideas at Aurora to do what it says in the title: turn our main smart contract into an indexer. I hope this talk will be both accessible and interesting to blockchain developers, Rust experts, and functional programming enthusiasts, though each group may find different parts of the talk more or less informative. In terms of an outline, I'll begin by getting everyone up to speed on the basics of blockchain technology, only what you need to understand what I mean when I'm talking about smart contracts, indexers, and how they relate to one another. Next, I'll talk about the key features of the Rust programming language that enable us to do this smart-contracts-into-indexers trick. And finally, we'll do a deep dive into some real code that showcases how these ideas were applied at Aurora. So, to get started, let's talk about blockchain basics. What is a blockchain? A blockchain is an append-only data structure with immutable history. What I mean by this is the data structure can only have additional data added to it, and once something has been added to the blockchain, it cannot be changed. Up on the slide, we have sort of the classic depiction of a blockchain. The squares in the diagram are the so-called blocks, which represent the individual chunks of data in the data structure, and the arrows show how the blocks are connected to one another.
Hence blockchain. You can imagine this data structure extending infinitely to the right by adding a new block with an arrow pointing to the previous block. As I mentioned, the history is immutable because the connections actually depend on what data came before them, so you cannot change that earlier data and still maintain that connection. This is cryptographically secured, but the details are not important for what we're talking about today. So you might ask, what goes into these blocks? And the answer is transactions. For typical blockchain applications, you have transactions that are interpreted in some kind of virtual machine, which causes some kind of global state to change. For example, in the case of Ethereum, which is a relatively large blockchain platform you may have heard of before, the state consists of all of the accounts and contracts that exist on the platform, and the virtual machine is the Ethereum Virtual Machine, the EVM. Transactions can be things like send three ETH to some address, or call some method on a contract that was previously deployed. And speaking of smart contracts, let's talk about those. A smart contract is a program that can run in the virtual machine of whatever blockchain you're talking about. Up on the slide, I've got sort of a simple example where in the first transaction you deploy some contract, it gets included into the state, and then in a later block you have a new transaction that is calling a method, in this case the method foo, on that contract. Smart contracts can do all kinds of things. They're limited only by the power of the virtual machine that they operate in. For example, smart contracts can represent tokens, maybe even tokens that are tied to some real-world currency, so-called stablecoins. Smart contracts can be marketplaces where you can trade between those different tokens, or they can be escrow lockers to facilitate some kind of exchange.
Maybe even in the real world. Smart contracts can be validated storages where you can prove that off-chain events happened, say for example in the leaderboard for blockchain-enabled games. Or a smart contract could even be the entire virtual machine of some other blockchain platform. And this might sound crazy, but it's an important part of this story that we'll come back to later. For now, let's keep going with our blockchain basics. All of this is just about a single copy of the blockchain and how it evolves. But in a real blockchain platform, the blockchain is distributed and continuously built by multiple participants that are decentralized, the so-called nodes of the network, and the transactions are submitted by users. The nodes eventually agree on the blockchain that they're all collectively building, up to some point, using a consensus algorithm. Again, the details are not very important for what we're talking about here, but suffice it to say the blockchain is only eventually consistent, and getting data about the blockchain off of the network is slow because of the decentralized nature. So how do users actually get the data off of the blockchain? They interact with some kind of RPC. Usually that RPC is run by a third-party service, but users can choose to run their own nodes that are a part of the network and still expose an RPC interface to themselves. The key point here is that getting data off of the blockchain is slow because it requires multiple network calls, and even still, because of the decentralized nature, it's only eventually consistent. So something that happens in one node may take a long time to propagate and be visible through any given RPC for a particular user. So to help address that problem, we have this idea of an indexer. Given the wide range of functionality of smart contracts, it's not always efficient to query a blockchain platform directly against its RPC.
There's just no way for a single RPC to capture all of the different possible variations on the states that different smart contracts could hold. And indexers help address this by having a specialized view of the blockchain state which is optimized for particular queries. For example, take a block explorer like Etherscan, which is the biggest block explorer for the Ethereum blockchain. It'll show you all of the tokens that are held by each user. Up on the slide, I've taken a screenshot of Etherscan where I've selected some random account, and you can see it holds 23 different kinds of tokens. This kind of query would be very difficult or maybe even impossible to do with just the RPC, because the mapping is actually reversed on chain. Each token knows the list of balance holders, but there is no index for all of the tokens held by each of the addresses. So to solve that problem, Etherscan has an indexer which maintains that reverse lookup and makes the queries efficient when a user visits their website. So the way to think about how indexers fit into the story here is you have some user talking to some application UI, which is talking to an indexer, and the indexer is fed data from an RPC and is maintaining its own internal specialized view of the state. Indexers, like I've been saying, help create low-latency experiences for users, or Web2 experiences, in the sort of blockchain jargon. Since the indexer needs to populate the database from blockchain data, it needs to understand what state of the smart contract is important. Therefore, smart contracts and their indexers are always related to one another. So an idea we might have is to have a single source code, a single code base, that contributes both to our smart contract and our indexer. Again, because they're so closely related to one another, this would be great since it would mean less maintenance. There's just less code overall to maintain.
And even better still, it would make our indexer more powerful than your average indexer, because it would have access to all of the smart contract's logic, and it would be able to do things like simulate an entire transaction off chain, and that gives you free low-latency feedback to users about potential errors. So this is what we're aiming for. This is our idea. How are we going to accomplish that using Rust? So there are a few key concepts that we want to take a look at. First of all is different compilation targets. Up on the slide I've got links to the official Rust documentation if you want to take a deep dive and read more about this yourself. But at a high level, you have the Rust compiler giving you some kind of executable output from your code. In the typical case, it's an executable for whatever operating system you're running on, Linux, Windows, Mac, and maybe even specialized to the specific hardware of your machine, say it's ARM or x86. But there are more executable targets than just these. In particular, WebAssembly is an important one for this story, because the blockchain platform that Aurora uses has WebAssembly as its virtual machine. So on the slide I include the commands that you would use in Rust to install the WebAssembly compilation target, and then how you would select that target to build your project. Now, given that you can have these different targets, it's useful to be able to write code that selects, or is specialized to, a particular compilation target. Conditional compilation allows us to do this. Again, I've got a link up on the slide which you can do some further reading on if you're interested. But long story short, the syntax looks like the example I've got up there. So we have some function foo, and the implementation of foo depends on whether we're compiling to Wasm or not. Again, this is useful because it lets us make specific choices that only work on that architecture.
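The slide itself is not reproduced in this transcript. As a minimal sketch of the idea, assuming the commonly used wasm32-unknown-unknown target (the talk does not name the exact target), installing and selecting the target would typically be "rustup target add wasm32-unknown-unknown" followed by "cargo build --target wasm32-unknown-unknown". The function name platform_name below is hypothetical and stands in for the foo on the slide; only one of the two definitions is ever compiled.

```rust
// Hypothetical stand-in for the `foo` example described in the talk.

/// Compiled only when the target architecture is 32-bit WebAssembly.
/// In a real contract this branch could call host functions.
#[cfg(target_arch = "wasm32")]
pub fn platform_name() -> &'static str {
    "wasm"
}

/// Compiled for every other target (Linux, Windows, macOS, ...).
/// In a real indexer this branch could use the disk or a database.
#[cfg(not(target_arch = "wasm32"))]
pub fn platform_name() -> &'static str {
    "native"
}
```

Callers simply write platform_name() and get whichever body matches the target being built, which is also why the two branches tend to be awkward to read side by side.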
So in the case of WebAssembly, you have this notion of host functions, which we'll talk a lot about later. Host functions are functions that the host, that is, the computer running your WebAssembly virtual machine, exposes for the Wasm module to use, whereas in a normal application those don't exist. But in a normal application you do have access to disk, for example, so you could use that to access storage. Now, the drawback of conditional compilation is that it's a little verbose, and it makes the implementation a little hard to read, right? You can see the function foo maybe looks a little awkward because there are these two totally separate branches. Additionally, because you have these separate branches, it's a little tedious to use inside of an IDE. Your IDE will usually only show you one branch of the conditional compilation at a time, so you have to sort of switch back and forth between them when you're trying to write code for it. But fortunately, we can minimize the amount of conditional compilation that we're doing by abstracting most of our code over the platform-specific effects. And to do that, we have this idea in Rust called type generics. The whole point of a type generic is you write a function, in this case get_balance, which takes in some type, but we don't care what it is. We only care that it has a particular interface, and that interface is defined using a trait, and indicated in the syntax there using a so-called trait bound. So in the angle brackets there, I: IO says that whatever this type I is, it must implement the IO interface defined by the IO trait. In this example, I'm keeping it pretty simple with just the read and write methods, and the type signatures are maybe a little suspect. You can imagine maybe that there should be some kind of Option or Result involved in the read, but for the sake of example, let's suppose this is what it looks like.
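The slide code is not part of this transcript, so the following is only a sketch of the shape being described, under stated assumptions: the exact signatures of read and write, the big-endian 16-byte encoding, and the BTreeMap stand-in backend are all illustrative choices, not the code shown in the talk.

```rust
use std::collections::BTreeMap;

/// Storage interface with deliberately simple signatures (an assumption; as the
/// talk notes, a real one would likely return an Option or Result from `read`).
pub trait IO {
    fn read(&self, key: &[u8]) -> Vec<u8>;
    fn write(&mut self, key: &[u8], value: &[u8]);
}

/// Business logic written once against the interface, not a concrete backend.
/// The trait bound `I: IO` is all this function knows about its storage.
pub fn get_balance<I: IO>(io: &I, address: &[u8]) -> u128 {
    let bytes = io.read(address);
    let mut buf = [0u8; 16];
    // In this sketch, anything other than exactly 16 bytes reads as zero.
    if bytes.len() == buf.len() {
        buf.copy_from_slice(&bytes);
    }
    u128::from_be_bytes(buf)
}

/// Hypothetical stand-in backend so the sketch can run anywhere.
impl IO for BTreeMap<Vec<u8>, Vec<u8>> {
    fn read(&self, key: &[u8]) -> Vec<u8> {
        self.get(key).cloned().unwrap_or_default()
    }

    fn write(&mut self, key: &[u8], value: &[u8]) {
        self.insert(key.to_vec(), value.to_vec());
    }
}
```

A WebAssembly build and a native build would each supply their own implementation of IO, while get_balance stays unchanged.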
The point is that when you have this type generic, you don't care what the specifics of the implementation are; you can still make use of those methods in the implementation of your function. So in the case of get_balance, we're imagining reading some bytes out of the state and interpreting them as a u128 value. So this get_balance function is now totally generic over anything that implements IO. And in particular, we can imagine implementing this for WebAssembly as well as native code. And these would look a little bit different because of the different implementations that are possible in those targets. So for example, on the indexer side maybe our implementation would be reading from a database on the disk, while on the WebAssembly side it would be calling a host function like I mentioned before. But either way, after we've done those implementations, the get_balance function is now available and identical in both situations. So this code is generic and reusable in both cases. Now, of course, this particular example is a little bit silly, because get_balance is just a one-liner function, so you would probably imagine it wouldn't be that big of a deal to duplicate that logic in the two places instead of jumping through all of these type generic hoops. But this idea scales, right? It doesn't matter how complicated the get_balance function is, and at some point it's complicated enough that you don't want to have to duplicate it in both places. So this is the whole point of the functional programming patterns. Let's take a step back and just talk about the big ideas here, because this really isn't about just Rust or just blockchain. This is programming style in general. The core concept is that when you're writing code with environment-specific effects, like reading and writing from state, you can abstract over them in your core business logic to make it pure.
That is to say, it doesn't depend on the particular environment, it doesn't depend on any compilation-target-specific effects, and you get there by factoring those things out, pushing them into a trait, and only having those trait implementations as the boundaries of your application. The result is most of your code is easier to test and easier to maintain, because it can be tested using in-memory structures, right? Just different implementations of those traits that don't even actually do the effects. So in the case of storage, you could have just a simple hash map in memory that you're reading and writing for your tests, but still be able to test all of your business logic code, because again, it is agnostic to the details of that implementation, while in the production implementation you're actually using a real database. And again, all of the code that is being used with that database has still already been tested in this in-memory environment. The other nice thing about all of this is this code is easier to reason about. You don't have to worry about whether there are going to be random side effects in particular parts of the code, because they're all explicitly given in the type signatures. You can't do IO without having the IO trait present if you're writing code that is generic over all possible compilation targets. So whenever you see a piece of code that doesn't have the IO trait on it, you know it's not doing any IO. And conversely, when you do see a function that has the IO trait, you know that some state is being used by that function. This is something that you might have heard of before as the principle of least authority, where the idea is that code should only have access to the minimum amount of capabilities that it actually needs to function. So code that doesn't need IO doesn't have IO, and similarly for other effects. Finally, this code is easier to reuse, and that is kind of the whole point of this talk.
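To make the testing and least-authority points concrete, here is a minimal sketch under stated assumptions: MemoryIo, balances_after_transfer, load, and transfer are hypothetical names invented for this illustration, and the Option-returning read is an assumed variant of the interface. The function without an IO bound provably cannot touch state, while the one with the bound can be exercised entirely against a hash map.

```rust
use std::collections::HashMap;

pub trait IO {
    fn read(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn write(&mut self, key: &[u8], value: &[u8]);
}

/// Test double: a hash map in memory stands in for the real database.
#[derive(Default)]
pub struct MemoryIo(HashMap<Vec<u8>, Vec<u8>>);

impl IO for MemoryIo {
    fn read(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.0.get(key).cloned()
    }

    fn write(&mut self, key: &[u8], value: &[u8]) {
        self.0.insert(key.to_vec(), value.to_vec());
    }
}

/// No `IO` bound: the signature itself shows this function cannot touch state.
/// Returns `None` on insufficient funds or overflow.
pub fn balances_after_transfer(from: u128, to: u128, amount: u128) -> Option<(u128, u128)> {
    Some((from.checked_sub(amount)?, to.checked_add(amount)?))
}

/// Reads a balance; missing or malformed entries count as zero in this sketch.
fn load<I: IO>(io: &I, who: &[u8]) -> u128 {
    let mut buf = [0u8; 16];
    if let Some(bytes) = io.read(who) {
        if bytes.len() == buf.len() {
            buf.copy_from_slice(&bytes);
        }
    }
    u128::from_be_bytes(buf)
}

/// `IO` bound: this function reads and writes state, and says so in its type.
/// Returns whether the transfer was applied.
pub fn transfer<I: IO>(io: &mut I, from: &[u8], to: &[u8], amount: u128) -> bool {
    if from == to {
        // Moving funds to oneself changes nothing; only check affordability.
        return load(&*io, from) >= amount;
    }
    let (a, b) = (load(&*io, from), load(&*io, to));
    match balances_after_transfer(a, b, amount) {
        Some((new_from, new_to)) => {
            io.write(from, &new_from.to_be_bytes());
            io.write(to, &new_to.to_be_bytes());
            true
        }
        None => false,
    }
}
```

A production build would swap MemoryIo for a database-backed or host-function-backed implementation without touching transfer.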
Once your code is written in this abstract way, you can use the same code for any different application that you need. So in the case that we're talking about, it's a smart contract and an indexer, but you can imagine that the same kind of thing would work, say, if you had a web application and a mobile application. You could have the same Rust code base at its core and compile it to WebAssembly that gets used as part of a web app, or down into mobile code that's used in your mobile app as well. And again, it would be the same idea where, if you're abstracting over the effects, you don't need to worry about whether you're talking to a JavaScript wrapper in the browser or an Android operating system, for example. But all of that aside, let's get back to our main topic here. So at Aurora, we made use of these ideas, like I've been saying, to make an indexer out of a smart contract. So what is Aurora? Aurora is an Ethereum scaling solution built on top of the NEAR blockchain platform. I've got a couple of links up on the slide if you want to read more about that and dig deeper. But for today's story, the part that is most important is that Aurora's core product is this EVM, the Ethereum Virtual Machine we talked about before, but it's deployed as a smart contract on top of NEAR. NEAR uses WebAssembly as its blockchain VM, which means it's super powerful. Rust compiles to it, no problem, like we saw earlier. And you can write an implementation of the EVM in Rust and compile it to WebAssembly. Now you have it as a smart contract on NEAR, and that enables this whole scaling solution where now you're using NEAR to run the EVM transactions instead of Ethereum itself. So you're benefiting from NEAR's better scaling, the sharding and consensus and all this stuff. But again, that's out of scope for today. The most important detail to continue our story is that Aurora claims to be compatible with Ethereum tooling, including the MetaMask wallet and the Hardhat developer tool.
And if these are not familiar to you, don't worry about it. Suffice it to say, these are tools that are very common in the Ethereum ecosystem, and they rely on a particular RPC, right? They rely on the RPC that Ethereum nodes would expose. So for Aurora to be compatible with them, it must also expose that same RPC interface. So there's sort of an obvious implementation of how to do this. We know that this thing is deployed on NEAR, and NEAR has its own RPC. So you can imagine a proxy which translates Ethereum RPC requests into the corresponding NEAR RPC request that's talking to the Aurora contract, and then translates the response back. But as we've mentioned, this is quite slow. It involves multiple network hops, and so you may end up with a higher latency than you want. The other issue is some of the RPC requests on Ethereum are fairly beefy. You can ask an Ethereum RPC node to simulate entire transactions, and if you're trying to do that on a network node inside of WebAssembly, it can add latency on top of the simple network latency. So getting to the main point here, the way we can solve this is by writing an indexer that is actually able to function as this Ethereum RPC. So again, because it needs to do this whole transaction simulation, it needs to have all of the functionality of the Aurora contract baked into it, and we're going to use the exact methods that we've just been talking about to make this possible. So all of this code is available on GitHub, it's all open source. I've got a link up on the slide if you ever want to go and take a look at it yourself. But we'll take a look at a few snippets and get a sense of what's going on here. So the code that's up on the screen right now is the real code out of the actual Aurora engine. So here's the actual implementation of get_balance that I sort of gave a toy example of earlier. It's pretty similar to the toy example, with a few extra error handling things going on.
And similarly for set_balance. Again, notice the IO type generic, right? We are using a read and a write on a type whose concrete value we don't actually know. We only know its interface, and we can compose these together into more complicated functionality. So the add_balance function uses both get and set together to add an amount to a balance. But again, you can see the IO trait is present so that it can be passed along to the other smaller functions. So let's look at what the implementation of this IO trait actually looks like. So this is for the WebAssembly side. Notice that the main struct we're interested in is just a singleton struct. It's just called Runtime, as in the NEAR runtime, as in it is running inside of Wasm, and it's a singleton because it doesn't actually have any fields or state that are important. The implementation comes from the host functions that the NEAR runtime exposes to us. So you can see in this implementation we're calling this storage read function. It's all fairly low level and the details aren't super important. The point is that we know that this host function exists and we can call it, and that gives us access to the storage. And the same thing on the write side: there's a host function and we can call it. We're passing in some pointers because it has to do with the host reading out values from the Wasm memory and writing the value into the actual underlying storage that the NEAR node has. But again, details aside, it is implemented using these host functions and it satisfies the interface that we've defined. Now on the other side, the indexer side, this is native code, and it's a little more complicated. The last field there is a database handle to a RocksDB database, because that's where the storage is actually going to be located in the case of the indexer. And you'll see a few other fields as well, because we want there to be additional features that the RPC implementation has as compared to just the smart contract.
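The real functions live in the open-source Aurora engine and are not reproduced in this transcript. As a rough sketch of the composition being described, assuming an Option-returning read and a hypothetical BalanceError type (neither is taken from Aurora's code), add_balance can be built from get_balance and set_balance while the IO bound is simply passed along:

```rust
use std::collections::HashMap;

pub trait IO {
    fn read(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn write(&mut self, key: &[u8], value: &[u8]);
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum BalanceError {
    /// The stored bytes were not a 16-byte integer.
    Corrupt,
    /// Adding the amount would exceed `u128::MAX`.
    Overflow,
}

pub fn get_balance<I: IO>(io: &I, address: &[u8]) -> Result<u128, BalanceError> {
    match io.read(address) {
        // An address that was never written to holds nothing.
        None => Ok(0),
        Some(bytes) if bytes.len() == 16 => {
            let mut buf = [0u8; 16];
            buf.copy_from_slice(&bytes);
            Ok(u128::from_be_bytes(buf))
        }
        Some(_) => Err(BalanceError::Corrupt),
    }
}

pub fn set_balance<I: IO>(io: &mut I, address: &[u8], value: u128) {
    io.write(address, &value.to_be_bytes());
}

/// Composes the two smaller functions; nothing is written if an error occurs.
pub fn add_balance<I: IO>(io: &mut I, address: &[u8], amount: u128) -> Result<(), BalanceError> {
    let current = get_balance(&*io, address)?;
    let updated = current.checked_add(amount).ok_or(BalanceError::Overflow)?;
    set_balance(io, address, updated);
    Ok(())
}

/// Stand-in backend for running the sketch (hypothetical).
impl IO for HashMap<Vec<u8>, Vec<u8>> {
    fn read(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.get(key).cloned()
    }

    fn write(&mut self, key: &[u8], value: &[u8]) {
        self.insert(key.to_vec(), value.to_vec());
    }
}
```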
So for example, we needed to have access to the whole history of the state, not just the state at any given moment in time. The reason is that when a user is accessing the Ethereum RPC, they can actually specify previous blocks to simulate their transaction against. And similarly, because we are interested in the potential of simulating transactions without actually committing them to the state, we need all of the changes to happen in memory instead of being eagerly written to the database. So that's what the transaction diff and output parts of the structure are for. So this is what the read storage implementation looks like. Again, the details are not super important. What matters is that this is satisfying exactly the same interface, but the implementation details are different. So in this case we are taking a look at the in-memory changes first, again because they weren't committed to the database, and if it's not present in there, that is, the key hasn't been written before in this transaction, then we have to go back and look at the database. And the database has an iterator in it, because we are going backwards through the history to find the key-value pair that's relevant. Similarly, on the write storage side, same point, right? It implements the interface that we've defined, but in this case it's writing to this in-memory diff instead of writing to the database, or in the case of WebAssembly, calling a host function. And you might be thinking, "You know, Michael, there are WebAssembly standards that tell you what interfaces should look like for interacting with storage. You don't necessarily have to jump through all these hoops." But what's very cool about this is that it isn't just about storage, it's about any kind of environment-specific effect. Storage is one that we commonly think about. But in the blockchain world there are other effects that we care about as well, and they have analogies in other contexts too.
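The indexer's actual storage code uses RocksDB and is not shown here. The read-the-diff-first, then-walk-back-through-history behaviour can be sketched with standard collections, under loud assumptions: SimulationIo and History are hypothetical names, and a BTreeMap keyed by (storage key, block height) is only a stand-in for a versioned database.

```rust
use std::collections::{BTreeMap, HashMap};

pub trait IO {
    fn read(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn write(&mut self, key: &[u8], value: &[u8]);
}

/// Hypothetical stand-in for a database that keeps every historical value:
/// entries are keyed by (storage key, block height at which it was written).
pub type History = BTreeMap<(Vec<u8>, u64), Vec<u8>>;

/// Simulates a transaction at a chosen block height without committing it.
pub struct SimulationIo<'db> {
    history: &'db History,
    block_height: u64,
    /// Uncommitted writes made during the simulated transaction.
    diff: HashMap<Vec<u8>, Vec<u8>>,
}

impl<'db> SimulationIo<'db> {
    pub fn at_height(history: &'db History, block_height: u64) -> Self {
        SimulationIo { history, block_height, diff: HashMap::new() }
    }
}

impl<'db> IO for SimulationIo<'db> {
    fn read(&self, key: &[u8]) -> Option<Vec<u8>> {
        // Writes made earlier in this simulated transaction win.
        if let Some(value) = self.diff.get(key) {
            return Some(value.clone());
        }
        // Otherwise take the newest historical entry at or before our height.
        let oldest = (key.to_vec(), 0u64);
        let newest = (key.to_vec(), self.block_height);
        self.history
            .range(oldest..=newest)
            .next_back()
            .map(|(_, value)| value.clone())
    }

    fn write(&mut self, key: &[u8], value: &[u8]) {
        // Only the in-memory diff changes; the database is left untouched.
        self.diff.insert(key.to_vec(), value.to_vec());
    }
}
```

Because the history is only borrowed immutably, dropping a SimulationIo discards the simulated changes, which is the property the transaction simulation relies on.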
So for example, the environment variables are things like what block height are we executing in, who signed this transaction, what time is it? All this kind of stuff is set in the environment of the transaction's execution and is available in the WebAssembly contracts from host functions. But we still need to be able to access that stuff when we're simulating the transaction locally. So we have a trait that exposes all of the things that a user could potentially ask for, and we have a separate implementation for the indexer. In that implementation, the information comes out of the actual block that we're consuming. Or in the case of a user-submitted simulated transaction, we just fill in default values for these. But you could also imagine an implementation where these were literally environment variables, as in system environment variables, right? Say we were writing some kind of CLI application to interact with this instead of making it an RPC. Then you could read environment variables as part of that command line call to fill this in as well. And that would be another implementation, which again our core business logic is agnostic to, but which opens up the opportunity for making these other applications. Similarly, there's this idea of promises, which is actually a NEAR-specific concept. On most blockchains, well, specifically on Ethereum, calls to other contracts are synchronous. They happen all in the same transaction and all at the same time in the same block. But on NEAR, calls to other contracts are asynchronous. This has to do with the sharding model. Again, the details aren't super important, but suffice it to say, this is another kind of effect that exists in the WebAssembly smart contract context but doesn't exist in the case that we're running an indexer, right? Because the indexer only looks at the Aurora smart contract itself, it doesn't index the whole NEAR environment.
So it's actually not able to simulate these cross-contract calls, but we can still factor them out as a trait and put in no-op implementations for the case of the indexer. And this also works fine because most transactions on Aurora are contained within the EVM. Like I mentioned, the EVM tends to be this synchronous, all-in-one-block type of transaction. So most of the time we don't actually need these, but sometimes we do, and therefore we need the code to handle it, and we're still able to factor it out and do it sort of "for real", in quotation marks, in WebAssembly and just as no-ops in the indexer, and have it also work totally fine. And all of this builds up to huge, complicated functions, right? So this is just the interface of the submit function. This is sort of the main entry point for users when they send an EVM transaction to Aurora. And so you can see we've composed together all these different kinds of effects. We've got the IO for storage access, the environment variables that are present, and the possibility of calling other NEAR contracts. Now, I've left the implementation off, because again the details aren't really what's important here, but the point is that this is a relatively large, relatively complicated function with many lines of code, and having to duplicate it between both the indexer and the smart contract would have been too big of a maintenance burden and wouldn't have made sense. But because we can write this abstract code that is reusable for both, it enables us to have this indexer as an RPC. So as a final point here, this is what the actual RPC code looks like, or part of it anyways. So there's a particular RPC method for Ethereum RPCs called eth_estimateGas. This is one of those transaction simulation RPC methods that I've been referring to. And this is only a snippet, it's not the full implementation, but these are the main points of what's going on.
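The interface of the real submit function is not reproduced in this transcript. The sketch below only illustrates the shape of composing three effect traits in one signature; every name in it (Env, PromiseHandler, NoopPromises, FixedEnv, Receipt, the account "token.example", and the toy body of submit) is an assumption made for illustration and is not Aurora's API.

```rust
use std::collections::HashMap;

pub trait IO {
    fn read(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn write(&mut self, key: &[u8], value: &[u8]);
}

/// Execution context: who signed, and at which block height.
pub trait Env {
    fn block_height(&self) -> u64;
    fn signer(&self) -> String;
}

/// Asynchronous calls to other contracts.
pub trait PromiseHandler {
    fn call_contract(&mut self, account: &str, method: &str);
}

/// Indexer side: cross-contract calls cannot be simulated, so do nothing.
pub struct NoopPromises;

impl PromiseHandler for NoopPromises {
    fn call_contract(&mut self, _account: &str, _method: &str) {}
}

/// A handler that records requested calls, standing in for the "real" side.
impl PromiseHandler for Vec<(String, String)> {
    fn call_contract(&mut self, account: &str, method: &str) {
        self.push((account.to_string(), method.to_string()));
    }
}

/// Fixed values, as one might use when simulating a user-submitted transaction.
pub struct FixedEnv {
    pub block_height: u64,
    pub signer: String,
}

impl Env for FixedEnv {
    fn block_height(&self) -> u64 {
        self.block_height
    }

    fn signer(&self) -> String {
        self.signer.clone()
    }
}

#[derive(Debug, PartialEq)]
pub struct Receipt {
    pub nonce: u64,
    pub block_height: u64,
}

/// Toy entry point (NOT Aurora's real `submit`): bumps a per-signer nonce and
/// requests a cross-contract call when the payload starts with 0xff.
pub fn submit<I: IO, E: Env, P: PromiseHandler>(
    io: &mut I,
    env: &E,
    promises: &mut P,
    payload: &[u8],
) -> Receipt {
    let key = env.signer().into_bytes();
    let mut buf = [0u8; 8];
    if let Some(bytes) = io.read(&key) {
        if bytes.len() == buf.len() {
            buf.copy_from_slice(&bytes);
        }
    }
    let nonce = u64::from_be_bytes(buf).saturating_add(1);
    io.write(&key, &nonce.to_be_bytes());
    if payload.first() == Some(&0xff) {
        promises.call_contract("token.example", "on_exit");
    }
    Receipt { nonce, block_height: env.block_height() }
}

/// Stand-in storage backend for running the sketch.
impl IO for HashMap<Vec<u8>, Vec<u8>> {
    fn read(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.get(key).cloned()
    }

    fn write(&mut self, key: &[u8], value: &[u8]) {
        self.insert(key.to_vec(), value.to_vec());
    }
}
```

The same submit body runs with NoopPromises (the indexer case) or with a handler that actually does something, which is the reuse the talk is describing.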
We have to set up the structs that implement the traits that we've set out, right? So this is the environment variables. You can see we're setting default values because it's just fixed. In this case it's just a simulation. And then we also have the engine access struct that we saw on an earlier slide, and we're accessing it at a particular block height and transaction position, because again the user can specify those things in their request. And the actual implementation of this, which I've left off the slide because again the details aren't important, is a closure that accepts this IO object. So now it's got access to the env object and the IO object, and it can call methods like submit. And again, the point is that it's not duplicating any of that complicated logic that exists in our functional programming generic core. It just calls the method submit and then does something with the result. So in conclusion, at the highest level, the thing that I want you to take away from this talk is that you can write your business logic in a way that uses pure, reusable code by abstracting away platform-specific effects. And as a nice side effect, this code is actually easier to test and to maintain. In this particular talk, we were using Rust as our programming language of choice, and Rust can accomplish this using traits and type generics along with a little bit of conditional compilation. And the specific example of this that we talked about today was in the blockchain context, where Aurora has the EVM smart contract that is able to also be an indexer and share a code base, and this enables their low-latency RPC for Ethereum compatibility. But as I mentioned before, there are other possible applications for this, for example, using the same code in web and mobile applications. I'm sure you can come up with other examples based on your own fields of interest.
Maybe you're working on embedded devices and you want to share a code base between the embedded version and, say, the simulated version of your hardware, et cetera. Again, the main point is that functional programming patterns enable you to reuse code in a way you may not be able to otherwise. Thank you all for your attention. I've got some links up on the slide here if you want to read more about NEAR or Aurora. If you want to learn more about me, you can get in touch with me directly on Telegram at BirchMD. Or if you want to take a look at other code that I've worked on, you can check out my GitHub under that same username. And I also have a blog that I occasionally write to on my website, typedriven.ca, and I write mostly about Rust and NEAR, but some other stuff occasionally as well. Thank you for your time. See you later.

Michael Birch

Senior Software Engineer @ Aurora Labs
