Conf42 Rustlang 2023 - Online

Using Rust for Numerical Applications

Abstract

Do you wish there was a better way to harness the power of modern hardware without sacrificing development speed? Would you rather sell kombucha than use CMake again? Look no further! In this talk we will explore the world of Rust for numerical applications and its flourishing ecosystem.

Summary

  • In this talk I will give you some guidelines to develop efficient Rust code for numerical applications. Together with its ownership model, Rust makes a superb language for numerical applications. But what I think is Rust's killer feature is the tooling.
  • Writing numerical applications that deal with the real world is difficult. I firmly believe that intelligence is just a robust methodology to recursively improve upon my stupidity. Before optimizing, we want to profile and benchmark; I will discuss those two things in the next slides.
  • The development of the MVP goes through baby steps. Focus on writing correct and easy-to-read code; then we can work on optimizing it. Finally, I totally recommend using a battle-tested library. It is up to your application whether it is fast enough.
  • Perf is a performance analysis tool for Linux. We can visualize perf output using flame graphs, which show where time is spent in our program. Once the profiling is done, we would like to do some benchmarking; the perfect tool for that is Criterion.
  • There are three complementary strategies that we can use for testing. You can ask your large language model to generate unit tests for you. Another thing you can try is a property-testing framework, something like proptest. And look for edge cases.
  • In computing, a rounding error (or round-off error) is the difference between the result a given algorithm produces with exact arithmetic and the result the same algorithm produces with finite-precision, rounded arithmetic. Floating points cannot represent all real numbers exactly, so there are always rounding errors. If you are into numerical applications, it is really important to have a look at what numerical analysis says about the algorithms you are going to use.
  • For anything beyond trivial, three-line algorithms, we should use a third-party library. For really well-known algorithms, I think it is much better to join forces with another community and help maintain the most efficient implementation available.
  • Some libraries that I use every day are fantastic for numerical applications: they cover array manipulation, statistics, and linear algebra. With that, I have covered some of the crates that I consider a must-have for numerical applications.
  • Most numerical algorithms are going to be implemented in Python. Maturin is a tool that facilitates the creation of Python modules that wrap Rust code. By exposing our APIs and libraries to the Python community, we are going to accelerate their calculations. I think this is a win-win situation for the Rust and Python communities.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
In this talk I will give you some guidelines to develop efficient Rust code for numerical applications. Whether you are starting from scratch or improving on top of an existing application, I hope this talk gives you an overview of the general workflow for developing performant applications, harnessing the power of Rust.

A little bit about me: I'm a scientifically minded software engineer with a background in scientific simulations. Currently I develop autonomous trading systems in Rust and Python. During my career I have implemented numerical applications in C, Haskell, Fortran, Python, and Rust. In this talk I don't intend to compare one programming language to another. Instead, I would like to give you my distilled experience, specifically what I have learned about numerical implementations in Rust.

Numerical implementations are usually associated with applications that require a lot of speed. In fields like science, engineering, finance, and math, we need to perform a large number of calculations and simulations. We want to understand how proteins work. We want to know as soon as possible whether the seasonal rain discharge is going to put us underwater. We want to know whether to buy or sell, and not in a couple of months. So we have several constraints on the calculations we can perform, because we have limited human and computing resources. Our goal is then to run as fast as possible, squeezing all the performance that we can from our hardware.

Now, why Rust for numerical applications? As I mentioned before, we are looking to run as fast as we can, and Rust is blazantly fast. Together with its ownership model, that makes it a superb language for numerical applications. But what I think is Rust's killer feature is the tooling. Having Cargo take care of compiling and linking is just amazing. If you have ever developed numerical applications in other programming languages like Fortran or C++, you probably had to deal with CMake, and if you compare CMake and Cargo, you realize that Cargo is light years ahead. So I really recommend Cargo, and Rust's tooling in general, for numerical applications. Another great reason to use Rust is its growing ecosystem: having a strong open source community providing the right tool for the job is a way to build better software.

Now, we all know that the real world is complex, and writing numerical applications to deal with the real world is difficult. So I would like to share with you my philosophical mindset for dealing with this complexity. I firmly believe that intelligence is just a robust methodology to recursively improve upon my stupidity. It doesn't matter how much we prepare for a given project, whether we know the right algorithms and carefully design the architecture: there are always going to be issues that we discover a posteriori. We can either be eternally frustrated about it, or we can try to recursively improve on our mistakes. I think keeping an open mind helps us keep our peace of mind and build great software.

With this prelude, let us dive into numerical applications. The general workflow could be as follows. We start from a correct but probably slow version of a numerical workflow. This initial version could be handed to us by the research team who developed it in Python. Then we implement an MVP in Rust, and we decide whether this MVP is fast enough for our use case.
If it's not, we would like to optimize. But before optimizing we want to profile and benchmark, and I would like to highlight those two things in the next slides.

Let's start with the MVP. I like to see the development of the MVP as going through baby steps. First, use Clippy. If you don't know what Clippy is, it's a linter for Rust that suggests rewriting your code using more performant patterns. If we don't have the best code, it will basically say: please replace this function with another one, or change the code in a specific way. So I totally recommend using Clippy. Then, do not fight the borrow checker. We humans are terrible at knowing which part of the code takes most of the time, so before you start throwing references left and right, or trying to remove clones from the code, just focus on writing correct and easy-to-read code. Then we can work on optimizing it. Finally, I totally recommend using a battle-tested library, especially for performance-critical operations. I will come back to the third-party library subject in a bit.

So we have the MVP, and it's time to decide whether it's fast enough. Well, that is up to your application. Maybe it is, or maybe it is not but you can throw hardware at it. Great. Or maybe you would like to squeeze the most out of it, so you want to do some optimizations. But before doing any optimizations, we need to profile, then benchmark, then optimize.

So why profile? As we mentioned before, we humans are terrible at the guessing game of where in the code we spend most of the time. Unless you work professionally developing compilers, I just advise you to measure. For measuring we can use tools like perf, which will help us figure out where in the code we spend most of the time. Perf is a performance analysis tool for Linux that allows us to collect and analyze various performance-related data, and as we will see next, we can visualize perf output using flame graphs.

So what are flame graphs? Flame graphs are used to visualize where time is spent in our program. How do they work? Every time our program is interrupted by the OS, the location of the current function and its parents is recorded. This is called stack sampling. These samples are then processed in such a way that common functions are added together. The metrics are then collected in a graph showing the call stacks, where each level of the stack is represented by a horizontal bar. The width of a bar corresponds to the amount of time spent in that function, and the bars are arranged vertically to create a flame-like shape. What that means is that the main of our program is at the bottom, the libraries we call are in the middle, and our innermost functions are on top. It's very important to realize that the x axis does not indicate time. What we really care about is the width of the bars: the width indicates where most of the CPU time is spent in that particular call. So perf flame graphs give us an indication of which functions and which libraries are consuming most of the time.
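For concreteness, here is a minimal sketch of that profiling workflow on Linux. The binary name myapp is a placeholder, and the cargo flamegraph subcommand comes from the third-party flamegraph crate; the talk does not prescribe these exact commands:

    # Keep debug symbols in release builds so perf can resolve function names:
    #   [profile.release]
    #   debug = true
    cargo build --release

    # Record stack samples (DWARF-based call graphs), then inspect the report.
    perf record --call-graph dwarf ./target/release/myapp
    perf report

    # Alternatively, produce flamegraph.svg in one step.
    cargo install flamegraph
    cargo flamegraph --bin myapp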
Now, once the profiling is done, we would like to do some benchmarking. The reason for benchmarking is that we need to know, with confidence, what the real impact of changing the code is going to be. We don't want to make random changes and hope for the best; we want to be able to measure the impact of changing a certain part of the code. And the perfect tool for that is Criterion. Criterion is a benchmarking library that provides statistical confidence on the size of performance improvements or regressions. It allows us to see, when we change something, how much it affects the overall performance of the code. First I would like to show you how it works, and then I would like to show you some metrics from its output. In this toy example I use Criterion's macros together with a linear algebra library that implements a convolve function for one-dimensional convolution. If you don't know what a one-dimensional convolution is, do not worry about it. The idea is that we ask Criterion to run this 1D convolution on arrays of length 100. We generate these arrays with elements that are random numbers drawn from a Gaussian distribution. Criterion runs this function many times: first it warms up, and then it runs the function many times. Then we save a baseline, so that if we make some changes, we can compare the actual performance against that baseline.
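As a reference point, a minimal Criterion benchmark along these lines might look as follows. The convolve_1d implementation and the names are illustrative assumptions rather than the talk's exact code, and for brevity the inputs are uniform samples instead of Gaussian draws:

    // benches/convolve.rs -- executed with `cargo bench`
    use criterion::{black_box, criterion_group, criterion_main, Criterion};
    use rand::Rng;

    // Stand-in for the 1D convolution under test.
    fn convolve_1d(signal: &[f64], kernel: &[f64]) -> Vec<f64> {
        let n = signal.len() - kernel.len() + 1;
        (0..n)
            .map(|i| kernel.iter().zip(&signal[i..]).map(|(k, s)| k * s).sum())
            .collect()
    }

    fn bench_convolve(c: &mut Criterion) {
        let mut rng = rand::thread_rng();
        let signal: Vec<f64> = (0..100).map(|_| rng.gen()).collect();
        let kernel: Vec<f64> = (0..5).map(|_| rng.gen()).collect();
        c.bench_function("convolve 1d, n = 100", |b| {
            b.iter(|| convolve_1d(black_box(&signal), black_box(&kernel)))
        });
    }

    criterion_group!(benches, bench_convolve);
    criterion_main!(benches);

Baselines are managed through Criterion's command-line flags: cargo bench -- --save-baseline before records one, and cargo bench -- --baseline before compares later runs against it.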
Now that we have an idea of how to use Criterion, let us talk a little bit about optimizations. I think the three most important optimizations we can perform on a code base are these: first, make sure that you pick the right algorithm; once you know that, make sure that you pick the right algorithm; and finally, ask someone else to check that you have picked the right algorithm. What this means is that you need to do your math homework, and if you don't feel confident about your math, you need to chase your favorite mathematician for some help. Once you are pretty sure you have chosen the right algorithm, you can apply other optimizations. For example, preallocate vectors: there are functions like with_capacity that preallocate a vector with a given capacity. Or use a non-cryptographic hash algorithm for your hash maps. By default, HashMap uses a hashing algorithm that is strong enough to resist adversarial inputs, which means it takes some time to compute the hashes of the keys. Because numerical applications are usually far away from user input, we want to run the hashing algorithm as fast as we can, so I totally recommend using a non-cryptographic algorithm; I will point to an actual library implementing one in a bit. If you want to know more, I recommend having a look at the Rust Performance Book.

Now let's go back to benchmarking plus optimizations. We are trying to optimize our convolve 1D function. First we run Criterion without changing any code: it runs the function many times, warms up, and sets a baseline. Then we make some modification to the code and run it again. We see that this new modification yields a meager 2.5% improvement. Okay, this seems to be going in the right direction, so let us try something different: another change in the code. But unfortunately, this time there are no optimization gains.

I would like to tell you something that is really important about benchmarking: benchmarking assumes that you are running on an isolated machine. If you run a benchmark on the local machine where you are developing, with a browser, Spotify, and a lot of other applications running, then merely stopping Spotify will show up as, say, a 15% increase in performance. So you actually need to run on a machine that is not doing anything else; otherwise you are going to see spurious performance gains. When you are benchmarking, use another machine, a VM, somewhere you can access that runs only the calculations you are trying to benchmark.

Okay, so now we know about benchmarking, we know how to optimize, and we know what we are targeting. The moment has arrived to test. I think there are three complementary strategies that we can use for testing, so let us review them. First, you can ask your favorite large language model to generate unit tests for you. I'm not going to discuss whether AI is going to take over the world; I'm simply saying that large language models are fantastic for unit testing, so use them to generate tests. Second, you can look for edge cases. By edge cases I mean inputs to your model or your workflow that can trigger numerical instabilities. Third, you can try a property-testing framework, something like proptest.

I think proptest deserves its own explanation, so let me give you a little example of how it works. Proptest is a property-testing framework inspired by Hypothesis from Python. It allows you to test that certain properties hold for your code for arbitrary inputs, and if something fails, it tries to create a minimal test case that tells you on which specific input your property does not hold and why it fails. In this example, I have two functions that find the maximum and the minimum of a slice of floats. Using proptest, we set up a test that generates random vectors with lengths from zero to 1000, filled with floating points; then, using the proptest macro, we check that the property holds. What is the property? We want to know whether the min value is smaller than the rest of the slice, and whether the max value is larger than the rest of the values in the slice. So proptest basically helps you set up your tests in such a way that you can generate a bunch of random inputs and check that the property of your interest holds.
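A minimal sketch of such a property test, with hypothetical min_of/max_of helpers standing in for the functions from the talk:

    use proptest::prelude::*;

    // Hypothetical helpers: minimum and maximum of a slice of floats.
    fn min_of(xs: &[f64]) -> Option<f64> {
        xs.iter().copied().reduce(f64::min)
    }

    fn max_of(xs: &[f64]) -> Option<f64> {
        xs.iter().copied().reduce(f64::max)
    }

    proptest! {
        // Vectors of 0..1000 finite floats; proptest shrinks any failure
        // down to a minimal counterexample.
        #[test]
        fn min_and_max_bound_the_slice(xs in prop::collection::vec(-1e9f64..1e9, 0..1000)) {
            if let (Some(lo), Some(hi)) = (min_of(&xs), max_of(&xs)) {
                prop_assert!(xs.iter().all(|&x| lo <= x && x <= hi));
            }
        }
    }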
Great. So now we have covered testing, and we have covered the general workflow for developing a numerical algorithm. Let us talk about some other aspects that we need to know about for numerical applications. One important thing is floating points. In computing, a rounding error (or round-off error) is the difference between the result provided by a given algorithm using exact arithmetic and the result provided by the same algorithm, with the same input, using finite-precision, rounded arithmetic. As we know, floating points cannot represent all real numbers exactly, so there are always rounding errors. So whether you are implementing something from scratch or taking an off-the-shelf library, you need to realize that there can be issues with floating-point numbers. There is a subfield of mathematics called numerical analysis that deals with designing methods that obtain approximate but accurate numerical solutions. If you are into numerical applications, it is really important that you have a look at what numerical analysis says about the algorithms you are going to use. Another important thing to mention: if you are doing operations in finance, there is a library called rust_decimal that will help you perform financial calculations without rounding errors.
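A small illustration of both points: plain f64 arithmetic accumulates rounding error, while rust_decimal keeps base-10 digits exact (the dec! macro lives in the companion rust_decimal_macros crate):

    use rust_decimal::Decimal;
    use rust_decimal_macros::dec;

    fn main() {
        // 0.1 and 0.2 have no exact binary representation, so the sum
        // is actually 0.30000000000000004.
        let sum = 0.1_f64 + 0.2_f64;
        assert_ne!(sum, 0.3);

        // Decimal stores base-10 digits exactly, as needed in finance.
        let total: Decimal = dec!(0.1) + dec!(0.2);
        assert_eq!(total, dec!(0.3));
        println!("f64: {sum:.17}, Decimal: {total}");
    }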
Once we have covered that, let's move on to third-party libraries. Usually, when we need to decide whether to implement something ourselves or bring in a third-party library, we need to ask ourselves several questions. How important is the algorithm we are trying to implement or bring in? How confident are we in being able to implement it? Are we willing to maintain it? And is there something out there in the open source community that already implements this algorithm and is in a good state? These are the questions we need to ask before deciding. I think the general rule of thumb is as follows: for anything beyond trivial, three-line algorithms, we should use a third-party library, even if it is written in a not-so-shiny language like C or C++. And the reason for that is as follows. If you want to know how an algorithm works, I think the best way is to try to implement it yourself. That is a fantastic way of learning how algorithms work, and I do it all the time. But there is a big difference between trying to learn something and trying to come up with a faster, more performant, and more robust general implementation, which is usually what we find in open source code. In open source libraries there have been many cycles in which different approaches were tried; they have gone through many errors and a lot of optimizations. I think it would be really naive to believe that, for really popular algorithms, we can just show up and come up with something much better. Now, if you are implementing an algorithm from scratch, you are a master in that part of mathematics, and you are the only one who knows it, then yes, by all means, go ahead and bring your algorithm to the community. But if we are talking about really well-known algorithms, I think it is much better to join forces with another community, even if it is not the Rust community, and help maintain the most efficient implementation available.

To close, I would like to speak about two things: some libraries that are already available for Rust, and the interface between Rust and Python. First, let us talk about some libraries that I use every day and that I think are fantastic for numerical applications. The first of them is not a single library but the family of libraries around rust-ndarray. They are used for array manipulation, statistics, and linear algebra, and they are fantastic, well maintained libraries. Another well known library is Rayon: if you have a sequential calculation and you want to run it in parallel, Rayon will help you do that. Polars is a lightning-fast DataFrame library. rustc-hash provides the non-cryptographic hash algorithm I mentioned before: in numerical applications we use a lot of hash maps, but we don't want a cryptographic hash algorithm, because it is really slow for our purposes. There is also the approx crate, and both nalgebra and ndarray offer an approx feature to help you compare floating points; when you are doing calculations with arrays, it is really important to be able to know whether two arrays are approximately the same or not. Finally, there is another library called ordered-float that helps you compare totally ordered floats. With that, I think I have covered the crates that I consider a must-have for numerical applications.

Lastly, I would like to have a word about Rust and Python. Engineers, scientists, quants, and many professionals from numerical fields are proficient in Python, and not super proficient in Rust. So what happens is that most numerical algorithms are going to be implemented in Python first, and, as we probably know, Python is not the most efficient language out there. So for me it is natural that we would like to bring Rust to the Python community, to integrate Rust with Python. If you don't know it, there is a tool called Maturin that is perfect for this situation. Maturin is a Rust tool that facilitates the creation of Python modules that wrap Rust code. It makes it really easy to call your Rust code from inside Python without hand-writing the integration. And I think that by exposing our APIs and our libraries to the Python community, we are not only accelerating their calculations, we are also exposing them to how Rust works. So I think this is a win-win situation for the Rust and Python communities; a small sketch of such a module follows below. And with that, I would like to thank you for your time. This is what I had to say today. If you have any questions, you can always drop me a message or reach me through LinkedIn. Thank you very much for your time.
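Two quick sketches of the points above. First, the Rayon pattern: a sequential iterator chain is parallelized simply by swapping iter() for par_iter() (illustrative code, not from the talk):

    use rayon::prelude::*;

    // Parallel sum of squares: the only change from the sequential
    // version is `par_iter()` instead of `iter()`.
    fn sum_of_squares(xs: &[f64]) -> f64 {
        xs.par_iter().map(|x| x * x).sum()
    }

And a minimal sketch of the Maturin workflow: the module and function names below are hypothetical, and the #[pymodule] signature shown is the classic (pre-0.21) PyO3 form:

    use pyo3::prelude::*;

    /// Mean of a list of floats, callable from Python.
    #[pyfunction]
    fn mean(xs: Vec<f64>) -> f64 {
        if xs.is_empty() {
            0.0
        } else {
            xs.iter().sum::<f64>() / xs.len() as f64
        }
    }

    /// Module definition; the name must match the library name in Cargo.toml.
    #[pymodule]
    fn fast_numerics(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
        m.add_function(wrap_pyfunction!(mean, m)?)?;
        Ok(())
    }

Running maturin develop builds and installs the module into the current virtual environment, after which import fast_numerics; fast_numerics.mean([1.0, 2.0, 3.0]) works from Python.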
...

Felipe Zapata

Software Engineer @ Altas Technologies

Felipe Zapata's LinkedIn account


