Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi everybody, and welcome to this talk. We are going to be
spending some time thinking about how to really improve our
Rust code. The talk's title is "Get maximum benefit from zero
cost abstractions". I'm going to look at ways to get the most
from Rust while keeping your runtime performance extremely high.
One thing to note, though, is that none of the talk is actually
completely new.
There are three great places to look for
some really useful resources after this. One is the API Guidelines,
another is the lint index for Clippy, and the third
is a repository, Idiomatic Rust, that you can search for.
The API Guidelines provide probably hundreds,
or at least dozens, of really useful and practical
tips for being able to use
Rust effectively.
Clippy definitely provides hundreds and hundreds
of programmatic checks for your code,
along with explanations for all of them.
So, for example, in this case, we're checking to see whether or not
we are using values for known constants that are already defined
inside the standard library. By following Clippy's
recommendations, we actually get much
closer to our intent and we
remove bugs from our code. Okay,
so now I've got acknowledgments out of the way. I would also like to define
zero cost. Magic doesn't actually exist.
We can't get something for nothing.
So zero cost, in the sense of zero cost abstractions
actually means zero additional cost. You couldn't
have written something better yourself.
Possibly a more technically precise way to express this would
be to say zero marginal cost.
Lastly, I also want to point out that zero
cost relates to runtime performance, and therefore we can trade
off compile time
if it will provide us with faster runtime
code. That is, builds can take longer if
the program will run faster.
Okay, let's start off with a couple of very small quick
tips. Some really useful
literals exist in the Rust language. One of
them I quite like is the byte literal,
for being able to encode ASCII as a
u8 integer
rather than as the char
type, which takes four bytes. If we prefix
a character literal with a b, we get a
u8 value. Capital A is
the number 65 as a u8 value.
There is a similar way for
us to be able to
inject Unicode
into our source code. With the
\u escape, we can add in any code point rather than
encoding the actual source literal ourselves.
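As an illustration of the two literal forms just described, here is a minimal sketch. It is not taken from the slides, and the variable names are made up for the example.

```rust
use std::mem::size_of;

fn main() {
    // The `b` prefix turns a character literal into a `u8`.
    let capital_a: u8 = b'A';
    assert_eq!(capital_a, 65);

    // A `char` always occupies four bytes; a `u8` occupies one.
    assert_eq!(size_of::<char>(), 4);
    assert_eq!(size_of::<u8>(), 1);

    // The same prefix works on string literals and yields a byte array.
    let bytes: &[u8; 3] = b"abc";
    assert_eq!(bytes, &[97u8, 98, 99]);

    // `\u{...}` names a code point instead of embedding the character itself.
    let e_acute: char = '\u{e9}';
    assert_eq!(e_acute, 'é');

    println!("{capital_a} {e_acute}");
}
```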
Okay, so now let's touch on idiomatic Rust. We had a look at Clippy
before, but what we really want to do
is enable the rest of our team to follow
along with our code. And we do that by maintaining
the conventions of the ecosystem.
Programming is a team sport,
and writing software is easier than reading software.
So by that I mean it's harder to
follow along with someone else's mental model or someone else's
thinking when you're just reading the source code.
In Rust, I would say that getters and setters are generally not
useful. We have traits for interfaces and
we don't use inheritance, and so the
benefits of Java-style getters and setters (if there were any;
I'm not confident that there were)
are sort of not available in Rust.
Like all rules, or most rules at least, there's an exception. And that
is if you have, let's say, a wrapper type that provides
access to one logical thing, a getter
might be useful. So for example,
from the standard library we have these NonZero types, which provide
access to the raw value, and Cell
also does a similar thing. They have a get method which
returns the inner type.
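The two standard library getters mentioned here can be sketched as follows. This is a small illustration rather than anything from the slides; NonZeroU32 is picked arbitrarily from the NonZero family, and the variable names are made up.

```rust
use std::cell::Cell;
use std::num::NonZeroU32;

fn main() {
    // `NonZeroU32::new` returns `None` for zero, so the wrapper can never hold it.
    let width = NonZeroU32::new(640).expect("640 is not zero");
    assert!(NonZeroU32::new(0).is_none());

    // `get` hands back the one logical thing the wrapper holds.
    let raw: u32 = width.get();
    assert_eq!(raw, 640);

    // `Cell::get` copies the inner value out (it requires `T: Copy`).
    let counter = Cell::new(1);
    counter.set(counter.get() + 1);
    assert_eq!(counter.get(), 2);
}
```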
Are there other conventions? I touched on the API Guidelines
right at the start.
When you are dealing with conversions of your types, you want to
use the right method name. By that I mean a prefix of as_,
to_ or into_, depending on how you are performing the conversion.
You also want to stick with the conventions
of the ecosystem relating to generating iterators
over some collection.
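A rough sketch of those naming conventions, as I understand them from the API Guidelines, follows. The Tags type and its methods are hypothetical and exist only to show which prefix goes with which kind of conversion.

```rust
/// A hypothetical wrapper type, used only to illustrate the naming scheme.
pub struct Tags(Vec<String>);

impl Tags {
    /// `as_`: a free, borrowed view of data the type already holds.
    pub fn as_slice(&self) -> &[String] {
        &self.0
    }

    /// `to_`: a conversion that does real work and leaves `self` usable.
    pub fn to_csv(&self) -> String {
        self.0.join(",")
    }

    /// `into_`: consumes `self` and hands over ownership of the contents.
    pub fn into_vec(self) -> Vec<String> {
        self.0
    }

    /// `iter`: the conventional name for iterating by shared reference.
    pub fn iter(&self) -> std::slice::Iter<'_, String> {
        self.0.iter()
    }
}

fn main() {
    let tags = Tags(vec![String::from("rust"), String::from("talk")]);
    assert_eq!(tags.as_slice().len(), 2);
    assert_eq!(tags.to_csv(), "rust,talk");
    assert_eq!(tags.iter().count(), 2);

    // `tags` is moved here and cannot be used afterwards.
    let owned: Vec<String> = tags.into_vec();
    assert_eq!(owned[0], "rust");
}
```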
You should also eagerly implement traits.
If your type is able to be
compared for equality with other types, implement PartialEq.
Implement Clone where you can,
or where it's appropriate, because your consumers, the
people that are importing your code, cannot
implement those traits for your type themselves.
It's impossible for them to take a foreign trait
like Hash and a foreign type, let's say something
that you've created, and implement Hash for that foreign type.
To do so, they would need to create a new type around it, and it's
just kind of annoying. So where you can, you should
implement as much as possible. I want to
talk a little bit about some other practices
which will lend themselves to quality software.
There are linters inside the ecosystem. We've talked
about Clippy before, and to invoke it we use cargo
clippy. rustfmt is available via cargo
fmt inside your crate.
For extra points, you can
make things harder for yourself by adding a pre-commit hook.
You could ask Git to
run cargo fmt on your behalf and fail if the formatting
isn't correct. In fact, you can ask cargo fmt
to update the code itself,
which may or may not be what you want.
So to set up the hook,
you first create a file called pre-commit
inside the hooks directory
of your hidden .git directory,
make it executable, and then Git will invoke it every time you run
git commit. Here's an
example. We need the
hashbang line at the start because we didn't provide a file
extension. I like to add a comment saying what this file actually
does, and I've kind of got a cheat available,
which is that if I want to skip any of these checks, I can
just invoke git commit with a cheat variable set to
one, and then the pre-commit
hook will pass. Otherwise I ensure that the formatting is correct.
I run the Rust compiler in its check mode,
which is a fast version of the compiler, and then I run Clippy,
which is a lengthier check.
So first I'm checking formatting, then I'm checking that the code compiles,
and then I'm checking that the code is ergonomic.
If those all pass, I'm allowed to commit my code.
Something else related, which is a little bit more complex: when
you define a function and you are taking a
reference to some type, try to remember to
take a reference to the borrowed version of the type rather
than borrowing an owned version. So for example,
instead of borrowing a String, we take
a string slice, a &str, as an argument. It'll be easier for your
callers. It turns out, for technical reasons relating to Rust's
Deref trait, that if you accept a string slice,
you still enable your callers to call your function with
a reference to a String, because a reference to a String
actually dereferences to a str
with a lowercase s. So what
we want to do is change this
String. All we should
need to do here is change String into str,
and then we can now suddenly
call is_all_caps
with... oh, that's not correct...
and we're done. It's not always going to be this
simple. Some other code will
be slightly more complex. The same thing applies elsewhere.
Just to be very clear, we want to avoid things like
taking a reference to a Box of,
let's say, T.
What we should prefer is to just take a reference
to a slice of T.
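To make the two forms concrete, here is a hedged sketch. The body of is_all_caps is an assumption (the slide's implementation is not in the transcript), and total_boxed and total are hypothetical helpers added to show the Box case. In each pair the first form borrows the owned type and the second borrows the borrowed type.

```rust
// First form: borrowing the owned type forces callers to have a `String`.
fn is_all_caps_owned(text: &String) -> bool {
    text.chars().all(|c| c.is_uppercase())
}

// Second form: a string slice accepts literals, slices, and `&String` alike.
fn is_all_caps(text: &str) -> bool {
    text.chars().all(|c| c.is_uppercase())
}

// First form: callers must have boxed their data.
fn total_boxed(values: &Box<[i32]>) -> i32 {
    values.iter().sum()
}

// Second form: any contiguous run of `i32` will do.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let owned = String::from("HELLO");
    assert!(is_all_caps_owned(&owned));
    // is_all_caps_owned("HELLO"); // would not compile: expected `&String`

    assert!(is_all_caps(&owned)); // `&String` coerces to `&str` via `Deref`
    assert!(is_all_caps("HELLO"));
    assert!(!is_all_caps("Hello"));

    let boxed: Box<[i32]> = vec![1, 2, 3].into_boxed_slice();
    assert_eq!(total_boxed(&boxed), 6);
    assert_eq!(total(&boxed), 6); // `&Box<[i32]>` coerces to `&[i32]`
    assert_eq!(total(&[4, 5]), 9); // so does a reference to an array
}
```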
The second form here is going to be easier on your callers and will still
enable people who have boxed the type to
be given access to it. Okay, sorry if that's confused
some people. Let's reduce the cost of
monomorphization. That sounds a bit crazy.
Whenever you use generics,
the compiler will generate a specialized version of a
function for each of its input types. This takes up
space. We can actually avoid some of that space by
being a bit sneaky.
Let me show what I mean.
I've got a function here, is_all_caps, which is actually doing exactly
the same thing as before, but now we
also want to accept a Cow,
for whatever reason, just because it's
the type in the standard library that is the most
fun to say.
Now I've got here a crazy-looking generic
type which says that I'll accept a reference to any
T where T dereferences
to str, to a string with a lowercase
s. This will create a new version
of is_all_caps for each of its input types.
What I'm going to do instead is duplicate
the function to start with, and
then change
the one that I want to call so that it is
not generic.
And then I'm going to give it an underscore prefix to indicate that it's private.
And then in the public method (although it's not public,
because I'm not using the pub keyword, although I could) I
just call is_all_caps with
the underscore prefix. So now what's triplicated,
if I had, say, a Cow and then a String and then a
string slice? Only this smaller fraction
of the code itself is actually triplicated. There's only
one version of the function that
ends up doing the body of the work.
But there's a trick. We need to ask Rust
not to inline the code,
otherwise this code will all get injected into all
of the specific versions of the
code that we want to create. So that's good. Okay, so now we have reduced
the cost of monomorphization, at least the space cost.
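A minimal sketch of the trick described above follows. The exact bound on the slide is not recoverable from the transcript, so the Deref bound and the function body here are assumptions; only the names is_all_caps and the underscore-prefixed worker come from the talk.

```rust
use std::borrow::Cow;
use std::ops::Deref;

// Thin generic shim: the compiler makes one copy per caller type,
// but each copy only forwards to the worker below.
fn is_all_caps<T>(text: &T) -> bool
where
    T: Deref<Target = str> + ?Sized,
{
    _is_all_caps(T::deref(text))
}

// Non-generic worker: compiled once. `inline(never)` asks the compiler
// not to paste this body back into every copy of the shim.
#[inline(never)]
fn _is_all_caps(text: &str) -> bool {
    text.chars().all(|c| c.is_uppercase())
}

fn main() {
    let owned = String::from("SHOUT");
    let cow: Cow<'static, str> = Cow::Borrowed("Quiet");

    assert!(is_all_caps(&owned)); // T = String
    assert!(!is_all_caps(&cow)); // T = Cow<str>
    assert!(is_all_caps(&"LOUD")); // T = &str
}
```

Three shims are generated for the three caller types, but the character loop exists only once in the binary.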
Let's talk a little bit about testing.
And I say testing generally because actually what I
really want to speak about is formal methods.
If you have people using your software, you really don't
want that software to break. And formal methods,
including formal verification,
are, on the spectrum
of easy to difficult, probably still on the difficult side, but they are coming
towards... easy may
be too strong. They're kind of moving towards learnable.
Modeling involves spending time. We create a
specification for how we want our program to run.
And the first step to using modeling
is actually to just write a
document describing how you want the program to work.
This could be called README-driven development. You
write the documentation before the code.
This gives you time to think it through and design your API
without actually having anyone dependent
on your code. There are many ways that
are more advanced than that, including formal verification and other languages,
but writing a document actually
will save you an intense amount of debugging time later on. I make
that as a claim without evidence, so if you disagree with that,
please fire up in the comments. That's fine.
I would like to make this claim. A form
of testing that's less common than unit tests, but still really valuable,
is this thing called property testing. In Rust,
the proptest crate provides the ability
for you to test a range of inputs for
functions rather than just one at a time.
Instead of giving a unit test one specific input
and checking its output, you can just ask for random inputs within some
range. It's actually
very interesting, and
quite revealing sometimes, because it finds the edge
cases for you. And if you really want to find the
edge cases, then you fuzz your inputs.
A fuzzer is a program which generates random inputs
for your functions. Now this sounds really complicated
and difficult. It turns out, though, that
fuzzing libraries actually make it relatively easy for you
to just fuzz a specific function of your program.
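Property testing and fuzzing share one shape: generate many inputs, then check that something always holds. The sketch below shows that shape using only the standard library. It is not the proptest API; a tiny hand-rolled generator stands in for a real crate, and the temperature conversions are hypothetical functions chosen for the example.

```rust
/// Functions under test: a hypothetical pair of conversions chosen for this sketch.
fn celsius_to_fahrenheit(celsius: f64) -> f64 {
    celsius * 9.0 / 5.0 + 32.0
}

fn fahrenheit_to_celsius(fahrenheit: f64) -> f64 {
    (fahrenheit - 32.0) * 5.0 / 9.0
}

/// A tiny linear congruential generator standing in for a real input generator.
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }

    /// Returns a value in the range from `low` up to `high`.
    fn next_in(&mut self, low: f64, high: f64) -> f64 {
        let unit = (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64;
        low + unit * (high - low)
    }
}

/// The property: converting to Fahrenheit and back returns (nearly) the input.
fn round_trip_holds(cases: u32, seed: u64) -> bool {
    let mut rng = Lcg(seed);
    (0..cases).all(|_| {
        let celsius = rng.next_in(-273.15, 1000.0);
        let back = fahrenheit_to_celsius(celsius_to_fahrenheit(celsius));
        (back - celsius).abs() < 1e-9
    })
}

fn main() {
    // One call exercises ten thousand inputs instead of one hand-picked value.
    assert!(round_trip_holds(10_000, 42));
}
```

A dedicated crate adds what this sketch lacks, such as shrinking a failing input down to a minimal case.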
And you write these test harnesses, kind of like baby
programs that call one function of your API, and
then just give it random input, like things that should
never ever appear in practice, and see what breaks. And every time
it breaks, your program gets stronger, because you
fix the problem. You fix the problem,
right? Okay,
so lastly... maybe not lastly, there's actually plenty of slides to go.
The newtype pattern is very, very handy, and is very confusing or alien
to people who come to Rust. I've got this problem,
and that is we've got two thermometers
reading the temperature, and they actually are reading
the same temperature. This is 20 degrees Celsius. And it turns out
that 20 degrees Celsius is also an integer in Fahrenheit,
which is quite fun to know. So what
we really want when we calculate our average temperature is 20 degrees
Celsius, or 68 degrees
Fahrenheit, depending on whether
you are in one of the, I believe, one countries in the world that
uses Fahrenheit.
But instead we get 44. Now this is not a problem with
the type system. These are all floating point values. The Rust compiler
does not really care that you have
made a logical error. What we do instead is we
wrap the f32, or floating point value, in
our unit. Now,
when we go to compare them, or add
them together and then divide, the program will fail
to compile. Now, you may be wondering,
is failing to compile really preferable?
And the answer is yes. You don't want things to just silently
work that are broken. It's better to not have the thing
start than for something to fall over half
the way through, and even worse, to fall over without you ever being
told, because you get a valid output
when you try to create the average temperature.
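The thermometer example can be sketched like this. The type names, the From conversion and the average function are my own reconstruction, since the slide code is not in the transcript; only the numbers 20, 68 and 44 come from the talk.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Celsius(f32);

#[derive(Debug, Clone, Copy, PartialEq)]
struct Fahrenheit(f32);

// The only way to mix units is an explicit, correct conversion.
impl From<Fahrenheit> for Celsius {
    fn from(f: Fahrenheit) -> Self {
        Celsius((f.0 - 32.0) * 5.0 / 9.0)
    }
}

fn average(a: Celsius, b: Celsius) -> Celsius {
    Celsius((a.0 + b.0) / 2.0)
}

fn main() {
    // With bare floats the logical error compiles and silently yields 44.
    let wrong = (20.0_f32 + 68.0_f32) / 2.0;
    assert_eq!(wrong, 44.0);

    let first = Celsius(20.0);
    let second = Fahrenheit(68.0);

    // average(first, second); // would not compile: expected `Celsius`, found `Fahrenheit`

    let mean = average(first, second.into());
    assert_eq!(mean, Celsius(20.0));
}
```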
One subtle way to improve this even further
is to say that booleans and options should rarely
be returned from functions.
That's because if you assign the boolean from
this call to is_alive (or it's a method, I suppose) and you get
a bool out, let's say true, then later on in
your code you need to wonder,
why do I have a boolean in my code
ten lines later? It becomes a little bit confusing.
You need a variable that's well named, and if it's not,
then you've just got a crazy variable that
is, like, true or false. What does that actually mean? It's hard to know.
It might be that you're comparing equality with something.
So instead we can change
from an is_ method, that is, a method
that returns a boolean, and
return our own enum.
This way it doesn't matter where our variable appears.
It's always going to
be sort of self-documenting that it's a
value that encodes the state of
whether or not something is valid, or alive or not, in itself.
Oh, this is actually a lie. It doesn't have self in there.
An early version of this slide actually included a type
parameter in Lifecycle.
But then I thought that people might get picky and complain that I
also needed to implement Clone and so forth, so I should have
fixed that up.
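A small sketch of returning an enum instead of a bool follows. Lifecycle is the name mentioned in the talk; its variants, the Connection type and the heartbeat rule are all hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lifecycle {
    Alive,
    Dead,
}

/// Hypothetical type whose state we want to report.
struct Connection {
    missed_heartbeats: u32,
}

impl Connection {
    /// Replaces an `is_alive() -> bool` method with a self-describing value.
    fn lifecycle(&self) -> Lifecycle {
        if self.missed_heartbeats < 3 {
            Lifecycle::Alive
        } else {
            Lifecycle::Dead
        }
    }
}

fn main() {
    let status = Connection { missed_heartbeats: 0 }.lifecycle();

    // Ten lines later, `status` still says what it means.
    match status {
        Lifecycle::Alive => println!("still connected"),
        Lifecycle::Dead => println!("reconnecting"),
    }
    assert_eq!(status, Lifecycle::Alive);
}
```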
I like marker traits. This is a really
nice part of the type system, and it
really focuses on this idea of a zero-cost abstraction.
And that's partially because I think traits are kind of one of the centerpieces
of Rust.
Within the standard library, you get a small set of marker traits,
Copy, Send, Sized, Sync and Unpin, that live
within the std::marker module.
And it's hard to read there, but actually at the bottom it's saying
that Sized is so common that you
actually need to opt out of it.
The compiler will always assume Sized
for you unless you explicitly ask it not to.
The idea behind these traits is that they are providing information
to the type system that takes up no internal
representation in the final binary or final executable itself.
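To show what a marker trait looks like in user code, here is a minimal sketch. Sanitized, Html, render and byte_len are hypothetical names invented for the example; the ?Sized syntax is the standard way to opt out of the implicit Sized bound.

```rust
/// Hypothetical marker trait: no methods, only a promise to the type system.
trait Sanitized {}

struct Html(String);

// Opting in adds no fields and no data; the impl block is empty.
impl Sanitized for Html {}

/// Only types marked `Sanitized` are accepted, and that is checked at compile time.
fn render<T: Sanitized>(_page: &T) -> &'static str {
    "rendered"
}

/// `?Sized` opts out of the implicit `Sized` bound, so unsized types such as
/// `str` can be passed behind a reference.
fn byte_len<T: AsRef<[u8]> + ?Sized>(value: &T) -> usize {
    value.as_ref().len()
}

fn main() {
    let page = Html(String::from("<p>hi</p>"));
    assert_eq!(render(&page), "rendered");
    // render(&String::new()); // would not compile: `String` is not `Sanitized`

    assert_eq!(byte_len("abc"), 3); // here T = str, which is not `Sized`
    assert_eq!(page.0.len(), 9);
}
```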
So they take up zero space, and they are therefore
zero cost by our definition from the
start of the talk. We can get further with zero
cost abstractions. It turns out that using
Option wisely, if you are
dealing with references, is super,
super handy.
Stupid thing to say.
If you wrap a reference type, that is an
&T or a Box<T>, in
Option, it takes up no more space in memory.
That's because Rust guarantees that
references, which are pointers,
are never null. So there's a free bit pattern available that can be
used for the None case when pattern matching.
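The size claim can be checked directly with std::mem::size_of. This is a small sketch; first_even is a hypothetical helper included only to show an Option of a reference in use.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns a reference to the first even value, if any.
fn first_even(values: &[u32]) -> Option<&u32> {
    values.iter().find(|v| **v % 2 == 0)
}

fn main() {
    // Wrapping a reference or a Box in `Option` costs no extra space.
    assert_eq!(size_of::<Option<&u64>>(), size_of::<&u64>());
    assert_eq!(size_of::<Option<Box<u64>>>(), size_of::<Box<u64>>());

    // For contrast, every bit pattern of a `u64` is a valid value,
    // so `Option<u64>` has to grow to store the discriminant.
    assert!(size_of::<Option<u64>>() > size_of::<u64>());

    assert_eq!(first_even(&[1, 3, 4]), Some(&4));
    assert_eq!(first_even(&[1, 3]), None);
}
```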
We can make our Rust code easier
to use by avoiding a couple of gotchas.
We want to avoid magical typecasting with
Deref. If you remember a couple of slides ago, we talked
about the Deref trait and that we
were able to accept multiple types. That was a
reference to a String versus a string slice.
And it turns out that you can abuse this to
create something called Deref polymorphism. It's a
known antipattern. Now this is a more
difficult code example,
but let's see if I can show you what the problem is.
What I might want to do, if I was to come from an object-oriented
programming language, is use Deref to recreate
inheritance. I've got here a Base class with
a greet method,
and then
I have a Person class that has inside of it a member,
base. And the
Deref implementation for Person involves
returning self.base. So when
the person is dereferenced, it returns this base
class, and the base class has the method.
Now if I have a main method,
I can use one of the magical things
about Rust, which is that the dot operator
implicitly dereferences on demand. That enables
me to call the greet method on greeter just
because Person has implemented Deref.
We do not want to abuse this. It's a very sharp-edged
feature. It will cause problems
and it's going to be very, very confusing.
We have two ways out of this.
We can either use a trait. So a Greet trait
will enable us to say that Person
implements Greet and actually relies on the default
method itself. We don't even need any extra code. And now
we can call greet from Person. In fact, we've simplified
a bunch of other stuff, because we don't need multiple types
and so forth. Alternatively,
if we really want our base class,
if we really wanted to create something which is kind of templated,
we need to write the facade ourselves. So we just
create a greet method and
then we're given access to
the base greet. You see that the
person's greet implementation is just dispatched
to the internal class. And now if I have
main, greeter.greet
does the right thing. It's slightly less
ergonomic than breaking the rules,
but please don't break the rules. It really will cause problems.
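The two ways out can be sketched as follows. The slide code is not in the transcript, so this is a reconstruction: Greet, Base and Person follow the names used in the talk, Employee is a hypothetical second type so that both options fit in one file, and the greeting text is made up.

```rust
// Option 1: a trait with a default method. Implementors get `greet` for free.
trait Greet {
    fn greet(&self) -> String {
        String::from("Hello!")
    }
}

struct Person;

// No extra code is needed; the default method is used.
impl Greet for Person {}

// Option 2: keep the base type and write the facade by hand.
struct Base;

impl Base {
    fn greet(&self) -> String {
        String::from("Hello!")
    }
}

struct Employee {
    base: Base,
}

impl Employee {
    // Explicit delegation, instead of an `impl Deref<Target = Base>`
    // that would make `employee.greet()` work by implicit dereferencing.
    fn greet(&self) -> String {
        self.base.greet()
    }
}

fn main() {
    let greeter = Person;
    assert_eq!(greeter.greet(), "Hello!");

    let employee = Employee { base: Base };
    assert_eq!(employee.greet(), "Hello!");
}
```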
I promise. To improve
your Rust code, you should make it impossible to create
partially constructed types.
It's really tempting to add an is_valid method or
a validate step, but actually doing so will
just create mistakes, because people are
lazy and will forget. People
are humans. We can actually encode the validation logic
inside our constructors and return a Result rather
than T.
This is a form of programming called making illegal states
unrepresentable. In this case,
I have a Building and a height. Now the building's
height must be nonzero,
and what I want to avoid is a separate validation step.
So in order to get around this,
I have some cheater code
in the new method. Actually,
I'll just copy and paste and I'll go
back a slide.
My new method now
returns a Result: if height is zero,
we return ZeroHeight (so I needed an error type in there also),
and otherwise I return Ok.
This makes it impossible for there ever to be a
Building object around that has the
illegal height. Now there is a little bit of extra bureaucracy,
because we need a Result type and we need an
error type.
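A hedged reconstruction of the constructor described above follows. Building, ZeroHeight and new come from the talk; the u32 height, the accessor and the Display text are assumptions.

```rust
use std::fmt;

#[derive(Debug, PartialEq)]
struct ZeroHeight;

impl fmt::Display for ZeroHeight {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "a building's height must be nonzero")
    }
}

// With `Debug` and `Display` in place, the `Error` impl can be empty.
impl std::error::Error for ZeroHeight {}

#[derive(Debug, PartialEq)]
struct Building {
    // Kept private: in a real crate this type would live in its own module,
    // so that `new` is the only way for outside code to obtain a `Building`.
    height: u32,
}

impl Building {
    /// Validation lives in the constructor, so no `is_valid` step can be forgotten.
    fn new(height: u32) -> Result<Self, ZeroHeight> {
        if height == 0 {
            return Err(ZeroHeight);
        }
        Ok(Building { height })
    }

    fn height(&self) -> u32 {
        self.height
    }
}

fn main() {
    assert_eq!(Building::new(0), Err(ZeroHeight));

    let tower = Building::new(30).expect("30 is a valid height");
    assert_eq!(tower.height(), 30);
}
```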
By the way, if you have implemented Debug and Display,
then implementing Error as well is nearly automatic,
because its methods all have default implementations.
And this isn't spelled correctly,
but this will provide us with significantly stronger,
more robust software.
I've got a couple of other pieces of advice before we wrap up.
You've probably heard of generics, which is static dispatch,
and trait objects, which is dynamic dispatch.
But I'm here to tell you today that there's a third way, called
enum dispatch. We create an enum type which
encapsulates all the possible states,
or all the morphisms, all the types that
our thing could be, and then we match on it inside
the calling code. The downside is that it becomes
slightly unwieldy to use inside the functions
that make use of this kind of supertype.
You need to match on every single instance of it. Now to make it smoother,
there is a crate called enum_dispatch which takes
a lot of that pain away. It's much, much faster. In some of the
benchmarks that this crate provides, we're talking about
sort of ten times performance gains, even inside
the static dispatch case, which seems kind of crazy.
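Written by hand, without the enum_dispatch crate, the idea looks roughly like this. The Shape trait and its implementors are hypothetical; the crate's macros generate the matching boilerplate shown in the impl for AnyShape.

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Rectangle {
    width: f64,
    height: f64,
}

struct Square {
    side: f64,
}

impl Shape for Rectangle {
    fn area(&self) -> f64 {
        self.width * self.height
    }
}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }
}

/// One enum variant per concrete type that our "thing" could be.
enum AnyShape {
    Rectangle(Rectangle),
    Square(Square),
}

// The unwieldy part: every trait method needs a match that forwards the call.
impl Shape for AnyShape {
    fn area(&self) -> f64 {
        match self {
            AnyShape::Rectangle(inner) => inner.area(),
            AnyShape::Square(inner) => inner.area(),
        }
    }
}

/// Mixed shapes in one collection, with no `Box<dyn Shape>` involved.
fn total_area(shapes: &[AnyShape]) -> f64 {
    shapes.iter().map(|shape| shape.area()).sum()
}

fn main() {
    let shapes = [
        AnyShape::Rectangle(Rectangle { width: 2.0, height: 3.0 }),
        AnyShape::Square(Square { side: 3.0 }),
    ];
    assert_eq!(total_area(&shapes), 15.0);
}
```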
But I encourage you to take a look.
And if your trait has more
than two or three methods, there's probably
an opportunity for you to refine your design.
There are very few traits inside the Rust standard library that
have more than one method that you need to implement.
When they do have
multiple methods inside the one trait, a lot of them are
provided via default implementations.
If you have a trait that is very narrow but
deep, you'll find that it's much easier for your callers to
make use of. An API that is
broad but shallow is more specific. It's not
general enough, and you'll find that it's really difficult for
people that are using your trait to make use of.
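Iterator is a standard library example of this shape: one required method, next, with the rest provided by default. The Countdown type below is hypothetical and exists only to show how much comes for free.

```rust
/// Hypothetical iterator that counts down from a starting value to 1.
struct Countdown(u32);

impl Iterator for Countdown {
    type Item = u32;

    // The only method we have to write ourselves.
    fn next(&mut self) -> Option<u32> {
        if self.0 == 0 {
            None
        } else {
            let current = self.0;
            self.0 -= 1;
            Some(current)
        }
    }
}

fn main() {
    // `sum`, `map` and `collect` are all default methods built on `next`.
    let total: u32 = Countdown(3).sum();
    assert_eq!(total, 6);

    let doubled: Vec<u32> = Countdown(3).map(|n| n * 2).collect();
    assert_eq!(doubled, vec![6, 4, 2]);
}
```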
That's everything that I have for you today. I really hope that you have enjoyed
the talk. Hopefully I've said a few
things that you've agreed with, a few things that are new, and possibly
even a few things that you disagree with. You are very welcome to
say hi in the comments and let's see if we can start a discussion.
Hit me up on Twitter and let's see where we can go.
Take care.