Transcript
This transcript was autogenerated. To make changes, submit a PR.
Today
I wanted to walk you through how I built
interactive command-line tutorials using WebAssembly. So the
application I want to focus on today is sandbox.bio,
and this is an application that features interactive
command-line tutorials. It's mostly aimed at bioinformaticians,
but it also has tutorials for general command
line usage. So here I have an awk tutorial. On the left you
have the instructions, and on the right you have this playground where you
can start writing commands and executing
them right away. So here I'm taking the first few lines of a file.
I can also make more complex commands, like taking the output
of awk that prints the third column and piping
it into the head command. And what's
interesting about this is that it's running
the real awk. You know, this is not a simulation,
it's running awk in the browser. There are no servers
that do any of this computation. How, you ask?
Well, that's where WebAssembly comes in.
And so let me start by telling you a bit more about WebAssembly
itself. To me, WebAssembly is just another language
that you can use in the browser. We can use HTML, CSS,
JavaScript. Now we can also use WebAssembly.
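
As a rough illustration of WebAssembly as one more thing the browser can run, here is a minimal sketch in TypeScript. The bytes are a hand-assembled module that exports a single function returning the number 42. This is not the code shown in the talk, and the export name `answer` is made up for the example. In practice you would let a compiler produce these bytes.

```typescript
// Sketch only: a hand-assembled module, not the code from the talk.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f, // type section: () -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function using type 0
  0x07, 0x0a, 0x01, 0x06, 0x61, 0x6e, 0x73, 0x77, 0x65, 0x72, 0x00, 0x00, // export "answer"
  0x0a, 0x06, 0x01, 0x04, 0x00, 0x41, 0x2a, 0x0b, // code section: i32.const 42; end
]);

// `WebAssembly` is a global in browsers, Node, and Deno. It is read through
// `globalThis` here only so the sketch type-checks without DOM typings.
const wasm = (globalThis as any).WebAssembly;
const compiled = new wasm.Module(bytes);
const instance = new wasm.Instance(compiled, {});

// Exported WebAssembly functions are callable like ordinary JavaScript functions.
const answer: number = instance.exports.answer();
console.log(answer);
```
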
The key difference, though, is that WebAssembly looks a
little strange. So here's a very simple piece of
code in WebAssembly that defines a function,
and this function returns a string that has
42 in it. That's all this does. This looks pretty
complicated, but the thing about WebAssembly is that you don't write
this code directly. It's a compilation target,
meaning that you write code in another language, or you
take existing code in another language like C,
and then you compile it to WebAssembly so that you can run it
in the browser. So that's why people talk about WebAssembly
as being a compilation target. The best support that
we have today is for C, C++, and Rust,
but there are other languages that you can absolutely compile to
WebAssembly. So why?
The reason WebAssembly has been really powerful so far comes down to three
things. Number one, reusing code. All these are
examples of tools that were on a desktop
or on the command line that have been ported to the web without
having to start from scratch. Number two,
performance. In some cases you can replace
slow, heavy JavaScript computation with
faster, more optimized WebAssembly, and you can get
speedups. And number three, there's this idea that you can really
run WebAssembly wherever a runtime for it exists.
So there's a WebAssembly runtime in the browser, but there are also
WebAssembly runtimes outside the browser, right? If you do edge computing
like Cloudflare or Fastly, if you use Node or Deno, you can run it
there as well, or you can run it on small devices. Now, how do
you concretely get started?
How do you compile things to WebAssembly, practically?
If you're compiling C and C++
tools, I would say by far the best choice is
Emscripten. It's a fantastic toolkit. It helps simplify
this compilation and offers a whole bunch of utilities
that I will mention in a bit. All right, so let's take
a look at a concrete example. We have
this command-line utility called seqtk. This is
a tool commonly used in bioinformatics, and what you should
note is, number one, it's a useful tool, number two,
it's written in C, and number three, I want to run it in the browser.
How do I do that? And so if we put WebAssembly aside
for a second, how do you compile this tool
in order to run it on your own computer outside the browser?
Well, you would use a C compiler like GCC. And so
here you tell the compiler I want to output a binary file called seqtk,
and I have a whole bunch of flags. If you want to do the
same thing, but compile it to WebAssembly, what you can
do is use Emscripten's emcc.
So this stands for Emscripten C compiler. It's basically a drop-in
replacement for GCC that makes the compilation to WebAssembly easier.
So it looks fairly similar. Instead of outputting a
binary file, you output seqtk.js.
Note here that this actually asks Emscripten
to output both a .wasm and a .js file.
So you may be wondering, why do I need the JS file? I thought
this was WebAssembly, and it is. But one
thing that Emscripten does is give you this JavaScript
file, if you want it, that helps you initialize the module,
helps you deal with calling various functions,
and has a bunch of utilities around file systems that I'll
mention in a second. So that could be really powerful to avoid having to
rewrite all that yourself. And so you can see the other flags are
fairly similar, except when we get to -lz.
So this means I want the zlib library. And so
instead of using that, you tell Emscripten USE_ZLIB equals
one. Because the alternative is you would have to bring in
the zlib code and have it also be compiled to
WebAssembly. And you don't have to figure that out; you
can just tell Emscripten, yeah, I want zlib, and Emscripten
does that for a whole bunch of other libraries that are commonly
used. Zlib is very commonly used for compression,
but if you use PNG files, you can use libpng.
If you do a lot of graphics or games, you can ask
Emscripten to load SDL the same way.
And the last thing I'll mention is this FORCE_FILESYSTEM flag.
You technically don't need to tell Emscripten that; it will figure it
out. But I just want to make it explicit here that most
command-line tools expect there to be a file system:
they operate on files, they output files. And so to
make it possible to use that tool as
is in the browser, Emscripten creates a virtual file
system in the browser, in memory. It doesn't affect
your real files, it's just a mock file system,
but it helps you do things like this: you could ask the user
to give you a local file and then you can
mount that file on the virtual file system,
giving it a path that you can then give to your command-line
utility. And so then it can work the same way it normally does.
And so this is another thing that you get out of
outputting this JS file. Okay, so how
do I actually call seqtk then? Well, if I'm on
the command line, I just call seqtk like this and give it the
parameters. With Emscripten, you would do
Module.callMain, and this is JavaScript code, right? And then you
give it an array of parameters that you want to give the
WebAssembly module. And then behind the scenes Emscripten will
figure out how to convert this to something that the
WebAssembly module will understand. Because keep in mind,
WebAssembly only understands numbers, right? So you can't
pass in strings, you have to do this transformation. So this
was using GCC, but Emscripten
has a whole bunch of wrappers for other build tools. If you use g++,
you can use em++. If you're making a library, emar.
Same thing for make, cmake, and configure:
you can use these wrappers from Emscripten to do
the compilation. Now one thing to keep in mind is that I
just showed you a pretty simple example.
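
To make the earlier point about strings and numbers more concrete, here is a minimal sketch in TypeScript of the kind of transformation involved. It is not Emscripten's actual implementation. The helpers `writeArgv` and `readCString`, the memory size, the starting offset, and the arguments are all made up for illustration. The idea is that each string is copied into the module's memory as bytes, and the module receives only numbers: an argument count and the offset of a table of pointers.

```typescript
// Sketch only: `writeArgv` and `readCString` are hypothetical helpers, not
// Emscripten APIs. They show how string arguments can be reduced to numbers.
interface ArgvLayout {
  argc: number; // number of arguments
  argv: number; // byte offset of the pointer table inside `memory`
}

// Copy each argument into `memory` as a zero-terminated ASCII string, then
// write a table of 32-bit little-endian offsets pointing at those strings.
function writeArgv(memory: Uint8Array, args: string[], base: number): ArgvLayout {
  const table = new DataView(memory.buffer, memory.byteOffset, memory.byteLength);
  const pointers: number[] = [];
  let offset = base;
  for (const arg of args) {
    pointers.push(offset);
    for (let i = 0; i < arg.length; i++) {
      const code = arg.charCodeAt(i);
      if (code > 0x7f) throw new Error("this sketch only handles ASCII");
      memory[offset++] = code;
    }
    memory[offset++] = 0; // C strings end with a zero byte
  }
  offset = (offset + 3) & ~3; // align the pointer table to 4 bytes
  const argv = offset;
  for (const pointer of pointers) {
    table.setUint32(offset, pointer, true); // WebAssembly memory is little-endian
    offset += 4;
  }
  table.setUint32(offset, 0, true); // the table ends with a null pointer
  return { argc: args.length, argv };
}

// Read a zero-terminated ASCII string back out of memory.
function readCString(memory: Uint8Array, pointer: number): string {
  let text = "";
  for (let i = pointer; i < memory.length && memory[i] !== 0; i++) {
    text += String.fromCharCode(memory[i] as number);
  }
  return text;
}

const memory = new Uint8Array(64); // stand-in for the module's linear memory
const layout = writeArgv(memory, ["-n", "5", "orders.tsv"], 8);
const view = new DataView(memory.buffer);
const thirdArg = readCString(memory, view.getUint32(layout.argv + 8, true));
console.log(layout.argc, layout.argv, thirdArg);
```

With a layout like this, the module's main function can be called with two numbers, `argc` and `argv`, which is the kind of bookkeeping the generated JavaScript file takes care of for you.
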
It can get pretty complex to compile something to
WebAssembly. Some things use threads.
Emscripten has some tools to make that easier, using web workers for that.
Some tools use SIMD. Now that's not
entirely supported. WebAssembly currently supports
128-bit SIMD, but if you're using something
different, it might not work. If you have assembly
code, actual assembly code in your C program,
you absolutely cannot compile that, right? And so in those
cases, if that code is only there for optimizations, there are usually
flags that you can use to disable that to get around it.
And there are other things like this that make it a little harder to
compile. Or if you have sockets, that's really tricky. You have to work
around that. Anyway,
if you're curious about learning more about how to compile things
to WebAssembly for use in the browser, I wrote a book
a few years ago focusing on that called Level Up with WebAssembly,
and you can check it out at levelupwasm.com. Okay,
so now back to sandbox.bio. We have these
tools that I want to be able to run in the browser, like awk,
grep, jq, and a whole bunch of coreutils like ls and head and
tail. These are all written in C and C++,
so I can use the process I talked about earlier
and compile these tools from C to WebAssembly.
And now I'm able to run these
tools in the browser. So just to put it into
the context of the application, where do I
actually execute these WebAssembly modules?
So the first thing is we're going to use xterm.js,
which is a library that helps you simulate the look and feel
of a command line. But of course this library will only make
it look like a terminal. You still have to interpret the commands and
do something with them. And so what I do is essentially
parse the user's input into an abstract syntax tree.
So this lets me get a clear view of which
programs are running and what parameters we give each program.
And we need to be able to handle constructs such as
piping, right, where the output of a command is the input of another process.
Substitution is also common on the command line. Things like variables,
you need to be able to handle that. And so you need to parse that
ahead of time and have it in a data structure that you can
then go through one step at a time. And for example,
you say, okay, first I start with awk: I'm going
to call the main function from awk.wasm,
I'm going to give it these parameters, and then I
want the output of this to be the input of
the head.wasm module that I will call.
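
The parse-then-execute flow described above can be sketched in TypeScript as follows. This is an illustration under stated assumptions, not the sandbox.bio implementation: the `Token`, `Command`, and `Pipeline` shapes are made up, the file contents are invented, and the `awk` and `head` entries are tiny JavaScript stand-ins for what would really be calls into each tool's WebAssembly module. The sketch handles only single quotes and pipes, not variables or other substitutions.

```typescript
// Sketch only: these shapes and stand-in tools are hypothetical, not sandbox.bio code.
type Token = { kind: "word"; text: string } | { kind: "pipe" };
interface Command { program: string; args: string[] }
interface Pipeline { commands: Command[] }
type Tool = (args: string[], stdin: string) => string;

// Split the input into words and pipe operators. Single quotes keep spaces
// and `|` characters inside one word.
function tokenize(input: string): Token[] {
  const tokens: Token[] = [];
  let word = "";
  let inWord = false;
  let quoted = false;
  const flush = (): void => {
    if (inWord) tokens.push({ kind: "word", text: word });
    word = "";
    inWord = false;
  };
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    if (ch === "'") { quoted = !quoted; inWord = true; }
    else if (quoted) { word += ch; }
    else if (ch === "|") { flush(); tokens.push({ kind: "pipe" }); }
    else if (ch === " " || ch === "\t") { flush(); }
    else { word += ch; inWord = true; }
  }
  if (quoted) throw new Error("unterminated quote");
  flush();
  return tokens;
}

// Group the tokens into one command per pipeline stage.
function parse(input: string): Pipeline {
  const commands: Command[] = [];
  let words: string[] = [];
  const finish = (): void => {
    if (words.length === 0) throw new Error("empty command in pipeline");
    commands.push({ program: words[0] as string, args: words.slice(1) });
    words = [];
  };
  for (const token of tokenize(input)) {
    if (token.kind === "pipe") finish();
    else words.push(token.text);
  }
  finish();
  return { commands };
}

// Invented file contents standing in for the virtual file system.
const files: Record<string, string> = { "orders.tsv": "1\tapple\t3\n2\tpear\t5\n3\tplum\t8\n" };
const lines = (text: string): string[] => text.split("\n").filter((line) => line !== "");

// Stand-ins for calling each tool's main function in its WebAssembly module.
const tools: Record<string, Tool> = {
  // Only understands programs of the form {print $N}.
  awk: (args, stdin) => {
    const match = /^\{print \$(\d+)\}$/.exec(args[0] || "");
    if (match === null) throw new Error("unsupported awk program");
    const column = Number(match[1]) - 1;
    const input = args.length > 1 ? (files[args[1] as string] || "") : stdin;
    return lines(input).map((line) => line.split(/\s+/)[column] || "").join("\n") + "\n";
  },
  // Only understands `-n COUNT`, defaulting to 10 lines.
  head: (args, stdin) => {
    const count = args[0] === "-n" ? Number(args[1]) : 10;
    return lines(stdin).slice(0, count).join("\n") + "\n";
  },
};

// Walk the pipeline one stage at a time, feeding each output to the next input.
function run(input: string): string {
  let stream = "";
  for (const command of parse(input).commands) {
    if (!Object.prototype.hasOwnProperty.call(tools, command.program)) {
      throw new Error("command not found: " + command.program);
    }
    stream = (tools[command.program] as Tool)(command.args, stream);
  }
  return stream;
}

console.log(run("awk '{print $3}' orders.tsv | head -n 2"));
```

Keeping parsing separate from execution means the runner only ever sees a list of programs and arguments, so swapping a stand-in for a real module call would not change the parser.
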
So that's kind of how WebAssembly fits
into the application. And then
in the background I have a process that
stores the file system state in IndexedDB.
So this is because I want
users to be able to make modifications to these files
on Emscripten's virtual file system, but still be able
to see them when they refresh the page. So if I modify this orders.tsv
file for example, I want that to be maintained
across sessions. So why
use WebAssembly for this use case? What are the alternatives?
Well, so here's what it looks like with WebAssembly.
You have a browser, you have a server. All the
server needs to do really is give static assets to the browser.
This is the JavaScript for the app logic and the
utility code that we get from Emscripten, and it also has
the .wasm binaries. So then once these
are in the browser, anytime you need to run a command,
you just need to execute it in the browser. You don't have to reach out
to the server at all. And also, like I mentioned,
we keep track of the file system state in the browser
itself. And so here's what it would look like
without WebAssembly. If we can't run
things in the browser, then we have to run them on the server. And so
the server would provide the browser with some application logic.
And now every time you want to run a command, let's say it's an awk
command, you have to go to the server. The server has to be managing,
spinning up and down, some sort of workers that
can execute arbitrary user commands on demand and
give the answer back to the browser.
But now this is a lot more complicated if you want to maintain
file system state, and in a way you have to,
because in the browser the state is
at least maintained until refresh, even if you don't have this
system. But on a server you would need
another way to track which user is making which request
and on which files and what is the state of each one of those files.
So the advantages of using WebAssembly:
first of all, it's a lot cheaper.
In the WebAssembly case all I'm doing is serving static assets.
This is very cheap to do. I can put that behind a CDN and I'm
done. On the server side, I would have to be managing
a lot of compute resources and a lot of storage resources,
and so that would get quite expensive. And because of that
it's a lot easier to scale this. On the WebAssembly
side, I can easily support millions of users,
whereas without WebAssembly this would be trickier.
The other advantage is that it's more secure
to execute arbitrary commands within the sandbox
of the browser and WebAssembly, whereas if you
want to do the same thing on your servers, you have to absolutely make
sure that users are not escaping
the sandbox that you have. It's also more responsive
to use WebAssembly because it doesn't need to reach out to
the server, wait for a worker to be ready, execute the request,
and go back to the browser. That makes it a lot slower,
and so we can make it more responsive with WebAssembly. And
it's a lot easier to maintain the state. With WebAssembly,
I just store the state in each user's browser. It could
be temporary, that's fine, but on the
server I have to associate a file system to each
user because if you send me a command that modifies a certain
file, that file may be different depending on where
the user is in the tutorial.
Now, there are disadvantages. The first one
is that data size is limited,
in the sense that the files that you use in the tutorials
can't be too large. If they're too large and you're doing
too much computation, the browser just won't support it.
It's going to take too long. It's going to slow things down very
dramatically. And so the way around that is
the tutorials use a very small subset of
large data sets to illustrate the point
of using some of these tools. And that's okay, that's not that
big of a disadvantage. These are tutorials, after all. They're meant to show how
to use the tools, not necessarily to fully
analyze hundreds of gigabytes of data.
The biggest disadvantage, I would say, is that all the tools
that are featured in the tutorials have to be compilable to WebAssembly
somehow. And like I mentioned earlier, that can get
really tricky. In some cases, it's just not practical to
do so. To me, that's the biggest disadvantage
for this website. Now, I've talked
a lot about how awesome WebAssembly can be. I think it's important to
keep in mind when it doesn't make sense to
use WebAssembly. I want to say three things. Number one,
too little or too much computation in the browser.
When you're facing that situation, it's probably not a
good use case for WebAssembly. So concretely,
an example of too little computation is if you
use a language like Rust,
for example, to write front-end UI,
and then that gets compiled to WebAssembly. To me that's
too little computation. It adds a lot of complexity,
first of all, but it also adds a lot of WebAssembly overhead.
And you're absolutely not going to get a speedup
for this sort of simple UI. And so
for that, I would say it's probably not a good use case for
WebAssembly. The other example is too
much computation. If you're running some analysis that takes two dozen
CPUs and 50 gigs of RAM,
probably steer clear of using WebAssembly
for that purpose as well. I think really
the sweet spot for WebAssembly in the browser
is things like audio and video processing,
gaming (it's been used by games a lot), simulations and
subsets of computations, playgrounds like sandbox.bio,
and these sorts of things where you're not doing too little or too much computation,
but just enough that it makes sense given what you're doing in
the browser and given the complexity that you're introducing into
your code by bringing in WebAssembly. So number two
is, you don't need to use WebAssembly yourself if someone
has already done the hard work of compiling the tool you're interested
in to WebAssembly. So make sure you leverage
libraries like sql.js or Pyodide if you
want to use SQLite or Python in the browser.
The idea being that now you're just using an off-the-shelf
JavaScript library. As far as you're
concerned, whether they use WebAssembly or not is kind of irrelevant.
And that is a great place to be in because
it means that you don't have to deal with all the maintenance burden
and the compilation burden. And number three,
don't try to replace containers, right? When we talk about using
WebAssembly, so far I've mostly talked about WebAssembly in the
browser. You can also use WebAssembly outside the browser.
And so here's a hypothetical example.
You have a whole bunch of containers that are used for your
Python web application. You have Nginx, Postgres for the
database, and then you have the Python side of things that uses
Gunicorn and Flask.
You're not going to compile every single one of
these containers into a WebAssembly
binary instead. First of all, that's going to be really complicated.
Dealing with things like sockets, especially for Postgres,
is going to be nontrivial. But also when you compile Python to WebAssembly,
that adds a significant amount of overhead and typically you'll see
a lot of slowdown. And also the benefits just
aren't really there. And so to me, this blind
replacement of containers with WebAssembly does not make sense.
And I think most people in the field agree that WebAssembly
will not replace containers. It's just that in certain situations,
WebAssembly becomes another option. So to
me, where it really makes sense to use WebAssembly outside
the browser is, first, if
you want to safely run user-provided code.
And so what this means is if you have an application and you want to
let users write code to extend the functionality,
using a sandbox like WebAssembly outside the browser makes
a ton of sense, and that's a really good use case. Another one
is edge computing. Edge computing is
the idea that you can spread your code all over the world,
and depending on where your users are,
they will execute the code in the data center that is closest to them.
And so there, clearly, speed matters if you're doing that.
And so one thing that's nice about WebAssembly is that
it is more lightweight than containers, and so it can initialize
a lot more quickly. So that's another use case where it kind of
makes sense. Finally, I wanted to share some
resources with you that I thought could be useful.
The first one I'll mention is sandbox.bio itself.
It is primarily focused on bioinformatics,
but it also has interesting command-line tutorials that would be
applicable to a general audience, like awk and jq.
We also have playgrounds. So I often find myself
writing an awk or sed command where I want to
test something really quickly without having
to type something, press enter, up arrow, modify,
enter, up, modify. And so this playground
lets you do that very easily. So anything that you type in here gets immediately
executed in the browser and shows you the output
of the command. And this obviously also uses WebAssembly.
Another resource: I have this
open source package called biowasm. This is a
library of mostly bioinformatics tools that are
compiled from C to WebAssembly, but I think
it could be a useful resource if you're looking for other examples of
compiling complex applications to WebAssembly.
And finally, there's also my book at levelupwasm.com,
and I also have a whole bunch of free articles
and other talks that I've given that you might find interesting.
And with that, thank you very much for being here.