Transcript
This transcript was autogenerated. To make changes, submit a PR.
So today I'm going to be talking about implementing a performant URL parser
from scratch. Before we start, I want to give a little bit of information
about me. I'm Yagiz Nizipli. I'm a software engineer and a master's student
at Fordham University. I'm a Node.js core collaborator and a performance
team member. I've been mainly focusing on text encoders and decoders and
their performance in Node.js. I'm the creator of url-state-machine and
fast-querystring.
So let's give a little bit of background before we dive into what URL
parsers are and how they work. A brief introduction to JavaScript runtimes:
Node.js is a runtime on top of the V8 engine, the JavaScript engine behind
Chromium. Just like Node.js, Deno also uses V8, and a newer runtime, Bun,
uses WebKit's engine. Basically, in JavaScript runtimes, whenever you call
certain functions, the JavaScript you write in userland runs C++ code using
a bridge through Node.js. The value that you retrieve from the C++ side is
deserialized on the JavaScript side and then presented to the user, which is
the developer. So computationally heavy functions live in C++. This means
that you don't need to write all of your code in JavaScript; you can write
it in C++ if you want to produce performant code. So let's start with URL
parsers.
Why do URL parsers matter? URL parsers are one of the most important pieces
of functionality for a web server. Every single network request uses the URL
parser, and it basically forms the baseline of benchmarks for any network
request. This is an example diagram of a request coming to a server. On the
left side there's a person, let's say Alice or Bob. Bob submits credentials
through a user agent, which sends them to the client. The client sends a
request, submitting the credentials, client id and redirect URL, to the
authorization server; the authorization server returns an authorization
code; then the user agent redirects with the authorization code, the client
requests a token, and so on and so forth. Basically, every arrow from left
to right includes a URL, and for every left-to-right request the server
needs to parse and understand what that URL is and what the path is. That's
where URL parsers come in. So the URL parser is a very significant
component, but it's complex and leads to bugs, and that's why nobody focuses
on it and nobody wants to focus on it.
This is one of the examples from when I was working on this project: I
thought that I had found a bug, and it seemed that a URL with an IPv4-like
host made of 256s (something like 256.256.256.256) was invalid. One of the
libraries, whatwg-url, showed that it actually worked in Chrome and Safari,
but not in Node.js. This meant that the initial implementation of the URL
parser was somehow wrong or not up to date. After this feedback, I opened an
issue: the URL parser does not validate IPv4 addresses with more than four
parts. That is one of the issues; "URL is not compliant with the
specification" is another issue created by me, and later they were all
fixed.
But let's talk about the goal of a URL parser. What is the goal? If you look
at this diagram, you will see highlighted in orange something like
https://user:pass@example.com:8080. Basically, https is the protocol. The
username and password are optional values, followed by an @ character, and
then the hostname, which is always there. The port, if it's one of the
special ports like 80 or 443, is not required, because URL parsers
understand that. And then we might have a pathname, a search and a hash.
Looking at this diagram, you could easily say: oh, okay, it's really easy to
understand and implement a URL parser, right? You just need to check if it
starts with https, whether there's a user and password, and then what about
hostnames? What about ports? What about the pathname? We want to make sure
the search is a valid one, and so on and so forth. But it's not as easy as
writing a regex; it's much more complicated.
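To make those components concrete, here is a small example using the WHATWG
URL class that ships with Node.js and browsers (the example URL itself is
made up):

```js
// Breaking a URL into the components described above with the built-in
// WHATWG URL class.
const url = new URL('https://user:pass@example.com:8080/path/name?search=1#hash');

console.log(url.protocol); // 'https:'
console.log(url.username); // 'user'
console.log(url.password); // 'pass'
console.log(url.hostname); // 'example.com'
console.log(url.port);     // '8080'
console.log(url.pathname); // '/path/name'
console.log(url.search);   // '?search=1'
console.log(url.hash);     // '#hash'
```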
So what is the state machine behind it? There is actually a specification
for writing URL parsers, and this is an example of it. This diagram
represents all of the states that need to be traversed in order to complete
URL parsing. If you look at it from a closer perspective, there is a scheme
start state. The scheme start state can go to the no scheme state; let's say
we don't have http or https. If the input starts with a scheme, then it goes
to the scheme state, and we need to parse that scheme: what are the valid
ones? If it's https, then we might omit the port part, because https is
always 443; if it's http, then it's 80; if it's ssh, and so on. So there are
some special cases (sketched below).
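For reference, a minimal sketch of the special schemes and their default
ports as listed in the WHATWG URL specification:

```js
// The "special schemes" and their default ports from the WHATWG URL
// specification; when a URL uses its scheme's default port, the parser
// omits the port entirely.
const SPECIAL_SCHEMES = {
  ftp: 21,
  file: null,
  http: 80,
  https: 443,
  ws: 80,
  wss: 443,
};
```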
According to the scheme state, the parser can go to the special authority
state, the special relative or authority state, or the relative state. The
special authority state means that I might have a username and password
combination, so I need to parse that. If it's a special relative or
authority state, it can be a file path: on Windows it's one thing, on macOS
and Unix systems it's something else. And according to that, it can be a
relative state. So it's not as easy as writing a regex in userland. These
are some of the states that are available in the URL parsing specification.
So let's dive into the scheme start state. If c (the current character) is
an ASCII alpha, append c, lowercased, to the buffer and set the state to the
scheme state. That's pretty straightforward. Otherwise, if a state override
is not given, set the state to the no scheme state and decrease the pointer
by one; otherwise, it's a validation error, return failure. If you look at
it from this perspective, it's pretty straightforward, right? But this is
just one of the, I think, 20 states that are available in this
specification.
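A rough JavaScript sketch of that step might look like this; the state
constants and the parser shape are made up for illustration:

```js
// Illustrative sketch of the "scheme start state" quoted above.
const SCHEME_STATE = 1;
const NO_SCHEME_STATE = 2;

const isASCIIAlpha = (c) => /^[A-Za-z]$/.test(c);

function schemeStartState(parser, c) {
  if (isASCIIAlpha(c)) {
    // Append c, lowercased, to the buffer and move to the scheme state.
    parser.buffer += c.toLowerCase();
    parser.state = SCHEME_STATE;
  } else if (!parser.stateOverride) {
    // Re-process this character in the no scheme state.
    parser.state = NO_SCHEME_STATE;
    parser.pointer -= 1;
  } else {
    // Validation error: parsing fails here.
    return null;
  }
}
```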
So let's look at a slightly more complex one: the no scheme state. What
happens if I don't have http, https, and so on? If base is null, or base has
an opaque path and c is not U+0023 (#), it's a validation error, return
failure. So first of all we need to know what an opaque path is; it's all
explained in the documentation. Otherwise, if base has an opaque path and c
is #, set the URL's scheme to the base's scheme, the URL's path to the
base's path, the URL's query to the base's query, and the URL's fragment to
the empty string. So if my path starts with a #, that means this is a
fragment, so the URL needs to go to the fragment state, which is one of the
end states, and so on. So this is what the URL parsing part of the
specification is all about.
I can hear you asking: why are we even talking about this? The main issue
that actually led me to this implementation is that there is a significant
performance issue with the WHATWG URL implementation in Node.js, and in
order to improve it, an issue was opened and I wanted to make it faster. So,
can it be faster? This is one of the issues, about re-implementing the
WHATWG URL parser in WASM; the URL is there for you if you want to look. It
was actually opened by one of the Technical Steering Committee members,
James Snell. He basically said that it should be possible to realize a
significant performance improvement by porting the implementation to WASM.
WASM is WebAssembly. So if we implement the URL parser through WASM, then we
might get a faster URL parser. This was pretty interesting to me, because
now I knew what I needed to do: I needed to port the implementation to a
language that's performant, like C++ or Rust, and then expose the API
through WebAssembly so that Node.js applications can consume it.
But before we start, let's take a step back and understand what our
options are. Webassembly is really webassembly
our only option. The potential solutions,
not for writing URL parses, but also for most of the application,
is that you can either compile C, rust or any other language
through webassembly and expose the API. Or you can
write an API which is node API in C rust,
and you can actually run the C code and expose it
through the c binding. Or you can write a pure Javascript.
So we don't want to write and expose our
implementation through C
and rust through any API, because that's the actual implementing what's
happening right now. So let's start with webassembly then.
I chose Rust because it's proven that Rust is super performant. Actually, I
also wanted to learn Rust, so that was one of the emotional decisions that
fed into this decision-making process. WebAssembly and Rust work really well
together, thanks to Firefox; there are lots of contributions on that side,
there are really great libraries, and the ecosystem is ready for mass
adoption. One of the most crucial examples in Node.js is Undici. Undici is a
really good example: it's backed by Node, and by compiling llhttp to
WebAssembly, they got a 1329% performance boost. It was really good, and it
set a strong precedent for WebAssembly.
But those were the expectations. The reality is decoding UTF-16 to UTF-8.
WebAssembly communication relies on shared memory, so there are problems
with serialization and deserialization of that data. Shared memory requires
encoding and decoding, and text encoding and decoding in Node.js relies on
C++, which means that we're going to use TextEncoder and TextDecoder. So why
did we end up with TextEncoder and TextDecoder in Node.js?
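A minimal sketch of what that boundary looks like; alloc, parse, and memory
here are hypothetical exports of the compiled module, used only to show the
encode/decode round trip every call has to pay:

```js
// Every call crosses the JS <-> WebAssembly boundary: the JavaScript string
// (UTF-16 internally) must be encoded into the module's linear memory as
// UTF-8, and the result decoded back into a JS string.
const encoder = new TextEncoder();
const decoder = new TextDecoder();

function parseUrlViaWasm(instance, input) {
  const bytes = encoder.encode(input);                 // UTF-16 -> UTF-8
  const ptr = instance.exports.alloc(bytes.length);    // reserve linear memory
  new Uint8Array(instance.exports.memory.buffer, ptr, bytes.length).set(bytes);
  const outLen = instance.exports.parse(ptr, bytes.length);
  const out = new Uint8Array(instance.exports.memory.buffer, ptr, outLen);
  return decoder.decode(out);                          // UTF-8 -> UTF-16
}
```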
So, a quick recap. The goal of this presentation was that we were trying to
improve the existing implementation by removing the C++ bridge, but text
encoding and decoding in Node.js relies on C++. So let's benchmark all of
those things. If you use Rust and expose it through WebAssembly (and this is
a rough estimation, not a full-feature-set URL parser written in Rust), the
Rust implementation is 28.24% slower for the URL constructor. That's just
passing the URL and returning an object, that's it. On the other side,
URLSearchParams set is 86% slower, because whenever you call set, get and
those kinds of methods, there's a C++-to-JavaScript communication; we are
using WebAssembly, but there's always a bridge in between, and for some
operations it's 96% slower. The main reason URLSearchParams is slower is
that it's actually implemented in JavaScript in Node.js, but I wanted to see
whether it was possible to also improve the URLSearchParams implementation
too. So it means that it's 86% slower. That's a fact, not a random number.
And here we are again, starting this presentation from scratch: can we write
a fast URL parser using JavaScript? So today I'm going to talk about the
road to a fast URL parser for Node.js. My name is Yagiz Nizipli, and let's
start from scratch again. So why JavaScript? There were three main
alternatives that we considered: we could use C++ or Rust through
WebAssembly, we could use Node-API with C++, or we could just write plain
JavaScript. The existing implementation is in C++; we tried WebAssembly and
did some benchmarking, but it was slow; and there are no other solutions
that do not use a C++ bridge. So I'm kind of limited by the technology of my
time, as Howard Stark said. As a result, the url-state-machine npm package
arose. It's available on GitHub under anonrig, which I will include at the
end of this presentation. It is basically a super fast, compliant URL state
machine for Node.js that's written in JavaScript, though not as fast as I
would expect it to be.
but not as fast as I would expect it to be. So this
is one of the packages, the usage example. So if you just pass the
string of the URL, then it actually returns an object which
is scheme username password, and you can retrieve those information.
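A usage sketch based on that description; treat the exact export and
property names as assumptions rather than the package's documented API:

```js
// Hypothetical usage of the url-state-machine package described above.
const URLStateMachine = require('url-state-machine');

const state = new URLStateMachine('https://user:pass@example.com:8080/path?q=1#hash');
// The parsed components are exposed on the resulting object.
console.log(state.url.scheme);   // 'https'
console.log(state.url.username); // 'user'
console.log(state.url.password); // 'pass'
```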
So let's do some benchmarks on the JavaScript implementation. There's
another library called whatwg-url, which provides the same feature set, and
it's around 576% slower; url-state-machine is, yes, five or six times
faster, I guess. And the actual URL implementation provided by Node.js is
85% faster than my solution. This means that the decoding and encoding side
of it is slow, and JavaScript is kind of slow for these particular
operations.
There are some optimizations that I did, and I'm going to talk about those.
The first is using a function versus a class. Up until Node.js 18, up until
V8 versions 9.5 to 9.7, if you created a class with private fields on it, it
was actually 55% slower, and just by converting the class to a function,
using the function/prototype style, you make your code 55% faster. So this
is one of the first optimizations that I did; a rough sketch follows.
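A rough sketch of that change; the names are made up for the example:

```js
// Class with private fields: measurably slower on the V8 versions
// mentioned above.
class ParserClass {
  #buffer = '';
  #pointer = 0;
  append(c) { this.#buffer += c; this.#pointer++; }
}

// Equivalent function/prototype version, which benchmarked faster there.
function ParserFn() {
  this.buffer = '';
  this.pointer = 0;
}
ParserFn.prototype.append = function (c) {
  this.buffer += c;
  this.pointer++;
};
```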
The second thing: changing your conditionals from if/else to switch, or vice
versa, can actually improve your code. If you have more than ten keys and
you're doing an if/else chain, converting it to a switch is faster; if you
have fewer than ten, if is faster. So again, it depends on how your
implementation is shaped, and you need to choose correctly and do some
micro-benchmarks in order to understand. But again, if you're using a newer
Node version, these dynamics can change, because these are micro-benchmarks
that depend on V8.
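A toy illustration of the trade-off, using numeric parser states as the
keys:

```js
// With only a handful of branches the if/else chain tends to win; with many
// branches (the real parser has around 20 states) the switch tends to win.
function handleIf(state) {
  if (state === 0) return 'scheme start';
  else if (state === 1) return 'scheme';
  else if (state === 2) return 'no scheme';
  // ...many more branches
  else return 'unknown';
}

function handleSwitch(state) {
  switch (state) {
    case 0: return 'scheme start';
    case 1: return 'scheme';
    case 2: return 'no scheme';
    // ...many more branches
    default: return 'unknown';
  }
}
```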
Another one is string-to-number comparison. This is really important,
because think about how you would compare two different strings. First you
would check whether their lengths are equal; if they are, then you would
traverse every character one by one and make sure each character is equal to
its counterpart. This means that I need to access each character of both
compared strings once, do an equality operation, and check it. That's the
idiomatic implementation you have to do. So just by changing my states from
strings to numbers, for example authority becomes 100 and scheme start
becomes 101, I got a really huge performance improvement. This is one of the
changes that I did, and again, it's really important.
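A sketch of what the change looks like; the constant values follow the
numbers mentioned above:

```js
// Numeric states: comparing small integers avoids the per-character string
// comparison described above.
const State = {
  AUTHORITY: 100,
  SCHEME_START: 101,
  SCHEME: 102,
};

// Before: every transition check compares strings character by character.
// if (this.state === 'authority') { ... }

// After: a single integer comparison.
// if (this.state === State.AUTHORITY) { ... }
```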
Another one is slice versus substring. Slice creates a new string, substring
does not, but slice is actually faster. This is one of the examples: instead
of calling substring with the pointer plus one and the input length, I just
changed it to a slice, and the calculation of the fragment is a lot faster.
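A small illustration of that change, with input and pointer standing in for
the parser's internal fields:

```js
const input = 'https://example.com/path#section';
const pointer = input.indexOf('#');

// Before: substring from the character after '#' to the end of the input.
const fragmentBefore = input.substring(pointer + 1, input.length);
// After: slice with the same start index, which benchmarked faster here.
const fragmentAfter = input.slice(pointer + 1);

console.log(fragmentBefore === fragmentAfter); // true, both are 'section'
```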
On top of that, isNaN: checking whether something is not a number, versus
checking for undefined, is actually a really costly operation. This is one
of the mistakes that I made; not a mistake exactly, but it was mentioned
directly in the URL parser specification and I wanted to be 100% compliant
with it. So, in order to make it faster, at this point I'm starting to
diverge from the specification and introduce optimizations on top of it.
This is what happens with all of those runtimes: if you want to make things
faster, you need to find shortcuts, and those shortcuts don't always mean
that you are following the specification. There might be some Unicode issues
that occur, and because you are not following the specification, it will
also be super hard for you to track them down. So yeah, instead of taking
the code point and checking it with isNaN, I just check whether the
character is undefined, and if it's not, I get the code point at character
zero. This is also a really huge boost: instead of calling isNaN, I'm just
checking whether it's undefined again.
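A rough sketch of that shortcut, again with input and pointer standing in
for the parser's fields:

```js
const input = 'https://example.com';
const pointer = 0;
const c = input[pointer];

// Before: always take the code point and test it with isNaN to detect the
// end of the input.
// if (isNaN(input.codePointAt(pointer))) { /* end of input reached */ }

// After: a plain undefined check, only calling codePointAt when needed.
if (c === undefined) {
  // end of input reached
} else {
  const codePoint = c.codePointAt(0);
  // ... handle the code point for the current state
}
```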
Again, one of the most important parts of optimization is memoization. If
you're recalculating a value every time, doing the same comparison many
times, then storing that value in memory and accessing it inside the
function instead is, again, super fast. This is one of the optimizations
that I did. The URL specification actually states that you need to check
whether the current URL scheme is a special scheme. Previously I was just
checking whether it was in an object of special schemes, and then I would
assume it was a special URL. But the thing is, you only need to set this
flag, whether it's a special URL or not, when you are changing the URL
scheme; then it's actually a lot faster.
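A sketch of that memoization, with illustrative names:

```js
// Instead of looking the scheme up in the special-schemes table on every
// check, cache the answer at the only place the scheme can change.
const SPECIAL_SCHEMES = { ftp: 21, file: null, http: 80, https: 443, ws: 80, wss: 443 };

// Before: repeated lookups.
// const isSpecial = url.scheme in SPECIAL_SCHEMES;

// After: compute it once, whenever the scheme is set.
function setScheme(url, scheme) {
  url.scheme = scheme;
  url.isSpecial = scheme in SPECIAL_SCHEMES;
}

const url = { scheme: '', isSpecial: false };
setScheme(url, 'https');
console.log(url.isSpecial); // true
```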
So let's do some statistics. As a result of this work, almost 2,000 lines of
code were written, more than 800 benchmarks were run, and the current
implementation is 99.9% spec compliant. The remaining 0.1% is one test
related to text encoders, encoding UTF-8 and UTF-16, but it can be brought
to 100% really easily. There were 733 successful web platform tests run to
make sure that our implementation is correct. And even though I ran all of
those WPT tests, only 96% of the code was covered by them. This means that
the URL specification does not have the necessary tests to cover 100% of the
code that was written, and I think this should be handled on the web
platform tests side.
In terms of performance, the JavaScript-based parser was 566.66% faster
compared to the other JavaScript implementation, but again it was 95 to 96%
slower than the native C++ implementation. So basically, as a really great
guy from Brooklyn said, with great power comes great responsibility. We are
really dependent on V8 for performance, and I hope we will have some
breakthroughs after V8 fixes this performance problem in the future. A small
footnote: this whole presentation, this whole project, was a product of my
graduation thesis at Fordham University. I would like to thank William Lord
from Fordham University, as well as James Snell and Matteo Collina from
Node.js. Thank you for that, and thank you for listening.