Transcript
This transcript was autogenerated. To make changes, submit a PR.
Speed. Safety. Developer experience.
Fearless concurrency. These are
all things that you associate with programs written in Rust.
How about some more buzzwords? Words like elegant?
Oh, that's a good one. I'm Xe Iaso and I'm going
to share the gory details of how my blog works and why people often
mistake it for a static website. Buckle up and kick back.
We're going to learn about the Internet today. I'm Xe Iaso.
You've probably seen my blog on that orange website or that other orange website.
I also study philosophy and have been writing a novel. I work
at Tailscale as the Archmage of Infrastructure and I do developer relations.
My blog is somehow one of the best resources for learning Nix and NixOS.
This talk will contain opinions about website design and the like.
These opinions are my own and are not the opinions of my employer.
Websites are social constructs. There are only servers that
speak this weird HTTP protocol and then sometimes spit out a markup
language called HTML, if you're lucky. This HTML is
then understood by very principled humans or web browsers,
and then it all gets transformed into roughly what the writer or designer wants
it to look like. I write all my posts for the blog
in Markdown in Emacs. Sometimes I do brain
dumping or initial drafting in Apple Notes on my MacBook or iPad,
but it all gets into Emacs eventually for publication.
This Markdown has some front matter in YAML. This has metadata
like tags, the series it is in, stream recordings
related to the post, and when it is scheduled to be posted publicly.
This is all used by the templates to make sure that I can't forget to
put it in articles, but the main focus is
on the contents of the post, the words I type out.
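To give a rough idea of what that front matter carries, here's a minimal sketch of how it could be deserialized with serde. The field names are illustrative, not my blog's exact schema.

```rust
// Hypothetical sketch of the front matter as a serde struct.
// Field names are illustrative only.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct FrontMatter {
    title: String,
    #[serde(default)]
    tags: Vec<String>,
    /// The series the post belongs to, if any.
    series: Option<String>,
    /// A stream recording related to the post, if any.
    vod: Option<String>,
    /// When the post is scheduled to go public (ISO 8601 date).
    date: String,
}

fn parse_front_matter(yaml: &str) -> Result<FrontMatter, serde_yaml::Error> {
    serde_yaml::from_str(yaml)
}
```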
In the process, I've organically grown my own custom Markdown
dialect on top of a Markdown parser named Comrak.
Comrak is made by a friend and it is the most important part of this
website. However, over the
years I've found that vanilla Markdown just isn't good enough for my needs.
I've grown features out on my blog that require fancier
things, like the conversation snippets and the newly added AI-generated
hero images. At first, I just implemented
a hacky Markdown extension. It applied the conversation
snippet logic to anything that matched a Markdown link
with a weird URL scheme.
Unfortunately, that ended up not scaling well
as the conversation snippets got more complicated, like when
I need to add links. So I brought in
a library called lol_html. I use
this to transform my custom HTML elements into
a bunch of other HTML using a bank of templates.
This allows me to write Markdown with occasional HTML for the things
Markdown can't express. Then I can rely on my blog engine
to translate those shortcodes into what people see on the site.
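As a rough sketch of what that lol_html pass looks like (the element name, attribute, and output HTML here are made up for illustration; the real shortcodes render through a bank of templates):

```rust
use lol_html::{element, html_content::ContentType, rewrite_str, RewriteStrSettings};

// Expand a custom element into plain HTML before it hits the page.
fn expand_shortcodes(input: &str) -> Result<String, lol_html::errors::RewritingError> {
    rewrite_str(
        input,
        RewriteStrSettings {
            element_content_handlers: vec![element!("xeblog-hero", |el| {
                // Pull the attribute off the custom element and replace the
                // whole element with the HTML we actually want to serve.
                let file = el.get_attribute("file").unwrap_or_default();
                el.replace(
                    &format!(r#"<img class="hero" src="/static/img/hero/{file}.jpg">"#),
                    ContentType::Html,
                );
                Ok(())
            })],
            ..RewriteStrSettings::default()
        },
    )
}
```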
In my blog, I use a templating engine called ructe that
takes a weird metasyntax on top of HTML and then
spits out Rust code. This means that when you load
a page like my homepage, you're hitting a function that renders that homepage to a
string buffer. That string buffer is what my website throws
back into the void, and hopefully it all comes back to you on your end.
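A minimal sketch of what that handler path could look like, assuming a ructe template at templates/index.rs.html served by axum; the generated function name depends on the template file and ructe version, so index_html here is an assumption:

```rust
use axum::response::Html;

// ructe's build script writes the compiled templates into OUT_DIR.
include!(concat!(env!("OUT_DIR"), "/templates.rs"));

// Loading the homepage calls a plain Rust function that renders the
// template into a string buffer, which is then sent back to the client.
async fn index() -> Html<String> {
    let mut buf = Vec::new();
    templates::index_html(&mut buf).expect("template rendering should not fail");
    Html(String::from_utf8(buf).expect("templates emit valid UTF-8"))
}
```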
As a side effect of doing all this, it happens fast.
Really fast. So fast that it's faster
than a static website. Turns out serving things out
of RAM is very fast. And when
I say fast, I mean that I have tried so hard
to find some static file server that could beat what my site does.
I tried really hard. I compared my
site to Nginx, OpenResty,
Tengine, Apache, the Go standard library,
warp in Rust, axum in Rust, and finally a Go
standard library HTTP server that had the site data compiled
into RAM. None of them were faster, save the
precompiled Go binary, which was like 200
megabytes and not viable for my needs. It was
a hilarious benchmarking session. I have accidentally
created something that is so efficient that it is hard to
express how fast it is. This thing is efficient
and fast, but the syntax of ructe is awful.
I have to specify the types of my code in the template itself.
I have to be sure that the automatically generated template code is importing any
of the non-default traits I need. It works,
but it kind of sucks.
So I've been playing with Maud instead.
Maud is a procedural macro library that lets you transform
its own domain-specific language into HTML
at compile time. You can make your components normal Rust
functions. I use Maud for all of my shortcodes,
and I've been slowly converting my site over to use it.
It's pretty great, you should check it out.
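For a taste of what a Maud component looks like (the function signature and CSS classes here are made up for illustration, not my real shortcode API):

```rust
use maud::{html, Markup};

// A conversation snippet as an ordinary Rust function returning Markup.
fn conv(name: &str, mood: &str, body: Markup) -> Markup {
    html! {
        div class="conversation" {
            // Avatar for the character speaking in this snippet.
            img class="avatar" src=(format!("/static/avatars/{name}_{mood}.png"));
            div class="message" {
                b { (name) } ": " (body)
            }
        }
    }
}
```

Calling something like conv("Mara", "hacker", html! { "Don't forget to run the tests!" }) hands back markup you can splice straight into a bigger page.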
One of the biggest things you see me use these for is the little conversation
snippets that I have in blog posts. This was originally created
to absolutely dunk on homophobes that were angry that someone
put furry art in an information security blog post,
but this also lets me experiment with a more Socratic dialogue style for
helping to explain things in more detail. I now
write everything with this style and have to go back and edit
it out for the work blog. My coworkers can confirm this.
This flexibility also lets me add things like hero images
generated with AI. I use these to help make my posts more
visually interesting. I'm still refining my style and
trying to make things better, but I'm just absolutely
terrible at CSS.
One of my favorite parts of how this site works is something that will probably
make the theoretical computer scientists in the crowd start crying.
When my blog loads everything from the disk into RAM, it stores all the posts
in the moral equivalent of a linked list. When you, as a
reader, look at one of my posts, it's doing an O(n)
lookup on potentially every one of my posts to figure out which post
to display. Normally, this would be terrifying,
especially with the amount of traffic my blog gets, as represented by this
handy graph here. You'd think that something that
does a lookup on potentially every post, in the worst case for
the most common thing on the biggest data set, would make performance
terrifyingly slow. You'd also think that
with the amount of traffic that I get, it would be an active
detriment and I'd be trying to remove it.
However, this is when I play my trap card.
When you look at the analytics, you can see that the most frequently read article
is the most recently posted one. This means that it's
not actually an O(n) lookup. Most of the time, it's constant
time complexity. In theory, this design is the
terrifying type of thing that you'd normally find out about after you accepted a
job offer and had your first day of work, but in practice it's
fine. It is a bit weird though, and I may need to rethink this in
the future, but this has scaled to almost 300 posts for now,
so I think it's okay.
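Boiled down, the lookup is roughly this, with a simplified stand-in for the real post type; assuming the list is sorted newest-first, the hot path usually stops after a post or two:

```rust
// Simplified stand-in for the real post type.
struct Post {
    link: String,
    body_html: String,
}

// O(n) over every post in the worst case, but because the list is sorted
// newest-first and the newest post gets most of the traffic, the common
// case bails out almost immediately.
fn find_post<'a>(posts: &'a [Post], link: &str) -> Option<&'a Post> {
    posts.iter().find(|post| post.link == link)
}
```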
When my site starts up, it reads every post from the disk into RAM.
Rust makes that really easy. With Tokio, I can
schedule a bunch of jobs and then wait for them all to finish.
This lets me spread the load out to every CPU core so that the
posts can load up to twelve times as fast as they would if everything
was done iteratively. Once it's done loading them,
it sorts them and then puts them into the list for the blog's data structures.
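Here's a rough sketch of that startup path using Tokio's JoinSet; Post and read_post are simplified placeholders, not the real types:

```rust
use tokio::task::JoinSet;

// Simplified post type; the real one carries front matter, rendered HTML, and more.
struct Post {
    date: String, // ISO 8601, so lexicographic order is chronological
    body_html: String,
}

async fn read_post(path: std::path::PathBuf) -> Post {
    // In the real site this parses front matter and renders the Markdown.
    let raw = tokio::fs::read_to_string(&path)
        .await
        .expect("post file should be readable");
    Post { date: "2024-01-01".into(), body_html: raw }
}

async fn load_posts(paths: Vec<std::path::PathBuf>) -> Vec<Post> {
    // Spawn one task per post so the parsing spreads across every CPU core.
    let mut set = JoinSet::new();
    for path in paths {
        set.spawn(read_post(path));
    }

    // Wait for every task to finish and collect the results.
    let mut posts = Vec::new();
    while let Some(result) = set.join_next().await {
        posts.push(result.expect("post loader task panicked"));
    }

    // Newest first, so the common lookups stay cheap.
    posts.sort_by(|a, b| b.date.cmp(&a.date));
    posts
}
```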
I can do things in one line of Rust that would be something like
50 lines of Go. Rust allows me to have
a lower cognitive complexity because I can just
rely on things being taken care of for me instead of having to reinvent
the wheel all the time. I think in high-level logic
and let the compiler take care of the lower-level details of making
it work. It's great.
An amusing part of all of this loading-things-into-RAM
stuff is that my website is actually
stateless. This allows me to move it around to any server
I want very easily in case something very bad
happens.
I can also take all this data in RAM and then transform it into
whatever kind of feed I want. I currently support RSS,
Atom, and JSON Feed so that you can subscribe to my blog with
whatever reader you normally use.
JSON Feed allows for custom extensions, and I have played
with one that gives you some of the extra metadata in my front matter
that isn't exposed in JSON Feed itself.
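JSON Feed reserves keys starting with an underscore for extensions, so a feed item with extra front matter could look roughly like this with serde; the _xesite name and its fields are illustrative, not the real extension:

```rust
use serde::Serialize;

#[derive(Serialize)]
struct FeedItem {
    id: String,
    url: String,
    title: String,
    content_html: String,
    // Custom extension object; readers that don't know about it just ignore it.
    #[serde(rename = "_xesite", skip_serializing_if = "Option::is_none")]
    extra: Option<Extra>,
}

#[derive(Serialize)]
struct Extra {
    #[serde(skip_serializing_if = "Option::is_none")]
    series: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    vod_url: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    slides_url: Option<String>,
}
```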
Normally, this doesn't show much of anything useful. It's where
I put things like the Twitch and YouTube links associated with a post,
the link to the slides and talk pages, or the name of the blog
post series, if one exists. I don't know if anyone uses
these, but I've been starting to use them for some of my internal pipeline things.
I mentioned my
website was stateless, right? Turns out
that's not totally the case.
It's mostly stateless, sure, but it also
has a stateful component that organically grew to meet my needs.
This stateful component sort of started out as a personal API for
other things. I named it mi, after the Toki Pona word
for "me". I use this daily to track some
personal things, but it became really useful once I found
the IndieWeb concept of POSSE: publish on
your own site, syndicate elsewhere.
This concept allows me to post things on my blog and then have something else
take over to announce those posts on Twitter and Mastodon.
With messages like this, everything is
automated. I don't have to lift a finger except
for Patreon. Patreon's API doesn't
allow you to generate posts, and sometimes I can
forget to link the post to my patrons. I'm trying to get better
about this, but I would really love to just hand this over to a machine
and stop having to care about it.
The other major thing I use mi for is Webmentions.
Webmentions are kind of like @-mentions on Twitter,
but generalized for any website on the Internet.
It's another IndieWeb protocol that a surprising number of websites
support, along with bridges for things like Twitter and Mastodon.
mi receives and stores all of the Webmentions I get.
When my site starts up, it reaches out to mi and gets a list
of Webmentions for every post it loads into memory. This means that there's
potentially some delay from you sending the Webmention to it showing up on my
blog, but in practice that's okay.
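The protocol itself is tiny: when someone mentions one of my posts, their software form-POSTs a source and target URL to the advertised Webmention endpoint. A minimal sketch of the sending side with reqwest; the endpoint URL below is a placeholder, since real senders discover it from the target page:

```rust
// Minimal sketch of sending a Webmention. Endpoint discovery is skipped and
// the URL here is hypothetical, standing in for whatever mi advertises.
async fn send_webmention(source: &str, target: &str) -> Result<(), reqwest::Error> {
    let endpoint = "https://mi.example.com/api/webmention/accept";
    reqwest::Client::new()
        .post(endpoint)
        .form(&[("source", source), ("target", target)])
        .send()
        .await?
        .error_for_status()?;
    Ok(())
}
```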
I would like it to be faster, but that would mean having to move
the Webmentions database into my main blog app, and I don't
know if I'm ready to do that or not because it would make moving my
website around a lot more complicated.
So I mentioned on my blog before that I host everything on one big NixOS
server. Now, this means that I would be able to store things
on that server fairly durably. But I also have
mentioned that my site is stateless and it farms out its state to
a stateful microservice. You may be wondering something like why
would you do that to yourself? I have a good reason for it, but in
order to explain why, I want to take a moment to trace over the history
of my website's hosting. Heroku's free
tier was one of the things I used to break into tech. When I
started my job in Mountain View and got my former domain name,
I was likely using Heroku to host that website.
I don't have notes from back then, I'm going off of my gut feeling and
some projects that I have on GitHub.
At that point, my website was a showcase of my ability to write
things using a web framework called Lapis.
You can think of it as Rails for Lua, built into the side
of Nginx. This variant of my website was in
use for a few years until I rewrote it in late 2016.
A huge part of
how that website worked was that it parsed the Markdown for each post every
time the page loaded. This let me edit and test
things very quickly, which made writing posts and previewing
them in real time possible. I didn't fix this
before my first article got to the front page of Hacker News,
which meant that my website was a bit slow, but it did
survive the load, barely.
After that, I set up a cache server named OlegDB.
OlegDB is a key-value store written in C by some
friends, and it is a joke about mayonnaise that has been taken way too far.
I used OlegDB in my website to cache the
rendered HTML for each Markdown post.
When you loaded a page, it made another request to the OlegDB
server to grab the contents from the cache. This was faster than
parsing the Markdown on every page load, and it ended up
being the thing that made my site survive the wrath of Hacker
News. Some time
after my site was deployed on Heroku, I moved it over to a server running
Dokku. Dokku is a self-hostable Heroku
clone that lets you run a Heroku-like environment with Docker on a server
you own and operate. I've used Dokku
for years since, and for a very long time it was the first thing I
reached for when trying to deploy anything to the cloud.
It's got templates for spinning up basically any database you
could think of at the time, and it was trivial to just spin up
infra when I went to experiment and kill it off when I was done,
no additional cost required.
I was very price sensitive back then. Being able to host many apps
on the same $5 per month server was a
huge advantage compared to hosting one app on one $5
per month Heroku app.
I've also been a member of the Go community Slack since it was founded.
Time and time again, while helping
people with Go, I had seen people
wanting an example of a web application that used the Go standard
library as its framework, and there was no really good example of
it. I had also reached a performance optimization
point where I didn't know how to make my site on Lapis run faster,
so I kind of got nerd-sniped and decided to rewrite my site in
Go. The first iteration
used a Go backend with PureScript and React on the front end.
This worked for some time, but after I realized that my target audience
sometimes uses weird browsers that don't support JavaScript, I
removed the client-side rendering entirely and had the server spit
out HTML to the client like a traditional website.
This allowed me to survive Hacker News hugs of death gracefully
and is why I started putting everything into RAM in the first place.
The Go port of my website handled the load like a champ.
This is also when I started putting everything into one giant linked
list. It was so much faster than using a cache server, but the
main downside was that it made the site slower to start up.
At the time it wasn't a practical issue.
I admit my blog is an exceptional use case.
My website gets a lot more traffic than you could possibly imagine.
It usually gets more than 100 GB per month.
This is really impressive because my site mostly contains
text and small images.
When my articles get popular, they get very popular
very fast, and then that starts people looking at other pages on my website.
I have really unique performance requirements. The number on
the slide is the number of times I've been on the front page of news
aggregators or have made other posts that have gone viral.
At nearly 300 posts written, this means that my posts have
less than a one-in-ten chance of getting a lot of page views
in a very short amount of time. So I need to be sure that the
website code runs as fast as it can for the most common use of
the most common routes. At one point, my blog was
getting enough load that it started to make my Dokku server fall over
from serving plain HTML responses and RSS
replies. Something had to give.
So in a moment of weakness, I made a pact
with the devil. I put my blog
on Kubernetes as a part of me learning how to use Kubernetes for
work. I'm a very hands on person. I need
a local copy of things in order to really feel like I understand how to
use them. So I decided to commission
a freight train to mail a letter and I set up a Kubernetes cluster
with DigitalOcean. This worked
pretty great once I got past the initial teething issues,
and it worked for a long time. I was disappointed by
how many alpha components I needed to serve web apps reliably.
I was able to do continuous deployment using GitHub Actions, and it
made my blog minimal effort. At most, I was focused
on writing; publishing was relegated to the machines.
However, sometimes it blew up, and when
it did, it was worse than when the single server blew up.
I didn't have access to root on the servers.
I had just enough apps on the Kubernetes cluster that I
couldn't scale the cluster up and down to unbreak
issues. Sometimes a file system mount would get
stuck and I didn't have a "reboot that sucker" button to
unstick it. When that happened, my Git server would stop working.
This is a very annoying thing to debug while you should be focusing
on your day job. After a while I
gave up. Then I got nerd-sniped again,
this time with NixOS. With NixOS, I could just directly specify what
should run and where. I had power beyond what
mere mortals could attain with Docker and
Kubernetes alone. I could shape the
universe of the applications in question and then proceed with
that. Instead of trying to kitbash things into shape based on overly
generic tools, I could just use Nginx
to route to the Unix socket, and then I did not
have to care about the overly generic, Turing-complete YAML hell
that is Kubernetes. I think it's
pretty great, but I'm a VTuber, so take my
opinions with an appropriately sized grain of salt.
The biggest thing that you can take away from this is that dynamic
web apps can be very fast, especially if they are
built to purpose. If you keep your goals in mind as
you develop things out, it'll do everything you need
very quickly. My blog
stands on the shoulders of giants. Every one of these people gets a special
shout-out for helping either make my blog or this talk shine.
Thanks. You all really help more than you can imagine, and
thank you for watching. I'm going to stick around in the chat to
answer any questions I haven't answered already. If I miss your question,
or if you really want an answer to your question outside of the
chat, please email it to howimadeblog at
xeserv.us. I'll have a written version of this talk,
including my slides, a recording of the talk, and everything
I said today, on my blog soon. If you have questions,
please speak up. I love answering them and I am more than
happy to take the time to give a detailed answer. Be well,
all.