Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, I'm going to talk today about building the Stonehenge using Gall's
law. My name is Fabricio Buzeto and I've been coding since 2002.
I've worked my fair share in big companies and in small ones as well.
I also did my time in academia and got my PhD.
That's why you're going to see a lot of references in my slides.
I've been working with startups since 2011, and this has been
my love ever since. What I'm going to share today is a bit
of my past experience and what I've been learning
about building this type of product in the startup environment.
So let's start by talking about what Gall's law is.
Gall's law states that a complex system that works is invariably
found to have evolved from a simple system that also worked.
And a key word here is "evolve". How do we perceive
a complex system evolving? Usually, when we think about
this, we think about adding features: starting with something
that works and just adding feature after feature while everything
keeps working. But usually a complex system evolves from a very
early state that doesn't work, or that we don't know works.
So that's where the second and important
part of Gall's law comes in: what is this simple system?
When I think about Gall's law, the first and best
example I can find is the Game of Life, from John Conway.
If you are not familiar with this classic example in
computer science: you have just three simple rules, and with
those three simple rules you can create very complex behaviors.
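For readers who want to play along, here is a minimal sketch of those rules in Python. The set-based implementation and names are illustrative, not anything from the talk:

```python
from collections import Counter

def step(live_cells):
    """One generation of Conway's Game of Life.

    live_cells is a set of (x, y) tuples. The three rules:
      1. A live cell with 2 or 3 live neighbours survives.
      2. A dead cell with exactly 3 live neighbours becomes alive.
      3. Every other cell dies or stays dead.
    """
    # Count how many live neighbours each cell has.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live_cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    return {
        cell
        for cell, count in neighbour_counts.items()
        if count == 3 or (count == 2 and cell in live_cells)
    }

# A glider, one of the famous stable, moving patterns.
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
for _ in range(4):
    glider = step(glider)
print(sorted(glider))
```

Most random starting sets fed into step() fizzle out after a few generations, which is exactly the point the talk makes next.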
When you look at the results and play around with John
Conway's Game of Life, you learn that most of the initial
states are not stable: they die out and
you have nothing left. But if you work at it and try things out,
you can find very complex and beautiful results like
this. Gall's law also states that this simple
system may or may not work. And that's something
we should be familiar with, because knowing
whether it works or not is key when you're selecting that simple
system. Knowing what works starts with
thinking about it. When we talk about software, software
that works is software that fulfills its purpose. And software
can have many purposes. Software can exist
to improve sharing, or to prove something, or to help
explore some context or some problem. Software
can exist to help with politics within an organization:
somebody trying to get a promotion, or trying
to prove that something they believe is true is actually happening.
But most software is focused on business,
on helping the business advance or move forward.
So I'm going to stick with that. Also, every piece of software has a
client. This client can be the team that is using the software to help
them with some task. It can be the company itself, or a sponsor
that's paying for that software no matter what, but usually
it's a user: somebody who is really interested in having
that software solve their problem. Gall's law also
states that a complex system designed from scratch never works
and cannot be patched up to make it work.
If you start from scratch with a complex design, you have to start
over with a working simple system. That's why we should start simple.
And I'll illustrate with some examples of my own. I'll start with Coconut.
Coconut was a startup I worked at back in 2012, and
we did social TV analytics. Basically, what we did was
to watch a Twitter stream about TV shows and
generate intelligence based on that data. When I got into
the company, our infrastructure was very simple: we plugged
into the Twitter API, processed the data using Django,
and fed it into a MySQL database. It worked well.
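Conceptually, that first pipeline was little more than a loop like the following. This is a hypothetical sketch, not Coconut's actual code: the stream stub and table layout are invented for illustration, and sqlite3 stands in for the MySQL database:

```python
import sqlite3

def stream_tweets(track):
    """Placeholder for the Twitter streaming client we plugged into.

    In the real application this yielded live tweets matching the
    tracked TV-show terms; here it is just a stub for illustration.
    """
    yield {"id": 1, "show": track[0], "text": "great episode!"}

# sqlite3 stands in for the MySQL database of the real setup.
db = sqlite3.connect("tweets.db")
db.execute("CREATE TABLE IF NOT EXISTS tweets (id INTEGER, show TEXT, text TEXT)")

for tweet in stream_tweets(track=["my-tv-show"]):
    # Process each tweet and persist it so the Django app can report on it.
    db.execute(
        "INSERT INTO tweets (id, show, text) VALUES (?, ?, ?)",
        (tweet["id"], tweet["show"], tweet["text"]),
    )
    db.commit()
```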
We just had a few shows, but we stumbled
on our first barrier, which was the limits of the
Twitter API. So we had to switch to a Twitter API
reseller, DataSift, which allowed us to capture
the whole of the Twitter comments about a TV show.
And since we had just a few shows, it also started small:
our average was just 10k tweets per day,
and our peak was 5k. When we hit
these peaks, our application would just freeze, and that was
not nice. I knew that our infrastructure was
not the best; it was the simplest thing we could do for our purpose.
But it was also not an option for us to just start
rewriting everything from scratch. So what we did was,
let's just try to scale this. And we did. First, we scaled
our Django application, and this helped us increase
our volume; we reached 5k tweets per day.
Then our database started getting in
the way, but we managed to scale that too, doubling our
capacity, until we knew that we had reached our limit.
It was impossible for us to grow any further without
ending up with a very complex and hard-to-maintain
database and code base. So what we did
was go for a final infrastructure. We studied,
we chose Apache Storm back then, we did some experiments,
and we started small: we migrated one simple
metric from our Django application to Apache
Storm. And over six months we migrated one
metric at a time to this new application, while the old
application was maintained and kept working until the end.
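The migration can be pictured as a thin routing layer that flips ownership of one metric at a time from the old path to the new one. This is an illustrative Python sketch only; the metric name and stubbed functions are invented, and the real split was between the Django application and the Storm topology:

```python
def legacy_metric(show):
    """Old path: computed by the original Django/MySQL application (stubbed here)."""
    return {"tweets_per_minute": 0}

def new_metric(show):
    """New path: computed by the Storm-backed pipeline (stubbed here)."""
    return {"tweets_per_minute": 0}

# Metrics whose ownership has already been flipped to the new pipeline.
# This set grows one metric at a time over the course of the migration.
MIGRATED = {"tweets_per_minute"}

def compute(metric, show):
    """Route each metric to whichever system currently owns it."""
    handler = new_metric if metric in MIGRATED else legacy_metric
    return handler(show)[metric]

print(compute("tweets_per_minute", "my-tv-show"))
```

Growing the MIGRATED set one entry at a time is what lets both systems run side by side until the old one can be retired.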
We managed to migrate everything, and we reached more
than 1 million tweets per day. I remember our peak was
50k tweets per minute, and we never
had any issues with load anymore. So this
got me thinking that this kind of approach, where
I stretch the application and the architecture
as much as I can until it hurts, was the best approach for
this kind of situation. And this is how I started when I
got into BxBlue. I've been in the company since the beginning,
and our first MVP was just an Unbounce landing
page pointing to a Google Doc. Of course, it wouldn't
last for long. When we started having too much load
to handle by ourselves, I started building a Rails
application that helped us handle the requests from our clients.
This Rails application eventually replaced our Google
spreadsheets, acting as an ERP
to handle our clients' pipeline. This infrastructure
grew and eventually needed its own database to handle
our clients' requests, and we chose MongoDB.
So at each step the architecture changed,
but not very much, just a little bit at a time.
And the purpose of each step was to answer a
question: first, whether the main purpose
of the company could be fulfilled; then whether we could sell anything;
whether we could do it faster; and finally whether we could
do it properly, so we could have a bigger team handling these requests.
What these two applications have in common is that both of
them are monoliths. Why monoliths? The main
reason: because they are simple. Monoliths are simple to develop,
simple to test, simple to deploy, and simple to use.
They can be simple to scale as well. But usually, when you think about
monoliths, you think about the drawbacks. And the main drawbacks of a
monolith are that it is hard to scale: hard to scale the tests,
hard to scale the team so you avoid too many people
working on the same thing, hard to scale the deploys so you can have a faster
deploy (which is hard when you have a very large code base),
hard to scale the stack so you can have new technologies living
alongside legacy ones, and hard to scale changes when you have
lots of changes happening at the same time.
So let's talk about how BxBlue handled this type of scale. Our Rails application was
very simple, but it also relied on a lot of external
services; we had more than 15 of them plugged into
our application, helping us do our job. Over time, what we
did was just go async: asynchronous communication was
handled using Sidekiq for our jobs and our job
queues. This allowed us to scale
at a very fast pace.
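In a Rails application that means Sidekiq workers and job queues; the underlying pattern, sketched here in Python with the standard library (a hypothetical example, not the actual BxBlue code), is simply "accept the request fast, do the slow external call in the background":

```python
import queue
import threading

jobs = queue.Queue()

def call_external_service(payload):
    """Placeholder for one of the 15+ external services the app depends on."""
    print("calling external service with", payload)

def worker():
    # Background worker: pulls jobs off the queue, like a Sidekiq worker would.
    while True:
        payload = jobs.get()
        call_external_service(payload)
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_request(payload):
    """Web request handler: enqueue the slow work and return immediately."""
    jobs.put(payload)
    return "accepted"

handle_request({"client_id": 42})
jobs.join()  # wait for the background job to finish (for the demo only)
```

The request path never waits on a slow external service, which is what keeps the monolith responsive as load grows.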
And what helped us was not this architecture
per se, but also how we did our development. The first thing
we did was always automate our tools,
because you shouldn't have to trust yourself. You have to trust
your code, and you have to trust that on every change
you make, things are still working. So automate your tests,
automate your code quality, automate your deploy,
and automate your monitoring tools. Your tests, so every
time you deploy something, you know that your code is still
working. Your code quality checks, so every time
somebody is reviewing code, they are not checking the same
things over and over again and possibly missing something. Your
deploys, so people don't have to think about the whole checklist
over and over again. And your monitoring,
so you know that if something goes wrong, you'll be
notified. When you're done automating, what should you worry about?
On your tests, you should worry about your unit tests, about the
things your tests have in common so you don't have to rewrite
them every time, and about maintaining integration
tests, not only internal but external as well, so that when
some external tool changes, you know if it broke
something. And worry about speed: if your tests take too much time to run,
people avoid running them.
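As an illustration of that split, here is a minimal pytest-style sketch with a shared fixture, a fast unit test, and an external contract test kept out of the default run. The fee rule, payload, and partner URL are all invented for the example:

```python
import pytest

# Shared fixture: common setup written once instead of repeated in every test.
@pytest.fixture
def client_payload():
    return {"client_id": 42, "amount": 1000}

def test_fee_calculation(client_payload):
    # Fast unit test: pure logic, no network, runs on every change.
    fee = round(client_payload["amount"] * 0.02, 2)
    assert fee == 20.0

@pytest.mark.external
def test_partner_api_contract():
    # External integration test: tells us when a third-party API changes
    # under our feet. Marked so the default (fast) run can skip it, e.g.:
    #   pytest -m "not external"
    from urllib.request import urlopen  # the URL below is made up
    with urlopen("https://partner.example.com/v1/health", timeout=5) as resp:
        assert resp.status == 200
```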
On code quality, control your coverage,
so you know where you have blind spots, and your linter and code
quality checks, so your team is not manually checking things that a
machine can do for them. And your security, so you know
whether you introduced something bad into your code. On your deployment,
have a very good, stable CI/CD pipeline, source
control, and a cloud pipeline that controls your servers.
And finally, monitoring: monitor your errors, your servers,
your logs, and your user journey.
In our case, we have a Rails application, so we use most of the Rails
ecosystem for that. But we're a monolith, and unlike
Coconut, our problem was not scaling the application to
handle users, but scaling the application to handle its
contexts. As our monolith grew, we kept
adding more and more contexts to it, and this
started to slow down our development. So that's where
things got complicated.
we complicated hints. So before I started talking
about what we did, let me talk about the options that we have if
we wanted to avoid the monolith. So basically,
what we have here is a distributed
system, and a distributed system mainly is something that
run on multiple servers. And these applications,
they manage some kind of data. We have many architectures available besides
the monolith. We have the microservices that
they are simple, self contained, they are loosely coupled,
they are single focused, and they are services, they are
connected by themselves. We have the proposal of the citadel by GhH.
That's a large self contained monolith that's supported by small,
single focused, problem specific services that handle what
the monolith cannot handle by itself. We have also the microservices
that are kind of more hungry microservices that
people from Uber are starting to experiments.
They are simple, they are self contained, they are context focused
as well. But they are multipurpose services that try
to engulf more context in their services than just a
microservices does. And what did the explode?
So how did our monolith explode?
Okay, we had our architecture as I presented before;
just for simplicity, I'll consider my application as this
small box with Sidekiq and MongoDB, this square.
When we had to add a new context, we decided
to create a new application. So instead of building
this new context into the same monolith we had before,
we extracted it and built its own application to handle it.
And it went well: the development time was great, integration with the
legacy monolith was easy, and we started adding
more services and more external applications to it. Then we
did it again. A new context appeared, we built a new application
for that context, and aggregated more services to it. And it went so
well that now we have more than six applications in
our park. And that's what we call the Stonehenge. The Stonehenge
is a distributed systems strategy just like the others;
we see it as one step further than microservices.
It's a simple, self-sufficient, context-focused,
service-enabled application. The difference here is that we don't
think in terms of services only; we think in terms of
applications. They are self-sufficient because
they work by themselves: they don't need the other applications to
do their job. They are context-focused, which means they can handle that
focus very, very well. And they are service-enabled, which means
that other applications can integrate with them. So they can scale,
but they don't depend on each other; each one can work by itself.
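One way to picture "self-sufficient but service-enabled": each application exposes an API the others may call, and calls its neighbours only opportunistically, falling back to its own answer when they are down. This is a hypothetical Python sketch; the endpoint and rate are invented, not BxBlue's actual APIs:

```python
import json
import urllib.request

# URL of a sibling application in the park (purely illustrative).
PRICING_APP_URL = "http://pricing.internal.example/v1/rate"

def local_default_rate():
    """Whatever this application can answer entirely on its own."""
    return 1.99

def current_rate():
    """Use the sibling application if it is up; stay self-sufficient if not."""
    try:
        with urllib.request.urlopen(PRICING_APP_URL, timeout=2) as response:
            return json.load(response)["rate"]
    except OSError:
        # The other application being down must not take this one down.
        return local_default_rate()

print(current_rate())
```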
And by the law of conservation of complexity, complexity has to
go somewhere. So it doesn't matter which of these architectures
I choose; every application has an inherent amount of complexity
that cannot be removed or hidden. We chose the Stonehenge
because it was the best way for us, but the complexity is still there,
mingled in the applications and in the way the whole park
is connected, just like with microservices or macroservices.
What I'm trying to avoid here is not complexity per se,
but unused code. When you look at the statistics,
somewhere between 5% and 30% of what we code
is never used. And when you think about startups,
70% of startups will fail, so the code they
build will never be used again. What I'm trying to do is make
sure that the code I'm building is the best for its
purpose, that it helps the company where
it lives to move forward, and that it gets used.
So, summing up: if you take one thing from this talk, it's that
Gall's law works. A simple system may or may not work,
but a complex system designed from scratch never works
and cannot be patched up to make it work. A complex
system that works is invariably found to have evolved
from a simple system that worked. So if you are facing a complex
problem, start small, start simple; that's the way to go.
And don't trust yourself, trust the machine. Automate your tools:
your tests, your code quality, your deploy, your monitoring.
This is the best way to ensure that things stay stable every time you
change them. Things are going
to break, but if you know when they are broken, you can fix them.
So build a very good automated tool set to
help you do that. And lastly, why not try the Stonehenge?
The simplest solution is usually the best. In the end,
everything is just distributed systems, decision making is
hard, and things will change, so this is another option for
your tool set. Thanks, and I'm happy to
hear your opinions on what I've shown
here. If you have any comments, here on the video or on Twitter, just reach
out to me.