Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi everyone, and thank you for joining my session.
I'm super excited to walk you through building resilient
systems with serverless web development.
I am Olumide Akinremi and I work
as a technical team lead at Sabi.
In this discussion, we are going to talk about chaos
engineering, chaos on the front end,
building resilient systems, and handling failures.
First, what is chaos engineering?
I know a lot of people might have asked about chaos
engineering and some of its advantages,
so it's important to have a quick breakdown of
chaos engineering and why it matters
to software engineering. Wikipedia
defines chaos engineering as the discipline of experimenting
on a system in order to build confidence
in the system's capability to withstand turbulent
conditions in production. So basically,
you have systems, and it's important to
build for scenarios in which your system will fail.
All systems are bound to fail at some point: your AWS
server, your Azure deployment, or your
services. Everything will fail at some point.
What is important is building confidence
in your system for when a failure happens.
And this can only be done by ensuring
you are prepared for this scenario and know what
to do in it. It's more like seeing
the issue before it happens, or knowing
that it is going to happen and being prepared for it,
rather than not being prepared at all and then, when it happens,
having no idea why your system failed.
Think of chaos engineering as having a car
and deciding to go on a road trip with your friends.
On the road trip there are a lot of failures that
might happen, like running out of gas
or getting a flat tire. Having a spare
tire is important because you can have a flat tire on
the road. So these are all things
you should consider: the kind of failures
that might happen on your road trip, and how
to prepare ahead
for them. Things
are bound to happen in the tech world.
Maybe as we speak, someone's system is currently
down and they are trying to resolve it. This happens
all the time.
So it's smart to find a
fix for problems before they arrive, because
they can cause trouble and give you
a hard time when they pop up. It's
more like checking your car to ensure that it
can take you on this road trip, and that if different
situations occur, you have
the right things in place to keep moving
and don't get stranded.
So let's talk about chaos on the front end.
The front end is a very crazy environment
because a lot of these failures do not depend
on you as a front-end engineer
or as a full-stack engineer. They depend on different
situations which are
not in your control. Imagine
that you have an application where the
back end is supposed to return some
data that you are going to use to
render a particular page,
but at some point the server goes down, the back end can't
return the data that you need, and then
everything is breaking. The user can't see anything in your application,
the user complains, and as a result you
might lose some users.
You might also get a call from your
CTO or your CEO saying that the application is
not working, or the customer support team or
users in general will leave feedback on the application:
"this is not working, this is crap." So it's
important to deliberately introduce
issues into your front-end application to observe potential problems
and assess how your application responds to them.
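As a rough sketch of what deliberately introducing issues could look like in code, the wrapper below makes a data loader fail or return nothing on purpose. Everything here (`ChaosConfig`, `chooseFault`, `withChaos`) is a hypothetical name for illustration and is not something shown in the talk. The loader is kept synchronous only to keep the sketch short; a real front-end loader would usually be asynchronous.

```typescript
// Hypothetical chaos settings; none of these names come from the talk.
interface ChaosConfig {
  enabled: boolean;    // master switch, keep it off for real users
  failureRate: number; // probability (0..1) of forcing an error
  emptyRate: number;   // probability (0..1) of returning no data
}

type Fault = "none" | "error" | "empty";

// Decide which fault to inject for one call. `roll` is a number in [0, 1),
// passed in so an experiment can be repeated with the same outcome.
function chooseFault(config: ChaosConfig, roll: number): Fault {
  if (!config.enabled) return "none";
  if (roll < config.failureRate) return "error";
  if (roll < config.failureRate + config.emptyRate) return "empty";
  return "none";
}

// Wrap a data loader so that it sometimes misbehaves on purpose.
function withChaos<T>(
  load: () => T[],
  config: ChaosConfig,
  random: () => number
): () => T[] {
  return () => {
    const fault = chooseFault(config, random());
    if (fault === "error") throw new Error("chaos: injected back-end failure");
    if (fault === "empty") return [];
    return load();
  };
}
```

Passing `Math.random` as `random` gives random faults, while passing a fixed function makes one specific failure reproducible while you watch how the page reacts.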
Keep at the back of your mind that it's important
to have a lot of things in place for your
front-end application, because this is what the user sees
and this is
how they interact with your system. Users are not going
to see the back-end application. They see
your front-end application and they interact with it, and
different situations can make you lose users.
If your app fails to render on the initial
load, users complain and leave. If they click on
a particular CTA and it is not responsive,
they give you feedback that it's not working: "I clicked on this button,
nothing happened." Even if the user has a network
connectivity issue and a particular request
times out,
they complain that "oh, this doesn't work," just
because you didn't handle those failures
and faults. So it's important to
build with chaos engineering in mind and
try to catch and fix issues before they arise.
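One possible way to handle those failures explicitly is to model every outcome of a request and decide up front what the user sees for each. The sketch below is an illustration only: the `Outcome` and `ViewState` types, the messages, and the retry rule are all assumptions, not something from the talk.

```typescript
// Possible outcomes of one request, as a discriminated union (illustrative names).
type Outcome<T> =
  | { kind: "ok"; data: T }
  | { kind: "timeout" }
  | { kind: "offline" }
  | { kind: "serverError"; status: number };

interface ViewState {
  showContent: boolean;
  message: string;   // what the user reads instead of a blank screen
  canRetry: boolean; // whether to show a retry button
}

// Map each outcome to an explicit UI state so no failure is left unhandled.
function viewStateFor<T>(outcome: Outcome<T>): ViewState {
  switch (outcome.kind) {
    case "ok":
      return { showContent: true, message: "", canRetry: false };
    case "timeout":
      return { showContent: false, message: "This is taking longer than usual.", canRetry: true };
    case "offline":
      return { showContent: false, message: "You appear to be offline.", canRetry: true };
    case "serverError":
      return {
        showContent: false,
        message: "Something went wrong on our side (" + outcome.status + ").",
        // Assumption: only 5xx responses are worth retrying.
        canRetry: outcome.status >= 500,
      };
  }
}
```

Because the `switch` covers every member of the union, adding a new outcome later makes the compiler point at this function until the new case is handled.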
Be prepared for situations like
that. Introducing
additional features to your front end,
or your code in general, doesn't make it resilient.
In fact, it might add potential risks
and points of failure to the application,
because adding new features means that there
are more features and more
users interacting with your system.
In that case they can try to interact
with the new feature you built and find that
the one they were using before is broken.
So it's important to be prepared for
situations where this happens and to be
ahead of the users. Another important
point is that the front end poses greater
challenges compared to every other environment because
of the different things we need to deal with.
JavaScript engines,
plugins, accessibility,
styling, latency,
viewport: all of these are
not 100% in your control,
but they are things you should be prepared for. Imagine
an application that works end to
end on Chrome and is mobile responsive, but on
Internet Explorer or Mozilla a particular feature
doesn't work the way it should, just because the
JavaScript engine or the browser doesn't support
a particular style that you've used or a particular
function that you've used. So it's important to have
tested and to be ahead of the
users in situations like that, to ensure that it works
on all browsers.
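A common way to cope with a function that one browser supports and another does not is feature detection with a fallback. The sketch below assumes a hypothetical `Strategy` shape and `runFirstSupported` helper. `Intl.NumberFormat` is a real standard API, used here only as an example of something to detect, and the manual formatter is a deliberately simple stand-in.

```typescript
// One way of doing a job, plus a check for whether the current engine can run it.
interface Strategy<T> {
  name: string;
  isSupported: () => boolean;
  run: () => T;
}

// Run the first supported strategy; the last entry should be a universal fallback.
function runFirstSupported<T>(strategies: Strategy<T>[]): { used: string; value: T } {
  for (const s of strategies) {
    if (s.isSupported()) return { used: s.name, value: s.run() };
  }
  throw new Error("no supported strategy; end the list with a universal fallback");
}

// Example: format a price with Intl.NumberFormat when available, else by hand.
function formatPrice(amount: number): { used: string; value: string } {
  return runFirstSupported<string>([
    {
      name: "Intl.NumberFormat",
      isSupported: () => typeof Intl !== "undefined" && typeof Intl.NumberFormat === "function",
      run: () =>
        new Intl.NumberFormat("en-US", { style: "currency", currency: "USD" }).format(amount),
    },
    {
      name: "manual",
      isSupported: () => true,
      run: () => "$" + amount.toFixed(2),
    },
  ]);
}
```

In a chaos experiment you could force `isSupported` to return `false` for the first strategy and confirm that the page still renders something sensible through the fallback.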
So let's talk about handling failures and building resilient
systems in serverless web development.
Take a look at this diagram.
This is a streaming platform
that has an authentication service, a movie
service, a recommendation service,
and a service that keeps track of your
watch history. That one is connected to a cache so
you can see it really fast compared to
the others, which need to connect to the database.
The database, on the other end, feeds
all the other services, because, for example,
authentication needs to go to the database to retrieve
user information. I believe this is a
basic microservice setup that most
people use. Then we have a front-end service that
talks to whatever front-end applications you have
built. These can be a React application,
an Angular application, or a mobile app.
At this point, think of a situation
where your database goes down,
meaning that none of these services will
be able to talk to the front-end service,
so your React application, Angular application, and
mobile application all suffer from this.
Or a situation where your authentication service
is down, meaning that users won't be
able to log in. We
can talk about different scenarios of other services
going down and what will happen. But what is important
here is knowing what
will fail and handling the failure.
If your authentication service goes down, for example,
you need to think about how your system is
going to work. Does my application depend on
the authentication service to fully function?
Based on your answer, you should decide
how you are going to build for and react to this failure. If your
authentication service goes down, the user should still
be able to access the application, stream
movies, and see the recommendation part
of the application, because your application is not solely dependent
on this service. It's a microservice architecture,
in which every service is independent.
With this you can run some
chaos engineering test scenarios
that simulate a failure of
each service, see how
your system depends on them as a whole, and try to
react to the failures that might happen.
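The kind of outage simulation described above can be sketched on paper before touching real infrastructure. The dependency map below is an assumption read off the talk's description of the diagram (in particular, watch history is treated as served from the cache only), and the feature names are hypothetical. Adjust both to your real architecture.

```typescript
type Service = "database" | "cache" | "auth" | "movies" | "recommendations" | "watchHistory";

// Assumed dependencies between back-end services (not confirmed by the talk).
const dependsOn: { [S in Service]: Service[] } = {
  database: [],
  cache: [],
  auth: ["database"],
  movies: ["database"],
  recommendations: ["database"],
  watchHistory: ["cache"],
};

// Which front-end features need which services (hypothetical feature names).
interface Feature {
  name: string;
  needs: Service[];
}

const features: Feature[] = [
  { name: "login", needs: ["auth"] },
  { name: "streaming", needs: ["movies"] },
  { name: "recommendations", needs: ["recommendations"] },
  { name: "watchHistory", needs: ["watchHistory"] },
];

// A service is unavailable if it is down itself or anything it depends on is.
function isUnavailable(service: Service, down: Service[]): boolean {
  if (down.indexOf(service) !== -1) return true;
  return dependsOn[service].some((dep) => isUnavailable(dep, down));
}

// Simulate an outage and list the features the user would lose.
function brokenFeatures(down: Service[]): string[] {
  return features
    .filter((f) => f.needs.some((s) => isUnavailable(s, down)))
    .map((f) => f.name);
}
```

Under these assumptions, `brokenFeatures(["auth"])` reports only the login feature, which matches the point that streaming and recommendations should keep working when authentication is down.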
We will talk about some tools that can be used to
implement the chaos engineering we've been
talking about, to simulate failures and
learn how your system reacts to them.
But overall, it's important to understand how your system
works, knowing what will fail, how it will
fail, and finally how you react to
it. Next, let's take a
deep dive into a LinkedIn use case.
So we have this LinkedIn profile,
and in this profile we have different views.
We have the user profile section, the feed
section, the recent activity section,
and the post section. Each of these
sections has
its own views, and they do different things.
All these little views are what
form this profile page, and there is
a lot of personalization happening here in terms of
recommended or suggested posts to follow,
the recent
activities that this profile has performed,
and the people the user can follow.
So in this page, let's take
an example.
I have blurred this out, but
this is supposed to be the user's profile picture and
the username of who you want to follow. Now let's
think about a situation where we can't retrieve that information.
Is it really necessary to show the
follow button, or to tell the user to follow this profile,
when they can't see any information
about who they would be following?
It's important to know these
little details to know how you react to this.
That's my thinking
as well: it is irrelevant
to show this follow button if
the user can't see the profile picture and
the name of who to follow, because it's confusing. At this point
the user will be
concerned that "I don't even know who I'm following."
So that is one way to
handle failures for
this account. Another way to do that is knowing
how the parts of your system
depend on each other, or how each section depends
on the others. The most important bit of
this page will depend on
your application and what you are building for
your own use case. But for this use case, the most important part
of this application is the user profile, which is
here, because the user profile is what
lets us know, or what lets the back end know,
the recommendation of who you want to follow,
the suggested posts you want to read, and the recent
activity. So in situations where
we can't see or retrieve the user profile,
it's irrelevant to show any of
this information, because we don't have any user
profile here.
So our page can fail
gracefully to not confuse the user any further.
But if we can view the user profile, and
we can't retrieve other information like
the suggested posts, who to follow,
or recent activity, we can still
display information on this page, because what
is most important to the
user is still being displayed.
So with this example,
we can better streamline what
we want the user to see at any point, based
on the failures that might
come from our back end.
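The profile page reasoning above can be written down as a small planning function. This is only a sketch: the `PageData` shape, section names, and error message are assumptions, with `null` standing for "this request failed". It is not how LinkedIn implements the page.

```typescript
// Data for each section is either present or null when its request failed.
interface Suggestion {
  name: string | null;
  pictureUrl: string | null;
}

interface PageData {
  profile: { name: string } | null;
  feed: string[] | null;
  recentActivity: string[] | null;
  suggestions: Suggestion[] | null;
}

type PagePlan =
  | { mode: "error"; message: string }
  | { mode: "render"; sections: string[]; followable: string[] };

function planProfilePage(data: PageData): PagePlan {
  // The profile drives every other section, so without it the whole page fails gracefully.
  if (data.profile === null) {
    return { mode: "error", message: "We could not load this profile. Please try again." };
  }

  // Secondary sections are optional: show the ones that loaded, hide the rest.
  const sections = ["profile"];
  if (data.feed !== null) sections.push("feed");
  if (data.recentActivity !== null) sections.push("recentActivity");

  // Only offer a follow button when the user can see who they would be following.
  const followable: string[] = [];
  if (data.suggestions !== null) {
    for (const s of data.suggestions) {
      if (s.name !== null && s.pictureUrl !== null) followable.push(s.name);
    }
    if (followable.length > 0) sections.push("suggestions");
  }
  return { mode: "render", sections, followable };
}
```

Feeding this function hand-made `PageData` objects with different fields set to `null` is a cheap way to review every failure combination before injecting the same failures into a running application.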
As I said, focus on what will fail
and how it should fail. Just focus
on your application,
think about what is going to fail, and
ensure you know and are prepared for how it is going
to fail if such a failure happens.
So these are tools that we can
use to create chaos. These are the ones
that I've created that can help you
create chaos in your application and simulate failures
in your application. Thank you
for listening, and let me know if you have
any questions. You can reach me on LinkedIn or
Twitter.
Bye everyone.