Transcript
This transcript was autogenerated.
Good morning, good afternoon, good evening folks.
Thank you for listening to me. My name is Gaurav.
I am a director of engineering, and I have worked at a number of New York City based companies.
Thank you for joining me for my talk on scalability on
AWS, more specifically scalability and cloud
native architecture on AWS. Now, I know I'm trying to hit
as many buzzwords as possible, but I'm going to
try to keep this talk at more or less a beginner to intermediate
level. So I'm going to try to get as much into the basics
as possible and hopefully if you like it, if you are interested,
you can then move on and search for some
of the online literature to get into more advanced topics.
So without further delay, I'm going to start off by
talking about the structure of how this presentation is going to
unfold. To begin with,
of course I'm going to give you an introduction. I'll talk about who I am,
what I do for a living, and as I've said, I am
an architect. I'll talk about why you might want to have an architect
in your company. This is again, you can think about it
as advertising for my profession.
Then, because this talk is specifically about scalability,
I will talk about the problem that currently exists in all the businesses
that I have worked with as an architect. What is it that
causes you to start looking to scalability as a solution?
I will then talk about what it is that scalability exactly tries
to solve. What does a focus on scalability give you?
And then I will talk about how focusing on scalability can
solve the problem that currently exists in the industry.
Now this is very abstract, so I would like to also go into some
of the tools that AWS gives you in order to make your application more
scalable. As with any architecture,
there's always a pro and a con. There's always a trade off.
So I'll talk about some of the trade offs that you'll have to make in
order to make your application scalable on AWS.
And then I'll talk about some of the metrics that you
can use to decide how much you want to give up in order to
have a more scalable application. Then I will
go into some of the best practices that you can follow.
What are the rules of thumb that you can follow in order to design your
system so that you can make the right trade offs
and make the right choices for your application from a
scalability perspective? And finally,
I am aware of the fact that people have short attention spans
in these days of TikTok and all. So I'll
give you a cheat sheet, a TLDR if I may, where I will
talk about just two things: if you were to miss the
entire presentation, what are the two things that I would like you to walk away
with, that I think you would benefit from?
And these might be things that you already know or may not,
but I would like to reiterate those things. And throughout
the presentation, I would like to weave business outcomes with security
and technology. So it won't just be a very technical presentation.
I will try to make it as business value focused as possible.
One last structural point. Throughout the presentation,
I will have this question at the bottom left corner
of the screen. This will be the question that I'm trying
to answer while I'm going through each and every slide. So if you think
that I'm rambling too much on a particular slide, or if you think,
what exactly is he trying to say, hopefully looking at this
question will make you reorient and try to kind of
figure out what it is that I want you to take away from the slide,
and I'll try to have at least just one takeaway from each slide.
So with that said: again, I'm the author of the book
Security and Microservice Architecture on AWS. It's an O'Reilly
Media publication from 2021.
So if you want, please do grab a copy; it's available
on Amazon. By day, I work as a director of engineering,
sometimes as an architect or a consultant in various companies,
and I've worked at a lot of companies throughout my career in
New York City, so please do find me
on LinkedIn. By night, I work as a research scholar
and a doctoral student at Rutgers University. My thesis
is on corruption in international business,
something that I'm very passionate about. So do feel
free to follow some of my work. Apart
from that, I have an MBA in finance from NYU Stern
School of Business, so I try to merge finance
into any tech related talk, any tech related discussion
I have; that's just something that I like to do. And I
also have a Master of Science in AI from Rochester Institute of Technology.
This was before the whole AI hype, so somehow I managed to be
ahead of the curve there. Another structural
part of the presentation: throughout the presentation, I like to keep my
examples consistent. That way I don't have to go
back and explain what the setting is. So in this particular
case, the setting that I would like to use is that of an ecommerce
website. So you can assume that throughout
this presentation I run a company that has an ecommerce
website and this website,
it gives you the ability to search for products. So you can search it
by using simple keywords like I want sunglasses. Or you can
have an advanced search there where you can say, okay, I want to find sunglasses
which cost between $5 and $15 and have
a rating of four stars and above. So that's
the kind of website that you can assume I'm running. And any
example that I want to give related to scalability would be
pertaining to this website. So you can think about
that. That way we are all on the same page when it comes to
examples. Of course
you can run wild on how this application is running, but you
can just assume that the problem I'm trying to solve is that I want to
make this application more scalable for all the different use cases
that my end users are going to use this application for.
With that said, and with that background, I'd like to go into,
first of all, architecture. What exactly is architecture?
Any time I look at a piece of software, I look at
six different aspects of it. I always think that
these are the six points
that I want to focus on. I feel an application,
in our case the ecommerce website, has to be efficient. I don't want to
pay money to a cloud provider or whatever for an inefficient
system.
I want to extract as much juice as I can out of the code that
runs on this application. I want it to be scalable.
Again, this is the whole point of the talk. Anytime
the number of requests jumps up,
I need to be able to add servers to the mix, increase
the resources that I have, and handle the
new load that my application gets. So if my application is
an overnight success, I don't want to send people away saying:
hey, I know you visited my website, I know you wanted to
buy something, but we just don't know how to handle so many requests at
the same time. I want my application to be available all the
time. I don't want any downtime. I don't want it going down
at 11:00 at night because of some network latency or
something like that. I want my application to be secure. Security is
important in the day of cyber attacks and everything. I don't want
one malware to infect the entire application and bring everything down.
Of course, I might be running some kind of venture backed application,
so I want it to be as cost efficient
as possible, so that I'm not burning money and I can convince
my investors that I can handle the cost aspect of it.
And finally, I want my application to be simple. A simple
application is easier to maintain. I can hire more
efficiently after hiring, people can get onboarded more easily,
I can document it better, and I can expand it
in a better way. So I want all of these things.
And this is where I realized that I
can't have all of them. In my experience, you
have to pick at most four of
these six points, if not three.
Sometimes you might even get just one or two.
And that's where you need an architect.
An architect is someone who can come in and say:
okay, these are the things that you need to focus on;
these are the points you can give up, some of your
needs; and this is how your application will
run most efficiently for
the scale and the level of growth that your business expects.
And that's the whole point of this talk. The whole point of
the talk is to look at every application from an architect's
perspective, figure out how to make trade
offs, where to make trade offs, and what are the
factors that will decide where these trade offs should be made.
What are the tools that are available to you
in order to make these trade offs. One of the tools that I
always like to begin by talking about is the AWS shared responsibility
model. Back in the day, when you ran
the application on your own servers (Amazon itself started in
a garage), you were responsible for every single
aspect of the application. Your servers had to keep running; they had to
be able to scale up and scale down; you had to make sure that
no one could jump into that garage and steal your
servers. At the same time, you were responsible for availability and
all of the six aspects that I talked about.
That's not the world we live in today. That's why you have cloud services.
The cloud says: you can hand over to us some of
the responsibilities that traditionally were yours,
and we will handle those responsibilities. In return, of course, you pay
us extra, and you might lose some of the flexibility
that you have running those applications on your own.
As a way of distinguishing these models, I always use this example by Albert
Barron, which is pizza as a service. I like to have pizza.
I'm thinking of having pizza for dinner today. There are a few ways I can do
that. I could go out to a restaurant, and then
I don't have to worry about anything. I don't have to worry about where the
cheese comes from. I don't have to worry about doing the dishes after eating.
I don't have to worry about the temperature of the oven and all
sorts of things. But at the same time, if I want a customization, if I
want gluten free base or something like that, I don't
have that option anymore. On the other hand,
if I want everything customized, I want to have gluten free
base. I want to have pizza dough, which is
of a specific type. I want the cheese to be fresh mozzarella
made from buffalo milk or whatever it is. I can make
everything at home. But then after doing everything, I'm now responsible
for the dishes. I'm responsible for setting the dining table,
the oven, the heat, et cetera. So I
have to decide what is it that I care about most.
The third option is of course I can go for something in the middle
where I can order takeout, or order on Seamless or Uber Eats or
something like that, where I still have to do the dishes, but I don't have
to worry about making the pizza itself. In
the same way, on the cloud you can have an à la carte menu of responsibilities.
I could have everything on premise, or I could
have something where everything, including scalability,
availability, security is taken care of by AWS,
but then I lose the flexibility that is associated
with running my own application. Or I
could find something somewhere in the middle and decide
how much control to give up. Again, that's my
responsibility as an architect: to evaluate what my business cares about the
most. Here, I would like to focus on the scalability aspect,
because that's what the presentation is about. More specifically,
I want to figure out what I will give up while trying
to attain more scalability, and what I will probably have to give
up, as far as scalability is concerned, if I refuse to compromise
on the other aspects of my application. In order to
do that, I want to take a step back and first go into
what exactly scalability is. The reason is
I've noticed scalability is often confused with another
kind of cousin of it called efficiency, because they are
both trying to solve a very similar problem.
So let's assume our ecommerce application is an overnight
success. All of a sudden you start getting millions of requests
per minute, and you realize that you are hitting the seams of your server:
your server's CPU utilization is high, your RAM usage is through the roof.
You can solve this in two ways. You can either increase the
size of the RAM that you have (if you are hitting 80%,
you add another 20% to it), or you can add more servers
to the equation. Alternatively, you can make your code
run in such a way that it can handle more of the requests that come
in; for example, you can add caching to the equation,
so that you don't have to make a round trip to the database or disk
I/O or whatever it is. Either way, the
outcome is the same: you can now handle more requests. In the
first, you add more resources. In the second, you keep the number of
resources the same; you just get more out of whatever you have running.
The first one where you add more resources is the scalability that we
want to talk about in this talk. The second one where you keep the
number of resources the same, just get more output
out of the same number of resources is efficiency. In most
cases you want the system to be as efficient as possible,
but soon you start hitting limits. You have diminishing returns
as far as efficiency goals go. And that's why you need to
focus on scalability: because your business could be a success, and the
last thing you would want is to not serve customers because you
don't have the resources. Or rather, you have resources you could
have added; you just don't know how to add them. As an example,
I've created this caricature
in ChatGPT, where we have this group
of developers who suddenly hit success. How
do we scale? Well, we scale by adding these servers.
And what happens once you add those servers? For starters,
you lose the simplicity. You've created this Frankenstein
monster of an application with all these cords going everywhere.
And look at these people; they've all done it. It's a scaled system,
but at what cost? You have a Frankenstein that you suddenly have to
tame. So that's where you need to focus.
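As an aside, the efficiency route mentioned earlier, getting more out of the same resources, can be as simple as caching. A minimal sketch for our ecommerce search (the product data and cache size are made-up assumptions):

```python
from functools import lru_cache

db_round_trips = 0  # counts simulated trips to the product database

@lru_cache(maxsize=1024)
def search_products(keyword):
    """Simulated search endpoint for the ecommerce example.
    The cache means repeated identical queries never hit the database."""
    global db_round_trips
    db_round_trips += 1  # pretend this is an expensive database query
    return ("sunglasses",) if "sun" in keyword else ()

# 100 identical requests cause only a single database round trip:
for _ in range(100):
    search_products("sunglasses")
```

Same number of servers, far fewer round trips: that is efficiency, as opposed to scalability, which would add servers instead.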
So having discussed scalability, let's talk
about the ways you can achieve scalability. Well, you can do it in two
ways. The first one is vertical scalability.
Vertical scalability, going back to the textbook definition, is just adding
extra size to the same machine. If you're hitting the
RAM ceiling, say you have a 4 GB machine and your
RAM is close to exhausted, you increase your RAM to 8 GB.
Or on the CPU side, you go
from an i5 to an i7 to an i9, or whatever it is.
That's vertical scalability: you just increase the size of the
server that is holding your application. The advantage of
doing it is you don't have to make any changes to your code. It's the
same code, it just runs on a better system. The second
way of doing it is horizontal scalability where you add an extra
server to the mix. So if you're running a cluster,
and this is a big if, if you are running your application as part of
a cluster, there are multiple resources
within that cluster that run your application. Horizontal scalability
means you can just add extra resources to that cluster and
start achieving more. So what are the advantages
and disadvantages? Well, for starters, if you don't have your application
running as a cluster, it's just easy to achieve vertical
scalability, right? Especially in the world of cloud, you can just increase
the size within 20 minutes and you're done.
The disadvantage, though, is that there are limits
to how big your instance can be;
on EC2, a family typically tops out around a 24xlarge.
You shouldn't ever reach that point, but the limits are there.
Secondly, vertical scalability is very easy to
achieve initially: you don't have to hire new engineers or change the
code. But it starts hitting diminishing returns very quickly,
and horizontal scalability is the way to go beyond
a certain point. So these are the two
factors you need to weigh: figure out what stage your application
and your company are at, and that will decide whether you want horizontal
or vertical scalability. Now, getting a little more
into vertical scalability: as I mentioned, you can have
an EC2 instance running, and if you want to scale vertically,
all you do is increase the RAM or the CPU or
the throughput of that instance, and suddenly
you have a more scaled system.
The other thing you can do as far as achieving vertical scalability goes
is if you have a general purpose instance running, say an
instance with two gigs of RAM and one virtual CPU,
and you need more of one dimension. AWS gives you
memory optimized and CPU optimized instances,
where you give up some of the RAM for an
extra CPU, or give up a CPU for some more RAM.
So if you hit one of the two limits, say the RAM
limit, you can give up some CPU in exchange for extra
RAM at the same cost. So that's the
other way of achieving vertical scalability. And finally, a slightly less
intuitive way: if your application runs on a shared
server, like a database running on a shared database cluster,
you can move it to its own dedicated instance. That way it
gets more processing power
to itself. Now, I wanted to go a little bit into the cost
implications of that. For this, I've collected
the on demand prices for the T4g
family on EC2. You can see that
a micro instance with 1 GB of RAM and two virtual CPUs
costs roughly $0.008 per hour.
Going from 1 GB to 2 GB takes about 0.8 extra cents:
for roughly $0.016 you get
the same number of vCPUs, but 2 GB of RAM.
For about $0.0336 you
get 4 GB of RAM. So you can see that for each gigabyte
increase you're spending approximately the same amount.
Now look at the CPU side of things. You see a vCPU
jump from two to four between the large and the extra
large sizes. Interestingly, AWS doesn't seem to be charging extra for
that CPU jump, even though CPU
is supposed to be expensive.
From everything I've noticed, memory is how they're
pricing their tiers. So as an architect,
if you have a demand that you are projecting, you should keep that
in mind and figure out how this insight will
help you. At some point, RAM does become expensive on
the cloud, and you need to figure out how you can optimize your
application based on this kind of equation.
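To see the pattern, here is a small sketch computing price per GB across tiers. The hourly prices are illustrative assumptions in the spirit of published T4g on demand rates, not live AWS pricing:

```python
# Illustrative on demand hourly prices for a burstable EC2 family
# (assumed figures modeled on T4g pricing, not a live price list).
tiers = {
    "micro":  {"ram_gb": 1,  "vcpu": 2, "usd_per_hr": 0.0084},
    "small":  {"ram_gb": 2,  "vcpu": 2, "usd_per_hr": 0.0168},
    "medium": {"ram_gb": 4,  "vcpu": 2, "usd_per_hr": 0.0336},
    "large":  {"ram_gb": 8,  "vcpu": 2, "usd_per_hr": 0.0672},
    "xlarge": {"ram_gb": 16, "vcpu": 4, "usd_per_hr": 0.1344},
}

# Price per GB of RAM comes out flat across the family: pricing tracks
# memory, and the vCPU jump at xlarge adds nothing on top.
per_gb = {name: t["usd_per_hr"] / t["ram_gb"] for name, t in tiers.items()}
for name, price in per_gb.items():
    print(f"{name:>7}: ${price:.4f} per GB-hour")
```

Running this shows the same per-GB figure at every tier, which is the "memory is how they price" observation in numbers.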
I've written a few blogs around cloud economics,
and that's something that I always keep in mind when it comes to
projections. With that said, I would
like to go into what is, in my view, the more complicated
way of achieving scalability: horizontal scalability.
For starters, why is it that most people don't
have horizontal scalability by default? Well,
because for starters, your application needs to be ready to have
horizontal scalability. Just because you add an extra server to the equation doesn't
mean suddenly your application can start running faster,
better, or process more requests. A joke one
of my old bosses used to tell me: nine couples can't
have one baby in one month. If a baby takes
nine months to be born, that's how long it takes.
Some applications are like that. So you need to redesign
the application for horizontal scalability,
so that the extra server actually makes a difference. With that
said, let's assume you have an application that can actually
do that. One way of doing it is of course you can over provision
your entire system. Let's say you expect to eventually get a million requests
per minute. Right from day one,
you can provision enough servers to handle
1 million requests, so if you ever hit
that level, it is not going to be a problem.
What's the disadvantage of doing it that way? For starters,
on day one, when you're not getting a million requests, you have all these servers
running and you're paying money for no reason.
They are just sitting there without handling any requests.
Then your load starts picking up: day 50, day 100,
whatever it is. Each server starts becoming useful,
even though some are still being wasted,
until you hit the level you have provisioned for.
Once you cross that level,
you are now under provisioned, because you don't have
any servers above it. Suddenly you
might end up running to the market trying to buy new servers;
granted, in the world of cloud you're just provisioning a new server,
but that's still a problem. You have to do something; you need
someone to monitor the utilization before a new server is added.
And that's the problem that horizontal scalability
can have. At the bottom, I have a chart that I
found online (I've tried to give attribution wherever possible).
If this is how your demand is going to increase, the blue
line is how you want your servers to be provisioned.
But in reality, that's not how the world works, right? There are always
these step increases: any time you get
a new server, you jump up a step. You might have to make
a capital expenditure if it is on premise; in the cloud
you can just provision a new server. Finally, your traffic can also
start decreasing,
and you might have to get rid of servers again. That is something
you'll have to worry about. If you are under provisioned,
you might lose customers because you don't have the
processing power to accept their requests. If you're over provisioned, your
servers are running for no reason and you are losing money.
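The stepwise nature of provisioning can be sketched in a few lines. The capacity of 10,000 requests per minute per server is a made-up assumption for illustration:

```python
import math

def servers_needed(requests_per_min, per_server_capacity=10_000):
    """Stepwise provisioning: each added server contributes a fixed
    capacity step (the per-server figure is an assumed example)."""
    return max(1, math.ceil(requests_per_min / per_server_capacity))

# Provisioning for a projected peak of 1M requests/min from day one
# means paying for a large fleet while real demand is still tiny:
peak_fleet = servers_needed(1_000_000)
day_one_fleet = servers_needed(5_000)
wasted_on_day_one = peak_fleet - day_one_fleet
```

Tracking demand instead of the projected peak avoids that waste, but it requires someone (or something) to watch utilization and add the next step in time.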
A final point I would like to make is that of elasticity.
Elasticity is very similar to scalability but in a very temporary setting.
So what if, normally speaking, you get, say, 5,000 requests
a day, and that number stays roughly the same, but on
the day of the Super Bowl you suddenly get
a million requests? For that temporary spike,
you also want the ability
to handle the traffic that comes in, because you don't want to
turn away customers, even on a temporary basis.
So sometimes, while designing for scalability, you might also want to
consider elasticity: the ability to add
extra servers for the one hour when you have that
spike. So what
you want to achieve is to have servers
added the way this blue line shows: when the spike
increases, you want more servers; when it goes down, you want fewer.
There are two ways. The first one is by using what is called
auto scaling. What AWS allows you to do, as part
of the shared responsibility model, is teach
it how to add new servers. By that
I mean you can specify events which, when
triggered, will add an extra server to your cluster.
These events are known as auto scaling triggers, and
they can be something simple like: each time my CPU utilization
hits 90%, I want an extra server to be added.
Or: each time my RAM hits 85%, I want
an extra server added to the cluster.
(By RAM I mean the average RAM utilization of the cluster.)
This way you have logic that teaches Amazon
to add extra servers and thus your application now
starts scaling up or scaling down automatically. And that way you don't
have to over provision or under provision. There is still the
possibility of waste, because it's still a stepwise increase: you can see
there are still these pockets where you have under
or over provisioning. If your auto scaling logic is not aggressive
enough, it will under provision and you will have lost customers.
Or you can provision aggressively, but then you
might have more overcapacity.
The second disadvantage of auto scaling is, of course, the fact
that you are adding extra logic to the equation, which increases the complexity
of your application. There is no free lunch.
You still have to figure out how to write this logic, and sometimes the
logic itself can be complicated. If it's as simple as RAM or CPU,
that's one thing. But you might want to say: okay, each time I see an
upward sloping curve in the number of requests, I
want you to add a server. That's where your logic starts getting more and more
complicated, and that's a disadvantage of auto scaling.
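A simple CPU-based trigger can be expressed as a target tracking policy. This is a hedged sketch: the parameter shape follows the `put_scaling_policy` call in boto3's EC2 Auto Scaling client, and the group name is a made-up example:

```python
# Sketch of a target tracking auto scaling policy: keep the cluster's
# average CPU near 70%, letting AWS add or remove instances to hold it.
# The dict shape mirrors boto3's autoscaling put_scaling_policy;
# the group name is hypothetical.
scaling_policy = {
    "AutoScalingGroupName": "ecommerce-web-asg",  # made-up example group
    "PolicyName": "cpu-target-70",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingConfiguration": {
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 70.0,  # scale out above this average, in below it
    },
}

# With AWS credentials configured, it would be applied roughly like:
#   import boto3
#   boto3.client("autoscaling").put_scaling_policy(**scaling_policy)
```

The appeal of target tracking is that the "teach Amazon when to add servers" logic collapses to a single number; the complexity returns as soon as your trigger is anything fancier than an average metric.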
The other way of doing it is a serverless application. Serverless
is the full service option
that we talked about in the shared responsibility model, where
Amazon says: we will handle everything for you as
far as scalability goes, as long as you
make your application run in the very specific way that
serverless applications are designed to run.
Within limits, of course; you can't have a trillion requests
a second. But within limits, what Amazon says
is: we will increase the capacity and match your load,
as long as your application runs the way it is designed to run.
One example is Lambda. Lambda is something that runs a function
for you. It's meant for very small,
generally asynchronous functions; even though they can now run for 15 minutes,
you're supposed to run small asynchronous functions. If you
can extract such a function out and run it on AWS
Lambda, then within limits,
Amazon will automatically provision resources and
servers for you, and your function will run on those servers, so you don't
have to think about scalability, auto scaling logic,
et cetera. Again, I keep repeating like a broken
record: as long as your application conforms
to the way Lambdas are supposed to run. That
becomes a very effective approach, especially if you have big spikes
and jumps up and down, where writing auto scaling logic might be hard;
Lambdas come to the rescue. Postgres is another
example. Aurora on AWS gives you the option
of running your database in a serverless mode, where Amazon handles
all the scaling up and scaling down for you. I'm actually going to talk
about Aurora in the next slide, because
Aurora can run in serverless mode, in auto scaling
mode, or in fixed capacity mode,
all three. So it's a wonderful case study to differentiate between
all the different aspects of scaling.
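To make the Lambda model from a moment ago concrete, here is a minimal sketch of a handler for the keyword search in our ecommerce example. The event shape (API Gateway style `queryStringParameters`) and the in-memory catalog are assumptions for the sketch:

```python
import json

def handler(event, context=None):
    """Minimal AWS Lambda style handler for the keyword search endpoint.
    The event shape and the tiny in-memory catalog are made-up examples;
    a real deployment would sit behind API Gateway with a real datastore."""
    params = event.get("queryStringParameters") or {}
    keyword = params.get("q", "")
    catalog = ["sunglasses", "sun hat", "scarf"]
    results = [item for item in catalog if keyword and keyword in item]
    return {"statusCode": 200, "body": json.dumps({"results": results})}

# Locally, the handler is just a function call; on Lambda, AWS provisions
# capacity per invocation, so no scaling logic lives in your code.
response = handler({"queryStringParameters": {"q": "sun"}})
```

Notice what is missing: there is no server, no cluster, and no trigger configuration anywhere in the code, which is exactly the trade the serverless model offers.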
DynamoDB is another example: if you have a key value lookup,
DynamoDB gives you a serverless way to do it. So, as I
said, Aurora is an example of a service on AWS that
can run in all three modes. The one at the bottom
is the base mode, where you just say: okay, I want one fixed cluster.
Or you can have it run in auto scaling mode,
or it can run serverless. So I want to talk about three
different scenarios, in each of which one of these modes is the
best option to go for, and how you as architects can
make that decision. The first scenario is if
you have demand that is sustained and fairly stable, a
flat line. You probably don't need auto
scaling; that's unnecessary complexity. You are going to get five requests
per minute, that's it, no more, no less. In
such a situation, you want the option at the bottom, because it's cost efficient,
it's extremely simple, and you don't need to worry about anything.
Aurora Serverless would also give you most
of that, but with Aurora Serverless you pay a premium;
AWS doesn't give you serverless for free. You pay a premium
for the auto scaling that it does on your behalf, and that is unnecessary
in this situation because your demand is constant. The next part
is where you have fairly constant demand,
except at certain times of the day when you need to add more resources.
Let's say at lunchtime people suddenly want to sit at their desks and order,
I don't know, cell phones or sunglasses from your website.
So you have sustained demand, with spikes
that you can predict. In such a situation,
first of all, your regular single
instance Aurora is not going to work, because at
lunchtime there is a spike and you can't just turn customers away.
So you can either auto scale or
you can use serverless. Serverless will still work,
but as I said, you'd be paying an unnecessary premium to AWS
when you can just have time based auto scaling:
you can teach AWS to scale up at 11:00 a.m.
and scale back down at 1:30 p.m. That's
the best option here. The third scenario: what if your
demand is bursty? You just can't predict when you're suddenly going to get a
million requests and when you're going to get nothing. The lunchtime spike
was easy, but what if it's lunchtime
somewhere in the world at any given time of
day? In such a situation,
you need to go for serverless, because Amazon will handle the ups
and downs, and you don't need complicated logic that scales
up and down for you. Even though you could invest time
and money in building that, you don't want to reinvent the wheel. So in such
a situation, serverless is the best option for you. The point
of going through this is that, as an architect, I think about these kinds
of trade offs every day, and the answer
to the problem is different depending
on what the problem is that you're trying to solve.
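The three scenarios above boil down to a small decision table. This is a sketch of the talk's rule of thumb, not official AWS guidance:

```python
def pick_capacity_mode(demand_pattern):
    """Rule-of-thumb mapping from the three scenarios in the talk
    (a sketch of the heuristic, not official AWS guidance)."""
    modes = {
        "steady":             "fixed capacity: simplest and cheapest",
        "predictable_spikes": "scheduled auto scaling around known peaks",
        "bursty":             "serverless: pay the premium, skip the logic",
    }
    return modes[demand_pattern]

# The lunchtime spike scenario maps to scheduled auto scaling:
choice = pick_capacity_mode("predictable_spikes")
```

The value of writing it down this way is that the input is a property of your business (how predictable is demand?), not of the technology.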
And that's just one way of achieving scalability.
So, as I said, there are many, many ways you can handle
scalability. As an architect, your job
is to make sure that you pick the right tool for
the job. As I mentioned, scalability is
not where you extract more out of the same number of resources;
you do have to spend money on resources. You make
a decision that it is better to allocate a resource
to your application because it's important
for the business, and that way your application can handle more traffic
and more load as your company
becomes more and more popular and your product sells more and
more. Secondly, as I mentioned,
scalability does not come for free. First of all, you have to spend money
on the resources; plus, it might lead to complexity,
to loss of security, or to compromises on
any of the other aspects. So you need to make the right trade offs
while trying to achieve scalability. With that said,
I hope that I was able to show some newer
aspects of scalability and shine some light
on some of the more confusing aspects of scalability.
Thank you again for your time and I hope you learned something.