Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, welcome to scaling Kubernetes clusters without
losing your mind or your money. In the next 30 minutes we
are going to talk about the challenge of making efficient
use of the ever-changing compute options offered
by cloud providers. Today on
AWS you can find different options of powerful,
sustainable, cost-effective infrastructure.
And there is a question. How do you tailor that to meet your Kubernetes
workload needs while implementing automation to scale
based on your business demand? The answer is Karpenter.
In this talk, we are going to review how Karpenter simplifies and
accelerates provisioning of cost-effective infrastructure.
My name is Yale and I'm a solutions architect at AWS
focusing on compute services. In the past
few years I've been working with customers to help them make
an optimized and efficient selection of
compute infrastructure for different kinds of workloads with offerings
like EC2 and Graviton, as well as specialized
hardware based on their requirements, while keeping their
operational effort to a minimum. In this talk
we will dive into Karpenter, an autoscaling
solution that helps scale Kubernetes
infrastructure efficiently. We will touch on the technical aspects of
the implementation and the integration with other cost
optimization techniques like EC2 Spot and Graviton.
So let's start with what we want to achieve
with auto scaling. In other words,
efficiency. We start by talking about scale,
which is the obvious part. Pay only for what you use,
provision just the right amount of resources that you actually need
based on your business requirements.
The next part is density,
and by saying density, I mean being able to
select the right compute option and bin pack the
containers intelligently into shared resources to
maximize efficiency. And this is one of the main
advantages that you can get by using Kubernetes,
but it's still not an easy task to do.
The next part is flexibility.
Flexibility is a requirement that is related to
being able to take advantage of the cost-effective compute options you
can find today. There are different types of instances on AWS,
and usually more than one instance type
can power your application needs, and some might be more cost-effective
than others. The most effective way to
get a large amount of resources very economically is EC2
Spot, the spare compute capacity of AWS.
Another way to get higher performance
with lower cost is the latest Graviton processors,
which are based on the Arm architecture.
Being flexible between different instance types and options will
allow you to take advantage of Spot and Graviton
in a way that will let you get more and pay less.
Now you can notice that these three requirements are obviously
related: being able to scale automatically
and, when scaling, choose the right instances that maximize
efficiency. That's what we want to accomplish.
And in addition to all of these, we also want to minimize
the operational overhead that you invest in
order to be able to get to this goal. And if you're in this session,
you're probably a DevOps or a platform engineer. You're in charge
of operating production, test, and dev environments, and you
have a ton of tasks to do.
When choosing a solution for auto scaling, one of the main requirements
would be that it is easy to implement
and requires minimal effort from your side.
So I talked about how efficiency translates into scale,
density, and flexibility, which are, in other words,
scaling the right resources and using the most cost-optimal resources
possible based on your container requirements.
So let's talk about container requirements.
They start with CPU and memory,
which should be defined as resource requests in
your pod or deployment manifest.
They also might need storage, network, and sometimes GPUs.
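As a minimal sketch (the deployment name, image, and request sizes below are placeholders for illustration, not values from the talk), this is what the requests side of a deployment manifest looks like:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                    # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25    # placeholder image
          resources:
            requests:
              cpu: "500m"      # half a vCPU
              memory: 512Mi
            # Storage, network, and GPU needs are expressed through additional
            # resource types, volumes, and scheduling constraints.
```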
On the other hand, we have EC2 instances that
provide a set of resources that will support the needs
of your applications.
The EC2 naming convention represents the amount of
resources that you get from each instance. The instance
size, what we see here is xlarge, represents the
amount of CPU that the instance is providing, and the instance
family represents the CPU-to-memory ratio and therefore defines
how much memory we get. You can also find attributes that
talk about how much additional resources
you're getting, for example, disks or
increased networking throughput.
In this case, the g represents the Graviton processor.
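As an illustrative example (not necessarily the exact instance type shown on the slide), the name c6gn.xlarge decodes like this:

```yaml
# c6gn.xlarge (illustrative example of the naming convention)
#   c      -> instance family: compute optimized, which sets the CPU-to-memory ratio
#   6      -> instance generation
#   g      -> Graviton (Arm-based) processor
#   n      -> additional attribute: increased networking throughput
#   xlarge -> instance size: 4 vCPUs
```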
You can also identify what processor you're working with in each
instance type. Now, in a Kubernetes
cluster, the Kubernetes scheduler is in charge of
matching pods to nodes and initiating
preemption when required. But it's
not in charge of node creation, pod creation,
rescheduling, or rebalancing pods
between nodes. And here comes the need for an external solution
that will perform the task of node management, one that
complies with the same patterns that the Kubernetes scheduler has,
as well as being aware of the cloud's elasticity,
the pricing model, and the infrastructure options in
order to maximize the value we get from it.
The question is, in fact, how do we scale
EC2 instances to support our application
needs? Now, when working with
Kubernetes, a common practice to scale nodes is using
the Cluster Autoscaler. Cluster Autoscaler
is a very popular open source solution that is in
charge of ensuring that your cluster has enough nodes to schedule your
pods without wasting resources.
It runs as a deployment inside your EKS cluster and it's aware
of pod scheduling decisions. So essentially one
of the goals is to bridge between the Kubernetes abstraction
and the cloud abstraction, the auto scaling
groups, which are the entity that supports
provisioning nodes for the application's needs.
Now let's take a look at the process of the scale-up activity
presented in this slide. It starts with a pod
that is in a pending state due to insufficient resources.
This is a good thing, because we always want to run just
the amount of resources that we need, and when
we have more applications that need to run, more containers
are created and they go into a pending state.
Then the scaling solution, the Kubernetes Cluster Autoscaler,
identifies that event and it's in charge of
scaling up the underlying resources to support the requirements
of the pod. How it works is that
it's in charge of selecting the right auto
scaling group that can support the needs of the pending pod.
It increases the number of requested
instances in this auto scaling group and waits to
get them back. When those instances are provisioned,
they will run the bootstrap scripts,
join the EKS cluster, and then the
pod can be scheduled. Now let's dive deeper
into phases two and three. When the
Kubernetes Cluster Autoscaler reaches out to the auto scaling
groups API, it needs to know how many
resources it will get back. It works by simulating
the amount of resources it expects to get from each one of
the auto scaling groups it works with.
So that enforces a requirement that
each auto scaling group should run instances that have similar
resources between the different instance types.
So you need to run each auto scaling group with homogeneous
instance types. That means that in your auto scaling
group you can combine instances like c5.2xlarge
and c6.2xlarge. You can
also put in the older c4 generation.
If you don't mind how much memory you're getting and you only
care about the amount of CPU you're getting, you can combine C
instances together with M instances or R
instances. But you can't, for example,
combine 2xlarge instances with 4xlarge, because the
Cluster Autoscaler would not be able to know in advance
how many CPU resources it's getting from each
instance. The solution of Cluster
Autoscaler to this problem is to
replicate and run multiple auto scaling groups
in your environment. So if you know that
you have applications that require 2xlarge instances
and others require 12xlarge instances,
simply run a lot of auto scaling groups. And every
time that the Cluster Autoscaler has a pending pod
that needs a big instance, it will provision resources
from the big auto scaling group. When it needs a small
instance it will provision resources from the small auto
scaling group. But this does bring
a lot of challenges. So for one,
managing a lot of auto scaling groups is tough because
you need to update the AMIs and roll the instances
and make sure you are maintaining every configuration
there. There are also other challenges
related to running applications in a multi-AZ fashion for
high availability, applications that do
have flexibility between different instance types and
just want to choose the most optimal one for them,
and being able to use Spot capacity. Spot
is the spare capacity of AWS, and one of the
main best practices in order for customers to be able
to take advantage of Spot is to be able to
diversify their instance selection as much as
possible. One of the best practices there when
working with Spot capacity is to be able to diversify
between different sizes of instances.
So for example, be able to use 4xlarge and
8xlarge. If you can pack your application onto
one 8xlarge instance, you can also pack it onto
two 4xlarge instances, and so on.
So this is something that we would like our autoscaling
solution to be aware of and implement
in an easy way. This brings me to talk
about Karpenter, because Karpenter was designed to overcome these
challenges. So similar to Cluster
Autoscaler, Karpenter is an open source scaling solution
that automatically provisions new nodes in response
to unschedulable pod events. It provisions EC2
capacity directly based on the application requirements
you put in your pod manifest file,
so you can take advantage of all the EC2 instance
options available and reduce much of the overhead that
Cluster Autoscaler had. Karpenter has lots
of cool features, but I'm going to dive specifically
into the features that are related to managing the underlying compute.
So Karpenter is implemented as groupless
auto scaling, meaning it directly scales resources
based on the requirements, without the intermediate layer of node groups.
This simplifies the configuration,
and it allows you to improve efficiency because
different kinds of applications can run on shared infrastructure.
It also improves performance, because scaling decisions
are made in seconds when demand changes,
even in the largest Kubernetes clusters.
Karpenter will perform an EC2
Fleet request based on the resource requirements.
So if we recap for a second how
we saw that the Cluster Autoscaler works: we first
have some entity that creates
more pods. They enter a pending state due
to insufficient capacity. Cluster Autoscaler will
identify this event and will perform an API
call to the auto scaling group that was already created by
the administrator, and the administrator already had to
define which instance types to include
inside each auto scaling group and manage multiple
groups in order to support multiple pod requirements.
With Karpenter this changes.
You have Karpenter right here consolidating
the two phases that we had with Cluster Autoscaler, and
Karpenter simply identifies pending pods and
creates an API call to EC2 Fleet. This API
call is custom made based on the requirements
we have right now from our pending pods. So there
is no need to prepare in advance a list of instance types
that support these pod requirements, and
it simplifies the process a lot. So Karpenter
is implemented in Kubernetes as a custom resource
definition which is really cool if you think about
it because it's Kubernetes native and you don't need to manage
any resources that are external to your Kubernetes microcosmos.
So the Provisioner CRD holds all the configurations
related to the compute that you want to work with in the cluster.
By default you can just leave it
as is and allow the provisioner to take
advantage of all the instance types available on EC2,
which are more than 600 today. But if you want
to customize that and you want to include
or exclude something from your instance specification,
you can also do that.
The provisioner also allows defining other configurations,
like limiting the amount of resources provisioned
by a workload in case you want to control a
budget for a team, for example, or defining when
nodes will be replaced by putting a time-to-live
setting inside the provisioner.
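A minimal sketch of such a provisioner, assuming the v1alpha5 Provisioner API that Karpenter used at the time (newer releases rename these resources), with illustrative limit values and an assumed AWSNodeTemplate reference:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  # Allow both Spot and On-Demand capacity; leaving instance types unconstrained
  # lets Karpenter choose from everything EC2 offers.
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
  # Cap the total resources this provisioner may launch, e.g. as a team budget.
  limits:
    resources:
      cpu: "1000"
  # Replace nodes after 30 days (the time-to-live setting mentioned above).
  ttlSecondsUntilExpired: 2592000
  # Remove nodes that have been empty for 30 seconds.
  ttlSecondsAfterEmpty: 30
  providerRef:
    name: default   # assumed AWSNodeTemplate with subnet/security-group selectors
```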
Now let's see how it actually works for different common use
cases. So inside your
Kubernetes microcosmos you might have containers coming
with different requirements. These requirements will usually be
managed by resource requests, node selectors,
affinity, and topology spread. Karpenter will
eventually select the instances to provision for the
pods based on the combination of all these requirements,
so it reads directly all these constraints that
you can put inside your pod YAMLs.
You have different
types of topologies that you can build with
Karpenter. So let's start with a single
provisioner. A single provisioner can run multiple types
of workloads where each workload or container
can ask for what it needs, but it has the option to
share resources with other applications as much as possible to
maximize efficiency. On the other hand,
if I want to separate workloads and I want to enforce
them to run on separate nodes, I can do that with multiple
provisioners, and each provisioner can define
different compute requirements. For example, I can have my default
provisioner use Spot and On-Demand and
use all the instance types available. And I
can have another provisioner supporting only GPU
instances for containers that require GPUs,
and I don't want to share these instances because they
are expensive. Another option
is building prioritized or weighted
provisioners, if I want to use different
configurations but don't want to really separate
between the two configurations, and instead allow, for example,
running 30% of my deployment
on Graviton instances and running all the rest
on x86 instances. I can do that
with prioritized provisioners and implement a
kind of weighting.
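A minimal sketch of that pattern, again assuming the v1alpha5 API with illustrative names, weights, and limits (weighted provisioners are evaluated by priority, so the split you get comes from the cap on the preferred provisioner rather than from an exact percentage):

```yaml
# Preferred provisioner: Graviton (arm64), capped so only part of the capacity lands here.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: graviton
spec:
  weight: 10                 # higher weight is considered first
  requirements:
    - key: kubernetes.io/arch
      operator: In
      values: ["arm64"]
  limits:
    resources:
      cpu: "300"             # illustrative cap; overflow falls through to the default
---
# Fallback provisioner: x86 (amd64), used once the Graviton cap is reached.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: kubernetes.io/arch
      operator: In
      values: ["amd64"]
```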
So inside a single provisioner, the point is to be as flexible
as possible between the resources that can be consumed by
the containers, so that Karpenter will be the one that will make
the intelligent choice of the right instance type to support the
application needs. So what we see here is that we can,
inside a single provisioner, use multiple instance
types and multiple AZs, and I can have my deployment's
resources topology spread between availability zones
so that each replica is required to
run in a different availability zone.
Karpenter will be aware of this
requirement, and Karpenter will be able to provision
a node, an instance, for each replica in a
different availability zone. Or Karpenter will be
able to run instances to
run containers that require different instance
types. For example, one container can
request a memory-optimized instance
while all the rest can just run on whatever is available.
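A minimal sketch of that zonal spread from the workload side (the names, image, and sizes are placeholders); Karpenter reads this constraint and launches nodes in whichever zones are needed to satisfy it:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api                  # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      # Spread replicas across Availability Zones.
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api
      containers:
        - name: api
          image: nginx:1.25  # placeholder image
          resources:
            requests:
              cpu: "1"
              memory: 2Gi
```

A single container that needs a memory-optimized instance could similarly add a node selector on an instance-category label exposed by Karpenter's AWS provider, while all the other containers stay unconstrained.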
One of the major ways that you can save on compute
infrastructure is by using Spot instances, and I already
touched on it a little bit. So AWS offers
different pricing models to allow you to choose the best option
for your specific needs. Spot instances
are the AWS spare, unused capacity,
and they're offered on the same infrastructure
as the other models at a lower price, without
any commitment. The only caveat is
that whenever EC2 needs that
instance back, it will be able to interrupt it with
a two-minute notification warning.
Now Spot is a very effective way to get a large amount of capacity
very economically, as long as your application
is aware of these interruption events
and is capable of moving from one instance
to a different one. So let's
talk about containers. Containers are usually very
flexible. If you modernized them, you went through
the process of building them in a fault-tolerant
way. They are usually stateless. We have Kubernetes
and Karpenter that can bin pack our containers
into shared resources, so we can use different sizes
of instance types and different families of instance types
or availability zones. And so containers
fit really well with Spot instances.
What's unique about Karpenter is that it implements all the Spot best practices,
which are listed in this slide. It simplifies
flexibility because by default it allows us to use all the EC2
instance types that are available on the EC2 platform.
It uses the price-capacity-optimized allocation strategy,
which helps improve workload stability and reduce interruption
rates by always choosing the EC2 instance that is from
the deepest capacity pool. And Karpenter
also manages the Spot lifecycle, which includes identifying
the interruption events, moving your
containers from the interrupted instance to a different one,
and making sure that we always choose the cheapest
instance to work with.
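For a workload that tolerates interruptions, opting into Spot can be as simple as a node selector on Karpenter's capacity-type label (the names, image, and sizes below are placeholders for illustration):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker         # placeholder name
spec:
  replicas: 10
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      # Request Spot capacity explicitly; with a provisioner that allows both
      # capacity types, interruption-tolerant workloads can opt in here.
      nodeSelector:
        karpenter.sh/capacity-type: spot
      containers:
        - name: worker
          image: nginx:1.25  # placeholder image
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
```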
So this helps us get to the understanding
that Spot can be a very good fit for containers
when working with Karpenter, and you can tap into
Spot capacity and gain up to a 90% discount.
The next way to save on your compute is by
using the Amazon-developed Graviton chips.
I won't dive too much into Graviton,
but in two sentences: Graviton
can provide you up to 40% better price
performance. The list price is usually around
15 or 20% less than the equivalent x86
Intel instances, and you can gain
a lot more from the performance benefit depending
on the use case. Graviton processors
also provide improved sustainability,
with up to 60% more
energy efficiency than the comparable x86
instance processors.
So why is Karpenter a great
system, a great orchestration system,
to work with Graviton processors? If
you went through the process of building multi-architecture
container images, which means that
you want to allow your applications to use both
Graviton as well as x86 processors,
Karpenter is able to combine Graviton, Intel, and
AMD together in a single cluster just by adding
the support for the different processors in your provisioner.
And then when Karpenter scales up an instance, it will be
able to choose whatever is available at the lowest price.
Let's say that you got a Graviton instance;
then your multi-architecture container
manifest will be able to pull the Graviton container
image and run on Graviton. On the other hand,
if you got an Intel-based instance,
the multi-architecture container manifest will pull
the container image that is suitable for x86
processors. So Karpenter really simplifies the
combination and the usage of different processors inside
worker nodes in Kubernetes.
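A minimal sketch of that single-cluster setup, again assuming the v1alpha5 API and a placeholder name: one provisioner that allows both architectures, with pods using multi-architecture images so they can land on either.

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: multi-arch           # placeholder name
spec:
  requirements:
    # Let Karpenter pick arm64 (Graviton) or amd64 (Intel/AMD) nodes,
    # whichever offering is cheapest and available.
    - key: kubernetes.io/arch
      operator: In
      values: ["arm64", "amd64"]
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
```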
So I'm going to summarize now what I've been
talking about. Remember, we defined at the beginning what
efficiency is and what we want to achieve with auto scaling.
We want to be able to provision just the amount
of resources that our applications need. We want to densify
them and be able to choose the right instance sizes that will
allow for the highest bin packing, and we want
to be able to be flexible with different purchase
options, instance types, and instance families so that
we can use the best price-performance instances
for our applications. Karpenter essentially provides
the ability to accomplish all of those. It's compatible
with native Kubernetes scheduling. It offers flexibility
and cost optimization using Spot and Graviton instances.
And because all the configurations are built in with Karpenter,
you know you are scaling with the best practices,
so that you can gain the most out of your Karpenter
deployment. Last but not least,
Karpenter is a project under heavy development right
now, and new features are going out
all the time. Karpenter is open source and
you can follow the code and roadmap on GitHub,
and you can open issues directly with the development team to
get quick feedback. So I really recommend taking a look at the
Karpenter project on GitHub.
Thanks so much for listening to me, and enjoy
the rest of the conference.