Transcript
Hello all, welcome to the session: why should you care about Karpenter, a cluster autoscaling solution for Kubernetes? I am Raja Ganesan, working as a cloud architect at AWS. I have more than a decade's worth of experience building software and high-performance systems. In the last several years, my main focus and interests have been building scalable systems, containers, and observability. Before we dive
into Karpenter, let's see why you need an autoscaling solution for your Kubernetes cluster. Some sites prefer to overprovision their Kubernetes clusters, while others prefer to have an autoscaling solution in place to meet unexpected compute needs. Autoscaling in Kubernetes is nothing but automatically adjusting the capacity of the Kubernetes cluster to provide predictable, steady performance for your workloads. Some of the factors which may influence you to implement an autoscaling solution for your Kubernetes cluster are resiliency, which means recovery from an unexpected failure or load, or even a scheduled or unscheduled interruption. The next could be cost optimization: you might want to run your Kubernetes cluster at an optimal state by making sure you choose right-sized resources. Last but not least, you might want to design for high availability, which means your workloads are consistently available, in a predictable manner, to serve your users' requests.
Having said that, we can broadly classify Kubernetes autoscaling solutions into two categories. One is scaling the underlying machines, or nodes, that power the Kubernetes cluster. The other is scaling the number of instances of an individual workload, in other words, pods. Karpenter and cluster autoscaler fall into the former category, which helps you scale the number of nodes of your Kubernetes cluster.
Before we talk about Karpenter, let's quickly look at cluster autoscaler and how it works. Cluster autoscaler is nothing but an autoscaling solution for Kubernetes from the Kubernetes Special Interest Group for autoscaling, with implementations for most of the major cloud providers. The way cluster autoscaler works is by keeping a watch on the Kubernetes API server and working along with the kube-scheduler to find a new place for pods when they become unschedulable.
When you're using cluster autoscaler, it always assumes your Kubernetes cluster has some sort of grouping, in other words, node groups. When a pod is unschedulable, cluster autoscaler will try to increase the node group size by adding new nodes of the same size as those already present in the node group. It is a straightforward process if you have only one node group in your cluster. When you have more than one node group, and more than one node group matches the scheduling criteria of your pods, cluster autoscaler uses expanders to choose the right node group to expand.
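Which expander to use is a flag on the cluster autoscaler itself. Here is a minimal sketch of the relevant fragment of its Deployment, assuming the AWS cloud provider; the image tag is illustrative:

```yaml
# Fragment of a cluster-autoscaler Deployment: the expander flag picks the
# strategy used when several node groups match a pending pod.
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.24.0  # illustrative tag
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      # Other expanders include random (the default), most-pods, and priority.
      - --expander=least-waste
```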
Let's take a closer look at cluster autoscaler. If a node type, or an instance type, is not available for any reason, then cluster autoscaler cannot acquire the required compute for your Kubernetes cluster. In this case, cluster autoscaler will attempt a retry with a preconfigured timeout value. Another important factor is that when you're running in multiple availability zones, you might want your pods to be evenly distributed. In this case, cluster autoscaler relies on the underlying cloud provider's zonal rebalancing process. For example, in AWS, it uses the autoscaling group's AZRebalance process to periodically check whether the nodes are evenly distributed across the availability zones. If they are not, it will terminate nodes so that your workloads can be scheduled elsewhere.
Let's try to understand how cluster autoscaler works by looking at an example, with a quick disclaimer: in reality, cluster autoscaler performs multiple steps before provisioning a node, but for the sake of the explanation, I have oversimplified it.
Let's assume that we are running our workload on AWS, and we have this example cluster with a single node group which has a minimum size of one node and a maximum size of ten nodes. It primarily consists of the t3.medium instance type, and currently one node is running with several pods on it. And we have two workloads which are waiting to be scheduled, of four and five replicas respectively, and each of these pods requires a minimum of 250 millicores of CPU and 1 GiB of memory.
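To make the example concrete, here is a minimal sketch of what one of these pending workloads might look like; the names and image are hypothetical, and only the replica count and resource requests come from the example:

```yaml
# Hypothetical "blue" workload: four replicas, each requesting
# 250 millicores of CPU and 1 GiB of memory.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blue
spec:
  replicas: 4
  selector:
    matchLabels:
      app: blue
  template:
    metadata:
      labels:
        app: blue
    spec:
      containers:
        - name: app
          image: nginx  # placeholder image
          resources:
            requests:
              cpu: 250m
              memory: 1Gi
```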
When these new pods are waiting to be scheduled, let's see what cluster autoscaler will do. During the process of requesting new nodes, cluster autoscaler will attempt to determine the CPU and memory resources in an ASG based on its launch configuration or launch template. To increase the number of nodes, cluster autoscaler will set the desired capacity that it needs in the ASG configuration. In this example, it will set the desired capacity to four so that it can schedule all the pending pods at once. And remember, cluster autoscaler always assumes that the nodes running in a node group are exactly equivalent.
In AWS, it is always recommended to use a flexible set of instance types with the same amount of virtual CPU and memory resources, so that you get consistent performance from your cluster autoscaling solution. And one important thing to remember here: cluster autoscaler works well for scheduling across multiple availability zones when your workloads have no zone-specific storage requirements and no pod affinity or node affinity based on zones.
A quick recap: you had two pods with nine replicas in total that needed to be scheduled, and your initial cluster size did not meet the requirements. Cluster autoscaler calculated the total resources required for your pods to be scheduled and provisioned three nodes of the same size so that you could schedule all the pods. Having seen how cluster autoscaler performs, let's talk about Karpenter, which is the main topic of the discussion today.
Karpenter is a fully open source, cloud-agnostic, high-performance cluster autoscaling solution for Kubernetes clusters. It provisions nodes for your Kubernetes cluster in a groupless way; in other words, if you use Karpenter, you don't have to use node groups, and you avoid meddling with configuration in another layer like node groups or autoscaling groups. Karpenter provisions the right resources, in other words nodes, directly for your Kubernetes cluster based on the scheduling constraints given in the pod specifications, such as resource requests, node affinity, et cetera. Karpenter avoids unnecessary API calls between your Kubernetes cluster and the underlying cloud provider's APIs. And finally, Karpenter uses the most suitable instance type to provision in order to accommodate the pending pods. If you remember, cluster autoscaler will attempt to provision the instance type of the same size as the other nodes in the node group.
Before we see how Karpenter works, we need to understand what the kube-scheduler is. The kube-scheduler is an implementation of a control loop which regularly checks the Kubernetes control plane to make sure the cluster's current state matches the desired state. In other words, if there are pods that need to be scheduled or evicted, the kube-scheduler is the one that does it. Karpenter works along with the kube-scheduler, very similar to cluster autoscaler, to periodically check for pods in the Pending state whose scheduling condition has the reason Unschedulable. Karpenter waits for these events and provisions new nodes for the pods to run.
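For illustration, this is the kind of status stanza such a pod carries; a minimal sketch, and the message text will vary by cluster:

```yaml
# Status of an unschedulable pod as reported by the API server; the
# Unschedulable reason is the signal the autoscalers key off.
status:
  phase: Pending
  conditions:
    - type: PodScheduled
      status: "False"
      reason: Unschedulable
      message: '0/1 nodes are available: 1 Insufficient cpu.'
```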
When a node becomes empty and there are no running workloads, Karpenter will attempt to deprovision, or delete, these nodes. In short, if more compute is needed for your Kubernetes cluster, Karpenter will provision additional nodes, and if your cluster is underutilized, Karpenter will check the utilization and see if there are any nodes that can be deleted.
Let's take the same example that we saw earlier, again assuming that we are running our workloads on AWS, and we have two workloads that need to be scheduled, the blue and green ones, of four and five replicas, each of which requires 250 millicores of CPU and 1 GiB of memory. And remember, we are not using any sort of grouping mechanism or node groups, because we are using Karpenter. In this example, the initial capacity of the cluster was provisioned by Karpenter, which is one t3.medium instance with a limit of two virtual CPUs and 4 GiB of memory.
When the new pods are waiting to be scheduled, Karpenter will look at the pods that need to be scheduled and determine what might be the most suitable instance type to schedule these pods quickly. Karpenter has its own internal algorithm to select the most optimal instance type from the pool of available instance types. Secondly, Karpenter interacts directly with the compute provider's API; in this example, since we are using AWS, it interacts directly with the AWS EC2 Fleet API to provision additional resources.
And since our example has two workloads with a total of nine replicas, Karpenter will attempt to choose the right instance type to schedule the pods as soon as possible by calculating the total resources required: nine replicas of 250 millicores and 1 GiB each add up to 2.25 virtual CPUs and 9 GiB of memory. It may choose an instance type of m5.xlarge, which has enough resources to schedule all the pods. If you look closely, it is not the same size as the earlier one: the t3.medium has two virtual CPUs and 4 GiB of memory, while the m5.xlarge has four virtual CPUs and 16 GiB of memory, which is adequate to schedule all the pending pods in one go.
Having seen how Karpenter works through an example, let's see how to use Karpenter. In order to use Karpenter, you have to install something called a provisioner. Provisioners are nothing but Karpenter's custom resources that run inside a Kubernetes cluster. A provisioner uses a subset of the Kubernetes well-known labels, such as zone, instance type, and operating system, when creating a node. For instance, if a pod has no scheduling constraints defined, Karpenter can choose from the wide range of options available from your cloud provider to provision the new nodes.
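Here is a minimal sketch of a provisioner, written against the v1alpha5 API that was current around the time of this talk; the zones and instance types are illustrative, and later Karpenter versions use a different API:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  # Constrain new nodes using a subset of the Kubernetes well-known labels.
  requirements:
    - key: topology.kubernetes.io/zone
      operator: In
      values: ["us-west-2a", "us-west-2b"]
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["t3.medium", "m5.large", "m5.xlarge"]
    - key: kubernetes.io/os
      operator: In
      values: ["linux"]
```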
A node provisioned by Karpenter can expire due to a number of factors. The first one is the property called ttlSecondsUntilExpired: when this TTL is reached, Karpenter will drain all the pods running on this node so they are scheduled elsewhere, and the node gets deleted. The second factor is when the node becomes empty, meaning there are no running workloads on the node; then Karpenter will attempt to delete it. Karpenter places an empty-node TTL on the node, controlled by the property called ttlSecondsAfterEmpty; it will check whether the node has any running workloads, and when the TTL is reached, the node gets deleted. The third way a node gets deleted is manually: you can delete it yourself with the kubectl delete node command or a very similar process.
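Both TTLs are properties of the provisioner; a minimal sketch against the same v1alpha5 API, with illustrative values (and the manual path is simply kubectl delete node followed by the node name):

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  # Drain and replace nodes roughly every 30 days (value in seconds).
  ttlSecondsUntilExpired: 2592000
  # Delete a node 30 seconds after the last workload pod leaves it.
  ttlSecondsAfterEmpty: 30
```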
One of the interesting features of Karpenter is using a mix of spot and on-demand instance types. When specifying both, Karpenter always gives priority to the spot instances by default. If for any reason it is unable to acquire spot capacity, Karpenter will request on-demand resources. Thus it ensures a reduced cost for your Kubernetes cluster, and you can choose spot instances if your workloads can withstand interruptions.
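The mix is expressed as a requirement on Karpenter's own capacity-type label; a minimal sketch of the provisioner fragment:

```yaml
# When both values are listed, Karpenter prefers spot by default and
# falls back to on-demand if spot capacity cannot be acquired.
requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot", "on-demand"]
```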
Finally, if you do not have any constraints on which zones your pods can be scheduled in, Karpenter can choose from the wide range of instance types that we've seen earlier. If you want more control, you can enforce which zones your pods can be scheduled in by using topology spread constraints.
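A minimal sketch of such a constraint in a pod spec; the app label is hypothetical:

```yaml
# Pod spec fragment: spread replicas evenly across availability zones.
# The same pattern works with karpenter.sh/capacity-type as the
# topologyKey to spread across spot and on-demand capacity.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: blue
```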
Let's see some of the operational considerations when using Karpenter. If you are on AWS, Karpenter creates launch templates automatically for you, with the latest EKS-optimized AMI and encrypted EBS volumes, and some users might not prefer this. If these defaults are not sufficient, you are free to create your own custom launch template with the AMI of your choice and other security attributes.
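A sketch of how a custom launch template was referenced on the v1alpha5-era AWS provider block; the field name and the template name here are assumptions from that era of the API, so check the documentation for your Karpenter version:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: custom
spec:
  provider:
    # Hypothetical pre-created launch template carrying your own AMI
    # and security attributes.
    launchTemplate: my-custom-launch-template
```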
Next, let's talk about node upgrades, which is one of the interesting features asked about by many sites. The most straightforward mechanism to perform node upgrades is by setting the ttlSecondsUntilExpired property that we saw earlier. When Karpenter provisions a new node, it automatically picks up the latest AMI configured in the launch template.
The next important consideration that we're going to discuss is whether to use a single provisioner or multiple provisioners. For most sites, using a single provisioner is more than sufficient to meet your needs, but there are certain situations where you might have to use multiple provisioners, for example having a separate provisioner for CPU-based resources and GPU-based resources. The second situation could be when you want to have a dedicated provisioner for each team, so that they can manage their own constraints; it can also give you a better handle on cost attribution. One important thing to remember is that when you're using multiple provisioners, it is always recommended to make sure these provisioners are mutually exclusive. Otherwise, when a pod is pending to be scheduled, Karpenter loops through each provisioner, evaluating which one matches the scheduling constraints; if there are multiple provisioners which match the constraints, Karpenter chooses one randomly.
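A minimal sketch of the CPU/GPU split with two mutually exclusive provisioners, again against the v1alpha5 API; the instance types and taint are illustrative:

```yaml
# General workloads land on the default provisioner; only pods that
# tolerate the GPU taint land on nodes from the gpu provisioner.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["m5.large", "m5.xlarge"]
---
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: gpu
spec:
  requirements:
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["p3.2xlarge"]
  taints:
    - key: nvidia.com/gpu
      value: "true"
      effect: NoSchedule
```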
Another important feature which is supported by Karpenter is the use of Bottlerocket. Bottlerocket is an open source Linux-based operating system which is purpose-built for running containers, with improved security and performance. Karpenter supports the use of Bottlerocket by specifying it in the launch template in AWS.
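Depending on your Karpenter version, this can also be expressed without a full custom launch template; as an assumption about later v1alpha5 releases, the AWS provider block accepts an amiFamily field:

```yaml
# Provisioner fragment: ask Karpenter to build its generated launch
# template on the Bottlerocket AMI family.
provider:
  amiFamily: Bottlerocket
```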
The last consideration that we're going to see today is topology spread constraints. When running a Kubernetes cluster in production, many sites want to optimize for availability. Karpenter recommends the use of topology spread constraints, like the example we saw earlier, instead of pod affinity, to spread the pod placement across availability zones and capacity types.
We have briefly discussed what Karpenter is and how it works; let's recap. Karpenter takes a fresh look at cluster autoscaling solutions for Kubernetes clusters. It aims to provide more direct control for site operators and developers to acquire new capacity for your workloads as quickly as possible.
Karpenter provides several improvements over the existing Kubernetes autoscaling solutions, such as taking advantage of the wide range of instance types available from your cloud provider, so you are not restricted to instance types of similar sizes. Next, Karpenter works in a groupless fashion, thus avoiding interacting with an additional orchestration layer such as autoscaling groups, et cetera; thus, when there are failures, your retry time is considerably reduced. The last improvement is that when a node is launched by Karpenter, it immediately binds the pending pods to that node, so while the node provisioning is in progress, the kubelet can start preparing the container runtime, pre-pull container images, et cetera, so that the pods become available to serve quickly. Karpenter is quite easy to set up, and you can follow the getting started guide in the link provided here.
We have come to the end of the session. I would like to thank my colleague Aldrid, who helped me immensely in preparing for this session, for his support. And finally, I would like to thank you for listening to this talk. If you have any questions, please feel free to reach out to me.