Conf42 Cloud Native 2022 - Online

Why should you bother about cluster autoscaling using Karpenter


Abstract

Karpenter is a cluster autoscaling solution for Kubernetes clusters. Karpenter autoscales capacity in an effective way by interacting directly with the cloud provider's compute services to provision capacity for the Kubernetes cluster.

In this talk, I will discuss how to use Karpenter to scale Kubernetes clusters up and down.

Summary

  • Raja Ganesan is a cloud architect at AWS. He explains why you need an autoscaling solution for your Kubernetes cluster. Some sites prefer to over-provision, while others prefer to have an autoscaling solution in place. Reasons include resiliency, cost optimization and high availability.
  • Cluster Autoscaler is an autoscaling solution for Kubernetes from the Kubernetes autoscaling Special Interest Group. It works along with the kube-scheduler to find a new place for pods when they become unschedulable, and works well for scheduling across multiple availability zones.
  • Karpenter is a fully open source, cloud-agnostic, high-performance cluster autoscaling solution for Kubernetes clusters. It provisions nodes for your Kubernetes cluster based on the scheduling constraints given in the pod specifications. Karpenter avoids unnecessary API calls between your cluster and the underlying cloud provider's APIs.
  • Node upgrades are one of the interesting considerations, along with whether to use a single provisioner or multiple provisioners. Another important feature supported by Karpenter is the use of Bottlerocket.
  • Karpenter takes a fresh look at cluster autoscaling for Kubernetes clusters. It aims to give site operators and developers more direct control to acquire new capacity for their workloads as quickly as possible. Karpenter recommends the use of topology spread constraints instead of pod affinity.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello all, welcome to the session: why should you bother about Karpenter, a cluster autoscaling solution for Kubernetes? I am Raja Ganesan, working as a cloud architect at AWS. I have more than a decade's worth of experience building software and high-performance systems, and in the last several years my main focus and interest have been in building scalable systems, containers and observability.

Before we dive into Karpenter, let's see why you need an autoscaling solution for your Kubernetes cluster. Some sites prefer to over-provision their Kubernetes cluster, while others prefer to have an autoscaling solution in place to meet their unexpected compute needs. Autoscaling in Kubernetes is nothing but automatically adjusting the capacity of the Kubernetes cluster to provide predictable, steady performance for your workloads. Some of the factors that may influence you to implement an autoscaling solution for your Kubernetes cluster are resiliency, which means recovery from an unexpected failure or load, or even a scheduled or unscheduled interruption. The next could be cost optimization: you might want to run your Kubernetes cluster at an optimal state by making sure that you choose the right size of resources. Last but not least, you might want to design for high availability, which means your workloads are consistently available in a predictable manner to serve your users' requests.

Having said that, we can broadly classify Kubernetes autoscaling solutions into two categories. One is scaling the underlying machines or nodes that power the Kubernetes cluster. The other is scaling the number of instances of an individual workload, in other words, pods. Karpenter and Cluster Autoscaler fall into the former category, which helps you scale the number of nodes of your Kubernetes cluster.

Before we talk about Karpenter, let's quickly see what Cluster Autoscaler is and how it works. Cluster Autoscaler is nothing but an autoscaling solution for Kubernetes from the Kubernetes autoscaling Special Interest Group, with implementations for most of the major cloud providers. The way Cluster Autoscaler works is by keeping a watch on the Kubernetes API server and working along with the kube-scheduler to find a new place for pods when they become unschedulable. When you're using Cluster Autoscaler, it always assumes your Kubernetes cluster has some sort of grouping, in other words, node groups. When a pod is unschedulable, Cluster Autoscaler will try to increase the node group size by adding new nodes of the same size as those already present in the node group. It is a straightforward process if you have only one node group in your cluster. When you have more than one node group and more than one of them matches the scheduling criteria of your pods, then Cluster Autoscaler uses expanders to choose the right node group to expand.

Let's take a closer look at Cluster Autoscaler. If a node type or an instance type is not available for any reason, then Cluster Autoscaler cannot acquire the required compute for your Kubernetes cluster. In this case, Cluster Autoscaler will attempt a retry with a pre-configured timeout value. Another important factor is that when you're running in multiple availability zones, you might want your pods to be evenly distributed. In this case, Cluster Autoscaler relies on the underlying cloud provider's zonal rebalancing process.
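To make the node group and expander behaviour described above concrete, here is a minimal sketch of how Cluster Autoscaler is commonly deployed on AWS. The image version, cluster name and tag values are illustrative assumptions rather than details from the talk; the flags shown correspond to the node group discovery, expander selection and retry timeout discussed above.

```yaml
# Sketch of a Cluster Autoscaler Deployment (names, versions and values are assumptions).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.23.0
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            # Discover node groups (ASGs) by tag instead of listing each one.
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
            # The expander decides which of several matching node groups to grow.
            - --expander=least-waste
            # Treat node groups with identical instance shapes as interchangeable.
            - --balance-similar-node-groups
            # How long to wait for a requested node before retrying elsewhere.
            - --max-node-provision-time=15m
```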
For example, in AWS it uses the Auto Scaling group's AZ rebalance process to periodically check whether the workloads are evenly distributed across the availability zones. If they are not, it will terminate nodes so that your workloads can be scheduled elsewhere.

Let's try to understand how Cluster Autoscaler works by looking at an example, and a quick disclaimer: in reality, Cluster Autoscaler performs multiple steps before provisioning a node, but for the sake of the explanation I have oversimplified it. Let's assume that we are running our workload on AWS and we have this example cluster with a single node group which has a minimum size of one node and a maximum size of ten nodes. It primarily consists of the t3.medium instance type, and currently one node is running with several pods on it. We have two pods which are waiting to be scheduled, with four and five replicas respectively, and these pods require 250 millicores of CPU and one gigabyte of memory at a minimum. When these new pods are waiting to be scheduled, let's see what Cluster Autoscaler will do. During the process of requesting new nodes, Cluster Autoscaler will attempt to determine the CPU and memory resources in an ASG based on its launch configuration or launch template. To increase the number of nodes, Cluster Autoscaler will set the desired capacity that it needs in the ASG configuration. In this example, it will set the desired capacity to four so that it can schedule all the pending pods at once. And remember, Cluster Autoscaler always assumes that the nodes running in a node group are exactly equivalent. In AWS, it is always recommended to use a flexible set of instance types with the same amount of virtual CPU and memory resources so that you get consistent performance from your cluster autoscaling solution. One important thing to remember here is that Cluster Autoscaler works well for scheduling across multiple availability zones when your workloads have no zone-specific storage requirements and no pod affinity or node affinity based on zones.

A quick recap: you had two pods with nine replicas in total that needed to be scheduled, and your initial cluster size did not meet the requirements. Cluster Autoscaler calculated the total resources required for your pods to be scheduled and provisioned three nodes of the same size so that all the pods could be scheduled.

Having seen how Cluster Autoscaler works, let's talk about Karpenter, which is the main topic of the discussion today. Karpenter is a fully open source, cloud-agnostic, high-performance cluster autoscaling solution for Kubernetes clusters. It provisions nodes for your Kubernetes cluster in a groupless way; in other words, if you use Karpenter, you don't have to use node groups, and you avoid meddling with configuration in another layer like node groups or Auto Scaling groups. Karpenter provisions the right resources, in other words nodes, directly for your Kubernetes cluster based on the scheduling constraints given in the pod specifications, such as resource requests, node affinity, etc. Karpenter avoids unnecessary API calls between your Kubernetes cluster and the underlying cloud provider's APIs. And finally, Karpenter uses the most suitable instance type to provision in order to accommodate the pending pods. If you remember, Cluster Autoscaler will attempt to provision the same instance type as the other nodes in the node group.
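To ground the example, here is a minimal sketch of what one of the pending workloads could look like. The workload name and container image are illustrative assumptions; the replica count and resource requests follow the example above, and it is these requests that the autoscaler sums up when deciding how much capacity to add.

```yaml
# Sketch of one of the example workloads (name and image are assumptions).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blue
spec:
  replicas: 4                # four pending replicas, as in the example
  selector:
    matchLabels:
      app: blue
  template:
    metadata:
      labels:
        app: blue
    spec:
      containers:
        - name: app
          image: nginx:1.21  # placeholder image
          resources:
            requests:
              cpu: 250m      # 250 millicores per pod
              memory: 1Gi    # roughly one gigabyte per pod
```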
Before we see how Karpenter works, we need to understand what the kube-scheduler is. The kube-scheduler is an implementation of a control loop which regularly checks the Kubernetes control plane to make sure the cluster's current state matches the desired state. In other words, if there are pods that need to be scheduled or evicted, then the kube-scheduler is the one that does it. Karpenter works along with the kube-scheduler, very similar to Cluster Autoscaler, to periodically check for pods in pending state with the reason unschedulable. Karpenter waits for these events and provisions new nodes for the pods to run. When a node becomes empty and there are no running workloads, then Karpenter will attempt to deprovision or delete those nodes. In short, if more compute is needed for your Kubernetes cluster, then Karpenter will provision additional nodes, and if your cluster is underutilized, then Karpenter will check the utilization and see if there are any nodes that can be deleted.

Let's take the same example that we saw earlier, again assuming that we are running our workloads on AWS. We have two workloads that need to be scheduled, the blue and green ones, with four and five replicas respectively, each of which requires 250 millicores of CPU and one gigabyte of memory. And remember, we are not using any sort of grouping mechanism or node groups because we are using Karpenter. In this example, the initial capacity of the cluster was provisioned by Karpenter, and it has one t3.medium instance with a limit of two virtual CPUs and four gigabytes of memory. When the new pods are waiting to be scheduled, Karpenter will look at the pods that need to be scheduled and at what might be the most suitable instance type to schedule these pods quickly. Karpenter has its own internal algorithm to select the most optimal instance type from the pool of available instances. Secondly, Karpenter interacts directly with the compute provider's API. In this example, since we are using AWS, it interacts directly with the AWS EC2 Fleet API to provision additional resources. And since our example has two pods with a total of nine replicas, Karpenter will attempt to choose the right instance to schedule the pods as soon as possible by calculating the total resources required. It may choose an instance type of m5.xlarge, which has enough resources to schedule all the pods. If you look closely, it is not the same size as the earlier one, which is a t3.medium with two virtual CPUs and four gigabytes of memory; an m5.xlarge has four virtual CPUs and 16 gigabytes of memory, which is adequate to schedule all the pending pods in one go.

Having seen how Karpenter works by an example, let's see how to use Karpenter. In order to use Karpenter, you have to install something called a provisioner. Provisioners are nothing but Karpenter's custom resources that run inside a Kubernetes cluster. A provisioner uses a subset of Kubernetes well-known labels such as zone, instance type and operating system when creating a node. For instance, if a pod has no scheduling constraints defined, then Karpenter can choose from a wide range of options available from your cloud provider to provision the new nodes. A node provisioned by Karpenter can expire due to a number of factors. The first one is the property called ttlSecondsUntilExpired: when this TTL is reached, Karpenter will drain all the pods running on this node and schedule them elsewhere, and the node gets deleted.
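As a rough illustration of what a provisioner looks like, here is a minimal sketch using the karpenter.sh/v1alpha5 API that Karpenter releases of this period exposed. The zones, discovery tags and TTL value are illustrative assumptions; the requirement keys are the well-known labels mentioned above, and ttlSecondsUntilExpired is the expiry property just described.

```yaml
# Sketch of a basic Karpenter provisioner (zones, tags and TTL are assumptions).
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    # Constrain new nodes using Kubernetes well-known labels.
    - key: topology.kubernetes.io/zone
      operator: In
      values: ["us-east-1a", "us-east-1b"]
    - key: kubernetes.io/arch
      operator: In
      values: ["amd64"]
  # Expire nodes after roughly 30 days so they are drained and replaced.
  ttlSecondsUntilExpired: 2592000
  # AWS-specific settings: discover subnets and security groups by tag.
  provider:
    subnetSelector:
      karpenter.sh/discovery: my-cluster
    securityGroupSelector:
      karpenter.sh/discovery: my-cluster
```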
The second factor is when the node becomes empty, meaning there are no running workloads on the node; then Karpenter will attempt to delete it. Karpenter places an empty-node TTL on the node, controlled by the property called ttlSecondsAfterEmpty, and it keeps checking whether the node has any running workloads; when that TTL is reached, the node gets deleted. The third way a node gets deleted is manually, through a process very similar to the kubectl delete node command.

One of the interesting features of Karpenter is using a mix of Spot and On-Demand instance types. When specifying both, Karpenter always gives priority to Spot instances by default. If for any reason it is unable to acquire Spot capacity, Karpenter will request On-Demand resources instead. Thus it helps reduce the cost of your Kubernetes cluster, and you can choose Spot instances if your workloads can withstand interruptions. Finally, if you do not have any constraints on which zones your pods can be scheduled in, Karpenter can choose from the wide range of instance types that we've seen earlier. If you want more control, you can enforce which zones your pods can be scheduled in by using topology spread constraints.

Let's see some of the operational considerations when using Karpenter. If you are on AWS, Karpenter creates launch templates automatically for you with the latest EKS-optimized AMI and encrypted EBS volumes, and some users might not prefer this. If these are not sufficient, you are free to create your own custom launch template with the AMI of your choice and other security attributes. Next, let's talk about node upgrades, which is one of the interesting considerations asked about by many sites. The most straightforward mechanism to perform node upgrades is by setting the ttlSecondsUntilExpired property that we saw earlier. When Karpenter provisions a new node, it automatically picks up the latest AMI configured in the launch template.

The next important consideration is whether to use a single provisioner or multiple provisioners. For most sites, a single provisioner is more than sufficient to meet your needs, but there are certain situations where you might have to use multiple provisioners, for example having a separate provisioner for CPU-based resources and GPU-based resources (a sketch follows below). The second situation could be when you want a dedicated provisioner for each team so that they can manage their own constraints, which can also give you a better handle on cost attribution. One important thing to remember is that when you're using multiple provisioners, it is always recommended to make sure that these provisioners are mutually exclusive. Otherwise, when a pod is pending to be scheduled, Karpenter loops through each provisioner, evaluating which one matches the scheduling constraints; if multiple provisioners match the constraints, then Karpenter chooses one randomly. Another important feature supported by Karpenter is the use of Bottlerocket. Bottlerocket is an open source Linux-based operating system which is purpose-built for running containers with improved security and performance. Karpenter supports the use of Bottlerocket by specifying it in the launch template in AWS.
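To illustrate the multiple-provisioner pattern, here is a sketch of a general-purpose provisioner alongside a GPU-specific one. The names, instance types and TTL values are illustrative assumptions; the taint on the GPU provisioner is one way to keep the two effectively mutually exclusive, since only pods that tolerate it will land on GPU nodes.

```yaml
# Sketch of two provisioners kept apart by a taint (names and values are assumptions).
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    # Prefer Spot by default, falling back to On-Demand when Spot is unavailable.
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
  # Delete a node once it has been empty for 30 seconds.
  ttlSecondsAfterEmpty: 30
---
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: gpu
spec:
  requirements:
    # Restrict this provisioner to GPU instance types.
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["g4dn.xlarge", "p3.2xlarge"]
  # Only pods that tolerate this taint are scheduled onto GPU nodes.
  taints:
    - key: nvidia.com/gpu
      value: "true"
      effect: NoSchedule
  ttlSecondsAfterEmpty: 30
```

And since topology spread constraints came up above as the way to control where pods land, here is a minimal sketch of a workload spread across availability zones and capacity types; the workload name, image and replica count are illustrative assumptions.

```yaml
# Sketch of spreading pods across zones and Spot/On-Demand capacity (values are assumptions).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        # Spread replicas evenly across availability zones.
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: web
        # Spread across Spot and On-Demand nodes so a Spot interruption
        # does not take out every replica at once.
        - maxSkew: 1
          topologyKey: karpenter.sh/capacity-type
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: nginx:1.21   # placeholder image
```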
The last consideration that we're going to see today is topology spread constraints. When running a Kubernetes cluster in production, many sites want to optimize for availability. Karpenter recommends the use of topology spread constraints instead of pod affinity to spread pod placement across availability zones and capacity types.

Now that we have briefly discussed what Karpenter is and how it works, let's recap. Karpenter takes a fresh look at the cluster autoscaling solution for Kubernetes clusters. It aims to provide more direct control for site operators and developers to acquire new capacity for your workloads as quickly as possible. Karpenter provides several improvements over the existing Kubernetes autoscaling solutions, such as taking advantage of the wide range of instance types available from your cloud provider, so you are not restricted to instance types of similar sizes. Next, Karpenter works in a groupless fashion, thus avoiding interaction with an additional orchestration layer such as Auto Scaling groups, et cetera; so when there are failures, your retry time is considerably reduced. The last improvement is that when a node is launched by Karpenter, it immediately binds the scheduled pods to that node, so while node provisioning is in progress, the kubelet can start preparing the container runtime, pre-pull container images, and so on, so that the pods become available to serve quickly.

Karpenter is quite easy to set up, and you can follow the getting started guide in the link provided here. We have come to the end of the session. I would like to thank my colleague Aldrid, who helped me immensely in preparing for this session, for his support. And finally, I would like to thank you for listening to this talk, and if you have any questions please feel free to reach out to me.

Raja Ganesan

Cloud Architect @ AWS



