Conf42 Kube Native 2024 - Online

- premiere 5PM GMT

Harness the power of Karpenter to scale, optimize & upgrade Kubernetes

Abstract

Unlock the full potential of Kubernetes with Karpenter! Scale effortlessly, optimize efficiently, and upgrade seamlessly. Join my talk to revolutionize your cloud infrastructure journey in just minutes! Don’t miss out on this game-changing solution!

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Welcome, everyone. I'm excited to be here, and thank you for joining me today. Imagine you are hosting a party and you're not sure how many guests will show up. You could reserve a fixed number of tables, but what if more people come than expected? Or what if only half of them show up? You either end up scrambling for more space, or you're stuck with empty tables and wasted resources that you reserved. That is a lot like how traditional Kubernetes capacity scaling works today: you allocate capacity up front, but the reality is that things always change.

My name is Chakkree, a Solutions Architect from AWS, and I'm here to walk you through some of the capabilities of Karpenter, with a demo as well. So let's dive in.

Now, what if the tables at your party could automatically expand or change based on how many guests arrive? No stress, no scrambling, always the right size. That is exactly what Karpenter does for your cluster.

Let's look at the challenges we have today with an existing Amazon EKS cluster using Cluster Autoscaler. Here you have a deployment and an existing node that already runs two pods, with two vCPUs on an m5.large. Then the scale-up mechanism kicks in, maybe from KEDA or the Horizontal Pod Autoscaler depending on the cluster, and we end up with seven pods in a pending state. Some of those pods can be scheduled on the existing node, and some need additional capacity. That's why we need Cluster Autoscaler, and Cluster Autoscaler works with Auto Scaling groups. Here it brings up two new m5.large instances: some of the pending pods go to the second instance, and just one more pod goes to the third.

We can spot two challenges here. The first is that it takes a long time to spin up a new instance to schedule the pods, because Cluster Autoscaler has to work with the Auto Scaling group, and the Auto Scaling group has to call the EC2 API, so there are multiple steps to launch a new instance. The second is that, if you look at the third instance, it runs only one pod and still has 1.5 vCPUs left over: it is underutilized. We would like to find a node size that fits the pods we actually need to run, so that is another thing to solve.

The third challenge with an existing cluster and Cluster Autoscaler: the deployment we already have is placed based on its CPU requests and consumption. But what about a new deployment that needs a GPU, or some other instance type? That deployment cannot be scheduled, because the capacity we have doesn't support GPU workloads. So the platform team has to create new node groups: one for general compute and another one for GPUs. Only then can the Auto Scaling group spin up the right instance type for the GPU deployment.

So we have three main challenges, and we would like to see how Karpenter can make this more efficient. Let's see.
This is the existing flow: when we have pending pods and not enough capacity, Cluster Autoscaler kicks in, Cluster Autoscaler works with the Auto Scaling group, and the Auto Scaling group launches EC2 instances by calling the EC2 Fleet API. Multiple steps, just like I mentioned. With Karpenter, we don't need those two components anymore: we don't go through the Auto Scaling group, and the node group concept goes away. We replace both of them with Karpenter, and Karpenter launches EC2 instances on our behalf. The instances Karpenter launches will always be the cheapest ones, depending on your configuration, and always the right size for the pending pods.

Let's see how it works. Here we have two c5.2xlarge instances running nine pods of our deployment. We no longer run Cluster Autoscaler in this cluster; we just have EKS with Karpenter installed. Now, what about a GPU workload? If we configure Karpenter correctly, it can spin up a new node that supports the GPU workload automatically. You don't need to do anything except configure the node pool, which we will go through in a couple of slides. The new workload is simply scheduled on the new instance that Karpenter brings up. That addresses the third problem, and the first one is already solved because we bypass the Auto Scaling group; we will revisit the second one later.

Here is the configuration I mentioned: the NodePool, which defines how Karpenter behaves in your cluster. You can have multiple node pools as well, depending on the situation. In this configuration you choose the instance types: you can use a wildcard, or only C, or C5, or C6, you can use M, you can use R, and you can also constrain the size, like nano, micro, small, large, xlarge and so on. Depending on the use case, you can be broad or specific. For availability zones, you can specify which zones Karpenter is allowed to launch nodes into. You can choose the CPU architecture: amd64 (x86) or arm64, which is Graviton on AWS. Graviton provides better price performance compared to x86, so if your application is already tested on ARM, you can use arm64 here as well. The final one is the capacity type: you can go with On-Demand, or you can utilize Spot; if you specify both, Karpenter will prioritize Spot. Then you have the limits. The limit here says this node pool can provision a maximum of 100 vCPUs; once it exceeds 100 vCPUs, it cannot launch any more nodes.

You can run a single node pool for the whole cluster. That is quite simple, maybe for a test or dev environment, or when one unified node pool configuration supports all the workloads in the cluster.
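To make that concrete, here is a minimal sketch of such a NodePool, assuming the Karpenter v1 API and an EC2NodeClass named default; the zone names and exact values are placeholders rather than the ones on the slide:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                       # assumed EC2NodeClass name
      requirements:
        # instance families this pool may launch
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        # keep the very small sizes out of the pool
        - key: karpenter.k8s.aws/instance-size
          operator: NotIn
          values: ["nano", "micro", "small"]
        # availability zones Karpenter may launch into (placeholder zones)
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]
        # CPU architecture: x86 and/or Graviton
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        # capacity type: Spot is preferred when both are listed
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
  limits:
    cpu: "100"                              # stop provisioning once the pool reaches 100 vCPUs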
But you can also run multiple node pools: maybe you have several teams with different node requirements, or you have accelerated hardware because some teams run GPU workloads, or you use different images (AMIs), like one workload on Bottlerocket and another on the EKS-optimized AMI. So you can have multiple node pools to support different kinds of workloads. You can also put a weight on each node pool: Karpenter prioritizes the higher-weight pool first, and when it reaches its limit, maybe the first pool can only provision 10 vCPUs, sized to match the Savings Plan or Reserved Instances you bought from AWS, the remaining capacity comes from another pool that uses Spot or On-Demand. So you can use weighted node pools as well.

Now, here is challenge number two: some nodes are underutilized. Notice the two instances on the right side: each runs only one pod, and they are m5.xlarge, which is quite big compared to the workloads running on them. In the Karpenter configuration you can specify when to do consolidation. Consolidation means Karpenter tries to move workloads together, bin-packing them to save cost, and you can specify how long it waits before it disrupts and moves the workloads. Here I set it to zero seconds, so it consolidates every time it sees an underutilized node. That is quite aggressive, but for demo purposes I want the consolidation to happen quickly; you can also specify one hour, or 24 hours, or a day or two, depending on your use case. Now Karpenter can pack those two pods onto the existing capacity. That is the first case, and here is the second one: again we have m5.xlarge nodes with two pods running separately. With consolidation enabled, Karpenter finds a node that fits these two pods, spins it up, and drains the two pods onto the new instance to save you some cost. If we group those two pods together and move them onto the new instance, an m5.large, we reduce the overall cost of the cluster. So you can see we have addressed all three challenges I presented earlier.

How about some other features? Recently we launched Karpenter version 1.0, which also supports automatic upgrades by detecting drift in the Karpenter configuration. There is another custom resource called EC2NodeClass, where you specify the image (AMI) you would like to use, the security groups, and the subnets and availability zones as well. Here, under the spec, we say we want the latest image. That means if AWS publishes a new version of the EKS-optimized AMI, Karpenter will upgrade every node it manages to the new AMI and drain the pods for us. This happens automatically, every time a new version of the AMI family you are using comes out.
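As a rough illustration, an EC2NodeClass that tracks the latest EKS-optimized AMI might look like this sketch (Karpenter v1 schema assumed; the IAM role name and discovery tags are placeholders, not values from the talk):

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2023@latest                  # "latest": nodes drift and get replaced when a new AMI ships
  role: KarpenterNodeRole-my-cluster        # placeholder IAM role for the nodes
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster  # placeholder discovery tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster

Replacing @latest with a pinned alias version (for example al2023@v20240807, an illustrative value) is the alternative described next: nodes are then only replaced when you change that value yourself.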
But sometimes you don't want that: you can pin the AMI version you want to use, so that when an update comes out, nodes are not automatically replaced; you change the value in the EC2NodeClass later, when you are ready to upgrade. You can point it at a custom AMI of your own as well, so this is quite flexible.

Now let's get into the demo I prepared for you. Let's see what we have. For the nodes, we have two nodes here of type t3.large. Let's see the workloads: we have CoreDNS in kube-system, and we have Karpenter, which is the controller; I run two replicas of the controller deployment for resiliency. Then I have my workload, named inflate, and the replica count is zero for now. Let's look at the configuration of that inflate deployment (a sketch of it appears below). It's quite basic: the name is inflate, the namespace is inflate, the CPU request is one CPU, and I didn't specify a memory request. It just pulls an image from a public ECR repository and does nothing; it simply reserves the CPU. That's all for now.

How about the NodePool configuration we went through, what did I set up? We have only one node pool here, named default; you can have multiple, as I mentioned. The API version is v1, of course, because Karpenter launched it as stable. The consolidation policy is WhenEmpty for now, and we'll change it later; the consolidation delay is five seconds. The limit means this node pool can provision only 10 vCPUs, after which it cannot launch a new instance. The architecture is x86, the capacity type is On-Demand only, and the instance types this node pool can launch are the C, M, and R families.

Let's see what happens when we scale up our workload. We scale to five replicas, so we have five pending pods now, and you can see Karpenter detects them quite fast and spins up a new instance for us: a c6a.2xlarge. It's not ready yet, because it takes some time for the node to become ready, but the detection is super fast, faster than the Auto Scaling group plus Cluster Autoscaler path, which takes longer. It also brings up two pods on the node, kube-proxy and the kubelet. Now all five pending pods are scheduled on the right side.

Let's look at the logs to understand it further. This is the log from Karpenter. Here it saw the five pending pods we requested, and then Karpenter works out the right-size instance. We requested five vCPUs, and the pod count is seven because it includes the kubelet and kube-proxy. Karpenter searches through the instance types we specified in the node pool, around 45 or 46 candidates in this case, and came up with a c6a.2xlarge in this availability zone, On-Demand, supporting 58 pods, a number that depends on the elastic network interfaces. Now the node is registered and ready to host the pods.

How about we scale up to 20? Some of the pods can be scheduled on the existing nodes, and some will be scheduled on the new instance coming up here. It comes up with a c6a.large, but we still have some pending pods, 13 of them.
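As an aside, for anyone who wants to reproduce the demo: the inflate deployment described above closely matches the example in Karpenter's getting-started material, so here is a sketch under that assumption (the pause image tag is illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
  namespace: inflate
spec:
  replicas: 0                               # scaled up and down during the demo
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
        - name: inflate
          # pause container from a public ECR repository: it does nothing,
          # it just holds the CPU reservation
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: "1"                      # one vCPU requested, no memory request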
I think a c6a.large can maybe support one or two pods. It's not ready yet, so we need to wait until it's ready to see which pods are still pending. It scheduled just one pod onto the new instance, and we still have 12 pods in a pending state; you can see that here as well. Can you guess why Karpenter doesn't spin anything else up? Because it reached the limit we specified: this node pool can provision only 10 vCPUs, and it has now reached that limit.

How about we increase that? Let's raise it from 10 to 1,000; then hopefully it can spin up a new node to cover the 12 pending pods. It's quite fast: you can see it launches a c6a.4xlarge, and the limit has already changed to 1,000. The c6a.4xlarge will fit all 12 pending pods shortly; I just need to wait for kube-proxy and the kubelet here. Okay, it's there, and all the pods are running.

Now I would like to add ARM as well, Graviton, because we want to save some cost, and I would also like to add Spot. For Spot, you need to make sure your workload can tolerate the disruption when the Spot capacity is reclaimed. Let's check the configuration where we added Spot and the arm64 architecture: here we specify x86, which is amd64, plus arm64, Graviton, and Spot as well. Something should come up when Karpenter finds a Spot instance for us, because it's cheaper. It's deleting this node, so it drains some pods off it, they land on the existing node, maybe the smaller one, and the Spot instance will come up soon. Let's wait a little bit. Okay, here: we have a c6g, where the "g" means Graviton, which is ARM, and the capacity type changed from On-Demand to Spot, which is much cheaper, up to around 70% cheaper than On-Demand. We just wait for it to be ready, then Karpenter cordons the old node and drains the pods for us. Now the workload is running on Spot and Graviton, which helps us save some cost in the cluster, and Karpenter deletes the old node.

How about we change the consolidation policy from WhenEmpty to WhenEmptyOrUnderutilized? Here it is: consolidation policy WhenEmptyOrUnderutilized, with a five-second delay. As I said, you can set one hour or 24 hours for consolidation if you don't want frequent disruption. I'll reduce the replica count so that we have some underutilized nodes: here we have a node with five pods and about 20 percent CPU consumption. Karpenter will come up with a new node and drain the pods for us, to make sure the cluster is better utilized. Now we have it. This one is also underutilized, so it spins up a new node, moves all the pods, and deletes the old one that is no longer needed. So that is how consolidation works, along with the other knobs we saw: architecture, Spot, and the instance types we can select.

How about taints and tolerations? We specify that Karpenter will spin up nodes with a taint; the key is my name, chakri. Every node this node pool launches will carry that taint, so a deployment needs a matching toleration to run there. I'll increase the replica count of the workload. At first the extra pods still fit onto the nodes we already have, where they can be scheduled, so we add some more, and now some of the pods are pending.
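A hedged sketch of the two halves of that setup, the taint on the NodePool template and the matching toleration on the deployment; the key and value are stand-ins for whatever the demo actually used:

# on the NodePool, under spec.template.spec
taints:
  - key: owner                              # placeholder key (the demo uses the speaker's name)
    value: chakri
    effect: NoSchedule

# on the Deployment, under spec.template.spec
tolerations:
  - key: owner
    operator: Equal
    value: chakri
    effect: NoSchedule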
They are pending because Karpenter cannot spin up a node for them: the deployment doesn't have a toleration for the taint with the key chakri, so nothing can be launched for it. Now we add the toleration to the deployment so that Karpenter can launch a node for us. Let's see. Okay, I've just added the toleration to our deployment. Let's check: it spun up a new one for us straight away, you can see a c6g.4xlarge. And in the deployment you can see the toleration I just added; that's why Karpenter can spin up a new node and schedule our pods.

Okay, I think the final one is about availability zone spread, and Karpenter will respect anything you configure in your deployment. Here I would like to spread my pods across availability zones, away from each other, and you can see a lot of things happening: Karpenter spins up new nodes to cover the different AZs, which gives your workload higher resiliency. This is the configuration I specified in the deployment: a topology spread constraint with whenUnsatisfiable set to DoNotSchedule, so if the constraint cannot be satisfied, the pod will not be scheduled. You can use ScheduleAnyway as well, but here I want to show that Karpenter respects whatever we put in the deployment. Let's check, once it's done, that our workload spans multiple AZs. It's deleting one of them, so let's wait for it... here we have AZs 1a, 1b, and 1c. Let's look again, because it was still deleting... yes, we have all of the AZs, a, b, and c, for our workload. Karpenter balanced the AZs for us.

That's all I would like to show. There is a lot more you can explore by yourself, and you can also read the best practices we have online. For onboarding, you can install Karpenter using the Helm chart. One best practice: do not run Karpenter on nodes that are themselves managed by Karpenter, otherwise it may delete the node it is running on and stop working; run Karpenter on Fargate or on a separate node group instead. You can also migrate from Cluster Autoscaler; the migration guide is there for you to explore.

Let's sum up. Karpenter is fast, it's quite simple, but it's quite powerful. It's cost effective, because it reduces cost for you through consolidation. And it's Kubernetes-native and open source, and it already supports multiple clouds. I think it's awesome. Thank you so much.
...

Chakkree Tipsupa

Solutions Architect @ AWS

Chakkree Tipsupa's LinkedIn account


