Transcript
This transcript was autogenerated. To make changes, submit a PR.
All right.
Welcome everyone to our session on Amazon EKS, also known as Amazon Elastic Kubernetes Service, cost optimization.
I'm Dariush and with me is my colleague Piyush.
Hello everyone.
We're very excited.
To share some insights on how you all can optimize costs while running
Kubernetes workloads on Amazon EKS.
My name is Piyush and I'm joining Dariush.
We both are dialing in here today from SoCal.
Yeah, I'm here in the office today as well.
So before we dive into cost optimization strategies, let's start with a brief overview of Amazon EKS.
Piyush, could you give us a quick overview of Amazon EKS?
Absolutely.
Do we, if we have a slide, yeah, let's point to that.
So Amazon Elastic Kubernetes Service, and you see the high level architecture diagram here. Amazon EKS is our managed Kubernetes offering.
It makes it easy to run Kubernetes on AWS without needing to install, operate, and maintain your own Kubernetes control plane or nodes.
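As a hedged aside not shown in the session: one common way to stand up such a cluster is the eksctl CLI with a config file. The cluster and node group names below are hypothetical, and this is only a minimal sketch of the format:

```yaml
# Hypothetical, minimal eksctl ClusterConfig.
# AWS manages the control plane; the managed node group is the data plane.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-cluster      # hypothetical name
  region: us-west-2
managedNodeGroups:
  - name: default-ng      # hypothetical name
    instanceType: m5.large
    desiredCapacity: 3
```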
Absolutely.
That's right.
So with Amazon EKS, AWS manages the control plane components
and you can focus on deploying and managing your applications.
It integrates seamlessly with other AWS services, giving you a secure and scalable environment for your workloads.
Yes, and as organizations adopt Kubernetes for container orchestration, Amazon
EKS offers a reliable and efficient way to run Kubernetes at scale.
Absolutely.
But with great power comes great responsibility, so in this case, the potential for high costs if not managed properly.
I would say Kubernetes abstracts away much of the underlying infrastructure, which can sometimes make cost management a challenge. Would you agree?
Yes, of course, and that's where a cost optimization strategy comes into play.
Understanding where costs are incurred and how to manage them is crucial for running efficient operations.
Exactly.
Now, in today's session, we'll explore tools like KubeCost for cost visibility, and we'll discuss scaling solutions like Cluster Autoscaler and Karpenter to optimize resource usage.
There was a keynote back at re:Invent 2023 by Dr. Werner Vogels, where he emphasized the importance of building cost aware workloads, particularly relevant for Kubernetes clusters on AWS cloud.
Yes, I remember watching that session.
That's what inspired our session as well.
When it comes to Kubernetes cost, maybe you can transition to the next slide, where we showcase the overall cost.
The control plane cost is fixed at 10 cents an hour. So even when you scale up your Amazon EKS cluster to hundreds or thousands of nodes, there's no additional control plane cost, which is great.
The main cost component is on the data plane, which comprises workload cost: your pods, your containers, your auto scaling parameters, and we'll cover some of that in this session.
Next, within the data plane, for some workloads you incur network cost, which includes data transfer, your Elastic Container Registry cost, your load balancers, and costs for your NAT gateway deployments.
And then finally, you incur logging and monitoring cost on the data plane.
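To make that breakdown concrete, here is a small sketch of how a monthly bill might decompose. Only the $0.10/hour control plane fee comes from the session; every other figure (node rate, node count, network and observability amounts) is a made-up placeholder:

```python
HOURS_PER_MONTH = 730  # rough average hours in a month

# Fixed control plane fee: $0.10 per hour, regardless of node count.
control_plane = 0.10 * HOURS_PER_MONTH

# Data plane components; every figure below is a hypothetical placeholder.
node_cost = 3 * 0.096 * HOURS_PER_MONTH   # e.g. 3 nodes at a $0.096/hr rate
network_cost = 25.0                       # data transfer, NAT gateway, load balancers
observability_cost = 15.0                 # logging and monitoring

total = control_plane + node_cost + network_cost + observability_cost
print(round(control_plane, 2), round(total, 2))
```

The point of the sketch: the control plane term stays constant while the data plane terms scale with your workloads, which is why the rest of the session focuses on the data plane.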
Good points.
Maybe we should go back to the slide that we had just back here.
So when you mentioned the control plane, I see two boxes here.
Would you say the control plane is the box on the left hand side and the data plane
is on the one on the right hand side.
So which part is really managed by AWS?
I would say the control plane runs on the AWS cloud, whereas the data plane runs in your VPC.
So over here, what we showcase is a couple of options within the data plane.
You have self managed options, there are managed options, and then there is Fargate as well.
So AWS is managing the brain of Kubernetes, so to speak, the control plane, the orchestration, and it's on the right hand side.
And that's where we have the label, if you can see, customer VPC. That's what we mean by the control plane.
Other way around, AWS cloud.
Oh, there you go.
Yes.
AWS Cloud.
Absolutely.
I was looking at the left hand side.
So self-managed node groups and managed node groups are on my left, under customer VPC, and on the right is AWS Cloud.
Alright, so as you can see, this recording is not really scripted.
Alright.
So if you look at the recent CNCF study on Kubernetes cost, back in December of 2023, that's something that we looked at.
Yes.
Yeah, if you can move to that study on the deck, yeah, perfect.
Yeah, the study confirmed that Kubernetes costs are indeed increasing for many of the organizations that we work with.
Further, this survey revealed that a number of human and technology factors were blamed for the increase in both spend and unwanted, unexpected costs in cloud environments.
Over provisioning is by far the most common issue, likely due to difficulties
in accurately estimating resource needs in dynamic Kubernetes environments.
Failure to deactivate unused resources was cited as another issue, and, not to forget, the presence of technical debt.
So the workloads which have not been optimized for cloud native environments
really leads to inefficiencies.
Piyush, would you agree?
Yes, I think this is a very useful survey from a reputed organization, CNCF.
I have some more surveys, if you can go to the next slide. I think we covered the reasons for overspend: the over provisioning and the presence of technical debt.
So if you go to the next one, where we look at the tooling survey, cloud native default tooling came in first, like AWS Cost Explorer in the case of AWS cloud. In terms of purpose built tooling for Kubernetes cost, KubeCost came in first.
We see many customers using KubeCost along with AWS Cost Explorer or traditional cloud cost management tooling.
So while AWS Cost Explorer performs overall cloud cost management and optimization, KubeCost is hyper focused on Kubernetes, or Amazon Elastic Kubernetes Service, EKS, our managed offering for Kubernetes.
Absolutely.
Great points here.
All right, we talked about KubeCost, and that's something that probably we can do a quick overview of.
Is that something that you want to start with?
Or do you want to go into a demo on a terminal?
Yes.
No, let's bring up the dashboard.
And now that we've set the stage, yeah, if you can discuss more about cost visibility: why do we need it, and what's the sense of it?
Yep.
Awesome.
Can you see my screen?
Yes, I do.
Okay, perfect.
So what you see here is a view of KubeCost that's been deployed, or rather installed, on our EKS cluster.
Now, that's a demo version that we're using, but you can also deploy this on your own EKS platform.
The reason I'm using the demo is it's got a lot more features than I would have with the free version.
It comes in two tiers: one is the free version, the other is the enterprise version.
Here, I'm looking at this cost.
Yes, you were going to mention something?
Yeah.
No.
So maybe, Dariush, what's the challenge here that KubeCost is addressing?
Is it getting cost visibility, understanding where cost is coming from within the Amazon EKS cluster?
Is that what it is, or something else?
Absolutely.
So it's really within EKS or Kubernetes.
It breaks down the cost by namespace, pod, teams, if you have set up labels, for example, for your deployments.
That breakdown is what some of our customers are looking for.
Yes, and it's particularly important because Kubernetes resources, as some of our viewers will know, are very dynamic; they can scale rapidly.
So it makes it difficult to track costs associated with, let's say, specific applications or specific teams.
That's the space KubeCost fills in, or that's why KubeCost exists, right?
Yeah, go ahead.
And moreover, with the integration of AWS split cost allocation data, KubeCost can provide even more insights into cost attribution in EKS.
Good.
And for those who may be unfamiliar with this new term that Dariush used, AWS split cost allocation data: what it allows for is the distribution of shared costs, like data transfer and EBS volumes, across various AWS resources, giving a more precise cost allocation.
Exactly.
So maybe we should talk a little bit about AWS split cost allocation data.
Good.
Yes.
Yeah.
Happy to.
So I would say AWS has introduced some amazing features to help enhance cost visibility for your Amazon EKS cluster, including the split cost allocation data feature.
This feature allows you to break down the cost of shared resources, like EC2 instances, networking, and storage, into individual Kubernetes workloads.
What does that really mean for you?
Traditionally, if multiple pods are running on a single EC2 instance, you just see the overall cost of that instance in AWS Cost Explorer.
But with AWS split cost allocation, you can now break down the cost of that instance and attribute a portion of it to each workload, if you will, based on their actual resource usage.
This makes it a lot easier, I would say, to identify which workloads are contributing most to your overall costs.
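As a simplified, hypothetical illustration of the idea (the actual split cost allocation computation is more involved: it splits by both CPU and memory and also accounts for unused capacity), apportioning one instance's hourly cost across pods by their vCPU requests might look like this. All pod names and numbers are made up:

```python
def split_instance_cost(instance_hourly_cost, pod_vcpu_requests):
    """Apportion an instance's hourly cost across pods proportionally
    to their vCPU requests. Simplified sketch of the split-cost idea."""
    total_requested = sum(pod_vcpu_requests.values())
    return {
        pod: instance_hourly_cost * vcpu / total_requested
        for pod, vcpu in pod_vcpu_requests.items()
    }

# Hypothetical pods sharing one instance billed at $0.096/hr.
shares = split_instance_cost(0.096, {"checkout": 1.0, "search": 0.5, "batch": 0.5})
print(shares)
```

The pod with the largest request carries the largest share, which is exactly the per-workload attribution the feature surfaces in Cost Explorer.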
Yes, I think that's a good point.
And if you were to combine KubeCost's real time visibility with AWS split cost allocation data, you get a powerful way to track and optimize your cost, right?
You'll be able to precisely see where your spend is going, whether it's CPU, memory, storage, or network, and it allows you to take the right actions to optimize your cloud spend.
So together, these tools give you the level of detail you need to make informed decisions about your infrastructure and resource management.
Great.
Yeah, absolutely.
So next we're going to go back to KubeCost and see how it really helps identify underutilized resources.
We'll look at some of the recommendations that you can see on the dashboard, walk through how to spot some of these opportunities for optimization, and review the actionable insights KubeCost provides.
So let's jump into a quick demo and see how it works.
All right.
So we can see that we have the overview for the KubeCost dashboard.
Under monitor, you have a number of filters, and those filters could be
by namespace, nodes, pods, controllers and containers, or in some cases,
environments and deployments.
We have here some examples of those.
And the more environments you have or the more departments you have with the labels
deployed, you'll get to see it here.
One area that I really like is the reports, where you can have a number of different reports created based on what matters to you most.
And last but not least, the most important one to me right now is the savings, where I can see, as you can see, a number of options.
And if you click on it, it comes back, and there are a lot of details we can get into.
But at a high level, you can see that, yes, we are running a series of CPU families, instance families I should say, and the recommendations, which are most important.
Going back to instances.
Yes.
And I'm just curious, what is KubeCost basing all these recommendations on?
What data does it have access to?
Is it some specific data within the EKS cluster that it is tapping into to come up with these recommendations?
Good questions.
I don't have full insight into how it really delves into the details of pulling the data, but this is a demo version.
If you had installed it on your own EKS cluster, it could probably be looking at a lot more insight.
I would say it looks at the load that is running on the EKS cluster and makes some intelligent decisions.
In one case, we are running, for example, the Karpenter autoscaler, where it decides what is the best family type of EC2 instances, or compute, that's really suited for your workload.
But that's a good question, that's something to look at.
Yes.
Yes.
No, what I see in the, I was just speaking at the documentation.
What I see is that there are a couple of different options.
One it queries metrics from Prometheus, as some of you may know, it's our open source
monitoring and time series database.
It's installed by default when KubeCost is installed on an Amazon EKS cluster.
So KubeCost will query metrics from Prometheus.
In addition.
It uses API calls to retrieve some public pricing data for AWS services.
And then there are some other integrations like AWS cost and usage reports to
further enhance the accuracy of pricing information specific to a given account.
So it relies on all of these information and then I'm sure
there's some intelligence built in and that's how it's coming up with
this recommended savings for you.
And you mentioned Prometheus. I believe we also have managed services for Prometheus and Grafana on AWS that you can take advantage of.
All right.
This was a great demo.
Yeah, this was a great demo.
We should have more time to get into a lot more details, but I just wanted to introduce some of the capabilities that KubeCost gives you.
And there is a free version that users can deploy on their EKS platform to really give it a spin and see how it works for them.
All right.
The next topic we want to talk about is auto scaling strategies.
Let me move that away and go back to our auto scaling strategies.
We are introducing Karpenter as one option. We talked about cost visibility, but while visibility is crucial, optimizing resource usage is equally important for cost management.
Let's discuss scaling solutions in Amazon EKS.
Yeah, I can take that.
A lot of the customers we talk to are familiar with Cluster Autoscaler.
That's how we started.
And so Cluster Autoscaler automatically adjusts the number of nodes in your cluster based on the scheduling needs of your pods.
Now, that's helpful, and it has its own use cases.
But in some instances, I see feedback that it's somewhat slow in scaling times and less flexible with instance type selection, if you will.
Yeah, and we don't make it easy.
We have hundreds of instance types, and we're always launching new ones.
That just amplifies the limitation quite a bit.
And that's where Karpenter comes into the picture.
It's one of our newer open source, efficient node provisioning tools.
It was built by AWS, and we donated Karpenter to the Kubernetes autoscaling special interest group, SIG Autoscaling.
So Karpenter aims to improve the efficiency and speed of scaling operations.
It's Kubernetes native.
And what I mean by that is sometimes your Kubernetes workloads may be required to run in certain AZs, on certain instance types, on Spot or On-Demand. So how do you do that?
You need mechanisms like node selectors, node affinity, taints and tolerations, topology spreads. All of these are Kubernetes concepts.
So Karpenter works with your pod scheduling constraints, and that's what makes it Kubernetes native.
Exactly.
Karpenter can provision new nodes in response to unschedulable pods within seconds, optimizing for cost and performance by selecting the right instance types on the fly.
Yes.
And it also supports features like consolidation and bin packing which
can significantly reduce costs by making better use of resources.
So let's jump into a quick demo of Karpenter in action and see how it really operates when we try to scale some resources up and down.
Is this a good time to go into the terminal, perhaps?
Yeah, absolutely.
Yeah.
Maybe you could start with what your starting state is. I see two windows; you can provide some more background on what it is.
Essentially, I have an EKS cluster deployed on AWS, and I've set the context for the cluster that we're using.
The top window is eks-node-viewer. It's an open source tool that you can download and install for your EKS cluster, and it really gives you a live, near real time view into your EKS cluster.
On that first line, it shows three nodes; I'm running three nodes and 51 pods.
On the right hand side, it gives you an estimated cost per node, per hour and/or per month, which is to some extent helpful.
It's not an exact price, but it gives you an estimate.
Below that, it also shows you the instance families that are used, both by EKS for the main cluster and by other resources that you need for your deployment.
The screen below is just a terminal screen. If we take a look at kubectl get deploy, you can see here I have three deployments.
One is the CPU-memory hungry app, next is inflate, and then nginx.
Starting from the top, we have seven replicas, thirteen replicas, and one.
So in this case we can scale up the replicas on the hungry app from seven to something else, so that we can scale up.
So let's say we scale up inflate, or rather the CPU-memory app, from seven replicas to 125 replicas.
Yeah, before you hit enter, I'm just curious.
Up in the top window, we see something like 48 pods running, and then at the bottom we have three applications, so it looks like there are more pods.
I think that's an interesting one, and part of it is because Karpenter has visibility into the control plane, and Karpenter recognizes Kubernetes resources which are on the Kubernetes control plane; that's where those additional resources come into play.
Exactly, because within EKS, or Kubernetes in general, there are a number of different namespaces, based on what applications you deploy and the namespaces that you as a user create.
Now, within the kube-system namespace on Kubernetes, or on EKS, there are a number of pods that are spun up to support the Kubernetes infrastructure, be it the control plane, the scheduler, the proxies, CoreDNS, and so on.
That was a great point you brought up, because looking at the number of pods and looking at the screen below, it doesn't add up.
But that's why: those are pods that are not showing, because I'm only looking at deployments, the number of deployments that I have.
Got it.
Yep.
So let's scale up your hungry app.
All right.
And as you can see on the top, dynamically, it's creating the pods: pending, then running, within a few seconds. Pretty fast.
Yeah, it is pretty fast.
Yeah.
It's pretty fast in terms of response, and I really took a big number here, Piyush.
Yes.
The demo gods are with us: 25 pending.
It's really a simple app, but we can take a look at one interesting thing.
The other one is, of the three nodes, I see two are On-Demand and one is Spot.
That's interesting to me.
And perhaps we can take a look at the node pool configuration, because there was something in the node pool configuration that made Karpenter decide to spin up On-Demand and Spot.
And yeah, for the benefit of our viewers, if you can bring it up.
I have a couple of deployments.
One is the high priority YAML file, which essentially shows the deployment.
The other one is the hungry app. Let's see the hungry app.
And so when you say a couple, you mean you have multiple node pools, and then there is a way to run them in parallel and assign some weight to them?
Okay, let's take a look at one. Just be mindful of time; I'm not sure how much time we have left.
But in this case we have node pools. We have specified different families of instances that are available to Karpenter, and on line 40, for that operator, the values are spot and on-demand.
So Karpenter can pick and select depending on the workload.
Got it.
And what happens is there is this trade off for Spot where, if you just choose the lowest price, sometimes you get a very small capacity pool, a pool that's almost unavailable, and it's at a very high risk of interruption.
So what Karpenter does is it uses the EC2 Fleet API, which relies on the price capacity optimized allocation strategy.
It gives you the cheapest instance type, but it doesn't give you the one which is about to be interrupted.
So there's a bit of balance in there, and Karpenter relies on the Fleet API to make that trade off.
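The trade off described here can be caricatured in a few lines. This is a toy scoring of our own invention, with made-up pool names, prices, and risk numbers; the real EC2 Fleet price-capacity-optimized strategy is not public:

```python
def pick_pool(pools, price_weight=0.5):
    """Toy 'price-capacity-optimized' selection: score each Spot capacity
    pool by normalized price plus interruption risk; lowest score wins.
    Illustrative only, not the actual EC2 Fleet algorithm."""
    max_price = max(p["price"] for p in pools)
    def score(p):
        return (price_weight * p["price"] / max_price
                + (1 - price_weight) * p["interruption_risk"])
    return min(pools, key=score)

# Hypothetical pools: the cheapest one carries a high interruption risk.
pools = [
    {"name": "m5.large/us-west-2a",  "price": 0.030, "interruption_risk": 0.90},
    {"name": "m5a.large/us-west-2b", "price": 0.036, "interruption_risk": 0.10},
]
print(pick_pool(pools)["name"])
```

The slightly more expensive but much more stable pool wins, which is the balance Karpenter gets from the Fleet API.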
Great point.
Yeah, that's a great point to bring up.
And we also have a parameter called weight within Karpenter.
Here I have set the weight to 20 for high priority workloads versus a lower number for lower priority workloads.
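A hedged sketch of what such a node pool definition might look like. The field layout follows the Karpenter NodePool API as we understand it, abbreviated (a real NodePool also references a node class); the name, weight, and instance families are illustrative:

```yaml
# Abbreviated, hypothetical Karpenter NodePool sketch.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: high-priority               # hypothetical name
spec:
  weight: 20                        # preferred over lower-weight pools
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # let Karpenter choose
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["m5", "c5", "r5"]      # illustrative families
```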
All right, so let me scale down in this case. Okay, scale down, going back to, for example, five replicas.
And in this case, the application that we have deployed, the hungry app, doesn't really consume a lot of resources.
But if you had an application that consumed a lot more resources, it would spin up additional instance family types, like Spot instances and, if required, On-Demand instances.
In the interest of time, it's probably a good time to talk about a few takeaways from our discussion.
As we have seen, effective cost optimization in EKS involves both visibility and action.
Yes, and there are tools like KubeCost, which we gave a demonstration of, that get you visibility to understand where your spend is going, at a very granular level: pods, namespaces, and whatnot.
Yep.
And scaling solutions like Karpenter allow you to act on that information by optimizing resource allocation in somewhat near real time, if you will.
Yes.
And what we recommend at Amazon is: don't treat cost optimization as a one off thing, something that you do only at the time of migration or when you cut over.
Continuously monitor your costs, regularly review your resource utilization, and then adopt dynamic scaling solutions like the Karpenter we saw.
Yeah, and also don't forget the importance of right sizing your workloads
and cleaning up unused resources.
Small adjustments can lead to significant savings over time.
Yes.
And like we discussed, cost optimization is an ongoing process.
So stay proactive and leverage the tools at your disposal.
We showcased KubeCost and Karpenter, but there are more out there.
All right.
So thank you all for joining us today.
We hope you found this session valuable and that you can apply these
strategies in your own environments.
Yes thank you everyone and feel free to reach out to us with any
questions or any further discussions.
And perhaps, before we wrap up, let's do a quick recap. If we go to this slide, do we have a slide on that?
Yes.
Yep.
Do you want me to take it or okay.
Yeah, I can do that.
So as a recap, what we did today was we started out looking at the breakdown of your Amazon EKS cost.
We positioned KubeCost as a way to get granular cost visibility within your Amazon EKS clusters.
And then we highlighted different auto scaling strategies and did a sort of deep dive demo on Karpenter, which is an efficient cluster node autoscaler for Kubernetes.
In between, we highlighted how important it is to ensure that these cost optimization strategies are sustainable over a long period, which requires you to establish cost governance, so to speak.
Yeah, that's all we had.
Once again, thank you very much for joining and see you next time.
Thank you all.