Transcript
This transcript was autogenerated. To make changes, submit a PR.
My name is Hari Ramshetty and I'm a software engineer
on the infrastructure team.
Today I'll be talking about cloud cost optimization and how you can use FinOps practices to reduce your AWS cloud costs.
I'll also touch briefly on how you can use AI-powered tools like ChatGPT, Claude, Gemini, and Llama to help you reduce cloud costs.
I'll also touch briefly on the AI agentic approach using LangChain and CrewAI, where you can build your own agents that help you with the cloud cost optimization process.
I'm excited for this talk.
So let's just get started.
First, some introductions.
I'll be going through what cloud native cost optimization is.
What are some of the cost drivers in cloud native platforms?
What are some of the resource management techniques?
Some container and serverless optimization, some storage and data management, and monitoring and cost observability.
What is FinOps, a cultural approach to cost optimization.
And then I'll deep dive into leveraging prompt engineering and FinOps to drive cloud efficiency.
So let's get started.
So the first thing is: what is a cloud native platform?
A cloud native platform is a platform that lets you build and deploy applications seamlessly on the public cloud.
It can be on any container orchestration platform like Kubernetes, ECS, or EKS; you have AKS, the Azure Kubernetes Service, or the equivalent on GCP.
Whatever the public cloud may be, those are called cloud native platforms.
So this is where you deploy your applications.
And these environments tend to be dynamic and complex in nature, because setting one up requires a VPC, security groups, a whole bunch of virtual networking, and you also need to deploy and maintain the virtual machines, which AWS calls EC2 instances.
Each cloud provider has its own terminology, but at the end of the day, they're just virtual machines.
Some statistics show that around 30 percent of cloud spending is due to resource mismanagement.
And this tends to be true, because you often have engineering teams who spin up instances and new databases for testing purposes.
They build new features, which are sometimes abandoned because the business doesn't require them, and the resources are never turned off or decommissioned properly.
This is the source of the problem: increasing cloud cost.
The objective of this presentation is to cover strategies for optimizing costs in cloud native setups while enabling financial responsibility without compromising on agility or scalability.
So let's get started.
So what are some of the cost drivers in cloud native platforms?
The first thing that comes to our mind is virtual machines, compute.
So compute comes in various different forms.
It can be EC2 virtual machines that are managed by us, or it can be Lambda, a function-as-a-service offering managed by AWS.
For the virtual machines that we manage, the cloud providers give us a variety of options: some are small, and some are very resource intensive, CPU-intensive or memory-intensive machines.
Each has its own pricing model, and they tend to get more expensive as they scale vertically.
Serverless execution is the tricky bit, because it lets you execute a function as a service, but if you call that function like 100,000 times, it tends to get expensive.
So there's no benefit to using function as a service when you're calling it that frequently.
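To make that concrete, here's a back-of-the-envelope sketch in Python. The rates and workload numbers are illustrative placeholders I picked, not a quote of current AWS prices; the point is only that per-invocation pricing can overtake a flat instance price at high, steady volume.

```python
# Back-of-the-envelope comparison; all prices below are placeholders.
PRICE_PER_MILLION_CALLS = 0.20      # placeholder, USD per 1M invocations
PRICE_PER_GB_SECOND = 0.0000166667  # placeholder, USD per GB-second
EC2_MONTHLY = 70.0                  # placeholder for a small always-on instance

calls = 100_000_000                 # a busy API: ~100M invocations a month
gb_seconds = calls * 1.0 * 1.0      # 1 GB of memory held for 1 second per call

faas_monthly = (calls / 1_000_000) * PRICE_PER_MILLION_CALLS \
    + gb_seconds * PRICE_PER_GB_SECOND
print(f"FaaS ~ ${faas_monthly:,.2f}/month vs EC2 ~ ${EC2_MONTHLY:,.2f}/month")
```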
The next cost driver is storage.
Storage comes in three forms: you have block storage, you have object storage, and you have RDS databases.
The object storage on AWS is S3, and S3 is pretty scalable, so we tend to fill up our S3 buckets.
We have bucket sprawl, where we store tons, like terabytes, of data.
Sometimes this data is used, sometimes it's not.
We tend to ship most of our logs into these buckets, and they just keep accumulating.
And if we forget to put a lifecycle policy on them, it tends to get expensive.
That is one of the cost drivers, and the same applies to AWS block storage, like EBS volumes.
Every virtual machine has a block store, like an EBS volume.
You delete the virtual machine, but you don't delete the volume.
The volume is still available, just detached from the EC2 machine, and you are still being charged for the storage.
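Here's a minimal sketch of how you could spot those orphaned volumes yourself, using boto3; the region is an assumption you'd adjust for your account.

```python
# List EBS volumes in the "available" state, i.e. not attached to any
# instance but still billed every month.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
pages = ec2.get_paginator("describe_volumes").paginate(
    Filters=[{"Name": "status", "Values": ["available"]}]
)
for page in pages:
    for vol in page["Volumes"]:
        print(vol["VolumeId"], f'{vol["Size"]} GiB', vol["CreateTime"])
```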
The next one is RDS databases.
These are databases like Aurora PostgreSQL and similar.
Sometimes these databases are over provisioned with much higher IOPS than needed; you don't need that scale of RDS database because you don't have that much volume.
We tend to over provision things because we expect the volume to grow someday, but that is a problem for another day.
The next cost driver is data transfer.
Data transfer charges are often overlooked, because they creep up in your AWS bill.
Think of an organization: it's not just one single account, you have 10 different accounts.
Some are prod, some are non-prod; you might have one account per team.
Each has its own VPC, and they want to connect to VPCs in other accounts.
You have NAT gateways, you have transit gateways, you do VPC peering, you do cross-AZ transfers.
There are multiple layers of networking abstraction in play, and they all contribute to data transfer charges.
These are the costs from network egress: if you have multiple AWS accounts in an organization, each with its own VPC, and applications in different VPCs continuously talking to each other over NAT gateways or VPC peerings, there's a charge for that.
If not controlled, these charges keep increasing; for example, if there's a spike in traffic from one VPC to another, you'll see it show up as data transfer charges.
The next one is third-party services.
All the cloud providers offer third-party services and APIs through their portals, and these have their own subscription models.
For example, you have Red Hat licenses for the EC2 machines that are provisioned through your AWS account.
Sometimes these licenses are managed through your cloud provider, but if not managed properly, they add an additional burden to your AWS cloud bill.
So what are the common challenges?
These challenges are pretty much common across all cloud architectures.
One is over provisioning.
Over provisioning means allocating compute that the application doesn't require.
You are giving excessive resources to the applications; for example, you have an application that uses only 10 percent of the CPU, and the remaining 90 percent of the CPU time sits unused.
That is over provisioning.
The second one is lack of visibility.
Resource usage is often not clear, because cloud providers make it easy for you to provision resources, and you don't see the impact of provisioning them immediately, until you get a bill at the end of the month.
There is no instant alert that lets you know that this action you have taken is increasing the cloud cost, so lack of visibility is one of the common challenges.
The third one is multi-cloud complexity.
Most organizations are on a single cloud provider, but there are organizations that do multi-cloud, where some resources are on AWS and some are on GCP.
We had an instance, about three years ago, where some engineers were working on a hackathon project and forgot to turn off the instances in the Google Cloud account, because most of our work goes into one single cloud provider.
It's hard to keep track of all the resources in a multi-cloud environment.
So, on to resource management techniques.
The first is right sizing: right sizing ensures that the resource allocation matches the workload.
There are multiple tools that help us do this.
And this works at various different levels.
For example, you have AWS tools like AWS Trusted Advisor, which lets you know: these are the databases that haven't had connections in the past few days, these are the idle instances that are not getting any traffic, these are the idle EBS volumes that are not attached to any EC2 instances.
AWS Trusted Advisor is a tool that's really helpful for knowing which resources are not used anymore, so that you can deprovision them appropriately.
You can also pull the underlying utilization data yourself, as in the sketch below.
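This is a minimal sketch using boto3 and CloudWatch; the 5% threshold and region are my own assumptions, not an official rule.

```python
# Flag running instances whose average CPU over the past week stayed very
# low; these are candidates for rightsizing or shutdown.
from datetime import datetime, timedelta, timezone
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
cw = boto3.client("cloudwatch", region_name="us-east-1")
now = datetime.now(timezone.utc)

reservations = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)["Reservations"]
for res in reservations:
    for inst in res["Instances"]:
        datapoints = cw.get_metric_statistics(
            Namespace="AWS/EC2",
            MetricName="CPUUtilization",
            Dimensions=[{"Name": "InstanceId", "Value": inst["InstanceId"]}],
            StartTime=now - timedelta(days=7),
            EndTime=now,
            Period=86400,          # one datapoint per day
            Statistics=["Average"],
        )["Datapoints"]
        if datapoints and max(dp["Average"] for dp in datapoints) < 5.0:
            print("Possibly idle:", inst["InstanceId"])
```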
Right sizing also applies to containers; for example, you have Kubernetes.
Kubernetes has requests and limits: requests make sure you get a certain amount of CPU and memory, and limits cap it.
Understanding how much your application needs is possible because you have observability tools in the mix.
Once you have those observability tools, you can say: this is the amount of CPU I actually need; I don't need five virtual CPUs, I can get by with one virtual CPU and something like 4.5 gigs of RAM.
Understanding what your application requires and appropriately setting those configurations is right sizing.
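As a minimal sketch with the official kubernetes Python client, here's how you could apply those numbers; the deployment, namespace, and container names are hypothetical, and the values would come from your observability data.

```python
# Patch a Deployment so requests match observed usage, with limits as the cap.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster
apps = client.AppsV1Api()

patch = {"spec": {"template": {"spec": {"containers": [{
    "name": "web",  # must match the container name in the pod spec
    "resources": {
        "requests": {"cpu": "1", "memory": "512Mi"},
        "limits": {"cpu": "2", "memory": "1Gi"},
    },
}]}}}}
apps.patch_namespaced_deployment(name="web", namespace="default", body=patch)
```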
The next technique is autoscaling.
Autoscaling is dynamically adjusting your resources based on real-time demand, and it can be applied at various levels of abstraction.
AWS provides what's called an auto scaling group, where you can provision more nodes automatically based on a certain threshold.
For example, let's say you have 10 EC2 instances in an auto scaling group, and you have a spike in traffic because you are running some Thanksgiving promotion; you're expecting traffic to go up, and you need more instances.
Setting up an auto scaling group and letting it add more nodes to deploy your applications, that horizontal scalability, is autoscaling.
You need to be aggressive with autoscaling and set up proper thresholds, because it's not just scale up: you should also scale down once the peak is over.
A minimal sketch of such a policy follows.
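This boto3 sketch attaches a target-tracking policy to a hypothetical auto scaling group; target tracking handles both directions, scaling out on a spike and back in after the peak. The group name, policy name, and 50% target are assumptions.

```python
# Keep the group's average CPU near 50% by scaling out and in automatically.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",      # placeholder group name
    PolicyName="cpu-target-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,
    },
)
```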
The next one is instance types.
One of the instance types that, I think, all the cloud providers offer is spot instances.
Spot instances are much cheaper than regular instances, but they don't guarantee availability: they come at a much lower cost, but they can be terminated at any point in time by your cloud provider.
Spot instances are useful for various tasks like CI/CD and batch processing, where you have a one-off job to process a bunch of records and you spin up these spot instances for it.
The next one is reserved instances.
Reserved instances are for when you know you need a minimum number of instances in your autoscaling group or your cluster; instead of dynamically provisioning them, you say: I need this number of CPUs at all times.
That's when you go to your cloud provider and say, I need this compute 365 days a year, 24 hours a day, and you use reserved instances to get a long-term discount.
Of course, that's best for applications that run 365 days a year, 24 hours a day.
And then there are on-demand instances.
On-demand instances are used for short-term, variable needs.
As you all know, this is the most common type of EC2 instance that we use in our cloud architectures.
One more thing I want to talk about here is density optimization.
In container and serverless models, you have a container that runs your application, and a bunch of small containers can run on a single node.
Thanks to Docker and Kubernetes, building and deploying containers has become pretty easy for everyone, and adopting Kubernetes has proven to be a substantial source of cost savings in the cloud.
One such technique is density optimization.
Density optimization means you pack more containers per node: you provision one big node and deploy multiple small containers on it, or configure Kubernetes to schedule these applications onto fewer nodes.
That is density optimization: more pods on a single node.
This helps you use fewer nodes while running more containers, so there's no idle CPU time or idle memory getting wasted.
The next one is node autoscaling.
Node autoscaling, as we talked about before, is similar to an auto scaling group: as you get more demand, you add more nodes.
In Kubernetes environments, we use Karpenter, one of the cluster autoscalers, and there is also the Cluster Autoscaler, which adds nodes to Kubernetes.
There are also optimizations you can do around execution times.
One thing we want to watch for: if there is a Kubernetes job running in a pod and that pod is stuck in a CrashLoopBackOff, we need to keep track of these restarts and make sure they don't keep happening, because oftentimes this is a waste of resources.
A Kubernetes job is supposed to perform its work and then complete; the pod should terminate.
A small sketch for catching these restarts follows.
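This is a minimal sketch with the kubernetes Python client; the restart threshold of five is my own assumption.

```python
# Surface pods that keep restarting (e.g. stuck in CrashLoopBackOff),
# since every restart burns CPU without finishing the work.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()
for pod in core.list_pod_for_all_namespaces().items:
    for status in pod.status.container_statuses or []:
        if status.restart_count > 5:  # threshold is an assumption
            print(pod.metadata.namespace, pod.metadata.name,
                  status.restart_count)
```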
The other one is reducing idle times: implement efficient cold start strategies to minimize idle costs.
Whenever a new pod spins up, you have the application inside it starting: the application needs to spin up the JVM or whatever other software it needs to run.
So we can apply some strategies at the application layer to reduce that idle time.
The next one is code optimization.
Minimize dependencies and optimize code for better memory and CPU usage; you can take a look at flame graphs, see if there are any potential memory leaks in your application, and fix those.
These are the smaller optimizations, but overall they compound into significant cost savings.
Next, storage and data management.
One of the things we can do for cloud cost optimization is implement policies to move data across storage tiers based on access patterns.
For example, you have a document that you want to store for compliance reasons, but it's not accessed frequently.
So what do you do?
You store it in a less-frequently-accessed tier, which costs less, and you implement a lifecycle policy where an object, whether it's a document or a log file, goes through different tiers and eventually gets archived.
This helps you reduce the storage cost for S3 or other object storage.
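Here's a minimal boto3 sketch of such a lifecycle policy; the bucket name, prefix, and tiering schedule are placeholders you'd tune to your own retention requirements.

```python
# Move objects to a cheaper tier after 30 days, archive after 90,
# and expire after roughly seven years.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-log-bucket",       # placeholder bucket
    LifecycleConfiguration={"Rules": [{
        "ID": "tier-then-archive-then-expire",
        "Status": "Enabled",
        "Filter": {"Prefix": "logs/"},
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": 2555},  # ~7 years
    }]},
)
```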
The other thing we can do is data compression and deduplication.
Techniques like compression and deduplication reduce storage needs: if you have a 100 GB file, you can compress it aggressively so that you end up with a 10 GB or 20 GB file.
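As a tiny sketch, compressing a log file before shipping it to object storage is often a one-liner; actual ratios depend on the content, but plain-text logs tend to shrink several-fold.

```python
# Gzip a log file before uploading it to object storage.
import gzip
import shutil

with open("app.log", "rb") as src, gzip.open("app.log.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)
```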
Then there are regular audits.
Routine audits also help identify and remove unnecessary data.
Sometimes you only need to store seven years' worth of data, and anything older than seven years you don't need.
Performing regular audits on the data helps you reduce your cloud costs.
Next, monitoring and cost observability.
Continuous monitoring is essential for identifying unexpected usage spikes and tracking resource utilization trends.
This means we should always suspect that if there is a spike, there's actually a bug somewhere causing it, and we need to set up appropriate monitoring and alerting to let us know: hey, there's a spike, which means something changed, or there's a bug in production or other environments that's causing it.
Then there are AWS cost visibility tools.
AWS Cost Explorer is one such tool that helps us deep dive into what resources we are using, what the cost patterns are, and how much we are going to spend in a particular month based on current usage.
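Cost Explorer also has an API; here's a minimal boto3 sketch that pulls a month of unblended cost broken down by service. The dates are placeholders.

```python
# Ask Cost Explorer for one month's unblended cost, grouped by service.
import boto3

ce = boto3.client("ce", region_name="us-east-1")
resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-02-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)
for group in resp["ResultsByTime"][0]["Groups"]:
    print(group["Keys"][0], group["Metrics"]["UnblendedCost"]["Amount"])
```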
This gives us visibility into how much a change in configuration can increase cloud costs.
We can also set up budgets and alerts, with spending thresholds on a particular AWS account.
Let's say we have a thousand-dollar maximum budget on an AWS account running an example application.
We can set up alerts for when there's a spike, or when the forecast exceeds the current budget, and then go fix whatever provisioned resources are causing those alerts to fire; I'll show a small sketch of such a budget in a moment.
Then there are cost allocation tags.
Cost allocation tags are one of the crucial parts of cloud cost management, because they let you know which team, which project, or which department a resource belongs to and how much it costs.
It's all about unit economics.
For example, if you're doing a hundred-dollar transaction, how much you're actually spending on the cloud is what matters.
If you're making a hundred dollars on a transaction and spending eighty dollars on cloud costs, that's not a viable business model.
Now, FinOps.
FinOps is a cultural approach to cost optimization.
FinOps provides a framework for cost management that integrates financial accountability into cloud spending.
It's a culture shift where finance teams are more involved with the engineering teams; it's a shared responsibility between all the teams to make sure we are operating the cloud effectively, and at a lower cost.
One of the principles of FinOps is that engineering teams should be able to know the repercussions of their actions when provisioning resources.
And it should be near instant, or at least the predicted cost should be visible to the team, so that they can take active steps to not cross their budgets.
This helps developers and engineering teams recognize that when making a design decision, cost is also one of the factors.
These are the principles of FinOps, and implementing FinOps will help organizations use the cloud much more effectively.
Now, leveraging AI and FinOps to drive cloud efficiency.
AI-powered tools can continuously monitor cloud usage and expenses in real time.
For example, you have all this data about billing and cloud usage: how much cloud you are using, how much idle time, how much busy time.
We can train AI models on the data we collect, and based on that, design tools that can say: at this point in time, you are going to have a spike in load.
So you can predict that you might need resources at a certain point in time, and this helps teams prepare.
The next one is anomaly detection.
AI algorithms can analyze historical spending data to detect cost anomalies and irregularities, allowing businesses to address these issues promptly.
There are always anomalies in the data.
As I said before, it might be some bug causing a spike at one point in time.
Another example: the NAT gateway cost, the data transfer cost, suddenly spikes; there's a huge bunch of requests going out, and ingress data coming in.
That's an anomaly, because some misconfiguration on the network side caused the spike.
AI algorithms can detect these cost anomalies and attribute them: hey, this change cost the organization this much.
Then there's eliminating inefficient resource use: AI can help eliminate idle and unused cloud resources by automatically shutting them down or resizing them, reducing wasteful spending.
A toy version of such a detector is sketched below.
I'll talk more about this in the AI agentic approach; let's move forward.
Demand forecasting and autoscaling.
As I already said, AI can predict future demand for cloud resources, enabling better planning and deployment, and dynamically adjusting resources to avoid over provisioning.
Now, using prompt engineering for achieving cloud efficiency.
This is what we have been waiting for.
As you guys already know, prompt engineering is how you talk to AI models to get the job done.
And one of the key things about prompt engineering is role prompting.
So what is role prompting?
Role prompting is a technique in prompt engineering to control the output generated by the model by assigning it a specific role.
This can be any model.
I just used a ChatGPT example here, but it can be Claude, it can be Llama, it can be Gemini, or any GPT-based AI model.
We can make use of roles like FinOps expert and craft prompts by providing more context.
Let's see a couple of examples.
Here are some prompts using the FinOps expert role.
What I tried was: "As a FinOps expert, can you please help deep dive into our AWS cloud bill? Please explain the difference between unblended costs and amortized costs."
For someone who is new to cloud billing specifically, there are multiple different types of costs: you have unblended costs, you have amortized costs, and several others.
To understand them, AI models can definitely help, and you can use roles like FinOps expert to guide how to implement FinOps best practices in your organization.
This is one of the experiments I did.
You can also use: "As a FinOps expert, can you analyze our AWS invoice and let us know why our AWS bill was higher than usual? Can you please provide a detailed report on how we could reduce our AWS spend?"
This is a very specific prompt, and I'm also giving some context to the AI model so that it knows what we're working with.
I also gave it the invoice and let the AI explain why the spend increased in the previous month, so that we are talking on the same page.
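If you want to do the same thing through an API instead of the chat UI, here's a minimal sketch with the OpenAI Python SDK; the model name is a placeholder, and the key is read from the OPENAI_API_KEY environment variable. The system message is what pins the FinOps expert role for every turn.

```python
# Role prompting via the API: the system message assigns the role.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any chat-capable model works
    messages=[
        {"role": "system",
         "content": "You are a FinOps expert helping a team reduce AWS costs."},
        {"role": "user",
         "content": "Explain the difference between unblended and amortized "
                    "costs on our AWS bill."},
    ],
)
print(resp.choices[0].message.content)
```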
One of the things that is popular now, that most people are trying to get to, is the AI agentic approach.
The AI agentic approach means we build AI agents to get the job done.
For example, you can build an AI agent that monitors the usage of a resource.
Let's say it notices: hey, this resource has been idle for a while, it's not even taking connections.
Let me shut down that server; I will let the infrastructure team know this is happening, and I'll shut down the database temporarily.
That is what an agentic AI approach looks like, and we can build simple AI agents for it.
There are two amazing resources: one is LangChain, one is CrewAI.
They use prompts, the prompt engineering, under the hood, so they can act as part of your automation, and they can talk to any of the AI models currently available, like ChatGPT, Claude, or other open source models.
Basically, you build an AI agent that will help you implement these cost optimization actions on your behalf.
A minimal sketch with CrewAI follows.
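This CrewAI sketch is illustrative: the agent only reviews and recommends, and the actual shutdown action is deliberately left out. In a real setup you'd wire remediation in as tools with proper guardrails.

```python
# A FinOps agent that reviews utilization and recommends decommissioning.
from crewai import Agent, Task, Crew

finops_agent = Agent(
    role="FinOps engineer",
    goal="Find idle AWS resources and recommend shutting them down",
    backstory="You monitor utilization and keep the cloud bill under control.",
)
audit = Task(
    description="Review the utilization report and list idle databases "
                "and instances, with a recommendation for each.",
    expected_output="A list of idle resources and what to do with them.",
    agent=finops_agent,
)
crew = Crew(agents=[finops_agent], tasks=[audit])
print(crew.kickoff())
```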
Let's get to the conclusion.
Cost optimization in cloud platforms requires a comprehensive approach that balances technical strategies with organizational shifts.
It's not always easy.
Cloud native platforms keep growing, and there's a need for cloud platforms that operate efficiently and at a lower cost.
By understanding the primary cost drivers, such as compute resources, storage, data transfer, and third-party services, organizations can implement tailored techniques to reduce expenses without compromising performance.
More resources does not equal more performance.
It's true that you can run a big instance and throw an application on it, but that's not how it works.
Resource management practices like right sizing, autoscaling, and diversified instance types form the backbone of effective cost optimization.
At the end of the day, these are the core principles: use how much you need, autoscale whenever you need to, and use the right resources.
That's it.
Additionally, when it comes to containers and serverless, make sure you use limits and requests appropriately; and when it comes to databases and block storage, use data lifecycle management and lifecycle policies to make sure the data is stored securely and in a cost-efficient way.
I also want to put a strong emphasis on monitoring: a cost visibility framework is really important, because without adequate monitoring you don't even know what resources you have or where the spikes are.
Having a strong monitoring foundation, budget alerts, and proper utilization alerts will help you track resources even before the AWS cloud bill lands on you.
So FinOps principles bring a cultural dimension to this strategy,
encouraging cross functional collaboration and fostering cost
awareness within development teams.
I would reiterate that FinOps will help business and finance teams understand cloud expenditure with the help of the engineering teams, and the engineering teams in turn are responsible for the cloud costs and the actions they take when provisioning new cloud resources.
With this financial accountability at every stage of cloud management, organizations can align spending with business goals, creating a culture of continuous improvement.
Thank you, that was my talk for today; I hope you liked it.
There's a great future for cloud cost optimization using the agentic AI approach, which I'm truly excited about.
Thank you everyone, and have a good rest of your day.