Transcript
This transcript was autogenerated. To make changes, submit a PR.
My name is Hari Ramshetty and I'm a software engineer
on the infrastructure team.
Today I'll be talking about cloud cost optimization and how you can use FinOps practices to reduce your AWS cloud costs.
I'll also touch briefly on how you can use AI-powered tools like ChatGPT, Claude, Gemini, and Llama to help you reduce cloud costs.
I'll also touch briefly on the AI agentic approach using LangChain and CrewAI, where you can build your own agents that help you with the cloud cost optimization process.
I'm excited for this talk.
So let's just get started.
First, some introductions.
I'll be going through what cloud native cost optimization is.
What are some of the cost drivers in cloud native platforms?
What are some of the resource management techniques?
Some container and serverless optimization, some storage and data management, and monitoring and cost observability.
What is FinOps, a cultural approach to cost optimization.
And then I'll deep dive into leveraging prompt engineering and FinOps to drive cloud efficiency.
So let's get started.
So the first thing is: what is a cloud native platform?
A cloud native platform is a platform that lets you build and deploy applications seamlessly on the public cloud.
It can be on any container orchestration platform like Kubernetes, ECS, or EKS; you have AKS, the Azure Kubernetes Service, or the equivalent on GCP.
Whatever the public cloud may be, those are called cloud native platforms.
So this is where you deploy your applications.
And these environments tend to be dynamic and complex in nature, because setting one up requires a VPC, security groups, a whole bunch of virtual networking, and you also need to deploy and maintain the virtual machines, which AWS calls EC2 instances.
Each cloud provider has its own terminology, but at the end of the day, they're just virtual machines.
Some statistics show that around 30 percent of cloud spending is due to resource mismanagement.
And this tends to be true, because you often have engineering teams who spin up instances and new databases for testing purposes.
They build new features, which are sometimes abandoned because the business doesn't require them, and the resources are never turned off or decommissioned properly.
This is the source of the problem: increasing cloud cost.
The objective of this presentation is to cover strategies for optimizing costs in cloud native setups while enabling financial responsibility without compromising on agility or scalability.
So let's get started.
So what are some of the cost drivers in cloud native platforms?
The first thing that comes to our mind is virtual machines, compute.
So compute comes in various different forms.
It can be EC2 virtual machines that are managed by us, or it can be Lambda, a function-as-a-service offering managed by AWS.
For the virtual machines that we manage, the cloud providers give us a variety of options: some are small, and some are very resource intensive, CPU-intensive or memory-intensive machines.
Each has its own pricing model, and they tend to get more expensive as they scale vertically.
Serverless execution is the tricky bit, because it lets you execute a function as a service, but if you call that function like 100,000 times, it tends to get expensive.
So there's no benefit to using function as a service when you're calling it that frequently.
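To make that concrete, here's a back-of-the-envelope sketch in Python. The rates and workload numbers are illustrative placeholders I picked, not a quote of current AWS prices; the point is only that per-invocation pricing can overtake a flat instance price at high, steady volume.

```python
# Back-of-the-envelope comparison; all prices below are placeholders.
PRICE_PER_MILLION_CALLS = 0.20      # placeholder, USD per 1M invocations
PRICE_PER_GB_SECOND = 0.0000166667  # placeholder, USD per GB-second
EC2_MONTHLY = 70.0                  # placeholder for a small always-on instance

calls = 100_000_000                 # a busy API: ~100M invocations a month
gb_seconds = calls * 1.0 * 1.0      # 1 GB of memory held for 1 second per call

faas_monthly = (calls / 1_000_000) * PRICE_PER_MILLION_CALLS \
    + gb_seconds * PRICE_PER_GB_SECOND
print(f"FaaS ~ ${faas_monthly:,.2f}/month vs EC2 ~ ${EC2_MONTHLY:,.2f}/month")
```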
The next cost driver is storage.
Storage comes in three forms: you have block storage, you have object storage, and you have RDS databases.
The object storage on AWS is S3, and S3 is pretty scalable, so we tend to fill up our S3 buckets.
We have bucket sprawl, where we store tons, like terabytes, of data.
Sometimes this data is used, sometimes it's not.
We tend to ship most of our logs into these buckets, and they just keep accumulating.
And if we forget to put a lifecycle policy on them, it tends to get expensive.
That is one of the cost drivers, and the same applies to AWS block storage, like EBS volumes.
Every virtual machine has a block store, like an EBS volume.
You delete the virtual machine, but you don't delete the volume.
The volume is still available, just detached from the EC2 machine, and you are still being charged for the storage.
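Here's a minimal sketch of how you could spot those orphaned volumes yourself, using boto3; the region is an assumption you'd adjust for your account.

```python
# List EBS volumes in the "available" state, i.e. not attached to any
# instance but still billed every month.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
pages = ec2.get_paginator("describe_volumes").paginate(
    Filters=[{"Name": "status", "Values": ["available"]}]
)
for page in pages:
    for vol in page["Volumes"]:
        print(vol["VolumeId"], f'{vol["Size"]} GiB', vol["CreateTime"])
```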
The next one is RDS databases.
These are databases like Aurora PostgreSQL and similar.
Sometimes these databases are over provisioned with much higher IOPS than needed; you don't need that scale of RDS database because you don't have that much volume.
We tend to over provision things because we expect the volume to grow someday, but that is a problem for another day.
The next cost driver is data transfer.
Data transfer charges are often overlooked, because they creep up in your AWS bill.
Think of an organization: it's not just one single account, you have 10 different accounts.
Some are prod, some are non-prod; you might have one account per team.
Each has its own VPC, and they want to connect to VPCs in other accounts.
You have NAT gateways, you have transit gateways, you do VPC peering, you do cross-AZ transfers.
There are multiple layers of networking abstraction in play, and they all contribute to data transfer charges.
These are the costs from network egress: if you have multiple AWS accounts in an organization, each with its own VPC, and applications in different VPCs continuously talking to each other over NAT gateways or VPC peerings, there's a charge for that.
If not controlled, these charges keep increasing; for example, if there's a spike in traffic from one VPC to another, you'll see it show up as data transfer charges.
The next one is third-party services.
All the cloud providers offer third-party services and APIs through their portals, and these have their own subscription models.
For example, you have Red Hat licenses for the EC2 machines that are provisioned through your AWS account.
Sometimes these licenses are managed through your cloud provider, but if not managed properly, they add an additional burden to your AWS cloud bill.
So what are the common challenges?
These challenges are pretty much common across all cloud architectures.
One is over provisioning.
Over provisioning means allocating compute that the application doesn't require.
You are giving excessive resources to the applications; for example, you have an application that uses only 10 percent of the CPU, and the remaining 90 percent of the CPU time sits unused.
That is over provisioning.
The second one is lack of visibility.
Resource usage is often not clear, because cloud providers make it easy for you to provision resources, and you don't see the impact of provisioning them immediately, until you get a bill at the end of the month.
There is no instant alert that lets you know that this action you have taken is increasing the cloud cost, so lack of visibility is one of the common challenges.
The third one is multi-cloud complexity.
Most organizations are on a single cloud provider, but there are organizations that do multi-cloud, where some resources are on AWS and some are on GCP.
We had an instance, about three years ago, where some engineers were working on a hackathon project and forgot to turn off the instances in the Google Cloud account, because most of our work goes into one single cloud provider.
It's hard to keep track of all the resources in a multi-cloud environment.
So, on to resource management techniques.
The first is right sizing: right sizing ensures that the resource allocation matches the workload.
There are multiple tools that help us do this.
And this works at various different levels.
For example, you have AWS tools like AWS Trusted Advisor, which lets you know: these are the databases that haven't had connections in the past few days, these are the idle instances that are not getting any traffic, these are the idle EBS volumes that are not attached to any EC2 instances.
AWS Trusted Advisor is a tool that's really helpful for knowing which resources are not used anymore, so that you can deprovision them appropriately.
You can also pull the underlying utilization data yourself, as in the sketch below.
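This is a minimal sketch using boto3 and CloudWatch; the 5% threshold and region are my own assumptions, not an official rule.

```python
# Flag running instances whose average CPU over the past week stayed very
# low; these are candidates for rightsizing or shutdown.
from datetime import datetime, timedelta, timezone
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
cw = boto3.client("cloudwatch", region_name="us-east-1")
now = datetime.now(timezone.utc)

reservations = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)["Reservations"]
for res in reservations:
    for inst in res["Instances"]:
        datapoints = cw.get_metric_statistics(
            Namespace="AWS/EC2",
            MetricName="CPUUtilization",
            Dimensions=[{"Name": "InstanceId", "Value": inst["InstanceId"]}],
            StartTime=now - timedelta(days=7),
            EndTime=now,
            Period=86400,          # one datapoint per day
            Statistics=["Average"],
        )["Datapoints"]
        if datapoints and max(dp["Average"] for dp in datapoints) < 5.0:
            print("Possibly idle:", inst["InstanceId"])
```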
Right sizing also applies to containers; for example, you have Kubernetes.
Kubernetes has requests and limits: requests make sure you get a certain amount of CPU and memory, and limits cap it.
Understanding how much your application needs is possible because you have observability tools in the mix.
Once you have those observability tools, you can say: this is the amount of CPU I actually need; I don't need five virtual CPUs, I can get by with one virtual CPU and something like 4.5 gigs of RAM.
Understanding what your application requires and appropriately setting those configurations is right sizing.
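As a minimal sketch with the official kubernetes Python client, here's how you could apply those numbers; the deployment, namespace, and container names are hypothetical, and the values would come from your observability data.

```python
# Patch a Deployment so requests match observed usage, with limits as the cap.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster
apps = client.AppsV1Api()

patch = {"spec": {"template": {"spec": {"containers": [{
    "name": "web",  # must match the container name in the pod spec
    "resources": {
        "requests": {"cpu": "1", "memory": "512Mi"},
        "limits": {"cpu": "2", "memory": "1Gi"},
    },
}]}}}}
apps.patch_namespaced_deployment(name="web", namespace="default", body=patch)
```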
The next technique is autoscaling.
Autoscaling is dynamically adjusting your resources based on real-time demand, and it can be applied at various levels of abstraction.
AWS provides what's called an auto scaling group, where you can provision more nodes automatically based on a certain threshold.
For example, let's say you have 10 EC2 instances in an auto scaling group, and you have a spike in traffic because you are running some Thanksgiving promotion; you're expecting traffic to go up, and you need more instances.
Setting up an auto scaling group and letting it add more nodes to deploy your applications, that horizontal scalability, is autoscaling.
You need to be aggressive with autoscaling and set up proper thresholds, because it's not just scale up: you should also scale down once the peak is over.
A minimal sketch of such a policy follows.
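This boto3 sketch attaches a target-tracking policy to a hypothetical auto scaling group; target tracking handles both directions, scaling out on a spike and back in after the peak. The group name, policy name, and 50% target are assumptions.

```python
# Keep the group's average CPU near 50% by scaling out and in automatically.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",      # placeholder group name
    PolicyName="cpu-target-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,
    },
)
```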
The next one is instance types.
One of the instance types that, I think, all the cloud providers offer is spot instances.
Spot instances are much cheaper than regular instances, but they don't guarantee availability: they come at a much lower cost, but they can be terminated at any point in time by your cloud provider.
Spot instances are useful for various tasks like CI/CD and batch processing, where you have a one-off job to process a bunch of records and you spin up these spot instances for it.
The next one is reserved instances.
Reserved instances are for when you know you need a minimum number of instances in your autoscaling group or your cluster; instead of dynamically provisioning them, you say: I need this number of CPUs at all times.
That's when you go to your cloud provider and say, I need this compute 365 days a year, 24 hours a day, and you use reserved instances to get a long-term discount.
Of course, that's best for applications that run 365 days a year, 24 hours a day.
And then there are on-demand instances.
On-demand instances are used for short-term, variable needs.
As you all know, this is the most common type of EC2 instance that we use in our cloud architectures.
One more thing I want to talk about here is density optimization.
In container and serverless models, you have a container that runs your application, and a bunch of small containers can run on a single node.
Thanks to Docker and Kubernetes, building and deploying containers has become pretty easy for everyone, and adopting Kubernetes has proven to be a substantial source of cost savings in the cloud.
One such technique is density optimization.
Density optimization means you pack more containers per node: you provision one big node and deploy multiple small containers on it, or configure Kubernetes to schedule these applications onto fewer nodes.
That is density optimization: more pods on a single node.
This helps you use fewer nodes while running more containers, so there's no idle CPU time or idle memory getting wasted.
The next one is node autoscaling.
Node autoscaling, as we talked about before, is similar to an auto scaling group: as you get more demand, you add more nodes.
In Kubernetes environments, we use Karpenter, one of the cluster autoscalers, and there is also the Cluster Autoscaler, which adds nodes to Kubernetes.
There are also optimizations you can do around execution times.
One thing we want to watch for: if there is a Kubernetes job running in a pod and that pod is stuck in a CrashLoopBackOff, we need to keep track of these restarts and make sure they don't keep happening, because oftentimes this is a waste of resources.
A Kubernetes job is supposed to perform its work and then complete; the pod should terminate.
A small sketch for catching these restarts follows.
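This is a minimal sketch with the kubernetes Python client; the restart threshold of five is my own assumption.

```python
# Surface pods that keep restarting (e.g. stuck in CrashLoopBackOff),
# since every restart burns CPU without finishing the work.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()
for pod in core.list_pod_for_all_namespaces().items:
    for status in pod.status.container_statuses or []:
        if status.restart_count > 5:  # threshold is an assumption
            print(pod.metadata.namespace, pod.metadata.name,
                  status.restart_count)
```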
The other one is reducing idle times: implement efficient cold start strategies to minimize idle costs.
Whenever a new pod spins up, you have the application inside it starting: the application needs to spin up the JVM or whatever other software it needs to run.
So we can apply some strategies at the application layer to reduce that idle time.
The next one is code optimization.
Minimize dependencies and optimize code for better memory and CPU usage; you can take a look at flame graphs, see if there are any potential memory leaks in your application, and fix those.
These are the smaller optimizations, but overall they compound into significant cost savings.
Next, storage and data management.
One of the things we can do for cloud cost optimization is implement policies to move data across storage tiers based on access patterns.
For example, you have a document that you want to store for compliance reasons, but it's not accessed frequently.
So what do you do?
You store it in a less-frequently-accessed tier, which costs less, and you implement a lifecycle policy where an object, whether it's a document or a log file, goes through different tiers and eventually gets archived.
This helps you reduce the storage cost for S3 or other object storage.
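Here's a minimal boto3 sketch of such a lifecycle policy; the bucket name, prefix, and tiering schedule are placeholders you'd tune to your own retention requirements.

```python
# Move objects to a cheaper tier after 30 days, archive after 90,
# and expire after roughly seven years.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-log-bucket",       # placeholder bucket
    LifecycleConfiguration={"Rules": [{
        "ID": "tier-then-archive-then-expire",
        "Status": "Enabled",
        "Filter": {"Prefix": "logs/"},
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": 2555},  # ~7 years
    }]},
)
```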
The other thing we can do is data compression and deduplication.
Techniques like compression and deduplication reduce storage needs: if you have a 100 GB file, you can compress it aggressively so that you end up with a 10 GB or 20 GB file.
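As a tiny sketch, compressing a log file before shipping it to object storage is often a one-liner; actual ratios depend on the content, but plain-text logs tend to shrink several-fold.

```python
# Gzip a log file before uploading it to object storage.
import gzip
import shutil

with open("app.log", "rb") as src, gzip.open("app.log.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)
```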
Then there are regular audits.
Routine audits also help identify and remove unnecessary data.
Sometimes you only need to store seven years' worth of data, and anything older than seven years you don't need.
Performing regular audits on the data helps you reduce your cloud costs.
Next, monitoring and cost observability.
Continuous monitoring is essential for identifying unexpected usage spikes and tracking resource utilization trends.
This means we should always suspect that if there is a spike, there's actually a bug somewhere causing it, and we need to set up appropriate monitoring and alerting to let us know: hey, there's a spike, which means something changed, or there's a bug in production or other environments that's causing it.
Then there are AWS cost visibility tools.
AWS Cost Explorer is one such tool that helps us deep dive into what resources we are using, what the cost patterns are, and how much we are going to spend in a particular month based on current usage.
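Cost Explorer also has an API; here's a minimal boto3 sketch that pulls a month of unblended cost broken down by service. The dates are placeholders.

```python
# Ask Cost Explorer for one month's unblended cost, grouped by service.
import boto3

ce = boto3.client("ce", region_name="us-east-1")
resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-02-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)
for group in resp["ResultsByTime"][0]["Groups"]:
    print(group["Keys"][0], group["Metrics"]["UnblendedCost"]["Amount"])
```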
This gives us visibility into how much a change in configuration can increase cloud costs.
We can also set up budgets and alerts, with spending thresholds on a particular AWS account.
Let's say we have a thousand-dollar maximum budget on an AWS account running an example application.
We can set up alerts for when there's a spike, or when the forecast exceeds the current budget, and then go fix whatever provisioned resources are causing those alerts to fire; I'll show a small sketch of such a budget in a moment.
Then there are cost allocation tags.
Cost allocation tags are one of the crucial parts of cloud cost management, because they let you know which team, which project, or which department a resource belongs to and how much it costs.
It's all about unit economics.
For example, if you're doing a hundred-dollar transaction, how much you're actually spending on the cloud is what matters.
If you're making a hundred dollars on a transaction and spending eighty dollars on cloud costs, that's not a viable business model.
Now, FinOps.
FinOps is a cultural approach to cost optimization.
FinOps provides a framework for cost management that integrates financial accountability into cloud spending.
It's a culture shift where finance teams are more involved with the engineering teams; it's a shared responsibility between all the teams to make sure we are operating the cloud effectively, and at a lower cost.
One of the principles of FinOps is that engineering teams should be able to know the repercussions of their actions when provisioning resources.
And it should be near instant, or at least the predicted cost should be visible to the team, so that they can take active steps to not cross their budgets.
This helps developers and engineering teams recognize that when making a design decision, cost is also one of the factors.
These are the principles of FinOps, and implementing FinOps will help organizations use the cloud much more effectively.
Now, leveraging AI and FinOps to drive cloud efficiency.
AI-powered tools can continuously monitor cloud usage and expenses in real time.
For example, you have all this data about billing and cloud usage: how much cloud you are using, how much idle time, how much busy time.
We can train AI models on the data we collect, and based on that, design tools that can say: at this point in time, you are going to have a spike in load.
So you can predict that you might need resources at a certain point in time, and this helps teams prepare.
The next one is anomaly detection.
AI algorithms can analyze historical spending data to detect cost anomalies and irregularities, allowing businesses to address these issues promptly.
There are always anomalies in the data.
As I said before, it might be some bug causing a spike at one point in time.
Another example: the NAT gateway cost, the data transfer cost, suddenly spikes; there's a huge bunch of requests going out, and ingress data coming in.
That's an anomaly, because some misconfiguration on the network side caused the spike.
AI algorithms can detect these cost anomalies and attribute them: hey, this change cost the organization this much.
Then there's eliminating inefficient resource use: AI can help eliminate idle and unused cloud resources by automatically shutting them down or resizing them, reducing wasteful spending.
A toy version of such a detector is sketched below.
I'll talk more about this in the AI agentic approach; let's move forward.
Demand forecasting and autoscaling.
As I already said, AI can predict future demand for cloud resources, enabling better planning and deployment, and dynamically adjusting resources to avoid over provisioning.
Now, using prompt engineering for achieving cloud efficiency.
This is what we have been waiting for.
As you guys already know, prompt engineering is how you talk to AI models to get the job done.
And one of the key things about prompt engineering is role prompting.
So what is role prompting?
Role prompting is a technique in prompt engineering to control the output generated by the model by assigning it a specific role.
This can be any model.
I just used a ChatGPT example here, but it can be Claude, it can be Llama, it can be Gemini, or any GPT-based AI model.
We can make use of roles like FinOps expert and craft prompts by providing more context.
Let's see a couple of examples.
Here are some prompts using the FinOps expert role.
What I tried was: "As a FinOps expert, can you please help deep dive into our AWS cloud bill? Please explain the difference between unblended costs and amortized costs."
For someone who is new to cloud billing specifically, there are multiple different types of costs: you have unblended costs, you have amortized costs, and several others.
To understand them, AI models can definitely help, and you can use roles like FinOps expert to guide how to implement FinOps best practices in your organization.
This is one of the experiments I did.
You can also use: "As a FinOps expert, can you analyze our AWS invoice and let us know why our AWS bill was higher than usual? Can you please provide a detailed report on how we could reduce our AWS spend?"
This is a very specific prompt, and I'm also giving some context to the AI model so that it knows what we're working with.
I also gave it the invoice and let the AI explain why the spend increased in the previous month, so that we are talking on the same page.
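If you want to do the same thing through an API instead of the chat UI, here's a minimal sketch with the OpenAI Python SDK; the model name is a placeholder, and the key is read from the OPENAI_API_KEY environment variable. The system message is what pins the FinOps expert role for every turn.

```python
# Role prompting via the API: the system message assigns the role.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any chat-capable model works
    messages=[
        {"role": "system",
         "content": "You are a FinOps expert helping a team reduce AWS costs."},
        {"role": "user",
         "content": "Explain the difference between unblended and amortized "
                    "costs on our AWS bill."},
    ],
)
print(resp.choices[0].message.content)
```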
One of the things that is popular now, that most people are trying to get to, is the AI agentic approach.
The AI agentic approach means we build AI agents to get the job done.
For example, you can build an AI agent that monitors the usage of a resource.
Let's say it notices: hey, this resource has been idle for a while, it's not even taking connections.
Let me shut down that server; I will let the infrastructure team know this is happening, and I'll shut down the database temporarily.
That is what an agentic AI approach looks like, and we can build simple AI agents for it.
There are two amazing resources: one is LangChain, one is CrewAI.
They use prompts, the prompt engineering, under the hood, so they can act as part of your automation, and they can talk to any of the AI models currently available, like ChatGPT, Claude, or other open source models.
Basically, you build an AI agent that will help you implement these cost optimization actions on your behalf.
A minimal sketch with CrewAI follows.
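This CrewAI sketch is illustrative: the agent only reviews and recommends, and the actual shutdown action is deliberately left out. In a real setup you'd wire remediation in as tools with proper guardrails.

```python
# A FinOps agent that reviews utilization and recommends decommissioning.
from crewai import Agent, Task, Crew

finops_agent = Agent(
    role="FinOps engineer",
    goal="Find idle AWS resources and recommend shutting them down",
    backstory="You monitor utilization and keep the cloud bill under control.",
)
audit = Task(
    description="Review the utilization report and list idle databases "
                "and instances, with a recommendation for each.",
    expected_output="A list of idle resources and what to do with them.",
    agent=finops_agent,
)
crew = Crew(agents=[finops_agent], tasks=[audit])
print(crew.kickoff())
```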
Let's get to the conclusion.
Cost optimization in cloud platforms requires a comprehensive approach that balances technical strategies with organizational shifts.
It's not always easy.
Cloud native platforms keep growing, and there's a need for cloud platforms that operate efficiently and at a lower cost.
By understanding the primary cost drivers, such as compute resources, storage, data transfer, and third-party services, organizations can implement tailored techniques to reduce expenses without compromising performance.
More resources does not equal more performance.
It's true that you can run a big instance and throw an application on it, but that's not how it works.
Resource management practices like right sizing, autoscaling, and diversified instance types form the backbone of effective cost optimization.
At the end of the day, these are the core principles: use how much you need, autoscale whenever you need to, and use the right resources.
That's it.
Additionally, when it comes to containers and serverless, make sure you use limits and requests appropriately; and when it comes to databases and block storage, use data lifecycle management and lifecycle policies to make sure the data is stored securely and in a cost-efficient way.
I also want to put a strong emphasis on monitoring: a cost visibility framework is really important, because without adequate monitoring you don't even know what resources you have or where the spikes are.
Having a strong monitoring foundation, budget alerts, and proper utilization alerts will help you track resources even before the AWS cloud bill lands on you.
So FinOps principles bring a cultural dimension to this strategy,
encouraging cross functional collaboration and fostering cost
awareness within development teams.
I would reiterate that FinOps will help business and finance teams understand cloud expenditure with the help of the engineering teams, and the engineering teams in turn are responsible for the cloud costs and the actions they take when provisioning new cloud resources.
With this financial accountability at every stage of cloud management, organizations can align spending with business goals, creating a culture of continuous improvement.
Thank you, that was my talk for today; I hope you liked it.
There's a great future for cloud cost optimization using the agentic AI approach, which I'm truly excited about.
Thank you everyone, and have a good rest of your day.