Transcript
This transcript was autogenerated. To make changes, submit a PR.
Welcome everyone.
I'm excited to be here.
Thank you for joining me today.
So imagine that you are hosting a party and you're not sure how many
people or how many guests will show up.
You could reserve a fixed number of tables, but what if
more people come than expected?
Or what if only half of the people show up?
So you would either have a headache scrambling for more space for more people,
or be stuck with empty tables, wasting the resources you reserved. And that
is a lot like how traditional Kubernetes capacity scaling works today.
You allocate capacity upfront, but the reality is things always change.
Hi, my name is Chakri, a solutions architect from AWS.
So I'm here to walk you through some capabilities of Karpenter, with a demo as well.
So let's dive in.
So now, what if the tables that we mentioned at your party could
automatically expand or change based on how many guests arrive?
No stress, no scrambling, and always the right size.
That is exactly what Karpenter does for our cluster.
So let's see some of the challenges that we have today in the current setup:
Amazon EKS with Cluster Autoscaler.
So here you have a deployment, and you also have an existing node
that already runs two pods, right?
So we have two vCPUs here for the m5.large, and then the scale-up mechanism
kicks in, maybe from KEDA or HPA depending on the cluster, and we have
1, 2, 3, 4, 5, 6, 7, seven pods in a pending state. So some of the pods will be scheduled
on the existing node, and some of the pods will need additional capacity.
So that's why we need Cluster Autoscaler.
And Cluster Autoscaler works with Auto Scaling groups.
So here we can have new instances, which are m5.large, two instances.
Some of the pending pods will go to the second one, and just one more pod
will go to the third instance here.
We can spot two challenges here.
The first is that it takes a longer time to spin up a new
instance for scheduling the pods.
Because Cluster Autoscaler needs to work with the Auto Scaling
group, and the Auto Scaling group needs to work with the EC2 API,
there are multiple steps to spin up a new instance here.
This is problem number one. And problem number two: if you
notice or observe, on the third instance we have only one
pod running and we still have 1.5 CPU left.
This one is underutilized.
So we would like to try to find the right size of node that fits the pods we need to run.
That is the second thing that needs to be solved.
And coming to the third challenge that we have for an existing
cluster with Cluster Autoscaler.
If you can see here, this is the deployment that we already deployed,
sized based on the CPU requests and CPU consumption.
But how about when we have a new deployment
that needs a GPU, or another type of instance?
Now this pod, or this deployment, cannot be scheduled
because the capacity that we have doesn't support the GPU workload.
So the thing here is that the platform team needs to create
new compute, a new node group.
The first one is the compute node group,
and another one is the GPU node group.
After that, the Auto Scaling group can spin up a new type of instance
to support the deployment that needs GPU.
Now it's done.
So we have three main challenges here, and we would like to see what
Karpenter can do to make things more efficient or better. Let's see.
So this is the existing flow: when we have pending pods and we don't have enough
capacity, Cluster Autoscaler kicks in, and Cluster Autoscaler
works with the Auto Scaling group.
After that, the Auto Scaling group will spin up an EC2 instance by
calling the EC2 Fleet API.
So, multiple steps, just like I mentioned. But with Karpenter,
we don't need these two components anymore, and we don't need to go
through the Auto Scaling group.
Also, the node group concept is gone.
We can bypass that by replacing those two modules with Karpenter,
and Karpenter can spin up EC2 instances on our behalf.
The instance that Karpenter spins up will always be the cheapest
one, depending on the configuration.
And it will always be the right size for the pending pods.
So let's see how it works.
So here we have two instances that are running nine pods of our deployment.
The type is c5.2xlarge.
And now we don't run Cluster Autoscaler and the Auto Scaling group
in our cluster anymore.
We have EKS, and also Karpenter installed.
How about now we have a GPU workload?
If we configure Karpenter correctly, Karpenter
can spin up a new node that supports the GPU workload automatically.
You don't need to do anything.
You just configure it in the node pool configuration, which we will go
through in a couple of slides.
So here, the new workload can just be scheduled on the new instance
coming up from Karpenter.
I think this one is the third problem, right?
But we can see the first one is already solved, because we bypassed the Auto Scaling group.
This one is the third one, and we will revisit the second one later.
So here is the configuration that I mentioned: you can configure
the node pool for Karpenter, how it will behave in your cluster.
You can have multiple node pools as well, depending on the situation.
So here we have the configuration where we can choose from the
different types of instances.
You can use a wildcard here, or you can use only C, or C5, or C6, or something
like that. You can use M, you can use R, and you can also specify the size,
like nano, micro, small, large, xlarge, something like that.
So depending on the use case, you can specify a wildcard or a specific one.
And how about the availability zone?
You can also specify here that you allow Karpenter to spin up a node only inside
that particular Availability Zone.
Or you can choose the CPU architecture,
by using x86, amd64, here, or you can also use arm64,
which is Graviton on AWS.
Graviton provides better price-performance compared to x86.
So if your application is already tested to support ARM,
you can use ARM here as well.
And the final one is the capacity type.
You can either go with on demand or you can utilize spot as well.
If you specify both of them, it will try to prioritize Spot.
And then you have the limit here.
The limit specifies that this node pool can spin up a maximum of 100 CPUs only.
When it reaches 100 CPUs, it cannot spin up a new node anymore.
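Putting those options together, a minimal NodePool sketch could look like the following. This is illustrative, assuming the Karpenter v1 schema; the instance categories, zones, and architectures are example values, and the 100-CPU limit matches what was just described.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        # Instance families: C, M, R (narrow this list for a specific use case)
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        # Exclude the very small sizes
        - key: karpenter.k8s.aws/instance-size
          operator: NotIn
          values: ["nano", "micro", "small"]
        # Only these Availability Zones (example values)
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-west-2a", "us-west-2b"]
        # CPU architecture: x86 (amd64) and/or arm64 (Graviton)
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        # Capacity type: Spot is preferred when both are allowed
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "100"   # no new nodes once this pool reaches 100 vCPU
```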
Okay, for the node pool, you can have a single one for one cluster.
So this one is quite simple, maybe in the test environment or dev
environment, or you have a unified node pool configuration that supports
all the workloads in this cluster.
So it's quite simple for the single one, single node pool.
But for multiple ones, you can have maybe multiple teams that need different
requirements for the node configuration, or you have accelerated hardware.
Some of the teams need a GPU workload, or you have different images, or AMIs,
in your workloads, like one workload uses Bottlerocket and one workload
just uses the EKS optimized AMI, or something like that.
So you can have multiple node pools to support different kinds of workloads.
And you can also put a weight on the node pool as well.
It will try to prioritize the higher-weight node pool first, and then when
it runs out, when it reaches its limit (maybe the first one can spin up only
10 CPUs, depending on the Savings Plans or Reserved Instances you
bought from AWS), then beyond that, it can utilize Spot or on-demand.
So you can use weighted node pools as well, as sketched below.
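As a sketch of that pattern (illustrative names and values, not the exact manifests from the talk): a higher-weight pool capped at your reserved capacity, and a lower-weight pool for everything beyond it.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: reserved
spec:
  weight: 100          # tried first
  limits:
    cpu: "10"          # stop at the capacity covered by Savings Plans / RIs
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: overflow
spec:
  weight: 10           # used once the reserved pool hits its limit
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
```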
Okay, so here is challenge number two: we have some nodes that are in an
underutilized state. You can notice here, on the
two instances on the right side.
We have only one pod running on each, and they are m5.xlarge,
which is quite big compared to the workload that is running inside
these two nodes. So in the Karpenter configuration, you can also specify
when it will do consolidation.
Consolidation means that it will try to move the workloads together,
try to bin-pack them, to save cost for you.
And you can also specify the time after which it will do the disruption,
or move the workload for you.
So here I specified zero seconds, so it will try to do consolidation
every time it sees that a node is underutilized. It's quite aggressive here,
but for demo purposes I just would like it to do the consolidation
quite fast. You can also specify something like one hour, or 24 hours, one day,
two days, depending on the use case.
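As a sketch, the disruption block on the NodePool (Karpenter v1 schema) is where both of those settings live; zero seconds is the aggressive demo value:

```yaml
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 0s   # demo value; use e.g. 1h or 24h to be less aggressive
```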
So now it can group those two pods together inside the existing capacity.
This is the first case, and here is the second one.
The same here: you have an m5.xlarge, and two of the
pods are running separately.
When you enable consolidation, it tries to find
a node that fits these two pods, spins that up, and tries to drain these two pods
to the new instance to save you some cost.
Let's see here.
We group those two pods together and then move them to the
new instance, which is m5.large,
so we reduce the overall cost of the cluster.
Yeah, you can see that we solved
all of the three challenges that I presented earlier.
So how about some other features as well?
Recently we launched Karpenter version 1.0.
This feature is also supported: you can do automatic
upgrades by detecting drift of the Karpenter configuration.
There is another custom resource called EC2NodeClass. Here you can specify the image
that you would like to use, the security group, and also the Availability Zone as well.
It's called ec2 node class for here you can specify the image that you are that
you would like to use the security group and also the Availability zone as well.
Here this, under the spec, we specify that we would like to use
the latest image that it comes out.
So it means that if AWS update the EKS optimized a MI.
For the new version, Carpenter will try to upgrade every workload under
Carpenter and upgrade to the new one and also try to drain the pods for us.
So this can be done automatically, and it will happen every
time there is a new AMI for the AMI family you are using: the
new version comes out, and it will be automatically upgraded. But sometimes
you can also do something like this.
You can pin the version that you would like to use. With this one, when updates
come out, it will not be automatically updated, and you can change the
value inside the EC2NodeClass later when you would like to do the upgrade.
And this can be specified with a custom AMI that you have as well.
So it's quite flexible.
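A minimal EC2NodeClass sketch, assuming the Karpenter v1 schema; the AMI alias, role, and discovery tags here are placeholders, not the exact values from the talk:

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2023@latest        # follow the latest EKS optimized AMI automatically
    # - alias: al2023@v20240917   # or pin a version and bump it yourself later
    # - id: ami-0123456789abcdef0 # or point at a custom AMI
  role: KarpenterNodeRole-my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
```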
So now coming into the demo that I prepared for you.
So let's see here.
What do we have?
Let's see.
So for the nodes, we have two nodes here.
The type is t3.large, two nodes.
And I think, so let's see the workloads, what kind of workloads we have.
Okay, we have CoreDNS in kube-system.
We have Karpenter.
This is the controller.
I spin up two pods under the deployment because I would
like to provide resiliency. Then I have my workload here.
The name is inflate, and the replica count is zero for now. Let's see what kind of
configuration I have in that inflate deployment. It's quite basic.
The name is inflate, and the namespace is inflate.
And the CPU request here is one CPU.
I didn't specify a memory request.
And it will just pull the image down from the public ECR repository.
And it does nothing; it just reserves the CPU. That's all for now.
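A sketch of what that inflate deployment likely looks like; the pause image from the public ECR gallery is an assumption based on the usual Karpenter demo:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
  namespace: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: "1"   # one CPU requested, no memory request
```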
How about the node pool configuration that we went through?
What did I configure?
Let's see. Okay.
Yeah, we have only one node pool here.
The name is default.
You can have multiple, just like I mentioned.
And the API version is v1, of course, because it launched as a stable version.
The consolidation policy for now is WhenEmpty, and we'll change it later.
And the consolidation time is five seconds.
The limit here: this node pool can spin up only 10 CPUs; after that
it cannot spin up a new instance.
The architecture is x86.
The capacity type is on-demand only, and for instance types, this node pool can
spin up C types, M types, and R types.
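Based on the values read out here, the demo NodePool is probably something like the following sketch (exact field values assumed, Karpenter v1 schema):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 5s
  limits:
    cpu: "10"            # this pool stops at 10 vCPU
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
```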
So let's see what happens
when we scale up our workload.
Yep.
So we spin up five replicas here,
and we have five pending pods now.
You can see it detects them quite fast and spins up a new one for us.
It's a c6a.2xlarge.
It's not ready yet, because it takes some time for the node to be ready,
but yeah, it's super fast to detect, and this is faster than the Auto
Scaling group and Cluster Autoscaler, which take more time.
So now it also spins up two pods: kube-proxy and also kubelet.
Yeah, so all five pending pods are already scheduled, and it's the right
size for those pending pods.
So let's see the logs to understand it further; this is the log from Karpenter.
Here we request five pods, which we see as the pending pods, right?
And then Karpenter tries to come up with the right-sized instance.
Here we request five CPUs, and the number of pods is seven because
it includes kubelet and kube-proxy.
It searches through the instance types that we specified in
the node pool, like C5, C6, yeah,
within that list, and it came up with a c6a.2xlarge in this
Availability Zone with the on-demand capacity type, and this one supports 58 pods.
That number depends on the number of elastic network interfaces.
Now it's already registered and ready to host the pods for you.
Yeah, that is a lot.
How about now we scale up to 20?
Some of them can be scheduled on the existing nodes, and some of
them will be scheduled on the new instance that's coming up here.
So it comes up with a c6a.large,
but we still have some pending pods there, 13, right?
I think a c6a.large can maybe support one or two.
Let's see; it's not ready yet.
So we need to wait until it's ready to see what pending pods
we have. Yeah, it scheduled just one pod on the
new instance, and we still have 12 pods that are in a pending state here.
Yeah, you can see it here as well. So can you guess
why Karpenter doesn't spin up any more? Because it reached the limit that we specified:
this node pool can spin up only 10 CPUs, and now it has reached
the limit. So how about we increase that?
Let's see. Yeah, we will increase it from 10 to 1000, and then hopefully it can spin
up a new node to support the 12 pending pods.
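The change being applied here, as a sketch, is just the NodePool limit:

```yaml
spec:
  limits:
    cpu: "1000"   # raised from "10"
```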
So now, it's quite fast.
You can see it spins up a c6a.4xlarge
here.
The limit has already changed to 1000.
The c6a.4xlarge will fit all of the 12 pending pods here
soon, so I need to wait for kubelet and kube-proxy here.
Okay.
Yeah, it's there.
And all the pods are running.
I would like to add ARM as well,
Graviton, because we want to save some cost.
And I also would like to add Spot as well.
For Spot, you need to make sure that your workload can tolerate the
disruption when the Spot capacity is reclaimed.
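As a sketch, the requirement changes being applied to the NodePool would look roughly like this (values assumed):

```yaml
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]       # allow Graviton
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]    # Spot is preferred when available
```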
Okay.
Let's check the configuration where we added Spot and also the ARM
architecture.
Okay.
It's up there somewhere here: we specify x86, which is amd64, and arm64,
yeah, Graviton, and Spot as well. I think something new will come up here
when it finds a Spot instance for us, because
it's cheaper. It's deleting this node, so it will try to drain some pods out;
they already drained to the existing one, maybe the smaller one,
and a Spot instance is coming up soon.
Okay.
Let's wait a little bit.
Let's see.
Okay here.
So we have a c6g, where the G means Graviton, which is ARM, and from on-demand we changed to Spot.
This is much cheaper, up to 70%
cheaper than the on-demand one.
Okay.
Just wait for it to be ready, and then it will try to drain the pods for us.
It will cordon and also drain for us.
Yeah.
Okay, now it's running on Spot and Graviton, which helps us to
save some cost in our cluster, and it will delete the old one here.
How about we change the consolidation policy from WhenEmpty to
WhenEmptyOrUnderutilized? Let's see. And here: the consolidation policy
goes from WhenEmpty to WhenEmptyOrUnderutilized, with five seconds.
As I said, you can specify something like one hour or 24 hours for the consolidation if
you don't want to have too much disruption.
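As a sketch, the edit on the NodePool is:

```yaml
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized   # was WhenEmpty
    consolidateAfter: 5s
```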
Okay, I will try to reduce the number of replicas in the workload here so that we have
some underutilized nodes. Here we have one node with five pods and 20 percent CPU
consumption. It will try to come up with a new node and drain the old one for us,
to make sure that our cluster is better utilized. Yep, now we have it.
This one is also underutilized.
So it will try to spin up a new one and then move all the pods for us.
Yeah.
Deleting the other one that it doesn't need anymore.
So this is how it works for consolidation, and also some other
things like architecture, Spot, and also the instance types that we can select.
How about now we would like to do taints and tolerations?
We specify that Karpenter will spin up a node with a taint.
The key is my name, chakri.
So the nodes spun up by Karpenter will have this taint,
and the deployment needs to have a toleration for it.
I will try to increase the number of replicas in the workload.
Oh, unfortunately it fell onto the nodes that we
already have, so it can be scheduled.
So now we add some more, and some of the pods here will be pending,
because Karpenter cannot spin up a node for them.
The deployment doesn't have a toleration for the taint key, chakri,
so it cannot spin up here.
So now we would like to add the toleration to the deployment so that
Karpenter can spin up a node for us.
So let's see.
Okay.
Just add the toleration here in our deployment.
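A sketch of the two pieces involved; the taint key here is assumed from the talk, not the exact manifest:

```yaml
# On the NodePool: every node this pool launches carries the taint.
spec:
  template:
    spec:
      taints:
        - key: chakri          # assumed key name
          effect: NoSchedule

# On the deployment's pod spec: the matching toleration.
spec:
  template:
    spec:
      tolerations:
        - key: chakri
          operator: Exists
          effect: NoSchedule
```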
Let's check.
Oh, it just spun up a new one for us.
You can see it's a c6g.4xlarge.
So let's see in the deployment that we have the toleration
that I just added.
Okay.
Here is the toleration.
That's why Karpenter can spin up a new node for us
and, yeah, schedule our pods.
Boop.
Okay.
Okay, that's all, I think. I think the final one is about the AZ spread.
Karpenter will respect anything that you configure in your deployment.
So here I would like to spread my workload across AZs,
away from each other. And you can see that a lot of things are happening:
it tries to spin up new nodes to support different AZs. This
will ensure that your workload has higher resiliency.
So this is the configuration that I specified in the deployment: I
would like to use topology spread,
and if it cannot be satisfied, do not schedule.
You can see the DoNotSchedule there.
You can use ScheduleAnyway as well, but with this one I would like to show
that Karpenter respects anything that we put in a deployment.
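A sketch of the topologySpreadConstraints in the deployment's pod spec (the label selector is assumed to match the inflate pods):

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule   # or ScheduleAnyway for a best-effort spread
    labelSelector:
      matchLabels:
        app: inflate
```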
So we would like to check, when it's done, that we have multiple
AZs for our workload here.
So you can split, you can, okay, it's deleting one of
them, maybe just wait for it.
So here we have AZ 1a, 1b, 1c.
Let's see again, because it's deleting.
Yep, I think we have all of the AZs here:
a, b, c for our workload.
Yeah, it tries to balance the AZs for us.
That's all I would like to show.
There are a lot of things that you can explore by yourself.
You can also read the best practices that we have online.
So for onboarding to Karpenter, you can install Karpenter by using
the Helm chart. One best practice here: do not run Karpenter on
nodes that are managed by Karpenter.
Otherwise, it could just delete the node that runs Karpenter and stop working.
You can run Karpenter on Fargate or on a separate node group.
And you can also migrate from Cluster Autoscaler as well.
The guide is there.
And you can explore.
So let's sum up.
Karpenter is fast; it's quite simple, but it's quite powerful.
It's cost effective, because it will try to reduce cost for you through consolidation.
And it's Kubernetes-native, because it's open source, and
we already support multiple clouds.
And yeah, I think it's awesome.
Thank you so much.