Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, thanks for joining my session.
My name is Samuel Baruffi. I am a Solutions Architect with
AWS, and today we are going to be discussing
Karpenter. The name of my presentation is Just-in-Time
Nodes for Any EKS Cluster: Auto Scaling
with Karpenter. So let's just
get started with a quick agenda. We're going
to, at a very high level, discuss EKS.
What is EKS, Elastic Kubernetes Service, on AWS?
We just want to set the stage and establish an
understanding of what EKS is, because Karpenter actually works
on top of EKS. If you're not familiar with EKS, you might need a little
bit of understanding of Kubernetes, but hopefully the
quick overview will be able to provide that guidance.
After that we're going to talk about Kubernetes auto scaling:
what mechanisms are available to us that are Kubernetes
native, but also what the current implementations for the cloud
look like. After that we're going to talk about some customer
challenges based on those current implementations,
and then we're going to talk about Karpenter and how Karpenter solves some of
the challenges that we've heard from customers trying to do auto scaling
on Kubernetes. And in the end we'll spend probably
10-15 minutes doing a demo, installing Karpenter and actually
showcasing how Karpenter can help you with a lot of flexibility
and speed to scale up and scale down your clusters,
specifically the nodes within your clusters.
So moving forward, let's do an overview of EKS.
EKS is short for Elastic Kubernetes Service. It's a managed service on AWS.
EKS actually runs vanilla upstream Kubernetes.
It's also certified Kubernetes conformant for the specific Kubernetes
versions it supports at any given time. EKS currently supports four
versions of Kubernetes, which gives you as a customer
time to test and roll out upgrades.
Having lifecycle management of upgrades on
your Kubernetes clusters is really important, and AWS
helps you with that because it's a managed service. EKS provides
an experience for reliability,
security, availability, and performance on top
of Kubernetes. On the next slide you'll see how you
have a data plane and a control plane that can be managed for you on
both sides. The whole idea is that by
using EKS you don't need to do a lot of the operations,
what we call undifferentiated heavy lifting, for
managing your Kubernetes clusters. You can just rely on a managed
service like EKS to take care of those
tasks like upgrades, lifecycle management,
security, and so forth. Of course, it's always a shared responsibility:
some of the things will be taken care of by AWS, and some
of the things are your responsibility to properly configure,
giving you the proper flexibility.
So when we look at a high-level overview of
what EKS is, you have two boxes here.
The first box that we're going to talk about is the control plane.
When you look at the box on the right, which says AWS
Cloud, it means that it's run behind the scenes by AWS.
Here on the top you can see the control plane,
which is a fully managed, single-tenant
Kubernetes control plane per cluster. Once you create your EKS
cluster, behind the scenes AWS is going to create a single-tenant
control plane only for you, and you're only going to get the
specific endpoint. You can create private or public endpoints;
we call them cluster endpoints. Behind the
scenes, if you're familiar with Kubernetes architecture,
you have the etcd database, you have the API server,
you have the scheduler, and you have the controller manager.
AWS is going to manage the control plane for you, and not only
manage it, but scale it as needed. So you don't need to worry about
that. It's all taken care of on the control plane side
by AWS.
Then when you look at the left box, you see the customer VPC. That's the virtual private cloud
that you have in your AWS account, and that's where you
can deploy your data plane. The data plane means
the nodes where your containers, your pods, are going to
be running. You have two types of node
groups that you can create: self-managed node groups and
managed node groups. With self-managed node groups you're actually
responsible for all the configuration of your Auto Scaling
group, for managing the AMI, and everything else.
With managed node groups you have a managed experience for
your data plane as well. So for
lifecycle management and scaling, AWS is actually going to take
those actions for you behind the scenes. You can
also use Fargate. Fargate is a
serverless container offering that does not require any
specific node group, in the sense that you don't need any EC2
instances, neither a self-managed node group nor a managed node group.
With Fargate you pay per pod,
and that specific
running pod behind the scenes runs in an AWS-owned account.
You can see here that the way it works, it creates an ENI
within your VPC that links back to the Fargate
micro VM that is running on the AWS cloud. Fargate also
works on ECS, but it has integration with EKS, like you
can see here. For this talk, we're not going
to focus too much on the EKS data
plane or control plane. We are going to talk about EKS auto scaling and
Kubernetes auto scaling. So with that said, let's move to
the next section.
So, as a customer or a user of Kubernetes,
what are the available resources and configurations that you can fine-tune
for auto scaling? You can separate auto scaling in Kubernetes
into two different categories. One is the application itself, and the
other one is the nodes and the infrastructure.
The first two items are more focused on the applications
that are running. The first one is called the Horizontal Pod Autoscaler;
HPA is the short version of that.
The whole idea of HPA is that you do a
deployment on your cluster and you decide
how many replicas of that specific deployment you want to have.
Let's say I want to have an Nginx server and I want to have
three replicas of that specific Nginx pod
deployed across my environment. You can configure
HPA on top of that deployment, and you can specify specific
metrics, for example CPU, memory, or even your own
custom metrics. Once you do,
HPA is going to watch that metric and check whether the
threshold that you have configured has been exceeded. Let's say you configure
that if at any given time the aggregate CPU
utilization of your deployment goes above 80%, you want
to add another pod, another replica, within
your deployment. HPA is going to take that job
for you, and it's going to horizontally add more replicas,
up to as many as you have configured and for
whatever metric you have configured. So that's what is called HPA.
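As a reference, a minimal HPA manifest along the lines of that example might look like this; the deployment name and the thresholds here are illustrative, not from the talk:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: nginx-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx            # illustrative target deployment
      minReplicas: 3
      maxReplicas: 10          # upper bound for horizontal scaling
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 80   # scale out above 80% average CPU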
But you also have another option, which is called the Vertical Pod Autoscaler,
VPA for short.
VPA is less common in a sense,
because Kubernetes is really good at scaling distributed systems horizontally.
But you also have the ability to resize
a pod that is running, for example, with 2 GB
of memory: if a specific threshold has
been reached, you can have it create a new pod with 4 GB of
memory. So it's the same idea as HPA, except that,
instead of adding new replicas to your deployment, it's just
going to recreate the pod with more memory available
for that specific pod. But those are always
looking at your application.
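For comparison, a minimal VPA object might look like the following; note that VPA is a separate controller you have to install yourself, and the target name is illustrative:

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: nginx-vpa
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx            # illustrative target deployment
      updatePolicy:
        updateMode: "Auto"     # recreate pods with updated CPU/memory requests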
Those two configurations, both HPA and VPA, don't actually
look at the cluster itself or add more infrastructure nodes. They only
work at the application level. So if you want to run a flexible and
elastic Kubernetes cluster, you also need something
that is responsible for creating more nodes for you.
With that said, that's where Cluster Autoscaler
comes in. With Cluster Autoscaler, let's say
you have two nodes on your data plane
and you try to schedule, in this example,
four more pods, but there are no resources
available within the existing nodes in your node
group. Cluster Autoscaler, once you install it,
configure it, and integrate it with your provider, in
this case AWS, will look for pending pods and say: I don't really have
resources currently available to place those four
pending pods. So Cluster Autoscaler will go and talk
to the Auto Scaling group behind your node group, either a
self-managed node group or a managed node group.
Cluster Autoscaler talks to the API of your
Auto Scaling group and says: please spin up a new
node, a new EC2 instance, for me within that
specific Auto Scaling group, so that I can go
and actually schedule and run all my four pods that were pending.
Behind the scenes, each Auto Scaling group will
increment its size based on the needs of the pending pods.
This works fine for most applications and workloads.
However, as Kubernetes and EKS have gained
broader adoption, customers are moving a variety
of different workloads onto them. And as you can see in this example,
Cluster Autoscaler just creates a new instance of the same instance type
within the same Auto Scaling group. And that's
where some challenges come into the picture.
So what have we heard? We've heard some customers bringing
feedback on why Cluster Autoscaler
potentially doesn't work every single time,
or where improvements should be made.
Nearly half of AWS Kubernetes customers
have told AWS that configuring Cluster Autoscaler
is challenging, and we are going to go
through some of those challenges now to set the scene on
why it was important for us to create Karpenter.
First of all: node group and Auto Scaling group sprawl.
Different workloads need different compute resources.
An AI/ML workload has
very different requirements than, for example, your web application or
your batch applications, right? Unfortunately, with Cluster
Autoscaler, the only thing it
is able to do is add new instances of
the same type to your existing managed node
group. You can create multiple managed node groups with different instance
types, but that adds a lot of complexity
in managing them, right? Customers have
told us that not all workloads need to be isolated on specific
node groups, and balancing the needs of specific
workloads adds a lot of complexity, because now you need to manage
multiple Auto Scaling groups and multiple managed node
groups. It becomes cumbersome, and it's really
hard to achieve the right balance of performance, cost,
and availability. As an
example, if you need Spot, you can't mix
and match Spot and On-Demand in a single managed node
group. You need to have multiple managed node groups,
each of which has an Auto Scaling group behind the scenes, and
it becomes a real challenge to provide availability
for Spot interruptions, or to follow best practices, for example,
spreading workloads across AZs, while always thinking
about cost and trying to improve cost and
performance for those workloads.
Another challenge that we've heard from customers is that Cluster Autoscaler
can actually be very slow to respond to capacity needs
and spiky workloads. If you think
about ETL jobs, GPU training jobs, or ML
workloads, the speed with which capacity for
those workloads, like big data and AI/ML workloads,
can be spun up is critical. Delays in providing
capacity for these workloads can
slow down innovation and potentially decrease
the satisfaction of your data scientists and engineers.
These jobs typically spin up several nodes of expensive accelerated
EC2 instances, for example,
very expensive GPUs. So you want those to be spun up very quickly,
but also spun down very quickly.
And a slow scale-down means wasted resources,
which you don't want to be in the business of.
Another challenge: it is very hard to balance utilization,
availability, and cost. Typically with Cluster
Autoscaler it is hard to get high cluster utilization and
efficient operation while not over-provisioning
resources to ensure a consistent user experience. What
this can result in is low utilization, leading to
wasted resources, and the impact can be significant. As
an example, let's say you want to make sure your applications are running
across multiple Availability Zones,
but they have different resource requirements.
Then you potentially need to have multiple Auto Scaling groups.
And that adds a lot of challenge in managing those
Auto Scaling groups across AZs. Making
sure they are fully utilized becomes very
challenging, sometimes practically impossible
without wasted resources.
So with all those three challenges we've discussed so far,
we came up with Karpenter. But what actually is Karpenter?
Karpenter is an open source, flexible, and
high-performance Kubernetes cluster autoscaler.
So instead of deploying Cluster Autoscaler,
you can deploy Karpenter on your EKS cluster.
It is open source and Kubernetes native.
It doesn't have any concept of groups, so it's what we call
a groupless approach, and we are going to talk
in a moment about why it's called groupless. It does automatic node
sizing: instead of having the limitation
of only being able to launch instances of the same type that you have in your Auto Scaling
group, Karpenter can look at the specific requirements of
the pending pods and choose the best performance
and cost for that specific need at any
given time. It's also much more performant at scale, because it
behaves differently compared to
Cluster Autoscaler. The APIs it uses and the way it
watches for pending pods on your cluster are a
little bit different. The goal is to launch 1,000
pods within 30 seconds. That's the goal that Karpenter has set
in mind, and depending on your environment,
it can actually achieve that.
So let's look at how Karpenter works. Very similar to Cluster Autoscaler,
it starts with pending pods. Karpenter is always watching the
scheduler in Kubernetes, because it works integrated into the Kubernetes
native ecosystem. It looks for pending pods:
the scheduler looks at existing capacity
in this case and sees, well, I can't actually place more
pods because the nodes are full. So pending pods become
unschedulable, and that's where Karpenter comes
in. It would actually replace your Cluster Autoscaler;
you're not going to have Cluster Autoscaler in this case. Here you
have Karpenter deployed, and Karpenter will go
and, instead of talking to an Auto Scaling group, because there
are no groups, it talks to the EC2 Fleet API.
The EC2 Fleet API provides a bunch of benefits.
Behind the scenes, what Karpenter does is look at
the specific requirements of those unschedulable
pods, and it will find just-in-time capacity
that is a perfect fit for what you need. You
can configure it, and it's very flexible in what configuration it allows you
to do. But if you don't provide any specific limitations on the
instances that it can choose, it will find a specific
instance that fits your requirements while
also optimizing for cost and performance, and it makes that
decision for you. It's also deeply
integrated with Kubernetes: it uses watch APIs,
it has a lot of labels and finalizers, and, like
I said, it does a lot of automated instance
selection, matching a specific workload to a specific instance
type.
Karpenter also works really well with Spot.
We're going to talk in a moment about what a provisioner is,
but if your provisioner has support
for Spot, and you can mix and match Spot and On-Demand, both can
be supported. Karpenter can actually
look for the cheapest and most performant
Spot capacity available in the specific Availability Zone
where it wants to deploy, pick and choose that, and deploy
it for you. It can also handle interruptions for you.
It integrates with the two-minute interruption
notice that Spot has in place, to allow your
applications to be rescheduled before the node interruption
takes effect.
Another really good thing Karpenter has done is
the ability to consolidate. Consolidation
is a feature that looks for opportunities to improve
your cluster utilization over time. Karpenter not
only works on scaling up and down, but also looks
at your cluster at a high level, looks at which
nodes you currently have in place, and checks whether there is an
opportunity to remove some of those
nodes and bring up other nodes that are more performant and
price-optimized for you. So it can reschedule running
pods onto existing nodes that are underutilized,
using capacity the cluster already has, but it can also launch new,
more cost-efficient nodes within the cluster
and replace nodes that were much more expensive. Let's
say in this case here
you have three nodes
that are, say, m5.large,
just to give an example. Karpenter will look at that and say, well, I can
potentially spin up an m5.2xlarge and
maybe an m5.medium,
and actually move
all that capacity onto those two instances, which are
going to be more performant and better for your price.
Consolidation is a feature that you can enable. If you don't want
the feature, that's okay, because it is actually going to reschedule
pods, and you want to make sure your application is highly
distributed within your cluster. Consolidation won't ever
bring your application down, but it can optimize capacity
quite a lot.
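As a rough sketch, enabling consolidation on a provisioner (using the v1alpha5 API this talk is based on) is a single field; the provisioner name is illustrative:

    apiVersion: karpenter.sh/v1alpha5
    kind: Provisioner
    metadata:
      name: default
    spec:
      consolidation:
        enabled: true    # let Karpenter replace or remove underutilized nodes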
So when we look at how Karpenter works, how Karpenter provisions
a node on AWS, let's first look at how Cluster
Autoscaler does it. With Cluster Autoscaler,
let's say an application scheduler or HPA
triggers a specific pod to be created.
It gets into a pending state, then Cluster
Autoscaler will look at those pending pods and talk
to the Auto Scaling group, and then the Auto Scaling group talks to the EC2
API to increase or decrease, in this case increase, because
you have pending pods, the number of nodes that you have in your
node group. Now, the way
Karpenter works is that instead of going through
Cluster Autoscaler and
a specific Auto Scaling group, Karpenter
watches for those pending pods directly.
Those pending pods trigger an action in Karpenter,
and instead of talking to the EC2 API through an Auto Scaling group,
Karpenter talks to the EC2 Fleet API, which is much more performant
when you're trying to work out the capabilities and
possibilities of what Karpenter can deploy in a specific Availability
Zone in a region. EC2 Fleet is the one responsible on the AWS
side for making those decisions, and this
consolidates the instance orchestration responsibility within a single
system. That's what Karpenter does.
We've talked about groupless provisioning. What
Karpenter actually does is take an application-first
approach, because it's always looking for the pending pods and grouping
those pending pods, which is called bin packing. So every time there are pending pods,
it's going to bin-pack all those pending pods and look at
what is the simplest node
provisioning that makes sense for your cluster. And the way you
configure compute, you'll see in the demo, is really simple. You just have
two objects: one is the Provisioner,
and the other one is the node template for your specific provider,
in this case AWS. This also reduces a lot of the cloud provider
API load, because Karpenter goes directly to the EC2 Fleet
API. It doesn't have the limitations on how
many times you can call Auto Scaling groups
that Cluster Autoscaler had, and it reduces
latency significantly. It chooses the instance type from the pod
resource requests. So when you're
doing deployments, you always want to make sure you set requests for memory
and CPU, and that's what Karpenter actually uses to
work out how much memory and CPU is required. Then it chooses the
specific node as per the pod scheduling constraints. So if you have
constraints for specific Availability Zones where you want that pod to be
deployed, or potentially some specific instance
types or GPUs, it's going to look at the
pod deployment, for example its labels, and then
Karpenter reads those and makes a decision based on those constraints.
And that capacity is acquired directly through EC2 Fleet.
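To make that concrete, here is an illustrative pod spec of the kind Karpenter reads from: the resource requests drive instance sizing, and the node selector drives instance selection. The zone, architecture, and image values are just examples, not from the talk:

    apiVersion: v1
    kind: Pod
    metadata:
      name: example
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: us-west-2a   # example zone constraint
        kubernetes.io/arch: amd64                 # example architecture constraint
      containers:
        - name: app
          image: public.ecr.aws/docker/library/nginx:latest  # illustrative image
          resources:
            requests:
              cpu: "1"         # Karpenter sizes instances from these requests
              memory: 1.5Gi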
And then you track the nodes using native Kubernetes labels.
It also, and this is a specific one,
binds the pods early to the nodes,
because it doesn't need to wait for a
cluster autoscaler to make any decision like before.
While the node is actually being created behind the scenes,
the kube-scheduler has already bound the pods, so the node
can start preparing immediately as it comes up,
rather than waiting the way it does with Cluster Autoscaler,
and that includes pre-pulling the images. This can actually shave
seconds off per-node startup latency.
So it's a very nice feature that
helps Karpenter be more reliable
and fast when actually doing those scaling activities.
So let's just quickly look at how Karpenter scales up.
Let's say we have specific pending
pods here at the top. Karpenter will look at
those pending pods and create a new node, right?
It knows your targets here, because you
have requests set on your application. So it knows,
both at a node level and at a cluster level,
what the utilization is and what target it wants to hit.
You can set up provisioners for this.
By default, all instance types that Karpenter
can pick and choose from are included,
but you can specify, and I'm going
to show some examples, the specific instance types that you want to make available.
And then for scaling down, we have different options.
We talked about consolidation, and we're going to show an example later,
but you have two options: you can either use
ttlSecondsAfterEmpty, or you can use consolidation.
They are mutually exclusive.
Consolidation is the newer feature, but before consolidation existed
you had this setting, ttlSecondsAfterEmpty.
In this case, in the example I'm showing you,
it is set to 10 seconds. What this feature does
is look for nodes that are empty. In this case I
just removed some pods from my nodes, and 10
seconds after, if the node is still empty,
Karpenter will remove the node completely.
And one thing that I just want to mention
is that it doesn't count DaemonSets, because DaemonSets
run on every single node. It just looks for
empty nodes. So it is smart enough to realize that if there are three DaemonSet pods
running on that node, it doesn't
need to care about them, because it knows those DaemonSets run across
all nodes. So it removes the node.
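A minimal sketch of that setting on a v1alpha5 provisioner:

    apiVersion: karpenter.sh/v1alpha5
    kind: Provisioner
    metadata:
      name: default
    spec:
      ttlSecondsAfterEmpty: 10   # remove a node 10s after its last non-DaemonSet pod leaves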
I talked about bin packing. The cool thing is that it
combines all the requirements of those pending pods,
and there are well-known labels that you can define on your specific deployment
that are configured on the node as well. So let's say you
want to run a specific application on an Arm
Graviton2 processor. You can define those specific labels,
and when Karpenter is doing the bin packing, it's going to make a
consolidated decision on how it can organize all
the pending pods you have in the queue.
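For example, a pod template can steer Karpenter toward Graviton (Arm) and Spot capacity with well-known labels; this is an illustrative snippet, not the exact manifest from the talk:

    spec:
      nodeSelector:
        kubernetes.io/arch: arm64          # well-known label: request Arm (Graviton) nodes
        karpenter.sh/capacity-type: spot   # well-known Karpenter label: prefer Spot capacity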
And then consolidation, which I recommend.
There are potentially reasons why you would want to use
ttlSecondsAfterEmpty instead,
but consolidation is a much broader and feature-rich
solution. You enable it here on
your provisioner; you see consolidation enabled:
true. Let's say in this example you had
five pods on this node here on the right.
Once load goes back down, you can see that
you have a lot of underutilized resources.
Karpenter will look at that, if you have consolidation enabled,
and say, you know what, I can actually run those two pods on a
much cheaper node. So it's going to
spin up the new node for you,
then it's going to move those
pods onto the new node, and finally it's going to remove the old
node. So consolidation allows Karpenter
to delete a node when its pods can run for free on capacity
that already exists in the cluster, but it can also replace
a node when you
don't have much requirement for that big node anymore, and it
can just create smaller ones, like the one you saw here.
That is simply a replacement of a node.
Continuing with the example here: you had four pods on this one,
the third node from the top, and now you only have one pod.
What it's actually going to do is remove the pod from here,
move it to the node at the bottom, and then remove the node. So it
keeps cost optimization in mind, which is really important to
scale your Kubernetes solutions.
Now we're just going to spend a few minutes on examples of
how you can configure your provisioner. Your provisioner
is the Kubernetes object that, once you deploy Karpenter,
and you'll see this in the demo, lets you provide specific configuration for
how your provisioner should behave. You can create multiple
provisioners with different weights, or
you can match a specific pod to a specific provisioner;
there are labels that let you mix and match.
So one example of the flexibility
of provisioners is the ability to select purchase options. You
can select the capacity type: in this case, in the requirements,
the capacity type is choosing Spot
and On-Demand. When you have Spot and On-Demand configured
at the same time on a specific provisioner,
Karpenter will always favor Spot, and it's
only going to pick On-Demand if there are
spot constraints. So if the
good options for Spot to launch an EC2 instance for
you are not available at that time, it's going to fall back
to On-Demand. But you can also select different
architecture types. You can have provisioners that can deploy
both arm64 Graviton2 processors and
amd64 x86 processor types.
That means that the provisioner will look for
the specific architecture type that your
pending pod needs. And if it needs Arm and there
is no capacity available in your cluster, it's actually just going to go and
deploy a new Arm Graviton2 EC2
node. But it can also do that for amd64.
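A sketch of those two requirement blocks on a v1alpha5 provisioner; the values are examples:

    spec:
      requirements:
        - key: karpenter.sh/capacity-type        # purchase options
          operator: In
          values: ["spot", "on-demand"]          # Spot is favored when both are allowed
        - key: kubernetes.io/arch                # architecture types
          operator: In
          values: ["arm64", "amd64"]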
Another capability is that you can
restrict instance selection
while keeping diversification across different configurations.
You can define the size,
the family, the generation, and the CPUs. So in this example
you don't want Karpenter to spin up
instances that are nano, micro, small, or
large; you only want medium, xlarge, 2xlarge,
or 4xlarge. You can put this specific requirement on your provisioner,
and Karpenter will always honor it. You can have multiple provisioners,
but whenever a pending pod
matches a specific provisioner, Karpenter
uses the configuration you have in place there. But you can also restrict by
Availability Zone. You can say, well, this provisioner should only deploy
new nodes into the us-west-2a and
us-west-2b Availability Zones. So you can restrict
by Availability Zone if you have a requirement that your
applications, or a set of your applications, can only
run in this environment.
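An illustrative requirements block for that kind of restriction, using Karpenter's well-known AWS labels; the exact values here are examples:

    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-size
          operator: NotIn
          values: [nano, micro, small, large]    # exclude sizes you never want
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]                          # only generation 3 and newer
        - key: topology.kubernetes.io/zone
          operator: In
          values: [us-west-2a, us-west-2b]       # restrict Availability Zones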
Another thing you can do, and this is just
another provisioner example:
you can create different provisioners. In this case it's not the default
provisioner, it's called west-zones. You can say, well, west-zones can only
deploy within these three Availability Zones in the us-west-2
region, and it can do either Spot or On-Demand.
Beyond that, this is a very simple provisioner:
it will just pick whatever instance type is the most performant and available
at the time, and it's very likely going to be a Spot instance if
one is available for you.
And you can also isolate expensive hardware.
So if you have needs, for example, for applications that
need a GPU, you can specify which instances
you want this specific provisioner to deploy. So in this
GPU case, you only want p3.8xlarge
or p3.16xlarge. But then what
you do is create a taint on those nodes.
If you're familiar with taints and tolerations, it means that only
pods that have a toleration for this specific
taint will actually go and be able to
request and be provisioned onto those
nodes. So if you don't specify on your pods or deployments
a toleration for this taint, this provisioner is not going to be selected.
That gives you the ability to have different provisioners to fit
your specific use cases. And this is all declarative,
using Kubernetes custom resource definitions, CRDs.
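A sketch of such a GPU provisioner with a taint; the taint key shown is a common convention, so treat the specifics as illustrative:

    apiVersion: karpenter.sh/v1alpha5
    kind: Provisioner
    metadata:
      name: gpu
    spec:
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["p3.8xlarge", "p3.16xlarge"]  # restrict to accelerated instances
      taints:
        - key: nvidia.com/gpu                    # only pods tolerating this taint land here
          value: "true"
          effect: NoSchedule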
So hopefully I've provided a bit of information
on Karpenter before we do the demo, but there are some takeaways
that, if you're looking to implement Karpenter, you should be familiar
with, or at least evaluate. The first one: if
your application can support disruptions, in the sense that
you have distributed your applications across multiple nodes and
Availability Zones, please use EC2 Spot instances to
optimize for cost, because Karpenter actually watches for
those Spot node interruptions and automatically reschedules
your pods onto a new instance that it provisions.
But of course, you don't want to be in that business if you only
have one pod. If for whatever reason you have a stateful application that can only run
in one pod, you probably want to avoid Spot. So it needs to be
case by case. But most of the time, if you're running stateless applications
on Kubernetes, you should be using Spot. Then
use provisioners to ensure that your node scaling and Spot usage
are implemented with best practices by default.
Like I said, you can have multiple provisioners, but you
should have a default provisioner with a very diverse set of instance types and
Availability Zones. So if you don't have specific needs, like
the GPU example, you can just let Karpenter choose what
is best for you, given the wide variety you have configured on the
default provisioner. Then you configure additional provisioners for different compute
constraints, like when you have GPUs, or jobs that you
want to run on specific instance types for performance
or architecture reasons. You create those additional provisioners
and you link your deployments to them.
And of course, you want to control your scheduling using Kubernetes
native solutions like node selectors, topology
spread constraints, taints, tolerations, and provisioners;
you integrate those into the scheduling of your
application. And of course you should use the
Horizontal Pod Autoscaler in conjunction with Karpenter,
so you have HPA focusing on the application scaling
and you have Karpenter focusing on the cluster scaling,
spinning up instances as needed for specific requirements.
And then before we go into the demo, please look
at these resources. I'm just going to go through
them quickly. First, you have the Karpenter web page,
karpenter.sh. You have all the documentation and everything available
there. You have a lot of examples, and it goes into a lot of
detail on how Karpenter works. Because Karpenter is open source,
you can also look at the Karpenter GitHub repository. If you
have an issue, feel free to just create an issue on GitHub. Or if you
need some help, the community is always there to help.
There are workshops if you want to play around with Karpenter on
your own; you have two workshops here.
The first one is the Karpenter workshop within
ec2spotworkshops.com. It goes in
depth on Karpenter, so it's a really good workshop.
And if you want something more high level, you can do the EKS workshop,
go to the Karpenter section, and play around with that.
And there is a really good 50-minute video if you
just want to hear other SMEs at AWS
talking about Karpenter; you can just click on that link.
And before I go to the demo, the only thing I want to mention is that
Karpenter currently only supports AWS as a provider,
but because Karpenter is open source, we do expect that in the future
other providers can potentially adopt Karpenter and also
make this flexible way of auto scaling
on Kubernetes available to their users. So we'll see you in a moment in the
demo. Okay, so let's jump into the demo.
I have created an EKS cluster beforehand. If you
go and take a look at the nodes that I have on my cluster, I
actually have two nodes already created. Those are nodes that
are managed by a managed node group on EKS; they are not managed
by Karpenter. I just want to start from a
clean slate, and you need a place where Karpenter itself will run.
You can actually have Karpenter deployed on a managed node group,
and that managed node group doesn't need to scale or anything like that.
If I go to the console, I can show you I have one
managed node group, with two desired instances, which are
the ones I showed, and they are up and running. And if
you look at what's currently running on these nodes,
nothing fancy: I have kube-ops-view,
which is an application that gives me stats
and a nice visualization of my nodes; the aws-node pods
on each of the specific nodes;
CoreDNS; kube-proxy; and, you
know, if I want to use HPA I need the metrics server,
so that's deployed here as well. There's also a nice tool
called eks-node-viewer. It's open source;
you can just Google eks-node-viewer. It shows in
real time the current state of my EKS cluster.
The row at the top here is the cluster aggregate. You can see
the price per hour and the price per month. And below, it's per
node: which instance type, how many pods are
running on each of them, the price, whether they are
On-Demand, and whether they're ready. As I go through and install Karpenter,
and once Karpenter actually goes and deploys things for me,
you'll see that this will keep changing.
That's why I'm sharing it with you.
So I have everything already set up; I just want to install Karpenter. Karpenter
is available as a Helm chart. I have this command
here that I'm just going to run. What is it actually doing?
It's creating the Karpenter installation for me.
I already have some environment variables and some pre-configuration
done. If you want to deploy Karpenter
with EC2 Spot integration, there is some pre-configuration
you have to do, like
creating rules for the SQS queue, for EventBridge,
and so forth. Those were already created for me. So I
have deployed Karpenter.
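The exact command isn't shown in the recording, but a Helm install along these lines is what's being described; treat the chart values as assumptions that vary by Karpenter version, and the cluster name and role ARN as placeholders:

    # Minimal sketch of a Karpenter Helm install (values depend on your setup/version)
    helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
      --namespace karpenter --create-namespace \
      --set settings.aws.clusterName=${CLUSTER_NAME} \
      --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=${KARPENTER_IAM_ROLE_ARN}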
Now if I go and look at the pods Karpenter has: Karpenter is deployed
within its own namespace, called karpenter. If I
look at the pods that Karpenter has actually deployed, I can
see that I have two. Karpenter doesn't run as
a DaemonSet; Karpenter is just a deployment. And if you run
kubectl get deployment in
the karpenter namespace, you're actually going to see that it's just a
deployment, and Karpenter works in an active-standby
approach. So there are two pods, and they are going to be running
across different nodes, but only one pod at any given
time is responsible for making those
decisions and taking the scaling actions for me. The other one
will take over as the leader if something happens to the first
one. So it's important to understand that it's a high-availability setup:
it's not using a DaemonSet, it's just a deployment with two replicas.
So what we're going to do now: we
have Karpenter installed, but I haven't created my
provisioner, and the provisioner is what is actually responsible
for making those decisions. You saw on the slide before how
you tell it which decisions you want made when there is a specific scaling
activity. And then there is also the AWSNodeTemplate
object, which is responsible for telling Karpenter
how it actually goes and talks to the cloud to scale
EC2 instances up and down in your
AWS account. I have pre-configured a very
simple example here. I'm just going to paste it and
show it to you. So what I've done so far: I installed Karpenter,
and I have deployed a provisioner and an AWS node
template. So let's just quickly look at the provisioner and see what it
tells us. I run kubectl get provisioner
default -o yaml; default is the name of my
provisioner. What this provisioner
tells me is that every node it creates
gets a label of intent: apps, and we'll
see in a moment why that is important. You can also set some
limits on your provisioner. The provisioner
keeps an account of how much CPU
and memory it is controlling, and you can define how much
memory and CPU, aggregated
across all the instances that the provisioner creates,
you want to allow. What is the limit? This provisioner will never go above
1,000 CPUs and eight terabytes of memory.
Then I provide a specific name for my provisioner; this is the default provisioner. And here I provide some requirements.
So I'm saying that for my capacity type I just
want Spot, so this is only going to use Spot.
Then I'm saying that for my instance sizes I don't want
nano, micro, small, medium, or large; I only want instances that are
actually xlarge and above. Then for
operating systems I only want Karpenter to deploy Linux,
for my architecture I only want Karpenter
to deploy amd64 instances,
and the instance categories are only C, M, and
R. I know there is a lot here; you don't need
to do all of this. If you just leave the requirements empty, Karpenter will figure
it out by itself. But I'm just showcasing how
flexible and customizable a provisioner can be.
And I'm saying that the instance generation can only be greater than
two, so it won't be able to deploy, say, an m2.
I don't even know if those are still available, but it wouldn't be able to
deploy an m2 instance if they were. And then I'm not
using consolidation in this specific example; I'm just using
ttlSecondsAfterEmpty. So 30 seconds after
a node is empty, it's going to be removed. And then there is an
interesting one, ttlSecondsUntilExpired.
You can set expiration dates on your nodes, which helps you
with lifecycle management, and if you want to roll out new AMI
updates, this is a good option. You specify
a specific number here, and after that time has expired,
Karpenter automatically creates a new instance with
the new AMI if you have a new AMI in place,
or uses the old AMI if you don't, and then
moves the pods from the
old instance to the new instance. So you always keep those instances fresh.
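Putting that walkthrough together, the provisioner described would look roughly like this; it's reconstructed from the narration, so treat the exact values as illustrative:

    apiVersion: karpenter.sh/v1alpha5
    kind: Provisioner
    metadata:
      name: default
    spec:
      labels:
        intent: apps                        # stamped onto every node this provisioner creates
      limits:
        resources:
          cpu: "1000"                       # aggregate CPU cap across provisioned nodes
          memory: 8Ti                       # aggregate memory cap (eight terabytes)
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-size
          operator: NotIn
          values: [nano, micro, small, medium, large]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: [c, m, r]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      providerRef:
        name: default                       # points at the AWSNodeTemplate shown next
      ttlSecondsAfterEmpty: 30              # remove empty nodes after 30 seconds
      ttlSecondsUntilExpired: 2592000       # example only: expire nodes after 30 days (value not stated in the talk)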
The other object I deployed is called
the AWSNodeTemplate.
If we show its YAML here,
you can see what it is
responsible for. This is the important piece here.
This object is responsible for telling
Karpenter how to actually deploy new instances:
which security group to use when deploying those instances,
which subnets to use when deploying these new
instances, and potentially which
tags you want to set on those
instances as well, right? So you can provide those
here. There are many more configurations: you can specify a
particular AMI here, you can do much more,
and you can check that out in the Karpenter documentation.
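A minimal AWSNodeTemplate of the kind described might look like this; the discovery-tag selectors are the common convention from the Karpenter docs, and the cluster name is a placeholder:

    apiVersion: karpenter.k8s.aws/v1alpha1
    kind: AWSNodeTemplate
    metadata:
      name: default
    spec:
      subnetSelector:
        karpenter.sh/discovery: ${CLUSTER_NAME}    # subnets tagged for discovery
      securityGroupSelector:
        karpenter.sh/discovery: ${CLUSTER_NAME}    # security groups tagged for discovery
      tags:
        managed-by: karpenter                      # example tag applied to launched instances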
But let's go and try
a deployment where Karpenter can go and spin up new instances
for me. So let's go here and
create a specific deployment called inflate.
So here I'm deploying
this Deployment object with zero replicas right now.
And I'm telling it, on
this deployment, that I want to select nodes with
intent: apps. If you remember, the nodes Karpenter
creates will have intent: apps. So this is just
a way of saying: please don't deploy this application on the existing two nodes;
deploy it on nodes that have this intent: apps label, and
Karpenter will actually spin up those nodes with this label. It's
just running a pause container, and I'm requesting one CPU
for each pod and 1.5 GB of memory per pod.
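This inflate deployment matches the one used in the EKS and Karpenter workshops; here is a sketch, with the pause image tag being an assumption:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: inflate
    spec:
      replicas: 0                    # start at zero; we scale it up by hand
      selector:
        matchLabels:
          app: inflate
      template:
        metadata:
          labels:
            app: inflate
        spec:
          nodeSelector:
            intent: apps             # only land on nodes Karpenter provisions
          containers:
            - name: inflate
              image: public.ecr.aws/eks-distro/kubernetes/pause:3.7   # assumed tag
              resources:
                requests:
                  cpu: 1
                  memory: 1.5Gi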
But of course, if we do kubectl get deployment, I have zero replicas, so it hasn't created
anything for me yet, right? Let's just
check... one second... there's the deployment. You
see here that I have inflate, and it
hasn't done anything. So what I want to do here: let's just go and
actually scale this up. Let's create one
replica to start with.
Now keep an eye down below;
in a moment... there you go. Karpenter saw that there was a pending
pod, because there wasn't any node that
was able to satisfy all the requirements that pod
had. So now it's creating an r4.xlarge,
and it's Spot, because remember, my provisioner only said Spot;
it was not supporting On-Demand. It tells me the price, and
it's an r4.xlarge, so you can see now that
it's actually spinning up the node. It takes maybe a minute or so
to spin up the node. While the node is coming up,
let's do one thing here: let's
look at the logs for Karpenter.
You can see yeah so you can see
that it started. Let's see where it actually says oh
sorry this is actually not so remember I
said that carpenter has two pods. I select the pod that is
not the lead is the standby one. So let's just select this 1
second let's just select this one.
There you go. Kubectl logs M.
Carpenter and let's look at the logs so you
can see that. Found provisionable pods. Oh sorry yeah,
found provisionable pods. Compute new nodes to fit pods. So he
found a pending pod, then it launched a new node
here and then it discovered the security group for my node.
It discovered the specific Kubernetes version discovered the AMI create a
launch template and launch the instance which is this instance that you see here.
And if you now run kubectl get nodes, you see that
I have a new node, which is the one down below here that ends with
145, and it has one pod. So if you run kubectl get
pods -o wide,
we see that this pod for inflate
is running on that 145 instance. But what
happens if I want to scale this deployment
a little bit more? Let's say I
scale this deployment to ten replicas; what will actually
happen? So I go and update the deployment: okay, I need
ten replicas, right? You
see that it has scheduled five pods here, and
now it says: well, ten replicas won't fit in this
r4.xlarge, they just won't, right?
So what happens in this case: Karpenter says,
okay, I need a new instance, and it looks for whatever capacity
is available that would fit the requirements. So always remember, it
looks for performance and cost, and it has spun up a
C-family xlarge. So that is spinning up, and we can actually run
kubectl get pods and see the status of those
pods. I have one, two,
three pods that were running that actually fit here.
The other two pods that are running on this
specific node are DaemonSets, the aws-node
and kube-proxy pods. Kubernetes will
deploy those automatically, but they need to be deployed
on every single instance. So now it's ready, and we
can see, if we do get nodes again, they
are running, and I have nine pods running, right? It
has all seven pods for inflate and
the two DaemonSet pods that are required for my pods to run.
But now, finally, let's scale this to zero.
Now I want to see how Karpenter behaves on removal.
Remember, I don't have consolidation enabled here,
but 30 seconds after my nodes
are empty... and you see there are two pods, but those two pods
are DaemonSets. You can see here that if we do kubectl get
pods -A and we look for all the pods, for example, that
are on 145... let's
just, apologies, bring this up again,
and if we do -o wide and look for all the pods that were
running, for example, on the node ending in 145:
they are gone, you see, now they are gone, right? Karpenter
already removed them. But if you look at what was on 145
before, you see here that this was kube-proxy, and this
was the aws-node pod, which are just the DaemonSets that
run everywhere. So you saw how Karpenter works on scale-down.
Just because I'm running out of time here, one thing I actually want
to do is enable consolidation.
So I'm going to replace my default provisioner.
What you
see on my default provisioner now is that I'm setting consolidation to true.
So you can see here, consolidation: enabled: true.
And then, one second, my computer is
a little bit slow. I'm saying that I only want On-Demand
now, I don't want Spot, and I don't want these specific
instance sizes, and that's pretty much it. And remember, once
you have consolidation enabled, you need to remove
ttlSecondsAfterEmpty.
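In provisioner terms, the change being described is roughly this; a sketch, with the rest of the spec staying as before:

    spec:
      consolidation:
        enabled: true                     # replaces ttlSecondsAfterEmpty (they are mutually exclusive)
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]           # On-Demand only for this part of the demo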
So I don't need that anymore. What I want to do now is
deploy three replicas of my inflate application here.
Remember, now I have consolidation enabled. Let's see
what the difference is, right? Okay, I have said: okay, spin up three replicas.
Karpenter said: okay, in order to spin up three replicas,
and now it's On-Demand, because my provisioner said On-Demand and
is not using Spot anymore, it found that this specific instance
is the most performant and cost-optimized for what I'm trying to
do, and it is now spinning those up. So we're just going to wait for those
to spin up.
Okay.
Remember the idea of bin packing
and fast scaling: Cluster Autoscaler scaling would be much slower than this.
You can see that it's actually pre-pulling the image and making
the cache available while the node is coming up.
So it's already doing some work before the node is even ready in
Kubernetes, and you can see those details here. And off
it goes; they are actually available. So if we quickly do kubectl
get pods, I can see that I have three pods. And if you
just wait a second, you can see that those three pods are scheduled
on 132, which is the new node. So now, finally, let's
go and do the same thing again: okay, I want ten replicas.
Let's see what Karpenter does. Right, okay.
Karpenter said: I cannot fit all the remaining
seven pods that you want me to deploy on the existing
infrastructure. So now I am deploying another instance, a C-family
2xlarge, also On-Demand.
So let's wait and see what happens there.
It takes a minute or so
to create the new instance and then deploy those containers on
that new instance. But the point I'm trying to make is that,
before the node is even ready in Kubernetes, Karpenter is already doing
all the work behind the scenes. And you can see
it all in the logs here. If you look at the logs... let's see if we get
the logs here...
one second here.
So you can see in the logs that it has deployed a new instance, and they
are all available. So if you do kubectl get pods -A
and we see all the pods, you see all the inflate pods. Now I have
pods on 106, but I also have pods on 132,
on 106 and on 132, the ones that have been deployed.
So what should happen if I now scale my application down to
six? Remember what consolidation does: it not
only removes empty nodes, but also tries to make good decisions
when it can. So I'm going to scale down to six
replicas now, instead of
ten. Six replicas. So you see that, okay, it has
now dropped to 40% utilization.
So let's see if Karpenter will make any decision here.
It might make a decision here or it might not. So let's just wait
a few moments. And it made a decision. It said:
okay, the remaining pods that you had,
the six replicas that you had, could actually all be
fit within the 2xlarge. So it removed my
xlarge instance, after, of course,
moving all the pods onto the existing node
here. Because it saw: well, you don't need to
have two instances, you can have just one. Keep the bigger instance,
move the pods to the bigger instance, and then remove the smaller instance.
So we see that. But what happens now if we go a
step further? We have this 2xlarge.
What happens if I only need three replicas now,
right? So you only need three replicas, and you can
see that only 40% of the
2xlarge is being utilized.
Karpenter should actually go and look for a cheaper instance
that can fit those pods. So let's just wait...
Oh, actually, here's what is going to happen:
yeah, there you go. You
can see that the node is now cordoned. It's waiting for a new instance,
an xlarge, to become available, because that's
much cheaper than a 2xlarge, right? So it's waiting for the new
instance to be up and running. Once that instance is up and running, it's going
to move the pods from the bigger instance to the smaller instance. And once
those pods are moved, ready, and running, it's going
to remove the bigger instance. So, long story
short, Karpenter will always be looking out for
you, for the best performance and the most cost-optimized setup,
and you can create many different things, multiple provisioners, that
fit the specific needs of your applications. But hopefully I
was able to demonstrate that. Once this finished, you see this is ready,
and it has now removed the 2xlarge. And if I look at
the pods running here, and we look,
once my page refreshes here in a few seconds,
you see that all the inflate pods are
actually now running on my specific 123
instance. So you can see them on the 123 instance, up
and running. So I just want to say thank you so
much. Hopefully the demo was useful. Please reach out to me
on Twitter and LinkedIn if you have any questions. Go ahead and
test Karpenter; I would highly recommend running Karpenter on EKS.
It can make your life much easier, more flexible, and
more cost-optimized. So I hope you had fun. Thanks for
tuning in, and have a great rest of your conference. Bye bye, everyone.