Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, and welcome to this deep-dive talk about Kubernetes operators.
Well, let's begin. Before we go forward, I think it's always a good idea to give ourselves a quick primer on Kubernetes architecture, right? How Kubernetes is constructed, how it is laid out. We all know that Kubernetes can be seen as a composition of two different categories of workloads. There is the control plane, which you can think of as the brain behind your Kubernetes cluster; that's where all the intelligence is built. And then you have the worker nodes, also called the data plane. This is where your custom applications, your custom workloads, are designated to run. Now, the control plane is actually an overarching term for a collection of different services which are native to Kubernetes, including the API server, which is kind of the gateway through which you interact with the cluster.
Then there is the controller manager. This specific component will be our area of focus as we move forward, because this is exactly where the control loops reside, and control loops are indeed the foundation for building operators. Then there is the Kubernetes scheduler, which is responsible for scheduling your pods, your workloads: finding the appropriate node or nodes for your workloads to execute on. Then there is the persistent store, the key-value store used by Kubernetes where the state of your resources is actually saved, called etcd, however you happen to pronounce it. Then there is the cloud controller manager, which is another controller manager consisting of control loops, but it deals with all things external to the cluster and can be very specific to certain cloud providers. We are not going to go too deep into what a cloud controller manager is and what its roles and responsibilities are.
However, just for awareness, so that you know there are in fact two controller managers in the control plane: the controller manager, responsible for managing and operating upon the native Kubernetes resources within the cluster, and then the CCM, or cloud controller manager. And then of course you have the data plane, the worker nodes, which is where your workloads get scheduled and executed. In Kubernetes, every worker node will also be running a couple of pretty specialized workloads. One is the kubelet, which is sort of a node agent in Kubernetes, and then kube-proxy, which basically uses iptables for various different purposes like packet filtering, source NAT, destination NAT, and to an extent load balancing the traffic when you create Services with a set of pods as their backend. So it serves a bunch of those different purposes, mostly networking related.
As we move on, let's try to build some foundation first, right? So we touched upon the controller manager. The controller manager is responsible for executing a bunch of control loops, but what are these control loops, what specific actions do they perform, and on what?
Control loops are implemented by, well, pretty obviously, controllers, and their responsibility is really to watch over the state of the specific Kubernetes API object that they are assigned to, and then, based upon the state changes they observe, perform a specific action: either make a change, or request a change and delegate to someone else, most likely another controller, to act upon it. The idea is very simple, right? It's a continuous loop which is noticing these state changes as they come in and taking a certain action. But what's the purpose of it?
The purpose is that as and when a state change happens, a control loop will try to reconcile it with the state that is actually being observed in real time in that very cluster. That is the pretty fundamental mechanic on which a control loop operates: watch for state changes, compare the desired state, which is something that you apply as a cluster administrator or as a developer deploying your workloads on that cluster, with the state that the Kubernetes control loop happens to observe in the cluster, and then take the appropriate action to reconcile the two. It could be something as simple as scaling out the number of replicas for your web server, or it could be something pretty complicated like provisioning persistent storage for your database workloads and setting up replication, backups, snapshots, point-in-time recovery, et cetera.
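To make that concrete, here is a minimal sketch of the compare-and-act step in Go. The `ReplicaState` type and the string-returning `reconcile` function are invented for illustration; a real controller would read state through the API server and issue create/delete requests instead of returning a message.

```go
package main

import "fmt"

// ReplicaState is a hypothetical, simplified view of one workload's state.
type ReplicaState struct {
	Desired  int // what the user declared (e.g. spec.replicas)
	Observed int // what is actually running in the cluster
}

// reconcile compares desired vs. observed state and reports the action
// a control loop would take to close the gap between them.
func reconcile(s ReplicaState) string {
	switch {
	case s.Observed < s.Desired:
		return fmt.Sprintf("create %d replica(s)", s.Desired-s.Observed)
	case s.Observed > s.Desired:
		return fmt.Sprintf("delete %d replica(s)", s.Observed-s.Desired)
	default:
		return "in sync, nothing to do"
	}
}

func main() {
	// The user asked for three replicas; one crashed overnight.
	fmt.Println(reconcile(ReplicaState{Desired: 3, Observed: 2}))
	// Everything matches the declared state.
	fmt.Println(reconcile(ReplicaState{Desired: 3, Observed: 3}))
}
```

The loop itself just runs this comparison over and over as state changes arrive.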
All right, moving on. Now, this is actually a pretty oversimplified view of what a control loop might look like in Kubernetes. But I wanted to keep things simple and not overwhelm everyone with all the complexities; otherwise this diagram could get really messy, trust me. The idea is, hey, we are in the very initial phase of understanding what controllers and control loops are, so let's start with something clean and simple. As we can see here, it's a pretty simple flow. You, as the cluster administrator or cluster user, apply the desired state. The control loop is executing continuously, watching those resources as they are being created, updated, deleted, or modified, and taking a specific set of actions.
So let's say you deployed a web server with three replicas. The control loop will say, hey, I see a new deployment coming in; let me go and start spinning up three identical pods which meet the criteria that you just defined through your manifest, through your YAML. And that is the whole value proposition of control loops, if you ask me. Because look, as an end user, you are not really dealing with the pretty low-level intricacies of Kubernetes.
At the end of the day, when you supply a Deployment or a StatefulSet, everything is going to translate into a set of pods, and pods into containers, and whatnot. But this is the beauty of control loops, this is the beauty of controllers: they let you work with higher-level abstractions, what are also called controller objects. So if you look at Deployments, StatefulSets, Jobs, CronJobs, all of these are basically controller objects which offer you a higher-level abstraction so that you don't have to deal with the low-level mechanics of Kubernetes, like which resources should be created when you create a Deployment. And the additional benefit
is that not only are you being abstracted away from the details and offered something simpler in order to provision the workloads, but there is this continuous reconciliation happening behind the scenes to make sure that the state of your application remains as you defined it initially, so that there are no surprises. Say you go to bed at night and one of the pods, one of the replicas of your deployment, crashes. The control loop observes that: hey, I see there's a delta, because the user wanted three replicas of this web server and I see only two. So let me go and request another replica. That's the whole value-add that control loops bring in. I have linked one excellent resource here: the book Programming Kubernetes from O'Reilly. I think it's an excellent book; if you have some Go programming background, go check it out. There's a lot of good information about the client APIs and the internal mechanics of Kubernetes, how control loops really operate. It's a very good resource if you are into Kubernetes-native development.
All right, so we touched upon the control loop. But what are the building blocks of the control loop? I mean, if you were to design, if you were to create a controller for your workloads, how would you go about creating that? There are several different frameworks, utilities, and SDKs available today for you to start writing your own operators or custom controllers. But fundamentally, regardless of what SDK or framework you choose, there are going to be three fundamental building blocks, three fundamental components of any controller: the informer, the work queue, and the events. And they all have a pretty specific assigned set of responsibilities within your controller.
The informer, as an example, watches the state changes as they propagate through the cluster; it implements the resync and reconciliation mechanism that we just touched upon. Then there are work queues, which are basically responsible for queuing up these changes and implementing retries if something fails on the first attempt to reconcile. And then there are events, which are basically the native representation of these state changes. Think about the CRUD operations that you perform on your resources: creating a new StatefulSet, creating a new Deployment, updating the number of replicas, or a replica crashing for whatever reason, with your controller now responsible for creating another one. All of these state changes, all of these modifications which take place, are represented as events to your controller.
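As a rough sketch of how those three pieces fit together, here is a toy Go version. The `Event` type, `process` function, and slice-based queue are all made up for illustration; a real controller would use client-go's informers and its `workqueue` package with rate-limited retries.

```go
package main

import "fmt"

// Event is a hypothetical representation of a state change that an
// informer would deliver (a create/update/delete of some resource).
type Event struct {
	Kind string // e.g. "Deployment"
	Name string
	Op   string // "ADDED", "MODIFIED", "DELETED"
}

// process stands in for the reconcile logic. The failures map simulates
// transient errors: an entry > 0 means "fail this many times first".
func process(e Event, failures map[string]int) bool {
	if failures[e.Name] > 0 {
		failures[e.Name]--
		return false
	}
	return true
}

// drain plays the role of the worker: it pulls events off the work queue
// and re-queues any that fail, so they are retried later. It returns a
// log of what happened, in order.
func drain(queue []Event, failures map[string]int) []string {
	var log []string
	for len(queue) > 0 {
		e := queue[0]
		queue = queue[1:]
		if process(e, failures) {
			log = append(log, "reconciled "+e.Name)
		} else {
			log = append(log, "retrying "+e.Name)
			queue = append(queue, e) // retry: put it back at the end
		}
	}
	return log
}

func main() {
	// The informer has observed two changes and queued them;
	// "web" will fail once before succeeding.
	queue := []Event{
		{Kind: "Deployment", Name: "web", Op: "ADDED"},
		{Kind: "Deployment", Name: "api", Op: "MODIFIED"},
	}
	for _, line := range drain(queue, map[string]int{"web": 1}) {
		fmt.Println(line)
	}
}
```

The key idea is simply that failed reconciliations go back on the queue instead of being dropped.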
Well, I apologize for the not-so-clear sequence diagram, because indeed there are a lot of pillars and boxes here. That's why I have linked the resource I drew a lot of inspiration from: there's an excellent blog by Andrew Chen and Dominik Tornow called The Mechanics of Kubernetes, which is very thorough and very detailed. It was written four or five years ago, so it's been out there for some time. Please go and check out how they have laid out the way Kubernetes control loops really work behind the scenes; I think it's a fantastic resource if you want to dive deep into controllers and control loops. But at a high level, what is represented here through an example is the extent of coordination that happens within a Kubernetes cluster when a state change is detected. One thing to note is that Kubernetes by design is distributed. Right now we talk about monoliths versus microservices, which is better and which is worse; Kubernetes was fundamentally designed and architected with a distributed system in mind.
So we saw the architecture, how the control plane is composed of different components, and why all the business logic is not built into a single binary: for the sake of scale. Of course, distributed architectures are capable of scaling better. There can be challenges, especially if the components require a lot of coordination between one another; it can get complicated, it can get complex. But Kubernetes is built with scalability, reliability, and efficiency in mind, and with microservices and distributed architectures in general, you get that. So again, going back to this little diagram here on the left: it is just a representation of the different control loops getting involved when you actually request a deployment to be created. Let's say I want to deploy nginx as a Deployment in a Kubernetes cluster and I want three replicas of it. As a user, I can happily apply my manifest and, well, go and fetch some coffee.
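For reference, the manifest the user applies might look something like this (the names and image tag here are just illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
```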
But let's look at the work that Kubernetes is performing through its own control loops when you perform an action such as creating a deployment in the cluster. Of course, the API server will take in your request; it creates the Deployment resource and persists it. What happens afterwards is really interesting, because after the API server you see a bunch of different controllers. There's a deployment controller, which is watching over the Kubernetes API resources of type Deployment; Deployment is a kind in Kubernetes.
Now it sees that, hey, a new deployment has just been created. What am I going to do now? I'm going to request the API server to create a replica set. So it sends out a request to the API server: hey, corresponding to this deployment, can you create a replica set? The API server says yes, why not? It creates another resource; you can think of it as a child resource of the deployment, but the resource created now is a ReplicaSet. Then there's a replica set controller, which is watching over the creation of new replica sets. It observes that there's a new replica set that has just been created, and it says: this needs three identical replicas of a certain type of container, the container here being your nginx web server. But how many do I have right now? None. So let me ask the Kubernetes API server to go ahead and create three pods, which will represent three replicas of this deployment, of this replica set. Again, the Kubernetes API server creates the pods.
After that, the scheduler, which is another component of the control plane that we saw in the architecture overview, looks at these pods and figures out: hey, these are new pods and they need to find a node in order to execute. So let me go and find the appropriate nodes for these pods. It finds them, and your pods get scheduled to run on certain nodes. Then the kubelet, which is also a kind of control loop, is looking for the pods which are designated to run on the very node where that kubelet is running. You can have hundreds or thousands of nodes in a Kubernetes cluster, and each node is going to run a kubelet. So think of the individual kubelets on these nodes as control loops of their own, which wake up periodically to check if there are any pods scheduled to run on the node on which they are executing. And if they find any, their next job is to go and, well, launch the containers. And that's how your workload comes to life. So as you can see, there's a
lot of coordination happening between different components. But the idea here is how beautifully, through the control loops, Kubernetes is providing you this automation and a pretty good degree of abstraction: you as an end user are only requesting one single API resource, a Deployment, and behind the scenes sit all the complexities, whether it is creating replica sets, then the pods, and then finding the right node to run those pods on, which depends upon the resources available on that node versus what you are requesting in terms of CPU and memory. Kubernetes does not want you to worry about all of this. It provides that automation, that intelligence, to deal with the lower-level mechanics, which is a huge plus. And that's one of the biggest value propositions of Kubernetes, and why it is so popular. Because yes, workload orchestration, workload management, all that is good. But look at the degree of automation it provides: load balancing, scaling out, auto-healing, auto-repairing, restarting crashed containers. This is all fantastic, right?
Now, if we move on, here again is a pretty difficult-to-see screenshot, but the idea is not to go deep into what this code is doing or which language or runtime it is written in. It's Go code, a snippet of Go code; Kubernetes is written in Go. And if you do not know Go or have not programmed much in Go, that's perfectly fine. But I would still encourage all of you to pick up at least some Go if you really want to take your Kubernetes expertise to the next level and maybe in the future become a contributor to one of the projects hosted under the CNCF: the majority of CNCF-hosted projects are actually written in Go, as is Kubernetes. By the way, this is the source code for the replica set controller, which we just discussed on the previous slide. This is the real code, hosted on GitHub; you can go and check it out after this talk if you want to.
And what this code is trying to do is something pretty intuitive, pretty simple. It's just looking at the desired number of replicas. If the replicas requested are more than the replicas observed in the cluster, it creates new ones; and if the replicas requested are fewer than what is observed in the cluster, it goes ahead and terminates some pods. At the end of the day, the idea is very simple.
Whatever is the persisted state in the persistent store of your Kubernetes cluster, which is etcd, is taken as the source of truth, because that is what the declared state is. And all the controller is trying to do is match that declared state, the desired state, with what is being observed in real time in the cluster. That's pretty much how all the control loops work, depending upon what type of resources they are dealing with. That's why I thought it would be useful to put this little code snippet up without worrying about whether you know Go or not, or whether your preferred programming language of choice is not Go. That's perfectly fine, because the idea is, again, not to overwhelm anyone with the mechanics of the Go language, but just to show you how a controller executes its assigned set of responsibilities when it comes to resource state reconciliation. There is another very important concept in Kubernetes to understand. So: we
discussed that Kubernetes is distributed; it's a collection of different components, which you can broadly think of as different microservices working together in a cohesive fashion. The concept here is optimistic concurrency. Now, why is optimistic concurrency important? It's very obvious that when you work with distributed systems and you have state, and the state is shared, the integrity of that state becomes very important. Now, Kubernetes is meant for running workloads at scale. That is why it consciously does not employ anything like resource locking or synchronization, because that could be a hindrance to both performance and scalability, and might actually result in higher resource usage than it should otherwise. But at the same time, maintaining the integrity of the state is equally important. That is why Kubernetes employs something called optimistic concurrency. Optimistic concurrency, to put it simply, is Kubernetes' way of dealing with concurrent writes that might affect the same resource. What Kubernetes does behind the scenes is maintain a resource version with every resource, and as and when a resource undergoes changes, its resource version keeps changing.
So as a client, when you happen to fetch the resource, make an update, and go to persist that resource through the API server, the Kubernetes API server is going to check whether the resource version has changed since you fetched it. And if it has, that request is going to be rejected, because you might be operating upon stale data. At that point, your client, let's say it's a control loop in this case (and this is what even the native control loops end up doing, as we will see), becomes responsible for handling these errors related to concurrent writes, and simply re-queues the resource to be retried later. That way you can be sure that when you retry, you fetch the resource again, and hopefully this time you get the latest version and there are no concurrent writes going on; then you make your modifications and apply them. So that is the principle of optimistic concurrency: don't do any explicit locking, don't do any synchronization; rather, rely on the resource version. And if a specific client happens to run into a problem where its request to write or change the state of the resource was rejected: re-queue, refresh, update, and then retry.
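A toy simulation of that conflict-and-retry flow, with a made-up in-memory "API server". The `store` type and its `raceOnce` flag are inventions to stand in for etcd and a concurrent writer; a real client compares the `resourceVersion` field and sees an HTTP 409 Conflict instead.

```go
package main

import (
	"errors"
	"fmt"
)

// store is a toy stand-in for a resource behind the API server: a value
// plus a resource version that is bumped on every successful write.
type store struct {
	value    string
	version  int
	raceOnce bool // when true, simulate one concurrent writer sneaking in
}

var errConflict = errors.New("conflict: resource version changed")

// update mimics the API server's optimistic concurrency check: the write
// is accepted only if the caller last saw the current resource version.
func (s *store) update(newValue string, seenVersion int) error {
	if s.raceOnce {
		s.version++ // a concurrent write landed between our read and write
		s.raceOnce = false
	}
	if seenVersion != s.version {
		return errConflict
	}
	s.value = newValue
	s.version++
	return nil
}

// updateWithRetry is what a well-behaved client or control loop does:
// on conflict, re-fetch the resource (and its version) and try again.
// It returns how many attempts were needed.
func updateWithRetry(s *store, newValue string) int {
	attempts := 0
	for {
		attempts++
		seen := s.version // "GET" the resource, noting its version
		if s.update(newValue, seen) == nil {
			return attempts
		}
	}
}

func main() {
	s := &store{value: "replicas=3", version: 7, raceOnce: true}
	attempts := updateWithRetry(s, "replicas=5")
	fmt.Printf("succeeded after %d attempts; value=%s\n", attempts, s.value)
}
```

No locks anywhere: the stale writer simply loses, refreshes, and wins on the next round.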
All right, so let's move on a little bit. Let's talk about operators now. We discussed control loops, we discussed reconciliation. Operators are basically control loops, or you can think of them as custom control loops and more. But what is that "more"? That "more" is the operational intelligence that an operator has about your workloads. Operators were first introduced in 2016, so they are not a very novel, very new concept; they have been in use for a while. And one of the founding principles for creating operators, and the whole framework to help you create your own operators, was: hey, can we codify, can we translate, all the operational knowledge that support engineers, DevOps engineers, and site reliability engineers have developed over time by operating a
So think about a database, right?
When you create a database, it's not just about start consuming
them as and when you create it.
There's a lot of operational exercises
that you would have to do to manage a production
scale, a production level database. Think about
backups, think about snapshots,
think about point in time recovery,
think. But the transaction logs, how you archive them, how you
back them up, think about high availability,
think about replicas, right? So there
is a lot of operators complexity. Now,
back in the days, maybe you had individuals who did that.
But today in cloud native ecosystem,
running your workloads at scale,
you want to automate as much as you can and reduce
this toil. It was okay probably if you had like two
or three, but you can't do this for thousands
and thousands of databases that you are running in production, right? So how
do you automate, especially if you were to run these
databases in Kubernetes as stateful workloads,
how do you basically bring about this operational excellence
through automation, so that you just
worry about creating your databases and
leave the rest to Kubernetes?
That is where operators add a lot of value, and this is exactly the purpose for which operators were created. As an end user, you work with abstractions, maybe a simple manifest which defines what your database is and provides some fundamental information. And when it comes to the day-two type of operational exercises, the operational activities, you leave it to Kubernetes. There is an excellent blog, a pretty old blog written in 2016 by a few folks from CoreOS, the company which actually created the operator framework that is pretty widely used today to build custom operators and controllers. You can go and check out this blog; it lays out the ideology behind creating operators, why we need them, and what the whole value proposition of having them is.
So any operator in Kubernetes has two fundamental building blocks, and we will discuss both of them. We already touched upon what controllers are; in this case, it's just going to be custom controllers, which you will write using the available utilities, SDKs, and libraries. But there is one more concept here, which is the Kubernetes API extensions, or custom resources.
So what are custom resources? Let's dive into that. Well, it's not a new concept; it's been out there since a pretty early version of Kubernetes, 1.7. Custom resources offer you a mechanism to extend the Kubernetes API, which is to say, to help you define your own custom kinds. If you look at Deployment, StatefulSet, Pod, these are all kinds of objects; they are all predefined kinds in Kubernetes, and Kubernetes understands them. When you deploy a Deployment through a YAML file and you mention that the kind is Deployment and the apiVersion is such-and-such, the API group and the version of the API, Kubernetes has an understanding of it; it knows what it is going to do.
Similarly, you might have a very specific type of workload, and again, I will take a database as an example. Yes, you can deploy, let's say, a Postgres database in production as a StatefulSet and be done with it. But is that the only thing that your database as a system consists of? Probably not, because there is of course the storage part, the persistent volumes and persistent volume claims. Additionally, you might want to create a couple of Services fronting these pods so that your clients can connect to the databases. You may want to define some access patterns, some database users and their roles. You might want to define some secrets and passwords to be stored either natively or outside of the cluster, depending upon your architecture. So you can see, or you can probably imagine by now, that, hey, it's kind of getting complicated, because a database in a Kubernetes cluster may not just be a single resource; it's basically a collection of different resources,
right? So how do I make it more abstract, something more generic which represents a database to the end user who's deploying it, but at the same time does not overwhelm them? And how do I let Kubernetes figure out what lower-level mechanics it has to apply in order to honor this abstraction, this resource abstraction, and get it functional and running? That's where custom resources come in. And they're pretty common: almost every Kubernetes cluster that you deal with in production today probably has some custom resource or other, deployed either explicitly by you or by the provider that you use, whether it's AWS or Azure or GCP, because custom resources and operators are how these providers deploy a lot of managed components for added value.
And if you look at all the different CNCF-hosted projects, Istio, Flux, Argo, Linkerd, they heavily use custom resources, and the idea is the same. Take Istio as an example, a service mesh. You just want to define: hey, these are the different policies that should exist, these are the different authorization rules that should exist. At the end of the day, this has to translate into lower-level constructs, and that lower-level construct could be something as simple as an iptables rule dropping the packets when one IP attempts to talk to another IP. But as an end user, you don't want to deal with iptables directly, and probably there is no way for you to deal with them directly either. So how do you go about configuring them? You use custom resources, which tell Kubernetes, in a way: hey, this means some change in the networking stack, so let me go and perform it; as a user, you don't worry about it. So that's the idea of custom resources: to help you define those higher-level abstractions that, at the end of the day, you want to offer to your own end users, but which at the same time are understood by Kubernetes, which is responsible for taking the actions at a much lower level.
But how do you go about building custom resources? You can't just create custom resources out of thin air. Every kind, every type that you deal with in Kubernetes has a schema; it has to follow a set of rules. A certain property can be of type integer versus string versus boolean, or a map, a list, a dictionary. So when you create a custom resource in Kubernetes, you actually start with something called a custom resource definition. The custom resource definition is essentially what provides you a well-defined schema for creating your custom resources.
Sometimes we call it a CRD, sometimes a custom resource definition; I think you will hear the word CRD, or CRDs, a lot throughout your Kubernetes journey. But CRDs, essentially, are what provide the schema definition for your custom resources. So the sequence would be: you write a CRD first and apply it to your cluster. Then you write a custom resource based on the CRD and apply that resource to the cluster. And when you do so, just as the API server is capable of checking native objects for correctness and for adherence to the schema, your custom resource will also be checked against its definition, whether it meets the criteria or not. And if not, your request will be rejected and the resource will not be created.
Again, here's a quick sneak peek. This is from the official Kubernetes documentation; you can go and take a look yourself. The idea here is that you have a CRD on the left; you define your API schema, and then you eventually start creating resources out of this CRD. As long as they adhere to and comply with what you have defined, you should be able to create these custom resources in your cluster.
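As a sketch in that same spirit, here is what a CRD for a hypothetical `Database` kind, and a custom resource conforming to it, might look like. The group `example.com` and every field under `spec` are invented for illustration:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    kind: Database
    plural: databases
    singular: database
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              engine:
                type: string
              replicas:
                type: integer
              backupSchedule:
                type: string
---
# A custom resource conforming to the schema above
apiVersion: example.com/v1
kind: Database
metadata:
  name: orders-db
spec:
  engine: postgres
  replicas: 3
  backupSchedule: "0 2 * * *"
```

The API server would reject a Database whose spec violates this schema, for example `replicas: "three"`, exactly as it would reject a malformed native object.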
All right, but here's the problem. In one of the earlier slides, we saw the sequence of events which happens when you create a deployment: a bunch of controllers getting invoked, acting upon it within their own capacity, and doing something, because they're aware of a certain type of resource, whether it's a Deployment or a ReplicaSet or a Pod. But what about this custom resource? Who's aware of it? Yeah, it got created, it got persisted, you can query it. But what's really happening? Technically, nothing, because the control loops that Kubernetes provides are specific to certain kinds; in that earlier case, the kind was a Deployment, a ReplicaSet, or a Pod. Now I have defined my own custom type here, and there's nothing in that cluster, absolutely nothing, which is aware of any action that can be taken when this type of custom resource is created. That's where we write custom controllers. That's the missing piece of the puzzle, and when we glue them together, that is exactly what we get with operators.
There is the custom resource that we define and create, and then the custom controllers, which are basically the control loops that are now aware of this custom resource and will implement the business logic: taking a list of actions and dealing with the lower-level mechanics of Kubernetes. Whether your custom resource demanded creation of a StatefulSet and persistent storage, a bunch of Secrets and ConfigMaps, or something else, it doesn't matter: now you have a controller who's looking for it. And the very last bullet point here is the Operator SDK. That is the utility that we will take a look at to build out a custom resource and a custom controller. There are a couple of other frameworks as well, but we will walk through the Operator SDK in this talk; it's one of the more widely used and pretty easy-to-use frameworks. We will quickly take a look at all the boilerplate that the Operator SDK provides and how it simplifies operator development in Kubernetes. So there is the Kubebuilder framework.
We briefly talked about the Operator SDK, which is part of the operator framework. It's not super important to know the Kubebuilder framework inside and out; if you happen to work with the operator framework and the Operator SDK, you will be dealing mostly with the specific toolkit that the Operator SDK provides. But Kubebuilder awareness is important, because one thing to note here is that the operator framework is actually built upon the Kubebuilder framework. The Kubebuilder framework existed first, and then the operator framework came in and made things a little simpler and more intuitive to use; but the fundamental building blocks, things like informers, workers, clients, and reconcilers, were all defined by the Kubebuilder framework. So just for your awareness: there is the Kubebuilder framework, and of course you can build an operator using Kubebuilder directly; many people do that. If you want to explore more about the Kubebuilder framework, there's a link here at the bottom. There's an excellent online book about Kubebuilder, which also has a lot of examples and tutorials for you to take a look at, so please do refer to it. And then there's the Operator SDK,
of course, right. So the Operator SDK is part of the Operator Framework.
It was, you know,
originally developed by CoreOS and Red Hat together.
And like I was mentioning, the boilerplate
stuff on the previous slide: the Operator SDK actually
uses a bunch of libraries,
like controller-runtime and api-machinery
and many others, to make
operator development easier,
right? And to take care of some
rudimentary stuff which
you otherwise, as a developer, would probably prefer not to do,
right? So scaffolding,
creating automated unit test cases,
a lot of code generation, bootstrapping, a way to
run your operator locally while connecting
to a remote Kubernetes cluster for testing purposes.
And what is more interesting about the Operator Framework:
we saw examples in the previous slides
from the ReplicaSet controller and how it was written in Go.
But that's what I was mentioning, that you don't have to kind of bog yourself
down if you do not know Go, because the Operator
SDK not only supports writing operators with
Go, you could actually write operators
with Ansible and Helm.
And the link is there at the bottom. You can refer
to the operator SDK documentation or operator framework documentation.
There are a lot of details out there, but the idea is
that if you have
a choice of runtime and it's not
Go, but rather you are comfortable writing your operator in something else like Ansible
or Helm, you could actually do that. And by the way,
if you have, let's say,
a Python programming background,
there is an operator framework
called Kopf, the Kubernetes Operator
Pythonic Framework. You can check that out as well.
So like I was saying, knowledge of Go is
nice to have, especially if you want to kind of dive deep
into some of the engineering decisions that the Kubernetes
team has made or will make in the future as the Kubernetes platform
continues to evolve. But if you are trying to extend Kubernetes,
if you're trying to create your own custom resources and operators and write your own
control loops and you do not know Go, that's perfectly
fine, because there are other options out there and people are using frameworks like
Kopf. This talk is revolving around
Kubebuilder and the Operator SDK, or the operator framework essentially.
But like I said, it's not a hard limitation.
Now, I have given some pretty basic commands
here for you to refer to. And of
course, given the format and the
length of this talk, while I won't be able to do
a live development of an
operator, these are still very handy commands,
and there's a lot of documentation and information about what these
commands are really trying to do if you refer to the official documentation
of the operator framework. But executing
these commands, and what you should expect as you
execute them, that is something we will definitely go
through in a moment. I will share my
screen with my IDE so we
can actually take a look at all the scaffolding,
all the boilerplate code generation, and what's really happening
behind the scenes. But these are some of the key commands
that you would actually need to know or need to be aware
of, starting from the initialization, where I give
a domain option. That domain option would basically become a
part of my API group, right? If you
know, API groups are basically qualified subdomain names in
Kubernetes. The repo here is nothing
but kind of a name for my Go module,
because at the end of the day my operator would be packaged
and served as a Go module. So this
github.com/acme/redis-operator,
no, it does not have to exist somewhere on GitHub.
It is just the Go module naming convention.
The second command creates the APIs and the types,
the Go types. Very important to understand as well:
the group here is cache. So when you actually create your manifest,
like how you see apps/v1 when you create a deployment,
or networking.k8s.io if you create a network
policy, when you create a
resource of type Redis, which is
basically provided in the kind option in the second command,
the API group would be cache plus the domain,
which gives cache.acme.io, and then the API
version. So if you're comfortable, if you know how
the version semantics really work within Kubernetes: we
are starting with v1alpha1, which is of course not a production-ready
API, and as we mature it, we go to v1beta1,
we go through the betas, and then we go GA and it becomes v1,
right? So that's how the Kubernetes version semantics really work.
Again, not something we need to go into details on.
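To make that composition concrete, here is a tiny self-contained Go sketch. The buildAPIVersion helper is purely illustrative (it is not an Operator SDK function); the values are the ones from the talk:

```go
package main

import "fmt"

// buildAPIVersion shows how the pieces fit together: the group passed to
// "create api" is joined with the domain from "init" to form the fully
// qualified API group, and the version completes the apiVersion string
// you would write in a manifest.
func buildAPIVersion(group, domain, version string) string {
	return fmt.Sprintf("%s.%s/%s", group, domain, version)
}

func main() {
	// Values from the talk: domain acme.io, group cache, version v1alpha1.
	fmt.Println(buildAPIVersion("cache", "acme.io", "v1alpha1"))
	// prints: cache.acme.io/v1alpha1
}
```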
And then of course there are some make commands here. The Operator SDK
uses a Makefile with some specific targets,
and each command has its own significance, starting from generating
the types, the kinds, to generating manifests,
which involves creating CRDs and some
bases and some samples and some cluster
roles and role bindings for your operator,
so that it has the appropriate permissions to operate on a specific type of
resource. And then the make install run command is
basically a utility command which
is included in the framework to help you run the operator locally.
However, you can assume that this is only for local testing
and development; as you
develop an operator for your production systems, at the end of the day,
your operator would be deployed to your cluster as
a Kubernetes deployment eventually. Right?
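The slide itself isn't visible in the transcript, but based on the options the talk names (domain, repo, group, version, kind), the command sequence is presumably along these lines, per the Operator SDK documentation:

```shell
# Scaffold the project; --domain becomes part of the API group,
# --repo names the Go module (it need not exist on GitHub).
operator-sdk init --domain=acme.io --repo=github.com/acme/redis-operator

# Scaffold the Go types and the controller for the new kind.
operator-sdk create api --group=cache --version=v1alpha1 --kind=Redis \
  --resource --controller

# Regenerate deepcopy code, then the manifests (CRD, RBAC, samples).
make generate
make manifests

# Install the CRD into the current cluster and run the operator locally.
make install run
```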
So let's do
a quick walk through. So I'm just going to
unshare this for a moment and
share my screen. Just give
me a moment here.
All right, so let's see a working example
of an operator. On the previous slide we
saw a bunch of different commands with
respect to the Operator SDK and what each and every command would
actually do. However, the idea is not to
go into the details of what I am really getting from
each individual command that I execute, because you
can very well find that level of detail in the official
documentation of the operator framework. The idea here is
to really make you understand how
these operators are going to behave at runtime.
Now, before we begin,
I just want to kind of make you aware of the directory
structure that I'm using here. So conf42-redis-operator
is my project directory. Now here you see there are a lot of subdirectories,
a bunch of files in here. There's a lot of Go code.
Let me make you understand a
couple of things, right? The operator
framework, or for that matter, if you're
using Kubebuilder, their job is
to simplify the operator development
task, and they do so by
taking care of a lot of boilerplate stuff that you
would otherwise have to write by yourself, right?
So there's a lot of scaffolding that happens behind the scenes as
you run those commands. That includes generation of a lot
of these files, Go code, markers,
annotations. Even your custom resource definition
is created based upon the
values you provide for some of those options when you do an operator-sdk
init and create api. There is also,
like I said, a lot of Go code that gets generated.
For example, redis_types.go basically
defines the Go struct representation
for my custom kind, which in this case is Redis. Again,
if you do not know Go programming or have not worked
extensively in Go, that is perfectly fine.
The idea here is not to make you
a Go expert or assume that you are a Go expert, but just
to show you the whole value proposition of
using a framework like the operator framework, right? All these Go
files, trust me, I didn't write anything from
scratch. A lot of skeleton code was created for me,
and then I happened to just kind of decorate this
code as per my needs and my requirements, and run some
of those make commands to create my
CRD, generate skeleton code for my
controller, things like that.
So if you look
right here under config/crd,
since we just mentioned CRDs,
look at this, I have my custom resource definition created,
right? This is where I have the whole
API schema for my custom resource, which is Redis
in this case. And it has taken care of specifying
all the different things here, right? The group,
the kind,
the API version, whether my type is
namespace-scoped or cluster-scoped, all these details are actually
captured. Under cmd,
I have my main.go. So if you're familiar with Go,
you know, package main, func main is always going to be your
entry point for the program to execute.
And here I see that, with the help of the controller-runtime package,
which I'm importing here, I'm able to instantiate a manager.
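The generated main.go itself depends on controller-runtime, so here instead is a dependency-free analogy of what that manager is doing, just to anchor the idea; the real manager also wires up shared caches, clients, leader election, and graceful shutdown, none of which appear in this sketch:

```go
package main

import "fmt"

// Reconciler is a stand-in for a control loop: it reacts to an event.
type Reconciler func(event string)

// Manager is a rough analogy of the controller-runtime manager: it holds
// the registered control loops and dispatches events to them.
type Manager struct{ loops []Reconciler }

func (m *Manager) Register(r Reconciler) { m.loops = append(m.loops, r) }

// Start drains the event channel and hands every event to each registered
// loop, the way the controller manager feeds watch events to controllers.
func (m *Manager) Start(events <-chan string) {
	for ev := range events {
		for _, loop := range m.loops {
			loop(ev)
		}
	}
}

func main() {
	mgr := &Manager{}
	mgr.Register(func(ev string) { fmt.Println("redis controller saw:", ev) })

	events := make(chan string, 2)
	events <- "Redis/sample-cache created"
	events <- "Redis/sample-cache updated"
	close(events)
	mgr.Start(events)
}
```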
And if you recollect the architecture slide
that we reviewed, about what Kubernetes architecture looks like,
there was this controller manager which was responsible for
running and managing a bunch of different control loops. This is exactly what
it is. This is my controller manager, and this
is where I'll be bootstrapping my custom control
loops, right? And then if I go here,
under internal/controller, I see
redis_controller.go. This is where all the business
logic when it comes to handling the resource or resources
of type Redis
goes in here, right? So what am I going to do when a resource of type
Redis is created? Of course I'm going to go and
try to find one, right? Because the event says the
resource has been created. If I'm not able to find one, maybe this
is a resource deletion. And if it is a resource deletion,
then let's see if it has got finalizers. If you're familiar
with what finalizers are meant for in Kubernetes, it's basically
to do some cleanup work before the resource
is actually deleted from the cluster. In this case, I don't have any such
complicated scenarios, but just for the sake of it, the methods exist.
So if you have a finalizer, go and honor the finalizer
before proceeding with the resource deletion.
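The flow just described — fetch the object, treat not-found as a deletion, honor finalizers, make sure the backing deployment exists, and correct replica drift — can be sketched in plain Go. This is an illustration only, with toy types standing in for the real Kubernetes API objects and client:

```go
package main

import (
	"errors"
	"fmt"
)

// Toy stand-ins for the real API types; the actual controller uses
// client-go/controller-runtime objects instead.
type Redis struct {
	Name      string
	Size      int32 // desired replicas, from spec
	Deleting  bool  // stands in for a non-nil deletionTimestamp
	Finalizer bool
}

type Deployment struct {
	Name     string
	Replicas int32
}

var errNotFound = errors.New("not found")

// Reconcile sketches the control-loop flow from the talk.
func Reconcile(redis *Redis, deployments map[string]*Deployment) error {
	if redis == nil {
		// Resource is gone; nothing left to do.
		return errNotFound
	}
	if redis.Deleting {
		if redis.Finalizer {
			fmt.Println("running cleanup before deletion")
			redis.Finalizer = false // remove finalizer so deletion proceeds
		}
		return nil
	}
	dep, ok := deployments[redis.Name]
	if !ok {
		// No Deployment yet: create one matching the desired size.
		deployments[redis.Name] = &Deployment{Name: redis.Name, Replicas: redis.Size}
		return nil
	}
	if dep.Replicas != redis.Size {
		// Drift between desired and observed state: correct it.
		dep.Replicas = redis.Size
	}
	return nil
}

func main() {
	deployments := map[string]*Deployment{}
	redis := &Redis{Name: "sample-cache", Size: 3}

	Reconcile(redis, deployments)
	fmt.Println("replicas:", deployments["sample-cache"].Replicas)

	deployments["sample-cache"].Replicas = 1 // someone scaled it down by hand
	Reconcile(redis, deployments)
	fmt.Println("after drift correction:", deployments["sample-cache"].Replicas)
}
```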
Now this is where it actually gets interesting: line 159,
where I'm actually requesting a deployment for my Redis resource.
Because like I was saying, Redis as such,
as a kind, as a type, means nothing to Kubernetes,
right? It's not a native Kubernetes object, right? I am the one
who is providing it some meaning by means of
this controller, right? But at the end of the day it has to
roll out into lower-level Kubernetes constructs. That is,
there has to be a deployment or a stateful set, there have to be pods,
there have to be containers, right? And this is exactly what I'm trying to do
here: basically try and find a
deployment if it exists, and if not, go ahead and
create one, right? There is a bit of
reconciliation logic going on here as well, where I'm comparing
the size, as in the size that
I specify when I create my Redis resource, versus
the observed state, right? How many replicas I
have, and if there is a difference, if there's a drift,
go ahead and correct that drift. So in a
nutshell, my control loop, as we can
see here, is kind of managing the whole
lifecycle aspect of a resource of type Redis
without even really revealing all these details to
the end user. And that is what I was trying to stress upon:
that when you deal with custom resources, when you deal with operators,
when you deal with custom control loops,
that's the whole value proposition of it, right? That you
can really simplify how your workloads
are actually represented to your consumers. Because Kubernetes
is hard, it's complicated, and it's probably
not a fair assumption to make that everybody
out there is super familiar with all
the lower-level details and knows the inner workings of
Kubernetes. So how can you basically abstract that out, right?
How can you basically make life simple for them without
compromising on Kubernetes best
practices, automation, and standards? Operators give you that control; they give you that way.
And you can also get very opinionated, right? Because you
can expose only the abstractions to your user,
and then you can basically build your controls in
terms of what really happens when users actually end up requesting
your custom resources, right? However you want to control it, you can specify
some specific security measures. For example:
the container should not run as root, the container
file system should be read-only, or
this specific image tag may not be used or
is not approved. So think about it, right? I mean,
how far can you go with these operators and controllers? There is absolutely
no limit. And the example here is a pretty simple one, just a
stateless Redis cache. But when you think about
more complicated workloads, like, say, a Postgres
database in production, and think about all the operational
aspects of a database,
starting from backups, recovery, snapshots,
point-in-time recovery, log archival,
everything becomes important, right? But you don't necessarily want to
assume that your end user is a database administrator,
right? So you can give them a bit of abstraction through these custom
resources, and then you basically transform
them, translate them, through your own operational
expertise in that specific field, which is Postgres, or Kubernetes
in general. How do you want the resource to be handled?
What should happen when a backup is requested? What should happen
when a point-in-time recovery is requested? Things like that.
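As a thought experiment only, a hypothetical Postgres custom resource might expose exactly that kind of abstraction. Every group, kind, and field name below is invented for illustration; nothing here comes from a real operator:

```yaml
apiVersion: db.acme.io/v1alpha1       # hypothetical group and version
kind: PostgresCluster                 # hypothetical kind
metadata:
  name: orders-db
spec:
  replicas: 3
  backup:
    schedule: "0 2 * * *"             # nightly full backup
    retentionDays: 14
  pointInTimeRecovery:
    enabled: true
```

The user asks for "a Postgres cluster with nightly backups"; the operator's controller translates that into stateful sets, volumes, backup jobs, and WAL archiving behind the scenes.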
So let's see it in action. Like I said,
I have done a lot of stuff already
before the talk. Well, for the Kubernetes
cluster I'm using a local kind cluster,
though it's not a hard requirement; you are free to experiment with
any other distribution of Kubernetes. Even if you have
a cluster at your disposal from EKS,
AKS, or GKE, please feel free to use that, as
long as you're authenticated to the cluster and you have cluster-admin rights.
For all local development and experimentation purposes,
I prefer using kind or minikube.
They're pretty good, and for this demo I'm using a
local kind cluster. So I just want to make sure that my
cluster is up and running and listening to API requests,
which it is. So great.
Now let's start from the beginning and
let us start the control loop here.
So I'll be using some specific make commands to
bootstrap my controller locally. So it's not going to run within the cluster,
but it's going to run outside as a standalone process, and it would
actually use the kubeconfig
file and its current context, which is an authenticated cluster-admin
user for my kind cluster, to make the
API requests, right? So I'm going to run this command,
make install run. This is going to
bootstrap my controller manager,
and with that my controller, locally.
All right, so it looks like my controller is up.
Now I go to this terminal here. In this directory,
I have already created some sample
manifest files. This is the file of our interest.
So if you open it and just look at what I'm really trying
to do:
just a very simple definition for
my Redis type of resource. I'm just
requesting the cluster: hey, create a
Redis cache with three replicas, right? So you could
see that I'm requesting Redis,
a resource of type Redis, right? But when
my controller sees it, it's actually going to translate it to
a deployment with three replicas. And that is exactly what we want to verify.
So let's do kubectl apply -f
sample-cache.yaml.
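Based on the group, version, kind, and size discussed earlier, the sample manifest is presumably something along these lines (the metadata names and the exact spec field name are assumptions, not shown in the transcript):

```yaml
apiVersion: cache.acme.io/v1alpha1
kind: Redis
metadata:
  name: sample-cache      # assumed name
  namespace: conf42       # the namespace checked in the demo
spec:
  size: 3                 # desired replicas; field name assumed
```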
Okay, so it says it created it. Now, this resource
was a namespace-scoped resource. So let's
verify whether there is a resource of type Redis
in the conf42 namespace.
And there indeed is.
Now let's see if it also created
a deployment in this
namespace which corresponds to the Redis resource.
And there is. You can see that there are three replicas,
all three healthy, up and running.
Now, if I go ahead and
delete this Redis
cache,
this Redis cache resource, what do you think should happen?
It should not only delete the custom resource,
but, if you remember, we saw it
here in the controller code towards
the bottom: when I am setting up the control
loop with my controller manager, I'm specifying that
it also owns the deployment
that it creates. So that is very important.
And it actually is reflected through this
line of code right here, when I'm
creating the deployment, before associating it with my Redis resource:
SetControllerReference. If I don't set it,
then this deployment will actually fall under the
control of the control loop which is
built into the controller manager inside the Kubernetes control
plane. And we don't want that to happen. We want the lifecycle
of this deployment to be managed by this custom
controller. Okay, so we delete this,
and then let's see if I'm still able to find the deployment.
No, none was found. So this way we were able to
kind of tie the lifecycle of the custom
resource and everything that custom resource rolled
out into, in this case a Kubernetes deployment, together.
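The ownership mechanics can be modeled in a few lines of plain Go. This is a toy analogy of owner references and cascading deletion, with SetControllerReference standing in for the controller-runtime helper of the same name; the real garbage collection is done by the cluster, not by your controller:

```go
package main

import "fmt"

// Object is a toy stand-in for a Kubernetes object with owner references.
type Object struct {
	Kind, Name string
	OwnerKey   string // "" means no owner
}

func key(o *Object) string { return o.Kind + "/" + o.Name }

// SetControllerReference mimics the controller-runtime helper: it records
// the owning object on the dependent one.
func SetControllerReference(owner, dependent *Object) {
	dependent.OwnerKey = key(owner)
}

// Delete removes the object and, like the cluster's garbage collector,
// every object whose owner reference points at it.
func Delete(store map[string]*Object, o *Object) {
	owner := key(o)
	delete(store, owner)
	for k, obj := range store {
		if obj.OwnerKey == owner {
			fmt.Println("garbage collecting", k)
			delete(store, k)
		}
	}
}

func main() {
	store := map[string]*Object{}
	redis := &Object{Kind: "Redis", Name: "sample-cache"}
	dep := &Object{Kind: "Deployment", Name: "sample-cache"}
	SetControllerReference(redis, dep) // tie the deployment's lifecycle to the CR
	store[key(redis)] = redis
	store[key(dep)] = dep

	Delete(store, redis)
	fmt.Println("objects left:", len(store))
}
```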
So that's on a high level how operators really work.
Of course, you can play around with it more than that: try
modifying the size, try deleting one of
the replicas from the deployment, and see what happens behind the scenes.
But the idea is that, well,
operators are there to provide you
a mechanism to apply the operational
knowledge that you have about a specific type of workload in
a Kubernetes-native way. Right? You could
do it in maybe several other different ways. We took
the database example, and of course those
who have been in DBA roles, they understand pretty well
what it means to take periodic backups,
full backups, partial backups, snapshots, and whatnot. Right?
But at the end of the day, when you look at these systems
running at large scale, and especially in Kubernetes,
if Kubernetes is offering you a way to eliminate this operational
toil and codify this operational excellence,
this operational knowledge that you have garnered over the years by operating
this specific type of workload, you're going to benefit.
So that's all I had for this talk.
Thank you so much for joining, and I hope you
enjoyed this talk. Again, thank you
so much. Have a good day.