Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, welcome to this Conf42 Cloud Native conference talk. My name is Ara Pulido, I'm a technical evangelist at Datadog,
and today I'll be introducing a project called Gatekeeper
and how you can use it to embrace policy as code in
Kubernetes. That's my Twitter handle.
In case you want to reach out after the conference,
feel free to do so. Let's get started by introducing
Kubernetes. Obviously this is a cloud native conference, so many of you
will know it already, but just in case: Kubernetes is a container orchestration platform
that helps you run containerized applications in production. And it comes with
a lot of nice features: it's completely API driven,
it has auto healing, auto scaling, et cetera.
And Datadog. Obviously this is not a talk about Datadog itself, but just
so you know, it's a monitoring and analytics platform that helps companies improve
the observability of their infrastructure and applications.
But today we are going to be talking about what is sometimes a bit of
a dry topic, which is policy. But what is policy when we are
talking about software services? Basically, policies
are the rules that govern the behavior of a software service.
So basically, what you can and cannot do with
a software service; in the case of Kubernetes, what you can and cannot
do in your cluster. And when we are talking about Kubernetes
and what you can and cannot do, that sounds a lot
like RBAC. RBAC stands for role-based access
control, and that's basically what it already does:
it helps you define roles describing what a user or a service account
can and cannot do in a Kubernetes cluster. You usually
have rules in the form of a subject, a Kubernetes API
resource, and a verb. So you can say things like: my
user ara, for the pods resource type,
can create, get and watch those resources in a
particular namespace. So if we already have this,
why do we need something else, like Gatekeeper? The reason
is that authorization, which is what RBAC
tries to solve, is just a subset of the type of policy rules that
you want to create for your environment. To give a couple
of examples of things you may want to enforce in
your cluster: only run images coming from
a particular registry, or have a set of labels that are
mandatory for all your deployments and pods. These are the type of things
you may want to define somehow, and RBAC
doesn't allow you to do so. So now, think about our cloud native environments.
In this case we are probably using Kubernetes as the orchestrator,
but we also have many other things in our stack.
We may have cloud resources, API gateways,
service meshes, stateful applications, et cetera. So you not only
want to create policies for your Kubernetes cluster, but also for
the rest of your cloud native resources. Is there a way
to do that in a common way? That is exactly what
OPA, the Open Policy Agent, tries to solve.
So basically, OPA is a cloud native project,
completely open source. And the idea of
OPA is that you decouple the
policy decision making from the policy enforcement.
How does that work? OPA is
only going to get a policy query in JSON format, so very
standard, very domain agnostic. Based on some
rules that you have coded in a specific domain language
called Rego, and some data that you may or may not have stored
as well, it computes a policy decision, again in JSON format.
And that's the only thing OPA is going to do;
that's how it completely decouples the decision from
any particular enforcement for a service.
So now you have a decision, but how do you enforce that decision
in your service? The way you do that with OPA is by using
any of the integrations it has. If you go to the OPA website,
there is a full list of all the integrations; this slide is just
a screenshot, so there are many more. And because it's all JSON,
it's very easy to create new integrations, and there are more
coming all the time. And this is exactly what Gatekeeper
is: Gatekeeper is one of these integrations, one that
enforces OPA decisions for Kubernetes.
The way Gatekeeper is built, it basically embeds
OPA inside the Gatekeeper binary itself.
And the way it enforces decisions is by using
something called admission controllers. When you make an
API request against the Kubernetes API server, it goes through
several steps. It goes through authentication: is this request
authenticated or not? Then it goes through authorization,
usually in the form of RBAC rules. And then it goes through
a third step, called admission controllers. Admission controllers are a
set of binaries, already embedded in the Kubernetes API server,
that process the request and decide two things:
whether it is a valid request,
and whether you want to mutate that request. And there are two
particular ones that are very helpful,
which are the validating admission webhook and the mutating admission webhook.
Through those two you can hook any code, via webhooks,
to act as an admission controller.
And this is exactly what Gatekeeper is doing. It
hooks through the validating admission webhook for now;
they are also working on doing the same with the mutating one.
So once you have the policy decision
from OPA, Gatekeeper basically enforces it at that point.
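Hooking in through that webhook is itself just a Kubernetes object. As a rough, illustrative sketch (the names, path, and settings below are assumptions based on typical Gatekeeper installs, not taken from this talk; the real manifest ships with the Gatekeeper install YAML), the registration looks something like:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: gatekeeper-validating-webhook-configuration  # illustrative name
webhooks:
- name: validation.gatekeeper.sh
  clientConfig:
    service:
      name: gatekeeper-webhook-service   # Service fronting the controller pods
      namespace: gatekeeper-system
      path: /v1/admit                    # endpoint that receives AdmissionReview requests
  rules:
  - apiGroups: ["*"]
    apiVersions: ["*"]
    resources: ["*"]
    operations: ["CREATE", "UPDATE"]
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Ignore                  # fail open, so a Gatekeeper outage does not block the API server
```

The API server sends every matching request to this endpoint, and blocks the request if the webhook denies it.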
So if the request that you're making goes against one
of the rules that you have encoded using Rego,
then it's going to block that API request at that point.
One of the great things about Gatekeeper is that it was
created with Kubernetes in mind, so it's
fully Kubernetes native. And by Kubernetes native I mean
that everything is created through
new CRD objects, custom resource definitions. And
custom resource definitions are a design pattern that is used
all over Kubernetes to extend the Kubernetes API.
So you can create new objects in the Kubernetes API and
then have a controller that runs the classic reconciliation loop
in Kubernetes that we all love. So you're able to create your policy
by creating new Kubernetes objects, and the Gatekeeper controller
is going to take the actions required for
that to happen. There are three or four CRDs
that get created, but the two main ones are these two,
the constraint template and the constraint. The constraint template
is where you define your policy. And the good thing about constraint
templates is that you can define parameters for them, so you
can create a reusable policy by creating
a single constraint template and then instantiating that policy
into as many constraints as you want. We are going
to see how that's done in the demo, but just to give an
example: let's imagine that we want a rule asking
for a required set of labels on our objects.
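A constraint template for that rule would look roughly like this; this is an abbreviated sketch (the full, tested version, including the Rego body elided here, lives in the Gatekeeper Library):

```yaml
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels        # the new kind you can instantiate
      validation:
        openAPIV3Schema:
          properties:
            message:
              type: string
            labels:                    # the parameters a constraint can set
              type: array
              items:
                type: object
                properties:
                  key:
                    type: string
                  allowedRegex:
                    type: string
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8srequiredlabels
      # ... the Rego that checks the required labels goes here ...
```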
So you have a name, required-labels;
you have the parameters, like the labels that you're going to
require on a particular object; and then a chunk of Rego. We
are not going to go into much detail on the Rego
syntax in this talk, because the goal of this talk
is for you to see that Gatekeeper
can be used straight away, even if you
don't know Rego to start with. Once we have that template,
we can instantiate it into as many rules as we
want. In this case, we have one rule saying all namespaces require
the gatekeeper label, but we also have another rule that says
all pods in the default namespace require the
do-not-delete label. So as you can see, just by writing
the Rego code once, you can reuse that policy many,
many times. That makes policy reuse with Gatekeeper
super simple. And the good thing is that usually,
when you want to start creating policies in your Kubernetes cluster,
you will probably want the same set of rules
that many other people are going to create as well. Things like:
images can only come from approved registries. That's a classic one.
Deployments require a set of labels; container images require
a digest; all the containers that you define
have to have CPU and memory limits set. These are very common things
that you want to do in Kubernetes. Obviously the values are
going to differ from one company to another, but the generic rules
are very similar. So for that reason there is an open source
project, part of the OPA organization, called the Gatekeeper Library.
The community is creating all these reusable policies
that you can use out of the box, even if you
don't understand Rego. So you can start
getting value out of Gatekeeper very, very easily.
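Instantiating one of those library templates is just a short manifest. A sketch of a required-labels constraint could look like this (names and values here are illustrative):

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels          # kind created by the library template
metadata:
  name: all-namespaces-must-have-gatekeeper
spec:
  match:
    kinds:
    - apiGroups: [""]            # core group, where Namespace lives
      kinds: ["Namespace"]
  parameters:
    labels:
    - key: gatekeeper            # the label key every namespace must carry
```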
There are many constraint templates there, and more
are coming every day. To give an example of what you would encounter
in that repo, this is one of the templates.
It's about allowing only HTTPS ingresses,
and not HTTP. So you have the template and
you have this Rego code. You can try to understand the Rego code;
you can use it to start learning Rego.
But even if you don't, it's super simple to use, because it
has a description, it has a name, and it has parameters (or not).
And in that same repo you're going to get not only the
template, you're going to get an example, or several
examples. The examples come in the form of an instantiation
of that template, so a constraint, and also, based on that constraint,
an object, in this case an Ingress object, that is going to
fail that validation. So very easily you can browse all
these examples in that repo and understand first how the
template is used and what type of objects may fail it.
This is the GitHub repo for the Gatekeeper
Library. As I said, it's part of the OPA CNCF
project, completely open source, and easy to use. Another
good thing about Gatekeeper that I like a lot is that it
comes with out-of-the-box observability. It exposes a
lot of metrics: the number of constraint templates and constraints,
the number of requests and their latency,
the number of violations, et cetera. We at Datadog also have
an out-of-the-box integration, so if you're using Datadog as well,
without having to do much, you will get an out-of-the-box
dashboard to start with. You will get all
your metrics back into Datadog, and we will see that as well
during the demo. Good, so let's start with the demo.
By the way, I have an alias, k, for kubectl,
so every time I type k, that's what
it is. This is a single-node cluster; it's Minikube,
very simple, and good enough for this demo.
And I'm running Datadog already here.
So I'm already running Datadog, sending data to Datadog, and
I'm not running anything else. Obviously I have some pods in kube-system
as well. So these are my pods, and we can have a
look at them in Datadog
as well. As I said, I'm already running it, so I have here my deployments,
my replica sets, et cetera. So let's
deploy Gatekeeper; that's the first thing that we need to do. We are
going to deploy Gatekeeper just using the
default Gatekeeper YAML that comes with the getting started guide,
so I'm doing everything here by default. And you can see
a lot of stuff has been created.
We can see that some CRDs were created,
four in this case. We also have a
new namespace, gatekeeper-system, that has
basically two things: the controller, in this case with three replicas,
because this is going to be
used to validate your policies, so it's always good to have more than one replica (but you
can define how many), and an audit
pod, and we are going to explain later what that one is
for. So once it's running.
So if we now exec
into the Datadog agent pod, let's see
if this works. You can see that
the Datadog agent has found
the Gatekeeper pods straight away, so it's going to start
sending all these metrics directly, without us having to do
anything else.
So let's see if everything is running here.
Cool. Everything is now running, and we are going
to be sending those metrics as well. Okay. Now that we have
it all running, I'm going to use some of
the templates from the Gatekeeper Library. The reason
I'm doing this is because I want to show you again how
easy it is to reuse these things that come out of the box.
So I'm going to use this. As you can see, this is
part of the Gatekeeper Library; I just cloned
it from the GitHub repo, so
nothing changed there, and I have a lot to pick from
here. I'm going to be using the required-labels one, and you
can see that there is a name for a new
object, there are some properties
to parameterize it, and some Rego
code that is already tested and validated for me.
So the first thing I'm going to do is apply that template,
so I get the new CRD
and have the template available for me. So let's find
it: library, general, required
labels, and then the template.
I just have to apply it, and it's going
to create this new constraint template object. But it's not only
going to do that. If I now list the
CRDs, you can see it has created a new object type,
a new kind of object, which is the required-labels one, that now I
can instantiate as many times as I need, just using the
same Kubernetes native format. The good thing about these being
CRDs is that you can store them with the rest of
the configuration that you have for your cluster, using GitOps,
et cetera. So let's do that. As I
said, every one of these templates comes with examples,
so let's check the example we have here. It's basically a constraint
that says all namespaces require
an owner key, an owner label, with a
set of accepted values. But instead of using it as is, I'm
going to copy this one and change it a
little bit, so you can see how easy it is to reuse these things. So, going
back to the terminal, I'm going to copy
that one.
Library, general, required labels,
samples, all-must-have-owner. I'm going
to copy this, and I'm going to change it
to conf42, for example.
Now that I have that, I can edit it.
I'm going to change the name
to conf42, and
instead of namespaces,
this one is going to be for pods. I'm going to change
the error message that I will get if
the label isn't found. And instead of asking
for a value, I'm just going to say I only need the key,
so I'm going to change the name of the key to conf42,
and I'm going to
remove the allowed values. Okay. The next thing I have to
do is apply it.
Okay, that has been created already. And basically
this rule is telling the cluster: all of our pods require this
conf42 key. But we already have some pods running; we already have
the Datadog pods, the kube-system pods, the Gatekeeper
pods, and none of those had that label. So what happens in this
case? This is where the audit pod comes in; remember that one of
the pods created for Gatekeeper was the audit
one. Basically, that pod is going to check for violations that
already exist when you create new rules. But instead of removing
those pods, it's just going to give you a description, so you can fix
them afterwards. That makes it very easy
to create new rules without having
to alter your cluster right away. So how do we know
those violations are happening? If I describe
the constraint, this all-pods-must-have-conf42
one, something is wrong here. So let me see; probably I made a
mistake. Yes: this has to be singular,
which is important, otherwise it doesn't work. So it's Pod and not Pods.
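After those edits, the constraint I'm applying looks roughly like this (a renamed copy of the library's all-must-have-owner sample, as adjusted in this demo):

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: all-must-have-conf42       # renamed copy of the sample
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]               # singular kind; "Pods" will not match
  parameters:
    message: "All pods must have a conf42 label"
    labels:
    - key: conf42                  # key only, no allowedRegex value
```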
So let's do that again. Let's apply the constraint;
it has been configured. And now, if we do this,
hopefully we will get the violations, once
the audit has synced a little bit.
Let's just wait.
Okay,
here we are. So we now have all the violations,
all the pods that are violating that rule,
which is all the pods that were already running. If we go now to
Datadog, as we said, we are sending all that
data back to Datadog, so we can see the latency of
the webhook requests, the number of requests, and also the
number of violations here. So we had those 14 pods
that were violating. This is also a very nice way, if
you're the one in charge of enforcing policy in your cluster, when
you create new rules, to check straight away how many violations
you have, and start reducing them by modifying the objects
that you already have running in your cluster. Good.
But that's for existing objects; what happens with
new ones? Let's imagine that we now
have this rule in place that says all pods must
have the conf42 label.
So we have this new object: we are going to
create a pod for NGINX, very simple. And we are not
going to add the label that is required by my organization,
because, say, I don't know about it. So I'm going to try to
create that object, and it's going to fail. It gives me
a failure coming from Gatekeeper, explaining
why my pod couldn't be created.
So if I now modify this and
set conf42 (and I can put any value, because we are not requiring a value,
so let's just put april), and now
I try again, now it allows me
to create it. So you can see that I was already able to create
policy for my pods by reusing one of the examples
that I got from the Gatekeeper Library, which makes things very simple.
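For reference, the fixed manifest is just the same pod with the label added; a minimal sketch:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    conf42: april        # any value passes; the constraint only requires the key
spec:
  containers:
  - name: nginx
    image: nginx
```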
Okay, let's go back quickly to the slides, and then
we will do a second demo. The case we've seen
was simple, because we said all pods
must have this particular label, so you only needed the information
in the object you were planning to create to be able
to make the decision. But let's imagine that we want to enforce
this other type of rule in
our cluster: all hosts must be unique among all ingresses.
So if you have many ingresses, you have to make sure
that the hostnames are unique across all of them.
For that, having only the information of the object
that you're planning to create is not enough; you also need the
information about all the rest of the Ingress objects that you have in your cluster
in order to answer that question. Remember, when we were
talking about OPA, we said it uses two things to
answer a policy query: it uses the policy written in
Rego, but it can also have some data stored
as well. So how am I able, in Gatekeeper,
to add new data that is going to be used for that decision
making? It's super simple, because again, it's Kubernetes
native. Basically, what I need to do is create a config
object, which is a new object type, and then let Gatekeeper know
all the types of objects that I want it to store as
data to make those decisions. In this case, Ingress objects.
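That config object is a small manifest; roughly like this (the API group and Ingress version may differ by Gatekeeper and Kubernetes release):

```yaml
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config                   # Gatekeeper expects this exact name
  namespace: gatekeeper-system
spec:
  sync:
    syncOnly:                    # object types Gatekeeper should cache as data
    - group: "networking.k8s.io"
      version: "v1"
      kind: "Ingress"
```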
So let's do a quick demo about that as well.
So we are going to use an example
again. Let's go here,
and we have this one: ingress host.
We have a template, as with any other
one. As I said, this is the example
I was talking about: all Ingress hosts must be unique.
It doesn't have parameters, because you don't need them; it just has
the Rego code, which, again, we don't need to
understand. Okay, so let's create it:
this is unique ingress host.
And again, we have a template here. So let's
apply the template first. Then I'm
also going to use this sync object,
and this sync object is going to tell Gatekeeper
all the stuff that it needs to store to make those
decisions. Again, this is here as part
of the Gatekeeper Library, so even the sync objects that
are required are in the examples. So let's apply
that one. As soon as I apply this object,
Gatekeeper is going to start storing any Ingress object that I create,
so it can then be used to make a decision. Let's go to the
examples. And, as usual, as I said,
the examples come with a constraint; that's the first thing
that we need to create. In this case,
the constraint is pretty simple, because it doesn't have any
parameters. So it just applies to any ingress that we create.
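Because there are no parameters, the constraint is little more than a match section; a sketch of it (field values follow the library sample):

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sUniqueIngressHost       # kind defined by the library's template
metadata:
  name: unique-ingress-host
spec:
  match:
    kinds:
    - apiGroups: ["extensions", "networking.k8s.io"]
      kinds: ["Ingress"]
```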
And we also have examples
of things that are not allowed. In this case you can
see that we are going to try to create two Ingress objects.
The first one is going to use a host
that is unique, and because it's the first Ingress object that we have created, it is
going to be unique, so that's good. For the second
one, we are going to try to use the same hostname,
so it's probably going to fail at some point. So let's try that.
Okay, you can see that the first one was created successfully.
And even though it was super quick, I created the first
one and then the second one, and Gatekeeper already had that information
stored. So it created the first one
successfully, and when it tried to create the second one, it checked
the sync objects that it already had and said:
okay, there is already an ingress with that same hostname,
so I'm going to block this. So again, you can not only
create policies with the information about the object that you are creating,
but also in relation to the objects that are already in your cluster.
So that's all I had for this talk.
I hope you learned about Gatekeeper if you didn't know it. It's a fantastic
project that makes it super simple to start using policy
as code in Kubernetes. So check it out, and
thank you very much.