Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello. Hi, Conf42. Hi, thank you so much for coming. It's so great to be here. It's like our second Conf42. Yeah, our second Conf42, I guess. I don't know.
Yeah. So thank you very much for coming. I'm Shimon. And I'm Noaa. And yeah, let's get straight to it. So what are we going to talk about today? Today we're going to talk about centralized policy management at scale. At scale.
But first, let's talk about us. So my name is Noaa Barki, I'm a full-stack developer at Datree and also one of the leaders of the GitHub Israel community, which is the largest GitHub community in the whole universe. Amazing. And my name is Shimon, I'm one of the co-founders and the CEO of Datree. I'm an AWS Community Hero, I live and breathe DevOps and infrastructure, and this is why we're here.
So today we're going to talk about how to manage centralized policy at scale. We're going to show it to you through the eyes of a Kubernetes administrator who tries to manage the policies for the organization. But I think it can be applied not only to Kubernetes; if you're a serverless organization or any other kind of organization, it applies too. So what we do at Datree: we see a lot of organizations that deal with policies, because we help prevent misconfigurations from reaching production. That's what we do, policies are how we roll, and we have an open source CLI that runs policies. We just passed 5,000 GitHub stars. Yeah. Woohoo.
Yeah. So we're really happy but let's get into it.
So today we want to talk about how to avoid
these misconfigurations.
Yeah. And I really like to say that, as a developer at Datree, we do policies for a living. But what I do is not only to understand Kubernetes, how it works and what policies we want; it's also to understand how you can blow up your own cluster. So I really understand why organizations need centralized policies, and this is what we are going to talk about today. And the real question,
the main question is how you can prevent the next
misconfiguration, the next production outage. And after
a lot of thinking, a lot of thinking,
100 postmortems, if you saw that talk,
we summed it up into three major steps.
So, step number one, or step number zero? Actually, the first step. How many fingers does a developer have? Four? Zero, one, two... So the first step is: meet OPA, your policy engine. OPA is a general-purpose policy engine. It gives services the ability to decouple decision-making logic from policy enforcement. You can basically think about OPA like a super engine: you write your policies and publish them into it, and then whenever you execute it with an input, any JSON input, OPA will check whether it violates any of the policies you published. You can talk to it using an API or import it as a library, and it will evaluate the business logic of whether it should allow something to pass or not.
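To make that concrete, here's a minimal sketch of querying OPA when it runs as a server; the package path and the input fields are illustrative, not taken from the talk:

```bash
# minimal sketch, assuming OPA is already running as a server on localhost:8181
# and a Rego package named kubernetes.admission has been published to it
curl -s -X POST localhost:8181/v1/data/kubernetes/admission \
  -H 'Content-Type: application/json' \
  -d '{"input": {"kind": "Deployment", "metadata": {"name": "my-app", "labels": {}}}}'
# OPA answers with a JSON decision document, e.g. {"result": {...}}
```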
Yeah, and there are many different use cases, not just checking configuration; there are many authorization use cases that OPA is used for, and other things.
Yeah, but the real beauty of OPA, the real magic behind
OPA is that OPA enables you to
offload and unify all your decision making logic
into a dedicated server. Yeah. So you can decouple your application logic from your decision-making logic, the decision of whether a user can perform an action, a delete, let's say. You offload it to a different service that does all of this calculation, and then you don't need to build this policy logic inside every one of your microservices.
Yeah. And this really empowers
admins in the organization to have control over
their system. Yep. So moving forward,
by the way, I'll just say that OPA is part of the CNCF foundation. It is a graduated project. We really recommend checking it out and using it. Yeah. So to use OPA,
it's very simple. First you need to integrate with OPA: you can use OPA as an embedded package inside your project if you're using the Go language, or as a host daemon. You write your policies in the Rego language, which we'll talk about later, and you query OPA by sending an HTTP request with the input, and OPA will do the rest. Or you can use it as a library and just call it directly.
Yeah, and this is an example of the Rego language. It's a declarative language and it's very easy to learn, from experience I'll tell you that. And this is an example of a policy written in Rego, which reports a violation if a Deployment resource doesn't have the app label. It's very straightforward, you can mark my words.
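For reference, a policy of the kind being described might look roughly like this in Rego; a minimal sketch with illustrative field names, not the exact policy from the slide:

```rego
package main

# minimal sketch: report a violation when a Deployment has no "app" label
deny[msg] {
  input.kind == "Deployment"
  not input.metadata.labels.app
  msg := sprintf("deployment %s is missing the 'app' label", [input.metadata.name])
}
```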
Yeah, it's a nice declarative language and there aren't many loops and stuff like that. It's like, what you see is what you get, which I guess is what they were going for. Yeah, it's more like SQL than JavaScript, I always like to say. But let's move on to step number one, which is: define your policies.
Cool. So defining the policies is very important. So I
remember when I was an engineering manager for 400 developers
and one developer made a mistake and it propagated to
production and we had an outage and it's okay, it happens. I also make mistakes
all the time. But I thought to myself, what can we do, how can we prevent the next outage? At first we tried sending emails and stuff like that, which obviously doesn't work. So we said, okay, we need some sort of framework, some sort of guardrails that everyone will work by, and we call them policies. Number one, you need to define what your policies are. For example, make sure
that every Kubernetes workload has a memory limit and a CPU limit, has a liveness probe and a readiness probe, and every Docker container has a health check. So that's a policy. And now you want
all of your microservices to follow this policy. So number one,
define the policy. Let's say all workloads should have memory limits.
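As a concrete sketch, a memory-limit policy like that could be expressed roughly like this in Rego; this is illustrative, not Datree's or the talk's actual rule:

```rego
package main

# minimal sketch: require a memory limit on every container in a Deployment
deny[msg] {
  input.kind == "Deployment"
  container := input.spec.template.spec.containers[_]
  not container.resources.limits.memory
  msg := sprintf("container %s has no memory limit", [container.name])
}
```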
And number two, define granular policies. Just having one top-level rule that tries to cover everything, I don't think that's good enough, because let's say you put a four-gigabyte memory limit, but then you have an AI workload that needs 50 GB. So start from the top with broad policies and then go deeper and deeper into more granular policies. Amazing. So now that you have defined
the policies in your organization and you know what you want to enforce,
the real question is how you integrate the policies inside your pipeline. And this
is very crucial. I really want you to think about where in
the pipeline you want to enforce the policies.
This decision will affect the developers and the DevOps engineers in
your organization. So the first option would be to
integrate the policies in the CI pipeline.
If you want to enforce those policies in the CI
pipeline we really recommend you to
use conftest which is also built on top of OPA.
So conftest is an open source tool which helps you write tests against any structured file: YAML, JSON, XML, Dockerfile, pretty much anything. Of course, as I said before, it's built on top of OPA, so all the policies should be written in Rego. And another amazing thing, a really awesome thing about conftest, is that it allows you to push and pull your policies to and from Docker registries; they're not only about containers anymore. And using conftest is really straightforward: you download conftest, write your policies in Rego, and then run them against a specific file using the conftest test command, as you can see here, and you will see the violation output.
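For reference, that flow looks roughly like this from the command line; the file names are illustrative:

```bash
# minimal sketch: write a Rego policy and run conftest against a manifest
mkdir -p policy
cat > policy/deployment.rego <<'EOF'
package main

deny[msg] {
  input.kind == "Deployment"
  container := input.spec.template.spec.containers[_]
  not container.livenessProbe
  msg := sprintf("container %s has no livenessProbe", [container.name])
}
EOF

# conftest looks for policies under ./policy by default
conftest test deployment.yaml
```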
Yeah, you can really think about it as a unit testing library. And as you can see here, we used a GitHub Action just to hook conftest into our pipeline: we used docker pull to pull the conftest image and then we ran conftest test with this path, and pretty much that's it.
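A GitHub Actions job along those lines might look roughly like this; it's a sketch, and the image tag and paths are illustrative rather than the exact workflow from the slide:

```yaml
# minimal sketch of hooking conftest into CI with a GitHub Action
name: policy-check
on: [pull_request]
jobs:
  conftest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run conftest
        run: |
          docker pull openpolicyagent/conftest:latest
          docker run --rm -v "$PWD":/project openpolicyagent/conftest:latest \
            test --policy /project/policy /project/kubernetes/deployment.yaml
```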
That's conftest, pretty straightforward. Yeah, very simple. I really like conftest as a developer. It made a lot of sense to me.
But what if we want to integrate our policies
in the cluster? Yeah, so I'm a big believer in
shift left and I believe we as developers,
we want to find problems as soon as possible in the pipeline.
But then still sometimes you want to make sure that your runtime is
also secure. And I don't know, maybe someone kubectls something
into your cluster or I don't know what. So if you want to make sure
that your policies are also enforced and checked
on the runtime environment on your Kubernetes cluster,
you can use Gatekeeper. And Gatekeeper is a great utility,
also part of the open policy agent
GitHub project, and it uses the
admission controller webhook of Kubernetes. And it is much like an operating system hook. So imagine you're an operating system, there is a process trying to run, then the operating system calls the antivirus and asks it, hey Mr. Antivirus, can this executable run? And it answers yes or no. So it's the same thing: you kubectl apply a resource to Kubernetes, then the admission controller webhook calls Gatekeeper, Gatekeeper runs a policy check and says, you cannot push this, it does not have a memory limit, go fix your deployment. Then it rejects the deployment and the developer has to fix it. And this way you achieve runtime protection. Yeah,
Gatekeeper has a lot of other options that you can
configure. You can have it in audit mode, you could have it in test mode; it's really cool, it's a really cool project.
So how does it work? You define a constraint template, which says, okay, for every resource that comes in, we have a constraint. For example, this one is about labels and it's very simple. I won't go line by line with you, but basically you check the metadata and see if there is a specific label. Let's say you want to have a cost center, so you want a namespace and a label on every resource. And then, once you apply that, you write a policy, you send a resource to it, and then you ask: okay, this resource and this policy with the constraint, can this be applied to the cluster or should it be rejected?
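Concretely, the two Gatekeeper objects being described look roughly like this, a ConstraintTemplate and a Constraint requiring a cost-center label; the names and the label are illustrative, not the exact example from the slide:

```yaml
# minimal sketch of a Gatekeeper ConstraintTemplate plus a Constraint that uses it
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg}] {
          required := input.parameters.labels[_]
          not input.review.object.metadata.labels[required]
          msg := sprintf("missing required label: %v", [required])
        }
---
# the Constraint applies the template to Deployments and sets the required label
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: deployments-must-have-cost-center
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
  parameters:
    labels: ["cost-center"]
```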
And this is how it works. So it's very simple. Yeah,
but it's not the same policy. I mean, if you
decide that you want to use both Gatekeeper and conftest,
you will have to write the same policies, almost the same policies.
One for Gatekeeper, where the cluster will store those policies, and one for conftest. Yeah, both of them are written in Rego and they're almost identical, just a bit different. And with Gatekeeper, Gatekeeper will run inside your Kubernetes cluster and the policies will live there. But you can almost use the same policies.
Yeah, but you have it like twice. You have two
instances of the same policy. Okay,
amazing. So using conftest and using Gatekeeper, you can practically protect yourself, protect the entire pipeline. Totally. So it's great. You integrate it directly within your source control and you can protect projects from dev to production. Yeah. So the next step is how
do you control, review and monitor what you've done?
For instance, as I said, in my previous role we had 400 engineers and like 1,000 git repos. So let's say you define the policy, which is work by itself; you need to think, which policies do we want? Which policies can there be? And then you need to integrate it into each one of your builds. But then let's say
I want to make a change and I want to introduce a new policy.
So what do I do? I open 1000 pull requests.
So if I'm using raw solutions like Gatekeeper and conftest, this is exactly what you need to do. If you use a solution like Datree, it comes built in. But just so you know, it is important, number one, to be able to dynamically change the policy. It is important, number two, to have full control and visibility into which policies ran on which resource, what was rejected, what passed, what your status is. So it's sort of like having a command and control solution. And those are things that don't really come built in with OPA and Gatekeeper and conftest, and this is where Datree is complementary. So we come with predefined, battle-tested rules that you can just use out of the box, you can write custom rules, and you also get enterprise-grade control and management, so you can oversee everything that happens and dynamically assign and change policies on the fly without having to change the code itself. So how does it work, Noaa? This is
really straightforward. All you need to do is install Datree and execute datree test with the path of the files that you want to test, and Datree will show a full output with guidelines on how to solve every violation and where it occurred.
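That looks roughly like this in practice; check the Datree docs for the current install command, and the file path here is illustrative:

```bash
# minimal sketch: install the Datree CLI and run it against a manifest
curl https://get.datree.io | /bin/bash

# check the file against the configured policies and print any violations
datree test ./kubernetes/deployment.yaml
```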
As you can see here. And it's free, it's open source, and since it's my teammates' and my code, I encourage you to submit a pull request. We'll try to get to it in time, and we promise we read everything. Yeah.
Cool. So thank you very much, Noaa. Thank you very much, Shimon. Thank you very much. Thank you, Conf42. Bye.