Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi everyone. Today we're going to talk about Kubernetes schema validation.
We're going to understand what exactly the schema validation process is,
why it's important, and which tools, both native
and open source, you can use to do it.
But first, let me introduce myself. My name is Eyar Zilberman.
I'm leading the product and I'm a co-founder at Datree. I'm also
one of the organizers of the biggest GitHub community in the world,
right here in Israel. I hate SQL, but I do
love regex, and I'm also fully aware that I
have pink bunny ears. Yes, let's put it on the
table. It's a gimmick. But I promise you, if you stay
until the end of this presentation, you'll understand exactly why I'm
doing it. So Datree is an open source tool
for developers or engineers, and it can help you prevent
Kubernetes misconfigurations. The way that it
does it: it's a CLI tool, so it's integrated inside your
CI/CD workflow and it runs automatically. Every time
that someone makes a change to your Kubernetes manifests,
it runs to make sure that no misconfigurations
are introduced. When someone makes those changes,
it verifies them against predefined policies that the
organization sets, to make sure that
everyone is aligned on the same policy. So that's
Datree. But enough about Datree. Let's talk about Kubernetes schema validation.
So what is it exactly? I like to refer to it as a
set of unit tests that verify that your manifests contain the correct properties,
keys and values. Think about it. Let's take
this example. It's a subset of a
Kubernetes manifest, and it has two misconfigurations inside the file.
One, the word Deployment needs to start with a capital
D. Two, the key namespace should have
a lowercase s.
So the schema validation should catch those kinds
of misconfigurations. And you're probably wondering
who is actually writing all those unit tests. It's the community.
You can actually open a pull request in the Kubernetes
project if you have some kind of unit
test or another validation that you want to add to the schema,
and it will be checked, and it can also be accepted.
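
As a rough illustration of the unit-test analogy, here is a toy check that flags the two mistakes from the example above. It is not the real Kubernetes schema: the `ALLOWED_KINDS` and `ALLOWED_METADATA_KEYS` sets and the `schema_errors` helper are hypothetical and cover only a tiny hand-picked subset.

```python
# Toy stand-in for schema validation; the real Kubernetes schema is far larger.
# ALLOWED_KINDS and ALLOWED_METADATA_KEYS are hand-written assumptions.
ALLOWED_KINDS = {"Deployment", "Pod", "Service"}
ALLOWED_METADATA_KEYS = {"name", "namespace", "labels", "annotations"}


def schema_errors(manifest):
    """Return a list of human-readable schema problems in a manifest dict."""
    errors = []
    kind = manifest.get("kind")
    if kind not in ALLOWED_KINDS:
        errors.append(f"unknown kind: {kind!r}")
    for key in manifest.get("metadata", {}):
        if key not in ALLOWED_METADATA_KEYS:
            errors.append(f"unknown metadata key: {key!r}")
    return errors


# A manifest with both mistakes from the example: a lowercase kind
# and a "nameSpace" key with an uppercase S.
bad_manifest = {
    "apiVersion": "apps/v1",
    "kind": "deployment",
    "metadata": {"name": "my-app", "nameSpace": "default"},
}

for problem in schema_errors(bad_manifest):
    print(problem)
```

Each rule behaves like one small unit test: it either passes silently or reports the exact property that is wrong.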
So what is not part of the schema validation?
Everything that is related to making sure that it's a valid YAML
file, or to making sure
that it actually follows best practices. For example,
making sure that you have memory and CPU limits. Yeah, those are
mandatory best practices, I know,
but they're not really necessary in order to apply your manifest,
so they're not part of the schema validation. It will also not check
your team or organization policies. For example,
a really common one: make sure that all the images are
pulled from a private registry. It's not part of the schema validation,
it's part of a different flow. So how do
I use the schema validation? Good news, you don't need
to do anything. It's activated by default every time that you
try to apply a configuration to a cluster. Bad news,
that's probably going to be too late. You're going to see those
misconfigurations only when you're trying to apply. You want
to see them much, much earlier in the process, when
they are introduced into your configuration. So how do
I shift it left? Well,
kubectl has a flag for that. It's called dry-run,
and it has two modes, client and server, and you can run it
before you do kubectl apply.
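
For reference, the flag is spelled `--dry-run=client` or `--dry-run=server` on `kubectl apply`. The sketch below wraps that call in Python; it assumes `kubectl` is installed and on the PATH, and `deployment.yaml` is a placeholder file name, not something from the talk.

```python
import subprocess


def dry_run_command(manifest_path, mode="server"):
    """Build a kubectl apply command that validates without persisting."""
    if mode not in ("client", "server"):
        raise ValueError("mode must be 'client' or 'server'")
    return ["kubectl", "apply", f"--dry-run={mode}", "-f", manifest_path]


def dry_run(manifest_path, mode="server"):
    """Run the dry run; returns True when kubectl exits with status 0."""
    result = subprocess.run(
        dry_run_command(manifest_path, mode),
        capture_output=True,
        text=True,
    )
    # Print whatever kubectl wrote to stderr when the dry run fails.
    if result.returncode != 0:
        print(result.stderr.strip())
    return result.returncode == 0


# "deployment.yaml" is a placeholder file name.
print(" ".join(dry_run_command("deployment.yaml", mode="server")))
```

Calling `dry_run("deployment.yaml")` before the real `kubectl apply` gives a pass or fail answer without changing anything in the cluster.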
So what's the difference between those two modes? Let's check.
Of course they both perform schema validation.
Makes sense. Server mode
also performs extra validations. Later in
the presentation I will go deeper into exactly which kinds of validation
it performs, but they're only supported in server mode. They're not supported
in client mode because those validations are not part of
the schema. Neither the server mode nor
the client mode supports different Kubernetes schema versions.
So let's think about a use case for that. Let's say that your Kubernetes
version is 1.18 and you want to upgrade to version 1.20.
You want to make sure that all the manifests that you already have
don't contain something that's going to be deprecated in version
1.20. The downside with this flag is
that you can't do it. You can only check against a predefined
schema for a specific version. In this case you can only check
against version 1.18.
How about requiring a connection to your cluster? In
server mode you have to have a connection. Makes sense.
And in client mode you also need to have a
connection. Wait, what? Why?
It's a bug. You're probably thinking to yourself, oh,
that's a nice bug, it's going to be fixed really soon. The answer is no.
It's a bug, and this bug has actually been open for a long time.
Last time I checked, they had more than 1,000 open bugs
on the Kubernetes project. So don't have high expectations
on that. So this makes the client mode
unnecessary. I don't see any reason to use it.
This was the only reason to actually use the client mode, because
it's not supposed to require a connection to your cluster.
So the question is, what should I do?
Open source to the rescue. We have two great projects
that can help you verify your
Kubernetes schema in offline mode. One of
them is called Kubeval, the other one is called Kubeconform.
They both work in the same way.
Inside the Kubernetes project you have a Swagger
file. It's an OpenAPI definition of all the schema
definitions. Then there is another process,
performed by both of those projects,
that converts this Swagger file into
separate JSON schema files.
I'm not going to get into why they do it and why it's needed,
I'm just going to mention it. Then
the CLI tools, Kubeval and Kubeconform, both work
the same way. They check your
manifests against those schemas when
you run them offline. So this is why you don't need to have a
connection: they actually have their own local copy of the schema
definitions inside those repositories, after converting them from
OpenAPI into JSON schema.
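
To make the conversion step a little more concrete, here is a minimal sketch of splitting a Swagger document into one schema per kind. `SWAGGER_EXCERPT` is a hand-written fragment in the shape of the Kubernetes Swagger file, and the `kind-group-version.json` file naming is an assumption for illustration; the real conversion also resolves references between definitions, which this sketch skips.

```python
# Hand-written excerpt in the shape of the Kubernetes swagger.json; the real
# file has hundreds of definitions with full property lists.
SWAGGER_EXCERPT = {
    "definitions": {
        "io.k8s.api.apps.v1.Deployment": {
            "type": "object",
            "properties": {"kind": {"type": "string"}},
            "x-kubernetes-group-version-kind": [
                {"group": "apps", "kind": "Deployment", "version": "v1"}
            ],
        },
        "io.k8s.api.core.v1.Pod": {
            "type": "object",
            "properties": {"kind": {"type": "string"}},
            "x-kubernetes-group-version-kind": [
                {"group": "", "kind": "Pod", "version": "v1"}
            ],
        },
        # Helper type that is not a top-level kind, so it gets no file.
        "io.k8s.api.core.v1.Container": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
        },
    }
}


def split_into_schemas(swagger):
    """Map one file name per top-level kind to its schema definition."""
    schemas = {}
    for definition in swagger["definitions"].values():
        for gvk in definition.get("x-kubernetes-group-version-kind", []):
            # Assumed naming scheme: kind, first part of the group, version.
            parts = [gvk["kind"].lower(), gvk["group"].split(".")[0], gvk["version"]]
            filename = "-".join(part for part in parts if part) + ".json"
            schemas[filename] = definition
    return schemas


for filename in sorted(split_into_schemas(SWAGGER_EXCERPT)):
    print(filename)
```

Once the schemas exist as separate files, a validator only needs the file that matches the manifest's kind and API version, which is what makes the offline check possible.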
So which one should I use? kubectl server
mode or the open source tools?
Let's do a QA. So this is a manifest,
and inside this manifest there's an invalid label value. You see
the app label? It's
not valid. And if you try to apply it, you get this nasty error
basically saying that you can't start your label value with
a dash; it needs to start with an alphanumeric character, either a letter or
a number. So if
you try to run it with kubectl server
mode, it will catch it. If you try to run it with one of the
open source tools, it will not catch it. The reason is that, remember
we talked about extra validations that are performed on the server side?
This is one of those extra validations.
You expect it to be part of the schema, but it's not, it's another validation
that happens separately. I really don't know what the reason for that is,
and if someone has an answer I will be more
than happy to know why it is like that.
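
One way to close this particular gap offline is to apply the label value rule yourself. The sketch below uses the pattern that the Kubernetes error message prints for label values, plus the 63-character limit; the `invalid_labels` helper and the `-my-app` value are made up for illustration, since the transcript does not show the actual manifest.

```python
import re

# Label value rule as printed in the Kubernetes error message: empty, or up to
# 63 characters that begin and end with an alphanumeric character, with
# dashes, underscores and dots allowed in between.
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")
MAX_LABEL_VALUE_LENGTH = 63


def is_valid_label_value(value):
    """Check one label value against the length limit and the pattern."""
    if len(value) > MAX_LABEL_VALUE_LENGTH:
        return False
    return LABEL_VALUE_RE.fullmatch(value) is not None


def invalid_labels(manifest):
    """Return the metadata labels whose values break the rule."""
    labels = manifest.get("metadata", {}).get("labels", {})
    return {k: v for k, v in labels.items() if not is_valid_label_value(str(v))}


# Hypothetical manifest with a label value that starts with a dash.
manifest = {"metadata": {"labels": {"app": "-my-app", "tier": "backend"}}}
print(invalid_labels(manifest))
```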
So let's do another QA. Okay, take two.
This manifest is missing an image. You see that
I have the name and I have the pod, but I don't have the image
itself.
If I try to apply it with dry-run
server mode, kubectl will catch it.
The open source tools will not catch it, again for the same reason:
they don't perform those extra validations that you expect to be
part of the schema. Same thing. I don't know why not.
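
A check like this is also easy to add next to the offline tools. The following is a minimal sketch that only handles a bare Pod manifest; the `containers_missing_image` helper and the sample manifest are hypothetical, and a real check would also need to look inside the pod templates of Deployments, Jobs, and so on.

```python
def containers_missing_image(pod_manifest):
    """Return the names of containers in a Pod manifest that have no image."""
    spec = pod_manifest.get("spec", {})
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    return [c.get("name", "<unnamed>") for c in containers if not c.get("image")]


# Hypothetical Pod manifest: the second container has a name but no image.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo"},
    "spec": {
        "containers": [
            {"name": "web", "image": "nginx:1.21"},
            {"name": "sidecar"},
        ]
    },
}

print(containers_missing_image(pod))
```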
So who is the winner in this case? If you
have a connection to your cluster, of course kubectl is
the winner, but in the
majority of cases you don't. And if this
is also the case for you, and you don't have a connection
to your cluster, because you either don't want your developers
to have a direct connection to the cluster, or you don't want your CI
machines to have a connection to your production environments,
then necessarily Kubeval
and Kubeconform are the winners, if this is your use case.
But which one is better, Kubeval or Kubeconform?
We talked about the version support use case.
The thing is that Kubeval's last commit was in 2020.
That means that the latest
Kubernetes schema version that it supports is 1.18.1,
which is a pretty big shame, because
Kubeconform supports up to version 1.22.4,
which is the latest right now. You remember the use case
that I described, where you want to check your
manifests before you upgrade your cluster? With Kubeconform
and Kubeval, you can pass the version as a parameter and have
them tell you locally if there are any APIs that are deprecated
in the new version that you want to upgrade to. It's really, really nice.
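
A sketch of what passing the version looks like for each tool. The flag spellings below (`-kubernetes-version` and `-strict` for Kubeconform, `--kubernetes-version` and `--strict` for Kubeval) are as I understand them from each tool's help output, so confirm them against the version you have installed; `manifests/` and the `offline_validation_command` helper are placeholders.

```python
def offline_validation_command(tool, manifest_path, kubernetes_version):
    """Build an offline validation command for a target Kubernetes version."""
    if tool == "kubeconform":
        # Kubeconform uses single-dash flags.
        return ["kubeconform", "-kubernetes-version", kubernetes_version,
                "-strict", manifest_path]
    if tool == "kubeval":
        # Kubeval uses double-dash flags.
        return ["kubeval", "--kubernetes-version", kubernetes_version,
                "--strict", manifest_path]
    raise ValueError(f"unsupported tool: {tool}")


# Check existing manifests against the version you plan to upgrade to.
# "manifests/" is a placeholder path.
print(" ".join(offline_validation_command("kubeconform", "manifests/", "1.20.0")))
```

Strict mode is included because, without it, unknown properties such as a misspelled key may be accepted silently.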
CRD support. Kubeval does not support
CRDs. Actually this is
exactly the reason why the maintainer of Kubeconform, a guy
named Yann, he's a really cool guy,
started this project: Kubeval
was missing CRD support. And today CRDs
are more and more popular, so it's a pretty big shame that Kubeval
does not support them. Let's look at
the community. Kubeval has a really big community,
more than 2,000 stars, more than 200 forks.
Again, the last commit was on April 26,
which indicates that this project is not
maintained that much. Kubeconform
only has 340 stars,
which is a lot, actually. But the good news is
that it is still maintained by the maintainer and by the
community. The last commit was eight days ago. So in
my opinion, yes, maybe Kubeval is more popular and more
people are familiar with Kubeval, but I do think that Kubeconform
still has a strong enough community to consider.
So who is the winner in this case? Well, you probably
guessed it already, it's obviously Kubeconform.
Okay, so now that we understand the different tools that
we have, let's think about different strategies to validate the Kubernetes schema.
The first thing that you want to consider is shifting left.
Like we said, you want to make sure that you can test as soon
as possible. Sometimes even the CI
is too late in the flow. Try to think how you can implement those
checks inside the developers'
environments, for example with a pre-commit hook, a Git hook, or something like that.
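
One possible shape for such a hook, as a sketch: a script that collects the staged YAML files and hands them to Kubeconform. It assumes `git` and `kubeconform` are on the PATH and that every staged YAML file is a Kubernetes manifest, which will not hold in every repository; the function names are made up for illustration.

```python
import subprocess


def staged_manifests(git_output):
    """Pick YAML files out of `git diff --cached --name-only` output."""
    return [
        line.strip()
        for line in git_output.splitlines()
        if line.strip().endswith((".yaml", ".yml"))
    ]


def main():
    """Validate staged YAML files; a non-zero return value blocks the commit."""
    diff = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True,
        text=True,
        check=True,
    )
    files = staged_manifests(diff.stdout)
    if not files:
        return 0
    # Assumes kubeconform is installed; -summary prints totals at the end.
    return subprocess.run(["kubeconform", "-summary", *files]).returncode
```

To use it as a hook, save it as an executable `.git/hooks/pre-commit` file with a Python shebang line and end the file by exiting with the value returned from `main()`.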
Don't forget to fill the gaps. What do I mean by that?
Remember the extra validations that we talked about, the ones that
happen on the server side? Those validations also
need to be checked when you are using
the open source tools, because you don't want to find out
that you have misconfigurations only when you apply to your
cluster. There's an alternative
that you can consider, which is
running a fake cluster locally
or in your CI machines with kind or minikube or something
like that. Then you can use the kubectl
dry-run server mode, because it will talk to the local cluster,
and this way you can still use this flag and you
don't need to have a connection to your real environment. The downside
of doing it like that is that you actually need
to make sure that your fake cluster is synced
with your real environment and actually replicates the same
conditions. What do I mean? It needs to have the same namespaces,
the same number of clusters and so on. Otherwise, what will happen
is that when you try to apply the manifest, it will get
rejected by minikube or kind.
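
A sketch of the throwaway-cluster flow with kind, listing the commands in order. It assumes `kind` and `kubectl` are installed; the cluster name `schema-check` and the helper name are arbitrary, and the `kind-<name>` context is the one kind creates for a cluster. Any namespaces or CRDs your manifests rely on would still have to be created in between.

```python
def local_cluster_commands(manifest_path, cluster_name="schema-check"):
    """Commands for a server-side dry run against a disposable kind cluster."""
    # kind names the kubectl context "kind-<cluster name>".
    context = f"kind-{cluster_name}"
    return [
        ["kind", "create", "cluster", "--name", cluster_name],
        ["kubectl", "--context", context, "apply",
         "--dry-run=server", "-f", manifest_path],
        ["kind", "delete", "cluster", "--name", cluster_name],
    ]


# "deployment.yaml" is a placeholder file name.
for command in local_cluster_commands("deployment.yaml"):
    print(" ".join(command))
```

Pinning the context explicitly keeps the dry run from accidentally talking to whichever cluster happens to be the current one.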
Also something to consider is buy versus build. There are a
lot of open source tools you can use to overcome
those challenges, but there are also a lot of really good
off-the-shelf solutions that you can just buy. Yes, Datree
is also one of them, and in Datree
we also perform the schema validation, but we also help you fill the
gaps with custom rules. So it's something that you can consider.
There's no dark magic in what we are doing. You can do the same with
different open source tools that you glue together, but think about
whether it's worth your time and effort to do that, or just buy something off
the shelf.
That's it. I hope you enjoyed the presentation. And now
let me reveal why I actually have those pink bunny ears: it's because
this is the name of the repository, and inside this repository
you're going to find this presentation. If you want to see this presentation,
or all the other resources that I talked about, you can find
it all inside. And probably,
if I gave it another name, you would forget it.
But with the pink bunny ears, I hope that you'll think: I remember
the bunny that told me that I need to do schema validation. What is the
name of the repository? What is the name? Oh yeah, it's pink bunny ears. And
I already checked, this is the only repository all over
GitHub that has this name. So I hope no one else
is going to open another repository with the same name.
So you can just Google it and you should be able to
find this repository. And the last thing is that I also wrote a blog
post about the same topic, and everything that I discussed
is also listed there, so you can check it out, and we also
put a link inside the repository. Thank you very much and I hope
you enjoyed it. Oh, also, by the way, if you have any questions you can
open an issue inside this repository. I'm a watcher and I
always reply if someone wants to have more information about
schema validation.