Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello and welcome to the Kubernetes
Security Workshop with M9sweeper. I'm your host, Jacob Beasley.
I'll be teaching you a lot about Kubernetes and Kubernetes security.
I've been using Kubernetes since 2016, so that's a long time, kind of from the
beginning. I've got experience building and supporting software in just about every
tech stack you can think of, from Java to .NET to Rails to Python
to PHP. I'm a Certified Kubernetes Security Specialist and
Administrator, and I lead a team of site reliability engineers
that has deployed hundreds of applications to Kubernetes. That same team
also supports the open source project M9sweeper,
which you're going to be learning a lot about. It's essentially a Kubernetes security
platform that takes all of the tools the Cloud Native Computing Foundation recommends,
all the things the Linux Foundation recommends, and gives you a user interface
to manage all of those tools. All right, so before
we get too deep into this, I want to talk about the four C's of
cloud native container security.
When we think about security in Kubernetes, we think of it as
a layered security model, starting with the cloud. The cloud is really the physical infrastructure
and the way in which you manage the physical infrastructure. So how do you go
from your bare metal to your virtual machines, which Kubernetes
ultimately runs inside of? And that's really important because you have to think about things
like network security and physical security. That's largely out of
scope of this presentation because it isn't really anything specific to
Kubernetes. The next layer is the Kubernetes cluster itself. So we'll
be looking a lot at how you secure that cluster. Then we'll look
at the container. The container in Kubernetes is the unit of
work. How do you make sure that the container isn't running with too many privileges
and that you don't have applications that can escape or do bad things in that
container? And then finally we have the code, and the code is the code
that actually runs your application and
we have different ways of validating that code doesn't have any obvious
vulnerabilities. We're going to be doing a
number of labs in the process of this presentation. So you're going
to see demonstrations of kube-bench, kube-hunter, OPA
Gatekeeper, Kubesec, Trivy, and Project Falco.
By the time we finish this presentation, these aren't going to be foreign concepts
for you. They're going to be very real things that you'll be able to use.
Now if you want to follow along with the lab, you can click on the
View Lab Guide link in the PowerPoint. You can also
go to killercoda.com/m9sweeper. Killercoda is a great
resource for you to go spin up a Kubernetes cluster rapidly and try out
different things in Kubernetes, and we have an M9sweeper
lab that you can click on here that will walk you through setting up
an M9sweeper cluster inside of Killercoda and using every one of these
tools that we'll be talking about today. It's a great way to get introduced to M9sweeper
and try it out. All right, let's get started talking about
the cloud. So what is Kubernetes exactly?
Kubernetes is a container orchestration engine. What does
that mean? With Kubernetes you can have many nodes, which are typically where
your applications run, and you can describe
to Kubernetes what you want to deploy and then it will make
it so. You typically do that by talking to its API using
various command line tools and saying, here's what I want to deploy.
And then it will plan out, using various
controllers like schedulers that are part of this API (or connected
to the API, technically), where those things should go, and then it will
make it so. On each of your nodes you have the kubelet and kube-proxy.
The kubelet talks to the API and says, what should I
deploy? And then it deploys it. And kube-proxy
does all the networking for you. This is important because
when you talk about Kubernetes security, you'll see I've circled the API in red.
The majority of our efforts revolve around making sure that
access to that API is limited and that users can't deploy
bad things with that API. Here are some general
best practices that aren't related to the configuration of
the cluster itself, but they're things that are really important,
right? So one of them is don't expose your API on the Internet. So typically
people will put it behind a VPN or at minimum have some kind
of IP address whitelist that just significantly reduces your footprint.
If Kubernetes ever had a major zero day vulnerability,
if you were behind a VPN, it's much harder for someone to find your
cluster, connect to it and then exploit it.
So far there haven't really been those, but if it ever did happen,
you want to be behind a VPN. Number two, don't make everybody
an administrator. So by default, if you use a
lot of the different cloud managed Kubernetes environments,
it's very easy to say grant someone administrative access to that
environment, but you really want to limit that to only people who
really need to have administrative access. It's very helpful that
most of the clouds have some kind of managed Kubernetes service with Active
Directory or some other kind of identity provider integrated, where you
can say this particular Active Directory group is granted access to this particular
namespace or this particular role in Kubernetes. So keep
that in mind. And the third one, I'll go back a little bit,
I kind of brushed over this, but you'll see the API connects to this
thing called etcd. etcd is a form of database,
and it's where all the cluster data is stored. And that's
important because you need to control access to
that etcd data store the same way you would anything else.
It's a file on disk, and if somebody had
access to connect to that VM and modify the
contents of that directory, modifying etcd directly,
then theoretically they could cause a lot of problems. A good
example of that would be if they were to delete things,
they might be able to break your cluster, or they might be able to create
new things in Kubernetes, effectively bypassing the API.
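Beyond filesystem and network access controls, one common mitigation worth knowing about is encrypting Secrets at rest in etcd. As a sketch, the API server accepts an EncryptionConfiguration along these lines (the key value is a placeholder you would generate yourself):

```yaml
# Passed to kube-apiserver via --encryption-provider-config;
# encrypts Secrets before they are written to etcd.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded 32-byte key>   # placeholder
      - identity: {}   # fallback so existing plaintext data stays readable
```

On managed services you often don't control the API server flags, but the clouds generally offer an equivalent, such as customer-managed keys for etcd encryption.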
So that's something important that I want to point out. All right,
moving on to the cluster. So the first thing I want to talk
about is role based access control. Role based access control
simply means preventing people from having too much
access to the cluster. I think we've all had that case of
access denied errors when using Kubernetes, but you really
should set up roles in Kubernetes. I want to talk about a
few vocabulary terms. First of all, cluster roles and
roles. The cluster roles are
for all namespaces, the whole cluster, whereas roles are for
specific namespaces. Namespaces are how we segregate different
teams within your company or different workloads. Once you have a role,
you then bind it to something. So you might say bind it
to a group, to a user, et cetera. So there's cluster role bindings
and role bindings. Users are not actually a thing that
exists in Kubernetes directly. Rather,
Kubernetes has integrations
for external user stores, identity providers.
So you typically bind to a group, you can even bind to a user,
but you still have to have an identity provider hooked
up to Kubernetes to allow that person to authenticate.
Finally, service accounts allow applications to communicate with the Kubernetes API.
So we have the ability to create service accounts in Kubernetes and then
you can generate certificates for those and somebody can use that certificate
to connect to Kubernetes. Here's an example role.
So here we have kind: Role. We have some information about what namespace
it is in. Remember, roles are namespace scoped,
whereas cluster roles are global. And you can see what APIs
they can connect to. So if you want to know
which APIs are available, look it up in the docs; it's not too bad.
But here in this example you can see they can get, watch, and list
pods with the pod-reader role in the
default namespace. That's an example role, but before you
can use it you have to bind some user to it. So here's an example.
A user or group, right? So in the role binding,
the subject is Jane, a user named Jane;
it could also be a group. And then the roleRef is
a reference to the role that it's going to be related to. So the pod-reader
role that we created is bound to Jane.
So now Jane is a pod reader. If you're wondering how to
bind to things like active directory groups,
check the documentation for your cloud providers. They generally have great examples and you pretty
much just copy paste from the examples and set up your roles as needed.
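The pod-reader example we just walked through looks roughly like this in YAML (this mirrors the standard example from the Kubernetes docs; names and namespace are illustrative):

```yaml
# A namespaced Role granting read-only access to pods.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
  - apiGroups: [""]        # "" means the core API group, where pods live
    resources: ["pods"]
    verbs: ["get", "watch", "list"]
---
# Bind the role to the user Jane in that same namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
  - kind: User             # could also be Group or ServiceAccount
    name: jane
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

Binding to a group instead is just a matter of changing the subject's kind to Group.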
Next we're going to look at a couple of tools, kube-bench and kube-hunter.
And I do want to back up and say, well why can't I just give
you a checklist of things that you should do and you should just follow
it? Why do I need tools to help me secure kubernetes?
The answer is that Kubernetes is pretty sophisticated, and
for every other major technology, whether it's Linux or Windows,
there is an established set of benchmarks that you should follow. And most
companies that are successful securing Windows or Linux
or Kubernetes will simply use the right tools to make sure
they're following best practices. So we're going to be looking at kube-bench,
which runs the Center for Internet Security's (CIS)
benchmark suite.
CIS has its own benchmark tooling, but it's a paid product,
whereas kube-bench is open source and made by Aqua Security, a great
company. So kube-bench is what we recommend;
they're members of the Linux Foundation,
it's a CNCF project, it's very well respected.
So kube-bench will connect to
your cluster and run a battery of tests and give you feedback
on how secure your cluster is. kube-hunter will do a pen test.
It can even try to exploit things, although by default you run
it in a passive mode. So let's demo these. Now you can
run all these tools from the command line, but the way that we like to
do it is we have M9sweeper installed in our cluster, and
when M9sweeper is running I can see my list of clusters and
I can click on kube-bench in the left navigation
and I can see every time that I've run it.
If you want to set up kube-bench in your cluster, you can click this Run
Audit button and it'll help you figure out how
to install it. We've created helm charts, which are open
source, and you can say things like run it every day, run it every week,
or you can just run it one time and be done, so it's all automated.
Here, take this, put it in a pipeline, hit go and you've got
yourself a benchmark running. We typically recommend
running a benchmark every day to make sure that configuration changes
don't drift. Or if you do a cluster upgrade, you don't suddenly
have things configured in an insecure way. Here you
can see we ran a worker node test, and
you can see it ran and it checked a number of different things. So is
the worker node configured correctly? Is the kubelet configured correctly?
And then I can click down and see details about the test
and what happened. I'm just going to find one where I got a
warning. Here it is, 3.2.11: ensure that the
RotateKubeletServerCertificate argument is set to true.
It's not set to true, and it gives me advice on how
to remediate it. Now, when I'm using a managed service, I can't always change all
of these. I could dig into it. In this case I'm using Azure Kubernetes
Service, so I might not be able to fix everything. But you can see this
is pretty good. So that's kube-bench.
Next I want to look at kube-hunter. This is a very similar tool, but it
does a penetration test, meaning that it actually boots up an application
inside of Kubernetes and asks, if I'm an app running in Kubernetes,
do I have access to do bad things? So you can see it
ran and it has some advice for us.
It says that by default Kubernetes is injecting a
service account token into the container that it's running in, and that's not
necessarily good. A service account is how your pod can communicate
with the Kubernetes API, and a lot of people recommend turning that off by
default so that it only gets injected if a pod actually needs to
connect to Kubernetes. That is a smart move.
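As a sketch, opting out of token injection can be done on the service account or on the pod itself (the namespace and pod names are illustrative):

```yaml
# Disable automatic token mounting for every pod using this service account.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: my-namespace          # illustrative
automountServiceAccountToken: false
---
# Or opt out per pod, for pods that never need to talk to the API.
apiVersion: v1
kind: Pod
metadata:
  name: no-api-access              # illustrative
  namespace: my-namespace
spec:
  automountServiceAccountToken: false
  containers:
    - name: app
      image: nginx
```

Pods that genuinely need the API can then opt back in explicitly.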
Also, it says here that there's an API, and this is a big one,
that allows the application to
learn a lot of information about its environment. In this case, it can find out
what version of Kubernetes is being run. And that's
concerning, because if there was, again, a vulnerability in Kubernetes,
I could leverage that API to figure out which, say,
Metasploit exploit I would run. But if I
didn't have that exposed, it would be much more difficult. So it kind of gives
you some advice. You can click on this kube-hunter link
to get more details about what I need to do to fix that.
So kube-bench and kube-hunter: great open source tools from
a very reputable company. Let me pop back to our
PowerPoint. All right, next,
let's talk about securing the container. Before we dive into that,
let's explain the difference between a virtual machine and a
container. This is really important. So virtualization allows an
operating system to virtualize another operating system inside
of it. It uses this thing called a hypervisor to do CPU scheduling
between the different operating systems and to keep track of
things like storage and memory. The hypervisor keeps
the VMs from stepping on each other's toes and accessing each other's data.
And that's great. But a VM has the overhead of
a whole other operating system and typically takes a while to boot up. It's kind
of heavy. Enter containers. Containers allow different apps, different containers, to
share the same operating system kernel. So instead of having to
have a hypervisor mediating everything,
you can have a container runtime. The Linux kernel supports
several features that enable containers to work. From the perspective
of your application, your app thinks that it's running as if
it's its own virtual machine, for the most part. But we use a few features
of the Linux kernel to trick the app, or to isolate the apps, so
that they are mostly separate. cgroups limit the CPU
and memory of each container. So when you create a container, you can share
cpu and memory, but make sure that one container doesn't use all of the cpu
time. chroot means that you can change
the root directory. So when a container boots up, the runtime unpacks your
container image and all of its software into a
folder in the host operating system, and then it switches
into that folder and runs your entry point in the appropriate
cgroup. So now it's got its own root folder,
and it's running with limited CPU and memory. And then finally, for things like
users, processes, networking, and volume mounts, we use this
thing called namespacing, so that your app in
its container cannot see other applications' processes,
can't see other volume mounts, can't see other applications' users.
Now, I will say some of the things, like namespacing for users,
are not implemented in many of the container runtimes.
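In Kubernetes terms, you mostly meet cgroups through resource requests and limits on your containers. A minimal sketch (names and values are illustrative):

```yaml
# Kubernetes exposes cgroup limits through resource requests and limits.
apiVersion: v1
kind: Pod
metadata:
  name: limited-app            # illustrative
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests:              # what the scheduler reserves for this pod
          cpu: 250m
          memory: 128Mi
        limits:                # the cgroup ceiling enforced by the kernel
          cpu: 500m
          memory: 256Mi
```

Setting limits is what keeps one container from eating all of a node's CPU time, exactly as described above.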
So we'll talk later when we get to Kubesec about this. But there
are some key best practices that you need to do to prevent
applications from potentially breaking out of their container.
It's not foolproof, but it doesn't have to be that hard. We've got great
tools and there are many great open source tools that will help you do
this really well. So we have degrees of isolation.
On the one extreme we have like racking a physical server
for each client application. And then on the other hand, we have
running applications with no isolation.
Virtualization is considered less segregated
than hardware separation. There have been vulnerabilities in hypervisors
where people could break out of one vm and see the other. It's very rare.
Containerization is not quite as much isolation as virtualization.
Containers have quite a bit of isolation. But if you're not careful,
or if there's a Linux security vulnerability,
maybe you could break out. And then finally, if you have no isolation where apps
are just running as the same or different users in Linux they have
even less isolation. One application by default in Linux
can use all the CPU and memory unless you use cgroups like containers
do. So containers are better isolation than if you had none.
Kata Containers I want to mention: if we want to make
containerization a bit more secure, if we're afraid of an application breaking
out of its container, some people will use Kata Containers or gVisor.
Kata Containers actually spins every container up in a very
lightweight virtual machine. So it'll spin up a VM
with just enough on it to be able to
boot up a container inside of it.
So that's Kata Containers. And then gVisor will filter
and validate that applications aren't doing things they're not
allowed to do. So in case there's a vulnerability in the Linux
kernel, gVisor will prevent an application from escalating privileges
or accessing things it shouldn't be allowed to access. Let's talk
about the parts of a container image. So I mentioned that we do things like
cgroups, chroot, and namespacing; I should say
the container runtime engine, whether that's Docker or
containerd, takes care of this for you.
To run a container, you
create a container image. The image is actually a layered file system.
So technically what that means is it's a series of tar files that get unpacked
one after another. And if you've ever seen a Dockerfile, which we'll
look at in a moment here, actually, I'll pull one
up. A Dockerfile has a series of
steps, and in each step it figures out what files changed
and then it tars them up.
So then when you're downloading a Docker image, you'll see it saying
downloading, downloading, downloading a whole bunch of layers. Each of those layers was
a step in the original Dockerfile. Inside of a
container image, we have a command, which is the command
that is used by default to boot up whatever application you're containerizing.
You have a working directory, just like when you are running a command
shell: the present working directory is
the default directory where that command runs
when the container boots up.
You can have a list of default environment variables. So in
a Rails application, you might default the Rails environment to production,
or you might default a path to include certain executable files.
And finally, what group and user does it typically run as?
The thing about it here is that almost every part of this can be overridden
at runtime. And you'll see when we get to security contexts,
there are ways of doing that. I did want to show you a Dockerfile,
so let's look at a
good example here. In this Dockerfile,
we start out saying we're FROM
Ubuntu 22.04, so it's going to download that base image and unpack it, effectively.
Then it's going to copy in a file from my local directory
into my container. Then it's going to run the make command,
and then when the app boots up, it's actually going to run python
/app/app.py. So here you're configuring your container, and there are
many other commands to configure a container, but these are the basics.
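A minimal Dockerfile along the lines described here might look like this (file names and the user ID are illustrative):

```dockerfile
# Base image layer, downloaded and unpacked first.
FROM ubuntu:22.04

# Copy local files into the image; recorded as its own layer.
COPY . /app

# Build step, also recorded as a layer.
RUN make /app

# Run as a non-root user by default (illustrative UID).
USER 1001

# Default command when the container boots up.
CMD ["python3", "/app/app.py"]
```

Each instruction becomes one of the tar layers you see being downloaded.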
Copy in some files, say what user you're going to run as; that's the
basics. All right, I want to talk about
container breakout. What happens if we don't secure a container?
So the worst case scenario is you don't secure a container
and then the application can break out. The easiest way to do it: if
a container is running as root, you actually have the capability
of mounting volumes into your container. So running as root,
you could potentially mount a volume that's actually the host root
volume. Even though you're switched into your
container folder, you could mount a volume from outside
of your container, effectively letting you see everything else running in the operating
system. So that's container breakout. Now,
preventing container breakout. All right, so now we get to the fun part.
So whenever you deploy an app to
Kubernetes, we need to set a security context. A pod
is the unit of work in Kubernetes, so most of our examples are just going to
be with pods, although oftentimes when you deploy, you'll use a Deployment,
a StatefulSet, or a DaemonSet: some kind of controller that will deploy multiple pods,
whether that's N instances of pods or one for every
node. But in this example here, we've got an individual pod,
and you'll see the pod is set up to run as a particular user and
group. You can configure some things like run as user
and group, both on the security context layer globally for the pod or
on the individual container. And then fsGroup
says, basically, if you have any volume mounts, who's going
to be the group owner of those volume mounts? Typically, if you want to be able
to read and write from those, you'll set the owner to the same as the
group. This example is a bit different, but just bear with me.
And then for the container here you can see allowPrivilegeEscalation:
false, runAsNonRoot: true, privileged: false. It's
kind of best practice to set these three. Some people
also will do a read-only root filesystem to prevent one container from
using up the whole disk in Linux, but you
probably don't need that. So you can see runAsUser, the group, and the file system
group security context. Very straightforward.
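A sketch of a pod putting those settings together (names and ID values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo       # illustrative
spec:
  securityContext:                  # pod-level settings
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000                   # group ownership applied to volume mounts
  containers:
    - name: app
      image: nginx
      securityContext:              # container-level settings
        runAsNonRoot: true
        allowPrivilegeEscalation: false
        privileged: false
        readOnlyRootFilesystem: true   # optional extra hardening
```

Container-level settings override the pod-level ones where they overlap.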
If you wanted to enforce, or at least make sure people are following,
best practices, there's a great tool, the open source
Kubesec. We have it built in here. You can choose any pod that's running, and I'll find one here
that I know is probably going to be in trouble:
Project Falco, which we'll explain later. And here
it looked at my Project Falco pod and it said, is this pod following best
practices? And my answer is no. But that's actually
because Project Falco legitimately needs elevated privileges. But you
can see, okay: service account name is green, good. Limited CPU,
good. Limited memory, good. Requested CPU and memory,
good. Okay, we're good. But then come down here to critical.
Uh oh, we're running as privileged, basically running as root.
We have access to the Docker socket; we can connect directly to
Docker. That's not good. And then it's got a number of other pieces of advice here,
saying let's use AppArmor and seccomp to limit
which Linux capabilities we have and which Linux
calls we can make. It's saying runAsNonRoot should be true. So a
lot of good advice here. We got a negative 39 on
critical, seven for passed. It adds it up and says,
well, we get negative 32 points; we stink.
That's actually okay though, because project Falco legitimately needs these privileges.
But I recommend using this for most of your applications and
getting teams to look at it. And even better, we'll talk about later
tools like gatekeeper that allow us to enforce policies.
So you could just prevent application teams from ever deploying anything
that looks like this. But we'll come back to that in a minute.
All right, preventing container breakout. One: do
not allow applications to run as root or escalate privileges.
Good. Two: be very careful with host volumes.
Kubernetes allows you to mount
in volumes, and one of the volume types is called a hostPath. But if you
could mount a hostPath, why not just
mount / as the hostPath? Effectively, you've
just become root on the node: you now have access to the root file system
and you've broken out of your container. Three: use tools like OPA
Gatekeeper, pod security policies, and pod security standards, which we'll cover more later,
to prevent someone from deploying pods with host volumes or
with elevated privileges. So basically, don't do these
things, and then use some kind of policy standards to just prevent someone from
doing these things. And then finally, limit service account privileges.
So the service account is potentially an
account that's injected into your pod in order to
allow your pod to talk to Kubernetes. Maybe your app wants to connect directly to
Kubernetes for some reason. Maybe it legitimately needs to spin up other
containers. For example,
Apache Airflow will want to kick off jobs in Kubernetes. Great.
But definitely look at those service account privileges and
only give people access to the things they need, keeping in mind that if
that service account can create pods
and you have no policies,
that service account can effectively create a pod with elevated
privileges and break out of the container. So unless you're doing some kind of policy
management, be very careful with your use of service accounts.
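To make the breakout risk concrete, here's a sketch of the kind of pod spec a policy engine should reject. A service account that can create pods, with no policies in place, can create exactly this (names are illustrative, do not deploy it):

```yaml
# A pod that mounts the node's root filesystem, giving the container
# full visibility into, and write access over, the host.
apiVersion: v1
kind: Pod
metadata:
  name: breakout-example         # illustrative, for discussion only
spec:
  containers:
    - name: shell
      image: ubuntu:22.04
      command: ["sleep", "infinity"]
      securityContext:
        privileged: true         # combined with the mount below, game over
      volumeMounts:
        - name: hostroot
          mountPath: /host       # host root appears under /host
  volumes:
    - name: hostroot
      hostPath:
        path: /                  # the node's entire root filesystem
```

This is precisely what pod security standards and OPA Gatekeeper policies exist to block.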
Limiting Linux kernel calls is another fun one,
so let's back up a little bit. A lot of us have used
Linux. Not a lot of us understand exactly what a kernel is
and how the Linux kernel works. So whenever your application wants to
do something other than access cpu and memory,
maybe it wants to open a file, maybe it wants to connect to something on
the Internet. It has to make a system call to the Linux kernel.
An example of a system call might be something like getting the current time or
setting the current time. But some system calls are kind of
dangerous. So maybe you want to do a system call to change your
user account. That might be dangerous. You might want to make a system
call to change the time on the computer. That might be dangerous because it
might mess up other people or other applications running on that computer.
So we want to limit our apps from doing certain
things. You can, within the security context, specify a
list of which things are dropped or added.
Also, some people will use seccomp and AppArmor to create pre-made profiles.
AppArmor can even follow your app for a while, build a profile
based upon what it's using, and then you can apply that profile.
I'm not going to demonstrate using seccomp and AppArmor, but I will
demonstrate this: inside of the security context, you can explicitly add
or remove capabilities. So here I'm adding SYS_TIME, which would
allow me to change the system time, as an example.
I'm not going to give you the big list of all the different options.
There's a lot out there. Usually the defaults
are good enough in many cases, but it is worth considering this.
If an app ever needs to add capabilities, be careful, dig into what those capabilities
are. NET_ADMIN is a great example of one that could
be a bit dangerous, right? It could allow the app to look
at other things that are running on the network, or to reconfigure networking and
potentially break things. Sometimes you have legitimate
use cases for this, so just think critically about it.
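A sketch of adding and dropping capabilities in the security context, using the SYS_TIME example from above (the pod and image names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: capabilities-demo        # illustrative
spec:
  containers:
    - name: app
      image: nginx
      securityContext:
        capabilities:
          drop: ["ALL"]          # start from nothing
          add: ["SYS_TIME"]      # add back only what the app truly needs
        seccompProfile:
          type: RuntimeDefault   # filter syscalls with the runtime's default profile
```

Dropping ALL and adding back individual capabilities is the usual pattern, rather than starting from the defaults and removing a few.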
All right, Kubesec. We demonstrated Kubesec earlier;
Kubesec will analyze the manifest of a pod and give you advice.
We've built a UI around Kubesec, which I demonstrated briefly
earlier. I can go to Kubesec here.
I can choose a pod, or upload one, and
I can even pick a whole namespace.
Let's do kube-system.
I'm going to click all of them; I can even pick a whole bunch of
them and it'll run for all of them at once. It'll take a minute.
Hopefully it works; you never know with live demos. There it goes. And now
it's given me a breakdown of every single pod in this namespace and how they're
doing. Now, it is kube-system, so I
can see the Azure disk driver. Some of this you'd expect, right? The Azure
disk CSI driver in kube-system is of course going to
need elevated privileges. kube-proxy,
same thing: it's got to have network administrator access to set up
the proxies. So it makes sense, but it is interesting to look
at. So it's a great tool, and we've made
it even better with an easy-to-use UI, so that it's easy for your
team to use, even if they're fairly new to Kubernetes.
All right, next, pod security admissions.
So for those of us who've been around a while,
Kubernetes used to have this thing called pod security policies, which gave you a lot
of granular control over what things could do. But Kubernetes recognized,
or the creators of Kubernetes recognized that that was really too complicated
for most people. It's enough simply to have a few general
presets, and if that's not enough, they can go use OPA Gatekeeper
or something to create their own policies. So with pod security admissions,
we have three pre-made standards for Kubernetes
security: privileged, baseline, and restricted.
What you can do is enable a standard in a namespace, and then if any
app tries to deploy that doesn't meet one of these pre-made standards, it will
block it. Anything running in, say, kube-system is
probably going to need privileged access, like we talked about earlier. So that's what privileged
is for: unrestricted, deploy anything you want. Definitely lock down
those namespaces that allow privileged deployments.
Baseline is pretty good. It prevents basically
the things that would allow container breakout. You need to dig through the
list, because I think you could probably still do host paths, which are potentially
dangerous, but it would prevent things like running as root or privilege escalation
at a very minimum. Start with baseline. And then finally there's restricted,
which requires you to set quite a bit: you have to
configure a bunch of stuff in the security context on every pod.
It's kind of a pain. We've actually created an open source project,
which I'll show you in a minute, that makes deploying all of your apps with
restricted pod security standards fairly easy.
When you configure a namespace, you have to add this label:
pod-security.kubernetes.io/enforce: restricted.
This allows you to enforce a particular level of the pod security admission
standards, so you could do privileged, baseline, or restricted. If you
do nothing, it's effectively privileged. But if you configure baseline or restricted,
it's going to start locking down that namespace, which is really nice.
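A sketch of a namespace with the enforce label applied (the namespace name is illustrative):

```yaml
# Enforce the restricted Pod Security Standard for everything in this namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: my-team                                  # illustrative
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted  # optionally also warn on violations
```

There are also audit and warn modes, which log or warn about violations without blocking the deployment; they're useful for rolling a standard out gradually.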
I want to show you: we've created a project called
K8 EZ. It's a helm chart for deploying apps to Kubernetes.
It allows you to deploy an app just by specifying the name, or you can
create a full values file to configure everything you could think of. The big thing
is that by default the security context is fairly locked
down. By default it's running as non-root and it's dropping all
Linux capabilities: privileged: false, runAsNonRoot: true,
allowPrivilegeEscalation: false. It's doing all those things by default
that you want to do. So definitely try it out. We use it for a
lot of our client implementations and it's been very successful.
We've also had clients where they have a whole bunch of apps in kubernetes and
they're able to use a single helm chart to deploy all of them. So then
if they're deploying tens or hundreds of apps to Kubernetes, or if we're
onboarding them, they can just create a values file for each app.
They don't have to create a custom helm chart for every app. And they can
trust that by default things are pretty locked down.
You couple this with the pod security
standards and you do a good job locking down kubernetes.
All right, let's keep moving. Network policies.
So Kubernetes network policies allow you to limit
what pods can connect to what pods. This is really nice.
In a lot of architectures you'll have a front end and a back
end, and the front end and back end might both run on
your servers, right? So maybe you have something that receives requests
from the Internet and then it connects to APIs. And I see a lot of
times where some of those back end APIs that have no
external ingress also have no authentication. I wish they did.
There's a lot of applications where people are not doing any authentication or authorization
on internal backend APIs. So with
network policies in Kubernetes we can lock down what applications are even
allowed to connect to them. So that makes it much harder for a
hacker to exploit one of these APIs, because let's just say hypothetically,
you have hundreds of APIs, but maybe half of them are
internal only and maybe don't have a lot of authentication or authorization on
them. You can use network policies to just prevent any app from connecting
to those APIs. So you can literally limit it to things in
the same namespace or to specifically named
pods. A network policy looks a bit like this:
you would say what pods it applies to. So you'll use a pod selector
to match pods with certain labels or certain namespaces,
and then you can say whether it's an ingress or egress policy.
So this one here, for example, would deny all ingress: it says
it's an ingress policy, but it doesn't specify what can connect to
it, so the default is to deny everything. You can
do the opposite, where you say ingress and you
have an empty rule in the array, matching everything, which allows all ingress.
Same thing for Egress. I'll let you dig through the docs on all
the different options, but the big thing is that you can say this namespace
or these pods can connect to these pods. Very helpful.
Not all container network interfaces support it.
So Istio does, but some of them don't.
So you do have to think critically when you set up your cluster: do I
want this or not? And if I do, which network interface
do I want to use? That's a decision you make when you set up your cluster.
A lot of people are using cloud-managed clusters, and by default those come with
something like this. Definitely something to check into, though. There are
shortcomings, though. One of the big ones, and I
don't even know if I've got it on the PowerPoint here, is
that it doesn't allow you to control
access to external hostnames. So we see a lot of people using
Istio service entries to control access to particular hostnames.
Most of the rules are namespace-wide, but you can do pod
selectors, like label selectors. Not every conceivable rule can be
expressed. So again, external hostnames are a
big limitation. Say you wanted to allow an app to connect
to an external database, but that database doesn't have a fixed IP,
it's got a hostname. Kubernetes by default doesn't support that.
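The Istio service entry approach mentioned a moment ago can be sketched like this (the name and hostname are hypothetical):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-db          # hypothetical name
spec:
  hosts:
    - db.example.com         # hypothetical external database hostname
  location: MESH_EXTERNAL    # the service lives outside the mesh
  resolution: DNS
  ports:
    - number: 5432
      name: tcp-postgres
      protocol: TCP
```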
But again, people use Istio with egress gateways
to do that, or some people are using Cilium for that now. So there are
other methods. And yeah, network policies are fairly coarse-grained
and may not be appropriate for all workloads. Still, it's very powerful
and it can be extended with other third party applications as
well. All right, OPA and Gatekeeper. So OPA,
the Open Policy Agent, is really a policy engine built around Rego,
a language for describing policies. And Gatekeeper is a plugin
for Kubernetes. It creates custom resource definitions;
basically it extends Kubernetes to allow you to describe policies
inside of Kubernetes, and then it will assess and enforce those policies.
So you write these scripts, and usually you don't write the scripts yourself: you use
one of the open source templates and you just configure it. And then these scripts
validate, whenever something tries to deploy, that it meets those standards.
And it does more than deployments; you can do it for namespaces,
anything. I'll give you a couple examples.
So a good example would be maybe you want to make sure that for
cost accounting reasons, you want to make sure that every application
has a cost center attached to it. So if I open up
Gatekeeper in the UI,
the first thing I do is create a constraint template. So by default there
are numerous different constraint templates that are included
in gatekeeper. It's an open source project and they have a number of premade
constraint templates. Think of constraint
templates as different kinds of checks you might want to perform. So you might
want to say, I want to require
containers to have a limit on cpu and memory. So I can say okay,
container limit. And you'll see here we have
different constraint templates available, and they have this Rego
code in them. But writing Rego code is really complicated.
I can let you go dig into that on your own time,
but we don't see a lot of people actually doing that. Usually what they do
is they use one of the official ones, like container
limits or pod security checks,
like running as non-root.
Let me find one. I like the label one. Okay, required labels.
All right, so we're going to do required labels. We're going to actually require
that people tag every pod with a cost center
so that we can do our fin offs that we're all being told we have
to do. Right? So I've created the required labels constraint
template. Then I have to come in here and click add more and
I have to pick, I have to configure each
constraint. So the constraint template is the rego code,
the constraint is parameters for the regal code.
So I'm going to say I want to enforce it on all pods. I'm going
to call it the cost center constraint. I'll just call it
cost-center-required.
Okay: Kubernetes required labels. Description:
require a cost center to deploy pods. Excluded
namespaces: we're going to exclude
gatekeeper-system and kube-system and
msweeper-system. Okay, cert-manager too. So I'm going to
exclude my built-in stuff. Then I'm going
to set the key and allowed regex.
So we're going to require cost-center,
and the regex is going to be star, meaning anything. It's got to
be there, but it can be anything; I'm not going to be picky about the
format of it. Save changes,
and now I've just created it. Let me just take a look real quick.
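Under the hood, what the UI just created corresponds roughly to a K8sRequiredLabels constraint from the Gatekeeper policy library; a sketch (names mirror the demo, the exact namespaces and fields may differ):

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels          # from the Gatekeeper policy library
metadata:
  name: cost-center-required
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces:          # skip the built-in system namespaces
      - gatekeeper-system
      - kube-system
  parameters:
    labels:
      - key: cost-center
        allowedRegex: ".*"       # the label must exist, any value is fine
```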
Good. There's different modes: audit and
enforce. Audit means we simply report on violations,
whereas enforce actually blocks them. It can take a minute to
actually run and give me back my violations. I've actually got another one that's
already been set up called container limits. And you can
see here, once it's had time to compile and run the rego code,
I can click on this violations here and it'll list off for me every pod
that's currently violating. So very useful.
Here we go. It's starting to list all the pods that
are breaking the rules. So my ingress controller doesn't have a cost center,
so I don't know who to charge for it.
So pretty useful. And you can see here, we did it all through a UI,
so it was super easy. We also support creating exceptions,
so you could give a team an exception, but only for a specific period of
time. Pretty powerful stuff. All right, so that's
gatekeeper. Next I want to talk about
code scanning. So whenever you create a container
image, a container image contains both your operating
system and all of the operating system utilities you require. Maybe you need
Ghostscript to create PDFs, for example, right? And you need Java to run
your Java code. And then it's got your code. So maybe it's your Java
jar files, your ruby code, your php code, your node code,
whatever. So this packaged up container image is
actually something we can scan, kind of like a VM image, and Trivy
and Snyk are the most common two that I see. We integrate with Trivy;
Snyk is coming. Let me show you a little bit
about that. So you can run it yourself locally from the command
line or in a CI/CD pipeline, and you'll get an output like this,
which is very useful. You can even block someone from deploying
something that has certain levels of vulnerabilities or even things that are
not fixable. But a lot of times we find people have to create
exceptions. So we created an interface here where it
will scan everything that's currently running in your cluster.
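For the command-line route mentioned a moment ago, a typical invocation looks something like this (the image name is hypothetical; these are standard Trivy flags):

```shell
# Scan a container image for vulnerabilities; --exit-code 1 makes the
# command fail when HIGH or CRITICAL findings exist, which lets a
# CI/CD stage block the deployment.
trivy image --severity HIGH,CRITICAL --exit-code 1 registry.example.com/myapp:1.2.3

# Optionally ignore findings that have no fix available yet.
trivy image --ignore-unfixed registry.example.com/myapp:1.2.3
```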
You can browse around say by namespace,
you can see what's running there and what container image it's running,
and then you can expand and see a scan of that container image
and whether or not it meets your standards. You can even
block things from booting up that don't meet your standards. And you can create exceptions
for teams that need those. So here you can see I
did Open Policy Agent Gatekeeper. Apparently I'm not running the latest,
and I can see here it's running an older version that
has a vulnerability, an out-of-bounds memory access. That's pretty
bad. And it's fixable in version 00:40
so I should probably upgrade. I can click
request exception here. So if I was getting blocked for that reason, I could click
a request exception and request the security team give me an exception.
We have an entire exception flow here where you can request an exception and
emails the admins. They can review and give you a thumbs up or thumbs down.
They can give you an exception for a specific period of time,
that sort of thing. And then if I scroll down, I can
click details and see details about that
particular CVE, Common Vulnerabilities
and Exposures entry. So here I can see details. If I
open this up in a new tab, I can see more
details about that CVE. So CVE 2002 228946,
rated high. You'll notice there's different scoring methodologies, but it's bad;
should probably upgrade. And there's a lot more in here
too. That's a little bit about CVE scanning.
I want to show you the exception management interface. It's pretty neat,
so you can see all the exceptions and then for any exception I
can configure it. So kind of like what you would expect. Super useful.
All right, next I want to talk about Project Falco.
I want to back up a little bit. So if you remember we talked earlier
about Linux kernel calls. So every time your app wants to do something other
than use CPU and memory, it has to make a kernel call to perform that
action. And so what if something was
doing something that it's allowed to do, but that thing seems suspicious?
Or what if it tries to do something that it's not allowed to do,
but trying to do it is itself suspicious?
So if an application tries to change its user
account, that's suspicious. If an application tries to
mount a volume when it's not supposed to,
that's suspicious, right. Project Falco can
monitor and alert you whenever that's happening.
Well, I should say alert: it doesn't send alerts itself, it'll just do an API call or
write a log. It can also monitor the Kubernetes API audit logs. It can
really do anything. It's kind of like OPA, the Open Policy Agent: a generic rules
engine. But we've integrated with it, and a lot of people use it for monitoring
Kubernetes API logs or just Linux
kernel calls with an eBPF probe, so it can monitor
for suspicious behavior.
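A Falco rule is itself just YAML; here's a sketch along the lines of Falco's stock "Write below etc" rule (the condition uses standard Falco fields, though the exact stock rule differs):

```yaml
- rule: Write below etc
  desc: An attempt to open a file under /etc for writing
  condition: >
    evt.type in (open, openat, openat2)
    and evt.is_open_write=true
    and fd.name startswith /etc
  output: >
    File below /etc opened for writing
    (user=%user.name file=%fd.name container=%container.name)
  priority: ERROR
```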
We make Falco very easy for people to set up and use.
Let me go back here.
There we go, Falco. So in my cluster I can
set up all my filters and I can see.
Okay, so here I can see all the recent events. Now this is a test
cluster where we've intentionally configured it so that we get lots of events.
So here I can click on this one and see what happened. Okay, kube-prometheus-stack.
It got a priority of error. Okay, that's high-ish.
And the message, the full message if I expand it:
okay, an attempt to open a file for writing.
All right, so it shouldn't do that.
And then I can see here all of the different other times that it occurred.
And if I click more, it actually will expand and it takes a
minute. But it's going to give me a graph of the historical incident rate.
So it's happening regularly. So it's probably part of some kind of regularly scheduled
process. We calculate
a signature here by combining several pieces of metadata and
then SHA-hashing it, so that I can find all of the other
cases where this same kind of thing occurred with Project Falco.
So you can search and see all the other incidents. And then if I go
down to raw data here, I can actually see the
full details in JSON or YAML or in a table
that Project Falco logged out. So very nifty.
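That signature idea can be sketched in a few lines of Python; the specific fields chosen here are an assumption for illustration, not Msweeper's actual scheme:

```python
import hashlib
import json

def event_signature(event: dict) -> str:
    """Group recurrences of the same kind of Falco event by hashing
    a canonical form of its stable metadata fields.
    (Field choice is illustrative, not Msweeper's actual scheme.)"""
    fields = event.get("output_fields", {})
    key_fields = {
        "rule": event.get("rule"),
        "priority": event.get("priority"),
        "namespace": fields.get("k8s.ns.name"),
        # Strip the random pod-name suffix so replicas group together.
        "pod_prefix": fields.get("k8s.pod.name", "").rsplit("-", 1)[0],
    }
    canonical = json.dumps(key_fields, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()
```

Two events from different replicas of the same workload then hash to the same signature, which is what makes "show me every other occurrence of this" a single lookup.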
So we've built this. You can go into your Falco
settings globally and you can create rules.
We've found that by default, Project Falco is kind of chatty. So we've got
a rules engine here where you can go in and say ignore certain things in
certain environments. From a realistic standpoint, you're probably going to have to do some
tuning. We also do anomaly detection where it
will automatically alert you whenever it finds something new. So if
I go to settings in the corner here, I can say notify about anomalies,
notify no more than, say, once a week.
And I only want to be alerted on alert, emergency,
and critical, nothing below that. And I want
it sent to myself. Right. So pretty
powerful. This allows you to configure
alerts so that you're notified whenever something
suspicious happens. And because we're doing that signature where
we combine different metadata, we're able to alert whenever something new
has occurred. So it's really powerful.
All right, summary. So we talked
about the four C's of cloud native security: cloud, cluster,
container, and code. We talked about different tools you
can use, such as VPNs and firewalls to limit access to the Kubernetes API,
kube-bench, kube-hunter, role-based access control, OPA Gatekeeper,
Kubesec, Trivy, and Project Falco. And we wrapped it all up in a really neat
Msweeper demo. If you have questions,
if you go to our website, you can use the contact form to reach out to me, and you can click
on docs at the top and we have great documentation on how to get started,
so I definitely recommend starting there. We have an easy install
guide on the left here. This getting started guide will actually get you up and
running fairly quickly. It can be as easy as
a one-liner to try it out.
As I mentioned earlier, we also have Killercoda,
so that's also another great way to try it out.
You can spin up a cluster and install all the tools in 20 to 30
minutes, and it'll go away when it's done. So super easy.
Also, if you have any issues, if you go to github.com/msweeper/msweeper,
our GitHub repository
is where you'll see all the activity happening as well as
who has contributed. And you can always file an issue
to give us feedback about feature requests.
Or if there are gaps that you're finding or bugs that you're finding,
definitely post them there. We'd love to hear from you. All right,
thanks so much for the time. I hope you enjoyed it. I hope you learned
a lot about Kubernetes security and I hope to talk to you
on GitHub.