Transcript
Hi and welcome to this presentation on the best audit
logging practices when using Kubernetes. My name is Kenneth Dumez,
a developer relations engineer here at Teleport. So for a little
background on my history, I came to Teleport around a year ago
after working at Pivotal Cloud Foundry and then was at VMware Tanzu for
a few years working on their Kubernetes build service solution. Thank you
so much for coming and I hope you can learn a little bit about Kubernetes
best practices as it's such a rabbit hole and gets confusing very quickly.
There's a bunch of other awesome talks today as well. Conf 42 is a great
place for developers and leaders in various fields to come together
and share some of their knowledge. So just want to shout out Miko Pawlikowski for putting this together. It's always a pleasure to
be here. So today in this presentation we'll be discussing
the importance of good audit logging practices in Kubernetes and the best
practices to follow to ensure a secure and compliant environment
at scale. We'll talk about the native built-in logging functionality in K8s and its limitations. We'll also look at some third-party open source tools that can help make following all of these practices a little easier, while making life a little bit easier for your administrators and security engineers. So the first thing we're going to talk about is the
audit logging capabilities you get out of the box when you deploy your Kubernetes cluster.
Kubernetes has a built in logging system that is used to record information
about events that occur in the cluster. This information can include things
like API requests, resource changes, system events,
basically everything and anything that happens inside of your cluster.
Kubernetes stores this information as log files on the cluster
nodes, which can be accessed using various tools such as Kubectl,
or if you're using a hosted Kubernetes environment, there's usually a dashboard
or UI or something that you can use to access these logs.
The important thing to note here is that these logs are super granular
and highly configurable. Kubernetes clusters, especially larger
ones, can generate a lot of events and thus a lot
of log data. This can make it difficult to separate the
wheat from the chaff, so to speak, and maintaining a good signal to
noise ratio can be really tough. One of the most important things when setting
up your logging and to manage the spamminess of your cluster's log data
is this object called the Kubernetes audit policy. The Kubernetes Audit
policy configuration object is a native Kubernetes resource
which you provide to your API server that defines the rules and settings for
auditing events that occur within a Kubernetes cluster.
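Just to give a rough sketch of how that wiring looks (the file paths here are placeholders, and the exact mechanism varies by distribution), the policy is typically passed to the kube-apiserver with a couple of flags:

```yaml
# Excerpt of a kube-apiserver static Pod manifest (sketch; paths are placeholders).
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --audit-policy-file=/etc/kubernetes/audit-policy.yaml   # the policy object described below
    - --audit-log-path=/var/log/kubernetes/audit/audit.log    # where events land on the control plane node
    - --audit-log-maxage=30                                    # days of rotated logs to retain
    - --audit-log-maxbackup=10                                 # number of rotated files to keep
```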
This audit policy configuration object is defined, like all the other K8s resources, in a YAML file containing the audit rules and settings. This is the first object you want
to configure when determining your logging strategy. The file contains
several fields that can be configured to customize the audit policy. We're going to
look at some of those fields in depth in a second. For one example,
you can say only log anything that's done to secrets or
just events concerning pods, or say everything that's
done to any of the core APIs, but none of the custom resource
definitions or extensions. As a good starting point, you can check
out the audit profile for Google Container-Optimized OS.
This is publicly available and you can then configure it from there
to whatever best suits your logging needs. Within the Kubernetes
audit policy object, the rules field is the most important.
This field defines the audit rules that dictate which events should be audited
and how they should be handled. Just as an example and so you can kind
of see what an audit policy configuration would look like, here's a little walkthrough
of the various audit rules fields. I would highly recommend not just copying
this one and plugging it into your own clusters because like I said,
this is just an example and you probably want to tailor it a little bit
better to your specific needs. So first we have this omitStages field. This defines the audit stages to be skipped for your various events, such as RequestReceived or ResponseStarted. This is
crucial for cutting down on the parts of the events that you don't care about.
You don't need every stage and you shouldn't track it
all in your audit log. Then you have level, which defines the level of detail to record for the event, such as Request, RequestResponse, or Metadata.
Next is your resources field, which defines which Kubernetes API resources are to be audited, such as pods, deployments, or services. Then you have verbs, which defines the Kubernetes API verbs to be audited, such as create, update, or delete. Then of course you have users, which you use to tell the audit service which Kubernetes users or groups are to
be audited. And finally namespaces, which as it implies just
defines the Kubernetes namespaces you want to include in the audit collection.
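Putting those fields together, here's a minimal illustrative policy. Like I said, don't just copy this into your own clusters; the resources, users, and namespaces are placeholders you'd tailor to your needs:

```yaml
# Illustrative only: a small audit policy combining the fields discussed above.
apiVersion: audit.k8s.io/v1
kind: Policy
# Skip the stages you don't care about to cut down on noise.
omitStages:
  - "RequestReceived"
rules:
  # Full request and response bodies for any change to Secrets.
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: ""               # core API group
        resources: ["secrets"]
  # Metadata only for pod activity in one namespace (placeholder namespace).
  - level: Metadata
    resources:
      - group: ""
        resources: ["pods"]
    namespaces: ["production"]
  # Skip routine reads from a trusted system user.
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["get", "list", "watch"]
  # Catch-all: record metadata for everything else.
  - level: Metadata
```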
As I said before, the audit policy object is very
flexible and configurable depending on your various needs.
When creating your Kubernetes audit policy configuration,
there's a lot to consider and it can be pretty intimidating at first.
In general though, here are some good best practices to follow.
First, clearly define the audit policy scope. It's important to define the scope of the audit policy configuration object and identify the Kubernetes resources, verbs, users, and namespaces that need to be audited. This will help ensure
that the audit policy is focused on the areas that require
auditing and is not overly broad, which can result in a
bunch of spam that isn't really useful to anyone. It's hard
to parse, expensive to store and obfuscates actual
useful important log events. If you have millions and
millions and millions of log lines, it's going to be really hard to
actually access the good data, the useful data that you're wanting to keep track
of. Another good practice is to use meaningful audit rule names.
It's important to use meaningful names for audit rules to ensure that they are
easily understood and maintainable. Names should clearly describe the event
being audited, the resource, verb, and other relevant attributes. Just as we all know, maintaining legacy code can be challenging.
The same thing applies to audit configurations.
You want to do yourself a favor for the future and make sure
that you'll be able to parse what you wrote. Another important step is
regularly reviewing audit logs. Reviewing them regularly is an important step in maintaining the security and compliance of
the Kubernetes cluster. It's important to establish a process for reviewing
audit logs and to regularly review them to identify
any anomalies or security risks. SIEM (security information and event management) tools can help with this task. Another important step
is to use a dedicated storage solution. Storing audit
logs in a separate and dedicated storage solution can help ensure that they
are protected and available for analysis and review. It also
helps save space for the actual functioning of the cluster. It's important
to use a secure and reliable storage solution that can handle the volume of audit
logs generated by the Kubernetes cluster.
S3, for example, is a very popular place to store audit logs,
and from there you can pipe them to different solutions and have monitoring
and alerting tools in place. Similar to the above, it's really important
to aggregate your logs. This is especially important if you have multiple
clusters or if you have many nodes in a single cluster.
But aggregating all of your log data into a single location makes it much easier
to filter, ingest and manage that log data.
It helps with observability and compliance as well. It's easier to show an auditor
one central secure location rather than having to prove compliance for
dozens of different infrastructure resources you're leveraging to help with logging.
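As a sketch of what that aggregation can look like at the cluster level, the API server's webhook audit backend can ship events to a central collector instead of (or alongside) local files; the endpoint and certificate paths below are hypothetical:

```yaml
# Hypothetical webhook backend config (kubeconfig format), referenced by the
# kube-apiserver flag --audit-webhook-config-file.
apiVersion: v1
kind: Config
clusters:
- name: audit-collector
  cluster:
    server: https://audit-collector.example.com/k8s-audit    # placeholder collector endpoint
    certificate-authority: /etc/kubernetes/pki/audit-ca.crt  # placeholder CA bundle
contexts:
- name: default
  context:
    cluster: audit-collector
    user: ""
current-context: default
```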
While the native Kubernetes API logging is powerful by itself,
all of the logs in the world are useless if you aren't actively monitoring
them. Audit logging is more than just a postmortem reactive
solution to help you figure out what happened after your cluster is already
compromised. If properly configured and monitored,
it can be used to prevent attacks as they happen, rather than just used
to look for something or someone to blame after the fact.
The simple truth is that, especially at scale, it's completely
impractical for a security team to constantly be looking at these logs themselves manually.
Luckily, there are a few great open source tools that can help. One of these
tools that I really like is Falco. Falco is
an open source cloud native runtime security project that can be used to detect and alert on anomalous behavior in Kubernetes clusters. It can be used to monitor and alert on Kubernetes audit logs,
and it supports a wide range of rules for detecting security threats and
policy violations. Falco can also be integrated with external systems for alerting and incident response.
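Just as a sketch of what a Falco rule against Kubernetes audit events can look like (the user allow-list here is made up, and you'd adapt the condition to your own policy):

```yaml
# Illustrative Falco rule over Kubernetes audit events (k8s_audit source).
- list: allowed_secret_readers
  items: ["system:kube-controller-manager", "system:kube-scheduler"]   # placeholder allow-list

- rule: Unexpected Secret Access
  desc: Detect get/list/watch on Secrets by users outside the allow-list
  condition: >
    ka.verb in (get, list, watch)
    and ka.target.resource = secrets
    and not ka.user.name in (allowed_secret_readers)
  output: "Secret accessed by unexpected user (user=%ka.user.name verb=%ka.verb secret=%ka.target.name ns=%ka.target.namespace)"
  priority: WARNING
  source: k8s_audit
```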
Another great tool is Openraven. Openraven can collect audit logs from
Kubernetes clusters, including API server logs,
and logs from other Kubernetes components. A great
feature of Openraven is that it can centralize these logs from multiple
Kubernetes clusters, making it easier to manage and analyze them.
Openraven can also analyze these logs to identify
potential security threats and compliance issues. It includes
pre-built compliance rules for various regulations such as PCI,
HIPAA, and GDPR, and it can be customized to
meet specific compliance requirements. Another important feature is
Openraven's real-time alerting. This tool can send
alerts for potential security threats or compliance violations based on
the analysis of your audit logs. It can also integrate with external incident
response systems for automated incident response. One drawback,
however, is that it can be pretty difficult to configure, especially if
you have a multicluster setup, and managing that complexity can be costly.
Another good tool out there, though, is Elastic. The Elastic Stack is a suite of open source tools that can be used for log management
and analysis. It includes tools for collecting,
processing, and analyzing logs, including Kubernetes audit logs.
The Elastic Stack can be used to centralize these logs from Kubernetes
clusters and it includes features for searching and analyzing this log data.
The Elastic Stack can centralize these logs and allow for easier management and analysis. It can provide real-time analysis of Kubernetes audit logs, allowing for faster detection and response to security threats and compliance issues. It also comes with Kibana, a powerful visualization tool that
can help in understanding the logs and identifying trends, patterns and
anomalies. It's pretty similar to Grafana, another honorable mention
in our open source tooling. While all of those other solutions are great
and a huge step up from just sifting through logs manually, none of them address
the big picture of Kubernetes audit logging and security.
This is in large part due to them missing the key component of
access. Configuring access to your Kubernetes cluster, managing who has access to what resources and when, how privilege escalation is handled, and providing chain of custody over all of your different resources can be a huge hassle. Access is not divorced from audit
logging practices, however, as a key part of audit logging is
knowing exactly who or what, in the case
of machines and automated workers, is executing commands
on your Kubernetes cluster. Open source Teleport, which is
a secure access control platform for managing access across
your infrastructure, solves all of these problems while
also centralizing your audit logging, not just for your Kubernetes resources, but for your SSH, database, Windows RDP, and application access. Centralizing your audit logging
at scale for organizations requires you to go beyond just
your various Kubernetes clusters. For a truly secure infrastructure
setup, you need to implement all of the previous principles and best
practices across your organization, spanning all of your various infrastructure resources. As soon as you have siloing, whether it be at the cluster level or at the cloud resource level, this creates much more overhead,
meaningless duplication and headache for both your security engineers
and cloud administrators. Teleport, coupled with Fluentd, which handles all of the plumbing, so to speak (the formatting, exporting, and consolidation of your logs), is the ultimate solution for Kubernetes audit logging. With Teleport, you can tie every event in Kubernetes to an
identity, meaning that you'll know exactly who did what on
any given resource. Even at the Kubernetes pod level,
each event is audited based on your configured audit strategy, tied to the entity's identity and mapped to a Teleport user with a Teleport RBAC role. This makes it easier than ever to configure secure access and thus ensure secure use and best practice enforcement across your entire cloud ecosystem.
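As a rough sketch of what that looks like (the role name, labels, and groups here are hypothetical; check the Teleport docs for the full role spec), a role scoping Kubernetes access might be defined like this:

```yaml
# Sketch of a Teleport role granting scoped Kubernetes access (values are hypothetical).
kind: role
version: v6
metadata:
  name: kube-dev-access
spec:
  allow:
    # Only Kubernetes clusters labeled env=dev are accessible with this role.
    kubernetes_labels:
      env: "dev"
    # Requests are mapped to this Kubernetes group, so native RBAC still applies.
    kubernetes_groups: ["developers"]
```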
And this is not just for human engineers. Teleport Machine ID
ensures that every microservice, process, or automated worker node also has an identity in the form of short-lived X.509 certificates, eliminating long-lived credentials and access silos, and allowing for a full, rich audit log in real time. Teleport fully eliminates secrets, replacing them with short-lived certificates tied to a user's
identity. And again, this is for every piece of infrastructure,
not just your Kubernetes resources, centralizing everything,
allowing for easy monitoring and log management for not only your Kubernetes
cluster, but for every resource in your stack. Another powerful feature of Teleport is that it actually allows for session playback of Kubernetes sessions conducted over SSH,
meaning that if someone is accessing a node in your cluster, you'll be able
to prevent obfuscation of commands, allowing you to see exactly what is
happening on your cluster. Teleport acts as a gateway for all of
your resources, ensuring security and compliance across
your entire infrastructure. So let's take a look at exactly what I mean
when I say that it consolidates all of this access and audit logging
into one place. So here we are in the Teleport web UI. The first use case I'm going to show you is the session recording when you're accessing your Kubernetes clusters over SSH.
So here we have our Kubernetes cluster. It's called Cookie, of course, and here we have all of our servers. Down here we can see this server called K8s host, and this is actually the server that's hosting our Kubernetes cluster.
So if we log in, we can actually open an SSH session directly
from the web terminal. And all of this session data is tied directly to my
user and identity. So we can go
ahead and execute a couple of commands here. We can say kubectl get pods -A and we can see all these pods. We can go ahead and describe this one here: kubectl describe pod colormatic.
We can see all of the pod's information: the container ID, which container image it's using, and some health information about what the pod is doing. Now that we know our pod is functioning the way it should and that the container image is correct, we're going to go ahead and exit this session. Then we
can come back in the web UI and go into our management
section here. We can see when the session
started and that the session has ended.
We can go into the session recordings and actually view exactly what
we did. And as we
can see, these are the commands that we just ran within the session.
And this is actually not a video; it's a rich JSON log describing exactly all of the commands that we ran, which means that this whole session can be forwarded to other SIEM tools or other log management tools so that we can actually monitor it, and you can play back these sessions
based on every command that we ran.
So the next thing I wanted to show you is how we log into a
Kubernetes cluster without using a Teleport-managed SSH node. So in this case I'm going to be using my personal workstation.
So in here in our web UI, we can go to our Kubernetes
resources and we find our Cookie cluster.
So this is the same cluster that we were using before, but now we're going
to access it from my workstation. So first we're
going to go ahead and log into our Teleport cluster. We're going to execute this tsh login. Here's the address of our proxy, which is the publicly accessible address of our Teleport cluster.
And we're going to go ahead and log in.
Great, we're logged in. This used the same authentication
method as before. It logged in through my GitHub. So we're using GitHub
as an SSO here. Next we're going to select what
Kubernetes cluster we want. So right now we can do tsh kube ls and we can
see all of the Kubernetes clusters that we have available to us. Right now my
user role only has access to the cookie cluster.
Next we're going to go ahead and log in to our Kubernetes cluster
and this will actually give us the kubeconfig from Teleport. It's going to take a second. Great. So we're logged in.
Now let's try to get all of our pods.
Awesome. So now we're in and we can run our commands on
the cluster. So let's go ahead and do what we did before and
let's describe this colormatic pod here in the colormatic namespace. Great. So we can see all that same information that we saw before when we
were connected to the direct host. Now even from
my personal workstation we're securely logged in through Teleport and can actually run kubectl commands on our cluster.
Now if we go back in our web UI we can
actually see the results of our session here.
So we can see that the certificate was issued for my user.
We can see all of the details about that, the AWS
role ARNs that I have access to, all of the various metadata for
this session. All of this is tied
directly to my identity and
we can see all of the various kubectl commands that we ran.
We can see the request to the cluster and we can see all of the
various metadata for that. We can see the Kubernetes users,
the Teleport login, the namespace, all of the different protocol
information. And like I said before, all of this
information is very easy to export and ingest into a SIEM tool
for easy monitoring and easy alerting on anomalies
and various other things. Because all of this is just raw JSON that we can choose to use however we want. And this is how we use Teleport to securely access the cluster from my workstation, with all of the traffic passing through the Teleport proxy and being centrally logged in one location.
So that was Teleport in action.
Thank you so much for watching and check out some of the other talks.
We've got some great ones here at Conf 42 Cloud Native 2023.
You can also check us out on Slack at teleport slack.com.
I'm always hanging out there and am totally free to answer any questions you
may have or any clarifications. Or if you need help getting started with
Teleport, you can also check us out at teleport.com, where
you can sign up for a cloud trial for our enterprise solution or download
our open source version and try it out for yourself. However you start
your journey with Teleport, it's the easiest and most secure way to access all of your infrastructure. Thank you so much. Have a great day.