Conf42 Incident Management 2024 - Online

- premiere 5PM GMT

Securing Kubernetes: Best Practices and Strategies for the Modern Era

Abstract

Unlock the secrets to securing Kubernetes clusters in a rapidly evolving threat landscape! Discover cutting-edge strategies for protecting control planes, nodes, and workloads with real-world insights into RBAC, encryption, and runtime protections. Elevate your Kubernetes security game!

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello, folks. A very warm welcome from my side to Conf42 Incident Management 2024. I am Manpreet Singh Sachdeva and I work for Walmart Global Tech in the United States. Today I will be talking about Kubernetes cluster security. I will explain how Kubernetes has become really important in managing containerized applications, and why it is so important for us to secure our Kubernetes workloads and applications and to protect ourselves against cyber attacks. First, let me introduce myself. I carry diverse experience in DevSecOps, MLOps, incident management, and test automation. I am certified as a Kubernetes administrator, application developer, and security specialist. I am certified in AWS as well, where I hold both the Solutions Architect and the DevOps Professional certifications. I'm very passionate about cyber security. I consider myself a cyber warrior fighting against cyber criminals and protecting applications throughout the internet world. Cyber security is one of the hottest topics nowadays. We saw an example a couple of years back with the Log4j vulnerability and the damage it caused throughout the industry: all our customers were at stake, and so many companies suffered big reputational as well as financial damage. Cyber criminals have many ways of attacking us. The most popular are phishing and exploiting cloud resources to attack the cloud infrastructure, where our worker nodes and our applications are impacted as a result. We'll talk about all of this today: how we can secure our infrastructure, how we can secure our workloads, and how we can secure our cloud as well. I'm really excited to share my talk, and I hope all of us learn together and get something out of it. So, let's get started.
So, Kubernetes cluster security: what is it, how did Kubernetes become important, and what advantages does Kubernetes offer us? A few years back, all of us were hosted on virtual machines and bare metal servers. Then we had a revolution where everything started getting containerized. We had Docker containers, and we started making our applications platform agnostic: we can use Java, Node.js, or Python, containerize the application, and run it anywhere. Docker had its own way of orchestration using Docker Swarm, but it had limitations: we couldn't operate at scale, and there were limitations in resources as well. Then came Kubernetes, which offered us out-of-this-world orchestration, and we started operating at scale. Kubernetes was made for applications to work at scale, and the different application teams started containerizing their workloads and adopting the Kubernetes way of functioning. Today we will learn how we can leverage real-world best practices to safeguard our cloud native applications. Here is the table of contents for what we will go through today. We will talk about Kubernetes security in general and our multi-layered approach to it: how we can secure the control plane, the nodes we work on, our applications, and the network part of Kubernetes. Then we will touch on some real-world threats and incidents which happened in the past, and how we can leverage some best practices for our workloads going forward. I'll also share some resources at the end of this session. I'm excited.
So let's get started with an introduction to Kubernetes security. As I told you, Kubernetes revolutionized the way organizations deploy, manage, and scale applications. The best example is that you can scale in and scale out based on autoscaling, which we can configure with the help of cloud services. Kubernetes also has its own autoscaling, called Horizontal Pod Autoscaling, where you scale in and out according to the customer traffic you get. So we are given great power to operate at scale, but with great power comes the great responsibility of securing our clusters. That is essential, because a cluster becomes a prime target for cyber attacks. A single security flaw can lead to major data breaches, service outages, and, as I told you, a lot of reputational damage. In the past we have seen big outages caused by security vulnerabilities in our infrastructure, maybe due to negligence or unawareness on the part of the teams who operate at scale. Here we will talk about how we can secure not only our workloads and applications but also the control plane and the network: how to make sure we have network policies that don't even allow access to our cluster from certain IP addresses. We'll also talk about how to manage our secrets so that we don't expose them and never keep them in plain text. We always encode them and use secrets management systems like Vault, AWS Secrets Manager, and Azure Key Vault, working in conjunction with the cloud providers to secure the infrastructure. For modern cloud infrastructure, security is non-negotiable. Robust defenses at all layers ensure that applications remain safe from intrusions. And when we talk about security, we talk about the four C's.
One is cluster-level security: securing the platform, the cluster itself. When I talk about the cluster, I mean the infrastructure. Then the container, the workloads which are running in containers, since everything is containerized. We have to secure our containers so that nobody can execute commands inside a container, and nobody can just take the IP address and do anything they like on it. We also have to secure the cloud. Whatever cloud provider we use, GCP, Azure, or AWS, they all have really good functionality which we can leverage to secure our cloud, and I will give some examples at the end. And last we come to the code. Even the code has vulnerabilities if we don't follow the standard coding guidelines we should be following when we work in an agile environment. We'll talk about that as well. So we have a multi-layered approach to security. By securing each layer, we ensure comprehensive protection across the entire Kubernetes environment, and by comprehensive we mean end-to-end protection. Kubernetes cluster security involves addressing risk at multiple layers. There are four key layers. The first is the control plane, which is the brain of the cluster. All the master nodes are in the control plane, and we have to make sure that the way we manage our cluster is secure. We have to apply all our security policies at the control plane, because the admin always has control plane access, and we have to follow the security guidelines and adhere to whatever security policies our company gives us.
We all have our company policies, so we have to adhere to them. Then we talk about the nodes: the worker nodes, which may be virtual machines or physical machines running our workloads. Then we talk about the actual workloads, the containerized apps deployed in the cluster, and how we can secure them. And then we talk about the network, the communication pathways that connect the services, the applications, and the infrastructure together. So let's first talk about control plane security. In the control plane we have API server authentication and authorization, and it's really important to secure them. For authentication we can implement OAuth or OpenID Connect, which is a very strong way of authenticating, and for authorization we can use role-based access control (RBAC) policies. RBAC policies are very popular in the Kubernetes world. Suppose I'm a user who is part of a group, and the group has an owner. As a group owner or a group member, I will only have access to certain functions on Kubernetes. Suppose I'm part of the developer community and I'm using a staging environment. I may be able to list, create, delete, or do anything else in that lower environment. But in a production environment, my role-based access as a developer should not allow me to delete or edit any resource. So role-based access attaches a certain set of rules to the group we belong to. Suppose there is a developer group, a DevOps group, an SRE group, and the incident managers. Each will have a different kind of role-based access.
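To make the staging-versus-production example concrete, here is a minimal Python sketch that builds two RBAC Role manifests as plain dictionaries. The role names, the `staging` and `production` namespaces, and the `allows` helper are assumptions made for illustration. In particular, `allows` is a simplified local check and not how the Kubernetes authorizer actually evaluates rules.

```python
import json

READ_ONLY = ["get", "list", "watch"]
READ_WRITE = READ_ONLY + ["create", "update", "patch", "delete"]


def make_role(name, namespace, verbs):
    """Return a namespaced RBAC Role manifest for pods and deployments."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {
                # "" is the core API group (pods); "apps" holds deployments.
                "apiGroups": ["", "apps"],
                "resources": ["pods", "deployments"],
                "verbs": list(verbs),
            }
        ],
    }


def allows(role, resource, verb):
    """Simplified check: does any rule list both this resource and this verb?"""
    return any(
        resource in rule["resources"] and verb in rule["verbs"]
        for rule in role["rules"]
    )


# Hypothetical roles for a developer group.
staging_role = make_role("developer", "staging", READ_WRITE)
production_role = make_role("developer-readonly", "production", READ_ONLY)

if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(production_role, indent=2))
```

A Role only takes effect once a RoleBinding ties it to a user or group, so in practice each of these would be paired with a binding for the developer group.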
And the role-based access will define the kind of operations they can do on a particular workload or on a particular control plane node. That was role-based access. Then we come to etcd encryption. What is etcd? We have heard this name a lot in Kubernetes. It is nothing but a key-value database, and it works much the way Redis works. It should always be encrypted at rest to prevent unauthorized access to our critical information. Encryption on etcd is very important because without it all our resources, including the kube-system resources, are vulnerable: they will all be exposed if we don't have encryption at the etcd level. Most companies follow etcd encryption, but even if you are doing a demo project or just a POC, it is still very important to encrypt your etcd, because a small gap in managing etcd properly risks exposing your resources. So it's always recommended to encrypt etcd at rest. Then we talk about network policies. Kubernetes services are the way pods and other resources communicate with each other, based on the ports we expose, and network policies control the traffic between the pods behind them. Suppose one app is in one namespace and another app is in a different namespace, and we don't want them to communicate. We will write network policies that restrict traffic between those apps. Now suppose there is a database app which uses a logging app. We always want those two to communicate, so for them we will have a network policy which allows access to and from both apps on the ports of their service objects. So network policies control the traffic between the pods, and they minimize the risk of lateral movement within the cluster.
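The database and logging example can be sketched as two NetworkPolicy manifests, again built as Python dictionaries. This is only an illustration under assumptions: the `data` namespace, the `app` labels, and port 8080 are hypothetical, and the policies are only enforced when the cluster runs a network plugin that supports them.

```python
import json


def default_deny_ingress(namespace):
    """NetworkPolicy that selects every pod in the namespace and allows no ingress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {
            # An empty podSelector matches all pods in the namespace.
            "podSelector": {},
            # Ingress is listed but no ingress rules follow, so nothing is allowed in.
            "policyTypes": ["Ingress"],
        },
    }


def allow_ingress(namespace, target_app, source_app, port):
    """NetworkPolicy that lets pods labelled source_app reach target_app on one TCP port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": "allow-" + source_app + "-to-" + target_app,
            "namespace": namespace,
        },
        "spec": {
            "podSelector": {"matchLabels": {"app": target_app}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": {"app": source_app}}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


# Hypothetical namespace, labels, and port.
deny_all = default_deny_ingress("data")
allow_db_to_logging = allow_ingress("data", "logging", "database", 8080)

if __name__ == "__main__":
    print(json.dumps([deny_all, allow_db_to_logging], indent=2))
```

Starting from a default-deny policy and then adding narrow allow rules is one common way to limit lateral movement: anything not explicitly allowed stays blocked.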
There are specific tools which help us leverage network policies, such as Calico or Cilium, which can define and enforce these kinds of policies. So that was control plane security. Now we will talk about how we can secure our nodes. If somebody has worker-node-level access, they can do anything: they can remove our workloads, and they can even expose the secrets. If attackers have access to the node, it becomes really difficult for us to control what they can do. On the node there is an operating system, so we have to harden it. How can we do that? Start by minimizing the attack surface at the OS level. Use lightweight, hardened distributions which are less vulnerable, for example Ubuntu Minimal. Then follow the CIS benchmarks for system hardening. CIS benchmarks are nothing but security standards, and we make sure we follow the guidelines they set. Then we talk about container runtime security. On a node we have the containers, so it's always better to limit a container's access to the host system by enabling AppArmor or SELinux. I will talk about AppArmor, and I'll share some resources on how we can secure our runtime through it. With AppArmor, if a profile is not listed, or if it is blacklisted, it cannot access the container at runtime. That's a really important feature: you can restrict certain profiles through blacklisting and allow certain profiles through whitelisting. Both can be done. So you can restrict the container runtime to a specific profile. By profile, I mean a user, a group of users, or a particular profile which we have created on the node and which has access to some of the containers. That can be achieved through AppArmor.
Then we talk about security at the kubelet level. How can we secure the kubelet, the agent running on each node? It runs like a daemon on each node. We can secure it by enforcing TLS for communication, using certificates that are bound to keys, and by restricting kubelet API access. That way, only clients that hold the certificate key can perform operations at the kubelet level. So it is always good practice to use TLS certificates: we enforce TLS for communication and restrict the kubelet API, and then we have no unauthorized access at the node level. Only the admins and other authorized users who hold the keys for those certificates can run commands on the kubelet. That was node security. Now let's talk about application security, the workload security: how to come inside a container and have policies which secure it. The most important thing is not to give a container an elevated level of access. The user should not have root access, and we should not expose access that allows running admin commands. That is really important when we talk about the pod security standards. Initially there was a feature called pod security policies, which is now deprecated and has been replaced by Pod Security Admission. It enforces the pod security standards, such as the baseline and restricted levels, so that privileged access is restricted.
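As a rough illustration of Pod Security Admission, the sketch below builds a Namespace manifest carrying the `pod-security.kubernetes.io` labels, plus a container security context that avoids root and privilege escalation. The namespace name `payments` and the helper function names are hypothetical, and the security context shows typical non-root settings rather than a complete statement of what the restricted level requires.

```python
import json

PSA_PREFIX = "pod-security.kubernetes.io/"
PSA_LEVELS = ("privileged", "baseline", "restricted")


def namespace_with_pod_security(name, level="restricted"):
    """Return a Namespace manifest that asks Pod Security Admission to apply a level."""
    if level not in PSA_LEVELS:
        raise ValueError("unknown pod security level: " + level)
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {
                PSA_PREFIX + "enforce": level,  # reject pods that violate the level
                PSA_PREFIX + "warn": level,     # show a warning to the client
                PSA_PREFIX + "audit": level,    # record violations in the audit log
            },
        },
    }


def non_root_security_context():
    """Container securityContext that avoids root and privilege escalation."""
    return {
        "runAsNonRoot": True,
        "allowPrivilegeEscalation": False,
        "capabilities": {"drop": ["ALL"]},
        "seccompProfile": {"type": "RuntimeDefault"},
    }


# Hypothetical namespace name.
payments_namespace = namespace_with_pod_security("payments")

if __name__ == "__main__":
    print(json.dumps(payments_namespace, indent=2))
    print(json.dumps(non_root_security_context(), indent=2))
```

Setting `warn` and `audit` before switching `enforce` to a stricter level is a gentle way to find out which existing workloads would be rejected.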
Then we talk about runtime security: continuously monitoring the containerized workloads for malicious activity. For runtime security we have a tool called Falco, which detects anomalies based on the rules and behavior patterns we have set. I will share the resources for Falco at the end of the session. It's a really cool tool for applying runtime security to workloads. Then a very important thing is secrets management: how we manage our secrets. As I told you, some people just expose their secrets in config maps in plain text. That is not good practice, because these config maps are available for any user to list, and they can even copy our secrets. First of all, sensitive information like API keys or passwords should never be hardcoded. Hardcoding a password or an API key is damaging in itself and can cause a really big issue. Instead we should use secrets management solutions like AWS Secrets Manager, where we can store the secrets and rotate them after maybe 30 or 90 days, as per our secrets policy. Most companies use HashiCorp Vault to manage sensitive secrets information. By using HashiCorp Vault or Azure Key Vault, we never expose our secrets, and they are always encoded in base64 format. These are the best practices we can use to secure our workloads, and secrets management is definitely a very secure way of managing the secrets and passphrases we use in our code. Moving forward, we talk about network security: how can we strengthen our cluster security further? We were able to secure the cluster, and we were able to secure our workloads.
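One caution on the base64 point in the secrets discussion above: base64 is an encoding, not encryption, so anyone who can read a Secret object can recover the value. The short Python sketch below shows this with a Secret manifest built as a dictionary. The secret name, namespace, and password are made up for illustration.

```python
import base64


def make_secret(name, namespace, values):
    """Build an Opaque Secret manifest; the `data` field holds base64-encoded values."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": name, "namespace": namespace},
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in values.items()
        },
    }


def read_secret_value(secret, key):
    """Decode a value back to plain text; no key or password is needed."""
    return base64.b64decode(secret["data"][key]).decode("utf-8")


# Hypothetical secret used only for this demonstration.
db_secret = make_secret("db-credentials", "staging", {"password": "not-a-real-password"})

if __name__ == "__main__":
    print("stored :", db_secret["data"]["password"])
    print("decoded:", read_secret_value(db_secret, "password"))
```

This is why the encoding has to be combined with the other measures from the talk: etcd encryption at rest, RBAC on who may read Secrets, and an external secrets manager with rotation.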
Now we have to look at how our workloads and all the applications hosted on Kubernetes interact with the network, and how we can make sure there are no attacks on the network side, or prevent them. So we will talk about service meshes. Implementing a service mesh in a large-scale application is almost a must. With an Istio-based service mesh, which is in itself very secure, we get mutual TLS encryption between all the microservices that interact through the mesh. Using an Istio or Linkerd service mesh ensures that you are secure at the network level: incoming traffic first reaches the service mesh and is then routed to the workload. So it's always good practice to have a service mesh in front of your application. Then we can implement security through ingress and egress control. Some of you might be aware of the NGINX Ingress Controller. With it we can secure the external-facing services, that is, the domain names we use. Suppose we use a domain name such as abc.com or xyz.com. These are nothing but ingress hostnames. The way to secure them is server-client encryption using TLS certificates, together with ingress controllers. Ingress controllers implement encryption themselves, and we can define clear rules for both ingress and egress traffic with Kubernetes network policies. So we use the Kubernetes network policies we discussed together with the ingress objects defined in our cluster through the ingress controller.
All the ingresses are defined on the ingress controller, all of them can be protected with TLS certificates, and we can combine network policies with the ingress hostnames. That is how we secure our incoming and even our outgoing traffic through ingress and ingress controllers. Then, you might have heard the term distributed denial-of-service attack, or DDoS. DDoS protection also matters for Kubernetes, because clusters exposed to the public internet are always at risk of DDoS attacks. There are cloud services which can prevent these attacks. All the cloud providers, such as AWS, Azure, and GCP, offer very scalable solutions that protect us from these kinds of attacks. Now we'll talk about real-world threats and incidents, since we all manage or take part in major incidents. In the past couple of years we have seen major incidents in organizations coming from security vulnerabilities. In particular, there was a case where the Kubernetes infrastructure was compromised, with big reputational as well as financial damage, so we'll take that example here. In 2021, a Kubernetes security incident happened: an attacker gained unauthorized access to the cluster-level API server due to weak authentication configuration. As we were just discussing, leaving the cluster API exposed because of weak authentication can be very damaging. In this incident the attacker was able to gain access and destroy some of the running deployments. The compromised API also allowed the attacker to extract sensitive data: the secrets as well as the passphrases.
The attacker was also able to tamper with the running workloads, causing a service outage and disruption at that time. This is just one example, but there were several incidents caused by the same kind of problem: unauthorized access to the Kubernetes cluster API server. How can we prevent it from happening? Strong API authentication and role-based access control enforcement are a very good way of preventing it. And as we already discussed with etcd, it's very important to encrypt our data in etcd at rest, at all times. Another thing we can do is a monthly security audit, or automate the security audits, to catch all kinds of misconfigurations: somebody has removed a rule, or there is no role-based access present on a cluster. Those are all red flags. All companies have an InfoSec team, and they should be doing regular security audits to help prevent these kinds of incidents. We have spoken a lot about security at all the Kubernetes layers, so let's also discuss some best practices which we can take away from this session and follow in our daily work to secure our Kubernetes clusters. I've listed some of the industry best practices for securing Kubernetes. First is regular security audits, as we discussed on the previous slide: continuously auditing the security of the Kubernetes clusters using the benchmarks provided by the CIS. That always helps us identify misconfigurations or gaps in our infrastructure, so a regular security audit is a must for applications that operate at scale. Another way we can help our SRE teams and even the DevOps teams is to set up continuous monitoring.
For real-time monitoring, we discussed a tool called Falco. There's another tool, an open source tool called Sysdig, with which we can always watch for suspicious behavior across the cluster. Falco has rules. If a rule for a particular workload is triggered, we get alarms: we are notified by email and on all our notification channels, such as PagerDuty or xMatters, when there is a security breach, when there is a rule which should not have been there, or when a workload triggers something such as a user having unauthorized access. Continuous monitoring in a DevOps culture and an agile environment definitely helps the whole ecosystem catch incidents and issues in their early stages and take the necessary actions. Another best practice for Kubernetes, a golden rule of thumb, is to always give least privilege. The least privilege principle means implementing role-based access control so that any developer or user who doesn't need admin access cannot go inside a container, run commands, create directories, or do anything else which is not required. So as a rule of thumb, give the least privilege to a user who is not required to do anything important on the cluster. Then, with automation, we can have automated patching that regularly updates and patches the Kubernetes components. Kubernetes also ships regular patches. If we follow the Kubernetes documentation, they release security patches, I think, about once a month, and it's very important to keep up with them. So there should be regular patching with the help of CI/CD tools like Jenkins, which can run automation pipelines.
That closes the known vulnerabilities from previous releases and keeps us up to date as per the Kubernetes documentation, which is really important for removing old vulnerabilities from the code or the cluster. These are the resources which I am sharing with my viewers. This one shows how we can encrypt the data on a cluster. This is another tool called Trivy, a very handy tool for securing our workloads. I was talking about Sysdig, so this is the Sysdig documentation. I was talking about AppArmor: how we can create a profile and some rules in AppArmor so that only the users associated with that profile can access certain workloads and other users cannot. I'm also sharing something called seccomp, which restricts the system calls a container can make. Then, for more about AppArmor, there are some tutorials specifically for security. Then we have the Falco rules, and these are the docs for the Falco rules, which you can go through. There's a cheat sheet for Kubernetes security, very handy for someone who implements security day to day on clusters. And there is a security checklist, which is also very handy when you host an application, or when you want your application to communicate with other applications or outside vendors. From the checklist you will know all the parameters which make sure that your workload as well as your node is always secure. These are really handy resources, and all of them are open source, so you don't need to buy a license. Everything is available as open source. The beauty of Kubernetes is that most of the stuff is cloud native.
It is also open source and very easy to implement, and you'll find most of it on GitHub, where people have written security policies which the whole world can leverage. At the end, I would say security is more of a responsibility. We should all step up, be open minded, and take responsibility in fighting as cyber warriors. Why? Because if we consider ourselves cyber warriors, we will find the ownership in ourselves to protect all our applications using these security best practices. Concluding this talk, I would say that as Kubernetes continues to drive cloud native innovation, security cannot be an afterthought. There should be no compromise on security. Every layer of the Kubernetes ecosystem must be fortified to protect against increasingly sophisticated cyber threats. By adopting best practices such as network policies, runtime security, runtime monitoring, and secrets management, organizations can ensure their clusters are secure and resilient. Kubernetes also offers a niche specialist certification, the Certified Kubernetes Security Specialist (CKS). It offers a deep understanding of the security measures and empowers professionals to take charge of cloud native security. For this you have to be CKA certified first, and then you can sit the CKS exam. If you want to grow in this field and understand the in-depth working of the security protocols at the Kubernetes layer, I think this is a must for anybody who is motivated by cyber security and wants to work in this field. That's the recommendation I can give: this certification will really help you understand the ins and outs of cluster security. That is it from my end. A massive thank you to the Conf42 Incident Management team, who gave me this opportunity to present my thoughts.
And a massive thank you to all my viewers who were with me, listening. I hope you all gained something and take away some really good security measures to secure your infrastructure and your cluster. If you have any questions, you can reach out to me using the details on the screen. I'm really thankful, again, for this opportunity to share my thoughts. I hope you liked it, and if you have any comments, please do share. Thank you.

Manpreet Singh Sachdeva

Staff Software Engineer @ Walmart Global Tech



