Conf42 DevSecOps 2024 - Online

- premiere 5PM GMT

Kubernetes Networking 101

Abstract

A beginner-friendly talk that explains how networking in Kubernetes works under the hood. It provides examples of the process of IP address management and helps you understand the journey of a packet as it travels from one pod to another within a Kubernetes (K8s) cluster.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello, everyone, and welcome to this session on Kubernetes Networking 101. In this session, we'll dive into the fascinating world of Kubernetes networking. Now, I know networking can sometimes feel like a black box, especially when Kubernetes has its own complexities on top of it. Whether you are a developer, a DevOps engineer, or somewhere in between, my goal today is to make this topic approachable and practical for you. Networking in Kubernetes is a vast topic, and it's impossible to cover everything in just 30 minutes, so in this session we'll focus on the foundational concepts and key areas that you can build on. Before we start, a quick introduction: my name is Abdul Karim. I'm the co-founder and principal architect at ReLambda, and I've been in the DevOps and SRE space for more than a decade now. Feel free to ask questions during the Q&A at the end, or you can also reach out over email or Twitter using these coordinates.

So here's the agenda. We'll start with a brief introduction to Kubernetes to ensure we are all on the same page. Then we'll dive into the core topic, Kubernetes networking. First, we'll explore how pods communicate, starting with communication within the same node, and then moving on to communication across multiple nodes. Next, we'll talk about overlay networks and how they handle cross-node communication. From there, we'll focus on services, which are key to abstracting and simplifying pod communication. We'll also briefly touch upon iptables and its role in routing traffic within a Kubernetes cluster.

So, Kubernetes is the most popular open-source container orchestration platform available in the market. It simplifies the deployment, scaling, and management of containerized workloads, making it a crucial tool for modern infrastructure teams. It was originally developed by Google and released as open source in 2014, so it's been almost 10 years now that Kubernetes has been available. One of the reasons Kubernetes has become so popular is its ability to handle dynamic environments. Imagine running hundreds or thousands of containers manually; it is just not feasible. Kubernetes automates much of this complexity for you. It also stands out because it adapts to changing demands effortlessly. For example, you can scale your application as well as your infrastructure up or down automatically, ensuring you always have just the right amount of resources: never too much and never too little. When something goes wrong, Kubernetes acts like a self-healing system as well, detecting and replacing any failed containers to keep your applications running smoothly. All of this is made possible by Kubernetes networking, of course, which acts as the highway connecting every container, pod, and service within your cluster. Without a well-maintained network, traffic would slow down, applications would crash, and the entire system could grind to a halt. So networking is the foundation that allows Kubernetes to orchestrate applications at scale, and that's what we'll be explaining in this session.

Now let's talk about pods. Pods are the basic building blocks of Kubernetes. In Kubernetes, you don't work directly with containers; instead, containers are grouped into pods. Think of a pod as the smallest deployable unit in Kubernetes: a logical wrapper around one or more containers that need to work together. All containers within a pod share the same network and storage namespace, which means they can communicate with each other using localhost.
This design simplifies the way containers interact within the same pod. For example, if you have an application split into multiple components, like a web server and a monitoring agent, they can reside in the same pod and communicate seamlessly. Pods are also designed to be atomic and ephemeral. Atomic means they're treated as a single unit: if something goes wrong in a pod, the entire pod is replaced. Ephemeral means they can be destroyed and recreated at any time, which introduces an important consideration: the pod's IP address is never permanent. It can keep on changing, and every new pod will have a new IP address. So in short, pods encapsulate containers, provide them with a unique environment, and act as the foundation for all higher-level Kubernetes abstractions, like deployments, replica sets, stateful sets, and so on.

Next, we'll talk about the Kubernetes networking model. Now that we understand what a pod is, let's dive deeper into how networking works for pods. Kubernetes operates on a model called IP-per-pod. This means that every pod in the cluster gets its own unique IP address. This is a fundamental part of Kubernetes design and a key reason why its networking model is so powerful. With each pod having a unique IP, Kubernetes can treat pods like independent entities on a network, right? Much like devices on a local network. This simplifies communication because one pod can reach another directly without requiring NAT, or network address translation, within the cluster. Another key aspect of the networking model is that all containers in a pod share the same network namespace. This means they share the same IP address, network interfaces, and port space. So by sharing the network namespace, containers inside a pod can communicate directly without going through pod-level networking. For example, if one container is running a web server and another is running a monitoring agent, the agent can collect metrics using localhost without any external networking configuration. This shared namespace is what makes pods so efficient for running tightly coupled workloads. While containers within a pod work as a cohesive unit, communication between pods continues to rely on Kubernetes' broader networking model, where each pod has its unique IP. So this inter-pod communication model is foundational to how Kubernetes simplifies application design.

Cool. Let's bring everything we've discussed so far into this example to visualize the concepts of IP-per-pod and shared network namespaces. So in this diagram, we have a virtual machine, VM1, with its own IP address. Inside this VM, there are two pods, each with its own unique IP address. Pod A has IP 192.168.0.10 and is running a Python application on port 3000, while pod C has IP 192.168.0.11 and contains two containers, one running a Java application on port 3000 and the other running a Datadog agent. Now, if you have run traditional workloads, you would realize that on a single VM it is not possible to run multiple applications listening on the same port. But with the isolation provided by containers and Kubernetes, you can have any number of applications all listening on the same port. So in this case, the Python and Java applications are both running on the same VM and both listening on port 3000. All right, this setup illustrates the IP-per-pod model: each pod is assigned a unique IP address, allowing pods to communicate directly with each other without requiring any translation, right?
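To make that concrete, here is a minimal sketch of what a two-container pod like pod C could look like as a manifest. This is not from the talk's demo; the image names are assumptions, and only the port layout mirrors the example:

```shell
# A minimal sketch of a two-container pod like "pod C" above. The image
# names are illustrative assumptions; only the port layout mirrors the talk.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: pod-c
spec:
  containers:
  - name: java-app
    image: example/java-app:latest   # hypothetical app image listening on 3000
    ports:
    - containerPort: 3000
  - name: datadog-agent
    image: datadog/agent:latest      # sidecar; can scrape localhost:3000/metrics
EOF
```

Because both containers share the pod's network namespace, the agent container can reach the Java application at localhost:3000 with no extra configuration.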
So for instance, if pod A wants to send data to pod C, it can do so by addressing pod C's unique IP address, 192.168.0.11. And within pod C, the two containers share the same network namespace. This means they both have access to the same IP address and can communicate with each other using localhost. For example, the Datadog agent can collect metrics from the Java application by connecting to localhost:3000/metrics. This shared namespace simplifies the way containers within a pod interact, making them feel like they are running on the same machine.

All right, so now that we've discussed the IP-per-pod model and how pods communicate, you might wonder: how does Kubernetes assign IPs? The answer is, it does not. Kubernetes doesn't itself get involved in IP address management; rather, it leaves that to the CNI, that is, the Container Network Interface. Let's look at the pod creation lifecycle in the next slide. So the process of IP address allocation for a pod involves several components working together, but I'll break it down step by step to make it simple. It all starts when Kubernetes schedules a pod on a specific node. The kubelet on that node takes over and instructs the Container Runtime Interface (CRI) plugin to create the pod. In this example, the CRI plugin is containerd. containerd creates a sandbox and sets up the pod's network namespace. containerd then hands off the job of setting up the networking to a CNI plugin, right? The Container Network Interface plugin. The CNI plugin works with your chosen networking solution, such as Flannel, Calico, or the AWS VPC CNI, to assign the pod a unique IP address from the available pool. Once the networking is ready, the CRI creates something called a pause container, which holds the network setup and acts as the base for all containers in the pod. Finally, the application containers are started, sharing the same network setup and IP address established by the pause container. Now, what's key here is that Kubernetes itself doesn't directly manage the networking details. It delegates that responsibility to the CNI plugin, which allows Kubernetes to support a wide variety of networking solutions.

Now that we understand how pods get their IP addresses, the next question is: how does one pod communicate with another pod using these IPs? Kubernetes provides a flat networking model that ensures every pod can communicate directly with every other pod in the cluster, as long as it knows the IP address of the target pod. So let's take a closer look at how this communication happens, starting with pods located on the same node. In this example, we have two pods, pod A and pod B, running on the same node. Each pod has its own unique IP. Now, pod A wants to communicate with pod B, right? We know that when Kubernetes creates a pod, it sets up a dedicated network namespace for that pod. Each pod has its own namespace, which includes its IP address and network stack, and both pods have been assigned their unique IP addresses. Now, all of these namespaces are isolated from each other, but they can communicate through the root network namespace of the node, right? So the root namespace acts as a bridge, enabling traffic to flow between the pods. To connect the pods to the root namespace, Kubernetes uses virtual Ethernet pairs, or veth pairs. Each pod is connected to the root namespace with its own veth pair.
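To get a feel for what the CNI plugin actually wires up here, below is a rough, hand-rolled sketch of that plumbing using standard `ip` commands. Purely illustrative: the names (pod-a, veth-x, cbr0) and the 10.0.1.10 address echo the talk's diagrams, and real plugins additionally handle IPAM, routes, and cleanup:

```shell
# A rough approximation of the plumbing a CNI plugin sets up for one pod.
ip link add cbr0 type bridge && ip link set cbr0 up    # node bridge (if not already present)
ip netns add pod-a                                     # the pod's network namespace
ip link add veth-x type veth peer name eth0-a          # a veth pair: two ends of one pipe
ip link set eth0-a netns pod-a                         # one end moves into the pod's namespace
ip netns exec pod-a ip link set eth0-a name eth0       # rename it to eth0 inside the pod
ip netns exec pod-a ip addr add 10.0.1.10/24 dev eth0  # assign the pod its IP
ip netns exec pod-a ip link set eth0 up
ip link set veth-x master cbr0                         # the other end plugs into the bridge
ip link set veth-x up
```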
So pod A connects to veth-x and pod B connects to veth-y. These veth pairs act as virtual cables, bridging the pods' network namespaces to the root namespace of the node. This ensures that the pods can send and receive packets within the node, right? And veth interfaces are always created in pairs, right? Imagine a pipe: a pipe always has two ends; a pipe cannot have one end. So the virtual Ethernet pairs are like pipes connecting the virtual network namespaces within the node. So the journey of the packet begins in pod A's network namespace. From there, the packet exits through the eth0 interface, which connects the pod to its corresponding virtual Ethernet pair, veth-x. Once the packet reaches this virtual Ethernet pair, it's passed to the root network namespace of the node. Within the root namespace, the container bridge interface plays a crucial role: cbr0 acts like a virtual switch connecting all the virtual Ethernet pairs on the node. It looks at the destination of the packet and knows that it has a routable interface, veth-y, available on the same node for the destination IP. Once the packet reaches the veth-y end of the pair, it travels into pod B's namespace and arrives at its destination interface, eth0. Pod B processes the packet as if it came directly from pod A, without any modifications to the source or destination. Right? That was easy.

Now let's look at an example where we have multiple nodes, right? We have pod A and pod B running on node 1, while we have pod C and pod D running on node 2, and in this case pod A wants to send data to pod C. Okay. Pod A generates a packet with its source IP set to 10.0.1.10, which is its own IP address, and destination IP 10.0.3.10, which is the IP of pod C. This packet exits pod A via its eth0 interface and travels through the virtual Ethernet pair, veth-x, into the node's root network namespace. Within node 1, the bridge device recognizes the destination IP and figures out that it is not local to node 1 and needs to be routed externally. The packet is now out of the host and on the network, right? And it reaches node 2. In this example we are seeing only two nodes, but in reality there could be hundreds of nodes in your cluster. It is the responsibility of the CNI plugin to ensure that the packet reaches the correct node based on the destination IP.

So if you have ever used Amazon EKS, you must be familiar with this dashboard, right? This is a worker node of an EKS cluster. We can see that AWS keeps a record of all the IP addresses assigned to the pods running on each node. Now, AWS deploys the VPC CNI plugin add-on with every EKS cluster, and it is the only officially supported CNI plugin on EKS. You can use other plugins like Calico or Cilium, but they are not officially supported. Coming back to our example, the packet now reaches node 2's network interface, eth0. From there, the bridge inspects the destination IP, 10.0.3.10, and forwards the packet to the correct virtual Ethernet pair, connected to pod C. And the packet finally reaches pod C. All right.
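As a quick aside, here is roughly what this direct, non-overlay cross-node routing can look like from node 1's point of view. This is a hedged illustration: the pod addresses come from the example above, while the node IP, interface names, and exact route format will vary by CNI plugin:

```shell
# Illustrative route table on node 1 (outputs shown as comments; the
# 192.168.1.12 node IP is an assumption, not from the talk).
ip route
#   10.0.1.0/24 dev cbr0 proto kernel scope link src 10.0.1.1   # local pods via the bridge
#   10.0.3.0/24 via 192.168.1.12 dev eth0                       # node 2's pod CIDR

# Where each pod landed, as the scheduler placed them (columns trimmed):
kubectl get pods -o wide
#   NAME    READY   STATUS    IP          NODE
#   pod-a   1/1     Running   10.0.1.10   node-1
#   pod-c   1/1     Running   10.0.3.10   node-2
```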
In the earlier slides, we saw how Kubernetes uses a flat networking model where every pod gets its own unique IP address. But what happens when your cluster grows and the IP range available for your pods starts to run out? This is where overlay networks come into play. An overlay network is a virtual network built on top of an existing physical network. Think of it as a way to create a larger logical network that spans multiple physical machines. Overlay networks are particularly useful in Kubernetes when the IP address range available for your pods is limited but you need to scale the cluster beyond those limits, or if you need to isolate traffic for security or multi-tenancy purposes, or if the underlying physical network doesn't natively support pod-to-pod communication across nodes, right? So the overlay network encapsulates pod traffic, right? Allowing it to traverse the physical network as if it were all part of a single logical network. This abstraction solves the problem of limited IP space while enabling communication between your pods. Common examples of CNI plugins that support overlay networks are Flannel, Weave, Calico, Cilium, and so on.

So let's try to understand how overlay networks work through an example. In this example, again, we have two nodes, both running two pods each, and pod A wants to talk to pod C. Now, you need to notice that there's a flannel device created in the root namespace of every node, right? We've shown it as flannel0. So the packet leaves pod A through its veth pair, reaches the root namespace of the node, and hits the bridge device, which realizes that the destination IP address does not belong to this particular node. In this case, something interesting happens: the packet is intercepted by the flannel device. Flannel maintains a mapping of all the pod IPs and their corresponding nodes in user space. So when the flannel device sees the destination IP, it can easily look it up in the map and realize that pod C is running on node 2. With this information, it encapsulates the packet, which means it puts a header on top of the packet with node 2 as the destination address and node 1 as the source address. So your entire original packet is now encapsulated inside a new packet created by flannel. The encapsulated packet is now ready to leave node 1, and it eventually reaches node 2. When this packet reaches node 2, it is again intercepted by the flannel device, and the magic of encapsulation is undone here: flannel strips the outer header, so the source is once again the IP of pod A and the destination is the IP of pod C. The packet then follows the usual path through the bridge and the veth pair and eventually reaches pod C. So this is how an overlay network functions in a Kubernetes cluster.

In Kubernetes, we've seen that pods are ephemeral, and pod IPs are ephemeral too, which means the IPs can change when pods are recreated or rescheduled. This poses a challenge, because communication between components using ephemeral pod IPs is not practical. So Kubernetes addresses this issue of dynamic pod IPs by introducing services. A service acts as a stable endpoint that abstracts and routes traffic to a group of pods instead of a single pod. This ensures that applications can communicate reliably even when pod IPs keep changing. Okay, this slide shows a typical YAML configuration for a Kubernetes service, so let's break it down. There are two important sections to notice here. The name in metadata is the service identifier, and it is used for service discovery.
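The manifest on the slide isn't captured in the transcript, so here is a hedged reconstruction of the kind of Service definition being described. The name matches the hello-kubernetes example mentioned next; the label and ports are illustrative guesses:

```shell
# A hedged reconstruction of the Service manifest described above; the
# name, label, and ports are illustrative, not the speaker's actual slide.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: hello-kubernetes        # the identifier used for service discovery
spec:
  selector:
    app: hello-kubernetes       # every pod carrying this label joins the service
  ports:
  - port: 80                    # the port the service exposes
    targetPort: 3000            # the container port on the backend pods
EOF
```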
If your pod wants to talk to this particular application, it can just make a GET request to hello-kubernetes, and Kubernetes will resolve this name, this domain name, to the service IP address using the cluster DNS. The other thing is the selector section in spec. It is a list of labels that are used to group pods, so all pods with the label hello-kubernetes will be part of the service.

So how does a pod actually talk to another pod using a service IP? Let's look at an example. Just like before, pod A creates a TCP packet and places it on the virtual Ethernet pair. This time the destination is not a pod; it is a service. The bridge device doesn't recognize this destination IP, and the packet is about to be sent to the default gateway. But before it leaves the node, it gets intercepted by the iptables rules of the host. So iptables is a firewall program, and every packet that goes in or out of your node has to go through iptables, if you have it installed and enabled, of course. Kubernetes uses iptables to perform something called DNAT, or destination network address translation, which means we are going to modify the destination network address. So the destination IP in the packet is rewritten from the service IP to the IP address of one of the backend pods behind the service, right? There could be multiple pods behind service C; it would pick one of them at random, right? It does a round robin of sorts by default. (A hedged sketch of what these iptables rules look like is included after this transcript.) This translation is managed by conntrack, which tracks the connection and ensures that the response packets from pod C are correctly routed back to pod A. So to pod A, it appears as though it is communicating directly with the service IP, while in reality Kubernetes is transparently redirecting the traffic to one of the backend pods. This abstraction allows Kubernetes to provide load balancing while hiding the complexity of individual pod IPs. With this new information, the packet lands on eth0 and finally moves out of node 1.

On its way back, the packet again hits iptables on the node, and you can see that the source and destination are flipped, right? Now the source is pod C and the destination is pod A, and iptables uses the conntrack information to rewrite the source IP address, right? Because the original destination was the service IP, the packet has to appear to come back from service C. This is known as SNAT, right? Source network address translation. So earlier, when the packet was going out, we did DNAT, and when it came back, we did SNAT. The packet finally reaches pod A, and the journey is complete.

So thank you so much, everyone. That was it from my end. If you're interested in reading and learning more about the networking concepts of Kubernetes, here are some resources that I've collected, which you can refer to later. Thank you.
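One small addendum to the DNAT discussion in the talk: below is, very roughly, the shape of the iptables rules that kube-proxy programs for a service in iptables mode. The chain-name conventions (KUBE-SERVICES, KUBE-SVC-*, KUBE-SEP-*) are real kube-proxy behavior; every IP, port, and hash suffix shown is invented for illustration:

```shell
# Illustrative only: the service IP, pod IP, and chain suffixes are made up.
iptables -t nat -L KUBE-SERVICES -n | grep hello-kubernetes
#   KUBE-SVC-HELLO  tcp  --  0.0.0.0/0  10.96.0.20  /* default/hello-kubernetes */ tcp dpt:80

# The KUBE-SVC-* chain picks one backend at random (the load-balancing step):
iptables -t nat -L KUBE-SVC-HELLO -n
#   KUBE-SEP-AAAA  all  --  0.0.0.0/0  0.0.0.0/0  statistic mode random probability 0.50000
#   KUBE-SEP-BBBB  all  --  0.0.0.0/0  0.0.0.0/0

# Each KUBE-SEP-* chain DNATs the packet to that pod's IP and port:
iptables -t nat -L KUBE-SEP-AAAA -n
#   DNAT  tcp  --  0.0.0.0/0  0.0.0.0/0  tcp to:10.0.3.10:3000
```

The `statistic mode random` rules implement the random backend selection mentioned in the talk, and conntrack then pins the rest of the connection to the chosen pod.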

Abdul Karim Memon

Co-Founder @ ReLambda



