Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. Thank you for having me.
Today we're going to talk about Ambient Mesh. My name is Abdel.
I'm a cloud developer advocate with Google. I work
on Kubernetes and service mesh.
I've been with the company for almost ten years. I'm also a co-host of
a Kubernetes podcast called the Kubernetes Podcast from Google. I'm also
a CNCF ambassador, and that's my Twitter/X handle if
you want to reach out about anything related to this topic. So today
I'm going to do an introduction to ambient mesh. But before I get going,
we have to understand what Istio is and what
it does. So Istio is a service mesh tool.
It's open source, it is part of the CNCF landscape. It's actually
a graduated project in the CNCF.
Like any Kubernetes tool, it basically follows the same architecture,
which is based on a control plane and a data plane. In the
case of Istio, the control plane is literally called istiod,
which is a deployment. If you are familiar with Kubernetes,
it's essentially a deployment that runs in the istio-system
namespace, and it's completely stateless,
so it can be scaled up and down depending on traffic.
The data plane for Istio is what we call the proxies.
These are based on an open source proxy called Envoy.
Envoy is a C++ proxy, which was written and open sourced
by Lyft, and they have made it available for everybody to
use. And the way Istio works is
essentially you would deploy the control plane, then you
would typically label namespaces in Kubernetes to
say, I want this namespace to be part of the service mesh. And I'm
going to explain later why we say a service mesh. What
would happen is the proxy will be automatically injected next to your workloads,
and it will be set up in such a way that it transparently
intercepts traffic coming in and out of your application. So in the
example you see on the screen, the little
rectangle, the gray rectangle, is the pod. Inside the pod you have
the service, which is one container, and then the proxy is a second container.
It has a name: in Kubernetes we call it a sidecar.
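As a rough sketch of that labeling step (the namespace name my-app is hypothetical), enabling automatic sidecar injection usually looks like this:

```sh
# The istio-injection=enabled label tells istiod's mutating webhook to inject
# the Envoy sidecar into pods created in this namespace from now on.
kubectl label namespace my-app istio-injection=enabled
```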
A sidecar is not a Kubernetes native object per se.
Well, that's not entirely correct anymore:
since Kubernetes 1.28, Kubernetes
has implemented a way to handle sidecars
in a more native way. But sidecar
is really just a common term, an agreed-upon
pattern: inside a pod, if you have two containers
and one container is providing extra features,
then we call it a sidecar. So when
your pod boots, the proxy boots, connects to the Istio control plane,
and downloads all its configuration, including any policies to
enforce, any routing to do, where to send
telemetry, et cetera. It also downloads the entire routing table,
so it keeps in memory a list of all the other pods in
the cluster. So the proxy is aware of
all the other pods. It also downloads any certificates,
and one of the key things that people use Istio for is mTLS,
mutual TLS, where you have certificates on both the client and the server.
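As a minimal sketch of that mTLS use case (the namespace name is hypothetical), a PeerAuthentication policy can require mutual TLS for everything in a namespace:

```yaml
# Require mTLS for all workloads in the "my-app" namespace.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: my-app
spec:
  mtls:
    mode: STRICT
```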
So this is how Istio stands today.
It also has a set of what we call special proxies.
One of them is the ingress gateway and the other one is the egress gateway.
They are just standalone Envoy proxies that are also configured
through the control plane. One of them, the ingress gateway, is
used for ingress, so all traffic coming from outside the service mesh to inside the
service mesh, and you can also use it to enforce some policies on
the perimeter. And then the egress gateway is used for traffic leaving
the service mesh, which also can be used to implement some policy
enforcement. Now why do we call this a
service mesh? Where is the term mesh coming from? Well, if you're familiar
with Kubernetes, in the Kubernetes space you would use a Service, capital S,
as a way to implement service discovery and load balancing. So if you have two
applications, application A and application B, inside the Kubernetes cluster,
you would create a Service for application B, and then that Service would
create a DNS entry in kube-dns, or CoreDNS, or whatever DNS you're using.
And then you would use that FQDN of the Service in
order to (a) discover all the pods behind the Service and (b) have
a single point of entry toward the Service.
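For example, a plain Kubernetes Service for application B might look like the sketch below (names are illustrative); it gives callers a stable DNS name such as application-b.my-app.svc.cluster.local in front of the pods:

```yaml
# Illustrative Service: selects application B's pods and exposes them
# behind a single DNS name and cluster IP.
apiVersion: v1
kind: Service
metadata:
  name: application-b
  namespace: my-app
spec:
  selector:
    app: application-b
  ports:
  - port: 80
    targetPort: 8080
```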
Typically a Service gives you a stable VIP, a virtual IP,
and then regardless of where you are in the cluster, if application A
is talking to application B, application A would use service B's name for service discovery,
it will get an IP address in return, and it will send traffic to that IP
address. Then some magic behind the scenes, which is typically
implemented through iptables or other mechanisms, implements the load
balancing, which is typically round robin. In a service mesh scenario,
that's completely different.
Capital-S Services are still used for service discovery, so you still implement service
discovery from an application perspective in the same way: if you have service B, you have
to create a Kubernetes Service for it, and then you would use that
Kubernetes Service to call it from application A.
But the communication path is completely different. Since the proxies
know each other, and every proxy knows every other proxy,
service discovery is implemented the same way, but at the
moment application A sends traffic, that traffic is intercepted
by the proxy, and the proxy sends traffic directly to
the other proxies that represent service B, directly to
their IP address and port. So the VIP is not used for traffic routing;
that cluster IP created by the Service you created
is not used in this case. And that's why we call it a service mesh:
you basically have a mesh of communication, where every proxy,
or every pod, talks to every other pod.
So that explains a little bit how
Istio works. Istio is used for a lot of
things, including policy enforcement. You can do things like timeouts
and retries and circuit breakers. You can do things like authorization,
using JWT tokens, or using SPIFFE,
the workload identity framework used across the cloud native ecosystem.
You can do a lot of traffic shaping, like content-based routing, canaries, A/B
testing, et cetera. So all of these things are implemented in
the infrastructure layer. That's the key point with a service mesh like Istio: if
you wanted to implement any of these things yourself, as a developer you would have
to write code for them. With Istio, you basically let
the network layer, if you want, handle all of these things for you. So the app
itself doesn't even have to be aware that there is a timeout,
or a retry policy, or a circuit breaker, or any of these things.
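As a hedged sketch of what "the infrastructure handles it" means in practice, a timeout and retry policy in Istio is just configuration, here for a hypothetical service-b, with no change to application code:

```yaml
# Illustrative VirtualService: callers of "service-b" get a 5s timeout and
# up to 3 retries, enforced by the sidecar proxies.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: service-b
  namespace: my-app
spec:
  hosts:
  - service-b
  http:
  - route:
    - destination:
        host: service-b
    timeout: 5s
    retries:
      attempts: 3
      perTryTimeout: 2s
```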
The concept of a service mesh is not really new. It has existed for a very long time;
people have had to do this kind of traffic routing before,
and it used to be implemented through proxies. What a service
mesh really introduced is this concept of the sidecar that we have talked about.
So sidecars
give us a lot of things.
They allow us to implement smart
features in the network, in the infrastructure layer, without having
to implement them in the code.
And while they are useful and important, sidecars have
some complications. One of them is that they are very invasive. What do we
mean by invasive? Well, imagine a scenario where you
don't have Istio, and you want to adopt it. You would start by deploying the
control plane; that's typically just a
deployment. There are a bunch of CRDs that you have to deploy, because the
Istio world has its own objects for traffic routing,
so all those CRDs get created. And then you need
to add existing applications to the service mesh.
Here I'm talking about a scenario where you're going from not having a service mesh
to wanting Istio; if you're starting fresh, this is probably not
a problem for you. What you would do is tag
or label namespaces, and then you will have to restart your pods. And that's why
we say it's invasive: it requires restarting workloads in order for
the proxy to be injected and used.
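Concretely, that restart step is usually something like the following (the namespace name is hypothetical); sidecars are only injected when pods are created, so existing workloads have to be recreated:

```sh
# Recreate all deployments in the labeled namespace so the webhook can
# inject the sidecar into the new pods.
kubectl rollout restart deployment -n my-app
```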
That's typically not a problem,
but it depends on the scenario: if you
don't want to reload your workloads, then it becomes hard.
Typically what people do is wait until the next time they
upgrade their Kubernetes clusters and then install Istio,
which is fine, except that you are then making
too many changes at once, and that's typically not recommended
from a change management perspective.
It also doesn't work for some protocols: sidecar-based Istio
doesn't implement TCP, sorry,
it doesn't implement WebSockets. There are a bunch of things that are not supported;
only HTTP communication is fully supported. And then
the last thing, and this is a real point of contention for people who have been
using Istio over the last five years, is the
resource requirements. In the last benchmark executed
on Istio, I think 1.18, the numbers
were something like 0.3 or
0.4 virtual CPU and around 40
or 50 megabytes of memory per sidecar, for a
service serving 1,000 requests per second.
Again, don't forget that there is a sidecar per pod. So for each pod
there is your container plus the sidecar. 0.35
or 0.4 vCPU and 50 MB of memory might not
sound like a lot, but if you are running a cluster that contains 1,000
or 2,000 containers, that adds up: 2,000 sidecars at 0.35 vCPU and 50 MB each
is roughly 700 vCPU and 100 GB of memory just for the proxies.
Essentially, the moment you add Istio, you're doubling the number of containers in your
cluster, and that's an issue. So the community and
the maintainers of Istio got together and tried to
figure out a way to solve this, and they came up with this idea
of ambient mesh. So the whole idea of ambient mesh
is to change the data path. The control plane remains the same.
The data path, the way we insert intelligence
into the network, had to be designed against a set of requirements.
One of them is that it has to be nondisruptive to workloads; in other terms, adding or
removing the proxies, or whatever is going to replace the proxies, should be
transparent. Also, at least
for a while, ambient mesh should have compatibility with sidecar-based
Istio, because we are aware that the way people will
adopt ambient mesh will be through a migration of
existing Istio workloads, and that's a very complicated thing
to do. So one of the requirements is traffic interoperability
between traditional sidecars and no
sidecars, which is what ambient mesh is aiming to do.
And then enabling it or disabling it should be simple.
So in the new architecture for
ambient mesh, sidecars are gone and they
are replaced by two types of proxies.
Those proxies treat the mesh as two different layers:
a secure overlay layer and a layer seven processing layer.
The secure layer is implemented through a proxy per node,
so it's a multitenant, per-node proxy; there is no
per-pod proxy anymore. It's per node, and it's called
ztunnel. Ztunnel runs as a daemon, so it runs one proxy
per node. It's completely stateless, which means it can be scaled up
and down. It has built-in authentication
and encryption, and it implements some of the layer four policies
and telemetry. If you want full
layer seven policies, like authorization policies for example,
which require something to look at the HTTP headers
to implement the authorization, then they added another
thing called the waypoint proxy. This is a per-namespace
proxy which still uses Envoy. So ztunnel is a
newly developed proxy written in Rust, but the waypoint
proxy is based on Envoy. And then they use a new protocol
called HBONE for encryption and authentication.
Now, in this new architecture, there are a bunch of things that
get solved. One of them is that if people only want to
do mTLS, then you don't have to implement the layer seven
processing layer; you don't need it. You can just disable it and only
have the secure overlay layer through the ztunnel.
If you want some basic traffic management, like TCP routing,
et cetera, you can also do that. By the
way, I said earlier that Istio doesn't support TCP;
that was wrong, it doesn't support UDP, not TCP. And then
if you want some advanced traffic management or
security with authorization policies, then you can add the layer
seven processing layer. And through this new architecture, the aim
also is to try to make adopting a service mesh as easy as
possible. So this is what the ztunnel looks like.
I talked about the fact that ztunnel runs per node.
You can consider the little purple squares as the
nodes. Each node has a ztunnel running in it as a DaemonSet,
completely scalable up and down if there is a lot of traffic. All the containers
in the pods, which now don't have sidecars anymore, send
traffic to the ztunnel. And then the ztunnel implements HTTP
tunneling as an overlay to basically encrypt traffic as it goes between two nodes.
One of the things the ztunnel also does is keep the identity of the
pod. So if you have
pod c1 sending traffic to pod s1,
then s1 will see the traffic coming with the identity of pod c1.
So it will see the service account, essentially. That's what I'm trying to say.
Then if you want to add those layer seven policies,
then we create the waypoint proxy for you, or you will have to deploy it
manually. And then if
there are any policies to be enforced, then they will be enforced by the waypoint
proxy.
Again, the waypoint proxy runs per namespace, so there are no sidecars here either;
it's just a special proxy that runs somewhere and is
responsible for one namespace, and it's completely scalable as well if there
is more traffic, because it's stateless.
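As a sketch under assumptions (the exact command has changed across recent Istio releases, and the namespace name is hypothetical), deploying a waypoint for a namespace can be as simple as:

```sh
# In recent releases; older ones used "istioctl x waypoint apply".
# Under the hood this creates a Kubernetes Gateway resource that uses the
# istio-waypoint GatewayClass.
istioctl waypoint apply -n my-app
```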
So we talked quickly about how we'd traditionally
deploy the Istio service mesh in the sidecar-based deployment
model: you would deploy the control plane, you
would tag namespaces, and then restart
the workloads to inject the sidecar. In
the new mode with ambient mesh, you don't have to do any of that.
You deploy the control plane, obviously, and then you can just enable
the ztunnel, and the ztunnel is wired in through
the network CNI, because Istio does have a CNI plugin.
So they basically took the CNI and made it work with the
ztunnel.
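A rough sketch of that flow, assuming a namespace called my-app: the ambient profile installs the CNI plugin and the ztunnel DaemonSet, and a namespace label adds workloads to the mesh without restarting them:

```sh
# Install Istio with the ambient profile (control plane, istio-cni, ztunnel).
istioctl install --set profile=ambient

# Add the namespace's workloads to the ambient mesh; no pod restart needed.
kubectl label namespace my-app istio.io/dataplane-mode=ambient
```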
So what is HBONE? Traditionally,
with Istio sidecar-based proxies,
every connection from the client creates a new TCP connection
between the proxies. So you see here I have two containers,
c1 and s1. Container c1 talks to three different ports,
and for each of those ports there is a new TCP tunnel,
or TCP connection, created between the proxies.
With HBONE, one of the things this protocol
can do is tunnel all of those connections through a single mTLS
connection using HTTP CONNECT.
So it's actually better performance than
sidecars. And although this is what
HBONE is able to do, by the way, this diagram is actually not visually correct,
because there are no sidecars: it's the ztunnels
talking to each other. And the ztunnels will have a single mTLS connection and
they will tunnel all traffic through that connection.
I don't have a demo, so I just want to quickly talk about
some things that are important to keep in mind about Istio traffic
management in the existing sidecar-based model.
This is typically how you would do traffic management. If you're familiar with Kubernetes,
you know that you create deployments and services and so on. But if you add Istio,
then remember all the CRDs I talked about.
All those CRDs give you objects that allow you to do
traffic management in Istio. So here is an example:
I have a VirtualService and a DestinationRule. So let's take an example.
We have service A on one side and we have service B on the
other side, and then we add service B
version two. And I want to send part of the traffic from
service A to service B v2, in this case 5%.
I can't do that with Kubernetes natively. So what I have to do is
deploy what we call a DestinationRule. What DestinationRules
essentially do is define subsets, in a way creating virtual destinations, not
to be confused with the actual object called VirtualService: they take
service B v1 and service B v2 and make them look
like two different destinations. And then with the VirtualService, you can
say I want 5% of traffic to be sent to v2 and
95% to be sent to v1. And because of
the mesh concept I talked about earlier, the sidecar
on the service A side is able to do that fine-grained tuning of
traffic between A and B.
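A minimal sketch of that 95/5 split, with hypothetical names: the DestinationRule defines the v1 and v2 subsets by pod labels, and the VirtualService assigns the weights:

```yaml
# Subsets make "service-b" look like two destinations, v1 and v2.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: service-b
spec:
  host: service-b
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
---
# The VirtualService sends 95% of traffic to v1 and 5% to v2.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: service-b
spec:
  hosts:
  - service-b
  http:
  - route:
    - destination:
        host: service-b
        subset: v1
      weight: 95
    - destination:
        host: service-b
        subset: v2
      weight: 5
```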
What happened over the last few years is that Kubernetes, or rather the Kubernetes
community, has worked on a new open source
API called the Gateway API. The Gateway API is essentially a set of APIs
that are going to be the next generation of ingress;
they will eventually replace Ingress as an API.
And the Gateway API was designed with a bunch of lessons learned
from the Ingress API. One of them is being able to do things natively
in the API itself instead of relying on extra CRDs or
extra annotations. If you have implemented
Ingress in Kubernetes,
you would know that it can get very verbose, because the Ingress
API in Kubernetes covers only the lowest common denominator
across all cloud providers and all the open source tools that exist,
and it is up to each cloud provider
and each open source tool, each gateway implementation,
to add the layer of
customization they need. And those annotations that you
see in an Ingress object are typically not compatible with each other.
The Gateway API set out to have a single,
standard way of implementing most of what people care about,
things like path-based and
host-based routing rules and those kinds of things. And so the
Gateway API comes in three different objects.
So you have what we call a GatewayClass, a Gateway, and an HTTPRoute.
There are also TCPRoutes and TLSRoutes.
A GatewayClass is essentially something that
the cloud provider implements or installs for you,
and it defines the type of load balancer; the Gateway object creates an actual
load balancer. The HTTPRoute is what maps
the actual service, the backend, to that load balancer.
You can have multiple personas deploying these things. In the Ingress world,
it's up to the service owner inside the namespace to
deploy the Ingress object to expose their application outside the cluster.
Here, you can have a platform admin implement the load balancers
for you, and then it's up to each service owner to implement their own routing
rules. Also, one of the key implementation details of
the Gateway API is the fact that you can do cross-namespace
routing, which you couldn't do with Ingress. This is just an example
where you have a Gateway object called foo,
which uses a GatewayClass provided by the cloud provider or
the infrastructure provider. In that foo Gateway object,
which deploys the load balancer, you can decide what the domain is, you can have
TLS certificates, you can have policies, et cetera. And then you can allow the store
developer and the site developer, two different namespaces, two different apps, to
use HTTPRoutes to define how traffic gets from the load balancer to their
backends. This is roughly how such an object looks.
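As a sketch under assumptions (the gateway class, namespace, domain, and certificate names are placeholders), the foo Gateway could look roughly like this:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: foo
  namespace: infra                    # platform-admin namespace (hypothetical)
spec:
  gatewayClassName: example-lb        # provided by the cloud/infrastructure provider
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    hostname: "*.foo.example.com"     # placeholder domain
    tls:
      certificateRefs:
      - name: foo-cert                # hypothetical TLS certificate secret
    allowedRoutes:
      namespaces:
        from: All                     # let the store and site namespaces attach routes
```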
And there is a reason why I'm talking about this
in the context of ambient mesh.
So this is an example where you have an HTTPRoute that says: for
hostname foo.com, if a request matches
the canary header, then send it to the canary version of the service;
otherwise, send it to the existing version.
You can also do things like weight-based splitting, like 80%/20%, stuff like that.
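A hedged sketch combining both ideas, the header-based canary match and a weighted split, with placeholder names:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: store
  namespace: store              # the service owner's namespace
spec:
  parentRefs:
  - name: foo                   # the shared Gateway owned by the platform admin
    namespace: infra
  hostnames:
  - "foo.com"
  rules:
  # Requests carrying the canary header go to the canary backend.
  - matches:
    - headers:
      - name: canary
        value: "true"
    backendRefs:
    - name: store-canary
      port: 8080
  # Everything else is split by weight between two versions.
  - backendRefs:
    - name: store-v1
      port: 8080
      weight: 80
    - name: store-v2
      port: 8080
      weight: 20
```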
So what's happening right now is that the Istio community has decided to
take the Gateway API and use it as
the way to do traffic management. There are multiple reasons for this.
The main reason, the most straightforward one, is that no one needs more CRDs,
so we're trying to get rid of CRDs. That's reason number one. Reason number two:
since the Gateway API already comes with a bunch of those routing rules
natively implemented in the API itself, Istio decided that
if this is the way forward for Kubernetes, and eventually at
some point all Kubernetes clusters will have the Gateway API installed out of the
box, because it is an upstream API like Ingress
is, then we might as well leverage it and use it
to implement this. And so as of today, both Istio
and Linkerd actually support the Gateway API. So you can
just use the Gateway API to define all your routes
and routing rules instead of
using the built-in CRDs of Istio or Linkerd.
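For in-mesh (east-west) traffic, the pattern, still settling, is to attach the HTTPRoute to a Service rather than a Gateway; a hedged sketch, assuming a service-b Service with v1 and v2 backend Services:

```yaml
# HTTPRoute attached to a Service: in-mesh calls to service-b are split
# between two backend Services.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: service-b
  namespace: my-app
spec:
  parentRefs:
  - group: ""
    kind: Service
    name: service-b
  rules:
  - backendRefs:
    - name: service-b-v1
      port: 80
      weight: 95
    - name: service-b-v2
      port: 80
      weight: 5
```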
And there are more implementations to come down the road. That's it.
I hope this was useful as a basic introduction to ambient mesh.
I know I talk a lot and I talk very fast, so maybe you can
go back, slow it down a little bit, and review the slides.
There are a bunch of links in the show notes to material
to read, and I hope that was useful.
Don't forget to reach
out to me on Twitter if you have any questions, or if
you need any help, and subscribe to the podcast. Thank you.