Transcript
Hello, my name is Andrei, and today I'm going to tell you about two different approaches to managing Kubernetes clusters: OpenStack Magnum and the Cluster API. But before that, let me introduce myself and my company. My name is Andrei Novoselov. I work at Gcore as a system engineer team lead. Gcore has a lot of products, such as CDN, DNS, cloud and many more. But I work at the Gcore cloud, so I'll tell you about the Gcore cloud.
We have more than 20 locations around the globe where we provide the public cloud service. So users can use some basic cloud services, such as virtual machines, load balancers, bare metal servers and firewalls, or more complicated platform services, such as managed Kubernetes as a service, function as a service, logging as a service. Basically, anything as a service is in my team's responsibility zone.
But today we'll talk about the managed Kubernetes as a service, about OpenStack Magnum and about the Cluster API. And we'll start with OpenStack Magnum. So what does Magnum do? It orchestrates container clusters and it supports two technologies: Docker Swarm and Kubernetes.
And we'll talk about Kubernetes. But Magnum itself cannot configure the virtual machines; it uses another OpenStack component, which is called OpenStack Heat. Heat creates the cloud-init config for the Kubernetes nodes, and it also updates application versions and configuration on the Kubernetes nodes.
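Just to give a feeling of what Heat operates on: a Heat template is a YAML document that describes resources such as servers together with their cloud-init user data. Here is a minimal, hypothetical sketch, not the actual Magnum template; the image, flavor and file contents are placeholders.

```yaml
# Minimal, hypothetical Heat (HOT) template sketch -- not the real Magnum template.
# It only illustrates the idea: Heat boots a server and injects a cloud-init config.
heat_template_version: 2016-10-14

parameters:
  image:
    type: string
    default: fedora-coreos        # placeholder image name
  flavor:
    type: string
    default: g1-standard-2-4      # placeholder flavor name

resources:
  kube_node:
    type: OS::Nova::Server
    properties:
      image: { get_param: image }
      flavor: { get_param: flavor }
      user_data_format: RAW
      user_data: |
        #cloud-config
        write_files:
          - path: /etc/kubernetes/kubelet-env     # illustrative only
            content: |
              # node configuration rendered by Heat would go here
```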
So let's take a look at the Magnum architecture. First of all, Magnum has an API, and it's an HTTP API. It receives requests, and basically all those requests are tasks: for example, create a cluster, update the cluster, delete the cluster. The Magnum API puts those tasks into the RabbitMQ queue, and the Magnum conductor takes those tasks from the RabbitMQ queue and executes them. It's a pretty common architecture for an OpenStack service: an API, RabbitMQ and some engine. Heat is pretty much the same. It also has a Heat API, RabbitMQ to pass the tasks from the API to the engine, and the Heat engine.
Heat also has a Heat agent. The Heat agent runs on every virtual machine configured with Heat, and it updates the application version and application configuration if needed. So what are the limitations of this approach? Well, first of all, there is no control plane isolation from the user. The user gets full admin access to the Kubernetes cluster and can do whatever he wants with the control plane components, and that's not how you do a managed Kubernetes service. And there's one more minor thing: it's an OpenStack API, and we do not provide users with access to the OpenStack API, so we have to hide this API behind the Gcore cloud API. But that's not a big deal. Let's talk about the control plane isolation.
Here's a scheme of the Gcore Magnum-based managed Kubernetes service architecture at the cluster level. The managed Kubernetes cluster is always inside the client's private network, and the client can select which networks he prefers. And let's talk about the control plane. The control plane is three virtual machines, and on every virtual machine we have all the Kubernetes control plane components, such as etcd, the Kubernetes API server, the controller manager, the scheduler, kube-dns and so on and so forth. All of them are Podman containers controlled by systemd. etcd, the Kube API and kube-dns have their ports exposed outside of the virtual machines, and we have load balancers. So there are three virtual machines, three etcd replicas and three Kube API replicas, and we have cloud load balancers to hide all those replicas behind. And we also have a firewall which only allows access to those three exposed services.
And none of that is visible to the client. The client cannot see the firewall, the control plane nodes or the load balancers. The client can have as many worker nodes as he wants. On the worker node, the kubelet also runs inside a Podman container controlled by systemd, and for the Kubernetes workload the container engine is Docker itself. And if the kubelet or any other pod inside the cluster needs to access the Kube API, etcd or kube-dns, it has to go through the cloud load balancers and through the firewall to the master nodes, the control plane nodes. And that's it.
And what are the pros and cons of OpenStack Magnum? Well, the big plus is that it is OpenStack, and it has a great community which develops Magnum, supports it, adds new features and fixes bugs for us. And I guess that's it; that's the biggest plus. Let's take a look at the downsides. Well, first of all, it is extra RPS for the cloud API. Like I said, we had to hide the Magnum and Heat APIs behind the cloud API, so now all the requests to those APIs go through the cloud API, and that's a lot of requests if you have a lot of clusters. But that's not a big deal.
The second thing is that this construction is really fragile. What do I mean? If something goes wrong while the cluster is being created or updated, the Magnum cluster goes to the error state, and there is no way, using the OpenStack CLI, to make it alive again. So basically the debugging looks like this: you have to find the reason. For example, OpenStack could not create a load balancer for etcd or for the Kube API while it was creating the cluster, so the cluster went to the failed state during creation. Let's say you fix the original reason; I don't know, you restarted Octavia, or maybe the problem was in the RabbitMQ queue, but now it's fixed and the load balancer can be created. Now you have to log in to the production MariaDB and update some rows for this cluster, to say that this cluster is now active, not in an error state. And the same for Heat. And while it's just one row for the cluster inside the Magnum DB, Heat has a more complicated structure of Heat templates and Heat stacks, and you have to find the affected stack and update it as well. That's a lot of update operations on the production database, which is not what you want to do on a daily basis.
And like I said, Octavia, or maybe RabbitMQ, can end up in some error state; it happens from time to time. So you have not only to fix that OpenStack component, but also to fix the Magnum clusters that were being created while Octavia or RabbitMQ were not available.
The other thing is observability. Let's say you want to know the state of all the Kube API containers in all your clusters. Well, you have to log in via SSH to the control plane nodes to find out whether everything is okay or not. Or you may create a Prometheus exporter, if you're using Prometheus to monitor these clusters, or do something like that, but out of the box there's no such thing as a health check for all the systemd units on the master nodes. If you want to be sure that everything is okay, you have to log in via SSH to those nodes. And there are no bare metal nodes. That's a huge minus.
And one more thing: here's the compatibility matrix. In the left column you can see the OpenStack version, and in the second column you can see the supported Kubernetes versions. So if you are using Yoga, which is just one year old, the highest Kubernetes version for you is 1.23. And here are the Kubernetes versions supported right now: they start with 1.24, then 1.25, and 1.26 is the highest. But in April we'll have a release, I hope, and this will change: the supported versions will be 1.25, 1.26 and 1.27. So OpenStack is about two Kubernetes versions behind Kubernetes, and I guess that's a big deal.
And we did not wish to add support for those Kubernetes versions to Magnum, so we decided to take a look at other tools for managing the lifecycle of a Kubernetes cluster. There's a project called Cluster API, and it's a Kubernetes project. What's the goal of this project? Well, first of all, it manages the lifecycle, and by that I mean create, scale, upgrade and destroy, of Kubernetes clusters using a declarative API. It works in different environments, both on premises and in the cloud. It defines common operations, provides the default implementations, and provides the ability to swap out implementations for alternative ones. So if you don't like something in the Cluster API, you can do it your way, and the Cluster API can be extended to support any infrastructure.
Sounds great. So what is it made of? What are the main components of the Cluster API? Well, first of all, it's the controller manager, a bootstrap controller manager, a control plane controller manager and an infrastructure provider. Let's talk about that a little. The Cluster API is basically four Kubernetes controllers, and it runs inside Kubernetes, so you need a Kubernetes cluster to create another Kubernetes cluster. It operates on Kubernetes objects, on custom resources: these controllers watch those custom resources and reconcile them. And what do we have out of the box? The bootstrap controller out of the box supports kubeadm, MicroK8s, Talos and EKS. The control plane controller out of the box supports kubeadm, MicroK8s, Talos and a project called Nested. And there's a bunch of infrastructure controllers, for AWS, Azure, vSphere, Metal3 and lots and lots of other providers, but there is no Gcore provider yet.
So let's talk a little bit more about how it works. We have those binaries, the controller manager, the bootstrap controller manager and the control plane controller manager. They handle the whole lifecycle of the Kubernetes cluster, and they know nothing about the infrastructure behind them. That's why you need an infrastructure provider. An infrastructure provider is a thing that allows the Cluster API to create some basic infrastructure objects such as load balancers, virtual machines, firewalls, server groups, et cetera. And the Cluster API says: okay, if your infrastructure provider allows me to create virtual machines, load balancers and firewalls, I can do anything on that cloud.
So let's take a look at some YAML, starting with the Cluster object. There's obviously some cluster configuration inside the spec, but I want to point your attention to the control plane ref and the infrastructure ref. We have a Cluster, and it has an infrastructure reference to the GcoreCluster object, which is the implementation of this specific cluster on the Gcore provider.
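To make this concrete, a Cluster object with those two references might look roughly like the sketch below. The GcoreCluster and GcoreControlPlane kinds, the API groups and all the names are assumptions for illustration, not the exact manifests of our provider.

```yaml
# Rough sketch of a Cluster object -- the Gcore kinds and names are illustrative.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: client-cluster-1
  namespace: client-cluster-1
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["10.244.0.0/16"]
  controlPlaneRef:                 # reference to the control plane object
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: GcoreControlPlane        # assumed kind for the Gcore control plane provider
    name: client-cluster-1-control-plane
  infrastructureRef:               # reference to the Gcore-specific implementation
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: GcoreCluster             # assumed kind for the Gcore infrastructure provider
    name: client-cluster-1
```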
And if we take a look at the control plane, we'll see pretty much the same thing. It says: okay, we need three replicas of control plane nodes with some Kubernetes version, and it has a reference to the infrastructure provider, to the kind GcoreMachineTemplate. So we have a template for the Gcore-specific infrastructure provider, for the machine that will become a control plane node, and we need three of them. So the control plane has a reference to a GcoreMachineTemplate in its infrastructure ref, and the same goes for the machine deployment. A MachineDeployment is an object which describes a worker group of a Kubernetes cluster. This one says: okay, we need six workers, and to create them, please use the infrastructure reference to the GcoreMachineTemplate with this name. And that's it.
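A MachineDeployment for such a worker group could be sketched roughly like this; again, the Gcore kinds, the bootstrap API group and the names are assumptions for illustration:

```yaml
# Rough sketch of a MachineDeployment for a worker group -- Gcore kinds and names are illustrative.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: client-cluster-1-workers
  namespace: client-cluster-1
spec:
  clusterName: client-cluster-1
  replicas: 6                      # six workers, as in the example above
  selector:
    matchLabels: {}
  template:
    spec:
      clusterName: client-cluster-1
      version: v1.24.10
      bootstrap:
        configRef:                 # bootstrap config reference (assumed Gcore bootstrap kind)
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: GcoreConfigTemplate
          name: client-cluster-1-workers
      infrastructureRef:           # reference to the machine template
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: GcoreMachineTemplate # assumed kind
        name: client-cluster-1-workers
```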
What I tried to say is that this is how the Cluster API works. It has some basic objects which do not change from one cloud provider to another, and they always carry a reference to the infrastructure provider. So the Cluster always refers to the GcoreCluster, and the control plane and the machine deployment are both created from GcoreMachineTemplates.
And one more thing: the Machine. A Machine is an object which describes a control plane virtual machine, a worker virtual machine, or a bare metal machine.
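A single Machine could be sketched like this, again with an assumed GcoreMachine kind and illustrative names:

```yaml
# Rough sketch of a Machine object -- one control plane or worker node.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  name: client-cluster-1-worker-abc12            # illustrative name
  namespace: client-cluster-1
spec:
  clusterName: client-cluster-1
  version: v1.24.10
  bootstrap:
    dataSecretName: client-cluster-1-worker-abc12  # bootstrap/cloud-init data for the node
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: GcoreMachine                           # assumed kind for a Gcore VM or bare metal machine
    name: client-cluster-1-worker-abc12
```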
So what are the limitations of the Cluster API? Out of the box we have the kubeadm bootstrap provider, which is more or less suitable for us, and the kubeadm control plane provider. That basically means that kubeadm will be used to bootstrap the control plane and to join workers to that control plane; there is a provider for cluster bootstrapping which uses kubeadm, but there is no infrastructure provider for the Gcore infrastructure. What other limitations can we see? There's still no control plane isolation from the user, you need a Kubernetes cluster to create another Kubernetes cluster, there is no Gcore provider, and we still have three virtual machines for the control plane if we use kubeadm.
So we decided to do it our way, and this is our implementation of the Cluster API. We decided not to use three virtual machines for the control plane. We already have a Kubernetes cluster that runs the Cluster API, and we decided to put the control plane containers into this cluster as well. So we have a namespace for each client cluster, where we have the custom resources describing the cluster: the Cluster, the GcoreCluster, the control plane, the machine deployments. And we also have the control plane pods inside it. So the control plane components are pods inside the service Kubernetes cluster, all the Cluster API objects are in the same namespace as the control plane pods, and we have no virtual machines for the control plane.
To do that, we had to create a Gcore bootstrap controller, a Gcore control plane controller and an infrastructure provider, which is called CAPGC. And thanks to our colleagues for their help with that. Thank you, guys. We also had to do two more things: we had to create an OpenVPN controller, and we use Argo CD; we'll talk about both a little bit later. So let's take a look at the Gcore Cluster API based managed Kubernetes service architecture. Now you can see that there are some differences. In the client's private network there are no master nodes anymore, only worker nodes. And in the Gcore private network we have a Gcore service Kubernetes cluster, where in some namespace we keep all the Cluster API custom resources, such as, like I said, the Cluster, the machine deployments, the control plane and so on. We also have the control plane binaries there, such as etcd, the Kube API, the controller manager, the scheduler and some more.
Since the service cluster is in a Gcore private network and the worker nodes are in the client's private network, there's no direct network connectivity between them. So what did we do? We used a cloud load balancer to expose the Kube API on a public IP address, so all the components on the worker nodes can access the Kube API via the Internet.
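As a rough sketch of that exposure, assuming a plain Kubernetes Service of type LoadBalancer in front of the hosted Kube API pods (the name, label and namespace are illustrative):

```yaml
# Rough sketch: expose the hosted kube-apiserver pods via a cloud load balancer.
apiVersion: v1
kind: Service
metadata:
  name: kube-apiserver             # illustrative name
  namespace: client-cluster-1
spec:
  type: LoadBalancer               # the cloud provider allocates a public IP
  selector:
    app: kube-apiserver            # assumed label on the control plane pods
  ports:
    - name: https
      port: 6443
      targetPort: 6443
```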
And that's it. That's simple. But what about the reverse connectivity? What if someone wants to type the command kubectl logs? What happens when you do that? Your Kubernetes API works as a proxy to the kubelet, and the kubelet gets the logs from the node. What if someone wants to use a port-forward? It's pretty much the same: the Kube API works as a proxy to some service in the Kubernetes cluster. But the Kube API has no way to access the kubelet, because the kubelet is inside the client's private network and is not accessible. And the admission webhooks: what if the Kube API has to validate some custom resource before putting it into etcd? It would need to access some pod inside the client's cluster, and there's no way to do it.
So, a VPN. We decided to do it this way. On the client side, we put a pod with an OpenVPN server, and we expose it with a cloud load balancer to the Internet. In the control plane, in the Kube API pod, there are two containers: one of them is the Kube API and the other one is the OpenVPN client. The OpenVPN client connects to the OpenVPN server pod through the cloud load balancer, and it gets the routes to the node network, the pod network and the service network. And now, right inside the Kube API pod, we have routes to all those networks, and the Kube API can access the kubelets on the node network, and the services and pods inside the Kubernetes cluster.
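Conceptually, the Kube API pod with its OpenVPN client sidecar could be sketched like this; the images, flags, secret name and the use of a bare Pod instead of a Deployment are all assumptions for illustration:

```yaml
# Rough sketch of a hosted control plane pod with an OpenVPN client sidecar.
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: client-cluster-1
spec:
  containers:
    - name: kube-apiserver
      image: registry.k8s.io/kube-apiserver:v1.24.10
      command: ["kube-apiserver"]
      args:
        - --etcd-servers=https://etcd:2379       # illustrative flag only
    - name: openvpn-client
      image: example.org/openvpn-client:latest   # placeholder image
      securityContext:
        capabilities:
          add: ["NET_ADMIN"]                     # needed to create the tun device and routes
      volumeMounts:
        - name: vpn-config
          mountPath: /etc/openvpn
  volumes:
    - name: vpn-config
      secret:
        secretName: client-cluster-1-openvpn     # assumed secret with client config and certs
```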
But what if the client deletes something? What if the client deletes the OpenVPN server, Calico or kube-proxy? We have Argo CD for that. Inside our service cluster we have an Argo CD which has apps for all the infrastructure that should be controlled by Gcore inside the client's worker nodes, such as kube-proxy, kube-dns, Calico and the OpenVPN server for the reverse connectivity. So Argo CD renders the manifests for all that infrastructure and puts them directly into the Kube API, which is located inside the Gcore service cluster. Then the kubelet accesses the Kube API and finds out which pods should run on the node. So if the client deletes anything using kubectl, Argo CD will recreate it.
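An Argo CD Application for one of those components might look roughly like this; the repository URL, path and destination endpoint are placeholders, and the destination points at the client cluster's exposed Kube API:

```yaml
# Rough sketch of an Argo CD Application managing in-cluster infrastructure for a client cluster.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: client-cluster-1-openvpn-server
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.org/infra-manifests.git   # placeholder repository
    targetRevision: main
    path: openvpn-server                               # placeholder path with the manifests
  destination:
    server: https://203.0.113.10:6443                  # the client cluster's exposed Kube API (placeholder)
    namespace: kube-system
  syncPolicy:
    automated:
      selfHeal: true    # recreate anything the client deletes
      prune: true
```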
And one more thing, about observability. Compared to OpenStack Magnum, where we could not find any suitable tool to find out whether everything is okay with a cluster, here we have it just out of the box. We can use a command like kubectl get cluster -A, and we'll get all the clusters which were provisioned using the Cluster API in our service cluster. So we can see that we have one, two, three, four clusters, all of them are ready, and the version is 1.24.10. We can also get information about all the control planes. The output is pretty much the same, but it tells us everything about the control planes, not about whole clusters. So we can see that all control planes are ready: etcd, the Kube API, the kube-scheduler and the controller manager are all up and running, and here's the version of the control plane.
We can also take a look at the worker pools, which are called machine deployments in the Cluster API. We can see that in this namespace, the first one, there should be three replicas, so three worker nodes, and all of them are ready, all of them are updated, they are running, they're two days old, and the version is 1.24.10. We can get this information about any cluster we're interested in with a single kubectl command, or about all of them. And we can get the machines: kubectl get machine -A will bring us all the virtual machines or bare metal machines which are controlled by the Cluster API inside this cluster, and we can even see the OpenStack ID of each virtual machine, how long it has existed and the Kubernetes version on it. So it's really great.
And that's it. We moved from Magnum to the Cluster API, and what did we get from it? We got a great speed-up: the whole control plane is just a bunch of pods, and it's much easier to spin up some pods than to create virtual machines. We got easy updates, and that's much faster. Obviously, we got the easy upgrades: we have the 1.24, 1.25 and 1.26 versions of Kubernetes out of the box, and we can update all the infrastructure inside the client's cluster with our Argo CD applications, and that's really easy as well. We got the reconciliation loop.
So if Heat tried to create, for example, a load balancer and failed, then you are the one who has to fix it. If the Cluster API tries to create a virtual machine, a load balancer or whatever, and fails, it just waits a little and tries again and again until everything is done well, and that's really great. So after you fix your Octavia or Nova or RabbitMQ inside OpenStack, you do not need to reconfigure all the clusters in the region: the Cluster API will just do another try and succeed, and we'll be happy with that.
We got the bare metal nodes, powered by Intel, and it's a great feature: a lot of our clients wanted bare metal worker nodes, and now we have them, because we created an infrastructure provider which can use bare metal nodes as worker nodes for Kubernetes. We got transparency: if you want to look at a specific machine, or at a machine deployment, or at a cluster, in the Kube-native way with kubectl, you've got it. And we have no control plane nodes anymore, so there's no extra cost for us for managing control plane nodes. And I guess that's it. Thank you for your attention, and feel free to contact me if you have any questions.