Transcript
This transcript was autogenerated. To make changes, submit a PR.
My name is Yuri Besonov, and I'm a Partner Solutions Architect at AWS.
I work with different AWS customers and partners and help
them build solutions on AWS.
One of the areas where I'm most involved is containers, especially Kubernetes.
Kubernetes on AWS is delivered by Amazon Elastic Kubernetes
Service, in short, Amazon EKS.
And today, I will share with you how we can build and scale a production-grade,
multi-cluster Kubernetes environment using GitOps on EKS.
But before we go multi-cluster, let's start from the very beginning,
from one cluster, and define why we need cluster management at all.
We generally see customers with a Git source control system; some kind
of infrastructure-as-code tooling like Terraform, CloudFormation,
CDK, or Pulumi; some infrastructure-as-code state, like an S3 bucket
for Terraform; and then the resources, including an EKS cluster.
In the pipeline, customers usually run some checks on the source code,
build the application, run unit tests, and enforce policies, and after that they
need to deploy the application to the cluster.
And as they grow, they may add additional environments, for
development, staging, testing, or for different projects.
It is still manageable.
If growth continues, this can become unmanageable.
Too many versions, too many clusters, too many pipelines: tracking which
cluster is up to date, which cluster has which version, which needs to be
updated this quarter or the next.
And this is even without taking into account what is inside the clusters:
what the applications are, and what their update policies are.
And it's not uncommon to see this many pipelines in an organization.
Why do we have so many clusters?
Because we have different lines of business, different
applications, and different workloads.
We could have back-end systems, front-end web applications,
or even stateful workloads: workloads hosting databases, or streaming workloads.
There are plenty of legacy applications running in the clusters, and
also modern applications, like artificial intelligence and
machine learning workloads for training and inference.
And besides deploying the clusters, we also need to think about how teams
will access those clusters and share resources; how we will configure each
cluster with add-ons, maybe with different configurations for development
and production, for example, a different number of replicas in the
development and production environments; and also how teams will deploy
different workloads to different clusters.
So there are plenty of challenges when you need to work with containers,
with EKS, with Kubernetes at scale.
And of course, our pipeline could help us manage the add-ons,
deploy the infrastructure, and enforce our compliance and security.
It is possible with a pipeline, and in this case the model is a push model:
we trigger the deployment manually each time we decide to
deploy a new version of an application.
But what happens in case of some issue in the chain?
And what is the source of truth?
If problems arise, do we consider Git the source of truth?
The Terraform state file?
The pipeline itself?
Or maybe the EKS etcd database?
With this setup, the environment may not match your intention.
It is not a source of truth anymore; it's more, I would say, a source of hope.
And with this architecture, you need a lot of auditing and compliance mechanisms.
And what if we need to integrate new services?
Say, for example, we would like to use generative
AI services like Amazon Bedrock.
Then we need a product manager for the pipelines to integrate this,
and if we have many business units, they can be overwhelmed, and maybe
new services will never be deployed.
If we rely on GitOps tooling for fleet management, we can have a process that
constantly tries to reconcile the source of truth in the Git repository
with the actual state of the cluster.
So we can, and we should, solve the problems we had with the push model.
In this case, Git becomes a source of truth instead of a source of hope,
and it enables reproducible, automated deployments, cluster
management, and monitoring.
So we can use GitOps and various tooling to solve our deployment problems,
and we apply the principles of GitOps to achieve it.
The first principle is declarative configuration: we define what we want
to achieve, not how to achieve it, which reduces complexity.
The second principle is versioning: all the configuration is in Git,
versioned and immutable, which enhances auditability.
We know who did what and when.
We also use a pull model, where an agent in the cluster pulls the desired state
from Git instead of the pipeline pushing that state to the cluster.
And that agent continuously reconciles the state to the cluster.
It knows how to translate the declarative configuration in the Git repository
into the real state of the cluster,
and it does this constantly, so after each change we always have a consistent environment.
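To illustrate what this declarative desired state looks like, here is a minimal sketch of an Argo CD Application manifest; the repository URL, path, and namespaces are placeholders:

```yaml
# Minimal Argo CD Application: we declare what we want (manifests from Git,
# target cluster and namespace), and the controller pulls and reconciles it.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: sample-workload
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/workloads.git  # placeholder repo
    targetRevision: main
    path: apps/sample-workload
  destination:
    server: https://kubernetes.default.svc  # the local cluster
    namespace: sample-workload
  syncPolicy:
    automated:
      prune: true     # delete resources removed from Git
      selfHeal: true  # continuously reconcile drift back to the Git state
```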
Now let's go deeper and talk about how we deploy the clusters.
Many of our customers are using Terraform to deploy EKS clusters.
We have a nice accelerator called EKS Blueprints that can help you achieve
your deployment goals quickly by allowing you to rely on maintained,
best-practice patterns of EKS deployment.
It can bring you patterns like a fully private cluster, or clusters
which use IPv6, with managed node groups or Karpenter, for
analytics or machine learning.
For various purposes, we have various patterns already developed for you,
which you can use in your own environment.
Let's see the process to deploy an add-on, like a load balancer controller.
The cluster itself is just a compute environment, and in order to use it,
you need various components, various add-ons.
For the AWS Load Balancer Controller, for example, in order to make it work,
we need an IAM role and a policy which give the controller the ability to
access AWS resources, such as load balancers.
The role will reference a policy with the appropriate rights.
If we decide to install the add-on's Helm chart with Terraform, we can
configure it with the previously created role.
Terraform then installs the configured Helm chart into the
cluster, and the service account can reference the proper IAM role.
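For illustration, this is roughly the service account the chart would create; the eks.amazonaws.com/role-arn annotation is what ties the controller's pods to the IAM role (the account ID and role name are placeholders):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-load-balancer-controller
  namespace: kube-system
  annotations:
    # IRSA: pods using this service account assume the Terraform-created role
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/aws-load-balancer-controller
```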
But this is again a push model from Terraform.
We still hit the problem of which Terraform state is the source of truth,
and terraform apply only runs when manually triggered.
Can we do better?
Of course we can.
Can we use, for example, a GitOps approach with Argo CD?
We can ask Argo CD to talk to Kubernetes and do the Helm install of the add-ons
via the creation of an Application object that references the Helm chart repository.
But then we still need a way to provide it with the appropriate configuration,
in this case, the IAM role for the load balancer controller.
With Argo CD, there is the notion of an ApplicationSet.
ApplicationSets allow us to dynamically generate
Argo CD Application objects.
An ApplicationSet is able to read a secret in the cluster and apply
configuration based on the labels and annotations of that secret.
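As a sketch, an ApplicationSet with a cluster generator might look like this; the label and annotation keys follow the GitOps Bridge convention, but treat the exact names and chart version as placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: addon-aws-load-balancer-controller
  namespace: argocd
spec:
  generators:
    - clusters:  # one Application per matching cluster secret
        selector:
          matchLabels:
            enable_aws_load_balancer_controller: "true"  # placeholder label
  template:
    metadata:
      name: 'aws-load-balancer-controller-{{name}}'
    spec:
      project: default
      source:
        repoURL: https://aws.github.io/eks-charts
        chart: aws-load-balancer-controller
        targetRevision: 1.7.1  # example version
        helm:
          values: |
            clusterName: {{name}}
            serviceAccount:
              annotations:
                # IAM role ARN read from the cluster secret's annotation
                eks.amazonaws.com/role-arn: '{{metadata.annotations.aws_load_balancer_controller_iam_role_arn}}'
      destination:
        server: '{{server}}'
        namespace: kube-system
      syncPolicy:
        automated: {}
        syncOptions:
          - CreateNamespace=true
```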
But first, we need a way for our Terraform to create the secret, with
specific annotations on that secret.
In this case, Terraform creates the load balancer controller IAM role, references
this role as an annotation on the secret, and creates the secret in Kubernetes.
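The secret itself, as materialized by Terraform, could look roughly like this; the names, labels, and role ARN are placeholders following the GitOps Bridge convention:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: dev-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster  # marks it as an Argo CD cluster secret
    environment: dev
    enable_aws_load_balancer_controller: "true"  # matched by the ApplicationSet selector
  annotations:
    # created by Terraform and consumed by the ApplicationSet template
    aws_load_balancer_controller_iam_role_arn: arn:aws:iam::111122223333:role/aws-load-balancer-controller
type: Opaque
stringData:
  name: dev-cluster
  server: https://kubernetes.default.svc  # in-cluster; remote clusters use their API endpoint
```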
This way, Argo CD has all the necessary material
to create an Application and properly install the add-on.
Once Kubernetes has the secret configured, Argo CD can create an Application from
its ApplicationSet manifest with the appropriate configuration and install the add-on.
So we have a complete cycle.
To recap: Terraform creates a role; Terraform creates the Kubernetes secret
with the appropriate annotation for this role; then Argo CD creates
the Application from the ApplicationSet.
This is what we call the GitOps Bridge.
It is fully integrated with EKS Blueprints, and you can rely on this pattern
to keep your Terraform infrastructure-as-code management but shift the
Kubernetes deployment part to Argo CD.
In this case, Terraform is responsible for creating the clusters
and the secrets, but after that Argo CD takes over and
deploys all the workloads and all the necessary add-ons using ApplicationSets.
This can be used for add-ons, but also for workloads, when your workloads
can be configured dynamically from the metadata created by Terraform.
From there, we have different options for managing Argo CD, our clusters,
and the Git repositories for add-ons and workloads.
We can have different clusters pointing to the same Git add-ons repository,
where we store the generic ApplicationSets.
Each ApplicationSet can then be transformed by Argo CD
into different Applications,
which can have different values depending on the target cluster they run on.
So you have a set of ApplicationSets acting like templates, and
specific parameters for specific clusters
can be generated on the fly.
We can also use a centralized Argo CD instance in a hub cluster that
is able to manage application deployment in different
target clusters from a centralized place.
This is what we call a hub-and-spoke model:
you have a hub management cluster, and you have workload clusters for
different environments or different projects, all controlled
by one instance of Argo CD.
We can leverage Terraform to create our hub and spoke clusters with
a simple hub-and-spoke topology.
We can also leverage Terraform workspaces to manage different Terraform
configurations for our different clusters while reusing the same Terraform code.
With this setup, we need many secrets created in the hub cluster,
one secret for each target cluster: the hub cluster itself, dev clusters,
staging, or production clusters.
Argo CD is able to leverage those different secrets when it needs
to communicate with each cluster, so Argo CD has the necessary
credentials to access the workload clusters from the centralized hub cluster.
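As a sketch, a spoke cluster secret in the hub might look like this; the API endpoint, account ID, and role name are placeholders, and the IAM role is what Argo CD assumes instead of holding a static token:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: spoke-dev
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
    environment: dev
type: Opaque
stringData:
  name: spoke-dev
  server: https://ABCDEF.gr7.eu-west-1.eks.amazonaws.com  # placeholder endpoint
  config: |
    {
      "awsAuthConfig": {
        "clusterName": "spoke-dev",
        "roleARN": "arn:aws:iam::111122223333:role/argocd-hub-to-spoke-dev"
      },
      "tlsClientConfig": {
        "caData": "<base64-encoded cluster CA>"
      }
    }
```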
ApplicationSets stored in Git will generate different
Applications in the hub cluster.
We can also specify different add-on versions and other configurations by
matching labels on the target clusters.
Within a single repository, you can still control the different versions
to be deployed on different clusters, centralized in one location.
So you have the benefits of centralized configuration, but also
the flexibility to have specific parameters for various clusters.
You can then scale this to many other clusters:
several clusters for staging, or, for example, complete
sets of clusters, dev, staging, and production, for various projects
or various lines of business.
As I said, many of our customers are using Terraform and EKS Blueprints for
Terraform to create their infrastructure.
While Terraform is a really powerful tool for talking to AWS, we have
seen several issues when letting Terraform manage Kubernetes resources
inside the clusters, like add-ons and workloads, as we just saw.
We can delegate some of the Kubernetes work from Terraform to Argo CD using
the GitOps Bridge.
Terraform will still create AWS resources and Kubernetes secrets
with metadata annotations.
From there, Argo CD is able to configure and install the Kubernetes
add-ons, like External DNS or External Secrets, to one or many clusters.
With the GitOps approach, we can also manage teams.
We can create namespaces with Argo CD, and we can enforce role-based
access control, policies, and everything else using GitOps,
with the configuration stored in Git.
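For illustration, team onboarding through Git could be as simple as a namespace plus a role binding that Argo CD syncs; the team and group names are hypothetical:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-checkout
  labels:
    team: checkout
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-checkout-edit
  namespace: team-checkout
subjects:
  - kind: Group
    name: team-checkout  # group name as mapped from your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit  # built-in ClusterRole granting read/write in the namespace
  apiGroup: rbac.authorization.k8s.io
```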
Some customers are even able to go one step further, for example,
and let Argo CD hand off to tooling like ACK or Crossplane
to also create the necessary AWS resources, from VPCs to EKS clusters.
In this case, we don't talk about Kubernetes as a container orchestration
system, but as a platform framework.
We are building a fabric for people to bring different pieces of
infrastructure together using native constructs like mutating admission
webhooks, schema validation admission webhooks, and all the other
benefits of the Kubernetes API.
With that, with the help of GitOps and the Kubernetes API, we can
deploy not only workloads to the clusters, but also the clusters themselves
and all the resources your application requires in AWS,
like S3 buckets, databases, and so on.
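For example, with the ACK S3 controller installed, an S3 bucket becomes just another manifest that Argo CD can sync; the bucket name is a placeholder, and S3 bucket names must be globally unique:

```yaml
apiVersion: s3.services.k8s.aws/v1alpha1
kind: Bucket
metadata:
  name: my-app-assets
  namespace: team-checkout
spec:
  name: my-app-assets-111122223333  # actual S3 bucket name; placeholder
```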
Of course, in these clusters we will need some workloads.
For those workloads, we need configuration management for
different applications and for different versions of those applications.
We can also use one Git repository to store the configuration
for different applications.
In this case, for example, you have a UI application
and a database application.
If needed, you can also differentiate the configuration
for different clusters or environments of clusters in the Git repository.
So you may have different values for your development clusters
and your production clusters: different parameters for secrets or
database access, of course, but also a different number of replicas or
other parameters which vary from environment to environment.
Argo CD can then reconcile the appropriate directories.
The directory depends on the target cluster, so you can
keep the configuration of a specific cluster in a specific directory.
With this, you can create different configurations for versions and scale,
and optimize cost for compute and storage.
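A hypothetical layout: the same chart with one values file per environment, each directory reconciled by the matching cluster (paths and numbers are placeholders):

```yaml
# apps/ui/values/dev.yaml -- hypothetical path
replicaCount: 1
resources:
  requests:
    cpu: 100m
    memory: 128Mi
---
# apps/ui/values/prod.yaml -- hypothetical path
replicaCount: 3
resources:
  requests:
    cpu: 500m
    memory: 512Mi
```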
As we saw, you may also choose to create additional resources with
GitOps, using manifests and add-ons for AWS Controllers for Kubernetes (ACK)
or Crossplane, for example, to create Amazon RDS databases
associated with a specific application.
So you have configuration for the application, but also for the
resources this application needs in the AWS cloud.
And of course, no talk can be complete without mentioning security.
Security is the top priority for AWS, and clusters are no exception.
You would want to store the secret material for your application,
for example private SSH keys or other secrets, in a secure
location like AWS Secrets Manager.
Then a controller like External Secrets Operator creates and
updates the Kubernetes secret based on the material from AWS Secrets Manager.
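As a sketch, an ExternalSecret that turns a Secrets Manager entry into an Argo CD repository credential might look like this; the store name, secret name, and repository URL are placeholders:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: argocd-repo-creds
  namespace: argocd
spec:
  refreshInterval: 1h  # keep the secret in sync with Secrets Manager
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager  # assumes a store backed by AWS Secrets Manager
  target:
    name: private-repo-creds
    template:
      metadata:
        labels:
          argocd.argoproj.io/secret-type: repository  # Argo CD picks it up
      data:
        type: git
        url: git@github.com:example-org/workloads.git  # placeholder repo
        sshPrivateKey: "{{ .sshPrivateKey }}"
  data:
    - secretKey: sshPrivateKey
      remoteRef:
        key: argocd/git-ssh-key  # hypothetical Secrets Manager secret name
```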
When External Secrets Operator creates the Kubernetes secret with Git credentials
and the URL hostname, it can also be validated by Kyverno, a policy agent.
Kyverno can have a policy that only allows Argo CD credentials of
a certain type, for example SSH only.
And it can also check that those credentials come only from a
specific Git repository, Git server, or region.
This can be quite secure.
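A hedged sketch of such a Kyverno policy; the hostname is a placeholder, and note that a real policy may need to base64-decode the data fields if the secret is not created with stringData:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-argocd-repo-creds
spec:
  validationFailureAction: Enforce
  rules:
    - name: ssh-only-from-our-git-server
      match:
        any:
          - resources:
              kinds: [Secret]
              namespaces: [argocd]
              selector:
                matchLabels:
                  argocd.argoproj.io/secret-type: repository
      validate:
        message: "Argo CD repo credentials must be SSH and point at git.example.com"
        pattern:
          stringData:
            url: "git@git.example.com:*"  # SSH URL on the approved server only
```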
With Argo CD, another tip: you should never
access remote EKS clusters using service account tokens that never expire.
Argo CD controllers like the application controller, API server, and repo server
should always be configured to use an IAM role, via IRSA or EKS Pod Identity.
When adding another remote cluster, specify the IAM role
that Argo CD should assume when connecting to the remote cluster.
This can also be stored in the Kubernetes cluster secret, as in the
hub-and-spoke example earlier.
A Kyverno policy can be configured to ensure that the cluster secrets
you create only contain an IAM role to assume, and never a kubeconfig token.
Of course, you can replicate the same approach across many clusters
and use it to access not only staging clusters, but also
production clusters and any other required clusters, in a secure way.
To sum up, EKS Blueprints and the GitOps Bridge give you the ability
to create and manage multi-cluster environments with security and
at scale using GitOps principles.
You can check those QR codes and links if you want to dive deeper, get additional
information, or even try out, hands-on in your own AWS account,
the Argo CD on EKS workshops which we prepared for you.
With that, I would like to say thank you.
I hope you enjoyed the session, and please feel free to reach out
on LinkedIn and have a discussion about platform engineering,
containers, and application modernization.
I would really appreciate that.
Also, please spend a minute to answer the short survey on the
session; your feedback is very valuable.
Thank you.