Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, I am Karan and welcome to my talk on the art of complex
deployment strategies in Kubernetes using Argo Rollouts.
In this talk you can expect to learn about the different deployment strategies
that are out there, their pros and cons and their
use cases as well. Then we will be taking a look at Argo rollouts,
how to set it up in a Kubernetes cluster, and how we can create
the blue-green deployment strategy along with the canary release deployment
strategy. After this we'll be taking a look at the best practices for
setting up the different deployment strategies along with the common challenges
that are faced while setting these strategies up. So I hope you
are excited for this talk. Let's get into it. Before getting into the
actual talk, just a brief introduction about myself. I am
currently working as a software engineer at this company called Storylane. It's a
Y combinator backed company and we specialize in creating
demo automation software. Prior to working for Storylane,
I worked for Hackerrank, specifically the labs team at Hackerrank where we
were responsible for building short product ideas for the company.
I primarily specialize in the DevOps and the backend domain,
and I work with technologies like Kubernetes,
Docker, Helm, Golang, Node.js, Ruby on Rails,
et cetera on a day to day basis. I also have a
technical blog, you can check it out on my website,
karanjatan.com blog. So let's
not waste any more time and let's jump into what are zero downtime
deployment strategies? So this is a brief overview of
how a user interacts with an application through a load balancer,
and how we ideally want to deploy a new release of the
application without affecting the user experience and
without causing any downtimes in the deployment.
So why is zero downtime necessary? Of course,
if there is any downtime in the application, it can affect the revenue of the
business. It can also affect the user experience. So there can be cases where multiple
versions of an application are deployed and some requests from
the user are going to the older version and some to the newer version,
so there can be discrepancies in the user experience.
Then it can also hamper the customer trust in
the application and the company as well. And also in a
lot of cases there are SLA commitments made by the company to its
customers. So these are a few reasons why zero
downtime deployments are necessary. Before diving
into the different deployment strategies that are out there, first,
let's take a look at the basic architecture of an application running
on a Kubernetes cluster. So when a user is
interacting with your application, it does that through the DNS,
which routes the traffic to the load balancer, which then routes
it to the ingress of Kubernetes and that routes it to
the service responsible for the application in Kubernetes. And the service
then finally routes the traffic to the deployment.
We also have our cluster admin which is responsible for maintaining
the deployments, the service and the ingress of the application.
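As a rough sketch of that chain, the Service and Ingress pieces could look like this; the names (my-app, app.example.com) are hypothetical and not from the talk:

```yaml
# Hypothetical Service + Ingress wiring user traffic to a Deployment's pods.
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app          # matches the labels on the Deployment's pods
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app   # the Service above
                port:
                  number: 80
```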
So this is just a brief overview of the Kubernetes architecture
as a prerequisite for the later stages,
specifically the demo part of the video. So Kubernetes
out of the box offers two different deployment strategies.
One is the recreate deployment strategy and the other is the rolling
deployment strategy. So first let's take a look at the recreate deployment
strategy. So this is a brief overview of how
the recreate deployment strategy works. The process goes
this way. First the current
deployment is deleted, then a new deployment is created, and then
the user is redirected to the new deployment. So you
can see a problem over here: while the deployment is happening,
while the new deployment is being created, there is going to be
downtime and the user would not be able to interact with the application during the deployment.
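For reference, the recreate behavior described here is a single field on a standard Deployment. This is just a minimal sketch; the name and image are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app              # placeholder name
spec:
  replicas: 3
  strategy:
    type: Recreate          # delete all old pods first, then create the new ones
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: my-app:2.0   # placeholder image
```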
So here are the pros and cons of this deployment strategy.
Let's go through the pros first, so it's easy to implement. It does
not take up any additional resources because first the older deployment is deleted,
then the new one is created. So essentially it keeps using the same amount of
resources always. Then it also helps prevent
data conflicts. So if there are two different deployments simultaneously
trying to connect to the same data source, there may be some conflicts,
and this strategy helps avoid that issue. It is also helpful
in architectures where you don't want multiple versions
of your applications running simultaneously. Then let's go through the cons
next. There is downtime while the deployment is happening, as we discussed,
so it is not suitable for high availability applications.
There is a slower rollout process because if, let's say
there are hundreds of pods and you're waiting for all of those pods to be
created, then it's going to be a time consuming process.
And there is also a risk of deployment failure because
the first thing that we do is delete the old deployment and then create the
new one. So if there is any issue in the new deployment,
then there is going to be a problem. So yeah, let's jump into the use
cases of this deployment strategy. So these are
the use cases of the recreate deployment strategy. It can be used in
non production environments where the application is not required to be highly
available, and it can also be used in cases where two
different versions of the application cannot write to the same
data source, like a single tenant database. So yeah,
let's jump into the rolling deployment strategy next.
So this is what the rolling deployment strategy looks like.
In contrast to the recreate deployment strategy, it does not delete the older
version of the application first. What it does is it creates the
new version of the application in a rolling or a gradual
way. So what it does is it will create some pods
of the new application, and until those new pods are completely
in the running state, it will not delete the older pods. So essentially it
will roll out the deployments in a gradual manner.
So here are the different use cases for the rolling deployment strategy.
This is a really good strategy for your dev
environment or your staging environment. It is also a good
strategy if your application requires a gradual rollout process.
And it is resource efficient, because it
will not maintain multiple full copies of your application; it will gradually roll out
the new release instead.
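For comparison with the recreate strategy, here is a minimal sketch of how a rolling update is configured on a Deployment; the surge and unavailability values are assumptions, not from the talk:

```yaml
spec:
  replicas: 5
  strategy:
    type: RollingUpdate      # the default strategy for Deployments
    rollingUpdate:
      maxSurge: 1            # create at most one extra pod above the desired count
      maxUnavailable: 0      # never drop below the desired count during the rollout
```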
So now let's jump into some of the more complex deployment strategies.
So first, let's take a look at the canary release deployment strategy.
Here, the deployment strategy asks
us to create a canary group, and that canary group
is going to be responsible for serving a subset of
your traffic, let's say 20%, while 80% of the traffic
will go to your original deployment. So you can see that it
is sort of similar to the rolling deployment strategy, where the
rollout is going to be gradual in nature, but it is fundamentally
different from the rolling deployment strategy, because over here we are
going to have full control over the canary group,
essentially the group which is responsible for serving the subset of
your traffic. So let's take a look at the pros and cons of the Canary
release deployment strategy. Let's first go through the
pros. So this deployment strategy is really good at risk mitigation
because of the canary group and the concept of serving only a subset
of users. It also gives real world feedback
because of the fundamental process of the deployment.
Again, this is similar to the rolling deployment strategy where the rollout happens in
a gradual manner, but here we have full control over the rollout
process, and there is also an option for
a quick rollback. So if the canary group, which is serving a subset of
users, shows some issue in the deployment, then we can quickly roll back
to the old group, which was already serving most of the traffic.
Now let's go through the cons of this strategy.
It is complex in routing, so setting up
the routing for this is a complex task. There is a monitoring
overhead and also there can be inconsistency in
user experience because of the Canary group concept.
And there is also a limited testing scope when the deployment is
happening. Here are the use cases for the canary deployment strategy.
If you want real world feedback, this is a really good strategy
for you. It is also useful in performance sensitive
deployments because of the concept of the canary group. And if
you want a continuous deployment environment where there is no downtime,
this is also a good strategy for that. Now let's take
a look at the blue green deployment strategy. Personally, this is my favorite.
So in this strategy, you maintain essentially two different
deployment groups. One is the green deployment and one is the blue deployment.
So the green deployment is responsible for serving your real
world traffic. It is responsible for your production application.
And the blue deployment is essentially a copy of the
green deployment. So this is how the process goes when you create a new release.
First the application will be deployed to the blue environment.
And in the blue environment you can do your testing. You can see
how your application is behaving, and once everything is good to go,
you can swap the blue environment with the green environment.
So essentially blue now becomes green and green now becomes
blue. And green was already serving the old application,
so now it essentially becomes the blue deployment. So this is how the blue green
deployment works. Let's jump into its pros and cons.
There is essentially zero downtime when you use
the blue green deployment. You can immediately roll back to
your previous deployment because of the blue green strategy.
You can easily test your application as well in the blue environment.
So what companies usually do is treat
the blue environment as the UAT or pre-prod environment,
where you can do the testing before converting it to the green environment,
and you can also load test that particular environment easily.
Now, here are the cons. It is resource intensive.
You essentially have to maintain two copies of the same deployment.
It is complex in data management because of the same reason.
There are potentials for unused resources.
The blue environment can be sitting idle for long periods of
time. And there is also complexity involved in the configuration and the
routing of the deployment strategy. So here are the use
cases of the blue green deployment strategy. If you are looking
for a strategy that keeps your application highly available,
say in a very critical production environment, then this is a really good
strategy to consider. And this is also a strategy which allows
you to do robust testing before releasing it
to the public. So the goal of this talk is to
not only talk about the different deployment strategies that are out there, but also
about how we can implement the blue green deployment strategy along
with the Canary release strategy in a Kubernetes environment. And for
that we will be using this tool called Argo rollouts.
So let's take a look at what this tool is. So Argo
rollouts is a cloud native, open source tool built
for Kubernetes with which we can create these complex deployment
strategies like the blue-green deployment strategy or the canary release
deployment strategy. It also allows
us to easily roll back to the older deployments
and it also allows us to easily configure
the traffic routing in the Kubernetes environment.
So it gives us a very neat dashboard to do all
of this stuff. So we'll be taking a look at that dashboard and
how we can set up these strategies in the demo part of this
video. All right, it is about time.
Now let's get into the demo part of the video and let's see how we
can set up Argo rollouts in a Kubernetes environment.
All right, we are in the demo part of the session now. In this
section we will be going over the blue-green deployment strategy along with
the canary release deployment strategy using Argo rollouts in a
Kubernetes environment. In order to set up the Kubernetes environment on
our local system, I will be using the Kubernetes in Docker (kind)
tool. All the prerequisites for this demo
session are going to be provided in a readme file and this is going
to be part of a GitHub project. The link to this GitHub project will be
provided at the end of this talk. So let's
jump straight into the terminal and get started. So the first
thing that we can do is create a Kubernetes cluster using kind on
our local system. We can do that by running the kind create cluster
command. And you can provide a name as well if you want. It's not
required. I'll be providing argo-rollouts-demo. I won't be going ahead and running
this command, since I have already created the cluster.
Once your kind cluster is created, what we
can do is we can ensure that we are on the correct Kubernetes context.
We can use the kubectx tool for that. Any cluster
created using the kind tool has its context name prefixed with kind-. So just
add kind- and the name of your cluster. So yes,
once we have switched to the correct context, we can
go ahead and see the blue green deployment demo.
Jumping back into VS Code, I have gone
ahead and created these two service YAML
files. One is the green service YAML file and another is the blue service
YAML file. Both of these services are going
to be pointing to the same app selector, which is going to be nginx, and
the same target port. But just the name is going to be different. This is
going to be nginx-blue and this is going to be nginx-green.
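A rough sketch of those two service files might look like this, assuming the pods are labeled app: nginx as described:

```yaml
# green-service.yaml (sketch)
apiVersion: v1
kind: Service
metadata:
  name: nginx-green
spec:
  selector:
    app: nginx         # same selector as the blue service
  ports:
    - port: 80
      targetPort: 80
---
# blue-service.yaml (sketch)
apiVersion: v1
kind: Service
metadata:
  name: nginx-blue
spec:
  selector:
    app: nginx
  ports:
    - port: 80
      targetPort: 80
```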
Another file that I have created is the Argo rollout YAML file.
Over here, we can provide the active service as well
as the preview service. The active service is going to be the service through
which the live traffic is going to be routed and the preview
service is going to be the one which is going to act as essentially
the one which acts as a test bed for us.
And I have also disabled auto promotion. Auto
promotion is essentially a concept of Argo Rollouts
where unless and until we go ahead and click
the promote button in the dashboard or run the promote command through the CLI,
it will not go ahead and scale down the green deployment and
switch the blue and the green replica sets.
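Put together, the rollout file described here could be sketched roughly as follows; the exact replica count and image tag in the demo may differ:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: nginx-rollout
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.25         # a pinned release tag, later changed to latest
  strategy:
    blueGreen:
      activeService: nginx-green    # receives the live traffic
      previewService: nginx-blue    # acts as the test bed for the new release
      autoPromotionEnabled: false   # wait for a manual promote before swapping
      scaleDownDelaySeconds: 30     # keep the old replica set around briefly
```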
So during this time we can
do whatever testing that we want, load testing, test the application's behavior
and test the new release changes. So initially
we are going to be starting off with the nginx image and a particular
release Docker tag. And in order to test the change
between the two deployments, the green and the blue deployment, I'm just
going to make one minor change and I'm going to change this to
latest. So that's how we are going to see the deployment in action.
So yeah, let's jump back into the terminal and apply all
these YAML files. So before running the YAML files,
I just want to show everything that is present in the default namespace so we
can get that by running this command: k get all. And just for context,
I have set an alias for the kubectl CLI
tool as k. So this is essentially running kubectl under
the hood. So as you can see, there is nothing but the default Kubernetes
service running that comes out of the box when you run the Kubernetes control
plane. So now we can go ahead and start applying our YAML
file. So first let's apply the green service
YAML file. Then we can apply the blue
service YAML file. And finally, let's apply the rollout
file as well. Now if I go ahead and run get all
in the default namespace, you can see that there are two pods running,
because that is what we specified in the rollout YAML file.
Two services have been created, nginx-blue and nginx-green,
and the replica set for the blue
deployment, which is running as of now.
I can also show the rollouts that have been created.
This is a CRD from the Argo Rollouts project.
So the nginx rollout has been created. In
order to view the rollout that we just created,
we can do that on the rollouts dashboard. And in
order to spin that up, we can run the following command: kubectl argo rollouts dashboard.
Keep in mind this command will only work if the Argo Rollouts plugin has been
installed in the Kubernetes cluster. The installation guide for
installing the plugin is also present in the readme of the
GitHub project. So let's go ahead and run this command.
As you can see, the dashboard is now available at port 3100.
So let's jump into the browser and see our rollout in action. As you
can see, the rollout that we created is present over here, NginX rollout and it's
using the blue green strategy. So let's click on this. As you
can see, the image that we had provided is mentioned
here. It is stable and active. Two pods are running as of now.
So what I'll do is I'll jump back into VS Code, change the
image tag to latest, and let's see the blue green deployment
in action. So here I have made the one change
to latest, and I'll apply this YAML
file and let's see how it behaves in the browser.
So as you can see, a new revision is being created
right now. And since we set the auto promotion to
false, what it is doing is right now it is maintaining
both the states, the green deployments as well as the blue deployment.
And once I click on the promote button over here,
then it will essentially swap
the blue and the green, making the preview deployment the live
deployment. And the previous deployment,
the older deployment, will be scaled down in 30 seconds. So you
can change the scale down time as well if you want. But since
we have disabled the auto promotion then we don't really need to change that
time because we can take however long we want before clicking this button.
So I'll go ahead and click on yes, so now as you can
see the active state changed. So essentially the swap between
blue and green happened. And now the older deployment has
a timer over here, so within 15 seconds or so it will be scaled
down. So this is how the blue green deployment
strategy works using Argo Rollouts. I hope this was
helpful. Now let's jump into the canary release
deployment strategy. For the canary release
demo, I have gone ahead and created these two
YAML files. One is the service YAML file. It is a pretty
straightforward Kubernetes service that is targeting port 80,
and another file is the argo rollout YAML
file. This time the strategy specified is
the canary strategy. And over here
we are following the same methodology where the
Nginx image is provided. And later
we will be changing this tag to latest and we'll
see how this deployment behaves in real time
on the Argo Rollouts dashboard. So I have gone ahead and
applied the two YAML files that I just showed you, and once we do
that, this is the dashboard that we get for the canary release deployment.
And as you can see, the dashboard is a bit different compared to the blue
green deployment strategy. There is one more panel over here which contains the
steps of the canary deployment. So essentially, at each step
we can set a weight,
and after that we can set a pause. A pause can either be indefinite
in nature, in which case we would have to go and manually promote the
deployment, or we can provide the pause duration in seconds so that
it automatically gets promoted. So since we have
five pods in our deployment, essentially what will happen is
that at each step. So 20%, 40%,
60%, one single pod will be added into our
canary group. So at 20%, one pod will be added,
because 20% of five is one. Then after we promote
manually through this pause section, then another
pod will be added at 40% and so on. You get the gist of it.
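The steps panel described above maps to the steps list in the rollout's canary strategy. Here is a rough sketch matching that behavior; the exact values are inferred from the narration:

```yaml
spec:
  replicas: 5
  strategy:
    canary:
      steps:
        - setWeight: 20            # 20% of 5 pods = 1 canary pod
        - pause: {}                # indefinite pause; requires a manual promote
        - setWeight: 40
        - pause: { duration: 10 }  # auto-promotes after 10 seconds
        - setWeight: 60
        - pause: { duration: 10 }
```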
So now let's jump back into VS Code, change the image
tag from this to latest, and let's see this
in action. So as you can see, I have changed this to latest, and
I'll apply this YAML file and let's see how it changes the deployment
behavior. So, as you can see, a new revision was
created with the tag of latest. And over here, it has been paused
on this step. So now what we can do is we can
test our deployment. Since the canary group is essentially 20%
of the deployment, we can test this particular small
subset. And once we are good to go, once we get the green light
we can promote manually by clicking over here. Are you sure?
So yes. So as soon as I clicked on yes,
another pod was added and now it is going to go through an
automatic pause of 10 seconds. So after 10 seconds it should add
another pod to revision two. And it did that.
So this is essentially how the canary release deployment strategy
works. And if you remember the YAML file that we saw for this strategy,
we have complete control over the steps over here and we can have even
more steps after this. We can change the percentage, the weight,
the pause duration, and essentially we have complete control over
it. So yeah, this was the demo part of this
talk. I hope you enjoyed this. Now let's jump back
into the slides and go over the best practices for creating
zero downtime deployment strategies. All right, so we are done
with the demo part of the session. Now let's get into the best practices of
zero downtime deployment strategies. The first one is that your
deployments strategy should allow you to rigorously test your applications in a live
environment. So as we saw that canary release and blue green deployment,
both of them allow us to do that. We also want real
time monitoring, so that if there is an issue during the
rollout or even after the rollout, we are aware of it.
Also, the deployment strategy should be able to handle
any issues during the rollouts so that if
there are any issues there is graceful degradation, meaning that the entire
application does not go down, only some part of it does. And finally,
traffic control. Your deployment strategy should be able to also
handle the traffic control complexities in the live environment.
Here are some of the common challenges and pitfalls while implementing
zero downtime deployment strategies. One is an inadequate rollback
procedure. I have seen this multiple times, where there are no rollback procedures
and when something goes down it gets very scary and difficult to roll back
to the previous working deployment. Overlooking
dependency management is also a common pitfall
that I have seen. Insufficient load testing is another: if we go ahead and deploy
without load testing the application, that can also be a
very scary thing and can cause issues in the live environment.
Ignoring database migrations, this is something that is very
common. So if there is a migration in the newer version that
the older version does not expect, and you are going ahead
with a blue green deployment, then your blue environment or your green
environment can go down when the migration is executed in one or the other
environment. Neglecting user experience during rollouts
is also a very common challenge because as
you try to maintain two different states of your application there can be inconsistencies
when the user hits both of the environments.
And finally, complex monitoring configuration: that
is also one of the common challenges while creating these complex
deployment strategies. So please be mindful of all of
these common pitfalls and challenges so that your entire process
and your life is a lot smoother. So we are done
with the session. Thank you. Thanks a lot for sticking till this point.
I hope that I was able to convey all of the topics in a manner
that was helpful to you. If you want to connect with me
you can find all of my links present on the screen right now. One easy
way would be to scan the QR code on the left. It will take you
to a page where all of my links are present. And also
I have made the project that we used in the demo part of this session
available on GitHub for you to access and use.
So yeah, I hope that you enjoyed this session and thank you for having me.