Abstract
Infrastructure as Code (IaC) has made managing infrastructure easier in many ways, but there are many challenges that companies accept as the cost of adopting IaC, especially when scaling. IaC is good at provisioning individual resources (or a few of them together), but engineering teams want an entire environment with various components like networking, platform (EC2/EKS), database, S3 buckets, etc. to deploy and operate their applications.
To provision and tear down an entire environment, these teams have two options. They can either hand-roll pipelines to manage individual resources and then manage complex dependencies between these resources within those pipelines, or create a monolith IaC for the entire environment. These approaches are inefficient and slow down feature development and innovation. They also make replicating, visualizing, and understanding environments difficult. What if there were a better way?
This talk digs into these challenges to better understand them and then looks at how to resolve them. We will introduce Environment as Code (an abstraction over IaC) that enables teams to provision and tear down entire environments efficiently and promotes best practices like loosely coupled infrastructure resources.
Key Takeaways:
- Challenges scaling Infrastructure as Code
- What is Environment as Code?
- How can Environment as Code help resolve those challenges?
Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello there. Today's topic is "From Infrastructure
as Code to Environment as Code: challenges scaling IaC
and how to resolve them". This talk is based on my and
my team's experience working with and helping various companies
adopt infrastructure as code, and the
challenges we have seen scaling infrastructure as code over
the years. I will also introduce environment as
code, which has helped us resolve those challenges. Some of
what you will hear today around environment as code
is new, and I would love to hear what you think about it,
answer any questions, and discuss it further.
My name is Adarsh Shah. I am the founder and CEO at
Compuzest.
We all know what infrastructure as code is. It helps us automate
provisioning of infrastructure resources. It is one
of the key DevOps practices that enables teams to
deliver infrastructure rapidly and reliably.
Here's a typical evolution of your IaC. You probably
start with a very simple setup, a monolith IaC with
a single IaC run. If you are using Terraform,
then that means a single state file. As you can see in the diagram
on the left, it has networking, platform
EC2, and an S3 bucket,
all in a single monolith IaC run.
As you scale, you start breaking the IaC into separate,
smaller IaCs. As you can see
in the diagram on the right, there is a separate networking IaC,
and then platform EC2, platform K8s,
and Postgres that depend on
networking, and then K8s add-ons that depend on
platform K8s. This talk is focused on
teams who have already broken down their infrastructure
as code into smaller runs or are looking
to do that. If you have a very simple setup and can
just execute it as a single run, you won't
have most of the challenges we talk about today.
As for the execution of your IaC, it probably starts
with running it on an engineer's machine.
But as you mature, you have more members of the team who want
to run the IaC and want a more reliable
and stable execution environment. You create a pipeline
or use GitOps to execute the infrastructure
as code from a shared environment. This talk is focused
on teams who already use pipelines
or GitOps, or are looking to do
so.
Application teams need an entire environment,
something like you see in the diagram, to deploy and operate
their applications. Just getting networking,
platform K8s, or an RDS database
on its own is not going to allow them to run their applications.
They need an entire environment, whatever that might
mean for the team, that has these various
infrastructure resources and the dependencies between them.
In this example environment, networking needs to be
provisioned first, and then platform EC2
and platform K8s, and then
the K8s add-ons. If you want to get an environment
like this using infrastructure as code,
here are your options. Option one is to create a
monolith IaC, but we all know this is bad.
It creates tight coupling and is not recommended unless,
of course, you have a very simple use case.
The state files become large, and it becomes painful to maintain
them with monolith IaC.
Option two involves hand-rolling pipelines using tools
like Jenkins, CircleCI, et cetera to run
the IaC and manage complex dependencies in
the pipeline code. While your IaC is declarative
and idempotent, these pipelines are not.
You have to write a lot of custom code to provision an entire
environment.
Teardown must be supported, and any failures or
errors that impact the environment must be accounted
for as well. If there are failures or
errors during execution, they usually get managed manually
by engineers. As you can tell,
these options are inefficient and costly, and typically require
a dedicated team.
Here are some other challenges scaling IaC. If you
want to follow principles like immutability for
your environments, or make it easier to share best-practice
implementations of environments across various
teams, having a mechanism to easily replicate
environments is critical. Since the pipelines I mentioned
in the previous slide are not ideal for managing entire
environments, it becomes painful to replicate
them. Teams spend a lot of time writing custom
code to replicate environments.
Visualizing and understanding environments is challenging,
too, and teams struggle to do it.
Trying to find that information by going directly to
the cloud provider's dashboard is even more confusing.
If they want to troubleshoot an issue,
share knowledge between teams, or make any changes
to existing environments, they need to go through a painful
and time-consuming process. A lot of teams create
diagrams of their environments, showing the various
infrastructure resources and how they are connected, using
tools like Visio or draw.io. But these
diagrams soon get out of date with the real environments.
Instead of helping, they actually provide incorrect information
and can cause confusion.
Over a period of time, due to human error or indirect
changes, provisioned infrastructure drifts from the
desired state in code. With existing solutions,
like using a pipeline to execute IaC,
drift can be detected, since IaC is declarative,
but only when that pipeline next executes the IaC,
and it will only find drift within
that particular IaC. Teams should know
about the drift right away, and not just for individual infrastructure
resources or a few of them, but for the entire
environment and its various component dependencies,
so they can remediate any issues as soon as
possible.
Now that we understand the challenges scaling infrastructure
as code, let's understand what environment as
code is and how it helps resolve those challenges.
We can start by looking at it at a high level.
Environment as code is
an abstraction over infrastructure as code,
as you can see in the diagram. It is
declarative, and it executes and manages various
infrastructure as code components. The various IaC
components are responsible for provisioning infrastructure
resources. EaC is responsible
for executing infrastructure as code in the
right order.
If we use the Lego analogy,
infrastructure as code automates the various Lego pieces that
are your individual infrastructure resources, or a
few of them together, while environment as code automates
how these Lego pieces are connected to make up
a Lego toy: your entire environment.
Here's a definition. I know it's long, but I think it's
important to go over. Environment as code
is an abstraction over infrastructure
as code that provides a declarative way of defining an
entire environment. It has a control plane that
manages the state of the environment, including relationships
between various resources, detects drift,
and enables reconciliation.
It also supports best practices like loose coupling,
idempotency, immutability, etc.
for the entire environment.
EaC allows teams to deliver entire environments
rapidly and reliably at scale.
Now let's dig deeper into provisioning with
environment as
code. At the top, we define our environment
as code, which is declarative. We push it to source control.
The control plane that's associated with environment as
code picks up that change. It manages the state
of the entire environment, including any dependencies between various
components, their statuses, et cetera.
And then it starts reconciling the various infrastructure
components. These are infrastructure as code
pieces that have their own state.
So if you're using Terraform, like you have in this instance,
networking is actually Terraform code,
and Terraform manages the state of that networking
component. Then, once networking is done, it provisions
platform K8s and Postgres. So the control
plane that's associated with environment as code manages
the order in which these components run, and then after
that, the K8s add-ons run. So as you can see,
the control plane is what manages all of these various
pieces. But infrastructure as code,
Terraform in this case, is actually responsible for provisioning
resources in your cloud provider.
And then for the teardown, it reverses the logic:
it starts from the leaf node and then goes up the chain.
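The ordering described here can be sketched as a topological sort over the component dependency graph. This is only an illustration, not how any particular control plane is implemented: the component names mirror the example in the talk, while `DEPENDS_ON`, `provision_order`, and `teardown_order` are hypothetical names chosen for this sketch.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency graph: each component maps to the set of
# components it depends on (names mirror the example environment).
DEPENDS_ON = {
    "networking": set(),
    "platform-ec2": {"networking"},
    "platform-k8s": {"networking"},
    "postgres": {"networking"},
    "k8s-addons": {"platform-k8s"},
}


def provision_order(depends_on):
    """Return components so that every dependency precedes its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


def teardown_order(depends_on):
    """Teardown reverses provisioning, so leaf components are removed first."""
    return list(reversed(provision_order(depends_on)))


if __name__ == "__main__":
    print("provision:", provision_order(DEPENDS_ON))
    print("teardown: ", teardown_order(DEPENDS_ON))
```

Components with no dependency between them (for example platform K8s and Postgres) may appear in either order, which is also why a real control plane could run them in parallel.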
So now that we have looked at what environment as code is, let's look
at the various attributes of environment as code. Environment as
code manages an entire environment, so it
should support defining that entire environment, with its
various infrastructure components, in an easy
to understand format. It also supports specifying
the various relationships between these components.
This diagram shows an example of environment as
code using a custom YAML format.
We use this for our product zlifecycle,
but it doesn't have to be a YAML format.
Any format that you can use to specify the entire
environment can be used. As
you can see on line 54, it allows you to
specify the type of infrastructure as code,
which is Terraform in this case, and also that
this component depends on the networking component.
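As the talk notes, the definition does not have to be YAML. A minimal sketch of the same idea as plain Python data follows, assuming hypothetical field names (`name`, `type`, `depends_on`) that are not taken from the zlifecycle format. It also shows one check a declarative definition makes easy: every dependency must refer to a component that is actually defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    type: str               # IaC tool used for this component, e.g. "terraform"
    depends_on: tuple = ()  # names of components that must exist first


@dataclass(frozen=True)
class Environment:
    name: str
    components: tuple


def validate(env):
    """Return a list of problems; an empty list means the definition is consistent."""
    names = [c.name for c in env.components]
    problems = []
    for name in sorted(set(names)):
        if names.count(name) > 1:
            problems.append(f"duplicate component: {name}")
    for c in env.components:
        for dep in c.depends_on:
            if dep not in names:
                problems.append(f"{c.name} depends on unknown component: {dep}")
    return problems


# Hypothetical example environment, loosely following the talk's diagram.
DEV = Environment(
    name="dev",
    components=(
        Component("networking", "terraform"),
        Component("platform-k8s", "terraform", depends_on=("networking",)),
        Component("postgres", "terraform", depends_on=("networking",)),
        Component("k8s-addons", "terraform", depends_on=("platform-k8s",)),
    ),
)

if __name__ == "__main__":
    print(validate(DEV) or "definition is consistent")
```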
Environment as code promotes loosely coupled IaC
components, like you see in the diagram. It brings
these loosely coupled IaC components together
to give you an entire environment. Just like
infrastructure as code tools have state files
to capture the state of each IaC run,
environment as code also has a state file that captures
the state of the entire environment, including the various
components and their relationships. As you can see in the diagram,
it has an operation and a status that tell you about
the last run: what type of operation it was
and what the current state is. It also tracks
each component's operation and status from the last run.
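A rough sketch of what such environment-level state could look like is shown below. The field names and status values (`pending`, `in_progress`, `succeeded`, `failed`) are assumptions made for illustration and are not the actual zlifecycle state format. The environment status is derived from the component statuses each time a component run is recorded.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentState:
    operation: str = "none"  # last operation, e.g. "provision" or "teardown"
    status: str = "pending"  # result of that operation


@dataclass
class EnvironmentState:
    name: str
    operation: str = "none"
    status: str = "pending"
    components: dict = field(default_factory=dict)

    @classmethod
    def new(cls, name, component_names):
        """Start with every known component in the default (pending) state."""
        return cls(name=name, components={c: ComponentState() for c in component_names})

    def record(self, component, operation, status):
        """Record one component run and roll the result up to the environment."""
        self.components[component] = ComponentState(operation, status)
        self.operation = operation
        statuses = [c.status for c in self.components.values()]
        if "failed" in statuses:
            self.status = "failed"
        elif all(s == "succeeded" for s in statuses):
            self.status = "succeeded"
        else:
            self.status = "in_progress"


if __name__ == "__main__":
    state = EnvironmentState.new("dev", ["networking", "postgres"])
    state.record("networking", "provision", "succeeded")
    print(state.operation, state.status)
```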
Idempotency and immutability are key principles
for infrastructure as code. How do you apply these
to an entire environment?
Let's first understand what they mean, in case
you're not aware of these principles.
Idempotency means that no matter how many times
you run your IaC or your code, and
whatever your starting state is, you will end up with the same end
state. This simplifies the provisioning of infrastructure
and reduces the chances of inconsistent results.
So let's look at the diagram, starting at the top. Let's say
you want three VMs. Your code provisions three
VMs. In the non-idempotent case,
if you reapply the changes, you get three more
VMs. So if you are expecting three VMs,
you actually end up getting six instead.
On the idempotent side, though, if you reapply the changes,
it knows that you already have the three VMs, so it
won't provision any new VMs.
So you end up with the three expected VMs.
With EaC, you can achieve idempotency for the entire
environment, as it tracks state for the entire environment
and knows what the last operation was and
its state. Pipelines don't do that for you.
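The three-VM example can be written down as a small sketch. Both functions are hypothetical stand-ins for a provisioning step and only manipulate a list of VM names: the first blindly creates VMs on every run, while the second converges on the desired count whatever the starting state is.

```python
def apply_non_idempotent(existing_vms, count):
    """Create `count` new VMs on every run, regardless of what already exists."""
    start = len(existing_vms)
    return list(existing_vms) + [f"vm-{start + i + 1}" for i in range(count)]


def apply_idempotent(existing_vms, desired_count):
    """Converge on exactly `desired_count` VMs, whatever the starting state."""
    current = list(existing_vms)
    while len(current) < desired_count:
        current.append(f"vm-{len(current) + 1}")
    return current[:desired_count]


if __name__ == "__main__":
    # Reapplying the non-idempotent step doubles the VMs.
    vms = apply_non_idempotent([], 3)
    vms = apply_non_idempotent(vms, 3)
    print("non-idempotent:", len(vms), "VMs")

    # Reapplying the idempotent step leaves the VMs unchanged.
    vms = apply_idempotent([], 3)
    vms = apply_idempotent(vms, 3)
    print("idempotent:    ", len(vms), "VMs")
```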
Configuration drift is a huge problem with infrastructure.
It occurs when, over a period of time, there are
changes made to infrastructure that are not recorded, and your
various environments drift from each other in
ways that are not easily reproducible.
This usually happens if you have mutable infrastructure
that lives for a long time. These issues can
be resolved by using immutable infrastructure.
So as you can see on the left, say you
have version one of your infrastructure
code deployed and you make some changes to your code.
In the case of mutable infrastructure,
you apply the new version to the same infrastructure.
So you have long-lived infrastructure. In the case of
immutable infrastructure, when you
make a change, you actually provision a brand new set
of infrastructure with the new version, redirect traffic
to that new version, and then get rid of the old infrastructure.
Immutable infrastructure means that instead of changing
existing infrastructure, you replace it with new infrastructure.
By provisioning new infrastructure every time, you're
making sure it is reproducible and doesn't allow
for configuration drift over time. Why not apply this
principle to the entire environment? You can do that using environment
as code. To achieve immutability, you can replace entire environments
by bringing brand new environments up instead of
changing existing ones.
As mentioned earlier, teams usually create diagrams manually
and then try to keep them updated as they change code.
You all know how that goes. The diagrams get out of date over
a period of time, provide misinformation, and are
more harmful than helpful. Since
environment as code is in an easy to understand format, you can use
it to create a visual that helps teams understand
their own environments, as well as other teams' environments within their organization.
This screenshot is from our product zlifecycle,
which is built using the environment as code
concept.
Environment as code has a control plane that contains a reconciler
that observes whether the desired state and the current
state have drifted, and then reconciles them.
You might be thinking this looks like Kubernetes controllers, and
yes, it is based on the same concept. In fact, you
can use Kubernetes controllers to achieve it. In this case,
though, it probably makes sense to have an approval step that
shows the plan before bringing the actual state
back to the desired state,
as this might involve destroying or recreating
infrastructure components.
Comparing and promoting changes across various environments becomes
a lot easier with environment as code, since it is
in an easy to understand format and pushed to source control.
You can compare the code for various environments and
promote changes. You can also use GitOps
for the entire environment using environment
as code. So let's look at what the GitOps
flow would look like with EaC. We start
on the left. You define your environment as code,
push it to a branch, validate
that everything is valid, and create a pull request.
Someone from your team looks at the
PR and approves it, and then it eventually gets
merged to main. There is a control plane
associated with environment as code that observes
the repository, picks up the change, and then starts the
reconciliation process.
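The reconciliation step with an approval gate, as described earlier in the talk, might look roughly like the sketch below. It is a simplification under stated assumptions: desired and actual state are plain dictionaries mapping hypothetical component names to versions, `approve` is any callable that inspects the plan, and applying the plan is simulated by adopting the desired state, where a real control plane would invoke the IaC tool for each component.

```python
def plan(desired, actual):
    """List the actions needed to bring the actual state to the desired state."""
    actions = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("destroy", name))
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    return actions


def reconcile(desired, actual, approve):
    """Apply the plan only when there is drift and the plan is approved."""
    actions = plan(desired, actual)
    if not actions or not approve(actions):
        return dict(actual)  # nothing to do, or the plan was rejected
    # Simulated apply: a real control plane would run the IaC per component.
    return dict(desired)


if __name__ == "__main__":
    desired = {"networking": "v2", "postgres": "v1"}
    actual = {"networking": "v1", "legacy-cache": "v1"}
    for action, component in plan(desired, actual):
        print(action, component)
    # Reject any plan that would destroy a component.
    no_destroy = lambda actions: all(a != "destroy" for a, _ in actions)
    print(reconcile(desired, actual, no_destroy))
```

Showing the plan before applying it matters most for `destroy` actions, which is why the usage example rejects any plan that contains one.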
Thanks, everyone, for attending my talk.
Please feel free to reach out if you have any questions about
environment as code or infrastructure as code.
We also have a product that uses the
same environment as code concept, so check it
out. It's called zlifecycle. And as I mentioned at
the start, I would love to get your feedback
on environment as code. Thanks
again.