Lessons Learned from Writing Thousands of Lines of IaC
Abstract
Immutable architecture is the backbone of infrastructure as code: it ensures that production environments cannot be changed at runtime. While this brings inherent safety benefits, it can also be restrictive and creates new challenges for security. Immutable concepts are nonetheless much more effective when it comes to securing cloud-native environments and infrastructure, which is becoming an increasingly complex task.
This talk will focus on some of the fundamentals of immutable architecture, best practices and recommended design patterns to work around its limitations and enhance security, as well as what you most certainly should not be doing when running immutable architecture both from an infrastructure and security perspective.
This will be demonstrated through a real-world example of deploying a single-tenant SaaS in an automated pipeline, the typical challenges encountered, and what was learned along the way, using Terraform, Kubernetes, and Step Functions.
Summary
- Eran Bibi is the co-founder and chief product officer of Firefly. His talk covers some of the lessons learned while writing infrastructure as code in Terraform and embracing the immutable infrastructure concept. Try to minimize duplication of code when you're writing.
- The next few tips concern the Terraform state file. The state file is the persistent way to keep track of the current state of the cloud. Use Terraform data blocks to get information out of cloud provider APIs.
- Don't bypass the infrastructure-as-code pipeline. Even if you put safety measures in place and do everything you can for education and behavioral change in your team, you need to make sure that you are monitoring for infrastructure drift.
Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello, welcome, and thank you for joining my session. My name is Eran Bibi, and I'm the co-founder and chief product officer of Firefly. For the past decade I have been doing DevOps both as a hobby and as my profession. Today's talk is about some of the lessons I learned while writing infrastructure as code, specifically in Terraform, by embracing the immutable infrastructure concept. So let's start by understanding what immutable infrastructure is.
The concept says that you cannot change or alter any configuration after you provision a server. That is very different from the traditional way of provisioning infrastructure, where you provision a server and then keep patching it and changing its configuration over time. With immutable infrastructure you get a lot of benefits, like keeping everything consistent, and you also get more predictability about what is going on, because everything is the same. If you have an issue, you just create new infrastructure, and you don't need to worry about differences between the services or the settings that you have in place. And as Barney says, new is always better.
Okay, let's dive into the details. In this section I'm going to share a few tips with you. I'm going to use Terraform as the main infrastructure-as-code language in my examples, but some of the tips are relevant for other frameworks like Pulumi and CDK. The first one is a very basic pattern: using modules. Instead of having a huge Terraform file describing all of your application and infrastructure in a single place, you can split it into smaller parts and then reuse them. If I need to describe in a few words what modules are: modules are smaller pieces of infrastructure as code that you can share between projects. If you want to get an idea of what modules look like, you can see it in this diagram. You have the main project; this is the root module, and this is what you get by default. But if you would like to have smaller pieces and reuse them in a few separate use cases, you just create a directory called modules, and you can then create child modules inside it. A child module can be the networking piece or the storage piece, anything that you can reuse in other projects.
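A minimal sketch of the layout described above, assuming a project with two child modules. The module names, the input variables (`vpc_cidr`, `bucket_prefix`), and their values are hypothetical and are not taken from the diagram shown in the talk.

```hcl
# Assumed directory layout (illustrative only):
#   main.tf                      <- root module
#   modules/networking/main.tf   <- child module
#   modules/storage/main.tf      <- child module

# main.tf (root module): wire the reusable pieces together.
module "networking" {
  source   = "./modules/networking"
  vpc_cidr = "10.0.0.0/16" # hypothetical input declared by the child module
}

module "storage" {
  source        = "./modules/storage"
  bucket_prefix = "example-app" # hypothetical input declared by the child module
}
```

Another project can reuse the same child modules by pointing `source` at the shared location.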
By embracing that practice you are basically following my other suggestion, which is to use the DRY pattern: don't repeat yourself. Try to minimize duplication of code when you're writing. Even if you are going deep into one project and you think you will be done after completing it, think ahead, and assume that infrastructure changes will happen only at a certain level or to certain resources. So make sure to create a logical separation between the resources, so that you minimize the risk when you are writing some change. You don't want to take the risk that a typo, or something that you missed in your pipeline, will damage the entire application. You want the blast radius to be as small as possible, so try to create smaller pieces and reuse them whenever you can.
Another tip that I have for you is to keep everything consistent. Before you create your first project, do the reading, understand what the best practices for naming conventions are, and try to keep everything in the same convention. If you are using variables, which I highly recommend, make sure you know the right order to use them in, and if you are inheriting resources from place to place, make sure you are doing it in the right place. The same goes for the use of modules: you can use out-of-the-box modules that you can find in the Terraform Registry, or you can create new ones. So before writing a lot of code yourself, check whether that code is already written and available on the web.
My next few tips are about the Terraform state file. If you are familiar with the Terraform architecture, then you know that the state file is the persistent way to keep track of the current state of the cloud, so that Terraform knows how to create a new plan once you introduce a change to your cloud.
If you are a team with multiple people using Terraform against the same infrastructure, you should use remote state. Terraform provides backends that let you save the state in a remote location, and I personally prefer an S3 bucket as the place to save the tfstate file.
This takes me to the next point: you need to make sure you are backing up the state file. Because the state file is such a crucial piece of a healthy Terraform deployment, you need to make sure that you always keep a backup of it. If you are using S3 as I did, turn on versioning on the S3 bucket, and then you have the peace of mind that you can always go back to a previous revision of the state file in case some disaster happens to it.
The next tip is to use state locking. We started with the understanding that Terraform is something you collaborate on with other team members, so you need to know how to handle a situation where multiple people try to provision and introduce changes to the infrastructure at the same time. This is why there is a feature called state locking. If you are using S3, you also need to use DynamoDB in order to manage the lock. The entire locking mechanism is just there to prevent a situation where two people try to write to the same file at the same time. While Terraform is writing the state file for someone's change, it locks the file for changes, and the other team member gets a message that the state file is locked. So there is no situation where two people are writing to the same file at the same time.
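The three state tips above (remote state in S3, bucket versioning as a backup, DynamoDB locking) can be sketched roughly as follows. The bucket name, table name, state key, and region are placeholders, and the bootstrap resources are assumed to live in a separate configuration from the one that uses the backend. Recent Terraform releases also offer S3-native locking, so check the backend documentation for your version.

```hcl
# --- bootstrap configuration (applied once, kept separately) ---

resource "aws_s3_bucket" "state" {
  bucket = "example-terraform-state" # placeholder name
}

# Versioning lets you roll back to a previous revision of the state file.
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# The S3 backend expects a table whose partition key is the string "LockID".
resource "aws_dynamodb_table" "locks" {
  name         = "example-terraform-locks" # placeholder name
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

# --- backend.tf in each project that shares this state storage ---

terraform {
  backend "s3" {
    bucket         = "example-terraform-state"
    key            = "prod/network/terraform.tfstate" # placeholder path
    region         = "us-east-1"
    dynamodb_table = "example-terraform-locks"
    encrypt        = true
  }
}
```

Backend blocks cannot reference variables or resources, which is why the names are repeated as literals.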
The next tip is to avoid manually changing the state file. The state file is only a JSON file, it is human readable, and you have the power to edit it, remove lines, and add lines. But this is bad practice, and manually changing the state file opens the door to a lot of errors. If you would like to bring resources that Terraform is not managing under management, just use the terraform import command. I haven't faced any situation that required manual modification of the state file, but I have heard about a lot of people who tried to do that in some cases and ended up with a corrupted Terraform deployment.
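As an illustration of importing instead of editing state by hand, here is a sketch that assumes Terraform 1.5 or later, where an `import` block can be declared in configuration as an alternative to the `terraform import` CLI command mentioned in the talk. The resource address and bucket name are hypothetical.

```hcl
# Tell Terraform to adopt an existing bucket during the next plan/apply.
import {
  to = aws_s3_bucket.legacy_logs
  id = "example-legacy-logs-bucket" # hypothetical name of the existing bucket
}

# The resource block the imported bucket will be managed by from now on.
resource "aws_s3_bucket" "legacy_logs" {
  bucket = "example-legacy-logs-bucket"
}
```

The CLI equivalent would be `terraform import aws_s3_bucket.legacy_logs example-legacy-logs-bucket`. Either way, Terraform writes the state entry itself.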
Next, I highly recommend that you use Terraform data blocks. A data block is basically a fantastic way to get information out of the cloud provider APIs. It can be the cloud provider or any other provider that Terraform uses. Just think about it as a query language that you can use to get information from the providers. I can give you a very quick example of a great use of data blocks. Say you have a module with a hardcoded list, for example a list of the availability zones that I put in the file. This list is handled statically, and each time there is a change in the availability zones I need to make a manual change. But if I use a data block, I basically make a call to the cloud using the provider. In this case I will use a data source called aws_availability_zones and give it a property called state with the value available. This data block gets the list of AWS availability zones that are available, dynamically, through the API of the cloud provider. The only thing I need to do is create that variable, the availability zones, and set its value from the data call. This is just an example. Data blocks also allow you to share information between projects, so if you use them wisely, you will find that they are a powerful feature in Terraform.
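The availability zone example can be sketched like this. A `locals` value is used here rather than an input variable, because variable defaults cannot reference data sources. The second block shows one possible way to share information between projects, via the `terraform_remote_state` data source; that choice, and the bucket, key, and `vpc_id` output names, are assumptions rather than something stated in the talk.

```hcl
# Query the provider for the zones that are currently available,
# instead of maintaining a hardcoded list.
data "aws_availability_zones" "available" {
  state = "available"
}

locals {
  availability_zones = data.aws_availability_zones.available.names
}

# One way to read outputs published by another project's state
# (placeholder bucket, key, and region).
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"
    key    = "prod/network/terraform.tfstate"
    region = "us-east-1"
  }
}

# Assumes the other project declares an output named "vpc_id".
output "shared_vpc_id" {
  value = data.terraform_remote_state.network.outputs.vpc_id
}
```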
My next tip for you is: don't bypass the infrastructure-as-code pipeline. When you create infrastructure as code, you are basically committing to a certain way of provisioning infrastructure and making infrastructure changes. While it is very easy to go to the cloud console or to use the CLI or API of the cloud, all changes from now on have to go through the infrastructure-as-code pipeline. So do your best and put safety measures in place to make sure no one can alter or change the infrastructure directly in the cloud.
This brings me to the next point. You should understand that even if you put those safety measures in place and do everything you can for education and behavioral change in your team, you still need to make sure that you are monitoring for infrastructure drift. Just to make sure everybody here understands what infrastructure drift is: drift is when your infrastructure becomes different from the one you describe in your infrastructure-as-code manifests. It is mainly because somebody makes a manual change, but it can also be caused by a third-party application, like a CSPM tool, that also makes changes in the cloud. So you need to have the right tooling in place that continuously evaluates your Terraform state and Terraform HCL configuration against the actual deployment in your cloud. There are a few projects that can help you do that.
I think the most important tip that I have for you today is to treat your infrastructure as code the same as any other code. What that means is that if you have a very good CI/CD pipeline with reviews, static code analysis, linters, scanners, and all of the good stuff that you can shift left into the CI, make sure to have them for your infrastructure as code as well.
For example, there are great projects for security scanning like tfsec and Checkov, and there is a very good project that can even give you a cost projection, Infracost (infracost.io). So make sure that you put all of those in place for your Terraform code or any other infrastructure-as-code language. Also, don't forget about getting your peers to review and approve your pull requests. This is something that we tend to be looser about, but from my experience, once we became more mature in understanding that infrastructure as code is just like any other code, we saw the quality become much, much better over time.
To conclude everything I mentioned here: I think that if you take even only a few of those items, not all of them, and implement them in your journey, you will be at a very good starting point and you will have a better experience using infrastructure as code. Thank you very much. I will be available for any other questions after the talk. Thank you, and see you again.