Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi everyone, thanks for joining me. I'm going to talk about Terraform
practices: the good, the bad and the ugly. Terraform is a great tool.
It is used widely across the world. This talk will cover
not only the specific bits and pieces regarding Terraform usage and best practices,
but also hopefully will make you think about your unique use
cases and scenarios and help you see the big picture
and utilize Terraform in a broader context rather than just as an
infrastructure tool. So a little bit about myself.
I'm Hila Fish. I'm a senior DevOps engineer currently working for Wix.
I have 15 years of experience in the tech industry.
I'm a DevOps culture fan. I think this is what helps companies achieve
great things. I'm a co-organizer of conferences:
DevOpsDays Tel Aviv in Israel, and StatsCraft, which is
a monitoring conference. I'm a mentor in OpsSchool, which is
a course for DevOps, and in Ba'ot, which is
a community in tech for women. And I'm a lead singer in
a cover band, as you can see in this picture.
Okay, so Terraform implementations can be good, bad and
ugly, right? So we will talk about it here. Just one disclaimer
beforehand: I'm going to show mostly examples from AWS,
but this talk is suitable for any cloud provider
that you will work with. And also, I mostly, or not
mostly, only, worked with Terraform open
source and not Enterprise or Cloud. So I don't know if
what I'm going to show you applies to them as well. So just bear
that in mind. Okay, quick wins. I'm going to
show you briefly things that you can do in Terraform that
will achieve great value in a low amount of time, just a matter
of seconds or minutes. The first thing is version lock.
So like we have requirements.txt or package-lock.json,
that's exactly the same thing.
So when you think about modules,
providers and Terraform versions, each of them has
a version that gets deployed, which means a
specific syntax that is accepted and the
features that come with it. So if you lock the versions, you know exactly which
features are the ones that you can use, which syntax is
acceptable and valid and stuff like that.
So the good thing here is to always lock the versions of
the modules, the providers and Terraform itself. The bad thing
is to lock in a way that will still allow breaking changes to
get through. So you have version constraints. If
you lock the version to a major, meaning the latest of major X
until breaking changes get introduced, then that's fine,
because little changes get applied and no breaking changes will get through.
But if you lock it
in a way that will still allow the breaking changes, then this
could be bad. And the ugly thing here is to have
no version lock whatsoever. Trust me, it is really, really
ugly. You will see a lot of errors in plan and apply,
like "I don't know this syntax, what is this?",
stuff like that. So just lock your versions.
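A minimal sketch of what such a version lock could look like, assuming the AWS provider and a community VPC module. The version numbers and the module inputs are illustrative assumptions, not values from the talk:

```hcl
terraform {
  # Pin Terraform itself: any 1.x release from 1.5.0 up, but never 2.0.
  required_version = ">= 1.5.0, < 2.0.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # "~> 5.0" accepts 5.x minor and patch releases and blocks 6.0,
      # so small changes get applied while a new major cannot slip in.
      version = "~> 5.0"
    }
  }
}

module "vpc" {
  source = "terraform-aws-modules/vpc/aws"
  # Modules are locked the same way, within one major version.
  version = "~> 5.0"

  name = "example" # hypothetical name
  cidr = "10.0.0.0/16"
}
```

A constraint such as `>= 5.0` with no upper bound would be the "bad" case described above: it counts as a lock, but it still lets the next major version through.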
The next thing I want to show you or talk about is tagging resources.
So I think this is a must-implement practice
because it allows you to filter cloud provider expenses and
sort them. If you use tags you can sort by whatever you
choose to, and that way you also gain visibility on
ownership and overall projects. So if you tag
them correctly you can do a lot of great things, and sort
and really track your cost management and other stuff along the way.
So the good thing here is to tag everything, because why not?
Everything is good. You can use default tags on
the provider level. So for example, if you use the managed-by tag
that I showed before in the examples, then managed-by is Terraform,
of course. So if you set it on the provider level then you can just
forget about it. You don't need to set it up on other resources
along the way. Once it is set up on the provider level, it
will get applied to any resource that is under that provider.
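As a rough sketch of provider-level default tags, assuming the AWS provider. The tag keys, values and bucket name are made up for illustration:

```hcl
provider "aws" {
  region = "us-east-1"

  # Every taggable resource created through this provider inherits these tags.
  default_tags {
    tags = {
      ManagedBy = "terraform"
      Team      = "bi" # hypothetical team tag
    }
  }
}

resource "aws_s3_bucket" "logs" {
  bucket = "example-logs-bucket" # hypothetical bucket name

  # Only resource-specific tags are set here; ManagedBy and Team
  # come from the provider block above.
  tags = {
    Project = "airflow"
  }
}
```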
And another good thing to do is to add enforcements to
fail the PR if tags weren't added. I'm going to talk
about practices enforcement later on,
so you'll see that. The bad thing here is to tag
inconsistently because, as we know, consistency is key.
Inconsistently means that sometimes you will be
able to sort, sometimes not, and it's not that great.
So try to be as consistent as possible. And the
ugly thing is to have no tagging whatsoever.
Because again, trust me, if you tag things, the
FinOps team will thank you
and your team leader will thank you and the management and company will thank you.
So do it. And the last thing in the quick
wins section I want to show you is the remote state. So by default Terraform
works with local state. So even if you're working
alone, okay, let's say you're the only one managing infrastructure, still you want
to think about, or you should think about, backups
and to have the state secured and to have
redundancy. Because if something happens
to your local machine, then it's not good.
So all sorts of things. So you should really use a
backend. In this example, this is S3, but again, all cloud
providers provide that. And then this way the state will be kept
remotely. So the good thing here is to have the state
remote and secured, because a lot of times we
have sensitive information in the state,
like secrets and stuff like that. So you really need to make sure that the
state is secured. Have the state backed up.
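A minimal sketch of a remote, encrypted S3 backend. The bucket, key and table names are hypothetical, and the DynamoDB table is shown as one way the S3 backend has supported state locking:

```hcl
terraform {
  backend "s3" {
    bucket = "example-terraform-state" # hypothetical bucket
    key    = "bi/airflow/aws/us-east-1/terraform.tfstate"
    region = "us-east-1"

    # Encrypt the state at rest, since it can contain secrets.
    encrypt = true

    # Lock the state during write operations so two runs cannot
    # modify it at the same time (hypothetical table name).
    dynamodb_table = "example-terraform-locks"
  }
}
```

Backups are not configured in this block. They come from enabling versioning on the state bucket itself.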
So if you use S3 as the backend then enable
versioning and then it will be backed up. And ensure that the tf
state lock occurs. Because when you run write operations,
if a lock doesn't happen then there
will be conflicts if other people are trying to run it as well. So not
so great. And the bad and the ugly here are quite the opposite,
actually: if the state is kept locally, if it is
remote but not backed up or not secured, and if the
Terraform state lock doesn't occur during write
operations. Okay, so we talked about quick wins,
stuff that you can do on your day to day in order to gain
a lot of value out of Terraform. Let's talk about second nature.
What do I mean by second nature? I mean that these are the things that
you should have in your awareness on your day to day in order for
you to really work with Terraform in the best way possible.
So the first thing is using community modules versus creating
them. So hey, why reinvent
the bicycle, right? I mean, if it exists then use it. Using official
community modules is good because they are proven over
time, they are supported by the community, and you eliminate
the need to support them and keep them up to date because they do it
for you. A lot of well-known cloud provider features
are already covered by modules, so do your research, check available
modules before implementing yours. So the good
thing here is to use official modules wherever possible.
The bad thing is to write your own modules while official
modules exist. But if you still maintain them as such, so you
have enforcements and stuff like that, then it's good.
And using community modules without a version
lock is also a bad thing, as I specified before.
And the ugly thing here is to write your own modules and
have no checks or consistency whatsoever and
code practices not applied. So we will talk about it
also later on. And also remember, community modules,
as I will show in a bit,
usually use enforced linters,
formatters and logical checks because they want to maintain
important aspects to allow new users to get involved
and to contribute to the module and use it right. So they
have everything set up to create the best quality
code. So why not use it?
So if you do have to create your own modules, make sure that
they are stateless and clean and generic, that you do not repeat yourself there,
that it is kept as simple as
possible, and use enforcements too. So like community modules
have these enforcements, use them as well. I will also
cover it later on. And just
bear in mind that the code should always be clean and readable.
Okay, variables and locals. So unlike variable values,
local values can use dynamic expressions and resource arguments.
Locals also don't change values during or between Terraform
runs such as plan, apply or destroy. You can
use locals to give a name to the result of any Terraform
expression and reuse that name throughout your configuration,
like this example with the tags that I showed right here.
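The slide itself is not part of the transcript, so the following is only a sketch of the idea: a local gives a name to a set of tags and that name is reused across resources. All names and values are assumptions:

```hcl
locals {
  # One named expression, defined once and reused below.
  common_tags = {
    ManagedBy = "terraform"
    Team      = "bi" # hypothetical team tag
  }
}

resource "aws_s3_bucket" "data" {
  bucket = "example-data-bucket" # hypothetical bucket name
  tags   = local.common_tags
}

resource "aws_sqs_queue" "events" {
  name = "example-events" # hypothetical queue name
  tags = local.common_tags
}
```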
So on the module side, you should use variables
for needed settings for the module config itself.
And if you set a variable on the module, you should set a default
or validation, because if the module expects a
variable to get passed to it and it doesn't get it, then the module
will break, right? So make sure to set up a default or
validation. And a local should be a
constant to be honored or relied on. So for example, you have
a bucket module that creates buckets.
If you want consistency, so that all buckets have the same
naming convention, you can do it by using that convention
in the locals, as I showed in this example. So this is also
a great thing, and you could really enforce guidelines and practices
through locals. On the live side, and when I
say live, I mean where I call the modules, because modules are
generic and nothing happens there, there's the section where
we call the actual modules and do the actual creation
of the resources. So in the live section, variables-wise,
if you use the variables once, then just set up a
default on the variable. But if you use it per region or
another logical breakdown that you have, then use a tfvars
file, and then you specify each variable value
in tfvars based on each region or each other
breakdown that you do. Avoid using locals, and use
data sources to pass their outputs to the module itself.
And basically just remember to keep the live section
as simple as possible. No logic, only call the modules
themselves. Okay, so to
sum things up regarding variables and locals: use
tfvars wherever possible. I haven't mentioned it
before, but use environment variables. So if you have environment
variables already set up, then just utilize them instead of creating
a new variable. Use locals to hard-code names
and tags which are set only once, or to decrease code
repetition.
And keep things as generic as possible.
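The module-side guidance above can be sketched roughly like this: a variable with a default and a validation, and a local that holds the bucket naming convention. The variable names, the allowed values and the "acme" prefix are all hypothetical:

```hcl
variable "environment" {
  type        = string
  description = "Deployment environment the bucket belongs to."
  default     = "dev" # a default keeps the module from breaking when nothing is passed

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "The environment must be one of: dev, staging, prod."
  }
}

variable "purpose" {
  type        = string
  description = "Short name describing what the bucket is for."
  default     = "data"
}

locals {
  # The naming convention lives in one place, so every bucket follows it.
  bucket_name = "acme-${var.environment}-${var.purpose}"
}

resource "aws_s3_bucket" "this" {
  bucket = local.bucket_name
}
```

On the live side, a per-region file such as a hypothetical `us-east-1.tfvars` would then carry plain assignments like `environment = "prod"` and be passed with `-var-file`.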
The bad thing here is to use multiple locals blocks if
not necessary. Terraform allows that, it allows you to create multiple blocks.
But if you don't need to, then why burden my eyes with a lot
of stuff that is written, right? So just have one
block if more aren't necessary. And ignoring environment
variables could also be bad because then it forces you to maintain
more variables than you need to. And the ugly thing here is
to hard-code values on variables that should support
multiple scenarios. That's what
variables are for, right? To set them up according to our needs.
So if you hard-code values where it is not good
to do that, then it could get ugly. So that's
about that. File structure. So when you
think about the file structure, and I say that in regards
to both modules and the live section, you should think
about it for better logical arrangement and easy management.
So this is how it is structured in community
modules. So they are basically the standard for us.
main.tf is the main logic, then variables, data and
outputs. And usually,
at least from my experience, for example,
if we take a VM module, the creation of the VM
itself, the resource creation, is in main, and then if
there are complementary resources like security groups, they should be
in an SG file. If it has definitions for a
load balancer, then put them in alb.tf. So that's
about that. And if main.tf gets complex,
if it has a lot of things,
then consider breaking it down into submodules.
So let's take IAM, for example.
The IAM community module has a breakdown into submodules.
Also in EKS, one of the submodules is the creation of node
pools. So it shouldn't be in the top-level main because it's not
the main logic, but it is relevant. That's why it was broken down into
submodules. And also, when you break it down,
the variables for each submodule live only with
that submodule. And then your variables.tf
file doesn't get huge, and it really is
easier to maintain it that way. Also it's best to have a
naming convention which reflects the actual purpose of each file.
And that way you will get a decent logical arrangement for faster
access, better readability and cleanliness of the code.
Okay, the next thing that I want to talk to you about is applying classic
code best practices. So yeah, Terraform is not a
pure programming language, I know that. I think that everyone can
agree on that. But similar rules of writing code apply
to Terraform as well. Terraform progressed over the years in a way
that adopts code best practices. For example, you might
remember that before Terraform 0.13 you couldn't even use
for_each for modules. And in August 2020, with the release
of Terraform 0.13, HashiCorp finally introduced the ability
to loop over modules with a single module call.
So even HashiCorp realized that, hey, Terraform should follow
best practices for its code.
So that's why they introduced these capabilities.
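A small sketch of looping over a module with a single module call, as available since Terraform 0.13. The local module path, its `name` and `versioning` inputs, and its `id` output are assumptions made up for this example:

```hcl
variable "buckets" {
  type = map(object({
    versioning = bool
  }))
  default = {
    logs   = { versioning = false }
    assets = { versioning = true }
  }
}

module "bucket" {
  source = "./modules/bucket" # hypothetical local module

  # One module block, one instance per map entry.
  for_each = var.buckets

  name       = each.key
  versioning = each.value.versioning
}

output "bucket_ids" {
  # Assumes the hypothetical module exposes an "id" output.
  value = { for name, instance in module.bucket : name => instance.id }
}
```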
So keep your Terraform code in
source control management like GitHub, GitLab,
Bitbucket. Keep it simple,
stupid, as much as possible, of course. Do not repeat yourself.
And make sure that the modules that you create and
everything that you use are idempotent. Which means that
whenever you create something, the result of that
something, the result of the logic that runs, is always the same.
You expect the same result. Because if not, then there's
a saying about it, that maybe you're crazy. I don't know. Let's leave it aside.
But everything should be idempotent because you want to
make sure that everything is as expected. You always
expect the same results. Functional programming
is also another approach to writing Terraform
code. It is great. I haven't done
it myself yet, but I spoke with other developers who
are utilizing functional programming in their Terraform
code, which is very interesting and fascinating. So I really encourage
you to check this road as well. And about human
and clean code: there's an interesting read
by Tiexin Guo. I hope that I pronounce his name properly.
He really writes things in a clear manner about
applying classic code best practices in Terraform.
So I want to quote him on something about human
and clean code. So the computer that
processes your code doesn't care if the variable names
are ambiguous or inaccurate, right? If used
correctly, it still gets executed.
But since human beings are the ones to maintain
this code, then we need to make sure the code is readable.
Things like refactoring, clean code, naming conventions,
stuff like that were invented so that we humans
can read the code better, for the sake of us humans and
not the computer, right? So that's about that. I really
encourage you to read the article because it's really interesting.
Okay, so we spoke about the quick wins, stuff that you can
do in minutes in order to get a lot of value out of Terraform
for your company's long term and whatnot.
We talked about second nature, things that you need to think about on your day
to day when you work with Terraform. Let's talk about the long
haul. Long haul means stuff that you should prepare
for and plan ahead in order to work with Terraform in the best,
most efficient way. So structuring your Terraform
code: how do you structure your code?
There are a
lot of ways to do it. So let me show you how we do it
here at Wix. At Wix, we did the
structuring like this: team, project, cloud provider and
region. This is actually a feature-oriented approach.
And that way, when you look at the example here, you see the live
section, right? And we have bi, which is the
team, airflow is the project, AWS is the cloud provider, and us-east-1
is the region. That way, it also allows the state to be
very small, because the state is only for the
airflow project, for BI, for us-east-1
in AWS. So it's very small. It allows you
better flexibility and control over
what you are inserting and what you're managing at that
specific point. So it really is very
beneficial to have this structure. And also, when you
come to think about accounts, currently we manage
the accounts on the region level, which is not great. So that's
why we are structuring it or thinking to structure
it again, on top of tears. So it's
an ongoing process. But think about that. Think that if you have
multiple accounts that you need to manage and different projects,
and the code doesn't repeat itself, like I will show in the next example,
then maybe the account should also get
into the consideration of the structuring of the code.
Another example of structuring the code is using
workspaces. So workspaces isolate
their state. So if you run terraform plan in one workspace,
you will see only the state for that workspace and not the
other one that is just around the corner.
So one example is to use them when you have
the same Terraform config but different customers. So let's say GCP,
okay, I talked about AWS until now. Let's say GCP. Each project in
GCP is a different customer, and it's
exactly the same code, right? Because it's the same code, just different customers.
So in that case, you can use workspaces: for each
customer, each project, it's the same code, just different
workspaces, and each workspace is a customer.
So this is one example. Another example, which really links
and couples with the one that I just showed, is when you have the
same service but different regions. So we have different customers, right,
but all customers need to go to one service,
a financial service, for example. So if I have the financial service
in different regions, us-east-1, us-east-2 and stuff like that,
then I can also use workspaces for that.
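A rough sketch of the one-workspace-per-customer idea on GCP, using the built-in `terraform.workspace` value. The customer names, project IDs and bucket naming are hypothetical:

```hcl
locals {
  # Hypothetical mapping: workspace name (customer) to GCP project ID.
  projects = {
    "customer-a" = "customer-a-project-id"
    "customer-b" = "customer-b-project-id"
  }
}

provider "google" {
  # The same code targets a different project per workspace. The lookup
  # fails in any workspace that is not in the map (including "default"),
  # which guards against running in the wrong workspace by accident.
  project = local.projects[terraform.workspace]
  region  = "us-east1"
}

resource "google_storage_bucket" "assets" {
  name     = "${terraform.workspace}-assets" # hypothetical naming scheme
  location = "US"
}
```

Referencing `terraform.workspace` in the code also makes the existence of workspaces visible to anyone reading the configuration.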
Okay, some cons about workspaces. So if you
use them, consider using a Terraform wrapper to avoid
human errors. Because when you use workspaces,
it's through the CLI, terraform workspace select X.
That way, if I forget to change the workspace,
I'm in a bit of trouble, so it's
not great. So consider creating a Terraform
wrapper that will actually run the code for you, and then you will run the
wrapper instead of running Terraform directly. That way
this wrapper will handle the changing of the
workspaces and the management for you. The second thing is that
you have less visibility because, hey, it's all done with
the CLI, right? So if I haven't used the terraform.workspace
built-in variable here, then it means
that I don't even have the ability to know that we have
other workspaces if I haven't run the terraform workspace
list command.
It really is important to know that you have less visibility and to take
that into consideration when you're considering using workspaces.
A couple more things about workspaces. So
the official Terraform documentation says to use workspaces
to manage multiple non-overlapping groups of resources with
the same configuration. Okay, so it means,
it suggests that these usages are qualified,
right? Multiple environments, dev, staging, stuff like that.
Multiple regions like I showed in the previous example, or multiple accounts
or subscriptions. Okay, cool. Now let's see.
Also in the official Terraform documentation, it says that for
different development stages, like staging versus production,
named workspaces are not a suitable isolation
mechanism for this scenario. So if
you do go with workspaces, maybe I read it wrong,
I don't know, just make sure that you go into it with
open eyes and you know what you're doing. And I
think that we can all agree on at least one workspace usage. Both the
documentation says that, and other people that I worked
with showed me that they are doing it. It is that
when you have workspaces, you have a default workspace, this is
the main one. And then if you create another one, you can call it whatever
you want. This could be a side branch. And then you can test out
any code that you want to introduce, see that everything works
okay, and then apply this code to the default
workspace. So create a new workspace, do whatever you want,
test it out, and then if all looks okay, apply it to the
default workspace. Okay, so to sum things
up in regards to structuring your Terraform codebase: the
good thing is really thinking things through and planning
ahead. And if you, for example,
take the first example that I showed you, the feature-oriented
one, then it allows a small state setup, and a small
state is a very good practice to have. And also the first
example, the feature-oriented one, really allows you to
set up Terraform as a platform, because that way you can
let any team in your organization use Terraform.
Each team has their own control
over their folder. Also, in GitHub, each
folder is stated in the code
owners, so they can approve their own PRs
and stuff like that. So it really gives you flexibility,
enables independence, and offloads responsibilities
to others. The bad thing is that if you don't think
and plan ahead then organizational changes could cause a
need to restructure the code. And you don't want to restructure the code just
because you didn't plan. If stuff evolved, great. But if
you need to restructure just because you didn't plan it correctly, then it
could be a bummer. Another bad
thing is to use workspaces for the wrong reasons. I just spoke
about it before, so just make sure you're
doing it for the right reasons. And the ugly thing here is that
if you structure the code in a way that will allow or
enable huge states to occur, then this could
lead to invalid dependencies. So it happened to
me quite a lot that I did a change X and then I ran a
plan and then I saw in the plan it's going to change Y, and I'm
like, what? I changed X, not Y. So huge
states could lead to that. So make sure you choose
a structure that will allow states that are as small as
possible. The next thing I want to show you
or talk about is executing Terraform. So make
sure that you always strive for
remote execution, because that way you don't need to
set up local credentials, you don't set up local
configurations, and you have a better audit of
who ran what. So it is always great to have remote execution.
You should run apply with a Terraform plan file,
so you can pass an argument with the
plan file to run, and then you know exactly what is getting applied.
And you should set up a Terraform timeout, because I had
cases where I ran auto scaling and the auto scaling was based
on spot instances. So Terraform just waited for the price to
fall into the right range. So it's
not nice. I just need to wait and wait and wait and it's not nice.
So set up a Terraform timeout which makes sense to you.
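The talk does not say which timeout setting was used. As one possible sketch, the AWS auto scaling group resource has a capacity wait setting and a `timeouts` block. The names, sizes and durations below are assumptions:

```hcl
variable "subnet_ids" {
  type        = list(string)
  description = "Subnets the group may launch instances in."
}

variable "launch_template_id" {
  type        = string
  description = "ID of an existing launch template (for example, one requesting spot instances)."
}

resource "aws_autoscaling_group" "workers" {
  name                = "example-workers" # hypothetical name
  min_size            = 1
  max_size            = 3
  desired_capacity    = 1
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }

  # Give up waiting for instances to come up after 5 minutes
  # instead of leaving the run hanging.
  wait_for_capacity_timeout = "5m"

  timeouts {
    # Upper bound on how long a destroy of this group may take.
    delete = "15m"
  }
}
```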
The bad thing is to execute Terraform locally, so
on either your computer or a server, because then you
have no audits. It's not nice. And the ugly here is
to execute locally and press
Ctrl+C while Terraform is running. If you don't want
to wait for the timeout, I understand, but it's best for you
to just go and grab a cup of coffee or cocoa,
whatever, because it's not good.
Ctrl+C while Terraform is running could lead to disruptions
in the state, conflicts. It could really, really get
ugly. So don't do it.
Okay, practices enforcement. So we
talked about that. The most important part of every module, even if it's a
private module which is only going to be used internally,
is readability and cleanliness of the code, right? In order
to keep things in check, in order to make sure that
everything is clean and right and everyone has guidelines,
you should use enforcements. So these enforcements
already happen on community modules, so you should
also do them yourself on your internal modules.
So this is an example from the AWS autoscaling community
module. As you can see, on each PR there's
a checklist that is being checked through GitHub
Actions. So it checks if the contributor added
documentation, if he or she or they
formatted the code, Terraform
lint, Terraform format, what else? End of file.
So a lot of things are being checked, and it's really awesome to have
these checks because these simple checks can easily remind developers
to keep the PR at as high a quality standard as possible.
Okay, so to sum things up in regards to practices enforcement,
I tried to think about bad things to say about that. So maybe
the bad thing will be, I don't know, it forces
the developer to revisit the code and add more stuff.
But it's not really a waste of time because it is good
to add this stuff. It's not just on
a whim. These are important things that we need to add and
that's why it's good to add them. So I only have good things
to say on practices enforcement. So you should add
pre-commit or pre-PR linters, formatters and logical checks,
either through GitHub Actions or CI pipeline checks.
You should also, if you want, create a Slack
bot that actually tells you if there was a drift between
the plan that you did and the actual environment.
And speaking of the actual environment, you should always
make sure that the enforcement knows or verifies
that master is always your source of truth, your actual environment.
So for example, what we have: when
we push the code to GitHub, GitHub
checks if tags were added, and more stuff to come. And then
once everything was cleared and everything is okay,
it runs the plan for me. I see that everything looks good and then I
do atlantis apply, because we use Atlantis for the actual run.
atlantis apply does two things.
First, it actually merges the code, and then it applies it.
That way, I know that what got applied is what got merged to the main branch.
This is awesome, and it really
keeps the situation as it should be: main is the actual environment.
On the right, I put some open source enforcement tools and helpers
that you can use after this talk, when you're going to sit
down and read about enforcements and
how to do it. So these are a few checks and a
few tools that will help you with this enforcement journey
and setup. Okay, so we talked about
a lot of things here, right? I showed you a lot of things you can
do in Terraform or think about in Terraform. So maybe stuff
will stay with you, maybe not. But the thing that I really, really want
you to think about, and to stay with you after this presentation,
is to think and ask yourself, when you work with
Terraform, how do you envision the infrastructure and the company needs?
Because you should really think things
through. Planning ahead will allow you to enable others on their
terraform journey. You will be able to set up guidelines
and best practices of your own, like tagging, usage and whatnot.
That way you will make sure that everything is
utilized in an organized way
and in an orderly fashion. And this is what we need in a company,
right? We need structure and we need to make sure everything is aligned because
it's better. We can really keep things
in check and we can really make sure that everything is
manageable that way. So take into consideration your
use cases and your pain points, Terraform constraints,
where you see yourself and your company in
the long term, and then plan accordingly.
And if we hadn't planned ahead,
we wouldn't have been able to set up Terraform as a platform as we
did here at Wix. So this is one takeaway from
it. And even if you're a startup, you should still
think about scale, think about how you should address and
prepare for changes to come. And then you will be able to utilize
Terraform in the best way possible.
So like with any other tool, don't use Terraform with an
ad hoc mindset. Plan for your future needs.
I spoke about Tiexin Guo before.
I want to quote him on another thing.
Programs evolve and code changes. And it is
really rare that you write Terraform code and it stays like that,
because this is not how projects work.
If that was the case, then we wouldn't be talking here about
Terraform practices and you would only use it once in
one way and that's that. But we will always have projects.
Because businesses want to improve, and the project
is the way to move from the current state to the next desired
state. Changing from one state to another is
a project, and by nature a project means change, and the
code also changes constantly. So think about your
structure and how you structure things, and allow projects
to evolve and get introduced to your environment and to
your company. Thank you so much for listening.
I hope that it was beneficial for you and I want
to do a quick shout out for some people from Wix that
helped me liberal that helped with the
visuals of the presentation and the logical flow.
Without her it wouldn't look like this, so thanks to her and
other people. Ilya Schenking from my team ran Schneider,
Oprah Velez and thermal cupak they all pitched in and gave me
some input, so thank you guys. And again, thank
you all for listening. If you want to approach
me on LinkedIn or Twitter or by mail to consult about Terraform or
other SRE aspects, I would be more than happy
to help. Thanks a lot.