Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, Conf42 viewers, and welcome to this Conf42 SRE 2024 session about creating Terraform providers and my journey when I wanted to create one, starting from learning the documentation up until publishing it to the Terraform registry.
We'll cover a bit of a refresher about what infrastructure as code is, then discuss where to start, how to create a provider, the coding, and a live demo. I'm Harel Safra, a data platform engineering team lead at Riskified. Riskified is a fraud detection company centered around e-commerce solutions, and my team manages all the online databases that are part of our systems. We use SQL, NoSQL, search engines, and graph databases to provide our services. Before that, I'd been in the infrastructure domain for 20-plus years, managing servers, network switches, storage, databases, and anything to that effect.
What is infrastructure as code? Infrastructure as code is a programmatic definition of infrastructure elements. That means we want to define infrastructure, which could be various things like servers and network switches, but also database users or Elasticsearch indexes, in a programmatic way that allows for repeatable and documented processes, where the knowledge is baked into the process itself and doesn't depend on humans remembering what they need to do.
There are two general approaches to infrastructure as code. The first one is declarative: we ask the user to define what they want to achieve, and the framework compiles that into infrastructure elements. You can think of Terraform in that domain. The other approach is imperative: the user defines how they want to achieve something. It's more of a code-based approach, and the framework again creates the infrastructure for the user. You can think of Pulumi in that domain.
Terraform providers are the plugins that interface with the infrastructure API on one side and with Terraform core over RPC on the other side, and they bridge the gap between what Terraform core speaks and what the infrastructure needs on its API to create the infrastructure elements. There are 4,100-plus published providers as of this recording, and the main point is that anyone can create more providers. You don't have to be associated with the infrastructure vendor itself. Anyone can create a provider as long as there's a valid API you can work with.

You need to understand a bit about the architecture first. The first element is Terraform core, which is actually the Terraform executable that runs when you run a Terraform command. If you run terraform plan, what actually runs is Terraform core; that's the term the documentation uses. Terraform core communicates over RPC with the Terraform provider, and the Terraform provider uses native calls inside its process to communicate with a client library, which then communicates over the infrastructure's native APIs and protocols with the infrastructure itself to provision resources. This native communication can happen over HTTP, which is what you'll find in various documentation, but it could also be gRPC or SQL or system calls, or anything the infrastructure knows how to interpret.
So obviously, the first thing you need to understand is the API that's supported by your infrastructure. Find the correct API to interface with, and find an easy way to do that. If there's an existing Go client for your infrastructure, try to use that; it will probably be easiest. Otherwise you'll have to reverse engineer the protocol, and that can be a bit annoying.
After you've understood the API, you should learn Go. Go is the language that Terraform providers are written in. It's a compiled, high-level programming language. You don't need to understand Go too deeply, but you do need to understand control structures, a bit about interfaces, and how to go about creating code. I used the step-by-step tutorial found on the Go site, the Tour of Go, and it's a good tutorial; it will take you from not knowing Go to having a working knowledge of how to use it.

I like Go for its simplicity. It's easy to understand and to learn. I like very much that it's compiled: the compiler finds problems before you do in production. There's one thing you need to remember: there are no exceptions. If you're used to exception handling from other languages, note that you have to check method return values for errors; otherwise you'll find that your code errors out for various reasons because it didn't check the returned error.
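To make that concrete, here's a minimal sketch of that error-checking style (a generic Go example, not taken from the provider code):

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// Go has no exceptions: errors are ordinary return values,
	// and you must check them explicitly after each call.
	f, err := os.Open("file_one.txt")
	if err != nil {
		fmt.Fprintf(os.Stderr, "open failed: %v\n", err)
		os.Exit(1)
	}
	defer f.Close()
	fmt.Println("opened", f.Name())
}
```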
So: you understand the API and you have a working knowledge of Go; now you need to understand how to create providers. I used HashiCorp's documentation to learn how to do that. It's located in the developer portal under Terraform plugins. You don't need to learn everything in advance; just read the overview and then continue from there into the sections you need.
After you have the basics, let's see how you create the provider itself. We'll use a demo of a provider that manages lines inside a text file. It's a simple, made-up example, but it will let you understand what we're doing. All the files are managed under a single path that's defined in the provider configuration. There's a single type of resource, a file, which has a file name and a lines array, which is actually just an array of strings; nothing too fancy. The file API the provider uses is limited, and limited by design, because when you're working with other types of APIs there will be limitations that you need to understand and work around, and I didn't want to just provide an API that lets you do everything on files.
An example of this sort of resource configuration: you can see on the left there's a file resource named file_one that has a file name of file_one and the lines "line 1" and "line 2", and the provider will translate that into a text file named file_one containing those two lines.
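As a sketch, that configuration would look roughly like this (the resource type and attribute spellings follow the talk's description of the demo schema, so treat the exact names as assumptions):

```hcl
# provider configuration: all files are managed under this single base path
provider "filedata" {
  base_path = "/tmp/filedata-demo" # illustrative path
}

# a single resource type: a file with a name and an array of line strings
resource "filedata_file" "file_one" {
  filename = "file_one"
  lines    = ["line 1", "line 2"]
}
```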
You can clone the code from my GitHub repository, hsafra/terraform-provider-filedata, and it's also published in the Terraform registry for you to look at afterwards.
After you've understood the API you're working with, the documentation will point you to creating providers with the plugin framework. This is actually the correct, recommended way to create new providers. It abstracts a lot of the interaction that happens with Terraform core and lets you focus on your logic. You start by cloning the terraform-provider-scaffolding-framework repo to your GitHub profile, and then you can tweak and customize it from there.

Terraform providers have four basic operations that they need to support for each managed resource. These operations provision the infrastructure itself, or change it, or delete it, but they also amend the Terraform state; or actually, they provide the instructions for Terraform core to amend the state after they've done the operation. So after a create operation, you need to amend the state so that it includes the new resource that was created. A create operation obviously provisions a new resource. A read operation gets the infrastructure's current state: it goes and reads the infrastructure state over the API and returns that to Terraform core. An update operation changes attributes that can be changed, and obviously not every attribute in the infrastructure can be changed. A delete operation removes the resource. Terraform core sometimes uses the delete operation to change resources that cannot be changed in place: if there's an immutable attribute, Terraform core will destroy and recreate the resource to change that attribute.
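In the plugin framework, these four operations map to methods on your resource type. Here's a rough sketch of the shape, using the framework's resource package (simplified, with the Metadata and Schema methods omitted; this is not the demo provider's exact code):

```go
package provider

import (
	"context"

	"github.com/hashicorp/terraform-plugin-framework/resource"
)

// fileResource implements the four lifecycle operations. Each method
// receives a request from Terraform core and fills in a response that
// tells core how to amend the state.
type fileResource struct {
	basePath string // set from the provider configuration
}

func (r *fileResource) Create(ctx context.Context, req resource.CreateRequest, resp *resource.CreateResponse) {
	// provision the new resource, then set resp.State so core records it
}

func (r *fileResource) Read(ctx context.Context, req resource.ReadRequest, resp *resource.ReadResponse) {
	// read the infrastructure's current state over the API into resp.State
}

func (r *fileResource) Update(ctx context.Context, req resource.UpdateRequest, resp *resource.UpdateResponse) {
	// change the attributes that can be changed in place
}

func (r *fileResource) Delete(ctx context.Context, req resource.DeleteRequest, resp *resource.DeleteResponse) {
	// remove the resource; core also calls this when it has to destroy
	// and recreate a resource to change an immutable attribute
}
```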
If you look at the code, this is how the repository is structured: when you clone the repository, there's a documentation directory, and all the other code sits in the internal provider directory, including the resources. So if we look at the file resource, it has a few different methods that cover the operations: the Delete method covers the delete operation, and there are Update, Read, and Create as well. And if you look inside any one of the methods, you can see they all start the same way: you begin by getting the parameters Terraform core has sent to the provider, then you do a bit of logic, and then you return a response to Terraform core to allow it to amend the state correctly.
So if you take, for example, the create operation, you can see this provider starts by building the full name from the base path in the provider configuration plus the file name provided in the create operation. Then it iterates over all the lines in the lines array and writes them to the file with the API's write-line operation.
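Expanding the Create stub from the sketch above, that flow might look like this (the model and field names are illustrative, not copied from the repository, and the demo's line-by-line file API is stood in for here by a single file write):

```go
// additional imports beyond the earlier sketch
import (
	"os"
	"path/filepath"
	"strings"

	"github.com/hashicorp/terraform-plugin-framework/types"
)

// fileResourceModel mirrors the resource schema (illustrative names).
type fileResourceModel struct {
	FileName types.String `tfsdk:"filename"`
	Lines    types.List   `tfsdk:"lines"`
}

func (r *fileResource) Create(ctx context.Context, req resource.CreateRequest, resp *resource.CreateResponse) {
	// 1. Get the parameters Terraform core sent in the plan.
	var plan fileResourceModel
	resp.Diagnostics.Append(req.Plan.Get(ctx, &plan)...)
	if resp.Diagnostics.HasError() {
		return
	}

	// 2. Logic: build the full name from the provider's base path
	//    plus the file name, then write the lines out.
	fullName := filepath.Join(r.basePath, plan.FileName.ValueString())
	var lines []string
	resp.Diagnostics.Append(plan.Lines.ElementsAs(ctx, &lines, false)...)
	if resp.Diagnostics.HasError() {
		return
	}
	if err := os.WriteFile(fullName, []byte(strings.Join(lines, "\n")+"\n"), 0o644); err != nil {
		resp.Diagnostics.AddError("create failed", err.Error())
		return
	}

	// 3. Return a response so core can amend the state with the new resource.
	resp.Diagnostics.Append(resp.State.Set(ctx, &plan)...)
}
```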
Schemas and attributes are used to map between the Terraform configuration files and the code itself. Every configuration block, for a resource or for the provider itself, has a schema that defines the needed parameters, and schemas contain attributes that define the data elements themselves. Each attribute has a type. It could be a primitive, like an Int64 or a String, or a complex type like a map, an object, or a list. Each attribute also has properties, like a description, whether it's optional, whether it's sensitive, and other properties. It can also have optional validators that check the user-supplied values against what the provider expects, which lets you avoid checking those values later on, because invalid ones will fail the validation checks.

If you look at the code, you can see that the file resource has a Schema method that defines the schema the file resource expects to receive. It starts with a description of the file resource itself, and then has an attribute named filename, which is of type String. It has a description of "file name", it's required, obviously, and it has a validator that checks the correctness of the provided file name, in this example with a regex. It also has a lines attribute, which is a list attribute; a list in Terraform is a collection of elements of the same type. This is also required, the element type is String, as I mentioned earlier, and it has a list validator that requires the list to have at least two elements, just as an example. Nothing too fancy about it.
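A sketch of what that Schema method can look like in the plugin framework (the attribute names match the talk's description; the regex itself is illustrative):

```go
// additional imports beyond the earlier sketches
import (
	"regexp"

	"github.com/hashicorp/terraform-plugin-framework-validators/listvalidator"
	"github.com/hashicorp/terraform-plugin-framework-validators/stringvalidator"
	"github.com/hashicorp/terraform-plugin-framework/resource/schema"
	"github.com/hashicorp/terraform-plugin-framework/schema/validator"
)

func (r *fileResource) Schema(ctx context.Context, req resource.SchemaRequest, resp *resource.SchemaResponse) {
	resp.Schema = schema.Schema{
		Description: "Manages lines inside a text file.",
		Attributes: map[string]schema.Attribute{
			"filename": schema.StringAttribute{
				Description: "File name",
				Required:    true,
				Validators: []validator.String{
					// illustrative regex: a plain file name, no path separators
					stringvalidator.RegexMatches(
						regexp.MustCompile(`^[^/\\]+$`), "must be a plain file name"),
				},
			},
			"lines": schema.ListAttribute{
				Description: "Lines of the file",
				ElementType: types.StringType,
				Required:    true,
				Validators: []validator.List{
					listvalidator.SizeAtLeast(2), // the demo requires at least two lines
				},
			},
		},
	}
}
```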
Try to write descriptions wherever you can, because these descriptions are later picked up by the documentation framework to create these lovely documentation files; you can see the description is copied from here, and so is every other attribute's. So if you have the descriptions inside your resource file, they will be copied into the documentation, and you can use that to publish your provider later on.
The types in the Terraform plugin framework are not native Go types; an Int64 in the plugin framework is not a native int64, because the framework types have additional methods to handle null values and unknown values. For example, Int64, like any other framework type, has an IsNull method that returns true or false depending on whether the value is null. When you want to access the values, in the case of a primitive type you use the value method for that type, like ValueInt64, which returns a native int64. If it's a collection, you can convert the values into native Go types with the As methods; for example, List has an ElementsAs method that returns the elements as a native Go type.
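A small self-contained example of the difference (framework wrapper types on one side, native Go values on the other):

```go
package main

import (
	"context"
	"fmt"

	"github.com/hashicorp/terraform-plugin-framework/types"
)

func main() {
	ctx := context.Background()

	name := types.StringValue("file_one")
	fmt.Println(name.IsNull())      // false: the wrapper tracks null/unknown
	fmt.Println(name.ValueString()) // "file_one" as a native string

	list, _ := types.ListValueFrom(ctx, types.StringType, []string{"line 1", "line 2"})
	var lines []string
	list.ElementsAs(ctx, &lines, false) // convert the list back to a native []string
	fmt.Println(lines)
}
```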
If we look at the code again, inside the file resource you can see, at the breakpoint, that the same Create method accesses the filename attribute, which is a String attribute, with ValueString. This is used in various other places too: ValueString copies the framework type into a native type, converting it into a string you can work with.

After you've created your code, the code runs, at least hopefully. You'll probably want to run it locally and debug it in case there are errors. The first thing you can use to run Terraform providers locally is a .terraformrc config file with a provider_installation dev_overrides stanza, which translates a registry address into a local address that has your code. This will allow you to run the code without publishing it to the Terraform registry.
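Such an override looks roughly like this (the source address and local path are illustrative):

```hcl
# ~/.terraformrc
provider_installation {
  dev_overrides {
    # point the registry address at the directory holding your local build
    "registry.terraform.io/hsafra/filedata" = "/home/you/go/bin"
  }
  # keep normal installation behavior for every other provider
  direct {}
}
```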
You can use log-based debugging for simple cases, but for more complex cases, use debugger-based debugging. It will allow you to set breakpoints and run your code like any other code. You do that by passing a debug=true flag; the provider then outputs an environment variable value that you set, you run the action you want to debug, and it will break inside the provider code.
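The scaffolding wires that debug flag up in main.go roughly like this (a sketch of the scaffolding's pattern; New stands for the provider factory the scaffolding generates, and the address is illustrative):

```go
package main

import (
	"context"
	"flag"
	"log"

	"github.com/hashicorp/terraform-plugin-framework/providerserver"
)

func main() {
	// -debug=true starts the provider so a debugger can attach; it prints
	// a TF_REATTACH_PROVIDERS value that you export so Terraform core
	// reconnects to this running session instead of launching its own copy.
	var debug bool
	flag.BoolVar(&debug, "debug", false, "run with support for debuggers like delve")
	flag.Parse()

	// New is the provider factory from the scaffolding (func() provider.Provider).
	err := providerserver.Serve(context.Background(), New, providerserver.ServeOpts{
		Address: "registry.terraform.io/hsafra/filedata", // illustrative address
		Debug:   debug,
	})
	if err != nil {
		log.Fatal(err)
	}
}
```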
Let's see an example of that in action. We start with a configuration directory that has a provider file defining a base path, and this base path is the same directory we're currently in. It also has a resources file that defines two files, file_one and file_two, with their names and the lines inside them. And we can see that if we cat file_two, we see the lines AA, BBB, and CCC, the same as the provider has them defined.
If we run terraform plan now, there are no changes, because the resources have the same values as the infrastructure. Say you want to debug the plan stage, and specifically the read operation inside the terraform plan. We'll start by setting a breakpoint inside the Read method of the provider.
Then we'll make sure the run configuration has debug=true as a program argument, and we click the debug button. It will instruct us to copy a value and set it as an environment variable, and this allows the Terraform core executable to reconnect to that running session instead of just using what it has in this directory.
So we export this value, and if we run terraform plan again now, you will see that the debugger has jumped in and started to run, and it breaks inside the Read method. From here you can just use the regular debugging operations to debug your code and see what happens. In this case you can see that this is the read operation for file_two, and if we resume running the code, there will be another break for the read operation of file_one. While the provider is paused, the Terraform process hangs: it's actually waiting for the instructions that are returned from the provider. And if we resume the operation, it will continue and, in this case, again show that there are no changes. So that's how you debug operations. Use it; it's very powerful, and it's very easy to debug like that.
After we've created the code and debugged it and it seems to be working, add acceptance tests. They'll be used both for automatic tests during deploys inside GitHub Actions, and you can also run them locally to make sure your changes are valid and you didn't break anything. You can have automatic testing for resources, data sources, providers, and anything else you created. State checking inside the acceptance tests is built in: basically, you don't have to check that the state changes are done correctly, but if there is any change that you want to validate on the infrastructure side, you need to do that yourself inside the acceptance test.
You can run all the tests manually with make testacc, and the way you structure it is to have a resource-name_test file next to the resource, inside the same directory. So if we take a look at the code, you can see that the file resource has a file resource test file, and each function in it is actually a test. So you have multiple test functions, and there's a helper function here that defines the configuration that needs to be run. You can see there's a configuration block for the provider and another configuration block for a file resource of the filedata provider, with a name that gets passed in as a parameter and the lines formatted in here as well.
And this allows you to reuse the same kind of configuration block for different tests. You can see, for example, that in this test I passed file_one with one and two as the lines, and in the update test I passed the same file name but with two and three; this lets the same configuration method create different configuration files. You also define the checks each test has to pass. This one is just a check that tests that the resource attribute filename is indeed equal to file_one. This check is a bit more interesting, because it tests that the first value in lines is equal to two, as we defined here.
As I said, you can run the acceptance tests both as part of the automation and manually, by running make testacc. That will just run the acceptance tests, and you can run them again and again to make sure that everything you created and changed is indeed validated and doesn't break any other functionality.
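A sketch of what such a test can look like with the terraform-plugin-testing helpers (the helper and factory names are illustrative, following the scaffolding's conventions):

```go
package provider

import (
	"fmt"
	"testing"

	"github.com/hashicorp/terraform-plugin-testing/helper/resource"
)

// testAccFileConfig renders the same configuration block with a different
// file name and lines per test (assumes exactly two lines, matching the
// demo's list validator).
func testAccFileConfig(name string, lines []string) string {
	return fmt.Sprintf(`
provider "filedata" {
  base_path = "/tmp/filedata-test"
}

resource "filedata_file" "test" {
  filename = %[1]q
  lines    = [%[2]q, %[3]q]
}
`, name, lines[0], lines[1])
}

func TestAccFileResource(t *testing.T) {
	resource.Test(t, resource.TestCase{
		// testAccProtoV6ProviderFactories comes from the scaffolding's test setup
		ProtoV6ProviderFactories: testAccProtoV6ProviderFactories,
		Steps: []resource.TestStep{
			{
				Config: testAccFileConfig("file_one", []string{"one", "two"}),
				Check: resource.ComposeAggregateTestCheckFunc(
					resource.TestCheckResourceAttr("filedata_file.test", "filename", "file_one"),
					resource.TestCheckResourceAttr("filedata_file.test", "lines.0", "one"),
				),
			},
			{
				// update in place: same file name, different lines
				Config: testAccFileConfig("file_one", []string{"two", "three"}),
				Check:  resource.TestCheckResourceAttr("filedata_file.test", "lines.0", "two"),
			},
		},
	})
}
```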
After you finish creating your provider, you've debugged it, it all looks like it's working, and you've created acceptance tests that are all passing, you can publish it to the Terraform registry and allow other people to use your work. You first create a GPG key, and then set the repo secrets for the GPG private key and passphrase to your values. An important thing that took me some time to understand is that you need to create a tag named v plus the version, for example v0.1.0, to allow the automation to grab the changes. After you've created the tag and pushed it to GitHub, you can log into the Terraform registry and add the repo. For the initial load, the registry will read the current contents of the repo, but it will also set webhooks that push any new changes to the Terraform registry.
And new changes are new tags. So if you have a tag named v1 and then you push a new tag named v2, there's a default GitHub Action that will compile that into release artifacts, and then the Terraform registry will go and grab those artifacts and publish them. You can see in this example that the version is 0.1.0 and it's backed by a tag named v0.1.0.
So, to wrap it up: my journey with the provider started with no provider, just manual management scripts, manual procedures, and a lot of documentation in Confluence. I started by learning Go and the Terraform plugin framework. I then created code for the provider and released more features. In my case, I created a provider that defines users and sets inside the Aerospike database, something that wasn't available before. After the code was created, I published it to the Terraform registry, and people appear to be using it.
Thank you for your time today. If you have any questions, feel free to reach out to me, either by email, through my LinkedIn profile, or by opening an issue on the provider that's published under GitHub user hsafra. Thank you, and I hope it's been instructive for you.