Transcript
This transcript was autogenerated.
Hi everyone. Welcome to Terraform Secured by Open Policy Agent. Today we're going to talk about adding policy as code to your infrastructure as code. My name is Peter O'Neill. I am the community advocate for the Open Policy Agent project. I'm also a digital nomad and a contributor to open source projects. You can find me on pretty much all social media platforms as @peteroneilljr.
On today's agenda: first, we're going to discuss Terraform and infrastructure as code, how these tools came about and what they do for us right now. Then we're going to talk about bringing these concepts into GitOps best practices and putting them into your CI/CD pipeline. Next, we're going to talk about how to authorize the changes that are happening with Terraform in your CI/CD pipeline, and how best to control what happens next. After this, we're going to talk about decoupling these policies, or decisions, in the pipeline, so that you're no longer baking them into your services but instead have a standalone tool like Open Policy Agent handle this decision making. And lastly, we're going to talk about putting this whole process together: what does it look like to have a secure pipeline with Terraform, where terraform apply is secured by Open Policy Agent? To wrap everything up, we'll show a quick demo of how this all works with Terraform, OPA and GitHub Actions.
So to start, Terraform brought about the idea of defining your infrastructure as code. And with this idea came the concept of moving from an imperative style of programming language to a declarative one. To create a simple analogy here, let's talk about driving to a destination. You can hop in the car and drive yourself. Think of this as the imperative model, where you choose every twist and turn you need to take in order to get to the destination. Or you can use the declarative model, where you choose a destination and then call a taxi or a rideshare app, and they figure out the directions. You're no longer in control of the individual twists and turns; you just want to reach that end state, that destination. You're declaring, "I want to be at this destination." So imperative means deciding each twist and turn on how to get there, and declarative means just stating where you want to end up and letting the service handle the rest. So, using that same analogy
for driving, we're going to look at the three phases of deploying infrastructure as code. First, we have the coding phase, where we write the Terraform manifest files. You can think of this as choosing the destination we want to go to, or understanding the full picture of our infrastructure as we want it to look in the end state.
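To make the imperative versus declarative distinction concrete, here is a minimal Python sketch. It is a toy model, not Terraform itself, and the server names and the `reconcile` helper are hypothetical. The imperative function spells out every step, while the declarative version only states the desired end state and lets a generic engine work out the steps.

```python
# Toy model with hypothetical names; not how Terraform is implemented.

# Imperative: the caller spells out every step ("every twist and turn").
def imperative_setup(servers: set) -> set:
    servers.add("web-1")
    servers.add("web-2")
    servers.discard("old-web")
    return servers


# Declarative: the caller only states the end state ("the destination").
DESIRED = {"web-1", "web-2"}


def reconcile(current: set, desired: set) -> list:
    """Hypothetical engine: work out the steps that turn `current` into `desired`."""
    steps = [("create", name) for name in sorted(desired - current)]
    steps += [("delete", name) for name in sorted(current - desired)]
    return steps


print(reconcile({"old-web", "web-1"}, DESIRED))
# [('create', 'web-2'), ('delete', 'old-web')]
```

The same `DESIRED` value works no matter what the starting state is; only the computed steps change.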
Knowing the end state, or the destination, we can now generate a plan. This is very similar to seeing the GPS route on your smartphone before actually starting the drive or taking that rideshare: you see the route and each of the twists and turns. With Terraform, we see each of the resources we want to create or modify, and each of the changes that need to happen in order to reach the end state.
Once we have this plan and know the end state, we can approve the plan and execute it. During the apply phase, when we are executing the plan, if everything goes according to plan, there will be no errors and we'll end up exactly where we need to be. But if the infrastructure at apply time no longer matches what it was when the plan was made, this is where those errors will bubble up. When an error occurs, you can either roll back, or stop the apply where it is, fix the problems, and then continue creating the planned resources from the point you are at. While running terraform apply directly from your laptop may be acceptable in a dev or test environment, you will typically want something more robust once you are working in a shared team or production environment.
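The plan and apply phases can be modeled the same way. The sketch below is a toy in Python, not how Terraform is implemented, and all names in it are hypothetical: `plan` diffs the current state against the desired state, and `apply` executes the changes in order and stops at the first error (here, a resource that was removed out of band after the plan was made), returning the changes that were not applied.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Change:
    action: str               # "create", "update" or "delete"
    name: str
    config: Optional[dict]    # desired configuration; None for deletes


def plan(current: Dict[str, dict], desired: Dict[str, dict]) -> List[Change]:
    """Compare the current state with the desired end state and list the changes."""
    changes = []
    for name in sorted(desired):
        if name not in current:
            changes.append(Change("create", name, desired[name]))
        elif current[name] != desired[name]:
            changes.append(Change("update", name, desired[name]))
    for name in sorted(current.keys() - desired.keys()):
        changes.append(Change("delete", name, None))
    return changes


def apply(live: Dict[str, dict], changes: List[Change]) -> List[Change]:
    """Execute changes in order; on the first error, stop and return the rest."""
    for i, change in enumerate(changes):
        if change.action != "create" and change.name not in live:
            # The real infrastructure no longer matches what the plan assumed.
            return changes[i:]
        if change.action == "delete":
            del live[change.name]
        else:
            live[change.name] = change.config
    return []
```

After fixing the problem, running `plan` again against the live state and applying that new plan continues from the point where the previous apply stopped.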
So you will want to run terraform apply as part of a continuous delivery or continuous deployment model. With continuous delivery or continuous deployment, we want to use the same GitOps best practices on the infrastructure side that we already use on the dev side. These GitOps best practices are: having a single source of truth, which is a Git repo; defining what you want in a declarative language, which here is HashiCorp's HCL; versioning every change, so any time you want to make a modification, it goes through a PR that shows how the underlying infrastructure is changing; and automating the whole process, so that no manual changes happen to your infrastructure that would make its current state differ from the versioned state.
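One practical way to check the "no manual changes" rule is the `-detailed-exitcode` flag of `terraform plan`, which exits with 0 when there are no changes, 1 on error, and 2 when the plan contains changes. The Python wrapper below is a sketch with hypothetical function names. Note that a non-empty plan can mean either drift caused by manual changes or committed code that has not been applied yet.

```python
import subprocess

# Exit codes of `terraform plan -detailed-exitcode`.
NO_CHANGES, ERROR, CHANGES_PRESENT = 0, 1, 2


def describe(exit_code: int) -> str:
    """Translate a -detailed-exitcode result into a short status (hypothetical helper)."""
    if exit_code == NO_CHANGES:
        return "in sync with the versioned state"
    if exit_code == CHANGES_PRESENT:
        return "changes pending: live infrastructure differs from the configuration"
    return "plan failed"


def check_drift(workdir: str = ".") -> str:
    """Run a plan in `workdir`; needs terraform on PATH and an initialized directory."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir, capture_output=True, text=True,
    )
    return describe(result.returncode)
```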
With that, let's look at how this might work. On the left-hand side here is where you define your manifest files and push them to a repository with a Git commit or a PR. Once the code is in the repository, some automated testing needs to happen. Once all the testing has passed, the pipeline generates the plan, which then sits in a delivery state waiting to be approved and deployed. That is continuous delivery, with a manual authorization step. Alternatively, you can use a continuous deployment model, where you set up a more robust testing and policy suite in the middle, so that you can go straight from pushing your code to the repository to creating the underlying infrastructure. This is where policy becomes very important: it makes sure that the changes you intend happen the way you expect them to, so you don't end up with runaway resources or unintentional actors doing anything you wouldn't expect.
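The difference between the two models comes down to who makes the final call. In this hypothetical Python sketch, continuous delivery parks a plan that passed its checks until a person signs off, while continuous deployment lets the automated test and policy suite act as the only gate. The function and parameter names are illustrative only.

```python
def release(tests_passed: bool, policy_passed: bool,
            continuous_deployment: bool, human_approved: bool = False) -> str:
    """Decide what happens to a generated plan (illustrative pipeline logic)."""
    if not (tests_passed and policy_passed):
        return "rejected"                 # never apply a plan that failed a check
    if continuous_deployment:
        return "apply"                    # the automated suite is the only gate
    # Continuous delivery: the plan waits for a manual authorization step.
    return "apply" if human_approved else "waiting for approval"
```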
And speaking of unintentional or malicious actors: they shouldn't be the only consideration when thinking about authorization. Authorization should cover protecting your resources from any change, whether it is intentional or not, malicious or not. At this point, it's very common to store secrets in a secrets manager to protect yourself from unknown or malicious users. But even more important is being able to protect your resources from changes you don't expect. These can be unintentional changes or radical changes. You may make an unintentional change where you delete all of the tags from a set of resources without realizing it. Or you may make an unexpectedly radical change, where you create 1,000 containers instead of 100 because of a typo. These types of changes can happen, and they can have serious effects on your infrastructure if you don't have the proper guardrails in place to protect it. So, speaking of these guardrails,
let's talk about how to enact these guardrails with Open Policy Agent. Open Policy Agent, or OPA, is a general-purpose policy engine. It is general purpose because it works with any service, not just Terraform: it expects a query, or question, as a JSON blob, and it returns its response as JSON as well. That makes it a very flexible tool for evaluating policies, because you can now evaluate them separately from the service that needs the policy decision. This is a decoupled approach, where the policies are decoupled from your service. Removing the policies from the service itself gives you more fine-grained control over the specifics of the policy.
OPA comes paired with a dedicated policy language called Rego. Rego is purpose-built for defining policies. It is also declarative, much like Terraform's HCL, which lets you follow the same GitOps best practices we talked about earlier: storing these policies in Git, making sure they are versioned, and deploying them automatically once they are submitted. It's a nice complement: your policy as code sits next to your infrastructure as code and uses the same or similar deployment methods. With this model, you have full control over the policy development lifecycle. You can update and change your policies separately from the rest of your application and infrastructure, so any time you need to modify what a policy states, you can roll the change out without restarting services or recreating resources. You simply start enforcing the new policy from that point.
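Because OPA speaks JSON, a service can ask it for a decision over HTTP, and policies can be replaced on a running OPA without touching the service. The sketch below uses OPA's REST API through Python's standard library: `POST /v1/data/<path>` with an `{"input": ...}` body returns `{"result": ...}`, and `PUT /v1/policies/<id>` with the raw Rego source creates or replaces a policy module. The server address, the `terraform/analysis/authz` path and the policy id are assumptions for illustration; bundles are another common way to roll out policy updates.

```python
import json
import urllib.request

OPA_URL = "http://localhost:8181"  # assumption: OPA started locally with `opa run --server`


def decision_request(path: str, input_doc: dict) -> urllib.request.Request:
    """Build a Data API query: the question goes in the "input" key."""
    return urllib.request.Request(
        f"{OPA_URL}/v1/data/{path}",
        data=json.dumps({"input": input_doc}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def read_result(body: bytes):
    """The answer is JSON too; "result" is missing when the rule is undefined."""
    return json.loads(body).get("result")


def policy_update_request(policy_id: str, rego_source: str) -> urllib.request.Request:
    """Build a Policy API call that creates or replaces a module at runtime."""
    return urllib.request.Request(
        f"{OPA_URL}/v1/policies/{policy_id}",
        data=rego_source.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="PUT",
    )


def send(req: urllib.request.Request) -> bytes:
    """Perform the HTTP call; requires a reachable OPA server."""
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

A service would call something like `read_result(send(decision_request("terraform/analysis/authz", plan_json)))`, where `plan_json` is a hypothetical variable holding the JSON plan, and would never need to know what the policy says.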
All right, now let's bring it all together and talk about how we're going to secure your infrastructure with Open Policy Agent. After you define your manifest files as you normally would, you can now add an extra step where developers or operations folks use OPA on the command line to check that their manifest files are valid before shipping them out. This gives them a bit more confidence in their changes before they even submit them to a Git repository. Then you submit these manifest files in a PR or a commit, and OPA becomes part of your automated testing suite, where it is the crux of your policy enforcement. This is where OPA catches any resources that don't meet your organization's policy requirements before they actually get deployed to the underlying cloud services or hardware.
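As an illustration of the kind of checks that run at this stage, the sketch below flags the two surprises mentioned earlier, wiped tags and far more resources than expected, in the JSON form of a Terraform plan (the output of `terraform show -json`, where each entry in `resource_changes` carries `address`, `type` and a `change` with `actions`, `before` and `after`). It is written in Python for readability only; in the setup described in this talk, such rules would be written in Rego and evaluated by OPA. The 100-resource limit and the function name are hypothetical.

```python
from collections import Counter


def find_surprises(plan: dict, max_creates_per_type: int = 100) -> list:
    """Return human-readable problems found in a `terraform show -json` plan.

    Illustrative only; the limit of 100 new resources per type is hypothetical.
    """
    problems = []
    creates = Counter()
    for rc in plan.get("resource_changes", []):
        change = rc["change"]
        if "create" in change["actions"]:
            creates[rc["type"]] += 1
        if "update" in change["actions"]:
            # before/after may be null, and a resource may have no tags at all.
            before_tags = (change.get("before") or {}).get("tags") or {}
            after_tags = (change.get("after") or {}).get("tags") or {}
            if before_tags and not after_tags:
                problems.append(f"{rc['address']}: all tags removed")
    for rtype, count in creates.items():
        if count > max_creates_per_type:
            problems.append(f"{count} new {rtype} resources (limit {max_creates_per_type})")
    return problems
```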
With that, let's do a quick demo where I'll show how to run some authorization checks on a Terraform manifest file using OPA.
All right. In this demo we have three files. The first file is a Terraform manifest file. We're using the Amazon EC2 module to create EC2 instances, and we have two test cases. In the first test case we create 16 resources, or 16 instances; in the second we create three. Based on the policy we have, the first one will fail and the second one will pass. These are just standard AMI configurations. Now let's pop over to our Rego policy. In our Rego policy, we have set a blast radius of 30. What this means is that we give weighted values to the different actions Terraform can perform. A delete action is weighted at ten, a create action at two, and a modify action at one. Remember, in our first test case we're creating 16 EC2 instances at a weight of two each, which comes to 32, just above our designated blast radius. So that change will fail the check.
Underneath that, we can see the actual policy. By default, we're saying authorization is false, and we're using the name authz for it here. You can call this anything: it's just the name of a rule, essentially a variable, so it isn't defined by Rego itself. You could call it accept, deny, or anything else that works for your policy. Underneath that are the various rules that implement the actual scoring system.
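Here is the scoring logic just described, ported to Python so the arithmetic is easy to follow. It is a sketch, not the demo's Rego policy: the weights and the blast radius of 30 come from the talk, while weighting every resource type the same, counting both halves of a replacement (`["delete", "create"]`), and allowing only scores strictly below the radius are assumptions. The `authz` result starts out false, mirroring a Rego `default authz := false` rule.

```python
BLAST_RADIUS = 30
# Weights from the talk; Terraform's plan JSON calls a modification "update".
WEIGHTS = {"delete": 10, "create": 2, "update": 1}


def score(plan: dict) -> int:
    """Sum the weighted actions in a `terraform show -json` plan (all types weighted alike)."""
    total = 0
    for rc in plan.get("resource_changes", []):
        for action in rc["change"]["actions"]:
            total += WEIGHTS.get(action, 0)  # "no-op" and "read" add nothing
    return total


def authz(plan: dict) -> bool:
    allowed = False                      # default deny, like `default authz := false`
    if score(plan) < BLAST_RADIUS:       # assumption: strictly below the radius passes
        allowed = True
    return allowed


def creates(n: int) -> dict:
    """Build a minimal fake plan with n resources to create."""
    return {"resource_changes": [{"change": {"actions": ["create"]}} for _ in range(n)]}


print(score(creates(16)), authz(creates(16)))  # 32 False
print(score(creates(3)), authz(creates(3)))    # 6 True
```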
Finally, our last file is the GitHub Actions workflow file, which runs once we push the code to GitHub. It checks out the code, installs Terraform, installs OPA, and then runs terraform fmt, init, validate, and plan.
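The workflow itself is a GitHub Actions YAML file. As a rough stand-in, the Python sketch below runs the same sequence of commands, stopping at the first failing step the way a CI job does, then converts the binary plan to JSON with `terraform show -json` and hands it to OPA. The file names, the `policy/` directory and the `data.terraform.analysis.authz` query are assumptions. `opa eval --fail` exits non-zero when the query result is undefined, and the query `... == true` is undefined whenever `authz` is false, so a denied plan fails the job. The runner and file writer are injectable so the logic can be exercised without Terraform or OPA installed.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence

PLAN_BIN = "tfplan.binary"    # hypothetical file names
PLAN_JSON = "tfplan.json"
POLICY_DIR = "policy/"        # assumption: Rego files live here

CHECKS: List[List[str]] = [
    ["terraform", "fmt", "-check"],
    ["terraform", "init", "-input=false"],
    ["terraform", "validate"],
    ["terraform", "plan", "-input=false", f"-out={PLAN_BIN}"],
]


def run_cmd(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    return subprocess.run(list(cmd), capture_output=True, text=True)


def write_file(path: str, text: str) -> None:
    Path(path).write_text(text)


def run_pipeline(run: Callable = run_cmd, write: Callable = write_file) -> int:
    """Return 0 if every step passes, else the exit code of the first failure."""
    for cmd in CHECKS:
        code = run(cmd).returncode
        if code != 0:
            return code
    # Convert the binary plan into the JSON document OPA evaluates.
    shown = run(["terraform", "show", "-json", PLAN_BIN])
    if shown.returncode != 0:
        return shown.returncode
    write(PLAN_JSON, shown.stdout)
    # Exits non-zero unless authz evaluates to true (query path is an assumption).
    gate = ["opa", "eval", "--fail", "--data", POLICY_DIR, "--input", PLAN_JSON,
            "data.terraform.analysis.authz == true"]
    return run(gate).returncode
```

Ending a script with `raise SystemExit(run_pipeline())` would pass the first failing exit code on to the CI job.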
It then converts the Terraform plan, which comes out as a binary file, into a JSON file for OPA, and gives that JSON file to OPA to evaluate. So with that, let's go back and look at our Terraform manifest file one more time. Let's say we're a good dev and want to check this ourselves beforehand. We run terraform plan to create the binary plan file and convert it to JSON, and we see that we are creating those 16 resources. Now let's get the blast radius score. We do that by running opa eval against the Terraform plan, using the Rego policy we defined in the policy folder (the Rego file we just looked at), and querying the rule named score. That gives us the actual value of 32, showing that the weighted value of this change is 32, which we know will fail this evaluation.
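To script that local check, note that `opa eval --format json` prints the query result as JSON, with the value nested under `result[0].expressions[0].value`; when the rule is undefined, the output has no `result` key. The helper below is a hypothetical sketch that builds the command, assuming a `policy/` folder, a `tfplan.json` file and a `terraform.analysis` package, and extracts that value.

```python
import json
import subprocess
from typing import Any, List


def score_command(policy_dir: str = "policy/", plan_json: str = "tfplan.json") -> List[str]:
    """Build the `opa eval` call for the `score` rule (paths and package are assumptions)."""
    return ["opa", "eval", "--format", "json", "--data", policy_dir,
            "--input", plan_json, "data.terraform.analysis.score"]


def parse_value(stdout: str) -> Any:
    """Extract the single expression value from `opa eval --format json` output."""
    results = json.loads(stdout).get("result", [])
    if not results:
        return None  # the rule was undefined for this input
    return results[0]["expressions"][0]["value"]


def local_score() -> Any:
    """Run the check locally; needs opa on PATH and the plan already converted to JSON."""
    out = subprocess.run(score_command(), capture_output=True, text=True, check=True)
    return parse_value(out.stdout)
```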
Now let's say we did not run this check, didn't know it was going to create this many resources, and submitted it anyway. We'll commit it with the message "blast radius 32" and send that off. Over in our browser, we should see "blast radius 32" pop up. This takes a little time, because GitHub has to set up the container, install Terraform, install OPA, run the Terraform formatting commands, create the plan as we described, and convert it to JSON. Then the plan is handed to OPA for authorization, and we can see that this step exited with a status code of one. So now let's go back and modify the manifest file to create just three resources.
All right, back in our code editor, let's comment out this one, the 16-instance test case. Let's check this locally one more time: create that same output file and get the score. The score is now six, as expected. Then let's run the local eval and make sure it isn't returning anything. All right, everything is as expected, so let's submit it with the message "blast radius six" and push those changes. Back in the browser, we can see it spinning up a new container, so we have to wait a little bit. This happens every single time: it runs all the same checks, sets up Terraform, installs OPA, and runs the format commands. Down at the bottom, we have terraform validate and terraform plan, and we can see the plan for only these three resources. Then comes the authorization step, and we can see that the job completed and everything is green.
So now we can hand this off to our continuous delivery or continuous deployment system to finish creating the resources. Cool. With that, I'll end the demo and head back to the slideshow.
In the slideshow, you'll see a couple of links to help you get started with OPA: one on using the opa exec command, and one on Conftest, another tool you could use, which is built on OPA and Rego and does the same sort of manifest validation. We also have an integration with AWS CloudFormation Hooks, if you are an AWS shop. On the right side, you can see a couple of resources: the Styra Academy, the OPA docs, and Styra DAS Free, to get some hands-on experience with OPA. And lastly, the link here is the link to this demo. So with that, thank you for joining, and I look forward to connecting with everyone. I hope you have a great rest of your conference.