Transcript
This transcript was autogenerated. To make changes, submit a PR.
All right, another session.
This time we will talk a little bit about infrastructure as code,
though not about infrastructure as code itself, but about
whether it is secure and, if not, what we can do to
make it a little bit more secure, which is
quite important, I believe. Okay,
so let's get started.
My name is Paolo Pivots and I am a lead system engineer
at IPAM, Poland. I have worked here for the last
two and a half years, almost three. Also, I am a DevOps
Institute Ambassador and an AWS Community Builder.
So we will have a little presentation today.
Part of it will be in AWS because, well,
you will see why. All right, so first
of all, we will talk about security
in Terraform templates. So what is Terraform?
I strongly believe that all of you know already,
but just to be sure, it is an
infrastructure as code tool, right? So very basic
information. Right? So what is infrastructure as code then? Right.
So infrastructure as code is a fast,
dynamic, programmable way to deploy hidden misconfigurations
everywhere. Yep.
And when we start to think about infrastructure as code in this way,
a new world opens, because very
often we use infrastructure as code just like this. Right?
We are writing something. Deploy, rewrite, deploy,
rewrite, deploy, rewrite, deploy. And after two days,
finally our template is ready.
I've been there. I know that it's,
let's say, not unusual to see
something like that, but we have possibilities
to work with it a little better.
So in terms of security,
what do we deal with? For
sure, you remember SolarWinds, almost two years ago,
right? Let me go through it a
little bit, to give you a little
bit more insight into
what happened there. So the breach
happened at a company which has like 300,000
customers, right? And 30,000 used
Orion, the application which, let's say, was
infected. This infected
version was downloaded 80,000
times.
80,000. More than ten government institutions
in the US were affected by this
version. And to say even
more, Microsoft, Nvidia, Palo Alto, which is interesting
because they also work with security, on a different level,
but yes, and VMware, just to list a
few of them. And Orion was the first.
And like half a year or one year later, they had another
issue. Right? So now
you can say, all right, but it was not about infrastructure as
code. Right? Okay, I agree. But this
is what we deal with and this is
one of the elements of the puzzle. So let's look at
a few numbers first. And those
numbers are collected from the IBM and Ponemon Institute report from,
well, almost two years ago right now. So it's for 2020.
So the average cost of each breach is almost 4 million dollars.
Quite a lot, right? All right, IBM and Ponemon work with big
customers or big organizations.
But anyway,
this will be interesting, because the average time to identify and contain
the breach
is 280 days,
of which 207 days on average
is to identify the problem and then more
than 70 to contain the problem, so to fix
the problem. IBM and Ponemon
identify the percentage of breaches
by different areas. And with cloud misconfigurations,
we are almost in infrastructure as code.
So breaches caused by cloud misconfiguration,
in their report, were 19%.
So one fifth of all breaches are because
of cloud misconfiguration.
It starts to be quite scary,
right? So what kind of cloud misconfigurations
did we have, and breaches because of them? So in
2019, Imperva had a breach
where they lost
customer records like API keys, TLS certificates,
et cetera, et cetera, because of network misconfiguration,
hard-coded API keys and unencrypted records in the
database.
Imperva identified this issue after ten months.
Another example of misconfigurations:
Capital One. Also the same year,
more than 100 million records exposed,
bank accounts, social security numbers of their customers, et cetera.
Why did it happen? Because there was
misconfiguration in IAM policies and unencrypted
storage. Those reports are obviously available
on the Internet, so you can search for them.
There are a lot more of them, to give you
an impression of what we have.
So what kinds of cloud misconfiguration can we
have? Really two types.
One is cloud misconfiguration itself, when we are
creating some resources or we are updating resources,
et cetera. And the second, very important
and very tricky
really, is configuration drift.
So I will talk a little bit about drifts in
some time, but right now we need to know that drifts
are definitely more dangerous
because they are not monitored.
All right. Okay. So what is drift, in fact?
So drift, as I said, is an unmonitored,
undocumented change in the configuration, done manually
most of the time, right? In most cases. Because if we are
talking about a change done by infrastructure
as code templates or CDK or whatever, it
is monitored, documented somehow. Right.
But this change is done manually by someone,
no one knows when, where, why.
Right. And a very sad
fact: 90% of organizations
allow users to make changes without a proper
process. And by a proper
process I mean even going so far that
you disallow or prohibit,
let's use the stronger
word, prohibit access to
business accounts for users,
okay? I mean access to the console,
to play with resources on the account.
And this is according to the State
of DevSecOps by Accurics. Formerly, because they were acquired
by Tenable. It's also for 2020.
So this fact is very sad and quite unbelievable,
right? So we are talking here also about financial
companies, about healthcare companies, where this information should
be very secure. So if you are scared enough,
we didn't finish yet. So now we are going to the
new report from this year and
it is interesting because Sonatype did
a report where they talked with 300 professionals
and they have very interesting information there.
So 36% of professionals suffered a
serious breach because of cloud misconfiguration.
So of course here we are talking about a different group of
people. That's obvious. So that's why
we have different numbers. But I hoped
it would be lower. But it's not, right?
Very interesting is that almost
50% of teams had more than 50 misconfigurations
per day.
50 misconfigurations. More than 50 misconfigurations per
day. I say wow,
right?
It's hard to imagine really.
So what kind of misconfigurations do they
find? So the most important, the most popular
one is misconfiguration of IAM.
In all clouds, IAM is
the core of everything.
It's not very clear,
not very well understood by people,
how to work with it. But the main goal
which we should look toward and try
to achieve is to have as few
permissions as possible, only what is required,
and go with policies like
one to one. So one resource has access to
one other resource, and if this resource needs
access to a similar resource, but another one,
it means another policy. Right? So we
are decoupling the policies
here. Yes, at the end of the day
we will have a lot of policies. There will be a total mess, total chaos.
But first of all, we are working with them through infrastructure as code,
right?
And we have proper naming and proper tagging.
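The one-to-one idea described here can be sketched in a few lines of Python. This is only an illustration, not the speaker's code: the helper name `one_to_one_policy`, the account number and the table names are hypothetical, and the sketch simply refuses wildcards so that each policy names exactly one target resource.

```python
import json


def one_to_one_policy(actions, resource_arn):
    """Build an IAM policy document that allows `actions` on exactly one resource."""
    # Wildcards in the resource or in the actions defeat the one-to-one idea.
    if "*" in resource_arn or any("*" in action for action in actions):
        raise ValueError("wildcards are not allowed in a one-to-one policy")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": resource_arn,
            }
        ],
    }


# Hypothetical ARNs: a second, similar table gets its own policy
# instead of being added to the first one.
orders_policy = one_to_one_policy(
    ["dynamodb:GetItem"],
    "arn:aws:dynamodb:eu-west-1:111122223333:table/orders",
)
invoices_policy = one_to_one_policy(
    ["dynamodb:GetItem"],
    "arn:aws:dynamodb:eu-west-1:111122223333:table/invoices",
)

print(json.dumps(orders_policy, indent=2))
```

The result is many small policies, which is exactly the "total mess" mentioned above, and the reason naming and tagging matter.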
This is also important. And the second biggest
percentage, obviously, is security groups. It's very tempting, very easy
and very common: all right, I need to test something,
I will just add my IP
to SSH. Well, I don't know what my IP is,
I will just put 0.0.0.0/0. Done.
Right.
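The "just open SSH to everyone" shortcut is easy to detect mechanically. The sketch below is a hypothetical check, not taken from any of the scanners in this talk; it assumes ingress rules shaped like a Terraform `aws_security_group` ingress block (`protocol`, `from_port`, `to_port`, `cidr_blocks`).

```python
import ipaddress


def exposes_port_to_world(rule, port=22):
    """Return True if an ingress rule opens `port` to every address."""
    if rule.get("protocol") == "-1":
        # In AWS, protocol "-1" means all protocols and all ports.
        covers_port = True
    else:
        covers_port = rule["from_port"] <= port <= rule["to_port"]
    # A prefix length of 0 (0.0.0.0/0 or ::/0) matches the whole Internet.
    open_cidr = any(
        ipaddress.ip_network(cidr).prefixlen == 0
        for cidr in rule.get("cidr_blocks", [])
    )
    return covers_port and open_cidr


quick_test_rule = {
    "protocol": "tcp",
    "from_port": 22,
    "to_port": 22,
    "cidr_blocks": ["0.0.0.0/0"],
}
print(exposes_port_to_world(quick_test_rule))
```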
So these elements are
quite important. So now,
how are those teams, those people, catching the issues? And this
is also interesting, because 33% of them
do manual checks before deployments. And by manual
I mean they are looking at the templates they have. For example,
I don't know, code review or whatever, they are reading the templates
and they can miss stuff, obviously.
27% of them do post-deployment checks,
which is really
risky because this is already deployed.
All right, so we know what kind of
issues people have, we know how they
catch and try to fix the issues, so how much time do they spend on
it? And this is also quite interesting and
I'd like to focus your attention
here. So four percent of these 300
people said that they spend more than 500
hours per week on the issues,
finding them and fixing them. It means that twelve
people from these 300 said that. And 500 hours per week
means about twelve people spending 100% of their time
working on infrastructure as code security issues.
That is work for twelve people.
Work for twelve people only for this.
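The two "twelve" figures come from different calculations, which a few lines of arithmetic make explicit. The 40-hour working week is an assumption added here for illustration; the other numbers are the ones quoted in the talk.

```python
respondents = 300
share_reporting = 0.04    # four percent of respondents, as quoted
hours_per_week = 500      # "more than 500 hours per week"
full_time_week = 40       # assumption: a 40-hour working week

# How many of the 300 respondents reported this workload.
people_who_said_it = round(respondents * share_reporting)

# How many full-time people 500 hours per week corresponds to.
full_time_equivalents = hours_per_week / full_time_week

print(people_who_said_it)       # 12
print(full_time_equivalents)    # 12.5
```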
It sounds scary,
but we
are lucky here. We have helping hands.
And by helping hands I mean the
processes, the approaches, like for example shift left.
What does it mean? This is a paradigm
not only for software, which is obvious, right? Everyone is
saying shift left in software creation, et cetera, et cetera. But it works
also for infrastructure.
Thanks to that, we get very fast feedback that something is wrong.
Tools like SAST can also be
relevant for infrastructure, so these static analysis tools.
And again, like with proper development,
everyone is responsible for security, right?
We can implement Open Policy Agent or any software
which uses this approach.
And this is a project under the
CNCF. So quite mature and quite
stable. I mean stable from
the perspective of a big organization. It's not something which is
ephemeral, here right now and tomorrow
gone. No, this is a policy engine
that automates and unifies the implementation of
policies across the environments and allows
us to enforce, monitor and remediate policies
across all environments and resources, right?
And this is the one thing which I want you to remember.
All right, what should we do? We should control
everything before we deploy. And here I'm
saying that if you have infrastructure as code and you deploy
it first on dev, then QA, then preprod, then RC,
then prod, this dev environment,
the first in the chain, is important too.
The importance of this environment is exactly the
same, from the security perspective, as production. I know
how it can sound.
But let me ask you something.
Would you like to open a bank account
in a bank which recently had an exposure
of its dev and QA environments?
Because I would not. So if
a template is deployed, it is already too late.
Let's put it in our heads
like the daily scrum. We
always have the daily scrum at 10:00 a.m. in our team. In the
same way: if we deploy templates, it is already too late.
All right, we should use dedicated tools to prevent
deployments with misconfigured resources
and use CI/CD pipelines for it. Why is this
important? Because a CI/CD pipeline can, first of
all, have those tools implemented and then stop, fail,
break the pipeline if something is wrong.
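A pipeline step that breaks the build can be as small as the sketch below. It is an illustration under stated assumptions, not the pipeline shown in the talk: it assumes the three scanners are already installed and on `PATH`, it uses only their default invocations, and the names `SCANNERS`, `run_gate` and `main` are made up for this example.

```python
import subprocess
import sys

# Default invocations of the three scanners; each is assumed to be on PATH.
SCANNERS = {
    "checkov": ["checkov", "-d", "."],
    "tfsec": ["tfsec", "."],
    "terrascan": ["terrascan", "scan"],
}


def run_gate(scanners, cwd="."):
    """Run every scanner and return the names of those that exited non-zero."""
    failed = []
    for name, command in scanners.items():
        result = subprocess.run(command, cwd=cwd)
        if result.returncode != 0:
            failed.append(name)
    return failed


def main():
    """Exit with a non-zero status so the CI/CD job stops before deployment."""
    failed = run_gate(SCANNERS)
    if failed:
        print("blocking deployment, failing scanners:", ", ".join(failed))
        sys.exit(1)
    print("all scanners passed")
```

Calling `main()` as the last step before the deploy stage is enough: most CI/CD systems treat a non-zero exit status as a failed job.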
So we have a lot of tools around,
to name a few: Checkov, right? The only
one which is written in Python here.
Terrascan and tfsec, both written in Go.
Another one, cfn_nag, I think this is in
Python as well, but I don't remember at this moment. So cfn_nag
is especially for CloudFormation, right? And Snyk,
for example. So all of them, except cfn_nag,
work with Terraform,
with Kubernetes,
and with CloudFormation, except Terrascan and tfsec,
right? Obviously, because Terraform is even part of
the name. All right, so also
we have tools from Accurics which are quite a bit
more advanced right now. They were acquired
by Tenable. So I'd like to demo a few
of them, three of them really. It will be Terrascan
by Accurics, Checkov by Bridgecrew and
tfsec by Aqua at this moment. Right?
So this is the GitHub for Terrascan,
for Checkov and for tfsec,
and you can implement them in your own pipelines.
So let's see what they
look like. I have a very
fresh installation of,
sorry, a very fresh Vagrant
machine, because I don't need anything more
here. So first,
let's install those tools. With Checkov
it is quite easy, right? So what we need to
do is pip install checkov.
Probably I don't have pip here. Yes, exactly.
First I need to do apt-get update.
So we will spend a few seconds here on this,
or even more, let's see.
So as I said, Checkov is only one of the tools which we
will test here, which is interesting.
All right,
this is quite interesting at this moment,
but don't worry, I have a backup in mind.
Okay, let's try, maybe I'll stop it and.
Oh, all right.
Yep.
So I have an
issue here. So no worries,
let's do it differently.
Right here I have my AWS
console. We will go to EC2.
I will create an instance very
quickly.
Let's use Amazon Linux for it.
I need to use the default
settings. I don't need anything else. All right.
Okay, this is all right. At this moment,
for the security group, I will say this: I don't care at
this moment. All right, please do not
do that. I try to do
this. I'm doing
this right now only for presentation purposes.
Okay, create a new key pair.
Okay, let's call it presentation.
Download. All right, it is downloaded.
Okay, so my instance is now launching.
Something wrong happened here with this Ubuntu,
so we don't care about that.
Let's try to connect.
Okay,
it was presentation.pem, right?
And now what I need to have is the IP
of this machine and
it should work. Please work for me. Yeah,
perfect. Okay, so we are going to become
root and we try to install Checkov.
Okay, pip is not found. All right,
so what I need to do is to install pip.
I don't remember right now what
it is called here, in this Linux,
in Red Hat. So, I know already.
Okay.
All right, got it.
Default problem.
Yeah, of course I should install it as
a user, but here, only for this presentation,
it's okay to have this.
Okay, so now I should have Checkov
installed. Let's try, maybe I need to
do this simply like that. Looks
like not. All right,
checkov not found.
So let me just check where it can be,
because it should be installed really in
our, all right,
let's come back to this a little bit later. Right now we will install it
how I did it. Okay, pip install checkov.
So it should be here. Let me just repeat it,
just,
and now let's try to do maybe the same thing here, just
to speed things up.
Oh yeah, I have it.
Right, so Checkov is installed. Right. So now what
I want to do is to install
Terrascan. Okay, so this
is the tarball which I
need to install. First download it,
obviously. Right, I have it. So now let
me unpack it.
It's
okay. What happened here?
Let's try maybe with the name.
Okay, I have some issue here again.
For some reason things do not
want to work for me today.
So let me try to do this again.
All right, let's check. I have it.
All right, I have it. So let me try to,
yeah, probably some typo or things
like that. All right, let's remove the bundle.
Okay, now what I need to do is to
install terrascan into
/usr/local/bin,
with install, of course.
All right, and now I can remove it from here. All right,
so now let's check if I have it. Yes,
version, without
dashes, just version. Okay, version 1.12
is installed. All right, so now the last tool,
tfsec,
let me copy it.
Okay, I have this one. Yes.
And as you can see right now, this one
is downloaded not as a package, but just as an
executable, or almost executable. So first
let me just rename it to tfsec
to have a clearer name.
We need to do chmod for it
and obviously we need to install tfsec
into
/usr/local/bin and
remove it from here. Let's check.
Yeah, we have it, with its version.
Okay, so what we will do now: let
me check if I have git.
I do not have it, so let me just quickly install
git. What I will do, I will clone
the Accurics test templates
and we will see how the tools work on
them. Okay,
why these templates? Because they prepared
quite interesting and
problematic templates. All right, so let's go to the terraform
directory. And we are here. So how to
execute Checkov? It's very simple. We need to
put checkov -d, which means directory,
obviously, and let it be our current directory.
Bam. We need to wait a second or two
and this is the effect. All right, so as you can see, a lot of
fails, and
the organization is
clear. You have all the information here: what failed,
what resource, what file, in what lines,
from where this file was called, what
the code is here. And also the guide, right? This guide means
some information on how you can fix it.
And on the top, on the top,
there's a lot of it, as you can see. Come on.
On the top there is a summary, though a summary should
be at the end. Let me do this differently.
All right, so as you can see here, passed checks is
44, failed checks is 58
and nothing was skipped. Skipped means that you
can say that
specific checks need to be skipped.
You don't want to run them because, for example, Checkov has
a check which is checking
the description of your security group.
Maybe it may be an issue for
someone that this is not described properly,
I mean in AWS, but for
sure it's not a security issue, right? So here,
for example, when I have my security group
for this instance,
let me show it to you. There, you see this description
field. So generally we can say what is the
reason for having this specific record, right? So this is what Checkov
alerted as a problem. Okay,
so this is Checkov. Quite nice.
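Skipping a check and reading the summary can both be scripted. The sketch below builds a Checkov command line with `-o json` and `--skip-check`, and totals the `passed`, `failed` and `skipped` counters from the JSON report. Two things are assumptions to verify against your Checkov version: the check ID `CKV_AWS_23` for the security group description check, and the exact layout of the `summary` object. The helper names are hypothetical.

```python
import json


def checkov_command(directory, skip_checks=()):
    """Build the Checkov command line for a directory scan with JSON output."""
    command = ["checkov", "-d", directory, "-o", "json"]
    if skip_checks:
        command += ["--skip-check", ",".join(skip_checks)]
    return command


def summarize(report_text):
    """Total the passed/failed/skipped counters from a Checkov JSON report."""
    report = json.loads(report_text)
    # Assumption: Checkov emits one object, or a list with one object per framework.
    reports = report if isinstance(report, list) else [report]
    totals = {"passed": 0, "failed": 0, "skipped": 0}
    for item in reports:
        summary = item.get("summary", {})
        for key in totals:
            totals[key] += summary.get(key, 0)
    return totals


# Assumed ID of the "security group rule has a description" check.
print(checkov_command(".", skip_checks=["CKV_AWS_23"]))

# Sample report using the numbers from the demo run.
sample = '{"summary": {"passed": 44, "failed": 58, "skipped": 0}}'
print(summarize(sample))
```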
Let's try with TF scan,
I mean tfsec. So here the syntax is a little bit different
and you even have information about disk
I/O, about what was
the time for evaluation, running, et cetera, et cetera,
how many files were checked, loaded modules, and
the results. So we found ten criticals, ten high,
then medium, low and ignored. As you can see there are fewer, so
tfsec found fewer issues, but it
doesn't mean it is a worse
tool than Checkov. Checkov is finding different
issues, right? There is a common part,
issues that all three tools are catching.
But I did quite extensive tests
of all of them, and the results
said that I should have all three of them in my pipeline to be
like 90% sure of what I'm doing, right?
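The point about overlapping but different findings can be pictured with sets. The findings below are invented purely for illustration (they are not results from the demo); the sketch shows the common part caught by all three tools, the union, and how much of the union each tool covers on its own.

```python
# Hypothetical findings, keyed by (resource, problem). Not real scan results.
findings = {
    "checkov": {
        ("aws_s3_bucket.logs", "unencrypted"),
        ("aws_security_group.web", "open-ssh"),
        ("aws_security_group.web", "no-description"),
    },
    "tfsec": {
        ("aws_s3_bucket.logs", "unencrypted"),
        ("aws_security_group.web", "open-ssh"),
        ("aws_db_instance.main", "public"),
    },
    "terrascan": {
        ("aws_s3_bucket.logs", "unencrypted"),
        ("aws_iam_policy.admin", "wildcard"),
    },
}

# Issues that every tool reports.
common = set.intersection(*findings.values())

# Everything found by at least one tool.
everything = set.union(*findings.values())

# Share of all known issues that each tool would catch alone.
coverage = {tool: len(found) / len(everything) for tool, found in findings.items()}

print(len(common), len(everything))
print(coverage)
```

With data like this no single tool reaches full coverage, which is the argument for running all three in the pipeline.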
So not quite a
perfect situation. All right, so this was tfsec.
How to do it with Terrascan? Right now
we are typing something else, terrascan scan, and we
are scanning, and Terrascan takes the most
resources,
takes the most time. So it's the
longest run, even if it wasn't
shown here. Maybe they improved something, but like
one year ago this time was definitely
the longest, because Terrascan had an
API implemented inside. So you were
able to run Terrascan as a separate tool which
is available all the time, like SonarQube,
and just execute the API
call from the pipeline, which is very useful
because you don't need either to prepare a proper
image for your pipelines or to download the tool all the time.
As you can see we have 20 high, 13
medium and five low. 38 policies are
violated. So this is how
they work with the most default options.
All right, I don't want to show you
definitely more complicated options,
for a very simple reason: I strongly believe that
beauty lies in simplicity. So if
something works by default, out of
the box, that is a perfect tool for me. So this
is how I run them just
on my Linux machine. So how does
it look in the pipeline? Maybe
first I will show you the template
of the pipeline which I created, because I've created
my pipeline using CloudFormation.
So this is the template, very simple.
I didn't pay too much attention to the roles, but I
need to have a few roles in order to have my
pipeline, AWS CodePipeline, and I
have four blocks inside. One is mandatory,
to pull the source. I'm doing this using
my own repository in
CodeCommit, and then I have my tests, where
one block is for Checkov,
the second is for
tfsec and the third is for Terrascan.
And here the configuration follows
a general approach to
doing things, right? So I have my phases. During
the install phase I'm installing Checkov and
creating the directory for reports. Then during the build
I do the proper test, and as you can see here
I stream the output
into a file. Why, you
will see in a moment. The same, very much the
same situation, of course the installation process is different, is for
tfsec, and also I have the possibility
to send
the output in JUnit format
into a file. And the same for Terrascan,
I'm streaming it to the Terrascan XML file.
Right now all three tools allow
us to use JUnit to
have proper reporting, which is really perfect,
because when we go to
CodeBuild, and I mean more like CodePipeline,
we have our pipeline here, which has failed, and it is
on purpose. So let me
just release this pipeline again right
now.
The mandatory block will pull the code
from CodeCommit. Yes, it did it.
So let me
open all of those scanners.
As you can see, I'm executing them
at the same time. In fact it's not always exactly the same time,
but generally the idea is to run them at the same time, and
then of course I can have further steps. But for this
presentation this is what I need really.
So as you can see there is a lot.
It's generally the same as what happened in the
normal execution, right?
So as you can see right now, this command
was executed and of course it failed.
And this is also very important: all three tools can fail
your pipeline. So this is very
good, let's say. So I told you that
I send this to this file, to this report, and
also I publish this report. So here
we can see what the report from Checkov
looks like in CodePipeline or CodeBuild. We
see that 20 checks
failed, 84 passed. This is
for a different repository, not the one from Accurics,
another one, and
we have the full information about what failed,
really. Okay, so this is Checkov. Let's see what
we have for tfsec.
Download,
preparation, and then we have the execution.
tfsec is the only tool which is much
happier if Terraform is installed.
It can work without it, but it's definitely
happier. So again we have the reporting,
let's see what we have in the report.
All right. Eleven tests failed, we don't know how many
passed, unfortunately. And again we have the information
there about everything. What about
Terrascan?
Again, download, installation, and
as you can expect, it failed.
Perfect. What do we have in the report?
Right, 14 checks failed,
16 passed. So first of
all, only tfsec doesn't show how many
checks passed, right? It can be
useful information if you build metrics on it.
On this report it may be interesting how
many checks over time, how many more checks over time
you have and how many of them failed.
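Building such metrics from the JUnit files is straightforward with the Python standard library. The sketch below counts passed and failed test cases in a JUnit-style XML report. The sample XML is invented for illustration, and real reports from the three tools may differ in detail; a report that only lists failed checks, as described for tfsec, would simply yield zero passed.

```python
import xml.etree.ElementTree as ET


def junit_counts(xml_text):
    """Count passed and failed test cases in a JUnit-style XML report."""
    root = ET.fromstring(xml_text)
    passed = 0
    failed = 0
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed += 1
        elif case.find("skipped") is None:
            passed += 1
    return {"passed": passed, "failed": failed}


# Invented sample report: two passing checks and one failing check.
sample_report = """
<testsuites>
  <testsuite name="terraform scan">
    <testcase name="bucket is encrypted"/>
    <testcase name="ssh is not open to the world">
      <failure message="0.0.0.0/0 allowed on port 22"/>
    </testcase>
    <testcase name="logging is enabled"/>
  </testsuite>
</testsuites>
"""

print(junit_counts(sample_report))
```

Storing these two numbers per pipeline run is enough to plot how the number of checks and failures changes over time.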
So I
prefer this option here, not this
one. And this one is also perfect. And so
as you can see, 20 and 84,
11,
14 and 16. So different
policies, different checks, different findings.
Which one will be best for you?
It's up to you. It's really up to you.
You should go through all of them, check them,
look how they work, right?
And then decide
what to use. Right?
So this was the presentation,
but we didn't talk about one
important element, not too much, about drifts.
Right? So what about drifts?
The challenge here with drifts is that
drifts have to be controlled continuously,
and there is always someone who has elevated privileges,
right? So even if I say that we should prohibit it,
there is always root, who can do things, and
we need to remember that. And the
available tools do not always cover all possible
changes. The best example here is the drift detection from
CloudFormation, which catches a lot of
stuff, but definitely
not all of it. Even from the list of
misconfigurations, you remember the percentages from this
list, the detection has problems. And,
important and very unpopular: management
must understand that if they say to you
it must be done now, it means a lot of risk.
What tools can we use for drift detection? driftctl,
acquired by Snyk recently, kubediff for
Kubernetes, SaaS offerings
like Bridgecrew, Accurics. They have offerings,
paid offerings of course, for continuous monitoring
of your templates, of your misconfigurations
and your drifts.
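The core idea behind drift detection, comparing what the template declares with what actually exists, can be sketched as a dictionary comparison. This is a conceptual illustration only and not how any of the tools named above are implemented; the attribute names and the `detect_drift` helper are hypothetical.

```python
ABSENT = "<absent>"


def detect_drift(declared, actual):
    """Return {attribute: (declared value, actual value)} for every difference."""
    drift = {}
    for key in sorted(declared.keys() | actual.keys()):
        want = declared.get(key, ABSENT)
        have = actual.get(key, ABSENT)
        if want != have:
            drift[key] = (want, have)
    return drift


# What the infrastructure as code template says.
declared = {"ssh_cidr": "10.0.0.0/16", "encrypted": True}

# What is deployed after someone changed things by hand in the console.
actual = {"ssh_cidr": "0.0.0.0/0", "encrypted": True, "public_ip": True}

print(detect_drift(declared, actual))
```

Run continuously, a comparison like this surfaces the manual, undocumented changes that no template review would ever see.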
So we should go to a new level,
beyond just scanners.
We should have everything as code. And it means that with those newest
offerings we are able to create it, to create automated control and
remediation processes.
And security must be implemented at the earliest possible stage,
for infrastructure, for everything,
right? And everything as code means here policy
as code, means security as code, means
drift as code with those offerings, right, with those
paid offerings, and remediation as code.
And what I'd like you to have
as a takeaway from this session is: please do
not pretend that you have security, because every
possible misconfiguration will be exploited. Now,
tomorrow, in one week, but it will be, right?
A few years ago, for a wrongly configured
Elasticsearch, it was like a couple of minutes
needed to have Elasticsearch hacked.
A couple of minutes. And security
is very complex and expensive.
Thank you very much. I hope you enjoyed the session
and I hope you enjoy all the
sessions from this conference. Thank you.