Transcript
This transcript was autogenerated. To make changes, submit a PR.
Mondoo, and I'm super excited to talk about security as code
at Conf42 Cloud Native 2023.
I started my career at Deutsche Telekom, the parent of T-Mobile,
and one of the biggest challenges we had was securing
a large amount of critical infrastructure
across the whole of Europe.
One way to do this was to go with full automation. So we
had an internal project where we
tried to figure out how to automate
all those different security requirements in a way that
can be applied really easily. A lot of
people said, no,
this is not going to work, don't try it, it's a waste of time.
And we said, challenge accepted, let's just try.
How can we implement security in a way that it
can be used in production environments on a daily
basis? The trick was to make it what
we call practical security. We made everything
tweakable. It has a good set of defaults, so you can work with it out
of the box. But it was focused on production,
so it had most of the security features
enabled, and that was really, really successful. What was missing was
the insight: how well am I doing across my fleet? What's the
next thing I should start on? Where are the servers that are not fully automated?
And getting that visibility
into the overall infrastructure is actually quite complicated.
So what we started then was the first policy
as code engine, called InSpec. We started this
project in 2014/15,
where we really helped companies turn
all those long, boring PDF requirements
into automated checks so that they could quickly assess
a huge amount of infrastructure in a very structured way.
That company was quickly acquired by Chef Software,
and I was leading the engineering team
for compliance and made it really
big, so that Chef
sold InSpec to Fortune 500 companies,
the Department of Defense, and so on. So I have a
strong background in policy as code, and we will definitely look
into how policy as code can help us with security
as code going forward. So when
we think about security as code, what is it?
Security as code is the practice of integrating security
controls and tools into the
software development process. We
will see how to do that. But the first question is,
really, why is that important?
Why should I care about security as code? It's more effort.
I need to do something in my pipeline. I already have my production
environment secured.
Doesn't make sense, right? And besides,
hackers are cool, they're going to help us. That's
how it used to look. But nowadays it's more like this.
We have ransomware everywhere, and that's not
individual hackers. Those are ransomware gangs. They try it
everywhere, and they behave like a company. They have sales
people, they have sales playbooks, they have customer support,
and they have affiliate programs. So essentially it's
an illegal business making a lot of money. And that
is easy for them because we have so much vulnerable
infrastructure out there. Okay, so now
we have a huge number of hackers and criminal gangs trying
to attack everything that is connected to the Internet. And then
we combine this with the number of CVEs published
each year. A CVE is a vulnerability that has been disclosed
publicly, and we see a 20% increase
year over year. Just in the last year, we had
over 25,000 known vulnerabilities.
We assume this is just the tip of the
iceberg and there could be a lot more.
And that, in combination with all the
ransomware activity, is something
that should concern us. Once a
vulnerability is detected,
hopefully by a person who reports it properly to
the vendor, they request a
CVE number and a CVE gets assigned.
The vendor hopefully creates a patch very
quickly, and once the patch is out, the CVE
is published with more details about what went on,
so that we can learn from it. Before the patch is out,
we call it a zero-day exploit. After the patch, we
just call it an exploit. The interesting fact to
see here is that 25%
of all CVEs have known exploits.
That's a huge amount. Of those 25%,
90% are available within a month
after the vulnerability has been published. So essentially
a lot of vulnerabilities are exploitable within
30 days. This is in contrast
to how we roll out fixes. We roll out fixes very, very slowly
across the industry. And it's not because
we don't want to, it's because it's very, very complicated. First, there is
the identify step. Just imagine
how quickly you can identify where a specific
package or misconfiguration is present in your infrastructure.
Really, really complicated. And once we have done this,
we generate a report with
a lot of reds, and out of those reds we generate tickets.
Those tickets will then be worked on, hopefully fixed soon,
and then after a while the fix trickles down into
production. It needs to be tested and to work in combination with everything else.
So the rollout is essentially slow.
According to studies,
this process takes very long. It takes 246
days on average to get a vulnerability that
has been fixed rolled out in our infrastructure.
So we have 30 days versus
246 days. It's a huge gap.
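
To put the two figures side by side, here is a minimal Python sketch. The 30 and 246 day values are the ones quoted in the talk; the constant and function names are illustrative assumptions.

```python
# Figures quoted in the talk; the names are illustrative.
DAYS_UNTIL_EXPLOIT_AVAILABLE = 30   # window in which most known exploits appear
DAYS_UNTIL_FIX_ROLLED_OUT = 246     # average rollout time cited in the talk


def exposure_window(exploit_days: int, rollout_days: int) -> int:
    """Days during which an exploit exists but the fix is not yet deployed."""
    return max(0, rollout_days - exploit_days)


window = exposure_window(DAYS_UNTIL_EXPLOIT_AVAILABLE, DAYS_UNTIL_FIX_ROLLED_OUT)
print(f"Exposed for roughly {window} days")  # prints: Exposed for roughly 216 days
```

Under these averages, a typical system stays exploitable for about seven months after an exploit becomes available.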
So if we see this in combination,
all those issues outpace the fix,
right? We have a yearly increase in vulnerabilities.
Hackers go all in on automation, they
can scan the Internet in three minutes, and the rollouts are slow.
And that in combination gives the
ransomware gangs a really easy way to
attack a lot of companies. And we see this in the numbers.
If you ask a little over 1,000
IT and security professionals, 80% of them
have been victims of ransomware and more than 60%
paid the ransom. That's a huge number.
And that's just caused by the
number of vulnerabilities and our slow response
in fixing them.
The main problems we identified here are, first, that
software with an available update has not been patched. The fix is not
unknown, it just hasn't been applied. And second, known misconfigurations
have not been avoided in production. It's really those two
that make 80 to 90% of the attacks
possible, and that's totally avoidable. If we
just reduce that from 100%
to 20%, the attack surface for those ransomware gangs
is much, much smaller. They need to do a lot more work, and then
the business may no longer be viable for them.
We just need to make it way more complicated so
that the number of attacks is also going to
be reduced. The
challenging fact here is that it's not that
we don't want to ship fixes quickly,
it's how the tooling that we use on a day-to-day basis
helps individual teams. If we look at how
software delivery works in general, we have the platform engineers. They work from local
development, go to source code, put it into Git,
then it runs through GitHub Actions, then it goes into pre-production
and then prod, hopefully with Terraform. So we see a pipeline
trickling through. The development team mostly
spends time with the source code, local development,
and Git, while the security team
focuses completely on the other side, the production
environment. The attacks really happen
on the production side, so they need to secure the production environment.
But you see, the focus is on a completely different end of the
whole spectrum, and that leads to issues.
We can illustrate that with just one simple example.
Let's look at cloud storage buckets. We don't
want to have them public in most cases, so the
security team naturally goes to AWS,
Google Cloud, or Azure and asks, okay, is that thing
configured properly? And if it's not, they say, hey team,
I need fixes here, please roll this out.
The engineering team, meanwhile, thinks in Terraform. They
automate things. So the language
we communicate in is different. The testing tools that
work for cloud don't help developers, and that's why the
feedback process is
really slow. We essentially need to deploy to pre-production or production
before the tooling that security is
using kicks in, and only then do the findings come back to local development.
That means we have to deploy vulnerable software
in order to detect the vulnerable software so that we can
fix the vulnerable software, which
doesn't make any sense. We expose ourselves to the
outside world without any need, just because
the security tooling is not up to the task and prevents teams
from getting feedback early. Wouldn't it be nice if the team
already saw in their local Terraform configuration that,
hey, this is not right, you should really do this differently?
And in the pipeline, it blocks the pipeline
if your Terraform HCL is wrong, and it blocks the pipeline
if Kubernetes manifests are not configured properly.
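
The idea of a check that blocks the pipeline can be sketched in a few lines of Python. This is a hedged illustration, not the tooling from the talk: the rule (no privileged containers) is just an example rule, the function names are hypothetical, and the manifest is assumed to be already parsed from YAML into a dict.

```python
from typing import Any


def find_privileged_containers(manifest: dict[str, Any]) -> list[str]:
    """Return names of containers in a Pod manifest that request privileged mode."""
    containers = manifest.get("spec", {}).get("containers", [])
    return [
        c.get("name", "<unnamed>")
        for c in containers
        if c.get("securityContext", {}).get("privileged") is True
    ]


def gate(manifests: list[dict[str, Any]]) -> int:
    """Return a process exit code: 0 lets the pipeline continue, 1 blocks it."""
    failed = False
    for manifest in manifests:
        name = manifest.get("metadata", {}).get("name", "<unnamed>")
        for container in find_privileged_containers(manifest):
            print(f"FAIL {name}: container '{container}' runs privileged")
            failed = True
    return 1 if failed else 0


# Hypothetical Pod manifest, already parsed into a dict.
pod = {
    "kind": "Pod",
    "metadata": {"name": "web"},
    "spec": {"containers": [
        {"name": "app", "securityContext": {"privileged": True}},
    ]},
}

exit_code = gate([pod])  # a CI job would pass this value to sys.exit()
print("exit code:", exit_code)
```

A non-zero exit code is what makes most CI systems stop the job, so the same function can run on a laptop and in the pipeline.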
So we have those individual tools, but they don't really help
with the communication, because as a company you agree on one rule
set. Even with the bucket, as you've seen, the language is different.
We can check that individually, but what if we need to make an exception?
That is where things need to align,
right? Otherwise you can have tooling here and
tooling there. But in order to implement security as code properly,
you need something that helps you smooth the
process going from left to right and aligns
you on one common understanding. That is what Terraform
has done for infrastructure. You develop
it locally, you have a state file and a plan file, and then you push it
to production. You can use multiple environments, and it makes it really
easy for platform teams to
go from local to prod.
We don't see this right now in the industry for security, and so
it's really difficult for teams and it leads to a massive amount of frustration.
As I said, the platform team says, hey,
you should tell me how to do it in Terraform.
It doesn't help me if you say I need to configure this in the
dashboard this way, because I automate my software.
The security team comes from the other side. They say,
hey, what's wrong? I tell you all this stuff again and again,
and they don't see any progress. It's driving people crazy.
And I just want to
say, it's not that as humans we don't want to work together. It's
the tooling that drives the complexity, so
that we cannot work together very effectively.
In the end we go to management and say,
hey, security is blocking my pipeline. Or security says,
oh, engineering has done it wrong
again, so we can't deploy it. So there's always this fight
between the different teams just because the tooling
doesn't fit together. And that's something we need to figure out.
So let's think about the solution, right? If
we think holistically, we first of
all need to think about the whole tech stack. When we as security people
look at our tech, we have to secure the whole thing. We need
to secure the cloud environment, we need to secure our Kubernetes cluster,
then the cluster configuration and everything that runs inside the cluster,
going from workloads to application containers.
Having that unified view helps you prioritize the
risk and focus on the right thing. But that's
not enough. We also need to look
into the pipeline, because from a security perspective we have
seen many, many supply chain attacks. You need to move
security to the left side. Security starts with
local development and is applied in source code and
CI/CD, and that's where we see all those anchors
for security as code. It starts
in local development, it goes into Git,
runs through GitHub Actions, everything is secure there,
and then it deploys into production. It's very important that you do
this at every individual step. Even if everything is checked in local
development, let's assume there is a supply chain
attack. An attacker can manipulate all the things that have been checked before
they reach production. That means even if it
looks good in Git, it can be manipulated during deployment and then
applied completely differently in production. It looks great in Git,
it's correct there, but it's still wrong in production. So no matter where
you are, someone can always start manipulating it.
We really need to focus on securing
every layer and every phase.
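
One simple way to notice that production no longer matches what was reviewed in Git is to compare fingerprints of the two configurations. The talk does not show this technique; the following is a minimal Python sketch under that assumption, with hypothetical function names and made-up settings.

```python
import hashlib
import json
from typing import Any


def fingerprint(config: dict[str, Any]) -> str:
    """Stable SHA-256 digest of a configuration, independent of key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_drifted(reviewed: dict[str, Any], deployed: dict[str, Any]) -> bool:
    """True if what runs in production differs from what was reviewed in Git."""
    return fingerprint(reviewed) != fingerprint(deployed)


# Hypothetical bucket settings: as reviewed in Git, and as found in production.
reviewed = {"bucket": "logs", "block_public_acls": True, "block_public_policy": True}
deployed = {"bucket": "logs", "block_public_acls": False, "block_public_policy": True}

print("drift detected" if has_drifted(reviewed, deployed) else "in sync")
```

A fingerprint only tells you that something changed, not whether the change is dangerous, which is why the same policy checks still need to run against production.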
We already touched on a few things that we need for security as
code. In order to apply this in a structured way,
you start with static and dynamic testing.
You really want to check Terraform and Kubernetes manifests in
the local development phase. You want to check them in CI/CD as well, to
always make sure the way we define our infrastructure
is up to the task and really meets the best practices.
And that's amazing. The next part
we are looking at is package vulnerabilities, from the
container perspective but also from the runtime
perspective. Every VM that we are running,
every laptop that we are running, they all need to be updated and
be up to the task. And as we
discussed, all the runtime infrastructure needs to be checked
continuously. Even if I check the container in
my pipeline, I end up in the situation
that once it's deployed in production, if I don't update it regularly,
new vulnerabilities come up, and then boom,
one week after deployment a new vulnerability has already popped
up. So you always need this view across the individual
phases of the CI/CD pipeline. The
CI/CD pipeline is essentially the
foundation for implementing security as code effectively.
If you don't have full automation, it's really difficult
to implement security as code on top.
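
The point about a container that is clean at build time but vulnerable a week later can be sketched as follows. Everything here is an illustrative assumption: the `Advisory` type, the package versions, and the placeholder CVE ids are made up, and real version schemes are more complicated than dotted integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    """A published vulnerability for one package (all values here are made up)."""
    cve_id: str
    package: str
    fixed_in: tuple[int, ...]  # first version that contains the fix


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a simple dotted version such as '1.2.3' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def vulnerable(installed: dict[str, str], advisories: list[Advisory]) -> list[str]:
    """Return CVE ids whose fix is newer than the installed package version."""
    return [
        adv.cve_id
        for adv in advisories
        if adv.package in installed
        and parse_version(installed[adv.package]) < adv.fixed_in
    ]


# Packages baked into a container image at build time (hypothetical).
image_packages = {"openssl": "3.0.1", "zlib": "1.3.0"}

# Advisory feed as known at build time, and again one week later (placeholder ids).
feed_at_build = [Advisory("CVE-0000-0001", "zlib", (1, 2, 9))]
feed_week_later = feed_at_build + [Advisory("CVE-0000-0002", "openssl", (3, 0, 2))]

print("at build:  ", vulnerable(image_packages, feed_at_build))
print("week later:", vulnerable(image_packages, feed_week_later))
```

The image did not change between the two scans. Only the advisory feed did, which is why a one-time check in the pipeline is not enough.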
The other part of the practice is
about how we as engineers can establish
secure coding practices in our review processes,
making sure we have input validation and proper review
of source code. That helps us get better beyond just
static testing. Only those things in combination
really, really drive security upwards. Back
to our problem, where we talked about the individual teams:
security wants to check the cloud and platform
engineers want to check the Terraform part. We really want to focus
on the problem, right? The problem as a company is that
we don't want any bucket to be exposed
publicly, no matter if it's defined in AWS,
Azure, GCP, or Terraform. It's really about
focusing on what our goal is
and how to achieve it. In
practice, that means we need to see how security
can be part of every individual process.
We have seen this now, and I have argued many times that we
really need security in all of those phases, and not
just individually but also consistently. Just plugging in
individual parts is not the solution. You really need a unified
view that helps teams collaborate across
all that tooling. Otherwise you end up in the situation that local
development has a checking tool, but the rule for production
is different, and then you still have the clash. You still
have data that's not aligned, you have agreements that are not aligned,
and that doesn't make the world better. You just have more
information and more distraction. So the challenging part is really
combining the individual controls and the team collaboration
with effective tooling that helps you build this
very fast. If we
look at what infrastructure as code has done, we really
want something that allows us to do the same thing for security
as code. One way to do that is to have the same
flexibility for security that we have in Terraform or in Kubernetes
manifests. And the way to get that is
policy as code. You define the security practices
in code that you can reuse. And the
important part is that it really needs to be as flexible as infrastructure
as code. You want to tie Terraform HCL
to policy as code. You want to tie Ansible to policy
as code. So you need something that aligns completely
on that level, so that we can see things in
our local development, but also in our pipeline, and then see them
in production as a result. That sounds
too good to be true, but let's see how we can implement it in a
second. We have
seen a few things that are
key if you want to apply security
as code successfully in your organization.
The first one is that all the vulnerability and
misconfiguration information needs to be available
to all the parties involved: platform engineers and
security engineers. They all need a consistent view
and access to the tooling, so that you don't have long cycles where
you need to deploy to production just to get the report.
The other one is coverage. The security tooling needs
to support both build time and runtime. Otherwise it's not helpful.
If it doesn't include both, it leads to two
silos. The rules are still different and it's
not helpful. That still drives all the craziness
we had before. The next part is automation.
In security as code, the primary goal is to build
security into the automation process. So the security tooling needs
to adapt to the process you already have. You need to integrate
it into the pipeline so that you can easily do this everywhere.
And then of course, as I said,
extensibility. This can be achieved through
policy as code, where you define individual
rules on your own. Hopefully the tooling also
provides out-of-the-box policies, which makes it
easier to kickstart when you're trying to implement that.
I'm going to showcase how we do this with our open source projects
that help companies be more secure. One part of
CVE discovery, which I brought up earlier, is identifying
where things are. For that we have a graph-based asset inventory
where you can use GraphQL to query the whole infrastructure,
ranging from Windows, Linux, AWS, and Kubernetes
to even Terraform and Kubernetes manifests.
That gives you an easy way to quickly ask
where is what. We will focus right
now on cnspec, which is about securing
everything from development to production. Let's
see it in action, based on the use case we had earlier:
how do we make sure a bucket is not exposed?
In this case, we want to make
sure that block public ACLs is enabled, as
well as block public policy. And we want to enforce this
across all the buckets.
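
The runtime side of that rule can be sketched in plain Python. This is not cnspec's query language, only an illustration of the logic: the key names mirror the S3 public access block settings, while the inventory and function names are made up.

```python
from typing import Any, Optional

REQUIRED_FLAGS = ("BlockPublicAcls", "BlockPublicPolicy")


def bucket_blocks_public_access(public_access_block: Optional[dict[str, Any]]) -> bool:
    """True only if every required flag is explicitly set to True."""
    if public_access_block is None:  # no public access block configured at all
        return False
    return all(public_access_block.get(flag) is True for flag in REQUIRED_FLAGS)


def failing_buckets(buckets: dict[str, Optional[dict[str, Any]]]) -> list[str]:
    """Sorted names of buckets that do not enforce both flags."""
    return sorted(
        name for name, block in buckets.items()
        if not bucket_blocks_public_access(block)
    )


# Hypothetical inventory: bucket name -> public access block settings.
inventory = {
    "logs": {"BlockPublicAcls": True, "BlockPublicPolicy": True},
    "assets": {"BlockPublicAcls": True, "BlockPublicPolicy": False},
    "legacy": None,
}

print(failing_buckets(inventory))  # prints: ['assets', 'legacy']
```

Treating a missing configuration as a failure is a deliberate choice in this sketch: a bucket with no public access block is not known to be safe.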
This is good for runtime, but we also said we want to have this
in Terraform. So in the same GraphQL-based language
I can query all the Terraform resources, check for
public access blocks, and check whether those arguments
are set properly, meaning set to true. If not,
we already see in our development phase, in the local
IDE, that things are not configured well.
If we now ran the automation with Terraform, it would
lead to a public bucket, and
that is something we want to avoid. Doing that early also helps
security teams not to worry.
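
The same rule can be checked before anything is deployed by reading Terraform's JSON plan output. The sketch below is plain Python rather than the query language from the talk, and it makes simplifying assumptions: it only looks at resources in the root module, and it does not detect buckets that have no `aws_s3_bucket_public_access_block` resource at all. The embedded JSON is a trimmed, hand-written stand-in for `terraform show -json` output.

```python
import json
from typing import Any

REQUIRED_ARGS = ("block_public_acls", "block_public_policy")


def misconfigured_blocks(plan: dict[str, Any]) -> list[str]:
    """Addresses of public access block resources where a required argument is not true."""
    resources = plan.get("planned_values", {}).get("root_module", {}).get("resources", [])
    return [
        res["address"]
        for res in resources
        if res.get("type") == "aws_s3_bucket_public_access_block"
        and not all(res.get("values", {}).get(arg) is True for arg in REQUIRED_ARGS)
    ]


# Trimmed, hand-written stand-in for `terraform show -json <planfile>` output.
plan_json = """
{ "planned_values": { "root_module": { "resources": [
    {"address": "aws_s3_bucket_public_access_block.logs",
     "type": "aws_s3_bucket_public_access_block",
     "values": {"block_public_acls": true, "block_public_policy": true}},
    {"address": "aws_s3_bucket_public_access_block.assets",
     "type": "aws_s3_bucket_public_access_block",
     "values": {"block_public_acls": false, "block_public_policy": true}}
]}}}
"""

print(misconfigured_blocks(json.loads(plan_json)))
```

Because the input is the plan and not the live account, the finding shows up before the public bucket ever exists.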
To make this
work in combination, we want to define a
policy that is totally consistent across all those different teams
and that focuses on the check itself: the
bucket should not be public. Based on the technology,
the same check has multiple variants. It essentially says,
hey, if I'm checking Terraform,
apply the Terraform check. If I apply it against
AWS, apply the AWS check.
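
The idea of one check with several technology-specific variants can be sketched as a small dispatch table. This is an illustration in Python, not the policy format used by the tooling in the talk; the check id, platform names, and asset shapes are all hypothetical.

```python
from typing import Any, Callable

Check = Callable[[dict[str, Any]], bool]


def aws_variant(asset: dict[str, Any]) -> bool:
    """Runtime variant: inspects settings read from a live AWS bucket."""
    block = asset.get("public_access_block") or {}
    return block.get("BlockPublicAcls") is True and block.get("BlockPublicPolicy") is True


def terraform_variant(asset: dict[str, Any]) -> bool:
    """Build-time variant: inspects arguments of a planned Terraform resource."""
    values = asset.get("values") or {}
    return values.get("block_public_acls") is True and values.get("block_public_policy") is True


# One check id that both teams agree on, with one variant per platform.
POLICY: dict[str, dict[str, Check]] = {
    "bucket-not-public": {"aws": aws_variant, "terraform": terraform_variant},
}


def evaluate(check_id: str, platform: str, asset: dict[str, Any]) -> bool:
    """Run the variant of `check_id` that matches the asset's platform."""
    variants = POLICY[check_id]
    if platform not in variants:
        raise ValueError(f"no variant of {check_id!r} for platform {platform!r}")
    return variants[platform](asset)


live_bucket = {"public_access_block": {"BlockPublicAcls": True, "BlockPublicPolicy": True}}
planned_bucket = {"values": {"block_public_acls": True, "block_public_policy": False}}

print(evaluate("bucket-not-public", "aws", live_bucket))
print(evaluate("bucket-not-public", "terraform", planned_bucket))
```

Both teams refer to the same check id, so an exception or a change to the rule is agreed on once and applies to every variant.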
And that makes it super, super easy as a team to
define the rule set that you want in order to make your
infrastructure secure. Now you can essentially combine everything
going from left to right. And to kickstart
it, you really don't want to build everything yourself.
Instead, we publish a huge number of policies.
We have hundreds of policies available with
thousands of checks that help you start
being secure now. Check out the Mondoo registry.
We have open source policies and query packs available
that you can apply today. It's open source,
so don't wait. Secure your infrastructure.
And that really helps us as teams
to be more secure. I just
illustrated very quickly that you can now apply checks
in local development. You can check your Terraform in your IDE,
you can check it into Git, you can check it in a GitHub Action,
you can run it in pre-production against an AWS account, and then
do it in production as well. So now it's not just
the platform engineers using Terraform to
do all the things. The security team can now work with the platform engineers
to assign policies for the whole pipeline
and really make it work, so that you as a
whole team are more secure.
We are a company that builds security posture
management that helps you be
more secure. We built a platform that we use ourselves. We are
fully automated with Kubernetes and Terraform, and we use the product
on a day-to-day basis to secure ourselves. But we also work
with a large number of big enterprises in healthcare,
finance, and manufacturing to secure their infrastructure,
and we learn from that. So thank you very much for
listening. Hopefully this was helpful. In case
you have any questions on how to apply security as code in your environment,
feel free to reach out and let me know. Happy to help.
Thank you very much.