Transcript
This transcript was autogenerated. To make changes, submit a PR.
Security chaos engineering: how to fix things you didn't know were broken, and make friends while doing it. The things I'll be talking about, while they sound like a job for offsec people, aren't really an offsec role. I'm not particularly interested in knowing something doesn't work. We all know there are more than enough people out there to tell you that. The goal is not to find all the broken things. The goal is to understand how things work before you need a team to break things. Part one: nothing works how we think it works.
Maybe it's me. If you're looking for the smartest person in
the room, just shut this off. Just go watch a different one
because you're looking at the wrong person. But if you want to feel better about
yourself, let's hang out. I don't really know how everything works all the time, and you might, and that's really awesome. But individuals don't scale.
The outside world's a scary, scary place. And the people pulling the big decision
levers are constantly being fed information about the horrors of the outside
world and the threats looming on the horizon.
They're hearing about the magic dust that solves the problems they didn't know they had and protects the business from the next big nasty thing. They're going to ask their teams: are we safe from that threat? And then they're going to hear: we've got signatures in place and configurations made to stop those threats. But that doesn't really answer the question. The answer should be: yeah, we've modeled it, here are the results, and we're tracking the resolution of those gaps. Or: I don't know, but here's how we're going to find out. Everyone working in this industry is constantly bombarded with ads meant to terrify them into buying whatever jalapeno nacho cheese flavored security tool says it's the only real way to prevent all the cutting-edge quantum AI threats that are continuously trying to eat their lunch. Vendors make us worry about
the hypothetical next-gen problems while the core of the business still struggles with fundamentals that have been in place since the 1980s: know where your assets are, where the data lives, and who can access it; take backups and validate that the backups work; protect information and keep the nasties out. Should most of us really be worrying about quantum ransomware while we're struggling with managing role-based access control?
My mind is still blown from hearing vendors talk about BloodHound and how they can detect BloodHound. It's like, great, I'm glad; it's super noisy, I hope you can pick it up. Maybe next we'll talk about detecting Mimikatz's default configuration.
I was having trouble sleeping a few nights ago and decided to look up vulnerability trending information. I took one look at the trending chart and said, wow, I'm glad I'm not in vulnerability management. Saying that, I'm probably going to be gifted a vulnerability management project soon, for cyber karma. So, going back on topic, it makes me wonder if we're actually improving as an industry. Based on those vulnerability numbers, we're clearly good at identifying these issues, especially since 2017, when MITRE changed their assignment process and made CVEs easier to get. It's like a nice birthday gift to vulnerability management teams.
The Verizon DBIR regularly shows misconfigurations as the primary cause of breaches, not vulnerabilities.
Is the time that we're spending on patching improving our ability to defend?
Probably. But it's also true that misconfigurations contribute
greatly to the security issues that plague us today.
When you've got all these issues just dumped in front of you,
it's really easy to buy a tool and say, okay, I've got the industry
standard tools, so I don't have to worry about this. This is covered.
I don't have to think about this anymore. Then you only find out later
that it might not be working the way that you thought it worked.
As we buzzily declare that we're shifting left to embed security in all the platforms from the beginning, trouble is kind of brewing. Like, how do we know that the developers are writing secure code? How do we know that the libraries we rely on aren't actually held together with tinfoil and bubblegum? We get more tools to tell us. It's extremely complicated, and none of it's easy. As we do our work, these systems constantly change, and constant change is the chaos experienced in day-to-day work. Chaos engineering, and security chaos engineering, help us design, build, operate, and protect these complex systems so that they're more resilient to attacks and failures.
A lot of these tools exist because it's kind of hard to afford the folks that have time to dig into the root cause and make real change. Aren't you afraid of malware? Buy AV. A lot of these successful attacks can be broken down to the basics, like don't let untrusted code execute. Don't let folks run untrusted macros from Excel docs. But if you do, make sure the one time it happens isn't going to bring down your whole environment.
Are vendors where we should be focusing our efforts to secure things, or should it be in people? While I'm somewhat excited about AI, I'm not really excited that it's being trained on the same things humans are being trained on: probably Stack Exchange, or outdated Reddit posts, or materials that help you get what you want, but not done securely, which will probably result in creating broken or insecure applications at a faster scale than a human can.
And even the fundamentals are simple, but they're not easy. Even paid-for tooling is complicated. Like, how does it work with deployment pipelines? Can it work at scale? Can you get your data out of it, or are you essentially stuck with them for life? Token management, things that are considered table stakes and basic fundamentals, are still difficult, even without the smug jerks clumsily saying just turn it on. So what makes these things challenging?
Like, in some cases we don't know why we're doing it, outside of third parties saying it's best practice: follow best practice. These basics aren't really basic; they're kind of hard. If you don't believe it, ask yourself why there are so many companies struggling to implement native endpoint controls, like a firewall, and instead relying on vendors to cover that gap for them. It's easier, with the resources that are available to them, to just manage a portal and not the entire platform. Have you ever really tried managing something like AppLocker or Windows Defender Application Control? It's got poor usability, poor user experience, and that makes it far more challenging than it should be. If you need to be certified to
use a product or go through 40 plus hours of training, it's probably
not well designed. And things that aren't well designed are
prone to failure. Good security should not just
be for those who can pay to play; it should be built into the
foundation. But regardless of who runs it, we need to understand how the tools
work, if they work, and ensure they continue to work. And importantly, do they work as expected? To accomplish things, we need foundational work. We need to understand what we're trying to protect. We need to know what the goal is that we want to achieve. Is it the top ten adversaries? Is it opportunistic attackers?
How the tools work doesn't necessarily have to be deep knowledge, but in general: where are the weak points in your detection? What does it not cover? Like, what can't it see? Are they being used for their intended purpose? Do they work? Sometimes things are secretly a beta, with pretend machine learning and AI that only works under certain conditions. We need to make sure they continue to work. Whatever value is being taken from these tools, that's the baseline of what's expected. It needs to continue doing that thing, always.
I would argue that so few know how things work that it's more scalable and acceptable to say we manage that part of our portfolio with Vendor X. And the more I learn about something and how things work, the more I'm amazed that anything works at all. So we need to understand how these tools work, if they work, ensure they continue to work, and, importantly, ask: do they work as expected?
Because using security incidents as a way to measure detection, while it is a method, is not a great method to measure the ability to detect. People are stressed. There's a lot going on. Researching gaps during this time isn't really recommended. Obviously you can do it in a retrospective, but these are still stressful times, and a bad retrospective can end up being a game of blame the human. Don't blame the human; blame the system. Security awareness already puts enough responsibility on the person. Let's build resilient systems that can withstand an attacker. It's like shaming people for clicking on phishing: if clicking on a link ruins everything for you, it's not the victim, it's the system.
Stop shaming users for clicking on things. When things break in real life, we generally analyze. When we're wrong, we redesign it, we model it, and we build it better. But when we think people are the cause, we typically just blame them. Like cybersecurity awareness: we're all well aware of cybersecurity. So let's build platforms and environments that are resilient enough to withstand a user clicking on a link. Part two: secure today does not mean secure tomorrow. In real life, you only know how a system works the moment you deploy it. It has presumably been tested and documented, everyone has signed off on deployment, and chaos has been set free to run through the fields of cybersecurity. But then tuning happens, patching happens,
priorities change, humans get reassigned work, people leave.
This is where the chaos comes in. Chaos engineering: we aren't making the chaos. Everyday work is the chaos, and we're working to make sense of how things change, and why planned and approved changes can fundamentally change how a system operates. An example: software development constantly invents new ways to change how systems work, usually for features, performance, usability, and so on. Oftentimes, vendor platforms aren't equipped to move that quickly. So in these cases, what do you do? Do you hold back their releases? Do you stick with older versions of things? Sticking with older versions of things means you could be running more vulnerable software.
So when we're looking for reliability, we know how it operates, we know
what failure looks like, and we can validate multiple layers of defense to
cover the gaps generated by the failure. Trying to
prevent failure doesn't work, and it's likely never going to work.
You cannot prevent all failure. Everything's going to
fail at some point. So it's our job to understand how things will fail
and plan accordingly. And these experiments help us understand how
something's going to fail so we can learn from it, and it will enable us
to build systems resilient to cybersecurity failures.
You get something dumping LSASS and you have got a whole slew of controls that have failed, and questions that come along with it. How did it get there? Why did it run? Was privilege escalation needed? If so, why did that work? How did it get through our network defenses? Does that mean we need to revisit configurations? To do this, I enjoy building attack trees that help model these scenarios and guide me in creating new experiments, with visual aids to help with telling the story to other humans and to help me not look insane while I'm doing it.
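As a rough illustration of that idea (my own sketch, with hypothetical node names, actions, and controls, not anything prescribed in this talk), an attack tree can be as simple as a nested structure where each node names an adversary action and the defenses that should see or stop it, and walking the tree enumerates the experiments to run:

```python
# A minimal sketch of representing an attack tree and turning each node
# into an experiment. Node names, actions, and controls are hypothetical.
from dataclasses import dataclass, field

@dataclass
class AttackNode:
    name: str                      # adversary action, e.g. "Dump LSASS"
    expected_controls: list[str]   # defenses that should see or block it
    children: list["AttackNode"] = field(default_factory=list)

def experiments(node, path=()):
    """Walk the tree and yield (attack path, controls to validate) pairs."""
    path = path + (node.name,)
    yield path, node.expected_controls
    for child in node.children:
        yield from experiments(child, path)

tree = AttackNode("Phishing email delivered", ["mail gateway", "proxy"], [
    AttackNode("Untrusted macro executes", ["app control", "EDR"], [
        AttackNode("Credential dumping from LSASS", ["EDR", "credential guard"]),
    ]),
])

for path, controls in experiments(tree):
    print(" -> ".join(path), "| validate:", ", ".join(controls))
```

Each root-to-leaf path doubles as the visual aid for telling the story to other humans.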
Part three: experiments in madness. KPIs are super hot. Key performance indicators help set benchmarks, measure progress, and set business goals. There are nearly unlimited numbers of terrible KPIs floating around out there, based on things that you can't control, like KPIs on intrusion attempts, number of incidents, time to resolve. These can be gamed or are just inaccurate. You can't control how many external entities are trying to ruin your parade. Something like mean time between failure, that I can get behind. Every successful event on the endpoint is a failure: app control stops something, and there's a failure at every previous layer, and that's just where the work begins.
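As a back-of-the-envelope illustration (my own sketch with made-up timestamps, not figures from this talk), that metric is just the average gap between the events that made it all the way to the endpoint layer:

```python
# A minimal sketch of computing mean time between failures, where a
# "failure" is anything that reached the endpoint layer at all.
# The timestamps and event descriptions below are hypothetical.
from datetime import datetime

failures = [
    datetime(2024, 1, 3, 9, 15),   # app control blocked an unsigned binary
    datetime(2024, 1, 17, 14, 2),  # EDR quarantined a macro payload
    datetime(2024, 2, 1, 11, 40),  # AV flagged a downloaded installer
]

gaps_days = [(later - earlier).total_seconds() / 86400
             for earlier, later in zip(failures, failures[1:])]
mtbf_days = sum(gaps_days) / len(gaps_days)
print(f"Mean time between endpoint-layer failures: {mtbf_days:.1f} days")
```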
It's where we need skilled humans. Anybody can push the go button, but you need folks in place who understand what the button does and what it all means together. We need to tie it back to what we care about.
What are the customer concerns? Are they worried about threat coverage?
Defensive resilience? Needlessly duplicative technology?
You're going to want to take the time to figure out their goals, especially if
they aren't sure what they need. So when you're looking at it, what concerns
you? What threats concern you? What does your coverage look like?
Can you safely withstand the attacks from these groups? How much failure can
you safely withstand before you have a really bad day?
Testing versus experimentation. Testing implies a
right or wrong answer, and we're not looking for right or wrong. We're looking for
new information to help us make better decisions. I want to know what works, what doesn't work, why it works the way it does, and what cost-effective improvements we can make that will make the other team work harder. I'd like to know that the money we're spending is providing value beyond compliance checkboxes. Starting with design: what do we think will happen? Assuming there's a documented expectation, we can safely wager that the platform will do what's promised. Tons of people will say trust but verify, but I don't really trust claims. I want to test the claims and gather evidence. It's admittedly a less exciting part, but we don't want to eat the dry ramen and then just drink the water afterwards.
We just want to tie it back to business goals. So in building an experiment,
we want to start with a nice, reputable source to use.
I generally go with either intel sources or map it out in
front of me. But if I'm not emulating a specific threat, I'm looking at
each stage and finding out what's available to me at
every single part. Odds are you're soon going to find that most adversaries look extremely similar to each other. Maybe they'll deploy something differently, but in the end, it's just untrusted code executing. It's like 22 Jump Street: you infiltrate the dealer, you find the supplier, just do the same thing. Mostly. Obviously there are edge cases.
The DFIR Report usually has great breakdowns of adversarial actions that you can use to plan your experiment. But don't use actual malware, because that can ruin your day. You can use EICAR or Mimikatz, things that aren't inherently destructive and won't get out of your control. Like, you can even make AV flip out by appending Mimikatz commands after calc.exe, because next-gen AV likes to pretend it's learning everything, but it's going to trigger on those words. I wonder why.
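As one concrete, harmless starting point (a sketch of my own rather than a step from this talk), the industry-standard EICAR test string gives you a safe artifact that compliant antivirus engines are supposed to react to, so you can check whether the endpoint control actually fires and where the alert shows up:

```python
# A minimal sketch: drop the standard EICAR antivirus test file, then go
# check which layers (AV, EDR, SIEM) reacted to it. The string is a harmless,
# industry-standard test artifact; the file name and path are hypothetical.
from pathlib import Path

EICAR = r"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"

target = Path("eicar_test.com")
target.write_text(EICAR)
print(f"Wrote {target.resolve()} -- now verify which defenses alerted on it.")
```

If nothing notices that file at all, you've already learned something.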
This is just a visual example of a couple of different tests where we expect to see some observables, sort of a scorecard, if you will, about what you think you should see and where, because doing things is noisy. In this example, we're running a couple of atomic tests. In this scenario, we're seeing a file download, execution, and local discovery. When these typically run, you're looking for endpoint events, which makes sense, because that's the easiest place to look for indicators of attack. In reality, downloading a file should provide no less than two to four pieces of telemetry through your stack, or even nothing, depending on how your organization is running. Point being, when you do things, you should be able to see layers of defense in action.
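In code form, that scorecard can be something as plain as an expected-versus-observed table (this is my own sketch; the layer names, ATT&CK technique mappings, and results are hypothetical placeholders):

```python
# A minimal sketch of an expected-observables scorecard for a few atomic
# tests, compared against what the telemetry actually showed after a run.
expected = {
    "file download (T1105)":   {"proxy": True,  "network IDS": True,  "EDR": True},
    "execution (T1204)":       {"proxy": False, "network IDS": False, "EDR": True},
    "local discovery (T1083)": {"proxy": False, "network IDS": False, "EDR": True},
}

# Filled in from whatever the SIEM/EDR actually recorded during the run.
observed = {
    "file download (T1105)":   {"proxy": True,  "network IDS": False, "EDR": True},
    "execution (T1204)":       {"proxy": False, "network IDS": False, "EDR": True},
    "local discovery (T1083)": {"proxy": False, "network IDS": False, "EDR": False},
}

for test, layers in expected.items():
    for layer, should_see in layers.items():
        saw = observed[test][layer]
        status = "OK " if saw == should_see else "GAP"
        print(f"{status} {test:25} {layer:12} expected={should_see} observed={saw}")
```

Every GAP row turns into a question worth chasing.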
So questions that I have from the above would be: why are these files allowed to be downloaded? Is that normal for that user's persona? Do people typically download things out of band? Do they use PowerShell to do it? Why are they able to launch setup from the desktop? Is that normal? Pretend it's malware: shouldn't the proxy handle it at that layer? Do people normally start services from the command line? The answer to all of that could
be yes. But if you don't know that, it's difficult to say whether your tools
and platforms are working as expected. We're going to be filling this
out later. It's a quick, high level overview of what the experiment
ultimately looks like. At its conclusion, we'll have
an easy to understand view of the overall status of the
cybersecurity tooling. So this one is relatively straightforward. Take your outcome, your goals, and validate your position. If you deploy a thing, does it do what it's supposed to do? You don't want to just assume the outcome; it needs to be exercised.
And getting started is probably one of the hardest parts. To sort of guide this, the MITRE D3FEND framework is going to help if you're unsure how your defenses measure up. ATT&CK helps if you've got absolutely no idea how to get started or what an attack looks like. If everything is super awesome, you've just created a baseline to measure future changes against: patching, tuning, whole platform replacements can all be measured against that baseline. One key note: this baseline should be updated over time to reflect positive changes. Otherwise, you'll be measuring the wrong thing, and you won't really know if you're improving from the previous iteration. If everything's not super awesome, well, you now know where you've got to start.
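Once that baseline exists, comparing each new run against it is almost trivial to automate. Here's a rough sketch (my own illustration; the check names and results are hypothetical) of flagging drift between a stored baseline and the latest experiment:

```python
# A minimal sketch of comparing a fresh experiment run against a saved
# baseline so regressions show up as drift instead of surprises.
baseline = {
    "payload download blocked at proxy": True,
    "macro execution blocked by app control": True,
    "LSASS access detected by EDR": True,
}

latest_run = {
    "payload download blocked at proxy": True,
    "macro execution blocked by app control": False,  # regression after tuning?
    "LSASS access detected by EDR": True,
}

drift = {check: (was, latest_run.get(check))
         for check, was in baseline.items() if latest_run.get(check) != was}

if drift:
    for check, (was, now) in drift.items():
        print(f"DRIFT: {check}: baseline={was}, latest={now}")
else:
    print("No drift from baseline; update the baseline if defenses improved.")
```

Run it after every planned change and the baseline stays honest.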
Part four: measuring stuff. The phrase "we don't know what we don't know" is a traditional annoyance of mine. It's like you're throwing up your hands and giving up. Presumably there are things that have been done outlining what you know about your environment. Even a vuln scan is a start. Whatever that is, that's what you know. Everything else needs to be tested, identified, and measured. Whatever you know, that's your current baseline. To put visibility into perspective: what if they could only guarantee that 80 or 90% of the time your Cheerios weren't poisoned or didn't contain bits of rock and glass? There'd probably be a big investigation. So we should not settle for 80 or 90% visibility. We don't want to be eating glass. Cyberglass. I don't think cyberglass is a thing yet. Now, a view of experiments in practice.
This is currently my favorite view of the world. It tells me exactly what defenses are working for a given technique. Now, this example is purely fictional, but the idea is that you take a view like this and let it guide where you're going. In the delivery stage, you can see the payload was blocked at the endpoint. Great, mission accomplished. But why did it get to the endpoint? That means there were failures at the other layers earlier in the chain. Why didn't it get stopped earlier? Was it a misconfiguration? If nothing's wrong and everything is working as expected, is that normal? We don't know. That's what we're going to find out. And that's the beauty in experimentation: you're learning about how things actually work rather than guessing based on a configuration.
And once you can do it that first time, you automate it, and now you have ways to identify drift over time. You're able to measure the impact of changes and avoid mistakes later on that are more expensive to fix. Like PG&E: fixing the infrastructure 50 years ago would have been way less expensive than paying for the deferred maintenance, the fires, the homes lost, and eventually the rate raises that we have today. And it helps you make friends.
Friends like these. So this was AI's attempt to
make five half human, half ravioli people.
There are six here. I had some free tokens that were expiring,
and I needed to make some art for this presentation.
Security chaos engineering should not just be limited to engineering teams; don't think they're the only ones that can have all of the fun. Hunt teams can validate analytics, to let them know that, in lieu of an active compromise, their analytic is still functioning. Parsing results can let groups zero in on where detection is missing and guide hunt activities to where the gaps live. Architecture teams or groups can be looking for duplicative technologies and maybe reduce tech debt.
If you don't know where things overlap, then when you make changes, you might
be losing critical tooling. You could also find that you've got too much emphasis
in one place and not enough someplace else. And this is how you can find
out without waiting for an adversary. Operations teams want to know that the tools they rely on keep working through all the nonsense of maintenance and updates. They want to make sure their alerts still work. They want to know where the holes are that they need to worry about. And there are product teams everywhere. They want to know how their portfolios perform, and whether they provide actual value to their customers.
Knowing that your product works enables better decisions when it comes to the renew-or-replace cycles, and will help give you a baseline to measure against. It also helps validate that a new product will do what it claims, and you don't have to take their word for it. Or if they say, just look at the MITRE ATT&CK evaluation results, you can see we saw everything: yeah, sure you did. I don't want to know what the best tool is.
I want to know the best way that we can reduce the impact of a
malicious entity getting into an environment. And that is what
we can do with the experimentation.
Part five: making friends along the way. And I will wholeheartedly, wholeheartedly attest to making friends. As the director of security chaos engineering, you become popular. Like, popular with audit and second line functions, which you might not want unless you really like writing responses to their requests. Because talking to them and giving them information means you'll be writing responses all the time, because you're bringing real-life views into what's typically a box-checking exercise and making other programs better. It helps you make friends, and that's really what working is about, isn't it? Work is really all about making friends in the right places to help you get the things you want done, actually done. So in this example, I'm going with compliance.
Now, everybody loves compliance. And many times, when asked about the security program of their business, they'll answer, we're HIPAA certified, we're PCI compliant, which, if you're listening to this, you know exactly what that means. But it doesn't have to mean that. Compliance can be made mostly painless, and security chaos engineering supports measuring those tools. It helps measure the effectiveness of your program, identify opportunities to improve your program, communicate recent achievements, demonstrate stewardship of your resources, show how your team supported the objectives of your organization, and suggest actions that you want others to take to improve. And by doing it all the time, it's not an exercise in finding the right evidence at the right time to show compliance. It enables you to be compliant because you're secure, not secure because you're compliant.
Security chaos engineering helps you ask better questions. Questions like: does our endpoint security stack proactively reduce the impact of unwanted activity by malicious actors? How do we minimize the impact of ransomware? Do our network security tools effectively provide defenders with the data necessary to investigate and/or mitigate unwanted activity? So we're reframing the question to ask what the most effective use of resources would be, whether it's tuning, implementation, or build versus buy, instead of just asking what's the best AV we can buy.
Making things palatable for others will make your life easier,
and it takes patience to do this kind of work because it's not as common
as it should be, or hopefully in the future will be. You may get
empathy from any SREs out there when you explain, hey, here are my challenges, how do you solve this? And they'll smile, they'll nod, and maybe smugly chuckle the first time.
So get started. Start small with business
goals and use examples they'll care about. Start by experimenting against
controlled endpoints to validate they function the way the owners think.
Give them data to build KPIs to show how awesome they are.
Be nice. If you're not nice, nobody's going to want to help you trace back why something doesn't work. It's really easy to get sidetracked by finding oddities; don't go chasing waterfalls. And beware: it's work that people don't even know they need
until it's done. And success means more work because that's
really what life's all about. Life's about succeeding and
generating more work for yourself. Our ultimate
goal is making things better with proven, well-designed, functional systems that provide depth, increase the cost of attack, and are resistant, or resilient, to failures. And I believe in you. So thank you for listening and/or watching,
and I hope you learned something from this. And if not, I made a
terrible mistake.