Transcript
            
            
              This transcript was autogenerated. To make changes, submit a PR.
            
            
            
            
Hey, all. My name is Christopher Weber, and I am the director of product and IT operations at Open Raven. We're going to talk a little bit today about staring into the data abyss and how we can achieve a higher level of cloud security, mostly so you can sleep better at night. I think that's really the critical piece: understanding how we might sleep better at night and not have to think about all these crazy things. Because the reality that I think a lot of us are dealing with at this point is that there is just so much data, and I mean so much data.
            
            
            
It's really easy to see when we look at things like S3. Using that as our starting point, it's really interesting to reason about just how much is actually out there. First off, think about the number of AWS accounts you have. Even in my small organization, we've got over 30 accounts, and each one of those ends up with a bucket per region just for AWS Config. Add to that things like CloudTrail, CloudFormation, all that sort of stuff, and we haven't even started talking about your actual data, the applications that write it, and how all of that plays together. So it's really incredible just how much data ends up in the cloud, if you will. And I think it's worthwhile taking a step back.
            
            
            
In the old days, for those of us who have been around a little while, the data kind of protected itself in some way, right? You had to get into the system before you could get access to the NetApp filer or the big EMC boxes, because you had to actually have access to those systems, whether they were exposed via NFS or sitting on some Fibre Channel loop. And not just that, but the data could only grow so big, because in those environments you could only afford to buy so many shelves, and you could only afford to add so many controllers, because it was so expensive. Not to mention the upper limits of those systems, right? You could only get so much space on a given filer. And I think this is where it becomes really interesting and important to think about how much the world has actually changed, because all of this data and the unlimited ability to write it wouldn't be so bad, except for all those darn breaches.
            
            
            
And when I look at S3 in particular (and we could also talk about RDS, about your Elasticsearch servers, or pick a different cloud: if we're talking about Google Cloud Storage or BigQuery or any of those sorts of things, you have similar sets of problems), at the end of the day it really boils down to this: we have to think about how we protect these environments a bit better.
            
            
            
I don't want to belabor the point, but I think it's important to really think through all of the breaches in these environments. Corey Quinn, from Last Week in AWS and The Duckbill Group, regularly calls out a Bucket Negligence Award in the Last Week in AWS newsletter. And it's really interesting to me to think through just how much data gets exposed in some of these larger breaches. The crazy part is that the three we're showing here are just the first, second and third that I found in my inbox. There's nothing particularly interesting about any one of these three breaches, except that it's personal data, it's customer data. Even more so, when you look at something like breastcancer.org (say that ten times fast), it was personal images. These are things that really make a huge difference, and we need to care about them, because we need to protect folks. I don't want to go in and shame any particular organization, but we all have this potential, right? We all have data from customers that we need and have a responsibility to protect. So let's do that, right?
            
            
            
The reality is that we're going to take that seriously. So the first thing we're going to do is add ourselves some security tooling, and I think the starting point here is a CSPM tool. If this were a live studio audience, I'd ask you all to raise your hands if you know what a CSPM tool is. But since I can't do that, I'm going to go ahead and define it, so that we're all using the same meaning for the same acronyms.
            
            
            
CSPM is cloud security posture management. In a nutshell, you apply policies, and you get alerts when things have incorrect configuration, or configuration that's not secure by some definition. Here's the way that plays out. We install these tools and, you know what, we're going to lean on people who should know these things better than us: we're going to apply the AWS CIS benchmark policy. For those who aren't aware, CIS is the Center for Internet Security, and they do a fantastic job putting together a set of benchmarks.
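A toy sketch of what "apply policies, get alerts" means for S3, assuming a hypothetical inventory of bucket settings that has already been collected (the `Bucket` fields below are illustrative, not any CSPM's schema). Each policy is a predicate over one bucket, and every failed predicate becomes an alert.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Bucket:
    # Hypothetical inventory record; a real CSPM collects these via cloud APIs.
    name: str
    encrypted_at_rest: bool
    denies_http: bool
    mfa_delete: bool
    blocks_public_access: bool


# Each policy returns True when the bucket is compliant.
POLICIES: Dict[str, Callable[[Bucket], bool]] = {
    "encryption at rest": lambda b: b.encrypted_at_rest,
    "deny HTTP requests": lambda b: b.denies_http,
    "MFA delete enabled": lambda b: b.mfa_delete,
    "block public access": lambda b: b.blocks_public_access,
}


def evaluate(buckets: List[Bucket]) -> List[Tuple[str, str]]:
    """Return one (bucket, policy) alert for every policy a bucket fails."""
    return [
        (bucket.name, policy)
        for bucket in buckets
        for policy, is_compliant in POLICIES.items()
        if not is_compliant(bucket)
    ]
```

For example, a log bucket with everything but MFA delete produces one alert, while a default-configured bucket produces four; multiply that across hundreds of buckets and the alert volume adds up quickly.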
            
            
            
We all feel good, right? We're going to know all about our environment, and it plays out really well, because we come back into our CSPM tomorrow, once all the policies have run, and then we find ourselves in the abyss.

So let's talk about this a little bit, because anybody who's done this before knows where I'm headed. There are five controls in section 2.1 of the AWS CIS benchmark, which deals with the security of S3 buckets. I'm only going to deal with the automated ones, because those are the ones any CSPM tool is actually going to evaluate against. So let's look at these.

First off: ensure all S3 buckets employ encryption at rest. This makes sense, right? Until you realize that there are lots of places where you wouldn't necessarily want to use encryption at rest: for example, things that are intentionally made public, or, my favorite, things that have heavy read loads. Let's just say I got to know the CFO really well after some mistakes made with Athena and KMS and the cost of reading from Athena. There's a great story there, so catch up with me afterwards to dig into it. But I digress.
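A minimal sketch of the default-encryption setting behind this control. It only builds the dictionary that boto3's `put_bucket_encryption` accepts as `ServerSideEncryptionConfiguration`; whether a given bucket should use SSE-S3 or SSE-KMS is an assumption you'd adapt. For read-heavy SSE-KMS buckets, enabling S3 Bucket Keys reduces the number of requests S3 makes to KMS, which is where per-read KMS charges come from.

```python
from typing import Optional


def default_encryption_config(kms_key_arn: Optional[str] = None) -> dict:
    """Build a ServerSideEncryptionConfiguration for put_bucket_encryption.

    Without a KMS key this uses SSE-S3 (AES256), which adds no per-request
    KMS charges. With a key it uses SSE-KMS plus an S3 Bucket Key.
    """
    if kms_key_arn is None:
        rule = {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
    else:
        rule = {
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": kms_key_arn,
            },
            # Bucket Keys let S3 reuse a bucket-level key, so far fewer
            # requests (and charges) reach KMS on read-heavy workloads.
            "BucketKeyEnabled": True,
        }
    return {"Rules": [rule]}
```

Applying it would look like `boto3.client("s3").put_bucket_encryption(Bucket="example-bucket", ServerSideEncryptionConfiguration=default_encryption_config())`, where `example-bucket` is a placeholder name.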
            
            
            
2.1.2: ensure the S3 bucket policy is set to deny HTTP requests. This is really a way of prohibiting what could effectively be anonymous calls, right? If you're coming in via HTTP, you're likely not authenticated to S3, and that's what this control wants to stop. But there are lots of reasons you might want HTTP turned on, right? We may want to serve up images, or serve up things that come in directly over various protocols, like CloudFormation, that sort of thing. So there are lots of legitimate reasons why that may be a thing.

Next: ensure MFA delete is enabled on S3 buckets. Pro tip: if you use MFA delete, you're going to need to use the root account to delete anything that has MFA delete turned on. So this seems really good in theory, but in practice it's absolutely terrible. I don't think I have to explain to this group why you shouldn't be logging in as the root user, and any security policy that binds you to accessing the account as the root user likely has some concerns.

And then finally, block public access. Well, first off, AWS, by default, when it creates buckets for you, doesn't tick this box, and it gets really interesting when that plays out.
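A sketch of the settings these three controls look for, built as plain dictionaries that boto3's `put_bucket_policy` (`Policy`), `put_public_access_block` (`PublicAccessBlockConfiguration`) and `put_bucket_versioning` (`VersioningConfiguration`, `MFA`) accept. The deny-HTTP check is conventionally met by denying any request where `aws:SecureTransport` is false. S3 static website endpoints only serve HTTP, so a bucket served that way would likely break under this policy, which is one concrete case of the legitimate exceptions mentioned above. The bucket name and MFA device ARN used in the example are placeholders.

```python
import json


def deny_insecure_transport_policy(bucket: str) -> str:
    """Bucket policy that denies every request not made over TLS."""
    arn = f"arn:aws:s3:::{bucket}"
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [arn, f"{arn}/*"],  # the bucket and every object in it
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    })


# All four Block Public Access switches turned on.
BLOCK_ALL_PUBLIC_ACCESS = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def mfa_delete_versioning_args(mfa_serial: str, mfa_code: str) -> dict:
    """Keyword arguments for put_bucket_versioning enabling MFA delete.

    AWS requires the bucket owner's root user, with its MFA device, for this
    call, which is exactly the operational pain described in the talk.
    """
    return {
        "VersioningConfiguration": {"Status": "Enabled", "MFADelete": "Enabled"},
        # Format: device serial number (or ARN), a space, then the current code.
        "MFA": f"{mfa_serial} {mfa_code}",
    }
```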
            
            
            
So think a little bit about that. Here's the reality: based on what we just talked about, 95% to 100% of your buckets are going to get flagged. They are absolutely going to show up as problems, as being in violation of that security policy. And when you get to a point where 95% to 100% of a given asset type fails by default, those checks are kind of useless. It's really hard to think of a world in which it makes sense that everything is in violation of the policy. For me, what's really critical here is that I now have no ability to prioritize what's bad, because it's all bad, right? The sky is falling. Well, which part of the sky am I even caring about at this point?
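One way to put a number on "it's all bad" is to measure, per check, what fraction of buckets fail it. A check that fails on nearly every bucket carries almost no information for prioritizing. This sketch assumes alerts arrive as hypothetical `(bucket, check)` pairs; the 95% threshold simply mirrors the figure in the talk.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def failure_rates(alerts: Iterable[Tuple[str, str]], total_buckets: int) -> Dict[str, float]:
    """Fraction of all buckets that fail each check."""
    failing = defaultdict(set)
    for bucket, check in alerts:
        failing[check].add(bucket)  # a set, so duplicate alerts count once
    return {check: len(names) / total_buckets for check, names in failing.items()}


def low_signal_checks(rates: Dict[str, float], threshold: float = 0.95) -> List[str]:
    """Checks failing on (nearly) every bucket: they cannot help you prioritize."""
    return sorted(check for check, rate in rates.items() if rate >= threshold)
```

Checks that end up in the low-signal list are candidates for tuning or scoping, rather than for twenty identical tickets.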
            
            
            
So I think the real question now becomes: what can we focus on, drilling in and thinking a little bit differently about what data we need to know and what information we need to be aware of to succeed in this arena?

We'll start with: where did the data come from? There's a bit of history around this first point, and I want to call it out because a lot of folks aren't aware of it. Back in the day (and since I was talking about EMC and NetApp, you've probably got a good feel that I'm a little on the older side and have been around the block a couple of times), AWS only let you have so many S3 buckets in a given account. One of the workarounds was to store things that were loosely affiliated, but not necessarily the same data, in a single bucket. So you might have your images, or static assets if you will, in one prefix, maybe some customer data in another prefix, and then maybe some separate application data in yet another prefix, because you only got so many buckets: the limit was in the 100-bucket range. At one time that was a hard limit; you couldn't actually get AWS to raise it unless you were super special. That limit has since been lifted, which is fantastic. But those buckets still exist, those applications still write to those places, and it's still a thing.
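A quick way to spot these legacy mixed-use buckets is to count objects per top-level prefix. This sketch works on a plain list of object keys (for example, gathered from `list_objects_v2` pages); the `(root)` label for keys without a prefix and the two-prefix threshold are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable


def top_level_prefixes(keys: Iterable[str], delimiter: str = "/") -> Counter:
    """Count object keys by their first path segment."""
    counts: Counter = Counter()
    for key in keys:
        head, sep, _ = key.partition(delimiter)
        counts[head + sep if sep else "(root)"] += 1
    return counts


def is_mixed_use(keys: Iterable[str], min_prefixes: int = 2) -> bool:
    """Heuristic: several distinct top-level prefixes suggest unrelated data sharing a bucket."""
    prefixes = [p for p in top_level_prefixes(keys) if p != "(root)"]
    return len(prefixes) >= min_prefixes
```

For large buckets, calling `list_objects_v2` with `Delimiter="/"` returns the top-level `CommonPrefixes` directly, without listing every key.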
            
            
            
What region is it in? I think it's really important to reason about the regionality of the data, because a lot of times it doesn't matter whether it's protected. You can have stuff that's completely and properly protected and still be in violation of compliance requirements, because you've got data in a region where it shouldn't be. Not to mention, from my perspective, it's really interesting: we've got a map at Open Raven where you can look at your infrastructure, and one of the first things that catches a lot of customers' eyes, and why I'm always a super big fan of it, is that you look at it and go, wait, why do I have stuff in ap-southeast-1? I shouldn't have anything there. Sometimes it's, oh, we turned on AWS Config and it put a bucket there. Fantastic. Or you hover over and look at the buckets and go, yeah, that shouldn't be there at all; we need to go take care of that. So I think that's a really valuable tool.
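A minimal region allow-list check, assuming you've already collected each bucket's `LocationConstraint` from `get_bucket_location`. Two quirks of that API are handled: buckets in us-east-1 report a null constraint, and some older eu-west-1 buckets report the legacy value `EU`. The bucket names and allow-list in the example are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def normalize_location(location_constraint: Optional[str]) -> str:
    """Map a get_bucket_location LocationConstraint to a region code."""
    if not location_constraint:
        return "us-east-1"  # us-east-1 buckets report a null constraint
    if location_constraint == "EU":
        return "eu-west-1"  # legacy value for older eu-west-1 buckets
    return location_constraint


def buckets_outside(locations: Dict[str, Optional[str]], allowed: Iterable[str]) -> List[str]:
    """Names of buckets whose region is not on the allow-list."""
    allowed_set = set(allowed)
    return sorted(
        name for name, loc in locations.items()
        if normalize_location(loc) not in allowed_set
    )
```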
            
            
            
What apps actually write into this bucket? I'll talk about the write piece a little more later, but it's about understanding which apps send data to that bucket and keeping that in mind. The other thing is: is everything coming from automated processes, or is the bucket being manually uploaded to? That becomes a really interesting question, because when you look at some of the breaches, it's not uncommon that a backup got uploaded to the wrong spot, or to a place that someone thought was safe but wasn't, because they were uploading it manually and none of the other controls from the application side were in place. So I think it's really critical to look at that and reason it through: is it normal for this bucket to be manually uploaded to, where someone could accidentally upload the wrong thing?
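CloudTrail is one place to answer "automated or manual?", but only if S3 data events are enabled, since object-level calls like `PutObject` aren't logged by default. This sketch assumes already-parsed CloudTrail records (such as the `Records` array of a log file) and uses a rough, assumed heuristic: root and IAM-user writes, plus sessions from IAM Identity Center roles (named `AWSReservedSSO_*`), are treated as likely human; everything else as likely automated.

```python
from collections import Counter
from typing import Dict, Iterable

# CloudTrail S3 data-event names that put new object data into a bucket.
WRITE_EVENTS = {"PutObject", "CopyObject", "CompleteMultipartUpload"}


def classify_writer(identity: dict) -> str:
    """Heuristic guess (assumption) at who is behind a CloudTrail userIdentity."""
    kind = identity.get("type")
    arn = identity.get("arn", "")
    if kind in ("Root", "IAMUser"):
        return "likely-human"
    # Roles provisioned by IAM Identity Center are named AWSReservedSSO_*.
    if kind == "AssumedRole" and "AWSReservedSSO_" in arn:
        return "likely-human"
    return "likely-automated"


def writers_by_bucket(records: Iterable[dict]) -> Dict[str, Counter]:
    """Count write events per bucket, split by the heuristic above."""
    summary: Dict[str, Counter] = {}
    for rec in records:
        if rec.get("eventSource") != "s3.amazonaws.com":
            continue
        if rec.get("eventName") not in WRITE_EVENTS:
            continue
        bucket = (rec.get("requestParameters") or {}).get("bucketName", "unknown")
        label = classify_writer(rec.get("userIdentity") or {})
        summary.setdefault(bucket, Counter())[label] += 1
    return summary
```

A bucket that is normally written only by an application role but suddenly shows "likely-human" writes is exactly the backup-in-the-wrong-place pattern described above.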
            
            
            
Next, we really want to talk about what kind of data is in the bucket. This seems really straightforward, and you can take a bunch of different approaches to figure out what's there. If there's protected health information, if there's personally identifiable information, you should know; hopefully you're going to want to know if it's there. On one hand, we can absolutely go talk to each individual person, but if you're in a large organization, that probably won't work super well. So you can use tools like Open Raven or Amazon Macie to go and classify the data that's inside the buckets. On the Open Raven side, you can do the same with your RDS instances as well, and we're looking to expand beyond just S3. We've got a bunch of stuff coming down the pipe, and it's going to be exciting. But you need to know what kind of data is there.
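To make "classify the data" concrete, here is a deliberately tiny pattern-matching classifier. It is not how Macie or Open Raven work; the three patterns (email, US SSN format, and Luhn-valid 13 to 16 digit card numbers) are illustrative assumptions and would produce both misses and false positives on real data.

```python
import re
from typing import Set

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
CARD_CANDIDATE = re.compile(r"\b\d{13,16}\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum, used to cut false positives on card-like digit runs."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def classify(text: str) -> Set[str]:
    """Return the set of sensitive-data labels whose patterns appear in text."""
    labels = {label for label, pattern in PATTERNS.items() if pattern.search(text)}
    if any(luhn_valid(m) for m in CARD_CANDIDATE.findall(text)):
        labels.add("payment_card")
    return labels
```

Run over a sample of objects per bucket, the resulting labels are what you'd join against the region and writer information above.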
            
            
            
This one always makes me laugh a little bit, because the first place we always jump to is: who owns it? And this would be amazing to know. I would love to know who owns the data. The problem, and I want to call it out here, is that while it's a great thing to know, the reality is that you're probably not going to know. It's going to be hard to track down who owns it. And just because someone owns it doesn't necessarily mean they have control over, or any semblance of understanding of, what's actually going into the bucket. I think it's a lot more critical to understand who can write to the bucket. When you understand how data can get into the bucket, you can start from there. So even if one team owns the data in that bucket, can applications owned by other teams write into that bucket, whether on purpose or by accident? Are there other opportunities for people to, once again, manually upload into it? You can use tools like Open Raven here: we've got a feature coming out, API-only for now but available in our UI soon, where you can actually go in and ask, okay, which security principals have the ability to write into this bucket? You can also use tools like Ermetic, which does a bunch of things around IAM, to better understand who can read from and write to a bucket. But it's so common for us to focus on who can read from it. I think the starting point should be who can write to it, because that's where you can actually start to identify your real risk.
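A first-pass sketch of "who can write to this bucket?" that reads only the bucket policy (the JSON string returned in `get_bucket_policy(...)["Policy"]`). It lists principals granted object writes by `Allow` statements, matching action wildcards case-insensitively. It ignores `Deny`, `NotAction`/`NotPrincipal`, conditions, IAM identity policies and ACLs, so it under-reports real access; the chosen write actions are an assumption.

```python
import json
from fnmatch import fnmatchcase
from typing import List

# Object-write actions to look for (assumption; extend as needed).
WRITE_ACTIONS = ("s3:putobject", "s3:putobjectacl")


def _as_list(value):
    """Policy fields may be a single value or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def principals_that_can_write(policy_json: str) -> List[str]:
    """Principals granted object writes by Allow statements in a bucket policy."""
    policy = json.loads(policy_json)
    writers = set()
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        # IAM action names are case-insensitive and may use * wildcards.
        patterns = [a.lower() for a in _as_list(stmt.get("Action"))]
        if not any(fnmatchcase(act, pat) for pat in patterns for act in WRITE_ACTIONS):
            continue
        principal = stmt.get("Principal")
        if principal == "*":
            writers.add("*")  # anyone at all
        elif isinstance(principal, dict):
            for kind, ids in principal.items():
                for ident in _as_list(ids):
                    writers.add("*" if ident == "*" else f"{kind}:{ident}")
    return sorted(writers)
```

A result containing `"*"` means anyone can write; a long list of unrelated roles is the "everybody and their brother" case.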
            
            
            
So I've talked about a lot of the what, right? We want to know all of those things. But I think it's really critical to think in a different way: think about where we can start, and how we really enable teams to start taking next steps.
            
            
            
So the first thing is: don't protect data that doesn't need protecting, right? If it isn't there, you don't have to do anything with it. I want to call out a couple of things here. First off, use S3 Intelligent-Tiering. This is going to sound silly, but it gives you the ability to get an alert about the state of the world that isn't directly tied to all the security tooling. If you're using Intelligent-Tiering and all of a sudden you start accessing a bunch of stuff, and it's changing tiers so that your costs go up, you're going to see that. The reality is that we're all watching cost a heck of a lot more closely than a lot of these security tools, because while the security team is looking at the security tools, cost is being looked at by everyone. So we can use things like Intelligent-Tiering to save money, because things shouldn't be accessed all the time.
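A minimal sketch of turning that cost signal into an alert: flag any day whose S3 cost exceeds a multiple of the trailing average. The daily cost series is assumed to come from your billing data (for example, a cost export), and the 7-day window and 1.5x factor are arbitrary starting points, not recommendations.

```python
from statistics import mean
from typing import List, Sequence


def cost_spikes(daily_costs: Sequence[float], window: int = 7, factor: float = 1.5) -> List[int]:
    """Indices of days whose cost exceeds `factor` times the trailing `window`-day mean."""
    spikes = []
    for day in range(window, len(daily_costs)):
        baseline = mean(daily_costs[day - window:day])
        if baseline > 0 and daily_costs[day] > factor * baseline:
            spikes.append(day)
    return spikes
```

With a flat week at 10 followed by `[10, 11, 30, 10]`, only the jump to 30 (index 9) is flagged; with Intelligent-Tiering in place, a jump like that suggests cold data suddenly being read.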
            
            
            
              And it gives us the ability to see those anomalies in
            
            
            
              the system. The next thing is applying lifecycle
            
            
            
              rules, and this ties really closely with using data retention
            
            
            
              rules. So lifecycle rules are the technical implementation,
            
            
            
              right? I go into the s three bucket and I say, hey,
            
            
            
              after some period of time, delete this thing.
            
            
            
              Data retention rules are the business side of that, right?
            
            
            
              It's the hey, we're dealing with healthcare data,
            
            
            
              so it must be kept for 24
            
            
            
              months, five years, whatever it happens to be. But on five years
            
            
            
              in one day, we can get rid of it and we should get rid of
            
            
            
              it. And so the real key becomes, can you use
            
            
            
              something like lifecycle rules on those s
            
            
            
              three buckets to remove that data so that you don't end
            
            
            
              up having to protect it going forward?
            
            
            
              There are also some great conversations to be had about keeping data that you don't need and how it plays into legal matters like discovery and whatnot.
            
            
            
              That's a little bit broader than this talk goes into, but I think more
            
            
            
              than anything, there's no reason to protect things that don't need to exist. So get rid of them, so that you're not protecting things unnecessarily.
            
            
            
              Manage your riskiest buckets first. I think it goes without saying that public buckets are, by definition, going to be the riskiest. The problem is that our conversations normally stop there. Public buckets are a good place to start, but also look for a couple of other things. Look for broad write
            
            
            
              permissions. If you can find and track down places where everybody and their brother is able to write into an S3 bucket, you've probably got a problem, because it's much easier for something to be exposed there than it would be if only two or three applications are able to write, or if no human users are able to write into that S3 bucket. So that becomes a really important thing. And then one of
            
            
            
              the things that we found in our environment is that, backups aside, a real indicator that you've got actual, legitimate data somewhere is lots and lots of small files, whether it's lots of images being uploaded by customers or JSON, that sort of thing. A large number of files tends to indicate that some automated process is putting data in there, and that's a really good place to start, because it's actual data coming from customers and not just a dump of some source code archive out of npm or something like that. We see all sorts of fun things, but I think the biggest thing is to focus on those large numbers of small files as a good place to start managing for risk.
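The three signals above (public access, broad write permissions, and large numbers of small files) could be combined into a rough triage order. This is a hypothetical sketch: the `BucketInfo` record, the weights, the thresholds, and the placeholder account ID are assumptions chosen for illustration. The policy check only inspects `Allow` statements' `Action` and `Principal` fields; it ignores `NotAction`, `Condition`, ACLs, and IAM identity policies.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class BucketInfo:
    # Hypothetical inventory record, e.g. assembled from bucket policies,
    # public-access checks, and ListObjectsV2 or S3 Inventory counts.
    name: str
    is_public: bool
    policy: dict = field(default_factory=dict)
    object_count: int = 0
    total_bytes: int = 0


def _as_list(value):
    # IAM policy fields may hold either a single value or a list.
    return value if isinstance(value, list) else [value]


def write_principals(policy: dict) -> set:
    """Principals that an Allow statement lets call s3:PutObject."""
    principals = set()
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = [a.lower() for a in _as_list(stmt.get("Action", []))]
        # Wildcards such as "s3:*", "s3:Put*", or "*" also grant PutObject.
        if not any(fnmatchcase("s3:putobject", a) for a in actions):
            continue
        principal = stmt.get("Principal", {})
        if principal == "*":
            principals.add("*")
        elif isinstance(principal, dict):
            for values in principal.values():
                principals.update(_as_list(values))
    return principals


def risk_score(b: BucketInfo, many_files: int = 10_000, small_bytes: int = 1_000_000) -> int:
    """Arbitrary illustrative weights: a higher score means look at it sooner."""
    score = 100 if b.is_public else 0
    writers = write_principals(b.policy)
    if "*" in writers:
        score += 50  # anyone can write
    elif len(writers) > 3:
        score += 20  # more than "two or three applications" can write
    if b.object_count >= many_files and b.total_bytes / b.object_count < small_bytes:
        score += 30  # lots of small files: likely real, automated customer data
    return score


# Placeholder roles in the documentation example account 111122223333.
app_roles = [f"arn:aws:iam::111122223333:role/app-{i}" for i in range(5)]
buckets = [
    BucketInfo("npm-archive", False, object_count=3, total_bytes=2_000_000_000),
    BucketInfo(
        "customer-uploads",
        False,
        policy={"Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": app_roles},
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::customer-uploads/*",
        }]},
        object_count=2_000_000,
        total_bytes=400_000_000_000,
    ),
    BucketInfo("static-site", True, object_count=50, total_bytes=5_000_000),
]
for b in sorted(buckets, key=risk_score, reverse=True):
    print(b.name, risk_score(b))
```

With these made-up inputs, the public bucket ranks first, the bucket with many writers and millions of small files ranks second, and the archive with a few large files ranks last.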
            
            
            
              Ultimately, for me the biggest thing is this, and yes, I get it, I work for Open Raven, and there's a reason why I do: I believe that understanding your data classification, being able to understand what is actually out there in your world, matters. It's so critical to be able to go out and say, this is what's in that bucket. And you can start really simply, whether it's using Open Raven or Amazon Macie: go do some scans, understand what you've got out there, and from there, run those scans regularly, making sure that you are actually checking for things. And one of the cool things we do at Open Raven is cache the results, right? If the file hasn't changed, because the ETag hasn't changed, we're not going to rescan that object in S3, because we know it hasn't changed. Those are the sorts of things we do.
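The result caching described above can be sketched as a small wrapper around whatever classifier you use. This is not Open Raven's implementation: `ScanCache` and `fake_scanner` are hypothetical names, and the ETag strings are made up. In practice, each object's ETag is available in the `Contents` entries returned by S3's ListObjectsV2, so you can decide whether to rescan without downloading the object.

```python
from typing import Callable, Dict, List, Tuple

Findings = List[str]


class ScanCache:
    """Skip rescanning S3 objects whose ETag hasn't changed since the last scan."""

    def __init__(self, scanner: Callable[[str, str], Findings]):
        self._scanner = scanner
        # (bucket, key) -> (ETag seen at scan time, findings from that scan)
        self._cache: Dict[Tuple[str, str], Tuple[str, Findings]] = {}
        self.scans = 0  # how many times the (expensive) scanner actually ran

    def findings(self, bucket: str, key: str, etag: str) -> Findings:
        cached = self._cache.get((bucket, key))
        if cached is not None and cached[0] == etag:
            return cached[1]  # unchanged object: reuse the previous results
        result = self._scanner(bucket, key)
        self.scans += 1
        self._cache[(bucket, key)] = (etag, result)
        return result


def fake_scanner(bucket: str, key: str) -> Findings:
    # Hypothetical stand-in for a real classifier.
    return ["email_address"] if key.endswith(".json") else []


cache = ScanCache(fake_scanner)
cache.findings("uploads", "user-1.json", '"etag-v1"')  # first sighting: scanned
cache.findings("uploads", "user-1.json", '"etag-v1"')  # same ETag: cache hit
cache.findings("uploads", "user-1.json", '"etag-v2"')  # object overwritten: rescanned
print(cache.scans)  # 2
```

Keying on the ETag is conservative in one direction: the same content uploaded with different multipart settings can produce a different ETag and trigger an unnecessary rescan, but changed content is not skipped.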
            
            
            
              And then, more than anything, you need to have rules in place to alert on the things that are actually critical, right? You want to know if you find European data in a US region. You want to know if you find PII in a bucket that's open, and that's the real critical differentiator, right? It's not that you found PII, and it's not that you have an open bucket; it's that you have PII in an open bucket, and it's those sorts of things that really provide the value. So,
            
            
            
              to summarize, I think the real key is these three things.
            
            
            
              You turn on Intelligent-Tiering, and it will get more eyes on the problem, because if costs jump heavily, you'll know that someone is accessing data that shouldn't be accessed. Classify your data; go figure out what you've got. And then use those retention policies and lifecycle policies to delete the stuff you don't need. Ultimately, that's really going to be the game changer for you going forward.
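The alerting rules described earlier (PII in an open bucket, European data in a US region) can be sketched as simple checks over per-bucket classification results. The `BucketScan` record and the data-type labels `"pii"` and `"eu_personal_data"` are hypothetical; a real setup would map them from whatever your classifier reports. Treating every region that starts with `us-` as a US region is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class BucketScan:
    # Hypothetical summary of one bucket's classification results.
    bucket: str
    region: str           # e.g. "us-east-1", "eu-west-1"
    is_public: bool
    data_types: Set[str]  # hypothetical labels, e.g. {"pii", "eu_personal_data"}


def alerts(scan: BucketScan) -> List[str]:
    found = []
    # Rule 1: neither PII alone nor a public bucket alone alerts; the combination does.
    if scan.is_public and "pii" in scan.data_types:
        found.append(f"{scan.bucket}: PII in a public bucket")
    # Rule 2: European personal data stored in a US region.
    if scan.region.startswith("us-") and "eu_personal_data" in scan.data_types:
        found.append(f"{scan.bucket}: EU personal data in {scan.region}")
    return found


scans = [
    BucketScan("marketing-site", "us-east-1", True, set()),
    BucketScan("crm-exports", "eu-west-1", False, {"pii", "eu_personal_data"}),
    BucketScan("support-attachments", "us-west-2", True, {"pii", "eu_personal_data"}),
]
for s in scans:
    for message in alerts(s):
        print(message)
```

Neither the public marketing site nor the private EU bucket holding PII triggers an alert here; only the combinations do, which keeps the alert volume focused on the critical cases.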
            
            
            
              So with all of these things said,
            
            
            
              I want to thank you for joining me for my talk. I can be
            
            
            
              found on the interwebs, you can find me on Twitter, hit me up via
            
            
            
              email, and I'm trying the new Mastodon thing. We'll see how that plays out.
            
            
            
              But I hope you've enjoyed this talk, and I'm looking forward to catching
            
            
            
              up with you in Discord.