Conf42 Cloud Native 2022 - Online

Understanding Cloud Control Plane Compromise Attacks

Abstract

When the headline reads “Cloud Breach Due to Misconfiguration”, this is only a small part of the story, causing teams to focus solely on eliminating cloud resource misconfigurations and getting a false sense of security.

What’s missing in these stories is the series of moves attackers make to discover knowledge about the cloud environment, move laterally, and ultimately extract data without detection. When they gain access to an environment, they’re after API keys that enable them to begin operating against the API control plane of the cloud provider. And once a control plane compromise attack begins, it’s too late to stop it.

In this session, Josh Stella - Chief Architect at Snyk - will deconstruct how control plane compromise attacks go down in the cloud, and how teams can recognize and address the architectural design flaws in their cloud environment that make them vulnerable.

You’ll walk away from this session with an understanding of:

  • How cloud hackers think and operate in order to steal data
  • What questions you should be asking about the security of your cloud environment
  • Why cloud security is a design problem, and what secure cloud design looks like

Summary

  • Josh Stella is a chief architect at Snyk. The topic today is understanding cloud control plane compromise attacks and, to prevent these, a methodology of how to think about the problem.
  • The cloud fundamentally changed computing security, and it did this because it is software defined. It's programmable, so you can fully automate things. It is dynamic and ephemeral, particularly if you are using truly cloud native patterns.
  • Hackers also started acting differently concurrently with the rise of cloud. The much more common methodology of the hacker these days is to first look at public-facing endpoints for vulnerabilities. You'll need to be using the cloud in its most effective, rapid, and correct way to build systems, compete, and stay safe.
  • Five fundamentals of cloud security. At the center is policy as code: because the cloud is software defined and programmable, we can express security not in Excel spreadsheets but literally as code. This is a massive leap forward for security.
  • The only way you're going to get in front of cloud security issues is by changing the way you build software. You need to be using policy as code to check things like infrastructure as code. This has to be integrated into the CI/CD pipeline for prevention and to educate the people building things.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi, I'm Josh Stella. I am a chief architect here at Snyk. Prior to that, and when presenting at Conf42 in the past, I was the co-founder and sometimes CTO, sometimes CEO of Fugue, but now Fugue has joined Snyk and we're building some awesome stuff. So I get to go back to designing computer software, which is what I'm most passionate about. Okay, great to be back here at Conf42. The topic today is understanding cloud control plane compromise attacks and, to prevent these, a methodology of how to think about the problem so that you can hopefully not be one of those folks who end up in the news. Right? None of us want that. All right. The cloud fundamentally changed computing security, and it did this because it is software defined. And when you fully automate something and make it software defined, it has very different behavioral characteristics than a manually assembled pile of atoms in a data center, right? You can bring things into existence and modify them nearly instantaneously out of this, effectively, for most people, infinite pool. So it really changed what you're securing and the lifespan of the things you're securing. It changed the amount of time you have to secure them, right? So in the data center, we bought boxes, we stacked them with humans, we ran cables with humans, right? And therefore, it was relatively static and persistent. A fairly normal recapitalization cycle would be three or five years. You buy a new switch, and in somewhere between three and five years, you'll look to the replacement for that switch. That's the upgrade. The same with a server or a network-attached storage device or whatever. But these were long-lived static entities relative to cloud. And as a result, we gave them names, right? Those were like permanent names.
There was typically a team that owned those assets in terms of making sure they kept running, had adequate power, had HVAC, and had, working with the security team, the correct defenses around them. Typically defined in the network, right? I mean, you had some operating system level stuff too, of course, both in terms of hardening guidelines and in terms of software you'd run to secure servers, anti-malware stuff and things like that. But a lot of it in the data center era was TCP/IP-based defense in depth, based on the network, with devices like firewalls and intrusion detection systems, all these colorful one and two U units in your racks coming out of security companies. The scale, even in big data centers, was typically thousands of components, as opposed to tens or hundreds of thousands or millions of components. And by component, I mean something screwed into a rack, not every chip on every board. And the services were pretty simple. When you look at the services that a data center offers to a business unit or an application development group, they tend to be big chunks like compute, which might be defined as virtual machines or containers, but some kind of compute, and some storage, typically block storage, right? So your application is defining how that block storage is used, right? It's a disk, remote or local. There are network services around DNS, et cetera. But even if you were operating in an environment with a lot of these things, monitoring services, what have you, they're probably in the tens, I would argue, and the big ones are a handful. Compute, storage, network are the really big ones. Well, okay, let's look at the cloud and how different this world is. So when you were procuring resources for the data center, it was a hardware procurement cycle. You had to ship things made of metal around. In the cloud, it's software defined from a user perspective, right? It's an API call away.
Now, there's still a box made of metal, but it's already sitting there at Amazon or Microsoft or Google or the real cloud providers. Because it's software defined, that means it is expressed in the form of APIs, and that means it's programmable. And we know what that means, right? When you can program something, well, first of all, your mistakes can be much costlier, much faster. Your errors are executed at the speed of electrons firing across circuits. You don't have that "wait a minute, maybe I shouldn't do that," right? It's programmable, so you can fully automate things. But you can also use decades of knowledge about programming and computer science and software engineering to make things right, to check things, to test things. It is dynamic and ephemeral, particularly if you are using truly cloud native patterns: immutable infrastructure, serverless functions, containers if done right. So where you had a box that you would slide into a rack and name Frodo in the past, now you have an instance or a container or a serverless function that comes and goes in hours, minutes or seconds. It never gets a name. It just has to join some service group and become aware of its role while it is needed. There's no friction here. There's very little friction here. And because of that, the developers own a lot of this stuff. They build a lot of this stuff. My background is as a developer and as a software architect and not as a data center operations person or a security person. Those folks don't tend, out of the gate, to know as much about code as developers, which is natural, right? That's because of how the data center worked. But in the era of cloud security, this stuff is all programmable. If humans can put an API on something and program it and automate it, we're going to do it. And the cloud is kind of that for infrastructure.
So you end up with developers, maybe not the ones living in the application development teams, maybe living in the security team, maybe living in the dedicated DevOps team or platform engineering team, owning how this cloud infrastructure is created, maintained, configured, et cetera. And the scale: I mean, at what was Fugue and is now becoming Snyk Cloud (I don't have any announcements today, but we're working on some really cool stuff), we manage millions of resources for our customers, resources being, again, not chips on boards, but servers, databases, network definitions, et cetera. There are a lot more objects. They're coming and going much faster, right? Dynamic and ephemeral, and at much bigger scale. And on top of that, there's more diversity in what they're made of, in what types they are. So in the data center, we primarily had storage, compute, network, and then there's a bunch of other stuff too, but it's kind of along those lines. In the cloud, how many different completely cloud service provider managed databases are there at Amazon? A dozen. Those are abstractions where the database is just directly accessible: DynamoDB, RDS, et cetera, and at Google, Bigtable. And the cloud providers are coming out with more of these all the time. And it's an imperative for them, it's a competitive imperative for them. So when you're thinking about security, you're no longer just thinking about TCP/IP packets traversing networks that lead to compute and storage. You're now thinking about how this container that only lives for two minutes accesses that database. And by the way, that is a completely separate control plane from the TCP/IP network. So, at the bottom, traditional security doesn't work or scale well in the cloud. Hackers also started acting differently concurrently with the rise of cloud. So you never know in this kind of scenario. Well, let me say I don't know in this scenario if it's coincident or causal.
However, the behavior has definitely changed. In the pre-cloud era, the most common successful attacks went like this: the attacker would pick a target, they would search that target for vulnerabilities, and then often they would end up doing a low, slow exfiltration, maybe things like outbound DNS requests containing little bits of the database they wanted to extract. And that is kind of the Hollywood version of a hack, right? And a couple examples of these: the Sony motion pictures hack that North Korea did, right? Sony was targeted because they made a movie that the authoritarian government in North Korea felt was insulting to them. And so they hacked all the executives' emails. They went after Sony in particular. We are seeing much less of that type of approach now. It still happens for sure. But the much more common methodology of the hacker these days is to first look at public-facing endpoints for vulnerabilities. So it really doesn't matter who you are. I mean, we know from the Capital One breach that that's how she discovered a vulnerability in Capital One. She wasn't targeting that bank, she was targeting anything with a vulnerability to assemble a list for her. And this was done through automation. Coming back to what we're talking about, programmable global infrastructure: you'll need to be using it in its most effective, rapid, and correct way in order to build systems, compete, and stay safe. But hackers are using it to their purposes, right? So they'll search for vulnerabilities, get a list of these things, pick a target from that list, and then very often do a very brutal, almost instantaneous smash and grab exfiltration. Just take all the data and go. And very often, as was the case in the Capital One breach, or in two of the Uber breaches, the exfiltration was not detectable on the customer-facing TCP/IP network. It had to be understood in the form of API calls.
So your traditional perimeter, data center, packet-watching security would not help you. I mean, it might help you at the margins. But if you look at all these breaches, the Twitch breach, the Capital One breach, et cetera, the Uber breaches, the cloud security company that got breached a few years ago, Imperva, whose database got hacked, there are some very common traits both to what happened in the breaches and also to how those things are described incorrectly by the press. Okay? And therefore people start thinking about the wrong things. So in Twitch's case, this huge 125 gigabyte data and source code leak, it says "due to server misconfiguration." Nonsense. Was a misconfigured server involved in the breach? Yeah, often there is, but that's not why the hackers got 125 GB of data from them. It was due to a poorly designed set of access controls and permissions based on identity. No misconfigured server should be able to connect and download all the source code for all the applications you have, including some source code from your parent company, Amazon. That is a design mistake, and there are probably a bunch of reasons for it, but that is not due to a misconfigured server. You're going to have misconfigured servers. You don't have to shoot yourself in the foot with bad architecture. Okay, the Capital One breach: more complicated than it looks. People attributed that to a misconfigured WAF. Yet again, the misconfigured server was the foot in the door, and the hacker didn't care about that misconfigured server. They went after S3 storage, which is an abstracted storage service. The server didn't matter that much, right? So what these folks are doing is cloud control plane compromise attacks. By control plane, what we mean are the APIs that allow modification, creation, deletion, read, and describe operations against cloud resources. That's what they're going after.
They're going after endpoints that are hosted by the CSP, by Amazon, by Microsoft, by Google, typically using stolen credentials that they're getting from somewhere. So that initial penetration, the misconfigured server, who cares, as long as that server doesn't allow for discovery and movement. All right, so all the way on the left is how you get in. Maybe you have a cloud misconfiguration: a dangerous port is left open during a maintenance window. Maybe you have an application vulnerability like Log4Shell or something like this. Maybe, as in the case of Uber, you get to GitHub (and I suspect this is probably the case in Twitch, though we don't know) and find API keys stored in source control repos that aren't rotated frequently, long-lived API keys. So that's the initial penetration, but then what's done with that? The goal is always to get to a bunch of other stuff, typically storage, typically data persistence of one form or another: databases, object stores, et cetera. That's kind of the norm. And so there is the automated detection of the misconfiguration: you've got this bot that's out there looking for anything that has a certain version of Apache running because it has a known CVE, or you have a bot that's looking for public-facing IP addresses with port 22 open, or whatever it is, and it discovers it. You then get a toehold in that initial penetration and then you start exploring from there. The vast majority of hacking, as is the vast majority of defense, is knowledge. It's understanding what's dangerous, understanding what's vulnerable and not making those errors, and understanding what you have. The hacker is going to do discovery and movement, and that typically leads to data extraction. I mean, there are other kinds of breaches: Stuxnet was sabotage, effectively, through controllers of uranium centrifuges. You can get defacement kinds of attacks like Anonymous will do, but most of the time, your economically motivated hackers are going to tend to steal data from you.
Okay, so here at Snyk, we've identified, among our customers who are really successful with this stuff, five fundamentals of cloud security. And for the remainder of this presentation, I'm just going to walk you through these steps. I promised you up front a methodology that would help you deal with the kinds of threats I just described for the last 20 minutes or so. All right, so these five fundamentals: there are really four things you'll do, four verbs here, and those are on the outside edges. We'll go through them one at a time. And in the center is this big noun: policy as code. Policy as code works because the cloud is software defined, is programmable. We can express security not in Excel spreadsheets and lists of rules in English or French or whatever human language you speak, but literally as code, as executable code. And this is a massive leap forward for security. So policy as code is essential. It is the method by which these other things are accomplished and by which you can secure your systems. If you're not doing policy as code, the hackers are faster than you because they're doing exploitation as code. So you can't use those old manual processes. Okay? So step one, know your environment. Much easier said than done, especially when, as a reminder, you've got these dynamic and ephemeral resources, and there are hundreds of thousands of them in a medium to large cloud environment. But that's just at any one moment in time. They also went through the software development lifecycle to become deployed assets. And so as you think about this, you have to understand them in all the contexts if you want to understand what's safe and unsafe.
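One way to picture "know your environment" for resources that come and go quickly is to compare inventory snapshots taken at two moments. The following is only a minimal sketch under stated assumptions: the function name `diff_inventory`, the resource IDs, and the config fields are all hypothetical, and a real inventory would come from the cloud provider's APIs rather than hand-written dictionaries.

```python
def diff_inventory(before: dict, after: dict) -> dict:
    """Compare two inventory snapshots.

    Each snapshot maps a (hypothetical) resource ID to a dict describing
    that resource's configuration at the time the snapshot was taken.
    """
    before_ids, after_ids = set(before), set(after)
    return {
        # Resources that exist now but did not exist earlier.
        "added": sorted(after_ids - before_ids),
        # Resources that existed earlier and are gone now.
        "removed": sorted(before_ids - after_ids),
        # Resources present in both snapshots whose configuration differs.
        "changed": sorted(
            rid for rid in before_ids & after_ids if before[rid] != after[rid]
        ),
    }


# Illustrative data only; these IDs and fields are made up.
earlier = {"vm-1": {"public": False}, "bucket-a": {"encrypted": True}}
later = {"vm-1": {"public": True}, "fn-7": {"runtime": "python"}}

changes = diff_inventory(earlier, later)
print(changes)
```

In this made-up example the interesting signal is that `vm-1` changed from private to public, which is the kind of drift a point-in-time list alone would not reveal.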
So, for example, we know from a Department of Justice filing that in the Capital One breach, S3 list permissions were left on a running virtual machine with a set of IAM API keys. Well, you have to have list permissions for S3 somewhere in your organization. Like when I go to the console or whatever it is, humans need to be able to see what's out there, right? But you really don't want a public-facing compute asset, a container or a virtual machine, even a serverless function, to be able to just get a list of all the locations in S3. That's extraordinarily dangerous. It's only dangerous because of that combination of things in this context: a compute instance that has those resources and that is connected to the Internet. Okay, so you've got a number of dimensions here. You'll have to know at any given moment in time the state of everything, the list of everything and what it's doing and its current state. And then you have to track change over time if you really want to understand what's going on. And by the way, the hackers do this, right? They are really good at discovery. That's why they are good at hacking, the successful ones anyway. Okay, the next step here is to prevent error. And that may sound a little strange, but the only way you're going to get in front of cloud security issues is by changing the way you build software, okay? You can't put devices on the network to solve the problem. The Imperva breach is another example, using identity. All right, so these are design issues, these are architecture issues. And what that means is you need to be using policy as code to check things like infrastructure as code, like Terraform templates or CloudFormation templates, before they're deployed, and give those developers who are writing that code the feedback that lets them act on things and prevent those errors from getting out there into the live system. And this has to be integrated into the CI/CD pipeline. Okay?
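The dangerous combination described above (compute that is reachable from the Internet and can also list S3) is the kind of context-dependent rule that can be written as code. What follows is a hedged sketch, not how any particular product works: `ComputeAsset` and `find_public_compute_with_s3_list` are hypothetical names, and the model is deliberately simplified. Real IAM policies involve resources, conditions, and wildcard matching that this does not attempt to evaluate. The action names `s3:ListAllMyBuckets` and `s3:ListBucket` are actual IAM actions.

```python
from dataclasses import dataclass, field

# IAM actions that allow enumerating S3, plus wildcards that include them.
S3_LIST_ACTIONS = {"s3:ListAllMyBuckets", "s3:ListBucket", "s3:*", "*"}


@dataclass
class ComputeAsset:
    """Hypothetical, simplified view of a VM, container, or function."""
    name: str
    internet_facing: bool
    allowed_actions: set = field(default_factory=set)


def find_public_compute_with_s3_list(assets):
    """Flag assets that are both internet-facing and able to list S3.

    Neither property alone is treated as a finding; only the combination is.
    """
    findings = []
    for asset in assets:
        risky = sorted(asset.allowed_actions & S3_LIST_ACTIONS)
        if asset.internet_facing and risky:
            findings.append(
                f"{asset.name}: internet-facing compute may list S3 via "
                + ", ".join(risky)
            )
    return findings


# Illustrative assets only; the names are made up.
assets = [
    ComputeAsset("edge-proxy", True, {"s3:ListAllMyBuckets", "s3:GetObject"}),
    ComputeAsset("admin-tooling", False, {"s3:ListAllMyBuckets"}),
    ComputeAsset("web-frontend", True, {"s3:GetObject"}),
]

findings = find_public_compute_with_s3_list(assets)
for line in findings:
    print(line)
```

Only `edge-proxy` is flagged here: `admin-tooling` can list S3 but is not public, and `web-frontend` is public but cannot list.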
If you really want to make this a practice, if you really want to get safe and secure on cloud, a big part of your effort is integrating your security tools (and of course Snyk does this with all of our tools, not just these cloud-oriented ones) into that CI/CD pipeline for prevention and to educate the people building things. That's a nice side effect of that. All right? And this gets to empowering your developers. So at one of our customers, there are over 2000 application developers in the various business units writing new applications for running their business. And the cloud security team and the platform team is like three people, all right? So when you have that scenario, and everyone does, whether they're fully aware of it or not, I guarantee you this is what's happening in your organization: the majority of people, by a very large margin, are much more concerned with shipping software and hitting quarterly numbers or mission goals or whatever, if you're a business or a government entity, than they are concerned with securing it. And what that means is the security teams need to act more like chief architects and provide their knowledge in a way that can be absorbed naturally by the people building stuff, which means in an automated way, right? So IaC security tools, which give feedback in the code editor and through the CI/CD pipeline. You need to provide the knowledge, the guidance as to what's secure and what's insecure, but not in emails and not in written form. You need to do it as policy as code. So when your developers are actually building a system, their tooling is telling them when they're making errors, and of course, with complete integration into the DevOps workflow. So right away, if you look at our five, the first one is know your environment, and then immediately we're going to prevention and secure design, and then to empowering the developers.
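To make "provide the guidance as policy as code, not in emails" a little more concrete, here is a rough sketch of a rule that carries its own remediation advice and produces an exit code a pipeline step could use. Everything in it is an assumption for illustration: the `Rule` type, the rule ID `NET-001`, and the flat resource dictionaries are hypothetical and do not reflect the format of Terraform, CloudFormation, or any specific tool.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    """A policy plus the guidance a developer should see when it fails."""
    rule_id: str
    guidance: str
    is_violation: Callable[[dict], bool]


def ssh_open_to_world(resource: dict) -> bool:
    """True if a (hypothetical) firewall rule exposes port 22 to any address."""
    if resource.get("type") != "security_group_rule":
        return False
    covers_ssh = resource.get("from_port", 0) <= 22 <= resource.get("to_port", -1)
    return resource.get("cidr") == "0.0.0.0/0" and covers_ssh


RULES = [
    Rule(
        "NET-001",  # made-up identifier
        "Restrict port 22 to a known CIDR range instead of 0.0.0.0/0.",
        ssh_open_to_world,
    ),
]


def review(resources, rules=RULES):
    """Return one developer-facing message per violated rule per resource."""
    messages = []
    for resource in resources:
        for rule in rules:
            if rule.is_violation(resource):
                name = resource.get("name", "<unnamed>")
                messages.append(f"{name}: [{rule.rule_id}] {rule.guidance}")
    return messages


def pipeline_exit_code(messages) -> int:
    """Non-zero fails the pipeline step, so the error never reaches deployment."""
    return 1 if messages else 0


# Illustrative input only.
planned = [
    {"type": "security_group_rule", "name": "ssh_in", "cidr": "0.0.0.0/0",
     "from_port": 22, "to_port": 22},
    {"type": "security_group_rule", "name": "https_in", "cidr": "0.0.0.0/0",
     "from_port": 443, "to_port": 443},
]

messages = review(planned)
for message in messages:
    print(message)
exit_code = pipeline_exit_code(messages)
```

The design point is that the security team's knowledge lives in the rule and its guidance text, so a three-person team can reach thousands of developers without being in the loop for each change.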
So all the things we've shown so far contribute to stopping digging the hole that you're currently digging in your organization, right? It's stopping future mistakes from being made. And all of this is being done with policy as code as the mechanism. And if you're not doing this with policy as code, it really comes apart at the seams. It really becomes vastly less effective, because the scales we're operating at and the short time frames in which we're operating necessitate automation. If there is any human in the loop, what you are doing is slowing yourself down to being slower than the hackers, and you will lose. So this really needs to be expressed as code. And then, measure what matters. How much risk are you taking? This is going to be different for every organization. Okay. There are very small data sets, and you might have some, that are extremely damaging in terms of the blast radius if they were to be exposed. And there are very large data sets that don't matter that much. So you need to be measuring what matters. As you're fixing stuff in your cloud environments, you need to be quantifiably tracking that. I mean, that's one of the things we spend a ton of time on, giving you that ability to know: Where was I? Where am I? How am I doing? What's the list of stuff we've accomplished? And of course, investment is a big piece of this as well. And this acts as a virtuous cycle, right? Policy as code is the facilitating technology. But as you go through this, measuring what matters contributes back to understanding your environment, and back around the horn. We hope these five fundamentals are helpful to you. It's a very complex topic unless you kind of start breaking it down. We are having a cloud security masterclass on May 5. I will be hosting that with my Fugue co-founder, Andrew Wright. We also have some really cool resources for you over on our website.
At fugue.co/resources, the State of Cloud Security 2021 is the last big market survey we did. We do this every year on the state of cloud security. There's tons of interesting stuff in there, and then we also have an engineer's handbook on cloud security, which is a great place to start for engineers. And you can also, of course, find in the resources there a lot of classes we've taught, videos, et cetera. So I hope this was useful to you in understanding cloud control plane compromise attacks, and the five fundamentals as a methodology to try to prevent bad design, and bad design is the principal enemy in cloud security. So thanks much for your time. I'm around to take questions. Take care.

Josh Stella

Chief Architect @ Snyk
