Conf42 DevSecOps 2022 - Online

Threat Modelling in CI/CD environments

Abstract

A brief introduction to applying the Microsoft STRIDE Threat Model to CI/CD environments as a way of mapping your DevOps toolchain attack surface.

Unfortunately, as STRIDE is from 1999, this talk may contain references to Friends, Frasier and the Spice Girls.

Summary

  • Darren Richardson: Threat modeling is a methodology used to identify, understand, and communicate threats. At its core, the idea is to understand the flow of data in a system. You can apply it to software, applications, networks, systems and even business processes.
  • STRIDE is a threat model put out by Microsoft on April 1, 1999. It breaks threats down into six distinct categories. It's got a good mnemonic, so I like to use it.
  • This section is based on wild speculation about what I assume the IT business was like in the late 90s. We're going to assume that every data handover that happened at the time was done in a 90s-style coffee shop. Here is our data flow diagram circa '99.
  • The trust boundary is where ownership of the data changes. These are the places where data shifts and the controller, the owner of the data, changes. The final step in this process is to apply the STRIDE analysis. It's impossible to sum up how much security has changed over the last two decades.
  • The source code repository is where your DNA is stored. Spoofing any kind of authentication against it might cause problems. Tampering could lead to the modification of source. The thing we don't talk about at any point here is mitigations.
  • Next, pipelines. They have read access to Git and write access to binary storage. Spoofing the code base could lead to injected code, and tampering to modified pipelines. Repudiation is the one threat that stays essentially the same. Just log everything centrally and make sure it's not editable.
  • If you're able to spoof the domains or get the wrong tools downloaded, this would give you a lot of power inside the pipeline. This is the build step, so any tampering could lead to corrupted software. Information disclosure here might be through reverse engineering the binaries or examining the images.
  • STRIDE, in my opinion, has been built to look externally. Most of those issues are problems with configuration or some kind of mishandling. Make sure you aim the threat model internally, too. As your service evolves, you continue to revisit this threat modeling.
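
As a rough illustration of the six STRIDE categories summarised above, each category can be paired with the security property it attacks. This is a minimal Python sketch, not anything shown in the talk: the names `Stride`, `PROPERTY_ATTACKED` and `mnemonic` are invented for this example, and the properties listed for repudiation (non-repudiation) and denial of service (availability) are the conventional pairings rather than the speaker's wording.

```python
from enum import Enum


class Stride(Enum):
    """The six STRIDE threat categories, in mnemonic order."""

    SPOOFING = "Spoofing"
    TAMPERING = "Tampering"
    REPUDIATION = "Repudiation"
    INFORMATION_DISCLOSURE = "Information disclosure"
    DENIAL_OF_SERVICE = "Denial of service"
    ELEVATION_OF_PRIVILEGE = "Elevation of privilege"


# Security property each category attacks. The entries for repudiation and
# denial of service are conventional pairings (an assumption of this sketch).
PROPERTY_ATTACKED = {
    Stride.SPOOFING: "authentication",
    Stride.TAMPERING: "integrity",
    Stride.REPUDIATION: "non-repudiation",
    Stride.INFORMATION_DISCLOSURE: "confidentiality",
    Stride.DENIAL_OF_SERVICE: "availability",
    Stride.ELEVATION_OF_PRIVILEGE: "authorization",
}


def mnemonic() -> str:
    """Build the mnemonic from the first letter of each category name."""
    return "".join(category.value[0] for category in Stride)


if __name__ == "__main__":
    print(mnemonic())
    for category in Stride:
        print(f"{category.value}: attack against {PROPERTY_ATTACKED[category]}")
```

Keeping the categories in an ordered enumeration makes it easy to loop over all six for every trust boundary, so none is skipped by accident.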

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Good evening. My name is Darren Richardson and I'm here to give you a brief introduction to the threat modeling process and how we may apply it to DevOps platforms. We'll also take a quick look at the STRIDE threat model and how it has changed, or in this case hasn't changed, over time. So we're going to dive straight into the big question: what is threat modeling? The answer is simple. Threat modeling is a methodology used to identify, understand, and communicate threats. At its core, the idea is to understand the flow of data in a system, understand how the trust changes for that data, define the boundaries where that trust changes, and then define mitigations for those switches of trust. The threat model represents all this in a structured manner, so data flow diagrams lead into threat boundary diagrams and such. But in its most condensed form, a threat model is just a structured approach to mapping a system and its security. Now that we've discussed what it is, let's discuss what it is not, and that's restricted, because threat modeling can be applied pretty much anywhere. You can apply it to software, applications, networks, systems, and even business processes. If there is any area where you want to examine how things are done securely, you can apply threat modeling. In our case, we're going to use a DevOps environment. There are three aspects of threat modeling. You initially map and diagram the data flow. In our system, for example, we're going to be switching between a lot of different tools, and the core of threat modeling is understanding this data flow: knowing how your data moves and how your data switches hands, as it were. Then we break it down into trust boundaries, which are, in short, where the controller or ownership of the data changes. And then we apply the threat model to this list of trust boundaries. We'll move on to a little quote from Microsoft's Security Development Lifecycle. I like to include this here. 
It's basically the best summary of threat modeling I think there is, if you have to summarize it in one sentence. So let's move quickly on to STRIDE analysis. STRIDE analysis is the threat model I tend to use, and I use it for a few reasons, but let's quickly dive in and see what it is. STRIDE itself is a threat model put out by Microsoft on April 1, 1999, and it's still in use today. I understand that's quite weird. I don't think any of us are thinking, "I want something secure, so I'm going to use something from 1999." That's extremely unusual. And the fact that it was released on April the first also makes me kind of worry about it. But in my opinion, it's a great model that breaks threats down into six distinct categories, and it's got a good mnemonic, so I like to use it. There are other interesting models you can use that all have really good acronyms. You can use PASTA if you like. You can do a VAST analysis. You can have DREAD. That's always a good one, but I prefer STRIDE. We're going to quickly go through the six parts of the STRIDE mnemonic. We have spoofing, which is an attack against the property of authentication: impersonating a user to log in, or generating false websites, anything to trick people's traffic into going somewhere it shouldn't. We have tampering, which is an attack against integrity: modifications made to a system where they shouldn't have been possible. Then repudiation, which is the only slightly unusual word on here. Repudiation means the ability to deny that something has happened, so plausible deniability is a great way to describe it. If you keep your logs locally on the server and they are not properly protected, then you are vulnerable to repudiation. Information disclosure is another big one: an attack against confidentiality. 
So, access to source code, publishing of customer or client information, or any confidential information, really. Most companies have confidentiality levels that are set, and if anything that's not categorized as public reaches someone it shouldn't, that's information disclosure. Denial of service is of course a standard one: degrading or denying access to the service through any means. Even cutting cables, when you think about it, can be considered a denial of service attack. And then of course there is elevation of privilege, or escalation of privilege, whichever term you prefer: an attack against authorization. So allowing the execution of remote code while unauthorized, or elevating a limited user to admin level, or even picking a lock, because if you're applying this methodology to, for example, a physical presence, then that's another escalation of privilege. Before we go on, because threat modeling can get quite complicated, I decided we would start with STRIDE as it was originally intended. So we're going to head back to 1999 and start by making a data flow diagram for a system as it would have been in 1999. Now, I have a minor caveat here: I was twelve years old in 1999, so I was not an IT professional at the time. I was a slightly disobedient secondary school student who spent most of his time skipping homework and playing video games. So this entire section is based on wild speculation about what I assume the IT business was like in the late 90s, based on my personal cultural touchpoint, which is of course the American sitcom, more specifically Friends and Frasier. So we're going to assume that every data handover that happened at the time was done in a 90s-style coffee shop. Here is our data flow diagram circa '99. We have our developer Steve, who keeps his source code on his laptop. 
He gives the data to Dave, who works in operations, in the 90s-style coffee shop. The data transitions through operations to installation via Dave's car. I seem to recall that floppy disks were this big. You can see this is Dave driving his floppy disk to the server room or the data center. The data is installed through the three-and-a-half-inch floppy drive. Maybe I got inches and feet confused in this illustration. I'm not sure. Then there is the communication interface, HTTP on port 80, and our end user Barry can watch highly compressed cat videos over dial-up. This is how I'm going to assume these services existed in the 90s for the sake of the next section, which is trust boundaries. Let's pause here a minute and discuss what a trust boundary actually is. There's a wiki definition here of the trust boundary being a term used in computer science, blah, blah, blah. I want to break it down to a very simple definition: a trust boundary is where ownership of the data changes, where control of the data changes. When you are installing, you have the installation interface with the three-and-a-half-inch floppy disk, and you have the user interface over HTTP. As you install, the ownership changes. In most practical places it means data in transit. It doesn't always mean that, because you can go very deep in these threat models and, for example, take a look at a single system and go down to a deep process level, and at that point the data is not really transiting. But in this case I would go with number one here: in most practical places it means data transit. So when you draw the threat boundaries, you end up with something like this. There is a threat boundary as the data is handed over in our 90s coffee shop. There is another as it switches over to the installation interface. And there is a third threat boundary when the end user connects to the interface. 
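
The idea that a trust boundary sits wherever ownership of the data changes can be sketched in a few lines of Python. This is only an illustration of the 1999 coffee shop example: the `Flow` type, the `trust_boundaries` function and the owner labels (for instance "data center" as the owner of the server) are assumptions made for this sketch, not something taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One hop of the data flow diagram, with who controls the data at each end."""

    channel: str
    source_owner: str
    target_owner: str


def trust_boundaries(flows):
    """Return the hops where the owner of the data changes."""
    return [flow for flow in flows if flow.source_owner != flow.target_owner]


# The 1999 data flow described in the talk; owner labels are illustrative.
FLOWS_1999 = [
    Flow("coffee shop handover", "Steve (developer)", "Dave (operations)"),
    Flow("Dave's car", "Dave (operations)", "Dave (operations)"),
    Flow("3.5-inch floppy drive", "Dave (operations)", "data center"),
    Flow("HTTP port 80", "data center", "Barry (end user)"),
]

if __name__ == "__main__":
    for boundary in trust_boundaries(FLOWS_1999):
        print(f"{boundary.channel}: {boundary.source_owner} -> {boundary.target_owner}")
```

The drive in Dave's car is a hop in the data flow but not a boundary in this sketch, because the data stays in Dave's hands. That leaves the three boundaries the talk names: the handover, the installation interface, and the end-user connection.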
So this is the context of threat modeling. This is what we care about. These are the places where data shifts and the controller, the owner of the data, changes. The next and final step in this process is to apply the STRIDE analysis. We've gone through spoofing, tampering, repudiation, information disclosure, denial of service, and escalation of privilege. These six things are the STRIDE threat model, and all we do here is apply STRIDE. So for each threat boundary, we should look at the inputs and outputs, consider and note potential threats based on STRIDE, and then plan mitigations for the devised threats. I'm not going to do the whole 90s data flow because I think that would take a bit too much time and I want to get into actual modern systems, so I've just made up a short example. For spoofing at the data handover boundary: is it possible to establish the identity of either the developer or operations? Maybe with ID cards, but failing that, it would be possible to pretend to be one of them. Information disclosure: they're in a public coffee shop, so it's very easy to overhear what's happening, overhear the installation information, overhear basically what's going to happen during the installation process. For tampering, we can look at, for example, the data transition from ops to the installation. Does he go straight to the data center and install the software on the server, or does he leave it in his car overnight? Can it be tampered with? Is it possible to distract him? Maybe the Spice Girls walk by, because it's the late nineties, and he's distracted long enough to have a disk switched. Repudiation is the threat that hasn't changed. In my opinion, it's the only one that still relies on logging and everything. And denial of service: say he loses disk two of eight and the installation corrupts. The service is down, and our end user, Barry, cannot watch his compressed cat videos. 
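
Applying STRIDE to each boundary amounts to filling in a grid of boundaries against categories. The following is a minimal Python sketch of such a worksheet, using the coffee shop examples from the talk. The function names and boundary labels are made up for illustration, and attaching the lost-disk example to the transit boundary is an assumption, since the talk does not say which boundary it belongs to.

```python
STRIDE = (
    "Spoofing",
    "Tampering",
    "Repudiation",
    "Information disclosure",
    "Denial of service",
    "Elevation of privilege",
)


def new_worksheet(boundaries):
    """Create an empty list of threats for every boundary and STRIDE category."""
    return {boundary: {category: [] for category in STRIDE} for boundary in boundaries}


def note_threat(worksheet, boundary, category, threat):
    """Record a threat; an unknown boundary or category raises KeyError."""
    worksheet[boundary][category].append(threat)


def unreviewed(worksheet):
    """List the (boundary, category) cells that have no threats noted yet."""
    return [
        (boundary, category)
        for boundary, categories in worksheet.items()
        for category, threats in categories.items()
        if not threats
    ]


worksheet = new_worksheet(
    ["coffee shop handover", "transit to installation", "end-user interface"]
)
note_threat(worksheet, "coffee shop handover", "Spoofing",
            "Someone pretends to be the developer or operations")
note_threat(worksheet, "coffee shop handover", "Information disclosure",
            "Installation details overheard in a public coffee shop")
note_threat(worksheet, "transit to installation", "Tampering",
            "Disk switched while left in the car overnight")
note_threat(worksheet, "transit to installation", "Denial of service",
            "Disk two of eight lost, installation corrupts")

if __name__ == "__main__":
    for boundary, category in unreviewed(worksheet):
        print(f"Still to consider: {category} at {boundary}")
```

Listing the empty cells is the useful part: it shows which category has not yet been considered at which boundary. A cell can hold several threats, and each noted threat would still need a planned mitigation next to it.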
So here is how I believe the threat model was supposed to be applied. It was probably a little more complex, but I think we should hop straight to modern times. So we have a data flow diagram in 2022. Now, 23 years later, here we are. It's kind of impossible to sum up how much security has changed over the last two decades. I defy anyone to fit that amount of change into a 25-minute talk. So if we ask what has changed, I think we can just say everything except the STRIDE threat model, because it hasn't been updated since its release in '99. And when we ask how that changes the application of it, it simply doesn't. The only thing that happens is that it gets more complicated. The transition to the modern day is not as jarring as you might initially think when you see this. This is a very simple diagram of a more modern workflow. There are some parts missing here, but this is just a very simplified flow. Don't let the scale fool you. Everything we're going to do is exactly the same. So we have the developer, Steve Jr. Steve's son has joined the game now and he's developing on his laptop. He pushes his code to GitHub, GitLab, Bitbucket, wherever you store code. We have a pipeline system: Jenkins collects the source code, tests it with SonarQube, secures it in the pipeline with a couple of open source tools, builds a Docker image, and stores that Docker image in Artifactory. There is some infrastructure as code, and an installation interface, maybe a server in the middle here which is pushing out these updates through SSH using something like Ansible. Then there are the containers which contain the workloads, and a communication interface. And now our end user Barry, 23 years later, is able to enjoy extremely HD cat videos over a 100-megabit connection, so he's extremely happy with this. The trust boundaries in 2022 are exactly the same. Now this slide, if you're wondering, is actually exactly the same slide as I showed for the trust boundaries in '99. 
The reason for this is that this is going out as a video, so I want people to have the references nearby in case they are going back and forth. So here is the trust boundary again. In most practical places it means data transit. As the data moves and the owner, the controller of that data, changes, that's what we care about. Applying this methodology, we end up with something that looks a lot like this. I understand this might look a little complex initially. We have things we never had before. We have entirely internal threat boundaries. Here we have a nested trust boundary. We have systems within systems, built upon systems. So it's not just one person handing over code and another person installing it. This is automated infrastructure, this is DevOps, and these are the trust boundaries as we would apply them to this system. So the meat of this conversation is applying the STRIDE analysis, circa 2022, to our system, and we're going to go through all of these boundaries. We'll get started with the source code repository. The source code repository obviously holds potential IPR. That's what we're worried about here. This is where your DNA is stored. This is where your company's core code is collected. So spoofing any kind of authentication against it might cause problems. Tampering could lead to the modification of source, which is problematic if that source then makes it through to the end product. Repudiation threats: access logs can be deleted, which would make changes untrackable, so you have no chain of authority and no way to track what is happening. Information disclosure: IPR leakage, mostly. If someone leaks the source code, that means loss of reputation and loss of innovation. Denial of service: service loss prevents builds being committed. Elevation of privilege: unauthorized admin access leads to code base modification. The thing we don't talk about at any point here is mitigations, though we should. Let me clear that up. 
So the threat modeling process should always have mitigations. But these are examples, and these are things that are mostly handled by, for example, the source code repository itself. Spoofing is already covered, tampering is handled by the principle of least privilege, repudiation by non-changeable logs. All these things are very clear, so in this example the mitigations are quite obvious in most situations. But this shouldn't be all you have. You should go as deep as you can. Basically, any concern you have for this particular section should be brought up in here. And there's no rule saying that it should be just one point. If you have four or five concerns for spoofing or tampering, write four or five concerns here. This is absolutely fine. Next, pipelines. So this is our pipeline boundary. This contains the pipeline configuration, with read and write access: well, read from Git and write to binary storage. So theoretically there's a lot of damage that could be done from this point. One of my colleagues said something the other day that hadn't really struck me before: that Jenkins, or pipelines in general, are just formalized remote code execution. And he's absolutely right. The ability to edit the code after it's been read from Git, the ability to write additional binaries to the storage: there are a lot of dangerous things that an attacker could do if they got hold of the pipelines. So spoofing the code base could lead to injected code. Tampering: modification of pipelines. I mean, if you modify the pipelines, you can build basically anything you want. Repudiation is the one threat that stays essentially the same throughout the entire time. It's just: log everything centrally and make sure it's not editable. That's the mitigation for all repudiation threats. If someone can tell me a repudiation threat that is not mitigated by central logging, I would love to hear it, because so far I've not encountered one. Information disclosure: same risks as the source code. 
It's IPR loss, so losing your intellectual property rights. Denial of service in acute situations could prevent the deployment of emergency patches. And elevation of privilege: altering build pipelines, disabling or skipping tests, this sort of thing. The static code analysis tool: I'll quickly go through this one because in my opinion it's a, let's say, less than critical area. It only returns pass or fail values, so the best you can do really is force the test to skip. Next there's an in-pipeline security tool that I want to talk about. Aqua Security put out this Trivy tool. It's one of my favorite tools for dealing with Docker images as they're being built, but it opens the spoofing angle to an external attack. If you're able to spoof the domains or get the wrong tools downloaded, this would give you a lot of power inside the pipeline. Tampering: obviously this is the build step, so any tampering could lead to corrupted software. Information disclosure: it's again more about the IPR here. Denial of service could prevent security checks, and then you basically force the question: should we skip the security check if we can't run it, or should we pause the deployment? Storage is another big area. This is where our binaries are stored for deployment. If we have a compromise here, then someone can upload their own binaries or maybe even download and reverse engineer ours. So spoofing of credentials would allow uploads and downloads when we don't want them. Tampering means changes to the images, which could then be distributed. Information disclosure here might be through reverse engineering the binaries or examining the images. Denial of service: if this is our single source of truth for our deployments and we deny service here, then basically all our later containers are stuck, which might lead to some problems trying to update. And then the upload of altered images: if you are able to elevate privileges and then upload a new image, that might pose a problem. 
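
The talk lists threats here rather than mitigations. As one hedged example of what a mitigation might look like, a common control against a spoofed download source or an altered binary is to pin a known-good digest and refuse anything that does not match. The Python sketch below uses only the standard library. The function names are hypothetical, and the "pinned" digest is computed in place purely so the example runs on its own; in practice it would come from the publisher through a separate trusted channel.

```python
import hashlib
import hmac


def sha256_of(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_download(data: bytes, pinned_sha256: str) -> bool:
    """Check downloaded bytes against a digest pinned in pipeline configuration."""
    return hmac.compare_digest(sha256_of(data), pinned_sha256.lower())


# Stand-in for a tool or image layer fetched by the pipeline (illustrative only).
artifact = b"pretend this is a scanner binary"
pinned = sha256_of(artifact)  # assumption: normally obtained out-of-band

tampered = artifact + b" with something extra"

if __name__ == "__main__":
    print("original accepted:", verify_download(artifact, pinned))
    print("tampered accepted:", verify_download(tampered, pinned))
```

A pipeline step built on this idea would fail the build on a mismatch instead of continuing, which ties into the question raised in the talk of whether to skip a check or pause the deployment.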
The installation interface is a very basic SSH setup, so standard sorts of threats here: tampering with the SSH config, spoofing the credentials, leaking server information, basic stuff. Also, killing the service prevents any kind of access. Then the container boundary, where obviously if we install the wrong containers, anything can happen. That could include a coin miner, or basically any kind of malicious container we can think of. Tampering, if we're able to download the wrong images. Information disclosure is again leaked IPR. Denial of service: this is the service, so if access to the containers is denied, the service is down. And elevation of privilege might look like a container escape, something along those lines. And then finally we have the communications interface, which is accepting traffic from the whole Internet. So if there is a login, obviously spoofed logins through the web browser. With tampering, the site could be tampered with to serve malware or any kind of illicit material. Information disclosure: for example, a robots.txt that should be removed. This kind of information could be private. And then login elevation. So this was the STRIDE analysis. Now, this isn't supposed to be a kind of catch-all for STRIDE. This is more me talking for half an hour to get you thinking about STRIDE, because in your systems, you're going to have a much more realistic case. You're going to have a better understanding of your data flow, and you're going to have a much more in-depth analysis. You may have noticed more trust boundaries. So the idea here is not to give an in-depth understanding of an exact model, but just to get you in the right headspace to be able to generate your own. That being said, I like the STRIDE model, but there is a problem. In my experience, it's extremely effective in certain situations, so in most cases it's great. But STRIDE, in my opinion, has been built to look externally. STRIDE, in my opinion, only seeks external threats. 
And it makes the assumption that every attacker is an external attacker, that every threat is malicious instead of, let's say, clumsy. But history, which in 1999 they didn't have the benefit of knowing, has demonstrated that not every attacker is external and not every problem is malicious. So we have these two areas, mishandling and malice, and I don't have a good answer for where they should sit in the STRIDE threat model. Mishandling, misconfiguration, mismanagement: those are generally what lead to the rest of the STRIDE model, really. Most of those issues are problems with configuration or some kind of mishandling. And then internal malice. If you have an employee with an axe to grind, then it's the principle of least privilege that should be keeping them out. But they're going to have a much easier time with threats that you have perhaps not considered, because STRIDE looks externally. So if you are doing a STRIDE analysis, I would keep these two in mind. I bring this up because every time I've done STRIDE, I've had this question from someone. Someone has asked malice and mishandling questions: how do we deal with a misconfiguration? Where do we put that? And in truth, I don't have a good answer for you. My recommendation is to keep this in mind because, as I've written here, 82% of attacks are caused by this human element. So the threat model is great, but make sure you aim the threat model internally, too. And then finally we have the steps of what's next. The obvious thing is to do it all over again, because STRIDE is an iterative process. You do it, you validate it, you do it again. You schedule it every six months, sit down, and ask if your service or platform or whatever has changed enough to warrant a revisit to the threat model. The idea is that this is not something you do once and then forget about. You keep doing it. 
As your service evolves, you continue to revisit this threat modeling or, if you prefer a reference to 90s sitcoms, join us again in the next season of threat modeling. Thank you. If you have any comments or questions, I will be lurking around the Conf42 Discord for a while. You can send me a message on LinkedIn. My information is here. My LinkedIn is LinkedIn.com greatbushybeard, so feel free to look me up. Bye.

Darren Richardson

Cloud Security Architect @ Eficode
