Conf42 DevSecOps 2023 - Online

Git those secrets out your repos!

Abstract

Why having secrets, passwords and certificates in your codebase is a bad idea (even if the repos are private), how we can detect and handle the secrets we find, and lessons learnt while helping clients manage detection at scale and implement automatic checks.

Summary

  • Daniel Oates-Lee is a DevSecOps engineer and co-director at Punk Security. He talks about how to find secrets in git repos and how to prevent leaks. Punk Security is also the home of the first DevSecOps CTF platform.
  • Secret Magpie can detect secrets in git repos and other version control systems. What types of secrets can we leak and where do they live? What can go wrong? How can we better defend and make sure that our secrets don't get leaked?
  • Secrets can also end up inside git commit messages. Just because we delete a file doesn't mean that file immediately disappears; it's still in the git history, and there have been multiple occasions where attackers have gone and found them. The real fix is simply not to commit secrets in the first place.
  • One of Daniel's pain points is around secret management and using secret vaults. These secrets should be rotated regularly as well, and when implementing CI pipelines with secret scanning, we should also think about the encryption around those secrets.
  • Secret Magpie can be used in multiple ways. It can integrate with BitBucket, GitHub, GitLab, Azure DevOps, or a flat file system. The demo scans a repository called WrongSecrets. By default the number of branches is limited to 20, but if you've got more than 20, you can push that up.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi, thank you for coming to listen to my talk about getting those passwords out of your repos, detecting leaked secrets at scale. So just to give you a little heads up, we're going to be talking about how we can find secrets in git repos, what the problem is, how we can prevent it, and what can possibly go wrong, with some little stories along the way of things that have happened that I've seen over the last five years of working in this wonderful area of DevSecOps. So who am I? My name is Daniel Oates-Lee. I'm a DevSecOps engineer and co-director at Punk Security. I started doing DevSecOps about five, six years ago and I've kind of loved it ever since. It's been a passion of mine. I started off life as a developer, then I became systems operations, managing servers and databases and firewalls and such, then ended up in security doing pen testing and red teaming, and then fell into this world of DevSecOps. And I am a true enthusiast of it. I love coming out and talking about it, I love teaching people about it and how to automate security checks to detect issues, and then how we can deal with those issues rather than just implementing tools. I like to talk about things such as how do we ticket them, how do we track and monitor those issues through to remediation, and how do we make sure that we don't overload people as we're onboarding these tools into their pipelines. I put Terraform down because I love Terraform for building everything that I do in the infrastructure world. I am a security guy. I do lots of CTFs, and "playing" is probably the best way of saying it. I am a massive geek as well. I'm a massive Star Wars fan. Some of you might have seen on LinkedIn that I'm busy building the Millennium Falcon at the moment, and I'm busy sharing the pictures of that just to show how much of a true geek I actually am. So at Punk Security, we're a DevSecOps company.
Quick little shout out to our four open source tools, which are dnsReaper, Secret Magpie (which we're going to be talking about later on), SMBeagle and pwnSpoof. If you're interested in those tools, they are open source. Please go and have a play with them. We are also the home of the first DevSecOps CTF platform. We're busy enhancing that at the moment. We ran it last year for our second birthday for Punk Security, and we intend to run it again for our third. So that's enough of the bits about me. This is the bit that we're probably more interested in. So we're going to cover off: What is the problem? What are we trying to fix by detecting secrets in our git repos, or in our VCSs, our version control systems? What types of secrets can we leak and where do they live? What can go wrong if those secrets do get leaked? How easy is it to actually find these things? How can we better defend and make sure that our secrets don't get leaked or aren't accessed in an insecure way? And then I'm going to talk about and give a little demo of our open source tool, Secret Magpie. So what is the problem? Essentially, there are two different types of secrets that we want to use. One is a secret that we want the web application to use, such as an AWS key to be able to access an S3 bucket, maybe upload some information. We want the application to work. What can go wrong, though, is that those AWS permissions can either be over-permissive or can be accidentally changed, and that leads to abuse by attackers. Now, I've got a real world example of this. We were helping a consultancy company recently who got done over twice over a bank holiday weekend. They accidentally leaked their AWS keys into a web application that went up on the Friday before a bank holiday. An attacker got hold of those AWS keys, logged into the AWS account and then ran up a massive crypto farm and cost that consultancy thousands.
We tried to explain to them that they needed to put secret detection in, that we needed to put various CI pipelines in and to rotate the AWS key. They did the rotation, but they didn't actually put any checks in. They didn't implement secret detection. Next bank holiday came round, they did another release on a Friday, just before everybody ran off for the bank holiday thinking everything was fine, they accidentally uploaded the AWS key again and got done twice as hard this time. So these things do happen quite regularly, unfortunately, and quite scarily. And then we've also got secrets that shouldn't have been there in the first place. So these might be like a .env file that might accidentally get leaked onto a web server, or a configuration file where you've got your database password for being able to connect into the back end. These are the kind of things that we don't want up there, and we're going to cover off what those different types of files are and why we probably wouldn't want them up there. However, it's not just about detecting secrets in files. We should also, whilst we're looking at this subject, think about how we're managing our secrets as well. So we should be making sure that secrets aren't being recorded in log files or trace outputs. We need to understand where those secrets are, how they're being used and where they're being stored. We should also have a look and see how these secrets could be used for elevating privileges. So again, going back to that AWS key, should it be allowed to connect to EC2s or create new Lambdas, or should it be locked down and restricted? And then how do we make sure that the permissions that have been applied to that AWS key remain in sync? We also need to think about how an individual might get access to these secrets. So again, we have another example where we were trying to figure out how somebody had abused a secret to log into a database.
It turned out that a developer had used this database password to connect into a production system and then started dumping out production data so he could use it in his test environment. Now we all know that we shouldn't use production data in test environments, because it would be a breach of data protection or GDPR. But we also need to worry about, well, who's going to access these secrets. So if we're storing secrets in a git repository, and that git repository is open to our entire organization, who's to say that somebody won't go and collect that key and then go and use it in a way that we weren't intending? So again, we need to be thinking about how we're going to be managing these secrets and who has access to them. So what types of secrets should we be thinking about? We need to think about passwords, obviously, because hard-coded passwords just shouldn't be a thing. But we do use them. Developers put them in .env files, we also use them in Terraform, in tfvars files. And we just need to make sure that those files can't get into our git repository. Now we can either use gitignores, or we should be using other more secure methods of being able to collect those secrets, such as key vaults. Then we've also got API keys. I mean, these are quite popular, they're quite useful, especially in these cloud environments, so we can access other systems. But we need to think about how we're going to rotate these API keys, where they're being used and how many different repos might be using them. Tokens, again, we shouldn't really be storing these things in our git repos. Private keys and private certificates: I have seen it in the past where an Nginx server has been spun up. It was a Docker container, and they left the private key and the private certificate for encrypting the traffic for the Nginx box inside their git repo and then posted it up online. Well, if you lose control of your private key, your data can be decrypted.
So you really need to be protecting these things. So where can these different secrets exist? Well, obviously they can exist inside a file. We've already discussed things like .env files and configuration files. They could be hard coded somewhere inside the source code, or they could just be accidentally dropped in as an extra file inside the repository. However, they can also be stored inside a git message. I have seen it, where people have put in the commit message, "the password for this is XYZ". But we've also got to worry about the git history as well. So just because we delete a file out of our git environment or a git repo doesn't mean that that file immediately disappears. It's still there in the history. The only true way of being able to really get rid of it is to rewrite that history, and even then, if you know the SHA commit hash, you might be able to go back and extract it. And we're going to have a look at one of those problems in a bit. But yeah, the only real way of being able to fix it, if you put a secret inside a repository, is to rotate that secret and make sure it's not being used anywhere else. So we've kind of roughly covered what can go wrong already. Things like hard coding credentials is not best practice. But as we all know, developers do do this, and the developers aren't doing this to be bad people. They're doing it so they can develop quickly and easily and they can spin systems up and down. But this does have issues, and we just need to make sure we're aware of what can happen if we start storing those secrets in a git repository. Ideally, what we should be doing is we just shouldn't be doing it. There have been multiple occasions where attackers can go and find them. I mean, if we have a quick look on GitHub. So just give me 2 seconds. So all I'm going to do is I'm going to go to GitHub now.
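That delete-doesn't-delete behaviour is easy to reproduce yourself. A minimal sketch in a throwaway repo (all file names and values here are made up):

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email dev@example.com && git config user.name dev

# A developer commits a .env file with a hard-coded password...
echo "DB_PASSWORD=admin" > .env
git add .env && git commit -qm "added the credentials"

# ...then "removes" it after the security team raises an alert.
git rm -q .env && git commit -qm "remove password"

# The file is gone from the working tree, but one commit back it's all there:
git show HEAD~1:.env    # prints DB_PASSWORD=admin
```

Anyone with a clone of the repo can run that last command, which is why rotating the secret matters far more than deleting the file.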
I'm going to go up here into the top and I'm just going to have a quick look for "removed aws keys", the same as any kind of attacker could do very easily, go down to more, click on commits, and I can see all the keys that have been recently removed from GitHub. So these are AWS keys that have been removed. Now I'm not necessarily saying that these people are bad or that whatever they've done is incorrect, and I'm certainly not going to go and click on one of their commits, but if you did, you would most likely see that they'd removed their AWS key. Now that is just evidence that it happens. And if it's that easy for us to just quickly do live on a demo, imagine what's going on out there. Those that were astute would have seen there were 15,000 commits recently where they were removing AWS secrets. It happens. So if we have a quick look at a standard pipeline. In this case we've got feature branches, we've got a main branch, so we can do integration, we can do pull requests and stuff like that. We may well have a junior developer that's just started. He hasn't created a feature branch, he's just committed straight to main. He's created a .env file and he's pushed it up, because he wanted to deploy this application out with the username and password of admin, which is not unusual. I'm pretty sure we've all seen applications out there that have done that. And he's nicely put the comment that he's added the credentials. Nice. A senior developer has seen what he's done and gone, oh my God, what have you done? Or the security team have triggered an alert and gone, oh my God, what have you done? You need to get those passwords out immediately. The junior developer has gone along and he's removed it. He's quite helpfully decided to put a nice little squiggly open bracket saying that you need to put the password there, and then he's committed as he's removed the password. Now the problem is that he hasn't rotated that password inside his application.
And you might be thinking, so what? Well, the "so what" is that with git history, it's still there. That password is still there inside the history, as we can see here. So we've still got the same username, admin, and then we've got the password as admin. He hasn't rotated the password, and along comes Mr. Bad Guy and he can go back through the git history. He's downloaded this repository, he's done a little search like we were doing on GitHub, discovered that they did a remove, so he's just gone to one commit before, gone and extracted the password, tried it and logged straight into the application. So that is a very basic example of what can go wrong in scenario one. So let's just continue. What happens if we do this inside a feature branch? So we have our junior developer, he comes along, he's done a .env file again, he's put admin admin in there and he's put it up there. Our security team have detected what he's done, raised an alert against him, he's gone and removed that password, pushed it back up, and he's then done a merge into the main branch. Now the problem is, even when we toddle along a little bit longer, there are certain merge types that just keep all of that history. So if you just do a standard merge, it will keep all of those previous commits, and Mr. Bad Guy can just go back quite easily and go pick it up. And there you go, he's got the credentials. Even if you do a squash commit, if you know the SHA for that particular commit, you can still go back and extract it. So even just doing squash commits is not going to protect you. Realistically, you just shouldn't be committing .env files into your git repository, especially if they've got secrets inside them. There are lots of examples of these exploits out there.
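Mr. Bad Guy's "little search" is a one-liner against any clone: git's pickaxe flag (`-S`) lists every commit, on every branch, where the number of occurrences of a string changed, which is exactly the commits that added or removed a secret. A sketch in a throwaway repo (names and values hypothetical):

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email dev@example.com && git config user.name dev

# The secret goes in, then gets "removed" with a placeholder.
echo "PASSWORD=admin" > .env
git add .env && git commit -qm "added the credentials"
echo "PASSWORD={}" > .env
git add .env && git commit -qm "remove password"

# Pickaxe search: both the add and the remove commit show up,
# pointing an attacker straight at the commit before the removal.
git log --all --oneline -S "PASSWORD=admin"
```

Against a real target an attacker wouldn't know the password up front, but searching for strings like `PASSWORD=` or `AKIA` works the same way.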
I've already listed three, and we've seen on GitHub people are removing their credentials, and if it's that easy to go and find them, bad guys can on average probably find an AWS key that's been uploaded within five to ten minutes. It's pretty easy. So how can we defend against this? Well, this is where the brilliant sec part, the security bit, gets joined in with development and operations, so we can try and implement automated scanning. Now we've got two areas where we can put our git scanning tools. We can either put them on the developer's machine, so before they do a git commit we can do a scan and make sure that they haven't got any secrets in there, and we can also do it on the CI pipeline. Now some people say only do a pre-commit check, and that should be perfectly fine. The issue is that developers can disable these, because they might have a requirement for it. They might just want to do a test, they might just want to get it stored up there and out of the way, or it crashes and the application breaks, or there's multiple different reasons. I would do it in both places. I would do it as a pre-commit, and I would also do it in the CI pipeline. That way, if anything goes wrong on the developer's end, you can pick it up in the CI pipeline. Yes, the secret's up there. Yes, it's in the git history. No, I wouldn't bother going and rewriting the git history. What I would do is track and record that it happened, then get the secret rotated out, and then make sure that secret wasn't being used in any other repos or any other applications. Now, there's lots of tools out there that you can use. There's gitleaks, there's TruffleHog, there's GitGuardian, Trivy does it. I'm sure that I've missed a few. There's lots and lots out there. GitHub are doing it now. Lots of the git platforms, like GitHub, GitLab and Azure DevOps, have got these secret scanning tools pre-built in.
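To show the shape of the pre-commit side of that belt-and-braces approach, here's a toy hook that blocks a commit when the staged diff contains something that looks like an AWS access key ID. This is deliberately minimal: real scanners like gitleaks or TruffleHog check hundreds of patterns plus entropy, and the hook path and pattern here are just for illustration (the key is AWS's documented example key, not a real one):

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email dev@example.com && git config user.name dev

# Toy pre-commit hook: AWS access key IDs start with AKIA + 16 chars.
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
if git diff --cached -U0 | grep -qE 'AKIA[0-9A-Z]{16}'; then
  echo "Blocked: possible AWS access key in staged changes" >&2
  exit 1
fi
EOF
chmod +x .git/hooks/pre-commit

# A developer stages a key and tries to commit it...
echo 'AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE' > .env
git add .env
git commit -m "added the credentials" || echo "commit was rejected"
```

And the reason you still need the CI-side scan: `git commit --no-verify` skips this hook entirely.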
I mean, they charge you for it, but it's there and it's relatively easy to get implemented, so I'd definitely do it. But when you're doing the CI pipeline, think about where you're putting it, where you're scanning. Do you want to scan on every git push? Probably, that's what I would do. But then once you've done it on the git push, do you need to rescan it on a pull request or a merge request or anything else that you might be wanting to do? Do you need to rerun it after you've done it on the git push? Probably not. So I probably wouldn't bother. How else can we defend? I've already alluded to the fact that when I'm doing my CI pipelines, if I'm doing secret scanning and I detect a secret, what I'll do is I will get it to automatically raise a development ticket, and I'll add it into a security backlog or assign it to the project, depending upon what the configuration requires. That way we can track and make sure that that secret has been rotated. It's not a punishment thing. It's not to make somebody look stupid. It's about understanding what happened: why did that secret get up there? Has it been rotated? Has it been used anywhere else? And just making sure that we can tick those boxes and show the security team that we've taken it seriously. We can also make sure that we rotate those secrets, and provide training to people as well about how to do this, how to handle those secrets, what the secrets are, and why it's a bad idea to put API keys into the source code. We should also maybe be asking our red teamers or our pen testers to do manual verifications against our git repositories and our git history and make sure that we've not missed anything. See if they can find anything in there. Now, one of my pain points is around secret management and trying to get people onboarded with using secret vaults.
So rather than storing your secrets inside your repositories, get your application to go and extract the secret out of the vault. This can be quite easily done. If you're using AWS, you can use assumed roles and go and get it out of the SSM Parameter Store. It's one of the things that I like to do. You can also, if you really want to, when you're doing your CI pipelines, or say you're using Argo CD for doing your deployments, get Argo CD to go to the vault, collect the secrets, and then have them deployed with your application, whether they're going into an environment variable or into the application as it's being spun up. Also, if we're using secret vaults, we can log who's accessing those secrets. So in the case of when we were doing our investigation of who was accessing those secrets, rather than having to go digging through and trying to look at who's potentially used it, we can go straight to the vault and pull out an audit log. And that way we can trigger automated alerts. So say we've got developer A over here who then tries to access a production secret. We can say, right, Mr. Developer, why have you done that? He may well have a legitimate reason, but if he doesn't, we can say, right, you shouldn't have been doing that, now I'm going to have to rotate that secret, and they can carry out an investigation. I also believe that these secrets should be rotated quite regularly as well. I know from my years of being a system administrator, rotating passwords was quite a big and challenging thing to do. Quite scary, because you didn't know where that password was being used, how it was being used, or which configuration files it was in. But if we're using a vault and we're not storing it with the application, it means that we should be able to rotate them regularly.
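To make the access-plus-audit pattern concrete: a real vault (HashiCorp Vault, AWS SSM Parameter Store) gives you that audit trail out of the box, but the idea can be sketched with a toy stand-in where every fetch goes through one function that records who asked for what. Everything here, paths and values included, is made up:

```shell
set -e
vault=$(mktemp -d)                          # toy stand-in for a real vault
echo "s3cr3t" > "$vault/prod-db-password"

# Single access path: fetch a secret AND record who asked for it.
get_secret() {
  echo "user=$2 secret=$1" >> "$vault/audit.log"
  cat "$vault/$1"
}

get_secret prod-db-password developer-a > /dev/null

# The audit log answers "who pulled the production secret?" directly:
cat "$vault/audit.log"    # user=developer-a secret=prod-db-password
```

An entry like that for a production secret from an unexpected user is exactly the sort of thing you'd wire an automated alert to.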
We might even want to rotate them every time we do a release, and that way we can version control our secrets and roll our secret back if we roll our application back. It's possible, I've seen it being done. We should also think about the encryption around those secrets. So where are they being used? How are we accessing them? Is the connection between the application and the vault secure? And when it gets pulled down inside the application, where's it being stored? How are we keeping it safe? Those are the kind of things that you should be doing. Now, when we come to implement CI pipelines with secret scanning, I've had this multiple times over my career. There was a large software house that I was working with. It was building software for the NHS, and their development teams were dead against doing any kind of CI pipelines, especially with security scanning, because they were saying, we don't write vulnerabilities, you can't find any vulnerabilities in our system. Personally, I prefer to call them security defects, because they are a security defect until somebody figures out a way to exploit them. So I've changed my terminology when I'm talking to developers. So this particular team, I managed to convince them eventually that we wanted to do secret scanning. We deployed gitleaks into their pipeline, and they were very reluctant, but we got it in. Two days later, my phone blew up, and it was this particular development team: I was blocking their pipelines, I was blocking their development, this pipeline's gone wrong, it's creating false positives. And basically it was just utter garbage. So I went and had a look and discovered that they had actually put a production AWS key into the source code that was meant to be delivered to the development environment.
The developers obviously looked very sheepish at that point and suddenly realized that by me stopping that deployment, it protected their production environment and stopped them from having to go to the NHS and look rather foolish because they'd leaked out this AWS key. They had the ability to go and rotate it anyway, so they went and rotated it, we tracked it, we understood what happened, and that pipeline remained, and so did the secret scanning. But if you're trying to deploy secret scanning into a well developed environment and you've got no idea what's out there, there is no easy way of being able to scan all your repos en masse and figure these things out, which is why we developed Secret Magpie. So Secret Magpie, we built this as a pre-secret-scanning tool. So before you implement secret scanning, you can run Secret Magpie against all your git repositories and understand where your secrets are. Are they in a commit? Are they in a file? Who's using them? And it gives you the ability to tune your secret scanning tools before you start implementing them into a CI pipeline. Now, you can implement Secret Magpie into a CI pipeline, but I would recommend that you keep it as a manual tool, as a verification. Secret Magpie uses two different secret scanning tools: it uses gitleaks and it uses TruffleHog. The reason we use two is because we found that they were the best ones and they complemented each other quite nicely. They weren't finding the same thing twice, they were finding slightly different things. And then we built it so it gives you a nice, easy to read output. It's a nice HTML file that you can start pivoting through, with a nice little bit of Javascript. So let's get on with some demo time, because I'm going to do a live demo. What I'm going to show you here is a couple of commands. We've built it so you can either use it as a docker container or as a python script.
So you can go on and quickly download it and make sure that you're running the latest version through our GitHub. We do nightly builds and nightly releases just to make sure that it's all nicely up to date. And as you can see, mine is. So what we're now going to do is I'm going to quickly show you the help file. As you can look through here, we do it in multiple different ways. We can integrate with BitBucket, we can integrate with GitHub, GitLab, Azure DevOps, or we can just scan a flat file system. And if you're using one of these cloud services, we've also got the ability for you to pass through the various different authentication tokens. So with GitHub and Azure DevOps you'll have an organization and a personal access token. GitLab is slightly different, BitBucket again is slightly different again. Or if you just want to download all of the repos yourselves and just scan a path, you can certainly do that as well. We've also got some other options inside there. So you can control the output of your file: you can have it as a CSV or JSON, or the nice HTML file which I'm going to show you. You can also disable parts of it. You can also narrow it down to a single branch, so rather than scanning multiple branches you can have just one single branch. And by default we limit the number of branches to 20, but if you've got more than 20 branches you can certainly push that up. It'll just take longer to scan through. So it's nice and easy to run, as we can see here. I'm just going to pass through an output folder into the docker container. I'm then going to run Secret Magpie against our GitHub environment. I've created an organization called Punk Security demo, and I pass through, as an environment variable, my git token, because obviously I don't want to release that. That would be rather foolish of me.
We're also going to output the file into that output directory, and I think we'll call it results. And then I want the HTML file. So as this is busy running, you can see a nice verbose output, and as with every good security tool, it should have beautiful ASCII art. So we've got some nice beautiful ASCII art in here. As you can see, we're scanning a repository called WrongSecrets. This has got 22 open branches and we're limiting it to the first 20, so there's two branches that aren't going to get scanned. I just wanted to be able to show you guys how that looks and how it works. We're using this WrongSecrets repo. It is actually an OWASP repository which you can go and get yourself. So if you do a Google search for OWASP WrongSecrets, you will find it. It is a bit of a secrets CTF challenge designed by OWASP. We're using it in this one just because it's the easiest one to show you how secret scanning works. Now, I have just paused the video there for a minute, just whilst it ran through, because I didn't want to waste any time. But we can have a quick look through this output. As we look inside it, we've scanned through one repository. It's found 536 secrets, 200 of those are unique. Gitleaks found the most and TruffleHog found eleven. Pretty good. We found various different interesting things, like private keys, AWS keys, GCP security keys, some fine-grained passwords. If we now have a quick look at the HTML file, this is the bit that I think gives you a better user experience. So we can go inside here and we can quickly filter on, say, AWS keys. So now it's just filtered on the AWS results. We can now see the files that these are stored in. We can click on those and we should be able to go straight through to the file and see the AWS key there, and be able to verify it.
We can now go back into Secret Magpie and mark that as confirmed if we wanted to. So this one's already been marked as verified, because that's what it's done already. But we can now track inside here whether you want to confirm that you've rotated it, that it needs rotating, or you can mark it as a false positive. And the idea is that you can quickly rotate through these and then export the results as a CSV file. If you're interested in learning more about it, please feel free to reach out to us with any questions that you might have. Or if you want to use this and you find there's a bug or something like that, hit us up on either LinkedIn or raise an issue on our git repository and we'll get back to you. So at this point I just want to say thank you very much. I realize I'm three minutes slightly over my time, but I hope you've enjoyed it. And if you want to carry on the conversation afterwards, please feel free to come and find me.

Daniel Oates-Lee

Director & DevSecOps Evangelist @ Punk Security Limited



