Conf42 Chaos Engineering 2022 - Online

What's Really Going on Inside Your Node_Modules Folder

Abstract

Do you know what’s really going on in your node_modules folder? Software supply chain attacks have exploded over the past year and they’re only accelerating in 2022 and beyond. We’ll dive into examples of recent supply chain attacks and what concrete steps you can take to protect your team from this emerging threat.

Summary

  • Feross: In October 2021, a hacker offered to sell the password to an NPM account that controlled a package with over 7 million weekly downloads. Malware was added to this package that would execute immediately whenever anyone installed one of the compromised versions. This is just the tip of the iceberg as attackers take advantage of the open source ecosystem.
  • Vulnerabilities are accidentally introduced by maintainers, by the good guys. Malware is intentionally introduced into a package by an attacker. It will always end badly if you ship malware to production. We need a new approach to detect and to block malicious dependencies.
  • We downloaded every package on NPM and we spent a few weeks poking around. The most common attack vector is typosquatting. Dependency confusion happens when a company publishes packages to an internal NPM registry under names that are still free on the public registry. The third vector that we see a lot is hijacked packages.
  • Most malware is in install scripts. Most malicious packages actually start their routines upon installation. The next is privileged API usage. And finally, we have obfuscated code.
  • If you ship code to production, you are ultimately responsible for it. Most people don't think of their open source this way. We built a tool at Socket to help with this problem so you can quickly, at a glance, get an idea of the security of a package.
  • There's a question about how quickly you should update your dependencies. If you update too slowly, you're exposed to known vulnerabilities. Another idea is to audit every dependency. Is there a way to use automation to kind of do something in the middle?

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello and welcome. Thanks for coming to my talk, "It's a Jungle Out There: What's Really Going on Inside Your node_modules Folder." My name is Feross, and I'm an open source maintainer. I started WebTorrent, which is a peer-to-peer file transfer protocol, and StandardJS, a linter that catches bugs and enforces code style. I've been doing open source since 2014 and have created over 100 NPM packages. In the past, I volunteered on the Node.js board of directors, and I also teach a class on web security at Stanford University. Now I'm the founder of a startup called Socket, which helps protect the open source ecosystem. Before we get started, let me tell you a story. On January 13, 2012, over ten years ago, a developer named Faisal Salman published a new project to GitHub. It was called UAParser.js, and it parsed user agent strings. Lots of people found this project useful, and so over the next ten years, Faisal continued to develop the package. With help from many open source contributors, he published 54 versions. As the package grew in popularity, it eventually reached 7 million downloads per week and was used by nearly 3 million GitHub repositories. Now let me tell you a different story. On October 5, 2021, this post appeared on a notorious Russian hacking forum: a hacker was offering to sell the password to an NPM account that controlled a package with over 7 million weekly downloads. His asking price was $20,000. Now this is where the two stories intersect. Two weeks later, UAParser.js was compromised and three malicious versions were published. Malware was added to these packages that would execute immediately whenever anyone installed one of the compromised versions.
So now let's take a look at what that malware does. This is the package.json file for the compromised version, and you'll see that it uses a preinstall script. This means that this command will run automatically anytime this package is installed. So now let's look at what that script does. The first thing you'll see is that it branches based on the operating system of the target. On Mac, nothing happens, which is lucky for Mac users. But Windows and Linux users aren't so lucky. You'll see here that a command is spawned for each of these platforms using child_process.exec. So now let's take a look at what that preinstall.sh script does. The very first line fetches the user's country, figures out whether the user is coming from Russia, Ukraine, Belarus, or Kazakhstan, and stores that in a variable. If the user comes from one of those countries, the script exits without doing anything further. However, if you come from any other country, the script proceeds to download an executable file from this IP address, mark that file as executable, and then run it. Based on these command line flags, you can see that this program is a Monero miner, which is going to be used to mine the Monero cryptocurrency for the attacker. Now this is the script on Windows. It's very similar: it starts off by downloading that same or a similar Monero miner, but it also downloads a DLL file and runs that as well. Here you can see it starting up the Monero miner and registering the DLL file on Windows. Now what does this extra DLL file do? Well, it steals passwords from over 100 different programs on the Windows machine, as well as all the passwords in the Windows credential manager. So yikes, this is a really nasty piece of malware, and anyone unlucky enough to run this lost all their passwords and had to do a complete reset of their online accounts. Not a fun time. So this is kind of the aftermath.
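The install-time hook described above can be spotted mechanically before a package is ever installed. What follows is a minimal sketch, not the tooling used in the talk: it assumes you already have a package's package.json parsed into an object, and it only checks npm's three install-time lifecycle hooks (preinstall, install, postinstall). The function name `findInstallScripts` and the example manifest are hypothetical.

```javascript
// Lifecycle hooks that npm runs automatically during `npm install`.
const INSTALL_HOOKS = ['preinstall', 'install', 'postinstall'];

// Return the install-time hooks declared in a parsed package.json object.
// (Hypothetical helper for illustration only.)
function findInstallScripts(manifest) {
  const scripts = (manifest && manifest.scripts) || {};
  return INSTALL_HOOKS
    .filter((hook) => typeof scripts[hook] === 'string')
    .map((hook) => ({ hook, command: scripts[hook] }));
}

// Hypothetical manifest, loosely modelled on the pattern described in the talk.
const exampleManifest = {
  name: 'some-package',
  version: '1.0.0',
  scripts: { preinstall: 'node preinstall.js', test: 'mocha' },
};

// Lists the preinstall hook; the `test` script is ignored because
// npm does not run it during installation.
console.log(findInstallScripts(exampleManifest));
```

A hit here is a prompt to read the script, not proof of malware, since install scripts also have legitimate uses. npm's `--ignore-scripts` option skips these hooks entirely, at the cost of breaking packages that genuinely depend on them.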
So this package was published for about four hours, and the open source community was pretty diligent and reported it, and the maintainer was also quite diligent. Anyone who happened to install it during that four-hour window was compromised, but it was removed relatively quickly. Any software builds done in projects without a lock file were compromised, and anyone who was unlucky enough to update to this new version of the package, or who merged a bot PR to update to this new version during this time, would have also been compromised. So this was big news in the JavaScript world, and I'm guessing that you may have already heard about this attack, but this is really just the tip of the iceberg. We've been tracking packages that are removed from NPM for security reasons, and we've seen over 700 packages removed in just the last 30 days. I think this trend is accelerating as attackers take advantage of the open ecosystem, the trust that maintainers have for each other, and the liberal contribution policies that we've all adopted in the modern open source era. So I think 2022 will be the year of supply chain security, as awareness of this issue is now coming to the fore. One question you might ask is, why is this happening now? I want to start by pointing out that what we're trying to do here is kind of crazy. We're downloading code from the Internet, written by unknown individuals, that we haven't read, and executing it with full permissions on our laptops and our servers, where we keep our most important data. This is what we're doing every day when we use npm install. And I just have to say that I personally think it's a miracle that this system works and that it's continued to mostly work for this long. It's a testament, I think, to how good most people are, but unfortunately not everyone is good. So let's dive into why this is happening now.
The first reason is that 90% of your app's code comes from open source. So we're really standing on the shoulders of giants. Open source is the reason why we can get an app off the ground in hours and days instead of weeks or months. It's the reason that we don't need to be an expert in cryptography or in time zones or the virtual DOM to build a powerful modern web app. It's also the reason why your node_modules folder is one of the heaviest objects in the universe. Another reason is that we have lots and lots of transitive dependencies. The way that we write software has changed. We use dependencies a lot more liberally, and so installing even a single dependency often leads to many, many transitive dependencies that come in as well. A 2019 paper at the USENIX conference actually found that installing an average NPM package introduces an implicit trust on 79 third-party packages and 39 maintainers, creating a surprisingly large attack surface. What we have here is a visualization that my team at Socket created that shows you what webpack looks like if you go into the node_modules folder and really look at what's inside. Each gray box here represents a package, and each purple box represents a file or files inside of a package. As you take away each layer of the dependency tree, you'll see that you just keep finding more and more packages nested inside the top-level package until you eventually get down to the bottom here. This is just an insane number of files and a lot of modules flying around. The next reason is that no one really reads the code. There are some people who do, but by and large people don't look at the code that they're executing on their machines. One big reason is that NPM really doesn't make this very easy. If you go to the package page for UAParser.js and you click on the Explore tab here, you'll see that you can't even see the files of this package.
So people have to resort to clicking the GitHub link, checking GitHub, and hoping that the code on GitHub matches the code that's on NPM, which is not necessarily true. But that's okay. We can rely on Linus's law: given enough eyeballs, all bugs are shallow. So if there is a security issue in a package or malware in a package, we can rely on others to find it, right? But if everyone does that, then who is finding the malware? Maybe this is the reason why, on average, a malicious package is available for 209 days before it's publicly reported. This comes from a research paper by Ohm et al. So that's 209 days during which the wrong NPM command can end extremely badly. I find this number personally very shocking. A 2021 paper at NDSS, a prestigious security conference, found similar results, including that 20% of this malware persists in package managers for over 400 days and has more than 1000 downloads. And the fourth reason is that popular tools give a false sense of security. A lot of popular tools scan for known vulnerabilities. In 2022, I believe this is no longer sufficient. We can't just scan for known vulnerabilities and stop there. And yet, that's what the most popular supply chain security products do, leaving you vulnerable. The thing is, it can take weeks or months for a CVE or a known vulnerability to be discovered, reported, and detected by tools. It's just not fast enough. So it may be worth taking a minute here to distinguish between known vulnerabilities and malware, because they're very different. Vulnerabilities are accidentally introduced by maintainers, by the good guys, and they have varying levels of risk. So sometimes it's okay to intentionally ship a known vulnerability to production if it's low impact. Even if you have vulnerabilities in production, they may not be discovered or exploited before you update to a fixed version.
So you usually have some time to address these kinds of issues. Malware, on the other hand, is quite different. Malware is intentionally introduced into a package by an attacker, almost never the maintainer, and it will always end badly if you ship malware to production. You don't have a few days or weeks to mitigate the issue. You really need to catch it before you install it on your laptop or on a production server. But in today's culture of fast development, a malicious dependency can be updated and merged in a very short amount of time. Unfortunately, this leads to increased risk of supply chain attacks, because the quicker you update your dependencies, the fewer eyeballs have had a chance to look at the code. So I really think we need a new approach to detect and to block malicious dependencies. But before we get into that, let's look a little deeper into how a supply chain attack actually works and the mechanics of it. We downloaded every package on NPM and we spent a few weeks poking around. The download was 100 gigabytes of metadata and 15 terabytes of package tarballs. As we poked around this metadata and all these packages, we noticed a few trends in the types of attacks we saw. So I'm going to go over what we found. There are attack vectors, which is how the attacker tricks you and gets you to run their code in the first place. And then there are attack tactics, which are what the attack code actually does, or the techniques that the attacker uses to hide their code. So let's talk about attack vectors. The first and the most common attack vector is typosquatting. Typosquatting is when an attacker publishes a package which has a very similar name to a legitimate and popular package. You can see here there are two packages with very similar names, and one of these is malware and one of these is the real package.
But I would guess that it would be hard for you to know that without actually cracking open these packages to see what's inside. So let's open up the malware package and take a look at what it's doing. You can see here again, it's using an install script, which is a very common technique that malware uses. And if you open up this install script to look at the code, you'll find that the file is heavily obfuscated. But I can tell you, even without knowing exactly what this code is doing, you can bet this is not something that you want to run on your machine. The next attack vector that we saw is called dependency confusion. This is pretty closely related to typosquatting. Dependency confusion happens when a company publishes packages to an internal NPM registry and uses a name that hasn't been taken yet on the public NPM registry. Later, an attacker can come along and register a package with the same name on the public registry and confuse internal tools so that they accidentally install the public version. This is why it's called a dependency confusion attack. Looking through the recently deleted NPM packages, we were able to find a bunch of likely dependency confusion attacks, and most of these packages had malicious code in them. All these packages have names which appear to conflict with internal company package names. You can see here a whole bunch of different organizations, including governments, were affected by this. And here are a bunch more clearly targeting the specific companies in this list. And finally, the third vector that we see a lot is hijacked packages. These are the ones that you usually see in the news: criminals and thieves finding ways to infiltrate our communities and infect popular packages. Once they get control of a popular package and can publish to it, they'll steal credentials or install backdoors or abuse compute resources for cryptocurrency mining.
And so these happen for various reasons. Sometimes it's because the maintainer chooses a weak password or reuses a password, or maybe the maintainer gets malware on their laptop. This is also not helped by the fact that NPM doesn't currently enforce 2FA for all accounts, although they are starting to enforce it for the most popular accounts. And finally, sometimes maintainers just get tricked and give access to a malicious actor. This is partially due to the fact that maintainers are overworked, and when someone offers a helping hand, it's sometimes hard to say no. So this is a big vector as well. Now let's talk about some attack tactics. What does this attack code actually do? As we mentioned, install scripts are a huge vector. Most malware is in install scripts. This is a quote from a paper we mentioned earlier: most malicious packages, 56% actually, start their routines upon installation, which might be due to poor handling of arbitrary code during install. In the NPM package manager, packages are allowed to just say, hey, when this package is installed we want to run some code. Unfortunately, though, install scripts do have some legitimate uses, so we can't just disable them. It's not an easy problem to solve. So let's take a look at another example of an install script. Again, you'll see it right here in the package.json file. Super common. The next is privileged API usage. We see packages accessing the network, accessing the file system, and accessing environment variables. This is very, very common, because when an attacker runs code, what they usually want to do is steal some secrets, and they need the network to exfiltrate those secrets. This is a typical example of malware that does that. You can see here that it's making an HTTP request to an IP address and it's sending some data.
The data it happens to be sending is process.env, which contains all the environment variables in the environment. And then here is another file that it includes, which is a different exfiltration technique that uses DNS instead of HTTP. The way this works is it creates a DNS resolver, then it gathers the environment variables, and then it does a DNS lookup with those variables as the subdomain. So it's just another way to get the data out of the system. And finally, we have obfuscated code. We took a look at an example of this earlier. With obfuscated code like this, it's really hard to see at a glance what it's doing, although there are tools that attempt to deobfuscate code like this. There's also another kind of obfuscation, which is that attackers can publish different code to NPM than they do on GitHub. When they do that, as I mentioned earlier, NPM doesn't make it easy to see what code is actually in the NPM package. A lot of people who are trying to evaluate a package will rely on the code that's on GitHub, and there's no guarantee that that code is the same. Okay, so now let's talk about how you can protect yourself. We asked ourselves this question when my company was working on a product called Wormhole, which lets you share files with end-to-end encryption. Our goal was to try to build the most secure and private way to send files. So we did all the usual security things that we could think of. We thought about security early in the design process. We wrote tests, we enforced code reviews, and we were pretty thoughtful about the dependencies that we chose to use. But we still felt like we could do better. And so we started thinking really carefully about this problem and what we could do to make it better. The first thing I recommend is that you try choosing better dependencies. If you ship code to production, you are ultimately responsible for it.
And as an industry, I think we need a mindset shift here, because people assume that they can just install stuff from the Internet and that it's going to be safe, and that's not necessarily true. If you're shipping code to production that includes open source code, then ultimately that code is part of your app, and so you are ultimately responsible for the behavior of that code. The most popular open source license, the MIT license, literally says this. It says that the open source code is provided as is, with no warranty of any kind, and in no event shall the author be liable for any claim, damages or liability. While this is legally true, most people don't think of their open source this way. And I think we really do need a mindset shift. The other thing is, very few of us actually read the code that we're shipping to production, and so we rely on other heuristics to help pick dependencies. Maybe we look at: does the code get the job done? Does it have an open source license, does it have good docs, does it have lots of downloads and GitHub stars, does it have recent commits, does it have types, and does it have tests? We're not really cracking open the code to go much beyond this. What that means is that we're not aware of what the code may be doing. And so we built a tool at Socket to help with this problem so you can quickly, at a glance, get an idea of the security of a package. This is what it looks like. You can go to Socket and look up packages to figure out what behavior the package has. In this example here, you can see that this package contains install scripts, and that's called out very prominently on the page. That's the first thing that you see. This package also happens to contain binary or native code, which means that it's not easy to audit the code. It's not human readable. And so both of these issues are called out.
In this case, this is not a supply chain attack by any means, but it is nice that this is called out very prominently so that you can make an informed decision about whether you want to use this package or not. You can also see that we have helpful quality scores that show up at the top of the page. Now let's take a look at another example. This package here, angular-calendar, is quite a useful package. It's a calendar component that shows up on the page and renders a little calendar. But if you dig into its dependencies, you'll find that some of them are doing quite invasive things. Here you'll see that one of its dependencies contains install scripts. It also runs shell scripts, accesses the file system, and accesses the network. This is probably not something that you would expect a web component to be doing. So it may be worth a little bit of further investigation to figure out what's going on here before you use this package. The other thing that we do that's quite cool is we can highlight when packages do these things and put that directly inline in the code. In this package here, I opened it up to take a look at the files, and I could see that the module is accessing the network as well as accessing environment variables. I can see the exact lines where the package is doing each of these things. It makes it a little bit easier to get an idea of what a package is doing before you run it. So if you want to research packages on Socket before you use them, this is the URL you can use, and I highly recommend you take a look at some packages there and use that information to make an informed decision before you select a package. Okay. The other thing you can do is think about updating your dependencies at the right cadence. What do I mean by this? So there's a question about how quickly you should update your dependencies.
And this is actually a question we struggled with on our team as well. You can think of it as: should we update slowly, or should we update really quickly and aggressively? If you update too slowly, you're exposed to known vulnerabilities and you're running code that's old and that may have bugs that have been fixed in the newer version. So there are some downsides to updating too slowly. On the other hand, if you update too quickly, you expose yourself to supply chain attacks, because you're now running code that may have been published literally yesterday or in the last couple of days, which means that not many eyeballs have had a chance to look at the code. As you think about security, you have to balance this trade-off. There really is no perfect solution here. It's just a hard problem. Another idea is to audit every dependency. If you're building a truly security-critical application like we were doing with Wormhole, then one option is to literally read every line of code of your dependencies. So if we put this on an axis, from a full audit on the one hand, reading every line of code, to yoloing on the other hand (by yoloing I mean doing nothing), how closely should you audit your dependencies? What you see here is we're in the same situation: we have trade-offs and really no good solutions. Doing a full audit is something that only the biggest and richest companies seem to do in practice. It's a lot of work. Usually you need to have a security team looking at every one of these packages, and they also have to approve them one at a time and add them to an allow list, which is really slow. This is expensive just because of the time and the effort that it takes. On the other hand, doing nothing and just installing whatever you want without even looking at the code has its downsides. It means that you're vulnerable to supply chain attacks.
It's risky, and a breach or bad security press can be expensive, especially as regulators start to crack down on this issue more. So this is another difficult trade-off. What do you do? Most teams, I think, err on the side of doing nothing, but I think this is just a hard problem. One thing that we tried to do when we were building Wormhole is to think about a happy medium. Is there a way to use automation to do something in the middle? What we ended up doing is using automation to automatically evaluate all of our dependencies. We could use static analysis to look through packages to try to find malware, hidden code, typosquatting attacks and this kind of thing. That way we could manually audit only the most suspicious packages and spend our limited team resources looking at the code for those. That's the most high-impact way that we could spend our time. This seems much better to me than an all-or-nothing approach where you either audit everything or you just hope for the best and look at nothing. The other thing we wanted to do is make sure that the security information was shown directly in pull requests, so that the developers on our team were empowered to solve the security issues that they saw before they deployed into production. So what does this actually look like? This is the bot that we created. It's implemented as a GitHub app that you can install on your GitHub repository. Whenever it sees that the package.json file or the yarn.lock file has been modified, it will take a look at the new dependency that's been added and run a full health report against that dependency. If there are any issues found in it, it will leave a comment with whatever issue was discovered. That way the developer reviewing the pull request can look at it and have their attention drawn to this potential issue.
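The first step of a bot like the one described above, working out which dependencies a pull request adds or changes, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not Socket's implementation: it compares only the direct dependency fields of two parsed package.json objects, and the helper names `collectDeps` and `diffDeps` and the version ranges in the example are hypothetical. A real tool would also need to examine the lock file and transitive dependencies.

```javascript
const DEP_FIELDS = ['dependencies', 'devDependencies', 'optionalDependencies'];

// Flatten the dependency fields of a parsed package.json into one
// name -> version-range map. (Hypothetical helper.)
function collectDeps(manifest) {
  const all = {};
  for (const field of DEP_FIELDS) {
    Object.assign(all, manifest[field] || {});
  }
  return all;
}

// List dependencies that are new, or whose version range changed,
// between the base branch manifest and the pull request manifest.
function diffDeps(before, after) {
  const oldDeps = collectDeps(before);
  const newDeps = collectDeps(after);
  const changes = [];
  for (const [name, range] of Object.entries(newDeps)) {
    if (!Object.prototype.hasOwnProperty.call(oldDeps, name)) {
      changes.push({ name, type: 'added', range });
    } else if (oldDeps[name] !== range) {
      changes.push({ name, type: 'changed', from: oldDeps[name], range });
    }
  }
  return changes;
}

// Hypothetical manifests; the version ranges are made up for illustration.
const baseManifest = { dependencies: { webpack: '^5.0.0' } };
const prManifest = {
  dependencies: { webpack: '^5.1.0' },
  devDependencies: { browserlist: '^1.0.0' },
};

// Each entry returned here is a candidate for an automated health report.
console.log(diffDeps(baseManifest, prManifest));
```

Only the entries this returns need a closer look, which is what keeps the review cost low compared with auditing every dependency on every change.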
In this screenshot here, you can see that I accidentally installed the package browserlist instead of browserslist, which is actually a very easy mistake to make. For that reason, browserlist, the typo package, actually has something like 700,000 downloads a year. So this is really, really helpful. This is the kind of thing that augments your review process, and it's very low cost since it only raises issues that are really worth your attention. And it runs automatically. If you want to try this app out, we've published it for anyone to use. It's free, so you can install our GitHub app by just going to socket.dev, and I recommend you give it a try and let me know what you think. It has a bunch of cool features: it can block typosquats, as I just showed you, but it can also block malware, detect hidden code, and detect privileged API usage such as the use of the file system, network, child process, et cetera. It can also detect suspicious updates. These are updates that significantly change the package's behavior. So we have a whole bunch of things we look for in packages. We actually have 70 detections in five different categories: supply chain risk, quality, maintenance, known vulnerabilities and license. These are all static analysis rules that we wrote. You can kind of think of this as a linter, in a way. It's looking at the package's code and then looking for these different problems. We tried to focus all of the rules on problems that you as a user of the package really want to know about, and not things that require a lot of knowledge of the internals of the package. The things that it finds need to be actionable to you as the developer choosing to use this package. And so that's what we tried to do in our rule development here.
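The browserlist versus browserslist mix-up above is the kind of thing a simple name-similarity check can flag. The sketch below is one common way to do it, not necessarily how Socket's detection works: it computes the Levenshtein edit distance between a requested package name and a list of well-known names. The `POPULAR` list is a hypothetical stand-in; a real tool would derive it from registry download statistics.

```javascript
// Classic dynamic-programming Levenshtein edit distance between two strings.
function editDistance(a, b) {
  // prev[j] holds the distance between the first i-1 chars of a
  // and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return the popular names that `name` is suspiciously close to.
// (Hypothetical helper; maxDistance = 1 is an arbitrary threshold.)
function findLookalikes(name, popularNames, maxDistance = 1) {
  return popularNames.filter(
    (popular) => popular !== name && editDistance(name, popular) <= maxDistance
  );
}

// Hypothetical list of well-known package names.
const POPULAR = ['browserslist', 'webpack', 'express'];

console.log(findLookalikes('browserlist', POPULAR));  // flags 'browserslist'
console.log(findLookalikes('browserslist', POPULAR)); // the real name is not flagged
```

A match is only a hint for a reviewer: plenty of legitimate packages have names one character apart, so a check like this works best as one signal among several.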
So yeah, if you want to try this out, if you want to poke around our website and look at these different issues, you can try it out at socket.dev. We have made it free for open source forever. And if you have a private repo, it's free while we're in beta. I really do want people to give this a shot and share their feedback with us, because this supply chain security problem is big and only getting bigger. I think together we can do a good job improving supply chain security in 2022, making 2022 not the year that the supply chain is destroyed, but rather the year that it's protected better than ever. So please share your feedback with me. There's my email and my Twitter, and we're also hiring at Socket if you're interested in working on this project and helping to secure the software supply chain. Thanks for your time.

Feross Aboukhadijeh

Founder @ Socket
