Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello and welcome. Thanks for coming to my talk. It's a jungle out there: what's really going on inside your node_modules folder? My name is Feross, and I'm an open source maintainer. I started WebTorrent, which is a peer-to-peer file transfer protocol, and StandardJS, a linter that catches bugs and enforces code style. I've been doing open source since 2014 and have created over 100 npm packages.
In the past, I volunteered on the Node.js Board of Directors, and I also teach a class on web security at Stanford University. Now I'm the founder of a startup called Socket, which helps protect the open source ecosystem. Before we get started, let me tell you a story. On January 13, 2012, over ten years ago, a developer named Faisal Salman published a new project to GitHub. It was called UAParser.js, and it parsed user agent strings. Lots of people found this project useful, so over the next ten years Faisal continued to develop the package. With help from many open source contributors, he published 54 versions. As the package grew in popularity, it eventually reached 7 million downloads per week and came to be used by nearly 3 million GitHub repositories.
Now let me tell you a different story. On October 5, 2021, this post appeared on a notorious Russian hacking forum: a hacker was offering to sell the password to an npm account that controlled a package with over 7 million weekly downloads. His asking price for this password was $20,000. Now this is where the two stories intersect.
Two weeks later, UAParser.js was compromised and three malicious versions were published. Malware was added to these versions that would execute immediately whenever anyone installed one of them.
So now let's take a look at what that malware does. This is the package.json file for the compromised version, and you'll see that it uses a preinstall script, which means this command runs automatically any time the package is installed. So now let's look at what that script does.
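To make the install-script mechanism concrete, here is a minimal sketch in JavaScript (an illustration, not code from the talk) that lists which of npm's automatic install-time lifecycle hooks a package.json declares. The manifest and its `preinstall` command are hypothetical placeholders, not the real compromised UAParser.js manifest, and the helper name `findInstallScripts` is invented for this example.

```javascript
// Lifecycle scripts that npm runs automatically when a package is installed.
// (npm can also run `prepare`, e.g. for git dependencies; omitted for brevity.)
const INSTALL_HOOKS = ["preinstall", "install", "postinstall"];

// Return the install-time hooks a package.json declares, as [hook, command] pairs.
function findInstallScripts(pkg) {
  const scripts = (pkg && pkg.scripts) || {};
  return INSTALL_HOOKS
    .filter((hook) => typeof scripts[hook] === "string")
    .map((hook) => [hook, scripts[hook]]);
}

// Hypothetical manifest: the command is a placeholder, not the real payload.
const manifest = {
  name: "example-package",
  version: "1.0.1",
  scripts: { preinstall: "node preinstall.js", test: "node test.js" },
};

console.log(findInstallScripts(manifest));
// prints [ [ 'preinstall', 'node preinstall.js' ] ]
```

Note that `test` is ignored: only hooks npm triggers during installation are reported, because those are the ones that run without anyone explicitly invoking them.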
The first thing you'll see is that it branches on the target's operating system. On Mac, nothing happens, which is lucky for Mac users. But Windows and Linux users aren't so lucky: on each of those platforms, a command is spawned using child_process.exec. So now let's take a look at what that preinstall.sh script does. The very first line fetches the user's country, figures out whether the user is in Russia, Ukraine, Belarus, or Kazakhstan, and stores the result in a variable. If the user is in one of those countries, the script exits without doing anything further.
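To make that branching easier to follow, here is a side-effect-free restatement of the decision logic described above, as a JavaScript sketch. It executes nothing: the return values are only descriptions, the platform strings are Node's `process.platform` values, and the function name `describeBranch` is hypothetical. Applying the country check only on Linux reflects what the talk shows for preinstall.sh, not a claim about the Windows script.

```javascript
// Countries the Linux script skipped, as ISO 3166-1 alpha-2 codes.
const SKIPPED_COUNTRIES = new Set(["RU", "UA", "BY", "KZ"]);

// Describe (without performing) the branch the malware's logic would take.
function describeBranch(platform, countryCode) {
  switch (platform) {
    case "darwin":
      return "nothing happens";
    case "linux":
      // The shell script checked the country first and exited early for these four.
      return SKIPPED_COUNTRIES.has(countryCode)
        ? "preinstall.sh exits early"
        : "preinstall.sh downloads and runs a Monero miner";
    case "win32":
      // Described in the talk as similar, plus a password-stealing DLL.
      return "Windows script downloads a Monero miner and a DLL";
    default:
      return "not handled";
  }
}

console.log(describeBranch("linux", "DE"));
// prints "preinstall.sh downloads and runs a Monero miner"
```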
However, if you're in any other country, the script proceeds to download an executable file from this IP address, mark that file as executable, and run it. Based on these command-line flags, you can see that this program is a Monero miner, which mines the Monero cryptocurrency for the attackers. Now this is the script on Windows. It's very similar: it starts off by downloading the same or a similar Monero miner, but it also downloads a DLL file and runs that. And here you can see it starting up the Monero miner and registering the DLL file on Windows.
Now what does this extra DLL file do? It steals passwords from over 100 different programs on the Windows machine, as well as all the passwords in the Windows Credential Manager. So, yikes: this is a really nasty piece of malware, and anyone unlucky enough to run it lost all their passwords and had to do a complete reset of their online accounts.
Not a fun time. So this is the aftermath. The malicious versions were published for about four hours. The open source community was diligent in reporting them, and the maintainer was also quite diligent, so they were removed relatively quickly. Still, anyone who happened to install the package during that four-hour window was compromised. Any software build done in a project without a lock file was compromised, and anyone unlucky enough to update to one of the new versions, or to merge a bot PR updating to one during this time, would also have been compromised. This was big news in the JavaScript world, and you may have already heard about this attack, but it is really just the tip of the iceberg. We've been tracking packages that are removed from npm for security reasons, and we've seen over 700 packages removed for security reasons in just the last 30 days.
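The lock-file point above can be made concrete. Without a lock file, npm resolves a semver range such as `^1.4.0` to a matching published version at install time, so a freshly published malicious release inside the range is picked up automatically. The sketch below is a simplified, hypothetical caret-range resolver with an invented version history; the real `semver` package also handles prerelease tags and other operators, and npm gives the `latest` dist-tag priority when it satisfies the range.

```javascript
// Parse "MAJOR.MINOR.PATCH" (no prerelease tags, for brevity).
function parse(v) {
  const [major, minor, patch] = v.split(".").map(Number);
  return { major, minor, patch };
}

function compare(a, b) {
  return a.major - b.major || a.minor - b.minor || a.patch - b.patch;
}

// Simplified caret semantics: ^1.2.3 allows <2.0.0, ^0.2.3 allows <0.3.0,
// and ^0.0.3 allows only 0.0.3.
function satisfiesCaret(version, range) {
  const v = parse(version);
  const base = parse(range.replace(/^\^/, ""));
  if (compare(v, base) < 0) return false;
  if (base.major > 0) return v.major === base.major;
  if (base.minor > 0) return v.major === 0 && v.minor === base.minor;
  return v.major === 0 && v.minor === 0 && v.patch === base.patch;
}

// Without a lock file, take the highest published version in range
// (ignoring dist-tags in this sketch).
function resolveWithoutLockfile(range, published) {
  return published
    .filter((v) => satisfiesCaret(v, range))
    .sort((a, b) => compare(parse(a), parse(b)))
    .pop();
}

// Hypothetical history: 1.4.1 stands in for a freshly published malicious release.
const published = ["1.3.0", "1.4.0", "1.4.1", "2.0.0"];
console.log(resolveWithoutLockfile("^1.4.0", published)); // prints 1.4.1
```

A lock file avoids this by recording the exact version (here, say, 1.4.0) that was resolved earlier, so later installs reproduce it instead of re-resolving the range.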
And I think this trend is accelerating as attackers take advantage of the open ecosystem, the trust that maintainers have for each other, and the liberal contribution policies we've all adopted in the modern open source era. So I think 2022 will be the year of supply chain security, as awareness of this issue comes to the fore. One question you might ask is: why is this happening now? I want to start by pointing out that what we're trying to do here is kind of crazy. We download code from the Internet, written by unknown individuals, that we haven't read, and we execute it with full permissions on the laptops and servers where we keep our most important data.
This is what we're doing every day when we run npm install. And I have to say, I personally think it's a miracle that this system works and that it has continued to mostly work for this long. It's a testament, I think, to how good most people are, but unfortunately not everyone is good. So let's dive into why this is happening now. The first reason is that 90% of your app's code comes from open source.
We're really standing on the shoulders of giants. Open source is the reason we can get an app off the ground in hours and days instead of weeks or months, and the reason we don't need to be experts in cryptography, time zones, or the virtual DOM to build a powerful modern web app. It's also the reason your node_modules folder is one of the heaviest objects in the universe.
Another reason is that we have lots and lots of transitive dependencies. The way we write software has changed: we use dependencies much more liberally, so installing even a single dependency often brings in many transitive dependencies as well. A 2019 paper at the USENIX Security conference found that installing an average npm package introduces an implicit trust on 79 third-party packages and 39 maintainers, creating a surprisingly large attack surface.
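The "implicit trust" figure counts everything reachable from one install, not just direct dependencies. As a minimal sketch of how such a count works, the JavaScript below walks a dependency graph breadth-first and unions the maintainers of every reachable package. The graph, package names, and maintainer names are all hypothetical, and real npm resolution (multiple versions of the same package, optional and peer dependencies) is more involved.

```javascript
// Every package reachable from `root`, where `graph` maps a package name
// to the names of its direct dependencies.
function transitiveDependencies(graph, root) {
  const seen = new Set();
  const queue = [...(graph[root] || [])];
  while (queue.length > 0) {
    const name = queue.shift();
    if (seen.has(name)) continue;
    seen.add(name);
    queue.push(...(graph[name] || []));
  }
  seen.delete(root); // in case a cycle leads back to the root
  return seen;
}

// Packages and distinct maintainers you implicitly trust by installing `root`.
function implicitTrust(graph, maintainers, root) {
  const pkgs = transitiveDependencies(graph, root);
  const people = new Set();
  for (const p of pkgs) for (const m of maintainers[p] || []) people.add(m);
  return { packages: pkgs.size, maintainers: people.size };
}

// Hypothetical graph: one direct dependency pulls in five packages.
const graph = {
  app: ["framework"],
  framework: ["router", "parser"],
  router: ["path-utils"],
  parser: ["path-utils", "bytes-util"],
};
const maintainers = {
  framework: ["alice"],
  router: ["bob"],
  parser: ["alice", "carol"],
  "path-utils": ["dave"],
  "bytes-util": ["dave"],
};

console.log(implicitTrust(graph, maintainers, "app"));
// prints { packages: 5, maintainers: 4 }
```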
What we have here is a visualization my team at Socket created that shows what webpack looks like if you go into the node_modules folder and really look at what's inside. Each gray box represents a package, and each purple box represents a file or files inside a package. As you peel away each layer of the dependency tree, you keep finding more and more packages nested inside the top-level package until you eventually reach the bottom. That is an insane number of files and a lot of modules flying around. The next reason is that no one really reads the code.
Some people do, but by and large people don't look at the code they're executing on their machines. One big reason is that npm doesn't make this very easy. If you go to the package page for UAParser.js and click on the Explore tab, you'll see that you can't even see the package's files. So people resort to clicking the GitHub link, checking GitHub, and hoping that the code on GitHub matches the code on npm, which is not necessarily true. But that's okay, that's okay: we can rely on Linus's law, that given enough eyeballs, all bugs are shallow.
So if there is a security issue or malware in a package, we can rely on others to find it, right? But if everyone does that, then who is finding the malware? Maybe this is why, on average, a malicious package is available for 209 days before it's publicly reported. This comes from a research paper by Ohm et al. That's 209 days during which the wrong npm command can end extremely badly, and I personally find this number very shocking. A 2021 paper at NDSS, a prestigious security conference, found similar results, including that 20% of these malicious packages persist in package managers for over 400 days and have more than 1,000 downloads.
The fourth reason is that popular tools give a false sense of security. A lot of popular tools scan for known vulnerabilities, and in 2022 I believe this is no longer sufficient. We can't just scan for known vulnerabilities and stop there, and yet that's what the most popular supply chain security products do, leaving you vulnerable. It can take weeks or months for a CVE, or known vulnerability, to be discovered, reported, and detected by tools, and that's just not fast enough. It's worth taking a minute to distinguish between known vulnerabilities and malware, because they're very different.
Vulnerabilities are accidentally introduced by maintainers, the good guys, and they carry varying levels of risk. Sometimes it's okay to intentionally ship a known vulnerability to production if it's low impact. Even if you have vulnerabilities in production, they may not be discovered or exploited before you update to a fixed version, so you usually have some time to address these kinds of issues. Malware, on the other hand, is quite different. Malware is intentionally introduced into a package by an attacker, almost never the maintainer, and shipping malware to production will always end badly. You don't have a few days or weeks to mitigate the issue; you need to catch it before you install it on your laptop or a production server.
But in today's culture of fast development, a malicious dependency can be updated and merged in a very short amount of time. Unfortunately, this increases the risk of supply chain attacks, because the quicker you update your dependencies, the fewer eyeballs have had a chance to look at the code. So I really think we need a new approach to detecting and blocking malicious dependencies. But before we get into that, let's look a little deeper at how a supply chain attack actually works.
We downloaded every package on npm and spent a few weeks poking around. The download was 100 GB of metadata and 15 TB of package tarballs. As we poked around this metadata and all these packages, we noticed a few trends in the types of attacks we saw, so I'm going to go over what we found.
There are attack vectors, which are how the attacker tricks you into running their code in the first place, and there are attack tactics, which are what the attack code actually does, or the techniques the attacker uses to hide their code.
So let's talk about attack vectors. The first and most common attack vector is typosquatting. Typosquatting is when an attacker publishes a package with a name very similar to that of a legitimate, popular package. You can see two packages here with very similar names: one of them is malware and one is the real package. I would guess it's hard for you to tell which is which without actually cracking open the packages to see what's inside.
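One simple, automatable signal for typosquatting is edit distance: a new package name that is one keystroke away from a popular name deserves a closer look. Below is a minimal JavaScript sketch using Levenshtein distance against a hypothetical list of popular packages; the threshold of one edit is an assumption, and short names in particular will produce false positives, so real detection would combine this with other signals such as download counts.

```javascript
// Classic dynamic-programming edit distance (insertions, deletions, substitutions).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Popular names within `maxDistance` edits of `candidate` (excluding itself).
function possibleTyposquatOf(candidate, popularNames, maxDistance = 1) {
  return popularNames.filter(
    (name) => name !== candidate && levenshtein(candidate, name) <= maxDistance
  );
}

// Hypothetical popular-package list for illustration.
const popular = ["browserslist", "lodash", "express"];
console.log(possibleTyposquatOf("browserlist", popular)); // prints [ 'browserslist' ]
```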
So let's open up the malware package and take a look at what it's doing. You can see that, again, it's using an install script, which is a very common technique for malware. If you open the install script to look at the code, you'll find that the file is heavily obfuscated. But even without knowing exactly what this code does, you can bet it's not something you want to run on your machine.
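Heavily obfuscated files often contain very long lines of random-looking characters, which a crude heuristic can flag. The JavaScript sketch below computes Shannon entropy per line and flags long, high-entropy lines. Both thresholds (500 characters, 4.5 bits per character) are illustrative assumptions rather than tuned values, and minified but benign code can trip a check like this, so it only marks files for human review.

```javascript
// Shannon entropy in bits per character: higher for random-looking text.
function shannonEntropy(text) {
  const counts = new Map();
  for (const ch of text) counts.set(ch, (counts.get(ch) || 0) + 1);
  const total = [...text].length;
  let bits = 0;
  for (const n of counts.values()) {
    const p = n / total;
    bits -= p * Math.log2(p);
  }
  return bits;
}

// Flag source whose lines are both very long and high in entropy.
// Thresholds are assumptions for illustration only.
function looksObfuscated(source, { maxLineLength = 500, minEntropy = 4.5 } = {}) {
  return source
    .split("\n")
    .some((line) => line.length > maxLineLength && shannonEntropy(line) > minEntropy);
}

console.log(looksObfuscated("const x = 1;")); // prints false
```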
The next attack vector we saw is called dependency confusion, which is closely related to typosquatting. Dependency confusion happens when a company publishes packages to an internal npm registry under names that haven't been taken yet on the public npm registry. Later, an attacker can come along and register a package with the same name on the public registry, confusing internal tools so that they accidentally install the public version. That's why it's called a dependency confusion attack.
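The confusion comes from a resolver that consults several registries and trusts whichever offers the "best" version. The JavaScript sketch below contrasts such a naive resolver with one that pins a scope to the internal registry. The registry contents, the `@acme` scope, and the function names are hypothetical. In real npm setups the usual counterpart is a per-scope registry line in `.npmrc` (`@acme:registry=...`), and the protection assumes the company also owns that scope on the public registry.

```javascript
// Compare "MAJOR.MINOR.PATCH" strings numerically.
function cmp(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) if (pa[i] !== pb[i]) return pa[i] - pb[i];
  return 0;
}

// Naive: look in every registry and take the highest version found anywhere.
function naiveResolve(name, registries) {
  let best = null;
  for (const [registry, packages] of Object.entries(registries)) {
    for (const version of packages[name] || []) {
      if (!best || cmp(version, best.version) > 0) best = { registry, version };
    }
  }
  return best;
}

// Safer: names under the internal scope only ever come from the internal registry.
function scopedResolve(name, registries, internalScope) {
  const registry = name.startsWith(`${internalScope}/`) ? "internal" : "public";
  const versions = [...(registries[registry][name] || [])].sort(cmp);
  return versions.length ? { registry, version: versions[versions.length - 1] } : null;
}

const registries = {
  internal: { "acme-billing": ["1.2.0"], "@acme/billing": ["1.2.0"] },
  public: { "acme-billing": ["99.0.0"] }, // attacker-registered unscoped name
};

console.log(naiveResolve("acme-billing", registries));
// prints { registry: 'public', version: '99.0.0' }
console.log(scopedResolve("@acme/billing", registries, "@acme"));
// prints { registry: 'internal', version: '1.2.0' }
```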
Looking through the recently deleted npm packages, we found a bunch of likely dependency confusion attacks, and most of these packages had malicious code in them. All of these packages have names that appear to conflict with internal company package names. You can see a whole bunch of different organizations were affected, including governments, and here are a bunch more clearly targeting the specific companies in this list. And finally, the third vector we see a lot is hijacked packages. These are the ones you usually see in the news: criminals and thieves finding ways to infiltrate our communities and infect popular packages. Once they get control of a popular package and can publish to it, they'll steal credentials, install backdoors, or abuse compute resources for cryptocurrency mining. This happens for various reasons. Sometimes the maintainer chooses a weak password or reuses a password, or the maintainer gets malware on their laptop. It doesn't help that npm currently doesn't enforce 2FA for all accounts, although it is starting to enforce this for maintainers of the most popular packages.
And finally, sometimes maintainers just get tricked into giving access to a malicious actor. This is partly because maintainers are overworked, and when someone offers a helping hand, it's sometimes hard to say no. So this is also a big vector. Now let's talk about some attack tactics: what does this attack code actually do?
As we mentioned, install scripts are a huge vector; most malware is in install scripts. This is a quote from a paper we mentioned earlier: most malicious packages, actually 56%, start their routines upon installation, which might be due to poor handling of arbitrary code during install. In the npm package manager, packages are allowed to say, "When this package is installed, we want to run some code." Unfortunately, install scripts also have some legitimate uses, so we can't just disable them. It's not an easy problem to solve. Let's look at another example of an install script. Again, you'll see it right here in the package.json file. Super common.
The next tactic is privileged API usage. We see packages accessing the network, the file system, and environment variables. This is very, very common, because when attackers run code, what they usually want to do is steal some secrets, and they need the network to exfiltrate those secrets.
This is a typical example of malware that does that. You can see it's making an HTTP request to an IP address and sending some data. The data it happens to be sending is process.env, which contains all the environment variables. And here is another file it includes, which uses a different exfiltration technique: DNS instead of HTTP. It creates a DNS resolver, gathers the environment variables, and then does a DNS lookup with those variables as the subdomain. It's just another way to get the data out of the system.
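Both exfiltration variants rely on a small set of privileged Node.js APIs, which makes them detectable in principle. The JavaScript sketch below is a deliberately crude, regex-based scan for a few of them; the rule names and patterns are hypothetical. A real analyzer would parse the code into an AST, since regexes are easily evaded by dynamic `require` calls or obfuscation.

```javascript
// Hypothetical rules: regexes for a few privileged Node.js APIs.
const PRIVILEGED_API_PATTERNS = {
  network: /require\(\s*["'](?:node:)?(?:http|https|net|dns)["']\s*\)/,
  filesystem: /require\(\s*["'](?:node:)?fs(?:\/promises)?["']\s*\)/,
  shell: /require\(\s*["'](?:node:)?child_process["']\s*\)/,
  environment: /\bprocess\.env\b/,
};

// Return the categories of privileged API usage found in a source string.
function scanPrivilegedApis(source) {
  return Object.entries(PRIVILEGED_API_PATTERNS)
    .filter(([, pattern]) => pattern.test(source))
    .map(([category]) => category);
}

// Inert sample shaped like the HTTP exfiltration described above (no request is made).
const sample = `
  const https = require("https");
  const data = JSON.stringify(process.env);
  // an https.request(...) call would send data to a remote host here
`;

console.log(scanPrivilegedApis(sample)); // prints [ 'network', 'environment' ]
```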
And finally, we have obfuscated code. We looked at an example of this earlier. With obfuscated code like this, it's really hard to see at a glance what it's doing. There are tools that attempt to deobfuscate code like this, but there's also another kind of obfuscation: attackers can publish different code to npm than they do to GitHub. As I mentioned earlier, npm doesn't make it easy to see what code is actually in an npm package, so a lot of people evaluating a package rely on the code that's on GitHub, and there's no guarantee that the code is the same. Okay, so now let's talk about how you can protect yourself.
You know, we asked ourselves this question when my company was working on a product called Wormhole, which lets you share files with end-to-end encryption. Our goal was to build the most secure and private way to send files. So we did all the usual security things we could think of: we thought about security early in the design process, we wrote tests, we enforced code reviews, and we were pretty thoughtful about the dependencies we chose to use.
But we still felt we could do better, so we started thinking really carefully about this problem and what we could do about it. The first thing I recommend is simply choosing better dependencies. If you ship code to production, you are ultimately responsible for it. As an industry, I think we need a mindset shift here, because people assume they can install stuff from the Internet and it's going to be safe, which is not necessarily true. If you're shipping code to production that includes open source code, that code is part of your app, so you are ultimately responsible for its behavior.
The most popular open source license, the MIT license, literally says this: the software is provided "as is", without warranty of any kind, and in no event shall the authors be liable for any claim, damages, or other liability. While this is legally true, most people don't think of their open source this way, and I think we really do need a mindset shift. The other thing is that very few of us actually read the code we're shipping to production, so we rely on other heuristics to help pick dependencies.
Maybe we ask: does the code get the job done? Does it have an open source license? Does it have good docs, lots of downloads and GitHub stars, recent commits, types, and tests? We're not really cracking open the code to go much beyond this, which means we're not aware of what the code may be doing. So we built a tool at Socket to help with this problem, so you can get an idea of a package's security at a glance. This is what it looks like.
You can go to Socket and look up packages to find out what behavior they have. In this example, you can see that the package contains install scripts, and that's called out very prominently on the page; it's the first thing you see. This package also happens to contain binary or native code, which isn't human-readable and so isn't easy to audit. Both of these issues are called out. In this case, this is by no means a supply chain attack, but it's nice that these issues are called out prominently so you can make an informed decision about whether to use the package. You can also see helpful quality scores at the top of the page.
Now let's take a look at another example. This package, angular-calendar, is quite useful: it's a calendar component that renders a little calendar on the page. But if you dig into its dependencies, you'll find that some of them are doing quite invasive things. Here you'll see that one of its dependencies contains install scripts, runs shell scripts, accesses the file system, and accesses the network. That's probably not something you'd expect a web component to be doing, so it may be worth investigating further to figure out what's going on before you use this package.
Another thing we do that's quite cool is highlight when packages do these things, directly inline in the code. In this package, I opened up the files and could see that the module is accessing the network as well as environment variables, along with the exact lines where it does each of these things. That makes it a bit easier to get an idea of what a package is doing before you run it. If you want to research packages on Socket before using them, this is the URL you can use. I highly recommend taking a look at some packages there and using that information to make an informed decision before you select a package.
Okay. The other thing you can do is think about updating your dependencies at the right cadence. What do I mean by this? There's a question about how quickly you should update your dependencies, and it's one we struggled with on our team as well. Should we update slowly, or really quickly and aggressively? If you update too slowly, you're exposed to known vulnerabilities, and you're running old code that may have bugs that were fixed in newer versions. On the other hand, if you update too quickly, you expose yourself to supply chain attacks, because you're now running code that may have been published literally yesterday or in the last couple of days, which means not many eyeballs have been able to look at it. So as you think about security, you have to balance this trade-off, and there really is no perfect solution here.
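One way to sit somewhere in the middle of this trade-off is a minimum-age policy: only adopt versions that have been public for some number of days, giving others time to spot malware. The JavaScript sketch below filters versions by publish time, using the shape of the `time` field in npm registry metadata (version to ISO 8601 timestamp, plus `created` and `modified` entries). The seven-day window, the dates, and the function name are hypothetical, and a policy like this still needs an exception path for urgent security fixes.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Keep only versions published at least `minAgeDays` before `now` (milliseconds).
function versionsOldEnough(publishTimes, now, minAgeDays) {
  return Object.entries(publishTimes)
    .filter(([key]) => key !== "created" && key !== "modified")
    .filter(([, iso]) => now - Date.parse(iso) >= minAgeDays * DAY_MS)
    .map(([version]) => version);
}

// Hypothetical metadata: 1.4.1 was published only four hours before "now".
const times = {
  created: "2022-01-15T00:00:00Z",
  "1.4.0": "2022-03-01T00:00:00Z",
  "1.4.1": "2022-04-10T12:00:00Z",
};
const now = Date.parse("2022-04-10T16:00:00Z");

console.log(versionsOldEnough(times, now, 7)); // prints [ '1.4.0' ]
```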
It's just a hard problem. Another idea is to audit every dependency. If you're building a truly security-critical application, as we were with Wormhole, one option is to literally read every line of code in your dependencies. If we put this on an axis, with a full audit, reading every line of code, on one end and YOLO-ing, by which I mean doing nothing, on the other, how closely should you audit your dependencies? We're in the same situation: there are trade-offs and really no good solutions. A full audit is something only the biggest and richest companies seem to do in practice. It's a lot of work: usually you need a security team looking at every one of these packages, approving them one at a time and adding them to an allow list, which is really slow. It's expensive because of the time and effort it takes. On the other hand, doing nothing, just installing whatever you want without even looking at the code, has its downsides too. It means you're vulnerable to supply chain attacks. It's risky, and a breach or bad security press can be expensive, especially as regulators start to crack down on this issue. So this is another difficult trade-off.
What do you do? Most teams, I think, err on the side of doing nothing, but this is just a hard problem. One thing we tried when we were building Wormhole was to find a happy medium: is there a way to use automation to do something in the middle? What we ended up doing was using automation to evaluate all of our dependencies. We used static analysis to look through packages for malware, hidden code, typosquatting attacks, and that kind of thing. That way we could spend our limited team resources manually auditing only the most suspicious packages, which is the highest-impact way we could spend our time. This seems much better to me than an all-or-nothing approach where you either audit everything or look at nothing and hope for the best. The other thing we wanted was to show security information directly in pull requests, so that the developers on our team were empowered to solve the security issues they saw before deploying to production. So what does this actually look like? This is the bot we created. It's implemented as a GitHub app that you can install on your GitHub repository. Whenever it sees that the package.json or yarn.lock file has been modified, it looks at the new dependency that's been added and runs a full health report against it.
If any issues are found, it leaves a comment describing what was discovered, so the developer reviewing the pull request has their attention drawn to the potential issue.
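The first step of a bot like this is working out which dependencies a pull request actually touches. As a minimal sketch (not Socket's implementation), the JavaScript below compares the dependency sections of package.json before and after a change and reports added or re-ranged entries; each entry could then be run through checks such as typosquatting or install-script detection. It ignores lock-file changes, which a real bot would also need to inspect, and the function name and example manifests are hypothetical.

```javascript
const SECTIONS = ["dependencies", "devDependencies", "optionalDependencies"];

// List dependencies that a change to package.json adds or re-ranges.
function changedDependencies(before, after) {
  const changes = [];
  for (const section of SECTIONS) {
    const old = before[section] || {};
    const next = after[section] || {};
    for (const [name, range] of Object.entries(next)) {
      if (!Object.prototype.hasOwnProperty.call(old, name)) {
        changes.push({ name, range, kind: "added" });
      } else if (old[name] !== range) {
        changes.push({ name, range, kind: "updated" });
      }
    }
  }
  return changes;
}

// Hypothetical pull request that adds a misspelled dependency.
const before = { dependencies: { react: "^18.0.0" } };
const after = { dependencies: { react: "^18.0.0", browserlist: "^1.0.0" } };

console.log(changedDependencies(before, after));
// prints [ { name: 'browserlist', range: '^1.0.0', kind: 'added' } ]
```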
In this screenshot, you can see that I accidentally installed the package browserlist instead of browserslist, which is a very easy mistake to make. In fact, for that reason, browserlist, the typo package, has something like 700,000 downloads a year. So this is really helpful. It augments your review process, and it's very low cost, since it only raises issues that are really worth your attention, and it runs automatically. If you want to try this app out, we've published it for anyone to use. It's free: you can install our GitHub app by going to socket.dev. I recommend giving it a try, and let me know what you think.
It has a bunch of cool features. It can block typosquats, as I just showed you, but it can also block malware, detect hidden code, and detect privileged API usage such as use of the file system, network, child processes, et cetera. It can also detect suspicious updates: updates that significantly change a package's behavior.
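A "suspicious update" check can be framed as comparing capability sets between two versions of the same package. The JavaScript sketch below assumes some earlier static-analysis pass has already labeled each version with capabilities (the labels and the example versions are hypothetical) and flags any capability the new version introduces. This is an illustration of the idea, not how Socket's detection is implemented.

```javascript
// Capabilities present in `next` that `previous` did not have.
function newCapabilities(previous, next) {
  const before = new Set(previous);
  return next.filter((cap) => !before.has(cap));
}

// Hypothetical: a small utility that suddenly gains network and shell access.
const v1 = ["filesystem"];
const v2 = ["filesystem", "network", "shell", "installScripts"];

const added = newCapabilities(v1, v2);
if (added.length > 0) {
  console.log(`suspicious update, new capabilities: ${added.join(", ")}`);
}
// prints "suspicious update, new capabilities: network, shell, installScripts"
```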
We look for a whole bunch of things in packages: 70 detections in five categories, namely supply chain risk, quality, maintenance, known vulnerabilities, and license. These are all static analysis rules we wrote, so you can think of it as a linter, in a way: it looks at the package's code and searches for these different problems. We tried to focus all of the rules on problems that you, as a user of the package, really want to know about, not things that require a lot of knowledge of the package's internals. The findings need to be actionable for you as the developer choosing to use the package. That's what we aimed for in our rule development.
So if you want to try this out, poke around our website, and look at these different issues, you can do so at socket.dev. We've made it free for open source forever, and if you have a private repo, it's free while we're in beta. I really do want people to give it a shot and share their feedback with us, because the supply chain security problem is big and only getting bigger. I think together we can do a good job improving supply chain security in 2022, making 2022 not the year the supply chain is destroyed, but the year it's protected better than ever. So please share your feedback with me. There's my email and my Twitter, and we're also hiring at Socket if you're interested in working on this project and helping to secure the software supply chain. Thanks for your time.