Transcript
This transcript was autogenerated. To make changes, submit a PR.
Welcome to our talk. Today we will be talking about supply chain attacks,
focused on NPM attacks and the remediation of such supply
chain attack vectors. My name is Dhanish,
and I'm a security researcher who has been working in cybersecurity
for the last eight or nine years. We have done cybersecurity research
on different open source attack vectors,
especially related to NPM, hard-coded secrets,
and a lot of other such scenarios.
We have also been invited to conferences like
Black Hat, Sascon and some other global
conferences. So this is the research we
did last year, and we are going to present it to you
today. First of
all, the disclaimer is that anything that
is presented in this talk is not meant to be illegal, unethical or
malicious in any way, and we expect the same from you.
So keep that in mind.
So first of all, supply chain. Traditionally, a supply chain means a
network of suppliers, raw materials and manufacturers
that produce a final product and then supply
it to the final consumer. That could involve people,
entities, information, resources and activities. So this is
the traditional supply chain, right? For example, take
this delicacy. If you look at it,
not only the chef was involved in making it, but also
different ingredients, whether that was butter,
honey, pistachio or other ingredients. If any
of those ingredients goes bad, the final
product or the final consumer could be affected.
And the same goes for the car industry.
For example, a car is not manufactured in a single
manufacturing unit. Different parts get outsourced from different
countries, from different factories. It is similar
for software. Software is not developed,
or coded, you could say, by a single in-house
team. Dependencies, binaries and other components are involved
to prevent reinventing the wheel,
right? If we look at the software development
lifecycle, it looks like this, and there is
a huge involvement of dependencies in it. We will
be talking about the risk factors that are linked
to those types of third-party dependencies.
And there is a famous proverb that a chain is
only as strong as its weakest link. And this is a little bit
of a meme that is going to be very
relevant in the upcoming slides.
So the security issues we are going to discuss today are focused on dependencies.
There are different types of supply chain
attacks and different supply chain attack scenarios or
possibilities. First of all, look at vulnerabilities. For example,
if you are going to use third-party code, what if that code
has a vulnerability? There was a huge
case of Log4Shell in the past, right? If you are using a
third-party dependency and that dependency is vulnerable,
that vulnerability is inherited by your own code base,
by your own production application. Then we have
typosquatting. It means mimicking the name of a trustworthy package to
fool or
trick developers into trusting a malicious package, for example.
Then we have repojacking: attackers
could claim a repository's username when the actual owner changes their name;
it's similar to subdomain takeover. Then we have account takeover.
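As a rough illustration of the typosquatting idea mentioned above, the sketch below flags a package name that closely resembles, but does not equal, a well-known one. The list of popular names and the 0.85 similarity threshold are assumptions chosen for this example, not values from the talk.

```python
from difflib import SequenceMatcher

# Hypothetical allow-list of well-known package names (assumption for this sketch).
POPULAR = ["express", "lodash", "react", "request"]


def possible_typosquats(name, popular=POPULAR, threshold=0.85):
    """Return the popular names that `name` closely resembles without matching exactly."""
    if name in popular:
        return []  # exact match: this is the real package name
    return [
        known
        for known in popular
        if SequenceMatcher(None, name, known).ratio() >= threshold
    ]


print(possible_typosquats("expres"))   # a near-miss of "express"
print(possible_typosquats("express"))  # the genuine name is not flagged
```

A string-similarity check like this only catches near-miss spellings; it says nothing about whether the flagged package is actually malicious.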
The focus of this talk will be on account takeover, and we will show
you how effective it is and how it works.
Then there is dependency confusion, and the effectiveness of this type of attack is
that a security researcher was able to breach Microsoft, Uber, Apple and Tesla to
make a point, right? And this is another example from
the SS that 500,000 systems were
affected because of supply chain attack vectors, and obviously Log4Shell.
All of these companies were affected one way or another by
Log4Shell. So coming to the node package
manager: the node package manager is the world's largest software registry.
What is a software registry? A software registry is a platform,
a solution where third-party libraries,
dependencies or code snippets are placed
for others to use in an open source scenario.
And JavaScript has been the most used language
for the last nine or ten years according to Stack Overflow.
We are going to focus on JavaScript dependencies and
NPM dependencies in this research and this attack vector.
Let's take an example of a package that lives on NPM. If you
look at it, this is a very famous NPM package called Express.
You can see it has 31 dependencies;
Express depends on 31 packages. And these
are the dependents, that is, the packages
that depend on Express. So what does that essentially mean?
We will have a visualization of that in later slides. NPM
packages are used by developers on a regular basis, obviously. And there are maintainers
of those packages who can push out updates. What that essentially
means is that any package, third-party dependency or code
that lives on NPM as a package has a maintainer,
an open source contributor, or several such maintainers
who can push out updates and so on. This is
a screenshot from NPM
from last year, and we can see how many packages there were;
the download numbers were huge. So you can get an idea of
how widely NPM dependencies are being used in
the real world. Let's take
the same package, Express, as an example.
We will now visualize what the web of dependencies
looks like. Express depends on
a lot of dependencies, and those dependencies depend on other dependencies,
and so on. So if you are using a dependency, you are not only
depending on that one, you are depending on a lot of other dependencies, essentially.
And that is basically a supply chain: if any one of those
goes bad, you are going to get affected.
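To make the "web of dependencies" concrete, here is a minimal sketch that walks a dependency graph and collects everything a package pulls in, directly or indirectly. The graph below is invented for illustration and is not Express's real dependency list.

```python
from collections import deque


def transitive_dependencies(package, direct_deps):
    """Breadth-first walk over a {package: [direct dependencies]} mapping."""
    seen = set()
    queue = deque(direct_deps.get(package, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue  # already visited, also protects against cycles
        seen.add(dep)
        queue.extend(direct_deps.get(dep, []))
    return seen


# Illustrative graph only (hypothetical package names).
graph = {
    "app": ["web-framework"],
    "web-framework": ["router", "body-parser"],
    "body-parser": ["bytes"],
    "router": [],
}

print(sorted(transitive_dependencies("app", graph)))
```

The application declares one dependency but ends up trusting four packages, and the maintainers of each one.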
So let's move forward.
As I've already mentioned, there are maintainers on NPM.
And what if the accounts of those maintainers get hacked?
Take the example of Facebook. There are pages on Facebook.
What if the admin of a page gets hacked, right? Obviously that
page would be affected. It is similar on NPM: if the account
of the maintainer of an NPM package gets attacked or hacked,
that NPM package could be affected, right? If we
look at the possibilities, there are two common ones. What if their email addresses
can be taken over? We will get to that in later slides. And what
if their passwords are leaked in some breach? In both cases,
an attacker could obviously take over the account, then pivot and tamper
with the code. So moving forward,
this is a little bit of a workflow. A package is maintained by maintainers, and
those maintainers can make changes, as already mentioned. Maintainer accounts are linked to
an email address, as with social media and
other accounts. Obviously on NPM there are accounts that are linked to
email addresses, and those email addresses are linked to a
domain or a mailbox, for example. So what
if these domains expire, right?
The maintainer or developer may be using some custom domain.
What if that custom domain expires? We will look into the
possibilities. Last
year an attacker was able to take over
an NPM library that had 6 million downloads,
to make a point about how significant the takeover of
a maintainer email or a maintainer account is for the
security of the supply chain.
So let's take an example of a package that has 36,000
dependent projects.
That package is obviously published from a software registry account,
which is on NPM. That NPM account has the email address of the maintainer,
and that email address has a domain.
The most common ones are Gmail and the like. But look at the
custom domains. What if that
domain expires? Obviously, an attacker could take over that expired domain,
then that email address, then reset
the password on the software registry, NPM, and then take over the package.
And then those 36,000 projects can be affected.
How would the attacker actually do that?
The attacker would just look at the maintainers of the
package, pull out all of the
maintainer email addresses that are available on NPM, and
then look into the WHOIS data for all the domains of those maintainers
to see whether any domain
is expired. They would just buy that domain,
claim the mail inbox, use the
forgot password flow on the NPM software registry, and publish malicious
updates of those packages to affect anyone who is using them,
right? Last year that attack was at its peak,
there was a boom of that attack, but there were
no defensive strategies, manual or automated,
wherever you googled. So we were able to
put a spotlight on it and spread awareness of how to
find NPM dependencies that are vulnerable to account hijacking,
and how you can secure your ecosystem from that type
of attack, right? So how can you
prevent those manually? You can prevent them manually by listing
all the packages that are used in your company.
You can use the package-lock.json
file
and similar files to pull out all the packages that are being used in
your project. Then, for each package, run
this command, npm view <package name> maintainers.email,
on your, maybe, personal computer
to find out the maintainers of those packages one by one. If you do that
manually, obviously that is not efficient. But yes,
then you can take all the email addresses, separate out
the domains, and look up the WHOIS data of all the domains of
the maintainers of the NPM packages
that you are using in a project. And then identify
the vulnerable ones.
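The manual steps above can be sketched in a few functions: list the packages from a parsed package-lock.json, fetch each package document from the public npm registry, and pull out the maintainer email domains. This is a minimal sketch, not the speakers' script. It assumes a lockfile with a top-level "packages" map (lockfileVersion 2 or 3) and a registry document whose "maintainers" entries carry an "email" field; the function names are made up for this example.

```python
import json
import urllib.request
from urllib.parse import quote


def packages_from_lockfile(lock):
    """Collect package names from a parsed package-lock.json (lockfileVersion 2 or 3)."""
    names = set()
    for path in lock.get("packages", {}):
        if "node_modules/" in path:
            # "node_modules/a/node_modules/@scope/b" -> "@scope/b"
            names.add(path.rsplit("node_modules/", 1)[1])
    return names


def fetch_metadata(name):
    """Fetch a package document from the public npm registry (network call)."""
    url = "https://registry.npmjs.org/" + quote(name, safe="@")
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def maintainer_domains(metadata):
    """Return the set of email domains listed under the document's maintainers."""
    domains = set()
    for entry in metadata.get("maintainers", []):
        email = entry.get("email", "") if isinstance(entry, dict) else ""
        if "@" in email:
            domains.add(email.rsplit("@", 1)[1].lower())
    return domains
```

A run over a project would call packages_from_lockfile on the loaded lockfile, then maintainer_domains(fetch_metadata(name)) for each name, and pass the resulting domains to a WHOIS check.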
But that's not efficient. So we found automated ways, because
often hundreds of packages are being used by a single organization on a project.
So it's preferable to have some type of cron job and automation to
do that on a regular basis, and not just copy and paste
a command and do that manually one by one. So we
scripted a mini tool that you can use in your pipeline to
look for takeover-prone NPM packages in your code base,
to get rid of them, to turn off auto updates, or
to be vigilant. You can install that
automation script from here and
use it. You can use this command after
adding your packages to the packages text file, and then
use it to find the vulnerable ones.
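The mini tool itself is not reproduced in this transcript. As a rough sketch of the kind of expiry check such automation needs, the code below parses an expiry date out of raw WHOIS text and compares it with the current time. The "Registry Expiry Date" label is an assumption: it is common for .com and many other gTLDs, but other registries use different labels, so a result of None means "could not tell" and needs a manual look.

```python
import re
from datetime import datetime, timezone

# Label used by many gTLD registries; an assumption, other TLDs format this differently.
EXPIRY_RE = re.compile(
    r"Registry Expiry Date:\s*(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z"
)


def expiry_from_whois(text):
    """Extract the expiry timestamp (UTC) from WHOIS output, or None if absent."""
    match = EXPIRY_RE.search(text)
    if not match:
        return None
    parsed = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def is_expired(text, now=None):
    """True if the registration has lapsed, False if not, None if unknown."""
    expiry = expiry_from_whois(text)
    if expiry is None:
        return None
    now = now or datetime.now(timezone.utc)
    return expiry < now


sample = "Domain Name: OLD-STARTUP.EXAMPLE\n   Registry Expiry Date: 2020-05-01T12:00:00Z\n"
print(is_expired(sample))
```

Scheduling a check like this from a cron job or a CI step is what turns the one-off manual audit into the regular monitoring described above.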
So now getting to the real jewel.
We wondered how
big the effect of this
vulnerability, this NPM attack vector, is on a global level. So
what we did was gather
packages from different publicly
available sources, essentially
all of the NPM packages that were available at that time.
Then we used our in-house servers,
wrote some scripts and did some research
to find out how many of those packages are actually vulnerable.
And we are talking about millions of packages that we researched,
gathered from different available sources. So now Hassan
will come and present how we did that and what
we found out. And that's the most exciting part of
this talk. Hassan, can you please come? Yeah, sure.
Dhanish, those were really great insights.
Now let me share my screen so we can jump into the research.
Dhanesh, can you give me permissions so that I can share my screen?
Yeah, sure. Yes.
Now you can do that. 1 second.
Okay. Danish, can you see my screen? Yes,
it's perfect. Okay. Amazing. So,
yeah, as Danesh mentioned, we are going to focus on
the at-scale research that we have performed. Before I jump
into the research, I just want to quickly introduce myself.
I am Khan, and I've been a security researcher and security engineer.
I have multiple CVEs under my name. I've had the chance to
present supply chain attacks and this
research at multiple conferences like Black
Hat, Sascon, DevSecCon and other conferences as well.
And I really love to perform mass scans
at scale. And this is a QR code for my LinkedIn
if you want to connect. So, yeah,
about this research. Initially we started
by collecting all of the NPM packages.
At that time there were 2.1 million NPM packages
available on the NPM registry. We used
multiple technologies and multiple scripts
to extract the email addresses from these
NPM packages. Because this research was about the account takeover
vulnerability, as Danish explained in
his previous slides, we extracted the email addresses.
When we performed the extraction, we came up
with 6.7 million email addresses.
This is just a graphical representation that
shows, from step one, that we collected packages,
and from these packages we collected 6.7 million email addresses.
This is the Python script that was used, and it's publicly
available.
It uses the NPM public API to extract the email addresses
from the packages. And of course,
when we extracted the email addresses,
it was really obvious that multiple packages were
being maintained by a single person with one email address.
So we started sorting out the email addresses and came up with a
number of about 600k unique
email addresses. In this representation, you can see that
we collected packages, then extracted email addresses, and from the email addresses
we collected the unique emails. Then, because
we have to look for expired domains, since to
take over an account you have to claim the
expired domain and then recover the account on the NPM registry,
we extracted all of the domains from these email addresses and
found that there were about 132k domains initially
in this research.
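The de-duplication and domain extraction steps described above can be sketched as follows. This is an illustrative reconstruction, not the speakers' published script. The small set of free mail providers used to single out custom domains is an assumption added for the example.

```python
# Hypothetical list of shared mail providers whose domains cannot be re-registered.
FREE_PROVIDERS = {"gmail.com", "outlook.com", "yahoo.com"}


def unique_emails(emails):
    """Normalise case and whitespace, drop malformed entries, remove duplicates."""
    return {e.strip().lower() for e in emails if "@" in e}


def unique_domains(emails):
    """Return the distinct domains behind the unique email addresses."""
    return {e.rsplit("@", 1)[1] for e in unique_emails(emails)}


def custom_domains(emails):
    """Domains worth checking for expiry: everything except shared providers."""
    return unique_domains(emails) - FREE_PROVIDERS


raw = ["Dev@Old-Startup.example", "dev@old-startup.example ", "me@gmail.com", "broken"]
print(sorted(unique_emails(raw)))
print(sorted(custom_domains(raw)))
```

Working on the reduced set of domains rather than on every email keeps the number of WHOIS lookups manageable, which is the point of going from millions of addresses to about 132k domains.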
After finding the unique domains,
we came up with the number 132k, and when
we started looking into expiry we used multiple resources,
including APIs and WHOIS lookups, to find the expired
domains. We came up with 675 domains which
were actually expired across the
NPM registry. From this perspective of
the research, you can see we started with 2.1 million NPM
packages, and now we are down to 675
domains only. And let me add one more thing
here. This research is a two-way process. In
the first phase of the research we go from
packages down to expired
domains, and then, when we start doing the reverse attribution,
we attribute those domains to their email addresses and
finally identify the vulnerable packages.
And a special thanks to one of my colleagues, Yelp, for
helping us find the expiration of several
domains and define the procedure.
Yeah, this is the whole procedure that we used for the extraction of
the expired domains, as
you can see. So now we are on the reverse part,
we can say, from
domain attribution to email attribution, and we found
that there were 845
separate unique email addresses that
used these expired domains.
This is just the graphical representation of the complete process,
which explains that we started with the packages, then
went down to the expiration of the domains, and
then worked back from the end:
we attributed those domains to their email addresses, and now we are on the
path of attributing them to their vulnerable packages.
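The reverse attribution step, from expired domains back to email addresses and then to packages, could look roughly like this. The input shape (a mapping from package name to its maintainer emails) and the sample names are assumptions for the sketch.

```python
from collections import defaultdict


def vulnerable_packages(package_emails, expired_domains):
    """Map each expired domain to the packages whose maintainer emails use it."""
    expired = {d.lower() for d in expired_domains}
    hits = defaultdict(set)
    for package, emails in package_emails.items():
        for email in emails:
            domain = email.rsplit("@", 1)[-1].lower()
            if domain in expired:
                hits[domain].add(package)
    return dict(hits)


# Hypothetical sample data.
package_emails = {
    "pkg-a": ["dev@old-startup.example"],
    "pkg-b": ["dev@old-startup.example", "x@gmail.com"],
    "pkg-c": ["y@gmail.com"],
}

print(vulnerable_packages(package_emails, ["old-startup.example"]))
```

Keeping the result keyed by domain makes it easy to see how many packages a single lapsed registration exposes.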
So before we jump to the conclusion of how many vulnerable
packages we identified, let's look at some fun
and very impactful stats. If we divide the total
number of email addresses by the number of unique email addresses,
we get eleven. This means that, on average,
each unique email address appears in about eleven NPM
packages. And here in this screenshot,
let's look at the first sample.
We can see it's like 3800,
and then we have an email in front of it.
This means a single email address has
been used in that many packages.
So just imagine if this email's
domain expires and someone
claims it: it's going to affect 3000
plus packages. And if you go to
the last number, you can see we have 9000 plus
on a single email address. This number is huge.
So just imagine
how much impact one
expired domain can have on NPM packages.
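The per-email counts shown in the screenshot and the average quoted above can be reproduced with a simple tally. The sketch below uses made-up sample data, and the final line only re-does the division with the rounded figures from the talk (6.7 million addresses, 600k unique).

```python
from collections import Counter


def email_usage(package_emails):
    """Count how many packages each maintainer email appears in."""
    counts = Counter()
    for emails in package_emails.values():
        # Count an email once per package, ignoring case differences.
        counts.update({e.lower() for e in emails})
    return counts


def average_packages_per_email(counts):
    """Total email occurrences divided by the number of unique emails."""
    return sum(counts.values()) / len(counts) if counts else 0.0


# Hypothetical sample data.
sample = {"a": ["x@d.io"], "b": ["x@d.io", "y@e.io"], "c": ["X@d.io"]}
counts = email_usage(sample)
print(counts.most_common(1))
print(average_packages_per_email(counts))

# The talk's rounded figures: 6.7 million occurrences over 600k unique addresses.
print(round(6_700_000 / 600_000))
```

The most_common view is what surfaces the outliers, such as a single address attached to thousands of packages.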
Another quick calculation:
if we take this eleven that we derived before
and multiply it by 845,
which is the number of unique
email addresses we found, we come up with a number
like 9499. This
represents the estimated total of vulnerable packages.
But when we did the actual
research, we came to know the total number
of vulnerable packages was 2843,
which is a really small number. Again,
we started with a huge number, which was 2.1
million, and now we have come down to 2000 and
something. If you look at it
from this perspective, you might be thinking,
okay, this research has no impact, the number is very low.
But now let me show you some really good stats
that show how impactful this research is. If
we look at the total packages, we had about
2800. And if we look at the dependent
repos, there are about 250k
repos that depend on these packages.
And as we have discussed, every package
has multiple dependencies and multiple dependents. So if one
package is affected, it's going to affect the others as well.
If we look at the dependent packages cumulatively
across all of these vulnerable packages, we come up with ninety
three k. And if we look at the forks and contributors,
the numbers are really astonishing. If we look at the forks,
we come up with 400k, which means there are 400k
people who have copied these vulnerable packages.
When someone forks something, it gives you an idea that the code might
get used on other users' computers.
And if you look at the number of contributors, you can see 50k
people are contributing to these packages. So the number is huge.
If you look at the vulnerable packages,
the number is 2843, which is really small.
But when we look at the impact, look how many
forks, how many contributors, how many dependents these packages are
actually affecting right now. So yeah, this is really huge.
There are millions of downloads happening on
single NPM packages. If we just look into the
other packages, like NPM package and maybe
express package security
packages. You guys see
many?
Some. Hassan,
can you hear me? Yes, I can hear you.
Hello, can you hear me? Can you hear me?
Yes, I can. Can you hear me?
Hello, guys. So hold on for now. Hassan will be joining us in one
or two minutes, and
then he will continue with
the remaining part. So no worries.
Welcome back. Hasan. Yeah, Danish, can you hear me? Yes,
that's perfect. Now let me share my
screen once again so we can continue.
Danish, can you see my screen? Perfect.
This is the slide we have to continue, right?
Yeah. Okay. Amazing. So,
yeah, we were talking about the impact of this research.
As you know, in this
research we extracted the
email addresses. But email addresses
can also be found in many places, like dark web dumps or data leaks,
et cetera. So what if these emails have
appeared in data breaches and
have been leaked? Just imagine if
the leaked credentials are actual NPM credentials or GitHub credentials.
So, yeah, the impact is really huge here.
We did not only research NPM. We also
did research on RubyGems.
We looked at the Ruby gems.
Initially there were about 160k. For this research
we did not just download all the gems,
we did something different. We started scraping the
packages that are publicly available on
the Internet. For this, we used multiple sources
like GitHub, Bitbucket, GitLab and
others, and scraped all the publicly available gems from
the Internet. And this was the process that we used.
We scraped, we extracted the gems, and then we identified
the dependency confusion vulnerability in
these gems. This research has a very tricky part.
For the extraction and
the identification of dependency
confusion, we used multiple scripts that are linked
below, we used several techniques,
and we created a vulnerable Ruby gem as
part of the dependency confusion test. Once we had this
script, the hardest part was to extract, or exfiltrate,
the data from the vulnerable gem.
We used multiple techniques. We used Burp Collaborator
with, as you can see, the nslookup, whoami
and hostname commands, and we extracted as much information as we could
for further exploitation of the packages.
And the fun part
of this research: we tested
just a very small chunk of gems. About 1700
gems were scanned, and out of these we found
285 gems were vulnerable, which is actually 16%
of the gems scanned at that time. So just
imagine: there were 160k in total and we scanned
only 1700. This percentage could go
up, and think how many other packages or gems
could be vulnerable right now to a dependency confusion attack.
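One defensive way to look for dependency confusion candidates is to list the gems a Gemfile names and check which of those names nobody has registered on the public index, since an unclaimed name is one an attacker could register. The sketch below is not the speakers' script. The regular expression covers only simple gem lines, and the lookup is passed in as a function (named exists_publicly here, a hypothetical helper), for example one that requests https://rubygems.org/api/v1/gems/<name>.json and treats a 404 as "not registered".

```python
import re

# Matches lines such as: gem "rails", "~> 7.0"  or  gem 'internal-billing'
GEM_LINE = re.compile(r"""^\s*gem\s+['"]([^'"]+)['"]""", re.MULTILINE)


def gem_names(gemfile_text):
    """Return gem names declared in a Gemfile, skipping commented-out lines."""
    return GEM_LINE.findall(gemfile_text)


def confusion_candidates(gemfile_text, exists_publicly):
    """Gems named in the Gemfile that are not registered on the public index."""
    return [name for name in gem_names(gemfile_text) if not exists_publicly(name)]


gemfile = (
    'source "https://rubygems.org"\n'
    'gem "rails", "~> 7.0"\n'
    "gem 'internal-billing'\n"
    '# gem "commented-out"\n'
)

# Stand-in lookup for the example: pretend only "rails" exists publicly.
print(confusion_candidates(gemfile, lambda name: name == "rails"))
```

Injecting the lookup keeps the parsing testable offline and lets a team swap in whichever registry query it trusts.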
So this is the script that we
created and used for the identification of the dependency
confusion vulnerability. Yeah. For another
problem with gems, we created another
tool. It's called a gem scanner. If you have dependencies
or gems in your code, you can use this tool. It will identify
vulnerable packages and outdated packages and
print them on the terminal. Excuse me,
this is just an example of the output
of the tool. As you can see,
some are labeled as already
on the latest version, and others show the latest and current versions, which indicates
that we have to update these gems. So we
have talked about the problem of account takeover
and dependency confusion. So what are the solutions to
these problems? First of all, MFA. MFA has
been around for a long time now. NPM and even
RubyGems maintainers are implementing these
types of protections. They started enforcing MFA
for the top packages,
and now they are extending it to other packages
as well. As for other solutions,
we have to keep an eye on the
latest updates of the packages. We have to keep an eye on
the CVEs and the latest security patches
for these dependencies. We have to
perform manual audits and use automation, even
in CI/CD pipelines, to protect
our infrastructure from this third-party code. And what
I prefer is to validate the checksums
of these packages, so you know exactly what
code you are importing into your code. And of
course we have to mature our CI/CD pipeline
and our pre-commit checks.
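The checksum validation mentioned above can be illustrated with the integrity strings that package-lock.json records for each package, which have the form "sha512-" followed by a base64 digest. The sketch below recomputes that string for some downloaded bytes and compares it. It is a simplified illustration: it handles only sha256, sha384 and sha512, and the function names are made up for the example.

```python
import base64
import hashlib

SUPPORTED = ("sha256", "sha384", "sha512")


def sri_digest(data, algorithm="sha512"):
    """Build an integrity string such as 'sha512-<base64 digest>' for the bytes."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def matches_integrity(data, integrity):
    """True if the bytes match any supported hash listed in the integrity value."""
    for entry in integrity.split():  # the field may list several hashes
        algorithm = entry.split("-", 1)[0]
        if algorithm in SUPPORTED and sri_digest(data, algorithm) == entry:
            return True
    return False


tarball = b"pretend these are the bytes of a downloaded package tarball"
recorded = sri_digest(tarball)

print(matches_integrity(tarball, recorded))
print(matches_integrity(tarball + b"tampered", recorded))
```

A check like this only proves the bytes match what the lockfile recorded; it does not say the recorded version was trustworthy in the first place, which is why it complements the audits above rather than replacing them.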
We have to secure the development lifecycle as
well. And these are some solutions that
can help you. For example, if you are using Ruby
infrastructure, Ruby on Rails MVC, you can use Dependabot
with GitHub, you can use
bundler-audit, you can use Brakeman to identify vulnerabilities
in dependencies. If you are using Node.js,
you can use npm audit, NodeJsScan and Retire.js
to check integrity and find
vulnerabilities in dependencies, et cetera. And other
tools can obviously be used as per your infrastructure.
And if you are looking for a commercial solution, we know that
SBOM has been standing out
in the market right now. It's really at its peak,
and it's definitely worth having an SBOM in
your organization to protect you from such attacks.
And this is one of the SBOM solutions that
has been really good in the market and can be
used on your own code as well for protection from
these open source attacks. And I
was reading the news and I came to know that it was made compulsory
to have an SBOM solution in your own company. And this
act was made by none other
than Joe Biden. So, yeah, I think that's
pretty much it about this research.
If you have any questions, you can
reach out to us on LinkedIn or any other
platform you like. Yeah, so that's all from my side.
So, Hassan, can you move to the last slide?
So if anyone wants to connect with us, they can just scan
the QR codes. The last slide after this one.
After this one. There's no slide after this one.
Okay, I just updated that
on my slides. Let me just recheck that.
Okay. Let me just share the screen
for the sake of the audience, if they want to connect with
us, so they can do that
easily.
So if anyone wants to connect with us, they can just scan
these QR codes, come to our LinkedIn and
ask any questions or simply stay
connected. On the left, if you scan
the QR code, you will go to my LinkedIn. And on the right, if you
scan that, you will go to Hassan's LinkedIn and stay
connected. Okay, Hassan, do you have anything to
share after this one? Actually,
I do. If anyone
is interested in our upcoming research or the research that we have already
done, we did
at-scale scanning for hard-coded
secrets, which included
AWS private keys, Stripe private keys and a
lot of other private credentials in the open source landscape.
That included WordPress plugins and NPM
packages. We scanned all
of those packages and plugins for these types
of secrets to find the mistakes developers make
when they publish hard-coded secrets in
public code. So, yeah, stay tuned and you
may be able to see our research someday at
another conference or maybe this one. Yeah, this is all
from my side. Same. So,
guys, it was really nice to be part of this
session, and I hope you found it a really productive
and insightful session. And again, if you have any questions, feel
free to reach out to us. Thank you.