Your trusty Python package: TTPs of attacks on OSS in Python

Abstract

What could be better than another talk on supply chain attacks? A talk with a live demonstration of TTPs. Attacks on OSS are not new and they will stay with us, so why not learn a bit more? In this talk we not only reiterate what is known, but also show how it works and how to protect ourselves.

Summary

Tactics, techniques and procedures of attacks on open source software in Python. The code I'm going to show you today can be weaponized for malicious purposes. Use only for educational purposes, and please always do your own research.
In 2023 alone there were at least four events that were widely discussed in the industry. Python supply chain attacks increasingly target Windows developers. For a security professional, there's no better way of learning how to defend something than trying to breach those defenses.
There are several ways attackers can compromise your supply chain when it comes to Python and Python packages. Four common reasons for an attack are initial access, perimeter bypass, data exfiltration, and obviously ransomware. Today we're going to mostly focus on the individual techniques.
We are moving to our first technical demo, which is a starjacking demo. In less than five minutes we basically stole the rating of this project and reused it to make our malicious package look like a legitimate option.
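As a rough illustration of how a reviewer might screen for starjacking, the sketch below compares a package name with the repository slug in its declared project URL. It is only a heuristic under stated assumptions: the function names, the spelling `pub-ip-info`, and the `example-org` URL are hypothetical, and plenty of legitimate projects use a repository name that differs from the package name, so a mismatch is a prompt for manual review rather than proof of abuse.

```python
from urllib.parse import urlparse


def normalize(name: str) -> str:
    """Lowercase and drop separators so 'pub-ip_info' compares equal to 'pubipinfo'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def repo_matches_package(package_name: str, repo_url: str) -> bool:
    """Return True if the owner/repo slug in repo_url plausibly belongs to the package.

    Heuristic only: it checks whether the normalized package name and the
    normalized repository name contain one another.
    """
    path_parts = [part for part in urlparse(repo_url).path.split("/") if part]
    if len(path_parts) < 2:
        return False  # not an owner/repo style URL
    slug = path_parts[1]
    if slug.endswith(".git"):
        slug = slug[:-4]
    repo = normalize(slug)
    pkg = normalize(package_name)
    if not repo or not pkg:
        return False
    return pkg in repo or repo in pkg


# Hypothetical values: the package claims a repository that is clearly someone else's.
print(repo_matches_package("pub-ip-info", "https://github.com/example-org/sampleproject"))
print(repo_matches_package("requests", "https://github.com/psf/requests.git"))
```

A check like this could run before a new dependency is approved; the star count shown on the package page says nothing about who actually owns the linked repository.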
Attackers use encoding, encryption, bytecode, and binary executables embedded inside Python packages. At the end of the talk, we will come to the example of a remote access trojan, DNS exfiltration, and proxying and tunneling of the traffic. You have to consider many, many things to successfully detect these attacks and protect yourselves against them.
We executed six methods from the intentionally malicious package, and the results show up on the receiving end, the end that imitates an attacker. The first three are quite commonly known, and exec is very well tracked by many SAST scanners. But why aren't the other obfuscation techniques detected?
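To make concrete why a direct exec call is easy for static tools to spot, here is a minimal sketch that walks a module's AST and reports calls to dynamic-execution builtins. It is an illustration, not how Semgrep or any particular SAST scanner works; the function name and the list of builtins are assumptions chosen for the example.

```python
import ast

# Builtins that execute or load code assembled at runtime (illustrative list).
DYNAMIC_CALLS = {"exec", "eval", "compile", "__import__"}


def find_dynamic_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, builtin name) for each direct call to a dynamic-execution builtin."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in DYNAMIC_CALLS:
                hits.append((node.lineno, node.func.id))
    return sorted(hits)


# The source is only parsed, never executed.
sample = "import base64\nblob = 'x'\nexec(base64.b64decode(blob[::-1]))\n"
print(find_dynamic_calls(sample))
```

A scan like this only sees Python source text. Precompiled bytecode and embedded binaries never pass through the parser, which is consistent with the gap the demo exposes.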
The most common way is for the payload to be invoked at the stage when the package is imported. The payload is fetched from an external sandbox, or the malware is executed, either from setup.py or through __init__.py. Let's run another demo to see if it happens again.
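Because setup.py and __init__.py run code at install and import time, it can help to read them statically before running pip. The sketch below parses a setup.py without executing it and flags two things: a `cmdclass` argument to `setup()` (the setuptools hook for custom install or build steps) and imports of modules often used for network access or process spawning. The function name and the module list are assumptions for illustration, and a clean result does not mean the file is safe.

```python
import ast

# Modules that are unusual in a build script (illustrative, not exhaustive).
SUSPICIOUS_MODULES = {"socket", "urllib", "requests", "subprocess"}


def setup_py_red_flags(source: str) -> list[str]:
    """Statically inspect setup.py source and return a sorted list of findings."""
    flags = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", "")
            if name == "setup" and any(kw.arg == "cmdclass" for kw in node.keywords):
                flags.add("setup() passes cmdclass (custom install/build steps)")
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in SUSPICIOUS_MODULES:
                    flags.add(f"imports {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""
            if module.split(".")[0] in SUSPICIOUS_MODULES:
                flags.add(f"imports {module}")
    return sorted(flags)


sample = '''
import socket
from setuptools import setup
from setuptools.command.install import install

class PostInstall(install):
    def run(self):
        install.run(self)

setup(name="demo", cmdclass={"install": PostInstall})
'''
print(setup_py_red_flags(sample))
```

The same function can be pointed at a package's __init__.py to look for unexpected network or subprocess imports before the first import.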
All right, so Semgrep was finally executed. Defender did a really good job of detecting it dynamically. Now the most fun part: exfiltration and command and control. Infostealers and remote access trojans are very prevalent.
We downloaded the malicious package, installed it, and as soon as it was imported, some data was collected locally and sent outside. It's either an opportunistic type of attack, where the attacker relies on a lack of competence on the receiving end, or one that simply bypasses static analysis.
A small remote access trojan (RAT) was embedded in the malicious package. It spins up a Flask instance on the victim's machine and populates these endpoints. All the information requested was sent back to the attacker. Ngrok is actually a legitimate product, but you have to be careful.
And now we are moving into defenses. You can use individual development sandboxes. Review project details and reputation, as we demonstrated earlier. Avoid projects that are not published on PyPI. Restrict direct downloads of dependencies. Use proxies.
Thanks to the Bunny Road CTF team and flaticon.com for the graphics. Please use one of these two QR codes. I will release all these code samples so you can go through them and play with them on your own. With this, I will wrap up this talk.

Transcript

This transcript was autogenerated. To make changes, submit a PR.

Hello everyone. Welcome to the Conf42 DevSecOps 2023 event. My name is Leonid Akinin, and today I'm presenting Your trusty Python package: tactics, techniques and procedures of attacks on open source software in Python. And yes, as you can see from the title, today we're going to talk about supply chain attacks in Python first and foremost. Before we proceed, an important disclaimer. The code I'm going to show you today can be weaponized for malicious purposes. If any materials are misused, neither I nor my employer are responsible for any liabilities or damage caused by the misuse of this material. So please use it only for educational purposes, and please always do your own research. We have a lot of ground to cover today, so let's go briefly through the contents. First and foremost, we're going to go through why this topic is important and why it is still relevant in 2023. We're going to go through a really brief history of supply chain attacks, and then we're going to proceed to the actual demos of various techniques. We will obviously cover defenses, the most common defenses we can use against this type of attack. And last but not least, we will go through some credits and references that are quite useful in this demonstration. Let's get started. So why is this topic important? As you can see, in January a very well-known machine learning package was compromised in a supply chain attack. In particular, one of its dependencies was compromised. Following that event, in May there was a temporary suspension of new users and projects at PyPI because there was a massive influx of malicious packages. Then in July there were six quite dangerous malicious packages published that were specifically targeting Windows users.
It is actually quite common these days for Python supply chain attacks to target Windows developers and Windows users, because the more comfortable the Python ecosystem becomes on the Windows platform, the more malicious code we will see in this environment. And last but not least, the VMConnect supply chain attack that was discovered some time ago. It's still up and running, and more and more packages are being published. And there was one particular package that was targeting VMware products. So as you can see, in 2023 alone we have at least four events that were widely discussed in the industry, and no one actually knows how many more events there are in the wild, how many of them were actually disclosed, and how many of those attacks are happening in the background, completely hidden and undetected. And most importantly for me as a security professional, this topic is very relevant because I always consider that to learn how to defend, you need to learn how to attack. And if we refer to these three quotes from famous historical figures of different eras, active defense, or offensive operations used as a means of developing defenses, is not a new concept. So I personally believe that for a security professional, there's no better way of learning how to defend something than trying to breach those defenses.

Let's go through a really quick history of supply chain attacks. The first supply chain attacks date back to 2017. At that time, most of them were seen as purely opportunistic, with a low success rate, and people didn't really pay much attention. The initial campaigns targeted Docker Hub, npm, and PyPI. So all the usual victims of 2023 were already being exploited back in 2017. The SolarWinds attack.
The infamous SolarWinds attack was probably the first high-profile attack that really drew attention, and people started thinking, okay, we have to do something about it. If you look at the right side of the slide, colleagues from ReversingLabs and colleagues from Sonatype collected a lot of really useful information, as we can see on both of these graphs. Starting from 2020, there was quite steady growth. And if we look at the bottom chart, it says 742% average growth rate year over year, and that was between 2019 and 2022. And as we can see on the top chart, while fewer malicious packages are being published on PyPI towards the end of 2023, there's more and more going on in npm. PyPI attacks are still very much relevant, though. Nowadays, researchers and malware analysts observe techniques where PyPI packages are used solely as droppers for malicious code that's written in other languages. And I believe there was even an example of malicious JS code, pardon, not running inside of, but delivered by a malicious PyPI package. Also, supply chain attacks became one of the favorite vectors for the major APTs due to the traditional lack of control over development environments, especially in the era of flexible working, where a bring-your-own-device policy is quite common and some companies do not invest in endpoint protection. It is quite a common scenario that one poorly protected development workstation becomes the one crack in your defenses that an attacker needs to get through. PyPI attacks are also seen alongside phishing campaigns, where a particular malicious product or project is advertised. And so, yeah, APTs are actively using it. So it's not just phishing anymore.
But yeah, supply chain attacks are also seen as a means of initial access. Attacks range from opportunistic ones, where a bunch of packages are just published out there and attackers sit and wait for low-hanging fruit, people who are not cautious enough or aware enough of this type of danger, to attacks tailored towards a specific organization. The latter normally indicates there's been a lot of reconnaissance done in advance: attackers really investigated what technologies a particular organization is using so they can target specific projects and dependencies.

So let's get started with the main part: tactics, techniques and procedures in supply chain attacks. First of all, for those who are not familiar with the acronym TTP: it is an acronym developed by the MITRE Corporation, and it stands for tactics, techniques and procedures. Tactics is basically the reason why an attacker performs the action. Here we can say the four common reasons are initial access, perimeter bypass, data exfiltration, and obviously ransomware. They can be mixed and matched, so it really depends on the particular APT group, the particular attack, and the particular campaign targeted against an organization or a group of users. Techniques is basically how an attacker performs the action. Here they upload malicious packages to repositories, they utilize typosquatting, they utilize starjacking, and they obviously inject malicious code through compromised developer credentials. It's quite a common scenario that credentials were reused, they were leaked through some secondary data breach, and then those credentials happened to be the credentials for a PyPI account. This is how common infiltrations happen. And procedures is basically a step-by-step application of techniques.
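Of the techniques just listed, typosquatting lends itself to a simple automated screen. The sketch below uses `difflib.SequenceMatcher` from the standard library to flag a requested package name that is very close to, but not identical to, a name on an allow-list. The function name, the threshold, and the allow-list are assumptions for illustration; a real policy would need tuning to keep false positives manageable.

```python
from difflib import SequenceMatcher


def likely_typosquats(candidate: str, known_names: list[str], threshold: float = 0.8) -> list[str]:
    """Return the known package names that `candidate` closely resembles without matching exactly."""
    cand = candidate.lower().replace("_", "-")
    suspects = []
    for name in known_names:
        ref = name.lower().replace("_", "-")
        if cand == ref:
            continue  # exact match is the real package, not a lookalike
        if SequenceMatcher(None, cand, ref).ratio() >= threshold:
            suspects.append(name)
    return suspects


# Hypothetical allow-list of packages a team already depends on.
approved = ["requests", "numpy", "flask"]
print(likely_typosquats("reqeusts", approved))
print(likely_typosquats("requests", approved))
```

Running a check like this when a new dependency is proposed is cheap, and it catches the transposed or dropped characters that typosquatted names tend to rely on.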
So today we're going to mostly focus on the individual techniques rather than the step-by-step application, the whole procedure, when it comes to supply chain attacks. We will focus on things like supply chain compromise: how things get compromised and how malicious code ends up in the legitimate supply chain sources that we're using. Then common defense evasion: we're going to go through payload obfuscation as a technique for defense evasion, and we're also going to go through traffic obfuscation. Specifically, we're going to use an example of tunneling. But I also mention DNS exfiltration. There are materials in the references part of this slide deck, so please go through them and read about DNS exfiltration, because it's a really common way for data to leave your protected environments. Then we're going to go through installation and delivery. Sometimes it's installation first, then delivery; sometimes it's delivery, then installation. But we will look into the entry points where you can expect a malicious payload to land in your environment. Last but not least, we're going to cover two examples: a data exfiltration example, which will be a very basic infostealer or credentials harvester, and a command and control example, where I will demonstrate a very rudimentary RAT, or remote access trojan.

First of all, supply chain compromise. There are several ways attackers can compromise your supply chain when it comes to Python and Python packages. First and foremost, public project repository infiltration, such as transfer of ownership. If the developer of a project is tired of supporting it, the project may be transferred to someone else who claims they will be a legitimate contributor, a legitimate owner. That's one example of how a project can be infiltrated.
Also official channels of contribution, especially when a project is poorly managed and there is a big rotation, sorry, not retention but rather attrition of contributors, and the retention rates are low. When people are changing on a monthly basis, it's very hard to track who is contributing what. This is one of the ways malicious code can land in a legitimate project. Dependency infiltration, obviously. Why target the main project if you can infiltrate a dependency, especially if that dependency is poorly managed? That is a common thing, as we saw on one of the earlier slides with the compromise of an ML package. This is how your project, or a project that you are using, can get compromised. Attacks on private PyPI servers and proxies: on one of my previous engagements I actually observed that a company had a private PyPI proxy just sitting in public, wide open. The good thing is that the software they used to host their Python packages wasn't vulnerable. But imagine if that had been a vulnerable PyPI server wide open to the Internet; we could have seen quite an interesting chain of events there. Also overly permissive PyPI repositories: allowing already published packages to be overwritten is very dangerous, because this is how malicious code can land when a legitimate version is replaced with a malicious one. And yeah, obviously vulnerable PyPI servers. As I've mentioned, it's a good thing that the company I worked for didn't have any vulnerabilities in their PyPI proxy, but if they had, that would have been quite an unpleasant chain of events. Then public GitHub repos and FTP servers.
Quite often I see a GitHub repo that says, hey, this is a Python package, just clone it and do pip install, or hey, we already pre-built it for you, there's a tar.gz archive sitting in the releases, or we just host it on an FTP server because we don't want to mess with PyPI. This is not specifically an indication of a malicious package or of attackers; it can be just someone being lazy with their build, release and publishing processes. But with those particular instances I would be very cautious, and I would go through the things we discuss today to make sure that, if it's not officially published on PyPI, it's safe to use.

Typosquatting. A really popular one. If you Google typosquatting in Python, there will be lots and lots of articles. It's getting harder and harder for attackers these days, though, as major software vendors, for example Microsoft, started registering stub packages. Basically, a stub package is a package that doesn't have any functionality, but rather points you to the actual package that contains what you're looking for. The reason companies publish those stubs is that they want to reserve the name to prevent a typosquatting attack. This is actually a very clever technique, and I would highly encourage you, if you're working on a project that is actively published out there in public, to think about registering some stub packages so you can defend yourself in advance and prevent any attempts at typosquatting.

And starjacking. Starjacking is actually my favorite one. It is a combination of a technical attack and an aspect of social engineering, playing with the psychology of developers and users. It utilizes what I consider a technical flaw, although some people say it's just by design in the PyPI system.
This is where you can make a reference to a GitHub source that doesn't belong to your project, effectively stealing the rating stars of that repository. And with that being said, we are moving to our first technical demo, which is a starjacking demo. Obviously my Kali Linux got locked at this point because we spent some time talking through the attacks. But hey, let's proceed. So let's go to VS Code and to this package that is called pub IP info. If we look at the setup.py file that is basically the build configuration for this project, we can see that there is a name of the project and there's a version. Let's increment this version, because I've been testing it before; let's increment it to 2.0, so it's the second major version of this package. And let's go back a few directories so we can get to the proper one. So let's go to sources. All right, what we're going to do now is execute python -m build, and while it's building we're going to go to the repository on TestPyPI. The reason I'm using TestPyPI is that the production PyPI is actually monitored by different security vendors. So what we're doing here is basically preventing my account from being blocked for publishing an intentionally malicious package. If we go to your projects, as you can see there's already a project, and I believe we have a few versions already published there. So now we're going to clean up whatever packages are there. Sorry, it takes some time as I'm working in a virtual machine, so we have to be patient, and there are a lot of processes, including the recording, running in the background. We just delete this version to make sure we don't have any releases at all. So no releases found under pub IP info. Now let's look.
Okay, we have successfully built the package, and now we're going to execute twine upload. It will basically pick up the tar.gz and the wheel and push them to TestPyPI. It's nearly there. Great. It's almost there. Check again. Brilliant. So this project is now published, and if you click on the link, all right, as you can see, we just published a brand new project. If we click on the management panel, it shows that it was released just one minute ago. So it's a brand new project, and we deleted all previous releases. So if we go to the project itself. Sorry, we'd better use this link. Once again, apologies. So yeah, if we go to the test.pypi.org project page for pub IP info, this is the version that we just published. We can see that this project already has almost 5000 stars, it has nearly 2000 forks, it has a bunch of open issues and a bunch of open PRs, and it also has a pretty legitimate-looking README with a description, the installation method, and so on and so forth. So what actually happened here? If we go back to setup.py, we can see that the URL is pointing to a completely different resource. This resource is one of the example projects used by the documentation that explains how to package your Python projects and how to publish them. And if we go to this link, yeah, these are the stats we are looking at. So what just happened? In less than five minutes we basically stole the complete rating of this project and reused that rating to make our malicious package look like a legitimate option. So this was starjacking.

Now, the next point in this talk is defense evasion and obfuscation. Why do attackers want to obfuscate their payloads and their traffic? Well, first of all, they want to evade as many defenses as possible.
And if their malware samples are detected, they want to make sure it will be harder for reverse engineers and malware analysts to get to the bottom of the actual payload that is supposed to do something bad in our systems. When it comes to payload obfuscation, some of the common techniques are encoding, encryption, bytecode, and embedding binary executables written in different languages inside of Python packages, because in a Python package you can include arbitrary files; literally whatever you want can be included. And when it comes to traffic obfuscation, which I will demonstrate at the end when we come to the example of a remote access trojan, there is DNS exfiltration, proxying, and tunneling of the traffic. This is also very interesting and an important part to cover, because supply chain attacks do not stop when you just install the package. A supply chain attack actually consists of many components, as we already discussed, and you have to consider many, many things to successfully detect them and protect yourselves against them. With payload obfuscation, I believe the best way is to go through a demo. So we're going to get back into Kali Linux and check where we are at the moment. Actually, it would be easier if we switch to the tmux session. So we're going to go to folder number one. By the way, when this talk is released, I will also release the repository so you can go through all these examples on your own, figure out how stuff works, and play with it a little bit. So let's build the package. As you can see, it has quite an obfuscated name, malware obfuscation. It will take a little while to create the package. Nothing too crazy. We still have time. We are not even halfway through this talk. Yeah, so we successfully built it.
And what we're going to do now is run the installation of the package to make sure it's present in our system. Yeah, successfully installed. For the ease of this demo, I'm going to run IPython in this part of the terminal, and in the top section of the terminal I'm going to start a receiver. The receiver is what we're going to use in all the other demonstrations. It imitates an attacker sitting somewhere out there, waiting to catch the traffic from the attacked machine. So what we're going to do now: from pymalware obfuscation, we're going to import the techniques module. Inside the techniques module there is an obfuscation techniques class, and we're going to start running its methods one by one. So the first one is executed. Then we're going to run the Unicode payload. Then we're going to run, I believe, the encryption payload. And for the encryption payload, I actually need an encryption key URL. I'll explain one by one what each and every method does. It takes some time. Yes, it's working. Guys, this is a heavy technical demo, so hopefully everything will work on the first try. But just in case, please be patient. There's a lot of fun stuff in this demo. All right, so we just executed one, two, three, four, five. I believe I missed one, which should be the combined payload method. So we just executed six methods from the intentionally malicious package. And as you can see on the receiving end, the end that imitates an attacker, we have some JSON data coming in, and it basically contains the operating system name, version, host IP, and the username. This is an example of very basic data exfiltration. To understand what's going on behind the scenes, let's move back to VS Code and open up the obfuscation package and the techniques.py module.
If we scroll up, in the docstring there's a snippet of code that does a very basic thing: it imports sys, socket, and other modules, starts a socket, connects to the receiver, and then forms JSON data that contains information about the operating system and everything that we just saw in the tmux terminal. The first method that we executed was the Base64 payload. This string, the reversed Base64 string, is that original code. So why reversed? Because there are scanners that recognize Base64. They decode the Base64, then they do static analysis and check whether the result actually looks like a programming language. Reversing the Base64 reduces, not removes, but reduces the risk for the attacker that the payload will be discovered. And when this string was reversed back, decoded, and passed to exec, we saw the first message, the first part of the traffic coming into the receiving end. The next method was the Unicode payload, which was exactly the same code but transformed into a list of Unicode numbers, the numbers that correspond to particular characters in the Unicode table. The reverse operation consisted of joining it into a string and then executing it using the built-in exec function. Then the Base64 Unicode combined payload was a combination of the first two: the code was Base64-encoded and reversed, then transformed into this list, and the list was also reversed. Then the reverse operation was performed to execute exactly the same code snippet, which imitates stealing some basic data from the system and pushing it to the remote location where our supposed attacker is sitting. And another one, called the encryption payload, where I had to supply a Pastebin URL, is basically an example of how attackers can use droppers. A dropper is basically a first-stage payload, or first-stage malware, that goes outside, in our case to Pastebin, and grabs something.
In this instance, we're not talking about the payload sitting in an external source, but rather the encryption key. Here the attacker obfuscated the same snippet of code that we discussed at the beginning of this part, and it is decrypted using the encryption key that is sitting in the Pastebin. So this is a very neat technique, and they can extend it. For example, there can be several pastebins, where one contains the encryption key, another one contains the actual malware payload, and so on and so forth. So they can fetch the payload from the second pastebin, decrypt it using the encryption key from the first pastebin, then decode it from Base64, or whatever they're using for additional obfuscation, and execute it. So this is a combination of many techniques. Now to the fun part, bytecode payloads. The first three are actually quite commonly known, and exec is very well tracked by many SAST scanners. So if you run Semgrep, Semgrep will immediately... actually, let's run Semgrep just to demonstrate how easy it is to spot exec. But why are the other obfuscation techniques not detected? I already played with Semgrep a little bit and, yeah, it's going to take some time. In the meantime, let's look at the remaining methods. So the bytecode payload: what's going on here is that we have the code at the top of this file, that small infostealer, precompiled into bytecode. As you know, when the Python interpreter is executed, it first of all does a syntax check, then it builds the AST, and then it's all transformed into bytecode. And the bytecode is what gets executed by the Python virtual machine. These precompiled bytecode files can be embedded in the package, and they can be imported and invoked in exactly the same way as a regular module. So if we scroll up to the very beginning, I'm importing compiled, and inside of compiled there is an exfiltrate function.
The exfiltrate function contains the code that we reviewed at the beginning. And the last one is pretty much the same, but now instead of using bytecode, we are using a beacon that was written in Golang. So we're basically embedding an executable binary into the Python package. This is what we just demonstrated and observed. So let's look at the results of Semgrep. Semgrep went through the directory where we are now, and it went through the obfuscation folder and, sorry, that was the wrong one. We probably need to switch directories because there's a lot of stuff going on there. Yeah, let's wait for Semgrep to execute once again. But we will actually use the results of the previous Semgrep run for another demonstration, so it's actually good that we ran it in advance. So Semgrep is running. 100%. Great. All right, let's look at the results. First of all, it detects beacon.go, obviously, because it's, let's call it, a plain text file. It's not obfuscated or encrypted. There is an exec detection in one of the techniques in techniques.py, and that's it. So if we look back at the techniques that we just demonstrated, the bytecode payload wasn't detected, because Semgrep does static analysis: it goes through the syntax of the files and tries to look for patterns. The embedded binary is the same, because it was precompiled. Semgrep only detected the content of the beacon in beacon.go, the source file that I've included so you can look at the source of the example. So as we can see, while Semgrep did a really good job, it was only a static analysis check, and it wasn't decompiling the precompiled bytecode or the beacon. So essentially two out of six techniques successfully bypassed this protection, even though the first four are pretty much exactly the same; the only difference is how the payload was obfuscated before it was passed into the built-in exec function.
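Since source-level scanning does not look inside precompiled bytecode or embedded executables, one complementary check is simply to list such opaque files in a source archive before installing it. The sketch below opens an sdist held in memory, reports `.pyc` and `.pyo` members, and reports members whose first bytes match the ELF or Windows PE magic numbers. The function name and the short magic-number table are assumptions for illustration; other formats such as Mach-O are not covered, and finding such a file only means it deserves a closer look.

```python
import io
import tarfile

# Leading bytes of common executable formats (illustrative subset).
MAGIC_NUMBERS = {
    b"\x7fELF": "ELF executable",
    b"MZ": "Windows PE executable",
}


def find_opaque_members(sdist_bytes: bytes) -> list[tuple[str, str]]:
    """Return (member name, reason) for files a source-only scanner would not analyze."""
    findings = []
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:*") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            if member.name.endswith((".pyc", ".pyo")):
                findings.append((member.name, "precompiled bytecode"))
                continue
            head = archive.extractfile(member).read(4)
            for magic, label in MAGIC_NUMBERS.items():
                if head.startswith(magic):
                    findings.append((member.name, label))
    return findings


def build_demo_archive() -> bytes:
    """Build a tiny in-memory tar.gz with hypothetical member names for the demo."""
    files = [
        ("pkg/__init__.py", b"print(1)\n"),
        ("pkg/compiled.pyc", b"\x00\x00"),
        ("pkg/beacon", b"\x7fELF\x02\x01"),
    ]
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


print(find_opaque_members(build_demo_archive()))
```

Nothing in the archive is extracted to disk or executed, so a check like this can run on a downloaded tar.gz before pip ever touches it.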
And this is all for obfuscation. These are the most basic and common techniques. By no means are these perfect techniques, and there are way more sophisticated methods of bypassing the protections, the scanners, and so on and so forth. But this is what you can look out for when you are trying to determine whether a package is malicious and you don't have any scanners to use.

Now, installation and delivery, another important and interesting topic. When I went through a bunch of materials from other researchers, and researched some of the packages on my own, I discovered that the most common way is for the payload to be invoked at the stage when the package is imported. So the __init__.py file, the placeholder that makes a directory a Python package: you can place code in __init__.py, and when the package is imported, whatever is sitting there will be executed. Depending on the creativity of the attacker, it can stay completely hidden from the end user. setup.py is another interesting thing, because when you're installing the package, you can specify custom installation steps. When those custom installation steps are invoked, you can run whatever arbitrary code you want: you can download additional droppers, you can execute ransomware, and so on and so forth. I would say it's probably the major vector for invoking the payload during installation. And obviously droppers. As we saw in the example of the encrypted payload, where the encryption key was downloaded from Pastebin in the previous section of this demo, droppers can be very minimal snippets of code that go to outside sources, or external sandboxes as some researchers call them, and this is where the actual payload will be sitting. The well-known sandboxes are Discord, Pastebin, Telegram bots, and AnonFiles, which I believe was closed some time ago.
First of all, by no means am I implying that Discord is all about hosting malware, but it's a well-known platform, alongside Pastebin and Telegram. As far as I know, there was recently a change in how URLs to files are published on the Discord CDN, so you can't host files forever; I believe a link is only valid for something like 24 hours. I need to double-check, but there have been some changes precisely because droppers are using Discord as their external sandbox.
As we can see in the diagram, the attacker publishes malicious code, the developer initiates the package installation, and when the installation happens there is an external sandbox involved: the payload from the external sandbox, or the malware itself, is executed either from setup.py or through __init__.py.
And it's time for another demo. Let's go to our IPython, drop this session, and check where we are now. Like I said, I will publish all the code snippets so you will have a chance to go through them. Let's drop the receiver so we have a clear picture of what's going on on the screen, and let's run the same build command as before. While it's building, let's go through the project.
The project, or rather the package, is the pub IP info one we used for the starjacking demo. Actually, we didn't have to rebuild it, because it's already published, but never mind. When this package is installed in the system, it creates an entry point, a console script that executes the CLI. So basically this is a CLI utility, and the description says it's a CLI utility that retrieves information about your public IP: a handy little CLI utility that helps you determine the current IP address given to you by your provider. It seems like a neat little package, and as we saw in the starjacking example, it has a rating of almost five stars. So why shouldn't we trust that package? What just happened?
When the package build was executed, the installation steps were executed as well. As we can see, the imitation of an attacker just received the very first piece of information stolen from our system. So let's manually install this package and see if it happens again. As I said, sometimes you find those tar.gz archives on FTP servers, or just sitting in repositories on GitHub or other version control platforms. And sometimes the instructions say something like: use curl or wget, download this archive, and just do pip install. This is what can happen when you download an arbitrary package and run the installation without scanning it. So let's do it once again. And exactly the same result: because the installation steps were executed once again, we have another piece of information stolen from the machine.
Now let's go back to IPython and import this package. Sorry, let's import pub IP info. As soon as the import happened, another piece of information was stolen from our machine and sent to the attacker.
Now, for the last piece of this demonstration, I will put the Windows 10 sandbox on the screen, open the command prompt in non-privileged mode, and show that Windows Defender is running. Everything is up and running and everything is green. For the second part, I will switch tmux terminal sessions and run a listener, because we're going to demonstrate a reverse shell. So let's go to our pub IP info, grab this link, move back to the Windows sandbox, and run it here. It will take some time; it's quite a small, slim machine, so it doesn't have many resources. All right, it was successfully installed. Let's go back here and put this thing on. As you remember, pub IP info is a CLI utility, so we can simply run it with the help option. Nothing happened. Nothing happened.
And as you can see, what just happened is that Microsoft Defender detected a threat: it detected a trojan. What it actually detected was a dropper that tried to go outside, grab a payload sitting on the same Pastebin, and establish a reverse shell connection with this machine. Because of that, Windows Defender detected this behavior and blocked it. So what would happen if we disabled real-time protection? Let's run pub IP info once again. As you can see, with Defender disabled, the reverse shell was successfully spun up on the attacker's end.
Like I said before, attackers obfuscate payloads to bypass static analysis and to make it harder, not even to bypass but to make it harder, for malware analysts to find the actual payloads. But what we observed now is that Windows Defender performed dynamic analysis. It executed the code, checked its behavior, the signature behavior of the code, and based on what it observed it determined that this was a trojan, because that thing tried to go outside, grab something and execute it locally. That is the behavior of a dropper, of a trojan. So in this instance, if we had Windows Defender disabled, we would be in trouble, but because it was enabled, the malware was successfully blocked.
So what actually happened behind the scenes? The first execution of the malware went through __init__.py, which referred to the netutils class from the utils module and its run method. This method contains a base64 string. It is exactly the same string, exactly the same code, that we used during the first part of the demo, where I demonstrated the basic obfuscation techniques.
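A base64 string like the one found in that run method can be inspected without executing anything. The sketch below, under the assumption that the payload is a plain string literal, walks the source, tries to decode every sufficiently long string as base64, and prints what decodes cleanly to text. The function name `decode_base64_literals`, the length threshold, and the harmless sample payload are illustrative assumptions, not the code from the demo.

```python
import ast
import base64


def decode_base64_literals(source: str, min_len: int = 16) -> list:
    """Return sorted (line, decoded_text) for string literals that are valid base64."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Constant) and isinstance(node.value, str)):
            continue
        if len(node.value) < min_len:
            continue
        try:
            # validate=True rejects characters outside the base64 alphabet.
            text = base64.b64decode(node.value, validate=True).decode("utf-8")
        except ValueError:
            # Covers binascii.Error and UnicodeDecodeError: not readable base64.
            continue
        results.append((node.lineno, text))
    return sorted(results)


# Hypothetical sample: a harmless statement encoded the way the talk describes.
encoded = base64.b64encode(b'print("hello from payload")').decode("ascii")
sample = f'class NetUtils:\n    def run(self):\n        return "{encoded}"\n'

print(decode_base64_literals(sample))
```

Ordinary strings can occasionally decode as base64 by accident, so the output is a list of candidates for manual review rather than a verdict.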
So when the package was imported, it ran the run method of the netutils class from the utils module, decoded the string, executed it, stole a piece of information and sent it to the attacker.
Within setup.py, when we executed the installation, there was a custom install step. It looks like a legitimate step: it checks the pip version, ensures it is installed, and checks the git installation. But the git installation method of the custom install class contained exactly the same payload, so when we installed the package, it did exactly the same thing.
And last but not least, what happened on the Windows machine. When the user executed the pub IP info CLI utility, there was an implant in the CLI method that tried to execute a subprocess, and the subprocess pointed to drop.exe. As you remember, I mentioned that you can include whatever arbitrary files you want in your package, and in this case it was a precompiled dropper. This dropper contains this piece of code, obfuscated in the form of a list of Unicode characters. If we join them, we get a base64 string. When this snippet is deobfuscated and executed, it goes to the Internet, to Pastebin, and downloads the reverse shell listener, or rather connector. As we can see, when Defender is disabled this is actually quite dangerous, because such files can really harm your system.
If we go to the installed package and run Semgrep once again, you will see that drop.exe isn't detected. So let's give it some time. We're almost at the end of the technical part of this presentation. I know it's a bit lengthy, but please stay tuned, stay until the end. The most fun part is still coming. All right, Semgrep has finally finished, and what we can see now is pretty much exactly the same detection against the pub IP info malicious package. It detects exec, it detects exec again, and exec once more.
And it detects exec in setup.py, and that's pretty much it. So what actually happened? It didn't flag __init__.py, because that is legitimate-looking code: it just runs something. But it did detect the code that this part was referring to in utils, and obviously in setup.py. As you can see, though, the part of the CLI function that invoked the precompiled executable wasn't detected. So Defender did a really good job of detecting it dynamically, because if Defender had been disabled, the attacker would basically have got a connection to the machine. And I believe that's it for the installation and delivery demo.
Okay, last but not least, the most fun part: exfiltration and command and control. Infostealers and remote access trojans, or RATs, are very prevalent. In the first instance, attackers just try to grab whatever is sitting in the environment. They try to extract whatever is hidden in your system and then determine whether it is actual credentials, sensitive information, and so on. Infostealers can range from basic examples like the one I'm going to demonstrate, which just goes through environment variables and SSH keys, to ones that try to steal your crypto wallets and other crypto information. Remote access trojans also range from ones that simply track what's going on in your system, as a means of additional reconnaissance, to heavily weaponized ones that do screen grabbing and webcam grabbing and give the attacker the ability to execute arbitrary commands.
Now the demo part. For this section, let's go to the initial tmux session we used and go back to the root of the repository. We will go to harvester, rebuild this package and install it.
While this package is building, well, not really building itself but using the backend build system, we're going to go through the contents of the source code. setup.py says that this is a collection of connectors for various databases. So assuming this is a collection of connectors, you can import this package after it's installed and use some of its methods to ease your life when it comes to connecting to different databases. Based on the content, it looks like a legit package. There's some bootstrap config, so maybe it's going to bootstrap some basic database connection configs for us, who knows?
Let's get back to the terminal and do pip install with the DB toolset tar.gz. Now let's run IPython, and because I already tested it, let's do import DB toolset. Ooh, what just happened? We got a bunch of stuff at the top. Looks like SSH keys. Yeah, definitely SSH keys, and it definitely tried to go through the envs as well.
What just happened is that we downloaded a malicious package, we installed it, and as soon as it was imported, it ran. Remember, __init__.py can be used to hold malware, so when the package is imported the malware is executed automatically. And what we just observed is that on import some data was collected locally and sent outside. So let's see how it was done.
__init__.py imported the bootstrap config class from the bootstrap functions module and executed the run method of this class. This run method mimics a method that is supposed to create some sample configs for us, and it seems it even created some sort of directory. Let's check whether that's actually the case. Yeah, as you can see, it actually created some files for us. So it mimics the behavior of a legitimate package really well.
But if we look further down, it says envs, and as we just observed in the output on the receiver, presumably our attacker, there was also a reference to the envs key. So we assume the attacker wanted to enumerate certain environment variables and check whether you store any credentials there. So if we encode this stuff, pardon, I should say decode. Apologies for misclicking; it's quite a heavy technical demo, so it takes a bit of coordination. If we decode the first example, it says AWS access key. So it looks like this list of base64 strings is a collection of environment variables that the attacker wanted to enumerate and send back to the listener, to the receiving end. Let's decode another one. Okay, it seems it only lets us decode them one by one. Yeah: Azure auth location. So indeed, the attacker placed a bunch of names here, and because they didn't want us to figure out what they are trying to enumerate, they obfuscated this part. And down here, if you take a look, there is no obfuscation at all. So it's either an opportunistic type of attack, where the attacker relies on a lack of competence on the receiving end, or they are just trying to bypass static analysis. But this is a basic example of a harvester. As you can see, you can collect information and send it outside, so this is not really a problem for an attacker.
So, the last part: the remote access trojan. This is probably going to be the most fun part of this presentation. Let's get back to our listener, get back to IPython, and for this one we will probably need to rebuild the package as well. So let's go back to the rat, let's go to the sources. Okay, we're going to run the build, and while it's building, let's go to the setup.py of this package and see what it does for us. It says: CLI utility to search for packages across different managers. And that's pretty much it.
Like I said before, packages like this can be downloaded from PyPI, where attackers use starjacking to trick us into believing it's a legitimate project. Or we can get it from somewhere else: maybe someone distributed this PKG search tar.gz on Discord or on some forum, or uploaded it to FTP, or we just found it on GitHub, who knows? So let's install this one. Assuming it's a CLI utility, when we execute PKG search it should give us some results. Yeah, it prints usage: package searcher. It gives us a few arguments; for instance, -m stands for manager, so here we probably need to specify PyPI, and there's also package. For package, I don't know, let's put GPT. So let's assume that this malicious package was downloaded and installed, and now the user is trying to look up all GPT packages, because hey, it's 2023, and who's not using GPT in their development, right?
Okay, we see some results, but we also see an incoming connection on the attacker's end. Apart from the status online, which suggests there is some sort of beacon, agent or listener spun up on the victim side, we also see an ngrok URL. So what is this thing doing? Let's copy the IP address, let's copy the URL, and let me open a fresh Firefox. Now I'm outside of the virtual machine, I'm going through ngrok, and oh, it seems we're hitting some sort of server with endpoints. So let's hit os: information about the operating system. proc: the list of processes. Then the user name of the user. And we have a screen grabber. As you can see, all the information we requested was sent back to the attacker. So this is an example of a very small trojan, a RAT, embedded in the malicious package.
As for what's going on here, let's use ps aux and search. From the PKG search package, there were several executions.
First of all, a Python subprocess was spun up that runs precompiled bytecode. As you remember, bytecode can be included in a package, and you need to decompile it in order to look into its content, so it's not a simple file that you can just open and read. And there was another one, search index, also bytecode, that was executed. The combination of these two does the following. Let's look at the original code. This file spins up a Flask instance on the victim's machine and populates these endpoints, one of which is a screen grabber. The screen grabber allows the attacker to basically steal data, see what you're doing, and whatnot. There can be many more endpoints here: command injection, more data exfiltration, potentially even an endpoint that triggers ransomware and just encrypts your whole file system and data. So who knows?
Then there was another file. What this file did was check whether the Flask instance was running, and once it had started successfully, it established an ngrok tunnel. For those who don't know, ngrok is actually a legitimate product, and a great one. If you are a developer and you want to give someone temporary access, or you want to test how a solution you're developing locally is going to look from outside, you start a listener locally and then use ngrok, and it publishes the traffic through one of the tunnels established using ngrok's infrastructure. While this is a great product, it was quite quickly adopted by attackers, pen testers and red teamers. So ngrok is super useful, but you have to be careful: as you can see, it can be weaponized. I know that many companies are now blocking ngrok at the DNS level, and they're also tracking whether there's an ngrok binary among the running processes. Last but not least, let's look at this file, search endpoints.
So what's in search endpoints? There's a search other method that belongs to a search endpoints class. What this method and its obfuscated string do is basically run a subprocess check: it uses the current executable, which is Python, and silently installs all the necessary dependencies. This is what's called hidden imports, another type of malware invocation that can be used. Then it opens two subprocesses using nohup, redirecting all errors and all standard output to /dev/null. This is why we didn't see anything. Only when I executed ps aux and grepped for search did we see these files and how they were executed.
So in the CLI, among the legitimate CLI functions, as soon as we selected a particular manager, and as soon as this manager finished doing its job, the search other method of the search endpoints class was executed. That means when we ran our very first search against PyPI, we triggered this chain of events: the dependencies were installed and two malware components were executed on the system. And this is how a remote attacker got access to the trojan via the ngrok tunnel.
So why an ngrok tunnel? First of all, you want to obfuscate the traffic and make sure that the traffic used for command and control operations is hidden. Also, as an attacker you assume that your victim is sitting behind network address translation or a firewall, so you don't know their public IP address, or whether they have one at all. So ngrok in this instance is used, first of all, to hide the traffic, since this is a tunnel and all the traffic goes inside the tunnel. And last but not least, it removes the need for the victim to have a public IP address. This is how we are essentially bypassing the perimeter. By no means is this a technique with a 100% success rate.
After all, if you have EDRs or more sophisticated means of network monitoring, you can monitor such traffic. You can ask: hey, why do you have a tunnel established towards ngrok? And you can kill those events instantly. So there is a way to protect against such operations. As we also see in this instance, the author of the malware used base64 in search endpoints to hide one of the payloads, and the listener and the tunnel spawner were precompiled. And this is basically the end of the exfiltration and C2 demo.
Now we are moving on to defenses, or how we can protect ourselves against all the techniques we just covered. First and foremost, let's divide them into pre-supply protection and post-supply protection. Pre-supply protection is what you can do before the package is downloaded, imported or installed.
Here, first and foremost, you can use individual development sandboxes. There's no need to develop on your host machine. You can have a well-protected VM that is not directly connected to your network; it can sit behind an additional NAT or whatever, and you can use your IDE plugins for remote development, like this one in VS Code. By doing so, if anything is affected, only the VM is affected and not your host system, so the blast radius will be much smaller.
Then, avoid shared development servers. I've seen several examples where the remote development environment was one big, chunky development server. This is a very bad idea, because if a malicious package is installed on such a machine and there are no protections, the blast radius will be enormous.
Review project details and reputation. As was demonstrated earlier, you can steal reputation through starjacking. So please make sure that the name of the package corresponds to the repository.
The authors should be the same, the package shouldn't have been published five minutes ago, the repository should look legit, and there should be at least some movement in terms of pull requests, contributions and so on.
Code review: manual, grep and Semgrep. Obviously, manual code review first. If you don't have any scanners or tools, at least unzip or untar the package and manually look into the code. Look in the places I showed you today: look into setup.py, grep for base64, grep for weird-looking lists of Unicode characters, grep for anything precompiled. Do not execute anything, just look at it. If there are too many red flags, stay away from the package.
Semgrep is amazing. As I demonstrated today, it is great for static analysis. But as in the Windows Defender example, if you don't have any antivirus, especially on Windows, there will be no dynamic protection, no dynamic analysis of the malware, and the defenses will be bypassed.
Package quarantine basically means: do not use a package unless you know it is safe to use. If you have local mirrors, private PyPI mirrors that you use to download and store packages, you can use one PyPI mirror to download the package, analyze it, make sure it is not malicious, and then push it to, let's say, another mirror that is the production one. It can be easily automated, and there are enterprise-level tools that allow you to do that. So package quarantine is an amazing thing to do.
Avoid projects that are not published on PyPI. As I mentioned before, if a project is not published on PyPI, it doesn't mean it's malicious. It might be that the contributors and development team are just lazy, or for whatever reason don't see a need to publish it.
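The manual review steps described above (look into the Python files, grep for base64, for long lists of character codes, and for anything precompiled, without executing anything) can be sketched as a small read-only script. The patterns, the list of file suffixes, the function name `review_package` and the sample files are all illustrative assumptions; real packages use these constructs legitimately, so each finding is a prompt to read the code, not proof of malice.

```python
import re
import tempfile
from pathlib import Path

# File types that cannot be reviewed as source (assumed list).
BINARY_SUFFIXES = {".pyc", ".pyd", ".so", ".dll", ".exe"}

# Rough red-flag patterns for Python source (assumed, deliberately simple).
TEXT_PATTERNS = {
    "base64": re.compile(r"\bbase64\b|b64decode"),
    "dynamic-exec": re.compile(r"\b(exec|eval)\s*\("),
    "subprocess": re.compile(r"\bsubprocess\b|os\.system"),
    "char-code-list": re.compile(r"(\d{2,3}\s*,\s*){20,}"),
}


def review_package(root: Path) -> list:
    """Return (relative_path, label) findings for an unpacked package directory."""
    findings = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if path.suffix.lower() in BINARY_SUFFIXES:
            findings.append((rel, "precompiled-file"))
            continue
        if path.suffix != ".py":
            continue
        # Files are only read as text, never imported or executed.
        text = path.read_text(encoding="utf-8", errors="replace")
        for label, pattern in TEXT_PATTERNS.items():
            if pattern.search(text):
                findings.append((rel, label))
    return findings


# Hypothetical unpacked package with two obvious red flags.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "setup.py").write_text("import base64\nexec(base64.b64decode('cGFzcw=='))\n")
    (root / "dropper.exe").write_bytes(b"MZ")
    for finding in review_package(root):
        print(finding)
```

The design choice here is to never import the package under review, since importing is exactly what triggers code in __init__.py.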
Maybe they're not going to support it for long. But using the techniques I showed you today, and knowing the specific places where malware is most likely to be embedded, look through the repository and check for the immediate red flags.
Fixed versions of dependencies. Please do not run pip install with just the name of the package; run pip install and specify a particular version. If a project was infiltrated, and a newer version of the package with a malicious implant was published without the project owner knowing about it, you would install the latest version and compromise your system by doing so. So please use fixed dependencies. It's not that hard.
Restrict direct downloads of dependencies. This is where developers will probably hate me, and this is where I personally saw the biggest pushback. People want to download stuff from the Internet; they just want to do it. Use private PyPI services and trusted proxies. If you have your own PyPI mirror or additional index, and you can put things like SCA and SAST in front of it and scan packages before you release them to developers, that will protect them. Absolutely.
Do SCA, software composition analysis, and pre-commit checks. There's a lot you can do. You can invoke the Xray CLI from JFrog, you can use Safety, and I believe you can use Semgrep as well; if you connect to Semgrep Cloud, you can also do such checks there. If it so happens that you downloaded a dependency, installed it without scanning it with SCA, and it turned out to be a malicious package, then as soon as you try to push that dependency to the repo, to the integration branch of your project, the SCA will flag it as malicious, provided it knows about it, that is, if the package was detected beforehand, and it will block your commit. So in this case we're talking about a sort of containment technique: it is not going to spread into the repository.
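The advice above about fixed dependency versions can be checked mechanically. The sketch below scans the text of a requirements file and reports every entry that is not pinned to an exact version with ==. It is a simplified check under stated assumptions: the function name `unpinned_requirements` and the sample entries are illustrative, option lines starting with a dash are skipped, and URL-based requirements are not handled.

```python
import re

# name, optional [extras], then == and a version with no wildcard (simplified).
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?\s*==\s*[^\s;*]+(?=\s|;|$)"
)


def unpinned_requirements(text: str) -> list:
    """Return requirement lines that are not pinned to one exact version."""
    loose = []
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as -r or --index-url.
        if not line or line.startswith("-"):
            continue
        if not PINNED.match(line):
            loose.append(line)
    return loose


# Hypothetical requirements content.
sample = """
requests==2.31.0
flask>=2.0        # range, not pinned
somepackage       # no version at all
"""

print(unpinned_requirements(sample))
```

A pin fixes the version number only; pip's hash-checking mode goes one step further by also fixing the expected archive hash.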
But in general, if you have at least Safety, Semgrep, Xray or any other SCA tool, then when you download a package, run your checks against it; maybe it's not even worth installing.
And last but not least, antiviruses and EDRs. As I already showed you, even the basic Windows Defender is amazing, and if you have EDRs it's even better. They will kill things like tunneling, DNS exfiltration and so on.
Now post-supply protection. Development sandboxes are relevant for both pre-supply and post-supply. Traffic monitoring is also important; I should probably have included it in pre-supply protection as well.
Then the principle of least privilege on build agents and nodes. This is very important, especially if you are working with non-ephemeral nodes, as opposed to ones that are spun up, run some pipeline and die afterwards. Say you're using persistent nodes, for instance Jenkins build agents, and you run them as a high-privilege user with sudo or root privileges. I've seen such cases: before moving to security I used to be a DevOps engineer, and I've seen people do this. If a malicious package lands on such a node, and the package has implants that enumerate the system for privilege escalation capabilities, you might get yourself into big trouble. First of all you let the attacker into your environment, but you also make lateral movement and data theft easier for them, and you basically give them a golden key to the city.
Semgrep, SCA, SBOM. Why specifically Semgrep? This is not an advertisement for Semgrep. I just found that Semgrep has a really great collection of rules for static analysis.
So if you can use Semgrep in combination with any SCA, or just use Semgrep and its SCA capabilities, and if you use an SBOM and integrate it into your CI/CD pipelines, especially if you have a dedicated CI/CD pipeline that downloads packages from the Internet and checks them when developers request them, that's the best. Because if you have an SBOM, and that SBOM records a version of a package with a certain hash calculated for that version, and the SBOM now reports a different hash for the same version, most likely something fishy has happened. Most likely somebody changed the package, overwrote it, or did something else to it, and most likely something malicious happened in the background.
One of the best examples of protection for developers I've seen is when two PyPI servers were living in parallel, and one of them acted as an interim proxy. When developers requested a new package that wasn't yet present in the private production PyPI mirror, that package was downloaded from the Internet and placed into the interim mirror. Then scanners were run against the package, and only when it was shown not to be malicious was it released and pushed to the production mirror. Only then could developers download it to their machines. All developers were forced to use only the production PyPI mirror; they were not allowed to go to pypi.org. While this introduced a certain delay when downloading a package for the first time, it actually caught about a dozen real malicious packages that appeared in the interim mirror when developers simply fell for typosquatting. So, as you can see, you can combine these defenses and protections.
Antiviruses and EDRs are also important, not only on your development machines but also on your servers. Say a package has landed on a build agent, which is basically a server.
If it's not an ephemeral agent but a persistent one, imagine the same remote access trojan being executed there through an ngrok tunnel. It's better to have an EDR that kills such events even before they happen, or as soon as they happen. So yeah, pre-supply and post-supply protection: please look at this. There are many, many different protection methods and products. Again, I'm not advertising any of the products I've mentioned, but these are what I have worked with, and they generally produce really good results.
Okay, last but not least, very important: credits and references. First of all, I would like to give credit and say thank you to my employer, EPAM Systems Ltd, for supporting my initiatives in security research and public speaking. I would like to say a massive thank you to the guys from the Evil Bunny Road CTF team. This is a CTF team I'm a member of, and we have a lot of fun researching stuff together, so thanks to the guys for sharing their expertise. And a very big thank you to flaticon.com, because I use their icons and graphics in some of the diagrams in this slide deck, and they ask for credit, which is in the credits section. So thanks for the graphics, they're amazing.
References and additional reading. I collected some materials that were used in the preparation, and also some additional materials that you can investigate on your own. I've listed some of the tools, techniques and sources I used for this research, so please go through them. They're very interesting.
Last but not least, thank you all for joining this session and this conference, and thank you, Conf42, for the invitation. I hope you enjoyed it. When you have watched this talk, please use one of these two QR codes and go to my GitHub repo. Like I mentioned before, I will release all these code samples so you can go through them and play with them on your own.
If during this talk you found that I made a misstatement or an error, or you believe something is not true or not accurate, or you know an interesting technique that I didn't mention, or you know how to improve this talk in the future, please do get in touch. Also, if you're generally interested in supply chain protection, please do get in touch. I think there's a lot we can discuss and a lot we can help each other with. With this, I will wrap up this talk. Thank you once again for joining. I hope you enjoyed it, and see you soon. Stay safe and enjoy the rest of this conference. Bye.

Conf42 DevSecOps 2023 - Online

November 30 2023

Leonid Akinin

Security Architect @ EPAM Systems
