Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone.
Welcome to the Conf42 DevSecOps 2023
event. My name is leonidakinin, and today I'm presenting "Your
Trusty Python Package: Tactics, Techniques and Procedures of
Attacks on Open Source Software in Python". And yes, as you
can see from the title, today we're going to talk about supply chain attacks in
Python. First and foremost, before we proceed,
an important disclaimer. The code I'm going to show you today can
be weaponized for malicious purposes. If
any of these materials are misused, neither I nor my employer
is responsible for any liability or damage caused by that
misuse. So please use them only for educational purposes,
and please always do your own research.
So we have a lot of ground to cover today, so let's
go briefly through the contents.
First and foremost, we're going to go through why this topic is important and why
is it actually still relevant in 2023. We're going to go
through a really brief, quick history of supply chain attacks,
and then we're going to proceed to the actual demo of various techniques.
We will obviously cover defenses, most common defenses we can
use against this type of threats, attacks. And last but
not least, we will go through some credits and references that are quite useful
in this demonstration. Let's get started. So why this topic
is important. As you can see, in January
a very well-known machine learning package
was compromised in a supply chain attack. In particular, one of its dependencies
was compromised. Following that event, in May
there was a temporary suspension of new user and project registrations
at PyPI because there was a massive influx of malicious packages.
Then in July there were six quite dangerous malicious packages
published that were targeting specifically Windows users. It is actually
quite common these days that Python supply chain attacks are targeting Windows
developers and Windows users, because the more comfortable
Python ecosystem becomes on the Windows platform, the more malicious code
we will see in this environment.
And last but not least, the VMConnect supply
chain attack that was discovered some time ago. It's still up
and running, and more and more packages
are being published. There was one particular package that was targeting
VMware products. So as you can see,
in 2023 alone we have at least four events that were widely
discussed in the industry, and no one actually knows how
many more events there are in the wild, how many of them were actually
disclosed, and how many of those attacks are happening in
the background, completely hidden and undetected.
And most importantly for
me as a security professional, this topic is very relevant because I always consider
that to learn how to defend, you need to learn
how to attack. And if we refer to these
three quotes from famous historical figures of
different eras, active defense,
or offensive operations
used as a means of developing defenses, is not
a new concept. So I personally believe
that for a security professional, there's no better way of learning
how to defend something than trying to
breach those defenses. Let's go through a really quick history
of supply chain attacks. The first supply chain attacks
date back to 2017. At that time,
most of them were seen as purely opportunistic
with a low success rate, and people didn't really
pay much attention. The initial campaigns targeted Docker Hub,
npm, and PyPI. So all the usual victims of
2023 were already being exploited back in 2017.
The SolarWinds attack, the infamous SolarWinds attack, was probably
the first high-profile attack that really drew attention, and people started
thinking, okay, we have to do something about it. If you
look at the right side of the slide, colleagues from ReversingLabs and
colleagues from Sonatype collected a lot of really useful information, as we can
see on both of these graphs. Starting from 2020, there was
quite a steady growth. And if we look at the bottom
chart, it says 742% average growth
rate year over year, and that was between 2019 and
2022. And as we can see on the top chart,
while fewer malicious packages are being published towards
the end of 2023 on PyPI, there's more and more
going on in npm. PyPI attacks
are still very, very much relevant, though. And nowadays,
researchers and malware analysts observe techniques
where PyPI is used solely as a dropper: PyPI packages are used solely
as droppers for malicious code that's written in other
languages. I believe there was even an example of
malicious JS code, pardon, not running inside, but
delivered by a malicious PyPI package.
Supply chain attacks have also become one of the favorite vectors for the
major APTs due to the traditional lack of control over
development environments, especially in the era of flexible working,
where a bring-your-own-device policy is quite common and some
companies do not invest in endpoint protection.
It is quite a common scenario that one poorly protected development
workstation can become
the crack in your defenses
that an attacker needs to get through.
PyPI attacks are also seen
being used alongside phishing campaigns, where a particular malicious
product or project is advertised. So, yeah,
APTs are actively using it. It's not just phishing anymore.
Supply chain attacks are also seen as a means of
initial access, and attacks range
from opportunistic, where a bunch of packages is just published out
there and attackers sit and wait for low-hanging
fruit, people who are not cautious enough or aware enough of this type
of danger, to attacks tailored
towards a specific organization. That normally indicates
there's been a lot of reconnaissance done in advance:
the attackers really investigated what type of
technologies a particular organization is using so they can target specific projects
and dependencies. So let's get started with the
main part and tactics, techniques and procedures in supply chain attacks. So first
of all, for those who are not familiar
with the acronym TTP: TTP is an acronym developed by the
MITRE Corporation. It stands for tactics, techniques and procedures.
Tactics is basically the why, the reason an attacker performs
the action. Here we can say the
four common reasons are initial access,
perimeter bypass, data exfiltration, and obviously
ransomware. They can be mixed and matched, so it really depends on the
particular APT group, the particular attack, and the particular campaign
targeting an organization or a group of
users. Techniques is basically how an attacker performs the action.
Here they upload malicious packages to repositories,
they utilize typosquatting, they utilize starjacking, and they
obviously inject malicious code through developer
credential compromise. It's quite a common scenario
that credentials have been reused, they were leaked through some
secondary data breach, and then those credentials happen to be
the credentials for a PyPI account. This
is how common infiltrations happen. And procedures is basically
a step-by-step application of techniques. Today
we're going to mostly focus on
the individual techniques rather than the step-by-step
application, the whole procedure, when it
comes to supply chain attacks. So we will discuss and focus
on things like supply chain compromise, so how things
can get compromised and how malicious code ends up in the legitimate
supply chain sources that we're using. Then defense evasion:
we're going to go through payload obfuscation as a
technique for defense evasion, and we're also going to go through traffic obfuscation.
Specifically, we're going to use an example of tunneling, but I also mention
DNS exfiltration. There are materials
in the references part of this slide deck,
so please go through them and read about DNS
exfiltration, because it's a really common way
for data to leave your protected environments. Then we're going to go
through installation and delivery. Sometimes it's installation, then delivery;
sometimes it's delivery, then installation.
But we will look into entry points where
you can expect malicious payload to land
in your environment. Last but not least, we're going to
cover two examples. We're going to go through a data
exfiltration example, which will be a very basic infostealer
or credentials harvester. And then we're going to go through a command and control
example where I will demonstrate a very rudimentary
RAT, or remote access trojan. First of
all, supply chain compromise. There are several ways how attackers
can compromise your supply chain when it comes to the python and python
packages. First and foremost, public project repository
infiltration, stuff like transfer of ownership.
If the developer of a project is tired of supporting it, the
project may be transferred to someone else who claims to be a legitimate
contributor, a legitimate owner. That's one
example of how a project can be infiltrated. Also official channels
of contribution, especially when a project is poorly managed and there is a
high rotation, sorry, attrition
rather, of contributors, and the retention
rates are low. When people are changing on a monthly basis,
it's very hard to track who is contributing what. This is one of the
ways malicious code can land in a legitimate project.
Dependency infiltration, obviously. Why target the
main project if you can infiltrate a dependency,
especially if that project is poorly managed? That is a common
thing, as we've seen in one of the earlier slides
with the compromise of an ML package.
This is what can happen, and this is how your project, and
the projects that you are using, can get compromised.
Attacks on private PyPI servers and proxies. On
one of my previous engagements I actually observed that a company
had a private PyPI proxy just sitting in the
public, wide open. The good thing is that the software
they used to host their Python packages wasn't vulnerable. But imagine
if that had been a vulnerable PyPI server wide open to
the Internet: we could have seen quite an interesting chain
of events there. Also overly permissive
PyPI repositories. Allowing already published packages
to be overwritten is very dangerous, because this
is how malicious code can land: a legitimate version can
be replaced with a malicious one.
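One common mitigation for the overwrite scenario is pinning artifact hashes (pip supports this with its `--require-hashes` mode). The following is only a minimal sketch of the underlying check, assuming the SHA-256 digest was recorded when a release was first reviewed; the helper names `sha256_of` and `matches_pinned_hash` are hypothetical.

```python
import hashlib
import hmac


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of an archive's bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_pinned_hash(data: bytes, pinned_hex: str) -> bool:
    """True only if the archive is byte-for-byte what was pinned earlier."""
    return hmac.compare_digest(sha256_of(data), pinned_hex.lower())


# In-memory bytes stand in for downloaded archives (illustration only).
reviewed = b"contents of the reviewed release"
pinned = sha256_of(reviewed)
overwritten = b"contents of a release re-uploaded under the same version"

print(matches_pinned_hash(reviewed, pinned))
print(matches_pinned_hash(overwritten, pinned))
```

If a repository silently replaces an already published version, the digest no longer matches and the install can be refused before any package code runs.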
And yeah, obviously vulnerable PyPI servers. As I've
mentioned, it's a good thing that the company I worked for didn't have any vulnerabilities
in their PyPI proxy, but if they had, that would have been quite
an unpleasant chain of events. Then public
GitHub repos and FTP servers. Sometimes, and actually not
even sometimes but quite often, I see
a GitHub repo that says, hey, this is a Python package,
just clone it and do pip install, or
hey, we already pre-built it for you, there's a tar.gz
archive sitting in the releases, or we just host it on an FTP
server, we don't want to mess with PyPI.
While this is not specifically an indication of a malicious package
or an attacker, and it can just be
someone being lazy with their
build, release and publishing processes, with
those particular instances I would be very cautious and I would go through
the things we will discuss today to make sure that, even though
it's not officially published on PyPI, it's safe to use. Typosquatting.
A really popular one. If you
Google "typosquatting Python", there will be lots and lots of articles. Although it's
getting harder and harder for attackers these days, as major software vendors,
for example Microsoft, have started registering
stub packages. Basically, a stub package is a package
that doesn't have any functionality, but rather points you to the actual
package that contains what you're looking for. The reason companies are
publishing those stubs is that they want to reserve the name
to prevent a typosquatting attack.
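To decide which stub names are worth reserving, or to sanity-check a dependency name before installing it, a simple edit-distance comparison against known names can help. This is a hedged sketch, not how PyPI or any vendor does it: the `WELL_KNOWN` list, the distance threshold of 2, and the function names are all assumptions made for illustration. The name normalization follows PEP 503.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def typosquat_candidates(name, well_known, max_distance=2):
    """Return well-known names that `name` is suspiciously close to."""
    target = normalize(name)
    hits = []
    for known in well_known:
        known_norm = normalize(known)
        if target != known_norm and edit_distance(target, known_norm) <= max_distance:
            hits.append(known)
    return hits


WELL_KNOWN = ["requests", "numpy", "urllib3"]  # illustrative list only
print(typosquat_candidates("reqeusts", WELL_KNOWN))
```

A near miss is only a signal for manual review; plenty of legitimate packages have names close to each other.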
And this is actually a very clever technique, and I would highly encourage you,
if you're working on a project that is actively
published out there in the public, to please think
about registering some stub packages so you can defend yourself
in advance and prevent any attempts at
typosquatting. And starjacking. Starjacking is actually
my favorite one. Starjacking is a combination of a technical
attack and an aspect of social engineering, playing
with the psychology of developers and users. It utilizes what
I consider a technical flaw, though some people say
it's just by design in the PyPI system:
you can make a reference to a GitHub source
that doesn't belong to your project, effectively stealing
the rating stars of that repository. With
that being said, we are moving to our first technical demo, which is a starjacking
demo. Obviously my Kali Linux
was locked at this
point because we spent some time talking through the attacks.
But hey, let's proceed. So let's go
to VS Code and let's go to this package that is
called pub IP info. If
we look at the setup.py file, which is
basically the build configuration for this project,
we can see that there is a name of the project
and there's a version. Let's increment this version,
because I've been testing it before; let's increment it to
2.0. So it's the second major version of
this package. And let's go back a few
directories so we can actually get to
the proper one. So let's go to sources.
All right, so what we're going to do now: we are going
to execute python -m build, and
while it's building we're going to go back and
we're going to go to the repository, to
TestPyPI. The reason I'm using TestPyPI
is that the production PyPI is actually monitored
by different security vendors. So what
we're doing here is basically
preventing my account from being blocked for publishing
an intentionally malicious package. If we go
to "Your projects", as you can see there's already a project, and
I believe we have a few versions already
published there. So what we're going to do now is clean up whatever
packages are there. Sorry, it takes some time as I'm
working in a virtual machine, so we have to be patient,
and there are a lot of processes, including the recording, running in the background.
So we just delete this version
to make sure we actually don't have any releases at
all. So, no releases found under pub
IP info. Now let's look. Okay, we have
successfully built the package, and now we're going to execute
twine upload. It will basically pick up
the tar.gz and the wheel and push them to
TestPyPI, and it's nearly there.
Great. It's almost there.
Check again. Brilliant. So this project
was now published and if you click on the link,
all right, as you can see we just published
a brand new project. So if we
click on the management panel it
will show that it was released just 1 minute ago.
So it's a brand new project. We deleted all previous releases.
So if we go to the project itself.
Sorry, we'd probably better use this link. Once again,
apologies. So yeah, if we go to the test.pypi.org
project page for pub ipinfo, and this is the
version that we just published, we can see that this
project already has almost 5000 stars,
it has nearly 2000 forks, it has a
bunch of open issues and a bunch of open PRs, and
it also has a pretty legitimate-looking
readme description, the installation method,
and so on and so forth. So what actually happened here? If we go
back to setup.py, as we can see,
the URL is pointing to a completely different
resource. This resource is one of the
example projects used by the
documentation that explains how to package
your Python projects and how to publish them.
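A cheap first signal for this kind of borrowed repository link is a mismatch between the package name and the name of the GitHub repository it declares. The sketch below works on values you would read from a package's metadata (for example the home page and project URLs); the function names and the example URL are hypothetical, and a mismatch alone proves nothing because many legitimate projects live in differently named repositories.

```python
import re
from urllib.parse import urlparse


def normalize(name: str) -> str:
    """Normalize names so 'pub_ip_info' and 'pub-ip-info' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def github_repo(url: str):
    """Return (owner, repo) for a GitHub URL, or None for anything else."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in {"github.com", "www.github.com"}:
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    repo = parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]
    return parts[0], repo


def starjacking_signals(package_name: str, declared_urls: list) -> list:
    """List declared GitHub repos whose name does not match the package name."""
    signals = []
    for url in declared_urls:
        repo = github_repo(url)
        if repo is None:
            continue
        owner, repo_name = repo
        if normalize(repo_name) != normalize(package_name):
            signals.append(
                f"{package_name!r} declares unrelated-looking repo {owner}/{repo_name}"
            )
    return signals


# Hypothetical URL, used only to exercise the check.
print(starjacking_signals("pub-ip-info",
                          ["https://github.com/example-org/sample-project"]))
```

A stronger check, which needs network access and is not shown here, is to confirm that the repository itself builds and publishes a package with that name.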
And if we go to this link,
yeah, these are the stats we are looking at. So what
just happened? In less than five minutes we
basically performed a complete stealing of
the rating of this project, and we
reused that rating to make our malicious package
look like a legitimate option. So this was
starjacking. Now,
the next point in this talk: we're going to talk about defense evasion
and obfuscation. So why do attackers want to
obfuscate their payloads and their traffic? Well, first of all, they want to evade as
many defenses as possible. And if their malware samples
are detected, they want to make sure that it will be harder
for reverse engineers and malware analysts to get
to the bottom of the actual payload that is supposed to do something
bad in our systems. When it comes to payload obfuscation,
some of the common things are
attackers using encoding, encryption,
bytecode, and embedding binary executables
that are written in different languages inside of Python packages. Because in
Python packages you can include arbitrary files,
literally whatever you want can be included in a Python package.
When it comes to traffic obfuscation, which I will demonstrate at the
end when we come to the example of the remote access trojan, there is DNS
exfiltration, proxying, and tunneling of the traffic. This
is also very interesting, and it's also an important
part to cover, because supply chain attacks do not stop
when you just install the package.
So supply chain attack actually consists of many components,
as we already discussed, and you have to consider many,
many things to successfully detect them and protect yourselves
against them. So with payload
obfuscation, I believe the best way is to go
through a demo. What we're going to do is
get back into Kali Linux. We're going
to check where we are at the moment. Actually it would be easier
if we switch to the tmux session.
So we're going to go to folder number one. By the way, when this
talk is released, I will also release the
repository so you can go through all
these examples on your own, figure out how stuff
works, and play with it a little bit. So let's build
the package. As you can see, it has quite
an obfuscated name: malware obfuscation.
It will take a little bit to create the package.
Nothing too crazy. We still have time. We are not even
halfway through this talk. Yeah, so we successfully built it.
And what we're going to do now, we're going to run the installation
of the package to make sure it's present in our system. Yeah,
successfully installed. So for the ease of this demo,
I'm going to run ipython in this part of the terminal.
And in the top section of the terminal I'm going to start a receiver.
And receiver is what we're going to use in all other demonstrations.
It will imitate an attacker that is sitting somewhere
out there, just waiting and trying to
catch the traffic from the
attacked machine. So what we're going to do now: from Pymalware obfuscation
we're going to import the techniques module. Inside of the
techniques module there is an obfuscation techniques class, and we're going
to start running its methods one
by one. So the first one is
executed. Then we're going to run the Unicode
payload. Then we're going to run, I believe, the
encryption payload. And for the encryption payload,
I actually need an
encryption key URL. I'll explain one
by one what each and every
method does. It takes some time.
Yes, it's working. Guys,
this is like a heavy technical demo, so hopefully
everything will work from the first shot. But yeah,
just in case, please be
patient. There's a lot of fun stuff in this demo. All right,
so we just executed one, two, three, four, five.
I believe I missed one, which should be the
combined payload method. So we just executed
six methods from the intentionally
malicious package. And as you can see on the receiving
end, the end that imitates an attacker, we have some JSON
data coming in, and it basically contains the operating system
name, version, host IP and the username.
This is an example of very basic data exfiltration.
To understand what's going on behind the scenes, let's move back
to VS Code and open up the obfuscation
package and the techniques.py module.
If we scroll up, in the docstring
there's a snippet of code that does a very basic thing: it
imports sys, socket and other modules,
opens a socket, connects to the receiver, and then
forms JSON data that contains information
about the operating system and everything that we've just seen in
the tmux terminal. The first method that we executed
was the base64 payload. This string, the
reversed base64 string, is
this original code. So why reversed? Because there
are scanners that recognize base64.
They decode the base64,
then they do static analysis, checking whether the result actually looks like
a programming language. Reversing the base64
reduces, not removes, but reduces the risk for the attacker
that the payload will be detected.
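On the defender's side, the reversal trick is cheap to account for: try each long string literal both as written and reversed. The following is a minimal sketch under stated assumptions (the function names, the 16-character minimum, and the "contains a call or import" heuristic are all choices made for illustration, not the behavior of any particular scanner). It only parses text and never executes anything.

```python
import ast
import base64
import binascii


def _decodes_to_python(candidate: str) -> bool:
    """True if `candidate` is valid base64 whose plaintext parses as Python code."""
    try:
        raw = base64.b64decode(candidate, validate=True)
        tree = ast.parse(raw.decode("utf-8"))
    except (binascii.Error, ValueError, SyntaxError):
        return False
    # Require something code-like so ordinary words are not reported.
    return any(isinstance(n, (ast.Call, ast.Import, ast.ImportFrom))
               for n in ast.walk(tree))


def find_encoded_code(source: str, min_length: int = 16) -> list:
    """Return (line, kind) for string literals hiding base64-encoded code."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Constant) and isinstance(node.value, str)):
            continue
        literal = node.value.strip()
        if len(literal) < min_length:
            continue
        if _decodes_to_python(literal):
            findings.append((node.lineno, "base64"))
        elif _decodes_to_python(literal[::-1]):
            findings.append((node.lineno, "reversed base64"))
    return findings


# Harmless stand-in payload; it is encoded and inspected, never run.
hidden = base64.b64encode(b"print('hello from a hidden string')").decode()[::-1]
sample = f"import base64\nblob = '{hidden}'\n"
print(find_encoded_code(sample))
```

Every extra layer the attacker adds needs a matching decoding step here, which is exactly why layered obfuscation raises the cost for static scanners.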
Then, when this base64 string was
reversed back, decoded and passed to exec,
we saw the first message,
the first piece of traffic coming into the receiving
end. The next method was the Unicode payload,
which was exactly the same code but transformed
into a list of Unicode code points,
the numbers that correspond to particular characters
in the Unicode table. The reverse operation consisted of
joining it into a string and then executing it using the built-in exec
function. Then the combined base64 and Unicode payload:
it was a combination of the first two, the base64 string
reversed, then transformed into this array.
The array, or list, was also reversed. And then
the reverse operation was performed to execute exactly the same code snippet
that imitates stealing some basic data from the system and
pushing it to a remote location where our
supposed attacker is sitting. Another
one, called the encryption payload, where I had to supply a Pastebin
URL, is basically an example of how attackers
can use droppers. A dropper is basically a first-stage payload,
or first-stage malware, that goes to the
outside, in our case to Pastebin, and grabs
something. In this instance, it's not the payload that is sitting
in the external source, but rather the encryption key. Here the attacker
obfuscated the same snippet of code that we discussed
at the beginning of this part, and the dropper
decrypts it using the encryption key that is sitting in the Pastebin paste.
This is a very neat technique, and they can extend it.
For example, there can be several pastes, where one contains the encryption
key, another one contains the actual malware payload,
and so on and so forth. So they can fetch the payload from
the second paste, decrypt it using the encryption key from the first
paste, then decode it from base64
or whatever they're using for additional obfuscation, and execute it. So this is
a combination of many techniques. Now to the fun part,
bytecode payloads. The first three are actually
quite commonly known, and exec is
very well tracked by many SAST scanners. So if you run Semgrep,
Semgrep will immediately... actually, let's run
Semgrep just to demonstrate how easy it is to spot exec,
and why the other
obfuscation techniques are not detected. I already
played with Semgrep a little bit, and yeah,
it's going to take some time. In the meantime,
let's look at the remaining methods. So, the bytecode
payload. What's going on here is that we
have the code that is at the top of this file,
that small infostealer, pre-compiled
into bytecode. As you know, when the Python interpreter runs code,
it first of all does a syntactic check, then builds
up the AST, and then it's all transformed into bytecode. And the bytecode
is what is executed by the Python virtual machine.
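The pipeline from source to AST to bytecode can be walked through by hand with the standard library. This is a harmless illustration using a one-line stand-in program (the `source` string and the `<demo>` filename are made up for the example); a `.pyc` file is essentially such a code object serialized to disk behind a small header.

```python
import ast
import dis
import io

source = "total = sum(range(5))"

tree = ast.parse(source)                              # 1. syntax check, builds the AST
code = compile(tree, filename="<demo>", mode="exec")  # 2. AST -> bytecode (a code object)

namespace = {}
exec(code, namespace)                                 # 3. the virtual machine runs the bytecode

listing = io.StringIO()
dis.dis(code, file=listing)                           # human-readable view of the bytecode

print(type(code).__name__, namespace["total"])
print(listing.getvalue())
```

A scanner that only reads `.py` source stops before step 2, so anything shipped already compiled never passes through its pattern matching.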
These pre-compiled bytecode files
can be embedded in the package, and they can
be imported in exactly the same way
as your regular modules. So if we scroll up
to the very beginning, I'm importing compiled,
and inside of compiled there is
an exfiltrate function. The exfiltrate function contains the code
that we reviewed at the beginning. The last one is
pretty much the same, but now instead of using bytecode,
we are using a beacon that
was written in Golang. So we're basically embedding an executable
binary into the Python package.
And this is what we just demonstrated, this is what we just observed.
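Both of these last two techniques leave a trace in the archive itself: bytecode shipped without its source, and files that start with native executable magic bytes. Wheels are zip files, so they can be listed without installing anything. The sketch below is a hedged example; the function name, the two magic values checked, and the placeholder archive contents are assumptions for illustration, and an sdist (`.tar.gz`) would need `tarfile` instead.

```python
import io
import zipfile

NATIVE_MAGIC = {
    b"\x7fELF": "ELF executable",
    b"MZ": "Windows PE executable",
}


def suspicious_members(archive) -> list:
    """Flag source-less .pyc files and members that start with native-binary magic."""
    findings = []
    with zipfile.ZipFile(archive) as zf:
        names = set(zf.namelist())
        for name in sorted(names):
            if name.endswith("/"):
                continue
            # 'pkg/mod.pyc' is expected to sit next to 'pkg/mod.py'
            if name.endswith(".pyc") and name[:-1] not in names:
                findings.append((name, "bytecode without matching .py source"))
                continue
            with zf.open(name) as member:
                head = member.read(4)
            for magic, label in NATIVE_MAGIC.items():
                if head.startswith(magic):
                    findings.append((name, label))
    return findings


# Build a tiny wheel-like archive in memory; the bytes are placeholders only.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("pkg/__init__.py", "")
    zf.writestr("pkg/compiled.pyc", b"\x00" * 16)
    zf.writestr("pkg/beacon", b"\x7fELF" + b"\x00" * 12)
demo.seek(0)
print(suspicious_members(demo))
```

Legitimate packages with C extensions also ship native binaries, so a hit here means "look closer", not "malicious".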
So let's look at the results from Semgrep. Semgrep
went through the directory where we are
now, and it went through one obfuscation,
and, sorry, that was the wrong one.
We probably need to switch directories because there's a lot of stuff
going on there. Yeah, let's wait for Semgrep to execute once
again. But we will actually use the results of the previous
Semgrep run for another demonstration, so it's
actually good that we ran it in advance. So Semgrep
is running. 100%. Great.
All right, so let's look at the results.
First of all,
it detects beacon.go, obviously,
because it's, let's call it, a plain text file. It's not
obfuscated or encrypted. There is
an exec detection in one of the techniques
from techniques.py, and that's it.
So if we look back at the techniques that we just
demonstrated, the bytecode payload wasn't detected,
because what Semgrep does is static analysis.
It goes through the syntax of the files and then tries to
look for patterns. The embedded binary is the same,
because it was pre-compiled. It only detected the
content of the beacon that was in beacon.go, the
source file that I've included so you can look at the source of the example.
So as we can see, while Semgrep
did a really good job, it was only a
static analysis check, and it wasn't decompiling the pre-compiled bytecode
and the beacon. So essentially,
let's say two
out of six techniques successfully bypassed
this protection, even though the first four are pretty much
exactly the same; it's just how the payload was obfuscated before
it was passed into the built-in exec function.
And this is all for obfuscation. These are the
most basic and common techniques. By no means
are these perfect techniques, and there are way more sophisticated
methods of bypassing the protections, the scanners,
and so on and so forth. But this is
what you can look out for when you're trying
to determine whether a package is malicious or not and
you don't have any scanners to use.
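When no scanner is at hand, a few lines of standard-library Python can already surface the two patterns from the demo: dynamic execution calls and long integer lists that decode to text. This is a rough triage sketch with assumed names and thresholds (`DYNAMIC_CALLS`, the minimum of 8 list items), not a replacement for a real SAST tool. The sample below is only parsed, never executed.

```python
import ast

DYNAMIC_CALLS = {"exec", "eval", "compile", "__import__"}


def triage(source: str) -> list:
    """Return sorted (line, finding) pairs for patterns worth a manual look."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Direct calls such as exec(...) or eval(...)
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in DYNAMIC_CALLS):
            findings.append((node.lineno, f"call to {node.func.id}()"))
        # Long lists made only of integers that are valid code points
        elif isinstance(node, ast.List) and len(node.elts) >= 8:
            values = [e.value for e in node.elts
                      if isinstance(e, ast.Constant) and type(e.value) is int]
            if len(values) != len(node.elts):
                continue
            if not all(0 <= v < 0x110000 for v in values):
                continue
            text = "".join(map(chr, values))
            if all(c.isprintable() or c in "\n\t" for c in text):
                findings.append((node.lineno, f"int list decodes to {text[:30]!r}"))
    return sorted(findings)


# Harmless sample mimicking the code-point technique.
sample = (
    "codes = " + repr([ord(c) for c in "print('hi there')"]) + "\n"
    "exec(''.join(map(chr, codes)))\n"
)
for line, finding in triage(sample):
    print(line, finding)
```

As the demo showed, this kind of check says nothing about pre-compiled bytecode or embedded binaries, which need to be looked for separately.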
Now, installation and delivery, that's another important and interesting topic.
When I went through a bunch of materials from
other researchers, and researched some of the packages on my own, I discovered
that the most common way is for the
payload to be invoked at the stage when the package is imported.
So there is the __init__.py file that makes a
directory a Python module, or rather
a package, I would say. Yeah,
a package. You can place
code in __init__.py, and when this package is
imported, whatever is sitting there will be executed, and
depending on the creativity of the attacker, it can go completely
unnoticed by the end user. setup.py is another
interesting thing, because when you're installing the package, you can
specify custom installation steps.
And in those custom installation steps
you can place whatever arbitrary code you want:
you can download additional droppers, you can execute ransomware, and so on and so forth.
This is also a very,
I would say it's probably the major vector for how the payload
is invoked during installation. And obviously droppers.
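Custom installation steps are usually wired in through the `cmdclass` argument of `setup()`, and that wiring can be spotted by parsing `setup.py` instead of running it. The sketch below is a hedged example: the embedded `SETUP_PY` text and the `PostInstall` class are made-up stand-ins, and real packages may build `cmdclass` dynamically, which this only reports as unknown.

```python
import ast

# Stand-in setup.py content; it is parsed as text and never executed.
SETUP_PY = '''
from setuptools import setup
from setuptools.command.install import install

class PostInstall(install):
    def run(self):
        install.run(self)
        print("custom step runs at install time")

setup(name="demo", version="1.0", cmdclass={"install": PostInstall})
'''


def custom_install_hooks(setup_source: str) -> list:
    """Report setup(cmdclass=...) overrides without executing setup.py."""
    hooks = []
    for node in ast.walk(ast.parse(setup_source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        called = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if called != "setup":
            continue
        for kw in node.keywords:
            if kw.arg != "cmdclass":
                continue
            if isinstance(kw.value, ast.Dict):
                for key, value in zip(kw.value.keys, kw.value.values):
                    command = key.value if isinstance(key, ast.Constant) else "?"
                    handler = value.id if isinstance(value, ast.Name) else "?"
                    hooks.append((command, handler))
            else:
                hooks.append(("?", "?"))  # built dynamically; review by hand
    return hooks


print(custom_install_hooks(SETUP_PY))
```

Any reported hook is a pointer to the class whose `run` method deserves a careful read before the package is installed from source.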
As we saw in the example of the encrypted payload, where the
encryption key was downloaded from Pastebin in
the previous section of this demo, droppers can
be very minimal snippets of code that go to outside sources,
or external sandboxes as some researchers call them,
and this is where the actual payload will be sitting.
The famous sandboxes are Discord,
Pastebin, Telegram bots, and AnonFiles, which
was, I believe, closed some time ago. First of
all, by no means am I implying that Discord
is all about hosting malware, but it's a well-known
platform alongside Pastebin and Telegram. And as
far as I know, recently there was a change in how the
URLs to files are published on the Discord CDN, so you can't host
files forever. I believe a link is
only valid for something like 24 hours. But yeah, I need
to double check, because there have been some changes precisely
because droppers are using Discord as
their external sandbox. And as we can see in
the diagram, the attacker publishes malicious code, the developer initiates
the package installation, and when the package installation
happens, there is an external sandbox, and the malware from the external sandbox
is executed either from setup.py or through __init__.py.
And it's time for another demo. So let's
go to our ipython,
let's drop this session, let's check where we're sitting
now. Yeah, like I said guys,
I will publish all code snippets
so you will have a chance to go through them. Let's drop the receiver so
we can have a clear picture of what's going
on on the screen. And let's run the
same build command as we did before.
And while it's building, let's go through
the project. The project, or the package rather, is that
pub IP info we used for the starjacking
demo. And actually,
yeah, we didn't have to rebuild it once again because
it's already published. But never mind. So what
this package does: when it's installed in the system,
it will create an entry point, which is a console script
that will execute the CLI. So basically this is a CLI
utility, and the description says: CLI utility
that retrieves information about your public IP.
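Console scripts are declared in package metadata, so the commands that installed packages have added to an environment, and the functions they point to, can be listed with `importlib.metadata`. This is a small sketch; the `console_scripts` helper name is an assumption, and the output depends entirely on what is installed in the environment where it runs.

```python
from importlib import metadata


def console_scripts() -> list:
    """Return (command, target, distribution) for every installed console script."""
    scripts = []
    for dist in metadata.distributions():
        owner = dist.metadata["Name"] or "?"
        for ep in dist.entry_points:
            if ep.group == "console_scripts":
                scripts.append((ep.name, ep.value, owner))
    return sorted(scripts)


for command, target, owner in console_scripts():
    print(f"{command:<25} -> {target:<40} ({owner})")
```

An unfamiliar command, or a familiar command name owned by an unexpected distribution, is worth checking before it is ever run.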
So it's a handy little CLI utility that helps you determine
the current IP address that was given to you
by your provider. So, it seems like a neat little package,
and as we saw from the starjacking example, it has almost
five thousand stars. So why shouldn't we trust
that package? What just happened? When the package build was
executed, the installation steps were also executed. And as we
can see, our imitation attacker
just received the very first
piece of information that was stolen from our system.
So let's actually manually install this
package and see if it happens again. As
I said, sometimes you can find those tar.gz archives
on FTP servers or just sitting in repositories on GitHub
or on some other version control platforms.
And sometimes the instructions contain stuff like, hey,
do curl or wget, download this archive and just do pip
install. This is what can happen when
you download an arbitrary package and
you execute the installation without scanning. So let's do it
once again. Okay, and exactly the same result: because the installation
steps were executed once again, we have another
piece of information stolen from the machine. Now let's
go back to ipython and
let's import this PkG.
Sorry, let's import pub IP
info. Oh, as soon as the import step
happened, there was another piece of information stolen from our machine
and sent to the attacker. Now for the last piece of this demonstration,
I will put on the screen the Windows ten
sandbox, and I will open in a non
privileged mode the comment prompt,
and I will show that the
windows defender is running. So everything is up and running and everything is green.
For the second part, I will switch the
tmux terminal sessions and
I will run a listener. So we're going
to demonstrate the reverse shell. So let's
go to our pub IP info,
grab this link, move back to Windows sandbox,
and let's just run it here.
It will take some time. It's quite a small,
slim machine, so it doesn't have many resources. All right,
so it was successfully installed. Let's go back here and
put this thing on. So as you remember,
the pub IP info is a
CLI utility, so we can just simply
run minus help. Nothing happened.
Nothing happened. And as you can see,
what just happened is that Microsoft Defender
detected a threat, and it detected
a trojan. What it actually detected
was a dropper that tried to go
outside and grab a payload
that was sitting on the same
Pastebin, and it tried to establish a reverse
shell connection with this machine.
And because of that, Windows Defender
detected this behavior and blocked it.
So let's
say we disabled the real-time protection.
What would actually happen in this case?
So let's run
pub IP info once again. And as you can see,
when Defender was disabled, the reverse shell was successfully
spun up on the attacker's end.
So like I said before,
one of the reasons attackers obfuscate payloads is
to bypass static analysis, or rather,
not even bypass it, but make it harder
for malware analysts to find the actual payloads. But what we observed now
is that Windows Defender performed a dynamic
analysis. So it executed the code and
it checked the behavior, the signature behavior of the code,
and based on what it observed, it determined that it was a
trojan because that thing tried to go
outside, grab something and execute it locally. So it was the behavior
of a dropper, of a trojan. So in this instance,
if we had Windows Defender disabled,
we would get into trouble. But because it
was enabled, the malware was successfully blocked. So what actually happened behind
the scenes? So behind the scenes we
had the first execution of malware through __init__.py,
and it referred to the netutils class from the utils module
and its run method. And this method contains
a base64 string. It
was exactly the same code that we used during the first part of the demo,
where I demonstrated the basic obfuscation
techniques. So when the package was imported,
it ran the run method from the netutils class
of the utils module, decoded the string,
executed it, stole a piece of information, and sent it to the attacker. Within
setup.py, when we executed the installation,
there was a custom install step, and while it looks like
a legitimate step (it checks the pip version, ensures it is installed,
and checks the git installation), the git installation
method of the custom install class also
contained exactly the same payload.
So when we installed the package, it did exactly the same.
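A minimal sketch of how a setup.py could be checked for this kind of
custom install step before anything is installed, using Python's ast
module. The SAMPLE source, the CustomInstall class name and the
scan_setup_source helper are illustrative assumptions, not the code from
the demo. The sample is only parsed, never executed.

```python
import ast

# Built-in calls that often show up when a payload is decoded and run.
SUSPICIOUS_CALLS = {"exec", "eval", "compile", "__import__"}


def scan_setup_source(source):
    """Return human-readable findings for a setup.py source string."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Direct calls such as exec(...) or eval(...).
        if isinstance(node.func, ast.Name) and node.func.id in SUSPICIOUS_CALLS:
            findings.append(f"line {node.lineno}: call to {node.func.id}()")
        # setup(..., cmdclass={...}) overrides install/build commands.
        for keyword in node.keywords:
            if keyword.arg == "cmdclass":
                findings.append(f"line {node.lineno}: custom cmdclass passed")
    return findings


# Hypothetical setup.py content, kept as a string so it is never run.
SAMPLE = '''
from setuptools import setup
from setuptools.command.install import install

class CustomInstall(install):
    def run(self):
        exec("print('runs at install time')")
        install.run(self)

setup(name="demo", cmdclass={"install": CustomInstall})
'''

for finding in scan_setup_source(SAMPLE):
    print(finding)
```

A custom cmdclass is not malicious by itself, since plenty of legitimate
projects use one, so a hit here is a reason to read the class, not a
verdict.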
And last but not least, what happened on the Windows machine. So on
the Windows machine, when the user
executed this pub IP info CLI utility,
there was an implant in the CLI method,
and it basically tried to execute
a subprocess. And the subprocess had
a pointer to drop.exe. As you remember, I've mentioned that
you can include whatever files, arbitrary files you
want in your package. And in this case it was a precompiled
dropper. And this dropper contains
this piece of code, obfuscated in the form
of a list of Unicode characters.
If we join them, we will get a base64 string.
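From the analyst's side, that layering can be peeled back without running
anything. The sketch below assumes the list holds integer code points (the
talk only says "a list of Unicode characters"), and it builds its own
harmless sample. The decode_codepoint_list helper is a hypothetical name.

```python
import base64


def decode_codepoint_list(codepoints):
    """Join code points into a base64 string and decode it to text.

    The result is returned for inspection only; it is never executed.
    """
    b64_text = "".join(chr(cp) for cp in codepoints)
    return base64.b64decode(b64_text).decode("utf-8", errors="replace")


# Build a harmless sample the same way an obfuscator might.
harmless = "print('hello from a decoded snippet')"
sample = [ord(ch) for ch in base64.b64encode(harmless.encode()).decode()]

print(decode_codepoint_list(sample))
```

Reading the decoded text is usually enough to see what the snippet would
fetch or connect to.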
And when this snippet is deobfuscated
and executed, it goes to the Internet,
to Pastebin, and then it downloads the reverse
shell listener, or rather connector. And as we
can see, when Defender is disabled,
it's actually quite dangerous, because such files can really, really harm
your system. And if we go to the
installed package and run Semgrep
once again, you will see that drop.exe
wouldn't be detected. So let's give it some time.
We're almost at the end of the technical part of
this presentation. I know it's a bit lengthy, but please
stay tuned. Stay until the end.
The most fun part is still coming. All right, so Semgrep
has finally finished, and what we can see now
is pretty much exactly the same detection against the pub IP info
malicious package. So it detects exec, it detects
exec again and again, and it detects exec in setup.py.
And that's pretty much it.
So what actually happened? It didn't flag
__init__.py because this is legitimate code.
It just runs something. But it
actually detected the code that this part was referring to in
utils and obviously in setup.py. But as
you can see, the part in the CLI function
that was invoking the prebuilt
executable wasn't detected.
So as you can see, Defender did a really good job of
detecting it dynamically, because if Defender were disabled,
then yeah, basically the attacker would
get a connection to the machine. And I
believe that's it for the installation and delivery demo.
Okay, last but not least, the most fun part: exfiltration and command
and control. So infostealers and remote access trojans, or
RATs, are very prevalent. So in the first instance,
attackers just try to grab whatever is sitting in the environment.
They will just try to hide stuff.
They will just try to actually extract
whatever's hidden in your system and then determine whether
these are actual credentials, whether that's sensitive information,
so on and so forth. So infostealers can range from
basic examples like the one I'm going to demonstrate, where it just goes through environment
variables and SSH keys,
to ones that actually go and try
to steal your crypto wallets, crypto information, and so on and so forth. And
remote access trojans can also range from
those that simply track what's going on in your system as a means of
additional reconnaissance, to ones that are
heavily weaponized and do screen grabbing and
webcam grabbing, and can provide the attacker with the ability to
execute arbitrary commands. Now the demo part.
So for this section of our demo,
actually, let's go to the initial
tmux session we used, and we're going
to go back to the source, the root of
the repository, and we will go to harvester and
we will prebuild this package. And we will install this package.
So while this package is building itself,
well, not really itself, but using the backend build system,
we're going to go through the contents of the source code.
So setup.py says that this is
a collection of connectors for various databases.
So assuming this is a collection of connectors, you can import
this package after it's installed and
use some of the methods present in this package
to maybe ease your life when it comes to connections to
different databases. So yeah, based on the content,
yeah, looks like a legit package,
some bootstrap config. So maybe it's going to bootstrap
some basic database connection configs for us,
who knows? So let's get back to the terminal and let's do
pip install with the DB tool
set tar.gz. And now
let's run IPython.
And because I already tested it, let's do import
DB toolset. Ooh, what just happened?
We get a bunch of stuff on the top. Looks like SSH keys.
Yeah, looks like SSH keys. And it also tried
to go through the envs, seems like. Yeah, definitely SSH
keys, and it definitely tried to go through envs. So what
just happened is we downloaded a malicious package, we installed it, and as
soon as it was imported, remember, __init__.py
can be used to place your malware there.
So when the package is imported it will
be automatically executed. And what we just observed is that when it was
imported, there was some data collected locally and sent
outside. So let's see how it was done. So __init__.py
contained a reference that imports the bootstrap config class
from the bootstrap functions module, and it executed the run
method of this class. So what happened here in
this run method? It is mimicking
a method that's supposed to create some sample configs for us,
and it seems like it even created some
sort of a directory. So let's take a look whether that's
actually the case. Yeah, and as you can see,
it actually created some files for us. So it
really mimics the behavior of
a legitimate package really well. But if we look down, it says envs,
and as we just observed in the output
received by the receiver, presumably our attacker,
there was also a reference to the envs key. So we
assume that the attacker wanted to
enumerate certain environment variables and check whether you
store any credentials there. So if we encode this stuff...
Pardon, I would rather say decode.
Apologies for misclicking. It's quite a heavy technical
demo. So yeah, it takes a bit of coordination.
So if we decode the first example, it says AWS
access key. So it looks like this
list of base64 strings is a collection of
different env vars that the attacker wanted
to enumerate and send back to
the listener, to the receiving end. So let's decode another one.
Okay, so it seems like
it only allows us to decode one by one.
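Decoding the whole list in one go is a few lines of Python. This is a
sketch for analysis only. The two variable names below are assumptions
based on what was read out ("AWS access key", "Azure auth location"), and
the encoded values are generated in the snippet rather than copied from
the demo package.

```python
import base64
import binascii


def decode_all(encoded_items):
    """Decode each base64 string; return None for entries that do not decode."""
    decoded = []
    for item in encoded_items:
        try:
            decoded.append(base64.b64decode(item, validate=True).decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            decoded.append(None)
    return decoded


# Hypothetical stand-ins for the obfuscated list in the harvester.
names = ["AWS_ACCESS_KEY_ID", "AZURE_AUTH_LOCATION"]
encoded = [base64.b64encode(name.encode()).decode() for name in names]

print(decode_all(encoded + ["not base64!"]))
# → ['AWS_ACCESS_KEY_ID', 'AZURE_AUTH_LOCATION', None]
```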
Yeah, Azure auth location. So indeed, here the
attacker just placed a bunch of stuff, and because they didn't want us to
figure out what they are trying to enumerate, they just
obfuscated this part. And down
here, if you take a look, there is no obfuscation at
all. So it's either sort of an opportunistic
type of attack, where the attacker relies on a
lack of competence on the receiving end, or they are
just trying to bypass static analysis. But yeah,
this is a basic example of a harvester. And as you can see,
you can collect information, you can send it outside.
So this is not really a problem for an attacker. So, last part:
the remote access trojan. And this
is going to be probably the funniest part of this presentation.
So let's get back to our listener,
let's get back to IPython, and probably for
this one, we will also need to rebuild
the package. So let's go back to the
rat, let's go to sources. Okay,
so we're going to run the build,
and while it's building, let's go to the setup.py
of this package and see what it does for us. So it says CLI
utility to search for packages across different managers,
and that's pretty much it. So like
I said before, this package can be downloaded,
like packages can be downloaded, from PyPI, where
they use starjacking to trick us into believing it's a legitimate
project. Or we can just get it from, I don't
know, someone maybe distributed this PKG search
tar.gz on Discord or some forum,
or just uploaded it to FTP, or we just found it on GitHub,
so who knows? So let's
install this one. And assuming it's a CLI utility,
when we execute PKG search,
it should give us some results. So yeah,
it says usage. All right, so package searcher.
And it gives us a few arguments; for instance, -m stands
for manager. So here we probably need to specify PyPI,
and there's also package. And for package, I don't know, let's put
GPT. So let's assume that this malicious package was downloaded,
installed, and now the user is trying to look up
all GPT packages because hey, it's 2023,
and like, who's not using GPT in their
development, right? Okay, so we see some results,
but also we see some incoming connection
on the attacker's end. And apart from
status online, assuming there is some sort of a
beacon or agent or listener being spun up on the victim
side, we also see the Ngrok URL.
And what is this thing doing?
So let's copy the IP address, let's copy the URL.
And let me open a fresh Firefox.
So now I'm outside of the virtual machine, I'm using
Ngrok, and oh, it seems like
we're hitting some sort of server with endpoints. So let's
hit OS: information about the operating
system. Proc: list of processes.
Then the user name of the user.
And we have a screen grabber. And as you can
see, all the information that we requested was
sent back to the attacker. So this is an example of a
very small trojan, a RAT, that
was embedded in the malicious package. And to see what's going on here,
let's use ps aux and grep for search.
So from the PKG search package, there were several executions.
First of all, there was a Python subprocess spun up that is
running precompiled bytecode. As you remember,
bytecode can be included, and you need to decompile bytecode in
order to look into its content. So it's
not a simple file that you can just open and read.
And there was another one, search index,
also bytecode, executed.
And the combination of these two does the following. So this
code, let's look at the original code.
This file spins up a Flask instance on the
victim's machine and populates these endpoints,
where one of the endpoints is a screen grabber. And the screen grabber
allows the attacker to basically steal data, look at
what you're doing, and whatnot. So there can be many more endpoints
here. There can be command injection, there can be more data exfiltration.
There can even potentially be an endpoint that enables ransomware:
it will just encrypt all your file system and data. So who knows?
And then there was another file. And what this file did, it actually
checked whether the Flask instance was
running. And when it was indeed successfully
started, it established an Ngrok tunnel. So for those who don't know, Ngrok is
actually a legitimate product. It's a great product. If you are a developer
and you want to give temporary access, or you just want to
test a solution that you're developing locally, you want to test how it's
going to look from outside. So you start a listener locally and then you use Ngrok,
and it publishes the traffic through one of the tunnels that is established
using Ngrok's infrastructure. And while this is a
great product, it was quite quickly adopted
by attackers and pen testers and red teamers.
So Ngrok is super useful, but you have
to be careful. As you can see, it can be weaponized. And I know that
in many, many companies, they are now actually
blocking Ngrok on the DNS level. And they're also tracking whether
there's an Ngrok binary sitting somewhere in the processes.
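The process-tracking idea can be sketched as a small filter over a process
listing. The marker list and the sample listing below are assumptions for
illustration (a tunnel binary by name, and an interpreter started directly
on a .pyc file); a real deployment would rely on an EDR rather than string
matching.

```python
# Substrings worth a second look in a process listing (heuristic only).
SUSPICIOUS_MARKERS = ("ngrok", ".pyc")


def suspicious_processes(ps_output, markers=SUSPICIOUS_MARKERS):
    """Return the lines of a `ps aux`-style listing that contain a marker."""
    hits = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        lowered = line.lower()
        if any(marker in lowered for marker in markers):
            hits.append(line.strip())
    return hits


# Hypothetical listing; the commands are made up for the example.
SAMPLE_PS = """USER PID %CPU %MEM COMMAND
dev 101 0.0 0.1 /usr/bin/python3 app.py
dev 202 0.3 0.4 /tmp/ngrok http 5000
dev 303 0.1 0.2 python3 search_index.pyc
"""

for hit in suspicious_processes(SAMPLE_PS):
    print(hit)
```

On a Linux or macOS host the input could come from
subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout.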
Last but not least, let's look at this file,
search endpoints. So what's in search endpoints?
There's a search other method
that belongs to a search endpoints class.
And what this method does, and what its obfuscated string
does, is it basically runs a subprocess check.
It uses the current executable, which is Python, and then it silently
installs all the necessary dependencies.
So this is what's called hidden imports. This is another
type of malware invocation
that you can use. And then it opens up two subprocesses
using nohup, redirecting all errors and
all standard output to /dev/null. And this
is why we didn't see anything. Only when I executed ps aux
and grepped for search did we see these files and
how this thing was executed.
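A rough way to look for this kind of silent dependency install in source
code is to search for subprocess calls that mention pip. The sketch below
is an assumption-laden illustration: the SAMPLE source and the package
name in it are made up, and it only sees calls written in plain source,
so a call hidden inside a base64 string, as in the demo, would have to be
decoded first.

```python
import ast


def find_pip_invocations(source):
    """Return line numbers of subprocess.*(...) calls that mention "pip"."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_subprocess_call = (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "subprocess"
        )
        if not is_subprocess_call:
            continue
        # Collect every string literal that appears anywhere in the call.
        strings = {
            n.value
            for n in ast.walk(node)
            if isinstance(n, ast.Constant) and isinstance(n.value, str)
        }
        if "pip" in strings:
            hits.append(node.lineno)
    return hits


# Hypothetical source, parsed but never executed.
SAMPLE = '''
import subprocess, sys
subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "somepkg"])
subprocess.run(["ls"])
'''

print(find_pip_invocations(SAMPLE))  # → [3]
```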
So in the CLI, among the legitimate
CLI functions, as soon as we selected a
particular manager, and as soon as this manager started doing its job,
as soon as it was finished, there was an execution of
the search other method from
the search endpoints class.
And by that, when we ran our very first search against
PyPI, we invoked this chain of events
where dependencies were installed and
two malware components were executed on
the system. And this is how a remote attacker got
access to the trojan via the Ngrok tunnel. So why an Ngrok
tunnel? Well, first of all, you want to obfuscate the traffic and
you want to make sure that the traffic that is used
for the command and control operations is hidden. And also
you assume as an attacker that your victim is
sitting behind network address translation or a firewall,
so you don't know their public IP address and whether they have a public
IP address. So Ngrok in this instance is used
first of all to hide the traffic, as this is a
tunnel and all the traffic goes inside the tunnel.
And last but not least, we are bypassing the need for
the victim to have a public IP address. And this is how we are
essentially bypassing the perimeter. By no means
is this a technique that has a 100% success rate, because if you
have EDRs or more sophisticated means of network
monitoring, you can monitor such traffic. You can figure out, like,
hey, why do you have a tunnel established towards Ngrok?
And you can just kill those events instantly. So there is a way to
protect against such operations. And as we also
see in this instance,
the author of the malware
used base64 in search endpoints to hide
one of the payloads, and the listener and
the tunnel spawner were precompiled.
And this is basically the end of the
exfiltration and C2 demo. And now we are moving into defenses,
or how we can protect ourselves against all
the techniques that we just discovered. So first and foremost,
let's divide it into pre-supply protection and post-supply protection. So pre-supply
protection is what you can do before the package is either
downloaded or imported or installed. So here,
first and foremost, you can use individual development sandboxes. There's no
need to develop on your host machine. So what you can do, you can have
a VM, a well-protected VM that is
not directly connected to your network. It can sit
behind an additional NAT or whatever,
and you can just use IDE plugins
for remote development, like this one in VS Code.
By doing so, if anything is affected, only
the VM is affected and not your host system. So the blast radius
will be much smaller. Then, avoid shared development servers.
I've seen several examples where the remote development environment was a
big chunky development server, and this is
a very bad idea because if there's a malicious package being
installed on such a machine and there are no protections, the blast radius will be
just enormous. Review project details
and reputation. So as
demonstrated earlier, you can steal reputation through starjacking.
So please make sure that the name of the package corresponds
to the repository, the authors are the same, the package hasn't
been published five minutes ago, and the repository looks
legit and there is at least some movement in terms of pull
requests, contributions and so on and so forth. Code review:
manual grep and Semgrep. So obviously, manual code review.
If you don't have any scanners or tools,
at least unzip the package,
untar the package, and just manually look into the code.
Look in those places that I showed you today. So look into
setup.py, grep for base64, grep for weird-
looking lists of Unicode characters. Grep for any
precompiled stuff. Do not execute anything, just look at it.
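The manual checks listed above can be scripted in a few lines. This is a
minimal sketch with made-up heuristics (a 40-character base64-like run, a
short list of binary suffixes, a simple exec/eval pattern) and a throwaway
demo directory standing in for an unpacked archive. It only reads files,
it never imports or runs them, and both patterns can produce false
positives.

```python
import re
import tempfile
from pathlib import Path

# Heuristics only; tune them for your own code base.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")
EXEC_CALL = re.compile(r"\b(exec|eval)\s*\(")
BINARY_SUFFIXES = {".pyc", ".exe", ".dll", ".so"}


def scan_tree(root):
    """Return (relative path, reason) pairs for files worth a manual look."""
    root = Path(root)
    findings = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if path.suffix.lower() in BINARY_SUFFIXES:
            findings.append((rel, "precompiled or binary file"))
        elif path.suffix == ".py":
            text = path.read_text(encoding="utf-8", errors="replace")
            if BASE64_RUN.search(text):
                findings.append((rel, "long base64-like string"))
            if EXEC_CALL.search(text):
                findings.append((rel, "exec/eval call"))
    return findings


# Demo on a temporary directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_pkg"
    pkg.mkdir()
    (pkg / "clean.py").write_text("print('hi')\n")
    (pkg / "utils.py").write_text("blob = '" + "QUJD" * 12 + "'\nexec(blob)\n")
    (pkg / "helper.pyc").write_bytes(b"\x00")
    demo_findings = scan_tree(tmp)

for rel_path, reason in demo_findings:
    print(rel_path, "->", reason)
```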
If there are too many red flags, just stay away from
such a package. Semgrep: Semgrep is amazing. As I demonstrated today,
it is great for static analysis, but as in the example
with Windows Defender, if you don't have any antivirus,
especially on Windows, there will be no dynamic protection,
there will be no dynamic analysis of the malware, and
the defenses will be bypassed.
Package quarantine: basically, do
not use a package unless you know it is safe to use.
That's what package quarantine stands for. But also, if you have
local mirrors, private PyPI mirrors that you're using to download
packages and to store your packages, you can use
one PyPI mirror to download the package, then do the analysis,
make sure it is not malicious, and then push it to,
let's say, another mirror that is a production one. It can be easily automated.
There are enterprise level tools that allow you to do
that. So package quarantine is an amazing thing to do.
Avoid projects that are not published on PyPI. As I mentioned before,
if a project is not published on PyPI, it doesn't
mean it's malicious. It might be just that the contributors and development
team are lazy, or they don't see, for whatever reason,
a need to publish it. Maybe they're not going to support it for long.
But using the techniques I showed you today, check the specific places where malware
can most likely be embedded.
Just look through the repository and look for
the immediate red flags. Fixed versions of
dependencies: please do not do pip install with
just the name of the package. Do pip install and specify a particular version.
Because if the project was infiltrated,
if a newer version of the package that has a malicious
implant in it was published without the owner of the
project knowing about it, you can install the
latest version and compromise your system by doing so. So please use
fixed dependencies. It's not that hard.
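Checking a requirements file for unpinned entries is easy to automate. The
sketch below is a simplified illustration: the unpinned_requirements
helper and the sample file content are assumptions, and the pattern only
recognizes plain "name==version" lines, so URLs, editable installs and
wildcard versions would need extra handling.

```python
import re

# "name==version", optionally with extras such as name[extra]==version.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?\s*==\s*[^\s;]+")


def unpinned_requirements(text):
    """Return requirement lines that do not pin an exact version."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


# Hypothetical requirements.txt content.
SAMPLE = """
requests==2.31.0
flask>=2.0
numpy
# a comment
pandas == 2.1.0  # pinned
"""

print(unpinned_requirements(SAMPLE))  # → ['flask>=2.0', 'numpy']
```

Pinning can be taken one step further with pip's hash-checking mode
(pip install --require-hashes -r requirements.txt), which refuses any
archive whose hash is not listed in the requirements file.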
Restrict direct downloads of dependencies. This is where developers
will probably hate me, and this is where I personally saw the
biggest pushback. People want to download stuff
from the Internet, they just want to do it.
Use private PyPI servers, trusted proxies. So if you have your
PyPI mirror, an additional index, and if you
can put stuff like SCA and SAST there,
and you can scan packages before you release them to developers, that will
protect them. Absolutely do SCA, software
composition analysis, in pre-commits. There's a lot of stuff you can do. You can
invoke the Xray CLI from JFrog, you can use Safety,
you can use, I believe, Semgrep as well.
If you connect to Semgrep Cloud, you can also do such checks
there. But if it happened that you downloaded a
dependency and you installed
it, and you didn't scan it with
SCA before installation, and it happened to
be a malicious package, as soon as you try to push
such a dependency to the repo, to the integration
branch of your project, the SCA will flag it as
malicious if it knows about it,
if it was detected beforehand, and
it will just block your commit. So in this case we're talking about sort of
a containment technique. So it is not going to spread into the repository.
But in general, if you have at least Safety
or Semgrep or Xray or any other
SCA, when you download
the package, just run your checks against it.
Maybe it's not even worth installing it. And last but not least,
antiviruses and EDRs. As I already showed you,
Defender, even the basic Windows Defender, is amazing. And if you have EDRs
it's even better. They will kill stuff like tunneling, like DNS exfiltration and
so on and so forth. And post-supply protection. So
development sandboxes are relevant for
both pre-supply and post-supply. Traffic monitoring
is also important; I should have probably included it in pre-supply
protection as well. And the principle of least privilege on build agents
and nodes. This is very important,
especially if you are working with non-ephemeral nodes, not the ones that
are just spun up, run some pipeline and
die afterwards. If you're using persistent nodes, like for instance
Jenkins build agents, and if you are using a high-privilege
user that has sudo or root privileges
(I've seen such cases: before moving to security, I used to be a
DevOps engineer and I've seen people doing such stuff),
if a malicious package lands on such a node,
and if the package has implants that
enumerate the system for any sort of privilege escalation capabilities,
you might get yourself in big trouble because you will first of all
allow the attacker to get into your environment, but you will also allow them
easier lateral movement and data stealing,
and you will basically give them like a golden key
to the city. Semgrep, SCA, SBOM. So why
specifically Semgrep? It's not an advertisement for Semgrep. I just
found that Semgrep has a really great collection for static analysis.
So if you can use Semgrep, if you
can use it in combination with any SCA, especially if
you just use Semgrep and its SCA capabilities, and if you use an SBOM
and you integrate it in your CI/CD pipelines, especially if you have a dedicated
CI/CD pipeline that will download packages from the Internet and check
them if developers request them, that's the best.
Because if you have an SBOM, and that SBOM indicates a
version of a package with a certain hash calculated
for that version, and if it happens that the SBOM now reports a
different hash for the same version, most likely something fishy
has happened. Most likely somebody changed the
package, overwrote it, whatever was done to
it, and most likely something malicious happened in the background.
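The hash comparison itself is simple to reproduce. A minimal sketch,
assuming the SBOM records a SHA-256 digest per package version: the file
name is hypothetical and the "recorded" value is computed in the demo
instead of being read from a real SBOM.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_matches(path, recorded_sha256):
    """True if the file still has the digest that was recorded earlier."""
    return sha256_of(path) == recorded_sha256.strip().lower()


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "demo-1.0.0.tar.gz"  # hypothetical archive name
    artifact.write_bytes(b"original contents")
    recorded = sha256_of(artifact)              # stands in for the SBOM entry
    unchanged = hash_matches(artifact, recorded)
    artifact.write_bytes(b"tampered contents")  # same version, different bytes
    still_matches = hash_matches(artifact, recorded)

print(unchanged, still_matches)  # → True False
```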
One of the best examples of protection for developers
I've seen is when two
PyPI servers were living in parallel,
and one of them was acting as an interim
proxy. So when developers requested a new package that
wasn't present in the private production
PyPI mirror, that package was downloaded from the Internet.
It was placed into the interim mirror.
Then scanners were executed against that
package, and when it was proven that it's not malicious, only then
was it released and pushed into the production mirror. And only
then did developers have a chance to download it to their machines. And all developers
were forced to use only the production
PyPI mirror. They were not allowed to go to pypi.org.
While it introduced a certain delay when downloading a package for the first time,
it actually made it possible to detect about
a dozen real malicious packages that
appeared in the interim mirror when developers simply mistyped a package name and hit a typosquat.
So as you can see, you can combine those defenses and protections.
And antiviruses and EDRs are also important, not only
on your development machines, but also when you're working on
servers. Say a package landed on
the build agent, which is basically a server, and it's not an ephemeral but
a persistent one, and imagine there's the same
remote access trojan being executed through an Ngrok tunnel.
Well, it's better to have an EDR and just kill such events
even before they happen or as soon as they happen. So yeah,
pre-supply and post-supply protection: please look at this. There are many,
many different protection methods and products. Again,
I'm not advertising any of the products I've mentioned, but this
is what I worked with and they generally generate really good results.
Okay, last but not least, very important credits
and references. First of all, I would like to give some credit and
a thank you to my employer, EPAM Systems Ltd, for supporting my initiatives
in security research and public speaking. I would like to say a
massive thank you to the guys from the evil Bunny Road CTF team.
This is a CTF team I'm a member
of and we are having a lot of fun researching stuff together. And thanks to
the guys for sharing their expertise. And a very big thank you
to flaticon.com because I use their icons
and graphics in some of the diagrams in
this slide deck. And yeah,
they are asking for credit to be mentioned in the credits section.
So thanks guys for the graphics. They're amazing. References
and additional reading: I collected some materials that I used in the preparation and
also just to give a bit of
additional material that you can investigate on your own.
And I've mentioned some of the tools and techniques and the sources I used
for this research, so please go through them. They're very interesting.
Last but not least, thank you all for joining this
session and this conference. And thank you, Conf42,
for the invitation. I hope you guys enjoyed it.
And as soon as you've watched this talk,
please use one of these two QR codes. Go to my GitHub repo.
Like I've mentioned before, I will release all these samples
of the code so you can go through them and play
with them on your own. Please, if during
this talk you found that I made,
I don't know, some sort of a misstatement, or I made an error,
or you believe that something is not true or something is not super accurate,
or you maybe know an interesting technique that I didn't mention,
or you maybe know how to improve this talk in the future, please do
get in touch. And also, if you're generally interested in supply chain protection,
please do get in touch. I think there's a lot of stuff we can discuss
and there's a lot of stuff that we can help each other with.
So with this, I will wrap up this
talk. Thank you once again for joining. I hope you enjoyed it and
see you soon. Stay safe and enjoy the rest of this conference.
Bye.