Abstract
Popular programming language index websites (TIOBE index) and developer surveys (Stack Overflow) place Python as one of the fastest-growing programming languages. However, this popularity also puts it in the target range of attackers. The attackers perform malicious dependency attacks and use misconfiguration tools to reveal confidential information. Jukka Ruohonen, Kalle Hjerppe, and Kalle Rindell, in their research paper “A Large-Scale Security-Oriented Static Analysis of Python Packages in PyPI”, report that they scanned PyPI for security issues in Python packages and found at least one security issue in about 46% of the packages.
In addition, security vulnerabilities can be present in the source code of the package. In this talk, we will address the security issues related to Python packaging and possible solutions to make Python packages secure. The talk begins with the importance of a secure package and vulnerabilities in the Python Package Index. Then, I will discuss Python packages such as Bandit for identifying common security issues in Python code and “safety” for dependency checks. Next, I will discuss verifying and signing Python packages using GPG. Finally, I will discuss general guidelines for secure coding practices in Python.
Outline
1. Importance of a secure package and vulnerabilities in the Python Package Index (5 Minutes)
2. Bandit for identifying common security issues in Python code (4 Minutes)
3. Safety for dependency check (4 Minutes)
4. Verifying and signing PyPI and conda packages using GPG and Twine (4 Minutes)
5. General guidelines for secure coding practices in Python (5 Minutes)
6. Summary and Questions (3 Minutes)
Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, my name is Gajendra Deshpande.
I am working as an assistant professor at Institute of Technology.
I also run a startup called Eyesec Cyber Security Solutions Private Limited.
Today I will be presenting a talk on security considerations in Python
packaging. Let us see
the outline. In today's talk I'll be discussing the importance
of a secure package and vulnerabilities in the Python Package Index.
Then we will see some tools such as Bandit, Safety, and Semgrep.
Bandit is used for identifying common security issues in Python
code, Safety is used for checking dependency
vulnerabilities, and Semgrep is a static analyzer. Then finally
I will conclude the talk with general guidelines for secure
coding practices in Python.
Now on this slide you can see the TIOBE index for
Python. Recently you might have seen that Python has
reached the number one position in the TIOBE
index. TIOBE is simply a
website which ranks programming languages by their popularity.
Now let us also see some other surveys. For example, in the
Stack Overflow survey,
Python is ranked third after JavaScript and HTML
or CSS. If we see here, JavaScript and HTML or CSS
are not really programming languages
for the kinds of things which Python can do.
Next are the GitHub stats.
That's on jitter IO here. Also you can see that
Python is ranked third in terms of number of active
repositories. Now if you see all these factors,
no doubt it shows that Python is one of the most
popular programming languages or scripting languages
in the world today. So this popularity itself
has created a problem for Python because many of the hackers
are targeting Python. Now let's see
the security issues and some misconceptions related to open source
software. So there's a misconception
about security of open source software. The major
reason people cite is that the code is open
source. Because code is open source, everything is
open, the folder structure is known. So they say
that this open source structure or
this information openly available makes
it vulnerable. But generally, open source software is secure
by design. Okay, so by default
or by design, Python is secure.
Say WordPress is secure by design. Any content management software,
any open source software, is secure. The problem
arises when people start using insecure third-party packages,
and security issues are mostly due to a
lack of understanding of secure coding principles.
That is because these
open source projects mostly allow third
parties, or any person, to create their own
packages and integrate them into the existing software.
Now the problem lies with the developer because he or
she may not be aware of
the secure coding principles. So that's why
we say that Python or any open source software is secure, but the vulnerabilities
may be present in packages. Most of the time this
is the case. But again, I'm not saying that the
vulnerabilities will not be there in the original software, they will be there. But there
is a very huge community which is constantly monitoring
the different aspects, including the security aspects,
and they are continuously fixing the
security issues. Apart from that, there will be bug bounty programs
which will help you to identify the issues and fix them.
Now, the importance of a secure package. An insecure package
will make your application vulnerable and prone to external
threats. We don't know what kind of vulnerability
is present in the package. Sometimes the
package may be insecure because the developer
has not fixed those issues, or many times it's
a hacker or a cracker
who might have introduced these vulnerabilities
purposefully to extract information.
Compromise and unauthorized disclosure of information may result
in loss of personal and company reputation and
money. Insecure code may damage the systems
of users, and sometimes it
may also lead to physical damage. So because of these reasons,
you need to scan your package, scan your environment,
and ensure that only secure packages are
installed. Now let us see some news articles
which have been published and which have highlighted
the security issues in the PyPI index. In the first
one you can see that PortSwigger has published
an article which highlights a dependency confusion attack mounted
via the PyPI repository that exposes
flawed package installer behavior. Then similarly
JFrog has written an article about detecting
malicious PyPI packages stealing credit card
information of users and injecting malicious code.
Then Developer.com also
has an article which says that there are many PyPI Python
repositories which contain vulnerabilities.
Then there is a blog article which identifies
a potential remote code execution in the PyPI
index. Supply chain flaws
have also been found in Python packages.
Now let us see some tools which will
help us to identify the vulnerabilities which have been
shown in our previous slides, but they are just the examples.
There are many other vulnerabilities present.
Now what is Bandit? Bandit is a tool designed to
find common security issues in Python code.
To do this, it processes each
file, builds an abstract syntax tree (AST) from it, and runs
appropriate plugins against the AST
nodes. Once Bandit has finished scanning all the
files, it generates a report. You can
install it using the pip3 command, so you can say
pip3 install bandit, and to run it against
any code repository, you can use bandit with the -r switch and
specify the code path. It can be a local path, it can
be a remote path. Now how
to use it? You can run Bandit against your
project code just by specifying the code path and the -r
switch. As I have said, it can be your
project folder on your local machine, or it can be a remote folder
such as a GitHub repository.
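The parse-then-inspect idea described above can be sketched with Python's standard `ast` module. This is not Bandit's own code; the function name `find_shell_true` and the sample source are made up for illustration, and the single rule it applies (flag any call that passes `shell=True`) is only one example of the kind of check a plugin might perform.

```python
import ast

# Made-up sample source to scan (hypothetical, for illustration only).
SAMPLE = '''
import subprocess

def list_dir(path):
    subprocess.call("ls " + path, shell=True)
    subprocess.call(["ls", path])
'''


def find_shell_true(source, filename="<string>"):
    """Return (filename, line, message) for every call that passes shell=True."""
    findings = []
    tree = ast.parse(source, filename=filename)  # build the abstract syntax tree
    for node in ast.walk(tree):                  # visit every node in the tree
        if not isinstance(node, ast.Call):
            continue
        for keyword in node.keywords:
            if (keyword.arg == "shell"
                    and isinstance(keyword.value, ast.Constant)
                    and keyword.value.value is True):
                findings.append((filename, node.lineno, "call with shell=True"))
    return findings


# A tiny "report": one line per finding.
for name, line, message in find_shell_true(SAMPLE, "sample.py"):
    print("%s:%d: %s" % (name, line, message))
```

A real scanner runs many such checks per node type and attaches severity and confidence to each finding; the sketch only shows the overall shape.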
Then you can also run Bandit with a specific profile.
Say, for example, we want Bandit to check whether
there is a shell injection. The -p
switch can be used to specify a particular profile. So in
this example we are checking all the files under
the examples folder and we are checking
whether a shell injection vulnerability is present.
Then you can also run Bandit with standard input.
In that case you can just supply
the file to the bandit command.
Now Bandit also allows specifying the path of
a baseline report to compare against using the baseline argument.
That can be done by specifying the -b switch and
the baseline. This is very
useful for ignoring known vulnerabilities that you believe
are non-issues, especially whenever
you are performing testing.
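To make the baseline idea concrete, here is a minimal sketch that compares two reports shaped like Bandit's JSON output (a top-level `results` list whose entries carry `filename`, `test_id`, and `issue_text`). The matching rule used here, and the helper names `load_report` and `new_issues`, are assumptions for illustration; Bandit's own baseline comparison may match findings differently.

```python
import json


def load_report(path):
    """Load a JSON report from disk (path is supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def new_issues(current_report, baseline_report):
    """Return findings in current_report that are absent from baseline_report.

    Assumption: two findings are "the same" when file, test ID, and issue
    text agree. Line numbers are ignored so that unrelated edits do not
    make an accepted finding look new.
    """
    def key(result):
        return (result["filename"], result["test_id"], result["issue_text"])

    known = {key(result) for result in baseline_report.get("results", [])}
    return [r for r in current_report.get("results", []) if key(r) not in known]


# Hypothetical reports: the baseline holds an accepted finding in a unit test.
baseline = {"results": [
    {"filename": "tests/test_login.py", "test_id": "B105",
     "issue_text": "Possible hardcoded password", "line_number": 12},
]}
current = {"results": [
    {"filename": "tests/test_login.py", "test_id": "B105",
     "issue_text": "Possible hardcoded password", "line_number": 14},
    {"filename": "app/run.py", "test_id": "B602",
     "issue_text": "subprocess call with shell=True", "line_number": 8},
]}

for issue in new_issues(current, baseline):
    print(issue["filename"], issue["test_id"])
```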
One such example is specifying a cleartext
password in a unit test, and sometimes
you can also ignore some known warnings. Then to generate
a baseline report, simply run Bandit with the output
format set to JSON, that is, JavaScript
Object Notation, and with an output file path specified.
Note here that only JSON
formatted files are accepted as a baseline. Then you
can also write tests. These
tests are custom tests, and
this allows you to extend the functionality of Bandit.
Now to write a test,
the first step is to identify a vulnerability to build
a test for and create a new file in the examples folder
that contains one or more cases for that vulnerability.
Then consider the vulnerability you are testing for and mark the
function with one or more of the appropriate decorators.
You can use decorators such as @checks('Call'), @checks('Import',
'ImportFrom'), and @checks('Str').
Then create a new Python source file to contain your
test. The function that
you create should take a parameter, context. This context is
an instance of the context class that you can query for
information about the current element being examined.
You can also get the raw AST node for more
advanced use cases. You can find more on this
in the context.py file. Extend your Bandit
configuration file as needed to support your new test.
Then execute Bandit against the test file you defined in
the examples folder and ensure that it detects the vulnerability.
Then consider variations on how this vulnerability might
present itself and extend the example file and
the test function accordingly.
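The steps above can be sketched in plain Python without importing Bandit. Everything here is a stand-in: the `checks` decorator, the `Context` class, and the `run_tests` driver are simplified, hypothetical versions written only to show the shape of a decorator-marked test function that receives a context and can reach the raw AST node. The real decorators and context class live inside Bandit and offer much more.

```python
import ast
from collections import defaultdict

# Stand-in registry: AST node type name -> list of test functions.
_REGISTRY = defaultdict(list)


def checks(*node_types):
    """Stand-in decorator: register a test for the given AST node types."""
    def decorator(func):
        for name in node_types:
            _REGISTRY[name].append(func)
        return func
    return decorator


class Context:
    """Hypothetical, heavily reduced context wrapping the current AST node."""

    def __init__(self, node):
        self.node = node  # raw AST node, for more advanced use cases

    @property
    def call_function_name(self):
        func = self.node.func
        if isinstance(func, ast.Attribute):
            return func.attr
        if isinstance(func, ast.Name):
            return func.id
        return None


@checks("Call")
def eval_used(context):
    """Example custom test: report any call to eval()."""
    if context.call_function_name == "eval":
        return "use of eval detected"
    return None


def run_tests(source):
    """Run every registered test against the matching nodes of the source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        for test in _REGISTRY.get(type(node).__name__, []):
            message = test(Context(node))
            if message:
                findings.append((node.lineno, test.__name__, message))
    return findings


# Plays the role of a file in the examples folder.
EXAMPLE = "x = eval(input())\ny = int('3')\n"
print(run_tests(EXAMPLE))
```

Running the driver against the example source is the equivalent of the last two steps: confirm the test fires on the known-bad case, then add variations to the example and extend the test.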
Now Bandit comes with several plugins, and these plugins are
grouped into several groupings, and each grouping
is given an ID.
There are seven series, starting from the B1xx
series to the B7xx series.
In the B1xx series, miscellaneous tests are included.
In the B2xx series, application or framework misconfiguration tests are
included. Then in the B3xx series, blacklist
calls are included. In the B4xx series, blacklist imports
are included. Then in the B5xx series,
cryptography related tests are included, and in the B6xx series, injection
test plugins are included. And finally in B7xx,
cross-site scripting tests are included.
On this slide you can see that some of
the Bandit test plugins have been specified.
For example, B101 is assert used,
B102 is exec used.
Now if you see here, B103
checks whether
bad file permissions are set. Then B104
checks for hardcoded
bind all interfaces. Then there are some
tests such as B105, B106, and
B107. They deal with passwords. Then
B108 checks for a hardcoded temporary directory,
and B110 checks for
try, except, pass. That is,
it checks whether exceptional
cases have been properly taken care of.
Then there is something called
flask debug true, which is identified as B201.
It checks whether your
Flask application's debug parameter is set to true.
Note here that in a production environment it has to be set to false.
If you set it to true, then hackers can perform some malicious
operations and it may generate some errors, and
you know that errors usually also disclose some information about the
system, and this information can be used to
carry out further attacks. Then B501
checks whether there are requests with no certificate validation.
Then B502, B503, and B504
deal with SSL, that is, Secure Sockets Layer.
They check whether it has a bad version or bad defaults,
or whether no version is specified. Then similarly
there is B505, which checks whether the cryptographic key
is weak. Then B507
checks for SSH with no host key verification.
Then similarly there are checks
such as subprocess without shell equals true,
any other function with shell equals true,
start process with a shell, start process with no shell,
start process with partial path, and so on.
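As a small illustration of why the shell-related checks matter, the sketch below passes untrusted text to a child process as a single argument in a list, with no shell involved and with a full path to the executable (`sys.executable`). The helper name `echo_argument` and the hostile input are made up for this example.

```python
import subprocess
import sys


def echo_argument(untrusted):
    """Run a child Python process that prints its first argument.

    The command is a list and shell=True is not used, so the untrusted
    text arrives as one argv element and is never parsed by a shell.
    """
    completed = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


# With a shell, the part after ';' could run as a second command.
hostile = "report.txt; echo injected"
print(echo_argument(hostile))
```

Had the same text been concatenated into a command string and run with `shell=True`, the shell would have been free to interpret the `;` as a command separator.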
Then it also checks
whether there are any hardcoded SQL expressions,
then Linux commands wildcard injection. Then B610
and B611 check
whether Django's extra is used and whether
raw SQL is used. Now note here that raw
SQL generally leaves you open to SQL injection attacks,
but that can be taken care of in Django
by query parameterization, which happens
by default. But sometimes
you may also need to
write a custom query.
In that case you may have to use raw SQL.
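The difference between building raw SQL by string formatting and passing parameters separately can be shown with the standard `sqlite3` module. This is a stand-in for the same idea rather than Django code; the table, the function names, and the payload are invented for the example. Django's raw queries likewise accept parameters separately from the SQL text.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", 1), ("bob", 0)])
    return conn


def find_user_unsafe(conn, name):
    """Vulnerable: the input is pasted straight into the SQL text."""
    query = "SELECT name FROM users WHERE name = '%s'" % name
    return [row[0] for row in conn.execute(query)]


def find_user_safe(conn, name):
    """Parameterized: the driver passes the input as data, not as SQL."""
    query = "SELECT name FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = make_db()
payload = "x' OR '1'='1"
print(sorted(find_user_unsafe(conn, payload)))  # every user is returned
print(find_user_safe(conn, payload))            # no user has that literal name
```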
Okay, then similarly the last one,
B703, is Django mark safe.
So these are the test plugins which you can use depending on
which kind of application you have written, whether it is a plain Python
application, a Flask application, or a Django
application. The next is the safety
command. Safety checks your installed dependencies
for known security vulnerabilities. By default it
uses the open Python vulnerability database, Safety DB,
but it can be upgraded to use pyup.io's Safety
API using the --key option.
It supports Python 3.5 and above. You can install
the safety command or safety module with the
pip install safety command. For testing purposes
you can also install insecure-package.
It's really insecure, and it is just used for demonstrating
the working of the safety command. You can install
it by saying pip install insecure-package.
Now to check your currently selected virtual environment for
dependencies with known security vulnerabilities,
you can simply run the safety check command.
Now you can also check whether the packages
which have been specified in a requirements.txt file have any
dependency vulnerabilities. That you can
do by specifying the -r switch
and the requirements.txt file.
So the command will be safety check -r requirements.txt.
Then you can also read the input from standard input.
For example, there's a requirements.txt
file which contains the list of Python modules, so
we are displaying that using the cat command, but we are not only displaying
it, we are passing the output of the cat command to the safety
command on standard input.
Then you can also use the pip freeze command. What
this pip freeze command does is list
all the modules which have been installed
by the pip command in the present virtual environment,
and that output can be passed
as input to the safety check command on standard input.
Then you can also check for specific packages. For example,
here we are checking whether insecure-package version 0.1
is safe or whether it contains
some security vulnerabilities.
Now this is how you can run it: safety,
space, then check. It shows the
packages on my system. It has checked 221
packages, and it has shown which
of the installed packages are vulnerable
and which are the affected versions. For example, if you consider the whitenoise
package, it says that 4.1.2
is the installed version and the affected versions are less than 4.1.3.
To solve this issue you can upgrade the package, so
you can install 4.1.3 or a higher version and
this issue will be solved.
Now I have run this command once again after installing insecure-package,
and now you can see here that it shows that it has checked
222 packages on my system. And now one more
package has been identified with a vulnerability, that
is insecure-package. The version installed is 0.1.0
and the affected versions, it says, are less than
0.2.0. Okay, so the solution
is to upgrade the package. The next
is Safety DB. Safety DB is a database
of known security vulnerabilities in Python packages.
The data is made available by pyup.io and
synced with this repository once per month.
Most of the entries are found by filtering CVEs,
that is, the database of Common Vulnerabilities and
Exposures, and changelogs for certain keywords
and then manually reviewing them. The list is not a denial
list or a list of packages to be avoided. Okay, so this is a very, very important
statement. Just because a package is appearing in pyup.io's
insecure package list, it doesn't mean that you should avoid
it. Okay,
so you have to visit the list
and see which version is affected and use
the upgraded version, or see what the vulnerability is
and then decide whether to use it or not,
because you will also find some of the most popular packages
in the list. Okay, so you can install
Safety DB by using the
pip command. Now, Safety DB usage.
As I have said, you can visit this URL, that is
pyupio.github.io/safety-db,
to see the list of insecure Python packages.
Now to use it in a program there are two JSON
files. One is the insecure.json file, which contains the package name
and just the insecure releases as plain text.
It doesn't give you any additional information such
as a description. But if you want a description, then you can go for the
insecure_full.json file, which
consists of CVE descriptions and URLs and the
relevant part of the changelog. You
can install Safety DB. We have already seen it.
Now in order to use it in a program, you can
import it by saying from
safety_db import INSECURE, INSECURE_FULL. Depending on your requirement,
you can use the appropriate one.
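As a rough sketch of how such a mapping could be used in a program, the code below checks pinned versions against a small hand-written dictionary shaped like the package-to-insecure-releases data described above. The dictionary `INSECURE_EXAMPLE` is hypothetical (its two entries reuse the version ranges mentioned in the talk and are not copied from the database), the helper names are invented, and the version parsing is deliberately simplified to plain dotted numbers. With the real package installed, the imported mapping would take the dictionary's place.

```python
import operator

# Hypothetical data shaped like a package -> insecure version specs mapping.
INSECURE_EXAMPLE = {
    "insecure-package": ["<0.2.0"],
    "whitenoise": ["<4.1.3"],
}

_OPS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq,
        "<": operator.lt, ">": operator.gt}


def parse_version(text, width=4):
    """Simplified parser: '0.1' -> (0, 1, 0, 0). Dotted integers only."""
    parts = [int(part) for part in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def matches(version, spec):
    """True if version satisfies every comma-separated clause of spec."""
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol in ("<=", ">=", "==", "<", ">"):  # two-char operators first
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                if not _OPS[symbol](parse_version(version), bound):
                    return False
                break
        else:
            raise ValueError("unsupported clause: %r" % clause)
    return True


def insecure_pins(pins, insecure):
    """Return the (name, version) pins that fall in a known insecure range."""
    flagged = []
    for name, version in pins:
        specs = insecure.get(name.lower(), [])
        if any(matches(version, spec) for spec in specs):
            flagged.append((name, version))
    return flagged


pins = [("insecure-package", "0.1"), ("whitenoise", "4.1.3"), ("requests", "2.0")]
print(insecure_pins(pins, INSECURE_EXAMPLE))
```

Real version strings can contain pre-release tags and other forms that this parser does not handle, so a production check should rely on the safety tool itself or a proper version-parsing library.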
Safety DB also has some tools.
The first one is Safety CI, which is a deep GitHub integration
that is available on pyup.io. It checks your commits
and pull requests. Then safety is a command line tool that
checks virtual environments and requirement
files either locally or on a CI server. We have
already seen some examples. Then you can
also check a Django environment.
There's a package for Django that warns you in the admin area
if your installed Django release is insecure.
And similarly there is the safety-bar application, which is a
macOS menu bar application that is still in an
alpha version and gives very minimal information.
Then there is something called a pre-commit hook by Lucas Simmons.
It checks your Python
dependencies against Safety DB.
Then there is pipenv check, which relies
on safety and Safety DB to check for known vulnerabilities in
locked components. Now let us see the
final tool. That's Semgrep. It's an open source static
analyzer. It works on 17 plus languages such as Python,
Go, Java, Ruby, TypeScript and so on.
And it also works with legacy languages. It is not
controlled by any vendor. A thousand plus
rules have been written by the community members,
and it enables you to write your own rules.
Rules look similar to the code, and results are available
in the terminal, editor, or CI/CD. It addresses OWASP Top
Ten issues such as SQL injection,
broken authentication, insecure deserialization,
and so on. Then it eradicates classes of
bugs by enforcing code guardrails at every stage of the development
workflow. Then it also helps you to hunt vulnerabilities
by iteratively exploring a code base with lightweight queries
and a REPL workflow. I have mentioned the URL
of a playground, that is semgrep.dev/editor.
You can visit this and select the language as Python and start exploring
it. There are some default
or built-in examples available. Start exploring it, and if
you just go to YouTube and search for Semgrep tutorial,
you'll find a lot of very good tutorials
available, and they help you to write rules
or to explore more of Semgrep.
Finally, let's see some general guidelines.
If you are a package maintainer, then ensure that the package
you are maintaining is secure and practice secure coding.
As an application developer, follow secure coding principles while
writing code. Then use tools
to check packages for vulnerabilities before using them in your projects.
Periodically scan your environment,
that is, even after upgrading it, because the new packages
may come up with new vulnerabilities.
It's possible. Or you might have installed a new package
which might have some vulnerabilities.
Then sign and verify packages using PGP keys.
Then use Twine for improved security and testability.
You can also scan packages before upgrading,
and scanning packages after upgrading is also
recommended. Then ensure that you install code from a trusted
source, such as the official repository or the correct repository.
Don't install packages from untrusted sources,
okay? Such as nulled ones.
They may contain vulnerabilities, or the
code may have been modified by attackers.
Thank you everyone for listening to my talk.