Conf42 Python 2022 - Online

Security considerations in Python Packaging

Abstract

Popular programming language indexes (the TIOBE index) and developer surveys (Stack Overflow) place Python among the fastest-growing programming languages. However, this popularity also puts it in the target range of attackers, who mount malicious dependency attacks and exploit misconfigured tools to reveal confidential information. Jukka Ruohonen, Kalle Hjerppe, and Kalle Rindell, in their research paper “A Large-Scale Security-Oriented Static Analysis of Python Packages in PyPI”, report that they scanned PyPI for security issues and found at least one security issue in about 46% of the Python packages.

In addition, security vulnerabilities can be present in the source code of the package itself. In this talk, we will address the security issues related to Python packaging and possible solutions for making Python packages secure. The talk begins with the importance of a secure package and vulnerabilities in the Python Package Index. Then, I will discuss Python tools such as Bandit for identifying common security issues in Python code and Safety for dependency checks. Next, I will discuss verifying and signing Python packages using GPG. Finally, I will discuss general guidelines for secure coding practices in Python.

Outline

1. Importance of a secure package and vulnerabilities in the Python Package Index (5 minutes)
2. Bandit for identifying common security issues in Python code (4 minutes)
3. Safety for dependency checks (4 minutes)
4. Verifying and signing PyPI and conda packages using GPG and Twine (4 minutes)
5. General guidelines for secure coding practices in Python (5 minutes)
6. Summary and questions (3 minutes)

Summary

  • Gajendra Deshpande is an assistant professor at Institute of Technology and also runs a startup called Eyesec Cyber Security Solutions Private Limited. The talk covers security considerations in Python packaging and concludes with general guidelines for secure coding practices in Python.
  • Python has reached the number one position in the TIOBE index. It is also ranked third in terms of the number of active repositories on GitHub. This popularity itself has created a problem for Python, because many hackers are targeting it.
  • There is a misconception about the security of open source software. The problem arises when people start using insecure third-party packages. An insecure package will make your application vulnerable and prone to external threats. Bandit is a tool designed to find common security issues in Python code.
  • Bandit comes with several plugins, which you can use depending on the kind of application you have written. Safety checks your installed dependencies for known security vulnerabilities. For testing purposes, you can also run Safety against a deliberately insecure demonstration package.
  • Safety DB is a database of known security vulnerabilities in Python packages. Semgrep is an open source static analyzer. Follow secure coding principles while writing code. Use tools to check packages for vulnerabilities before using them in your projects.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, my name is Gajendra Deshpande. I work as an assistant professor at Institute of Technology, and I also run a startup called Eyesec Cyber Security Solutions Private Limited. Today I will be presenting a talk on security considerations in Python packaging. Let us see the outline. In today's talk I'll discuss the importance of a secure package and vulnerabilities in the Python Package Index. Then we will see some tools: Bandit, Safety, and Semgrep. Bandit is used for identifying common security issues in Python code, Safety is used for checking dependencies for known vulnerabilities, and Semgrep is a static analyzer. Finally, I will conclude the talk with general guidelines for secure coding practices in Python.
On this slide you can see the TIOBE index for Python. Recently you might have seen that Python reached the number one position in the TIOBE index. TIOBE is simply a website that ranks programming languages by their popularity. Let us also look at some other surveys. For example, in the Stack Overflow survey, Python ranked third after JavaScript and HTML/CSS. If we look at JavaScript and HTML/CSS, they are not really programming languages for the kinds of things Python can do. Next are the GitHub statistics from GitHut. Here also you can see that Python is ranked third in terms of the number of active repositories. Taken together, these factors show that Python is without doubt one of the most popular programming or scripting languages in the world today. This popularity itself has created a problem for Python, because many hackers are targeting it.
Now let's see the security issues and some misconceptions related to open source software. There is a misconception about the security of open source software. The major reason people cite is that the code is open source: because the code is open, everything is open, and the folder structure is known.
So they say that this open structure, this openly available information, makes it vulnerable. But generally, open source software is secure by design. Python is secure by design. WordPress, say, is secure by design, and open source content management software in general is secure. The problem arises when people start using insecure third-party packages, and security issues are mostly due to a lack of understanding of secure coding principles. That is because open source software mostly allows any third party to create their own packages and integrate them into the existing software. The problem then lies with the developer, because he or she may not be aware of secure coding principles. That's why we say that Python, or any open source software, is secure, but vulnerabilities may be present in packages. Most of the time this is the case. I'm not saying that there will be no vulnerabilities in the original software; there will be. But there is a very large community that constantly monitors different aspects, including security, and continuously fixes security issues. Apart from that, there are bug bounty programs that help to identify the issues and fix them.
Now, the importance of a secure package. An insecure package will make your application vulnerable and prone to external threats. We don't know what kind of vulnerability is present in a package. Sometimes the package is insecure because the developer has not fixed known issues, and many times it is a hacker or cracker who has introduced the vulnerabilities purposefully to extract information. Compromise and unauthorized disclosure of information may result in loss of personal and company reputation and money. Insecure code may damage users' systems, and sometimes it may even lead to physical damage.
For these reasons, you need to scan your package, scan your environment, and ensure that only secure packages are installed.
Now let us see some news articles that have highlighted security issues in PyPI. First, PortSwigger published an article highlighting a dependency confusion attack mounted via the PyPI repository that exposes flawed package installer behavior. Similarly, JFrog wrote an article on detecting malicious PyPI packages that steal users' credit card information and inject malicious code. Developer.com also wrote an article stating that many packages in the PyPI repository contain vulnerabilities. Then there is a blog article that identifies a potential remote code execution in PyPI, and supply chain flaws have also been found in Python packages. Now let us see some tools that help identify vulnerabilities like the ones shown on the previous slides. Those were just examples; there are many other vulnerabilities.
What is Bandit? Bandit is a tool designed to find common security issues in Python code. It processes each file, builds an abstract syntax tree (AST) from it, and runs the appropriate plugins against the AST nodes. Once Bandit has finished scanning all the files, it generates a report. You can install it with pip3, that is, pip3 install bandit, and to run it against a code repository you use bandit with the -r switch and specify the code path.
Now, how to use it? You can run Bandit against your project code just by specifying the code path and the -r switch. The path is a project folder on your machine; for a remote project such as a GitHub repository, scan a local clone. You can also run Bandit with a specific profile.
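As a rough illustration of the AST-based scanning described above, the sketch below uses only Python's standard ast module to flag calls that pass shell=True. It is not Bandit's implementation: the function name find_shell_true_calls and the SOURCE sample are hypothetical, and a real plugin checks far more cases than this single pattern.

```python
import ast

# Hypothetical sample code to scan; only the first function passes shell=True.
SOURCE = '''
import subprocess

def list_dir(path):
    # Passing shell=True lets the shell interpret the whole string.
    subprocess.call("ls " + path, shell=True)

def list_dir_safer(path):
    subprocess.call(["ls", path])
'''


def find_shell_true_calls(source):
    """Return the line numbers of calls that pass the literal shell=True."""
    findings = []
    tree = ast.parse(source)          # build the abstract syntax tree
    for node in ast.walk(tree):       # visit every node in the tree
        if not isinstance(node, ast.Call):
            continue
        for keyword in node.keywords:
            if (keyword.arg == "shell"
                    and isinstance(keyword.value, ast.Constant)
                    and keyword.value.value is True):
                findings.append(node.lineno)
    return findings


if __name__ == "__main__":
    for lineno in find_shell_true_calls(SOURCE):
        print(f"line {lineno}: call with shell=True")
```

The same pattern (parse, walk, match a node shape, record a finding) is what makes this kind of scanner extensible: each check only has to describe the node it cares about.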
Say, for example, we want Bandit to check whether there is a shell injection. The -p switch can be used to specify a particular profile. In this example we are checking all the files under the examples folder for the shell injection vulnerability. You can also run Bandit on standard input; in that case you simply feed the file to the bandit command.
Bandit also allows you to specify the path of a baseline report to compare against, using the baseline argument. That is done with the -b switch followed by the baseline file. This is very useful for ignoring known vulnerabilities that you believe are non-issues, especially when you are performing testing. One such example is a clear-text password in a unit test, and sometimes you may also want to ignore some known warnings. To generate a baseline report, simply run Bandit with the output format set to JSON, that is, JavaScript Object Notation, and an output file path specified. Note that only JSON-formatted files are accepted as a baseline.
You can also write your own tests. These custom tests allow you to extend the functionality of Bandit. To write a test, the first step is to identify a vulnerability to build a test for and create a new file in the examples folder that contains one or more cases of that vulnerability. Then consider the vulnerability you are testing for and mark the function with one or more of the appropriate decorators: @checks('Call'), @checks('Import', 'ImportFrom'), and @checks('Str'). Then create a new Python source file to contain your test. The function that you create should take a parameter called context. This context is an instance of the context class, which you can query for information about the current element being examined.
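The baseline idea described above can be sketched as a comparison between two JSON reports. This is only an illustration under stated assumptions: the field names (results, filename, test_id, issue_text, line_number) follow Bandit's JSON report layout, but the matching rule in issue_key and the sample data are hypothetical, and Bandit's own baseline comparison may work differently.

```python
import json


def load_report(path):
    """Load a JSON report from disk (for example, one written by a scanner)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def issue_key(result):
    # Assumption: an issue is "the same" if file, test ID, and message match.
    # The line number is left out so that shifted code does not resurface it.
    return (result["filename"], result["test_id"], result["issue_text"])


def new_issues(baseline_report, current_report):
    """Return the results in current_report that are absent from the baseline."""
    known = {issue_key(result) for result in baseline_report["results"]}
    return [r for r in current_report["results"] if issue_key(r) not in known]


# Hypothetical sample data: a clear-text password in a unit test was accepted
# into the baseline; a later scan also finds a shell=True call.
BASELINE = {"results": [
    {"filename": "tests/test_login.py", "test_id": "B105",
     "issue_text": "Possible hardcoded password", "line_number": 12},
]}
CURRENT = {"results": [
    {"filename": "tests/test_login.py", "test_id": "B105",
     "issue_text": "Possible hardcoded password", "line_number": 15},
    {"filename": "app/run.py", "test_id": "B602",
     "issue_text": "subprocess call with shell=True", "line_number": 7},
]}

if __name__ == "__main__":
    for issue in new_issues(BASELINE, CURRENT):
        print(issue["filename"], issue["test_id"], issue["issue_text"])
```

Only the finding that is not in the baseline is reported, which is the behavior the baseline option is meant to give during testing.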
You can also get the raw AST node for more advanced use cases; see the context.py file for more details. Extend your Bandit configuration file as needed to support your new test. Then execute Bandit against the test file you defined in the examples folder and ensure that it detects the vulnerability. Then consider variations on how this vulnerability might present itself, and extend the example file and the test function accordingly.
Bandit comes with several plugins. These plugins are organized into groups, and each group is given an ID. There are seven series, from B1xx to B7xx. The B1xx series contains miscellaneous tests. The B2xx series contains application or framework misconfiguration tests. The B3xx series contains blacklisted calls, and the B4xx series contains blacklisted imports. The B5xx series contains cryptography-related tests, the B6xx series contains injection tests, and finally the B7xx series contains cross-site scripting tests.
On this slide you can see some of the Bandit test plugins. For example, B101 is assert used and B102 is exec used. B103 checks whether bad file permissions are set. B104 checks for hardcoded binding to all interfaces. Tests B105, B106, and B107 deal with hardcoded passwords, B108 checks for a hardcoded temporary directory, and B110 checks for try/except/pass, that is, whether exceptional cases have been properly taken care of. Then there is Flask debug true, which is identified as B201. It checks whether the debug parameter of your Flask application is set to true. Note that in a production environment it has to be set to false.
If you set it to true, hackers can perform malicious operations that generate errors, and as you know, errors usually disclose some information about the system, which can be used to carry out further attacks. B501 checks for requests made with no certificate validation. B502, B503, and B504 deal with SSL, that is, Secure Sockets Layer: they check whether a bad version is used, bad defaults are used, or no version is specified. Similarly, B505 checks whether a cryptographic key is weak, and B507 checks for SSH with no host key verification. Then there are tests such as subprocess without shell equals true, any other function with shell equals true, start process with a shell, start process with no shell, start process with partial path, and so on. It also checks for hardcoded SQL expressions and for Linux commands wildcard injection. B610 and B611 are Django checks: whether Django's extra is used and whether raw SQL is used. Note that raw SQL generally opens the door to SQL injection attacks. Django takes care of that through query parameterization, which happens by default, but sometimes you may need to write a custom query, and in that case you may have to use raw SQL. Similarly, the last one is B703, Django mark safe. These are the test plugins that you can use depending on the kind of application you have written, whether it is a plain Python application, a Flask application, or a Django application.
Next is the Safety command. Safety checks your installed dependencies for known security vulnerabilities. By default it uses the open Python vulnerability database, Safety DB, but it can be upgraded to use pyup.io's Safety API with the key option. It supports Python 3.5 and above.
You can install the Safety command with pip install safety. For testing purposes you can also install insecure-package. It really is insecure and is used only for demonstrating how the Safety command works. You can install it with pip install insecure-package. To check your currently selected virtual environment for dependencies with known security vulnerabilities, simply run safety check. You can also check the packages listed in a requirements.txt file for dependency vulnerabilities by specifying the -r switch and the requirements file, so the command is safety check -r requirements.txt. You can also read the input from standard input. For example, there is a requirements.txt file that contains a list of Python modules; we display it with the cat command, and instead of just displaying it we pass the output of cat to the safety command on standard input. You can also use the pip freeze command. pip freeze lists all the modules that have been installed with pip in the current virtual environment, and that output can be passed to safety check on standard input. You can also check specific packages. For example, here we are checking whether insecure-package version 0.1 is safe or contains security vulnerabilities.
This is how a run looks. I type safety check, and it reports on the packages on my system. It has checked 221 packages and shows which installed packages are affected and which versions are affected. For example, for the whitenoise package it says that 4.1.2 is the installed version and that the affected versions are those below 4.1.3.
To solve this issue, you can upgrade the package: install 4.1.3 or a higher version and the issue will be solved. I ran this command once again after installing insecure-package, and now you can see that it has checked 222 packages on my system. One more package has been identified with a vulnerability, namely insecure-package. The installed version is 0.1.0, and it says the affected versions are those below 0.2.0. So the solution is to upgrade the package.
Next is Safety DB. Safety DB is a database of known security vulnerabilities in Python packages. The data is made available by pyup.io and synced with this repository once per month. Most of the entries are found by filtering CVEs, that is, the database of Common Vulnerabilities and Exposures, and changelogs for certain keywords and then manually reviewing them. The list is not a deny list of packages to be avoided. This is a very important statement. Just because a package appears in pyup.io's insecure package list, it doesn't mean that you should avoid it. You have to look at the list and see which version is affected and use the upgraded version, or see what the vulnerability is and then decide whether or not to use the package, because you will also find some of the most popular packages in the list. You can install Safety DB using pip.
Now, Safety DB usage. As I have said, you can visit pyupio.github.io/safety-db to see the list of insecure Python packages. To use it in a program, there are two JSON files. One is insecure.json, which contains the package name and just the insecure releases as plain text. It doesn't give you any additional information such as a description. If you want the description, you can go for insecure_full.json, which contains the CVE descriptions and URLs and the relevant part of the changelog. So you can install Safety DB.
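To make the "installed version versus affected versions" check concrete, here is a minimal sketch. It assumes data shaped like the insecure.json file described above (package name mapped to a list of version specifiers) and handles only the simple "<X.Y.Z" form; the names parse_version, is_affected, affected_specs, and SAMPLE_DB are hypothetical, the sample entries are taken from the talk's two examples rather than from the real database, and Safety's actual matching supports many more specifier forms.

```python
def parse_version(text, width=3):
    """Turn '0.1' into (0, 1, 0) so that versions compare component-wise."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))   # pad short versions with zeros
    return tuple(parts)


def is_affected(installed, spec):
    """Check an installed version against a '<X.Y.Z' specifier only."""
    if not spec.startswith("<") or spec.startswith("<="):
        raise ValueError(f"unsupported specifier in this sketch: {spec}")
    return parse_version(installed) < parse_version(spec[1:])


def affected_specs(package, installed, database):
    """Return the specifiers in the database that match the installed version."""
    return [spec for spec in database.get(package, [])
            if is_affected(installed, spec)]


# Hypothetical sample shaped like insecure.json, using the talk's examples.
SAMPLE_DB = {
    "insecure-package": ["<0.2.0"],
    "whitenoise": ["<4.1.3"],
}

if __name__ == "__main__":
    for name, version in [("insecure-package", "0.1.0"),
                          ("whitenoise", "4.1.2"),
                          ("whitenoise", "4.1.3")]:
        hits = affected_specs(name, version, SAMPLE_DB)
        status = f"affected by {hits}" if hits else "not affected"
        print(f"{name}=={version}: {status}")
```

This mirrors the advice in the talk: an entry in the list does not condemn the package, only the versions that fall inside the affected range.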
We have already seen that. To use it in a program, you import it with from safety_db import INSECURE, INSECURE_FULL. Depending on your requirement, you can use the appropriate one.
Safety DB also has some tools. The first one is Safety CI, a deep GitHub integration that is available on pyup.io. It checks your commits and pull requests. Then Safety is the command line tool that checks virtual environments and requirement files, either locally or on a CI server. We have already seen some examples. You can also check a Django environment: safety-django is a package for Django that warns you in the admin area if your installed Django release is insecure. Similarly, there is safety-bar, a macOS menu bar application that is only in alpha and gives very minimal information. Then there is a pre-commit hook by Lucas Cimon, which checks your Python dependencies against Safety DB. Then there is pipenv check, which relies on Safety and Safety DB to check for known vulnerabilities in locked components.
Now let us see the final tool, Semgrep. It's an open source static analyzer. It works on 17+ languages such as Python, Go, Java, Ruby, TypeScript, and so on, and it also works with legacy languages. It is not controlled by any vendors. More than a thousand rules have been written by community members, and it enables you to write your own rules. Rules look similar to code, and results are available in the terminal, the editor, or CI/CD. It addresses OWASP Top Ten issues such as SQL injection, broken authentication, insecure deserialization, and so on. It eradicates classes of bugs by enforcing code guardrails at every stage of the development workflow. It also helps you to hunt vulnerabilities by iteratively exploring a code base with lightweight queries and a REPL workflow. I have mentioned the URL of a playground, semgrep.dev/editor.
You can visit it, select Python as the language, and start exploring. Some built-in examples are available. If you go to YouTube and search for Semgrep tutorial, you'll find a lot of very good tutorials that help you explore Semgrep further.
Finally, let's see some general guidelines. If you are a package maintainer, ensure that the package you are maintaining is secure, and practice secure coding. As an application developer, follow secure coding principles while writing code. Use tools to check packages for vulnerabilities before using them in your projects. Periodically scan your environment, even after upgrading it, because new package versions may come with new vulnerabilities, or you might have installed a new package that has some vulnerabilities. Sign and verify packages using PGP keys. Use Twine for improved security and testability. You can scan packages before upgrading, and scanning packages after upgrading is also recommended. Ensure that you install code from a trusted source, such as the official repository or the correct repository. Don't install packages from untrusted sources, such as nulled ones. They may contain vulnerabilities, or the code may have been modified by attackers. Thank you everyone for listening to my talk.
...

Gajendra Deshpande

Founder & Managing Director @ Eyesec Cyber Security Solutions
