Conf42 DevSecOps 2022 - Online

SOS: Sustainable Open Source

Abstract

FOSS is eating the world, but is at the same time a victim of its own success. Enterprises rely on code maintained by a single individual in Nebraska, or a single vendor that isn’t being a good open source citizen. We need to support maintainers, and avoid regressions from creeping into our systems.

Summary

  • In the 2022 OSSRA (Open Source Security and Risk Analysis) report, 97% of commercial code bases contained open source software. Sometimes vulnerabilities are in our code bases for many weeks, months or even years before anyone notices. How can we contribute to the viability and sustainability of open source?
  • One of the issues is projects relicensing in order to avoid free riding. Another issue is the projects that are maintained by the proverbial single individual in Nebraska. Lack of resources prevents maintainers from spending the time that a project warrants. I would love to see open source be a more inclusive and equitable space.
  • The Commons Clause aims to restrict commercial free riding on open source code, especially by cloud service providers who don't give back to the FOSS community. There are also ethical licenses like the Hippocratic License. Are these licenses really required for the economic sustainability of a project?
  • A survey of 400 open source maintainers found that 46% of maintainers are not paid at all. Only 26% receive as much as $1,000 per year for maintenance work. We need to give individuals incentives for staying in open source and maintaining the software we've come to rely on.
  • The Log4j, or Log4Shell, flaw scored ten out of ten on the Common Vulnerability Scoring System. Developers are looking at open source for solutions, not problems. 98% of projects have safe versions available. Most vulnerabilities are patched before they're even disclosed.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
At DevOpsDays Eindhoven this year, I suggested an open space about how DevSecOps is just a band-aid for a bullet wound. After a talk about supply chain security tools. I know, a risky move, and now I'm telling you about it at this DevSecOps conference. Anyway, I opened with: while we should certainly scan our code for vulnerabilities, and ideally have those checks be automated, we should also invest in mitigating some of the root causes for vulnerabilities creeping into our code bases through open source use in the first place. Shifting DevSecOps further left, if you will. And I don't mean SecDevOps. So one of the people in the room mentioned just having a mirror of all of the components in use as a solution, to which I say: congratulations, you're now the maintainer of a bunch of mirrors. But also, sometimes vulnerabilities are in our code bases for many weeks, months or even years before anyone notices. So new releases bring fixes as well. In this modern world, we rely on a lot of components to make our stuff work and make it continue to work. I know you know this to be true, but I will also bring you some stats. The 2022 OSSRA, the Open Source Security and Risk Analysis report produced by the Synopsys Cybersecurity Research Center, examines the results of audits of over 2,400 commercial code bases, and the audits came back showing that 97% of those contained open source software. Four of the 17 industries that were represented in this report (computer hardware and semiconductors, cybersecurity, energy and clean tech, and IoT) contained open source in 100% of their audited code bases. The remaining verticals had open source in 93% to 99% of their code bases. Large enterprises rely on libraries that are maintained by a single individual who is in over their head. Sometimes projects are handed over to other maintainers who don't always have the best of intentions.
Individuals or organizations may restrict the use of their technology or end-of-life versions of their software, posing real challenges for organizations who rely on that software. So how can we contribute to the viability and sustainability of open source? Hi, my name is Floor, I'm based in the Netherlands. I'm a staff developer advocate at Aiven. We manage your favorite open source data tools without exploiting the projects or their maintainers. Previously I worked in developer relations roles at Grafana Labs and at Microsoft. I'm a DevOpsDays core member and I organize the DevOpsDays Amsterdam and DevOpsDays Eindhoven city chapters. I am a Microsoft MVP for developer technologies and I organize a bunch of meetups, including but not limited to Contributing Today, DevRel Salon Amsterdam and the Amsterdam Ruby meetup. So what are some of the issues that we see in open source? One of the issues is relicensing. Projects relicense in order to avoid free riding, to make sure that bad people can't use our code to do even more bad, or to alleviate responsibility. Another issue is the projects that are maintained by the proverbial single individual in Nebraska. That's a shout-out to the XKCD comic that you see on the slide. While curl is successfully maintained by Daniel Stenberg, mostly on his lonesome, for every curl there is a Log4j. And with every npm library that you bring in, you bring in a whole host of npm libraries and their transitive licenses and possible vulnerabilities too. Lack of resources prevents maintainers from spending the time that a project warrants, given how businesses depend on it globally, and maintainers can make rash decisions. They're much like other humans in that way. We've seen maintainers pull their code to keep it from being used by the likes of ICE, the US Immigration and Customs Enforcement, or more recently, to protest Russia's attack on Ukraine. Are these the only issues that are plaguing open source?
No, I don't think so. I would love to see open source be a more inclusive and equitable space, but for the next 30 minutes or so, let's look at license changes, maintainer drain, and the rise in supply chain attacks. In recent years, we've seen an increase in kind-of open source licenses. Let's have a look at some of those licenses. The Commons Clause aims to restrict commercial free riding on open source code, especially by cloud service providers who don't give back to the FOSS community. The Commons Clause conflicts with the FSD, the Free Software Definition, which includes the right to use software for any purpose, and with the OSD, the Open Source Definition, in that the license shall not restrict any party from selling or giving away the software. There is a bunch of ambiguous wording in the Commons Clause, like "value derived entirely or substantially", because what is considered substantial? Mongo used the Commons Clause for a while, as did Redis Labs, which combined it with the Apache license. So it was a dual license, which is tricky business anyway, and both moved to nonstandard source-available or cloud-restricted licenses afterwards. In Mongo's case, MongoDB moved to the SSPL in 2018, which is kind of like the GPL but with restrictions, and it's not approved by the Open Source Initiative, who are the stewards of the Open Source Definition. The SSPL forces a wide copyleft impact on the cloud infrastructure. Its justification, again, is this notion that large cloud vendors capture all the value but contribute nothing back to the community. In this case, it was directed at Amazon Web Services in particular. Then there is the Redis Source Available License for certain Redis modules created by Redis, while core Redis remains under the 3-clause BSD license. The TL;DR is that Redis Source Available is a license to do all the usual actions (use, modify, distribute, copy and sublicense) except when your application is distributed or made available as a database product.
So that would allow the community to develop their own applications, but not to distribute them or make them available for use in, or as, a database product. Because, you guessed it, cloud providers. Elastic License 2.0, then. Again, you'll find clauses to prevent hosted or managed service providers from using the project. It is copyleft like the SSPL, but with straightforward prohibitions. So it prevents using Elastic as part of a hosted or managed service. It prevents third parties from obscuring trademarks or branding, and it can embed license keys to prevent circumvention, which is very much not an open source thing. Its impact was that Elasticsearch and Kibana got removed from hosted service infrastructures like Azure and AWS. Then there are some others, like the Timescale License (TSL), which basically says no Timescale as a service and no forking, or the Confluent Community License, with which you can use, modify and distribute unless that competes with Confluent's business, which could potentially be a moving target. There are also ethical licenses like the Hippocratic License, which prohibits the use of software in the violation of internationally recognized human rights, or the Mi five, which makes an explicit connection between the license and a code of conduct. The Ethical Source working group says that over the past 20 years, the open source community has come to thrive, enjoyed wild success and permanently changed the technology landscape. But the world has also changed in the past two decades, and they think it's time for open source to evolve to meet the magnitude and complexity of today's social, political and technological challenges. Open source developers don't seem to have any recourse, no way to prevent their work from being used by people to harm others. And that's where that working group is determined to make a change. This tweet by a former colleague at Microsoft, Tierney, hits me right in the feels.
Tierney is a staff developer advocate at Twilio and works on Node, Electron, OpenJS Foundation and npm. "The currently accepted community understanding of open source as a concept is fundamentally at odds with the Open Source Definition provided by the Open Source Initiative" is what Tierney says. And they go on to say that, more specifically, the accepted community understanding of open source usually includes some level of humanity (users, community, maintainers), and it's simply missing from the definition. If this hits you in the feels too, and you want to learn more about ethical source in particular, I suggest you check out ethicalsource.dev, because I won't go into it much, but I think it's wildly interesting. I know what you're thinking. Open source is not really about licenses, it's about community, sharing, openness, freedom. Licensing was supposed to be just the instrument, right? A way to formalize the relationship. I think the discussion around the impact of cloud-restricted licenses was an important one to have with the open source community. But I think we can all agree that cloud-restricted licenses are not a way to save open source, because they're taking the code and the project private. Maybe that's okay for those projects. Are these licenses really required for the economic sustainability of a project? Mongo and Elastic argue that yes, they felt used by cloud infrastructure service providers. But GNU/Linux is also used commercially by everyone, and they still have a great community. Perhaps because of that, and not despite it. MongoDB and Elastic themselves were large companies in their own right before the license change. And even taking enforceability out of the picture, suing and winning in cases of copyright or patent infringement is actually really hard. Changing to a more restrictive license might cause companies and community members to walk away, which could be what is actually detrimental to a project and the ecosystem. They do prevent free riding, right?
Like, cloud providers have stopped using these services, but they also push the open source community to create alternatives or to move to open tools. So some argue that these projects, Mongo and Elastic, were never really open source to begin with, but I don't think I agree with that. I think they brought tremendous value to the community, but then confused open source for their business model and couldn't reconcile themselves to others making money off their products. Let's look at some more examples. Lightbend changed Akka's license from Apache 2.0 to the BSL (version 1.1, if you're interested), which is the Business Source License, and it would start with Akka 2.7, which was delivered last October. And with any such change there is talk of a fork. I've seen people advocating for forks with an aggressive copyleft license, so that the now proprietary-licensed original can't make use of bug fixes to the fork. It remains a question how effective this would be, and whether hurting our fellow developers is anything but really misdirected anger. Akka can't just be replaced. There are a lot of projects that build on top of Akka. A disclaimer before we move forward: I work for a company that is very invested and involved in driving OpenSearch forward as the open source alternative to Elasticsearch. When Elastic released the publication informing about the license change, a shockwave went through the community. Several players, including AWS, eventually decided to collaborate and fork Elasticsearch. Apache Kafka development (and Kafka is also a project in Aiven's portfolio), or rather the decision of what makes it into that project, is primarily in Confluent's hands. The single vendor issue is rather prevalent in open source. Databricks has a strong hold on Spark. Google and Beam is a very similar story as well. Grafana, Loki and Tempo relicensed from Apache 2.0 to AGPL, which is an infectious copyleft license.
Google warns against using AGPL, saying that the risks heavily outweigh the benefits. The Cloud Native Computing Foundation, so the CNCF, in response to third-party dependencies changing their license to AGPL, encourages projects to either switch to an alternative component, to freeze the component at the version prior to the license change, or to seek an exception from the governing board. Needless to say, they're not big fans. If you install Electron, you have to add 87 packages, and that means 87 license dependencies. Every single package is likely to have its own dependencies as well, and therefore another license that you have to comply with. As you can imagine, license management can be done manually, and when done incorrectly can result in technical debt. There are over 300 open source software licenses, and that list is only growing. However, the good news is that around 20 licenses account for 80% of all the commonly used open source in enterprises. So a deny and allow list of those licenses, together with a scanning tool, already provides a very good starting point for managing them. What you can use to help track licenses inside your code is the license auditor tool, which sends a notification after spotting a potential problem. There's also a link to a little cheat sheet on this slide, where you can find out more about what kinds of different licenses there are. License litigation may end up forcing you to release code under the same license as the package dependency that you've used. Other potential problems include being sued for financial liability by the creator of the component, getting penalties and restrictions on selling your software until compliance is met, or losing reputation and getting negative press coverage, certainly in more sensitive industries.
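The allow and deny list idea can be sketched in a few lines of Python. This is only an illustration, not how any particular scanning tool works: the `Dependency` type, the `audit` function and the contents of both lists are assumptions made up for the example, and the license strings are SPDX identifiers. Anything on neither list is routed to a manual review bucket.

```python
from dataclasses import dataclass

# Hypothetical policy lists, using SPDX license identifiers.
# A real policy would be agreed with your legal or compliance team.
ALLOWED = {"MIT", "Apache-2.0", "BSD-3-Clause"}
DENIED = {"AGPL-3.0-only", "SSPL-1.0"}


@dataclass(frozen=True)
class Dependency:
    name: str
    license_id: str


def classify(dep: Dependency) -> str:
    """Return 'denied', 'allowed' or 'review' for a single dependency."""
    if dep.license_id in DENIED:
        return "denied"
    if dep.license_id in ALLOWED:
        return "allowed"
    return "review"  # unknown license: a human needs to look at it


def audit(deps: list[Dependency]) -> dict[str, list[str]]:
    """Group dependency names by policy outcome."""
    report: dict[str, list[str]] = {"allowed": [], "denied": [], "review": []}
    for dep in deps:
        report[classify(dep)].append(dep.name)
    return report


if __name__ == "__main__":
    # Made-up dependency names, for illustration only.
    deps = [
        Dependency("example-a", "MIT"),
        Dependency("example-b", "SSPL-1.0"),
        Dependency("example-c", "Some-Unlisted-License"),
    ]
    print(audit(deps))
```

The point of the third bucket is that a short list covering the most common licenses handles the bulk of dependencies automatically, and only the long tail needs a person.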
I want to switch gears a little. In 2021, a Tidelift survey of 400 open source maintainers found that 46% of maintainers are not paid at all, and only 26% receive as much as $1,000 per year for maintenance work. Over half, 59%, have quit or considered quitting maintaining a project, and almost half of the respondents listed lack of financial compensation as one of their top reasons for disliking being a maintainer. Open source libraries enable you to move faster, but if they're poorly maintained, if they're not healthy, they become a single point of failure. The 2016 example was left-pad. All that left-pad did is pad out the left-hand side of strings with zeros or spaces. Still, thousands of projects, including Node and Babel, relied on it. With left-pad removed from npm by the maintainer out of spite, these applications and widely used bits of open source infrastructure were unable to obtain the dependency and thus fell over during development and deployment. Left-pad's maintainer felt pushed into a corner by messaging app Kik's lawyers over another one of his npm libraries, also called kik. The lawyers went to the npm admins, claiming brand infringement. When npm took kik away from the developer, he was furious and then unpublished all of his npm-managed dependencies. The maintainer later said that the situation made him realize that npm is someone's private land, where corporate is more powerful than the people. What happened to fix the internet, which is really not hyperbole? Laurie Voss, who is the CTO and co-founder of npm, took the unprecedented step of restoring the unpublished library. npm forcibly resurrected that particular version to make sure that everyone's stuff kept running. Maybe if the left-pad maintainer had had access to representation, maybe through a foundation, the left-pad incident could have been prevented. This maintainer had over 200 libraries to his name.
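To give a sense of how small the functionality at the center of the left-pad incident was, here is a rough Python equivalent. It is a sketch for illustration only, not the original JavaScript library: the function name `left_pad` and its parameters are made up for this example, and it leans on Python's built-in `str.rjust`.

```python
def left_pad(text: object, length: int, fill: str = " ") -> str:
    """Pad the left-hand side of `text` with `fill` until it is `length` long.

    Inputs that are already long enough are returned unchanged.
    """
    if len(fill) != 1:
        raise ValueError("fill must be exactly one character")
    return str(text).rjust(length, fill)


if __name__ == "__main__":
    print(left_pad(42, 5, "0"))
    print(left_pad("abc", 6))
```

A dependency this small is easy to overlook in an inventory, which is part of why its removal caught so many projects off guard.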
We need to give individuals incentives for staying in open source and maintaining the software we've come to rely on. For better or worse. Seth Vargo, after discovering a contract between software automation company Chef and ICE, deleted his code, and in doing so more or less discontinued Chef's services. It's a temporary thing, for sure. The nature of open source means that we can just roll back and unarchive a previous version, and legally there is nothing that Seth can do. He licensed his code as open source. So Seth claims that his code lived in a personal repository on GitHub and under a personal namespace on RubyGems, but they were actually created at a time when Seth was still an employee of Chef. But then again, no OSI license or employment agreement requires Seth to continue to maintain code on his personal accounts. They were conflating code ownership with code stewardship, is what Seth said, and he added that he has some very specific instructions in his will on how to deal with the code that he owns when he dies. So he basically said that if he had died that day, the same thing would have happened. That kind of makes you think, doesn't it? Another example, then. The GitHub project colors.js, simply known as colors on the npm repository, has scored over 3.3 billion downloads throughout its lifetime, and has over 19,000 projects that depend on it. Similarly, faker.js exists on npm as faker, has been retrieved 272 million times from the npm repository, and has over 2,500 dependents. Both projects are developed and maintained by the same author. The immense download rate of these two components can be attributed to the basic but essential functionality that they provide to JavaScript developers. Colors lets you print colorful text messages on the console, whereas faker helps developers generate fake data for their applications for testing and staging purposes.
The hijacked colors version trapped applications in an infinite loop, printing "LIBERTY LIBERTY LIBERTY" followed by some gibberish. The developer himself introduced that infinite loop in colors, thereby sabotaging its functionality, and purged functional code from the faker package in version 6.6.6, which, I mean, really, the version number should have given it away. It's likely that this stunt relates back to November 2020, when the developer explicitly expressed an intention of no longer wanting to support big companies with his free work, and said that businesses should pay him a fee in the six-figure area. Then, mid-March this year, the developer behind the popular npm package node-ipc released sabotaged versions of the library in protest of the ongoing war in Ukraine. Newer versions of the node-ipc package began deleting all data and overwriting files on developers' machines, in addition to creating new text files with peace messages. With over a million weekly downloads, node-ipc is a prominent package used by major libraries like the Vue.js CLI. The package appears to have been originally created by the developer as a means of peaceful protest, as it mainly added that message of peace on the desktop of a user installing the package. But then chaos unfolded when select versions of the node-ipc library were seen launching a destructive payload that deleted all data by overwriting the files of users installing the package, for users in Russia and Belarus only. This has been called protestware and is one of the newest forms of supply chain attack. Open source is part of our infrastructure, products and tooling, and for this reason we need to care about these projects like they were our own.
No company will leave crucial parts of their in-house developed tech stack unmaintained, so why are we willing to do so for the ones that are open source? I want you to ask yourself the following questions. What are the departments or roles in your company responsible for identifying and mitigating the impact of license changes? What projects in your stack do you think may be at risk of posing a similar challenge as Elasticsearch did? Who is looking at the health of the software that you rely on? Who leads research and due diligence of alternatives, so that when you need to change, it won't be a knee-jerk response? I'd be remiss if I did not talk about the Log4j, or Log4Shell, flaw today. The remote code execution vulnerability that scored ten out of ten on the CVSS, which is the Common Vulnerability Scoring System. The impact of Log4j was and is huge. Even if you scanned your code base and you thought that you could relax after confirming that you don't use Log4j anywhere, you were not safe yet, right? Like, you could be depending on a library that in turn uses Log4j and still be exposed. Security firm Snyk actually found that 60% of Java applications rely on the library indirectly, versus the 40% that rely on it directly. Log4j has been developed by the Apache Software Foundation, and that certainly signals health, right? And yet this happened. We sometimes talk about open source being inherently secure. The code is out in the open. If something is broken, people will see it and they will fix it. But then how do you explain Log4j or Heartbleed or the Struts vulnerability? The many-eyes argument is very shaky. It needs the right people to look in the right places, and security is hard. I find that developers are looking at open source for solutions, not problems.
Installing an npm package introduces an implicit trust in 79 third-party packages and 39 maintainers, which creates a very large attack surface. Take 150 dependencies, which is kind of typical for a Java project. And those dependencies maybe release a new version ten times a year, which is an average amount per year. That makes 1,500 updates for you to consider. A software bill of materials, or an SBOM, is a list of all of the open source and third-party components present in a code base. An SBOM also lists the licenses that govern those components, the versions of the components that are in use and their patch status, and that allows security teams to quickly identify any associated security or license risk. The concept of a bill of materials derives from manufacturing, where a bill of materials is an inventory detailing all items that are included in a product. Sounds like a bunch of work. It's a good thing, then, that there are software composition analysis tools, or SCA tools, that can help you do the job, like the one by Thomas Steenbergen, the OSS Review Toolkit (ORT), which includes Software Package Data Exchange (SPDX), which is an open standard for software bills of materials. SPDX allows the expression of components, licenses, copyrights, security references, and other metadata that is related to your software. It is the perception that open source is risky, but actually 98% of projects have safe versions available. Most vulnerabilities are patched before they're even disclosed. Log4j was patched in 15 days, and the patch was made available by the time that the CVE went public. We're just not really good at managing open source. When asked, 68% of IT leaders are confident that they're not using vulnerable versions. But that same number, the same 68%, of applications use a component with a known vulnerability. So what's the deal?
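As a rough illustration of both the update arithmetic and what an SBOM records, here is a minimal Python sketch. It is not the SPDX format and not what ORT produces: the `Component` type and both helper functions are assumptions invented for this example, and the second inventory entry is a made-up component.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One inventory line: what it is, which version, under which license."""
    name: str
    version: str
    license_id: str         # SPDX license identifier
    known_vulnerable: bool  # would come from a scanner or advisory feed


def estimated_updates_per_year(dependency_count: int, releases_per_year: int) -> int:
    """Back-of-the-envelope number of upstream releases to evaluate per year."""
    return dependency_count * releases_per_year


def vulnerable_components(sbom: list[Component]) -> list[Component]:
    """Return the components flagged as having a known vulnerability."""
    return [c for c in sbom if c.known_vulnerable]


if __name__ == "__main__":
    # 150 dependencies releasing ten times a year each.
    print(estimated_updates_per_year(150, 10))

    sbom = [
        # log4j-core 2.14.1 is a version affected by Log4Shell.
        Component("log4j-core", "2.14.1", "Apache-2.0", True),
        # Hypothetical component, for illustration only.
        Component("example-lib", "1.0.0", "MIT", False),
    ]
    for component in vulnerable_components(sbom):
        print(component.name, component.version)
```

With an inventory like this in place, the question "are we using a vulnerable version?" becomes a lookup instead of a guess.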
Produced in partnership with the Laboratory for Innovation Science at Harvard and the Open Source Security Foundation, or the OpenSSF, Census II is the second investigation into the widespread use of free and open source software, and it aggregates data from over half a million observations of FOSS libraries used in production applications at thousands of companies. It aims to shed light on the most commonly used FOSS packages at the application library level. Such insights will help identify critical open source packages to allow for resource prioritization and to address security issues in widely used software. And one of the outcomes, which just emphasizes what I've just said, is that much of the widely used FOSS is developed by only a handful of contributors, and the OpenSSF sees this as a problem as well. A related project is the Open Source Project Criticality Score, maintained by members of the OpenSSF Securing Critical Projects Working Group. The goals are to generate a criticality score for every open source project, to create a list of critical projects that the open source community leans on, and to use this data to proactively improve the security posture of these critical projects. A project's criticality score defines the influence and the importance of the project, and it's a number between zero (least critical) and one (most critical). It is based on an algorithm by Rob Pike. Using the parameters on this slide, the tool derives the criticality score for any open source project. So what can we do? We can contribute. We can contribute with our time, with code, with code reviews, with documentation, but only when it's appropriate. Sometimes there is no lack of community contributors, but the maintainer just lacks the time to look at those contributions. Maybe, probably, and definitely unfortunately, they can't work on the project full time at their place of work. So be an excellent open source citizen.
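The shape of the criticality score calculation can be sketched as follows. As I understand the formula published by the project, each signal value S is log-scaled against a threshold T and the results are combined as a weighted average. The parameters from the slide are not in this transcript, so the signal names, weights and thresholds below are placeholders made up for the example, and the sketch assumes positive weights only.

```python
import math


def criticality_score(signals: list[tuple[float, float, float]]) -> float:
    """Weighted, log-scaled average of signals, in the range 0.0 to 1.0.

    Each signal is a (value, weight, threshold) tuple. A signal contributes
    weight * log(1 + value) / log(1 + max(value, threshold)), and the sum of
    those contributions is divided by the total weight.
    """
    if not signals:
        raise ValueError("at least one signal is required")
    for value, weight, threshold in signals:
        if value < 0 or weight <= 0 or threshold <= 0:
            raise ValueError("values must be >= 0; weights and thresholds > 0")

    total_weight = sum(weight for _, weight, _ in signals)
    weighted_sum = sum(
        weight * math.log(1 + value) / math.log(1 + max(value, threshold))
        for value, weight, threshold in signals
    )
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Placeholder signals: (value, weight, threshold). Not the real parameters.
    example = [
        (40, 2.0, 5000),    # e.g. number of contributors
        (300, 1.0, 1000),   # e.g. commits in the last year
        (1200, 2.0, 500),   # e.g. number of dependents
    ]
    print(round(criticality_score(example), 3))
```

The logarithm keeps one very large signal from dominating, and capping each ratio at the threshold is what keeps the result between zero and one.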
Written communication is very hard, which is why it's extra important to invest in your communication skills. Be patient and be graceful, even when your otherwise excellent PR doesn't make it into the project. And advocate on behalf of FOSS inside your company. Even with open source being the standard and so widely used, many people don't know about its abundance or the rules of engagement. Your organization could maybe sponsor a project. Many companies run a FOSS fund of some sort, and they regularly award sums of money to projects. A caveat here is that some projects are neither ready nor equipped to deal with large sums of money and don't know how to distribute it between core contributors. And also, money coming in one month and then nothing the next can be as bad as no money coming in at all. When your company uses open source software (spoiler: the answer is yes, we discussed this), here is how you can help it stay secure. Manage your third-party licensing exposure just like your security exposure. Be careful with automation and third-party library or package updates during CI/CD. If you're extending OSS functionality, maybe prefer plugins over downstream modifications. And when you're vetting a new component, here's maybe what to consider. Which license are they using, and who is behind the project? What is their governance policy? And sometimes vendor distributions or software-as-a-service solutions can really shield you. I recognize the irony in saying that after we discussed how cloud providers aren't always seen as the good guys. But sometimes, if a managed service provider is considered a well-meaning citizen in the open source world, that is the sweet spot where you want to be. Look into tools like the criticality score, the OpenSSF Scorecard, maybe Sonatype's tooling, which is open source as well.
So you can help improve those tools as well. And navigate participation in open source well by abiding by the Principles of Authentic Participation, which were derived at the Sustain Summit 2020 event in Brussels, Belgium. There, Duane O'Brien and others facilitated discussion groups loosely focused on corporate accountability in the context of open source. The principles are these. It starts early. This came out of the discussions about organizations showing up with mature, fully baked contributions over which the community had no input. It puts the community first. This reflects the consensus that when an organization and the community want different things, the community prevails. It starts with listening. This reflects comments about companies that had no context whatsoever showing up to a project and telling them all the things that they did wrong. It has transparent motivations. Without a shared understanding of the motivations, it's impossible to resolve differences. So there should be no hidden motives. It enforces respectful behavior. Participants agree to adhere to the community's codes of conduct, and organizations commit to holding participants on their side accountable for their behavior. And it ends gracefully. That means that there's no sudden withdrawal of resources without notification and a contingency or exit plan. There should be clear documentation that will allow the community to pick up projects when a company decides to withdraw its support. Further, as a company, when you're hiring maintainers, make sure that they're empowered to balance internal and external feature needs, and please don't change scope and strategy on them with every new fiscal year. Foundations like the Apache Software Foundation, the Linux Foundation and the CNCF, maybe the Eclipse Foundation, act as stewards for the open source projects in their care and in their incubation pipeline.
Supporting these organizations to do more great work is definitely a way to leave open source a better place than you found it. As a foundation supporting member, you gain a seat, or maybe multiple seats, at the table. So make sure that you make that seat count and send the right people there. Discussions around the sustainability of open source are hard, but they're necessary. Free and open source software is ubiquitous. It's omnipresent. Yet we're still struggling to live with open source in a healthy, safe and productive way. We need better support systems to avoid maintainer burnout and to avoid regressions from creeping into our supply chain. We need to spend less time firefighting and more time nurturing our open source software supply chain. Thank you.

Floor Drees

Staff Developer Advocate @ Aiven
