Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello, my name is Henrik Plate and I'm a security researcher
working with Endor Labs to secure the way developers
and development organizations consume open source. The way we want
to do this is by considering and analyzing the program code
of those open source components, all the individual functions and methods getting
executed, rather than by looking at metadata
that people attach to coarse-grained projects or packages,
which often ends up drowning developers in false positives.
At Endor Labs, in a nutshell,
we use such program analysis techniques to support
open source risk and vulnerability management and
to increase developer productivity. But I'm not going
to talk about our own solution; you will need to look that up yourself.
Instead, I'll talk about the general problem, the promises of program analysis,
and one open source solution called Eclipse Steady that I've
been developing with fellow researchers over the course of several years.
This is what the agenda looks like. I will start with a quick motivation for
program analysis, then explain how Eclipse
Steady works, demo the solution,
and finish with a couple of takeaways and limitations.
So we all know the software industry depends on
open source. Open source is just everywhere.
80% of typical application code bases come from open source
projects. In other words,
20%, sometimes less, are specifically developed
for a given application. For example, the business logic
or some glue code. We speak of dozens to
hundreds of dependencies, only a few of them being handpicked
consciously by developers, but the majority
of them being transitive dependencies, automatically pulled into a
development project. And this is
not even considering all the open source code that we download and
execute in our CI/CD pipelines, on developer workstations,
and in all the lower parts of the software stack.
Another way to look at this is to not count components
or dependencies, but count all the individuals
who dedicate their time to develop
those solutions. This is a screenshot I took
two or three years back, showing that there are ten
direct contributors to Eclipse Steady developing its
functionality, compared to more than 23,000
indirect contributors, people who contribute
code to any of the open source components that Eclipse
Steady is built upon. And this is really amazing.
It shows the beauty of this whole open source ecosystem and model.
But this heavy dependence on open source
also comes with security risks, one being the
use of components that have known vulnerabilities
and which can be exploited in production environments by attackers,
for example to create denial of service attacks
or to exfiltrate information from those
systems and applications. The first time I came across this,
which made the problem really apparent back then, was a paper called
"The Unfortunate Reality of Insecure Libraries" from 2012.
The authors found that, on average,
a Java application comprises 30 open source libraries,
typically at least one of them with a known vulnerability,
and that those 30 open source libraries represent up to 80%
of the code base. So this is the first
occurrence of this 80% figure that you read every
now and then in blog posts and vendor reports.
They came up with a second version of that report in 2014,
where they found that the number of downloads
of Java components from Maven Central
roughly doubled from 6 billion to 13 billion between 2012 and
2014, and those numbers, of course,
have only increased since then.
In 2013, the Open Web Application Security
Project (OWASP) included the use of components with known
vulnerabilities in their Top 10 security risks for web
applications, and in 2021
it was promoted from 9th to 6th place and
slightly renamed to also cover the problem
of using outdated components.
The prevalence has been and is still widespread.
Everybody's using open source, but the detectability got
better over time with open
source solutions and commercial vendors becoming active
in this area. A study
from 2017 found that twelve out
of the top 50 data breaches in 2016 were
due to the use of components with known vulnerabilities,
including the Panama Papers breach.
And all that was before Equifax happened and before Log4Shell.
So by now, developers have really entered this
treadmill of activities they need to carry out
over and over again. They need to detect
whether any of their components is subject
to a known vulnerability. They typically use certain
tools which come with false positives and false
negatives, so they need to do the triaging.
And whenever there is indeed a known vulnerability
in one of their components, they need to understand whether that vulnerable piece of code
is really relevant in their specific application context.
Can it be executed? Can it be exploited by
attackers to take advantage of the vulnerability?
If the conclusion of that assessment
is positive, the developer has to mitigate
the problem, which could be very easy,
maybe just updating a minor version or a
patch version, or it can be more complex in case
the application is on a relatively old release and the migration
path is not straightforward, or it is an
unmaintained project
and there is no patch available at all, and so forth
and so on. Once the patch is produced,
that application patch has to be delivered to
the customers, which again can be very easy. In case of cloud applications,
you just deploy once and the fix becomes available and rolled
out to everybody. Or it can be more complex in
the case of on-premise applications, where an
application patch has to be given to all the
customers, who need to test it before releasing
and deploying it in their respective production environments, which could
be a lot of effort on the shoulders of
these customers.
This slide explains why vulnerability assessment
or impact assessment is so difficult. You see on the right-hand side
the dependency graph of the Graph Maven Plugin.
The plugin itself is the gray box at the very top. It has eight
direct dependencies and 34
transitive dependencies. The depth of the dependency tree
is seven, and so you see that
the majority of the dependencies of this plugin are
transitive, and so it is not necessarily known or
understood by the application developer why a
certain dependency exists in this dependency graph.
Why should he? They are kind of considered internals
of the direct dependencies that
he chose at some point in time.
Still, interestingly, the majority
of the vulnerabilities are found in the transitive dependencies.
And moreover, interestingly,
not all of the code of those dependencies is
used. Sometimes only certain functions and features of
a dependency are used, and sometimes,
even though it is pulled into the project, not a single
line of code is ever executed or could be executed in
the context of the application. It's just bloat ending up in
the dependency graph of an application.
So maybe you wonder why bother with such a complicated
assessment? Why not simply update such
vulnerable dependencies to a fixed release? Well,
this depends on the lifecycle phase your
application is in and on the deployment model.
The earlier you are in your application lifecycle, the easier
it is to just do the update and
resolve potential problems as part of the development
effort; after all, the application may not yet be released,
shipped or used anywhere. It becomes
much more difficult, more costly and more
important to assess in later lifecycle
phases. So again, suppose you have your application already shipped to
customers. Just updating would
mean you create a new application patch that has to be provided
to, tested by and installed by all those customers.
Again, it's quite some effort and you will only want them to do this for
vulnerabilities that really matter. So you wouldn't want
to always update by default, causing all this effort
for those guys. And of course, updates can also bring
in breaking changes and semantic changes which
could fail the compilation of your application or change
the logic.
This slide is another example of why
it is difficult for developers to use
the little information that is part of security advisories
to perform such impact assessments. Here you see a vulnerability
in a component called Eclipse Mojarra, published
in 2018, and what you see on the screen
is really the whole thing, the whole content of
that CVE: a short description of just two sentences,
a severity rating following CVSS, references,
as well as an enumeration of affected products.
The affected products are given as CPEs, so-called Common
Platform Enumeration identifiers.
They denote the vendors,
products and versions affected by the respective
vulnerability, in this case Eclipse Mojarra
up to, and excluding, version 2.3.5.
So why is it difficult for developers to use this information
for a context-specific assessment? First,
the description is very short and concise.
If Eclipse Mojarra is a transitive dependency, how could an application
developer ever know whether the method getLocalePrefix
in the class ResourceManager matters in his application context?
That is very detailed information of no
immediate use, I would say, for the application
developer. And those references,
in this case to the fix commit and to the Jira issue, are
not always much of a help either; they provide rather
unstructured or little additional information supporting
developers. Secondly, those CPE identifiers
I've been mentioning do not really correspond to the Maven artifact
identifiers. Here we speak two different languages:
the CPE identifiers talk about Eclipse
Mojarra, but if you search for Eclipse
Mojarra on Maven Central, you will find 36 search
hits, and none of them is
the actual artifact that contains the vulnerable
code, which is org.glassfish:javax.faces.
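Just to make this mismatch concrete, here is a tiny, purely illustrative sketch: the CPE string only follows the general CPE 2.3 format (the version 2.3.4 is an example I picked), the Maven coordinates are the ones just mentioned, and a naive textual comparison between the two has nothing to match on.

```java
public class NameMappingMismatch {

    public static void main(String[] args) {
        // Illustrative CPE 2.3 identifier of the kind used by the NVD (vendor:product:version).
        String cpe = "cpe:2.3:a:eclipse:mojarra:2.3.4:*:*:*:*:*:*:*";

        // Maven coordinates (groupId:artifactId:version) of the artifact
        // that actually ships the vulnerable code.
        String gav = "org.glassfish:javax.faces:2.3.4";

        System.out.println(cpe + " vs. " + gav);

        // Neither 'eclipse' nor 'mojarra' appears in the Maven coordinates,
        // so naive name matching finds no link between the two identifiers.
        System.out.println(gav.contains("eclipse") || gav.contains("mojarra")); // prints false
    }
}
```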
So this demonstrates the problem of mapping names
given by people against each other in the hope
of finding the right match. Another problem
is that not all ecosystems
are well covered by the NVD. Java is relatively
well covered; npm, for example, is not so well covered.
And yet another problem is that those
descriptions are error-prone.
When we looked at this vulnerability back in 2018,
we looked at the source code, the fix commit
in Eclipse Mojarra's Git repository,
and the bytecode, and we found that
versions 2.3.5 and 2.3.6 still contain the vulnerable
code. So we reported this to the NVD and they then updated
the description.
So there are, at a high level, two different approaches
to support developers in detecting dependencies
with known vulnerabilities. The first is based
on metadata, so data about code,
primarily package names,
package versions, CPE identifiers or digests,
which are mapped against each other to understand whether there
is a vulnerable component in a given dependency
tree of an application. A good example of this is OWASP
Dependency-Check, a nice lightweight
solution that has been developed for a couple of years now.
It can easily be run from the command line without any real
installation effort; it will download the entire NVD
and try to map CVEs
against the dependencies of a given project.
The other approach is based on code,
avoiding as far as possible the use of any
metadata. What you
see on the right-hand side is a method that was introduced by
developers to fix a given vulnerability, a CVE
from 2022.
When looking at the code, the presence or absence of
this method already gives you clues and
indicators as to whether you use a vulnerable
or a fixed version of the respective component, without
needing to look at package names and package
versions and to do this mapping.
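As a rough illustration of this idea, here is a minimal sketch of such a code-level check. The class and method names are made up for illustration; the point is only that the check looks at the code on the classpath rather than at any component name or version.

```java
import java.lang.reflect.Method;

public class FixMethodProbe {

    // Hypothetical names: a class touched by a fix commit and a method that
    // the fix introduced; the real signatures would be taken from that commit.
    private static final String CLASS_NAME = "com.example.lib.SomePatchedClass";
    private static final String FIX_METHOD = "validateInputStrictly";

    public static void main(String[] args) {
        try {
            Class<?> clazz = Class.forName(CLASS_NAME);
            boolean fixPresent = false;
            for (Method m : clazz.getDeclaredMethods()) {
                if (m.getName().equals(FIX_METHOD)) {
                    fixPresent = true;
                    break;
                }
            }
            // The presence of the fix-introduced method hints at a fixed version,
            // its absence at a potentially vulnerable one, regardless of how the
            // archive is named or which version its metadata claims.
            System.out.println(fixPresent ? "fix method found" : "fix method not found");
        } catch (ClassNotFoundException e) {
            System.out.println("class not on the classpath at all");
        }
    }
}
```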
One example of a tool implementing this code-centric approach is Eclipse Steady.
It is relatively heavyweight compared to OWASP Dependency-Check,
on the one hand because you need to run a Docker Compose application on
a workstation or, better, in some internal
cloud, and on the other hand because it requires
fix commits, fix commits being the
changes done by the open source developers to overcome a problem.
Those fix commits are not systematically collected by
public vulnerability databases, and so
coming up with them requires a bit of manual
effort, which makes this open source, code-centric
solution more involved to use. But once you have it, you can do
additional analyses beyond the capabilities of
metadata-based approaches, which do not know much about
the code. One example is this impact
assessment or reachability analysis. What you see here on the right-hand
side is the result of a call graph analysis,
a call graph that was built starting from the application methods,
here in green. In this
call graph there was also a path from those application
methods to one of the vulnerable methods
of this component, in this case the constructor of a
class called InterpolatorStringLookup.
The presence of this path shows the developer
that this vulnerable
code is potentially executable in the context
of his application. This
information can also be used to implement
custom controls: he could pick
the most suitable method on the path to implement a
custom sanitization or an authorization check if
ever the update is not possible.
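As a hedged sketch of what such a custom control could look like, assuming the application calls into Commons Text's StringSubstitutor and an update is not yet possible, one might guard the call site identified by the call path with a simple input check. The class name and the rejection logic here are invented and deliberately naive, purely for illustration.

```java
import org.apache.commons.text.StringSubstitutor;

public class GuardedSubstitution {

    // Hypothetical guard placed at the application method that the call path
    // identified as the entry point into the vulnerable library code.
    public static String replaceSafely(String userInput) {
        // Deliberately naive sanitization, for illustration only: reject any
        // input that tries to use interpolation syntax such as ${script:...}.
        if (userInput != null && userInput.contains("${")) {
            throw new IllegalArgumentException("interpolation not allowed in user input");
        }
        return StringSubstitutor.createInterpolator().replace(userInput);
    }
}
```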
Other possibilities include supporting
developers in choosing the best update or
migration path, because you can understand whether
the methods that you call,
the APIs of a dependency that you call, are also present
in the newer releases or not;
if not, they would result in compilation
errors. So I hope this gives you an idea of
the possibilities and opportunities of using code-centric
analysis for vulnerability management and assessment.
Coming back to the activities mentioned before: metadata-centric
approaches, due to their unawareness of
code, can only support the initial detection,
while code-centric approaches can also help with the later
activities, reachability or impact assessment
and mitigation. But not only that, they can
also identify and remove bloat,
which is a very interesting feature. Coming back to what I was
mentioning earlier on, only fractions of those
open source dependencies are actually used in a
given application context. Sometimes entire dependencies
are pulled automatically, but not touched, not used at all.
And of course you could remove those dependencies, and this
will improve several aspects of your application: the whole application
will become a little bit smaller and slimmer,
build times may become
better, and for those dependencies that you remove, there will
be no new known vulnerabilities popping up the next time you
run your open source vulnerability scanner. So this
bloat identification and removal
is one of the very promising features
supported by program analysis.
Next, let me explain how Eclipse Steady
approaches the whole problem and what the implementation looks
like. When performing the assessment, what you really want to know is:
is a given vulnerability exploitable in my specific application?
If yes, fix it now; if no, fix it
later, reduce the priority, look at the things that
matter more. But this is
a very difficult question to answer, because
exploitability really depends on so many aspects:
the configurations, the limitations in
place, how the application is deployed,
the architecture. So we
decided to address and answer
easier questions. Rather than judging
exploitability, with Eclipse Steady we like to answer
two questions. First, is vulnerable code
contained? And if vulnerable code is contained,
is it executable in the context of
the application? This is based on the assumption that
if an application executes code for which a security
fix exists, then there is also a significant risk that
the vulnerability can be exploited. It need not be exploitable,
but there is a risk that it is exploitable in this specific
application context.
To answer whether vulnerable code is contained, we look at fix commits.
To answer whether vulnerable code is executable,
Eclipse Steady uses a combination of
static and dynamic analysis.
If the vulnerable code is found to be executed, for example as
part of JUnit tests,
or if it is potentially reachable according to call graph
analysis, then we say there is high risk
and you had better do something about it. If not, there is
low risk. So how do we answer those questions?
For the first one, is vulnerable code contained,
we start from an advisory; it could be a CVE,
it could be any other disclosure on Twitter,
whatever. What is important is to have the patch information, the fix
commit fixing the problem, which is our starting point.
Getting there, and this is why I put a question mark, is not
straightforward. It would be nice if public vulnerability databases
collected this information on a systematic basis,
but unfortunately they do not. But once we have
the fix commits, we can look at every commit and
the changes introduced by it to understand which methods
and constructors were added, deleted and modified.
That is what we call a change list; you will see
one such change list later on.
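To give an idea of what such a change list is conceptually, here is a minimal sketch, assuming we already have the sets of method signatures before and after the fix commit. The signatures are invented; in Eclipse Steady this information is of course derived from the actual commit diff, not hard-coded.

```java
import java.util.HashSet;
import java.util.Set;

public class ChangeListSketch {

    public static void main(String[] args) {
        // Hypothetical method signatures before and after a fix commit.
        Set<String> before = new HashSet<>(Set.of(
                "com.example.Lookup.<init>(String)",
                "com.example.Lookup.lookup(String)"));
        Set<String> after = new HashSet<>(Set.of(
                "com.example.Lookup.<init>(String)",
                "com.example.Lookup.lookup(String)",      // assume its body changed in the fix
                "com.example.Lookup.isAllowed(String)")); // newly added by the fix

        // Added constructs: present after, but not before the fix.
        Set<String> added = new HashSet<>(after);
        added.removeAll(before);

        // Deleted constructs: present before, but not after the fix.
        Set<String> deleted = new HashSet<>(before);
        deleted.removeAll(after);

        System.out.println("added:   " + added);
        System.out.println("deleted: " + deleted);
        // Modified constructs would be identified by comparing the bodies
        // (e.g., bytecode or AST digests) of the signatures present in both sets.
    }
}
```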
This is a screenshot from the UI, and
here you see some elements of the
change list of the vulnerability in Apache Commons Text.
The fix commit was b9b…
in the Git repository on GitHub.com,
and it modified an enumeration, deleted a method,
modified a constructor, and so forth.
Here is the InterpolatorStringLookup you were
seeing earlier on in the call path. This
change list is then the input for
answering the second question, which is: is that vulnerable
code executable or not? What you see here on this screen is
basically one application with one method number four,
part of the set of
application methods. This application has a direct library
dependency with methods five and six, and it also
has a framework dependency with methods
two and three. The library dependency also
brings in two transitive dependencies with methods
seven and eight respectively.
Those methods call each other according to those edges
and arrows.
The red ones are dynamic invocations: it could be
reflection, or, as in the
case of frameworks like Spring Boot, invocations
happening through classes generated at runtime.
Those dynamic invocations typically represent
a problem for static source code
analysis tools. So how would we
use Eclipse Steady to understand whether the vulnerable methods
two and eight are reachable in the context of
this example application?
We do this by running several analyses,
one after the other. We would start with a reachability
analysis, a call graph analysis
starting from the application method number four, and we would
try to see which other methods can be reached starting
from four, which in this example are five and six.
We stop there because the call graph construction fails to
understand the dynamic invocation from
six to seven, and also, since we used
four as the entry point, the starting point of the analysis,
we do not find the dynamic
invocation of the application method number four by
the framework method three. As the next step, we
run a dynamic analysis. What we
do is instrument all the methods of the
application and all of the dependencies, and every time we find
that such a method is executed, either as part of a
JUnit test, a runtime
integration test, or maybe an application deployed in a
runtime environment, we trace this information,
we write it down. In this example,
the dynamic tests found that the framework
method three and the method seven, part of the transitive
dependency, were also executed. In the next step,
we can again run a static analysis, but this time we do not take
the application methods as the starting point to construct a call graph;
we take the traced methods three and seven
as the starting point. This basically allows
us to get beyond those dynamic invocations,
and starting from three and seven respectively, we find that the
vulnerable methods, two in the framework and eight in
the rightmost transitive dependency, are
potentially reachable.
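As a minimal sketch of the idea behind this combination, here is an illustrative breadth-first reachability search over a hand-built call graph. The graph, the entry point and the traced methods mirror the numbered example above; the real analyses of course build the call graph from bytecode and obtain the traces from instrumentation, not from hard-coded sets.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CombinedReachabilitySketch {

    // Statically resolvable call edges of the numbered example;
    // the dynamic edges 6->7 and 3->4 are invisible to static analysis.
    static final Map<Integer, List<Integer>> CALL_GRAPH = Map.of(
            4, List.of(5),
            5, List.of(6),
            3, List.of(2),
            7, List.of(8));

    static Set<Integer> reachableFrom(Set<Integer> entryPoints) {
        Set<Integer> visited = new HashSet<>(entryPoints);
        Deque<Integer> worklist = new ArrayDeque<>(entryPoints);
        while (!worklist.isEmpty()) {
            int method = worklist.pop();
            for (int callee : CALL_GRAPH.getOrDefault(method, List.of())) {
                if (visited.add(callee)) {
                    worklist.push(callee);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Set<Integer> vulnerable = Set.of(2, 8);

        // Step 1: static analysis starting from the application method 4.
        Set<Integer> fromApp = reachableFrom(Set.of(4));   // {4, 5, 6}

        // Step 2: dynamic analysis observed methods 3 and 7 being executed.
        Set<Integer> traced = Set.of(3, 7);

        // Step 3: static analysis again, now starting from the traced methods.
        Set<Integer> fromTraces = reachableFrom(traced);    // {2, 3, 7, 8}

        Set<Integer> all = new HashSet<>(fromApp);
        all.addAll(fromTraces);
        vulnerable.forEach(v ->
                System.out.println("method " + v + " potentially reachable: " + all.contains(v)));
    }
}
```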
You can read all the details in the research paper shown
on the slide, including an empirical study of
how many vulnerabilities were found reachable
after the respective analysis steps.
We found that, if I remember well,
something around 10% of the vulnerable code
was only found reachable thanks to this combination
of static, dynamic and again static analysis.
So what does this look like in practice?
For this, I would like to use again this Apache Commons
Text vulnerability I've been mentioning earlier on,
which was disclosed in October 2022.
Apache Commons Text offers different features
related to strings, not very surprisingly,
for example computing the Levenshtein distance
or the Jaccard distance, and so forth.
But it also offers what they call string substitution,
where you can include variables in strings
that will be replaced at runtime. In the
first box, the Java version and the operating system
name will be replaced by the actual values
by the method replaceSystemProperties.
The vulnerability that was discovered is
basically about certain dangerous
interpolators or substitutors being included
and enabled by default, one of them being for JavaScript.
In the versions prior to 1.10.0,
this would have allowed for remote code execution
if attacker-controlled input
containing a JavaScript variable
is replaced by the library. In the
example shown in the second box,
the malicious script
that would be executed as part of this
substitution calls java.lang.Runtime.getRuntime().exec
to execute an operating system command.
In this case it would just touch, that is create, a certain file.
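As a sketch of the two boxes just described, the following snippet shows the benign use of system-property interpolation and, commented out, the kind of attacker-controlled payload that would trigger code execution with a Commons Text version before 1.10.0 on the classpath. The touched file path is just an example, and the dangerous call is left commented out on purpose.

```java
import org.apache.commons.text.StringSubstitutor;

public class InterpolationDemo {

    public static void main(String[] args) {
        // First box: benign string substitution, replacing variables
        // with system property values at runtime.
        String benign = StringSubstitutor.replaceSystemProperties(
                "Running on Java ${java.version} (${os.name})");
        System.out.println(benign);

        // Second box: the dangerous 'script' interpolator, enabled by default
        // before 1.10.0. With a vulnerable version on the classpath, replacing
        // attacker-controlled input like this would execute an operating
        // system command (here: create the file /tmp/foo).
        String payload =
                "${script:javascript:java.lang.Runtime.getRuntime().exec('touch /tmp/foo')}";
        // StringSubstitutor.createInterpolator().replace(payload); // never run on untrusted input
        System.out.println(payload);
    }
}
```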
The thing about Apache Commons Text and this vulnerability is that
users are only affected if they themselves,
or any of their dependencies, make use of this string
substitution. They certainly know for their own code, but how could they
know for all of the direct and especially all the transitive
dependencies? To figure
this out, we will have a look at Eclipse Steady.
So this is what the application, a very simple sample
application, looks like. It has two dependencies:
Apache Commons Text in a vulnerable version,
and JUnit. The only class,
the main class, has a print
method in which such a dangerous
string substitution takes place.
This is really copy and paste from the original advisory
coming from the GitHub security team. So here,
if the replace method is called on this interpolator,
the Runtime.exec statement
would be executed and a file named
foo would be created. I can quickly demonstrate this.
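Roughly, the sample looks like the following sketch. The method names print and testPrint are taken from the demo; the class names, the use of the JUnit 5 API and the exact payload are assumptions on my part. The essential point is that a single JUnit test drives the print method, which in turn performs the dangerous interpolation.

```java
import org.apache.commons.text.StringSubstitutor;
import org.junit.jupiter.api.Test;

// The sample's main class: a print method performing the dangerous substitution,
// with a payload of the kind shown in the advisory (creating the file 'foo').
public class Main {
    public static void print() {
        String payload =
                "${script:javascript:java.lang.Runtime.getRuntime().exec('touch foo')}";
        System.out.println(StringSubstitutor.createInterpolator().replace(payload));
    }
}

// The single JUnit test case invoking the print method; with Commons Text 1.9
// on the classpath, running it creates the file 'foo'.
class MainTest {
    @Test
    void testPrint() {
        Main.print();
    }
}
```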
First I will remove the file that exists,
and then I can just run the test, because there
is one test case invoking this print method. Once
I do this, you will see the foo file popping up again on the left-hand
side. There it is. Now, in order to
analyze this application with Eclipse Steady, we will do the following.
I already included a profile
referencing Steady's
Maven plugin, so I can invoke
the goals of this plugin
very easily. First of all, I would clean
the target folder locally on my disk,
but I would also clean the analysis results that already exist
in Steady's backend, because I ran some analyses earlier on. So let me
just clean all this. Once that is done,
I can create a method-level
bill of materials using the app analysis
goal. What happens if I run this?
Basically, the Steady Maven plugin looks
into each and every method of the application
and each and every method of all the dependencies to see
whether any vulnerable code is contained. And this is how it looks on
the front end.
So here we have the demo application with the Commons Text vulnerability.
Here we have all the dependencies: the two direct ones you
have seen earlier on in the POM file,
as well as two transitive ones brought in
by, I think, one by JUnit and the other one
probably by Commons Text.
We also see that Commons Text has been found to
contain vulnerable code. This decision hasn't been made on
the basis of the name or the
coordinates of this archive; it has been taken on
the basis of the presence of vulnerable
methods and constructors. So here you see
basically the same information I've shown you earlier on on the slide: these
are all the methods added, deleted or modified as
part of the fix commit, which is this one, by the
open source developers, and for each
construct, for each method and so forth, we check whether it is contained
in the respective archive.
So the archive could really be named completely differently; we would still be able
to find the signatures of those methods,
constructors and so forth.
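To illustrate the idea of name-independent containment checking, here is a small sketch that looks for a class of the change list inside an arbitrary JAR file, regardless of how that archive is named. The JAR path is hypothetical, and a real implementation would compare full method signatures and bodies, for example via a bytecode library, rather than just class entries.

```java
import java.io.IOException;
import java.util.jar.JarFile;

public class ContainmentCheckSketch {

    // Checks whether the given archive contains the class entry, no matter
    // what the archive itself is called or which version it claims to be.
    static boolean containsClass(String jarPath, String className) throws IOException {
        String entry = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entry) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Example: a class touched by the Commons Text fix commit,
        // looked up in a (possibly renamed or rebundled) archive.
        System.out.println(containsClass(
                "libs/some-renamed-archive.jar",
                "org.apache.commons.text.lookup.InterpolatorStringLookup"));
    }
}
```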
Now, we have just
found that vulnerable code is present; we do not yet know whether it's
reachable or not. To find out, we can
run another goal, which is called a2c:
this is the call graph construction from the
application (a) to the change list (c)
of modified open source methods.
If I do this, a call graph will be constructed, and it
will be checked whether the vulnerable methods are part of this call graph and what
the path looks like from the application methods to the vulnerable
methods. If I
reload this and toggle the advanced
analysis, there is now this red paw popping up,
indicating that some of the methods have
been found reachable according to the call
graph analysis. All the methods found reachable
have this red paw in the detailed table. If I click on it,
there is the call path we have been seeing earlier on. Again,
main is a method from the application, and by
calling different other methods of
potentially different artifacts, in this case it's
always Commons Text 1.9, you end up in
one of the vulnerable methods, one of the methods the developers had
to change to fix the problem. Right,
so that was the static analysis.
But as I explained earlier
on, there are deficiencies, problems that
make static analysis fail:
reflection, dynamic programming techniques. So we
can additionally run a dynamic
test where we instrument all the code.
This is done using yet another Maven
goal. In this case it's a little bit more involved: I basically
have to prepare an agent, comparable
to the agents of JaCoCo or other coverage tools,
I run the JUnit tests, and then I upload all the information that is
saved locally. I also say
how much information I would like to collect
during the instrumentation. In this case,
I would like to find all touch points, which
are API invocations from the application into
the library, and I would also like to have the complete
stack traces showing me the entire paths.
If I run this, the tests
are executed, foo would have been created yet another
time if I had deleted it, and then everything is uploaded.
If I reload the front end once more, the
red paw on the very rightmost column also appears.
You see here that some of those methods
have been found reachable both statically and dynamically.
Hovering over it, you see the test
and the execution time. If you
go into the call path,
you will see that there is a reddish overlay over the black
edges that existed before,
indicating the dynamic execution. So here we start from a test
method, testPrint, which invokes
the print method you have seen earlier on in Visual Studio Code,
and which again takes this path all the way to one
of the vulnerable constructs in Apache Commons Text.
Right. Those were the different analysis goals,
giving the developer a feeling for whether the vulnerable code matters
in his application context. Now let me quickly go
over other features of Eclipse Steady that I mentioned earlier
on. The next step would be to mitigate, and
here we see basically the number of vulnerabilities
per component and the latest
published release, which would be an alternative
to fix the problem. If I click on this,
I will see first of all those touch points in the
first table. And sorry for all this wealth of information; it is
always very developer-oriented.
The first table contains the touch points, so this is where
our application calls into a
library. Here there are three of these
calls: into a static initializer,
into the replace method you have seen earlier on, and into
createInterpolator. Those touch points
have been found both in the call graph and were actually traced
during JUnit test execution. This is not always the
case; sometimes you have true in one column
and false in another, showing how
the different techniques complement each other.
Further below, you have basically the
available non-vulnerable releases to which the developer
could upgrade. And we see that, for all these different touch points, we
check whether those APIs that I call are also available in
the non-vulnerable release, in the fixed
release, because if not I would have a compile issue.
We also show what we call the body
stability, which tries to indicate how
different the fixed version is from the version that
I use. In this example, when going
from 1.9 to 1.10.0,
170 out of the
171 methods
that were part of the call graph have the same
method body, a reachable body stability of roughly 0.99.
So there was very little change
from one version to the other, and the likelihood of regression is
relatively low. In other examples,
where multiple fixed releases are
available, you would see how this reachable body stability
decreases the greater the distance gets between what
you use and the non-vulnerable, fixed version.
Last but not least, I would like to quickly show this
table. This is a first attempt to understand which
dependencies are bloated. Here we have only four dependencies,
so there is likely not much bloat at all,
but the table gives you an idea of how
this can work out. You see two
compile-time dependencies (test dependencies were excluded).
They have a total of 3671
and 1149 methods
and constructors respectively, out of which,
according to the JUnit tests, the dynamic
tests, 10 and 130 respectively
were actually executed, and 6
and 142 were potentially executed,
that is, part of the call graph that we built. So
removing these would likely cause some trouble. But in other
cases, with bigger dependencies, whenever there is not a
single method potentially reachable or actually
executed during the tests, it would be worthwhile
to remove the dependency and see whether the application
still works.
This concludes the demonstration, so let me get back to
the slide deck. To conclude,
I hope you have seen the capabilities
and possibilities of supporting vulnerability management
with code-centric program analysis
techniques. They can reduce false positives
compared to
all these metadata-centric approaches that try to map names against each
other, but also false negatives, because they
can cover the phenomenon of rebundling, where classes from
one project are rebundled, copied into
the Java archives of other projects, which would not be found
on the basis of mapping names against each other.
You can prioritize findings thanks to
having the application context and understanding code
reachability, and they
can help you with updates, avoiding regressions, and with identifying
software bloat. On the negative side,
I have to mention that they of course struggle
with so-called configuration vulnerabilities: if a fix consists
only of changing a default configuration, a properties
or XML file that comes with a Java archive,
code-centric analysis cannot do much. And secondly,
reachable is
not the same as exploitable. There is a certain class of
vulnerabilities, namely deserialization vulnerabilities,
especially those using the
old Java way of serializing
Java objects,
where for the vulnerability to be exploitable it is sufficient to have a
vulnerable class, a gadget class, on the classpath.
In this special case, it doesn't matter at all
whether this vulnerable class is in
a call graph computed on
the normal, expected behavior of an application;
it is sufficient for it to be hidden somewhere on the
classpath to be exploited by attackers who
send serialized objects of that class
to the application in order to trigger the vulnerability.
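To make this limitation concrete, here is a deliberately harmless sketch of the pattern; the class name and field are invented. A Serializable class overrides readObject, so its code runs whenever an instance is deserialized. The application never calls it, so no call graph built from the application's own methods would contain it, yet deserializing attacker-supplied bytes of this class executes it as long as the class sits on the classpath.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;

// Invented example of a 'gadget' class: never invoked by the application
// itself, but its readObject runs whenever an instance is deserialized.
public class HarmlessGadget implements Serializable {

    private static final long serialVersionUID = 1L;

    private String command;

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // A real gadget would do something dangerous here (e.g., run 'command');
        // the point is only that this code executes during deserialization,
        // outside of any call path the application developer ever wrote.
        System.out.println("readObject side effect: " + command);
    }
}
```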
The particular limitations of Eclipse Steady are as follows.
First, as I've mentioned earlier
on, none of the existing public vulnerability databases
references fix commits in a systematic fashion.
This is why the tool has
limited coverage: right now there are 700 vulnerabilities
in a dedicated open source project called Project KB, where
we have listed the fix commits for those 700
vulnerabilities, and they are the input for
the analyses you have seen earlier on. Secondly, it only supports
Java; limited Python support is available, but that is rather
beta and does not include all the reachability
analyses. And last but not least, compared to other approaches,
it is relatively heavyweight in the sense that you need to have a
Docker Compose application running, ideally in a private cloud.
Right, this concludes my talk. I hope it
was interesting and that I convinced you of the
potential of using program analysis techniques.
Thank you so much.
Thank you.