Transcript
This transcript was autogenerated. To make changes, submit a PR.
Welcome back to my talk about software security.
My name is Marco Valon and I'm going to
talk to you from the perspective of a GitOps,
DevOps, cloud, container engineer.
Give it a name. But just for the record, I'm not real.
I am very passionate about software, though, and
that's why I'd like to talk about it.
What I want to do today is to discuss
with you the ability to reveal the
safety of software without revealing
application logic. Even if you're developing
open source software, for many people,
analyzing the code and judging it is
way too complex. They need other means to identify
whether or not software is safe.
And in this respect, safe does not mean that
it's performing well. The quality of your product is
something totally different. I hope
to create a bit of awareness about the tools available,
the techniques available, and I'd like to encourage you
to adopt this in your workflow as well.
And in order to do so, I want to make a
sidestep to another industry which has similar challenges.
But before we do so, I'm going to present you a QR code
which you can scan and use to
have a look at the presentation later on.
It will be online behind the
link that is shown in the QR code.
In the context of a talk like this for Python
developers, we have to start, of course, with the
code. It all starts with the code, and eventually
the code ends up in production somewhere,
whether it's in an appliance, or on a host, or inside
a container. Could be a mobile phone, it doesn't
really matter. The code is where it all starts.
And if the code is unsafe,
this will propagate all the way down
the line. In order to demonstrate what is happening
down the line, I'm going to use an application which is written by Jérôme
Petazzoni. It's an application called worker.py,
and it's used to demonstrate some techniques inside Kubernetes
clusters. Its function is not
relevant for today, but the nice thing is,
it has a flaw. Only one,
but it's just enough for demonstration purposes.
We do not care about the application logic. We only
want to make sure that the application does not contain too many CVEs.
And the big question is, how can we do this?
We want to assure our colleagues that they can safely
use this application.
An industry that is facing similar challenges is the
food industry. The food industry has to prove to you
that food is safe to consume.
Then you could ask yourself the question,
would you consume this? It's an unlabeled
empty jar. It could
be sweet, it could be sour on the inside,
it could be dog food, it could be delicious.
Nobody knows up front.
Or would you consume this?
Maybe you would, maybe you wouldn't. It depends on allergies.
Or maybe you're a vegetarian and you don't want to consume anything
that contains fish, for instance, or milk.
The food industry has to inform you upfront
about nutrition facts, or they
have to inform you about potential risks if you have an allergy.
And even though you might be very adventurous, some people
would probably open up the unlabeled container and consume
it. There is a risk involved and I'd
like to eliminate that risk as much as possible when it comes to
software.
So with food, it is nice to know what is inside.
And even if you know what is inside, it doesn't mean you have to
consume it. But in the food industry, they use
food labels to tell you about the contents
and it could look like this. We'll see this picture again a bit later.
It tells you all that's
inside, but it doesn't tell you how it tastes. It doesn't tell you
what the recipe is, how it was produced.
It just tells you where it's from. It's from France and
it was sold in Singapore.
Well, we could ask ourselves the question, why do we not
do something similar with whatever we've got? Hardware,
software, SaaS solutions, you name it.
And often it is already done. Many companies
have a CMDB where they
try to manage their assets.
So from a hardware perspective, they often know what they've got,
what the serial numbers are,
what the components are inside, et cetera, et cetera.
If we want to do it for other things,
we might look at other BOMs.
BOMs are basically bills of materials, and
they describe what is inside.
If you want to build a device, you get a BOM,
which is a shopping list, and once you've got all the
components, you can start to assemble it.
If you follow the URL to GitHub.com,
you'll find some examples from an organization called CycloneDX.
They have different examples on file
formats on how you could exchange information
about the bill of materials for hardware, software,
SaaS solutions, you name it. And during this
talk we'll be focusing on the software bill of materials,
hence the SBOM.
Now why would we use an SBOM?
Basically, we use SBOMs to be in control of our software or
to convince others that software is safe to use.
Well, let's look at a few examples and let's see if
we can find other purposes for SBOMs.
You might have seen this announcement where
people tell you that there is a flaw in the curl library.
If your application is using the curl
library, you probably
want to be able to easily identify if you
have to fix this or not.
Did you identify whether or not your app was affected and
if so, how did you figure it out? How long did
it take you to figure it out? Was it easy to
figure out? Or were you lucky enough that you could
simply enter the CVE number into a
database with all the bills of materials of your software?
In our case, that revealed that in
our database there is only one application listed
as vulnerable to this particular CVE.
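
To make that lookup a bit more concrete, here is a minimal sketch in Python. It assumes you keep a small inventory that maps each application to the packages listed in its SBOM. The application names, package names and versions are all made up for illustration, and a real setup would query a proper database instead of a dictionary.

```python
# Hypothetical inventory: application name -> list of (package, version)
# pairs taken from each application's SBOM. Every name and version here
# is invented for this example.
INVENTORY = {
    "webshop": [("curl", "8.3.0"), ("openssl", "3.0.11")],
    "worker": [("redis", "5.0.1"), ("requests", "2.31.0")],
    "reporting": [("libxml2", "2.11.5")],
}


def apps_using(inventory, package):
    """Return {application: version} for every app whose SBOM lists `package`."""
    hits = {}
    for app, components in inventory.items():
        for name, version in components:
            if name == package:
                hits[app] = version
    return hits


print(apps_using(INVENTORY, "curl"))
```

With an inventory like this, the question "which of my applications ship curl?" becomes a single lookup instead of a manual search through every project.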
So once again, a bill
of materials tells you what is inside, just like a
food label does.
It only looks different. The most
common formats are JSON and XML,
and if you look at the SBOM snippet in
the presentation, then you see it has a lot of identifiers.
It tells you a bit about the package name, the package type,
the location where it was found. There could be a checksum
involved, et cetera, et cetera.
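
As a rough illustration of what reading such a file could look like, the sketch below parses a tiny JSON document shaped like a CycloneDX SBOM and lists its components. The sample document is made up for this example and only uses a handful of fields (type, name, version, purl). A real SBOM generated by a tool carries many more.

```python
import json

# A made-up, heavily trimmed document in the shape of a CycloneDX JSON SBOM.
SAMPLE_SBOM = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"type": "library", "name": "redis", "version": "5.0.1",
     "purl": "pkg:pypi/redis@5.0.1"},
    {"type": "library", "name": "requests", "version": "2.31.0",
     "purl": "pkg:pypi/requests@2.31.0"}
  ]
}
"""


def list_components(sbom_text):
    """Return (type, name, version, purl) tuples for each listed component."""
    sbom = json.loads(sbom_text)
    rows = []
    for component in sbom.get("components", []):
        rows.append((
            component.get("type", "?"),
            component["name"],
            component.get("version", "?"),
            component.get("purl", ""),
        ))
    return rows


for row in list_components(SAMPLE_SBOM):
    print(*row)
```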
But just like the food label, it tells you what is
inside and not whether it's harmful or not.
A food label doesn't know about allergies you might have,
and an SBOM doesn't know much about CVEs,
but the trick is to know what you've got and to
compare it to a list of known vulnerabilities that
are presented by different organizations around the world.
And once you know which packages you've got, you can easily match
it against the database of known vulnerabilities. See if there
is a match, and if so, you have to take
appropriate action. That is basically how
every security scanner works internally. They generate SBOM
files and match them against the database.
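
The matching step can be sketched in a few lines of Python. This is a simplification under stated assumptions: the advisory records, their identifiers and the package versions are hypothetical, and the sketch compares exact versions only, whereas real scanners evaluate version ranges and ecosystem-specific rules.

```python
# Packages as they might be read from an SBOM (made-up values).
COMPONENTS = [("redis", "5.0.1"), ("requests", "2.31.0"), ("curl", "8.3.0")]

# Hypothetical advisory feed. The identifiers are placeholders, not real CVEs.
ADVISORIES = [
    {"id": "EXAMPLE-0001", "package": "curl",
     "versions": {"8.2.0", "8.3.0"}, "severity": "high"},
    {"id": "EXAMPLE-0002", "package": "openssl",
     "versions": {"3.0.10"}, "severity": "medium"},
]


def match(components, advisories):
    """Return (advisory id, package, version, severity) for every hit."""
    installed = set(components)
    findings = []
    for advisory in advisories:
        for version in sorted(advisory["versions"]):
            if (advisory["package"], version) in installed:
                findings.append((advisory["id"], advisory["package"],
                                 version, advisory["severity"]))
    return findings


print(match(COMPONENTS, ADVISORIES))
```

Because the component list is stored separately from the advisory feed, the same list can be matched again whenever the feed is refreshed.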
But there are some advantages to keeping
the SBOM information stored.
One of the reasons why you probably want to store it separately
is that more and more often people present this information on
GitHub, or it's a requirement
in a purchasing process. For instance, the US government
requires SBOM files prior to purchasing software
nowadays. That allows them to evaluate the quality of
the software without knowing the application logic.
And it will tell them which risks
are involved in installing the software and it
helps them to make a decision to
purchase it or not.
In the example on the screen, you can see a
GitHub repo where somebody is distributing
an SBOM file.
If you want to know the risks involved in installing this
software, you can first download the SBOM and then analyze it
instead of the other way around. Downloading it,
installing it and then scanning it is probably not
the best way of doing things.
Once you've got it downloaded, you could upload it in a tool like
this where you've got a GUI that
does a bit of analysis for you. You could also
use command line tools. There are plenty
of choices, and I'm not here to endorse one or the other,
so feel free to do whatever you
think is best to make this world
a bit safer. As you can see
in the example, we've got a container image
with 1200 components, and there
are about 100 vulnerabilities.
Many of them could be fixed. As you can see by the yellow
triangle, the fix is already present. So basically
these components are outdated.
But let's go back to the app I mentioned earlier,
and let's see what the app itself is doing
with regard to security.
And then let's see what happens when we start to containerize
it. Let's bundle it with an image and we might see some
shocking results.
The app by itself is not very sophisticated.
It imports a couple of modules, it has a couple
of loops. It's not a really fancy,
complicated application. We've decided
to distribute this app in a container because that's what
our customer wants, and we have to pick
an image to use as a base.
And the python:latest image is quite popular in this respect.
Almost everything works inside that particular image.
But in order to be on the safe side, we've also tested two
other images. One is the Python Alpine image,
based on Alpine, and the other is the python:3.9.18-slim
image, based on Debian.
Let's see what the differences are.
The build process is always the same. We take a Dockerfile
and the only thing that changes is the FROM line.
Everything else is similar. And at the end of the build
process we validated that in
all three images, the application is running
fine. So from a consumer perspective or
a user perspective, there is no difference between one or the
other.
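
To show what "only the FROM line changes" can look like, here is a small Python sketch that renders the same Dockerfile for three base images. The Dockerfile body is an assumption for illustration, not the one used in the talk, and the exact Alpine tag (python:alpine) is assumed as well.

```python
# Hypothetical Dockerfile body. Only the base image differs per build.
DOCKERFILE_TEMPLATE = """\
FROM {base_image}
WORKDIR /app
COPY requirements.txt worker.py ./
RUN pip install --no-cache-dir -r requirements.txt
CMD ["python", "worker.py"]
"""

# The python:alpine tag is an assumption; the other two tags are as named in the talk.
BASE_IMAGES = ["python:latest", "python:alpine", "python:3.9.18-slim"]


def render_dockerfile(base_image):
    """Return the Dockerfile text for one base image."""
    return DOCKERFILE_TEMPLATE.format(base_image=base_image)


for image in BASE_IMAGES:
    # Print just the first line to show the only difference between variants.
    print(render_dockerfile(image).splitlines()[0])
```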
The first thing that's interesting to note,
without even going into the CVEs, is the
difference in size.
A build with the
Alpine-based image results
in a container image of roughly 110
megabytes, which is quite nice.
If you look at the image which is built on
the python:latest image, you'll see that it's more
than ten times as big. It's almost 1.5 gigs.
Well, if you take the difference between the two, then you end
up with more than 1.3 gigs of stuff that
is apparently present in the image but not required to run the application,
and it might cause all kinds of hassle, as we'll
see later on.
In order to do a bit of an analysis, we've taken a
tool called Syft. Syft is used to
create the SBOM file with
all the information about the packages present and required,
and we analyze them with Grype.
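
A minimal sketch of driving these two steps from Python could look like the following. The image name and the file name are placeholders, and the flags reflect my understanding of the two command-line tools, so check each tool's help output before relying on them. The scan function only works when both tools are installed.

```python
import shutil
import subprocess


def syft_command(image, sbom_path):
    # "-o cyclonedx-json=<file>" asks for a CycloneDX JSON SBOM written to a file.
    return ["syft", image, "-o", f"cyclonedx-json={sbom_path}"]


def grype_command(sbom_path):
    # The "sbom:" prefix means: scan an SBOM file instead of an image.
    return ["grype", f"sbom:{sbom_path}", "-o", "table"]


def scan(image, sbom_path):
    """Generate an SBOM and scan it. Requires both tools on the PATH."""
    for tool in ("syft", "grype"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} is not installed")
    subprocess.run(syft_command(image, sbom_path), check=True)
    subprocess.run(grype_command(sbom_path), check=True)


# Show the commands without running them.
print(" ".join(syft_command("python:alpine", "sbom.json")))
print(" ".join(grype_command("sbom.json")))
```

Keeping the SBOM as a file between the two steps is what makes it possible to scan the same inventory again later without rebuilding or pulling the image.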
We don't do this because it's the best tool, but because it has nice output
for this presentation. All the numbers
shown are valid at the time of writing of this presentation.
Quite likely, new CVEs have been discovered since then,
and the results will probably get worse over time.
And that's one of the reasons why you might want to keep an SBOM file
at hand, so you can reevaluate it over time and
see if you need to fix your code.
Starting with the Python application itself:
well, we're in pretty good shape. There is one medium CVE found,
nothing to worry about.
As an Ops engineer, I'd be more than happy to deploy
this. That does not apply to the image
which is built using the python:latest image.
As you can see, we get a bonus of
1699 vulnerabilities
simply by storing it in this particular
image. Even worse is that
a lot of them are critical and high CVEs,
which could have a big impact.
Yes, I do realize that it's a containerized application.
It's running isolated, but if you're able
to compromise the application inside the container, you do
have a lot of tools at your disposal, and the
likelihood of getting into the container is quite high
as well, because there are enough vulnerabilities
to abuse.
We could make this a bit safer by simply changing
the base image. We don't have to rewrite any code or whatever.
By using the python:3.9.18-slim
image, suddenly we only have 101
vulnerability matches,
which is, compared to almost 1,700, quite a reduction.
The difference is huge, but we can do better.
If you look at the Alpine-based
image, then you see that we have only
one high and 18 medium,
no criticals whatsoever,
which means that we are in pretty good shape.
But we could be in even better shape, because the image itself already
is a bit outdated as well. And inside the image
there is already a fix for nine known vulnerabilities.
So if we put this in a table, well,
I think it speaks for itself. You probably can guess which image I prefer
to deploy as an engineer.
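
If you want to build such a table yourself, a sketch like the one below could count findings per severity and how many already have a fix. It assumes a report shaped like Grype's JSON output, with a list of matches that each carry a severity and a fix state. The sample report and its identifiers are made up.

```python
from collections import Counter

# Made-up report in the assumed shape of a scanner's JSON output.
REPORT = {
    "matches": [
        {"vulnerability": {"id": "EXAMPLE-0001", "severity": "High",
                           "fix": {"state": "fixed"}}},
        {"vulnerability": {"id": "EXAMPLE-0002", "severity": "Medium",
                           "fix": {"state": "not-fixed"}}},
        {"vulnerability": {"id": "EXAMPLE-0003", "severity": "Medium",
                           "fix": {"state": "fixed"}}},
    ]
}


def summarize(report):
    """Return (counts per severity, number of findings with a fix available)."""
    severities = Counter()
    fixable = 0
    for match in report.get("matches", []):
        vulnerability = match["vulnerability"]
        severities[vulnerability.get("severity", "Unknown")] += 1
        if vulnerability.get("fix", {}).get("state") == "fixed":
            fixable += 1
    return dict(severities), fixable


print(summarize(REPORT))
```

Running the same summary for each base image gives one table row per image, which makes the comparison easy to repeat when the numbers change.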
Well, like I said,
I always like to keep the SBOM files at hand somewhere.
It allows me to do evaluation over
time, but being a bit lazy,
I leave it up to tools to do it for me.
And one of the tools that's very neat is Dependency-Track.
Basically it is a database in which you can upload
the SBOM files, and Dependency-Track
will do periodic analysis
every 24 hours or so. It will download the newest CVE
databases from different sources and match it
against the SBOM files that are stored inside the database
to see if there are any matches. And if new
CVEs are discovered, it will create
tickets for you or send you notifications or
whatever, basically helping you to be
on top of the quality of your software with
regard to safety. You could
even consider using tools like Renovate to automatically fix
these vulnerabilities to make sure you're
always in good shape. But as you can imagine,
over time you might want to upgrade to different
base images. Alpine will
have a successor every now and then, and only
by keeping your images up
to date do you keep the application itself safe as
well. Another advantage of having the SBOM
files at hand is that you can convince
customers that your software is safe
from a CVE perspective. A CVE scan
does not prove that you don't have malicious code somewhere.
It only tells you that the malicious code is executed in a
safe manner. More and more you'll see that tools
like docker build and others also include the ability to create
these SBOMs and store them in container
registries, which allows users to pull
the SBOM prior to pulling the container image, again
in an attempt to be preventive, in
the sense that you want to analyze the SBOM first before
you allow your system to deploy it.
It is quite common nowadays in Kubernetes clusters
to analyze the container prior
to deploying it, because once it's deployed
you're often too late. I hope
that I've given you some good reasons to start working with SBOM
files. It's pretty easy to do,
it's easy to integrate in pipelines.
It is a great way to show to your customers that
you're on top of things, that you're updating your software on a regular basis.
But if you enter this world, keep in mind that
there are many scanners,
but they are not all good. Some are
great at only scanning Java packages. Others really focus
on analyzing Python applications.
They'll evaluate the requirements.txt files as
well and take them into consideration.
Other tools are great at doing analysis
of the base OS packages, for instance like
the Debian or the RPM packages, and some try
to do it all. So a bit of benchmarking here
helps to determine what is good for your environment.
In order to make it easy to start, I've got
a couple of links that I find interesting that I'd like to
share with you. If you visit this
presentation online, then you have the ability to
click on them as well. If you're only
looking at this one, I'm sorry, then you have to type them. But as you
can see, there are many tools listed here to generate
SBOMs. Microsoft has some tools.
There are some tools that run inside kubernetes. There are some
tools that you can use from the command line.
There are tools that you can use to store SBOMs.
SBOM OCI means storing an SBOM
file in an OCI container registry which allows you to
evaluate images before you download them.
Dependency track is a nice tool that
fits in the middle of the supply chain that
also analyzes
them on a regular basis.
Trivy is well known nowadays.
Trivy is often used in a pipeline because it can do it all in one.
It can generate the SBOM file for you and do an analysis
of vulnerabilities, which allows you to break a
pipeline if a critical CVE is discovered.
It prevents you from releasing bad software.
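
The idea of breaking a pipeline can be sketched independently of any particular scanner. The Python below assumes you already have a list of severities from a scan and turns it into an exit code. The severity names and their ordering are an assumption for this example, and real tools ship their own options for this.

```python
# Assumed ordering from least to most severe.
SEVERITY_ORDER = ["negligible", "low", "medium", "high", "critical"]


def should_fail(severities, fail_on="critical"):
    """True if any finding is at or above the `fail_on` severity."""
    threshold = SEVERITY_ORDER.index(fail_on)
    for severity in severities:
        level = severity.lower()
        # Unknown severity labels are ignored in this sketch.
        if level in SEVERITY_ORDER and SEVERITY_ORDER.index(level) >= threshold:
            return True
    return False


def gate(severities, fail_on="critical"):
    """Return a process exit code: 1 blocks the release, 0 lets it pass."""
    return 1 if should_fail(severities, fail_on) else 0


print(gate(["High", "Medium"]))     # below the threshold, passes
print(gate(["High", "Critical"]))   # a critical finding blocks the release
```

In a pipeline step you would pass the result to sys.exit, so that a non-zero exit code stops the release job.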
KubeClarity is a nice illustration of software that you
can run inside the Kubernetes cluster.
It does basically the same and it will tell you which
container images have to be replaced due to
CVE issues and
that's about it. I hope this
is enough for you to get started.
Like I said earlier, it's not complex at all,
but it's adding great value to all of us in the
industry. So thanks for your attention and good luck.