Transcript
Hello. Good morning, good afternoon, good evening. Thank you for joining today's talk on security when working with containers. My name is Adrian Gonzalez and I'm a principal software engineering lead here at Microsoft. As part of my role, I am accountable for ensuring that security is part of my team's engineering fundamentals and is incorporated into all of our outputs, whether that's working with a customer or working on a product. Outside of work, I enjoy experiencing new cuisines, going on wine tastings, traveling the world, most outdoor activities, and playing and watching baseball. Throughout the presentation today, we're going to cover the four phases that I've experienced when working with security around containers.
First, we're going to explore the top quadrant here, which is around creating and updating container images. In this section we're going to cover four topics, outlining some examples as well as additional details around some of the best practices for creating and updating container images. The first practice I want to talk about is making sure that you're running the latest, or at least relatively recent, versions of the OS as well as of the container environment.
In this image and in the following slides, I'm going to be using Docker as an example. Typically, the steps to perform this will vary; the syntax will vary based on your operating system and tool of choice. But typical steps include things like uninstalling old versions, updating any packages that are required to perform the commands on the CLI, adding GPG keys to ensure security when downloading dependencies, setting up repositories so the tooling knows where the downloads will take place, and then ultimately pinning what version of either the OS or the container framework you're using. In this case it would be a particular version of Docker Engine, containerd, and Docker Compose. And in the images here you see the commands I use to determine what the recent versions of Docker are, as well as what versions I want to use for my dependencies.
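As a rough sketch of what those steps can look like on Ubuntu or Debian with apt (the package names, key URL, and the pinned version are illustrative placeholders, not the exact slide content):

```
# Uninstall old versions
sudo apt-get remove docker docker-engine docker.io containerd runc
# Update and install the packages needed for the next CLI steps
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
# Add the GPG key used to verify downloads
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
  | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
# Set up the repository where downloads will come from
echo "deb [signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" \
  | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# List available versions, then install a pinned one
apt-cache madison docker-ce
sudo apt-get update
sudo apt-get install -y docker-ce=<pinned-version> docker-ce-cli containerd.io docker-compose-plugin
```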
The next best practice is ensuring that you use non-root users. In this particular case you'll see examples in both the top left and top right images where I define a particular username, I give it a user ID, and I create a particular group for that user to be associated with. You'll see a command highlighted there, and I just want to share that, in my experience, it's okay to have certain non-root users be able to perform sudo operations, but you do want to limit how many users can, and also definitely segment those users, potentially into separate groups as well. The last line in the top left image is the effective step of ensuring that when the container runs, it runs as that username and not as the root user. The bottom right image is effectively the same structure, but using a real example where I needed to pull a Golang base image and then had to switch back to the root user to install some dependencies. In this case, I removed most of those dependencies to keep it brief and just showed the RUN command where I would update dependencies. Then, once I finished that custom tailoring of my image, I completed the Dockerfile by switching back to that non-root user.
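A minimal Dockerfile sketch of that pattern, where the base image, user name, UID, and group are all illustrative:

```
FROM golang:1.21
RUN groupadd --gid 1001 appgroup \
 && useradd --uid 1001 --gid appgroup --create-home appuser

# Temporarily switch to root only to install or update dependencies
USER root
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates \
 && rm -rf /var/lib/apt/lists/*

# Switch back to the non-root user for everything that follows
USER appuser
```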
Another best practice is making sure to use trusted sources or trusted registries. Here in this image I show the relationship between a registry, an image, and a container, where a registry is effectively nothing more than a repository of images, images are static representations of running containers, and a container is the actual instance that is running. You'll see that when working with Docker, it follows that same convention in the FROM instruction, where the first piece is a registry followed by a slash, followed by the repository name as well as the image name, and then a colon and the specific version of that image. Now, when using trusted sources and registries, know that they can be both on-prem as well as in the public cloud through other providers, whether that be Azure, AWS, Google, some other cloud, or JFrog Artifactory.
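For illustration, a fully qualified reference in a FROM line follows this shape (the registry, repository, and tag below are made-up names):

```
# <registry>/<repository>/<image>:<version>
FROM myregistry.azurecr.io/base-images/node:18.17.1
```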
Another best practice is making sure that when defining images, they are as lean as possible. That typically translates to only installing the dependencies that the container requires to perform its function, but it also includes another mindset that I outline in the top left image, which is separating build versus runtime dependencies. In this image we're building what looks to be some kind of a Node application, a web app. The first part that I highlight here is all the steps required to build that solution, and that's performed by running the yarn run build command. And of course I had to run yarn install to do that, which installs all the Node npm packages, and depending on the solution those can get quite hefty and include a lot of dependencies. Now, the second piece that I highlight is a totally different Docker base image, nginx:alpine. Here I'm copying only the outputs of the build command from the previous stage and storing them in this new base image. I no longer have to worry about all the other npm packages that were required to actually build the solution, and my image is now leaner, has fewer dependencies, and is therefore less prone to vulnerabilities. Also, the bottom right example is another extension of this, where I was able to combine a Golang image and a Python-based image. Again, I removed all the custom dependencies I had to install to keep this brief, but you can see how in line 10 I start with the Golang image and pull it down, and then in line 12 I pull the Python image, where I would install my custom dependencies. Then in line 18, all I do is copy everything from the Golang base image into my current working image, and that is what I would ultimately produce.
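A sketch of that multi-stage pattern for the Node/nginx case; the image tags and the build output folder (/app/dist) are assumptions:

```
# Build stage: install everything needed to compile the web app
FROM node:18-alpine AS build
WORKDIR /app
COPY package.json yarn.lock ./
RUN yarn install --frozen-lockfile
COPY . .
RUN yarn run build

# Runtime stage: carry over only the built assets, leaving the build-time packages behind
FROM nginx:alpine
COPY --from=build /app/dist /usr/share/nginx/html
```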
The next phase in our cycle is storing the container in a registry, and I want to talk about a few concepts as outlined in this slide. First up is considering what a private registry is and when to use one. Well, really, a private registry is nothing more than a regular public registry, but it is segmented by having stronger network security policies in place. These can be things like firewalls, source IP range policies, or port policies. The important point here is security through not even advertising that a registry exists, and only granting individuals, or in this case networks, the ability to connect to your registry on a need-to-know basis.
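As one hedged example of what those network policies can look like on Azure Container Registry (the registry name and IP range are illustrative, and IP network rules assume the Premium tier):

```
# Deny all traffic by default, then allow a known source IP range
az acr update --name myregistry --default-action Deny
az acr network-rule add --name myregistry --ip-address 203.0.113.0/24
```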
Another important concept when working with registries and securing a registry is affiliating digital signatures with that registry. I provide a couple of definitions of what that is here. To keep it brief, a digital signature really is nothing more than what it sounds like: a signature that creates trust and a chain of custody for any image, or any version of an image, that is available for you to consume. It gives you that additional sense of confidence that the image was produced by whoever said they produced it, and gives you a sense of accountability, in that you know who to reach out to and hold accountable if there were any issues with that image. In the three images I have here, I show how I use Docker to first create a digital signature key with docker trust key generate. I'm not Jeff, but I would replace that with my name. In the second image, I add the key that was just generated, and it's important to note that prior to generating the key, I do have to provide a password to be able to generate the private key. Once I've set my password and I have a private key, I affiliate that key with the registry; that's what the middle image's command is performing. You can see it's saying that it's adding Jeff as a signer to the registry, and it prompts for the password so that it can successfully do so. In the bottom image, we're effectively running the command to publish that image to that registry, and you can see the command: docker trust sign, then the name of the registry, the repository called admin, and the image named demo at version one. Again, I'm going to be asked for my password. Once I enter it, that published Docker image in that registry will be digitally signed by me, and any consumer can see that I was the one that signed it at a particular point in time.
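A sketch of those three commands; the signer name, registry, repository, and tag are illustrative:

```
# 1. Generate a signing key (prompts for a passphrase)
docker trust key generate jeff
# 2. Add the signer and its public key to the repository
docker trust signer add --key jeff.pub jeff myregistry.azurecr.io/admin/demo
# 3. Sign and push the specific image version (prompts for the passphrase again)
docker trust sign myregistry.azurecr.io/admin/demo:1
```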
Another best practice is around identity and access management (IAM) and role-based access management. To start, let's define two important terms here. A role is nothing more than a unique ID, a set of permissions, and the asset or assets those permissions are granted on for that role. An account is an ID plus a set of roles, and I'm visually representing accounts as keys in this image. In the bottom left image you can see how the service Azure Container Registry, which is Azure's registry solution, has a total of six or seven different roles, and each role has a different set of permissions. Now, one best practice to consider here is to always minimize highly privileged accounts and only grant the permissions that are required for each account: minimal privileges. So in this example, the least privileged key would be one scoped to my image that may only contain, say, the ability to download that image; that would be the AcrPull permission. Another, more highly privileged key would be the one on the far right, which is assigned to the Helm chart repository and the base node repository. That one, let's say for the sake of argument, is given the Contributor role, so whoever has that key can perform all of the permissions and operations outlined in the left image. And the most highly privileged key is the one that is affiliated at the registry level. Even if it only grants the ability to, say, pull images for all repositories within that registry, I still consider that a highly privileged account, and it should be highly secured and restricted in terms of who can use it.
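A hedged sketch of assigning a least-privilege role on Azure; the principal and registry names are illustrative:

```
# Grant only AcrPull, scoped to a single registry
az role assignment create \
  --assignee <service-principal-or-user-id> \
  --role AcrPull \
  --scope $(az acr show --name myregistry --query id --output tsv)
```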
The next phase is around container DevSecOps operations. Again, we're going to cover a few topics here, from ensuring that the CI agents have access to the registry, to what container scanning is, and the various CI stages. So first, when working with CI pipelines and container registries, it's important to make sure that your CI solution, which in this image is Azure DevOps, has the ability to connect over the network to that container registry. That may involve making some network security changes on the firewall or security policies. Second is to consider creating as many account keys as needed for the different teams or individuals that will be using the CI pipeline platform. The same mindset as before applies: we want to be as granular as possible. The most common account key I create, in my experience, is CI agent number three: for each such account, scope it to maybe one or the minimal number of repositories and only grant that account AcrPull permissions. A little more elevated would be the CI agent number two key, which in this case visually covers two repositories and grants that keyholder the Contributor and AcrImageSigner permissions. This is great for a pipeline that will be doing pushes to those repositories; that way the pipeline can digitally sign those images, and consumers know which pipelines, or which team that uses that pipeline, produced said images. And again, the most highly privileged key is CI agent number one. This should be very limited, especially at the CI level, so definitely be very cautious about having a CI agent hold this account key, because as Owner it has complete control over that entire registry and all its underlying repositories. But it may be useful for a DevSecOps team in order to fully automate the provisioning and management of future repositories.
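One way to mint such a narrowly scoped key on Azure Container Registry is with repository-scoped tokens; a sketch with illustrative names:

```
# Scope map that only allows pulling from a single repository
az acr scope-map create --name ci-agent-3 --registry myregistry \
  --repository base-node content/read
# Token (the "key" handed to the CI agent) bound to that scope map
az acr token create --name ci-agent-3-token --registry myregistry \
  --scope-map ci-agent-3
```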
Next is talking about the CI stages involving containers and DevSecOps. We're going to go through each step here shortly, but just like all CI pipelines, everything is based on a code change or code commit, specifically those that pertain to the container definition, like the image or, in this case, the Dockerfile. Step one is build. Don't worry about the syntax from here on out; this is Azure DevOps, and I want to make sure that we focus more on the actual Docker commands or the tools that I'm using and going to be showing you. In this case, the build step is relatively simple. It's just running docker build with a particular tag, using the image name variable that would be passed in as part of the CI pipeline, passing in all the build arguments we'd supply as part of the docker build process, and then a parameter called dockerFileName that tells Docker which Dockerfile to look at and build a container image from.
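A sketch of what that build step runs; the $(...) pipeline variable names are assumptions:

```
docker build \
  --tag $(imageName):$(Build.BuildId) \
  --build-arg TARGET_ENVIRONMENT=$(targetEnvironment) \
  --file $(dockerFileName) .
```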
Step two is running tests. Think of this as unit tests for your container. The first thing I highlight here is a CLI command that you may not be familiar with, called tox; the rest of that command passes a testinfra make target environment, which is basically just a way to distinguish whether the container is suited for dev all the way up to production, and then the parameter for image name as before. Before we get into what tox is, I also want to talk about the second task in this image, which is only triggered when the previous task succeeds. If it does succeed, you can see that what we do is effectively echo: in this particular case, what we're really doing is just setting a variable called testsPassed and giving it the value of true, which we will use in further steps of the CI pipeline.
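In Azure DevOps, that kind of variable is typically set with a logging command from a script step; a sketch, with the variable name as an assumption:

```
- script: echo "##vso[task.setvariable variable=testsPassed]true"
  condition: succeeded()
```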
So what is tox? Tox is a virtual environment management and test CLI that relies on pytest, a Python package, to run Python code that is comprised of methods that are test cases, and each test case has assertions. Let's show an example of what one of those looks like. Here is one very rudimentary example of what a test file will look like. You can see, like I described, each method starts with the def syntax, we pass in parameters like host and the expected file content, and then inside the method we perform the actual assertions. So the first method tests that certain files exist in the container, the running container instance. The second test, test_container_running, checks whether the user that has an active session in that container is root. Note that we just talked about wanting to run containers as non-root, so I would argue that this assertion should be changed to say the process user is not equal to root; that way we give more freedom in our test case assertions, where other users are allowed but root is not. Other assertions include things like testing certain properties of the host system, checking environment variables that are set, ports that are exposed, or sockets that the container may be listening on. Again, this is just a place to get started. I definitely encourage folks to look into additional assertions that make sense for ensuring that the container is properly configured and defined.
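A minimal testinfra-style sketch of those kinds of assertions; the file path, port, and expected values are illustrative:

```
# Hypothetical checks against a running container (testinfra injects the `host` fixture)
def test_app_files_exist(host):
    assert host.file("/app/config.yaml").exists

def test_not_running_as_root(host):
    # Assert the active user is NOT root, per the non-root best practice
    assert host.check_output("whoami") != "root"

def test_port_listening(host):
    assert host.socket("tcp://0.0.0.0:8080").is_listening
```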
The next step is around scanning for vulnerabilities at the container level. In this example I use a tool called Trivy. You can see the first step is for me to download and install Trivy; all I'm doing there is making an HTTP request to download the Debian package and ensuring that Trivy got successfully installed. The second task is where things get interesting: I'm running two scans using Trivy. The first scan, in the first portion of that line in the script, is effectively telling Trivy to pass the pipeline even if it finds vulnerabilities with severity low or medium when targeting a particular image repository and a particular image tag version. The second scan, however, will fail the pipeline if it detects any vulnerabilities at the high or critical level. Again, this is subject to risk tolerance based on the industry, the team, and the particular solution under development, but I would definitely encourage individuals to err on the side of caution and ensure that there are no high or critical vulnerabilities in the container dependencies.
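A sketch of that two-pass scan; the registry, repository, and tag are illustrative:

```
# Report (but do not fail on) low/medium findings
trivy image --exit-code 0 --severity LOW,MEDIUM myregistry.azurecr.io/admin/demo:1
# Fail the pipeline on any high/critical findings
trivy image --exit-code 1 --severity HIGH,CRITICAL myregistry.azurecr.io/admin/demo:1
```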
Here's an example of what a vulnerability report from Trivy looks like. I've highlighted the key things to keep an eye on. You'll see the total number of vulnerabilities and their classifications, and then at the bottom you see a table with really rich information: the dependency or library, the specific vulnerability ID, its severity, the installed version where it was found, and then Trivy actually searches its database to see what the fixed version is where that vulnerability is no longer present, further empowering you and the team to decide how to fix the vulnerabilities. Here are more example scanning tools that I encourage you to look into, including Aqua, SonarQube, and WhiteSource.
The next step is around versioning and publishing the image. I'm going to break this down by first saying there are two parts to this. In my experience, it's worthwhile to publish even the images that failed previous tests. The reason for that is it makes it easier for a subset of consumers to download those failed images, troubleshoot them, patch them, fix them, and then publish the code changes that fix them to the true repository. Now, the caveat here, just like how we talked about many different keys, is that this needs to be a different key that grants access to a unique repository in that registry, effectively named with some form of a failed suffix or indicator. That's what we're doing in the first highlighted section of this image: we're keying on the testsPassed variable being set to false, which would happen if the vulnerability scan or the tox tests failed. We then run a script to create a tag for that image and append the failed indicator at the very end of it, as seen near the top of the image. After we've tagged the failed image appropriately, we publish it. The condition here, in Azure DevOps syntax, is effectively the same as before, and all we're doing is pushing the image to the proper repository with the proper suffix. Now, what's key here, and I just wanted to call it out for you, is that the CI credential being used to authenticate the CI pipeline with the registry is, in Azure DevOps, supplied through a parameter called a service connection. That's specific to Azure DevOps, but I just want to make sure that we are still doing this securely in all these examples.
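A hedged sketch of what the failed path can look like as a pipeline script step; the variable, registry, and repository names are assumptions:

```
- script: |
    docker tag $(imageName) myregistry.azurecr.io/admin/demo-failed:$(Build.BuildId)-failed
    docker push myregistry.azurecr.io/admin/demo-failed:$(Build.BuildId)-failed
  condition: eq(variables['testsPassed'], 'false')
```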
The continuation is now to also publish the happy path. If an image passed all the way through, we want to make sure that we tag that image appropriately, and you can see that taking place in the first task at the top, where we're using the value latest to give that version the latest name. The middle task is effectively pushing the Docker image, but using a parameter, imageTag, instead of the value latest. Here you can decide to use a convention such as major.minor.patch, or use a convention that maps the CI build GUID or job ID to that image; I've seen both options work pretty well in my experience. The bottom task publishes the same image, but now publishes it using the latest tag, or the latest version label. Now, I'd be cautious about using this: it offers more convenience, but it also gives you less control over how large the impact is if that Docker image did in fact contain a vulnerability that sneaked through and was missed, because if latest is available, consumers will typically opt for convenience and use latest. You may find that a much wider number of users are impacted if you were to push a vulnerability to latest versus pushing it to a very specific version, as is done in the middle task.
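For illustration, the happy-path tags might be applied like this (the registry, repository, version, and variable names are assumptions):

```
docker tag $(imageName) myregistry.azurecr.io/admin/demo:1.4.2             # major.minor.patch
docker tag $(imageName) myregistry.azurecr.io/admin/demo:$(Build.BuildId)  # CI build/job id
docker tag $(imageName) myregistry.azurecr.io/admin/demo:latest            # convenient, but less control
docker push --all-tags myregistry.azurecr.io/admin/demo
```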
The next phase, and last phase in our cycle, is around best practices for securing the production environment that uses containers. The first best practice I want to share is the concept of network segmentation; specifically, in your own time, I encourage you to read up on a concept called nano-segmentation when working with containers. Just like any other infrastructure, containers can be tightly segmented and locked down with security policies that limit who can connect to them, as well as limit what other infrastructure that component can connect to. So with containers we're going to do the same thing: we want to wrap containers within a subnet, or even be more nano about it and wrap individual containers within multiple subnets, so that everything is further segmented, with pretty strict policies in place to limit what can connect to the containers and what the container subnets can connect to. Again, this is great for minimizing the potential impact radius if there was in fact a vulnerability that was exploited in that infrastructure or that container. Next up is a great preventative measure against denial of service or a depletion of container resources, and that is setting resource quotas. By default, containers have no resource constraints, so if a running container was hijacked and for some reason started really consuming all the CPU, memory, or other infrastructure resources, it could deplete all of them. The example I show here is how we can do this with Kubernetes; the same can be done with Docker as well. But in my example I show how, with Kubernetes, you can limit at either the namespace or the container level what the default amount of CPU and memory allocated to each container is, and what the maximum is that can be granted to that container.
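A minimal Kubernetes sketch of that idea using a LimitRange; the namespace and the numbers are illustrative:

```
apiVersion: v1
kind: LimitRange
metadata:
  name: container-limits
  namespace: my-app        # illustrative namespace
spec:
  limits:
    - type: Container
      defaultRequest:      # default CPU/memory requested when a container specifies none
        cpu: 250m
        memory: 128Mi
      default:             # default limits applied when a container specifies none
        cpu: 500m
        memory: 256Mi
      max:                 # hard ceiling any single container may be granted
        cpu: "1"
        memory: 512Mi
```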
Another best practice is around continuous container monitoring. There are three pieces to that: environment hardening, vulnerability assessment, and runtime threat protection for nodes and clusters. Any solution that performs container monitoring, such as Microsoft Defender for Containers on Azure, should provide these three capabilities. Environment hardening checks to see if there are any misconfigurations, or configurations that are not secure. For example, if there are no resource quotas, Microsoft Defender would flag that as a vulnerability in its continuous monitoring. Vulnerability assessment performs the same thing we did earlier in our CI pipeline and scans for vulnerabilities in container image dependencies. But why would we need to do that again, and continuously? Well, the reason is that vulnerabilities can come up at any point in time in the future; not all vulnerabilities are known from the get-go. So as you have images that passed vulnerability scans, are now in the registry, and have running instances or solutions built from those images, you want a platform that is able to continuously run vulnerability scans and map any vulnerabilities to actively running containers, so that you as a team can determine how to best mitigate and minimize the chances of security issues. And then the last piece is runtime threat protection, which ultimately watches the behavior of each running container and raises any anomalies, whether it's the container doing a highly privileged operation like user management at the cloud level or at the Active Directory level, or the container performing a highly privileged operation against some other core piece of infrastructure that it typically has not touched before. Any deviations in behavior would also be flagged. On this slide, I encourage you to look up what Azure's container protection tooling offers and what it checks against; the particular link you can explore in your own time is the Center for Threat-Informed Defense teaming up with Microsoft to build the notion of the ATT&CK container matrix, which outlines all of these different checks that are performed by these tools as part of runtime threat detection.
Here I wanted to provide a sample vulnerability assessment provided by the Microsoft Defender for Containers solution. You'll see, as I've highlighted here, that it surfaces certain infrastructure misconfigurations, it surfaces things like container images with running instances that have vulnerabilities, and it also checks Kubernetes to see that it has certain Azure policies enabled for further protection. A couple of resources I also want to share here:
the top right, or top left, QR code is Microsoft's Commercial Software Engineering playbook. As it states in the slide, this is a collection of fundamentals, frameworks, general best practices, and examples that both myself and many other individuals have contributed to over the years. It's open sourced, so we continue updating it as better practices or new best practices are surfaced. And the bottom left, or bottom right, QR code is our open source for dev containers. I really like to share this one because it offers a great starting place for what good, well-defined Docker images look like. Dev containers are a little more specific in nature, in that they allow VS Code to run within a containerized environment, but that's another story for another day. It's a great resource just to look at best practices for Docker container or container image definitions. And that wraps up our entire lifecycle.
If anything, I really want to share five key takeaways. One: make sure that the entire team has awareness of container DevSecOps practices. It's going to make them feel more bought in, informed, and educated, versus making it seem like it's just a lot more requirements and work being pushed down to the team. Second: enforce RBAC policies to prevent individuals from disabling control gates at the CI pipeline level. This tends to be something that's overlooked, in my experience, and it is a vulnerability, because if a developer or a team is really in rush mode, they might want to disable control gates that are there for a good reason. So really limit who can manage those control gates and limit the individuals that can perform those operations. Third: hold all members of the team accountable for adhering to secure container management, and make sure that they know they can hold each other accountable as well. After all, security is a team effort, and everyone is responsible for raising issues and concerns. Fourth: depending on the level of maturity around DevSecOps of the teams you are working with, there may be a need to influence change. And like all things that require influencing, it's most effective when done as a community and when individuals connect the business mission and the business success criteria to these principles of security as well. And last but not least is probably one of my favorites: decisions are all about the ratio between convenience and security, and there is no silver bullet. Everything needs to be custom tailored based on the industry, based on the solution, and based on who the consumers of the containers are. But one of the key things that I've learned in my experience is, especially when starting off at the beginning, weigh security heavier, and over time you'll find that it's less costly and easier to shift the balance to find the right ratio between convenience and security. The reason it's less costly is that if we were to weigh convenience heavier over security at the beginning, that sets up the potential for a vulnerability to be exploited and for there to be a data breach or some other type of attack.
And with that, I'd like to conclude by thanking you all for attending, and I wish everyone a continued safe rest of the calendar year. And in case we don't get to touch base later, I wish everyone a happy new year in 2023.
Thank you.