Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, I am Sentul from Ericsson.
Today my talk is going to be about patterns
for encrypting data at rest.
In cloud native applications,
data at rest actually refers to data that
resides in some sort of a storage.
And nowadays several
applications make use of a huge amount
of data. For instance, machine learning applications.
These applications have to process a huge amount of data,
and it is vital from a security
standpoint to encrypt the data
at rest. This is more or less a primary requirement
for many organizations that handle data.
So in this talk, I'm going to talk about
what cloud native applications are and
how to look at encrypting data at rest
using a layered approach, and I will also be
talking about patterns for encrypting
data at rest. So let's get started.
A brief introduction about me: I
work at Ericsson and my primary job
is to architect and develop cloud native
AI ML platforms. And by the way, these are platforms
that are used for running different types of machine
learning models in a highly distributed fashion
and with massive scale.
And many of the machine learning
applications rely on huge amounts of data. This
data can be in batch format or
it could be in real time streaming format.
Whatever it is, when data is
processed by any cloud native application,
it needs to be stored in some sort of storage.
And when data is stored, it needs
to be encrypted in order to protect sensitive information.
So that is what I'm going to talk about today.
And by the way, I am also the organizer
of Kubernetes Community Days Chennai
2022, which is coming up sometime next year,
and we are gearing up to host the very first KCD
Chennai event. I am the maintainer
of an open source project called kube-fledged, which is
all about caching container images
directly on the Kubernetes worker nodes.
I am a tech blogger, I am fairly active on Twitter,
and I am a big fan of the Money
Heist series on Netflix. And by the way,
season five part two is launching on the third of
December. So don't miss it.
The agenda for my talk is going to be three
different sections. In the first section I will talk
about what a cloud native application is, why
cloud native application development is
very popular, and how data
is stored, processed and transferred
in a typical cloud native application.
The second section of my talk is data
encryption at different layers of a cloud
native application. This is where I will split
the application into different layers
and we will see how we can implement data encryption solutions
at these different layers. And third,
I will be talking about various patterns
for encrypting data at rest.
So what is it? You might have
heard the term cloud native quite often,
right? So what exactly is cloud native?
Is it just jargon or does
it have any real meaning? So what is cloud native?
A cloud native application is fundamentally
about a way of developing applications, right?
It is not about applications targeted at
a specific deployment environment. It's not about applications
targeted for deployment on specific cloud
providers, right? When we say cloud native
applications, these are applications that predominantly
have more or less these four elements
in them, right? First of all,
DevOps, right? So the DevOps way
of developing an application, which means
adopting the DevOps principles in developing
applications, in maintaining the application, and in how different
teams collaborate with each other
to produce the final application. That is a key
element of cloud native applications. And second,
you will always see continuous integration and continuous
delivery. This is actually a
process by which the software is developed incrementally
and there is a high degree of automation so
that you are able to develop features
and push these features even into production on a continuous
basis, right? So that is another salient
feature of cloud native applications.
And typically, cloud native applications are also
containerized applications, because containers
deliver many of the
benefits that are expected of a cloud native
application. Containers are quick to start,
containers can be easily packaged and run in
the same manner on a multitude of platforms and environments,
and containers are now the de facto standard
for packaging and distributing cloud native applications.
And the fourth dimension of cloud native applications is
microservices. Cloud native applications are
typically developed as decoupled
microservices. So each and every microservice
has the business logic and also the data store required
for storing the data. And microservices
expose their business logic to other microservices
and also to the external environment via clearly defined
APIs. So now you have an understanding
of what cloud native applications are.
Now let's talk about encryption.
In order to explain encryption,
let me first dissect the entire cloud native
application into four different layers.
At the topmost layer we have the microservice,
which actually encapsulates the business logic
or the programming logic of the application itself.
And underneath the microservice layer, we have the
database layer. This layer is responsible for
storing the data. Invariably,
microservices rely on some sort of storage
for processing the information and for storing the
state of the application. And microservices
can also be developed as a stateless microservice,
which means it will not be holding the state information
or the data information in itself.
But typically, microservices also have a
database in which they store the state of
the application, and also sometimes the state of the environment
in which the application is running. And the third
layer is actually the volume,
the volume on which the database is running.
This is typically a volume that is carved out of
a physical disk or a virtual disk. Volumes
are the point at which databases create
and store files. Right.
And the fourth layer is the actual infrastructure
layer, which consists of the disks themselves.
These could be physical disks that can be found in direct
attached storage on a server, or they could be
virtual disks that
are created and supplied by the cloud service provider.
Whatever it is, a disk is the
bottommost layer within the entire layered
architecture. And when you
see encryption through the prism of these different
layers, right, data can be encrypted
at any of these four layers. For instance,
data can be encrypted by the microservice
itself before it is stored in the database.
So in this case, what happens is, apart from the
business logic, the microservice will also have
the logic for encrypting and decrypting the
data by itself. And it will also rely on some
sort of key management system.
Either it will be managing the keys itself or it
will be using an external key management service in
order to manage the keys. And the microservice itself
will be capable of keeping track
of which keys are used for
encrypting a certain piece of data,
so it knows how to decrypt the data.
So all the logic for encrypting the data, and also for
decrypting the data, is taken care of by
the microservice itself, and the database or
any layer underneath it is not doing
any sort of encryption or decryption. Okay?
Whereas in the second case, encrypting data at the database,
right, the microservice doesn't perform
any sort of encryption or decryption. It handles
plain text data. And it is the
responsibility of the database to do the encryption.
And again, databases can offer many
advanced features. For instance, certain databases
will be able to generate and manage their own
keys, whereas certain databases will
again rely on an external service to do that.
And again, if you are talking about databases
that are consumed as a managed service
from a cloud provider, there could be databases
which support user managed keys, and there
could be databases which support only the cloud
provider way of managing the keys.
And there could also be performance related implications that
you have to keep in mind, because certain database engines
are capable of offering very good
performance even on encrypted data, but certain database
engines are not that performant. So you will have to be
very careful in determining whether the
performance that is offered
by the database will be suitable for you. And also
you should consider the overhead that
you need to bear in terms of managing the keys and what
the repercussions are if a database
is breached. So those are
the other considerations that you need to take care of.
And the third thing is volume level encryption.
This is where you simply run
your database and microservice as is,
and both these layers will still be handling plain text,
and the entire responsibility of
encrypting the data will sit
at the volume level. So this is where you will
typically be using
a solution that is provided by the storage provider,
which will be responsible for encrypting and
decrypting the data. For instance,
if you are talking about a Kubernetes environment,
then you will probably make use of a CSI based provisioner
for provisioning the volumes, and provisioners
come with different feature sets. So you
may have to check whether a volume provisioner
supports encryption at the volume
level or at the storage
class level, and what is suitable for your use case.
And accordingly you will have to choose the solution.
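The check described here, whether a storage class actually asks its provisioner for encryption, can be sketched roughly as follows. This is only an illustration and not something shown in the talk. It assumes the AWS EBS CSI driver (`ebs.csi.aws.com`) and its `encrypted` parameter as the one filled-in example; the storage class name `encrypted-gp3` and the helper `requests_encryption` are hypothetical, and other provisioners use different parameter names that have to be taken from their own documentation.

```python
import json

# Assumed mapping from provisioner name to the StorageClass parameter that
# requests encryption. Only the AWS EBS CSI driver entry is filled in here;
# entries for other provisioners must come from their own documentation.
ENCRYPTION_PARAMS = {
    "ebs.csi.aws.com": ("encrypted", "true"),
}


def requests_encryption(storage_class: dict):
    """Return True or False for a known provisioner, None for an unknown one."""
    rule = ENCRYPTION_PARAMS.get(storage_class.get("provisioner"))
    if rule is None:
        return None  # cannot tell; do not assume the volume is encrypted
    name, wanted = rule
    value = storage_class.get("parameters", {}).get(name, "")
    return str(value).lower() == wanted


# A StorageClass manifest expressed as a Python dict (hypothetical name).
encrypted_sc = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "encrypted-gp3"},
    "provisioner": "ebs.csi.aws.com",
    "parameters": {"type": "gp3", "encrypted": "true"},
}

if __name__ == "__main__":
    # The manifest can be dumped as JSON and inspected or applied from there.
    print(json.dumps(encrypted_sc, indent=2))
    print(requests_encryption(encrypted_sc))
```

Returning `None` for an unknown provisioner is a deliberate choice in this sketch: the application should not assume a volume is encrypted when it cannot tell.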
And finally, the last layer
for doing the encryption is the disk itself.
And these are disks that could be either
physical disks or virtual disks.
Whatever it is, the encryption and decryption take
place at the disk level. Okay, so typically
this is implemented by using certain kernel
modules, and these kernel modules will actually
intercept the data that is getting written into the file
system on the disk. These kernel modules will
be capable of managing the keys and encrypting and
decrypting the data. So as you see, there are four
different layers at which you can implement
encryption at rest. And there are
actually merits and demerits to
each of these layers.
So that is what we are going to talk about in the subsequent
slides. So let's enter into the patterns
for encrypting data at rest, right?
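As a small aid for following the patterns, the four layers described earlier can be written down as an ordered list. The sketch below is only an illustration of the layered view from the talk: for a chosen encrypting layer, it returns the layers above it, which all handle plain text. The names `LAYERS` and `plaintext_layers` are made up for this example.

```python
# The four layers from the talk, ordered from top to bottom.
LAYERS = ["microservice", "database", "volume", "disk"]


def plaintext_layers(encrypting_layer: str) -> list:
    """Return the layers above the encrypting layer; they all handle plain text."""
    return LAYERS[: LAYERS.index(encrypting_layer)]


if __name__ == "__main__":
    for layer in LAYERS:
        print(layer, "->", plaintext_layers(layer))
```

The lower the layer that does the encryption, the longer this list gets, which is the attack surface argument made for each pattern.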
Let's start with encryption by the microservice
itself. So what happens in this way
of encrypting data at rest?
We saw that earlier. This is the case in which the
application microservice itself has the logic
or the responsibility for encrypting and
decrypting the data. So typically
in these cases you will have to watch out for things like sorting
and searching of data. If your application relies on the database
for a lot of sorting and searching of data, then
this is not possible, because the data is stored in
an encrypted format in the database, and the database engine doesn't
know how to sort the data and it will not be capable of searching the
data. So this is something that you will have to keep in mind, whether
your application can be written in such a fashion that
it can tolerate this limitation. And by
the way, if you have existing applications
which are already talking to a database,
it could be expensive for
you to redesign those applications to introduce the
logic of encrypting and decrypting the data in your application.
And by the way, this pattern of
encrypting at the microservice level has the smallest
attack surface. The reason is that at the very first
layer itself, where the data is generated,
the data is encrypted, and as the data cascades through
the underlying layers it travels in an encrypted form,
which means the attack surface in this case is the smallest.
So you get a high degree of protection for your data.
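The sorting and searching limitation of this pattern can be seen in a small sketch. It is an illustration under stated assumptions, not code from the talk: an in-memory SQLite database plays the role of the database layer, and a one-time pad kept in a hypothetical `pads` dictionary stands in for a real cipher so that the example runs on the standard library alone. A real service would use a vetted cipher and a key management service.

```python
import secrets
import sqlite3

# Hypothetical per-row key material owned by the microservice. A one-time pad
# stands in for a real cipher so the sketch needs only the standard library.
pads = {}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email BLOB)")


def store(row_id: int, email: str) -> None:
    """Encrypt in the application, then hand only ciphertext to the database."""
    plaintext = email.encode()
    pads[row_id] = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, pads[row_id]))
    db.execute("INSERT INTO customers (id, email) VALUES (?, ?)", (row_id, ciphertext))


def load_all() -> list:
    """Fetch every row and decrypt it in the application."""
    rows = db.execute("SELECT id, email FROM customers").fetchall()
    return [
        bytes(c ^ k for c, k in zip(blob, pads[row_id])).decode()
        for row_id, blob in rows
    ]


for i, address in enumerate(
    ["carol@example.com", "alice@example.com", "bob@example.com"], start=1
):
    store(i, address)

# The database engine cannot match a plain-text search term against ciphertext.
db_hits = db.execute(
    "SELECT id FROM customers WHERE email = ?", (b"alice@example.com",)
).fetchall()

# Sorting and searching therefore happen after decryption, inside the service.
app_sorted = sorted(load_all())
app_hits = [email for email in load_all() if email.startswith("alice")]
```

The database query finds nothing, while the application finds and orders the rows only after fetching and decrypting all of them, which is the cost this pattern imposes on search-heavy applications.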
And of course, when using this pattern you should
be very mindful of key management issues, because
key management is now the responsibility of the application. Of
course, the application can rely on other microservices
to perform the key management activities.
But at the end of the day the application is still
accountable and responsible for
key management. So this is something that you will have
to keep in mind if you are going to choose this pattern.
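To make the key management responsibility concrete, here is a minimal sketch of a microservice that records which key encrypted each piece of data, so that older records still decrypt after a new key is introduced. Everything in it is an assumption for illustration: the `KEYS` dictionary is a hypothetical stand-in for a key management service, and the cipher is a standard-library stand-in (HMAC-SHA256 used as a keystream, plus an HMAC tag), not a recommendation. A real service would use a vetted authenticated cipher from a maintained library.

```python
import hashlib
import hmac
import secrets

# Hypothetical in-memory key store; a real service would ask a key management
# service for keys instead of holding them in a dict.
KEYS = {"k1": secrets.token_bytes(32)}


def _subkey(key: bytes, label: bytes) -> bytes:
    """Derive separate encryption and MAC keys from one master key."""
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in cipher core: HMAC-SHA256 in counter mode (illustration only)."""
    blocks = [
        hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        for counter in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def encrypt_field(plaintext: bytes, key_id: str) -> dict:
    """Encrypt inside the microservice and record which key was used."""
    key = KEYS[key_id]
    nonce = secrets.token_bytes(16)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    # The key ID travels with the ciphertext; the database stores this record as is.
    return {"key_id": key_id, "nonce": nonce, "ciphertext": ciphertext, "tag": tag}


def decrypt_field(record: dict) -> bytes:
    """Look up the key by its recorded ID, verify integrity, then decrypt."""
    key = KEYS[record["key_id"]]
    expected = hmac.new(
        _subkey(key, b"mac"), record["nonce"] + record["ciphertext"], hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, record["tag"]):
        raise ValueError("record was modified or the wrong key was used")
    stream = _keystream(_subkey(key, b"enc"), record["nonce"], len(record["ciphertext"]))
    return bytes(c ^ s for c, s in zip(record["ciphertext"], stream))
```

Storing the key ID next to the ciphertext is one simple way for the application to keep track of which key decrypts which record; it also means the application must never lose a key that any stored record still refers to.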
The next pattern is encryption by the database itself.
This is where the database itself has the necessary
capability to do the encryption and the
decryption. And by the way,
predominantly this requires the database
containers to be run in privileged mode,
and this might not be suitable for use
cases with special security requirements. Your organization
might have security guidelines
which prevent you from running privileged containers in
production.
With this kind of pattern you will have to choose
carefully and see whether the database is capable
of performing encryption when it is run without
privileged mode. And most of the
databases that provide the encryption functionality use
a tool called dm-crypt. Typically
these databases have written some wrappers around dm-crypt
in order to provide functions
and capabilities that the database engine can use.
So you will have to be aware of
what kind of mechanism the database
employs, whether it uses a dm-crypt kind
of solution or whether the database itself
has its own solution for encrypting and decrypting
data. And with this kind
of pattern you will see limited support in
open source software. Typically, if you are
used to using open source software as
your database solution, you will see
that not every open source solution has
this capability, and you will nevertheless
have to invest in commercial plugins or enterprise
licensed versions of your database if
you choose this pattern. But at the end of the day,
it is a decision that you will have to make considering the benefits
and the overhead
that you will have to bear in terms of cost and complexity.
And database encryption is very
simple, because you don't have to rewrite your applications
and you also don't have to consider changes
to your storage solution or changes to
your infrastructure in order to encrypt data at rest.
So it is very much a plug and play solution
if you find a database
solution that fits your application's needs.
The third pattern that I want to present is volume level
encryption, and you will typically find
this pattern of encryption widely used
in public cloud environments. Public cloud providers invariably
provide you with volume services which
have an inherent capability for encryption, and certain
public cloud providers also provide you the mechanism
of managing the keys yourself rather than
the cloud provider managing the keys. So that could be
another sweet spot for you to consider, the public cloud
provider's managed service for volume encryption. And then there are
third party storage providers. Many of these storage
providers support volume level encryption.
Sometimes they have
implemented their own key management solution
for managing the keys, and sometimes they allow you to
bring your own key management solution, which the
storage provider will talk to.
But invariably we are seeing many such storage providers
supporting volume level encryption. So this could be
a choice for you if you
have the ability to choose the
storage provider and if you have control over the infrastructure
aspects of the storage on which your application is running.
And by the way, CSI plugins also have support
for encryption. They have exposed some
APIs of the storage provider, but not all CSI
plugins have exposed the complete encryption
feature of the storage provider. So there are some limitations
that you might encounter in CSI
plugins. If you are deploying Kubernetes applications which rely on
CSI plugins to provision volumes, then you will have to carefully
see what support is provided by
the CSI plugin, or sometimes you may have to
write your own CSI plugin which has the complete functionality that
you require for data encryption at rest. And many OSS
solutions are available which support volume level encryption.
So if you are into open source software, this could
be a very viable solution for you to do volume level
encryption. One key disadvantage in volume level
encryption is that if you are deploying your application into an
infrastructure or an environment over which you don't
have much control, right,
then that environment might not have the ability to do
volume level encryption. Your application then cannot
assume that whatever volumes it consumes
will be encrypted, right? So if you want to
ensure end to end security of
the applications that you deliver to your customers,
and you want to enforce certain rules
on how the storage provisioner should work,
but it is not feasible for you to enforce those
rules, then you may have to consider the previous
patterns for applying encryption. The final
pattern that you should consider is disk level
encryption. And again, this is not feasible
sometimes, because you may never have control over the disks.
Disks might not get fully exposed to
your applications. You may have to consume
only volumes at the application level. And if the encryption
is happening at the disk level, then unless
you have tight control over or visibility into your
infrastructure and environment, it might not be feasible for you to do
disk level encryption. And this
has the largest attack surface. That's because the
encryption happens only at the bottommost layer,
right? So at the microservice layer, at the
database layer, at the volume layer, everything is plain text. Only when
the data reaches the disk is the data
encrypted. Okay, so which means the attack surface
is large. The attacker can still steal the
data at the microservice level, at the database level,
or even at the volume level. And by the way, this is considered
to be the simplest solution, the reason being that
disk level encryption has been around for
a while, many of the disk level encryption solutions
are very mature, and you have lots of tooling
to automate this kind of encryption.
So this could turn out to be
the simplest solution for your needs, if
you have the required amount of control over and
visibility into the environment. And by the way, you have the
luxury of a standardized format for hard
disk encryption, for instance LUKS,
which is a Linux based format
for disk encryption. This means your applications could
be highly portable, because you rely on a
standardized format for disk encryption.
So these are some of the advantages that you gain out of disk level encryption.
But again, as I said earlier, you will have to have
good control over and visibility into the infrastructure. So if
you are deploying your application into an environment which you
can design, in which the infrastructure portion
is something that you can design upfront, in which you can
enforce certain rules, and in which you can bring in your disk level encryption
solutions, then this could be the simplest solution.
But again, the attack surface of this pattern
is very large. So if you are looking for a highly
sophisticated, highly secure solution, then this might not be
suitable for you. Okay, so these are some of the considerations
that you may want to keep in mind. Okay,
now that's it. I am more or less at the end of my presentation.
We talked about what cloud native applications are and
what the salient features of cloud native applications are.
Then we saw what the different layers of
a cloud native application are, and how encryption
can happen at these different layers.
And finally, we talked about the
various patterns and, for each of these patterns, the considerations
that you should be aware of, the benefits, and
the disadvantages.
Using this information, I hope you will
be able to choose the right
type of data encryption at rest solution for
your needs. So with that,
I come to the end of this presentation. Thank you so much
for watching this talk and please connect
with me on Twitter and if you have any questions,
you can reach out to me on Twitter itself. Thank you
so much, have a nice day.