Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. It's my turn to talk, and my topic is
data security. As you can see from the title, all the key
phrases, dynamic data masking and data encryption, are
about data security, especially for your database.
I'm Trista Pan, co-founder and CTO of SphereEx.
Today we will talk about data security and the good
technologies that help us deal with it. Speaking of
today's content, we will go through the following items.
First, I will give a short introduction to data lifecycle
management. That is the background for why we consider data
security at all, and for which popular technologies can help
us deal with it. Then we will delve into those technologies
and use them to build our solution. If we have time, I will
walk through the hands-on practice, the demo, so you can
learn more about today's solution. If we don't have enough
time, you can refer to the slides and do it by yourself.
I will spend more time on the technologies, the solutions,
the architectures, and how to use an open source project to
help us do all of this.
A little about myself. I'm the SphereEx co-founder and CTO,
and I really love open source and coding. So I spend a lot
of time in the Apache Software Foundation, giving tips to
other open source incubator projects to help them run their
projects, especially around databases, because my
professional area is distributed database systems and the
cloud. If you're interested in today's topic or in future
topics, for example posts about open source, the open source
business, or databases, you can take a look at my LinkedIn,
GitHub, or Twitter. So let's get started with the first
part, data security. It's a big topic these days. When you
Google it, you find that all over the world there is bad
news: our company or other companies have suffered data
breaches, or we need to comply with laws and policies, or
we need to protect our data, and so on.
So no matter which of these issues you want to solve,
you will have to consider data security.
In this next part we will consider which popular or common
technologies can help us with data security and data
protection. First we need to understand the whole lifecycle
of data management, because different techniques apply to
different phases of the data lifecycle. When we speak of
data lifecycle management, there are some important phases.
First, we generate or collect data from different places or
different data sources. Second, we store all the data and
manage it. Next, we share it with different departments so
they can use the data to create value for the industry and
for the company. The last phase is that we find some data is
no longer necessary in the production environment, so we
need to archive or delete it. Across the whole lifecycle,
if we want to do data security or protection, we need to
apply different policies or standards to different phases.
For example, at the first stage, when we generate or collect
data from different places or data sources, we need to
manage our data sources, right? We should only allow trusted
data sources and trusted data into our system. So we have
policies or strategies, or inspector and detector tools, to
help us manage the data sources and the data during the
generation or collection process. That is just policy.
I believe there are many tools, commercial or open source,
that can help us with that part. But at least I want you to
know that if you want to do data lifecycle management,
especially with data protection in mind, you need to pay
some attention to that part.
The second one is about storing and managing data. Data
encryption is the necessary one here, and it's the common
and popular one. When we speak of data encryption, you need
to consider how the data is processed, because you first
move or transfer your data from one place to another and
then manage it. So data encryption happens both in transit
and at rest.
So data encryption is the key technology. What is data
encryption? It is just what the name suggests. In order to
protect our data, we use an encryption algorithm and an
encryption key to convert our plain text data into cipher
text. Then, if someone wants to access the data and get the
plain text, they have to use the encryption algorithm and
the encryption key to recover the exact content of the
encrypted data. That process is called decryption, right?
But if you don't know the correct key, or you are not
allowed to get the data, for example when we suffer from
cyber attacks, then even if you get the data out of the
database, you only get the cipher text, because you don't
know the encryption algorithm or the encryption key. So you
don't know the meaning of each column or each row. So that's
data encryption.
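The encrypt and decrypt round trip described above can be sketched in Python. This is only an illustration, not what the talk or any product uses: Python's standard library has no AES, so the sketch swaps in a toy stream cipher built from SHA-256 plus an HMAC tag. The names `encrypt`, `decrypt`, and `_keystream` are hypothetical, and the construction is not meant for production use.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key and nonce (SHA-256 in counter mode)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: str) -> bytes:
    """Turn plain text into cipher text: nonce + XOR-ed body + integrity tag."""
    nonce = os.urandom(16)
    data = plaintext.encode("utf-8")
    body = bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))
    # Toy shortcut: a real design would use separate keys for cipher and tag.
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def decrypt(key: bytes, ciphertext: bytes) -> str:
    """Recover the plain text; raise ValueError if the key does not match."""
    nonce, body, tag = ciphertext[:16], ciphertext[16:-32], ciphertext[-32:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or corrupted cipher text")
    data = bytes(a ^ b for a, b in zip(body, _keystream(key, nonce, len(body))))
    return data.decode("utf-8")


key = os.urandom(32)                       # the encryption key
token = encrypt(key, "123 Main Street")    # what an attacker would see at rest
print(decrypt(key, token))                 # the key holder recovers the plain text
```

Someone who copies `token` out of the database without the key only holds opaque bytes, and `decrypt` with a different key raises `ValueError` instead of returning data.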
There are some well-known and popular encryption
algorithms, such as AES, DES, or RSA. If you want to use
this technology, you can just use them, because they are
mature and popular.
Yes, and if you want to create your own encryption
algorithm, that's fine too. It just depends on your
scenario. The next one is data masking. Data masking solves
the problem of data usage and sharing, because we need to
share our data and allow people to access and use it.
Sometimes, for example, we need to use production or online
data for analytics, testing, or training. We cannot just
allow our private data or messages to be used in the testing
environment, right? So we need to do data masking: scramble
the data and create static data that others can use to
create value, to test, to do analytics.
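A minimal sketch of this kind of static masking, assuming a hypothetical table of customer rows with an `ssn` field. The function names, the `*` mask character, and the choice to keep the last four digits are illustrative assumptions, not something stated in the talk.

```python
def mask_ssn(ssn: str, mask_char: str = "*", keep_last: int = 4) -> str:
    """Replace every digit except the last `keep_last`, keeping separators in place."""
    total_digits = sum(ch.isdigit() for ch in ssn)
    digits_to_mask = max(total_digits - keep_last, 0)
    masked = []
    for ch in ssn:
        if ch.isdigit() and digits_to_mask > 0:
            masked.append(mask_char)
            digits_to_mask -= 1
        else:
            masked.append(ch)  # separators and the kept digits pass through unchanged
    return "".join(masked)


def make_test_copy(rows):
    """Build a masked copy of production rows for a testing or analytics environment."""
    return [{**row, "ssn": mask_ssn(row["ssn"])} for row in rows]


production_rows = [{"name": "alice", "ssn": "123-45-6789"}]
print(make_test_copy(production_rows))
# [{'name': 'alice', 'ssn': '***-**-6789'}]
```

The masked value keeps the length and the dash positions of the original, so test code that validates the format still works, while the sensitive digits never leave production.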
There is an image here that shows it clearly. Take this SSN
column, for example. Before masking it holds the plain text,
like the one, two, three here, so you can read its meaning.
Afterwards we scramble the data and use symbols to replace
parts of it, so it becomes a different value. But it still
shares the basic structure of the original data, so people
can still make use of part of the information in that type
of data, right? So that's data masking. Both are very common
when we speak of data security. Oh, by the way, about data
encryption in transit: there are also some mature
technologies for us to consider.
For example, SSL, the Secure Sockets Layer. It's practical:
most data sources and databases implement that layer or
interface, so you can just configure your system and your
databases to encrypt your data in transit, right?
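On the client side, that configuration often comes down to a few lines. The sketch below uses only Python's standard `ssl` module. The helper name `make_client_tls_context` and the TLS 1.2 minimum are assumptions for illustration, and how the context is handed to a database driver depends on the driver.

```python
import ssl


def make_client_tls_context() -> ssl.SSLContext:
    """Build a client-side context that verifies the server and refuses old protocols."""
    # The default context already checks the server certificate and hostname.
    context = ssl.create_default_context()
    # Assumption for this sketch: reject anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_tls_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
# A socket would then be wrapped with:
#   context.wrap_socket(sock, server_hostname="db.example.internal")  # hypothetical host
```

Many database drivers expose an equivalent switch of their own, for example the `sslmode` connection parameter in PostgreSQL's libpq, so in practice this is usually a configuration change rather than new code.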
So if someone captures our data in transit, they cannot
read it, because we use SSL or TLS (TLS is the successor to
SSL) to protect the data while it is transferred. As for
archive and destroy, first we need to consider whether the
data is still necessary. Then we can use time to live (TTL)
to manage it, letting the system automatically archive or
expire table rows. We cannot just create data; we also need
to delete it, right? So those are the popular technologies
for data lifecycle management, especially around data
security. But today we will use just two important and
popular ways to protect our data. The first is data
encryption, in transit and at rest. The second is dynamic
data masking. These features help us protect our data.
But no worries, because later on I will introduce an open
source project that you can use to get these features.
First, though, I have to introduce the deployment
architecture, because when we want to do data encryption or
data masking there are many tools and projects that can
help. They have their own pros and cons, so I want to give
some introduction to them.
That way, even if you don't want to use the open source
project today, you can consider other ways, and at least you
will know the basic pros and cons of each solution. The
first one: you can do data encryption at the application
level. That suits you if you have a few small cases and
don't want to do a lot of work in the data security area.
You can let your programmers or developers make code changes
based on your current cases, for example just using AES or
another encryption algorithm. It's simple, right? That means
you do data encryption at the application level. The second
one is encryption at the proxy level, because a data proxy,
firewall, or gateway can do the data encryption or data
masking for you. Today I will introduce this solution,
first because you don't need code changes at the application
level. That matters especially when you have thousands of
microservices, right? You cannot add encryption to each of
those microservices one by one. That's a terrible workload.
So data encryption at the proxy level saves you all of that
tedious work.
The last one is database encryption, because databases have
basic encryption capabilities. But that is up to the
database: some databases support it and some don't, and
some have good support and some don't. So you need to do
some research on your databases. Also, some databases can
only encrypt or decrypt your data as a whole. All the data
has to be encrypted or decrypted automatically, so it's not
very flexible.
The same thing happens, more seriously, with file-level or
disk-level encryption. Your file or your disk doesn't know
the meaning of each query, column, table, or row. It just
treats everything as one body of data, all the same. So it
decrypts or encrypts the whole data, even though in some
cases you only want the user privacy information to be
automatically encrypted and decrypted.
You don't want to do the same for the other data, because
that lowers performance, right? Encryption is extra
computing work. But some databases, and your files and
disks, don't know which data is user data or private data.
They just decrypt everything together in one go, and that
lowers performance, because they have to do extra work on
all of the data. So there are a lot of different solutions
to reach the goal of data encryption or dynamic data
masking, and you can pick any one of them. But today I will
introduce proxy-level encryption. The first reason, as I
said, is that there are no code changes. The second is that
some proxies let users define customized encryption
strategies or policies.
So you can encrypt only part of the data, whatever your
scenario needs, right? To do that, we will use an open
source project, Apache ShardingSphere. It's actually a
distributed SQL engine for data sharding, data scaling, and
data encryption. So data encryption is just one of the key
features of this project.
It can also help you with data sharding or data scaling.
But today we will focus on data encryption: how to use this
project to automatically do data encryption and dynamic
data masking.
This project has been open source for more than six years,
so it has a big community, as you can see here. It has been
released more than 50 times and has more than 1500
contributors. It also has a lot of documentation and user
cases for you to learn from. So you don't need to worry that
it's brand new, or about using it in production, because as
you can see it's really a mature project. It's an Apache
top-level project. The nice part is that this project
provides two clients for you to choose from. The first one
is for Java applications: ShardingSphere-JDBC is a
lightweight Java framework, so you can pull it in with Maven
and use it to do data encryption and data masking. The other
client is ShardingSphere-Proxy. Like I said before, it's a
database proxy, so it acts as a database server. It presents
itself as a PostgreSQL server or a MySQL server, so your
application can connect to ShardingSphere-Proxy in the
traditional way, as if it were a PostgreSQL or MySQL server.
This proxy understands the meaning of your queries and can
automatically do data encryption and dynamic data masking
for you. As you can see here, the proxy is deployed between
your application and your database, and it can parse your
SQL. So it can help you with data sharding, data encryption,
data masking, read/write splitting, and distributed
transactions, all the good features when you use a database.
But today we will use just two important features of this
project: data masking and data encryption. We'll use
ShardingSphere-Proxy for the demo. But if your application
is developed in Java, you can consider ShardingSphere-JDBC,
because it's so lightweight. You can just use Maven.
For the proxy, this project also provides Helm charts and
an operator that can deploy ShardingSphere-Proxy in one
click. So it's simple even if your application lives in a
Kubernetes cluster. So basically, how do you use it to do
data encryption and data masking? You first deploy
ShardingSphere-Proxy. Each of your queries first goes to
ShardingSphere-Proxy, which parses your SQL, understands
what you mean, encrypts or decrypts your data automatically,
and sends the encrypted data to your database.
So even if someone launches a cyber attack on your
database and gets your data, all of it is already
encrypted.
Right. So here you can see, for example, your application
and your PostgreSQL instance. You first deploy
ShardingSphere-Proxy, and then you create a table. The table
contains all of your user information: a user ID, user
name, user telephone, and user address, right? Here we will
use the user telephone for data masking. That means you tell
ShardingSphere-Proxy: hey, when I insert a row into this
user table, please do data masking on the user telephone
for me.
Then, when you read the data through your application
interface, you find the user telephone is already masked,
right? It's the same here: we can have the user address
automatically encrypted in your database. When your
application wants the plain text of the user address,
ShardingSphere-Proxy first gets the cipher text from your
database, automatically decrypts the user address, and
returns the plain text to your application.
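The flow just described can be simulated in a few lines of Python. This is a toy model of the idea, not ShardingSphere's implementation or API: `ToyProxy`, `toy_encrypt`, `toy_decrypt`, and `mask_middle` are hypothetical names, the in-memory list stands in for the PostgreSQL table, and the cipher is a SHA-256 keystream stand-in for AES because Python's standard library has no AES.

```python
import base64
import hashlib


def _xor_with_keystream(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream (toy stand-in for AES, not secure)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def toy_encrypt(key: bytes, text: str) -> str:
    return base64.b64encode(_xor_with_keystream(key, text.encode("utf-8"))).decode("ascii")


def toy_decrypt(key: bytes, token: str) -> str:
    return _xor_with_keystream(key, base64.b64decode(token)).decode("utf-8")


def mask_middle(value: str, keep_start: int = 3, keep_end: int = 4) -> str:
    """Hide the middle of a value, keeping a few leading and trailing characters."""
    if len(value) <= keep_start + keep_end:
        return "*" * len(value)
    hidden = len(value) - keep_start - keep_end
    return value[:keep_start] + "*" * hidden + value[-keep_end:]


class ToyProxy:
    """Sits between the application and the 'database' and applies column rules."""

    def __init__(self, key: bytes):
        self._key = key
        self.database_rows = []  # what the backing database actually stores

    def insert(self, row: dict) -> None:
        stored = dict(row)
        # Encrypt rule: user_address is written to the database as cipher text.
        stored["user_address"] = toy_encrypt(self._key, row["user_address"])
        self.database_rows.append(stored)

    def select_all(self) -> list:
        result = []
        for stored in self.database_rows:
            row = dict(stored)
            # Decrypt on the way out, so the application sees plain text.
            row["user_address"] = toy_decrypt(self._key, stored["user_address"])
            # Mask rule: user_telephone is masked only in the query result.
            row["user_telephone"] = mask_middle(stored["user_telephone"])
            result.append(row)
        return result


proxy = ToyProxy(key=b"demo-key")
proxy.insert({"user_id": 1, "user_name": "alice",
              "user_telephone": "13812345678", "user_address": "1 Example Road"})
print(proxy.select_all()[0]["user_telephone"])  # 138****5678
print(proxy.database_rows[0]["user_address"])   # cipher text as stored
```

The application only calls `insert` and `select_all`, so its code does not change when the rules change. The stored telephone stays in plain text and is masked only in the result, which is what makes the masking dynamic.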
So for your application everything is transparent, but you
will find that all the data in your database is already
encrypted. Now, to tell ShardingSphere to do these things,
because it is ShardingSphere and not a single standard
PostgreSQL database, we need to use another SQL. That is
DistSQL, the distributed SQL.
It's the SQL dialect of ShardingSphere. You use this SQL
language to tell ShardingSphere to do data encryption, data
masking, or data sharding. It's very easy to use, because as
you can see in this case, it's similar to standard SQL.
For example, if you want to create a regular, I mean a
normal, table, you run CREATE TABLE, right? But if you want
to tell PostgreSQL, or rather ShardingSphere, to do data
masking or data encryption for you, then you create an
encrypted table.
Here you can see we want the user ID, I mean, to use the
AES encryption algorithm to do the data encryption, right?
So it's just another language, a distributed one, for
communicating with ShardingSphere-Proxy to get these
advanced features. The demo shows how to finish all of this
work. I have no time to go into more detail, but you can
refer to it and do it by yourself. As you can see here, for
example, we can create the mask rule for this user table,
and we can create the encrypt rule for this table. Then,
when you get the data through ShardingSphere-Proxy, you see
the following information.
Your address is automatically encrypted and decrypted.
Here, because you read the data through the proxy, the proxy
has already decrypted the data from your database, so you
see all the plain text from your command, right? And because
we use data masking on the user telephone column, from your
application's view part of the user telephone is masked,
right? But if you query your PostgreSQL database directly,
you can see that everything in the user address column is
cipher text, right? And everything in the user telephone
column is plain text, the original content. There is no data
masking in your PostgreSQL database. So that's the demo.
All right, that's all for today's talk. If you have any
questions, you can reach me on Twitter or on LinkedIn.
I hope it was helpful for you, and see you next time.