Conf42 Quantum Computing 2023 - Online

Dynamic data masking & encryption for MySQL/PostgreSQL with no code changes


Abstract

How to ensure data privacy and security? How is sensitive information in the database protected? How do data masking and encryption, a basic requirement for protecting data privacy and security, enable this function? This talk will focus on these questions in detail.

Summary

  • Trista Pan is the SphereEx co-founder and CTO. Today we will talk about data security and the technologies that help us deal with it. And if we have time, I will introduce the hands-on practice or the demo.
  • Data security is a big topic these days. What are the popular or common technologies that can help us protect data? Data encryption is the key technology. If we want to do data security or protection, we need to apply different policies to different phases.
  • The first is data encryption in transit and at rest. The second is dynamic data masking. These features help us protect our data. Later on I will introduce an open source project that you can use to get these features.
  • There are many tools or projects that can help us with such features. They have their own pros and cons. Even if you don't want to use the open source project today, I recommend you consider the other ways.
  • Apache ShardingSphere is an open source project, a distributed SQL engine for data sharding, data scaling and data encryption. Today we will use ShardingSphere-Proxy to do data encryption and data masking.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. So it's my turn to talk about this topic. It's all around data security: as you can see from the title, the key phrases, dynamic data masking and data encryption, are all about data security, especially for your database. I'm Trista Pan, the SphereEx co-founder and CTO. Today we will talk about data security and the good technologies that help us deal with it. Speaking of today's content, we will go through the following items. First, I will give a little introduction to data lifecycle management, which is the background for why we consider data security, and to the popular technologies that help us deal with it. Then we will delve into these technologies and use them to build our solution. And if we have time, I will introduce the hands-on practice, or the demo, so you can learn more about today's solution. But if we don't have enough time, you can just refer to the slides and do it by yourself. I will give more introduction to these technologies, the solutions, the architectures, and how to use the open source project to help us do this.
So, something about myself. I'm the SphereEx co-founder and CTO, and I really love open source and coding. So I spend a lot of time in the Apache Software Foundation, mentoring other open source incubator projects to help them run their projects, especially around databases, because my professional area is distributed database systems and cloud. So if you're interested in today's topic or in future topics, for example posts about open source, open source business, or databases, you can take a look at my LinkedIn, GitHub, or Twitter.
So let's get started with the first part, data security. It's a big topic these days, because when you Google it, you find that in many places there are events or bad news where our company or other companies suffered data breaches, or we need to comply with laws or policies, or we need to protect our data. So no matter which issue you want to solve, you will consider data security. In this part we will look at which popular or common technologies can help us with data security or data protection. First we need to know the whole lifecycle of data management, because each technique applies to a different phase of the data lifecycle.
Speaking of data lifecycle management, there are some important phases. First, we need to generate or collect data from different places or different data sources. Second, we need to store all the data and manage it. Next, we need to share it with different departments and allow them to use the data to create value for the industry, for companies. And the last phase is that we find some data is unnecessary for the production environment, so we need to archive or delete it.
Across the whole lifecycle, we find that if we want to do data security or protection, we need to apply different policies or standards to different phases. For example, at the first stage, when we generate or collect data from different places or data sources, we need to manage our data sources, right? We need to allow only trusted data sources and trusted data into our system. So we have policies or strategies, or inspection or detection tools, to help us manage the data sources, I mean the generation or collection process.
That's just the policy. I believe there are many tools, commercial or open source, that can help us with that part. But at least I want you to know that if you want to do data lifecycle management, especially around protecting data, you need to pay attention to that part.
The second one is about storing and managing data. Data encryption is the necessary one, the common one, the popular one. When we speak of data encryption, you need to consider the process of the data, because you first need to move or transmit your data from one place to another and then manage it. So data encryption happens in transit and at rest. Data encryption is the key technology. So what is data encryption? It's just like the name suggests. In order to protect our data, we use an encryption algorithm or mechanism and an encryption key to convert our plaintext data into ciphertext. Therefore, if someone wants to access the data, wants to get the plaintext, they have to use the encryption algorithm and the encryption key to find out the exact content of the encrypted data. That process is called decryption, right? But if you don't have the correct key, or you are not allowed to get the data, for example when we suffer a cyber attack, then even though you get the data from the database, because you don't have the encryption algorithm or encryption key, you only get the ciphertext. So you don't know the meaning of each column or each row. So that's data encryption. There are some well-known and popular encryption algorithms, for example AES, DES, or RSA. If you want to use such technologies, you can just use them, because they are mature and popular. But if you want to create your own encryption algorithm, that's fine, that's okay. It's just up to you.
It depends on your scenarios. The next one is data masking. Data masking solves the issue of data usage and sharing, because we further need to share our data and allow people to access and use it. Sometimes, for example, we need to use production or online data for analytics, testing, or training. We cannot just allow our private data or messages to be used in the testing environment, right? So we need to do data masking, to scramble the data and create static data that others can use to create value, to do the testing, to do the analytics. There is an image here that shows it clearly. For example, this column, SSN: before masking, like here, 123, it's plaintext and you know its meaning. But later we scramble the data and use some symbols to replace it, so it becomes another type of data. But it still shares the basic structure of the original data, so people can still leverage part of the information in this type of data, right? So that's data masking. It's very common when we speak of data security.
Oh, by the way, about data encryption in transit: there are also mature technologies to consider, for example SSL, the Secure Sockets Layer. It's practical. Most data sources or databases implement this layer or interface, so you can just do the configuration to let your system or databases encrypt your data in transit, right? Therefore, if someone gets our data in transit, they cannot read it, because we use SSL or TLS (TLS is the more advanced successor of SSL) to protect the data in transit. And about archive and destroy: first we need to consider whether such data is necessary or not.
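The masking idea from the SSN example above, replacing characters with symbols while keeping the basic structure of the value, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `mask_keep_format`, its parameters, and the sample value are hypothetical and do not come from the talk or from any particular masking tool.

```python
def mask_keep_format(value: str, visible_tail: int = 4, symbol: str = "*") -> str:
    """Hide letters and digits with `symbol`, except the last `visible_tail` of them.

    Separators such as '-' are left untouched, so the masked value keeps
    the same length and layout as the original.
    """
    total = sum(ch.isalnum() for ch in value)   # how many characters carry information
    to_hide = max(total - visible_tail, 0)      # how many of them must be hidden
    masked = []
    for ch in value:
        if ch.isalnum() and to_hide > 0:
            masked.append(symbol)
            to_hide -= 1
        else:
            masked.append(ch)                   # separator or visible tail
    return "".join(masked)


# Made-up sample value, for illustration only.
print(mask_keep_format("123-45-6789"))
```

Keeping the separators and the length means downstream tests or analytics that only rely on the shape of the value still work on the masked copy.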
And then we can use time to live (TTL) to manage it, so our system automatically archives or expires our table rows. Because we cannot just create data, we also need to delete data, right? So those are the popular technologies for data lifecycle management, especially around data security. But today we will just use two important and popular ways to protect our data. The first is data encryption, in transit and at rest. The second is dynamic data masking. These features help us protect our data. But no worries, because later on I will introduce an open source project that you can use to get these features.
But here I have to introduce the deployment architecture, because if we want to do data encryption or data masking, there are many tools or projects that can help us, and they have their own pros and cons. So I want to give some introduction to them. Then, even if you don't want to use the open source project today, you can consider the other ways, and at least you will know the basic pros and cons of each solution. The first one: you can do data encryption at the application level. That is for when you have small cases and you don't want to do a lot of work in this data security area. Then you can just let your programmers or developers make some code changes based on your current cases, for example just using AES or another encryption algorithm. It's so simple, right? That means you use the application level to do the data encryption. The second one is encryption at the proxy level, because some data proxies, firewalls, or gateways can help you do data encryption or data masking.
And today I will introduce that solution, first because you don't make code changes at your application level. Especially when you have thousands of microservices, right? You cannot change each of these microservices one by one. That's a terrible workload. So data encryption at the proxy level saves you all of that tedious work.
And the last one is database encryption, because databases have basic encryption capabilities. But that is up to the database: some databases support it and some do not, and some have good support and some do not. So you need to do some research on your databases. And some databases can only encrypt or decrypt your data as a whole: all the data is encrypted or decrypted automatically. So it's not so flexible. The same thing happens, more seriously, with file-level or disk-level encryption. Because your file or your disk does not know the meaning of each query, each column, each table, or each row, it treats all the data the same and just decrypts or encrypts the whole data. In some cases you just want the user privacy information to be automatically decrypted and encrypted. You don't want to do the same for the other data, because that makes the performance lower, right? You need to do the encryption computing work. But some databases, your files, your disk do not know which data is the user data or private data. They just decrypt everything together at one time, and that makes the performance lower and needs extra work. So there are a lot of different solutions to reach the goal of data encryption or data masking. You can just pick one of them.
But today I will introduce proxy-level encryption. The first reason, as I said, is that there are no code changes. The second is that some proxies allow users to define customized encryption strategies or policies, so you can encrypt only part of the data for your scenarios, right? And to do that, we use this open source project, Apache ShardingSphere. It's a distributed SQL engine for data sharding, data scaling, and data encryption. So data encryption is just one of the key features of this project. It can also help you do data sharding or data scaling. But today we will just focus on data encryption: how to use this project to automatically do data encryption and dynamic data masking. This project has been open source for more than six years. So it has a big community, as you can see here: it has been released more than 50 times and has more than 1500 contributors. And it has a lot of documents and user cases for you to learn more. So you don't need to worry that it's brand new, or worry about using it in production, because as you can see here, it's really a mature project. It's an Apache top-level project.
The nice part is that this project provides two clients for you to choose from. The first one is for Java applications: ShardingSphere-JDBC is a lightweight Java framework, so you can just use Maven to bring in ShardingSphere-JDBC and do data encryption and data masking. The other client is ShardingSphere-Proxy. Like I said before, it's a database proxy. The database proxy acts as a database server: it presents itself as a PostgreSQL server or a MySQL server. So your application just uses the traditional approach to access ShardingSphere-Proxy, as if it were a PostgreSQL server or MySQL server. Right.
And this proxy can understand the meaning of your queries and help you automatically do data encryption and dynamic data masking. As you can see here, the proxy is deployed between your application and your database, and it can parse your SQL. So it can help you do data sharding, data encryption, data masking, read/write splitting, and distributed transactions, all the good features when you use a database. But today we will just use two important features of this project: data masking and data encryption. We'll use ShardingSphere-Proxy for the demo. But if your application is developed in Java, you can consider ShardingSphere-JDBC because it's so lightweight. You can just use Maven. For the proxy part, this project also provides Helm charts and an operator that help you deploy ShardingSphere-Proxy with one click. So it's simple even if your application is living in a Kubernetes cluster.
So basically, how do you use it to do data encryption and data masking? You first deploy ShardingSphere-Proxy. Each of your queries first goes to ShardingSphere-Proxy, which parses your SQL, understands its meaning, encrypts or decrypts your data automatically, and sends the encrypted data to your database. So even if someone launches a cyber attack on your database and gets your data, all the data is already encrypted. Right. Here you can see, for example, that's your application and that's your PostgreSQL instance. You first need to deploy ShardingSphere-Proxy, and then you create a table. The table contains all of your user information: user ID, user name, user telephone, and user address, right? Here we will use user telephone for data masking. That means you tell ShardingSphere-Proxy:
Hey, if you find that I want to insert a row into this user table, please help me do data masking on user telephone. Then, when you get the data from your database through your application interface, you find that the user telephone is already masked, right? And the same here: we can have the user address automatically encrypted in your database. When your application wants to get the plaintext of the user address, ShardingSphere-Proxy first gets the ciphertext from your database, automatically decrypts the user address, and returns the plaintext to your application. So for your application everything is transparent, but all your data in the database is already encrypted.
To tell ShardingSphere to do these actions, because it's ShardingSphere and not a single standard PostgreSQL database, we need to use another SQL: DistSQL, the distributed SQL. It's the SQL dialect of ShardingSphere. You use this SQL language to tell ShardingSphere to do the data encryption, the data masking, the data sharding. It's very easy to use because, as you can see in this case, it's similar to standard SQL. For example, if you want to create a regular table, you run CREATE TABLE, right? But if you want to tell ShardingSphere to help you with data masking or data encryption, then you create an encrypted table. Here you can see we want the user ID, I mean, to use the AES encryption algorithm to do the data encryption, right? So it's just another language, the distributed SQL, to communicate with ShardingSphere-Proxy and help us use these advanced features. This demo will teach you how to finish all of the work. I have no time to give more introduction, but you can refer to it and do it by yourself.
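The transparent flow described above (the address is encrypted on the way into the database and decrypted on the way out, while the telephone is stored as-is and masked only in the result) can be modelled with a small Python sketch. This is not ShardingSphere code or DistSQL. `ToyProxy`, `mask_middle`, and the column names are hypothetical, and `placeholder_encrypt` / `placeholder_decrypt` only base64-encode the value so the example runs with the standard library. Base64 is not encryption; a real setup would use an algorithm such as AES, as mentioned in the talk.

```python
import base64


def placeholder_encrypt(plain: str) -> str:
    # NOT real encryption: base64 only stands in for an AES call.
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


def placeholder_decrypt(cipher: str) -> str:
    return base64.b64decode(cipher.encode("ascii")).decode("utf-8")


def mask_middle(value: str, keep_start: int = 3, keep_end: int = 2) -> str:
    # Keep a few leading and trailing characters, hide the rest.
    if len(value) <= keep_start + keep_end:
        return "*" * len(value)
    hidden = len(value) - keep_start - keep_end
    return value[:keep_start] + "*" * hidden + value[-keep_end:]


class ToyProxy:
    """Hypothetical stand-in for a proxy sitting between application and database."""

    def __init__(self, encrypt_columns, mask_columns):
        self.encrypt_columns = set(encrypt_columns)
        self.mask_columns = set(mask_columns)
        self.database = []  # rows exactly as the backend database would store them

    def insert(self, row: dict) -> None:
        # Encrypt only the configured columns before the row reaches storage.
        stored = {
            col: placeholder_encrypt(val) if col in self.encrypt_columns else val
            for col, val in row.items()
        }
        self.database.append(stored)

    def select_all(self) -> list:
        # Decrypt encrypted columns and mask masked columns in the result only.
        result = []
        for stored in self.database:
            row = {}
            for col, val in stored.items():
                if col in self.encrypt_columns:
                    val = placeholder_decrypt(val)
                if col in self.mask_columns:
                    val = mask_middle(val)
                row[col] = val
            result.append(row)
        return result


proxy = ToyProxy(encrypt_columns=["user_address"], mask_columns=["user_telephone"])
proxy.insert({"user_id": 1, "user_name": "alice",
              "user_telephone": "13812345678", "user_address": "1 Main St"})
print(proxy.database[0])    # what the database holds
print(proxy.select_all()[0])  # what the application sees
```

The application only calls `insert` and `select_all`; which columns are encrypted or masked is configuration on the proxy side, which is the point of doing this without code changes.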
You can see here, for example, that we create the mask rule for this user table and then create the encrypt rule for this table. Then, when you get the data through ShardingSphere-Proxy, you see the following. Your address is automatically encrypted and decrypted. Because you get the data from the proxy, the proxy has already decrypted the data from your database, so you see the plaintext in your client, right? And because we use data masking on the user telephone column, from your application's view part of the user telephone is masked, right? But if you access your PostgreSQL database directly, you can see that everything in user address is ciphertext, right? And everything in user telephone is plaintext. It's the original content. There is no data masking in your PostgreSQL database. So that's the demo.
All right, that's all for today's talk. If you have any questions, you can reach me on Twitter or LinkedIn. I hope it's helpful for you, and see you next time.
...

Trista Pan

Co-founder & CTO @ SphereEx



