Abstract
As the most popular open source relational database in the world, PostgreSQL keeps attracting the significant attention it deserves. With the ever increasing data storage and query requirements, new challenges are brought forward for horizontal elastic expansion and security of the PostgreSQL database.
How to provide existing PostgreSQL databases with incremental capabilities such as data sharding, data encryption and other functions is of great concern to many PostgreSQL users.
This will focus on introducing how to empower PostgreSQL thanks to the ecosystem provided by Apache ShardingSphere - an open source distributed database, plus an ecosystem users and developers need for their database to provide a customized and cloud-native experience.
ShardingSphere doesn’t quite fit into the usual industry mold of a simple distributed database middleware solution. ShardingSphere recreates the distributed pluggable system, enabling actual user implementation scenarios to thrive and contributing valuable solutions to the community and the database industry. The aim of ShardingSphere is the Database Plus concept.
Database Plus sets out to build a standard layer and an ecosystem layer above the fragmented database’s basic services. A unified and standardized database usage specification provides for upper-level applications, and the challenges faced by businesses due to underlying databases fragmentation get minimized as much as possible.
To link databases and applications, it uses traffic and data rendering and parsing. It provides users with enhanced core features, such as a distributed database, data security, database gateway, and stress testing.
ShardingSphere uses a pluggable kernel architecture for Database Plus. That means there’s modularity, which provides flexibility for the user. Demos and notable use cases in production environments that are from the Asia equivalents of FAANG (Facebook, Amazon etc.) will be used to introduce the use and implementation of these functions for PostgreSQL databases.
Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, this is Chester. So I'm happy to meet
you and give a talk about how to create your postgresQl
PostgreSQl distributed secure database ecosystem postgresql
instance. Here is something about myself.
I'm the Trista, the cases co founder
and these CTO. It's my job title,
my area. It's about the distribute built database,
about the data sharding, about the database
AI management platform developing. So I'm a developer
but apart from that I love open source
and also I get involved a lot
in Apache software foundation. I'm the member and incubator
mentor to help to provide some ideas to other
incubator projects. So today my
talk will focus on how to use Apache
sharding Sophia to create your secure
and sharding database ecosystem with
your postgresql.
Well if today you have some question about my talk,
you can contact me on the linking or GitHub or
Twitter any channel you like. Yeah,
so let's get started. So today's content
is just to solve one issue. The issue
that how to create the secure distributed
postgreSQL database system.
But before to import the solution.
I just want to give some background about our postgresql.
Yeah, I guess most application will use the PostgreSQL
in the second part. That means you will create
these cluster with your postgresql. You have the primary
node and one or more replica
nodes to help to improve the availability of
your postgresql instance. Also your replica
nodes can get some or share
some traffic with your primary
nodes, right? That's the standard usage for the
database. But in this classic
structure or architecture of your database we
also can see more needs or requirements
from the modern application or website.
That means like high scalability or high
availability. Also you want to get your readout
as quick as possible and also how
to get the elastic skill out with your growing
up your application and about how to make
your data encrypted and decrypted automatically.
So that's the more needs or new requirements from
the current your database architecture.
So in order to solve such issues I
want to introduce project, open source project or
community to help us to deal with such issues.
That is the Apache sharding Sophia.
So what's the Apache sharding Sophia? Yeah, that's the first
question I give the answer for myself. That is
these ecosystem to transfer any
database into a distributed database system
and enhance these system with like sharding skew
out distributed transaction or SQL audit
or more enhancement features around
this distributed system. Yeah that is definition we're
spoken but actually I will give
more introducing about the features or architectures but
let me first add its community or this data
on GitHub because that is very important for us
to pick up or choose this project if it has a very active
community and that means you can get more help
from this community and that means this community receive
a lot of user cases. So the
same mission you may be met but will be stopped before.
Yeah. So on GitHub, this community
or this project receive more than 15,000
stars and have near
like 318 contributors
now and it released more than 14
variants. That means actually it's active
or diverse or maybe have a lot of user
cases. Right, so that's Apache sharding,
Sophia's community and it's mostly developed by
Java. That means if you interested in
this project or you want to get some helps
from the community or you want to do some contribution to this
community, I guess that maybe is a good project for you to
join in open source area. Yeah, so because this
project has been open source for more
than five years. So I guess it's so
mature for us to create your sharding
Sophia or sharding ecosystem around
your postgreSQL or MySQL users.
And if you want to learn more about
this project, it has a website, Twitter linking any
channel you can pick up. Yeah so that's
these community part, the NAS part we will enter into
the technique or about its feature
or what this project can help us do.
First. It's like the catalog
of this project because this ecosystem
it has many features around your database.
It has two clients or products for user
to choose. Also it support many
traditional dbms. So here
you can see for the feature part,
maybe you want to use a feature like Vrise
building shadow database for your
load test and sharding database gateway,
SQL audit or provides any features
I mentioned here you can just use and by
tell this system that you want to use one of these
features and you can use the distributed SQL or
configuration YaML file to tell this
product or tell this application that
you want to use sharding, you want to use the database gateway.
But today we just focus on sharding and data encryption.
There is the interesting part, I want to mention that
although you will see many of the features but these features,
apparently they are independent but you
can uses some of them together. That means
you can pick up some
features and group these features into a
complete new solution. Yeah. So today because
we want to create a secure and sharding cluster
with sharding Sophia with your postgres instance.
So that means we want to group sharding
and data encryption to create our cluster.
So that's about the features part. The next part I
want to say that this ecosystem or this community
have two clients for you to choose. First one is
sharing Sophia Proxy. Sharding safe proxy like
the gateway or like the proxy,
it can help you do some computing or to
handle ways to manage your traffic over statements
or queries. But you need to independently
deploy it like
the second diagram showed you.
The second part here is the sharding SaFi proxy. The right
column right. You can see you need a server
and to deploy it. And then your
sharding SaFi proxy these end plus your
postgres instance, they can group and
become a distributed database.
Because this sharding surface proxy, it works
as a server. So you can use like a CLI
or some DB admin
tools to connect to the sharding sufficient proxy and
use it as a standard postgres instance or
deserves.
Currently this sharding Sophia proxy supports two
dbms. First, MySQL. That means you
can have a sharing Sophia proxy for
MySQL. Alternatively you can have
the sharing Sophia proxy for PostgreSQL.
It's up to you. So today we will choose PostgreSQL
to become this the storage nodes
of this distributed system.
Another product is sharding Sophia GDBC.
It's a lightweight GDBC driver.
So if you want to use sharding Sophia GDBC, that means
it integrates with your application, your application
and sharding Sophia GDBC will are
deployed in the same machine or server.
But sharding SUV GDBC, it implements
most of these GDBC interface.
That means sharding SUV GDBC can support PostgreSQL,
MySQL, Oracle or SQL server.
Yeah, so it has more sports for your database.
All right, so today, like I
say here, we choose the sharding data encryption
features and choose the sharding field
proxy for PostgreSQL to create
our solution. Yeah. All right,
so before we enter into the introducing of
the solution, I want to give some case about how to
underlying sharding with Sharding Sophia proxy
and data encryption with sharding Sophia proxy.
What's these meaning of such issues? First case
is the sharding, right. In sharding case that means at the
beginning you have the primary node for your postgresql
replicate nodes of your postgresql. But when you
import the sharding Sophia or deploy the sharding
Sophia and all of the calculation just widows to these
sharding Sophia proxy, that means these computing
nodes and the proxy
will do some calculation and to locate
which PostgreSQL or which instance is for
these target SQL or this target statement.
So the Sharding Sophia proxy will help you manage
all of the sharding of this distributed
system. Yeah, but another case about data
encryption. So what's the data encryption? Data encryption
in this case, actually it's not a distributed
system. Right, because you can see here the
application just width one proxy and one
proxy will just wit one PostgreSQL instance.
But it's a feature or function that it can
help to decrypt or encrypt your
privacy information automatically or internally.
Here, like you have a table named these user table
and there is the telephone number column of
this table. So you cannot store
the plain text into your postgresql,
right? So the sharding Sophia can help you to
decrypt this privacy info
and these store it in the postgresql.
So these you can see the cipher text of your telephone
number. Conversely, if your application
want to get the exact plain text of
your telephone number, the sharding SaFI proxy will first
get the target cipher
text and then decrypt the
information and return these
plain text to your end users.
So that's the function of this case. The last one I
just give a mention, that is the database gateway.
Database gateway. That means here
the computing nodes or sharding Sophia proxy, it's like
the I five or like the
NGX. It can help you manage
your traffics or queries or statements.
How? Here you have two
postgres cluster, one of that. These primary
nodes have two replicas. So you
want proxy to automatically split
or share your traffic among this
privacy nodes. And two of the replica nodes,
maybe like 20% of the requirements will
be sent to the first replica, but remaining
18% of the sqls will
visit the second replicas. But nor about
the whole process of the traffic
distribution. Your proxy or
sharding Sophia proxy, just tell him that by
your application about your uses or strategies around
your traffic. Then sharding Sophia can help pass
the SQL and then send the
SQL to the target replica or your
primary nodes. So that's the database gateway case.
But today we just use the sharding plus data
encryption to help us to solve today's issue.
So entering today's case
we just want to create these cluster
distributed cluster. In this cluster you can see that your
application in the top level and these your
application standard. Visit your sharding Sophia proxy.
That means the computing nodes and then the sharding proxy
do some calculation and cases your sql understand
SQL and resend
this SQL or one sql to one of the
Postgresql instance gather without merge
out and then return the final readout to
your application. But here there are two
features mentioned. First it's the sharding
because imagine that your proxy have
the logic table. I mean the logic table,
it's not these physical tables because the actor table or
the physical table located in your two
of the postgresql instance maybe named t
user zero. From t user zero to t
user one. And these same here in demo
ds one, right. So there are two postgresql,
demo ds zero, demo ds one. They all called
storage nodes and the table
actual table located in such postgresql instance.
But for your applications, for your proxy
these are only one table t user, right?
So one table t user,
it's my ping to four actual table or physical
tables. Yeah, so that's the one keeps
the next one is imagine this,
users have the privacy information
telephone number. Yeah, telephone number.
We need to encrypt the plain text
and send the cipher tax
into your postgresql instance. So that's
the encryption feature. So this case you
use sharding and data encryption together. That's the
value of this case. Then how
to achieve that goal or how to create this solution?
Well I gave the basic steps about how
to create that cluster. First you need to prepare
your storage nodes, that is the two
PostgreSQl instance. Second,
you need to deploy one proxy. Also you can deploy
many proxies. You can group a proxies like
the cluster and maybe your case need
more computing nodes, need more
computing power. Then you can deploy
more than one proxy instance to
help you balance all of the traffic from
your application or web cases.
So here in this case we just deploy one.
And third step
is to log in your proxy by the Db
admin tools or GUI or CLI.
Anyway you can visit your proxy or you can
visit your postgresql server because as
I before proxy, it works as
the postgresql server, right?
Yeah. So by step one, two, three you can
create architecture on the
left column and then
that means by step one, two, three you create
a distributed cluster
and next one you need to buy using step
of four five, six to fill this
distribute the cluster with the sharding
rule, sharding strategy, encryption rule,
encrypt strategy or create many table,
many schema, many database on these
empty distributed database.
That's clear, right? But how
to tell these postgres to create like
a sharding table not standard or single table to create more
table with the encryption feature that means you
need to use distributed SQL. Okay, I will give
some details about this part. So by
step four, five, six actually you make
your cluster have the sharding feature, sharding function
and the data encryption function. Though your application
just witted these computing nodes,
just witted the proxy and this distributed
system can help you automatically sharding
data and encrypt your data, decrypt your
data. Yeah. So the step
seven and eight will just give a test
or input or insert some data or records
and to do some test. So the NAS part or
the later part I will give the folks on
step two, how to deploy the proxy instance.
Step five, how to use distributed SQL to create
a sharding rule and encrypted rule. And the last one,
these is some highlight on this part with your digital SQL.
Yeah. So how to deploy your proxy.
There are two ways for you to pick. The first one just uses
these quickstart on sharding Sophia web page.
It's easy, right? But there is a more
efficient way for you to choose that is use the
Sophia ex booting. Sophia ex
boot is developers by the Python language
and it can help you quickly one
command to create your computing nodes
and the governance center together.
So by use these vs boot it can help
you help you quickly create the cluster,
the computing cluster and with one command.
So that's the efficiency of this product.
There are two way for you and I also give your links
for you to visit after you deploy
the sharding Sophia proxy. These next one that you need to learn
and use distributed SQL. So what's
these distributed SQL. When this
word mentioned SQl that means it's a language like
talk about your postgresql. If you want to make
your postgreSQl instance to work like create a schema,
create a database, create a table, you needed to use postgres and
SQL dialect, right? And here distributed SQL,
it's sharding Sophia SQL dialect for
your postgres instance. Maybe you have
other language, other SQL like MySQL
dialect, Oracle dialect. Here it's these
sharding Sophia dialogue card distributed SQL. So you
use PostgreSQL SQL standard SQL to create a
table to create alter
the table or drop your schema.
Yeah, it's very simple, right? But you use SQL to
communicate with your database with your postgres instance
here, the distributed SQL. It's the SQL
like statement. It can help you to communicate
with sharding Sophia. Although sharding Sophia can understand
all the postgreSQL standard SQL
sharding Sophia can also understand Sharding
Sophia SQL dialogue. That means distributed SQL
advanced feature, advanced understanding capability.
But why do you want to uses them?
Because as I said, your sharding Sophia
plus your traditional dbms,
they work together and collaborate
with each other to create distributed
database, cluster or a distributed database.
So if you want to use the database, if you want
to use this distributed base to create some
sharding table or distributed table,
distributed database and distributed schema
or like make these distributed
system have some data encryption SQL
audit or some distributed privileges
this advanced or the enhanced features.
That means you need another language that
distributes SQL to tell this distributed
system. Could you get a point to run while?
Yeah. So you can see some examples of
distributed SQL when you
first give a glance. It's very similar
to your postgres SQL but it's
a little different, right. So you can just log
in by CLI to run this distributed SQL to
help you before you need to
five step five. You need to run distributed SQL
to create a sharding rule and encrypt rule, right? That means to fill
more metadata or create a sharding table in
this cluster database.
Cluster sharding system. Yeah. So that's
these distributed SQL also plus it can help you
manage this cluster here. Like this cases.
Sorry, this case when you
run some of the distributed SQL,
it can also tell you that this sharding
system, it has two
storage nodes and one computer nodes and what
status of each of the nodes.
So use them to manage your
cluster. Yeah, I have no
more time.
All right. I introducing
the distributed SQL and value for these
sharding system. You can see here
like the show instance nodes to show you
that what's these nodes in these system.
And here I give the complete
test example or demo for you to create
this architecture. But I have no
time to introduce them one by one.
So just give you the skeleton
by logging, add the resources
or your storage nodes and then
create the sharding table
by these distributed SQL and insert
some data for testing. And then here
when you run the select statement on
proxy, you will see that all of the logic information
that this T user is a logical table and
you can gather text plane of your telephone number
but when you log in your physical postgresql
instance you will found that all of the telephone
number is encrypted into some
string value. And also when you log
in your actual postgresql you will found that there is
no logic table t user but have one of the
physical or actual table of this t
user named t user zero or t
user two. Yeah so that's the magic of this proxy
or it really helping user to deal with a lot of these
process about the sharding about the SQL audit
et cetera. Yeah so that's these complete case
of today's solution. If you
want to have a these you can just copy and paste
and also if you have any questions you can just contact
me on my twitter or my linking or just
visit the sharding Sophia GitHub and raise
an issue. And also I'm waiting for your pr
maybe if you want to join this community so that's fair
time. See you next time.