Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello guys, this is Trista. Today this
talk will give an introduction
to databases, especially distributed databases,
on Kubernetes, and how to solve the issues that come up.
I'm Trista Pan, and I'm now the co-founder and
CTO of SphereEx. This company, this startup,
is built from an open source project,
so it's open source commercialization. As for
my area, it's around distributed databases
and cloud databases.
Sometimes I post about open source,
about the Apache Software Foundation, and about
distributed databases on my Twitter and LinkedIn.
So if you have questions about today's topic
or want to talk with me more,
you can take a look there, at my Twitter and my LinkedIn.
Yeah. So let's get started.
Today our talk will include the following
items. First I will present the issues, because
if there is no issue, there is no need
to talk about a solution.
Next we will talk about
Kubernetes and databases, and also about
distributed database architecture. Then, based on
all of this background knowledge, I can give
you a new idea or solution to help
you leverage your
existing PostgreSQL or MySQL,
such popular open source monolithic
databases, on Kubernetes and upgrade
them to become a distributed one. Then you have
a distributed system that can help you
with, for example, high availability,
or when you need more performance, higher TPS or QPS,
or when you want to
manage the tremendous amount of data stored
in your existing PostgreSQL or MySQL.
In the last part, if I have time, I will give a demo
to walk through it step by step.
But if I have no time, I suggest that you
go through these slides by yourself.
All right. So the background, or the issue, is that
our services have moved from the monolithic architecture
to a distributed one, which means
microservice or serverless architecture.
Then we leverage this wonderful
open source platform, Kubernetes, to help
us manage the traffic
and manage the microservices
or serverless services, right? You can see here that for
the infrastructure, most of us consider
moving it from
on premises to the cloud, because the cloud has the
best services to help you scale out or scale
in, scale up
or scale down your infrastructure, that is, your
servers, right? So in the middle layer,
that is about your databases: how to deal with
your databases? We will consider first
how to make our existing monolithic databases
become distributed ones, to make your database
system have the scaling feature
and a high availability feature, and to let
your database system manage a tremendous,
enormous amount of data and also offer
you the best performance,
right? And especially, people will consider:
could I put my database on Kubernetes and let
this wonderful platform help
me manage or deploy
our database system
the same as the services or
applications, all of the stateless applications
or services? However,
when we speak of Kubernetes, actually it was born
for stateless applications or services.
It can help us automate
deployment, scaling, and management for all the containerized
applications, especially for stateless applications.
How about our databases? Because everyone knows databases
are very different from stateless services,
we need to consider the data persistence issue and
how to manage the state of the replica
or primary nodes of our database system.
And also we need to consider backup and restore
of our data. That means how to
back up our data or restore our data to one
specific point, right?
However, no matter whether it's a stateless service or a
stateful service, actually all
of the applications need monitoring, high availability,
automated deployment, security, and
quality of service, all of these
features. Those are the shared requirements
from our users or from our ops,
right? But today we will focus on
the difference between stateless services and
stateful services, because what we want to solve today
is how to put our databases on Kubernetes.
That means how to make our databases have
the scaling features and automated
deployment and management features.
Traditionally, actually currently, all
the database vendors provide some solutions on
Kubernetes. That means, to put their distributed
database on Kubernetes, they need to leverage PV,
PVC, and StorageClass, the Kubernetes native
mechanisms, use a StatefulSet to deploy
their database, and also leverage
pod identity and similar mechanisms to help them
put their distributed databases on Kubernetes.
I have to say it's a good way
to put our databases on Kubernetes. Your
application is now born in and
lives in Kubernetes, and if we can put our
databases in Kubernetes as well, then your
application can directly visit your databases in the
same Kubernetes environment,
right? However,
today I want to give another solution
for how to leverage your existing
PostgreSQL databases
and put them into Kubernetes. With the first
solution, like I said before, you can use some
PostgreSQL operators or use PV, PVC, and StorageClass
to help you deploy all of the
databases, all of the stateful applications,
on Kubernetes. But today I can give another way
to figure out that issue. Let us first look
at distributed databases. Actually, when we speak of a
distributed database system,
we split this system into
two parts. That means your distributed
database system is made of two parts,
two important elements. The first one is the computing
part. The second one is the storage part.
For example, I can show the high level
architecture of MongoDB or CockroachDB. Those are
popular distributed database systems. You
can see here they have their storage part,
the storage nodes, and also the computing nodes.
A computing node is like a data proxy or
data router. It can help
you deal with the requests from your application,
while all of the shards, all of the
storage nodes, help persist
your data, right? They help you
manage the data part.
And on the other hand,
computing nodes help you deal with the computing part,
right? So that's the basic introduction
to the distributed database system. If
we really understand such an
architecture, then we can consider how
to upgrade your favorite existing PostgreSQL
or MySQL databases to make them
become a distributed one. Because I know that
PostgreSQL and MySQL have
been popular for many years, people
love them, and they have already
deployed them and manage them in existing production
environments, right? So can we avoid
overturning your database infrastructure and
just upgrade it to become a distributed one,
right? That's another solution for you to solve the
distributed database system issue, right?
So like you can see here, this is the
first solution for you. If you find that your PostgreSQL
cannot help you manage the enormous amount of data,
you find that responses from Postgres become slower,
and you want more performance,
higher TPS or QPS,
you can just remove or
get rid of your Postgres database infrastructure
and use a currently popular distributed
database like CockroachDB or Aurora.
But there is another solution. The question here is whether
we can consider continuing
to use PostgreSQL or MySQL in your
production environment. In that case you just use
the whole PostgreSQL cluster as the
storage nodes, the storage part, the storage elements of
this distributed database, and all
the PostgreSQL instances, or we can call them storage
nodes, help you persist your data and
do the local computing. At this point
we can just import
global computing nodes into
this distributed system.
Then we can use
PostgreSQL working as the storage nodes and import
new global computing nodes to work as the
database proxy, and group all
of the elements to become a distributed one, right?
That way we can upgrade your MySQL
instances or Postgres instances to become a distributed
one. So here the key point is: what are
the global computing nodes? Who
has the capability to work as the computing
nodes? That is Apache
ShardingSphere. I will introduce it later,
but now I can give a high level solution for
this: how to leverage your
Postgres instances,
upgrade them to become a distributed one, and
also put this distributed database on your
Kubernetes cluster, right? So as
I said, ShardingSphere can work as the database
proxy, or the computing nodes, of this distributed system,
and your PostgreSQL can
act as the storage nodes to help you manage your
data, while the computing nodes deal with the requests from
your application, right? So,
because the two parts
are actually independent from each other,
you can put your computing nodes
into your Kubernetes cluster. Because Apache
ShardingSphere's computing nodes are
stateless applications, and
Kubernetes was born for stateless
applications, right? So
we can put the computing nodes into your Kubernetes
cluster and fully
leverage the Kubernetes mechanisms
to manage and deploy all of the stateless computing
nodes. And here you have two options
to deploy or manage your storage nodes.
With the first one, you put your storage
nodes into your Kubernetes.
That means you deploy your storage nodes,
that is, your Postgres database
instances, into Kubernetes and just let the
computing nodes visit the storage nodes, and your
application just sends requests to your computing
nodes, right? That's the first option. The second option
is the one I recommend, because you know that Kubernetes
currently is not so good at helping you manage stateful
databases, right? So you can just leverage
RDS on the cloud, on any cloud, and
just deploy ShardingSphere, that is, the computing
nodes of this distributed database system, into
your Kubernetes cluster. Then your application just
sends requests to your computing nodes, and your
computing nodes run the global
computing
work, and then get the data from or persist
the data into
your storage nodes, that is, your RDS,
RDS for MySQL or RDS for PostgreSQL,
right? But your application will think
it just visits a database,
a distributed database. Actually, for
the application this database is a single one. But for
yourself, from the internal perspective,
it's made of two parts, right? You
just independently deploy your storage
nodes and computing nodes in different places.
The computing nodes live
in Kubernetes and your RDS
lives on your cloud, right? Yeah.
So what's the benefit of my solution?
First, it can help you leverage your existing
databases. You don't need to
totally change your database infrastructure.
Second, it can help you upgrade it to a
distributed one, right? So it can meet
your new requirements for your database infrastructure.
Next, because you import ShardingSphere into
your distributed database system,
this open source project can provide you
more great features, for example data sharding,
read/write splitting, SQL audit, that means a SQL
firewall, and elastic scaling, scale out,
such features. And next,
it gives you another way to
put your distributed database on
the Kubernetes cluster, right?
Plus, because ShardingSphere provides
operators and Helm charts,
it actually provides you an out
of the box way of deployment to help you
upgrade all of your database infrastructure
to become a distributed one and make it happen in the
Kubernetes cluster. Yeah, so I have
mentioned this open source project many times,
Apache ShardingSphere. So what's Apache ShardingSphere?
It's an Apache top-level project.
And this project, basically it's
a database proxy, right? And this database proxy
or database engine can help you,
here is its introduction, transform any monolithic
database into a distributed one, and also
provide more great features like I introduced before:
read/write splitting, auto scaling, scale out, data
sharding, SQL firewall or SQL audit,
and logging, all of the great features around your database.
And because this project has been open source
for more than five or six years, it has
a mature community. That means you don't need to worry that you
are the first person to use this project. Many people
have already helped to check and
test this project.
And it provides many use
cases and documents to help you quickly get
started with this project. Yeah, so that's the basic introduction
to this project. Next I will give some introduction
to the features, because that's the important
part, that's the value of this project.
So Apache ShardingSphere has
two clients for you to choose from. The first one is ShardingSphere-JDBC.
It's a Java driver for your Java
application. When you import ShardingSphere-JDBC
into your application, it can provide
the following features or functions: data sharding,
elastic scale out, distributed transactions,
read/write splitting, data
encryption, and data masking.
The other client is ShardingSphere-Proxy.
It's a database proxy, so your application,
no matter whether it's Java, Golang, or
PHP, can just use a
standard database driver to visit ShardingSphere-Proxy.
For your application, ShardingSphere-Proxy or ShardingSphere-JDBC
can just be
regarded as a distributed database
server. But ShardingSphere-Proxy or
ShardingSphere-JDBC actually helps you manage
your MySQL, PostgreSQL,
Oracle, or SQL Server database cluster.
That means it doesn't just
help you manage your database cluster, it can enhance
this database cluster to make it become a distributed one
and enhance it with a lot of useful database
features you care about around your system.
So you can see here all the features
and all the databases it supports, and
the deployment architectures for you to choose from.
So today I
will use ShardingSphere-Proxy for this demo.
That means at the beginning your application
just visits your primary PostgreSQL instance
or replica PostgreSQL instances. But now
your application doesn't need to care about
how many replica instances or primary instances there are.
It just visits ShardingSphere.
There's only a single database server, which is
this database proxy. This database server helps you
manage all of your database clusters and do
read/write splitting, data sharding,
data masking, SQL audit,
all of the great features you want to use for your
application.
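To make the read/write splitting idea concrete, here is a minimal sketch in Python. It is not how ShardingSphere-Proxy is implemented; the `ReadWriteRouter` class and the node names are hypothetical, and the statement check is deliberately naive (a real proxy parses the SQL). It only illustrates the routing decision a proxy makes so that the application sees a single server.

```python
from itertools import cycle


class ReadWriteRouter:
    """Toy router: writes go to the primary, reads rotate over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica is configured.
        self._readers = cycle(replicas or [primary])

    def route(self, sql):
        # Naive classification by the first keyword (assumes non-empty SQL).
        first_word = sql.lstrip().split(None, 1)[0].upper()
        if first_word == "SELECT":
            return next(self._readers)
        return self.primary


# Hypothetical node names for one PostgreSQL cluster.
router = ReadWriteRouter("pg-primary", ["pg-replica-0", "pg-replica-1"])
for statement in ("SELECT * FROM t_user", "INSERT INTO t_user VALUES (1)"):
    print(statement, "->", router.route(statement))
```

The application only ever talks to the router, so replicas can be added or removed without touching application code.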
The next part is about ShardingSphere-on-Cloud.
That means, okay, ShardingSphere is so great,
but I want to use it easily and want to deploy
this stateless proxy,
I mean the computing node cluster, into my
Kubernetes cluster. So ShardingSphere-on-Cloud
is a repo that provides Helm
charts and operators to help you automatically
scale in, scale out, and deploy this database
cluster. Yeah,
so you can see here,
first you need to use the ShardingSphere operator charts to
deploy ShardingSphere-Proxy into your
Kubernetes cluster. Plus you need
to pick up the PostgreSQL charts to deploy your
PostgreSQL into Kubernetes. But if
you already have your RDS on AWS
or on Google, then you don't need
to use the PostgreSQL chart to deploy it into
your Kubernetes cluster, right? Like I mentioned before,
you can just leverage the RDS service
from your vendor and just
deploy the computing nodes of this distributed
database system into your Kubernetes
cluster, right? And your computing
nodes can directly visit your RDS or the
databases on Kubernetes. Anyway,
yeah, so today I'll
go through this solution in detail.
In these slides I introduce how to deploy it.
Second, after you deploy it, you will consider how
to create a sharding table. Because
when you use ShardingSphere-Proxy, you are actually using a distributed
database system. That means if you want
to create a table, it's not a single table in one Postgres
instance, it's a distributed sharding table located
in different PostgreSQL instances. But for your application
it's just a single logical
database or a single logical table.
But this logical table, for example this
t_user table, is made up of
one, two, three, four subtables, or physical
tables, that live in different
PostgreSQL instances,
and each PostgreSQL cluster
has a primary node and replica
nodes, right? So you can see here there are two Postgres
clusters. Each one has one primary
node and replica nodes. And take your
single logical
table, t_user. For your application there is only one
table, but this t_user table
has one, two, three, four
physical tables, right? So here we use
distributed SQL (DistSQL), the SQL
dialect of ShardingSphere, to help you define
this t_user table. For example, if you
just use CREATE TABLE t_order, that
helps you create a single table.
But if you use DistSQL, the SQL dialect
of ShardingSphere, it can help you create a
sharding table. So you can see here we use
the keyword SHARDING TABLE, not just TABLE,
right? So it's very easy for you to get used to
this DistSQL language to help you
manage or define your sharding
database system and these
logical tables or databases.
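As an illustration of how one logical table can map onto several physical tables, here is a small Python sketch. It assumes a simple modulo rule on a `user_id` key, four physical tables, and two clusters; the cluster names, the `t_user_N` naming, and the `physical_location` helper are all hypothetical and are not taken from the slides or from ShardingSphere's own sharding algorithms.

```python
# Hypothetical layout: one logical table t_user split into four physical
# tables spread over two PostgreSQL clusters (names are illustrative).
CLUSTERS = ["pg_cluster_0", "pg_cluster_1"]
SHARD_COUNT = 4


def physical_location(user_id):
    """Return (cluster, physical table) for a row of the logical t_user table."""
    shard = user_id % SHARD_COUNT              # modulo sharding on the key
    cluster = CLUSTERS[shard % len(CLUSTERS)]  # spread shards over the clusters
    return cluster, f"t_user_{shard}"


for uid in (1, 2, 7, 8):
    print(uid, physical_location(uid))
```

The application keeps writing queries against `t_user`; only the proxy layer needs to know a rule like this one.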
Yeah. So first
you deploy it; second, you create a sharding database
and sharding table. Then your application just sends
requests to your computing nodes, to
this whole distributed database system.
For example, the application sends a query
here, and when your proxy receives this
query it does the following steps.
It calculates which PostgreSQL
instances own the
results of this query, and ShardingSphere
sends the query to the
targets, maybe one or maybe many
PostgreSQL instances. Then it gathers the
local results together
and calculates the final result, that is, it
merges the sub results
into the final one, and sends the final result to your application.
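The route, execute, and merge steps described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the shards are in-memory lists rather than PostgreSQL instances, routing is a modulo on `user_id`, and the names `SHARDS`, `target_shards`, and `query_users` are hypothetical. It is meant to show the shape of the flow, not ShardingSphere's actual engine.

```python
# In-memory stand-ins for four physical tables (hypothetical data).
SHARDS = {
    0: [{"user_id": 4, "name": "dave"}],
    1: [{"user_id": 1, "name": "alice"}, {"user_id": 5, "name": "erin"}],
    2: [{"user_id": 2, "name": "bob"}],
    3: [{"user_id": 3, "name": "carol"}],
}


def target_shards(user_id=None):
    """Route: one shard when the sharding key is known, otherwise all shards."""
    if user_id is None:
        return sorted(SHARDS)
    return [user_id % len(SHARDS)]


def query_users(user_id=None):
    """Execute on each target shard, then merge the partial results."""
    partial_results = []
    for shard in target_shards(user_id):
        rows = SHARDS[shard]
        if user_id is not None:
            rows = [r for r in rows if r["user_id"] == user_id]
        partial_results.append(rows)
    # Merge step: flatten the partial results and order them by user_id.
    merged = [row for rows in partial_results for row in rows]
    return sorted(merged, key=lambda r: r["user_id"])


print(query_users(5))   # query with the sharding key touches one shard
print(query_users())    # query without it fans out to every shard
```

A query that carries the sharding key touches a single shard, while a query without it fans out to all shards and relies on the merge step.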
So that's the basic process of each
query. The last part is the demo.
I don't have enough time here,
but you can see here we just deploy PostgreSQL,
working as the storage nodes, into your Kubernetes.
But actually here you can just use your RDS.
That's okay. Second, deploy ShardingSphere-Proxy,
then create your sharding table,
insert some test data, and
execute a query to test whether it's
okay or not. Yeah, so I have
already done this demo myself. You can see here
how to deploy it, how to create the
database and the table,
how to let your proxy visit
your RDS or your PostgreSQL instances,
then how to define your sharding
table here, how to insert
the test data, and how to
execute queries to test whether it works well or not.
All right, so that's all about this talk. If you have any
questions you can just ask
me here or visit my LinkedIn, GitHub, or Twitter.
All right, thanks for your time. See you.