Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, this is Trista. Today my talk is
about how to do traffic governance
from your application to your database infrastructure
on Kubernetes. Before answering this question,
I will raise another question, the one
most requested by our community. That is,
how do you upgrade your existing monolithic
database cluster, maybe MySQL or PostgreSQL
or Oracle or SQL Server, so that the
database cluster becomes a sharding architecture or
a distributed database architecture,
right? And then how do you do traffic governance, because
in a sharding architecture the topology is
so complicated, right? So I'm Trista
Pan, the SphereEx co-founder and CTO.
My area is around database mesh,
distributed databases, and
database management platform development,
because I'm a developer and spend a lot of my time
and effort on open source, especially in the Apache
Software Foundation. I'm now mentoring
Apache Incubator projects and am a member
of the Apache Software Foundation. Sometimes I post
articles about business,
startups, open source, or databases and
cloud on my Twitter and LinkedIn. If you are interested in such
topics, please take a look there and let's
discuss all of it there, right? So today our
content will include the following parts. In the first
one I give a brief introduction to the background.
In the second one, we look at the new requirements for
databases on the cloud. In the third one I will give the architecture
and the solution. The last one will dive into the
SQL lifecycle in this distributed
database system. Then if I have enough time,
I will give a detailed introduction to the demo,
right? So the first part is that
digital transformation is so popular these
days, right? Because people really want to leverage all the
novel technologies to upgrade their
infrastructure or come up with
new ideas to serve
their customers and users efficiently and
effectively, right? So it's not just about how
to change your infrastructure,
from development to your delivery
paradigm, it's also about the culture change or
mindset. So that's the background.
Against this background, how about the database? Because all
of these changes raise new
requirements for our infrastructure,
for our databases. Currently, from our community
and from my experience, people have the following major
concerns. The first one is how to manage
and store so much data,
right? The second one is how to make each
request be answered as fast as it can.
The third one is how to do traffic governance.
Because as I said before, especially in a distributed
database system, there is a complicated
topology among your different computing nodes,
your application, and your storage nodes,
right? So we hope that someone or
some platform will help us deal with all
of the topology. Then we consider elastic
scaling. That means scalability for our
future scenarios and future data,
right? Then if there is an out-of-the-box
solution or deployment or tool, that will be
perfect, right? So to answer these questions I
will give my answer to each of them. First,
for managing large data and querying it efficiently,
my answer is data sharding.
The next one is about traffic governance. Based on
the data sharding architecture we can do
high availability and read/write splitting, or other things like
SQL audit, or apply traffic strategies based
on some metrics,
all that stuff. Then about elastic
scaling. That means we can help users rescale
the computing nodes and storage nodes
of a distributed database system. Later on I will give
some introduction to that part.
And then thanks to Kubernetes,
we found that microservice applications work so well
and effectively on Kubernetes.
So could we use similar
primitives or other tools, or
a similar mechanism on Kubernetes, to
help us manage our databases or data? Is it
possible? I think it's possible, because most
companies are working that way, right? All right,
so how do we do that part? How do we
combine all the solutions into one and just give
an out-of-the-box solution that helps users adopt
it? Before giving the solution or the answer, I
need to first give the background of a database system.
Let us look at this architecture
again to think about databases. Because if
we know the fundamentals
of the database system, then we will have a better way to
solve the issues I mentioned before. First I
want to say that a database system
is actually made up of two parts. The first one
is the computing nodes, right? The second one
is the storage nodes. That means our databases,
no matter whether distributed or monolithic,
have these two parts.
They are two important capabilities, right?
But a distributed database system just
splits the computing nodes and storage nodes apart
and deploys them in different
locations. That is how the database
becomes a distributed one. Whereas
our monolithic databases merge the computing
capability and the storage capability together.
So you can use just one command to
deploy a single database
instance, consisting of computing and
storage together on one machine, right? That's the difference
between monolithic and distributed
databases, right? So today we will
think about the solution in this way.
Consider that we already have
MySQL or PostgreSQL here, right?
That's your database cluster,
your existing database cluster, right? So could we just
regard such monolithic databases as
the storage nodes of a new
distributed database system,
right? And we don't
make any change to our existing databases,
don't do any dangerous action on them,
we just bring the computing nodes
into this system, right?
So these computing nodes can work
as a database server, right?
Say your databases are MySQL
or PostgreSQL, right? Now we can treat those
storage nodes as local storage nodes, I mean as
local database instances. And we can
just bring in global computing
nodes working as the database server.
Then the computing nodes plus the storage nodes become
the distributed database system, right? So in
that way we can upgrade your existing database
to become a distributed one. So the last part
is ShardingSphere. ShardingSphere can work in this
role. I mean it can work as a database proxy,
a database that presents itself
as a database server, like a MySQL server or
PostgreSQL server. So
we bring in ShardingSphere here, working as a
computing node, and it connects to different database
instances, that is, the storage nodes of this
new distributed database
system. ShardingSphere also has a governance
node to help synchronize all metadata changes
among the different computing nodes, I mean ShardingSphere instances.
This way we can upgrade
our existing monolithic MySQL cluster
or PostgreSQL cluster to become a distributed
MySQL or PostgreSQL
database cluster, right? Another benefit of this solution
is that, because we adopt an architecture that splits
computing and storage,
we can rescale
the computing nodes or
storage nodes, that means ShardingSphere or
your databases, independently,
right? At this point, if you find that you need
more computing power, then you can just
spin up more ShardingSphere-Proxy
or ShardingSphere instances.
So that means you can have more computing
power, more computing nodes, working in this
distributed database system.
On the other hand, if some users
find they need more storage capacity,
then you don't need to spend time, effort, or money
on the computing nodes. You can
just create more database instances,
right? So here maybe there are one, two, three; maybe there will be four or
five. And ShardingSphere can connect
to the new database instances and help you
reshard your original data
among one, two, three, the old and the new
ones. I mean the new storage
nodes, your new PostgreSQL nodes, will contain
the data resharded by ShardingSphere.
So that's the benefit of this solution.
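
To make the resharding idea concrete, here is a minimal sketch in Python. It assumes a simple modulo sharding rule and made-up integer keys; `shard_for` and `plan_reshard` are hypothetical helper names for illustration, not ShardingSphere APIs, and the algorithm the project actually uses may differ.

```python
def shard_for(key: int, node_count: int) -> int:
    """Pick a storage node with a simple modulo rule (an assumed algorithm)."""
    return key % node_count


def plan_reshard(keys, old_count: int, new_count: int) -> dict:
    """Return {key: (old_node, new_node)} for every row that has to move."""
    moves = {}
    for key in keys:
        old_node = shard_for(key, old_count)
        new_node = shard_for(key, new_count)
        if old_node != new_node:
            moves[key] = (old_node, new_node)
    return moves


if __name__ == "__main__":
    # Growing from three storage nodes to five, as in the talk's example.
    moves = plan_reshard(range(10), old_count=3, new_count=5)
    for key, (src, dst) in sorted(moves.items()):
        print(f"key {key}: node {src} -> node {dst}")
```

With a plain modulo rule most rows change nodes when the node count changes, which is one reason it helps to have a tool plan and run the migration instead of doing it by hand.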
In the last part, let us learn more about this
role, the computing node, Apache ShardingSphere. So Apache
ShardingSphere: why do I say that
it can help us do that work? Because
ShardingSphere is an ecosystem to transform any
database into a distributed database
and enhance it with sharding, elastic scaling,
data encryption features, and more,
right? So from the slogan it looks like it can
work well. And this project
also has a strong community to help us answer
questions. You can see here
the statistics on GitHub:
more than 17,000 stars,
nearly 15 releases,
and more than 400 contributors
there, right? So that means you don't need to worry that you will
be the first person to use the project. You don't need to worry about
the issues,
about how this community handles them,
because your issue has maybe already
been found by others in this community, right? So it's a
mature community and a mature project for us
to use. And its documentation is
so detailed that it helps us understand its
concepts and helps us set
up this ecosystem or the solution.
So ShardingSphere provides two clients.
The first one is ShardingSphere-Proxy, the next one ShardingSphere-JDBC.
Like I said before,
ShardingSphere-Proxy can work as the computing
node, and ShardingSphere-JDBC the same. But today we will use ShardingSphere-Proxy
for our demo. You can see here that
ShardingSphere-Proxy can access MySQL
or PostgreSQL or RDS databases.
So that means ShardingSphere-Proxy can help us manage your database
cluster, upgrade it to
become a distributed one, help us do
traffic governance, and help us manage the
complicated topology of your distributed
database system, right? Because all applications
first send their requests to ShardingSphere. ShardingSphere
does a lot of computing, global computing,
to target which database cluster has the
expected data, and then
merges the different local result sets to
become the final one and returns it to our end users.
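
The route-then-merge flow just described can be sketched roughly as follows. This is a toy model in Python, not ShardingSphere code: the in-memory `STORAGE_NODES` list stands in for three database instances, the modulo rule on `user_id` is an assumed sharding algorithm, and `route`, `execute`, and `query` are hypothetical names.

```python
# Hypothetical in-memory stand-ins for three storage nodes
# (for example three PostgreSQL instances), sharded by user_id % 3.
STORAGE_NODES = [
    [{"user_id": 3, "amount": 30}, {"user_id": 6, "amount": 5}],
    [{"user_id": 1, "amount": 12}, {"user_id": 4, "amount": 50}],
    [{"user_id": 2, "amount": 8}],
]


def route(user_id=None):
    """Return the indexes of the nodes that may hold the rows.

    With a sharding key the request goes to one node;
    without it, every node has to be asked.
    """
    if user_id is not None:
        return [user_id % len(STORAGE_NODES)]
    return list(range(len(STORAGE_NODES)))


def execute(node_index, user_id=None):
    """Run the 'local' query on one storage node."""
    rows = STORAGE_NODES[node_index]
    return [r for r in rows if user_id is None or r["user_id"] == user_id]


def query(user_id=None):
    """Route, execute on each target node, then merge the local result sets."""
    local_results = [execute(i, user_id) for i in route(user_id)]
    merged = [row for result in local_results for row in result]
    return sorted(merged, key=lambda r: r["user_id"])


if __name__ == "__main__":
    print(query(user_id=4))  # touches a single node
    print(query())           # fans out to all nodes and merges
```

The application only ever calls `query`; which node holds the data stays hidden behind the computing node, which is the point of putting the proxy in front.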
And another benefit is that it
can help us do read/write
splitting. That matters because each of your databases,
like this first database instance,
may have a lot of replicas for this shard,
right? And many replicas for the second shard and
replicas for the third shard. Then
ShardingSphere will judge whether this request,
this SQL, is a SELECT
statement or a DML statement, an
UPDATE or INSERT. So it will automatically
route the SQL to the different primary
nodes and their replicas. You can
just randomly route requests to
the replicas or use other
strategies. But the main
target or main work of ShardingSphere-Proxy
or ShardingSphere-JDBC is that it can
help you leverage your replicas' performance
and your primary nodes' performance to
improve the throughput. So that's
the benefit of ShardingSphere,
right? So for the last part, I
will give the final solution here. ShardingSphere
provides ShardingSphere charts, and you can use Helm,
with one command, to let Helm help you deploy
a ShardingSphere cluster on Kubernetes.
Also, if you already have
your PostgreSQL instances, that's okay, you can skip this part.
But today I want to give a complete demo, so I
use the PostgreSQL charts to help me
deploy two PostgreSQL instances here.
So now on Kubernetes we have the
computing nodes, we have the storage nodes,
and computing nodes plus storage nodes become the
final distributed system. Our application
can just send requests to our ShardingSphere-Proxy,
and ShardingSphere-Proxy will help
us do
data sharding, read/write splitting, SQL audit,
authentication, authority, and such stuff.
So those are all the features of ShardingSphere,
right? You can see
here that the application just
sends requests to ShardingSphere-Proxy, and ShardingSphere-Proxy
helps us shard data, do read/write
splitting, and guarantee availability,
since there may be many replicas.
If ShardingSphere finds the primary has crashed, it will
send the requests to our replicas,
right? So that's ShardingSphere-Proxy's function.
ShardingSphere governance helps us synchronize the
metadata among different proxies, and the ShardingSphere
operator helps us guarantee the high
availability of ShardingSphere-Proxy and also
do elastic scale-in
or scale-out of computing nodes. So that's the ShardingSphere operator.
It's like a DBA on Kubernetes that looks after
this distributed system. Another point I want to
make is about the availability of PostgreSQL or MySQL
itself.
We need other tools to guarantee
their availability. But ShardingSphere
can be aware of the
different roles of the databases and help
route or reroute traffic to the different PostgreSQL
instances. Today I don't have
enough time to give more details, but I
will give the final part. Let us see that.
How about the SQL? So here, when the application
sends a request to ShardingSphere-Proxy, ShardingSphere-Proxy
uses the sharding algorithm to find
which databases contain the
expected data, right? It gathers the data
from the database instances into ShardingSphere-Proxy,
merges the local result sets, and
returns them to our end user. Beyond that,
it will also judge
or evaluate the statement. Like here, it's a
SELECT statement, right? So ShardingSphere-Proxy
will automatically send the SELECT
statement to the replicas, not the primaries.
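
A rough sketch of that read/write splitting decision, in Python. It only looks at the first keyword of the statement and picks a replica at random, which is one of the strategies mentioned in the talk; `is_read` and `choose_node` are hypothetical helpers, and a real proxy has to consider more than this (transactions, hints, replica health).

```python
import random


def is_read(sql: str) -> bool:
    """Treat a statement as a read if its first keyword is SELECT."""
    parts = sql.split()
    return bool(parts) and parts[0].upper() == "SELECT"


def choose_node(sql: str, primary: str, replicas: list, rng=random) -> str:
    """Send reads to a randomly chosen replica and everything else to the primary."""
    if is_read(sql) and replicas:
        return rng.choice(replicas)
    # INSERT, UPDATE, CREATE, and any statement we cannot classify go to the
    # primary, which then replicates the change to its replicas.
    return primary


if __name__ == "__main__":
    replicas = ["replica-1", "replica-2"]
    print(choose_node("SELECT * FROM t_order", "primary", replicas))
    print(choose_node("INSERT INTO t_order VALUES (1)", "primary", replicas))
```

Falling back to the primary when there is no replica, or when the statement is not clearly a read, keeps the sketch safe by default: a misrouted write is worse than a read that lands on the primary.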
Also you can tell ShardingSphere
to send a request to
the primaries or the replicas. That's okay,
it's up to you. You can just write a YAML file to
tell ShardingSphere to do such things.
But all of the change statements, I mean INSERT
or UPDATE or CREATE, it will automatically
send to the
primaries. The primaries then synchronize
the changes to their replicas. So that's clear now,
right? Okay, for the demo, I have no
time to give it
step by step,
but you can refer to
my steps to create or
test that solution. All right, so see you
next time. And if you have any questions,
just contact me on Twitter, GitHub, or LinkedIn.
That's all. So see you.
Bye.