Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, this is Trista. At this conference I'm going
to share a topic about how to achieve availability,
scalability, and elasticity for your database infrastructure
on Kubernetes. First, something about myself. My area is
databases, especially distributed databases, cloud databases,
and Kubernetes. That's why I chose this topic for my talk
today. I will talk about DBRE, that is, database reliability
engineering, and also about running MySQL, PostgreSQL, and
other monolithic databases on Kubernetes. Apart from my job,
I love open source. I'm an Apache member and incubator
mentor, and I help by giving tips and sessions to open
source communities and their projects.
Yeah, so that's my background. And now I'm the co-founder
and CTO of SphereEx, a startup company.
Sometimes I post articles about startups, open source
communities, Apache projects, databases, and distributed
systems on my LinkedIn and Twitter. So if you have questions
or want to talk with me more, please take a look there.
Yeah, so let's get started with today's talk. Today I will
first introduce some background about SRE, SLA, and DBRE,
and then we will talk more about your databases on
Kubernetes and on the cloud. I will give some solutions,
ideas, and architectures for distributed databases, and
show how to achieve high availability, scalability, and
elasticity for all of them. And if I have more time I will
give the demo. I did this demo myself, but if we don't have
enough time I will just give a brief introduction to it. You
can refer to the PDF file and do it by yourself.
Okay, so the background is SRE, SLA, and DBRE, because that
is actually the topic of our conference, right? I don't want
to talk too much about all the concepts. I just want to say
that I know many people are currently asking: should I
switch or change my DBA career? I want to say that,
especially now that big data has become a hot topic and more
and more popular distributed systems and big data platforms
are coming out, I guess it's a good time to think about our
job, about being a DBA, and about our data, all of that
stuff. So I think it's necessary to know more about DBRE and
how to use your database experience to provide a good SLA,
a service level agreement, to your services and to your
company, especially the application layer.
Right. But when we consider SRE or DBRE, I want to say that
a database is different from your application services and
all the stateless services, right?
With databases, you need to consider the visits, the
connections, and the persistent data, the data persistence,
right? So you need to give more specific consideration to
DBRE. But they all share the same, I mean an overlapping,
area, especially around SLA, SLO, and SLI.
If you want to learn more about these ideas, I recommend
the following two books. But today we will just talk about
DBRE's SLA, I mean the indicators, because no matter whether
you are a DBRE, a DBA, or a developer, having a high SLA
means you commit to your users that you will provide that
level of service, right? The quality. So if we pay attention
to the DBRE, I mean the database, SLI, that is, the service
level indicator, we use these indicators to watch or review
whether the service quality, or the service level agreement,
is good or not for the application or for our end users.
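
To make the idea of a service level indicator concrete, here is a minimal Python sketch that computes two SLIs, availability and a latency percentile, from a list of query records and checks them against a target. The record fields, function names, and thresholds are assumptions for illustration only; they are not taken from the talk or from any particular tool.

```python
import math
from dataclasses import dataclass


@dataclass
class QueryRecord:
    latency_ms: float  # how long the database took to answer
    ok: bool           # True if the query succeeded


def availability(records):
    """Availability SLI: fraction of queries that succeeded."""
    if not records:
        return 1.0
    return sum(r.ok for r in records) / len(records)


def latency_percentile(records, pct):
    """Latency SLI: nearest-rank percentile over successful queries."""
    latencies = sorted(r.latency_ms for r in records if r.ok)
    if not latencies:
        return None
    rank = max(1, math.ceil(pct / 100 * len(latencies)))
    return latencies[rank - 1]


def meets_slo(records, min_availability, max_p99_ms):
    """Compare the measured SLIs with hypothetical SLO targets."""
    p99 = latency_percentile(records, 99)
    return (availability(records) >= min_availability
            and p99 is not None
            and p99 <= max_p99_ms)
```

In this sketch the SLI is what you measure, and the target passed to `meets_slo` plays the role of the objective you commit to.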
Yeah, so this book, the DBRE book, gives some good SLA
indicators for our databases, for example latency,
throughput, availability, scalability, and efficiency, for
us to consider the quality of our work.
But those are high-level indicators for our production
environment. So here I give some more practical indicators
around our databases and production environment. For
example: how to handle tremendous amounts of data, how to
provide efficient query answers to our applications, how to
handle data security, whether your database cluster has good
elastic scaling capability or not, what your backup and
recovery is, how to do the metrics that help us locate
errors or exceptions when they come up, how to make our
infrastructure portable, and whether we have good automated
deployment principles or methods, right?
Yeah. So those are the specific items for our databases,
for us to review or evaluate the quality of our database
service.
On the right side I also give some techniques to reach
those goals. For example, we can use a distributed system
and do data sharding to give our database infrastructure
scalability and elasticity. We can also consider creating
more replicas and using a read/write splitting strategy for
the traffic, for the visits, to give the database cluster
high availability and higher throughput.
We can also use data encryption, I mean open source
projects, to help us protect our data. We can create a
monitoring or metrics system to help with the alerting,
right? And we should also consider using Helm charts or
operators, especially on Kubernetes, to provide
out-of-the-box deployment and all that stuff, I mean the
maintenance work.
So those are the techniques I can give to solve part of the
SLIs, the items in the left column.
Yeah, for our databases.
The end users actually don't care about all of these
specific indicators. They just want to know that your
database service is available every time they run a query,
and that your databases can answer their queries efficiently
and as soon as possible. That's the main point your end
users are concerned about. But as DBREs, DBAs, developers,
or DevOps engineers, we need to dive in and consider all of
these indicators.
In the last part I need to give you solutions for all of the
issues I mentioned before. I know that most of you use
PostgreSQL, RDS, MySQL, Oracle, or SQL Server in your
production environment. But now we are entering a new era,
right? So we need to consider how to use new solutions to
solve the current headaches around our database
infrastructure. First, I suggest that you can use a
distributed database vendor. Different database vendors
solve one or a few of your database headaches, so you can
take a detailed look and do some research across the
database vendors to see whether they can solve your headache
or not. Second, we can consider a one-stop solution. You can
choose a cloud vendor. They definitely provide databases or
big data platforms and products for you to solve the current
issues. The third one is what I want to talk about today. I
want to give a new idea for your database architecture. If
you don't want to be locked in to a database vendor or a
cloud vendor, maybe you need to consider how to make your
database infrastructure more flexible and portable, and how
to leverage your current databases and infrastructure.
Because I know that if you don't have specific requirements
for your databases, you don't want to overturn or change
your database infrastructure, right?
So I want to introduce this third option for you, to help
you get the following benefits from this new solution.
First, this solution can help you leverage your existing
databases. I mean, you don't make a big change to your MySQL
or PostgreSQL database infrastructure. You just leverage it,
but you can upgrade your current monolithic database
infrastructure to become a distributed one. I mean, you add
data sharding or data encryption, such good features, to
make your current monolithic databases become a distributed
one. This solution can also give you more of the necessary
features around your database infrastructure, for example a
SQL firewall, SQL audit, traffic governance, or elasticity.
Yeah. And the last part to consider is whether you can
leverage all of your favorite PostgreSQL databases on
Kubernetes, right? It also gives you an out-of-the-box way
to deploy, to help you quickly run a demo and test whether
it is okay for you or not, right? So those are the benefits
from today's topic. The basic idea comes from the basic
distributed architecture. We know that if you want to make a
monolithic architecture become a distributed one, we need to
split big databases or big projects into small ones. All of
them will be located on different servers or machines,
right? That is what distributed means, right? Such an
architecture, I mean the new architecture, can help us
improve the databases and infrastructure so that they have
scale-in and scale-out capability, or elasticity, and can
help us manage more and more data, right? We cannot rely on
just one server machine to manage terabytes or petabytes of
data, especially in this new era. I mean, every day, every
hour, every minute, our clusters and applications create
data, right? Yeah.
So look here. We want to adopt the distributed database
architecture. That means we split a single database into a
lot of small ones, which gives this architecture
scalability. Basically, when we speak about a distributed
database system, we split the single monolithic database
into two important elements. The first one is the computing
node, the second one is the storage node. A storage node
helps us persist your data and does the local computing to
get or store its part of the data. The computing node's
capability, its function, is to provide the entrance for
visits and do the global computing workflow, right? And to
answer people's queries. If you take a look at the MongoDB
architecture, the CockroachDB architecture, or others, you
will find that they use the same idea to build
the distributed one.
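
The split between computing nodes and storage nodes can be sketched in a few lines of Python. This is only a toy model under assumed names (`StorageNode`, `ComputeNode`, a modulo placement rule); it is not how any specific database implements the idea. Each storage node persists its own rows and computes partial results, and the computing node is the single entry point that merges the partial results into a global answer.

```python
class StorageNode:
    """Persists part of the data and computes over its local rows only."""

    def __init__(self, name):
        self.name = name
        self.rows = []  # each row is a dict, e.g. {"user_id": 1, "amount": 30}

    def insert(self, row):
        self.rows.append(row)

    def local_sum(self, column):
        # Local computation: a partial result over this node's rows.
        return sum(row[column] for row in self.rows)

    def local_count(self):
        return len(self.rows)


class ComputeNode:
    """Entry point for applications; merges partial results globally."""

    def __init__(self, storage_nodes):
        self.storage_nodes = storage_nodes

    def insert(self, row):
        # Assumed placement rule: pick a storage node by user_id modulo.
        index = row["user_id"] % len(self.storage_nodes)
        self.storage_nodes[index].insert(row)

    def total(self, column):
        # Global computation: combine every node's partial sum.
        return sum(node.local_sum(column) for node in self.storage_nodes)

    def average(self, column):
        count = sum(node.local_count() for node in self.storage_nodes)
        return self.total(column) / count if count else None
```

The application only talks to `ComputeNode`; it never needs to know which storage node holds which row.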
So based on this idea, today I want to leverage your
favorite MySQL, PostgreSQL, or SQL Server databases and
upgrade them to become a distributed one. Here we can
leverage your current databases and make them act as the
storage nodes, right? People have used PostgreSQL in many,
many production environments, so we trust that it is stable
working as a single database.
And here, in this distributed database system, we regard it
as one of the storage nodes. It can help us persist your
data and do the local computing. We just want to bring a
computing node into this distributed system. Then the
computing node, the new one, plus your current databases,
the PostgreSQL or MySQL databases working as storage nodes,
will merge into a distributed database, right? So who can be
the computing node? That's an open source project, Apache
ShardingSphere. I will give a brief introduction to it. It
is a distributed database engine, or a distributed database
proxy, right? If you use ShardingSphere as the computing
nodes, then you can see here that ShardingSphere plus your
PostgreSQL database cluster make up a distributed database
system. Your application just sends its queries to this
proxy, to this distributed database engine, ShardingSphere.
It can be deployed as a cluster. It can work as the
computing nodes of this distributed system. It can present
itself as a database server and handle all of the queries
from our application. This proxy can help us shard our
databases, shard our data. It can apply read/write splitting
and traffic governance strategies. That means it's like a
database gateway. You just send your queries to
ShardingSphere, and ShardingSphere will work out that this
query should go to one shard, right? Or that this one should
go to a replica, right? So ShardingSphere can help you do
data sharding and read/write splitting. Also, if one of your
PostgreSQL instances crashes, it can become aware of that
status and update itself. The next time your application
visits, this database proxy, ShardingSphere, will never send
the query to the crashed instance, I mean the crashed
replica, right? So you just tell ShardingSphere that you
want these features, and ShardingSphere will work as the
distributed computing nodes, or the database server, to
handle all of the queries, and it can also handle the
visits, the traffic governance, right? Yeah. So that's the
magic, the function, of ShardingSphere. When you use
ShardingSphere, your database infrastructure goes from this
one to this one. Before, your application visited your
PostgreSQL instances directly, so it needed to know which
one is the primary node and which one is a replica node. But
once you use ShardingSphere, you don't need to care about
which one is a replica or the primary, which is the first
shard cluster or the second shard cluster. You don't care
about any of that. ShardingSphere does all of it for you.
You just send your query to ShardingSphere, and
ShardingSphere will deal with all of the requirements.
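
The gateway behavior described above (writes go to the primary, reads go to a replica, and a crashed replica stops receiving queries) can be illustrated with a small Python sketch. This is a toy model only, not ShardingSphere's implementation or API; the class name, the SELECT-based read detection, and the fallback to the primary are assumptions made for the example.

```python
import random


class ReadWriteRouter:
    """Toy gateway: writes go to the primary, reads to a random healthy replica."""

    def __init__(self, primary, replicas, rng=None):
        self.primary = primary
        self.replicas = list(replicas)
        self.down = set()  # replicas currently marked as crashed
        self.rng = rng or random.Random()

    def mark_down(self, replica):
        # Called when a health check notices that a replica has crashed.
        self.down.add(replica)

    def mark_up(self, replica):
        self.down.discard(replica)

    def route(self, sql):
        # Very rough classification: only statements starting with SELECT
        # are treated as reads in this sketch.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if not is_read:
            return self.primary
        healthy = [r for r in self.replicas if r not in self.down]
        # Assumed fallback: use the primary if no replica is available.
        return self.rng.choice(healthy) if healthy else self.primary
```

The application always calls `route` through one entry point, so it does not need to track which node is the primary or which replicas are alive.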
Yeah. So that's the function. I just wanted to introduce it
and tell you how to use it in your database infrastructure.
Second, let's take a brief look at its GitHub statistics,
because nobody wants to use a new project if they worry that
one day the project may disappear or become closed source,
right? You don't need to worry about that. It has been open
source for six years, it's an Apache top-level project, and
it already has more than 400 contributors. We have nearly
18k stars on GitHub and have already made nearly 50
releases. Right. So it's a big community and a very popular
open source project, and its documentation and user cases
are of good quality for you to use it or to learn more about
it. Okay, so the ShardingSphere clients. As I said before on
this page, your application just uses the database proxy to
visit your databases. Actually, ShardingSphere provides two
clients for you to choose from. The first one is the driver
client. The second one is the proxy client. You can choose
one of them based on your case. The driver is named
ShardingSphere-JDBC. It's suited only for Java applications,
as you can see from its name, ShardingSphere-JDBC. It's a
lightweight implementation of the JDBC interface. So you
just use this lightweight driver, and it will help you
manage your database cluster and do data sharding,
read/write splitting, data encryption, and such features.
For your Golang, Rust, or other language applications, you
can choose ShardingSphere-Proxy. You need to deploy
ShardingSphere-Proxy independently. It really works as a
database server. It shares the same features as
ShardingSphere-JDBC: it can shard your data, do data
migration and read/write splitting, and help you with data
encryption, data masking, SQL audit, all of the good
features.
Here you can see this page. It's an index of all of the
features and all of the databases this project supports.
But I know you will wonder how to make ShardingSphere work
with these features. In the next part I will introduce the
distributed SQL, which can help you do the configuration.
So in the next part I need to introduce the out-of-the-box
deployment, especially on Kubernetes. This repo here,
ShardingSphere-on-Cloud, can help you quickly deploy
ShardingSphere-Proxy on Kubernetes. One command deploys
every element of this project, and it can also help you
deploy the governance center, the proxy, or your databases.
Currently we support PostgreSQL databases here. It provides
Helm charts and operators to guarantee high availability,
automated deployment, metrics, and such features on
Kubernetes. In our demo I will use this repo,
ShardingSphere-on-Cloud, to quickly deploy a ShardingSphere
cluster. You can choose to use RDS as the storage nodes and
just let the ShardingSphere-on-Cloud repo help you deploy
the ShardingSphere proxies, right? Because, like I said, one
benefit of this solution is that you are not locked in to
one cloud. Here you just use this open source project as the
computing node of the distributed database, and it can reach
the same qualities as a distributed system, for example data
sharding and distributed transactions, right? Another
benefit of the solution is that your storage nodes, your
databases, can be located anywhere. They could be RDS on
Google Cloud or AWS. They can be open source databases
running on your Kubernetes cluster. If you use a PostgreSQL
operator or PostgreSQL Helm charts, you can just deploy your
databases on Kubernetes, right? Yeah. So no matter where
your databases are located, just let ShardingSphere visit
them, your storage nodes, and then your application just
visits this virtual distributed database server. Then you
own a distributed database system. So that's the flexibility
of this solution. For my demo, I just chose one popular
Postgres chart to deploy the Postgres instances on
Kubernetes. Actually, you can just use your current RDS
service. And I use the ShardingSphere-on-Cloud repo to
deploy the ShardingSphere cluster. Then the application just
visits your ShardingSphere proxy, the ShardingSphere
cluster, and the ShardingSphere proxy can help you with the
data sharding and read/write splitting features.
So you can see here, I mean, I
use this repo to deploy all the stuff.
And I want to say here, like I mentioned before, how to
create a sharding table. Here I want to introduce the
distributed SQL, like this one: create a sharding table
named user. For this user table, the sharding column of the
distributed table is user id. And we use the mod sharding
algorithm to split this single logical table into four
shards, that is, into four actual tables.
All four tables will be located in the first PostgreSQL
group and the second PostgreSQL group, I mean PostgreSQL
clusters. So you can see here, it's just a SQL dialect, like
the PostgreSQL, MySQL, or Oracle SQL dialects. It's just
SQL, but it's different from your standard SQL language.
It's a new type. We call it the distributed SQL language,
and it helps us create
your sharding database architecture.
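
The mod sharding rule described above can be sketched in Python. This is only an illustration of the idea, not ShardingSphere code or distributed SQL syntax: the group names, the `user_<n>` actual table names, and the rule that places shards on groups alternately are assumptions, since the talk only says that four actual tables are spread over two PostgreSQL groups.

```python
SHARD_COUNT = 4                          # the logical table is split into four shards
GROUPS = ["pg_group_0", "pg_group_1"]    # hypothetical names for the two clusters


def route(user_id):
    """Return (group, actual_table) for a row of the logical table `user`."""
    shard = user_id % SHARD_COUNT        # mod sharding on the sharding column
    group = GROUPS[shard % len(GROUPS)]  # assumed placement: alternate groups
    return group, f"user_{shard}"


def plan_inserts(user_ids):
    """Group user ids by the actual table that would store them."""
    plan = {}
    for user_id in user_ids:
        plan.setdefault(route(user_id), []).append(user_id)
    return plan
```

For example, under these assumptions a row with user id 5 lands in shard 1, so `route(5)` returns `("pg_group_1", "user_1")`, while the application still queries the single logical table.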
That means you just deploy your PostgreSQL databases and
ShardingSphere-Proxy, and you run all of the distributed
SQL from a client connected to ShardingSphere-Proxy. Then it
will automatically do the data sharding for you, right? So,
as I mentioned before, if you want ShardingSphere to help
you do data sharding, scale-in and scale-out, read/write
splitting, or SQL audit, you can just use the distributed
SQL here to tell it, and it will work with all of the
functions I mentioned before.
See here the read/write splitting function. We can just
tell ShardingSphere: hey, this one is the write data source,
this one is the read data source, and we want to use the
random strategy for the read/write splitting function,
right? So it will randomly send your queries to the
different replicas. For example, if you have three replicas,
all of them can help you get the results. Here we use the
distributed SQL to tell ShardingSphere: hey, you can
randomly send the select queries to the three different
replicas. So that's the power of the distributed SQL. In
this demo I use a lot of distributed SQL to help us manage
our database clusters, run queries, and gather results from
this distributed system. If you have time, you can run the
demonstration by yourself. That's all for today's talk. If
you have any questions you can contact me here, or you can
just raise an issue on GitHub. I mean, the community is
willing to answer your questions and to help you out.
Yeah, see you next time.