Conf42 Site Reliability Engineering 2023 - Online

How to achieve the scalability, high availability, and elastic ability of your database infrastructure on Kubernetes

Abstract

This topic will present a solution to distribute your database clusters and make them scalable, elastic, and highly available by governing the traffic among your applications and databases.

Summary

  • Trista will talk about how to achieve scalability, high availability, and elasticity for your database infrastructure on Kubernetes. Trista is the co-founder and CTO of SphereEx, a startup company, and sometimes posts articles about startups, the open source community, and databases on LinkedIn and Twitter.
  • Today I will first introduce some background on SRE, SLA, and DBRE. Then we will talk more about your databases on Kubernetes and on the cloud, with some solutions, ideas, and architectures for distributed databases.
  • A DBRE's SLI is the service level indicator for our databases. We use this indicator to watch or review whether the service quality is good enough for the application or the end user. End users just want to know that your database service is available and answers their queries quickly.
  • The last part gives solutions to the issues mentioned before. First, you can use a distributed database vendor. Second, you can consider a one-stop solution from a cloud vendor.
  • Today I want to give a new idea about your database architecture. This solution helps you leverage your existing databases and upgrade your current monolithic database infrastructure into a distributed one. It also gives you an out-of-the-box way to deploy.
  • ShardingSphere provides two clients for you to choose from. For Java applications there is ShardingSphere-JDBC; for applications in Golang, Rust, or other languages, you can choose ShardingSphere-Proxy. The next part introduces the out-of-the-box deployment on Kubernetes, ShardingSphere-on-Cloud.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, this is Trista. At this conference I'm going to share this topic: how to achieve scalability, high availability, and elasticity for your database infrastructure on Kubernetes. First, something about myself. My area is databases, especially distributed databases, cloud databases, and Kubernetes, which is why I chose this topic for today's talk. I will talk about DBRE, database reliability engineering, and also about MySQL, PostgreSQL, and other monolithic databases on Kubernetes. Apart from my job, I love open source. I'm an Apache member and incubator mentor, giving tips and sessions to open source communities and their projects. So that's my background. I'm now the co-founder and CTO of SphereEx, a startup company. Sometimes I post articles about startups, open source communities, Apache projects, databases, and distributed systems on my LinkedIn and Twitter, so if you have questions or want to talk with me more, please take a look there.
Let's get started with today's talk. I will first introduce some background on SRE, SLA, and DBRE. Then we will talk more about your databases on Kubernetes and on the cloud, and I will give some solutions, ideas, and architectures for distributed databases, and how to achieve high availability, scalability, and elasticity with them. If I have more time I will give the demo, which I built myself. If there isn't enough time I will just give a brief introduction to it, and you can refer to the PDF file and do it yourself.
Okay, so the background is SRE, SLA, and DBRE, because that is actually the topic of our conference. I don't want to spend much time on all the concepts. I just want to say that I know many people are currently asking: should I switch or change my DBA career?
I want to say that, especially now that big data has become a hot topic and more and more popular distributed systems and big data platforms are coming out, it's a good time to think about our jobs as DBAs and about our data. So I think it's necessary to know more about DBRE and how to use your database experience to provide a good SLA, a service level agreement, to your services and your company, especially the application layer. When we compare SRE and DBRE, databases are different from application services and stateless services. With databases you need to consider the visits, the connections, and the persistent data, so DBRE needs more specific consideration. But the two share an overlapping area, especially SLA, SLO, and SLI. If you want to learn more about these ideas, I recommend the following two books.
Today we will just talk about the DBRE's SLI, I mean the indicator. No matter whether you are a DBRE, a DBA, or a developer, you need to consider what a high SLA means: you commit to your users that you will provide that level of service quality. So if we pay attention to the DBRE's SLI for databases, that is the service level indicator, we use it to watch or review whether the service quality, the service level agreement, is good enough for the application or for our end users. The DBRE book gives some good indicators for our databases, for example latency, throughput, availability, scalability, and efficiency, for us to consider when evaluating quality. But those are high-level indicators for a production environment, so I give some more practical indicators for our databases and production environment.
For example: how to manage tremendous amounts of data, how to provide efficient query answers to our applications, how to handle data security, whether your database cluster has good elastic scaling capability, what your backup and recovery look like, how to use metrics to help us locate errors or exceptions when they come up, how to make our infrastructure portable, and whether we have good automatic deployment principles or methods. Those are the specific items for reviewing or evaluating the quality of our database service. On the right I also give some techniques to reach those goals. For example, we can use a distributed system and do data sharding to give our database infrastructure scalability and elasticity. We can create more replicas and apply a read/write splitting strategy to the traffic, to the visits, so that the database cluster has high availability and higher throughput. We can use data encryption, I mean open source projects, to protect our data. We can create a monitoring or metrics system to help with alerting. And we need to consider using Helm or operators, especially on Kubernetes, to provide out-of-the-box deployment and help with the maintenance work. Those are the techniques I can give to address part of the SLIs, the items in the left column, for our databases.
End users actually don't care about all of these specific indicators. They just want to know that your database service is available whenever they send a query, and that your databases can answer their queries efficiently and as soon as possible. That's the main point your end users are concerned about.
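To make the high-level indicators concrete, here is a minimal Python sketch of how availability and a latency percentile could be computed from recorded query samples and checked against a target. The `QuerySample` type, the function names, and the thresholds are all hypothetical and chosen for illustration; they do not come from the talk or from any monitoring product.

```python
import math
from dataclasses import dataclass


@dataclass
class QuerySample:
    """One observed query: how long it took and whether it succeeded (hypothetical type)."""
    latency_ms: float
    ok: bool


def availability(samples):
    """Availability SLI: fraction of queries that succeeded."""
    if not samples:
        raise ValueError("no samples")
    return sum(1 for s in samples if s.ok) / len(samples)


def latency_percentile(samples, pct):
    """Latency SLI: nearest-rank percentile over successful queries."""
    latencies = sorted(s.latency_ms for s in samples if s.ok)
    if not latencies:
        raise ValueError("no successful samples")
    rank = max(1, math.ceil(pct / 100 * len(latencies)))
    return latencies[rank - 1]


def meets_target(samples, min_availability, max_p99_ms):
    """Compare the measured indicators with illustrative target values."""
    return (availability(samples) >= min_availability
            and latency_percentile(samples, 99) <= max_p99_ms)
```

In practice the samples would come from a metrics system rather than an in-memory list; the point is only that each indicator reduces to a simple, reviewable number.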
But we, as DBREs, DBAs, developers, or DevOps, need to dive in and consider these indicators. In the last part I want to give you solutions to all of the issues I mentioned before. I know that most of you use PostgreSQL, RDS, MySQL, Oracle, or SQL Server in your production environments. But we are entering a new era, so we need to consider how to use new solutions to solve the current headaches of our database infrastructure.
First, I suggest you can use a distributed database vendor. Different database vendors solve one or a few of your database headaches, so you can look at the details and do some research on the vendors to see whether they can solve yours. Second, we can consider a one-stop solution: you can choose a cloud vendor. They definitely provide databases or big data platforms and products to solve your current issues. The third one is what I want to talk about today. I want to give a new idea about your database architecture. If you don't want to be locked in to a database vendor or a cloud vendor, you may need to consider how to make your database infrastructure more flexible and portable, and how to leverage your current databases and infrastructure, because I know that if you don't have specific requirements, you don't want to overturn or change your database infrastructure. So I want to introduce this third option and the benefits you get from it. First, this solution helps you leverage your existing databases. I mean, you don't make big changes to your MySQL or PostgreSQL infrastructure. You just leverage it, yet you can upgrade your current monolithic database infrastructure into a distributed one.
I mean, it adds data sharding, data encryption, and other good features that turn your current monolithic databases into a distributed one. This solution also gives you other necessary features around your database infrastructure, for example a SQL firewall, SQL audit, traffic governance, and elasticity. Last, you may ask: can I run my favorite PostgreSQL databases on Kubernetes? The solution also gives you an out-of-the-box way to deploy, so you can quickly run a demo and test whether it works for you. Those are the benefits covered in today's topic.
The basic idea comes from basic distributed architecture. We know that if you want to turn a monolithic architecture into a distributed one, you need to split a big database or a big project into small ones, all located on different servers or machines. That's what distributed means. This new architecture helps us improve our databases and infrastructure so that they have scale-in and scale-out capability, or elastic capability, and can manage more and more data. We cannot rely on a single server to manage terabytes or petabytes of data, especially in this new era where every day, every hour, every minute our clusters and applications create data. So look here: we want to adopt a distributed database architecture. That means we split a single database into a lot of small ones to give the architecture scalability. Basically, when we speak of a distributed database system, we split the single monolithic database into two important elements: the first is the computing node, the second is the storage node. A storage node persists your data and does local computation to get or store its partial data.
The computing node's function, in contrast, is to provide the entrance for visits, do the global computing workflow, and answer people's queries. If you take a look at the MongoDB architecture, the CockroachDB architecture, or others, you will find that they use the same idea to build a distributed database. Based on this idea, today I want to leverage your favorite MySQL, PostgreSQL, or SQL Server databases and upgrade them into a distributed one. We can have your current databases act as the storage nodes. People have used PostgreSQL in many production environments, and we trust that it works stably as a single database. In this distributed database system we regard it as one of the storage nodes: it persists your data and does the local computing. Then we just need to bring a computing node into the system. The new computing node plus your current PostgreSQL or MySQL databases working as storage nodes merge into a distributed database.
So who can be the computing node? It's an open source project, Apache ShardingSphere. I will give a brief introduction to it. It's a distributed database engine, or a distributed database proxy. If you use ShardingSphere as the computing node, then, as you can see here, ShardingSphere plus your PostgreSQL database cluster make up a distributed database system. Your application just sends its queries to this proxy, this distributed database engine. ShardingSphere can be deployed as a cluster, it works as the computing nodes of the distributed system, and it presents itself as a database server to handle all of the queries from your application.
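The split between storage nodes (local computation over partial data) and a computing node (entry point plus global computation) can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not how ShardingSphere is implemented: the `StorageNode` and `ComputingNode` classes and their methods are hypothetical, and rows are kept in memory instead of in real PostgreSQL or MySQL instances.

```python
class StorageNode:
    """Stands in for one existing database instance that holds part of the data."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = list(rows)  # each row is a dict, e.g. {"amount": 10}

    def local_count_and_sum(self, column):
        # Local computation: only looks at the partial data this node persists.
        values = [row[column] for row in self.rows]
        return len(values), sum(values)


class ComputingNode:
    """Entry point for applications; merges partial results into a global answer."""

    def __init__(self, storage_nodes):
        self.storage_nodes = list(storage_nodes)

    def average(self, column):
        total_count, total_sum = 0, 0
        for node in self.storage_nodes:
            count, partial_sum = node.local_count_and_sum(column)
            total_count += count
            total_sum += partial_sum
        if total_count == 0:
            return None  # no rows on any storage node
        # Global computation: the average cannot be derived by one node alone.
        return total_sum / total_count
```

The application only talks to the `ComputingNode`; it never needs to know how many storage nodes exist or which rows live where.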
This proxy can shard our databases or our data, and it can apply read/write splitting and traffic governance strategies. It's like a database gateway. You just send your queries to ShardingSphere, and ShardingSphere figures out whether a query should go to a particular shard or to a replica. So ShardingSphere does the data sharding and read/write splitting for you. Also, if one of your Postgres instances crashes, it becomes aware of that status and updates itself, so the next time your application visits, ShardingSphere will never send the query to the crashed instance, I mean the crashed replica. You just tell ShardingSphere which features you want, and it works as the distributed computing node, the database server, handling all of the queries and the traffic governance for you. That's the magic, the function, of ShardingSphere.
When you use ShardingSphere, your database infrastructure goes from this one to this one. Before, your application visited your PostgreSQL instances directly, and it needed to know which is the primary node and which is a replica node. With ShardingSphere, you don't need to care which one is a replica or a primary, or which is the first shard cluster and which is the second. ShardingSphere handles all of that. You just send your query to ShardingSphere, and it deals with all of the requirements. That's its function, and I wanted to introduce it and tell you how to use it in your database infrastructure. Next, let's take a brief look at its GitHub statistics, because nobody wants to adopt a brand-new project.
You may worry that one day the project will disappear or become closed source. You don't need to worry about that. It has been open source for six years, it's an Apache top-level project, and it has already received more than 400 contributors. It has nearly 18k stars on GitHub and has released nearly 50 times. So it's a big community and a very popular open source project, and its documentation and user cases are of good quality for you to use it or learn more about it.
Okay, so the ShardingSphere clients. As I said before on this page, your application uses the database proxy to visit your databases. ShardingSphere actually provides two clients for you to choose from: the first is the driver client, the second is the proxy client, and you can choose based on your case. The driver is named ShardingSphere-JDBC. As you can see from its name, it only suits Java applications: it's a lightweight implementation of the JDBC interface. You just use this lightweight driver and it helps you manage your database cluster and do data sharding, read/write splitting, data encryption, and similar features. For applications in Golang, Rust, or other languages, you can choose ShardingSphere-Proxy. You need to deploy ShardingSphere-Proxy independently, and it really works as a database server. It shares the same features as ShardingSphere-JDBC: sharding your data, data migration, read/write splitting, data encryption, data masking, SQL audit, all of the good features. On this page you can see an index of all of the features and all of the databases this project supports. But I know you will wonder how to make ShardingSphere work with these features.
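The read/write splitting and crash awareness described above can be sketched in a few lines of Python. This is an illustrative model only, not ShardingSphere's implementation: the `ReadWriteRouter` class, the node names, the simple `SELECT` prefix check, the random choice among healthy replicas, and the fallback to the primary when no replica is healthy are all assumptions made for the sketch.

```python
import random


class ReadWriteRouter:
    """Toy gateway: writes go to the primary, reads go to a healthy replica."""

    def __init__(self, primary, replicas, rng=None):
        self.primary = primary
        self.replicas = list(replicas)
        self.down = set()                   # replicas currently known to be crashed
        self.rng = rng or random.Random()

    def mark_down(self, name):
        # Called when a health check notices that a replica has crashed.
        self.down.add(name)

    def mark_up(self, name):
        # Called when the replica is reachable again.
        self.down.discard(name)

    def route(self, sql):
        # Simplification: treat any statement starting with SELECT as a read.
        is_read = sql.lstrip().lower().startswith("select")
        if not is_read:
            return self.primary
        healthy = [r for r in self.replicas if r not in self.down]
        if not healthy:
            return self.primary             # assumed fallback for this sketch
        return self.rng.choice(healthy)
```

The application calls `route` for every statement and never needs to know which node is the primary or which replica is currently unavailable.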
In a later part I will introduce the distributed SQL, which helps you do the configuration. In this next part I need to introduce the out-of-the-box deployment, especially on Kubernetes. This repo here, ShardingSphere-on-Cloud, helps you quickly deploy ShardingSphere-Proxy on Kubernetes. With one command you can deploy every element of this project, and it can also deploy the governance center, the proxy, and your databases. Currently we support PostgreSQL databases here. It provides Helm charts and operators to guarantee high availability, automatic deployment, metrics, and similar features on Kubernetes. In our demo I will use this ShardingSphere-on-Cloud repo to quickly deploy a ShardingSphere cluster. You can choose to use RDS as the storage nodes and just let the ShardingSphere-on-Cloud repo deploy the ShardingSphere proxies.
As I said, one benefit of this solution is that you are not locked in to one cloud, because you use this open source project as the computing node of the distributed database, and it reaches the same qualities as a distributed system, for example data sharding and distributed transactions. Another benefit is that your storage nodes, your databases, can be located anywhere. They could be a managed database like RDS, on Google Cloud or AWS. They could be open source databases running on your Kubernetes cluster: if you use a PostgreSQL operator or PostgreSQL Helm charts, you can deploy your databases on Kubernetes. So no matter where your databases are located, you just let ShardingSphere visit your storage nodes, and your application visits this virtual distributed database server. Then you own a distributed database system. That's the flexibility of this solution.
For my demo, I chose one popular Postgres chart to deploy Postgres instances on Kubernetes. You could instead just use your current RDS service. I use the ShardingSphere-on-Cloud repo to deploy the ShardingSphere cluster. The application then just visits ShardingSphere-Proxy, the ShardingSphere cluster, and ShardingSphere-Proxy does the data sharding and read/write splitting for you. You can see here that I use this repo to deploy everything.
As I mentioned before, how do we create a sharding table? Here I want to introduce the distributed SQL, like this one: create a sharding table, user. The sharding column of this distributed table is user id, and we use the mod sharding algorithm to split this logic table, a single table, into four shards, four actual tables. The four tables will be located in the first PostgreSQL group and the second PostgreSQL group, I mean PostgreSQL clusters. As you can see, it looks like a SQL dialect, like PostgreSQL, MySQL, or Oracle SQL. It's just SQL, but it's different from your standard SQL language. It's a new type, which we name the distributed SQL language, and it helps you create your sharding database architecture. That means you just deploy your PostgreSQL databases and ShardingSphere-Proxy, you run all of the distributed SQL on ShardingSphere-Proxy, and it automatically does the data sharding for you. As I mentioned before, if you want ShardingSphere to do data sharding, scale-in and scale-out, read/write splitting, or SQL audit, you can just use the distributed SQL to tell it, and it will enable all of the functions I mentioned. See here the read/write splitting function. We can just tell ShardingSphere: hey, this one is the write data source, and this one is the read data source.
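The mod sharding rule described above (a logic table split by user id into four actual tables spread over two PostgreSQL groups) can be sketched in Python as follows. The names `t_user`, `ds_0`, and `ds_1` are hypothetical, and the round-robin placement of the four tables across the two data sources is an assumption for illustration; the talk does not state the exact placement.

```python
def route_by_mod(user_id, logic_table="t_user",
                 data_sources=("ds_0", "ds_1"), sharding_count=4):
    """Return (data source, actual table) for one row of the logic table."""
    # Mod sharding: the shard index is the remainder of the sharding column.
    shard = user_id % sharding_count
    # Assumed placement: actual tables are spread round-robin over the data sources.
    data_source = data_sources[shard % len(data_sources)]
    return data_source, f"{logic_table}_{shard}"


def group_by_target(user_ids):
    """Group a batch of user ids by the actual table that would store them."""
    targets = {}
    for user_id in user_ids:
        targets.setdefault(route_by_mod(user_id), []).append(user_id)
    return targets
```

For example, under these assumptions `route_by_mod(5)` yields `("ds_1", "t_user_1")`, so a query filtered on that user id only needs to touch one actual table.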
For read/write splitting, we want to use the random strategy, so it will randomly send your queries to the different replicas. For example, if you have three replicas, all of them can serve reads, and with this distributed SQL we tell ShardingSphere: hey, you can randomly send the SELECT queries to the three different replicas. That's the power of the distributed SQL. In this demo I use a lot of distributed SQL to manage our database clusters and to query and gather read results from this distributed system. If you have time, you can run the demonstration yourself. That's all for today's talk. If you have any questions you can contact me here, or you can just raise an issue on GitHub. The community is willing to answer your questions and help you out. See you next time.

Trista Pan

Co-founder & CTO @ SphereEx
