Conf42 Kube Native 2022 - Online

Online OLTP computing and traffic governance as a service for true digital transformation

Abstract

Although RDS allows you to quickly create a database cluster on the cloud, efficiently distributing OLTP (Online Transaction Processing) queries and computing elastically remain a challenge. How do you load balance and manage data traffic between apps and databases to deliver an optimal user experience while staying aware of query cost? How do you audit SQL or blocklist apps' access? This talk will focus on providing you with a solution for elastic and secure OLTP cloud computing with your MySQL, PostgreSQL, SQL Server, and Oracle databases.

Summary

  • Today my talk is about how to do traffic governance from your application to your database infrastructure on Kubernetes. My area is database mesh and distributed databases. If you are interested in these topics, please take a look at my Twitter and LinkedIn.
  • Digital transformation is popular these days as a way to upgrade infrastructure. People have concerns about how to manage and store so much data. Could we use a similar mechanism on Kubernetes to manage our databases? How do we include all the solutions in one?
  • ShardingSphere is an ecosystem to transform any database into a distributed database and enhance it with sharding, elastic scaling, data encryption features, and more. It has more than 17,000 stars and nearly 15 releases on GitHub.
  • ShardingSphere allows you to deploy a ShardingSphere cluster in Kubernetes. ShardingSphere-Proxy handles data sharding and read/write splitting. ShardingSphere can be aware of the different roles of the databases. It can also route or reroute traffic to different PostgreSQL instances.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi everyone, this is Trista. Today my talk is about how to do traffic governance from your application to your database infrastructure on Kubernetes. Before answering this question, I will raise another one, which is the most common request from our community: how do you upgrade your existing monolithic database cluster, maybe MySQL, PostgreSQL, Oracle, or SQL Server, so that it becomes a sharding architecture or a distributed database architecture, right? And then how do you do traffic governance, because in a sharding architecture the topology is so complicated, right? So, I'm Trista Pan, SphereEx co-founder and CTO. My area is database mesh and distributed databases, and also database management platform development. I'm a developer and I spend a lot of my time and effort on open source, especially in the Apache Software Foundation. I'm now mentoring Apache Incubator projects and I'm a member of the Apache Software Foundation. Sometimes I post articles about business, startups, open source, or database and cloud topics on my Twitter and LinkedIn. If you are interested in these topics, please take a look there and let's discuss them, right? So today our content includes the following parts. The first one gives a brief introduction to the background. The second one covers the new requirements for databases on the cloud. In the third one I will give the architecture and the solution. The last one dives into the SQL lifecycle in this distributed database system. Then, if I have enough time, I will give a detailed introduction to the demo, right? So the first part: digital transformation is so popular these days, right? Because people really want to leverage novel technologies to upgrade their infrastructure, or come up with new ideas to serve their customers and users efficiently and effectively, right?
So it's not just about changing your infrastructure, from development to your delivery paradigm; it's also about a change of culture or mindset. So that's the background. Against this background, what about the database? Because all of these changes raise new requirements for our infrastructure and for our databases. Currently, from our community and from my experience, people have the following major concerns. The first one is how to manage and store so much data, right? The second one is how to make each request be answered as fast as it can. The third one is how to do traffic governance. Because, as I said before, especially in a distributed database system, there is a complicated topology among your different computing nodes, your applications, and your storage nodes, right? So we hope that someone, or some application platform, will help us deal with all of that topology. And then we'll consider elastic scaling. That means scalability for our future scenarios and future data, right? Then, if there is an out-of-the-box solution, deployment, or tool, that would be perfect, right? So I will give my answer to each of these questions. First, for managing large data and querying it efficiently, my answer is data sharding. The next one is about traffic governance. Based on the data sharding architecture we can do high availability and read/write splitting, or other things like SQL audit, or traffic strategies based on some metrics. Then there is elastic scaling. That means we can help users rescale the computing nodes and storage nodes of a distributed database system. Later on I will give some introduction to that part. And then, thanks to Kubernetes, we found that microservice applications work so well and effectively on Kubernetes.
So could we use similar primitives or other tools, or a similar mechanism on Kubernetes, to help us manage our databases or data? Is it possible? I think it's possible, because most companies are working that way, right? All right, so how do we do that part? How do we include all the solutions in one and just give an out-of-the-box solution that users can adopt? Before giving the solution or the answer, I need to first give the background of the database system. Let us look at this architecture again and think about databases. Because if we know the fundamentals of the database system, then we will have a better way to solve the issues I mentioned before. First I want to say that a database system is actually made up of two parts. The first one is computing nodes, right? The second one is storage nodes. That means our database, no matter whether it's a distributed one or a monolithic one, has these two parts, these two important capabilities, right? But a distributed database system splits the computing nodes and storage nodes apart and deploys them in different locations. So that means this database becomes a distributed one. Whereas our monolithic databases merge the computing capability and the storage capability together. So you can use just one command to deploy this single database instance, consisting of computing nodes and storage nodes together on one machine, right? That's the difference between monolithic and distributed databases, right? So today we will think about the solution in this way. Consider that we already have MySQL or PostgreSQL here, right? That's your existing database cluster, right? So could we just regard such monolithic databases as the storage nodes of a new distributed database system, right?
And we don't make any change to our existing databases, we don't take any dangerous action on them; we just bring the computing nodes into this system, right? So these computing nodes can work as a database server, right? Like a MySQL or PostgreSQL server, if those are your databases. So now we can make those storage nodes become the local storage nodes, I mean the local database instances. And we can just bring in the global computing nodes, working as the database server. Then the computing nodes plus the storage nodes become the distributed database system, right? So in that way we can upgrade your existing database to become a distributed one. So the last part is ShardingSphere. ShardingSphere can work in this role. I mean, it can work as a database proxy, or present itself as a database server, like a MySQL server or PostgreSQL server. So if we bring in ShardingSphere here, working as a computing node, it will connect to the different database instances, which are the storage nodes of this new distributed database system. ShardingSphere also has a governance node to help synchronize all metadata changes among the different computing nodes, I mean the ShardingSphere instances. In this way we can upgrade our existing monolithic MySQL cluster or PostgreSQL cluster to become a distributed MySQL or PostgreSQL database cluster, right? Another benefit of this solution is that, because we adopt an architecture that splits computing from storage, we can rescale the computing nodes or the storage nodes, that is ShardingSphere or your databases, independently, right? At this point, if you find that you need more computing power, then you can just spin up more ShardingSphere-Proxy or ShardingSphere instances. So that means you can have more computing nodes working in this distributed database system.
On the other hand, if users find they need more storage capacity, then you don't need to spend time, effort, and money on computing nodes. You can create more database instances, right? So here maybe there are one, two, three, and maybe there is a fourth or fifth one. And ShardingSphere can connect to the new database instances and help you reshard your original data among one, two, three, the old ones, and the new ones. I mean, the new storage nodes, your new PostgreSQL nodes, will contain data resharded by ShardingSphere. So that's the benefit of this solution. For the last part, let us learn more about this role, the computing node: Apache ShardingSphere. Why do I say that it can help us do that work? Because ShardingSphere is an ecosystem to transform any database into a distributed database and enhance it with sharding, elastic scaling, data encryption features, and more, right? So from the slogan it looks like it works well, and this project also has a strong community to help us answer questions. You can see the statistics on GitHub here: more than 17,000 stars, nearly 15 releases, and more than 400 contributors, right? So that means you don't need to worry that you will be the first person to use the project. You don't need to worry about issues either, because your issue may already have been found by others in this community, right? So it's a mature community and a mature project for us to use. And its documentation is very detailed, to help us understand its concepts and set up this ecosystem or solution. So ShardingSphere provides two clients. The first one is ShardingSphere-Proxy, the next one is ShardingSphere-JDBC. Like I said before, ShardingSphere-Proxy can work as the computing nodes, and ShardingSphere-JDBC is the same. But today we will use ShardingSphere-Proxy for our demo.
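
The resharding step mentioned above (adding a storage node and redistributing existing rows) can be illustrated with a minimal sketch. This is not how ShardingSphere implements scaling; the modulo rule and the names `shard_index`, `reshard`, and `moved_keys` are assumptions made only for illustration.

```python
from typing import Dict, List

# Hypothetical model: each storage node is a dict of primary key -> row payload.
Shard = Dict[int, str]


def shard_index(key: int, shard_count: int) -> int:
    """Pick the storage node for a key with a simple modulo rule (an assumption)."""
    return key % shard_count


def reshard(shards: List[Shard], new_count: int) -> List[Shard]:
    """Redistribute every existing row across `new_count` storage nodes."""
    new_shards: List[Shard] = [{} for _ in range(new_count)]
    for shard in shards:
        for key, row in shard.items():
            new_shards[shard_index(key, new_count)][key] = row
    return new_shards


def moved_keys(old: List[Shard], new: List[Shard]) -> List[int]:
    """Return the keys whose storage node index changed after resharding."""
    old_home = {key: i for i, shard in enumerate(old) for key in shard}
    return sorted(
        key for i, shard in enumerate(new) for key in shard if old_home[key] != i
    )


# Start with three storage nodes, then add a fourth one.
rows = {order_id: f"order-{order_id}" for order_id in range(12)}
three_nodes = reshard([rows], 3)
four_nodes = reshard(three_nodes, 4)
print([sorted(shard) for shard in four_nodes])
print(moved_keys(three_nodes, four_nodes))
```

With a plain modulo rule most rows change nodes when the node count changes, which is one reason resharding is best left to a tool rather than done by hand.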
And you can see here that ShardingSphere-Proxy can access MySQL, PostgreSQL, or RDS databases. So that means ShardingSphere-Proxy can help us manage your database cluster, upgrade it to become a distributed one, help us do traffic governance, and help us manage the complicated topology of your distributed database system, right? Because all the applications first send their requests to ShardingSphere. ShardingSphere does a lot of computing, global computing, to work out which database cluster holds the expected data, and then merges the different local result sets into the final one and returns it to our end users. Another benefit it gives us is read/write splitting. That means your databases, like this first database instance, may have a lot of replicas for this shard, right? And many replicas for the second shard, and replicas for the third shard. Then ShardingSphere will judge whether this request, this SQL, is a select statement or a DML statement, an update or insert. So it will automatically route the SQL to the appropriate primary node or its replicas. You can just route requests to the replicas at random or use other strategies. But the main goal of ShardingSphere-Proxy or ShardingSphere-JDBC is to help you leverage the performance of your replicas and your primary nodes to improve throughput. So that's the benefit of ShardingSphere, right? So for the last part I will give the final solution. ShardingSphere provides ShardingSphere charts, so you can use Helm, with one command, to deploy a ShardingSphere cluster in Kubernetes. And if you already have your PostgreSQL instances, that's okay, you can ignore this part. But today I want to give a complete demo, so I use the PostgreSQL charts to deploy two PostgreSQL instances here.
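
The read/write splitting idea described above (select statements go to a replica, everything else goes to the primary) can be sketched roughly as follows. This is an illustrative sketch only: the function names and node names are hypothetical, and a real proxy parses SQL properly instead of looking at the first keyword.

```python
import random
from typing import List, Optional


def is_read_statement(sql: str) -> bool:
    """Treat a statement as a read if its first keyword is SELECT (a simplification)."""
    parts = sql.split()
    return bool(parts) and parts[0].upper() == "SELECT"


def route(
    sql: str,
    primary: str,
    replicas: List[str],
    rng: Optional[random.Random] = None,
) -> str:
    """Return the name of the node that should execute `sql`.

    Reads go to a randomly chosen replica; writes, and reads when no
    replica is available, go to the primary.
    """
    rng = rng or random.Random()
    if is_read_statement(sql) and replicas:
        return rng.choice(replicas)
    return primary


# Hypothetical node names for one shard.
primary_node = "pg-0-primary"
replica_nodes = ["pg-0-replica-a", "pg-0-replica-b"]

print(route("SELECT * FROM t_order WHERE order_id = 1", primary_node, replica_nodes))
print(route("UPDATE t_order SET amount = 10 WHERE order_id = 1", primary_node, replica_nodes))
```

Random selection is only one possible strategy; as the talk notes, other load-balancing strategies can be used instead.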
So now on Kubernetes we have the computing nodes and we have the storage nodes, and computing nodes plus storage nodes become the final distributed system. Our application can just send requests to ShardingSphere-Proxy, and ShardingSphere-Proxy will do data sharding, read/write splitting, SQL audit, authentication, authority, and such things for us. So those are the features of ShardingSphere, right? You can see here that the application just sends the request to ShardingSphere-Proxy, and ShardingSphere-Proxy handles data sharding and read/write splitting, and keeps track of the fact that there may be many replicas. If ShardingSphere finds that the primary has crashed, it will send later requests to our replicas, right? So that's ShardingSphere-Proxy's function. ShardingSphere governance helps us synchronize the metadata among the different proxies. The ShardingSphere operator helps us guarantee the high availability of ShardingSphere-Proxy and also scales the computing nodes in or out elastically. So that's the ShardingSphere operator. It's like a DBA on Kubernetes guarding this distributed system. Another point I want to make is about the availability of PostgreSQL or MySQL itself: we need other tools to guarantee it. But ShardingSphere can be aware of the different roles of the databases and help route or reroute traffic to different PostgreSQL instances. Today I don't have enough time to give more details, but I will give the final part. Let us look at the SQL. So here, when the application sends a request to ShardingSphere-Proxy, ShardingSphere-Proxy uses the sharding algorithm to find which databases contain the expected data, right? It gathers the data from those database instances into ShardingSphere-Proxy, merges the local result sets, and returns them to our end user. It will also judge or evaluate the statement; here, for example, it's a select statement, right?
So ShardingSphere-Proxy will automatically send the select statement to the replicas, not the primary. You can also tell ShardingSphere to send a request to the primary or to the replicas. That's okay, it's up to you. You can just write a YAML file to tell ShardingSphere to do that. But for all of the change statements, I mean insert, update, or create, it will automatically send them to the primary. Then the primary will synchronize the changes to its replicas. So that's clear now, right? Okay, as for the demo, I have no time to go through it step by step, but you can refer to my steps to create or test the solution. All right, so see you next time. And if you have any questions, just contact me on Twitter, GitHub, or LinkedIn. That's all. So see you. Bye.
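
The SQL lifecycle described in the talk (use a sharding algorithm to find the target storage node, run the query locally, then merge the local result sets) can be sketched as below. The in-memory shards, the modulo sharding rule, and all function names are assumptions for illustration and are not ShardingSphere APIs.

```python
import heapq
from typing import Dict, List

Row = Dict[str, int]


def shard_index(order_id: int, shard_count: int) -> int:
    """Hypothetical sharding algorithm: modulo on the sharding key."""
    return order_id % shard_count


def insert_row(shards: List[List[Row]], row: Row) -> None:
    """Route a write to the single shard that owns the sharding key."""
    shards[shard_index(row["order_id"], len(shards))].append(row)


def find_order(shards: List[List[Row]], order_id: int) -> List[Row]:
    """Point query: the sharding key is known, so only one shard is visited."""
    shard = shards[shard_index(order_id, len(shards))]
    return [row for row in shard if row["order_id"] == order_id]


def find_large_orders(shards: List[List[Row]], min_amount: int) -> List[Row]:
    """Query without the sharding key: ask every shard, then merge.

    Each shard returns its local result set sorted by order_id, and the
    computing node merges the sorted local results into one final result.
    """
    local_results = [
        sorted(
            (row for row in shard if row["amount"] >= min_amount),
            key=lambda row: row["order_id"],
        )
        for shard in shards
    ]
    return list(heapq.merge(*local_results, key=lambda row: row["order_id"]))


# Two storage nodes, mirroring the two PostgreSQL instances in the demo.
shards: List[List[Row]] = [[], []]
for order_id, amount in [(1, 50), (2, 300), (3, 120), (4, 80), (5, 500)]:
    insert_row(shards, {"order_id": order_id, "amount": amount})

print(find_order(shards, 3))
print(find_large_orders(shards, 100))
```

The point query touches one storage node, while the query on `amount` has to visit every node and merge, which is the extra work the computing node takes on for the application.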

Trista Pan

Co-Founder & CTO @ SphereEx

Trista Pan's LinkedIn account Trista Pan's twitter account


