Conf42 Internet of Things (IoT) 2024 - Online

- premiere 5PM GMT

PostgreSQL on Kubernetes: Dos and Don'ts

Abstract

Running a database in Kubernetes isn’t easy. That’s true for PostgreSQL itself. Optimizing it for performance and latency is even harder. Let’s find the Dos and Don’ts for Postgres on Kubernetes. And yes, there is more than using an operator, but it’s a good start.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
All right, nice that you're all here, very appreciated. Let's dive right in. One thing I learned giving this presentation a few times in the past is that the title slide is very often misunderstood, or let's say, it gives people a laugh, because a lot of people actually read it as "Do Postgres, don't do Kubernetes", which is not how it's meant. It's actually a good read, but I think it's the wrong way around, because from my perspective, Postgres really loves Kubernetes, and the other way around, Kubernetes really loves Postgres. I expect there are quite a few people in the audience that already use Postgres, a good chunk of people that, I guess, use Kubernetes, and probably still a sizable number of people that already use Postgres on Kubernetes. If you don't, I hope that by the end of the presentation I got you to the point where you say: ah, that sounds like a really good idea, I should totally do that. A few words about me. I'm Chris, I'm a developer advocate at simplyblock. I've been with a few companies in the past, some might be more known than others. Some people might have heard about this weird little gaming company called Ubisoft. If you've heard about them, it was probably in the context of copy protection, but that was not what I worked on, at least. Afterwards it was very much all about data. Hazelcast was an in-memory data grid, meaning everything was about the actual data in memory that you could analyze, that you could use to share state between different stateless instances, stuff like that. It was all about data. And Instana was all about observability, so still all about data: it's metrics, it's traces, it's spans, it's the combination of different services and how they work together in the whole infrastructure. clevabit was my own startup.
We actually made hardware and software for animal husbandry, measuring things like CO2 and ammonia levels, trying to correlate them with medication, and in general trying to prevent medication from being given to the animals, to get better meat quality and to give the animals a healthier life, at least for the time they actually live. We all know how that ends, right? But at least we made their lives as good as possible while they're around. And Timescale is a time series database on top of Postgres, and as you can see in the picture, that is from last year, I really like that conference picture. At Timescale I loved explaining to people why I think Postgres is the ultimate database for time series data. And now I'm at simplyblock, and simplyblock is also all about data. It's all about data storage, how you can expand storage, how you can scale storage. So there's a little bit of a red thread through my life. Even at Ubisoft, a lot of stuff in games these days is about data, about interaction, how players interact with your games. So there's this general theme. If you want to drop me a message, if you have questions, you see my socials. The presentation will be shared, so just feel free to drop me a line and ask a question. What I learned over the last couple of years is that you want to start a presentation with a game, making it interactive, making it fun. For the ones that don't know what Family Feud is, it's basically a game show where two families compete against each other, trying to guess the first answers that popped into people's minds. So the idea is: we asked a hundred people, what is your first thought when it comes to X, Y, Z? For today, I thought it makes sense to ask: why shouldn't you run a database in Kubernetes? And because it's hard to do this live right now, I went ahead and was nice enough to figure that out for you. So I went to Twitter.
And asked: why shouldn't you run a database on Kubernetes? What do you think? Please help me, I need your help. When you do these things on Twitter, sorry, X, I have to get used to that, you have to remember one thing: when you ask a controversial question, always ask for a friend. It's important, asking for a friend. It's not about you, obviously you would never do this. So asking for a friend is the most important thing for those kinds of questions. And there were answers. The answers were like: Kubernetes wasn't designed with databases in mind. Which is very much true. When Kubernetes was designed initially, what is it, 10 years ago, it was designed for stateless systems and stateless only. But about five or six years ago, I don't want to pinpoint the exact point in time, Kubernetes changed. People started to figure out: okay, stateless systems are nice, but they're not easy to design. So they started to integrate elements for stateful services, as they call them. True databases weren't well thought about at that point, at least not as a first thought, but that changed quite a bit. So if you're stuck in the past: yes, Kubernetes wasn't designed for databases. These days, that's not actually true anymore. Then: never run stateful workloads on Kubernetes. Okay, that's a harsh one. As I said, if you're stuck in the past, that's true, but stateful workloads aren't always just databases. That might be the first thing that pops into people's minds, but in general there are a lot of stateful services. It's really hard, actually, to write a fully stateless service, because oftentimes you have at least a minimal state. That state might be ephemeral, so it doesn't really matter if it gets lost, but you want to keep it because it makes stuff easier, it makes stuff faster.
So fully stateless services are actually very hard to do. Persistent data will kill you because it's too slow: that's a good one, and there might be some truth to it, but I think that very much depends on where you go in the sense of: how do I store my data? This is up to you; find a good storage solution. I might be biased and say simplyblock is a good one, but this very much depends on you, and there are a lot of options these days for Kubernetes, even simple stuff like NFS or local disks, like local persistent storage. It's all there. I wouldn't necessarily say do this, we'll get back to that later, but there are a lot of options. So I think that claim is not true. If you say, okay, I want to plug in a USB stick and store my database on that USB stick via a persistent volume, be my guest. Just don't expect it to be fast. Nobody understands Kubernetes: that is a really interesting one. I laughed really hard when I saw that specific answer, because, yes and no, there is some truth to it. I'm pretty sure nobody understands Kubernetes completely, including the Kubernetes developers, including me. There are still a lot of things in Kubernetes that I've never used and probably never will. I think it's a little bit like programming languages: you can say you're proficient in a programming language, but does that mean you know every little bit about it? No, you don't. But if you feel like you're not understanding Kubernetes at all, please don't run a database on Kubernetes. Don't run an application on Kubernetes. Please don't run Kubernetes at all. If you feel like you're not understanding anything about it, just don't use it; there is no need in your case, most probably. The next one is: what is your benefit, databases don't need to scale, or especially auto-scale. And that one kind of confused me. It feels like this is a person that didn't necessarily run a database of a meaningful size.
Many databases probably don't need to scale out, but especially for things where you store user behavior, click flows, any kind of IoT data, all that kind of stuff, databases do need to scale out, and scaling out can mean a lot of things. It could be the database itself, it could be compute, it could be storage. Then: databases and applications should be separated. And I'm very much on par with that. I totally agree, your database and your applications normally shouldn't run on the same nodes, and we'll see a little bit of a solution for that later on. So yes, fully agree, but it's not an argument against Kubernetes. And the last one is actually something I very much agree with: not another layer of indirection and abstraction. Abstractions and indirections can be complicated. They don't have to be; they're a blessing, a match made in heaven, as long as stuff works, because they take away complexity. But they can be a beast from hell if stuff goes down the drain and you now have to figure out where things are actually broken. That is really a problem, and Kubernetes doesn't make things easier. All those extensions for Kubernetes, like service meshes and that kind of stuff, all the things that make your life easier, actually make it harder the second you have an issue and try to figure out what is happening. So I saw those answers and I was like: oh, where do I go from here? How do I make this work for the presentation? I wanted to tell you why you actually should do this and why it's a great idea. Do I have to burn in hell now? This is complicated. What am I doing? I want to go to the happy place. So how can we do this? My first thought: hey, let's cheat. Cheating is good. I worked for Ubisoft, so I know what I'm talking about, right? I'm a big gamer. I know exactly what I'm talking about.
So this is something a lot of the older folks in the audience might actually know: it's the so-called Konami Code. It's probably the most well-known cheat of all time. For the younger generation, just look it up on Wikipedia: Konami, like the gaming company, Konami Code. It was a great pleasure. So now that we've cheated, we can actually get to the real topic. Why do we want to run Postgres on Kubernetes? I think there is a good set of reasons. First of all, I'm a big hater of cloud vendor lock-in. I think it's a really big problem. Cloud vendors, especially hyperscalers, love to offer you all kinds of services and make them easy to integrate, to lure you in. But the problem is: those services might be nice and might be good for the first iteration, maybe the first couple of iterations, but at some point you're hitting the upper limit of what is possible. Those services are meant for the mainstream. They only work because they can be used by a lot of people, and a lot of people means that the second you have something special, there is a problem. They don't care for you anymore, because you're not part of the mainstream. Specifically for databases like RDS, that can happen very fast if you use custom or special extensions. Sure, the standard extensions are all available, like PostGIS and such, but the second you come to something like, for example, TimescaleDB, you might get an old version, you might only get the Apache-licensed version, which doesn't have all the features, stuff like that. You might actually run into quite an issue really fast, even though you feel like you haven't outgrown something like RDS. Faster time to market, decreasing costs, and automation: I think those all go well hand in hand.
If I have everything automated, if I have everything easily deployable, it's super easy to get to market fast, do a couple of iterations, see if stuff works, and that decreases cost. I might actually save one or two people; that is where automation comes in. I tell Kubernetes: hey, I need a new Postgres cluster, and it spins it up and makes sure everything is in place. That can be a single DevOps person instead of a team of database engineers. And I'm sorry, I know I'm bashing on database engineers here, but don't get me wrong: you still have a job, it's just not the boring stuff. Now you can actually look into the cool things. Then, a unified deployment architecture. I'm a big lover of that kind of stuff. A unified architecture makes things easy to understand: we deploy the databases the same way we deploy our applications, all of that. That's why people love Terraform so much, right? Terraform gives you a single tool to do all of this. Somebody has to write the recipes, but it doesn't have to be you. And last but not least, I'm a big fan of read replicas. If you have a use case where the small lag of data through physical replication is not much of an issue, and that is probably true for a lot of IoT use cases, read replicas are great. At my own startup, clevabit, we used read replicas quite a lot. We got values, or metrics, every 10 seconds, but we averaged them on a minute scale anyway, so it didn't really matter if one or two messages had not arrived on a read replica yet. So what we did is literally just spread the query load across all nodes: writes always went to the primary, but we had read replicas for query scale-out. Okay, now that we know why we want to do this, where do we go from here? Let's get a few things out of the way first which aren't necessarily Postgres-on-Kubernetes specific.
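As a side note, the read-replica setup described here is plain Postgres streaming replication underneath. A minimal sketch of bringing up such a standby could look like this; the hostname, user, and slot name are placeholders, not anything from the talk:

```shell
# Run on the new replica node: clone the primary and configure streaming.
# -R writes standby.signal plus primary_conninfo into the data directory,
# -C/-S creates and uses a replication slot so WAL is retained for us.
pg_basebackup \
  -h primary.db.internal -U replicator \
  -D /var/lib/postgresql/data \
  -X stream -R -C -S replica1_slot
```

Writes still have to go to the primary; only read-style traffic should be routed to a standby like this.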
From my perspective, the most important thing, and I can't stress this enough: enable TLS, enable data encryption at rest. You want to make sure your data is encrypted. It doesn't matter if you're a 10-person company and feel like everyone can have access to the database. That is not the point. The point is to make sure that somebody from the outside cannot get to your data, at all, no way possible. There's nothing worse than a data breach. Luckily, to this day, I can say I've not been at a company while there was a data breach of any kind, which I think is good. I know this is not true for other people, and I can't imagine how embarrassing that is; remember the problem with a software update that basically killed all Windows systems just a few weeks ago. If you're on Kubernetes, use Kubernetes Secrets, use cert-manager for certificates, use the integrations from cloud providers for credential management, all that kind of stuff. Just make sure you have security locked down as much as possible. Yay, we actually got security right, yoo-hoo! The other thing: backups are important, and when I talk about backups, I don't just mean a plain pg_dump. Don't do this, from my perspective, okay? pg_dump is not designed as a backup solution. pg_basebackup is okay, but please don't try to roll your own solution. When we talk about backups in Postgres, or in most databases, we specifically mean continuous backup, meaning that the actual write-ahead log needs to be stored, the same as a database snapshot. And there are great tools for that. We're in the Postgres ecosystem: everything has ten-plus-one tools, because the Postgres distribution doesn't give you those things, so everyone started to build their own. I actually have amazing experience with pgBackRest. It is the tool that we used quite a lot, and it is, from my perspective, rock solid.
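To make the continuous-backup idea concrete, a minimal pgBackRest setup pairs an object-storage repository with WAL archiving. This is only a sketch; the stanza name, bucket, and passphrase are invented for the example:

```ini
# /etc/pgbackrest/pgbackrest.conf
[global]
repo1-type=s3
repo1-s3-bucket=my-pg-backups
repo1-s3-endpoint=s3.eu-central-1.amazonaws.com
repo1-s3-region=eu-central-1
repo1-retention-full=2
# Encrypt the backups themselves, too:
repo1-cipher-type=aes-256-cbc
repo1-cipher-pass=change-me

[demo]
pg1-path=/var/lib/postgresql/data

# And in postgresql.conf, ship the write-ahead log continuously:
#   archive_mode = on
#   archive_command = 'pgbackrest --stanza=demo archive-push %p'
```

With that in place, a base backup plus the archived WAL gives you point-in-time recovery, which a nightly pg_dump never can.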
I know a lot of people using Barman, and that just works as well. There's pghoard, and there are so many other tools. My recommendation would be: either go with Barman or pgBackRest, and just use those. They are industry-proven, rock solid, they just work. From my perspective, that is the way to go. And store your backups somewhere that is not your own nodes. Don't set up a MinIO on Kubernetes and say: oh, I have an S3 API, so I can just store my backups there. That's not how a backup works. Store them to object storage, like S3, or cloud storage, or blob storage, whatever. Just make sure they're not on your nodes. And the other thing: test your backups. Please, I can't stress this enough, test your backups. And backup testing doesn't mean once a year or once in a while; test regularly, like once a month, once a week, once every day. At my own startup we went for once every week, and the way we did it, I think Kubernetes made very simple for us. We had a Kubernetes cluster, and the database cluster was basically one primary and two read replicas. So what we did is we just scaled the cluster up and said: hey, we need a third read replica, basically a four-node cluster. It restored itself from a backup, the node tried to join the cluster, so it had to set up physical replication, streamed in the remaining changes that were not in the backup, and eventually joined the cluster. The second it had fully joined and was able to service requests, we knew that our backup strategy actually worked. Then we scaled it down again, and Kubernetes just deleted the oldest node, basically. So we also had a little bit of a rolling restart system, which tested a few other things like cluster health. I think Kubernetes makes those things very easy and very nice to use.
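With a Patroni-style StatefulSet, that weekly restore test boils down to a couple of commands. The resource names here are illustrative, not from the actual setup:

```shell
# Add a member: the new pod restores from the latest backup,
# catches up via streaming replication, and joins the cluster.
kubectl scale statefulset postgres --replicas=4

# Watch it join; once it shows up as a streaming replica,
# the backup chain demonstrably works end to end.
kubectl exec postgres-0 -- patronictl list

# Scale back down; the extra member is removed again.
kubectl scale statefulset postgres --replicas=3
```

The nice side effect, as mentioned, is that this doubles as a rolling health check of the cluster itself.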
The other thing that kind of surprised me initially when I started setting up Postgres on Kubernetes is that the actual Postgres configuration isn't much influenced at all. You still want to configure your shared_buffers, work_mem, and so on, all of the standard parameters you normally change, and they aren't really influenced by how you set up Kubernetes or Postgres on Kubernetes. You just need to remember that you have a container, and your container has certain size limits, so you want to make sure that you're configuring based on those, but that's about it. The other thing: if you have no idea about those parameters, and those are the easy ones, there are way more advanced ones like worker sizes and worker numbers, if you don't feel like you have a good idea what they mean and how they affect your Postgres installation, get a consultant to help you set it up. Don't just play around with it yourself. I know it sounds silly to get a consultant for those things, but believe me, that is money well invested. You get to a good point much faster, with far fewer problems, and you don't over-configure Postgres. The worst case is that everything seems to work out and you get super fast response times, but when you actually put load on the system, you have it over-configured and you run out of memory; stuff like that is not nice. So make sure you have somebody who understands that. And the other thing: Postgres loves huge pages. So wherever you run Postgres, make sure you have huge pages enabled in your kernel settings. With Kubernetes, there is a little bit of an issue there, and we'll get back to that later on. Just remember: huge pages are great.
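The one container-specific rule of thumb is to size memory parameters against the pod's limit, not the node's RAM. A sketch, with purely illustrative numbers for a pod with a 4Gi memory limit:

```ini
# postgresql.conf, example values only
shared_buffers = 1GB          # roughly 25% of the container limit, not the host
effective_cache_size = 3GB    # what the planner may assume is cached
work_mem = 16MB               # per sort/hash node, multiplied by connections!
maintenance_work_mem = 256MB  # for VACUUM, CREATE INDEX, etc.
```

This is exactly the over-configuration trap from above: work_mem looks harmless until a hundred connections each run a few sorts, and the pod gets OOM-killed.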
And because we're in the Postgres ecosystem, much more knowledgeable people than me have written about this before, or even recorded videos. Make sure you actually listen to them; they give you a lot of information about how Postgres should be configured for performance, how you set up huge pages for Postgres, and so on. The slides are shared, as I said before, so just follow the links afterwards and you're in for a good read. Then you have Postgres extensions, and that is where it becomes a little bit more complicated. You remember, I said that on hyperscaler setups extensions might already be an issue if you need something like TimescaleDB. pgvector is very common these days on those setups, but in the first couple of months pgvector was still an issue, right? So the second you use extensions, you might be out of the scope of hosted setups. The thing is, when it comes to Kubernetes you're not out of scope, but you do have work to do. The standard image for Postgres does not contain a lot of extensions. To my surprise, I think the standard image doesn't even ship PostGIS, which I find very confusing, but it is what it is. That means the second you need extensions, you actually need to build your own image layers, Docker image layers, on top of the standard image. I would always start from the official Postgres image and then just keep adding and installing the extensions. The alternative is that there is some magic, and we'll get to that later in the presentation, magic that I really like. The other thing that I find very important is to keep an eye on your versions. We all know that Postgres versions can be complicated, especially when you do major upgrades and you have to run something like pg_upgrade. That is always an issue.
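Layering extensions on top of the official image looks roughly like this; package names and versions vary by distribution and are assumptions here, so treat it as a sketch:

```dockerfile
# Start from the official image and add extensions on top.
FROM postgres:16

# The Debian-based official image ships the PGDG apt repository,
# so many extensions are a package install away.
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        postgresql-16-postgis-3 \
        postgresql-16-pgvector && \
    rm -rf /var/lib/apt/lists/*

# Inside the database you still have to run, per extension:
#   CREATE EXTENSION postgis;
```

Anything not packaged has to be compiled in a build stage against the matching Postgres version, which is where custom images get more involved.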
You have to have both Postgres versions available: the version you're running on right now and the version you want to go to, say PG 15 and PG 17. That is true whenever the data file format actually changed and you have to run a migration. For those kinds of upgrades, there are two things to consider: first, you need both versions of Postgres in your image, and second, you need to double the storage, because it is not an in-place update. It's actually a copy operation: it copies the database in the new format to a different location. Keep that in mind, because for a while it will actually double your storage requirements. And for Kubernetes versions, it's the same thing. It can be complicated if you use a hosted, managed Kubernetes like AKS or EKS. They often only support the last three or four versions, and Kubernetes releases a new version about every three months, so after a year you're out of support. AWS recently started to provide extended Kubernetes support, which I think goes up to five or six versions, but the second you switch from standard to extended support, you're paying 10 times the price. So invest that money slightly better, into somebody that can actually help you with the Kubernetes migration. That means at least once a year, or something around that timeframe, you're in for a Kubernetes upgrade. Keep that in mind. If you run your own self-hosted Kubernetes, you can get around that, but I would not recommend it. There's a lot of cool stuff that comes with new Kubernetes versions, especially related to storage, so I would try to keep them as current as possible. All right. Now that we have all of that out of the way, and there were already some hints: what is really different? Where do we have to put our thoughts when running Postgres on Kubernetes? The first and most important one is storage. We heard that people think storage will kill you because it's too slow.
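Back to the major-version upgrade for a moment: the copy-style operation is pg_upgrade, which is exactly why both sets of binaries and, temporarily, double the storage have to be in place. The paths below are placeholders:

```shell
# Both versions' binaries and two data directories must exist side by side.
/usr/lib/postgresql/17/bin/pg_upgrade \
  --old-bindir=/usr/lib/postgresql/15/bin \
  --new-bindir=/usr/lib/postgresql/17/bin \
  --old-datadir=/var/lib/postgresql/15/data \
  --new-datadir=/var/lib/postgresql/17/data
# Optional: --link uses hard links instead of copying, which avoids the
# doubled storage, at the price of not being able to fall back afterwards.
```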
In Kubernetes, you have the idea of persistent volumes, and a persistent volume is basically your virtual storage entity, your storage unit, whatever backs it. It could be backed by NFS. It could be backed by local volumes; as I said, I think those are a bad idea, don't do it. It could be backed by remote storage; it could be backed by Amazon EBS or a cloud disk or whatever. There are a lot of options for getting storage into Kubernetes. The good thing is, Kubernetes has had the CSI interface for a couple of years now. CSI stands for Container Storage Interface. It was designed as a pluggable storage API for Kubernetes, but what it became is more of a standardized container storage interface in general: a lot of container runtimes support it, Docker supports it, containerd supports it, CRI-O supports it. Not all providers support all backends, but I think that is just a matter of time. For Kubernetes, CSI drivers are the way to go. Make sure your CSI driver supports encryption at rest, right? You remember: always encrypt stuff. Make sure you find something with high IOPS; NFS might be out of the picture here. Go for SSD- and NVMe-backed storage. You want something low latency, because we know databases love fast storage; Postgres loves storage that is as fast as possible. Again, I might be biased, but I think disaggregated storage is the best way. Disaggregated means that you basically separate your storage from your compute. I think this is the only meaningful approach, because databases tend to grow more in data size than they grow in compute needs. New queries might introduce higher compute overhead, especially stuff like reports. If you have something like that, especially recurring reports, I would recommend using a CSI provider that supports snapshots and clones.
Then you can spin up a second database on the cloned data, run your report on that, and throw it away afterwards. That would be my approach. But because there are so many CSI drivers already, I found about 150 and that is not all of them, there are way more out there, it is complicated to figure out what you actually want. So a while ago I built storageclass.info, which lists all of the CSI drivers that I found and the features I think they support. It's really hard, because not all of them give you all the information in their documentation, so you basically go through every single source code repository and try to figure out what they actually support and what not. If you use one of them and you find an issue, please raise a bug report; it's a massive YAML file on GitHub, raise an issue, I'm happy to fix it. What the site gives you is the option to filter the CSI drivers by capabilities, by lifecycle, by access mode, stuff like that. So storageclass.info is the place to go for CSI drivers. The other thing, and I hinted at that: containers have requests, limits, and quotas in Kubernetes. These are different elements; they all manage the resources available to your database, and they're not necessarily easy to understand. From my perspective, that is one of the things you really have to wrap your head around. It's not easy. There's a great beginner's guide out there; read it, try to understand it. It was one of the things that took me the longest to really wrap my head around: what is the difference between requests and limits? What are quotas? But in general, if you have a sizable database, I would recommend running the database as basically the only thing on its specific nodes; we'll get back to that later.
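In pod-spec terms, the requests-and-limits advice plus "database owns its nodes" might look like this; the numbers and the node label are illustrative:

```yaml
# Excerpt from a Postgres pod spec. Setting requests equal to limits
# gives the pod the "Guaranteed" QoS class, so it is evicted last
# under node memory pressure.
resources:
  requests:
    cpu: "4"
    memory: 16Gi
  limits:
    cpu: "4"
    memory: 16Gi
# Keep databases and applications apart by pinning to dedicated nodes:
nodeSelector:
  workload: database
```

A ResourceQuota, in contrast, caps the totals per namespace; it constrains what all pods in the namespace may request together, not any single container.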
But that is something you really have to understand once you're in this world. It exists for Docker too, but with Docker it's slightly different: normally you use Docker on a single node, whereas Kubernetes is all about clusters and multiple instances. So make sure you keep that in mind. And you remember huge pages: really enable huge pages for Postgres. I said it's a little bit more complicated with Kubernetes, because you actually have to configure them in the host operating system. So yes, you have to go onto the Kubernetes hosts, the worker nodes themselves, and say: yes, I want huge pages; tell the kernel about it. And my recommendation is to also tell the kernel that you want a certain amount of memory reserved for huge pages, because then the kernel will reserve a big chunk of memory at boot time just for you, a contiguous chunk of memory, and that is what you want for huge pages. Huge pages are a great way to minimize memory allocation overhead. The thing with Kubernetes is that you actually have to configure them basically three times. You configure them in your host operating system. You tell the deployment descriptor, the resource descriptor, that this pod or this container specifically is supposed to use huge pages, so you get access to them, because they basically have to be mapped into your container runtime environment. And you tell Postgres that there are huge pages available. There's a great read from the Percona folks on this, people that are much more knowledgeable in running massive Postgres-on-Kubernetes setups. Those are the guys to ask. Then there is high availability. As I said, Kubernetes is all about clusters, so it's really something you want to look into, and I think it's important: now that we have Kubernetes, the only meaningful thing is to run Postgres as a cluster. And I personally have massively good experience with Patroni. I think Patroni is awesome.
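The three places huge pages have to be configured can be sketched like this; the sizes are examples, not recommendations:

```yaml
# 1. Host kernel (on the worker node): reserve huge pages at boot, e.g.
#      sysctl vm.nr_hugepages=2048      # 2048 x 2MiB = 4GiB reserved
#
# 2. Pod spec: request them so they get mapped into the container.
#    Kubernetes requires hugepages requests and limits to be equal.
resources:
  requests:
    memory: 8Gi
    hugepages-2Mi: 4Gi
  limits:
    memory: 8Gi
    hugepages-2Mi: 4Gi
#
# 3. postgresql.conf: make Postgres actually insist on using them:
#      huge_pages = on
```

With `huge_pages = on` (rather than `try`), Postgres refuses to start if the pages are missing, which surfaces misconfiguration immediately instead of silently costing performance.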
I think it's also the most widely deployed high-availability and cluster manager, but there's also repmgr and pg_auto_failover and the like. Just make sure you use a cluster manager. You want automatic failover, you want primary promotion, stuff like that. There's a great set of tools; I would always recommend just going with Patroni. The most interesting thing about Patroni, for me specifically, was that it is developed by Zalando. Zalando, to me, was a company selling shoes and clothes, right? Not necessarily a high-tech company. But what I figured out is that basically all of the database setup at Zalando is Postgres, so it makes sense; they're pretty deep in the Postgres ecosystem. The other thing, and this is again one of those things you can't stress enough: connection pooling. I would claim you should never run Postgres without connection pooling at all. A connection pooler is basically a proxy between your database and your application, and the reason you want it is because Postgres really doesn't like connections being opened and closed. Connection management is very expensive in Postgres, so using a connection pooler, keeping backend connections to the database open while letting the clients on the front side connect and disconnect however they want, removes a lot of overhead and resource utilization from Postgres, stuff that shouldn't really be in Postgres at all. The other thing is: when you go into the cloud, especially when you go for serverless environments, serverless infrastructures or application architectures, it really becomes a thing. You're starting and stopping functions on a per-second basis, and, as I said, Postgres doesn't like connections being opened and closed, but every time you start a function and shut it down, you're opening a connection and closing it.
At that point, you really want connection pooling. So in general, if you don't use connection pooling yet, now is the time to fix that; schedule a maintenance window for tomorrow. There are a couple more things: together with something like Patroni, connection poolers handle failovers really nicely. They handle the switching of primaries; your application doesn't need to know about it. They often support internal retries: if something fails on the first attempt, they just switch over and try again, and only give you the error afterwards. All of that is really nice. On the other hand, they also handle the read replica setup. If you have read replicas and you say, okay, I want to use those as actual query targets, all of those tools support reading from read replicas. I personally prefer, and I know it's controversial, Pgpool-II, but there's also PgBouncer. Just be aware that not everything that looks like a read-only query, like a SELECT, is actually non-mutating. With TimescaleDB we have create_hypertable, which is a mutating operation, but because it's a function, it's invoked via SELECT. So sometimes you have to tell those proxies: hey, these specific queries have to go to the primary, as write operations. Then you have Kubernetes features. We're in Kubernetes, so use StatefulSets, use ReplicaSets, all that kind of stuff. Interestingly enough, Timescale just a week or two ago published a blog post about how they replaced StatefulSets with a new abstraction, because StatefulSets are very limited in what they can do. Go and read that one; you can find it on the Timescale blog, you really want to see that. Then network policies, you remember, right? Use network policies. Wherever there's TLS, enable it as much as possible. Use security policies.
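For reference, the shape of a pooler config, here PgBouncer since it is the smaller of the two mentioned; all names and numbers are placeholders:

```ini
; pgbouncer.ini -- minimal transaction-pooling sketch
[databases]
appdb = host=postgres-primary port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction   ; server connections are shared per transaction
max_client_conn = 1000    ; many cheap client connections in front...
default_pool_size = 20    ; ...multiplexed onto few real Postgres backends
```

This is the serverless fix in a nutshell: a thousand short-lived function invocations land on twenty long-lived backend connections. Read/write splitting across replicas, as described above, is Pgpool-II territory rather than PgBouncer's.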
If your setup has role-based access control, that's something you really want to set up as well. Think about policy managers, like OPA, the Open Policy Agent, or Kyverno, but be careful with those. I would recommend thinking about them; they basically sit between your API client and the Kubernetes API, and they can prevent things like: hey, there's some container that wants to mount a volume, but it's not one of the database containers, so we prevent that. You don't want a potentially arbitrarily injected container mounting your database volume. The only problem is, and I heard this at least about Kyverno, you want to be careful: Kyverno has a default of blocking all access, which sounds meaningful, but at least for that person the Kyverno containers themselves died, and they couldn't restore them because Kyverno no longer had access to the Kubernetes APIs it needed. So there can be problems; be careful. Then there is observability, and I think, above all, alerting. Make sure that you have a good monitoring and observability tool in place. Unfortunately for Postgres, Prometheus is still the best thing we have. I hope that this will change in the future, but it does an okay job. Collect logs, and make sure that you retain more than, I don't know, a gigabyte or the last 10,000 entries or something like that. If you have an issue, 10,000 lines of log messages may be gone really fast, right? So use one of the typical log collection tools: aggregate, analyze, trace. I would always go for full observability. You remember I worked for Instana, and observability basically gives you insight into the whole stack from front to back. And that means: don't roll your own. You might have figured it out by now, I'm not a big fan of rolling your own stuff. Use a tool like Datadog. I'm not sure about Instana since it was acquired by IBM.
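Going back to the policy-manager idea for a moment: the "only database pods may mount the database volume" rule can be sketched as a toy admission check. This is the spirit of what OPA or Kyverno evaluate, not their actual policy languages; the claim name `postgres-data` and the `app` label values are made up for the example.

```python
# Toy admission check: reject any pod that mounts the database volume
# unless it carries an allowed database label.

DB_VOLUME_CLAIM = "postgres-data"
ALLOWED_APPS = {"postgres"}

def admit(pod: dict) -> bool:
    claims = {
        v.get("persistentVolumeClaim", {}).get("claimName")
        for v in pod["spec"].get("volumes", [])
    }
    if DB_VOLUME_CLAIM not in claims:
        return True        # pod doesn't touch the database volume
    return pod["metadata"]["labels"].get("app") in ALLOWED_APPS

db_pod = {"metadata": {"labels": {"app": "postgres"}},
          "spec": {"volumes": [{"persistentVolumeClaim":
                                {"claimName": "postgres-data"}}]}}
rogue = {"metadata": {"labels": {"app": "cryptominer"}},
         "spec": {"volumes": [{"persistentVolumeClaim":
                               {"claimName": "postgres-data"}}]}}
print(admit(db_pod), admit(rogue))
```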
These days most of the development goes into IBM technology, but there are other options. There's Dynatrace, there's New Relic, there's Grafana, and I don't know what else. Use one of those tools; you're in for a good time. And then comes the point where all of it comes together, right? Here we are with Postgres: we have to configure a lot of stuff, we need to make sure that we have high availability, that our backups are set up, we need automatic failover, all of that. And while Helm charts are okay, they are a static setup system. Helm charts help you with the initial, static deployment; they don't help you with the day-to-day operations. And that is where operators come in, Kubernetes operators, in this case specifically Postgres Kubernetes operators. They take away a lot of the setup tasks. They make sure that everything runs, they make sure that you can scale out and scale in; they bring this feeling of cloud native to Postgres. Maybe that's a little bit of a far-fetched claim, and we'll see that on the last slide, but they integrate Postgres really nicely with Kubernetes, although that also depends on the operator. So there are a few operators, and this is not all of them, but I think these are the big ones. If you're in for a very old version, like 9.6, there's no way around KubeDB; the other ones don't even support that anymore. There's Zalando again, and from my perspective CloudNativePG and StackGres from OnGres are the ways to go; those are the big newcomers, or up-and-comers, from my perspective. We see that all of the standard features are supported everywhere. Where it falls apart a little bit is backups, and that is something I really find confusing. We're all about automation in Kubernetes, but Crunchy and Zalando don't really support backups that are nicely integrated into Kubernetes.
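The "day-to-day operations" part that operators add on top of static Helm charts boils down to a reconcile loop: compare the desired state from the custom resource with what's actually running, and act on the difference. The sketch below is a deliberately tiny illustration of that pattern; the field names and action strings are invented, not taken from any real operator.

```python
# Core operator idea: reconcile desired state against observed state.

def reconcile(desired: dict, observed: dict) -> list[str]:
    actions = []
    missing = desired["replicas"] - observed["replicas"]
    if missing > 0:
        actions.append(f"scale out: start {missing} replica(s)")
    elif missing < 0:
        actions.append(f"scale in: stop {-missing} replica(s)")
    if observed.get("primary") is None:
        actions.append("failover: promote a replica to primary")
    return actions

desired = {"replicas": 3}
observed = {"replicas": 2, "primary": None}   # primary just died
print(reconcile(desired, observed))
```

A real operator runs this loop continuously against the Kubernetes API, which is exactly why it can handle failover and scaling that a one-shot Helm install cannot.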
I think that's an oversight; it might be fixed in the future, but it's confusing. The other thing that I think is important: whether they use the default Postgres image, the official standard image. CloudNativePG doesn't actually do this, so I'm a little bit confused about that. You see that StackGres is the only one that has all the check marks. Whether you really need a web UI, I think that's up to you to decide. But as I said, CloudNativePG and StackGres are both great solutions, both great options. Absolutely nothing to say against them. StackGres has one more cool feature. You remember the magic with image layers? They actually have a virtual image repository where they take the standard image, and then you say, hey, I want this extension and this extension, and they spin up virtual image layers with the extension code on the fly. And that even works at runtime: you can say, hey, I want this, and it magically appears in your image and you can just load it. But again, there are more people who have written about all this. In this case I actually wrote the first one, but there's OperatorHub; for anything around Kubernetes operators, that's your first way to go. And the Data on Kubernetes community created a massive Postgres operator specification with all the features for Postgres operators and how they are supported. I think there are over 30 Postgres operators in there, so you really want to look into that. It doesn't have a nice UI yet, but we're working on that. The last thing: when you have a meaningfully sized database, anything that needs to scale, make sure that you use dedicated machines for your databases. Use node pools; dedicate a certain set of nodes, or a node pool, to your database, to your primary and to your read replicas, and make sure that those
containers, the database containers, are pinned to those hosts. Taint the hosts so that nothing else is running on them except the standard Kubernetes services like kubelet and kube-proxy. But that's about it, right? Make sure your database has basically all the resources, and use Kubernetes for automation, for orchestration, not for virtualization in the sense of overprovisioning. I think your database still deserves as many resources as possible. So with that, I think we're at the end. A lot of you might know Kelsey Hightower. I normally say: when Kelsey speaks, the world listens. And Kelsey had a few things to say about Postgres, or databases, on Kubernetes, which is: you can run a database on Kubernetes because it's fundamentally the same as running a database on a VM. Okay, so far so good; we can do that. The biggest challenge is to understand that rubbing Kubernetes on Postgres won't magically turn it into Cloud SQL. So what he's saying is: just because Postgres now runs on Kubernetes, it doesn't mean it gives you this proclaimed infinite scalability of something like Cloud SQL or Aurora. You're getting close, and you can scale out; with the necessary, correct storage underneath you can get very far. But I'd also say most people, or most companies, probably don't need something like Cloud SQL. It's a very overhyped, unnecessary thing from my perspective. It makes it easy to get started, but I would claim this is not where we need to go. Anyway, the Data on Kubernetes community is about anything data, stateful workloads on Kubernetes. It's mostly a lot of the people who initially started implementing stateful workloads on Kubernetes, StatefulSets and things like that; they eventually joined together in the DoK community. They wrote an amazing white paper on data on Kubernetes. Read it, go with that, and that's it. Thank you very much.

Chris Engelbert

Chief Developer Advocate @ simplyblock



