Conf42 Cloud Native 2023 - Online

Event-Driven Change Data Capture pattern using Apache Pulsar

Abstract

Apache Pulsar has taken enterprise event-driven messaging systems to a new level, unifying message streaming, queueing, mediation, and transformation on the same platform and operating efficiently in cloud native environments. We’ll look at how Pulsar enables the Change Data Capture pattern.

Summary

  • My topic is the event-driven change data capture pattern using Apache Pulsar. This pattern is ideal if you are running in a cloud or cloud native environment.
  • Mary Grygleski is a streaming developer advocate at DataStax. The company's flagship product is based on Apache Cassandra. DataStax also recently acquired another company called Kaskada, which makes real-time AI software. We're going to talk about change data capture (CDC).
  • CDC serves data sources such as databases, data lakes, or data warehouses. The idea is to detect changes, capture them, and propagate them to downstream data systems. Today's talk is about a new way of doing this in real time.
  • What makes up a CDC system? First the change detection piece, then capture and transformation, and from there the changes get propagated. A log-based approach is the more desirable one for CDC.
  • What are the requirements for a modern CDC system? "Modern" can mean on-prem, private cloud, public cloud, or hybrid cloud. The system needs to be very scalable and elastic, and it must be very responsive.
  • Apache Pulsar uses the pub/sub approach to message delivery. It is designed with cloud native in mind. It separates compute from storage. Client bindings for its APIs are available in different languages.
  • Apache Pulsar is increasing in popularity. In 2021 it was one of the top five Apache projects in the foundation. Big companies such as Yahoo, Overstock, Splunk, Verizon, and GM are already using it. A new UI is coming that helps you write your producer and consumer code.
  • Back to CDC: enabling CDC for Astra DB. You can detect changes in an Astra (Cassandra) database and then write the transformed data down to the downstream targets. DataStax has other streaming-related products too.
  • The link to our documentation describes how to do CDC for Astra DB. It guides you through setting up change data capture. From that point on you can connect your streaming tenant to a sink such as Elasticsearch.
  • We offer a $25 credit per month that you can use for free. These are resources for Apache Pulsar and Astra from DataStax. Additional credit is available if you're a small business and need more.
  • Thank you. Follow me on Twitter and LinkedIn, and connect with me on my Discord channel. I'm open to any kind of conversation at any time. Good luck with all your projects, and happy coding.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
I'm very happy to be here at the 2023 edition of Conf42, in the cloud native track. Thank you to everyone for listening to my talk. My topic is the event-driven change data capture pattern using Apache Pulsar. You might be wondering what that has to do with a cloud native track. In fact, I want to point out that this particular pattern is ideal if you are running things in a cloud, a cloud native kind of environment, and I'll explain shortly. So let's first take a look at this pattern and see what it is. But before we do that, let me give a quick introduction of myself, if you have not heard me talk before. My name is Mary Grygleski. I'm a streaming developer advocate at DataStax. DataStax is a company based in California that specializes in data management software, and its flagship product is based on Apache Cassandra. If you're familiar with NoSQL, I'm sure you will be very familiar with Apache Cassandra as well. We are the commercial company behind Apache Cassandra, a NoSQL database supporting big data, very fast and very efficient. Much like that, our streaming platform, which is newer, about two years old, and which is where I work, is also open source based, and that's on Apache Pulsar. Apache Pulsar is what I'm going to spend some time talking about today. It's an event streaming platform built ideally for cloud native platforms. One more note is that DataStax has also recently acquired another company called Kaskada, which makes real-time AI software, so the company is also moving in that direction. Given that we are already very strong in data at rest with Apache Cassandra, we're also supporting data in motion with this event streaming platform.
Then it comes time to really apply it to higher forms of specialty, which is real-time AI. So that's something new that I just want to point out. That's the company I'm working for right now; I have been here for a year. Previously I worked for IBM as a Java developer advocate. My areas back then included reactive systems and also the open source WebSphere side of things, which is Open Liberty, MicroProfile, and Jakarta EE too. So those are my areas. I'm based in Chicago. I'm also a Java Champion and a community builder. I currently lead the Chicago Java Users Group; I have been its president since the beginning of the pandemic, and I'm still keeping it going strong. I should also give you a bit of background: I was previously an engineer myself, so I have hands-on experience too, but I became an advocate about five years ago. My background includes a lot of software engineering work, design and development, as well as some technical architecture. I came primarily from Java and open source, then moved into cloud systems, DevOps, and all those things. So that's briefly about me. Now let me show you the agenda. We're going to talk about change data capture, CDC. For those of you based in the US, do not mistake CDC for our Centers for Disease Control; change data capture is what we're talking about here. Its purpose is to serve data sources such as databases, data lakes, and data warehouses. I also want to give a bit of the history of CDC and how it came about, basically pointing out ETL: extract, transform, load. Then I want to introduce the components of a CDC system, and the requirements of a truly modern, cloud native CDC system.
Then I'll introduce Apache Pulsar and why it's a good choice to use in combination with, for example, the Apache Cassandra that my company DataStax has too. So that's CDC support for Astra DB, the managed cloud platform, and I'll also show you a quick demo after that. So first, let's start: why change data capture? What is CDC for? As I already pointed out, it's to serve data sources such as databases, data lakes, or even data warehouses. Essentially, we're treating the data sources as the source of truth of the data. That's what CDC has to serve. But how does it serve them? As we know, in today's world the state of data keeps changing, and we need to be able to capture those changes. Any time something happens to the data, we want to make sure we don't lose track of it. We have to keep track of certain types of changes. Then, with those data changes, we may want to transform them at certain times, when we need to filter them, enhance them, or do whatever is needed to the data. After the data has been transformed, the changes are propagated to downstream data systems. So what we're looking at is this: CDC captures changes in the source of truth, does whatever manipulation is needed, and then sends the result downstream, where it becomes what is called a derived data source. This is a very simplified illustration of CDC, in case just talking about it is not clear. So let's take a look. First there are the data sources, where changes can happen. Up there we are using some sort of monitoring type of process whose job is to detect changes. In the old ways, very often we may be using polling to do that.
Essentially, every so often you go out and check your data source to see if there are any changes. But polling may not be fast enough, because in today's world we want things to happen in real time: as a change happens, we want to capture it. So polling may not be the way to do it; it's just illustrating that this detection of changes can be polling. It can also be some form of database trigger, for example. But a database trigger, as we know, may not be ideal either, because a trigger is tied to one particular database, so it may be subject to that database's limitations and carries a dependency on the database itself. Either way, the idea is to detect changes, then capture the changes, and then send them on to a CDC system, which transforms them. Sometimes we may want to filter, do mapping, or do some sort of enhancement, or maybe even some sort of remediation. We know that some data comes through with discrepancies, and we need to detect those and adjust the data. In other words, some sort of transformation needs to happen to the data that has been captured. After that, the next step is to propagate it, to send it down to a downstream data system. So that's essentially what this whole CDC process is, regardless of how we want to implement it. Now let's take a look at what was there before CDC. CDC is not new, and in fact CDC can be done in a slow way.
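The polling style of change detection described above can be sketched as follows. This is a minimal illustration, not something shown in the talk: the `items` table, its `updated_at` column, and the helper names are all assumptions, and SQLite stands in for whatever database is being watched.

```python
import sqlite3


def poll_changes(conn, last_seen):
    """Return rows modified after `last_seen` and the new high-water mark."""
    rows = conn.execute(
        "SELECT id, value, updated_at FROM items "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    # Remember the newest timestamp so the next poll only sees later changes.
    new_mark = rows[-1][2] if rows else last_seen
    return rows, new_mark


def make_demo_db():
    """Build a small in-memory table (hypothetical schema) to poll against."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT, updated_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)", [(1, "a", 10), (2, "b", 20)]
    )
    return conn
```

Each call to `poll_changes` is one polling cycle. A change is only noticed on the next cycle, and a row updated twice between cycles shows up once, which is part of why polling struggles to be real time.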
Today's talk, though, is really about a new way of doing things in real time, very responsive. Before CDC, there was also a process called ETL: extract, transform, load. As the name says, it describes the process of extracting the data from the sources, transforming it, and then loading it into a target database or data source of some kind. ETL still has its place in the IT world, don't get me wrong. But you have to bear in mind that ETL is traditionally a very slow type of processing, because it is synchronous in nature. Processing occurs in batches. It collects and extracts a set of data, maybe of a certain size or captured within a certain time period, and bundles it up. From there, we take that batch of data and send it through a transformation of some kind. As you can see, it's a very synchronous type of process, meaning that things are not decoupled. The batch goes through the transformation, and when the transformation is done, it is loaded into some sort of target data source. It can also use a lot of network bandwidth, because now we're dealing with data in bulk, and it requires a large set of data. Everything is done nicely and methodically, but synchronously, so it is very slow. The process setup also tends to be very heavy. If you look around today, there are ETL tools out there produced by companies that cater more to older, traditional enterprise environments. The licensing for these tools can be quite expensive, as you know, and they very often make use of a lot of database replication behind the scenes.
They tend to be expensive and less agile, essentially not able to keep up with the times. But then again, it's not that ETL is bad or anything; the key is that it depends on your usage circumstances. In today's world, we want systems to respond in real time. We want to capture data and process it as it comes, not wait for a batch and not work synchronously. We are looking for ways of doing things in a more asynchronous fashion, and the advantage is speed: we want data to be ingested at high speed and high frequency, and processed with very low latency. Now, understanding these needs, let's look at the components of a CDC system. What makes up a CDC system? I already showed you that picture, but let's look at the components in a bit more detail. One is the change detection piece. Again, we're looking at this in the abstract. Change detection can be a process; if you are on a Unix system, it could be a script that captures the changes, or a program that goes into the database and senses the changes that are occurring there. In some cases the source could be a file system too; it isn't limited to a database. So some form of change detection process is needed. From there, any change that happens gets captured and sent through the capturing engine. It may or may not need transformation, though a lot of the time we probably want some. And then from there the changes get propagated.
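The components just listed (detect, capture, optionally transform, propagate) can be pictured as a small pipeline. The sketch below is only an illustration under assumed names: `ChangeEvent`, `run_pipeline`, and `uppercase_non_empty` are hypothetical and do not come from the talk or from any CDC product.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change: which key changed and its new value."""
    key: str
    value: str


def run_pipeline(
    changes: Iterable[ChangeEvent],
    transform: Callable[[ChangeEvent], Optional[ChangeEvent]],
    sink: Callable[[ChangeEvent], None],
) -> int:
    """Send each captured change through `transform`, then on to `sink`.

    A transform that returns None filters the event out.
    Returns the number of events propagated downstream.
    """
    delivered = 0
    for event in changes:
        out = transform(event)
        if out is None:
            continue
        sink(out)
        delivered += 1
    return delivered


def uppercase_non_empty(event: ChangeEvent) -> Optional[ChangeEvent]:
    """Example transform: drop empty values, upper-case the rest."""
    if not event.value:
        return None
    return ChangeEvent(event.key, event.value.upper())
```

For instance, passing a list of events, `uppercase_non_empty`, and `some_list.append` as the sink fills `some_list` with the transformed events in the order they were captured.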
The changes are sent down to some target system. Traditionally, we might capture data using SQL, which still means polling. For example, you can select some field and see if it has changed, maybe since a certain time, with some criteria to test it. That would be more of a SQL database kind of approach. But there's a way that is more efficient and in fact more desirable, which is using a log-based approach for CDC. So let's take a look. Let's say we have two clients writing to a database. Client one sets a variable, for example x equals one, and another client comes along and sets x to two. So now there are changes to the column x that we are monitoring, and the data gets captured immediately. At the same time, the changes are also being recorded as messages in a transaction log, and they must be recorded in the order in which they happen. That part is very important. So the transaction log captures all of the changes, and the next thing to do is propagate these changes to the downstream system. Now we apply the changes. We first take x equals one and write it to the target data system. Then it gets overwritten by x equals two. So with this log-based system, when we query the target data system, the most recent state of that column is always what gets returned. That's a log-based type of system. It doesn't involve any kind of polling.
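The x = 1, then x = 2 walk-through above can be reproduced with a toy append-only log. This is a sketch under assumptions: `SourceTable` and `replay` are hypothetical names, and a Python list stands in for a real database transaction log.

```python
class SourceTable:
    """A toy source of truth that records every write in an ordered log."""

    def __init__(self):
        self.rows = {}
        self.log = []  # append-only entries: (sequence, key, value)

    def write(self, key, value):
        self.rows[key] = value
        # The log entry is appended in the same order the write happened.
        self.log.append((len(self.log), key, value))


def replay(log, target, from_seq=0):
    """Apply log entries to `target` in order, starting at `from_seq`.

    Returns the next sequence number to resume from.
    """
    for seq, key, value in log:
        if seq >= from_seq:
            target[key] = value  # a later entry overwrites an earlier one
    return len(log)
```

Replaying the log after two writes to `x` leaves the target holding only the latest value, and the returned sequence number lets a later replay pick up where the previous one stopped without rescanning the source table.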
The log only gets written when there is some kind of change. Now, understanding a little bit more about CDC, what are the requirements for a modern CDC system? Let's look at the definition of modern. It can mean many things to many people, but today so many things are about cloud. It can be on-prem or private cloud, public cloud, or hybrid cloud. These are really all about cloud native systems. We want systems to be very efficient. Especially if you're cloud native, you don't want a system to take up extra resources. Why? Because, especially in a public cloud environment, you get charged by how many resources you use. So you want to be very careful in how you utilize them. The system needs to be absolutely lean and cost the least for your business. That's what we want from cloud native. The system also has to be very responsive. If requests are coming in, you want it to respond immediately with the right thing, with a very responsive flow of data through the system. And not only that: at certain times you could have tons of data flowing through the system, and at other times very little. So the system itself needs to be very scalable, very elastic. You have different components, different nodes in a cloud-based system, and you want them to be able to shrink and grow as the need arises. Another thing is resiliency: if a node goes down, it shouldn't slow everything down. Another node should pop up to take over for whatever is down. In other words, it's a very dynamic type of environment that we're talking about for CDC. So that's the environment.
Let's also take a look at the data itself. We are talking about data being transmitted, data in motion, and it is being sent as messages. So let's look at what kind of techniques we're using, going back to some of these requirements: reliability and resiliency. The system should implement a quality-of-service type of scenario. Messages must be delivered at least once, or in some cases exactly once. These mechanisms need to be properly defined so that data is guaranteed to be delivered to the destination. In other words, if something goes down, it should not affect delivery. Once your messages are sent, they must be delivered to the destination accordingly. Another key point of a modern cloud native system is responsiveness. How is it achieved? Your system needs to be lightweight and very loosely coupled. You can have multiple components, and they send messages in an asynchronous manner. The sender sends messages and doesn't need to wait for the receiver. It's basically: I'm going to send the messages and somebody will take care of them. In a pub/sub (publish-subscribe) type of system, it is the broker that handles the messages, labeling them accordingly. The receiving side, the subscriber, says: I need the messages, and subscribes to them. Meanwhile you can go about doing your own thing. On the producer side, I have messages to send, so I send them to the broker through what is called a topic, and the subscriber subscribes to the topic and acts only when the data comes.
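The decoupling described above, where a producer hands messages to a broker by topic and subscribers pick them up later, can be shown with an in-memory stand-in. This is not the Pulsar client API; the `Broker` class and its methods are hypothetical and exist only to illustrate the pattern.

```python
from collections import defaultdict, deque


class Broker:
    """A toy pub/sub broker: one queue per (topic, subscription) pair."""

    def __init__(self):
        self._queues = defaultdict(dict)  # topic -> {subscription: deque}

    def subscribe(self, topic, subscription):
        """Register a named subscription; it sees messages published afterwards."""
        return self._queues[topic].setdefault(subscription, deque())

    def publish(self, topic, message):
        """Hand a message to the broker without waiting for any consumer."""
        for queue in self._queues[topic].values():
            queue.append(message)

    def receive(self, topic, subscription):
        """Return the next pending message for a subscription, or None."""
        queue = self._queues[topic][subscription]
        return queue.popleft() if queue else None
```

The publisher never references a subscriber and returns as soon as the broker has the message. Each subscription has its own queue, so two downstream systems can consume the same change stream at their own pace.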
At other times the subscriber can be doing something else, without holding up the whole system. So that's the responsiveness. Then there's the scalability aspect. I just brought up the pub/sub approach, and it is a really good approach because a publish-subscribe system deals with loosely coupled components. The publisher and the subscriber are not tightly coupled to each other. My job is to publish, so I just keep publishing; as a subscriber, I need data, so I just subscribe to the topics. If you think about it that way, the scalability of the system can be assured too. You can produce many messages and let the broker take care of them, dealing with things like message retries and deduplication of messages. So the scalability aspect is absolutely crucial. One last thing to point out when it comes to data transmission: the order of the messages is very important, especially in some systems. For example, the messages could show the steps of some money being deducted from an account and then being added back. All of these need to be kept in the order in which they happened. So we have to make sure a CDC system preserves the ordering of the messages as well. We just talked about requirements, but how can all of this happen? There are so many different types of systems, and there's no one single system that can handle all of these requirements for modern-day processing. The good news is that in today's world we have solutions, but they require all of us who implement or design these systems to think in a different way.
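To see why retries, deduplication, and ordering matter together, consider the account example above. The sketch below assumes each message carries a sequence number, which is an assumption of this illustration rather than something stated in the talk; `apply_in_order` is a hypothetical consumer-side helper.

```python
def apply_in_order(deliveries, balance=0):
    """Apply (sequence_id, delta) deliveries to a balance.

    Redelivered duplicates (from retries) are skipped, and a gap or
    out-of-order message raises ValueError instead of being applied.
    """
    last_applied = -1
    for sequence_id, delta in deliveries:
        if sequence_id <= last_applied:
            continue  # already applied: a duplicate from a retry
        if sequence_id != last_applied + 1:
            raise ValueError(f"gap or reordering at sequence {sequence_id}")
        balance += delta
        last_applied = sequence_id
    return balance
```

With at-least-once delivery the same deduction may arrive twice. Skipping already-applied sequence numbers keeps the balance correct, while refusing out-of-order messages keeps the "deduct, then add back" history intact.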
This requires a paradigm shift from what we are used to. So how about taking an event-driven approach? That would actually be perfect for a CDC type of implementation. Now let me give you a very quick overview of Apache Pulsar. Folks have probably heard more about Apache Kafka, but Apache Pulsar is capable of the same things and takes the same kind of approach: it uses the pub/sub approach to message delivery, receiving messages from producers and delivering them. Pulsar itself is an open source project created by Yahoo back around 2013. Yahoo then decided to donate it to the Apache Software Foundation in 2016, and it quickly became a top-level project in 2018. It is designed with cloud native in mind, and it is a very good design. It's cluster based, with that already built in. It also has what is called multitenancy, which is a way of letting you organize your data according to how you want to segregate it. You can have a company with different departments, and you want different tenants to handle their messages accordingly. That's already built into Pulsar. If you want to work with Pulsar, the client bindings for the APIs are available in different languages, including Java, C# if you're a .NET person, Python, and Go, and even community contributions like Scala and Ruby. Pulsar also separates compute from storage. With CDC we're dealing with tons of messages coming through, or I should say changes being detected in the database, and there could be a lot of them. You want Pulsar to worry about delivering all these messages to the target, propagating them downstream fast, and not worry about the storage part as such.
When messages come in, we don't just throw them away; we need some way of logging them. Think of how I talked about using a log-based type of system. In some ways we are leveraging Pulsar as the logging mechanism for storing all of these messages, and Pulsar does it very efficiently. It makes use of Apache BookKeeper to help with storing all of the messages. You can also do message replay afterwards, and all of these things, which I won't go into in detail now. As for message delivery, ordering is very important, and not just ordering: the delivery itself is absolutely important. Messages that you send out should never disappear, and Pulsar has that built in already. Basically Pulsar is the broker, a serverless runtime, and it guarantees that any message that comes through Pulsar will be delivered to the intended target. There's also the Pulsar Functions framework, which allows you to transform data as well, and which comes in very handy for the transform step I talked about. Without any external dependencies, you can leverage Pulsar Functions to transform the data as it comes in, before you propagate it down to a derived data source, the target system. Another note is that Pulsar is also very smart in recognizing when your data becomes cold. While data stays in warm storage, it costs more money. So if you are not using that data, Pulsar will move it to offline storage such as S3 buckets or HDFS, the Hadoop file system. These are much less expensive kinds of storage space. So Pulsar takes care of things like that for you as well. Okay, so here I just want to quickly show you what Apache Pulsar is. Over on the right-hand side you can see the increasing number of committers, GitHub stars, and activity.
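As a rough idea of what the Pulsar Functions transform step mentioned above might look like, here is a sketch written in the "language-native" Python style, where a function is a plain `process(input)` callable. The event shape (a JSON object with `id` and `email` fields) and the masking rule are assumptions made up for this illustration; they are not from the talk or from the Astra CDC message format.

```python
import json


def process(input):
    """Mask the email in a JSON change event; return None for events without an id.

    The JSON layout is hypothetical. Returning None is meant here as
    "nothing to forward"; check the Pulsar Functions documentation for
    how your version treats a None result.
    """
    event = json.loads(input)
    if "id" not in event:
        return None
    email = event.get("email")
    if email and "@" in email:
        local, domain = email.split("@", 1)
        # Keep only the first character of the local part.
        event["email"] = local[:1] + "***@" + domain
    return json.dumps(event, sort_keys=True)
```

Because the function is a plain callable it can be exercised locally with a JSON string before being deployed, and the same filter or enrich logic could sit between a CDC topic and a downstream sink.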
So Pulsar is definitely increasing in popularity. In 2021 it was actually one of the top five Apache projects in the foundation. And again, this is a brief history; I won't go into all the details. Who else is using Pulsar? As you can see, these are big companies such as Yahoo, Overstock, Splunk, Verizon, and GM that are already using it. It also has what are called connectors and clients. Over here you can see all the logos from different companies, and on the right-hand side the language bindings, the language clients, and the messaging systems it interacts well with, including Kafka. It's absolutely a very flexible type of system. I won't go into all the details here either. We already talked about producers and consumers, which are what you write code for. There is also a new UI coming out that helps you write your producer and consumer code. The broker is a serverless process that very efficiently manages all of the message delivery. The backend storage is managed by BookKeeper, and the broker communicates with it. The broker also communicates with ZooKeeper, another Apache project, which helps manage your cluster metadata and handles the coordination tasks between the Pulsar clusters, and so on and so forth. Here too I won't get into all of the details; I just thought I'd provide them so you know what it is. Now we want to get back to CDC: enabling CDC for Astra DB. I work for the company DataStax, so for data and storage we have Apache Cassandra. So, as you can see, we create a serverless database and then create some tables, one for data and one for CDC messages. And then we create the streaming tenant.
That's the procedure for building out change data capture: then you enable CDC for Astra to connect everything together. Those are essentially the steps. Over here, for the downstream targets, you build a streaming sink to receive the changes that have been transformed and send them back down to the CDC messaging table. You can then add that data back to the data table. In other words, you can detect changes in an Astra database, a Cassandra database, have the streaming target in there, create the tenant, and then enable CDC. Make sure you do that. Then, for any kind of transformation, you can use Pulsar Functions, as I mentioned before, and you write all the transformed data down to the downstream targets. You can also send it back into the Cassandra database, with the transformation applied. So that's CDC for Astra with a streaming sink, and these are things our company can do for you. I also want to point out that Pulsar meets you where you are. We have a managed cloud platform called Astra Streaming, and that's our managed Pulsar. If you want to manage your own cloud, you can use our Luna Streaming: you write your Helm chart and deploy it to your Kubernetes cluster. Or you can use the open source version, completely open source, and run it on your desktop as well. So it's really, really flexible. DataStax also has different streaming-related products. There's Luna Streaming, which I talked about, the customer-managed platform that you manage yourself. There's also CDC for Cassandra. And if you want to use our managed cloud, you can use Astra Streaming.
There's also CDC for Astra DB within the GUI, which I'll show you shortly. Actually, I probably don't have much time, given that this is a 30-minute talk or so, so I'll give you the link and you can take a look at it. Okay, let me still show it quickly. Here we go. This is the link to our documentation, which describes how you can do CDC for Astra DB. The link is again towards the end of our document if you want to reference it. This page guides you through how to set up CDC, change data capture. This particular example helps you create a tenant and create a topic using Astra Streaming. You can also create a table, or use any existing table in which you want to capture changes to a column. Then you have to enable the CDC feature for Astra DB. Once it's enabled, you are all set. From that point on you can connect your streaming tenant to a sink. This example uses Elasticsearch, but you can replace it with Astra DB as the sink, which is the place where you receive the changes after they have been captured. So this particular example assumes the use of Elasticsearch, and it guides you through how to do that too. As such, I don't have enough time, so I won't get into all the details. Again, this link has been shared in the slides. Let me get back to the slides, but I also want to quickly point out our Astra database. We offer a $25 credit per month that you can use for free, and it refreshes every month, for you to do personal experimentation and projects.
To sign in, in my case I sign in with my account. I think it's a bit slow today, but let's see. Signing up is easy: you don't need a credit card for the $25 free tier. Just give your name and your email address, and that's all that's needed. Over here is where you create a database, and over here is where you create your tenant. On the left side it shows you streaming and database. So this is how you create them. Then, as I mentioned earlier with the CDC example, you can step through that example and see how it is done at your own pace. Okay, let me go back now and finish this presentation. Where to go from here? Please do keep in touch. These are resources for Apache Pulsar and for Astra from DataStax, and also the Pulsar, BookKeeper, and ZooKeeper projects from Apache. In the middle here is our Astra database link, and you can take a look there. All of these are Astra Streaming, Luna Streaming, and also CDC for Astra, so that's where that page comes from. Please do take a look. And here is additional credit if you're interested. If you're a small business and you need more credits, our company is very supportive of you too. So please visit this link and use this particular open source 200 code to get an additional $200 in credits. Or stay in touch with me and I'll help you too. For community resources, if you want to stay in touch with the community side of the open source Apache Pulsar project, here are the links and also the Slack channel. I myself also have a Twitch stream every Wednesday at 2:00 p.m. when I'm not traveling, or just follow me to see when it's up.
You can watch it when you are available, and recordings are kept for 60 days too. With that, I really want to thank you for sitting through and watching today's presentation. Please stay in touch. Follow me on Twitter and LinkedIn, and connect with me on my Discord channel. I'm open to any kind of conversation at any time. Good luck to everybody with all your projects, and happy coding.

Mary Grygleski

Streaming Developer Advocate @ DataStax



