Conf42 Kube Native 2022 - Online

Building Real-time Pulsar Apps on K8


Abstract

I will get you started with real-time, cloud-native streaming programming with Java, Golang, Python, and Apache NiFi.

Summary

  • Tim Spann: My talk today is building real-time applications with Pulsar running on Kubernetes. He says the FLiP stack lets you easily build streaming applications for modern cloud environments, and that the stateless brokers are easy to scale up and down based on the number of messages.
  • When we do a Kubernetes deploy of Pulsar, we provide a couple of YAMLs that you can specify, and that gets deployed as a graph of connections between all these functions. This makes it great for things like event sourcing or any high-bandwidth workload.
  • Tim Spann: The open source management console lets you manage a number of clusters and all your tenants, namespaces, and topics within the environment, making it very easy to take in data and scale it out in a real-time messaging environment.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi, I'm Tim Spann. My talk today is building real-time applications with Pulsar running on Kubernetes. I'm a developer advocate at StreamNative. If you want more information, you can scan my QR code there. I've been working on the FLiP and FLiPN stacks, which let you easily build streaming applications for modern cloud environments. Clearly Kubernetes has to be a big part of that, to be able to scale up, scale down, and deploy things with ease on all different types of clouds and environments. I've been working with big data and streaming for a number of years at a number of different companies. If you want to keep up with me, I have a weekly newsletter; lots of cool stuff there, definitely check it out. StreamNative is the company behind Pulsar, and we help people with it, whether they're running in our cloud environment or they need help in their own environment. Reach out to us.

Pulsar, as opposed to some other streaming and messaging systems, was designed from the very beginning to run natively in the cloud and in containers, with a separation of concerns between the different components of the streaming data platform. That gives us support for hybrid and multi-cloud, for all the different types of Kubernetes containers, a great fit for microservices, designed to run natively in your cloud without any kind of squeezing to fit it in there.

To show you how we run Pulsar and Kafka within K8s, you can see the infrastructure here a little bit. We've got protocol handlers to handle different messaging protocols such as Pulsar, Kafka, MQTT, and AMQP 0-9-1, and those just plug and play as you want, pretty easily. Those feed into topics, which are handled by the Pulsar brokers, and the brokers scale separately from the rest of the platform. The brokers are how you interact: send a message, get a message. What's nice is that they're stateless, so it's very easy to scale them up and down based on the number of messages and the number of clients. Very dynamic. Outside of that, we have Apache BookKeeper "bookie" nodes for storage, and we also offload data out to tiered storage. Where you see ZooKeeper, we have now expanded that to support etcd, to make it easier to run in Kubernetes, so you don't have to run another tool, since you have etcd anyway.

Under the covers we provide Pulsar operators to make this easy to run wherever it may be, and we have experience running it on premises, in public clouds, in private clouds, wherever you want. Just to give you an idea of the different pieces: a lot of them are pretty standard for most Kubernetes-native apps, like Grafana and Prometheus; nothing is going to be too different from other things you're running. We have full command line tools to interact with whatever you need that isn't part of the standard Kubernetes controls, as well as full REST endpoints for all the admin functionality you need, so you can easily use any kind of DevOps tool and extrapolate from what you're doing elsewhere.

This breaks it down in a pretty simple visual way. We've got those Pulsar brokers; they do the routing, they do the connections. They have a little cache, but no big storage there, and thankfully that helps with the automatic load balancing. We break those topics down into segments to make them into more manageably sized chunks. All the metadata and service discovery things you need in a cluster we store in a metadata store, which is a pluggable API that lets us add more implementations; etcd is probably the preferred one for a Kubernetes environment. BookKeeper also uses that same metadata store for its metadata and service discovery.
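To make that broker interaction concrete, here is a minimal sketch of a producer and a consumer using the pulsar-client Python library; the service URL and topic name are placeholders for whatever your cluster exposes:

```python
import pulsar

client = pulsar.Client("pulsar://localhost:6650")  # placeholder service URL

# Subscribe first so the new subscription sees the message sent below.
consumer = client.subscribe("persistent://public/default/demo-topic",
                            subscription_name="demo-sub")

# Producers and consumers only ever talk to the stateless brokers;
# storage in the bookies happens behind the scenes.
producer = client.create_producer("persistent://public/default/demo-topic")
producer.send("hello pulsar".encode("utf-8"))

msg = consumer.receive()
print(msg.data().decode("utf-8"))
consumer.acknowledge(msg)  # acknowledge so it is not redelivered

client.close()
```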
As you might expect, Pulsar sends those messages to BookKeeper, where they're stored and managed, and if they need to be tiered out to tiered storage, that happens from there, underneath the covers. When you're running this whole thing in your environment, across however many Kubernetes clusters you may have, we've got the metadata quorum in there, we've got the cluster of brokers, and we've got what we call a bookie ensemble, which is a number of those data nodes. Keeping all of that within one Kubernetes cluster makes sense, and your global metadata store is probably the etcd that's already sitting in your Kubernetes environment, just to keep things simple.

Why do we have all this? Well, Pulsar lets us produce messages in and consume them out. This is a great way to wire together different Kubernetes applications and stateless functions; if you want to run your own AWS-Lambda-type applications in a robust open source environment, it makes that easy to do. It really is a nice way to asynchronously decouple applications, regardless of versions, operating systems, programming languages, types of apps, types of protocols, or the final destination of your data. Very straightforward. And this is a very scalable environment, and we mean scale: it's not just scaling out, sometimes you've got to scale down based on workloads. What's nice with Pulsar is that the topics holding your messaging data can easily be distributed and moved without you doing things by hand or bringing down nodes. We can do this live, it's handled for you by the platform, and you don't have to be concerned about it. Need to bring down a cluster or shrink it? Very easy. You can set an offloading schedule, so whatever disk you have locally or within Kubernetes storage can be offloaded to S3, HDFS, Azure, MinIO, wherever you have your storage, anything that's S3-compatible. Pretty easy.

Now, how do I build these apps? So far I've got a Kubernetes-based messaging and streaming platform, a way to deploy it, and a place to store my messages. We also provide Pulsar Functions. These currently support Java, Python, and Go as the languages for developing functions, you can use any third-party libraries that make sense with those languages, and they let you build asynchronous microservices the easy way and deploy them on Kubernetes, with configuration that can be done from a command line tool, a REST endpoint, or automated via code. I apply a number of input topics to a function, and perhaps specify an output topic and a log topic. What's nice is that within the function I can dynamically create new topics based on what the data looks like, what the data feed is, or whatever needs I have. So this is not in any way hard-coded into a function; you can change it dynamically, which is great when you're deploying in different environments. You can always add another topic as an input: say I have a new set of data that wants to use my spell-checking function, just add it. One that does ETL? Any bit of code you might have in Java, Python, or Go is easy to put there and have it get every message that comes into those topics, an event at a time, process them, and do what you want with them. Pretty easy to do.
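For instance, wiring a function up to its topics from the command line might look roughly like this with the pulsar-admin tool; the file, class, and topic names are placeholders, and the sentiment function itself is sketched a little further down:

```bash
bin/pulsar-admin functions create \
  --py sentiment.py \
  --classname sentiment.Sentiment \
  --inputs persistent://public/default/chat-in \
  --output persistent://public/default/chat-scored \
  --log-topic persistent://public/default/chat-log \
  --name sentiment
```

Adding another input topic later is just an update to that same configuration, which is what makes it easy to point a new data feed at an existing function.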
It's a great way to do your computation isolated from the standard cluster, in the same Kubernetes cluster or a separate one. We have a function mesh that makes it easy to run as many of these functions as you want and connect them together in a streaming pipeline.

Now, for these functions you don't have to use a lot of boilerplate Pulsar code; very little here is specific to Pulsar. This is an example of a Python function, and there's very little you have to do. I'm importing Function from pulsar, so I've got to have the Pulsar client; then I create a class in Python and define my init. The major piece is the process method: self, input, and context. Input is what you're getting from whatever event arrived, and context is an environment you get from Pulsar that lets you create a new topic, send something to a topic, get access to the logs, and get access to shared data storage; there are a couple of different things you can do there. Very helpful. In this example, I just take the input that gets sent in from the event, run sentiment analysis on it, and return JSON. As you see, there is nothing really tied to Pulsar here; the return value just goes on to whatever output topic you set. If you wanted this to be routed to different topics, you'd use the context to do that. Pretty straightforward.
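Put together, a function of the shape just described might look like this minimal sketch; assume the Pulsar Functions Python SDK, and note that the class name and the stand-in scoring logic are illustrative, not the talk's exact code:

```python
import json

from pulsar import Function  # Pulsar Functions Python SDK


class Sentiment(Function):

    def __init__(self):
        pass

    def process(self, input, context):
        # context gives us a logger, dynamic publishing, shared state, etc.
        logger = context.get_logger()
        logger.info("received: %s", input)

        # Placeholder "analysis" - swap in a real sentiment library here.
        score = 1 if "good" in str(input).lower() else -1

        # The return value goes to the configured output topic; to route
        # to other (even brand-new) topics, use context.publish() instead.
        return json.dumps({"text": str(input), "sentiment": score})
```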
So our function mesh lets you run these Pulsar functions; it was designed for Kubernetes, and it's a great way to put them together in a pipeline. One thing I didn't mention about Pulsar functions is the dogfooding aspect: we have used Pulsar functions to build connectors for people to use in the platform. There are connectors created for MongoDB, MySQL, Kafka, tons of different things, and there are source and sink ones. What you do is point one at your database or whatever you have there and put in any criteria it needs. That just goes in a YAML file, everyone's favorite, and then it gets deployed as a graph of connections between all the functions you might have in a pipeline, and that graph gets deployed to the function mesh. Pretty straightforward.

When we do a Kubernetes deploy of Pulsar, we provide you with a couple of YAMLs that you can specify; again, that bottom one is the function mesh that defines how we run these functions for you. kubectl, everybody's favorite Kubernetes API: we've got the function mesh operators out there to deploy that within the cluster, and it will connect to the Pulsar cluster, which is probably an adjacent Kubernetes cluster, to keep that separation of concerns and scale up separately, so you can have compute completely isolated from the data. This makes it great for doing things like event sourcing or any high-bandwidth workload you might have coming off all these events.

I have a lot of Pulsar resources available for you, and there's my contact information. I love chatting about streaming and Pulsar, and if you've got any ideas for improving how to do this in your own Kubernetes environment, or other tools that you think might be of assistance, please contact me. That dinosaur there: scan it and you'll get right to the resources, or here is the link within GitHub. Pretty straightforward.

I have a couple of minutes left, so I want to show you why you might want to use Pulsar for things. This is a very simple app running in one pod. It's just Python serving an HTML page, but that HTML page has some JavaScript which makes a WebSocket call to Pulsar, so make sure you have all those ports opened up, and that will get the data dynamically. It consumes the data from the API, and we're able to stream it out as it comes in.

We also have a management console that can sit outside the environment; it's very easy for you to use. It's completely open source, and it lets you manage a number of clusters and all your tenants, namespaces, and topics within the environment. Pulsar is multi-tenant, which is great for having multiple groups of people use that one cluster you have, regardless of their use cases. Create a couple of tenants, maybe one for Kafka users, one for the public, whatever ones make sense. Underneath a tenant go all the namespaces that make sense for that environment, and under those are the topics related to each namespace. You can have as many of these as you want, get access to all the topics, create new ones, delete ones. It's a nice little admin tool.

If you want to see some example functions, I have them out there in my GitHub. One we'll dive into real quick is one I have for weather, which produced the chart you saw. Java is obviously a little more verbose than we'd like compared to the other languages, but it implements a Function, and again, that's the Pulsar Functions library there. We say what the input type is and what the output type is, and we're just going to take raw bytes, the default format for Pulsar messages if you haven't specified a type or schema. Then we use that context we get from Pulsar: the context lets us get a logger, and it also gives us this most important feature, letting me dynamically send a message. At the same time, if the topic I send this message to doesn't exist, it will be created for us, as long as you've set the security for that. I add any metadata properties I want to that message, add a key so it's easy to track, send it, and we're away; I could send this dynamically to as many places as I want. Here I'm using a schema that I built from just a standard Java bean. Notice the topic is persistent; we could have non-persistent messages if that makes sense in your environment. And here I've got my tenant, my namespace, my topic. That's as easy as it is. You saw how we do that in Python; really easy.
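The talk shows that send in Java; as a rough Python equivalent of the same ideas (a schema built from a simple record class, a fully qualified persistent topic, and properties plus a key on the message), here is a hedged sketch where the Weather class, topic, and values are all placeholders:

```python
import pulsar
from pulsar.schema import Record, String, Float, JsonSchema


# Stand-in for the Java bean in the talk: a plain record class
# from which Pulsar derives a JSON schema.
class Weather(Record):
    station = String()
    temperature = Float()


client = pulsar.Client("pulsar://localhost:6650")  # placeholder service URL

# Fully qualified topic name: persistent://tenant/namespace/topic.
# If it doesn't exist yet, the broker can create it on the fly,
# as long as topic auto-creation and your security settings allow it.
producer = client.create_producer(
    "persistent://public/default/weather",
    schema=JsonSchema(Weather))

producer.send(
    Weather(station="KEWR", temperature=21.5),
    properties={"source": "demo"},  # extra metadata on the message
    partition_key="KEWR")           # a key, easy to track and route

client.close()
```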
There are a lot of different connectors in and out of Pulsar, which makes it very easy for you to take in data and scale it out in a real-time messaging environment. I've been Tim Spann. Thank you for joining me today, and have fun with the rest of the talks.

Tim Spann

Developer Advocate @ StreamNative



