Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, I'm Timothy Spann. My talk today is building real-time applications with Pulsar running on Kubernetes. I'm a developer advocate at StreamNative. If you want
more information, you could scan my QR code there. I've been working
on the FLiP, or FLiPN, stack that lets you easily build streaming applications for modern cloud environments. Clearly Kubernetes has to
be a big part of that to be able to scale up, scale down,
deploy things with ease on all different types of clouds and
environments. I've been working with big data and streaming for
a number of years with a number of different companies.
If you want to keep up with me, I have a weekly newsletter.
Lots of cool stuff there. Definitely check it out.
StreamNative is the company behind Pulsar, and we help people with that, whether they're running in our Kubernetes-based cloud environment or they need help in their own environment.
Reach out to us. So Pulsar,
as opposed to some other streaming and messaging systems,
was from the very beginning designed to
run natively in the cloud and in containers,
as it was designed with a separation of concerns between
the different components in this streaming data platform.
So that gives us support for hybrid and multi-cloud, obviously all the different types of Kubernetes and containers. It's a great fit for microservices and designed to run in your cloud natively without any kind of squeezing
to fit it in there. To show you how we run Pulsar and Kafka within K8s, you could see the infrastructure here a little bit. We've got protocol handlers to handle different messaging protocols such as Pulsar, Kafka, MQTT, and AMQP 0-9-1, and those just plug and play as you want, pretty easy.
Those go into topics, and those are handled by the Pulsar brokers, which scale up separately from the rest of the platform. That's how you interact: send a message, get a message. And what's nice there is that it's stateless.
So very easy to scale those up and down based on the number of
messages, based on the number of clients.
Very dynamic. And then outside of that, we have Apache BookKeeper bookie nodes for storage, and we also tier out to tiered storage, as you see here. Where you see ZooKeeper, we now have expanded that to support etcd to make it easier to run in Kubernetes, so you don't have to run another tool, since you have etcd anyway. So under the covers we provide Pulsar operators to make this easy to run wherever it may be, and we have experience running that on premises, public cloud, private cloud, wherever you want to run this. And just to give you an idea of the different pieces,
a lot of things are pretty standard for most Kubernetes-native apps, around Grafana and Prometheus; nothing is going to be too different from other things that you're running. We have full command line tools to interact with whatever you need to that's not part of the standard Kubernetes controls, as well as full REST endpoints for all the admin functionality that you need. So you could easily use any kind of DevOps tool and extrapolate that out from what you're doing elsewhere. And this breaks that down in a pretty simple visual way.
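To give a feel for that admin surface, here's a minimal sketch of hitting a few of those REST endpoints from Python with the requests library, assuming the default admin port of 8080, the /admin/v2 paths, and no authentication configured.

import requests  # third-party HTTP library

ADMIN = "http://localhost:8080/admin/v2"  # assumed default admin service address

# List the tenants in the cluster.
print(requests.get(f"{ADMIN}/tenants").json())

# List the namespaces under the default public tenant.
print(requests.get(f"{ADMIN}/namespaces/public").json())

# List the persistent topics in the public/default namespace.
print(requests.get(f"{ADMIN}/persistent/public/default").json())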
We've got those Pulsar brokers; we do the routing, we do the connections. We have a little cache, but you don't have big storage there. Thankfully this helps with the automatic load balancing, and we break down those topics into segments to make them into more manageable-sized chunks. All the metadata and service discovery things you need in a cluster, we store that in a metadata store. This is an API that lets us add more; etcd is probably the preferred one for a Kubernetes environment. BookKeeper also uses that same metadata store for metadata and service discovery. As you might expect, Pulsar sends those messages to BookKeeper, where they're stored and managed. And if they need to be tiered to tiered storage, that'll come out of there underneath the
covers. When you're running this whole thing in your environment, however many Kubernetes clusters you may have, we've got this metadata quorum in there, we've got this cluster of brokers, and we've got what we call a bookie ensemble, which is a number of these data nodes. Keeping that within one Kubernetes cluster makes sense. And then your global metadata store is probably the etcd that's sitting in your Kubernetes environment, just to keep that simple.
Why do we have all this? Well, Pulsar lets us produce messages in and consume them out. This is a great way to wire together different Kubernetes applications and wire up stateless functions. If you want to do your own AWS Lambda-type applications in a robust open source environment, this makes it easy to do that. It really is a nice way to asynchronously decouple different applications regardless of versions of things, operating system, programming language, what type of apps, what type of protocols, or what the final destination for your data is. Very straightforward for you to do that.
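As a rough idea of what that looks like in code, here's a minimal produce-and-consume sketch using the Python pulsar-client library, assuming a broker reachable at localhost:6650 and the default public/default namespace; the topic and subscription names are just examples.

import pulsar  # pip install pulsar-client

client = pulsar.Client("pulsar://localhost:6650")  # assumed broker address

# One application produces a message...
producer = client.create_producer("persistent://public/default/demo-topic")
producer.send("hello from one app".encode("utf-8"))

# ...and a completely separate application consumes it whenever it's ready.
consumer = client.subscribe("persistent://public/default/demo-topic",
                            subscription_name="demo-sub")
msg = consumer.receive()
print(msg.data().decode("utf-8"))
consumer.acknowledge(msg)

client.close()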
And this is a very scalable environment, and we mean scale. It's not just scale out; sometimes you've got to scale down based on workloads.
What's nice with Pulsar is that topics that have your messaging data can easily be distributed and moved without you having to do things by hand or bring down nodes. We could do this live and it'll be handled for you by the platform, and you're not concerned about that. You need to bring down a cluster or shrink it down? Very easy. You could set an offloading schedule, so whatever disk you might have locally or within Kubernetes storage can be offloaded to S3, HDFS, Azure, wherever you might have your storage, MinIO, anything that's S3-compatible. Pretty easy.
Now how do I build these apps? So I've got a Kubernetes-based messaging and streaming platform, and I've got a way to deploy it and store my messages. We also provide Pulsar Functions. These currently support Java, Python and Go as your languages for developing these functions. You can use any third-party libraries that make sense with those languages, and it lets you build asynchronous microservices the easy way and deploy them on Kubernetes. So with configuration that could be done from a command line tool, a REST endpoint or automated via code, I apply a number of input topics to this function that I have and specify an output topic perhaps, and a log topic perhaps.
What's nice is within the function I can
dynamically create new topics based on what the data
looks like, what the data feed is, or whatever needs I have.
So this is not in any way hard coded to a function.
You could change this dynamically, which is great.
When you're deploying these in different environments, you could always add another topic as an input. Say I have a new set of data that wants to use my spell checking function: just add it. One that does ETL? Any bit of code you might have in Java or Python or Go is easy to put there and have it get every message that comes into those topics, an event at a time, process them and do what you want with them. Pretty easy to do. It's a great way to do your computation isolated from the standard cluster, in the same Kubernetes cluster or
a separate one. We have a function mesh that makes it easy to
run as many of these functions as you want and
connect them together in a streaming pipeline. Now this is an example of the functions: you don't have to use a lot of boilerplate Pulsar code, and it's not very specific to Pulsar, but this is an example of a Python function. Very little you have to do here. I'm importing that Function from pulsar, so I've got to have the Pulsar client, create a class in Python, and define my init there. And this is my major function here with the process method. self and input: input is the input you're getting from whatever event, and context is an environment that you get from Pulsar that lets you create a new topic, send something to a topic, get access to the logs, and get access to shared data storage. A couple of different things you could do there. Very helpful. In this example, I just take that input that gets sent in from the event, process it through a sentiment analysis, and then just return JSON. As you see here, nothing is really tied to Pulsar. This will just go on to whatever output topic you set. If you wanted this to be routed to different topics, you'd use the context to do that. Pretty straightforward.
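Pulling that description together, here's a minimal sketch of what such a Python function can look like. The class name and the toy sentiment scoring are placeholders assumed for illustration; a real function would call an actual NLP library.

from pulsar import Function  # base class from the Pulsar Functions Python runtime
import json

class SentimentFunction(Function):
    def __init__(self):
        pass

    def process(self, input, context):
        # input is the event payload; context is the environment Pulsar gives us
        # (logging, publishing to topics, shared state, and so on).
        logger = context.get_logger()
        logger.info("scoring: {}".format(input))
        # Placeholder scoring; swap in a real sentiment library here.
        sentiment = "positive" if "good" in str(input).lower() else "neutral"
        # Whatever we return goes on to the configured output topic.
        return json.dumps({"text": str(input), "sentiment": sentiment})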
So our Function Mesh lets you run these Pulsar Functions, and this was designed for Kubernetes, and it's a great way to put these together in a pipeline.
One thing I didn't mention with Pulsar Functions is the dogfooding aspect. We have used these Pulsar Functions to build connectors for people to use in the platform. So there are ones created for MongoDB, MySQL, Kafka, tons of different things, and there are source and sink ones. What you do is point it to your database or whatever you have there and put in any criteria it needs. That just goes in a YAML file, everyone's favorite. And then
that gets deployed as a graph
of connections here between all these functions you might have in a pipeline
and that gets deployed to the function mesh.
Pretty straightforward, as you see here. When we do a Kubernetes deploy of Pulsar, we provide you with a couple of YAMLs that you could specify. Again, that bottom one's the function mesh that defines how we run these functions for you. kubectl, everybody's favorite Kubernetes tool. We've got the
function mesh operators out there to deploy that within the
cluster and that will
connect to the pulsar cluster, which is probably an
adjacent Kubernetes cluster to keep that separation of concerns there and scale up separately
so you can have compute completely isolated from any of the
data there. This makes it great for doing things like event sourcing
or any high bandwidth workload you might have coming off
of all these events. I have a lot of
pulsar resources available for you. There's my contact
information. I love chatting about streaming and Pulsar,
and if you've got any ideas for improving how
to do this in maybe your own Kubernetes environment,
or if you've got other tools that you think might
be of assistance, please contact me.
This dinosaur here, scan it and you'll get right there. Or here
is the link within GitHub. Pretty straightforward.
I have a couple minutes left, so I want to show you
why you might want to use Pulsar for things.
This is a very simple app running in one pod. It's just Python serving an HTML page, but that HTML page has some JavaScript, which makes a WebSocket call to Pulsar. So make sure you have all those ports opened up, and that will get the data dynamically. So it consumes the data from the API and we're able to just stream that out as it comes in.
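For reference, here's a minimal sketch of that kind of WebSocket consumer in Python with the websocket-client library, assuming the default WebSocket service port of 8080, no authentication, and a topic in public/default; the topic and subscription names are just examples.

import base64
import json
from websocket import create_connection  # pip install websocket-client

# Assumed defaults: WebSocket service on port 8080, topic public/default/weather.
uri = ("ws://localhost:8080/ws/v2/consumer/"
       "persistent/public/default/weather/my-subscription")
ws = create_connection(uri)

while True:
    frame = json.loads(ws.recv())
    # Payloads arrive base64-encoded inside the frame.
    print(base64.b64decode(frame["payload"]).decode("utf-8"))
    # Acknowledge so the broker can send the next message.
    ws.send(json.dumps({"messageId": frame["messageId"]}))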
We also have a management console that could sit outside of the environment. Very easy for you to use that. This is completely open source and it'll let you manage a number of clusters, all your tenants, namespaces and topics within the environment.
Pulsar's multi-tenant, which is great for having lots of people use that one cluster you have, regardless of their use cases. Create a couple of tenants, maybe one's for Kafka users, one's for the public, whatever ones make sense. And underneath there you'll go to all the namespaces that make sense for that environment, and under there is where we have the topics related to that namespace. And you could have as many of these as you want, get access to all the topics, create new ones, delete ones. It's a nice little admin tool.
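As a rough sketch of that tenant, namespace, topic hierarchy, here's how you might create one of each through the admin REST endpoints from Python, assuming the default admin port, no authentication, and a cluster named standalone; the iot/sensors/readings names are just examples.

import requests

ADMIN = "http://localhost:8080/admin/v2"  # assumed default admin service address

# Create a tenant, allowing it on an assumed cluster named "standalone".
requests.put(f"{ADMIN}/tenants/iot",
             json={"adminRoles": [], "allowedClusters": ["standalone"]})

# Create a namespace under that tenant, then a topic under that namespace.
requests.put(f"{ADMIN}/namespaces/iot/sensors")
requests.put(f"{ADMIN}/persistent/iot/sensors/readings")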
If you want to see some example functions, I have them out there in my GitHub.
One of them we'll dive into real quick is one I have for weather, which you saw in that chart there. What this does: obviously Java is a little more verbose than we'd like compared to other languages, but it implements a Function, and again, that's that Pulsar library there. And we say what is the input type and what's the output type, and we're just going to take raw bytes, the default format for Pulsar messages if you haven't specified a type or schema. And then we apply that context we get from Pulsar, and that context lets us get a logger. And it also does this most important feature here: it lets me dynamically send a message. And at the same time, if the topic that I applied this message to doesn't exist (see that topic here), it will create it for us, as long as you've set the security for that. I add any metadata properties I want to add to that message, add a key so it's easy to track, and send it, and we are away. And I could send this dynamically to as many places as you want. Here I'm using a schema that I built from just a standard Java bean. If you notice here, it's persistent; we could have non-persistent messages if that makes sense in your environment. And here I've got my tenant, my namespace, my topic. That's as easy as it is. You saw how we do that in Python, really easy.
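For comparison, here's a rough Python sketch of that same dynamic routing idea, assuming the Functions context's publish call; the topic naming and event fields are just illustrative.

from pulsar import Function
import json

class RouterFunction(Function):
    def process(self, input, context):
        # Pick a destination topic based on the event itself.
        event = json.loads(input)
        topic = "persistent://public/default/{}-events".format(event.get("type", "unknown"))
        # publish() sends to that topic dynamically; if the topic doesn't exist yet,
        # the broker can create it, subject to your security settings.
        context.publish(topic, input)
        return None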
There are a lot of different connectors in and out of Pulsar, which makes it very easy for you to take in data and scale it out in a real-time messaging environment.
I've been Tim Spann. Thank you for joining me today. Have fun with the rest of the talks.