Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, Tim Spann here, senior sales engineer.
Today I'm going to be presenting Building IoT Applications with Open Source, and we'll cover some different options and different things you can do with different technologies.
And again, this is just to give you a little sampling.
If you look through my back catalog, there's a lot of different things.
This will just get you started.
And when you're ready, contact me.
And we can explore some other areas of cool technology that we find interesting.
Every week I do AI, streaming, data, all kinds of stuff: NiFi, Polaris, Flink, Kafka, ML, AI, Streamlit, Jupyter, Iceberg, all the cool stuff out there.
You can either subscribe and get it automatically, or just come out to my GitHub, or Medium, or DEV.to, or Hashnode, or Substack.
I put this everywhere so you don't have to look for it.
It'll find you. So check that out if you're interested in this kind of technology, mostly open source, but also covering my employer, Snowflake, so you can see some cool stuff there.
I'm Tim Spann.
My blog is at datainmotion.dev.
I'm on Twitter, or whatever it's called now; I've got to add my Bluesky for you.
I've worked at a bunch of interesting data companies, and I'm based in the New York, New Jersey, Philly area.
I'm always looking to collaborate in the community, in open source, and at different events, conferences, and meetups.
I run a couple of meetups.
If you're looking for a speaker or you want to collaborate on an
event in those areas, let me know.
If it's virtual, I can collaborate with you anywhere.
I am one of the top IoT experts at DZone.
Love working with devices.
I love working with my friends at NVIDIA.
We do some cool stuff, whether that's AI at the edge or data streaming, RAG, vector databases, whatever you have there.
So check it out.
My Medium is here.
There's a ton of different examples at my GitHub.
So we'll do an intro, look at some devices, look at some apps,
messaging, get some ideas there.
NiFi, from the Apache project, helps us do a lot of the different stuff there.
Now, one of the cooler things out there is the new Raspberry Pi 5 with 8 gigs of RAM.
They've also added an AI Kit, and there are two different versions.
Right now I'm running the older one, but I have the newer one sitting right here.
So at some point, hopefully soon, we'll get a little break in the action, maybe for Christmas, and I can do an AI-powered Christmas tree with the 26 TOPS version instead of the 13.
13 is pretty good.
Just a couple of years ago, a powerful NVIDIA device only had 10.
13 and 26, pretty amazing.
We're also going to be looking pretty shortly at the Raspberry Pi AI Camera, as soon as I figure out some different projects.
Again, if you've got some interesting IoT plus edge AI use cases, let me know.
I'm always looking for stuff, especially if you're someone using Snowflake.
One of the examples I'm going to show you, on the AI and ML side, is pose estimation.
We're going to look at a person, and I guess that includes me, and it's going to figure out where my eyes, ears, shoulders, and joints are in motion.
Pretty cool.
Now you combine this with an NVIDIA device and a robot, and you could collaborate.
Maybe it could follow along.
Maybe we could do a group dance.
Probably pretty expensive to do that with some robots, but pretty cool.
Now, this Hailo-8 is on the Raspberry Pi AI Kit already, and it finds all these 17 points.
Right now I'm sending out the eyes, but there's a lot of new stuff coming.
And again, I'll probably do that hopefully over the holiday break.
We'll see how the weather is and how much time we have.
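To make those 17 points concrete, here's a hedged sketch of pulling just the eye keypoints out of a pose result. The index order below is the standard COCO 17-keypoint convention; the list-of-tuples format is my assumption for illustration, not the Hailo SDK's actual output structure.

```python
# Standard COCO 17-keypoint order; most pose models on the AI Kit emit these.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def extract_eyes(keypoints):
    """keypoints: list of 17 (x, y, confidence) tuples in COCO order (assumed format)."""
    named = dict(zip(COCO_KEYPOINTS, keypoints))
    return {"left_eye": named["left_eye"], "right_eye": named["right_eye"]}
```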
Highly recommend you get this, especially the new one with 26 TOPS.
It's really easy, and really well designed for the Raspberry Pi 5.
I'm curious if it will work on the new keyboard version, the Raspberry Pi 500.
I haven't tried that yet; something to think about.
You'd probably need some kind of adapter to get that going, but knowing the community, someone's probably working on it.
CLIP zero-shot, I've got to work on that one next.
Again, 26 TOPS for the new one.
Looking at some of the other options out there: if we look at the Jetson AGX Orin, it's 275 TOPS, 64 gigs of RAM, and 2,000 cores, and this is today, December 2024.
By next year, I would not be surprised if there are edge devices with double those performance numbers, and that could keep going for a while.
I know there are challenges with temperature and keeping it all in a small package, but they are advancing these units pretty quickly.
I have an old Xavier that's only at 21 TOPS, and we mentioned this new Raspberry Pi kit is at 26.
The speed of innovation here is moving.
There are also specialized cameras from Mufti, which I've been playing with.
Again, they do some things that you can pair with a Raspberry Pi.
I'm trying to see if you can combine that with the new AI Kit for one of the AI cameras.
A bunch of different things.
There are also specialty devices.
This one is coming soon: the SenseCAP Watcher.
Lots of stuff going on in AI, IoT, and edge.
Now you can make an edge server, and these can be running anywhere.
With the new Jetson, you can run a lot of containerized apps.
You've got a 64-bit processor, a lot of RAM, and fast Wi-Fi and Bluetooth.
So you're just running a server that can be in a moving vehicle or a robot.
So the idea that IoT is just little, teeny, low-powered devices is over.
Certainly those are extremely important, especially in a mesh where they can communicate over a lighter-weight protocol and feed you a whole bunch of data, with one of these edge servers doing a lot of the workload to aggregate that for you, and then maybe you send that out over Kafka or MQTT or Pulsar or a different protocol.
One of those little devices is sitting next to me. I'm not going to touch it; that one's plugged in, and it's a little warm in here, so I've got to check my temperature. It's an Adafruit FunHouse.
This returns it as JSON.
Now, since it's JSON, I can work with it easily, and this is going over MQTT, which can go right into a Pulsar server, into Kafka over a proxy, or to a regular MQTT server.
Then you'd have NiFi or something else decide whether to take it raw, convert it, aggregate it, maybe drop it into S3, push it into Kafka Connect, go right into Snowflake, or go through one of the various Snowpipe channels.
Lots of options here.
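To give you an idea of that first hop, here's a minimal sketch, not the talk's actual code, of publishing FunHouse-style readings as JSON over MQTT with paho-mqtt. The broker host, topic name, and sensor values are all placeholders.

```python
import json
import time

import paho.mqtt.client as mqtt

client = mqtt.Client()  # paho-mqtt 1.x style; 2.x also wants mqtt.CallbackAPIVersion
client.connect("mqtt.local", 1883)  # placeholder broker address
client.loop_start()

while True:
    payload = {
        "temperature_f": 70.2,   # stand-ins; real values come from the device sensors
        "humidity": 41.5,
        "pressure_hpa": 1013.2,
        "ts": int(time.time()),
    }
    # JSON on the wire means NiFi, Kafka via a proxy, or Pulsar can parse it downstream.
    client.publish("funhouse/telemetry", json.dumps(payload))
    time.sleep(30)  # this device doesn't need to update too frequently
```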
This one is fun.
This does not update too frequently, so this isn't huge data, but if I
had hundreds of these over multiple facilities, giving me updates every few
seconds, the data starts to pile up.
You can see this is one of the more traditional small devices: not too much RAM, a pretty slow processor, but that's great for doing the simple stuff, reading a couple of sensors and pushing that out to you, either over CircuitPython or some C variants. A couple of options out there.
Raspberry Pi can do a lot.
I have one that has some thermal sensors.
If you haven't seen the Breakout Garden breakouts, they're inexpensive, from the UK, and they've got lots of different sensors.
It's very easy to add them and pop that stuff out.
Those are pretty fun, and the thermal camera on there is really cool.
I take a lot of pictures with that one, and again, you can put this on various versions of the Raspberry Pi, whether it's the one gig, two gig, eight gig, or ten gig. I'm expecting a 16 gig to come out soon, and I'm like, I've got to save some money for that one.
So, data and cats.
I like cats.
So we're going to take IoT data with open source, get it into Apache projects, and spread it around.
But if you're in some enterprise environment, use those IoT open source tools: either land it right in an S3 bucket and have it processed, send it directly over JDBC into a data warehouse, send it into Kafka to get pushed through Kafka Connect, or have NiFi push it right into Snowflake.
There are a lot of different options there.
But suffice to say, taking IoT data and using open source tools to get it into your production and cloud data lakes is pretty straightforward.
Now, I'm going to recommend one of your main tools here.
Let me update this right here. This is like 200. I keep forgetting to update that, and it's annoying.
Let's get back in there.
Apache NiFi: if you haven't tried it, download it now, version 2 and greater, or use a provider like Datavolo.
This lets you ingest, move, route, and enrich data, especially IoT data, which can be sparse, can come in very fast, and can sometimes be broken, partially broken, or in weird formats.
NiFi guarantees delivery, makes sure things are buffered appropriately, allows for back pressure to slow things down so you don't break things downstream, and lets you prioritize messages.
So if there's a message coming from one of your devices that says, hey, I'm overheating, that's more important than the standard data of, yeah, temperature is 70, 70, 70.
So you could push that through faster.
You can control latency and throughput and change your tolerance.
You can play around with a lot of settings, just in the GUI, depending on your needs.
For most people, you'll never have to do that; it's just that those features you may need once in a while come in handy.
And the killer feature, which is awesome: data governance, data lineage, and data provenance as part of your data management, quality, and security.
Hugely important, so that you know who owns the data, where it came from, and where it's going, especially in the age of AI, when any source whose origin you can prove is going to become important once they start questioning how you trained these models.
Where did this data come from? Why were decisions made?
You could say, hey, I can take this data provenance and lineage from NiFi, which shows me when the data was consumed, how big it was, what it looked like, and where it came from, and push that into, say, Snowflake tables.
So I have a huge data warehouse that I can offload into Apache Iceberg and keep in cold storage on S3 in your data lake.
So when someone goes, hey, where did this data come from, or I need to rebuild the data that trained a model, you can do that.
Lots of different processors, lots of different sources, fully secure, fully clustered to run as many nodes as you need, and version control on your flows.
And what's really cool in the age of RAG and AI is support for binary and unstructured data: images, tabular data, PDFs, documents, email, Slack, Discord, whatever that data is.
You can ingest it, enrich it, and process it, and do that in a visual environment.
You can work with event processing, route data, and connect to any kind of central messaging, whether it's MQTT, RabbitMQ, Pulsar, Kafka, even the old MQ Series, any of those messaging protocols you might want, plus Redis, and get things through Kafka really easily.
NiFi 2 added some really important things for AI: being able to run Python processors, easily parameterizing your data, using Redis, and some of the newer JDKs, which are faster, with better performance and better threads.
JDK 21 and beyond is a game changer.
If you're like, I don't know about Java... Java is back. This is awesome.
Again, there's that lineage we mentioned, and there's a rules engine there to help you with development.
And there are some specialized features if you use the Datavolo version: everything to connect to Azure, integration with Slack and Zendesk, and a ton of other things.
You can look at a table in, say, Snowflake through JDBC and use that as a schema to validate data through the system.
So someone creates a table, which you can do really easily in Snowflake by uploading an example JSON of what this IoT data looks like, and then have NiFi automatically use that table as the schema to let you know what things should look like.
Or you could use AWS Glue if you use that.
There's support for OpenTelemetry if that's your environment.
NiFi runs as a cluster right now with ZooKeeper to decide who's the primary node running things; it coordinates the cluster, keeps some provenance in there, and has a great way of keeping all the different workloads isolated, very fast.
Now, as I've been going through different example code, here's the thing that's going to come up in IoT: besides "I've got a sensor," what type of sensor is it? What's the value? Where is it?
These are real-world devices, real-world systems.
They exist somewhere, and that somewhere is important based on how you might want to join on data locality.
So knowing the lat/long, I'm able to process that, which is really big.
So I wrote a library that takes an address and converts it to lat/long, so, say, I'm walking down the street and I want to see what sensors are reporting near me.
Again, maybe you subscribe to a service, or you can build your own and have friends do this.
There are also some open source air quality sensors out there that you could communicate with.
But knowing where you are and being able to do that is really cool.
So I created a little Python script here to do that and made it a processor for NiFi.
Again, you could also use that as regular Python inside of a Jupyter Notebook.
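As a rough illustration of the shape of such a processor, here's a hedged sketch using NiFi 2's Python extension API. The geopy/Nominatim geocoder and the "address" field name are my assumptions for the example, not necessarily what the actual library uses.

```python
import json

from geopy.geocoders import Nominatim  # assumed geocoding backend for this sketch
from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult


class GeocodeAddress(FlowFileTransform):
    class Java:
        implements = ["org.apache.nifi.python.processor.FlowFileTransform"]

    class ProcessorDetails:
        version = "0.0.1"
        description = "Adds latitude/longitude to a JSON record with an address field."

    def __init__(self, **kwargs):
        super().__init__()
        self.geocoder = Nominatim(user_agent="nifi-geocode-demo")

    def transform(self, context, flowfile):
        # Parse the incoming JSON record, look up the address, enrich, and re-emit.
        record = json.loads(flowfile.getContentsAsBytes())
        location = self.geocoder.geocode(record.get("address", ""))
        if location:
            record["latitude"] = location.latitude
            record["longitude"] = location.longitude
        return FlowFileTransformResult(relationship="success",
                                       contents=json.dumps(record))
```

The same transform logic runs fine as plain Python in a notebook; the nifiapi wrapper is just what lets NiFi schedule it on flow files.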
Now, if I've got all these things in a stream and I want to do some SQL analytics on them before they land in your cloud data lake, or before I push them to the next layer, with Cortex AI or wherever it happens to be, being able to do streaming analytics with Flink SQL, again open source, is pretty powerful.
We'll take a look at that in the next talk, or you can look at some of my old ones.
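As a taste of what that looks like, here's a hedged PyFlink sketch of a streaming SQL aggregate over a Kafka topic of sensor JSON. The topic, broker, and field names are placeholders, and you'd need the Flink Kafka SQL connector jar on the classpath.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Placeholder source table over a Kafka topic of JSON sensor readings.
t_env.execute_sql("""
    CREATE TABLE sensors (
        device STRING,
        temperature DOUBLE,
        ts TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'iot-sensors',
        'properties.bootstrap.servers' = 'localhost:9092',
        'scan.startup.mode' = 'latest-offset',
        'format' = 'json'
    )
""")

# One-minute tumbling-window average per device, before anything lands downstream.
result = t_env.sql_query("""
    SELECT device,
           TUMBLE_START(ts, INTERVAL '1' MINUTE) AS window_start,
           AVG(temperature) AS avg_temp
    FROM sensors
    GROUP BY device, TUMBLE(ts, INTERVAL '1' MINUTE)
""")
result.execute().print()
```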
I've been looking at different edge models here, and we don't have a lot of time to talk about them, but I wanted to mention a couple that have been showing up.
Hugging Face has a new SmolLM, which is a cute little language model that's only 1.7 billion parameters; it runs pretty quick and doesn't take up too many resources, so I can run this on the Pi.
Certainly NVIDIA hardware can run bigger models and support some incredible stuff, and the AI Kit can run things a little better, but there's a ton of small language models to check out on Hugging Face, where you can get a lot of work done in these constrained environments and work locally before data is sent out, because you might need to make decisions right away, especially for something that's in motion, like a vehicle or a robot.
The future is here now, so run as many of these models as you can.
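Here's a minimal sketch of trying one of those small models locally with the Hugging Face transformers library. SmolLM2-1.7B-Instruct is one of the published small models; whether it runs comfortably on your Pi depends on RAM and quantization, so treat this as a starting point.

```python
from transformers import pipeline

# Downloads the model on first run; pick whatever small model fits your device.
generator = pipeline(
    "text-generation",
    model="HuggingFaceTB/SmolLM2-1.7B-Instruct",
)

prompt = "Summarize this sensor alert: temperature spiked to 104F in rack 3."
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])
```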
We'll go through some code here and some things running.
Yeah, I should probably do that before I give you links to everything, because things may time out.
Let me run my consumer here.
I have a device running over here.
Did I stop it? I stopped it.
So I have an environment agent running over there, and this is one of those sensors that I've got on a Raspberry Pi.
You can see here it's getting a bunch of different values: humidity, pressure, and so on.
So I'm sending those into Kafka, and you can see, if you wanted to be in a Snowflake Notebook or a Jupyter Notebook somewhere as a data scientist or data engineer, you can grab this data immediately.
You can certainly wait until it lands in a table or somewhere else where you can consume it, and you could start using it to train models as part of an application.
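For example, grabbing that stream directly in Python might look like the sketch below, assuming the kafka-python package, a local broker, and a made-up topic name.

```python
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "iot-sensors",                      # placeholder topic name
    bootstrap_servers="localhost:9092", # placeholder broker
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="latest",
)

# Each message is one JSON sensor reading, ready for a notebook or an app.
for message in consumer:
    reading = message.value
    print(reading.get("humidity"), reading.get("pressure"), message.offset)
```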
I could put this into a Streamlit app just to give you some ideas, and maybe I could just consume a couple and then start working.
Or I could have Flink SQL consume it, do some kind of aggregate, and push it somewhere else like S3 or directly into Iceberg, then read the Iceberg table.
You've got a lot of options there.
It makes it pretty easy.
Now, I took one JSON file and loaded it here, which was pretty easy, and had it build a table for me.
And you can see it automatically created all the types and got it ready for me, which is pretty awesome.
And then I loaded it, and now I have this available.
So now I can use a different connection to load more.
I could land a file in S3 and have it loaded in.
I can push it to Kafka, or use JDBC to call right into there.
I've got a couple of options, depending on what we want to do.
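As one sketch of the direct-call option, here's roughly what an insert through the Snowflake Python connector can look like. The account settings, table name, and column are placeholders; a VARIANT column is one way to keep the raw JSON intact, like the auto-created table did.

```python
import json

import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",   # placeholders: fill in your own account details
    user="my_user",
    password="...",
    warehouse="MY_WH",
    database="IOT",
    schema="PUBLIC",
)

reading = {"device": "funhouse1", "temperature": 70.2, "humidity": 41.5}

cur = conn.cursor()
# PARSE_JSON lands the whole reading in a VARIANT column named payload (assumed).
cur.execute(
    "INSERT INTO raw_sensors (payload) SELECT PARSE_JSON(%s)",
    (json.dumps(reading),),
)
cur.close()
```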
I can also have NiFi do that.
Of course that timed out; everything times out eventually.
I can also have NiFi consume from that same topic, get that data, and push it to an edge.
And we can see we've got a lot of data here coming off there, and with its provenance, we can see the data.
Yeah, I know this is JSON, and you can see we've got all the different sensors on there.
And I like to get the time in there, what server it came off of, put a unique ID in there, all the different sensor readings, plus what host it was (it's one of my Raspberry Pis), what the MAC address is, if you're concerned about that (certainly people can fudge that), and how much disk is left on my devices.
I tend to put those in the raw messages.
You might want to pull some of those values out, or break this up into individual values, whether you do that on the device, in, say, NiFi, or when it lands: you have a raw table, then you put it into a table specific to where you're going.
That's up to you.
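Here's a hedged, standard-library-only sketch of stamping that kind of metadata into each raw reading; the field names are just my convention for illustration.

```python
import json
import shutil
import socket
import time
import uuid


def enrich(reading: dict) -> str:
    """Add timestamp, unique id, host, MAC, and free disk to a raw sensor reading."""
    reading.update(
        {
            "ts": int(time.time()),
            "uuid": str(uuid.uuid4()),
            "host": socket.gethostname(),
            "mac": f"{uuid.getnode():012x}",  # MAC as 12 hex digits
            "disk_free_gb": round(shutil.disk_usage("/").free / 1e9, 2),
        }
    )
    return json.dumps(reading)


print(enrich({"temperature": 70.2, "humidity": 41.5}))
```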
Now we can look at the data provenance to see when the data arrived.
And we can see the details: the size, what component touched it and when, all the different data there.
We can see the Kafka offset, the partition if it had multiple partitions, and the timestamp.
All that kind of important information there.
Pretty useful to give you an idea.
Now we've got another IoT app.
We'll keep this one running because that's fun.
I'm going to run the pose estimation one.
This is on a different Raspberry Pi that's got a camera right in front of me.
I'm doing that because it's grabbing that, though it looks pretty silly.
It's sampling some of those frames; I was sending every single picture, every single time, and it was too much.
So I learned my lesson.
That's just going into my own little local network here.
So we've got a bunch of those coming in, and those are going to a different Kafka topic.
And they're also going, as you can see here, into my Slack channel.
And it's a picture of me waving my hands around.
Obviously, in different environments, it could be more interesting, but it is an example of what you can do.
Pretty cool there.
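That sampling lesson is easy to sketch: throttle to one notification every N seconds instead of every frame. This version posts a text summary to a Slack incoming webhook; the webhook URL is a placeholder, and sending the actual image would go through Slack's file upload API instead.

```python
import json
import time
import urllib.request

WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder
SAMPLE_SECONDS = 10
last_sent = 0.0


def maybe_notify(summary: str) -> None:
    """Post to Slack at most once every SAMPLE_SECONDS; drop frames in between."""
    global last_sent
    now = time.time()
    if now - last_sent < SAMPLE_SECONDS:
        return  # throttle: skip this frame
    last_sent = now
    body = json.dumps({"text": summary}).encode("utf-8")
    req = urllib.request.Request(
        WEBHOOK_URL, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req)
```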
Now we've got our Kafka data coming in, and it's JSON with all these fields like we saw, and we could have pushed it to Avro.
We could push it into other formats, add a schema, what have you.
Again, if you're doing a more in-depth app, you've got a lot of options there.
So let's look at some of these resources.
This is more on that pose estimation stuff; I've got an article and source code, everything you need there.
Then we've got another one that goes a little deeper on that: I've got the code separate, code for the other piece, and a walkthrough article.
I also have a group of resources, How to Be an AI Engineer, that walks you through some of the basics and different open source tech there.
You might want to check that out.
This one combines AI and IoT, talking about air quality and how you could do RAG against that.
It's still interesting.
I did another one with street cameras in New York.
They have a lot of them publicly available; as long as you sign up, you can see what's happening in the streets of New York, and as we've seen recently, there are potentially very interesting things going on.
I'm Tim Spann.
Check my code out.
Make sure we automate everything.
But as long as we have enough cats involved, we'll be in a good place.
Thank you for attending this virtually.
Oh, I've got another example here on GTFS, if you haven't heard about that.
This is not exactly IoT, but this is where different transport systems give you access to data, and you can see, for example in this one, where the Boston buses are.
I wrote a Python processor here that takes a GTFS URL, converts it into JSON, and then we can split it up, convert it to just the fields we want, and get it into the format we want, and then I can do a lookup.
I can probably convert that into a Snowflake lookup, and that'll be pretty cool.
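For reference, here's a hedged sketch of that GTFS-to-JSON step using the official gtfs-realtime-bindings package. The MBTA vehicle-positions URL is an example of a public feed, so check the agency's docs for the current endpoint.

```python
import urllib.request

from google.protobuf.json_format import MessageToDict
from google.transit import gtfs_realtime_pb2

URL = "https://cdn.mbta.com/realtime/VehiclePositions.pb"  # example public feed

# GTFS-Realtime feeds are protobuf; parse the binary payload into a FeedMessage.
feed = gtfs_realtime_pb2.FeedMessage()
with urllib.request.urlopen(URL) as response:
    feed.ParseFromString(response.read())

# MessageToDict gives JSON-ready dicts you can split up per vehicle downstream.
for entity in MessageToDict(feed).get("entity", [])[:3]:
    print(entity.get("vehicle", {}).get("position"))
```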
But if you like anything you see here and you're looking for more, definitely take a look at my newsletter.
All the examples are always there.
When this video comes out, I'll have it there with the slides.
Everything you need.
Thank you.