Transcript
This transcript was autogenerated. To make changes, submit a PR.
I am a principal engineer at Simply Smart Technologies Private Limited,
and I have been working on its product for the last three years.
My responsibilities predominantly cover the software side:
we manage everything from the development
lifecycle to DevOps and so on. That's
pretty much the intro, so let's start with the talk.
So, about the topic that I have chosen,
connecting with dots: why connecting with dots?
Firstly, we work with IoT, so we have
multiple nodes and we are connecting all those nodes to one another.
So at the end we are connecting multiple dots to one another.
But ironically, that has nothing to do with the talk name
that I came up with. The real reason is that,
growing up, I was fascinated by
how Steve Jobs used to present
things. In one of his speeches at Stanford
University, he talked about how much easier
it is to connect the dots, the incidents that
happened to you in the past, than to connect anything in the future.
So this is a talk about my experiences, how
we have evolved through them, and what learnings we
had while improving our own product.
That's why I'm connecting all those dots, looking backward,
and that's why the name is connecting with dots.
Now, what is Simply Smart? Simply Smart is an end-to-end
water management tool. In simple words,
you can understand it from
our motto: making simple things smart. The
game-changing thing
that we have done is take a simple
thing, as simple as water, and make
it smart. How did we do that? We provide
an end-to-end water management tool, all the way
from the hardware itself to managing everything remotely using
a mobile app and admin tools, for numerous
residential as well as commercial projects.
As a result, it saves a huge amount of water,
and that is our main goal as we grow: to save water.
Yeah, so let's connect the dots then.
This is just a GIF to show what we are
connecting here. Now, let's start with the first one.
That is time series data. All of us who have
worked in any way or fashion with IoT
know that we work with time series data. So what is
time series data? It is whatever is
reported to us periodically, or I would say whatever is dependent
on time, where time is the main constraint. In
our case, considering water meters, all the readings
that we receive from the meters at the cloud level are time
series data, on which we can perform any sort of
analysis. Now, moving ahead, what are the challenges that we
faced, which led to this talk?
So while growing, the first challenge that we faced was
dealing with all the maintenance-related work. We always have this
situation: if every day your end
user is raising an issue with some piece of hardware,
and your hardware person is reacting like this, that doesn't sound good,
right? At the start,
it's an easy thing to manage, at a small scale of meters.
But once your hardware count grows from 1,000
to 10,000, then to 40,000, 50,000, 60,000, 70,000,
80,000, it's quite hard to do those things manually.
So that was a big challenge that we have seen in the past four
or five years working with Simply Smart. Now the
second part is abnormal utilization or usage.
I have termed it abnormal usage
because we are working with water, so usage is the terminology I am
using. Now you can see, if there is abnormal
usage, everybody will be as frightened and as surprised as this
guy is. And we actually
want to save water, so we need to resolve those abnormal usages
so that water can be saved properly.
So now what's the solution to these problems?
In today's era, the first solution
that comes to
mind is machine learning. Yes, that's the solution we
got. But almost our
complete system is designed in Ruby; all the end-user
handling and so on is built in Ruby. So is
it machine learning with Ruby? Is it really possible to
build a machine learning application using Ruby? We all know that
there are many popular programming languages for this,
like Python and R, and even JavaScript has
a few things. When I tried searching for it,
"can we do machine learning in Ruby?",
at the start I didn't find anything Ruby-related in the top ten results.
Nobody talked about how we can achieve this using Ruby.
So I searched a lot, and from one of the conferences
I came to know about this guy, Andrew Kane.
Yes, somebody did the work to perform certain kinds
of machine learning natively in Ruby itself, so that
somebody who is good with Ruby and has some mathematics
background can solve these sorts of problems using simple
solutions, in Ruby itself. So you can definitely look
up Andrew Kane. He has done a lot of work on
these things, as well as on Postgres optimization
and graphical analysis. Do look at all of
it. You can see here there is
Blazer, a Ruby gem, or package,
which is about business intelligence made simple. So he has solved
smaller problems using simple solutions, and that is inherently
what we are doing with the water meters: making simple things as smart
as possible. So let's look
at a few packages before going
to the actual solution and how we solved those problems.
So, moving ahead, the first one is Numo::NArray.
We have all, in some way, worked
with NumPy, because we all know machine learning drills down to mathematics
at the end. We can all agree on that fact.
So I wondered how we can do
similar things to NumPy, like matrix multiplications
and other scientific calculations, natively in
Ruby. While looking at Andrew Kane's profile and
other solutions as well, this was one of the packages that I found.
It is called Numo::NArray, and it can
perform all those scientific or mathematical calculations
easily, similar to NumPy and comparable libraries
in Python and R. Now you might be wondering why
the screenshot that I have added here
looks like a Jupyter notebook,
which is predominantly used
for Python, and how I'm running Ruby code
in a Jupyter notebook.
For that there is also a gem called IRuby, which
helps us easily configure Jupyter to use a Ruby kernel
instead of a Python kernel. Do look that up as well.
Yeah. Now comes the part about how to handle huge data,
because we all know that machine learning
needs a huge amount of data to process and
to work with. In Python that
work is handled using data frames,
using pandas.
I wanted a strong solution, or I would say a performance-optimized
solution, to
get the data and process it efficiently. So I asked,
is there any library for data frames in Ruby?
When I searched for it,
I found many solutions, but out of those, Rover is one of the
most optimized ones and is predominantly written in Ruby itself.
I keep emphasizing "written in Ruby" because
some earlier solutions were
wrappers around the original libraries;
they were in effect copies of those Python libraries
exposed in Ruby. Whenever I tried to override
some behavior to
fit my custom solutions, I always ran into the fact that it's
written in Python. So whatever I'm overriding,
or I would say rewriting
around it, is in Ruby, while the library underneath is not. How would that give an optimized solution? I
can't override everything, because every language has its own power and
its own way of doing things. So having
something which is written
natively in Ruby helps me a lot to override
things and to optimize them at a different level. So,
firstly we had the mathematical calculations, and secondly, how to handle the data.
Next comes the part about how to plot it out.
But before plotting,
when working with scripting languages,
we all know that we want ready-made solutions that already exist in an
optimized form. In Python,
there is scikit-learn, where
within six to eight lines of code
you can train a small
model. In Ruby there is Rumale.
So you can see from the code what we are doing. We are loading
a sample data set, similar to what
we would do in Python. Then we set up a transformer with a number of components
and a random seed; the transformer is nothing
but a model that we load the samples into.
We fit all those samples
to the transformer and transform them. Then we
train an SVC linear model,
that is, a support vector classification model,
by fitting the transformed data to that
model. And then we open a file. Now comes the question:
why do we need to open a file and dump things somewhere?
Because I believe we can't
always train everything at runtime and then give the answer. We need
to pickle, or dump somewhere, the
model which is trained on a huge amount of data. So we are dumping it in
the first code block. Then, in the second block, we are
loading the test data, and we are
loading the transformer and the classifier
back from the files. Then we are
trying to predict. After predicting, you can
see we are getting about 98% accuracy here.
It can differ across applications and
problem statements, but we are getting it done
in not more than, I would say, ten to twelve lines of
code.
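The train, dump, load, predict flow described above can be sketched with nothing but Ruby's standard library. This is a minimal illustration, not the code from the slide and not Rumale's API: `NearestCentroid`, `save_model`, `load_model`, the file name, and the sample data are all made up for the example. Only `Marshal`, `File`, and `Dir.mktmpdir` are real standard-library calls.

```ruby
require "tmpdir"

# Hypothetical stand-in for a trained model: a tiny nearest-centroid classifier.
# It is not Rumale's API; it only makes the train/dump/load/predict flow concrete.
class NearestCentroid
  def fit(samples, labels)
    grouped = samples.zip(labels).group_by { |_sample, label| label }
    @centroids = grouped.transform_values do |pairs|
      rows = pairs.map(&:first)
      rows.transpose.map { |column| column.sum.to_f / rows.size }
    end
    self
  end

  def predict(samples)
    samples.map do |sample|
      @centroids.min_by { |_label, centroid| squared_distance(sample, centroid) }.first
    end
  end

  private

  def squared_distance(a, b)
    a.zip(b).sum { |x, y| (x - y)**2 }
  end
end

# Persist a trained object to disk. Only load files you wrote yourself:
# Marshal.load is not safe on untrusted data.
def save_model(model, path)
  File.binwrite(path, Marshal.dump(model))
end

def load_model(path)
  Marshal.load(File.binread(path))
end

# Fraction of predictions that match the expected labels.
def accuracy(predicted, expected)
  predicted.zip(expected).count { |got, want| got == want }.to_f / expected.size
end

# Made-up data: [liters per hour, hours of continuous flow].
train_x = [[20, 1], [35, 2], [30, 1], [400, 9], [380, 12], [450, 10]]
train_y = %w[normal normal normal leak leak leak]
test_x = [[25, 2], [420, 11]]
test_y = %w[normal leak]

Dir.mktmpdir do |dir|
  path = File.join(dir, "classifier.dat")

  # First block: train once and dump the result.
  save_model(NearestCentroid.new.fit(train_x, train_y), path)

  # Second block: load the dumped model later and predict with it.
  classifier = load_model(path)
  puts "accuracy: #{accuracy(classifier.predict(test_x), test_y)}"
end
```

The point of the split is the one made in the talk: training happens once, offline, and the serving side only pays for loading a file and calling `predict`.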
Now, this one has been added with an Indian
audience in mind. There is a package named Daru in
Ruby, which does data analysis with Ruby. And
in Hindi slang, or Indian slang,
the word means liquor. So it's a funny
situation that a data analysis
package is named like a liquor.
Yeah. Now there are many other packages
and references as well.
If you want to perform textual analysis, you can go with
fastText. If you want to do plotting, these two are there:
Nyaplot and Vega. There are many others as well. Do look them up,
explore them more, and contribute to them.
Now, moving ahead to the problems and challenges that we really faced.
The solution is anomaly detection.
We all know that if a particular reading is
behaving anomalously,
there may be some problem with it. It can be a
maintenance-related problem, or it can be related to abnormal utilization.
Either way, the solution is to detect anomalies.
So let's understand what types of anomalies there are and how
they affect water-related solutions,
or I would say IoT solutions in general. The first one
is point outliers. I have added a graph
so that everybody can understand. In this case you can see
we have some point outliers here.
Now, in terms of an IoT solution,
where can this sort of anomaly happen? Considering a water
solution, there can be power failures,
and due to that some junk readings get sent. That is
the sort of problem we can detect. If that
problem keeps happening regularly, say two or three
such outliers within a day, then we need to
check the supporting systems, like the electrical
system and the other systems which support the hardware that we have installed
at the site. Now comes the next part, subsequence anomalies.
These are the two types of subsequence anomalies. If you
are working with a single time series property, a single
series of data, then you get
univariate time series data. You can see here we
have some outliers: there is one outlier here, there is an O2
outlier there, and each is a subsequence, not just
a single point outlier. Now,
if you get this sort of anomaly,
what does it depict? Because at the end it's all about what analytics
we get out of it. From
these we can understand that there is some issue with the meter, because it's
continuously fluctuating up and down. If there are
subsequence anomalies, we can
say there might be an issue with the hardware itself,
not with the supporting system that we provide to it. From that we can
proactively inform a hardware person to go and check.
So these first two types can solve the proactive
maintenance case. Now comes the abnormal utilization
part and how we can solve that. To solve
abnormal utilization,
let's first detect these outliers; then we can discuss that.
So here we are using the Prophet gem.
Prophet is a forecasting library that originally came from Facebook, and there is
a Ruby version of it that can easily detect
basic outliers, so that we don't need to
build all of that ourselves, because at the end it's all about solving the problem
efficiently in less time. This is what
helps us do that. You can see what we are doing here.
We are iterating over
the rows of the CSV that we provide, to build a series
from it. Then
we are asking Prophet to detect the anomalies in this series. We can
see it returning the anomalies, with x and y coordinates
coming from the timestamps and the
values. Using this sort of simple
solution, we can detect point outliers as well as single-variate
and multivariate outliers, and we can easily solve
proactive maintenance problems in huge IoT systems as
well. Now let's plot the data.
Why plot it? Because otherwise it's just scientific-looking
data. Once I plot it, you can see there is some issue in this
particular time frame, where we are getting an outlier.
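To make the difference between an isolated point outlier and a run of consecutive anomalous readings concrete, here is a small plain-Ruby sketch. It does not use Prophet and is not how the gem detects anomalies. It swaps in a simpler technique, a robust z-score based on the median absolute deviation (MAD), and treats any run of adjacent flagged readings as a subsequence, which is a simplification. The method names, the 3.5 cut-off, and the readings are assumptions made for the illustration.

```ruby
# Median of a numeric array.
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Indices whose value is far from the median, measured in units of the
# median absolute deviation (MAD). 0.6745 rescales MAD to be comparable
# with a standard deviation; 3.5 is a commonly used rule-of-thumb cut-off.
def outlier_indices(values, threshold: 3.5)
  center = median(values)
  mad = median(values.map { |v| (v - center).abs })
  return [] if mad.zero?

  values.each_index.select do |i|
    (0.6745 * (values[i] - center).abs / mad) > threshold
  end
end

# Split flagged indices into isolated points and runs of consecutive indices.
def classify_anomalies(indices)
  runs = indices.slice_when { |a, b| b != a + 1 }.to_a
  {
    points: runs.select { |run| run.size == 1 }.flatten,
    subsequences: runs.select { |run| run.size > 1 }
  }
end

# Made-up hourly consumption in liters: one junk spike and one sustained run.
readings = [50, 52, 49, 51, 900, 50, 48, 53, 51, 300, 310, 305, 50, 49, 52, 51]
p classify_anomalies(outlier_indices(readings))
```

In the talk's terms, the isolated spike would point at the supporting systems (for example a power failure producing a junk reading), while the sustained run would be the cue to send someone to look at the meter itself.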
Now comes the part about how to detect abnormal
utilization. We can do it with
detection by forecasting. It's a very popular method:
in machine learning we always forecast, we always predict
how things should be or could
be. What we have observed is that
in our solution, a water solution,
we can't just go and look for spikes,
since a meter reading is a continuously increasing thing.
What we can do instead is predict what
the next value should be, from the trends and the past
learnings: how the meter is consuming, how that particular resident or
commercial project is consuming. Let's say that
in the morning, from ten to twelve,
they consistently consume around 1,000 liters, whereas in the afternoon
there is a dry period where they consume just 200 to 300 liters, and
afterwards, in the evening, there is a rise again. This is a pattern
that we ask our machine
learning model to learn, and then to detect
outliers against. If suddenly in the afternoon it's
showing 1,200 or 1,300 liters, then it's
an abnormal case, which says there might be a leakage, or
there is continuous consumption because somebody left a tap open and
forgot to close it. To save the water, we need to proactively
alert the resident, or whoever is responsible on the commercial side.
To solve that, we have detection by forecasting.
You can see it in the graph as well. The black dots are
the acceptable values, the ones that match the forecast.
The green one is the forecast
pattern itself, whereas the red ones are actual values
which fall outside the forecast. So we can say that in those
particular time frames there is some
problem in our system, or somebody needs to go and check whether
the utilization is happening properly or not.
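The morning, afternoon, evening example above can be turned into a toy version of detection by forecasting. This is a sketch under stated assumptions, not the model used in the talk: the "forecast" is just the average consumption for each period of the day over past days, and a period is flagged when actual usage exceeds that average by more than a tolerance. The method names, the 50% tolerance, and all numbers are made up. Because meter readings only ever increase, the cumulative readings are first converted to per-period consumption.

```ruby
# Turn cumulative meter readings into consumption per interval.
def consumption(cumulative)
  cumulative.each_cons(2).map { |previous, current| current - previous }
end

# Average consumption for each period of the day across past days.
# history: one array per day, e.g. [morning, afternoon, evening] in liters.
def period_profile(history)
  history.transpose.map { |same_period| same_period.sum.to_f / same_period.size }
end

# Indices of periods where actual usage exceeds the profile by more than
# `tolerance`, expressed as a fraction of the expected value.
def abnormal_periods(profile, actual, tolerance: 0.5)
  actual.each_index.select do |period|
    actual[period] > profile[period] * (1 + tolerance)
  end
end

# Made-up past days: busy mornings, dry afternoons, a rise in the evening.
history = [[1000, 250, 900], [1050, 300, 950], [950, 200, 850]]
profile = period_profile(history)

# Made-up cumulative readings for today, taken at the period boundaries.
today = consumption([50_000, 51_020, 52_270, 53_180])

labels = %w[morning afternoon evening]
abnormal_periods(profile, today).each do |period|
  puts "#{labels[period]}: used #{today[period]} L, expected about #{profile[period].round} L"
end
```

A real deployment would want a tolerance band learned from the data rather than a fixed fraction, which is the kind of thing a forecasting library's uncertainty interval provides.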
Now, how to do that? As I illustrated, we need to use
those packages as well. In this case, you can see I have a
timestamp and a single reading, so we are going with a single variate.
Firstly, we have loaded the data from a CSV
using Rover to get it into a data frame.
Then we have created the Prophet
model. Then we have fit it on the complete data;
fit here means training it with the data that
we have loaded. Then
we ask it to forecast.
To do that, firstly
we ask it for a future data frame, because we want to
have a fixed periodicity here.
Then we ask our model to predict for those
future dates. Now let's see what
it forecasts. You can see here that
the line is the forecast,
whereas the black dots are actual consumption. From that we can see
that for the dots that are slightly away from
the forecast line, there is some abnormal consumption at
that point. This particular data is continuously increasing
data. There might be cases where you have hyperbolic
data as well, and you can detect anomalies in that the same way.
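The fit, future points, predict sequence described above can be mimicked in a few lines of plain Ruby for the simplest case of steadily increasing readings. This is only an illustration of the idea: it fits a straight line by ordinary least squares instead of using Prophet, and `fit_line`, `forecast`, `off_trend`, the 100-liter limit, and the readings are all hypothetical.

```ruby
# Ordinary least-squares fit of y = intercept + slope * x.
def fit_line(xs, ys)
  n = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  covariance = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  variance = xs.sum { |x| (x - mean_x)**2 }
  slope = covariance / variance
  { slope: slope, intercept: mean_y - slope * mean_x }
end

# Forecast the reading at time x from a fitted line.
def forecast(line, x)
  line[:intercept] + line[:slope] * x
end

# Indices whose actual value is further than `limit` from the forecast.
def off_trend(line, xs, ys, limit:)
  xs.each_index.select { |i| (ys[i] - forecast(line, xs[i])).abs > limit }
end

# "Fit": learn the trend from past cumulative readings (hour index, liters).
past_hours = [0, 1, 2, 3, 4, 5]
past_readings = [1000, 1100, 1200, 1300, 1400, 1500]
line = fit_line(past_hours, past_readings)

# "Future data frame" and "predict": forecast at fixed future intervals,
# then compare with what the meter actually reported.
future_hours = [6, 7, 8]
actual = [1605, 1950, 2060]

off_trend(line, future_hours, actual, limit: 100).each do |i|
  expected = forecast(line, future_hours[i]).round
  puts "hour #{future_hours[i]}: read #{actual[i]} L, forecast #{expected} L"
end
```

A straight line has no notion of daily or weekly seasonality, which is exactly what a library like Prophet adds on top of the trend.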
Now let's summarize what we have discussed in this
short talk. Firstly, we discussed what
time series data is and how it shapes most of the IoT solutions that
we work with. Then came the challenges that
we have faced, which most people working on IoT
solutions have faced in one way or another.
Then came how we can solve these using machine learning with Ruby,
rather than treating it as a usual machine learning problem and solving it with the traditional Python
stack. And the last one was how we used anomaly
detection techniques to solve our challenges.
Thank you. You have been a great audience; you have listened
and watched all the way through.
You can visit the Simply Smart website
to understand more about what we do and how we do it. You can connect
with me on X at Vishwa
Desurukar, and let's take things forward. I do
urge people to look at these solutions. If you are a Ruby developer, or
a budding Ruby developer, do look up these packages and
contribute to them. That will help our community a lot.
Thank you, Conf42 Internet
of Things, for the chance to speak and to share my knowledge and
my experiences over the period.
And for everyone,
do enjoy the solutions, and do connect with me
if you are interested in knowing how we do all the other things,
from the end-to-end solutions to the different sorts of
architectures. Thank you.