Conf42 Internet of Things (IoT) 2023 - Online

Connecting the Dots: Unleash the Magic of Machine Learning in IoT with Ruby!

Abstract

Revolutionize IoT: Preventive Maintenance & Anomaly Detection with Ruby! Witness language-driven magic as we showcase live demos of Ruby-powered IoT devices detecting anomalies and ensuring optimal performance. Join us to discover how Language and AI empower IoT with preventive maintenance.

Summary

  • I am a principal engineer at SimplySmart Technologies Private Limited. SimplySmart is an end-to-end water management tool. We have taken a simple thing, as simple as water, and made it smart. The talk is named Connecting the Dots.
  • The first challenge we faced was dealing with maintenance. The second is abnormal utilization, or usage. If there is abnormal usage, everybody is frightened and surprised, and we actually want to save water.
  • Is it really possible to build machine learning applications in Ruby? Handling huge data calls for data frames, and Rover is one of the most optimized options, written predominantly in Ruby itself.
  • The solution is anomaly detection. An anomaly can point to a maintenance problem or to abnormal utilization. With this simple solution, we can solve proactive maintenance problems even in huge IoT systems. Then comes the question of how to detect abnormal utilization.
  • I urge people to look at these solutions. If you are a Ruby developer, or a budding one, look at these packages and contribute to them. Thank you, Conf42 Internet of Things, for the chance to speak and share my knowledge. Thank you.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
I am a principal engineer at SimplySmart Technologies Private Limited, and I have been working on this product for the last three years. My responsibilities predominantly cover the software side, from the development lifecycle to DevOps and all those sorts of things. That's pretty much the intro, so let's start with the talk. About the topic I have chosen, Connecting the Dots: why connecting the dots? Firstly, we work with IoT, which has multiple nodes, and we are connecting all those nodes to one another. So in the end we are connecting multiple dots to one another. But ironically, that has nothing to do with how I came up with the talk name. Growing up, I was fascinated by Steve Jobs and how he used to present things. In one of his speeches, at Stanford University, he talked about how much easier it is to connect the dots of incidents that happened to you in the past than to connect anything in the future. This is a talk about my experiences, how we have evolved through them, and what we learned while improving our own product. I am connecting all those dots looking backward, and that's why the name is Connecting the Dots. Now, what is SimplySmart? SimplySmart is an end-to-end water management tool. In simple words, you can understand it from our motto: making simple things smart. The game-changing thing we have done is to take a simple thing, as simple as water, and make it smart.
So how did we do that? We provide an end-to-end water management tool, all the way from the hardware itself to managing everything remotely through a mobile app and admin tools, for numerous residential as well as commercial projects. As a result it saves a huge amount of water, which is our main goal as we grow: to save water. So let's connect the dots. This is just a GIF to show what we are connecting here. Let's start with the first one, time series data. All of us who have worked with IoT in any way know that we work with time series data. So what is time series data? It is whatever is given to us periodically, or I would say data that depends on time, where time is the main constraint. In our case, considering water meters, all the readings that we receive from the meters at the cloud level are time series data, on which we can perform any sort of analysis. Moving ahead, what are the challenges we faced that led to this talk? While growing, the first challenge was dealing with all the maintenance-related work. Every day an end user raises something like "we have an issue with this hardware", and then your hardware person reacts like this. That doesn't sound good, right? At the start it is fairly easy to manage with a small number of meters, but once the hardware count grows from 1,000 to 10,000, then to 40,000, 50,000, 60,000, 70,000, 80,000, it is quite hard to do those things manually. That was a big challenge we have seen in the past four or five years working with SimplySmart. Now the second part is abnormal utilization, or usage.
I call it abnormal usage because we are working with water, so usage is the terminology I am using. You can see that if there is abnormal usage, everybody will be as frightened and as surprised as this guy is. And we actually want to save water, so we need to resolve those abnormal usages so that water can be saved properly. So what's the solution to these problems? In today's era, the first solution that comes to mind for every problem is machine learning. Yes, that's the solution we arrived at. But most of our system is designed in Ruby; all the end-user handling and so on is built in Ruby. So is it machine learning with Ruby? Is it really possible to build machine learning applications using Ruby? We all know there are many popular programming languages for this, like Python and R, and even JavaScript has a few things. When I tried searching for whether we can do machine learning in Ruby, at first I didn't find anything Ruby-related in the top ten results; nobody talked about achieving this with Ruby. So I searched a lot, and from one of the conferences I got the contact of this guy, Andrew Kane. Yes, somebody did the work to perform certain kinds of machine learning natively in Ruby itself, so that somebody who is good with Ruby and has some mathematics background can solve these sorts of problems with simple solutions in Ruby itself. So definitely look up Andrew Kane. He has done a lot of work in this area, as well as on Postgres optimization and some graphical analysis. Do look at all those things.
You can see here there is Blazer, a Ruby gem, or package, described as business intelligence made simple. He has solved smaller problems using simple solutions, and that is inherently what we are doing with water meters: making simple things as smart as possible. So let's look at a few packages before going to the actual solution and how we solved those problems. The first one is Numo::NArray. We have all worked with NumPy in some way, because we all know machine learning boils down to mathematics in the end. We can all agree on that. So I wondered how we could do things similar to NumPy, such as matrix multiplication and scientific calculations, natively in Ruby. While looking at Andrew Kane's profile and other solutions, this was one of the packages I found. It is called Numo::NArray, and it can easily perform scientific or mathematical calculations, similar to NumPy and comparable libraries in Python and R. Now you might be wondering: the screenshot I have added here looks like a Jupyter notebook, which is predominantly used for Python, so how am I running Ruby code in a Jupyter notebook? For that there is also a gem called IRuby, which makes it easy to configure the notebook to use a Ruby kernel instead of a Python kernel. Do look at that as well. Now comes the part about how to handle huge data, because we all know that machine learning needs a huge amount of data to process and work with. In Python that work is handled with data frames, using pandas. So I wanted a strong, performance-optimized solution to get the data and process it efficiently.
So I asked, is there any library for data frames? When I searched, I found many sorts of solutions. But out of those, Rover is, I would say, one of the most optimized, and it is written predominantly in Ruby itself. I keep emphasizing "written in Ruby" because the earlier solutions were wrappers: they made copies of the Python libraries available in Ruby. Whenever I tried to override any behavior to fit my custom solutions, I always ran into the fact that the library is written in Python, while whatever I am inheriting from it, or I would say rewriting around it, is in Ruby. So how would that give an optimized solution? I can't override everything, because every language has its own power and its own way of doing things. Having something plain, written natively in Ruby, helps me a lot to override things and optimize them at a different level. So firstly we have mathematical calculations, and secondly how to handle the data. Now comes the part about how to plot it. Before plotting, we all know that, growing up with scripting languages, we want solutions that are already there and optimized. In Python there is scikit-learn, where within six to eight lines of code you can train a small application. Here there is Rumale. You can see from the code what we are doing: we are loading a sample data set, similar to what we do in Python. Then we transform it using a number of components and some randomness. And the transformer is nothing but a model that we are loading the data into.
We fit all those samples to a transformer, and then we trained a linear SVC model, a support vector classifier. We fitted the transformed data to that model, and then we opened a file. Now, why do we need to open a file and dump the model somewhere? Because I believe we can't always train everything at runtime and then give the answer. We need to kind of pickle it, dumping somewhere the model that was trained on a huge amount of data. So we are dumping it in the first code block. In the second, we load the test data, then we load the transformer as well as the classifier from the files, and then we try to predict. After predicting, you can see we are getting about 98% accuracy here. It can differ for different applications, solutions, or problem statements, but we are getting it done in no more than ten to twelve lines of code. Now, this next one has been added with an Indian audience in mind. There is a package named Daru in Ruby, which works on data analysis with Ruby. In Hindi slang, or Indian slang, the word means liquor. So it's kind of an ironic, laughable situation that a data analysis package is named like a liquor. There are many other packages and references as well. If you want to perform textual analysis, you can go with fastText. If you want to do plotting, there are these two, Nyaplot and Vega. There are many others as well. Do look them up, explore them more, and contribute to them.
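The train-once, dump, and load-later flow described above can be sketched in plain Ruby. This is only a minimal sketch and not the Rumale code from the slide: `CentroidClassifier`, `save_model`, `load_model`, the sample data, and the file name are all hypothetical stand-ins, and the only real API assumed is `Marshal.dump` / `Marshal.load` from Ruby core.

```ruby
require "tmpdir"

# Hypothetical stand-in for a trained model: a nearest-centroid classifier.
class CentroidClassifier
  # Learn one centroid (mean vector) per label.
  def fit(samples, labels)
    grouped = samples.zip(labels).group_by { |_, label| label }
    @centroids = grouped.transform_values do |pairs|
      vectors = pairs.map(&:first)
      vectors.transpose.map { |column| column.sum.to_f / column.size }
    end
    self
  end

  # Predict the label whose centroid is closest to each sample.
  def predict(samples)
    samples.map do |sample|
      @centroids.min_by { |_, centroid| squared_distance(sample, centroid) }.first
    end
  end

  private

  def squared_distance(a, b)
    a.zip(b).sum { |x, y| (x - y)**2 }
  end
end

# Serialize the trained model so it does not have to be retrained at runtime.
def save_model(model, path)
  File.binwrite(path, Marshal.dump(model))
end

# Load a previously saved model back into memory.
def load_model(path)
  Marshal.load(File.binread(path))
end

# Training step (run once, offline). The data is made up for illustration.
train_x = [[1.0, 1.0], [1.2, 0.8], [8.0, 9.0], [9.0, 8.5]]
train_y = %w[normal normal abnormal abnormal]
model_path = File.join(Dir.tmpdir, "centroid_classifier.dat")
save_model(CentroidClassifier.new.fit(train_x, train_y), model_path)

# Prediction step (run later, possibly in another process).
classifier = load_model(model_path)
puts classifier.predict([[0.9, 1.1], [8.5, 9.2]]).inspect
```

One design note: `Marshal.load` can instantiate arbitrary Ruby objects, so it should only be used on model files you produced yourself.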
Now moving ahead to the problems and challenges we really faced. The solution is anomaly detection. We all know that if a particular reading is behaving in a way that results in anomalies, there may be some problem with it. It can be related to a maintenance problem. It can be related to abnormal utilization. But the solution is anomaly detection. So let's understand what sorts of anomalies there are and how they impact water-related solutions, or I would say IoT-related solutions. The first one is point outliers. I have added a graph so that everybody can understand. In this case you can see we have some point-level outliers here. Now, where can this sort of anomaly happen in an IoT solution? Considering a water solution, there can be power failures, and due to that some junk readings can get sent. Those are the sort of problems we can detect. And if such problems keep happening regularly, say two or three of these outliers within a day, then we need to check the supporting systems, like the electrical system and the other systems that support the hardware we have installed at the site. Next come subsequence anomalies. These are the two types of subsequence anomalies. If you are working with a single time series property, then you have univariate time series data. You can see here we have some outliers: there is the O1 outlier and there is the O2 outlier, and each is a subsequence, not just a single point outlier. Now, if you get this sort of anomaly, what does it depict?
Because in the end it's all about what analytics we get out of it. From these we can understand that there is some issue with the meter, because it keeps fluctuating up and down, up and down. If there are subsequence anomalies, we can say there might be an issue with the hardware itself, not with the supporting system we provide to it. From that we can proactively inform hardware personnel to go and check. So these first two can solve the proactive maintenance case. Now comes the abnormal utilization part and how we can solve that. To solve it, let's first detect these sorts of outliers, and then we can discuss it. Here we are using the Prophet gem. There is a library named Prophet, and a similar solution has been made in Ruby. It easily detects basic outliers so that we don't need to do all of that ourselves, because in the end it's all about solving the problem efficiently in less time. You can see what we are doing here: we iterate over the data from the CSV we provide to build a series, and then we ask Prophet to detect the anomalies in this series. Now we can see it detecting the anomalies, with their x and y coordinates, from the values as well as the time data. Using this simple solution we can detect point outliers as well as univariate and multivariate outliers, and we can easily solve proactive maintenance problems even in huge IoT systems. Now let's plot the data. Why plot it? Because otherwise it's just showing some scientific-looking data, so I'm plotting it.
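To make the distinction between isolated point outliers and runs of consecutive outliers concrete, here is a minimal plain-Ruby sketch. It does not use the Prophet gem from the talk; it swaps in a simple median and MAD (median absolute deviation) rule, and the method names, the threshold of 5.0, and the sample readings are all assumptions made for illustration.

```ruby
# Median of a numeric array.
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Indexes of readings whose distance from the median exceeds `threshold`
# times the median absolute deviation. The threshold value is an assumption.
def outlier_indexes(values, threshold: 5.0)
  center = median(values)
  mad = median(values.map { |v| (v - center).abs })
  # If more than half the readings are identical the MAD is zero;
  # this sketch then reports nothing rather than flagging everything.
  return [] if mad.zero?

  values.each_index.select { |i| (values[i] - center).abs > threshold * mad }
end

# Group outlier indexes into runs of consecutive positions and label each run:
# a single index is a point outlier, a longer run is a subsequence anomaly.
def classify_anomalies(values, threshold: 5.0)
  outlier_indexes(values, threshold: threshold)
    .slice_when { |a, b| b != a + 1 }
    .map { |run| { indexes: run, kind: run.size == 1 ? :point : :subsequence } }
end

# Made-up per-interval readings: one junk spike and one sustained deviation.
readings = [100, 102, 98, 101, 500, 99, 103, 100, 97, 300, 310, 305, 101, 99]

classify_anomalies(readings).each do |anomaly|
  puts "#{anomaly[:kind]} anomaly at indexes #{anomaly[:indexes].inspect}"
end
```

Following the reasoning in the talk, a `:point` result would suggest checking the supporting systems such as power, while a `:subsequence` result would suggest checking the meter hardware itself.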
And you can see there is some issue in this particular time frame, where we are getting an outlier. Now comes the part about how to detect abnormal utilization. We can do it with detection by forecasting. It's a very popular method: in machine learning we always forecast, we always predict how things should be and how things can be. What we observed is that in our water solution we can't just go ahead and look for spikes, since the reading is a continuously increasing value. What we can do instead is predict the next possible value from the trends and past learnings: how the meter is consuming, how that particular resident or commercial project is consuming. Let's say that in the morning, between ten and twelve, they consistently consume around 1,000 liters, whereas in the afternoon there is a dry period where they consume just 200 to 300 liters, and afterwards in the evening there is a rise again. This is a pattern we are asking our machine learning model to learn, and then to detect outliers against it. If suddenly in the afternoon it shows 1,200 or 1,300 liters, that is an abnormal case, which says there might be leakages, or there is continuous consumption because somebody left a tap open and forgot to close it. To save the water, we need to proactively alert the resident or the commercial owner, whoever is responsible. To solve that, we have detection by forecasting. You can see here in the graph as well: the black dots are the acceptable patterns, the values that fall within the forecast.
The green one is the forecast pattern, whereas the red ones are actual values that fall outside the forecasted values. So we can say that in these particular time frames there is some problem in our system, or somebody needs to go and check whether the utilization is happening properly. Now, how do we do that? As I illustrated, we need to use those packages. In this case, you can see I have a timestamp and a single reading, so we are going with the univariate case. Firstly, we load the data from a CSV using Rover to get it into a data frame. Then we load the Prophet model from the Prophet gem. Then we fit the complete data; fitting means training the model with the data we have loaded. Then we ask it to forecast. Firstly, we ask it for a future data frame, because we want a fixed periodicity here. Then we ask our model to predict for those future dots. Now let's see what it forecasts. You can see here that the line is the forecast, whereas the black dots are actual consumption. The dots that are slightly away from the forecast line show that there is some abnormal consumption at those points. This particular data is continuously increasing. There might be cases where you have hyperbolic data as well, and you can detect anomalies in that too. Now let's summarize what we have discussed in this talk, a short talk I would say. Firstly, we discussed what time series data is and how it impacts most of the IoT solutions we work with.
Then came the challenges we faced; most people working with IoT solutions have faced these sorts of problems in one way or another. Then came how we can solve them using machine learning with Ruby, rather than treating it as a normal machine learning problem and solving it with the traditional Python solutions. And the last one: how we used anomaly detection techniques to solve our challenges. Thank you. You have been a great audience; you listened and watched it through. Thank you. You can visit our website, simply motor deck, to understand more about what we do and how we do it. You can connect with me on X at Vishwa Desurukar, and let's move things over. I do urge people to look at these solutions. If you are a Ruby developer, or a budding Ruby developer, look at these packages and contribute to them. That will help our community a lot. Thank you, Conf42 Internet of Things, for the chance to speak and share my knowledge, and I would say my experiences over the period. And to everyone, do enjoy the solutions, and do connect with me if you are interested in knowing how we do all the other things, from end-to-end solutions to the different sorts of architectures we use. Thank you.
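The detection-by-forecasting idea from the talk (learn the usual consumption pattern, then flag readings that stray from it) can be sketched in plain Ruby. This is a deliberately simplified illustration and not the Prophet-based code from the slides: the "forecast" here is just the historical average per hour of day, and the method names, the 0.5 tolerance, and the sample numbers are assumptions.

```ruby
# Convert cumulative meter readings (which only ever increase) into
# per-interval consumption, since spikes are not visible in the raw totals.
def consumption_from_cumulative(readings)
  readings.each_cons(2).map { |previous, current| current - previous }
end

# Learn the average consumption for each hour of the day from history.
# `history` is an array of [hour_of_day, litres] pairs.
def hourly_baseline(history)
  history.group_by(&:first).transform_values do |pairs|
    pairs.sum { |_, litres| litres }.to_f / pairs.size
  end
end

# Return the observations that deviate from the baseline by more than
# `tolerance`, expressed as a fraction of the expected value (an assumption).
# Raises KeyError for an hour that never appeared in the history.
def abnormal_usage(baseline, observations, tolerance: 0.5)
  observations.select do |hour, litres|
    expected = baseline.fetch(hour)
    (litres - expected).abs > tolerance * expected
  end
end

# Made-up history echoing the talk: heavy use in the morning and evening,
# a dry period in the afternoon.
history = [[10, 1000], [10, 980], [14, 250], [14, 230], [19, 900], [19, 940]]
baseline = hourly_baseline(history)

# A new day where the afternoon suddenly shows about 1,250 litres.
today = [[10, 1010], [14, 1250], [19, 910]]
abnormal_usage(baseline, today).each do |hour, litres|
  puts "Abnormal usage at #{hour}:00: #{litres} litres (expected about #{baseline[hour].round})"
end
```

A real deployment would replace `hourly_baseline` with a proper forecasting model, but the comparison step, actual value versus forecast with a tolerance, stays the same.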
...

Vishwajeetsingh Desurkar

Principal Engineer @ SimplySmart Technologies



