Conf42 Internet of Things (IoT) 2024 - Online

- premiere 5PM GMT

Predicting and Mitigating Emergency Situations on the Roads: A Data-Driven Approach


Abstract

Predicting road emergencies saves lives. In this session, learn how IoT and data science combine to forecast incidents, optimize responses, and enhance road safety. Explore how sensor data, real-time analytics, and ML models can create smarter, safer transportation systems.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
My name is Max. Today we'll talk about predicting and mitigating emergency situations on the road. We'll use not only examples from the transport market in general but also specific examples from the railway business. But before we start, let me introduce myself. As of today, I have over 10 years of experience in data science, machine learning, and AI. I have worked in different business sectors, such as telco, fintech, consulting, and the entertainment industry. Most recently I worked at Warner Music Group as a director of research and analysis, and now I'm a data science lead at the consulting company Metyis. So welcome on board.

Today we will talk about how to protect the environment, how to make money, and how to bring additional value to the business. In our case, the business was a transport company: this project was implemented for one of the largest railway companies in Europe, and it had a big impact. The results were truly transformative, and of course we will talk about them as well. But first, let me introduce the approach my team and I used.

Here's what we are going to cover today. First, I'll introduce the problem we set out to solve. Then I'll walk you through our approach, the data we used, and the kind of model we built on that data. I'll cover the challenges we faced, and, of course, we'll dive into the results.

The problem sounds pretty simple: freight car derailments are happening across the networks of railway companies. Our task was to address this problem by developing and deploying a system that could prevent these events. We used different kinds of data, but we needed more than a predictive model built on that data. We needed to describe and define the patterns that could help us prevent derailments. This problem is not just about avoiding accidents.
Of course, it's also about saving lives, predicting problems, protecting the environment, and saving millions of dollars in repair costs.

Our problem-solving plan looked like this. First, we gathered historical data and identified the key target. In our case it was derailments, but in general it could be any problem, risk, or emergency situation on the road. Then we assessed the quality of the data; we can't skip this step if we want to ensure accuracy. Next, we developed a model that could predict the probability of freight car derailments. And lastly, we evaluated the results. So we built a roadmap of an end-to-end process that delivered real value from start to finish.

The first challenge we faced was the parameters: gathering the data and variables and figuring out what kind of data we had. We used 78 key parameters from various systems, ranging from track conditions to weather, locomotive data, and wagon details. In addition to those 78 parameters, we also used sensor data captured on the track and on the wagons. So we saw not only daily activity but also real-time data from these sensors. We also engineered 30 calculated indicators. These included features such as average cargo weight, time between repairs, and other factors that gave us even deeper insight into the problem and the business.

Data processing was also crucial. We used standardization for quantitative variables and one-hot encoding for our target variable, derailment, ensuring that our model could accurately interpret all of this.

Once we had figured out what parameters we had and processed all the data, we moved on to the next step: the model itself. Here is where we faced a major challenge, class imbalance, which comes from the derailment events themselves.
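As a rough illustration of the processing steps just described, here is a minimal sketch in plain Python of standardizing a numeric column, one-hot encoding a categorical column, and computing one calculated indicator (mean time between repairs). The column names, sample values, and helper functions are assumptions made up for this example; the talk does not show the project's actual pipeline or field names.

```python
from datetime import date
from statistics import mean, pstdev


def standardize(values):
    """Rescale a numeric column to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(labels):
    """Encode a categorical column as one 0/1 indicator per category."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


def mean_days_between(repair_dates):
    """Calculated indicator: average gap in days between consecutive repairs."""
    ordered = sorted(repair_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return mean(gaps) if gaps else None


# Hypothetical wagon data; names and values are illustrative only.
cargo_weight_tonnes = [52.0, 61.5, 48.0, 70.5]
wagon_type = ["hopper", "tank", "hopper", "flat"]
repairs = [date(2024, 1, 1), date(2024, 1, 11), date(2024, 1, 31)]

scaled_weight = standardize(cargo_weight_tonnes)
categories, wagon_type_encoded = one_hot(wagon_type)
days_between_repairs = mean_days_between(repairs)
```

In practice a library scaler and encoder would normally be fitted on the training split only and then applied to new data, so that the test data does not leak into the scaling statistics.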
An emergency event is really rare compared to non-events. Only 1 percent of the data represented derailments, and 99 percent were non-events. So we had two options, obviously: undersampling or oversampling. We could either reduce the number of non-events, which means we could lose valuable data, or oversample in a smarter way, using SMOTE. SMOTE generates synthetic samples for the minority class. In our case that was derailments, our target, and it helped us create more diversity and improve the model's performance. It's smarter than just adding random copies of the target, and that's why we used it. We did try building the model without any oversampling, but the results were poor. SMOTE allowed us to improve prediction accuracy without compromising data integrity. It's a far better solution than basic oversampling, because basic oversampling just duplicates records at random, while SMOTE uses the patterns in our data, not only in the target but also in our independent parameters.

Once we had implemented SMOTE and dealt with the problem of proportions, we needed to decide what kind of model to use. For the algorithm, we chose a random forest. Of course, the standard question is why. It's a really powerful model that helps us interpret parameters well and, most importantly, has less tendency to overfit. Many of the models we built tended to overfit because of the imbalance, but we addressed that using SMOTE and random forest. Given that we still had the risk of imbalanced data despite SMOTE, this was the best choice for balancing interpretability and performance.

Now the metrics. On the left side, you can see the true positive rate and the false positive rate. We got pretty great results, about 80 percent for both. But now let's talk about ROC AUC and PR AUC. Both are crucial for us, but they tell us different things. ROC AUC shows how well the model differentiates between events and non-events overall, which is great for understanding general performance. As we can see, we had a strong score of 0.91. But with the class imbalance we faced, we also needed to check PR AUC, because it's even more crucial in that case. It focuses specifically on how well the model predicts the minority class, derailments in our case. We had good results on PR AUC as well, meaning our model excels at identifying events. Looking at both metrics gives us a full picture of the model's strengths.

Once the model is built, here's where the real-world application comes in. You can see the slider: it sets the probability of derailment. In technical language, it's called a cutoff threshold. We can manage it based on the available resources. If our organization has limited resources for inspections, we can set the threshold to a high probability and focus only on the highest-risk freight cars. This way we prioritize those that need attention most urgently, ensuring that resources are used efficiently. Of course, if we had unlimited resources, we could lower the threshold a bit. That means we would also check wagons that could derail, but with a lower probability than the others.

We used random forest not only because it's a great algorithm for large amounts of data, and not only because this kind of model works well with imbalance. One of the key reasons was the Gini impurity, which helps us see how different variables impact derailments. Simply put, we needed variable importance.
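To make two of the ideas above concrete, here is a minimal sketch in plain Python: a SMOTE-style oversampler that interpolates between a minority-class row and one of its nearest minority neighbours, and a helper that picks a cutoff threshold from the number of wagons a team can inspect. The function names, sample data, and inspection capacity are assumptions for illustration; the project itself presumably used a library implementation, and the talk does not say which one.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base row, excluding the row itself.
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def cutoff_for_capacity(probabilities, capacity):
    """Return the lowest predicted probability among the `capacity`
    highest-risk wagons, i.e. a cutoff that matches inspection capacity."""
    ranked = sorted(probabilities, reverse=True)
    capacity = max(1, min(capacity, len(ranked)))
    return ranked[capacity - 1]


# Hypothetical derailment rows (two already-standardized features each).
derailed = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [1.1, 2.4]]
synthetic = smote(derailed, n_new=6, k=2)

# Hypothetical model outputs for six wagons; the team can inspect two of them.
probs = [0.05, 0.92, 0.40, 0.77, 0.10, 0.63]
cutoff = cutoff_for_capacity(probs, capacity=2)
flagged = [p for p in probs if p >= cutoff]
```

Because each synthetic row lies on a line segment between two real derailment rows, it stays inside the region the real minority data occupies, which is what distinguishes this from duplicating records at random. Oversampling should be applied to the training split only, so that evaluation metrics such as PR AUC are computed on untouched data.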
For example, the winter season shows a strong influence, but that's something we can't control. On the other hand, factors like wagon type, cargo weight, or the material of the track section are things we can control, and we needed to focus on those. This allows the business to focus on the areas where changes will have the greatest impact.

Even though we identified the key variables, these are not full risk profiles. They only show the importance of the variables we put into the model. We needed to move beyond this and understand exactly which combinations of factors lead to derailments in order to prevent them. We needed scenarios, patterns that lead to derailments. To do that, and to create actionable risk profiles for the business units, we fed the probabilities from our random forest model into a decision tree. It looks like just an ensemble of models, but this was critical, because the decision tree offers clear, interpretable results, which random forest does not. It showed us exactly how variables combine to create high-risk scenarios, allowing for precise, targeted interventions. Now we had not just predictions but insights into why and how derailments were likely to happen.

So how can we read and interpret that? From the random forest probabilities that we put into the decision tree, we get risk profiles. Let's consider one example. If the number of wagons is below 38.5, the years since issuance are less than 51.5, and the speed rate on the last section is below 1.0, then the wagon has a 94.5 percent probability of derailment. This is what the company needs to prevent derailments: not only a probability but clear, actionable patterns.

And finally, here are our results. First, we achieved an 80 percent reduction in accidents. Fewer accidents, of course, mean less downtime and a safer, more efficient transportation network. Second, we helped minimize environmental risks. Derailments often lead to significant environmental damage, but by predicting and preventing them, we're protecting both the company and the environment. It's really hard to translate that into money, but we tried to address not only quantitative goals but also qualitative goals and qualitative KPIs. And third, we saved the company 12 million a year. These savings came from reducing the need for emergency repairs and insurance payouts.

If you have any questions, please reach out to me on LinkedIn. I hope this short presentation gave you a bit of inspiration for how IoT and data-driven insights can be more than buzzwords: they can help predict challenges and actively create safer, smarter roads for everyone. Hopefully, together we can reshape this industry and the future of connected transportation. And, by the way, today we talked not only about the future but also about the present, where we can already implement and run all of these AI applications. Thank you.
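The example risk profile read out in the talk can be expressed as a simple rule check, which is roughly what one leaf path of a decision tree amounts to. The sketch below uses the three thresholds and the 94.5 percent figure quoted in the talk; the class name, field names, and sample values are assumptions invented for this illustration.

```python
from dataclasses import dataclass

# Leaf probability quoted in the talk for this example profile.
EXAMPLE_PROFILE_PROBABILITY = 0.945


@dataclass
class WagonSnapshot:
    # Hypothetical field names for the three variables in the example rule.
    wagons_in_train: int
    years_since_issuance: float
    speed_rate_last_section: float


def matches_example_profile(w: WagonSnapshot) -> bool:
    """True when all three conditions of the example risk profile hold."""
    return (
        w.wagons_in_train < 38.5
        and w.years_since_issuance < 51.5
        and w.speed_rate_last_section < 1.0
    )


# Illustrative wagons, not real project data.
risky = WagonSnapshot(wagons_in_train=30, years_since_issuance=12.0,
                      speed_rate_last_section=0.8)
safe = WagonSnapshot(wagons_in_train=45, years_since_issuance=12.0,
                     speed_rate_last_section=0.8)

risky_flag = matches_example_profile(risky)
safe_flag = matches_example_profile(safe)
```

A rule in this form is easy to hand to a business unit, because every condition names a variable and a threshold that can be checked before a train departs.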

Maksim Kariagin

Data Science Lead @ Metyis



