Transcript
This transcript was autogenerated. To make changes, submit a PR.
My name is Max.
So today we'll talk about predicting and mitigating
emergency situations on the road.
We'll use not only transport market examples in general but also
specific examples from the railway business.
But before we start, let me introduce myself.
As of today, I have over 10 years of experience in data
science, machine learning and AI.
I have worked in different business sectors, such as telco, fintech,
consulting and the entertainment industry.
Recently I worked at Warner Music Group as a director of research
and analysis, and now I'm a data science lead at the consulting company Metis.
So welcome on board.
And today we will talk about how to save the environment and
how to make money, how to bring additional value to our business.
In our case, it was a transport company.
This project was implemented for one of the largest railway companies in
Europe and had a big impact; the results were truly transformative.
And of course we will talk about that as well.
But first, let me introduce the approach my team and I used. Here's
what we are going to cover today.
First, I'll introduce the problem we set out to solve.
Then I'll walk you through our approach and the data we used,
what kind of model we built on this data,
and what challenges we faced.
And, of course, we'll dive into the results.
Actually, the problem sounds pretty simple: freight car
derailments are happening across the network of a railway company.
The main point was that we needed to address this problem and develop
and deploy a system that could prevent these events.
We used different data, but we needed more than a predictive model
built on that data.
We needed to describe and define the patterns that could help
us prevent derailments.
This problem is not just about avoiding accidents.
Of course, it's about saving lives, predicting problems,
protecting the environment and saving millions of dollars in repair costs.
So, the problem-solving plan, our approach, looks like this.
First, we gathered historical data and identified the key target.
In our case, it was derailments, but in general
it could be any problem, risk or emergency situation on the road.
Then we assessed the quality of the data.
Of course, we can't skip this step if we want to ensure accuracy.
Then we developed a model that could predict the probability of freight car derailments.
And lastly, we evaluated the results.
So we built a roadmap of an end-to-end process that delivered
real value from start to finish.
So that's where we faced the first challenge.
It was the parameters, the data and variable gathering, and
figuring out what kind of data we had.
We used 78 key parameters from various systems, ranging
from track conditions to weather, locomotive data and wagon details.
In addition to those 78 parameters, we also used sensor
data, which was captured on the track and on the wagons.
So we saw not only daily activity but also real-time data
from these sensors.
We also engineered 30 calculated indicators.
These include features such as average cargo weight,
time between repairs and other factors that gave us even deeper insight
into the problems and into the business.
Data processing was also crucial.
We used standardization for quantitative variables and one-hot
encoding for our target variable, derailment,
ensuring that our model could accurately interpret all of this.
And once we had figured out what parameters we had and had
processed all the data, we moved to the next step: the model itself.
So here is where we faced a major challenge: class imbalance, because
the event of derailment, the emergency event, is really rare
compared to non-events.
Only 1 percent of the data represented derailments and
99 percent were non-events.
So we had two options, of course:
undersampling or oversampling. Either we reduce the
number of non-events, which means we could lose valuable data, or we
oversample in a smarter way and use SMOTE.
SMOTE creates synthetic samples for the minority class.
In our case, that was derailments, our target.
It helped us create more diversity and
improve the model's performance.
So it's smarter than just adding random copies of the target, right?
That's why we used SMOTE.
We did try building the model without any oversampling,
but the results were poor.
SMOTE allowed us to improve the prediction accuracy
without compromising data integrity.
It's a far better solution than basic oversampling,
because basic oversampling is just random
oversampling, while SMOTE uses the patterns in our data, in our
independent parameters, not only the target.
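To make the difference from random oversampling concrete, here is a minimal sketch of the SMOTE idea in plain Python. It is not the code from the project: the function name, the toy rows and the two feature names are assumptions for illustration. Each synthetic row is placed on the line segment between a real minority row and one of its nearest minority neighbors, so new points follow the structure of the existing derailment cases instead of duplicating them.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating between neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base row (excluding the row itself)
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic

# Hypothetical minority rows: [cargo_weight_tonnes, years_since_repair]
derailments = [[60.0, 4.0], [62.0, 5.0], [58.0, 4.5], [65.0, 6.0]]
new_rows = smote(derailments, n_new=6)
for row in new_rows:
    print([round(v, 2) for v in row])
```

In practice a maintained implementation, such as the SMOTE class in the imbalanced-learn package, would be the more usual choice, and oversampling should be applied to the training split only.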
Once we had implemented SMOTE and solved the problem of proportions,
we needed to decide what kind of model to use.
For the model algorithm, we chose a random forest.
Of course, the standard question is why.
Because it's a really powerful model that could help us interpret parameters
well and, most importantly, it has less tendency to overfit. Many of the
models we built had a tendency to overfit because of the
imbalance, but we covered that using SMOTE and using random forest.
So given that we still had the risk of imbalanced data despite SMOTE,
this was the best choice for balancing interpretability and performance.
So, the metrics.
On the left side, you can see the true positive rate and the false positive rate.
We got really pretty great results,
like 80 percent on both, but now let's talk about ROC AUC and PR AUC.
Both are crucial for us, but they tell us different things.
ROC AUC shows us how well the model differentiates between events
and non-events overall, which is great for understanding general
performance.
As we can see, we had a strong score: 0.91.
But with the class imbalance we faced in our case, we needed to
check PR AUC as well, because it's even more crucial in that case.
It focuses specifically on how well the model predicts the minority class,
derailments in our case. We had a good PR AUC result
as well, meaning our model excels at identifying events.
So looking at both metrics gives us a full picture of the model's strengths.
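As a rough illustration of why the two metrics can disagree on imbalanced data, the sketch below computes both by hand on a toy set of ten wagons. The labels and scores are invented, not project data. ROC AUC is computed as the share of (event, non-event) pairs that the model ranks correctly, and PR AUC is summarized by average precision, assuming all scores are distinct.

```python
def roc_auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, y_true), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Toy data: 2 derailments among 10 wagons, one non-event scored too high
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.80, 0.70, 0.90]

roc = roc_auc(y_true, scores)
ap = average_precision(y_true, scores)
print(f"ROC AUC = {roc:.4f}, average precision = {ap:.4f}")
```

A single highly scored non-event barely moves ROC AUC here, but it costs noticeably more in average precision, which is why the precision-recall view is the stricter check when events are rare.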
And once we have built the model, here's where the real-world application comes in.
You can see the slider.
It's the probability of derailment. In technical language,
it's called a cutoff threshold.
We can manage it based on the available resources.
If our organization has limited resources for inspections,
we can set the threshold to a high value and
focus only on the highest-risk freight cars.
This way we prioritize those that need attention most urgently, ensuring
that resources are used efficiently.
Of course, if we had unlimited resources, we could
lower the threshold a bit.
It means that we could also check the wagons that
could derail, but with a lower probability than the others.
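The threshold logic can be sketched in a few lines. The wagon IDs, probabilities and threshold values below are made up for illustration; the function name is hypothetical as well.

```python
def select_for_inspection(risk_by_wagon, threshold):
    """Return wagons whose predicted risk meets the threshold, riskiest first."""
    flagged = [(w, p) for w, p in risk_by_wagon.items() if p >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Hypothetical predicted derailment probabilities per wagon
risk_by_wagon = {
    "W-101": 0.95,
    "W-102": 0.40,
    "W-103": 0.72,
    "W-104": 0.10,
    "W-105": 0.88,
}

# Limited inspection capacity: only the highest-risk wagons
strict = select_for_inspection(risk_by_wagon, threshold=0.85)
# More capacity: lower the threshold and inspect more wagons
relaxed = select_for_inspection(risk_by_wagon, threshold=0.50)

print("strict :", strict)
print("relaxed:", relaxed)
```

Lowering the threshold never removes a wagon from the list; it only adds lower-risk wagons behind the ones already flagged.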
We used random forest not only because it's a really great
algorithm for large amounts of data, and not only because
this kind of model, this kind of approach,
works well with imbalance.
One of the key reasons we used
random forest was to understand the Gini impurity, which helps us see how
different variables impact derailments.
Simply put, we needed variable importance.
For example, the winter season shows a strong influence, but
that's something we can't control.
On the other hand, factors like wagon type, cargo weight or the material of
the track section are things we can control, and we needed to focus on those.
This allows the business to focus on areas where changes will have the greatest impact.
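Impurity-based variable importance builds on one quantity: how much a split on a variable reduces the Gini impurity. The sketch below computes that reduction for a single split on two features; a random forest accumulates such reductions over all splits and all trees. The data, feature names and split points are assumptions for illustration only.

```python
def gini(labels):
    """Gini impurity of a binary label list: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def impurity_decrease(values, labels, split):
    """Parent impurity minus the size-weighted impurity of the two children."""
    left = [y for v, y in zip(values, labels) if v <= split]
    right = [y for v, y in zip(values, labels) if v > split]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

# Toy data: 1 = derailment, 0 = non-event
labels = [0, 0, 0, 0, 1, 1, 1, 0]
cargo_weight = [40, 45, 50, 55, 70, 72, 75, 52]  # separates the classes well
wagon_age = [10, 30, 12, 28, 11, 29, 13, 27]     # separates them poorly

by_weight = impurity_decrease(cargo_weight, labels, split=60)
by_age = impurity_decrease(wagon_age, labels, split=20)
print(f"cargo_weight: {by_weight:.5f}, wagon_age: {by_age:.5f}")
```

The variable that produces the larger decrease is the one a tree would prefer to split on, and it ends up with the higher importance score.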
Even though we've identified key variables,
these are not full risk profiles.
It's just the importance of the variables we put into the model. We needed
to move beyond this to understand exactly which combinations of factors lead to
derailments, in order to prevent them.
So we needed scenarios.
We needed patterns that could
lead to derailments.
To do that, to create actionable risk profiles for business units, we
fed the probabilities from our random forest model into a decision tree.
It looks like just an ensemble of models, but this was critical
because the decision tree offers clear, interpretable results, which
random forest does not.
This showed us exactly how variables combine to create high-risk
scenarios, allowing for precise, targeted interventions.
Now we had not just predictions but insights into why and how
derailments were likely to happen.
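One possible way to chain the two models is sketched below with scikit-learn. The talk does not name the library or the exact setup, so everything here is an assumption: the data is synthetic, the feature names are illustrative, and the shallow tree is a regressor fitted to the forest's predicted probabilities.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
n = 2000
feature_names = ["num_wagons", "years_since_issuance", "speed_rate"]

# Synthetic features (assumed ranges, not project data)
X = np.column_stack([
    rng.integers(10, 80, n),
    rng.integers(1, 60, n),
    rng.uniform(0.2, 2.0, n),
])

# Synthetic labels: events are much more likely inside one "risky" region
risky = (X[:, 0] < 38.5) & (X[:, 2] < 1.0)
y = (rng.random(n) < np.where(risky, 0.6, 0.02)).astype(int)

# Step 1: the random forest produces a derailment probability per row
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
risk = forest.predict_proba(X)[:, 1]

# Step 2: a shallow tree is fitted to those probabilities, so each leaf
# becomes a readable rule with an average predicted risk
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, risk)
rules = export_text(surrogate, feature_names=feature_names)
print(rules)
```

Each root-to-leaf path in the printed tree reads as a candidate risk profile. In-sample probabilities are used here only to keep the sketch short; held-out or out-of-bag predictions would give less optimistic values.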
So how can we read that, how can we interpret it?
From the random forest probabilities, which
we put into the decision tree,
we get risk profiles. Let's consider one example.
If the number of wagons is below 38.5,
the years since issuance are less than 51.5
and the speed rate on the last section is below 1.0,
that means our wagon will derail with a
probability of 94.5 percent.
That's what we needed to get, and this is what the company
needs to prevent derailments:
not only a probability, but clear, actionable patterns.
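A risk profile of this kind translates directly into a rule that operations staff or a monitoring system can apply. The sketch below encodes the example from the talk; the dictionary keys and the sample wagons are hypothetical, while the thresholds and the 94.5 percent figure are the ones quoted above.

```python
PROFILE_RISK = 0.945  # derailment probability quoted for this profile

def matches_risk_profile(wagon):
    """True when a wagon record falls inside the example risk profile."""
    return (
        wagon["num_wagons"] < 38.5
        and wagon["years_since_issuance"] < 51.5
        and wagon["speed_rate_last_section"] < 1.0
    )

# Hypothetical records
wagon_a = {"num_wagons": 30, "years_since_issuance": 20, "speed_rate_last_section": 0.8}
wagon_b = {"num_wagons": 55, "years_since_issuance": 20, "speed_rate_last_section": 0.8}

for name, wagon in [("A", wagon_a), ("B", wagon_b)]:
    if matches_risk_profile(wagon):
        print(f"Wagon {name}: matches profile, estimated risk {PROFILE_RISK:.1%}")
    else:
        print(f"Wagon {name}: outside this profile")
```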
And finally, here are our results.
First, we achieved an 80 percent
reduction in accidents. Fewer accidents, of course, means
less downtime and a safer, more efficient transportation network.
Second, we helped minimize environmental risks.
Derailments often lead to significant environmental damage, of course, but
by predicting and preventing them, we're protecting both the company and the
environment. Of course, it's really hard to translate that into
money, but we tried to address not only quantitative goals; we also tried to
reach qualitative goals
and qualitative KPIs as well.
And third, we saved the company 12 million a year.
These savings came from reducing the need for emergency repairs
and insurance payouts.
So if you have any questions, please reach out to me on LinkedIn.
I hope this short presentation was a bit of inspiration for how IoT and
data-driven insights can be more than buzzwords: they can help not only predict
challenges but actively create safer, smarter roads for everyone.
Hopefully together we can reshape this industry and the future
of connected transportation.
And by the way, today we talked not only about the future; we talked about the
present, where we can already implement and deploy AI applications.
Thank you.