Transcript
This transcript was autogenerated.
Good morning, hello. My name is Paweł Skrzypek, and today, together with Anna Warno, I will tell you a little more about time series forecasting.
We will focus on advanced ensembling methods. As you probably know, we have been working with time series for quite a long time, using the most advanced methods, and today we want to talk briefly about ensembling. Why is ensembling important? Why use ensembling? How is ensembling used by the leading forecasting methods? I will start with a brief introduction, with some slides to introduce the topic and provide a little theoretical background. Then Anya will run a hands-on session, presenting different ensembling methods and their results on real data sets. Anya will also present an advanced ensembling method based on machine learning algorithms, namely neural networks. The method was prepared by her, so it is a very interesting and unique approach
to ensembling. Okay, so starting from the beginning: what is ensembling? Ensembling simply means using the outcomes of multiple forecasters to achieve better prediction accuracy and better robustness and stability of the results. It's nothing new; ensembling has been used for a very long time, mostly in simple forms, which I will present, but some more advanced ensembling algorithms have also appeared. I will tell a bit more about that in the presentation.
Why is ensembling important? As you know, because we already talked about it at previous editions of the Data Science Summit, the real breakthrough in time series forecasting was achieved during the M4 and M5 competitions. These are probably the most prestigious competitions related to time series forecasting, organized by Professor Spyros Makridakis, a legend in time series forecasting. It is very interesting that the methods that took first and second place in M4 both used ensembling, in different but very innovative ways.
In my presentation, I will briefly talk about the winning method, the ES hybrid: how it used ensembling, and also how ensembling was extended in the N-BEATS method, which is a kind of successor among time series forecasting methods. In the hands-on session, Anya will tell more about other methods and show how they work live.
The most important thing about ensembling is that it improves accuracy and also stability, because having multiple forecasters in an ensemble allows us to avoid the negative effects of an individual forecaster in a particular period of time. As I said, the winning methods from the M4 and M5 competitions used ensembling very heavily. What are the types of ensembling? There are some simple types of ensembling, like voting, stacking, bagging, and combinations of these.
For example, in voting we select the prediction provided by most of the forecasters. Ensembling, of course, can be used for both regression and classification problems, and voting is mainly useful for classification: if we use voting, we select the class with the most votes, that is, the class that most of the forecasters predict. Of course, we can add additional rules, like some kind of majority rule, or excluding forecasters with no predictions, and so on, but the general idea is quite simple. It is also very similar for regression: we can use the average of the predictions, and of course we can add additional rules like removing outliers, and other combinations.
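As a concrete illustration, here is a minimal sketch of these two simple ensembles; the function names and the outlier-trimming rule are illustrative assumptions, not something prescribed in the talk.

```python
import numpy as np
from collections import Counter

def majority_vote(class_predictions):
    """Classification: select the class predicted by most forecasters."""
    return Counter(class_predictions).most_common(1)[0][0]

def trimmed_mean(forecasts, trim=1):
    """Regression: average the forecasts after dropping the `trim` lowest
    and highest values, a simple rule for removing outlier forecasters."""
    ordered = np.sort(np.asarray(forecasts, dtype=float))
    if len(ordered) > 2 * trim:
        ordered = ordered[trim:len(ordered) - trim]
    return float(ordered.mean())

# Five forecasters, one timestamp:
print(majority_vote(["up", "down", "up", "up", "down"]))  # -> up
print(trimmed_mean([101.0, 99.5, 100.2, 140.0, 100.8]))   # 140.0 is trimmed
```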
That's the first category of ensembling, the simple ensembling. There are also advanced ensembling approaches: I will tell a little more about two of them while describing the methods, and an even more advanced one will be presented by Anya during the hands-on session. A very important point is that these ensembling methods are currently being developed very intensively, so new concepts and new ideas keep appearing. It looks like a very useful approach for time series forecasting.
How was ensembling used in the ES hybrid method, the winner of the M4 competition? There are many innovative solutions and concepts in this method; I talked about them in more detail at the previous edition of the Data Science Summit. But from the ensembling perspective, the ES hybrid method uses a very innovative approach: assigning different models to particular time series. During the training process each model is rated: the metric, sMAPE, is calculated, and based on its value each model is assigned to the one or more time series on which its results are the best. Per each time series, the set of the best models from the whole training is kept. So after each epoch, every model is evaluated against all of the time series and assigned to the best ones; of course, one model can be assigned to multiple time series. The final prediction on the test set is made by the set of the best models for the given time series, selected during training. That was a very unique approach to ensembling, and also a very successful one.
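A schematic sketch of this per-series assignment might look as follows; the `models` objects and their `predict` method are stand-ins for the actual ES hybrid components, and combining the selected models by a simple average is an assumption for the sketch.

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric MAPE, the metric used to rate the models."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return 100.0 * np.mean(2.0 * np.abs(forecast - actual)
                           / (np.abs(actual) + np.abs(forecast)))

def assign_models_to_series(models, series, top_n=5):
    """After each epoch, evaluate every model on every time series and keep
    the top_n best-scoring models per series. `models` maps a model id to
    an object with a predict(history) method (a stand-in here)."""
    best = {}
    for name, (history, actual) in series.items():
        scores = {mid: smape(actual, m.predict(history))
                  for mid, m in models.items()}
        best[name] = sorted(scores, key=scores.get)[:top_n]
    return best

def final_forecast(models, best_ids, history):
    """Test-set prediction: combine the models selected for this series."""
    return np.mean([models[mid].predict(history) for mid in best_ids], axis=0)
```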
The second method was not used in the M4 competition; it is a kind of successor, a method which claims better results than the ES hybrid: the N-BEATS method. And N-BEATS goes even further with ensembling. I will not go into the details of N-BEATS, because I described it at the previous edition of the Data Science Summit, but generally speaking, from the ensembling point of view, N-BEATS ensembles up to 100 different models. The models are diversified in three ways: they are trained on different subsets of the training data, with different loss functions, and with different horizons. Thanks to that, the outcomes of the predictions are very diversified, which decreases the chance that one single model giving a wrong prediction will spoil the overall result. So in N-BEATS, up to 100 different models form the basis for the predictions.
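Roughly, the ensembling side of this could be sketched as below; the placeholder "models" stand in for trained N-BEATS networks, the exact counts are illustrative, and the aggregation by median follows the N-BEATS paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three axes of diversification, as described above (values illustrative):
losses = ["smape", "mae", "mape"]     # different loss functions
lookbacks = [2, 3, 4, 5, 6]           # different input horizons
n_bags = 6                            # different random training subsets

def make_model(loss, lookback, bag):
    """Placeholder 'model': in reality an N-BEATS network trained with the
    given loss, lookback window, and bagged subset of the training data."""
    offset = rng.normal(0.0, 0.1)
    return lambda history: float(np.mean(history[-lookback:]) + offset)

models = [make_model(l, w, b)
          for l in losses for w in lookbacks for b in range(n_bags)]  # 90 models

history = np.array([10.0, 10.5, 11.0, 10.8, 11.2, 11.5, 11.3])
forecasts = np.array([m(history) for m in models])

# Median aggregation is robust: a few badly wrong models barely move it.
print(round(float(np.median(forecasts)), 3))
```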
Okay, that's all from my theoretical part. To summarize: new ensembling methods are being developed dynamically, and ensembling can improve the accuracy of the predictions and especially increase robustness. We work on financial time series forecasting, and for financial time series, ensembling significantly improves accuracy and robustness. So that's all for the theoretical introduction; I did my best to keep it short. And now Anya will present a hands-on session on how to use ensembling and what the results are in practice. Thank you very much.
So after the theoretical introduction, I will present a real case study from our work. I will be talking about real-time time series ensembling for cloud resources predictions. We have a project where we forecast cloud resource usage, and based on the predictions we can decide whether to change, for example, the number of instances or not. The whole process looks as follows: different models for time series forecasting are trained independently of each other, as separate models, and the predictions from each trained method are sent with the same frequency, for example every 10 seconds. In this way we obtain m predictions for each timestamp, where m is the number of forecasting methods. But it is most convenient to have only one number and make decisions based on it, so we use an ensembler that not only returns a more convenient format of our prediction, but can also correct it for us. There are several challenges that we have to face. First of all, as I mentioned before, the predictions are made in real time, and for this reason there are often missing predictions; it may happen because there was some delay, or because a method is not ready to predict yet. We also need to ensure the versatility of our solution, so that the final predictions are accurate for different types of applications and metrics, and also robust to poorly performing forecasters, which sometimes happen. Large prediction errors can lead to bad decisions and be very costly in a cloud computing environment.
And now I will show how we can use ensembling for this problem on an example data set. Our predicted metric, our target metric, is CPU usage, and our data contains 6,000 rows; these 6,000 rows are CPU usage values predicted by five forecasters on a test set. Among the forecasters we can find methods like TFT or N-BEATS, and here we have plotted the real value and the predictions of each forecasting method. We can zoom in, and as we can see, some of the forecasters actually look quite good, the predictions look accurate, for example prediction 2 or prediction 6, but we still try to improve on them with ensembling.
We start with fast, easy-to-implement standard methods. So of course we have here methods like the naive mean over the best subset: we take all possible subsets of forecasters and calculate which subset of predictions has the best, for example, mean absolute error on a train set. There is also the mean over the best n methods on the last k time steps, which is similar to the previous method, but we calculate the error only on the last k steps. You can also find weights for each forecaster, for example with linear programming; with linear programming it is easy to impose constraints on the weights, and we want them to be positive and sum to one. We can also set forecaster weights depending on metric scores, so the better the results of a method, the more weight we give to it. A sketch of two of these methods follows, and then I will show how these simple methods work on our data set.
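A minimal sketch of two of these methods, assuming forecaster predictions in a matrix P of shape (timestamps, methods) and real values y; the data layout and function names are mine, not from the demo.

```python
import numpy as np
from itertools import combinations
from scipy.optimize import linprog

def best_subset_mean(P, y):
    """Naive mean over the best subset: try every subset of forecasters and
    return the one whose plain mean has the lowest MAE on the train set."""
    m = P.shape[1]
    subsets = (c for r in range(1, m + 1) for c in combinations(range(m), r))
    return min(subsets,
               key=lambda c: np.mean(np.abs(P[:, list(c)].mean(axis=1) - y)))

def lp_weights(P, y):
    """Weights minimizing train MAE, constrained to be non-negative and sum
    to one (the standard linear-programming reformulation of MAE)."""
    T, m = P.shape
    # Variables: m weights followed by T absolute-error slack variables e_t,
    # with |P_t . w - y_t| <= e_t enforced by two inequality rows per t.
    c = np.concatenate([np.zeros(m), np.ones(T) / T])
    A_ub = np.block([[P, -np.eye(T)], [-P, -np.eye(T)]])
    b_ub = np.concatenate([y, -y])
    A_eq = np.concatenate([np.ones(m), np.zeros(T)])[None, :]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (m + T))
    return res.x[:m]

# Toy example: three forecasters with increasing noise levels.
rng = np.random.default_rng(0)
y = 10.0 * np.sin(np.linspace(0.0, 3.0, 50))
P = np.column_stack([y + rng.normal(0.0, s, 50) for s in (0.2, 0.5, 3.0)])
print(best_subset_mean(P, y))      # likely favours the low-noise forecasters
print(lp_weights(P, y).round(3))   # weights are >= 0 and sum to 1
```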
So we have a simple visualization here: we have five predictors, five forecasters, and four of the simple ensembling methods. The predictions are sent in real time, and the ensembled predictions are also produced in real time. In this table we can see how good our methods are. We have five metrics here, for example mean absolute error, and the best scores are highlighted in green. We can see that linear programming is the winning method here, though not always: the naive mean over the best subset is often the best on the sMAPE metric. And here we can see the gain of the best ensembling method over the best single method, the best single forecaster. These light green rectangles mean that this specific method achieves the best scores in this period.
To sum up, ensembling methods usually achieve better scores than the single forecasters. However, it is hard to say which of the ensembling methods is the best, because it varies across the data set.
So after the simple methods, it's time to present something more advanced. We will be training neural networks whose inputs are the predictions, the historical real values, and the historical errors. Each input consists of two parts, past and future. In the past part we have the prediction error of each forecasting method (by prediction error I mean the real value minus the predicted value), a column indicating that we are in the past, the real values, i.e. the original target metric values, and the time index. In the future part we have the predicted values of each forecaster, a column indicating that we are in the future (we cannot impute real values here, because they are not known at the moment), and the time index.
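As a sketch, assembling one such input window could look like this; the exact column layout and the zero-imputation for unknown future values are my assumptions about the described structure.

```python
import numpy as np

def build_input_window(preds_past, y_past, preds_future, t_past, t_future):
    """Stack the past and future parts described above into one array.
    preds_past: (k, m) and preds_future: (h, m) forecaster predictions,
    y_past: (k,) real values, t_past / t_future: time indices."""
    k, _ = preds_past.shape
    h = preds_future.shape[0]
    past = np.column_stack([
        y_past[:, None] - preds_past,  # per-forecaster errors: real - predicted
        np.ones((k, 1)),               # indicator column: 1 = past
        y_past[:, None],               # real target values
        t_past[:, None],               # time index
    ])
    future = np.column_stack([
        preds_future,                  # raw predictions (errors unknown here)
        np.zeros((h, 1)),              # indicator column: 0 = future
        np.zeros((h, 1)),              # real values unknown, left as zeros
        t_future[:, None],
    ])
    return np.vstack([past, future])   # network input: (k + h) rows
```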
The trained architectures were: fully connected neural networks; one-dimensional convolutional neural networks; one-dimensional convolutional networks with residual connections; one-dimensional convolutional networks with attention; and one-dimensional convolutional networks with residual connections plus attention. The best methods were the one-dimensional convolutional networks with residual connections, with and without attention. It is hard to say which of those two top methods was better, as attention did not have a large impact on the performance in this case.
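For reference, here is a minimal PyTorch sketch of the winning family, a one-dimensional convolutional network with residual connections; all sizes and the pooling/softmax head are illustrative, not the exact trained model.

```python
import torch
import torch.nn as nn

class ResidualConv1dBlock(nn.Module):
    """One 1-D convolution with a skip connection around it."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2)
        self.act = nn.ReLU()

    def forward(self, x):                  # x: (batch, channels, time)
        return self.act(x + self.conv(x))  # residual connection

class ConvEnsembler(nn.Module):
    """Residual conv stack ending in a softmax over forecaster weights."""
    def __init__(self, in_channels, n_forecasters, hidden=32, depth=3):
        super().__init__()
        self.proj = nn.Conv1d(in_channels, hidden, kernel_size=1)
        self.blocks = nn.Sequential(*[ResidualConv1dBlock(hidden)
                                      for _ in range(depth)])
        self.head = nn.Linear(hidden, n_forecasters)

    def forward(self, x):                  # x: (batch, in_channels, time)
        h = self.blocks(self.proj(x)).mean(dim=2)   # pool over time
        return torch.softmax(self.head(h), dim=1)   # one weight per forecaster

weights = ConvEnsembler(in_channels=8, n_forecasters=5)(torch.randn(4, 8, 24))
print(weights.shape)  # torch.Size([4, 5]); each row sums to 1
```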
As I mentioned before, we have an issue with missing values in our problem, so we decided to train one more model that would be robust to potential future data gaps in some predictive models.
For this purpose, we randomly removed some columns during training. The input for such a network looks very similar to the previous task, but we randomly remove some of the columns from the input; for example, here we removed the values of prediction 0 and replaced them with zeros. For each forecasting method we have a mask which indicates which of the forecasters are not present at the moment, and the future part has a similar structure. Instead of the standard softmax at the end of the neural network, we used a masked softmax for this problem. The experiments confirmed that a network trained in this way is immune to potential data failures and continues to return good results.
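A sketch of both ingredients, with numpy standing in for the actual training code; the dropout probability and array shapes are illustrative.

```python
import numpy as np

def drop_random_forecasters(preds, rng, p_drop=0.3):
    """Training-time augmentation: zero out the columns of randomly chosen
    forecasters and return a mask marking which ones remain present."""
    _, m = preds.shape
    missing = rng.random(m) < p_drop
    augmented = preds.copy()
    augmented[:, missing] = 0.0                 # dropped columns become zeros
    return augmented, (~missing).astype(float)  # mask: 1 = present, 0 = missing

def masked_softmax(logits, mask):
    """Softmax over forecaster weights that assigns zero weight to missing
    forecasters and renormalizes over the present ones."""
    z = np.where(mask > 0, logits, -np.inf)
    z = z - z[mask > 0].max()                   # shift for numerical stability
    e = np.where(mask > 0, np.exp(z), 0.0)
    return e / e.sum()

rng = np.random.default_rng(1)
preds, mask = drop_random_forecasters(rng.normal(size=(8, 5)), rng)
print(mask)
print(masked_softmax(np.array([0.2, 1.0, -0.3, 0.5, 0.1]), mask))
```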
How does an ensembling method based on deep learning perform compared to the standard techniques? We have one more plot here, with the best neural network. It was hard to choose the best method from the standard ensembling techniques, because it changed a lot and depended on the part of the time series we were currently looking at; we can see here that none of the methods is the best on all of the metrics, and that it also varies across the data set. But in the case of the best neural network ensembling method, this does not change at all: the deep learning method is consistently the best here, and the improvement in mean absolute error is significantly larger, around four here, while for the other methods it is usually less than one. The plot here also looks significantly better than the plots of the other methods.
So to sum up: ensembling can help a lot, and even simple methods can produce satisfactory results. Ensembling based on neural networks may give much better results, but it also requires more data to start, because it needs a reasonable amount of data to be trained on. And properly trained neural networks can be resistant to the absence of some predictions and still give better prediction results than classical methods.