Conf42 Python 2022 - Online

Advanced ensembling techniques for time series forecasting

Abstract

A presentation of advanced ensemble techniques for time series forecasting. The latest state-of-the-art forecasting methods are ensembled using innovative techniques to improve the accuracy and robustness of predictions. During the session we will give an introduction to the forecaster ensemble approach, and after that we will present a live demo based on real examples.

The Temporal Fusion Transformer and N-BEATS forecasters will be ensembled using various techniques, and the results will be presented in the live session. In the conclusion we will share some suggestions regarding the design of efficient and robust time series forecasting solutions.

Summary

  • Pawel Skrzypek and Anna Warno focus on advanced ensembling methods. Anna presents an advanced ensembling method based on machine learning, namely neural networks. The results are shown on real data sets.
  • Ensembling simply uses the outcomes from multiple forecasters to achieve better prediction accuracy. The real breakthrough in time series forecasting was achieved during the M4 and M5 competitions. In the hands-on session, Anna tells more about other methods and shows how they work live.
  • Real-time time series ensembling on cloud resource predictions: a project where cloud resource usage is forecast. Based on the predictions, a decision can be made whether to change, for example, the number of instances. Large prediction errors can lead to bad decisions in a cloud computing environment.
  • A simple visualization shows how the simple methods work on the data set. To sum up, ensembling methods usually achieve better scores than the single forecasters. However, it is hard to say which of the ensembling methods is the best.
  • Neural networks are trained on the input predictions, historical real values, and historical errors. The best methods were one-dimensional convolutional neural networks with residual connections, with and without attention. Even simple methods can produce satisfactory results.

Transcript

This transcript was autogenerated.
Good morning. Hello. My name is Pawel Skrzypek, and today, together with Anna Warno, we will tell you a little bit more about time series forecasting. We will focus on advanced ensembling methods. As you probably know, we have been working with time series for quite a long time, using the most advanced methods, and today we want to talk briefly about ensembling: why ensembling is important, why to use it, and how it is used by the leading forecasting methods. I will start with a brief introduction with some slides, to introduce the topic and provide a little bit of theoretical background. Then Anna will run a hands-on session and present the different ensembling methods and their results on real data sets. Anna will also present an advanced ensembling method based on machine learning, on neural networks. The method has been prepared by her, so it is a very interesting and unique approach to ensembling.
Okay, so starting from the beginning: what is ensembling? Ensembling simply means using the outcomes of multiple forecasters to achieve better prediction accuracy and better robustness and stability of the results. It is nothing new; ensembling has been used for a very long time, mostly in simple forms, which I will present, but there have also been more advanced ensembling algorithms, and I will tell you a bit more about them in this presentation.
Why is ensembling important? As we already discussed at previous editions of the Data Science Summit, the real breakthrough in time series forecasting was achieved during the M4 and M5 competitions, probably the most prestigious competitions in time series forecasting, organized by Professor Spyros Makridakis, a legend in the field. It is very interesting that the winning methods for both first and second place in M4 used ensembling, each in a different but very innovative way. In my presentation I will briefly describe the winning method, the ES hybrid, and how it used ensembling, and also how ensembling was extended in the N-BEATS method, which is some kind of successor among time series forecasting methods. In the hands-on session, Anna will tell you more about other methods and show how they work live.
The most important thing about ensembling is that it improves accuracy and also stability, because having multiple forecasters in an ensemble helps avoid the negative effect of an individual forecaster failing in a particular period of time. As I said, the winning methods from the M4 and M5 competitions used ensembling very heavily.
What are the types of ensembling? There are some simple types, like voting, stacking, bagging, and combinations of them. Ensembling can of course be used for both regression and classification problems, and voting is useful for classification: we select the class that most of the forecasters predict. We can add additional rules on top, like some kind of majority rule, or excluding forecasters with no predictions, and so on.
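To make the voting idea concrete, here is a minimal sketch in plain Python (the function name and the rule for skipping missing predictions are illustrative choices, not taken from any of the methods discussed):

    from collections import Counter

    def majority_vote(predicted_classes):
        # predicted_classes: one class label per forecaster, e.g. ["up", "up", "down"].
        # Forecasters with no prediction are passed as None and simply skipped.
        votes = Counter(p for p in predicted_classes if p is not None)
        # Return the class predicted by the largest number of forecasters.
        return votes.most_common(1)[0][0]

    # Example: three of the four forecasters vote "up".
    print(majority_vote(["up", "down", "up", "up"]))  # -> "up"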
But the general idea is quite simple, and it is very similar for regression: we can use the average of the predictions, optionally with additional rules like removing outliers, and other combinations. That is the first category, the simple ensembling. There are also advanced ensembling approaches; I will tell you a little more about two of them while describing the methods, and an even more advanced one will be presented by Anna during the hands-on session. A very important point is that ensembling methods are currently being developed very intensively, with new concepts and ideas constantly appearing; it looks like a very useful approach for time series forecasters.
How was ensembling used in the ES hybrid method, the winning method from the M4 competition? This method contains many innovative solutions and concepts, which I covered in more detail at the previous edition of the Data Science Summit. From the ensembling perspective, the ES hybrid method uses a very innovative approach: grouping different models per time series. During the training process, each model is rated; metrics are calculated, the metric used being SMAPE, and based on the metric values each model is assigned to the one or more time series on which its results are best. For each time series, the set of the best models from the whole training run is kept. So after each epoch, every model is evaluated against all of the time series and assigned to the ones where it performs best; of course, one model can be assigned to multiple time series. The final prediction on the test set is made by the set of the best models for the given time series, selected during training (a rough sketch of this assignment scheme follows at the end of this section). That was a very unique approach to ensembling, and also a very successful one.
The second method was not used in the M4 competition; it is some kind of successor, a method which claims better results than the ES hybrid: the N-BEATS method. And N-BEATS goes even further with ensembling. I will not go into the details of N-BEATS, because I described it at the previous edition of the Data Science Summit, but from the ensembling point of view, N-BEATS ensembles up to 100 different models. The models are diversified in three ways: they are trained on different subsets of the training data, with different loss functions, and with different horizons. Thanks to that, the predictions are very diversified, which reduces the chance that one single wrong model will spoil the final prediction. So in N-BEATS, up to 100 different models stand behind the predictions.
Okay, that is all from my theoretical part. To summarize: new ensembling methods are being developed dynamically, and ensembling can improve the accuracy of predictions and especially increase robustness. We work on financial time series forecasting, and for financial time series, ensembling significantly improves accuracy and robustness. That is all for the theoretical introduction; I did my best to keep it short. Now Anna will present a hands-on session on how to use ensembling and what the results are in practice. Thank you very much.
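As promised above, a rough sketch of the per-series model assignment used in the ES hybrid (the data layout, names, and top_k parameter are our own assumptions for illustration, not the actual ES hybrid implementation):

    import numpy as np

    def smape(actual, predicted):
        # Symmetric mean absolute percentage error, the metric used to rate models.
        denom = (np.abs(actual) + np.abs(predicted)) / 2.0
        return 100.0 * np.mean(np.abs(actual - predicted) / denom)

    def assign_models_to_series(actuals, model_preds, top_k=3):
        # actuals: {series_id: np.ndarray of true values}
        # model_preds: {model_id: {series_id: np.ndarray of that model's predictions}}
        # For every series, keep the top_k models with the lowest SMAPE;
        # one model may end up assigned to many series.
        best = {}
        for sid, y in actuals.items():
            scores = {mid: smape(y, preds[sid]) for mid, preds in model_preds.items()}
            best[sid] = sorted(scores, key=scores.get)[:top_k]
        return best

The final test-set forecast for a series would then combine only the models kept for that series.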
So after the theoretical introduction, I will present a real case study from our work: real-time time series ensembling on cloud resource predictions. We have a project where we forecast cloud resource usage, and based on the predictions we can decide whether to change, for example, the number of instances or not. The whole process looks as follows: different models for time series forecasting are trained independently of each other, as separate models, and predictions from each trained method are sent with the same frequency, for example every 10 seconds. In this way we obtain m predictions for each timestamp, where m is the number of forecasting methods. But it is most convenient to have only one number and make decisions based on it, so we use an ensembler that not only returns a more convenient format of our prediction but can also correct it for us.
There are several challenges we have to face. First of all, as I mentioned, the predictions are made in real time, and for this reason predictions are often missing; this may happen because of some delay, or because a method is not ready to predict yet. We also need to ensure the versatility of our solution, so that the final predictions are accurate for different types of applications and metrics, and robust to the poorly predicting forecasters that sometimes appear, since large prediction errors can lead to bad decisions and be very costly in a cloud computing environment.
Now I will show how we can use ensembling for this problem on an example data set. Our target metric is CPU usage, and our data contains 6,000 rows of CPU usage predicted by five forecasters on a test set. Among the forecasters we can find methods like TFT or N-BEATS. Here we have plotted the real values and the predictions of each forecasting method. We can zoom in, and as we can see, some of the forecasters actually look quite good; the predictions look accurate, for example prediction two or prediction six, but we still try to improve them with ensembling.
We start with fast, easy-to-implement standard methods. We have methods like the naive mean over the best subset: we take all possible subsets of forecasters and check which subset of predictions has, for example, the best mean absolute error on a training set. There is also the mean over the best n methods on the last k time steps, which is similar to the previous method, but the error is calculated only on the last k steps. You can also find weights for each forecaster, for example with linear programming; with linear programming it is easy to impose constraints on the weights, which we want to be positive and to sum to one (a minimal sketch of this follows below). We can also set forecaster weights depending on metric scores, so the better the results of a method, the more weight we give to it.
Now I will show how these simple methods work on our data set. We have a simple visualization: five forecasters and four of the simple ensembling methods. The predictions are sent in real time, and the ensembled predictions are also produced in real time. In this table we can see how good our methods are; we have five metrics, for example mean absolute error, and the best scores are highlighted in green.
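Posed as a linear program, the weight fitting just mentioned minimises the mean absolute error on a training window subject to the weights being non-negative and summing to one. A minimal sketch with SciPy (our own formulation and naming, shown only to illustrate the idea):

    import numpy as np
    from scipy.optimize import linprog

    def fit_weights_lp(preds, target):
        # preds: (n_samples, m) matrix of per-forecaster predictions on a training window.
        # target: (n_samples,) vector of real values.
        # Variables are [w_1..w_m, t_1..t_n]; we minimise sum(t) subject to
        # |preds @ w - target| <= t elementwise, w >= 0 and sum(w) == 1.
        n, m = preds.shape
        c = np.concatenate([np.zeros(m), np.ones(n)])
        A_ub = np.block([[preds, -np.eye(n)], [-preds, -np.eye(n)]])
        b_ub = np.concatenate([target, -target])
        A_eq = np.concatenate([np.ones(m), np.zeros(n)])[None, :]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * (m + n))
        return res.x[:m]  # the forecaster weights

    # The ensembled prediction for new timestamps is then future_preds @ weights.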
So we can see that linear programming is the winner here; however, the naive mean over the best subset is often the best on the SMAPE metric. Here we can also see the gain of the best ensembling method over the best single forecaster. The light green rectangles mean that a specific method achieves the best scores in that period. To sum up, ensembling methods usually achieve better scores than the single forecasters; however, it is hard to say which of the ensembling methods is the best, because it varies across the data set.
After the simple methods, it is time to present something more advanced. We will train neural networks whose inputs are the predictions, the historical real values, and the historical errors. Each input consists of two parts, past and future. In the past part we have the prediction error of each forecasting method (by prediction error I mean the real value minus the predicted value), columns indicating that we are in the past, the real values of the original target metric, and a time index. In the future part we have the predicted values of each forecaster, columns indicating that we are in the future (we cannot impute real values here, because they are not known at that moment), and again a time index.
The trained architectures were fully connected neural networks, one-dimensional convolutional neural networks, one-dimensional convolutional neural networks with residual connections, one-dimensional convolutional neural networks with attention, and one-dimensional convolutional neural networks with residual connections plus attention. The best were the one-dimensional convolutional networks with residual connections, with and without attention; it is hard to say which of those two was better, as attention did not have a large impact on the performance in this case.
As I mentioned before, we have an issue with missing values in our problem, so we decided to train one more model that would be robust to potential future data gaps for some predictive models. For this purpose, we randomly removed some columns during training. The input for such a network looks very similar to the previous task, but we randomly remove some of the columns from the input; for example, here we removed the values of prediction zero and replaced them with zeros. For each forecasting method we have a mask which indicates which forecasters are not present at the moment, and the future part has a similar structure (a minimal sketch of this column drop-out follows below). Instead of the standard softmax at the end of the network, we used a masked softmax for this problem. The experiments confirmed that a network trained in this way is immune to potential data failures and continues to return good results.
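To make the robustness trick concrete, here is a minimal sketch of the random forecaster drop-out applied during training (NumPy only; the function and parameter names are our own, illustrative choices):

    import numpy as np

    def drop_random_forecasters(pred_columns, rng, p_drop=0.2):
        # pred_columns: (timesteps, n_forecasters) block of the network input.
        # Randomly hide whole forecaster columns (replace them with zeros) and
        # return a 0/1 availability mask per forecaster, mimicking the missing
        # predictions seen in production.
        n_forecasters = pred_columns.shape[1]
        mask = (rng.random(n_forecasters) >= p_drop).astype(pred_columns.dtype)
        return pred_columns * mask, mask

    rng = np.random.default_rng(0)
    window, mask = drop_random_forecasters(np.ones((48, 5)), rng)
    # mask, e.g. [1. 1. 0. 1. 1.], is fed to the network alongside the data,
    # so the model knows which forecasters are currently absent.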
How does an ensembling method based on deep learning perform compared to the standard techniques? We have one more plot here with the best neural network. It was hard to choose the best method among the standard ensembling techniques, because it changed a lot depending on the part of the time series we were looking at; none of those methods was the best on all of the metrics, and it also varied across the data set. But the best neural network ensembling method does not change at all: the deep learning method is constantly the best here, and the improvement in mean absolute error is significantly larger, around four here, while for the other methods it is usually less than one. The plot also looks significantly better than the plots of the other methods.
So, to sum up: ensembling can help a lot, and even simple methods can produce satisfactory results. Ensembling based on neural networks may give much better results, but it also requires more data to start, because it needs a reasonable amount of data to be trained on. And properly trained neural networks can be resistant to the deficiency of some predictions and still give better prediction results than classical methods.
...

Pawel Skrzypek

CEO @ Omphalos Fund


Anna Warno

Data Scientist @ 7bulls.com



