BibTeX
Cite as text
@inproceedings{baier2020eia,
Title = "Handling Concept Drifts in Regression Problems – the Error Intersection Approach",
Author = "Lucas Baier and Marcel Hofmann and Niklas Kühl and Marisa Mohr and Gerhard Satzger",
Booktitle = "Proceedings of the 15th International Conference on Wirtschaftsinformatik (WI 2020), Band 1",
Year = "2020",
Doi = "10.30844/wi_2020_c1-baier",
Abstract= "Machine learning models are omnipresent for predictions on big data. One challenge of deployed models is the change of the data over time—a phenomenon called concept drift. If not handled correctly, a concept drift can lead to significant mispredictions. We explore a novel approach for concept drift handling, which depicts a strategy to switch between the application of simple and complex machine learning models for regression tasks. We assume that the approach plays out the individual strengths of each model, switching to the simpler model if a drift occurs and switching back to the complex model for typical situations. We instantiate the approach on a real-world data set of taxi demand in New York City, which is prone to multiple drifts, e.g. the weather phenomena of blizzards, resulting in a sudden decrease of taxi demand. We are able to show that our suggested approach outperforms all regarded baselines significantly.
",
Keywords= "Machine learning, Concept drift, Demand Prediction",
}
Lucas Baier, Marcel Hofmann, Niklas Kühl, Marisa Mohr and Gerhard Satzger: Handling Concept Drifts in Regression Problems – the Error Intersection Approach. Online: https://doi.org/10.30844/wi_2020_c1-baier (Retrieved 26.12.2024)
Open Access
Machine learning models are omnipresent for predictions on big data. One challenge of deployed models is the change of the data over time—a phenomenon called concept drift. If not handled correctly, a concept drift can lead to significant mispredictions. We explore a novel approach for concept drift handling, which depicts a strategy to switch between the application of simple and complex machine learning models for regression tasks. We assume that the approach plays out the individual strengths of each model, switching to the simpler model if a drift occurs and switching back to the complex model for typical situations. We instantiate the approach on a real-world data set of taxi demand in New York City, which is prone to multiple drifts, e.g. the weather phenomena of blizzards, resulting in a sudden decrease of taxi demand. We are able to show that our suggested approach outperforms all regarded baselines significantly.
Machine learning, Concept drift, Demand Prediction
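The abstract describes a strategy that switches between a simple and a complex regression model, serving whichever currently performs better. As a rough illustration of that switching idea (not the paper's actual decision rule), a minimal Python sketch could compare rolling errors of both models and return the forecast of the one that is currently more accurate; the class, parameter names, and window size below are hypothetical.

```python
from collections import deque


class ErrorIntersectionSwitch:
    """Minimal sketch of the model-switching idea from the abstract:
    serve the forecast of whichever model (simple or complex) currently
    shows the lower rolling error. Names and defaults are illustrative."""

    def __init__(self, simple_model, complex_model, window=24):
        self.simple = simple_model      # e.g. a naive/seasonal baseline
        self.complex = complex_model    # e.g. a gradient-boosted regressor
        self.err_simple = deque(maxlen=window)   # recent absolute errors of the simple model
        self.err_complex = deque(maxlen=window)  # recent absolute errors of the complex model

    def predict(self, x):
        p_simple = self.simple.predict(x)
        p_complex = self.complex.predict(x)
        # Default to the complex model; switch to the simple one when its
        # rolling error has dropped below that of the complex model
        # (the "intersection" of the two error curves).
        use_simple = (
            len(self.err_simple) == self.err_simple.maxlen
            and sum(self.err_simple) < sum(self.err_complex)
        )
        return (p_simple if use_simple else p_complex), p_simple, p_complex

    def update(self, y_true, p_simple, p_complex):
        # Once the true demand is observed, record both models' errors
        # so the next prediction can re-evaluate which model to trust.
        self.err_simple.append(abs(y_true - p_simple))
        self.err_complex.append(abs(y_true - p_complex))
```

In this sketch both models keep producing forecasts at every step and only the served prediction changes, so switching back to the complex model once typical conditions return requires no retraining; the exact error metric, window length, and switching threshold used in the paper cannot be read off the abstract alone.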