Bibtex
@article{,
Journal = "Band-1",
Title= "Towards a Taxonomy of Data Heterogeneity",
Author= "Jan Roeder and Jan Muntermann and Thomas Kneib",
Doi= "10.30844/wi_2020_c6-roeder",
Abstract= "The increasing diversity of data available today poses a multitude of challenges to researchers and practitioners. Data understanding, i.e., describing, exploring, and verifying a data set at hand, becomes a critical process during which it is examined if data complies with the actual user needs. With an increasing complexity of the data universe accessible by organizations and decision-makers, this task has become even more important and challenging. Building on insights from information systems research, computer science, and statistics, we develop and evaluate a taxonomy of data heterogeneity for addressing this challenge. The proposed taxonomy provides a foundation for exploring the properties of data sets. Thereby, it is relevant for both researchers and practitioners as it provides a useful tool for describing and ultimately understanding data sets. We illustrate the effectiveness of our taxonomy by applying it to data sets available to the research community and industry.",
Keywords= "data science, data heterogeneity, data understanding, taxonomy, information value chain",
}
Jan Roeder, Jan Muntermann, and Thomas Kneib: Towards a Taxonomy of Data Heterogeneity. Online: https://doi.org/10.30844/wi_2020_c6-roeder (Accessed: 26.12.2024)
Open Access
The increasing diversity of data available today poses a multitude of challenges to researchers and practitioners. Data understanding, i.e., describing, exploring, and verifying a data set at hand, becomes a critical process during which it is examined if data complies with the actual user needs. With an increasing complexity of the data universe accessible by organizations and decision-makers, this task has become even more important and challenging. Building on insights from information systems research, computer science, and statistics, we develop and evaluate a taxonomy of data heterogeneity for addressing this challenge. The proposed taxonomy provides a foundation for exploring the properties of data sets. Thereby, it is relevant for both researchers and practitioners as it provides a useful tool for describing and ultimately understanding data sets. We illustrate the effectiveness of our taxonomy by applying it to data sets available to the research community and industry.
data science, data heterogeneity, data understanding, taxonomy, information value chain
1. Humby, C.: ANA Senior Marketer’s Summit (2006)
2. Abbasi, A., Sarker, S., Chiang, R.H.L.: Big Data Research in Information Systems: Toward an Inclusive Research Agenda. Journal of the Association for Information Systems 17, i-xxxii (2016)
3. Chen, H.C., Chiang, R.H.L., Storey, V.C.: Business Intelligence and Analytics: From Big Data to Big Impact. MIS Quarterly 36, 1165-1188 (2012)
4. Wirth, R., Hipp, J.: CRISP-DM: Towards a standard process model for data mining. In: Proceedings of the 4th Int. Conf. on the Pract. Appl. of Knowledge Discovery and Data Mining, pp. 29-39. Citeseer (2000)
5. Sharda, R., Delen, D., Turban, E., Aronson, J., Liang, T.P.: Business Intelligence and Analytics: Systems for Decision Support. Prentice Hall, Essex (2014)
6. Merriam-Webster, https://www.merriam-webster.com/dictionary/heterogeneous (Accessed: 10.08.2019)
7. Eliazar, I., Sokolov, I.M.: Maximization of statistical heterogeneity: From Shannon’s entropy to Gini’s index. Physica A: Statistical Mechanics and its Applications 389, 3023-3038 (2010)
8. Lawson, T., Garrod, J.: Dictionary of Sociology. Fitzroy Dearborn, Chicago (2001)
9. Dominici, F., Parmigiani, G., Wolpert, R.L., Hasselblad, V.: Meta-Analysis of Migraine Headache Treatments: Combining Information from Heterogeneous Designs. Journal of the American Statistical Association 94, 16-28 (1999)
10. Wooldridge, J.M.: Fixed-Effects and Related Estimators for Correlated Random-Coefficient and Treatment-Effect Panel Data Models. The Rev. of Econ. and Stat. 87, 385-390 (2005)
11. Rahm, E., Bernstein, P.A.: A survey of approaches to automatic schema matching. The VLDB Journal 10, 334-350 (2001)
12. Wu, X., Zhu, X., Wu, G., Ding, W.: Data mining with big data. IEEE Transactions on Knowledge and Data Engineering 26, 97-107 (2014)
13. Ranjan, J.: The 10 Vs of Big Data framework in the Context of 5 Industry Verticals. Productivity 59, 324-342 (2019)
14. Kitchin, R., McArdle, G.: What makes Big Data, Big Data? Exploring the ontological characteristics of 26 datasets. Big Data & Society 3, 1-10 (2016)
15. Volk, M., Hart, S., Bosse, S., Turowski, K.: How much is Big Data? A Classification Framework for IT Projects and Technologies. In: Proc. of the 22nd Americas Conf. on Inf. Sys. AIS, San Diego, CA (2016)
16. Kim, W., Choi, B.-J., Hong, E.-K., Kim, S.-K., Lee, D.: A Taxonomy of Dirty Data. Data Mining and Knowledge Discovery 7, 81-99 (2003)
17. Susha, I., Janssen, M., Verhulst, S.: Data Collaboratives as a New Frontier of Cross-Sector Partnerships in the Age of Open Data: Taxonomy Development. In: Proc. of the 50th Hawaii Int. Conf. on Sys. Sciences. AIS, Waikōloa Beach, Hawaii (2017)
18. Widjaja, T., Kaiser, J., Tepel, D., Buxmann, P.: Heterogeneity in IT landscapes and Monopoly Power of Firms: a Model to Quantify Heterogeneity. In: Proc. of the 33rd Int. Conf. on Inf. Sys. AIS, Orlando, FL (2012)
19. Lafky, D.B.: Heterogeneous Data in Federated Networks: A Framework for Solution Development. In: Proc. of the Americas Conf. on Inf. Sys. AIS, New York, NY (2004)
20. Nickerson, R.C., Varshney, U., Muntermann, J.: A method for taxonomy development and its application in information systems. European Journal of Information Systems 22, 336-359 (2013)
21. Glass, R.L., Vessey, I.: Contemporary application-domain taxonomies. IEEE Software 12, 63-76 (1995)
22. Bailey, K.D.: Typologies and taxonomies. Sage, Thousand Oaks, CA (1994)
23. Gregor, S.: The Nature of Theory in Information Systems. MIS Quarterly 30, 611-642 (2006)
24. Laney, D.: 3D Data Management: Controlling Data Volume, Velocity, and Variety. META Group (2001)
25. Lycett, M.: ‘Datafication’: making sense of (big) data in a complex world. European Journal of Information Systems 22, 381-386 (2013)
26. Goes, P.B.: Big Data and IS Research. MIS Quarterly 38, iii-viii (2014)
27. Elmasri, R., Navathe, S.: Fundamentals of Database Systems. Addison-Wesley, Boston, MA (2011)
28. Kitchin, R.: The opportunities, challenges and risks of big data for official statistics. Statistical Journal of the IAOS 31, 471-481 (2015)
29. Kitchin, R.: The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. SAGE Publications Ltd, London (2014)
30. Monk, A., Prins, M., Rook, D.: Rethinking Alternative Data in Institutional Investment. The Journal of Financial Data Science Winter 2019, 14-31 (2019)
31. Gelman, A., Hill, J.: Data analysis using regression and multilevel / hierarchical models. Cambridge University Press, New York, NY, USA (2006)
32. Wang, R.Y., Strong, D.M.: Beyond accuracy: What data quality means to data consumers. Journal of Management Information Systems 12, 5-33 (1996)
33. Kaggle, https://www.kaggle.com/ (Accessed: 14.08.2019)
34. Ellis, E.G., https://www.wired.com/2016/09/gab-alt-rights-twitter-ultimate-filter-bubble/ (Accessed: 20.11.2019)
35. García, D., Norli, Ø.: Crawling EDGAR. The Spanish Review of Financial Economics 10, 1-10 (2012)
36. Müller, O., Junglas, I., Brocke, J.v., Debortoli, S.: Utilizing big data analytics for information systems research: challenges, promises and guidelines. European Journal of Information Systems 25, 289-302 (2016)