Oracle or Teacher? A Systematic Overview of Research on Interactive Labeling for Machine Learning

BibTeX

@Select Types{
,
  title    = "Oracle or Teacher? A Systematic Overview of Research on Interactive Labeling for Machine Learning", 
  author    = "Merlin Knaeble and Mario Nadj and Alexander Maedche", 
  doi    = "10.30844/wi_2020_a1-knaeble", 
  abstract    = "Machine learning is steadily growing in popularity, as is its demand for labeled training data. However, these datasets often need to be labeled by human domain experts in a labor-intensive process. Recently, a new area of research, called interactive labeling, has formed around this process. While much research exists in this young and rapidly growing area, it lacks a systematic overview. In this paper, we provide such an overview, along with a cluster analysis and an outlook on five avenues for future research. We identified 57 relevant articles, most of them investigating approaches for labeling images or text. Further, our findings indicate that there are two competing views of how the user can be treated: (a) as an oracle, who is queried on whether a label is right or wrong, and (b) as a teacher, who can offer deeper explanations during the interactive labeling process.", 
  keywords    = "Interactive Labeling, Interactive Machine Learning, Training Data", 
}

Abstract

Machine learning is steadily growing in popularity, as is its demand for labeled training data. However, these datasets often need to be labeled by human domain experts in a labor-intensive process. Recently, a new area of research, called interactive labeling, has formed around this process. While much research exists in this young and rapidly growing area, it lacks a systematic overview. In this paper, we provide such an overview, along with a cluster analysis and an outlook on five avenues for future research. We identified 57 relevant articles, most of them investigating approaches for labeling images or text. Further, our findings indicate that there are two competing views of how the user can be treated: (a) as an oracle, who is queried on whether a label is right or wrong, and (b) as a teacher, who can offer deeper explanations during the interactive labeling process.

Keywords

Interactive Labeling, Interactive Machine Learning, Training Data
