Fast Machine Learning Annotation in the Medical Domain: A Semi-Automated Video Annotation Tool for Gastroenterologists
Adrian Krenzer1* ([email protected], ORCID: https://orcid.org/0000-0002-1593-3300), Kevin Makowski1, Amar Hekalo1, Daniel Fitting2, Joel Troya2, Wolfram G. Zoller3, Alexander Hann2 and Frank Puppe1

*Correspondence: [email protected]
1 Department of Artificial Intelligence and Knowledge Systems, Julius-Maximilians-Universität Würzburg, Sanderring 2, 97070 Würzburg, Germany
2 Universitätsklinikum Würzburg
3 Klinikum Stuttgart, Katharinenhospital
Full list of author information is available at the end of the article.

Posted Date: August 10th, 2021
DOI: https://doi.org/10.21203/rs.3.rs-776478/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License.

RESEARCH

Abstract

Background: Machine learning, especially deep learning, is becoming more and more relevant in research and development in the medical domain. For all supervised deep learning applications, data is the most critical factor in securing successful implementation and sustaining the progress of the machine learning model.
Especially gastroenterological data, which often involve endoscopic videos, are cumbersome to annotate, and domain experts are needed to interpret and annotate the videos. To support those domain experts, we built a framework. With this framework, instead of annotating every frame in a video sequence, experts perform only key annotations at the beginning and the end of sequences with pathologies, e.g. visible polyps. Subsequently, non-expert annotators supported by machine learning add the missing annotations for the frames in between.

Results: Using this framework, we were able to reduce the workload of domain experts on average by a factor of 20. This is primarily due to the structure of the framework, which is designed to minimize the workload of the domain expert. Pairing this framework with a state-of-the-art semi-automated pre-annotation model enhances the annotation speed further. Through a study with 10 participants, we show that semi-automated annotation using our tool doubles the annotation speed of non-expert annotators compared to a well-known state-of-the-art annotation tool.

Conclusion: In summary, we introduce a framework for fast expert annotation for gastroenterologists, which reduces the workload of the domain expert considerably while maintaining very high annotation quality. The framework incorporates a semi-automated annotation system utilizing trained object detection models. The software and framework are open-source.

Keywords: Machine learning; Deep learning; Annotation; Endoscopy; Gastroenterology; Automation; Object detection

Background

Machine learning, especially deep learning, is becoming more and more relevant in research and development in the medical domain [1, 2]. For all supervised deep learning applications, data is the most critical factor in securing successful implementation and sustaining progress.
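The key-annotation scheme outlined in the abstract — expert boxes only at the start and end of a pathological sequence, with the in-between frames filled in afterwards — can be pictured as interpolation between two keyframes. The sketch below is purely illustrative and is not the tool's actual mechanism (FastCat uses trained object detection models for pre-annotation); the function name and box convention (x1, y1, x2, y2) are our own assumptions:

```python
def interpolate_boxes(start_box, end_box, n_frames):
    """Linearly interpolate bounding boxes (x1, y1, x2, y2) for the
    n_frames frames lying between two expert-annotated keyframes."""
    boxes = []
    for i in range(1, n_frames + 1):
        t = i / (n_frames + 1)  # relative position between the keyframes
        boxes.append(tuple(a + t * (b - a) for a, b in zip(start_box, end_box)))
    return boxes

# A single frame halfway between the keyframes receives the averaged box:
print(interpolate_boxes((0, 0, 10, 10), (20, 20, 30, 30), 1))
# [(10.0, 10.0, 20.0, 20.0)]
```

Such naive interpolation only works for smooth motion; the point of the machine-learning-assisted step described below is precisely to handle the cases where it fails.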
Numerous studies have shown that access to data and data quality are crucial to enable successful machine learning of medical diagnosis, providing real assistance to physicians [3, 4, 5, 6, 7]. Exceptionally high-quality annotated data can improve deep learning detection results to a great extent [8, 9, 10]. E.g., Webb et al. show that higher data quality improves detection results more than using larger amounts of lower-quality data [11]. This is especially important to keep in mind while operating in the medical domain, as mistakes may have fatal consequences. Nevertheless, acquiring such data is very costly, particularly if domain experts are involved. On the one hand, domain experts have minimal time resources for data annotation, while on the other hand, data annotation is a highly time-consuming process. The best way to tackle this problem is by reducing the annotation time spent by the actual domain expert as much as possible while using non-experts to finish the process. Therefore, in this paper, we designed a framework that utilizes a two-step process involving a small expert annotation part and a large non-expert annotation part. This shifts most of the workload from the expert to a non-expert while still maintaining high-quality data. Both tasks are combined with AI to enhance the efficiency of the annotation process further. To handle the entirety of this annotation process, we introduce the software Fast Colonoscopy Annotation Tool (FastCat). This tool assists in the annotation of endoscopic videos but can easily be extended to any other medical domain. The main contributions of our paper are:

1) We introduce a framework for fast expert annotation, which reduces the workload of the domain expert by a factor of 20 while maintaining very high annotation quality.
2) We publish open-source software for annotation in the gastroenterological domain and beyond, including two views, one for expert annotation and one for non-expert annotation.[1]
3) We incorporate a semi-automated annotation process in the software, which reduces the annotation time of the annotators and further enhances the quality of the annotation process.

[1] https://github.com/fastcatai/fastcat

To give an overview of existing work and properly situate our paper in the literature, we describe a brief history reaching from general annotation tools for images and videos to annotation tools specialized for medical use.

A brief history of annotation tools

As early as the 1990s, the first methods were conceived to collect large datasets of labeled images [12]. E.g., "The Open Mind Initiative", a web-based framework, was developed in 1999. Its goal was to collect annotated data from web users to be utilized by intelligent algorithms [13]. Over the years, various ways to obtain annotated data have been developed. E.g., an online game called ESP was developed to generate labeled images. Here, two random online players are given the same image and, without communicating, must guess the thoughts of the other player about the image and provide a common term for the target image as quickly as possible [14, 12]. As a result, several million images have been collected. The first classic annotation tool, called labelme, was developed in 2007 and is still one of the most popular open-source online annotation tools for creating computer vision datasets. Labelme provides the ability to label objects in an image with specific shapes, as well as other features [15]. From 2012 to today, with the rise of deep learning in computer vision, the number of annotation tools has expanded rapidly. One of the best-known and most influential annotation tools is LabelImg, published in 2015. LabelImg is an image annotation tool based on Python which uses bounding boxes to annotate images. The annotations are stored in XML files that are saved in either PASCAL VOC or YOLO format. Additionally, in 2015 Playment was introduced. Playment is an annotation platform for creating training datasets for computer vision. It offers labeling for images and videos using different 2D or 3D boxes, polygons, points, or semantic segmentation. In addition, automatic labeling is provided for support. In 2017, RectLabel entered the field. RectLabel is a paid labeling tool that is only available on macOS. It allows the usual annotation options like bounding boxes as well as automatic labeling of images. It also supports the PASCAL VOC XML format and exports the annotations to different formats (e.g., YOLO or COCO JSON). Next, Labelbox, a commercial training data platform for machine learning, was introduced. Among other things, it offers an annotation tool for images, videos, texts, or audio, as well as data management of the labeled data. Nowadays, a variety of image and video annotation tools can be found. Some have basic functionalities, and others are designed for particular tasks. We picked five freely available state-of-the-art annotation tools and compared them more in-depth. In Table 1, we shortly describe and compare these tools.

Table 1 Comparison between video and image annotation tools.

                       CVAT   LabelImg   labelme   VoTT     VIA
  Image                •      •          •         •        •
  Video                -      -          •         •        •
  Usability            Easy   Easy      Medium    Medium   Hard
  Formats: VOC         -      •          •         •        •
           COCO        -      -          •         •        •
           YOLO        -      -          -         •        •
           TFRecord    -      -          -         •        •
           Others      -      -          •         •        •

Computer Vision Annotation Tool (CVAT)
CVAT [16] was developed by Intel and is a free and open-source annotation tool for images and videos.
It is based on a client-server model, where images and videos are organized as tasks and can be split up between users to enable a collaborative working process.
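The tools compared above differ largely in which label formats they export. As a concrete illustration of what separates two of the formats in Table 1 — PASCAL VOC stores absolute pixel corners, while YOLO stores a normalized center and size — the following sketch converts between them (the function name is our own; this is not code from any of the tools discussed):

```python
def voc_to_yolo(box, img_w, img_h):
    """Convert an absolute PASCAL VOC box (xmin, ymin, xmax, ymax) into the
    YOLO representation (x_center, y_center, width, height), with every
    value normalized to [0, 1] by the image dimensions."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2 / img_w,   # normalized box center, x
            (ymin + ymax) / 2 / img_h,   # normalized box center, y
            (xmax - xmin) / img_w,       # normalized box width
            (ymax - ymin) / img_h)       # normalized box height

print(voc_to_yolo((100, 200, 300, 400), 1000, 1000))
# (0.2, 0.3, 0.2, 0.2)
```

Because YOLO coordinates are resolution-independent, they survive image rescaling unchanged, which is one reason many training pipelines prefer them.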