Deep learning for image segmentation: veritable or overhyped?

ZHENZHOU WANG*
College of Electrical and Electronic Engineering, Shandong University of Technology, Zibo City, 255000, China
*[email protected]

Abstract: Deep learning has achieved great success as a powerful classification tool and has also made great progress in semantic segmentation. As a result, many researchers believe that deep learning is also the most powerful tool for pixel-level image segmentation. Could deep learning achieve the same pixel-level accuracy as traditional image segmentation techniques by mapping the features of the object into a non-linear function? This paper gives a short survey of the accuracies achieved by deep learning so far in image classification and image segmentation. Compared to the high accuracies achieved by deep learning in classifying limited categories in international vision challenges, the image segmentation accuracies achieved by deep learning in the same challenges are only about 80%. On the contrary, the image segmentation accuracies achieved in international biomedical challenges are close to 95%. Why is the difference so big? Since the accuracies of the competitors' methods are evaluated based only on their submitted results, instead of being reproduced from submitted source codes or software, are the achieved accuracies verifiable or overhyped? We investigate this question by analyzing the working principle of deep learning. Finally, we compare the accuracies of state-of-the-art deep learning methods with a threshold selection method quantitatively. Experimental results show that the threshold selection method can achieve significantly higher accuracy than deep learning methods in image segmentation.

1. Introduction

Most deep learning models are based on artificial neural networks, which were invented by simulating human brains. Although most deep learning models differ in various ways from the structural and functional properties of human brains, they are analogous to some extent. The neural dynamics of the human brain correspond to a form of variational EM, i.e. with approximate rather than exact posteriors [1].
For instance, the human brain can recognize an object without knowing how to calculate its exact size, height, shape or other information, judging instead from approximate information. Similar to human brains, deep learning models classify objects without knowing how to calculate their exact size, height, shape or other information. In essence, deep learning combines many continuous regression units discretely and approximates a discrete non-linear mapping function without knowing its exact form. By nature, deep learning is similar to, but different from, regression. The linear regression model constructs a global linear mapping between the inputs and outputs. Different from the linear regression model, deep learning constructs local linear models to determine the global non-linear mapping between the inputs and the outputs. We show a simple example in Fig. 1 to demonstrate the difference between the linear regression model and the deep learning model. For simplicity, we draw only one hidden layer of the deep learning model. For the linear regression model in Fig. 1(a), there are three input variables and two output variables, and they are formulated by the following equations.

y_1 = w_11 × x_1 + w_21 × x_2 + w_31 × x_3 + b_1 (1)

y_2 = w_12 × x_1 + w_22 × x_2 + w_32 × x_3 + b_2 (2)



Fig. 1. Demonstration of the difference between a linear regression model and a simplified deep learning model. (a) The linear regression model; (b) the simplified deep learning model.

As can be seen, the above model can be easily solved by the least squares method. However, when the inputs are not linearly separable, there is no linear mapping between the inputs and the outputs. Deep learning uses many hidden layers to construct the non-linear mapping. As simplified in Fig. 1(b), one hidden layer with three neurons is added, and the non-linear mapping between the inputs and the outputs is formulated by the following equations.

h_1 = W_11 × X_1 + W_21 × X_2 + W_31 × X_3 + b_1 (3)

h_2 = W_12 × X_1 + W_22 × X_2 + W_32 × X_3 + b_2 (4)

h_3 = W_13 × X_1 + W_23 × X_2 + W_33 × X_3 + b_3 (5)

Y_1 = W'_11 × h_1 + W'_21 × h_2 + W'_31 × h_3 + b'_1 (6)

Y_2 = W'_12 × h_1 + W'_22 × h_2 + W'_32 × h_3 + b'_2 (7)

A process called 'training' is used to solve the above model, and it relies on descent algorithms [2]. Different from linear regression, deep learning can only determine these parameters so that the outputs respond to the inputs according to the training data; it does not generate a continuous and meaningful mapping function between the inputs and the outputs. With a continuous and meaningful mapping function, i.e. exact mathematical equations, regression can determine the linear or non-linear mapping between continuous input variables and continuous output variables. On the contrary, deep learning can only determine the non-linear mapping between discrete input variables and a finite number of discrete output variables with a set of discrete training parameters. Without loss of generality, regression is formulated as a mapping model by the following simple equation.

y ≈ f(x, w) (8)

f is a continuous linear or non-linear mapping function with an exact mathematical form. w denotes the coefficients to be determined by regression. x denotes the continuous input variable and y denotes the continuous output variable. Similarly, deep learning can be formulated as:

Y_i ≈ F(X_j, W); i = 1, …, I; j = 1, …, J (9)
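As a concrete illustration of the contrast drawn above, the sketch below (NumPy; the toy data, weight values and ReLU activation are illustrative assumptions, not from the paper) solves the linear model of Eqs. (1)-(2) in closed form and builds the one-hidden-layer forward pass of Eqs. (3)-(7):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data matching Fig. 1(a): three inputs, two outputs.
X = rng.normal(size=(100, 3))
W_true = np.array([[1.0, -2.0],
                   [0.5,  0.3],
                   [-1.5, 2.0]])
b_true = np.array([0.1, -0.2])
Y = X @ W_true + b_true                      # exactly linear, no noise

# Eqs. (1)-(2): linear regression, solved in closed form by least squares.
Xa = np.hstack([X, np.ones((len(X), 1))])    # append a column for the bias
coef, *_ = np.linalg.lstsq(Xa, Y, rcond=None)
W_hat, b_hat = coef[:3], coef[3]             # recovers W_true and b_true

# Eqs. (3)-(7): a one-hidden-layer network (forward pass only). Note that
# Eqs. (3)-(7) as printed omit the activation function; a ReLU is inserted
# here because without a non-linearity the stacked layers collapse back
# into a single linear map.
W1 = rng.normal(size=(3, 3)); b1 = np.zeros(3)   # hidden layer weights
W2 = rng.normal(size=(3, 2)); b2 = np.zeros(2)   # output layer weights

def forward(x):
    h = np.maximum(0.0, x @ W1 + b1)         # hidden units h_1..h_3
    return h @ W2 + b2                       # outputs Y_1, Y_2
```

The least-squares solution recovers the linear map exactly because it has a closed form; the network's weights, by contrast, can only be fitted iteratively by a descent algorithm, which is the distinction the text draws.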


F is a discrete non-linear mapping function whose exact mathematical form is never known. W denotes the discrete mapping parameters to be determined by training. X_j denotes the discrete input variables and J denotes their total number. Y_i denotes the discrete output variables and I denotes their total number.

For image segmentation, the inputs of deep learning are the features extracted from the image and the output is a prediction map. For the prediction map to have pixel-level accuracy, the number of outputs is computed as:

I = L^(M × N) (10)

where M is the width of the image and N is the height of the image. L denotes the total number of labels. If the image is segmented into two classes, L equals 2; if the image is segmented into three classes, L equals 3, and so on. To segment an image with a resolution of 100 × 100 into two classes, the number of outputs is 2^10000, which is close to infinity and intractable for practical implementation. Consequently, the prediction map can only reach the semantic level in practice. Therefore, traditional image segmentation techniques are usually used to refine the prediction result to achieve pixel-level accuracy.

2. Deep learning for object classification and detection

The deep learning revolution took place around 2012, when deep learning won several well-known international speech and image recognition or classification competitions. The competitions that made deep learning resurge include, but are not limited to: the large-scale automatic speech recognition competition [3-4], the mixed national institute of standards and technology (MNIST) database competition [5], the traffic sign competition [6], the Chinese handwriting recognition competition [7], the ImageNet large scale visual recognition competition [8-9], the pattern analysis, statistical modeling and computational learning (PASCAL) visual object classes competition [10] and so on.
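Returning to Eq. (10): the claimed intractability of enumerating pixel-level outputs is easy to check numerically. A minimal sketch, using Python's arbitrary-precision integers for exact arithmetic:

```python
# Number of possible labelings (outputs) for pixel-level segmentation,
# per Eq. (10): I = L ** (M * N).
M, N = 100, 100   # image width and height
L = 2             # number of classes

I = L ** (M * N)  # Python ints are arbitrary precision, so this is exact

# 2**10000 has over 3000 decimal digits -- far beyond any practical
# number of output categories for a classifier.
print(len(str(I)))  # number of decimal digits of I
```

Even for this tiny image, the count of distinct labelings dwarfs the thousands of categories handled in the classification challenges listed below.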
The large-scale automatic speech recognition competition uses the Texas Instruments and Massachusetts Institute of Technology (TIMIT) dataset, which contains a total of 6300 sentences: 10 sentences spoken by each of 630 speakers from 8 major dialect regions of the United States. The test data has a core portion containing 24 speakers, 2 male and 1 female from each dialect region. According to Eq. (9), the number of inputs is 6300 and the number of outputs is 8, which is sufficient for deep learning to determine the training parameters robustly. In [4], it is reported that the recognition error rate for the training sets is 13.3% and the recognition error rate for the core test sets is 16.5%. The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. According to Eq. (9), the number of inputs is 60000 and the number of outputs is 10, which is sufficient for deep learning to determine the training parameters. In [5], it is reported that the error rate is only 0.23%. The traffic sign competition contains 39209 training images in 43 classes. According to Eq. (9), the number of inputs is 39209 and the number of outputs is 43, which is sufficient for deep learning to determine the training parameters. In [6], it is reported that a recognition rate of 99.46% was achieved, better than that of humans on this task (98.84%). The Chinese handwriting recognition competition contains a large dataset of isolated characters with about 3.9 million samples of 7,356 classes (7,185 Chinese characters and 171 symbols). According to Eq. (9), the number of inputs is 3900000 and the number of outputs is 7356, which is also sufficient for deep learning to determine the training parameters robustly. In [7], a 97.39% character recognition rate was achieved. The ImageNet image classification competition between 2012 and 2014 contains 1281167 images for 1000 classes of objects. According to Eq.
(9), the number of inputs is 1281167 and the number of outputs is 1000, which is sufficient for deep learning to determine the training parameters. In [8], an error rate of 15.3% was achieved, which ignited the revolution of deep learning. In [9], it is reported that the best error rate achieved was 6.66%. However, only 43.93% average precision was achieved for object detection, which nevertheless ranked first among 17 competing teams. In the last session of the ImageNet Large Scale Visual Recognition Challenge, which took place in 2017, the best object detection accuracy achieved was 73.14%. All these measures range from 0 to 1, and the higher the better. For the PASCAL datasets, 8498 images were used to train a deep learning method called the fully convolutional network (FCN), and only 67.5% average precision was achieved [10]. From the visual results in [10], it is seen that even when the input is the ground truth, the precision of the output is still not high. Since deep learning is a discrete mapping function and is not capable of generating continuous (infinite) outputs, its performance in such cases is reduced significantly. Consequently, no good results are expected when deep learning is used for image segmentation unless the resolution of the image is extremely small. For better comparison, we list all the competitions mentioned above in Table 1, with their critical data information and the best accuracies achieved so far. As can be seen, the performance of deep learning is inversely related to the number of outputs, under the assumption that sufficient inputs are available. Similar to the Shannon theorem, a theorem about the ratio of the number of inputs to the number of outputs should be studied to predict the performance of deep learning.

Table 1. Comparison of the best accuracies achieved by deep learning in international competitions, with the critical data information.

Competition                     | Number of inputs | Number of outputs | Best accuracy
Voice recognition               | 6300             | 8                 | 83.5% [4]
Digits recognition              | 60000            | 10                | 99.77% [5]
Traffic sign recognition        | 39209            | 43                | 99.46% [6]
Chinese handwriting recognition | 3900000          | 7356              | 97.39% [7]
ImageNet image classification   | 1281167          | 1000              | 93.34% [9]
ImageNet object detection       | 1281167          | ≈ infinity        | 73.14%
PASCAL VOC image segmentation   | 8498             | ≈ infinity        | 67.5% [10]; 79.5%

3. Deep learning for biomedical image segmentation

In theory, deep learning does not have the ability to achieve an accurate image segmentation result, since it simulates human brains to classify objects based on approximate information instead of the exact information of the object, as analyzed in the above sections. However, there are many reported research works claiming unprecedented segmentation accuracy achieved by deep learning. Let us look into the details of how these researchers achieved it. In [11], the author, who had won three international competitions [5-7], claimed that his deep neural network was trained with 3 million pixels with only 2 outputs: membrane, denoted as 1, and non-membrane, denoted as 0. The output is a prediction map, which looks the same as the probability map generated by another competitor [12]; the popular method to generate the probability map is described in [13]. As can be seen, the prediction map is generated based on the features of the image instead of the pixels of the image. Therefore, deep learning works at the feature level instead of the pixel level for image segmentation. Accordingly, deep learning cannot achieve pixel-level accuracy without further processing. As seen from the illustration in [14], there were serious segmentation errors in the final segmentation result produced by the deep learning method proposed in [11]. In another object detection competition [15], the same author of [11] used his deep learning method to detect mitosis in breast cancer histology images. This time, the achieved F-measure accuracy was only 78.2%. So far, the author has not disclosed any related source codes for other researchers to reproduce the segmentation accuracy or to verify the proposed deep learning method, which indicates that international competitions based on submitted results might not be reliable [16].
In [17], the authors gave a comprehensive survey of deep learning in cardiology and concluded that a complete theoretical understanding of deep learning is not yet available. Therefore, the authors claimed that successful application of artificial intelligence (AI) in the medical field relies on achieving interpretable models and big datasets. Unfortunately, deep learning is a black box instead of an interpretable model. Because of the black box property of deep learning, its performance might vary significantly when segmenting the same images with different implementations. We take the segmentation of the left ventricle (LV) by deep learning methods as a typical example and summarize the accuracies achieved by deep learning methods [18-47] on public magnetic resonance imaging (MRI) and computed tomography (CT) datasets in Table 2. For the most popular dataset (the 2009 MICCAI challenge), the best accuracy achieved so far by deep learning based methods was 0.94 [21], where deep learning was used to yield an approximate segmentation and then a deformable model was used to refine the final segmentation result. The results in [21] verify the point that deep learning does not have the ability to achieve an accurate image segmentation result on its own. However, the reported accuracy in [21] seems to be exaggerated. The shape inferred by deep learning is used as the initialization contour for the evolution of the deformable model; however, the accuracy of the deformable model is determined by the generated edge map instead of its initialization contour. Therefore, the final accuracy of the combined method should be similar to that of deformable model based methods. On the contrary, the reported accuracy was much better than those reported by deformable model based methods. In and after 2018, the reported accuracy on the 2009 MICCAI challenge datasets dropped to 0.91 [27] and 0.9 [28].

Table 2. Comparison of the accuracies (DICE similarity coefficients) achieved by deep learning methods on public left ventricle datasets. (Please note that the results in Table 2 should not be fully trusted, since the accuracies were computed by the authors themselves, or by the organizers based on the submitted results, without verifying the corresponding source codes.)

Dataset                                    | DICE achieved in or before 2017                                                              | DICE achieved in or after 2018
York University dataset (MRI)              | 0.75 [18]; 0.91 [19]                                                                         | NA
Sunnybrook 2009 MICCAI challenge (MRI)     | 0.93 [19]; 0.91 [20]; 0.94 [21]; 0.88 [22]; 0.92 [23]; 0.85 [24]; 0.88 [25]; 0.9 [26]        | 0.91 [27]; 0.9 [28]
ACDC challenge STACOM 2017 (MRI)           | 0.86 [29]; 0.97 [30]; 0.95 [31]; 0.96 [32]; 0.95 [33]; 0.95 [34]; 0.95 [35]; 0.92 [36]; 0.82 [37]; 0.94 [38] | 0.92 [39-40]
MMWHS challenge STACOM 2017 (CT)           | 0.809 [41]; 0.908 [42]                                                                       | 0.793 [43]
STACOM 2011 challenge (MRI)                | 0.92 [44]; NA [45]                                                                           | NA
STACOM 2013 challenge (MRI)                | 0.82 [46]                                                                                    | NA
STACOM 2018 challenge (MRI)                | NA                                                                                           | 0.85 [47]
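For reference, the DICE similarity coefficient reported throughout Table 2 is a standard overlap measure between a predicted binary mask and the ground truth. A minimal NumPy sketch (the toy masks are illustrative):

```python
import numpy as np

def dice_coefficient(pred, gt):
    """DICE = 2|A ∩ B| / (|A| + |B|) for binary masks A (pred) and B (gt)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    if total == 0:
        return 1.0  # both masks empty: define as perfect agreement
    return 2.0 * intersection / total

# Toy example: two 4x4 masks of 3 pixels each, overlapping on 2 pixels.
pred = np.array([[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
gt   = np.array([[1, 1, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]])
print(dice_coefficient(pred, gt))  # 2*2 / (3+3) ≈ 0.667
```

A DICE of 0.9 thus means the overlap is large relative to the combined mask sizes, but still leaves a visible band of mislabeled boundary pixels in a typical LV contour.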

In 2017, the statistical atlases and computational modeling (STACOM) of the heart workshop held two challenges: the automated cardiac diagnosis challenge (ACDC) and the multi-modality whole heart segmentation (MMWHS) challenge. As in other international challenges or competitions, the competing teams were requested to submit their segmentation results for the evaluation of the accuracy of their methods. For the first time, deep learning based methods achieved unprecedented accuracy by many teams, e.g. 0.97 [30], 0.96 [32] and 0.95 [31, 33-35]. One thing in common among the deep learning methods designed by these teams is that deep learning was the sole technique for segmentation, and no other segmentation techniques were used for refinement. The organizer thus claimed that the problem of LV segmentation had been solved. Many of the teams announced that they disclosed their codes or implementation details via GitHub. However, the author's team did not find anything useful in their disclosed codes, and they did not respond to emails either. Many other researchers could not

reproduce similar accuracies either. For instance, the achieved accuracy on the ACDC datasets dropped from 0.95 to 0.92 in 2018 [39]. The achieved accuracy on the MMWHS datasets dropped from 0.91 [42] to 0.79 [43] in 2018. In the latest STACOM 2018 challenge, the achieved accuracy for LV segmentation dropped to 0.85 [47]. As can be seen, international competitions based on submitted results are not reliable [16]. The reasons that made the accuracies achieved during STACOM 2017 so high include: (1) it is easy to improve the accuracy of the submitted results semi-automatically or manually; (2) the winners of international challenges have bright career prospects, which may allure them to be dishonest; (3) most of them believe that deep learning could actually reach such high accuracy even though they did not achieve it in their practical implementations. Their blind faith in deep learning mainly comes from the great success of deep learning in classifying limited categories. As pointed out in this paper, deep learning can only give an approximation, or a rough prediction map, instead of an accurate segmentation result for image segmentation.

4. Comparison of deep learning methods with traditional methods

Firstly, we compare the deep learning based methods with the slope difference distribution (SDD) threshold selection method proposed in [48]. The performance of SDD is affected significantly by its parameters, the bandwidth B and the fitting number N [48]. The segmentation results obtained by selecting the bandwidth as 5 and the fitting number as 15 were published in [49], and its source codes have been publicly accessible since 2017. Without the reviewers being notified that the source codes were available, the submitted paper was rejected by many journals with almost the same reason: that deep learning was better. However, the accuracy achieved by SDD in [49] is 0.9246, which is higher than most accuracies reported for deep learning methods.
When the parameters of SDD are changed to 11 and 11 respectively, the segmentation accuracy can be improved to 0.95, which is greater than the accuracies reported by all deep learning methods on the same public datasets [50]. However, the selection rules of SDD become much more complex with a higher bandwidth and a lower fitting number. It seems that many reviewers could not accept the fact that deep learning is not the best image segmentation technique. It is not known why they believe in deep learning so deeply; maybe they are overwhelmed by the great success of deep learning in other fields. So far, we have found only two deep learning methods [51-52] that disclosed their source codes and trained models [53-54]. However, the reproduced accuracy is significantly lower than the accuracy claimed in their papers, as shown in Table 3. On the contrary, the SDD based method [55] achieved significantly better accuracy, with freely available MATLAB source codes [56] for verification.
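SDD itself selects the threshold from the slope difference distribution of the gray-level histogram [48]; its selection rules are not reproduced here. As a point of reference for this family of histogram-based threshold selection methods, the sketch below implements the classical Otsu criterion (a stand-in for illustration, not SDD; the synthetic bimodal image is an assumption):

```python
import numpy as np

def otsu_threshold(image):
    """Classical Otsu threshold: pick the gray level that maximizes the
    between-class variance of the histogram (8-bit image assumed)."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                 # gray-level probabilities
    levels = np.arange(256, dtype=float)
    omega = np.cumsum(p)                  # class-0 probability up to level t
    mu = np.cumsum(p * levels)            # cumulative mean up to level t
    mu_total = mu[-1]
    # Between-class variance for every candidate threshold t.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)      # endpoints give 0/0 -> treat as 0
    return int(np.argmax(sigma_b))

# Toy bimodal image: dark background around 50, bright object around 200.
rng = np.random.default_rng(1)
img = np.clip(np.concatenate([
    rng.normal(50, 10, 5000), rng.normal(200, 10, 5000)
]), 0, 255).astype(np.uint8).reshape(100, 100)
t = otsu_threshold(img)
mask = img > t   # binary segmentation at the selected threshold
```

Like SDD, such methods compute the threshold directly from the histogram, so the segmentation is fully determined by an explicit, inspectable rule rather than by trained parameters.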

Table 3. Comparison of DICE similarity coefficients achieved by the disclosed codes of two deep learning methods and the SDD method on the ACDC dataset.

Validation             | Deep learning [51,53] | Deep learning [52,54] | SDD [55,56]
Published accuracy     | 94% [51]              | 94% [52]              | 95.44% [55]
Source codes accuracy  | 90.28% [53]           | 87.13% [54]           | 95.44% [56]

Secondly, we compare the deep learning based methods with other segmentation methods in segmenting different medical images. The quantitative comparisons are shown in Table 4. The Jaccard Index accuracy achieved by the deep learning method for ventricle segmentation in [47] was only 0.77, which is lower than that of a tracking method [57]. Deep learning was combined with surface evolution to segment CT livers in [58], and with graph cut to segment CT livers in [59]. Even combined with deep learning for prediction, the reported liver segmentation accuracies were still lower than the liver segmentation accuracies previously reported by non-deep-learning methods [60-61]. For liver tumor detection, the reported accuracy by deep learning was only 69% [62], compared to 88.9% computed by a non-deep-learning method [63]. Deep learning was also used for cell segmentation [64] and cell tracking [66]. Deep learning was combined with

watershed for cell segmentation in [64] and the reported accuracy was only 86%. In addition, it was reported in [65] that cell segmentation by support vector machine was more robust than by deep learning. In [66], deep learning was used for cell tracking, and it was reported that only one cell could be tracked at a time, which is far less efficient than traditional cell tracking methods [67-68]. The cases described above are only the tip of the iceberg; there are many other cases in which deep learning performed very poorly or was outperformed by other methods. However, it is usually difficult to publish such results, since the traditional methods may lack originality for publication. In many cases, the accuracies claimed in published papers and reported in international competitions might make people think that a failed segmentation task reflects the inability of the researcher instead of the inability of deep learning. Almost no authors that use deep learning for image segmentation are willing to disclose enough data, source codes and the required information to reproduce their results. Besides deliberate exaggeration, there might also be mistakes that cause the reported accuracies to be unreliable.

Table 4. Comparison of the accuracies of deep learning methods and non-deep-learning methods in segmenting medical images. (Please note that the results in Table 4 should not be fully trusted, since the accuracies were computed by the authors themselves without verifying the corresponding source codes.)

Task                                             | Deep learning methods                      | Non-deep-learning methods
MRI ventricle segmentation                       | Jaccard Index 0.77 [47]                    | Jaccard Index 0.84 [57]
CT liver segmentation on MICCAI-Sliver07 test set | Score 79.3% [58]; 77.8% [59]               | Score 79.6% [60]
CT liver segmentation on 3Dircadb database       | VOE 9.36±3.34 [59]                         | VOE 9.15±1.44 [61]
Liver tumor segmentation                         | DICE 69% [62]                              | DICE 88.9% [63]
Cell segmentation                                | OR 0.83% [64]                              | OR 0.69% [65]
Cell tracking                                    | Capable of tracking one cell at a time [66] | Capable of tracking hundreds of cells simultaneously [67-68]

Please note: Jaccard Index, DICE and Score: the higher the better; VOE and OR: the lower the better.
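Table 4 mixes several overlap metrics. For reference, the Jaccard Index and the volumetric overlap error (VOE) reported there are commonly related by VOE = (1 − Jaccard) × 100%. A small sketch with illustrative toy masks:

```python
import numpy as np

def jaccard_index(pred, gt):
    """Jaccard = |A ∩ B| / |A ∪ B| for binary masks (higher is better)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: define as perfect agreement
    return np.logical_and(pred, gt).sum() / union

def voe_percent(pred, gt):
    """Volumetric overlap error: VOE = (1 - Jaccard) * 100 (lower is better)."""
    return 100.0 * (1.0 - jaccard_index(pred, gt))

# Toy masks: intersection of 2 pixels, union of 4 pixels.
pred = np.array([1, 1, 1, 0, 0], dtype=bool)
gt   = np.array([1, 1, 0, 1, 0], dtype=bool)
print(jaccard_index(pred, gt))  # 2 / 4 = 0.5
print(voe_percent(pred, gt))    # 50.0
```

This makes the table's mixed conventions directly comparable: e.g. a VOE of 9.15% corresponds to a Jaccard Index of about 0.91.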

5. Discussion

The fetishistic devotion to deep learning is mainly caused by its overhype, which is partially caused by the international challenges. Deep learning has monopolized the championships of these challenges; ironically, the organizers evaluate only the submitted results, instead of the source codes or the software, to rank the competitors. As a result, deep learning won almost all the international image segmentation challenges. According to Gary Marcus, a professor at New York University, "Deep learning systems are most often used as classification system in the sense that the mission of a typical network is to decide which of a set of categories (defined by the output variables on the neural network) a given input belongs to." [69]. According to our analysis in this paper, deep learning is a discrete, rather than continuous, regression model. As a discrete non-linear mapping function, deep learning is affected significantly when the number of outputs approaches infinity. Thus, deep learning cannot achieve pixel-level accuracy for image segmentation as traditional methods can. Compared to the regression model, deep learning has the major advantage of being more robust in estimating discrete response variables for linearly inseparable data. Up to now, deep learning has been recognized as the most powerful classification tool. However, deep learning also has the following limitations. (1) Deep learning determines a set of parameters instead of exact mathematical functions to map the input variables into the output variables. Without exact mathematical functions, not only can deep learning not be explained reasonably, but the output variables also cannot be determined as continuous response variables. Consequently, deep learning can only map sufficient inputs into a finite number of categories. When the number

of outputs is infinite or close to infinity, the performance of deep learning is affected significantly. (2) Lacking the exact mathematical mapping, deep learning cannot yield the ideal outputs even when it is tested with the inputs used during the training process or with the ground truth. (3) Lacking the exact mathematical mapping, deep learning is only powerful for classification applications, where sufficient input variables are available while the number of output variables is comparatively limited. In information theory, the Shannon theorem gives the maximum rate at which information can be transmitted over a channel of specified bandwidth in the presence of noise. In deep learning, there is no theorem to determine the maximum number of output variables when the input variables are known, which indicates that deep learning is immature. (4) The training process of deep learning relies on large amounts of human-annotated data to determine the parameters. However, when different annotated training data are used, the performance of deep learning may vary significantly. (5) There are no known efficient training algorithms with provable guarantees for deep learning, and its success relies on descent algorithms, such as coordinate, gradient or stochastic gradient descent [2]. (6) Since there are no guaranteed rules for the training process of deep learning, deep learning is easily overhyped or exploited by people who conduct false advertising for their deep learning based techniques or products.

6. Conclusion

In conclusion, deep learning can only yield a semantic-level prediction map for image segmentation and relies on other image segmentation techniques to obtain pixel-level accuracy. When the number of outputs approaches infinity, the approximating ability, i.e. the classification ability, of deep learning degrades significantly. Deep learning became so hot not because it is perfect or omnipotent, but because no alternative techniques are available yet. The excessive hype around deep learning is partially caused by some international challenges that have significant scientific impact but do not evaluate the competitors' algorithms strictly.

References
[1] Y. Bengio et al., Towards biologically plausible deep learning, arXiv:1502.04156, (2015).
[2] E. Abbe and C. Sandon, Provable limitations of deep learning, arXiv:1812.06369, pp. 1-52, (2018).
[3] L. Tóth, Phone recognition with hierarchical convolutional deep maxout networks, EURASIP Journal on Audio, Speech, and Music Processing, 25, (2015).
[4] O. Abdel-Hamid et al., Convolutional neural networks for speech recognition, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 10, pp. 1533-1545, (2014).
[5] D. Ciresan, U. Meier, J. Schmidhuber, Multi-column deep neural networks for image classification, 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3642-3649, (2012).
[6] D. Cireşan et al., Multi-column deep neural network for traffic sign classification, Neural Networks, 32, pp. 333-338, (2012).
[7] D. Cireşan, U. Meier, Multi-column deep neural networks for offline handwritten Chinese character classification, 2015 International Joint Conference on Neural Networks, Killarney, pp. 1-6, (2015).
[8] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, Communications of the ACM, 60(6), pp. 84-90, (2017).
[9] O. Russakovsky et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision, vol. 115, no. 3, pp. 211-252, (2015).
[10] E. Shelhamer, J. Long, T. Darrell, Fully convolutional networks for semantic segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, pp. 640-651, (2017).
[11] D. Ciresan et al., Deep neural networks segment neuronal membranes in electron microscopy images, Advances in Neural Information Processing Systems 25, pp. 2843-2851, (2012).
[12] D. Laptev et al., Anisotropic ssTEM image segmentation using dense correspondence across sections, Med Image Comput Comput Assist Interv, 15(Pt 1), pp. 323-330, (2012).
[13] P.
Arbelaez et al., Contour detection and hierarchical image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 5, pp. 898-916, (2011).
[14] I. Arganda-Carreras et al., Crowdsourcing the creation of image segmentation algorithms for connectomics, Frontiers in Neuroanatomy, vol. 9, no. 142, (2015).
[15] D. Ciresan et al., Mitosis detection in breast cancer histology images using deep neural networks, Proceedings MICCAI, (2013).
[16] L. Maier-Hein, M. Eisenmann, A. Reinke et al., Why rankings of biomedical competitions should be interpreted with care, Nature Communications, 9, 5217, (2018). https://doi.org/10.1038/s41467-018-07619-7
[17] P. Bizopoulos and D. Koutsouris, Deep learning in cardiology, IEEE Reviews in Biomedical Engineering, vol. 12, pp. 168-193, (2019).
[18] X.L. Yang, L. Gobeawan, S.Y. Yeo, W.T. Tang, Z.Z. Wu, Y. Su, Automatic segmentation of left ventricular myocardium by deep convolutional and de-convolutional neural networks, 2016 Computing in Cardiology Conference, Vancouver, BC, pp. 81-84, (2016).
[19] X.L. Yang, Z. Zeng, and Y. Su, Deep convolutional neural networks for automatic segmentation of left ventricle cavity from cardiac magnetic resonance images, IET Computer Vision, vol. 11, no. 8, pp. 643-649, (2017).
[20] H.F. Hu et al., Automatic segmentation of the left ventricle in cardiac MRI using local binary fitting model and dynamic programming techniques, PLoS ONE, vol. 9, no. 12, e114760, (2014).
[21] M. Avendi, A. Kheradvar, and H. Jafarkhani, A combined deep-learning and deformable-model approach to fully automatic segmentation of the left ventricle in cardiac MRI, Medical Image Analysis, vol. 30, pp. 108-119, (2016).
[22] L. K. Tan, Y. M. Liew, E. Lim, and R. A. McLaughlin, Cardiac left ventricle segmentation using convolutional neural network regression, Proc. IEEE EMBS Conf. Biomed. Eng. Sci., pp. 490-493, (2016).
[23] L. V. Romaguera, M. G. F. Costa, F. P. Romero, and C. F. F. Costa Filho, Left ventricle segmentation in cardiac MRI images using fully convolutional neural networks, Medical Imaging 2017: Computer-Aided Diagnosis, vol.
10134. Bellingham, WA, USA: Int. Soc. Opt. Photon., 2017, p. 101342Z. [24] C. Rupprecht, E. Huaroc, M. Baust, and N. Navab, "Deep active contours," 2016, arXiv:1607.05074. [25] T. Anh Ngo and G. Carneiro, "Fully automated non-rigid segmentation with distance regularized level set evolution initialized and constrained by deep-structured inference," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2014, pp. 3118–3125. [26] A. H. Curiale, F. D. Colavecchia, P. Kaluza, R. A. Isoardi, and G. Mato, "Automatic myocardial segmentation by using a deep learning network in cardiac MRI," in Proc. XLIII Latin American Computer Conference, 2017, pp. 1–6. [27] L. Qi, X. Y. Liu, B.Q. Yang and L. S. Xu, "Segmentation of left ventricle endocardium based on transfer learning of fully convolutional networks", Journal of Northeastern University (Natural Science), Vol. 39, Issue 11, pp. 1577-1582, 2018. [28] H.F. Hu, N. Pan, J.Y. Wang, T.L. Yin, R.Z. Ye, Automatic segmentation of left ventricle from cardiac MRI via deep learning and region constrained dynamic programming, Neurocomputing, Vol. 347, pp. 139-148, 2019. [29] Chen A. et al. (2018) Transfer Learning for the Fully Automatic Segmentation of Left Ventricle Myocardium in Porcine Cardiac Cine MR Images. In: Pop M. et al. (eds) Statistical Atlases and Computational Models of the Heart. ACDC and MMWHS Challenges. STACOM 2017. Lecture Notes in Computer Science, vol. 10663. Springer, Cham. [30] Zotti, C., Luo, Z., Humbert, O., Lalande, A., & Jodoin, P.-M. (2018). GridNet with Automatic Shape Prior Registration for Automatic MRI Cardiac Segmentation. Statistical Atlases and Computational Models of the Heart. ACDC and MMWHS Challenges, 73–81. doi:10.1007/978-3-319-75541-0_8 [31] Grinias E., Tziritas G. (2018) Fast Fully-Automatic Cardiac Segmentation in MRI Using MRF Model Optimization, Substructures Tracking and B-Spline Smoothing. In: Pop M. et al. (eds) Statistical Atlases and Computational Models of the Heart.
ACDC and MMWHS Challenges. STACOM 2017. Lecture Notes in Computer Science, vol. 10663. Springer, Cham. [32] Wolterink, J. M., Leiner, T., Viergever, M. A., & Išgum, I. (2018). Automatic Segmentation and Disease Classification Using Cardiac Cine MR Images. Statistical Atlases and Computational Models of the Heart. ACDC and MMWHS Challenges, 101–110. doi:10.1007/978-3-319-75541-0_11 [33] Baumgartner C.F., Koch L.M., Pollefeys M., Konukoglu E. (2018) An Exploration of 2D and 3D Deep Learning Techniques for Cardiac MR Image Segmentation. In: Pop M. et al. (eds) Statistical Atlases and Computational Models of the Heart. ACDC and MMWHS Challenges. STACOM 2017. Lecture Notes in Computer Science, vol. 10663. Springer, Cham. [34] Isensee F., Jaeger P.F., Full P.M., Wolf I., Engelhardt S., Maier-Hein K.H. (2018) Automatic Cardiac Disease Assessment on cine-MRI via Time-Series Segmentation and Domain Specific Features. In: Pop M. et al. (eds) Statistical Atlases and Computational Models of the Heart. ACDC and MMWHS Challenges. STACOM 2017. Lecture Notes in Computer Science, vol. 10663. Springer, Cham. [35] Isensee F., Jaeger P.F., Full P.M., Wolf I., Engelhardt S., Maier-Hein K.H. (2018) Automatic Cardiac Disease Assessment on cine-MRI via Time-Series Segmentation and Domain Specific Features. In: Pop M. et al. (eds) Statistical Atlases and Computational Models of the Heart. ACDC and MMWHS Challenges. STACOM 2017. Lecture Notes in Computer Science, vol. 10663. Springer, Cham. [36] Khened M., Alex V., Krishnamurthi G. (2018) Densely Connected Fully Convolutional Network for Short-Axis Cardiac Cine MR Image Segmentation and Heart Diagnosis Using Random Forest. In: Pop M. et al. (eds) Statistical Atlases and Computational Models of the Heart. ACDC and MMWHS Challenges.
STACOM 2017. Lecture Notes in Computer Science, vol. 10663. Springer, Cham. [37] Yang X., Bian C., Yu L., Ni D., Heng PA. (2018) Class-Balanced Deep Neural Network for Automatic Ventricular Structure Segmentation. In: Pop M. et al. (eds) Statistical Atlases and Computational Models of the Heart. ACDC and MMWHS Challenges. STACOM 2017. Lecture Notes in Computer Science, vol. 10663. Springer, Cham. [38] Jang Y., Hong Y., Ha S., Kim S., Chang HJ. (2018) Automatic Segmentation of LV and RV in Cardiac MRI. In: Pop M. et al. (eds) Statistical Atlases and Computational Models of the Heart. ACDC and MMWHS Challenges. STACOM 2017. Lecture Notes in Computer Science, vol. 10663. Springer, Cham. [39] B.Y. Song and J. Tian, "Automatic cardiac bi-ventricle MRI segmentation and disease classification using deep learning", thesis, Xidian University. [40] Yakun Chang, Baoyu Song, Cheolkon Jung, and Liyu Huang, Automatic Segmentation and Cardiopathy Classification in Cardiac MRI Images Based on Deep Neural Networks, IEEE International Conference on Acoustics, Speech and Signal Processing, April 15-20, 2018. [41] Yang X., Bian C., Yu L., Ni D., Heng PA. (2018) 3D Convolutional Networks for Fully Automatic Fine-Grained Whole Heart Partition. In: Pop M. et al. (eds) Statistical Atlases and Computational Models of the Heart. ACDC and MMWHS Challenges. STACOM 2017. Lecture Notes in Computer Science, vol. 10663. Springer, Cham. [42] Payer C., Štern D., Bischof H., Urschler M. (2018) Multi-label Whole Heart Segmentation Using CNNs and Anatomical Label Configurations. In: Pop M. et al. (eds) Statistical Atlases and Computational Models of the Heart. ACDC and MMWHS Challenges. STACOM 2017. Lecture Notes in Computer Science, vol. 10663. Springer, Cham. [43] T. Liu, Y. Tian, S. Zhao, X. Huang and Q. Wang, "Automatic Whole Heart Segmentation Using a Two-Stage U-Net Framework and an Adaptive Threshold Window," IEEE Access, vol. 7, pp. 83628-83636, 2019. [44] P.V.
Tran, A Fully Convolutional Neural Network for Cardiac Segmentation in Short-Axis MRI, arXiv:1604.00494 [cs], 2016. [45] L. K. Tan, Y. M. Liew, E. Lim, and R. A. McLaughlin, "Convolutional neural network regression for short-axis left ventricle segmentation in cardiac cine MR sequences," Med. Image Anal., vol. 39, pp. 78–86, 2017. [46] H. Yang, J. Sun, H. Li, L. Wang, and Z. Xu, "Deep fusion net for multiatlas segmentation: Application to cardiac MR images," in International Conference on Medical Image Computing and Computer-Assisted Intervention. New York, NY, USA: Springer, 2016, pp. 521–528. [47] L. K. Tan, Y. M. Liew, E. Lim, and R. A. McLaughlin, "Convolutional neural network regression for short-axis left ventricle segmentation in cardiac cine MR sequences," Medical Image Analysis, vol. 39, pp. 78–86, 2017. [48] Z.Z. Wang, "A new approach for segmentation and quantification of cells or nanoparticles," IEEE Trans. Ind. Inform., vol. 12, no. 3, pp. 962-971, 2016. [49] Z.Z. Wang, Automatic and Optimal Segmentation of the Left Ventricle in Cardiac Magnetic Resonance Images Independent of the Training Sets, IET Image Processing, doi:10.1049/iet-ipr.2018.5878, 2019. [50] Z. Wang, "Automatic Localization and Segmentation of the Ventricles in Magnetic Resonance Images," IEEE Transactions on Circuits and Systems for Video Technology, doi:10.1109/TCSVT.2020.2981530. [51] Khened, M., Kollerathu, V. A., & Krishnamurthi, G. (2019). Fully convolutional multi-scale residual DenseNets for cardiac segmentation and automated cardiac diagnosis using ensemble of classifiers. Medical Image Analysis, 51, 21–45. https://doi.org/10.1016/j.media.2018.10.004 [52] Bai, W., Sinclair, M., Tarroni, G., Oktay, O., Rajchl, M., Vaillant, G., Lee, A. M., Aung, N., Lukaschuk, E., Sanghvi, M. M., Zemrak, F., Fung, K., Paiva, J. M., Carapella, V., Kim, Y. J., Suzuki, H., Kainz, B., Matthews, P. M., Petersen, S. E., Piechnik, S. K., … Rueckert, D. (2018).
Automated cardiovascular magnetic resonance image analysis with fully convolutional networks. Journal of Cardiovascular Magnetic Resonance, 20(1), 65. https://doi.org/10.1186/s12968-018-0471-x [53] https://github.com/mahendrakhened/Automated-Cardiac-Segmentation-and-Disease-Diagnosis [54] https://github.com/baiwenjia/ukbb_cardiac [55] Z.H. Wang and Z.Z. Wang, "Fully automated segmentation of the left ventricle in Magnetic Resonance Images," 2020, arXiv:2007.10665 [56] https://www.mathworks.com/matlabcentral/fileexchange/78417-sdd-lv-segmentation-for-comparison-with-dl-and-cnn-methods [57] Li, B., Liu, Y., Occleshaw, C.J., Cowan, B.R., Young, A.A., 2010. In-line automated tracking for ventricular function with magnetic resonance imaging. JACC 3, 860–866. doi:10.1016/j.jcmg.2010.04.013. [58] P. Hu et al., Automatic 3D Liver Segmentation Based on Deep Learning and Globally Optimized Surface Evolution, Phys. Med. Biol., vol. 61, no. 24, pp. 8676–8698 (2016). [59] F. Lu et al., Automatic 3D liver location and segmentation via convolutional neural network and graph cut, Int J Comput Assist Radiol Surg, Vol. 12, No. 2, pp. 1-12 (2017). [60] S.D. S. Al-Shaikhli, M.Y. Yang and B. Rosenhahn, Automatic 3D liver segmentation using sparse representation of global and local image information via level set formulation, arXiv:1508.01521, (2015). [61] G. Li et al., Automatic liver segmentation based on shape constraints and deformable graph cut in CT images, IEEE Transactions on Image Processing, Vol. 24, No. 12, pp. 5315-5329 (2015). [62] G. Chlebus et al. Automatic liver tumor segmentation in CT with fully convolutional neural networks and
object-based postprocessing. Scientific Reports, Vol. 8, No. 15497, pp. 1-7, 2018. [63] L. Massoptier and S. Casciaro, A new fully automatic and robust algorithm for fast segmentation of liver tissue and tumors from CT scans, European Radiology, Vol. 18, No. 8, pp. 1658-1665, (2008). [64] Y. Al-Kofahi et al., A deep learning-based algorithm for 2-D cell segmentation in microscopy images, BMC Bioinformatics, Vol. 19, No. 365, pp. 1-11 (2018). [65] X. Zheng et al., Fast and robust segmentation of white blood cell images by self-supervised learning, Micron, Vol. 107, No. 1, pp. 55-71 (2018). [66] T. He et al., Cell tracking using deep neural networks with multi-task learning, Image and Vision Computing, Vol. 60, No. 1, pp. 142-153 (2017). [67] V. Ulman et al., An objective comparison of cell-tracking algorithms, Nature Methods, Vol. 14, No. 12, pp. 1141-1152, 2017. [68] Z.Z. Wang, L.J. Yin and Z.H. Wang, "A new approach for cell detection and tracking," IEEE Access, vol. 7, pp. 99889-99899, 2019. [69] G. Marcus, Deep learning: a critical appraisal, arXiv:1801.00631, pp. 1-27, (2018).