Constrained Deep Weak Supervision for Histopathology Image Segmentation
Zhipeng Jia, Xingyi Huang, Eric I-Chao Chang, and Yan Xu*

This work is supported by Microsoft Research under the eHealth program, the Beijing National Science Foundation in China under Grant 4152033, the Technology and Innovation Commission of Shenzhen in China under Grant shenfagai2016-627, Beijing Young Talent Project in China, and the Fundamental Research Funds for the Central Universities of China under Grant SKLSDE-2015ZX-27 from the State Key Laboratory of Software Development Environment in Beihang University in China. Asterisk indicates corresponding author.
Xingyi Huang and Yan Xu are with the State Key Laboratory of Software Development Environment and the Key Laboratory of Biomechanics and Mechanobiology of Ministry of Education and the Research Institute of Beihang University in Shenzhen, Beihang University, Beijing 100191, China (email: [email protected]; [email protected]).
Zhipeng Jia, Eric I-Chao Chang, and Yan Xu are with Microsoft Research, Beijing 100080, China (email: [email protected]; [email protected]; [email protected]).
Zhipeng Jia is with the Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China (email: [email protected]).

Abstract—In this paper, we develop a new weakly-supervised learning algorithm to learn to segment cancerous regions in histopathology images. Our work is under a multiple instance learning (MIL) framework with a new formulation, deep weak supervision (DWS); we also propose an effective way to introduce constraints to our neural networks to assist the learning process. The contributions of our algorithm are threefold: (1) We build an end-to-end learning system that segments cancerous regions with fully convolutional networks (FCN) in which image-to-image weakly-supervised learning is performed. (2) We develop a deep weak supervision formulation to exploit multi-scale learning under weak supervision within fully convolutional networks. (3) Constraints about positive instances are introduced in our approach to effectively explore additional weakly-supervised information that is easy to obtain and provides a significant boost to the learning process. The proposed algorithm, abbreviated as DWS-MIL, is easy to implement and can be trained efficiently. Our system demonstrates state-of-the-art results on large-scale histopathology image datasets and can be applied to various applications in medical imaging beyond histopathology images, such as MRI, CT, and ultrasound images.

Index Terms—Convolutional neural networks, histopathology image segmentation, weakly supervised learning, fully convolutional networks, multiple instance learning.

I. INTRODUCTION

High resolution histopathology images play a critical role in cancer diagnosis, providing essential information to separate non-cancerous tissues from cancerous ones. A variety of classification and segmentation algorithms have been developed in the past [1], [2], [3], [4], [5], [6], [7], [8], focusing primarily on the design of local pathological patterns, such as morphological [2], geometric [1], and texture [9] features based on various clinical characteristics.

In medical imaging, supervised learning approaches [10], [11], [12] have shown their particular effectiveness in performing image classification and segmentation for modalities such as MRI, CT, and ultrasound. However, the success of these algorithms depends on the availability of a large amount of high-quality manual annotations/labeling that are often time-consuming and costly to obtain. In addition, well-experienced medical experts themselves may disagree on ambiguous and challenging cases. Unsupervised learning strategies, where no expert annotations are needed, point to a promising but thus far not clinically practical direction.

In-between supervised and unsupervised learning, weakly-supervised learning, in which only coarse-grained (image-level) labeling is required, strikes a good balance: it needs a moderate level of annotation by experts while being able to automatically explore fine-grained (pixel-level) classification [13], [14], [15], [16], [17], [18], [19], [20]. In pathology, a pathologist annotates whether a given histopathology image has a cancer or not; a weakly-supervised learning algorithm would then automatically detect and segment cancerous tissues based on a collection of histopathology (training) images annotated by expert pathologists. This process, which substantially reduces the amount of work for annotating cancerous tissues/regions, falls into the category of weakly-supervised learning, or more specifically multiple instance learning [13], which is the main topic of this paper.

Multiple instance learning (MIL) was first introduced by Dietterich et al. [13] to predict drug activities; a wealth of MIL-based algorithms was developed thereafter [21], [22], [14].
In multiple instance learning, instances arrive together in groups during training, known as bags, each of which is assigned a positive or a negative label (which can be multi-class), but instance-level labels are absent (as shown in Figure 1). In the original MIL setting [13], each bag consists of a number of organic molecules as instances; the task was to predict instance-level labels for the training/test data, in addition to being able to perform bag-level classification. In our case here, each histopathology image with a cancer or non-cancer label forms a bag and each pixel in the image is referred to as an instance (note that the instance features are computed based on each pixel's surroundings beyond the single pixel itself).

Despite the great success of MIL approaches [13], [14], [15] that explicitly deal with the latent (instance-level) labels, one big problem with many existing MIL algorithms is the use of pre-specified features [21], [14], [16]. Although algorithms like MILBoost [14] have embedded feature selection procedures, their input feature types are nevertheless fixed and pre-specified. To this point, it is natural to develop an integrated framework by combining the MIL concept with convolutional neural networks (CNN), which automatically learn rich hierarchical features with state-of-the-art classification/recognition results. A previous approach that adopts CNN in a MIL formulation was recently proposed [17], but its greatest limitation is the use of image patches instead of full images, making the learning process slow and ineffective. For patch-based approaches: (1) the image patch size has to be specified in advance; (2) every pixel, as the center of a patch, is potentially an instance, resulting in millions of patches to be extracted even for a single image; (3) feature extraction for image patches is not efficient. Beyond the patch-centric CNN framework is the image-centric paradigm, where image-to-image prediction can be performed by fully convolutional networks (FCN) [23] in which features for all pixels are computed altogether. The efficiency and effectiveness of both training and testing by FCN family models have shown great success in various computer vision applications such as image labeling [23], [24] and edge detection [25]. An early version of FCN applied in MIL was proposed in [26], which was extended into a more advanced model [18].

Fig. 1: Illustration of the learning procedure of a MIL algorithm. Our training dataset is denoted by $S = \{(X_i, Y_i), i = 1, 2, 3, \ldots, n\}$, where $X_i$ indicates the $i$th input image, and $Y_i \in \{0, 1\}$ represents its corresponding manual label ($Y_i = 0$ refers to a non-cancer image and $Y_i = 1$ refers to a cancer image). Given an input image, a classifier $C$ generates pixel-level predictions. Then, the image-level prediction $\hat{Y}_i$ is computed from the pixel-level predictions via a softmax function. Next, a loss between the ground truth $Y_i$ and the image-level prediction $\hat{Y}_i$ is computed for the $i$th input image, denoted by $l_i(Y_i, \hat{Y}_i)$. Finally, an objective loss function $\mathcal{L}$ takes the sum of the loss functions of all input images. The classifier $C$ is learned by minimizing the objective loss function.

In this paper, we first build an FCN-based multiple instance learning framework to serve as our baseline algorithm for weakly-supervised learning of histopathology image segmentation. The main focus of this paper is the introduction of deep weak supervision and constraints to our multiple instance learning framework. We abbreviate our deep weak supervision for multiple instance learning as DWS-MIL and our constrained deep weak supervision for multiple instance learning as CDWS-MIL. The concept of deep supervision in supervised learning was introduced in [27], which is combined with FCN for edge detection [25]. We propose a deep weak supervision strategy in which the intermediate FCN layers are expected to be further guided through weakly-supervised information within their own layers.

We also introduce area constraints that only require a small amount of additional labeling effort but are shown to be immensely effective. That is, in addition to the annotation of being a cancerous or non-cancerous image, we ask pathologists to give a rough estimation of the relative size (e.g. 30%) of cancerous regions within each image; this rough estimation is then turned into an area constraint in our MIL formulation.

Our motivation to introduce area constraints is three-fold. First, having informative but easy-to-obtain expert annotation can always help the learning process, and we are encouraged to seek information beyond being just positive or negative. There exists a study in cognitive science [28] indicating the natural surfacing of the concept of relative size when making a discrete yes-or-no decision. Second, our DWS-MIL formulation under an image-to-image paradigm allows the additional term of the area constraints to be conveniently carried out through back-propagation, which is nearly impossible to do if a patch-based approach is adopted [16], [17]. Third, having area constraints conceptually and mathematically greatly enhances learning capability; this is evident in our experiments, where a significant performance boost is observed using the area constraints.

To summarize, in this paper we develop a new multiple instance learning algorithm for histopathology image segmentation under a deep weak supervision formulation, abbreviated as DWS-MIL. The contributions of our algorithm include: (1) DWS-MIL is an end-to-end learning system that performs image-to-image learning and prediction under weak supervision. (2) Deep weak supervision is adopted in each intermediate layer to exploit nested multi-scale feature learning. (3) Area constraints are also introduced as weak supervision, which is shown to be particularly effective in the learning process, significantly enhancing segmentation accuracy with very little extra work during the annotation process. In addition, we experiment with the adoption of super-pixels [29] as an alternative to pixels and show their effectiveness in maintaining intrinsic tissue boundaries in histopathology images.

II. RELATED WORK

Related work can be divided into three broad categories: (1) directly related work, (2) weakly supervised learning in computer vision, and (3) weakly supervised learning in medical imaging.
A. Directly related work

Three existing approaches that are closely related to our work are discussed below.

Xu et al. [16] propose a histopathology image segmentation algorithm in which the concept of multiple clustered instance learning (MCIL) is introduced. The MCIL algorithm [16] can simultaneously perform image-level classification, patch-level segmentation, and patch-level clustering. However, as mentioned previously, their approach is a patch-based system that is extremely space-demanding (requiring large disk space to store the features) and time-consuming to train. In addition, a boosting algorithm is adopted in [16] with all feature types pre-specified, whereas features in our approach are automatically learned.

Pathak et al. present an early version of fully convolutional networks applied in a multiple instance learning setting [26], and they later generalize the algorithm by introducing a new loss function to optimize for any set of linear constraints on the output space [18]. Typical linear constraints include suppression, foreground, background, and size constraints. Compared with the generalized constrained optimization in their model, the area constraints proposed in this paper are simpler to carry out through back-propagation within MIL. Moreover, our formulation of deep weak supervision combined with area constraints demonstrates its particular advantage in histopathology image segmentation, where only two-class (positive and negative) classification is studied.

The holistically-nested edge detector (HED) is developed in [30] by combining deep supervision with fully convolutional networks to effectively learn edges and object boundaries. Our deep weak supervision formulation is inspired by HED, but we instead focus on a weakly-supervised learning setting as opposed to the fully supervised setting in HED. Our deep weak supervision demonstrates its power under an end-to-end MIL framework.

B. Weakly supervised learning in computer vision

A rich body of weakly-supervised learning algorithms exists in computer vision, and we discuss them in two groupings: segmentation based and detection based.

Segmentation. In computer vision, MIL has been applied to segmentation in many previous systems [31], [32], [33], [34]. A patch-based approach would extract pre-specified image features from selected image patches [31], [32] and try to learn the hidden instance labeling under MIL. The limitations of these approaches are apparent, as stated before, requiring significant space and computation. More recently, convolutional neural networks have become increasingly popular. Pinheiro et al. [33] propose a convolutional neural network-based model which weights important pixels during training. Papandreou et al. [34] propose an expectation-maximization (EM) method using image-level and bounding box annotation in a weakly-supervised setting.

Object detection. MIL has also been applied to object detection, where the instances are now image patches of varying sizes, also referred to as sliding windows. The space for storing all instances is enormous, and proposals are often used to limit the number of possible instances [35]. Many algorithms exist in this domain and we name a couple here. Cinbis et al. [36] propose a multi-fold multiple instance learning procedure, which prevents training from prematurely looking at all object locations; this method iteratively trains a detector and infers object locations. Diba et al. [37] propose a cascaded network structure which is composed of two or three stages and is trained in an end-to-end pipeline.

C. Weakly supervised learning in medical imaging

Weakly-supervised learning has been applied to medical images as well. Yan et al. [38] propose a multi-instance deep learning method that automatically discovers discriminative local anatomies for anatomical structure recognition; positive instances are defined as contiguous bounding boxes and negative instances (non-informative anatomy) are randomly selected from the background. A weakly-supervised learning approach is also adopted by Hou et al. [39] to train convolutional neural networks to classify gigapixel-resolution histopathology images. Though promising, existing methods in medical imaging lack an end-to-end learning strategy for image-to-image learning and prediction under MIL.

III. METHOD

In this section, we present in detail the concept and formulation of our algorithms. First, we introduce our baseline algorithm, a method in spirit similar to the FCN-MIL method [26]; our method focuses on two-class classification, whereas FCN-MIL is a multi-class approach with some preliminary results shown for natural image segmentation. We then discuss the main part of this work, deep weak supervision for MIL (DWS-MIL) and constrained deep weak supervision for MIL (CDWS-MIL).

A. Our Baseline

Here, we build an end-to-end MIL method as our baseline to perform image-to-image learning and prediction, in which the MIL formulation enables automatic learning of pixel-level segmentation from image-level labels.

We denote our training dataset by $S = \{(X_i, Y_i), i = 1, 2, 3, \ldots, n\}$, where $X_i$ denotes the $i$th input image and $Y_i \in \{0, 1\}$ refers to the manual annotation (ground truth label) assigned to the $i$th input image. Here $Y_i = 0$ refers to a non-cancer image and $Y_i = 1$ refers to a cancerous image. Figure 1 demonstrates the basic concept. As mentioned previously, our task is to perform pixel-level prediction learned from image-level labels, and each pixel is referred to as an instance in this case. We denote $\hat{Y}_{ik}$ as the probability of the $k$th pixel being positive in the $i$th image, where $k = \{1, 2, \ldots, |X_i|\}$ and $|X_i|$ represents the total number of pixels of image $X_i$. If an image-level prediction $\hat{Y}_i$ can be computed from all $\hat{Y}_{ik}$, then it can be used against the true image-level label $Y_i$ to calculate a loss $\mathcal{L}_{mil}$. The loss function we opt to use is the cross-entropy cost function:

$$\mathcal{L}_{mil} = -\sum_i \left[ I(Y_i = 1) \log \hat{Y}_i + I(Y_i = 0) \log(1 - \hat{Y}_i) \right], \quad (1)$$

where $I(\cdot)$ is an indicator function.

Since an image is identified as negative if and only if there do not exist any positive instances, $\hat{Y}_i$ is typically obtained by $\hat{Y}_i = \max_k \hat{Y}_{ik}$, resulting in a hard maximum approach. However, there are two problems with the hard maximum approach: (1) it makes the derivative $\partial \hat{Y}_i / \partial \hat{Y}_{ik}$ discontinuous, leading to numerical instability; (2) $\partial \hat{Y}_i / \partial \hat{Y}_{ik}$ would be 0 for all but the maximum $\hat{Y}_{ik}$, rendering the learner unable to consider all instances simultaneously. Therefore, a softmax function is often used to replace the hard maximum. We use the Generalized Mean (GM) as our softmax function [14], which is defined as

$$\hat{Y}_i = \left( \frac{1}{|X_i|} \sum_{k=1}^{|X_i|} \hat{Y}_{ik}^{\,r} \right)^{1/r}. \quad (2)$$

The parameter $r$ controls the sharpness and proximity to the hard maximum: $\hat{Y}_i \to \max_k \hat{Y}_{ik}$ as $r \to \infty$.
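To make the bag-level aggregation concrete, below is a minimal NumPy sketch of the GM softmax of equation (2) and the image-level cross-entropy loss of equation (1). The function names, the clipping constants, and the toy inputs are our own illustration, not code from the paper.

```python
import numpy as np

def generalized_mean(pixel_probs, r=4.0):
    """Generalized Mean (GM) softmax of equation (2): a smooth stand-in
    for the hard max over instance (pixel) probabilities."""
    p = np.clip(pixel_probs, 1e-6, 1.0)   # guard against 0**r and log(0) downstream
    return np.mean(p ** r) ** (1.0 / r)

def mil_loss(images_pixel_probs, labels, r=4.0):
    """Bag-level cross-entropy loss of equation (1).

    images_pixel_probs: list of 1-D arrays, one per image, holding Y_hat_ik.
    labels: array of image-level labels Y_i in {0, 1}.
    """
    loss = 0.0
    for probs, y in zip(images_pixel_probs, labels):
        y_hat = np.clip(generalized_mean(probs, r), 1e-6, 1.0 - 1e-6)
        loss += -(y * np.log(y_hat) + (1 - y) * np.log(1.0 - y_hat))
    return loss

# Toy usage: a "cancer" image with a few confident pixels and a "non-cancer" image.
probs_pos = np.array([0.05, 0.10, 0.90, 0.85])
probs_neg = np.array([0.05, 0.08, 0.04, 0.07])
print(mil_loss([probs_pos, probs_neg], np.array([1, 0])))
```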

We replace the classifier $C$ in Figure 1 with a fully convolutional network (FCN) [23] using a trimmed VGGNet [40] under the MIL setting. To minimize the loss function via back-propagation, we calculate $\partial \mathcal{L}_{mil} / \partial \hat{Y}_{ik}$ from $\partial \mathcal{L}_{mil} / \partial \hat{Y}_i$. By the chain rule of differentiation,

$$\frac{\partial \mathcal{L}_{mil}}{\partial \hat{Y}_{ik}} = \frac{\partial \mathcal{L}_{mil}}{\partial \hat{Y}_i} \, \frac{\partial \hat{Y}_i}{\partial \hat{Y}_{ik}}. \quad (3)$$

It suffices to know $\partial \hat{Y}_i / \partial \hat{Y}_{ik}$, whose analytical expression can be derived from the softmax function itself. Once $\partial \mathcal{L}_{mil} / \partial \hat{Y}_{ik}$ is known, back-propagation can be performed.
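For completeness, the derivative required above has a simple closed form under the GM softmax of equation (2); this short derivation is ours and is not spelled out in the original text:

$$\frac{\partial \hat{Y}_i}{\partial \hat{Y}_{ik}} = \frac{1}{|X_i|} \left( \frac{1}{|X_i|} \sum_{j=1}^{|X_i|} \hat{Y}_{ij}^{\,r} \right)^{\frac{1}{r}-1} \hat{Y}_{ik}^{\,r-1} = \frac{1}{|X_i|} \left( \frac{\hat{Y}_{ik}}{\hat{Y}_i} \right)^{r-1},$$

so every instance receives a non-zero gradient that grows with its own probability, unlike the hard maximum.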
In Figure 2, a training image and its learned instance-level predictions are illustrated. Instance-level predictions are shown as a heatmap, which gives the probability of each pixel being cancerous. We use a color coding bar to illustrate the probabilities ranging between 0 and 1. Note that in the following figures, the instance-level predictions (segmentations) are all displayed as heatmaps and we no longer show the color coding bar for simplicity.

Fig. 2: Probability map of an image for all instances. (a) Training image. (b) Instance-level probabilities (segmentation) of being positive (cancerous) by our baseline algorithm. The color coding bar indicates a probability ranging between 0 and 1.

B. Constrained Deep Weak Supervision

After the introduction of our baseline algorithm, which is an FCN-like model under MIL, we are ready to introduce the main part of our algorithm, constrained deep weak supervision for histopathology image segmentation.

We denote our training set as $S = \{(X_i, Y_i, a_i), i = 1, 2, 3, \ldots, n\}$, where $X_i$ refers to the $i$th input image, $Y_i \in \{0, 1\}$ indicates the corresponding ground truth label for the $i$th input image, and $a_i$ specifies a rough estimation of the relative area of the cancerous region within image $X_i$. The $k$th pixel in the $i$th image is given a prediction of the probability of being positive, denoted as $\hat{Y}_{ik}$, where $k = \{1, 2, \ldots, |X_i|\}$ and there are $|X_i|$ pixels in the $i$th image. We denote the parameters of the network as $\theta$, and the model is trained to minimize a total loss.

Deep weak supervision. Aiming to control and guide the learning process across multiple scales, we introduce deep weak supervision by producing side-outputs, forming the multiple instance learning framework with deep weak supervision, named DWS-MIL. The concept of a side-output is similar to that defined in [30].

Suppose there are $T$ side-output layers; each side-output layer is connected with an accompanying classifier with the weights $\mathbf{w} = (\mathbf{w}^{(1)}, \ldots, \mathbf{w}^{(T)})$, where $t = \{1, 2, \ldots, T\}$. Our goal is to train the model by minimizing a loss between the output predictions and the ground truth, described in the form of a cross-entropy loss function $l_{mil}^{(t)}$, indicating the loss produced by the $t$th side-output layer relative to the image-level ground truth. The cross-entropy loss function in each side-output layer is defined as

$$l_{mil}^{(t)} = -\sum_i \left[ I(Y_i = 1) \log \hat{Y}_i^{(t)} + I(Y_i = 0) \log(1 - \hat{Y}_i^{(t)}) \right]. \quad (4)$$

The loss function brought by the $t$th side-output layer is defined as

$$l_{side}^{(t)}(\theta, \mathbf{w}) = l_{mil}^{(t)}(\theta, \mathbf{w}). \quad (5)$$

The objective function is defined as

$$\mathcal{L}_{side}(\theta, \mathbf{w}) = \sum_{t=1}^{T} l_{side}^{(t)}(\theta, \mathbf{w}). \quad (6)$$

Deep weak supervision with constraints. Our baseline MIL formulation produces a decent result, as shown in the experiments, but still has room to improve. One problem is that the positive instances predicted by the algorithm tend to progressively outgrow the true cancerous regions. Here we propose to use an area constraint term to constrain the expansion of the positive instances during training, and we name our new algorithm constrained deep weak supervision, abbreviated as CDWS-MIL.

A rough estimation of the relative size of the cancerous region, $a_i$, is given by the experts during the annotation process. A measure of the overall "positiveness" of all the instances in each image is calculated as

$$v_i = \frac{1}{|X_i|} \sum_{k=1}^{|X_i|} \hat{Y}_{ik}. \quad (7)$$

We then define an area constraint as an L2 loss:

$$l_{ac} = \sum_i I(Y_i = 1) \, (v_i - a_i)^2. \quad (8)$$

Naturally, the loss function for the $t$th side-output layer can be replaced by

$$l_{side}^{(t)}(\theta, \mathbf{w}) \leftarrow l_{mil}^{(t)}(\theta, \mathbf{w}) + \eta_t \cdot l_{ac}^{(t)}(\theta, \mathbf{w}), \quad (9)$$

where $l_{mil}^{(t)}(\theta, \mathbf{w})$ denotes the loss function given in equation (4), $l_{ac}^{(t)}(\theta, \mathbf{w})$ is the area constraints loss, and $\eta_t$ is a hyper-parameter specified manually to balance the two terms. The objective loss function is then still defined as the accumulation of the losses generated from each side-output layer, as described in equation (6).

Fusion model. In order to adequately leverage the multi-scale predictions across all the layers, we merge the side-output layers with each other to generate a fusion layer. $\hat{Y}_{side}^{(t)}$ is the predicted probability map at the $t$th side-output layer. The output of the fusion layer is defined as

$$\hat{Y}_{fuse} = \sum_{t=1}^{T} \alpha_t \hat{Y}_{side}^{(t)}, \quad (10)$$

where $\alpha_t$ refers to the weight of the probability map generated by the $t$th side-output layer.
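As a rough per-image illustration of equations (4)-(10), the following NumPy sketch computes one side-output loss with its area constraint and the weighted fusion of the side-output maps. The function names are ours; the example weights reuse the values reported later in the paper, and the toy maps are placeholders.

```python
import numpy as np

def gm_softmax(p, r=4.0):
    # Generalized Mean of equation (2).
    return np.mean(np.clip(p, 1e-6, 1.0) ** r) ** (1.0 / r)

def side_output_loss(side_probs, y, a, eta, r=4.0):
    """Loss of one side-output layer for one image, equations (4), (7)-(9).

    side_probs: flattened pixel probabilities of this side-output.
    y: image-level label in {0, 1}; a: annotated relative cancer area.
    eta: manually chosen weight of the area constraint for this layer.
    """
    y_hat = np.clip(gm_softmax(side_probs, r), 1e-6, 1.0 - 1e-6)
    l_mil = -(y * np.log(y_hat) + (1 - y) * np.log(1.0 - y_hat))   # eq. (4)
    v = side_probs.mean()                                          # eq. (7): overall "positiveness"
    l_ac = y * (v - a) ** 2                                        # eq. (8): only active for positive images
    return l_mil + eta * l_ac                                      # eq. (9)

def fused_prediction(side_prob_maps, alphas):
    """Weighted fusion of the side-output probability maps, equation (10)."""
    return sum(a * p for a, p in zip(alphas, side_prob_maps))

# Toy usage: three 4-pixel "maps" for one cancer image annotated as roughly 30% cancerous.
sides = [np.array([0.2, 0.3, 0.8, 0.7]),
         np.array([0.1, 0.2, 0.9, 0.8]),
         np.array([0.1, 0.1, 0.9, 0.9])]
etas = [2.5, 5.0, 10.0]
total_side_loss = sum(side_output_loss(s, y=1, a=0.3, eta=e) for s, e in zip(sides, etas))
y_fuse = fused_prediction(sides, alphas=[0.2, 0.35, 0.45])
print(total_side_loss, y_fuse)
```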

Fig. 3: Overview of our framework. Under the MIL setting, we adopt the first three stages of the VGGNet and connect side-output layers with deep weak supervision under MIL. We also propose area constraints to regularize the size of predicted positive instances. To utilize the multi-scale predictions of the individual layers, we merge side-outputs via a weighted fusion layer. The overall model of equation (13) is trained via back-propagation using the stochastic gradient descent algorithm.

Then, the fusion loss function is given as

$$\mathcal{L}_{fuse}(\theta, \mathbf{w}) = l_{mil}^{(fuse)}(\theta, \mathbf{w}) + \eta_{fuse} \cdot l_{ac}^{(fuse)}(\theta, \mathbf{w}), \quad (11)$$

where $l_{mil}^{(fuse)}(\theta, \mathbf{w})$ is the MIL loss of $\hat{Y}_{fuse}$ computed as in equation (4), $l_{ac}^{(fuse)}(\theta, \mathbf{w})$ is the area constraints loss of $\hat{Y}_{fuse}$ computed as in equation (8), and $\eta_{fuse}$ is a hyper-parameter specified manually to balance the two terms. The final objective loss function is defined as

$$\mathcal{L}(\theta, \mathbf{w}) = \mathcal{L}_{side}(\theta, \mathbf{w}) + \mathcal{L}_{fuse}(\theta, \mathbf{w}). \quad (12)$$

In the end, we minimize the overall loss function by the stochastic gradient descent algorithm during network training:

$$(\theta, \mathbf{w})^* = \arg\min_{\theta, \mathbf{w}} \mathcal{L}(\theta, \mathbf{w}). \quad (13)$$

To summarize, equation (13) gives the overall function to learn, which falls under general multiple instance learning with an end-to-end learning process. Our algorithm is built on top of fully convolutional networks with deep weak supervision and additional area constraints. The pipeline of our algorithm is illustrated in Figure 3. In our framework, we adopt the first three stages of the VGGNet, and the last convolutional layer of each stage is connected to a side-output. Pixel-level prediction maps can be produced by each side-output layer and the fusion layer. The fusion layer takes a weighted average of all side-outputs. The MIL formulation guides the learning of the entire network to make pixel-level predictions that better predict the image-level labels via softmax functions. In each side-output layer, the loss function $l_{mil}$ is computed in the form of deep weak supervision. Furthermore, the area constraints loss $l_{ac}$ makes it possible to constrain the size of the predicted cancerous tissues. Finally, the parameters of our network are learned by minimizing the objective function defined in equation (13) via back-propagation using the stochastic gradient descent algorithm.

C. Super-pixels

Treating each pixel as an instance may sometimes produce jagged tissue boundaries. We therefore alternatively explore another option for defining instances: super-pixels. Using super-pixels gives rise to a smaller number of instances and consistent elements that can be readily pre-computed using an over-segmentation algorithm [29]. Our effort starts with the SLIC method [29] to generate super-pixels by grouping the input image pixels into a number of small regions. These super-pixels act as our instances, but our main formulation stays the same: we minimize the overall objective function defined in equation (13).
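The sketch below illustrates one way the super-pixel instances of this section could be formed: SLIC groups pixels into regions, and each region's probability is pooled as the mean of the pixel probabilities inside it. The helper names, the segment count, and the mean pooling choice are our assumptions rather than the authors' implementation.

```python
import numpy as np
from skimage.segmentation import slic

def superpixel_probs(prob_map, image, n_segments=800):
    """Turn a pixel-level probability map into super-pixel instances.

    SLIC [29] over-segments the RGB image; each super-pixel's probability is
    the mean of the pixel probabilities it covers and then plays the role of
    Y_hat_ik in equations (2) and (7)."""
    segments = slic(image, n_segments=n_segments, compactness=10.0)
    labels = np.unique(segments)
    sp_probs = np.array([prob_map[segments == l].mean() for l in labels])
    return segments, sp_probs

def paint_back(segments, sp_probs):
    """Broadcast super-pixel probabilities back to a full-resolution map."""
    out = np.zeros(segments.shape, dtype=float)
    for l, p in zip(np.unique(segments), sp_probs):
        out[segments == l] = p
    return out
```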
IV. NETWORK ARCHITECTURE

We choose the 16-layer VGGNet [40] as the CNN architecture of our framework, which was pre-trained on the ImageNet 1K-class dataset and achieved state-of-the-art performance in the ImageNet challenge [41]. Although ImageNet consists of natural images, which are different from histopathology images, several previous works [42] have shown that networks pre-trained on ImageNet are also very effective in dealing with histopathology images. The VGGNet has 5 sequential stages before the fully-connected layers. Within each stage, two or three convolutional layers are followed by a 2-stride pooling layer. In our framework, we trim off the 4th and 5th stages and only adopt the first three stages. Side-output layers are connected to the last convolutional layer in each stage (see Table I). Each side-output layer is a 1 × 1 convolutional layer with a one-channel output and a sigmoid activation. This style of network architecture gives the side-output layers different strides and receptive field sizes, resulting in side-outputs of different scales.

Having three side-output layers, we add a fusion layer that takes a weighted average of the side-outputs to yield the final output. Note that, due to the different strides of the side-output layers, the sizes of the side-outputs are not the same. Hence, before the fusion, all side-outputs are upsampled to the size of the input image by bilinear interpolation.

TABLE I: The receptive field size and stride in the VGGNet [40]. In our framework, the first three stages are used, and the bolded parts indicate convolutional layers linked to additional side-output layers.

layer   | c1_2 | c2_2 | c3_3 | c4_3 | c5_3
rf size | 5    | 14   | 40   | 92   | 196
stride  | 1    | 2    | 4    | 8    | 16

The reason for trimming the VGGNet. In histopathology images, tissues appear as local texture patterns. In the 4th and 5th stages of the VGGNet, the receptive field sizes (see Table I) become too large for local textures. Figure 4 shows the side-outputs if all 5 stages of the VGGNet are adopted. From the figure, as the network goes deeper, the receptive field size increases and the side-output grows larger and coarser. In the 4th and 5th stages, the side-outputs almost fill the entire images, which becomes meaningless. Thus we ignore the 4th and 5th stages of the VGGNet in our framework, due to their overlarge receptive field sizes.

Fig. 4: Side-outputs from 5 stages of the VGGNet. As the network goes deeper, the receptive field size increases and the side-output grows larger and coarser. In the 4th and 5th stages, almost all the pixels are recognized as positive, and the positive areas almost cover the entire images. Therefore, we trim off the 4th and 5th stages in our framework.
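For orientation, here is a sketch of how the trimmed three-stage VGGNet with 1 × 1 side-output convolutions, bilinear upsampling, and fixed-weight fusion could be assembled in PyTorch. The original implementation is in Caffe; the class name, the stage slicing indices, and the use of torchvision are our assumptions, and in practice the ImageNet-pretrained weights described above would be loaded.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class CDWSNet(nn.Module):
    """Sketch of the trimmed three-stage VGGNet with side-outputs and fusion."""

    def __init__(self, fusion_weights=(0.2, 0.35, 0.45)):
        super().__init__()
        vgg = torchvision.models.vgg16()       # pretrained ImageNet weights loaded in practice
        feats = vgg.features
        self.stage1 = feats[:4]                # conv1_1-conv1_2, stride 1
        self.stage2 = feats[4:9]               # pool1, conv2_1-conv2_2, stride 2
        self.stage3 = feats[9:16]              # pool2, conv3_1-conv3_3, stride 4
        # One 1x1, one-channel side-output convolution per stage (Table I).
        self.side = nn.ModuleList([nn.Conv2d(c, 1, kernel_size=1) for c in (64, 128, 256)])
        self.fusion_weights = fusion_weights

    def forward(self, x):
        h, w = x.shape[2:]
        s1 = self.stage1(x)
        s2 = self.stage2(s1)
        s3 = self.stage3(s2)
        sides = []
        for feat, conv in zip((s1, s2, s3), self.side):
            out = torch.sigmoid(conv(feat))    # one-channel probability map
            # Upsample every side-output to the input resolution before fusion.
            sides.append(F.interpolate(out, size=(h, w), mode="bilinear", align_corners=False))
        fuse = sum(a * s for a, s in zip(self.fusion_weights, sides))
        return sides, fuse

# Example: y_sides, y_fuse = CDWSNet()(torch.randn(1, 3, 500, 500))
```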

V. EXPERIMENTS

In this section, we first describe the implementation details of our framework. Two histopathology image datasets are used to evaluate our proposed methods.

A. Implementation

We implement our framework on top of the publicly available Caffe toolbox [43]. Based on the official version of Caffe, we add a layer to compute the softmax of the generalized mean for pixel-level predictions and a layer to compute the area constraints loss from pixel-level predictions.

Model parameters. The MIL loss is known to be hard to train, and special care is required when choosing training hyper-parameters. In order to reduce fluctuations in optimizing the MIL loss, all training data are used in each iteration (the mini-batch size is equal to the size of the training set). The network is trained with the Adam optimizer [44], using a momentum of 0.9, a weight decay of 0.0005, and a fixed learning rate of 0.001. The learning rates of the side-output layers are set to 1/100 of the global learning rate. For the parameter of the generalized mean, we set r = 4.

Fusion layer. The fusion layer adopts the weighted average of the side-output layers. At the first attempt, we initialized all the fusion weights to 1/3 and let the model learn appropriate weights in the training phase. When the network converges, we observe that the outputs of the fusion layer are very close to the 3rd side-output layer, making the fusion results useless. The reason for this outcome is that the deeper side-output layer has a lower MIL loss as a result of more discriminative features. To resolve the problem, we use fixed fusion weights instead of learning them. Based on cross-validation on the training data, the fusion weights are finally chosen as 0.2, 0.35, and 0.45 for the three side-output layers, and a threshold of 0.5 is used to produce segmentation results.

Weight of area constraints loss. The weight of the area constraints loss is crucial for CDWS-MIL, since it directly decides the strength of the constraints. Strong constraints may make the network unable to converge, while weak constraints are of little help in learning better segmentations. To decide the appropriate loss weight, we select a validation set from the training data to evaluate different options. The loss weights of the area constraints for the different side-output layers are decided separately. To achieve this, when deciding the loss weight for a side-output layer, only this layer has area constraints. Finally, loss weights of 2.5, 5, 10, and 10 are selected for the three side-output layers and the fusion layer.

B. Experiment A

Dataset A is a histopathology image dataset of colon cancer, which consists of 330 cancer (CA) and 580 non-cancer (NC) images. In this dataset, 250 cancer and 500 non-cancer images are used for training; 80 cancer and 80 non-cancer images are used for testing. These images are obtained from the NanoZoomer 2.0HT digital slide scanner produced by Hamamatsu Photonics with a magnification factor of 40, i.e. 226 nm/pixel. Each image has a resolution of 3,000 × 3,000 pixels. Two pathologists were asked to label each image as cancerous or non-cancerous. When the two pathologists disagreed on a particular image, they would discuss it with another senior pathologist to reach an agreement. For evaluation purposes, we also ask the pathologists to annotate all the cancerous tissues in each image, which are only used in testing to evaluate our algorithms. For simplicity, in our tables we use CA to refer to cancer images and NC to refer to non-cancer images.

F-measure is used as the evaluation metric for experiments on Dataset A. Given the ground truth map G and the prediction map H, we define F-measure = (2 · precision · recall)/(precision + recall), in which precision = |H ∩ G|/|H| and recall = |H ∩ G|/|G|. For images with label Y = 1, the prediction map consists of pixels with 1 as the pixel-level prediction, and the ground truth map is the annotated cancerous region. For images with label Y = 0, the prediction map consists of pixels with 0 as the pixel-level prediction, and the ground truth map is the entire image.

Comparisons. Table II summarizes the results of our proposed algorithms and other methods on Dataset A. In all experiments, images are resized to 500 × 500 pixels for time efficiency. In MIL-Boosting, a patch size of 64 × 64 pixels and a stride of 4 pixels are used for both training and testing, and other settings follow [16].
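The per-image metric above translates directly into a few lines of NumPy; this sketch is ours, and the handling of empty masks is a choice we make for illustration rather than a detail given in the paper.

```python
import numpy as np

def f_measure(pred_mask, gt_mask):
    """Per-image F-measure: precision = |H ∩ G| / |H|, recall = |H ∩ G| / |G|."""
    H, G = pred_mask.astype(bool), gt_mask.astype(bool)
    inter = np.logical_and(H, G).sum()
    if H.sum() == 0 or G.sum() == 0 or inter == 0:
        return 0.0
    precision = inter / H.sum()
    recall = inter / G.sum()
    return 2 * precision * recall / (precision + recall)

# For a cancer image (Y = 1): H = (probability map >= 0.5), G = annotated cancer mask.
# For a non-cancer image (Y = 0): H = (probability map < 0.5), G = the whole image.
```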
To show the effectiveness of area constraints, we also integrate area constraints into our baseline, denoted as "our baseline w/ AC" in the table. From the table, DWS-MIL and CDWS-MIL surpass the other methods by large margins, and constrained deep weak supervision contributes an improvement of 7.3% over our baseline method (0.835 vs. 0.778). Figure 9 shows some examples of segmentation results by these methods.

TABLE II: Performance of various methods on Dataset A.

Method             | F-measure of CA | F-measure of NC
MIL-Boosting       | 0.684           | 0.997
our baseline       | 0.778           | 0.998
our baseline w/ AC | 0.815           | 0.998
DWS-MIL            | 0.817           | 0.999
CDWS-MIL           | 0.835           | 0.997

Less training data. To observe how the amount of training data influences our baseline method, we train our baseline with less training data. Table III summarizes the results, and Figure 5 shows some samples of segmentation results obtained with different amounts of training data. Given more training data, the segmentation performance is better. In the case of less training data, the segmentation results tend to be larger than the ground truth. This observation can be explained by analyzing the MIL formulation. From the expression of the MIL loss, identifying more pixels as positive in a positive image always results in a lower MIL loss. With a smaller number of negative training images, it is easier to achieve this objective.

TABLE III: Performance of our baseline trained with less training data.

Training data      | F-measure of CA      | F-measure of NC
(Pos, Neg)         | w/o AC  | w/ AC      | w/o AC  | w/ AC
20%  (50, 100)     | 0.758   | 0.801      | 0.997   | 0.997
40%  (100, 200)    | 0.762   | 0.809      | 0.997   | 0.999
60%  (150, 300)    | 0.778   | 0.805      | 0.999   | 0.999
80%  (200, 400)    | 0.779   | 0.813      | 0.998   | 0.999
100% (250, 500)    | 0.778   | 0.815      | 0.998   | 0.998

Fig. 6: Comparison of using and not using area constraints: (a) The input images. (b) Ground truth labels. (c) Results of our baseline. (d) Results of our baseline w/ AC. The area constraints loss constrains the model to learn better segmentations.

Fig. 5: Differences in results with different amounts of training data: (a) The input images. (b) Ground truth labels. (c) Results that use 20% of training data. (d) Results that use 40% of training data. (e) Results that use 60% of training data. (f) Results that use 80% of training data. (g) Results that use all the training data.

Area constraints. From Table III, the area constraints enable our baseline method to achieve a competitive accuracy with a small training set. Equipped with area constraints, our baseline method using 20% of the training data achieves better accuracy than using all training data without area constraints. Figure 6 shows some samples of segmentation results obtained with and without area constraints. It is clear that area constraints achieve the goal of constraining the model to learn smaller segmentations, which significantly improves segmentation accuracy for both cancer images and non-cancer images. When not using area constraints, the segmentation results are much larger than the ground truth and also have the tendency to cover entire images. In contrast, when the area constraints loss is integrated with the MIL loss, identifying too many pixels as positive yields a large area constraint loss that competes with the MIL loss. To achieve a balance between the MIL loss and the area constraints loss, the model only learns the most confident pixels as positive, as shown in Figure 6. Table IV summarizes the results of the baseline methods, DWS-MIL, CDWS-MIL, and MIL-Boosting using 20% of the training data. Comparing CDWS-MIL in Table IV with the other methods in Table II, CDWS-MIL outperforms the other methods using only 20% of the training data. In addition, constrained deep weak supervision contributes an improvement of 8.2% over our baseline method (0.820 vs. 0.758), which is larger than when all training data are used.

TABLE IV: Performance of various methods with 20% training data.

Method             | F-measure of CA | F-measure of NC
MIL-Boosting       | 0.635           | 0.995
our baseline       | 0.758           | 0.997
our baseline w/ AC | 0.800           | 0.997
DWS-MIL            | 0.808           | 0.998
CDWS-MIL           | 0.820           | 0.998

Deep weak supervision. To illustrate the effectiveness of deep weak supervision, Table V summarizes the segmentation accuracies of the different side-outputs and Figure 7 shows some examples of the different side-outputs. From Table V, we observe that segmentation accuracy improves from lower layers to higher ones. Figure 7 shows pixel-level predictions (segmentation) of side-output layer 1, side-output layer 2, and side-output layer 3. This is understandable since the receptive fields of the CNN become increasingly larger from lower layers to higher ones, while histopathology images typically exhibit local texture patterns. The final fusion layer that combines all the intermediate layers achieves the best result.

TABLE V: Performance of different side-output layers. The first row: DWS-MIL; the second row: CDWS-MIL.

          F-measure of CA                     F-measure of NC
          side1 | side2 | side3 | fusion      side1 | side2 | side3 | fusion
DWS-MIL   0.666 | 0.747 | 0.783 | 0.817       0.984 | 0.994 | 0.997 | 0.999
CDWS-MIL  0.660 | 0.783 | 0.819 | 0.835       0.984 | 0.994 | 0.997 | 0.997

Fig. 7: Results of side-output layers: (a) The input images. (b) Ground truth labels. (c) Results of side-output 1. (d) Results of side-output 2. (e) Results of side-output 3. (f) Results by final fusion. The figure shows a nesting characteristic of segmentation outputs from the lower side-output layer to the higher side-output layer. The final fusion balances the pros and cons of the different side-outputs and achieves better segmentation results than all of them.

Super-pixels. We conduct experiments to compare DWS-MIL and DWS-MIL w/ super-pixel. We adopt the SLIC method [29] to generate super-pixels. The average F-measures of DWS-MIL w/ super-pixel on cancer images and non-cancer images are 0.818 and 0.999, respectively. Figure 8 shows some samples of the segmentation results of the two methods. In histopathology images, super-pixels adhere well to tissue edges, resulting in more accurate segmentations. The adoption of super-pixels can help to predict more detailed boundaries.

Fig. 8: Comparisons of DWS-MIL and DWS-MIL w/ super-pixel: (a) The input images. (b) Results generated by the SLIC method [29]. (c) Ground truth labels. (d) Results of DWS-MIL. (e) Results of DWS-MIL w/ super-pixel. Some detailed edges can be recognized with the help of super-pixels.

Advantages of CDWS-MIL. MIL-Boosting, in comparison, is a patch-based MIL approach. The bags in its MIL formulation are composed of patches sampled from the input images. Figure 9 shows some samples of segmentation results of CDWS-MIL and MIL-Boosting, demonstrating that in some cases (like the 2nd row in the figure) MIL-Boosting completely fails to learn the correct segmentations, while in other cases (like the 5th row in the figure) CDWS-MIL and MIL-Boosting both learn roughly correct segmentations, but CDWS-MIL learns much more elaborate ones. There are three advantages of our framework CDWS-MIL over MIL-Boosting: (1) CDWS-MIL is an end-to-end framework, which can learn more detailed segmentations than the patch-based MIL-Boosting; (2) deep weak supervision enables CDWS-MIL to learn from multiple scales, and the fusion output balances outputs of different scales to achieve the best accuracy; (3) area constraints in CDWS-MIL are straightforward, while being hard to integrate into patch-based methods like MIL-Boosting.

Fig. 9: Segmentation results on Dataset A: (a) Input images. (b) Ground truth labels. (c) Results by MIL-Boosting. (d) Results by our baseline. (e) Results by our baseline w/ AC. (f) Results by DWS-MIL. (g) Results by CDWS-MIL. Compared with MIL-Boosting (patch-based), our proposed DWS-MIL and CDWS-MIL produce significantly improved results due to the characteristics we introduced in this paper.

C. Experiment B

Dataset B is a histopathology image dataset of 30 colon cancer images and 30 non-cancer images, which are referred to as tissue microarrays (TMAs). The dataset is randomly selected from the dataset in [16]. All images have a resolution of 1024 × 1024 pixels, and the rough estimations of the portion of cancerous regions have 8 levels: 0.05, 0.1, 0.15, ..., 0.4. They are annotated in the same way as Dataset A. We conduct experiments to compare MIL-Boosting with our proposed method CDWS-MIL on Dataset B. All experiments are conducted with 5-fold cross-validation, and the evaluation metric is the same as on Dataset A. The average F-measures of CDWS-MIL on cancer images and non-cancer images are 0.622 and 0.997, respectively. The average F-measures of MIL-Boosting on cancer images and non-cancer images are 0.449 and 0.993, respectively. Figure 10 shows some samples of the segmentation results of these two methods.

Fig. 10: Segmentation results on Dataset B: (a) Input images. (b) Ground truth labels. (c) Results by MIL-Boosting. (d) Results by CDWS-MIL. Compared with MIL-Boosting (patch-based), CDWS-MIL produces significantly improved results due to the characteristics we introduced in this paper.

VI. CONCLUSION

In this paper, we have developed an end-to-end framework under deep weak supervision to perform image-to-image segmentation for histopathology images. To better learn multi-scale information, deep weak supervision is developed in our formulation. Area constraints are also introduced in a natural way to seek additional weakly-supervised information. Experiments demonstrate that our methods attain state-of-the-art results on large-scale, challenging histopathology images. The scope of our proposed methods is quite broad, and they can be widely applied to a range of medical imaging and computer vision applications.

ACKNOWLEDGMENT

We would like to thank the Lab of Pathology and Pathophysiology, Zhejiang University in China for providing data and help.

REFERENCES

[1] A. Esgiar, R. Naguib, B. Sharif, M. Bennett, and A. Murray, “Fractal analysis in the detection of colonic cancer images,” IEEE Transactions on Information Technology in Biomedicine, vol. 6, no. 1, pp. 54–58, 2002.
[2] P.-W. Huang and C.-H. Lee, “Automatic classification for pathological prostate images based on fractal analysis,” TMI, vol. 28, no. 7, pp. 1037–1050, 2009.
[3] A. Madabhushi, “Digital pathology image analysis: opportunities and challenges,” Imaging in Medicine, vol. 1, no. 1, pp. 7–10, 2009.
[4] S. Park, D. Sargent, R. Lieberman, and U. Gustafsson, “Domain-specific image analysis for cervical neoplasia detection based on conditional random fields,” TMI, vol. 30, no. 3, pp. 867–78, 2011.
[5] A. Tabesh, M. Teverovskiy, H.-Y. Pang, V. Kumar, D. Verbel, A. Kotsianti, and O. Saidi, “Multifeature prostate cancer diagnosis and gleason grading of histological images,” TMI, vol. 26, no. 10, pp. 1366–78, 2007.
[6] A. Madabhushi and G. Lee, “Image analysis and segmentation in digital pathology: Challenges and opportunities,” Medical Image Analysis, vol. 33, no. 6, pp. 170–175, 2016.
[7] M. N. Gurcan, L. E. Boucheron, A. Can, A. Madabhushi, N. M. Rajpoot, and B. Yener, “Histopathological image analysis: A review,” IEEE Reviews in Biomedical Engineering, vol. 2, no. 25, pp. 147–171, 2009.
[8] J. Tang, R. M. Rangayyan, J. Xu, I. El Naqa, and Y. Yang, “Computer-aided detection and diagnosis of breast cancer with mammography: recent advances,” IEEE Transactions on Information Technology in Biomedicine, vol. 13, no. 2, pp. 236–251, 2009.
[9] J. Kong, O. Sertel, H. Shimada, K. L. Boyer, J. H. Saltz, and M. N. Gurcan, “Computer-aided evaluation of neuroblastoma on whole-slide histology images: Classifying grade of neuroblastic differentiation,” Pattern Recognition, vol. 42, no. 6, pp. 1080–1092, 2009.
[10] Y. Zheng, A. Barbu, B. Georgescu, M. Scheuering, and D. Comaniciu, “Four-chamber heart modeling and automatic segmentation for 3-d cardiac ct volumes using marginal space learning and steerable features,” TMI, vol. 27, no. 11, pp. 1668–1681, 2008.
[11] Z. Tu, “Auto-context and its application to high-level vision tasks,” in CVPR, 2008, pp. 1–8.
[12] A. Criminisi and J. Shotton, Decision forests for computer vision and medical image analysis. Springer Science & Business Media, 2013.


[13] T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez, “Solving the multiple instance problem with axis-parallel rectangles,” Artificial Intelligence, vol. 89, no. 1, pp. 31–71, 1997.
[14] C. Zhang, J. C. Platt, and P. A. Viola, “Multiple instance boosting for object detection,” in NIPS, 2005, pp. 1417–1424.
[15] O. Maron and A. L. Ratan, “Multiple-instance learning for natural scene classification,” in ICML, 1998, pp. 341–349.
[16] Y. Xu, J.-Y. Zhu, E. I.-C. Chang, M. Lai, and Z. Tu, “Weakly supervised histopathology cancer image segmentation and classification,” Medical Image Analysis, vol. 18, no. 3, pp. 591–604, 2014.
[17] Y. Xu, T. Mo, Q. Feng, P. Zhong, M. Lai, and E. I.-C. Chang, “Deep learning of feature representation with multiple instance learning for medical image analysis,” in ICASSP, 2014, pp. 1626–1630.
[18] D. Pathak, P. Krähenbühl, and T. Darrell, “Constrained convolutional neural networks for weakly supervised segmentation,” in ICCV, 2015, pp. 1796–1804.
[19] Y. Xu, J. Zhu, E. Chang, and Z. Tu, “Multiple clustered instance learning for histopathology cancer image classification, segmentation and clustering,” in CVPR, 2012, pp. 964–971.
[20] Y. Xu, J. Zhang, E. Chang, M. Lai, and Z. Tu, “Contexts-constrained multiple instance learning for histopathology image analysis,” in MICCAI, 2012, pp. 623–630.
[21] S. Andrews, I. Tsochantaridis, and T. Hofmann, “Support vector machines for multiple-instance learning,” in NIPS, 2002, pp. 561–568.
[22] O. Maron and T. Lozano-Pérez, “A framework for multiple-instance learning,” in NIPS, 1998, pp. 570–576.
[23] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in CVPR, 2015, pp. 3431–3440.
[24] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Semantic image segmentation with deep convolutional nets and fully connected crfs,” in ICLR, 2015.
[25] Y. Xie, X. Kong, F. Xing, F. Liu, H. Su, and L. Yang, “Deep voting: A robust approach toward nucleus localization in microscopy images,” in MICCAI, 2015.
[26] D. Pathak, E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional multi-class multiple instance learning,” in ICLR, 2014.
[27] C.-Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu, “Deeply-supervised nets,” in AISTATS, 2015.
[28] N. S. F. USA, “Converging technologies for improving human performance,” Annals of the New York Academy of Sciences, vol. 1013, pp. 223–226, 2004.
[29] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, “Slic superpixels compared to state-of-the-art superpixel methods,” TPAMI, vol. 34, no. 11, pp. 2274–2282, 2012.
[30] S. Xie and Z. Tu, “Holistically-nested edge detection,” in ICCV, 2015, pp. 1395–1403.
[31] Q. Li, J. Wu, and Z. Tu, “Harvesting mid-level visual concepts from large-scale internet images,” in CVPR, 2013, pp. 851–858.
[32] X. Wang, B. Wang, X. Bai, W. Liu, and Z. Tu, “Max-margin multiple instance dictionary learning,” in ICML, 2013, pp. 846–854.
[33] P. O. Pinheiro and R. Collobert, “From image-level to pixel-level labeling with convolutional networks,” in CVPR, 2015, pp. 1713–1721.
[34] G. Papandreou, L.-C. Chen, K. P. Murphy, and A. L. Yuille, “Weakly- and semi-supervised learning of a deep convolutional network for semantic image segmentation,” in ICCV, 2015, pp. 1742–1750.
[35] J.-Y. Zhu, J. Wu, Y. Xu, E. Chang, and Z. Tu, “Unsupervised object class discovery via saliency-guided multiple class learning,” TPAMI, vol. 37, no. 4, pp. 862–875, 2015.
[36] R. G. Cinbis, J. Verbeek, and C. Schmid, “Weakly supervised object localization with multi-fold multiple instance learning,” TPAMI, vol. 39, no. 1, pp. 189–203, 2016.
[37] A. Diba, V. Sharma, A. Pazandeh, H. Pirsiavash, and L. Van Gool, “Weakly supervised cascaded convolutional networks,” arXiv preprint arXiv:1611.08258, 2016.
[38] Z. Yan, Y. Zhan, Z. Peng, S. Liao, Y. Shinagawa, S. Zhang, D. N. Metaxas, and X. S. Zhou, “Multi-instance deep learning: Discover discriminative local anatomies for bodypart recognition,” TMI, vol. 35, no. 5, pp. 1332–1343, 2016.
[39] L. Hou, D. Samaras, T. M. Kurc, Y. Gao, J. E. Davis, and J. H. Saltz, “Efficient multiple instance convolutional neural networks for gigapixel resolution image classification,” arXiv preprint arXiv:1504.07947, 2015.
[40] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in ICLR, 2015.
[41] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “Imagenet large scale visual recognition challenge,” 2014.

[42] Y. Xu, Z. Jia, Y. Ai, F. Zhang, M. Lai, and E. I.-C. Chang, “Deep convolutional activation features for large scale brain tumor histopathology image classification and segmentation,” in ICASSP, 2015, pp. 947–951.
[43] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” arXiv preprint arXiv:1408.5093, 2014.
[44] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in ICLR, 2015.