Deep Morphological Neural Networks
Yucong Shen, Xin Zhong, and Frank Y. Shih

Y. Shen is with the Computer Vision Laboratory, Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102 USA (email: [email protected]).
X. Zhong is with the Robotics, Networking, and Artificial Intelligence (R.N.A) Laboratory, Department of Computer Science, University of Nebraska at Omaha, Omaha, NE 68106 USA (email: [email protected]).
F. Y. Shih is with the Computer Vision Laboratory, Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102 USA (email: [email protected]).

Abstract—Mathematical morphology is a theory and technique for collecting features such as geometric and topological structures from digital images. Given a target image, determining suitable morphological operations and structuring elements is a cumbersome and time-consuming task. In this paper, a morphological neural network is proposed to address this problem. Serving as a nonlinear feature-extracting layer in deep learning frameworks, the efficiency of the proposed morphological layer is confirmed analytically and empirically. With a known target, a single-filter morphological layer learns the structuring element correctly, and an adaptive layer can automatically select appropriate morphological operations. For practical applications, the proposed morphological neural networks are tested on several classification datasets related to shape or geometric image features, and the experimental results confirm their high computational efficiency and high accuracy.

Index Terms—Deep learning, deep morphological neural network, morphological layer, image morphology.

I. INTRODUCTION

Mathematical morphology, based on set theory and topology, is a feature extracting and analyzing technique in which dilation and erosion, i.e., enlarging and shrinking objects respectively, are the two elementary operations. Applying structuring elements (SE) as sliding windows to identify features in digital images, mathematical morphology is used in many core image-analysis applications to extract features such as shapes, regions, edges, skeletons, and convex hulls [13]. Hence, it facilitates a wide range of applications such as defect extraction [4], edge detection [20], and image segmentation [14].

In computer vision, deep learning has become an efficient tool since the development of computer hardware has brought increased computational capability. Many deep learning structures have been proposed based on convolutional neural networks (CNN) [6]. For example, LeNet [8] was proposed for document recognition. Nowadays, deepening CNN structures enables many computer vision applications, especially image recognition [18].

In image morphology, given a desired target, it is time-consuming and cumbersome to determine appropriate morphological operations and SE. From the perspective of neural networks and deep learning, researchers have proposed the concept of morphological layers for this problem. Different from convolutional layers, which compute a linear weighted summation under each kernel applied to an image, the dilation and erosion layers in a morphological neural network (MNN) approximate local maxima and minima, and therefore provide nonlinear feature extractors. Similar to a CNN, which trains the weights in its convolution kernels, an MNN intends to learn the weights in the SE. In addition, a morphological layer also needs to select between the operations of dilation and erosion. Since the maximization and minimization operations in morphology are not differentiable, incorporating them into neural networks requires smooth approximating functions. Ritter et al. [11] presented an early attempt at an MNN formulated with image algebra [12]. Masci et al. [10] approximated dilation and erosion in a deep learning framework using the counter-harmonic mean; however, only pseudo-dilation and pseudo-erosion can be achieved, since the exact operations would require an infinitely large exponent. Most recently, Shih et al. [17] proposed a deep learning framework to tackle this issue and introduced smooth local maxima and minima layers as the feature extractors. Their MNN roughly approximates dilation and erosion and learns binary SE, but it cannot learn non-flat SE or the morphological operations themselves.

In this paper, we propose morphological layers for deep neural networks that can learn both the SE and the morphological operations. Given a target, a morphological layer learns the SE correctly, and an adaptive morphological layer can automatically select the appropriate operation by applying a smooth sign function to an extra trainable weight.
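As an illustration of this selection mechanism, the sketch below blends the soft dilation and erosion of Section II with a smooth sign gate. It is our own minimal rendering, not the paper's implementation: the tanh-based smooth sign, the slope `beta`, the blending scheme, and all names are assumptions.

```python
import numpy as np

def soft_dilation(x, w):
    # Smooth local maximum of w * x over one flattened window (cf. Eq. (2)).
    return np.log(np.sum(np.exp(w * x)))

def soft_erosion(x, w):
    # Smooth local minimum counterpart (cf. Eq. (3)).
    return -np.log(np.sum(np.exp(-w * x)))

def adaptive_morph(x, w, s, beta=10.0):
    # tanh(beta * s) is one differentiable stand-in for sign(s); the paper's
    # exact smooth sign function is not given in this excerpt.
    alpha = np.tanh(beta * s)        # smooth sign of the extra weight s
    blend = 0.5 * (1.0 + alpha)      # 1 selects dilation, 0 selects erosion
    return blend * soft_dilation(x, w) + (1.0 - blend) * soft_erosion(x, w)
```

Because every term is differentiable, such a gate weight s could be trained by backpropagation jointly with the SE weights.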
Considering the strength of morphology in analyzing shape features in images, a residual MNN pipeline and its applications are presented to validate the practicality of the proposed layer. The residual MNN structure is compared against a CNN of the same structure on several datasets related to shape features. Experimental results have validated the superiority of the proposed MNN in these tasks.

The remainder of this paper is organized as follows. Section II introduces the morphological layers and the residual morphological neural network for shape classification. Section III presents an adaptive morphological layer for determining the proper morphological operations from the original images and the desired result images. Section IV shows the experimental results. Finally, conclusions are drawn in Section V.

II. DEEP MORPHOLOGICAL NEURAL NETWORK

A. The Proposed Morphological Layer

Morphological dilation and erosion are approximated using the counter-harmonic mean in [10]. For a grayscale image $f(x)$ and a kernel $\omega(x)$, the core is to define a PConv layer as

  $PConv(f; \omega, P)(x) = \dfrac{(f^{P+1} * \omega)(x)}{(f^{P} * \omega)(x)}$  (1)

where $*$ denotes convolution and $P$ is a scalar controlling the choice of operation: $P < 0$ gives pseudo-erosion, $P > 0$ gives pseudo-dilation, and $P = 0$ reduces to the standard convolution $(f * \omega)(x)$. Since performing a true dilation or erosion requires $P$ to be an unachievable infinite number, this formulation can be highly inaccurate in implementation.

Shih et al. [17] represented dilation and erosion using the soft maximum and soft minimum functions. In a dilation layer, the $j$-th pixel of the $s$-th feature map $z^s \in \mathbb{R}^n$ is

  $z_j^s = \ln\left(\sum_{i=1}^{n} e^{\omega_i x_i}\right)$  (2)

where $n$ is the total number of weights in an SE, $x_i$ is the $i$-th element of the masked window in the input image, and $\omega_i$ is the $i$-th element of the current weights.
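Concretely, Eq. (2) (and the erosion form in Eq. (3) below) amounts to a log-sum-exp over each sliding window. The following is a minimal NumPy sketch, assuming stride 1 and "valid" padding as stated in the definitions below; the function and argument names are ours, not the paper's.

```python
import numpy as np

def soft_morph_2d(image, se, op="dilation"):
    """Slide an (a x b) weight matrix over a 2-D image with stride 1 and
    apply the log-sum-exp of Eq. (2), or its Eq. (3) erosion form, to
    each masked window."""
    a, b = se.shape
    H, W = image.shape
    sign = 1.0 if op == "dilation" else -1.0
    w = se.ravel()
    out = np.empty((H - a + 1, W - b + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            x = image[r:r + a, c:c + b].ravel()
            out[r, c] = sign * np.log(np.sum(np.exp(sign * w * x)))
    return out

# Example: a diamond-shaped 3x3 SE, like those used for the Fig. 2 targets.
se = np.array([[0., 1., 0.], [1., 1., 1.], [0., 1., 0.]])
img = np.random.default_rng(0).random((8, 8))
dilated = soft_morph_2d(img, se, op="dilation")   # shape (6, 6)
```

Replacing the hard maximum and minimum with log-sum-exp is what keeps the layer differentiable with respect to the SE weights.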
The computation of $z_j^s$ in an erosion layer is similar:

  $z_j^s = -\ln\left(\sum_{i=1}^{n} e^{-\omega_i x_i}\right)$  (3)

Although the model of Shih et al. [17] approximates dilation and erosion theoretically, it fails to learn the SE accurately. When trained on samples of input images and the desired morphological images, a single-layer, single-filter MNN always misses some elements of the SE. Fig. 1 presents the architecture of the single-layer MNN, and Fig. 2 shows some learned SE in which the errors can be observed. These missing points are caused by rounding errors: Eqs. (2) and (3) do not round the floating-point values when computing the maxima and minima in a sliding window, while the neural network tries to minimize the difference between the predicted and target images. As a result, the learned SE compensates for these floating-point residuals and hence contains the rounding errors. The notations and terms used in this paper are listed in Table I.

Fig. 1. Architecture of the single-layer MNN.

Fig. 2. (a) The horizontal, diagonal, vertical, and diamond 3 × 3 structuring elements applied to input images when creating target images; (b) the corresponding structuring elements learned by the single-dilation-layer MNN; (c) the original 45°, crossing 5 × 5 structuring elements and horizontal-line 1 × 5 structuring elements applied to the input images when creating target images; (d) the corresponding structuring elements learned by the single-dilation-layer MNN.

TABLE I
NOTATIONS AND TERMS USED IN THIS PAPER

  $\omega$ : the weights of a morphological layer
  $x$ : the input image of a morphological layer
  $b$ : the bias matrix
  $n = a \times b$ : the total number of weights in an SE, where $a$ is the width and $b$ is the height of the SE
  $W_i$ : the $i$-th element of the SE $W$
  $X_i$ : the $i$-th element of the current masked window in the original image $X$
  $\hat{y}$ : the output of the network
  $y$ : the target of the network

Definition 1: The differentiable binary dilation of the $j$-th pixel in an output image $Y \in \mathbb{R}^n$ is defined as

  $Y_j = \ln\left(\sum_{i=1}^{n} e^{W_i X_i}\right)$  (4)

where $W$ is a binary SE slid on the input image $X$, and the default stride is 1. We denote it as $W \oplus X$, where $W \in \mathbb{R}^n$ and $X \in \mathbb{R}^n$.

Definition 2: The differentiable binary erosion of the $j$-th pixel in an output image $Y \in \mathbb{R}^n$ is defined as

  $Y_j = -\ln\left(\sum_{i=1}^{n} e^{-W_i X_i}\right)$  (5)

where $W$ is the binary SE slid on the input image $X$, and the default stride is 1. We denote it as $W \ominus X$, where $W \in \mathbb{R}^n$ and $X \in \mathbb{R}^n$.

When the binary dilation is learned,

  $\ln\left(\sum_{i=1}^{n} e^{W_i X_i}\right) \geq \max(W_1 X_1, W_2 X_2, \ldots, W_n X_n)$  (6)

which indicates that

  $\ln\left(\sum_{i=1}^{n} e^{W_i X_i}\right) \geq X_i$  (7)

since $W_i X_i = X_i$ wherever $W_i = 1$. Therefore, we have

  $\sum_{i=1}^{n} e^{W_i X_i} \geq e^{X_i}$.

Definition 3: The differentiable grayscale dilation of the $j$-th pixel in an output image $Y \in \mathbb{R}^n$ is defined as

  $Y_j = \ln\left(\sum_{i=1}^{n} e^{W_i + X_i}\right)$  (13)

where $W$ is the non-flat SE slid on the input image $X$, and the default stride is 1. We denote it as $W \oplus_g X$, where $W \in \mathbb{R}^n$ and $X \in \mathbb{R}^n$.

Definition 4: The differentiable grayscale erosion of the $j$-th pixel in an output image $Y \in \mathbb{R}^n$ is defined as

  $Y_j = -\ln\left(\sum_{i=1}^{n} e^{-(W_i - X_i)}\right)$  (14)

where $W$ is the non-flat SE slid on the input image $X$, and the default stride is 1. We denote it as $W \ominus_g X$, where $W \in \mathbb{R}^n$ and $X \in \mathbb{R}^n$.

When learning a dilation with a non-flat SE as in Eq. (13), we should have

  $\ln\left(\sum_{i=1}^{n} e^{W_i + X_i}\right) \geq \max(W_1 + X_1, \ldots, W_n + X_n)$.  (15)

We can obtain
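These bounds are easy to verify numerically. The short check below is our own illustration (the random binary SE and window are arbitrary); it confirms that the log-sum-exp forms of Definitions 1 and 3 always upper-bound the exact local maximum, and that the gap is precisely the rounding error the learned SE must absorb.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.integers(0, 2, 9).astype(float)   # a random binary 3x3 SE, flattened
X = rng.random(9)                          # one masked window of the image

# Definition 1 vs. the exact local maximum: inequality (6).
soft = np.log(np.sum(np.exp(W * X)))
hard = np.max(W * X)
assert soft >= hard
print(f"binary:    soft={soft:.4f} >= max={hard:.4f}, gap={soft - hard:.4f}")

# Definition 3 vs. the exact local maximum: inequality (15).
soft_g = np.log(np.sum(np.exp(W + X)))
hard_g = np.max(W + X)
assert soft_g >= hard_g
print(f"grayscale: soft={soft_g:.4f} >= max={hard_g:.4f}, gap={soft_g - hard_g:.4f}")
```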