Arxiv:1905.10901V1 [Cs.LG] 26 May 2019
Total Page:16
File Type:pdf, Size:1020Kb
Seeing Convolution Through the Eyes of Finite Transformation Semigroup Theory: An Abstract Algebraic Interpretation of Convolutional Neural Networks Andrew Hryniowski1;2;3, Alexander Wong1;2;3 1 Video and Image Processing Research Group, Systems Design Engineering, University of Waterloo 2 Waterloo Artificial Intelligence Institute, Waterloo, ON 3 DarwinAI Corp., Waterloo, ON fapphryni, [email protected] Abstract of applications, particularly for prediction using structured data. Despite such successes, a major challenge with lever- Researchers are actively trying to gain better insights aging convolutional neural networks is the sheer number of into the representational properties of convolutional neural learnable parameters within such networks, making under- networks for guiding better network designs and for inter- standing and gaining insights about them a daunting task. As preting a network’s computational nature. Gaining such such, researchers are actively trying to gain better insights insights can be an arduous task due to the number of pa- and understanding into the representational properties of rameters in a network and the complexity of a network’s convolutional neural networks, especially since it can lead architecture. Current approaches of neural network inter- to better design and interpretability of such networks. pretation include Bayesian probabilistic interpretations and One direction that holds a lot of promise in improving information theoretic interpretations. In this study, we take a understanding of convolutional neural networks, but is much different approach to studying convolutional neural networks less explored than other approaches, is the construction of by proposing an abstract algebraic interpretation using finite theoretical models and interpretations of such networks. Cur- transformation semigroup theory. Specifically, convolutional rent approaches of neural network interpretation include layers are broken up and mapped to a finite space. The state Bayesian probabilistic interpretations [14] and information space of the proposed finite transformation semigroup is theoretic interpretations [25, 19, 18]. Such theoretical mod- then defined as a single element within the convolutional els and interpretations could help guide and motivate design layer, with the acting elements defined by surrounding state decisions of convolutional neural networks. elements combined with convolution kernel elements. Gen- Motivated by this direction, in this study we take a dif- erators of the finite transformation semigroup are defined to ferent approach to interpreting and gaining insight into the complete the interpretation. We leverage this approach to nature of convolutional neural networks through the use of analyze the basic properties of the resulting finite transfor- finite transformation semigroup theory [8]. To achieve this mation semigroup to gain insights on the representational goal, we introduce an abstract algebraic interpretation of properties of convolutional neural networks, including in- convolutional operations enabled by a novel approach for sights into quantized network representation. Such a finite arXiv:1905.10901v1 [cs.LG] 26 May 2019 the interpretation of convolutional layers in convolutional transformation semigroup interpretation can also enable bet- neural networks as finite transformation semigroups. This ter understanding outside of the confines of fixed lattice data allows us to study the basic properties of the resulting finite structures, thus useful for handling data that lie on irregular transformation semigroup from an abstract algebraic inter- lattices. Furthermore, the proposed abstract algebraic inter- pretation to gain insights on the representational properties pretation is shown to be viable for interpreting convolutional of convolutional layers. operations within a variety of convolutional neural network architectures. The remainder of this paper is organized as follows. First, Section2 reviews convolution, discusses challenges of net- work analysis, introduces efforts into neural network quan- 1. Introduction tized representation, and defines the notion of finite trans- formation semigroups. Then, Section3 proposes a novel Convolutional neural networks (CNNs) [12] have demon- technique for interpreting convolutional layers within convo- strated tremendous success in recent years for a large range lutional neural networks as a finite transformation semigroup. 1 Figure 1. Two examples of finite transformation semigroups. Each transformation semigroup is depicted as an automaton in conjunction with the corresponding semigroup multiplication table. The finite transformation semigroup (X; S) on the left is generated by one element a. The finite transformation semigroup (Y; T ) on the right is generated by three elements b, c, and d. Both S and T consist of three unique functions. Next, Section4 analyzes properties of the proposed abstract where i is the index of the current layer in a given network, algebraic interpretation of convolutional neural networks. j is the index of the input channel being operation on, k is i Section5 studies the effect of using the proposed abstract the index of the output channel, xj is the feature map being algebraic interpretation with different number of states in the operated on, jcij is the number of input feature maps, and i+1 finite transformation semigroups to model convolution op- xk is the output feature map. Note that a bias term, which erations and the associated effects on network performance. is typically added to then entire output channel, is omitted Finally, Section6 summarizes the results and suggests future from Equation1 and is not considered as part of convolu- areas of research. tional operations in this work. Other convolution techniques include dilated convolution [23], depthwise separable convo- 2. Background lution [7], and rotation equivariant convolution [20]. In this section, we will review convolutions in the context 2.2. Network Analysis of convolutional neural networks, discuss the challenges of analyzing convolutional neural networks, discuss efforts into Convolutional neural networks in general are quite dif- neural network quantized representation, and define mathe- ficult to analyze. With an enormous number of trainable matically the notion of finite transformation semigroups. parameters it can be extremely difficult to determine what components of an input led to any given prediction, and 2.1. Convolutions in Convolutional Neural Net- more difficult still to determine what features a network has works learned to detect. Some convolutional neural networks are Convolutions are not only a key aspect of convolutional built for data that can not be easily mapped to a regular lat- neural networks, but also the major computational work tice. The lack of well-defined regular lattice structures in the horse of such networks. As such, a better understanding of data representation increases the difficulty of analyzing its what they are computing, and how they can be represented internal behavior. An issue with such convolutional neural and analyzed would allow for a host of interesting insights networks is that the network structure can only be applied that can drive better design, ranging from more efficient to a specific graph structure that is designed for a particular network architecture designs and representations [13, 1, 9, instantiation of the data being analyzed [16, 2]. Generalized 7, 17, 22, 24, 21] to new network architectures with greater analysis techniques that can be applied to analyze heteroge- representational capacity for improved accuracy [6, 26]. neous network structures are highly desired as they would Convolutional neural networks are primarily built from a allow for greater insights into network behavior. series of stacked convolutional layers, each of which are pa- Some current approaches of neural network interpreta- rameterized by sets of weights wi contained on some lattice tion include Bayesian probabilistic interpretations [14] and structure. The lattice structure of the weight is dependent on information theoretic interpretations [25, 19, 18]. For exam- the type of data the convolutional neural network is designed ple, a Bayesian interpretation was explored in [14] from the for. Convolution operations can be described by perspective of Kolmogorovs representation of a multivariate response surface, where operations within a neural network i+1 X i i can be seen as a superposition of univariate activation func- xk = xj ∗ wjk (1) j2jcij tions applied to an affine transformation of the input variable. 2 In [19, 18], an information theoretical interpretation of neu- S = hfspgi where fspg 2 S, and 1 ≤ p ≤ t. All elements ral networks was explored where networks are quantified by in S are words formed by the generating elements. Note that the mutual information between layers in the network as well the size of a semigroup is the number of unique functions, as the input and output variables. Further investigation into and not the number of generating elements. Two examples alternative interpretations can lead to new insights beyond of finite transformation semigroups are shown in Figure1. what these existing interpretations can provide. Two special types of semigroups are monoids and groups. A monoid is a semigroup with an identity map e. An identity 2.3. Neural Network Quantized Representation maps all states to themselves. All identity maps in a monoid The precision of data representation of the weights of a are equivalent.