
Tropical Geometry of Deep Neural Networks

Liwen Zhang¹  Gregory Naitzat²  Lek-Heng Lim²,³

Abstract

We establish, for the first time, connections between feedforward neural networks with ReLU activation and tropical geometry — we show that the family of such neural networks is equivalent to the family of tropical rational maps. Among other things, we deduce that feedforward ReLU neural networks with one hidden layer can be characterized by zonotopes, which serve as building blocks for deeper networks; we relate decision boundaries of such neural networks to tropical hypersurfaces, a major object of study in tropical geometry; and we prove that linear regions of such neural networks correspond to vertices of polytopes associated with tropical rational functions. An insight from our tropical formulation is that a deeper network is exponentially more expressive than a shallow network.

1. Introduction

Deep neural networks have recently received much limelight for their enormous success in a variety of applications across many different areas of artificial intelligence, computer vision, speech recognition, and natural language processing (LeCun et al., 2015; Hinton et al., 2012; Krizhevsky et al., 2012; Bahdanau et al., 2014; Kalchbrenner & Blunsom, 2013). Nevertheless, it is also well known that our theoretical understanding of their efficacy remains incomplete.

There have been several attempts to analyze deep neural networks from different perspectives. Notably, earlier studies have suggested that a deep architecture could use parameters more efficiently and requires exponentially fewer parameters to express certain families of functions than a shallow architecture (Delalleau & Bengio, 2011; Bengio & Delalleau, 2011; Montufar et al., 2014; Eldan & Shamir, 2016; Poole et al., 2016; Arora et al., 2018). Recent work (Zhang et al., 2016) showed that several successful neural networks possess a high representation power and can easily shatter random data. However, they also generalize well to data unseen during the training stage, suggesting that such networks may have some implicit regularization. Traditional measures of complexity such as VC-dimension and Rademacher complexity fail to explain this phenomenon. Understanding this implicit regularization that begets the generalization power of deep neural networks remains a challenge.

The goal of our work is to establish connections between neural networks and tropical geometry in the hope that they will shed light on the workings of deep neural networks. Tropical geometry is a new area in algebraic geometry that has seen explosive growth in the recent decade but remains relatively obscure outside pure mathematics. We will focus on feedforward neural networks with rectified linear units (ReLU) and show that they are analogues of rational functions, i.e., ratios of two multivariate polynomials f, g in variables x_1, ..., x_d,

    f(x_1, ..., x_d) ⊘ g(x_1, ..., x_d),

in tropical algebra. For standard and trigonometric polynomials, it is known that rational approximation — approximating a target function by a ratio of two polynomials instead of a single polynomial — vastly improves the quality of approximation without increasing the degree. This gives our analogue: an ReLU neural network is the tropical ratio of two tropical polynomials, i.e., a tropical rational function. More precisely, if we view a neural network as a function ν : R^d → R^p, x = (x_1, ..., x_d) ↦ (ν_1(x), ..., ν_p(x)), then ν is a tropical rational map, i.e., each ν_i is a tropical rational function. In fact, we will show that:

    the family of functions represented by feedforward neural networks with rectified linear units and integer weights is exactly the family of tropical rational maps.

¹Department of Computer Science, University of Chicago, Chicago, IL  ²Department of Statistics, University of Chicago, Chicago, IL  ³Computational and Applied Mathematics Initiative, University of Chicago, Chicago, IL. Correspondence to: Lek-Heng Lim.

Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the author(s).

It immediately follows that there is a semifield structure on this family of functions. More importantly, this establishes a bridge between neural networks¹ and tropical geometry that allows us to view neural networks as well-studied tropical geometric objects. This insight allows us to closely relate boundaries between linear regions of a neural network to tropical hypersurfaces, and thereby facilitates the study of decision boundaries of neural networks in classification problems as tropical hypersurfaces. Furthermore, the number of linear regions, which captures the complexity of a neural network (Montufar et al., 2014; Raghu et al., 2017; Arora et al., 2018), can be bounded by the number of vertices of the polytopes associated with the neural network's tropical rational representation. Lastly, a neural network with one hidden layer can be completely characterized by zonotopes, which serve as building blocks for deeper networks.

¹Henceforth a "neural network" will always mean a feedforward neural network with ReLU activation.

In Sections 2 and 3 we introduce basic tropical algebra and tropical algebraic geometry of relevance to us. We state our assumptions precisely in Section 4 and establish the connection between tropical geometry and multilayer neural networks in Section 5. We analyze neural networks with tropical tools in Section 6, proving that a deeper neural network is exponentially more expressive than a shallow network — though our objective is not so much to perform state-of-the-art analysis as to demonstrate that tropical algebraic geometry can provide useful insights. All proofs are deferred to Section D of the supplement.

2. Tropical algebra

Roughly speaking, tropical algebraic geometry is an analogue of classical algebraic geometry over C, the field of complex numbers, but where one replaces C by a semifield² called the tropical semiring, to be defined below. We give a brief review of tropical algebra and introduce some relevant notations. See (Itenberg et al., 2009; Maclagan & Sturmfels, 2015) for an in-depth treatment.

²A semifield is a field sans the existence of additive inverses.

The most fundamental component of tropical algebraic geometry is the tropical semiring T := (R ∪ {−∞}, ⊕, ⊙). The two operations ⊕ and ⊙, called tropical addition and tropical multiplication respectively, are defined as follows.

Definition 2.1. For x, y ∈ R, their tropical sum is x ⊕ y := max{x, y}; their tropical product is x ⊙ y := x + y; the tropical quotient of x over y is x ⊘ y := x − y.

For any x ∈ R, we have −∞ ⊕ x = 0 ⊙ x = x and −∞ ⊙ x = −∞. Thus −∞ is the tropical additive identity and 0 is the tropical multiplicative identity. Furthermore, these operations satisfy the usual laws of arithmetic: associativity, commutativity, and distributivity. The set R ∪ {−∞} is therefore a semiring under the operations ⊕ and ⊙. While it is not a ring (it lacks additive inverses), one may nonetheless generalize many algebraic objects (e.g., matrices, polynomials, tensors, etc.) and notions (e.g., rank, determinant, degree, etc.) over the tropical semiring — the study of these, in a nutshell, constitutes the subject of tropical algebra.

Let N := {n ∈ Z : n ≥ 0}. For an integer a ∈ N, raising x ∈ R to the ath power is the same as multiplying x by itself a times. When standard multiplication is replaced by tropical multiplication, this gives us the tropical power:

    x^{⊙a} := x ⊙ ··· ⊙ x = a · x,

where the last · denotes the standard product of real numbers; it is extended to R ∪ {−∞} by defining, for any a ∈ N,

    (−∞)^{⊙a} := −∞ if a > 0,  and  (−∞)^{⊙a} := 0 if a = 0.

A tropical semiring, while not a field, possesses one quality of a field: every x ∈ R has a tropical multiplicative inverse, given by its standard additive inverse, i.e., x^{⊙(−1)} := −x. Though not reflected in its name, T is in fact a semifield.

One may therefore also raise x ∈ R to a negative power a ∈ Z by raising its tropical multiplicative inverse −x to the positive power −a, i.e., x^{⊙a} = (−x)^{⊙(−a)}. As is the case in standard real arithmetic, the tropical additive identity −∞ does not have a tropical multiplicative inverse, and (−∞)^{⊙a} is undefined for a < 0. For notational simplicity, we will henceforth write x^a instead of x^{⊙a} for tropical power when there is no cause for confusion. Other algebraic rules of tropical power may be derived from the definition; see Section B in the supplement.

We are now in a position to define tropical polynomials and tropical rational functions. In the following, x and x_i will denote variables (i.e., indeterminates).

Definition 2.2. A tropical monomial in d variables x_1, ..., x_d is an expression of the form

    c ⊙ x_1^{a_1} ⊙ x_2^{a_2} ⊙ ··· ⊙ x_d^{a_d},

where c ∈ R ∪ {−∞} and a_1, ..., a_d ∈ N. As a convenient shorthand, we will also write a tropical monomial in multiindex notation as c x^α, where α = (a_1, ..., a_d) ∈ N^d and x = (x_1, ..., x_d). Note that x^α = 0 ⊙ x^α, as 0 is the tropical multiplicative identity.

Definition 2.3. Following the notations above, a tropical polynomial f(x) = f(x_1, ..., x_d) is a finite tropical sum of tropical monomials,

    f(x) = c_1 x^{α_1} ⊕ ··· ⊕ c_r x^{α_r},

where α_i = (a_{i1}, ..., a_{id}) ∈ N^d and c_i ∈ R ∪ {−∞}, i = 1, ..., r. We will assume that a monomial of a given multiindex appears at most once in the sum, i.e., α_i ≠ α_j for any i ≠ j.
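To make Definitions 2.1–2.3 concrete, here is a minimal sketch in Python of the tropical operations and tropical power (the code and all names in it are our own illustration, not part of the paper; −∞ is represented by float('-inf')):

```python
NEG_INF = float("-inf")  # tropical additive identity

def trop_add(x, y):
    """Tropical sum: x 'plus' y is max{x, y}."""
    return max(x, y)

def trop_mul(x, y):
    """Tropical product: x 'times' y is x + y."""
    return x + y

def trop_pow(x, a):
    """Tropical power: x to the a is a * x (a may be negative only for finite x)."""
    if x == NEG_INF:
        if a > 0:
            return NEG_INF
        if a == 0:
            return 0.0
        raise ValueError("(-inf)^a is undefined for a < 0")
    return a * x

# 0 is the tropical multiplicative identity; -x is the tropical inverse of x:
assert trop_mul(0.0, 5.0) == 5.0
assert trop_add(NEG_INF, 5.0) == 5.0
assert trop_mul(5.0, trop_pow(5.0, -1)) == 0.0
```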

Definition 2.4. Following the notations above, a tropical rational function is a standard difference, or, equivalently, a tropical quotient of two tropical polynomials f(x) and g(x):

    f(x) − g(x) = f(x) ⊘ g(x).

We will denote a tropical rational function by f ⊘ g, where f and g are understood to be tropical polynomial functions.
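Concretely, a tropical polynomial is a maximum of finitely many affine functions with natural-number slopes, and a tropical rational function is a difference of two such maxima. A minimal sketch under those definitions (our own illustrative code; monomials are stored as (coefficient, exponent-vector) pairs):

```python
import numpy as np

def trop_poly(monomials):
    """monomials: list of (c, alpha) with c real and alpha a tuple of naturals.
    Returns f(x) = max_i (c_i + alpha_i . x), a convex piecewise linear function."""
    def f(x):
        return max(c + np.dot(alpha, x) for c, alpha in monomials)
    return f

# f = 1*x1^2 (+) 1*x2^2 (+) 2*x1x2 (+) 2*x1 (+) 2*x2 (+) 2  (the example of Figure 1)
f = trop_poly([(1, (2, 0)), (1, (0, 2)), (2, (1, 1)),
               (2, (1, 0)), (2, (0, 1)), (2, (0, 0))])
g = trop_poly([(0, (1, 0)), (0, (0, 1))])      # x1 (+) x2

def trop_rational(f, g):
    """Tropical quotient f 'over' g is f - g, a DC (difference-convex) function."""
    return lambda x: f(x) - g(x)

print(trop_rational(f, g)(np.array([0.5, -1.0])))   # 2.0
```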

It is routine to verify that the set of tropical polynomials T[x_1, ..., x_d] forms a semiring under the standard extension of ⊕ and ⊙ to tropical polynomials, and likewise the set of tropical rational functions T(x_1, ..., x_d) forms a semifield. We regard a tropical polynomial f = f ⊘ 0 as a special case of a tropical rational function and thus T[x_1, ..., x_d] ⊆ T(x_1, ..., x_d). Henceforth any result stated for a tropical rational function would implicitly also hold for a tropical polynomial.

A d-variate tropical polynomial f(x) defines a function f : R^d → R that is a convex function in the usual sense, as taking max and sum of convex functions preserves convexity (Boyd & Vandenberghe, 2004). As such, a tropical rational function f ⊘ g : R^d → R is a DC function or difference-convex function (Hartman, 1959; Tao & Hoai An, 2005).

We will need a notion of vector-valued tropical polynomials and tropical rational functions.

Definition 2.5. F : R^d → R^p, x = (x_1, ..., x_d) ↦ (f_1(x), ..., f_p(x)), is called a tropical polynomial map if each f_i : R^d → R is a tropical polynomial, i = 1, ..., p, and a tropical rational map if f_1, ..., f_p are tropical rational functions. We will denote the set of tropical polynomial maps by Pol(d, p) and the set of tropical rational maps by Rat(d, p). So Pol(d, 1) = T[x_1, ..., x_d] and Rat(d, 1) = T(x_1, ..., x_d).

3. Tropical hypersurfaces

There are tropical analogues of many notions in classical algebraic geometry (Itenberg et al., 2009; Maclagan & Sturmfels, 2015), among which are tropical hypersurfaces, tropical analogues of algebraic curves in classical algebraic geometry. Tropical hypersurfaces are a principal object of interest in tropical geometry and will prove very useful in our approach towards neural networks. Intuitively, the tropical hypersurface of a tropical polynomial f is the set of points x where f is not linear at x.

Definition 3.1. The tropical hypersurface of a tropical polynomial f(x) = c_1 x^{α_1} ⊕ ··· ⊕ c_r x^{α_r} is

    T(f) := {x ∈ R^d : c_i x^{α_i} = c_j x^{α_j} = f(x) for some α_i ≠ α_j},

i.e., the set of points x at which the value of f at x is attained by two or more monomials in f.

A tropical hypersurface divides the domain of f into convex cells on each of which f is linear. These cells are convex polyhedra, i.e., defined by linear inequalities with integer coefficients: {x ∈ R^d : Ax ≤ b} for A ∈ Z^{m×d} and b ∈ R^m. For example, the cell where a tropical monomial c_j x^{α_j} attains its maximum is {x ∈ R^d : c_j + α_j^T x ≥ c_i + α_i^T x for all i ≠ j}. Tropical hypersurfaces of polynomials in two variables (i.e., in R²) are called tropical curves.

Just like standard multivariate polynomials, every tropical polynomial comes with an associated Newton polygon.

Definition 3.2. The Newton polygon of a tropical polynomial f(x) = c_1 x^{α_1} ⊕ ··· ⊕ c_r x^{α_r} is the convex hull of α_1, ..., α_r ∈ N^d, regarded as points in R^d,

    ∆(f) := Conv{α_i ∈ R^d : c_i ≠ −∞, i = 1, ..., r}.

A tropical polynomial f determines a dual subdivision of ∆(f), constructed as follows. First, lift each α_i from R^d into R^{d+1} by appending c_i as the last coordinate. Denote the convex hull of the lifted α_1, ..., α_r as

    P(f) := Conv{(α_i, c_i) ∈ R^d × R : i = 1, ..., r}.    (1)

Next let UF(P(f)) denote the collection of upper faces of P(f) and π : R^d × R → R^d be the projection that drops the last coordinate. The dual subdivision determined by f is then

    δ(f) := {π(p) ⊆ R^d : p ∈ UF(P(f))}.

δ(f) forms a polyhedral complex with support ∆(f). By (Maclagan & Sturmfels, 2015, Proposition 3.1.6), the tropical hypersurface T(f) is the (d−1)-skeleton of the polyhedral complex dual to δ(f). This means that each vertex in δ(f) corresponds to one "cell" in R^d where the function f is linear. Thus, the number of vertices in P(f) provides an upper bound on the number of linear regions of f.

Figure 1 shows the Newton polygon and dual subdivision for the tropical polynomial f(x_1, x_2) = 1 ⊙ x_1² ⊕ 1 ⊙ x_2² ⊕ 2 ⊙ x_1x_2 ⊕ 2 ⊙ x_1 ⊕ 2 ⊙ x_2 ⊕ 2.

Figure 1. 1 ⊙ x_1² ⊕ 1 ⊙ x_2² ⊕ 2 ⊙ x_1x_2 ⊕ 2 ⊙ x_1 ⊕ 2 ⊙ x_2 ⊕ 2. Left: Tropical curve. Right: Dual subdivision of Newton polygon and tropical curve.
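The lift (1) and the upper faces UF(P(f)) can be computed numerically with an off-the-shelf convex hull routine. The following sketch (our code, assuming scipy is available) does this for the running example of Figure 1 and recovers its six linear regions by counting vertices on the upper faces:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Lifted points (alpha_i, c_i) of f = 1*x1^2 (+) 1*x2^2 (+) 2*x1x2 (+) 2*x1 (+) 2*x2 (+) 2.
lifted = np.array([[2, 0, 1], [0, 2, 1], [1, 1, 2],
                   [1, 0, 2], [0, 1, 2], [0, 0, 2]], dtype=float)
hull = ConvexHull(lifted)

# Qhull facet equations are [normal, offset] with outward normals; a facet lies on
# the upper envelope iff the last coordinate of its normal is positive.
upper = hull.equations[hull.equations[:, 2] > 1e-9]
on_upper = {v for v in hull.vertices
            for n in upper if abs(np.dot(n[:3], lifted[v]) + n[3]) < 1e-9}
print(len(on_upper))  # 6 -- one vertex of the dual subdivision per linear region of f
```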

Figure 2 shows how we may find the dual subdivision for this tropical polynomial by following the aforementioned procedures, with step-by-step details given in Section C.1.

Figure 2. 1 ⊙ x_1² ⊕ 1 ⊙ x_2² ⊕ 2 ⊙ x_1x_2 ⊕ 2 ⊙ x_1 ⊕ 2 ⊙ x_2 ⊕ 2. The dual subdivision can be obtained by projecting the edges on the upper faces of the polytope.

Tropical polynomials and tropical rational functions are clearly piecewise linear functions. As such, a tropical rational map is a piecewise linear map and the notion of linear region applies.

Definition 3.3. A linear region of F ∈ Rat(d, m) is a maximal connected subset of the domain on which F is linear. The number of linear regions of F is denoted N(F).

Note that a tropical polynomial map F ∈ Pol(d, m) has convex linear regions but a tropical rational map F ∈ Rat(d, n) generally has nonconvex linear regions. In Section 6.3, we will use N(F) as a measure of complexity for an F ∈ Rat(d, n) given by a neural network.

3.1. Transformations of tropical polynomials

Our analysis of neural networks will require figuring out how the polytope P(f) transforms under tropical power, sum, and product. The first is straightforward.

Proposition 3.1. Let f be a tropical polynomial and let a ∈ N. Then

    P(f^a) = aP(f).

aP(f) = {ax : x ∈ P(f)} ⊆ R^{d+1} is a scaled version of P(f) with the same shape but different volume.

To describe the effect of tropical sum and product, we need a few notions from convex geometry. The Minkowski sum of two sets P_1 and P_2 in R^d is the set

    P_1 + P_2 := {x_1 + x_2 ∈ R^d : x_1 ∈ P_1, x_2 ∈ P_2};

and for λ_1, λ_2 ≥ 0, their weighted Minkowski sum is

    λ_1P_1 + λ_2P_2 := {λ_1x_1 + λ_2x_2 ∈ R^d : x_1 ∈ P_1, x_2 ∈ P_2}.

Weighted Minkowski sum is clearly commutative and associative and generalizes to more than two sets. In particular, the Minkowski sum of line segments is called a zonotope.

Let V(P) denote the set of vertices of a polytope P. Clearly, the Minkowski sum of two polytopes is given by the convex hull of the Minkowski sum of their vertex sets, i.e., P_1 + P_2 = Conv(V(P_1) + V(P_2)). With this observation, the following is immediate.

Proposition 3.2. Let f, g ∈ Pol(d, 1) = T[x_1, ..., x_d] be tropical polynomials. Then

    P(f ⊙ g) = P(f) + P(g),
    P(f ⊕ g) = Conv(V(P(f)) ∪ V(P(g))).

We reproduce below part of (Gritzmann & Sturmfels, 1993, Theorem 2.1.10) and derive a corollary for bounding the number of vertices on the upper faces of a zonotope.

Theorem 3.3 (Gritzmann–Sturmfels). Let P_1, ..., P_k be polytopes in R^d and let m denote the total number of nonparallel edges of P_1, ..., P_k. Then the number of vertices of P_1 + ··· + P_k does not exceed

    2 Σ_{j=0}^{d−1} C(m−1, j),

where C(·, ·) denotes a binomial coefficient. The upper bound is attained if all P_i's are zonotopes and all their generating line segments are in general positions.

Corollary 3.4. Let P ⊆ R^{d+1} be a zonotope generated by m line segments P_1, ..., P_m. Let π : R^d × R → R^d be the projection. Suppose P satisfies: (i) the generating line segments are in general positions; (ii) the set of projected vertices {π(v) : v ∈ V(P)} ⊆ R^d is in general positions. Then P has

    Σ_{j=0}^{d} C(m, j)

vertices on its upper faces. If either (i) or (ii) is violated, then this becomes an upper bound.

As we mentioned, linear regions of a tropical polynomial f correspond to vertices on UF(P(f)), and the corollary will be useful for bounding the number of linear regions.
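Theorem 3.3 is easy to probe numerically: for d = 2 the bound 2 Σ_{j=0}^{d−1} C(m−1, j) equals 2m, the vertex count of a zonogon built from m generic segments. A small sketch (our own code, assuming numpy and scipy) that forms the Minkowski sum of random line segments by brute force and compares vertex counts:

```python
import numpy as np
from math import comb
from itertools import product
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)
d, m = 2, 5
gens = rng.standard_normal((m, d))        # m generating segments [0, v_i] in the plane

# P_1 + ... + P_m is the convex hull of all 2^m subset sums of the generators.
pts = np.array([eps @ gens for eps in product([0.0, 1.0], repeat=m)])
n_vertices = len(ConvexHull(pts).vertices)

bound = 2 * sum(comb(m - 1, j) for j in range(d))
print(n_vertices, bound)                  # both 10 = 2m: the bound is attained
```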
weights by multiplying them by the least common mul- Viewed abstractly, an L-layer feedforward neural network tiple of their denominators to obtain integer weights; is a map ν : Rd Ñ Rp given by a composition of functions • keeping in mind that scaling all weights and biases by the same positive constant has no bearing on the pLq pLq pL´1q pL´1q p1q p1q ν “ σ ˝ ρ ˝ σ ˝ ρ ¨ ¨ ¨ ˝ σ ˝ ρ . workings of a neural network. The preactivation functions ρp1q, . . . , ρpLq are affine trans- The activation function in (c) includes both ReLU activation formations to be determined and the activation functions (tplq “ 0) and identity map (tplq “ ´8) as special cases. σp1q, . . . , σpLq are chosen and fixed in advanced. Aside from ReLU, our tropical framework will apply to We denote the width, i.e., the number of nodes, of the lth piecewise linear activations such as leaky ReLU and abso- lute value, and with some extra effort, may be extended to layer by nl, l “ 1, ¨ ¨ ¨ ,L ´ 1. We set n0 :“ d and nL :“ p, respectively the dimensions of the input and output of the max pooling, maxout nets, etc. But it does not, for example, network. The output from the lth layer will be denoted by apply to activations such as hyperbolic tangent and sigmoid. In this work, we view an ReLU network as the simplest plq : plq plq pl´1q pl´1q p1q p1q ν “ σ ˝ ρ ˝ σ ˝ ρ ¨ ¨ ¨ ˝ σ ˝ ρ , and most canonical model of a neural network, from which l d n other variants that are more effective at specific tasks may i.e., it is a map νp q : R Ñ R l . For convenience, we assume νp0qpxq :“ x. be derived. Given that we seek general theoretical insights and not specific practical efficacy, it makes sense to limit The affine function ρplq : Rnl´1 Ñ Rnl is given by a weight ourselves to this simplest case. Moreover, ReLU networks matrix Aplq P Znlˆnl´1 and a bias vector bplq P Rnl : already embody some of the most important elements (and mysteries) common to a wider range of neural networks ρplq νpl´1q : Aplqνpl´1q bplq. p q “ ` (e.g., universal approximation, exponential expressiveness); plq they work well in practice and are often the go-to choice for The pi, jqth coordinate of Aplq will be denoted a and the ij feedforward networks. We are also not alone in limiting our plq plq ith coordinate of b by bi . Collectively they form the discussions to ReLU networks (Montufar et al., 2014; Arora parameters of the lth layer. et al., 2018). For a vector input x P Rnl , σplqpxq is understood to be in coordinatewise sense; so σ : Rnl Ñ Rnl . We assume the 5. Tropical algebra of neural networks final output of a neural network νpxq is fed into a score func- tion s : Rp Ñ Rm that is application specific. When used We now describe our tropical formulation of a multilayer as an m-category classifier, s may be chosen, for example, feedforward neural network satisfying (a)–(c). to be a soft-max or sigmoidal function. The score function A multilayer feedforward neural network is generally non- is quite often regarded as the last layer of a neural network convex, whereas a tropical polynomial is always convex. but this is purely a matter of convenience and we will not Since most nonconvex functions are a difference of two assume this. We will make the following mild assumptions convex functions (Hartman, 1959), a reasonable guess is on the architecture of our feedforward neural networks and that a feedforward neural network is the difference of two explain next why they are indeed mild: tropical polynomials, i.e., a tropical rational function. 
This (a) the weight matrices Ap1q,...,ApLq are integer-valued; is indeed the case, as we will see from the following. (b) the bias vectors bp1q, . . . , bpLq are real-valued; Consider the output from the first layer in neural network p1q pLq (c) the activation functions σ , . . . , σ take the form νpxq “ maxtAx ` b, tu, plq plq σ pxq :“ maxtx, t u, where A P Zpˆd, b P Rp, and t P pR Y t´8uqp. We will decompose A as a difference of two nonnegative integer- plq n where t P pRYt´8uq l is called a threshold vector. pˆd valued matrices, A “ A` ´ A´ with A`,A´ P N ; e.g., Henceforth all neural networks in our subsequent discus- in the standard way with entries sions will be assumed to satisfy (a)–(c). ` ´ aij :“ maxtaij, 0u, aij :“ maxt´aij, 0u (b) is completely general but there is also no loss of gen- respectively. Since erality in (a), i.e., in restricting the weights Ap1q,...,ApLq from real matrices to integer matrices, as: maxtAx ` b, tu “ maxtA`x ` b, A´x ` tu ´ A´x, Tropical Geometry of Deep Neural Networks we see that every coordinate of one-layer neural network Corollary 5.3. Let ν : Rd Ñ R be an ReLU activated is a difference of two tropical polynomials. For networks feedforward neural network with integer weights and linear with more layers, we apply this decomposition recursively output. Then ν is a tropical rational function. to obtain the following result. A more remarkable fact is the converse of Corollary 5.3. Proposition 5.1. Let A P Zmˆn, b P Rm be the parameters of the pl ` 1qth layer, and let t P pR Y t´8uqm be the Theorem 5.4 (Equivalence of neural networks and tropical threshold vector in the pl ` 1qth layer. If the nodes of the lth rational functions). layer are given by tropical rational functions, d (i) Let ν : R Ñ R. Then ν is a tropical rational func- tion if and only if ν is a feedforward neural network νplqpxq “ F plqpxq m Gplqpxq “ F plqpxq ´ Gplqpxq, satisfying assumptions (a)–(c). plq plq i.e., each coordinate of F and G is a tropical polyno- (ii) A tropical rational function f m g can be represented mial in x, then the outputs of the preactivation and of the as an L-layer neural network, with pl ` 1qth layer are given by tropical rational functions L ď maxtrlog2 rf s, rlog2 rgsu ` 2, ρpl`1q ˝ νplqpxq “ Hpl`1qpxq ´ Gpl`1qpxq, where r and r are the number of monomials in the νpl`1qpxq “ σ ˝ ρpl`1q ˝ νplqpxq “ F pl`1qpxq ´ Gpl`1qpxq f g tropical polynomials f and g respectively. respectively, where We would like to acknowledge the precedence of (Arora F pl`1qpxq “ max Hpl`1qpxq,Gpl`1qpxq ` t , et al., 2018, Theorem 2.1), which demonstrates the equiva- lence between ReLU-activated L-layer neural networks with Gpl`1qpxq “ A Gplqpxq ` A F plqpxq, ` ´ ( real weights and d-variate continuous piecewise functions pl`1q plq plq H pxq “ A`F pxq ` A´G pxq ` b. with real coefficients, where L ď rlog2pd ` 1qs ` 1.

Proposition 5.1. Let A ∈ Z^{m×n} and b ∈ R^m be the parameters of the (l+1)th layer, and let t ∈ (R ∪ {−∞})^m be the threshold vector of the (l+1)th layer. If the nodes of the lth layer are given by tropical rational functions,

    ν^(l)(x) = F^(l)(x) ⊘ G^(l)(x) = F^(l)(x) − G^(l)(x),

i.e., each coordinate of F^(l) and G^(l) is a tropical polynomial in x, then the outputs of the preactivation and of the (l+1)th layer are given by tropical rational functions,

    ρ^(l+1) ∘ ν^(l)(x) = H^(l+1)(x) − G^(l+1)(x),
    ν^(l+1)(x) = σ ∘ ρ^(l+1) ∘ ν^(l)(x) = F^(l+1)(x) − G^(l+1)(x)

respectively, where

    F^(l+1)(x) = max{H^(l+1)(x), G^(l+1)(x) + t},
    G^(l+1)(x) = A₊G^(l)(x) + A₋F^(l)(x),
    H^(l+1)(x) = A₊F^(l)(x) + A₋G^(l)(x) + b.

We will write f_i^(l), g_i^(l), and h_i^(l) for the ith coordinates of F^(l), G^(l), and H^(l) respectively. In tropical arithmetic, the recurrence above takes the form

    f_i^(l+1) = h_i^(l+1) ⊕ (g_i^(l+1) ⊙ t_i),
    g_i^(l+1) = [⨀_{j=1}^{n_l} (f_j^(l))^{a⁻_ij}] ⊙ [⨀_{j=1}^{n_l} (g_j^(l))^{a⁺_ij}],    (2)
    h_i^(l+1) = [⨀_{j=1}^{n_l} (f_j^(l))^{a⁺_ij}] ⊙ [⨀_{j=1}^{n_l} (g_j^(l))^{a⁻_ij}] ⊙ b_i.
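Proposition 5.1 can be read as an algorithm: propagate a pair of functions (F, G), each coordinate a tropical polynomial, through the layers, and the network is always F − G. The following sketch (entirely our own code, with t = 0 on hidden layers and t = −∞ on the output, as in Corollary 5.3 below) tracks the pair pointwise and checks it against a plain ReLU forward pass:

```python
import numpy as np

def relu_forward(weights, biases, x):
    """Plain ReLU network with a linear (identity) output layer."""
    v = x
    for l, (A, b) in enumerate(zip(weights, biases)):
        v = A @ v + b
        if l < len(weights) - 1:
            v = np.maximum(v, 0.0)       # t = 0 on hidden layers
    return v

def tropical_pair_forward(weights, biases, x):
    """Track (F(x), G(x)) per Proposition 5.1, so that nu(x) = F(x) - G(x)."""
    F, G = x, np.zeros_like(x)           # nu^(0) = x = x - 0
    for l, (A, b) in enumerate(zip(weights, biases)):
        Ap, An = np.maximum(A, 0), np.maximum(-A, 0)
        H = Ap @ F + An @ G + b
        G = Ap @ G + An @ F
        F = np.maximum(H, G) if l < len(weights) - 1 else H   # t = 0 vs. t = -inf
    return F - G

rng = np.random.default_rng(2)
weights = [rng.integers(-2, 3, size=(5, 3)), rng.integers(-2, 3, size=(1, 5))]
biases = [rng.standard_normal(5), rng.standard_normal(1)]
x = rng.standard_normal(3)
assert np.allclose(relu_forward(weights, biases, x),
                   tropical_pair_forward(weights, biases, x))
```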

Repeated applications of Proposition 5.1 yield the following.

Theorem 5.2 (Tropical characterization of neural networks). A feedforward neural network under assumptions (a)–(c) is a function ν : R^d → R^p whose coordinates are tropical rational functions of the input, i.e.,

    ν(x) = F(x) ⊘ G(x) = F(x) − G(x),

where F and G are tropical polynomial maps. Thus ν is a tropical rational map.

Note that the tropical rational functions above have real coefficients, not integer coefficients. The integer weights A^(l) ∈ Z^{n_l×n_{l−1}} have gone into the powers of the tropical monomials in f and g, which is why we require our weights to be integer-valued, although, as we have explained, this requirement imposes little loss of generality.

By setting t^(1) = ··· = t^(L−1) = 0 and t^(L) = −∞, we obtain the following corollary.

Corollary 5.3. Let ν : R^d → R be an ReLU-activated feedforward neural network with integer weights and linear output. Then ν is a tropical rational function.

A more remarkable fact is the converse of Corollary 5.3.

Theorem 5.4 (Equivalence of neural networks and tropical rational functions).

(i) Let ν : R^d → R. Then ν is a tropical rational function if and only if ν is a feedforward neural network satisfying assumptions (a)–(c).

(ii) A tropical rational function f ⊘ g can be represented as an L-layer neural network, with

    L ≤ max{⌈log₂ r_f⌉, ⌈log₂ r_g⌉} + 2,

where r_f and r_g are the numbers of monomials in the tropical polynomials f and g respectively.

We would like to acknowledge the precedence of (Arora et al., 2018, Theorem 2.1), which demonstrates the equivalence between ReLU-activated L-layer neural networks with real weights and d-variate continuous piecewise linear functions with real coefficients, where L ≤ ⌈log₂(d+1)⌉ + 1.

By construction, a tropical rational function is a continuous piecewise linear function. The continuity of a piecewise linear function automatically implies that each of the pieces on which it is linear is a polyhedral region. As we saw in Section 3, a tropical polynomial f : R^d → R gives a tropical hypersurface that divides R^d into convex polyhedral regions defined by linear inequalities with integer coefficients: {x ∈ R^d : Ax ≤ b} with A ∈ Z^{m×d} and b ∈ R^m. A tropical rational function f ⊘ g : R^d → R must also be a continuous piecewise linear function and divide R^d into polyhedral regions on each of which f ⊘ g is linear, although these regions are nonconvex in general. We will show the converse — any continuous piecewise linear function with integer coefficients is a tropical rational function.

Proposition 5.5. Let ν : R^d → R. Then ν is a continuous piecewise linear function with integer coefficients if and only if ν is a tropical rational function.

Corollary 5.3, Theorem 5.4, and Proposition 5.5 collectively imply the equivalence of

(i) tropical rational functions,
(ii) continuous piecewise linear functions with integer coefficients,
(iii) neural networks satisfying assumptions (a)–(c).

An immediate advantage of this characterization is that the set of tropical rational functions T(x_1, ..., x_d) has a semifield structure, as we pointed out in Section 2, a fact that we have implicitly used in the proof of Proposition 5.5. However, what is more important is not the algebra but the algebraic geometry that arises from our tropical characterization. We will use tropical algebraic geometry to illuminate our understanding of neural networks in the next section.

The need to stay within tropical algebraic geometry is the reason we did not go for a simpler and more general characterization (that does not require the integer coefficients assumption). A tropical signomial takes the form

    φ(x) = ⨁_{i=1}^{m} b_i ⊙ ⨀_{j=1}^{n} x_j^{a_ij},

where a_ij ∈ R and b_i ∈ R ∪ {−∞}. Note that a_ij is not required to be integer-valued nor nonnegative. A tropical rational signomial is a tropical quotient φ ⊘ ψ of two tropical signomials φ, ψ. A tropical rational signomial map is a function ν = (ν_1, ..., ν_p) : R^d → R^p where each ν_i : R^d → R is a tropical rational signomial, ν_i = φ_i ⊘ ψ_i. The same argument we used to establish Theorem 5.2 gives us the following.

Proposition 5.6. Every feedforward neural network with ReLU activation is a tropical rational signomial map.

Nevertheless, tropical signomials fall outside the realm of tropical algebraic geometry and we do not use Proposition 5.6 in the rest of this article.
6. Tropical geometry of neural networks

Section 5 defines neural networks via tropical algebra, a perspective that allows us to study them via tropical algebraic geometry. We will show that the decision boundary of a neural network is a subset of a tropical hypersurface of a corresponding tropical polynomial (Section 6.1). We will see that, in an appropriate sense, zonotopes form the geometric building blocks for neural networks (Section 6.2). We then prove that the geometry of the function represented by a neural network grows vastly more complex as its number of layers increases (Section 6.3).

6.1. Decision boundaries of a neural network

We will use tropical geometry and insights from Section 5 to study decision boundaries of neural networks, focusing on the case of two-category classification for clarity. As explained in Section 4, a neural network ν : R^d → R^p together with a choice of score function s : R^p → R gives us a classifier. If the output value s(ν(x)) exceeds some decision threshold c, then the neural network predicts that x is from one class (e.g., x is a CAT image), and otherwise that x is from the other category (e.g., a DOG image). The input space is thereby partitioned into two disjoint subsets by the decision boundary B := {x ∈ R^d : ν(x) = s⁻¹(c)}. Connected regions with value above the threshold and connected regions with value below the threshold will be called the positive regions and negative regions respectively.

We provide bounds on the numbers of positive and negative regions and show that there is a tropical polynomial whose tropical hypersurface contains the decision boundary.

Proposition 6.1 (Tropical geometry of decision boundary). Let ν : R^d → R be an L-layer neural network satisfying assumptions (a)–(c) with t^(L) = −∞. Let the score function s : R → R be injective with decision threshold c in its range. If ν = f ⊘ g where f and g are tropical polynomials, then

(i) its decision boundary B = {x ∈ R^d : ν(x) = s⁻¹(c)} divides R^d into at most N(f) connected positive regions and at most N(g) connected negative regions;

(ii) its decision boundary is contained in the tropical hypersurface of the tropical polynomial s⁻¹(c) ⊙ g(x) ⊕ f(x) = max{f(x), g(x) + s⁻¹(c)}, i.e.,

    B ⊆ T(s⁻¹(c) ⊙ g ⊕ f).    (3)

The function s⁻¹(c) ⊙ g ⊕ f is not necessarily linear on every positive or negative region, and so its tropical hypersurface T(s⁻¹(c) ⊙ g ⊕ f) may further divide a positive or negative region derived from B into multiple linear regions. Hence the "⊆" in (3) cannot in general be replaced by "=".

6.2. Zonotopes as geometric building blocks of neural networks

From Section 3, we know that the number of regions a tropical hypersurface T(f) divides the space into equals the number of vertices in the dual subdivision of the Newton polygon associated with the tropical polynomial f. This allows us to bound the number of linear regions of a neural network by bounding the number of vertices in the dual subdivision of the Newton polygon.

We start by examining how geometry changes from one layer to the next in a neural network, more precisely:

Question. How are the tropical hypersurfaces of the tropical polynomials in the (l+1)th layer of a neural network related to those in the lth layer?

The recurrent relation (2) describes how the tropical polynomials occurring in the (l+1)th layer are obtained from those in the lth layer, namely, via three operations: tropical sum, tropical product, and tropical powers. Recall that the tropical hypersurface of a tropical polynomial is dual to the dual subdivision of the Newton polytope of the tropical polynomial, which is given by the projection of the upper faces of the polytope defined by (1). Hence the question boils down to how these three operations transform the polytopes, which is addressed in Propositions 3.1 and 3.2. We follow the notations of Proposition 5.1 in the next result.

Lemma 6.2. Let f_i^(l), g_i^(l), h_i^(l) be the tropical polynomials produced by the ith node in the lth layer of a neural network, i.e., they are defined by (2). Then P(f_i^(l)), P(g_i^(l)), P(h_i^(l)) are subsets of R^{d+1} given as follows:

(i) P(g_i^(1)) and P(h_i^(1)) are points.
(ii) P(f_i^(1)) is a line segment.
(iii) P(g_i^(2)) and P(h_i^(2)) are zonotopes.
(iv) For l ≥ 1,

    P(f_i^(l)) = Conv[P(g_i^(l) ⊙ t_i^(l)) ∪ P(h_i^(l))]

if t_i^(l) ∈ R, and P(f_i^(l)) = P(h_i^(l)) if t_i^(l) = −∞.
(v) For l ≥ 1, P(g_i^(l+1)) and P(h_i^(l+1)) are weighted Minkowski sums,

    P(g_i^(l+1)) = Σ_{j=1}^{n_l} a⁻_ij P(f_j^(l)) + Σ_{j=1}^{n_l} a⁺_ij P(g_j^(l)),
    P(h_i^(l+1)) = Σ_{j=1}^{n_l} a⁺_ij P(f_j^(l)) + Σ_{j=1}^{n_l} a⁻_ij P(g_j^(l)) + {b_i e},

where a_ij, b_i are entries of the weight matrix A^(l+1) ∈ Z^{n_{l+1}×n_l} and bias vector b^(l+1) ∈ R^{n_{l+1}}, and e := (0, ..., 0, 1) ∈ R^{d+1}.

A conclusion of Lemma 6.2 is that zonotopes are the building blocks in the tropical geometry of neural networks. Zonotopes are studied extensively in convex geometry and, among other things, are intimately related to hyperplane arrangements (Greene & Zaslavsky, 1983; Guibas et al., 2003; McMullen, 1971; Holtz & Ron, 2011). Lemma 6.2 connects neural networks to this extensive body of work, but its full implication remains to be explored. In Section C.2 of the supplement, we show how one may build these polytopes for a two-layer neural network.

6.3. Geometric complexity of deep neural networks

We apply the tools in Section 3 to study the complexity of a neural network, showing that a deep network is much more expressive than a shallow one. Our measure of complexity is geometric: we will follow (Montufar et al., 2014; Raghu et al., 2017) and use the number of linear regions of a piecewise linear function ν : R^d → R^p to measure the complexity of ν.

We would like to emphasize that our upper bound below does not improve on that obtained in (Raghu et al., 2017) — in fact, our version is more restrictive, given that it applies only to neural networks satisfying (a)–(c). Nevertheless, our goal here is to demonstrate how tropical geometry may be used to derive the same bound.

Theorem 6.3. Let ν : R^d → R be an L-layer real-valued feedforward neural network satisfying (a)–(c). Let t^(L) = −∞ and n_l ≥ d for all l = 1, ..., L−1. Then ν = ν^(L) has at most

    Π_{l=1}^{L−1} Σ_{i=0}^{d} C(n_l, i)

linear regions. In particular, if d ≤ n_1, ..., n_{L−1} ≤ n, the number of linear regions of ν is bounded by O(n^{d(L−1)}).

Proof. If L = 2, this follows directly from Lemma 6.2 and Corollary 3.4. The case of L ≥ 3 is in Section D.7 of the supplement.

As was pointed out in (Raghu et al., 2017), this upper bound closely matches the lower bound Ω((n/d)^{(L−1)d} n^d) in (Montufar et al., 2014, Corollary 5) when n_1 = ··· = n_{L−1} = n ≥ d. Hence we surmise that the number of linear regions of a neural network grows polynomially with the width n and exponentially with the number of layers L.
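The bound of Theorem 6.3 is simple to tabulate; a quick sketch (the helper name is our own) also illustrates the depth–width tradeoff it suggests:

```python
from math import comb

def region_bound(d, hidden_widths):
    """Theorem 6.3 bound: product over hidden layers of sum_{i=0}^{d} C(n_l, i)."""
    out = 1
    for n in hidden_widths:
        out *= sum(comb(n, i) for i in range(d + 1))
    return out

# Same budget of 20 hidden nodes on inputs in R^2, split across different depths:
print(region_bound(2, [20]))        # 211
print(region_bound(2, [10, 10]))    # 3136 -- the deeper split wins
```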
7. Conclusion

We argue that feedforward neural networks with rectified linear units are, modulo trivialities, nothing more than tropical rational maps. To understand them we often just need to understand the relevant tropical geometry.

In this article, we took a first step to provide a proof of concept: questions regarding decision boundaries, linear regions, how depth affects expressiveness, etc., can be translated into questions involving tropical hypersurfaces, dual subdivisions of Newton polygons, polytopes constructed from zonotopes, etc.

As a new branch of algebraic geometry, the novelty of tropical geometry stems from both the algebra and the geometry as well as the interplay between them. It has connections to many other areas of mathematics. Among other things, there is a tropical analogue of linear algebra (Butkovič, 2010) and a tropical analogue of convex geometry (Gaubert & Katz, 2006). We cannot emphasize enough that we have only touched on a small part of this rich subject. We hope that further investigation from this tropical angle might perhaps unravel other mysteries of deep neural networks.

Acknowledgments

The authors thank Ralph Morrison, Yang Qi, Bernd Sturmfels, and the anonymous referees for their very helpful comments. The work in this article is generously supported by DARPA D15AP00109, NSF IIS 1546413, the Eckhardt Faculty Fund, and a DARPA Director's Fellowship.

References

Arora, Raman, Basu, Amitabh, Mianjy, Poorya, and Mukherjee, Anirbit. Understanding deep neural networks with rectified linear units. International Conference on Learning Representations, 2018.

Bahdanau, Dzmitry, Cho, Kyunghyun, and Bengio, Yoshua. Neural machine translation by jointly learning to align and translate. arXiv:1409.0473, 2014.

Bengio, Yoshua and Delalleau, Olivier. On the expressive power of deep architectures. In International Conference on Algorithmic Learning Theory, pp. 18–36. Springer, 2011.

Boyd, Stephen and Vandenberghe, Lieven. Convex optimization. Cambridge University Press, 2004.

Butkovič, Peter. Max-linear systems: theory and algorithms. Springer Science & Business Media, 2010.

Delalleau, Olivier and Bengio, Yoshua. Shallow vs. deep sum-product networks. In Advances in Neural Information Processing Systems, pp. 666–674, 2011.

Eldan, Ronen and Shamir, Ohad. The power of depth for feedforward neural networks. In Conference on Learning Theory, pp. 907–940, 2016.

Gaubert, Stéphane and Katz, Ricardo. Max-plus convex geometry. In International Conference on Relational Methods in Computer Science. Springer, 2006.

Greene, Curtis and Zaslavsky, Thomas. On the interpretation of Whitney numbers through arrangements of hyperplanes, zonotopes, non-Radon partitions, and orientations of graphs. Transactions of the American Mathematical Society, 280(1):97–126, 1983.

Gritzmann, Peter and Sturmfels, Bernd. Minkowski addition of polytopes: computational complexity and applications to Gröbner bases. SIAM Journal on Discrete Mathematics, 6(2):246–269, 1993.

Guibas, Leonidas J, Nguyen, An, and Zhang, Li. Zonotopes as bounding volumes. In Proceedings of 14th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 803–812. SIAM, 2003.

Hartman, Philip. On functions representable as a difference of convex functions. Pacific Journal of Mathematics, 9(3):707–713, 1959.

Hinton, Geoffrey, Deng, Li, Yu, Dong, Dahl, George E, Mohamed, Abdel-rahman, Jaitly, Navdeep, Senior, Andrew, Vanhoucke, Vincent, Nguyen, Patrick, Sainath, Tara N, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.

Holtz, Olga and Ron, Amos. Zonotopal algebra. Advances in Mathematics, 227(2):847–894, 2011.

Itenberg, Ilia, Mikhalkin, Grigory, and Shustin, Eugenii I. Tropical algebraic geometry, volume 35. Springer Science & Business Media, 2009.

Kalchbrenner, Nal and Blunsom, Phil. Recurrent continuous translation models. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1700–1709, 2013.

Krizhevsky, Alex, Sutskever, Ilya, and Hinton, Geoffrey E. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.

LeCun, Yann, Bengio, Yoshua, and Hinton, Geoffrey. Deep learning. Nature, 521(7553):436–444, 2015.

Maclagan, Diane and Sturmfels, Bernd. Introduction to tropical geometry, volume 161. American Mathematical Society, 2015.

McMullen, Peter. On zonotopes. Transactions of the American Mathematical Society, 159:91–109, 1971.

Montufar, Guido F, Pascanu, Razvan, Cho, Kyunghyun, and Bengio, Yoshua. On the number of linear regions of deep neural networks. In Advances in Neural Information Processing Systems, pp. 2924–2932, 2014.

Poole, Ben, Lahiri, Subhaneil, Raghu, Maithra, Sohl-Dickstein, Jascha, and Ganguli, Surya. Exponential expressivity in deep neural networks through transient chaos. In Advances in Neural Information Processing Systems, pp. 3360–3368, 2016.

Raghu, Maithra, Poole, Ben, Kleinberg, Jon, Ganguli, Surya, and Sohl-Dickstein, Jascha. On the expressive power of deep neural networks. In International Conference on Machine Learning, pp. 2847–2854, 2017.

Tao, Pham Dinh and Hoai An, Le Thi. The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems. Annals of Operations Research, 133(1–4):23–46, 2005.

Tarela, JM and Martinez, MV. Region configurations for realizability of lattice piecewise-linear models. Mathematical and Computer Modelling, 30(11–12):17–27, 1999.

Zhang, Chiyuan, Bengio, Samy, Hardt, Moritz, Recht, Benjamin, and Vinyals, Oriol. Understanding deep learning requires rethinking generalization. arXiv:1611.03530, 2016.

Supplementary Material: Tropical Geometry of Deep Neural Networks

A. Illustration of our neural network

Figure A.1 summarizes the architecture and notations of the feedforward neural network discussed in this paper.

[Figure A.1 shows the input layer (l = 0), hidden layers 1, ..., L−1, and the output layer (l = L), with outputs ν_1(x), ..., ν_p(x). The ith neuron in layer l computes

    ν_i^(l) = σ_i^(l)( Σ_{j=1}^{n_{l−1}} a_ij^(l) ν_j^(l−1) + b_i^(l) ) = max{ Σ_{j=1}^{n_{l−1}} a_ij^(l) ν_j^(l−1) + b_i^(l), t_i^(l) },

or, in tropical formulation, [⨀_{j=1}^{n_{l−1}} (ν_j^(l−1))^{a_ij^(l)}] ⊙ b_i^(l) ⊕ t_i^(l), so that ν^(l+1) = σ^(l+1) ∘ ρ^(l+1) ∘ ··· ∘ σ^(1) ∘ ρ^(1)(x).]

Figure A.1. General form of an ReLU feedforward neural network ν : R^d → R^p with L layers.

B. Tropical power

As in Section 2, we write x^a = x^{⊙a}; aside from this slight abuse of notation, ⊕ and ⊙ denote tropical sum and product, while + and · denote standard sum and product in all other contexts. Tropical power evidently has the following properties:

• For x, y ∈ R and a ∈ R, a ≥ 0, (x ⊕ y)^a = x^a ⊕ y^a and (x ⊙ y)^a = x^a ⊙ y^a. If a is allowed negative values, then we lose the first property: in general (x ⊕ y)^a ≠ x^a ⊕ y^a for a < 0.

• For x ∈ R, x^0 = 0.

• For x ∈ R and a, b ∈ N, (x^a)^b = x^{a·b}.

• For x ∈ R and a, b ∈ Z, x^a ⊙ x^b = x^{a+b}.

• For x ∈ R and a, b ∈ Z, x^a ⊕ x^b = x^b ⊙ (x^{a−b} ⊕ 0) = x^a ⊙ (0 ⊕ x^{b−a}).
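A quick numerical sanity check of these rules in the max-plus encoding of Section 2 (⊕ = max, ⊙ = +, x^a = a · x; the snippet is ours, not the paper's):

```python
x, y, a, b = 1.5, -0.5, 3, -2

for k in range(4):                                   # (x (+) y)^k = x^k (+) y^k for k >= 0
    assert k * max(x, y) == max(k * x, k * y)
assert -1 * max(x, y) != max(-1 * x, -1 * y)         # fails for a negative exponent

assert b * (a * x) == (a * b) * x                    # (x^a)^b = x^(ab)
assert a * x + b * x == (a + b) * x                  # x^a (.) x^b = x^(a+b)
assert max(a * x, b * x) == b * x + max((a - b) * x, 0)  # x^a (+) x^b = x^b (.) (x^(a-b) (+) 0)
```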

C. Examples

C.1. Examples of tropical curves and dual subdivisions of Newton polygons

Let f ∈ Pol(2, 1) = T[x_1, x_2], i.e., a bivariate tropical polynomial. It follows from our discussions in Section 3 that the tropical hypersurface T(f) is a planar graph dual to the dual subdivision δ(f) in the following sense:

(i) Each two-dimensional face in δ(f) corresponds to a vertex in T(f).
(ii) Each one-dimensional edge of a face in δ(f) corresponds to an edge in T(f). In particular, an edge from the Newton polygon ∆(f) corresponds to an unbounded edge in T(f), while other edges correspond to bounded edges.

Figure 2 illustrates how we may find the dual subdivision for the tropical polynomial f(x_1, x_2) = 1 ⊙ x_1² ⊕ 1 ⊙ x_2² ⊕ 2 ⊙ x_1x_2 ⊕ 2 ⊙ x_1 ⊕ 2 ⊙ x_2 ⊕ 2. First, find the convex hull

    P(f) = Conv{(2, 0, 1), (0, 2, 1), (1, 1, 2), (1, 0, 2), (0, 1, 2), (0, 0, 2)}.

Then, by projecting the upper envelope of P(f) to R², we obtain δ(f), the dual subdivision of the Newton polygon.

C.2. Polytopes of a two-layer neural network

We illustrate our discussions in Section 6.2 with a two-layer example. Let ν : R² → R be a network with n_0 = 2 input nodes, n_1 = 5 nodes in the first layer, and n_2 = 1 node in the output:

    y = ν^(1)(x) = max{ A^(1)x + b^(1), 0 },  A^(1) = [−1 1; 1 −3; 1 2; −4 1; 3 2],  b^(1) = (1, −1, 2, 0, −2)^T,

    ν^(2)(y) = max{ y_1 + 2y_2 + y_3 − y_4 − 3y_5, 0 }.

We first express ν^(1) and ν^(2) as tropical rational maps, ν^(1) = F^(1) ⊘ G^(1), ν^(2) = f^(2) ⊘ g^(2), where

    y := F^(1)(x) = H^(1)(x) ⊕ G^(1)(x),

    z := G^(1)(x) = (x_1, x_2³, 0, x_1⁴, 0)^T,
    H^(1)(x) = (1 ⊙ x_2, (−1) ⊙ x_1, 2 ⊙ x_1x_2², x_2, (−2) ⊙ x_1³x_2²)^T,

and f^(2)(x) = g^(2)(x) ⊕ h^(2)(x),

    g^(2)(x) = y_4 ⊙ y_5³ ⊙ z_1 ⊙ z_2² ⊙ z_3
             = (x_2 ⊕ x_1⁴) ⊙ ((−2) ⊙ x_1³x_2² ⊕ 0)³ ⊙ x_1 ⊙ (x_2³)²,
    h^(2)(x) = y_1 ⊙ y_2² ⊙ y_3 ⊙ z_4 ⊙ z_5³
             = (1 ⊙ x_2 ⊕ x_1) ⊙ ((−1) ⊙ x_1 ⊕ x_2³)² ⊙ (2 ⊙ x_1x_2² ⊕ 0) ⊙ x_1⁴.

We will write F^(1) = (f_1^(1), ..., f_5^(1)) and likewise for G^(1) and H^(1). The monomials occurring in g_j^(1)(x) and h_j^(1)(x) are all of the form c x_1^{a_1} x_2^{a_2}. Therefore P(g_j^(1)) and P(h_j^(1)), j = 1, ..., 5, are points in R³.

Since F^(1) = G^(1) ⊕ H^(1), P(f_j^(1)) is the convex hull of two points, and thus a line segment in R³. The Newton polygons associated with the f_j^(1), equal to their dual subdivisions in this case, are obtained by projecting these line segments back to the plane spanned by a_1, a_2, as shown on the left in Figure C.1.

[Figure C.1, left panel: the line segments P(f_j^(1)) and points P(g_j^(1)), P(h_j^(1)) in R³ (coordinates a_1, a_2, c), with their projections onto the (a_1, a_2)-plane. Right panel: the polytope P(g^(2)), whose dual subdivision has vertices (1, 7), (5, 6), (10, 13), and (14, 12), the latter two carrying the monomials (−6) ⊙ x_1^{10}x_2^{13} and (−6) ⊙ x_1^{14}x_2^{12}.]

Figure C.1. Left: P(F^(1)) and dual subdivisions of F^(1). Right: P(g^(2)) and dual subdivision of g^(2). In both figures, dual subdivisions have been translated along the −c direction (downwards) and separated from the polytopes for visibility.

[Figure C.2, left panel: the polytope P(h^(2)) and its dual subdivision, with the monomials 1 ⊙ x_1⁴x_2⁷, 3 ⊙ x_1⁵x_2⁹, 2 ⊙ x_1⁶x_2⁸, x_1⁸x_2², (−2) ⊙ x_1⁷, (−1) ⊙ x_1⁶x_2, 1 ⊙ x_1⁷x_2³, and x_1⁵x_2⁶ labeling the vertices of the subdivision at (4, 7), (5, 9), (6, 8), (8, 2), (7, 0), (6, 1), (7, 3), and (5, 6). Right panel: P(f^(2)) and its dual subdivision, whose vertices combine those of g^(2) and h^(2), including (1, 7), (5, 6), (6, 2), (10, 13), and (14, 12).]

Figure C.2. Left: The polytope associated with h^(2) and its dual subdivision. Right: P(f^(2)) and dual subdivision of f^(2). In both figures, dual subdivisions have been translated along the −c direction (downwards) and separated from the polytopes for visibility.

The line segments P(f_j^(1)), j = 1, ..., 5, and points P(g_j^(1)), j = 1, ..., 5, serve as building blocks for P(g^(2)) and P(h^(2)), which are constructed as weighted Minkowski sums:

    P(g^(2)) = P(f_4^(1)) + 3P(f_5^(1)) + P(g_1^(1)) + 2P(g_2^(1)) + P(g_3^(1)),
    P(h^(2)) = P(f_1^(1)) + 2P(f_2^(1)) + P(f_3^(1)) + P(g_4^(1)) + 3P(g_5^(1)).

P(g^(2)) and the dual subdivision of its Newton polygon are shown on the right in Figure C.1. P(h^(2)) and the dual subdivision of its Newton polygon are shown on the left in Figure C.2. P(f^(2)) is the convex hull of the union of P(g^(2)) and P(h^(2)). The dual subdivision of its Newton polygon is obtained by projecting the upper faces of P(f^(2)) to the plane spanned by a_1, a_2. These are shown on the right in Figure C.2.
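The algebra above can be verified numerically. A minimal sketch (entirely our own code) that evaluates the network directly and confirms ν = f^(2) ⊘ g^(2) = (g^(2) ⊕ h^(2)) − g^(2) at random inputs:

```python
import numpy as np

A1 = np.array([[-1, 1], [1, -3], [1, 2], [-4, 1], [3, 2]])
b1 = np.array([1., -1., 2., 0., -2.])
a2 = np.array([1., 2., 1., -1., -3.])

def nu(x):
    y = np.maximum(A1 @ x + b1, 0.0)
    return max(a2 @ y, 0.0)

# First-layer tropical data as (coefficient, exponent) pairs: g_j and h_j.
G1 = [(0, (1, 0)), (0, (0, 3)), (0, (0, 0)), (0, (4, 0)), (0, (0, 0))]
H1 = [(1, (0, 1)), (-1, (1, 0)), (2, (1, 2)), (0, (0, 1)), (-2, (3, 2))]

def val(c, alpha, x):          # value of the tropical monomial c . x^alpha
    return c + np.dot(alpha, x)

def second_layer(x):
    f = [max(val(*H1[j], x), val(*G1[j], x)) for j in range(5)]
    g = [val(*G1[j], x) for j in range(5)]
    g2 = f[3] + 3 * f[4] + g[0] + 2 * g[1] + g[2]   # g(2) = f4 . f5^3 . g1 . g2^2 . g3
    h2 = f[0] + 2 * f[1] + f[2] + g[3] + 3 * g[4]   # h(2) = f1 . f2^2 . f3 . g4 . g5^3
    return g2, h2

rng = np.random.default_rng(3)
for x in rng.standard_normal((100, 2)):
    g2, h2 = second_layer(x)
    assert np.isclose(nu(x), max(g2, h2) - g2)      # nu = f(2) - g(2)
```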

D. Proofs

D.1. Proof of Corollary 3.4

Proof. Let V1 and V2 be the sets of vertices on the upper and lower envelopes of P respectively. By Theorem 3.3, P has

    n_1 := 2 Σ_{j=0}^{d} C(m−1, j)

vertices in total. By construction, we have |V_1 ∪ V_2| = n_1. It is well known that zonotopes are centrally symmetric, and so there are equal numbers of vertices on the upper and lower envelopes, i.e., |V_1| = |V_2|. Let P′ := π(P) be the projection of P into R^d. Since the projected vertices are assumed to be in general positions, P′ must be a d-dimensional zonotope generated by m nonparallel line segments. Hence, by Theorem 3.3 again, P′ has

    n_2 := 2 Σ_{j=0}^{d−1} C(m−1, j)

vertices. For any vertex v ∈ P, π(v) is a vertex of P′ if and only if v belongs to both the upper and lower envelopes, i.e., v ∈ V_1 ∩ V_2. Therefore the number of vertices of P′ equals |V_1 ∩ V_2|. By construction, we have |V_1 ∩ V_2| = n_2. Consequently the number of vertices on the upper envelope is

    |V_1| = ½(|V_1 ∪ V_2| − |V_1 ∩ V_2|) + |V_1 ∩ V_2| = ½(n_1 − n_2) + n_2 = Σ_{j=0}^{d} C(m, j).  ∎
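Corollary 3.4 can also be checked by brute force: build a zonotope in R³ (so d = 2) from m random generating segments and count the vertices on its upper faces, which should be Σ_{j=0}^{2} C(m, j). A sketch under the general-position assumption (our own code, reusing the upper-face test from Section 3):

```python
import numpy as np
from math import comb
from itertools import product
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)
d, m = 2, 4
gens = rng.standard_normal((m, d + 1))     # m generating segments [0, v_i] in R^(d+1)

# The zonotope is the hull of all 2^m subset sums of the generators.
pts = np.array([eps @ gens for eps in product([0.0, 1.0], repeat=m)])
hull = ConvexHull(pts)
upper = hull.equations[hull.equations[:, d] > 1e-9]
on_upper = {v for v in hull.vertices
            for n in upper if abs(np.dot(n[:d + 1], pts[v]) + n[d + 1]) < 1e-9}

print(len(on_upper), sum(comb(m, j) for j in range(d + 1)))   # both 11 for m = 4, d = 2
```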

D.2. Proof of Proposition 5.1

Proof. Writing A = A₊ − A₋, we have

    ρ^(l+1) ∘ ν^(l)(x) = (A₊ − A₋)(F^(l)(x) − G^(l)(x)) + b
                       = (A₊F^(l)(x) + A₋G^(l)(x) + b) − (A₊G^(l)(x) + A₋F^(l)(x))
                       = H^(l+1)(x) − G^(l+1)(x),

    ν^(l+1)(x) = max{ρ^(l+1) ∘ ν^(l)(x), t}
               = max{H^(l+1)(x) − G^(l+1)(x), t}
               = max{H^(l+1)(x), G^(l+1)(x) + t} − G^(l+1)(x)
               = F^(l+1)(x) − G^(l+1)(x).  ∎

D.3. Proof of Theorem 5.4

Proof. It remains to establish the "only if" part. We will write σ_t(x) := max{x, t}. Any tropical monomial b_i x^{α_i} is clearly such a neural network, as

    b_i x^{α_i} = (σ_{−∞} ∘ ρ_i)(x) = max{α_i^T x + b_i, −∞}.

If two tropical polynomials p and q are represented as neural networks with l_p and l_q layers respectively,

    p(x) = (σ_{−∞} ∘ ρ_p^{(l_p)} ∘ σ_0 ∘ ··· ∘ σ_0 ∘ ρ_p^{(1)})(x),
    q(x) = (σ_{−∞} ∘ ρ_q^{(l_q)} ∘ σ_0 ∘ ··· ∘ σ_0 ∘ ρ_q^{(1)})(x),

then (p ⊕ q)(x) = max{p(x), q(x)} can also be written as a neural network with max{l_p, l_q} + 1 layers:

    (p ⊕ q)(x) = σ_{−∞}( [σ_0 ∘ ρ_1](y(x)) + [σ_0 ∘ ρ_2](y(x)) − [σ_0 ∘ ρ_3](y(x)) ),

where y : R^d → R² is given by y(x) = (p(x), q(x)) and ρ_i : R² → R, i = 1, 2, 3, are linear functions defined by

    ρ_1(y) = y_1 − y_2,  ρ_2(y) = y_2,  ρ_3(y) = −y_2.

Thus, by induction, any tropical polynomial can be written as a neural network with ReLU activation. Observe also that if a tropical polynomial is the tropical sum of r monomials, then it can be written as a neural network with no more than ⌈log₂ r⌉ + 1 layers.

Next we consider a tropical rational function (p ⊘ q)(x) = p(x) − q(x), where p and q are tropical polynomials. Under the same assumptions, we can represent p ⊘ q as

    (p ⊘ q)(x) = σ_{−∞}( [σ_0 ∘ ρ_4](y(x)) − [σ_0 ∘ ρ_5](y(x)) + [σ_0 ∘ ρ_6](y(x)) − [σ_0 ∘ ρ_7](y(x)) ),

where ρ_i : R² → R, i = 4, 5, 6, 7, are linear functions defined by

    ρ_4(y) = y_1,  ρ_5(y) = −y_1,  ρ_6(y) = −y_2,  ρ_7(y) = y_2.

Therefore p ⊘ q is also a neural network with at most max{l_p, l_q} + 1 layers.

Finally, if f and g are tropical polynomials that are respectively tropical sums of r_f and r_g monomials, then the discussions above show that (f ⊘ g)(x) = f(x) − g(x) is a neural network with at most max{⌈log₂ r_f⌉, ⌈log₂ r_g⌉} + 2 layers.  ∎
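The three-ρ construction above boils down to the ReLU identity max{p, q} = max{p − q, 0} + max{q, 0} − max{−q, 0}, since the last two terms sum to q. A one-line numerical check (the snippet is ours):

```python
import numpy as np

rng = np.random.default_rng(4)
p, q = rng.standard_normal(1000), rng.standard_normal(1000)
lhs = np.maximum(p, q)
rhs = np.maximum(p - q, 0) + np.maximum(q, 0) - np.maximum(-q, 0)
assert np.allclose(lhs, rhs)
```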

D.4. Proof of Proposition 5.5

Proof. It remains to establish the "if" part. Let R^d be divided into N polyhedral regions, on each of which ν restricts to a linear function

    ℓ_i(x) = a_i^T x + b_i,  a_i ∈ Z^d,  b_i ∈ R,  i = 1, ..., L,

i.e., for any x ∈ R^d, ν(x) = ℓ_i(x) for some i ∈ {1, ..., L}. It follows from (Tarela & Martinez, 1999) that we can find N subsets of {1, ..., L}, denoted by S_j, j = 1, ..., N, so that ν has a representation

    ν(x) = max_{j=1,...,N} min_{i∈S_j} ℓ_i(x).

It is clear that each ℓ_i is a tropical rational function. Now for any tropical rational functions p and q,

    min{p, q} = −max{−p, −q} = 0 ⊘ [(0 ⊘ p) ⊕ (0 ⊘ q)] = (p ⊙ q) ⊘ (p ⊕ q).

Since p ⊙ q and p ⊕ q are both tropical rational functions, so is their tropical quotient. By induction, min_{i∈S_j} ℓ_i is a tropical rational function for any j = 1, ..., N, and therefore so is their tropical sum ν.  ∎
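The identity min{p, q} = (p ⊙ q) ⊘ (p ⊕ q) used above is just min(p, q) = (p + q) − max(p, q); a quick check (our snippet):

```python
import numpy as np

rng = np.random.default_rng(5)
p, q = rng.standard_normal(1000), rng.standard_normal(1000)
assert np.allclose(np.minimum(p, q), (p + q) - np.maximum(p, q))
```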

D.5. Proof of Proposition 5.6

Proof. For a one-layer neural network ν(x) = max{Ax + b, t} = (ν_1(x), ..., ν_p(x)) with A ∈ R^{p×d}, b ∈ R^p, x ∈ R^d, t ∈ (R ∪ {−∞})^p, we have

    ν_k(x) = (b_k ⊙ ⨀_{j=1}^{d} x_j^{a_kj}) ⊕ t_k = (b_k ⊙ ⨀_{j=1}^{d} x_j^{a_kj}) ⊕ (t_k ⊙ ⨀_{j=1}^{d} x_j^0),  k = 1, ..., p.

So for any k = 1, ..., p, if we write b̄_1 = b_k, b̄_2 = t_k, ā_1j = a_kj, ā_2j = 0, j = 1, ..., d, then

    ν_k(x) = ⨁_{i=1}^{2} b̄_i ⊙ ⨀_{j=1}^{d} x_j^{ā_ij}

is clearly a tropical signomial. Therefore ν is a tropical signomial map. The result for an arbitrary number of layers then follows from the same recurrence as in the proof in Section D.2, except that now the entries of the weight matrix are allowed to take real values, and the maps H^(l)(x), G^(l)(x), F^(l)(x) are tropical signomial maps. Hence every layer can be written as a tropical rational signomial map ν^(l) = F^(l) ⊘ G^(l).  ∎

D.6. Proof of Proposition 6.1

We prove a slightly more general result.

Proposition D.1 (Level sets). Let f ⊘ g ∈ Rat(d, 1) = T(x_1, ..., x_d).

(i) Given a constant c > 0, the level set B := {x ∈ R^d : f(x) ⊘ g(x) = c} divides R^d into at most N(f) connected polyhedral regions where f(x) ⊘ g(x) > c, and at most N(g) such regions where f(x) ⊘ g(x) < c.

(ii) If c ∈ R is such that there is no tropical monomial in f(x) that differs from any tropical monomial in g(x) by c, then the level set B is contained in a tropical hypersurface,

    B ⊆ T(max{f(x), g(x) + c}) = T(c ⊙ g ⊕ f).

Proof. We show that the bounds on the numbers of connected positive (i.e., above c) and negative (i.e., below c) regions are as we claimed in (i). The tropical hypersurface of f divides R^d into N(f) convex regions C_1, ..., C_{N(f)} such that f is linear on each C_i. As g is piecewise linear and convex over R^d, f ⊘ g = f − g is piecewise linear and concave on each C_i. Since the level set {x : f(x) − g(x) = c} and the superlevel set {x : f(x) − g(x) ≥ c} must be convex by the concavity of f − g, there is at most one positive region in each C_i. Therefore the total number of connected positive regions cannot exceed N(f). Likewise, the tropical hypersurface of g divides R^d into N(g) convex regions on each of which f ⊘ g is convex. The same argument shows that the number of connected negative regions does not exceed N(g).

We next address (ii). Upon rearranging terms, the level set becomes

    B = {x ∈ R^d : f(x) = g(x) + c}.

Since f(x) and g(x) + c are both tropical polynomials, we have

    f(x) = b_1 x^{α_1} ⊕ ··· ⊕ b_r x^{α_r},
    g(x) + c = c_1 x^{β_1} ⊕ ··· ⊕ c_s x^{β_s},

with appropriate multiindices α_1, ..., α_r, β_1, ..., β_s and real coefficients b_1, ..., b_r, c_1, ..., c_s. By the assumption on the monomials, we have that x_0 ∈ B only if there exist i, j so that α_i ≠ β_j and b_i x_0^{α_i} = c_j x_0^{β_j}. This completes the proof, since if we combine the monomials of f(x) and g(x) + c by (tropically) summing them into a single tropical polynomial, max{f(x), g(x) + c}, the above implies that on the level set the value of the combined tropical polynomial is attained by at least two monomials, and therefore x_0 ∈ T(max{f(x), g(x) + c}).  ∎

Proposition 6.1 follows immediately from Proposition D.1, since the decision boundary {x ∈ R^d : ν(x) = s⁻¹(c)} is a level set of the tropical rational function ν.

D.7. Proof of Theorem 6.3

The linear regions of a tropical polynomial map F ∈ Pol(d, m) are all convex, but this is not necessarily the case for a tropical rational map F ∈ Rat(d, n). Take for example a bivariate real-valued function f(x, y) whose graph in R³ is a pyramid with base {(x, y) ∈ R² : x, y ∈ [−1, 1]} and zero everywhere else; then the linear region where f vanishes is R² \ {(x, y) ∈ R² : x, y ∈ [−1, 1]}, which is nonconvex. The nonconvexity invalidates certain geometric arguments that only apply in the convex setting. Nevertheless, there is a way to subdivide each of the nonconvex linear regions into convex ones to get ourselves back into the convex setting. We will start with the number of convex linear regions for tropical rational maps, although later we will deduce the required results for the number of linear regions (without imposing convexity).

We first extend the notion of tropical hypersurface to tropical rational maps: given a tropical rational map F ∈ Rat(d, m), we define T(F) to be the boundaries between adjacent linear regions. When F = (f_1, ..., f_m) ∈ Pol(d, m), i.e., a tropical polynomial map, this set is exactly the union of the tropical hypersurfaces T(f_i), i = 1, ..., m. Therefore this definition of T(F) extends Definition 3.1.

For a tropical rational map F, we will examine the smallest number of convex regions that form a refinement of T(F). For brevity, we will call this the convex degree of F; for consistency, the number of linear regions of F we will call its linear degree. We define convex degree formally below. We will write F|_C to mean the restriction of the map F to C ⊆ R^d.

Definition D.1. The convex degree of a tropical rational map F ∈ Rat(d, n) is the minimum division of R^d into convex regions over which F is linear, i.e.,

    N_c(F) := min{n : C_1 ∪ ··· ∪ C_n = R^d, C_i convex, F|_{C_i} linear}.

Note that C_1, ..., C_{N_c(F)} either divide R^d into the same regions as T(F) or form a refinement.

For $m \le d$, we will denote by $\mathcal{N}_c(F \mid m)$ the maximum convex degree obtained by restricting $F$ to an $m$-dimensional affine subspace of $\mathbb{R}^d$, i.e.,
$$\mathcal{N}_c(F \mid m) := \max\{\mathcal{N}_c(F|_\Omega) : \Omega \subseteq \mathbb{R}^d \text{ is an } m\text{-dimensional affine subspace}\}.$$
For any $F \in \operatorname{Rat}(d, n)$, there is at least one tropical polynomial map that subdivides $\mathcal{T}(F)$, and so convex degree is well-defined (e.g., if $F = (p_1 \oslash q_1, \dots, p_n \oslash q_n) \in \operatorname{Rat}(d, n)$, then we may choose $P = (p_1, \dots, p_n, q_1, \dots, q_n) \in \operatorname{Pol}(d, 2n)$). Since the linear regions of a tropical polynomial map are always convex, we have $\mathcal{N}(F) = \mathcal{N}_c(F)$ for any $F \in \operatorname{Pol}(d, n)$.

Let $F = (f_1, \dots, f_n) \in \operatorname{Rat}(d, n)$ and $\alpha = (a_1, \dots, a_n) \in \mathbb{Z}^n$. Consider the tropical rational function
$$F^\alpha := \alpha^{\mathsf{T}} F = a_1 f_1 + \cdots + a_n f_n = \bigodot_{j=1}^{n} f_j^{a_j} \in \operatorname{Rat}(d, 1).$$
(This is in the sense of a tropical power, but we stay consistent with our slight abuse of notation and write $F^\alpha$ instead of $F^{\odot \alpha}$.) For some $\alpha$, $F^\alpha$ may have fewer linear regions than $F$, e.g., $\alpha = (0, \dots, 0)$. As such, we need the following notion.

Definition D.2. $\alpha = (a_1, \dots, a_n) \in \mathbb{Z}^n$ is said to be a general exponent of $F \in \operatorname{Rat}(d, n)$ if the linear regions of $F^\alpha$ and the linear regions of $F$ are identical.
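
A one-dimensional sketch (ours) of this definition: for $F = (\max\{x, 0\}, \max\{-x, 0\})$, the exponent $\alpha = (1, 1)$ gives $F^\alpha = |x|$, which retains both linear regions of $F$, whereas $\alpha = (1, -1)$ gives $F^\alpha = x$ and merges them; only the former is a general exponent.

```python
import numpy as np

def F_alpha(alpha, x):
    # F = (max(x, 0), max(-x, 0)); F^alpha = a1*f1 + a2*f2 is the tropical power alpha^T F
    return alpha[0] * np.maximum(x, 0.0) + alpha[1] * np.maximum(-x, 0.0)

xs = np.linspace(-1.0, 1.0, 2001)
for alpha in [(1, 1), (1, -1)]:
    # count distinct slopes of the piecewise linear function F^alpha on a fine grid
    slopes = np.round(np.diff(F_alpha(alpha, xs)) / np.diff(xs), 6)
    print(alpha, len(set(slopes)))   # 2 linear regions for (1, 1), only 1 for (1, -1)
```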

We now show that a general exponent always exists for any $F \in \operatorname{Rat}(d, n)$ and may be chosen to have all entries nonnegative.

Lemma D.2. Let $F \in \operatorname{Rat}(d, n)$. Then (i) $\mathcal{N}(F^\alpha) = \mathcal{N}(F)$ if and only if $\alpha$ is a general exponent; (ii) $F$ has a general exponent $\alpha \in \mathbb{N}^n$.

Proof. It follows from the definition of tropical hypersurface that $\mathcal{T}(F^\alpha)$ and $\mathcal{T}(F)$ comprise, respectively, the points $x \in \mathbb{R}^d$ at which $F^\alpha$ and $F$ are not differentiable. Hence $\mathcal{T}(F^\alpha) \subseteq \mathcal{T}(F)$, which implies that $\mathcal{N}(F^\alpha) < \mathcal{N}(F)$ unless $\mathcal{T}(F^\alpha) = \mathcal{T}(F)$. This proves (i). For (ii), we need to show that there always exists an $\alpha \in \mathbb{N}^n$ such that $F^\alpha$ divides its domain $\mathbb{R}^d$ into the same set of linear regions as $F$. In other words, for every pair of adjacent linear regions of $F$, the $(d-1)$-dimensional face in $\mathcal{T}(F)$ that separates them must also be present in $\mathcal{T}(F^\alpha)$, so that $\mathcal{T}(F^\alpha) \supseteq \mathcal{T}(F)$.

Let $L$ and $M$ be adjacent linear regions of $F$. The differentials of $F|_L$ and $F|_M$ must have integer coordinates, i.e., $dF|_L, dF|_M \in \mathbb{Z}^{n \times d}$. Since $L$ and $M$ are distinct linear regions, we must have $dF|_L \ne dF|_M$ (otherwise $L$ and $M$ could be merged into a single linear region). Note that the differentials of $F^\alpha|_L$ and $F^\alpha|_M$ are given by $\alpha^{\mathsf{T}} dF|_L$ and $\alpha^{\mathsf{T}} dF|_M$.

To ensure that the $(d-1)$-dimensional face separating $L$ and $M$ still exists in $\mathcal{T}(F^\alpha)$, we need to choose $\alpha$ so that $\alpha^{\mathsf{T}} dF|_L \ne \alpha^{\mathsf{T}} dF|_M$. Observe that the solution set of $(dF|_L - dF|_M)^{\mathsf{T}} \alpha = 0$ is contained in an $(n-1)$-dimensional subspace of $\mathbb{R}^n$. Let $\mathcal{A}(F)$ be the collection of all pairs of adjacent linear regions of $F$. Since the set of $\alpha$ that degenerate two adjacent linear regions into a single one, i.e.,

$$S := \bigcup_{(L, M) \in \mathcal{A}(F)} \{\alpha \in \mathbb{N}^n : (dF|_L - dF|_M)^{\mathsf{T}} \alpha = 0\},$$
is contained in a union of a finite number of hyperplanes in $\mathbb{R}^n$, $S$ cannot cover the entire lattice of nonnegative integers $\mathbb{N}^n$. Therefore the set $\mathbb{N}^n \cap (\mathbb{R}^n \setminus S)$ is nonempty, and any of its elements is a general exponent for $F$.
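
The proof is effectively an algorithm: sample a nonnegative integer vector and reject it if it annihilates the difference of differentials on some pair of adjacent regions. Below is a sketch (ours), with made-up integer matrices standing in for the differentials $dF|_L$; for simplicity it tests every pair of regions rather than only adjacent ones.

```python
import numpy as np
from itertools import combinations

# hypothetical integer differentials dF|_L on three linear regions (n = 2, d = 2)
diffs = [np.array([[1, 0], [0, 2]]),
         np.array([[1, 0], [1, 1]]),
         np.array([[0, 1], [0, 2]])]

def is_general(alpha):
    # alpha separates a pair iff alpha^T dF|_L != alpha^T dF|_M
    return all(not np.array_equal(alpha @ L, alpha @ M)
               for L, M in combinations(diffs, 2))

rng = np.random.default_rng(1)
alpha = rng.integers(1, 100, size=2)
while not is_general(alpha):      # bad alphas lie on finitely many hyperplanes, so this exits fast
    alpha = rng.integers(1, 100, size=2)
print(alpha)                      # a (numerically found) general exponent
```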

Lemma D.2 shows that we may study the linear degree of a tropical rational map by studying that of a tropical rational function, for which the results in Section 3.1 apply. We are now ready to prove a key result on the convex degree of a composition of tropical rational maps.

Theorem D.3. Let $F = (f_1, \dots, f_m) \in \operatorname{Rat}(n, m)$ and $G \in \operatorname{Rat}(d, n)$. Define $H = (h_1, \dots, h_m) \in \operatorname{Rat}(d, m)$ by
$$h_i := f_i \circ G, \quad i = 1, \dots, m.$$
Then
$$\mathcal{N}(H) \le \mathcal{N}_c(H) \le \mathcal{N}_c(F \mid d) \cdot \mathcal{N}_c(G).$$

Proof. Only the upper bound requires a proof. Let $k = \mathcal{N}_c(G)$. By the definition of $\mathcal{N}_c(G)$, there exist convex sets $C_1, \dots, C_k \subseteq \mathbb{R}^d$ whose union is $\mathbb{R}^d$ and on each of which $G$ is linear, so $G|_{C_i}$ is some affine function $\rho_i$. For any $i$,
$$\mathcal{N}_c(F \circ \rho_i) \le \mathcal{N}_c(F \mid d)$$
by the definition of $\mathcal{N}_c(F \mid d)$. Since $F \circ G = F \circ \rho_i$ on $C_i$, we have
$$\mathcal{N}_c(F \circ G) \le \sum_{i=1}^{k} \mathcal{N}_c(F \circ \rho_i).$$
Hence
$$\mathcal{N}_c(F \circ G) \le \sum_{i=1}^{k} \mathcal{N}_c(F \circ \rho_i) \le \sum_{i=1}^{k} \mathcal{N}_c(F \mid d) = \mathcal{N}_c(F \mid d) \cdot \mathcal{N}_c(G).$$

We now apply our observations on tropical rational functions to neural networks. The next lemma follows directly from Corollary 3.4.

Lemma D.4. Let $\sigma^{(l)} \circ \rho^{(l)} : \mathbb{R}^{n_{l-1}} \to \mathbb{R}^{n_l}$, where $\rho^{(l)}$ and $\sigma^{(l)}$ are, respectively, the affine transformation and the activation of the $l$th layer of a neural network. If $d \le n_l$, then
$$\mathcal{N}_c(\sigma^{(l)} \circ \rho^{(l)} \mid d) \le \sum_{i=0}^{d} \binom{n_l}{i}.$$

Proof. $\mathcal{N}_c(\sigma^{(l)} \circ \rho^{(l)} \mid d)$ is the maximum convex degree of a tropical rational map $F = (f_1, \dots, f_{n_l}) : \mathbb{R}^d \to \mathbb{R}^{n_l}$ of the form
$$f_i(x) := \sigma_i^{(l)} \circ \rho^{(l)}\bigl(b_1 \odot x^{\alpha_1}, \dots, b_{n_{l-1}} \odot x^{\alpha_{n_{l-1}}}\bigr), \quad i = 1, \dots, n_l.$$
For a general affine transformation $\rho^{(l)}$,
$$\rho^{(l)}\bigl(b_1 \odot x^{\alpha_1}, \dots, b_{n_{l-1}} \odot x^{\alpha_{n_{l-1}}}\bigr) = \bigl(b'_1 \odot x^{\alpha'_1}, \dots, b'_{n_l} \odot x^{\alpha'_{n_l}}\bigr) =: G(x)$$
for some $\alpha'_1, \dots, \alpha'_{n_l}$ and $b'_1, \dots, b'_{n_l}$, and we denote this map by $G : \mathbb{R}^d \to \mathbb{R}^{n_l}$. So $f_i = \sigma_i \circ G$. By Theorem D.3, we have $\mathcal{N}_c(\sigma^{(l)} \circ \rho^{(l)} \mid d) = \mathcal{N}_c(\sigma^{(l)} \mid d) \cdot \mathcal{N}_c(G) = \mathcal{N}_c(\sigma^{(l)} \mid d)$; note that $\mathcal{N}_c(G) = 1$ as $G$ is a linear function.

We have thus reduced the problem to determining a bound on the convex degree of a single-layer neural network with $n_l$ nodes, $\nu = (\nu_1, \dots, \nu_{n_l}) : \mathbb{R}^d \to \mathbb{R}^{n_l}$. Let $\gamma = (c_1, \dots, c_{n_l}) \in \mathbb{N}^{n_l}$ be a nonnegative general exponent for $\nu$. Note that
$$\bigodot_{j=1}^{n_l} \nu_j^{c_j} = \bigodot_{j=1}^{n_l} \Bigl[\Bigl(b_j \odot \bigodot_{i=1}^{d} x_i^{a_{ji}^{+}}\Bigr) \oplus \Bigl(\bigodot_{i=1}^{d} x_i^{a_{ji}^{-}}\Bigr) \odot t_j\Bigr]^{c_j} - \bigodot_{j=1}^{n_l} \Bigl(\bigodot_{i=1}^{d} x_i^{a_{ji}^{-}}\Bigr)^{c_j}.$$
Since the last term is linear in $x$, we may drop it without affecting the convex degree of the entire expression. It remains to determine an upper bound on the number of linear regions of the tropical polynomial
$$h(x) = \bigodot_{j=1}^{n_l} \Bigl[\Bigl(b_j \odot \bigodot_{i=1}^{d} x_i^{a_{ji}^{+}}\Bigr) \oplus \Bigl(\bigodot_{i=1}^{d} x_i^{a_{ji}^{-}}\Bigr) \odot t_j\Bigr]^{c_j},$$
which we will obtain by counting vertices of the polytope $\mathcal{P}(h)$. By Propositions 3.1 and 3.2, the polytope $\mathcal{P}(h)$ is given by the weighted Minkowski sum
$$\mathcal{P}(h) = \sum_{j=1}^{n_l} c_j\, \mathcal{P}\Bigl(\Bigl(b_j \odot \bigodot_{i=1}^{d} x_i^{a_{ji}^{+}}\Bigr) \oplus \Bigl(\bigodot_{i=1}^{d} x_i^{a_{ji}^{-}}\Bigr) \odot t_j\Bigr).$$
By Proposition 3.2 again,
$$\mathcal{P}\Bigl(\Bigl(b_j \odot \bigodot_{i=1}^{d} x_i^{a_{ji}^{+}}\Bigr) \oplus \Bigl(\bigodot_{i=1}^{d} x_i^{a_{ji}^{-}}\Bigr) \odot t_j\Bigr) = \operatorname{Conv}\bigl(\mathcal{V}(\mathcal{P}(f)) \cup \mathcal{V}(\mathcal{P}(g))\bigr),$$
where
$$f(x) = b_j \odot \bigodot_{i=1}^{d} x_i^{a_{ji}^{+}} \quad\text{and}\quad g(x) = \Bigl(\bigodot_{i=1}^{d} x_i^{a_{ji}^{-}}\Bigr) \odot t_j$$
are tropical monomials. Therefore $\mathcal{P}(f)$ and $\mathcal{P}(g)$ are just points in $\mathbb{R}^{d+1}$, and $\operatorname{Conv}(\mathcal{V}(\mathcal{P}(f)) \cup \mathcal{V}(\mathcal{P}(g)))$ is a line segment in $\mathbb{R}^{d+1}$. Hence $\mathcal{P}(h)$ is a Minkowski sum of $n_l$ line segments in $\mathbb{R}^{d+1}$, i.e., a zonotope, and Corollary 3.4 completes the proof.
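
To see what this bound means in practice, here is a quick sketch (ours): one ReLU layer $x \mapsto \max(Ax + b, 0)$ with random (hence generic) weights partitions $\mathbb{R}^d$ along the hyperplanes $a_j \cdot x + b_j = 0$, and the number of activation patterns realized on a large sample never exceeds $\sum_{i=0}^{d} \binom{n_l}{i}$; sampling may miss very small regions, so the observed count can slightly undershoot it.

```python
import numpy as np
from math import comb

# one ReLU layer on R^d with n_l = 8 random hyperplanes
d, n_l = 2, 8
rng = np.random.default_rng(0)
A, b = rng.standard_normal((n_l, d)), rng.standard_normal(n_l)

pts = rng.uniform(-10.0, 10.0, size=(200000, d))
patterns = set(map(tuple, (pts @ A.T + b > 0).astype(int)))  # realized activation patterns

print(len(patterns), sum(comb(n_l, i) for i in range(d + 1)))  # observed count vs the bound (37)
```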

Using Lemma D.4, we obtain a bound on the number of linear regions created by one layer of a neural network.

Theorem D.5. Let $\nu : \mathbb{R}^d \to \mathbb{R}^{n_L}$ be an $L$-layer neural network satisfying assumptions (a)–(c), with $F^{(l)}$, $G^{(l)}$, $H^{(l)}$, and $\nu^{(l)}$ as defined in Proposition 5.1. Let $n_l \ge d$ for all $l = 1, \dots, L$. Then
$$\mathcal{N}_c(\nu^{(1)}) = \mathcal{N}(G^{(1)}) = \mathcal{N}(H^{(1)}) = 1, \qquad \mathcal{N}_c(\nu^{(l+1)}) \le \mathcal{N}_c(\nu^{(l)}) \cdot \sum_{i=0}^{d} \binom{n_{l+1}}{i}.$$

Proof. The $l = 1$ case follows from the fact that $G^{(1)}(x) = A_{-}^{(1)} x$ and $H^{(1)}(x) = A_{+}^{(1)} x + b^{(1)}$ are both linear, which in turn forces $\mathcal{N}_c(\nu^{(1)}) = 1$ as in the proof of Lemma D.4. Since $\nu^{(l)} = (\sigma^{(l)} \circ \rho^{(l)}) \circ \nu^{(l-1)}$, the recursive bound follows from Theorem D.3 and Lemma D.4.

Theorem 6.3 follows from applying Theorem D.5 recursively.
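
Unwinding the recursion in Theorem D.5 gives $\mathcal{N}_c(\nu^{(L)}) \le \prod_{l=2}^{L} \sum_{i=0}^{d} \binom{n_l}{i}$, which is straightforward to tabulate. A sketch (ours; the widths below are arbitrary examples satisfying $n_l \ge d$):

```python
from math import comb

def layer_factor(n_l, d):
    # Lemma D.4: a layer with n_l nodes contributes a factor of at most sum_{i=0}^{d} C(n_l, i)
    return sum(comb(n_l, i) for i in range(d + 1))

def convex_degree_bound(d, widths):
    """Recursive bound of Theorem D.5: N_c(nu^{(1)}) = 1, then multiply layer by layer."""
    bound = 1
    for n_l in widths[1:]:        # widths = (n_1, ..., n_L); the first preactivation is linear
        bound *= layer_factor(n_l, d)
    return bound

print(convex_degree_bound(2, (8, 8, 8)))   # 37 * 37 = 1369 for an L = 3 network on R^2
```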