Tropical Geometry of Deep Neural Networks
Total Page:16
File Type:pdf, Size:1020Kb
Tropical Geometry of Deep Neural Networks Liwen Zhang 1 Gregory Naitzat 2 Lek-Heng Lim 2 3 Abstract 2011; Montufar et al., 2014; Eldan & Shamir, 2016; Poole et al., 2016; Telgarsky, 2016; Arora et al., 2018). Recent We establish, for the first time, explicit connec- work (Zhang et al., 2016) showed that several successful tions between feedforward neural networks with neural networks possess a high representation power and ReLU activation and tropical geometry — we can easily shatter random data. However, they also general- show that the family of such neural networks is ize well to data unseen during training stage, suggesting that equivalent to the family of tropical rational maps. such networks may have some implicit regularization. Tra- Among other things, we deduce that feedforward ditional measures of complexity such as VC-dimension and ReLU neural networks with one hidden layer can Rademacher complexity fail to explain this phenomenon. be characterized by zonotopes, which serve as Understanding this implicit regularization that begets the building blocks for deeper networks; we relate generalization power of deep neural networks remains a decision boundaries of such neural networks to challenge. tropical hypersurfaces, a major object of study in tropical geometry; and we prove that linear The goal of our work is to establish connections between regions of such neural networks correspond to neural network and tropical geometry in the hope that they vertices of polytopes associated with tropical ra- will shed light on the workings of deep neural networks. tional functions. An insight from our tropical for- Tropical geometry is a new area in algebraic geometry that mulation is that a deeper network is exponentially has seen an explosive growth in the recent decade but re- more expressive than a shallow network. mains relatively obscure outside pure mathematics. We will focus on feedforward neural networks with rectified linear units (ReLU) and show that they are analogues of rational 1. Introduction functions, i.e., ratios of two multivariate polynomials f; g in variables x1; : : : ; xd, Deep neural networks have recently received much limelight f(x ; : : : ; x ) for their enormous success in a variety of applications across 1 d ; many different areas of artificial intelligence, computer vi- g(x1; : : : ; xd) sion, speech recognition, and natural language processing in tropical algebra. For standard and trigonometric poly- (LeCun et al., 2015; Hinton et al., 2012; Krizhevsky et al., nomials, it is known that rational approximation — ap- 2012; Bahdanau et al., 2014; Kalchbrenner & Blunsom, proximating a target function by a ratio of two polynomials 2013). Nevertheless, it is also well-known that our theoreti- instead of a single polynomial — vastly improves the quality cal understanding of their efficacy remains incomplete. of approximation without increasing the degree. This gives There have been several attempts to analyze deep neural net- our analogue: An ReLU neural network is the tropical ratio works from different perspectives. Notably, earlier studies of two tropical polynomials, i.e., a tropical rational function. have suggested that a deep architecture could use parameters More precisely, if we view a neural network as a function d p more efficiently and requires exponentially fewer parame- ν : R ! R , x = (x1; : : : ; xd) 7! (ν1(x); : : : ; νp(x)), ters to express certain families of functions than a shallow ar- then ν is a tropical rational map, i.e., each νi is a tropical chitecture (Delalleau & Bengio, 2011; Bengio & Delalleau, rational function. In fact, we will show that: 1Department of Computer Science, University of Chicago, Chicago, IL 2Department of Statistics, University of Chicago, the family of functions represented by feedforward Chicago, IL 3Computational and Applied Mathematics Initiative, neural networks with rectified linear units and University of Chicago, Chicago, IL. Correspondence to: Lek-Heng integer weights is exactly the family of tropical Lim <[email protected]>. rational maps. Proceedings of the 35 th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 It immediately follows that there is a semifield structure on by the author(s). this family of functions. More importantly, this establishes a Tropical Geometry of Deep Neural Networks bridge between neural networks1 and tropical geometry that it is not a ring (lacks additive inverse), one may nonetheless allows us to view neural networks as well-studied tropical generalize many algebraic objects (e.g., matrices, polynomi- geometric objects. This insight allows us to closely relate als, tensors, etc) and notions (e.g., rank, determinant, degree, boundaries between linear regions of a neural network to etc) over the tropical semiring — the study of these, in a tropical hypersurfaces and thereby facilitate studies of de- nutshell, constitutes the subject of tropical algebra. cision boundaries of neural networks in classification prob- Let = fn 2 : n ≥ 0g. For an integer a 2 , raising lems as tropical hypersurfaces. Furthermore, the number of N Z N x 2 to the ath power is the same as multiplying x to linear regions, which captures the complexity of a neural R itself a times. When standard multiplication is replaced by network (Montufar et al., 2014; Raghu et al., 2017; Arora tropical multiplication, this gives us tropical power: et al., 2018), can be bounded by the number of vertices of the polytopes associated with the neural network’s tropical x a := x · · · x = a · x; rational representation. Lastly, a neural network with one hidden layer can be completely characterized by zonotopes, where the last · denotes standard product of real numbers; it which serve as building blocks for deeper networks. is extended to R [ {−∞} by defining, for any a 2 N, ( In Sections2 and3 we introduce basic tropical algebra and −∞ if a > 0; −∞ a := tropical algebraic geometry of relevance to us. We state 0 if a = 0: our assumptions precisely in Section4 and establish the connection between tropical geometry and multilayer neural A tropical semiring, while not a field, possesses one quality networks in Section5. We analyze neural networks with of a field: Every x 2 R has a tropical multiplicative inverse tropical tools in Section6, proving that a deeper neural given by its standard additive inverse, i.e., x (−1) := −x. network is exponentially more expressive than a shallow Though not reflected in its name, T is in fact a semifield. network — though our objective is not so much to perform One may therefore also raise x 2 R to a negative power state-of-the-art analysis but to demonstrate that tropical al- a 2 Z by raising its tropical multiplicative inverse −x to the gebraic geometry can provide useful insights. All proofs are positive power −a, i.e., x a = (−x) (−a). As is the case deferred to SectionD of the supplement. in standard real arithmetic, the tropical additive inverse −∞ does not have a tropical multiplicative inverse and −∞ a 2. Tropical Algebra is undefined for a < 0. For notational simplicity, we will henceforth write xa instead of x a for tropical power when Roughly speaking, tropical algebraic geometry is an ana- there is no cause for confusion. Other algebraic rules of logue of classical algebraic geometry over C, the field of 2 tropical power may be derived from definition; see SectionB complex numbers, but where one replaces C by a semifield in the supplement. called the tropical semiring, to be defined below. We give a brief review of tropical algebra and introduce some relevant We are now in a position to define tropical polynomials and notations. See (Itenberg et al., 2009; Maclagan & Sturmfels, tropical rational functions. In the following, x and xi will 2015) for an in-depth treatment. denote variables (i.e., indeterminates). The most fundamental component of tropical algebraic ge- Definition 2.2. A tropical monomial in d variables x ; : : : ; x is an expression of the form ometry is the tropical semiring T := R [ {−∞}; ⊕; , 1 d also known as the max-plus algebra. The two operations ⊕ a1 a2 ad c x1 x2 · · · xd and , called tropical addition and tropical multiplication respectively, are defined as follows. where c 2 R [ {−∞} and a1; : : : ; ad 2 N. As a conve- : nient shorthand, we will also write a tropical monomial in Definition 2.1. For x; y 2 R, their tropical sum is x ⊕ y = α d : multiindex notation as cx where α = (a1; : : : ; ad) 2 N maxfx; yg; their tropical product is x y = x + y; the α α tropical quotient of x over y is x y := x − y. and x = (x1; : : : ; xd). Note that x = 0 x as 0 is the tropical multiplicative identity. For any x 2 R, we have −∞ ⊕ x = 0 x = x and Definition 2.3. Following notations above, a tropical poly- −∞ x = −∞. Thus −∞ is the tropical additive identity nomial f(x) = f(x1; : : : ; xd) is a finite tropical sum of and 0 is the tropical multiplicative identity. Furthermore, tropical monomials these operations satisfy the usual laws of arithmetic: associa- f(x) = c xα1 ⊕ · · · ⊕ c xαr ; tivity, commutativity, and distributivity. The set R [ {−∞} 1 r is therefore a semiring under the operations ⊕ and . While d where αi = (ai1; : : : ; aid) 2 N and ci 2 R [ {−∞}, 1Henceforth a “neural network” will always mean a feedfor- i = 1; : : : ; r. We will assume that a monomial of a given ward neural network with ReLU activation. multiindex appears at most once in the sum, i.e., αi 6= αj 2A semifield is a field sans the existence of additive inverses. for any i 6= j. Tropical Geometry of Deep Neural Networks Definition 2.4. Following notations above, a tropical ra- tional function is a standard difference, or, equivalently, a tropical quotient of two tropical polynomials f(x) and g(x): f(x) − g(x) = f(x) g(x): We will denote a tropical rational function by f g, where f and g are understood to be tropical polynomial functions.