Lecture Notes on Convex Analysis and Iterative Algorithms


İlker Bayram
[email protected]

About These Notes

These are the lecture notes of a graduate course I offered in the Dept. of Electronics and Telecommunications Engineering at Istanbul Technical University. My goal was to get students acquainted with methods of convex analysis, to make them more comfortable in following arguments that appear in the recent signal processing literature, and to understand/analyze the proximal point algorithm, along with its many variants. In the first half of the course, convex analysis is introduced at a level suitable for graduate students in electrical engineering (i.e., assuming some familiarity with the notions of convex sets and convex functions from other courses). Then several other algorithms are derived based on the proximal point algorithm, such as the Douglas-Rachford algorithm, ADMM, and some applications to saddle point problems. There are no references in this version. I hope to add some in the future.

İlker Bayram
December, 2018

Contents

1 Convex Sets
  1.1 Basic Definitions
  1.2 Operations Preserving Convexity of Sets
  1.3 Projections onto Closed Convex Sets
  1.4 Separation and Normal Cones
  1.5 Tangent and Normal Cones
2 Convex Functions
  2.1 Operations That Preserve the Convexity of Functions
  2.2 First Order Differentiation
  2.3 Second Order Differentiation
3 Conjugate Functions
4 Duality
  4.1 A General Discussion of Duality
  4.2 Lagrangian Duality
  4.3 Karush-Kuhn-Tucker (KKT) Conditions
5 Subdifferentials
  5.1 Motivation, Definition, Properties of Subdifferentials
  5.2 Connection with the KKT Conditions
  5.3 Monotonicity of the Subdifferential
6 Applications to Algorithms
  6.1 The Proximal Point Algorithm
  6.2 Firmly-Nonexpansive Operators
  6.3 The Dual PPA and the Augmented Lagrangian
  6.4 The Douglas-Rachford Algorithm
  6.5 Alternating Direction Method of Multipliers
  6.6 A Generalized Proximal Point Algorithm

1 Convex Sets

This first chapter introduces convex sets and discusses some of their properties. Having a solid understanding of convex sets is very useful for the convex analysis of functions, because we can and will regard a convex function as a special representation of a convex set, namely its epigraph.

1.1 Basic Definitions

Definition 1. A set $C \subset \mathbb{R}^n$ is said to be convex if $x \in C$, $x' \in C$ implies that $\alpha x + (1-\alpha)\,x' \in C$ for any $\alpha \in [0,1]$.

Consider the sets below. Each pair $(x, x')$ we select in the set on the left defines a line segment which lies inside the set. However, this is not the case for the set on the right. Even though we are able to find line segments with endpoints inside the set (as in $(x, x')$), this is not true in general, as exemplified by $(y, y')$.

[Figure: a convex set (left) containing the segment with endpoints $x$, $x'$, and a non-convex set (right) in which the segment with endpoints $y$, $y'$ leaves the set.]

For the examples below, decide if the set is convex or not, and prove whatever you think is true.

Example 1 (Hyperplane). For given $s \in \mathbb{R}^n$, $r \in \mathbb{R}$, consider the set $H_{s,r} = \{x \in \mathbb{R}^n : \langle s, x \rangle = r\}$. Notice that this is a subspace for $r = 0$.

Example 2 (Affine Subspace). This is a set $V \subset \mathbb{R}^n$ such that if $x \in V$ and $x' \in V$, then $\alpha x + (1-\alpha)\,x' \in V$ for all $\alpha \in \mathbb{R}$.

Example 3 (Half Space). For given $s \in \mathbb{R}^n$, $r \in \mathbb{R}$, consider the set $H^-_{s,r} = \{x \in \mathbb{R}^n : \langle s, x \rangle \le r\}$.

Example 4 (Cone). A cone $K \subset \mathbb{R}^n$ is a set such that if $x \in K$, then $\alpha x \in K$ for all $\alpha > 0$. Note that a cone may be convex or non-convex. See below for an example of a convex (left) and a non-convex (right) cone.

[Figure: a convex cone (left) and a non-convex cone (right).]

Exercise 1. Let $K$ be a cone. Show that $K$ is convex if and only if $x, y \in K$ implies $x + y \in K$.
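Before moving on, let us verify Example 1 directly from Definition 1; this worked solution is an editorial addition to the notes. Let $x, x' \in H_{s,r}$ and $\alpha \in [0,1]$. By the linearity of the inner product,
\[
\langle s, \alpha x + (1-\alpha)\,x' \rangle = \alpha \langle s, x \rangle + (1-\alpha) \langle s, x' \rangle = \alpha r + (1-\alpha)\,r = r,
\]
so $\alpha x + (1-\alpha)\,x' \in H_{s,r}$, and the hyperplane is convex. Replacing the equalities with the inequality $\le r$ (and using $\alpha \ge 0$, $1-\alpha \ge 0$) gives the same conclusion for the half space of Example 3.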
1.2 Operations Preserving Convexity of Sets

Proposition 1 (Intersection of Convex Sets). Let $C_1, C_2, \ldots, C_k$ be convex sets. Then $C = \bigcap_i C_i$ is convex.

Proof. Exercise!

Question 1. What happens if the intersection is empty? Is the empty set convex?

This simple result is useful for characterizing the solution sets of linear systems of equations or inequalities.

Example 5. For a matrix $A$, the solution set of $Ax = b$ is an intersection of hyperplanes. Therefore it is convex.

For the example above, we can in fact say more, thanks to the following variation of Prop. 1.

Exercise 2. Show that the intersection of a finite collection of affine subspaces is an affine subspace.

Let us continue with systems of linear inequalities.

Example 6. For a matrix $A$, the solution set of $Ax \le b$ is an intersection of half spaces. Therefore it is convex.

Proposition 2 (Cartesian Products of Convex Sets). Suppose $C_1, \ldots, C_k$ are convex sets in $\mathbb{R}^n$. Then the Cartesian product $C_1 \times \cdots \times C_k$ is a convex set in $\mathbb{R}^n \times \cdots \times \mathbb{R}^n$.

Proof. Exercise!

Given an operator $F$ and a set $C$, we can apply $F$ to the elements of $C$ to obtain the image of $C$ under $F$. We will denote that set as $FC$. If $F$ is linear, then it preserves convexity.

Proposition 3 (Linear Transformations of Convex Sets). Let $M$ be a matrix. If $C$ is convex, then $MC$ is also convex.

Consider now an operator that just adds a vector $d$ to its operand. This is a translation operator. Geometrically, it should be obvious that translation preserves convexity. It is a good exercise to translate this mental picture into an algebraic expression and show the following.

Proposition 4 (Translation). If $C$ is convex, then the set $C + d = \{x : x = v + d \text{ for some } v \in C\}$ is also convex.

Given $C_1$, $C_2$, consider the set of points of the form $v = v_1 + v_2$, where $v_i \in C_i$. This set is denoted by $C_1 + C_2$ and is called the Minkowski sum of $C_1$ and $C_2$. We have the following result concerning Minkowski sums.

Proposition 5 (Minkowski Sums of Convex Sets). If $C_1$ and $C_2$ are convex, then $C_1 + C_2$ is also convex.

Proof. Observe that
\[
C_1 + C_2 = \begin{bmatrix} I & I \end{bmatrix} (C_1 \times C_2),
\]
where $[I \;\, I]$ denotes the block matrix formed by placing two identity matrices side by side. Thus it follows by Prop. 2 and Prop. 3 that $C_1 + C_2$ is convex.

Example 7. The Minkowski sum of a rectangle and a disk in $\mathbb{R}^2$ is shown below.

[Figure: the Minkowski sum of a rectangle and a disk is a larger rectangle with rounded corners.]

Definition 2 (Convex Combination). Consider a finite collection of points $x_1, \ldots, x_k$. A point $x$ is said to be a convex combination of the $x_i$'s if it satisfies
\[
x = \alpha_1 x_1 + \cdots + \alpha_k x_k
\]
for some $\alpha_i$ such that
\[
\alpha_i \ge 0 \text{ for all } i, \qquad \sum_{i=1}^{k} \alpha_i = 1.
\]

Definition 3 (Convex Hull). The set of all convex combinations of a set $C$ is called the convex hull of $C$ and is denoted as $\mathrm{Co}(C)$.

Below are two examples, showing the convex hulls of the sets $C = \{x_1, x_2\}$ and $C' = \{y_1, y_2, y_3\}$.

[Figure: $\mathrm{Co}(C)$ is the line segment with endpoints $x_1$, $x_2$; $\mathrm{Co}(C')$ is the triangle with vertices $y_1$, $y_2$, $y_3$.]

Notice that in the definition of the convex hull, the set $C$ does not have to be convex (in fact, $C$ is not convex in the examples above). Also, regardless of the dimension of $C$, when constructing $\mathrm{Co}(C)$, we may consider convex combinations of any finite number of elements chosen from $C$. In fact, if we denote the set of all convex combinations of $k$ elements from $C$ as $C_k$, then we can show that $C_k \subset C_{k+m}$ for $m \ge 0$.
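As a concrete instance of Definition 2 (a numerical example added for illustration), take the points $x_1 = (0,0)$, $x_2 = (2,0)$, $x_3 = (0,2)$ in $\mathbb{R}^2$ and the weights $(\alpha_1, \alpha_2, \alpha_3) = (1/2, 1/4, 1/4)$. The weights are nonnegative and sum to one, so
\[
x = \tfrac{1}{2}(0,0) + \tfrac{1}{4}(2,0) + \tfrac{1}{4}(0,2) = \left( \tfrac{1}{2}, \tfrac{1}{2} \right)
\]
is a convex combination of $x_1, x_2, x_3$; it lies in the triangle $\mathrm{Co}(\{x_1, x_2, x_3\})$, in agreement with Definition 3.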
An interesting result, which we will not use in this course, is the following.

Exercise 3 (Carathéodory's Theorem). Show that, if $C \subset \mathbb{R}^n$, then $\mathrm{Co}(C) = C_{n+1}$.

The following proposition justifies the term `convex' in the definition of the convex hull.

Proposition 6. The convex hull of a set $C$ is convex.

Proof. Exercise!

The convex hull of $C$ is the smallest convex set that contains $C$. More precisely, we have the following.

Proposition 7. If $D = \mathrm{Co}(C)$, and if $E$ is a convex set with $C \subset E$, then $D \subset E$.

Proof. The idea is to show that, for any integer $k$, $E$ contains all convex combinations involving $k$ elements from $C$. For this, we will present an argument based on induction.

We start with $k = 2$. Suppose $x_1, x_2 \in C$. This implies $x_1, x_2 \in E$. Since $E$ is convex, we have $\alpha x_1 + (1-\alpha)\,x_2 \in E$ for all $\alpha \in [0,1]$. Since $x_1$, $x_2$ were arbitrary elements of $C$, it follows that $E$ contains all convex combinations of any two elements from $C$.

Suppose now that $E$ contains all convex combinations of any $k-1$ elements from $C$. That is, if $x_1, \ldots, x_{k-1}$ are in $C$, then for $\sum_{i=1}^{k-1} \alpha_i = 1$ and $\alpha_i \ge 0$, we have $\sum_{i=1}^{k-1} \alpha_i x_i \in E$. Suppose we pick a $k$th element, say $x_k$, from $C$. Also, let $\alpha_1, \ldots, \alpha_k$ be on the unit simplex, with $\alpha_k \ne 1$ (if $\alpha_k = 1$, we have nothing to prove). Observe that
\[
\sum_{i=1}^{k} \alpha_i x_i = \alpha_k x_k + \sum_{i=1}^{k-1} \alpha_i x_i = \alpha_k x_k + (1 - \alpha_k) \sum_{i=1}^{k-1} \frac{\alpha_i}{1 - \alpha_k} \, x_i.
\]
Notice that
\[
\sum_{i=1}^{k-1} \frac{\alpha_i}{1 - \alpha_k} = 1, \qquad \frac{\alpha_i}{1 - \alpha_k} \ge 0 \text{ for all } i.
\]
Therefore,
\[
y = \sum_{i=1}^{k-1} \frac{\alpha_i}{1 - \alpha_k} \, x_i
\]
is an element of $E$, since it is a convex combination of $k-1$ elements of $C$ (this is the induction hypothesis). Since $x_k \in E$ as well, the convexity of $E$ gives $\sum_{i=1}^{k} \alpha_i x_i = \alpha_k x_k + (1 - \alpha_k)\,y \in E$, which completes the induction.
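To see the reduction in the induction step concretely, here is a small worked instance (added for illustration). Take $k = 3$ with weights $(\alpha_1, \alpha_2, \alpha_3) = (1/6, 1/3, 1/2)$, so that $\alpha_3 = 1/2 \ne 1$. Then
\[
\tfrac{1}{6}\,x_1 + \tfrac{1}{3}\,x_2 + \tfrac{1}{2}\,x_3
= \tfrac{1}{2}\,x_3 + \tfrac{1}{2} \left( \tfrac{1}{3}\,x_1 + \tfrac{2}{3}\,x_2 \right),
\]
where $\tfrac{1}{3} x_1 + \tfrac{2}{3} x_2$ is a convex combination of two elements of $C$, hence in $E$ by the base case; the convexity of $E$ then places the full combination in $E$.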