Using a New Nonlinear Gradient Method for Solving Large Scale Convex Optimization Problems with an Application on Arabic Medical Text

Total Page:16

File Type:pdf, Size:1020Kb

Using a New Nonlinear Gradient Method for Solving Large Scale Convex Optimization Problems with an Application on Arabic Medical Text Using a New Nonlinear Gradient Method for Solving Large Scale Convex Optimization Problems with an Application on Arabic Medical Text Jaafar Hammouda*, Ali Eisab, Natalia Dobrenkoa, Natalia Gusarovaa aITMO University, St. Petersburg, Russia *[email protected] bAleppo University, Aleppo, Syria Abstract Gradient methods have applications in multiple fields, including signal processing, image processing, and dynamic systems. In this paper, we present a nonlinear gradient method for solving convex supra-quadratic functions by developing the search direction, that done by hybridizing between the two conjugate coefficients HRM [2] and NHS [1]. The numerical results proved the effectiveness of the presented method by applying it to solve standard problems and reaching the exact solution if the objective function is quadratic convex. Also presented in this article, an application to the problem of named entities in the Arabic medical language, as it proved the stability of the proposed method and its efficiency in terms of execution time. Keywords: Convex optimization; Gradient methods; Named entity recognition; Arabic; e-Health. 1. Introduction Several nonlinear conjugate gradient methods have been presented for solving high-dimensional unconstrained optimization problems which are given as [3]: min f(x) ; x∈Rn (1) To solve problem (1), we start with the following iterative relationship: 푥(푘+1) = 푥푘 + 훼푘 푑푘, 푘 = 0,1,2, (2) where α푘 > 0 is a step size, that is calculated by strong Wolfe-Powell’s conditions [4] 푇 푓(푥푘 + 훼푘푑푘) ≤ 푓(푥푘) + 훿훼푘푔푘 푑푘, (3) 푇 푇 |푔(푥푘 + 훼푘푑푘) 푑푘| ≤ 휎|푔푘 푑푘| where 0 < 훿 < 휎 < 1 푑푘 is the search direction that is computed as follow [3] −푔푘 , if 푘 = 0 (4) 푑푘+1 = { −푔푘+1 + 훽푘푑푘 , if 푘 ≥ 1 Where 푔푘 = 푔(푥푘) = ∇푓(푥푘) represent the gradient vector for 푓(푥) at the point 푥푘 , 훽푘 ∈ ℝ is known as CG coefficient that characterizes different CG methods. Some classical methods such below: 2 푇 퐻푆 푔푘 (푔푘 − 푔푘−1) HS [4] 훽푘 = 푇 (푔푘 − 푔푘−1) 푑푘−1 푇 FR [5] 퐹푅 푔푘 푔푘 훽푘 = 2 ‖푔푘−1‖ 푇( ) PRP [6, 7] 푃푅푃 푔푘 푔푘 − 푔푘−1 훽푘 = 2 ‖푔푘−1‖ 푇 ‖푔푘‖ HRM [2] 푔푘 (푔푘 − 푔푘−1) 퐻푅푀 ‖푔푘−1‖ 훽푘 = 2 2 ; 휏 = 0.4 휏‖푔푘−1‖ + (1 − 휏)‖푑푘−1‖ 2 ‖푔푘‖ 푇 NHS [1] ‖푔푘‖ − max 0, 푔푘 푔푘−1 푁퐻푆 ‖푔푘−1‖ 훽푘 = 푇 2 푇 ; 푢 = 1.1 max {푚푎푥 0, 푢푔푘 푑푘−1 + ‖푔푘−1‖ , 푑푘 푦푘−1} The iterative solution stops when we reach a point 푥푘 where the condition ‖푔푘‖ ≤ 휀 is fulfilled, where 휖 is a very small positive number. Among the most common methods that rely on the aforementioned strategy are Newton's methods [9], quasi-Newton methods [10, 11, 12], trust region methods [13, 14], and conjugated gradient methods [15, 16]. On the other hand, the optimization techniques and methods play one of the most important roles in training neural networks (NN), because it is used to reduce the losses by changing the attributes of NN such as weights and learning rate. The effect of choosing one optimization algorithm over another has been studied previously by many researchers, and despite the continuous development of this aspect, the widespread platforms that are used in the field of machine learning and deep learning depend on a specific group of these algorithms such as ADAM [22], SGD with momentum [23], and RMSprop [24], but these platforms come with the possibility of creating our own optimizer. In [25], the named entity problem was studied on Arabic medical text taken from three medical volumes issued by the Arab Medical Encyclopedia, the researchers used a BERT model [26] that was introduced by Google. As an application of the presented method, we implement our optimizer on the same dataset and show the results of comparison with the previous one. During the following sections, a hybrid method for solving Problem (1) will be presented, and then its convergence will be studied, the numerical results of the mentioned method will be presented, and in the end, an application in the field of Arabic medical text processing will prove the efficiency of the method. 2. The formula and its convergence In the following, we show a nonlinear gradient method for solving convex functions with high dimensions by hybridizing two CG formulas [28], and the new formula is given as: 푊퐻푀 푁퐻푆 퐻푅푀 훽푘 = (1 − 휃푘)훽푘 + 휃푘훽푘 (5) From (5) we distinguish the following cases: 푊퐻푀 푁퐻푆 • Case 1: if 휃푘 = 0 then 훽푘 = 훽푘 . 3 • Case 2: if 0 < 휃푘 < 1 then we find the new value for 휃푘 by using the search direction that was introduced by [16] as below: 푇 푇 푑푘+1푦푘 = −푡푠푘 푔푘+1; 푡 > 0 (6) Where is 푦푘 = 푔푘+1 − 푔푘 and . 푠푘 = 푥푘+1 − 푥푘. The new search direction is given by relation: 푊퐻푀 푑푘+1 = −푔푘+1 + 훽푘 푑푘 (7) From (5) and (7) we find that: 푁퐻푆 퐻푅푀 (8) 푑푘+1 = −푔푘+1 + ((1 − 휃푘)훽푘 + 휃푘훽푘 ) 푑푘 And from (8) and (6) we find that: 푇 푁퐻푆 푇 퐻푅푀 푇 푇 −푔푘+1 푦푘 + (1 − 휃푘)훽푘 푑푘 푦푘 + 휃푘 훽푘 푑푘 푦푘 = −푡푠푘 푔푘+1 퐻푅푀 푁퐻푆 푇 푇 푇 푁퐻푆 푇 휃푘 (훽푘 − 훽푘 )푑푘 푦푘 = −푡푠푘 푔푘+1 + 푔푘+1 푦푘 − 훽푘 푑푘 푦푘 푇 푇 푁퐻푆 푇 푛푒푤 −푡푠푘 푔푘+1 + 푔푘+1 푦푘 − 훽푘 푑푘 푦푘 (9) 휃푘 = 퐻푅푀 푁퐻푆 푇 (훽푘 − 훽푘 )푑푘 푦푘 푊퐻푀 퐻푅푀 • Case 3: if 휃푘 = 1 then 훽푘 = 훽푘 . 2.1 Algorithms steps 푛 Input: 푥0 ∈ ℝ a start point, 푓 a goal function, and 휀 > 0. Step 0: calculate the gradient vector 푔0 = ∇푓(푥0), the initial search direction 푑0 = −푔0 , and the step size 휆0 = 1 ⁄ , then we put 푘 = 0, if ‖푔푘)‖ ≤ 휖 we stop, else we go to step 1. ‖푔0)‖ Step 1: we calculate the new search direction that satisfied the strong Wolfe-Powell conditions. 푇 푓(푥푘 + 휆푘푑푘) − 푓푘 ≤ 훿휆푘푔푘 푑푘 푇 푇 |푔(푥푘 + 휆푘푑푘) 푑푘| ≤ −휎푔푘 푑푘 Where 훿 ∈ (0,0.5), 휎 ∈ (훿, 1). Step 2: set a new point 푥푘+1 = 푥푘 + 휆푘푑푘, if ‖푔푘)‖ ≤ 휖 we stop else go to step 3. Step 3: calculate , 푠푘 = 푥푘+1 − 푥푘, . 푦푘 = 푔푘+1 − 푔푘. 푁퐻푆 퐻푅푀 Step 4: find the value of each of 훽푘 , 훽푘 from below relations: 푇 ‖푔푘‖ 푔푘 (푔푘 − 푔푘−1) 퐻푅푀 ‖푔푘−1‖ 훽푘 = 2 2 ; 휏 = 0.4. 휏‖푔푘−1‖ + (1 − 휏)‖푑푘−1‖ 2 ‖푔푘‖ 푇 ‖푔푘‖ − max 0, 푔푘 푔푘−1 푁퐻푆 ‖푔푘−1‖ 훽푘 = 푇 2 푇 ; 푢 = 1.1 max {푚푎푥 0, 푢푔푘 푑푘−1 + ‖푔푘−1‖ , 푑푘 푦푘−1} 푛푒푤 Step 5: calculate 휃푘 by the relation below: 푇 푇 푁퐻푆 푇 푛푒푤 −푡푠푘 푔푘+1 + 푔푘+1 푦푘 − 훽푘 푑푘 푦푘 휃푘 = 퐻푅푀 푁퐻푆 푇 (훽푘 − 훽푘 )푑푘 푦푘 푛푒푤 푊퐻푀 Step 6: if 0 < 휃푘 < 1 then calculate the 훽푘 as below: 4 푊퐻푀 푁퐻푆 퐻푅푀 훽푘 = 훽푘 = (1 − 휃푘)훽푘 + 휃푘훽푘 푛푒푤 푁퐻푆 If 휃푘 = 0 then 훽푘 = 훽푘 푛푒푤 퐻푅푀 If 휃푘 = 1 then 훽푘 = 훽푘 Step 7: set the new search direction with the relation: 푛푒푤 푑 = −푔푘+1 + 훽푘푑푘 푇 2 푛푒푤 Step 8: if |푔푘+1 푔푘| ≥ 0.2‖푔푘+1‖ then 푑푘+1 = −푔푘+1 else 푑푘+1 = 푑 , after that find ‖푑푘‖ 휆푘+1 = 휆푘 × ‖푑푘+1‖ Step 9: set k = k + 1 and go to step 1. 2.2 Convergence analysis The following assumptions are often used in previous studies of the conjugate gradient methods: [2, 18] Assumption A: 푛 푓(푥) is bounded from below on the level set Ω = {푥 ∈ ℝ | 푓(푥) ≤ 푓(푥0)}, where 푥0 is the starting point. Assumption B: In some neighbourhood 푁 of Ω, the objective function is continuously differentiable, and its gradient is Lipschitz continuous, that is, there exists a constant 퐿 > 0 such that. ‖푔(푥) − 푔(푦)‖ ≤ 푙‖푥 − 푦‖ ∀ 푥, 푦 ∈ 푁 Assumption C: ∀푥 ∈ Ω||푔(푥)|| ≤ Γ; Γ ≥ 0 Theorem 1: Suppose that the sequences {푔푘} and {푑푘} are generated by the presented method. Then the sequence {푑푘} possesses as the sufficient descent condition 푇 2 푔푘 푑푘 ≤ 푐‖푔푘‖ ∀푘 ≥ 0, 푐 > 0 Proof: For 푘 = 0 the relation (10) is fulfilled, because: 푇 2 푔0 푑0 = −‖푔0‖ For 푘 ≥ 1 푊퐻푀 푑푘+1 = −푔푘+1 + 훽푘 푑푘 푁퐻푆 퐻푅푀 푑푘+1 = −푔푘+1 + ((1 − 휃푘)훽푘 + 휃푘훽푘 ) 푑푘 푁퐻푆 퐻푅푀 푑푘+1 = −(휃푘푔푘+1 + (1 − 휃푘)푔푘+1) + ((1 − 휃푘)훽푘 + 휃푘훽푘 ) 푑푘 퐻푅푀 푁퐻퐶 푑푘+1 = 휃푘(−푔푘+1 + 훽푘 푑푘) + (1 − 휃푘)(−푔푘+1 + 훽푘 푑푘) 퐻푅푀 푁퐻퐶 푑푘+1 = 휃푘푑푘+1 + (1 − 휃푘)푑푘+1 We discuss according to the value of 휃푘 we find: 5 푁퐻퐶 i. If 휃푘 = 0 then 푑푘+1 = 푑푘+1 푇 푇 푁퐻퐶 푇 푁퐻퐶 2 푔푘+1푑푘+1 = 푔푘+1푑푘+1 = 푔푘+1(−푔푘+1 + 훽푘 푑푘) ≤ 푐1‖푔푘+1‖ 1 where 푐 = (1 − ) ; 휇 = 1.1 1 휇 퐻푅푀 ii. If 휃푘 = 1 then 푑푘+1 = 푑푘+1 푇 푇 퐻푅푀 푇 퐻푅푀 2 푔푘+1푑푘+1 = 푔푘+1푑푘+1 = 푔푘+1(−푔푘+1 + 훽푘 푑푘) ≤ 푐2‖푔푘+1‖ 1 where 푐 = (2 − ) ; 휎 = 0.001 2 1−5휎 iii. If 0 < 휃푘 < 1 then we suppose that: 0 < 푚1 ≤ 휃푘 ≤ 푚2 < 1 푇 푇 퐻푅푀 푇 푁퐻퐶 푔푘+1푑푘+1 = 휃푘푔푘+1푑푘+1 + (1 − 휃푘)푔푘+1푑푘+1 푇 푇 퐻푅푀 푇 푁퐻퐶 푔푘+1푑푘+1 ≤ 푚1푔푘+1푑푘+1 + (1 − 푚2)푔푘+1푑푘+1 푇 2 2 푔푘+1푑푘+1 ≤ 푚1푐2‖푔푘+1‖ + (1 − 푚2)푐1‖푔푘+1‖ 푇 2 푔푘+1푑푘+1 ≤ 푐 ‖푔푘+1‖ , 푐 = 푚1푐2 + (1 − 푚2)푐1 Theorem 2: If the assumptions are fulfilled, and since the search direction fulfills the condition of sufficient descent condition, then the presented method fulfills the property of global convergence.
Recommended publications
  • 12. Coordinate Descent Methods
    EE 546, Univ of Washington, Spring 2014 12. Coordinate descent methods theoretical justifications • randomized coordinate descent method • minimizing composite objectives • accelerated coordinate descent method • Coordinate descent methods 12–1 Notations consider smooth unconstrained minimization problem: minimize f(x) x RN ∈ n coordinate blocks: x =(x ,...,x ) with x RNi and N = N • 1 n i ∈ i=1 i more generally, partition with a permutation matrix: U =[PU1 Un] • ··· n T xi = Ui x, x = Uixi i=1 X blocks of gradient: • f(x) = U T f(x) ∇i i ∇ coordinate update: • x+ = x tU f(x) − i∇i Coordinate descent methods 12–2 (Block) coordinate descent choose x(0) Rn, and iterate for k =0, 1, 2,... ∈ 1. choose coordinate i(k) 2. update x(k+1) = x(k) t U f(x(k)) − k ik∇ik among the first schemes for solving smooth unconstrained problems • cyclic or round-Robin: difficult to analyze convergence • mostly local convergence results for particular classes of problems • does it really work (better than full gradient method)? • Coordinate descent methods 12–3 Steepest coordinate descent choose x(0) Rn, and iterate for k =0, 1, 2,... ∈ (k) 1. choose i(k) = argmax if(x ) 2 i 1,...,n k∇ k ∈{ } 2. update x(k+1) = x(k) t U f(x(k)) − k i(k)∇i(k) assumptions f(x) is block-wise Lipschitz continuous • ∇ f(x + U v) f(x) L v , i =1,...,n k∇i i −∇i k2 ≤ ik k2 f has bounded sub-level set, in particular, define • ⋆ R(x) = max max y x 2 : f(y) f(x) y x⋆ X⋆ k − k ≤ ∈ Coordinate descent methods 12–4 Analysis for constant step size quadratic upper bound due to block coordinate-wise
    [Show full text]
  • Process Optimization
    Process Optimization Mathematical Programming and Optimization of Multi-Plant Operations and Process Design Ralph W. Pike Director, Minerals Processing Research Institute Horton Professor of Chemical Engineering Louisiana State University Department of Chemical Engineering, Lamar University, April, 10, 2007 Process Optimization • Typical Industrial Problems • Mathematical Programming Software • Mathematical Basis for Optimization • Lagrange Multipliers and the Simplex Algorithm • Generalized Reduced Gradient Algorithm • On-Line Optimization • Mixed Integer Programming and the Branch and Bound Algorithm • Chemical Production Complex Optimization New Results • Using one computer language to write and run a program in another language • Cumulative probability distribution instead of an optimal point using Monte Carlo simulation for a multi-criteria, mixed integer nonlinear programming problem • Global optimization Design vs. Operations • Optimal Design −Uses flowsheet simulators and SQP – Heuristics for a design, a superstructure, an optimal design • Optimal Operations – On-line optimization – Plant optimal scheduling – Corporate supply chain optimization Plant Problem Size Contact Alkylation Ethylene 3,200 TPD 15,000 BPD 200 million lb/yr Units 14 76 ~200 Streams 35 110 ~4,000 Constraints Equality 761 1,579 ~400,000 Inequality 28 50 ~10,000 Variables Measured 43 125 ~300 Unmeasured 732 1,509 ~10,000 Parameters 11 64 ~100 Optimization Programming Languages • GAMS - General Algebraic Modeling System • LINDO - Widely used in business applications
    [Show full text]
  • The Gradient Sampling Methodology
    The Gradient Sampling Methodology James V. Burke∗ Frank E. Curtisy Adrian S. Lewisz Michael L. Overtonx January 14, 2019 1 Introduction in particular, this leads to calling the negative gra- dient, namely, −∇f(x), the direction of steepest de- The principal methodology for minimizing a smooth scent for f at x. However, when f is not differentiable function is the steepest descent (gradient) method. near x, one finds that following the negative gradient One way to extend this methodology to the minimiza- direction might offer only a small amount of decrease tion of a nonsmooth function involves approximating in f; indeed, obtaining decrease from x along −∇f(x) subdifferentials through the random sampling of gra- may be possible only with a very small stepsize. The dients. This approach, known as gradient sampling GS methodology is based on the idea of stabilizing (GS), gained a solid theoretical foundation about a this definition of steepest descent by instead finding decade ago [BLO05, Kiw07], and has developed into a a direction to approximately solve comprehensive methodology for handling nonsmooth, potentially nonconvex functions in the context of op- min max gT d; (2) kdk2≤1 g2@¯ f(x) timization algorithms. In this article, we summarize the foundations of the GS methodology, provide an ¯ where @f(x) is the -subdifferential of f at x [Gol77]. overview of the enhancements and extensions to it To understand the context of this idea, recall that that have developed over the past decade, and high- the (Clarke) subdifferential of a locally Lipschitz f light some interesting open questions related to GS.
    [Show full text]
  • A New Algorithm of Nonlinear Conjugate Gradient Method with Strong Convergence*
    Volume 27, N. 1, pp. 93–106, 2008 Copyright © 2008 SBMAC ISSN 0101-8205 www.scielo.br/cam A new algorithm of nonlinear conjugate gradient method with strong convergence* ZHEN-JUN SHI1,2 and JINHUA GUO2 1College of Operations Research and Management, Qufu Normal University Rizhao, Sahndong 276826, P.R. China 2Department of Computer and Information Science, University of Michigan Dearborn, Michigan 48128-1491, USA E-mails: [email protected]; [email protected] / [email protected] Abstract. The nonlinear conjugate gradient method is a very useful technique for solving large scale minimization problems and has wide applications in many fields. In this paper, we present a new algorithm of nonlinear conjugate gradient method with strong convergence for unconstrained minimization problems. The new algorithm can generate an adequate trust region radius automatically at each iteration and has global convergence and linear convergence rate under some mild conditions. Numerical results show that the new algorithm is efficient in practical computation and superior to other similar methods in many situations. Mathematical subject classification: 90C30, 65K05, 49M37. Key words: unconstrained optimization, nonlinear conjugate gradient method, global conver- gence, linear convergence rate. 1 Introduction Consider an unconstrained minimization problem min f (x), x ∈ Rn, (1) where Rn is an n-dimensional Euclidean space and f : Rn −→ R is a continu- ously differentiable function. #724/07. Received: 10/V/07. Accepted: 24/IX/07. *The work was supported in part by NSF CNS-0521142, USA. 94 A NEW ALGORITHM OF NONLINEAR CONJUGATE GRADIENT METHOD When n is very large (for example, n > 106) the related problem is called large scale minimization problem.
    [Show full text]
  • Conditional Gradient Method for Multiobjective Optimization
    Conditional gradient method for multiobjective optimization P. B. Assun¸c~ao ∗ O. P. Ferreira ∗ L. F. Prudente ∗ April 7, 2020 Abstract: We analyze the conditional gradient method, also known as Frank-Wolfe method, for constrained multiobjective optimization. The constraint set is assumed to be convex and com- pact, and the objectives functions are assumed to be continuously differentiable. The method is considered with different strategies for obtaining the step sizes. Asymptotic convergence proper- ties and iteration-complexity bounds with and without convexity assumptions on the objective functions are stablished. Numerical experiments are provided to illustrate the effectiveness of the method and certify the obtained theoretical results. Keywords: Conditional gradient method; multiobjective optimization; Pareto optimality; con- strained optimization problem 1 Introduction In the constrained multiobjective optimization, we seek to simultaneously minimize several objec- tive functions on a set C. One strategy for solving multiobjective problems that has become very popular consists in the extension of methods for scalar-valued to multiobjective-valued optimiza- tion, instead of using scalarization approaches [23]. As far as we know, this strategy originated from the work [14] in which the authors proposed the steepest descent methods for multiobjec- tive optimization. Since of then, new properties related to this method have been discovered and several variants of it have been considered, see for example [4,5,16,19,20,26,27]. In recent years, there has been a significant increase in the number of papers addressing concepts, techniques, and methods for multiobjective optimization, see for example [6, 9, 13, 15, 25, 41, 42, 45, 46, 48, 53, 54].
    [Show full text]
  • Successive Linear Programming to The
    THE UNIVERSITY OF MANITOBA AN APPLICATION OF SUCCESSIVE LINEAR PROGRAMMING TO THE OPTIMIZATION OF AN INTERCONNECTED HYDRO UTILMY OPERATION by @ KARoLI REZNICEK A Thesis Submitted to the Faculfy of Graduate Studies in Partial Fulfillment of the Requirements for the Degree of Master of Science DEPARTMENT OF CTVIL ENGINEERING Winnipeg, Manitoba DECEMBER 1988 Per¡niesion trae been granted L'autorleatlon a êtê accordêe to the Natlonal Llbrary of å la Blbllothèque natlonale Canada to nicrofiln thlg du Canada de nlcrofllner thesls and to lend or eell cette thèse et de prêter ou copies of the filn. de vendre dee exenplairee du f llrn. The author (copyright owner) Lrauteur (tltulaire du drolt has reaerved other d'auteur) ae rêeerve lee pubtication ríghts, and autres droits de publlcatlon; neither the thesie nor nl ta thèee nl de longe extensive extracts from it extraite de celle-cl ne Day be printed or otherwiee doivent être irnprlnês ou reproduced wlthout his/her autrement reprodults BanB aon written perurieeion. autorieatlon écrite. rsBN 0-3L5-5i.576-7 AN APPLICATION OF SUCCESSIVE LINEAR PROGRAMMING TO THE OPTIMIZATION OF AN INTERCONNECTED HYDRO UTILITY OPERATION BY KAROLJ REZNICEK A thesis submirted to the Faculty of craduate studies of the university of Manitoba in partiar furfilrment of the requirenrents of the degree of I'fASTER OF SCIENCE o 1989 Permission has beerr grarìred ro the LIBRARy oF THE UNIVER- S¡TY OF MANITOBA to lend or sell copies of this thesis. ro the NATIONAL LIBRARy oF CANADA to microfìrnr this thesis and to lend or seil copies of the firm, and uNIVERsrry llf ICROFILMS to publish an absrract of this thesis.
    [Show full text]
  • Metaheuristic Techniques in Enhancing the Efficiency and Performance of Thermo-Electric Cooling Devices
    Review Metaheuristic Techniques in Enhancing the Efficiency and Performance of Thermo-Electric Cooling Devices Pandian Vasant 1,2,*, Utku Kose 3 and Junzo Watada 4 1 Modeling Evolutionary Algorithms Simulation and Artificial Intelligence, Faculty of Electrical & Electronics Engineering, Ton Duc Thang University, Ho Chi Minh City 700000, Vietnam 2 Department of Fundamental and Applied Sciences, Universiti Teknologi Petronas, Seri Iskandar, Perak 32610, Malaysia 3 Computer Sciences Application& Research Center, Usak University, Usak 64000, Turkey; [email protected] 4 Department of Computer & Information System, Universiti Teknologi Petronas, Seri Iskandar, Perak 32610, Malaysia; [email protected] * Correspondence: [email protected] or [email protected]; Tel.: +60-16-489-2932 Received: 11 August 2017; Accepted: 2 September 2017; Published: 25 October 2017 Abstract: The objective of this paper is to focus on the technical issues of single-stage thermo-electric coolers (TECs) and two-stage TECs and then apply new methods in optimizing the dimensions of TECs. In detail, some metaheuristics—simulated annealing (SA) and differential evolution (DE)—are applied to search the optimal design parameters of both types of TEC, which yielded cooling rates and coefficients of performance (COPs) individually and simultaneously. The optimization findings obtained by using SA and DE are validated by applying them in some defined test cases taking into consideration non-linear inequality and non-linear equality constraint conditions. The performance of SA and DE are verified after comparing the findings with the ones obtained applying the genetic algorithm (GA) and hybridization technique (HSAGA and HSADE). Mathematical modelling and parameter setting of TEC is combined with SA and DE to find better optimal findings.
    [Show full text]
  • Branch and Bound Algorithm for Dependency Parsing with Non-Local Features
    Branch and Bound Algorithm for Dependency Parsing with Non-local Features Xian Qian and Yang Liu Computer Science Department The University of Texas at Dallas qx,yangl @hlt.utdallas.edu { } Abstract Charniak and Johnson, 2005; Hall, 2007). It typi- cally has two steps: the low level classifier gener- Graph based dependency parsing is inefficient ates the top k hypotheses using local features, then when handling non-local features due to high the high level classifier reranks these candidates us- computational complexity of inference. In ing global features. Since the reranking quality is this paper, we proposed an exact and effi- bounded by the oracle performance of candidates, cient decoding algorithm based on the Branch some work has combined candidate generation and and Bound (B&B) framework where non- local features are bounded by a linear combi- reranking steps using cube pruning (Huang, 2008; nation of local features. Dynamic program- Zhang and McDonald, 2012) to achieve higher or- ming is used to search the upper bound. Ex- acle performance. They parse a sentence in bottom periments are conducted on English PTB and up order and keep the top k derivations for each s- Chinese CTB datasets. We achieved competi- pan using k best parsing (Huang and Chiang, 2005). tive Unlabeled Attachment Score (UAS) when After merging the two spans, non-local features are 93.17% no additional resources are available: used to rerank top k combinations. This approach for English and 87.25% for Chinese. Parsing speed is 177 words per second for English and is very efficient and flexible to handle various non- 97 words per second for Chinese.
    [Show full text]
  • Linear-Memory and Decomposition-Invariant Linearly Convergent Conditional Gradient Algorithm for Structured Polytopes
    Linear-Memory and Decomposition-Invariant Linearly Convergent Conditional Gradient Algorithm for Structured Polytopes Dan Garber Ofer Meshi Toyota Technological Institute at Chicago Google [email protected] [email protected] Abstract Recently, several works have shown that natural modifications of the classical conditional gradient method (aka Frank-Wolfe algorithm) for constrained convex optimization, provably converge with a linear rate when: i) the feasible set is a polytope, and ii) the objective is smooth and strongly-convex. However, all of these results suffer from two significant shortcomings: 1. large memory requirement due to the need to store an explicit convex de- composition of the current iterate, and as a consequence, large running-time overhead per iteration 2. the worst case convergence rate depends unfavorably on the dimension In this work we present a new conditional gradient variant and a corresponding analysis that improves on both of the above shortcomings. In particular: 1. both memory and computation overheads are only linear in the dimension 2. in case the optimal solution is sparse, the new convergence rate replaces a factor which is at least linear in the dimension in previous work, with a linear dependence on the number of non-zeros in the optimal solution At the heart of our method and corresponding analysis, is a novel way to compute decomposition-invariant away-steps. While our theoretical guarantees do not apply to any polytope, they apply to several important structured polytopes that capture central concepts such as paths in graphs, perfect matchings in bipartite graphs, marginal distributions that arise in structured prediction tasks, and more.
    [Show full text]
  • Conjugate Gradient Descent Conjugate Direction Methods
    Conjugate Gradient Descent Conjugate Direction Methods 2 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses gradient) and Newton’s method (second-order method that uses Hessian as well). Motivation: ! steepest descent is slow. Goal: Accelerate it! ! Newton method is fast… BUT: we need to calculate the inverse of the Hessian matrix… Something between steepest descent and Newton method? 3 Conjugate Direction Methods Goal: • Accelerate the convergence rate of steepest descent • while avoiding the high computational cost of Newton’s method Originally developed for solving the quadratic problem: Equivalently, our goal is to solve: Conjugate direction methods can solve this problem at most n iterations (usually for large n less is enough) 4 Conjugate Direction Methods ! algorithm for the numerical solution of linear equations, whose matrix Q is symmetric and positive-definite. ! An iterative method, so it can be applied to systems that are too large to be handled by direct methods (such as the Cholesky decomposition.) ! Algorithm for seeking minima of nonlinear equations. 5 Numerical Experiments A comparison of * gradient descent with optimal step size (in green) and * conjugate vector (in red) for minimizing a quadratic function. Conjugate gradient converges in at most n steps (here n=2). 6 Conjugate directions Definition [Q-conjugate directions] ! In the applications that we consider, the matrix Q will be positive definite but this is not inherent in the basic definition. ! If Q = 0, any two vectors are conjugate. ! if Q = I, conjugacy is equivalent to the usual notion of orthogonality. 7 Linear independence lemma Lemma [Linear Independence] Proof: [Proof by contradiction] 8 Why is Q-conjugacy useful? 9 Quadratic problem Goal: the unique solution to this problem is also the unique solution to the linear equation: 10 Importance of Q-conjugancy Therefore, Standard orthogonality is not enough anymore, We need to use Q-conjugacy.
    [Show full text]
  • Block Stochastic Gradient Method
    Block Stochastic Gradient Iteration for Convex and Nonconvex Optimization Yangyang Xu∗ Wotao Yiny March 3, 2015 Abstract The stochastic gradient (SG) method can quickly solve a problem with a large number of components in the objective, or a stochastic optimization problem, to a moderate accuracy. The block coordinate descent/update (BCD) method, on the other hand, can quickly solve problems with multiple (blocks of) variables. This paper introduces a method that combines the great features of SG and BCD for problems with many components in the objective and with multiple (blocks of) variables. This paper proposes a block stochastic gradient (BSG) method for both con- vex and nonconvex programs. BSG generalizes SG by updating all the blocks of variables in the Gauss-Seidel type (updating the current block depends on the previously updated block), in either a fixed or randomly shuffled order. Although BSG has slightly more work at each iteration, it typically outperforms SG be- cause of BSG's Gauss-Seidel updates and larger stepsizes, the latter of which are determined by the smaller per-block Lipschitz constants. The convergence of BSG is established for both convex and nonconvex cases. arXiv:1408.2597v3 [math.OC] 2 Mar 2015 In the convex case, BSG has the same order of convergence rate as SG. In the nonconvex case, its convergence is established in terms of the expected violation of a first-order optimality condition. In both cases our analysis is nontrivial since the typical unbiasedness assumption no longer holds. BSG is numerically evaluated on the following problems: stochastic least squares and logistic regression, which are convex, and low-rank tensor recov- ery and bilinear logistic regression, which are nonconvex.
    [Show full text]
  • 13. Conjugate Gradient Method
    L. Vandenberghe ECE236C (Spring 2020) 13. Conjugate gradient method conjugate gradient method for linear equations • complexity • conjugate gradient method as iterative method • applications in nonlinear optimization • 13.1 Unconstrained quadratic minimization 1 minimize f x = xT Ax bT x ¹ º 2 − with A symmetric positive definite and n n × equivalent to solving the linear equation Ax = b • the residual r = b Ax is the negative gradient: r = f x • − −r ¹ º Conjugate gradient method (CG) invented by Hestenes and Stiefel around 1951 • the most widely used iterative method for solving Ax = b, with A 0 • can be extended to non-quadratic unconstrained minimization • Conjugate gradient method 13.2 Krylov subspaces Definition: a sequence of subspaces k 1 = 0 ; = span b; Ab;:::; A − b for k 1 K0 f g Kk f g ≥ Properties subspaces are nested: • K0 ⊆ K1 ⊆ K2 ⊆ · · · dimensions increase by at most one: dim dim is zero or one • Kk+1 − Kk if = , then = for all i k: • Kk+1 Kk Ki Kk ≥ k k 1 i k 1 A b span b; Ab;:::; A − b = A b span b; Ab;:::; A − b for i > k 2 f g ) 2 f g Conjugate gradient method 13.3 Solution of Ax = b Key property: 1 A− b 2 Kn this holds even when , Rn Kn from Cayley–Hamilton theorem, • n n 1 p A = A + a A − + + a I = 0 ¹ º 1 ··· n n n 1 where p λ = det λI A = λ + a1λ − + + an 1λ + an ¹ º ¹ − º ··· − multiplying on the right with A 1b shows • − 1 1 n 1 n 2 A− b = A − b + a1A − b + + an 1b −an ··· − Conjugate gradient method 13.4 Krylov sequence xk = argmin f x ; k = 0;1;::: x ¹ º 2Kk from previous page, x = A 1b • n − CG method is a recursive method for computing the Krylov sequence x , x ,..
    [Show full text]