Pruned Search: A Machine Learning Based Meta-Heuristic Approach for Constrained Continuous Optimization

Ruoqian Liu, Ankit Agrawal, Wei-keng Liao, Alok Choudhary
EECS Department, Northwestern University, Evanston, IL, USA
{rll943,ankitag,wkliao,choudhar}@eecs.northwestern.edu

Zhengzhang Chen
NEC Laboratories America, Inc., Princeton, NJ, USA
[email protected]

Abstract—Searching for solutions that optimize a continuous function can be difficult due to the infinite search space, and can be further complicated by high dimensionality in the number of variables and complexity in the structure of constraints. Both deterministic and stochastic methods have been presented in the literature with the purpose of exploiting the search space and avoiding local optima as much as possible. In this research, we develop a machine learning framework aiming to "prune" the search effort of both types of optimization techniques by developing meta-heuristics, attempting to knowledgeably reorder the search space and reduce the search region. Numerical examples demonstrate that this approach can effectively find the global optimal solutions and significantly reduce the computational time for seven benchmark problems with variable dimensions of 100, 500 and 1000, compared to Genetic Algorithms.

Keywords—constrained optimization; complexity reduction; machine learning; meta-heuristics

I. INTRODUCTION

Searching for suitable solutions that optimize an objective function, and/or satisfy a set of constraints, is a general technique and a ubiquitous solution to applications in various engineering areas. Search problems are established around three components: variables, constraints, and the objective function, and searches are performed in the domain of each variable, which can be either continuous or discrete. Discrete optimization, or combinatorial optimization, has a finite set of solutions and is usually represented by graph structures and approached with heuristic search (e.g., A*) and dynamic programming. Continuous optimization, however, involves search spaces that are infinite, and the challenge of search escalates when high dimensionality in the number of variables and great complexity in the structure of constraints are acknowledged. A d-dimensional continuous optimization problem can be expressed as:

    minimize    f0(x)
    subject to  fi(x) ≤ bi,  i = 1, . . . , n.

Here x = (x1, . . . , xd) is the vector of variables, the function f0 : R^d → R is the objective function, and the functions fi : R^d → R, i = 1, . . . , n, with constants b1, . . . , bn, are the constraints. In computational geometry, such a problem has a problem domain (if feasible) that is a polyhedron in R^d, defined by the n halfspaces from the n constraints.

Excluding brute-force methods, for which exhaustive enumeration of solutions is insurmountable for continuous problems (unless some discretization is performed), classical algorithmic optimization techniques can be divided into two main groups. The first group consists of searches typically based on deterministic numerical rules realized through differential calculus, such as Gradient Search and Linear Programming. They are deterministic in that, given the same initial values, repeated runs will always produce the same result. In practice, this quality renders overcoming local optima a serious problem. As an alternative, the other group of methods introduces stochastic elements. Examples such as Genetic Algorithms [1] and Particle Swarm Optimization are less likely to get stuck in local optima and are more flexible towards discovering new solution spaces.

Our work in [2] has proposed the concept of search space preprocessing to narrow down the search space, which could then be used by traditional searches, in order to ease the curse of dimensionality. The proposed framework, consisting of dataset construction, search order reduction and search domain reduction, employs data mining and dimension reduction techniques to gain informative insights about the problem space, so as to direct the search quest into more promising areas. In this paper, we implement the proposed idea as 'Pruned Search', a Machine Learning (ML) based meta-heuristic developed for fast and accurate optimization search on constrained and continuous problems.

Given an optimization problem with a large number of variables as well as constraints, we first collect a set of representative variable-objective data instances using the proposed data distillation technique based on vertex enumeration and Lagrangian relaxation. Then, by employing feature selection and classification techniques, we refine the search path and search region. Finally, a simple line search-like algorithm is proposed to enhance the optimization. The rest of the paper is organized as follows. Section II goes into the methodology of the proposed framework, with a subsection dedicated to each of our three key processes. In Section III we present experiments conducted on: (1) a simple problem to validate the path refinement, (2) a number of canonical test problems, and (3) a synthetic problem. Section IV lists related works, and Section V concludes the paper.

II. METHODOLOGY

A. Overview

Following the philosophy stated in [2], we consider developing meta-heuristics with ML to enhance the search process in optimization. The aim is to have the search effort focused on a more promising path and to prune the irrelevant effort. In an optimization problem, variables cannot simply be transformed away with dimension reduction methods such as PCA, but we can still assume that one of the following statements will hold.

• Assumption 1: The desired (optimal) value of the function depends only on a reduced, albeit unknown, set of variables.
• Assumption 2: The impact of each variable on the function is different. Hence, there exists an optimal order in terms of searching priority.

The first assumption imposes an additional parameter: the size of the subset of "active" variables. Both assumptions are commonly seen in many industrial processes, during the design of which as many variables as possible are included even though not all of them are equally important. The two assumptions together complete the picture of intrinsic variable priority on which our proposed method is based.

In this setting we can apply feature selection from data mining to analyze variable relations. The framework contains three major components, as shown in Fig. 1. The functionalities of each component are introduced below, and further explained and illustrated with an example in the following sections.

Data distillation. For ML to work, it needs access to a set of data that contains values of the variables and the corresponding function value. The data has to be distinguishably significant in that it contains instances with the most wanted (highest or lowest, corresponding to maximization or minimization) function values. We obtain such a distilled significant data set by first transforming the problem into the computational geometry regime to form a polyhedron representing the feasible space, and then extracting the vertices of the polyhedron. Lagrangian relaxation is used to handle complicated constraining conditions.

Complexity reduction. With a proper collection of variable-objective value instances, ML can learn to extract information about the variables to provide two functionalities: 1) the creation of an ordered list of variables based on their influence and impact on the function, and 2) the reduction of the feasible region in variable searching. The former is achieved through feature selection methods where a ranking is guaranteed. The latter is realized by examining a rule-based classifier and looking for the critical thresholds.

Enhanced optimization. Optimization becomes a much more promising endeavor when the search is "pruned", in that the space is reduced and a pre-planned searching path is deployed. A simple line search-like algorithm is employed. We specify a prefixed searching order, and replace the original constraints with the pruned ones.

We will use an example problem throughout Section II-B to Section II-D to illustrate each step. The example problem, G1, is taken from [3]. It is formulated as follows.

G1 Problem:
    Min:  f(x) = 5 Σ_{i=1}^{4} xi − 5 Σ_{i=1}^{4} xi^2 − Σ_{i=5}^{13} xi
    s.t.: g1(x) = 2x1 + 2x2 + x10 + x11 − 10 ≤ 0,
          g2(x) = 2x1 + 2x3 + x10 + x12 − 10 ≤ 0,
          g3(x) = 2x2 + 2x3 + x11 + x12 − 10 ≤ 0,
          g4(x) = −8x1 + x10 ≤ 0,
          g5(x) = −8x2 + x11 ≤ 0,
          g6(x) = −8x3 + x12 ≤ 0,
          g7(x) = −2x4 − x5 + x10 ≤ 0,
          g8(x) = −2x6 − x7 + x11 ≤ 0,
          g9(x) = −2x8 − x9 + x12 ≤ 0,
          xi ≥ 0, i = 1, . . . , 13,
          xi ≤ 1, i = 1, . . . , 9, 13,
          xi ≤ 100, i = 10, 11, 12.

The G1 problem has 13 variables, 9 complex linear combinational constraints and 2 boundary constraints on each of the variables, 35 constraints in total. The true global minimum is known: x* = (1, 1, . . . , 1, 3, 3, 3, 1), with f(x*) = −15. G1 is not chosen to prove the robustness of our methodology to high dimensions (which will be exhibited in Section III with problems of up to 1,000 dimensions), but rather to help illustrate each step.

B. Data Distillation with Vertex Enumeration

Given an optimization problem, the simplest way of data collection would be feeding random combinations of valid variable values to the function and collecting its outputs. Number-theoretic methods (NTMs) are a class of techniques by which representative points of the uniform distribution on the unit cube can be generated [4]. One example of an NTM is the quasi-Monte Carlo model.

However, while a uniformly randomized set could be representative for the numerical analysis of functions, it does not necessarily serve as representative for machine learning purposes. Just as we only need to show a child red and green cards if what we want him/her to learn is to distinguish red from green, we claim that a representative and significant set for analyzing variable relations is the collection of instances that produce very high and very low function values. One set is the "target", and the other is included as an opposing force. Depending on what feature selection method is used, the opposing set can sometimes be omitted.

To collect such a data set, we follow the idea of vertex enumeration in computational geometry. The set of linear inequalities Ax ≤ b defines a polyhedron in R^d. Each of the n inequalities represents a bounding hyperplane, and together they constitute the polyhedron. An intersection of any d inequalities forms a vertex. The objective function f(x) also represents a hyperplane in R^d, and the optimum answer is one of the vertices on this plane. In fact, all the vertices of the polyhedron are extreme values of the function.
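This vertex idea can be sketched in a few lines. The following is an illustrative sketch, not the paper's implementation (the helper name and the toy polyhedron are ours): every subset of d of the n inequalities, solved as a linear system of equalities, gives a candidate point, and keeping the candidates that satisfy all remaining inequalities yields the vertices of the polyhedron.

```python
# Sketch of vertex enumeration for {x : Ax <= b} (illustrative only).
# Each subset of d of the n inequalities, solved as equalities, gives a
# basic solution; it is a vertex iff it satisfies the remaining ones.
from itertools import combinations
import numpy as np

def enumerate_vertices(A, b, tol=1e-9):
    n, d = A.shape
    vertices = []
    for rows in combinations(range(n), d):
        idx = list(rows)
        try:
            x = np.linalg.solve(A[idx], b[idx])  # basic solution, if one exists
        except np.linalg.LinAlgError:
            continue  # the d chosen hyperplanes do not meet in a single point
        if np.all(A @ x <= b + tol):             # validate against all constraints
            vertices.append(x)
    return vertices

# Toy example: the unit square written as four halfspaces.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])
V = enumerate_vertices(A, b)  # the four corners of the square
```

The loop visits a number of subsets that grows combinatorially in n and d, which is why the method relaxes most constraints via Lagrangian relaxation before enumerating.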
Fig. 1: Overview of the proposed ML approach for optimization. The framework is composed of three key processes. Data Distillation collects a significant and representative data set from an objective function. Complexity Reduction ranks the importance of, and prunes the search space for, each variable. Finally, Enhanced Optimization conducts a search within the reduced space and seeks the optimal solution.

Therefore, to obtain a set of extreme values, we enumerate the vertices. These critical points contain rich information about the global optimum. For a bounded feasible region, the number of vertices is at most the combinatorial number C(n, d), n being the number of constraints and d the number of variables. By solving any d equations out of the n simultaneously, we obtain a basic solution (if one exists). By validating this basic solution against the remaining n − d inequalities, we can check its feasibility. If feasible, the solution is a valid vertex of the search space.

The higher the number of constraints n, the more vertices exist, and building the entire polyhedron becomes a difficult practice. We therefore use Lagrangian relaxation to relax part of the constraints, which results in a loosened polyhedron to draw data points from. Lagrangian relaxation is a common procedure for generating variable bounds during optimization with complex constraints, under the assumption that a solution to a relaxed problem is an approximate solution to the original problem. It is therefore safe to claim that a critical point of the relaxed problem is an approximate critical point of the original problem.

We use linear constraints Ax ≤ b, A ∈ R^{n×d}, to illustrate Lagrangian relaxation. The relaxation splits the constraints into A1 ∈ R^{n1×d} and A2 ∈ R^{n2×d} with n1 + n2 = n, and introduces the second part of the constraints into the objective. The problem is converted to a relaxed one:

    Min:  f0(x) + λ^T (A2 x − b2)
    s.t.: A1 x ≤ b1

We let λ = (λ1, . . . , λ_{n2}) be nonnegative weights, so that the objective is penalized whenever the relaxed constraints A2 x ≤ b2 are violated.

In the example of G1, with Lagrangian relaxation we first split the constraint set in two by randomly selecting 22 of the 35 inequalities and formulating them as A2 x ≤ b2. These 22 inequalities are removed from the constraint set and the term (A2 x − b2) is plugged into the objective function. The reduced constraint set then contains exactly 13 inequalities, the same number as the variables. Vertex enumeration is performed by solving these remaining 13 equations simultaneously to obtain a basic solution (if one exists). This solution is a vertex of the feasible region of the relaxed problem and an approximation of a vertex of the original problem. The process is repeated 1,000 times, each time with a different random draw of the inequalities. λ is set to a random vector with values between 0.5 and 1 each time; a higher value of λ suggests a tighter relaxation and a longer time to obtain data.

C. Complexity Reduction with Machine Learning

The centerpiece of our method is reducing the complexity of the problem, in two ways. First, the d variables are put into an order by feature ranking techniques. Second, the searchable region of each variable can be reduced (or formed, if the variable is unbounded) by classification schemes.

1) Path Refinement: Feature selection methods in data mining study the variable relations towards the target, either by calculating a metric or by building a classifier. With the exemplar problem G1, we take the generated data set and label the very high values as "H" and the very low values as "L" (for example, sort the instances by function value and take the top and bottom 15%). Then we run six feature selection methods: 1) Information gain, InfoGain(Class, Attribute) = H(Class) − H(Class | Attribute), where H denotes the entropy; 2) Chi-square, χ² = Σ (O − E)²/E, where O is the observed frequency and E is the expected frequency; 3) Symmetrical uncertainty, SU(Class, Attribute) = 2(H(Class) − H(Class | Attribute)) / (H(Class) + H(Attribute)); 4) SVM, using a Support Vector Machine as a classifier to evaluate a feature's worth; 5) OneR, using OneR [5] as a classifier to evaluate a feature's worth; and 6) RELIEF [6], which evaluates a feature by repeatedly sampling an instance and considering the value of the given feature for the nearest instances of the same and of a different class. The rationale for choosing these feature selection methods is to have a diverse setting that contains both numeric metrics and classifier-based evaluators. An ensemble vote over their outcomes gives a fairly trustworthy result.

2) Region Reduction: In this subroutine, we use the distilled data set to build a rule-based classifier. As in the path refinement activity, the data instances with relatively high values are represented by the symbol "H" and the contrary class is labeled "L". This creates a two-class classification problem.
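The path-refinement idea can be illustrated with a small sketch. This is written under our own assumptions, not the paper's exact setup: scikit-learn's mutual information and chi-square scorers stand in for two of the six evaluators, and the H/L-labeled data is synthetic.

```python
# Sketch of path refinement: score each variable with several feature
# selection metrics on H/L-labeled data, then vote by averaging ranks.
# (Illustrative; the paper's six evaluators are InfoGain, chi-square,
# SU, SVM, OneR and RELIEF.)
import numpy as np
from sklearn.feature_selection import chi2, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(400, 5))    # 5 variables, 400 instances
f_val = 3.0 * X[:, 0] - 2.0 * X[:, 1]       # only x0 and x1 matter
y = (f_val > np.median(f_val)).astype(int)  # 1 = "H", 0 = "L"

scores = [
    mutual_info_classif(X, y, random_state=0),  # information-gain-style score
    chi2(X, y)[0],                              # chi-square statistic (X must be >= 0)
]
ranks = [np.argsort(np.argsort(-s)) for s in scores]   # rank 0 = most important
search_order = np.argsort(np.mean(ranks, axis=0))      # ensemble-voted search path
```

On this synthetic data, search_order should begin with variables 0 and 1, the only ones that influence the label; the rest of the path is noise-ordered.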
We use rule-based classifiers because they are easily traversed and their thresholds are clearly attained, as we will show. Here, we illustrate the example with a random tree. After a tree is constructed, we look for the leaf nodes labeled "L", since our purpose in the G1 example is to minimize. We traverse from the root to each of the "L" leaf nodes and write down the rules generated along the path. The number of samples covered by a rule and the accuracy of the rule should also be considered. For the G1 example, the most supported rule is: IF x7 ≥ 0.42 AND x12 ≥ 1.13 AND x9 ≥ 0.25 AND x11 ≥ 0.43 THEN "L". Therefore, the searching regions for these variables are modified to x7 ∈ [0.42, 1], x12 ∈ [1.13, 100], x9 ∈ [0.25, 1], and x11 ∈ [0.43, 1]. The searching effort is thus reduced to a more concentrated area on these variables. Note that the satisfaction of the original constraints/boundary conditions must still be considered simultaneously. Compared to the original regions (x7, x9 ∈ [0, 1]; x11, x12 ∈ [0, 100]), 42%, 1.13%, 25% and 43% of the search regions of x7, x12, x9 and x11, respectively, have been cut away.

D. Enhanced Optimization with Reduced Complexity

Our proposed searching strategy adopts the spirit of greedy algorithms, whose purpose is to find a locally optimal solution at every step. In our case, we find the value that optimizes the function for one variable at a time, following the ranked list of variables. The searching strategy on each variable is line search, a well-established optimization technique. This strategy is designed to be easy to implement and robust in handling high-dimensional problems.

Multi-start strategies are incorporated so that on each run the algorithm starts from a randomly generated initial solution of dimension d in the search space. The algorithm then iterates until a stopping criterion, that is, convergence, is reached. At each iteration, line search is applied from the top variable in the ranked list proceeding downwards.

III. EXPERIMENTS AND RESULTS

In this section, we conduct numerical experiments on a suite of test problems. Firstly, we validate the effectiveness of two steps, data distillation and feature selection, by comparing with benchmark methods. Secondly, the entire framework is validated on 7 scalable benchmark problems, whose number of variables can be determined by the user. In all sets of experiments, our method is compared with genetic algorithms (GA). The comparison is based on the value of the solution found (if the true optimum is unknown, as in the synthetic cases) or the error between the solution found and the true optimum (if known, as in the standard problems). Time is also recorded. Experiments are set up on a Linux Red Hat 4.4.7 system with 32 GB memory and an Intel Xeon 2.20 GHz CPU.

A. Validation of Data Distillation

The experiments here test the effectiveness of our data distillation procedure. To compare with our vertex-based method, the quasi-Monte Carlo NTM model proposed in [7] is used to collect a set of uniformly distributed data. Also, a simple randomization of variables over their feasible regions is used. The evaluation is made on the collected data only and does not extend to further optimization. Certain properties of the data are considered: the maximum objective value collected, the minimum objective value collected, and the total number of instances collected within a fixed time (100 seconds). Two more problems from [3], G2 and G7, are added to the test suite. They are selected because of their relatively large dimension, as well as their constraint sets being fairly complex. At n = 20, the best known value for G2 is f(x*) = 0.803619. The best known value for G7 is f(x*) = 24.3062091. The two problems are formulated as follows.

G2 Problem:
    Max:  f(x) = | (Σ_{i=1}^{n} cos^4(xi) − 2 Π_{i=1}^{n} cos^2(xi)) / sqrt(Σ_{i=1}^{n} i xi^2) |
    s.t.: g1(x) = −Π_{i=1}^{n} xi + 0.75 ≤ 0,
          g2(x) = Σ_{i=1}^{n} xi − 7.5n ≤ 0,
          0 ≤ xi ≤ 10, i = 1, 2, . . . , n.

G7 Problem:
    Min:  f(x) = x1^2 + x2^2 + x1 x2 − 14x1 − 16x2 + (x3 − 10)^2 + 4(x4 − 5)^2 + (x5 − 3)^2
                 + 2(x6 − 1)^2 + 5x7^2 + 7(x8 − 11)^2 + 2(x9 − 10)^2 + (x10 − 7)^2 + 45
    s.t.: g1(x) = 4x1 + 5x2 − 3x7 + 9x8 − 105 ≤ 0,
          g2(x) = 10x1 − 8x2 − 17x7 + 2x8 ≤ 0,
          g3(x) = −8x1 + 2x2 + 5x9 − 2x10 − 12 ≤ 0,
          g4(x) = 3(x1 − 2)^2 + 4(x2 − 3)^2 + 2x3^2 − 7x4 − 120 ≤ 0,
          g5(x) = 5x1^2 + 8x2 + (x3 − 6)^2 − 2x4 − 40 ≤ 0,
          g6(x) = 0.5(x1 − 8)^2 + 2(x2 − 4)^2 + 3x5^2 − x6 − 30 ≤ 0,
          g7(x) = x1^2 + 2(x2 − 2)^2 − 2x1 x2 + 14x5 − 6x6 ≤ 0,
          g8(x) = −3x1 + 6x2 + 12(x9 − 8)^2 − 7x10 ≤ 0,
          −10 ≤ xi ≤ 10, i = 1, 2, . . . , 10.

The comparison on all three problems is seen in Table I. As we can see, in almost every scenario our vertex-based data collection method is able to produce more polarized data, that is, data with a higher maximum value and a lower minimum value. What's more, the number of instances produced within the same period by the vertex enumeration method is much larger, mainly due to the relaxation of constraints in the process.
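The greedy, ranked-order search of Section II-D can be sketched as follows. This is a minimal illustration rather than the authors' implementation: pruned_search is a hypothetical helper name, the multi-start count and tolerances are illustrative, and scipy's bounded scalar minimizer stands in for the per-variable line search.

```python
# Sketch of the enhanced optimization of Section II-D (illustrative only):
# multi-start, then coordinate-wise line search following the ranked
# variable order, repeated until one full sweep stops improving.
import numpy as np
from scipy.optimize import minimize_scalar

def pruned_search(f, bounds, order, n_starts=5, tol=1e-8, max_sweeps=100, seed=0):
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    best_x, best_f = None, np.inf
    for _ in range(n_starts):                      # multi-start
        x = rng.uniform(lo, hi)
        prev = f(x)
        for _ in range(max_sweeps):
            for i in order:                        # follow the ranked list
                res = minimize_scalar(
                    lambda t: f(np.concatenate([x[:i], [t], x[i + 1:]])),
                    bounds=(lo[i], hi[i]), method="bounded")
                x[i] = res.x                       # line search on variable i
            cur = f(x)
            if prev - cur < tol:                   # convergence of a sweep
                break
            prev = cur
        if cur < best_f:
            best_x, best_f = x.copy(), cur
    return best_x, best_f

# Example: the sphere function in 4 variables.
sphere = lambda x: float(np.sum(x ** 2))
x_star, f_star = pruned_search(sphere, [(-5.12, 5.12)] * 4, order=[0, 1, 2, 3])
```

In the full method, the bounds fed to each per-variable search would be the pruned regions produced by Section II-C, and order would be the ensemble-voted variable ranking.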
Firstly we validate the effectiveness of x6 − 30 ≤ 0, 2 2 two steps, data distillation and feature selection, by comparing g7(x) = x1 + 2(x2 − 2) − 2x1x2 + 14x5− with benchmark methods. Secondly, the entire framework is 6x6 ≤ 0, 2 validated with 7 scalable benchmark problems, whose num- g8(x) = −3x1 + 6x2 + 12(x9 − 8) − ber of variables can be determined by users. In all sets of 7x10 ≤ 0. experiments, our method is compared with genetic algorithms −10 ≤ xi ≤ 10, i = 1, 2,..., 10. (GA). The comparison is based on the value of the solution found (if the true optimum is unknown, as in the synthetic B. Validation of Feature Ranking cases) or error between solution found and the true optimum To demonstrate the effectiveness of feature selection in (if known, as in the standard problems). Time is also recorded. producing an importance ranking of the variables and hence a Experiments are setup on a Linux Red Hat 4.4.7 system with refined searching path, we compared with random searching 32 GB memory and Intel Xeon CPU 2.20 GHz. paths, and evaluate the error between the actual optimum and G1 Max Min Total cases Vertex 2.3128 -9.77 8656 in Fig. 2. Except at a low dimension on f3, f4 and f6 where NTM 2.1134 -3.05 1843 the time performance is only slightly worse, in all other cases Random 0.9887 -3.32 1457 G2 (n=20) PrS outperforms GA consistently. Vertex 0.7664 -12.112 166 NTM 0.3555 -16.455 28 TABLE IV: Error comparison for d = 100. Random / / 0 G7 # PrS GA Vertex 198.3321 29.332 9656 Err. Mean Err. SD. Err. Mean Err. SD. NTM 201.4456 36..654 63 1 4.18E-14 5.99E-15 4.54E-10 6.76 E-12 Random 200.3432 98.4804 17 2 8.22E+01 3.34E+00 8.19E+02 9.24E+03 3 0E+00 0E+00 0E+00 0E+00 TABLE I: Validation of data distillation. Comparing our vertex 4 1.33E-08 2.21E-09 8.33E-09 9.53E-10 based method to NTM and randomization. 5 5.39E-03 1.25E-02 4.33E-03 1.21E-02 6 0E+00 0E+00 0E+00 0E+00 7 0E+00 0E+00 1.43E-01 2.33E-04 best answer found, as well as the time taken. 
The refined path is obtained using our method, and the random path is TABLE V: Error comparison for d = 500. ∗ the best one out of 50 runs. The error e = f0(x) − f0(x ) is presented as the difference between the best solution found by # PrS GA Err. Mean Err. SD. Err. Mean Err. SD. our algorithm and the test function’s actual global optimum. 1 3.12E-06 6.32E-08 5.45E-06 8.66E-08 The comparison can be seen in Table II. We can see our refined 2 1.61E+03 1.56E+02 1.54E+03 3.64E+03 path is more effective in finding the minimum of G1, and more 3 0E+00 0E+00 0E+00 0E+00 4 3.43E-08 2.15E-08 6.42E-08 3.64E-08 efficient in terms of time consumption. 5 8.59E-04 6.54E-05 7.75E-04 6.43E-06 6 0E+00 0E+00 0E+00 0E+00 TABLE II: Validation of feature ranking. 7 0E+00 0E+00 8.54E-03 7.68E-04 G1 Err. Mean Err. Var. Time (s) Refined path 3.65E-02 5.09E-04 24.55 TABLE VI: Error comparison for d = 1000. Random path 4.23E+00 2.10E-02 264.76 # PrS GA Err. Mean Err. SD. Err. Mean Err. SD. 1 4.74E-05 7.24E-08 6.67E-05 6.80E-07 C. Standard Test Problems with Scalable Dimension 2 2.32E+05 3.34E+02 8.19E+05 9.24E+03 3 0E+00 0E+00 0E+00 0E+00 There is a variety of standardized numerical test problems 4 9.32E-05 5.16E-05 2.33E-05 4.53E-08 with scalable dimensions. Most of them are only bounded but 5 7.36E-06 9.56E-07 4.33E-06 1.21E-07 unconstrained. Examples of such problems include Ackley 6 2.11E-03 1.23E-03 3.00E-02 3.32E-03 function, Rosenbrock Function, Sphere Function, and Za- 7 5.53E-04 5.61E-05 7.89E-02 9.53E-04 kharov Function [8], to name a few. We use a set of 7 problems, function descriptions shown in Table III. 600 For each function, experiments are conducted in 100, 500 PrS, d = 100 500 GA, d = 100 and 1000 dimensions, separately. Each algorithm is run 25 PrS, d = 500 GA, d = 500 times and the average error is computed, defined as the differ- 400 PrS, d = 1000 ence between the best value known and the best value found. GA, d = 1000 Time consumption is also compared. 
For error evaluation, the 300 stopping criterion is set as the maximum iterations, which 200 is also the number of fitness evaluations. It is set to be Time Consumption (s) 100×d where d is the variable dimension. For time evaluation, 100 the stopping criterion is either 1) convergence is achieved 0 f1 f2 f3 f4 f5 f6 f7 or 2) the number of iteration exceeds 100 × d. Since the Test Function objective functions in these problems are not linear, (LP) is not applicable. Pruned Search (PrS) is Fig. 2: Comparison of time consumption on standard scalable compared with genetic algorithms (GA) for each function on problems. each tested variable dimension. Tables IV-VI show the result for mean value error (Err. IV. RELATED WORKS Mean), error standard deviation (Err. SD.) computed for 25 Some related work using machine learning to assist op- runs, for both algorithms on each of the 7 functions. At a timization search are seen. Clustering methods have been dimension of 100, our method beat GA in 2 cases, played used to select promising starting points for optimization algo- even in 2 cases, and achieved a fairly comparable result on rithms [9] from randomly generated candidate starting points. the other 3. As the dimension enlarges, these numbers evolve In [10] Support Vector Machine is employed to learn the to (3, 2, 2) and (4, 1, 2), respectively. A robustness to high relationship between the starting point of an algorithm and dimensions is therefore observed. The time comparison is seen the final outcome, so as to bypass the fitness evaluation that TABLE III: Scalable test functions. 
could be expensive. A new realm of Bayesian global optimization [11] deals with expensive cost functions, where evaluating the objective function is costly or even impossible and the derivatives and convexity properties are unknown. However, those methods only work for low-dimension optimization problems. High dimensionality in optimization problems has been approached by imposing extra hypotheses. In [12] statistical and machine learning ideas are used to change the formulation of the constraints; the setting is that the true constraint parameters lie in a low-dimensional space, but this special structure is obscured by added noise. In our setup we do not impose hypotheses on the original problem and tend to keep the variables exactly as they are.

TABLE III: Scalable test functions.

#  Function    f(x)                                                               Domain                  Optimum
1  Ackley      20 + e − 20 exp(−0.2 sqrt((1/d) Σ xi^2)) − exp((1/d) Σ cos(2πxi))  −32.768 ≤ xi ≤ 32.768   f(x) = 0 at xi = 0
2  Rosenbrock  Σ_{i=1}^{d/2} [100(x_{2i−1}^2 − x_{2i})^2 + (x_{2i−1} − 1)^2]      −2.048 ≤ xi ≤ 2.048     f(x) = 0 at xi = 1
3  Sphere      Σ_{i=1}^{d} xi^2                                                   −5.12 ≤ xi ≤ 5.12       f(x) = 0 at xi = 0
4  Zakharov    Σ xi^2 + (Σ 0.5 i xi)^2 + (Σ 0.5 i xi)^4                           −5.0 ≤ xi ≤ 10.0        f(x) = 0 at xi = 0
5  Griewank    1 + Σ xi^2/4000 − Π cos(xi/√i)                                     −600.0 ≤ xi ≤ 600.0     f(x) = 0 at xi = 0
6  Schwefel    418.982887272433 d − Σ xi sin(√|xi|)                               −500.0 ≤ xi ≤ 500.0     f(x) = 0 at xi ≈ 420.97
7  Rastrigin   10d + Σ [xi^2 − 10 cos(2πxi)]                                      −5.12 ≤ xi ≤ 5.12       f(x) = 0 at xi = 0

V. CONCLUSION

In this paper, we proposed 'Pruned Search', a machine learning approach to enhance the search for optimization problems, considering particularly the ones with a continuous problem space, a high dimension of variables and complex constraints. The idea is to construct meta-heuristics that guide the search quest into a pruned, reduced space that is more promising. The meta-heuristics are designed towards a refined search path and a trimmed search region. The challenge of machine learning in this problem is that data are not given but need to be generated. The designed data distillation technique based on vertex enumeration and Lagrangian relaxation solves this problem.

We walked through the procedures of our method with an example problem, illustrating each step with intermediate results. Experimental results have shown that our approach can effectively find the global optimum with a significantly reduced computational time compared to Genetic Algorithms on seven high-dimension standard benchmark problems.

Interesting future work includes a more thorough investigation of the variable relations. Other than forming a ranked list, for instance, variables may be partially ordered based on multiple, possibly conflicting aspects concerning their influence and impact on the function. They may form subsets, within which superiority might occur. Intercorrelations, dependencies, and conditional and causal relations can all be studied. Advances in qualitative decision theory [13]-[16] have shown that conditional dependencies can be used to construct a partial ordering among variables or their subsets efficiently, and it would be interesting to explore the integration of such methods with this work. The resulting knowledge base could provide ground for parallelization in the future, as the reduction of the search region for each variable is executed individually. Finally, it would be most interesting to apply the proposed method and its extensions to real-world optimization problems in engineering [17], [18].

ACKNOWLEDGEMENT

This work is supported in part by the following grants: AFOSR award FA9550-12-1-0458; NIST award 70NANB14H012; NSF awards CCF-1029166, IIS-1343639 and CCF-1409601; and DOE award DESC0007456.

REFERENCES

[1] M. Gen and R. Cheng, Genetic Algorithms and Engineering Optimization. John Wiley & Sons, 2000, vol. 7.
[2] R. Liu, A. Agrawal, W.-k. Liao, and A. Choudhary, "Search space preprocessing in solving complex optimization problems," in 2014 IEEE International Conference on Big Data. IEEE, 2014.
[3] Z. Michalewicz and M. Schoenauer, "Evolutionary algorithms for constrained parameter optimization problems," Evolutionary Computation, vol. 4, no. 1, pp. 1-32, 1996.
[4] K.-T. Fang and Y. Wang, Number-Theoretic Methods in Statistics. CRC Press, 1993, vol. 51.
[5] R. C. Holte, "Very simple classification rules perform well on most commonly used datasets," Machine Learning, vol. 11, no. 1, pp. 63-90, 1993.
[6] K. Kira and L. A. Rendell, "A practical approach to feature selection," in Proceedings of the Ninth International Workshop on Machine Learning. Morgan Kaufmann Publishers Inc., 1992, pp. 249-256.
[7] K.-T. Fang, Y. Wang, and P. M. Bentler, "Some applications of number-theoretic methods in statistics," Statistical Science, pp. 416-428, 1994.
[8] "Test functions for optimization," https://en.wikipedia.org/wiki/Test_functions_for_optimization, [Online; accessed August 2015].
[9] A. R. Kan and G. Timmer, "Stochastic global optimization methods part I: Clustering methods," Mathematical Programming, vol. 39, no. 1, pp. 27-56, 1987.
[10] A. Cassioli, D. Di Lorenzo, M. Locatelli, F. Schoen, and M. Sciandrone, "Machine learning for global optimization," Computational Optimization and Applications, vol. 51, no. 1, pp. 279-303, 2012.
[11] E. Brochu, V. M. Cora, and N. De Freitas, "A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning," arXiv preprint arXiv:1012.2599, 2010.
[12] H. Xu, C. Caramanis, and S. Mannor, "Statistical optimization in high dimensions," in International Conference on Artificial Intelligence and Statistics, 2012, pp. 1332-1340.
[13] A. Agrawal, "Qualitative decision methods for multi-attribute decision making," arXiv preprint arXiv:1508.00879, 2015.
[14] G. R. Santhanam, S. Basu, and V. Honavar, "Representing and reasoning with qualitative preferences for compositional systems," Journal of Artificial Intelligence Research, 2011.
[15] G. R. Santhanam, S. Basu, and V. Honavar, "Dominance testing via model checking," in AAAI, 2010.
[16] G. R. Santhanam, S. Basu, and V. Honavar, "Efficient dominance testing for unconditional preferences," in KR, 2010.
[17] R. Liu, A. Kumar, Z. Chen, A. Agrawal, V. Sundararaghavan, and A. Choudhary, "A predictive machine learning approach for microstructure optimization and materials design," Nature Scientific Reports, vol. 5, no. 11551, 2015.
[18] G. R. Santhanam and K. Gopalakrishnan, "Pavement life-cycle sustainability assessment and interpretation using a novel qualitative decision procedure," Journal of Computing in Civil Engineering, vol. 27, no. 5, pp. 544-554, 2013.