GLOBAL SEARCH METHODS FOR SOLVING
NONLINEAR OPTIMIZATION PROBLEMS

BY

YI SHANG

B.Engr., University of Science and Technology of China
M.Engr., Academia Sinica

THESIS

Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy in Computer Science
in the Graduate College of the
University of Illinois at Urbana-Champaign

Urbana, Illinois

© Copyright by Yi Shang

GLOBAL SEARCH METHODS FOR SOLVING
NONLINEAR OPTIMIZATION PROBLEMS

Yi Shang, Ph.D.
Department of Computer Science
University of Illinois at Urbana-Champaign
Benjamin W. Wah, Advisor

In this thesis, we present new methods for solving nonlinear optimization problems. These problems are difficult to solve because the nonlinear constraints form feasible regions that are difficult to find, and the nonlinear objectives contain local minima that trap descent-type search methods. In order to find good solutions in nonlinear optimization, we focus on the following two key issues: how to handle nonlinear constraints, and how to escape from local minima. We use a Lagrange-multiplier-based formulation to handle nonlinear constraints, and develop Lagrangian methods with dynamic control to provide faster and more robust convergence. We extend the traditional Lagrangian theory for the continuous space to the discrete space, and develop efficient discrete Lagrangian methods. To overcome local minima, we design a new trace-based global-search method that relies on an external traveling trace to pull a search trajectory out of a local optimum in a continuous fashion, without having to restart the search from a new starting point. Good starting points identified in the global search are used in the local search to identify true local optima. By combining these new methods, we develop a prototype called Novel (Nonlinear Optimization Via External Lead) that solves nonlinear constrained and unconstrained problems in a unified framework.

We show experimental results in applying Novel to solve nonlinear optimization problems, including (a) the learning of feedforward neural networks, (b) the design of quadrature-mirror-filter digital filter banks, (c) the satisfiability problem, (d) the maximum satisfiability problem, and (e) the design of multiplierless quadrature-mirror-filter digital filter banks. Our method achieves better solutions than existing methods, or achieves solutions of the same quality but at a lower cost.

DEDICATION

To my wife Lei, my parents, and my son Charles.

ACKNOWLEDGEMENTS

This dissertation would not have been possible without the support of many people. With much pleasure, I take the opportunity to thank all the people who have helped me through the course of my graduate study.

First, I would like to thank my advisor, Professor Benjamin W. Wah, for his invaluable guidance, advice, and encouragement. His constant confidence in the face of research uncertainties and his persistent questioning of our methods have been, and will always be, inspiring to me. I would like to express my gratitude to the other members of my dissertation committee, Professors Michael T. Heath, Chung L. Liu, and Sylvian R. Ray, for their careful reading of the dissertation and valuable suggestions. I would like to thank Dr. Bart Selman for providing help and guidance in studying SAT problems. I would like to thank Ms. Jenn Wilson for carefully reading the final draft of this dissertation.

Many thanks are due to my colleagues who have contributed to this dissertation: Dr. Yao-Jen Chang, for laying the foundation of this research work; Ms. Ting Yu, for adapting our methods to digital filter-bank applications; Mr. Tao Wang, for extending our methods to constrained optimization problems; and Mr. Zhe Wu, for extending our methods to more discrete optimization applications and carrying on this work in the future. I would like to thank all my current and former colleagues, Lon-Chan Chu, Kumar Ganapathy, Arthur Ieumwananonthachai, Yao-Jen Chang, Chin-Chi Teng, Peter Wahle, Tao Wang, Chien-Wei Li, Yu Ting, Xiao Su, Jeff Monks, Zhe Wu, and Dong Lin, for their helpful discussions, useful advice, support, and friendship.

I would like to acknowledge the financial support of National Science Foundation Grants MIP, MIP, and MIP, and National Aeronautics and Space Administration Grant NAG.

My family always surrounded me with love and support. I wish to thank my wife, Lei, for her love, understanding, encouragement, and patience. I wish to thank my parents, whose love and support have brought me this far. I want to thank my sister, Ting, and my brother, Ying, for their caring and support throughout my education. I want to thank my parents-in-law for their help in the final stage of my graduate education. I wish to dedicate this thesis to my wife, my parents, and my little boy, Charles.

TABLE OF CONTENTS


INTRODUCTION
Optimization Problems
A Taxonomy of Optimization Problems
Continuous Optimization Problems
Discrete Optimization Problems
Decision Problems
Challenges in Solving Nonlinear Optimization Problems
Characteristics of Nonlinear Optimization Algorithms
Goals and Approaches of This Research
Approaches to Handle Nonlinear Constraints
Approaches to Overcome Local Minima
A Unified Framework to Solve Nonlinear Optimization Problems
Applications of Proposed Methods
Outline of Thesis
Contributions of This Research

HANDLING NONLINEAR CONSTRAINTS
Previous Work
Nontransformational Approaches
Transformational Approaches
Existing Lagrangian Theory and Methods
Discrete Optimization
Proposed Methods for Handling Nonlinear Constraints
Handling Continuous Constraints
Convergence Speed of Lagrangian Methods
Dynamic Weight Adaptation
Discrete Constraint-Handling Method
Summary

OVERCOMING LOCAL MINIMA
Previous Work on Global Optimization
Deterministic Global Optimization Methods
Probabilistic Global Optimization Methods
Global Search in Constrained Optimization
Problem Specification and Transformation
A New Multilevel Global Search Method
Trace-Based Global Search
General Framework
An Illustrative Example
Trace-Function Design
Evaluation of Coverage
Space-Filling Curves
Searching from Coarse to Fine
Numerical Evaluation of Trace-Function Coverage
Summary of Trace-Function Design
Speed of Trace Function
Variable Scaling During Global Search
Numerical Methods for Solving the Dynamic Systems of Equations
Novel: A Prototype Implementing Proposed Methods
Continuous Methods
Discrete Methods
Potential for Parallel Processing
Summary

UNCONSTRAINED OPTIMIZATION IN NEURAL-NETWORK LEARNING
Artificial Neural Networks
Feedforward Neural-Network Learning Problems
Existing Methods for Feedforward Neural-Network Learning
Learning Using Local Optimization
Learning Using Global Search
Experimental Results of Novel
Benchmark Problems Studied in Our Experiments
Experimental Results on the Two-Spiral Problem
Comparison with Other Global Optimization Methods
Experimental Results on Other Benchmarks
Summary

CONSTRAINED OPTIMIZATION IN QMF FILTER-BANK DESIGN
Two-Channel Filter Banks
Existing Methods in QMF Filter-Bank Design
Optimization Formulations of QMF Filter-Bank Design
Evaluations of Performance Metrics
Closed-Form Formulas of Er(x), Es(x), and Ep(x)
Numerical Evaluation of δs(x), δp(x), and Tt(x)
Evaluations of the First-Order Derivatives of Performance Metrics
Experimental Results
Performance of Novel
Comparison with Other Global Optimization Methods
Performance Trade-offs
Summary

DISCRETE OPTIMIZATION
Introduction to Satisfiability Problems
Previous Work
Discrete Formulations
Continuous Formulations
Applying DLM to Solve SAT Problems
Implementations of DLM
Algorithmic Design Considerations
Three Implementations of the Generic DLM
Reasons behind the Development of A
Experimental Results
Solving Maximum Satisfiability Problems
Previous Work
DLM for Solving MAX-SAT Problems
Experimental Results
Designing Multiplierless QMF Filter Banks
Introduction
DLM Implementation Issues
Experimental Results
Summary

CONCLUSIONS AND FUTURE WORK
Summary of Work Accomplished
Future Work

APPENDIX A: EXPERIMENTAL RESULTS OF NOVEL ON SOME TEST FUNCTIONS

REFERENCES

VITA

LIST OF TABLES


Global and local search components used in existing global optimization methods.

Von Neumann computer versus biological neural systems.

Benchmark problems studied in our experiments. The numbers of inputs, outputs, and training and test patterns are listed in the table.

Results of using NOVEL with the LSODE solver to train four- and five-hidden-unit neural networks for solving the two-spiral problem.

Summary results of Novel using the Euler method to solve the two-spiral problem, with a limit on the number of time units in each run.

The performance of the best networks with and hidden units, which were trained by NOVEL and simulated annealing (SA).

Comparison of the best results obtained by NOVEL and truncated Newton's algorithm with multistarts (TN-MS) for solving four benchmark problems, where the parameters in one method that obtains the best result may be different from those of another method. Results in bold font are better than or equal to results obtained by TN-MS.

Possible design objectives of filter banks. Refer to Eq. and Figure for explanation.

Lower and upper bounds for calculating the reconstruction error Er.

Experimental results of the dynamic Lagrangian method in designing QMF filter banks. We use Johnston's solutions as starting points.

Experimental results of Novel, using trace-based global search and the dynamic Lagrangian method as local search, in solving the QMF filter-bank design problems. (A MHz Pentium Pro running Linux.)

Experimental results of simulated annealing (SA) for solving QMF filter-bank problems. The cooling rates of SA for the and -tap filter banks are and , respectively. The total numbers of evaluations for the and -tap filter banks are and millions, respectively. (A MHz Pentium Pro running Linux.)

Experimental results of the evolutionary algorithm for solving a constrained formulation (EA-Constr) of QMF filter-bank problems. The population size of EA is n, where n is the number of variables, and the numbers of generations for the and -tap filter banks are and , respectively. (A Sun SparcStation.)

Experimental results of the evolutionary algorithm for solving the weighted-sum formulation (EA-Wt) of QMF filter-bank problems. The population size of EA is n, where n is the number of variables, and the numbers of generations for the and -tap filter banks are and , respectively. (A Sun SparcStation.)

Execution times of A in CPU seconds on a Sun SparcStation for one run of A, starting from the origin as the initial point, on some DIMACS benchmark problems.

Comparison of A's execution times in seconds, averaged over runs, with respect to published results of WSAT, GSAT, and the Davis-Putnam algorithm on some of the circuit-diagnosis problems in the DIMACS archive. (A: Sun SparcStation and a MHz SGI Challenge with MIPS R; GSAT, WSAT, and DP: SGI Challenge with a MHz MIPS R.)

Comparison of A's execution times in seconds, averaged over runs, with the published results on the circuit-synthesis problems, including the best known results obtained by GSAT, integer programming (IP), and simulated annealing. (A: Sun SparcStation and a MHz SGI Challenge with MIPS R; GSAT and SA: SGI Challenge with a MHz MIPS R; integer programming: VAX.)

Comparison of DLM's execution times in seconds, averaged over runs, with the best known results obtained by GSAT on the DIMACS circuit-synthesis, parity-learning, artificially generated SAT instances, and graph-coloring problems. Results on A were based on a tabu length of , a flat-region limit of , a reset interval of , and reset to when the reset interval is reached. For g and g, c ; for g and g, c . (A: Sun SparcStation; GSAT: SGI Challenge, model unknown.)

Comparison of DLM's execution times in seconds, over runs, with the published results of Grasp on the aim problems from the DIMACS archive. (DLM A: Sun SparcStation; Grasp: SGI Challenge with a MHz MIPS R.) The success ratio for Grasp is always .

Comparison of DLM's execution times in seconds, over runs, with the published results of Grasp on the aim problems from the DIMACS archive. (DLM A: Sun SparcStation; Grasp: SGI Challenge with a MHz MIPS R.) The success ratio for Grasp is always .

Comparison of DLM A's execution times in seconds, over runs, with the published results of Grasp on the ii problems from the DIMACS archive. (DLM A: Sun SparcStation; Grasp: SGI Challenge with a MHz MIPS R.) The success ratios for DLM A and Grasp are always .

Comparison of DLM A's execution times in seconds, over runs, with the published results of Grasp on the jnh, par, and ssa problems from the DIMACS archive. (DLM A: Sun SparcStation; Grasp: SGI Challenge with a MHz MIPS R.) The success ratio for Grasp is always .

Comparison of DLM A's execution times in seconds, over runs, with the published results of Grasp on some of the more difficult benchmark problems from the DIMACS archive. The success ratio of Grasp is always . Program parameters: for all the problems, flat-region limit , reset to every iterations; for the par problems, tabu length ; for the rest of the par problems, tabu length ; for the f problems, tabu length ; for the hanoi problem, tabu length . System configuration: DLM A: Sun SparcStation; Grasp: SGI Challenge with a MHz MIPS R.

Execution times of DLM A, in Sun SparcStation CPU seconds over runs, on the as and tm problems from the DIMACS archive.

Comparison of results of DLM and those of GRASP with respect to the optimal solutions (jnh: clauses, Σᵢ wᵢ). GRASP was run for iterations on a MHz SGI Challenge with MIPS R, whereas DLM with scaling was run times, each for iterations, from random starting points on a Sun SparcStation. For DLM with dynamic scaling, I and r .

Continuation of the preceding table: comparison of results of DLM and those of GRASP with respect to the optimal solutions (jnh: clauses, Σᵢ wᵢ; jnh: clauses, Σᵢ wᵢ).

Comparison of average CPU time in seconds for iterations of DLM and GRASP.

Comparison of solutions found by DLM and those found by GRASP with longer iterations.

Comparison of a PO filter bank obtained by truncating the real coefficients of Johnston's e QMF filter bank to bits, and a similar PO filter bank whose coefficients were scaled by before truncation. Performance has been normalized with respect to the performance of the original filter bank.

Comparison of normalized performance of PO filter banks designed by DLM with respect to those designed by Johnston and Chen. Columns show the performance of DLM using bits for -tap filters and bits for -tap filters, normalized with respect to that of Johnston's e, d, and e filter banks. Columns show the performance of DLM using bits, normalized with respect to that of Chen et al.'s -tap and -tap filter banks.

Comparison of normalized performance of e PO QMF filter banks designed by DLM with respect to those designed by Novel, simulated annealing (SA), and genetic algorithms (EA-Constr and EA-Wt).

A. A set of multimodal test functions.

A. Experimental results of multistarts of gradient descent on the test functions. The number of runs that find the optimal solutions, out of runs, is shown in the table. If the optimal solution is not found, the best solution obtained is shown in parentheses.

A. Experimental results of Novel on the test functions. The index of the first local descent of Novel that finds the global minimum, out of descents, is shown in the table.

LIST OF FIGURES


A classification of optimization and decision problems. The classes of problems addressed in this thesis are shaded.

Illustration of difficulties in optimizing a nonlinear function. There may exist local minima in a multimodal nonlinear function with different characteristics: shallow or deep, wide or narrow.

Two-dimensional projections of the -dimensional error surface for a five-hidden-unit, -weight feedforward neural network with sigmoidal activation function. The terrain is around a solution found by our new global search method, Novel, to solve the two-spiral problem. These two graphs are plotted along two different pairs of dimensions.

Transformation of various forms of nonlinear optimization problems into a unified formulation that is further solved by global search methods.

Organization of this thesis.

Classification of methods to handle nonlinear constraints. Our new method presented in this chapter is a variation of the Lagrangian method.

Classification of discrete optimization methods.

Illustration of the search trajectories using the slack-variable method when the saddle point is on the boundary of a feasible region. f(x) is an objective function of variable x. The search can (a) converge to a saddle point, (b) oscillate around a saddle point, or (c) diverge to infinity.

Illustration of the search trajectory using the MaxQ method when the saddle point is on the boundary of a feasible region. f(x) is the objective function. The search starts from inside (left) or outside (right) the feasible region. In both cases, the search trajectory eventually approaches the saddle point from outside the feasible region.

Search profile with static weight w in designing a e QMF filter bank. The objective is overweighted, and its values are reduced quickly in the beginning. Later, the Lagrangian part slowly gains ground and pushes the search back into the feasible region. The convergence time to the saddle point is long ( CPU minutes at t ).

Search profile with static weight w in designing a e QMF filter bank. The objective and the constraints are balanced, and the convergence speed to the saddle point is faster ( CPU minutes at t ).

Search profile with static weight w in designing a e QMF filter bank. The constraints are overweighted, and constraint satisfaction dominates the search process. The trajectory is kept inside or very close to the feasible region. However, the improvement of the objective value is slow, causing slow overall convergence to the saddle point ( CPU minutes at t ).

Framework of the dynamic weight-adaptation algorithm.

Weight-adaptation rules for continuous and discrete problems (Step of Figure ).

Profile with adaptive changes of w in designing a e QMF filter bank. The search converges much faster ( CPU minutes, t ) than those using static weights.

Execution profiles of DLM A without dynamic scaling of Lagrange multipliers on problem g, starting from a random initial point. The top-left graph plots the Lagrangian-function values and the number of unsatisfied clauses against the number of iterations, sampled every iterations. The top-right graph plots the minimum, average, and maximum values of the Lagrange multipliers, sampled every iterations. The bottom two graphs plot the iteration-by-iteration objective and Lagrangian-part values for the first iterations.

Execution profiles of A with periodic scaling of the Lagrange multipliers by a factor of every iterations on problem g, starting from a random initial point. Note that the average Lagrange-multiplier values are very close to . See Figure for further explanation.

Classification of local optimization methods for unconstrained nonlinear optimization problems.

Classification of unconstrained nonlinear continuous global minimization methods.

Framework of a new multilevel global search method. This method has combined global- and local-search phases.

In the trace-based global search, the search trajectory shows a combined effect of gradient descents and pull exerted by the moving trace. Three cascaded global-search stages are shown in this figure, generating trajectories , , and , respectively.

Framework of a three-stage trace-based global search method.

Relationship between the trace-based global search and the local optimization method.

Illustration of the trace-based global search: D plots (top row) and D contour plots (bottom row) of Levy's No. function with superimposed search trajectories: the trace (first column), the trajectory after stage (second column), the trajectory after stage (third column), and the trajectory after stage (fourth column). Darker regions in the contour plots mean smaller function values. The trajectories are solved by LSODE, a differential-equation solver.

Illustration of the distance evaluation in a D square, and several curves with different lengths. The left graph contains a very short curve. The curve is extended in the middle graph. The right graph has the longest curve and the best coverage. The number in each subsquare is the shortest unit distance from the subsquare to the curve.

Two equal-length curves, C and C, with the same maximal distance of subspaces but with different distance distributions.

Space-filling curves: the snake curve (left) and the first two Peano curves, P and P (right).

The first three Hilbert space-filling curves of degree .

D plots of the trace function at t , , and , respectively. The parameter values of the trace function are a, s, d, and kᵢ.

Coverage measurements of a trace function in a D space. Each dimension has sample points. The three graphs correspond to the coverage when the trace runs for , , and time units, respectively. Each number is the unit distance of the corresponding subsquare to the trace function.

The coverage of trace functions in a D space with a and d and , respectively. Each graph shows a D plot and a contour plot of the cumulative distribution of distances from subspaces to the trace function. The contour plots show the and percentiles of the distribution.

Sampling is used to evaluate the coverage of a set of trace functions in a D space with a and d and , respectively. Each graph shows a D plot and a contour plot of the cumulative distribution of distances between subspaces and the trace function. The contour plots show the and percentiles of the distribution.

The coverage of a set of trace functions in a -dimensional space with a and d and , respectively. Each graph shows a D plot and a contour plot of the cumulative distribution of distances between subspaces and the trace function. The contour plots show the and percentiles of the distribution.

The coverage of a set of trace functions in a -dimensional space with a and d and , respectively. Each graph shows a D plot and a contour plot of the cumulative distribution of distances between subspaces and the trace function. The contour plots show the and percentiles of the distribution.

The coverage of a set of trace functions in a -dimensional space with a and d and , respectively. Each graph shows a D plot and a contour plot of the cumulative distribution of distances between subspaces and the trace function. The contour plots show the and percentiles of the distribution.

Variable scaling stretches out the variable, which reduces the backward force due to the gradient and increases the forward force due to the trace.

Illustration of gradient and trace forces around a local minimum along dimension x. The left graph shows the situation before variable scaling, in which the search trajectory is trapped in the local minimum. The right graph shows the situation after variable scaling, where variable x is scaled up times and the search trajectory overcomes the local minimum. The graphs show the trace function going from left to right and the resulting search trajectory.

The trace function and search trajectories after one stage of the trace-based global search, with and without adaptive variable scaling, on the D Griewank function. The D contour plots of Griewank's function are shown in all three graphs. A darker color in the contour plot represents a smaller function value. The left graph shows the trace function starting from the position marked "x". The middle graph shows the search trajectory without variable scaling. The right graph shows the search trajectory with adaptive variable scaling.

Organization of the Novel global search method.

A generic discrete Lagrangian method.

McCulloch-Pitts model of a neuron (left) and a three-layer feedforward neural network (right).

The training set of the two-spiral problem (left), and the test set of the two-spiral problem on top of the training set (right).

Structure of the neural network for the two-spiral problem.

A four-hidden-unit neural network trained by Novel that solves the two-spiral problem.

D classification graphs for the two-spiral problem by (first column), (second column), (third column), and (fourth column) hidden-unit neural networks trained by Novel (upper row) and SIMANN (lower row). Parameters for Novel are the coefficients with subscripts g and t and the search range; parameters for SIMANN are RT, NT n, and the search range. The crosses and circles represent the two spirals of the training set. Note that the -hidden-unit network was trained by Novel differently from the other networks.

The best performance of various global minimization algorithms for learning the weights of neural networks with and hidden units for solving the two-spiral problem.

Training and testing errors of the best designs obtained by various algorithms for solving the two-spiral problem. There are and weights (including biases in neurons) in the neural networks with, respectively, and hidden units.

The structure of a two-channel filter bank.

The prototype low-pass filter and its mirror filter form the reconstruction error Er of the QMF filter bank.

An illustration of the performance metrics of a single low-pass filter.

Kahan's formula for the summation Σⱼ aⱼ.

Comparison of convergence speed of Lagrangian methods with and without dynamic weight control for the d (left) and e (right) QMF filter-bank design problems. The convergence times of the Lagrangian method using different static weights are normalized with respect to the time spent by the adaptive Lagrangian method.

A comparison of our D QMF design and Johnston's design. The left graph shows the passband frequency response, and the right graph shows the reconstruction error.

Experimental results in relaxing the constraints with respect to Johnston's designs for D QMFs. The x-axis shows the relaxation ratio of stopband energy and ripple (left) or transition bandwidth (right) as constraints in Novel with respect to Johnston's value. The y-axis shows the ratio of the measure found by Novel with respect to Johnston's.

Experimental results in relaxing and in tightening the constraints with respect to Johnston's designs for D (left) and D (right) QMFs. The x-axis shows the ratio of the constraint in Novel with respect to Johnston's value. The y-axis shows the ratio of the performance measures of Novel's solutions with respect to Johnston's.

Illustration of the discrete gradient operator for SAT.

Generic discrete Lagrangian algorithm A for solving SAT problems.

Discrete Lagrangian algorithm version A: an implementation of A for solving SAT problems.

Discrete Lagrangian algorithm version A for solving SAT problems.

Discrete Lagrangian algorithm version A for solving SAT problems.

Execution profiles of A on problem g, starting from a random initial point. The top-left graph plots the Lagrangian-function values and the number of unsatisfied clauses against the number of iterations, sampled every iterations. The top-right graph plots the minimum, average, and maximum values of the Lagrange multipliers, sampled every iterations. The bottom two graphs plot the iteration-by-iteration objective and Lagrangian-part values for the first iterations.

Execution profiles of A using a tabu list of size and flat moves of limit on problem g, starting from a random initial point. See Figure for further explanation.

Execution profiles of A with a tabu list of size , flat moves of limit , and periodic scaling of the Lagrange multipliers by a factor of every iterations on problem g, starting from a random initial point. Note that the average Lagrange-multiplier values are very close to , and therefore overlap with the curve showing the minimum Lagrange-multiplier values. See Figure for further explanation.

Generic DLM_maxsat A for solving SAT problems.

Execution profiles of jnh (left) and jnh (right). jnh is satisfiable, and jnh is not.

A: an efficient implementation of DLM_maxsat.

An implementation of DLM for solving the design problem of multiplierless filter banks.

Algorithm for finding the best scaling factor, where wᵢ is the weight of constraint i.

The violation values of Tt during a search (left) and the corresponding value of the multiplier associated with Tt (right).

Normalized performance with respect to Johnston's e (left) and e (right) QMF filter banks, for PO filters with a maximum of bits per coefficient and different numbers of filter taps.

Normalized performance with respect to Johnston's e QMF filter bank, for PO filters with taps and different maximum numbers of bits per coefficient.

INTRODUCTION

Many application problems in engineering, decision sciences, and operations research are formulated as optimization problems. Such applications include digital signal processing, structural optimization, engineering design, neural networks, computer-aided design for VLSI, database design and processing, nuclear power plant design and operation, mechanical engineering, and chemical process control. Optimal solutions in these applications have significant economic and social impact. Better engineering designs often result in lower implementation and maintenance costs, faster execution, and more robust operation under a variety of operating conditions.

Optimization problems are made up of three basic components: a set of unknowns or variables, an objective function to be minimized or maximized, and a set of constraints that specify feasible values of the variables. The optimization problem entails finding values of the variables that optimize (minimize or maximize) the objective function while satisfying the constraints.
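To make the three components concrete, the following is a minimal sketch (our illustration, not a method from this thesis): a one-variable problem, minimize f(x) = (x − 3)² subject to x ≤ 1, solved by gradient descent on a quadratic-penalty function. The function names and the penalty weight mu are our own choices for this example.

```python
def f(x):                 # objective function
    return (x - 3.0) ** 2

def g(x):                 # constraint g(x) <= 0 specifies the feasible set x <= 1
    return x - 1.0

def penalty_obj(x, mu):   # objective plus a quadratic penalty on violation
    return f(x) + mu * max(0.0, g(x)) ** 2

def grad(x, mu, h=1e-6):  # central-difference numerical gradient
    return (penalty_obj(x + h, mu) - penalty_obj(x - h, mu)) / (2.0 * h)

x, mu = 0.0, 100.0        # start infeasible of the optimum; fixed penalty weight
for _ in range(20000):    # plain gradient descent on the penalty function
    x -= 1e-3 * grad(x, mu)

print(round(x, 2))        # 1.02 -- near the constrained minimizer x* = 1
```

The unconstrained minimizer x = 3 is infeasible, so the solution lies on the constraint boundary; a fixed penalty weight only approximates it, which is one motivation for the Lagrange-multiplier-based formulations developed later in the thesis.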

There are many types of optimization problems. Variables can take on continuous, discrete, or symbolic values. The objective function can be continuous or discrete, and have linear or nonlinear forms. Constraints can also have linear or nonlinear forms, and be defined implicitly or explicitly, or may not exist at all.

In this chapter, we first formally define optimization problems and identify the classes of problems addressed in this thesis. We then summarize the characteristics of nonlinear optimization problems and solution methods. Finally, we present the goals and approaches of this research, outline the organization of this thesis, and summarize our contributions.

Optimization Problems

A general minimization problem is defined as follows:

Given a set D and a function f: D → R, find at least one point x* ∈ D that satisfies f(x*) ≤ f(x) for all x ∈ D, or show the nonexistence of such a point.

A mathematical formulation of a minimization problem is as follows:

    minimize f(x)
    subject to x ∈ D

In this formulation, x = (x1, x2, ..., xn) is an n-dimensional vector of unknowns, the function f is the objective function of the problem, and D is the feasible domain of x, specified by constraints.

A vector x* ∈ D satisfying f(x*) ≤ f(x) for all x ∈ D is called a global minimizer of f over D; the corresponding value of f is called a global minimum. A vector x' ∈ D is called a local minimizer of f over D if f(x') ≤ f(x) for all x ∈ D close to x'; the corresponding value of f is called a local minimum. Note that, since max(f, D) = −min(−f, D), maximization problems can be transformed into minimization problems of the form shown above. We use optimization and minimization interchangeably in this thesis.
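The max-to-min transformation stated above can be checked directly on a small finite domain; the domain D and the concave function f below are our own toy choices.

```python
# Check the identity max(f, D) = -min(-f, D) on a finite domain.
D = [0.5 * k for k in range(-10, 11)]   # D = {-5.0, -4.5, ..., 5.0}

def f(x):
    return 4.0 - (x - 1.0) ** 2          # maximized at x = 1, where f(1) = 4

max_direct  = max(f(x) for x in D)       # solve the maximization directly
max_via_min = -min(-f(x) for x in D)     # solve it as a minimization of -f

print(max_direct, max_via_min)           # 4.0 4.0 -- the two formulations agree
```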

Optimization problems occur at two levels: the problem-instance level and the meta level. At the problem-instance level, each problem instance is solved independently to obtain its optimal solution. In contrast, at the meta level, heuristics and algorithms are designed to solve a class of problems. The optimal solution in this case should generalize well to the whole group of problems, even if it is obtained based on one or a subset of problem instances.

Consider supervised learning of feedforward artificial neural networks (ANNs), to be studied in Chapter , as an example to illustrate these two levels of optimization. ANNs consist of a large number of interconnected simple processing units (neurons). Each processing unit is an elementary processor performing some primitive operations, like summing the weighted inputs coming to it and then amplifying or thresholding the sum. The output of each neuron is fed into other neurons through connections and weighted by connection weights. Hence, in ANNs, information is stored in synaptic connections and represented as connection weights. Generally speaking, a neural network performs a mapping from an input space to an output space. In supervised learning, the target outputs are known ahead of time, and neural networks are trained to perform the desired mapping functions.

Problem-instance-level optimization occurs when the architecture of a neural network, such as the number of neurons, the activation function of each neuron, and the connection pattern, is fixed, and only the values of the connection weights are adjustable. In this optimization problem, the connection weights are the variables, and the objective is to minimize the difference between the output of the neural network and the desired output.
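As a minimal sketch of this problem-instance-level view, the following fragment trains a single sigmoidal neuron by plain gradient descent on a sum-of-squares error; the OR training patterns, learning rate, and step count are illustrative choices, not taken from this thesis.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny illustrative training set: one sigmoidal neuron learning logical OR
# (the third input component is a constant bias term of 1).
patterns = [((0, 0, 1), 0), ((0, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 1), 1)]

def error(w):
    """Sum-of-squares difference between actual and desired outputs."""
    return sum((sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - t) ** 2
               for x, t in patterns)

def train(w, rate=0.5, steps=2000):
    """Plain gradient descent on the connection weights (the variables)."""
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, t in patterns:
            y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            d = 2 * (y - t) * y * (1 - y)      # chain rule through the sigmoid
            for i, xi in enumerate(x):
                grad[i] += d * xi
        w = [wi - rate * gi for wi, gi in zip(w, grad)]
    return w

w = train([0.0, 0.0, 0.0])
print(error(w))   # error after training is much smaller than at the start
```

Only the weights change during training; the architecture (one neuron, sigmoidal activation) stays fixed, which is exactly what distinguishes this level from meta-level optimization.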

Meta-level optimization occurs, on the other hand, when the architecture of a neural network is not fixed. For example, the neural network can be a feedforward or a feedback network, a multilayer network with or without shortcuts, or a pyramid network. The activation function of each neuron can be a step function, a sigmoidal function, or a radial basis function. An appropriate architecture needs to be chosen in order to solve a class of problems well. The solution of meta-level optimization is a set of heuristics that produce appropriate network architectures based on the characteristics of application-specific training patterns.

The variables of a nonlinear problem can take on various values, such as discrete, continuous, or mixed values. In most problems, variables have well-defined physical meaning. However, it is not rare that variables are not well-defined, or even undefined, in many real-world applications. In those circumstances, variables take on noisy, approximate, or imprecise values.

Consider the previous example on supervised learning of neural networks. In most problem-instance-level optimization problems, the variables (connection weights) take on continuous real numbers. However, for finite-precision and multiplierless neural networks, connection weights are restricted to take on values from a small discrete set. The meaning of variables in a problem-instance-level learning problem is well-defined: each represents the excitatory or inhibitory relationship between neurons and takes continuous, discrete, or mixed values. On the other hand, the variables in meta-level learning are not well-defined: there are a large number of different architectures, and their performance measures on classes of tasks are usually noisy and inaccurate.

The search space of an optimization problem can be finite or infinite. Continuous optimization problems consist of real-valued variables, and the search space is usually infinite. For instance, when the connection weights take on real values, supervised learning of neural networks has an infinite search space.

Many discrete optimization problems have finite search spaces, whose complexity can be polynomial or exponential as a function of the number of inputs. For example, the search space of finding the shortest path between two nodes in a graph is polynomial with respect to the number of nodes. However, the search space of a maximum satisfiability problem, to be studied in a later chapter, is exponential with respect to the number of literals. Search spaces with exponential complexity can be enormously large when the problem size is large. For example, a maximum satisfiability problem with $n$ variables has $2^n$ possible assignments, although each variable only takes value 0 or 1.

The objective function of an optimization problem may or may not have a closed-form formula. Some objectives are evaluated deterministically and return the same value for the same set of variables every time. Other objectives are evaluated probabilistically and could have different values every time they are evaluated.

Again, in the example on supervised neural-network learning, a problem-instance-level optimization problem usually has a closed-form nonlinear objective function that is computed deterministically. A meta-level optimization problem often does not have a closed-form objective function and is evaluated through a procedure. Due to the randomness involved in evaluating noisy and limited training patterns, the results may be probabilistic.

Similar to objective functions, constraints may or may not have a closed-form formula. For example, in the QMF filter-bank design problem, to be studied in a later chapter, some constraints, such as stopband and passband energy, have a closed-form formula. Other constraints, including stopband ripple, passband ripple, and transition bandwidth, do not have a closed-form formula and have to be evaluated numerically.

In constrained optimization problems, some constraints are hard constraints, meaning that they must be satisfied by solutions. Others are soft constraints, for which some degree of violation is acceptable. In satisfiability problems, to be studied in a later chapter, all constraints are hard constraints and must not be violated. In contrast, in neural-network learning problems, soft constraints are sometimes used to specify the desirable ranges of connection-weight values. In this case, some degree of violation is acceptable as long as the violations result in better neural networks.

In addition to constraints specifying the feasible domain of an optimization problem, constraints on computational resources are also important for solving real-world applications. For the application problems studied in later chapters, including feedforward neural-network learning, digital filter-bank design, satisfiability, maximum satisfiability, and multiplierless filter-bank design, constraints on computational resources, specifically CPU-time constraints, are usually imposed in our experiments. Since it can be very time-consuming to solve large nonlinear optimization problems, time constraints are used to make sure that the solution process finishes in a reasonable amount of time.

As we have seen, there are many types of optimization problems with distinctive characteristics. In this thesis, we focus on problem-instance-level optimization problems whose objective and constraints are evaluated deterministically.

A Taxonomy of Optimization Problems

The following figure shows a taxonomy of optimization problems and the classes of problems addressed in this thesis. In this figure, optimization and decision problems are classified according to the attributes of variable type, presence of constraints, and complexity. Although our focus is on optimization problems, decision problems are closely related to optimization problems, especially constrained optimization problems. In our research, we solve decision problems through optimization approaches; therefore, we show a classification of decision problems in the figure as well.

[Figure: A classification of optimization and decision problems; the classes of problems addressed in this thesis are shaded. Both optimization and decision problems are classified by variable type (continuous or discrete, where discrete decision problems are CSPs), by presence of constraints (unconstrained or constrained), and by complexity (uni-modal or multi-modal, linear or nonlinear, Class P or NP-hard).]

Continuous Optimization Problems

Optimization problems are classified into continuous and discrete problems. A problem is continuous if the unknowns (variables) take on continuous real values, i.e., $D$ consists of real numbers. A problem is discrete if the unknowns take on discrete, usually integer, values.

Continuous optimization problems are further classified into constrained optimization and unconstrained optimization, based on the presence of constraints. Problems without constraints fall into the class of unconstrained optimization:

$$\text{minimize} \quad f(x) \qquad \text{subject to} \quad x \in \mathbb{R}^n$$

There are two types of optimal points of an optimization problem: local minima and global minima. A local minimum has the smallest value in a local feasible region surrounding itself, whereas a global minimum has the smallest value in the whole feasible domain. In a continuous unconstrained optimization problem, an objective function is minimized in the real domain. An unconstrained optimization problem is uni-modal if its objective function is convex. A uni-modal problem has one local minimum, which is the global minimum at the same time. A problem is multi-modal if its objective function has more than one local minimum. General nonlinear functions are multi-modal and may have many local minima that are not global minima.

When each dimension of $D$ consists of real values constrained by simple lower and/or upper bounds, the corresponding optimization problem is called a simple-bounded continuous optimization problem:

$$\text{minimize} \quad f(x) \qquad \text{subject to} \quad l \le x \le u, \quad x \in \mathbb{R}^n$$

where $l$ and $u$ are constants.

We put simple-bounded constrained problems in the class of unconstrained optimization because simple-bound constraints, for example $l \le x \le u$, are easy to handle, and algorithms for problems without constraints and with simple-bound constraints are similar. Although variables in unconstrained optimization problems can be any real numbers, the search for global optima of multi-modal functions often takes place in some limited regions of interest. Hence, unconstrained nonlinear optimization problems and simple-bounded constrained problems are usually solved in similar ways. The remaining problems, with nontrivial constraints, belong to the class of constrained optimization.

The Rastrigin function, from an optimal-control application, is an example of a simple-bounded optimization problem. The minimization problem is formulated as follows:

$$\min_{x \in \mathbb{R}^2} f(x) = x_1^2 + x_2^2 - \cos(18 x_1) - \cos(18 x_2), \qquad -1 \le x_i \le 1, \quad i = 1, 2$$

Its global minimum is equal to $-2$, and the minimum point is at $(0, 0)$. There are approximately 50 local minima in the region bounded by the two constraints.
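The following sketch, assuming the formulation above, shows why such a function is hard for descent-type methods: plain gradient descent started away from the origin settles into a nearby local minimum instead of the global minimum at $(0, 0)$. The step size and starting point are illustrative.

```python
import math

def f(x1, x2):
    # Two-dimensional Rastrigin-type function from the text
    return x1**2 + x2**2 - math.cos(18 * x1) - math.cos(18 * x2)

def grad(x1, x2):
    return (2 * x1 + 18 * math.sin(18 * x1),
            2 * x2 + 18 * math.sin(18 * x2))

def descend(x1, x2, rate=0.001, steps=5000):
    """Plain gradient descent; converges to the nearest local minimum."""
    for _ in range(steps):
        g1, g2 = grad(x1, x2)
        x1, x2 = x1 - rate * g1, x2 - rate * g2
    return x1, x2

# Started away from the origin, descent settles in a nearby local minimum
# instead of the global minimum f(0, 0) = -2.
x1, x2 = descend(0.9, 0.9)
print(f(x1, x2))
```

The gradient vanishes at the final point, yet the objective value stays far above $-2$: exactly the trapping behavior that motivates the global-search methods of this thesis.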

As shown in the taxonomy figure, constrained continuous problems are classified into linear and nonlinear, depending on the form of the constraint functions. When $D$ is bounded by linear functions, the corresponding optimization problem is called a linear constrained problem. This class of problems is relatively easy to solve. Among these problems, two types that have been studied extensively and solved well are linear programming and quadratic programming problems. If the objective $f$ is a linear function of the unknowns, and the constraints are linear equalities or inequalities in the unknowns, the corresponding optimization problem is called a linear programming problem:

$$\begin{aligned} \text{minimize} \quad & \sum_{i=1}^{n} c_i x_i \\ \text{subject to} \quad & \sum_{i=1}^{n} a_{ij} x_i \ge b_j \quad \text{for } j = 1, \ldots, m \\ & x \ge 0, \quad x \in \mathbb{R}^n \end{aligned}$$

where $c_i$, $a_{ij}$, and $b_j$ are constant real coefficients. A linear programming problem has one local minimum, which is also the global minimum.
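This property can be illustrated by brute force. For a tiny two-variable instance (the data below is made up for illustration; this is not a practical LP solver), the optimum can be found by enumerating the vertices of the feasible polyhedron, since an LP optimum is always attained at a vertex.

```python
from itertools import combinations

# Tiny illustrative LP:  minimize c.x  subject to  A x >= b,  x >= 0.
c = [3.0, 2.0]
A = [[1.0, 1.0],   # x1 +   x2 >= 4
     [1.0, 3.0]]   # x1 + 3 x2 >= 6
b = [4.0, 6.0]

def solve_2x2(a11, a12, a21, a22, r1, r2):
    """Intersection of two constraint boundaries; None if parallel."""
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        return None
    return ((r1 * a22 - r2 * a12) / det, (a11 * r2 - a21 * r1) / det)

# All constraints, including the non-negativity bounds, as rows.
rows = A + [[1.0, 0.0], [0.0, 1.0]]
rhs = b + [0.0, 0.0]

def feasible(x):
    return all(r[0] * x[0] + r[1] * x[1] >= v - 1e-9
               for r, v in zip(rows, rhs))

# Candidate vertices: pairwise intersections of constraint boundaries.
vertices = []
for (r1, v1), (r2, v2) in combinations(zip(rows, rhs), 2):
    p = solve_2x2(r1[0], r1[1], r2[0], r2[1], v1, v2)
    if p is not None and feasible(p):
        vertices.append(p)

best = min(vertices, key=lambda x: c[0] * x[0] + c[1] * x[1])
print(best, c[0] * best[0] + c[1] * best[1])
```

Practical LP algorithms such as the simplex method exploit the same vertex structure without enumerating all intersections.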

If the objective $f$ is a quadratic function and the constraints are linear functions, the corresponding optimization problem is called a quadratic programming problem. As an example:

$$\begin{aligned} \text{minimize} \quad & \sum_{i=1}^{n} c_i x_i + \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij} x_i x_j \\ \text{subject to} \quad & \sum_{i=1}^{n} a_{ij} x_i \ge b_j \quad \text{for } j = 1, \ldots, m \\ & x \ge 0, \quad x \in \mathbb{R}^n \end{aligned}$$

where $c_i$, $d_{ij}$, $a_{ij}$, and $b_j$ are constant real coefficients. Linear constrained problems, including linear programming and quadratic programming problems, have been studied extensively, and efficient algorithms have been developed to solve them very well.

When $D$ is characterized by nonlinear functions, the optimization problem is called a nonlinear optimization, or nonlinear programming, problem:

$$\begin{aligned} \text{minimize} \quad & f(x) \\ \text{subject to} \quad & h(x) = 0 \\ & g(x) \le 0, \quad x \in \mathbb{R}^n \end{aligned}$$

where $h(x) = 0$ represents a set of equality constraints and $g(x) \le 0$ a set of inequality constraints. In a nonlinear optimization problem, the objective function as well as the constraint functions are nonlinear. Nonlinear constrained problems are difficult to solve because the nonlinear constraints may constitute feasible regions that are difficult to find, and the nonlinear objectives may have many local minima. Many application problems fall into this class.
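One common transformational way to attack such a problem, sketched below on a one-variable toy instance (this is the generic quadratic-penalty method, not the Lagrangian formulation developed later in this thesis), replaces the constrained problem by a sequence of unconstrained problems whose penalty weight grows. The instance, step size, and schedule are illustrative.

```python
# Quadratic-penalty sketch for  minimize f(x) = (x - 2)^2
# subject to  h(x) = x^2 - 1 = 0  (feasible points x = +/-1; optimum x = 1).
# The constrained problem becomes:  minimize f(x) + mu * h(x)^2,
# with mu increased between unconstrained solves (warm-started).

def f(x):
    return (x - 2.0) ** 2

def h(x):
    return x * x - 1.0

def penalized_grad(x, mu):
    # d/dx [ f(x) + mu * h(x)^2 ] = 2(x - 2) + mu * 2 h(x) * 2x
    return 2.0 * (x - 2.0) + mu * 2.0 * h(x) * 2.0 * x

def minimize_penalized(x, mu, rate=1e-4, steps=20000):
    for _ in range(steps):
        x -= rate * penalized_grad(x, mu)
    return x

x = 0.5                        # arbitrary starting point
for mu in [1.0, 10.0, 100.0, 1000.0]:
    x = minimize_penalized(x, mu)
print(x, h(x))                 # x approaches 1 and h(x) approaches 0
```

As the text notes, the converted unconstrained problem inherits all the local minima of the original nonlinear objective, which is why constraint handling alone does not solve the global-search problem.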

In this thesis, we focus on general multi-modal nonlinear optimization problems and propose some new methods. Uni-modal problems are relatively easy to solve and have been studied extensively for decades. Many algorithms have been developed, including gradient descent, Newton's method, quasi-Newton methods, and conjugate gradient methods. These algorithms are very efficient and can solve optimization problems with tens of thousands of variables within seconds.

Multi-modal problems are much more difficult, and yet plentiful in real applications. Although many methods have been developed for multi-modal problems, most of them are efficient only for subclasses of problems with specific characteristics. Methods for general multi-modal problems are far from optimal.

Discrete Optimization Problems

When the feasible set $D$ consists of discrete values, the problem is called a discrete optimization problem. Discrete optimization is a field of study in combinatorics as well as in mathematical programming. A classification of combinatorial problems consists of four categories: (a) evaluation of required arrangements, (b) enumeration or counting of possible arrangements, (c) extremization of some measure over arrangements, and (d) existence of specific arrangements. Discrete optimization usually refers to the third category. In our study, we also consider the last category, a special case of discrete optimization after introducing an artificial objective function.

Some famous examples of discrete optimization problems are as follows.

Knapsack problem: determine a set of integer values $x_i$, $i = 1, \ldots, n$, that minimize $f(x_1, \ldots, x_n)$ subject to the restriction $g(x_1, \ldots, x_n) \le b$, where $b$ is a constant.

Traveling salesman problem: given a graph, directed or undirected, with specified weights on its edges, determine a tour that visits every vertex of the graph exactly once and that has minimum total weight.

Bin packing: given a set of weights $w_i$, $i = 1, \ldots, n$, and a set of bins, each with fixed capacity $W$, find a feasible assignment of weights to bins that minimizes the total number of bins used.

Discrete optimization problems, including the above examples, can be expressed in the following integer programming (IP) formulation:

$$\begin{aligned} \text{minimize} \quad & f(x) \\ \text{subject to} \quad & h(x) = 0 \\ & g(x) \le 0, \quad x \in I^n \end{aligned}$$

where $I$ represents the integer space. The variable domain $D$ consists of integers and is characterized by the equality constraints $h(x)$ and the inequality constraints $g(x)$. Notice the similarity of this formulation to that of the continuous constrained optimization problem given earlier.

As shown in the taxonomy figure, we classify discrete optimization problems according to the existence of constraints and their computational complexity. When there is no constraint besides the integral requirements on the variables, the problem is called an unconstrained problem. With additional constraints, the problem is called a constrained problem. An unconstrained discrete optimization problem has the following form:

$$\text{minimize} \quad f(x) \qquad \text{subject to} \quad x \in I^n$$

When $D$ consists of integer values constrained by simple lower and/or upper bounds, the optimization problem is called a simple-bounded discrete optimization problem:

$$\text{minimize} \quad f(x) \qquad \text{subject to} \quad l \le x \le u, \quad x \in I^n$$

As in the continuous case, we associate simple-bounded discrete optimization problems with the class of unconstrained problems; algorithms for these problems are similar. The majority of discrete optimization problems have some constraints, and few are unconstrained optimization problems.

When the domain $D$ and the objective $f$ are characterized by linear functions, the optimization problem is called an integer linear programming (ILP) problem. Here the objective function is linear in the unknowns, and the constraints are linear equalities or inequalities in the unknowns:

$$\begin{aligned} \text{minimize} \quad & \sum_{i=1}^{n} c_i x_i \\ \text{subject to} \quad & \sum_{i=1}^{n} a_{ij} x_i \ge b_j \quad \text{for } j = 1, \ldots, m \\ & x \ge 0, \quad x \in I^n \end{aligned}$$

where $c_i$, $a_{ij}$, and $b_j$ are constant real or integer coefficients. Many discrete optimization problems can be formulated as ILPs by introducing additional variables and constraints.

An example of an ILP problem is the following inequality-form knapsack problem:

$$\begin{aligned} \text{minimize} \quad & \sum_{i=1}^{n} c_i x_i \\ \text{subject to} \quad & \sum_{i=1}^{n} a_i x_i \ge b \\ & x_i \in \{0, 1\}, \quad a_i \text{ and } c_i \text{ integers}, \quad i = 1, \ldots, n \end{aligned}$$
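Assuming the inequality-form reading above, a brute-force solver makes the exponential search space concrete: all $2^n$ assignments are enumerated. The instance data below is made up for illustration.

```python
from itertools import product

# Illustrative inequality-form knapsack:
#   minimize sum(c[i]*x[i])  subject to  sum(a[i]*x[i]) >= b,  x[i] in {0, 1}
c = [4, 3, 5, 6]       # costs
a = [2, 3, 4, 5]       # amounts contributed toward the requirement
b = 7

best_cost, best_x = None, None
for x in product((0, 1), repeat=len(c)):        # all 2^n assignments
    if sum(ai * xi for ai, xi in zip(a, x)) >= b:
        cost = sum(ci * xi for ci, xi in zip(c, x))
        if best_cost is None or cost < best_cost:
            best_cost, best_x = cost, x
print(best_x, best_cost)
```

Enumeration is exact but only viable for tiny $n$; the $2^n$ growth is exactly the complexity issue discussed earlier for discrete search spaces.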

Discrete constrained optimization problems have been studied extensively in computer science and operations research. Based on their computational complexity, there are two important classes of discrete optimization problems: Class P and Class NP. Class P contains all problems that can be solved by algorithms of polynomial-time complexity, where P stands for polynomial. Examples of Class P problems are matching, spanning-tree, network-flow, and shortest-path problems. Class P problems have been well studied.

Many discrete problems in real-world applications do not have polynomial-time algorithms and are more difficult to solve than those in Class P. Among them, one important class is NP. Class NP includes all those problems that are solvable in polynomial time if correct polynomial-length guesses are provided. Examples are knapsack, traveling-salesman, and bin-packing problems. Problems to which all members of NP polynomially reduce are called NP-hard. Class NP contains Class P as well as a great many problems not belonging to P. The focus of this thesis is on NP-hard problems that are not polynomial-time solvable.

Decision Problems

A decision problem differs from a constrained optimization problem in that it has no objective function. Usually, a decision problem is specified by a set of constraints, and either a feasible solution that satisfies all the constraints is desired or infeasibility is derived.

Decision problems are classified as continuous or discrete according to their variable domains. In a continuous decision problem, each variable is associated with a continuous domain. There are not many continuous decision problems in real applications. In this thesis, we focus on NP-hard discrete decision problems.

In a discrete decision problem, each variable is associated with a discrete, finite domain. Discrete decision problems are referred to as constraint satisfaction problems (CSPs) in artificial intelligence. There has been extensive research on CSPs in artificial intelligence, resource scheduling, temporal reasoning, and many other areas.

An example of a CSP is the well-known N-queen problem, which is defined as follows: given an integer $N$, place $N$ queens on $N$ distinct squares of an $N \times N$ chess board, so that no two queens are on the same row, column, or diagonal.
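A standard way to solve this CSP is backtracking search, sketched below: queens are placed row by row, and a partial placement is abandoned as soon as a constraint is violated.

```python
def count_solutions(n):
    """Backtracking: one queen per row, checking columns and diagonals."""
    count = 0
    cols = []                       # cols[r] = column of the queen in row r
    def place(row):
        nonlocal count
        if row == n:
            count += 1
            return
        for col in range(n):
            # Constraint check against all queens already placed.
            if all(col != c and abs(col - c) != row - r
                   for r, c in enumerate(cols)):
                cols.append(col)
                place(row + 1)
                cols.pop()
    place(0)
    return count

print(count_solutions(8))   # the 8-queen problem has 92 distinct placements
```

No objective function appears anywhere: the search only needs one (or all) assignments satisfying every constraint, which is what distinguishes this decision problem from the optimization problems above.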

Decision problems can be solved as optimization problems. For example, an algorithm that finds the minimal tour length in a traveling salesman problem will also determine whether there is a tour within a specific threshold, for every possible threshold.

On the other hand, optimization problems can also be solved through one or a series of decision problems. For example, the minimal tour length in a traveling salesman problem can be found by solving a sequence of decision problems. In each decision problem, the existence of a tour within a specified length is decided. As the length decreases gradually, the minimal tour length can be found.
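This reduction can be sketched on a tiny made-up instance: a brute-force decision oracle answers whether a tour within a threshold exists, and binary search over the threshold recovers the minimal tour length.

```python
from itertools import permutations

# Symmetric distance matrix of a tiny illustrative 4-city instance.
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
n = len(dist)

def tour_length(order):
    return sum(dist[order[i]][order[(i + 1) % n]] for i in range(n))

def exists_tour_within(limit):
    """Decision problem: is there a tour of total length <= limit?"""
    return any(tour_length((0,) + p) <= limit
               for p in permutations(range(1, n)))

# Solve the optimization problem by binary search over the threshold.
lo, hi = 0, sum(sum(row) for row in dist)      # crude bounds on the optimum
while lo < hi:
    mid = (lo + hi) // 2
    if exists_tour_within(mid):
        hi = mid
    else:
        lo = mid + 1
print(lo)     # minimal tour length
```

With integer lengths, the binary search needs only logarithmically many calls to the decision oracle, which is why the two problem forms are polynomially equivalent.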

Challenges in Solving Nonlinear Optimization Problems

Except for trivial cases, nonlinear optimization problems do not have closed-form solutions and cannot be solved analytically. Numerical methods have been developed to search for optimal solutions of these problems. The process of solving an optimization problem becomes a process of searching for optimal solutions in the corresponding search space.

Finding global minima of a nonlinear optimization problem is a challenging task. Nonlinear constraints form feasible regions that are hard to find and difficult to deal with, and nonlinear objectives have many local minima that make global minima hard to find and verify.

[Figure: Illustration of difficulties in optimizing a nonlinear function. There may exist local minima with different characteristics in a multi-modal nonlinear function: shallow or deep, wide or narrow. Features labeled in the figure: tall hills that are difficult to overcome; gradients that vary by many orders of magnitude; shallow basins with small slopes; deep valleys with steep slopes.]

In solving nonlinear constrained optimization problems, there may not be enough time to find a feasible solution when nonlinear constraints constitute small feasible regions that are difficult to locate. For example, in our study of designing digital filter banks in a later chapter, the feasible regions are small, and random searches like genetic algorithms can hardly find a feasible solution. Also, in the satisfiability problems studied in a later chapter, feasible solutions can be very few and difficult to find due to the large number of constraints.

Second, nonlinear objective functions, in both constrained and unconstrained optimization problems, make global minima difficult to find. The figure above illustrates the challenges of a multi-modal nonlinear function. The terrains have different characteristics and cause problems for search algorithms that adapt to only a subset of these characteristics. Because of large slopes, tall hills are difficult to overcome by gradient-based search methods. In the search space, gradients can vary by many orders of magnitude, which makes the selection of an appropriate step size difficult. Large shallow basins and flat plateaus provide little information about the search direction, and a search algorithm may take a long time to pass through these regions if its step size is small. Furthermore, small local minima are difficult to find.

These difficulties happen in real-world applications such as neural-network learning. In supervised learning of feedforward artificial neural networks, studied in a later chapter, artificial neural networks are trained to perform desired mapping functions. The difference between the desired outputs and the actual outputs of a neural network forms an error function to be minimized. The neural-network learning problem is solved as an unconstrained optimization problem with the error function as the objective. Objective functions in neural-network learning problems are usually highly nonlinear and have many local minima.

[Figure: Two-dimensional projections of the error surface of a five-hidden-unit feedforward neural network with sigmoidal activation functions. The terrain is around a solution found by our new global-search method, Novel, in solving the two-spiral problem. The two graphs are plotted along two different pairs of dimensions.]

The figure shows the error function of a feedforward neural network trained to solve the two-spiral problem, to be discussed in a later chapter. The error function is projected onto two different pairs of dimensions. The plots are around a solution found by our new global-search method, Novel.

The figure shows that the terrains have many different features. The left plot shows a large number of small and shallow local minima; the global minimum is in the middle, and its attraction region is small. The right plot shows a terrain divided into four parts by two narrow valleys. Two of the parts consist of large flat regions with gradients close to zero, whereas the other two have many local minima. There are also two long and narrow valleys across the terrain. On the edges of these valleys, steep slopes have very large gradients, while values at the bottoms of the valleys change gradually.

In nonlinear optimization problems, global optimal solutions are not only difficult to find but also difficult to verify. There is no local criterion for deciding whether a local optimal solution is a global optimum. Therefore, nonlinear optimization methods cannot guarantee solution quality for general nonlinear problems.

To summarize, the challenges of general nonlinear optimization include the following:

- Feasible regions bounded by nonlinear constraints may be difficult to find.
- The objective-function terrain of the search space may be very rugged, with many sub-optima.
- There may exist terrains with large shallow basins and small but deep basins.
- The dimension of optimization problems is large in many interesting applications.
- The objective and constraints are expensive to evaluate.

Characteristics of Nonlinear Optimization Algorithms

A large number of optimization methods have been developed to solve nonlinear optimization problems. Nonlinear optimization methods are classified into local optimization and global optimization methods. Global optimization methods have the ability to find globally optimal solutions given long enough time, while local optimization methods do not.

Local optimization methods include gradient descent, Newton's method, quasi-Newton methods, and conjugate gradient methods. They converge to a local minimum from some initial points. Such a local minimum is globally optimal only when the objective is quasi-convex and the feasible region is convex, which rarely happens in practice. For nonlinear optimization problems, a local minimum can be much worse than the global minimum. To overcome local minima and search for global minima, global optimization methods have been developed.

Global optimization methods look for globally optimal solutions. As stated by Griewank, global optimization for nonlinear problems is mathematically ill-posed, in the sense that a lower bound for the global optima of the objective function cannot be given after any finite number of evaluations of the objective function, unless the objective satisfies certain conditions (such as the Lipschitz condition) and the search space is bounded.

Global optimization methods perform global and local searches in regions of attraction and balance the computation between global and local exploration. A global search method has a mechanism to escape from local minima, while a local search method does not. In an optimization problem, a region of attraction defines the region inside which there is a local minimum and the constraints are satisfied. The regions of attraction trap local search methods. In the general black-box model, global optimization is performed without a priori knowledge of the terrain defined by the objective and the constraints. Therefore, global optimization algorithms use heuristic global measures to search for new regions of attraction. Promising regions identified are further optimized by local refinement procedures, such as gradient descent and Newton's method. In many real-world applications, the computational complexity of finding global optima is prohibitive, and global optimization methods usually resort to finding good sub-optimal solutions.

Nonlinear optimization methods employ various search algorithms to escape from local minima and search for global optima. The characteristics of search algorithms are summarized as follows.

(a) Representation of search space. A search space specifies the range of variable assignments that are probed during the process of searching for optimal solutions. The search space may contain only the feasible regions specified by the constraints, or may contain some infeasible regions as well. The search space of an optimization problem can be finite or infinite, which directly affects the computational complexity of the corresponding search algorithms. For example, the search space is finite in a traveling salesman problem with a finite number of cities and in a knapsack problem with a finite number of objects. For continuous optimization problems, in which variables take on real values, the search space is infinite.

The search space can have structural representations. For example, in a binary integer programming problem, the search space can be represented as a binary search tree. Each node of the search tree represents the resolution of a variable, and the two branches coming from it represent the two alternative values of the variable. In another representation, the search space can be represented based on the permutations of variable values; all possible permutations constitute the search space.

The representation of the search space affects the complexity of a search problem. Structural representations can sometimes reduce the complexity significantly. For example, in solving an integer programming problem, a representation based on a binary search tree provides the basis for branch-and-bound methods, which use efficient algorithms to find tight lower and upper bounds of search nodes and can prune the search tree significantly. The search space becomes much smaller than that in a representation based on permutations.
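A minimal branch-and-bound sketch over such a binary search tree is shown below, on a made-up 0/1 covering instance: each tree node fixes one more variable, and simple lower-bound tests prune subtrees that cannot contain a better solution than the incumbent.

```python
# Minimal branch-and-bound on the binary search tree of a 0/1 problem:
#   minimize  sum(c[i] * x[i])  subject to  sum(a[i] * x[i]) >= b.
# Each tree node fixes one more variable; simple bounds prune subtrees
# that cannot beat the best solution found so far (the incumbent).
c = [4, 3, 5, 6]
a = [2, 3, 4, 5]
b = 7
n = len(c)
best = float("inf")
nodes_visited = 0

def search(i, cost, amount):
    global best, nodes_visited
    nodes_visited += 1
    if amount >= b:                 # feasible: remaining x can all be 0
        best = min(best, cost)
        return
    if i == n:
        return                      # infeasible leaf
    if amount + sum(a[i:]) < b:     # bound: subtree is entirely infeasible
        return
    if cost >= best:                # bound: cannot improve the incumbent
        return
    search(i + 1, cost + c[i], amount + a[i])   # branch: x[i] = 1
    search(i + 1, cost, amount)                 # branch: x[i] = 0

search(0, 0, 0)
print(best, nodes_visited)
```

Both pruning tests are sound for non-negative costs, so the optimum is never cut off; the node counter shows how much of the $2^n$ tree is actually explored.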

(b) Decomposition strategies. Some search algorithms work on the whole search space directly, while others decompose the large search space into smaller ones and then work on them separately. Divide-and-conquer and branch-and-bound are two decomposition strategies that have been applied in many search algorithms for both continuous and discrete problems. Unpromising subspaces are excluded from further search, and promising ones are recursively decomposed and evaluated.

(c) Heuristic predictor or direction finder. During a search process, search algorithms have used many different heuristics to guide the search to globally optimal solutions. In branch-and-bound methods, heuristic lower bounds are associated with decomposed smaller subspaces to indicate their goodness. For example, in interval methods, interval analyses are used to estimate the lower bounds of a continuous region. Similarly, in solving integer programming problems, discrete problems are relaxed into continuous linear programming problems to obtain a lower bound. The lower-bound information is further used to guide the search. In simulated annealing, the objective values of neighborhood points provide a direction for the search to proceed. In genetic algorithms, search directions are obtained from the fitness values of individuals in a population. Components of individuals with high fitness are reinforced, and the search moves in directions formed by good building blocks.

(d) Mechanisms to help escape from local minima. To find globally optimal solutions, a search algorithm has to be able to escape from local minima. In discrete optimization, a backtracking mechanism has been used extensively. A similar mechanism has also been applied in covering methods to solve continuous optimization problems. Probabilistic methods get out of local minima based on probabilistic decisions. For example, simulated annealing methods can move in a direction with worse solutions using adaptive probabilities. Genetic algorithms escape from local minima through probabilistic recombination of individuals and random perturbation of existing solutions.
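A minimal simulated-annealing sketch on a one-dimensional multi-modal function is shown below; the test function, cooling schedule, and neighborhood size are illustrative choices, not parameters from this thesis.

```python
import math, random

# Simulated-annealing sketch on a multi-modal one-dimensional function.
# A worse move is accepted with probability exp(-delta / T), which lets
# the search leave a local basin; the temperature T is lowered gradually.
def f(x):
    return x * x - math.cos(18 * x)    # many local minima; global min -1 at x = 0

random.seed(1)
x = 0.9                                # start inside a non-global basin
best_x, best_f = x, f(x)
T = 1.0
while T > 1e-3:
    cand = x + random.uniform(-0.2, 0.2)   # random neighbor
    delta = f(cand) - f(x)
    if delta <= 0 or random.random() < math.exp(-delta / T):
        x = cand
        if f(x) < best_f:
            best_x, best_f = x, f(x)
    T *= 0.999                         # geometric cooling schedule
print(best_x, best_f)
```

At high temperature almost every move is accepted and the search wanders between basins; as T shrinks, it behaves increasingly like local descent. Tracking the best point visited guarantees the final answer is never worse than the starting point.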

(e) Mechanisms to handle constraints. To solve constrained optimization problems, search algorithms have to handle constraints efficiently. Simple constraints, including simple-bound and linear constraints, are relatively easy to handle; for example, variable substitution can eliminate linear constraints. Two major classes of techniques, transformational and non-transformational approaches, have been developed to handle nonlinear constraints. In transformational approaches, the constraints and objectives are combined to form a single function; hence, constrained problems are converted into another form, usually an unconstrained form, before being solved. In non-transformational approaches, the search is performed in a relaxed search space that contains infeasible regions. Infeasible solutions along a search path are either discarded or repaired to become feasible solutions.
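A minimal transformational sketch in the spirit of the Lagrange-multiplier formulation used in this thesis (though not the thesis's specific method) is the first-order saddle-point iteration below: gradient descent in the variables and gradient ascent in the multiplier, on the Lagrangian of a one-variable toy problem.

```python
# First-order Lagrangian sketch for:
#   minimize f(x) = x^2   subject to   h(x) = x - 1 = 0.
# The single function L(x, l) = f(x) + l * h(x) is minimized over x
# (descent) and maximized over the multiplier l (ascent); the saddle
# point is the constrained optimum x = 1 with multiplier l = -2.

def grad_x(x, l):       # dL/dx = 2x + l
    return 2.0 * x + l

def h(x):               # dL/dl = h(x), the constraint violation
    return x - 1.0

x, l = 0.0, 0.0
rate = 0.05
for _ in range(5000):
    x -= rate * grad_x(x, l)    # descent step in the variables
    l += rate * h(x)            # ascent step in the multiplier
print(x, l)
```

The multiplier grows exactly while the constraint is violated, automatically weighting the constraint against the objective; this self-tuning behavior is what makes Lagrangian formulations attractive compared with hand-picked penalty weights.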

(f) Stopping conditions. Some optimization problems can be solved optimally in a short amount of time. Unfortunately, most real-world applications are large and have high computational complexity, which makes optimal solutions impossible to find or verify. Hence, search algorithms have to terminate in a reasonable amount of time and settle on approximate solutions. The degree of approximation is usually proportional to the amount of time available: better solutions can be found given more execution time. Pruning mechanisms have also been used to reduce the search space and find approximate solutions; for example, a search algorithm stops when all nodes of a search tree are pruned based on certain conditions. Many stopping criteria have been used to deal with the time-quality trade-off. Some widely used stopping conditions include resource limits (e.g., a physical time limit on computer-program execution) and the degree of solution improvement.

(g) Resource scheduling strategies. To execute an optimization program on a computer system, resource scheduling determines the allocation of computer resources, including CPU and memory, which in turn affects resource utilization. Scheduling issues are especially important for parallel computer systems. Large optimization problems are computationally intensive and may require the most powerful computer systems, such as parallel processing systems. Good resource scheduling strategies make good use of these complex parallel systems. Many parallel search algorithms have been developed and studied, theoretically and empirically, and significant speedups have been achieved on existing parallel computers.

Among the seven issues of search algorithms, we identify the two most critical ones for nonlinear optimization: handling nonlinear constraints and escaping from local minima. The representation of a search space is derived from the feasible domain of an optimization problem and is usually straightforward. For unconstrained problems or problems with only simple bound constraints, the search space corresponds to the feasible regions; for nonlinear constrained problems, the search space is relaxed to contain infeasible regions as well.

Decomposition strategies have been studied extensively for both continuous and discrete nonlinear problems. They effectively reduce the complexity of relatively small problems. However, their computational cost increases exponentially with problem size, which renders them not very helpful for larger problems.

Heuristic predictors are very useful when problem-specific characteristics are available and ready to be exploited. For general black-box nonlinear optimization, only limited information, such as gradients, is available, and few heuristic predictors work well in this situation.

For real-world applications, stopping conditions are usually derived from physical limitations, such as the computation resources and time available. Good stopping conditions make better use of limited resources and prevent unnecessary computation, but they do not themselves help in finding good solutions.

Similarly, resource scheduling strategies schedule tasks to be executed in a computer system. They try to achieve high utilization of limited computational resources and finish the search tasks as fast as possible. For optimization problems, good resource scheduling strategies can help reduce the time to find solutions of the same quality. However, they do not help improve the solution if the problem has exponential complexity.

In short, handling nonlinear constraints and escaping from local minima are the two key issues in nonlinear optimization. Any good nonlinear optimization method has to have good strategies to address them.

Nonlinear constraints make problems difficult by forming small, irregular, and hard-to-find feasible regions. Nonlinear functions may be complex and may lack obvious features to aid the search. Methods that rely on easy-to-find feasible regions do not work in this situation. Constraint-handling techniques need to be robust and able to deal with general, complex nonlinear constraints.

Local minima of nonlinear objective functions make local searches more difficult. General nonlinear optimization problems have many local minima, and search methods that are trapped by local minima are incapable of obtaining good solutions. The mechanism for escaping from local minima determines the efficiency of a global search algorithm and the solution quality it obtains, and has long been the central issue in developing global search methods.

Goals and Approaches of This Research

The goals of this research are to address the two key issues in nonlinear optimization, handling nonlinear constraints and escaping from local minima, and to develop efficient and robust global search methods. We develop methods to solve nonlinear continuous and discrete optimization problems that (a) achieve better solution quality than existing methods in a reasonable amount of time, or (b) execute faster for the same solution quality.

Approaches to Handle Nonlinear Constraints

To deal with general nonlinear constraints, we have developed adaptive Lagrangian methods (Chapter 2). A constrained optimization problem is first transformed into a Lagrangian function. Then gradient descents in the original variable space and gradient ascents in the Lagrange-multiplier space are performed. Eventually, the search process converges to a saddle point that corresponds to a local minimum of the original constrained optimization problem.

Adaptive control of the relative weights between the objective and the constraints can speed up convergence. The process of searching for saddle points is governed by the combination of two counteracting forces: descents try to reduce the objective value, whereas ascents try to satisfy the constraints. At a saddle point, the descent and ascent forces reach a balance through appropriate Lagrange-multiplier values.
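The descent-ascent dynamics described above can be sketched in a few lines. The sketch below is a minimal first-order illustration, not the thesis's actual implementation; the step sizes, iteration count, and the toy problem (minimize x0^2 + x1^2 subject to x0 + x1 = 1) are assumptions chosen for clarity.

```python
def saddle_point_search(f_grad, h, h_grad, x0, lr_x=0.01, lr_lam=0.01, iters=20000):
    """First-order search for a saddle point of L(x, lam) = f(x) + lam * h(x):
    gradient descent in the variables x, gradient ascent in the multiplier lam."""
    x = list(x0)
    lam = 0.0
    for _ in range(iters):
        gf, gh = f_grad(x), h_grad(x)
        # descent step in the original variable space
        x = [xi - lr_x * (gfi + lam * ghi) for xi, gfi, ghi in zip(x, gf, gh)]
        # ascent step in the Lagrange-multiplier space, driven by the violation
        lam += lr_lam * h(x)
    return x, lam

# Illustrative problem: minimize x0^2 + x1^2 subject to x0 + x1 - 1 = 0.
# The constrained minimum is (0.5, 0.5), reached with lam = -1.
f_grad = lambda x: [2 * x[0], 2 * x[1]]
h = lambda x: x[0] + x[1] - 1.0
h_grad = lambda x: [1.0, 1.0]
x_star, lam_star = saddle_point_search(f_grad, h, h_grad, [0.0, 0.0])
```

At the returned saddle point the two forces balance: the multiplier settles at the value for which the gradient of the Lagrangian in x vanishes.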

The speed of convergence to a saddle point varies a lot depending on the relative magnitudes of the objective function and the constraints. When the objective value is too large compared to the constraint values, descents in the original variable space dominate the search process, and the Lagrange multipliers have to become very large in order to overcome the descent force and pull the search back to a feasible region. Since the Lagrange multipliers are updated according to constraint violations, it may take a very long time for the multipliers to become large enough, which slows down convergence to saddle points.

On the other hand, when the objective has little weight and the force due to descents is very small, constraint satisfaction dominates the search. The search process then visits many feasible points, but improvements in objective value are slow, and so is convergence to a saddle point.

In order to improve convergence speed, we have developed mechanisms to adaptively control the relative weight of the descent and ascent forces (detailed in Chapter 2). A weight coefficient is assigned to the objective function and is adjusted adaptively during the search.

For example, in one of our adaptive methods, the initial values of the weights are selected based on the gradient of the objective function at the starting point. The search is then performed for a small amount of time. The ratio of the change in the objective value to the time interval gives the speed of descent in the objective function; this ratio is used as a normalization factor to adjust the initial weight. Next, the search is restarted from the original starting point with the adjusted initial weight on the objective. As the search goes on, the weight is adaptively adjusted based on the magnitude of the maximum violation of the constraints and the change of the objective function.

The general control strategy is to use the constraint violation as the primary indicator and the change of the objective value as the secondary indicator. If any constraint is violated, the weight on the objective is reduced; otherwise, if the objective value is changing, the weight on the objective is increased. Adjusting the weight adaptively makes convergence more robust and faster when the objective and the constraints have different magnitudes.
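As a concrete illustration, the control strategy just described can be written as a small update rule. The thresholds and the multiplicative shrink/grow factors below are illustrative assumptions, not values from the thesis.

```python
def adjust_weight(w, max_violation, obj_change, viol_tol=1e-6, change_tol=1e-6,
                  shrink=0.5, grow=1.5):
    """One step of the adaptive weight-control heuristic: the constraint
    violation is the primary indicator, the change in objective value the
    secondary indicator.  The shrink/grow factors are assumed values."""
    if max_violation > viol_tol:      # primary: a constraint is violated
        return w * shrink             # emphasize constraint satisfaction
    if abs(obj_change) > change_tol:  # secondary: objective still moving
        return w * grow               # emphasize objective descent
    return w                          # near convergence: leave weight alone

w = adjust_weight(1.0, max_violation=0.3, obj_change=0.0)   # -> 0.5
```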

To handle constraints in discrete constrained optimization problems, we extend the Lagrangian theory for continuous problems to discrete problems and develop the discrete Lagrangian method to solve discrete constrained problems (Chapter 2).

Similar to the continuous case, a Lagrangian function of discrete variables is formed by combining the objective and the constraints using Lagrange multipliers. Then descents in the original variable space and ascents in the Lagrange-multiplier space are performed to seek saddle points. In the continuous case, descents are based on continuous gradients. Variables in discrete optimization problems take on discrete values, so the traditional definition of the gradient cannot be used; we propose to use discrete gradients instead. A point in the neighborhood of the current point that yields the largest reduction in function value gives the discrete gradient-descent direction of the function at that point. We show that, by using discrete gradient descent, the search based on the Lagrangian function converges to a saddle point corresponding to a local minimum of the original discrete optimization problem. Also, adaptive control of the objective weight can speed up convergence.
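A minimal sketch of the discrete descent direction described above, assuming binary variables whose neighborhood is all one-bit flips; the toy function L stands in for a discrete Lagrangian and is an assumption for illustration.

```python
def discrete_descent_direction(L, x, neighbors):
    """Discrete analogue of the gradient-descent direction: among the
    neighbors of the current point x, return the one that reduces the
    function L the most; return x itself if no neighbor improves on it."""
    best, best_val = x, L(x)
    for y in neighbors(x):
        v = L(y)
        if v < best_val:
            best, best_val = y, v
    return best

# Illustrative neighborhood: binary tuples differing in exactly one bit.
def flip_neighbors(x):
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]

# Toy function standing in for a discrete Lagrangian; its minimum is (1, 1, 0).
L = lambda x: (x[0] - 1) ** 2 + (x[1] - 1) ** 2 + x[2]
x = (0, 0, 1)
while True:
    nxt = discrete_descent_direction(L, x, flip_neighbors)
    if nxt == x:          # no improving neighbor: a discrete local minimum
        break
    x = nxt
```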

Approaches to Overcome Local Minima

Nonlinear optimization problems usually have many local minima in their search space. In unconstrained problems, the nonlinear objective function has many local minima; in constrained problems, the Lagrangian function formed by combining the objective and the constraints has many local minima. In order to find global optima, search methods must be able to escape from local minima.

To overcome local minima, we have developed a new deterministic global search method called the trace-based method (Chapter 3). Our trace-based method consists of a global-search phase and a local-search phase. The global-search phase is further divided into a coarse-level and a fine-level global search. In the coarse-level global search, methods that explore the whole search space efficiently are applied to find promising regions. The output of the coarse-level global search is one or a set of starting points for the fine-level global search.

In the fine-level global search, the trace-based method uses the combined force of an external guidance and the local gradient to form the search trajectory. An external trace function is designed to lead the search trajectory around the search space and to pull the search out of local minima once it gets there. In the meantime, the local gradient attracts the search toward a local minimum: the deeper the local minimum, the larger the gradient force becomes. In other words, the trace force emphasizes exploration of the search space, while the gradient force detects local minima. The search trajectory formed by the combination of these two forces traverses the search space continuously and reveals promising local minimal regions along the way. Good initial points for the local-search phase are identified along the search trajectory.
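The combination of forces can be sketched as a simple discrete-time update: the point is pulled toward a moving trace while the local gradient tugs it toward nearby minima. The one-dimensional objective, the linear trace schedule, and the weights w_trace and w_grad below are illustrative assumptions, not the thesis's actual trace function.

```python
import math

def trace_based_trajectory(grad, trace, x0, w_trace=0.9, w_grad=0.1,
                           dt=0.01, steps=5000):
    """Sketch of the fine-level global-search trajectory: at each step the
    point moves under a combination of (a) an attraction toward a moving
    external trace and (b) the local gradient force of the objective.
    Returns the points visited, from which promising starting points for
    the local-search phase could be picked."""
    x, visited = x0, []
    for k in range(steps):
        t = k * dt
        # combined force: attraction to the moving trace plus local gradient
        force = -w_grad * grad(x) - w_trace * (x - trace(t))
        x = x + dt * force
        visited.append(x)
    return visited

# Illustrative multimodal objective f(x) = x^2 + 2*sin(5x), whose gradient
# is below; the trace sweeps the interval [-3, 3] linearly over the run.
grad = lambda x: 2 * x + 10 * math.cos(5 * x)
trace = lambda t: -3.0 + 6.0 * (t / 50.0)   # steps * dt = 50 time units
points = trace_based_trajectory(grad, trace, x0=-3.0)
```

The trajectory hugs the trace while dipping toward minima it passes; dips in the visited points mark the promising regions handed to the local search.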

In the local-search phase, local-descent methods are started from the initial points generated by the fine-level global search. The local minimum corresponding to each initial point is located, and the best ones among them are output as the final solution.

A Unied Framework to Solve Nonlinear Optimization Problems

Combining our methods of handling constraints and overcoming local minima, we can solve general nonlinear optimization problems. The accompanying figure shows how various types of problems are solved in a unified framework.

Unconstrained optimization problems are solved directly by trace-based global optimization methods. Constrained problems, whether continuous or discrete, are first transformed into Lagrangian formulations, in which constraints are combined with the objective function using Lagrange multipliers.

Decision problems, including constraint satisfaction problems (CSPs), do not have objective functions. They are solved as optimization problems by first introducing artificial objectives. These objectives are helpful in providing guidance toward feasible solutions.

[Figure: Transformation of various forms of nonlinear optimization problems (decision, constrained, and unconstrained problems) into a unified formulation that is further solved by global search methods.]

For example, given a CSP as follows:

Find a feasible solution x that satisfies h(x) = 0.

This problem can be transformed into a constrained optimization problem by adding a merit function that measures the norm of the constraint set:

    minimize    N(h(x))
    subject to  h(x) = 0,

where N is a scalar function such that N(h(x)) = 0 if and only if h(x) = 0. Although this formulation is equivalent to the original constraint-only formulation, the objective (merit) function indicates how closely the constraints are being satisfied, hence providing additional guidance in leading the search to a satisfying assignment.

Note that a sufficient condition for solving the CSP defined above is an assignment for which the objective function is zero. However, optimizing the objective function alone, without the constraints, is less effective, as there may exist many local minima in the space of N. Strictly following a descent path of N often ends up in a dent where N is not zero but its value cannot be improved by local refinement.

After a constrained problem is transformed into a Lagrangian formulation, our trace-based global search is performed on the Lagrangian function to find good solutions.
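The merit-function construction can be illustrated directly: for equality constraints h_i(x) = 0, the squared-norm merit N(h(x)) = sum_i h_i(x)^2 is zero exactly on feasible assignments and positive elsewhere. The toy CSP below is an assumption for illustration.

```python
def merit(h_list, x):
    """Merit function N(h(x)) = sum_i h_i(x)^2: zero exactly when every
    constraint h_i(x) = 0 is satisfied, positive otherwise, so its value
    measures how far x is from feasibility."""
    return sum(h(x) ** 2 for h in h_list)

# Toy CSP: find (x0, x1) with x0 + x1 = 3 and x0 - x1 = 1 (solution (2, 1)).
h_list = [lambda x: x[0] + x[1] - 3.0, lambda x: x[0] - x[1] - 1.0]
assert merit(h_list, (2.0, 1.0)) == 0.0    # feasible assignment
assert merit(h_list, (0.0, 0.0)) == 10.0   # infeasible: 3^2 + 1^2
```

As the text notes, descending on this merit alone can stall in a dent where N is small but nonzero, which is why the constraints are retained alongside it.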

Applications of Proposed Methods

We have applied our Lagrangian methods for handling nonlinear constraints and our trace-based global search method for overcoming local minima to several applications. These include artificial neural-network learning, digital filter-bank design, satisfiability and maximum satisfiability, and multiplierless filter-bank design.

We have applied our trace-based global search method to learn the weights of a feed-forward neural network, a nonlinear unconstrained optimization problem (Chapter 4). Artificial neural networks have been used in many applications in recent years; however, existing learning and design methods are far from optimal. In our experiments, we formulate the supervised learning of feed-forward neural networks as an unconstrained optimization problem whose objective function is highly nonlinear, with many local minima. We compare our trace-based method with other global search methods and with existing results in solving a set of benchmark problems. Our trace-based method improves the designs significantly, resulting in much smaller neural networks of better quality.

We have applied our Lagrangian method with adaptive control to solve constrained optimization applications, specifically the design of digital filter banks (Chapter 5). Starting with a multiple-objective design problem, we formulate it as a constrained problem by constraining all objectives but one. The constrained continuous problem is then solved by our adaptive Lagrangian method and trace-based global-search method. Our experiments show that our method converges faster, is more robust, and improves existing solutions on all test problems.

We have applied our discrete Lagrangian method to solve several discrete optimization problems, including the satisfiability problem, the maximum satisfiability problem, and the design of a multiplierless digital filter bank (Chapter 6). These problems are formulated as discrete constrained optimization problems and solved by discrete Lagrangian methods. Using a large number of benchmark and test problems, we show very promising results for the discrete Lagrangian method: it either improves existing solutions or finds optimal solutions much faster than other methods.

To summarize, we address two important issues of nonlinear optimization in this research: the handling of nonlinear constraints and methods to help escape from local minima. We use a Lagrange-multiplier-based formulation to handle nonlinear continuous and discrete constraints and develop efficient Lagrangian methods to search for saddle points. To overcome local minima, we develop a new global search method that uses an external force to lead the search continuously across the search space and performs a combination of local and global searches. These methods have been applied to a large number of unconstrained and constrained application problems and have obtained very promising results.

Outline of Thesis

This thesis is divided into two parts. The first part, which consists of Chapters 2 and 3, reviews existing methods for handling nonlinear constraints and overcoming local minima and presents our new methods. In the second part, which consists of Chapters 4, 5, and 6, our proposed methods are applied to solve application problems formulated as unconstrained and constrained, continuous and discrete optimization problems. The organization of this thesis is shown in the accompanying figure.

Many problems in engineering can be formulated as optimization problems. Most optimization problems have multimodal objective functions (i.e., there exists more than one local optimum) and possibly nonlinear constraints. Local optimization techniques, which stop upon identifying local optima, are not suitable for such problems. Mechanisms for handling nonlinear constraints and overcoming local minima are two important aspects of solving nonlinear optimization problems.

In Chapter 2, we address issues in handling nonlinear constraints. We first review existing methods for dealing with nonlinear constraints in continuous and discrete problems and introduce the classical Lagrangian theory. Then we propose a new method based on Lagrange multipliers, which employs adaptive control of the relative weights between the objective and the constraints to achieve faster and more robust convergence. To solve discrete constrained problems, we extend Lagrangian theory for continuous problems to discrete problems and

present a new discrete Lagrangian method. For both continuous and discrete problems, we transform constrained problems into unconstrained problems based on Lagrange multipliers. The transformed unconstrained problems are further solved by local- and global-search methods.

[Figure: Organization of this thesis. Chapter 1: introduction to nonlinear global optimization. Chapter 2: handling nonlinear constraints (previous work; proposed Lagrangian and DLM methods). Chapter 3: overcoming local minima (previous work; proposed trace-based global search, Novel). Chapter 4: unconstrained optimization application (neural-network learning). Chapter 5: continuous constrained optimization application (QMF filter-bank design). Chapter 6: discrete constrained optimization applications (SAT, MAX-SAT, multiplierless QMF filter-bank design). Chapter 7: conclusions.]

In Chapter 3, we address issues in overcoming local minima. We first review existing nonlinear optimization techniques, including local- and global-search methods, and identify the advantages and disadvantages of these methods. Then we present a new deterministic global search method: our proposed trace-based method. Beginning with an example, we introduce the overall framework of the method; we then present each component of the method and discuss it in detail. Our trace-based method employs an innovative global search strategy that combines global search and local search efficiently. Its unique trace-based global-search techniques make it stand out among existing global search techniques and give it great potential for improved performance on real-world applications. An implementation of the trace-based method, Novel (an acronym for Nonlinear Optimization Via External Lead), is then presented.

In Chapter 4, we present the application of Novel to a nontrivial unconstrained nonlinear optimization problem: neural-network learning. First, we describe the neural-network learning problem and survey existing learning methods. The learning problem of many types of neural network can be formulated as an optimization problem, and many neural-network learning algorithms can be traced back to optimization methods. Then we present experimental results of Novel in solving some neural-network benchmark problems and compare the results with those of other global search methods. The results obtained by Novel are phenomenal, showing significant improvement in terms of learning error and network size.

In Chapter 5, we apply our adaptive Lagrangian method and Novel to solve nonlinear constrained optimization problems, specifically the design of quadrature-mirror-filter (QMF) digital filter banks. We first introduce the filter-bank design problem, which has multiple objectives to be optimized. Then we summarize existing approaches and present our nonlinear constrained optimization formulation, which is solved by our adaptive Lagrangian method and Novel. In our experiments, we solve a set of QMF filter-bank design problems. For all the test problems, our results improve on previous results and are also better than those obtained by other methods. Using our constrained formulation, we have also studied performance trade-offs among different design objectives.

In Chapter 6, we present the application of our discrete Lagrangian methods (DLM) to discrete optimization problems, including satisfiability (SAT), maximum satisfiability (MAX-SAT), and the design of multiplierless QMF filter banks. The satisfiability problem is the core of the NP-complete problems and is significant in computer science research. We first summarize various formulations and the corresponding methods for solving this problem. Then we present three versions of DLM for solving SAT problems. A large number of SAT benchmark problems are used in our experiments; our experimental results show that DLM usually performs better than the best existing methods and can achieve order-of-magnitude speedups for many problems. MAX-SAT is a generalization of SAT. We have also applied DLM to solve MAX-SAT test problems; in our experiments, DLM generally finds better solutions and is usually two orders of magnitude faster than competing methods. Finally, we apply DLM to design multiplierless QMF filter banks. DLM has obtained high-quality designs with lower computation and hardware-implementation costs than real-value designs.

Finally, in Chapter 7, we summarize the work presented in this thesis and propose future directions to extend and enhance this research.

Contributions of This Research

The following are the main contributions of this thesis.

Adaptive Strategy to Handle Nonlinear Constraints (Chapter 2). We have proposed Lagrange-multiplier-based methods to handle nonlinear constraints and have developed an adaptive and robust strategy to speed up the search for saddle points.

Discrete Lagrangian Method (Chapter 2). We have extended the classical Lagrangian theory to discrete problems and have developed the discrete Lagrangian method (DLM), which works on discrete optimization problems directly and efficiently.

Innovative Trace-Based Global Search to Overcome Local Minima (Chapter 3). We have proposed a trace-based global search method consisting of coarse-level and fine-level global-search phases and a local-search phase. Each phase addresses a different aspect of the global search; collectively, they constitute a powerful method that finds global optima faster or finds good solutions in a short amount of time. Our unique trace-based global search strategy overcomes local minima in a continuous fashion.

Global Search Approach for Neural-Network Learning (Chapter 4). We have applied the new trace-based global search method to neural-network learning. The method has shown a substantial performance advantage over existing neural-network learning algorithms and other global search methods.

Global Search Approach for Digital Filter-Bank Design (Chapter 5). We have applied our adaptive Lagrangian method and trace-based global search to the design of quadrature-mirror-filter digital filter banks. We have found designs that are better than previous results as well as better than those obtained by simulated annealing and genetic algorithms.

Discrete Lagrangian Method for Solving Discrete Problems (Chapter 6). We have applied our discrete Lagrangian method (DLM) to a large number of satisfiability and maximum-satisfiability problems and have obtained significant performance improvements over the best existing methods. DLM has also been applied to the design of multiplierless QMF filter banks, mainly by Mr. Zhe Wu, another student in the group; we have found high-quality designs with a fraction of the computation and implementation costs.

HANDLING NONLINEAR CONSTRAINTS

A nonlinear constrained optimization problem consists of a nonlinear objective function and a set of possibly nonlinear constraints. Compared with unconstrained optimization problems, constrained problems have the additional requirement of satisfying the constraints. Naturally, how to handle constraints is the first issue to be addressed.

Constrained problems are continuous or discrete depending on their variable domains: continuous problems are defined over continuous variables, while discrete problems are defined over discrete variables. Continuous and discrete problems have different characteristics, and their solution methods are also different.

Constraints in continuous constrained problems have linear or nonlinear forms. Linear constraints are relatively easy to handle; for example, variable substitution and other transformation methods can be used to eliminate linear constraints as well as some variables. Some linearly constrained optimization problems, including linear programming and quadratic programming, have been studied extensively and can be solved by efficient algorithms developed in the past. General nonlinear constrained problems are more difficult and have been an active research topic in recent years.

Discrete constrained problems can have linear or nonlinear constraints. In contrast to continuous problems, linearly constrained discrete problems can have high computational complexity and be expensive to solve due to their large search space. Nonlinear constraints can be transformed into linear constraints by introducing dummy variables.

In this chapter, we focus on issues in handling nonlinear constraints. We first review existing methods for constrained continuous and discrete problems and introduce the classical Lagrangian theory for handling nonlinear constraints. Then we propose a Lagrange-multiplier-based method with adaptive control. This method dynamically adjusts the relative weights between the objective and the constraints to achieve faster and more robust convergence to saddle points. To solve discrete problems, we extend Lagrangian theory from the continuous domain to the discrete domain and present a new discrete Lagrangian method.

In general, our approach to handling nonlinear constraints is to transform both continuous and discrete constrained problems into unconstrained formulations using Lagrange multipliers, and to solve them using local- and global-search methods.

Previous Work

Methods to handle nonlinear constraints take one of two approaches: transformational or nontransformational. In nontransformational approaches, optimal solutions are searched for directly based on the objective and the constraints; examples are rejecting/discarding methods, repairing methods, and reduced-gradient methods. In transformational approaches, the original constrained problem is transformed into another, usually simpler, form before being solved. The accompanying figure shows a classification of nonlinear constraint-handling methods.

Nontransformational Approaches

Nontransformational approaches work on the original problem directly by searching through its feasible regions for the optimal solutions. They stay within the feasible regions and try to improve the objective values. Methods in this category include rejecting/discarding methods, repairing methods, and reduced-gradient methods.

The advantages of these methods are (a) their feasible termination points and (b) their convergence to a local constrained minimum. Their disadvantages are (a) their requirement of an initial feasible point and (b) the difficulty of remaining within a feasible region when the constraints are nonlinear.

[Figure: Classification of methods to handle nonlinear constraints. Nontransformational: rejecting/discarding, repairing, reduced-gradient. Transformational: penalty/barrier, Lagrangian, sequential quadratic programming. Our new method presented in this chapter is a variation of the Lagrangian method.]

Rejecting/discarding methods take the general strategy that infeasible solutions are dropped during the search for optimal solutions and only feasible solutions are accepted. This strategy is simple, easy to understand, and easy to implement. Its disadvantage is that it is inefficient if a feasible region is difficult to find, which happens when the optimization problem is highly constrained and the feasible region is irregular. When a feasible region is small and not easy to find, these methods spend most of their time generating and rejecting infeasible candidate solutions.

Repairing methods attempt to maintain feasibility by repairing moves that go off a feasible region: infeasible solutions are transformed into feasible ones by some repair operation. Repairing methods are usually problem-specific, and restoring feasibility may be as difficult as the original optimization problem.

Reduced-gradient methods search along gradient or projected-gradient directions within feasible regions; they perform well if the constraints are nearly linear. When possible, these methods first eliminate equality constraints and some variables, reducing the original problem to a bound-constrained problem in the space of the remaining variables. Variables are divided into two categories: fixed and superbasic. Fixed variables are those that are at either their upper or lower bounds and are to be held constant; superbasic variables are free to move. A standard reduced-gradient algorithm searches along the steepest-descent direction in the superbasic variable space.
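The fixed/superbasic split can be sketched as a projected steepest-descent step. This is a simplified illustration of the idea (bound constraints only, fixed step size), not a production reduced-gradient implementation; the classification rule and the toy problem are assumptions.

```python
def reduced_gradient_step(x, grad, lower, upper, lr=0.1):
    """One projected steepest-descent step sketching the reduced-gradient
    idea: a variable sitting at a bound whose gradient pushes it outward is
    treated as fixed; the remaining (superbasic) variables move along the
    negative gradient and are clipped back to their bounds."""
    g = grad(x)
    new_x = []
    for xi, gi, lo, hi in zip(x, g, lower, upper):
        fixed = (xi <= lo and gi > 0) or (xi >= hi and gi < 0)
        if fixed:
            new_x.append(xi)                                  # held at bound
        else:
            new_x.append(min(hi, max(lo, xi - lr * gi)))      # superbasic move
    return new_x

# Minimize (x0 - 2)^2 + (x1 + 1)^2 on the box [0, 1] x [0, 1]:
# the solution is (1, 0), with both variables ending up fixed at bounds.
grad = lambda x: [2 * (x[0] - 2), 2 * (x[1] + 1)]
x = [0.5, 0.5]
for _ in range(100):
    x = reduced_gradient_step(x, grad, [0.0, 0.0], [1.0, 1.0])
```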

In short, nontransformational methods work directly on the constraints of the constrained optimization problem. They explicitly handle infeasible points during the search and do not transform the original constrained problem. These methods do not work well if a feasible starting point is difficult to find and the feasible region is nonlinear and irregular.

Transformational Approaches

Transformational approaches convert a constrained problem into another form before solving it. Well-known methods include penalty methods, barrier methods, Lagrangian methods, and sequential quadratic programming methods.

Penalty methods approximate a constrained problem by an unconstrained problem that assigns high cost to points far from the feasible region. Barrier methods, on the other hand, approximate a constrained problem by an unconstrained problem that assigns high cost to being near the boundary of the feasible region. Unlike penalty methods, barrier methods are applicable only to problems with a robust feasible region.

Penalty methods handle constraints using penalty functions. A penalty term that reflects the violation of the constraints is added to the objective function to form a penalty function, and the constrained optimization problem is transformed into a single unconstrained problem, or a sequence of them, minimizing penalty functions. Unconstrained optimization methods are then used to find optimal solutions of the unconstrained problem. As the penalty parameter on each constraint is made large enough, the unconstrained problem has the same optimal solution as the original constrained problem.

Penalty methods are simple and easy to implement. However, finding appropriate penalty parameters for a specific problem instance is not trivial. If the penalty parameters are too large, constraint satisfaction dominates the search, which then converges to some feasible point that may be far from the optimum; large penalty terms also make the penalty-function space very rugged and difficult to search for good solutions. On the other hand, if the penalty parameters are too small and constraint violations do not carry enough weight, the optimal solution of the penalty function may not be feasible. Usually, penalty methods require tuning the penalty coefficients either before or during the optimization process.
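The penalty transformation can be sketched as follows. The quadratic penalty form, the crude numerical minimizer, and the toy problem (minimize x^2 subject to x = 1) are assumptions for illustration; they show how the unconstrained minimizer approaches the constrained one as the penalty parameter grows.

```python
def penalty_value(f, h_list, mu):
    """Quadratic penalty function P(x) = f(x) + mu * sum_i h_i(x)^2: a single
    unconstrained function whose minimizers approach the constrained minimum
    as the penalty parameter mu grows."""
    return lambda x: f(x) + mu * sum(h(x) ** 2 for h in h_list)

def minimize_1d(fn, x0, lr=0.001, iters=20000, eps=1e-6):
    """Crude one-dimensional gradient descent with a numerical derivative."""
    x = x0
    for _ in range(iters):
        g = (fn(x + eps) - fn(x - eps)) / (2 * eps)
        x -= lr * g
    return x

# Toy problem: minimize f(x) = x^2 subject to x - 1 = 0 (answer: x = 1).
# The unconstrained minimizer is mu / (1 + mu), approaching 1 as mu grows.
f = lambda x: x * x
h_list = [lambda x: x - 1.0]
xs = [minimize_1d(penalty_value(f, h_list, mu), 0.0) for mu in (1.0, 10.0, 100.0)]
```

The sequence of minimizers illustrates the tuning difficulty discussed above: small mu leaves the solution far from feasibility, while very large mu makes the function steep and hard to minimize.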

Barrier functions are similar to penalty functions, except that barriers are set up to keep solutions from leaving the feasible regions. Barriers are derived from the constraints; examples of barrier functions are those that introduce logarithms of the inequalities into the objective function. Barrier methods transform constrained problems into unconstrained problems and minimize barrier functions. Basically, they work inside feasible regions and search for optimal feasible points. Barrier methods usually require a feasible starting point and have difficulty in situations in which the feasible regions of a set of nonlinear constraints are hard to identify.

Lagrangian and augmented Lagrangian methods use Lagrange multipliers to combine constraints with the objective function to form a Lagrangian function. The original constrained problem is transformed into an unconstrained problem based on the Lagrangian function. By introducing Lagrange multipliers, constraints can be gradually resolved through iterative updates. Lagrangian and augmented Lagrangian methods manage numerical stability and achieve high accuracy at the price of an increased number of problem dimensions.

Penalty, barrier, and Lagrangian methods transform a constrained problem into an unconstrained form before solving it. In contrast, sequential quadratic programming (SQP) methods approximate the original problem using quadratic programming problems. They solve a series of successive quadratic approximations of a nonlinear constrained problem. In its purest form, an SQP method replaces the objective function by a quadratic approximation and replaces the constraint functions by linear approximations. A set of quadratic subprograms is solved. If the starting point is sufficiently close to the local minimum, SQP methods converge at a second-order rate.

Feasible sequential quadratic programming methods are variations of SQP methods in which all iterations work on feasible points. They are more expensive than standard SQP methods but are useful when the objective function is difficult or impossible to calculate outside the feasible set, or when termination of the algorithm at an infeasible point is undesirable.

Among these transformational approaches for handling nonlinear constraints, Lagrangian methods are general and robust and can achieve high degrees of precision. Penalty methods have difficulty in choosing appropriate penalty coefficients. Barrier methods do not work well when the feasible region is small and hard to find. SQP methods have to start close to local minima to perform well. Therefore, we use Lagrange multipliers for constraint relaxation in developing our algorithm. In the next section, we briefly introduce Lagrangian theory for continuous constrained problems and Lagrangian methods.

Existing Lagrangian Theory and Methods

Lagrangian methods are classical methods for solving continuous constrained optimization problems. The basic form of the Lagrangian method works with equality constraints, while inequality constraints are first transformed into equality constraints. In this section, we first introduce Lagrangian theory for equality-constrained problems and related Lagrangian methods. Then, we present methods that transform inequality constraints into equality ones.

Given an equality-constrained optimization problem as follows:

    min_{x ∈ R^n} f(x)
    subject to  h(x) = 0

where x = (x_1, x_2, ..., x_n) and h(x) = (h_1(x), h_2(x), ..., h_m(x)) are the m constraints, the corresponding Lagrangian function L(x, λ) is defined by

    L(x, λ) = f(x) + Σ_{i=1}^{m} λ_i h_i(x)

where λ = (λ_1, ..., λ_m) are Lagrange multipliers. The augmented Lagrangian function is a combination of the Lagrangian and a penalty function, as follows:

    L(x, λ) = f(x) + Σ_{i=1}^{m} λ_i h_i(x) + (1/2) ||h(x)||²

The augmented Lagrangian function often provides better numerical stability and convergence.

Definition: A point x satisfying the constraints h(x) = 0 is said to be a regular point of the constraints if the gradient vectors ∇h_i(x), i = 1, ..., m, are linearly independent.

The tangent plane at regular points can be characterized in terms of the gradients of the constraint functions.

First-order necessary conditions: Let x* be a local minimum of f subject to the constraints h(x) = 0, and let x* be a regular point of these constraints. Then there exists λ* ∈ R^m such that

    ∇_x L(x*, λ*) = 0   and   ∇_λ L(x*, λ*) = 0

These conditions are not sufficient for the existence of a constrained local minimum of f(x) unless second- or higher-order derivatives of f(x) also satisfy certain conditions. An example of second-order sufficient conditions is as follows.

Second-order sufficient conditions: Suppose there exist points x* and λ* ∈ R^m such that

    ∇_x L(x*, λ*) = 0   and   ∇_λ L(x*, λ*) = 0

Suppose also that the matrix ∇²_x L(x*, λ*) is positive definite on the tangent plane M = {z : ∇h(x*) z = 0}; that is, y^T ∇²_x L(x*, λ*) y > 0 for all y ∈ M, y ≠ 0. Then x* is a strict local minimum of f(x) subject to h(x) = 0.

The necessary conditions form a system of n + m equations with n + m unknowns. When the equations are nonlinear, it is difficult to solve them analytically. Numerical methods find solutions satisfying the necessary conditions through the search for saddle points.

Definition: A saddle point (x*, λ*) of the Lagrangian function L(x, λ) is defined as one that satisfies the following condition:

    L(x*, λ) ≤ L(x*, λ*) ≤ L(x, λ*)

for all λ and all x sufficiently close to x*.

The following theorem states the relationship between local minima and saddle points. Here, we present the proof for equality-constrained problems. Similar proofs have been derived for continuous problems with inequality constraints. This proof will be used when we extend Lagrangian theory to discrete optimization problems in a later section.

Saddle-Point Theorem: x* is a local minimum of the original equality-constrained problem if there exists λ* such that (x*, λ*) constitutes a saddle point of the associated Lagrangian function L(x, λ).

Proof: Since (x*, λ*) is a saddle point, L(x*, λ) ≤ L(x*, λ*) for λ sufficiently close to λ*. From the definition of the Lagrangian function, this implies

    Σ_{i=1}^{m} λ_i h_i(x*) ≤ Σ_{i=1}^{m} λ*_i h_i(x*)

Our proof is by contradiction. Suppose there exists some k, 1 ≤ k ≤ m, such that h_k(x*) ≠ 0. If h_k(x*) > 0, then the vector λ = (λ*_1, ..., λ*_{k−1}, λ*_k + δ, λ*_{k+1}, ..., λ*_m) would violate the inequality for a positive δ. If h_k(x*) < 0, then the vector λ = (λ*_1, ..., λ*_{k−1}, λ*_k − δ, λ*_{k+1}, ..., λ*_m) would violate the inequality for a positive δ. Therefore, h(x*) = 0, and x* is a feasible solution to the problem.

Since (x*, λ*) is a saddle point, L(x*, λ*) ≤ L(x, λ*) for x sufficiently close to x*. From the definition of the Lagrangian function,

    f(x*) ≤ f(x) + Σ_{i=1}^{m} λ*_i h_i(x)

Thus, for any feasible x, h(x) = 0, and we have

    f(x*) ≤ f(x)

So x* is a local minimum.

A saddle point is a local minimum of the Lagrangian function L(x, λ) in the x space and a local maximum of L(x, λ) in the λ space. One typical way to find saddle points is to descend in the x space and ascend in the λ space.

Lagrange multipliers can also be viewed as penalties associated with constraints, and the Lagrangian function L corresponds to a penalty function. When some constraints are not satisfied, the sum of the unsatisfied constraints, weighted by the corresponding Lagrange multipliers, is added to the objective function to form a penalty function. Ascents of L in the λ space therefore correspond to increasing the penalties associated with unsatisfied constraints. As L is minimized, the penalties will eventually increase to a point that pushes the constraints to be satisfied. Likewise, descents of L in the x space find a local minimum when all the constraints are satisfied.

Based on the Saddle-Point Theorem, numerical algorithms have been developed to look for saddle points that correspond to local minima in the corresponding search space. One typical method is to perform descents in the original variable space of x and ascents in the Lagrange-multiplier space of λ. This method can be written as a system of ordinary differential equations (ODEs) as follows:

    dx/dt = −∇_x L(x, λ)   and   dλ/dt = ∇_λ L(x, λ)

where t is an autonomous time variable. This dynamic system evolves over time t and performs gradient descents in the original variable space of x and gradient ascents in the Lagrange-multiplier space of λ. When the system reaches an equilibrium point where all the gradients vanish, a saddle point of the Lagrangian function L is found. In other words, given an initial point, the system will not stop until a saddle point, a local minimum of the constrained optimization problem, is reached.

Methods based on solving these ODEs are local search methods. When applied, an initial assignment to x and λ is first given, and the local solution will be the very saddle point reached from this initial point. After reaching the saddle point, the solution will not improve unless a new starting point is selected. Note that nonlinearity can cause chaos, in which a small variation in the initial point can lead to a completely different solution. This happens when the ODEs are solved using finite-step discrete methods. In this case, overshoots and undershoots can lead to unpredictable and perhaps undesirable solutions.
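A finite-step (Euler) discretization of this descent-ascent system can be sketched as follows; the one-variable equality-constrained instance, step size, and iteration count are assumptions chosen for illustration:

```python
# Euler discretization of the saddle-point dynamics
#   dx/dt = -grad_x L,   dlambda/dt = +grad_lambda L
# for the augmented Lagrangian of:  min x^2  subject to  h(x) = x - 1 = 0.
# L(x, lam) = x^2 + lam*(x - 1) + 0.5*(x - 1)^2; its saddle point is
# x* = 1, lam* = -2, where both gradients vanish.

def saddle_search(x=0.0, lam=0.0, dt=0.01, steps=20000):
    for _ in range(steps):
        gx = 2 * x + lam + (x - 1)        # grad_x L
        gl = x - 1                        # grad_lambda L (constraint violation)
        x, lam = x - dt * gx, lam + dt * gl   # descend in x, ascend in lambda
    return x, lam

x, lam = saddle_search()
print(round(x, 3), round(lam, 3))         # approximately 1.0 and -2.0
```

With too large a step size dt, the same iteration exhibits exactly the overshoot behavior mentioned above.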

Besides the method that solves the ODEs, other Lagrangian methods are based on successive minimization of the Lagrangian function with respect to the original variables, with updates of the Lagrange multipliers between iterations:

    max_{λ ∈ R^m} min_{x ∈ R^n} L(x, λ)

Unconstrained optimization methods, such as Newton's method, are applied to solve the minimization problem with fixed λ. These methods are approximations of the approach that solves a system of ODEs. They are faster but not as accurate.

Lagrangian methods find local minima in the constrained space. From an initial assignment to x and λ, the search converges to a saddle point corresponding to a local solution. After reaching the saddle point, the solution will not change unless the search is restarted from a new initial point. Note that nonlinearity can cause chaos, in which a small variation in the initial point can lead to a completely different final solution.

We have discussed Lagrangian theory and methods that work on equality constraints. Inequality constraints are transformed into equality constraints to be solved. One transformation method, the slack-variable method, introduces slack variables for inequality constraints. For example, an inequality constraint g(x) ≤ 0 is transformed into the equality constraint g(x) + z² = 0 with the addition of slack variable z ∈ R.

For an inequality-constrained problem,

    min_{x ∈ R^n} f(x)
    s.t.  g_j(x) ≤ 0,   j = 1, ..., m

after introducing slack variables z = (z_1, z_2, ..., z_m), the corresponding augmented Lagrangian function is

    L_z(x, z, λ) = f(x) + Σ_{j=1}^{m} [ λ_j (g_j(x) + z_j²) + (1/2) |g_j(x) + z_j²|² ]

At a saddle point, L_z(x, z, λ) is at a local minimum with respect to x and z and at a local maximum with respect to λ. From the local-minimum condition of L_z with respect to z at a saddle point, the following solution can be derived:

    z_j² = max(0, −(λ_j + g_j(x)))

After substituting this into the augmented Lagrangian function, we have

    L_z(x, λ) = f(x) + (1/2) Σ_{j=1}^{m} [ max(0, λ_j + g_j(x))² − λ_j² ]

In this augmented Lagrangian function, the slack variables z have been removed. Lagrangian methods find saddle points of this function by doing descents in the x space and ascents in the λ space.
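A sketch of a saddle-point search with this slack-free Lagrangian; the one-variable instance, whose optimum lies on the constraint boundary, and the Euler parameters are assumptions chosen for the example:

```python
# Saddle-point search for an inequality-constrained problem using the
# slack-variable-eliminated augmented Lagrangian
#   L(x, lam) = f(x) + 0.5 * (max(0, lam + g(x))**2 - lam**2).
# Toy instance: min (x - 2)^2  s.t.  g(x) = x - 1 <= 0, whose optimum
# x* = 1 lies on the constraint boundary, with multiplier lam* = 2.

def saddle_search(x=0.0, lam=0.0, dt=0.005, steps=40000):
    for _ in range(steps):
        t = max(0.0, lam + (x - 1))       # max(0, lam + g(x))
        gx = 2 * (x - 2) + t              # grad_x L
        gl = t - lam                      # grad_lam L = max(g(x), -lam)
        x, lam = x - dt * gx, lam + dt * gl
    return x, lam

x, lam = saddle_search()
print(round(x, 2), round(lam, 2))         # approximately 1.0 and 2.0
```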

Discrete Optimization

Discrete optimization problems usually have finite search spaces. However, they are difficult to solve due to the enormous size of their search spaces, which grow explosively with the number of discrete variables. The immense size of the search space rules out the possibility of completely enumerating all solutions except for trivial problems. Often, only a small fraction of the search space can be enumerated.

Some discrete problems can be solved without enumeration at all. Unfortunately, the state of our present knowledge leaves most discrete optimization problems in the enumeration-required category.

The difficulty of discrete problems is addressed in complexity theory. Mathematicians and computer scientists have studied complexity for many years. In the early 1970s, the seminal papers of Cook and Karp profoundly affected the research on complexity. Complexity theory tries to classify problems in terms of the mathematical order of the computational resources required to solve a problem. Categories of computational orders include computation that grows logarithmically or better with problem size, computation that grows polynomially, and computation that grows worse than polynomially (e.g., exponentially).

Many complexity theories for discrete problems were developed on a class of succinctly defined problems called decision problems. Decision problems are solved by a yes-or-no answer to all input, rather than by a solution or an optimal value. Discrete optimization problems are converted to their decision-problem counterparts by asking the question of whether there is a feasible solution to a given problem with the value of the objective function equal to or smaller than a specified threshold.

The difficulty of decision problems ranges from the best solved (Class P) to undecidable ones that are proved not solvable by any algorithm. Class P contains the set of all problems solved by polynomial-time-complexity algorithms. Class NP, which stands for nondeterministic polynomial, contains more difficult problems. NP includes all decision problems that can be answered in polynomial time if the right polynomial-length guess is provided. Problems in NP can be solved by total enumeration of polynomial-length solutions. It is widely believed that P ≠ NP.

Class NP contains Class P and also includes many problems not known to be in P. Problems to which all members of NP polynomially reduce are called NP-hard. The first NP-hard problem is the satisfiability problem, identified by Cook in 1971. This was followed by the discovery of a rich family of NP-hard problems. NP-hard problems are as difficult to solve as any problem in NP. Problems that are both NP-hard and members of NP are called NP-complete. NP-complete problems form an equivalence class: if any NP-complete problem has a polynomial-time algorithm, then all NP-complete problems are solvable in polynomial time. The satisfiability problem is the first NP-complete problem.

NP-complete problems only contain decision problems. Many discrete optimization problems have an NP-complete threshold-decision counterpart. For example, the traveling-salesman problem is an optimization problem that finds the minimal tour length in a graph. Its corresponding decision problem is whether there exists a tour with a specific length in the graph. This decision problem is NP-complete. The original optimization problem is at least as hard as the decision problem: if there is an algorithm that solves the original optimization problem in polynomial time, the decision problem can also be solved in polynomial time. Hence, such optimization problems are NP-hard. In this thesis, we focus on NP-hard problems.

Figure: Classification of discrete optimization methods. Exact methods include partial enumeration and cutting-plane methods; inexact methods include local improvement, stochastic methods, and tabu search.

Methods to solve NP-hard problems are either exact or inexact. Exact methods guarantee solution quality, while inexact approaches return suboptimal solutions with no optimality guarantee. NP-hard problems have exponential time complexity in the worst case. The figure above shows a classification of the discrete optimization methods discussed in this section.

Two main classes of exact methods are partial enumeration and cutting-plane methods. Partial-enumeration methods systematically decompose a large search space into smaller, tractable ones. When feasible regions are easy to identify, each search space consists of only feasible points. On the other hand, when feasible points are not obvious, each search space often consists of infeasible points, and a sequence of infeasible points is enumerated to approach feasible and minimal solutions. Branch-and-bound methods are well-known partial-enumeration methods.

Cutting-plane and polyhedral-description methods proceed by redefining the given problem better and better until it becomes tractable. More constraints, usually linear constraints, are introduced to redefine the feasible region. Regions containing the optimal solution are refined, and unpromising regions are excluded from further search.

Almost all discrete optimization problems can be expressed in the form of integer linear programs (ILPs). General discrete models are transformed into the ILP format at the cost of introducing a large number of variables and constraints. Therefore, the transformed ILP formulations could be more expensive to solve than the original problems. Many discrete problems have been solved efficiently by problem-specific algorithms that exploit the underlying combinatorial structure, which is not present in general ILP formulations and thus not addressed by general algorithms.

Although exact methods solve many practical problems, NP-hard problems eventually reach sizes where exact methods are impractical, and inexact approaches are the only choice. Inexact approaches settle for suboptimal solutions. A basic requirement of an inexact method is efficiency, which means stopping the method in polynomially bounded time.

Inexact (heuristic or approximation) methods have been increasingly applied to solve real-world discrete optimization problems. Examples of inexact search methods are local improvement, stochastic methods, and tabu search. They work with both transformational and non-transformational approaches to handle constraints.

In local improvement (neighborhood search), a search proceeds by sequential improvement of problem solutions, advancing at each step from a current solution to a superior neighbor. For constrained problems with unknown feasible solutions, a two-phase approach is applied. Beginning with an infeasible solution, local improvement first minimizes the infeasibility of the solution. If a feasible solution is discovered, the search is switched to an optimizing phase starting from the feasible solution.

In local improvement, a search is trapped by a local minimum when all neighbors are worse than the current point. Global search strategies have been developed in order to get out of local minima. Examples are multistart (or random restart), simulated annealing (SA), genetic algorithms (GAs), tabu search, and other variations of randomized adaptive search methods.

In multistart, the search escapes from a local minimum by jumping to a new, randomly generated starting point. Local improvement is performed until a local minimum is reached. Then a new initial point is generated, from which local improvement starts again. The best solution found is remembered and reported when the search terminates.
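A minimal multistart sketch; the bit-string objective, single-bit-flip neighborhood, and restart count are assumptions chosen for illustration:

```python
import random

# Multistart local improvement on bit strings: hill-climb by single-bit flips
# until no neighbor improves, then restart from a fresh random point, keeping
# the best local minimum seen.  The objective f(s) = sum(s) (minimize the
# number of ones) is a stand-in for a real discrete problem.

def f(s):
    return sum(s)

def local_improvement(s):
    while True:
        best = min((s[:i] + [1 - s[i]] + s[i + 1:] for i in range(len(s))), key=f)
        if f(best) >= f(s):
            return s                      # no neighbor improves: local minimum
        s = best

def multistart(n=12, restarts=5, seed=0):
    best = None
    rng = random.Random(seed)
    for _ in range(restarts):
        s = local_improvement([rng.randint(0, 1) for _ in range(n)])
        if best is None or f(s) < f(best):
            best = s                      # remember the best solution found
    return best

print(f(multistart()))                    # 0 for this separable objective
```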

SA and GAs use more sophisticated probabilistic mechanisms to escape from local minima. SA adds a stochastic mechanism to local improvement. Starting from an initial point, SA probes nearby points, whose objective value (or possibly constraint values) is evaluated. When minimizing a function, any downhill movement is accepted, and the process is repeated from this new point. An uphill movement may also be accepted, and by doing so, SA escapes from a local minimum. Uphill decisions are made according to the Metropolis criterion. As the minimization process proceeds, the distance between the probed point and the current point decreases, and the probability of accepting uphill movements decreases as well. The search converges to a local (sometimes global) minimum at the end.
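A minimal SA sketch using the Metropolis acceptance rule; the multimodal objective, proposal width, and geometric cooling schedule are assumptions chosen for the example:

```python
import math
import random

# Simulated-annealing sketch on a 1-D multimodal function.  Downhill moves are
# always accepted; an uphill move of size d is accepted with probability
# exp(-d / T) (the Metropolis criterion), and the temperature T is lowered
# geometrically so that uphill acceptances become rarer as the search proceeds.

def f(x):
    return x * x + 10 * math.sin(3 * x)   # many local minima

def anneal(x=4.0, T=5.0, cooling=0.999, steps=10000, seed=1):
    rng = random.Random(seed)
    best = x
    for _ in range(steps):
        y = x + rng.uniform(-0.5, 0.5)    # probe a nearby point
        d = f(y) - f(x)
        if d < 0 or rng.random() < math.exp(-d / T):
            x = y                         # downhill always; uphill by chance
        if f(x) < f(best):
            best = x                      # remember the best point seen
        T *= cooling                      # cool down
    return best

print(round(f(anneal()), 2))
```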

GAs are based on the computational model of evolution. GAs maintain a population of individual points in the search space, and the performance of the population evolves to be better through selection, recombination, mutation, and reproduction. The fittest individual has the largest probability of survival. The population evolves to contain better and better individuals and eventually converges to a local (sometimes global) minimum at the end.

SA and GAs have been widely used to solve NP-hard problems, and they have achieved more success in solving discrete problems than continuous problems. They perform a search based on the objective or fitness-function values, which makes them work well for discrete problems that do not have gradient information.

Tabu search was first introduced by Glover. In tabu search, a tabu list containing historical information is maintained during the search. In each iteration, a local improvement is performed. However, moves that lead to solutions on the tabu list are forbidden, or "tabu." If there are no better solutions in the neighborhood, the tabu search moves to the best neighboring solution. The tabu list prevents returning to the local optimum from which the search has recently escaped. Tabu search has obtained good results in solving some large discrete optimization problems.
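A minimal tabu-search sketch; the bit-flip neighborhood, the attribute-based tabu list of recently flipped positions, and the fixed tenure are assumptions chosen for the example:

```python
import random
from collections import deque

# Tabu-search sketch over bit strings: each iteration moves to the best
# neighbor whose move (a bit position) is not on the tabu list, even when that
# neighbor is worse than the current point.  Recently flipped positions stay
# tabu for a fixed tenure, preventing an immediate return to the local optimum
# the search has just escaped.

def tabu_search(f, n=10, iters=200, tenure=5, seed=0):
    rng = random.Random(seed)
    s = [rng.randint(0, 1) for _ in range(n)]
    best, tabu = list(s), deque(maxlen=tenure)
    for _ in range(iters):
        moves = [i for i in range(n) if i not in tabu]
        i = min(moves, key=lambda j: f(s[:j] + [1 - s[j]] + s[j + 1:]))
        s[i] = 1 - s[i]                   # take the best non-tabu move
        tabu.append(i)                    # reversing it is now forbidden
        if f(s) < f(best):
            best = list(s)
    return best

print(sum(tabu_search(sum)))              # 0: minimizes the number of ones
```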

Proposed Methods for Handling Nonlinear Constraints

In this section, we present a new Lagrangian formulation to handle inequality constraints and propose new methods to search for saddle points of Lagrangian functions more efficiently. Our method employs adaptive control of the relative weights between the objective and the constraints to achieve faster and more robust convergence. We then extend Lagrangian theory to discrete problems and present a new discrete Lagrangian method to solve discrete constrained problems. Our approach to solving continuous and discrete constrained problems is to transform them into Lagrangian formulations and then search for the corresponding saddle points.

Handling Continuous Constraints

We use Lagrange multipliers to handle nonlinear constraints. Lagrange-multiplier theory works well on equality constraints but cannot directly deal with inequality constraints. Often, inequality constraints are first converted into equality constraints before Lagrange-multiplier methods are applied.

We have applied two methods to convert inequality constraints into equality constraints: the slack-variable method and the MaxQ method. Recall the general constrained problem as follows:

    min_{x ∈ R^n} f(x)
    s.t.  h_i(x) = 0,   i = 1, ..., m
          g_j(x) ≤ 0,   j = 1, ..., m'

In the slack-variable method introduced in the last section, inequality constraints are transformed into equality constraints, and Lagrange multipliers

    λ = (λ_1, ..., λ_m, λ_{m+1}, ..., λ_{m+m'})

are introduced. The augmented Lagrangian function of this general problem is

    L_z(x, λ) = f(x) + λ^T h(x) + (1/2) ||h(x)||²
              + (1/2) Σ_{j=1}^{m'} [ max(0, λ_{m+j} + g_j(x))² − λ_{m+j}² ]

The first-order derivatives of L_z(x, λ) with respect to x and λ are

    ∂L_z(x, λ)/∂x_i = ∂f(x)/∂x_i + Σ_{k=1}^{m} [λ_k + h_k(x)] ∂h_k(x)/∂x_i
                    + Σ_{k=1}^{m'} max(0, λ_{m+k} + g_k(x)) ∂g_k(x)/∂x_i

    ∂L_z(x, λ)/∂λ_j = h_j(x),   j = 1, ..., m

    ∂L_z(x, λ)/∂λ_{m+j} = max(g_j(x), −λ_{m+j}),   j = 1, ..., m'
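These closed-form derivatives can be checked against finite differences; the two-variable instance below, with one equality and one inequality constraint, is an assumption chosen for the example:

```python
# Finite-difference check of the slack-variable augmented-Lagrangian gradient
#   L_z = f + lam1*h + 0.5*h**2 + 0.5*(max(0, lam2 + g)**2 - lam2**2)
# on a small instance: f(x) = x1^2 + x2^2, h(x) = x1 + x2 - 1 (equality),
# g(x) = x1 - 0.2 <= 0 (inequality).

def L(x1, x2, lam1, lam2):
    f, h, g = x1 ** 2 + x2 ** 2, x1 + x2 - 1, x1 - 0.2
    return f + lam1 * h + 0.5 * h ** 2 + 0.5 * (max(0.0, lam2 + g) ** 2 - lam2 ** 2)

def grad_x1(x1, x2, lam1, lam2):
    h, g = x1 + x2 - 1, x1 - 0.2
    # dL/dx1 = df/dx1 + (lam1 + h) * dh/dx1 + max(0, lam2 + g) * dg/dx1
    return 2 * x1 + (lam1 + h) * 1.0 + max(0.0, lam2 + g) * 1.0

x1, x2, lam1, lam2, eps = 0.3, 0.4, 0.7, 0.5, 1e-6
numeric = (L(x1 + eps, x2, lam1, lam2) - L(x1 - eps, x2, lam1, lam2)) / (2 * eps)
print(abs(numeric - grad_x1(x1, x2, lam1, lam2)) < 1e-6)   # True
```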



The other method we have applied to convert inequality constraints is the MaxQ method, developed by Mr. Tao Wang, another student in our group. In the MaxQ method, the maximum function is used to convert an inequality constraint into an equality constraint; i.e.,

    g_j(x) ≤ 0   becomes   max(0, g_j(x))^{q_j} = 0

where the q_j, j = 1, ..., m', are control parameters. Inside the feasible region, g_j(x) ≤ 0; hence, max(0, g_j(x))^{q_j} = 0. In contrast to the situation in the slack-variable method, constraints have no effect inside feasible regions.
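The conversion can be illustrated numerically; the sample constraint and the exponent q are assumptions chosen for the example:

```python
# MaxQ conversion of an inequality constraint g(x) <= 0 into the equality
# constraint max(0, g(x))**q = 0 with control parameter q > 1.  Inside the
# feasible region the converted constraint is identically zero, so it exerts
# no force there; outside, it grows smoothly with the amount of violation.

def maxq(g_val, q=2.0):
    return max(0.0, g_val) ** q

g = lambda x: x - 1.0                     # sample constraint: x <= 1

print(maxq(g(0.5)))                       # 0.0  (feasible: no effect)
print(maxq(g(1.5)))                       # 0.25 (infeasible: positive)
```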

In the MaxQ method, the augmented Lagrangian function of the general problem is

    L_q(x, λ) = f(x) + λ^T h(x) + (1/2) ||h(x)||²
              + Σ_{j=1}^{m'} λ_{m+j} max(0, g_j(x))^{q_j} + (1/2) Σ_{j=1}^{m'} max(0, g_j(x))^{2 q_j}

When the q_j's are constant, the first-order derivatives of L_q(x, λ) with respect to x and λ are

    ∂L_q(x, λ)/∂x_i = ∂f(x)/∂x_i + Σ_{j=1}^{m'} q_j λ_{m+j} max(0, g_j(x))^{q_j − 1} ∂g_j(x)/∂x_i
                    + Σ_{k=1}^{m} [λ_k + h_k(x)] ∂h_k(x)/∂x_i
                    + Σ_{j=1}^{m'} q_j max(0, g_j(x))^{2 q_j − 1} ∂g_j(x)/∂x_i

    ∂L_q(x, λ)/∂λ_j = h_j(x),   j = 1, ..., m

    ∂L_q(x, λ)/∂λ_{m+j} = max(0, g_j(x))^{q_j},   j = 1, ..., m'

Here, q_j must be larger than 1 in order for the term max(0, g_j(x))^{q_j − 1} to exist.

Based on the Lagrangian function and its gradients, we search for saddle points by performing gradient descents in the original-variable (x) space and ascents in the Lagrange-multiplier (λ) space. The search is modeled by the following system of ordinary differential equations (ODEs):

    dx(t)/dt = −∇_x L_z(x(t), λ(t))
    dλ(t)/dt = ∇_λ L_z(x(t), λ(t))

This system of ODEs is solved as an initial-value problem, which converges to a saddle point corresponding to a local minimum of the original constrained optimization problem.

The convergence behavior of the slack-variable method is different from that of the MaxQ method when the saddle point is on the boundary of a feasible region. When a saddle point is inside a feasible region, and the initial point of the search is also inside the feasible region and close to the saddle point, the search trajectories of the two methods can be similar. However, when a saddle point is on the boundary, the search trajectories of these two methods are usually different, even when they start from the same initial point.

When a saddle point is on the boundary of a feasible region, the slack-variable method approaches the saddle point from both inside and outside the feasible region.

Figure: Illustration of the search trajectories using the slack-variable method when the saddle point is on the boundary of a feasible region; f(x) is an objective function of variable x. The search can (a) converge to a saddle point, (b) oscillate around a saddle point, or (c) diverge to infinity.

The search

trajectory oscillates around the saddle point and converges when the magnitude of the oscillations goes to 0. In some situations, the magnitude of the oscillations does not decrease, or even increases, causing the search to diverge eventually. Depending on the initial point and the relative magnitudes of the objective function and the constraints, the search trajectory could end up in one of three different states, as illustrated in the figure above:

(1) convergence to a saddle point;

(2) oscillations around a saddle point; and

(3) divergence to large values and eventually to infinity.

The first state is desirable, while the other two states should be avoided. Oscillations and divergence happen when the accumulated force of the constraints becomes too large. The constraints accumulate momentum during the search in infeasible regions, which is represented by the values of the corresponding Lagrange multipliers. When the search enters a feasible region, the momentum of the constraint force makes the search go way deep into the feasible region. At this point, the objective and the corresponding gradient can become large. Eventually, the descent force on the objective dominates the constraint force, and

Figure: Illustration of the search trajectory using the MaxQ method when the saddle point is on the boundary of a feasible region; f(x) is the objective function. The search starts from inside (a) or outside (b) the feasible region. In both cases, the search trajectory eventually approaches the saddle point from outside the feasible region.

the search trajectory follows gradient descent on the objective function and goes outside the feasible region again.

We have found that, by appropriately scaling the relative weights between the objective function and the constraints, oscillations and divergence can be reduced or eliminated. Usually, when constraints have large weights, the search can successfully converge to a saddle point. Scaling can also speed up convergence.

Compared to the slack-variable method, the MaxQ method does not have the problems of oscillation and divergence when the saddle point is on the boundary of a feasible region. The MaxQ method approaches the convergence point on the boundary asymptotically from outside the feasible region: the closer it gets to the boundary, the slower it goes. The figure above illustrates the situations in which the search starts from either inside or outside a feasible region. In both cases, the search trajectory eventually approaches the saddle point from outside the feasible region. The MaxQ method converges slowly when the search approaches a saddle point on the boundary of a feasible region.

There are two ways to speed up the convergence of the MaxQ method. The first one is to scale the weights between the objective function and the constraints, in a way similar to that in the slack-variable method. Another way is to convert inequality constraints into equality constraints based on certain convergence conditions. After the conversion, convergence is usually much faster.

In both the slack-variable and the MaxQ methods, saddle points are sought by doing descents in the original-variable space and ascents in the Lagrange-multiplier space. For continuous problems, whose first-order derivatives of the objective and the constraints exist, gradient descents and ascents are modeled by a system of ODEs and solved by an ODE solver. For discrete problems, derivatives of the objective function and the constraints do not exist. Therefore, descents are performed by some heuristic local search (e.g., hill climbing). Ascents do not need derivatives and are based only on the values of the constraint functions.
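The discrete descent-ascent idea can be sketched as follows; this is an illustration of the general idea only, not the discrete Lagrangian method developed later in the thesis, and the toy integer instance and update schedule are assumptions:

```python
# Discrete descent-ascent sketch: minimize f(x) = x^2 over the integers
# subject to h(x) = x - 3 = 0.  Descents are hill-climbing steps on
# L(x, lam) = f(x) + lam * |h(x)| over the neighbors {x - 1, x + 1}; when no
# neighbor improves and the point is still infeasible, lam is increased (an
# ascent based only on the constraint value), forcing the search to x = 3.

def solve(x=0, lam=1.0, iters=100):
    L = lambda x, lam: x * x + lam * abs(x - 3)
    for _ in range(iters):
        nbr = min((x - 1, x + 1), key=lambda y: L(y, lam))
        if L(nbr, lam) < L(x, lam):
            x = nbr                       # descent in the x space
        elif x != 3:
            lam += abs(x - 3)             # ascent in the multiplier space
        else:
            break                         # feasible point, no better neighbor
    return x

print(solve())                            # 3
```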

Convergence Speed of Lagrangian Methods

Lagrangian methods rely on two counteracting forces to resolve constraints and find high-quality solutions. When constraints are satisfied, Lagrangian methods rely on gradient descents in the objective space to find high-quality solutions. On the other hand, when constraints are not satisfied, they rely on gradient ascents in the Lagrange-multiplier space in order to increase the penalties on unsatisfied constraints and to force the constraints into satisfaction. The balance between gradient descents and gradient ascents depends on the relative magnitudes of the Lagrange multipliers, which play a role in balancing the objective f(x) and the constraints h(x) and g(x), and in indirectly controlling the convergence speed and solution quality of the Lagrangian method. At a saddle point, the forces due to descent and ascent reach a balance through appropriate Lagrange-multiplier values.

We show in this section that the convergence speed and/or the solution quality can be affected by an additional weight w on the objective part of the Lagrangian function. This results in a new Lagrangian function for the slack-variable method as follows:

    L_o(x, λ) = w f(x) + λ^T h(x) + (1/2) ||h(x)||²
              + (1/2) Σ_{j=1}^{m'} [ max(0, λ_{m+j} + g_j(x))² − λ_{m+j}² ]

where w is a static weight on the objective. When w = 1, L_o(x, λ) = L_z(x, λ), which is the original Lagrangian function.

In general, adding a weight to the objective changes the Lagrangian function, which in turn may cause the dynamic system to settle at a different saddle point with different solution quality. This is especially true when the saddle point is on the boundary of a feasible region. In this section, we only show that the convergence time can be influenced greatly by the choice of the initial weight when the Lagrangian method converges to the same saddle points.

Starting from an initial point (x(0), λ(0)), the dynamic system used to find saddle points is the same as before, with L_z replaced by L_o:

    dx(t)/dt = −∇_x L_o(x(t), λ(t))
    dλ(t)/dt = ∇_λ L_o(x(t), λ(t))

We solve this dynamic system using an ordinary differential equation solver, LSODE, and observe a search trajectory (x(t), λ(t)). When a saddle point is on the boundary of the feasible region, which is true for most problems studied, the dynamic system approaches it from both inside and outside the feasible region.

Depending on the weights between the objective and the constraints, the convergence speeds of Lagrangian methods can vary significantly. There is no good method for selecting the appropriate weights for a given problem. Our proposed dynamic weight-adaptation algorithm adjusts the weights between the objective and the constraints dynamically, based on the search profile. It avoids having to select appropriate weights in the beginning and achieves faster and more robust convergence than using static weights.
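The sensitivity of convergence time to a static weight can be seen even on a tiny instance; the one-variable equality-constrained problem, Euler discretization, and tolerance below are assumptions chosen for the example:

```python
# Effect of a static objective weight w on the convergence time of the
# weighted descent-ascent dynamics for min f(x) = x^2 s.t. h(x) = x - 1 = 0,
# using L_o = w*x^2 + lam*(x - 1) + 0.5*(x - 1)^2.  The loop counts Euler
# steps until both gradients are small; the count varies strongly with w.

def steps_to_converge(w, dt=0.01, tol=1e-6, max_steps=10 ** 6):
    x, lam = 0.0, 0.0
    for k in range(max_steps):
        gx = 2 * w * x + lam + (x - 1)    # grad_x L_o
        gl = x - 1                        # grad_lam L_o
        if abs(gx) < tol and abs(gl) < tol:
            return k                      # reached the saddle point (1, -2w)
        x, lam = x - dt * gx, lam + dt * gl
    return max_steps

for w in (0.1, 1.0, 10.0):
    print(w, steps_to_converge(w))        # step counts differ widely
```

Overweighting the objective (large w) weakens the relative constraint force and, on this instance, slows convergence markedly.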

We use the QMF filter-bank design problem to illustrate the convergence speed of Lagrangian methods using static weights. Depending on the weight w of the objective function, the convergence times of Lagrangian methods in solving the QMF design problem vary greatly, ranging from minutes to hours, even days. We use Johnston's solution as our starting point. This point is feasible, as we use its performance measures as our constraints. However, it is not a local minimum of the objective function. Experimentally, three static values of w represent the three convergence behaviors: objective overweighted, balanced objective and constraints, and constraint overweighted.

(LSODE is a solver for first-order ordinary differential equations; it is a public-domain package available from http://www.netlib.org.)

For the first static weight (objective over-weighted), Figure shows the dynamic changes of the objective, the Lagrangian-function value, and the maximum violation as the search progresses. Note that the trajectories of the objective and the Lagrangian-function values overlap because the constraint violations are small. As the starting point is not a local minimum of the objective, the search descends in the original-variable space x as the objective value decreases. In the meantime, the constraints become violated. As the constraint violations become large, the Lagrangian part slowly gains ground and pushes the search back toward the feasible region, leading to increases in the objective value and decreases in the constraint violation. Eventually, the constraint violation becomes zero and the objective value stabilizes. The overall convergence to the saddle point is slow. Note that the fluctuations of the maximum violation, as well as of the objective, are due to inaccuracy in the numerical estimation of the function values and gradients.

Figure shows the search profile using the second static weight. The objective and the constraints are more balanced in this case, and the convergence time to the saddle point is shorter.

Figure shows the search profile using the third static weight. The constraints are over-weighted in this case, and constraint satisfaction dominates the search process. The trajectory is kept inside, or very close to, the feasible region. However, due to the small weight on the objective, improvements in the objective are slow, causing slow overall convergence to the saddle point.

Dynamic Weight Adaptation

In this section, we propose a strategy to adapt the weight w based on the behavior of the dynamic system, in order to obtain high-quality solutions with short convergence

Figure: Search profile with the first static weight w in designing a QMF filter bank. The objective is over-weighted, and its values are reduced quickly in the beginning. Later, the Lagrangian part slowly gains ground and pushes the search back into the feasible region. The convergence time to the saddle point is long.


Figure: Search profile with the second static weight w in designing a QMF filter bank. The objective and the constraints are balanced, and the convergence to the saddle point is faster.


Figure: Search profile with the third static weight w in designing a QMF filter bank. The constraints are over-weighted, and constraint satisfaction dominates the search process. The trajectory is kept inside, or very close to, the feasible region. However, the improvement of the objective value is slow, causing slow overall convergence to the saddle point.

    select control parameters:
        time interval dt, initial weight w(t = 0),
        maximum number of iterations i_max
    set window size N_w; lambda(t = 0) <- 0
    j <- 0                          (j is the iteration number)
    while j < i_max and the stopping condition is not satisfied do
        j <- j + 1
        advance the search trajectory by dt time units to get to x^j
        if the trajectory diverges then
            reduce w and restart the algorithm
        end if
        record useful information for calculating performance metrics
        if mod(j, N_w) = 0 then     (test whether w should be modified at the end of a window)
            compute performance metrics based on the data collected
            change w if certain conditions hold (see the adaptation rules below)
        end if
    end while

Figure: Framework of the dynamic weight-adaptation algorithm.
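The framework in the figure can be sketched in Python as follows (a simplified illustration: `step` and `diverged` are hypothetical stand-ins for the LSODE trajectory advance and the divergence test, and the window rules below are a reduced version of the adaptation rules given later in this chapter):

```python
import statistics

def weight_adaptive_search(step, diverged, w0, i_max, N_w,
                           increase=1.5, decrease=0.5, eps=1e-6):
    """Dynamic weight-adaptation loop (sketch).  step(w) advances the
    trajectory by one time unit and returns (v_max, f_value)."""
    w, window, history = w0, [], []
    for j in range(1, i_max + 1):
        v_max, f_val = step(w)
        if diverged(v_max):
            w *= 0.1                    # cut w sharply and start over
            window, history = [], []
            continue
        window.append((v_max, f_val))
        if j % N_w == 0:                # end of a window
            v_m = statistics.median(v for v, _ in window)
            f_m = statistics.median(f for _, f in window)
            if history:
                v_prev, f_prev = history[-1]
                if v_m < eps and f_m < f_prev:
                    w *= increase       # feasible and improving: raise w
                elif v_m > eps and v_m >= v_prev:
                    w *= decrease       # violations not shrinking: lower w
            history.append((v_m, f_m))
            window = []
    return w

# Demo run: violations shrink to zero while the objective keeps improving,
# so the weight is raised window after window.
demo = iter([(max(0.0, 1.0 - 0.05 * t), 1.0 / (t + 1)) for t in range(200)])
w_final = weight_adaptive_search(lambda w: next(demo),
                                 diverged=lambda v: v > 1e6,
                                 w0=1.0, i_max=200, N_w=10)
```

The thesis additionally waits two windows after each change so that successive windows are compared under the same w; that refinement is omitted here for brevity.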

times. In general, before a trajectory reaches a saddle point, changing the weight of the objective may speed up or delay convergence. Further, when the trajectory is at a saddle point, changing the weight of the objective may bring the trajectory out of that saddle point and into another. In this section, we exploit the first property by designing weight-adaptation algorithms, so that we can speed up convergence without affecting solution quality. We plan to exploit the second property, bringing a trajectory out of a saddle point, in future work. In a later chapter, we show a trace-based approach that brings the search trajectory out of saddle points.

The figure above outlines the general strategy. Its basic idea is to first estimate the initial weight w(t = 0), then measure the performance metrics of the search trajectory (x(t), lambda(t)) periodically, and adapt w(t) to improve the convergence time or the solution quality.

Let t_max be the total logical time for the search, divided into small units of time dt, so that the maximum number of iterations is i_max = t_max/dt. Further, assume a stopping condition in case the search stops before t_max. Given a starting point x(t = 0), we set the initial Lagrange multipliers to zero, i.e., lambda(t = 0) = 0. Let x^i be the point of the i-th iteration.

To monitor the progress of the search trajectory, we divide time into non-overlapping windows of N_w iterations each. In each window, we compute metrics that measure the progress of the search relative to that of previous windows. Let v_max,i be the maximum constraint violation, and f(x_i) the objective-function value, at the i-th iteration. For the m-th window (m = 1, 2, ...), we calculate the average or the median of v_max,j over all the iterations in the window:

    v_m = (1/N_w) Sum_{j = (m-1)N_w + 1}^{m N_w} v_max,j    or    v_m = median{ v_max,j : (m-1)N_w < j <= m N_w },

and the average or the median of the objective f(x_j):

    f_m = (1/N_w) Sum_{j = (m-1)N_w + 1}^{m N_w} f(x_j)    or    f_m = median{ f(x_j) : (m-1)N_w < j <= m N_w }.
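The per-window statistics can be computed directly from the recorded iterates; a small sketch (function and variable names are illustrative):

```python
import statistics

def window_metrics(v_max, f_vals, N_w, m, use_median=True):
    """Average or median of the maximum violation and the objective over
    the m-th window, i.e. iterations (m-1)*N_w+1 .. m*N_w (m = 1, 2, ...)."""
    lo, hi = (m - 1) * N_w, m * N_w          # 0-based slice bounds
    agg = statistics.median if use_median else statistics.fmean
    return agg(v_max[lo:hi]), agg(f_vals[lo:hi])

# Example: two windows of size 3.
v = [0.9, 0.5, 0.4, 0.2, 0.1, 0.0]
f = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0]
v1, f1 = window_metrics(v, f, N_w=3, m=1)    # -> (0.5, 4.0)
v2, f2 = window_metrics(v, f, N_w=3, m=2)    # -> (0.1, 2.5)
```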

During the search, we apply LSODE to solve the dynamic system, advancing the trajectory by the time interval dt in each iteration in order to arrive at point x^j. At this point, we test whether the trajectory diverges. Divergence happens when the maximum violation v_max,j exceeds an extremely large threshold. If it happens, we reduce w by a large amount and restart the algorithm. In each iteration, we also record statistics, such as v_max,j and f(x_j), that are later used to calculate the performance metrics for each window.

At the end of each window, i.e., every N_w iterations, we decide whether to update w based on the performance metrics v_m and f_m. In our current implementation, we use the averages or the medians of the maximum violation and the objective. In general, other application-specific metrics can be used, such as the number of oscillations of the trajectory in nonlinear continuous problems. Based on these measurements, we adjust w accordingly.

Given performance measurements for the m-th window:
    the average or median of the maximum violation, v_m,
    the average or median of the objective, f_m,
    the number of oscillations, NO_m,
and application-specific constants (the feasibility tolerance eps and the thresholds below):

Rule 1. Increase weight w if
    (a) v_{m-1} < eps and v_m < eps, and
    (b) f_m < f_{m-1} and the relative improvement |f_m - f_{m-1}| / |f_{m-1}| is below a threshold.

Rule 2. Decrease weight w if
    (a) v_m > eps, and
    (b) v_m has not decreased relative to v_{m-1}, and
    (c) NO_m = 0 for continuous QMF design problems, or NO_m > 0 for other continuous and discrete problems.

Figure: Weight-adaptation rules for continuous and discrete problems (the test applied at the end of each window in the adaptation framework).

The preceding figures show a comprehensive weight-adaptation algorithm. Scaling factors determine how fast w is updated. Because we use numerical methods to solve the dynamic system, a trajectory in window m is said to satisfy all the constraints when v_m < eps, where eps is related to the convergence condition and the required precision. Other parameters control, respectively, the required degree of improvement of the objective and of reduction of the maximum violation. Note that, when comparing values between two successive windows m - 1 and m, both must use the same weight w; otherwise, the comparison is not meaningful, because the terrain may be totally different. Hence, after adapting w, we should wait at least two windows before changing it again.

Weight w should be increased (Rule 1) when we observe the third convergence behavior. In this case, the trajectory is within a feasible region, and the objective is improved in successive windows. Weight w will be increased when the improvement of the objective in a feasible region is not fast enough, but will not be increased when the improvement is beyond an upper bound.

Weight w should be decreased (Rule 2) when we observe the second convergence behavior (the trajectory oscillating around a saddle point) or the fourth convergence behavior (the trajectory moving slowly back to the feasible region). Weight w will be decreased when the trajectory oscillates and the trend of the maximum violation does not decrease. We have also defined two cases in condition (c) of Rule 2: the first case applies to the continuous QMF design problems, which do not exhibit oscillatory behavior, whereas the second case applies to the rest of the problems.

We use the QMF filter-bank design problem to illustrate the improvement of our dynamic weight-adaptation method over using static weights. We use Johnston's solution as our starting point. To avoid the difficulty of choosing the initial weight, we choose the starting weight w(t = 0) using a two-step method:

1. Set the initial weight based on the maximum gradient component at the starting point and the number of variables n: w(t = 0) = 1 / (n * max_i |dL(x(t = 0))/dx_i|).

2. Perform the Lagrangian search for a small amount of time dt, and adjust the weight based on the decrement Delta-f of the objective-function value over that interval: w(t = dt) = w(t = 0) * dt / Delta-f.
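A minimal sketch of the first step, assuming the form w(0) = 1/(n * max_i |dL/dx_i|); the function name and the sample gradient values are illustrative only:

```python
def initial_weight(grad_L0, n):
    """w(0) = 1 / (n * max_i |dL/dx_i|), evaluated at the starting point."""
    return 1.0 / (n * max(abs(g) for g in grad_L0))

# Gradient of L at the starting point (made-up numbers): the largest
# component has magnitude 4, and there are n = 3 variables.
w0 = initial_weight([0.2, -4.0, 1.5], n=3)   # -> 1 / (3 * 4)
```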

Figure shows the search profile of our Lagrangian method with adaptive weight control in solving the QMF filter-bank problem, using a fixed time interval dt and window size N_w. As shown in the search profile, the objective value improved fairly quickly in the beginning, and the trajectory was then pushed into a feasible region quickly. Our dynamic method converges much faster than the static-weight Lagrangian methods.

Discrete Constraint-Handling Method

Discrete constraints are handled based on Lagrange multipliers, in a way similar to how we handle continuous constraints. We extend the traditional Lagrangian theory to discrete problems and develop a new discrete optimization method based on the Lagrangian formulation, called the discrete Lagrangian method (DLM).

Figure: Profile with adaptive changes of w in designing a QMF filter bank. The search converges much faster than the runs that use static weights.

Lagrangian methods have been used to solve discrete optimization problems whose discrete variables are relaxed into continuous ones. However, little work has been done on applying Lagrangian methods to discrete problems directly. The difficulty in traditional Lagrangian methods lies in their requirement of a differentiable continuous space. In order to apply Lagrangian methods to discrete optimization problems, we need to develop a new Lagrange-multiplier-based search method that works in discrete space.

Consider the following equality-constrained optimization problem in discrete space, similar to the continuous version given earlier:

    min_{x in D^n}  f(x)
    subject to      h(x) = 0,

where x = (x_1, x_2, ..., x_n), h(x) = (h_1(x), h_2(x), ..., h_m(x)), and D is a subset of the integers, i.e., D^n is a subset of I^n.

Definition: A local minimum x* of the problem above is a point such that h(x*) = 0 and f(x*) <= f(x) for any feasible x, where x and x* differ in only one dimension by a magnitude of 1. For example, if x differs from x* in the k-th dimension, then |x_k - x*_k| = 1. A local minimum is defined with respect to a certain neighborhood; in this definition, the neighborhood is within one unit along one dimension. Note that this definition can be extended to the case in which two points differ in more than one dimension.

The discrete Lagrangian function F has the same form as in the continuous case and is defined as follows:

    F(x, lambda) = f(x) + Sum_{i=1}^{m} lambda_i h_i(x),

where lambda = (lambda_1, ..., lambda_m) are Lagrange multipliers that have real values.

m

A saddle p ointx ofF x is reached when the following condition is satised

F x F x F x

and all x that dier from x in only one dimension bya for all suciently close to

magnitude

In a way similar to the continuous case, we derive the following theorem specifying the relation between local minima and saddle points.

Discrete Saddle-Point Theorem: x* is a local minimum solution to the discrete constrained problem above if there exists lambda* such that (x*, lambda*) constitutes a saddle point of the associated Lagrangian function F(x, lambda).

Proof: The proof is similar to that of the continuous case.

In the continuous case, methods that look for saddle points utilize gradient information. In order for these methods to work in the discrete variable space x of discrete problems, we need to define a counterpart of the gradient operator. Note that lambda can remain continuous even in the discrete case. In the following, we define a discrete difference-gradient descent operator Delta_x. Note that this operator is not unique, and other operators can be defined to work in a similar way.

Definition: The difference-gradient descent operator Delta_x is defined with respect to x in such a way that Delta_x F(x, lambda) is in {-1, 0, 1}^n, Sum_{i=1}^{n} |(Delta_x F(x, lambda))_i| <= 1, and x (+) Delta_x F(x, lambda) is in D^n, where (+) adds the displacement to x component-wise. For any x' in D^n such that Sum_{i=1}^{n} |x'_i - x_i| = 1,

    F(x (+) Delta_x F(x, lambda), lambda) <= F(x', lambda).

Further, if F(x', lambda) >= F(x, lambda) for every such x', then Delta_x F(x, lambda) = 0.

A more general discrete descent operator Delta'_x is defined as follows.

Definition: The discrete descent operator Delta'_x is defined with respect to x in such a way that Delta'_x F(x, lambda) is in {-1, 0, 1}^n, Sum_{i=1}^{n} |(Delta'_x F(x, lambda))_i| <= 1, x (+) Delta'_x F(x, lambda) is in D^n, and

    F(x (+) Delta'_x F(x, lambda), lambda) <= F(x, lambda).

Further, if F(x', lambda) >= F(x, lambda) for every x' such that Sum_{i=1}^{n} |x'_i - x_i| = 1, then Delta'_x F(x, lambda) = 0.

Based on this definition, Lagrangian methods for continuous problems can be extended to discrete problems. The basic idea is to descend in the original discrete variable space of x and to ascend in the Lagrange-multiplier space of lambda. We propose a generic discrete Lagrangian method as follows.

Generic Discrete Lagrangian Method (DLM):

    x^{k+1} = x^k (+) Delta_x F(x^k, lambda^k)
    lambda^{k+1} = lambda^k + c * h(x^k)

where Delta_x is a discrete descent operator for updating x based on the Lagrangian function F(x, lambda), c is a positive real number controlling how fast the Lagrange multipliers change, and k is the iteration index.

Fixed-Point Theorem of DLM: A saddle point (x*, lambda*) of the Lagrangian function F(x, lambda) is found if and only if the two DLM updates terminate.

Proof: (=>) If the updates terminate at (x*, lambda*), then h(x*) = 0 and Delta_x F(x*, lambda*) = 0. h(x*) = 0 implies that F(x*, lambda) = F(x*, lambda*) for any lambda. Delta_x F(x*, lambda*) = 0 implies that F(x*, lambda*) <= F(x, lambda*) for all x that differ from x* in only one dimension by a magnitude of 1. Therefore, (x*, lambda*) is a saddle point.

(<=) If (x*, lambda*) is a saddle point, then by the Discrete Saddle-Point Theorem, x* is a local minimum and satisfies the constraints, i.e., h(x*) = 0. Also, because (x*, lambda*) is a saddle point, we have F(x*, lambda*) <= F(x, lambda*) for all x that differ from x* in only one dimension by a magnitude of 1. By the definition of Delta_x, Delta_x F(x*, lambda*) = 0. Hence, the updates terminate at (x*, lambda*).

Delta_x is not unique and can be defined by either steepest descent or hill-climbing. In steepest descent, Delta_x F(x, lambda) is the direction with the maximum improvement. A hill-climbing approach, on the other hand, chooses the first point in the neighborhood of the current x that reduces F; the descent direction Delta_x F(x, lambda) in this case is the first improving direction found. Although the two approaches follow different descent trajectories, they can both reach equilibria that satisfy the saddle-point condition. Consequently, they can be regarded as alternative ways to compute Delta_x F(x, lambda).
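As an illustration of the generic DLM iteration with the hill-climbing descent operator, here is a toy integer problem (the problem instance, step constant c, and iteration cap are assumptions for the sketch, not taken from the thesis):

```python
# Toy problem: minimize f(x) = x1^2 + x2^2 subject to h(x) = x1 + x2 - 4 = 0
# over integer points.  The constrained optimum is x* = (2, 2).

def f(x):
    return x[0] ** 2 + x[1] ** 2

def h(x):
    return x[0] + x[1] - 4

def F(x, lam):
    return f(x) + lam * h(x)              # discrete Lagrangian function

def descend(x, lam):
    """Hill-climbing discrete descent: the first neighbor (one coordinate
    changed by +/-1) that reduces F, or x itself if none does."""
    for i in (0, 1):
        for d in (-1, 1):
            y = list(x)
            y[i] += d
            if F(tuple(y), lam) < F(x, lam):
                return tuple(y)
    return x

x, lam, c = (0, 0), 0.0, 0.5
for _ in range(200):
    x_new = descend(x, lam)               # descend in the x space
    lam += c * h(x)                       # ascend in the multiplier space
    if x_new == x and h(x) == 0:          # fixed point: saddle point reached
        break
    x = x_new

# x converges to the constrained local minimum (2, 2).
```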

In solving discrete constrained problems with inequality constraints, we first transform the inequality constraints into equality constraints using the maximum function. Consider the following discrete constrained problem:

    min_{x in D^n}  f(x)
    subject to      h(x) = 0,
                    g(x) <= 0,

where h(x) = (h_1(x), ..., h_m(x)), g(x) = (g_1(x), ..., g_{m'}(x)), and D is a subset of the integers, i.e., D^n is a subset of I^n. This problem with inequality constraints is transformed into an optimization problem with equality constraints as follows:

    min_{x in D^n}  f(x)
    subject to      h(x) = 0,
                    max(g(x), 0) = 0.

Obviously, the local minima of this equality-constrained problem are in one-to-one correspondence with those of the original inequality-constrained problem. We then apply our discrete Lagrangian method (DLM) to find the local minima of the transformed problem, which are exactly the local minima of the original problem.
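The max-based transform is mechanical; a one-line sketch (names are illustrative):

```python
def to_equality(g):
    """Wrap an inequality constraint g(x) <= 0 as the equality
    constraint max(g(x), 0) = 0."""
    return lambda x: max(g(x), 0)

g = lambda x: x - 3        # original constraint: x - 3 <= 0
h_eq = to_equality(g)
# h_eq(x) is zero exactly on the feasible set of g, and equals the
# amount of violation otherwise: h_eq(2) == 0, h_eq(5) == 2.
```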

As in applying Lagrangian methods to continuous constrained optimization problems, the relative weights of the objective and the constraints affect the performance of the discrete Lagrangian method (DLM) as well. Domination by either the objective or the constraints can hinder DLM's search for saddle points.

We have developed mechanisms to dynamically adjust the relative weights of the objective and the constraints in DLM. In solving satisfiability problems using DLM, to be discussed in a later chapter, we have applied periodic reduction of the values of the Lagrange multipliers to adjust the weights dynamically and to obtain improved performance.

Consider the effect of dynamic weight scaling using a profile obtained from solving a difficult satisfiability benchmark problem, g. In our formulation of satisfiability problems, the constraints have values of 0 or 1. Accordingly, the Lagrange multipliers lambda in DLM are associated with the constraints and are always non-decreasing. For difficult problems that require millions of iterations, lambda values can become very large as the search progresses. Large lambda's are generally undesirable because they put too much weight on the constraints, causing large swings in the Lagrangian-function value and making the search for saddle points more difficult.

Figure shows the profiles of DLM without dynamic scaling of Lagrange multipliers on problem g, starting from a random initial point. The Lagrange multipliers and the Lagrangian-function values can grow very large. The objective-part and constraint-part (Lagrangian-part) values of the Lagrangian function are shown in the lower two graphs. As time goes on, the constraint part becomes much larger than the objective part. It takes a very long time for DLM to find a solution from a randomly generated initial point, or it may not find one in a reasonable amount of time.

We have introduced periodic scale-downs of the Lagrange multipliers to control the growth of the Lagrange multipliers as well as of the Lagrangian-function values. Figure shows the result when all the Lagrange multipliers are scaled down by a constant factor at regular intervals. These graphs indicate that periodic scaling leads to bounded values of the Lagrange multipliers and of the Lagrangian-function values. The objective-part and constraint-part values of the


Figure: Execution profiles of DLM without dynamic scaling of Lagrange multipliers on problem g, starting from a random initial point. The top-left graph plots the Lagrangian-function values and the number of unsatisfied clauses against the number of iterations. The top-right graph plots the minimum, average, and maximum values of the Lagrange multipliers. The bottom two graphs plot the iteration-by-iteration objective-part and Lagrangian-part values for the early iterations.


Figure: Execution profiles of DLM with periodic scaling of the Lagrange multipliers by a constant factor at regular intervals on problem g, starting from a random initial point. Note that the average Lagrange-multiplier values remain small. See the preceding figure for further explanation.

Lagrangian function are more balanced and have about the same magnitude. Using this dynamic scaling method, we can solve some of the more difficult satisfiability benchmark problems.
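The periodic scale-down can be sketched as follows (the scale factor, period, and multiplier step c are placeholders; the thesis's actual values did not survive extraction):

```python
def multiplier_step(lams, violations, it, c=1.0, period=1000, factor=0.5):
    """One DLM multiplier update: ascend on the (0/1) constraint
    violations, then periodically shrink every multiplier."""
    lams = [l + c * v for l, v in zip(lams, violations)]
    if (it + 1) % period == 0:
        lams = [l * factor for l in lams]     # periodic scale-down
    return lams

# A clause that stays violated grows by at most `period * c` between
# scale-downs, so its multiplier remains bounded; a satisfied clause's
# multiplier only shrinks.
lams = [0.0, 0.0]
for it in range(4000):
    lams = multiplier_step(lams, [1, 0], it)
```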

Summary

In this chapter, we have addressed issues in handling nonlinear constraints. Existing methods that handle nonlinear constraints are either transformational or non-transformational. In non-transformational approaches, the search works on the constraints directly and tries to stay within feasible regions. Such methods have difficulty when the nonlinear constraints form feasible regions that are irregular and difficult to find.

Transformational approaches, on the other hand, are more popular in dealing with nonlinear constraints: constrained optimization problems are transformed into unconstrained problems and then solved. Widely used methods include penalty methods, barrier methods, sequential quadratic programming methods, and Lagrangian methods. Among these, Lagrangian methods are general and robust, and they can achieve high solution precision. Because of these advantages, we use a variation of Lagrangian methods to handle nonlinear constraints in this thesis.

In our Lagrangian methods, we first convert inequality constraints into equality constraints and then search for saddle points of the Lagrangian function by performing descents in the original variable space and ascents in the Lagrange-multiplier space. There are two methods for converting inequalities: the slack-variable method and the MaxQ method. They lead to different Lagrangian formulations. The convergence speed of the slack-variable method is usually faster than that of the MaxQ method. However, the slack-variable method may oscillate or diverge under certain conditions and may not converge to a saddle point. The MaxQ method does not have this problem, but its convergence may be slow.

We have discovered that, through dynamic control of the relative weights of the objective and the constraints, we can reduce or eliminate oscillations and divergence of the slack-variable method. We have developed various mechanisms to control the relative weight adaptively during a search. In applying the Lagrangian method with adaptive control to application problems, we are able to achieve faster and more robust convergence.

Discrete optimization problems can be solved by either exact or inexact methods. For NP-hard problems, exact methods such as branch-and-bound are computationally intractable on large discrete instances. Inexact methods use heuristics to search for solutions and offer no optimality guarantee. However, inexact methods have been applied to much larger problems than exact methods can handle and have found good solutions.

We have taken the inexact approach to solving large problems and have developed the discrete Lagrangian method (DLM), a Lagrange-multiplier-based optimization method for solving discrete constrained optimization problems. A Lagrangian function of discrete variables is formed by combining the objective and the constraints using Lagrange multipliers. Saddle points are then sought by performing descents in the original variable space and ascents in the Lagrange-multiplier space. In contrast to continuous problems, in which descents are based on continuous gradients, heuristic discrete descents are performed by hill-climbing.

The relative magnitudes of the objective and the constraints affect the performance of the discrete Lagrangian method. We control their relative magnitudes by associating weight coefficients with them and adjusting the weights dynamically during the search. In our experiments on difficult discrete problems, we have found that appropriate dynamic control of the relative weights can improve the performance of the discrete Lagrangian method by significantly reducing the time to find saddle points.

OVERCOMING LOCAL MINIMA

General nonlinear optimization problems are difficult to solve due to the large number of local minima in the search space. Good local minima are difficult for local search methods to find because such methods stop at the first local minimum reached. Hence, to obtain globally optimal solutions, global search methods have been developed that escape from a local minimum once the search gets there and then continue the search.

In this chapter, we address an important issue in solving nonlinear optimization problems: overcoming local minima. We first review existing nonlinear local and global optimization methods and identify their advantages and disadvantages. Then we propose a new multi-level global search method that combines global and local searches. We present the overall framework of this method and discuss each component in detail. Our method employs an innovative trace-based global-search method that searches the space continuously and escapes from local minima without restarts. In the end, we present Novel, a prototype implementing our proposed methods for nonlinear continuous and discrete optimization.

Previous Work on Global Optimization

Due to the importance of optimal solutions in engineering, economics, and social-science applications, many optimization methods have been developed. In recent decades, as computers have become more powerful, numerical optimization algorithms have been developed for many applications.

Nonlinear unconstrained optimization methods:
    Local optimization (descent) methods
        Zero-order methods: simplex search; Hooke and Jeeves method; conjugate-direction method
        First-order methods: gradient descent; quasi-Newton methods; conjugate-gradient method
        Second-order methods: Newton's method; trust-region method; Levenberg-Marquardt's method
    Global optimization methods

Figure: Classification of local optimization methods for unconstrained nonlinear optimization problems.

Solution methods for nonlinear optimization problems can be classified into local and global methods. Local optimization methods, such as gradient descent and Newton's methods, use local information (the gradient or the Hessian) to perform descents and converge to a local minimum. They can find local minima efficiently and work best on unimodal problems. Global methods, in contrast, employ heuristic strategies to look for global minima and do not stop after finding a local minimum; a taxonomy of global optimization methods can be found in the literature. Note that gradients and Hessians can be used in both local and global methods.

Figure shows a classification of local optimization methods for unconstrained nonlinear optimization problems. The methods can be broadly classified as zero-order, first-order, and second-order methods, based on the derivative information used during the search. Generally, higher-order methods converge to local minima faster.

Zero-order methods do not use derivatives of the objective function during optimization. Examples are the simplex search method, the Hooke and Jeeves method, the Rosenbrock method, and the conjugate-direction method.

First-order methods use first-order derivatives of the objective function during the search. Examples are the gradient-descent method, the discrete Newton's method, quasi-Newton methods, and conjugate-gradient methods. The gradient-descent method performs a line search along the direction of the negative gradient of the minimized function. The discrete Newton's method approximates the Hessian matrix by finite differences of the gradient. Quasi-Newton methods approximate the curvature of the nonlinear function using information about the function and its gradient only, avoiding the explicit evaluation of the Hessian matrix. Conjugate-gradient methods combine the current gradient with the gradients of previous iterations and the previous search direction to form the new search direction; they generate search directions without storing a matrix.
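As a concrete instance of the first-order idea, here is a small gradient-descent sketch with a backtracking (Armijo) line search on an assumed quadratic test function (nothing here is code from the thesis):

```python
def grad_descent(f, grad, x0, iters=100, c=1e-4):
    """Gradient descent with backtracking line search: halve the step
    until the Armijo sufficient-decrease test passes."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        gg = sum(gi * gi for gi in g)         # squared gradient norm
        if gg < 1e-18:                        # (near-)stationary point
            break
        step = 1.0
        while step > 1e-12:
            cand = [xi - step * gi for xi, gi in zip(x, g)]
            if f(cand) <= f(x) - c * step * gg:
                break                         # sufficient decrease
            step *= 0.5
        x = cand
    return x

f = lambda v: (v[0] - 1.0) ** 2 + 4.0 * v[1] ** 2
grad = lambda v: [2.0 * (v[0] - 1.0), 8.0 * v[1]]
x_min = grad_descent(f, grad, [5.0, 3.0])     # approaches (1, 0)
```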

Second-order methods make use of second-order derivatives. They include Newton's method, Levenberg-Marquardt's method, and trust-region methods. In Newton's method, the inverse of the Hessian matrix multiplies the gradient, and a suitable search direction is found based on a quadratic approximation of the function. Newton's method converges quadratically if the initial point is close to a local minimum. Levenberg-Marquardt's method and trust-region methods are modifications of Newton's method. By using either a line-search or a trust-region approach, these algorithms converge even when their starting point is not close to a minimum. Line-search and trust-region techniques are suitable if the number of variables is not too large. Truncated Newton's methods are more suitable for problems with a large number of variables. They use iterative techniques to obtain a direction in a line-search method or a step in a trust-region method; the iteration is stopped (truncated) as soon as a termination criterion is satisfied.

Local optimization methods converge to local minima. For some applications, local optima may be good enough, particularly when the user can draw on his or her own experience and provide a good starting point for local optimization algorithms. However, for many applications, globally optimal or near-optimal solutions are desired.

In nonlinear optimization, objective functions are multimodal, with many local minima. Local search methods converge to local minima close to their initial points; therefore, the solution quality depends heavily on the initial point picked. When the objective function is highly nonlinear, local-search methods may return solutions much worse than the global optima when started from a random initial point.

Nonlinear optimization methods:
    Local optimization methods
    Global optimization methods
        Deterministic methods
            Covering methods
            Generalized descent methods: trajectory and penalty methods; trace-based method
        Probabilistic methods
            Clustering methods
            Methods based on stochastic models

Figure: Classification of unconstrained nonlinear continuous global minimization methods.

To overcome the deficiencies of local search methods, global search methods have been developed with explicit global-search mechanisms. Global search methods use local search to determine local minima and focus on bringing the search out of a local minimum once it gets there.

Figure shows a classification of nonlinear global optimization methods. Methods for solving global optimization problems have been classified as either probabilistic or deterministic. Probabilistic (stochastic) methods evaluate the objective function at points randomly sampled from the solution space. Deterministic methods, on the other hand, involve no element of randomness.

Alternatively, global optimization algorithms can be classified as reliable or unreliable. Reliable methods guarantee solution quality, while unreliable methods do not. Probabilistic methods, including simulated annealing, clustering, and random search, fall into the unreliable category. Unreliable methods usually have the strengths of efficiency and better performance in solving large-scale problems.

Table: Global and local search components used in existing global optimization methods.

    Method                 Global Component            Local Component
    Random search          Uniform sampling            Any available local method
    Genetic algorithm      Selective recombination     Optional
    Simulated annealing    Boltzmann motion            Optional
    Clustering method      Cluster analysis            Any available local method
    Bayesian method        Bayesian decision           Optional
    Interval method        Interval calculation        Rarely used
    Covering method        Informed search with        Rarely used
                           bound approximation
    Generalized gradient   Traveling trajectory        Rarely used

The survey that follows focuses on features of methods for solving continuous constrained problems. Whenever possible, we analyze the balance that each algorithm strikes between global search and local refinement and relate this balance to its performance. The table above summarizes this balance for a number of popular global optimization methods.

Deterministic Global Optimization Methods

Many deterministic methods have been developed in the past. Some of them apply deterministic heuristics, such as modifying the search trajectory in trajectory methods and adding penalties in penalty-based methods, to bring a search out of a local minimum. Other methods, like branch-and-bound and interval methods, partition a search space recursively into smaller subspaces and exclude regions containing no optimal solution. These methods do not work well when the search space is too large for deterministic methods to cover adequately.

(a) Covering methods. Covering methods detect subregions not containing the global minimum and exclude them from further consideration. Covering methods provide a guarantee of solution quality and approximate the global solution by iteratively using tighter bounds. Obtaining a solution with guaranteed accuracy implies an exhaustive search of the solution space for the global minimum. Hence, these methods can be computationally expensive, with a computation time that increases dramatically as the problem size increases.

Branch-and-bound methods, including interval methods, are examples of covering methods. Branch-and-bound methods evaluate lower bounds on the objective function over subspaces. They allow an assessment of the quality of the local minima obtained. Combined with computationally verifiable sufficient conditions for global optimality, they allow one to actually prove global optimality of the best solution obtained.

Covering methods are reliable since, to the extent they work, they have built-in guarantees of solution quality. However, they require some global properties of the optimization problem, such as the Lipschitz condition. In the worst case, covering methods take an exponential amount of work. Many of the heuristic techniques used for searching global solutions can be adapted to, or combined with, a branch-and-bound approach to take advantage of structural insights of specific applications.

(b) Generalized descent methods. These methods continue the search trajectory every time a local solution is found. There are two approaches. First, trajectory methods modify the differential equations describing the local-descent trajectory so that they can escape from local minima. Their disadvantage is the large number of function evaluations spent in unpromising regions. Second, penalty methods prevent multiple determination of the same local minima by modifying the objective function, namely by introducing a penalty term on each local minimum. Their problem is that, as more local minima are found, the modified objective function becomes more difficult to minimize. In existing generalized descent methods, the descent trajectory is modified using internal function information, e.g., local minima found along the search.

We propose a new global search method, the trace-based method, in this chapter. Our trace-based method belongs to the class of generalized descent methods because it modifies the descent trajectory. The difference between it and existing trajectory methods is that our trace-based method uses external information to modify the search trajectory. A function-independent trace function is used to guide the search throughout the search space and pull the search out of local minima.

Alternatively, deterministic methods can be classified into two categories: (a) point-based methods, which compute function values at sampled points, such as generalized descent methods, and (b) region-based methods, which compute function bounds over compact sets, such as covering methods. Point-based methods are unreliable but usually have less computational complexity. Region-based methods are expensive but can produce rigorous global optimization solutions when they are applicable.

Probabilistic Global Optimization Methods

Probabilistic global minimization methods rely on probability to make decisions. The simplest probabilistic algorithm uses restarts to bring a search out of a local minimum when little improvement can be made locally. Advanced methods use more elaborate techniques. We classify probabilistic methods into clustering methods, random-search methods, and methods based on stochastic models.

(a) Clustering methods. Clustering analysis is used to prevent the redetermination of already-known local minima. There are two strategies for grouping points around a local minimum: (i) retain only points with relatively low function values; (ii) push each point toward a local minimum by performing a few steps of a local search. These methods do not work well when the function terrain is very rugged or when the search gets trapped in a deep but suboptimal valley.

(b) Random search methods. These include pure random search, single-start, multistart, random line search, adaptive random search, partitioning into subsets, replacing the worst point, evolutionary algorithms, and simulated annealing.

Simulated annealing and genetic algorithms are two popular stochastic global optimization methods. Simulated annealing takes its intuition from the fact that heating and slowly cooling (annealing) a piece of metal brings it into a more uniformly crystalline state, which is believed to be the state where the free energy of bulk matter takes its global minimum. The role of temperature is to allow the configurations to reach lower energy states with a probability given by Boltzmann's exponential law, so that they can overcome energy barriers that would otherwise force them into local minima. Simulated annealing is provably convergent asymptotically in a probabilistic sense but may be exceedingly slow. Various ad hoc enhancements make it much faster. Simulated annealing has been successfully applied to solve many nonlinear optimization problems.
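The Boltzmann acceptance rule just described can be sketched as follows; the test function, step size, cooling schedule, and iteration budget are illustrative choices, not those used in this thesis.

```python
import math
import random

def simulated_annealing(f, x0, step=0.2, t0=1.0, cooling=0.995, iters=5000, seed=0):
    """Minimize f by simulated annealing. A worse move (delta > 0) is
    accepted with Boltzmann probability exp(-delta / T), which lets the
    search climb out of local minima while the temperature T is high."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    temp = t0
    for _ in range(iters):
        cand = x + rng.uniform(-step, step)
        fc = f(cand)
        delta = fc - fx
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = x, fx
        temp *= cooling  # geometric (ad hoc) cooling schedule
    return best, fbest

# Hypothetical multimodal test function; its global minimum is f(0) = 0.
f = lambda x: x * x + 2.0 * math.sin(5.0 * x) ** 2
xbest, fbest = simulated_annealing(f, x0=1.5)
```

As the temperature decreases, uphill moves become rare and the search settles into one basin, which is why the cooling rate trades solution quality against speed.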

Genetic algorithms make use of analogies to biological evolution by allowing mutations and crossovers between candidates of good local optima in the hope of deriving even better ones. At each stage, a whole population of configurations is stored. Mutations are performed as local search, whereas crossover operators provide the ability to leave regions of attraction of local minimizers. With high probability, the crossover rules produce offspring of similar or even better fitness. The efficiency of a genetic algorithm depends on the proper selection of crossover rules. The effect of interchanging coordinates is beneficial mainly when these coordinates have a nearly independent influence on the fitness, whereas if their influence is highly correlated, such as for functions with deep and narrow valleys not parallel to the coordinate axes, genetic algorithms have more difficulties. Successful tuning of genetic algorithms requires a considerable amount of insight into the nature of the problem at hand. Genetic algorithms have shown promising results in solving nonlinear optimization problems.
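A minimal real-coded genetic algorithm illustrating coordinate-interchanging crossover on a separable function (the favorable case discussed above) might look as follows; the population size, mutation scale, and test function are illustrative assumptions.

```python
import math
import random

def genetic_minimize(f, lo, hi, pop_size=30, gens=80, pm=0.3, seed=1):
    """Minimize f over a box with a tiny real-coded GA: keep the fitter
    half, create children by uniform crossover (interchanging coordinates
    of two parents), and occasionally mutate one coordinate."""
    rng = random.Random(seed)
    dim = len(lo)
    pop = [[rng.uniform(lo[d], hi[d]) for d in range(dim)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=f)                      # smaller f means fitter
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [a[d] if rng.random() < 0.5 else b[d]
                     for d in range(dim)]    # uniform crossover
            if rng.random() < pm:            # mutation as coarse local search
                d = rng.randrange(dim)
                child[d] = min(hi[d], max(lo[d], child[d] + rng.gauss(0.0, 0.3)))
            children.append(child)
        pop = parents + children
    best = min(pop, key=f)
    return best, f(best)

# Separable test function: each coordinate influences f independently,
# the favorable case for this crossover; the minimum is f(0, 0) = 0.
f = lambda x: sum(v * v - math.cos(4.0 * v) + 1.0 for v in x)
best, fb = genetic_minimize(f, [-3.0, -3.0], [3.0, 3.0])
```

Because the function is separable, a good first coordinate found in one parent and a good second coordinate found in another can be combined into a single better child, which is exactly the effect the text attributes to interchanging coordinates.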

Random search methods are easy to understand and simple to realize. The simplest random algorithm uses restarts to bring a search out of a local minimum. Others, like simulated annealing, rely on probability to indicate whether a search should ascend from a local minimum. Other stochastic methods rely on probability to decide which intermediate points to interpolate as new starting points, as in random recombinations and mutations in genetic algorithms. These algorithms are weak in either their local or their global search. For instance, gradient information useful in local search is not used well in simulated annealing and genetic algorithms. In contrast, gradient-descent algorithms with multistarts are weak in global search. These methods perform well for some applications. However, they usually have many problem-specific parameters, leading to low efficiency when improperly applied.

(c) Methods based on stochastic models. Most of these methods use random variables to model unknown values of an objective function. One example is the Bayesian method, which is based on a stochastic function and minimizes the expected deviation of the estimate from the real global minimum. Bayesian methods do not work well because most of the samples they collect randomly from the error surface are close to the average error value, and these samples are inadequate to model the behavior at minimal points. Other methods based on stochastic models include methods that approximate the level sets. Although very attractive theoretically, this class of methods is too expensive to be applied to problems with more than twenty variables.

To date, general nonlinear optimization algorithms can at best find good local minima of a multimodal optimization problem. Only in cases with restrictive assumptions, such as the Lipschitz condition, can algorithms with guaranteed accuracy be constructed.

Existing global optimization methods are weak in either their local or their global search. To address this problem, we have developed a new global search method consisting of a combination of global and local searches. Our method has the following key features of a good global search algorithm:

- It identifies promising regions efficiently.
- It uses gradient information to descend into local minima and is able to adjust to changing gradients.
- It escapes from a local minimum once the search gets there. It leaves regions of attraction of local minimizers in a continuous fashion, without restarting from one local minimum to another.
- It avoids redetermination of the same local minima.

Our method is based on a new trace-based global-search method in which a deterministic external force guides the search throughout the space and pulls the search trajectory out of local minima once it gets there. This method is presented later in this chapter.

Global Search in Constrained Optimization

At the level of global search, strategies for solving constrained problems are similar to those for solving unconstrained problems, except in the handling of constraints. Deterministic and probabilistic global search methods have been applied to solve constrained optimization problems in conjunction with both non-transformational and transformational constraint-handling approaches.

When constraints are handled using transformational approaches, constrained optimization problems are transformed into one or a series of unconstrained optimization problems. Global search methods, either deterministic or probabilistic, can then be applied to solve the transformed problems. For example, multistart of gradient descent, genetic algorithms (GAs), and simulated annealing (SA) have been applied to solve penalty or barrier functions. In Lagrangian methods, multistarts of local searches find saddle points from randomly generated initial points.

Many global search methods also work with non-transformational approaches to handling constraints. For example, in pure random search with rejection (discarding) of infeasible points, points in the search space are randomly generated according to some probability distribution, and the best feasible point found is remembered. This method performs a blind search through the search space with no local refinement. In multistarts, points (feasible points if possible) in the search space are randomly generated and are used as initial points for local improvement. When working directly on constraints, GAs maintain a population of sample points and generate new sample points based on both constraint satisfaction and objective values of existing sample points. SA performs an adaptive stochastic search, and the acceptance rates of search points are determined by a combination of constraint satisfaction and objective value.
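Pure random search with rejection of infeasible points can be sketched as follows; the toy constrained problem is hypothetical.

```python
import random

def random_search_reject(f, feasible, sample, trials=5000, seed=2):
    """Pure random search for a constrained problem: generate random
    points, reject (discard) infeasible ones, and remember the best
    feasible point found. No local refinement is performed."""
    rng = random.Random(seed)
    best, fbest = None, float("inf")
    for _ in range(trials):
        x = sample(rng)
        if not feasible(x):                  # rejection of infeasible points
            continue
        fx = f(x)
        if fx < fbest:
            best, fbest = x, fx
    return best, fbest

# Hypothetical toy problem: minimize x + y subject to x^2 + y^2 <= 1;
# the constrained optimum is -sqrt(2) at roughly (-0.707, -0.707).
f = lambda p: p[0] + p[1]
feasible = lambda p: p[0] ** 2 + p[1] ** 2 <= 1.0
sample = lambda rng: (rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0))
best, fbest = random_search_reject(f, feasible, sample)
```

The blindness of the search is visible here: every sample is independent of the previous ones, so nothing steers the points toward the feasible region or toward low objective values.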

Our new global search method handles constraints using the transformational approach. Constrained optimization problems are transformed into Lagrangian functions using Lagrange multipliers. Saddle points of the Lagrangian functions are then searched using local and global search methods.

Problem Specification and Transformation

In the rest of this chapter, we present our new nonlinear global search method. This method solves general nonlinear continuous and discrete, constrained and unconstrained optimization problems. By first transforming nonlinear constrained optimization problems, using Lagrange multipliers, into unconstrained optimization formulations, both constrained and unconstrained problems are solved in a unified framework.

As defined in an earlier chapter, an n-dimensional unconstrained optimization problem is

    \min_{x \in R^n} f(x), \qquad x = (x_1, x_2, \ldots, x_n)^T

where x is an n-dimensional vector, f(x) is the objective function, and T stands for transpose. The gradient of function f(x), \nabla f(x), is an n-dimensional vector as follows:

    \nabla f(x) = \left( \frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \ldots, \frac{\partial f(x)}{\partial x_n} \right)^T

An n-dimensional constrained optimization problem is defined as

    \min_{x \in R^n} f(x)
    \text{s.t.} \quad h_i(x) = 0, \quad i = 1, \ldots, m_1
    \qquad\quad g_j(x) \le 0, \quad j = 1, \ldots, m_2

where there are m_1 equality constraints and m_2 inequality constraints.

Constrained optimization problems are transformed into Lagrangian formulations. Inequality constraints are handled using the slack-variable method or the MaxQ method, as discussed in an earlier chapter. Using the slack-variable method, the corresponding augmented Lagrangian function is

    L(x, \lambda, \mu) = f(x) + \lambda^T h(x) + \frac{1}{2} \|h(x)\|^2 + \frac{1}{2} \sum_{j=1}^{m_2} \left[ \max^2(0,\; \mu_j + g_j(x)) - \mu_j^2 \right]

where \lambda \in R^{m_1} and \mu \in R^{m_2} are Lagrange multipliers. Using the MaxQ method, the augmented Lagrangian function has the following form:

    L(x, \lambda, \mu) = f(x) + \lambda^T h(x) + \frac{1}{2} \|h(x)\|^2 + \sum_{j=1}^{m_2} \mu_j \max^q(0,\; g_j(x)) + \sum_{j=1}^{m_2} \max^{2q}(0,\; g_j(x))

The Lagrangian function L(x, \lambda) is a scalar function with gradient \nabla L(x, \lambda):

    \nabla L(x, \lambda) = \left( \nabla_x L(x, \lambda),\; \nabla_\lambda L(x, \lambda) \right) = \left( \frac{\partial L}{\partial x_i},\; \frac{\partial L}{\partial \lambda_j} \right), \quad i = 1, \ldots, n \;\text{and}\; j = 1, \ldots, m

Our method is a first-order method because only first-order derivatives, as well as the objective and the Lagrangian function values, are used during the search.
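As a sketch of the multiplier-based transformation, the following code evaluates one common augmented-Lagrangian form and runs a purely first-order search on a toy equality-constrained problem: descent on x via finite-difference gradients, then ascent on the multiplier. The exact inequality terms of the thesis's slack-variable and MaxQ variants may differ; the penalty weight, step sizes, and test problem are illustrative.

```python
def aug_lagrangian(f, h_list, g_list, x, lam, mu, c):
    """One common augmented-Lagrangian form: multiplier-plus-quadratic
    terms for equalities h(x) = 0 and max(0, .)-based terms for
    inequalities g(x) <= 0 (the thesis's exact variants may differ)."""
    val = f(x)
    for hi, li in zip(h_list, lam):
        v = hi(x)
        val += li * v + 0.5 * c * v * v
    for gj, mj in zip(g_list, mu):
        v = max(0.0, mj + c * gj(x))
        val += (v * v - mj * mj) / (2.0 * c)
    return val

def grad(F, x, eps=1e-6):
    """First-order information only: central finite differences."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        g.append((F(xp) - F(xm)) / (2.0 * eps))
    return g

# Toy problem: minimize x0^2 + x1^2 subject to x0 + x1 = 1;
# the saddle point is x = (0.5, 0.5) with multiplier -1.
f = lambda x: x[0] ** 2 + x[1] ** 2
h = [lambda x: x[0] + x[1] - 1.0]

x, lam = [0.0, 0.0], [0.0]
for _ in range(50):                          # outer multiplier-method sweeps
    L = lambda z: aug_lagrangian(f, h, [], z, lam, [], c=10.0)
    for _ in range(200):                     # inner gradient descent on x
        g = grad(L, x)
        x = [xi - 0.02 * gi for xi, gi in zip(x, g)]
    lam[0] += 10.0 * h[0](x)                 # ascent update on the multiplier
```

Descent in x combined with ascent in the multiplier drives the iterate toward a saddle point of L, which corresponds to a constrained local minimum of the original problem.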

In an earlier figure, we have shown how various forms of nonlinear optimization problems are solved in a unified framework. Unconstrained optimization problems are solved directly by our global search method without any transformation. Constrained optimization problems, whether discrete or continuous, are first transformed into Lagrangian formulations, in which the constraints are combined with the objective function using Lagrange multipliers. For a decision problem without an objective function, an artificial objective is introduced. Hence, our global search method works on the Lagrangian function and finds saddle points that correspond to globally optimal or near-optimal solutions of the original constrained problems.

A New Multilevel Global Search Method

In this section, we propose a new global search method with the focus of overcoming local minima in a unique, deterministic, and continuous fashion.

Our method employs a combination of global and local searches. A global search is applied to locate promising regions, in which local searches are invoked to find the local minima. The global search is further divided into a coarse-level and a fine-level global search. A coarse-level global search quickly identifies promising subspaces in a large, high-dimensional search space. Then a fine-level global search applies a trace-based method to perform a more detailed search. The trace-based global-search method uses a problem-independent trace function to guide the search throughout the search space and pulls a search out of a local optimum (a local saddle point in a constrained optimization and a local minimum in an unconstrained optimization) without having to restart it from a new starting point. Promising local minimal regions are located by the fine-level global search. Starting from initial points provided by the fine-level global search, a local search finds local minima that contain globally optimal or near-optimal solutions. Our method strikes a balance between global and local searches.

Figure: Framework of the new multilevel global search method, with combined global-search and local-search phases. (The global-search phase consists of a coarse-level global search followed by a fine-level, trace-based global search; the local-search phase applies local descent methods and outputs optimal or near-optimal solutions.)

The figure above shows the overall framework of our global search method. It consists of a global-search phase and a local-search phase. In the global-search phase, a coarse-level global search is followed by a fine-level global search. The search space of a high-dimensional optimization problem can be enormous, making it impossible to examine the whole space exhaustively. Usually, only a small portion of the search space can be searched given a reasonable amount of time. The task of the coarse-level global search is to quickly identify promising subregions in a large, high-dimensional search space. Outputs of the coarse-level search are used as initial points in the fine-level global search.

Existing global search methods, such as simulated annealing (SA), genetic algorithms (GAs), and multistart of descents, are good at finding promising subregions given a limited amount of time. We apply them in the coarse-level search. In applying SA, we impose a fast annealing schedule and a limited number of iterations in order for SA to converge quickly. In applying GA, we set a limit on the number of generations and select the best solutions in the final population as outputs. In applying multistart of descents, we perform a limited number of descent steps from randomly generated sample points and select the best descent results as outputs.
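The limited-descent multistart just described can be sketched as follows; the number of starts, descent steps, step size, and test function are illustrative assumptions.

```python
import random

def coarse_multistart(f, grad_f, dim, n_starts=20, steps=15, lr=0.05,
                      lo=-5.0, hi=5.0, n_keep=3, seed=3):
    """Coarse-level search by multistart of limited descents: from random
    sample points, perform only a few descent steps and keep the best
    results as initial points for the fine-level search."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_starts):
        x = [rng.uniform(lo, hi) for _ in range(dim)]
        for _ in range(steps):               # limited descent, not run
            g = grad_f(x)                    # to full convergence
            x = [xi - lr * gi for xi, gi in zip(x, g)]
        results.append(x)
    results.sort(key=f)
    return results[:n_keep]                  # promising starting points

# Hypothetical smooth test function with its minimum at the origin.
f = lambda x: x[0] ** 2 + x[1] ** 2
grad_f = lambda x: [2.0 * x[0], 2.0 * x[1]]
starts = coarse_multistart(f, grad_f, dim=2)
```

Cutting descent off after a few steps keeps the budget per start small, so many subregions can be probed; the fine-level search is then responsible for the detailed exploration.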

As an alternative to random multistarts, we have developed a trajectory-based method to perform the coarse-level search. It is based on a 1-D parametric function that traverses the search space. Sample points along the trajectory with small objective values are used as starting points to perform limited steps of descents. The best solutions after the descents are output.

Besides search methods, domain knowledge is another alternative to help identify promising search regions. For well-understood problems, users may provide good initial points directly for the fine-level global search. Heuristic methods, such as greedy search methods, can also be applied to generate initial points for the fine-level global search.

Starting from the initial points provided by the coarse-level search, the fine-level global search examines subregions in more detail. We apply an innovative trace-based global-search method, to be presented next, that is a variation of the generalized descent method. In contrast to existing trajectory-based global search methods, our trace-based method modifies the search trajectory using external information. The search trajectory is formed by the combined force of a problem-independent trace function and the local gradient. The trace function is used to guide the search throughout the search space and pull the search out of a local optimum without having to restart it from a new starting point. Thus, our trace-based search method overcomes local minima in a continuous and deterministic manner. Along the search trajectory, points corresponding to promising local minimal regions are fed to the local-search phase.

The local-search phase performs local descents starting from initial points provided by the global-search phase. Existing local-descent methods, including gradient descent, quasi-Newton methods, and conjugate gradient methods, are applied. The best solutions after the local descents are reported as the final solutions.

Trace-based Global Search

Our trace-based global search method relies on an external force to pull the search out of a local minimum and employs local descents to locate local minima. It has three features: exploring the solution space, locating promising regions, and finding local minima. In exploring the solution space, the search is guided by a continuous, terrain-independent trace function that does not get trapped in local minima. This trace function is usually an aperiodic, continuous function that runs in a bounded space. In locating promising regions, the trace-based method uses the local gradient to attract the search to a local minimum but relies on the trace to pull it out of the local minimum once little improvement can be found. Finally, the trace-based method selects one initial point for each promising local region and uses them as initial points for a descent algorithm to find local minima.

General Framework

The figure below illustrates the process of our trace-based global search. The search process uncovers promising local optimal regions without going deep into each one. A problem-independent trace function leads the search throughout the search space and is responsible for pulling the search out of local minima once it gets there. Local minima unveil themselves through gradients. The global search trajectory is formed by the combination of two forces: a problem-independent force provided by the trace function and a problem-dependent force provided by the local gradient. These two counteracting forces, descent into local minima and attraction exerted by the trace, form a composite vector that represents the route taken by the trajectory. The left side of the figure shows how these two forces are combined to form the search trajectory.

Figure: In the trace-based global search, the search trajectory shows the combined effect of gradient descents and the pull exerted by the moving trace. Three cascaded global-search stages are shown, generating trajectories 1, 2, and 3, respectively. (Left side: the trace direction and the gradient direction combine into the moving trajectory direction.)

Generally, there can be a number of bootstrapping stages in a trace-based global search. One stage is coupled to the next stage by feeding its output trajectory as the trace of the next stage. The problem-independent trace is fed into the first stage. Three cascaded global search stages are shown in the figure above: the trace and the local gradient generate trajectory 1; then trajectory 1 and the local gradient generate trajectory 2, and so on. In the trace-based global search, the dynamics of each stage in the global-search phase is characterized by a system of ordinary differential equations (ODEs), which contains both problem-dependent information (the local gradient) and problem-independent information (a trace function).

Consider the optimization problem specified earlier. The system of ODEs that characterizes each stage of the trace-based global search has the following general form:

    \dot{x}(t) = P(\nabla_x f(x(t))) + Q(T(t), x(t))

where t is the only independent variable in the system of ODEs, the trace function T(t) is a function of t, and P and Q are some nonlinear functions. P(\nabla_x f(x)) enables the gradient to attract the trajectory to a local minimum, while Q(T(t), x(t)) allows the trace function to lead the trajectory. Solving the system of ODEs as an initial-value problem, from a given set of initial values of x, generates a search trajectory x(t) = (x_1(t), \ldots, x_n(t)).

P and Q can have various forms. A simple form used in our experiments is

    \frac{dx(t)}{dt} = -\mu_g \nabla_x f(x(t)) - \mu_t \left( x(t) - T(t) \right)

where \mu_g and \mu_t are coefficients specifying the weights for descent into local minimum regions and for exploration of the broader space, respectively. Note that the first component on the right-hand side is actually gradient descent, and the second component forces x(t) to follow the trace function T(t).

Generally, the weights \mu_g and \mu_t can have different values in different global stages. For example, \mu_t can be set to large values relative to \mu_g in the first stage, so that global search is emphasized. In later stages, \mu_g can take larger values, and the search is focused on a local region. In the simplest case, \mu_g and \mu_t are set to constants and are the same in all stages of the global-search phase.
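As a sketch of how a single global-search stage can be integrated numerically, the following forward-Euler loop implements dynamics of the form dx/dt = -mu_g * grad f(x) - mu_t * (x - T(t)) on a hypothetical one-dimensional multimodal function, with a linear left-to-right sweep as the trace. The thesis solves the ODEs with LSODE; Euler integration, the test function, and all coefficient values here are illustrative assumptions.

```python
import math

def trace_search_stage(grad_f, trace, x0, mu_g=1.0, mu_t=5.0, dt=0.01, t_end=20.0):
    """One global-search stage, dx/dt = -mu_g*grad f(x) - mu_t*(x - T(t)),
    integrated by forward Euler (the thesis uses the ODE solver LSODE)."""
    x, t, traj = list(x0), 0.0, []
    while t < t_end:
        g = grad_f(x)
        T = trace(t)
        x = [xi + dt * (-mu_g * gi - mu_t * (xi - Ti))
             for xi, gi, Ti in zip(x, g, T)]
        traj.append((t, list(x)))
        t += dt
    return traj

# Hypothetical 1-D multimodal objective, f(x) = x^2 + sin(5x), and a slow
# sweep of the trace across [-2, 2]; the pull term drags the trajectory
# through several basins instead of leaving it in the first one reached.
grad_f = lambda x: [2.0 * x[0] + 5.0 * math.cos(5.0 * x[0])]
trace = lambda t: [-2.0 + 4.0 * (t / 20.0)]
traj = trace_search_stage(grad_f, trace, x0=[-2.0])
```

Because the trace term never vanishes, the trajectory dips into each basin it passes (where the gradient dominates) but is pulled out again as T(t) moves on, which is the continuous escape mechanism described above.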

When there are several stages in the trace-based global search, one stage is coupled to the next by feeding its output trajectory as the trace function of the next stage, with a predefined, problem-independent function as the input trace function of the first stage. The stages are thus cascaded together. Each stage is characterized by a system of ODEs of the form given above, but with a different T(t): T(t) in the first stage is an external trace function, while T(t) of each later stage is the output trajectory x(t) of the previous stage. In our implementation, when the output trajectory is a set of discrete sample points of the continuous trajectory, interpolation is performed to form a continuous T(t) for each stage.
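The coupling of stages by interpolation can be sketched as follows: the sampled output trajectory of one stage is turned into a continuous trace by linear interpolation and fed to the next stage, which puts more weight on descent. The stage dynamics, the quadratic test function, and the weights are illustrative assumptions, not the thesis's actual configuration.

```python
import math

def interpolate(samples):
    """Turn discrete trajectory samples [(t, x), ...] into a continuous
    trace T(t) by linear interpolation, so that one stage's output can
    serve as the next stage's trace function."""
    def T(t):
        if t <= samples[0][0]:
            return samples[0][1]
        for (t0, x0), (t1, x1) in zip(samples, samples[1:]):
            if t <= t1:
                w = (t - t0) / (t1 - t0)
                return [a + w * (b - a) for a, b in zip(x0, x1)]
        return samples[-1][1]
    return T

def stage(grad_f, T, x0, mu_g, mu_t, dt=0.01, t_end=10.0):
    """Forward-Euler integration of dx/dt = -mu_g*grad f - mu_t*(x - T(t))."""
    x, t, out = list(x0), 0.0, []
    while t < t_end:
        g = grad_f(x)
        Tt = T(t)
        x = [xi + dt * (-mu_g * gi - mu_t * (xi - Ti))
             for xi, gi, Ti in zip(x, g, Tt)]
        out.append((t, list(x)))
        t += dt
    return out

# Cascade: external trace -> trajectory 1 -> (interpolated, as trace) ->
# trajectory 2, with more weight on descent (larger mu_g) in the later
# stage, so trajectory 2 hugs the minimum of f(x) = x^2 more closely.
grad_f = lambda x: [2.0 * x[0]]
T0 = lambda t: [math.cos(t)]                 # problem-independent trace
traj1 = stage(grad_f, T0, [1.0], mu_g=0.5, mu_t=4.0)
traj2 = stage(grad_f, interpolate(traj1), [1.0], mu_g=2.0, mu_t=2.0)
```

Each cascade level damps the excursions of the previous one, which is why later trajectories descend deeper into the local basins identified earlier.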

The figure below shows a three-stage trace-based global-search method. In general, the functions P and Q in each stage of the trace-based method can be different. In earlier stages, more weight can be placed on the trace function, allowing the resulting trajectory to explore more regions. In later stages, more weight can be placed on local descents, allowing the trajectory to descend deeper into local basins. Note that all the equations in the trace-based global search could be combined into a single equation before being solved. We did not do so because each trajectory may identify new starting points that lead to better local minima.

Figure: Framework of a three-stage trace-based global search method. (Global-search phase: stage 1 computes \dot{X}(t) = P(\nabla_x f(X(t))) + Q(T(t), X(t)); stage 2 computes \dot{Y}(t) = P(\nabla_x f(Y(t))) + Q(X(t), Y(t)); stage 3 computes \dot{Z}(t) = P(\nabla_x f(Z(t))) + Q(Y(t), Z(t)). Local-search phase: select starting points from X(t), Y(t), and Z(t), and apply gradient descent from these points to find local minima.)

In the local-search phase, a traditional descent method, such as gradient descent, conjugate gradient, or a quasi-Newton method, is applied to find local minima. Initial points for the local search are selected based on the trajectories output by the trace-based global search. Two heuristics can be applied in selecting initial points: use the best solutions in periodic time intervals as initial points, or use the local minima along the trajectory in each stage as initial points. As shown in the figure above, starting points for the local search are selected from the search trajectories x(t), y(t), and z(t) of the global-search phase. Descent methods are applied to find local minima starting from these initial points.
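The second selection heuristic, picking points whose objective value is a local minimum along the trajectory, can be sketched as follows; the synthetic trajectory and test function are hypothetical.

```python
def select_starts(traj, f, k=3):
    """Selection heuristic: keep points whose objective value is a local
    minimum along the trajectory, then return the k best of them as
    starting points for the local-search phase."""
    vals = [f(x) for _, x in traj]
    found = []
    for i in range(1, len(vals) - 1):
        if vals[i] < vals[i - 1] and vals[i] <= vals[i + 1]:
            found.append((vals[i], traj[i][1]))
    found.sort(key=lambda p: p[0])
    return [x for _, x in found[:k]]

# Hypothetical trajectory sweeping x from -2 to 2; f has two basins, so
# the dips of f along the sweep sit near x = -1 and x = 1.
f = lambda x: (x[0] ** 2 - 1.0) ** 2
traj = [(0.01 * i, [-2.0 + 0.01 * i]) for i in range(401)]
starts = select_starts(traj, f)
```

Each dip along the trajectory flags one promising basin, so a full descent is launched only once per basin rather than from every sampled point.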

We have presented the formulation of our trace-based search method for unconstrained problems. Its formulation for constrained problems is similar. Our trace-based method solves constrained problems based on Lagrangian functions. Traditional Lagrangian methods perform local search by doing descents in the original-variable space and ascents in the Lagrange-multiplier space. The local search converges to a saddle point that corresponds to a local minimum of the original constrained problem. Our trace-based search modifies the descent trajectory in the original-variable space and uses a trace function to pull the search out of local minima.

Figure: Relationship between the trace-based global search and local optimization methods. (For unconstrained problems, gradient descent follows dx/dt = -\nabla_x f(x(t)), while the trace-based search follows dx/dt = -\mu_g \nabla_x f(x(t)) - \mu_t (x(t) - T(t)). For constrained problems, Lagrangian methods follow dx/dt = -\nabla_x L(x(t), \lambda(t)) and d\lambda/dt = \nabla_\lambda L(x(t), \lambda(t)), while the trace-based search follows dx/dt = -\mu_g \nabla_x L(x(t), \lambda(t)) - \mu_t (x(t) - T(t)) with the same multiplier dynamics. Local optimization is performed without the trace function; global optimization is performed with the trace function.)

In comparison to the formulation above for an unconstrained problem, the system of ODEs specifying a global-search stage for a constrained problem is as follows:

    \frac{dx(t)}{dt} = -\mu_g \nabla_x L(x(t), \lambda(t)) - \mu_t \left( x(t) - T(t) \right)
    \frac{d\lambda(t)}{dt} = \nabla_\lambda L(x(t), \lambda(t))

where \mu_g and \mu_t are weight coefficients, L(x(t), \lambda(t)) is the Lagrangian function, and T(t) is the trace function. A trace function is used in the original-variable space to overcome the local minima of the nonlinear objective. No trace function is necessary in the Lagrange-multiplier space, because the values of the Lagrange multipliers change adaptively to reflect the degree and duration of constraint violation during the search. A trace function in the Lagrange-multiplier space would break the relationship between a search point in the original-variable space and its corresponding Lagrange-multiplier values, which would make Lagrangian methods not work well.

In the figure above, we summarize the mathematical formulations of the trace-based search method for unconstrained and constrained continuous problems and show its relationship to local optimization methods. The figure indicates that our trace-based search is an extension of existing local search methods that performs global search.

An Illustrative Example

Let us illustrate the trace-based global search using a simple example based on one of Levy's benchmark problems. This example involves finding the global minimum of a function of two variables, x_1 and x_2, in a bounded region. The function is

    f(x_1, x_2) = \sum_i i \cos((i+1) x_1 + i) \cdot \sum_j j \cos((j+1) x_2 + j)

The figure below shows the 3-D plots and 2-D contour plots of this function, together with the search trajectories of the trace-based global search. The region of interest is a bounded square, covering the same interval in each dimension. The function has three local minima in this region, one of which is the global minimum. Using a search range covering this region in each dimension, we start Novel from a fixed initial point and run it until a fixed logical time t.

The figure below shows the trace function and the search trajectories of a three-stage trace-based global search. The plots in the first column show the trace function mapped on the 3-dimensional terrain and the 2-dimensional contour. Although the trace function visits all three basins, it only touches the edge of the basin containing the global minimum. The plots in the second column show the trajectory generated from stage 1. The search trajectory is pulled down toward the bottom of each local minimal basin. Likewise, the plots in the third and fourth columns show, respectively, that the trajectories are further pulled down into the local basins after stages 2 and 3. By following the trajectories, three basins with local minima are identified, and a set of minimal points in each trajectory can be used as initial points in the local-search phase. Clearly, all the local minima are identified in the global-search phase.

Figure: Illustration of the trace-based global search. 3-D plots (top row) and 2-D contour plots (bottom row) of Levy's function with superimposed search trajectories: trace (first column), trajectory after stage 1 (second column), trajectory after stage 2 (third column), and trajectory after stage 3 (fourth column). Darker regions in the contour plots mean smaller function values. The trajectories are solved by LSODE, a differential-equation solver.

Trace Function Design

In a trace-based global search, a problem-independent trace function leads the search throughout the search space. To find the global minima efficiently, without any knowledge of the terrain, a trace function should cover the search space well and guide the search in an efficient way.

A trace function should be continuous and cover the search space uniformly. In real-world applications, physical time constraints are often imposed in solving a problem. Hence, it is desirable to continuously improve the best solution found when given more time. Given the desired behavior of the trace function, we have developed some quantitative performance measures of the quality of a trace function.

Evaluation of Coverage

Quantitative measurements of the coverage of a trace function in a high-dimensional space are necessary in evaluating its performance. A trace function is a one-dimensional curve; we want to find a one-dimensional curve that covers a high-dimensional space well. In this section, we present the coverage measurement we have studied and compare various trace functions based on it.

One possible distance measurement of a point i in the high-dimensional space S to a curve C is

    d(i, C) = min_{j ∈ C} D_ij,

where j is some point on the curve C and D_ij is the Euclidean distance between j and i.

Coverage is evaluated based on the distances of all points in the space to the curve. One measurement is the distance distribution. Another one is the maximum distance, using the following max-min function:

    r = max_{i ∈ S} min_{j ∈ C} D_ij.

Here r is the radius of the largest sphere inside the high-dimensional space that curve C does not pass through. Accordingly, designing a curve that has the maximal coverage of a high-dimensional space is to minimize r.
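As a concrete sketch of this max-min measure (our own illustration, not part of the thesis), the curve can be represented as a polyline and r estimated by Monte Carlo sampling over S = [-1, 1]^dim; the function names and sampling scheme here are assumptions for illustration:

```python
import numpy as np

def point_to_polyline(p, verts):
    # Minimum Euclidean distance from point p to the polyline with vertices verts.
    best = np.inf
    for a, b in zip(verts[:-1], verts[1:]):
        ab = b - a
        t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
        best = min(best, np.linalg.norm(p - (a + t * ab)))
    return best

def maxmin_radius(verts, dim, n_samples=5000, rng=None):
    # Monte Carlo estimate of r = max_{i in S} min_{j in C} D_ij over S = [-1, 1]^dim.
    rng = np.random.default_rng(rng)
    pts = rng.uniform(-1.0, 1.0, size=(n_samples, dim))
    return max(point_to_polyline(p, verts) for p in pts)
```

For the diagonal segment of the 2-D square, the sampled estimate approaches the true radius sqrt(2) attained at the off-diagonal corners.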

Often there is no closed-form formula to evaluate the distance distribution or the maximum distance for a given curve C. Numerical evaluation is possible but usually expensive: since the curve can have any shape, evaluating the distance of every point in the high-dimensional space to the curve becomes intractable very quickly. Finding the point that has the largest distance from a curve is also not trivial.

To make the evaluation of coverage tractable, we turn to approximate measurements. Instead of evaluating the distance of every point in the space to the curve, we divide the space into subspaces and measure the distance of each subspace to the curve in terms of the number of subspaces in between. The Euclidean distance from the center of a subspace to the curve is approximated by the product of the number of subspaces in between and the size of a subspace.

Using approximate measurement, the procedure to evaluate the coverage of a curve in a space is as follows. First, the high-dimensional space is uniformly partitioned into smaller subspaces. A subspace is said to be covered if the curve passes through it, and a covered subspace has distance 0 from the curve. Subspaces contiguous to a subspace of distance 0 have distance 1; subspaces contiguous to a subspace of distance 1 have distance 2; and so on. The evaluation proceeds like a waveform, starting from the subspaces intersected by the curve and propagating to subspaces further away. Based on the distances of all subspaces, we can evaluate the coverage of a curve by using either the maximal distance or the distribution.

The following figure illustrates the process of finding the distances of subspaces to a given curve. The space is a 2-D square divided into subsquares. We mark the subspaces intersecting the curve as having distance 0; the subspaces contiguous to those with distance 0 are marked with distance 1, and so on. Eventually, we find the distance of every subspace, which is the least number of subspaces between that subspace and the nearest point of the curve. This distance is called the unit distance.
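The waveform sweep described above is a breadth-first search over the grid of subspaces. A minimal sketch (our own, assuming 4-neighbor contiguity, which the text does not specify):

```python
from collections import deque
import numpy as np

def unit_distances(covered):
    # covered: boolean grid, True where the curve passes through a subsquare.
    # Returns the grid of unit distances, computed by a breadth-first
    # "waveform" sweep outward from the covered (distance-0) subsquares.
    dist = np.full(covered.shape, -1, dtype=int)
    q = deque()
    for idx in zip(*np.nonzero(covered)):
        dist[idx] = 0
        q.append(idx)
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < dist.shape[0] and 0 <= nc < dist.shape[1] and dist[nr, nc] < 0:
                dist[nr, nc] = dist[r, c] + 1
                q.append((nr, nc))
    return dist
```

With 4-neighbor contiguity, the unit distance of a subsquare equals its Manhattan distance (in subsquares) to the nearest covered subsquare.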

Figure: Illustration of the distance evaluation in a 2-D square for several curves with different lengths. The left graph contains a very short curve; the curve is extended in the middle graph; the right graph has the longest curve and the best coverage. The number in each subsquare is the shortest unit distance from the subsquare to the curve.

The left graph in this figure shows the unit distance of each subsquare to a very short curve; the subsquare containing the curve has distance 0, and subsquares contiguous to distance-0 subsquares have distance 1. Subsquares on the top row have the largest distance. In the middle graph, a longer curve covers more regions of the space. Subsquares are marked by their unit distance to the curve, as shown in the graph; distances of 0 are omitted. The maximum unit distance is attained by the two subsquares at the lower-left and lower-right corners. In the graph on the right, the curve is further extended and covers even more regions of the space, and the maximum unit distance is reduced. This curve achieves better coverage than the one in the middle.

After getting the distance information of the subspaces, we can then evaluate the coverage of a curve. One metric is the maximal distance, which corresponds to the max-min value defined above. However, a single value may not tell the whole story of the coverage: two curves with the same maximal distance could have very different coverage. A better evaluation method is to look at the distribution of subspace distances and use a set of points from the distribution, such as selected percentiles, as evaluation metrics.

Figure: Two equal-length curves, C1 and C2, with the same maximal subspace distance but with different distance distributions.

For example, this figure shows two curves, C1 and C2, of the same length in a 2-D space. The space is partitioned into subsquares, and for both curves the maximal distance over subsquares is the same. Their distance distributions over subsquares are listed in the following table.

    Distance to curve
    Number of subsquares: C1
                          C2

From the distribution, we know which of the two curves has better coverage.

The accuracy of the approximate measurement based on subspaces depends on the size of a subspace. When the space is divided more finely and each subspace is smaller, the approximation can have high accuracy, but at the expense of more computation. For an n-dimensional space, if the space is divided in half along each dimension, there will be 2^n subspaces; if each dimension is divided into k pieces, then there will be k^n subspaces. The computational cost of evaluating the distances of all subspaces is proportional to the number of subspaces, so the cost increases rapidly as the space is divided into smaller pieces. For a high-dimensional space, the computational complexity is exponential and prohibitive even for a moderate division of the space. Hence, to evaluate coverage in high-dimensional spaces, we apply sampling techniques: a large number of subspaces are sampled to give a good estimation of the actual distance distribution of subspaces. We found that such sampling is usually good enough.
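A sketch of this sampling estimate (our own illustration): instead of enumerating all k^n subspaces, sample subspace centers at random and report percentiles of their distances. Here dist_fn is a hypothetical placeholder for whatever distance computation is used (unit distance or Euclidean distance to the trace):

```python
import numpy as np

def sampled_distance_percentiles(dist_fn, dim, k, n_samples=4000, q=(50, 90, 99), rng=None):
    # Estimate the subspace-distance distribution by sampling random subspace
    # centers in [-1, 1]^dim instead of enumerating all k**dim subspaces.
    rng = np.random.default_rng(rng)
    cells = rng.integers(0, k, size=(n_samples, dim))   # random subspace indices
    centers = -1.0 + (cells + 0.5) * (2.0 / k)          # subspace centers
    d = np.array([dist_fn(c) for c in centers])
    return np.percentile(d, q)
```

As a toy check, taking the "curve" to be the single point at the origin makes dist_fn the norm of the center, and the reported percentiles increase monotonically.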

Based on the coverage-evaluation method we have described, we then study the coverage of various trace functions.

Space Filling Curves

There are two alternative ways to search a space: (a) divide the space into subspaces and search the subspaces extensively one by one, and (b) search the space from coarse to fine.

The first is a divide-and-conquer approach. One example is to use space-filling curves. Space-filling curves have been studied by mathematicians since Peano introduced them in 1890. A space-filling curve is any continuous and non-overlapping set of line segments which completely fills a subset of a high-dimensional space. In the 2-dimensional case, a space-filling curve is a one-dimensional curve that fills a plane and leaves no space free. One example is the so-called snake curve, shown on the left of the figure below. The first two Peano curves, P1 and P2, are also plotted. Peano curves are generated recursively: curve P_i is generated from curve P_{i-1} by duplicating P_{i-1} nine times, translating and rotating each duplicate appropriately, and joining the ends.

Peano curves are generated based on the base-3 number system; Hilbert produced space-filling curves based on the even-number system. The first three Hilbert curves, H1, H2, and H3, are shown in the figure below. Curve H1 has four vertices, one at the center of each quarter of the unit square. Curve H2 has 16 vertices, each at the center of a sixteenth of the unit square. The Hilbert space-filling curve is the limit of this sequence of curves and visits every point within a two-dimensional space. Generally, for an n-dimensional space, the j-th Hilbert curve is constructed from 2^n copies of the (j-1)-st Hilbert curve. The true space-filling curve is the one whose order goes to the limit of infinity.

A fundamental property of space-filling curves is that a space-filling curve never intersects itself. To completely fill a plane, space-filling curves have to turn over themselves a great number of times. They often provide a convenient means of exploring the space around a specific point. Space-filling curves have been applied in computer graphics and in databases as a basis for multi-attribute indices.

Figure: Space-filling curves: the snake curve (left) and the first two Peano curves, P1 and P2 (right).

Figure: The first three Hilbert space-filling curves, H1, H2, and H3.
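As a side illustration (our own, not from the thesis), the Hilbert recursion for the 2-D case can be sketched as follows; the particular reflections chosen here are one standard way to assemble the four half-scale copies so that the endpoints join:

```python
def hilbert_points(order):
    # Vertices of the order-n Hilbert curve in the unit square, built by the
    # classic recursion: four half-scale copies of the previous curve, with
    # the first and last copies reflected so consecutive copies connect.
    if order == 0:
        return [(0.5, 0.5)]
    prev = hilbert_points(order - 1)
    pts = []
    pts += [(y / 2, x / 2) for x, y in prev]              # lower-left, transposed
    pts += [(x / 2, y / 2 + 0.5) for x, y in prev]        # upper-left
    pts += [(x / 2 + 0.5, y / 2 + 0.5) for x, y in prev]  # upper-right
    pts += [(1 - y / 2, 0.5 - x / 2) for x, y in prev]    # lower-right, flipped
    return pts
```

Order 1 yields the four quarter-centers of H1; order 2 yields the 16 sixteenth-centers of H2, with consecutive vertices always one cell apart.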

One disadvantage of using a space-filling curve as a trace to guide the search is that the appropriate degree of the space-filling curve needs to be decided before the search starts. That fixes the granularity with which the search covers the space; the search then traverses the subregions one by one. This type of search does not provide a continuous improvement of solution quality as time goes by. Without prior knowledge of the search terrain, the search process may spend a lot of time in unpromising subregions before finally reaching regions with good solutions. In that situation, the quality of the solution is poor and is not improved for a long period of time.

Searching from Coarse to Fine

The second approach is to search the space from coarse to fine: the more time is available, the finer the details the search discovers. The advantage of this approach is the continuous improvement of solution quality. The trace function of this approach may intersect itself and does not cover the space as uniformly as space-filling curves do. However, this issue is not serious when the curve is used to search a high-dimensional space and the time available is limited.

We have designed a family of trace functions that use this approach. These trace functions are aperiodic and do not repeat themselves. They are parametric functions of one independent variable, are designed based on the sinusoidal function, have a bounded range, and move in a cyclic mode. To make a sinusoidal function aperiodic, we put a non-integer power on the independent variable. The basic function looks like the following:

    sin(t^α),

where α controls the cycle frequency: the larger α is, the higher the frequency. With an aperiodic sinusoidal function of different α in each dimension, a trace function traverses the space in an aperiodic fashion and does not repeat itself. By choosing an appropriate α for each dimension, the curve can cover the space approximately uniformly.

In our trace-based global-search method, a dynamic system containing the trace function and information about the optimized function evolves on a pseudo-time scale. A parametric trace function with one automatic variable can be incorporated into the dynamic system naturally.

In an n-dimensional space, the trace function we have designed has the following form:

    T_i(t) = ρ_i sin( (t/s)^(a0 + (d0 − a0)(i − 1)/k) ),    i = 1, 2, …, n,

where i represents the i-th dimension, and a0, d0, s, and k are parameters that control the speed and shape of the function. a0 and d0 are in the range of (0, 1), with a0 < d0; they specify the difference in how the function moves in different dimensions. s controls the speed of the function: the larger s is, the slower the function progresses with respect to t. ρ_i specifies the range that the function moves in the i-th dimension. k ≥ n is a scaling factor: the larger the value of k is, the less the function changes from one dimension to another.

When ρ_i ≤ 1, the trace function is bounded in the space [−1, 1]^n. If the search space of an application problem is not in [−1, 1]^n, we can either map the search space to [−1, 1]^n or enlarge the range of the trace function by increasing ρ_i.

A further extension of this trace function is to add an initial phase shift in each dimension, as follows:

    T_i(t) = ρ_i sin( (t/s)^(a0 + (d0 − a0)(i − 1)/k) + 2πi/n ),    i = 1, 2, …, n.

The second term inside the sine function is the initial dimension-dependent phase angle.
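The trace function can be sketched in a few lines; note that the exact exponent and phase below follow our reconstruction of the formula above, so they are assumptions rather than the thesis's exact expression:

```python
import numpy as np

def trace(t, n, rho=1.0, a0=0.05, d0=0.3, s=100.0, k=None):
    # Trace position T(t) in n dimensions, following the reconstructed form
    # T_i(t) = rho_i * sin((t/s)**alpha_i + 2*pi*i/n), with exponent
    # alpha_i = a0 + (d0 - a0)*(i - 1)/k; k defaults to n (the text requires k >= n).
    k = n if k is None else k
    i = np.arange(1, n + 1)
    alpha = a0 + (d0 - a0) * (i - 1) / k
    return rho * np.sin((t / s) ** alpha + 2 * np.pi * i / n)
```

Each component stays within [-rho, rho], and at t = 0 the components reduce to the phase terms sin(2πi/n).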

We compare trace functions with dierent parameter values using the coverageevaluation

metho d presented in the last section We use trace function as an example to illustrate

the evaluation of coverage In this example the set of parameters are a s

-1 0 -1 0 -1 0 1 1 1

0 0 0 0 0 0

-1 -1 -1

0 1 0 1 0 1

t t t

Figure

D plots of trace function at t and resp ectively The parameter

values of the trace function are a s d and k

i

d andk Figure plots the trace function at t and

i

resp ectively in a dimensional space

As shown in the figure, the trace function moves in a band during the first time units. By the second plotted time, the trace function has pretty much traversed the whole space once. By the third, the trace function continues to cover more vacant regions, approximately in a uniform way.

The next figure shows the coverage measurement of this same trace on a 2-D space, based on our algorithm. The 2-D square is divided into subsquares, and the number in each subsquare is the unit distance of the subsquare to the trace segment. The three graphs show the coverage of the trace when it runs for three increasing amounts of time; the maximal distance is reduced as the trace runs longer. The coverage depends on how long the trace runs and on the length of the trace segment. Therefore, the coverage of trace functions should be compared based on the same length of trace segments.

Parameters a0 and d0 are two important parameters in the trace function: they determine the difference in how the trace moves in each dimension. a0 specifies the fastest movement and d0 specifies the slowest movement. When a0 = 1, the function is periodic, with a cycle of 2πs in the first dimension, and moves the fastest among all dimensions.


Figure: Coverage measurements of a trace function in a 2-D space. Each dimension has a fixed number of sample points. The three graphs correspond to the coverage when the trace runs for three increasing amounts of time. Each number is the unit distance of the corresponding subsquare to the trace function.

In the highest dimension, the function is close to sin((t/s)^d0). Parameter s in the trace function controls the speed of the trace with respect to t and does not affect the shape of the trace.

Numerical Evaluation of Trace-Function Coverage

In this section, we evaluate the coverage of different trace functions of the form given above by varying the values of a0 and d0. The search space is assumed to be a hypercube in the range [−1, 1] in each dimension. Each dimension of the high-dimensional space is divided into equal-length components; in this way, an n-dimensional space is decomposed into subspaces of equal size. The coverage of a trace is evaluated based on the distribution of distances from the subspaces to the trace.

The figure below shows the coverage of trace functions in a 4-D space with a0 = 0.05 and d0 = 0.3, 0.5, 0.7, and 0.9, respectively. Each of the four graphs shows a 3-D plot and the corresponding contour plot of the cumulative distribution of distances from subspaces to the trace. The three axes are the length of the trace, the distance of a subspace to the trace, and the cumulative distribution of distances from all subspaces to the trace. For example, the upper-left graph shows the coverage distribution of a trace function with d0 = 0.3: for a given trace length, it shows the fraction of subspaces passed by the trace, the fraction within one unit distance of the trace, within two unit distances, and so on, up to the distance within which all subspaces fall. The contour plot shows the 10th, 50th, and 90th percentiles of the cumulative distribution.

The distance distributions of these trace functions with various d0 are very similar to each other; for all of them, the 90th percentile of the distance distribution drops to a few unit distances once the curve is long enough. Moreover, the distance distribution is not sensitive to the values of d0: large d0 values (e.g., d0 = 0.9) give worse coverage in the beginning, for shorter curves, but are slightly better for longer curves. All the trace functions show the general trend that an exponentially longer curve is needed to improve the coverage linearly.


Figure: The coverage of trace functions in a 4-D space with a0 = 0.05 and d0 = 0.3, 0.5, 0.7, and 0.9, respectively. Each graph shows a 3-D plot and a contour plot of the cumulative distribution of distances from subspaces to the trace function. The contour plots show the 10th, 50th, and 90th percentiles of the distribution.


Figure: Sampling is used to evaluate the coverage of a set of trace functions in a 4-D space with a0 = 0.05 and d0 = 0.3, 0.5, 0.7, and 0.9, respectively. Each graph shows a 3-D plot and a contour plot of the cumulative distribution of distances between subspaces and the trace function. The contour plots show the 10th, 50th, and 90th percentiles of the distribution.

We have also varied a0 while fixing d0. Our results are similar to those of varying d0: the distance distribution is not very sensitive to the values of a0 over the range considered.

We are interested in solving high-dimensional optimization problems. Hence, we want to find out the coverage of the trace function in high-dimensional spaces. The computational complexity of complete coverage evaluation based on subspaces increases exponentially with respect to the number of dimensions, because the number of subspaces increases exponentially. Therefore, we use a sampling method to evaluate the coverage.


Figure: The coverage of a set of trace functions in a 10-dimensional space with a0 = 0.05 and d0 = 0.3, 0.5, 0.7, and 0.9, respectively. Each graph shows a 3-D plot and a contour plot of the cumulative distribution of distances between subspaces and the trace function. The contour plots show the 10th, 50th, and 90th percentiles of the distribution.

To check the accuracy of the sampling method, we compare the distance distribution obtained by the sampling method to that obtained by the exact method. The sampling figure above shows the coverage of the same trace functions as before in a 4-D space, with a0 = 0.05 and d0 = 0.3, 0.5, 0.7, and 0.9, respectively; subspaces were sampled randomly for each given trace segment to obtain the sampled distance distribution. As the two 4-D figures show, the distribution obtained by sampling is very close to that obtained by evaluating all the subspaces.

Next, we use sampling to evaluate the coverage of trace functions in high-dimensional spaces. The figures show the coverage of trace functions in 10-, 20-, and 40-dimensional spaces. As before, each dimension of the high-dimensional spaces is divided into equal-length parts. We fix a0 = 0.05 and vary d0 as 0.3, 0.5, 0.7, and 0.9.

In each of these three figures, the various values of d0 generate similar distributions. In a 10-dimensional space, the coverage of trace functions is significantly worse than in a 4-dimensional space: the 90th-percentile contour line flattens out at a large distance, whereas in a 4-dimensional space it decreases to within a few unit distances. The coverage is even worse in higher-dimensional spaces; in 20- and 40-dimensional spaces, the 90th-percentile line reaches only still larger distances.

Our conclusions on trace-function coverage are as follows. The coverage is insensitive to the values of a0 and d0 over a large range. The coverage improves significantly at the beginning of the search and shows little improvement afterwards. The behavior is worse for higher-dimensional spaces: in a high-dimensional space, the coverage is low when the curve length is short.

Summary of Trace-Function Design

In this section, we have studied the coverage of trace functions. We introduced the coverage of a trace function in a high-dimensional space and presented coverage measurements based on subspaces: a high-dimensional space is divided into smaller subspaces, each subspace is treated as an entity, the distance from each subspace to a trace is calculated, and the distribution of distances is used as the basis for evaluating coverage. In a higher-dimensional space, the number of subspaces is very large, and calculating the distances of all the subspaces is prohibitively expensive; in this situation, sampling is used to get an approximate distribution, with a large number of subspaces sampled randomly to give a good approximation.

We have identified two ways to traverse a search space. One is to divide the space into subspaces and search one subspace extensively after another. The other is to search the space from coarse to fine. We prefer the second approach due to its advantage


Figure: The coverage of a set of trace functions in a 20-dimensional space with a0 = 0.05 and d0 = 0.3, 0.5, 0.7, and 0.9, respectively. Each graph shows a 3-D plot and a contour plot of the cumulative distribution of distances between subspaces and the trace function. The contour plots show the 10th, 50th, and 90th percentiles of the distribution.


Figure: The coverage of a set of trace functions in a 40-dimensional space with a0 = 0.05 and d0 = 0.3, 0.5, 0.7, and 0.9, respectively. Each graph shows a 3-D plot and a contour plot of the cumulative distribution of distances between subspaces and the trace function. The contour plots show the 10th, 50th, and 90th percentiles of the distribution.

of continuous improvement of solution quality as time goes by. We have designed a family of trace functions using the second approach and have studied the coverage of these trace functions with different parameter values in 4-, 10-, 20-, and 40-dimensional spaces. Our results are summarized as follows:

1. The coverage of trace functions is not sensitive to the parameter values a0 and d0 over fairly large ranges.

2. The coverage improves quickly as a trace starts to traverse a search space and then flattens out.

3. An exponentially longer trace is needed to improve the coverage linearly.

4. Trace functions can obtain dense coverage of a low-dimensional space but can only achieve sparse coverage of a high-dimensional space when the length of the trace is short.

Using trace functions to cover a high-dimensional space densely is expensive. Therefore, in our trace-based global-search method, good initial points are important: they bring the trace close to good solutions, reducing the space that the trace function needs to cover and shortening the time to find global optima.

In the rest of this thesis, we use the trace function with phase shifts, with parameter a0 = 0.05 and a fixed value of d0, in our experiments.

Speed of Trace Function

The search trajectory of our trace-based global search is affected by the speed of the trace function. The system of ODEs specified earlier has a limit on how fast it can change: if the trace moves too fast, the system of ODEs is not able to keep up with the trace. On the other hand, a slow trace is also not good, because the search trajectory cannot cover a large space in the time available; in addition, the trace may have little pulling force, resulting in gradient descent being the dominant force.

In the trace function, we introduced a parameter s to adjust the speed of the trace. A smaller s makes the trace move faster in the same amount of time. We want the trace to move fast, yet not beyond the limit of the ODE system.

To check whether trace T(t) moves too fast, we pass the trace through the following ODE system:

    dx(t)/dt = −(x(t) − T(t)).

This system generates a trajectory that tries to follow the trace function. If the trace moves too fast, the generated trajectory traverses only a much smaller region than the space the trace passes through.
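A minimal sketch of this test (our own illustration), integrating the follower system with forward Euler for a one-dimensional trace; the two specific traces below are hypothetical stand-ins for a slow and a fast trace:

```python
import numpy as np

def follow_trace(T, t_end=200.0, dt=0.01, x0=0.0):
    # Forward-Euler integration of the follower system dx/dt = -(x - T(t)).
    ts = np.arange(0.0, t_end, dt)
    xs = np.empty_like(ts)
    x = x0
    for idx, t in enumerate(ts):
        xs[idx] = x
        x += dt * (-(x - T(t)))
    return ts, xs

# A slow trace is tracked closely; a fast one is severely attenuated:
slow = lambda t: np.sin(t / 20.0)
fast = lambda t: np.sin(20.0 * t)
_, xs_slow = follow_trace(slow)
_, xs_fast = follow_trace(fast)
```

The follower acts as a first-order low-pass filter: the slow trace's trajectory spans nearly the full range [-1, 1], while the fast trace's trajectory stays in a much smaller region.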

We have tested different values of parameter s for the trace function. We want s to be small, in order to give faster speed, while keeping the output trajectory close to the trace. Empirically, we have found a good value of s and use it in the rest of this thesis.

Variable Scaling During Global Search

In our trace-based global-search method, the search trajectory is generated by the combined force of the trace and the local gradient. These two forces have to strike a balance for the trace-based global search to work well.

When the global-search stages are modeled as a system of ODEs, a pair of coefficients specify the relative weights put on the trace and the local descent. Appropriate values of these coefficients are necessary to generate a good search trajectory. If the weight on the trace is too large, the trace becomes dominant, and the search trajectory follows the trace closely but ignores terrain information. On the other hand, if the weight on the gradient is too large, local descents become dominant, and the search trajectory is trapped in local minima and cannot get out.

When we use fixed weights, gradient descents dominate the trace in places where gradients are large. To overcome the difficulty caused by large gradients, we have developed an adaptive variable-scaling technique. The idea of variable scaling is to stretch out the objective function along the variable dimensions. The figure below illustrates the effect of variable scaling, which changes the relative values of the gradient and the trace.

Figure: Variable scaling stretches out the variable, which reduces the backward force due to the gradient and increases the forward force due to the trace.

In the left graph of this figure, the gradient on the steep slope contributes a backward force of 10, while a forward force of 3 is provided by the trace. Since the forward force is less than the backward force, the search cannot move forward. The right graph shows the situation after scaling the variable values by 2: after the variable is stretched out, the backward force is reduced by half, to 5, and the forward force is increased to 6. Now the search overcomes the slope and can move forward.

The next figure provides a more detailed analysis of what happens when local gradients dominate the trace, before and after variable scaling. In the left graph, a trace function goes from left to right along one variable dimension x; in this example, x has values in the range of 1 to 3. The search trajectory follows the trace function and hits a local minimum. Function values along the search trajectory are plotted below the search trajectory. On the left side of the local-minimum basin, the gradient and the trace point in the same direction, causing the search trajectory to go down the slope. The problem happens when the trace tries to pull the search trajectory out of the local minimum from the right side of the basin: the gradient force counteracts and dominates the trace force. The result is that the search trajectory cannot get out and is trapped in the local minimum.

Figure: Illustration of the gradient and trace forces around a local minimum along dimension x. The left graph shows the situation before variable scaling, in which the search trajectory is trapped in the local minimum. The right graph shows the situation after variable scaling, where variable x is scaled up 10 times and the search trajectory overcomes the local minimum. The graphs show the trace function going from left to right and the resulting search trajectory.

In the right graph of this figure, variable x is scaled up 10 times and has values from 10 to 30. The trace force is increased by 10 times, because the difference between the trace position and the search-trajectory position is increased by 10 times; at the same time, the gradient magnitude is reduced by 10 times. When the trace tries to pull the search trajectory out from the right side of the local-minimum basin, the trace force is larger than the gradient force, and the search trajectory successfully overcomes the local minimum.

Scaling up variables reduces the gradient force and increases the trace force. Conversely, scaling down variables increases the gradient force and reduces the trace force. The next problem is whether we should scale variables up or down at a given time during the search, and by how much.

The trace force is proportional to the distance between the current trace position and the search position. Usually, global optimization problems have an explicitly specified region of interest that corresponds to the search space, and the trace function traverses the inside of the search space, so we have a good idea of how large the trace force can be. On the other hand, the magnitude of the gradient of the optimization function is more difficult to estimate: its value can span a very large range, varying from one problem to another and changing significantly from one search region to another.

In our adaptive variable-scaling method, variables are scaled adaptively based on the
local gradients and the values of the trace function. When the gradient is much larger
than the trace force, variables are scaled up to make the gradient smaller than the trace
force. When variables are scaled up, the trace force increases proportionally, because the
trace now operates in a larger region; at the same time, the local gradient is reduced
proportionally, because the function value is unchanged while the variables take larger
values. For example, suppose the function to be minimized is f(x). Scaling x up by a
factor of s maps x into a range s times as large, and the function becomes f(x/s), whose
gradient is s times smaller. By scaling variables adaptively during the search, the
problem caused by deep local minima with large gradients can be effectively solved.
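To make the scaling rule concrete, here is a minimal Python sketch of the per-dimension decision described above. The function name `adapt_scaling`, the fixed factor of 10, and the use of the trace-force magnitude as the threshold are our illustrative assumptions, not the exact rule used in Novel.

```python
def adapt_scaling(x, grad, trace_force_mag, scale, factor=10.0):
    """One adaptive-scaling decision, applied per dimension.

    If the gradient along a dimension dominates the trace force,
    scale that variable up: x -> factor * x leaves f unchanged but
    shrinks df/dx by `factor` and enlarges the trace force, since
    the trace now operates in a larger region.
    """
    for i in range(len(x)):
        if abs(grad[i]) > trace_force_mag:
            x[i] *= factor        # variable now lives in a larger range
            grad[i] /= factor     # gradient w.r.t. the scaled variable shrinks
            scale[i] *= factor    # remember the cumulative scale to undo later
    return x, grad, scale
```

Each dimension carries its own cumulative scale factor, so variables can be scaled separately, as the search trajectories in the example below require.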

Let us illustrate the adaptive variable-scaling method with a simple example based on
minimizing a two-dimensional Griewank function. The figure below shows a trace and the
search trajectories, with and without adaptive variable scaling, on a 2-D contour plot of
a portion of this objective function. A darker color in the contour plot represents a
smaller function value. The left graph shows the superimposed trace function starting
from the position marked "x" near the center of the graph; the starting point is inside a
local-minimum basin, and the trace function is evaluated over a fixed number of time
units. The middle graph shows the search trajectory after one stage of the trace-based
global search without adaptive variable scaling. The trajectory is obtained by evaluating
the system of ODEs over the same interval using an ordinary differential equation
solver, LSODE.

Because the gradient force is much larger than the trace force, the search trajectory is
trapped inside a local-minimum region.

Figure: The trace function and search trajectories after one stage of the trace-based
global search, with and without adaptive variable scaling, on the two-dimensional
Griewank function. The 2-D contour plots of the Griewank function appear in all three
graphs; a darker color represents a smaller function value. The left graph shows the
trace function starting from the position marked "x". The middle graph shows the search
trajectory without variable scaling. The right graph shows the search trajectory with
adaptive variable scaling.

The right graph of this figure shows the search trajectory generated using adaptive
variable scaling. In this example the trajectory is not very smooth, because variables are
not scaled continuously. Instead, variable scaling is performed at fixed time intervals:
after every Δt time units, the gradient at the current search point is evaluated, and if
the gradient along some variable dimensions is too large, those variables are scaled up.
The variables are scaled separately, with different scaling factors. As shown in the right
graph, because adaptive variable scaling reduces large gradients, the search trajectory
overcomes the local minima and follows the trace function nicely.

Note that besides variable scaling, function scaling is also helpful in modifying function
and gradient values to overcome local minima. For example, when the objective function
takes on values in a large range and has large gradients, function transformations such
as a logarithmic or sigmoidal transformation can be applied to compress the function
values.
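As an illustration of function scaling, the following sketch (our own, not code from Novel) wraps an objective in the logarithmic transform g = log(1 + f), which compresses large values of a non-negative f and damps its gradient by 1/(1 + f):

```python
import math

def log_compress(f, grad_f):
    """Logarithmic transform g = log(1 + f), assuming f(x) >= 0.
    Large function values are compressed, and the gradient is damped
    by 1/(1 + f), so deep basins exert a weaker pull on the search."""
    def g(x):
        return math.log1p(f(x))        # log(1 + f(x))
    def grad_g(x):
        return grad_f(x) / (1.0 + f(x))  # chain rule: f'/(1 + f)
    return g, grad_g
```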

Numerical Methods for Solving the Dynamic Systems of Equations

Each stage of the trace-based global search is modeled by a system of ordinary
differential equations (ODEs). The search trajectory of each stage is generated by
solving the system of ODEs as an initial-value problem (IVP).

Not all initial-value problems can be solved explicitly; oftentimes it is impossible to find
a closed-form solution. Many numerical methods have been developed to approximate the
solution. Widely used methods include Euler's method, Runge-Kutta methods,
predictor-corrector methods, and implicit methods. Next, we briefly introduce IVPs and
present some popular numerical methods.

A general IVP has the following form:

    y'(t) = f(t, y),    with y(t_0) = y_0.

In numerical methods, a set of points {(t_k, y_k)} is generated as an approximation of
the solution, i.e., y(t_k) ≈ y_k. Because the solution is approximated at a set of
discrete points, these methods are called difference methods or discrete-variable
methods.

A simple single-step method has the form y_{k+1} = y_k + h φ(t_k, y_k) for some
function φ, where φ(t_k, y_k) is the increment function. The value h is called the step
size. Single-step methods use only the information from one previous point to compute
the next point.

The Euler method belongs to the class of single-step methods, with increment function
φ = f(t_k, y_k). The Euler method is simple and easy to understand. However, it has
limited use because large errors accumulate as the search proceeds.

Another family of single-step methods are the Runge-Kutta methods, which are widely
used in solving IVPs. The following Runge-Kutta method of order 4 is the most popular:

    y_{k+1} = y_k + (1/6)(k_1 + 2 k_2 + 2 k_3 + k_4)

where

    k_1 = h f(t_k, y_k)
    k_2 = h f(t_k + h/2, y_k + k_1/2)
    k_3 = h f(t_k + h/2, y_k + k_2/2)
    k_4 = h f(t_k + h, y_k + k_3)

Runge-Kutta methods are good general-purpose methods because they are quite accurate,
stable, and easy to program.
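The two single-step methods above can be sketched in a few lines of Python; this is a generic textbook implementation, not code from our prototype:

```python
def euler_step(f, t, y, h):
    # Euler: the increment function is simply f(t_k, y_k)
    return y + h * f(t, y)

def rk4_step(f, t, y, h):
    # Classical fourth-order Runge-Kutta step
    k1 = h * f(t, y)
    k2 = h * f(t + h / 2, y + k1 / 2)
    k3 = h * f(t + h / 2, y + k2 / 2)
    k4 = h * f(t + h, y + k3)
    return y + (k1 + 2 * k2 + 2 * k3 + k4) / 6
```

On y' = y with y(0) = 1 and h = 0.1, the Euler step returns exactly 1.1, while the RK4 step agrees with e^0.1 ≈ 1.10517 to roughly seven digits, illustrating the error accumulation mentioned above.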

Single-step methods are self-starting, in the sense that points can be computed one by
one starting from the initial value. Multistep methods use several prior points in the
calculation of one point; they are not self-starting. A desirable feature of a multistep
method is that the local error can be determined and a correction term can be included,
which improves the accuracy of the answer in each step. Adaptive step-size adjustment
can also be accomplished: it is possible to determine whether the step size is small
enough to obtain an accurate value for the current point, yet large enough that
unnecessary and time-consuming calculations are avoided. An example of a multistep
method is the Adams-Bashforth-Moulton method.

The methods described above are explicit methods: they use one or more previous points
to calculate the current point. Explicit methods have a limited stability region from
which the step size must be chosen. A larger stability region can be obtained by also
using information at the current time, which makes the method implicit. The simplest
example is the backward Euler method:

    y_{k+1} = y_k + h f(t_{k+1}, y_{k+1})

This method is implicit because f is evaluated with the argument y_{k+1} before the
value of y_{k+1} is known. The value of y_{k+1} is often determined iteratively by
methods such as Newton's method or fixed-point iteration.
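A minimal sketch of the backward Euler step, using fixed-point iteration seeded by an explicit predictor; for genuinely stiff systems, where |h ∂f/∂y| ≥ 1, the fixed-point loop diverges and Newton's method must replace it:

```python
def backward_euler_step(f, t_next, y_prev, h, iters=50):
    """Implicit (backward) Euler: solve y = y_prev + h*f(t_next, y)
    by fixed-point iteration, seeded with an explicit Euler guess.
    Converges only when |h * df/dy| < 1; use Newton's method otherwise."""
    y = y_prev + h * f(t_next, y_prev)   # explicit predictor as starting guess
    for _ in range(iters):
        y = y_prev + h * f(t_next, y)    # fixed-point corrector iteration
    return y
```

For y' = -y with h = 0.1, the iteration converges to y_prev / (1 + h), the exact backward Euler value for this linear problem.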

Implicit methods have a significantly larger stability region. Much larger steps may be
taken, attaining much higher overall efficiency than explicit methods despite requiring
more computation per step. Also, certain types of implicit methods are much more
effective than explicit methods in solving stiff ODEs.

A system of ODEs is stiff if its Jacobian matrix J = (∂f_i/∂y_j) has at least one
eigenvalue with a large negative real part, while the solution is slowly varying on most
of the interval of integration. Stiffness shows up as an overly rapid convergence of the
solution curves.

All commonly used formulas for solving stiff IVPs are implicit in some sense. Implicit
methods solve a linear or nonlinear system in each step of the integration. A good
starting guess for the iteration that solves for y_{k+1} can be obtained from an explicit
formula, using the explicit and implicit formulae as a predictor-corrector pair.

One of the most popular pairs of multistep methods is the Adams-Bashforth-Moulton
method. It uses the explicit fourth-order Adams-Bashforth predictor

    y_{k+1} = y_k + (h/24)(55 y'_k - 59 y'_{k-1} + 37 y'_{k-2} - 9 y'_{k-3})

and the implicit fourth-order Adams-Moulton corrector

    y_{k+1} = y_k + (h/24)(9 y'_{k+1} + 19 y'_k - 5 y'_{k-1} + y'_{k-2})
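One step of the predictor-corrector pair can be sketched as follows; the history layout (the four most recent points, oldest first) and the single corrector application are our own conventions:

```python
def abm4_step(f, ts, ys, h):
    """One Adams-Bashforth-Moulton step. `ts`/`ys` hold the four most
    recent points t_{k-3..k}, y_{k-3..k}, oldest first."""
    fm3, fm2, fm1, f0 = (f(t, y) for t, y in zip(ts, ys))
    t_next = ts[-1] + h
    # explicit fourth-order Adams-Bashforth predictor
    y_pred = ys[-1] + h / 24 * (55 * f0 - 59 * fm1 + 37 * fm2 - 9 * fm3)
    # implicit fourth-order Adams-Moulton corrector, with y'_{k+1}
    # evaluated at the predicted point
    y_corr = ys[-1] + h / 24 * (9 * f(t_next, y_pred) + 19 * f0 - 5 * fm1 + fm2)
    return t_next, y_corr
```

The gap between y_pred and y_corr also serves as a cheap local-error estimate, which is what enables the adaptive step-size adjustment discussed above.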

Another family of implicit multistep methods are the backward differentiation formulas.
One example is the second-order formula

    y_{k+1} = (4/3) y_k - (1/3) y_{k-1} + (2h/3) y'_{k+1}

This type of method is particularly good for solving stiff ODEs.

After briefly reviewing existing numerical methods for solving IVPs, we come back to
solving the IVPs that model the global-search stages of our trace-based search. We have
used the explicit Euler method, the Adams-Bashforth-Moulton method, and a method
based on backward differentiation formulas.

In applying the Euler method to solve the system of ODEs defined earlier, we have the
following iterative equation:

    x(t + Δt) = x(t) + Δt [ -μ_g ∇f(x(t)) - μ_t (x(t) - T(t)) ]

where Δt is the step size. A large Δt causes a large stride of variable modification,
possibly resulting in instability; a small Δt means a longer computation time for
traversing the same distance. When the exact search trajectory of the global-search
phase is not critical, the Euler method can obtain good results. The Euler method runs
faster and is applied to solve larger problems.
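As an illustration, one Euler step of this iterative scheme can be written as follows; the argument names mu_g and mu_t mirror the weight coefficients on the gradient and trace forces, and the vectors-as-lists representation is our simplification:

```python
def trace_euler_step(x, t, grad_f, trace, mu_g, mu_t, dt):
    """One forward-Euler update of the trace-based search trajectory:
    x(t+dt) = x(t) + dt * (-mu_g * grad_f(x) - mu_t * (x - T(t))),
    where trace(t) returns the trace position T(t)."""
    g = grad_f(x)
    T = trace(t)
    return [xi + dt * (-mu_g * gi - mu_t * (xi - Ti))
            for xi, gi, Ti in zip(x, g, T)]
```

When the gradient term dominates, the update is pulled toward the local minimum; when the trace term dominates, it is pulled toward T(t), which is exactly the balance that adaptive variable scaling adjusts.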

The more sophisticated numerical methods we have used are implemented in a software
package, LSODE, the Livermore Solver for Ordinary Differential Equations. LSODE
solves the system of ODEs efficiently to within a prescribed degree of accuracy, using
adaptive step-size adjustments. It implements both the Adams method and a method
based on backward differentiation formulas. Most of the time we have used the method
based on backward differentiation formulas, in case the ODE system is stiff. We have
also used the Adams method for designing QMF filter banks.

LSODE obtains high-precision solutions in solving IVPs. When the function to be
optimized is smooth, LSODE is accurate and usually runs fast as well. However, when the
optimized function is rugged, or when there is noise in function and gradient
evaluations, LSODE may progress slowly. When the number of variables is large, LSODE
may also become computationally expensive.

Therefore, the Euler method has advantages in situations in which the function to be
optimized is rugged, has noisy values, or has a large number of dimensions. The
execution time of the Euler method can be much shorter than that of more sophisticated
methods, and it is relatively insensitive to noise. Its disadvantage is that it has larger
errors, which may cause the search trajectory to diverge from the theoretical trajectory.
Also, it is not easy to select an appropriate step size for the Euler method.

Novel: A Prototype Implementing the Proposed Methods

In the last chapter, we presented Lagrangian methods to handle nonlinear constraints in
continuous and discrete optimization problems. In this chapter, we have presented a
multi-level global-search method to overcome local minima. In this section, we present a
prototype that implements the methods proposed in this and the last chapter.

Figure: Organization of the Novel global-search method. Problem specifications
(objective function and constraints) are routed either to the discrete Lagrangian
method (discrete gradient operator, updating of Lagrange multipliers, tabu search,
flat-region moves, dynamic scaling) for discrete constrained problems, or, for
continuous problems, to a pipeline that identifies promising search regions via
coarse-level global search (heuristic and greedy methods, domain knowledge, SA, GA,
multi-starts), runs the fine-level trace-based global search (trace function, LSODE,
variable scaling, with control parameters such as search range, trace and gradient
weights, step size, and accuracy control), and finishes with local search (gradient
descent, quasi-Newton, and conjugate gradient for unconstrained problems; Lagrangian
methods with adaptive control for constrained problems), producing the solutions.

the prototype, our proposed methods are put together to solve nonlinear unconstrained
and constrained, continuous and discrete problems. The prototype is called Novel, which
stands for Nonlinear Optimization Via External Lead.

The figure above shows the organization of the Novel prototype. Users specify an
objective function and a constraint set, as well as some control parameters. Novel solves
discrete constrained problems using discrete Lagrangian methods; continuous problems
are solved using a combination of global and local searches. We present each component
of Novel in detail next.

Continuous Methods

For nonlinear continuous unconstrained optimization problems, the optimal solutions of
an objective function are found by global- and local-search methods. For constrained
problems, constraints are handled by Lagrangian methods. Inequality constraints are
first converted to equality constraints using the slack-variable or the MaxQ method.
Then the combination of the objective function and the constraints forms a Lagrangian
function. Finally, global and local searches are performed on the Lagrangian function to
find good solutions.
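The two conversions just mentioned can be sketched as follows. The function names and the exponent q are our own, and the MaxQ transform is shown as a power of max(0, g(x)), which is our reading of that method:

```python
def slack_equality(g):
    # Slack-variable method: g(x) <= 0 becomes g(x) + s^2 = 0,
    # with a new slack variable s appended to the search space.
    return lambda x, s: g(x) + s * s

def maxq_equality(g, q=2):
    # MaxQ-style transform: g(x) <= 0 becomes max(0, g(x))^q = 0,
    # which is zero wherever the inequality is already satisfied.
    return lambda x: max(0.0, g(x)) ** q
```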

The search space of a continuous optimization problem is usually large. Identifying
promising search regions and concentrating the search in a smaller subspace can
effectively reduce the computation time needed to find good solutions. Promising regions
are represented as initial points that are fed into the fine-level global search of Novel.

In our optimization method, promising regions are identified in three ways: through
domain knowledge, heuristic methods, and coarse-level global search. For
knowledge-rich applications, domain knowledge provides reliable information about
promising search regions. For example, in feed-forward neural-network learning
problems, optimal weights usually have small values, and optimal solutions are not far
from the origin; therefore, the subspace around the origin is a promising search region.
In designing filter banks, we try to improve the best existing solutions; thus, the
subspaces around existing solutions are promising search regions.

In applications that have little or no domain knowledge, other types of search methods
are applied, including heuristic search methods and general-purpose search methods.
Heuristic search methods, including various greedy methods, use heuristic strategies to
find promising regions. For instance, a greedy method can first find the optimal value
for each dimension while keeping the values of the other variables constant, and then
combine the optimal values for the individual dimensions to form a suboptimal solution
of the original multidimensional problem.
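The greedy per-dimension heuristic just described can be sketched as follows; holding the other variables at the first value of their domains is our simplifying assumption:

```python
def greedy_per_dimension(f, domains):
    """Optimize each dimension independently (the other variables held
    at their first domain value), then combine the per-dimension
    winners into one suboptimal candidate for the full problem."""
    base = [d[0] for d in domains]
    best = list(base)
    for i, dom in enumerate(domains):
        trial = list(base)
        scores = []
        for v in dom:
            trial[i] = v
            scores.append((f(trial), v))   # score each candidate value
        best[i] = min(scores)[1]           # keep the best value per dimension
    return best
```

For a separable objective this recovers the exact optimum; for coupled variables it yields only a suboptimal candidate, which is why the result serves as a starting region rather than a final answer.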

General-purpose global-search methods can be used in the coarse-level global search to
find promising regions. In Novel, we have used existing global-search methods that
include simulated annealing, genetic algorithms, and multi-starts of local descents. In
addition, we have developed a trajectory-based method as a variation of multi-starts of
descents.

Novel contains a unique fine-level global search: the trace-based global-search method.
Our trace-based search uses a problem-independent trace to lead the search and achieves
a balance between global and local searches. During the search, the trace-based search
relies on two counteracting forces: the local gradient force that drives the search to a
local optimum, and a deterministic trace force that leads the search out of local minima.
Good starting points identified in the global-search phase are used in the local search to
find good local minima.

In Novel, the trace function for an n-dimensional space is defined as given earlier, and
default values are used for its parameters a, d, s, and k. The global-search stage of our
trace-based search is modeled as a system of ordinary differential equations (ODEs). For
an unconstrained problem with objective function f(x), the system of ODEs is

    dx(t)/dt = -μ_g ∇f(x(t)) - μ_t (x(t) - T(t))

where μ_g and μ_t are coefficients specifying the weights for descents in local-minimum
regions and for exploration of the broader space; both have default values, as does the
number of global-search stages. An adaptive variable-scaling technique is implemented
to overcome deep local minima with large gradients.

For a constrained optimization problem with Lagrangian function L(x, λ), the system of
ODEs is as follows:

    dx(t)/dt = -μ_g ∇_x L(x(t), λ(t)) - μ_t (x(t) - T(t))
    dλ(t)/dt = ∇_λ L(x(t), λ(t))

where μ_g and μ_t are coefficients. The search performs descents in the original-variable
space x and ascents in the Lagrange-multiplier space λ. The trace function T(t) brings
the search out of local minima in the original-variable space.

The system of ODEs is solved by the Euler method or by LSODE, the Livermore Solver
for Ordinary Differential Equations. LSODE solves the system of ODEs to within a
prescribed degree of accuracy using adaptive step-size adjustments; it implements both
the Adams method and a method based on backward differentiation formulas.

Fine-level global searches discover local-minimum basins, and local searches find the
local minima; fine-level global searches thus provide initial points for local searches.
Initial points are selected based on the search trajectories generated by each stage of
the trace-based global search. Novel implements two heuristics to select the initial
points: the best solutions in periodic time intervals, or the local minima along each
trajectory.

Existing local-descent methods are applied in the local searches. For unconstrained
optimization problems, they are gradient descent, quasi-Newton, and conjugate gradient
methods. Problem-specific methods, such as back-propagation in neural-network training
applications, can also be applied.

For constrained optimization problems, Novel applies the Lagrangian method with
adaptive control to perform the local search. For a constrained optimization problem
with Lagrangian function L(x, λ), the system of ODEs is

    dx(t)/dt = -μ_g ∇_x L(x(t), λ(t))
    dλ(t)/dt = μ_t ∇_λ L(x(t), λ(t))

where μ_g and μ_t are coefficients. A combination of the objective function and the
constraints forms the Lagrangian function L(x, λ). By performing descents in the
original variable space x and ascents in the Lagrange-multiplier space λ, the system
converges to a saddle point corresponding to a local minimum of the original constrained
optimization problem. Based on the search profile, the Lagrangian method adjusts the
weights of the objective and the constraints to achieve faster and more robust
convergence.
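A discretized version of these saddle-point dynamics can be sketched as follows. This is a plain forward-Euler discretization with our own step-size and iteration-count choices, not the adaptive controller used in Novel:

```python
import numpy as np

def lagrangian_saddle_search(grad_f, h, grad_h, x0, steps=20000, dt=1e-3):
    """Saddle-point dynamics for L(x, lam) = f(x) + lam^T h(x):
    descend in x, ascend in lam. `grad_h` returns the (m x n)
    Jacobian of the constraint vector h."""
    x = np.asarray(x0, dtype=float)
    lam = np.zeros(len(h(x)))
    for _ in range(steps):
        gx = grad_f(x) + grad_h(x).T @ lam    # grad_x L
        x = x - dt * gx                        # descent in x
        lam = lam + dt * np.asarray(h(x))      # ascent in lam (grad_lam L = h)
    return x, lam
```

On min x1^2 + x2^2 subject to x1 + x2 - 1 = 0, the iteration spirals into the saddle point x = (0.5, 0.5), lam = -1, the constrained minimum.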

In the appendix, we demonstrate Novel's ability to find good solutions by solving some
nonlinear, simply-bounded optimization problems. We compare the experimental results
obtained by Novel with those obtained by multi-starts of gradient descent.

Discrete Methods

Novel applies discrete Lagrangian methods to solve discrete constrained problems.
Consider the following equality-constrained problem:

    min_{x ∈ D^n}  f(x)
    subject to  h(x) = 0

where x = (x_1, x_2, ..., x_n), h(x) = (h_1(x), h_2(x), ..., h_m(x)), and D is a subset
of the integers, i.e., D ⊆ I. The Lagrangian function is

    L(x, λ) = f(x) + λ^T h(x)

The general structure of DLM for solving this problem is shown in the figure below. In
discrete space, the discrete gradient operator Δ_x L(x, λ) is used in place of the gradient
function in continuous space. DLM performs descents in the original variable space of x
and ascents in the Lagrange-multiplier space of λ. We call one pass through the while
loop one iteration. In the following, we describe the features of our implementation of
DLM.

    Set initial x and λ
    while x is not a saddle point and the stopping condition has not been reached
        update x:  x ← x ⊕ Δ_x L(x, λ)
        if the condition for updating λ is satisfied then
            update λ:  λ ← λ + c · h(x)
        end if
    end while

Figure: A generic discrete Lagrangian method.

(a) Descent strategies. There are two ways to calculate Δ_x L(x, λ), greedy and
hill-climbing, each involving a search in the neighborhood of the current value of x. In a
greedy strategy, the value of x leading to the maximum decrease in the
Lagrangian-function value is selected to update the current value; therefore, all values
in the vicinity need to be searched every time, leading to high computational
complexity. In hill-climbing, the first value of x leading to a decrease in the
Lagrangian-function value is selected to update the current x. Depending on the order of
the search and the number of points that can improve, hill-climbing strategies are
generally less computationally expensive than greedy strategies. Hence, we usually use
the hill-climbing strategy in our experiments.
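The generic DLM, specialized to the hill-climbing descent strategy, can be sketched as follows. The neighbor generator, the stopping rule, and the default values of c and T are our illustrative assumptions; here Δ_x L is realized implicitly by scanning neighbors for the first improving move:

```python
def dlm(f, h, neighbors, x0, c=1.0, T=10, max_iters=10000):
    """Generic discrete Lagrangian method with hill-climbing descent.
    L(x, lam) = f(x) + sum_i lam_i * h_i(x); `neighbors(x)` yields
    candidate points differing from x in one variable."""
    x = list(x0)
    lam = [0.0] * len(h(x))
    L = lambda z: f(z) + sum(l * hi for l, hi in zip(lam, h(z)))
    for it in range(max_iters):
        cur, moved = L(x), False
        for y in neighbors(x):
            if L(y) < cur:            # hill-climbing: first improving neighbor
                x, moved = list(y), True
                break
        at_minimum = not moved
        if at_minimum and all(hi == 0 for hi in h(x)):
            return x, lam             # saddle point: feasible local minimum
        if at_minimum or it % T == 0:
            # update lam at local minima, and periodically every T iterations
            lam = [l + c * hi for l, hi in zip(lam, h(x))]
    return x, lam
```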

(b) Updating λ. The frequency with which λ is updated affects the performance of a
search. The considerations here are different from those for continuous problems. In a
discrete problem, descents based on discrete gradients usually make small changes in
L(x, λ) in each update of x, because only one variable changes. Hence, λ should not be
updated in every iteration, to avoid biasing the search in the Lagrange-multiplier space
of λ over the original variable space of x.

In our implementation, λ is updated in two situations. One is when the search reaches a
local minimum; the other is decided by a parameter T that specifies the number of
iterations before λ is updated. T can be changed dynamically, according to the value of
Δ_x L(x, λ) or when Δ_x L(x, λ) = 0. When Δ_x L(x, λ) = 0, a local minimum in the
original variable space is reached, and the search can only escape from it by updating λ.

A parameter c in the term c · h(x) in the figure above controls the magnitude of changes
in λ; c is the weight put on the constraints, while the weight of the objective is always 1.
In general, c can be a vector of real numbers, allowing nonuniform updates of λ across
different dimensions and possibly across time. For simplicity, we have used a constant c
in our implementation for all λ's.

For some problems, the constraints h(x) evaluate only to non-negative values, as in
satisfiability problems. This makes λ in the figure above non-decreasing. For difficult
problems that require millions of iterations, λ values can become very large as the
search progresses. Large λ's are generally undesirable because they cause large swings
in the Lagrangian-function value, making the search space rugged and good solutions
difficult to find.

To overcome this problem, we have developed strategies to reduce λ dynamically. One
method is to reduce λ periodically, every certain number of iterations. More
sophisticated methods reduce λ adaptively, based on the profiles of the objective, the
constraints, and the Lagrange-multiplier values.

(c) Starting points and restarts. In contrast to random-start methods, which rely on
restarting from randomly generated initial points to bring a search out of a local
minimum, DLM continues to evolve without restarts until a saddle point is found. This
avoids restarting from a new starting point when a search is already in the proximity of
a good local minimum. Another major advantage of DLM is that there are very few
parameters to be selected or tuned by users, including the initial starting point. This
makes it possible for DLM to always start from the origin, or from a random starting
point generated by a fixed random seed, and find a saddle point if one exists. The initial
value of λ is always zero.

(d) Plateaus in the search space. In discrete problems, plateaus with equal Lagrangian
values exist in the search space. The discrete gradient operator may have difficulties on
plateaus, because it only examines adjacent points of L(x, λ) that differ in one
dimension; hence, it may not be able to distinguish a plateau from a local minimum. We
have implemented two strategies to allow a plateau to be searched.

First, we need to determine when to change λ once the search reaches a plateau. As
indicated earlier, λ should be updated when the search reaches a local minimum in the
Lagrangian space. However, updating λ when the search is on a plateau changes the
surface of the plateau and may make it more difficult for the search to find a local
minimum somewhere inside the plateau. To avoid updating λ immediately when the
search reaches a plateau, we have developed a strategy called flat moves. This allows the
search to continue for some time on the plateau without changing λ, so that the search
can traverse states with the same Lagrangian-function value. How long flat moves
should be allowed is heuristic and possibly problem-dependent.

Our second strategy is to avoid revisiting the same set of states on a plateau. In
general, it is impractical to remember every state the search visits on a plateau, owing to
the large storage and computational overheads. In our implementation, we keep a tabu
list to maintain the set of variables flipped in the recent past, and avoid flipping a
variable if it is in the tabu list.
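A fixed-length tabu list as described can be sketched with a bounded deque; the list size and the closure-based interface are our own choices:

```python
from collections import deque

def make_tabu(size=8):
    """Fixed-length tabu list of recently flipped variables; a variable
    in the list may not be flipped again until it ages out (the deque
    evicts the oldest entry once `size` entries are stored)."""
    recent = deque(maxlen=size)
    def allowed(var):
        return var not in recent
    def record(var):
        recent.append(var)
    return allowed, record
```

Because the list is bounded, the storage overhead is constant regardless of plateau size, matching the constraint noted above.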

Potential for Parallel Processing

Parallel processing is essential in solving large nonlinear optimization problems. These
problems are computationally intensive because the number of local minima and the
number of variables can be large; the objectives, constraints, and derivatives can be
expensive to evaluate; and many evaluations of the objectives, constraints, and
derivatives may be needed. Also, the most powerful computers in the world are parallel
computers, and this trend will continue in the future.

In our global-search method, shown in the figure earlier, many components can be
executed in parallel. Generally, parallelism is found at three levels: global search in
parallel, local search in parallel, and evaluation of objectives, constraints, and
derivatives in parallel.

Our coarse-level global search in Novel provides initial points for our fine-level global
search. Parallel search methods can be used in the coarse-level global search, such as
parallel genetic algorithms, parallel simulated annealing, and parallel multi-starts of
descents. Other heuristic methods for coarse-level global search can also be parallelized.
For instance, in the trajectory-based method, sample points are selected along a
one-dimensional trajectory function over the search space; then, descents from these
sample points, which take much more time than selecting the sample points, can be done
in parallel.

Trace-based global searches starting from different initial points generated by the
coarse-level search can be executed in parallel; they are independent and have minimal
communication and synchronization overheads. Within each trace-based global search,
multiple stages can be executed in parallel. These stages are cascaded and can be
executed in a pipelined fashion. However, since the number of stages is not large, the
amount of parallelism is limited.

Local searches of our optimization method start from initial points generated by the
fine-level global searches. They are independent and can be executed in parallel. For
many application problems, local descents are much more expensive; in contrast, the
global-search phase is much faster and can quickly generate many initial points for local
descents. In this case, many descents can be performed in parallel, since there is very
little communication overhead among them.

There are a large number of function and gradient evaluations during the execution of
our global-search method, and they make up the majority of the work. In many
applications, the functions or gradients are expensive to evaluate; thus, evaluating
functions and gradients at different points in parallel can significantly reduce the
execution time.

Overall, many aspects of parallelism can be explored in our global-search method. Our
method can effectively run on parallel computers and achieve near-linear speedups.

Summary

In this chapter, we have presented a global-search method that overcomes local minima
in nonlinear optimization problems. This method consists of a global-search phase and a
local-search phase. The global-search phase is further divided into a coarse-level search
and a fine-level search; the coarse-level search identifies promising search regions to be
searched more thoroughly by the fine-level search.

We have applied existing global-search methods, including simulated annealing, genetic
algorithms, and multi-starts of descents, in the coarse-level global search. These
methods are executed for a limited amount of time to coarsely identify promising search
regions. We have also developed a variation of the trajectory methods, which selects
sample points based on a user-provided trajectory in the search space. The outputs of
the coarse-level global search are used as initial points in the fine-level global search.

For the fine-level global search, we have proposed a new global-search method for
continuous optimization problems, called trace-based global search. The trace-based
search uses a problem-independent, continuous trace to lead the search. It relies on two
counteracting forces: the local gradient force that drives the search to a local optimum,
and a deterministic trace force that leads the search out of local minima. The trace force
is expressed in terms of the distance between the current trajectory position and the
position of the trace. It produces an information-bearing trajectory from which sample
points can be identified as starting points for the local search.

We have studied the design of new trace functions and have proposed trace functions
that search a space from coarse to fine. Trace-based search is modeled by systems of
ordinary differential equations (ODEs). We have developed an adaptive variable-scaling
technique to overcome the difficulty caused by large gradients during the global search
and to enable the search trajectory to follow the trace closely.

Our trace-based method is efficient in the sense that it tries first to identify good
starting points before applying a local search. Owing to informed decisions in selecting
good starting points, our trace-based search avoids much unnecessary effort in
re-exploring already-known regions of attraction and spends more time finding new,
unvisited ones. Its complexity is related to the number of regions of attraction rather
than to the number of dimensions.

Our trace-based method is also efficient because it provides a continuous means of going
from one local region to another. This avoids problems in methods that determine new
starting points heuristically, as in evolutionary algorithms. When starting points are
determined randomly, the search may miss many local minima between two starting
points, or may spend too much time in one region.

Our trace-based method is better than multi-start methods that perform blind sampling,
as well as random-search algorithms, such as genetic algorithms and simulated
annealing, that do not exploit gradient information in their global-search strategies. It
incurs less overhead than existing generalized gradient methods and other deterministic
global-search methods, and it can be integrated into any existing local-search method.

In the local-search phase, existing local-descent methods have been applied. Gradient
descent, quasi-Newton, and conjugate gradient methods are efficient for continuous
unconstrained problems. For continuous constrained problems, we apply Lagrangian
methods with adaptive control to achieve faster and more robust convergence.

Finally, we have presented a prototype that implements our proposed methods for
solving nonlinear optimization problems. The prototype is called Novel, an acronym for
Nonlinear Optimization Via External Lead.

Novel combines the methods for handling nonlinear constraints proposed in the last
chapter with those for overcoming local minima proposed in this chapter, and solves
constrained and unconstrained problems in a unified framework. Constrained problems
are first transformed into Lagrangian functions; then Novel performs global searches
based on the objective functions of unconstrained problems and the Lagrangian
functions of constrained problems, overcoming local minima in a deterministic and
continuous fashion.

In the next part of this thesis, we show some very promising results in applying Novel to
solve real-world applications. These applications are (a) neural-network learning
problems, formulated as unconstrained optimization problems; (b) digital filter-bank
designs, formulated as nonlinear constrained optimization problems; (c) satisfiability
and maximum-satisfiability problems; and (d) multiplierless filter-bank designs,
formulated as discrete optimization problems.

UNCONSTRAINED OPTIMIZATION IN NEURAL-NETWORK LEARNING

In this chapter, we apply Novel to feedforward neural-network learning. In general, such learning can be considered as a nonlinear global optimization problem, in which the goal is to minimize a nonlinear error function that spans the space of weights, and to look for global optima in contrast to local optima. First, we introduce artificial neural networks, present the formulation of neural-network learning as an optimization problem, and survey existing learning algorithms. Then, using five benchmark problems, we compare Novel against some of the best global optimization algorithms and demonstrate its superior performance.

Artificial Neural Networks

The concept of artificial neural networks (ANNs) was inspired by biological neural networks. Biological neurons, believed to be the structural constituents of the brain, are much slower than silicon logic gates. The brain compensates for the relatively slower operation of biological neurons by having an enormous number of them that are massively interconnected. The result is that inferencing in biological neural networks is much faster than the fastest computer. In a word, a biological neural network is a nonlinear, highly parallel device characterized by robustness and fault tolerance. It learns by adapting its synaptic weights to changes in the surrounding environment.

Table: Von Neumann computer versus biological neural systems.

Aspect        Von Neumann Computer            Biological Neural System
Processor     Complex; high speed;            Simple; low speed;
              one or a few                    a large number
Memory        Separate from a processor;      Integrated into processor;
              localized;                      distributed;
              not content addressable         content addressable
Computing     Centralized; sequential;        Distributed; parallel;
              stored programs                 self-learning
Reliability   Very vulnerable                 Robust
Expertise     Numerical computation and       Perceptual problems
              symbolic manipulations
Operating     Well-defined;                   Poorly defined;
environment   well-constrained                unconstrained

Modern digital computers outperform humans in the domain of numeric computation and symbol manipulation. However, humans can solve complex perceptual problems at a much higher speed and broader extent than the world's fastest computer. The biological brain is completely different from the von Neumann architecture implemented in modern computers. The table above summarizes the differences.

ANNs attempt to mimic some characteristics of biological neural networks. In ANNs, information is stored in the synaptic connections. Each neuron is an elementary processor with primitive operations, like summing the weighted inputs and then amplifying or thresholding the sum.

Figure shows a binary threshold unit proposed by McCulloch and Pitts as an artificial neuron. This neuron computes a weighted sum of n input signals x_i, i = 1, ..., n, and passes it through an activation function (a threshold function in this example) to produce the output y:

    y = theta( sum_{i=1}^{n} w_i x_i )

where theta(.) is a step function and w_i is the synapse weight associated with the i-th input.

Figure: McCulloch-Pitts model of a neuron (left) and a three-layer feedforward neural network with input, hidden, and output layers (right).
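The threshold unit above can be sketched in a few lines of code. This is a minimal illustration, not the thesis implementation; the weights, inputs, and threshold value are assumptions chosen so the unit implements logical AND.

```python
# Hedged sketch of a McCulloch-Pitts threshold unit: y = theta(sum_i w_i * x_i).
# Weights, inputs, and the threshold below are illustrative assumptions.

def mcculloch_pitts(inputs, weights, threshold=0.0):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= threshold else 0

# A unit with weights (1, 1) and threshold 2 implements logical AND.
assert mcculloch_pitts([1, 1], [1, 1], threshold=2) == 1
assert mcculloch_pitts([1, 0], [1, 1], threshold=2) == 0
```

Choosing different weights and thresholds yields other Boolean functions, which is one way to see the universal-computation result mentioned next.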

McCulloch and Pitts proved that, with suitably chosen weights, a synchronous arrangement of such neurons can perform universal computations. The McCulloch-Pitts neuron has been generalized in many ways. For example, besides threshold functions, activation functions have taken the forms of piecewise-linear, sigmoid, or Gaussian functions.

A neural network is characterized by the network topology, the connection strengths between pairs of neurons (weights), the node properties, and the state-updating rules. The updating rules control the states of the processing elements (neurons).

According to their architectures, neural networks can be classified into feedforward and feedback networks. In feedforward neural networks, connections between nodes (neurons) are unidirectional, from the input nodes to the output nodes, through possibly several layers of hidden nodes. The most popular family of feedforward networks are the multilayer perceptrons, in which neurons are organized into layers that have unidirectional connections between them. Figure shows a typical structure of this type of network. On the other hand, in feedback neural networks, there are connections in the reverse direction, i.e., from the output nodes to the input nodes. These feedback connections feed information back from output nodes to input or hidden nodes.

An alternative classification of neural networks is based on learning paradigms, by which neural networks are classified into supervised-, reinforcement-, and unsupervised-learning networks.

In supervised learning, the target outputs are known ahead of time, and the error between the desired outputs and the actual outputs during learning is fed back to the network to adjust its weights. Reinforcement learning differs in that its feedback only distinguishes between correct and incorrect outputs, but not the amount of error. In unsupervised learning, no target output is supplied during training, and the network self-organizes and classifies the training data on its own.

In supervised learning of both feedforward and feedback neural networks, the learning can be formulated as an unconstrained nonlinear optimization problem. Despite various forms of error functions, such as mean-square error, arctangent error, and cross-entropy error, the function to be optimized is usually highly nonlinear, with many local minima.

In reinforcement learning, the gradients of the optimization criterion with respect to the network weights are estimated instead of computed. Since this information is inaccurate and noisy, learning is usually carried out by stochastic learning methods and does not involve traditional numerical optimization.

Unsupervised learning methods use a prespecified internal optimization criterion to capture the quality of a neural network's internal representation. For certain problems whose outputs are known ahead of time, for instance, encoding and clustering, unsupervised learning can be formulated as an optimization problem. For feedback networks, such as Hopfield networks, the network is designed for the specific function it implements and does not involve optimization.

Feedforward Neural-Network Learning Problems

Artificial neural networks have found many successful applications, ranging from fingerprint recognition to intelligent cameras. Neural-network design (learning) is a difficult task, and current design methods are far from optimal.

Learning in neural networks, whether supervised, reinforcement, or unsupervised, is accomplished by adjusting the weights of connections in response to the inputs of training patterns. In feedforward neural networks, this involves a mapping from an input space to an output space. That is, the output O of a neural network can be defined as a function of the inputs I and the connection weights W: O = Phi(I, W), where Phi represents a mapping function.

Supervised learning involves finding a good mapping function that maps training patterns correctly, as well as generalizing the mapping found to test patterns not seen in training. This is usually done by adjusting the weights W, while fixing the topology and the activation function. In other words, given a set of training patterns of input-output pairs {(I_1, D_1), (I_2, D_2), ..., (I_m, D_m)} and an error function epsilon(W; I, D), learning strives to minimize the learning error E(W):

    min_W E(W) = min_W sum_{i=1}^{m} epsilon(W; I_i, D_i)

One popular error function is the squared-error function, in which epsilon(W; I_i, D_i) = (Phi(I_i, W) - D_i)^2. Since this error function is non-negative, i.e., E(W) >= 0, if there exists W* such that E(W*) = 0, then W* is a global minimizer; otherwise, the W that gives the smallest E(W) is the global minimizer. The quality of a learned network is measured by its error on a given set of training patterns and by its generalization error on a set of test patterns.

Supervised learning can be considered as an unconstrained nonlinear minimization problem, in which the objective function is defined by the error function, and the search space is defined by the weight space. Unfortunately, the terrain modeled by the error function in its weight space can be extremely rugged and can have many local minima. This phenomenon is illustrated in Figure, which shows two graphs of an error surface projected to two different pairs of dimensions around a global minimum. The left graph shows a rugged terrain with a large number of small local minima in the weight space, whereas the right one shows a distinctive terrain with large flat regions and steep slopes. This error surface corresponds to a five hidden-unit network that has been trained to solve the two-spiral problem, to be discussed later in this chapter. Obviously, a search method that cannot escape from a local minimum will have difficulty in finding the global minimum in the rugged weight space.

To see how the network structure can affect the solution quality, consider the fact that a one hidden-layer perceptron can perform arbitrary mappings, as long as the number of hidden units is large enough. The capacity of a neural network and the number of weights increase as the number of hidden units increases. This means more global minimizers exist in the error surface, and local-search methods, such as gradient descent, can find them more readily. On the other hand, when the number of hidden units is small, there are fewer global minimizers. Moreover, the error surface can be extremely rugged, making it difficult for a descent method from a random starting point to find a good solution. To overcome this problem, more powerful global search methods are needed.

There are many benefits in using smaller neural networks. First, they are less costly to implement and are faster, both in hardware and in software implementations. Second, they generalize better, because they avoid overfitting the weights to the training patterns. This happens because we are using a lower-dimensional function to fit the training patterns.

Existing Methods for Feedforward Neural-Network Learning

As we know, the supervised learning of feedforward neural networks is an unconstrained continuous optimization problem. The task is to find the appropriate values of the connection weights, so that a given error function is minimized. Optimization problems are classified into unimodal and multimodal problems, depending on whether there is only one or more than one local minimum in the search space. In general, neural-network learning is a multimodal nonlinear minimization problem with many local minima.

Our study of error functions in the supervised learning of neural networks reveals the following features:

- Flat regions may mislead gradient-based methods.

- There may be many local minima that trap gradient-based methods.

- Deep but suboptimal valleys may trap any search method.

- Gradients may differ by many orders of magnitude, making it difficult to use gradients during a search.

A good search method should therefore have mechanisms (a) to use gradient information to perform local search and to adjust to changing gradients, and (b) to escape from a local minimum after getting there.

Learning algorithms for neural networks find their roots in optimization methods, which are classified into local and global optimization. Local optimization algorithms, such as gradient descent and Newton's method, find local minima efficiently and work best in unimodal problems. They execute fast, but converge to local minima that could be much worse than the global minimum. Global optimization algorithms, in contrast, employ heuristic strategies to look for global minima and do not stop after finding a local minimum. Next, we briefly review local and global optimization algorithms in neural-network learning.

Learning Using Local Optimization

Many local optimization algorithms, such as gradient descent, second-order gradient descent, quasi-Newton, and conjugate gradient methods, have been adapted to the learning of neural networks. The popular back-propagation (BP) algorithm is a variation of gradient-descent algorithms. Other gradient-descent-type algorithms include BP through time, recurrent BP, steepest descent, and other variations. Second-order gradient descents, including BP with momentum and BP with dynamic learning rate, have been developed to improve the convergence time of BP. Also, quasi-Newton methods have shown good speedups over BP, whereas conjugate gradient methods are among the fastest. Other heuristic methods that have fast learning speed include methods that learn layer by layer, iterative methods, hybrid learning algorithms, and methods developed from the field of optimal filtering. Recurrent neural networks have also been trained by gradient-based methods.

Local minimization algorithms have difficulties when the surface is flat (gradient close to zero), when gradients can be in a large range, or when the surface is very rugged. When gradients can vary greatly, local search may progress too slowly when the gradient is small, and may overshoot when the gradient is large. When the error surface is rugged, a local search from a randomly chosen starting point will likely converge to a local minimum close to the initial point, and hence to a solution worse than the global minimum. Moreover, these algorithms usually require choosing some parameters, as incorrectly chosen parameters may result in slow convergence and poor-quality solutions.

Learning Using Global Search

In order to overcome local minima in the search space and find better solutions, global minimization methods have been developed. They use local search to determine local minima, and focus on bringing the search out of a local minimum once it gets there. Many global optimization algorithms have been applied to neural-network learning. Examples of deterministic methods are trajectory methods and branch-and-bound methods. The deficiency of these methods is that they can only solve small networks. Stochastic global optimization methods are more successful in solving the learning problem of neural networks. These include random search algorithms, simulated annealing, and genetic algorithms. They have shown improved performance with respect to local optimization algorithms. However, their execution time can be significantly longer. One thing that needs to be mentioned is that neural-network learning has also been formulated as a combinatorial optimization problem and solved by combinatorial optimization methods, such as reactive tabu search.

Experimental Results of Novel

In this section, we present the experimental results of Novel in solving five neural-network benchmark problems and show its improvement in performance against some of the

Table: Benchmark problems studied in our experiments: the two-spiral, sonar, vowel, parity, and NetTalk problems. For each problem, the numbers of inputs, outputs, and training and test patterns are listed.

best global optimization algorithms. First, we present the neural-network problems used in our experiments. Using the two-spiral problem as an example, we study the implementation issues of Novel and compare Novel with other global optimization methods. Finally, we present a summary of the experimental results on the other benchmark problems. In general, Novel is able to find better results as compared to other global optimization algorithms in the same amount of time.

Benchmark Problems Studied in Our Experiments

Table summarizes the benchmark problems studied in our experiments. They represent neural-network learning problems of different sizes and complexities. These problems can be obtained from ftp.cs.cmu.edu in directory /afs/cs/project/connect/bench.

Two-spiral problem: This problem discriminates between two sets of training points that lie on two distinct spirals in the x-y plane. The two spirals coil three times around the origin and around one another. Problems like this one, whose inputs are points on the 2-D plane, enable us to display the output of the network in a 2-D graph. The two-spiral problem appears to be a very difficult task for back-propagation networks.

Figure shows the training and test sets. Each data set consists of two spirals of input-output pairs. Each training or testing pattern represents a point on the corresponding spiral.

Figure: The training set of the two-spiral problem (left) and the test set of the two-spiral problem on top of the training set (right).

The neural network that solves the two-spiral problem has

two real-valued inputs, representing the x and y coordinates of a point, and one binary target output that classifies the point to one of the two spirals.

In general, we determine the correctness of a binary output using the following criterion: an output is considered to be 0 if it is in the lower portion of the real-valued output range, 1 if it is in the upper portion, and incorrect if it is in the middle of the range.
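This criterion can be sketched as follows. The exact cutoff fractions are not reproduced here, so the 0.4 and 0.6 thresholds on a [0, 1] output range are illustrative assumptions, not the thesis values.

```python
# Hedged sketch of the binary-output correctness criterion described above.
# The cutoffs lo = 0.4 and hi = 0.6 on a [0, 1] output range are assumptions.

def classify_output(y, lo=0.4, hi=0.6):
    """Map a real-valued output to 0, 1, or 'incorrect' (middle of the range)."""
    if y <= lo:
        return 0
    if y >= hi:
        return 1
    return "incorrect"

assert classify_output(0.1) == 0
assert classify_output(0.9) == 1
assert classify_output(0.5) == "incorrect"
```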

Sonar problem: This problem discriminates between sonar signals bounced off a metallic cylinder and those bounced off a roughly cylindrical rock. This is the data set used by Gorman and Sejnowski in their study of the classification of sonar signals using a neural network. We used the training and test samples of their aspect-angle-dependent experiments. The problem has sixty real-valued inputs and one binary output.

Vowel recognition problem: This problem trains a network to perform speaker-independent recognition of the eleven steady-state vowels of British English. The problem was used by Tony Robinson in his PhD thesis to study problems that have no exact solution, in which the goal is to maximize the less-than-perfect performance of neural networks. This problem has ten real-valued inputs and eleven binary outputs. Vowels are classified correctly when the distance of the correct output to the actual output is the smallest of the set of distances from the actual output to all possible target outputs.

Parity problem: This problem trains a network to compute the modulo-two sum of ten binary digits. It is a special case of the general parity problem, in which the task is to train a network to produce the sum modulo 2 of N binary inputs. The parity problem has N binary inputs and one binary output. The output is 1 if there is an odd number of 1-bits in the input, and 0 if there is an even number. In this parity problem, there are only training patterns and no test patterns.
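The N-parity training set described above can be generated mechanically: every binary input pattern of length N, labeled with the modulo-2 sum of its bits. The snippet below is a sketch of that construction; the helper name is our own.

```python
# Generate the N-parity training set: 2^N binary patterns, each labeled with
# the modulo-2 sum of its bits (1 for an odd number of 1-bits, 0 for even).

from itertools import product

def parity_patterns(n):
    return [(bits, sum(bits) % 2) for bits in product((0, 1), repeat=n)]

patterns = parity_patterns(3)
assert len(patterns) == 8               # 2^3 patterns, and no separate test set
assert ((1, 1, 0), 0) in patterns       # even number of 1-bits -> 0
assert ((1, 1, 1), 1) in patterns       # odd number of 1-bits -> 1
```

For the ten-bit version used here, the same construction enumerates all 2^10 input patterns.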

NetTalk problem: This problem trains a network to produce proper phonemes, given a string of letters as input. This is an example of an input-output mapping task that exhibits strong global regularities, but also a large number of more specialized rules and exceptional cases. The data set of NetTalk was contributed by Terry Sejnowski. It contains English words, along with a phonetic transcription for each word.

We have used (a) the same network settings, numbers of inputs and outputs, and unary encoding as in Sejnowski and Rosenberg's experiments; (b) the most common English words as the training set, and the entire data set as the test set; and (c) the best-guess criterion: an output is treated as correct if it is closer (with the smallest angle) to the correct output vector than to any other phoneme output vector.

Experimental Results on the Two-Spiral Problem

The two-spiral problem has been used extensively in testing different types of neural networks and their corresponding learning algorithms. Published results include the learning of feedforward networks using BP, CASCOR, and projection-pursuit learning. The smallest network is believed to have nine hidden units, with weights trained by CASCOR.

In our experiments, we have used feedforward networks with shortcuts (see Figure). Each hidden unit is ordered and labeled by an index, and has incoming connections from all input nodes and from all hidden units with smaller indexes. The activation function is an asymmetric sigmoidal function g(x) = 1/(1 + exp(-alpha x)), where alpha is the sigmoid gain and x is the inner product of the outputs of the other neurons and the incoming weights.

Figure: Structure of the neural network for the two-spiral problem: a feedforward network with k hidden units and shortcut connections, in which each neuron applies the activation function g to the inner product of the outputs of the other neurons and the incoming weights.
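A forward pass through such a shortcut network can be sketched as below. This is an illustrative sketch only: the weight values and inputs are assumptions, and the thesis obtains the actual weights by training with Novel.

```python
# Sketch of the shortcut feedforward network described above: hidden unit j
# receives all inputs plus the outputs of hidden units with smaller indexes,
# and applies the sigmoid g(x) = 1/(1 + exp(-alpha * x)) with gain alpha.
# The weights and inputs below are illustrative assumptions.

import math

def sigmoid(x, gain=1.0):
    return 1.0 / (1.0 + math.exp(-gain * x))

def forward(inputs, hidden_weights, gain=1.0):
    """hidden_weights[j] holds one weight per input plus one per earlier hidden unit."""
    hidden = []
    for j, w in enumerate(hidden_weights):
        fan_in = list(inputs) + hidden[:j]       # shortcut connections
        x = sum(wi * vi for wi, vi in zip(w, fan_in))
        hidden.append(sigmoid(x, gain))
    return hidden

outs = forward([1.0, -1.0], [[0.5, 0.5], [1.0, 1.0, 1.0]], gain=2.0)
assert len(outs) == 2
assert 0.0 < outs[0] < 1.0
```

Note how the second hidden unit's fan-in includes the first hidden unit's output, which is what the ordering of hidden units by index provides.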

We have fixed the search range in each dimension and have varied the sigmoid gain alpha. A larger alpha has the effect of compressing the search regions. The objective function to be minimized is the total sum of squared errors (TSSE), f(x) = E(x), defined earlier, where the variable vector x represents the weights W. The error function f(x) is computed based on the training patterns during learning. Test patterns are not used in the objective function during learning; they are only used to check the results obtained.

The global-search phase of Novel consists of three cascaded stages. Each stage is modeled as a system of ODEs, restated as follows:

    dx(t)/dt = -mu_t grad_x f(x(t)) - mu_g (x(t) - T(t))

where mu_g and mu_t are constant coefficients that are the same for all stages, and T(t) is the external trace function. The system of ODEs is solved by LSODE, using a method based on a backward differentiation formula. The local-search phase of Novel performs gradient descent. The gradient descent is also

Table: Results of using Novel with the LSODE solver to train four and five hidden-unit neural networks for solving the two-spiral problem. For each network size, the table lists the sigmoid gain, the coefficients mu_g and mu_t, the training TSSE, the percentages of training and testing patterns classified correctly, the maximum number of time units executed, and the average CPU time per time unit in minutes.

modeled as a system of ODEs:

    dx(t)/dt = -grad_x f(x(t))

and solved by LSODE. All our experiments were carried out on Sun SparcStation workstations.
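The dynamics of the two phases can be sketched with a simple Euler integrator on a toy objective. This is a minimal illustration under stated assumptions, not the thesis implementation: f(x) = x^2, the trace T(t), the coefficient values, and the step size are all chosen for the example, whereas the thesis solves the actual systems with LSODE or the Euler method.

```python
# Euler-integration sketch of the two ODE systems above, on a 1-D quadratic
# f(x) = x^2 so that grad f is analytic. T(t), mu_t, mu_g, and the step size
# are illustrative assumptions.

def grad_f(x):
    return 2.0 * x              # gradient of f(x) = x^2

def trace(t):
    return -1.0 + 0.1 * t       # a simple moving trace T(t)

def global_stage(x0, mu_t, mu_g, dt=0.01, steps=1000):
    """Euler steps of dx/dt = -mu_t * grad f(x) - mu_g * (x - T(t))."""
    x, t = x0, 0.0
    for _ in range(steps):
        x += dt * (-mu_t * grad_f(x) - mu_g * (x - trace(t)))
        t += dt
    return x

def local_stage(x0, dt=0.01, steps=2000):
    """Euler steps of the local-search ODE dx/dt = -grad f(x)."""
    x = x0
    for _ in range(steps):
        x += dt * (-grad_f(x))
    return x

# The global stage follows the trace; the local stage then descends to a minimum.
x_global = global_stage(5.0, mu_t=1.0, mu_g=1.0)
x_local = local_stage(x_global)
assert abs(x_local) < 1e-3      # gradient descent reaches the minimizer of x^2
```

The trade-off discussed next, between mu_g, mu_t, and the trace speed, can be explored by scaling both coefficients up or down in this sketch.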

We have tried various combinations of the algorithmic parameters mu_t and mu_g in Novel. In Table, we show the results of training four and five hidden-unit networks to solve the two-spiral problem. The sigmoid gain alpha takes on several values, while the weight coefficients mu_t and mu_g are scaled up and down around their baseline values. When mu_g and mu_t are both scaled down from the baseline values by the same magnitude, the speed of trace movement is slowed down. A slower trace takes more logical time, and usually more CPU time, to traverse the same distance than a faster one. On the other hand, when mu_g and mu_t are both scaled up, the trace moves faster. The danger is that, if the trace moves too fast, the search trajectory generated from the ODE dynamic system will not be able to keep up with the trace. Therefore, mu_g and mu_t should be chosen appropriately, so that the search trajectory moves at a fast speed but under the speed limit of the system of ODEs.

After trying various values of mu_g and mu_t for the four and five hidden-unit networks, we found a combination that works well. These parameter values were then used in our experiments to solve the learning problems of the other hidden-unit neural networks.

Table shows that multiple solutions with full correctness on the training patterns are found for the five hidden-unit network using different parameter values. These solutions have different correct percentages on the testing patterns. For the four hidden-unit network, the solutions do not reach full correctness on the training set. Generally, larger networks are easier to train to solve the training patterns correctly, because there are more global minimizers in the search space. From a function-fitting point of view, a neural-network learning problem is to fit a set of training data by finding the appropriate weights of a neural network. A function with more variables, corresponding to a neural network with more weights, has more ways to fit a given training set than a function with fewer variables.

Novel successfully trained the five hidden-unit networks within a small number of time units. Training four hidden-unit networks is more difficult. After running Novel for many time units, amounting to hours of CPU time on a SparcStation, we found a solution with a small TSSE that classified most of the training set correctly. Using this solution as a new starting point, we executed Novel for several more hours and found a fully correct solution. Figure shows the four hidden-unit neural network trained by Novel that solves the two-spiral problem. The second figure in the first row of Figure shows how the best four hidden-unit network

classifies the 2-D space.

Figure: A four hidden-unit neural network trained by Novel that solves the two-spiral problem.

Figure: 2-D classification graphs for the two-spiral problem by the hidden-unit neural networks shown in the first through fourth columns, trained by Novel (upper row) and SIMANN (lower row). The Novel runs use the coefficients mu_g and mu_t and the search range given in the text; the SIMANN runs use its parameters RT, NT, and n and the same search range. The crosses and circles represent the two spirals of the training set. Note that one of the networks was trained by Novel differently from the others.

Novel has found optimal designs for the two-spiral problem, which correspond to full correctness on the training set, for the larger hidden-unit neural networks. The best design for the smallest network considered is close to, but below, full correctness on the training set. The top four graphs in Figure show the 2-D classifications by the best hidden-unit networks found by Novel.

Besides solving the system of ODEs in the global-search phase using LSODE, we have also tried the simpler, less accurate, but faster Euler method. By using fixed step sizes, the Euler method advances through discrete points quickly, without worrying about the stiffness of the ODE system. The drawback of the Euler method is that it has no error estimation or accuracy guarantee. In our experiments, we study the trade-off between the solution quality and the accuracy in generating the search trajectory.

An advantage of the Euler method is that it can work with the pattern-wise learning mode as well as the epoch-wise learning mode. In pattern-wise learning, variables (weights) are updated after every training pattern is presented, while in epoch-wise learning, variables are updated after all the training patterns are presented. Accordingly, the gradients of the objective function are calculated based on every single training pattern in pattern-wise learning. By updating the weights more frequently, pattern-wise learning algorithms, such as BP with pattern-wise learning, have been able to converge faster than epoch-wise learning algorithms.

The pattern-wise learning mode introduces large variations in gradient values, because they are calculated based on a single training pattern. In Novel, when we use pattern-wise learning and use LSODE to solve the system of ODEs in generating the search trajectories, the rapid change of gradients makes LSODE fail. In contrast, the Euler method uses fixed step sizes and tolerates the changes of gradients well. Hence, in our experiments, LSODE is only used with the epoch-wise learning mode, where all the training patterns are presented before the gradient is calculated and the variables are updated. The Euler method can be used with both epoch-wise and pattern-wise learning modes.
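The two update modes can be contrasted on a toy problem. This is a hedged sketch, not the thesis code: the one-weight least-squares fit, the data, the learning rate, and the iteration count are all assumptions for illustration.

```python
# Sketch of the epoch-wise and pattern-wise update modes described above, on a
# 1-D least-squares fit. Data, learning rate, and iteration counts are assumed.

def grad(w, pattern):
    i, d = pattern
    return 2.0 * (w * i - d) * i        # gradient of the error (w*i - d)^2

def epoch_step(w, patterns, lr=0.05):
    """Epoch-wise: sum the gradients over all patterns, then update once."""
    return w - lr * sum(grad(w, p) for p in patterns)

def pattern_steps(w, patterns, lr=0.05):
    """Pattern-wise: update the weight after every single pattern."""
    for p in patterns:
        w -= lr * grad(w, p)
    return w

patterns = [(1.0, 2.0), (2.0, 4.0)]     # both patterns are fit exactly by w = 2
w_epoch = w_pattern = 0.0
for _ in range(200):
    w_epoch = epoch_step(w_epoch, patterns)
    w_pattern = pattern_steps(w_pattern, patterns)
assert abs(w_epoch - 2.0) < 1e-3
assert abs(w_pattern - 2.0) < 1e-3
```

On this smooth toy problem both modes reach the same minimizer; the pattern-wise gradients, computed from one pattern at a time, are the source of the rapid gradient changes that make LSODE fail in the experiments above.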

To measure the performance of both the epoch-wise and the pattern-wise learning, we use the mean squared error (MSE) as the objective function to be minimized. MSE is the total sum of squared errors (TSSE) divided by the number of patterns used in calculating the values of the error function and its gradient.

We have fixed the step size of the Euler method and varied the values of the parameters mu_t and mu_g. In epoch-wise training, we have tested four pairs of coefficients (mu_g, mu_t) and five values of the sigmoid gain alpha. Table presents the combinations of parameters leading to the best results of Novel using the Euler method and epoch-wise training. Our results show that the Euler method is about ten

Table
Summary results of Novel using the Euler method to solve the two-spiral problem, with a fixed limit on the number of time units in each run.

No. of hidden units | No. of weights | Sigmoid gain | Coefficients | Best solution (training SSE, % correct training, % correct testing) | CPU time (time units, minutes)

times faster than LSODE in solving the system of ODEs. However, the quality of the solutions obtained by using the Euler method is worse.

In using the Euler method together with pattern-wise training, we need to set the step size to be smaller to get good solutions. Results obtained by using pattern-wise training are worse than those obtained by using epoch-wise training; for networks with the same number of hidden units, the best result of pattern-wise training is inferior to that of epoch-wise training.

Comparison with Other Global Optimization Methods

In this subsection, we compare the performance of Novel with that of other global optimization methods for solving the two-spiral problem. These algorithms include simulated annealing, evolutionary algorithms, cascade correlation with multi-starts (CASCOR-MS), gradient descent with multi-starts (GRAD-MS), and truncated Newton's method with multi-starts (TN-MS). We first describe these methods as follows.

(a) Simulated annealing (SA). SA is a stochastic global optimization method. Starting from an initial point, the algorithm takes a step and evaluates the function. When minimizing a function, any downhill movement is accepted, and the process repeats from this new point. An uphill movement may also be accepted, and by doing so, the search can escape from local minima. This uphill decision is made by the Metropolis criterion. As the minimization process proceeds, the length of the steps decreases, and the probability of accepting uphill movements decreases as well. The search converges to a local (sometimes global) minimum at the end. The software package we used is SIMANN from netlib (http://www.netlib.att.com).
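The acceptance rule can be sketched as follows; the geometric cooling schedule and the uniform neighbor move are illustrative choices, not SIMANN's exact ones.

```python
import math
import random

def metropolis_accept(delta, temperature, rng=random.random):
    """Decide whether to accept a move with energy change `delta`.

    Downhill moves (delta <= 0) are always accepted; uphill moves are
    accepted with probability exp(-delta / T), which shrinks as the
    temperature T is lowered.
    """
    if delta <= 0:
        return True
    return rng() < math.exp(-delta / temperature)

def anneal(f, x, step, t0=1.0, cooling=0.95, iters=2000, seed=0):
    """Minimize a 1-D function f with a simple simulated-annealing loop."""
    rnd = random.Random(seed)
    t = t0
    best = x
    for _ in range(iters):
        cand = x + rnd.uniform(-step, step)   # neighbor move
        if metropolis_accept(f(cand) - f(x), t, rnd.random):
            x = cand
            if f(x) < f(best):
                best = x
        t *= cooling  # geometric cooling: step acceptance becomes greedier
    return best
```

As the temperature decays, the loop degenerates into a greedy descent, mirroring the behavior described above.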

(b) Evolutionary algorithm (EA). EA is based on the computational model of evolution. A variety of EAs have been proposed in the past, among which are genetic algorithms, evolutionary programming, and evolutionary strategies. EAs maintain a population of individual points in the search space, and the performance of the population evolves to be better through selection, recombination, mutation, and reproduction. The fittest individual has the largest probability of survival. EAs have been applied to solve complex multimodal minimization problems with both discrete and continuous variables. The packages we used in our experiments are GENOCOP (GEnetic algorithm for Numerical Optimization for COnstrained Problems) and LICE (LInear Cellular Evolution).
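A minimal evolutionary loop of this kind, with truncation selection and Gaussian mutation, might look like the following sketch (illustrative only; GENOCOP and LICE use more elaborate operators):

```python
import random

def evolve(f, dim, pop_size=20, generations=200, sigma=0.3, seed=0):
    """Minimize f with a toy evolutionary strategy.

    Keeps a population of points, selects the fitter half, and refills
    the population with mutated copies of the survivors. No gradient
    information is used, which is the weakness noted later in the text.
    """
    rnd = random.Random(seed)
    pop = [[rnd.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=f)                       # fittest (lowest f) first
        survivors = pop[: pop_size // 2]      # truncation selection
        children = [[x + rnd.gauss(0, sigma) for x in rnd.choice(survivors)]
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children            # reproduction
    return min(pop, key=f)

# Example: minimize the sphere function f(x) = sum(x_i^2).
best = evolve(lambda x: sum(v * v for v in x), dim=3)
```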

(c) Cascade correlation with multi-starts (CASCOR-MS). The cascade-correlation learning algorithm is a constructive method that starts from a small network and gradually builds a larger network to solve the problem. This algorithm was originally proposed by Fahlman and Lebiere and has been applied successfully to some neural-network learning problems. In CASCOR-MS, multiple runs of CASCOR are executed from randomly selected initial points.

(d) Gradient descent with multi-starts (GRAD-MS). The gradient-descent algorithm is simple and popular, and its variants have been applied in many engineering applications. An example is the back-propagation algorithm in neural-network learning. For the two-spiral problem, gradient descent is performed by solving a system of ODEs modeling gradient descent.

(e) Truncated Newton's method with multi-starts (TN-MS). The truncated Newton's method uses second-order information that may help convergence in descents. It is faster than gradient descents solved by LSODE. The software package we used is TN from netlib.

Next, we compare the performance of Novel with that of these methods. To allow a fair comparison, we ran all these methods for the same amount of time using the same network structure. We used the same sigmoid gain for all algorithms except CASCOR-MS and TN-MS, which used a different gain. The CPU time allowed for each experiment was limited to a fixed number of hours on a Sun workstation.

The simulated-annealing program used in our experiments is SIMANN from netlib, with some modifications by Goffe, Ferrier, and Rogers. We experimented with various temperature-scheduling factors RT, function-evaluation factors NT, and search ranges. The best results were achieved for a particular setting of RT, of NT (expressed in terms of n, the number of variables), and of the search range.

We have studied two evolutionary algorithms (EAs): GENOCOP by Michalewicz and LICE by Sprave. GENOCOP aims at finding a global minimum of an objective function under linear constraints. We have tried various search ranges and population sizes; a particular search range and a population size expressed in terms of n give the best results. LICE is a parameter-optimization program based on evolutionary strategies. In applying LICE, we have also tried various initial search ranges and population sizes, and one such combination gives the best results.

In applying CASCOR-MS, we ran Fahlman's CASCOR program from random initial weights in a fixed range. We started from a new starting point when the current run did not result in a converged network within the maximum number of hidden units allowed.

In GRAD-MS, we generated multiple random initial points in a fixed range. Gradient descent is modeled by a system of ODEs and is solved using LSODE. One descent is usually fast; the CPU time limit determines how many descents are executed.

Finally, we have used the truncated Newton's method obtained from netlib with multi-starts (TN-MS). We generated random initial points in a fixed range and set the sigmoid gain accordingly. Since one run of TN-MS is very fast, a large number of runs were done within the time limit.

The best performance of these algorithms is shown in the figure below, in which the four graphs show the progress of all the learning algorithms for the networks with 3, 4, 5, and 6 hidden units, respectively. The plots for Novel, SA, and the EAs show the progress of the best run. The plots for the other multi-start algorithms show the improvement of the incumbent, the best solution


Figure
The best performance of various global minimization algorithms for learning the weights of neural networks with 3, 4, 5, and 6 hidden units for solving the two-spiral problem.


Figure
Training and testing errors of the best designs obtained by various algorithms for solving the two-spiral problem, for neural networks with 3, 4, 5, and 6 hidden units (with correspondingly many weights, including the biases in the neurons).

so far, until the CPU time limit is reached. The general trend of all these algorithms is that the solutions improve quickly in the beginning. Then the improvement slows down as time goes on. Later, the values of the objective functions obtained by these algorithms stabilize at different levels.

The preceding figure summarizes the training and test results of the best solutions found by each of these algorithms when run under the fixed CPU-time limit on a Sun SparcStation. The graphs show that Novel has the best training and test results for the neural networks found, followed by SIMANN, TN-MS, CASCOR-MS, and the two evolutionary algorithms. The two evolutionary algorithms do not work well because genetic operators like mutation and crossover do not utilize local gradient information in deciding where to search.

The simulated-annealing algorithm has the best results among the algorithms we have tested, except for Novel. The table below shows the best solutions obtained by Novel and SA. For the smaller networks, Novel has smaller training errors while SA has a smaller test error; for the larger networks, Novel is much better than SA, and both found global minima of the training set.

Table
The performance of the best networks with 3, 4, 5, and 6 hidden units, which were trained by NOVEL and by simulated annealing (SA).

No. of hidden units | SSE of training (NOVEL / SA) | % correct of training (NOVEL / SA) | % correct of testing (NOVEL / SA)

Another figure shows the 2-D classification graphs for the two-spiral problem obtained by the best solutions of Novel and SIMANN. Novel's results are in the upper row and SIMANN's in the lower row. The two graphs in the first column are classification graphs by the 3-hidden-unit networks. The graphs in the second, third, and fourth columns are by the 4-, 5-, and 6-hidden-unit networks, respectively. Training patterns are superimposed on these graphs, where circles belong to one spiral and crosses belong to the other. When all the circles are in the black region and all the crosses are in the white region, the classification is 100 percent correct. As shown in the graphs, the solutions found by Novel have smooth and curved classification regions, whereas the results of SIMANN show ragged classifications.

Novel has found one optimal solution for the smaller networks and many optimal solutions for the larger networks, with different test performance; their correct percentages on the test set cover a wide range. This means that as the network size increases, more solutions exist that solve the training set optimally. Also, as the number of hidden units is increased, the best generalization correctness increases.

The experimental results show that a learning algorithm's performance depends on the complexity of the error function. When the error function is not complex, as in optimizing the weights of a small network, Novel, as well as other algorithms like SA, can find a good minimum in a small amount of time. When the error function is complex and good solutions are few, Novel performs much better than the other algorithms. When the terrain of the optimized function is rugged, Novel has the ability to search continuously and can identify deep and narrow valleys. Further, Novel is able to descend precisely to the bottom of local minima because it formulates gradient descent as a system of ODEs and solves them accurately.

Experimental Results on Other Benchmarks

In this section, we show our results in applying Novel to four more benchmark problems: the sonar, vowel-recognition, parity, and NetTalk problems. They represent classification problems of different complexity and characteristics. The network topologies used in these experiments are multi-layer perceptrons (layered feed-forward networks without shortcuts). The objective function is the mean squared error (MSE). Other setups are similar to those for the two-spiral problem.

Neural networks for solving these problems are larger than the ones used for the two-spiral problem and have more weights (variables). When the number of variables is large, it becomes slow for LSODE to solve the system of ODEs in Novel. Therefore, we apply the Euler method to solve the ODEs of the global search. Local search is performed by local optimization methods with fast convergence speed, such as the truncated Newton's method (TN) and the back-propagation method (BP).

For the sonar problem, we have applied Novel with the Euler method, TN-MS, SIMANN, and BP. Our results comply with what was found by Dixon: TN ran much faster than epoch-wise BP and achieved comparable solutions. SIMANN is one order of magnitude slower than TN-MS and Novel, and its results are not better. For these reasons, we describe only the results for TN-MS and for Novel using the Euler method; TN is used in the local-search phase of Novel.

We have used networks with several numbers of hidden units to solve the sonar problem. The table below shows the best solutions of TN-MS and Novel that achieve the highest percentage of correctness on the sonar problem's test patterns. Our results show that Novel improves the test accuracy.

All the results in the table below were run under similar conditions and time limits. In particular, Novel always started from the origin and searched in a fixed range for each variable, using combinations of sigmoid gains and of the trace and gradient coefficients drawn from small candidate sets. TN-MS was run using different combinations of random initial points in several search ranges and the same sigmoid gains as in Novel. In TN-MS-Novel, Novel always started from the best result of TN-MS, using the same sigmoid gain with which TN-MS was run. In solving the NetTalk problem, the sigmoid gain was fixed; Novel used two learning rates and a momentum term, and back propagation generated its initial point in a fixed range, using a momentum term and learning rates from a small candidate set.

We attribute Novel's superiority in finding better local minima to its global-search stage. Since the function searched is rugged and the regions containing good solutions are not large, it is important to avoid probing from many random starting points and to identify good basins before committing expensive local descents. In contrast, many descents performed by multi-start algorithms are performed in unpromising regions. This is exemplified by the behavior of TN-MS, which gave no improvement when we extended the number of restarts. However, multi-start algorithms may provide good starting points for Novel.

For the vowel-recognition problem, we have used networks with different numbers of hidden units. The table below shows that Novel improves training compared to TN-MS but performs slightly worse in testing when there are two hidden units. TN-MS-Novel also improved training when compared with TN-MS. The sonar and vowel-recognition problems are not as hard as the two-spiral problem studied in the last section; TN-MS, for instance, can find good solutions readily. Therefore, it is more difficult for global-search methods to find better solutions. In spite of this, Novel can still improve the designs in a small amount of time.

On the parity problem, we use neural networks with several numbers of hidden units to compare the solution quality of both algorithms. Using a setup similar to that described for the sonar problem, Novel improves the learning results obtained by TN-MS.

Table
Comparison of the best results obtained by NOVEL and by the truncated Newton's algorithm with multi-starts (TN-MS) for solving four benchmark problems, where the parameters in one method that obtains the best result may be different from those of another method. Results in bold font are better than or equal to the results obtained by TN-MS.

Columns: problem; number of hidden units (HU); number of weights (Wts); TN-MS results (% correct in training, % correct in testing, number of restarts); NOVEL results (% correct in training, % correct in testing, time units); CPU-time limits (seconds or hours). The rows cover the sonar, vowel, and parity problems (test results are not applicable for parity), the NetTalk problem (where BP replaces TN-MS as the baseline), and the corresponding TN-MS-NOVEL and BP-NOVEL combinations.

In the last application, we have studied the NetTalk problem. Since the numbers of weights and training patterns are very large, we have used pattern-wise learning when applying BP, as in the original experiments by Sejnowski and Rosenberg. By experimenting with different parameter settings of BP, we obtained its best learning result on one particular hidden-unit network.

NetTalk's large number of weights precluded using any method other than the pattern-wise mode in the global-search phase and pattern-wise back propagation in the local-search phase. Even so, very few logical time units could be simulated, and our designs perform better in training but sometimes worse in testing. To find better designs, we took the best designs obtained by pattern-wise BP and applied Novel. The table above shows improved training results but slightly worse testing results. The poor testing results are probably due to the small number of time units for which Novel was run.

In short, Novel's training results are always better than or equal to those of TN-MS but are occasionally slightly worse in testing. This is attributed to the time constraint and to the excellence of the solutions already found by existing methods. In general, improving solutions that are close to the global minima is difficult, often requiring an exponential amount of time unless a better search method is used.

Summary

In this chapter, we have applied Novel to the supervised learning of feed-forward neural networks. The learning of weights in such networks can be treated as a nonlinear continuous minimization problem with rugged terrains. Our goal is to find neural networks with a small number of weights while avoiding the overfitting of weights in large networks. Our reasoning is that there are many good local minima in the error space of large networks, hence an increased chance of finding a good local minimum that nevertheless does not generalize well to test patterns.

We have identified two crucial features of suitable algorithms for solving these problems.

First, use gradient information to descend into local minima. Many algorithms have difficulties when the surface is flat, when gradients vary over a large range, or when the terrain is rugged. In the second case, the search may progress too slowly when the gradient is small and may overshoot when the gradient is large.

Second, escape from a local minimum once the search gets there. Such mechanisms can be classified into probabilistic and deterministic ones. The suitability of a specific strategy is usually problem dependent.

Different algorithms perform different trade-offs between local search and global search. Algorithms that focus on either extreme do not work well. These include gradient descent with multi-starts, such as back-propagation and cascade correlation, which focus too much on local search, and covering methods, which focus too much on global search. A good algorithm generally combines global and local searches, switching from one to the other dynamically depending on the run-time information obtained. Such algorithms include those based on simulated annealing, evolution, and clustering.

Novel has a global-search phase that relies on two counteracting forces: local gradient information that drives the search to a local minimum, and a deterministic trace that leads the search out of a local minimum once it gets there. The result is an efficient method that identifies good basins without spending a lot of time in them. Good starting points identified in the global-search phase are used in the local-search phase, in which pure gradient descents are applied.
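A minimal sketch of this two-force trajectory, assuming a weighted-sum form dx/dt = -w_grad * grad(x) + w_trace * (T(t) - x); the weights, step size, and trace function below are illustrative stand-ins for Novel's actual parameters.

```python
import math

def novel_trajectory(grad, trace, x0, w_grad=1.0, w_trace=1.0,
                     step=0.01, n_steps=5000):
    """Euler integration of dx/dt = -w_grad*grad(x) + w_trace*(trace(t) - x).

    The gradient term pulls the trajectory into the nearest local
    minimum, while the attraction toward the moving external trace
    pulls it out again, so the search never restarts from scratch.
    """
    x = list(x0)
    path = [list(x)]
    for k in range(n_steps):
        t = k * step
        g = grad(x)
        tr = trace(t)
        x = [xi - step * (w_grad * gi - w_trace * (ti - xi))
             for xi, gi, ti in zip(x, g, tr)]
        path.append(list(x))
    return path

# Example: a slowly moving sinusoidal trace in one dimension,
# with f(x) = x^2 as the (local) objective.
path = novel_trajectory(lambda x: [2 * x[0]], lambda t: [math.sin(t)], [1.0])
```

Setting `w_trace = 0` recovers plain gradient descent, while a nonzero trace weight keeps the trajectory moving through the space.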

We have shown improved performance of Novel on five neural-network benchmark problems as compared to that of existing minimization and neural-network learning algorithms. For the two-spiral problem, we have shown a design with near-perfect classification using four hidden units and a small number of weights, while the best design known today requires nine hidden units and many more weights.

CONSTRAINED OPTIMIZATION IN QMF FILTER-BANK DESIGN

In this chapter, we apply Novel to design quadrature-mirror-filter (QMF) filter banks. We formulate the design problem as a nonlinear constrained optimization problem, using the reconstruction error as the objective and the other performance metrics as constraints. This formulation allows us to search for designs that improve over the best existing designs. We derive closed-form formulas for some performance metrics and apply numerical methods to evaluate the values of other performance metrics that do not have closed-form formulas. Novel uses the adaptive Lagrangian method to find saddle points of the constrained optimization problems and relies on the trace-based global search to escape from local minima. In our experiments, we show that Novel finds better designs than the best existing solutions and improves on the performance of other global search methods, including simulated annealing and genetic algorithms. We also study the trade-offs among multiple design objectives and show that relaxing the constraints on transition bandwidth and stopband energy leads to significant improvements in the other performance measures.

Two-Channel Filter Banks

The design of digital filter banks is important because improvements can have a significant impact in many engineering fields. For example, filter banks have been applied in modems, data transmission, digital audio broadcasting, speech and audio coding, and image and video coding.


Figure: The structure of a two-channel filter bank.

Digital filter banks divide an input signal into multiple subbands to be processed. The simplest filter bank is the two-channel filter bank, which divides an input signal into two subbands. It is the starting point for studying more complex multi-channel filter banks. The figure above shows the typical structure of a two-channel filter bank, where x(n) is the input signal in the time domain and x̂(n) is the output signal. The filter bank consists of four filters: H_0(z), H_1(z), G_0(z), and G_1(z). The two filters in the upper channel, H_0(z) and G_0(z), are low-pass filters, while the two filters in the lower channel, H_1(z) and G_1(z), are high-pass filters.

As shown in the figure, the filter bank is divided into two parts: an analysis stage and a synthesis stage. In the analysis stage, the input spectrum X(z) (or X(ω)), which is the Z-transform (or Fourier transform) of x(n), is divided into two subbands, one high-frequency band and one low-frequency band, by a pair of analysis filters H_0(z) and H_1(z). Then, according to the Nyquist theorem, the subband signals are down-sampled by 2 to produce the outputs of the analysis stage, v_0(n) and v_1(n). These output signals can be analyzed and processed in various ways according to the application. For example, in a subband coder for audio or image compression, these signals are quantized and encoded before being transmitted to the receiver, which contains the synthesis part.

When no error occurs between the analysis stage and the synthesis stage, the input signals of the synthesis stage are the same as the outputs of the analysis stage. In the synthesis stage, the subband signals are up-sampled by 2 to produce f_0(n) and f_1(n), which are then passed through the synthesis filters G_0(z) and G_1(z), which perform interpolation. In the end, the subband signals are added together to produce the reconstructed signal x̂(n) at the output.

In short, the output signal x̂(n) is a function of the input signal x(n) and of the filters H_0(z), H_1(z), G_0(z), and G_1(z) in the filter bank. Following the top branch in the figure, with Y_0(z) denoting the output of the analysis filter and Ŷ_0(z) the output of the synthesis filter, we have the following equations for the analysis and synthesis filters:

Y_0(z) = H_0(z) X(z),
Ŷ_0(z) = G_0(z) F_0(z).

We have the following equations for the down-sampling and up-sampling:

V_0(z) = (1/2) [ Y_0(z^{1/2}) + Y_0(-z^{1/2}) ],
F_0(z) = V_0(z^2).

Substituting the down-sampling and up-sampling relations into the filtering equations, we obtain the output of the top branch as

Ŷ_0(z) = (1/2) G_0(z) [ H_0(z) X(z) + H_0(-z) X(-z) ].

Similarly, for the lower branch in the figure, we can derive

Ŷ_1(z) = (1/2) G_1(z) [ H_1(z) X(z) + H_1(-z) X(-z) ].

The outputs of the upper and lower branches are added together to produce the reconstructed output signal:

X̂(z) = (1/2) [ G_0(z) H_0(z) + G_1(z) H_1(z) ] X(z) + (1/2) [ G_0(z) H_0(-z) + G_1(z) H_1(-z) ] X(-z)
     = T(z) X(z) + S(z) X(-z),

where T(e^{jω}) = |T(e^{jω})| e^{jθ(ω)} and S(e^{jω}) = |S(e^{jω})| e^{jφ(ω)}.
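These relations can be checked numerically for candidate filters. The sketch below evaluates T(e^{jω}) and S(e^{jω}) on the unit circle; the two-tap Haar-like prototype is an illustrative example, not one of the designed filters.

```python
import cmath
import math

def freq_resp(h, z):
    """Evaluate H(z) = sum_k h[k] * z^{-k} at a point z on the unit circle."""
    return sum(hk * z ** (-k) for k, hk in enumerate(h))

def transfer_and_alias(h0, h1, g0, g1, w):
    """Return T(e^{jw}) and S(e^{jw}) for a two-channel filter bank."""
    z = cmath.exp(1j * w)
    T = 0.5 * (freq_resp(g0, z) * freq_resp(h0, z)
               + freq_resp(g1, z) * freq_resp(h1, z))
    S = 0.5 * (freq_resp(g0, z) * freq_resp(h0, -z)
               + freq_resp(g1, z) * freq_resp(h1, -z))
    return T, S

# QMF choice: h1[k] = (-1)^k * h0[k] (i.e., H1(z) = H0(-z)),
# g0 = h0, g1 = -h1.  With the two-tap Haar prototype the bank is
# perfectly reconstructing, so S vanishes and |T| is constant.
h0 = [1 / math.sqrt(2), 1 / math.sqrt(2)]
h1 = [h0[k] * (-1) ** k for k in range(len(h0))]
g0, g1 = h0, [-c for c in h1]
```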

In the reconstructed signal X̂(z), the filter bank may introduce three types of distortions. (a) Aliasing distortion: the subsampling in the filter bank generates aliasing, and the up-sampling produces images; the term S(z)X(-z) above is called the aliasing term. (b) Amplitude distortion: the deviation of |T(z)| from unity constitutes the amplitude distortion. (c) Phase distortion: the deviation of θ(ω) from the desired phase properties, such as linear phase, is called the phase distortion. These distortions are undesirable in many filter-bank applications. For example, in digital image/video compression applications, amplitude distortions change image color and brightness, whereas phase distortions introduce unpleasant visual effects.

Many methods have been developed to minimize the undesired distortions of filter banks. For instance, aliasing distortions can be removed by enforcing appropriate relationships between the synthesis filters and the analysis filters; magnitude distortions can be removed by using infinite impulse response (IIR) filters; and the linear-phase property can be achieved by using linear-phase finite impulse response (FIR) filters.

When all the distortions are removed, the original signal is said to be reconstructed perfectly. Based on the equations above, perfect reconstruction of the original signal requires S(z) = 0 and T(z) = c z^{-n0} for all z, where n0 is a constant delay and c a constant. In this case, the transfer function of the filter bank is just a pure delay.

Let us rewrite the transfer function using the Fourier transform as follows:

X̂(ω) = (1/2) [ G_0(ω) H_0(ω) + G_1(ω) H_1(ω) ] X(ω) + (1/2) [ G_0(ω) H_0(ω+π) + G_1(ω) H_1(ω+π) ] X(ω+π)
     = T(ω) X(ω) + S(ω) X(ω+π).

In quadrature-mirror-filter (QMF) filter banks, the filters are chosen to satisfy the following conditions:

G_0(ω) = H_0(ω),
G_1(ω) = -H_1(ω),
H_1(e^{jω}) = H_0(e^{j(ω+π)}).

Thus the pairs of filters in the analysis and the synthesis stages (H_0(z) and H_1(z); G_0(z) and G_1(z)) are mirror filters, respectively. With this particular selection of filters, the reconstructed output becomes

X̂(ω) = (1/2) [ H_0(ω) H_0(ω) - H_1(ω) H_1(ω) ] X(ω) + (1/2) [ H_0(ω) H_0(ω+π) - H_0(ω+π) H_0(ω) ] X(ω+π)
     = (1/2) [ H_0²(ω) - H_0²(ω+π) ] X(ω)
     = T(ω) X(ω).

Note that the aliasing term disappears, and the aliasing cancellation is exact, independent of the choice of the function H_0(ω). Now the function T(ω) is a function only of the filter H_0(ω), which is called the prototype filter of the QMF filter bank. The design problem is thus reduced to finding a prototype filter H(ω) = H_0(ω) such that T(ω) is a pure delay.

In an FIR QMF filter bank, when the prototype filter H(ω) is a symmetric low-pass FIR filter, H(ω), as well as the function T(ω), has linear phase. In this case, designing a perfect-reconstruction linear-phase filter bank entails making the amplitude of T(ω) equal to 1, e.g.,

|H(ω)|² + |H(ω+π)|² = 1,

where we have ignored the constant coefficient in T(ω).

The figure below shows how the frequency responses of the filters in a QMF filter bank are added together to form the overall amplitude response.

It can be shown that once the filters are chosen in this way, it is impossible to obtain perfect reconstruction of the original signal using FIR filters, except for the trivial two-tap filter case. This means that, in general, the reconstruction error (the shaded area in the right graph of the figure),

E_r = ∫₀^π ( |H(ω)|² + |H(ω+π)|² - 1 )² dω,

is not exactly zero. However, by numerically minimizing the reconstruction error, filter banks of high quality can be designed.

To summarize, FIR QMF filter banks use mirror filters in the analysis and the synthesis stages to obtain the nice properties of linear phase and no aliasing distortion. The design

Figure
The prototype low-pass filter and its mirror filter form the reconstruction error E_r of the QMF filter bank.

of FIR QMF filter banks becomes designing a prototype FIR low-pass filter to minimize the reconstruction error, i.e., the amplitude distortion. Although in general these QMF filter banks cannot achieve perfect reconstruction, high-quality filter banks with very small reconstruction errors can be found numerically.

Existing Methods in QMF Filter-Bank Design

Generally speaking, the design objectives of filter banks consist of two parts: the response of the overall filter bank and the response of each individual filter. The performance metrics of the overall filter bank include amplitude distortion, aliasing distortion, and phase distortion. The performance metrics of a single filter vary from one type of filter to another. Filter banks can be composed of either finite impulse response (FIR) filters or infinite impulse response (IIR) filters. The figure below shows an illustration of the performance metrics of a single low-pass filter. The five performance metrics of a filter, passband ripple δ_p, stopband ripple δ_s, passband flatness E_p, stopband energy E_s, and transition bandwidth T_t, are calculated based on the difference between the frequency response of the filter and that of the ideal low-pass filter, which is a step function.
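Given the magnitude response of a candidate filter, the five metrics can be approximated on a frequency grid, as in this illustrative sketch (the grid resolution and band edges are assumptions):

```python
import math

def lowpass_metrics(mag, wp, ws):
    """Approximate the five metrics of a low-pass filter.

    `mag` maps a frequency in [0, pi] to |H(w)|; the passband is
    [0, wp] and the stopband is [ws, pi].  Integrals are approximated
    with a simple Riemann sum over 1000 grid points.
    """
    n = 1000
    dw = math.pi / n
    grid = [i * dw for i in range(n + 1)]
    pb = [w for w in grid if w <= wp]   # passband frequencies
    sb = [w for w in grid if w >= ws]   # stopband frequencies
    return {
        "passband_ripple": max(abs(1 - mag(w)) for w in pb),
        "stopband_ripple": max(mag(w) for w in sb),
        "passband_flatness": sum((mag(w) - 1) ** 2 for w in pb) * dw,
        "stopband_energy": sum(mag(w) ** 2 for w in sb) * dw,
        "transition_bandwidth": ws - wp,
    }

# Example: the ideal low-pass response (a step at pi/2) has zero ripple
# and zero stopband energy.
ideal = lowpass_metrics(lambda w: 1.0 if w < 0.5 * math.pi else 0.0,
                        wp=0.4 * math.pi, ws=0.6 * math.pi)
```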

The table below summarizes the possible design objectives of a filter bank. The performance metrics of a single filter are given for a low-pass filter. Because the design objectives are generally


Figure: An illustration of the performance metrics of a single low-pass filter.

Table
Possible design objectives of filter banks. Refer to the equations and the figure above for explanation.

Overall  | Minimize amplitude distortion:   E_r = ∫₀^π ( |T(ω)| - 1 )² dω
Filter   | Minimize aliasing distortion:    E_a = ∫₀^π |S(ω)|² dω
Bank     | Minimize phase distortion:       ∫₀^π | θ(ω) - θ_d(ω) | dω

Single   | Minimize stopband ripple:        δ_s = max over ω_s ≤ ω ≤ π of |H(ω)|
Filter   | Minimize passband ripple:        δ_p = max over 0 ≤ ω ≤ ω_p of | 1 - |H(ω)| |
         | Minimize transition bandwidth:   T_t = ω_s - ω_p
         | Minimize stopband energy:        E_s = ∫ from ω_s to π of |H(ω)|² dω
         | Maximize passband flatness:      E_p = ∫₀^{ω_p} ( |H(ω)| - 1 )² dω

(Passband: 0 ≤ ω ≤ ω_p; stopband: ω_s ≤ ω ≤ π; transition band: ω_p < ω < ω_s. θ_d(ω) is the desired linear phase.)

nonlinear continuous functions of the filter coefficients, filter-bank design problems are multi-objective continuous nonlinear optimization problems.

Algorithms for designing filter banks can be classified into optimization-based and non-optimization-based ones. In optimization-based methods, a design problem is formulated as a multi-objective nonlinear optimization problem whose form may be application and filter dependent. The problem is then converted into a single-objective optimization problem and solved by existing optimization methods, such as gradient-descent, Lagrange-multiplier, quasi-Newton, simulated-annealing, and genetics-based methods. Filter-bank design problems have also been solved by non-optimization-based algorithms, which include spectral factorization and heuristic methods. These methods generally do not continue to find better designs once a suboptimal design has been found.

The design of filter banks involves complex nonlinear optimization with the following features. First, the objectives are not unique and may be conflicting, leading to designs with different trade-offs; to design filter banks systematically, the multiple objectives are integrated into a single formulation. Second, some design objectives and their derivatives are not in closed form and need to be evaluated using numerical methods. Third, the design objectives are nonlinear.

Optimization Formulations of QMF Filter-Bank Design

There are two approaches to the optimization-based design of QMF filter banks: multi-objective unconstrained optimization and single-objective constrained optimization. In the multi-objective approach, the goals are (1) to minimize the amplitude distortion (reconstruction error) of the overall filter bank, and (2) to maximize the performance of the individual prototype filter H(ω).

A possible formulation is to optimize the design with respect to a subset of the measures defined in the table above. For example, if we want to minimize the reconstruction error of the filter bank and the stopband energy of an n-tap FIR prototype filter with coefficients x = (x_0, x_1, ..., x_{n-1}), the optimization problem is

min over x in R^n of E_r(x) and E_s(x),

where  E_r(x) = ∫₀^π ( |H(ω; x)|² + |H(ω+π; x)|² - 1 )² dω,
       E_s(x) = ∫ from ω_s to π of |H(ω; x)|² dω.
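Both integrals can be approximated numerically when no closed form is convenient; the following sketch uses a simple Riemann sum (the grid size and the example prototype are illustrative assumptions, not the filters designed in this chapter).

```python
import cmath
import math

def H(x, w):
    """Frequency response H(e^{jw}) of an FIR filter with taps x."""
    return sum(xk * cmath.exp(-1j * w * k) for k, xk in enumerate(x))

def reconstruction_error(x, n=512):
    """E_r(x) ~ integral over [0, pi] of (|H(w)|^2 + |H(w+pi)|^2 - 1)^2."""
    dw = math.pi / n
    return sum((abs(H(x, i * dw)) ** 2
                + abs(H(x, i * dw + math.pi)) ** 2 - 1) ** 2
               for i in range(n)) * dw

def stopband_energy(x, ws, n=512):
    """E_s(x) ~ integral over [ws, pi] of |H(w)|^2."""
    dw = (math.pi - ws) / n
    return sum(abs(H(x, ws + i * dw)) ** 2 for i in range(n)) * dw

# The two-tap prototype [0.5, 0.5] satisfies the power-complementary
# condition |H(w)|^2 + |H(w+pi)|^2 = 1 exactly, so its reconstruction
# error is numerically zero; its stopband energy is not.
haar = [0.5, 0.5]
```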

Unfortunately, optimal solutions to the simplified optimization problem are not necessarily optimal solutions to the original problem. Oftentimes, performance measures not included in the formulation are compromised. For example, when E_r and E_s are the objectives to be minimized, the solution that gives the smallest E_r and E_s will probably have a large transition bandwidth.

In general, the optimal solutions of a multi-objective problem form a Pareto optimal frontier, such that no solution on this frontier is dominated by another. One approach to finding a point on the Pareto frontier is to optimize a weighted sum of all the objectives. This approach has difficulty when Pareto frontier points with certain characteristics are desired, such as those with a certain transition bandwidth. Different combinations of weights must be tested by trial and error until a desired filter is found. When the desired characteristics are difficult to satisfy, trial and error is not effective in finding feasible designs. In this case, a constrained formulation should be used instead.

Another approach to solve a multiobjective problem is to turn all the objectives except one into constraints. In this formulation, the constraints are defined with respect to a reference design. The specific measures constrained may be application- and filter-dependent.

Constraint-based methods have been applied to design QMF filter banks in both the frequency and the time domains. In the frequency domain, the most often considered objectives are the reconstruction error E_r and the stopband ripple delta_s. As stopband ripples cannot be formulated in closed form, stopband attenuation E_s is usually used instead. In the time domain, Nayebi gave a time-domain formulation with constraints in the frequency domain and designed filter banks using an iterative time-domain design algorithm.

In this thesis, we formulate the design of a QMF filter bank in the most general form as a constrained nonlinear optimization problem:

minimize    E_r(x)
subject to  E_p(x) <= E'_p,       E_s(x) <= E'_s,
            delta_p(x) <= delta'_p,   delta_s(x) <= delta'_s,
            T_t(x) <= T'_t

where E'_r, E'_p, E'_s, delta'_p, delta'_s, and T'_t are the performance values of the baseline design, with possibly some constraint values relaxed or tightened in order to obtain designs of different trade-offs. Reconstruction error E_r(x) is the objective to be minimized, and all other metrics of the single prototype filter are used as constraints. This formulation allows us to improve on the best existing designs, such as the designs reported by Johnston, with respect to all the performance metrics. In contrast, existing methods generally optimize a subset of the performance metrics in constrained form, or a weighted sum of the metrics. Since the objective and the constraints are nonlinear, the optimization problem is generally multimodal with many local minima.

We have applied our Novel global search method to design better QMF filter banks. In the rest of this chapter, we first present our method for evaluating the performance metrics and their corresponding derivatives. Then, in the experimental results, we show the improvement of Novel on existing designs and compare the performance of Novel with those of other global search methods.

Evaluations of Performance Metrics

In QMF filter-bank design problems, some functions, such as reconstruction error E_r, stopband energy E_s, and passband energy E_p, have closed-form formulas. However, other functions, including stopband ripple delta_s, passband ripple delta_p, and transition bandwidth T_t, do not have closed-form formulas and have to be evaluated numerically.

Let x = (x_0, x_1, ..., x_{N-1}) be the parameters of an N-tap prototype filter, where x is symmetric, i.e., x_i = x_{N-1-i}. The frequency response of the prototype filter is obtained by taking a Fourier transform of x:

F(x, omega) = Sum_{n=0}^{N-1} x_n e^{-j omega n}
            = e^{-j ((N-1)/2) omega} 2 Sum_{n=0}^{N/2-1} x_n Cos( (n - (N-1)/2) omega )

From this equation, the phase response is a linear function of omega, and the amplitude response is

h(x, omega) = 2 Sum_{n=0}^{N/2-1} x_n Cos( (n - (N-1)/2) omega )

Hence, the performance metrics listed in Table are functions of h(x, omega) as follows:

E_r(x) = Int_0^pi ( h^2(x, omega) + h^2(x, pi - omega) - 1 )^2 d omega

E_s(x) = Int_{omega_s(x)}^pi h^2(x, omega) d omega

E_p(x) = Int_0^{omega_p(x)} ( h(x, omega) - 1 )^2 d omega

delta_s(x) = max_{omega_s(x) <= omega <= pi} | h(x, omega) |

delta_p(x) = max_{0 <= omega <= omega_p(x)} | h(x, omega) - 1 |

T_t(x) = omega_s(x) - omega_p(x)

The calculation of h(x, omega) involves a series of summations. When the summation of large numbers results in a small number, the numerical cancellation errors can be significant. This happens in computing h(x, omega) when the prototype filter is close to the ideal low-pass filter. It is desirable to make the cancellation error small.

S = a_1
C = 0
for j = 2 to k {
    Y = a_j - C
    T = S + Y
    C = (T - S) - Y
    S = T
}

Figure: Kahan's formula for the summation Sum_{j=1}^{k} a_j.

We have used two numerical methods to do the summation. One is to add the numbers from small to large. For the summation Sum_{j=1}^{k} a_j, the numerical solution of this method is

Sum_{j=1}^{k} a_j (1 + delta_j)

where |delta_j| <= (k - j + 1) eps and eps is the machine precision. The numerical error is proportional to the number of terms in the summation.

A better way is to use Kahan's summation formula, as shown in the figure, in which extra variables are introduced to maintain higher precision. Its solution of Sum_{j=1}^{k} a_j is

Sum_{j=1}^{k} a_j (1 + delta_j) + O(N eps^2) Sum_{j=1}^{k} |a_j|

where |delta_j| <= 2 eps. Since machine precision eps is very small for double-precision floating-point representations, the second term in the sum is usually small. Further, since |delta_j| here is smaller than the bound (k - j + 1) eps of the previous method when k is large, the numerical error of Kahan's formula can be much smaller than that of small-to-large addition when the summation series is long.
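A direct transcription of the loop in the figure into Python (the function name kahan_sum is ours) makes the effect easy to observe:

```python
def kahan_sum(a):
    """Kahan's compensated summation of the sequence a.
    S accumulates the running sum; C accumulates the low-order bits
    that plain floating-point addition would discard."""
    s = a[0]
    c = 0.0
    for item in a[1:]:
        y = item - c        # re-inject previously lost low-order bits
        t = s + y           # low-order bits of y may be lost here
        c = (t - s) - y     # algebraically zero; numerically, the lost bits
        s = t
    return s
```

Summing one million tiny terms onto a large one, plain left-to-right addition loses the tiny terms entirely, while the compensated loop keeps their contribution.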

Table: Lower and upper bounds for calculating the reconstruction error E_r. Each of the seven innermost summations in the closed-form formula has a lower bound of the form Max(...) and an upper bound of the form Min(...), expressed in terms of N, n, i, and the integer k.

Closed-Form Formulas of E_r(x), E_s(x), and E_p(x)

As shown in the equations above, E_r(x), E_s(x), and E_p(x) are expressed as integrations of h(x, omega). In this section, we present their closed-form formulas. Please refer to the references for more detailed derivations.

By substituting the amplitude response into the definition of the reconstruction error, we obtain a closed-form formula for E_r(x): a quartic polynomial in the filter coefficients, consisting of triple summations of fourth-order product terms of the form x_n x_i x_m x_{n-m+i} (with additional index shifts involving N), plus a term proportional to ( Sum_{n=0}^{N/2-1} x_n^2 )^2, where k is an integer bounded by N, and the lower and upper bounds lb(.) and ub(.) for the seven innermost summations are shown in the table.

The formula for the stopband energy E_s(x) is obtained by substituting the amplitude response into its definition:

E_s(x) = 2 Sum_{n=0}^{N/2-1} x_n^2 [ (pi - omega_s) - Sin((N-1-2n) omega_s) / (N-1-2n) ]
       - 2 Sum_{n=0}^{N/2-1} Sum_{m=0, m != n}^{N/2-1} x_n x_m [ Sin((n-m) omega_s) / (n-m) + Sin((N-1-n-m) omega_s) / (N-1-n-m) ]

Similarly, by substituting the amplitude response into its definition, we obtain the formula for the passband energy E_p(x):

E_p(x) = 2 Sum_{n=0}^{N/2-1} x_n^2 [ omega_p + Sin((N-1-2n) omega_p) / (N-1-2n) ]
       + 2 Sum_{n=0}^{N/2-1} Sum_{m=0, m != n}^{N/2-1} x_n x_m [ Sin((n-m) omega_p) / (n-m) + Sin((N-1-n-m) omega_p) / (N-1-n-m) ]
       - 8 Sum_{n=0}^{N/2-1} x_n Sin( ((N-1-2n)/2) omega_p ) / (N-1-2n)
       + omega_p

Numerical Evaluation of delta_s(x), delta_p(x), and T_t(x)

Although we do not have closed-form formulas for the other three performance metrics of a low-pass prototype filter (stopband ripple, passband ripple, and transition bandwidth), we can evaluate their values using numerical methods.

We find the stopband ripple delta_s(x) in the stopband frequency range based on the frequency response of the prototype filter x. First, we uniformly sample the frequency range. Frequency brackets that contain all the ripples are found based on the sample points. Within each bracket, we apply Newton's method to precisely locate the ripple. When Newton's method does not converge after a fixed number of iterations, we use golden search to reduce the bracket and restart Newton's method. Among all the ripples, the largest one is the stopband ripple delta_s(x). In a similar way, the passband ripple delta_p(x) is found in the passband frequency range.

It is desirable to compute the ripples fast. Newton's method has superlinear convergence speed when the initial point is close to the convergence point. However, Newton's method may have slow convergence, or even diverge, in other cases. Golden search uses a bracket to safeguard the search: it is guaranteed to find the local minimum inside a bracket, but has only linear convergence speed. By combining Newton's method and golden search, we get fast and guaranteed convergence.
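The combination described above can be sketched in Python as follows. This is a simplified, generic version for a single bracket, with our own function names; it is not the exact procedure used in Novel. Newton's method is applied to the derivative of the response, and golden-section search serves as the safeguarded fallback.

```python
import math

def golden_section_max(f, a, b, tol=1e-10):
    """Golden-section search for a maximum of a unimodal f on [a, b]:
    linear convergence, but guaranteed to stay inside the bracket."""
    g = (math.sqrt(5) - 1) / 2
    while b - a > tol:
        c = b - g * (b - a)
        d = a + g * (b - a)
        if f(c) > f(d):
            b = d               # maximum lies in [a, d]
        else:
            a = c               # maximum lies in [c, b]
    return (a + b) / 2

def ripple_peak(f, df, d2f, lo, hi, tol=1e-12, max_iter=20):
    """Locate an extremum of f in [lo, hi]: Newton's method on f',
    falling back to golden-section search when Newton leaves the
    bracket or fails to converge within max_iter iterations."""
    w = (lo + hi) / 2
    for _ in range(max_iter):
        w_new = w - df(w) / d2f(w)
        if not (lo <= w_new <= hi):     # Newton left the bracket
            return golden_section_max(f, lo, hi)
        if abs(w_new - w) < tol:
            return w_new
        w = w_new
    return golden_section_max(f, lo, hi)
```

In the design code, f would be |h(x, .)| over a stopband bracket or |h(x, .) - 1| over a passband bracket.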

The transition bandwidth T_t(x) = omega_s(x) - omega_p(x) is found based on the passband cutoff frequency omega_p(x) and the stopband cutoff frequency omega_s(x). omega_p(x) and omega_s(x) are calculated based on the passband and stopband ripples delta_p(x) and delta_s(x), respectively. Based on the frequency response of the prototype filter x, omega_p(x) is the first omega value at which h(x, omega) = 1 - delta_p(x). We find omega_p(x) in two steps. First, a bracket containing omega_p(x) is obtained based on sampling. Then, a combination of Newton's method and bisection search is used to find omega_p(x) within a predefined numerical error. A similar method is used to find the stopband cutoff frequency omega_s(x), which is the first omega value at which h(x, omega) = delta_s(x).

Evaluations of the First-Order Derivatives of Performance Metrics

For the performance metrics with closed-form formulas, which include reconstruction error E_r(x), stopband energy E_s(x), and passband energy E_p(x), please refer to the references for detailed derivations of the corresponding first-order derivatives:

dE_s(x)/dx_i = 4 x_i [ (pi - omega_s) - Sin((N-1-2i) omega_s) / (N-1-2i) ]
             - 4 Sum_{n=0, n != i}^{N/2-1} x_n [ Sin((n-i) omega_s) / (n-i) + Sin((N-1-n-i) omega_s) / (N-1-n-i) ]
             - h^2(x, omega_s(x)) d omega_s(x)/dx_i

dE_p(x)/dx_i = 4 x_i [ omega_p + Sin((N-1-2i) omega_p) / (N-1-2i) ]
             + 4 Sum_{n=0, n != i}^{N/2-1} x_n [ Sin((n-i) omega_p) / (n-i) + Sin((N-1-n-i) omega_p) / (N-1-n-i) ]
             - 8 Sin( ((N-1-2i)/2) omega_p ) / (N-1-2i)
             + ( h(x, omega_p(x)) - 1 )^2 d omega_p(x)/dx_i

and dE_r(x)/dx_i, which is a cubic polynomial in the filter coefficients obtained by differentiating the quartic closed-form formula of E_r(x): it consists of triple summations of third-order product terms of the form x_n x_m x_{n-m+i} (with additional index shifts involving N) over the bounds lb(.) and ub(.) of the table, plus a term proportional to x_i Sum_n x_n^2,

where 0 <= i <= N/2 - 1 and x is symmetric.

Note that dE_s(x)/dx_i and dE_p(x)/dx_i are only partly in closed form, because d omega_s(x)/dx_i and d omega_p(x)/dx_i are estimated using finite-difference methods, e.g.,

d omega_s(x)/dx_i ~ [ omega_s(x_0, ..., x_i + Dx_i, ..., x_{N-1}) - omega_s(x_0, ..., x_i, ..., x_{N-1}) ] / Dx_i

d omega_p(x)/dx_i ~ [ omega_p(x_0, ..., x_i + Dx_i, ..., x_{N-1}) - omega_p(x_0, ..., x_i, ..., x_{N-1}) ] / Dx_i

For all the performance metrics without closed-form formulas, which include stopband ripple, passband ripple, and transition bandwidth, we use finite-difference methods to approximate their derivatives d delta_s(x)/dx_i, d delta_p(x)/dx_i, and d T_t(x)/dx_i.
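A minimal sketch of such a finite-difference approximation (forward differences, with our own helper name fd_gradient and an illustrative step size) is:

```python
def fd_gradient(f, x, delta=1e-6):
    """Forward-difference approximation of the gradient of f at x.
    f maps a list of floats to a float; delta is the perturbation size."""
    f0 = f(x)
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += delta          # perturb one coordinate at a time
        grad.append((f(xp) - f0) / delta)
    return grad
```

In the design code, f would be delta_s(.), delta_p(.), or T_t(.), each evaluated by the numerical procedures above; the truncation error of the forward difference is proportional to delta.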

Experimental Results

In our experiments, we have applied Novel to solve the QMF filter-bank design problems formulated by Johnston. These include the a-, b-, c-, d-, and e-type problems of several filter lengths, where the integer in each identifier represents the number of filter taps, and the types a to e represent prototype filters with increasingly sharper (shorter) transition bands.

Our goal is to find designs that are better than Johnston's results across all six performance measures: reconstruction error, stopband energy, passband energy, stopband ripple, passband ripple, and transition bandwidth. Hence, we use the constrained optimization formulation, with the constraint bounds defined by those of Johnston's designs.

In our experiments, we improve Johnston's existing results using Novel, compare the performance of Novel with those of simulated annealing and genetic algorithms, and study the performance trade-offs of design objectives.

Performance of Novel

Novel uses the adaptive Lagrangian method presented earlier to handle nonlinear constraints. The figure compares the execution time of the adaptive Lagrangian method with that of the static-weight Lagrangian method when both methods achieve the same quality of solutions. It shows the convergence times of the static-weight Lagrangian method, normalized with respect to that of the adaptive method, in solving the d and e design problems. For the method with static weights, we have used several weight values between 10^-6 and 10^-4 in our experiments. For the method with dynamic weights, we have used the weight-initialization algorithm described earlier to set the initial weights. Our program was run on an Intel Pentium Pro computer.

Figure: Comparison of convergence speed of Lagrangian methods with and without dynamic weight control for the d (left) and e (right) QMF filter-bank design problems. The convergence times of the Lagrangian method using different static weights are normalized with respect to the time spent by the adaptive Lagrangian method.

In solving the d problem, our adaptive Lagrangian method converges to a saddle point, whereas the Lagrangian method with static weights obtains the same quality of solution with vastly different convergence speeds, taking up to many times as long as the adaptive Lagrangian method. Similarly, the dynamic method solves the e design problem quickly, but the static method takes vastly different, and longer, times for different initial weights.

The table shows the experimental results for all the design problems found using the adaptive Lagrangian method, normalized with respect to Johnston's solutions. For each problem, the adaptive Lagrangian method starts from Johnston's solution and converges to a saddle point. We use LSODE to generate the search trajectory. In the table, a value less than 1 for a performance metric means that our design is better on this metric as compared to Johnston's design. The table shows that the dynamic Lagrangian method improves the objective value (second column) while satisfying all the design constraints (remaining columns). The execution times to find the solutions differ significantly, varying from a few minutes to several days.

Table: Experimental results of the dynamic Lagrangian method in designing QMF filter banks, using Johnston's solutions as starting points. The columns list, for each filter type (a- to e-type filters of each length), the normalized E_r, E_p, delta_p, E_s, delta_s, and T_t, the CPU time in minutes, and the time unit.

Due to the nonlinear functions in the objective and the constraints, there exist local minima in the search space of QMF filter-bank design problems. Lagrangian methods only perform local searches; global-search methods are needed to overcome local minima. Novel relies on the trace-based global search to escape from local minima and to identify good starting points for local searches. The next table shows the experimental results of Novel, which uses the trace-based global search together with the adaptive Lagrangian method, in solving QMF filter-bank design problems. The trace-based global search consists of one global-search stage that starts from Johnston's solution and searches a range around Johnston's solution in each dimension. The global-search stage is run for one time unit. At regular intervals along the trace, a starting point for the local search is selected, so a fixed number of descents are performed. We use LSODE to generate both the global- and the local-search trajectories.

The table shows the results of Novel in solving the QMF design problems. A value less than 1 for a performance metric means that the design is better on this metric as compared to Johnston's design. Due to the excessive time of the Lagrangian method in solving the c

Table: Experimental results of Novel, using trace-based global search and the dynamic Lagrangian method as local search, in solving the QMF filter-bank design problems, on a Pentium Pro running Linux. The columns list, for each filter type, the normalized E_r, E_p, delta_p, E_s, delta_s, and T_t, the CPU time in hours, and the number of descents.

and d problems, we did not run Novel on them. The last column in the table shows the number of local descents, based on our Lagrangian method, performed in the corresponding CPU time.

Compared with the local-search results, Novel with global search finds better solutions for several of the problems. The reason that no better solutions are found for the other problems is attributed to the good quality of the solutions found by the Lagrangian method starting from Johnston's solutions, and to the limited execution time available.

Johnston used sampling in computing energy values, whereas Novel uses closed-form integration. Hence, designs found by Johnston are not necessarily at local minima in a continuous formulation. To demonstrate this, we applied local search in a continuous formulation of the D design, starting from Johnston's design, and found a design with a smaller reconstruction error than Johnston's result. By applying global search, Novel can reduce the reconstruction error still further.

The figure depicts our D QMF design and Johnston's design. It indicates that our design has a smoother passband response and a lower reconstruction error.

Figure: A comparison of our D QMF design and Johnston's design. The left graph shows the passband frequency response, and the right graph shows the reconstruction error.

To summarize, Novel has improved over Johnston's results on all of these filter-bank design problems. The objective, reconstruction error, has been improved for all the problems; the other performance measures are at least as good as Johnston's, with a few better. Note that other design methods generally perform trade-offs, resulting in designs that are better in one or more measures but worse in others.

Comparison with Other Global Optimization Methods

We have applied simulated annealing (SA) and evolutionary algorithms (EA) to QMF filter-bank design. The SA we have used is SIMANN from netlib, which works on the following weighted-sum formulation:

min_x f(x) = w_1 E_r(x)/E'_r + w_2 max(1, E_p(x)/E'_p) + w_3 max(1, E_s(x)/E'_s)
           + w_4 max(1, delta_p(x)/delta'_p) + w_5 max(1, delta_s(x)/delta'_s) + w_6 max(1, T_t(x)/T'_t)

Table: Experimental results of simulated annealing (SA) for solving QMF filter-bank problems, on a Pentium Pro running Linux. The cooling rates and the total numbers of evaluations used for the different filter lengths are given in the text. The columns list, for each filter type, the normalized E_r, E_p, delta_p, E_s, delta_s, and T_t, and the CPU time in hours.

where E'_r, E'_p, E'_s, delta'_p, delta'_s, and T'_t are performance values of the baseline design. In our experiments, we assign one weight, w_1, to the reconstruction error and a common weight to the other performance measures, w_i. SA uses Johnston's solutions as initial points.

We have tried various parameter settings and report the best solutions in the table. Like Novel, SA searches a range around Johnston's solution in each dimension, uses a cooling rate chosen for each filter length, and runs for a fixed total number of evaluations.

For some problems, SA finds solutions that improve Johnston's solutions on all six performance measures. However, for others, SA's solutions have a larger transition band. This is because, in a weighted-sum formulation, there is no way to force all the objectives to be smaller than certain values.

Table: Experimental results of the evolutionary algorithm for solving a constrained formulation (EA-Constr) of QMF filter-bank problems, on a Sun SparcStation. The population size of EA is proportional to the number of variables n, and the number of generations grows with the filter length. The columns list, for each filter type, the normalized E_r, E_p, delta_p, E_s, delta_s, and T_t, and the CPU time in hours.

The EA used in our experiments is Sprave's Lice (Linear Cellular Evolution) program, which can solve both constrained and weighted-sum formulations. As in Novel and SA, we have set the search range to be a fixed interval in each dimension around Johnston's solution, and we have tried various population sizes and numbers of generations.

The table reports the experimental results of EA solving the constrained formulation (EA-Constr). The fitness value of each individual is based on its feasibility and its objective value: feasible individuals have higher fitness values than infeasible ones, and among infeasible individuals, the one with a smaller constraint violation has a higher fitness value. The best solutions we obtained are reported in the table. We have used a population size proportional to the number of variables n, and ran for a fixed total number of generations for each filter length.

EA-Constr has difficulty in finding good feasible solutions. This is because the constraints based on Johnston's solutions form a tiny feasible region in the search space, and randomly generated points have little chance of being feasible.

The table reports the experimental results of EA solving the weighted-sum formulation (EA-Wt). As in the experiments with SA, we assign one weight to the reconstruction error and a common weight to the rest of the performance measures. The population size and the number of generations executed are the same as in the experiments of EA-Constr.

Except for two problems, where EA-Wt improves Johnston's solutions in all six performance measures, EA-Wt only finds solutions with trade-offs. Usually, the solutions of EA-Wt have larger transition bands than Johnston's solutions and improve in the other performance measures.

To summarize, the performance improvement of Novel comes from the following sources:

The closed-form formulation used in Novel is more accurate than the sampling method used in Johnston's approach. Local optima found by Novel are true local optima; Johnston's solutions are local optima in a discrete approximation of the design problem.

Novel uses a constrained formulation, which allows it to find designs that are guaranteed to be better than or equal to Johnston's design with respect to all performance measures.

The Lagrangian formulation is solved as a system of ordinary differential equations (ODEs). The system of ODEs precisely converges to the saddle point that corresponds to the local minimum of the original constrained optimization problem, and the ODE solver LSODE solves the equations accurately.

Novel uses the trace-based global search to overcome local minima and to find better solutions than pure local-search methods do.
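As an illustration of solving a Lagrangian system as ODEs, the following Python sketch applies a simple explicit-Euler discretization of dx/dt = -dL/dx, dlambda/dt = +dL/dlambda to a toy one-dimensional problem (minimize x^2 subject to x >= 1). The thesis uses the LSODE solver on the full design problem; this is only a schematic substitute with our own names and step size.

```python
def lagrangian_saddle(eta=0.01, iters=20000):
    """Euler discretization of the Lagrangian ODE system
        dx/dt = -dL/dx,   dlambda/dt = +dL/dlambda
    for the toy problem: minimize x^2 subject to x >= 1, written as
    g(x) = 1 - x <= 0, with L(x, lam) = x^2 + lam * (1 - x).
    The saddle point is x* = 1, lam* = 2."""
    x, lam = 0.0, 0.0
    for _ in range(iters):
        dx = 2 * x - lam            # dL/dx
        dlam = 1 - x                # dL/dlambda = constraint violation
        x -= eta * dx               # descend in x
        lam = max(0.0, lam + eta * dlam)   # ascend in lambda, kept >= 0
    return x, lam
```

The trajectory descends in the original variables while ascending in the multipliers, so it settles at the saddle point corresponding to the constrained local minimum.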

Table: Experimental results of the evolutionary algorithm for solving the weighted-sum formulation (EA-Wt) of QMF filter-bank problems, on a Sun SparcStation. The population size and the numbers of generations are the same as for EA-Constr. The columns list, for each filter type, the normalized E_r, E_p, delta_p, E_s, delta_s, and T_t, and the CPU time in hours.

Novel improves Johnston's solutions consistently. SA obtains solutions with trade-offs for most of the problems: they improve most performance measures but have worse transition bandwidths than Johnston's, which is due to the weighted-sum formulation. EA-Wt has the same difficulty in improving over Johnston's solutions across all the performance measures; in particular, the solutions of EA-Wt also have larger transition bandwidths. EA-Constr uses a direct search method to probe the search space, and when the feasible region is small, direct search methods have difficulty in generating good feasible solutions. Clearly, Novel achieves the best performance among these methods.

Performance Trade-offs

By using our constrained formulation, we can further study trade-offs in designing QMF filter banks in a controlled environment. Loosening some constraints generally leads to smaller reconstruction errors.

The figure demonstrates these trade-offs for D QMF filter banks. In our experiments, we have used Johnston's designs as our baselines. In the left graph, the constraints on stopband ripple and energy are relaxed by up to 50% from Johnston's solution. In the right graph, the constraint on transition bandwidth is relaxed by up to 50% from Johnston's solution. The y-axis shows the solution ratio, which is the ratio between the measures found by Novel and those of Johnston's. All solutions obtained by Novel satisfy the constraints.

As shown in the left graph, when the constraints on stopband ripple and energy are loosened, the reconstruction error, passband ripple, and passband energy decrease with respect to the relaxation ratio, while the transition bandwidth stays the same. This means that, for relaxed requirements on stopband ripple and energy, the solutions have better reconstruction error, passband ripple, and passband energy, but they do not have a better transition bandwidth. In the right graph, the constraint on transition bandwidth is loosened; the reconstruction error and passband energy decrease significantly, the stopband ripple is reduced slightly, and the stopband energy stays the same.

the next exp eriments we study the change of reconstruction error when all the constraints 1.6 1.6 Reconstr. Error Reconstr. Error Passband Energy 1.4 Passband Energy 1.4 Passband Ripple Passband Ripple Stopband Energy 1.2 Stopband Energy Stopband Ripple Stopband Ripple 1.2 Transition Band Transition Band 1

1 0.8

0.6 Solution Ratio 0.8 Solution Ratio 0.4 0.6 0.2

0.4 0 1 1.05 1.1 1.15 1.2 1.25 1.3 1.35 1.4 1.45 1.5 1 1.05 1.1 1.15 1.2 1.25 1.3 1.35 1.4 1.45 1.5

Stopband Energy and Ripple Relaxation Ratio Transition Bandwidth Relaxation Ratio

Figure: Experimental results in relaxing the constraints with respect to Johnston's designs for D QMFs. The x-axis shows the relaxation ratio of the stopband energy and ripple (left) or the transition bandwidth (right) as constraints in Novel, with respect to Johnston's value. The y-axis shows the ratio of the measure found by Novel with respect to Johnston's.

are either tightened or loosened at the same time with the same ratio. The figure shows the experimental results in relaxing and in tightening the constraints with respect to Johnston's designs for D and D QMFs. Tightening all the constraints causes the reconstruction error to increase, whereas loosening them leads to smaller reconstruction errors. When the constraints are loosened, the reconstruction error, passband energy, passband ripple, and stopband ripple decrease significantly with respect to the relaxed constraints. These improvements come at the expense of the transition bandwidth and stopband energy, which increase according to the relaxed constraints.

When the constraints are tightened, we have difficulty in finding solutions that satisfy all the constraints. For both D and D QMFs, the tightened transition-bandwidth constraint cannot be satisfied. This means that Johnston's solutions are at, or very close to, the Pareto frontier of optimal solutions: there is no solution that is better than Johnston's

solution on all six performance metrics.

Figure: Experimental results in relaxing and in tightening the constraints with respect to Johnston's designs for D (left) and D (right) QMFs. The x-axis shows the ratio of the constraint in Novel with respect to Johnston's value. The y-axis shows the ratio of the performance measures of Novel's solutions with respect to Johnston's.

Summary

In this chapter, we have applied Novel to design QMF filter banks. We formulate the design problem as a single-objective constrained optimization problem that takes all the performance metrics into consideration. We use the reconstruction error of the overall filter bank as the objective function to be minimized, and five performance metrics of the prototype low-pass filter, including stopband ripple, stopband energy, passband ripple, passband flatness, and transition bandwidth, to form constraint functions, bounded by the corresponding performance measures of the baseline solution. Our objective is to improve the reconstruction error while keeping all other performance measures at least as good as those of the baseline solution.

For three performance metrics, which include the reconstruction error and the stopband and passband energies, we derive closed-form formulas and their corresponding derivatives. For the other three performance metrics, which include stopband ripple, passband ripple, and transition bandwidth, we apply numerical methods to find their values and the corresponding derivatives.

In our experiments, we use Johnston's solutions as the baselines and apply Novel to find better solutions. We compare the performance of our adaptive Lagrangian method with the static Lagrangian method, and show that the convergence time of the static Lagrangian method is generally longer than that of the dynamic Lagrangian method and can vary significantly depending on the initial weights.

By using the dynamic Lagrangian method, we have successfully improved the solutions of all the test problems: we have found solutions with smaller reconstruction errors than Johnston's solutions, while all the other performance metrics are equal to or better than those of Johnston's solutions. The nonlinear functions in the objective and the constraints produce local minima that trap local-search methods; by using the trace-based global search in Novel, we have further improved the solutions obtained by the Lagrangian methods.

We have compared the results obtained by Novel with those obtained by simulated annealing (SA) and an evolutionary algorithm (EA). SA solves an unconstrained optimization problem with a weighted-sum formulation; EA works on both the weighted-sum formulation and the single-objective constrained formulation solved by Novel. It is inherently difficult to improve all the performance measures at the same time using the weighted-sum formulation: SA and EA usually obtain trade-off solutions, with some performance metrics improved and others getting worse. In solving the constrained formulation, EA fails to find good feasible solutions for all the test problems, because the feasible regions are very small and difficult to find by random probing. Overall, Novel has the best performance and improves the baseline solutions consistently.

Using our constrained optimization formulation, we have studied performance trade-offs among the six performance metrics of a QMF filter bank. We have found that relaxing some constraints slightly can lead to a large reduction of the reconstruction error. When the stopband ripple and energy, or the transition bandwidth, are relaxed, the other performance measures, such as the reconstruction error, can become significantly smaller. When we relax all the constraints, the stopband energy and transition bandwidth usually just meet the relaxed constraints, while the other metrics are significantly improved.

As a final note, the design approach and the Novel method we have applied to design QMF filter banks can be carried over to the design of other digital filters and filter banks, such as IIR filters and filter banks, and multirate and multiband filter banks.

DISCRETE OPTIMIZATION

In this chapter, we apply Novel, particularly the discrete Lagrangian method (DLM), to solve discrete optimization problems, including the satisfiability problem (SAT), the maximum satisfiability problem (MAX-SAT), and the design of multiplierless power-of-two (PO2) QMF filter banks.

SAT is a class of NP-complete problems that model a wide range of real-world applications. These problems are difficult to solve because they have many local minima in their search space, often trapping greedy search methods that utilize some form of descent. We formulate SAT problems as discrete constrained optimization problems and apply Novel to solve them. Instead of restarting from a new starting point when a search reaches a local trap, the Lagrange multipliers in our discrete Lagrangian method provide a force to lead the search out of a local minimum and move it in the direction provided by the Lagrange multipliers. One of the major advantages of our method is that it has very few algorithmic parameters to be tuned by users, and the search procedure can be made deterministic and the results reproducible. We demonstrate our method by applying it to solve an extensive set of benchmark problems archived in DIMACS at Rutgers University. Our method usually performs better than the best existing methods and can achieve an order-of-magnitude speedup for many problems.
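A minimal sketch of the discrete Lagrangian idea for SAT is shown below: greedy descent on a Lagrangian that weights each unsatisfied clause by 1 + lambda_i, raising the multipliers of the unsatisfied clauses whenever the search is stuck at a local minimum. This simplified form and its function names are ours, not the exact DLM implementation used in the experiments.

```python
def dlm_sat(clauses, n, max_steps=100000):
    """Sketch of a discrete Lagrangian method for SAT, assuming the
    Lagrangian L(x, lam) = sum over unsatisfied clauses i of (1 + lam_i).
    clauses: list of clauses; each clause is a list of nonzero ints,
    where literal v means variable v-1 is true and -v means it is false.
    Returns a satisfying assignment (list of bools) or None."""
    x = [False] * n
    lam = [0] * len(clauses)

    def unsat(a):
        return [i for i, c in enumerate(clauses)
                if not any((lit > 0) == a[abs(lit) - 1] for lit in c)]

    def L(a):
        return sum(1 + lam[i] for i in unsat(a))

    for _ in range(max_steps):
        if not unsat(x):
            return x
        best_v, best_val = None, L(x)
        for v in range(n):              # greedy descent over one-bit flips
            x[v] = not x[v]
            val = L(x)
            x[v] = not x[v]
            if val < best_val:
                best_v, best_val = v, val
        if best_v is None:              # local minimum: raise multipliers on
            for i in unsat(x):          # the unsatisfied clauses to escape
                lam[i] += 1
        else:
            x[best_v] = not x[best_v]
    return None
```

The multiplier updates play the role of the "force" described above: a clause that stays unsatisfied accumulates weight until flipping some variable to satisfy it becomes the move that most decreases the Lagrangian.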

MAXSAT is a general case of SAT As in solving SAT we formulate a MAXSAT problem

as a discrete constrained optimization problem and solve it using Novel In our exp eriments

wehave applied Novel to solve a large numb er of MAXSAT test problems Novel is usually

two orders faster than other comp eting metho ds and can also nd b etter solutions

Finally, we apply Novel to design multiplierless QMF filter banks. In multiplierless QMF filter banks, filter coefficients are represented as sums or differences of powers of two, and multiplications become additions, subtractions, and shifts. Without the complexity of multiplications, each filter takes less area to implement in VLSI, and more filter taps can be accommodated in a given area to achieve higher performance. We formulate the design problem as a discrete constrained optimization problem and solve it using Novel. Novel has obtained designs with quality similar to that of real-value designs but with lower computation and hardware-implementation costs.

This chapter is divided into several sections. In the first group of sections, we present the application of Novel to solve SAT problems. We then show the application of Novel to solve MAX-SAT problems, followed by the application of Novel in designing multiplierless QMF filter banks. Finally, we conclude the chapter.

Introduction to Satisfiability Problems

The satisfiability (SAT) problem is one of the most studied problems in contemporary complexity theory. It was the first NP-complete problem, discovered by Stephen Cook in 1971.

Define a set of Boolean variables x1, x2, ..., xn, xi ∈ {0, 1}, and let the complement of any of these variables xi be denoted as x̄i. In propositional logic, the variables are called literals. Using standard notation, where the symbol ∨ denotes or and ∧ denotes and, we can write any Boolean expression in conjunctive normal form (CNF), i.e., a finite conjunction of disjunctions in which each literal appears at most once. Each disjunctive grouping is called a clause. Given a set of m clauses {C1, C2, ..., Cm} on the n variables x, a Boolean formula in conjunctive normal form (CNF) is

    C1 ∧ C2 ∧ ... ∧ Cm.

A clause is satisfied if at least one of its member literals is true. A Boolean expression is satisfied if all of its clauses are simultaneously satisfied given a particular assignment to the literals (variables).

The satisfiability problem is defined as follows:

    Given a set of literals and a conjunction of clauses defined over the literals, find an assignment of values to the literals so that the Boolean expression is true, or derive its infeasibility if the Boolean expression is infeasible.
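To make the definition concrete, here is a small sketch (our illustration, not part of the thesis) that checks whether an assignment satisfies a CNF formula. A clause is a list of signed integers, where j stands for x_j and -j for its complement:

```python
def clause_satisfied(clause, assignment):
    # A clause is satisfied if at least one of its member literals is true.
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)

def formula_satisfied(clauses, assignment):
    # A Boolean expression is satisfied if all clauses are
    # simultaneously satisfied.
    return all(clause_satisfied(c, assignment) for c in clauses)

# (x1 or not x2) and (x2 or x3)
clauses = [[1, -2], [2, 3]]
assignment = {1: True, 2: False, 3: True}
print(formula_satisfied(clauses, assignment))  # True
```

The same signed-literal encoding is used in the sketches later in this chapter.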

The SAT problem belongs to an important class of discrete constraint-satisfaction problems (CSPs). Many problems in artificial intelligence, logic, computer-aided design, database query, and planning can be formulated as SAT problems. These problems are known to be NP-complete and require algorithms of exponential complexity in the worst case to obtain a satisfying assignment.

Many search methods have been developed in the past for solving SAT problems. These include resolution, constraint satisfaction, and backtracking. These methods are computationally expensive and are not suitable for large problems.

In addition to the feasibility formulation above, SAT problems can be formulated as discrete or continuous, constrained or unconstrained, optimization problems. Below we present five formulations, show the objective and/or constraints for each formulation, and discuss approaches for solving each.

SAT algorithms can be classified as incomplete or complete, depending on whether they can find a random solution or find all the solutions. The advantage of complete algorithms is that they can detect infeasibility when a SAT problem is infeasible. However, they are generally computationally expensive and are suitable only for relatively small problems. On the other hand, incomplete methods are much faster but cannot conclude whether a SAT problem is feasible or infeasible when no solution is found within a limited amount of time.

Recently, a class of local, incomplete search methods was proposed that solves hard SAT problems an order of magnitude larger in size than those solved by complete methods. A major disadvantage of these methods is that they require users to set problem-specific parameters in order to find solutions efficiently. For this reason, one of our goals is to design a fast local search method whose results can be reproduced easily.

In our approach, we formulate a SAT problem as a discrete constrained optimization problem with a goal of minimizing N(x) subject to a set of constraints:

    min_{x ∈ {0,1}^n}  N(x) = Σ_{i=1}^{m} U_i(x)
    subject to  U_i(x) = 0,  ∀ i ∈ {1, 2, ..., m},

where U_i(x) = 0 if the logical assignment x satisfies C_i, and U_i(x) = 1 if C_i is false. N(x) measures the number of unsatisfied clauses, and N(x) = 0 when all the clauses are satisfied. We then apply Novel with the discrete Lagrangian method to solve this problem.
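The formulation above translates directly into code. The following sketch (our illustration, using the signed-literal clause encoding) computes the constraint values U_i(x) and the objective N(x):

```python
def U(clause, x):
    # U_i(x) = 0 if assignment x satisfies clause C_i, and 1 otherwise.
    satisfied = any(x[abs(lit)] == (lit > 0) for lit in clause)
    return 0 if satisfied else 1

def N(clauses, x):
    # N(x) = sum of U_i(x): the number of unsatisfied clauses.
    return sum(U(c, x) for c in clauses)

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
clauses = [[1, 2], [-1, 3], [-2, -3]]
print(N(clauses, {1: True, 2: True, 3: True}))  # 1 (last clause unsatisfied)
```

All constraints U_i(x) = 0 hold exactly when N(x) = 0, which is why a feasible point is also a global minimum of this formulation.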

Traditionally, Lagrangian methods have been developed to solve continuous constrained optimization problems. By performing descents in the original variable space and ascents in the Lagrange-multiplier space, equilibrium is reached when optimal solutions are found. To apply these methods to solve discrete SAT problems, we need to develop discrete Lagrangian operators that can work on discrete values. Novel uses the discrete Lagrangian method (DLM) to solve discrete optimization problems and searches for saddle points based on a discrete Lagrangian function. Equilibrium is reached when a feasible assignment to the original problem is found. Novel moves a search trajectory out of a local minimum in a direction provided by the Lagrange multipliers, without restarting the search.

In the next four sections of this chapter, we present our results in solving SAT problems. We first summarize previous formulations and algorithms for solving SAT problems. Next, we apply our discrete Lagrangian method (DLM) to SAT problems and derive related theoretical foundations. We then address the issues and alternatives in implementing DLM. Finally, we show our experimental results in applying DLM to solve SAT problems from the DIMACS benchmark suite. Afterwards, we switch to presenting our results in applying Novel to solve other discrete problems.

Previous Work

In this section we review the previous work in solving SAT problems which include

various discrete and continuous constrained and unconstrained formulations and the corre

sp onding algorithms for solving them

Discrete Formulations

Discrete formulations of SAT problems can be classified as unconstrained or constrained, as follows.

(a) Discrete Constrained-Feasibility Formulation. This is the feasibility formulation defined earlier. Methods to solve it can be either complete or incomplete, depending on their ability to prove infeasibility. Complete methods for solving this formulation include resolution, backtracking, and consistency testing. An important resolution method is the Davis–Putnam algorithm. These methods enumerate the search space systematically and may rely on incomplete methods to find feasible solutions. Their disadvantage is that they are computationally expensive. For instance, Selman et al. and Gu have reported that the Davis–Putnam algorithm cannot handle SAT problems beyond a modest number of variables, and even better algorithms today have difficulty in solving much larger SAT problems.

(b) Discrete Unconstrained Formulation. In this formulation, the goal is to minimize N(x), the number of unsatisfied clauses. That is,

    min_{x ∈ {0,1}^n}  N(x) = Σ_{i=1}^{m} U_i(x).

Many local search algorithms were designed for this formulation. These algorithms can deal with large SAT problems of thousands of variables. However, they may be trapped by local minima in the search space, where no state in the local neighborhood is strictly better. Consequently, steepest-descent or hill-climbing methods will be trapped there, and restarts merely bring the search to a completely new region.

Methods designed for this formulation are usually incomplete, although some mechanisms like backtracking can make them complete. Most incomplete methods are randomized methods relying on ad hoc heuristics to find random solutions quickly. Those that have been applied include multistart (restart) of local-improvement descent methods and stochastic methods such as simulated annealing (SA) and genetic algorithms (GAs). They are discussed briefly as follows.

A pure descent method with multistarts descends in the space of the objective function from an initial point and generates a new starting point when no further improvement can be found locally. Examples include hill climbing and steepest descent. For large SAT problems, hill-climbing methods are much faster than steepest descent because they descend in the first direction that leads to improvement, whereas steepest-descent methods find the best direction. An example of an objective function suitable to be searched by descent or hill-climbing methods is the unconstrained formulation above. Pure descent methods are not suitable when there are constraints in the search space, as in the constrained formulation.

Recently, some local search methods were proposed and applied to solve large SAT problems. The most notable ones are those developed independently by Gu and Selman.

Gu developed a group of local search methods for solving SAT and CSP problems. In his Ph.D. thesis, he first formulated conflicts in the objective function and proposed a discrete relaxation algorithm, a class of deterministic local search, to minimize the number of conflicts in these problems. The algorithms he developed subsequently focused on two components: methods to continue a search when it reaches a local minimum, and methods for variable selection and value assignment. In the first component, he developed the so-called min-conflicts heuristic and showed significant performance improvement in solving large-size SAT, n-queen, and graph-coloring problems. His methods use various local handlers to escape from local traps when a greedy search stops progressing. Here, a search can continue without improvement when it reaches a local minimum and can escape from it by a combination of backtracking, restarts, and random swaps. In variable selection and value assignment, Gu and his colleagues have developed random and partial-random heuristics. These simple and effective heuristics significantly improve the performance of local search algorithms, by many orders of magnitude.

Selman developed GSAT, which starts from a randomly generated assignment and performs local search iteratively by flipping variables. Such flipping is repeated until either a satisfiable assignment is found or a preset maximum number of flips is reached. When trapped in a local minimum, GSAT either moves uphill or jumps to another random point. To avoid getting stuck on a plateau, which is not a local minimum, GSAT makes sideways moves.
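GSAT's basic loop can be sketched as follows. This is a minimal illustration of the description above (best flip per iteration, with uphill and sideways moves allowed), not Selman's actual implementation, and the random-restart logic is omitted:

```python
import random

def num_unsat(clauses, x):
    # Number of clauses not satisfied by assignment x (signed-literal clauses).
    return sum(0 if any(x[abs(l)] == (l > 0) for l in c) else 1 for c in clauses)

def gsat(clauses, n_vars, max_flips=1000, seed=0):
    rng = random.Random(seed)
    x = {j: rng.random() < 0.5 for j in range(1, n_vars + 1)}
    for _ in range(max_flips):
        if num_unsat(clauses, x) == 0:
            return x                    # satisfying assignment found
        # Flip the variable whose flip yields the fewest unsatisfied clauses;
        # this may be an uphill or sideways move, which is how GSAT moves
        # off plateaus and out of local minima.
        best_j, best_val = None, None
        for j in x:
            x[j] = not x[j]
            val = num_unsat(clauses, x)
            x[j] = not x[j]
            if best_val is None or val < best_val:
                best_j, best_val = j, val
        x[best_j] = not x[best_j]
    return None                         # incomplete: no conclusion on failure

clauses = [[1, 2], [-1, 3], [-2, -3]]
sol = gsat(clauses, 3)
print(sol is not None and num_unsat(clauses, sol) == 0)  # True
```

Note that a `None` result says nothing about infeasibility, matching the discussion of incomplete methods above.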

In short, the objective function of the unconstrained formulation may have many local minima that trap local search methods. Consequently, a search in a seemingly good direction may get stuck in a small local minimum and will rely on random restarts or hill climbing to bring the search out of the local minimum. However, neither scheme explores the search space systematically, and random restarts may bring the search to a completely different region of the search space.

Stochastic methods such as GA and SA have more mechanisms to bring a search out of a local minimum but are more computationally expensive. Selman et al. reported that annealing is not effective for solving SAT problems. To the best of our knowledge, there is no successful application of genetic algorithms to solving SAT problems. In general, these methods are much slower than descent methods and can only solve small problems.

(c) Discrete Constrained Formulation. This formulation can take various forms. One approach is to formulate SAT problems as integer linear programming (ILP) problems and apply existing ILP algorithms to solve them. However, this approach is generally computationally expensive.

Another approach is to minimize an objective function N(x) subject to a set of constraints, as defined earlier and restated as follows:

    min_{x ∈ {0,1}^n}  N(x) = Σ_{i=1}^{m} U_i(x)
    subject to  U_i(x) = 0,  ∀ i ∈ {1, 2, ..., m}.

This formulation is better than the previous two because the constraints provide another mechanism to bring the search out of a local minimum. When a search is stuck in a local minimum, the objective value is a discrete integer, and the vicinity of the local minimum may be either worse or the same. On the other hand, in the unconstrained formulation, there is very little guidance given to the search algorithm as to which variable to flip when a clause is not satisfied. Novel solves SAT problems based on the discrete constrained formulation. We show efficient heuristic algorithms that search in discrete space while satisfying the constraints.

Continuous Formulations

In formulating a discrete SAT problem in continuous space, we transform the discrete variables in the original problem into continuous variables in such a way that solutions to the continuous problem are binary solutions to the original problem. This transformation is potentially beneficial because an objective in continuous space may smooth out some infeasible solutions, leading to a smaller number of local minima explored. Unfortunately, continuous formulations require computationally expensive algorithms, rendering them applicable to only small problems. In the following, we show two such formulations.

(a) Continuous Unconstrained Formulation:

    min_{x ∈ R^n}  f(x) = Σ_{i=1}^{m} c_i(x),

where c_i(x) is a transformation of clause C_i:

    c_i(x) = Π_{j=1}^{n} a_{ij}(x_j),

    a_{ij}(x_j) = |x_j − 1|  if x_j is in C_i,
    a_{ij}(x_j) = |x_j + 1|  if x̄_j is in C_i,
    a_{ij}(x_j) = 1          otherwise.

Values of x that make f(x) = 0 are solutions to the original problem.
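The transformation can be sketched as follows (our illustration; we assume the common ±1 encoding in which x_j = 1 means true and x_j = −1 means false, so each clause term vanishes exactly when the clause is satisfied):

```python
def a(lit_sign, xj):
    # |x_j - 1| if the variable appears positively in the clause,
    # |x_j + 1| if its complement appears, and 1 if the variable is absent.
    if lit_sign > 0:
        return abs(xj - 1.0)
    if lit_sign < 0:
        return abs(xj + 1.0)
    return 1.0

def c_i(clause, x):
    # Product over the clause's variables; zero iff some literal is "true"
    # under the +/-1 encoding, i.e., iff the clause is satisfied.
    prod = 1.0
    for j, sign in clause.items():
        prod *= a(sign, x[j])
    return prod

def f(clauses, x):
    return sum(c_i(c, x) for c in clauses)

# (x1 or not x2) and (x2 or x3), each clause as {variable: sign}
clauses = [{1: +1, 2: -1}, {2: +1, 3: +1}]
x = {1: 1.0, 2: -1.0, 3: 1.0}   # x1 true, x2 false, x3 true
print(f(clauses, x))  # 0.0 -> a satisfying (binary) solution
```

A continuous minimizer would search over real-valued x and stop when f(x) reaches zero; as the text notes, such methods are typically far more expensive than their discrete counterparts.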

Note that the objective is a nonlinear polynomial function. Hence, there may be many local minima in the search space, and descent methods such as gradient descent, conjugate gradient, and quasi-Newton methods can be trapped by the local minima. Global search techniques, such as clustering methods, generalized gradient methods, Bayesian methods, and stochastic methods, can also be applied; however, they are usually much more computationally expensive than descent methods.

To overcome the inefficiency of continuous unconstrained optimization methods, Gu developed discrete bit-parallel optimization algorithms to evaluate the continuous objective function and found significant performance improvements.

(b) Continuous Constrained Formulation. This generally involves a heuristic objective function that indicates the quality of the solution obtained, such as the number of clauses satisfied. One formulation, similar to the discrete constrained formulation, is as follows:

    min_{x ∈ R^n}  f(x) = Σ_{i=1}^{m} c_i(x)
    subject to  c_i(x) = 0,  ∀ i ∈ {1, 2, ..., m},

where c_i(x) is defined above.

i

The key to the continuous approach lies in the transformation. When it does not smooth out local minima in the discrete space, or when the solution density is low, continuous methods are much more computationally expensive to apply than discrete methods.

Since this is a continuous constrained optimization problem with a nonlinear objective function and nonlinear constraints, we can apply existing Lagrange-multiplier methods to solve it. Our experience is that a Lagrangian transformation does not reduce the number of local minima, and continuous Lagrangian methods are an order of magnitude more expensive to apply than the corresponding discrete algorithms.

Applying DLM to Solve SAT Problems

To apply our discrete Lagrangian method (DLM), introduced earlier, to solve a SAT problem, we introduce an artificial objective H(x) and formulate the problem as a constrained optimization problem as follows:

    min_{x ∈ {0,1}^n}  H(x)
    subject to  U_i(x) = 0,  ∀ i ∈ {1, 2, ..., m},

where U_i = 0 if C_i is true, and U_i = 1 if C_i is false.

Let us show that a saddle point (x*, λ*) of the Lagrangian formulation consists of a feasible solution x* to the SAT problem. For a given saddle point (x*, λ*),

    F(x*, λ) = H(x*) + λᵀ U(x*) ≤ H(x*) + λ*ᵀ U(x*) = F(x*, λ*),

which implies

    λᵀ U(x*) ≤ λ*ᵀ U(x*)

for any λ close to λ*. This condition holds only when U(x*) = 0, i.e., when x* is a feasible solution to the SAT problem.

In order for all feasible solutions to the SAT problem to become saddle points, the objective function H(x) must satisfy the following condition.

Necessary Condition for the Objective Function: If a feasible solution x* to the SAT problem is a local minimum of the objective function H(x), then (x*, λ*) is a saddle point of the Lagrangian formulation for any positive λ*.

Proof: The Lagrangian function of the formulation is

    F(x, λ) = H(x) + λᵀ U(x).

For a feasible solution x*, U(x*) = 0. Hence, for any positive λ* and any λ,

    F(x*, λ) = H(x*) + λᵀ U(x*) = H(x*) = H(x*) + λ*ᵀ U(x*) = F(x*, λ*).

By the definition of U(x), U(x) ≥ 0 for any x. Since x* is a local minimum of H(x) and λ* is positive,

    F(x, λ*) = H(x) + λ*ᵀ U(x) ≥ H(x) ≥ H(x*) = F(x*, λ*)

for x close to x*. Therefore, we have

    F(x*, λ) ≤ F(x*, λ*) ≤ F(x, λ*),

and (x*, λ*) is a saddle point.

There are many functions that satisfy the necessary condition. Examples are:

    H(x) = Σ_{i=1}^{m} U_i(x),
    H(x) = Σ_{i=1}^{m} l_i U_i(x),
    H(x) = Σ_{i=1}^{m} (l_max − l_i) U_i(x),

where l_i is the number of variables in clause C_i, and l_max = max{l_i, 1 ≤ i ≤ m}. The first function assigns uniform weights to all the clauses. The second function assigns more weight to longer clauses, whereas the third one does the opposite. A feasible solution x* is a local minimum of these functions because U_i(x*) = 0, i ∈ {1, ..., m}, and H(x*) = 0. In our experiments, we use the first function as our objective function and formulate SAT problems as constrained optimization problems as shown earlier.

The SAT problem defined above is a special case of the discrete constrained optimization problem defined earlier. An important property of this formulation is that all local minima are also global minima. This is true because x is defined as a local minimum only if U(x) = 0, which implies that N(x) = 0, a global minimum. This condition is stated more formally as follows.

Necessary and Sufficient Condition for Optimality: x is a global minimum of the SAT formulation if and only if U(x) = 0.

Proof: Straightforward.

Due to this property, a SAT problem formulated this way can be solved by difference equations that find saddle points of a discrete Lagrangian function. In the rest of this section, we show how the general discrete Lagrangian method (DLM) can be applied to solve discrete SAT problems.

The discrete Lagrangian function for this formulation is defined as follows:

    L(x, λ) = N(x) + λᵀ U(x) = Σ_{i=1}^{m} U_i(x) + Σ_{i=1}^{m} λ_i U_i(x) = Σ_{i=1}^{m} (1 + λ_i) U_i(x),

where x ∈ {0,1}^n, U(x) = (U_1(x), ..., U_m(x)) ∈ {0,1}^m, and λᵀ is the transpose of λ = (λ_1, ..., λ_m), which denotes the Lagrange multipliers. Due to the binary domain of x, the neighborhood of x is more restricted than in general discrete problems, which is reflected in the definition of the saddle point.

A saddle point (x*, λ*) of L(x, λ) is defined as one that satisfies

    L(x*, λ) ≤ L(x*, λ*) ≤ L(x, λ*)

for all λ sufficiently close to λ* and for all x whose Hamming distance from x* is 1.

Saddle-Point Theorem for SAT: x* is a global minimum of the SAT formulation if and only if there exists some λ* such that (x*, λ*) constitutes a saddle point of the associated Lagrangian function L(x, λ).

Proof: The Saddle-Point Theorem discussed earlier for discrete problems can be applied here. A simpler proof is as follows.

(⇐ part) Since (x*, λ*) is a saddle point, L(x*, λ) ≤ L(x*, λ*) for λ sufficiently close to λ*. From the definition of the Lagrangian function, this implies

    Σ_{i=1}^{m} λ_i U_i(x*) ≤ Σ_{i=1}^{m} λ*_i U_i(x*).

Suppose some U_k(x*) ≠ 0, which means U_k(x*) = 1. Then λ_k = λ*_k + ε would violate the inequality for a positive ε. Therefore, U(x*) = 0, and x* is a global minimum.

(⇒ part) If x* is a global minimum, then U(x*) = 0, and L(x*, λ) = L(x*, λ*) = 0. Any nonnegative vector λ* then makes L(x*, λ*) ≤ L(x, λ*). Therefore, (x*, λ*) is a saddle point.

Corollary: If a SAT problem is feasible, then any algorithm that can find a saddle point of L(x, λ) from any starting point can find a feasible solution to the SAT problem.

Proof: If a SAT problem is feasible, then its solutions are global minima of the formulation. These correspond to saddle points of L(x, λ). If an algorithm can find a saddle point from any starting point, then it will find a solution to the problem.

Since a Lagrangian method only stops at saddle points, this corollary implies that the method will find a saddle point, regardless of its starting point (including the origin), if the problem is feasible. Unfortunately, the corollary does not guarantee that it will find a saddle point in a finite amount of time.

To apply DLM to solve SAT problems, we define the discrete gradient operator Δ_x L(x, λ) with respect to x such that Δ_x L(x, λ) points to the state x' in the Hamming-distance-1 neighborhood of the current state x that gives the maximum improvement in L(x, λ). If no state in the neighborhood of x improves L(x, λ), then Δ_x L(x, λ) = 0.
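The discrete gradient operator can be sketched in a few lines (our illustration): examine all Hamming-distance-1 neighbors of x and return the flip giving the greatest improvement in L, or no flip if none improves.

```python
def delta_x(L, x):
    # Among all assignments at Hamming distance 1 from x, pick the one with
    # the smallest L value; return the index of the variable to flip, or
    # None if no neighbor improves L(x) (the zero-operator case).
    best_j, best_val = None, L(x)
    for j in x:
        x[j] = not x[j]
        val = L(x)
        x[j] = not x[j]            # restore
        if val < best_val:
            best_j, best_val = j, val
    return best_j

# Toy Lagrangian: number of True variables (minimized at the all-False point).
L = lambda x: sum(x.values())
print(delta_x(L, {1: True, 2: False}))   # 1    (flipping x1 lowers L)
print(delta_x(L, {1: False, 2: False}))  # None (local minimum)
```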

The figure below illustrates the discrete gradient operator on a 3-dimensional problem. The current point x has Lagrangian value L = 10. The states within Hamming distance 1 of x have Lagrangian values 7, 10, and 12 along dimensions 1, 2, and 3, respectively. The neighboring state along dimension 1 improves the Lagrangian value most. Therefore, Δ_x L(x, λ) flips x1.

Next, we propose a method to update (x, λ) so that it will eventually satisfy the optimality condition.

Discrete Lagrangian Method (DLM) A for SAT:

    x^(k+1) = x^k ⊕ Δ_x L(x^k, λ^k),
    λ^(k+1) = λ^k + U(x^k).

[Figure: Illustration of the discrete gradient operator for SAT.]

It is easy to see that the necessary condition for algorithm A to converge is U(x) = 0, implying that x is optimal. If any of the constraints in U(x) is not satisfied, then λ will continue to evolve to handle the unsatisfied constraints.

The following theorem establishes the correctness of A and provides the conditions for termination.

Fixed-Point Theorem for SAT: An optimal solution x* to the SAT problem is found if and only if A terminates.

Proof: (⇐ part) If A terminates, then U(x) = 0, which makes Δ_x L(x, λ) = 0. Since this is a sufficient condition for optimality, the optimal solution is found.

(⇒ part) If an optimal solution x* is found, then, according to the necessary condition of optimality, U(x*) = 0, implying that Δ_x L(x*, λ) = 0. Therefore, neither x nor λ will change, leading to the conclusion that A terminates.

Example: The following simple example illustrates the discrete Lagrangian algorithm. The problem has four variables, {x1, x2, x3, x4}, a set of clauses defined over them, and its satisfying assignments.

Algorithm A works as follows.

1. Initially, x = (0, 0, 0, 0) and λ = (0, ..., 0).

2. The L values of the neighboring points of the initial point are computed. Since L(x, λ) is less than or equal to the values of all the neighboring points, Δ_x L(x, λ) = 0. As the fourth clause is not satisfied, its multiplier λ4 is increased, and x is then updated to x ⊕ Δ_x L(x, λ). Note that the penalty for the fourth clause is increased in order to provide a force to pull the search out of the local minimum.

3. The L values of the neighboring points of the new x are computed. There are now three choices of Δ_x L(x, λ), each flipping a different variable and leading to a different successor (x, λ). Assume that we choose the first one in this example.

4. The L values of the neighboring points are computed again; Δ_x L(x, λ) points to the best neighbor, and x and λ are updated accordingly.

5. U(x) = 0 implies that Δ_x L(x, λ) = 0. Hence, A terminates, and x is a solution.

Note that λ, as updated by A, is nondecreasing. Ways to decrease λ are discussed later in this chapter.

One of the important features of DLM is that it continues to search until a solution to the SAT problem is found, independent of its starting point. Therefore, DLM does not involve the restarts that are generally used in other randomized search methods. However, as with other incomplete methods, DLM does not terminate if there is no feasible solution. Further, as in general Lagrangian methods, the time for DLM to find a saddle point can only be determined empirically.

DLM has a continuous search trajectory without any breaks. DLM descends in the original variable space, similar to what is done in other algorithms. When the search reaches a local minimum, DLM brings the search out of the local minimum using its Lagrange multipliers. As a result, we often see small fluctuations in the number of unsatisfied clauses as the search progresses, indicating that DLM bypasses small dents in the original variable space with the aid of Lagrange multipliers. In contrast, some local search algorithms rely on randomized mechanisms to bring the search out of local minima in the original variable space: when the search reaches a local minimum, it is restarted from a completely new starting point. Consequently, the search may fail to explore the vicinity of the local minimum it has just reached, and the number of unsatisfied clauses at the new starting point is unpredictable.

In contrast to other local search methods, such as Gu's local-search methods and GSAT, DLM descends in the Lagrangian space. To get out of local minima that do not satisfy all the constraints, DLM increases the penalties on constraints that are violated, recording history information about constraint violation in the Lagrange multipliers. Eventually, as time passes, the penalties on violated constraints become very large, forcing these constraints to be satisfied. On the other hand, GSAT uses uphill movements and random restarts to get out of local minima, whereas Gu's local-minimum handler uses stochastic mechanisms to escape from local minima.

Although our strategy is similar to Morris' breakout strategy and to Selman and Kautz's version of GSAT that applies adaptive penalties to escape from local minima, DLM provides a theoretical framework for better understanding these heuristic strategies. In addition, DLM can incorporate new techniques for controlling Lagrange multipliers in order to obtain improved performance. Some of these strategies are described later in this chapter.

Implementations of DLM

In this section, we discuss issues related to the implementation of DLM and present three implementations. There are three components in applying a Lagrangian method: evaluating the derivative of the Lagrangian function, updating the Lagrange multipliers, and evaluating the constraint functions. In the continuous domain, these operations are computationally expensive, especially when the number of variables is large and the function is complex. However, as we show in this and the next sections, implementation in the discrete domain is very efficient, and our method is faster than other local-search methods for solving SAT problems.

Algorithmic Design Considerations

The general algorithm of DLM is shown in the figure below. It performs descents in the original variable space of x and ascents in the Lagrange-multiplier space of λ. In discrete space, Δ_x L(x, λ) is used in place of the gradient function in continuous space. We call one pass through the while loop one iteration. In the following, we describe the features of our implementation of A.

(a) Descent and Ascent Strategies. There are two ways to calculate Δ_x L(x, λ): greedy and hill-climbing, each involving a search within Hamming distance one of the current x (assignments with one variable flipped from the current assignment x).

Generic algorithm A:

    set initial x and λ
    while x is not a solution, i.e., N(x) > 0
        update x:  x ← x ⊕ Δ_x L(x, λ)
        if the condition for updating λ is satisfied then
            update λ:  λ ← λ + c · U(x)
        end if
    end while

Figure: Generic discrete Lagrangian algorithm A for solving SAT problems.
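The generic algorithm above can be turned into runnable code. The sketch below is our illustration, not the thesis implementation: it uses the greedy version of Δ_x L, updates λ only at local minima (i.e., T = ∞), and starts from the origin; clauses use signed-integer literals.

```python
def dlm_sat(clauses, n_vars, c=1, max_iters=100000):
    x = {j: False for j in range(1, n_vars + 1)}   # start from the origin
    lam = [0] * len(clauses)                       # initial lambda = 0

    def U(i):
        # U_i(x) = 0 if clause i is satisfied, 1 otherwise.
        return 0 if any(x[abs(l)] == (l > 0) for l in clauses[i]) else 1

    def L():
        # L(x, lambda) = sum_i (1 + lambda_i) * U_i(x)
        return sum((1 + lam[i]) * U(i) for i in range(len(clauses)))

    for _ in range(max_iters):
        if sum(U(i) for i in range(len(clauses))) == 0:
            return x                               # N(x) = 0: solved
        # Descent in the original variable space (best single flip).
        best_j, best_val = None, L()
        for j in x:
            x[j] = not x[j]
            val = L()
            x[j] = not x[j]
            if val < best_val:
                best_j, best_val = j, val
        if best_j is not None:
            x[best_j] = not x[best_j]
        else:
            # Local minimum: ascent in the Lagrange-multiplier space,
            # raising penalties only on violated clauses.
            for i in range(len(clauses)):
                lam[i] += c * U(i)
    return None                                    # incomplete method

sol = dlm_sat([[1, 2], [-1, 3], [-2, -3]], 3)
print(sol)  # a satisfying assignment
```

As the corollary above notes, termination is guaranteed only at a feasible solution; the `max_iters` cutoff stands in for the unbounded search time of an incomplete method.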

In a greedy strategy, the assignment leading to the maximum decrease in the Lagrangian-function value is selected to update the current assignment. Therefore, all assignments in the vicinity need to be searched every time, leading to a computational complexity of O(n) per update, where n is the number of variables in the SAT problem. In hill-climbing, the first assignment leading to a decrease in the Lagrangian-function value is selected to update the current assignment. Depending on the order of search and the number of assignments that can be improved, hill-climbing strategies are generally less computationally expensive than greedy strategies. We have compared both strategies in solving SAT benchmark problems and have found hill-climbing to be orders of magnitude faster, with solutions of comparable quality. Hence, we have used hill-climbing in our experiments.
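The difference between the two strategies is only where the neighbor scan stops; a minimal sketch (our illustration) over a toy objective:

```python
def greedy_flip(L, x):
    # Scan all n neighbors; return the best-improving flip (O(n) evaluations).
    best_j, best_val = None, L(x)
    for j in x:
        x[j] = not x[j]
        if L(x) < best_val:
            best_j, best_val = j, L(x)
        x[j] = not x[j]
    return best_j

def hill_climbing_flip(L, x):
    # Return the FIRST improving flip found; often far fewer evaluations.
    base = L(x)
    for j in x:
        x[j] = not x[j]
        if L(x) < base:
            x[j] = not x[j]
            return j
        x[j] = not x[j]
    return None

L = lambda x: sum(x.values())
x = {1: True, 2: True}
print(greedy_flip(L, dict(x)), hill_climbing_flip(L, dict(x)))  # 1 1
```

On this toy objective both pick the same flip, but hill-climbing stops scanning as soon as any improvement appears, which is the source of its speed advantage on large instances.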

(b) Updating λ. The frequency with which λ is updated affects the performance of a search. The considerations here are different from those of continuous problems. In a discrete problem, descents based on discrete gradients usually make small changes in L(x, λ) in each update of x, because only one variable changes. Hence, λ should not be updated in each iteration of the search, to avoid biasing the search in the Lagrange-multiplier space of λ over the original variable space of x.

In our implementation, λ is updated in two situations: one is when Δ_x L(x, λ) = 0, and the other is every T iterations. In our experiments, we have found that DLM works better when T is large. Consequently, λ is updated infrequently, most likely only when Δ_x L(x, λ) = 0. When Δ_x L(x, λ) = 0, a local minimum in the original variable space has been reached, and the search can only escape from it by updating λ.

By setting T to infinity, the strategy amounts to pure descents in the original variable space of x while holding λ constant until a local minimum is reached. This corresponds to Morris' breakout strategy. In the Breakout algorithm, a state is a complete set of assignments to the variables. In the case of a SAT problem, each clause is associated with a weight, and the cost of a state is the sum of the weights of the unsatisfied clauses of the state. Initially, all weights are one. Iterative improvement of the state continues until a local minimum is reached. Then the weight of each currently unsatisfied clause is increased by unit increments until breakout from the local minimum occurs. Iterative improvement resumes afterwards.

A parameter c in the term c · U(x) in the algorithm controls the magnitude of changes in λ. In general, c can be a vector of real numbers, allowing nonuniform updates of λ across different dimensions and possibly across time. For simplicity, we have used a constant c in our implementation for all λ's. Empirically, this has been found to work well for most of the benchmark problems tested. However, for some larger and more difficult problems, we have used a smaller c in order to reduce the search time.

The last point about λ is that, as computed in A, it is always nondecreasing. This is not true in continuous problems with equality constraints: in applying Lagrangian methods to solve continuous problems, the Lagrange multiplier λ of a constraint g(x) = 0 increases when g(x) > 0 and decreases when g(x) < 0. In A, λ is nondecreasing because U(x) is either 0 or 1: when a clause is not satisfied, its corresponding λ is increased; and when a clause is satisfied, its corresponding λ is not changed. For most of the benchmark problems we have tested, this strategy does not worsen the search time, as these problems are relatively easy to solve. However, for difficult problems that require millions of iterations, λ can become very large as time goes on. Large λ's are generally undesirable because they cause large swings in the Lagrangian-function value.

To overcome this problem, we develop a strategy to reduce λ periodically in a later version of DLM, described in a subsequent section. Using this strategy, we can solve some of the more difficult benchmark problems.

(c) Starting Points and Restarts. In contrast to other SAT algorithms that rely on random restarts to bring a search out of a local minimum, DLM continues to evolve without restarts until a satisfiable assignment is found. This avoids restarting from a new starting point when a search is already in the proximity of a good local minimum. Another major advantage of DLM is that there are very few parameters to be selected or tuned by users, including the initial starting point. This makes it possible for DLM to always start from the origin, or from a random starting point generated by a fixed random seed, and find a feasible assignment if one exists.

(d) Plateaus in the Search Space. In discrete problems, plateaus with equal values exist in the Lagrangian-function space. Our proposed discrete gradient operator may have difficulties in plateaus because it only examines adjacent points of L(x, λ) that differ in one dimension. Hence, it may not be able to distinguish a plateau from a local minimum. We have implemented two strategies to allow a plateau to be searched.

First, we need to determine when to change λ when the search reaches a plateau. As indicated earlier, λ should be updated when the search reaches a local minimum. However, updating λ when the search is in a plateau changes the surface of the plateau and may make it more difficult for the search to find a local minimum somewhere inside the plateau. To avoid updating λ immediately when the search reaches a plateau, we have developed a strategy called flat move. This allows the search to continue for some time in the plateau without changing λ, so that the search can traverse states with the same Lagrangian-function value. How long flat moves should be allowed is heuristic and possibly problem-dependent. Note that this strategy is similar to Selman's sideway-move strategy.

Our second strategy is to avoid revisiting the same set of states in a plateau. In general, it is impractical to remember every state the search visits in a plateau, due to the large storage and computational overheads. In our implementation, we have kept a tabu list to maintain the set of variables flipped in the recent past, and to avoid flipping a variable if it is in the tabu list.
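The tabu-list bookkeeping described above can be sketched in executable form. The following Python helper is an illustration only; the class name and the fixed-length eviction policy are our assumptions, not the thesis's C implementation:

```python
from collections import deque

class TabuList:
    """Fixed-length memory of recently flipped variables (a sketch)."""
    def __init__(self, length):
        self.length = length
        self.queue = deque()          # flip order, oldest first
        self.members = set()          # variables currently tabu

    def allows(self, var):
        """A variable may be flipped only if it is not in the tabu list."""
        return var not in self.members

    def record(self, var):
        """Remember a flip, evicting the oldest entry when full."""
        if len(self.queue) == self.length:
            self.members.discard(self.queue.popleft())
        self.queue.append(var)
        self.members.add(var)
```

The search consults `allows(v)` before flipping `v` and calls `record(v)` after each flip, so the same variable cannot be flipped back within the tabu length.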

To summarize, we employ the following strategies and parameter settings in our implementations of the generic DLM A:

- A hill-climbing strategy is used for descents in the original variable space.
- T, the time interval between updating λ, is infinity.
- The initial λ is zero.
- The initial starting point is either the origin or a randomly generated starting point obtained by calling random-number generator drand.
- The random-number generator uses a fixed initial seed.
- To handle more difficult and complex problems, flat moves and tabu lists may be used in a search to traverse plateaus, and λ may be reduced systematically.

Unless otherwise stated, these strategies and parameter settings are used in our implementations in the following section.

Three Implementations of the Generic DLM

The figures in this section show three implementations of the general algorithm A, with increasing complexity.

A1 (DLM Version 1), shown in the figure below, is the simplest. It has two alternatives to find a variable whose flip improves the Lagrangian-function value: flip variables one by one in a predefined order, or flip variables in unsatisfied clauses. Since only variables appearing in unsatisfied clauses can potentially improve the current Lagrangian-function value, it is not necessary to check variables that appear only in currently satisfied clauses. The first alternative is fast when the search starts. By starting from a randomly generated initial assignment, it usually takes just a few flips to find a variable that improves the current Lagrangian-function value. As the search progresses, there are fewer variables that can improve the Lagrangian-function value. At this point, the second alternative should be applied.

A1 uses parameter θ1 to control switching from the first alternative to the second. We found one value that works well and used it in all our experiments.

DLM A1:
    Set initial x
    Set initial λ
    Set constant c
    Set threshold θ1
    while x is not a solution, i.e., N(x) > 0
        if number of unsatisfied clauses < θ1 then
            Maintain a list of unsatisfied clauses
            if ∃ variable v in one of the unsatisfied clauses such that
                    L(x′, λ) < L(x, λ) when flipping v in x to get x′ then
                x ← x′
            else
                λ ← λ + c · U(x)
            end if
        else
            if ∃ variable v such that L(x′, λ) < L(x, λ) when flipping v
                    in a predefined order in x to get x′ then
                x ← x′
            else
                λ ← λ + c · U(x)
            end if
        end if
    end while

Figure: Discrete Lagrangian Algorithm Version 1 (A1), an implementation of A for solving SAT problems.
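The control flow of A1 can also be sketched as runnable code. The following Python function is a minimal illustration under assumed conventions (clauses as lists of signed literals, invented values for c and θ1); it is not the thesis's C implementation:

```python
def dlm_a1(clauses, n, c=1, theta1=3, max_iter=10000):
    """Sketch of DLM A1. Clauses are lists of literals (+v means variable v
    is true, -v means false; variables are numbered 1..n). theta1 switches
    between the two flip orders; its value here is illustrative."""
    x = [0] * (n + 1)                       # x[1..n]; start from the origin
    lam = [0] * len(clauses)                # one Lagrange multiplier per clause

    def unsat(y):                           # indices of unsatisfied clauses
        return [i for i, cl in enumerate(clauses)
                if not any(y[abs(l)] == (l > 0) for l in cl)]

    def L(y):                               # L(y, lam) = N(y) + lam . U(y)
        bad = unsat(y)
        return len(bad) + sum(lam[i] for i in bad)

    for _ in range(max_iter):
        bad = unsat(x)
        if not bad:
            return x                        # feasible assignment found
        if len(bad) < theta1:               # flip variables in unsat clauses
            order = sorted({abs(l) for i in bad for l in clauses[i]})
        else:                               # flip in a predefined order
            order = range(1, n + 1)
        base = L(x)
        for v in order:
            x[v] ^= 1                       # tentative flip
            if L(x) < base:
                break                       # keep the improving flip
            x[v] ^= 1                       # undo
        else:                               # local minimum: lam += c * U(x)
            for i in bad:
                lam[i] += c
    return None
```

The `for ... else` mirrors the algorithm's structure: the multiplier update fires only when no single flip improves L(x, λ).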

DLM A2:
    Set initial x
    Set initial λ
    Set constant c
    Set θ2 in terms of n, where n is the number of variables
    while x is not a solution, i.e., N(x) > 0
        if number of iterations > θ2 then
            Maintain a list l of variables such that,
                if one of them is flipped, the solution will improve
            if l is not empty then
                Update x by flipping the first element of l
            else
                λ ← λ + c · U(x)
            end if
        else
            if ∃ variable v such that L(x′, λ) < L(x, λ) when flipping v
                    in a predefined order in x to get x′ then
                x ← x′
            else
                λ ← λ + c · U(x)
            end if
        end if
    end while

Figure: Discrete Lagrangian Algorithm Version 2 (A2) for solving SAT problems.

A2 (DLM Version 2), shown in the figure above, employs another strategy to make local improvements. Initially, it is similar to A1. As the search progresses, the number of variables that can improve the current Lagrangian-function value is greatly reduced. At this point, a list is created and maintained to contain the variables that can improve the current Lagrangian-function value. Consequently, local descent is as simple as flipping the first variable in the list.

A2 uses θ2 to control the switching from the first strategy to the second. We found that a value on the order of n works well and used it in our experiments. A2 also has more efficient data structures to deal with larger problems.

A3 (DLM Version 3), shown in the figure below, has more complex control mechanisms and was introduced to solve some of the more difficult benchmark problems, such as the g, f, and large par problems in the DIMACS benchmarks, better than A2. It is based on A2 and uses all of A2's parameters. We have applied strategies based on flat moves and tabu lists to handle plateaus. An important element of A3 is the periodic scaling down of the Lagrange multipliers in order to prevent them from growing very large. Further, to get better performance, we may have to tune c for each problem instance.

Program efficiency is critical when dealing with SAT problems with a large number of variables and clauses. Since DLM searches by constantly updating state information (the current assignment of x, the Lagrange-multiplier values, and the Lagrangian-function value), state updates have to be very efficient. In our implementation, we update state information incrementally, in a way similar to that in GSAT. In large SAT problems, each variable usually appears in a small number of clauses. Therefore, the state changes incurred by flipping a variable are very limited. When flipping a variable, some clauses become unsatisfied while some others become satisfied. The incremental update of the Lagrangian-function value is done by subtracting the part of improvement and adding the part of degradation. This leads to very efficient evaluation of L(x, λ). In a similar way, the computation of Δ_x L(x, λ) can also be done efficiently.
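The incremental-update idea can be sketched as follows: by precomputing, for each variable, the clauses it occurs in, the change in L(x, λ) caused by one flip touches only those clauses. This is an illustrative Python sketch with unit clause weights (as in SAT), not the thesis's C code:

```python
def make_flip_delta(clauses, lam):
    """Return delta(y, v): the change in L(y, lam) = N(y) + lam . U(y)
    caused by flipping variable v, re-evaluating only clauses containing v.
    Clauses are lists of signed literals; y[v] is the 0/1 value of v."""
    occurs = {}                              # variable -> clause indices
    for i, cl in enumerate(clauses):
        for lit in cl:
            occurs.setdefault(abs(lit), []).append(i)

    def clause_sat(y, i):
        return any(y[abs(l)] == (l > 0) for l in clauses[i])

    def delta(y, v):
        d = 0
        for i in occurs.get(v, []):
            before = clause_sat(y, i)
            y[v] ^= 1
            after = clause_sat(y, i)
            y[v] ^= 1                        # restore y
            # clause i contributes (1 + lam[i]) when unsatisfied
            d += (int(before) - int(after)) * (1 + lam[i])
        return d

    return delta
```

Because each variable occurs in only a few clauses, one call to `delta` is far cheaper than re-evaluating L(x, λ) from scratch.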

In general Lagrangian methods, the Lagrange multipliers introduced in the formulation add extra overhead in computing the Lagrangian function, as compared to the original objective

DLM A3:
    Set initial x
    Set initial λ
    Set θ2 in terms of n, where n is the number of variables
    Set tabu length L_t
    Set flat-region limit L_f
    Set λ-reset interval I
    Set constant c
    Set constant r
    while x is not a solution, i.e., N(x) > 0
        if number of iterations > θ2 then
            Maintain a list l of variables such that,
                if one of them is flipped, the solution will improve
        end if
        if number of iterations > θ2 and l is not empty then
            Update x by flipping the first element of l
        else if ∃ variable v such that L(x′, λ) < L(x, λ) when flipping v
                in a predefined order in x to get x′ then
            x ← x′
        else if ∃ v such that L(x′, λ) = L(x, λ) when flipping v in x to get x′,
                and the number of consecutive flat moves < L_f,
                and v has not been flipped in the last L_t iterations then
            x ← x′ (flat move)
        else
            λ ← λ + c · U(x)
        end if
        if iteration index mod I = 0 then
            Reduce λ for all clauses, e.g., λ ← λ/r
        end if
    end while

Figure: Discrete Lagrangian Algorithm Version 3 (A3) for solving SAT problems.

function. This overhead in DLM is not significant, because an update of λ requires O(p) time, where p is the number of unsatisfied clauses, and p ≪ m when the number of clauses m is large.

Reasons behind the Development of A3

In our experiments, we found that A2 has difficulty in solving some of the hard benchmark problems. By looking into the execution profiles of A2, as illustrated in the figure below, we have the following interesting observations:

- The sampled Lagrangian-function values decrease rapidly in the first few thousand iterations (the upper-left graph) but keep increasing afterwards, becoming very large at the end. The iteration-by-iteration plot of the Lagrangian part (bottom-right graph) shows the same increasing trend.

- Some Lagrange multipliers become very large, as indicated by the spread of Lagrange multipliers (the upper-right graph). This large spread leads to large Lagrangian-function values.

- The number of unsatisfied clauses is relatively stable, fluctuating around a roughly constant level (the upper-left graph).

As the Lagrangian-function space is very rugged and difficult to search when there is a large spread in Lagrange-multiplier values, we have developed three strategies in A3 to address this problem.

First, we introduce in A3 flat moves to handle flat regions in the Lagrangian-function space. In A2, Lagrange multipliers are increased whenever the search hits a flat region. Since flat regions in hard problems can be very large, Lagrange multipliers can increase very fast, making the search more difficult. Flat moves allow the search to explore flat regions without

increasing the Lagrange multipliers and the Lagrangian-function value.

Figure: Execution profiles of A2 on problem g, starting from a random initial point. The top-left graph plots the Lagrangian-function values and the number of unsatisfied clauses against the number of iterations, sampled at regular intervals. The top-right graph plots the minimum, average, and maximum values of the Lagrange multipliers, sampled at the same intervals. The bottom two graphs plot the iteration-by-iteration objective and Lagrangian-part values for the initial iterations.


Figure: Execution profiles of A2 using a tabu list and a flat-move limit on problem g, starting from a random initial point. See the previous figure caption for further explanation.

Second, we introduce a tabu list to store variables that have been flipped in the recent past, and to avoid flipping any of them in the near future. This strategy avoids flipping the same set of variables back and forth and revisiting the same state.

The result of applying these two strategies is shown in the figure above. These graphs show a significant reduction in the growth of Lagrangian-function values and Lagrange multipliers, as compared to those of A2. However, these values still grow without bound.

Third, we introduce periodic scale-downs of the Lagrange multipliers to control their growth, as well as the growth of the Lagrangian-function values. The figure below shows the result when all Lagrange multipliers are scaled down by a constant factor at regular intervals. These graphs indicate that periodic scaling, when combined with a tabu list and flat moves, leads to bounded values of the Lagrange multipliers and Lagrangian-function values. The bottom-right graph shows the reduction in the Lagrangian part after two reductions of its values.
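Why periodic scaling bounds the multipliers can be seen with a small numeric sketch; the growth rate g and scaling factor r below are invented for illustration. If λ accumulates roughly g per reset interval and is divided by r at every reset, the iteration converges to the fixed point g/(r − 1) instead of growing without bound:

```python
def scaled_growth(g, r, intervals):
    """Toy model: lam accumulates g per interval, then is divided by r.
    The iteration lam <- (lam + g) / r has fixed point g / (r - 1)."""
    lam = 0.0
    for _ in range(intervals):
        lam = (lam + g) / r
    return lam
```

With g = 1 and r = 2, the sequence converges to 1; with r = 1 (no scaling), λ grows linearly forever.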

By using these three strategies, A3 can successfully solve most of the hard problems in the DIMACS benchmark suite.


Figure: Execution profiles of A2 with a tabu list, flat moves, and periodic scaling of λ on problem g, starting from a random initial point. Note that the average Lagrange-multiplier values are very close to the minimum and therefore overlap with the curve showing the minimum Lagrange-multiplier values. See the first profile figure for further explanation.

Experimental Results

In this section, we evaluate DLM using SAT benchmarks in the DIMACS archive. The archive comprises hundreds of easy and hard SAT problems, many with large numbers of variables and clauses.

Our DLM code was written in C. In our experiments, we have tested DLM on all satisfiable problems in the DIMACS archive. We have compared the performance of DLM with reported results in the literature on the following benchmark problems:

- Circuit-synthesis problems (ii) by Kamath et al., a set of SAT encodings of Boolean circuit-synthesis problems;
- Circuit-diagnosis problems (ssa), a set of SAT formulas based on circuit fault analysis;
- Parity-learning problems (par), a collection of propositional versions of parity-learning problems;
- Artificially generated SAT instances (aim);
- Randomly generated SAT instances (jnh);
- Large random satisfiable SAT instances (f);
- Hard graph-coloring problems (g);
- An encoding of the Towers-of-Hanoi problems (hanoi); and
- Gu's asynchronous-circuit-synthesis benchmarks (as) and technology-mapping benchmarks (tm).

In this section, we show experimental results on the three versions of our DLM implementation described earlier.

A1 sets all parameters constant throughout the runs, in order to avoid introducing random effects in the program and to allow easy reproduction of results. It works well on the aim problems, but not as well on others.

Table: Execution times of A in CPU seconds on a Sun SparcStation for one run of A, starting from the origin as the initial point, on some DIMACS benchmark problems.

Problem DLM A Problem DLM A

Identifier Time No. of Iter Identifier Time No. of Iter

ssa iia

ssa iib

ssa iic

ssa iid

yes iie aim

aim yes iib

aim yes iic

aim yes iid

parc iie

parc

A2 has no problem-dependent parameters to be tuned by users and generally works well on all the benchmark problems. However, it has difficulty in solving some larger and more difficult problems, including g, f, large par, and hanoi.

A3 has some parameters to be tuned in its complex control mechanisms. Although these parameters are problem-dependent, we have tried a few sets of parameters and found one set that generally works well. Results reported are based on the best set of parameters found. A3 solves some of the more difficult problems better than A2.

The preceding table shows the experimental results when A was always started from the origin. It shows execution times in CPU seconds and the number of iterations. In each iteration, either one variable was flipped or the λ values were updated. In each experiment, A succeeded in finding a feasible assignment. For most of the test problems, we have found the average time that A spent when started from the origin to be longer than the average time when started from randomly generated initial points. A possible explanation is that the distance between the origin and a local minimum is longer than the average distance between a randomly generated starting point and a nearby local minimum.

Table: Comparison of A's execution times in seconds, averaged over multiple runs, with respect to the published results of WSAT, GSAT, and the Davis-Putnam algorithm on some of the circuit-diagnosis problems in the DIMACS archive. (A: Sun SparcStation and an SGI Challenge with MIPS R processors; GSAT, WSAT, and DP: SGI Challenge with a MIPS R processor.)

Problem No of No of DLM A WSAT GSAT DP

Identifier Var Clauses SUN SGI Iter SGI SGI SGI

ssa

ssa

ssa

ssa

We have compared the performance of A with respect to the best known results on these benchmark problems. Most of our timing results were averaged over ten runs with randomly generated initial points, starting from a fixed seed in the random-number generator drand. Consequently, our results can be reproduced deterministically.

In the table above, we compare A with WSAT, GSAT, and the Davis-Putnam algorithm in solving the circuit-diagnosis benchmark problems. We present the average execution times and the average number of iterations of A, as well as the published average execution times of WSAT, GSAT, and the Davis-Putnam method. We did not attempt to reproduce the reported results of GSAT and WSAT, since those results may depend on initial conditions such as the seeds of the random-number generator and other program parameters. We ran A on an SGI Challenge so that our timing results can be compared to those of GSAT and WSAT. Our results show that A is approximately one order of magnitude faster than WSAT.

Based on a single-CPU SGI Challenge with a MIPS R processor at the University of Illinois National Center for Supercomputing Applications, we estimate empirically that it is slower than a Sun SparcStation for executing A to solve SAT benchmark problems. However, we did not evaluate the speed difference between this SGI Challenge and the SGI Challenge on which GSAT and WSAT were run.

Table: Comparison of A's execution times in seconds, averaged over multiple runs, with the published results on the circuit-synthesis problems, including the best known results obtained by GSAT, integer programming (IP), and simulated annealing (SA). (A: Sun SparcStation and an SGI Challenge with MIPS R processors; GSAT and SA: SGI Challenge with a MIPS R processor; integer programming: VAX.)

Problem No of No of DLM A GSAT IP SA

Identifier Var Clauses SUN SGI Iter SGI VAX SGI

iia

iib

iic

iid

iie

In the table above, we compare A with the published results of GSAT, integer programming, and simulated annealing on the circuit-synthesis problems. Our results show that A performs several times faster than GSAT.

In the next table, we compare the performance of the three versions of DLM with some of the best known results of GSAT on the circuit-synthesis, parity-learning, some artificially generated SAT, and some of the hard graph-coloring problems. The GSAT results used here are better than other published results. Our results show that DLM is consistently faster than GSAT on the ii and par problems, and that A is an order of magnitude faster than GSAT on some aim problems.

The table also shows the results of A3 on some g problems. Recall that A3 was developed to cope with large flat plateaus in the search space that confuse A2, which failed to find any solution within millions of iterations. Hansen and, later, Selman addressed this problem by using the tabu-search strategy. In a similar way, we have adopted this strategy in A3 by keeping a tabu list to prevent flipping the same variable back and forth. This led to better performance, although the performance is sensitive to the length of the tabu list. A3 performs comparably to GSAT on these g problems.

Table: Comparison of DLM's execution times in seconds, averaged over multiple runs, with the best known results obtained by GSAT on the DIMACS circuit-synthesis, parity-learning, artificially generated SAT, and graph-coloring problems. Results on A3 were based on fixed tabu-length, flat-region-limit, and λ-reset-interval settings, with λ reset when the reset interval is reached; the constant c was set per problem for the g instances. (A1, A2, A3: Sun SparcStation; GSAT: SGI Challenge, model unknown.)

Problem    No. of No. of   SUN  Success  SGI  Success
Identifier Var    Clauses  Sec  Ratio    Sec  Ratio

DLM A GSAT

aim yes

aim yes

aim yes

yes aim

DLM A GSAT

iib

iic

iid

iie

parc

parc

DLM A GSAT

g

g

g

g

In the tables that follow, we compare the performance of DLM with the best results of Grasp on the DIMACS benchmark problems. Grasp is a greedy randomized adaptive search procedure that can find good-quality solutions for a wide variety of combinatorial optimization problems.

Four implementations of Grasp were applied to solve five classes of DIMACS SAT problems: aim, ii, jnh, ssa, and par. Compared to GSAT, Grasp did better on the aim, ssa, and par problems, and worse on the ii and jnh problems. The results of the most efficient implementation of Grasp, Grasp-A, are used in our comparison. Grasp was run on an SGI Challenge computer with a MIPS R processor. The average CPU times of random runs of Grasp are shown in the tables, in which Grasp succeeded in all runs for each problem.

In the next two tables, we compare two versions of DLM with Grasp in solving the whole set of aim problems, which were randomly generated instances with a single satisfiable solution. For these problems, one version of DLM performs better than the other on average, and both are usually orders of magnitude faster than Grasp. Grasp does not have results on some of the larger aim instances.

In the subsequent tables, we compare A with Grasp in solving the ii, jnh, par, and ssa problems. Except for some small instances that both DLM and Grasp solve in no time, DLM is generally orders of magnitude faster than Grasp in solving the ii, jnh, and ssa problems. For the par problems, DLM A obtains results comparable to Grasp's in solving the par-c instances, but is worse in solving the plain par instances.

In addition to the plain par instances, DLM A1 and A2 have difficulty in solving the larger par, hanoi, and f problems. A later table shows some preliminary but promising results of A3 on some of the more difficult but satisfiable DIMACS benchmark problems. Compared with Grasp, A3 is better in solving the par-c instances but worse in solving the plain par problems.

Finally, we show in the last table our results of A on the as and tm problems in the DIMACS archive. The average time over runs is always under a second for these problems.

Table: Comparison of DLM's execution times in seconds over multiple runs with the published results of Grasp on the aim problems from the DIMACS archive. (DLM: Sun SparcStation; Grasp: SGI Challenge with a MIPS R processor. The success ratio for Grasp is always 1.)

DLM A DLM A Grasp

Problem Succ Sun CPU Seconds Succ Sun CPU Seconds Avg SGI

Identifier Ratio Avg Min Max Ratio Avg Min Max Seconds

aim yes

aim yes

aim yes

aim yes

aim yes

aim yes

aim yes

aim yes

aim yes

aim yes

aim yes

yes aim

aim yes

yes aim

aim yes

aim yes

aim yes

aim yes

aim yes

yes aim

aim yes

aim yes

yes aim

aim yes

aim yes

aim yes

aim yes

yes aim

aim yes

yes aim

aim yes

aim yes

Table: Comparison of DLM's execution times in seconds over multiple runs with the published results of Grasp on the aim problems from the DIMACS archive. (DLM: Sun SparcStation; Grasp: SGI Challenge with a MIPS R processor. The success ratio for Grasp is always 1.)

DLM A DLM A Grasp

Problem Succ Sun CPU Seconds Succ Sun CPU Seconds Avg SGI

Identifier Ratio Avg Min Max Ratio Avg Min Max Seconds

aim yes

aim yes

aim yes

yes aim

aim yes

yes aim

aim yes

aim yes

aim yes

aim yes

aim yes

yes aim

aim yes

aim yes

yes aim

aim yes

Table: Comparison of DLM A's execution times in seconds over multiple runs with the published results of Grasp on the ii problems from the DIMACS archive. (DLM A: Sun SparcStation; Grasp: SGI Challenge with a MIPS R processor. The success ratios for DLM A and Grasp are always 1.)

DLM A Grasp DLM A Grasp

Problem Sun CPU Seconds Avg SGI Problem Sun CPU Seconds Avg SGI

Id Avg Min Max Seconds Id Avg Min Max Seconds

iia iia

iia iia

iib iib

iib iib

iic iic

iid iid

iie iie

iia iia

iib iib

iic iic

iid iid

iie iie

iia iib

iib iib

iib iic

iic iic

iid iic

iid iid

iie iie

iie iie

iie

Table: Comparison of DLM A's execution times in seconds over multiple runs with the published results of Grasp on the jnh, par, and ssa problems from the DIMACS archive. (DLM A: Sun SparcStation; Grasp: SGI Challenge with a MIPS R processor; the success ratio for Grasp is always 1.)

DLM A Grasp

Problem Success Sun CPU Seconds Avg SGI

Identifier Ratio Avg Min Max Seconds

jnh

jnh

jnh

jnh

jnh

jnh

jnh

jnh

jnh

jnh

jnh

jnh

jnh

jnh

jnh

jnh

par

par

par

par

par

parc

parc

parc

parc

parc

ssa

ssa

ssa

ssa

Table: Comparison of DLM A3's execution times in seconds over multiple runs with the published results of Grasp on some of the more difficult benchmark problems from the DIMACS archive. The success ratio of Grasp is always 1. Program parameters: for all the problems, a fixed flat-region limit and λ-reset interval were used; the tabu length was set separately for the par, f, and hanoi problems. System configuration: DLM A3: Sun SparcStation; Grasp: SGI Challenge with a MIPS R processor.

DLM A Grasp

Problem Succ Sun CPU Seconds Average

Identifier Ratio Avg Min Max SGI Sec

par

par

par

par

par

par

par

par

par

par

parc

parc

parc

parc

parc

hanoi

f

f

f

Table: Execution times of DLM A in Sun SparcStation CPU seconds over multiple runs on the as and tm problems from the DIMACS archive.

Problem Success Sun CPU Seconds Problem Success Sun CPU Seconds

Id Ratio Avg Min Max Id Ratio Avg Min Max

as as

as as

as as

as as

as as

as as

as

tm tm

To summarize, we have presented the application of Novel, based on the discrete Lagrangian method (DLM), to solving SAT problems. DLM belongs to the class of incomplete methods that attempt to find a feasible assignment if one exists, but will not terminate if the problem is infeasible.

We formulate a SAT problem as a discrete constrained optimization problem in which local minima in the Lagrangian space correspond to feasible assignments of the SAT problem. Searching for saddle points therefore corresponds to solving the SAT problem.

We first extend the theory of Lagrange multipliers for continuous problems to discrete problems. With respect to SAT problems, we define the concept of saddle points, derive the Discrete Saddle-Point Theorem, propose methods to compute discrete gradients, and apply DLM to look for saddle points. We show the Discrete Fixed-Point Theorem, which guarantees that DLM will continue to search until a saddle point is found. We further investigate various heuristics in implementing DLM.

We have compared the performance of DLM with respect to the best existing methods for solving SAT benchmark problems archived in DIMACS. Experimental results show that DLM can solve these benchmark problems, often faster than other local-search methods. We have not been able to solve the remaining DIMACS benchmark problems, which include some of the larger par and par-c instances and hanoi.

Solving Maximum Satisfiability Problems

In this section, we apply DLM to solve a more general class of satisfiability problems: the weighted maximum satisfiability (MAX-SAT) problem. The MAX-SAT problem is a discrete optimization problem. These problems are difficult to solve due to the large number of local minima in their search space.

We formulate a MAX-SAT problem as a discrete constrained optimization problem and present the discrete Lagrangian method (DLM) for solving MAX-SAT problems. DLM provides an efficient way of searching in a discrete space. Instead of restarting from a new starting point when a search reaches a local minimum, the Lagrange multipliers in DLM provide a force to lead the search out of the local minimum and move it in the direction provided by the Lagrange multipliers. We further investigate issues in implementing DLM to solve MAX-SAT problems. Our method dynamically adjusts the magnitudes of the Lagrange multipliers and clause weights to find better solutions. Since our method has very few algorithmic parameters to be tuned, the search procedure can be made deterministic and the results reproducible. In our experiments, we compare DLM with Grasp in solving a large set of test problems and show that it finds better solutions and is substantially faster.

Previous Work

As presented earlier, the satisfiability (SAT) problem is defined as follows. Given a set of m clauses {C_1, C_2, ..., C_m} on n variables x = (x_1, x_2, ..., x_n), x_i ∈ {0, 1}, and a Boolean expression in conjunctive normal form (CNF),

    C_1 ∧ C_2 ∧ ... ∧ C_m,

find an assignment to the variables so that the Boolean expression evaluates to true, or derive infeasibility if it is infeasible.

MAX-SAT is a general case of SAT. In MAX-SAT, each clause C_i is associated with a weight w_i. The objective is to find an assignment to the variables that maximizes the sum of the weights of satisfied clauses:

    max_{x ∈ {0,1}^n}  Σ_{i=1}^{m} w_i S_i(x),

where S_i(x) equals 1 if logical assignment x satisfies C_i, and 0 otherwise. This objective is equivalent to minimizing the sum of the weights of unsatisfied clauses. Based on this definition, SAT is a decision version of MAX-SAT in which all clauses have unit weights.
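The objective above, and its equivalence to minimizing the weight of unsatisfied clauses, can be checked on a toy instance; the clause set and weights below are invented for illustration:

```python
def weighted_scores(clauses, weights, y):
    """Return (weight of satisfied clauses, weight of unsatisfied clauses)
    for assignment y, where y[v] is the 0/1 value of variable v and clauses
    are lists of signed literals (+v true, -v false)."""
    sat = sum(w for cl, w in zip(clauses, weights)
              if any(y[abs(l)] == (l > 0) for l in cl))
    return sat, sum(weights) - sat
```

Maximizing the first component is the same as minimizing the second, since the two always sum to Σ w_i.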

MAX-SAT problems are difficult to solve for the following reasons. First, they have a large number of local minima in their search space, where a local minimum is a state whose local neighborhood does not include any state that is strictly better. Second, the weights in the objective function of a MAX-SAT problem can lead to a much more rugged search space than that of the corresponding SAT problem. When a MAX-SAT problem is satisfiable, existing SAT algorithms developed to find a satisfiable assignment to the corresponding SAT problem can be applied, and the resulting assignment is also optimal for the MAX-SAT problem. However, this approach does not work when the MAX-SAT problem is not satisfiable. In this case, existing SAT local-search methods have difficulties in overcoming the rugged search space.

Methods for solving MAX-SAT can be classified as incomplete or complete, depending on whether they can determine the optimal assignment. Complete algorithms can determine optimal assignments. They include mixed integer linear programming methods and various heuristics. They are generally computationally expensive and have difficulties in solving large problems.

On the other hand, incomplete methods are usually faster and can solve some large problems that complete methods cannot handle. Many incomplete local-search methods have been designed to solve large SAT problems with thousands of variables, and have obtained promising results in solving MAX-SAT problems.

Another approach to solving MAX-SAT is to transform it into continuous formulations. Discrete variables in the original problem are relaxed into continuous variables in such a way that solutions to the continuous problem are binary solutions to the original problem. This transformation is potentially beneficial because an objective in continuous space may smooth out some local minima. Unfortunately, continuous formulations require computationally expensive algorithms, rendering them applicable only to small problems.

In this section, we apply DLM to solve MAX-SAT problems. We formulate a MAX-SAT problem as a discrete constrained optimization problem:

    min_{x ∈ {0,1}^n}  N(x) = Σ_{i=1}^{m} w_i U_i(x)
    subject to  U_i(x) = 0,  ∀ i ∈ {1, 2, ..., m},

where w_i > 0, and U_i(x) equals 0 if the logical assignment x satisfies C_i, and 1 otherwise.

We first present DLM for solving MAX-SAT and investigate the issues and alternatives in implementing it. Then we present experimental results of DLM in solving a large set of MAX-SAT test problems.

DLM for Solving MAX-SAT Problems

The Lagrangian function for this constrained formulation is defined as follows:

    L_maxsat(x, λ) = N(x) + λ^T U(x) = Σ_{i=1}^{m} w_i U_i(x) + Σ_{i=1}^{m} λ_i U_i(x),

where x ∈ {0,1}^n, U(x) = (U_1(x), ..., U_m(x)) ∈ {0,1}^m, T denotes the transpose, and λ = (λ_1, ..., λ_m) denotes the Lagrange multipliers.

A saddle point (x*, λ*) of L_maxsat(x, λ) is defined as one that satisfies the following condition:

    L_maxsat(x*, λ) ≤ L_maxsat(x*, λ*) ≤ L_maxsat(x, λ*)

for all λ sufficiently close to λ* and for all x whose Hamming distance from x* is 1. The following version of DLM finds discrete saddle points of the Lagrangian function L_maxsat(x, λ).

Discrete Lagrangian Method (DLM) A_maxsat:

    x^{k+1} = x^k ⊕ Δ_x L_maxsat(x^k, λ^k)
    λ^{k+1} = λ^k + c U(x^k),

where c is a positive real number and ⊕ represents exclusive OR (XOR). We define the discrete gradient operator Δ_x L_maxsat(x, λ) with respect to x such that Δ_x L_maxsat(x, λ) = (δ_1, ..., δ_n) ∈ {0,1}^n with at most one δ_i nonzero, and it gives L_maxsat(x ⊕ Δ_x L_maxsat(x, λ), λ) the greatest reduction in the neighborhood of x within Hamming distance 1. Note that Δ_x L_maxsat(x, λ) is similar to the discrete gradient operator that we have defined for SAT problems.
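One iteration of the two update rules can be sketched in Python; the clause representation and the value of c are our choices for illustration, not the thesis's implementation:

```python
def dlm_maxsat_step(clauses, weights, lam, y, c=1):
    """One DLM iteration for MAX-SAT (a sketch): flip the single variable
    that most reduces L(x, lam) = sum_i (w_i + lam_i) * U_i(x); if no flip
    strictly improves, perform the multiplier update lam <- lam + c*U(x).
    Clauses are lists of signed literals; y[v] holds the value of variable v."""
    def L(z):
        return sum(w + l for cl, w, l in zip(clauses, weights, lam)
                   if not any(z[abs(t)] == (t > 0) for t in cl))

    base = L(y)
    best_v, best_val = None, base
    for v in range(1, len(y)):              # Hamming-distance-1 neighborhood
        y[v] ^= 1
        val = L(y)
        y[v] ^= 1
        if val < best_val:
            best_v, best_val = v, val
    if best_v is not None:
        y[best_v] ^= 1                      # x <- x XOR (delta_x L)
    else:
        for i, cl in enumerate(clauses):    # lam <- lam + c * U(x)
            if not any(y[abs(t)] == (t > 0) for t in cl):
                lam[i] += c
    return y, lam
```

When the instance is contradictory, no flip improves and only the multipliers of the violated clauses grow, which is exactly the force that later pushes the search away from the local minimum.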

There are two important properties of DLM A_maxsat.

First, it is easy to see that the necessary condition for algorithm A_maxsat to converge is U(x) = 0, implying that x is optimal. If any of the constraints in U(x) is not satisfied, then λ will continue to evolve to handle the unsatisfied constraints, and the search continues. In that case, a stopping criterion based on a time limit will be needed.

Second, DLM A_maxsat, modeled by the two update equations above, minimizes L_maxsat(x, λ) while λ is changing. Since U_i(x) is weighted by w_i in the objective, changing λ_i is equivalent to perturbing the weights in the original objective function. In finding better assignments while trying to satisfy all the clauses, DLM effectively changes the weights of unsatisfied clauses and modifies the L_maxsat(x, λ) to be searched. Since the optimal assignment depends on the relative weights in the objective, the change in weights in DLM is done in such a way as to maintain their relative magnitudes. In contrast, existing MAX-SAT methods explore the search space without modifying the weights. This feature is the major difference between existing methods and DLM.

Next, we discuss various considerations in implementing DLM. The figure below shows the pseudocode of A_maxsat, a generic DLM implementing these update rules, where c is a positive

    Set initial x randomly by a fixed random seed
    Set initial λ to be zero
    while x is not a feasible solution (i.e., N(x) > 0)
        update x:  x ← x ⊕ Δ_x L_maxsat(x, λ)
        update the incumbent if N(x) is better than the current incumbent
        if the condition for updating λ is satisfied then
            update λ:  λ ← λ + c U(x)
        end if
    end while

Figure: Generic DLM A_maxsat for solving MAXSAT problems.

constant that regulates the magnitude of the updates of λ. We define one iteration as one pass through the while loop. In the following, we describe some of our design considerations.

(a) Initial points. DLM is started either from the origin or from a random initial point generated using a fixed random seed. Further, λ is always set to zero. The fixed initial points allow the results to be reproduced easily.

(b) Descent and ascent strategies. There are two ways to calculate Δ_x L_maxsat(x, λ): greedy and hill-climbing, each involving a search in the range of Hamming distance one from the current x. Depending on the order of search and the number of assignments that can be improved, hill-climbing strategies are generally much faster than greedy strategies.

Among the various ways of hill-climbing, we used two alternatives in our implementation: flip the variables one by one in a predefined order, or maintain a list of variables that can improve the current L_maxsat(x, λ) and flip the first variable in the list. The first alternative is fast when the search starts: starting from a randomly generated initial assignment, it usually takes very few flips to find a variable that improves the current L_maxsat(x, λ). As the search progresses, there are fewer variables that can improve L_maxsat(x, λ). At this point, the second alternative becomes more efficient and should be applied.

(c) Conditions for updating λ. The frequency with which λ is updated affects the performance of a search. The considerations here are different from those in continuous problems. In a discrete problem, descents based on discrete gradients usually make small changes in L_maxsat(x, λ) in each update of x, because only one variable changes. Hence, λ should not be updated in every iteration of the search, to avoid biasing the search in the Lagrange-multiplier space of λ over the original variable space of x.

Experimental results show that a good strategy is to update λ only when Δ_x L_maxsat(x, λ) = 0. At this point, a local minimum in the original variable space is reached, and the search can only escape from it by updating λ.

(d) Amount of update of λ. A parameter c controls the magnitude of the changes in λ. In general, c can be a vector of real numbers, allowing non-uniform updates of λ across different dimensions and possibly across time. In our experiments, a fixed value of c was found to work well for most of the benchmarks tested; for some larger problems, however, a larger c results in shorter search times and better solutions.

The update rule for λ results in nondecreasing λ, because U_i(x) is either 0 or 1. In contrast, in continuous problems, the multiplier λ_i of constraint g_i(x) increases when g_i(x) > 0 and decreases when g_i(x) < 0.

From our previous experience in solving SAT problems using DLM, we know that some λ values for difficult problems can become very large after tens of thousands of iterations. At this point, the Lagrangian search space becomes very rugged, and the search has difficulty identifying an appropriate direction to move. To cope with this problem, the λ values should be reduced periodically to keep them small and to allow DLM to better identify good solutions.

The situation is worse in MAXSAT, because the weights of the clauses w can lie in a large range, making L_maxsat(x, λ) even more rugged and difficult to search. Here, w in MAXSAT plays a role similar to that of λ in SAT. Without any mechanism to reduce λ and w, L_maxsat(x, λ) can become very large and rugged as the search progresses.

This situation is illustrated in the upper two graphs of the figure below, which show the behavior of DLM when it is applied to solve two MAXSAT problems. The graphs show that the Lagrangian values vary over a very large range, which makes the Lagrangian space difficult to search.

Figure: Execution profiles of DLM on two jnh problems (left and right; one is satisfiable and the other is not), plotting the Lagrangian value and the objective against the number of iterations, without periodic scaling of λ and w (top) and with periodic scaling (bottom).

One way to overcome this problem is to reduce λ and w periodically. For instance, in the lower two graphs of the figure, w was scaled down by a constant factor at a fixed interval of iterations. This strategy reduces L_maxsat(x, λ) and restricts the growth of the Lagrange multipliers, leading to faster solutions for many test problems.
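The periodic reduction can be sketched as follows. The rule matches the one in the A_maxsat pseudocode later in this section, λ ← max(λ_b, λ/r) and w_i ← max(w_b, w_i/r); the interval, ratio, and base values used here are illustrative placeholders, since the thesis' exact settings are not recovered in this copy.

```python
def periodically_rescale(lam, w, iteration, interval=100, ratio=1.5,
                         lam_base=1.0, w_base=1.0):
    """Every `interval` iterations, shrink the Lagrange multipliers and clause
    weights toward base values so L_maxsat stays flat enough to search.
    Returns the (possibly unchanged) multiplier and weight lists."""
    if iteration == 0 or iteration % interval != 0:
        return lam, w                        # not a scaling step
    lam = [max(lam_base, li / ratio) for li in lam]
    w = [max(w_base, wi / ratio) for wi in w]
    return lam, w
```

Because the same ratio is applied to every clause, the relative magnitudes of the weights are preserved, which is the property the text identifies as essential.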

The figure below shows A_maxsat, our implementation of DLM to solve MAXSAT. A_maxsat uses the first hill-climbing strategy for descents at the beginning and switches to the second strategy later. A parameter θ, proportional to the number of variables n, is used to control the switch from the first strategy to the second.

A_maxsat updates λ only when a local minimum is reached. In our experiments, we have used a simple scheme that increases λ by a constant across all the clauses.

    Set initial x
    Set λ ← 0
    Set c
    Set θ proportional to n, where n is the number of variables
    Set reduction interval I
    Set reduction ratio r
    Set base value λ_b
    Set base weight value w_b
    while x is not a feasible solution (i.e., N(x) > 0) and
            the termination criteria are not met
        if the number of iterations > θ then
            Maintain a list l of variables such that,
                if one of them is flipped, L_maxsat(x, λ) will improve
            if l is not empty then
                Update x by flipping the first element of l
            else
                Update λ:  λ ← λ + c U(x)
            end if
        else
            if ∃ a variable v s.t. L_maxsat(x′, λ) < L_maxsat(x, λ) when flipping v,
                    trying the variables in a predefined order in x to get x′, then
                x ← x′
            else
                Update λ:  λ ← λ + c U(x)
            end if
        end if
        if (iteration index mod I) = 0 then
            Reduce λ and the weights of all clauses if possible,
                e.g., λ ← max(λ_b, λ/r),  w_i ← max(w_b, w_i/r)
        end if
        if x is better than the incumbent, keep x as the incumbent
    end while

Figure: A_maxsat, an efficient implementation of DLM.

A_maxsat controls the magnitude of L_maxsat(x, λ) by periodically reducing λ and w. More sophisticated adaptive mechanisms could be developed to achieve better performance.

Experimental Results

In our experiments, we evaluate DLM on a set of benchmarks and compare it with previous results obtained by GRASP. GRASP is a greedy randomized adaptive search procedure that has been shown to quickly produce good-quality solutions for a wide variety of combinatorial optimization problems, including SAT and MAXSAT.

The MAXSAT problems tested by GRASP were derived from the DIMACS SAT benchmark suite jnh. Some of them are satisfiable, while others are not; their clause weights are randomly generated integers. The benchmark problems can be obtained from

    ftp://netlib.att.com/math/people/mgcr/data/maxsat.tar.gz

Our DLM code was written in C. The experiments were run on a Sun SparcStation; the code was compiled using gcc with the -O option.

In our experiments, we first studied the effect of dynamic reduction of λ on L_maxsat(x, λ). We compared the results obtained by DLM with and without periodic reduction of λ. For each test problem, we ran DLM with scaling several times from random initial points, and DLM without scaling a comparable number of times, each run for a fixed number of iterations. We then took the best solution of the multiple runs as our final solution.

The tables below show the results obtained by DLM with and without periodic scaling. When λ was not reduced periodically, DLM found the optimal solutions for only a subset of the problems, based on the best results of the runs. When λ was reduced periodically, we tried several values of the reduction interval I and the reduction ratio r, and found significant improvements in solution quality in all but the smallest-interval case. With the best combination of I and r, more optimal solutions were found than without reduction, several test problems were improved, and no solution was worse. Overall, we found optimal solutions for most of the test problems, based on the best of the runs.

The tables below compare the solutions obtained by DLM and by GRASP. The GRASP results are from runs of a fixed number of iterations on an SGI Challenge

Table: Comparison of the results of DLM and those of GRASP with respect to the optimal solutions for the jnh problems (total weight Σ_i w_i). GRASP was run for a fixed number of iterations on an SGI Challenge with MIPS processors, whereas DLM with scaling was run several times, each for a fixed number of iterations from random starting points on a Sun SparcStation. The columns give the deviations from the optimal solution of the best result of DLM without scaling, of the runs of DLM with scaling, and of GRASP, together with the optimal solution. [Numeric entries not recovered.]

Table: Continuation of the preceding table: comparison of the results of DLM and those of GRASP with respect to the optimal solutions for the remaining jnh problems (total weight Σ_i w_i). [Numeric entries not recovered.]

Table: Comparison of the average CPU time in seconds per fixed number of iterations of DLM and GRASP on three jnh problems. [Numeric entries not recovered.]

with MIPS processors. In contrast, DLM with scaling was run repeatedly from random initial points, with a fixed number of iterations each. More runs are helpful in obtaining better solutions: additional optimal solutions were found when DLM with scaling was run more times from random initial points.

Our results show significant improvement over GRASP on all but four of the test problems. Of the remaining four, both DLM and GRASP found optimal solutions for three, and GRASP found a better solution in one. It is important to note that, although the differences between GRASP's results and the optimal solutions are relatively small, improvements in this range are the most difficult for any search algorithm.

The table above shows the average time in seconds per fixed number of iterations of GRASP and DLM. Even though GRASP was run on a faster machine, one iteration of DLM takes around three orders of magnitude less time than one iteration of GRASP.

GRASP found better results when the number of iterations was increased. Using the ten test problems that were run with longer GRASP runs, the table below compares the results of DLM and GRASP; it shows that DLM still found better solutions on all the problems except the cases in which both found optimal solutions.

Increasing the number of iterations of DLM also improves the best solutions found: based on the best of the runs, DLM found more of the optimal solutions when the iteration limit was increased.

To summarize, we have presented DLM for solving MAXSAT problems in this section. DLM belongs to the class of incomplete methods that attempt to find approximate solutions

Table: Comparison of the solutions found by DLM and those found by GRASP with longer runs, on ten jnh problems, together with the optimal solutions. [Numeric entries not recovered.]

in a fixed amount of time. DLM improves over existing discrete local- and global-search methods in the following ways.

First, DLM can escape from local minima in a continuous trajectory without restarts, hence avoiding a break in the trajectory as in methods based on restarts. This is advantageous when the trajectory is already in the vicinity of a local minimum and a random restart may bring the search to a completely different part of the search space.

Second, DLM provides a mechanism to control the growth rate and the magnitude of the Lagrange multipliers, which is essential in solving difficult MAXSAT problems.

Using a large set of test problems, we have compared the performance of DLM with that of GRASP, one of the best local-search methods for solving MAXSAT problems. Our experimental results show that DLM solves these problems two orders of magnitude faster than GRASP and finds better solutions for a majority of the test problems in just a few seconds.

Designing Multiplierless QMF Filter Banks

In this section, we apply the discrete Lagrangian method (DLM) to an engineering application: the design of multiplierless quadrature-mirror-filter (QMF) filter banks. In multiplierless QMF filter banks, the filter coefficients are powers-of-two (PO2): numbers are represented as sums or differences of powers of two, also called the canonical signed-digit (CSD) representation. Using this representation, multiplications become additions, subtractions, and shifts.

Based on a formulation similar to the real-value QMF design problem presented in an earlier chapter, we formulate the design problem as a nonlinear discrete constrained optimization problem, using the reconstruction error as the objective and the other performance metrics as constraints. One of the major advantages of this formulation is that it allows us to search for designs that improve on the best existing designs with respect to all performance metrics, rather than designs that trade one performance metric for another. We then apply DLM to solve this discrete optimization problem. In our experiments, we show that our method can find designs that improve over Johnston's benchmark designs using a maximum of three to six bits in each filter coefficient. The performance of these designs is close to that of the best continuous solutions found by Novel in an earlier chapter.

Introduction

Traditional FIR filters in QMF filter banks use real numbers or fixed-point numbers as filter coefficients. Multiplications by such long floating-point numbers generally limit the speed of FIR filtering. To overcome this limitation, multiplierless powers-of-two (PO2) filters have been proposed. These filters use coefficients that have only a few one bits. When multiplying a filter input (the multiplicand) by one such coefficient (the multiplier), the product can be found by adding and shifting the multiplicand a number of times corresponding to the number of one bits in the multiplier. For example, multiplying x by a coefficient with three one bits can be written as the sum of three shifted copies of x. A limited sequence of shifts and adds is usually much faster than a full multiplication. Without full multiplications, each filter tap takes less area to implement in VLSI, and more filter taps can be accommodated in a given area to implement filter banks of higher performance.
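The shift-and-add evaluation can be sketched as follows. The coefficient values in the example are our own (the original numeric example was not recovered): 13 = 2^3 + 2^2 + 2^0 and 7 = 2^3 − 2^0; integer left shifts stand in for the (typically fractional, right-shifted) filter coefficients.

```python
def po2_multiply(x, terms):
    """Multiply x by a PO2 coefficient given as a list of (sign, exponent) pairs.
    A coefficient sum_j s_j * 2**(e_j) is applied using only shifts and adds:
    e.g., 13 = 2**3 + 2**2 + 2**0 -> [(+1, 3), (+1, 2), (+1, 0)]."""
    return sum(sign * (x << e) for sign, e in terms)
```

For example, `po2_multiply(5, [(1, 3), (1, 2), (1, 0)])` computes 5·13 with two additions, and `po2_multiply(5, [(1, 3), (-1, 0)])` computes 5·7 with one subtraction, illustrating why the CSD form with few nonzero digits is cheap in hardware.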

The frequency response of a PO2 filter, H(x, z), is

    H(x, z) = Σ_{i=0}^{n−1} x_i z^{−i} = Σ_{i=0}^{n−1} ( Σ_{j=1}^{d} e_{ij} ) z^{−i}

where each e_{ij} is zero or a signed power of two, 2^{−l} ≤ |e_{ij}| ≤ 1 for nonzero e_{ij}; n is the length of the PO2 filter (the number of variables); l is the maximum number of shift bits used in each coefficient; and d is the number of one bits in each coefficient.

The design of multiplierless filters has been solved as integer programming problems that represent the filter coefficients as variables whose values are restricted to powers of two. Other optimization techniques that have been applied include combinatorial search methods, simulated annealing, genetic algorithms, linear programming, and continuous Lagrange-multiplier methods in combination with a tree search.

The design of QMF filter banks can be formulated as a multi-objective unconstrained optimization problem or as a single-objective constrained optimization problem. In this thesis, we take an approach similar to the design of real-value QMF filter banks in an earlier chapter and formulate the design of a PO2 QMF filter bank, in its most general form, as a constrained nonlinear optimization problem:

    min_{x ∈ D}  f(x) = E_r(x) / θ_{E_r}
    subject to   E_p(x) ≤ θ_{E_p},   E_s(x) ≤ θ_{E_s},
                 δ_p(x) ≤ θ_{δ_p},   δ_s(x) ≤ θ_{δ_s},
                 T_t(x) ≤ θ_{T_t}

where x is a vector of discrete coefficients in the discrete domain D; θ_{E_p}, θ_{E_s}, θ_{δ_p}, θ_{δ_s}, and θ_{T_t} are constraint bounds found in the best known design, with some bounds possibly relaxed or tightened in order to obtain designs with different trade-offs; and θ_{E_r} is the baseline value of the reconstruction error found in the best known design. The goal here is to find designs whose performance measures are better than or equal to those of the reference design. Since the objective and the constraints are nonlinear, the problem is multimodal with many local minima.

The inequality-constrained optimization problem can be transformed into an equality-constrained optimization problem as follows:

    Minimize    f(x) = E_r(x) / θ_{E_r}
    subject to  g_1(x) = max(0, E_p(x)/θ_{E_p} − 1) = 0
                g_2(x) = max(0, E_s(x)/θ_{E_s} − 1) = 0
                g_3(x) = max(0, δ_p(x)/θ_{δ_p} − 1) = 0
                g_4(x) = max(0, δ_s(x)/θ_{δ_s} − 1) = 0
                g_5(x) = max(0, T_t(x)/θ_{T_t} − 1) = 0

where the objective and all the constraints have been normalized with respect to the corresponding values of the best known design.
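The normalized violations can be computed as a small sketch (the dictionary-based interface and metric abbreviations are our own; the max(0, ·) form follows the transformation above):

```python
def violations(metrics, bounds):
    """g_i(x) = max(0, metric_i(x)/theta_i - 1): the normalized amount by which
    each performance metric of a candidate design exceeds the corresponding
    bound of the best known design. `metrics` and `bounds` map metric names
    (e.g., 'Ep', 'Es', 'dp', 'ds', 'Tt') to values; a satisfied constraint
    contributes exactly 0."""
    return {k: max(0.0, metrics[k] / bounds[k] - 1.0) for k in bounds}
```

Because every g_i is zero when its constraint is satisfied, the equality form g_i(x) = 0 holds precisely on the feasible region of the original inequality problem.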

DLM Implementation Issues

DLM was mainly implemented by Mr. Zhe Wu to solve the design problem of multiplierless filter banks. The Lagrangian function of the problem is

    F(x, λ) = f(x) + Σ_i λ_i g_i(x)

where the λ_i ≥ 0 are Lagrange multipliers. To apply DLM to this problem, we first define the discrete gradient operator as follows:

    Δ_x F(x, λ) = x ⊕ x′

where F(x′, λ) has the minimum value among all F(y, λ) for y sufficiently close to x. Here ⊕ is an operator for changing one point in the discrete space into another; an example of ⊕ is the exclusive-OR operator.

    Generate a starting point x
    Set the initial value of λ
    while x is not a saddle point and the stopping condition has not been reached
        update x to x′ only if this results in F(x′, λ) < F(x, λ)
        if the condition for updating λ is satisfied then
            λ_i ← λ_i + c g_i(x)    (c is real)
        end if
    end while

Figure: An implementation of DLM for solving the design problem of multiplierless filter banks.

The figure above shows our implementation of DLM for designing multiplierless filter banks. Mr. Zhe Wu addressed the following issues in implementing DLM.

Generating a Starting Point

There are two alternatives for selecting a starting point: choosing an existing PO2 QMF filter bank, or choosing a discrete approximation of an existing QMF filter bank with real coefficients. The first alternative is not practical because few such filter banks are available in the literature. In this section, we discuss the second alternative.

Given a real coefficient and b, the maximum number of bits to represent the coefficient, we first apply Booth's algorithm to represent each run of consecutive 1s using two signed bits, and then truncate the least significant bits of the coefficient. This approach generally allows a number needing more than b one bits to be represented in b bits. For example, a binary fixed-point number containing runs of consecutive 1s can, after Booth's recoding and truncation, be represented within the b-bit budget.
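The recoding step can be sketched for nonnegative integers with the standard non-adjacent-form algorithm, which produces the canonical signed-digit form that Booth's recoding yields (runs of 1s become a +1 and a −1); the truncation helper and parameter names are our own illustration, not the thesis' code.

```python
def csd_digits(n):
    """Canonical signed-digit (non-adjacent form) recoding of a nonnegative
    integer, least significant digit first, with digits in {-1, 0, +1}."""
    digits = []
    while n:
        if n & 1:
            d = 2 - (n % 4)      # +1 or -1, chosen so the next digit is 0
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits

def truncate_to_b_ones(digits, b):
    """Keep only the b most significant nonzero digits, zeroing the rest --
    a sketch of the truncation of least significant bits described above."""
    kept = 0
    out = [0] * len(digits)
    for i in range(len(digits) - 1, -1, -1):   # most significant first
        if digits[i] != 0 and kept < b:
            out[i] = digits[i]
            kept += 1
    return out
```

For instance, 7 (binary 111, three one bits) recodes to digits [−1, 0, 0, 1], i.e. 8 − 1, using only two nonzero digits.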

Previous work shows that scaling has a significant impact on the coefficient-optimization process in PO2 filters. In our case, the performance of a PO2 filter obtained

Table: Comparison of a PO2 filter bank obtained by truncating the real coefficients of Johnston's e QMF filter bank (Filter bank A) and a similar PO2 filter bank whose coefficients were scaled before truncation (Filter bank B). The performance metrics E_r, E_p, E_s, δ_p, δ_s, and T_t have been normalized with respect to the performance of the original filter bank. [Numeric entries not recovered.]

by truncating its real coefficients to a fixed maximum number of bits is not as good as one whose real coefficients were first multiplied by a scaling factor. We illustrate this observation with the following example.

Consider Johnston's e filter bank as a starting point. The table above shows the performance of two PO2 filters: Filter A was obtained by truncating each of the original coefficients to the maximum number of bits, and Filter B was obtained by multiplying each of the coefficients by a scaling factor before truncation. The example shows that Filter B performs almost as well as the original design with real coefficients. In fact, a design better than Johnston's e design can be obtained by using Filter B as a starting point, but no better designs were found using Filter A as a starting point. This example illustrates that multiplying by a scaling factor changes the bit patterns in the filter coefficients, which can improve the quality of the starting point when the coefficients are truncated.

The figure below shows a simple but effective algorithm for finding a proper scaling factor. We evaluate the quality of the resulting starting point by a weighted sum of the performance metrics based on constraint violations. Since, under most circumstances, the constraint on transition bandwidth is the most difficult to satisfy, we assign it a larger weight than the other four constraints. Our objective in finding a good scaling factor differs from that in previous work. Note that the filter output in the final design will need to be divided by the same scaling factor.

Experimental results show that the algorithm in the figure executes quickly, and the scaling factors chosen are reasonable and suitable. It is important to point out that scaling does not

    LeastSum ← ∞
    for scaleFactor from the lower to the upper limit, in fixed steps
        Multiply each filter coefficient by scaleFactor
        Get the PO2 form of the scaled coefficients
        Compute the weighted sum of the constraint violations:
            sum ← Σ_i w_i g_i(x)
        if sum < LeastSum then
            LeastSum ← sum
            BestScale ← scaleFactor
        end if
    end for
    return BestScale

Figure: Algorithm for finding the best scaling factor, where w_i is the weight of constraint i.

help when the number of bits allowed to represent each coefficient is large. For instance, when the maximum number of bits allowed is sufficiently large, the performance of all the filters is nearly the same for all scaling factors.

Updating x

The value of x is updated in the descent step of DLM in the figure above. There are two ways in which x can be updated: greedy descent and hill climbing. Hill climbing is more efficient and generally leads to good PO2 designs; therefore, we use hill climbing as our update strategy.

We process all the bits of all coefficients in a round-robin manner. Suppose n is the filter length and l is the maximum number of bits that can be used for each coefficient. Then the ith coefficient is composed of l bits, b_{i1}, b_{i2}, …, b_{il}. We process the bits repeatedly in the following order:

    b_{11}, b_{12}, …, b_{1l}, b_{21}, …, b_{nl}

For each bit b_{ij}, we perturb it to a new bit b′_{ij} that differs from b_{ij} in the sign, the exponent, or both, with the condition that b′_{ij} does not have the same sign and exponent as another bit of the ith coefficient. Using the perturbed coefficient, while keeping the other coefficients the same, we compute the new Lagrangian value F(x′, λ) and accept the change if F(x′, λ) < F(x, λ).

Initializing and Updating λ

The value of λ is initialized at the start of DLM in the figure above. To allow our experiments to be repeated and our results to be reproduced easily, we always set λ to zero initially.

The update condition in DLM governs when λ should be updated. As we know from solving SAT and MAXSAT problems, the λ values of violated constraints should be updated less frequently in DLM than in traditional Lagrangian methods for continuous problems. Thus, in our implementation, we update λ every time three coefficients have been processed. Since λ is updated before all the filter coefficients have been perturbed, the guidance provided by λ in our implementation may not be exact.

When updating λ before the search reaches a local minimum of F(x, λ), we set c in the λ-update step of DLM to a normalized value:

    c = λ_speed / max_i g_i(x)

where λ_speed is a real constant used to control the speed of increasing λ. A suitable value of λ_speed was determined experimentally.

When the search reaches a local minimum, a more efficient rule for updating λ, developed by Mr. Zhe Wu, is used to bring the search out of the local minimum: a proper value of c, different from the one above, is calculated. If λ is increased too fast, then the search effectively restarts from a random starting point. On the other hand, if the increase in λ is too small, then the local minimum is unchanged and all surrounding points are still worse. Hence, we would like to set c large enough that some of the neighboring points become better. This means that, after λ has been changed to λ′, there exists x′ in the neighborhood of x such that

    F(x′, λ) ≥ F(x, λ)   and   F(x′, λ′) < F(x, λ′).

Replacing F(x, λ) by f(x) + Σ_i λ_i g_i(x), we obtain the following conditions. Before λ changes,

    F(x′, λ) = f(x′) + Σ_i λ_i g_i(x′)  ≥  F(x, λ) = f(x) + Σ_i λ_i g_i(x)

and after λ is updated to λ′_i = λ_i + c g_i(x),

    F(x′, λ′) = f(x′) + Σ_i (λ_i + c g_i(x)) g_i(x′)
    F(x, λ′)  = f(x)  + Σ_i (λ_i + c g_i(x)) g_i(x)

where g_i(x′) is the new violation value of the ith constraint at x′. From these, the requirement F(x′, λ′) < F(x, λ′) becomes

    F(x′, λ) − F(x, λ) < c Σ_i g_i(x) (g_i(x) − g_i(x′))

that is,

    c > (F(x′, λ) − F(x, λ)) / Σ_i g_i(x) (g_i(x) − g_i(x′)).

When c is large enough to satisfy this condition, after updating λ to λ′ there exists some point in the neighborhood of x that has a smaller Lagrangian value.
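The bound just derived can be sketched as a small helper. The interface is our own: the Lagrangian difference F(x′, λ) − F(x, λ) and the violation vectors at x and x′ are passed in directly, and a non-positive denominator signals a neighbor that cannot be made better by this update.

```python
def min_c_to_escape(F_diff, g_x, g_xp):
    """Smallest c satisfying the derived inequality
        c > (F(x', lam) - F(x, lam)) / sum_i g_i(x) * (g_i(x) - g_i(x'))
    for one neighbour x'. `F_diff` is F(x', lam) - F(x, lam) (non-negative at a
    local minimum); `g_x` and `g_xp` are the violation vectors at x and x'.
    Returns None when the denominator is not positive, in which case raising
    the multipliers this way cannot make this neighbour better."""
    denom = sum(gi * (gi - gpi) for gi, gpi in zip(g_x, g_xp))
    if denom <= 0:
        return None
    return F_diff / denom
```

Any c strictly above the returned value makes the chosen neighbor x′ have a smaller Lagrangian value under the updated multipliers, which is exactly the escape condition in the text.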

As an example, the figure below shows the constraint violation of the transition bandwidth T_t in a typical search based on constraints derived from Johnston's e filter bank. The violation of T_t can become extremely small in the later part of the search. For such small violation values, the corresponding Lagrange multiplier λ_{T_t} increases very slowly without a large c. Using the value of c defined above, λ_{T_t} was increased significantly three times when the condition for updating λ was satisfied; these increases saved at least half of the total search time for the solution.

Figure: The violation values of T_t during a search (left) and the corresponding values of λ_{T_t} (right), plotted against the number of iterations.

Finally, we notice that λ in DLM is nondecreasing. This means that, as the search progresses, the λs grow and more emphasis is placed on getting the search out of local minima. This approach fails when the λs are so large that the λ update brings the search to totally new terrain. To cope with this problem, we measure the relative values of f(x) and Σ_i λ_i g_i(x) and keep them within a reasonable range:

    Σ_i λ_i g_i(x) / f(x) ≤ θ_threshold

If the ratio exceeds θ_threshold, we divide all the λs by a constant r_threshold to regain the balance. In our experiments, we fixed θ_threshold and r_threshold at constant values and checked the ratio every time λ is updated.
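This balance check can be sketched as follows; the default values of the threshold and the reduction constant are illustrative placeholders, since the thesis' settings are not recovered in this copy.

```python
def rebalance_multipliers(lam, g, f_val, threshold=10.0, r=2.0):
    """If sum_i lam_i * g_i(x) grows beyond `threshold` times f(x), divide
    every multiplier by r so the Lagrangian term stays balanced against the
    objective. Returns the (possibly unchanged) multiplier list."""
    if f_val > 0 and sum(li * gi for li, gi in zip(lam, g)) / f_val > threshold:
        lam = [li / r for li in lam]
    return lam
```

As with the periodic weight scaling in the MAXSAT section, dividing all multipliers by the same constant preserves their relative magnitudes while shrinking the penalty term.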

Experimental Results

In this section, we compare the performance of PO2 QMF filter banks designed by DLM with those designed by Johnston, by Chen et al., by Novel, by simulated annealing, and by genetic algorithms.

Figure: Normalized performance with respect to two of Johnston's e QMF filter banks (left and right) for PO2 filters with a fixed maximum number of bits per coefficient and different numbers of filter taps (metrics: passband ripple, passband energy, transition bandwidth, stopband ripple, stopband energy, reconstruction error).

There are two parameters in a PO2 filter-bank design: the maximum number of bits in each filter coefficient and the number of filter taps. In our experiments, we varied one while keeping the other fixed when evaluating a PO2 design against a benchmark design. We used closed-form integration to compute the values of the performance metrics; in contrast, Johnston used sampling to compute the energy values. Hence, designs found by Johnston are not necessarily at local minima in a continuous sense.

We evaluated PO2 designs obtained by DLM with respect to Johnston's designs, whose coefficients are long real numbers. Using the performance of Johnston's e design as constraints, we ran DLM from different starting points obtained by randomly perturbing the coefficients of Johnston's design. Each run was limited so that each bit of the coefficients was processed in round-robin fashion a fixed number of times. We then picked the best solution of the runs and plotted the results in the figure above, which shows the normalized performance of PO2 designs with an increasing number of filter taps while each filter coefficient has a fixed maximum number of bits. The best design is the one with the minimum reconstruction error if all the constraints are satisfied; otherwise, the one with the minimum violation is picked. Our results show a short design that is nearly as good as Johnston's e design. For filters with more taps, we used a starting point derived from Johnston's e design, with the filter coefficients first scaled and then truncated to the maximum number of bits, and with the coefficients of the remaining taps initially set to zeroes. The starting points of still longer filters were generated similarly, except that a different scaling factor was used. Our results show that, as the filter length is increased, all the performance metrics improve except the transition bandwidth, which remains close to that of the benchmark design.

Figure: Normalized performance with respect to Johnston's e QMF filter bank for PO2 filters with a fixed number of taps and different maximum numbers of one bits per coefficient (metrics: passband ripple, passband energy, transition bandwidth, stopband ripple, stopband energy, reconstruction error).

With respect to Johnston's e design, we set a limit so that each bit of the coefficients was processed in round-robin fashion a fixed number of times, and ran DLM once from the truncated Johnston's e design, using a scaling factor that depended on the filter length. Our results show that our shortest PO2 design is slightly worse than Johnston's, while our PO2 designs with more taps have performance that is the same as or better than that of Johnston's e design. In particular, the reconstruction errors of our two longest PO2 designs are small fractions of that of Johnston's e design.

In the next set of experiments, we kept the same number of taps as Johnston's e design and increased the maximum number of bits in each coefficient from three to six. We set a limit so that each bit of the coefficients was processed in round-robin fashion a fixed number of times, and ran DLM once from the truncated Johnston's e design. These runs yielded a design that is

Table: Comparison of the normalized performance of PO2 filter banks designed by DLM with respect to those designed by Johnston and by Chen et al. The first group of columns shows the performance of DLM normalized with respect to Johnston's e, d, and e filter banks; the second group shows the performance of DLM normalized with respect to two of Chen et al.'s filter banks. The rows are the metrics E_r, E_p, E_s, δ_p, δ_s, and T_t. [Numeric entries not recovered.]

better than Johnston's e design at the largest maximum number of bits per coefficient; in this case, the reconstruction error is a fraction of that of Johnston's e design. A different scaling factor was used for each maximum number of bits.

With respect to Johnston's d and e designs, the table above shows improved PO2 designs obtained by DLM using the larger maximum numbers of bits per coefficient; no improvements were found with fewer bits.

The table further shows improved PO2 designs obtained by DLM with respect to Chen et al.'s PO2 designs. For these, we used Chen et al.'s designs as starting points and ran DLM once, with a limit so that each bit was processed in round-robin fashion a fixed number of times.

Finally, we compare in the table below the performance of PO2 filter banks obtained by DLM against the performance of real-value filter banks obtained by Novel, simulated annealing (SA), and evolutionary algorithms (EAs) in an earlier chapter. Novel and DLM find solutions that are better than or equal to Johnston's solution in all six performance metrics, although DLM's solution has a larger reconstruction

Table …: Comparison of normalized performance of e PO QMF filter banks designed by DLM with respect to those designed by Novel, simulated annealing (SA), and genetic algorithms (EA-Ct and EA-Wt). (The numeric entries, comparing DLM, Novel, SA, EA-Ct, and EA-Wt on the metrics E_r, E_p, E_s, delta_p, delta_s, and T_t, were not recovered from the source.)

error than Novel's. SA and EA-Wt result in larger transition bandwidths than Johnston's. EA-Ct converges to designs with very small reconstruction errors, while other constraints are violated significantly. Considering the fact that the DLM design uses a maximum of two additions in each tap, rather than a complex carry-save adder or a …-bit multiplier, the design obtained by DLM has a much lower implementation cost and can achieve faster speed.

To summarize, we have applied DLM to design multiplierless powers-of-two PO QMF filter banks. Our design method is unique because it starts from a constrained formulation with the objective of finding a design that improves over the benchmark design. In contrast, existing methods for designing PO filter banks can only obtain designs with different trade-offs among the performance metrics and cannot guarantee that the final design is always better than the benchmark design with respect to all the performance metrics.

DLM effectively finds high-quality filter-bank designs with very few bits for each filter coefficient, allowing the design to be implemented at a fraction of the cost. For example, a design with a maximum of … bits per coefficient can be implemented with two additions in each tap. In contrast, a full …-bit multiplier is close to one order of magnitude more expensive in terms of hardware cost and computational delay.
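The two-additions-per-tap point can be made concrete. In a powers-of-two (PO) design, each coefficient is restricted to a sum of a small number of signed power-of-two terms, so multiplying a sample by a coefficient reduces to shifts and adds. The sketch below is illustrative only: the greedy quantizer stands in for the discrete search over coefficient bits that DLM actually performs, and all names and parameters are ours.

```python
def to_pot_terms(c, n_terms=2, min_exp=-12):
    """Greedily approximate coefficient c by up to n_terms signed
    power-of-two terms, returning (sign, exponent) pairs with
    c ~ sum(s * 2**e).  Illustrative stand-in for DLM's discrete
    search over coefficient bits."""
    terms, r = [], c
    for _ in range(n_terms):
        if r == 0:
            break
        # pick the signed power of two closest to the remaining residual
        _, s, e = min((abs(r - s * 2.0 ** e), s, e)
                      for s in (1, -1) for e in range(min_exp, 1))
        terms.append((s, e))
        r -= s * 2.0 ** e
    return terms

def pot_multiply(sample, terms):
    """Multiply an integer sample by the quantized coefficient using
    only shifts and adds (negative exponents shift right)."""
    acc = 0
    for s, e in terms:
        acc += s * (sample >> -e if e < 0 else sample << e)
    return acc
```

For example, the coefficient 0.625 quantizes exactly to 2^-1 + 2^-3, so a multiplication by it costs one shift-add pair instead of a full multiplier.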

Summary

In this chapter, we have applied our new discrete Lagrangian method (DLM) to solve discrete optimization problems, including the satisfiability (SAT), the maximum satisfiability (MAX-SAT), and the multiplierless QMF filter-bank design problems. These problems have all been formulated as discrete constrained optimization problems.

We extend the theory of Lagrange multipliers for continuous problems to discrete problems. With respect to problems in discrete space, we define the concept of saddle points, derive the Saddle-Point Theorem, propose methods to compute discrete gradients, and investigate various heuristics in implementing DLM. We show the Fixed-Point Theorem, which guarantees that the algorithm will continue to search until a saddle point is found. DLM belongs to the class of incomplete methods that attempt to find a saddle point if one exists but will not terminate if the problem is infeasible.

In our experiments, we have compared the performance of DLM with some of the best existing results in all three applications. Using a large set of SAT benchmark problems archived in DIMACS, we have shown that DLM usually solves these problems substantially faster than other competing local-search and global-search methods and can solve some difficult problems that other methods cannot. In solving MAX-SAT problems, we have compared the performance of DLM with that of GRASP, one of the best methods for solving MAX-SAT. Our experimental results show that DLM solves these problems two orders of magnitude faster and finds better solutions for a majority of the test problems in a few seconds. Finally, in designing multiplierless QMF filter banks, our experimental results show that DLM finds better designs than existing methods. Compared with real-value filter banks, the multiplierless filter banks designed by DLM have little performance degradation and use very few bits for each filter coefficient, allowing the design to be implemented at a fraction of the cost and to execute much faster.

To summarize, DLM is a generalization of local-search schemes that optimize the objective alone or the constraints alone. When the search reaches a local minimum, the Lagrange multipliers in DLM lead the search out of the local minimum and move it in the direction provided by the multipliers. DLM also uses the value of an objective function (the number of violated constraints in the case of SAT problems) to provide further guidance in addition to the constraint violations. The dynamic shift in emphasis between the objective and the constraints, depending on their relative values, is the key to Lagrangian methods. DLM improves over existing discrete local- and global-search methods in the following aspects:

- DLM can escape from local minima without random restarts. When a constraint is violated but the search is in a local minimum, the corresponding Lagrange multipliers in DLM provide a force that grows with the amount of time that the constraint is violated, eventually bringing the search out of the local minimum. DLM may rely on systematic restarts, not based on random numbers, in the Lagrangian space when the Lagrange multipliers grow large and the search space becomes rugged.

- DLM escapes from local minima in a continuous trajectory, hence avoiding a break in the trajectory as in methods based on restarts. This is advantageous when the trajectory is already in the vicinity of a local minimum and a random restart may bring the search to a completely different search space.

- DLM is more robust than other local-search methods and is able to find feasible solutions irrespective of its initial starting points. In contrast, descent methods have to rely on properly chosen initial points and on a good sampling procedure to find new starting points, or on added noise, in order to bring the search out of local minima.

In short, the Lagrangian formulation and our discrete Lagrangian methods are based on a solid theoretical foundation and can be used to develop better heuristic algorithms for solving discrete constrained optimization problems.
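To make these dynamics concrete, here is a minimal sketch of a DLM-style search on SAT, taking the objective as the number of violated clauses and keeping one Lagrange multiplier per clause. Variable flips perform descent on the discrete Lagrangian; when no flip improves it, the multipliers of the violated clauses are increased, which is the force that eventually pulls the search out of the local minimum. The update rules, increments, and data layout are illustrative, not the exact heuristics developed in this thesis.

```python
import random

def dlm_sat(clauses, n_vars, max_flips=10000, seed=0):
    """Minimal discrete Lagrangian method (DLM) sketch for SAT.
    Each clause is a list of nonzero ints: literal v means variable
    v-1 is true, -v means variable v-1 is false.  Returns a
    satisfying assignment (list of bools) or None."""
    rng = random.Random(seed)
    x = [rng.random() < 0.5 for _ in range(n_vars)]
    lam = [0.0] * len(clauses)              # one multiplier per clause

    def lagrangian():
        # L(x, lam) = sum over violated clauses of (1 + lam_i):
        # the objective (number of violations) plus weighted constraints
        return sum(1.0 + lam[i] for i, c in enumerate(clauses)
                   if not any(x[abs(l) - 1] == (l > 0) for l in c))

    for _ in range(max_flips):
        violated = [i for i, c in enumerate(clauses)
                    if not any(x[abs(l) - 1] == (l > 0) for l in c)]
        if not violated:
            return x                        # all constraints satisfied
        base, best_v, best_gain = lagrangian(), None, 0.0
        for v in range(n_vars):             # greedy descent in x
            x[v] = not x[v]
            gain = base - lagrangian()
            x[v] = not x[v]
            if gain > best_gain:
                best_gain, best_v = gain, v
        if best_v is not None:
            x[best_v] = not x[best_v]
        else:                               # local minimum: ascend in lam
            for i in violated:
                lam[i] += 1.0
    return None
```

Because the multipliers only grow while a clause stays violated, the search cannot settle in an assignment that leaves any clause unsatisfied, mirroring the saddle-point condition described above.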

CONCLUSIONS AND FUTURE WORK

Summary of Work Accomplished

Many important engineering and industrial applications can be formulated as nonlinear constrained or unconstrained optimization problems. Improved solutions to these problems can lead to significant savings. In this thesis, we have designed efficient methods to handle nonlinear constraints and to overcome local minima. Using these methods, we have improved existing nonlinear optimization methods in two ways: finding better solutions at the same cost, or finding solutions of similar quality at less cost.

To summarize, we have made the following contributions in this thesis:

- We have developed a new Lagrangian method with dynamic weight adaptation to handle equality and inequality constraints in nonlinear optimization problems and to improve the convergence speed and solution quality of traditional Lagrangian methods (Chapter …).

- We have extended the traditional Lagrangian theory for the continuous space to the discrete space and have developed efficient discrete Lagrangian methods. Our discrete Lagrangian theory provides the mathematical foundation of discrete Lagrangian methods, a foundation that does not exist in prior methods for handling nonlinear discrete constraints (Chapter …).

- We have developed an innovative trace-based global-search method to overcome local minima in a nonlinear function space. The trace-based global search relies on an external traveling trace to pull a search trajectory out of a local optimum in a continuous fashion, without having to restart it from a new starting point. Good starting points identified in the global search are used in the local search to identify true local optima (Chapter …).

- We have developed a prototype called Novel (Nonlinear Optimization Via External Lead) that incorporates our new methods and solves nonlinear constrained and unconstrained problems in a unified framework. In Novel, constrained problems are first transformed into Lagrangian functions. Then the nonlinear functions are searched for optimal solutions by a combination of global- and local-search methods (Chapter …).

- We have applied Novel to solve neural-network design problems and have obtained significant improvements in learning feed-forward neural networks. Artificial neural networks have many applications, but existing design methods are far from optimal. We formulate the neural-network learning problem as an unconstrained nonlinear optimization problem and solve it using Novel. Novel has designed neural networks with higher quality than existing methods, or has found much smaller neural networks with similar quality (Chapter …).

- We have applied Novel to design digital filter banks and have obtained improved solutions. Digital filter banks are used in many digital signal processing and communication applications. We formulate the design of filter banks as a constrained nonlinear optimization problem and solve it using Novel. Novel has found better designs than the best existing solutions across all the performance metrics (Chapter …).

- We have applied Novel to solve discrete problems, including the satisfiability problem (SAT), the maximum satisfiability problem (MAX-SAT), and the design of multiplierless filter banks. Many problems in computer science, management, and decision science can be formulated as discrete optimization problems. By formulating these problems (SAT, MAX-SAT, and the design of multiplierless filter banks) as discrete constrained optimization problems, Novel, particularly the discrete Lagrangian method (DLM), has obtained significant improvements over the best existing methods (Chapter …).
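The trace-based global search of the third contribution can be sketched in one dimension: the trajectory follows the negative gradient while being attracted toward an externally defined trace that sweeps the range, and points sampled along the trajectory seed ordinary gradient descents. The coefficients, the linear trace, and the Rastrigin-like test function below are our own illustrative choices, not the exact forms used by Novel.

```python
import math

def trace_search(f, grad, lo, hi, steps=20000, dt=0.001,
                 mu_g=1.0, mu_t=20.0):
    """One-dimensional sketch of a trace-based global search.

    x(t) follows -mu_g * f'(x) plus an attraction toward a trace T(t)
    that sweeps [lo, hi], so the trajectory moves through many basins
    without restarting; sampled points seed plain gradient descents."""
    x, candidates = lo, []
    for i in range(steps):
        trace = lo + (hi - lo) * i / steps       # the external moving trace
        x += dt * (-mu_g * grad(x) - mu_t * (x - trace))
        if i % (steps // 100) == 0:
            candidates.append(x)                 # starting points for local search

    def descend(x0):                             # local-search phase
        for _ in range(2000):
            x0 -= 0.001 * grad(x0)
        return x0

    return min((descend(c) for c in candidates), key=f)

# Rastrigin-like test function: global minimum f(0) = -1 among many local minima
f = lambda x: x * x - math.cos(18 * x)
g = lambda x: 2 * x + 18 * math.sin(18 * x)
best = trace_search(f, g, -1.0, 1.0)
```

As the trace passes the global basin, the trajectory is dragged into it, so one of the sampled starting points descends to the global minimum even though the search never restarts.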

Future Work

In this section, we present some possible avenues for future research.

- Development of trace-based global search for solving discrete optimization problems. We plan to extend the idea of our trace-based global search to solve discrete problems. Nonlinear discrete optimization problems have many local minima that trap local-search methods. A trace-based global search can help the search escape from local minima and avoid expensive local search in unpromising regions. We plan to study different forms of trace functions, identify their effectiveness, and investigate various implementation issues.

- Extension to mixed-integer programming problems. We have developed methods that solve continuous and discrete problems, respectively. Many application problems are mixed-integer programming problems, which have mixed variable domains, some continuous and some discrete. We plan to extend our methods to solve these problems.

- Applications of Novel. There are many applications that can be formulated as nonlinear optimization problems. The merits of our method lie in its ability to find better solutions for real-world applications. Different applications have different characteristics, requirements, and implementation considerations. The theoretical work and experimental results presented in this thesis provide a solid foundation for applying Novel to new applications.

APPENDIX A EXPERIMENTAL RESULTS OF NOVEL ON SOME TEST FUNCTIONS

In this appendix, we demonstrate Novel's capability of finding good solutions by solving some multidimensional nonlinear optimization problems. We first describe these problems, which include widely used multimodal, simple-bounded test functions with known optimal solutions. Then we report the experimental results of Novel and compare the results with those obtained by multistarts of gradient descent.

Six-Hump Camel Back (C):

f_C(x) = (4 - 2.1 x_1^2 + x_1^4 / 3) x_1^2 + x_1 x_2 + (-4 + 4 x_2^2) x_2^2

Domain: -5 <= x_i <= 5, i = 1, 2.

There are two global minima, f_C = -1.0316, at x = (0.0898, -0.7127) and x = (-0.0898, 0.7127). There are six local minima in the minimization region.

Branin (BR):

f_BR(x) = (x_2 - (5.1 / (4 pi^2)) x_1^2 + (5 / pi) x_1 - 6)^2 + 10 (1 - 1 / (8 pi)) cos x_1 + 10

Domain: -5 <= x_1 <= 10, 0 <= x_2 <= 15.

There are three global minima, f_BR = 0.398, at x = (-pi, 12.275), (pi, 2.275), and (9.42478, 2.475).

Goldstein-Price (GP):

f_GP(x) = [1 + (x_1 + x_2 + 1)^2 (19 - 14 x_1 + 3 x_1^2 - 14 x_2 + 6 x_1 x_2 + 3 x_2^2)] [30 + (2 x_1 - 3 x_2)^2 (18 - 32 x_1 + 12 x_1^2 + 48 x_2 - 36 x_1 x_2 + 27 x_2^2)]

Domain: -2 <= x_i <= 2, i = 1, 2.

The global minimum is f_GP = 3 at x = (0, -1). There are four local minima in the minimization region.

Shekel 5 (S5):

f_S5(x) = - sum_{i=1}^{5} 1 / ((x - a_i)^T (x - a_i) + c_i)

Domain: 0 <= x_j <= 10, j = 1, ..., 4.

Coefficients (standard Shekel values):

i    a_i              c_i
1    (4, 4, 4, 4)     0.1
2    (1, 1, 1, 1)     0.2
3    (8, 8, 8, 8)     0.2
4    (6, 6, 6, 6)     0.4
5    (3, 7, 3, 7)     0.4

The global minimum is f_S5 = -10.15 at x ~ (4, 4, 4, 4). There are five local minima, with f ~ -1/c_i, approximately at a_i.

Shekel 7 (S7):

f_S7(x) = - sum_{i=1}^{7} 1 / ((x - a_i)^T (x - a_i) + c_i)

Domain: 0 <= x_j <= 10, j = 1, ..., 4.

Coefficients (standard Shekel values):

i    a_i              c_i
1    (4, 4, 4, 4)     0.1
2    (1, 1, 1, 1)     0.2
3    (8, 8, 8, 8)     0.2
4    (6, 6, 6, 6)     0.4
5    (3, 7, 3, 7)     0.4
6    (2, 9, 2, 9)     0.6
7    (5, 5, 3, 3)     0.3

The global minimum is f_S7 = -10.40 at x ~ (4, 4, 4, 4). There are seven local minima, with f ~ -1/c_i, approximately at a_i.

Shekel 10 (S10):

f_S10(x) = - sum_{i=1}^{10} 1 / ((x - a_i)^T (x - a_i) + c_i)

Domain: 0 <= x_j <= 10, j = 1, ..., 4.

Coefficients (standard Shekel values):

i    a_i                c_i
1    (4, 4, 4, 4)       0.1
2    (1, 1, 1, 1)       0.2
3    (8, 8, 8, 8)       0.2
4    (6, 6, 6, 6)       0.4
5    (3, 7, 3, 7)       0.4
6    (2, 9, 2, 9)       0.6
7    (5, 5, 3, 3)       0.3
8    (8, 1, 8, 1)       0.7
9    (6, 2, 6, 2)       0.5
10   (7, 3.6, 7, 3.6)   0.5

The global minimum is f_S10 = -10.54 at x ~ (4, 4, 4, 4). There are ten local minima, with f ~ -1/c_i, approximately at a_i.

Rastrigin (R):

f_R(x) = x_1^2 + x_2^2 - cos(18 x_1) - cos(18 x_2)

Domain: -1 <= x_i <= 1, i = 1, 2.

The global minimum is f_R = -2 at x = (0, 0). There are approximately 50 local minima in the minimization region.

Griewank 2 (G2):

f_G2(x) = (x_1^2 + x_2^2) / 200 - cos(x_1) cos(x_2 / sqrt(2)) + 1

Domain: -100 <= x_i <= 100, i = 1, 2.

The global minimum is f_G2 = 0 at x = (0, 0). There are approximately 500 local minima in the minimization region.

Griewank 10 (G10):

f_G10(x) = sum_{i=1}^{10} x_i^2 / 4000 - prod_{i=1}^{10} cos(x_i / sqrt(i)) + 1

Domain: -600 <= x_i <= 600, i = 1, ..., 10.

The global minimum is f_G10 = 0 at x = (0, ..., 0). There are several thousand local minima in the minimization region.

Hartman 3 (H3):

f_H3(x) = - sum_{i=1}^{4} c_i exp( - sum_{j=1}^{3} a_ij (x_j - p_ij)^2 )

Domain: 0 <= x_j <= 1, j = 1, 2, 3.

Coefficients (standard Hartman-3 values):

i    a_i             c_i    p_i
1    (3, 10, 30)     1.0    (0.3689, 0.1170, 0.2673)
2    (0.1, 10, 35)   1.2    (0.4699, 0.4387, 0.7470)
3    (3, 10, 30)     3.0    (0.1091, 0.8732, 0.5547)
4    (0.1, 10, 35)   3.2    (0.0382, 0.5743, 0.8828)

The global minimum is f_H3 ~ -3.86 at x ~ (0.115, 0.556, 0.853).

Hartman 6 (H6):

f_H6(x) = - sum_{i=1}^{4} c_i exp( - sum_{j=1}^{6} a_ij (x_j - p_ij)^2 )

Domain: 0 <= x_j <= 1, j = 1, ..., 6.

Coefficients (standard Hartman-6 values):

i    a_i                          c_i
1    (10, 3, 17, 3.5, 1.7, 8)     1.0
2    (0.05, 10, 17, 0.1, 8, 14)   1.2
3    (3, 3.5, 1.7, 10, 17, 8)     3.0
4    (17, 8, 0.05, 10, 0.1, 14)   3.2

p_1 = (0.1312, 0.1696, 0.5569, 0.0124, 0.8283, 0.5886)
p_2 = (0.2329, 0.4135, 0.8307, 0.3736, 0.1004, 0.9991)
p_3 = (0.2348, 0.1451, 0.3522, 0.2883, 0.3047, 0.6650)
p_4 = (0.4047, 0.8828, 0.8732, 0.5743, 0.1091, 0.0381)

The global minimum is f_H6 ~ -3.32 at x ~ (0.202, 0.150, 0.477, 0.275, 0.312, 0.657).

Table A.1: A set of multimodal test functions.

Id   Name                  Dimension   Global minima   Local minima
1    Six-Hump Camel Back   2           2               6
2    Branin                2           3               3
3    Goldstein-Price       2           1               4
4    Shekel 5              4           1               5
5    Shekel 7              4           1               7
6    Shekel 10             4           1               10
7    Rastrigin             2           1               ~50
8    Griewank 2            2           1               ~500
9    Griewank 10           10          1               several thousand
10   Hartman 3             3           1               N/A
11   Hartman 6             6           1               N/A
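Two of these functions are easy to state in code. The snippet below gives our transcription of the standard published forms of the six-hump camel-back function and the general Griewank family (the 10-dimensional /4000 variant; the 2-dimensional variant above uses a different scaling), which can be used to check the listed global minima.

```python
import math

def camel(x1, x2):
    """Six-hump camel-back function, standard published form."""
    return (4 - 2.1 * x1**2 + x1**4 / 3) * x1**2 + x1 * x2 \
        + (-4 + 4 * x2**2) * x2**2

def griewank(x):
    """Griewank function in any dimension, standard /4000 form."""
    s = sum(xi * xi for xi in x) / 4000.0
    p = math.prod(math.cos(xi / math.sqrt(i + 1)) for i, xi in enumerate(x))
    return s - p + 1.0
```

Evaluating camel at the published minimizer (0.0898, -0.7127) gives about -1.0316, and griewank at the origin is exactly 0, matching the tabulated optima.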

Table A.1 summarizes these test functions. Most of them are low-dimensional problems, with dimensions ranging from 2 to 10. Problems … to … have only a few local minima, while Problems … and … have many. The numbers of local minima of Problems … and … are unknown.

First, we use multistarts of gradient descent to solve these problems. The results of multistarts reveal the degree of difficulty in finding global optima. For each problem, we run gradient descent … times from randomly generated initial points. Table A.2 shows the number of runs that find the optimal solutions. If the optimal solution is not found in the … runs, the best solution obtained is shown in parentheses. A large number of successes implies that the global optimum is easy to find by gradient descents from random initial points. Table A.2 shows that multistarts find the optimal solutions of Problems … through … and …. The optimal solutions of Problems … and … are not found.
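The multistart baseline is straightforward to sketch: run plain gradient descent from many random starting points and keep the best result. The step size, iteration counts, number of starts, and the two-dimensional Rastrigin-style test function below are illustrative choices, not the exact settings of these experiments.

```python
import math
import random

def multistart_descent(f, grad, lo, hi, dim, n_starts=300,
                       iters=2000, lr=1e-3, seed=0):
    """Gradient descent from n_starts random points; keep the best."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_starts):
        x = [rng.uniform(lo, hi) for _ in range(dim)]
        for _ in range(iters):
            g = grad(x)
            x = [xi - lr * gi for xi, gi in zip(x, g)]
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

# 2-D Rastrigin-style function: global minimum f = -2 at the origin
f = lambda x: x[0]**2 + x[1]**2 - math.cos(18 * x[0]) - math.cos(18 * x[1])
grad = lambda x: [2 * x[0] + 18 * math.sin(18 * x[0]),
                  2 * x[1] + 18 * math.sin(18 * x[1])]
x_best, f_best = multistart_descent(f, grad, -1.0, 1.0, 2)
```

Each descent converges to the local minimum of whatever basin its random start lands in, so the probability of reaching the global basin, and hence the success counts of Table A.2, falls as the number of local minima grows.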

A simple one-global-stage Novel is applied to solve these test functions. The trace-based global-search phase of Novel has one stage, which is run for … time units. Initial points for the local-search phase, which uses gradient descent, are selected based on the trajectory generated by the global-search stage. One initial point is selected in every time unit of global search. From these initial points, a total of … gradient descents are performed. For each problem, the range of each dimension is mapped to …, in which Novel performs the global search. Novel always starts from the origin if it is not a global optimum; otherwise, Novel starts from … in each dimension.

Table A.2: Experimental results of multistarts of gradient descent on the test functions. The number of runs that find the optimal solutions (out of … runs) is shown in the table. If the optimal solution is not found, the best solution obtained is shown in parentheses. (The numeric entries were not recovered from the source.)

Table A.3: Experimental results of Novel on the test functions. The index of the first local descent of Novel that finds the global minimum (out of … descents) is shown in the table. (The numeric entries were not recovered from the source.)

The execution times of Novel and multistarts are comparable. In Novel, the execution time of the global-search phase is only a small fraction of the time spent in gradient descents. One gradient descent in Novel takes a similar amount of time to one descent in multistarts. Hence, since both Novel and multistarts perform the same number of descents, the execution time of Novel is only slightly longer.

Table A.3 shows the index of the first local descent of Novel that finds the global optimal solution. For example, it takes one descent for Novel to find the optimal solution of Problem …, and it takes … descents to find the optimal solution of Problem …. For Problems … and …, the optimal solutions are not found by Novel; the best solutions obtained are shown in parentheses.

To summarize, Novel finds better solutions for the difficult problems than multistarts in a comparable amount of time. Novel finds the optimal solution of Problem …, while multistarts do not. For Problems … and …, Novel finds better solutions than multistarts, although both fail to find the optimal solutions. Both Novel and multistarts of gradient descent find optimal solutions for the easy problems (Problems … to … and Problems … and …).

REFERENCES

E. Aarts and J. Korst. Simulated Annealing and Boltzmann Machines. J. Wiley and Sons.
E. H. L. Aarts and P. J. van Laarhoven. Simulated Annealing: Theory and Practice. Wiley, New York.
J. Abadie and J. Carpentier. Generalization of the Wolfe reduced gradient method to the case of nonlinear constraints. In R. Fletcher, editor, Optimization. Academic Press, London.
A. N. Akansu and R. A. Haddad. Multiresolution Signal Decomposition. Academic Press, Inc., San Diego, CA.
N. Anderson and G. Walsh. A graphical method for a class of Branin trajectories. Journal of Optimization Theory and Applications.
L. Armijo. Minimization of functions having continuous partial derivatives. Pacific J. Math.
K. J. Arrow and L. Hurwicz. Gradient method for concave programming, I: Local results. In K. J. Arrow, L. Hurwicz, and H. Uzawa, editors, Studies in Linear and Nonlinear Programming. Stanford University Press, Stanford, CA.
M. Avriel. Nonlinear Programming: Analysis and Methods. Prentice-Hall.
M. R. Azimi-Sadjadi and R.-J. Liou. Fast learning process of multilayer neural networks using recursive least squares. IEEE Transactions on Signal Processing.
N. Baba, Y. Mogami, M. Kohzaki, Y. Shiraishi, and Y. Yoshida. A hybrid algorithm for finding the global minimum of error function of neural networks and its applications. Neural Networks.
E. Balas and A. Ho. Set covering algorithms using cutting planes, heuristics, and subgradient optimization: a computational study. Mathematical Programming Study.
E. Balas and P. Toth. Branch and bound methods. In E. L. Lawler, J. K. Lenstra, A. H. G. Rinnooy Kan, and D. B. Shmoys, editors, The Traveling Salesman Problem. Wiley, New York, NY.
P. Baldi. Gradient descent learning algorithm overview: a general dynamical systems perspective. IEEE Transactions on Neural Networks.
W. Baritompa. Accelerations for a variety of global optimization methods. Journal of Global Optimization.
W. Baritompa and A. Cutler. Accelerations for global optimization covering methods using second derivatives. Journal of Global Optimization.
E. Barnard. Optimization for training neural nets. IEEE Transactions on Neural Networks.
R. Battiti. First- and second-order methods for learning: between steepest descent and Newton's method. Neural Computation.
R. Battiti. The reactive tabu search. ORSA J. Computing.
R. Battiti and G. Tecchiolli. Parallel biased search for combinatorial optimization: genetic algorithms and TABU. Microprocessors and Microsystems.
R. Battiti and G. Tecchiolli. Training neural nets with the reactive tabu search. IEEE Transactions on Neural Networks.
Benchmarks on nonlinear mixed optimization problems, available through ftp: ftp://elib.zib-berlin.de/pub/mp-testdata/ip/index.html.
H. S. M. Beigi and C. J. Li. Learning algorithms for neural networks based on quasi-Newton with self-scaling. Intelligent Control Systems.
A. Bellen. Parallelism across the steps for difference and differential equations. In C. W. Gear, A. Bellen, and E. Russo, editors, Numerical Methods for Ordinary Differential Equations, Lecture Notes in Mathematics. Springer, Berlin.
B. Betro and F. Schoen. Sequential stopping rules for the multistart algorithm in global optimization. Mathematical Programming.
B. Betro and F. Schoen. Optimal and sub-optimal stopping rules for the multistart algorithm in global optimization. Mathematical Programming.
M. Bianchini, M. Gori, and M. Maggini. On the problem of local minima in recurrent neural networks. IEEE Transactions on Neural Networks.
L. G. Birta and O. Abou-Rabia. Parallel block predictor-corrector methods for ODEs. IEEE Trans. on Computers.
C. G. E. Boender and A. H. G. Rinnooy Kan. Bayesian stopping rules for multistart global optimization methods. Mathematical Programming.
C. G. E. Boender and A. H. G. Rinnooy Kan. On when to stop sampling for the maximum. Journal of Global Optimization.
C. G. E. Boender, A. H. G. Rinnooy Kan, L. Stougie, and G. T. Timmer. A stochastic method for global optimization. Mathematical Programming.
P. T. Boggs and J. W. Tolle. Sequential quadratic programming. Acta Numerica.
I. O. Bohachevsky, M. E. Johnson, and M. L. Stein. Generalized simulated annealing for function optimization. Technometrics.
A. D. Booth. A signed binary multiplication technique. Quart. J. Mech. Appl. Math.
C. G. Broyden. Quasi-Newton methods. In W. Murray, editor, Numerical Methods for Unconstrained Optimization. Academic Press, New York.
R. Brunelli. Training neural nets through stochastic minimization. Neural Networks.
K. Burrage. The search for the holy grail, or: predictor-corrector methods for solving ODE IVPs. Appl. Numer. Math.
R. H. Byrd, T. Derby, E. Eskow, K. P. B. Oldenkamp, R. B. Schnabel, and C. Triantafillou. Parallel global optimization methods for molecular configuration problems. In Proc. Sixth SIAM Conf. on Parallel Processing for Scientific Computation, Philadelphia.
R. H. Byrd, C. L. Dert, A. H. G. Rinnooy Kan, and R. B. Schnabel. Concurrent stochastic methods for global optimization. Mathematical Programming.
R. H. Byrd, E. Eskow, R. B. Schnabel, and S. L. Smith. Parallel global optimization: numerical methods, dynamic scheduling methods, and applications to molecular configuration. In B. Ford and A. Fincham, editors, Parallel Computation. Oxford University Press.
R. A. Caruana and B. J. Coffey. Searching for optimal FIR multiplierless digital filters with simulated annealing. Technical report, Philips Laboratories.
V. Cerny. Thermodynamical approach to the traveling salesman problem: an efficient simulation algorithm. Journal of Optimization Theory and Applications.
B. C. Cetin, J. Barhen, and J. W. Burdick. Terminal repeller unconstrained subenergy tunneling (TRUST) for fast global optimization. Journal of Optimization Theory and Applications.
B. C. Cetin, J. W. Burdick, and J. Barhen. Global descent replaces gradient descent to avoid local minima problem in learning with artificial neural networks. In IEEE Intl. Conf. on Neural Networks.
J. Chakrapani and J. Skorin-Kapov. Massively parallel tabu search for the quadratic assignment problem. Annals of Operations Research.
Y. Chang and B. W. Wah. Lagrangian techniques for solving a class of zero-one integer linear programs. In Proc. Computer Software and Applications Conf.
D. Chazan and W. L. Miranker. A nongradient and parallel algorithm for unconstrained minimization. SIAM Journal on Control.
C.-K. Chen and J.-H. Lee. Design of quadrature mirror filters with linear phase in the frequency domain. IEEE Trans. on Circuits and Systems II.
C.-K. Chen and J.-H. Lee. Design of linear-phase quadrature mirror filters with powers-of-two coefficients. IEEE Trans. Circuits Syst.
O. T. Chen and B. J. Sheu. Optimization schemes for neural network training. In Proc. IEEE Intl. Conf. on Neural Networks, Orlando.
A. Cichocki and R. Unbehauen. Switched-capacitor artificial neural networks for nonlinear optimization with constraints. In Proc. of IEEE Intl. Symposium on Circuits and Systems.
S. A. Cook. The complexity of theorem-proving procedures. In Proc. of 3rd Annual ACM Symposium on Theory of Computing.
S. A. Cook. An overview of computational complexity. Comm. of the ACM.
A. Corana, M. Marchesi, C. Martini, and S. Ridella. Minimizing multimodal functions of continuous variables with the simulated annealing algorithm. ACM Transactions on Mathematical Software.
C. D. Creusere and S. K. Mitra. A simple method for designing high-quality prototype filters for M-band pseudo-QMF banks. IEEE Trans. on Signal Processing.
M. R. Crisci, B. Paternoster, and E. Russo. Fully parallel Runge-Kutta-Nystrom methods for ODEs with oscillating solutions. Appl. Numer. Math.
A. Croisier, D. Esteban, and C. Galand. Perfect channel splitting by use of interpolation/decimation/tree decomposition techniques. In Proc. of Intl. Conf. on Information Science and Systems.
H. Crowder, E. L. Johnson, and M. W. Padberg. Solving large-scale zero-one linear programming problems. Operations Research.
E. D. Dahl. Accelerated learning using the generalized delta rule. In Proc. Intl. Conf. on Neural Networks, volume II. IEEE.
G. B. Dantzig. Linear Programming and Extensions. Princeton University Press.
A. Davenport, E. Tsang, C. Wang, and K. Zhu. GENET: a connectionist architecture for solving constraint satisfaction problems by iterative improvement. In Proc. of the National Conf. on Artificial Intelligence, Seattle, WA.
M. Davis and H. Putnam. A computing procedure for quantification theory. J. Assoc. Comput. Mach.
J. E. Dennis and J. J. More. Quasi-Newton methods: motivation and theory. SIAM Rev.
J. E. Dennis and R. B. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs, NJ.
I. Diener and R. Schaback. An extended continuous Newton method. Journal of Optimization Theory and Applications.
DIMACS SAT benchmarks, available through ftp: ftp://dimacs.rutgers.edu/pub/challenge/satisfiability/benchmarks/cnf.
L. C. W. Dixon. Neural networks and unconstrained optimization. In E. Spedicato, editor, Algorithms for Continuous Optimization: The State of the Art. Kluwer Academic Publishers, Netherlands.
J. Engel. Teaching feed-forward neural networks by simulated annealing. Complex Systems.
M. H. Er and C. K. Siew. Design of FIR filters using quadratic programming approach. IEEE Trans. on Circuits and Systems II.
S. Ergezinger and E. Thomsen. An accelerated learning algorithm for multilayer perceptrons: optimization layer by layer. IEEE Transactions on Neural Networks.
D. Esteban and C. Galand. Application of quadrature mirror filters to split-band voice coding schemes. In Proc. of IEEE Intl. Conf. on Acoustics, Speech and Signal Processing.
Y. G. Evtushenko, M. A. Potapov, and V. V. Korotkich. Numerical methods for global optimization. In C. A. Floudas and P. M. Pardalos, editors, Recent Advances in Global Optimization. Princeton University Press.
D. K. Fadeev and V. N. Fadeev. Computational Methods of Linear Algebra. Freeman.
S. E. Fahlman. Faster-learning variations on back-propagation: an empirical study. In D. Touretzky, G. E. Hinton, and T. J. Sejnowski, editors, Proc. Connectionist Models Summer School, Palo Alto, CA. Morgan Kaufmann.
S. E. Fahlman and C. Lebiere. The cascade-correlation learning architecture. In D. S. Touretzky, editor, Advances in Neural Information Processing Systems. Morgan Kaufmann, San Mateo, CA.
T. Feo and M. Resende. A probabilistic heuristic for a computationally difficult set covering problem. Operations Research Letters.
T. Feo and M. Resende. Greedy randomized adaptive search procedures. Technical report, AT&T Bell Laboratories, Murray Hill, NJ.
H. Fischer and K. Ritter. An asynchronous parallel Newton method. Mathematical Programming.
R. Fletcher. Practical Methods of Optimization. John Wiley and Sons, New York.
R. Fletcher and M. J. D. Powell. A rapidly convergent descent method for minimization. Computer Journal.
N. J. Fliege. Multirate Digital Signal Processing. John Wiley and Sons.
C. A. Floudas and P. M. Pardalos. A Collection of Test Problems for Constrained Global Optimization Algorithms. Lecture Notes in Computer Science. Springer-Verlag.
C. A. Floudas and P. M. Pardalos, editors. Recent Advances in Global Optimization. Princeton University Press.
C. A. Floudas and V. Visweswaran. Quadratic optimization. In R. Horst and P. M. Pardalos, editors, Handbook of Global Optimization. Kluwer Academic Publishers.
D. B. Fogel. An introduction to simulated evolutionary optimization. IEEE Trans. Neural Networks.
M. R. Garey and D. S. Johnson. Computers and Intractability. Freeman, San Francisco, CA.
B. Gavish and S. L. Hantler. An algorithm for optimal route selection in SNA networks. IEEE Transactions on Communications.
R. P. Ge and Y. F. Qin. A class of filled functions for finding global minimizers of a function of several variables. Journal of Optimization Theory and Applications.
M. R. Genesereth and N. J. Nilsson. Logical Foundations of Artificial Intelligence. Morgan Kaufmann.
I. Gent and T. Walsh. Towards an understanding of hill-climbing procedures for SAT. In Proc. of the National Conf. on Artificial Intelligence, Washington, DC.
A. M. Geoffrion. Lagrangian relaxation and its uses in integer programming. Mathematical Programming Study.
F. Glover. Tabu search, Part I. ORSA J. Computing.
M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. Assoc. Comput. Mach.
W. L. Goffe, G. D. Ferrier, and J. Rogers. Global optimization of statistical functions with simulated annealing. Journal of Econometrics.
D. E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley.
R. P. Gorman and T. J. Sejnowski. Analysis of hidden units in a layered network trained to classify sonar targets. Neural Networks.
D. Gorse and A. Shepherd. Adding stochastic search to conjugate gradient algorithms. In Proc. of the 3rd Intl. Conf. on Parallel Applications in Statistics and Economics, Prague.
S. M. Le Grand and K. M. Merz, Jr. The application of the genetic algorithm to the minimization of potential energy functions. Journal of Global Optimization.
A. O. Griewank. Generalized descent for global optimization. Journal of Optimization Theory and Applications.
J. Gu. Parallel Algorithms and Architectures for Very Fast AI Search. PhD thesis, Dept. of Computer Science, University of Utah.
J. Gu. How to solve very large-scale satisfiability (VLSS) problems. Technical Report UCECE-TR, Univ. of Calgary, Canada.
J. Gu. Efficient local search for very large-scale satisfiability problems. SIGART Bulletin.
J. Gu. On optimizing a search problem. In N. G. Bourbakis, editor, Artificial Intelligence Methods and Applications. World Scientific Publishers.
J. Gu. The UniSAT problem models (appendix). IEEE Trans. on Pattern Analysis and Machine Intelligence.
J. Gu. Local search for satisfiability (SAT) problems. IEEE Trans. on Systems, Man, and Cybernetics.
J. Gu. Global optimization for satisfiability (SAT) problems. IEEE Trans. on Knowledge and Data Engineering.
J. Gu. Constraint-Based Search. Cambridge University Press, New York, to appear.
J. Gu and Q.-P. Gu. Average time complexities of several local search algorithms for the satisfiability problem (SAT). Technical Report UCECE-TR, Univ. of Calgary, Canada.
J. Gu and Q.-P. Gu. Average time complexity of the SAT algorithm. Technical report, Univ. of Calgary, Canada.
J. Gu and W. Wang. A novel discrete relaxation architecture. IEEE Trans. on Pattern Analysis and Machine Intelligence.
E. R. Hansen. Global Optimization Using Interval Analysis. M. Dekker, New York.
P. Hansen and R. Jaumard. Algorithms for the maximum satisfiability problem. Computing.
S. J. Hanson and L. Y. Pratt. Comparing biases for minimal network construction with back-propagation. In D. Z. Anderson, editor, Proc. Neural Information Processing Systems, New York. American Inst. of Physics.
S. A. Harp, T. Samad, and A. Guha. Designing application-specific neural networks using the genetic algorithm. In D. S. Touretzky, editor, Advances in Neural Information Processing Systems. Morgan Kaufmann, San Mateo, CA.
L. He and E. Polak. Multistart method with estimation scheme for global satisfying problems. Journal of Global Optimization.
M. Held and R. M. Karp. The traveling salesman problem and minimum spanning trees: Part II. Mathematical Programming.
M. R. Hestenes. Conjugate Direction Methods in Optimization. Springer-Verlag.
D. Hilbert. Ueber stetige Abbildung einer Linie auf ein Flaechenstueck. Mathematische Annalen.
A. C. Hindmarsh. ODEPACK: a systematized collection of ODE solvers. In R. S. Stepleman et al., editors, Scientific Computing. North-Holland, Amsterdam.
J. H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor.
J. N. Hooker. Resolution vs. cutting plane solution of inference problems: some computational results. Operations Research Letters.
B. R. Horng and A. N. Willson, Jr. Lagrange multiplier approaches to the design of two-channel perfect-reconstruction linear-phase FIR filter banks. IEEE Trans. on Signal Processing.
R. Horst. A general class of branch-and-bound methods in global optimization with some new approaches for concave minimization. Journal of Optimization Theory and Applications.
R. Horst and H. Tuy. Global Optimization: Deterministic Approaches. Springer-Verlag, Berlin.
R. Horst and P. M. Pardalos, editors. Handbook of Global Optimization. Kluwer Academic Publishers.
E. C. Housos and O. Wing. Parallel nonlinear minimization by conjugate gradients. In Proc. Intl. Conf. on Parallel Processing, Los Alamitos, CA.
J. Hwang, S. Lay, M. Maechler, R. D. Martin, and J. Schimert. Regression modeling in back-propagation and projection pursuit learning. IEEE Transactions on Neural Networks.
L. Ingber. Adaptive Simulated Annealing (ASA). Lester Ingber Research.
R. A. Jacobs. Increased rates of convergence through learning rate adaptation. Neural Networks.
A. K. Jain and J. Mao. Artificial neural networks: a tutorial. IEEE Computer.
Y. Jiang, H. Kautz, and B. Selman. Solving problems with hard and soft constraints using a stochastic algorithm for MAX-SAT. In Proc. of 1st Intl. Joint Workshop on Artificial Intelligence and Operations Research, Timberline, Oregon.
J. D. Johnston. A filter family designed for use in quadrature mirror filter banks. In Proc. of Intl. Conf. on ASSP.
A. E. W. Jones and G. W. Forbes. An adaptive simulated annealing algorithm for global optimization over continuous variables. Journal of Optimization Theory and Applications.
S. Joy, J. Mitchell, and B. Borchers. A branch and cut algorithm for MAX-SAT and

weighted MAXSAT In Satisability Problem Theory and Applications DIMACS

Series on Discrete Mathematics and Theoretical Computer Science American Mathe

matical So ciety to app ear

J E Dennis Jr and V Torczon Direct search metho ds on parallel computers SIAM

Journal on Optimization

A P Kamath N K Karmarkar K G Ramakrishnan and M G C Resende Com

putational exp erience with an interior p oint algorithm on the satisability problem

Annals of Operations Research

A P Kamath N K Karmarkar K G Ramakrishnan and M G C Resende A

continuous approach to inductive inference Mathematical Programming

A H G Rinno oy Kan and G T Timmer Sto chastic metho ds for global optimization

American Journal of Mathematical and Management Sciences

R M Karp The probabilisti c analysis of some combinatorial search algorithms In

J F Traub editor Algorithms and Complexity New Direction and Recent Result

pages Academic Press New York NY

R M Karp Reducibility among combinatorial problems In RE Miller and J W

Thatcher editors Complexity of Computer Computations pages Plenum

Press NY

J A Kinsella Comparison and evaluation of variants of the conjugate gradientmeth

o ds for ecient learning in feedforward neural networks with backward error propa

gation Neural Networks

S Kirkpatrick Jr C D Gelatt and M PVecchi Optimization bysimulated anneal

ing Science

J Klincewicz Avoiding lo cal optima in the phub lo cation problem using tabu search

and GRASP Annals of Operations Research

Donald E Knuth The Art of Computer Programming Volume II AddisonWesley

Reading MA

D Ko dek and K Steiglitz Comparison of optimal and lo cal search metho ds for de

signing nite wordlength FIR digital lters IEEE Trans Circuits Syst

R D Koilpillai and PPVaidyanathan A sp ectral factorization aparoach to p esudo

QMF design IEEE Trans on Signal Processing January

R E Korf Linearspace b estrst search Articial Intel ligence

V Kumar and L Kanal The CDP A unifying formulation for heuristic search dy

namic programming and branchandb ound Search in Articial Intel ligen ceeds L

Kanal V Kumar pages

PJvan Laarhoven Parallel variable metric metho ds for unconstrained optimization

Mathematical Programming

R W Beckerand G W Lago A global optimization algorithm In Proc of the th

Al lerton Conf on Circuits and Systems Theory pages Monticello Illinois

L S Lasdon A D Warren A Jain and M Ratner Design and testing a generalized

reduced gradient co de for nonlinear programming ACM Trans Math Software

E L Lawler and D W Wood Branch and b ound metho ds A survey Operations

Research

Y Le Cun A theoretical framework for backpropagation In D TouretzkyGE

Hinton and T J Sejnowski editors Proc Connectionist Models Summer School

pages Palo Alto CA Morgan Kaufmann

A V LevyAMontalvo S Gomez and A Calderon Topics in Global Optimization

Lecture Notes in Mathematics No SpringerVerlag New York

Y C Lim and S R Parker FIR lter design over a discrete p oweroftwo co ecient

space IEEE Trans Acoust Speech Signal Processing ASSP June

F A Lo otsma Stateoftheart in parallel unconstrained optimization In M Feilmeir

E Joub ert and V Schendel editors Paral lel Computing pages North

Holland Amsterdam Holland

S Lucidi and M Piccioni Random tunneling by means of acceptancerejection

sampling for global optimization Journal of Optimization Theory and Applications

D G Luenb erger Introduction to Linear and Nonlinear Programming Wiley

D G Luenb erger Linear and Nonlinear Programming AddisonWesley Publishing

Company

F Maoli The complexityofcombinatorial optimization algorithms and the c hallenge

of heuristics Combinatorial Optimization pages

W S McCullo ch and W Pitts A logical calculus of the ideas immanentinnervous

activity Bul letin of Mathematical Biophysics

Z Michalewicz Genetic Algorithms Data Structure Evolution Programs

SpringerVerlag

Z Michalewicz Genetic algorithsm numerical optimization and constraints In L J

Eshelman editor Proc Intl Conf on Genetic Algorithms pages Morgan

Kaufmann

S Minton M D Johnson A B Philips and P Laird Minimizing conicts a heuristic

repair metho d for constraint satisfaction and scheduling problems Articial Intel li

gence Dec

D Mitchell B Selman and H Levesque Hard and easy distributions of SAT prob

lems In Proc of the th National Conf on Articial Intel ligence pages

J Mo ckus Bayesian Approach to Global OptimizationKluwer Academic Publishers

DordrechtLondonBoston

J Mo ckus Application of bayesian approachtonumerical metho ds of global and

sto chastic optimization Journal of Global Optimization

M S Moller A scaled conjugate gradient algorithm for fast sup ervised learning

Neural Networks

D J Montana and L Davis Training feedforward networks using genetic algorithms

In Proc Int Joint Conf Articial Intel ligence pages San Mateo CA

R Mo ore and E Hansen amd A Leclerc Rigorous metho ds for global optimization

In C A Floudas and PMPardalos editors Recent Advances in Global Optimization

pages Princeton University Press

P Morris The breakout metho d for escaping from lo cal minima In Proc of the th

National Conf on Articial Intel ligence pages Washington DC

S G Nash and A Sofer Blo ck truncatedNewton metho ds for parallel optimization

Mathematical Programming

S G Nash and A Sofer A generalpurp ose parallel algorithm for unconstrained

optimization SIAM Journal on Optimization

D S Nau V Kumar and L Kanal General branch and b ound and its relation to

Aand A O Articial Intel ligence

K Nayebi T PBarnwell I I I and M J T Smith Timedomain lter bank analysis

A new design theory IEEE Transactions on Signal Processing June

J A Nelder and R Mead A simplex metho d for function minimization Computer

Journal

O Nerrand P RousselRagot D Urbani L P ersonnaz and G Dreyfus Training re

current neural networks why and how An illustration in dynamical pro cess mo deling

IEEE Transactions on Neural Networks March

T Q Nguyen Digital lter bank design quadraticconstrained formulation IEEE

Trans on Signal Processing September

T Q Nguyen and PPVaidyanathan Twochannel p erfectreconstruction FIR QMF

structures which yield linearphase analysis and synthesis lters IEEE Trans on

Acoustics Speech and Signal Processing May

J M Ortega and W C Rheinb oldt Iterative Solution of Nonlinear Equations in

Several Variables Academic Press

E R Panier and A L Tits A sup erlinearly convergence feasible metho d for the

solution of inequality constrained optimization problems SIAM Journal on control

and Optimization

E R Panier and A L Tits On combining feasibilitydescent and sup erlinear conver

gence in inequlaity constrained optimization Mathematical Programming

C H Papadimitriou and K Steiglitz Combinatorial Optimization PrenticeHall

C H Papadimitriou and K Steiglitz Combinatorial Optimization Algorithms and

Complexity pages PrenticeHall

PMPardalos Complexity in numerical optimization World Scientic Singap ore

and River Edge NJ

PMPardalos and J B Rosen ConstrainedGlobal Optimization Algorithms and

Applicationsvolume of Lecture Notes in Computer Science SpringerVerlag

D B Parker Optimal algorithms for adaptive networks Second order back propa

gation second order direct propagation and second order Hebbian learning In Proc

Intl Conf on Neural Networksvolume I I pages IEEE

R G Parker and R L Rardin Discrete Optimization Academic Press Inc San

Diego CA

A G Parlos B Fernandez A F Atiya J Muth usami and W Tsai An accelerated

learning algorithm for multilayer p erceptron networks IEEE Transactions on Neural

Networks May

N R Patel R L Smith and Z B Zabinsky Pure adaptive search in Monte Carlo

optimization Mathematical Programming

G Peano Sur une courb e qui remplit toute une aire plaine Mathematische Annalen

B A Pearlmutter Gradient calculations for dynamic recurrent neural networks a

survey IEEE Transactions on Neural Networks Septemb er

M Piccioni Acombined multistartannealing algorithm for continuous global opti

mization Technical Rep ort Systems and Research Center The Universityof

Maryland College Park MD

F J Pineda Generalization of backpropagation to recurrent neural networks Phys

ical Review Letters

F J Pineda Generalization of backpropagation to recurrent neural networks Phys

ical Review Letters

V W Porto D B Fogel and L J Fogel Alternative neural network training metho ds

IEEE Expert pages June

D Powell and M M Skolnick Using genetic algorithms in engineering design opti

mization with nonlinear constraints In S Forrest editor Proc of the Fifth Intl Conf

on Genetic Algorithms pages Morgan Kaufmann

M J D Powell An ecient metho d for nding the minimum of a function of several

variables without calculating derivatives Computer Journal

B N Pschenichny Algorithms for general problems of mathematical programming

Kibernetica

BN Pshenichnyj The Linearization Method for Constrained Optimization Springer

Verlag

P W Purdom Search rearrangementbacktracking and p olynomial average time

Articial Intel ligen ce

MGC Resende and T Feo A GRASP for satisabili ty In D S Johnson and M A

Trick editors Cliques Coloring and Satisability Second DIMACS Implementation

Chal lengevolume pages DIMACS Series on Discrete Mathematics and

Theoretical Computer Science American Mathematical So ciety

MGC Resende LS Pitsoulis and P M Pardalos Approximate solution of weighted

MAXSAT problems using GRASP In Satisability Problem Theory and Applica

tions DIMACS Series on Discrete Mathematics and Theoretical Computer Science

American Mathematical So ciety

A J Robinson Dynamic Error Propagation Networks PhD thesis Cambridge Uni

versity

J A Robinson A machineoriented logic based on the resolution principle J Assoc

Comput Mach pages

H E Romeijn and R L Smith Simulated annealing for constrained global optimiza

tion Journal of Global Optimization September

D E Rumelhart GE Hinton and R J Williams Parallel distributed pro cessing

Explorations in the microstructure of cognition In Paral lel distributedprocessing Vol

pages MIT Press Cambridge MA

D E Rumelhart and J L McClelland editors Paral lel distributedprocessing vol

MIT Press Cambridge MA

H Samueli An improved search algorithm for the design of multiplierless FIR l

ters with p owersoftwo co ecients IEEE Transactions on Circuits and Systems

M S Sarma On the convergence of the Baba and Dorea random optimization meth

ods Journal of Optimization Theory and Applications

R S Scalero and N Tep edelenlioglu A fast new algorithm for training feedforward

neural networks IEEE Transactions on Signal Processing January

J D Schaer and L J Eshelman Designing multiplierless digital lters using genetic

algorithms In Proc Intl Conf on Genetic Algorithms pages San Mateo

CA Morgan Kaufmann

S Schaer and H Warsitz A tra jectoryfollowing metho d for unconstrained opti

mization Journal of Optimization Theory and Applications Octob er

I PSchagen Internal mo deling of ob jective functions for global optimization Journal

of Optimization Theory and Applications

R B Schnab el Parallel computing in optimization In K Schittkowsky editor Com

putational Mathematical Programming pages SpringerVerlag Berlin Ger

many

F Scho en Sto chastic techniques for global optimization A survey on recent advances

Journal of Glob al Optimization

R Sebastiani Applying GSAT to nonclausal formulas Journal of Articial Intel li

genceResearch

T J Sejnowski and C R Rosenb erg Parallel networks that learn to pronounce English

text Complex Systems February

B Selman private communication

B Selman and H Kautz Domainindep endent extensions to GSAT Solving large

structured satisabili ty problems In Proc of the th Intl Joint Conf on Articial

Intel ligence pages

B Selman H Kautz and B Cohen Lo cal search strategies for satisability test

ing In Proc of the Second DIMACS Chal lenge Workshop on Cliques Coloring and

Satisability Rutgers University pages o ct

B Selman H Kautz and B Cohen Noise strategies for improving lo cal search In

Proc of the th National Conf on Articial Intel ligence pages Seattle WA

B Selman and H A Kautz An empirical study of greedy lo cal search for satisability

testing In Proc of the th National Conf on Articial Intel ligen ce pages

Washington DC

B Selman H J Levesque and D G Mitchell A new metho d for solving hard

satisabili ty problems In Proc of AAAI pages San Jose CA

Y Shang and B W Wah Global optimization for neural network training IEEE

Computer March

Y Shang and B W Wah A discrete Lagrangian based globalsearch metho d for

solving satisabili ty problems J of Global Optimization accepted to app ear

Y Shang and B W Wah Discrete Lagrangianba sed search for solving MAXSAT

problems In Proc Intl Joint Conf on Articial Intel ligence IJCAI accepted to

app ear Aug

S Shekhar and M B Amin Generalization by neural networks IEEE Transactions

on Know ledge and Data Engineering April

JJ Shyu and YC Lin A new approach to the design of discrete co ecientFIR

digital lters IEEE Transactions on Signal Processing

D M Simmons Nonlinear Programming for Operations Research PrenticeHall

Englewo o d Clis NJ

S Singhal and L Wu Training multilayer p erceptrons with the extended Kalman

algorithm In D Z Anderson editor Proc Neural Information Processing Systems

pages New York American Inst of Physics

M J T Smith and T P Barnwell III Exact reconstruction techniques for tree

structured subband co ders IEEE Trans on Acoustics Speech and Signal Processing

June

S Smith E Eskow and R B Schnab el Adaptive asynchronous sto chastic global

optimization for sequential and parallel computation In T Coleman and Y Li editors

Large Scale Numerical Optimization pages SIAM Philadelphia

J A Snyman and L PFatti Amultistart global minimization algorithm with

dynamic search tra jectories Journal of Optimization Theory and Applications

July

R So cic and J Gu Fast search algorithms for the Nqueen problem IEEE Trans on

Systems Man and Cybernetics November

Ira j So dagar Kambiz Nayb ei and Thomas PBarnwell I I I Timevarying lter banks

and wavelets IEEE Trans on Signal Processing Novemb er

A K Soman PPVaidyanathan and T Q Nguyen Linear phase paraunitary

lter banks Theory factorizations and designs IEEE Trans on Signal Processing

Decemb er

Y Song Convergence and comparisons of waveform relaxation metho ds for initial

value problems of linear ODE systems Computing

R Sosic and J Gu A p olynomial time algorithm for the nqueens problem SIGART

Bul letin Octob er

R Sosic and J Gu queens in less than one minute SIGART Bul letin

April

R Sosic and J Gu Ecient lo cal search with conict minimization A case study of

the nqueens problem IEEE Trans on Know ledge and Data Engineering

R Spaans and R Luus Imp ortance of searchdomain reduction in random opti

mization Journal of Optimization Theory and Applic ations December

J Sprave Linear neighb orho o d evolution stategies In Proc of the thirdAnnual Conf

on Evolutionary Programming San Diego CA World Scientic

R E Steuer Multiple Criteria Optimization Theory Computation and Application

Krieger Publishing Company

E G Sturua and S K Zavriev A tra jectory algorithm based on the gradient

metho d I the search on the quasioptimal tra jectories Journal of Global Optimiza

tion

C Sutti Nongradient minimization metho ds for parallel pro cessing computers Part

Journal of Optimization Theory and Applications

Z Tang and G Ko ehler Deterministic global optimal FNN training algorithms Neu

ral Networks

A Torn Global optimization as a combination of global and lo cal search Gothenburg

Business Adm Studies

A Torn Cluster analysis using seed p oints and density determined hyp erspheres as

an aid to global optimization IEEE Trans Syst Men and Cybernetics

A Torn and S Viitanen Top ographical global optimization In C A Floudas

and PMPardalos editors Recent Advances in Global Optimization pages

Princeton University Press

A Torn and A Zilinskas Global Optimization SpringerVerlag

A Torn and A Zilinskas Global Optimization SpringerVerlag

E Tsang Foundations of Constraint satisfaction Academic Press San Diego CA

T E Tuncer and T Q Nguyen General analysis of twoband QMF banks IEEE

Trans on Signal Processing February

PPVaidyanathan Multirate digital lters lter banks p olyphase networks and

applications A tutorial Proc of the IEEE January

PPVaidyanathan Multirate Systems and Filter Banks PrinticeHall Inc

D Vanderbilt and S G Louie AMonte Carlo simulated annealing approachto

optimization over contin uous variables Journal of Computational Physics

T L Vincent B S Goh and K L Teo Tra jectoryfollowing algorithms for minmax

optimization problems Journal of Optimization Theory and Applications

Decemb er

A Zilinskas A review of statistical mo dels for global optimization Journal of Global

Optimization

B W Wah and YJ Chang Tracebased metho ds for solving nonlinear global opti

mization problems J of Global Optimization March

B W Wah and Y Shang A discrete Lagrangian based globalsearch metho d for

solving satisabili ty problems In DingZhu Du Jun Gu and Panos Pardalos edi

tors Proc DIMACS Workshop on Satisability Problem Theory and Applications

American Mathematical So ciety March

B W Wah and Y Shang A new global optimization metho d for neural network

training In Proc Intl Conf on Neural Networksvolume PlenaryPanel and Sp ecial

Sessions pages Washington DC June Plenary Sp eech IEEE

B W Wah Y Shang T Wang and T Yu Global optimization design of QMF lter

banks In Proc IEEE Midwest Symposium on Circuits and SystemsIowa CityIowa

accepted to app ear Aug IEEE

B W WahYShangTWang and T Yu Global optimization of QMF lterbank

design using Novel In Proc Intl Conf on Acoustics Speech and Signal Processing

volume pages IEEE April

B W Wah Y Shang and Z Wu Discrete Lagrangian metho d for optimizing the

design of multiplierless QMF lter banks In Proc Intl Conf on Application Specic

Array Processors IEEE accepted to app ear July

B W WahTWang Y Shang and Z Wu Improving the p erformance of weighted

Lagrangemultiple metho ds for constrained nonlinear optimization In Proc th Intl

Conf on Tools for Articial Intel ligence IEEE invited Nov

G R Walsh Methods of Optimization John Wiley and Sons

F H Walters L R Parker S L Morgan and S N Deming Sequential simplex

optimizationCRC Press

T Wang and B W Wah Handling inequality constraints in continuous nonlinear

global optimization Journal of Integrated Design and Process Science accepted to

app ear

Tao Wang and B W Wah Handling inequality constraints in continuous nonlinear

global optimization In Proc nd World ConferenceonIntegrated Design and Process

Technologyvolume pages Austin TX Decemb er

D Whitley T Starkweather and C Bogart Genetic algorithms and neural networks

Optimizing connections and connectivity Paral lel Computation

R J Williams and J Peng An ecient gradientbased algorithm for online training

of recurrentnetworks tra jectories Neural Computation

R J Williams and D Zipser A learning algorithm for continually running fully

recurrent neural networks Neural Computation

PWolfe Metho ds of nonlinear programming In J Abadie editor Nonlinear Pro

gramming pages John Wiley New York

PWolfe Convergence conditions for ascent metho ds SIAM Rev

J W Wo o ds editor Subband Image Coding Kluwer Academic Publishers

Ting Yu QMF Filter Bank Design Using Nonlinear Optimization MSc Thesis Dept

of Computer Science Univ of Illinois Urbana IL May

X Yu G Chen and S Cheng Dynamic learning rate optimization of the backprop

agation algorithm IEEE Transactions on Neural Networks May

Z B Zabinsky and R L Smith Pure adaptive search in global optimization Mathe

matical Programming

Z B Zabinsky et al Improving hitandrun for global optimization Journal of Global

Optimization

S Zhang and A G Constantinides Lagrange programming neural networks

IEEE Transactions on Circuits and SystemsII Analog and Digital Signal Processing

VITA

Yi Shang received his B.S. degree in computer science from the University of Science
and Technology of China, Hefei, China, and his M.S. degree in computer science from the
Institute of Computing Technology, Academia Sinica, Beijing, China. He is currently a
candidate for the degree of Doctor of Philosophy in Computer Science at the University
of Illinois at Urbana-Champaign. After completing his doctoral degree, he will become
an assistant professor in the Department of Computer Engineering and Computer Science,
University of Missouri, Columbia, Missouri.

His research interests include optimization, machine learning, artificial intelligence, and
distributed and parallel systems.