
Proceedings of the 7th Annual ISC Graduate Research Symposium
ISC-GRS 2013
April 24, 2013, Rolla, Missouri

AUTOMATED GENERATION OF BENCHMARKS WITH HIGH DISCRIMINATORY POWER FOR SPECIFIC SETS OF BLACK BOX SEARCH ALGORITHMS

Matthew Nuckolls Department of Computer Science Missouri University of Science and Technology, Rolla, MO 65409

ABSTRACT

Determining the best black box search algorithm (BBSA) to use on any given optimization problem is difficult. It is self-evident that one BBSA will perform better than another, but determining a priori which BBSA will perform better is a task for which we currently lack theoretical underpinnings. A system could be developed to employ heuristic measures to compare a given optimization problem to a library of benchmark problems, where the best performing BBSA is known for each problem in the library. This paper describes a methodology for automatically generating benchmarks for inclusion in that library, via evolution of NK-Landscapes.

1. INTRODUCTION

Some BBSAs lend themselves to straightforward generation of a benchmark problem. For example, a Hill Climber search algorithm, faced with a hill leading to a globally optimal solution, can be expected to rapidly and efficiently climb the hill, and can furthermore be expected to reach the top faster than an algorithm that considers the possibility that downhill may lead to a better solution. For other BBSAs, however, constructing a benchmark problem for which that BBSA will outperform all others is a non-trivial task. Imperfect understanding of the interactions between the multitude of moving parts in a modern search algorithm leads to an imperfect understanding of what sorts of problems any given BBSA is best suited for.

A method by which a benchmark problem can be automatically generated to suit an arbitrary BBSA would allow the user to assemble a library of benchmark problems. A first step towards building a benchmark where an arbitrary BBSA beats all other BBSAs is a benchmark where an arbitrary BBSA beats all other BBSAs in a small set.

2. BLACK BOX SEARCH ALGORITHMS

Several standard BBSAs were chosen for inclusion in the set, spanning a spectrum of behaviors. By finding benchmark problems for a variety of BBSAs, the validity of this methodology for creating a library of benchmark problems is strengthened.

Random Search (RA) simply generates random individuals and records their fitness until it runs out of evaluations. The individual found with the highest fitness is deemed optimal. This BBSA is not expected to beat any other BBSA; however, one consequence of the No Free Lunch Theorem [1] is that we should be able to find a benchmark problem on which none of the other included BBSAs can do better than Random Search.

Hill Climber (HC) is a steepest ascent restarting hill climber. It starts from a random location, and at each time step it examines all of its neighbors and moves to the neighbor with the highest fitness. Should it find itself at a peak (a point at which all neighbors are downhill), it starts over from a new random location. Note that this algorithm is vulnerable to plateaus, and will wander instead of restart.

At each time step, Simulated Annealing [2] (SA) picks a random neighbor. If that neighbor has a higher fitness than the current location, then SA moves to that location. If that neighbor has a lower or equal fitness, then SA still may move to that location, with probability determined by a 'cooling schedule'. Earlier in the run, SA is more likely to move downhill. SA does not restart. As implemented in this paper, SA uses a linear cooling schedule so that it is more likely to explore at the beginning of the run and more likely to exploit at the end.

As implemented for this paper, Evolutionary Algorithm [3] (EA) is a mu + lambda (mu=100, lambda=10) evolutionary algorithm, using linear ranking (s=2.0) stochastic universal sampling for both parent selection and survival selection.
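The paper describes its SA variant only in prose. As a concrete illustration, the following is a minimal sketch of simulated annealing with a linear cooling schedule over bit strings, assuming single-bit-flip neighbors, a Metropolis-style acceptance rule for non-improving moves, and a caller-supplied `evaluate` function; these specifics are assumptions, not the author's code.

```python
import math
import random

def simulated_annealing(evaluate, n_bits, max_evals):
    """Sketch of the SA described above: no restarts, linear cooling.

    `evaluate` maps a bit list to a real-valued fitness (higher is
    better). The Metropolis acceptance rule is an assumption; the paper
    only says downhill moves are taken with a probability set by the
    cooling schedule.
    """
    current = [random.randint(0, 1) for _ in range(n_bits)]
    current_fit = evaluate(current)
    best, best_fit = current[:], current_fit

    for step in range(1, max_evals):
        temp = 1.0 - step / max_evals             # linear cooling: ~1 -> 0
        neighbor = current[:]
        neighbor[random.randrange(n_bits)] ^= 1   # flip one random bit
        neighbor_fit = evaluate(neighbor)

        delta = neighbor_fit - current_fit
        # Uphill moves are always taken; downhill (or equal) moves are
        # taken with a probability that shrinks as the run cools.
        if delta > 0 or random.random() < math.exp(delta / max(temp, 1e-9)):
            current, current_fit = neighbor, neighbor_fit
            if current_fit > best_fit:
                best, best_fit = current[:], current_fit
    return best, best_fit
```

Early in the run (temperature near 1) downhill moves are accepted far more readily than at the end, matching the explore-then-exploit behavior described above.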
3. N-K LANDSCAPES

When using NK-landscapes [4], typically the experiment is set up to use a large number of randomly generated landscapes to lessen the impact of any one landscape on the results. This line of research, however, is explicitly searching for fitness landscapes that have an outsized impact. Each NK-landscape generated is evaluated on its ability to discriminate amongst a set of search algorithms. A high scoring landscape is one that shows a clear preference for one of the search algorithms, such that the chosen algorithm consistently finds a better solution than all other algorithms in the set. This is determined via the following methodology.

3.1. Implementation

In this paper, NK-landscapes are implemented as a pair of lists. The first list is the neighbors list. The neighbors list is n elements long, where each element is a list of k+1 integer indexes. Each element of the i'th inner list is a neighbor of i, and will participate in the calculation of that part of the overall fitness. An important implementation detail is that the first element of the i'th inner list is i, for all i. Making the element under consideration part of the underlying data (as opposed to a special case) simplifies and regularizes the code, an important consideration when metaprogramming is used. A second important implementation detail is that no number may appear in a neighbor list more than once. This forces the importance of a single index point to be visible in the second list, allowing for easier analysis. The first list is called the neighborses list, to indicate the nested plurality of its structure.

The second list is the subfunctions list. The subfunctions list is used in conjunction with the neighborses list to determine the overall fitness of the individual under evaluation. The subfunction list is implemented as a list of key value stores, of length n. Each key in the key value store is a binary tuple of length k+1 (one bit per index in the inner list), with every possible such tuple represented in every key value store. For example, if k is 1, then the possible keys are (0, 0), (0, 1), (1, 0), and (1, 1). The values for each key are real numbers, both positive and negative.

3.2. Evaluation of a Bit String Individual

To evaluate an individual, the system runs down the pair of lists simultaneously. For each element in neighborses, it extracts the binary values of the individual at the listed indexes in the first list. It assembles those binary values into a single tuple. It then looks at the corresponding subfunction key value store in the subfuncs list and finds the value associated with that tuple. The sum of the values found for each element in the pair of lists is the fitness of that individual in the context of this NK-landscape.
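A minimal sketch of this pair-of-lists representation and the evaluation walk, written in Python since the paper's list-of-key-value-stores description maps directly onto it; the constructor, the function names, and the [-1, 1] value range are illustrative assumptions.

```python
import itertools
import random

def random_nk_landscape(n, k):
    """Build the pair of lists described above (assumes k < n).

    neighborses[i] holds k+1 distinct indexes whose first element is i
    itself; subfuncs[i] maps every possible (k+1)-bit tuple to a real
    value. Drawing values uniformly from [-1, 1] is an assumption.
    """
    neighborses = [[i] + random.sample([j for j in range(n) if j != i], k)
                   for i in range(n)]
    subfuncs = [{key: random.uniform(-1.0, 1.0)
                 for key in itertools.product((0, 1), repeat=k + 1)}
                for _ in range(n)]
    return neighborses, subfuncs

def evaluate(individual, neighborses, subfuncs):
    """Run down both lists simultaneously: gather the individual's bits
    at each inner list's indexes into a tuple, look that tuple up in
    the matching key value store, and sum the values found."""
    return sum(table[tuple(individual[j] for j in indexes)]
               for indexes, table in zip(neighborses, subfuncs))
```

For instance, `random_nk_landscape(20, 3)` yields twenty inner lists of four indexes and twenty 16-entry lookup tables, and `evaluate` sums one table lookup per bit position.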
Part of the design consideration for this structure was ease of metaprogramming for CUDA [5]. The various components of the lists plug into a string template of C++ code, which is then compiled into a CUDA kernel. This kernel can then be run against a large number of individuals simultaneously. This approach is not expected to be as fast as a hand-tuned CUDA kernel that pays proper respect to the various memory subsystems available; however, it has been shown to be faster than running the fitness evaluations on the CPU, given a sufficiently large number of individuals in need of evaluation.

3.3. Evolutionary Operators

The search algorithm chosen to guide the modification of the NK-landscapes is a canonical mu + lambda evolutionary algorithm, with stochastic universal sampling used for both parent selection and survival selection. Such an algorithm needs to be able to mutate an individual, perform genetic crossover between individuals, and determine the fitness of an individual. Mutation and crossover are intrinsic to the representation of the individual, and will be covered first. Fitness evaluation is left for a later section.

Mutation of an NK-landscape is performed in three ways, and during any given mutation event all, some, or none of the three ways may be used. The first mutation method is to alter the neighborses list at a single location. This does not alter the length of the list, nor may it ever alter the first element in the list. The second mutation method is to alter the subfunction at a single location. All possible tuple keys are still found, but the values associated with those keys are altered by a random amount. The third mutation method alters k. When k is increased, each element of the neighborses list gains one randomly chosen neighbor, with care taken that no neighbor can be in the same list twice, nor can k ever exceed n. Increasing k by 1 doubles the size of the subfunction key value stores, since each key in the parent has two corresponding entries in the child: one ending in 0, the other ending in 1. For example, the key (0, 1) in the original NK-landscape would need corresponding entries for (0, 1, 0) and (0, 1, 1) in the mutated NK-landscape. This implementation starts with the value in the original key and alters it by a different random amount for each entry in the mutated NK-landscape.

When k is decreased, a single point in the inner lists is randomly chosen, and the neighbor found at that point is removed from each of the lists. Care is taken so that the first neighbor is never removed, so k can never be less than zero. The corresponding entry in subfuncs has two parents; for example, if the second point in the inner list is chosen, then both (0, 1, 0) and (0, 0, 0) will map to (0, 0) in the mutated NK-landscape. This implementation averages the values of the two parent keys for each index in the subfuncs list.

Genetic crossover is only possible in this implementation between individuals of identical n and k, via single point crossover of the neighborses and subfuncs lists. The system is therefore dependent on mutation to alter k, and holds n constant during any given system run.
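The k-altering mutation is the most intricate of the three operators. Below is a sketch of both directions, under the assumptions that landscapes are the (neighborses, subfuncs) pairs built above and that the perturbation (the paper says only "a different random amount") is a small uniform `noise`:

```python
import random

def increase_k(neighborses, subfuncs, n, noise=0.1):
    """k -> k+1: each inner list gains one new distinct neighbor, and
    every subfunction key splits into two child keys, one ending in 0
    and one ending in 1, each starting from the parent's value plus its
    own random perturbation (the `noise` scale is an assumption)."""
    # Inner lists hold distinct indexes drawn from range(n), so an
    # inner list can never grow past n entries; k can never exceed n.
    if len(neighborses[0]) >= n:
        return
    for i, indexes in enumerate(neighborses):
        indexes.append(random.choice([j for j in range(n) if j not in indexes]))
        subfuncs[i] = {key + (bit,): val + random.uniform(-noise, noise)
                       for key, val in subfuncs[i].items()
                       for bit in (0, 1)}

def decrease_k(neighborses, subfuncs):
    """k -> k-1: one position (never position 0, which holds i itself)
    is chosen, the neighbor at that position is dropped from every
    inner list, and the two parent keys differing only at that position
    collapse into one child key holding the average of their values."""
    width = len(neighborses[0])                   # width is k+1
    if width <= 1:                                # k can never drop below 0
        return
    p = random.randrange(1, width)
    for i, indexes in enumerate(neighborses):
        del indexes[p]
        merged = {}
        for key, val in subfuncs[i].items():
            child = key[:p] + key[p + 1:]
            merged[child] = merged.get(child, 0.0) + val / 2.0
        subfuncs[i] = merged
```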

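Single point crossover of two landscapes is comparatively simple; a sketch, again over the (neighborses, subfuncs) pairs assumed above:

```python
import random

def single_point_crossover(parent_a, parent_b):
    """Cut the neighborses and subfuncs lists of two landscapes with
    identical n and k at the same point and recombine them. Inner lists
    and tables are copied so mutating the child leaves the parents
    untouched."""
    (neigh_a, subf_a), (neigh_b, subf_b) = parent_a, parent_b
    cut = random.randrange(1, len(neigh_a))       # cut strictly inside
    child_neigh = [row[:] for row in neigh_a[:cut]] + \
                  [row[:] for row in neigh_b[cut:]]
    child_subf = [dict(t) for t in subf_a[:cut]] + \
                 [dict(t) for t in subf_b[cut:]]
    return child_neigh, child_subf
```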
3.4. Evaluation of NK-Landscape Fitness

The NK-Landscape manipulation infrastructure is used to evolve landscapes that clearly favor a given search algorithm over all other algorithms in a set. Accordingly, a fitness score must be assigned to each NK-Landscape in a population, so that natural selection can favor the better landscapes, guiding the meta-search towards an optimal landscape for the selected search algorithm.

This implementation defines the fitness of the NK-landscape as follows. First, the 'performance' of each search algorithm is found. The performance is defined as the mean of the fitness values of the optimal solutions found across several (n=30) runs. While performance is being calculated, all search algorithms also record the fitness of the worst individual they ever encountered in the NK-landscape. This provides a heuristic for an unknown value: the value of the worst possible individual in the NK-landscape. Once a performance value has been calculated for every search algorithm in the set, the performance values and the 'worst ever encountered' value are linearly scaled into the range [0, 1], such that the worst ever encountered value maps to zero and the best ever encountered value maps to one. This provides a relative measure of the performance of the various search algorithms, as well as allowing for fair comparisons between NK-landscapes.

4. NK-LANDSCAPE FITNESS COMPARISONS

Once each search algorithm has a normalized performance value, the system needs to judge how well this NK-landscape 'clearly favors a given search algorithm over all other algorithms in the set'. This implementation tried two approaches, one more successful than the other.

4.1. One versus All

The first heuristic used in this implementation was to calculate the set of differences between the performance of the favored algorithm and the performance of each of the other algorithms, and then find the minimum of the set of differences. The minimum of the set of differences is then used as the fitness of the NK-landscape. Note that the fitness ranges from negative 1 to positive 1. A fitness of positive 1 would correspond to an NK-landscape where the optimal individual found by every non-favored algorithm has fitness identical to the 'worst ever encountered' individual, while the favored algorithm finds any individual better than that.

This approach suffered because it needed a 'multiple coincidence' to make any forward progress. The use of the minimum function meant that an NK-landscape needed to clearly favor one algorithm over all others. Favoring a pair of algorithms over all others was indistinguishable from not favoring any algorithms at all, so there was no gradient for the meta-search algorithm to climb.

4.2. Pairwise Comparisons

Making pairwise comparisons, allowing the meta-search algorithm to simply compare the normalized optimal individuals between two search algorithms, proved to provide a better gradient. Since any change in the relative fitness of the optimal individuals was reflected in the fitness score of the NK-landscape, the meta-search had immediate feedback, not needing to cross plateaus of unchanging fitness.
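Both heuristics can be stated compactly. The sketch below assumes `performances` maps each BBSA's name to its mean best fitness over the 30 runs and `worst_ever` is the worst fitness any algorithm encountered; the function and parameter names are illustrative.

```python
def landscape_fitness(performances, worst_ever, favored, rival=None):
    """Scale performances so worst_ever -> 0 and the best value -> 1,
    then score the landscape. With rival=None this is the 'One versus
    All' heuristic (minimum margin over all other algorithms, ranging
    over [-1, 1]); naming a rival gives the 'Pairwise' heuristic."""
    best = max(performances.values())
    span = (best - worst_ever) or 1.0     # guard a completely flat landscape
    scaled = {name: (p - worst_ever) / span
              for name, p in performances.items()}

    if rival is not None:                 # pairwise: margin over one rival
        return scaled[favored] - scaled[rival]
    return min(scaled[favored] - p        # one-vs-all: worst-case margin
               for name, p in scaled.items() if name != favored)
```

The `min` on the last line is exactly what starves the meta-search of gradient: improving against one rival changes nothing unless the favored algorithm already leads every other rival as well.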
5. DISTRIBUTED ARCHITECTURE

Taking advantage of the parallelizable nature of evolutionary algorithms, this implementation used an asynchronous message queue (beanstalkd) and a web server (nginx) to distribute the workload across a small cluster of high performance hardware. Each of the four machines in the cluster has a quad-core processor and two CUDA compute cards. A worker node system was developed whereby the head node could place a job request for a particular search algorithm to be run against a particular NK-landscape. This job request was put into the message queue, to be delivered to the next available worker node. When the worker node received the request, it first checked whether it had a copy of the requested NK-landscape in a local cache. If not, the worker node used the index number of the NK-landscape to place a web request with the head node, download the data sufficient to recreate the NK-landscape, and place it in the local cache. The use of the web based distribution channel was necessary because the representation of an NK-landscape can grow very large, and the chosen message queue has strict message size limits. For the results presented in this paper, the distributed system ran using 16 worker nodes utilizing the CPU cores, plus another 8 worker nodes running their fitness functions on the CUDA cards.

Interestingly, the CUDA-based worker nodes did not outperform the CPU-based worker nodes. The CUDA-based fitness evaluation is very fast, but the number of individuals that need to be evaluated must be high before the speed difference becomes apparent, due to the need to move the individuals and kernels across the PCI bus. The exception is the random search algorithm, which was rewritten for CUDA-based evaluation. Since each of its evaluations is independent, all evaluations can happen in parallel.
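A worker node's main loop is small. The sketch below uses the `greenstalk` beanstalkd client and Python's standard `urllib`; the client library, the tube name, the head node's URL scheme, and the JSON job format are all assumptions, since the paper specifies only beanstalkd, nginx, and an index-keyed web lookup.

```python
import json
import urllib.request

import greenstalk  # one of several beanstalkd clients; the choice is an assumption

HEAD_NODE = "http://head-node"   # hypothetical nginx address
CACHE = {}                       # landscape index -> data to recreate it

def fetch_landscape(index):
    """Check the local cache; on a miss, download the data sufficient
    to recreate the NK-landscape from the head node over HTTP (the
    route and payload format are assumptions)."""
    if index not in CACHE:
        with urllib.request.urlopen(f"{HEAD_NODE}/landscape/{index}") as resp:
            CACHE[index] = json.load(resp)
    return CACHE[index]

def worker_loop():
    """Reserve job requests from the queue forever; each names a BBSA
    and a landscape index. Large landscapes travel over HTTP because
    beanstalkd enforces strict message size limits."""
    queue = greenstalk.Client(("head-node", 11300), watch="jobs")
    while True:
        job = queue.reserve()
        request = json.loads(job.body)    # e.g. {"bbsa": "HC", "landscape": 42}
        landscape = fetch_landscape(request["landscape"])
        # ... run the requested search algorithm against `landscape`,
        #     then report the result back to the head node ...
        queue.delete(job)
```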

6. RESULTS

Figures 1 through 4 show the results for evolutionary runs using the "One vs All" heuristic. For each figure, the bold red line indicates the fitness of the NK-Landscape in the population that best showcases the chosen BBSA. The bold black dashed line indicates the value of k for that best NK-Landscape. The thinner colored lines indicate the relative fitness of the best individual found by the other BBSAs. The vertical axis on the left measures the fitness of the colored lines, while the vertical axis on the right measures only the dashed black line. The horizontal axis shows how many fitness evaluations have elapsed. While each "One vs All" experiment was repeated 30 times, for clarity each of these figures shows the results of only a single randomly chosen representative run. All runs for each experiment exhibited similar behavior, and inclusion of error bars would unnecessarily clutter the graphs.

Figures 5 through 8 show the results for evolutionary runs using the "Pairwise" heuristic. In contrast to the previous figures, where each graph corresponds to a single evolutionary run, this set of figures combines three runs into each graph. Each graph shows three lines. Each line corresponds to a single randomly chosen representative run, from the set of 30 runs performed for each ordered pair of BBSAs. Each line in the graph shows how the performance of the BBSA in the title of the graph compared to the performance of the BBSA corresponding to that line, when applied to the NK-Landscape in the population that best discriminates between the two BBSAs. For clarity, the line showing the k value of the best NK-Landscape is omitted from these graphs.

7. DISCUSSION

In each experiment, the evolutionary process made forward progress; however, most runs fell short of the goal of finding a highly discriminatory NK-Landscape.

In the experiment shown for EA vs All in Figure 1, a landscape with k=7 was found that improved the performance of all search algorithms in the set, but EA improved the most. Later in the run a landscape was found that hurt the performance of all search algorithms in the set, but EA was hurt the least. No further gains were seen in the 1000 evaluations allocated.

The performance of Simulated Annealing is strongly dependent on its cooling schedule. As no attempt was made to optimize the cooling schedule for any given NK-Landscape, SA underperformed all other search algorithms in the set. Nevertheless, Figure 2 shows that the evolutionary process found an NK-Landscape that hurt the performance of other search algorithms more than it hurt the performance of SA, resulting in a net positive gain for SA. An NK-Landscape found later in the run proved to be easier for all algorithms to solve, while increasing k. This shows that while an increase in k may mean an increase in difficulty, it may also mean a decrease in difficulty, across all search algorithms. Fitness never became positive, however, meaning that the system could not find an NK-Landscape where SA beat all other search algorithms in the set.

Hill Climber performed unexpectedly well in this sequence of experiments, consistently beating all other search algorithms in the set. In the experiment shown in Figure 3 it is interesting to note that the performance of the non-competitive search algorithms did change during the run without altering the overall fitness, due to the "One versus All" heuristic. This shows that this heuristic is indeed vulnerable to the need to cross plateaus of unchanging fitness, blind to possible progress.

Random Search, as shown in Figure 4, was not expected to reach a fitness of 0. A fitness landscape where random search beat all other search algorithms would be an interesting landscape indeed.

Figure 5 shows the performance of EA versus each of the other search algorithms. It is interesting to contrast with Figure 1, where the performance of EA never surpassed that of HC. In Figure 5, we see EA rapidly passing HC and continuing to grow. Clearly the system is capable of evolving an NK-Landscape which favors EA over HC, so perhaps if the experiment in Figure 1 were repeated with vastly more evaluations allowed, the system would eventually wander across the unchanging fitness plateau and find the solutions found in Figure 5.

Figure 6 shows that evolving an NK-Landscape where Simulated Annealing with a linear cooling schedule is the preferred search algorithm may take quite some time.

Figure 7 provides further evidence that Hill Climber performs very well on this sort of problem, and comparing Figure 7 to all other figures provides evidence that it is relatively easy to evolve NK-Landscapes where Hill Climber outperforms other BBSAs.

The interesting part of Figure 8 is that Random Search should not consistently beat other search algorithms, unless the other algorithms are revisiting points on the landscape that have already been tried, and doing so at a rate faster than random search would be expected to. This behavior is expected of SA, which may spend a great deal of time oscillating among already-explored states. This may also happen with EA, which may revisit a state via any number of mechanisms. As implemented, Hill Climber may also revisit states, when wandering across a fitness plateau. The results in Figure 8 indicate that this is a greater weakness in SA and EA than it is in HC.
8. CONCLUSIONS

This work definitively shows that discriminatory benchmark problems can be evolved using fitness functions described in the language of NK-Landscapes. However, the discriminatory power between some BBSAs is low. A more expressive description language may be needed to separate the better BBSA, or perhaps the key is simply to allow a much extended runtime.

The distributed architecture developed for this research allows for efficient parallelization of fitness evaluations in a heterogeneous environment. The size of the questions we can ask depends on the amount of computational power we can efficiently harness. A framework that allows for efficient horizontal scalability across commodity hardware allows us to ask bigger questions.

9. ACKNOWLEDGMENTS

The author would like to acknowledge the support of the Intelligent Systems Center for the research presented in this paper. Furthermore, the author would like to thank Brian Goldman, a prolific idea factory who epitomizes the concept that if you generate a hundred ideas per day, then even if 99.9% of them are terrible, you're still ahead.

10. REFERENCES

[1] Wolpert, D. H., and Macready, W. G., 1997, "No Free Lunch Theorems for Optimization," IEEE Transactions on Evolutionary Computation, Vol 1(1), pp. 67-82.

[2] Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P., 1983, "Optimization by Simulated Annealing," Science, Vol 220, pp. 671-680.

[3] Eiben, A. E., and Smith, J. E., 2007, "Introduction to Evolutionary Computing," Springer.

[4] Kauffman, S., and Weinberger, E., 1989, "The NK Model of Rugged Fitness Landscapes and Its Application to Maturation of the Immune Response," Journal of Theoretical Biology, Vol 141(2), pp. 211-245.

[5] Nickolls, J., et al., 2008, "Scalable Parallel Programming with CUDA," Queue, Vol 6(2), pp. 40-53.

Figure 1 – EA performance vs all other algorithms at once

Figure 2 – Simulated Annealing vs all other algorithms at once

Figure 3 – Hill Climber vs all other algorithms at once

Figure 4 – Random Search vs all other algorithms at once

Figure 5 – Evolutionary Algorithm vs each other algorithm, pairwise

Figure 6 – Simulated Annealing vs each other algorithm, pairwise

Figure 7 – Hill Climber vs each other algorithm, pairwise

Figure 8 – Random Search vs each other algorithm, pairwise
