An Evolutionary Framework for Automatic and Guided Discovery of Algorithms

Ruchira Sasanka, Intel Corporation
Konstantinos Krommydas, Intel Corporation

arXiv:1904.02830v1 [cs.NE] 5 Apr 2019

Abstract

This paper presents Automatic Algorithm Discoverer (AAD), an evolutionary framework for synthesizing programs of high complexity. To guide evolution, prior evolutionary algorithms have depended on fitness (objective) functions, which are challenging to design. To make evolutionary progress, AAD instead employs Problem Guided Evolution (PGE), which requires that a group of problems be introduced together. With PGE, solutions discovered for simpler problems are used to solve more complex problems in the same group. PGE also enables several new evolutionary strategies, and naturally lends itself to High-Performance Computing (HPC) techniques.

We find that PGE and related evolutionary strategies enable AAD to discover algorithms of similar or higher complexity relative to the state-of-the-art. Specifically, AAD produces Python code for 29 array/vector problems ranging from min, max, and reverse to more challenging problems like sorting and matrix-vector multiplication. Additionally, we find that AAD shows adaptability to constrained environments/inputs and demonstrates outside-the-box problem-solving abilities.

Keywords: Synthesis, Evolution, Python, HPC

1 Introduction

Program synthesis involves automatically assembling a program from simpler components. It is analogous to searching the entire space created by all possible permutations of those components, looking for solutions that satisfy given requirements. Many such search strategies (such as enumerative, deduction-based, constraint-solving, stochastic) have been proposed to address this challenge [3, 12, 32, 34, 41, 47, 49].

In this work, we propose an evolution-based search strategy, implemented in the Automatic Algorithm Discoverer (AAD). AAD can synthesize programs of relatively high complexity (including loops, nested blocks, nested function calls, etc.), based on a subset of Python as grammar, and can generate executable Python code. In this paper we use AAD to discover algorithmic solutions to array/vector problems.

Evolutionary algorithms use a fitness (objective) function to pick the fittest individuals from a population [23, 25, 27, 28]. The traits of the fittest individuals can recombine (cross-over) to form the next generation. However, designing an effective fitness function can be challenging for complex problems [9, 17, 20]. We propose an alternative way to guide evolution without a fitness function, by adding several potentially related problems together into a group. We call this Problem Guided Evolution (PGE), and it is analogous to the way we teach children to solve complex problems. For instance, to help discover an algorithm for finding the area of a polygon, we may ask a student to find a way to calculate the area of a triangle. That is, simpler problems guide solutions to more complex ones. Notably, PGE does not require knowing the exact algorithm or the exact constituents of a solution, but rather a few potentially related problems. In AAD, PGE allows more complex solutions to be derived through (i) composition (combining simpler ones), and through (ii) mutation of already discovered ones.

Grouping related problems for PGE, like designing a fitness function, is not automatic and currently requires human insight. However, PGE could be a more natural way to attack complex problems. As a concrete example, Figure 1 shows code that AAD produced in order to sort a non-empty array in ascending order (SortAsc). To solve this, we grouped ten potentially related problems together to guide evolution: min, max, first/last index, reverse, remove first/last, is-in-array, and sort ascending/descending. AAD used solutions it found itself for some of those problems to discover an algorithm for sorting: first finding the minimum of the input array, appending that minimum to a new list, removing the minimum from the input array, and repeating these steps until the entire input array is processed. Though this is neither the most elegant nor the most efficient implementation, a machine being able to discover an algorithm for sorting starting from a basic subset of Python illustrates the capabilities of AAD and the utility of PGE.

    def Min(arg0):
        arr_10 = arg0.copy()
        num_11 = arr_10.pop(0)
        for num_12 in tuple(arr_10):
            bool_14 = (num_11 > num_12)
            if (bool_14):
                num_11 = num_12
        num_11 = num_11
        return (num_11)

    def LastIndOf(arg0, arg1):
        arr_10 = arg0.copy()
        num_11 = arg1
        for num_12, num_13 in enumerate(tuple(arr_10)):
            bool_15 = (num_13 == num_11)
            if (bool_15):
                arr_10.append(num_12)
        num_13 = arr_10[-1]
        return (num_13)

    def RemoveL(arg0, arg1):
        arr_10 = arg0.copy()
        num_11 = arg1
        num_16 = LastIndOf(arr_10, num_11)
        num_14 = arr_10.pop(num_16)
        bool_12 = (num_16 < num_14)
        return (arr_10)

    def SortAsc(arg0):
        arr_10 = arg0.copy()
        arr_16 = list()
        arr_14 = arr_16.copy()
        for num_12 in tuple(arr_10):
            num_15 = Min(arr_10)
            arr_14.append(num_15)
            arr_10 = RemoveL(arr_10, num_15)
        return (arr_14)

Figure 1. AAD synthesized algorithm for sorting

Overall, this paper makes the following contributions:

• Use of Problem Guided Evolution to eliminate objective functions in evolutionary algorithms.
• Use of multiple evolutionary strategies (such as diverse environments & solutions, cross-pollination, and ganged evolution), and evaluation of their effectiveness via a wide range of experiments.
• Application of AAD to solve 29 array/vector problems in general-purpose Python language, demonstrating that evolutionary algorithms are capable of solving complex state-of-the-art problems.
• Support of loops to discover algorithms that can accept any (non-zero) input size.
• Mapping of the inherently parallel evolutionary process to HPC hardware and techniques.

We find that PGE and related evolutionary strategies are effective at discovering solutions to our array/vector problems. Among other findings, notable is the adaptability of AAD to constrained environments and inputs, as well as its ability to find creative solutions.

The rest of the paper is organized as follows. In Section 2 we present the design details of AAD. Specifically, we discuss: (i) the three constituent parts of AAD, with special emphasis on the Evolver and its three phases that construct the solution, (ii) the evolutionary strategies employed by AAD and similarities to biological evolution, and (iii) engineering challenges we faced and their solutions, as well as our HPC-oriented approach in the AAD implementation. Section 3 presents our experimental setup and Section 4 discusses the results of putting AAD to the test in a broad range of experiments encompassing a variety of problems. Last, we present related work (Section 5), a discussion and future work (Section 6), and conclude the paper in Section 7.

2 Design

As shown in Figure 2, AAD consists of three components: (i) a Problem Generator (ProbGen) to generate a problem, (ii) a Solution Generator (SolGen) to come up with potential solutions (programs), and (iii) a Checker to check those solutions.

Figure 2. Components of AAD.

2.1 Problem Generator (ProbGen)

Each problem we want solved starts with its own ProbGen. ProbGen is responsible for: (i) specifying the number and types of inputs and outputs, and (ii) generating inputs for a given problem size. For instance, for maximum finding (Max), ProbGen specifies that Max takes one array as its input and produces a number as its output. In addition, when requested to generate inputs for a problem of size N, it produces an input array filled with N numbers.

2.2 Checker

Checker is responsible for accepting or rejecting a solution generated for a given problem. Checker executes the generated program with input(s) generated by ProbGen, and produces output. The Checker contains logic to either accept or reject the output, as in [21]. Therefore, a Checker is specific to a given ProbGen, and both go hand in hand.

A Checker does not necessarily need an implementation of the algorithm it seeks to discover, although some problems do require an alternative implementation. For instance, the Checker for problem “Sort” does not have to sort the input array. Rather, it can compare each pair of adjacent elements in the output array and see whether the two elements are in the expected order. As soon as it detects an out-of-order pair, it can declare failure. If each pair of elements is in order, and the output array contains exactly the same elements as the input array (which can be checked by removing matching elements), the Checker can accept the solution.

For some problems based on the physical world, the input and output data may be obtained empirically. For instance, to develop an algorithm that can predict future temperatures at a specific place, historical temperature data may be used or data may be gathered using sensors. In other words, the physical world can function as a Checker for some of the models (algorithms) we want to discover.

2.3 The Solution Generator (SolGen)

SolGen primarily consists of two components: (i) an Expression/Idiom Store, and (ii) an Evolver.

2.3.1 Expression/Idiom Store (ExpStore)

SolGen constructs source programs using a grammar, as in [2, 15, 31, 50]. The subset of Python grammar AAD uses is stored in ExpStore, and is given in Table 1.

Table 1

    Expr/Stmt         Representation
    Arithmetic        NUM = NUM op NUM        op = +, -, *, //
    Compare           BOOL = BOOL op BOOL     op = <, >, ==, <=, >=, !=
    Head/Tail         NUM = ARR[0] | ARR[-1]
    Pop (Head/Tail)   NUM = ARR.pop(0)
                      NUM = ARR.pop()
    Pop at Ind        NUM = ARR.pop(NUM)
    Append            ARR.append()
    New Array         ARR = list()
    Constant          NUM = 0 | 1
    Array copy        ARR = ARR.copy()
    AoA copy          AoA = AoA.copy()
    Func Arg          ANY_TYPE = param
    Assign Stmt       NUM = NUM
    Return Stmt       return ANY_TYPE
    Reduction         NUM += NUM
    If Stmt           if BOOL: BLOCK

In AAD, grammar rules are augmented with type information, as in [13, 36, 39]. AAD supports four types: numbers (NUM), Boolean (BOOL), arrays (ARR), and arrays-of-arrays (AoA), which can model matrices. Further, each operand of an expression is marked as a Consumer (read-only), a Producer (write-only), or a ProdCon (read-modify-write).
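To make the idea of typed grammar rules with operand roles concrete, the following is a minimal sketch and not AAD's actual data structure. The class names (VType, Role, Operand, Rule), the template syntax, and the role assigned to each operand of the three sample rules are assumptions made for illustration; the text above states only that every operand carries a type and one of the three roles.

```python
from dataclasses import dataclass
from enum import Enum


class VType(Enum):
    """The four operand types named in the text."""
    NUM = "NUM"
    BOOL = "BOOL"
    ARR = "ARR"
    AOA = "AoA"


class Role(Enum):
    """How an expression uses an operand."""
    CONSUMER = "read-only"
    PRODUCER = "write-only"
    PRODCON = "read-modify-write"


@dataclass(frozen=True)
class Operand:
    vtype: VType
    role: Role


@dataclass(frozen=True)
class Rule:
    """Hypothetical grammar rule: a text template plus one Operand per slot."""
    name: str
    template: str      # slots {0}, {1}, ... are filled with variable names
    operands: tuple

    def render(self, *names):
        """Fill the template slots with concrete variable names."""
        if len(names) != len(self.operands):
            raise ValueError("expected %d operand names" % len(self.operands))
        return self.template.format(*names)

    def reads(self):
        """Slot indices whose current value the statement reads."""
        return [i for i, o in enumerate(self.operands)
                if o.role is not Role.PRODUCER]

    def writes(self):
        """Slot indices the statement writes or modifies."""
        return [i for i, o in enumerate(self.operands)
                if o.role is not Role.CONSUMER]


# Three rules from Table 1; the role assignments are assumed, not sourced.
RULES = {
    "Reduction": Rule("Reduction", "{0} += {1}",
                      (Operand(VType.NUM, Role.PRODCON),
                       Operand(VType.NUM, Role.CONSUMER))),
    "PopAtInd": Rule("Pop at Ind", "{0} = {1}.pop({2})",
                     (Operand(VType.NUM, Role.PRODUCER),
                      Operand(VType.ARR, Role.PRODCON),
                      Operand(VType.NUM, Role.CONSUMER))),
    "ArrayCopy": Rule("Array copy", "{0} = {1}.copy()",
                      (Operand(VType.ARR, Role.PRODUCER),
                       Operand(VType.ARR, Role.CONSUMER))),
}

# Rendering "Pop at Ind" with the variable names used in RemoveL (Figure 1).
stmt = RULES["PopAtInd"].render("num_14", "arr_10", "num_16")
print(stmt)
```

Under these assumptions, a generator could use reads() to require that a slot is bound to an already defined variable of the matching type, and writes() to track which variables a statement defines or changes.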
