Register Based Genetic Programming on FPGA Computing Platforms
Appears in EuroGP 2000. Presented here with additional revisions.

Register Based Genetic Programming on FPGA Computing Platforms

Heywood M.I.¹, Zincir-Heywood A.N.²
{¹Dokuz Eylül University, ²Ege University}
Dept. Computer Engineering, Bornova, 35100 Izmir, Turkey
[email protected]

Abstract. The use of FPGA-based custom computing platforms is proposed for implementing linearly structured Genetic Programs. Such a context enables consideration of micro-architectural and instruction design issues not normally possible when using classical von Neumann machines. More importantly, the desirability of minimising memory-management overheads results in the imposition of additional constraints on the crossover operator. Specifically, individuals are described in terms of the number of pages and the page length, where the page length is common across individuals of the population. Pairwise crossover therefore results in the swapping of equal-length pages, hence minimising memory overheads. Simulation of the approach demonstrates that the method warrants further study.

1 Introduction

Register-based machines represent a very efficient platform for implementing Genetic Programs (GP) organised as a linear structure. That is to say, the GP does not manipulate a tree but a register-machine program. By the term 'register machine' it is implied that the operation of the host CPU is expressed in terms of operations on sets of registers, where some registers are associated with specialist hardware elements (e.g. the 'Accumulator' register and the Arithmetic Logic Unit). The principal motivation for such an approach is both to significantly speed up the operation of the GP itself through direct hardware implementation and to minimise the source-code footprint of the GP kernel. This in turn may lead to the use of GPs in applications such as embedded controllers, portable hand-held devices and autonomous robots.

Several authors have assessed various aspects of such an approach, early examples being [1], [2] and [3]. The work of Nordin et al., however, represents by far the most extensive work in the field, with applications demonstrated in pattern recognition, robotics and speech recognition, to name but three [4], [5]. Common to Nordin's work, however, is the use of standard von Neumann CPUs as the target computing platform; an approach which has resulted in both RISC [4] and CISC [6] versions of their AIM-GP system. In all cases a 3-address 32-bit format is used for the register machine, where this decision is dictated by the architecture of the host CPU.

In the case of this work, the target host computational platform is an FPGA-based custom computing machine. Such a choice means that we are free to make micro-architecture decisions such as the addressing format, instructions and word-lengths, the degree of parallelism, and support for special-purpose hardware (e.g. fast multipliers). Moreover, the implementation of bit-wise operations, typical of GP crossover and mutation operators, is very efficiently supported on FPGA architectures. However, it should be emphasised that the work proposed here is distinct from the concept of Evolvable Hardware, for which FPGA-based custom computing platforms have received a lot of interest [7]. In particular, the concept of Evolvable Hardware implies that the hardware itself begins in some initial (random) state and then physically evolves to produce the solution through direct manipulation of the hardware. In the past the technique has been limited by the analogue nature of the solutions found [8]; recent advances facilitate the evolution of digital circuits with repeatable performance [9]. In contrast, the purpose of this work is to provide succinct computing cores of a register-machine nature: the FPGA is used to produce a GP computing machine from the outset, as opposed to mapping the requirements of GP onto a general-purpose machine.

In the following text, section 2 defines the concept of register machines and introduces the instruction formats used later for 0-, 1-, 2- and 3-addressing modes. Limitations and constraints on the register machine, as imposed by an FPGA custom computing platform, are discussed in section 3. This sets the scene for the redefinition of the crossover operator (previewed in the sketch below) and the introduction of a second mutation operator. Section 4 summarises the performance of each approach on a benchmark problem. Finally, the results are discussed and future directions indicated in section 5.
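As a preview of the page-based crossover summarised in the abstract and developed in section 3, the following is a minimal C sketch of the memory-neutral page swap. The type Individual and the constants NUM_PAGES and PAGE_LEN are illustrative assumptions of ours, not definitions from the paper, and an instruction is modelled as a single 8-bit word.

#include <stdlib.h>
#include <stdint.h>

#define NUM_PAGES 8   /* pages per individual (assumed value)            */
#define PAGE_LEN  4   /* instructions per page, common to the population */

typedef struct {
    uint8_t code[NUM_PAGES * PAGE_LEN];   /* linear instruction sequence */
} Individual;

/* Swap one randomly chosen page between two parents. Because all pages
 * share the same length, neither individual changes size, so no buffer
 * reallocation or heap management is required -- the property that
 * minimises memory overheads in the FPGA setting. */
void crossover_pages(Individual *a, Individual *b)
{
    int pa = rand() % NUM_PAGES;              /* page chosen in parent a */
    int pb = rand() % NUM_PAGES;              /* page chosen in parent b */
    for (int i = 0; i < PAGE_LEN; i++) {
        uint8_t tmp = a->code[pa * PAGE_LEN + i];
        a->code[pa * PAGE_LEN + i] = b->code[pb * PAGE_LEN + i];
        b->code[pb * PAGE_LEN + i] = tmp;
    }
}

Fixed-length pages trade some of the flexibility of unrestricted two-point crossover for a constant, predictable memory layout.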
2 Address Register Instruction formats

As indicated above, a linear GP structure is employed, hence individuals take the form of register-level transfer-language instructions. The functional set is now in the form of opcodes, and the terminal set becomes the operands (e.g. permitted inputs, outputs and internal registers, or the general address space).

Before detailing the specific format used here to implement register machines, it is first useful to summarise what a register machine is. As indicated in the introduction, the modern CPU, as defined by von Neumann, is a computing machine composed from a set of registers, where the set of registers is a function of the operations, hence of the application of the CPU [10, 11]. The basic mode of operation therefore takes the form of (1) fetching and (2) decoding of instructions (involving special-purpose registers such as the program counter, instruction and address registers) and then (3) manipulating the contents of the various registers associated with implementing specific instructions.

The simplest von Neumann machines therefore have a very limited set of registers, in which most operations are performed using, say, a single register. This means, for example, that the ability to store sub-results is a function of the execution sequence, as in the case of a stack-based evaluation of arithmetic and logical operators. If the instruction sequence is so much as a single instruction wrong, then the entire calculation is likely to be corrupted. In this case additional registers are provided to enable the popping/pushing of data from/to the stack before the program completes (at which point the top of the stack is (repeatedly) popped to produce the overall answer(s)). However, as the number of internal registers capable of performing a calculation increases, so our ability to store sub-results increases. Within the context of a GP, these provide for a divide-and-conquer approach to code development. Such a capability is particularly important in the case of linear GP structures, as no structured organisation of instruction sequences is assumed. Identification of useful instruction 'building blocks' is a function of the generative activity of GP. Moreover, such a property is a function of the flexibility of the addressing format associated with the register language (cf. 0-, 1-, 2- and 3-register addressing). Table 1 provides an example of the significance of the register addressing employed when evaluating the function x⁴ + x³ + x² + x.

Table 1. Evaluating x⁴ + x³ + x² + x using different register addressing modes.

 0-address               | 1-address    | 2-address    | 3-address
 TOS ← P0                | AC ← P0      | R1 ← P0      | R1 ← P0 * P0
 TOS ← P0                | AC ← AC * P0 | R1 ← R1 * R1 | R1 ← R1 + P0
 TOS ← TOS(t) * TOS(t-1) | AC ← AC + P0 | R1 ← R1 + P0 | R2 ← R1 * P0
 R1 ← TOS                | AC ← AC * P0 | R2 ← R1      | R2 ← R2 * P0
 TOS ← R1                | AC ← AC + P0 | R2 ← R2 * P0 | R1 ← R1 + R2
 TOS ← P0                | AC ← AC * P0 | R2 ← R2 * P0 |
 TOS ← TOS(t) * TOS(t-1) | AC ← AC + P0 | R1 ← R1 + R2 |
 TOS ← P0                |              |              |
 TOS ← TOS(t) + TOS(t-1) |              |              |
 TOS ← R1                |              |              |
 TOS ← R1                |              |              |
 TOS ← TOS(t) * TOS(t-1) |              |              |
 TOS ← TOS(t) + TOS(t-1) |              |              |
 TOS ← R1                |              |              |
 TOS ← TOS(t) + TOS(t-1) |              |              |

KEY: TOS – top of stack (TOS(t), TOS(t-1) – the two topmost items); Rx – internal register x; Px – external input on port x; AC – accumulator register; {*, +} – arithmetic operators; ← – assignment operator.
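To make the fetch-decode-execute cycle behind Table 1 concrete, the following is a minimal C sketch of a toy 2-address register machine. The 8-bit encoding (a 4-bit opcode, a 2-bit destination register and a 2-bit source field, with source value 3 standing for input port P0) is an assumption of ours for illustration, not the instruction format adopted later in the paper.

#include <stdint.h>

enum { OP_LOAD, OP_ADD, OP_MUL };   /* functional set expressed as opcodes */

/* Execute a linear program on a 2-address register machine: every
 * instruction names a destination register and one source operand,
 * with the destination doubling as the second operand of ADD and MUL. */
void run(const uint8_t *prog, int len, int32_t reg[4], int32_t port0)
{
    for (int pc = 0; pc < len; pc++) {               /* (1) fetch   */
        uint8_t op  = prog[pc] >> 4;                 /* (2) decode  */
        uint8_t dst = (prog[pc] >> 2) & 3;
        uint8_t src = prog[pc] & 3;
        int32_t val = (src == 3) ? port0 : reg[src]; /* src 3 = port P0 */
        switch (op) {                                /* (3) execute */
        case OP_LOAD: reg[dst] = val;               break;
        case OP_ADD:  reg[dst] = reg[dst] + val;    break;
        case OP_MUL:  reg[dst] = reg[dst] * val;    break;
        }
    }
}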
It is immediately apparent that the 2- and 3-register addressing modes provide a much more compact notation than the 0- and 1-register addressing modes. However, this does not necessarily imply that the more concise register addressing modes are easier to evolve, although it might well be expected, given that the 3-address format roughly corresponds to a functional node in the classical tree-based structure. In short, the 0-address case requires a stack to perform any manipulation of data, whether arithmetic or register transfer. The 1-address format relaxes this constraint by enabling the direct inclusion of register values and input ports within arithmetic operations; the target register is always the accumulator, however, and one of the operands (terminals) is always the present value of the accumulator. A 2-address format replaces the accumulator with any designated internal register, whereas 3-address register machines additionally permit arbitrary internal registers as both the operands (terminals) and the result register.

Where the smaller addressing modes benefit, however, is in terms of the number of bits necessary to represent an instruction. That is to say, if the word-length of a shorter-address instruction is half or a quarter that of a 3-address instruction, then the smaller footprint of the associated register machine may well enable several register machines to exist concurrently on a single FPGA. Hence, a GP defined in terms of a smaller word-length can perform the evaluation of multiple individuals in parallel. In summary, it is expected that the smaller register address formats will produce a less efficient coding, but provide a smaller hardware footprint. Hence, although more generations may be necessary to achieve convergence, multiple evaluations are performed at each iteration. However, the assumption of a smaller algorithm footprint for the shorter-address cases does not always hold true. In particular, for the special case of a 0-address register machine, a stack is required to perform the calculations, and this will consume a significant amount of hardware real estate. The method is nevertheless retained in the following study, given the historical interest in the approach [3].
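As a worked instance of the word-length argument, and assuming the toy run() interpreter sketched in section 2, the 2-address column of Table 1 packs into seven 8-bit words; under the same field widths, a 3-address instruction would need a third 2-bit register field, i.e. 10 bits per instruction. The encoding below is hypothetical, with R0 and R1 playing the roles of R1 and R2 from the table.

#include <stdio.h>
#include <stdint.h>

/* Hypothetical encoding of the 2-address column of Table 1 for run():
 * each byte is (opcode << 4) | (dst << 2) | src, with src 3 = P0. */
int main(void)
{
    uint8_t prog[] = {
        0x03,  /* R0 <- P0      */
        0x20,  /* R0 <- R0 * R0 */
        0x13,  /* R0 <- R0 + P0 */
        0x04,  /* R1 <- R0      */
        0x27,  /* R1 <- R1 * P0 */
        0x27,  /* R1 <- R1 * P0 */
        0x11,  /* R0 <- R0 + R1 */
    };
    int32_t reg[4] = { 0 };
    run(prog, 7, reg, 2);       /* evaluate at x = 2 */
    printf("%d\n", reg[0]);     /* prints 30 = 2^4 + 2^3 + 2^2 + 2 */
    return 0;
}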