Automatic Generation of Cellular Automata on FPGA

View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Open Repository of the University of Porto Automatic Generation of Cellular Automata on FPGA Andre´ Costa Lima Joao˜ Canas Ferreira Faculdade de Engenharia INESC TEC and Faculdade de Engenharia Universidade do Porto Universidade do Porto [email protected] [email protected] Abstract plex task, as several aspects such as concurrency must be taken into account, and may be time-consuming. CA of- Cellular automata (CA) have been used to study a great fer simplicity when it comes to define each cell’s func- range of fields, through the means of simulation, owing to tionality and owing to its inherent massive spacial paral- its computational power and inherent characteristics. Fur- lelism, locality and discrete nature, such systems are natu- thermore, CAs can perform task-specific processing. Spa- rally mapped onto the regular architecture of an FPGA, or- cial parallelism, locality and discrete nature are the main ganized in reconfigurable logic blocks; it is possible to per- features that enable mapping of CA onto the regular archi- form complex operations with efficiency, robustness and, tecture of an FPGA; such a hardware solution significantly most importantly, decrease drastically the simulation time accelerates the simulation process when compared to soft- with high-speed parallel processing. Thus, as a reconfig- ware. In this paper, we report on the implementation of a urable hardware device, FPGA platforms are a very good system to automatically generate custom CA architectures candidate for CA architecture implementations on hard- for FPGA devices based on a reference design. The FPGA ware: each cell functionality can be implemented in look- interfaces with a host computer, which initializes the sys- up tables, its state stored in flip-flops or block-RAM, and tem, downloads the initial CA state information, and con- it is possible to update the state of a set of cells in parallel trols the CA’s operation. The user interface is are provided every clock cycle. by a user-friendly graphical desktop application written in In this paper, we report on the implementation of a sys- Java. tem to automatically generate CA architectures on a tar- get FPGA technology, a Spartan6, the characteristics and rules of which are specified by the user through a software 1. Introduction application, that also allows controlling the operation, ini- tializing and reading the state of the CA. The implementa- Cellular automata have already been studied for over tion results show that it is possible to achieve a speed-up of half a century. The concept was introduced by John von 168, when comparing to software simulations, for a lattice Neumann in the late forties when he was trying to model size of 56 56 cells for the Game of Life [4] [5]. Further- × autonomous self-replicating systems [1]. Since then, CA- more, we achieved lattice dimensions of 40 40 and 72 72 × × based architectures have been extensively explored by the cells for the implementation of a simplified version of the scientific community for studying the characteristics, pa- Greenberg-Hastings [6] and lattice gases automata [7], re- rameters and behaviour through simulation of dynamic spectively. complex systems, natural and artificial phenomena, and The remainder of the paper is organized as follows. Sec- for computational processing [2]. Specific-application ar- tion 2 introduces the key characteristics and properties of eas are, e.g., urban planning, traffic simulation, pseudo- CA. Section 3 presents some of the existing work related RNG, complex networks, biology, heart modeling, image to implementations of CA in FPGA platforms. Section 4 processing, sound waves propagation and fluid dynamics. describes the overall system architecture, its features and Furthermore, CA is also a possible candidate for future al- specification. In section 5 the implementation results are ternative computer architectures [3]. summarized and discussed. Finally, section 6 concludes Nowadays, the need for creating simulation environ- this paper and future developments are proposed. ments of a high degree of complexity to describe rigorously the complete behaviour of a system, and for developing al- gorithms to process data and perform thousands of calcu- 2. Cellular automata lations per second, typically incurs a high computational cost and leads to long simulations. Taking advantage of Cellular automata are mathematical models of dynamic multi-core architectures, with parallel programming, to as- systems that evolve autonomously [8]. They consist in a set sign multiple task execution to different threads, enables of large number of cells with identical functionality with significant software optimizations and accelerated execu- local and uniform connectivity, and organized in a regular tion. However, parallel programming in software is a com- lattice with finite dimensions [9], for e.g., an array a matrix ISBN: 978-972-8822-27-9 REC 2013 51 or even a cube for the multidimensional scenario; bound- alatticegasCA(LGCA)tomodelandstudysoundwaves ary conditions, that can be fixed null or periodic (the lattice propagation. LGCA are models that allow to emulate fluid is wrapped around), define the neighbourhood of the cells or gas dynamics, i.e. particle collisions, and are conve- located at the limits of the lattice. For square-shaped cells, niently implemented with digital systems; thus, the real- the neighbourhood is typically considered to be of Moore world physics are described in discrete interactions. The [10] or von Neumann [11] types which are applicable to 2D authors describe the behaviour of each cell with a simple and 3D lattices; considering a single cell, the local neigh- set of collision rules. For simplicity, they consider unit ve- bourhood is defined by the adjacent cells surrounding it, locity and equal mass for every particle and four distinct di- i.e. the first order or unit radius neighbourhood. Thus, for rections, i.e. a typical von Neumann neighbourhood. The 2D lattices, the Moore-type neighbourhood includes all the current state of each cell indicates the direction in which 8 possible in a square shape and the von Neumann-type existing particles are traveling according to the collision neighbourhood includes 4 cells in a cross shape. Each cell logic implemented by them; thus, 4 bits are required to has a state that can be discrete or real-valued and the up- represent the outgoing momentum for each direction with date occurs in a parallel fashion in discrete time steps, i.e. the 9 possible combinations: a pair (vx,vy) of integer val- every cell in the lattice update its state simultaneously; at ues from ( 1, 1) to (1,1). The collision logic function − − the time instant ti, the cell states in the local neighbour- that evaluates a particle stream and outputs those that carry hood of a cell c are evaluated, and the next state for c, at on in the next cycle is defined as follows. A particle trav- ti+1, is determined according to its deterministic state tran- els to the east if a) a particle arrives from the west except sition rule that is a function combining arithmetic or logical if there is an east-west head-on collision and b) if there operations. However, this rule can be probabilistic if it de- is a north-south head-on collision. An head-on collision pends on a random nature variable; in this case, the CA is involves only 2 particles traveling in opposing directions. considered to be heterogeneous as the cells has no longer a This definition is analogous to the remaining directions. homogeneous functionality. The design was implemented in a SPACE (Scalable Par- allel Architecture for Concurrency Experiments) machine 3. Related work [12] which is an array of Algotronix CAL FPGAs whose technology is outdated by now. Results showed that it was possible to reach 3 107 cell updates per second, 2.2 times Vlassopoulos et al. [6] presented a FPGA design to × implement a stochastic Greenberg-Hastings CA (GHCA), lower than two CAM-8 modules. CAM (CA-machine) [13] which is a model that mimics the propagation of reaction- [14] is a dedicated machine whose architecture is optimized diffusion waves in active media. The architecture is orga- for high-scale CA simulation; it was developed by Tom- nized in group, block and cell partitions in a hierarchical maso Toffoli and Normal Margolus and it received a lot of way across the FPGA. Each group contains a set of blocks attention. and is processed in parallel as a top-level module; within a group, blocks are processed sequentially. Each block 4. System architecture contains a set of cells that are distributed around BRAM- type memory resources, to maximize its usage, and con- The overall architecture of the system described in this trol logic. Each cell contains a random event generator cir- work consists of a digital system design for a FPGA device cuit (LFSR), which is a requirement of the GHCA model, and software desktop application, which is a graphical tool state control logic and registers. Additional BRAMs are for the system user that provides an interface to control the used to handle boundary cells data, which are shared with hardware operation, a representation of the CA to initialize a subset of groups’ blocks. The FPGA environment inter- and read its state map, and the generation of bitstream files faces with an external host machine to initialize the CA, each one equivalent to a single custom CA specification. collect results and control its operation. A Xilinx Virtex 4 Thus, our approach allows the system user to parameter- (XC4VLX100-10FF1513) FPGA device was used operat- ize the hardware architecture according to its needs. Pre- ing at a frequency of 100 MHz; with a lattice of maximal built templates, that describe in Verilog the circuits of sep- dimensions of 512 512 cells, 26 % of the total slides avail- arate modules, are used as a reference to integrate complete × able and 136 out of 240 were required for implementation.

Load more