Cellular Wave and CNN Technology – a SoC architecture with xK processors and sensor arrays*

Tamás ROSKA1 Fellow IEEE

1. Introduction and the main theses

1.1 Scenario: Architectural lessons from the trends in manufacturing billion component devices when crossing the threshold of 100 nm feature size

Preliminary proposition: the nature of fabrication technology, the nature and type of data to be processed, and the nature and type of events to be detected or "computed" will determine the architecture, the elementary instructions, and the type of algorithms needed, and hence also the complexity of the solution.

In view of this proposition, let us list a few key features of today's electronic technology and their consequences.

(i) Convergence of CMOS, NANO and OPTICAL technologies towards a cellular architecture with short and sparse wires

CMOS chips:
* Processors: K or M transistors on an M or G transistor die => K processors/chip
* Wires: at 180 nm or below, gate delay is smaller than wire delay

NANO processors and sensors:
* Mainly 2D organization of cells integrating processing and sensing
* Interactions mainly with the neighbours

OPTICAL devices:
* parallel processing
* optical correlators
* VCSELs and programmable SLMs

Hence the architecture should be characterized by
* 2D layers (or a layered 3D)
* Cellular architecture with
* mainly local and/or regular sparse wiring

leading via → a Cellular Nonlinear Network (CNN) Dynamics

1 The Faculty of Information Technology and the Jedlik Laboratories of the Pázmány University, Budapest, and the Computer and Automation Institute of the Hungarian Academy of Sciences, Budapest, Hungary ([email protected], www.itk.ppke.hu)

* Research supported by the Office of Naval Research, Human Frontiers of Science Program, EU Future and Emergent Technologies Program, the Hungarian Academy of Sciences, and the Jedlik Laboratories of the Pázmány University, Budapest

0-7803-9254-X/05/$20.00 ©2005 IEEE.

towards → Cellular Wave Computers

(ii) The DATA and EVENTS are:
• DATA: Multimodal data flows: 2D topographic FLOWS, combined with non-topographic signals, "flowing" continuously from the sensor arrays, tuned interactively with the array
• EVENTS: Spatial-temporal multimodal events: spatial-temporal PATTERNS

(iii) The algorithms and logic:
* Spatial-temporal Non-Boolean logic on dynamic patterns via
* α-recursive functions

A new world of software, with elementary instructions as nonlinear waves (PDE solutions), is emerging, implemented in the stored programmable sensory cellular wave computers.

1.2 Mind-like versus brain-like computing

Present day classical computers, developed during the last sixty years, are essentially logic machines, based on binary logic and arithmetic, acting on discrete valued (binary coded) data. Their unique property is algorithmic (stored) programmability, invented by John von Neumann. The mathematical concept is based on a Universal Machine on integers (Turing machine). We call it a Mind-like computer, since the elementary instructions are based on arithmetic-and-logic operations, abstract notions reflecting our mind, naturally abstracted from the world. Its algorithms are logic sequences of these operations. When these machines were invented, this was in complete agreement with the view of how neurons and the brain were envisaged: as threshold logic. The invention of the transistor (1948) and later the integrated circuit (1960) made this not only practical, but also cheap and, nowadays, a ubiquitous commodity.

However, now we know that neurons and the nervous tissues operate differently. Today, a brain-like system has the following properties:
• Continuous time continuous valued (analog) signal arrays (flows)
• Several 2-dimensional strata of analog "processors" (neurons)
• Typically, mainly local, or sparse global (bus-like) interconnections
• Sensing and processing are integrated
• Vertical interconnections between a few strata of neuron "processors"
• Variable delays
• Spatial-temporal active waves with events as patterns in space and/or time

These features are seemingly in almost complete agreement with the properties derived in the previous section. Hence, they are strongly modifying our view and practice in building complex electronic systems, including sensing, computing, activating and communicating devices and systems. This way of thinking, however, presupposes a completely different architecture, physical and algorithmic alike, with tens of thousands or millions of parallel physical processing devices.

2 An axiomatic introduction of adaptive sensory Cellular Wave Computers and Wave-Logic

A universal and canonical computing architecture, after the forms of data are set, contains the simplest possible building blocks, with the simplest possible interconnections, elementary instructions and programming constructs. Then we introduce algorithmic stored programmability to make it universal and practical. A most successful example is the digital computer, with a core universal machine on integers (Turing machine).

2.1 The basic cellular nonlinear network CNN dynamics

In view of the properties of a brain-like computer, the data are topographic (image) flows. We assign one processor to one sensory element (pixel, taxel, etc.). In the simplest case, we have a time varying pixel array in which each pixel, at each time instant (defining a picture), has a light intensity given by a gray value between black (say, +1) and white (say, –1). Color pictures are composed of several pictures with different color content. A special case of a picture is a binary (black-and-white) mask.
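The gray-value convention above (black = +1, white = -1) can be sketched in a few lines. This is only an illustration: the 8-bit input format with 0 as black, and the helper names `to_cnn_range` and `to_binary_mask`, are assumptions introduced here, not part of the original text.

```python
def to_cnn_range(gray, max_level=255):
    """Map gray levels (assumed: 0 = black, max_level = white) to the
    CNN convention used in the text: black = +1, white = -1."""
    return [[1.0 - 2.0 * p / max_level for p in row] for row in gray]


def to_binary_mask(picture, threshold=0.0):
    """Reduce a gray-scale picture to a binary (black-and-white) mask."""
    return [[1.0 if v > threshold else -1.0 for v in row] for row in picture]


# One frame of an image flow (hypothetical 2x3 sensor read-out).
frame = [[0, 128, 255],
         [255, 64, 0]]
picture = to_cnn_range(frame)
mask = to_binary_mask(picture)
```

An image flow is then simply a time-indexed sequence of such pictures, one value per cell.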

Now, let us construct a programmable topographic cellular sensory dynamics, implementing the protagonist elementary instruction. The recipe is as follows.
• Take the simplest dynamical system, a cell (input u, state x and output y are all real valued vector functions of continuous valued time)
• Take the simplest spatial grid for placing the cells with the simplest neighborhood relation (2D sheets)
• Introduce the simplest spatial interactions between dynamic cells, being programmable (called cloning template or gene, or simply template)
• Add cellular sensors, typically, cell by cell

These steps lead to a one-layer cellular nonlinear network (CNN) architecture with programmable cloning templates T {A, B, z}, shown in Figure 1. The corresponding canonical dynamic differential equations for each cell are shown in Figure 2; a bus-like sparse set of connections is useful in some cases as well. Multi-layer architectures are not considered here. Details can be found in the recent textbook [1].

Observe that:
• The canonical differential equations shown in Figure 2, also called the standard CNN dynamics, are very sparse and very simple, though quite rich in spatial-temporal dynamics. Each equation contains only 20 terms (including the time constant) for a 3x3 neighborhood, independently of the number of cells. These 20 terms (19 if the time constant is taken as the unit) are the parameters of the cloning template T. Many complex wave equations can be described by them, possibly with a 2-layer architecture. Indeed, the archetype is the Turing morphogenesis equation, a special case of this 2-layer CNN equation.
• This more general definition of CNN includes many special constructs (including multilayer or complex cell) and many dynamic patterns as detected events (stable gray scale or binary patterns, periodic attractors, spatial-temporal chaotic attractors, etc.), and is independent of the physical implementation. The diverse physical implementations so far include: a mixed signal (analog-binary) circuit array, an optical system, an emulated digital device, a quantum dot array, a molecular array (e.g. bacteriorhodopsin), etc.
• As a special case, the CNN could be programmed to model locally connected neural networks, e.g. modeling different retinas, etc.
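The recipe above can be sketched as a small numerical experiment: a forward-Euler integration of dx_ij/dt = -a x_ij + Σ A(ij,kl) y_kl + Σ B(ij,kl) u_kl + z on a 3x3 neighborhood. Several ingredients are assumptions made only for this illustration: the saturating output y = 0.5(|x+1| - |x-1|), the zero initial state, the fixed boundary value, the Euler step, and the edge-like template values. None of these are taken from the text, and a real chip solves the dynamics in continuous time rather than by stepping.

```python
def saturate(x):
    """Assumed output nonlinearity: clamp to [-1, 1], i.e. 0.5*(|x+1| - |x-1|)."""
    return max(-1.0, min(1.0, x))


def weighted_sum(template, field, i, j, boundary):
    """Sum template * field over the 3x3 neighborhood of cell (i, j)."""
    rows, cols = len(field), len(field[0])
    total = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            k, l = i + di, j + dj
            inside = 0 <= k < rows and 0 <= l < cols
            total += template[di + 1][dj + 1] * (field[k][l] if inside else boundary)
    return total


def run_cnn(u, A, B, z, a=1.0, dt=0.1, steps=200, boundary=-1.0):
    """Integrate the cell equations with forward Euler and return the output y."""
    rows, cols = len(u), len(u[0])
    # The input term B*u + z does not change while the transient runs.
    feedforward = [[weighted_sum(B, u, i, j, boundary) + z for j in range(cols)]
                   for i in range(rows)]
    x = [[0.0] * cols for _ in range(rows)]  # assumed initial state
    for _ in range(steps):
        y = [[saturate(v) for v in row] for row in x]
        x = [[x[i][j] + dt * (-a * x[i][j]
                              + weighted_sum(A, y, i, j, boundary)
                              + feedforward[i][j])
              for j in range(cols)] for i in range(rows)]
    return [[saturate(v) for v in row] for row in x]


# Illustrative edge-like template T = {A, B, z}: 9 + 9 + 1 = 19 parameters.
A_EDGE = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
B_EDGE = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]
Z_EDGE = -1.0

W, K = -1.0, 1.0  # white and black in the text's convention
square = [[W, W, W, W, W],
          [W, K, K, K, W],
          [W, K, K, K, W],
          [W, K, K, K, W],
          [W, W, W, W, W]]
edges = run_cnn(square, A_EDGE, B_EDGE, Z_EDGE)
```

With this input the settled output keeps the outline of the black square and turns its single interior cell white. Changing only the 19 template numbers changes the operation, which is the sense in which the template acts as the program.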

Considering the input array flow and the output array flow as the input-output relation, the CNN dynamics is an elementary instruction of an array computer on image flows. Its functionality is described by the cell dynamics and the cloning template. Notice that this computing array is not necessarily a Single Instruction Multiple Data (SIMD) computing machine; indeed, with a slight extension (already available in operational visual microprocessors) it is also a Multiple Instruction Multiple Data (MIMD) machine, having a space variant (even locally adaptive) template Tij (the simplest case is the space variant threshold or bias, bij).
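The simplest space variant case mentioned above, a per-cell bias, can be illustrated with a locally adaptive threshold. This is a sketch under assumptions not stated in the text: only the centre elements of A and B are nonzero (both set to 1), the output is a clamp to [-1, 1], the state starts at zero, and the integration is forward Euler. The function name `adaptive_threshold` is hypothetical.

```python
def saturate(x):
    """Assumed output nonlinearity: clamp to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def adaptive_threshold(u, bias, a=1.0, dt=0.1, steps=500):
    """Each cell (i, j) integrates dx/dt = -a*x + y + u_ij + bias_ij.

    With a = 1 the cell settles at y = +1 when u_ij + bias_ij > 0 and at
    y = -1 when u_ij + bias_ij < 0, so -bias_ij acts as a local threshold.
    """
    rows, cols = len(u), len(u[0])
    x = [[0.0] * cols for _ in range(rows)]
    for _ in range(steps):
        x = [[x[i][j] + dt * (-a * x[i][j] + saturate(x[i][j])
                              + u[i][j] + bias[i][j])
              for j in range(cols)] for i in range(rows)]
    return [[saturate(v) for v in row] for row in x]


u = [[0.2, 0.2],
     [-0.5, 0.9]]
bias = [[-0.5, 0.0],      # a different bias, hence a different
        [0.6, -0.95]]     # effective instruction, in every cell
out = adaptive_threshold(u, bias)
```

The two top cells receive the same input but settle to opposite outputs, because their biases differ.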

2.2 The Universal Machine on Flows and the Cellular Wave Computer [1, 2, 7]

Now, we select the CNN dynamics as an elementary instruction of an array computer on image flows. This is a drastic departure in constructing a computer, with the protagonist instruction implemented by a programmable CNN dynamics solving a nonlinear wave equation on data as image flows. The axiomatic foundation, or the recipe to form a generic spatial-temporal machine, is as follows.
• Take the topographic sensory cellular dynamics, axiomatically introduced in Section 2.1, as the protagonist instruction, with programmable templates;
• Construct global operators (functionals) on a picture or on an image flow;
• Construct local memories in each cell to store intermediate results cell by cell;
• Construct a local communication and control unit in each cell, communicating with the global programming unit, called the Global Analog-logic Programming Unit (GAPU);
• This unit, the GAPU, hosts the global instruction registers (the templates as well as some optional local logic or arithmetic function codes), the configuration registers for reconfiguring the cells, and the executable program, that is, the algorithmic sequence of CNN wave dynamics, global decisions, and local logic or arithmetic instructions. The machine code of this program is hosted in the Global Analog-logic Control Unit (GACU).

Hence, we formally get the CNN Universal Machine (CNN-UM) on image flows, or more rigorously, a Cellular Wave Computer on Flows. Observe that here the simplest elementary instruction, the solution of a nonlinear PDE, is the one that is typically the most complex task on a digital computer. This is the reason why a CNN-UM based visual microprocessor, e.g. the ACE 16k, is so powerful in Speed-Power-Area measures.
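To make the recipe concrete, the following toy interpreter executes a stored program that mixes a template (wave) instruction with a cell-by-cell logic instruction on named local memories. It is a sketch only: the instruction tuples, the names `TEMPLATES`, `LOGIC`, `execute` and `run_template`, the edge-like template values, the clamp output and the Euler integration are all assumptions made for illustration, and do not reproduce the instruction set of the ACE 16k or of any actual CNN-UM.

```python
def saturate(x):
    """Assumed output nonlinearity: clamp to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def weighted_sum(t, field, i, j, boundary):
    """Sum template * field over the 3x3 neighborhood of cell (i, j)."""
    rows, cols = len(field), len(field[0])
    total = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            k, l = i + di, j + dj
            inside = 0 <= k < rows and 0 <= l < cols
            total += t[di + 1][dj + 1] * (field[k][l] if inside else boundary)
    return total


def run_template(u, template, a=1.0, dt=0.1, steps=200, boundary=-1.0):
    """One wave instruction: run the CNN transient (forward Euler), return y."""
    A, B, z = template
    rows, cols = len(u), len(u[0])
    ff = [[weighted_sum(B, u, i, j, boundary) + z for j in range(cols)]
          for i in range(rows)]
    x = [[0.0] * cols for _ in range(rows)]
    for _ in range(steps):
        y = [[saturate(v) for v in row] for row in x]
        x = [[x[i][j] + dt * (-a * x[i][j] + weighted_sum(A, y, i, j, boundary)
                              + ff[i][j])
              for j in range(cols)] for i in range(rows)]
    return [[saturate(v) for v in row] for row in x]


# Stand-in for the global instruction registers (hypothetical contents).
TEMPLATES = {"edge": ([[0, 0, 0], [0, 1, 0], [0, 0, 0]],
                      [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]],
                      -1.0)}
# Local logic function codes, applied cell by cell (black = +1 is "true").
LOGIC = {"and_not": lambda p, q: 1.0 if (p > 0 and q <= 0) else -1.0}


def execute(program, memory):
    """Run a stored program; `memory` maps names to per-cell arrays."""
    for instr in program:
        if instr[0] == "template":
            _, name, src, dst = instr
            memory[dst] = run_template(memory[src], TEMPLATES[name])
        elif instr[0] == "logic":
            _, op, src1, src2, dst = instr
            memory[dst] = [[LOGIC[op](p, q) for p, q in zip(r1, r2)]
                           for r1, r2 in zip(memory[src1], memory[src2])]
        else:
            raise ValueError("unknown instruction: %r" % (instr,))
    return memory


W, K = -1.0, 1.0
square = [[W, W, W, W, W], [W, K, K, K, W], [W, K, K, K, W],
          [W, K, K, K, W], [W, W, W, W, W]]
program = [("template", "edge", "input", "edges"),
           ("logic", "and_not", "input", "edges", "interior")]
memory = execute(program, {"input": square})
```

Here the program first extracts the outline of the black square with a wave instruction, then uses local logic to keep the input pixels that are not on the outline, leaving only the interior cell black in `memory["interior"]`.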

2.3 Wave algorithms and wave logic: combining spatial-temporal nonlinear waves and binary logic

In classical computers, algorithms are defined on integers, and one of the most difficult tasks is the solution of a nonlinear wave equation, for example the Navier-Stokes equation for fluid dynamics or a reaction diffusion equation. The formal definition of an algorithm in a Cellular Wave Computer has been introduced only recently [7], although many outstanding algorithmic solutions for very difficult problems have already been invented during the last 12 years using the various implementations of the CNN Universal Machine [2]. The data are image flows; the operators on image flows and the formal definition of an algorithm are defined by the α-recursive function [7]. This Universal Machine on Flows is universal both in the Turing sense and in the sense of a nonlinear spatial-temporal operator with fading memory. It is remarkable that the discrete space Partial Differential Difference Equation (PDDE), as a CNN dynamics, is more general than the continuous space PDE [12]. The numerous examples and application case studies published in the literature during the last 12 years illustrate the power of the concept of the CNN Universal Machine on Flows (UMF); together with the various physical implementations of CNN Technology described in the next section, however, they are only the tip of an iceberg. Indeed, a new kind of algorithmic thinking is emerging and a new software technology is developing, based on the different physical implementations.
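As a worked illustration of how a continuous space PDE maps onto a discrete space CNN dynamics, consider linear diffusion. The setting is an assumption made only for this sketch: a square grid with spacing h, a diffusion coefficient D, a self-decay coefficient a in the cell equation, and cells operating in the linear region of a saturating output, where y = x.

```latex
% Continuous-space diffusion and its 5-point discretization on the grid
\frac{\partial x}{\partial t} = D\,\nabla^{2} x
\quad\longrightarrow\quad
\frac{dx_{ij}}{dt} = \frac{D}{h^{2}}
  \left( x_{i-1,j} + x_{i+1,j} + x_{i,j-1} + x_{i,j+1} - 4\,x_{ij} \right)

% The same dynamics written as a cloning template for
% dx_{ij}/dt = -a x_{ij} + \sum A(ij,kl) y_{kl} + \sum B(ij,kl) u_{kl} + z, with y = x:
A = \frac{D}{h^{2}}
\begin{pmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{pmatrix}
+ \begin{pmatrix} 0 & 0 & 0 \\ 0 & a & 0 \\ 0 & 0 & 0 \end{pmatrix},
\qquad B = 0, \qquad z = 0
```

The centre entry a of the second matrix cancels the decay term -a x_ij, so that only the discrete Laplacian remains. Template choices that do not arise from discretizing any PDE are equally valid CNN programs.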

3 Topographic sensory-computing-activating circuits, systems, and microprocessors via diverse physical implementations

3.1 Topographic sensory (incl. visual) CMOS microprocessors

Recently, a powerful physical implementation of the cellular wave computer principle has been fabricated: the ACE 16k visual microprocessor [13]. It also serves as the entry level member of a new family of visual microprocessors and related systems. The mixed-mode CMOS chip, ACE 16k, has been embedded into the self-contained Bi-I camera computer (with two "eyes"), the fastest of its kind, with over 30 000 frames per second, winning the product of the year title at Vision 2003. Optical sensing is integrated with stored programmable cellular array computing, and the results are sent out in digital form, as an event address as well. The present roadmap is shown in Figure 3. The main challenge now is to find the optimum design in trading off universality and flexibility against efficiency. This is an area where the CAD technology of deep submicron integrated circuits would play a major role. Many other types of CMOS implementations are available, including emulated digital and FPGA.

3.2 Cellular wave computing principles in optics and nanotechnology – the function-in-layout principle

The Cellular Wave Computer architecture is a natural one for optical computers (see our POAC system), where the correlation is done at the speed of light and by the laws of physics; the main problem was to introduce stored programmability. Likewise, as soon as we go below 100 nm, the blessing of short wires is obvious. Moreover, function-in-layout means that the locally active cells, even with the given parasitics, are arranged to implement the function via the locally connected layout.

4 The Bio component

The CNN architecture is seemingly a prototype "design" for many sensory processing organs, as well as for many cortical parts of the nervous system. This area is studied extensively in the literature. Recently, an area has been emerging at the crossroads of Information and Communication Technologies and Life Sciences. We call it bionics. A paradigmatic area is vision, in particular the retina. The recent breakthrough in understanding the multi-channel model of the mammalian retina [9] demonstrated the importance of modeling via multilayer CNN dynamics, as well as of real time implementation using the Bi-I camera-computer. Moreover, we have developed other sensory models (tactile, etc.) and bio-inspired cellular wave computing algorithmic principles.

7 References

[1] L. O. Chua and T. Roska, Cellular Neural Networks and Visual Computing, Cambridge University Press, Cambridge, UK, 2002, paperback: 2005
[2] T. Roska, "Cellular Wave Computers for Brain-like Spatial-Temporal Sensory Computing", IEEE Circuits and Systems Magazine, Vol. 5, pp. 5-19, 2005
[3] T. Roska and Á. Rodríguez-Vázquez, "Towards Visual Microprocessors", Proceedings of the IEEE, July 2002
[4] Á. Zarándy and Cs. Rekeczky, the companion paper in [2]
[5] L. O. Chua, CNN: A Paradigm for Complexity, World Scientific, Singapore, 1998
[7] T. Roska, "Computational and Computer Complexity of Analogic Cellular Wave Computers", Journal of Circuits, Systems and Computers, Vol. 12, pp. 539-562, 2003
[8] Sz. Tőkés et al., "Flexibly Programmable Opto-electronic Analogic CNN Computer (POAC) Implementation Applying an Efficient, Unconventional Optical Correlator Architecture", ibid., pp. 739-768, 2003
[9] B. Roska and F. S. Werblin, "Vertical interactions across ten parallel, stacked representations in the mammalian retina", Nature, vol. , pp. , March 29, 2001
[10] D. Bálya, B. Roska, T. Roska, and F. S. Werblin, "A CNN Framework for Modeling Parallel Processing in the Mammalian Retina", Int. J. Circuit Theory and Applications, Vol. 30, pp. 363-393, 2002
[11] M. Gilli, T. Roska, P. P. Civalleri, and L. O. Chua, "CNN dynamics represents a broader class than PDEs", Int. J. Bifurcation and Chaos, vol. , pp. 2003
[12] Cs. Rekeczky and L. O. Chua, "Computing with Front Propagation: Active Contour and Skeleton Models in Continuous-Time CNN", Journal of VLSI Signal Processing, Vol. 23, No. 2/3, pp. 373-402, Kluwer, 1999
[13] S. Espejo, R. Carmona, R. Domínguez-Castro and Á. Rodríguez-Vázquez, "A CNN Universal Chip in CMOS Technology", International Journal of Circuit Theory and Applications, Vol. 24, pp. 93-109, 1996

Recent Special issues:
Int. J. Circuit Theory and Applications, March-June, 2002
Journal on Circuits, Systems and Computers, August and December, 2003
International Journal on Bifurcation and Chaos, February, 2004 (a bio-issue, 3 papers related to CNN)
IEEE Transactions on Circuits and Systems I, May, 2004
web site: http://www.ieee-cas.org/~cnnactc

Figure 1 A CNN cell array: uij is the sensory input, xij the state, yij the output, and z the bias (e.g. z = -0.5). The template [A B z] is the program of the network.

$$\frac{dx_{ij}}{dt} = -a\,x_{ij} + \sum_{kl} A(ij,kl)\,y_{kl} + \sum_{kl} B(ij,kl)\,u_{kl} + z_{ij}$$

Figure 2 The CNN dynamic array equations

Figure 3 CNN technology roadmap, 1995-96 to 2005: the ACE400, ACE4k, ACE16k, Q-Eye* and XENON+ chips, with array sizes from 20x22 up to 176x144 (QCIF), binary or gray-scale I/O, optical input, and frame rates from 1000 to 50 000 fr/sec. Except the Xenon, all are designed in Seville. (* AnaFocus Ltd., Seville; + Comp. Aut. Inst. Budapest)
