Programmable Cellular Architectures at the Nanoscale

Nano Communication Networks 1 (2010) 77–85
Journal homepage: www.elsevier.com/locate/nanocomnet

Pritish Narayanan a, Teng Wang b, Csaba Andras Moritz a,∗
a 151 Holdsworth Way #301, University of Massachusetts Amherst, Amherst MA 01003, United States
b QualComm Inc. 5535 Morehouse Dr, San Diego, CA 92121, United States
∗ Corresponding author. E-mail addresses: [email protected] (P. Narayanan), [email protected] (C.A. Moritz). URL: http://www.ecs.umass.edu/ece/andras (C.A. Moritz).

Article history: Received 4 June 2010; Accepted 9 July 2010; Available online 27 July 2010
Keywords: NASICs; Cellular neural networks; Parallel architectures; Nanowires; Nanofabrics

Abstract

This paper presents the first fully programmable digital cellular design for nanodevice-based computational fabrics. The system has a fully regular structure and consists of a large number of simple functional units called cells. It is programmable, based on a small number of global signals routed from supporting CMOS and associated nanoscale circuitry. The architecture may be adapted to suit a multitude of information-processing paradigms. One example is shown on a two-dimensional (2D) semiconductor nanowire fabric, including corresponding circuit-level aspects. Key metrics such as density and performance are evaluated. This digital cellular design may be up to 22 times denser than an equivalent projected 16 nm CMOS version for image-processing applications. High performance is achieved, with megapixel-size images estimated to require only a few microseconds for processing. Possible manufacturing routes and defect tolerance aspects in the context of image-processing applications are also discussed.

© 2010 Elsevier Ltd. All rights reserved.
1878-7789/$ – see front matter. doi:10.1016/j.nancom.2010.07.003

1. Introduction

Reliable manufacturing of large-scale computational systems based on promising nanodevices such as semiconductor nanowires (NWs), carbon nanotubes (CNTs) and molecular devices continues to be challenging. Different fabric architectures for nanoscale systems have been proposed; see, for example, [2,13,18,8]. In all these cases, an important objective of the fabric has been to minimize the underlying manufacturing and device requirements. For example, a microprocessor design built on semiconductor nanowire grids using the NASIC (nanoscale application specific integrated circuit) fabric is shown in [13,14]. Various optimizations, enhancements and defect/fault tolerance techniques have been proposed for NASICs to reduce the manufacturability constraints and improve the performance and projected yield of these designs.

In this paper, we explore a new architecture for nanodevice-based computational fabrics called the NAno-device-based Programmable Architecture (NAPA). We discuss NAPA in the context of the NASIC computational fabric. However, NAPA and the methodology may be equally applicable to other nanodevice-based fabrics which employ some form of two-level logic, e.g., CMOL, FPNI, NanoPLA, etc. Given constraints on customization in nanoscale systems (i.e., due to their being built at least partially with bottom-up chemical self-assembly), NAPA-like systems are a potentially promising direction for applications such as signal processing, single instruction multiple data (SIMD), and image processing on such fabrics.

The NAPA design is regular and highly parallelized, with a large number of identical functional units or cells performing a given task. Instructions to each cell are transmitted on a limited number of global signals controlled from reliable peripheral CMOS circuitry. The cells themselves are composed of a small number of nanodevices that perform simple computations and are locally interconnected. The collective behavior of a large number of interconnected cells achieves information processing.

These sorts of collective computation systems may be a "natural fit" for nanoscale implementations because (i) high densities are available for a high level of parallelism, (ii) local interconnections with minimal global routing remove the need for complex interconnect structures and (iii) 100% fault-free components are not a requirement since some defective cells in the design may not adversely affect the overall output integrity, and this allows more aggressive manufacturing processes. However, given the high rates of defects in nanomanufacturing, built-in defect/fault tolerance techniques or reconfiguration support (such as assumed in [2,18,19]) will need to be incorporated at various levels of NAPA to ensure that a sufficiently large number of cells are functioning correctly for output integrity.

NAPA may be used to implement a variety of computational paradigms. In this paper, one possible implementation is discussed, based on a computational model used in a discrete-time digital cellular neural network (CNN) for image-processing tasks. Multiple image-processing tasks are possible by controlling the global instruction signals, which act as the templates for the CNN cells. Hence, the design is fully programmable and can run a variety of processing tasks in a sequence.

The main contributions of this paper are as follows. (i) A new cellular architecture for nanodevice-based digital computational fabrics is presented; (ii) a method to achieve full programmability for this architecture without any additional manufacturing requirements is proposed; and (iii) key metrics such as area and delay are evaluated, and the density benefits of the nanodevice-based design over an equivalent projected 16 nm CMOS implementation are discussed. As far as we are aware, NAPA is the first proposed nanodevice-based digital cellular implementation that properly addresses programmability and physical implementation issues such as how to deliver template values to individual nanocells.

The rest of the paper is organized as follows. Section 2 provides an introduction to cellular neural networks. Section 3 discusses the new cell-based architecture. Section 4 evaluates the area and delay on a two-dimensional (2D) nanowire grid. Section 5 discusses defect tolerance aspects. Section 6 concludes the paper.

2. Cellular architectures

A cellular neural network (CNN, also known as a cellular nonlinear network) [1] is a massively parallel computing system made of identical units called cells. Each cell has an input, an output and an internal state that evolves based on certain dynamic rules called templates. Most interconnections are local, and each cell exchanges signals with its nearest neighbors. This system may be used for applications such as image [...] dynamic rule of evolution of the CNN is the Kirchhoff current law at the state node. Circuit-level implementations of an analog CNN are shown in [7,11].

A detailed comparison of different approaches to building CNN systems at the nanoscale is presented in [16]. In general, analog implementations of a CNN may be difficult if not impossible to realize at the nanoscale. These designs require arbitrary sizing and customization of devices and precise control of transistor operating points, and may therefore not be achievable with self-assembly-based approaches that favor the formation of regular structures and are not amenable to that degree of nanoscale customization. While the CMOL Crossnet [8] is an analog Hopfield-like neural network, it uses CMOS for computation and signal restoration. Nanowires/nanodevices are used only for interconnects.

Digital MOSFET-based implementations of the CNN include [17,3]. In digital CNN systems, the cell is a digital circuit evaluating a Boolean expression over discrete clock cycles. The input, state and outputs are binary values, and state evolution is through the execution of combinational logic followed by latching at the clock edge. Templates are stored in and retrieved from registers.

In addition to analog and digital implementations, researchers are also exploring the creation of nanoscale architectures using specialized devices. For example, Refs. [9,6] show the use of resonant tunneling diodes (RTDs) for CNN architectures. Ref. [23] explores a tunneling phase logic (TPL)-based implementation of a CNN in which the electrical phase of the tunneling phenomenon is used to encode the logic state, and nearest-neighbor interaction is achieved by capacitive coupling. While these specialized device-based approaches may achieve high integration densities and fast performance, there are many unresolved issues: for example, the need to address each individual nanodevice, support for template programming, and in general the manufacturability of complete fabrics based on these devices. We therefore believe that these systems, while theoretically very interesting, may only be feasible in the longer term, if at all. By contrast, key manufacturing steps for digital systems such as NASIC have already been demonstrated [15,13]. With improvements in nanomanufacturing, digital nanoscale designs may be aggressively scaled to achieve tremendous benefits in density, performance and other key metrics.

3. The NAPA cellular architecture

3.1. Overview of NASIC

The NASIC (nanoscale application specific integrated circuit) fabric is a nanoscale fabric proposed for semiconductor nanowires that targets datapaths. NASIC designs
