Theme Feature

Seeking Solutions in Configurable Computing

Configurable computing offers the potential of producing powerful new computing systems. Will current research overcome the dearth of commercial applicability to make such systems a reality?

William H. Mangione-Smith, University of California, Los Angeles
Brad Hutchings, Brigham Young University
David Andrews, University of Arkansas
André DeHon, University of California, Berkeley
Carl Ebeling, University of Washington
Reiner Hartenstein, University of Kaiserslautern
Oskar Mencer, Stanford University
John Morris, University of Western Australia
Krishna Palem, New York University
Viktor K. Prasanna, University of Southern California
Henk A.E. Spaanenburg, Lockheed Sanders

Configurable computing systems combine programmable hardware with programmable processors to capitalize on the strengths of hardware and software. Often these systems must also address the difficulties of both hardware and software, because they mix the two technologies. While the origins of configurable computing go back at least 30 years, the past eight years have brought about a significant increase in research activity.

Since at least 1989,1 configurable computing systems2 have demonstrated the potential for achieving high performance for a range of applications, including image filtering, convolution, morphology, feature extraction, and object tracking. Researchers have developed prototype systems that achieve performance an order of magnitude higher than more conventional approaches for a number of applications. However, realizing this potential outside of the laboratory has proven difficult because these systems rely on manipulating low-level abstractions—digital circuits, for example—and thus require highly skilled developers.

CURRENT STATE OF AFFAIRS

The earliest configurable computing machine was likely proposed, designed, and implemented by Gerald Estrin at UCLA in the early 1960s.3 Estrin proposed the “fixed plus variable structure computer,” which dedicated hardware to both an (inflexible) abstraction of a programmable processor and a (flexible) component that implemented digital logic. This basic architecture, which supports programmed hardware and software, is at the core of all subsequent configurable computing systems. Unfortunately, Estrin’s architectural concepts were well ahead of the enabling technology, and he was only able to prototype a crude approximation of his vision. Many of the concepts that are now being discovered by the configurable computing community lie quietly unheeded in Estrin’s early publications.

The enabling technology behind the renewed interest in configurable computing is the availability of high-density VLSI devices that use programmable switches to implement flexible hardware architectures. These chips contain memory cells that hold both configuration information for the programmable switches and state information for active computations. Before programming, the chips present a partial architecture, which is then refined according to the configuration information. The configured device provides an execution environment for a specific application.

The most common devices used for configurable computing are field programmable gate arrays. FPGAs present the abstraction of gate arrays, allowing developers to manipulate flip-flops, small amounts of memory, and logic gates.

Figure 1 illustrates the basic architectural components of all configurable computers. This highly abstracted model allows a wide range of design choices, all of which revolve around three main decisions.

• Granularity of programmable hardware. Most existing configurable computers use commercial FPGAs. Consequently, application development involves the use of traditional CAD tools, which were developed for application-specific integrated circuits (ASICs). Many application developers find this low-level abstraction difficult to work with, and the systems achieve poor circuit density for highly regular structures such as multipliers. To raise the level of abstraction, several configurable computing systems under development limit the programmable hardware to the interconnect, and in the place of gates and flip-flops they use components such as arithmetic logic units (ALUs) or multipliers.
• Proximity of the CPU to the programmable hardware. First-generation systems typically used buses like the Sparc SBus to provide a coprocessor-like structure. Recently, some researchers have argued that the programmable hardware must be much closer to the processor, perhaps even on the datapath, fed by processor registers. This issue affects hardware design as well as application development.
• Capacity. Different system designers have made drastically different choices about fundamental questions of system capacity. What is the best ratio of programmable hardware to memory size

38 Computer 0018-9162/97/$10.00 © 1997 IEEE .

and bandwidth? Or processor communication bandwidth? How much programmable hardware is required: an unlimited amount for applications with unbounded parallelism, or only so much?
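One concrete reading of the granularity and capacity questions above: chunky function units fix a native datapath width, and wider operations must be composed from narrower slices. The following is a minimal software sketch of that composition, assuming hypothetical 8-bit adder slices that ripple a carry to form a 32-bit add (the function name and slice width are our invention, for illustration only):

```python
def add32_from_8bit_slices(x, y):
    """Compose a 32-bit addition from four 8-bit adder slices,
    rippling the carry between slices as a chunky datapath would."""
    result, carry = 0, 0
    for i in range(4):                   # slice 0 = least significant byte
        xs = (x >> (8 * i)) & 0xFF
        ys = (y >> (8 * i)) & 0xFF
        s = xs + ys + carry              # one 8-bit ALU slice
        result |= (s & 0xFF) << (8 * i)
        carry = s >> 8                   # ripple carry to the next slice
    return result                        # final carry-out discarded (mod 2**32)

print(hex(add32_from_8bit_slices(0x0000FFFF, 0x00000001)))  # 0x10000
```

The same pattern, run in reverse, is how a wide native datapath can "waste" its upper bits to serve narrow-word sensor data.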

Granularity of programmable hardware

The configurable computing community is divided into two camps, according to the level of abstraction provided by the programmable hardware. The majority of current research efforts use commercial FPGAs and manipulate digital circuits through logic gates and flip-flops. We will refer to these devices as netlist computers. As part of conventional CAD development of ASICs, digital circuits are translated into netlists, which are composed of logic gates and flip-flops.

In the second camp are the newer architectures, which are based on “chunky” function units such as complete ALUs and multipliers. These architectures limit the programmable hardware to the interconnect among the function units, but implement those units in much less IC area.

[Figure 1. Architectural components of a configurable computer: a microprocessor, whose operation is specified in a high-level language such as C, Fortran, Pascal, or Java, and programmable hardware (gates and/or interconnect), whose operation is specified with explicit concurrency, such as loops extracted from C or Fortran, VHDL, Verilog, or schematic diagrams of digital circuits; each has attached memory.]

Netlist computers. A typical netlist computing device is an FPGA containing thousands of low-powered processing elements. For example, an FPGA cell might consist of a single flip-flop and a function generator that implements a Boolean function of four variables. FPGAs have a programmable interconnect that is manipulated as individual wires. Because of their fine granularity, netlist computers are the most flexible configurable computers; their elements can be used to implement state machines, datapaths, and nearly any digital circuit. This flexibility is purchased with additional silicon, and it results in lowered performance on certain classes of problems, compared to chunky architectures. The two best-known netlist computers are Splash4 and DECPeRLe-1.1

Conceptually, Splash consists of a linear array of processing elements. This topology makes Splash a good candidate for linear-systolic applications, which stress neighbor-to-neighbor communications. Because of limited routing resources, Splash has not proven as effective at implementing multichip applications that are not linear systolic, though some progress has been made. The DECPeRLe-1 is organized as a two-dimensional mesh and consists of a 4 × 4 array of FPGAs. Each FPGA has connections to its nearest neighbors as well as to a column bus and a row bus.

The designers of Splash and the DECPeRLe-1 constructed them as attached accelerators alongside workstations. Neither Splash nor DECPeRLe-1 provides general-purpose routing networks between FPGAs. Instead they require the designer to manually partition the circuit during the design phase, ensuring that the available interconnect is used as efficiently as possible.

The netlist computer presents a number of serious challenges to application development. The developers must be concerned with the size and usage of the FPGA devices, the size and usage of the memories, and finally the overall interconnection of all devices on the platform during all phases of the design process. The design process is therefore more difficult and time-consuming. Furthermore, modifications can require a significant amount of CAD compilation time.

The challenge to the designers of netlist computers is to show that the increased flexibility presented by a low-level abstraction is essential to enable an important class of applications, thus compensating for the increased design difficulty. While a small number of netlist computing systems are available,5 they have achieved little commercial success thus far.

Chunky function unit architectures. General-purpose processors (including digital signal processors) use optimized function units that operate in bit-parallel fashion on long data words. Compared with GPPs, FPGAs are inefficient at performing ordinary arithmetic and logic operations. Netlist computing has the advantage when it comes to nonstandard bit-oriented computations such as count-ones, find-first-one, or complicated masking and filtering.

At the same time, much of the research in configurable computing has focused on parallel DSP applications. Examples include image morphology, sensor beam forming, and object recognition. These tasks usually process sensor data that is 8-12 bits wide, and some filtering might also be necessary.
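The four-input function generator described above has a direct software analogy: a 16-entry truth table, where "configuring" the cell means filling in the table. The sketch below is a minimal illustration of that abstraction, not any vendor's actual cell; all names are invented for the example.

```python
# Minimal software model of an FPGA lookup-table (LUT) cell: a Boolean
# function of four variables stored as a 16-entry truth table.

def make_lut4(func):
    """'Configure' a 4-input LUT by tabulating an arbitrary Boolean function."""
    table = [func((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1)
             for i in range(16)]
    def lut(a, b, c, d):
        # Hardware would index the table with the four input wires.
        return table[(a << 3) | (b << 2) | (c << 1) | d]
    return lut

# Configure the same cell two different ways, as reconfiguration would.
parity = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
majority3 = make_lut4(lambda a, b, c, d: int(a + b + c >= 2))  # d ignored

print(parity(1, 0, 1, 1))     # 1
print(majority3(1, 1, 0, 0))  # 1
```

Because any function of four inputs fits in the table, a sea of such cells plus programmable wiring can realize state machines, datapaths, and nearly any digital circuit, which is exactly the flexibility (and the silicon overhead) discussed above.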

December 1997 39 .

Such filtering typically requires multipliers. Although FPGAs are capable of implementing these regular hardware structures, they are generally poor targets even when handcrafted libraries are used. FPGAs do provide a distinct opportunity for hardware specialization, however. For example, it is now standard practice to use customized hardware when multiplying by a value that is constant over a long period of time.

Chunky function unit architectures address this problem of poor hardware efficiency by mixing highly optimized parallel function units—the sort found in programmable processors—with a programmable interconnect. For example, a collection of multipliers might be available along with a crossbar interconnect to efficiently support a wide range of infinite-impulse response (IIR) filters. This basic approach has been pursued by a number of recent research projects, including rDPA at Kaiserslautern,6 RaPiD at the University of Washington,7 and Matrix at MIT.8

These architectures each present an abstraction that is much higher than logic gates and flip-flops. Consequently, highly regular applications that map well to a specific implementation will likely achieve high performance with low silicon cost. Many other regular applications can probably be mapped onto the same implementation, either by breaking wide-word operations into a composition of the narrower, native hardware function units or by simply wasting the upper bits of the fixed datapath to handle narrow-word operations. However, highly irregular computations will likely be a poor match for these architectures.

The primary challenge for the proponents of chunky architectures is to identify the essential set of features for the function units, the datapath width, and the necessary interconnect structure. Each of these issues presents a compromise between functional density and speed on the one hand and applicability on the other. Thus far, results for chunky function unit architectures have been limited to design evaluation and performance estimates. The immediate challenge is to demonstrate performance on a set of interesting applications in the laboratory.

Ultimately, frustration in both industry and research centers on the lack of a single unifying model that combines the netlist and chunky approaches. While the models are united by the use of programmable hardware and processors for executing applications, there currently is no effective unifying architecture either within or between the camps. Although it would be intellectually satisfying to have a single effective taxonomy, it is unlikely that one will emerge until a number of configurable computing architectures deliver real, commercially relevant applications to the market.

CONFIGURABLE COMPUTING MODELS

One way to characterize the differences among how configurable computing might be applied is to consider the rate of configuration. We are not concerned here with the abstractions that are configured, but rather with the abstraction of the configuration. The essential characteristic of all the computing models is that some amount of state is semistatic; that is, it changes frequently enough to take advantage of programmability but slowly enough to mask the hardware configuration time.

Static configuration involves hardware changes at a relatively slow rate: hours, days, or weeks. This classification trivially subsumes the case of a bug fix. Static configuration has been used to accelerate a number of applications, including automatic target recognition.2

Time-sharing is another approach. If an application can be pipelined, it might be possible to implement each phase in sequence on the programmable hardware. The DISC project at Brigham Young University involved compiling C code fragments to assembly language for a custom FPGA-based processor as well as configuration components for an FPGA accelerator.9 The FPGA caches configuration components and executes a demand-driven fetch and reload in response to a fault. Time-sharing has also been used to improve performance for a number of applications, including a video communications system and image recognition.2

The most ambitious form of configuration suggested thus far involves dynamic generation of FPGA circuits. This technique has been proposed both for evolutionary systems that exhibit emergent behavior and for more traditional applications through parameterized macro libraries and dynamic placement. However, it has not yet been shown how broadly this approach can be applied.

The challenge to the configurable computing research community is to refine these computing models into reusable frameworks. Thus far, the abstractions generally exist in the developer’s mind and are implemented through ad hoc mechanisms based on available technology and local experience. The community must develop a set of application programming interfaces (APIs) to support common interactions with any programmable hardware. Examples include hardware configuration, controlling programmable clocks, and debugging. At a higher level of abstraction, efficient application development frameworks must be created that understand these common modes of configuration and allow the developer to quickly and clearly express the desired structure. Any effective framework would need to leverage state-of-the-art research from the compiler and CAD worlds.

CAD and compilation tools

The issues of development models and tools are inextricably linked. The underlying device hardware presents an architecture that the tools manipulate and refine, and then make available to the system developer.
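The demand-driven fetch and reload described for DISC can be sketched in a few lines. The sketch below assumes an array with room for a fixed number of configurations and a simple FIFO eviction policy; the class, method names, and policy are our invention for illustration, not DISC's actual mechanism.

```python
# Sketch of demand-driven configuration caching: hardware "phases" are
# loaded onto the programmable array only when a fault shows they are
# not currently resident.

class ConfigCache:
    def __init__(self, slots):
        self.slots = slots       # how many configurations fit on chip
        self.resident = []       # configurations currently loaded
        self.faults = 0          # demand-driven reloads performed

    def execute(self, phase):
        if phase not in self.resident:
            self.faults += 1     # fault: fetch and reload this phase
            if len(self.resident) == self.slots:
                self.resident.pop(0)      # evict the oldest configuration
            self.resident.append(phase)
        return f"ran {phase}"

cache = ConfigCache(slots=2)
for phase in ["filter", "threshold", "filter", "label"]:
    cache.execute(phase)
print(cache.faults)   # 3: "filter" is still resident on its second use
```

The semistatic-state idea above is visible here: a phase pays its configuration cost once, then runs repeatedly until capacity pressure evicts it.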


Because the space between netlist and chunky architectures is large, current configurable computing systems leverage a wide range of development models. However, a number of consistent problems exist in all current tool sets, and so we address the problems here as a whole.

Tools for configurable computing systems must manage resources through time as well as space. Some of these same scheduling problems occur in VLSI CAD design systems, where independent hardware blocks are active concurrently and physical placement can have a dominant effect on performance. Configurable computing systems open up the opportunity to migrate functionality from one time segment to another, or possibly into multiple time segments in different physical locations. This opportunity presents a new challenge, as CAD software often introduces intolerable inefficiencies even when dealing with the subset of issues that arise during ASIC design.

FPGAs are the most commonly available components for configurable computing systems. This choice encourages the designer to use the existing FPGA development tools. Thus, applications are specified and designed by manipulating digital logic components, usually through schematic capture or synthesis. Unfortunately, digital circuits are not abstract enough to enable most programmers to use the technology effectively. At the same time, the highest-level abstractions available—Verilog and VHSIC Hardware Description Language—often result in low performance. It is clear that a development model that manipulates higher level abstractions, and yet lends itself to highly efficient CAD tools, will be required before configurable computing becomes accessible to a large body of developers.

A few efforts have been made to produce a unified development environment for configurable computing systems.10,11 Thus far, these efforts have focused on providing a single language that can be effectively mapped to either software or hardware. This structure could then be used in conjunction with an automatic partitioning tool that would guide the compilation process toward the most appropriate target for each code block. Furthermore, the model of computation is a processor plus an ASIC, which happens to be implemented in reprogrammable logic. Few of the current experimental development systems incorporate any models of dynamic configuration.

One promising avenue of research derives from existing work on high-performance compilation for very long instruction word (VLIW) processor architectures.12 VLIW architectures provide a number of parallel function units, which are often connected through flexible but restricted networks. Most existing compilers are parameterized to study the benefit of various function unit combinations and interconnect structures. This computing model is a degenerate case of the chunky architecture, in which a linear array of function units is supported by a rich network. It seems likely that this compiler technology base could be adapted to support chunky configurable computing architectures.

The challenge to tool developers for configurable computing systems is to produce high-performance development systems that capture the complete application specification (hardware and software) and allow the developer to express how the system adapts through time. This challenge is particularly daunting in that it requires leveraging existing tools for ASIC development, compilation, and hardware-software codesign, along with capturing those aspects of application specification that are unique to configurable computing.

Benchmarking and metrics

Because configurable computing systems are developed to satisfy the demand for higher performance systems, there must be some objective method of benchmarking their success. Without objective performance measures, the entire community runs the risk of losing credibility. However, we do not hold out much hope that a meaningful broad-based benchmark suite will be developed for configurable computing systems, for a number of reasons.

First, high performance is typically achieved by fine-tuning and tailoring a circuit description to a specific application. Most of the time these circuits are used only once; circuits might even be optimized for a specific data set. It is unlikely that any general benchmark will apply to these circuits in a meaningful way. When implementations are highly optimized, basic system designs (such as netlists) are not even truly portable across different FPGA architectures. Performance measures often evaluate the skill of the designers rather than the system architecture or the technical approach.

Second, configurable computing systems try to overlap I/O and computation to increase throughput. It is difficult to incorporate I/O requirements in a broad benchmark. Furthermore, I/O characteristics can change dramatically based on the available hardware ports. This factor can complicate the task of performing a deep and thorough performance evaluation.

We are not suggesting that it is impossible to measure performance and compare results. On the contrary, application-specific benchmarks are an effective means of evaluation, and ultimately they are the only meaningful way to compare systems.

Of course, application-specific benchmarks are no panacea. They tell us only what is possible and do not always shed light on how the performance was achieved; this tends to be true of all benchmarks. However, in these early stages, as researchers experiment with a variety of implementation techniques and architectures, application-specific benchmarks can be useful both as a means of comparison (for the same problem) and as a demonstration of feasibility.
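The I/O overlap that complicates benchmarking can be sketched with double buffering: while one block of sensor data is being processed, the next is fetched concurrently, so throughput is bounded by the slower of the two activities rather than their sum. In the sketch below a thread stands in for a hardware I/O port; all names are invented for illustration.

```python
import threading
import queue

def fetch_blocks(source, q):
    """Stand-in for an I/O port: stream data blocks into a bounded queue."""
    for block in source:
        q.put(block)
    q.put(None)  # end-of-stream marker

def overlapped_pipeline(source, compute):
    """Overlap I/O and computation: the fetch thread runs concurrently
    with compute, as a double-buffered hardware system would."""
    q = queue.Queue(maxsize=2)   # double buffer
    t = threading.Thread(target=fetch_blocks, args=(source, q))
    t.start()
    results = []
    while (block := q.get()) is not None:
        results.append(compute(block))
    t.join()
    return results

blocks = [[1, 2, 3], [4, 5], [6]]
print(overlapped_pipeline(blocks, sum))   # [6, 9, 6]
```

A benchmark that times only `compute` would miss exactly the behavior this structure is designed to hide, which is the measurement difficulty noted above.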


The benchmarking challenge to the configurable computing community is straightforward: to develop a benchmark that is useful for understanding and evaluating the performance of a range of configurable computing systems. The path to this goal is unclear.

Commercial development

Despite published research reports, no companies are known to use configurable computing for a competitive advantage. (FPGA vendors have suggested in public forums that some current customers do use configurable computing techniques but consider these practices trade secrets. It seems unlikely that the number of such systems could be large.) A number of vendors have produced systems to enable application development,5,13 though these do not constitute applications themselves. While FPGAs have become a common component in a wide range of commercial products, the vast majority of these uses are for replacing gate arrays to reduce development cost and time.

Even though there are no shipping products, a number of companies have announced configurable computing systems. In particular, Metalithic is pursuing digital recording, and Sundance is developing a system for image processing through automatic hardware/software codesign. The earliest commercial successes will likely involve signal processing, particularly image processing. Many important image-processing applications are well matched to existing FPGA architectures, and a number of research papers have reported impressive performance achievements.

Nonetheless, the lack of any visible market use for configurable computing continues to cast a suspicious shadow across the entire field.

LIMITATIONS OF CONFIGURABLE COMPUTING

It is not surprising that many of the most pressing needs for configurable computing systems can be lumped into the category of FPGA-related technology, given the fact that so many configurable computing systems are based on FPGAs. A number of issues are being addressed by commercial vendors, but we feel these currently limit configurable computing systems.

• Equivalent gate capacity. By most measures, available FPGAs have generally provided the equivalent of 10K to 50K gates. These devices are often large enough to experiment with the basic strategies for configuration, but they have limited the scope of the designs and forced application developers to turn to pared-down systems. The source of these capacity limits is the use of conservative semiconductor processes, particularly with limited metal layers. Recent devices have moved to leading-edge process technology, resulting in FPGAs with over 100K equivalent gates. The problem of gate density will likely decrease in the future as on-chip delays drive designers toward partitioned designs that can be implemented efficiently on multiple chips.
• Configuration speed. Most existing FPGAs use relatively slow serial-shift paths for device configuration, even when a parallel interface is presented through the I/O pins. The configuration time is important for many, but not all, models of computation. In particular, those using fast design swapping might end up limited by reconfiguration time. The industry is moving toward parts that reconfigure faster, partly in response to these needs as well as customer desires for shorter testing cycles.
• Memory structures and interface. Most FPGAs have no external memory interface that can be accessed from the active circuit, which forces system designers to sacrifice some programmable resources to build an application-specific memory interface. A similar problem occurs on chip, where blocks of RAM tend to be scarce and hard to access. A more efficient approach would be for FPGA vendors to implement standard memory interfaces on their devices in dedicated hardware, which can be optimized to the problem.

Configurable computing is a rich area of active research. Unfortunately, no system to date has yet proven attractive or competitive enough to establish a commercial presence. We believe that ample opportunity exists for work in a broad range of areas. In particular, the configurable computing community should focus on refining the emerging architectures, producing more effective software/hardware APIs, better tools for application development that incorporate the models of hardware reconfiguration, and effective benchmarking strategies. ❖

References

1. P. Bertin, D. Roncin, and J. Vuillemin, “Introduction to Programmable Active Memories,” Systolic Array Processors, J. McCanny, J. McWhirter, and E. Swartzlander, eds., Prentice Hall, Englewood Cliffs, N.J., 1989, pp. 300-309.
2. J. Villasenor and W.H. Mangione-Smith, “Configurable Computing,” Scientific American, June 1997, pp. 66-71.
3. G. Estrin et al., “Parallel Processing in a Restructurable Computer System,” IEEE Trans. Electronic Computers, Dec. 1963, pp. 747-755.
4. M. Gokhale et al., “Building and Using a Highly Parallel Programmable Logic Array,” Computer, Jan. 1991, pp. 81-89.


5. J. McHenry, “The WILDFIRE Custom Configurable Computer,” Proc. Int’l Soc. Optical Engineering: Field Programmable Gate Arrays (FPGAs) for Fast Board Development and Reconfigurable Computing, J. Schewel, ed., Philadelphia, 1995, pp. 189-199.
6. R. Hartenstein and R. Kress, “A Datapath Synthesis System for the Reconfigurable Datapath Architecture,” Proc. Asia and South Pacific Design Automation Conf. ’95, 1995, pp. 479-484.
7. C. Ebeling, D.C. Cronquist, and P. Franklin, “RaPiD—Reconfigurable Pipelined Datapath,” Proc. Field-Programmable Logic, Springer-Verlag, Heidelberg, 1996, pp. 126-135.
8. E. Mirsky and A. DeHon, “MATRIX: A Reconfigurable Computing Architecture with Configurable Instruction Distribution and Deployable Resources,” Proc. FPGAs for Custom Computing Machines, IEEE CS Press, Los Alamitos, Calif., 1996, pp. 157-166.
9. M.J. Wirthlin and B.L. Hutchings, “A Dynamic Instruction Set Computer,” Proc. FPGAs for Custom Computing Machines, IEEE CS Press, Los Alamitos, Calif., 1995, pp. 99-107.
10. S. Guo and W. Luk, “Compiling Ruby into FPGAs,” Field-Programmable Logic and Applications, W. Moore and W. Luk, eds., Springer, Oxford, UK, 1995, pp. 188-197.
11. W. Luk, N. Shirazi, and P. Cheung, “Compilation Tools for Run-Time Reconfigurable Designs,” Proc. FPGAs for Custom Computing Machines, IEEE CS Press, Los Alamitos, Calif., 1997, pp. 56-65.
12. J.C. Dehnert and R.A. Towle, “Compiling for the Cydra 5,” J. Supercomputing, 1993, pp. 181-229.
13. S. Casselman, “Virtual Computing and the Virtual Computer,” Proc. FPGAs for Custom Computing Machines, IEEE CS Press, Los Alamitos, Calif., 1993, pp. 43-48.

William H. Mangione-Smith is an assistant professor in the Electrical Engineering Department at the University of California, Los Angeles. He received a BSE in electrical engineering, and an MSE and a PhD in computer science and engineering, all from the University of Michigan, Ann Arbor.

Brad Hutchings is an associate professor of electrical and computer engineering at Brigham Young University. He received a PhD in computer science from the University of Utah.

David Andrews is an associate professor of computer systems engineering at the University of Arkansas. He received a BSEE and an MSEE from the University of Missouri, Columbia, and a PhD from Syracuse University. He is a member of the IEEE Computer Society.

André DeHon is a visiting postdoctoral research engineer at the University of California, Berkeley. He received an SB, SM, and a PhD from the Massachusetts Institute of Technology. He is a member of the ACM, the IEEE Computer Society, Eta Kappa Nu, Tau Beta Pi, and Sigma Chi.

Carl Ebeling is a professor of computer science and engineering at the University of Washington. He received a PhD from Carnegie Mellon University. Ebeling received the NSF Presidential Investigator Award and served as program chair for the Conference on Advanced Research in VLSI and the International Symposium on FPGAs.

Reiner Hartenstein is a professor of computer engineering at the University of Kaiserslautern. He founded the PATMOS workshop series on low-powered VLSI and cofounded the Euromicro and the German E.I.S.-Project on VLSI design. Hartenstein is a member of the GI, ITG, and GME German professional societies and the ACM, and is a senior member of the IEEE.

Oskar Mencer is a PhD candidate in electrical engineering at Stanford University. Mencer received a BSc in computer engineering from Technion, Israel Institute of Technology, and an MS in electrical engineering from Stanford University. He is a student member of the IEEE Computer Society and ACM.

Krishna Palem is an associate professor of computer science in the Courant Institute of Mathematical Sciences at New York University. Palem cofounded PARCON—The Symposium on New Directions in Parallel and Concurrent Computing.

Viktor K. Prasanna (V.K. Prasanna Kumar) is a professor of electrical engineering-systems at the University of Southern California and serves as director of the school’s Computer Engineering Division. Prasanna received a BS in electronics engineering from Bangalore University, an MS from the School of Automation at the Indian Institute of Science, and a PhD in computer science from Pennsylvania State University.

Henk A.E. Spaanenburg is the director of software and computing architectures technology at Sanders, a Lockheed Martin company in Nashua, New Hampshire. He also heads the Software and Computing Architectures Technology Directorate at the Signal Processing Center of Excellence in Sanders’ Advanced Technology Division. He received an MSEE from Delft University of Technology in the Netherlands and a PhD in electrical and computer engineering from Syracuse University.

Contact Mangione-Smith at [email protected].
