Micro-Sequencer Approach Speeds Reconfiguration
Total Page:16
File Type:pdf, Size:1020Kb
On The Softer Side Configurable Digital Signal Processing Micro-Sequencer Approach Speeds Reconfiguration Not all reconfigurable computing schemes were created equal. A micro-sequencer architecture offers faster performance and greater flexibility. Roozbeh Jafari, Graduate Student consumption and silicon area are cute their task. A non-flexible hardware Henry Fan, Graduate Student reduced by 72% and 77% respectively, by realization for such applications has to fit Majid Sarrafzadeh, Professor using a customized 8-bit data bus versus all required algorithm variations on the UCLA Computer Science Department a 64-bit data bus, while the speed is die. This, if possible, makes the design improved by 157%. and fabrication processes more compli- ey military applications such as cated and expensive. image processing, sonar/radar and FPGAs Make it Possible K SIGINT can enjoy significant perfor- Before getting into the details of the Dealing with Delays mance gains by using reconfigurable com- micro-sequencer scheme, it’s helpful to A major drawback of using runtime puting. Therefore, it’s perhaps surprising examine the underlying hardware that reconfiguration is the significant delay of that the technology has been so slow to makes reconfigurable computing possi- reprogramming the hardware. The total gain broad acceptance. Among the barriers ble. Programmable system capability runtime of an application includes the is a lack of clear understanding about its forms the heart of reconfigurable com- actual execution delay of each task on the real performance differences compared to puting. Programmable devices including hardware along with the total time spent traditional compute architectures. FPGAs contain an array of programma- for hardware reconfiguration between Group RTC The / ©2002 Call (949)226-2000 Work done at ULCA’s Computer ble computational units that can be pro- computations. The latter might domi- Science department paints a clearer pic- grammed through the configuration bits. nate the total runtime, especially for ture and proposes a method to enhance This gives us the flexibility of having ded- classes of applications with a small the performance of reconfigurable com- icated hardware to perform specific com- amount of computation between two puting. Researchers there devised the putational units combined with a paral- consecutive reconfigurations. Hardware concept of a micro-sequencer-based lelism capability. reconfiguration often takes hundreds of Reprint Orders reconfigurable system, and introduced Reconfigurable systems provide the milliseconds or longer based on the size For For with it the concept of Quick Recon- flexibility and reuse of hardware for mul- of the application. To reduce the recon- figuration. The approach exploits simi- tiple applications. Reconfigurable hard- figuration overhead, some previous larities among various applications to ware can be used to execute designs that works have used different approaches. save on reconfiguration time. are larger than the available hardware In many applications, only a small Using the micro-sequencer architec- resources. In such cases, a part of a large portion of the design changes at a time ture, the group demonstrated that a sig- application is executed on the hardware. and the entire hardware does not have to nificant increase in speed is gained by By reusing the reconfigurable hardware, be reconfigured. This has led the industry reconfiguring a system on a set of image- the remaining tasks of the application can to add the capability of Partial processing benchmarks. The benchmarks be loaded and executed on the hardware Reconfiguration to some of their recent took merely hundreds of microseconds at runtime. This is known as runtime products. FPGAs are examples of such versus the hours needed in traditional reconfiguration. Another issue that neces- reconfigurable hardware, and some of FPGA reconfigurations. Using this archi- sitates the integration of reconfiguration the recent FPGA devices have the capa- tecture also makes the parameters of a in a hardware platform is that some appli- bility of partial runtime reconfiguration. system, for instance the size of data bus, cations require reconfiguration in differ- Another method used to gain speed easy to modify in order to customize the ent abstraction levels of the system. For is called configuration prefetching. This system for a specific application. The example, some applications require dif- method tries to overlap the computation results, detailed later, show that power ferent variations of an algorithm to exe- with the reconfiguration of the hardware. June 2003 COTS Journal [ 49 ] The Softer Side Therefore, to maximize this overlap, it not vary substantially from one task to seeks a way to minimize the chance that another. The motivation of this research Op-codes for an Algorithm reconfiguration is prefetched falsely. is to enhance the reconfiguration process PLA (computes the start address of the micro-instruction) Configuration compression is anoth- for image-processing algorithms. It’s er approach to minimize the reconfigura- important, therefore, to verify if image- MUX tion time. In this method, the configura- processing algorithms are altered substan- µPC tion delay is reduced by compressing the tially during the reconfiguration process. data transferred from the host computer Most image-processing algorithms Control Store to the programmable system. A major are computationally intensive and should µInstruction Word portion of delay is due to the distance be executed on hardware resources to between the host computer and the pro- allow real-time processing. Moreover, Input Data Data Path Output Data grammable device. Reconfiguration can these algorithms change in nature and be accelerated by using a fast memory parameters based on information avail- Figure 1 (configuration cache) near the reconfig- able from targets. For instance, in a fea- The block diagram depicts a typical urable array. This method is called config- ture-tracking algorithm, the number of design of a microcoded control unit. The uration caching. targets, their position and their distance to the camera can change the algorithm microprogram counter contains the Speedy Micro-Sequencing (or its parameters) to increase the effi- address of the next microinstruction to be In Micro-Sequencer-based Quick ciency of tracking motions. As a result, fetched from the control store, a fast local Reconfiguration (MSQR), the reconfigura- image-processing algorithms are proper memory that contains the control words. tion is performed by altering the algorithm candidates for mapping onto reconfig- The control word is copied into the loaded onto the memory of the proposed urable resources. This not only provides microinstruction registers. architecture. Also new instructions can be fast running time, but also allows dynam- generated by modifying the microcodes ic modification of the algorithm through to be fetched from the control store, a fast stored in the control unit. Using microcod- runtime reconfiguration. Both cannot be local memory that contains the control ed architecture in the control unit of the achieved by mapping this type of algo- words. The control word is copied into micro-sequencer facilitates the parallelism rithm onto traditional fixed software or the microinstruction registers. The con- and adds new computational units to the hardware platforms. trol store consists of microinstructions datapath. In this case, the control unit The micro-sequencer architecture that control the datapath directly. needs the minimum modification since it makes use of microcode architecture for Because image-processing algo- is highly regular compared to sequential- the design of the control unit. In this rithms tend to have similar computation- machine-based controllers. approach, the relation between inputs al behavior, it can be inferred that the Group RTC The / ©2002 Call (949)226-2000 Efficient reconfiguration of an FPGA and outputs is treated as a memory sys- capabilities of the datapath satisfies the is a critical issue because of the time over- tem. Control signals are stored as words new algorithm and it does not have to be head involved. Sometimes in order to in a microcoded memory. At each clock modified. However, since the algorithm reconfigure a system from one algorithm tick during instruction execution, the changes, the opcode part of the micro- to another, the processes of synthesis, appropriate (micro) control word is sequencer, which contains the instruc- placement and routing have to be per- fetched from microprogram memory to tions for a specific algorithm, has to be Reprint Orders formed, which are highly expensive in supply the control signals. updated. This process can be simply done For For terms of CPU time and may take hours to The microcode control unit itself is a by writing the new algorithm (the new complete. In the micro-sequencer small stored program computer. It has a instructions) onto the memory. This approach, when a system is to be recon- micro program counter, a microprogram approach is significantly faster than tra- figured, only the new algorithm has to be memory and a microinstruction word, ditional physical reconfiguration. loaded onto the memory micro- which contains the control signals and Sometimes, due to the constraints of sequencer. When new instructions are sequencing information. The action of new algorithms, the order of utilization required for the new algorithm, the the microcode control unit