7-2
IEEE Asian Solid-State Circuits Conference
November 16-18, 2009 / Taipei, Taiwan

CRISP-DS: Dual-Stream Coarse-grained Reconfigurable Image Stream Processor for HD Digital Camcorders and Digital Still Cameras

Tsung-Huang Chen, Jason C. Chen, Teng-Yuan Cheng, and Shao-Yi Chien
Media IC and System Lab
Graduate Institute of Electronics Engineering and Department of Electrical Engineering
National Taiwan University
BL-421, 1, Sec. 4, Roosevelt Rd., Taipei 106, Taiwan

978-1-4244-4434-2/09/$25.00 ©2009 IEEE

Abstract: A 329mW 600M-Pixels/s dual-stream coarse-grained reconfigurable image stream processor is implemented in TSMC 0.13μm CMOS technology with a core size of 4.84mm². The reconfigurable pipelined processing element array architecture strikes a good balance between computing performance and flexibility with only 10Kb on-chip memory. Moreover, a new dual-stream architecture is proposed to improve the flexibility and hardware efficiency by processing two independent image streams with two-layer context switching, and an isolation technique is also proposed to reduce the power consumption. Implementation results show that it achieves 1.52 times the power efficiency of previous works and can meet the requirements of high-definition video camcorders and digital still cameras.

I. INTRODUCTION

The image signal processing engine in a digital camcorder or digital still camera is critical to generating high-quality images and video frames [1]. As the resolution and frame rate of image sensors grow higher and higher, several design challenges are introduced. First, to process high-definition (HD) videos and images with more than 10M pixels, the required computation becomes enormous. It usually leads to high hardware cost and high power consumption, which is not beneficial for these handheld devices. Second, to execute more advanced image processing algorithms, high flexibility is required, which means programmable processors are better solutions. Therefore, hybrid approaches with a DSP and dedicated hardware are adopted in many commercial products [2] [3]. To support more complex image processing algorithms and higher image resolution, many SIMD image processors have been proposed in recent years [4] [5] [6]. However, the power consumption is still large even with high-end technologies.

Our previous work, the coarse-grained reconfigurable image stream processor (CRISP) [7], can achieve better power efficiency. By inspecting the characteristics of image processing algorithms, several reconfigurable fabrics are designed with a reconfigurable interconnection. Flexibility is achieved by changing the contexts of this reconfigurable hardware. CRISP has been shown to provide higher power efficiency than DSP-based approaches; however, there is still considerable room for flexibility improvement.

In this paper, a dual-stream coarse-grained reconfigurable image stream processor (CRISP-DS) is designed to support HD image processing with high power efficiency. It is characterized as follows. First, the reconfigurable pipelined processing element (PE) array architecture can provide a good balance between computing performance and flexibility with low on-chip memory cost. Second, the new concept of dual-stream processing is introduced to improve both the performance and flexibility. With the PE isolation technique, CRISP-DS can achieve better power efficiency than previous works. It can meet the requirements of HD digital camcorders and still cameras with only 329mW in power consumption.

This paper is organized as follows. First, the architectural concept of CRISP-DS is shown in Section II, and the proposed architecture is introduced in Section III based on this concept. After that, Section IV shows the implementation results and the comparison to previous works. Finally, Section V concludes this paper.

II. ARCHITECTURAL CONCEPT OF CRISP-DS

Fig. 1 shows the architectural comparison between SIMD image processors and CRISP. In an SIMD image processor, shown in Fig. 1(a) [4] [5] [6], an SIMD array with more than one hundred processing elements (PEs) is designed to provide high computing power. However, to fill these PEs with the required pixel streams, a huge on-chip memory buffer is required as well as a high-bandwidth channel to the external memory, which greatly increases the power consumption and cost. In our coarse-grained reconfigurable image stream processor (CRISP) architecture [7], as shown in Fig. 1(b), several types of reconfigurable stage processing elements (RSPEs) are specially designed to fit the characteristics of image processing algorithms as reconfigurable fabrics. These RSPEs are connected by the reconfigurable interconnection unit. To map an image processing algorithm on CRISP, the designers only need to write the context registers (CRs) to reconfigure the RSPEs and interconnection. Then the image stream can be fed from and written back via the SoC bus, and the large on-chip memory buffer is not required.

[Figure: (a) SIMD array of PEs with controller, program memory, on-chip memory buffer, and high-bandwidth channel to off-chip memory; (b) RSPEs A, B, C, D, X with context registers (CR), reconfigurable interconnection, and SoC bus.]
Fig. 1. Architectural comparison between (a) image processors with SIMD array [4] [5] [6] and (b) coarse-grained reconfigurable image stream processor (CRISP) [7].

[Figure: (a) Image Processing Algorithm I (stage types B, A, C, D, A) mapped over Time Frame 1 and Time Frame 2, each with input I/F, RSPEs A to D, and output I/F; (b) Image Processing Algorithm II (stage types B, A, C, D, A) mapped onto RSPEs A to D with a register file between input I/F and output I/F.]
Fig. 2. Architectural concepts of (a) single-stream CRISP [7] and (b) dual-stream CRISP (this work).

Although the CRISP architecture achieves high hardware efficiency, it still has several limitations. It can implement single-path image processing algorithms well, as shown in Fig. 2(a). However, for more complex algorithms where the required number and types of RSPEs do not match those in the fabricated chip, more than one time frame is required, as shown in Fig. 2(a). This leads to long execution time, since the whole image must be stored to the off-chip memory at the end of each time frame and loaded back at the beginning of each subsequent time frame. Moreover, it cannot support multi-path image processing algorithms such as the one shown in Fig. 2(b).

To improve the flexibility and efficiency of CRISP, a new concept of dual-stream processing is proposed in this paper, as shown in Fig. 2(b). To implement multi-path algorithms, a register file is designed as a dummy RSPE to achieve synchronization between different paths. Besides, since the working frequency of the image processor is usually higher than the clock rate of image sensors, two image streams are handled by one RSPE as two threads with context switching. With this concept, more complicated algorithms can be mapped within one time frame, as shown in Fig. 2(b), where RSPE A is a dual-stream RSPE.

III. PROPOSED ARCHITECTURE

A. System Architecture

Fig. 3 shows the system architecture of CRISP-DS. The input stream can come from a 12b image sensor interface in the preview/camcorder mode or from the SoC bus (AHB) in the picture-taking mode, and the output stream is written out via the SoC bus. There are ten types of RSPEs configured by the context registers, and they can be classified into three classes: local memory RSPEs, single-stream RSPEs (SS-RSPE), and dual-stream RSPEs (DS-RSPE). The local memory RSPEs are designed to support different kinds of data access patterns for image processing tasks, and the register file RSPE acts as a latency adjustment unit to synchronize different image streams; the color interpolation RSPE is designed to support various demosaicking algorithms; the multiplier-and-accumulator (MAC) RSPE is designed to support image filtering and matrix operations; the pixel-based RSPE is designed to support pixel-independent arithmetic and table-look-up operations; the accumulator (ACC) RSPE is designed to support several measurement functions for auto-white-balance, auto-exposure, and auto-focus; the downsampler RSPE is a programmable downsampler module; the ALU RSPE is designed to support general-purpose image processing operations. These RSPEs are dynamically connected with the reconfigurable interconnection unit, which is also configured by the context registers. The details of several RSPEs will be demonstrated in Section III-C.

B. Circuits and Communication Protocol of a DS-RSPE

The circuits and the communication protocol for each dual-stream RSPE (DS-RSPE) are shown in Fig. 4. As shown in Fig. 4(a), the core of each RSPE is the reconfigurable datapath, which reads the input stream from the data selector and writes output data to the output registers. To communicate and synchronize between different RSPEs, a unified communication protocol is designed as shown in Fig. 4(b). With the time division multiplex (TDM) protocol, a DS-RSPE can process two independent streams with two layers of contexts switched in different time slots. In order to avoid stream conflict in the same time slot, a re-synchronization module is designed to allocate the two input streams in different time slots, as shown in Fig. 4(c). Moreover, Fig. 4(c) also shows the isolation

[Diagram labels from the system architecture and DS-RSPE figures: host processor, image sensor, sensor interface, AHB master/slave with master and slave wrappers, input and output data interfaces, main controller, context registers, statistical registers, interconnection module, display interface; DS-RSPE with Data Stream 1 and 2, Valid1 and Valid2, Sync. Signal 1 and 2, data selector, reconfigurable datapath, Stream1 and Stream2 output registers, resync. and isolation, PE controller.]
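As an illustration of the TDM protocol described in Section III-B, the following Python sketch models a DS-RSPE that processes two independent streams with two layers of context. It is a behavioral toy under stated assumptions, not the circuit of the paper: the even/odd slot assignment, the `Context`, `resync`, and `ds_rspe` names, and the two example per-pixel operations (a gain and an offset clamped to 12 bits) are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Pixel = int
# A time slot carries (stream_id, pixel), or None when the slot is idle.
Slot = Optional[Tuple[int, Pixel]]


@dataclass
class Context:
    """One layer of context: a hypothetical per-pixel operation."""
    name: str
    op: Callable[[Pixel], Pixel]


def resync(stream1: List[Pixel], stream2: List[Pixel]) -> List[Slot]:
    """Place the two input streams in different time slots.

    Assumption: stream 1 takes even slots and stream 2 takes odd slots,
    so the two streams never collide in the same slot.
    """
    slots: List[Slot] = []
    for i in range(max(len(stream1), len(stream2))):
        slots.append((0, stream1[i]) if i < len(stream1) else None)
        slots.append((1, stream2[i]) if i < len(stream2) else None)
    return slots


def ds_rspe(slots: List[Slot],
            contexts: Tuple[Context, Context]) -> Tuple[List[Pixel], List[Pixel]]:
    """Process each slot with the context layer that belongs to its stream."""
    outputs: Tuple[List[Pixel], List[Pixel]] = ([], [])
    for slot in slots:
        if slot is None:
            continue  # idle slot: nothing to process
        stream_id, pixel = slot
        # Context switch: select the layer by the stream id of this slot.
        outputs[stream_id].append(contexts[stream_id].op(pixel))
    return outputs


# Two hypothetical context layers operating on 12-bit pixel values.
contexts = (
    Context("gain_x2", lambda p: min(2 * p, 4095)),
    Context("offset_minus_16", lambda p: max(p - 16, 0)),
)

slots = resync([100, 200, 3000], [10, 20])
out1, out2 = ds_rspe(slots, contexts)
print(out1)  # [200, 400, 4095]
print(out2)  # [0, 4]
```

In this model one datapath serves both streams, and only the selected context changes from slot to slot, which is the sense in which the two streams behave as two threads on one RSPE.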
