
7-2 IEEE Asian Solid-State Circuits Conference, November 16-18, 2009 / Taipei, Taiwan

CRISP-DS: Dual-Stream Coarse-grained Reconfigurable Image Stream Processor for HD Digital Camcorders and Digital Still Cameras

Tsung-Huang Chen, Jason C. Chen, Teng-Yuan Cheng, and Shao-Yi Chien
Media IC and System Lab, Graduate Institute of Electronics Engineering and Department of Electrical Engineering, National Taiwan University, BL-421, 1, Sec. 4, Roosevelt Rd., Taipei 106, Taiwan

Abstract— A 329mW 600M-Pixels/s dual-stream coarse-grained reconfigurable image stream processor is implemented in TSMC 0.13μm CMOS technology with a core size of 4.84mm². The reconfigurable pipelined processing element array architecture strikes a good balance between computing performance and flexibility with only 10Kb of on-chip memory. Moreover, a new dual-stream architecture is proposed to improve flexibility and hardware efficiency by processing two independent image streams with two-layer context switching, and an isolation technique is proposed to reduce power consumption. Implementation results show that it achieves 1.52 times the power efficiency of previous works and meets the requirements of high-definition video camcorders and digital still cameras.

I. INTRODUCTION

The image signal processing engine in digital camcorders and digital still cameras is critical for generating high-quality images and video frames [1]. As the resolution and frame rate of image sensors keep increasing, several design challenges arise. First, processing high-definition (HD) video and images with more than 10M pixels requires an enormous amount of computation, which usually leads to high hardware cost and high power consumption, both undesirable in handheld devices. Second, executing more advanced image processing algorithms requires high flexibility, which favors programmable processors. Therefore, hybrid approaches combining a DSP with dedicated hardware are adopted in many commercial products [2] [3]. To support more complex image processing algorithms and higher image resolutions, many SIMD image processors have been proposed in recent years [4] [5] [6]. However, their power consumption remains large even in high-end process technologies.

Our previous work, the coarse-grained reconfigurable image stream processor (CRISP) [7], achieves better power efficiency. Based on the characteristics of image processing algorithms, several reconfigurable fabrics are designed and linked by a reconfigurable interconnection, and flexibility is obtained by changing the contexts of this reconfigurable hardware. CRISP has been shown to provide higher power efficiency than DSP-based approaches; however, there is still much room for improving its flexibility.

In this paper, a dual-stream coarse-grained reconfigurable image stream processor (CRISP-DS) is designed to support HD image processing with high power efficiency. It has the following characteristics. First, the reconfigurable pipelined processing element (PE) array architecture provides a good balance between computing performance and flexibility with low on-chip memory cost. Second, the new concept of dual-stream processing is introduced to improve both performance and flexibility. Third, with a PE isolation technique, CRISP-DS achieves better power efficiency than previous works. It meets the requirements of HD digital camcorders and digital still cameras while consuming only 329mW.

This paper is organized as follows. Section II presents the architectural concept of CRISP-DS, and Section III introduces the proposed architecture based on this concept. Section IV then shows the implementation results and a comparison with previous works. Finally, Section V concludes this paper.

II. ARCHITECTURAL CONCEPT OF CRISP-DS

Fig. 1 compares the architectures of SIMD image processors and CRISP. In the SIMD image processors of Fig. 1(a) [4] [5] [6], an SIMD array of more than one hundred processing elements (PEs) provides high computing power. However, keeping these PEs supplied with pixel streams requires a huge on-chip memory buffer as well as a high-bandwidth channel to external memory, which greatly increases power consumption and cost. In our CRISP architecture [7], shown in Fig. 1(b), several types of reconfigurable stage processing elements (RSPEs) are specially designed as reconfigurable fabrics that fit the characteristics of image processing algorithms. These RSPEs are connected by a reconfigurable interconnection unit. To map an image processing algorithm onto CRISP, designers only need to write the context registers (CRs) that reconfigure the RSPEs and the interconnection. The image stream is then read from and written back via the SoC bus, so a large on-chip memory buffer is not required.

Fig. 1. Architectural comparison between (a) image processors with SIMD array [4] [5] [6] and (b) coarse-grained reconfigurable image stream processor (CRISP) [7].

Fig. 2. Architectural concepts of (a) single-stream CRISP [7] and (b) dual-stream CRISP (this work).

978-1-4244-4434-2/09/$25.00 ©2009 IEEE
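To make the context-register idea concrete, here is a minimal Python sketch; it is not the CRISP hardware or its programming interface. The stage types (`gain`, `offset`, `clip`), the `Stage` class, and the `run` function are hypothetical. Each stage's `cr` dictionary stands in for its context registers, and the order of the stage list stands in for the reconfigurable interconnection. Pixels stream through one at a time, so nothing resembling a frame buffer is held.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical stage types standing in for RSPE kinds; the real RSPE
# datapaths and context-register layouts are not given in the paper.
STAGE_TYPES: Dict[str, Callable[[int, dict], int]] = {
    "gain":   lambda p, cr: (p * cr["num"]) >> cr["shift"],
    "offset": lambda p, cr: p + cr["add"],
    "clip":   lambda p, cr: max(cr["lo"], min(cr["hi"], p)),
}

@dataclass
class Stage:
    kind: str  # which hypothetical stage type this slot implements
    cr: dict   # stands in for the stage's context registers

    def process(self, pixel: int) -> int:
        return STAGE_TYPES[self.kind](pixel, self.cr)

def run(stages: List[Stage], stream: Iterable[int]) -> Iterator[int]:
    """Stream pixels through the stages in list order (the 'interconnection').

    Each pixel is emitted as soon as it has passed every stage, so no frame
    buffer is kept. Real pipelined hardware overlaps stages in time; this
    model evaluates them sequentially per pixel.
    """
    for pixel in stream:
        for stage in stages:
            pixel = stage.process(pixel)
        yield pixel

pipeline = [Stage("gain", {"num": 3, "shift": 1}),
            Stage("clip", {"lo": 0, "hi": 1000})]
print(list(run(pipeline, [100, 400, 800])))  # [150, 600, 1000]

# "Reconfiguration": rewrite contexts and stage order, not the code.
pipeline[0].cr = {"num": 1, "shift": 0}
pipeline.insert(1, Stage("offset", {"add": -50}))
print(list(run(pipeline, [100, 400, 800])))  # [50, 350, 750]
```

Changing the behavior only requires rewriting the contexts and the stage order, which loosely mirrors mapping an algorithm onto CRISP by writing CRs rather than new software.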
Although the CRISP architecture achieves high hardware efficiency, it still has several limitations. It implements single-path image processing algorithms well, as shown in Fig. 2(a). However, for more complex algorithms in which the required number and types of RSPEs do not match those on the fabricated chip, more than one time frame is required, as also shown in Fig. 2(a). This leads to long execution time, because the whole image must be stored to off-chip memory at the end of each time frame and loaded back at the beginning of the next. Moreover, CRISP cannot support multi-path image processing algorithms such as the one shown in Fig. 2(b).

To improve the flexibility and efficiency of CRISP, this paper proposes a new concept of dual-stream processing, shown in Fig. 2(b). To implement multi-path algorithms, a register file is designed as a dummy RSPE that synchronizes the different paths. In addition, since the working frequency of the image processor is usually higher than the clock rate of image sensors, one RSPE can handle two image streams as two threads with context switching. With this concept, more complicated algorithms can be mapped within one time frame, as shown in Fig. 2(b), where RSPE A is a dual-stream RSPE.

III. PROPOSED ARCHITECTURE

A. System Architecture

Fig. 3 shows the system architecture of CRISP-DS. The input stream comes either from a 12b image sensor interface in preview/camcorder mode or from the SoC bus (AHB) in picture-taking mode, and the output stream is written out via the SoC bus. There are ten types of RSPEs configured by the context registers, classified into three classes: local memory RSPEs, single-stream RSPEs (SS-RSPEs), and dual-stream RSPEs (DS-RSPEs). The local memory RSPEs support the different data access patterns of image processing tasks, and the register file RSPE acts as a latency adjustment unit that synchronizes different image streams. The color interpolation RSPE supports various demosaicking algorithms; the multiplier-and-accumulator (MAC) RSPE supports image filtering and matrix operations; the pixel-based RSPE supports pixel-independent arithmetic and table-lookup operations; the accumulator (ACC) RSPE supports several measurement functions for auto-white-balance, auto-exposure, and auto-focus; the downsampler RSPE is a programmable downsampler module; and the ALU RSPE supports general-purpose image processing operations. These RSPEs are dynamically connected by the reconfigurable interconnection unit, which is also configured by the context registers. The working details of several RSPEs are described in Section III-C.

B. Circuits and Communication Protocol of a DS-RSPE

The circuits and communication protocol of each dual-stream RSPE (DS-RSPE) are shown in Fig. 4. As shown in Fig. 4(a), the core of each RSPE is a reconfigurable datapath, which reads its input stream from the data selector and writes output data to the output registers. To communicate and synchronize between different RSPEs, a unified communication protocol is designed, as shown in Fig. 4(b). With this time division multiplex (TDM) protocol, a DS-RSPE can process two independent streams, with two layers of contexts switched in different time slots. To avoid stream conflicts in the same time slot, a re-synchronization module allocates the two input streams to different time slots, as shown in Fig. 4(c). Moreover, Fig. 4(c) also shows the isolation

[Figure: block diagrams of the CRISP-DS system and of a DS-RSPE; labels include host processor, AHB master/slave wrappers, image sensor and sensor interface, display interface, context registers, statistical registers, interconnection module, main controller, data selector, reconfigurable datapath, stream 1/stream 2 output registers, re-synchronization and isolation, and PE controller.]
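The TDM scheme can be illustrated with a small cycle-level Python model. It is a sketch under stated assumptions, not the DS-RSPE circuit: the class `DualStreamPE`, the `scale` datapath, the FIFO used for re-synchronization, and the 4:1 ratio between processor clock and sensor pixel clock are all hypothetical. Even cycles form the time slot of stream 1 and odd cycles that of stream 2, and each slot selects its own context layer for the shared datapath. Both sensors deliver a pixel on the same cycle, and the re-synchronization buffer defers each pixel to its own slot, so the streams never contend for the datapath.

```python
from collections import deque
from typing import Callable, List, Optional, Tuple

Datapath = Callable[[int, dict], int]

class DualStreamPE:
    """Hypothetical model of a TDM dual-stream stage (not the chip's RTL).

    One shared datapath, two context layers. Cycle t belongs to time slot
    t % 2, and slot s may only process stream s, so streams never collide.
    """

    def __init__(self, datapath: Datapath, ctx1: dict, ctx2: dict):
        self.datapath = datapath
        self.contexts = (ctx1, ctx2)      # two layers of context
        self.queues = (deque(), deque())  # assumed FIFO-based re-sync buffers
        self.outputs: Tuple[List[int], List[int]] = ([], [])  # per-stream output registers

    def cycle(self, t: int, in1: Optional[int], in2: Optional[int]) -> None:
        # Re-synchronization: accept any valid input (None means "not valid").
        for stream_id, pix in enumerate((in1, in2)):
            if pix is not None:
                self.queues[stream_id].append(pix)
        # TDM: the slot owner uses the datapath with its own context layer.
        slot = t % 2
        if self.queues[slot]:
            pix = self.queues[slot].popleft()
            self.outputs[slot].append(self.datapath(pix, self.contexts[slot]))

def scale(pixel: int, ctx: dict) -> int:
    # Hypothetical datapath: multiply, then shift right.
    return (pixel * ctx["num"]) >> ctx["shift"]

pe = DualStreamPE(scale, {"num": 2, "shift": 0}, {"num": 1, "shift": 1})
stream1 = [10, 20, 30]
stream2 = [100, 200, 300]
# Assumed: processor clock is 4x the sensor pixel clock, and both sensors
# present a valid pixel on the same cycle (a slot conflict without re-sync).
for t in range(16):
    k, phase = divmod(t, 4)
    valid = phase == 0 and k < len(stream1)
    pe.cycle(t, stream1[k] if valid else None, stream2[k] if valid else None)
print(pe.outputs)  # ([20, 40, 60], [50, 100, 150])
```

Stream 1 is doubled under its context and stream 2 is halved under the other, through one datapath. A FIFO is only one plausible way to realize re-synchronization; the paper does not specify the module's internals.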