A Modular FPGA-Based Smart Camera Architecture
Merwan Birem, François Berry
Institut Pascal – UMR 6602 UBP/CNRS – Campus des Cézeaux, 24 Avenue des Landais, 63177 Aubière Cedex, France
Corresponding author: M. Birem, tel. +33 760517676, e-mail [email protected].

To cite this version: Merwan Birem, François Berry. DreamCam: A modular FPGA-based smart camera architecture. Journal of Systems Architecture, Elsevier, 2014, 60 (6), pp. 519–527. doi:10.1016/j.sysarc.2014.01.006. HAL Id: hal-01625648, https://hal.archives-ouvertes.fr/hal-01625648 (submitted on 28 Oct 2017).

Article history: received 9 July 2012; revised 9 October 2013; accepted 21 January 2014; available online 31 January 2014.

Abstract

DreamCam is a modular smart camera built around an FPGA-based main processing board. The core of the camera is an Altera Cyclone III FPGA associated with a CMOS imager and six private RAM blocks. The main novelty of our work is a new smart camera architecture together with several modules (IP blocks) that efficiently extract and sort visual features in real time. In this paper, extraction is performed by Harris and Stephens filtering associated with customized modules that extract, select and sort visual features in real time. As a result, DreamCam (in this configuration) provides a description of each visual feature in the form of its position and the grey-level template around it.

© 2014 Elsevier B.V. All rights reserved.

Keywords: smart camera; image processing; interest points; VHDL; Harris and Stephens algorithm; Field Programmable Gate Array (FPGA); hardware implementation; real-time system

1. Introduction

Intelligent robots are becoming increasingly important. One of the key components of an intelligent robot is its ability to understand its environment and recognize its position. In the robotics community, most researchers use information from sources such as odometry, laser range finders, and sonar sensors. In contrast, in the vision community, new methods using information from camera sequences are being developed; see [1,2].

Using an entire image as an observation is difficult or impossible owing to the high resolution, typically of the order of a hundred thousand pixels. Thus, to identify interest points, feature extraction, which is the first crucial step, should be used [3]. The algorithms that extract features are time-consuming, which is a major drawback when developing real-time applications. One solution to this problem is the use of dedicated hardware, such as a Field Programmable Gate Array (FPGA), which can provide dedicated functional blocks that perform complex image processing operations in parallel. In addition to this parallelism, which leads to high throughput, an FPGA has a small footprint and low power consumption, which makes it ideal for mobile applications.

FPGAs have achieved rapid acceptance and growth over the past decade because they can be used in a very wide range of applications [4]. Although they are slower than traditional Application Specific Integrated Circuits (ASICs), their design flexibility is a major advantage: users can change the program as desired at any stage of an experiment, thereby saving time and cost.

Throughout this paper, we propose a customized smart sensor based on a CMOS imager. Its main original feature is system management by a System-on-Chip integrated in an FPGA. Our approach allows most early perception processes to be performed in the main sensing unit (the FPGA), which sends just the main features to the host computer so as to reduce a classic communication bottleneck. Another advantage of this method is real-time feedback on the sensor: the different parameters can be actively tuned to optimize perception, rendering it similar to primate perception. For instance, in strong light the pupil contracts and becomes small, but still allows light to be cast over a large part of the retina [5]. This embedded sensor can be considered as a reactive architecture and, above all, as a research platform for smart sensors.

To highlight the novel elements of our work, we present in the following section previous research carried out in this field. In Section 3, we give a broad overview of the smart sensor. An application for the extraction of visual features based on the Harris and Stephens algorithm is presented in Section 4. In this work, we consider feature extraction as a combination of feature detection followed by description: feature detection consists in finding the interest points (features) in the image, whereas feature description consists in representing them. The final goal is to compare them
In Section 5 we include results that support Hardware implementation of H&S algorithm. the relevance of our approach and in Section 6 we give a System Platform capabilities conclusion. Author (s) Sensor Processor Resol. @ Fps [19] EyeBot M6 Xilinx 352 Â 288 @ 7.37 2. Previous work OV6630 Spartan-3E FPGA Comparison of our work with previous works can be done on [18] MT9P031 Xilinx 320 Â 480 @ 27 two levels: CMOS Virtex-II Micron FPGA [20] Video Xilinx 640 Â 480 @ 266 System-level: At this level, we propose to study the most popu- frame from Virtex-5 lar smart cameras developed in the last decade. As a reminder, a a memory FPGA smart camera is defined as a vision system in which the funda- mental function is the production of a high level understanding ones and to describe each feature by a grey-level template. of the imaged scene. A camera is called a ’’smart camera’’ when These last steps are fundamental in computer vision, in which it performs application specific information processing (ASIP). the input is not images but semantic features. (see Table 2). The output of such cameras is either the features extracted from the captured images or a high-level description of the scene. More details about this system can be found in articles by Wolf 3. Hardware description of the ‘‘DreamCam’’ [6] and Shi and Lichman [7]. Table 1 presents an overview of the most common smart cam- The goal of artificial vision research is to exploit images or era platforms found in the literature. image sequences generated by sensors, in order to effectively Others works have been based on Silicon-integrated smart cam- translate an environment. From this translation, different processes eras. In these systems, the authors propose an approach in can be used to identify or inspect an object, control robots, etc. The which image sensing and processing are integrated on a single first way to treat the vision problem is to carry out passive vision silicon die. 
This kind of device is called ‘‘vision chips’’. The inter- approaches, which is the classic way to analyze images. In contrast, ested reader can find a detailed description in [14]. Among another approach exists known as ‘‘active vision’’, which is the these works, the project scamp by P. Dudeck is one of the most result of an attempt to simulate the human visual system. well-known [15]. Other contributions in the same vein can be Based on this concept, our approach consists in integrating the found in [16,17]. control of the imager in the perception loop, especially in the early Algorithm-level: As explained below, we implemented a ‘‘Harris perception processes. By integration of early processing, close to and Stephen’’-based algorithm to extract visual features in our the imager, a reactive sensor can be designed. With such a smart camera. At the output of this extraction step, several modules sensor, it is possible to perform basic processing and the selection (filtering, sorting, description) were added to provide high-level of relevant features [4]. For example, FPGAs have already been features from the image. Thus, in this part, we propose a short used to accelerate real-time point tracking [21], stereo-vision overview of published papers about the implementation of [22], color-based object detection [23], and video and image com- the Harris and Stephen detector on FPGA. To our knowledge, pression [24]. In our case, the notion of a system on programmable there is no work proposing a full approach with detection, filter- chip (SOPC) describes the whole system. ing, sorting and description steps. Consequently, the works pre- Most vision applications are focused on several small image sented above, represent a fragmented bibliography mainly areas and consequently acquisition of the whole image is not focused on the Harris detection. always necessary. 
It is evident, therefore, that one of the main goals The work presented in [18] implements a Harris detection on a of an efficient vision sensor is to select regions of interest (ROI) in FPGA connected to a stereo rig.
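As a point of reference for the hardware pipeline discussed in this paper, the detection step can be sketched in software. The following minimal Python/NumPy sketch of the Harris and Stephens corner response (not the paper's VHDL implementation; the 3 x 3 box window standing in for the usual Gaussian weighting and k = 0.04 are illustrative assumptions) computes R = det(M) - k * trace(M)^2 from windowed gradient products and exercises it on a synthetic bright square:

```python
import numpy as np

def box_sum(a, win=3):
    """Sum array a over a centred win x win window (wrap-around borders)."""
    r = win // 2
    out = np.zeros_like(a)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
    return out

def harris_response(img, k=0.04, win=3):
    """Harris and Stephens response R = det(M) - k * trace(M)^2, where M
    holds the image-gradient products summed over a small window."""
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)            # image gradients (rows, cols)
    sxx = box_sum(gx * gx, win)          # windowed second-moment terms
    syy = box_sum(gy * gy, win)
    sxy = box_sum(gx * gy, win)
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace

# Synthetic test: a bright square on a dark background. R is strongly
# positive at the square's corners and negative along its edges.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
R = harris_response(img)
```

On this input, R is positive at the square's corners (e.g. pixels (8, 8) and (23, 23)) and negative at edge midpoints, which is exactly the corner/edge discrimination the detection modules exploit.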
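The subsequent selection, sorting and description steps can be mirrored in software as well. The sketch below is a hypothetical stand-in for the paper's hardware modules, under simple assumptions (greedy selection of the n strongest responses, crude non-maximum suppression by blanking each winner's neighbourhood); each feature comes out in the form the paper describes, a position plus the grey-level template around it:

```python
import numpy as np

def select_and_describe(img, response, n=4, patch=3):
    """Keep the n strongest responses and describe each feature by its
    position and the patch x patch grey-level template around it."""
    r = patch // 2
    resp = response.astype(float)
    h, w = resp.shape
    if r > 0:
        # Mask the borders so every template fits inside the image.
        resp[:r, :] = -np.inf
        resp[h - r:, :] = -np.inf
        resp[:, :r] = -np.inf
        resp[:, w - r:] = -np.inf
    features = []
    for _ in range(n):
        y, x = np.unravel_index(np.argmax(resp), resp.shape)
        if not np.isfinite(resp[y, x]):
            break  # no valid responses left
        template = img[y - r:y + r + 1, x - r:x + r + 1].copy()
        features.append(((int(y), int(x)), template))
        # Crude non-maximum suppression: blank out the neighbourhood.
        resp[y - r:y + r + 1, x - r:x + r + 1] = -np.inf
    return features

# Toy example: two isolated response peaks on a 10 x 10 grey-level image.
img = np.arange(100, dtype=float).reshape(10, 10)
response = np.zeros((10, 10))
response[3, 3] = 5.0
response[7, 6] = 9.0
feats = select_and_describe(img, response, n=2, patch=3)
```

In DreamCam these steps run as dedicated IP blocks after the Harris and Stephens filter; the fragment above only imitates their input/output behaviour.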