A 2000 Frames / S Programmable Binary Image Processor Chip for Real Time Machine Vision Applications
Total Page:16
File Type:pdf, Size:1020Kb
A 2000 frames / s programmable binary image processor chip for real time machine vision applications A. Loos, D. Fey Institute of Computer Science, Friedrich-Schiller-University Jena Ernst-Abbe-Platz 2, D-07743 Jena, Germany {loos,fey}@cs.uni-jena.de Abstract the inflexible and fixed instruction set. To meet that we present a so called ASIP (application specific instruction Industrial manufacturing today requires both an efficient set processor) which combines the flexibility of a GPP production process and an appropriate quality standard of (General Purpose Processor) with the speed of an ASIC. each produced unit. The number of industrial vision appli- cations, where real time vision systems are utilized, is con- tinuously rising due to the increasing automation. Assem- Embedded Vision System bly lines, where component parts are manipulated by robot 1. real scene image sensing, AD- CMOS-Imager grippers, require a fast and fault tolerant visual detection conversion and read out of objects. Standard computation hardware like PC-based 2. grey scale image image segmentation platforms with frame grabber boards are often not appro- representation priate for such hard real time vision tasks in embedded sys- 3. raw binary image image enhancement, tems. This is because they meet their limits at frame rates of representation removal of disturbance a few hundreds images per second and show comparatively ASIP FPGA long latency times of a few milliseconds. This is the result 4. improved binary image calculation of of the largely serial working and time consuming process- projections ing chain of these systems. In contrast to that we designed 5. x, y and diagonal projections calculation of an application-specific instruction processor chip which ex- object centroid and ploits massive parallelization of often used image prepro- orientation cessing algorithms to minimize computation times. To get a // Physical attributes centroid = [5.3 4.6] Further processing feasible image resolution of 320 x 240 pixels at processing orientation = 35 ° (robot control) frame rates up to 2000 frames per second we realized an im- age processor on a semi-custom 0.18 µm pure logic CMOS Figure 1. Image processing flow platform. The paper presents the architecture, the perfor- mance parameters of the designed processor chip and some Before we explain some details of our processor architec- simulation test results. ture we point out the characteristic data processing flow of the considered machine vision system in which our proces- sor chip will work. Figure 1 illustrates a generic procedure of the embedded machine vision environment. In the be- 1 Motivation and introduction ginning a real scene is captured by a CMOS imager and converted to digital values (1). Afterwards the gray scaled The motivation to present that paper emerges from a firm image is segmented (2) and we receive a raw binary image tendency to substitute PC-based standard machine vision representation. The quality of the image is improved by ap- systems with smaller and faster embedded components plying e. g. morphological filter operations (3). In the next (e. g. smart cameras), what is currently a world-wide step we will calculate diagonal, vertical and horizontal pro- on-going ambitious research topic [1],[2],[3]. One way to jections of the binary image (4). This means simply that manage that is to use application specific integrated circuits pixels are counted what can be perfectly parallelized. This (ASICs) as basic platform. The advantages of ASIC information can then be used in the next step to calculate the based components confront with their main weakness: object’s centroid and orientation, what is one of the most important tasks in industrial image processing. For step 1 Image Data we can use a commercial CMOS sensor as well as a sensor which was especially designed in our project. The steps 2 y a r and 5 are solved by hardwired algorithms implemented in Control r a unit r e n i a low-priced medium class FPGA (field programmable gate t a n 0 h u 2 c o 3 c Core n array). The steps 3 and 4 are calculated in our ASIP chip l a . Data a c t s n which is the core of our embedded real time vision system IO . o y r z i . a r d and which we present in this paper. o n H u o The rest of the paper is organized as follows. Chapter 2 B JTAG/ presents the ASIP chip architecture and the primary system SPI parameters, chapter 3 shows some application examples. control The fourth chapter presents the currently working process. 240 Result Data Finally, some conclusions are given. Vertical counter array 2 Chip architecture Figure 2. Block diagram of the image proces- sor 2.1 Overview and general chip features The chip is driven by a 40 MHz clock. As result, a single morphological operation performed on a 320 x 240 pixel The designed architecture of our ASIP is a result of a image including data in/output only needs 250 µs. For even precise analysis of the needed performance and the algo- faster data in/output a ROI (region of interest) can be de- rithms to solve image processing problems as described fined in steps of 20 pixels in horizontal and 32 pixels in before. Our design strategy can be summarized as follows: vertical direction. don’t integrate as much as possible functionality but rather To solve general object criteria like centroids and object ori- as much as even required. Figure 2 shows a block diagram entations a fast executable preprocessing method is realized of our ASIP chip architecture, its data paths (wide arrows) on chip. This is performed, as mentioned above, by two and its control paths (small black arrows). The main parallel working counter arrays determine the three possible components are the processor array (core), the control unit, projections in horizontal, vertical, and diagonal direction. which contains a microprogram unit and the vertical and The output data can be a preprocessed image or the pixel horizontal counter arrays. Using the capabilities of the projection values. The control of both the data in/output micro program unit self-defined arbitrary morphological streams and the data processing within the core and the 3x3 operators can be loaded into the control unit. Therefore counter arrays is organized by a finite state machine which the possibilities to manipulate binary images are almost is part of the control unit. unlimited. The task of the vertical and horizontal counters A boundary scan chain located on the left chip side and located in each pixel row and pixel column is to realize the closely assigned to the JTAG control module provides board pixel counting for the projection operations as described level test purposes. above. Two standardized and user friendly interfaces (JTAG, SPI) serve for the connection to the chip’s outside 2.2 Processor core world. These interfaces allow to load the microprogram or to chose simple, but otherwise time consuming algorithms During the design analysis it became obvious, that it is (in the serial computing case) to remove disturbances in not recommended to create a full parallel circuit, where images, e. g. holes in objects or speckles on the background each image pixel has its own processing unit. This would re- or to detect edges. sult in too much chip area. Therefore we decided to design The input data can be read serially (slow mode) or via a a mixture of a time and space multiplexing architecture, i.e. 16 bit wide data bus (fast mode). No external memory a group of pixels is serially processed by one PE and sev- modules are required, the chip has internal registers to eral of such PEs are working simultaneously. This requires store the entire image and all required temporary data. An to find a trade off between a fixed number of serially pro- exceeding feature is the fact, that the consumed processing cessed pixels by one PE and the resulting image operator time is not depending on the image size since only local latencies. operators are applied. In order to compute the 76800 image pixels (320 x 240) on a strongly limited chip area of 25 mm2 within a reasonable processing time, we fixed that parameter to 16 pixels per processign element (PE) ([4]). As result the processor core PE PE consists of 4800 PEs. The architecture of one is shown in 1 2 ... M 1 2 ... M Figure 3. PE PE Pixel P1 P2 P3 P4 P5 P6 P7 P8 1 2 ... M 1 2 ... M PE PE 1 2 ... M 1 2 ... M MUX 4 3 Reg0 Figure 4. Interconnected multiplexed pro- D LU CE cessing elements CLK Control signals 3 Application example Reg1 Reg2 Reg3 D D D CE CE CE In table 1 the result of a functional simulation of our CLK CLK CLK ASIP prototype for a programmed segmentation operation is shown. An image of a vehicle rim (a representation of a typical industrial scene) is captured by an image sensor and Newpixel Result segmented with a fixed threshold. In addition the image is cut by the in-built ROI functionality to the required image Figure 3. Schematic structure of one pro- section (1 and 2a). The removal of some disturbances can cessing element be performed by the designed circuit (temporary images 2b - 2d). The overlay of the preprocessed image, which is the ASIP output, and the original gray scale image is quite good One PE has three synchronous clocked 1 bit registers and a (3). small combinatorial network (LU, logical unit) to compute its own new state depending on the own pixel value and the value of the eight neighbored pixels in each calculation 1 3 step.