

Gradient-based Camera Exposure Control for Outdoor Mobile Platforms

Inwook Shim, Tae-Hyun Oh, Joon-Young Lee, Jinwook Choi, Dong-Geol Choi† and In So Kweon

Abstract—We introduce a novel method to automatically adjust camera exposure for image processing and computer vision applications on mobile robot platforms. Because most image processing algorithms rely heavily on low-level image features that are based mainly on local gradient information, we consider that gradient quantity can determine the proper exposure level, allowing a camera to capture the important image features in a manner robust to illumination conditions. We then extend this concept to a multi-camera system and present a new control algorithm to achieve both brightness consistency between adjacent cameras and a proper exposure level for each camera. We implement our prototype system with off-the-shelf machine-vision cameras and demonstrate the effectiveness of the proposed algorithms on practical applications, including pedestrian detection, visual odometry, surround-view imaging, panoramic imaging and stereo matching.

Index Terms—Auto exposure, exposure control, camera parameter control, gradient information, visual odometry, surround-view, panoramic imaging, stereo matching

Fig. 1: Images captured under different illumination conditions. From left to right, the images are from cameras with a built-in auto-exposure method (Auto), a manually tuned fixed exposure setting (Manual), and our method (Ours). Both the built-in auto-exposure method and the manual setting fail to capture well-exposed images, while our method captures images that are suitable for processing with computer vision algorithms.

I. INTRODUCTION

Recent improvements in image processing and computer vision technologies for object detection, recognition and tracking have enabled various vision systems to operate autonomously, leading to the feasibility of autonomous mobile platforms [1]. In such real-time vision-based systems, images captured from a camera are fed directly into subsequent algorithms as input. The quality of the captured images strongly affects the algorithms' success; however, research on camera control for robust image capture has been neglected; camera control has been far less studied than have the computer vision algorithms themselves.

Most mobile platform vision systems rely on a standard auto-exposure method built into the camera¹ or on a fixed exposure hand-tuned by users. Conventional auto-exposure methods adjust camera exposure by evaluating the average brightness of an image [2, 3, 4]. Using this simple approach, undesirably exposed images are common—particularly when a scene has a significant illumination gap between the dynamic ranges of the region of interest and the background. This common condition degrades the performance of the subsequent computer vision algorithms. Therefore, overcoming the problem of diverse and challenging illumination conditions at the image capture stage is an essential prerequisite for developing robust vision systems.

More specifically, Figure 1 shows some comparisons of images resulting from the standard built-in auto-exposure and the fixed-exposure approaches in an outdoor environment. When the dynamic range of the scene is relatively narrow, both methods capture well-exposed images. Consequently, under a narrow dynamic range, a single representative parameter can easily be determined. In contrast, both methods result in undesirably exposed images under abruptly varying illumination conditions. The rationales behind these results can be characterized as follows: 1) the auto-exposure control algorithms have limited adaptability (i.e., prediction), 2) they do not consider the limited dynamic range of the camera, and 3) they use weak criteria to assess the exposure status. We address the first and second issues using a simulation-based approach and the third issue using a gradient-based metric.

In this paper, we present a new method to automatically adjust camera exposure using the gradient information. To handle severe illumination changes and a wide dynamic range of scene radiance, we simulate the proper exposure of the scene in the gradient domain; this process is followed by a feedback mechanism to perform auto-exposure. Because the gradient domain is robust against illumination changes and has been leveraged by many computer vision algorithms, the proposed method is suitable for capturing well-exposed images with enriched image features that are beneficial for computer vision algorithms.

Copyright © 2018 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to [email protected]. This work was supported by the National Research Foundation of Korea grant funded by the Korea government under Grant 2010-0028680 and Hyundai Motor Group. (Corresponding author: Dong-Geol Choi.)
I. Shim is with the Ground Autonomy Laboratory, Agency for Defense Development, Korea (e-mail: [email protected]).
T.-H. Oh is with the CSAIL, MIT, USA (e-mail: [email protected]).
J.-Y. Lee is with Adobe Research, USA (e-mail: [email protected]).
D.-G. Choi and I. S. Kweon are with the Department of Electrical Engineering, KAIST, Korea (e-mail: {dgchoi, iskweon}@kaist.ac.kr).
J. Choi is with Hyundai Motor Group, Korea.
¹Specific methodologies are often not disclosed or are considered confidential by camera manufacturers.

Fig. 2: The overall framework of our camera exposure control. Our method adjusts camera exposures to maximize the gradient information of the captured images. We apply a γ correction technique to simulate information changes by exposure variations and then update the exposure settings using a real-time feedback system. In addition, we balance the exposures of multiple cameras by analyzing the intensity differences between neighboring cameras.

Figure 2 shows the overall framework of our proposed method. To build a real-time feedback control system, we use a γ correction technique [5] that imitates exposure changes and then determines the best γ value. Our framework implicitly guides the direction of exposure updates in the feedback system to maximize the gradient information. Moreover, in the multi-camera case, our method adjusts camera exposure by also considering neighboring cameras.

In Section II we review the existing alternatives, including both direct exposure control and post-processing; however, we stress that none of these alternatives considers the performance of subsequent algorithms, which may benefit from enriched gradients in images. In Section III-A, we describe the development of a gradient-based criterion, based on which a simulation-based control mechanism is described in Section III-B. We extend our previous work [6] by 1) improving the stability of exposure updates in Section III-C, 2) extending the proposed methodology to a multi-camera system in Section IV, and 3) providing further technical implementation details and comprehensively analyzing the behavior of the proposed method in Section V. Finally, we conclude this work with a discussion in Section VI. Our contributions are summarized as follows:
• We propose a novel camera exposure control mechanism that leverages image gradient information to determine a proper exposure level. Specifically, we apply the novel notion of simulation-based exposure prediction, whereby the camera exposure state is updated through a newly proposed control function. This algorithm features rapid convergence to the target exposure level.
• We extend the approach to a multi-camera setup, where the issue of exposure balancing across cameras arises.
• We implement a mobile system with synchronized cameras to prove the concept of the proposed methodology and to conduct fully controllable experiments with ease.
• We validate our method by extensive experiments with practical computer vision applications.

II. RELATED WORK

Several methods exist for controlling the amount of light reaching a camera sensor: adjusting shutter speed, aperture size or gain (ISO) [7], mounting some type of density filter [8, 9, 10], or designing new camera concepts such as computational cameras [11]. Each approach has advantages and disadvantages. Shutter speed affects image blur, gain is related to image noise, and aperture size changes the depth of field. The other methods listed above require external devices, such as a density filter or a new optical device. Among these possible approaches, adjusting the shutter speed and the gain are the most popular and desirable in vision applications for mobile platforms since they ideally preserve the linear relationship between incident light and intensity measurement. Although methods such as changing aperture size or mounting a density filter could be good complements in some beneficial situations, they can introduce artifacts² by changing the depth of field (causing spatially varying blur) or causing color shifts, respectively. These approaches introduce non-linear artifacts into the subsequent vision algorithms. In this paper, we focus our review on approaches that control shutter speed and gain and compare their criteria to determine the proper parameters. We then discuss algorithmic efforts to overcome the limitations of exposure control and the extensions to jointly adjust multiple cameras.

One conventional approach to achieve auto-exposure is to extract image statistics and control the camera parameters to satisfy the statistics of certain conditions. The simplest approach is to measure the average brightness over an entire image or in specific regions and adjust it to a midrange (e.g., 128 for 8-bit images) [4, 12].³ This intensity-based method may result in a proper exposure level for scenes in which illumination intensities are well-distributed across all regions and where the dynamic range of the scene is narrow. However, in practice, a scene has multiple intensity distributions over an image; thus, this simple method may lead to under- or over-saturation of important image details. To overcome this limitation, various attempts have adopted other measures that may be more robust to scene illumination variations, such as entropy [13] and histograms [14, 15]. In this paper, we maximize the sum of the log response of image gradients to enrich the image gradient information.

Several methods measure statistics over regions of interest (ROI) to create algorithms that are robust to scene variations [14, 16, 17, 18]. These methods measure the statistics of an ROI such as the image's center [12], a moving object or facial regions [18], a user-predefined ROI [14], or foreground regions, obtained by separating the background [16] or backlit regions [17]. However, because these methods control the camera exposure based on specific interest areas in an image, they are preferable only in specific individual applications rather than in general scenarios. Nonetheless, ROI-based methods are potentially useful because they can be combined and extended, allowing multiple measures to be adopted in a similar framework. Here, we adopt the ROI approach to assess suitable exposure levels over only the interesting regions for the surround-view imaging application, as described in Section V.

²Specialized hardware implementation is required as in [8, 9, 10].
³Concepts of camera metering and auto exposure are introduced in: http://www.cambridgeincolour.com/tutorials/camera-metering.htm.

Other methods exploit prior knowledge of a scene to determine the proper camera exposure. Given a pre-defined reference area (with known properties such as brightness or color patterns) in captured images, the exposure is adjusted based on an intensity histogram [19], color histogram [15], or image entropy [13]. While approaches based on prior information work well in a known environment, it is difficult to assume that prior information is always present in any scene—especially in most image processing applications for outdoor mobile platforms. Therefore, we focus on measuring goodness-of-fit for exposure with no reference.

The work of Zhang et al. [20] is closely related to our method because it mainly focuses on improving the performance of the visual odometry application and adopts a weighted version of gradient information. The primary difference lies in their method for updating exposure, which is derived from the derivative of the gradient magnitude. To derive a direct update rule from the gradient measurement to exposure, they rely highly on radiometric calibration. Our method does not require radiometric calibration. Instead, we directly update the exposure based on a simulation process.

To overcome the limitations of hardware and exposure control, high dynamic range (HDR) imaging provides a way to obtain well-exposed images that are both visually and physically compelling (i.e., recovering irradiance). Because all the difficulties of auto-exposure stem from the limited dynamic range of camera sensors, combining multi-exposure images can be an alternative, typical HDR approach [21, 22]. Maximizing the visual information in an HDR result requires a set of images with different exposures [23, 24]. However, such multi-exposure approaches are not suitable for dynamic scenes. While HDR methods for dynamic scenes exist [25, 26, 27], recovering the true irradiance of the dynamic portion is challenging when the inputs are captured at different times [28]. Even when a pair of low dynamic range images from a stereo camera are captured simultaneously to generate an HDR image [29], the images may still suffer from misalignment artifacts due to imperfect stereo matching. This problem led to the development of a practical hardware implementation for HDR capture [8] that simultaneously captures several different exposure images of the same view. In this work, rather than adopting an expensive specialized hardware setup, we develop a method to obtain well-exposed images from off-the-shelf cameras.

Previous studies have mostly relied on a single-camera setup, while multi-camera exposure control has rarely been investigated. In practical image processing applications for mobile platforms, multi-camera setups are popular, and acquiring images with similar brightness between cameras is required in many applications. Nonetheless, exposure differences between cameras are common, and such differences may be quite large due to different viewpoints when exposure control is applied to each camera independently. Conventional multi-camera vision applications have been developed with specially designed algorithms to compensate for or overcome brightness differences. These algorithms include correspondence matching [30], panoramic imaging [31] and visual odometry [32, 33], etc. However, all these approaches are applied as postprocessing steps that occur after image acquisition. Unavoidably, scene information beyond the dynamic range of a camera is difficult to recover after an image has been captured; therefore, the working range of postprocessing approaches is limited by the input. In contrast, our extension for multi-camera setups allows all the cameras to obtain coherently exposed images; thus, subsequent image processing algorithms benefit from this image stability and robustness.

Fig. 3: Our mapping function, Eq. (1), between the gradient magnitude and the amount of gradient information according to the control parameters, δ and λ. We design the mapping function by analyzing general gradient characteristics. (Axes: gradient responses m_i versus amount of gradient information; curves for λ = 10⁰ to 10⁴.)

III. GRADIENT-BASED AUTOMATIC EXPOSURE CONTROL

A. Image quality from an image-processing perspective

Intensity gradient is one of the most important cues for processing and understanding images. Most image features, such as edges, corners, SIFT [34], and HOG [35], leverage the robustness of the intensity gradient to illumination changes. Moreover, the gradient information also characterizes object appearance and shape well; consequently, it is typically exploited in applications requiring a mid-level understanding of image content (e.g., object detection, tracking, recognition, and simultaneous localization and mapping (SLAM)). Therefore, capturing images with rich gradient information is an important first step in the success of many vision algorithms. In this section, we explain how the quality of exposure of a scene is evaluated in terms of gradient quantity.

To evaluate image quality from a computer vision perspective, it is natural to exploit the gradient domain since gradient is the dominant source of visual information for many computer vision algorithms [36]. We define well-exposed images as those that have rich gradient information; therefore, we evaluate exposure by the total amount of gradient information in an image.

The gradient magnitude of a natural scene has a heavy-tailed distribution [37, 38]; consequently, most gradients have small values (including noise or zeros) relative to the maximum gradient value, and only sparse strong gradients exist. Because these strong gradients typically occur around object boundaries, they have a high probability of embedding important information. On the other hand, gradients are sensitive to subtle intensity variations (i.e., small magnitude gradients such as image noise should be filtered properly).

To balance the importance of weak and strong gradients, we use a logarithmic function⁴ to map the relationship between gradient magnitude and the quantity of gradient information.

⁴In information theory, the quantity of information is often measured by a logarithmic function, e.g., entropy.

However, logarithmic mapping may over-emphasize small gradients that occur due to image noise. To perform mapping that is robust against image noise, we modify the mapping function to discard small noise values using a simple threshold, defined as follows:

\bar{m}_i = \begin{cases} \frac{1}{N}\,\log\!\left(\lambda\,(m_i-\delta)+1\right), & \text{for } m_i \ge \delta \\ 0, & \text{for } m_i < \delta \end{cases}    (1)

where N = log(λ(1 − δ) + 1), m_i is the gradient magnitude⁵ at location i, δ is the activation threshold value, λ is a control parameter to adjust mapping tendencies, and m̄_i represents the amount of gradient information corresponding to the gradient magnitude. In addition, N is a normalization factor that restricts the output range of the function to [0, 1].

Eq. (1) has two user control parameters, δ and λ, that allow users to tune our method based on their needs. The parameter δ determines the activation threshold value: the mapping function regards a gradient value smaller than δ as noise and ignores it. The parameter λ determines the mapping tendencies. We can emphasize strong intensity variations by setting λ to a small value or emphasize subtle texture variations by setting λ to a large value. Figure 3 shows a plot of the mapping functions with varied control parameters. Using Eq. (1), we compute the total amount of gradient information in an image as M = Σ_i m̄_i. Our method regards images with larger values of M as better-exposed images that contain richer gradient information in a scene. In all the experiments in this work, we empirically set δ and λ to 0.06 and 10³, respectively. The related experiments are presented in Section V-A2.

B. Auto-adjusting camera exposure

Our method adjusts camera exposure at each frame by increasing the proposed criterion in Eq. (1). One challenge is that the true relationship between exposure time and sensed gradient (or intensity) is unknown [40]; instead, complex imaging pipelines exist in modern cameras. Revealing the effects of such pipelines is another challenging research area. Thus, rather than modeling the complex relationship explicitly, we propose an alternative approach that avoids the difficulty involved in modeling the imaging pipeline. We develop a simulation-based feedback system that allows us to find the approximate direction of the exposure to be updated. We simply adopt a γ-mapping to simulate exposure changes, which roughly approximates non-linear imaging pipelines as well as under- or over-saturation effects. We generate γ-mapped images I_out = I_in^γ from the current input image I_in⁶. The γ-mapping results in a brighter image when γ < 1 and a darker image when γ > 1. Using this approach, we simulate exposure changes from a batch of γ-mapped images, compute the amount of gradient information for each image, and then find the γ̂ that maximizes the gradient information:

\hat{\gamma} = \arg\max_{\gamma} M(I_{in}^{\gamma}).    (2)

Computing Eq. (2) for all possible γ values would not be a negligible operation for high-resolution image regimes. However, computing the gradient information at such a fine scale may be unnecessary because exposures are insensitive to subtle gradient differences in the image domain to some extent; hence, as a trade-off between computational cost and accuracy, an effective number of anchor images and image resolution must be determined. To determine a proper option, we reveal the trade-off relationship through synthetic experiments in Section V-A3.

At runtime, to improve the accuracy of the γ̂ estimate, we first compute the gradient values for each anchor image and then fit the gradient values with a fifth-order polynomial. We pick the maximum value of the polynomial function in the range of γ = [1/1.9, 1.9] and assign its corresponding γ value to γ̂. The range parameters are empirically determined; the related experiments can be found in the supplementary material.

Fig. 4: Examples of linear and nonlinear update functions with varying values of the control parameters K_p and d: (a) linear update function by K_p, (b) nonlinear update function by K_p, and (c) nonlinear update function by d.

⁵In this paper, the gradient magnitude is computed by the Sobel operator [39], and we normalize the magnitude to a range of [0, 1].
⁶We assume that intensity is in the range of [0, 1].
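For concreteness, the metric of Eq. (1) and the simulation step of Eq. (2) can be sketched as below. This is a minimal illustration rather than the authors' implementation: it assumes NumPy/OpenCV, a 3×3 Sobel operator with an assumed magnitude normalization, the seven anchors reported in Section V-A3, and the fifth-order polynomial fit described above; all function names are ours.

    import cv2
    import numpy as np

    ANCHORS = (1/1.9, 1/1.5, 1/1.2, 1.0, 1.2, 1.5, 1.9)  # anchors from Section V-A3

    def gradient_info(img01, delta=0.06, lam=1e3):
        """Total gradient information M of Eq. (1) for an intensity image in [0, 1]."""
        g = img01.astype(np.float32)
        gx = cv2.Sobel(g, cv2.CV_32F, 1, 0, ksize=3)
        gy = cv2.Sobel(g, cv2.CV_32F, 0, 1, ksize=3)
        # normalize the magnitude to [0, 1]; dividing by the 3x3 Sobel bound is our assumption
        mag = np.clip(np.sqrt(gx * gx + gy * gy) / (4.0 * np.sqrt(2.0)), 0.0, 1.0)
        N = np.log(lam * (1.0 - delta) + 1.0)             # normalization factor of Eq. (1)
        m_bar = np.log(lam * np.maximum(mag - delta, 0.0) + 1.0) / N   # zero below the threshold
        return float(m_bar.sum())

    def estimate_gamma(img01, anchors=ANCHORS):
        """gamma-hat of Eq. (2): score gamma-mapped anchor images, fit a 5th-order
        polynomial to the scores, and take its maximizer over [1/1.9, 1.9]."""
        xs = np.asarray(anchors, dtype=np.float64)
        scores = [gradient_info(np.power(img01, g)) for g in xs]
        coeffs = np.polyfit(xs, scores, 5)
        grid = np.linspace(1.0 / 1.9, 1.9, 200)           # dense evaluation of the fitted curve
        return float(grid[np.argmax(np.polyval(coeffs, grid))])

In practice the image would also be downsampled (the paper uses 320 × 240) before computing the scores, which is omitted here for brevity.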
C. Camera exposure update function

In this section, we describe the update rules for camera exposure given γ̂. The update rules are designed such that the current exposure will move toward a value that eventually results in γ̂ being 1. We propose two methods to update the camera exposure: a linear update function and its extension to a nonlinear version. These two functions are designed to adjust camera exposure in inverse proportion to γ̂. Figure 4 (a–c) shows how the linear and nonlinear update functions work based on the control parameters, K_p and d. The details of the update functions with the control parameters are described as follows.

Linear update function. The linear update is defined as

E_{t+1} = \left(1 + \alpha K_p (1 - \hat{\gamma})\right) E_t, \quad \alpha = \begin{cases} 1/2, & \text{for } \hat{\gamma} \ge 1 \\ 1, & \text{for } \hat{\gamma} < 1 \end{cases}    (3)

where E_t is the exposure level at time t, and K_p is the proportional gain that controls the convergence speed by adjusting the maximum and minimum values of the exposure update ratio. Figure 4 (a) shows the tendency of this function according to K_p. As shown in the figure, there is a trade-off between the convergence speed and the stability of the update function. A high K_p value causes the feedback system to catch up quickly but may cause oscillation and overshooting.

Nonlinear update function. The oscillation and overshooting problems of the linear update function are caused by the non-smooth transition of exposure update ratios at the convergence point (γ̂ = 1.0) (the curved point in Figure 4 (a)).

Algorithm 1: Gradient-based Camera Exposure Control
Input: Camera index (i) and its neighboring cameras (j)
1  function Main(camera_i, camera_j = Null)
2    Set_Camera_Mode(camera_i, built-in AE);
3    pause(1);
4    Set_Camera_Mode(camera_i, manual);
5    while camera_i is alive do
6      E_i ← Get_Exposure(camera_i);
7      I_i ← Get_Image(camera_i);
8      for γ in anchors do
9        I_i^γ ← Correction(I_i, γ);
10       M_i^γ ← Compute_Gradient(I_i^γ), Eq. (1);
11     end
12     γ̂ ← argmax_γ M_i^γ, Eq. (2);
13     newE_i ← Update_Exposure(γ̂, E_i), Eq. (3 or 4);
14     if camera_j is not Null then
15       I_j ← Get_Image(camera_j);
16       newE_i ← Balancing(I_i, I_j, E_i, newE_i), Eq. (5);
17     end
18     Set_Exposure(camera_i, newE_i);
19   end

Fig. 5: The images in this figure show one of the worst cases in the performance of the surround-view application caused by low inter-image consistency: (a) built-in auto-exposure, (b) our proposed method. Images (a) and (b) show the original images and the resulting surround-view image using the built-in camera auto-exposure and the proposed method, respectively. Each camera is individually adjusted without exposure balancing.

To relieve this problem, we designed a new update function that is parametric and nonlinear. This nonlinear update function is defined as follows:

E_{t+1} = \left(1 + \alpha K_p (R - 1)\right) E_t, \quad \text{s.t. } R = d \cdot \tan\!\left\{(2 - \hat{\gamma}) \cdot \arctan\!\left(\tfrac{1}{d}\right) - \arctan\!\left(\tfrac{1}{d}\right)\right\} + 1,    (4)

where the same α as in Eq. (3) is used. The R mapping is designed to realize curved shapes. The nonlinear update function has two control parameters, K_p and d. Figure 4 (b) and (c) show how the parameters K_p and d work to control the convergence speed and stability of the exposure update ratios. As with the linear update function, K_p controls the speed of convergence from the current exposure level to the desired exposure level. The additional control parameter d controls the nonlinearity of the update function. As shown in the figure, a smaller d value causes greater nonlinearity of the slope of the update ratios, which results in a smoother transition at the convergence point.
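A direct transcription of the two update rules might look as follows. This is a sketch of Eqs. (3) and (4) only; the default parameters are taken from the best-performing setting reported in Figure 8 (K_p = 0.2, d = 0.75), not from any released code.

    import math

    def linear_update(E_t, gamma_hat, Kp=0.2):
        """Linear exposure update, Eq. (3)."""
        alpha = 0.5 if gamma_hat >= 1.0 else 1.0
        return (1.0 + alpha * Kp * (1.0 - gamma_hat)) * E_t

    def nonlinear_update(E_t, gamma_hat, Kp=0.2, d=0.75):
        """Nonlinear exposure update, Eq. (4); the tan/arctan reshaping R flattens the
        update ratio near gamma_hat = 1 so the loop settles without overshoot."""
        alpha = 0.5 if gamma_hat >= 1.0 else 1.0
        R = d * math.tan((2.0 - gamma_hat) * math.atan(1.0 / d) - math.atan(1.0 / d)) + 1.0
        return (1.0 + alpha * Kp * (R - 1.0)) * E_t

At γ̂ = 1 both rules leave the exposure unchanged; the difference is only in how aggressively they move the exposure away from the convergence point.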
IV. EXPOSURE BALANCING FOR MULTI-CAMERA SYSTEM

Figure 5 shows an example of a surround-view imaging application that stitches multiple camera images together. Even though the images captured by each camera have a reasonable quality, the brightness inconsistency among the images degrades the overall quality of the surround-view image. This type of problem in multi-camera systems can easily be observed in many applications, such as 360° panoramic imaging and multi-camera tracking, when each camera controls its exposure individually. In contrast, when all the cameras share the same camera exposure to achieve constant brightness levels, the system captures only a narrow dynamic range and may lose important scene information.

Using our gradient-based exposure control approach, we also address the problem of advanced exposure control for multi-camera systems, which are popular in many vision systems such as autonomous vehicles. While independent control of each camera enables capturing of better image features, many cases exist for multi-view image processing, such as panoramic imaging and stereo matching, that favor maintaining consistent brightness among adjacent cameras. To address such cases, we present an exposure balancing algorithm that both achieves gradient-based camera control and simultaneously maintains brightness consistency between neighboring images. On top of our gradient-based exposure control, we formulate an optimization problem that determines camera exposure by considering a balance between individual image quality and brightness constancy. Given the desired camera exposures obtained by Eq. (4), our optimization function considers the desired exposure as a unary term and the brightness similarity in overlapped regions as a pairwise term. The optimization function is defined as follows:

E^{i*}_{t+1} = \arg\min_{X} \; \alpha^{i}\,\mathcal{E}_u(i,t) + \frac{1-\alpha^{i}}{N} \sum_{j \in G(i)} \mathcal{E}_p(i,j,t),
\quad \text{s.t. } \mathcal{E}_u(i,t) = \lVert X - E^{i}_{t+1} \rVert^2, \quad
\mathcal{E}_p(i,j,t) = \lVert X - r^{ij} \cdot E^{i*}_{t} \rVert^2, \quad
r^{ij} = \operatorname*{median}_{p \in P} \left( \frac{\operatorname{mean}(p_j)}{\operatorname{mean}(p_i)} \right),    (5)

where i denotes a target camera on which the exposure level should be updated, j denotes a neighboring camera whose field of view overlaps with camera i, N is the number of neighboring cameras, and t denotes the time sequence. Here, E^{i*}_{t+1} indicates the estimated optimal exposure of camera i for the t+1 frame (next frame), and E^{i}_{t+1} is the estimated exposure of camera i from Eq. (4). P represents a set of overlapped patches between two cameras, mean(p) is the average patch intensity, and r^{ij} is the median value of the relative brightness ratios between corresponding patches from cameras i and j. We use the median ratio of the average patches to make our solution robust to image noise and misalignment of the overlapped regions.

The exposure values among cameras were balanced by our exposure balancing method in Eq. (5). The control parameter α in Eq. (5) adjusts the weights of the unary and pairwise terms. To account for varying illumination conditions well, we updated α as follows:

\alpha^{i} = \begin{cases} (1 - R^{i}) + 0.5, & \text{if } R^{i} < 1.0 \\ R^{i}/2, & \text{if } R^{i} \ge 1.0 \end{cases}    (6)

where i is the camera index, and R^{i} is the exposure ratio between the current and estimated exposure values of our update function, R^{i} = E^{i}_{t+1}/E^{i}_{t} ∈ [0.5, 2.0]. Therefore, we give a large weight to the unary term for fast scene adaptation when the exposure levels vary widely, and we increase the influence of the pairwise term to quickly converge to an exposure balancing point when the exposure levels are largely stationary.

In this optimization, 𝓔_u(·) and 𝓔_p(·) denote the unary and pairwise terms, respectively, and the two terms are balanced by α. Therefore, a large α encourages more gradient information in a scene while a small α enforces the brightness similarity between cameras. Eq. (5) has a closed-form solution. For each frame, we solve Eq. (4) and Eq. (5) only once for each camera i because they will eventually converge through our feedback system. This procedure allows the exposure parameters to progressively adapt to a scene. Algorithm 1 describes the entire process of the proposed method.
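Under our reading of Eq. (5) (a quadratic in the unknown exposure X), the minimizer is simply a weighted average of the camera's own desired exposure and the ratio-scaled, neighbor-consistent targets. The sketch below assumes that reading and uses Eq. (6) for the adaptive weight; the helper names and patch handling are ours, not the authors'.

    import numpy as np

    def brightness_ratio(patches_i, patches_j):
        """Median ratio r_ij of mean patch intensities over the overlap
        (the median makes it robust to noise and misalignment)."""
        ratios = [np.mean(pj) / max(np.mean(pi), 1e-6) for pi, pj in zip(patches_i, patches_j)]
        return float(np.median(ratios))

    def alpha_weight(E_desired, E_current):
        """Adaptive unary/pairwise weight of Eq. (6), driven by R = E_desired / E_current in [0.5, 2.0]."""
        R = float(np.clip(E_desired / E_current, 0.5, 2.0))
        return (1.0 - R) + 0.5 if R < 1.0 else R / 2.0

    def balanced_exposure(E_desired_i, E_prev_i, ratios_to_neighbors, alpha):
        """Closed-form minimizer of the quadratic objective in Eq. (5): a weighted
        average of the desired exposure and the targets r_ij * E_prev_i."""
        targets = [r * E_prev_i for r in ratios_to_neighbors]
        pairwise = float(np.mean(targets)) if targets else E_desired_i
        return alpha * E_desired_i + (1.0 - alpha) * pairwise

Setting α close to 1 recovers independent per-camera control, while α close to 0.5 pulls neighboring exposures toward brightness-consistent values, which matches the behavior described above.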

Fig. 6: Changes of gradient distributions according to the λ and δ parameters: (a) normalized histogram of the gradient by δ, (b) distributions of gradient magnitude by λ and δ, and (c) the relationship between desired exposure times and the user parameters. Additional experimental results can be found in the supplementary material.

V. EXPERIMENTAL RESULTS

To validate the performances of our proposed methods, we conduct both synthetic experiments and real experiments. In the synthetic experiments, we analyze the various components of our proposed method regarding the image quality metric, exposure estimation, and the update functions described in Section III-A, Section III-B, and Section III-C, respectively. In the real experiments, we show the effectiveness of our proposed method with five image processing applications. The surveillance and automotive visual odometry experiments demonstrate the results of the proposed single-camera exposure control method (Ours-single, henceforth). We also conduct experiments on surround-view, panoramic imaging, and stereo matching applications using the proposed multi-camera exposure control method (Ours-multi, henceforth). We compare our method to two conventional methods, a camera built-in auto-exposure method (AE, henceforth) and a manually tuned fixed exposure setting (ME, henceforth). We perform all the experiments using machine vision cameras that have a linear camera response function and configure the initial camera exposures using the AE method. All the parameters in our method are fixed for all the experiments. You can find additional materials such as video clips on our project webpage⁷.

A. Analysis of our proposed method

1) Synthetic experiments using HDR images: We devise a synthetic simulator-based experiment to facilitate the complicated comparison process in reproducible test environments. We mimic a simplified camera imaging pipeline for rendering low dynamic range (LDR) images from high dynamic range (HDR) images based on desired exposure times and pre-compute the ground truth values for γ and exposure. Given this controlled simulator, we tested various setups to observe the relationship between exposure times and our gradient metric according to the λ and δ values in 2), the trade-off relationship between the number of anchor images and the image resolution in 3), and the convergence speed of the update functions in 4).

2) Analysis of image quality metric: We analyze the gradient statistics of a sample HDR scene according to the user parameters described in Section III-A. Figure 6 shows the gradient distributions, from which we can observe the effects of changing the λ and δ values on scene characteristics and desired exposure times: (a) shows the histogram changes resulting from the activation threshold δ. A high δ value removes considerable gradient information when computing the gradient magnitude and adversely affects the exposure estimation; (b) shows the gradient magnitude over the entire gradient range; and (c) shows the relationship between the estimated exposure and the two parameters.

Given the results of this experiment, we empirically set δ and λ to 0.06 and 10³, respectively. These values show good patterns of the gradient distributions, and they work consistently in the synthetic simulation compared to other parameter values. We fix these two parameters throughout all the subsequent experiments.

3) Performance analysis according to the number of anchors and image resolution: Figure 7 shows the performance variations according to the number of anchors and the image resolution. We measure the error against the ground truth γ, which is obtained by exploiting all the finely quantized candidate γ values. Figure 7 (a) shows that at least five anchors are required to achieve a stable γ estimation. The computational speed depends on both the image resolution and the number of anchors. (b) shows the processing time of the proposed method with different image resolutions, and (c) shows how the image resolution affects the accuracy of the estimated exposure. In terms of our proposed metric, we consider the exposure computed from an image resolution of 2,560 × 1,280 as a reference.

In this section, we investigate the relationship between the effective number of anchor images and the image resolution to understand the trade-offs between computational cost and accuracy. To determine a proper value, we provide the trade-off relationship through synthetic experiments in Figure 7. Given the results of this experiment, we choose seven anchors, γ ∈ [1/1.9, 1/1.5, 1/1.2, 1.0, 1.2, 1.5, 1.9], and fix them through all the subsequent experiments in this work. These anchors have similar errors as other larger anchor sets but achieve a reasonable processing time. For the image resolution, we adopted a resolution of 320 × 240 pixels to achieve real-time performance (14.15 ms using an Intel Core i5 CPU without parallel processing); this resolution preserves most of the major structures in the original image.

4) Analysis of the convergence speed of the update methods: To determine the proper parameter values for the linear and nonlinear update functions, we ran a parameter-sweeping experiment. To conduct this experiment, we controlled the environment in a dark room with controllable illumination for two reasons: 1) the parameters are related only to the convergence speed and stability; thus, various levels and continuous illumination variation are necessary, and 2) the parameters are invariant to the absolute value of illumination and only relevant to its changes; thus, we can generalize the results to a real environment.

We used five LED light sources. Each LED was switched on and off repeatedly at different time intervals controlled by digital timers. All the sets of parameters were tested at intervals of 0.1, and we empirically evaluated the convergence speed and stability for each parameter set. Figure 8 shows the results with the best-performing parameters, which resulted in fast convergence with low overshoot and small oscillations. Due to the smooth slope of the nonlinear update function around the convergence point (1, 1), the nonlinear update function rarely suffered from overshoot or oscillation, while the linear update function suffered from both problems.

We compared the proposed update methods with Zhang et al. [20] and a standard PID controller [41] using the synthetic experiment described in Section V-A1. To implement the method of Zhang et al.⁸, intensity boundary values are used as additional control parameters to limit the operating range and avoid uncontrollable situations. When the mean intensity of the captured image is within the predefined boundaries, the method works plausibly. However, when the mean intensity is outside the boundary range, this method controls the exposure by relying solely on image intensity. Because this parameter leads to noticeable behavior differences in Zhang's method, we tested it with varying boundary parameters, i.e., (70, 190), (50, 210), (30, 230), and (10, 250), in Figure 9. For example, Zhang–(70, 190) controls the exposure based on its gradient metric only when the mean intensity of a captured image is between 70 and 190. In the range from the 1st to the 20th frame in Figure 9 (top), Zhang–(70, 190), –(50, 210), and –(30, 230) control the exposure using mean intensity only, because the given mean intensities are outside of the predefined boundaries. After that, they switch their control scheme to the gradient-based method. Zhang–(10, 250) does not update the exposure, because the camera response function was unable to provide gradients able to update at low intensities from the initial frame. We also observed that the method of Zhang et al. is sensitive to the quality of the radiometric calibration.

We also implemented a reference method using the standard PID controller. Because we can consider that the optimal value of γ̂ should be one when the exposure is desirable, we use the difference between the estimated γ̂ and 1 as the input error to implement the PID-based method. We tuned the PID parameters based on the Ziegler–Nichols tuning method [41, 42]. While we attempted to further tune the PID update function and find the model that works best, we did not find parameters that achieved faster convergence than our proposed nonlinear update method with small overshoot and small oscillations. Our linear update function performs comparably to Zhang et al. and the PID-based method, and our nonlinear update function performs favorably against all the other methods in terms of the exposure convergence speed, while its qualitative results at the converged points are comparable (see the supplementary material).

B. Single-camera experiments

1) Implementation: Figure 10 (a) shows the camera system used to evaluate the single-camera exposure control. For comparative evaluation, we used three Flea3 cameras, each of which is equipped with a Sony ICX424 CCD 1/3″ sensor with a 640 × 480 resolution and a 58.72 dB dynamic range. The three cameras are placed in parallel and synchronized by using an internal software trigger. Each camera's exposure parameters were determined by AE, ME, and Ours-single, respectively.

We used two camera parameters, shutter speed (exposure time) and gain, because controlling the shutter speed affects the frame rate, which is often critical for an application. We describe both parameters as the exposure level E in Eq. (3). In our implementation, to increase E, we first control the shutter speed based on the value of E until it reaches a pre-defined maximum value (we set it to 25.51 ms for the following two applications, surveillance and automotive visual odometry). Then, after the shutter speed reaches the maximum value, we increase the gain. In the opposite case, the shutter speed is adjusted when it is less than the maximum value.

⁷https://sites.google.com/site/iwshimcv/home/multicamera
⁸We obtained the original code from the authors, and we used all the parameters suggested by the authors. The intensity boundary parameter (70, 190) is the original parameter used by the authors.
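The shutter-then-gain policy described above can be sketched as follows. Only the 25.51 ms shutter cap is taken from the text; the base shutter reference (the shutter time that corresponds to E = 1) and the linear gain model are illustrative assumptions.

    def split_exposure_level(E, max_shutter_ms=25.51, base_shutter_ms=1.0):
        """Map a scalar exposure level E onto (shutter_ms, gain_factor): the shutter
        absorbs the exposure first and, once it saturates at max_shutter_ms, the
        remainder is realized by sensor gain."""
        target_ms = E * base_shutter_ms
        shutter_ms = min(target_ms, max_shutter_ms)
        gain_factor = max(target_ms / max_shutter_ms, 1.0)  # 1.0 means no extra gain
        return shutter_ms, gain_factor

Decreasing E follows the same split in reverse: gain is reduced back to its minimum before the shutter time is shortened.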

Fig. 7: Performance analysis based on the number of anchors and image resolutions: (a) mean errors and standard deviations of the estimated γ̂ according to the number of anchors (the image resolution was fixed to 2560 × 1920 to test varying numbers of anchors), (b) processing time according to the number of anchors and image resolutions, and (c) mean errors and standard deviations of the estimated exposure according to the image resolutions (we used 11 anchors to analyze the effects of varying the image resolution).

Fig. 8: Camera parameter changes according to illumination changes. The built-in auto-exposure method has different exposure values than our method because it uses different exposure evaluation metrics.

Fig. 9: Convergence speed comparison of update methods in the synthetic simulator. The top and bottom graphs show the respective scenarios under increasing and decreasing exposures. The differences between the saturation points of our methods and those of Zhang et al. [20] are caused by the use of different image quality metrics.

Fig. 10: Camera systems used for the experimental validations. The three cameras in (a) have the same hardware specifications, and they are synchronized using an internal software trigger. Each camera's exposure parameters were determined by AE, Ours-single, and ME, respectively. (b) shows the multi-camera system on a mobile robot (Clearpath HUSKY A200). The four-camera system has the same hardware specifications, and the cameras are synchronized by a hardware trigger.

Fig. 11: Real scene comparison in the SURVEILLANCE-B dataset. The images in the top rows were captured every two hours from 8:00 to 18:00. In each row, we display 21 images sampled from 210 different exposure images at every time step. The graph at the bottom shows bar graphs of the gradient magnitudes corresponding to the 21 images at each time step. For example, the third bar in the first column (08:00) indicates the gradient magnitude of the third image in the first row of the top. The colored markers indicate the best exposures (steady-state exposures) recommended by the respective methods at each time step.

2) Surveillance application: To validate Ours-single in a surveillance application, we recorded image sequences every two hours from 8:00 to 18:00 each day. We collected two types of datasets.

Dataset. One dataset was collected from the three cameras after the cameras reached steady state (SURVEILLANCE-A). Sequences were recorded for approximately 10 minutes at every time step. This dataset was used to compare the performance of pedestrian detection as a surveillance application; approximately 2,500 pedestrians appeared in the sequences. For the ME method, we used the same parameter throughout the dataset; it was initially determined at 8:00 by manual tuning.

The other dataset was acquired by sweeping the full range of possible exposure levels to validate our algorithm (SURVEILLANCE-B). We sampled 210 exposure parameters covering the full exposure range; therefore, the dataset consisted of 1,260 (= 210 × 6 time steps) images.

Comparison of steady-state exposures. Figure 11 shows a comparison of the results of our method with AE and ME in a real daytime scene from the SURVEILLANCE-B dataset. At each acquisition time, we annotate the color markers to indicate the best-exposed image as recommended by the respective methods. While we tuned ME against the scene at 8:00 AM, its fixed parameter obviously causes the results to suffer from illumination changes over time. The AE approach excludes textural details due to the dominant illumination from the sky region. In contrast, Ours-single properly adjusts the camera parameters to capture images with the maximum gradient information at every time step.

Fig. 12: Progress of our feedback system. We put the leftmost images into our feedback system, and the system iteratively converges to the rightmost image. The numbers indicate the γ̂ values estimated by Ours-single.

Fig. 13: Quantitative evaluation results of the pedestrian detection experiment. Results closer to the bottom left indicate better performance. The graphs show the miss rate against false positives per image (fppi) for DPM [43] and SSD [44].

Fig. 14: Example results of pedestrian detection and visualizations of feature spaces using HOGgles [36]: (a) AE, (b) ME, (c) Ours-single. All the input images are from the SURVEILLANCE-A dataset, and the images in each row were captured at the same time. From top to bottom, the images were captured at approximately 8:00, 14:00, 16:00, and 18:00.

Fig. 15: Trajectories estimated by the visual odometry [45]: (a) AE, (b) ME, (c) Ours-single. Because the vehicle followed a homing path, the ground-truth values of the start and end points of all trajectories are centered on (0, 0).

Fig. 16: Images from the two locations (P1 and P2) on the path indicated in Figure 15: (a) AE, (b) ME, (c) Ours-single. The top row shows images at 14:00 and the bottom row shows images at 18:00. The green lines indicate tracked image features between adjacent frames.

Our method is more likely to result in an exposed image that both quantitatively and qualitatively contains richer textural details than the results of the other methods.

Validation of our feedback system. In Figure 12, we show the feedback system procedure, which is explained in Section III-B. To simulate extreme cases in which the illumination varies rapidly, we applied Ours-single to SURVEILLANCE-B. In the figure, the output images of our feedback system are presented from left to right. We first put both an under-exposed image (the leftmost image on the top row) and an over-exposed image (the leftmost image on the bottom row) into our feedback system. From each image, our algorithm estimated γ̂ and updated the camera exposure according to Eq. (3). Using an updated camera exposure, we took the image that matched the exposure parameter in the dataset, and iteratively applied our feedback system until it converged. Our system converged to the rightmost image, which demonstrates that our method can reliably adjust camera exposure even in extreme cases. The numbers in the figure indicate the γ̂ values estimated by Ours-single. We can observe that γ̂ converged to one as the output images converged to the appropriate exposure level in the rightmost image.

Note that the scene has a dynamic range that exceeds the dynamic range of our camera. In the same situation, the AE method wiped out important details in low radiance areas to prevent saturation in the sky region, as shown in Figure 14. Our method naturally adjusts camera exposure to emphasize important details by evaluating camera exposure in the gradient domain.

Pedestrian detection. In this experiment, we compared the pedestrian detection performance with that of other exposure methods. Pedestrian detection is an important task in surveillance. The pedestrian detection sequences for this experiment came from the SURVEILLANCE-A dataset, and we used two types of detectors: a hand-crafted, gradient-feature-based pedestrian detector, the DPM [43], and a convolutional neural network based object detector, SSD [44]⁹.

Figure 13 shows the quantitative evaluation results of the pedestrian detection experiment. For this evaluation, the ground truth was manually labeled for all the data. Following [48], we adopt the miss rate against false positives per image (FPPI) as an evaluation metric. The evaluation metric indicates better performance as results get closer to the bottom left; therefore, it shows that our method has a better capability of preserving the details of images at the capturing stage than do the conventional AE and ME methods.

⁹The SSD model was pre-trained on the VOC 2007 and 2012 datasets [46, 47].
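Once detections have been matched to the labeled ground truth (e.g., by an overlap test, which is assumed to have been done upstream), the curve in Figure 13 reduces to simple per-threshold bookkeeping. The sketch below is a rough rendering of that protocol, not the exact evaluation code of [48].

    import numpy as np

    def miss_rate_fppi_curve(tp_per_threshold, fp_per_threshold, total_gt, num_images):
        """Convert per-threshold true/false positive counts into the (FPPI, miss rate)
        pairs plotted in Fig. 13 (typically drawn on log-log axes)."""
        tp = np.asarray(tp_per_threshold, dtype=float)
        fp = np.asarray(fp_per_threshold, dtype=float)
        miss_rate = 1.0 - tp / max(total_gt, 1)
        fppi = fp / max(num_images, 1)
        return list(zip(fppi.tolist(), miss_rate.tolist()))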

Fig. 17: Quantitative evaluation results of the visual odometry experiment: (a) means and standard deviations of the number of inlier features, and (b) relative distance error ratios.

Figure 14 shows some example results of pedestrian detection. The images obtained by the AE method are under-exposed due to the high illumination in the sky region, and the images obtained by the ME method are over- or under-exposed due to illumination changes. More importantly, the pedestrian detector was sensitive to image quality; it failed to detect humans in poorly exposed images.

We used HOGgles [36] to visualize images from an image-processing perspective to qualitatively evaluate the exposure control algorithms. HOGgles inverts HOG feature spaces back to a natural image; therefore, it is useful for understanding how a computer sees visual images. In Figure 14, the HOGgles visualizations of our method consistently show detailed features despite large illumination changes, while the AE and ME methods were unable to preserve the visual information in low radiance regions due to incorrect exposures. The HOGgles visualization clearly demonstrates that our method is more suitable than are the conventional AE and ME methods for outdoor computer vision tasks.

3) Automotive visual odometry application: We performed the visual odometry experiments using an automotive driving dataset. Visual odometry is the process of incrementally estimating the pose of a vehicle by analyzing images captured by a vehicle-mounted camera. This experiment shows how the image-capturing methods affect the feature extraction performance, which reveals the cumulative effects on tracking and the trajectory path results.

Dataset. In the automotive visual odometry experiment, we collected images taken from a vehicle driving through a campus. To validate the performance under various illumination conditions, we tried to drive almost the same path three times, at 14:00, 16:00, and 18:00 hours. The exposure parameter for ME was initialized at 14:00. We chose a homing path to easily measure the translation error between the starting and ending points of the path. This error measurement has been used to evaluate many localization methods [49, 50] because only a few reference points are needed rather than the ground truth for the entire path.

Visual odometry. To conduct visual odometry, we calibrated the camera's intrinsic parameters using the method of [51] and calibrated the pitch angle and height from the ground using the vanishing points of an image. We used the monocular SLAM algorithm [45] for this experiment.

Figure 15 shows the visual odometry trajectories using representative source images from AE, ME, and Ours-single. Because the Ours-single method captures richer gradient images, the subsequent algorithm extracts better features for tracking and performs more consistent localization with the smallest end-point errors across different times.

Images captured at two locations (P1 and P2) along the path are shown in Figure 16. We can observe that the illumination conditions varied widely across both space and time. Neither the AE nor the ME method was able to successfully support feature tracking due to under-exposure. In contrast, our method captured well-exposed images and successfully supported the extraction of image features conducive to tracking.

In Figure 17, we present the quantitative evaluation results. The statistics of the number of inlier features are presented in (a), and the distance error ratios of each trajectory are presented in (b). The distance error ratio is computed by dividing the distance error between the starting point and ending point of a closed-loop trajectory by the length of the estimated trajectory. The distance error ratio is used as an evaluation metric because an estimated trajectory has a scale ambiguity in monocular visual odometry. In the figure, our method consistently extracted a larger number of inlier features and resulted in smaller errors than did the AE and ME methods. For robust estimation, a larger number of inlier features is preferred, which is the main reason that our method achieved better results than the others.

C. Multi-camera experiments

To evaluate Ours-multi and demonstrate the versatility of our proposed method, we performed experiments with three applications: surround-view imaging, panoramic imaging, and stereo matching.

1) Implementation: Figure 10 (b) shows our multi-camera experiment system with a mobile platform, the Clearpath Husky-A200. The multi-camera system has four cameras synchronized by a hardware trigger, which generates synchronization signals at a rate of 20 Hz. For each camera, we used a Sony ICX445 CCD 1/3″ sensor with a fixed focal length of 1.4 mm. Thus, each camera had a 1288 × 964 resolution with a 58.44 dB dynamic range and a 185° × 144° field-of-view (FoV). The multi-camera system was fully calibrated, including both intrinsic and extrinsic parameters. To achieve the desired exposure level, we first adjusted the shutter speed until it reached the pre-defined maximum value. Then, we continued by adjusting the gain—the same approach as was used in the previous single-camera experiments.

In the subsequent experiments, the maximum shutter speed was set to 49 ms, and both the shutter speed and gain parameters of each camera were adjusted by the proposed nonlinear update function, Eq. (4).

2) Surround-view imaging application: The surround-view imaging (top-view) application is one of the most popular vision techniques in the automobile industry. A surround-view image for a vehicle is commonly created by stitching the ground parts of images from multiple cameras together; these images are mainly used to assist drivers while parking.

In this experiment, we generated a surround-view image from four partially overlapped input images, as shown in Figure 18.

2) Surround-view imaging application: Surround-view (top-view) imaging is one of the most popular vision techniques in the automobile industry. A surround-view image for a vehicle is commonly created by stitching together the ground parts of images from multiple cameras; these images are mainly used to assist drivers while parking.

In this experiment, we generated a surround-view image from four partially overlapped input images, as shown in Figure 18. We pre-calibrated the relative poses between cameras using a flat-ground assumption and applied a simple linear blending method [7] as a post-processing step. The overlapped regions are used to compute the relative brightness ratio r^ij in Eq. (5) to balance the exposure across cameras. In this application, we consider only the bottom-half region of the original image domain (corresponding to the ground region) because only the ground regions of the images are displayed to users; accordingly, we compute r^ij over the bottom-half region.

Figure 18 shows a comparison of the surround-view image results from different exposure control algorithms. In the figure, we show the intensity changes at the boundary regions between cameras in the close-up views, and we also visualize the effect of exposure balancing by plotting the intensity profiles along the boundaries of the rectangle (orange lines) in the surround-view images. The colors of the boundaries in the close-up views correspond to the pixel locations marked with the same color points in the surround-view images. As shown in (a) and (b), the surround-view images processed by the AE and Ours-single methods have noticeably large intensity transitions at the boundary regions between two neighboring images; these are caused by the independent exposure control of each camera. In this condition, it is difficult to achieve constant brightness across all the cameras. In contrast, the Ours-multi method results in relatively small intensity transitions by virtue of our exposure balancing method.

In Figure 18-(c), the un-warped images of Ours-multi are over-saturated except in the ground regions. Note that we adjust camera exposure levels by considering only the ground regions because surround-view imaging is only concerned with the ground portions of the images (as shown in the surround-view images).

Fig. 18: A qualitative comparison of the surround-view application using different exposure methods: (a) AE, (b) Ours-single, and (c) Ours-multi. The image patches in the second column indicate regions in the vicinity of the green, red, blue, and gray points in the surround-view images. The last column shows the 1D signals of the pixel values on the orange lines of the surround-view images. We recommend that readers zoom in to see the details clearly.

For the quantitative evaluation, we captured image sequences using a mobile robot with a multi-camera system under various illumination conditions (sunny, cloudy, parking lot, and nighttime scenes). We drove along almost the same path three times using different methods (AE, Ours-single, and Ours-multi) to obtain images for each dataset. Figure 19 presents the histograms of the intensity similarities of all the overlapped regions. These normalized histograms were computed from the intensity ratios between images from neighboring cameras in the overlapped region for all candidates. The histograms also plot an estimate of the probability density function of the intensity ratio. The result obtained by Ours-multi had the smallest mean value and standard deviation of the intensity ratio, which means that the cameras controlled by Ours-multi produce the most similar images in the intensity space. This helps generate more natural surround-view images without post-processing steps such as color transfer [52]. Additional results from the surround-view imaging experiment can be found in the supplementary material.

Fig. 19: Histograms of the intensity ratios for all overlapped regions (Auto: μ = 1.31, σ = 0.23; Ours-single: μ = 1.27, σ = 0.26; Ours-multi: μ = 1.15, σ = 0.07).
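For reference, the statistic plotted in Figure 19 can be reproduced along the following lines. This is our own sketch; the overlap masks come from the surround-view calibration, and, since Eq. (5) is not reproduced here, we use the symmetric ratio max/min so that a value of 1 means identical brightness.

```python
import numpy as np

def overlap_intensity_ratios(img_a, img_b, mask):
    """Per-pixel intensity ratios between two neighboring cameras.

    img_a, img_b: grayscale images warped into a common (ground-plane) frame;
    mask: boolean array marking their overlapped region.
    """
    a = img_a[mask].astype(np.float64)
    b = img_b[mask].astype(np.float64)
    valid = (a > 1.0) & (b > 1.0)               # ignore (near-)black pixels
    a, b = a[valid], b[valid]
    return np.maximum(a, b) / np.minimum(a, b)  # >= 1, symmetric in the two cameras

def summarize(all_ratio_arrays):
    """Mean and standard deviation over all camera pairs and frames (cf. Fig. 19)."""
    r = np.concatenate(all_ratio_arrays)
    return float(r.mean()), float(r.std())
```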
3) Panoramic imaging application: Panoramic imaging is another good example to demonstrate the effectiveness of our method for multi-camera exposure control. Panoramic imaging is similar to the previous surround-view imaging application; however, rather than only the ground portion, this application considers the entire image region. Thus, it is important to be able to capture informative image features from scenes with an extremely wide dynamic range.

In this experiment, we used a cylindrical panoramic model [53] with simple alpha and linear blending methods [7]. Additionally, to accurately match each pair of images, we initialized the camera poses using pre-calibrated extrinsic parameters. Then, we optimized the initial camera poses by minimizing the re-projection error of the matched feature points in the overlapped region of each image pair. We used the feature matching framework in [54] with the Harris corner detector [55] and the BRIEF feature descriptor [56]. The patches of corresponding feature points were used to compute the relative brightness ratios r^ij in Eq. (5).

Figure 20 shows the results of panoramic imaging. The panoramic images with simple alpha blending clearly show the differences between the results obtained with exposures calculated by Ours-multi and those calculated by the conventional AE method. The strong sunlight behind the building causes a wide dynamic range in the scene; therefore, the images obtained by the AE method have large brightness differences, and many informative parts of the foreground region are under-saturated. Such a wide difference can cause subsequent computer vision algorithms, which mainly operate on foreground objects, to fail. In contrast, the results produced by Ours-multi show small intensity gaps in the overlapped regions and preserve the important information in the foreground regions well. By smoothing the brightness differences between the overlapped regions, the linear blending operation makes both results visually pleasing, but it is unable to recover information already lost in the foreground region, as shown in the close-up views in Figure 20 (a).
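To make the two compositing operators concrete, here is a small self-contained sketch (ours, not the authors' implementation) of constant-alpha blending versus linear (ramp) blending over an overlap region that has already been warped into the panorama coordinates:

```python
import numpy as np

def alpha_blend(left, right, alpha=0.5):
    """Constant-weight blend: exposure gaps between the two inputs stay visible."""
    return alpha * left + (1.0 - alpha) * right

def linear_blend(left, right):
    """Weights ramp from 1 to 0 across the overlap width, feathering the seam."""
    h, w = left.shape[:2]
    ramp = np.linspace(1.0, 0.0, w).reshape(1, w)
    if left.ndim == 3:              # color images: broadcast over channels
        ramp = ramp[..., None]
    return ramp * left + (1.0 - ramp) * right
```

Linear blending hides the seam but, as noted above, cannot restore detail that was clipped at capture time; this is why the exposure has to be balanced before blending.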

Fig. 20: Examples of panoramic images with alpha and linear blending: (a) AE, (b) Ours-multi. The top images of both (a) and (b) were generated by alpha blending (α = 0.5), and the bottom images were generated by linear blending.

4) Stereo matching application: In the previous two experiments, we compared differences among exposure control methods by showing the results in the color image domain. In this experiment, we applied stereo matching algorithms to verify the effectiveness of the Ours-multi method in the gradient domain. Following the panoramic imaging experiment, we used the feature matching framework in [54] and computed the relative brightness ratios r^ij in Eq. (5) using corresponding patches. For this experiment, we applied two representative stereo matching methods: area-correlation stereo (Block, [57]) and semi-global matching (SGM, [58]).

Figure 21 shows the results of stereo matching. The block-matching method computes disparity by comparing the sum of absolute differences (SAD) in a local area without any pre- or post-processing and without regularization. This approach allows us to directly compare the image quality that affects the patch-matching performance. In the images obtained by the AE method, much of the visual information in the foreground regions is missing because of the strong illumination behind the foreground. In particular, it is difficult to recognize where and what the pedestrian in (b) and the bush in (c) are in the block-matching results. In contrast, Ours-multi captures the important image features in the foreground regions; consequently, its disparity map quality is better than that of the AE method.

SGM improves the quality of the disparity maps for both the AE and Ours-multi methods by virtue of its spatial regularization. However, a visible gap still exists between the AE and Ours-multi methods even after optimization. Ours-multi presents clearer and denser disparity maps than does the AE method. This result further highlights the importance of image quality at the image-capturing stage; difficulties during image capture may cause subsequent algorithms to fail.

Fig. 21: Qualitative comparison of estimated disparity maps from two exposure methods, AE and Ours-multi, on three scenes: (a) Scene 1, (b) Scene 2, (c) Scene 3. The images of the first two columns are rectified stereo image pairs, and the third and fourth columns show the estimated disparity maps using Block [57] and SGM [58], respectively.
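For clarity, a brute-force version of such a block matcher (our own sketch, with none of the efficiency measures of [57]) looks as follows; because there is no regularization, the disparity quality directly mirrors the quality of the captured images:

```python
import numpy as np

def sad_block_matching(left, right, max_disp=64, block=7):
    """Naive SAD block matching on a rectified grayscale stereo pair.

    For every pixel, a block in the left image is compared against blocks in
    the right image shifted by 0..max_disp-1, and the disparity with the
    smallest sum of absolute differences is kept. No pre/post-processing.
    """
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    h, w = left.shape
    r = block // 2
    disparity = np.zeros((h, w), dtype=np.float32)
    for y in range(r, h - r):
        for x in range(r + max_disp, w - r):
            ref = left[y - r:y + r + 1, x - r:x + r + 1]
            costs = [np.abs(ref - right[y - r:y + r + 1,
                                        x - d - r:x - d + r + 1]).sum()
                     for d in range(max_disp)]
            disparity[y, x] = np.argmin(costs)
    return disparity
```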
VI. DISCUSSION AND CONCLUSIONS

In this paper, we presented a novel auto-exposure method that is designed specifically for gradient-based vision systems on outdoor mobile platforms. In outdoor environments, scene radiance often has a much wider dynamic range than cameras can capture. Conventional camera exposure calculation methods fail to capture well-exposed images, which degrades the performance of subsequent image-processing algorithms. To solve this problem, our auto-exposure method adjusts the camera exposure to maximize the gradient information of the current scene; therefore, our method is able to determine a more appropriate exposure that is robust to severe illumination changes. Moreover, our proposed exposure balancing method extends this approach to multi-camera systems.

We evaluated our methods through extensive outdoor experiments with off-the-shelf machine-vision cameras using a variety of image processing and computer vision applications. We believe that our method constitutes an alternative or additive solution to conventional auto-exposure methods, especially for outdoor computer vision systems used for purposes such as surveillance monitoring and autonomous mobile platforms. We close this paper with a further discussion of our method and its limitations.

Discussion and Limitation. Compared to the conventional AE methods, we have shown that the proposed method is especially beneficial when capturing scenes in which dark foreground regions and strong back-lighting are involved (i.e., scenes with a wide dynamic range). Note that this does not suggest that our method is a complete alternative to the conventional AE and ME methods, which have been used in an enormous number of situations. When a scene has mild lighting conditions, conventional auto-exposure methods are already a good choice. In this regard, the value of the proposed method is that it provides a strong alternative approach that can be selected and used depending on the user scenario, such as in outdoor computer vision systems.

In several of the figures, some of the images obtained by our method appear brighter than the images obtained by AE. This occurs because AE adjusts camera exposure by directly measuring brightness values, while our method adjusts it only by maximizing the gradient information. Hence, this difference in brightness is not an issue from the perspective of computer vision applications as long as the exposure captures the dynamic range of the foreground well and the brightness consistency between multi-view images is preserved.

When a scene contains different objects with different textures, our method may overlook the parts of objects whose texture gradient is relatively lower than that of others. While this results from a reasonable decision by the algorithm to maximize the holistic gradient information in an image, it can lead to some failure cases. This limitation might be addressed by considering additional semantic information in scenes, which will be interesting future work.
In the panoramic imaging and stereo matching applications of multi-camera systems, Ours-multi requires that the FoVs of the cameras overlap to create matching corresponding patches. In addition, our method depends on the performance of the patch matching algorithms, which typically exploit the consistency of patch brightness. Therefore, we can adjust the balance parameter α to encourage brightness consistency between the images and thereby improve the matching performance. In this paper, we showed that this approach leads to more stable matching than does independent per-camera AE. Another approach is to use an advanced intensity-invariant feature; however, doing so is beyond the scope of this work. Note that the problem of patch matching is not an issue when the matching patches can be obtained through pre-calibration, e.g., in the surround-view imaging case.

Along the same lines, the gradient metric used here may not always be optimal because it is hand-designed. A learning-based metric is another promising future direction.

Computational cost is a crucial issue in camera exposure control. Most of the computation is conducted through pixel-by-pixel operations of independent processes, such as gradient-information metric computation, gamma-mapped image generation, and downsampling, all of which are parallelizable.
However, a memory bottleneck occurs when generating the various gamma-mapped images. As shown in Fig. 4-(b,c), because we can use lower-resolution images to facilitate faster computation that uses less memory without incurring a significant loss in accuracy, the memory bottleneck can be adjusted accordingly. This feature of our method would be useful if it were implemented as an embedded system.
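As a rough illustration of that trade-off, the sketch below (ours; a plain gradient-magnitude sum stands in for the paper's gradient-information metric, and the gamma values are arbitrary) evaluates the gamma-mapped candidates on a downsampled copy of the frame, so the per-candidate memory cost shrinks quadratically with the scale factor:

```python
import numpy as np

def downsample(img, factor=4):
    """Simple box downsampling; memory per gamma candidate drops by factor**2."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor
    return img[:h, :w].reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def gradient_score(img):
    """Stand-in gradient metric: sum of central-difference gradient magnitudes."""
    gy, gx = np.gradient(img.astype(np.float32))
    return np.hypot(gx, gy).sum()

def best_gamma(img, gammas=(0.5, 0.7, 1.0, 1.5, 2.0), factor=4):
    """Score each gamma-mapped candidate on a low-resolution copy of an 8-bit frame."""
    small = downsample(img, factor) / 255.0
    scores = [gradient_score(small ** g) for g in gammas]
    return gammas[int(np.argmax(scores))]
```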

REFERENCES

[1] I. Shim, J. Choi, S. Shin, T.-H. Oh, U. Lee, B. Ahn, D.-G. Choi, D. H. Shim, and I. S. Kweon, "An autonomous driving system for unknown environments using a unified map," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 4, pp. 1999–2013, 2015.
[2] M. Muramatsu, "Photometry device for a camera," U.S. Patent 5,592,256, Jan. 7, 1997.
[3] B. K. Johnson, "Photographic exposure control system and method," U.S. Patent 4,423,936, Jan. 3, 1984.
[4] N. Sampat, S. Venkataraman, T. Yeh, and R. L. Kremens, "System implications of implementing auto-exposure on consumer digital cameras," in Proceedings of the SPIE Electronics Imaging Conference, 1999.
[5] C. Poynton, Digital Video and HD: Algorithms and Interfaces. Elsevier, 2012.
[6] I. Shim, J.-Y. Lee, and I. S. Kweon, "Auto-adjusting camera exposure for outdoor robotics using gradient information," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2014.
[7] R. Szeliski, Computer Vision: Algorithms and Applications. Springer Science & Business Media, 2010.
[8] M. D. Tocci, C. Kiser, N. Tocci, and P. Sen, "A versatile HDR video production system," ACM Transactions on Graphics (TOG), 2011.
[9] M. Schoberl, A. Belz, J. Seiler, S. Foessel, and A. Kaup, "High dynamic range video by spatially non-regular optical filtering," in IEEE International Conference on Image Processing (ICIP), 2012.
[10] S. K. Nayar and T. Mitsunaga, "High dynamic range imaging: Spatially varying pixel exposures," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2000.
[11] C. Zhou and S. K. Nayar, "Computational cameras: Convergence of optics and processing," IEEE Transactions on Image Processing (TIP), vol. 20, no. 12, 2011.
[12] T. Kuno, H. Sugiura, and N. Matoba, "A new automatic exposure system for digital still cameras," IEEE Transactions on Consumer Electronics, vol. 44, no. 1, pp. 192–199, 1998.
[13] H. Lu, H. Zhang, S. Yang, and Z. Zheng, "Camera parameters auto-adjusting technique for robust robot vision," in IEEE International Conference on Robotics and Automation (ICRA), 2010.
[14] N. Nourani-Vatani and J. Roberts, "Automatic camera exposure control," in Australasian Conference on Robotics and Automation, 2007.
[15] M. Montalvo, J. M. Guerrero, J. Romeo, M. Guijarro, M. Jesús, and G. Pajares, "Acquisition of agronomic images with sufficient quality by automatic exposure time control and histogram matching," in Advanced Concepts for Intelligent Vision Systems, 2013.
[16] M. Murakami and N. Honda, "An exposure control system of video cameras based on fuzzy logic using color information," in IEEE Fifth International Conference on Fuzzy Systems, vol. 3, 1996.
[17] J.-S. Lee, Y.-Y. Jung, B.-S. Kim, and S.-J. Ko, "An advanced video camera system with robust AF, AE, and AWB control," IEEE Transactions on Consumer Electronics, vol. 47, no. 3, 2001.
[18] W.-C. Kao, "Real-time image fusion and adaptive exposure control for smart surveillance systems," Electronics Letters, vol. 43, no. 18, 2007.
[19] A. J. Neves, B. Cunha, A. J. Pinho, and I. Pinheiro, "Autonomous configuration of parameters in robotic digital cameras," in Pattern Recognition and Image Analysis, 2009.
[20] Z. Zhang, C. Forster, and D. Scaramuzza, "Active exposure control for robust visual odometry in HDR environments," in IEEE International Conference on Robotics and Automation (ICRA), 2017.
[21] P. E. Debevec and J. Malik, "Recovering high dynamic range radiance maps from photographs," in ACM Computer Graphics, Proceedings, Annual Conference Series, 1997.
[22] C. Lee and E. Lam, "Computationally efficient truncated nuclear norm minimization for high dynamic range imaging," IEEE Transactions on Image Processing (TIP), vol. 25, no. 9, 2015.
[23] S. Nuske, J. Roberts, and G. Wyeth, "Extending the dynamic range of robotic vision," in IEEE International Conference on Robotics and Automation (ICRA), 2006.
[24] N. Barakat, A. N. Hone, and T. E. Darcie, "Minimal-bracketing sets for high-dynamic-range image capture," IEEE Transactions on Image Processing (TIP), vol. 17, no. 10, 2008.
[25] P. Sen, N. K. Kalantari, M. Yaesoubi, S. Darabi, D. B. Goldman, and E. Shechtman, "Robust patch-based HDR reconstruction of dynamic scenes," ACM Transactions on Graphics (SIGGRAPH Asia), 2012.
[26] J. Hu, O. Gallo, K. Pulli, and X. Sun, "HDR deghosting: How to deal with saturation?" in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[27] T.-H. Oh, J.-Y. Lee, Y.-W. Tai, and I. S. Kweon, "Robust high dynamic range imaging by rank minimization," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2015.
[28] T.-H. Oh, Y.-W. Tai, J.-C. Bazin, H. Kim, and I. S. Kweon, "Partial sum minimization of singular values in robust PCA: Algorithm and applications," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 38, no. 4, pp. 744–758, 2016.
[29] S. Hrabar, P. Corke, and M. Bosse, "High dynamic range stereo vision for outdoor mobile robotics," in IEEE International Conference on Robotics and Automation (ICRA), 2009.
[30] Y. S. Heo, K. M. Lee, and S. U. Lee, "Robust stereo matching using adaptive normalized cross-correlation," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 33, no. 4, 2011.
[31] S. J. Kim and M. Pollefeys, "Robust radiometric calibration and vignetting correction," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 30, no. 4, 2008.
[32] S. J. Kim, J.-M. Frahm, and M. Pollefeys, "Joint feature tracking and radiometric calibration from auto-exposure video," in IEEE International Conference on Computer Vision (ICCV), 2007.
[33] M. Pollefeys, D. Nistér, J.-M. Frahm, A. Akbarzadeh, P. Mordohai, B. Clipp, C. Engels, D. Gallup, S.-J. Kim, P. Merrell, C. Salmi, S. N. Sinha, B. Talton, L. Wang, Q. Yang, H. Stewénius, R. Yang, G. Welch, and H. Towles, "Detailed real-time urban 3D reconstruction from video," International Journal of Computer Vision (IJCV), vol. 78, no. 2-3, 2008.
[34] D. G. Lowe, "Object recognition from local scale-invariant features," in IEEE International Conference on Computer Vision (ICCV), 1999.
[35] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2005.
[36] C. Vondrick, A. Khosla, T. Malisiewicz, and A. Torralba, "HOGgles: Visualizing object detection features," in IEEE International Conference on Computer Vision (ICCV), 2013.
[37] D. Krishnan and R. Fergus, "Fast image deconvolution using hyper-Laplacian priors," in Advances in Neural Information Processing Systems (NIPS), 2009.
[38] A. Hyvärinen, J. Hurri, and P. O. Hoyer, Natural Image Statistics: A Probabilistic Approach to Early Computational Vision. Springer.
[39] I. Sobel and G. Feldman, "A 3x3 isotropic gradient operator for image processing," a talk at the Stanford Artificial Intelligence Project, 1968.
[40] S. J. Kim, H. T. Lin, Z. Lu, S. Süsstrunk, S. Lin, and M. S. Brown, "A new in-camera imaging model for color computer vision and its application," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2012.
[41] D. Valério and J. S. da Costa, "Tuning of fractional PID controllers with Ziegler–Nichols-type rules," Signal Processing, vol. 86, no. 10, 2006.
[42] J. G. Ziegler and N. B. Nichols, "Optimum settings for automatic controllers," Trans. ASME, vol. 64, no. 11, 1942.
[43] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, "Object detection with discriminatively trained part-based models," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 32, no. 9, 2010.
[44] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," in European Conference on Computer Vision (ECCV), 2016.
[45] A. Geiger, J. Ziegler, and C. Stiller, "StereoScan: Dense 3D reconstruction in real-time," in Intelligent Vehicles Symposium, 2011.
[46] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results."
[47] ——, "The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results."
[48] P. Dollar, C. Wojek, B. Schiele, and P. Perona, "Pedestrian detection: An evaluation of the state of the art," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 34, no. 4, 2012.
[49] A. Kelly, "Linearized error propagation in odometry," The International Journal of Robotics Research, vol. 23, no. 2, 2004.
[50] Y. Bok, Y. Jeong, D.-G. Choi, and I. S. Kweon, "Capturing village-level heritages with a hand-held camera-laser fusion sensor," International Journal of Computer Vision, vol. 94, no. 1, 2011.
[51] Z. Zhang, "Flexible camera calibration by viewing a plane from unknown orientations," in IEEE International Conference on Computer Vision (ICCV), vol. 1, 1999.
[52] J.-Y. Lee, K. Sunkavalli, Z. Lin, X. Shen, and I. S. Kweon, "Automatic content-aware color and tone stylization," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[53] R. Szeliski and H.-Y. Shum, "Creating full view panoramic image mosaics and environment maps," in Proceedings of SIGGRAPH 97, Annual Conference Series, 1997.
[54] M. Muja and D. G. Lowe, "Fast matching of binary features," in IEEE International Conference on Computer and Robot Vision (CRV), 2012.
[55] C. Harris and M. Stephens, "A combined corner and edge detector," in Alvey Vision Conference, vol. 15, 1988.
[56] M. Calonder, V. Lepetit, C. Strecha, and P. Fua, "BRIEF: Binary robust independent elementary features," in European Conference on Computer Vision (ECCV), 2010.
[57] K. Konolige, "Small vision systems: Hardware and implementation," in Robotics Research. Springer, 1998.
[58] H. Hirschmüller, "Accurate and efficient stereo processing by semi-global matching and mutual information," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, 2005.
Inwook Shim received the B.S. degree in Computer Science from Hanyang University in 2009, the M.S. degree in the Robotics Program from KAIST in 2011, and the Ph.D. degree in the Division of Future Vehicle from KAIST in 2017. He is currently a research scientist at the Agency for Defense Development. He was a member of "Team KAIST," which won first place at the DARPA Robotics Challenge Finals in 2015. He was a recipient of the Qualcomm Innovation Award, was a finalist in the NI Student Design Showcase, and received the KAIST achievement award of Robotics and the Creativity and Challenge award from KAIST. His research interests include robot vision for autonomous systems and deep learning.

Tae-Hyun Oh is a postdoctoral associate at MIT/CSAIL. He received the B.E. degree (highest ranking) in Computer Engineering from Kwangwoon University, South Korea, in 2010, and the M.S. and Ph.D. degrees in Electrical Engineering from KAIST, South Korea, in 2012 and 2017, respectively. He was a visiting student in the Visual Computing group, Microsoft Research, Beijing, and in the Cognitive group, Microsoft Research, Redmond, in 2014 and 2016, respectively. He was a recipient of a Microsoft Research Asia fellowship, a prize in the Samsung HumanTech thesis award, a Qualcomm Innovation award, and a top research achievement award from KAIST.

Joon-Young Lee received the B.S. degree in Electrical and Electronic Engineering from Yonsei University, Korea, in 2008, and the M.S. and Ph.D. degrees in Electrical Engineering from KAIST, Korea, in 2009 and 2015, respectively. He is currently a research scientist at Adobe Research. His research interests include deep learning, photometric image analysis, image enhancement, and computational photography. He is a member of the IEEE.

Jinwook Choi received the B.S. and Ph.D. degrees from the School of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea, in 2008 and 2014, respectively. He is currently a Senior Research Engineer with the ADAS Development Team of the Automotive Research and Development Division, Hyundai Motor Company, Korea. His research interests include 2D/3D image and video processing, computer vision, hybrid sensor systems, and intelligent vehicle systems.

Dong-Geol Choi received the B.S. and M.S. degrees in Electrical Engineering and Computer Science from Hanyang University in 2005 and 2007, respectively, and the Ph.D. degree in the Robotics Program from KAIST in 2016. He is currently a post-doctoral researcher at the Information & Electronics Research Institute, KAIST. His research interests include sensor fusion, autonomous robotics, and artificial intelligence. Dr. Choi received a fellowship award from the Qualcomm Korea R&D Center in 2013. He was a member of "Team KAIST," which won first place in the DARPA Robotics Challenge Finals 2015.
In So Kweon received the B.S. and M.S. degrees in Mechanical Design and Production Engineering from Seoul National University, South Korea, in 1981 and 1983, respectively, and the Ph.D. degree in Robotics from the Robotics Institute, Carnegie Mellon University, USA, in 1990. He worked for the Toshiba R&D Center, Japan, and has been a KEPCO chair professor with the Department of Electrical Engineering, KAIST, since 1992. He served as a program co-chair for ACCV 2007 and a general chair for ACCV 2012. He is on the honorary board of IJCV. He was a member of "Team KAIST," which won first place in the DARPA Robotics Challenge Finals 2015. He is a member of the IEEE and the KROS.