Jitter-Camera: High Resolution Video from a Low Resolution Detector
Moshe Ben-Ezra, Assaf Zomet, and Shree K. Nayar
Computer Science Department, Columbia University, New York, NY 10027
E-mail: {moshe, zomet, nayar}@cs.columbia.edu

Note: Images in this paper are best viewed magnified or printed on a high-resolution color printer.

Abstract

Video cameras must produce images at a reasonable frame-rate and with a reasonable depth of field. These requirements impose fundamental physical limits on the spatial resolution of the image detector. As a result, current cameras produce videos with a very low resolution. The resolution of videos can be computationally enhanced by moving the camera and applying super-resolution reconstruction algorithms. However, a moving camera introduces motion blur, which limits super-resolution quality. We analyze this effect and derive a theoretical result showing that motion blur has a substantial degrading effect on the performance of super-resolution. The conclusion is that, in order to achieve the highest resolution, motion blur should be avoided. Motion blur can be minimized by sampling the space-time volume of the video in a specific manner. We have developed a novel camera, called the "jitter camera," that achieves this sampling. By applying an adaptive super-resolution algorithm to the video produced by the jitter camera, we show that resolution can be notably enhanced for stationary or slowly moving objects, while it is improved slightly or left unchanged for objects with fast and complex motions. The end result is a video that has a significantly higher resolution than the captured one.

Keywords: Sensors; Jitter Camera; Jitter Video; Super-Resolution; Motion Blur

Figure 1: Conventional video cameras sample the continuous space-time volume at regular time intervals and fixed spatial grid locations, as shown in (a). The space-time volume can be sampled differently, for example by varying the location of the sampling grid, as shown in (b), to increase the resolution of the video. A moving video camera only approximates (b) due to motion blur. [The figure shows two space-time volumes, with axes Time and Space, for cases (a) and (b).]

1. Why is High-Resolution Video Hard?

Improving the spatial resolution of a video camera is different from doing so with a still camera. Merely increasing the number of pixels of the detector reduces the amount of light received by each pixel, and hence increases the noise. With still images, this can be overcome by prolonging the exposure time. In the case of video, however, the exposure time is limited by the desired frame-rate. The amount of light incident on the detector can also be increased by widening the aperture, but at the cost of a significant reduction in the depth of field. The spatial resolution of a video detector is therefore limited by the noise level of the detector, the frame-rate (temporal resolution), and the required depth of field.¹ Our purpose is to make judicious use of a given detector, so as to allow a substantial increase of the video resolution by a resolution-enhancement algorithm.

¹ The optical transfer function of the lens also imposes a limit on resolution. In this paper we ignore this limit, as it is several orders of magnitude above the current resolution of video.

Figure 1 shows a continuous space-time video volume. A slice of this volume at a given time instance corresponds to the image appearing on the image plane of the camera at that time. This volume is sampled both spatially and temporally, where each pixel integrates light over time and space. Conventional video cameras sample the volume in a simple way, as shown in Figure 1(a): a regular 2D grid of pixels integrates over regular temporal intervals and at fixed spatial locations. An alternative sampling of the space-time volume is shown in Figure 1(b): the 2D grid of pixels integrates over the same temporal intervals, but at different spatial locations.
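To make the two sampling patterns concrete, here is a minimal sketch (our illustration, not code from the paper) that point-samples a synthetic one-dimensional space-time volume on a fixed grid and on a jittered grid. The alternating one-pixel schedule and the function names are our own assumptions, and pixel integration over space and time is ignored for simplicity.

```python
import numpy as np

def sample_volume(volume, decimation=2, jitter=False):
    """Point-sample a (time x space) volume at low resolution.

    Fixed sampling (Figure 1(a)) reads the same spatial positions in
    every frame; jittered sampling (Figure 1(b)) shifts the grid by
    one high-resolution pixel on alternating frames.
    """
    frames = []
    for t, slice_t in enumerate(volume):
        offset = t % 2 if jitter else 0   # jitter schedule: 0, 1, 0, 1, ...
        frames.append(slice_t[offset::decimation])
    return np.array(frames)

# A static high-frequency pattern observed over 4 frames.
space = np.arange(16)
volume = np.tile(np.sin(2 * np.pi * space / 4.0), (4, 1))

fixed = sample_volume(volume)                  # always misses the odd positions
jittered = sample_volume(volume, jitter=True)  # consecutive frames interleave

# For this static scene, two consecutive jittered frames together cover
# every high-resolution location, so the full pattern can be reassembled.
full = np.empty(16)
full[0::2], full[1::2] = jittered[0], jittered[1]
print(np.allclose(full, volume[0]))  # True
```

A real camera, of course, integrates light over each pixel and each exposure rather than point-sampling; the role of that integration, namely motion blur, is made precise in Section 2.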
Given a 2D image detector, how should we sample the space-time volume to obtain the highest spatial resolution?²

² Increasing the temporal resolution [18] is not addressed in this paper.

There is a large body of work on resolution enhancement by varying spatial sampling, commonly known as super-resolution reconstruction [4, 5, 7, 9, 13, 17]. Super-resolution algorithms typically assume that a set of displaced images is given as input. With a video camera, this can be achieved by moving the camera while capturing the video. However, the camera's motion introduces motion blur. This is a key point in this paper: in order to use super-resolution with a conventional video camera, the camera must move; but when the camera moves, it introduces motion blur, which reduces resolution. It is well known that accurate estimation of the motion blur parameters is non-trivial and requires strong assumptions about the camera motion during integration [2, 13, 15, 19]. In this paper, we show that even when an accurate estimate of the motion blur parameters is available, motion blur has a significant influence on the super-resolution result. We derive a theoretical lower bound indicating that the expected performance of any super-resolution reconstruction algorithm deteriorates as a function of the motion blur magnitude. The conclusion is that, in order to achieve the highest resolution, motion blur should be avoided.

To achieve this, we propose the "jitter camera," a novel video camera that samples the space-time volume at different locations without introducing motion blur. This is done by instantaneously shifting the detector (e.g., the CCD) between temporal integration periods, rather than continuously moving the entire video camera during the integration periods. We have built a jitter camera and developed an adaptive super-resolution algorithm to handle complex scenes containing multiple moving objects. By applying the algorithm to the video produced by the jitter camera, we show that resolution can be enhanced significantly for stationary or slowly moving objects, while it is improved slightly or left unchanged for objects with fast and complex motions. The end result is a video that has higher resolution than the captured one.

2. How Bad is Motion Blur for Super-resolution?

The influence of motion blur on super-resolution is well understood when all input images undergo the same motion blur [1, 10]. It becomes more complex when the input images undergo different motion blurs, and details that appear blurred in one image appear sharp in another. We address the influence of motion blur for any combination of blur orientations.
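For concreteness, the following sketch (ours, under the constant-velocity assumption used in the analysis below; the helper motion_blur_psf is a hypothetical name) rasterizes the point-spread function of a linear motion blur of a given length and orientation, the two quantities varied in this section.

```python
import numpy as np
from scipy.ndimage import convolve  # assumes SciPy is available

def motion_blur_psf(length_px, angle_deg, size=15):
    """PSF of constant-velocity linear motion: a normalized line segment.

    The trajectory is sampled densely and accumulated into the nearest
    pixel -- a crude rasterization, but sufficient for illustration.
    """
    psf = np.zeros((size, size))
    center = (size - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    for t in np.linspace(-0.5, 0.5, 201):
        x = center + t * length_px * np.cos(theta)
        y = center + t * length_px * np.sin(theta)
        psf[int(round(y)), int(round(x))] += 1.0
    return psf / psf.sum()

# Two input images carrying equal-length blurs of different orientations:
image = np.random.rand(64, 64)
blurred_horiz = convolve(image, motion_blur_psf(4.0, 0.0))
blurred_vert = convolve(image, motion_blur_psf(4.0, 90.0))
```

Details attenuated by the horizontal kernel survive in the vertically blurred image, and vice versa, which is why the analysis must cover arbitrary combinations of orientations.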
[Figure 2 plot: the vertical axis shows s(A), a monotone function of the volume of solutions (0 to 12); the horizontal axis shows the length of the motion blur trajectory in pixels (0 to 5); the thick blue curve marks the lower bound.]

Figure 2: We measure the super-resolution "hardness" by the volume of plausible high-resolution solutions [1]. The volume of solutions is proportional to $s(A)^{n^2}$, where $n^2$ is the high-resolution image size. The graphs show the value of $s(A)$ as a function of the lengths of the motion-blur trajectories $\{\|l_j\|\}_{j=0}^{3}$. We show a large number of graphs computed for different configurations of blur orientations. The thick (blue) graph is the lower bound of $s(A)$ for any combination of motion blur orientations. In all shown configurations, the motion blur has a significant influence on $s(A)$, and hence on the volume of solutions. The increase in the volume of solutions can explain the increase in the super-resolution reconstruction error shown in Figure 3.

Super-resolution algorithms estimate the high-resolution image by modeling and inverting the imaging process. Analyzing the influence of motion blur requires a definition of super-resolution "hardness," or the "invertibility" of the imaging process. We use a linear model for the imaging process [1, 7, 9, 13], in which the intensity of a pixel in an input image is expressed as a linear combination of the intensities in the unknown high-resolution image:

$$\vec{y} = A\vec{x} + \vec{z}, \qquad (1)$$

where $\vec{x}$ is a vectorization of the unknown discrete high-resolution image, $\vec{y}$ is a vectorization of all the input images, and the imaging matrix $A$ encapsulates the camera displacements, blur, and decimation [7]. The random variable $\vec{z}$ represents the uncertainty in the measurements due to noise, quantization error, and model inaccuracies.

Baker and Kanade [1] addressed the invertibility of the imaging process in a noise-free scenario, where $\vec{z}$ represents the quantization error. In this case, each quantized input pixel defines two inequality constraints on the super-resolution solution. The combination of constraints forms a volume of solutions that satisfy all quantization constraints. Baker and Kanade suggest using the volume of solutions as a measure of uncertainty in the super-resolution solution. Their paper [1] shows the benefits of measuring the volume of solutions over the standard matrix-conditioning analysis.

We measure the influence of motion blur by the volume of solutions. To keep the analysis simple, the following assumptions are made. First, the motion blur in each input image is induced by a constant-velocity motion; different input images may have different motion blur orientations. Second, the optical blur is shift-invariant. Third, the input images are related geometrically by a 2D translation. Fourth, the number of input pixels equals the number of output pixels. Under the last assumption, the dimensionality $n^2$ of $\vec{x}$ equals the dimensionality of $\vec{y}$. Since the uncertainty due to quantization is an $n^2$-dimensional unit cube, the volume of solutions for a given imaging matrix $A$ can be computed from the absolute value of its determinant:

$$\mathrm{vol}(A) = \frac{1}{|A|}. \qquad (2)$$

In Appendix A, we derive a simplified expression for $|A|$ as a function of the imaging parameters.
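To connect Equations (1) and (2) to Figure 2 numerically, here is a one-dimensional toy version (our sketch, not the paper's derivation): it assembles the square imaging matrix $A$ for four shifted, blurred, and decimated inputs under the four assumptions above, and evaluates the per-pixel volume factor $|A|^{-1/n}$, the 1D analogue of $s(A)$, as the blur length grows. Circular boundary handling and the specific offsets are our simplifications.

```python
import numpy as np

def imaging_matrix(n, offsets, blur_len, decimation):
    """Square imaging matrix A of Eq. (1) for a 1D toy problem.

    Each input image is the n-pixel high-resolution signal shifted by a
    detector offset, blurred along a constant-velocity trajectory of
    length blur_len, and decimated. Circular boundaries keep A square.
    """
    rows = []
    for d in offsets:
        for i in range(n // decimation):
            row = np.zeros(n)
            # Drag a unit pixel footprint along the blur trajectory:
            # a box of total width 1 + blur_len, whose fractional end
            # segments are weighted by their overlap with each pixel.
            t = decimation * i + d
            width = remaining = 1.0 + blur_len
            while remaining > 1e-9:
                step = min(1.0 - (t % 1.0), remaining)
                row[int(np.floor(t)) % n] += step / width
                t += step
                remaining -= step
            rows.append(row)
    return np.array(rows)

n, decimation = 32, 4
offsets = [0.0, 1.0, 2.0, 3.0]  # quarter-pixel jitter of the detector
# Fractional blur lengths: some integer widths make the shared box blur
# exactly singular on this circular grid, mirroring the worst cases of
# identically blurred inputs.
for blur in [0.0, 0.5, 1.5, 2.5, 3.5, 4.5]:
    A = imaging_matrix(n, offsets, blur, decimation)
    sign, logabsdet = np.linalg.slogdet(A)
    s = np.inf if sign == 0 else np.exp(-logabsdet / n)
    print(f"blur {blur:3.1f} px -> s(A) = {s:9.3f}")
```

With zero blur the offsets interleave perfectly and $s(A) = 1$; as the blur length grows, the rows of $A$ become increasingly similar, $|A|$ shrinks, and the volume of solutions explodes, which is the qualitative behavior of the curves in Figure 2.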