Estimating a Small Signal in the Presence of Large Noise

Amy Zhao    Frédo Durand    John Guttag
[email protected]  [email protected]  [email protected]

MIT Computer Science and Artificial Intelligence Laboratory

Abstract

Video magnification techniques are useful for visualizing small changes in videos. For instance, Eulerian video magnification has been used to visualize the flow of blood in the human face. Such visualizations have possible applications in remote monitoring or screening for diseases. However, when visualizing blood flow, the signal of interest may be similar in amplitude to the noise in the video. This raises the question of what one is actually seeing in a magnified video: signal or noise? We seek to understand these signal and noise characteristics with the goal of producing informative and accurate visualizations. We present a preliminary algorithm for estimating the signal amplitude in the presence of relatively high noise. We demonstrate that the algorithm can be used to accurately estimate the signal amplitude in an uncompressed simulated video, but is susceptible to compression noise and motion.

1. Introduction

Video magnification can be used to visualize minute changes in videos that would normally not be visible. Studies have indicated that there is a signal associated with blood flow in the skin; this signal is undetectable by the naked eye, but can be measured using video processing techniques and visualized using video magnification. Signals relating to blood flow (e.g., heart rate) have been obtained from videos of human faces by applying temporal and spatial filtering [25] or blind source separation [18] to the intensity signal, or by tracking the motion of the head [4]. Techniques for examining these signals have potential applications in remote monitoring or screening for diseases relating to blood flow.

Wu et al. showed that Eulerian linear video magnification can be used to visualize the blood flow signal in human faces [28]. In the context of visualizing blood flow, the signal of interest (that is, the intensity changes reflecting the flow of blood) can have an amplitude that is similar to or lower than the noise level in the video. It is important to understand the signal and noise characteristics in order to produce accurate and informative visualizations. In this work, we discuss how the signal is affected by the noise that is introduced by the acquisition and the compression processes. We demonstrate a preliminary algorithm for estimating the amplitude of a small signal in the presence of relatively large noise.

2. Related work

Noise. Modern images and image sequences are typically affected by noise introduced by the acquisition process, the compression process, or other factors such as motion [15]. CCD noise models usually assume that the acquisition noise level is dependent on the uncorrupted pixel intensity, but is independent between pixels. The relationship between the noise level and pixel intensity is described by a noise level function (NLF), which is dependent on the natural behavior of photons, the properties of the camera and some recording parameters [14, 12].

Compression noise can introduce additional artifacts to an image or video. Many modern image and video compression standards apply a transform (commonly the discrete cosine transform, or DCT) in a block-wise fashion to the image and then quantize the transform coefficients [26, 27]. Images and videos compressed in this fashion typically exhibit "blocking" artifacts.

Noise and signal estimation. There is a large body of literature regarding noise estimation and denoising techniques for images and videos. The NLF of a CCD camera is easy to estimate from a video or multiple images of a static scene [8, 2, 12]. In the image processing literature, the signal of interest is typically defined as the uncorrupted image; denoising aims to recover this image from a noisy image. A large class of image denoising techniques rely on the concept of averaging to reduce noise levels, e.g., through Gaussian smoothing or anisotropic filtering [17]. There are also deblocking techniques that remove visible

block-based compression artifacts by detecting and smoothing block boundaries [23, 22, 3].

Several key differences make it difficult to directly apply any of these noise estimation and denoising techniques to video magnification. Firstly, most existing denoising algorithms focus on applications with relatively high input SNRs, and may not be accurate enough to estimate a low amplitude signal. Typically, video denoising algorithms are evaluated on image sequences with peak signal-to-noise ratio (PSNR) values in the 10-30 dB range, and obtain improvements in PSNR on the order of 0-10 dB [6, 19]. In contrast, in order to visualize blood flow in the human face using Eulerian video magnification, a magnification factor of 100 has been reported [28]; this suggests that the amplitude of the color variation in the input video is extremely low relative to the image intensity, and that the SNR is frequently much lower than 10 dB.

Secondly, video magnification is affected by noise that may be hard to model and/or correlated with the signal. For instance, when visualizing blood flow, the color change signal is commonly corrupted by a motion signal with a similar temporal signature. Many state-of-the-art video denoising techniques are capable of removing the types of noise commonly introduced by imaging devices (e.g., additive, multiplicative or structural noise [5, 13]), but may not be applicable to more complex noise models.

3. Signal Estimation for Video Magnification

We discuss the signal and noise in the context of visualizing blood flow using the Eulerian linear video magnification technique described in [28]. Specifically, we focus on the problem of estimating the amplitude of a color change signal in a video that has been corrupted by acquisition noise of a similar amplitude, as well as compression noise introduced by the Motion JPEG (MJPEG) compression process. The MJPEG scheme is less commonly used than other video compression schemes such as H.264/AVC or MPEG-4. However, we choose to study it because it compresses each frame independently according to the JPEG compression scheme, and thus does not introduce any inter-frame correlations.

In [25], the green channel was observed to have the strongest signal pertaining to blood flow. We focus on the Y or luminance channel, as it contains much of the green channel information, and is not subjected to the subsampling process that is applied to the chroma channels during MJPEG encoding [26].

3.1. Assumptions

We assume that the input video depicts a stationary scene containing a small color change signal. We assume that the signal: (1) is smooth and varies slowly in space, so that we can apply a small spatial averaging filter without significantly altering the signal characteristics; and (2) has a small enough amplitude as to not affect the intensity-dependent noise level. Earlier, we discussed the difficulty of modeling temporally correlated noise terms such as motion. We assume for now that the subject of the video has been mechanically stabilized such that the effect of motion is negligible.

In the image and video denoising literatures, the acquisition noise is commonly assumed to be additive zero-mean with a variance that is proportional to the luminance of the scene [24]. We assume in our model that each video frame is affected by such acquisition noise, as well as by JPEG compression.

We further assume that the video scene is smooth, so that pixels within small neighborhoods exhibit similar noise levels. We can enforce this assumption in real videos by excluding parts of the scene that contain high gradients.

3.2. Signal and Noise Model

We model an uncompressed video frame as follows:

    I = I_0 + φ + n_acq,    (1)

where I_0 denotes the mean observed image intensity, φ the signal of interest and n_acq the zero-mean Gaussian noise that is introduced during the acquisition process. These terms are assumed to be independent. Similarly, we model a compressed video frame as:

    I_c = Q{I_0 + φ + n_acq},    (2)

where Q{·} represents the quantization operator that is applied during the JPEG compression process [26].

In JPEG encoding, the image is first divided into non-overlapping 8×8 blocks. The discrete cosine transform (DCT) of each block is computed according to the formula:

    H(k,l) = (1/4) C(k) C(l) Σ_{x=0}^{7} Σ_{y=0}^{7} I(x,y) cos[(2x+1)kπ/16] cos[(2y+1)lπ/16],

    where C(k), C(l) = 1/√2 for k, l = 0,
          C(k), C(l) = 1 otherwise.

H(k,l) describes the DCT coefficient value at location k,l. Each DCT coefficient is then quantized using an application-defined quantization table Q as follows:

    H_Q(k,l) = Q(k,l) · round(H(k,l) / Q(k,l)).

The quantization operation is lossy and may be modeled differently depending on the image content. DCT coefficients that are smaller than the corresponding quantization step are quantized to zero and result in a reduction in image energy, while larger DCT coefficients contribute noise that may be modeled as additive random noise [9, 21].

We assumed earlier that the video scene and the signal are smooth in space. Under the additional assumption that the noise level is small relative to the quantization step size, each frame contains little energy in the DCT components corresponding to high spatial frequencies. In the JPEG standard, the luminance quantization table applies more aggressive quantization to high-frequency DCT components [1]. This means that high-frequency DCT components are typically quantized to 0, while the low frequency components contribute quantization noise. We observed this effect in simulated videos compressed using moderate compression levels. We model this as follows:

    I_c = Q_L{I_0 + φ + n_acq} + n_Q,

where Q_L{·} represents the spatial lowpass filtering effect of applying quantization to small DCT coefficients, and n_Q represents the quantization noise that is added. Under the assumption that the mean image I_0 and the signal φ contain little energy in the high-frequency DCT components, we may rewrite this as:

    I_c = I_0 + φ + n_Q + Q_L{n_acq}.    (3)

3.3. Signal Estimation Algorithm

Local spatial averaging is used in many image denoising algorithms to reduce noise levels [7]. We present an algorithm that examines the effect of spatial averaging on the acquisition noise variance in a video, and extrapolates this relationship to estimate the theoretical variance of the video when the acquisition noise variance has been reduced to zero; the variance of the video should then be comprised of the signal variance.

Let us first consider the case of an uncompressed video. Under the assumption that the signal varies slowly in space, we may apply a spatial averaging filter to reduce the standard deviation of Gaussian noise without significantly affecting the signal. An averaging filter (or box filter) M of size m × m attenuates the variance of zero-mean Gaussian noise by a factor of 1/m² [16]. From Eq. 1, we may write the variance over time of the filtered frame I_m as:

    Var_t[I_m] = Var_t[M ∗ (I_0 + φ + n_acq)], or

    σ²_{I_m} = σ²_φ + σ²_{n_acq} / m².    (4)

This relationship is valid under the assumption that each of the terms I_0, φ, n_acq are independent, and that n_acq is spatially uncorrelated.

In our algorithm, we choose several box filter sizes m_i and measure the variance σ²_{I_m} for each m_i. We then extrapolate a linear fit of σ²_{I_m} against 1/m² to the point at which σ²_{n_acq}/m² = 0 and σ²_{I_m} = σ²_φ; the intercept of the linear fit is an estimate of the signal variance.

The compression process introduces complications that require some adjustments to the above algorithm. As we discussed earlier, Q_L{·} sets the high-frequency DCT coefficients of n_acq to 0. Applying small spatial filters has little effect on the variance of this compressed noise term, since the high spatial frequency energy that would have been removed by the averaging filter is already zero; we confirmed this effect in simulations. Using a larger filter size affects the energy in the remaining spatial frequencies. In simulations, we found that for m ≥ 8,

    Var[M ∗ (φ + n_Q + Q_L{n_acq})] ≈ Var[M ∗ (φ + n_Q + n_acq)].

The resulting variance over time of the filtered image, σ²_{I_c,m}, is approximated by:

    σ²_{I_c,m} = σ²_φ + (σ²_{n_Q} + σ²_{n_acq}) / m².    (5)

This model assumes that the effect of Q_L{·} can be ignored for m ≥ 8, that n_acq and n_Q are spatially uncorrelated, and that φ, n_acq and n_Q are independent. We discuss these limitations later.

Similarly to the uncompressed case, we can apply several filters of varying sizes, and estimate a linear fit of σ²_{I_c,m} against 1/m². The intercept of the fit can be used as an estimate for σ²_φ.

4. Experimental Results

We demonstrate that we can use our algorithm to estimate the variance of a simple sinusoidal signal in the presence of acquisition noise. We first simulated two videos of a stationary color calibration target. One video contains no signal (Fig. 1b); the other has a small sinusoidal signal added to each pixel (Fig. 1c). We added zero-mean Gaussian acquisition noise with a linear NLF to both videos (Fig. 1a). The videos were then compressed using various levels of compression. We are able to estimate the signal variance fairly accurately from the uncompressed videos, as well as from the videos compressed with a high quality factor (QF; a higher value indicates less aggressive compression). Our predicted signal suffers from a large degree of error when a lower compression quality setting is used. Fig. 1c indicates that even for a high compression quality video, our estimate becomes inaccurate for pixels with high noise levels.
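The extrapolation procedure of Section 3.3 can be sketched in Python for the uncompressed case of Eq. 4. The simulation below is a simplified stand-in for this setup, not our actual implementation: a flat scene with a spatially uniform sinusoidal signal and i.i.d. zero-mean Gaussian acquisition noise (a constant noise level rather than a full NLF); the `box_filter` helper and all parameter values are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic video: T frames of a flat H x W scene, a spatially uniform
# sinusoidal signal phi(t), and i.i.d. zero-mean Gaussian acquisition noise.
T, H, W = 300, 64, 64
sigma_phi = 0.01                 # RMS amplitude of the signal
sigma_acq = 0.05                 # acquisition noise std (noise >> signal)
t = np.arange(T)
phi = np.sqrt(2) * sigma_phi * np.sin(2 * np.pi * t / 30)  # RMS = sigma_phi
video = 0.5 + phi[:, None, None] + rng.normal(0.0, sigma_acq, (T, H, W))

def box_filter(frames, m):
    """m x m spatial box filter (valid region only) via 2-D cumulative sums."""
    c = np.cumsum(np.cumsum(frames, axis=1), axis=2)
    c = np.pad(c, ((0, 0), (1, 0), (1, 0)))
    s = c[:, m:, m:] - c[:, :-m, m:] - c[:, m:, :-m] + c[:, :-m, :-m]
    return s / m**2

# Temporal variance of the filtered video, averaged over pixels; by Eq. 4
# this should be close to sigma_phi^2 + sigma_acq^2 / m^2 for each size m.
ms = np.array([1, 2, 3, 4, 5])
var_m = [box_filter(video, m).var(axis=0).mean() for m in ms]

# Linear fit of variance against 1/m^2: the intercept estimates sigma_phi^2
# and the slope estimates sigma_acq^2.
slope, intercept = np.polyfit(1.0 / ms**2, var_m, 1)
print(f"true signal variance     : {sigma_phi**2:.2e}")
print(f"estimated signal variance: {intercept:.2e}")
```

Even though the per-pixel SNR here is far below 0 dB, the intercept recovers σ²_φ = 1 × 10⁻⁴ closely. The compressed case follows the same loop using Eq. 5, subject to the m ≥ 8 caveat discussed above.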

Figure 1: Estimated signal variance from a simulated color calibration target. To assess the effects of compression, we saved the videos using MJPEG encoding and specified the video quality factor (QF). (a) Simulated video frame (top) and simulated NLF (bottom). (b) Mean (over space) estimated signal variance per mean (over time) luminance, from simulated videos containing no signal. (c) Mean (over space) estimated signal variance per mean (over time) luminance, from simulated videos containing a spatially invariant, temporally varying sine wave with RMS = 0.01.

We applied our signal estimation algorithm to several real videos captured with a Panasonic Lumix DMC-G2 in

MJPEG format. The frames were converted to linear RGB space and then to YCbCr space. Fig. 2 shows our results for a stationary color calibration target, which should contain no signal. Our signal estimates in Fig. 2b contain a similar error to the compression noise observed in our simulation results (although the error in our simulations appears more discontinuous, likely due to the more discretized pixel intensity values in the simulation). We use the maximum of this error to conservatively estimate the noise floor of our signal estimates in the subsequent videos. We took videos of a human hand and human feet under the same lighting conditions as in Fig. 2. We used a noise floor estimate of 1 × 10⁻⁵. Fig. 3 shows that the regions of the hand supplied most directly by arterial flow [11] exhibit a signal with an amplitude that is above the noise floor. The feet (Fig. 4) appear to exhibit a strong signal everywhere; however, when processing this video using Eulerian linear video magnification, it was difficult to visualize a signal at the subject's heart rate. It is likely that this high signal estimate is actually caused by motion rather than blood flow in the feet, as some motion can be observed in the video. In both videos, the signal estimate appears to be correlated with the mean luminance (Figs. 3b and 4b); this indicates that our estimate may be affected by the motion of specularities in the videos.

5. Limitations

Compression noise. Figs. 1 and 2 indicate that the accuracy of our signal estimates is affected by the degree of compression (and thus quantization) applied to the video. We roughly estimated the quantization noise level from a video of a static scene with no signal, using a simplistic model of the compression process (Equations 3 and 5). Our model does not account for the spatial correlations that are introduced by the compression process, or the dependence of the quantization noise on the image content [21]. The effect of compression might be described more accurately using existing models. Quantization noise in the DCT domain has been modeled as uniformly distributed; it has also been modeled based on a Laplacian model for DCT coefficients [21]. Some authors have proposed the use of different distributions depending on the quantized image content [9, 21]. Further studies on the effect of quantization noise should also be conducted by capturing uncompressed real videos.

Motion. Motion is often observed to be a confounding factor when attempting to visualize blood flow using video magnification. Because it often has a similar temporal signature to the blood flow signal, it cannot be removed from the video via simple temporal filtering. Figs. 3 and 4 imply that our signal estimate may have been corrupted by the motion of specularities in the video. It may be possible to physically stabilize the video subject using mechanical restraints. Alternatively, future work can examine the effect of motion on our model and our estimates. For instance, it may be possible to use motion estimation techniques such as optical flow to estimate the amount of variance in each pixel that is caused by motion.

Camera processes. There are camera processes other than compression that may affect the accuracy of our model. For example, many demosaicing strategies perform spatial interpolation [20] that introduces spatial correlations. Modeling these correlations requires the consideration of covariance terms in Equations 4 and 5. The quantization process that occurs when the image is exported from the camera

Figure 2: Estimated signal variance for a stationary color calibration target. The upper right region of the card was excluded from the visualization as it appeared to have extremely high noise levels that interfere with the visualization scale. (a) Estimated signal variance per pixel. (b) Mean (over space) estimated signal variance per mean (over time) luminance.

Figure 3: A healthy female human hand. Video was taken under similar conditions to Fig. 2. (a) Estimated signal variance (left) and estimated signal that is above the estimated noise floor of 1 × 10⁻⁵ (right). (b) Mean (over space) estimated signal variance per mean (over time) luminance.

(typically to 8-16 bits per channel) introduces additional quantization noise [10]. The accuracy of our model and estimates can likely be improved by accounting for these effects.

6. Conclusion

Video magnification can be used to produce informative visualizations of human blood flow from consumer-grade videos. In these videos, the signal of interest may be very small compared to the noise level. We presented a model and an algorithm for estimating the signal in this context. We showed that our algorithm can be used to estimate the signal variance in a simulated video that is corrupted by zero-mean Gaussian noise and compressed at a high quality factor. However, we also demonstrated that our estimates are highly affected by aggressive compression as well as motion. Further work is required to assess the performance of our algorithm in the presence of compression and motion.

Figure 4: A healthy pair of male human feet. Video was taken under similar conditions to Fig. 2. (a) Estimated signal variance (left) and estimated signal that is above the estimated noise floor of 1 × 10⁻⁵ (right). (b) Mean (over space) estimated signal variance per mean (over time) luminance.

References

[1] Digital compression and coding of continuous-tone still images, part 1, requirements and guidelines. ISO/IEC JTC1 Draft International Standard 10918-1, 1991.
[2] A. Amer and E. Dubois. Fast and reliable structure-oriented video noise estimation. Circuits and Systems for Video Technology, IEEE Transactions on, 15(1):113-118, 2005.
[3] A. Z. Averbuch, A. Schclar, and D. L. Donoho. Deblocking of block-transform compressed images using weighted sums of symmetrically aligned pixels. Image Processing, IEEE Transactions on, 14(2):200-212, 2005.
[4] G. Balakrishnan, F. Durand, and J. Guttag. Detecting pulse from head motions in video. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 3430-3437. IEEE, 2013.
[5] J. E. Bioucas-Dias and M. A. T. Figueiredo. Multiplicative noise removal using variable splitting and constrained optimization. Image Processing, IEEE Transactions on, 19(7):1720-1730, 2010.
[6] J. C. Brailean, R. P. Kleihorst, S. Efstratiadis, A. K. Katsaggelos, and R. L. Lagendijk. Noise reduction filters for dynamic image sequences: a review. Proceedings of the IEEE, 83(9):1272-1292, 1995.
[7] A. Buades, B. Coll, and J.-M. Morel. A non-local algorithm for image denoising. In Computer Vision and Pattern Recognition (CVPR), 2005 IEEE Computer Society Conference on, volume 2, pages 60-65. IEEE, 2005.
[8] G. E. Healey and R. Kondepudy. Radiometric CCD camera calibration and noise estimation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 16(3):267-276, 1994.
[9] A. Ichigaya, M. Kurozumi, N. Hara, Y. Nishida, and E. Nakasu. A method of estimating coding PSNR using quantized DCT coefficients. Circuits and Systems for Video Technology, IEEE Transactions on, 16(2):251-259, 2006.
[10] K. Irie, A. E. McKinnon, K. Unsworth, and I. M. Woodhead. A technique for evaluation of CCD video-camera noise. Circuits and Systems for Video Technology, IEEE Transactions on, 18(2):280-284, 2008.
[11] D. B. Jenkins. Hollinshead's Functional Anatomy of the Limbs and Back. Elsevier Health Sciences, 2008.
[12] M. Kobayashi, T. Okabe, and Y. Sato. Detecting forgery from static-scene video based on inconsistency in noise level functions. Information Forensics and Security, IEEE Transactions on, 5(4):883-892, 2010.
[13] C. Liu and W. T. Freeman. A high-quality video denoising algorithm based on reliable motion estimation. In Computer Vision - ECCV 2010, pages 706-719. Springer, 2010.
[14] C. Liu, W. T. Freeman, R. Szeliski, and S. B. Kang. Noise estimation from a single image. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, volume 1, pages 901-908. IEEE, 2006.
[15] M. Maggioni, G. Boracchi, A. Foi, and K. Egiazarian. Video denoising, deblocking, and enhancement through separable 4-D nonlocal spatiotemporal transforms. Image Processing, IEEE Transactions on, 21(9):3952-3966, 2012.
[16] P. M. Narendra. A separable median filter for image noise smoothing. Pattern Analysis and Machine Intelligence, IEEE Transactions on, (1):20-29, 1981.
[17] P. Perona and J. Malik. Scale-space and edge detection using anisotropic diffusion. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 12(7):629-639, 1990.
[18] M.-Z. Poh, D. J. McDuff, and R. W. Picard. Non-contact, automated cardiac pulse measurements using video imaging and blind source separation. Optics Express, 18(10):10762-10774, 2010.
[19] M. Protter and M. Elad. Image sequence denoising via sparse and redundant representations. Image Processing, IEEE Transactions on, 18(1):27-35, 2009.
[20] R. Ramanath, W. E. Snyder, G. L. Bilbro, and W. A. Sander. Demosaicking methods for Bayer color arrays. Journal of Electronic Imaging, 11(3):306-315, 2002.
[21] M. A. Robertson and R. L. Stevenson. DCT quantization noise in compressed images. Circuits and Systems for Video Technology, IEEE Transactions on, 15(1):27-38, 2005.
[22] S.-Y. Shih, C.-R. Chang, and Y.-L. Lin. A near optimal deblocking filter for H.264 advanced video coding. In Design Automation, 2006. Asia and South Pacific Conference on, 6 pp. IEEE, 2006.
[23] S.-C. Tai, Y.-Y. Chen, and S.-F. Sheu. Deblocking filter for low bit rate MPEG-4 video. Circuits and Systems for Video Technology, IEEE Transactions on, 15(6):733-741, 2005.
[24] Y. Tsin, V. Ramesh, and T. Kanade. Statistical calibration of CCD imaging process. In Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, volume 1, pages 480-487. IEEE, 2001.
[25] W. Verkruysse, L. O. Svaasand, and J. S. Nelson. Remote plethysmographic imaging using ambient light. Optics Express, 16(26):21434-21445, 2008.
[26] G. K. Wallace. The JPEG still picture compression standard. Communications of the ACM, 34(4):30-44, 1991.
[27] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra. Overview of the H.264/AVC video coding standard. Circuits and Systems for Video Technology, IEEE Transactions on, 13(7):560-576, 2003.
[28] H.-Y. Wu, M. Rubinstein, E. Shih, J. V. Guttag, F. Durand, and W. T. Freeman. Eulerian video magnification for revealing subtle changes in the world. ACM Trans. Graph., 31(4):65, 2012.
