Asymmetric Bilateral Phase Correlation for Optical Flow Estimation in the Frequency Domain

Vasileios Argyriou Kingston University London London, UK [email protected]

Abstract—We address the problem of motion estimation in images operating in the frequency domain. A method is presented which extends phase correlation to handle multiple motions present in an area. Our scheme is based on a novel Bilateral-Phase Correlation (BLPC) technique that incorporates the concept and principles of Bilateral Filters, retaining the motion boundaries by taking into account the difference both in value and in distance, in a manner very similar to Gaussian convolution. The optical flow is obtained by applying the proposed method at certain locations, selected based on the present motion differences, and then performing non-uniform interpolation in a multi-scale iterative framework. Experiments with several well-known datasets with and without ground-truth show that our scheme outperforms recently proposed state-of-the-art phase correlation based optical flow methods.

Index Terms—optical flow, motion estimation, phase correlation, subpixel

arXiv:1811.00327v1 [cs.CV] 1 Nov 2018

I. INTRODUCTION

Optical flow estimation is one of the fundamental problems in computer vision [16], [24], with significant applications such as structure from motion, SLAM, 3D reconstruction [3], object and pedestrian tracking, face, object or building detection and behaviour recognition [8], [26], robot or vehicle navigation, image super-resolution, medical image registration, restoration, and compression. Dense motion estimation, or optical flow, is an ill-posed problem and is defined as the process of approximating the 3D movement in a scene on a 2D image sequence based on the illumination changes due to camera or object motion. As an outcome of this process a dense vector field is obtained, providing motion information for each pixel in the image sequence.

Several aspects make this problem particularly challenging, including occlusions (i.e. points can appear or disappear between two frames) and the aperture problem (i.e. regions of uniform appearance whose local cues are not able to provide any information about the motion). In the literature there are many approaches to overcome these challenges and compute the optical flow, and most of them belong to one of the following categories. Block matching techniques [31] are based on the assumption that all pixels in a block undergo the same motion and are mainly applied for standards conversion and in international standards for communications such as MPEGx and H.26x. Gradient-based techniques [23], [32] utilise the spatio-temporal image gradients, introducing also some robust or smoothness constraints [50]. Bayesian techniques [19] utilize probability smoothness constraints or Markov Random Fields (MRFs) over the entire image, focusing on finding the Maximum a Posteriori (MAP) solution; these methods are also combined with multi-scale schemes, offering more accurate and robust motion vectors. GPU based approaches using local operations have been proposed [20], providing accurate estimates. Finally, phase correlation techniques [43] operate in the frequency domain, computing the optical flow by applying phase correlation locally to the frames.

Fig. 1. An example of inaccurate or blurred motion vectors around motion boundaries or depth discontinuities.

At the core of these methods we usually have phase correlation [36], [44], which has become one of the motion estimation methods of choice for a wide range of professional studio and broadcasting applications. Phase Correlation (PC) and other frequency domain approaches (which are based on the shift property of the Fourier Transform (FT)) offer speed through the use of FFT routines and enjoy a high degree of accuracy, featuring several significant properties: immunity to uniform variations of illumination, insensitivity to changes in spectral energy, and excellent peak localization accuracy. Furthermore, PC provides sub-pixel accuracy, which has a significant impact on motion compensated error performance and on image registration for tracking, recognition, super-resolution and other applications, as theoretical and experimental analyses have suggested. Sub-pixel accuracy can mainly be achieved through the use of bilinear interpolation, which is also applicable to frequency domain motion estimation methods.

One of the main issues of frequency domain registration methods is that it is hard to identify the correct motion parameters in the case of multiple motions present in a region. This issue is especially pronounced around motion boundaries or depth discontinuities, resulting in inaccurate or blurred motion vectors (see figure 1). One of the most common approaches to overcome it is to incorporate block matching, computing the motion compensated error for a set of candidate motions [1], [39], [40]. Another solution could be to reduce the block/window size used to apply Phase Correlation, but this further affects the accuracy of the estimates, since large blocks of image data are required in order to obtain reliable motion estimates. Also, smaller windows will not support large motion vectors. The problem becomes more visible in cases where overlapped windows of high density are used and when accurate motion border estimation is essential. Furthermore, another issue that affects optical flow techniques is the overall complexity, especially for high resolution images (4K-UltraHD) and given the available computational power and energy, especially on mobile devices. Additionally, since the number of real-time computer vision applications constantly increases, the overall performance and complexity should allow real-time or near real-time optical flow estimation for such high resolutions [42].

In this paper we introduce a novel high-performance optical flow algorithm operating in the frequency domain based on the principles of Phase Correlation. The overall complexity is very low, allowing near real-time flow estimation for very high resolution video sequences. In order to overcome the problem of blurred or erroneous motion estimates at the motion borders, providing accurate and sharp motion vectors, a novel Asymmetric Bilateral-Phase Correlation technique is introduced, incorporating the concept and principles of Bilateral Filters. We propose to use the idea of taking into account the difference in value with the neighbours to preserve motion edges, in combination with a weighted average of nearby pixels, in a manner very similar to Gaussian convolution, integrated into the Phase Correlation process. The key idea of the bilateral filter is that for a pixel to influence another pixel, it should not only occupy a nearby location but also have a similar value [14], [45]. Finally, subpixel accuracy is obtained through the use of fast and simple interpolation schemes. Experiments with well-known datasets of video sequences with and without ground truth have shown that our scheme performs significantly better than other state-of-the-art frequency domain optical flow methods, while overall, in comparison with state-of-the-art methods, it provides very accurate results with low complexity, especially for high resolution 4K video sequences.

In summary, our contributions are: 1) A hierarchical framework for optical flow estimation, applying motion estimation techniques only to a small number of regions of interest selected based on the motion compensated prediction error, allowing us to control the overall complexity and required computational power. 2) A novel asymmetric bilateral filter operating simultaneously on two separate frames and using unequal size blocks, allowing us to carry the local spatial properties and intensity constraints of one frame to the other (i.e. using the properties of a small block/window to filter the whole image). 3) Integration of the proposed asymmetric bilateral filter in the Phase Correlation process, which can be applied either in the pixel domain as a pre-processing step or directly in the frequency domain as a multiplication in a higher dimensional space.

This paper is organised as follows. In Section 2, we review the state-of-the-art in optical flow estimation using phase correlation and other pixel or gradient based approaches. In Section 3, we discuss the principles of the proposed Bilateral Phase Correlation (BLPC) and analyse the key features of this optical flow framework. In Section 4 we present experimental results, while in Section 5 we draw conclusions arising from this paper.

II. RELATED WORK

In the following section, we review the major optical flow methodologies, including frequency domain approaches and energy minimization methods developed for computer vision applications.

Fig. 2. Example of a correlation surface characterised by the presence of two peaks corresponding to the present motions.

A brief review of current state-of-the-art Fourier-based methods for optical flow estimation is presented in [27]. Several subpixel motion estimation and image registration algorithms operating in the frequency domain have been introduced [11], [13], [18], [34], [35], [46]–[48], [52]. In [22], Hoge proposes to apply a rank-1 approximation to the phase difference matrix and then performs unwrapping, estimating the motion vectors. The work in [25] is a noise-robust extension to [22], where the noise is assumed to be Additive White Gaussian Noise (AWGN). The authors in [7] derive the exact parametric model of the phase difference matrix and solve an optimization problem for fitting the analytic model to the noisy data. To estimate the subpixel shifts, Stone et al. [41] fit the phase values to a 2D linear function using linear regression, after masking out frequency components corrupted by aliasing. An extension of this method for the additional estimation of planar rotation has been proposed in [49]. Foroosh et al. [15] showed that the phase correlation function is the Dirichlet kernel and provided analytic results for the estimation of the subpixel shifts using a sinc approximation. Finally, a fast method for subpixel estimation based on FFTs has been proposed in [38]. Notice that the above methods either assume aliasing-free images [2], [4], [7], [15], [37], [38], or cope with aliasing by frequency masking [22], [25], [41], [49], which requires fine tuning.

Over the last decades, different optical flow methods based on the above Phase Correlation techniques have been proposed. In [44] Thomas proposes a vector measurement method that has the accuracy of the Fourier techniques, combined with the ability to measure the motion of multiple objects in a scene. The method is also able to assign vectors to individual pixels if required. The authors in [51] proposed raster scanning of the image pair using a moving window, estimating the motion of each small window with the introduced compound phase correlation algorithm. In order to improve the fitting accuracy of the phase difference plane, a phase fringe filter is utilised in the Fourier domain, following an improved extension of the work in [22]. The main issue with this approach is that it is not clear how it performs in the presence of multiple motions, and furthermore it requires the motion boundaries to be quite distinct. In [21] a regular grid of patches is generated and the optical flow is estimated by calculating the phase correlation of each pair of co-sited patches using the Fourier Mellin Transform (FMT). This approach allows the estimation not only of translation but also of scale and rotation of image patches. The main limitation of this algorithm is that it cannot handle multiple motions and preserve the motion edges. In the work presented in [5] the authors use an adaptive grid to extract image patches and then apply gradient correlation in an iterative process, using 2D median filters to overcome the issues related to multiple motions. Despite the filtering stage, the motion blurring is not avoided in many cases, due to local characteristics that may affect the filter. The authors in [1], [17], [39], [40] proposed a solution, based on the concept suggested by Thomas in [44], defining a large enough set of candidate motion vectors and using a combinatorial optimization algorithm (such as block-matching) to find, for each point of interest, the candidate which best represents the motion at that location. This is the main difference with the proposed method, since in their work they try to minimize the motion compensated error directly. Also, they are using large windows, and the obtained set contains the most representative maxima of the phase correlation between the two input images, computed for different overlapping regions. This approach provides better accuracy and contains fewer spurious candidates, but the problem with the motion boundaries or depth discontinuities remains, since it depends on the matching algorithm, resulting in inaccurate estimates at the borders.

Let I_i(m), m = [x, y]^T ∈ R^2, i = 1, 2 be two image functions, related by an unknown translation t = [t_x, t_y]^T ∈ R^2:

I_2(m) = I_1(m − t)    (1)

Phase correlation schemes are utilised to estimate the translational displacement. Considering each image I_i(m) as a continuous periodic image function, the Fourier transform of I_1 is given by:

Î_1(k) = Σ_m I_1(m) e^{−j(2π/N) k^T m}    (2)

where −N/2 ≤ k < N/2, k = [k, l]^T ∈ Z^2, m = [m, n]^T ∈ R^2 and −N/2 ≤ m < N/2. Moving to the shifted version of the image, I_2, its DFT is given, based on the Fourier shift property and assuming no aliasing, by:

Î_2(k) = Î_1(k) e^{−j(2π/N) k^T (Nt)}    (3)
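The shift property and the normalized cross-power spectrum described by equations (1)–(3) are straightforward to verify numerically. The following NumPy sketch (frame content, size and shift values are arbitrary assumptions for the demo, not the paper's implementation) recovers an integer translation between two frames:

```python
import numpy as np

def phase_correlation(f1, f2):
    """Phase-correlation surface: inverse FFT of the normalized
    cross-power spectrum of two equally sized frames."""
    F1, F2 = np.fft.fft2(f1), np.fft.fft2(f2)
    cross = F2 * np.conj(F1)
    cross /= np.abs(cross) + 1e-12        # keep only the phase difference
    return np.real(np.fft.ifft2(cross))

rng = np.random.default_rng(0)
f1 = rng.random((64, 64))
f2 = np.roll(f1, shift=(5, 9), axis=(0, 1))   # circular shift t = (5, 9)

surface = phase_correlation(f1, f2)
ty, tx = np.unravel_index(np.argmax(surface), surface.shape)
print(ty, tx)   # the dominant peak sits at the applied shift
```

For a pure circular shift the surface is, up to numerical noise, the delta function of equation (4); real frames containing multiple motions produce several peaks, which is exactly the situation addressed by the proposed method.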


Fig. 3. An example of the obtained images, showing that the proposed approach retains pixels that are close to the centre of the window and have intensities similar to the values of the pixels in the neighboring area of the centre. In these images we observe visually the effect of the proposed novel asymmetric Bilateral filter integrated in the PC method. In more detail, in the first image we have the small red window that we use to extract the filter restrictions (spatial and intensity constraints), which are then applied to the whole second frame. This allows us to correlate the two frames using only the Bilateral filter information inside the red square.

Fig. 4. The overall methodology and the related stages for optical flow estimation, applied in each layer of the image pyramid.

III. BILATERAL-PC FOR OPTICAL FLOW ESTIMATION

Fig. 5. An example of the selected pixel locations obtained by applying first uniform sampling (first from left), then calculating the frame difference (second), applying a binary threshold (third) and finally combining both (fourth).

Fig. 6. Selected frames from the datasets used in our evaluation.

In this work, we propose a new optical flow estimation technique operating in the frequency domain, based on a novel extension of Phase Correlation that incorporates the principles of bilateral filters. Initially, an overview of Phase Correlation is presented, followed by an analysis of the proposed Bilateral-PC technique. Finally, a novel framework for efficient optical flow estimation in the Fourier domain is discussed, supporting high resolution images or videos (4K).

Since the shift property of the Fourier Transform refers to integer shifts and aliasing-free signals are assumed, we regard that our sampling device eliminates aliasing. Traditionally, phase correlation (PC) is used to estimate the translational displacement and is perhaps the most widely used correlation-based method in image registration. It finds

Fig. 8. The ratio of the highest over the second highest peak of each correlation surface, over each pixel of all the artificial examples. There are three rows of images and two pairs of columns. In each column pair, the left image is obtained using the classical PC method, while the other corresponds to BLPC. These images demonstrate the improved accuracy of the BLPC method, since we have higher peaks in the correlation surfaces, especially at the borders of the moving objects (clear shapes and borders of the moving objects in relation to figure 7).
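The peak ratio visualised in figure 8 (highest over second highest peak of each correlation surface) can be sketched as follows. For simplicity, the second largest value of the surface is used here as a proxy for the second peak, and the example surfaces are arbitrary assumptions:

```python
import numpy as np

def peak_ratio(surface):
    """Ratio of the highest over the second highest value of a
    correlation surface (a simple proxy for the two strongest peaks)."""
    flat = np.sort(surface, axis=None)
    return flat[-1] / max(flat[-2], 1e-12)

# A single dominant peak versus two comparable peaks (two motions).
single = np.zeros((16, 16))
single[4, 4], single[9, 9] = 1.0, 0.1
multi = np.zeros((16, 16))
multi[4, 4], multi[9, 9] = 1.0, 0.9

print(peak_ratio(single), peak_ratio(multi))   # the first ratio is larger
```

A window whose surface shows two comparable peaks is the indication used later in the text to switch from plain PC to the proposed BLPC.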

TABLE I
RESULTS FOR THE ARTIFICIAL DATASET USING SEVERAL PERFORMANCE MEASURES, APPLYING THE METHODS ON ALL THE PIXEL LOCATIONS.

        MSE    PSNR   NRMS  AE    AEF    Time
PC      0.041  16.23  27.0  92.4  21.02  26.9
BLPC    0.037  16.47  25.8  70.8  14.3   30.3

Fig. 7. Selected frames from the artificial examples used in our evaluation. In the last two columns we can see the obtained colour coded optical flows using PC and BLPC, respectively. It is clear how well the proposed BLPC method distinguishes the multiple motions, without being affected by the background movement at each pixel location, in comparison to PC.

the maximum of the phase difference function, which is defined as the inverse FT of the normalized cross-power spectrum:

PC(u) = F^{-1}{ Î_2(k) Î_1^*(k) / (|Î_2(k)| |Î_1^*(k)|) } = F^{-1}{ e^{j k^T t} } = δ(u − t)    (4)

where * denotes the complex conjugate and F^{-1} the inverse Fourier transform.

A. Proposed methodology for Bilateral-PC

One of the main problems that we encounter with Phase Correlation is the unreliable estimates in the presence of multiple motions (i.e. motion boundaries or depth discontinuities). Considering the example shown in figure 2, where two motions are present (the foreground object and the background), the obtained correlation surface is characterised by the presence of two peaks corresponding to the two motions. As a result, it is not feasible to identify the correspondence between pixels and the estimated motion vectors. This problem is more pronounced when a dense vector field (optical flow) is estimated using overlapped windows. In such cases the estimated motion vector is assigned only to the pixel located at the centre of each window. To overcome this problem a novel Bilateral Phase Correlation technique is proposed, based on the principles of Bilateral filters. The selection of this filter family is due to its higher accuracy in preserving edges (object or motion edges in our case). Bilateral-PC (BLPC) takes into account the difference in value with the neighbours to preserve motion edges, in combination with a weighted average of nearby pixels, which allows a pixel to influence another one not only if it occupies a nearby location but also if it has a similar value.

The bilateral filter is defined as a weighted average of nearby pixels, in a manner very similar to Gaussian convolution:

GC[I^i]_p = Σ_{q∈S} G_σ(‖p − q‖) I^i_q    (5)

where i = 1, 2 indicates the frame/window (see equation (1)), Σ_{q∈S} denotes a sum over all image pixels q, and ‖·‖ represents the L2 norm, e.g., ‖p − q‖ is the Euclidean distance between pixel locations p and q. Also, G_σ(x) denotes the 2D Gaussian kernel:

G_σ(x) = (1 / (2πσ^2)) exp(−x^2 / (2σ^2))    (6)

Additionally, the bilateral filter takes into account the difference in value between the neighbors, to preserve edges while smoothing. Therefore, in order for a pixel to influence another one, it should also have a similar value and not only be at a close distance. Thus, the bilateral filter is defined as:

BF[I^i]_p = (1/W_p) Σ_{q∈S} G_{σ_s^i}(‖p − q‖) G_{σ_r^i}(|I^i_p − I^i_q|) I^i_q    (7)

where |·| denotes absolute value and W_p = Σ_{q∈S} G_{σ_s^i}(‖p − q‖) G_{σ_r^i}(|I^i_p − I^i_q|) is the normalization factor. The parameters σ_s^i and σ_r^i specify the amount of filtering for the image I^i. The bilateral filter is not a convolution of the spatial weight G_{σ_s^i}(‖p − q‖) with the product G_{σ_r^i}(|I^i_p − I^i_q|) I^i_q, because the range weight G_{σ_r^i}(|I^i_p − I^i_q|) depends on the pixel value I^i_p. Considering that, and selecting a fixed intensity value I_f = I^1(w/2, h/2), equal to the intensity of the pixel at the centre of the selected window of size (w, h), we can overcome this problem. Then, the product G_{σ_r^i}(|I_f − I^i_q|) I^i_q is computed and convolved with the Gaussian kernel G_{σ_s^i}, resulting in the same outcome as the bilateral filter at all pixels p with I^i_p = I_f after normalisation. Instead of using a single intensity I_f, a set of values {I_{f1}, ..., I_{fn}}, obtained from the neighborhood of I_f using an m × m window (e.g. m = 3), is selected. As a result we obtain:

Q̃^i_k(q) = G_{σ_r^i}(|I_{fk} − I^i_q|) I^i_q    (8)

After the convolution with the spatial kernel and the normalisation we have:

Q^i_k = (G_{σ_s^i} ⊗ Q̃^i_k) ÷ (G_{σ_s^i} ⊗ G_{σ_r^i})    (9)

where ÷ indicates per-pixel division and the denominator corresponds to the total sum of the weights. The final images I_1^B = BF[I^1]_p and I_2^B = BF[I^2]_p are estimated by upsampling and then interpolating equation (9). Also, σ^1 is selected to be much smaller in comparison to σ^2. An example of the obtained images is shown in figure 3; observing the outcomes, it can be seen that this approach retains pixels that are close to the centre of the window and have intensities similar to the values of the pixels in the neighboring area of the centre.

Once the filtered images Î_1^B and Î_2^B are obtained and transferred to the Fourier domain, equation (4) is used to estimate the correlation surface. Since the filtering process retains only the pixels related to the one at the centre, the correlation surface contains a single dominant peak, which yields an estimate of the shift parameters and can be recovered as:

(Δx̂, Δŷ) = arg max_{x,y} |PC(x, y)|    (10)

Finally, subpixel accuracy is obtained through the use of fast interpolation schemes [37]. In more detail, the estimate in the x and y directions is given by:

Δx̂ = D_x / (C(0,0) + |D_x|),   Δŷ = D_y / (C(0,0) + |D_y|)    (11)

where D_x = C(1,0) − C(−1,0), D_y = C(0,1) − C(0,−1) and C(k,l) = PC(x_0 + k, y_0 + l) with k, l ∈ [−1, 0, 1].

TABLE II
THE EVALUATION RESULTS FOR THE 4K TEST SEQUENCES REVIVING THE CLASSICS DATASET [30] WITHOUT GROUND TRUTH, USING SEVERAL PERFORMANCE MEASURES.

Measures      MSE    PSNR    NRMS    Time(s)
Thomas [44]   5.492  24.682  199.55  44.822
Argyr. [5]    5.453  24.747  198.83  45.591
Gaut. [17]    21.89  17.847  408.68  236.68
Yan [51]      5.498  24.679  199.67  2594.7
Ho [21]       3.263  26.753  159.56  291.64
Reyes [40]    1.934  29.240  121.72  54.569
Alba [39]     2.043  28.775  126.81  66.483
BLPC          0.908  32.680  81.64   48.669

TABLE III
THE EVALUATION RESULTS FOR THE DATASET WITH GROUND TRUTH PROVIDED BY BAKER [6], USING SEVERAL PERFORMANCE MEASURES.

           MSE   PSNR   NRMS   AE    AEF   Time
Tho [44]   2.30  21.24  40.31  62.0  4.63  9.71
Arg [5]    2.34  21.31  40.25  60.6  4.62  11.47
Gau [17]   2.40  17.45  63.76  25.1  8.22  8.40
Yan [51]   2.31  21.22  40.41  62.4  4.64  9.78
Ho [21]    3.14  24.43  31.31  23.4  3.88  74.3
Rey [40]   1.97  26.42  20.88  11.2  3.38  2.72
Alb [39]   2.06  26.25  21.64  10.2  3.38  7.59
BLPC       1.97  26.68  20.57  9.9   3.37  2.96

TABLE IV
THE EVALUATION RESULTS FOR SAMPLES OF VIDEOS WITH FACES FROM DHALL'S DATASET [12] WITHOUT GROUND TRUTH, USING SEVERAL PERFORMANCE MEASURES.

Measures      MSE      PSNR    NRMS    Time(s)
Thomas [44]   1.4471   33.082  21.108  12.784
Argyr. [5]    1.4352   33.093  21.046  16.009
Gaut. [17]    23.143   18.382  78.725  10.452
Yan [51]      1.4488   33.08   21.117  13.63
Ho [21]       1.081    33.81   18.621  132.56
Reyes [40]    0.3299   36.999  10.218  1.3671
Alba [39]     0.29531  36.891  10.343  8.6594
BLPC          0.2595   37.419  9.0248  0.9956

TABLE VI
THE EVALUATION RESULTS FOR THE UCLGTOFV1.1 DATASET WITH GROUND TRUTH [33], USING SEVERAL PERFORMANCE MEASURES.

           MSE   PSNR   NRMS   AE    AEF    Time
Tho [44]   19.2  17.51  74.68  80.5  19.57  11
Arg [5]    19.1  17.53  74.63  79.9  19.56  10.9
Gau [17]   30.4  15.62  96.31  66.7  76.03  7.9
Yan [51]   19.2  17.53  74.75  80.8  19.6   9.6
Ho [21]    16.2  18.58  68.89  69.4  18.7   91
Rey [40]   7.5   22.04  47.11  30.8  11.33  1.5
Alb [39]   8.5   21.45  50.51  34.3  12.29  9.4
BLPC       7.2   22.82  46.19  23.3  7.44   1.6

B. Optical flow estimation framework

In this section, we present the proposed optical flow estimation framework using the introduced Bilateral-PC. The overall methodology is separated into the stages shown in figure 4. The approach is based on a multi-resolution, coarse-to-fine algorithm that constructs an image pyramid by down-sampling the original image into a maximum of three layers by a factor of a power of two. The optical flow can then be estimated initially for the smallest image pair in the pyramid, and is used to unwarp the next one by up-sampling and scaling the previous layer.

The first stage of this framework includes all the pre-processing tasks, such as pyramid formation, estimation of key points and removal of the less significant ones. Initially, the image I_i is down-sampled by a factor of a power of two until the obtained resolution is below a threshold M_min × N_min that depends on the available computational power or the desired complexity of the system. If the down-sampling process was applied more than once before reaching the minimum resolution threshold, one more layer is extracted in-between the smallest one and the original size image. Once the three layers are obtained (small I_i^s, medium I_i^m, and the original I_i^o), key points are estimated on the smallest one. Let p_u = [x, y]^T ∈ R^2 be a set of point locations in I_i^s, selected by applying uniform sampling with a step s_u (see figure 5). Also, the absolute frame difference I_d^s = |I_i^s − I_{i+1}^s| is calculated and then a second set of points p_d = [x, y]^T ∈ R^2 is obtained as:

(x, y) = arg_{x,y}{ I_d^s(x, y) },  if I_d^s(x, y) ≥ T_d    (12)

where T_d = α σ_{I_d^s} and α is a scaling factor (see figure 5). The impact of this threshold is proportional to the overall complexity, and adjusting its value can improve or reduce the overall quality, with a corresponding impact on the computational complexity. The selected values experimentally proved to provide a good balance between the overall accuracy and complexity for all video sequences.
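As a proof-of-concept sketch of the bilateral weighting of equations (5)–(9) combined with the correlation of equation (4) (a simplification, not the paper's exact implementation): both frames are weighted with the spatial and intensity constraints taken from a window of the first frame, so the correlation peak follows the motion of the object at the window centre even when a second motion is present. All sizes, intensities and σ values below are illustrative assumptions.

```python
import numpy as np

def gaussian(x, sigma):
    return np.exp(-np.asarray(x, dtype=float) ** 2 / (2 * sigma ** 2))

def blpc_surface(f1, f2, centre, sigma_s=20.0, sigma_r=0.2):
    """Bilateral-weighted phase correlation (simplified sketch).

    Spatial and intensity constraints are taken from frame f1 around
    `centre` and applied to BOTH frames (the asymmetric idea), using the
    centre intensity as the fixed reference I_f."""
    cy, cx = centre
    yy, xx = np.mgrid[0:f1.shape[0], 0:f1.shape[1]]
    spatial = gaussian(np.hypot(yy - cy, xx - cx), sigma_s)
    ref = f1[cy, cx]                       # fixed reference intensity I_f
    b1 = spatial * gaussian(f1 - ref, sigma_r) * f1
    b2 = spatial * gaussian(f2 - ref, sigma_r) * f2
    cross = np.fft.fft2(b2) * np.conj(np.fft.fft2(b1))
    return np.real(np.fft.ifft2(cross / (np.abs(cross) + 1e-12)))

# Frame 1: a bright object and a darker second object; in frame 2 the
# bright object moves by (2, 3) while the second object moves differently.
f1 = np.zeros((64, 64))
f1[10:16, 10:16] = 1.0
f1[40:48, 40:48] = 0.5
f2 = np.zeros((64, 64))
f2[12:18, 13:19] = 1.0
f2[36:44, 45:53] = 0.5

S = blpc_surface(f1, f2, centre=(13, 13))
dy, dx = np.unravel_index(np.argmax(S), S.shape)
print(dy, dx)   # single dominant peak at the motion of the centred object
```

The intensity weight suppresses the differently valued second object, so the two-peak ambiguity of plain PC does not arise in this window.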

TABLE V
THE EVALUATION RESULTS FOR THE CT DATASET FROM LIANG [28], [29] WITHOUT GROUND TRUTH, USING SEVERAL PERFORMANCE MEASURES.

Measures      MSE     PSNR    NRMS    Time(s)
Thomas [44]   1.7381  28.755  25.455  13.526
Argyr. [5]    1.7381  28.756  25.455  16.661
Gaut. [17]    7.2814  21.807  54.722  10.535
Yan [51]      1.7382  28.755  25.455  17.148
Ho [21]       1.8028  28.648  25.882  131.35
Reyes [40]    1.6309  28.947  24.724  0.9989
Alba [39]     1.639   28.93   24.784  6.2942
BLPC          1.6283  28.95   24.705  0.4863
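The subpixel refinement of equations (10) and (11) can be sketched as below; the wrap-around indexing at the surface borders is an assumption of this sketch.

```python
import numpy as np

def subpixel_refine(surface):
    """Integer peak of eq. (10), refined by the three-point scheme of eq. (11)."""
    h, w = surface.shape
    y0, x0 = np.unravel_index(np.argmax(surface), surface.shape)

    def C(k, l):                     # C(k, l) = PC(x0 + k, y0 + l)
        return surface[(y0 + l) % h, (x0 + k) % w]

    Dx = C(1, 0) - C(-1, 0)
    Dy = C(0, 1) - C(0, -1)
    return x0 + Dx / (C(0, 0) + abs(Dx)), y0 + Dy / (C(0, 0) + abs(Dy))

# An elevated right-hand neighbour pulls the x estimate off the integer grid.
surface = np.zeros((8, 8))
surface[3, 4] = 1.0
surface[3, 5] = 0.5
x_hat, y_hat = subpixel_refine(surface)
print(x_hat, y_hat)   # x_hat lies between 4 and 5, y_hat stays at 3
```

The refinement only looks at the two neighbours on each axis, which is why it is cheap enough to run at every selected key point.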

Since the key points p_u and p_d are estimated, if their total number is above a limit T_p, which depends on the desired complexity of the proposed optical flow algorithm, the p_d are uniformly down-sampled. A sampling step is selected to produce a new set of points p'_d, aiming to have in total fewer points than T_p when combining both p_u and p'_d.

During the second stage of the proposed methodology, motion estimation is applied at the selected key locations. Initially, Phase Correlation is applied using a square window of size (m_w × m_w) centered at each key point location. From the obtained correlation surface PC(u) in equation (4), the ratio of the highest peak P^1_PC over the second highest peak P^2_PC is estimated, r_PC = P^1_PC / P^2_PC. If r_PC ≥ T_r, then this is an indication that more than one motion is present in this window. Therefore, the proposed Bilateral-PC is now applied to obtain a correlation surface with a single peak and, consequently, a more accurate motion estimate. The threshold T_r is specified based on the logarithm of the current window size m_w. Additionally, for the points that were removed after down-sampling,

p̂_d = p_d \ p'_d    (13)

motion vectors are estimated using [32]. In order to obtain a dense vector field, non-uniform interpolation with bilateral filtering is applied on the motion components of the sparse key point locations. Finally, the process repeats the same, moving back to the first stage to obtain the new key point locations, until we reach the last layer, which corresponds to the original size image.

Fig. 9. Examples of obtained optical flows using the MPI-SINTEL dataset with ground truth from Butler [9], for several sequences (six columns) and methods (from top to bottom [5], [17], [21], [39], [40], [44], [51], the proposed BLPC and the GT).

In the case of an application related to face analysis or tracking, the proposed optical flow framework can be adapted to estimate the motion information only in the area where a face is located.
In more detail, if a face is detected in a frame, then from the point locations in p_d we retain only the ones that overlap with the face window. This approach can significantly reduce the computational complexity in applications where motion information is required only for selected parts of a frame.

At the final stage, we unwarp the next smallest image, I_i^m(x, y) = I_i^s(x − d_x(x, y), y − d_y(x, y)) (or I_i^o(x, y) = I_i^m(x − d_x(x, y), y − d_y(x, y))), where d_x and d_y are the scaled motion estimates from the previous layer. Motion compensation is applied to estimate the Î_{i+1}^m (or Î_{i+1}^o) image, and the difference I_d^m = |I_i^m − Î_{i+1}^m| (or I_d^o = |I_i^o − Î_{i+1}^o|) is calculated.

IV. RESULTS

A comparative study was performed with state-of-the-art optical flow techniques operating both in the frequency and the pixel domain. Video sequences with and without ground truth were used for evaluating the performance. Also, experiments with artificial data were performed to further demonstrate the concept of the proposed BLPC technique. In this study six datasets were utilised; in more detail, the video sequences with ground truth provided by Baker in [6], the MPI-SINTEL dataset from Butler in [9] and the UCLgtOFv1.1 in [33]. Furthermore, we used the CT dataset from Liang in [28], [29], samples of videos with faces from Dhall's dataset in [12] and also the 4K Test Sequences Reviving the Classics dataset [30] without ground truth. Selected frames and videos from these datasets are shown in figure 6.

In our evaluation, several performance measures were utilised based on the availability of ground truth or not [6]. The AEF measure is probably the most appropriate for most applications, since errors in regions of non-zero motion are not penalized less than errors in regions of zero motion. For all video sequences the motion compensated prediction error is used; it is defined as the mean square error (MSE) between the ground-truth image (i.e. the frame we try to predict) and the estimated compensated one:

MSE = (1/N) Σ_{(x,y)} (I_c(x, y) − I_gt(x, y))^2    (14)

where N is the number of pixels, I_c is the motion compensated image and I_gt is the ground-truth frame. For color images, we take the L2 norm of the vector of RGB color differences.

Proof of concept using artificial data: In the first part of this analysis we used artificial examples to demonstrate the issue of multiple motions present in an area for Phase Correlation. Nine pairs of images with two or more moving objects were used in our experiments (see a subset in figure 7). In these examples the ground-truth is available for each moving object, and in our experiments we compared the proposed BLPC method with Phase Correlation. Corresponding blocks of size 32 × 32 centered around each pixel are used to apply PC and BLPC, generating a dense vector field. For the boundary pixels, wrap padding was used, and the obtained motion vector was applied to the corresponding pixel location. Examples of the obtained optical flow estimates for each case are shown in figure 7. We also plotted the ratio of the highest over the second highest peak of each correlation surface, over each pixel in all the artificial examples, in figure 8. These figures demonstrate that the proposed approach operates with high accuracy in the presence of multiple motions in comparison with the traditional techniques. Furthermore, in table I all the obtained results are summarised and we can see that BLPC provides more accurate estimates both in terms of AE and PSNR.

Comparative study using real video sequences: Regarding the real sequences, about 80 videos were used in our evaluation.
Also, based on MSE, we can obtain the PSNR, defined as:

PSNR = 10 log_10( max(I_c)^2 / MSE )    (15)

Another measure based on the motion compensated prediction error is the gradient-normalised root-mean-square difference between the ground-truth image and the compensated one:

NRMS = [ (1/N) Σ_{(x,y)} (I_c(x, y) − I_gt(x, y))^2 / (‖∇I_gt(x, y)‖^2 + ε) ]^{1/2}    (16)

In our experiments the arbitrary scaling constant is set to ε = 1.0, since it is the most common value used. For the motion compensation algorithm, we used the one suggested in [10] with subpixel accuracy.

Regarding the sequences with ground-truth flow available, the angular error (AE) between a flow vector (u, v) and the ground-truth (u_gt, v_gt) was used as a measure of performance for optical flow. The AE is defined as the angle in 3D space between (u, v, 1.0) and (u_gt, v_gt, 1.0) and can be computed using the equation below:

AE = cos^{-1}( (1.0 + u·u_gt + v·v_gt) / ( √(1.0 + u^2 + v^2) √(1.0 + u_gt^2 + v_gt^2) ) )    (17)

With this measure, errors in large flows are penalized less than errors in small flows. Also, we compute an absolute error in flow (AEF) defined by:

AEF = √( (u − u_gt)^2 + (v − v_gt)^2 )    (18)

We had a big variety in terms of frame resolution, starting from 584×388, nHD, HD, FHD and moving up to 4K UHD 3840×2160. In this comparative study, the performance of the proposed BLPC approach is compared with 16 methods in total, seven of which are well-known state-of-the-art PC based methods [5], [17], [21], [39], [40], [44], [51]. In our experiments the overall obtained accuracy of the proposed BLPC approach is superior in comparison to other frequency domain optical flow methods, resulting in sharp and precise motion estimates with good accuracy over motion boundaries and small details present in the image scene. Also, the obtained results are close to state-of-the-art methods operating in the pixel domain. In figure 9, we can see the obtained optical flows for several sequences and methods, including also the ground truth. In tables II-VI all the obtained results are summarised, and we can see that BLPC provides more accurate estimates both in terms of AE and PSNR in comparison with the other state-of-the-art frequency domain methods, while the complexity remains low. Overall, the suggested optical flow method provides significant accuracy, especially at motion boundaries, and outperforms current well-known methods operating in the frequency domain.

V. CONCLUSION

In this paper, a new framework for optical flow estimation in the frequency domain was introduced. This approach is based on Phase Correlation, and a novel Bilateral-Phase Correlation technique is introduced, incorporating the concept and principles of Bilateral Filters. One of the most attractive features of the proposed scheme is that it retains the motion boundaries by taking into account the difference in value of the neighbouring pixels to preserve motion edges, in combination with a weighted average of nearby intensity values, in a manner very similar to Gaussian convolution. BLPC yields very accurate motion estimates for a variety of test material and motion scenarios, and outperforms the optical flow techniques which are the current registration methods of choice in the frequency domain.

[24] K. Jia, X. Wang, and X. Tang. Optical flow estimation using learned sparse model. In Proceedings of the 2011 International Conference on Computer Vision, ICCV '11, pages 2391–2398, Washington, DC, USA, 2011. IEEE Computer Society.
[25] Y. Keller and A. Averbuch. A projection-based extension to phase correlation image alignment. Signal Process., 87:124–133, 2007.
[26] D. Konstantinidis, T. Stathaki, V. Argyriou, and N. Grammalidis. Building detection using enhanced hoglbp features and region refinement processes. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 10(3):888–905, March 2017.
[27] S. Kruger and A. Calway.
A multiresolution frequency domain method frequency domain. for estimating affine motion parameters. In Proc. IEEE International Conf. on Image Processing, page 113116, 1996. ACKNOWLEDGEMENTS [28] X. Liang, Q. Cao, R. Huang, and L. Lin. Recognizing focal liver lesions in contrast-enhanced ultrasound with discriminatively trained spatio- This work is co-funded by the NATO within the WITNESS temporal model. IEEE ISBI, 2014. project under grant agreement number G5437. The Titan X [29] X. Liang, L. Lin, Q. Cao, R. Huang, and Y. Wang. Recognizing focal liver lesions in ceus with dynamically trained latent structured models. Pascal used for this research was donated by NVIDIA. IEEE TRANS ON MEDICAL IMAGING, 2015. [30] E. T. Limited. 4k test sequences reviving the classics. REFERENCES http://www.elemental.com/resources/ 4k-test-sequences, 2014. [31] B. Liu and A. Zaccarin. New fast algorithms for the estimation of black [1] A. Alba, E. Arce-Santana, and M. Rivera. Optical flow estimation with motion vectors. Circ. and Syst. Vid.Tech., 3(2):148–157, 1993. prior models obtained from phase correlation. Lect Notes Computer [32] B. Lucas and T. Kanade. An iterative image registration technique with Science, 2010. an application to stereo vision. In Proc. IJCAI, pages 674–679, 1981. [2] V. Argyriou. Sub-hexagonal phase correlation for motion estimation. [33] O. Mac Aodha, G. J. Brostow, and M. Pollefeys. Segmenting video into IEEE Transactions on Image Processing, 20(1):110–120, Jan 2011. classes of algorithm-suitability. In CVPR, 2010. [3] V. Argyriou and M. Petrou. Photometric stereo: an overview. Adv. [34] V. Maik, E. Chae, L. E. sung, P. C. yong, J. G. hyun, P. Sunhee, Imaging Electron. Phys., 156:1–54, 2009. H. Jinhee, and J. Paik. Robust sub-pixel image registration based on [4] V. Argyriou and T. Vlachos. Quad-tree motion estimation in the combination of local phase correlation and feature analysis. 
IEEE Intern IEEE Transactions on frequency domain using gradient correlation. Symp on Consumer Electronics, pages 1–2, June 2014. Multimedia, 9(6):1147–1154, Oct 2007. [35] M. G. Mozerov. Constrained optical flow estimation as a matching [5] V. Argyriou and T. Vlachos. Low complexity dense motion estimation problem. IEEE T. Image Proc., 22(5):2044–2055, May 2013. using phase correlation. In DSP, pages 1–6, July 2009. [36] J. Pearson, D. Hines, S. Goldsman, and C. Kuglin. Video rate image [6] S. Baker, D. Scharstein, J. Lewis, S. Roth, M. Black, and R. Szeliski. correlation processor. Proc. SPIE, 119, 1977. A database and evaluation methodology for opticalflow. ICCV, 2007. [37] J. Ren, J. Jiang, and T. Vlachos. High-accuracy sub-pixel estimation [7] M. Balci and H. Foroosh. Subpixel estimation of shifts directly in the from noisy images in fourier domain. TIP, 19(5):1379–1384, 2010. fourier domain. IEEE Tr. Image Proc., 15(7):1965–1972, 2006. [38] J. Ren, T. Vlachos, and J. Jiang. Subspace extension to phase correlation [8] V. Bloom, D. Makris, and V. Argyriou. Clustered spatio-temporal approach for fast image registration. ICIP, pages 481–484, 2007. manifolds for online action recognition. In 2014 22nd International [39] A. Reyes, A. Alba, and E. Arce-Santana. Efficiency analysis of poc- Conference on Pattern Recognition, pages 3963–3968, Aug 2014. derived bases for combinatorial motion estimation. Chapter, Image and [9] D. Butler, J. Wulff, G. Stanley, and M. Black. A naturalistic open source Video Technology of the series Lecture Notes in Computer Science, ECCV movie for optical flow evaluation. , pages 611–625, 2012. 8333:124–135, 2014. [10] S. H. Chan, D. T. V, and T. Q. Nguyen. Subpixel motion estimation [40] A. Reyes, A. Alba, and E. R. Santana. Optical flow estimation using ICASSP without interpolation. In , pages 722–725, March 2010. phase only-correlation. Procedia Techn., 7:103–110, 2013. [11] P. Cheng and C.-H. Menq. 
Real-time continuous image registration [41] H. Stone, M. Orchard, E. Chang, and S. Martucci. A fast direct fourier- Image Processing, IEEE enabling ultraprecise 2-d motion tracking. based algorithm for subpixel registration of images. IEEE Trans. Geosci. Transactions on , 22(5):2081–2090, May 2013. Remote Sens., 39(10):2235–2243, Oct 2001. [12] A. Dhall, R. Goecke, S. Lucey, and T. Gedeon. Collecting large, richly [42] M. W. Tao, J. Bai, P. Kohli, and S. Paris. Simpleflow: A non- in IEEE MultiMedia annotated facial-expression databases from movies. , iterative, sublinear optical flow algorithm. Computer Graphics Forum 19(3):34–41, July-Sept. 2012. (Eurographics 2012), 31(2), may 2012. [13] Y. Douini, J. Riffi, M. A. Mahraz, and H. Tairi. Solving sub-pixel image [43] A. Tekalp. Digital . Prentice Hall, 1995. registration problems using phase correlation and lucas-kanade optical [44] G. Thomas. Television motion measurement for datv and other appli- flow method. In ISCV, pages 1–5, April 2017. cations. BBC Res. Dept. Rep., No. 1987/11, 1987. [14] F. Durand and J. Dorsey. Fast bilateral filtering for the display of [45] C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. highdynamic-range images. ACM Tr. Graphics, 21(3):257–266, 2002. in Proceedings of the IEEE ICCV, pages 839–846, 1998. [15] H. Foroosh, J. Zerubia, and M. Berthod. Extension of phase correlation [46] X. Tong, Y. Xu, Z. Ye, S. Liu, L. Li, H. Xie, F. Wang, S. Gao, and to sub-pixel registration. IEEE Image Process., 11(2):188–200, 2002. U. Stilla. An improved phase correlation method based on 2-d plane [16] D. Fortun, P. Bouthemy, and C. Kervrann. Optical flow modeling and fitting and the maximum kernel density estimator. Geoscience and CVIU computation: A survey. , 134:121, 2015. Remote Sensing Letters, IEEE, 12(9):1953–1957, Sept 2015. [17] T. Gautama and M. Van Hulle. A phase-based approach to the estimation [47] X. Tong, Z. Ye, Y. Xu, S. Liu, L. Li, H. Xie, and T. Li. 
A novel Neural Networks, IEEE of the optical flow field using spatial filtering. subpixel phase correlation method using singular value decomposition Transactions on , 13(5):1127–1136, Sep 2002. and unified random sample consensus. Geoscience and Remote Sensing, [18] K. Ghoul, M. Berkane, and M. Batouche. Pc method to the optical flow IEEE Trans using neuronal networks. In ICMCS, pages 71–75, 2017. , 53(8):4143–4156, 2015. [19] B. Glocker, T. Heibel, N. Navab, P. Kohli, and C. Rother. Triangleflow: [48] M. Uss, B. Vozel, V. Dushepa, V. Komjak, and K. Chehdi. A precise Geoscience and Optical flow with triangulation-based higher-order likelihoods. ECCV, lower bound on image subpixel registration accuracy. Remote Sensing, IEEE Trans pages 272–285, 2010. , 52(6):3333–3345, June 2014. [20] P. Gwosdek, H. Zimmer, S. Grewenig, A. Bruhn, and J. Weickert. A [49] P. Vandewalle, S. Susstrunk, and M. Vetterli. A frequency domain highly efficient gpu implementation for variational optic flow based on approach to registration of aliased images with application to super- resolution. EURASIP J. Appl. Signal Process., pages 1–14, 2006. the euler-lagrange framework. ECCVW, pages 372–383, 2010. [21] H. T. Ho and R. Goecke. Optical flow estimation using fourier mellin [50] J. Weickert and C. Schnorr. Variational optic flow computation with a Computer Vision and Pattern Recognition, 2008. CVPR spatio-temporal smoothness constraint. JMIV, 14(3):245–255, 2001. transform. In [51] H. Yan and J. G. Liu. Robust phase correlation based motion estimation 2008. IEEE Conference on , pages 1–8, June 2008. and its applications. BMVC, pages 1–10, 2008. [22] W. Hoge. Subspace identification extension to the phase correlation [52] L. Zhongke, Y. Xiaohui, and W. Lenan. Image registration based on method. IEEE Trans. Med. Imag., 22(2):277280, Feb 2003. Neural Networks and Signal [23] B. Horn and B. Schunck. Determining optical flow. Artificial Intelli- hough transform and phase correlation. 
Processing, 2:956–959, Dec 2003. gence, 17(13):185–203, 1981.
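APPENDIX: ILLUSTRATIVE SKETCHES

For concreteness, the evaluation measures of equations (14)-(18) can be sketched in NumPy as follows. This is a minimal illustration written for this text; the function names and the use of NumPy are ours, not the paper's implementation.

```python
import numpy as np

def mse(I_c, I_gt):
    # Eq. (14): mean square error between the motion compensated frame I_c
    # and the ground-truth frame I_gt; for colour images the squared L2 norm
    # of the RGB difference vector is averaged over the N pixels.
    diff = np.atleast_3d(I_c.astype(np.float64) - I_gt.astype(np.float64))
    return float(np.mean(np.sum(diff ** 2, axis=-1)))

def psnr(I_c, I_gt):
    # Eq. (15): peak signal-to-noise ratio in dB, with max(I_c) as the peak.
    return float(10.0 * np.log10(np.max(I_c) ** 2 / mse(I_c, I_gt)))

def nrms(I_c, I_gt, eps=1.0):
    # Eq. (16): RMS difference normalised by the squared gradient magnitude
    # of the ground truth; eps (1.0 in the text) guards flat regions.
    diff = I_c.astype(np.float64) - I_gt.astype(np.float64)
    gy, gx = np.gradient(I_gt.astype(np.float64))
    return float(np.sqrt(np.mean(diff ** 2 / (gx ** 2 + gy ** 2 + eps))))

def angular_error(u, v, u_gt, v_gt):
    # Eq. (17): angle between (u, v, 1.0) and (u_gt, v_gt, 1.0) in 3D space.
    num = 1.0 + u * u_gt + v * v_gt
    den = np.sqrt(1.0 + u ** 2 + v ** 2) * np.sqrt(1.0 + u_gt ** 2 + v_gt ** 2)
    return float(np.arccos(np.clip(num / den, -1.0, 1.0)))

def flow_error(u, v, u_gt, v_gt):
    # Eq. (18): absolute error in flow (AEF), the endpoint distance.
    return float(np.sqrt((u - u_gt) ** 2 + (v - v_gt) ** 2))
```

Per-pixel AE and AEF maps follow by applying the last two functions elementwise over the estimated and ground-truth flow fields.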
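The peak-ratio confidence plotted in figure 8 (highest over second-highest peak of the correlation surface) can be sketched for standard phase correlation as below. This is generic PC via the normalised cross-power spectrum, not the paper's BLPC code; names are illustrative.

```python
import numpy as np

def phase_correlation(current, reference, eps=1e-12):
    # Normalised cross-power spectrum of two equal-sized blocks; the inverse
    # FFT gives the correlation surface whose peak encodes the shift of
    # `current` relative to `reference`.
    F1 = np.fft.fft2(current)
    F2 = np.fft.fft2(reference)
    cross = F1 * np.conj(F2)
    return np.real(np.fft.ifft2(cross / (np.abs(cross) + eps)))

def peak_shift_and_ratio(surface):
    # Integer displacement at the strongest peak, plus the highest over
    # second-highest peak ratio used as a confidence measure.
    order = np.argsort(surface.ravel())
    best, second = surface.ravel()[order[-1]], surface.ravel()[order[-2]]
    dy, dx = np.unravel_index(order[-1], surface.shape)
    h, w = surface.shape
    dy = dy - h if dy > h // 2 else dy  # wrap to signed displacements
    dx = dx - w if dx > w // 2 else dx
    return (int(dy), int(dx)), float(best / max(second, np.finfo(float).eps))
```

A single dominant motion yields one sharp peak and a large ratio; multiple motions inside the block split the energy into several peaks and drive the ratio towards one, which is the situation BLPC is designed to handle.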
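The bilateral principle that BLPC borrows, down-weighting neighbours by both spatial distance and intensity difference so that pixels across a motion boundary contribute little, can be sketched as follows. The sigma values and the function name are illustrative choices for this sketch only.

```python
import numpy as np

def bilateral_weights(block, center, sigma_s=8.0, sigma_r=0.1):
    # Gaussian in spatial distance from the centre pixel, multiplied by a
    # Gaussian in intensity difference from the centre value: neighbours
    # that are far away or lie across an intensity edge get small weights.
    h, w = block.shape
    cy, cx = center
    ys, xs = np.mgrid[0:h, 0:w]
    spatial = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma_s ** 2))
    intensity = np.exp(-((block - block[cy, cx]) ** 2) / (2.0 * sigma_r ** 2))
    return spatial * intensity
```

Weighting a block this way before correlation suppresses the second motion's contribution to the spectrum, which is how the retained motion boundaries described in the conclusion arise.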