Video Recovery Via Learning Variation and Consistency of Images

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17) Video Recovery via Learning Variation and Consistency of Images Zhouyuan Huo,1 Shangqian Gao,2 Weidong Cai,3 Heng Huang1∗ 1Department of Computer Science and Engineering, University of Texas at Arlington, USA 2College of Engineering, Northeastern University, USA 3School of Information Technologies, University of Sydney, NSW 2006, Australia [email protected], [email protected], [email protected], [email protected] Abstract 2010). However, trace norm is not the best approximation for rank minimization regularization. If a non-zero singular Matrix completion algorithms have been popularly used to recover images with missing entries, and they are proved value of a matrix varies, the trace norm value will change si- to be very effective. Recent works utilized tensor comple- multaneously, but the rank of the matrix keeps constant. So, tion models in video recovery assuming that all video frames there is a big gap between trace norm and rank minimiza- are homogeneous and correlated. However, real videos are tion, and a better model to approximate rank minimization made up of different episodes or scenes, i.e. heterogeneous. regularization is desired for learning a low-rank structure. Therefore, a video recovery model which utilizes both video In a recent work (Liu et al. 2009), Liu applied low-rank spatiotemporal consistency and variation is necessary. To model to solve image recovery problem. He assumed that solve this problem, we propose a new video recovery method video is in low-rank structure, however, videos are not ho- Sectional Trace Norm with Variation and Consistency Con- mogeneous, and they may contain sequences of images from straints (STN-VCC). In our model, capped 1-norm regular- different episodes. If we simply project all the video into a ization is utilized to learn the spatial-temporal consistency and variation between consecutive frames in video clips. low-rank subspace, it is very likely to lose important infor- Meanwhile, we introduce a new low-rank model to capture mation in the original video. More recently, Wang et al. in- the low-rank structure in video frames with a better approxi- troduced a spatiotemporal consistency low-rank tensor com- mation of rank minimization than traditional trace norm. An pletion method (Wang, Nie, and Huang 2014). They pro- efficient optimization algorithm is proposed, and we also proposed to utilize the content continuity within videos by im- vide a proof of convergence in the paper. We evaluate the posing a new 2-norm smoothness regularization. However, proposed method via several video recovery tasks and exper- this 2-norm smoothness regularization tends to force two iment results show that our new method consistently outper- consecutive frames to be similar, and it can not capture spa- forms other related approaches. tiotemporal variation properly. In this paper, we propose a novel video recovery model Introduction Sectional Trace Norm with Variation and Consistency Con- Image recovery task can be treated as a matrix completion straints (STN-VCC). Our contributions are summarized as problem, and it is proved to be really effective (Buades, following: firstly, we utilize capped 1-norm smoothness Coll, and Morel 2005; Ji et al. 2010; Liu et al. 2009; function to capture the spatiotemporal variation and consis- Wang, Nie, and Huang 2014). In this task, pixels in an image tency between consecutive frames in video clips; secondly, are missing because of noise or occlusion, and we need to we introduce a new sectional trace norm to capture the low- make use of observed information to reconstruct the original rank structure, which is a better approximation of rank min- image without deviating too much from it. Rank minimiza- imization than the traditional trace norm. Thirdly, instead of tion regularization is proposed to solve matrix completion using Alternating Direction Method of Multipliers (ADMM) problems, and has already achieved great success in other which was used by previous video recovery models, we use fields (Bennett and Lanning 2007). However, it leads to a proximal gradient descent method to optimize and prove that non-convex and NP-hard problem. To solve this problem, our method is guaranteed to converge. trace norm is introduced to approximate rank minimization, and since then many algorithms have been invented to solve Video Recovery with Capped Norm matrix completion problems using trace norm regularization Constrained Variation and Consistency (Shamir and Shalev-Shwartz 2014; Cai, Candes,` and Shen Capture Spatiotemporal Consistency and Variation ∗ To whom all correspondence should be addressed. This work in Video Clip via Capped 1-norm was partially supported by the following grants: NSF-IIS 1302675, In (Wang, Nie, and Huang 2014), authors proposed a 2- NSF-IIS 1344152, NSF-DBI 1356628, NSF-IIS 1619308, NSF-IIS || || 1633753, NIH R01 AG049371. norm regularization E 2 to maintain spatiotemporal con- Copyright c 2017, Association for the Advancement of Artificial sistency in video clips, it is reasonable to suppose that two Intelligence (www.aaai.org). All rights reserved. consecutive frames in video clips are alike. However, not 4082 all video sequences share the same content, spatiotempo- is also clear that capped 1-norm outperforms 1-norm. In ral variation is also very frequent in video clips. 1-norm large difference ranges, |Difference|≥100 in the exper- is a common alternative to 2-norm, and it is known for its iment, capped 1-norm does not penalize these differences robustness to outliers. It is intuitive to utilize 1-norm reg- for they represent different objects or episodes. ularization to capture both spatiotemporal consistency and variation. However, one video clip is usually made up of Sectional Trace Norm Based Low-Rank Model many different episodes, and it is natural that two consec- Recently, the low-rank model and its approximation have utive frames are from different scenes and backgrounds. been successfully applied to recover the images and videos In this case, most of the pixels in two consecutive frames with missing pixel. Most of these works used trace norm are different, and even 1-norm regularization fails. In this to recover the low-rank structure of the image matrix. For paper, we propose to use capped 1-norm regularization a low-rank matrix X ∈ Rn×m, rank(X)=r,soitsk- min(|e|,ε) to capture the spatiotemporal consistency = − e∈E smallest singular values should be zero, where k d r = ( ) || || and variation in video clips. Capped 1-norm function is able and d min n, m . Note that trace norm X ∗ denotes to ignore the errors larger than ε, penalize the errors smaller the sum of singular values. If the non-zero singular values || || than ε, and it is also used to handle outliers (Zhang 2009; of matrix X change, X ∗ will change as well, but the Gong et al. 2013; Huo, Nie, and Huang 2016; Gao et al. rank of X keeps constant. Thus, there is a big gap between || || ( ) 2015; Jiang, Nie, and Huang 2015). In this paper, it prop- trace norm X ∗ and rank X . erly solves the problem of spatiotemporal consistency and In this paper, we propose a new sectional trace norm variation in video clips. to uncover the low-rank structure of a matrix, ||X||str = k We perform an experiment to illustrate the advantage of 2 σi (X). In the sectional trace norm regularization, we capped 1-norm over 2-norm and 1-norm. The objective i=1 function of this task is minimize the sum of k-smallest singular value squares of X 2 min ||X − D1||F + λreg(X − D2) (1) and ignore the other larger singular values. When the non- X zero singular values increase largely, they are excluded by 288×352 where D1,D2 ∈ R are two distinct consecutive im- our sectional trace norm such that the norm value keeps con- 2 age matrices, and reg(E)=||E||F , reg(E)=||E||1 or stant. reg(E)= min(|e|,ε) for three models. In this experi- Alternatively, in (Hu et al. 2013), truncated trace norm e∈E was proposed to minimize the sum of k-smallest singular =05 = 100 ment, we set λ . and ε . We use stochastic gradi- values, and it can also avoid the effect of large singular 0 1 ent method to optimize and learning rate is . values, and is better than the traditional trace norm. How- ever, minimizing the sum of k-smallest singular values is ×10 4 3 16000 D1 - D2 D1 - D2 a 1 minimization problem, which leads to sparse solution, X - D2 14000 X - D2 2.5 12000 namely some k-smallest singular values will be zero, but 2 10000 some of them may get large values. Our sectional trace norm 1.5 8000 can solve this issue. When we minimize the sectional trace 6000 Frequency Frequency 1 norm, the sum of square of k-smallest singular values is min- 4000 0.5 2000 imized so that all of them will be shrunk near to zero. 0 0 We perform a low-rank matrix approximation experiment -300 -200 -100 0 100 200 300 -300 -200 -100 0 100 200 300 Difference Difference to illustrate the advantage of sectional trace norm over trun- (a) 2-norm (b) 1-norm cated trace norm. The objective function of this task is: 16000 2 D1 - D2 min || − || + ( ) 14000 X - D2 X D F λregX , (2) X 12000 10000 where D ∈ R288×352 is an image matrix, and reg(X)= 8000 k k 6000 Frequency ( ) ( )= 2( ) 4000 σi X or reg X σi X for two models. We i=1 i=1 2000 ( )=12 0 compute a low-rank matrix X and rank X ,so -300 -200 -100 0 100 200 300 Difference k = 276 and λ = 1000.

Load more