Total Variation and Euler's Elastica for Supervised Learning

Total Variation and Euler's Elastica for Supervised Learning Tong Lin [email protected] Hanlin Xue [email protected] Ling Wang* [email protected] Hongbin Zha [email protected] The Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, China *LTCI, TélécomParisTech, Paris, France Abstract Euler Elastica decision boundary In recent years, total variation (TV) and Eu- 1 ler's elastica (EE) have been successfully ap- 0.8 0.6 plied to image processing tasks such as de- 0.4 noising and inpainting. This paper inves- 0.2 tigates how to extend TV and EE to the 0 supervised learning settings on high dimen- 0 0.2 0.4 0.6 0.8 1 sional data. The supervised learning problem can be formulated as an energy functional Figure 1. Results on two moon data by the EE classifier: minimization under Tikhonov regularization (Left) decision boundary; (Right) learned target function. scheme, where the energy is composed of a squared loss and a total variation smoothing (or Euler's elastica smoothing). Its so- prehensive empirical comparison of these methods in lution via variational principles leads to an (Caruana & Niculescu-Mizil, 2006)). Existing meth- Euler-Lagrange PDE. However, the PDE is ods can be roughly divided into statistics based and always high-dimensional and cannot be di- function learning based (Kotsiantis et al., 2006). One rectly solved by common methods. Instead, advantage of function learning methods is that pow- radial basis functions are utilized to approxi- erful mathematical theories in functional analysis can mate the target function, reducing the prob- be utilized rather than doing optimizations on discrete lem to finding the linear coefficients of basis data points. functions. We apply the proposed methods to supervised learning tasks (including binary Most function learning methods can be derived from classification, multi-class classification, and Tikhonov regularization, which minimizes a loss term regression) on benchmark data sets. Exten- plus a smoothing regularizer. The most successful sive experiments have demonstrated promis- classification and regression method is SVM (Bishop, ing results of the proposed methods. 2006; Hastie T., 2009; Shawe-Taylor & Cristianini, 2000), whose cost function is composed of a hinge loss and a RKHS norm determined by a kernel. Re- 1. Introduction placing the hinge loss by a squared loss, the modified algorithm is called Regularized Least Squares (RLS) Supervised learning (Bishop, 2006; Hastie T., 2009) in- method (Rifkin, 2002). In addition, manifold regu- fers a function that maps inputs to desired outputs larization (Belkin et al., 2006) introduced a regular- under the guidance of training data. Two main tasks izer of squared gradient magnitude on manifolds. Its in supervised learning are classification and regres- discrete version amounts to graph Laplacian regular- sion. A huge number of supervised learning methods ization (Nadler et al., 2009; Zhou & Schölkopf, 2005), have been developed in several decades (see a com- which approximates the original energy functional. A most recent work is the geometric level set (GLS) Appearing in Proceedings of the 29 th International Confer- classifier (Varshney & Willsky, 2010), with an energy ence on Machine Learning, Edinburgh, Scotland, UK, 2012. functional composed of a margin-based loss and a ge- Copyright 2012 by the author(s)/owner(s). ometric regularization term based on the surface area Total Variation and Euler's Elastica for Supervised Learning of the decision boundary. Experiments showed that Simply, it is a measure of the total quantity of the GLS is competitive with SVM and other state-of-the- change of a function. Notice that if f 0(x) > 0; x 2 art classifiers. [a; b], it is exactly f(b) − f(a) by the basic theorem of calculus. Total variation has been widely used for im- In this paper, the supervised learning problem is for- age processing tasks such as denoising and inpainting. mulated as an energy functional minimization un- The pioneering work is Rudin, Osher, and Fatemi's der Tikhonov regularization scheme, with the en- image denoising model (Rudin et al., 1992): ergy composed of a squared loss and a total varia- Z tion (TV) penalty or an Euler's elastica (EE) penalty. 2 J = ((I − I0) + λjrIj)dx; Since the TV and EE models have achieved great Ω success in image denoising and image inpainting (Aubert & Kornprobst, 2006; Barbero & Sra, 2011; where I0 is the input image with noise, I the desired Chan & Shen, 2005), a natural question is whether the output image, λ a regulation parameter that balances success of TV and EE models on image processing ap- two terms, and Ω a 2D image domain. The first fitting plications can be transferred to high dimensional data term measures the fidelity to the input, while the sec- analysis such as supervised learning. This paper inves- ond is a p-Sobolev regularization term (p = 1) where r tigates the question by extending TV and EE models I is understood in the distributional sense. The main to supervised learning settings, and evaluating their merit is to preserve significant image edges during performance on benchmark data sets against state-of- denoising (Aubert & Kornprobst, 2006; Chan & Shen, the-art methods. Figure 1 shows the classification re- 2005). Note that TV may have different definitions sult on the popular two moon data by the EE clas- (Barbero & Sra, 2011). sifier, and the learned target function. Interestingly, In the machine learning literature, p-Sobolev regular- the GLS classifier (Varshney & Willsky, 2010) is also izer can be found in nonparametric smoothing splines, motivated by image processing techniques, and its gra- generalized additive models, and projection pursuit re- dient descent time marching leads to a mean curvature gression models (Hastie T., 2009). Specifically, Belkin flow. et al. proposed the manifold regularization term Z The paper is organized as follows. We begin with a jr j2 brief review of TV and EE in Section 2. In Section M f dx; x2M 3 the proposed models are described, and numerical solutions are developed in Section 4. Section 5 presents on a manifold M (Belkin et al., 2006). On the other the experimental results, and Section 6 concludes this hand, discrete graph Laplacian regularization was dis- paper. cussed in (Zhou & Schölkopf, 2005) as X p jrvfj ; 2. Preliminaries v2V We briefly introduce total variation and Euler's elas- where v is a vertex from V , and p is an arbitrary num- tica from an image processing perspective, and point ber. This penalty measures the roughness of f over a out connections with prior work in the machine learn- graph. ing literature. 2.2. Euler's Elastica (EE) 2.1. Total Variation (TV) Euler (1744) first introduced the elastica energy for The total variation of a 1D real-valued function f is a curve on modeling torsion-free elastic rods. Then defined as Mumford (Mumford, 1991) reintroduced elastica into computer vision. Later, elastica based image inpaint- − nXp 1 ing methods were developed in (Chan et al., 2002; a j − j Vb (f) = sup f(xi+1) f(xi) ; Masnou & Morel, 1998). i=0 A curve γ is said to be Euler's elastica if it is the where the supremum runs over all partitions of given equilibrium curve of the elasticity energy: Z interval [a; b]. If f is differentiable, the total variation 2 can be written as E[γ] = (a + bκ )ds; (1) γ Z b where a and b stand for two positive constant weights, a j 0 j Vb (f) = f (x) dx: κ denotes the scalar curvature, and ds is the arc length a Total Variation and Euler's Elastica for Supervised Learning element. Euler obtained the energy in studying the posed in the literature: hinge loss for SVM, squared steady shape of a thin and torsion-free rod under ex- loss for RLS, logistic loss for logistic regression, Huber ternal forces. The curve implies the lowest elastica en- loss, exponential loss, and among others. Throughout ergy, thus getting its name. According to (Mumford, the paper, squared loss is used in all models due to its 1991), the key link between the elastica and image rather simpler differential form. inpainting relies on the the interpolation capability of elastica. That is, elastica can comply to the con- 3.2. Laplacian Regularization (LR) nectivity principle better than total variation. Such kinds of "nonlinear splines", like classical polynomial A commonly used model using squared loss can be splines, are natural tools for completing the missing or written as occluded edges. Xn 2 min (u(xi) − yi) + λS(u): (5) The Euler's elastica based inpainting model was pro- i=1 posed as (Chan & Shen, 2005) Z Z If the RKHS norm is used for the smoothing term, 2 2 the model is called regularized least squares (RLS) J = (I − I0) dx + λ (a + bκ )jrIjdx; (2) ΩnD Ω (Rifkin, 2002). Another natural choice is the squared L -norm of the gradient: S(u) = jruj2, as proposed in where D is the region to be inpainted, Ω the whole 2 (Belkin et al., 2006). Under a continuous setting, we image domain, and κ the curvature of the associated get the following Laplacian regularization (LR) model: level set curve with Z rI − 2 jr j2 κ = r · ( ): (3) JLR[u] = ((u y) + λ u )dx: (6) jrIj Ω By using calculus of variation, its minimization is re- This LR model has been widely used in the image pro- duced to an nonlinear Euler-Lagrange equation. The cessing literatures. Using calculus of variations, the finite difference scheme can be used to give numerical minimization can be reduced to the following Euler- implementation, and experimental results show that Lagrange partial differential equation (PDE) with a the EE based inpainting performs better than its TV natural boundary condition along @Ω: ( version.

Load more