arXiv:1602.05610v2 [math.CA] 7 Mar 2016

Closed Form for Some Gaussian Convolutions

Hossein Mobahi
Computer Science & Artificial Intelligence Lab.
Massachusetts Institute of Technology
Cambridge, MA, USA
[email protected]

Abstract

The convolution of a function with an isotropic Gaussian appears in many contexts such as differential equations, computer vision, and signal processing. Although this convolution does not always have a closed form expression, there are important families of functions for which a closed form exists. This article investigates some of such cases.

1 Introduction

Motivation. The convolution of a function with an isotropic Gaussian, also known as the Weierstrass transform, appears in many contexts. In differential equations, it appears as the solution of diffusion equations [Widder, 1975]. In signal processing, it defines filters for noise reduction [Blinchikoff and Zverev, 1976] and for preventing aliasing in sampled data [Foley et al., 1990]. In computer vision and image processing, it is used to blur images, as in generating multi-scale or scale-space representations of images [Koenderink, 1984]. In addition, derivatives of the blurred images are used for defining visual operations and features, such as in SIFT [Lowe, 1999]. Finally, in optimization theory, it is used to simplify a nonconvex objective by suppressing spurious critical points [Mobahi and Fisher, 2015].

Numerical Approximation. In general, the Weierstrass transform does not have a closed form expression. When that is the case, the transform needs to be computed or approximated numerically. Recent efforts, mainly motivated by optimization applications, have resulted in efficient algorithms for such numerical approximations, e.g. by adaptive importance sampling [Maggiar et al., 2015].

Exact Closed Form. Putting the general arguments aside, there are important families of functions for which the Weierstrass transform does have a closed form. This article investigates two such families, namely, multivariate polynomials and Gaussian radial basis functions. These two classes are rich: they are able to approximate any arbitrary function (under some mild conditions) up to a desired accuracy. In addition, we present a result which provides the closed form Weierstrass transform of a multivariate function, given the closed form transform of its univariate cases.

Notation. We use $x$ for scalars, $\mathbf{x}$ for vectors, $\|\cdot\|$ for the $\ell_2$ norm, and $i$ for $\sqrt{-1}$. We also use $\triangleq$ for equality by definition. We define the isotropic Gaussian kernel $k_\sigma(\mathbf{x})$ as below.

$$k_\sigma(\mathbf{x}) \triangleq \big(\sqrt{2\pi}\,\sigma\big)^{-\dim(\mathbf{x})}\, e^{-\frac{\|\mathbf{x}\|^2}{2\sigma^2}}\,.$$
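Since closed forms like the ones derived below are easy to get wrong, it helps to cross-check them numerically. The following is a minimal sketch of such a check in one dimension, assuming NumPy and SciPy are available; the helper names `k_sigma` and `weierstrass` are chosen here for illustration.

```python
import numpy as np
from scipy.integrate import quad

def k_sigma(x, sigma):
    # One-dimensional isotropic Gaussian kernel k_sigma(x).
    return np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def weierstrass(f, x, sigma):
    # [f * k_sigma](x) = integral of f(y) k_sigma(x - y) over y, by quadrature.
    val, _ = quad(lambda y: f(y) * k_sigma(x - y, sigma), -np.inf, np.inf)
    return val

# Smoothing f(y) = y^2 should give sigma^2 + x^2 (see Section 2.1).
x, sigma = 1.5, 0.7
print(weierstrass(lambda y: y**2, x, sigma))  # ~2.74
print(sigma**2 + x**2)                        # 2.74
```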

2 Closed Form Expressions

2.1 Polynomials

Given a multivariate function $f : \mathbb{R}^n \to \mathbb{R}$, we show that the convolution of $f$ with $k_\sigma$ has a closed form expression. Suppose $f$ consists of $m$ monomials,

$$f(\mathbf{x}) \triangleq \sum_{j=1}^{m} a_j\, f_j(\mathbf{x})\,,$$

where $f_j$ is the $j$'th term and $a_j$ is its coefficient. Each monomial has the form $f_j(\mathbf{x}) \triangleq \prod_{d=1}^n x_d^{p_{d,j}}$, with $x_d$ being the $d$'th component of the vector $\mathbf{x}$, and $p_{d,j}$ being a nonnegative integer exponent of $x_d$ in $f_j$. Then we have,

$$[f \star k_\sigma](\mathbf{x}) = \sum_{j=1}^m a_j\, [f_j \star k_\sigma](\mathbf{x}) = \sum_{j=1}^m a_j \prod_{d=1}^n \big[(\cdot)^{p_{d,j}} \star k_\sigma\big](x_d) = \sum_{j=1}^m a_j \prod_{d=1}^n u(x_d, p_{d,j}, \sigma)\,,$$

where the second equality uses the separability of the isotropic Gaussian $k_\sigma$ across coordinates.

The function $u$ has the following form (see the Appendix for the proof),
$$u(x, p, \sigma) \triangleq [(\cdot)^p \star k_\sigma](x) = (i\sigma)^p\, h_p\Big(\frac{x}{i\sigma}\Big)\,,$$

where $h_p$ is the $p$'th Hermite polynomial, i.e.,

$$h_p(x) \triangleq (-1)^p\, e^{\frac{x^2}{2}}\, \frac{d^p}{dx^p}\, e^{-\frac{x^2}{2}}\,.$$
The following table lists $u(x, p, \sigma)$ for $0 \le p \le 10$.

p     u(x, p, σ)
0     1
1     x
2     σ^2 + x^2
3     3σ^2 x + x^3
4     3σ^4 + 6σ^2 x^2 + x^4
5     15σ^4 x + 10σ^2 x^3 + x^5
6     15σ^6 + 45σ^4 x^2 + 15σ^2 x^4 + x^6
7     105σ^6 x + 105σ^4 x^3 + 21σ^2 x^5 + x^7
8     105σ^8 + 420σ^6 x^2 + 210σ^4 x^4 + 28σ^2 x^6 + x^8
9     945σ^8 x + 1260σ^6 x^3 + 378σ^4 x^5 + 36σ^2 x^7 + x^9
10    945σ^10 + 4725σ^8 x^2 + 3150σ^6 x^4 + 630σ^4 x^6 + 45σ^2 x^8 + x^10

Example Let $f(x, y) \triangleq x^2 y^3 + \lambda (x^4 + y^4)$. Then, using the table, $[f \star k_\sigma](x, y)$ has the following form,

$$[f \star k_\sigma](x, y) = (\sigma^2 + x^2)(3\sigma^2 y + y^3) + \lambda\,\big(3\sigma^4 + 6\sigma^2 x^2 + x^4 + 3\sigma^4 + 6\sigma^2 y^2 + y^4\big)\,.$$
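This example can be checked numerically. A sketch follows, assuming NumPy is available; it uses Gauss–Hermite quadrature and the identity $[f \star k_\sigma](\mathbf{x}) = \mathbb{E}[f(\mathbf{x} - Z)]$ with $Z \sim \mathcal{N}(0, \sigma^2 I)$.

```python
import numpy as np

lam, sigma, x, y = 0.3, 0.8, 1.2, -0.5
t, w = np.polynomial.hermite.hermgauss(40)   # nodes/weights for weight e^{-t^2}

def smooth2d(f, x, y, sigma):
    # Tensor-product quadrature for E[f(x - Zx, y - Zy)], Zx, Zy ~ N(0, sigma^2).
    u = x - np.sqrt(2) * sigma * t[:, None]
    v = y - np.sqrt(2) * sigma * t[None, :]
    return (w[:, None] * w[None, :] * f(u, v)).sum() / np.pi

f = lambda u, v: u**2 * v**3 + lam * (u**4 + v**4)
closed = ((sigma**2 + x**2) * (3 * sigma**2 * y + y**3)
          + lam * (3 * sigma**4 + 6 * sigma**2 * x**2 + x**4
                   + 3 * sigma**4 + 6 * sigma**2 * y**2 + y**4))
print(smooth2d(f, x, y, sigma), closed)      # the two values should agree
```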

2.2 Gaussian Radial Basis Functions

Isotropic Gaussian Radial-Basis-Functions (RBFs) are general function approximators [Hartman et al., 1990]. That is, any function can be approximated to a desired accuracy by the following form,

$$f(\mathbf{x}) \triangleq \sum_{j=1}^m a_j\, e^{-\frac{\|\mathbf{x}-\mathbf{x}_j\|^2}{2\delta_j^2}}\,,$$

where $\mathbf{x}_j \in \mathbb{R}^n$ is the center and $\delta_j \in \mathbb{R}_+$ is the width of the $j$'th basis function, for $j = 1, \dots, m$. We have shown in [Mobahi et al., 2012] that the convolution of $f$ with $k_\sigma$ has the following form,

$$[f \star k_\sigma](\mathbf{x}) = \sum_{j=1}^m a_j \Big(\frac{\delta_j}{\sqrt{\delta_j^2 + \sigma^2}}\Big)^{\!n}\, e^{-\frac{\|\mathbf{x}-\mathbf{x}_j\|^2}{2(\delta_j^2 + \sigma^2)}}\,.$$

In order to be self-contained, we include our proof as presented in [Mobahi et al., 2012] in the appendix of this article.
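As a sanity check, the formula can be compared against a Monte Carlo estimate of the convolution. The following is a sketch under the assumption that NumPy is available; all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, m = 3, 0.6, 4
centers = rng.normal(size=(m, n))         # x_j
widths = rng.uniform(0.5, 1.5, size=m)    # delta_j
coeffs = rng.normal(size=m)               # a_j
x = np.array([0.3, -1.0, 0.7])

# Monte Carlo estimate of [f * k_sigma](x) = E[f(x - Z)], Z ~ N(0, sigma^2 I).
Z = rng.normal(scale=sigma, size=(200_000, n))
d2 = (((x - Z)[:, None, :] - centers)**2).sum(axis=-1)
mc = (np.exp(-d2 / (2 * widths**2)) * coeffs).sum(axis=1).mean()

# Closed form smoothed RBF.
s2 = widths**2 + sigma**2
closed = (coeffs * (widths / np.sqrt(s2))**n
          * np.exp(-((x - centers)**2).sum(axis=1) / (2 * s2))).sum()
print(mc, closed)   # should agree to a few decimal places
```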

2.3 Trigonometric Functions

The following table lists the Gaussian convolution $[f \star k_\sigma](x)$ of some basic trigonometric functions $f$.

f(x)             [f ⋆ k_σ](x)
sin(x)           e^{-σ^2/2} sin(x)
cos(x)           e^{-σ^2/2} cos(x)
sin^2(x)         (1 − e^{-2σ^2} cos(2x)) / 2
cos^2(x)         (1 + e^{-2σ^2} cos(2x)) / 2
sin(x) cos(x)    e^{-2σ^2} sin(x) cos(x)
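The table entries can be verified numerically, e.g. by quadrature. A minimal sketch, assuming SciPy is available:

```python
import numpy as np
from scipy.integrate import quad

def smooth(f, x, sigma):
    # [f * k_sigma](x) by quadrature; +-10 sigma covers the Gaussian mass.
    g = lambda y: f(y) * np.exp(-(x - y)**2 / (2 * sigma**2))
    val, _ = quad(g, x - 10 * sigma, x + 10 * sigma)
    return val / (np.sqrt(2 * np.pi) * sigma)

x, s = 0.9, 0.5
print(smooth(np.sin, x, s), np.exp(-s**2 / 2) * np.sin(x))
print(smooth(lambda y: np.sin(y)**2, x, s),
      0.5 * (1 - np.exp(-2 * s**2) * np.cos(2 * x)))
print(smooth(lambda y: np.sin(y) * np.cos(y), x, s),
      np.exp(-2 * s**2) * np.sin(x) * np.cos(x))
```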

Example Let $f(x, y) \triangleq \big(2\cos(x) - 3\sin(y) + 4\cos(x)\sin(y) - 8\sin(x)\cos(y) + 4\big)^2$. Compute $[f \star k_\sigma](x, y)$. We expand the quadratic form and then use the table, as shown below.

$$\big(2\cos(x) - 3\sin(y) + 4\cos(x)\sin(y) - 8\sin(x)\cos(y) + 4\big)^2 \qquad (1)$$
$$= 4\cos^2(x) + 9\sin^2(y) + 16\cos^2(x)\sin^2(y) + 64\sin^2(x)\cos^2(y) + 16 \qquad (2)$$
$$\phantom{=}\, - 12\cos(x)\sin(y) + 16\cos^2(x)\sin(y) - 32\cos(x)\sin(x)\cos(y) \qquad (3)$$
$$\phantom{=}\, + 16\cos(x) - 24\cos(x)\sin^2(y) + 48\sin(y)\sin(x)\cos(y) - 24\sin(y) \qquad (4)$$
$$\phantom{=}\, - 64\cos(x)\sin(y)\sin(x)\cos(y) + 32\cos(x)\sin(y) - 64\sin(x)\cos(y)\,. \qquad (5)$$

Since $k_\sigma$ is separable across coordinates, each product term above smooths factor-wise in $x$ and $y$, and every factor appears in the table.

2.4 Functions with Linear Argument

This result tells us about the form of $[f(\cdot^\top \mathbf{x}) \star k_\sigma(\cdot)](\mathbf{w})$ given $[f \star k_\sigma](y)$.

Lemma 1 Let $f : \mathbb{R} \to \mathbb{R}$ have well-defined Gaussian convolution $\tilde f(y\,; \sigma) \triangleq [f \star k_\sigma](y)$. Let $\mathbf{x}$ and $\mathbf{w}$ be any real vectors of the same length. Then $[f(\cdot^\top \mathbf{x}) \star k_\sigma(\cdot)](\mathbf{w}) = \tilde f(\mathbf{w}^\top \mathbf{x}\,;\ \sigma \|\mathbf{x}\|)$.

We now use this result in the following examples. These examples are related to the earlier tables of smoothed responses.

Example Suppose $f(y) \triangleq \operatorname{sign}(y)$. It is easy to check that $\tilde f(y; \sigma) = \operatorname{erf}\big(\frac{y}{\sqrt{2}\sigma}\big)$. Applying the lemma to this implies that $[\operatorname{sign}(\cdot^\top \mathbf{x}) \star k_\sigma(\cdot)](\mathbf{w}) = \operatorname{erf}\big(\frac{\mathbf{w}^\top \mathbf{x}}{\sqrt{2}\sigma \|\mathbf{x}\|}\big)$.

Example Suppose $f(y) \triangleq \max(0, y)$. It is easy to check that $\tilde f(y; \sigma) = \frac{\sigma}{\sqrt{2\pi}}\, e^{-\frac{y^2}{2\sigma^2}} + y\, \frac{1}{2}\big(1 + \operatorname{erf}\big(\frac{y}{\sqrt{2}\sigma}\big)\big)$. Applying the lemma to this implies that
$$[\max(0, \cdot^\top \mathbf{x}) \star k_\sigma(\cdot)](\mathbf{w}) = \frac{\sigma \|\mathbf{x}\|}{\sqrt{2\pi}}\, e^{-\frac{(\mathbf{w}^\top \mathbf{x})^2}{2\sigma^2 \|\mathbf{x}\|^2}} + \mathbf{w}^\top \mathbf{x}\; \frac{1}{2}\Big(1 + \operatorname{erf}\Big(\frac{\mathbf{w}^\top \mathbf{x}}{\sqrt{2}\sigma \|\mathbf{x}\|}\Big)\Big)\,.$$

Example Suppose $f(y) \triangleq \sin(y)$. From the trigonometric table in 2.3 we know $\tilde f(y; \sigma) = e^{-\frac{\sigma^2}{2}} \sin(y)$. Applying the lemma to this implies that $[\sin(\cdot^\top \mathbf{x}) \star k_\sigma(\cdot)](\mathbf{w}) = e^{-\frac{\sigma^2 \|\mathbf{x}\|^2}{2}} \sin(\mathbf{w}^\top \mathbf{x})$. In particular, when $\mathbf{x} \triangleq (x, 1)$, the smoothed version of $\sin(w_1 x + w_2)$ w.r.t. $\mathbf{w}$ is $e^{-\frac{\sigma^2 (1 + x^2)}{2}} \sin(w_1 x + w_2)$.
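The lemma itself can be checked by Monte Carlo. Below is a sketch for the $\max(0, \cdot)$ example, assuming NumPy and SciPy are available; the specific vectors are arbitrary test values.

```python
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(1)
x = np.array([0.8, -0.4, 1.1])
w = np.array([0.2, 0.5, -0.3])
sigma = 0.7

# Monte Carlo estimate of [max(0, (.)^T x) * k_sigma](w) in R^3.
E = rng.normal(scale=sigma, size=(500_000, 3))
mc = np.maximum(0.0, (w - E) @ x).mean()

# Closed form via the lemma, with s = sigma * ||x|| and t = w^T x.
s, t = sigma * np.linalg.norm(x), w @ x
closed = (s / np.sqrt(2 * np.pi) * np.exp(-t**2 / (2 * s**2))
          + t * 0.5 * (1 + erf(t / (np.sqrt(2) * s))))
print(mc, closed)   # should agree to a couple of decimal places
```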

References

[Blinchikoff and Zverev, 1976] Blinchikoff, H. and Zverev, A. (1976). Filtering in the time and frequency domains. R.E. Krieger Pub. Co.

[Foley et al., 1990] Foley, J. D., van Dam, A., Feiner, S. K., and Hughes, J. F. (1990). Computer Graphics: Principles and Practice (2nd ed.). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.

[Hartman et al., 1990] Hartman, E., Keeler, J. D., and Kowalski, J. M. (1990). Layered neural networks with Gaussian hidden units as universal approximators. Neural Comput., 2(2):210–215.

[Koenderink, 1984] Koenderink, J. J. (1984). The structure of images. Biological Cybernetics, 50(5):363–370.

[Lowe, 1999] Lowe, D. G. (1999). Object recognition from local scale-invariant features. In Proceedings of the International Conference on Computer Vision, ICCV '99.

[Maggiar et al., 2015] Maggiar, A., Waechter, A., Dolinskaya, I. S., and Staum, J. (2015). A derivative-free trust-region algorithm for the optimization of functions smoothed via Gaussian convolution using adaptive multiple importance sampling. Technical report, IEMS Department, Northwestern University.

[Mobahi and Fisher, 2015] Mobahi, H. and Fisher, III, J. W. (2015). On the link between Gaussian homotopy continuation and convex envelopes. In Lecture Notes in Computer Science (EMMCVPR 2015). Springer.

[Mobahi et al., 2012] Mobahi, H., Zitnick, C. L., and Ma, Y. (2012). Seeing through the blur. In 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2012), Rhode Island, USA. IEEE Computer Society.

[Widder, 1975] Widder, D. V. (1975). The Heat Equation. Academic Press.

Appendices

A Polynomials

Here we prove that $[(\cdot)^p \star k_\sigma](x) = (i\sigma)^p\, h_p\big(\frac{x}{i\sigma}\big)$, where $i \triangleq \sqrt{-1}$ and $h_p$ is the $p$'th Hermite polynomial $h_p(x) \triangleq (-1)^p\, e^{\frac{x^2}{2}}\, \frac{d^p}{dx^p}\, e^{-\frac{x^2}{2}}$. Using the unitary Fourier transform $\mathcal{F}\{f\}(\omega) \triangleq \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(x)\, e^{-i\omega x}\, dx$, we proceed as below,

$$[(\cdot)^p \star k_\sigma](x) \qquad (6)$$
$$= \mathcal{F}^{-1}\big\{\mathcal{F}\{[(\cdot)^p \star k_\sigma](x)\}\big\} \qquad (7)$$
$$= \mathcal{F}^{-1}\big\{\sqrt{2\pi}\; \mathcal{F}\{x^p\}\; \mathcal{F}\{k_\sigma(x)\}\big\} \qquad (8)$$
$$= \mathcal{F}^{-1}\Big\{\sqrt{2\pi}\, \big(\sqrt{2\pi}\, i^p\, \delta^{(p)}(\omega)\big)\, \Big(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\sigma^2\omega^2}\Big)\Big\} \qquad (9)$$
$$= \sqrt{2\pi}\, i^p\; \mathcal{F}^{-1}\big\{\delta^{(p)}(\omega)\, e^{-\frac{1}{2}\sigma^2\omega^2}\big\} \qquad (10)$$
$$= i^p \int_{\mathbb{R}} \delta^{(p)}(\omega)\, e^{-\frac{1}{2}\sigma^2\omega^2}\, e^{i\omega x}\, d\omega \qquad (11)$$
$$= i^p\, (-1)^p\, \frac{d^p}{d\omega^p}\, e^{-\frac{1}{2}\sigma^2\omega^2 + i\omega x}\,\Big|_{\omega=0} \qquad (12)$$
$$= (-i)^p\, \frac{d^p}{d\omega^p}\, e^{-\frac{1}{2}\sigma^2\omega^2 + i\omega x}\,\Big|_{\omega=0} \qquad (13)$$
$$= (-i)^p\, \frac{d^p}{d\omega^p}\, e^{-\frac{x^2}{2\sigma^2} - \frac{\sigma^2}{2}\big(\omega - \frac{ix}{\sigma^2}\big)^2}\,\Big|_{\omega=0} \qquad (14)$$
$$= (-i)^p\, e^{-\frac{x^2}{2\sigma^2}}\, \frac{d^p}{d\omega^p}\, e^{-\frac{\sigma^2}{2}\big(\omega - \frac{ix}{\sigma^2}\big)^2}\,\Big|_{\omega=0} \qquad (15)$$
$$= (-i)^p\, e^{-\frac{x^2}{2\sigma^2}}\, \Big[h_p\big(\sigma(\omega - \tfrac{ix}{\sigma^2})\big)\, (-\sigma)^p\, e^{-\frac{\sigma^2}{2}(\omega - \frac{ix}{\sigma^2})^2}\Big]_{\omega=0} \qquad (16)$$
$$= (i\sigma)^p\, e^{-\frac{x^2}{2\sigma^2}}\, h_p\Big(\frac{x}{i\sigma}\Big)\, e^{\frac{x^2}{2\sigma^2}} \qquad (17)$$
$$= (i\sigma)^p\, h_p\Big(\frac{x}{i\sigma}\Big)\,. \qquad (18)$$

In (9) we use the facts that $\mathcal{F}\{x^p\} = \sqrt{2\pi}\, i^p\, \delta^{(p)}(\omega)$, where $\delta^{(p)}$ denotes the $p$'th derivative of Dirac's delta function, and $\mathcal{F}\{k_\sigma\}(\omega) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\sigma^2\omega^2}$. In (12) we use $\int_{\mathbb{R}} \delta^{(p)}(\omega)\, g(\omega)\, d\omega = (-1)^p\, g^{(p)}(0)$. In (14) we complete the square. Finally, note that the change of variable $x \to (x-\mu)/\alpha$ in the Hermite polynomial leads to $h_p\big(\frac{x-\mu}{\alpha}\big) = (-\alpha)^p\, e^{\frac{(x-\mu)^2}{2\alpha^2}}\, \frac{d^p}{dx^p}\, e^{-\frac{(x-\mu)^2}{2\alpha^2}}$. In particular, by setting $\alpha = 1/\sigma$ and $\mu = (ix)/\sigma^2$ we obtain $h_p\big(\sigma(\omega - \frac{ix}{\sigma^2})\big)\, (-\sigma)^p\, e^{-\frac{\sigma^2}{2}(\omega - \frac{ix}{\sigma^2})^2} = \frac{d^p}{d\omega^p}\, e^{-\frac{\sigma^2}{2}(\omega - \frac{ix}{\sigma^2})^2}$, which is used in (16).
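The final identity can also be checked directly against the table of Section 2.1, since $h_p$ is the probabilists' Hermite polynomial (NumPy's HermiteE basis). A sketch, with names chosen here for illustration:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def u(x, p, sigma):
    # (i sigma)^p h_p(x / (i sigma)); the imaginary part vanishes up to rounding.
    z = x / (1j * sigma)
    c = np.zeros(p + 1)
    c[p] = 1.0                      # coefficient vector selecting He_p
    return ((1j * sigma)**p * hermeval(z, c)).real

x, s = 1.3, 0.6
print(u(x, 2, s), s**2 + x**2)                        # p = 2 row of the table
print(u(x, 5, s), 15*s**4*x + 10*s**2*x**3 + x**5)    # p = 5 row of the table
```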

B Gaussian Radial Basis Functions

In order to prove our main claim, we will need the following proposition.

Proposition 0 The following identity holds for the product of two Gaussians.

$$k(x-\mu_1;\, \sigma_1)\; k(x-\mu_2;\, \sigma_2) = \frac{e^{-\frac{\|\mu_1-\mu_2\|^2}{2(\sigma_1^2+\sigma_2^2)}}}{\big(\sqrt{2\pi(\sigma_1^2+\sigma_2^2)}\big)^m}\; k\Big(x - \frac{\sigma_2^2\mu_1 + \sigma_1^2\mu_2}{\sigma_1^2+\sigma_2^2};\; \frac{\sigma_1\sigma_2}{\sqrt{\sigma_1^2+\sigma_2^2}}\Big)\,,$$

where $k(x - \mu; \sigma) \triangleq k_\sigma(x - \mu)$ and $m \triangleq \dim(x)$.

$$k(x-\mu_1; \sigma_1)\; k(x-\mu_2; \sigma_2) = \frac{1}{(\sigma_1\sqrt{2\pi})^m}\, e^{-\frac{\|x-\mu_1\|^2}{2\sigma_1^2}}\; \frac{1}{(\sigma_2\sqrt{2\pi})^m}\, e^{-\frac{\|x-\mu_2\|^2}{2\sigma_2^2}} = \frac{1}{(2\pi\sigma_1\sigma_2)^m}\, e^{-\frac{\|x-\mu_1\|^2}{2\sigma_1^2} - \frac{\|x-\mu_2\|^2}{2\sigma_2^2}}$$

$$= \frac{1}{(2\pi\sigma_1\sigma_2)^m}\; e^{-\frac{\|\mu_1-\mu_2\|^2}{2(\sigma_1^2+\sigma_2^2)}}\; e^{-\frac{\big\|x - \frac{\sigma_2^2\mu_1+\sigma_1^2\mu_2}{\sigma_1^2+\sigma_2^2}\big\|^2}{2\,\frac{\sigma_1^2\sigma_2^2}{\sigma_1^2+\sigma_2^2}}} \qquad (19)$$
$$= \frac{e^{-\frac{\|\mu_1-\mu_2\|^2}{2(\sigma_1^2+\sigma_2^2)}}}{\big(\sqrt{2\pi(\sigma_1^2+\sigma_2^2)}\big)^m}\; k\Big(x - \frac{\sigma_2^2\mu_1+\sigma_1^2\mu_2}{\sigma_1^2+\sigma_2^2};\; \frac{\sigma_1\sigma_2}{\sqrt{\sigma_1^2+\sigma_2^2}}\Big)\,.$$

Note that (19) is derived by completing the square. □
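Proposition 0 is easy to verify pointwise. A sketch in dimension m = 2, assuming NumPy is available:

```python
import numpy as np

def k(x, sigma):
    # Isotropic Gaussian kernel; x is a vector, sigma its standard deviation.
    m = x.shape[-1]
    return np.exp(-(x**2).sum(-1) / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)**m

m, s1, s2 = 2, 0.7, 1.3
mu1, mu2 = np.array([0.5, -1.0]), np.array([1.5, 0.2])
x = np.array([0.1, 0.4])

lhs = k(x - mu1, s1) * k(x - mu2, s2)
s12 = s1**2 + s2**2
mu = (s2**2 * mu1 + s1**2 * mu2) / s12
scale = np.exp(-((mu1 - mu2)**2).sum() / (2 * s12)) / (2 * np.pi * s12)**(m / 2)
rhs = scale * k(x - mu, s1 * s2 / np.sqrt(s12))
print(lhs, rhs)   # the two values should match
```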

We are now ready to state the main claim in the following lemma.

Lemma 1 Suppose $f(x) = \sum_{j=1}^p a_j\, e^{-\frac{\|x-x_j\|^2}{2\delta_j^2}}$. Then the following identity holds,
$$[f \star k_\sigma](x) = \sum_{j=1}^p a_j\, \Big(\frac{\delta_j}{\sqrt{\delta_j^2+\sigma^2}}\Big)^{\!n}\, e^{-\frac{\|x-x_j\|^2}{2(\delta_j^2+\sigma^2)}}\,.$$

Proof

$$[f \star k_\sigma](x) \triangleq \int_{\mathbb{R}^n} f(y)\, k_\sigma(x-y)\, dy$$
$$= \int_{\mathbb{R}^n} \Big(\sum_{j=1}^p a_j\, e^{-\frac{\|y-x_j\|^2}{2\delta_j^2}}\Big)\, k_\sigma(x-y)\, dy$$
$$= \sum_{j=1}^p a_j \int_{\mathbb{R}^n} e^{-\frac{\|y-x_j\|^2}{2\delta_j^2}}\, k_\sigma(x-y)\, dy$$
$$= \sum_{j=1}^p a_j\, (\delta_j\sqrt{2\pi})^n \int_{\mathbb{R}^n} k_{\delta_j}(y-x_j)\, k_\sigma(x-y)\, dy$$
$$= \sum_{j=1}^p a_j\, (\delta_j\sqrt{2\pi})^n\, \frac{e^{-\frac{\|x_j-x\|^2}{2(\delta_j^2+\sigma^2)}}}{\big(\sqrt{2\pi(\delta_j^2+\sigma^2)}\big)^n} \int_{\mathbb{R}^n} k\Big(y - \frac{\sigma^2 x_j + \delta_j^2 x}{\delta_j^2+\sigma^2};\; \frac{\delta_j\sigma}{\sqrt{\delta_j^2+\sigma^2}}\Big)\, dy \qquad (20)$$
$$= \sum_{j=1}^p a_j\, \Big(\frac{\delta_j}{\sqrt{\delta_j^2+\sigma^2}}\Big)^{\!n}\, e^{-\frac{\|x_j-x\|^2}{2(\delta_j^2+\sigma^2)}}\,,$$

where in (20) we use the Gaussian product result from Proposition 0, and the last step uses the fact that a normalized Gaussian integrates to one. □

C Functions with Linear Argument

In order to prove the lemma, we first need the following proposition.

Proposition 2 The following identity holds,

$$[k_\delta(a\,\cdot + b) \star k_\sigma(\cdot)](x) = k_{\sqrt{\delta^2 + a^2\sigma^2}}(ax + b)\,.$$

Proof We first claim that the following identity is true,

$$\int k_\delta(at+b)\, k_\sigma(x-t)\, dt = \frac{1}{2}\, k_{\sqrt{\delta^2+a^2\sigma^2}}(ax+b)\, \operatorname{erf}\Big(\frac{a\sigma^2(at+b) + \delta^2(t-x)}{\delta\sigma\sqrt{2(\delta^2+a^2\sigma^2)}}\Big) + f(a,b,x)\,, \qquad (21)$$

where $f(a,b,x)$ is a constant of integration in $t$. If the claim holds, the proposition can be easily proved as below,

$$[k_\delta(a\,\cdot + b) \star k_\sigma(\cdot)](x) \qquad (22)$$
$$= \int_{-\infty}^{\infty} k_\delta(at+b)\, k_\sigma(x-t)\, dt \qquad (23)$$

$$= \lim_{t\to\infty} \int k_\delta(at+b)\, k_\sigma(x-t)\, dt \; - \; \lim_{t\to-\infty} \int k_\delta(at+b)\, k_\sigma(x-t)\, dt \qquad (24)$$
$$= \lim_{t\to\infty} \frac{1}{2}\, k_{\sqrt{\delta^2+a^2\sigma^2}}(ax+b)\, \operatorname{erf}\Big(\frac{a\sigma^2(at+b)+\delta^2(t-x)}{\delta\sigma\sqrt{2(\delta^2+a^2\sigma^2)}}\Big) \qquad (25)$$
$$\phantom{=}\; - \; \lim_{t\to-\infty} \frac{1}{2}\, k_{\sqrt{\delta^2+a^2\sigma^2}}(ax+b)\, \operatorname{erf}\Big(\frac{a\sigma^2(at+b)+\delta^2(t-x)}{\delta\sigma\sqrt{2(\delta^2+a^2\sigma^2)}}\Big) \qquad (26)$$
$$= \frac{1}{2}\, k_{\sqrt{\delta^2+a^2\sigma^2}}(ax+b)\,(1) \; - \; \frac{1}{2}\, k_{\sqrt{\delta^2+a^2\sigma^2}}(ax+b)\,(-1) \qquad (27)$$
$$= k_{\sqrt{\delta^2+a^2\sigma^2}}(ax+b)\,, \qquad (28)$$

where (27) uses the fact that $\lim_{x\to\pm\infty} \operatorname{erf}(x) = \pm 1$. Thus, the following focuses on proving (21), i.e. showing that $\frac{1}{2}\, k_{\sqrt{\delta^2+a^2\sigma^2}}(ax+b)\, \operatorname{erf}\big(\frac{a\sigma^2(at+b)+\delta^2(t-x)}{\delta\sigma\sqrt{2(\delta^2+a^2\sigma^2)}}\big)$ is an antiderivative of $k_\delta(at+b)\, k_\sigma(x-t)$ with respect to $t$. We do this by differentiating the former and showing that it equals the latter.

$$\frac{d}{dt}\; \frac{1}{2}\, k_{\sqrt{\delta^2+a^2\sigma^2}}(ax+b)\, \operatorname{erf}\Big(\frac{a\sigma^2(at+b)+\delta^2(t-x)}{\delta\sigma\sqrt{2(\delta^2+a^2\sigma^2)}}\Big) \qquad (29)$$
$$= \frac{1}{2}\, k_{\sqrt{\delta^2+a^2\sigma^2}}(ax+b)\, \frac{d}{dt}\, \operatorname{erf}\Big(\frac{a\sigma^2(at+b)+\delta^2(t-x)}{\delta\sigma\sqrt{2(\delta^2+a^2\sigma^2)}}\Big) \qquad (30)$$
$$= \frac{1}{2}\, k_{\sqrt{\delta^2+a^2\sigma^2}}(ax+b)\, \frac{\sqrt{2(\delta^2+a^2\sigma^2)}}{\sqrt{\pi}\,\delta\sigma}\; e^{-\Big(\frac{a\sigma^2(at+b)+\delta^2(t-x)}{\delta\sigma\sqrt{2(\delta^2+a^2\sigma^2)}}\Big)^2} \qquad (31)$$

$$= \frac{1}{2}\; \frac{1}{\sqrt{2\pi(\delta^2+a^2\sigma^2)}}\, e^{-\frac{(ax+b)^2}{2(\delta^2+a^2\sigma^2)}}\; \frac{\sqrt{2(\delta^2+a^2\sigma^2)}}{\sqrt{\pi}\,\delta\sigma}\; e^{-\Big(\frac{a\sigma^2(at+b)+\delta^2(t-x)}{\delta\sigma\sqrt{2(\delta^2+a^2\sigma^2)}}\Big)^2} \qquad (32)$$
$$= \frac{1}{\sqrt{2\pi}\,\delta}\, e^{-\frac{(at+b)^2}{2\delta^2}}\; \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-t)^2}{2\sigma^2}} \qquad (33)$$
$$= k_\delta(at+b)\; k_\sigma(x-t)\,, \qquad (34)$$

where in (31) we used the identity $\frac{d}{dx}\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\, e^{-x^2}$. □
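Proposition 2 can likewise be verified by quadrature. A minimal sketch, assuming SciPy is available; the constants are arbitrary test values:

```python
import numpy as np
from scipy.integrate import quad

def k(x, s):
    # One-dimensional Gaussian kernel k_s(x).
    return np.exp(-x**2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)

a, b, delta, sigma, x = 1.7, -0.3, 0.8, 0.5, 0.6
lhs, _ = quad(lambda t: k(a * t + b, delta) * k(x - t, sigma), -np.inf, np.inf)
rhs = k(a * x + b, np.sqrt(delta**2 + a**2 * sigma**2))
print(lhs, rhs)   # should match
```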

Now we are ready to prove the lemma.

Lemma 3 Let $f : \mathbb{R} \to \mathbb{R}$ have well-defined Gaussian convolution $\tilde f(y\,; \sigma) \triangleq [f \star k_\sigma](y)$. Let $\mathbf{x}$ and $\mathbf{w}$ be any real vectors of the same length. Then $[f(\cdot^\top \mathbf{x}) \star k_\sigma(\cdot)](\mathbf{w}) = \tilde f(\mathbf{w}^\top \mathbf{x}\,;\ \sigma \|\mathbf{x}\|)$.

Proof Using the trivial identity $f(h(\mathbf{w})) = \int_{\mathbb{R}} f(y)\, \delta\big(y - h(\mathbf{w})\big)\, dy$ for any $h : \mathbb{R}^n \to \mathbb{R}$ (below, $\mathcal{W} \triangleq \mathbb{R}^n$ denotes the domain of $\mathbf{w}$, and $\mathcal{W}_k$ its $k$'th coordinate axis), we can proceed as below,

$$[f(h(\cdot)) \star k_\sigma](\mathbf{w}) \qquad (35)$$
$$= \int_{\mathcal{W}} f(h(\mathbf{t}))\, k_\sigma(\mathbf{w}-\mathbf{t})\, d\mathbf{t} \qquad (36)$$
$$= \int_{\mathcal{W}} \int_{\mathbb{R}} f(y)\, \delta\big(y - h(\mathbf{t})\big)\, dy\; k_\sigma(\mathbf{w}-\mathbf{t})\, d\mathbf{t} \qquad (37)$$
$$= \int_{\mathbb{R}} f(y) \int_{\mathcal{W}} \delta\big(y - h(\mathbf{t})\big)\, k_\sigma(\mathbf{w}-\mathbf{t})\, d\mathbf{t}\; dy\,. \qquad (38)$$

We now focus on computing the inner integral $\int_{\mathcal{W}} \delta\big(y - h(\mathbf{t})\big)\, k_\sigma(\mathbf{w}-\mathbf{t})\, d\mathbf{t}$. Applying the definition $h(\mathbf{w}) \triangleq \sum_{k=1}^n x_k w_k$, we proceed as the following,

$$\int_{\mathcal{W}} \delta\Big(y - \sum_{k=1}^n x_k t_k\Big)\, k_\sigma(\mathbf{w}-\mathbf{t})\, d\mathbf{t} \qquad (39)$$
$$= \lim_{\epsilon\to 0} \int_{\mathcal{W}} k_\epsilon\Big(y - \sum_{k=1}^n x_k t_k\Big)\, k_\sigma(\mathbf{w}-\mathbf{t})\, d\mathbf{t} \qquad (40)$$

$$= \lim_{\epsilon\to 0} \int_{\mathcal{W}_n} \!\cdots \int_{\mathcal{W}_1} k_\epsilon\Big(y - \sum_{k=1}^n x_k t_k\Big)\, k_\sigma(w_1 - t_1)\, dt_1\, \cdots\, k_\sigma(w_n - t_n)\, dt_n \qquad (41)$$

$$= \lim_{\epsilon\to 0} \int_{\mathcal{W}_n} \!\cdots \int_{\mathcal{W}_2} k_{\sqrt{\epsilon^2+\sigma^2 x_1^2}}\Big(y - x_1 w_1 - \sum_{k=2}^n x_k t_k\Big)\, k_\sigma(w_2 - t_2)\, dt_2\, \cdots\, k_\sigma(w_n - t_n)\, dt_n \qquad (42)$$

$$= \lim_{\epsilon\to 0}\; k_{\sqrt{\epsilon^2 + \sigma^2 \sum_{k=1}^n x_k^2}}\Big(y - \sum_{k=1}^n x_k w_k\Big) \qquad (43)$$
$$= k_{\sigma\|\mathbf{x}\|}\big(y - \mathbf{w}^\top \mathbf{x}\big)\,, \qquad (44)$$

where (42) and (43) follow by repeatedly applying Proposition 2 to integrate out $t_1, \dots, t_n$. Plugging this into (38) leads to the identity $[f(h(\cdot)) \star k_\sigma](\mathbf{w}) = \int_{\mathbb{R}} f(y)\, k_{\sigma\|\mathbf{x}\|}(y - \mathbf{w}^\top \mathbf{x})\, dy$, which can be equivalently expressed as below,

$$[f(h(\cdot)) \star k_\sigma](\mathbf{w}) \qquad (45)$$
$$= \int_{\mathbb{R}} f(y)\, k_{\sigma\|\mathbf{x}\|}\big(y - \mathbf{w}^\top \mathbf{x}\big)\, dy \qquad (46)$$
$$= [f \star k_{\sigma\|\mathbf{x}\|}](\mathbf{w}^\top \mathbf{x}) \qquad (47)$$
$$= \tilde f(\mathbf{w}^\top \mathbf{x}\,;\ \sigma\|\mathbf{x}\|)\,. \qquad (48)$$

□
