Pattern Recognition 44 (2011) 278–283


On Euclidean norm approximations

M. Emre Celebi (a,*), Fatih Celiker (b), Hassan A. Kingravi (c)

(a) Department of Computer Science, Louisiana State University, Shreveport, LA, USA
(b) Department of Mathematics, Wayne State University, Detroit, MI, USA
(c) Department of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA

(*) Corresponding author. E-mail addresses: [email protected] (M.E. Celebi), [email protected] (F. Celiker), [email protected] (H.A. Kingravi).

Article history: Received 25 May 2010; accepted 23 August 2010.

Keywords: Euclidean norm; Approximation

Abstract: Euclidean norm calculations arise frequently in scientific and engineering applications. Several approximations for this norm with differing complexity and accuracy have been proposed in the literature. Earlier approaches [1-3] were based on minimizing the maximum error. Recently, Seol and Cheun [4] proposed an approximation based on minimizing the average error. In this paper, we first examine these approximations in detail, show that they fit into a single mathematical formulation, and compare their average and maximum errors. We then show that the maximum errors given by Seol and Cheun are significantly optimistic.

1. Introduction

The Minkowski ($L_p$) metric is inarguably one of the most commonly used quantitative distance (dissimilarity) measures in scientific and engineering applications. The Minkowski distance between two vectors $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ in the $n$-dimensional Euclidean space $\mathbb{R}^n$ is given by

$$L_p(x,y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}. \quad (1)$$

Three special cases of the $L_p$ metric are of particular interest, namely, $L_1$ (city-block metric), $L_2$ (Euclidean metric), and $L_\infty$ (chessboard metric). Given the general form (1), $L_1$ and $L_2$ can be defined in a straightforward fashion, while $L_\infty$ is defined as

$$L_\infty(x,y) = \max_{1 \le i \le n} |x_i - y_i|.$$

In many applications, the data space is Euclidean and therefore the $L_2$ metric is the natural choice. In addition, this metric has the advantage of being isotropic (rotation invariant). For example, when the input vectors stem from an isotropic vector field, e.g. a velocity field, the most appropriate choice is to use the $L_2$ metric so that all vectors are processed in the same way, regardless of their orientation [3].

The main drawback of $L_2$ is its high computational requirements due to the multiplications and the square root operation. As a result, $L_1$ and $L_\infty$ are often used as alternatives. Although these metrics are computationally more efficient, they deviate from $L_2$ significantly. The Minkowski metric is translation invariant, i.e. $L_p(x,y) = L_p(x+z, y+z)$ for all $x, y, z \in \mathbb{R}^n$; hence it suffices to consider $D_p(x) = L_p(x, 0)$, i.e. the distance from the point $x$ to the origin. Therefore, in the rest of the paper, we will consider approximations to $D_p(x)$ rather than $L_p(x,y)$.

In this paper, we examine several approximations to the Euclidean norm. The rest of the paper is organized as follows. In Section 2 we describe the Euclidean norm approximations that have appeared in the literature and compare their average and maximum errors using numerical simulations. We then show that all of these methods fit into a single mathematical formulation. In Section 3 we examine the simulation results from a theoretical perspective. Finally, in Section 4 we provide our conclusions.
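The deviation of $D_1$ and $D_\infty$ from $D_2$ is easy to observe numerically. The following sketch is ours (it is not the implementation released by the authors); the function names are our own.

```python
import numpy as np

# A minimal illustration of how the city-block and chessboard norms
# deviate from the Euclidean norm as the dimension grows.

def d1(x):    # city-block norm, D_1
    return np.sum(np.abs(x))

def d2(x):    # Euclidean norm, D_2
    return np.sqrt(np.sum(x * x))

def dinf(x):  # chessboard norm, D_inf
    return np.max(np.abs(x))

rng = np.random.default_rng(0)
for n in (2, 10, 100):
    x = rng.standard_normal(n)
    # D_1/D_2 drifts upward and D_inf/D_2 drifts downward as n increases
    print(n, d1(x) / d2(x), dinf(x) / d2(x))
```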

2. Euclidean norm approximations

For reasons explained in Section 1, we concentrate on approximations to the Euclidean norm $D_2$ on $\mathbb{R}^n$. Let $\tilde{D}$, defined on $\mathbb{R}^n$, be an approximation to $D_2$. We assume that $\tilde{D}$ is a continuous homogeneous function. We note that all variants of $\tilde{D}$ we consider in this paper satisfy these assumptions. As a measure of the quality of the approximation of $\tilde{D}$ to $D_2$, we define the maximum relative error (MRE) as

$$e_{\max}^{\tilde{D}} = \sup_{x \in \mathbb{R}^n \setminus \{0\}} \frac{|\tilde{D}(x) - D_2(x)|}{D_2(x)}. \quad (2)$$

Using the homogeneity of $D_2$ and $\tilde{D}$, (2) can be written as

$$e_{\max}^{\tilde{D}} = \sup_{x \in S_2^{n-1}} |\tilde{D}(x) - 1|, \quad (3)$$

where

$$S_2^{n-1} = \{x \in \mathbb{R}^n : D_2(x) = 1\}$$

is the unit hypersphere of $\mathbb{R}^n$ with respect to the Euclidean norm. Furthermore, by the continuity of $\tilde{D}$, we can replace the supremum with a maximum in (3) and write

$$e_{\max}^{\tilde{D}} = \max_{x \in S_2^{n-1}} |\tilde{D}(x) - 1|. \quad (4)$$

We will use (4) as the definition of the MRE throughout.

In the trivial case where $\tilde{D} = D_2$ we have $e_{\max}^{\tilde{D}} = 0$. Hence, for nontrivial cases we wish to have a small $e_{\max}^{\tilde{D}}$ value. In other words, the smaller the value of $e_{\max}^{\tilde{D}}$, the better (more accurate) the corresponding approximation $\tilde{D}$. It can be shown that $D_1$ (city-block norm) overestimates $D_2$ and the corresponding MRE is given by $e_{\max}^{D_1} = \sqrt{n} - 1$ [1]. In contrast, $D_\infty$ (chessboard norm) underestimates $D_2$ with MRE given by $e_{\max}^{D_\infty} = 1 - 1/\sqrt{n}$ [1]. More explicitly,

$$D_2(x) \le D_1(x) \le \sqrt{n}\,D_2(x), \qquad \frac{1}{\sqrt{n}}\,D_2(x) \le D_\infty(x) \le D_2(x) \quad (5)$$

for all $x \in \mathbb{R}^n$. Therefore, it is natural to expect a suitable linear combination of $D_1$ and $D_\infty$ to give an approximation to $D_2$ better than both $D_1$ and $D_\infty$ [2].

2.1. Chaudhuri et al.'s approximation

Chaudhuri et al. [1] proposed the approximation

$$D_\lambda(x) = |x_{i_{\max}}| + \lambda \sum_{i=1,\; i \ne i_{\max}}^{n} |x_i| \quad \text{with } \lambda = \frac{1}{\lfloor n/2 \rfloor}.$$

Here $i_{\max}$ is the index of the absolutely largest component of $x$, i.e. $i_{\max} = \arg\max_{1 \le i \le n}(|x_i|)$, and $\lfloor x \rfloor$ is the floor function, which returns the largest integer less than or equal to $x$. (Unfortunately, the motivation behind this particular choice of $\lambda$ is not given in the paper.) Since $D_\infty(x) = |x_{i_{\max}}|$, by adding and subtracting the term $\lambda |x_{i_{\max}}|$, $D_\lambda$ can be written as a linear combination of $D_\infty$ and $D_1$ as

$$D_\lambda(x) = (1-\lambda)\,D_\infty(x) + \lambda\,D_1(x). \quad (6)$$

It is easy to see that $D_\infty(x) \le D_\lambda(x) \le D_1(x)$ for all $x \in \mathbb{R}^n$, since $0 < \lambda \le 0.5$. It can also be shown [1] that for sufficiently large $n$, $D_\lambda$ is closer to $D_2$ than both $D_1$ and $D_\infty$, i.e. $|D_\lambda(x) - D_2(x)| \le |D_1(x) - D_2(x)|$ and $|D_\lambda(x) - D_2(x)| \le |D_2(x) - D_\infty(x)|$ for all $x \in \mathbb{R}^n$.

For sufficiently large $n$, $D_\lambda$ underestimates $D_2$ and the corresponding MRE is bounded by

$$1 - \frac{1 + \lambda(n-1)}{\sqrt{n}} \;\le\; e_{\max}^{D_\lambda} \;\le\; 1 - \lambda \quad \text{for } n \ge 3.$$

Otherwise, $D_\lambda$ overestimates $D_2$ and we have

$$e_{\max}^{D_\lambda} = \begin{cases} \sqrt{1 + \dfrac{4(n-1)}{(n+2)^2}} - 1 & \text{for } n = 2, 4, 6, \ldots, \\[2ex] \sqrt{1 + \dfrac{4(n-1)}{(n+3)^2}} - 1 & \text{for } n = 3, 5, 7, \ldots. \end{cases}$$

Proofs of these identities can be found in [1].

2.2. Rhodes' approximations

Rhodes [2] reformulated (6) as a maximum of linear functions,

$$D_\lambda(x) = \max_{1 \le j \le n} \left\{ (1-\lambda)\,|x_j| + \lambda \sum_{i=1}^{n} |x_i| \right\},$$

where $0 < \lambda < 1$ is taken as a free parameter. He determined the optimal value of $\lambda$ by minimizing $e_{\max}^{D_\lambda} = \max_{x \in S_2^{n-1}} |D_\lambda(x) - 1|$ analytically. In particular, he showed that optimal $\lambda'$ values for Chaudhuri et al.'s norm can be determined by solving the equation

$$1 - 2\sqrt{\lambda' - (\lambda')^2} = \sqrt{1 + (\lambda')^2 (n-1)} - 1$$

in the interval (0, 1/2). This equation is a quartic (fourth order) in $\lambda'$ and can be solved using Ferrari's method [5]. It can be shown that this particular quartic equation has two real and two complex roots, and the optimal $\lambda'$ value is given by the smaller of the real roots. The corresponding MRE is given by [2]

$$e_{\max}^{D_\lambda} = 1 - 2\sqrt{\lambda' - (\lambda')^2}. \quad (7)$$

In the remainder of this paper, $D_\lambda$ refers to this improved variant of Chaudhuri et al.'s norm.

Rhodes also investigated the two-parameter family of approximations given by

$$D_{\mu,\lambda}(x) = (\mu - \lambda)\,D_\infty(x) + \lambda\,D_1(x), \quad (8)$$

where $0 < \lambda < \mu$. He proved that the optimal solution and its MRE in this case are given by

$$\lambda^* = \frac{2}{2n^{1/4} + \sqrt{2n + 2\sqrt{n}}}, \qquad \mu^* = (\sqrt{n} + 1)\,\lambda^*, \qquad e_{\max}^{D_{\mu,\lambda}} = 1 - 2\lambda^* n^{1/4}. \quad (9)$$

Finally, Rhodes investigated the $D_{\mu,\lambda}$ approximations with $0 \le \mu < \lambda$. He proved that the optimal solution and its MRE are given by

$$\lambda^* = \frac{2}{1 + \sqrt{n-1}}, \qquad \mu^* = 0, \qquad e_{\max}^{D_{\mu,\lambda}} = 1 - \lambda^*.$$

This approximation will not be considered any further, since its accuracy is inferior to even the single-parameter approximation $D_\lambda$.

It should be noted that Rhodes optimized $D_\lambda$ and $D_{\mu,\lambda}$ over $\mathbb{Z}^n$. Therefore, these norms are in fact suboptimal on $\mathbb{R}^n$ (see Section 2.5).
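The optimal $\lambda'$ can also be obtained numerically. The following sketch is ours: it solves Rhodes' equation by bisection on (0, 1/2) instead of Ferrari's method and then evaluates (7). For $n = 2$ it reproduces the MRE of about 0.0551 reported later in Table 3.

```python
import numpy as np

# Our numerical alternative to Ferrari's method for Rhodes' optimal lambda'.
# g(lam) = underestimation error minus overestimation error; it is positive
# near 0 and negative near 1/2, so bisection finds the balancing root.

def optimal_lambda(n, tol=1e-12):
    g = lambda lam: (1 - 2*np.sqrt(lam - lam**2)) \
                    - (np.sqrt(1 + lam**2*(n - 1)) - 1)
    lo, hi = 1e-9, 0.5 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

for n in (2, 3, 10):
    lam = optimal_lambda(n)
    mre = 1 - 2*np.sqrt(lam - lam**2)       # Eq. (7)
    print(n, round(lam, 4), round(mre, 4))  # ~0.0551 (n=2), ~0.1837 (n=10)
```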

2.3. Barni et al.'s approximation

Barni et al. [3,6] formulated a generic approximation for $D_2$ as

$$D_B(x) = \delta \sum_{i=1}^{n} a_i x_{(i)},$$

where $x_{(i)}$ is the $i$-th absolutely largest component of $x$, i.e. $(x_{(1)}, x_{(2)}, \ldots, x_{(n)})$ is a permutation of $(|x_1|, |x_2|, \ldots, |x_n|)$ such that $x_{(1)} \ge x_{(2)} \ge \cdots \ge x_{(n)}$. Here $a = (a_1, a_2, \ldots, a_n)$ and $\delta > 0$ are approximation parameters. Note that a non-increasing ordering and strict positivity of the component weights, i.e. $a_1 \ge a_2 \ge \cdots \ge a_n > 0$, is a necessary and sufficient condition for $D_B$ to define a norm [6].

The minimization of (4) is equivalent to determining the weight vector $a$ and the scale factor $\delta$ that solve the following minimax problem [6,7]:

$$\min_{a,\delta}\; \max_{x \in V}\; |D_B(x) - 1|, \quad (10)$$

where $V = \{x \in \mathbb{R}^n : x_1 \ge x_2 \ge \cdots \ge x_n \ge 0,\ D_2(x) = 1\}$. The optimal solution and its MRE are given by

$$a_i = \sqrt{i} - \sqrt{i-1}, \qquad \delta = \frac{2}{1 + \sqrt{\sum_{i=1}^{n} a_i^2}}, \qquad e_{\max}^{D_B} = 1 - \delta. \quad (11)$$

It should be noted that a similar but less rigorous approach had been published earlier by Ohashi [8].
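A direct implementation of $D_B$ with the optimal parameters of (11) might look as follows; this is our sketch, not the authors' published code, and the function name is ours.

```python
import numpy as np

# A sketch of Barni et al.'s approximation with the optimal weights and
# scale factor of Eq. (11).

def barni_norm(x):
    n = len(x)
    i = np.arange(1, n + 1)
    a = np.sqrt(i) - np.sqrt(i - 1)               # a_i = sqrt(i) - sqrt(i-1)
    delta = 2.0 / (1.0 + np.sqrt(np.sum(a * a)))  # scale factor, Eq. (11)
    xs = np.sort(np.abs(x))[::-1]                 # components sorted by magnitude
    return delta * np.dot(a, xs)

x = np.array([3.0, -4.0])
# ~5.04 vs 5.0: relative error ~0.7%, within the n=2 MRE of 1 - delta = 3.96%
print(barni_norm(x), np.linalg.norm(x))
```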

2.4. Seol and Cheun's approximation

Seol and Cheun [4] recently proposed an approximation of the form

$$D_{\alpha,\beta}(x) = \alpha\,D_\infty(x) + \beta\,D_1(x), \quad (12)$$

where $\alpha$ and $\beta$ are strictly positive parameters to be determined by solving the following $2 \times 2$ linear system:

$$\alpha\,E(D_\infty^2) + \beta\,E(D_1 D_\infty) = E(D_2 D_\infty),$$
$$\alpha\,E(D_1 D_\infty) + \beta\,E(D_1^2) = E(D_2 D_1),$$

where $E(\cdot)$ is the expectation operator.
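The estimation step can be sketched as follows. This is our reconstruction of the Monte Carlo procedure described in the next paragraph (expectations replaced by sample averages over standard Gaussian vectors); the variable names are ours.

```python
import numpy as np

# A sketch of Seol and Cheun's parameter estimation: approximate the
# expectations in the 2x2 system by Monte Carlo averages, then solve it.

def estimate_alpha_beta(n, samples=100_000, seed=1):
    x = np.random.default_rng(seed).standard_normal((samples, n))
    d1 = np.sum(np.abs(x), axis=1)               # D_1 of each sample
    dinf = np.max(np.abs(x), axis=1)             # D_inf of each sample
    d2 = np.sqrt(np.sum(x * x, axis=1))          # D_2 of each sample
    A = np.array([[np.mean(dinf * dinf), np.mean(d1 * dinf)],
                  [np.mean(d1 * dinf),   np.mean(d1 * d1)]])
    b = np.array([np.mean(d2 * dinf), np.mean(d2 * d1)])
    alpha, beta = np.linalg.solve(A, b)
    return alpha, beta

# D_{alpha,beta}(x) = alpha * D_inf(x) + beta * D_1(x)
print(estimate_alpha_beta(8))
```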

Note that the formulation of $D_{\alpha,\beta}$ is similar to that of $D_{\mu,\lambda}$ (8) in that they both approximate $D_2$ by a linear combination of $D_\infty$ and $D_1$. These approximations differ in their methodologies for finding the optimal parameters. Rhodes follows an analytical approach and derives theoretical values for the parameters and the maximum error. However, he achieves this by sacrificing maximization over $\mathbb{R}^n$, maximizing only over $\mathbb{Z}^n$. Seol and Cheun follow an empirical approach in which they approximate the optimal parameters over $\mathbb{R}^n$, which causes them to sacrifice the ability to obtain analytical values for the parameters and the maximum error. They estimate the optimal values of $\alpha$ and $\beta$ using 100,000 $n$-dimensional vectors whose components are independent and identically distributed standard Gaussian random variables.

2.5. Comparison of the Euclidean norm approximations

It is easy to see that all of the presented approximations fit into the general form

$$\tilde{D}(x) = \sum_{i=1}^{n} w_i x_{(i)},$$

which is a weighted $D_1$ norm. The component weights for each approximation are given in Table 1. It can be seen that $D_B$ has the most elaborate design, in which each component is assigned a weight proportional to its ranking. However, this weighting scheme also presents a drawback in that a full ordering of the component absolute values is required (see Table 2).

Table 1. Weights for the approximate norms.

  Norm          w_1       w_i (i ≠ 1)
  D_λ           1         λ′
  D_{μ,λ}       μ         λ
  D_B           δ         δ a_i
  D_{α,β}       α + β     β

Due to their formulations, the MREs for $D_\lambda$, $D_{\mu,\lambda}$, and $D_B$ can be calculated analytically using (7), (9), and (11), respectively. In Fig. 1 we plot the theoretical errors for these norms for $n \le 100$.

[Fig. 1. Maximum relative errors for $D_\lambda$, $D_{\mu,\lambda}$, and $D_B$.]

It can be seen that $D_B$ is not only more accurate than $D_\lambda$ and $D_{\mu,\lambda}$, but it also scales significantly better. Although $D_{\mu,\lambda}$ is more accurate than $D_\lambda$ when $n$ is small, the difference between the two approximations becomes less significant as $n$ is increased.

The operation counts for each norm are given in Table 2 (ABS: absolute value, COMP: comparison, ADD: addition, MULT: multiplication, SQRT: square root).

Table 2. Operation counts for the norms.

  Norm          ABS   COMP          ADD    MULT   SQRT
  D_∞           n     n−1           0      0      0
  D_1           n     0             n−1    0      0
  D_2           0     0             n−1    n      1
  D_λ           n     n−1           n−1    1      0
  D_{μ,λ}       n     n−1           n      2      0
  D_B           n     O(n log n)    n−1    n      0
  D_{α,β}       n     n−1           n      2      0

The following conclusions can be drawn:

- $D_B$ has the highest computational cost among the approximate norms due to its costly weighting scheme, which requires sorting of $n$ numbers and $n$ multiplications. For small values of $n$, sorting can be performed most efficiently by a sorting network [9]. For large values of $n$, sorting requires $O(n \log n)$ comparisons, which is likely to exceed the cost of the square root operation [6]. Therefore, in high-dimensional spaces, e.g. $n > 9$ [4], $D_B$ provides no computational advantage over $D_2$.
- $D_\lambda$ has the lowest computational cost among the approximate norms. $D_{\mu,\lambda}$ and $D_{\alpha,\beta}$ have the same computational cost, which is slightly higher than that of $D_\lambda$.
- A significant advantage of $D_\lambda$, $D_{\mu,\lambda}$, and $D_{\alpha,\beta}$ is that they require a fixed number of multiplications (1 or 2) regardless of the value of $n$.
- $D_\lambda$, $D_{\mu,\lambda}$, and $D_{\alpha,\beta}$ can be used to approximate $D_2^2$ (the squared Euclidean norm) using an extra multiplication. On the other hand, the computational cost of $D_B$ is higher than that of $D_2^2$ due to the extra absolute value and sorting operations involved.

The general weighted form is illustrated in the sketch below.
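This is our sketch of the unified weighted form, with the weight choices of Table 1; the function and parameter names are ours, and lam, mu, alpha, beta stand for whichever parameter values are in use.

```python
import numpy as np

# Every approximation in this paper is a weighted sum of the absolutely
# sorted components; only the weight vector differs (see Table 1).

def weighted_norm(x, w):
    return np.dot(w, np.sort(np.abs(x))[::-1])

def weights(name, n, lam=None, mu=None, alpha=None, beta=None):
    w = np.empty(n)
    if name == "D_lambda":           # w_1 = 1, w_i = lambda'
        w[0], w[1:] = 1.0, lam
    elif name == "D_mu_lambda":      # w_1 = mu, w_i = lambda
        w[0], w[1:] = mu, lam
    elif name == "D_B":              # w_i = delta * a_i, Eq. (11)
        i = np.arange(1, n + 1)
        a = np.sqrt(i) - np.sqrt(i - 1)
        w = a * 2.0 / (1.0 + np.sqrt(np.sum(a * a)))
    elif name == "D_alpha_beta":     # w_1 = alpha + beta, w_i = beta
        w[0], w[1:] = alpha + beta, beta
    return w

x = np.array([3.0, -4.0, 1.0])
print(weighted_norm(x, weights("D_B", 3)), np.linalg.norm(x))  # ~5.23 vs ~5.10
```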

In Table 3 we display the average and maximum errors for $D_\lambda$, $D_{\mu,\lambda}$, and $D_B$ for $n \le 10$. The average relative error (ARE) is defined as

$$e_{avg}^{\tilde{D}} = \frac{1}{|S|} \sum_{x \in S} |\tilde{D}(x) - 1|, \quad (13)$$

where $S$ is a finite subset of the unit hypersphere $S_2^{n-1}$, and $|S|$ denotes the number of elements in $S$. An efficient way to pick a random point on $S_2^{n-1}$ is to generate $n$ independent Gaussian random variables $x_1, x_2, \ldots, x_n$ with zero mean and unit variance. The distribution of the unit vectors

$$y = (y_1, y_2, \ldots, y_n), \qquad y_i = x_i \Big/ \Big( \sum_{j=1}^{n} x_j^2 \Big)^{1/2}, \quad i = 1, 2, \ldots, n,$$

will then be uniform over the surface of the hypersphere [10]. For each approximate norm, the ARE and MRE values were calculated over an increasing number of points, $2^{20}, 2^{21}, \ldots, 2^{32}$ (uniformly distributed on the hypersphere), until the error values converged, i.e. until they did not differ by more than $\varepsilon = 10^{-5}$ in two consecutive iterations. Note that for each norm, two types of maximum error were considered: the empirical maximum error (MREe), which is calculated numerically over $S$, and the theoretical maximum error (MREt), which is calculated analytically using (7), (9), or (11). A sketch of this estimation scheme follows.
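This sketch is ours and runs on a much smaller sample budget than the paper's $2^{20}$ to $2^{32}$ schedule; the function names are ours.

```python
import numpy as np

# Estimate ARE and empirical MRE of an approximate norm by sampling
# points uniformly on the unit hypersphere (normalized Gaussians [10]).

def are_mre(norm_fn, n, samples=1 << 18, seed=2):
    x = np.random.default_rng(seed).standard_normal((samples, n))
    x /= np.sqrt(np.sum(x * x, axis=1, keepdims=True))  # now D_2(x) = 1
    err = np.abs(np.apply_along_axis(norm_fn, 1, x) - 1.0)
    return err.mean(), err.max()  # (ARE, empirical MRE)

# Example with the unscaled city-block norm on S^2: its MRE should
# approach sqrt(3) - 1 ~ 0.73 as the sample count grows.
print(are_mre(lambda v: np.sum(np.abs(v)), 3))
```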

Table 3. Average and maximum errors for D_λ, D_{μ,λ}, and D_B.

  n     D_λ                        D_{μ,λ}                    D_B
        ARE     MREe    MREt       ARE     MREe    MREt       ARE     MREe    MREt
  2     0.0348  0.0551  0.0551     0.0276  0.0470  0.0470     0.0241  0.0396  0.0396
  3     0.0431  0.0852  0.0852     0.0367  0.0778  0.0778     0.0300  0.0602  0.0602
  4     0.0455  0.1074  0.1074     0.0420  0.1010  0.1010     0.0345  0.0739  0.0739
  5     0.0460  0.1251  0.1251     0.0447  0.1197  0.1197     0.0377  0.0839  0.0839
  6     0.0458  0.1400  0.1400     0.0462  0.1354  0.1354     0.0401  0.0919  0.0919
  7     0.0454  0.1529  0.1529     0.0469  0.1489  0.1490     0.0418  0.0984  0.0984
  8     0.0448  0.1641  0.1643     0.0471  0.1606  0.1609     0.0431  0.1039  0.1039
  9     0.0442  0.1739  0.1745     0.0471  0.1709  0.1716     0.0440  0.1086  0.1086
  10    0.0435  0.1827  0.1837     0.0469  0.1803  0.1812     0.0447  0.1128  0.1128

It can be seen that for $D_B$ the empirical and theoretical maximum errors agree in all cases, which demonstrates the validity of the presented iterative error calculation scheme. This is not the case for $D_\lambda$ and $D_{\mu,\lambda}$, since these norms are optimized over $\mathbb{Z}^n$ instead of $\mathbb{R}^n$. Therefore, a perfect agreement between the empirical and theoretical results should not be expected. Nevertheless, the empirical error is always less than the theoretical maximum error, which is expected because we are maximizing over a smaller set.

Table 4 shows the average and maximum errors for $D_{\alpha,\beta}$. The error values under the column "Seol and Cheun" are taken from [4] (where the simulations were performed on a set of 100,000 $n$-dimensional vectors whose components are independent and identically distributed, zero-mean, unit-variance Gaussian random variables), whereas those under the column "This study" were obtained using the aforementioned iterative scheme. It can be seen that the maximum errors obtained by Seol and Cheun are lower than those that we obtained, and the discrepancy between the outcomes of the two error calculation schemes increases as $n$ is increased. The optimistic maximum error values given by Seol and Cheun are due to the fact that 100,000 vectors are not enough to cover the surface of the hypersphere in higher dimensions. This is investigated further in the following section. On the other hand, the average error values agree perfectly in both calculation schemes.

Table 4. Average and maximum errors for D_{α,β}.

  n     Seol and Cheun       This study
        ARE     MREe         ARE     MREe
  2     0.0200  0.0526       0.0200  0.0525
  3     0.0239  0.0991       0.0239  0.0998
  4     0.0257  0.1342       0.0257  0.1363
  5     0.0268  0.1420       0.0268  0.1649
  6     0.0273  0.1674       0.0273  0.1871
  7     0.0276  0.1772       0.0276  0.1968
  8     0.0277  0.1753       0.0277  0.2076
  9     0.0277  0.1711       0.0277  0.2120
  10    0.0276  0.1526       0.0276  0.2156

By examining Tables 3 and 4, the following observations can be made regarding the maximum error:

- $D_B$ is the most accurate approximation in all cases. This is because this norm is designed to minimize the maximum error, and it has a more sophisticated weighting scheme than the other two approximations based on the same optimality criterion, i.e. $D_\lambda$ and $D_{\mu,\lambda}$.
- As is also evident from Fig. 1, $D_{\mu,\lambda}$ is slightly more accurate than $D_\lambda$, especially for small values of $n$, in accordance with the greater degrees of freedom it is afforded.
- $D_{\alpha,\beta}$ is the least accurate approximation except for $n = 2$. This was expected, since this norm is designed to minimize the mean squared error rather than the maximum error.
- As $n$ is increased, the error increases in all approximations. However, as can be seen from Fig. 1, the error grows faster in some approximations than in others.

On the other hand, with respect to the average error, we can see that:

- As expected, $D_{\alpha,\beta}$ is the most accurate approximation.
- As $n$ is increased, the error increases consistently for the $D_B$ norm. This is not the case for the $D_\lambda$ and $D_{\mu,\lambda}$ norms. This inconsistent average error behavior is not surprising, given that these norms are designed to minimize the maximum error.
- Interestingly, $D_\lambda$ is more accurate than $D_{\mu,\lambda}$ for $n > 5$. A possible explanation for this phenomenon is that both approximations are optimized for the maximum error. Since the minimization of the maximum and average errors are conflicting objectives, it is likely that $D_{\mu,\lambda}$ sacrifices the average error to obtain a better (lower) maximum error. The same relationship holds between $D_{\alpha,\beta}$ and $D_B$.

A small experiment illustrating the undersampling effect behind the optimistic maximum errors is sketched below.
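This sketch is ours, not the paper's experiment: it uses a scaled city-block norm on $\mathbb{R}^{10}$ as a simple stand-in for $D_{\alpha,\beta}$. Its empirical MRE typically keeps growing as the sample budget increases, so a fixed budget such as the 100,000 points of [4] tends to stop well short of the theoretical value.

```python
import numpy as np

# Empirical MRE of the scaled city-block norm D_1/sqrt(n) on S^{n-1},
# estimated from m uniform sphere samples. The theoretical MRE is
# 1 - 1/sqrt(n), attained near the basis vectors, which random samples
# rarely approach in higher dimensions.

def empirical_mre_scaled_d1(n, m, seed=3):
    x = np.random.default_rng(seed).standard_normal((m, n))
    x /= np.sqrt(np.sum(x * x, axis=1, keepdims=True))
    return np.max(np.abs(np.sum(np.abs(x), axis=1) / np.sqrt(n) - 1.0))

for m in (10**4, 10**5, 10**6):
    # the estimate generally increases with m; theoretical value ~0.684
    print(m, empirical_mre_scaled_d1(10, m))
```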

3. Sampling on the unit hypersphere

In this section, we demonstrate why a fixed number of samples from the unit hypersphere (i.e. the approach advocated in [4]) can give biased estimates of the maximum error. The basic reason behind this is the fact that a fixed number of samples fails to suffice as the dimension of the space increases. The following provides a plausibility argument as to why this is the case. To this end, we need to consider the notion of covering a sphere "sufficiently". We begin with some definitions.

A closed $n$-ball of radius $r$ with respect to the Euclidean norm, denoted $B_2^n(r)$, is the set of points whose Euclidean norm is less than or equal to $r$. That is,

$$B_2^n(r) = \{x \in \mathbb{R}^n : D_2(x) \le r\}.$$

Note that, in particular, the unit hypersphere $S_2^{n-1}$ of $\mathbb{R}^n$ is the boundary of $B_2^n(1)$.

Given an $\varepsilon > 0$, we say that a set $C$ of points on $S_2^{n-1}$ is an $\varepsilon$-dense covering of $S_2^{n-1}$ if for any $x$ in $C$ there exists at least one $\hat{x}$ in $C$ (different from $x$) such that $D_2(x - \hat{x}) < \varepsilon$. Essentially, our main purpose here is to give a rough estimate of the number of points in $C$, where $C$ is an $\varepsilon$-dense covering of $S_2^{n-1}$. We would then argue that if $\varepsilon$ is sufficiently small, then $C$ is a fine-enough representation of the points on $S_2^{n-1}$. Therefore, we can restrict any computation that needs to be performed on $S_2^{n-1}$ to the finite set $C$.

The basic idea behind the proof is to approximate $S_2^{n-1}$ by $B_2^{n-1}(\varepsilon)$-balls, that is, to approximate the unit hypersphere of $\mathbb{R}^n$ by $(n-1)$-balls of radius $\varepsilon$. This is the same principle as approximating a circle ($S_2^1$) in $\mathbb{R}^2$ by tiny line segments ($B_2^1(\varepsilon)$), or the surface of a sphere ($S_2^2$) in $\mathbb{R}^3$ by tiny discs ($B_2^2(\varepsilon)$). It is easy to see that if we choose $\varepsilon$ small enough, then the approximation is satisfactory for most practical purposes.

To proceed further, we need a lemma from elementary probability theory, known as the coupon collector's problem [11].

Lemma 1. Given a collection of $c$ distinct objects, the expected number of independent random trials needed to sample each one of the $c$ objects is $O(c \log c)$.
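Lemma 1 is easy to check empirically. The following sketch is ours; it compares the average number of uniform draws needed to collect all $c$ coupons with $c \ln c$.

```python
import numpy as np

# Empirical check of the coupon collector's bound: the average number of
# draws needed to see every one of c coupons grows like c ln c + O(c).

def draws_until_all_seen(c, rng):
    seen, draws = set(), 0
    while len(seen) < c:
        seen.add(int(rng.integers(c)))  # one uniform draw from {0, ..., c-1}
        draws += 1
    return draws

rng = np.random.default_rng(4)
for c in (10, 100, 1000):
    avg = np.mean([draws_until_all_seen(c, rng) for _ in range(200)])
    print(c, avg, c * np.log(c))  # the two columns grow together
```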

We can now prove the following result.

Theorem 1. The expected number of uniformly distributed samples needed to generate an $\varepsilon$-dense covering of $S_2^{n-1}$ is $O(N \log N)$, where $N = n/\varepsilon^{n-1}$.

Proof. Let $\varepsilon > 0$ be given. We will first count the number of identical copies of $B_2^{n-1}(\varepsilon)$-balls that are needed to approximate $S_2^{n-1}$ in the sense described above. By elementary calculus, one can compute the volume of $B_2^n(r)$ to be

$$V_n(r) := \frac{\pi^{n/2}}{\Gamma(n/2 + 1)}\, r^n =: C_n r^n,$$

where $\Gamma$ is the gamma function. The surface area of this ball is equal to the derivative of its volume with respect to $r$:

$$A_{n-1}(r) = \frac{d}{dr} V_n(r) = n C_n r^{n-1}.$$

Note that $A_{n-1}(1)$ is equal to the surface area of $S_2^{n-1}$. The approximate number of $B_2^{n-1}(\varepsilon)$-balls needed to cover the surface of $S_2^{n-1}$ is the ratio of the surface area of $S_2^{n-1}$ to the volume of $B_2^{n-1}(\varepsilon)$, i.e.,

$$\frac{A_{n-1}(1)}{V_{n-1}(\varepsilon)} = \frac{n C_n}{C_{n-1}\, \varepsilon^{n-1}} = \frac{n \pi^{n/2}}{\Gamma(n/2 + 1)} \cdot \frac{\Gamma((n+1)/2)}{\pi^{(n-1)/2}} \cdot \frac{1}{\varepsilon^{n-1}} = \frac{n \pi^{1/2}\, \Gamma((n+1)/2)}{\Gamma(n/2 + 1)} \cdot \frac{1}{\varepsilon^{n-1}} = O\!\left( \frac{n}{\varepsilon^{n-1}} \right).$$

The result now follows once we apply Lemma 1 with $c = n/\varepsilon^{n-1}$. □

In light of the following result, we see that the actual number of samples required does not deviate significantly from the value provided by Theorem 1.

Theorem 2. Let $X$ be the number of samples observed before obtaining one in each region. Then, for any constant $s > 0$, we have

$$P(X > c \ln c + sc) \le e^{-s}.$$

Proof. The probability of not obtaining the $i$-th region after $c \ln c + sc$ steps is

$$\left( 1 - \frac{1}{c} \right)^{c(\ln c + s)} < e^{-(\ln c + s)} = \frac{1}{c\, e^{s}}.$$

By a union bound, the probability that a region has not been obtained after $c \ln c + sc$ steps is only $e^{-s}$. □

Note that one can use a Chernoff bound to obtain an even tighter bound in Theorem 2, since

$$\lim_{c \to \infty} P(X > c \ln c + sc) = 1 - e^{-e^{-s}}.$$

See [11] for details.

We should note that in order to apply Lemma 1, the patches used to cover $S_2^{n-1}$ should be disjoint, which is clearly not the case, since we have used $B_2^{n-1}(\varepsilon)$-balls for this purpose. This leads to an overestimate of the number of samples needed to obtain a dense covering, and thus the argument presented in this section is only a rough estimate. However, as empirically demonstrated in the previous section, a fixed number of samples as in [4] is definitely not sufficient either. To come up with a tight estimate of the number of sample points needed, one has to express $S_2^{n-1}$ as a disjoint union of small patches. The delicacy lies in the requirement that this has to be achieved through a constructive process, in a way that the surface area of each patch can be explicitly computed as a function of the dimension $n$ and a characteristic measure $\varepsilon$. To the best of the authors' knowledge, there is no systematic method in the literature to achieve this.
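To get a feel for the magnitudes involved, the following sketch (ours) evaluates the patch count $c = A_{n-1}(1)/V_{n-1}(\varepsilon)$ from the proof of Theorem 1 and the associated $O(c \log c)$ sample estimate for a fixed $\varepsilon$. Even for moderate $n$, the required number of samples dwarfs the fixed budget of 100,000 used in [4]; as noted above, this is only a rough (over-)estimate.

```python
import numpy as np
from math import lgamma, log, pi

# Patch count c = A_{n-1}(1) / V_{n-1}(eps), computed in log space to
# avoid overflow, and the coupon-collector sample estimate c log c.

def patches(n, eps):
    # A_{n-1}(1) = n * pi^(n/2) / Gamma(n/2 + 1)
    log_area = log(n) + (n / 2) * log(pi) - lgamma(n / 2 + 1)
    # V_{n-1}(eps) = pi^((n-1)/2) / Gamma((n+1)/2) * eps^(n-1)
    log_vol = ((n - 1) / 2) * log(pi) - lgamma((n + 1) / 2) + (n - 1) * log(eps)
    return np.exp(log_area - log_vol)

for n in (2, 5, 10):
    c = patches(n, eps=0.1)
    print(n, c, c * np.log(c))  # e.g. n=2 needs ~31 patches; n=10, ~8e9
```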
4. Conclusions

In this paper, we investigated the theoretical and practical aspects of several Euclidean norm approximations in the literature and showed that they are in fact special cases of the weighted city-block norm. We evaluated the average and maximum errors of these norms using numerical simulations. Finally, we demonstrated that the maximum errors given in a recent study [4] are significantly optimistic.

The implementations of the approximate norms described in this paper will be made publicly available at http://www.lsus.edu/faculty/ecelebi/research.htm.

Acknowledgments

This work was supported by a grant from the Louisiana Board of Regents (LEQSF2008-11-RD-A-12). The authors are grateful to Changkyu Seol for clarifying various points about his paper.

References

[1] D. Chaudhuri, C.A. Murthy, B.B. Chaudhuri, A modified metric to compute distance, Pattern Recognition 25 (7) (1992) 667–677.
[2] F. Rhodes, On the metrics of Chaudhuri, Murthy and Chaudhuri, Pattern Recognition 28 (5) (1995) 745–752.
[3] M. Barni, F. Bartolini, F. Buti, V. Cappellini, Optimum linear approximation of the Euclidean norm to speed up vector median filtering, in: Proceedings of the 2nd IEEE International Conference on Image Processing (ICIP'95), 1995, pp. 362–365.
[4] C. Seol, K. Cheun, A low complexity Euclidean norm approximation, IEEE Transactions on Signal Processing 56 (4) (2008) 1721–1726.
[5] R.B. King, Beyond the Quartic Equation, Birkhäuser, Boston, 2008.
[6] M. Barni, F. Buti, F. Bartolini, V. Cappellini, A quasi-Euclidean norm to speed up vector median filtering, IEEE Transactions on Image Processing 9 (10) (2000) 1704–1709.
[7] V.F. Dem'yanov, V.N. Malozemov, Introduction to Minimax, Dover Publications, 1990.
[8] Y. Ohashi, Fast linear approximations of Euclidean distance in higher dimensions, in: P. Heckbert (Ed.), Graphics Gems IV, Academic Press, 1994.
[9] T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, Introduction to Algorithms, The MIT Press, 2009.
[10] M.E. Muller, A note on a method for generating points uniformly on N-dimensional spheres, Communications of the ACM 2 (4) (1959) 19–20.
[11] M. Mitzenmacher, E. Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, Cambridge University Press, 2005.