<<

1 Geodesic Entropic Graphs for and Entropy Estimation in Learning Jose A. Costa, Student Member, IEEE, Alfred O. Hero, III, Fellow, IEEE

Abstract— In the manifold learning problem one seeks to manifold and the reconstruction of the manifold from a set of discover a smooth low dimensional , i.e., a manifold samples from the signal class [1]–[7]. This problem falls in the embedded in a higher dimensional linear , based on area of manifold learning which is concerned with discovering a set of measured sample points on the surface. In this paper we consider the closely related problem of estimating the manifold’s low dimensional structure in high dimensional data. intrinsic dimension and the intrinsic entropy of the sample points. When the samples are drawn from a large population Specifically, we view the sample points as realizations of an of signals one can interpret them as realizations from a unknown multivariate density supported on an unknown smooth multivariate distribution supported on the manifold. As this manifold. We introduce a novel geometric approach based on distribution is singular in the higher dimensional embedding entropic graph methods. Although the theory presented applies to this general class of graphs, we focus on the geodesic-minimal- space it has zero entropy as defined by the standard Lebesgue spanning-tree (GMST) to obtaining asymptotically consistent integral over the embedding space. However, when defined as a estimates of the manifold dimension and the Renyi´ -entropy Lebesgue integral restricted to the lower dimensional manifold of the sample density on the manifold. The GMST approach is the entropy can be finite. This finite intrinsic entropy can be striking in its simplicity and does not require reconstructing the useful for exploring data compression over the manifold or, manifold or estimating the multivariate density of the samples. The GMST method simply constructs a minimal spanning tree as suggested in [8], clustering of multiple sub-populations on (MST) sequence using a geodesic edge and uses the overall the manifold. The question that we address in this paper is: of the MSTs to simultaneously estimate manifold dimen- how to simultaneously estimate the intrinsic dimension and sion and entropy. We illustrate the GMST approach on standard intrinsic entropy on the manifold given a set of random sample synthetic as well as on real data sets consisting of points? We present a novel geometric probability approach images of faces. to this question which is based on entropic graph methods Index Terms— Nonlinear dimensionality reduction, minimal developed by us and reported in publications [8]–[10]. spanning tree, intrinsic dimension, intrinsic entropy, manifold Techniques for manifold learning can be classified into three learning, conformal embedding. categories: linear methods, local methods, and global methods. Linear methods include principal components analysis (PCA) I. INTRODUCTION [11] and classical multidimensional scaling (MDS) [12]. They are based on analyzing eigenstructure of empirical covariance ONSIDER a class of natural occurring signals, e.g., matrices, and can be reliably applied only when the manifold recorded speech, audio, images, or videos. Such signals C is a linear subspace. Local methods include linear local typically have high extrinsic dimension, e.g., as characterized imbedding (LLE) [2], locally linear projections (LLP) [13], by the number of pixels in an image or the number of time Laplacian eigenmaps [14], and Hessian eigenmaps [3]. They samples in an audio waveform. However, most natural signals are based on local approximation of the of the man- have smooth and regular structure, e.g. piecewise , ifold, and are computationally simple to implement. Global that permits substantial dimension reduction with little or approaches include ISOMAP [1] and C-ISOMAP [15]. They no loss of content information. For support of this fact one preserve the manifold geometry at all scales, and have better needs only consider the success of image, video and audio stability than local methods when the number of manifold compression algorithms, e.g. MP3, JPEG and MPEG, or the samples is limited. widespread use of efficient computational geometry methods

With regards to estimation of the intrinsic dimension ¡ for rendering smooth three dimensional shapes. several methods have been proposed [11], [16]. Most of A useful representation of a regular signal class is to model these methods are based on linear projection techniques: a it as a set of vectors which are constrained to a smooth low is explicitly constructed and dimension is estimated dimensional manifold embedded in a high dimensional vector by applying Principal Component Analysis (PCA), factor space. This manifold may in some cases be a linear, i.e., analysis, or MDS to analyze the eigenstructure of the data. Euclidean, subspace but in general it is a non-linear curved These methods rely on the assumption that only a small surface. A problem of substantial recent interest in machine number of the eigenvalues of the (processed) data covariance learning, , signal processing and statistics is

will be significant. Linear methods tend to overestimate ¡ the determination of the so-called intrinsic dimension of the as they don’t account for non-linearities in the data. Both This work was supported in part by NSF under grant CCR-032557 and by nonlinear PCA [4] methods and the ISOMAP circumvent this the NIH Cancer Institute under grant 1PO1 CA87634-01. problem but they still rely on possibly unreliable and costly J. A. Costa and A. O. Hero, III are with the Department of Electrical and Computer Science, University of Michigan, Ann Arbor, MI eigenstructure estimates. Other methods have been proposed 48109-2122 USA (email: [email protected]; [email protected]). based on local geometric techniques, e.g., estimation of local 2

neighborhoods [6] or fractal dimension [17], and estimating subset  of the plane. Naturally, the shortest path between

¢ ¦¢£ © packing numbers [5] of the manifold. any pair  of these points is given by the straight

We propose a geodesic-minimal-spanning-tree (GMST) in  connecting them, with corresponding

© ¢ "!#¢$ 

method that jointly estimates both the intrinsic dimension and given by its Euclidean ( ) , . Now let

intrinsic entropy on the manifold. The method is implemented be used as a parameterization space to describe a curved

&%  *  %

as follows. First a complete geodesic graph between all pairs surface in via a mapping ')( . Surfaces

. +),

of data samples is constructed, as in ISOMAP or C-ISOMAP. '- defined in this explicit manner are called domain Then a minimal spanning graph, the GMST, is obtained by or parameterized manifolds and they inherit the topological

pruning the complete geodesic graph down to a subgraph dimension, equal to 2 in this case, of the parameterization +

that still connects all points but has minimum total geodesic space. When ' is non-linear, the shortest path on between

 

¢  ¢ 

, ,

 

/ '- / '- length. The intrinsic dimension and intrinsic -entropy are points and is a on the surface then estimated from the GMST length functional using a called the geodesic curve. In this paper we will primarily simple linear least squares (LLS) and method of moments consider domain manifolds defined by conformal mappings

(MOM) procedure. ' . Such conformal embeddings have the property that the The GMST method falls in the category of global ap- angles between tangent vectors to the surface are identical to proaches to manifold learning but it differs significantly from angles between corresponding vectors in the parameterization the aforementioned methods. First, it has a different scope. space, possibly up to a smoothly varying local scale factor.

Indeed, unlike ISOMAP and C-ISOMAP, the GMST method This property guarantees that, regardless of how the mapping



+ +

provides a statistically consistent estimate of the intrinsic ' “deforms” onto , the geodesic in are  entropy in addition to the intrinsic dimension of the mani- closely related to the Euclidean distances in . When this fold. To the best of our knowledge no other such technique smooth surface representation holds there exist algorithms, e.g. has been proposed for learning manifold dimension. Second, ISOMAP and C-ISOMAP [1], [15], which can be used to unlike local methods that work on chunks of data in local estimate the Euclidean distances between points in  from neighborhoods, GMST works on resampled data distributed estimates of the geodesic distances between points in + . over the global data set. Third, the GMST method is simple If a certain type of minimal spanning graph is constructed and elegant: it estimates intrinsic entropy and dimension by using these estimates, well established results in geometrical detecting the rate of increase of a graph as a function of the probability [8], [20] allow us to develop simple estimates of number of its resampled vertices. both entropy and dimension of the points distributed on the The aims of this paper are limited to introducing GMST as a surface. novel method for estimating manifold dimension and entropy of the samples. As in work of others on dimension estimation B. Setting [5], [17] we do not here consider the issue of reconstruction of In the following, we recall some facts from differential the complete manifold. Similarly to these authors, we believe geometry needed to formalize and generalize the ideas just that dimension estimation and entropy estimation for non- described. We will consider smooth manifolds embedded in linear data are of interest in their own right. We also do  0 . For the general theory we refer the reader to any standard not consider the effect of additive noise or outliers on the

book in differential geometry (for example, [21], [22], [23]). 120 performance of GMST. Finally, the consistency results of +

An ¡ -dimensional smooth manifold is a set such that GMST reported here are limited to domain manifolds defined each of its points has a neighborhood that can be parameterized by some smooth unknown mapping. The extension of GMST  3 by an of through a local change of coordinates. methodology to general target manifolds, e.g. those defined by +

Intuitively, this means that although is a (hyper) surface in

 0  implicit level set embeddings [18], [19], is a worthwhile topic 3

, it can be locally identified with . * for future investigation. +

Let '4(6587 be a mapping between two manifolds,

¦ +

What follows is a brief outline of the paper. We review ¢

9 5 :;'

5 . Let be a curve in . The tangent map assigns ¢

some necessary background on the of domain ¢

5 :;' <

each tangent vector < to at point the tangent vector ¢=

manifolds in Sec. II. In Sec. III we review the asymptotic + <

to at point '- , such that, if is the initial velocity of 

theory of entropic graphs and obtain several new results ¢

5 :;' < '6>9

9 in , then is the initial velocity of the curve

¢@?A8 B  3 required for their extension to embedded manifolds. In Sec. +

in . For example, if 5 , with an open

KJ ON

 ¢¥

3

,DCFE CFE1,HG I IML

IV we define the general GMST algorithm. Finally, in Sec. V ¢

<  < '

set of , then :;' , where ,

¦  O¦¨S ¦  ¦

,RQ ,UQ

we test the GMST algorithm on standard synthetic manifolds P ¡ , T , is the Jacobian matrix associated

and on a real data set consisting of human faces from different ¢? 5

with ' at point .

N

¦ *

X Q +

subjects. G ¤

The length of a smooth curve VW( is defined as

 

Y ,\[

]

£^ V=>` :a`

ZV . The geodesic distance between points

¦ ?

+ ¤ EOMETRIC ACKGROUND ^ _

II. G B ] / / is the length of the shortest (piecewise) smooth

A. A 3D Example curve between the two points:

Scb ¦  ¡   ¦  k

,edf;g Y X , Q ,

¤ ¤ ]

To illustrate ideas consider a 2D surface embedded ]

>/ / iV (jV= / V= / h in 3D ,© called the embedding space. Let ¡£¢¥¤§¦¨¢ © ¦   be a set of points (samples) in a We can now define the following types of embeddings. 3

It has long been known [24] that, when suitably normalized,

*

+

( 5R7

Definition 1: ' is called a conformal mapping the sum of the edge weights of certain minimal Euclidean

Q

¤

' ' 

if is a diffeomorphism (i.e., is differentiable, bijective spanning graphs % over converges with probability

¢ ?

 S

Q

[+*-,

¡

0

' 5 '

!/. Z/ /

with differentiable inverse ) and, at each point , (w.p. ) to the limit ) , where the integral is

? ¦  §

X Q X

0

 )

preserves the angles between tangent vectors, i.e., interpreted in the sense of Lebesgue, and .

[

¢

¢

  ¢¥ ¦

,¦¥

¢ ¢¤£ £ .

This a.s. limit is the integral factor ! in what we will call

< Z:;'  < Z:;' (1)

the extrinsic Ren´ yi -entropy of the multivariate Lebesgue

¢ ¢¥¤§

¥ X

£

! 0

5 

for all vectors < and that are tangent to at , and density :

¢ ¢A?

,

* Q

is a scaling factor that varies smoothly with . If for all 5 ,

  S

,

¢¥

¥ ,BQ

*

,

!

.

Q

1! ! >/ / '

 , then is said to be a (global) isometry. In this case (3)

. 325476

the length of tangent vectors is also preserved in addition to

* Q

the angles between them. In the limit, when we obtain the usual Shannon entropy,

!  

*

[ ,

!&Z/ !&>/ :a/

   

3 . Graph constructions that converge to 25476

If there is an open set 5 [21], then the

¢

¢¥ ¢¥

CFE ,

C E the integral in the limit (3) were called continuous quasi-

' 

diffeomorphism is a conformal mapping iff 

¨ ¢¥©¨

¥ additive (Euclidean) graphs in [20] and entropic (Euclidean)

3 3

¡ ¡  , where is the identity matrix. In this case,

+ graphs in [8]. See the monographs by Steele [25] and Yukich

the geodesic distance in can be computed as follows.

N

¦ *

X Q +

G [20] for an excellent introduction to the theory of such random 7

Any smooth curve V( can be represented as

N

   ¦ *

G X Q

, Euclidean graphs. Relevant details for these results are given

'- 9 >` 9 ( 7 5

V2>` , where is a smooth curve in

  3

Y in the next subsection. V

. Then, the length ZV of the curve is given by

¤

¤ The -entropy has proved to be an important quantity in

     

CFE Y ,¦ ,

: signal processing, where its applications range from vector

 9 >` 9&Z` :;` ZV '- 9 >` :;`

] ]

quantization [26], [27] to pattern matching [28] and image

:a`

¤

registration [8], [29]. The -entropy parameterizes the Cher-

  

,¦ ¥

9 >` 9&Z` :;`

 noff exponent governing the minimum probability of error [30] 

] making it an important quantity in detection and classification

3 

As in the shortest path between any two points is given by problems. Like the Shannon entropy, the -entropy also has an

 ¢ ¢2¤-! ¢ 

,

] ]

¤ `O

the straight line that connects them, 9&Z` operational characterization in terms of source coding rates. In

 

[

]

9 Z` :;`

minimizes , over all smooth with start and [31] it was shown that the -entropy of a source determines

¢ ¢2¤ ¢¥

¥

] 

end points at and , respectively. So, if is constant, the achievable block-code0 rates in the sense that the probability

¢¥ ?

¥ ,¥ L

,

* 5

i.e.  for all , the geodesic distance between of block decoding error converges to zero at an exponential

¢2¤O ¢ 

, ,



¤

]

]

/ / '- '6 8!

and is rate with rate constant  .

.

S ¢  ¦ ¢=¤O ¢ !W¢2¤

b

,¦ ¥

] ]

'6

Z'- (2)

20

, Q

¥ A. Beardwood-Halton-Hammersley Theorem in

'

¡' ¤F¦  ¦:   0

When , i.e., is an isometry, the geodesic distance in ,

+ 

and the in the parameterization space Let 9 be a set of points in . A

 §

3

¥ Q ¥ Q

are the same. If ( ) there is a global expansion minimal Euclidean graph spanning  is defined as the graph 

spanning  having minimal overall length ,

(contraction) in the distances between points. *



;

>= d f

It is evident from the above discussion that geodesic dis- ,

;



< Ea

¢-?A@CB (4) ?7¢

tances carry strong information about a non-linear domain D

+

 

 !F ¦

, , G

manifold such as . However, their computation requires the P

+

E E

Here the sum is over all edges (e.g., T ) in

'

? ¦S 

knowledge of the analytical form of via and its Jacobian. X

Ea E 9  Our goal is to learn the entropy of non-linear data on a domain the graph % , is the Euclidean length of , and

manifold together with its intrinsic dimension, given only the is called the edge exponent or power-weighting constant. For

¥0

 

example when H is the set of spanning trees over one  data set  of samples in the embedding space , and obtains the MST. A minimal Euclidean graph is continuous without knowledge of its embedding function ' . quasi-additive when it satisfies several technical conditions III. ENTROPIC GRAPH ESTIMATORS ON EMBEDDED specified in [20] (also see [9]). Continuous quasi-additive

MANIFOLDS Euclidean graphs include: the minimal spanning tree (MST),

¤

¡  ¦ ¦  

, I

the I -nearest neighbors graph ( -NNG), the minimal matching

   Let  be independent identically =0 graph (MMG), the traveling salesman problem (TSP), and their distributed (i.i.d.) random vectors in a compact subset of , power-weighted variants. While all of the results in this paper with multivariate Lebesgue density ! , which we will also call

apply to this larger class of minimal graphs we specialize to # $ random vertices. Define the distance matrix " as the the MST for concreteness.

matrix of edge weights (w.r.t. a specified metric). A spanning

¡'&2¦ (  &

, A remarkable result in geometric probability was estab-

 

  graph % over is defined as the pair where ( lished by Beardwood, Halton and Hammersley [24].

and is a subset of all graph edges connecting pairs of

& "

vertices in , with weights given by " . When is computed Beardwood-Halton-Hammersley (BHH) Theorem [20], [25]: 9 from pairwise Euclidean distances, % is called a Euclidean Let be an i.i.d. set of random variables taking values in a spanning graph. compact subset of  0 having common probability distribution

4

*



¤

  

b

+ ,¢¡

¡

¤£

¤

;

Z' <9

. Let this distribution have the decomposition on . Then, the length functional of the



¡

¡

£

 9

where is the Lebesgue continuous component and is the MST spanning ' satisfies

*



¤

0'& 0'&

J



;

d5= , ¡

singular component. The Lebesgue continuous component has ¡



;



Z' < 

, (8)

* ©¨

a Lebesgue density (no singular component) which is denoted 

2

 ?  0 

L L

()

¦ S/.



;



&  

! , . Let be the length of the MST spanning

¡

S¦¥¤§ S

X   Q

*

¢

. "5768/9

; b S  ¦ S

[ : , C C§E



 9

E 10

and assume that and . Then, w.p. b

)+-,

3

.

) :324 ! >/  / ¡

,

*

¦ S/.-§

X

J

  S

d5= ,

¡

0

*/,

;

. .



<  ) ! Z/ /

(5)

©¨

J

! 

Q ,

2

*



¤

 ¡ 9 ¡

w.p. , where . Furthermore, the mean

N>J

(   0'& 0'&

G

;

J

S !  S

¡ ¡

,

,



0 ;



i'   

*

 9 )

where and is a constant not depending on the converges to the same limit.

N>J

( 

G

;

 .

<  distribution . Furthermore, the mean length  Proof: This theorem is a simple consequence of relation

converges to the same limit. (5) in the BHH Theorem and properties of integrals over Q

The limit on the right side of (5) in the BHH theorem is manifolds. By the BHH Theorem, w.p. ,

*



3 3 3 3

 ¦ ¢¥;Sc¢ 

; ;

zero when the distribution has no Lebesgue continuous ,

¡ ¡

¡ X

>=

3

<

*

 

;

.





) !  <   

component, i.e., when . On the other hand, when

X £

has no singular component, i.e., , a consequence of the ¤ (9)

 

 

,

¡

<

 ' 

BHH Theorem is that where ! is the density of . Therefore the

,

0

S;. S/./§



*



,



¡ ¡ *

S limits claimed in (8) for and are obvious. For

S/.

,



;

<9

! 

def,

¡ 

0 , relation (9) implies

) <9

0 0 

; (6)

¡

*



.

25476 25476

9



3 3

J

¢¥aS ¢-¦ 

;

d = ,

¡

<

3

*

0 

;

.



 !  ¤  )

(10)

?¨

, 2

is an asymptotically unbiased and* strongly consistent estimator



8!

of the extrinsic -entropy defined in (3). For a and it remains to show that this limit is identical to the limit

. 0 discussion about the role of the constant ) in the proposed asserted in (8).

estimators see section IV. For an integrable function ¡ defined on a domain manifold

 *

3

+ + 7

defined by the diffeomorphism 'A( , the integral + of ¡ over satisfies the relation [22]:

B. Generalization of BHH Thm. to Embedded Manifolds

; b S  ¢2 ;@ ¢¥aSc¢ ¦

¡ , ¡

*



Z/  / i'6 

b (11) 

If the vertices  are constrained to lie on a smooth compact



 ¥0  +

¡ -dimensional manifold , the distribution of

¡ X

¢

@ ¢& "

,BA C C

E

E

:32C4

is singular with respect to Lebesgue measure, , and, where  is the Riemannian metric asso- ¡ as previously mentioned, the limit (5) in the BHH Theorem ciated with + . Specializing to the indicator function of a

is zero. However, as shown below, if + is defined by an

/ 

3 small volume centered at a point , (11) implies the following

  b S 

3 ,

isometric embedding from the parameterization space , if +

 / 

 relation between volume elements in and :

+

@ ¢ aS ¢-  

¡ ,

has a density ! on , and if the geodesic estimation

 !&>/

Furthermore, specializing to >/ it is clear

¢¥ ¢¥ D@ ¢¥ ,

step of ISOMAP is used to approximate geodesic distances, <

 !&Z'-  from (11) that ! . Therefore

then the length of an MST constructed from the0 geodesic edge

¢¥ S ¢ ¢¥D@ ¢2 Sc¢

,

b

<

* *

. .

 

 8!&Z'- 

lengths can be made to converge, after suitable normalization !



+

1!

and transformation, to the intrinsic -entropy on

¤

.

¢¥E@ ¢¥ S ¢#¦ ¢¥D@

,

¡

defined by *

0

. .



  ! Z'-

b

  S  ¦

b

,

¡

¢ * ¢¥

.

8! ! >/  /

b '-

(7) which, after the change of variable 7 , is equivalent

.

2 4A6

9 to the integral in the limit (8).

 S 

b / where  denotes the differential volume element over

+ . 20 + C. Estimating Geodesic Distances

More generally, assume that is embedded¤ in through

  i 

3

, ¡

If ' is an isometric or conformal embedding then it has

 '  + the diffeomorphism ' . As lives in , let

been shown that for sufficiently dense sampling over ,

* *

¤

 

% be the Euclidean minimal graph spanning and having

 #"

, ¡

i.e., for large  , the ISOMAP or the C-ISOMAP algorithm

; ;!

   ' <9 length function according to

summarized in Table I will approximate the matrix¤ of pairwise

 ,

definition (4). We have the following extension of the BHH ¡

 

< '

Euclidean distances between points  in the  Theorem. 3

domain space without explicit knowledge of ' . This

+ F

Theorem 1: Let be a smooth compact ¡ -dimensional estimate is computed from an Euclidean graph connecting

¥0 +

manifold embedded in through the diffeomorphism '8( all local neighborhoods of data points in . Specifically,

* ¦   §%$ $ S

3

+ X> 

7 5 ¡ 9

5 . Assume and in the isometric case, ISOMAP proceeds as follows. Two

 ¤§¦: © ¦ 

+

G I

¡ . Suppose that are i.i.d. random vectors on methods, called the -rule and the -rule [1], are available F having common density ! with respect to Lebesgue measure for constructing . The first method connects each point to 5

TABLE I Q w.p. , then the length functional of the GMST satisfies

DISTANCE ESTIMATION STEPS OF ISOMAP/C-ISOMAP ALGORITHMS TO

b



0'& 0'&

J



;

d5= , ¡

RECONSTRUCT EUCLIDEAN DISTANCES BETWEEN ¢¡ ON THE



;



< 

(14)

©¨ 2

EMBEDDING PARAMETERIZATION SPACE FROM POINTS £¤¡ OVER THE

(

¦ S/.



¤ ¡

EMBEDDED MANIFOLD. *



 @  3 S  ¦ S/.

b

,

[

+

¡4 ¡

b

3

,

) !/. Z/ Z' Z/  / ¡

¦ S/./§

X ¡

Step 1. Determine a Euclidean neighborhood graph ¥ of the observed

¦ §

data £¤¡ according to the -rule or the -rule as defined in

¢

J "

!  @ ¢¥

Q , , A C CFE

ISOMAP [32]. E

 ¡ 9 ¡  :32C4

w.p. , where and .

b



NZJ

(  0'& 0'&

G

; ¡

Step 2. For isometric embeddings compute the edge matrix ¨ of the

; 



  

ISOMAP graph [1] and for conformal imbeddings compute Furthermore, the mean converges to ©  the edge matrix ¨ of the C-ISOMAP graph [15]. The the same limit. entry of this symmetric matrix is the sum of the lengths of the

edges in ¥ along the shortest path between the pair of vertices The sufficient condition (13) of Corollary 1 simply states ,

©  where the edge lengths between neighboring points

¤ 

that the pairwise Euclidean distances between points in 



  ¥    

, in are defined as Euclidean distance - ¡



<

' should be uniformly well approximated by the

 

    ! " #$©&%' #$© () b

in the case of ISOMAP or - in the b

" " , case of C-ISOMAP where #$©* + is the mean distance of entries of matrix . Constructions of which satisfy

to its immediate nearest neighbors. condition (13) will be discussed in the next subsection.

b





;

<9

Proof of Corollary 1: Write as

;





S 

b G

all points within some fixed radius and the other connects 



;

, = d f

 E



;



< Ea

I F

?A@ B

each point to all its -nearest neighbors. The graph defining ¢

Ea ?7¢ the connectivity of these local neighborhoods is then used to D approximate the geodesic distance between any pair of points The uniform convergence expressed by condition (13) implies

as the shortest path through F that connects them. Finally, this

¦ 

P

*

 ¤

that b



 T

.   

;

Q Q

results in a distance matrix whose entry is the geodesic ,

¡

¦ 

>= P

; ;

 

<     Z' T

distance estimate for the  -th pair of points. The geodesic

J

!  ! ¢

, ,

distance estimation algorithm just described is motivated by Q

¤ ¤

9 ¡

Applying Thm. 1 and identifying i ,

¢

 " @  

C CFE ,

E ¡

the fact that locally a smooth manifold is well “approximated” ¡

' >/ :324 >/ and Z' provides the desired by a linear hyper-plane and, so, geodesic distances between result.

neighboring points are close to their Euclidean distances. For

§¤§ S/& §

faraway points, the geodesic distance is estimated by summing If ¡ , as the parameter is increased from to the

the sequence of such local approximations over the shortest limit (14) in Corollary 1 transitions from infinity to a, finite

S3& ,

path through the graph F . limit and finally to zero over three consecutive steps

& &

! ¦ ¦ S/& 0 0

Q Q

;

¡

*



¤





Thus, if one uses these distances to construct an MST, b

¡ ¡ ¡ 

 . As indexes the rate constant of



¡



;



< i'

its length function will approximate and we ;

<9 the length functional , this abrupt transition suggests can invoke Thm. 1 to characterize its asymptotic convergence that the intrinsic dimension ¡ and the intrinsic entropy might

properties. As the estimated distances will use information

 

¦    be easily estimated by investigating the growth rate of the about the geodesic distances between pairs of points  GMST’s length functional. This observation is the for

this graph will be called a geodesic MST (GMST). 

b the estimation algorithm introduced in the next section.

¤ ¤

More specifically, denote by " the matrix of estimated

K  £

 We now specialize Corollary 1 to the following cases of

¡ ¡

¤

 ' 

pairwise distances between points ' and in



 

S 

 interest.

¡

¤ ¤



   E

' , and by the estimated length of the corre-

'

   

  !  

, 1) Isometric Embeddings: In the case that defines an

¡ ¡

'  ' 

sponding edge E . Define the GMST isometric embedding, the geodesic estimation step of ISOMAP

 

% as the minimal graph over whose length is: ¤

b is asymptotically able to recover the true Euclidean distances





b

,





¡

 S .

;

, = d f

"  ' <9

between the points in  and satisfies

;



  1E

¢

¨

¢ ?A@

B C C ,

(12) E

E

3 ?7¢

D condition (13) [32]. Furthermore, . Thus, using



b S/.

, ¡ The following is the principal theoretical result of this ISOMAP to construct " , limit (14) holds with the

paper and is a simple consequence of Thm. 1. limit replaced by

+

; b S O

¡

Corollary 1: Let be a smooth compact -dimensional 3

.

) ! Z/  /

b ¥0

manifold embedded in through the diffeomorphism '8(

 * § $ $RS

3

+ X  

b

5



7 ¡ 9 ¡

J J

 ! 3

. Let and . Suppose that 3

;  

¡

¤

 ¦  ¦ 

+

376



;

9 <9  ) Furthermore, ¡ con-

 are i.i.d. random vectors on with common

Q

25476 25476

$b +

density ! w.r.t. Lebesgue measure on . If the entries verges w.p. to the intrinsic entropy (7).

 

 

¡ S  b '

of matrix " satisfy If defines an isometric embedding with contraction or



 

S expansion, the geodesic estimation step of ISOMAP algorithm

! * *

=.-0/ Q X

¤ ¤

21 +3 )1

¤ is able to recover the true Euclidean distances between points

 

  !  



¥

¡

¡ as (13)



'  ' 

¤

in  only up to an unknown scaling constant (c.f. (2)). As

,

6

¢

¨ S;.

C C ,¦¥ , E

E TABLE II 3

, limit (14) holds with the ¡ limit replaced GMST RESAMPLING ALGORITHM FOR ESTIMATING INTRINSIC

by ¡£¢

¤ © ¤

DIMENSION AND INTRINSIC ENTROPY .

; S 

b

; 

¥

¡

3

.

) ! >/  / b

Initialize: Using entire database of signals £¤¡

¥

¦ Q

Now, the entropy estimator defined above converges w.p. up construct geodesic distance matrix ¢ using ISOMAP

J

! §j ¥

Q or C-ISOMAP method. 9

to an unknown additive constant  to the intrinsic

¨§ © § © § ©  

Select parameters: # , , and 25476 entropy (7). We point out that in many signal processing 

applications (e.g. image registration) a constant bias on the ¡ ©

© , ;

  % !" #

entropy estimate does not pose a problem since an estimate for #

2 "!" %& for #$

of the relative magnitude of the entropy functional is all that '

is required. (© ;

$% " )

for

£+*

2) Non-isometric Embeddings Defined by - Randomly select a subset of  signals ¡

from £ ; '

pings: In the case that is a general (non-isometric) con- '

£,*

Compute geodesic MST length * over and ¥ formal mapping, it was stated in [33] without proof, that ¦

from ¢ ; '.-$'

the C-ISOMAP algorithm is once again able to recover the ' *

 ;

b 

true Euclidean distances between points in  . Furthermore, end for



¢

¢¥©¨ 

C CFEe,¥ E

3 Compute sample average geodesic MST length;

;



 < ¥ ¥

¢

' '

. Thus, when is the length of the /10

2

,*043 !" geodesic MST constructed on the distance matrix generated ©*£ ;

end for

¥ ¥

by the C-ISOMAP algorithm, we expect the limit (14) to hold ¡

& &

65 7 5

/. S Estimate dimension and -entropy from

,

¥ ¥

8 *;:

¢

/10 '

¡

2

©*£+* 43%9

with the limit replaced by *;< * via LLS/NLLS;

¥

9

¥

- ¡ ¡>- ¡

© ¤

& &

65  5

= , ;

 ; S O

b

;

¥

¡ ¡ 3

. end for

) ! >/ i' >/  /

b

¥

¥

¡ ¡

! #  ! #

 ,

b

5



J J

 !

3 3

;  

¡

376



;

9 <9  )

In this case, ¡ would

25476 25476

b





, ?

converge a.s. to the weighted intrinsic entropy ?

;

 9 

 . According to (14) has the following

Q

© ¤

25476

  ; .

b

; 

¥

¡

¡ approximation

!

.

Q

! Z/ Z' >/ >/

b

¦

,A@

?

25476

B 

0

  G

 (15)

25476 b The weighted -entropy is a “version” of the standard un-

 where

1!

weighted -entropy which is “tilted” by the space-

J

!  ¦

0

@ ,

+

.

 ¡ 9 ¡

varying volume element of . This unknown weighting b

J

 ¦

, (16)

B 

3

) ¡ 1!

makes it impossible to estimate the intrinsic unweighted - 9

.

25476

J

! 

entropy. However, as can be seen from the discussion in the ,



 ¡ 9 ¡ G

and is an error residual that goes to zero *

next section, as the growth rate exponent of the GMST length Q  w.p. as . 0 depends on ¡ we can still perform dimension estimation in

The additive, model (15) could be the basis for many differ- this case.

ent methods for estimation of ¡ and . For example, we could

3) Non-conformal Diffeomorphic Embeddings: When ' de- invoke a central limit theorem on the MST length functional

fines a general diffeomorphic embedding, an extension of the  [34] to motivate a Gaussian approximate to G and apply C-ISOMAP algorithm that can provably learn the Euclidean

maximum likelihood principles. However, in this paper we  distances between the points  in the parametrization space adopt a simpler non-parametric least squares strategy which

is needed in order to apply Corollary 1. To the best of our  is based on resampling from the population  of available knowledge such an extension of C-ISOMAP does not yet exist. +

points in . The proposed algorithm is summarized in Table

¤ ¤

$  ¦ $ ¦  ¦

Q  

C CED  CED

II. Specifically, let C , , be

J

,G

IV. GMST ALGORITHM £

F 

integers and let F be an integer that satisfies for

N

? ¦ ?B¡ ¤F¦  O¦ 

G X Q



 C C C

Now that we have characterized the asymptotic limit (14) some fixed . For each value of D

¦ ¦

, Q

H



 T F

of the length functional of the GMST we here apply this randomly draw F bootstrap data sets , ,

H  theory to jointly estimate entropy and dimension. The key with replacement, where the C data points within each

is to notice that the growth rate of the length functional is are chosen from the entire data set  independently. From

¡

¤KJL 

strongly dependent on the intrinsic dimension , while the these samples compute the empirical b mean of the GMST





, ,

!M$¤

¡

H

; H

F   N

constant in the convergent limit is equal to the intrinsic - length functionals I . Defining

¢

N

¦ ¦

G

H H

entropy. We use this strong growth dependence as a motivation

: , and motivated by (15) we write down

9

25476 2 4A6 b

for a simple estimator of ¡ . Throughout we assume that the 

the linear model @



¦

,AOQP

;

<9

geodesic minimal graph length is determined from a 



N G

b (17) B R

distance matrix " that satisfies the assumption of Corollary b

1, e.g., obtained using ISOMAP or C-ISOMAP. We also ¢

 where

J 0'& 0'&

¥ § 

¤



;

¡



; O#,SP

¡ <9  CTD

assume that . This guarantees that C

S3.

, 

Q Q

25476 25476 R has a non-zero finite convergent limit for ¡ . Next define 7

Fig. 1. Computing the dimension estimators by averaging over the length

#  ©&% " 

functional values, i.e., © (dashed line), or by averaging over

£

Fig. 2. The S-shaped surface manifold and corresponding GMST ( § )

7E$©# '%' the dimension estimates, i.e., ©# (solid lines).

graph on 400 sample points.

0

@

N

¦ 3

G X Q B

Expressing and explicitly as functions of ¡ and via

. the unit cube . Closed form expressions are not available (16), the dimension and entropy quantities could be estimated but several approximations and bounds can be used in various using a combination of non-linear least squares (NLLS) and

regimes of ¡ [20], [35]. For example, one could use the large 

integer programming. Instead we take a simpler method-of- 3 ¡

approximation of Bertsimas and van Ryzin [36]: )

J J

§ § 

moments (MOM) approach in which we use (17) to solve for 25476



9  ¡ E

¦ ¦

@ @

 . Another strategy, adopted in this paper, is

B B

2 4A6

the linear least squares (LLS) estimates of followed 3 to determine ) by simulation of the Euclidean MST length by inversion of the relations (16). After making a simple large on the ¡ -dimensional cube for uniform random samples.

 approximation, this approach yields the following estimates: Before turning to applications we briefly discuss compu-

J

¡ !  

, Q @

  

0 tational issues. For samples, computing the MST scales

¡ 9 

round 



3

b

   

5 

 as , for which we have implemented Kruskal’s

!

¡  ,

¡ (18)

25476

B

3 6



© £¢

) algorithm [29]. On the other hand, the geodesic distances



.

25476

9

 <

needed to compute the GMST require  operations 25476

It is easily shown that the law of large numbers and Thm. 1 using Dijkstra’s algorithm multiple© times. Thus, like ISOMAP,

¦ *



F 

  

imply that these estimators are consistent as . We the GMST has overall  computational complexity.

2 4A6 , omit the details. 0 By running the algorithm ¤ times independently over the

 V. APPLICATIONS

¡  ¦  ¦¥



¤

%M$¤

9 ¡ 0 population , one obtains estimates,0 , that

We illustrate the performance of the GMST algorithm on J

can be averaged to obtainJ final regularized dimension and

 

 J  J

, ,

  ¤

¤ manifolds of known dimension as well as on a real data set ¡ entropy estimators, ¡ and . The

¤ consisting of face images. In all the simulations we fixed the

¤

! ¦  O¦ !

,BQ , , Q

role of parameter , together with parameter F , is to provide

£

C  CTD  parameters 9 and . We also a tradeoff between the bias and variance performance of the

used the I -rule method, as described in table I, to estimate

estimators for finite  . The two cases of interest (considered in

¦  ¦  ¦  ¦ 

, Q , Q

¤ ¤

¤ geodesic lengths. With regards to intrinsic dimension estima-

F  F  F  the next section) are  and . tion, we compare our algorithm to ISOMAP. In ISOMAP, In the first case, the smoothing is performed on the GMST similarly to PCA, intrinsic dimension is usually estimated by length functional values before dimension and entropy are detecting changes in the residual fitting errors as a function of estimated, resulting in low variance but possibly high bias. In subspace dimension. the second case, the smoothing is performed directly on the dimension and entropy estimates, resulting in higher variance but less bias. A. S-Shaped Surface

Fig. 1 shows a graphical illustration of the smoothing step The first manifold considered is the standard § -dimensional

§



% ,

of the algorithm. Left panel shows F resampled GMST S-shaped surface [2] embedded in (Fig. 2). Fig. 3 shows



lengths, labeled “+” and “o”, along with their average, labeled the evolution of the average GMST length I as a function

¤ ©

 

%

C C

“x” for GMSTs built on C randomly chosen of the number of samples, for a random set of i.i.d. points

¦  ¦ 

, Q

¤

F  F

vertices. For  , a linear least squares fit to the uniformly distributed on the surface.

J

 §

, O

©¨ 

average GMST trajectory, § , is used to compute To compare the dimension estimation performance of the

¦  ¦ 

, Q



¤ ¤

 F 

the dimension estimate ¡ . For , dimension GMST method to ISOMAP we ran a Monte Carlo simulation.

O

 

X

¡ ¡  estimates and are computed from sub-trajectories For each of several sample sizes,  independent sets of i.i.d. and ¨ , forming a histogram from which a final estimate can random vectors uniformly distributed on the surface were be computed. generated. We then counted the number of times that the

A word about determination of the sequence of constants intrinsic dimension was correctly estimated. To automatically

¡ 

3 3 

) is in order. First of all, in the large regime for which estimate dimension with ISOMAP, we follow a standard PCA 3

the above estimates were derived, ) is not required for the order estimation procedure. Specifically, we graph the residual 3 dimension estimator. ) is the limit of the normalized length variance of the MDS fit as a function of the PCA dimension functional of the Euclidean MST for a uniform distribution on and try to detect the “elbow” at which residuals cease to 8

(a) (b) (c)

'

7E$©&%  ¡

Fig. 3. Illustration of GMST dimension estimation for ©# : (a) plot of the average GMST length for the S-shaped manifold as a function

¥

¢¤£

¡

©  ¦¥

of the number of samples; (b) log-log plot of (a); (c) blowup of the last ten points in (b) and its linear least squares fit. The estimated slope is 

¥

( §   # $% ¨§ which implies  . ( , , ). decrease “significantly” as estimated dimension increases [1]. The elbow detector is implemented by a simple minimum angle threshold rule. Table III shows the results of this experi- ment. As it can be observed, the GMST algorithm outperforms ISOMAP in terms of dimension estimation error rates for small sample sizes. Fig. 4 shows the histogram of the entropy estimates for the same experiment.

B. Hyper-Planes ¤

Next, we investigated linear ¡ -dimensional hyper-planes in

 ©

3 for which PCA methods are designed. We consider

¤ ¤

L L , X

  © hyper-planes of the form 3 . Table IV shows the results of running a Monte Carlo simulation under

the same conditions as in the previous subsection. When

, Q ¤ (i.e., least squares applied to the average length functional values), the GMST method showed a tendency

to underestimate the correct dimension at smaller sample Fig. 4. Histogram of GMST entropy estimates over 30 trials of 600 samples

, Q

  #  % ¨§   uniformly distributed on the S-shaped manifold ( §

sizes. However, by taking F instead (i.e., averaging © % ). True entropy (”true”) was computed analytically from the area of S curve of least squares dimension estimates), this negative bias was supporting the uniform distribution of manifold samples. eliminated and the GMST performed as well as the ISOMAP, which was observed to correctly predict the dimension for all sample sizes investigated. TABLE IV

Of course, as expected, the number of samples required NUMBER OF CORRECT GMST DIMENSION ESTIMATES OVER 30 TRIALS AS

 § to achieve the same level of accuracy increases with the A FUNCTION OF THE NUMBER OF SAMPLES FOR HYPER-PLANES ( § ). manifold dimension. This is the usual curse of dimensionality

Hyper-plane 

#

phenomenon: as the dimension increases, more samples are 

© © © © %© © © dimension ¥

needed for the asymptotic regime in (14) to settle in and §

% 30 30 30

%©

validate the limit in Corollary 1. ( %

§ 30 30 30 §

% 24 24 27 ©

TABLE III % % § 25 26 27

NUMBER OF CORRECT ISOMAP AND GMST DIMENSION ESTIMATES

%©

% 30 30 30

%§ ©

OVER TRIALS AS A FUNCTION OF THE NUMBER OF SAMPLES FOR THE

© %

% 30 30 30

 

S-SHAPED MANIFOLD ( § ).

%©

% 24 25 26

%§

¢

¢

© %

% 27 28 28

 ( © © © © ¥ © ©

%©

% 25 28 29 ©

ISOMAP 23 29 30 (

© %

% 29 29 30

$%  §   %© GMST ( # ) 29 30 30 9

Face 1

Face 2

Fig. 5. Samples from ISOMAP face database [1].

30 30 Face 3

25 25

20 20 Face 4

15 15

10 10 Fig. 7. Samples from Yale face database B [37].

5 5

Face Database B [37]. This is a publicly available database1

¨ ¨ containing a number of portfolios of face images under ¤ 0 0

2 3 4 5 6 2 3 4 5 6 different viewing conditions for each subject (Figure 7). Each

¨ ¡ dimension estimate dimension estimate portfolio consists of ¢ poses and illumination conditions (including ambient lighting) for each subject. The images were Fig. 6. GMST intrinsic dimension estimate histogram for the ISOMAP face

taken against a fixed background which we did not bother

 ¥ #  %  %©  %§ §  ¥

database: left plot, § , , , ; right plot, ,

$%© $%   %§ # , , . to segment out. This is justified since any fixed structures throughout the images would not change the intrinsic dimen- sion or the intrinsic entropy of the dataset. We randomly

C. ISOMAP Face Database selected 4 individuals from this data base and subsampled each

¡¥¦ ¡§¦

person’s face images down to a pixels image. Similarly We applied our method to a high dimensional synthetic

to the ISOMAP face data set, we normalized the pixel values Q image data set. For this purpose we used the ISOMAP face X

¡£¢¥¤ between and . database [1]. This set consists of images of the same face X

Fig. 8 displays the results of running  trials of the

generated by varying three different parameters: vertical and § ¡§¦ ¡¥¦ algorithm using face . The first panel shows the real valued

horizontal pose, and lighting direction. Each image has

§£¨

X Q ¡ estimates of the intrinsic dimension, i.e., estimates obtained

pixels with gray levels, normalized between and (Fig.

X ¢£¡ ¦ before the rounding operation in (18). Any value that falls in 5). For processing, we embedded each image in the - between the dashed lines will then be rounded to the integer at dimensional Euclidean space using the common lexicographic X the midpoint. The second panel of Fig. 8 shows the histogram order. We applied the algorithm  times over the data set

for these rounded estimates over the 30 generated trials. The ¨ with the histogram of the dimension estimates displayed in ¡ intrinsic dimension estimate is between and . Fig. 9 shows Figure 6. The estimated intrinsic dimension oscillates between ¦ the corresponding residual variance plots used by ISOMAP

 and , which, as in [5], deviates from the “informal” intrinsic to estimate intrinsic dimension. From these plots it is not

dimension of  estimated by ISOMAP with thresholding. The

§ Q ¤ obvious how to determine the “elbow” at which the residuals

estimated entropy was bits, with a standard deviation

J

¨ ! 

, Q

X cease to decrease “significantly” with added . This

 ¡ ¡ of . Note that as is close to one

illustrates one of the major drawbacks of ISOMAP (and other for the estimated values of ¡ , the estimate of -entropy is spectral based methods like PCA) as an intrinsic dimension expected to be close to the Shannon entropy. This estimate estimator, as it relies on a specific eigenstructure that may

suggests that one could, in theory, compress the ISOMAP face

J

¨ § 

X XcX Q

¡§¦ ¡§¦

¤ not exist in real data. The simple minimum angle threshold



¡ database, with little loss, using at most 

rule on ISOMAP produced estimates between  and . Table bits/pixel. V summarizes the results of the GMST method for the four faces. The intrinsic entropy estimates expressed in log base D. Yale Face Database B 2 were between 24.9 and 28 bits. Similarly to the ISOMAP

face database, as is close to one, these values suggest that Finally, we applied the GMST method to a real data set, the portfolio of a person’s face image could be accurately

and, consequently, of unknown manifold structure, intrinsic

§¥¨ ¡ dimension and intrinsic entropy. We chose the set of 1http://cvc.yale.edu/projects/yalefacesB/ gray levels images of several individuals taken from the Yale yalefacesB.html 10

TABLE V

¥

¥ ¡

GMST DIMENSION ESTIMATES AND ENTROPY ESTIMATES FOR FOUR FACES IN THE YALE FACE DATABASE B.

Face1 Face2 Face3 Face 4 ¥

6 6 7 7 ¥ ¡ (bits) 24.9 26.4 25.8 28.0

entropy. We are also studying the use of entropic graphs that bypass the complex step of geodesic estimation. In particular,

in [38], we consider I -nearest neighbor graphs due to their low complexity and local properties. Future work includes the characterization of the statistics in the linear model (15), optimization of the bias/variance tradeoff parameters of the GMST algorithm and the study of the effect of additive noise on the manifold samples.

Fig. 8. GMST real valued intrinsic dimension estimates and histogram for ACKNOWLEDGMENT

  #Q %  %©   ( © face 2 in the Yale face database B ( § , , ). The authors acknowledge and thank the creators of Yale Face Database B for making their face image data pub- licly available. The authors would also like to thank Huzefa Neemuchwala and Arpit Almal for their help in acquiring and processing these face images.

REFERENCES [1] J. B. Tenenbaum, V. de Silva, and J. C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, pp. 2319–2323, 2000. [2] S. Roweis and L. Saul, “Nonlinear dimensionality reduction by locally linear imbedding,” Science, vol. 290, no. 1, pp. 2323–2326, 2000. [3] D. Donoho and C. Grimes, “Hessian eigenmaps: locally linear embed- ding techniques for high dimensional data,” Proc. Nat. Acad. of Sci., vol. 100, no. 10, pp. 5591–5596, 2003. [4] M. Kirby, Geometric Data Analysis: An Empirical Approach to Dimensionality Reduction and the Study of Patterns, Wiley-Interscience, 2001. [5] B. Kegl,´ “Intrinsic dimension estimation using packing numbers,” in

Neural Information Processing Systems: NIPS, Vancouver, CA, Dec.

(  Fig. 9. ISOMAP ( § ) residual variance for face 2 in the Yale face 2002. database B. [6] P. Verveer and R. Duin, “An evaluation of intrinsic dimensionality estimators,” IEEE Trans. on Pattern Analysis and Machine Intelligence,

vol. 17, no. 1, pp. 81–86, January 1995.

J

§ 

X XjX¡

¤ ¡§¦

¡§¦ [7] V. I. Koltchinskii, “Empirical geometry of multivariate data: A decon-

 compressed using at most  bits/pixel. volution approach,” Annals of Statistics, vol. 28, no. 2, pp. 591–629, 2000. VI. CONCLUSION [8] A.O. Hero, B. Ma, O. Michel, and J. Gorman, “Applications of entropic spanning graphs,” IEEE Signal Processing Magazine, vol. 19, no. 5, pp. We have introduced a novel method for intrinsic dimension 85–95, October 2002. and entropy estimation based on the growth rate of the [9] A. Hero and O. Michel, “Asymptotic theory of greedy approximations to minimal k-point random graphs,” IEEE Trans. on Inform. Theory, geodesic total edge length functional of entropic graphs. The vol. 45, no. 6, pp. 921–1939, September 1999. proposed method has two main advantages. First, it is global in [10] A. Hero, J. Costa, and B. Ma, “Convergence rates of minimal graphs the sense that the MST is constructed over the entire set and with random vertices,” submitted to IEEE Trans. on Inform. Theory, 2002, www.eecs.umich.edu/˜hero/det_est.html. thus avoids local linearizations. Second, it does not require [11] A. K. Jain and R. C. Dubes, Algorithms for clustering data, Prentice reconstructing the manifold or estimating the multivariate Hall, Englewood Cliffs, NJ, 1988. density of the samples. We validated the new method by testing [12] T. Cox and M. Cox, Multidimensional Scaling, Chapman & Hall, London, 1994. it on synthetic manifolds of known dimension and on high [13] X. Huo and J. Chen, “Local linear projection (LLP),” in Proc. of First dimensional real data sets. Workshop on Genomic Signal Processing and Statistics (GENSIPS), One drawback of GMST, or any other dimension estimator 2002. [14] M. Belkin and P. Niyogi, “Laplacian eigenmaps and spectral techniques based on ISOMAP geodesic fitting to data, is the restriction to for embedding and clustering,” in Advances in Neural Informa- isometric embeddings. We are currently working on extending tion Processing Systems, Volume 14, T. G. Diettrich, S. Becker, and Thm. 1 and Corollary 1 to general (non-isometric) Riemann Z. Ghahramani, Eds. MIT Press, 2002. [15] V. de Silva and J. B. Tenenbaum, “Global versus local methods in manifolds, thus avoiding any assumptions about global embed- nonlinear dimensionality reduction,” in Neural Information Processing dings and eliminating the effect of the Jacobian on the intrinsic Systems 15 (NIPS), Vancouver, Canada, Dec. 2002. 11

[16] K. Pettis, T. Bailey, A. Jain, and R. Dubes, “An intrinsic dimensionality Jose A. Costa (S’99) was born in Lisbon, Portugal. He received the estimator from near-neighbor information,” IEEE Trans. on Pattern Licenciatura degree in Aerospace Engineering (1998) from Instituto Superior Analysis and Machine Intelligence, vol. 1, no. 1, pp. 25–36, 1979. Tecnico,´ Technical University of Lisbon, Portugal. He received the M.S. [17] F. Camastra and A. Vinciarelli, “Estimating the intrinsic dimension of degree in Electrical Engineering (2002) and the M.A. degree in Statistics data with a fractal-based method,” IEEE Trans. on Pattern Analysis and (2003) from the University of Michigan, Ann Arbor, where he is currently Machine Intelligence, vol. 24, no. 10, pp. 1404–1407, October 2002. pursuing the Ph.D. degree in Electrical Engineering. [18] F. Memolia, G. Sapiro, and S. Osher, “Solving variational problems and He performed research at the Communication Theory and Pattern Recogni- partial differentyial mapping into general target manifolds,” tion group, Institute of Telecommunications, Lisbon, between 1998 and 2000 Tech. Rep. 1827, IMA, January 2003. in statistical algorithms for positioning and navigation. His research interests [19] F. Memoli and G. Sapiro, “Fast computation of weighted distance include statistical signal and image processing, nonparametric estimation, functions and geodesics on implicit hyper-surfaces,” Journal of Com- pattern recognition and machine learning. putationals , vol. 73, pp. 730–764, 2001. Mr. Costa was awarded a Ph.D. fellowship in 2001 by Fundac¸ao˜ para [20] J. E. Yukich, Probability theory of classical Euclidean optimization a Cienciaˆ e Tecnologia (Portugal) and is currently a Rackham Predoctoral problems, vol. 1675 of Lecture Notes in Mathematics, Springer-Verlag, Fellow at the University of Michigan. Berlin, 1998. [21] M. Carmo, Differential geometry of curves and surfaces, Prentice-Hall, Englewood Cliffs, N.J., 1976. [22] M. Carmo, , Birkhauser¨ , Boston, 1992. Alfred O. Hero, III (S’79-M’80-SM’96-F’98) received the B.S. in Electrical [23] W. Boothby, An introduction to differentiable manifolds and Riemannian Engineering (summa cum laude) from Boston University (1980) and the Ph.D geometry, Academic, San Diego, Calif., rev. 2nd edition, 2003. from Princeton University (1984), both in Electrical Engineering. Since 1984 [24] J. Beardwood, J. H. Halton, and J. M. Hammersley, “The shortest path he has been a Professor with the University of Michigan, Ann Arbor, where he through many points,” Proc. Cambridge Philosophical Society, vol. 55, has appointments in the Department of Electrical Engineering and Computer pp. 299–327, 1959. Science, the Department of Biomedical Engineering and the Department of [25] J. M. Steele, Probability theory and combinatorial optimization, vol. 69 Statistics. of CBMF-NSF Regional Conferences in Applied Mathematics, Society He has held visiting positions at I3S University of Nice, Sophia-Antipolis, for Industrial and Applied Mathematics (SIAM), 1997. France (2001), Ecole Normale Superieure´ de Lyon (1999), Ecole Nationale [26] A. Gersho, “Asymptotically optimal block quantization,” IEEE Trans. Superieure´ des Tel´ ecommunications,´ Paris (1999), Scientific Research Labs on Inform. Theory, vol. 28, pp. 373–380, 1979. of the Ford Motor Company, Dearborn, Michigan (1993), Ecole Nationale [27] D. N. Neuhoff, “On the asymptotic distribution of the errors in vector Superieure des Techniques Avancees (ENSTA), Ecole Superieure d’Electricite, quantization,” IEEE Trans. on Inform. Theory, vol. 42, pp. 461–468, Paris (1990), and M.I.T. Lincoln Laboratory (1987 - 1989). His research March 1996. interests are in areas including: estimation and detection, statistical commu- [28] A.O. Hero and O. Michel, “Estimation of Ren´ yi information divergence nications, bioinformatics, signal processing and image processing. via pruned minimal spanning trees,” in IEEE Workshop on Higher Order He is a Fellow of the Institute of Electrical and Electronics Engineers Statistics, Caesaria, Israel, Jun. 1999. (IEEE), a member of Tau Beta Pi, the American Statistical Association (ASA), [29] H. Neemuchwala, A. O. Hero, and P. Carson, “Image registration using the Society for Industrial and Applied Mathematics (SIAM), and the US entropy measures and entropic graphs,” to appear in European Journal National Commission (Commission C) of the International Union of Radio of Signal Processing, Special Issue on Content-based Visual Information Science (URSI). He has received the 1998 IEEE Signal Processing Society Retrieval, 2003. Meritorious Service Award, the 1998 IEEE Signal Processing Society Best Paper Award, and the IEEE Third Millenium Medal. In 2002 he was appointed [30] T.M. Cover and J.A. Thomas, Elements of Information Theory, Wiley, IEEE Signal Processing Society Distinguished Lecturer. New York, 1991. [31] I. Csiszar, “Generalized cutoff rates and Ren´ yi’s information measures,” IEEE Trans. on Inform. Theory, vol. 41, no. 1, pp. 26–34, January 1995. [32] M. Bernstein, V. de Silva, J. C. Langford, and J. B. Tenenbaum, “Graph approximations to geodesics on embedded manifolds,” Tech. Rep., Department of Psychology, Stanford University, 2000. [33] V. de Silva and J. B. Tenenbaum, “Unsupervised learning of curved manifolds,” in Nonlinear estimation and classi®cation, D.D. Denison, M. H. Hansen, C. C. Holmes, B. Mallick, and B. Yu, Eds. Springer- Verlag, New York, 2002. [34] K. S. Alexander, “The RSW theorem for continuum percolation and the CLT for Euclidean minimal spanning trees,” Ann. Applied Probab., vol. 6, pp. 466–494, 1996. [35] F. Avram and D. Bertsimas, “The minimum spanning tree constant in geometrical probability and under the independent model: a unified approach,” Ann. Applied Probab., vol. 9, pp. 223–231, 1990. [36] D. Bertsimas and G. van Ryzin, “An aysmptotic determination of the minimum spanning tree and minimum matching constants in geometrical probability,” Oper. Research Letters, vol. 2, pp. 113–130, 1992. [37] A. Georghiades, P. Belhumeur, and D. Kriegman, “From few to many: Illumination cone models for face recognition under variable lighting and pose,” IEEE Trans. Pattern Anal. Mach. Intelligence, vol. 23, no. 6, pp. 643–660, 2001. [38] J. A. Costa and A. O. Hero, “Entropic graphs for manifold learning,” in Proc. of IEEE Asilomar Conf. on Signals, Systems, and Computers, Pacific Grove, CA, November 2003.