Embedding Words and Senses Together Via Joint Knowledge-Enhanced Training

Massimiliano Mancini*, Jose Camacho-Collados*, Ignacio Iacobacci and Roberto Navigli
Department of Computer Science, Sapienza University of Rome
[email protected], {collados,iacobacci,navigli}@di.uniroma1.it

Authors marked with an asterisk (*) contributed equally.

Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pages 100–111, Vancouver, Canada, August 3 - August 4, 2017. © 2017 Association for Computational Linguistics

Abstract

Word embeddings are widely used in Natural Language Processing, mainly due to their success in capturing semantic information from massive corpora. However, their creation process does not allow the different meanings of a word to be automatically separated, as it conflates them into a single vector. We address this issue by proposing a new model which learns word and sense embeddings jointly. Our model exploits large corpora and knowledge from semantic networks in order to produce a unified vector space of word and sense embeddings. We evaluate the main features of our approach both qualitatively and quantitatively in a variety of tasks, highlighting the advantages of the proposed method in comparison to state-of-the-art word- and sense-based models.

1 Introduction

Recently, approaches based on neural networks which embed words into low-dimensional vector spaces from text corpora (i.e. word embeddings) have become increasingly popular (Mikolov et al., 2013; Pennington et al., 2014). Word embeddings have proved to be beneficial in many Natural Language Processing tasks, such as Machine Translation (Zou et al., 2013), syntactic parsing (Weiss et al., 2015), and Question Answering (Bordes et al., 2014), to name a few. Despite their success in capturing semantic properties of words, these representations are generally hampered by an important limitation: the inability to discriminate among different meanings of the same word.

Previous works have addressed this limitation by automatically inducing word senses from monolingual corpora (Schütze, 1998; Reisinger and Mooney, 2010; Huang et al., 2012; Di Marco and Navigli, 2013; Neelakantan et al., 2014; Tian et al., 2014; Li and Jurafsky, 2015; Vu and Parker, 2016; Qiu et al., 2016), or bilingual parallel data (Guo et al., 2014; Ettinger et al., 2016; Šuster et al., 2016). However, these approaches learn solely on the basis of statistics extracted from text corpora and do not exploit knowledge from semantic networks. Additionally, their induced senses are neither readily interpretable (Panchenko et al., 2017) nor easily mappable to lexical resources, which limits their application.

Recent approaches have utilized semantic networks to inject knowledge into existing word representations (Yu and Dredze, 2014; Faruqui et al., 2015; Goikoetxea et al., 2015; Speer and Lowry-Duda, 2017; Mrkšić et al., 2017), but without solving the meaning conflation issue. In order to obtain a representation for each sense of a word, a number of approaches have leveraged lexical resources to learn sense embeddings as a result of post-processing conventional word embeddings (Chen et al., 2014; Johansson and Pina, 2015; Jauhar et al., 2015; Rothe and Schütze, 2015; Pilehvar and Collier, 2016; Camacho-Collados et al., 2016).

Instead, we propose SW2V (Senses and Words to Vectors), a neural model that exploits knowledge from both text corpora and semantic networks in order to simultaneously learn embeddings for both words and senses. Moreover, our model provides three additional key features: (1) both word and sense embeddings are represented in the same vector space, (2) it is flexible, as it can be applied to different predictive models, and (3) it is scalable for very large semantic networks and text corpora.

2 Related work

Embedding words from large corpora into a low-dimensional vector space has been a popular task since the appearance of the probabilistic feed-forward neural network language model (Bengio et al., 2003) and later developments such as word2vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014). However, little research has focused on exploiting lexical resources to overcome the inherent ambiguity of word embeddings.

Iacobacci et al. (2015) overcame this limitation by applying an off-the-shelf disambiguation system (i.e. Babelfy (Moro et al., 2014)) to a corpus and then using word2vec to learn sense embeddings over the pre-disambiguated text. However, in their approach words are replaced by their intended senses, consequently producing as output sense representations only. The representation of words and senses in the same vector space proves essential for applying these knowledge-based sense embeddings in downstream applications, particularly for their integration into neural architectures (Pilehvar et al., 2017). In the literature, various different methods have attempted to overcome this limitation. Chen et al. (2014) proposed a model for obtaining both word and sense representations based on a first training step of conventional word embeddings, a second disambiguation step based on sense definitions, and a final training phase which uses the disambiguated text as input. Likewise, Rothe and Schütze (2015) aimed at building a shared space of word and sense embeddings based on two steps: a first training step of only word embeddings and a second training step to produce sense and synset embeddings. These two approaches require multiple steps of training and make use of a relatively small resource like WordNet, which limits their coverage and applicability. Camacho-Collados et al. (2016) increased the coverage of these WordNet-based approaches by exploiting the complementary knowledge of WordNet and Wikipedia along with pre-trained word embeddings. Finally, Wang et al. (2014) and Fang et al. (2016) proposed a model to align vector spaces of words and entities from knowledge bases. However, these approaches are restricted to nominal instances only (i.e. Wikipedia pages or entities).

In contrast, we propose a model which learns both words and sense embeddings from a single joint training phase, producing a common vector space of words and senses as an emerging feature.

3 Connecting words and senses in context

In order to jointly produce embeddings for words and senses, SW2V needs as input a corpus where words are connected to senses¹ in each given context. One option for obtaining such connections could be to take a sense-annotated corpus as input. However, manually annotating large amounts of data is extremely expensive and therefore impractical in normal settings. Obtaining sense-annotated data from current off-the-shelf disambiguation and entity linking systems is possible, but generally suffers from two major problems. First, supervised systems are hampered by the very same problem of needing large amounts of sense-annotated data. Second, the relatively slow speed of current disambiguation systems, such as graph-based approaches (Hoffart et al., 2012; Agirre et al., 2014; Moro et al., 2014), or word-expert supervised systems (Zhong and Ng, 2010; Iacobacci et al., 2016; Melamud et al., 2016), could become an obstacle when applied to large corpora.

¹ In this paper we focus on senses but other items connected to words may be used (e.g. supersenses or images).

This is the reason why we propose a simple yet effective unsupervised shallow word-sense connectivity algorithm, which can be applied to virtually any given semantic network and is linear on the corpus size. The main idea of the algorithm is to exploit the connections of a semantic network by associating words with the senses that are most connected within the sentence, according to the underlying network.

Shallow word-sense connectivity algorithm. Formally, a corpus and a semantic network are taken as input and a set of connected words and senses is produced as output. We define a semantic network as a graph (S, E) where the set S contains synsets (nodes) and E represents a set of semantically connected synset pairs (edges). Algorithm 1 describes how to connect words and senses in a given text (sentence or paragraph) T. First, we gather in a set S_T all candidate synsets of the words (including multiwords up to trigrams) in T (lines 1 to 3). Second, for each candidate synset s we calculate the number of synsets which are connected with s in the semantic network and are included in S_T, excluding connections of synsets which only appear as candidates of the [...]

Algorithm 1 Shallow word-sense connectivity
Input: Semantic network (S, E) and text T represented as a bag of words
Output: Set of connected words and senses T* ⊂ T × S
 1: Set of synsets S_T ← ∅
 2: for each word w ∈ T
 3:     S_T ← S_T ∪ S_w (S_w: set of candidate synsets of w)
 4: Minimum connections threshold θ ← (|S_T| + |T|) / (2δ)
 5: Output set of connections T* ← ∅
 6: for each w ∈ T
 7:     Relative maximum connections max = 0
 8:     Set of senses associated with w, C_w ← ∅
 9:     for each candidate synset s ∈ S_w
10:         Number of edges n = |{s′ ∈ S_T : (s, s′) ∈ E & ∃w′ ∈ T : w′ ≠ w & s′ ∈ S_w′}|
11:         if n ≥ max & n ≥ θ then
12:             if n > max then
[...]

[...] only, irrespective of the corpus size. This enables a fast training on large amounts of text corpora, in contrast to current unsupervised disambiguation algorithms. Additionally, as we will show in Section 5.2, this algorithm does not only speed up significantly the training phase, but also leads to more accurate results.

Note that with our algorithm a word is allowed to have more than one sense associated. In fact, current lexical resources like WordNet (Miller, 1995) or BabelNet (Navigli and Ponzetto, 2012) are hampered by the high granularity of their sense inventories (Hovy et al., 2013).
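
To make the shallow word-sense connectivity procedure of Algorithm 1 more concrete, the following Python sketch follows steps 1 to 12 of the listing. It is an illustration under stated assumptions, not the authors' implementation. The function name `connect_words_and_senses`, the argument names, and the default value of `delta` are hypothetical. The listing available here stops after step 12, so the two branches that update C_w are an assumed completion (a strictly better candidate replaces the set, a tie is added to it), chosen to agree with the remark that a word may keep more than one sense. Edges are treated as undirected, and |T| is taken as the number of distinct words, which is also an assumption.

```python
from typing import Dict, List, Set, Tuple


def connect_words_and_senses(
    text: List[str],
    candidates: Dict[str, Set[str]],
    edges: Set[Tuple[str, str]],
    delta: float = 100.0,  # hypothetical default; the text above gives no value
) -> Set[Tuple[str, str]]:
    """Return (word, synset) pairs for the best-connected candidate synsets."""
    # T as a bag of words; here reduced to its distinct words (assumption).
    words = sorted(set(text))
    cand = {w: set(candidates.get(w, ())) for w in words}

    # Lines 1-3: S_T is the union of the candidate synsets of all words in T.
    s_t: Set[str] = set().union(*cand.values())

    # Line 4: minimum connections threshold.
    theta = (len(s_t) + len(words)) / (2 * delta)

    # Undirected adjacency view of the edge set E (assumption).
    neighbours: Dict[str, Set[str]] = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # For each synset in S_T, the words of T that list it as a candidate.
    owners: Dict[str, Set[str]] = {}
    for w, synsets in cand.items():
        for s in synsets:
            owners.setdefault(s, set()).add(w)

    connections: Set[Tuple[str, str]] = set()  # line 5: T*
    for w in words:                            # line 6
        best = 0                               # line 7: relative maximum
        c_w: Set[Tuple[str, str]] = set()      # line 8
        for s in sorted(cand[w]):              # line 9
            # Line 10: neighbours of s that are candidates of another word.
            n = sum(
                1
                for s2 in neighbours.get(s, ())
                if owners.get(s2, set()) - {w}
            )
            if n >= best and n >= theta:       # line 11
                if n > best:                   # line 12
                    c_w = {(w, s)}             # assumed completion
                    best = n
                else:
                    c_w.add((w, s))            # assumed completion (tie)
        connections |= c_w
    return connections


if __name__ == "__main__":
    toy_candidates = {
        "bank": {"bank_finance", "bank_river"},
        "money": {"money_currency"},
    }
    toy_edges = {("bank_finance", "money_currency")}
    print(sorted(connect_words_and_senses(["bank", "money"], toy_candidates, toy_edges)))
    # prints [('bank', 'bank_finance'), ('money', 'money_currency')]
```

In the toy input, the synset identifiers are made up for illustration. The river sense of "bank" is dropped because it has no connection to a candidate of another word, so its count stays below the threshold.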