
Unsupervised Mining of Analogical Frames by Constraint Satisfaction

Lance De Vine, Shlomo Geva, Peter Bruza
Queensland University of Technology, Brisbane, Australia
{l.devine, s.geva, p.bruza}@qut.edu.au

Abstract

It has been demonstrated that vector-based representations of words trained on large text corpora encode linguistic regularities that may be exploited via the use of vector space arithmetic. This capability has been extensively explored and is generally measured via tasks which involve the automated completion of linguistic proportional analogies. The question remains, however, as to what extent it is possible to induce relations from word embeddings in a principled and systematic way, without the provision of exemplars or seed terms. In this paper we propose an extensible and efficient framework for inducing relations via the use of constraint satisfaction. The method is efficient, unsupervised and can be customized in various ways. We provide both quantitative and qualitative analysis of the results.

1 Introduction

The use and study of analogical inference and structure has a long history in philosophy, logic, cognitive psychology, scientific reasoning and education (Bartha, 2016), amongst others. The use of analogy has played an especially important role in the study of language, language change, and language learning (Kiparsky, 1992). It has been a part of the study of phonology, morphology, and syntactic grammar (Skousen, 1989), as well as the development of applications such as machine translation and paraphrasing (Lepage and Denoual, 2005).

Recent progress with the construction of vector space representations of words based on their distributional profiles has revealed that analogical structure can be discovered and operationalised via the use of vector space algebra (Mikolov et al., 2013b). There remain many questions regarding the extent to which word vectors encode analogical structure and also the extent to which this structure can be uncovered. For example, we are not aware of any proposal or system that is focussed on the unsupervised and systematic discovery of analogies from word vectors and that does not make use of exemplar relations, existing linguistic resources or seed terms. The automatic identification of linguistic analogies, however, offers many potential benefits for a diverse range of research and applications, including language learning and computational creativity.

Computational models of analogy have been studied since at least the 1960s (Hall, 1989; French, 2002) and have addressed tasks relating to both proportional and structural analogy. Many computational systems are built as part of investigations into how humans might perform analogical inference (Gentner and Forbus, 2011). Most make use of Structure Mapping Theory (SMT) (Gentner, 1983), or a variation thereof, which maps one relational system to another, generally using a symbolic representation. Other systems use vector space representations constructed from corpora of natural language text (Turney, 2013). The analogies that are computed using word embeddings have primarily been proportional analogies and are closely associated with the prediction of relations between words. For example, a valid semantic proportional analogy is "cat is to feline as dog is to canine", which can be written as "cat : feline :: dog : canine".

In linguistics, proportional analogies have been extensively studied in the context of both inflectional and derivational morphology (Blevins, 2016). Proportional analogies are used as part of an inference process to fill the cells/slots in a word paradigm. A paradigm is an array of morphological variations of a lexeme. For example, {cat, cats} is a simple singular-noun, plural-noun paradigm in English. Word paradigms exhibit inter-dependencies that facilitate the inference of new forms and for this reason have been studied within the context of language change. The informativeness of a form correlates with the degree to which knowledge of the form reduces uncertainty about other forms within the same paradigm (Blevins et al., 2017).

In this paper we propose a construction which we call an analogical frame. It is intended to elicit associations with the terms semantic frame and proportional analogy. It is an extension of a linguistic analogy in which the elements satisfy certain constraints that allow them to be induced in an unsupervised manner from natural language text. We expect that analogical frames will be useful for a variety of purposes relating to the automated induction of syntactic and semantic relations and categories.

The primary contributions of this paper are two-fold:

1. We introduce a generalization of proportional analogies with word embeddings which we call analogical frames.

2. We introduce an efficient constraint satisfaction based approach to inducing analogical frames from natural language embeddings in an unsupervised fashion.

In section 2 we present background and related research. In section 3 we present and explain the proposal of analogical frames. In section 4 we present methods implemented for ensuring search efficiency of analogical frames. In section 5 we present some analysis of empirical results. In section 6 we present discussion of the proposal and in section 7 we conclude.

2 Background and Related Work

2.1 Proportional Analogies

A proportional analogy is a 4-tuple which we write as x1 : x2 :: x3 : x4 and read as "x1 is to x2 as x3 is to x4", with the elements of the analogy belonging to some domain X (we use this notation as it is helpful later). From here on we will use the term "analogy" to refer to proportional analogies unless indicated otherwise. Analogies can be defined over different types of domains, for example, strings, geometric figures, numbers, vector spaces, images etc. (Stroppa and Yvon, 2005) propose a definition of proportional analogy over any domain which is equipped with an internal composition law ⊕, making it a semi-group (X, ⊕). This definition also applies to any richer algebraic structure such as groups or vector spaces. In R^n, given x1, x2 and x3 there is always only one point that can be assigned to x4 such that proportionality holds. (Miclet et al., 2008) define a relaxed form of analogy which reads as "x1 is to x2 almost as x3 is to x4". To accompany this they introduce a measure of analogical dissimilarity (AD), which is a positive real value and takes the value 0 when the analogy holds perfectly. A set of four points in R^n can therefore be scored for analogical dissimilarity and ranked.

2.2 Word Vectors and Proportional Analogies

The background just mentioned provides a useful context within which to place the work on linguistic regularities in word vectors (Mikolov et al., 2013b; Levy et al., 2014). (Mikolov et al., 2013b) showed that analogies can be completed using vector addition of word embeddings. This means that given x1, x2 and x3 it is possible to infer the value of x4. This is accomplished with the vector offset formula, or 3COSADD (Levy et al., 2014):

    arg max_{x4} s(x4, x2 + x3 − x1)        (3COSADD)

The s in 3COSADD is a similarity measure. In practice unit vectors are generally used with cosine similarity. (Levy et al., 2014) introduced an expression 3COSMUL which tends to give a small improvement when evaluated on analogy completion tasks:

    arg max_{x4} [ s(x4, x3) · s(x4, x2) ] / [ s(x4, x1) + ε ]        (3COSMUL)

3COSADD and 3COSMUL are effectively scoring functions that are used to judge the correctness of a value for x4 given values for x1, x2 and x3.
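For concreteness, the following sketch implements both scoring functions over a matrix of unit-normalized word vectors. It is an illustrative reconstruction rather than the authors' code; the embedding matrix E, the index arguments and the value of ε are assumptions.

```python
import numpy as np

def three_cos_add(E, i1, i2, i3):
    """3COSADD: score each candidate x4 by cos(x4, x2 + x3 - x1).

    E is a (vocab, dim) matrix of unit-normalized word vectors;
    i1, i2, i3 index x1, x2 and x3. Returns the best candidate index.
    e.g. three_cos_add(E, vocab['cat'], vocab['feline'], vocab['dog'])
    should return vocab['canine'] for good embeddings."""
    target = E[i2] + E[i3] - E[i1]
    scores = E @ (target / np.linalg.norm(target))
    scores[[i1, i2, i3]] = -np.inf        # exclude the query terms themselves
    return int(np.argmax(scores))


def three_cos_mul(E, i1, i2, i3, eps=1e-3):
    """3COSMUL of Levy et al. (2014); cosines are shifted to [0, 1] to keep them positive."""
    s1 = (E @ E[i1] + 1.0) / 2.0
    s2 = (E @ E[i2] + 1.0) / 2.0
    s3 = (E @ E[i3] + 1.0) / 2.0
    scores = s3 * s2 / (s1 + eps)
    scores[[i1, i2, i3]] = -np.inf
    return int(np.argmax(scores))
```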
2.3 Finding Analogies

Given that it is possible to complete analogies with word vectors, it is natural to ask whether analogies can be identified without being given x1, x2 and x3. (Stroppa and Yvon, 2005) considers an analogy to be valid when analogical proportions hold between all terms in the analogy. They describe a finite-state solver which searches for formal analogies in the domain of strings and trees.

As noted by several authors (Lavallée and Langlais, 2009; Langlais, 2016; Beltran et al., 2015), a brute force search for proportional analogies is computationally difficult, with the complexity of a naive approach being at least O(n^3) and perhaps O(n^4). A computational procedure must at least traverse the space of all 3-tuples, assuming that the 4th term of an analogy can be efficiently inferred.

For example, if we use a brute force approach to discovering linguistic analogies using a vocabulary of 100 000 words, we would need to examine all combinations of 4-tuples, or 100000^4, or 10^20 combinations. Various strategies may be considered for making this problem tractable. The first observation is that symmetries of an analogy should not be recomputed (Lepage, 2014). This can reduce the compute time by a factor of 8 for a single analogy, as there are 8 symmetric configurations.

[Figure 1: A proportional analogy template, with 4 variables (x1, x2, x3, x4) to be assigned appropriate values.]

Another proposed strategy is the construction of a feature tree for the rapid computation and analysis of tuple differences over vectors with binary attributes (Lepage, 2014). This method was used to discover analogies between images of Chinese characters and has complexity O(n^2). It computes the n(n+1)/2 difference vectors between pairs of tuples and collects them together into clusters containing the same difference vector. A variation of this method was reported in (Fam and Lepage, 2016) and (Fam and Lepage, 2017) for automatically discovering analogical grids of word paradigms using edit distances between word strings. It is not immediately obvious, however, how to extend this approach to the case of word embeddings, where differences between word representations are real-valued vectors.

A related method is used in (Beltran et al., 2015) for identifying analogies in relational databases. It is less constrained as it uses analogical dissimilarity as a metric when determining valid analogies.

(Langlais, 2016) extend methods from (Lepage, 2014) to scale to larger datasets for the purpose of machine translation, but also limit themselves to the formal or graphemic level instead of more general semantic relations between words.
2.4 Other Related Work

The present work is related to a number of research themes in language learning, relational learning and natural language processing. We provide a small sample of these.

(Holyoak and Thagard, 1989) introduce the use of constraint satisfaction as a key requirement for models of analogical mapping. Their computer program ACME (Analogical Constraint Mapping Engine) uses a connectionist network to balance structural, semantic and pragmatic constraints for mapping relations. (Hummel and Holyoak, 1997) propose a computational model of analogical inference and schema induction using distributed patterns for representing objects and predicates. (Doumas et al., 2008) propose a computational model which provides an account of how structured relation representations can be learned from unstructured data. More specifically to language acquisition, (Bod, 2009) uses analogies over trees to derive and analyse new sentences by combining fragments of previously seen sentences. The proposed framework is able to replicate a range of phenomena in language acquisition.

From a more cognitive perspective, (Kurtz et al., 2001) investigates how mutual alignment of two situations can create better understanding of both. Related to this, (Gentner, 2010), and many others, argue that analogical ability is the key factor in human cognitive development.

(Turney, 2006, 2013) makes extensive investigations of the use of corpus based methods for determining relational similarities and predicting analogies.

(Miclet and Nicolas, 2015) propose the concept of an analogical complex, which is a blend of analogical proportions and formal concept analysis. More specifically in relation to word embeddings, (Zhang et al., 2016) presents an unsupervised approach for explaining the meaning of terms via word vector comparison.

In the next section we describe an approach which addresses the task of inducing analogies in an unsupervised fashion from word vectors and builds on existing work relating to word embeddings and linguistic regularities.

3 Analogical Frames

The primary task that we address in this paper is the discovery of linguistic proportional analogies in an unsupervised fashion, given only the distributional profile of words. The approach that we take is to consider the problem as a constraint satisfaction problem (CSP) (Rossi et al., 2006). We increase the strength of constraints until we can accurately decide when a given set of words and their embeddings forms a valid proportional analogy. At this point we introduce some terminology.

Constraint satisfaction problems are generally defined as a triple P = ⟨X, D, C⟩ where:

X = {x1, ..., xn} is a set of variables;
D = {d1, ..., dn} is the set of domains associated with the variables;
C = {c1, ..., cm} is the set of constraints to be satisfied.

A solution to a constraint satisfaction problem must assign a single value to each variable x1, ..., xn in the problem. There may be multiple solutions. In our problem formulation there is only one domain, which is the set of word types that comprise the vocabulary. Each variable must be assigned a word identifier and each word is associated with one or more vector space representations. Constraints on the words and associated vector space representations limit the values that the variables can take. From here on we will use the bolded symbol xi to indicate the vector space value of the word assigned to the variable xi.

In our proposal we use the following five constraints.

C1. AllDiff Constraint. The AllDiff constraint, a common constraint in CSPs, constrains all terms of the analogy to be distinct, such that xi ≠ xj for all 1 ≤ i < j ≤ n.

C2. Asymmetry Constraint. The asymmetry constraint (Meseguer and Torras, 2001) is used to eliminate unnecessary searches in the search tree. It is defined as a partial ordering on the values of a subset of the variables. In the case of a 2x3 analogical frame (figure 2), for example, we define the ordering as x1 ≺ x2 ≺ x3 and x1 ≺ x4, where the ordering is defined on the integer identifiers of the words in the vocabulary.

C3. Neighbourhood Constraint. The neighbourhood constraint is used to constrain the values of variables to words which are within the nearest neighbourhood of the words to which they are connected. We define this as:

    xi ∈ Neigh_t(xj) and xj ∈ Neigh_t(xi)

where Neigh_t(xi) is the nearest neighbourhood of t words of the word assigned to xi, as measured in the vector space representation of the words.

C4. Parallel Constraint. The parallel constraint forces opposing difference vectors to have a minimal degree of parallelism:

    cos(x2 − x1, x5 − x4) ≥ pThreshold

where pThreshold is a parameter. For the parallel constraint we ensure that the difference vector x2 − x1 has a minimal cosine similarity to the difference vector x5 − x4. This constraint overlaps to some extent with the proportionality constraint (below), however it serves a different purpose, which is to eliminate low-probability candidate analogies.

C5. Proportionality Constraint. The proportionality constraint requires the vector space representations of words to form approximate geometric proportional analogies. It is a quaternary constraint. For any given 4-tuple, we use the concept of "inter-predictability" (Blevins et al., 2017) to decide whether the 4-tuple is acceptable. We enforce inter-predictability by requiring that each term in a 4-tuple is predicted by the other three terms. This implies four analogy completion tasks which must be satisfied for the variable assignment to be accepted:

    x1 : x2 :: x3 : x ⇒ x = x4
    x2 : x1 :: x4 : x ⇒ x = x3
    x3 : x4 :: x1 : x ⇒ x = x2
    x4 : x3 :: x2 : x ⇒ x = x1

(Stroppa and Yvon, 2005) use a similar approach with exact formal analogies. With our approach, however, we complete analogies using word vectors and analogy completion formulas (e.g. 3COSADD or 3COSMUL, or a derivative).

The proportionality constraint is relatively expensive to enforce as it requires many vector-vector operations and comparison against all word vectors in the vocabulary. This constraint may be checked approximately, which we discuss in the next section.
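To make the constraint checks concrete, the sketch below tests C1-C4 for a candidate 2x3 frame (the proportionality constraint C5 corresponds to the four completion tests above and is checked separately). It is a minimal illustration with our own names; in particular, the exact set of frame edges checked for C3 and the pairs checked for C4 are assumptions, not the paper's specification (the released software is a C++11 implementation).

```python
import numpy as np

def satisfies_basic_constraints(frame, E, neigh, p_threshold=0.3):
    """Check constraints C1-C4 for a candidate 2x3 frame.

    frame : word indices (x1, x2, x3, x4, x5, x6), top row then bottom row
    E     : (vocab, dim) matrix of unit-normalized word vectors
    neigh : dict mapping a word index to the set of its t nearest neighbours
    """
    x1, x2, x3, x4, x5, x6 = frame

    # C1: all terms distinct.
    if len(set(frame)) != len(frame):
        return False

    # C2: asymmetry -- a partial order on word identifiers removes symmetric duplicates.
    if not (x1 < x2 < x3 and x1 < x4):
        return False

    # C3: mutual nearest-neighbourhood for connected pairs.
    # (Assumed edge set: horizontal and vertical adjacencies of the 2x3 grid.)
    edges = [(x1, x2), (x2, x3), (x4, x5), (x5, x6), (x1, x4), (x2, x5), (x3, x6)]
    for a, b in edges:
        if b not in neigh[a] or a not in neigh[b]:
            return False

    # C4: opposing difference vectors must be sufficiently parallel
    # (one horizontal and one vertical pair checked here for brevity).
    def unit(v):
        return v / np.linalg.norm(v)
    if unit(E[x2] - E[x1]) @ unit(E[x5] - E[x4]) < p_threshold:
        return False
    if unit(E[x4] - E[x1]) @ unit(E[x5] - E[x2]) < p_threshold:
        return False
    return True
```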

3.1 The Insufficiency of 2x2 Analogies

When we experiment with discovering 2x2 analogies using the constraints just described, we find that we cannot easily set the parameters of the constraints such that only valid analogies are produced without severely limiting the types of analogies which we accept. For example, the following analogy is produced: "oldest : old :: earliest : earlier". We find that these types of mistakes are common. We therefore take the step of expanding the number of variables so that the state space is larger (figure 2).

[Figure 2: A 2x3 analogical frame. The variables x1, x2, x3 form the top row and x4, x5, x6 the bottom row; numbered edges indicate aligned vector differences.]

The idea is to increase the inductive support for each discovered analogy by requiring that analogies be part of a larger system of analogies. We refer to the larger system of analogies as an analogical frame. It is important to note that in figure 2, x1 and x3 are connected. The numbers associated with the edges indicate aligned vector differences. It is intended that the analogical proportions hold according to the connectivity shown. For example, proportionality should hold such that x1 : x2 :: x4 : x5, x1 : x3 :: x4 : x6 and x2 : x3 :: x5 : x6. It is also important to note that while we have added only two new variables, we have increased the number of constraints by almost 3 times. It is not exactly 3 because there is some redundancy in the proportions.

3.2 New Formulas

We now define a modified formula for completing analogies that are part of a larger system of analogies, such as the analogical frame in figure 2. It can be observed that 3COSADD and 3COSMUL effectively assign a score to vocabulary items and then select the item with the largest score. We do the same, but with a modified scoring function which is a hybrid of 3COSADD and 3COSMUL. Either 3COSADD or 3COSMUL could be used as part of our approach, but the formula which we propose better captures the intuition of larger systems of analogies, where the importance is placed on 1) symmetry and 2) average offset vectors.

When we are not given any prior knowledge, or exemplar, there is no privileged direction within an analogical frame. For example, in figure 2, the pair (x2, x5) has the same importance as (x4, x5). We first construct difference vectors and then average those that are aligned:

    dif_{2,1} = (x2 − x1)^    dif_{4,1} = (x4 − x1)^
    dif_{5,4} = (x5 − x4)^    dif_{5,2} = (x5 − x2)^
                               dif_{6,3} = (x6 − x3)^

    difsum_1 = dif_{2,1} + dif_{5,4}
    difsum_2 = dif_{4,1} + dif_{5,2} + dif_{6,3}

    dif_1 = difsum_1 / |difsum_1|
    dif_2 = difsum_2 / |difsum_2|

where (v)^ denotes the unit-normalized vector v. The vector dif_1 is the normalized average offset vector indicated with a 1 in figure 2. The vector dif_2 is the normalized average offset vector indicated with a 4 in figure 2.

Using these normalized average difference vectors, we define the scoring function for selecting x5 given x1, x2 and x4 as:

    arg max_{x5}  s(x5, x4) · s(x5, dif_1) + s(x5, x2) · s(x5, dif_2)        (1)

The formulation makes use of all information available in the analogical frame. Previous work has provided much evidence for the linear compositionality of word vectors as embodied by 3COSADD (Vylomova et al., 2015; Hakami et al., 2017). It has also been known since (Mikolov et al., 2013a) that averaging the difference vectors of pairs exhibiting the same relation results in a difference vector with better predictive power for that relation (Drozd et al., 2016).
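A small sketch of this scoring step, assuming s is cosine similarity over unit-normalized vectors and that only x1, x2 and x4 have been assigned so far (function names and the candidate shortlist are ours, not the paper's):

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def score_x5(E, x1, x2, x4, candidates):
    """Scoring function (1): select x5 given x1, x2 and x4.

    E          : (vocab, dim) matrix of unit-normalized word vectors
    candidates : list of word indices to score (e.g. neighbours of x2 and x4)

    With only x1, x2 and x4 assigned, each averaged offset contains a single
    difference vector; as more of the frame is filled in, further aligned
    differences would be added before normalizing."""
    dif1 = unit(E[x2] - E[x1])   # direction labelled "1" in figure 2
    dif2 = unit(E[x4] - E[x1])   # direction labelled "4" in figure 2
    C = E[np.asarray(candidates)]
    scores = (C @ E[x4]) * (C @ dif1) + (C @ E[x2]) * (C @ dif2)
    return candidates[int(np.argmax(scores))]
```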

Extrapolating from figure 2, larger analogical frames can be constructed, such as 2 x 4, 3 x 3, or 2 x 2 x 3. Each appropriately connected 4-tuple contained within the frame should satisfy the proportionality constraint.

4 Frame Discovery and Search Efficiency

The primary challenge in this proposal is to efficiently search the space of variable assignments. We use a depth-first approach to cover the search space, as well as several other strategies to make this search efficient.
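In outline, the depth-first strategy can be sketched as follows: variables are assigned in a fixed order, each new value is drawn from the neighbour list of an already assigned, connected variable, and a branch is abandoned as soon as a constraint fails. The function and parameter names below are illustrative assumptions, not the authors' implementation.

```python
def search_frames(vocab_size, neigh, order, connected_to, check_partial, emit):
    """Depth-first search over frame variable assignments.

    neigh         : dict word -> list of its t nearest neighbours (adjacency lists)
    order         : variable indices in assignment order, e.g. [0, 1, 3, 2, 4, 5]
    connected_to  : for each variable, an earlier variable it is connected to
                    (None for the first variable)
    check_partial : returns False as soon as the partial assignment violates a constraint
    emit          : called with each complete assignment that survives all checks
    """
    assignment = {}

    def extend(depth):
        if depth == len(order):
            emit(dict(assignment))
            return
        var = order[depth]
        anchor = connected_to[var]
        # The first variable ranges over the vocabulary; later variables only
        # range over the neighbour list of a connected, already assigned variable.
        values = range(vocab_size) if anchor is None else neigh[assignment[anchor]]
        for w in values:
            assignment[var] = w
            if check_partial(assignment):
                extend(depth + 1)
            del assignment[var]

    extend(0)
```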

4.1 Word Embedding Neighbourhoods

The most important strategy is the concentration of compute resources on the nearest neighbourhoods of word embeddings. Most analogies involve terms that are within the nearest neighbourhood of each other when ranked according to similarity (Linzen, 2016). We therefore compute the nearest neighbour graph of every term in the vocabulary as a pre-processing step and store the result as an adjacency list for each vocabulary item. We do this efficiently by using binary representations of the word embeddings (Jurgovsky et al., 2016) and re-ranking using the full-precision word vectors. The nearest neighbour list of each vocabulary entry is used in two different ways: 1) for exploring the search space of variable assignments, and 2) for efficiently eliminating variable assignments that do not satisfy the proportional analogy constraints (next section).

When traversing the search tree we use the adjacency list of an already assigned variable to assign a value to a nearby variable in the frame. The breadth of the search tree at each node is therefore parameterized by a global parameter t assigned by the user. The parameter t is one of the primary parameters of the algorithm and determines the length of the search and the size of the set of discovered frames.

4.2 Elimination of Candidates by Sampling

A general principle used in most CSP solving is the quick elimination of improbable solutions. When we assign values to variables in a frame we need to make sure that the proportional analogy constraint is satisfied. For four given variable assignments w1, w2, w3 and w4 this involves making sure that w4 is the best candidate for completing the proportional analogy involving w1, w2 and w3. This is equivalent to knowing whether there are any better vocabulary entries than w4 for completing the analogy. Instead of checking all vocabulary entries, we can limit the list of entries to the m nearest neighbours of w2, w3 and w4 (assuming that w1 is farthest from w4, so that the nearest neighbours of w1 are not likely to help check the correctness of w4) and test whether any of them score higher than w4. If any of them score higher, the proportional analogy constraint is violated and the variable assignment is discarded. The advantage of this is that we only need to check 3 x m entries to approximately test the correctness of w4. We find that if we set m to 10 then most incorrect analogies are eliminated. An exhaustive check for correctness can be made as a post-processing step.
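A minimal sketch of this approximate check, assuming 3COSADD-style cosine scoring and neighbour lists ordered by similarity (names are illustrative):

```python
import numpy as np

def passes_approximate_check(E, w1, w2, w3, w4, neigh, m=10):
    """Approximately verify that w4 is the best completion of w1 : w2 :: w3 : ?.

    Only the m nearest neighbours of w2, w3 and w4 are considered as rivals,
    so the test costs roughly 3 * m comparisons instead of a full vocabulary scan."""
    target = E[w2] + E[w3] - E[w1]
    target /= np.linalg.norm(target)
    best = float(E[w4] @ target)
    rivals = set(neigh[w2][:m]) | set(neigh[w3][:m]) | set(neigh[w4][:m])
    rivals -= {w1, w2, w3, w4}
    return all(float(E[r] @ target) <= best for r in rivals)
```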
4.3 Other Methods

Another important method for increasing efficiency is testing the degree to which opposing difference vectors in a candidate proportional analogy are parallel to each other. This is encapsulated in constraint C4. While using parallelism as a parameter for solving word analogy problems has not been successful (Levy et al., 2014), we have found that the degree of parallelism is a good indicator of the confidence that we can have in the analogy once other constraints have been satisfied. Other methods employed to improve efficiency include the use of Bloom filters to test neighbourhood membership and the indexing of discovered proportions so as not to repeat searching.

4.4 Extending Frames

Frames can be extended by extending the initial base frame in one or more directions and searching for additional variable assignments that satisfy all constraints. For example, a 2x3 frame can be extended to a 2x4 frame by assigning values to two new variables x7 and x8 (figure 3).

[Figure 3: Extending a 2x3 frame. A further column (x7, x8) is appended to the rows x1, x2, x3 and x4, x5, x6.]

As frames are extended, the average offset vectors (equation 1) are recomputed so that the offset vectors become better predictors for computing proportional analogies.

5 Experimental Setup and Results

The two criteria we use to measure our proposal are a) accuracy of analogy discovery and b) compute scalability. Other criteria are also possible, such as relation diversity and interestingness of relations.

The primary algorithm parameters are 1) the number of terms in the vocabulary to search, 2) the size of the nearest neighbourhood of each term to search, 3) the degree of parallelism required for opposing vector differences, and 4) whether to extend base frames.

5.1 Retrieving Analogical Completions from Frames

When frames are discovered they are stored in a frame store, a data structure used to efficiently store and retrieve frames. Frames are indexed using posting lists, similar to a document index. To complete an incomplete analogy, all frames which contain all terms in the incomplete analogy are retrieved. For each retrieved frame, the candidate term for completing the analogy is determined by cross-referencing the indices of the terms within the frame (figure 4). If diverse candidate terms are selected from multiple frames, voting is used to select the final analogy completion, or random selection in the case of tied counts.

[Figure 4: Determining an analogy completion from a larger frame. The query a1,1 : a1,3 :: a3,1 : ? is resolved against a 3x3 frame with cells a1,1 ... a3,3.]
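An illustrative sketch of such a frame store, under our own simplifications: frames are stored as row-major grids of word ids, posting lists map each word to the frames containing it, and voting resolves disagreements between frames.

```python
from collections import Counter, defaultdict

class FrameStore:
    """Stores frames as grids of word ids and retrieves analogy completions."""

    def __init__(self):
        self.frames = []                    # each frame: list of rows of word ids
        self.postings = defaultdict(set)    # word id -> indices of frames containing it

    def add(self, frame):
        idx = len(self.frames)
        self.frames.append(frame)
        for row in frame:
            for w in row:
                self.postings[w].add(idx)

    def complete(self, a, b, c):
        """Complete a : b :: c : ? by majority vote over all frames holding a, b and c."""
        votes = Counter()
        for idx in self.postings[a] & self.postings[b] & self.postings[c]:
            frame = self.frames[idx]
            pos = {w: (i, j) for i, row in enumerate(frame) for j, w in enumerate(row)}
            (ra, ca), (rb, cb), (rc, cc) = pos[a], pos[b], pos[c]
            # The completion sits at c's cell shifted by the (row, column) offset a -> b.
            ri, cj = rc + (rb - ra), cc + (cb - ca)
            if 0 <= ri < len(frame) and 0 <= cj < len(frame[ri]):
                votes[frame[ri][cj]] += 1
        # Ties fall to an arbitrary candidate, standing in for random selection.
        return votes.most_common(1)[0][0] if votes else None
```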

We conducted experiments with embeddings constructed by ourselves as well as with publicly accessible embeddings from the fastText web site (https://fasttext.cc/docs/en/english-vectors.html) trained on 600B tokens of the Common Crawl (Mikolov et al., 2018).

We evaluated the accuracy of the frames by attempting to complete the analogies from the well-known Google analogy test set (http://download.tensorflow.org/data/questions-words.txt). The greatest challenge in this type of evaluation is adequately covering the evaluation items. At least three of the terms in an analogy completion item need to be simultaneously present in a single frame for the item to be attempted. We report the results of a typical execution of the system using a nearest neighbourhood size of 25, a maximum vocabulary size of 50 000, and a minimal cosine similarity between opposing vector differences of 0.3. For this set of parameters, 8589 evaluation items were answered by retrieval from the frame store, covering approximately 44% of the evaluation items (table 1). Approximately 30% of these were from the semantic category, and 70% from the syntactic. We compared the accuracy of completing analogies using the frame store to the accuracy of both 3CosAdd and 3CosMul using the same embeddings as used to build the frames.

Table 1: Analogy completion on the Google subset

               3CosAdd   3CosMul   Frames
Sem. (2681)       2666      2660     2673
Syn. (5908)       5565      5602     5655
Tot. (8589)       8231      8262     8328

Results show that the frames are slightly more accurate than 3CosAdd and 3CosMul, achieving 96.9% on the 8589 evaluation items. It needs to be stressed, however, that the objective is not to outperform vector arithmetic based methods, but rather to verify that the frames have a high degree of accuracy.

To better determine the accuracy of the discovered frames we also randomly sampled 1000 of the 21571 frames generated for the results shown in table 2, and manually checked them. The raw outputs are included in the online repository (https://github.com/ldevine/AFM). These frames cover many relations not included in the Google analogy test set. We found 9 frames with errors, giving an accuracy of 99.1%.

It should be noted that the accuracy of frames is influenced by the quality of the embeddings. However, even with embeddings trained on small corpora it is possible to discover analogies provided that sufficient word embedding training epochs have been completed.

The online repository contains further empirical evaluations and explanations regarding parameter choices, including raw outputs and errors made by the system.

5.2 Scaling: Effect of Neighbourhood Size and pThreshold

Tables 2 and 3 show the number of frames discovered on typical executions of the software. The reported numbers are intended to give an indication of the relationship between neighbourhood size, number of frames produced and execution time. The reported times are for 8 software threads.

Table 2: pThreshold = 0.3 (less constrained)

Near. Sz.      15      20      25      30
Frames      13282   16785   19603   21571
Time (sec)   20.1    29.3    38.3    45.9

Table 3: pThreshold = 0.5 (more constrained)

Near. Sz.      15      20      25      30
Frames       6344    7995    9286   10188
Time (sec)   10.4    15.7    21.1    26.0

5.3 Qualitative Analysis

From inspection of the frames we see that a large part of the relations discovered are grammatical or morpho-syntactic, or are related to high-frequency entities.

However, we also observe a large number of other types of relations such as synonyms, antonyms, domain alignments and various syntactic mappings. We provide but a small sample below.

    eyes        eye           blindness
    ears        ear           deafness

    Father      Son           Himself
    Mother      Daughter      Herself

    geschichte  gesellschaft  musik
    histoire    societe       musique

    aircraft    flew          skies     air
    ships       sailed        seas      naval

    always      constantly    constant
    often       frequently    frequent
    sometimes   occasionally  occasional

    brighter    stronger
    bright      strong
    fainter     weaker        (2 x 2 x 2 frame)
    faint       weak

We have also observed reliable mappings with embeddings trained on non-English corpora. The geometry of analogical frames can also be explored via visualizations of projections onto 2D sub-spaces derived from the offset vectors (examples are provided in the online repository).

6 Discussion

We believe that the constraint satisfaction approach introduced in this paper is advantageous because it is systematic but flexible, and can make use of methods from the constraint satisfaction domain. We have only mentioned a few of the primary CSP concepts in this paper. Other constraints can be included in the formulation, such as set membership constraints where sets may be clusters or documents.

One improvement that could be made to the proposed system is to facilitate the discovery of relations that are not one-to-one. While we found many isolated examples of one-to-many relations expressed in the frames, a strictly symmetrical proportional analogy does not seem ideal for capturing one-to-many relations.

As outlined by (Turney, 2006), there are many applications of automating the construction and/or discovery of analogical relations. Some of these include relation classification, metaphor detection, word sense disambiguation, information extraction, question answering and thesaurus generation.

Analogical frames should also provide insight into the geometry of word embeddings and may provide an interesting way to measure their quality.

The most interesting application of the system is in the area of computational creativity with a human in the loop. For example, analogical frames could be chosen for their interestingness and then expanded.

6.1 Software and Online Repository

The software implementing the proposed system as a set of command line applications can be found in the online repository (https://github.com/ldevine/AFM). The software is implemented in portable C++11 and compiles on both Windows and Unix based systems without compiled dependencies. Example outputs of the system as well as parameter settings are provided in the online repository, including the outputs created from embeddings trained on a range of corpora.

7 Future Work and Conclusions

Further empirical evaluation is required. The establishment of more suitable empirical benchmarks for assessing the effectiveness of open analogy discovery is important. The most interesting potential application of this work is in the combination of automated discovery of analogies and human judgment. There is also the possibility of establishing a more open-ended compute architecture that could search continuously for analogical frames in an online fashion.


References

Paul Bartha. 2016. Analogy and analogical reasoning. The Stanford Encyclopedia of Philosophy.

William Correa Beltran, Hélène Jaudoin, and Olivier Pivert. 2015. A clustering-based approach to the mining of analogical proportions. In Tools with Artificial Intelligence (ICTAI), 2015 IEEE 27th International Conference on, pages 125–131. IEEE.

James P. Blevins. 2016. Word and Paradigm Morphology. Oxford University Press.

James P. Blevins, Farrell Ackerman, Robert Malouf, J. Audring, and F. Masini. 2017. Word and paradigm morphology.

Rens Bod. 2009. From exemplar to grammar: A probabilistic analogy-based model of language learning. Cognitive Science, 33(5):752–793.

Leonidas A. A. Doumas, John E. Hummel, and Catherine M. Sandhofer. 2008. A theory of the discovery and predication of relational concepts. Psychological Review, 115(1):1.

Aleksandr Drozd, Anna Gladkova, and Satoshi Matsuoka. 2016. Word embeddings, analogies, and machine learning: Beyond king − man + woman = queen. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 3519–3530.

Rashel Fam and Yves Lepage. 2016. Morphological predictability of unseen words using computational analogy.

Rashel Fam and Yves Lepage. 2017. A study of the saturation of analogical grids agnostically extracted from texts.

Robert M. French. 2002. The computational modeling of analogy-making. Trends in Cognitive Sciences, 6(5):200–205.

Dedre Gentner. 1983. Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7(2):155–170.

Dedre Gentner. 2010. Bootstrapping the mind: Analogical processes and symbol systems. Cognitive Science, 34(5):752–775.

Dedre Gentner and Kenneth D. Forbus. 2011. Computational models of analogy. Wiley Interdisciplinary Reviews: Cognitive Science, 2(3):266–276.

Huda Hakami, Kohei Hayashi, and Danushka Bollegala. 2017. An optimality proof for the PairDiff operator for representing relations between words. arXiv preprint arXiv:1709.06673.

Rogers P. Hall. 1989. Computational approaches to analogical reasoning: A comparative analysis. Artificial Intelligence, 39(1):39–120.

Keith J. Holyoak and Paul Thagard. 1989. Analogical mapping by constraint satisfaction. Cognitive Science, 13(3):295–355.

John E. Hummel and Keith J. Holyoak. 1997. Distributed representations of structure: A theory of analogical access and mapping. Psychological Review, 104(3):427.

Johannes Jurgovsky, Michael Granitzer, and Christin Seifert. 2016. Evaluating memory efficiency and robustness of word embeddings. In European Conference on Information Retrieval, pages 200–211. Springer.

Paul Kiparsky. 1992. Analogy. International Encyclopedia of Linguistics.

Kenneth J. Kurtz, Chun-Hui Miao, and Dedre Gentner. 2001. Learning by analogical bootstrapping. The Journal of the Learning Sciences, 10(4):417–446.

Philippe Langlais. 2016. Efficient identification of formal analogies.

Jean-François Lavallée and Philippe Langlais. 2009. Unsupervised morphological analysis by formal analogy. In Workshop of the Cross-Language Evaluation Forum for European Languages, pages 617–624. Springer.

Yves Lepage. 2014. Analogies between binary images: Application to Chinese characters. In Computational Approaches to Analogical Reasoning: Current Trends, pages 25–57. Springer.

Yves Lepage and Etienne Denoual. 2005. Purest ever example-based machine translation: Detailed presentation and assessment. Machine Translation, 19(3-4):251–282.

Omer Levy and Yoav Goldberg. 2014. Linguistic regularities in sparse and explicit word representations. In CoNLL, pages 171–180.

Tal Linzen. 2016. Issues in evaluating semantic spaces using word analogies. arXiv preprint arXiv:1606.07736.

Pedro Meseguer and Carme Torras. 2001. Exploiting symmetries within constraint satisfaction search. Artificial Intelligence, 129(1-2):133–163.

Laurent Miclet, Sabri Bayoudh, and Arnaud Delhay. 2008. Analogical dissimilarity: definition, algorithms and two experiments in machine learning. Journal of Artificial Intelligence Research, 32:793–824.

Laurent Miclet and Jacques Nicolas. 2015. From formal concepts to analogical complexes. In CLA 2015, page 12.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, and Armand Joulin. 2018. Advances in pre-training distributed word representations. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018).

Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. 2013b. Linguistic regularities in continuous space word representations. In HLT-NAACL, volume 13, pages 746–751.

Francesca Rossi, Peter Van Beek, and Toby Walsh. 2006. Handbook of Constraint Programming. Elsevier.

Royal Skousen. 1989. Analogical Modeling of Language. Springer Science & Business Media.

Nicolas Stroppa and François Yvon. 2005. An analogical learner for morphological analysis. In Proceedings of the Ninth Conference on Computational Natural Language Learning, pages 120–127. Association for Computational Linguistics.

Peter D. Turney. 2006. Similarity of semantic relations. Computational Linguistics, 32(3):379–416.

Peter D. Turney. 2013. Distributional semantics beyond words: Supervised learning of analogy and paraphrase. arXiv preprint arXiv:1310.5042.

Ekaterina Vylomova, Laura Rimell, Trevor Cohn, and Timothy Baldwin. 2015. Take and took, gaggle and goose, book and read: Evaluating the utility of vector differences for lexical relation learning. arXiv preprint arXiv:1509.01692.

Yating Zhang, Adam Jatowt, and Katsumi Tanaka. 2016. Towards understanding word embeddings: Automatically explaining similarity of terms. In Big Data (Big Data), 2016 IEEE International Conference on, pages 823–832. IEEE.
