
Unsupervised Mining of Analogical Frames by Constraint Satisfaction

Lance De Vine, Shlomo Geva, Peter Bruza
Queensland University of Technology, Brisbane, Australia
{l.devine, s.geva, p.bruza}@qut.edu.au

Abstract

It has been demonstrated that vector-based representations of words trained on large text corpora encode linguistic regularities that may be exploited via the use of vector space arithmetic. This capability has been extensively explored and is generally measured via tasks which involve the automated completion of linguistic proportional analogies. The question remains, however, as to what extent it is possible to induce relations from word embeddings in a principled and systematic way, without the provision of exemplars or seed terms. In this paper we propose an extensible and efficient framework for inducing relations via the use of constraint satisfaction. The method is efficient, unsupervised and can be customized in various ways. We provide both quantitative and qualitative analysis of the results.

1 Introduction

The use and study of analogical inference and structure has a long history in philosophy, logic, cognitive psychology, scientific reasoning and education (Bartha, 2016), amongst others. The use of analogy has played an especially important role in the study of language, language change, and language learning (Kiparsky, 1992). It has been a part of the study of phonology, morphology, and syntactic grammar (Skousen, 1989), as well as the development of applications such as machine translation and paraphrasing (Lepage and Denoual, 2005).

Recent progress with the construction of vector space representations of words based on their distributional profiles has revealed that analogical structure can be discovered and operationalised via the use of vector space algebra (Mikolov et al., 2013b). There remain many questions regarding the extent to which word vectors encode analogical structure and also the extent to which this structure can be uncovered. For example, we are not aware of any proposal or system that is focussed on the unsupervised and systematic discovery of analogies from word vectors and that does not make use of exemplar relations, existing linguistic resources or seed terms. The automatic identification of linguistic analogies, however, offers many potential benefits for a diverse range of research and applications, including language learning and computational creativity.

Computational models of analogy have been studied since at least the 1960s (Hall, 1989; French, 2002) and have addressed tasks relating to both proportional and structural analogy. Many computational systems are built as part of investigations into how humans might perform analogical inference (Gentner and Forbus, 2011). Most make use of Structure Mapping Theory (SMT) (Gentner, 1983), or a variation thereof, which maps one relational system to another, generally using a symbolic representation. Other systems use vector space representations constructed from corpora of natural language text (Turney, 2013). The analogies that are computed using word embeddings have primarily been proportional analogies and are closely associated with the prediction of relations between words. For example, a valid semantic proportional analogy is "cat is to feline as dog is to canine", which can be written as "cat : feline :: dog : canine".

In linguistics, proportional analogies have been extensively studied in the context of both inflectional and derivational morphology (Blevins, 2016). Proportional analogies are used as part of an inference process to fill the cells/slots in a word paradigm. A paradigm is an array of morphological variations of a lexeme. For example, {cat, cats} is a simple singular-noun, plural-noun paradigm in English. Word paradigms exhibit inter-dependencies that facilitate the inference of new forms and for this reason have been studied within the context of language change. The informativeness of a form correlates with the degree to which knowledge of the form reduces uncertainty about other forms within the same paradigm (Blevins et al., 2017).

In this paper we propose a construction which we call an analogical frame. It is intended to elicit associations with the terms semantic frame and proportional analogy. It is an extension of a linguistic analogy in which the elements satisfy certain constraints that allow them to be induced in an unsupervised manner from natural language text. We expect that analogical frames will be useful for a variety of purposes relating to the automated induction of syntactic and semantic relations and categories.

The primary contributions of this paper are two-fold:

1. We introduce a generalization of proportional analogies with word embeddings which we call analogical frames.

2. We introduce an efficient constraint satisfaction based approach to inducing analogical frames from natural language embeddings in an unsupervised fashion.

In section 2 we present background and related research. In section 3 we present and explain the proposal of analogical frames. In section 4 we present methods implemented for ensuring search efficiency of analogical frames. In section 5 we present some analysis of empirical results. In section 6 we present discussion of the proposal and in section 7 we conclude.

2 Background and Related Work

2.1 Proportional Analogies

A proportional analogy is a 4-tuple which we write as x1 : x2 :: x3 : x4 and read as "x1 is to x2 as x3 is to x4", with the elements of the analogy belonging to some domain X (we use this notation as it is helpful later). From here on we will use the term "analogy" to refer to proportional analogies unless indicated otherwise. Analogies can be defined over different types of domains, for example, strings, geometric figures, numbers, vector spaces, images etc. (Stroppa and Yvon, 2005) propose a definition of proportional analogy over any domain which is equipped with an internal composition law ⊕, making it a semi-group (X, ⊕). This definition also applies to any richer algebraic structure such as groups or vector spaces. In R^n, given x1, x2 and x3 there is always only one point that can be assigned to x4 such that proportionality holds. (Miclet et al., 2008) define a relaxed form of analogy which reads as "x1 is to x2 almost as x3 is to x4". To accompany this they introduce a measure of analogical dissimilarity (AD), which is a positive real value and takes the value 0 when the analogy holds perfectly. A set of four points in R^n can therefore be scored for analogical dissimilarity and ranked.

2.2 Word Vectors and Proportional Analogies

The background just mentioned provides a useful context within which to place the work on linguistic regularities in word vectors (Mikolov et al., 2013b; Levy et al., 2014). (Mikolov et al., 2013b) showed that analogies can be completed using vector addition of word embeddings. This means that given x1, x2 and x3 it is possible to infer the value of x4. This is accomplished with the vector offset formula, or 3COSADD (Levy et al., 2014):

    arg max_{x4} s(x4, x2 + x3 − x1)        (3COSADD)

The s in 3COSADD is a similarity measure. In practice unit vectors are generally used with cosine similarity. (Levy et al., 2014) introduced an expression 3COSMUL which tends to give a small improvement when evaluated on analogy completion tasks:

    arg max_{x4} [ s(x4, x3) · s(x4, x2) ] / [ s(x4, x1) + ε ]        (3COSMUL)

3COSADD and 3COSMUL are effectively scoring functions that are used to judge the correctness of a value for x4 given values for x1, x2 and x3.
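For concreteness, the following sketch implements both scoring functions over a matrix of unit-normalized word vectors. It is an illustrative reconstruction rather than the authors' code; the embedding matrix E, the index arguments and the value of ε are assumptions.

```python
import numpy as np

def three_cos_add(E, i1, i2, i3):
    """3COSADD: score each candidate x4 by cos(x4, x2 + x3 - x1).

    E is a (vocab, dim) matrix of unit-normalized word vectors;
    i1, i2, i3 index x1, x2 and x3. Returns the best candidate index.
    e.g. three_cos_add(E, vocab['cat'], vocab['feline'], vocab['dog'])
    should return vocab['canine'] for good embeddings."""
    target = E[i2] + E[i3] - E[i1]
    scores = E @ (target / np.linalg.norm(target))
    scores[[i1, i2, i3]] = -np.inf        # exclude the query terms themselves
    return int(np.argmax(scores))


def three_cos_mul(E, i1, i2, i3, eps=1e-3):
    """3COSMUL of Levy et al. (2014); cosines are shifted to [0, 1] to keep them positive."""
    s1 = (E @ E[i1] + 1.0) / 2.0
    s2 = (E @ E[i2] + 1.0) / 2.0
    s3 = (E @ E[i3] + 1.0) / 2.0
    scores = s3 * s2 / (s1 + eps)
    scores[[i1, i2, i3]] = -np.inf
    return int(np.argmax(scores))
```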
2.3 Finding Analogies

Given that it is possible to complete analogies with word vectors, it is natural to ask whether analogies can be identified without being given x1, x2 and x3. (Stroppa and Yvon, 2005) considers an analogy to be valid when analogical proportions hold between all terms in the analogy. They describe a finite-state solver which searches for formal analogies in the domain of strings and trees.

As noted by several authors (Lavallée and Langlais, 2009; Langlais, 2016; Beltran et al., 2015), a brute force search for proportional analogies is computationally difficult, with the complexity of a naive approach being at least O(n^3) and perhaps O(n^4). A computational procedure must at least traverse the space of all 3-tuples, assuming that the 4th term of an analogy can be efficiently inferred.

For example, if we use a brute force approach to discovering linguistic analogies using a vocabulary of 100 000 words, we would need to examine all combinations of 4-tuples, or 100000^4, or 10^20 combinations. Various strategies may be considered for making this problem tractable. The first observation is that symmetries of an analogy should not be recomputed (Lepage, 2014). This can reduce the compute time by a factor of 8 for a single analogy, as there are 8 symmetric configurations.

[Figure 1: A proportional analogy template, with 4 variables (x1, x2, x3, x4) to be assigned appropriate values.]

Another proposed strategy is the construction of a feature tree for the rapid computation and analysis of tuple differences over vectors with binary attributes (Lepage, 2014). This method was used to discover analogies between images of Chinese characters and has complexity O(n^2). It computes the n(n+1)/2 difference vectors between pairs of tuples and collects them together into clusters containing the same difference vector. A variation of this method was reported in (Fam and Lepage, 2016) and (Fam and Lepage, 2017) for automatically discovering analogical grids of word paradigms using edit distances between word strings. It is not immediately obvious, however, how to extend this approach to the case of word embeddings, where differences between word representations are real-valued vectors.

A related method is used in (Beltran et al., 2015) for identifying analogies in relational databases. It is less constrained as it uses analogical dissimilarity as a metric when determining valid analogies.

(Langlais, 2016) extend methods from (Lepage, 2014) to scale to larger datasets for the purpose of machine translation, but also limit themselves to the formal or graphemic level instead of more general semantic relations between words.
2.4 Other Related Work

The present work is related to a number of research themes in language learning, relational learning and natural language processing. We provide a small sample of these.

(Holyoak and Thagard, 1989) introduce the use of constraint satisfaction as a key requirement for models of analogical mapping. Their computer program ACME (Analogical Constraint Mapping Engine) uses a connectionist network to balance structural, semantic and pragmatic constraints for mapping relations. (Hummel and Holyoak, 1997) propose a computational model of analogical inference and schema induction using distributed patterns for representing objects and predicates. (Doumas et al., 2008) propose a computational model which provides an account of how structured relation representations can be learned from unstructured data. More specifically to language acquisition, (Bod, 2009) uses analogies over trees to derive and analyse new sentences by combining fragments of previously seen sentences. The proposed framework is able to replicate a range of phenomena in language acquisition.

From a more cognitive perspective, (Kurtz et al., 2001) investigates how mutual alignment of two situations can create better understanding of both. Related to this, (Gentner, 2010), and many others, argue that analogical ability is the key factor in human cognitive development.

(Turney, 2006, 2013) makes extensive investigations of the use of corpus based methods for determining relational similarities and predicting analogies.

(Miclet and Nicolas, 2015) propose the concept of an analogical complex, which is a blend of analogical proportions and formal concept analysis. More specifically in relation to word embeddings, (Zhang et al., 2016) presents an unsupervised approach for explaining the meaning of terms via word vector comparison.

In the next section we describe an approach which addresses the task of inducing analogies in an unsupervised fashion from word vectors and builds on existing work relating to word embeddings and linguistic regularities.

3 Analogical Frames

The primary task that we address in this paper is the discovery of linguistic proportional analogies in an unsupervised fashion, given only the distributional profile of words. The approach that we take is to consider the problem as a constraint satisfaction problem (CSP) (Rossi et al., 2006). We increase the strength of constraints until we can accurately decide when a given set of words and their embeddings forms a valid proportional analogy. At this point we introduce some terminology.

Constraint satisfaction problems are generally defined as a triple P = ⟨X, D, C⟩ where:

X = {x1, ..., xn} is a set of variables;
D = {d1, ..., dn} is the set of domains associated with the variables;
C = {c1, ..., cm} is the set of constraints to be satisfied.

A solution to a constraint satisfaction problem must assign a single value to each variable x1, ..., xn in the problem. There may be multiple solutions. In our problem formulation there is only one domain, which is the set of word types that comprise the vocabulary. Each variable must be assigned a word identifier and each word is associated with one or more vector space representations. Constraints on the words and associated vector space representations limit the values that the variables can take. From here on we will use the bolded symbol xi to indicate the vector space value of the word assigned to the variable xi.

In our proposal we use the following five constraints.

C1. AllDiff Constraint. The AllDiff constraint, a common constraint in CSPs, constrains all terms of the analogy to be distinct, such that xi ≠ xj for all 1 ≤ i < j ≤ n.

C2. Asymmetry Constraint. The asymmetry constraint (Meseguer and Torras, 2001) is used to eliminate unnecessary searches in the search tree. It is defined as a partial ordering on the values of a subset of the variables. In the case of a 2x3 analogical frame (figure 2), for example, we define the ordering as x1 ≺ x2 ≺ x3 and x1 ≺ x4, where the ordering is defined on the integer identifiers of the words in the vocabulary.

C3. Neighbourhood Constraint. The neighbourhood constraint is used to constrain the values of variables to words which are within the nearest neighbourhood of the words to which they are connected. We define this as:

    xi ∈ Neigh_t(xj) and xj ∈ Neigh_t(xi)

where Neigh_t(xi) is the nearest neighbourhood of t words of the word assigned to xi, as measured in the vector space representation of the words.

C4. Parallel Constraint. The parallel constraint forces opposing difference vectors to have a minimal degree of parallelism:

    cos(x2 − x1, x5 − x4) ≥ pThreshold

where pThreshold is a parameter. For the parallel constraint we ensure that the difference vector x2 − x1 has a minimal cosine similarity to the difference vector x5 − x4. This constraint overlaps to some extent with the proportionality constraint (below), however it serves a different purpose, which is to eliminate low-probability candidate analogies.

C5. Proportionality Constraint. The proportionality constraint requires the vector space representations of words to form approximate geometric proportional analogies. It is a quaternary constraint. For any given 4-tuple, we use the concept of "inter-predictability" (Blevins et al., 2017) to decide whether the 4-tuple is acceptable. We enforce inter-predictability by requiring that each term in a 4-tuple is predicted by the other three terms. This implies four analogy completion tasks which must be satisfied for the variable assignment to be accepted:

    x1 : x2 :: x3 : x ⇒ x = x4
    x2 : x1 :: x4 : x ⇒ x = x3
    x3 : x4 :: x1 : x ⇒ x = x2
    x4 : x3 :: x2 : x ⇒ x = x1

(Stroppa and Yvon, 2005) use a similar approach with exact formal analogies. With our approach, however, we complete analogies using word vectors and analogy completion formulas (e.g. 3COSADD or 3COSMUL, or a derivative).

The proportionality constraint is relatively expensive to enforce as it requires many vector-vector operations and comparison against all word vectors in the vocabulary. This constraint may be checked approximately, which we discuss in the next section.
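To make the constraint checks concrete, the sketch below tests C1-C4 for a candidate 2x3 frame (the proportionality constraint C5 corresponds to the four completion tests above and is checked separately). It is a minimal illustration with our own names; in particular, the exact set of frame edges checked for C3 and the pairs checked for C4 are assumptions, not the paper's specification (the released software is a C++11 implementation).

```python
import numpy as np

def satisfies_basic_constraints(frame, E, neigh, p_threshold=0.3):
    """Check constraints C1-C4 for a candidate 2x3 frame.

    frame : word indices (x1, x2, x3, x4, x5, x6), top row then bottom row
    E     : (vocab, dim) matrix of unit-normalized word vectors
    neigh : dict mapping a word index to the set of its t nearest neighbours
    """
    x1, x2, x3, x4, x5, x6 = frame

    # C1: all terms distinct.
    if len(set(frame)) != len(frame):
        return False

    # C2: asymmetry -- a partial order on word identifiers removes symmetric duplicates.
    if not (x1 < x2 < x3 and x1 < x4):
        return False

    # C3: mutual nearest-neighbourhood for connected pairs.
    # (Assumed edge set: horizontal and vertical adjacencies of the 2x3 grid.)
    edges = [(x1, x2), (x2, x3), (x4, x5), (x5, x6), (x1, x4), (x2, x5), (x3, x6)]
    for a, b in edges:
        if b not in neigh[a] or a not in neigh[b]:
            return False

    # C4: opposing difference vectors must be sufficiently parallel
    # (one horizontal and one vertical pair checked here for brevity).
    def unit(v):
        return v / np.linalg.norm(v)
    if unit(E[x2] - E[x1]) @ unit(E[x5] - E[x4]) < p_threshold:
        return False
    if unit(E[x4] - E[x1]) @ unit(E[x5] - E[x2]) < p_threshold:
        return False
    return True
```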

3.1 The Insufficiency of 2x2 Analogies

When we experiment with discovering 2x2 analogies using the constraints just described, we find that we cannot easily set the parameters of the constraints such that only valid analogies are produced without severely limiting the types of analogies which we accept. For example, the following analogy is produced: "oldest : old :: earliest : earlier". We find that these types of mistakes are common. We therefore take the step of expanding the number of variables so that the state space is larger (figure 2).

[Figure 2: A 2x3 analogical frame. The variables x1, x2, x3 form the top row and x4, x5, x6 the bottom row; numbered edges indicate aligned vector differences.]

The idea is to increase the inductive support for each discovered analogy by requiring that analogies be part of a larger system of analogies. We refer to the larger system of analogies as an analogical frame. It is important to note that in figure 2, x1 and x3 are connected. The numbers associated with the edges indicate aligned vector differences. It is intended that the analogical proportions hold according to the connectivity shown. For example, proportionality should hold such that x1 : x2 :: x4 : x5, x1 : x3 :: x4 : x6 and x2 : x3 :: x5 : x6. It is also important to note that while we have added only two new variables, we have increased the number of constraints by almost 3 times. It is not exactly 3 because there is some redundancy in the proportions.

3.2 New Formulas

We now define a modified formula for completing analogies that are part of a larger system of analogies, such as the analogical frame in figure 2. It can be observed that 3COSADD and 3COSMUL effectively assign a score to vocabulary items and then select the item with the largest score. We do the same, but with a modified scoring function which is a hybrid of 3COSADD and 3COSMUL. Either 3COSADD or 3COSMUL could be used as part of our approach, but the formula which we propose better captures the intuition of larger systems of analogies, where the importance is placed on 1) symmetry and 2) average offset vectors.

When we are not given any prior knowledge, or exemplar, there is no privileged direction within an analogical frame. For example, in figure 2, the pair (x2, x5) has the same importance as (x4, x5). We first construct difference vectors and then average those that are aligned:

    dif_{2,1} = (x2 − x1)^    dif_{4,1} = (x4 − x1)^
    dif_{5,4} = (x5 − x4)^    dif_{5,2} = (x5 − x2)^
                               dif_{6,3} = (x6 − x3)^

    difsum_1 = dif_{2,1} + dif_{5,4}
    difsum_2 = dif_{4,1} + dif_{5,2} + dif_{6,3}

    dif_1 = difsum_1 / |difsum_1|
    dif_2 = difsum_2 / |difsum_2|

where (v)^ denotes the unit-normalized vector v. The vector dif_1 is the normalized average offset vector indicated with a 1 in figure 2. The vector dif_2 is the normalized average offset vector indicated with a 4 in figure 2.

Using these normalized average difference vectors, we define the scoring function for selecting x5 given x1, x2 and x4 as:

    arg max_{x5}  s(x5, x4) · s(x5, dif_1) + s(x5, x2) · s(x5, dif_2)        (1)

The formulation makes use of all information available in the analogical frame. Previous work has provided much evidence for the linear compositionality of word vectors as embodied by 3COSADD (Vylomova et al., 2015; Hakami et al., 2017). It has also been known since (Mikolov et al., 2013a) that averaging the difference vectors of pairs exhibiting the same relation results in a difference vector with better predictive power for that relation (Drozd et al., 2016).
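A small sketch of this scoring step, assuming s is cosine similarity over unit-normalized vectors and that only x1, x2 and x4 have been assigned so far (function names and the candidate shortlist are ours, not the paper's):

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def score_x5(E, x1, x2, x4, candidates):
    """Scoring function (1): select x5 given x1, x2 and x4.

    E          : (vocab, dim) matrix of unit-normalized word vectors
    candidates : list of word indices to score (e.g. neighbours of x2 and x4)

    With only x1, x2 and x4 assigned, each averaged offset contains a single
    difference vector; as more of the frame is filled in, further aligned
    differences would be added before normalizing."""
    dif1 = unit(E[x2] - E[x1])   # direction labelled "1" in figure 2
    dif2 = unit(E[x4] - E[x1])   # direction labelled "4" in figure 2
    C = E[np.asarray(candidates)]
    scores = (C @ E[x4]) * (C @ dif1) + (C @ E[x2]) * (C @ dif2)
    return candidates[int(np.argmax(scores))]
```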

Extrapolating from figure 2, larger analogical frames can be constructed, such as 2 x 4, 3 x 3, or 2 x 2 x 3. Each appropriately connected 4-tuple contained within the frame should satisfy the proportionality constraint.

4 Frame Discovery and Search Efficiency

The primary challenge in this proposal is to efficiently search the space of variable assignments. We use a depth-first approach to cover the search space, as well as several other strategies to make this search efficient.
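In outline, the depth-first strategy can be sketched as follows: variables are assigned in a fixed order, each new value is drawn from the neighbour list of an already assigned, connected variable, and a branch is abandoned as soon as a constraint fails. The function and parameter names below are illustrative assumptions, not the authors' implementation.

```python
def search_frames(vocab_size, neigh, order, connected_to, check_partial, emit):
    """Depth-first search over frame variable assignments.

    neigh         : dict word -> list of its t nearest neighbours (adjacency lists)
    order         : variable indices in assignment order, e.g. [0, 1, 3, 2, 4, 5]
    connected_to  : for each variable, an earlier variable it is connected to
                    (None for the first variable)
    check_partial : returns False as soon as the partial assignment violates a constraint
    emit          : called with each complete assignment that survives all checks
    """
    assignment = {}

    def extend(depth):
        if depth == len(order):
            emit(dict(assignment))
            return
        var = order[depth]
        anchor = connected_to[var]
        # The first variable ranges over the vocabulary; later variables only
        # range over the neighbour list of a connected, already assigned variable.
        values = range(vocab_size) if anchor is None else neigh[assignment[anchor]]
        for w in values:
            assignment[var] = w
            if check_partial(assignment):
                extend(depth + 1)
            del assignment[var]

    extend(0)
```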

4.1 Word Embedding Neighbourhoods

The most important strategy is the concentration of compute resources on the nearest neighbourhoods of word embeddings. Most analogies involve terms that are within the nearest neighbourhood of each other when ranked according to similarity (Linzen, 2016). We therefore compute the nearest neighbour graph of every term in the vocabulary as a pre-processing step and store the result as an adjacency list for each vocabulary item. We do this efficiently by using binary representations of the word embeddings (Jurgovsky et al., 2016) and re-ranking using the full-precision word vectors. The nearest neighbour list of each vocabulary entry is used in two different ways: 1) for exploring the search space of variable assignments, and 2) for efficiently eliminating variable assignments that do not satisfy the proportional analogy constraints (next section).

When traversing the search tree we use the adjacency list of an already assigned variable to assign a value to a nearby variable in the frame. The breadth of the search tree at each node is therefore parameterized by a global parameter t assigned by the user. The parameter t is one of the primary parameters of the algorithm and determines the length of the search and the size of the set of discovered frames.

4.2 Elimination of Candidates by Sampling

A general principle used in most CSP solving is the quick elimination of improbable solutions. When we assign values to variables in a frame we need to make sure that the proportional analogy constraint is satisfied. For four given variable assignments w1, w2, w3 and w4 this involves making sure that w4 is the best candidate for completing the proportional analogy involving w1, w2 and w3. This is equivalent to knowing whether there are any better vocabulary entries than w4 for completing the analogy. Instead of checking all vocabulary entries, we can limit the list of entries to the m nearest neighbours of w2, w3 and w4 (assuming that w1 is farthest from w4, so that the nearest neighbours of w1 are not likely to help check the correctness of w4) and test whether any of them score higher than w4. If any of them score higher, the proportional analogy constraint is violated and the variable assignment is discarded. The advantage of this is that we only need to check 3 x m entries to approximately test the correctness of w4. We find that if we set m to 10 then most incorrect analogies are eliminated. An exhaustive check for correctness can be made as a post-processing step.
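A minimal sketch of this approximate check, assuming 3COSADD-style cosine scoring and neighbour lists ordered by similarity (names are illustrative):

```python
import numpy as np

def passes_approximate_check(E, w1, w2, w3, w4, neigh, m=10):
    """Approximately verify that w4 is the best completion of w1 : w2 :: w3 : ?.

    Only the m nearest neighbours of w2, w3 and w4 are considered as rivals,
    so the test costs roughly 3 * m comparisons instead of a full vocabulary scan."""
    target = E[w2] + E[w3] - E[w1]
    target /= np.linalg.norm(target)
    best = float(E[w4] @ target)
    rivals = set(neigh[w2][:m]) | set(neigh[w3][:m]) | set(neigh[w4][:m])
    rivals -= {w1, w2, w3, w4}
    return all(float(E[r] @ target) <= best for r in rivals)
```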
4.3 Other Methods

Another important method for increasing efficiency is testing the degree to which opposing difference vectors in a candidate proportional analogy are parallel to each other. This is encapsulated in constraint C4. While using parallelism as a parameter for solving word analogy problems has not been successful (Levy et al., 2014), we have found that the degree of parallelism is a good indicator of the confidence that we can have in the analogy once other constraints have been satisfied. Other methods employed to improve efficiency include the use of Bloom filters to test neighbourhood membership and the indexing of discovered proportions so as not to repeat searching.

4.4 Extending Frames

Frames can be extended by extending the initial base frame in one or more directions and searching for additional variable assignments that satisfy all constraints. For example, a 2x3 frame can be extended to a 2x4 frame by assigning values to two new variables x7 and x8 (figure 3).

[Figure 3: Extending a 2x3 frame. A further column (x7, x8) is appended to the rows x1, x2, x3 and x4, x5, x6.]

As frames are extended, the average offset vectors (equation 1) are recomputed so that the offset vectors become better predictors for computing proportional analogies.

5 Experimental Setup and Results

The two criteria we use to measure our proposal are a) accuracy of analogy discovery and b) compute scalability. Other criteria are also possible, such as relation diversity and interestingness of relations.

The primary algorithm parameters are 1) the number of terms in the vocabulary to search, 2) the size of the nearest neighbourhood of each term to search, 3) the degree of parallelism required for opposing vector differences, and 4) whether to extend base frames.

5.1 Retrieving Analogical Completions from Frames

When frames are discovered they are stored in a frame store, a data structure used to efficiently store and retrieve frames. Frames are indexed using posting lists, similar to a document index. To complete an incomplete analogy, all frames which contain all terms in the incomplete analogy are retrieved. For each retrieved frame, the candidate term for completing the analogy is determined by cross-referencing the indices of the terms within the frame (figure 4). If diverse candidate terms are selected from multiple frames, voting is used to select the final analogy completion, or random selection in the case of tied counts.

[Figure 4: Determining an analogy completion from a larger frame. The query a1,1 : a1,3 :: a3,1 : ? is resolved against a 3x3 frame with cells a1,1 ... a3,3.]
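An illustrative sketch of such a frame store, under our own simplifications: frames are stored as row-major grids of word ids, posting lists map each word to the frames containing it, and voting resolves disagreements between frames.

```python
from collections import Counter, defaultdict

class FrameStore:
    """Stores frames as grids of word ids and retrieves analogy completions."""

    def __init__(self):
        self.frames = []                    # each frame: list of rows of word ids
        self.postings = defaultdict(set)    # word id -> indices of frames containing it

    def add(self, frame):
        idx = len(self.frames)
        self.frames.append(frame)
        for row in frame:
            for w in row:
                self.postings[w].add(idx)

    def complete(self, a, b, c):
        """Complete a : b :: c : ? by majority vote over all frames holding a, b and c."""
        votes = Counter()
        for idx in self.postings[a] & self.postings[b] & self.postings[c]:
            frame = self.frames[idx]
            pos = {w: (i, j) for i, row in enumerate(frame) for j, w in enumerate(row)}
            (ra, ca), (rb, cb), (rc, cc) = pos[a], pos[b], pos[c]
            # The completion sits at c's cell shifted by the (row, column) offset a -> b.
            ri, cj = rc + (rb - ra), cc + (cb - ca)
            if 0 <= ri < len(frame) and 0 <= cj < len(frame[ri]):
                votes[frame[ri][cj]] += 1
        # Ties fall to an arbitrary candidate, standing in for random selection.
        return votes.most_common(1)[0][0] if votes else None
```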

We conducted experiments with embeddings constructed by ourselves as well as with publicly accessible embeddings from the fastText web site (https://fasttext.cc/docs/en/english-vectors.html) trained on 600B tokens of the Common Crawl (Mikolov et al., 2018).

We evaluated the accuracy of the frames by attempting to complete the analogies from the well-known Google analogy test set (http://download.tensorflow.org/data/questions-words.txt). The greatest challenge in this type of evaluation is adequately covering the evaluation items. At least three of the terms in an analogy completion item need to be simultaneously present in a single frame for the item to be attempted. We report the results of a typical execution of the system using a nearest neighbourhood size of 25, a maximum vocabulary size of 50 000, and a minimal cosine similarity between opposing vector differences of 0.3. For this set of parameters, 8589 evaluation items were answered by retrieval from the frame store, covering approximately 44% of the evaluation items (table 1). Approximately 30% of these were from the semantic category, and 70% from the syntactic. We compared the accuracy of completing analogies using the frame store to the accuracy of both 3CosAdd and 3CosMul using the same embeddings as used to build the frames.

Table 1: Analogy completion on the Google subset

               3CosAdd   3CosMul   Frames
Sem. (2681)       2666      2660     2673
Syn. (5908)       5565      5602     5655
Tot. (8589)       8231      8262     8328

Results show that the frames are slightly more accurate than 3CosAdd and 3CosMul, achieving 96.9% on the 8589 evaluation items. It needs to be stressed, however, that the objective is not to outperform vector arithmetic based methods, but rather to verify that the frames have a high degree of accuracy.

To better determine the accuracy of the discovered frames we also randomly sampled 1000 of the 21571 frames generated for the results shown in table 2, and manually checked them. The raw outputs are included in the online repository (https://github.com/ldevine/AFM). These frames cover many relations not included in the Google analogy test set. We found 9 frames with errors, giving an accuracy of 99.1%.

It should be noted that the accuracy of frames is influenced by the quality of the embeddings. However, even with embeddings trained on small corpora it is possible to discover analogies provided that sufficient word embedding training epochs have been completed.

The online repository contains further empirical evaluations and explanations regarding parameter choices, including raw outputs and errors made by the system.

5.2 Scaling: Effect of Neighbourhood Size and pThreshold

Tables 2 and 3 show the number of frames discovered on typical executions of the software. The reported numbers are intended to give an indication of the relationship between neighbourhood size, number of frames produced and execution time. The reported times are for 8 software threads.

Table 2: pThreshold = 0.3 (less constrained)

Near. Sz.      15      20      25      30
Frames      13282   16785   19603   21571
Time (sec)   20.1    29.3    38.3    45.9

Table 3: pThreshold = 0.5 (more constrained)

Near. Sz.      15      20      25      30
Frames       6344    7995    9286   10188
Time (sec)   10.4    15.7    21.1    26.0

5.3 Qualitative Analysis

From inspection of the frames we see that a large part of the relations discovered are grammatical or morpho-syntactic, or are related to high-frequency entities.

However, we also observe a large number of other types of relations such as synonyms, antonyms, domain alignments and various syntactic mappings. We provide but a small sample below.

    eyes        eye           blindness
    ears        ear           deafness

    Father      Son           Himself
    Mother      Daughter      Herself

    geschichte  gesellschaft  musik
    histoire    societe       musique

    aircraft    flew          skies     air
    ships       sailed        seas      naval

    always      constantly    constant
    often       frequently    frequent
    sometimes   occasionally  occasional

    brighter    stronger
    bright      strong
    fainter     weaker        (2 x 2 x 2 frame)
    faint       weak

We have also observed reliable mappings with embeddings trained on non-English corpora. The geometry of analogical frames can also be explored via visualizations of projections onto 2D sub-spaces derived from the offset vectors (examples are provided in the online repository).

6 Discussion

We believe that the constraint satisfaction approach introduced in this paper is advantageous because it is systematic but flexible, and can make use of methods from the constraint satisfaction domain. We have only mentioned a few of the primary CSP concepts in this paper. Other constraints can be included in the formulation, such as set membership constraints where sets may be clusters or documents.

One improvement that could be made to the proposed system is to facilitate the discovery of relations that are not one-to-one. While we found many isolated examples of one-to-many relations expressed in the frames, a strictly symmetrical proportional analogy does not seem ideal for capturing one-to-many relations.

As outlined by (Turney, 2006), there are many applications of automating the construction and/or discovery of analogical relations. Some of these include relation classification, metaphor detection, word sense disambiguation, information extraction, question answering and thesaurus generation.

Analogical frames should also provide insight into the geometry of word embeddings and may provide an interesting way to measure their quality.

The most interesting application of the system is in the area of computational creativity with a human in the loop. For example, analogical frames could be chosen for their interestingness and then expanded.

6.1 Software and Online Repository

The software implementing the proposed system as a set of command line applications can be found in the online repository (https://github.com/ldevine/AFM). The software is implemented in portable C++11 and compiles on both Windows and Unix based systems without compiled dependencies. Example outputs of the system as well as parameter settings are provided in the online repository, including the outputs created from embeddings trained on a range of corpora.

7 Future Work and Conclusions

Further empirical evaluation is required. The establishment of more suitable empirical benchmarks for assessing the effectiveness of open analogy discovery is important. The most interesting potential application of this work is in the combination of automated discovery of analogies and human judgment. There is also the possibility of establishing a more open-ended compute architecture that could search continuously for analogical frames in an online fashion.


References

Paul Bartha. 2016. Analogy and analogical reasoning. The Stanford Encyclopedia of Philosophy.

William Correa Beltran, Hélène Jaudoin, and Olivier Pivert. 2015. A clustering-based approach to the mining of analogical proportions. In Tools with Artificial Intelligence (ICTAI), 2015 IEEE 27th International Conference on, pages 125–131. IEEE.

James P. Blevins. 2016. Word and Paradigm Morphology. Oxford University Press.

James P. Blevins, Farrell Ackerman, Robert Malouf, J. Audring, and F. Masini. 2017. Word and paradigm morphology.

Rens Bod. 2009. From exemplar to grammar: A probabilistic analogy-based model of language learning. Cognitive Science, 33(5):752–793.

Leonidas A. A. Doumas, John E. Hummel, and Catherine M. Sandhofer. 2008. A theory of the discovery and predication of relational concepts. Psychological Review, 115(1):1.

Aleksandr Drozd, Anna Gladkova, and Satoshi Matsuoka. 2016. Word embeddings, analogies, and machine learning: Beyond king − man + woman = queen. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 3519–3530.

Rashel Fam and Yves Lepage. 2016. Morphological predictability of unseen words using computational analogy.

Rashel Fam and Yves Lepage. 2017. A study of the saturation of analogical grids agnostically extracted from texts.

Robert M. French. 2002. The computational modeling of analogy-making. Trends in Cognitive Sciences, 6(5):200–205.

Dedre Gentner. 1983. Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7(2):155–170.

Dedre Gentner. 2010. Bootstrapping the mind: Analogical processes and symbol systems. Cognitive Science, 34(5):752–775.

Dedre Gentner and Kenneth D. Forbus. 2011. Computational models of analogy. Wiley Interdisciplinary Reviews: Cognitive Science, 2(3):266–276.

Huda Hakami, Kohei Hayashi, and Danushka Bollegala. 2017. An optimality proof for the PairDiff operator for representing relations between words. arXiv preprint arXiv:1709.06673.

Rogers P. Hall. 1989. Computational approaches to analogical reasoning: A comparative analysis. Artificial Intelligence, 39(1):39–120.

Keith J. Holyoak and Paul Thagard. 1989. Analogical mapping by constraint satisfaction. Cognitive Science, 13(3):295–355.

John E. Hummel and Keith J. Holyoak. 1997. Distributed representations of structure: A theory of analogical access and mapping. Psychological Review, 104(3):427.

Johannes Jurgovsky, Michael Granitzer, and Christin Seifert. 2016. Evaluating memory efficiency and robustness of word embeddings. In European Conference on Information Retrieval, pages 200–211. Springer.

Paul Kiparsky. 1992. Analogy. International Encyclopedia of Linguistics.

Kenneth J. Kurtz, Chun-Hui Miao, and Dedre Gentner. 2001. Learning by analogical bootstrapping. The Journal of the Learning Sciences, 10(4):417–446.

Philippe Langlais. 2016. Efficient identification of formal analogies.

Jean-François Lavallée and Philippe Langlais. 2009. Unsupervised morphological analysis by formal analogy. In Workshop of the Cross-Language Evaluation Forum for European Languages, pages 617–624. Springer.

Yves Lepage. 2014. Analogies between binary images: Application to Chinese characters. In Computational Approaches to Analogical Reasoning: Current Trends, pages 25–57. Springer.

Yves Lepage and Etienne Denoual. 2005. Purest ever example-based machine translation: Detailed presentation and assessment. Machine Translation, 19(3-4):251–282.

Omer Levy and Yoav Goldberg. 2014. Linguistic regularities in sparse and explicit word representations. In CoNLL, pages 171–180.

Tal Linzen. 2016. Issues in evaluating semantic spaces using word analogies. arXiv preprint arXiv:1606.07736.

Pedro Meseguer and Carme Torras. 2001. Exploiting symmetries within constraint satisfaction search. Artificial Intelligence, 129(1-2):133–163.

Laurent Miclet, Sabri Bayoudh, and Arnaud Delhay. 2008. Analogical dissimilarity: definition, algorithms and two experiments in machine learning. Journal of Artificial Intelligence Research, 32:793–824.

Laurent Miclet and Jacques Nicolas. 2015. From formal concepts to analogical complexes. In CLA 2015, page 12.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, and Armand Joulin. 2018. Advances in pre-training distributed word representations. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018).

Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. 2013b. Linguistic regularities in continuous space word representations. In HLT-NAACL, volume 13, pages 746–751.

Francesca Rossi, Peter Van Beek, and Toby Walsh. 2006. Handbook of Constraint Programming. Elsevier.

Royal Skousen. 1989. Analogical Modeling of Language. Springer Science & Business Media.

Nicolas Stroppa and François Yvon. 2005. An analogical learner for morphological analysis. In Proceedings of the Ninth Conference on Computational Natural Language Learning, pages 120–127. Association for Computational Linguistics.

Peter D. Turney. 2006. Similarity of semantic relations. Computational Linguistics, 32(3):379–416.

Peter D. Turney. 2013. Distributional semantics beyond words: Supervised learning of analogy and paraphrase. arXiv preprint arXiv:1310.5042.

Ekaterina Vylomova, Laura Rimell, Trevor Cohn, and Timothy Baldwin. 2015. Take and took, gaggle and goose, book and read: Evaluating the utility of vector differences for lexical relation learning. arXiv preprint arXiv:1509.01692.

Yating Zhang, Adam Jatowt, and Katsumi Tanaka. 2016. Towards understanding word embeddings: Automatically explaining similarity of terms. In Big Data (Big Data), 2016 IEEE International Conference on, pages 823–832. IEEE.
