Arxiv:1901.05066V1 [Cs.CL] 15 Jan 2019 Present in a Multitude of Domains Ranging from Cognitive Science

Investigating Antigram Behaviour using Distributional Semantics

Saptarshi Sengupta B.Tech Student Dept. of Computer Science and Engg. St. Thomas’ College of Engineering and Technology Kolkata, India [email protected]

Abstract used to generate new phrases (anagrams) which might have a connection to the original word, the Language is an extremely interesting task can be to find a new word from the root word subject to study, each day presenting with which it has a relationship (antonym). new challenges and new topics for research. Words in particular have sev- In light of Natural Language Processing (NLP) eral unique characteristics which when ex- tasks, very little work has been performed on plored, prove to be astonishing. Anagrams antonyms and even less work has been done on and Antigrams are such words possessing anagrams and antigrams. Most of the work done these amazing properties. The presented has been conducted from the viewpoint of psy- work is an exploration into generating ana- chology experiments where researchers try to un- grams from a given word and determin- derstand cognitive processing in the human mind. ing whether there exists antigram relation- Adams et al. (2011) presented a work which ships between the pairs of generated ana- showed how the number of syllables influenced grams in light of the Word2Vec distribu- the difficulty of solving an anagram for both tional semantic similarity model. The ex- skilled and unskilled problem solvers. A sim- periments conducted, showed promising ilar task was undertaken by Novick and Sher- results for detecting antigrams. man (2008). Vincent et al. (2006) created software which enabled users to discover novel ana- 1 Introduction grams and classify existing anagrams on the basis An anagram can be defined as a kind of word of certain psycholinguistic variables. Anagrams play where all the characters in the word are re- also help in understanding how cognition is linked arranged, using each character exactly once in or- with age or personality changes (Java, 1992). der to generate a new word (s) which may or Thalenberg (2016) conducted research in explor- may not share semantic relationships with the root ing antonyms and distinguishing their presence in word. ‘Live’ and ‘Vile’ are examples of anagrams. vector space by using the very recent Word2Vec Sometimes it is also possible to create a num- model. But as mentioned before, none of the ber of words from the root word using anagram- above works truly examined anagram or antigram ming so as to produce a phrase; such is with the relationships among words from the standpoint of case ‘Dormitory’ and ‘Dirty Room’. Anagrams are NLP application. Rather they were explorations in arXiv:1901.05066v1 [cs.CL] 15 Jan 2019 present in a multitude of domains ranging from cognitive science. literature (a famous example is seen in the novel Generating single word anagrams is a relatively ‘The Da Vinci Code’ where the phrase “O, Dra- trivial task i.e. simply compute every possible per- conian devil!” was an anagram of “Leonardo Da mutation of the characters of the given word and Vinci”) to cyber security (for solving certain kinds eliminate those terms created which are essentially of cryptograms such as the transposition and per- noisy data i.e. not found in either a corpus or a dic- mutation ciphers). Antigrams on the other hand, tionary. However, trying to automatically detect are a class of anagrams which share an antonymic antigram relationships among the generated ana- relationship with their anagram partner. For in- grams in a trickier task as semantic information stance, ‘medicate’ and ‘decimate’ are examples of is required before such analysis can be conducted. antigrams. These words are much more interest- This idea forms the main motivation of our work. ing to study because instead of a simple word play We wanted to explore how antigrams are related to each other in terms of semantic similarity using tered permutation list are the anagrams that were the Word2Vec distributional similarity model, or required. Using Word2Vec, semantic similarity in other words, determine pairs of antigrams from between each unique pair (C) of anagrams from the anagrams of a given word. The results ob- the filtered permutation list was computed. The tained were compared with Word2Vec similarity members of pair C were termed as C0 and C1. Fi- scores computed between well-known antonyms nally, if the similarity score between C0 and C1 and a unique difference was identified which is de- was found to be less than 0.3, the pair was selected scribed later in the paper. as an antigram. The value of 0.3 was set from The rest of the paper is organized as follows. observation rather than through theoretical study. Section 2 describes the methodology of our work. Changing it around would yield different results. Section 3 provides the results and analysis of ex- Algorithm 1: Anagram generator and Antigram perimentation. Finally, the paper is concluded and checker future directions are described in section 4. Input: Root word W Output: List of anagrams of W and those pairs 2 Proposed Methodology which are antigrams The key question which our system aims to ad- Begin dress is whether pairs of anagrams generated from 1. Generate every permutation of all the letters a target word share an antigram relationship. To in W solve this question, we devise a simple algorithm 2. ∀ P ∈ permutation (cf. algorithm 1) which classifies a pair of words a. if P not (valid lexical form) then (anagrams) as having or not having antigram rela- i. remove P from permutation tionship on the basis of a threshold value which is b. end set empirically. 3. end Our work makes use of the Word2Vec model 4. Print permutation // This is the Anagram List (Mikolov et al., 2013) for calculating seman- 5. Set antigram list ← [] tic similarity between the generated anagrams. 6. ∀ pair C ∈ permutation Word2Vec is an unsupervised learning model which creates embeddings of words i.e. real num- a. z ← sim(C0,C1) bered vector representations of words from a given b. if z ≤ 0.3 then corpus. What makes it such a powerful tool for de- i. Add C to antigram list tecting similarity is its unsupervised nature i.e. not c. end relying on WordNet or related resource and abil- 7. end ity to discover several interesting semantic rela- 8. return antigram list tionships among words. Word2Vec operates in 2 End modes viz. continuous bag-of-words or CBOW A point of contention regarding the algorithm and skip-gram. Generally, when talking about would be, the exclusive use of the Word2Vec similarity between words, the CBOW architecture model for generating similarity scores between the is selected as the latter i.e. skip-gram, is useful in anagram pairs. We provide two reasons for this applications where context prediction is focused choice upon rather than exploring a word for similarity • We wanted to establish an algorithm reliant purposes. on unsupervised models. In the proposed algorithm, each permutation • The main purpose of the algorithm was to of the root word W (P) is run through a spell- examine the nature of word ‘vectors’ to see checker which was implemented in our work with whether antonym relationship was equivalent the help of the PyEnchant1 package for the python to word vectors being opposite in direction in programming language. A spell-checking module semantic space. was required so as to filter out the invalid lexical forms of W (as all the permutations would not be 3 Results Analysis proper English words). The members in the fil- The semantic similarity scores were obtained us- 1 Available online: https://pythonhosted.org/ ing the Word2Vec model trained on the widely pyenchant/ Word Anagram Anagram-Pair Similarity Score System Antigram True Antigram Termini Interim (termini, interim) -0.08 Yes Yes Indeed Denied (indeed, denied) 0.42 No Yes Tip Pit (tip, pit) 0.28 Yes Yes Souring Rousing (souring, rousing) 0.08 Yes Yes Adheres (sheared, adheres) -0.05 Yes Yes Sheared Headers (sheared, headers) 0.23 Yes No - (adheres, headers) -0.04 Yes No

Table 1: Semantic Similarity Scores for antigram testing

Antonym Similarity Score ity scores were computed between well-known Up-Down 0.92 antonym pairs and a striking observation was Large-Small 0.93 made. Table 2 shows their similarity scores. Top-Bottom 0.59 The average semantic similarity score for the Happy-Sad 0.68 antonym pairs (cf. Table 2) was 0.75. Such a high Heavy-Light 0.64 score is actually counterintuitive. This is because the general idea regarding word vectors is that if Table 2: Semantic Similarity Scores for Antonym they are antonyms, they would point in opposite pairs directions and as such the cosine of the angle between them would be negative. Such a notion is available Gigaword2 corpus, distributed in the clearly challenged when the scores from the anti- form of pre-trained word vectors (the 100 dimen- gram tests are contrasted with the antonym tests. sion variant was used). Using such a standardized The scores from Table 2 highlight the fact that it corpus provides a strong validity for the results ob- is not necessary for a word pair to have a negative tained. similarity score, in order to be antonyms. Gen- For testing our algorithm, the antigram dataset erally, if the cosine similarity between two word compiled by Anil (2010) was taken which con- vectors is more than 0.6, it means that they are sisted of 50 pairs of well-known antigrams. Using highly similar to each another. Word2Vec tries to this dataset, the antigrams predicted by the system reduce the angle between vectors of similar words from among the pairs of anagrams generated could and as such they become clustered very near to be easily veriﬁed. Table 1 presents the results of each other in semantic space. Smaller the angle, the antigram tests. closer is its cosine to 1. Thus, having a nega- From the results of Table 1, it became clear that tive similarity score is not indicative of antonyms. a similarity score lower than 0.3 was not the only Rather it was found that antonyms are strongly re- criteria for an anagram pair to be declared an anti- lated to each other and as such produce high sim- gram. A total of 5 pairs of anagrams were actu- ilarity scores (cf. Table 2). This fact was chal- ally (true) antigrams. The system predicted that lenged by the antigrams whose similarity scores 4 out of 5 (80%) of these pairs were antigrams. were extremely low and even negative in some However, the system also falsely predicted that cases (cf. Table 1) but still had an antonymic rela- (sheared, headers) and (adheres, headers) were tionship. This proves that antigrams and antonyms also antigrams owing to their similarity scores be- behave differently in semantic space in spite of ing less than 0.3. This indicated that some form of sharing a common link. manual validation is required when verifying antigram nature according to the proposed model. All 4 Conclusion and Future Work in all, out of a total of 7 cases, the system cor- rectly predicted that 4 out of those 7 cases were The presented work aims to analyse antigrams and antigrams thus achieving an accuracy of 57.14%. anagrams from the standpoint of NLP instead of The 0.3 threshold was selected after seeing how treating them as subjects falling in the realm of antonyms behaved in semantic space. Similar- logology (recreational linguistics). We propose 2Available online: https://nlp.stanford.edu/ a simple technique for detecting antigrams from projects/glove/ pairs of anagrams using Word2Vec similarity. Our work is perhaps one of the ﬁrst to explore this topic using distributional semantics. However, as can be seen from the paper, further research remains to be done in this area particularly involving multi-word anagrams. We propose basic ideas in the direction of anagram and antigram research and hope that future developments are undertaken towards it.

References J.W. Adams, M. Stone, R.D. Vincent, and S.J. Muncer. 2011. The role of syllables in anagram solution : a rasch analysis. Journal of general psychology., 138(2):94–109, April. A. Anil. 2010. Antigrams from word ways: A retro- spective. Word Ways, 43(21). Rosalind I. Java. 1992. Priming and aging: Evidence of preserved memory function in an anagram solution task. The American Journal of Psychology, 105(4):541–548. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efﬁcient estimation of word representations in vector space. CoRR, abs/1301.3781.

Laura R. Novick and Steven J. Sherman. 2008. The effects of superﬁcial and structural information on online problem solving for good versus poor anagram solvers. The Quarterly Journal of Experimen- tal Psychology, 61(7):1098–1120. Bruna Thalenberg. 2016. Distinguishing antonyms from synonyms in vector space models of semantics. Technical report, University of Sao Paulo, Brazil. Robert D. Vincent, Yael K. Goldberg, and Debra A. Titone. 2006. Anagram software for cognitive research that enables speciﬁcation of psycholinguistic variables. Behavior Research Methods, 38(2):196– 201, May.