An Open-Source Online Reverse Dictionary System

WantWords: An Open-source Online Reverse Dictionary System

Fanchao Qi¹,²*, Lei Zhang²*†, Yanhui Yang²†, Zhiyuan Liu¹,²,³, Maosong Sun¹,²,³
¹ Department of Computer Science and Technology, Tsinghua University
² Institute for Artificial Intelligence, Tsinghua University
  Beijing National Research Center for Information Science and Technology
³ Beijing Academy of Artificial Intelligence
* Indicates equal contribution
† Work done during internship at Tsinghua University

Proceedings of the 2020 EMNLP (Systems Demonstrations), pages 175–181, November 16-20, 2020. © 2020 Association for Computational Linguistics

Abstract

A reverse dictionary takes descriptions of words as input and outputs words semantically matching the input descriptions. Reverse dictionaries have great practical value, such as solving the tip-of-the-tongue problem and helping new language learners. There have been some online reverse dictionary systems, but they support English reverse dictionary queries only and their performance is far from perfect. In this paper, we present a new open-source online reverse dictionary system named WantWords (https://wantwords.thunlp.org/). It not only significantly outperforms other reverse dictionary systems on English reverse dictionary performance, but also supports Chinese and English-Chinese as well as Chinese-English cross-lingual reverse dictionary queries for the first time. Moreover, it has a user-friendly front-end design which can help users find the words they need quickly and easily. All the code and data are available at https://github.com/thunlp/WantWords.

1 Introduction

Opposite to a regular (forward) dictionary that provides definitions for query words, a reverse dictionary (Sierra, 2000) returns words semantically matching the query descriptions. In Figure 1, for example, a regular dictionary tells you the definition of “expressway” is “a wide road that allows traffic to travel fast”, while a reverse dictionary outputs “expressway” and other semantically similar words like “freeway” which match the query description “a road where cars go very quickly without stopping” you input.

Figure 1: An example illustrating what a regular (forward) dictionary and a reverse dictionary are.

Reverse dictionaries are useful in practical applications. First and foremost, they can effectively solve the tip-of-the-tongue problem (Brown and McNeill, 1966), namely the phenomenon of failing to retrieve a word from memory. Many people frequently suffer from this problem, especially those who write a lot such as writers, researchers and students. With the help of reverse dictionaries, people can quickly and easily find the words that they need but temporarily forget.

In addition, reverse dictionaries are helpful to new language learners who grasp a limited number of words. They will know and learn some new words that have the meanings they want to express by using a reverse dictionary. Also, reverse dictionaries can help word selection (or word dictionary) anomia patients, people who can recognize and describe an object but fail to name it due to neurological disorder (Benson, 1979).

Currently, there are mainly two online reverse dictionaries, namely OneLook¹ and ReverseDictionary.² Their performance is far from perfect. Further, both of them are closed-source and only support English reverse dictionary queries.

To solve these problems, we design and develop a new online reverse dictionary system named WantWords, which is totally open-source. WantWords is mainly based on our proposed multi-channel reverse dictionary model (Zhang et al., 2020), which achieves state-of-the-art performance on an English benchmark dataset. Our system uses an improved version of the multi-channel reverse dictionary model and incorporates some engineering tricks to handle extreme cases. Evaluation results show that with these improvements, our system achieves higher performance. Besides, our system supports Chinese reverse dictionary queries and Chinese-English as well as English-Chinese cross-lingual reverse dictionary queries, all of which are realized for the first time. Finally, our system is very user-friendly. It includes multiple filters and sort methods, and can automatically cluster the candidate words, all of which help users find the target words as quickly as possible.

¹ https://onelook.com/thesaurus/
² https://reversedictionary.org/

2 Related Work

There are mainly two methods for reverse dictionary building. The first one is based on sentence matching (Bilac et al., 2004; Zock and Bilac, 2004; Méndez et al., 2013; Shaw et al., 2013). Its main idea is to return the words whose dictionary definitions are most similar to the query description. Although effective in some cases, this method cannot cope with the problem that human-written query descriptions might differ widely from dictionary definitions.

The second method uses a neural language model (NLM) to encode the query description into a vector in the word embedding space, and returns the words with the closest embeddings to the vector of the query description (Hill et al., 2016; Morinaga and Yamaguchi, 2018; Kartsaklis et al., 2018; Hedderich et al., 2019; Pilehvar, 2019). Performance of this method depends largely on the quality of word embeddings. Unfortunately, according to Zipf’s law (Zipf, 1949), many words are low-frequency and usually have poor embeddings.

To tackle this issue of the NLM-based method, we proposed a multi-channel reverse dictionary model (Zhang et al., 2020). This model is composed of a sentence encoder, more specifically, a bi-directional LSTM (BiLSTM) (Hochreiter and Schmidhuber, 1997) with attention (Bahdanau et al., 2015), and four characteristic predictors. The four predictors are used to predict the part-of-speech, morphemes, word category and sememes³ of the target word according to the query description, respectively. The incorporation of the characteristic predictors can help find the target words with poor embeddings and exclude wrong words with similar embeddings to the target words, such as antonyms. Experimental results have demonstrated that our multi-channel reverse dictionary model achieves state-of-the-art performance. In WantWords, we employ an improved version of it that yields better results.

³ A sememe is defined as the minimum semantic unit of human languages (Bloomfield, 1926). The meaning of a word can be expressed by several sememes.

3 System Architecture

In this section, we describe the system architecture of WantWords. We first give an overview of its workflow, then we detail the improved multi-channel reverse dictionary model, and finally we introduce its front-end design.

3.1 Overall Workflow

The workflow of WantWords is illustrated in Figure 2. There are two reverse dictionary modes, namely monolingual and cross-lingual modes. In the monolingual mode, if the query description is longer than one word, it will be fed into the multi-channel reverse dictionary model directly, which calculates a confidence score for each candidate word in the vocabulary; if the query description is just a word, the confidence score of each candidate word is mostly based on the cosine similarity between the embeddings of the query word and candidate word.

Figure 2: Workflow of WantWords.

In the cross-lingual mode, where the query descriptions are in the source language and the target words are in the target language, if the query description is longer than one word, it will be translated into the target language first and then processed in the monolingual mode of the target language; if the query description is just a word, cross-lingual dictionaries will be consulted for the target-language definitions of the query word, and then the definitions are fed into the multi-channel reverse dictionary model to calculate candidate words’ confidence scores.

After obtaining confidence scores, all candidate words in the vocabulary will be sorted by descending confidence scores and listed as system output. The words in the query description are excluded since they are unlikely to be the target word. Different filters, other sort methods and clustering may be further employed to adjust the final results.

3.2 Multi-channel Reverse Dictionary Model

The multi-channel reverse dictionary model (MRDM) is the core module of our system. We use an improved version of MRDM that employs BERT (Devlin et al., 2019) rather than BiLSTM as the sentence encoder. Figure 3 illustrates the model.

Figure 3: Revised version of the multi-channel reverse dictionary model.

For a given query description, MRDM calculates a confidence score for each candidate word in the vocabulary. The confidence score is composed of five parts:

(1) The first part is word score. To obtain it, the input query description is first encoded into a sentence vector by BERT, then the sentence vector is mapped into the space of word embeddings by a [...]

(5) The fifth part is sememe score, which is based on the prediction for the sememes of the target word. Sememe score can be calculated in a similar way to morpheme score.

We use the official pre-trained BERT models for both English and Chinese.⁴ As for fine-tuning (training) for English, we use the dictionary definition dataset created by Hill et al. (2016), which contains about 100,000 words and 900,000 word-definition pairs extracted from five dictionaries.

For fine-tuning (training) for Chinese, we build a large-scale dictionary definition dataset based on the dataset created by Zhang et al. (2020).
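The query routing and final ranking step described in Section 3.1 can be illustrated with a small sketch. This is not the WantWords implementation: the function names, the path labels, the toy embeddings and the whitespace tokenisation are all assumptions made for illustration (Chinese queries would need a word segmenter), and the single-word score here uses cosine similarity alone, whereas the paper says the score is only "mostly" based on it.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def route_query(query: str, cross_lingual: bool) -> str:
    """Pick a processing path from the mode and the query length.

    Hypothetical labels; whitespace tokenisation is an assumption.
    """
    single_word = len(query.split()) == 1
    if cross_lingual:
        # One word: look up target-language definitions, then run the model.
        # Longer: translate the description, then run the model.
        return "dictionary_lookup_then_model" if single_word else "translate_then_model"
    # Monolingual: one word uses embedding similarity, longer text uses the model.
    return "word_similarity" if single_word else "model"


def word_similarity_scores(
    query_word: str, embeddings: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Score every vocabulary word by cosine similarity to the query word."""
    query_vec = embeddings[query_word]
    return {word: cosine_similarity(query_vec, vec) for word, vec in embeddings.items()}


def rank_candidates(scores: Dict[str, float], query: str) -> List[str]:
    """Sort candidates by descending score, dropping words that occur in the query."""
    query_words = set(query.lower().split())
    kept = [word for word in scores if word.lower() not in query_words]
    return sorted(kept, key=lambda word: scores[word], reverse=True)


# Toy two-dimensional embeddings, invented for this example only.
toy_embeddings = {
    "road": (1.0, 0.0),
    "freeway": (0.9, 0.1),
    "banana": (0.0, 1.0),
}

query = "road"
if route_query(query, cross_lingual=False) == "word_similarity":
    scores = word_similarity_scores(query, toy_embeddings)
    print(rank_candidates(scores, query))  # ['freeway', 'banana']
```

In this sketch the query word itself would receive the top similarity score, which is why excluding query words before ranking matters for the single-word path as well as for full descriptions.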