The Croatian Web Dictionary Project – Mrežnik
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Issues in Text-To-Speech for French
ISSUES IN TEXT-TO-SPEECH FOR FRENCH Evelyne Tzoukermann AT&T Bell Laboratories 600 Mountain Avenue, Murray tlill, N.J. 07974 evelyne@rcsearch, art.corn Abstract in the standard International Phonetic Alphabi,t; the second column ASCII shows the ascii correspon- This paper reports the progress of the French dence of these characters for the text-to-speech text-to-speech system being developed at AT&T system, and the third column shows art example Bell Laboratories as part of a larger project for of the phoneme in a French word. multilingual text-to-speech systems, including lan- guages such as Spanish, Italian, German, Rus- Consonant s Vowels sian, and Chinese. These systems, based on di- IPA ASCII WORD IPA ASCII WORD phone and triphone concatenation, follow the gen- p p paix i i vive eral framework of the Bell Laboratories English t t tout e e the TTS system [?], [?]. This paper provides a de- k k eas e g aisc scription of the approach, the current status of the b b bas a a table French text-to-speech project, and some problems iI d dos u a time particular to French. g g gai 3 > homme m m mais o o tgt n n liOn u U boue 1 Introduction .p N gagner y y tour l 1 livre n ellX In this paper, the new French text-to-sIieech sys- f f faux ce @ seul tem being developed at AT&T is presented; sev- s s si o & peser eral steps have been already achieved while others f S chanter I bain are still in progress. -
Multi-Disciplinary Lexicography
Multi-disciplinary Lexicography Multi-disciplinary Lexicography: Traditions and Challenges of the XXIst Century Edited by Olga M. Karpova and Faina I. Kartashkova Multi-disciplinary Lexicography: Traditions and Challenges of the XXIst Century, Edited by Olga M. Karpova and Faina I. Kartashkova This book first published 2013 Cambridge Scholars Publishing 12 Back Chapman Street, Newcastle upon Tyne, NE6 2XX, UK British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library Copyright © 2013 by Olga M. Karpova and Faina I. Kartashkova and contributors All rights for this book reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner. ISBN (10): 1-4438-4256-7, ISBN (13): 978-1-4438-4256-3 TABLE OF CONTENTS List of Illustrations ..................................................................................... ix List of Tables............................................................................................... x Editors’ Preface .......................................................................................... xi Olga M. Karpova and Faina I. Kartashkova Ivanovo Lexicographic School................................................................ xvii Ekaterina A. Shilova Part I: Dictionary as a Cross-road of Language and Culture Chapter One................................................................................................ -
The Art of Lexicography - Niladri Sekhar Dash
LINGUISTICS - The Art of Lexicography - Niladri Sekhar Dash THE ART OF LEXICOGRAPHY Niladri Sekhar Dash Linguistic Research Unit, Indian Statistical Institute, Kolkata, India Keywords: Lexicology, linguistics, grammar, encyclopedia, normative, reference, history, etymology, learner’s dictionary, electronic dictionary, planning, data collection, lexical extraction, lexical item, lexical selection, typology, headword, spelling, pronunciation, etymology, morphology, meaning, illustration, example, citation Contents 1. Introduction 2. Definition 3. The History of Lexicography 4. Lexicography and Allied Fields 4.1. Lexicology and Lexicography 4.2. Linguistics and Lexicography 4.3. Grammar and Lexicography 4.4. Encyclopedia and lexicography 5. Typological Classification of Dictionary 5.1. General Dictionary 5.2. Normative Dictionary 5.3. Referential or Descriptive Dictionary 5.4. Historical Dictionary 5.5. Etymological Dictionary 5.6. Dictionary of Loanwords 5.7. Encyclopedic Dictionary 5.8. Learner's Dictionary 5.9. Monolingual Dictionary 5.10. Special Dictionaries 6. Electronic Dictionary 7. Tasks for Dictionary Making 7.1. Panning 7.2. Data Collection 7.3. Extraction of lexical items 7.4. SelectionUNESCO of Lexical Items – EOLSS 7.5. Mode of Lexical Selection 8. Dictionary Making: General Dictionary 8.1. HeadwordsSAMPLE CHAPTERS 8.2. Spelling 8.3. Pronunciation 8.4. Etymology 8.5. Morphology and Grammar 8.6. Meaning 8.7. Illustrative Examples and Citations 9. Conclusion Acknowledgements ©Encyclopedia of Life Support Systems (EOLSS) LINGUISTICS - The Art of Lexicography - Niladri Sekhar Dash Glossary Bibliography Biographical Sketch Summary The art of dictionary making is as old as the field of linguistics. People started to cultivate this field from the very early age of our civilization, probably seven to eight hundred years before the Christian era. -
Phonotactic Complexity of Finnish Nouns FRED KARLSSON
7 Phonotactic Complexity of Finnish Nouns FRED KARLSSON 7.1 Introduction In the continuous list of publications on his homepage, Kimmo Koskenniemi gives an item from 1979 as the first one. But this is not strictly speaking his first publication. Here I shall elevate from international oblivion a report of Kimmo’s from 1978 from which the following introductory prophecy is taken: “The computer might be an extremely useful tool for linguistic re- search. It is fast and precise and capable of treating even large materials” (Koskenniemi 1978: 5). This published report is actually a printed version of Kimmo’s Master’s Thesis in general linguistics where he theoretically analyzed the possibili- ties of automatic lemmatization of Finnish texts, including a formalization of Finnish inflectional morphology. On the final pages of the report he esti- mates that the production rules he formulated may be formalized as analytic algorithms in several ways, that the machine lexicon might consist of some 200,000 (more or less truncated) stems, that there are some 4,000 inflectional elements, that all of these stems and elements can be accommodated on one magnetic tape or in direct-access memory, and that real-time computation could be ‘very reasonable’ (varsin kohtuullista) if the data were well orga- nized and a reasonably big computer were available (ibid.: 52-53). I obviously am the happy owner of a bibliographical rarity because Kimmo’s dedication of 1979 tells me that this is the next to the last copy. This was five years before two-level morphology was launched in 1983 when Kimmo substantiated his 1978 exploratory work by presenting a full- blown theory of computational morphology and entered the international computational-linguistic scene where he has been a main character ever since. -
A Reverse Dictionary of Cypriot Greek
Cypriot Greek Lexicography: A Reverse Dictionary of Cypriot Greek Charalambos Themistocleous, Marianna Katsoyannou, Spyros Armosti & Kyriaki Christodoulou Keywords: reverse dictionary, Cypriot Greek, orthographic variation, orthography standardisation, dialectal lexicography. Abstract This article explores the theoretical issues of producing a dialectal reverse dictionary of Cypriot Greek, the collection of data, the principles for selecting the lemmas among various candidates of word types, their orthographic representation, and the choices that were made for writing a variety without a standardized orthography. 1. Introduction Cypriot Greek (henceforth CG) is a variety of Greek spoken by almost a million people in the Republic of Cyprus. CG differs from Standard Modern Greek (henceforth SMG) with regard to its phonetics, phonology, morphology, syntax, and even pragmatics (Goutsos and Karyolemou 2004; Papapavlou and Pavlou 1998; Papapavlou 2005; Tsiplakou 2004; Katsoyannou et al. 2006; Tsiplakou 2007; Arvaniti 2002; Terkourafi 2003). The study of the vocabulary of CG was one of the research goals of ‘Syntychies’, a research project for the production of lexicographic resources undertaken by the Department of Byzantine and Modern Greek Studies of the University of Cyprus between 2006 and 2010. Another research goal of the project has been the study of the written representation of the dialect—since there exists no standardized CG orthography. The applied part of the Syntychies project includes the creation of a lexicographic database suitable for the production of dialectal dictionaries of CG, such as a reverse dictionary of CG (henceforth RDCG), which is currently under publication. The lexicographic database is hosted in a dedicated webpage, which allows online searching of the database. -
Reverse Dictionary
Imperial Journal of Interdisciplinary Research (IJIR) Vol-2, Issue-4, 2016 ISSN: 2454-1362, http://www.onlinejournal.in Reverse Dictionary Veena Gurram1, Sweta Rathod2, Anish Lushte3, Pranay Vaidya4, Prof. Vinod Alone5 & Prof. Mahendra Pawar6 1,2,3,4B.E Student 5,6Assistant Professor 1,2,3,4,5,6Department Of Computer Engineering, PVPPCOE Sion, Mumbai 400022 Abstract— In this paper, we describe the design and dictionary are the reverse dictionaries [3][4] which is implementation of a reverse dictionary. Unlike a tra- built with certain drawbacks. The existing dictionary ditional forward dictionary, which maps from words receives an input phrase and outputs many output to their definitions, a reverse dictionary takes a user words; therefore it can be tedious for the user to input phrase describing the desired concept, and re- search one from it. Whereas, building a ranking based turns a set of candidate words that satisfy the input dictionary, allows the user to choose from a set of phrase. This work has significant application not only words which are closely related to each other. for the general public, particularly those who work Example: In the existing reverse dictionary, the user closely with words, but also in the general field of inputs a phrase “unknown name” outputs over a 100 conceptual search. search results including “anonymous”, “nameless”, “incognito”, “jane doe”, “john doe”, “some”, “sky- Index Terms—Dictionaries, thesauruses, search pro- blue pink”, “dark horse” etc. [4], But the most cess, web-based services. accurate result for “unknown name” shall be “anonymous”, “nameless” but “incognito” also I. INTRODUCTION contributes some primary meaning to the user input phrase. -
Towards Building a Multilingual Sememe Knowledge Base
Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets Fanchao Qi1∗, Liang Chang2∗y, Maosong Sun13z, Sicong Ouyang2y, Zhiyuan Liu1 1Department of Computer Science and Technology, Tsinghua University Institute for Artificial Intelligence, Tsinghua University Beijing National Research Center for Information Science and Technology 2Beijing University of Posts and Telecommunications 3Jiangsu Collaborative Innovation Center for Language Ability, Jiangsu Normal University [email protected], [email protected] fsms, [email protected], [email protected] Abstract word husband A sememe is defined as the minimum semantic unit of human languages. Sememe knowledge bases (KBs), which contain sense "married man" "carefully use" words annotated with sememes, have been successfully ap- plied to many NLP tasks. However, existing sememe KBs are built on only a few languages, which hinders their widespread human economize utilization. To address the issue, we propose to build a uni- sememe fied sememe KB for multiple languages based on BabelNet, a family male spouse multilingual encyclopedic dictionary. We first build a dataset serving as the seed of the multilingual sememe KB. It man- ually annotates sememes for over 15 thousand synsets (the entries of BabelNet). Then, we present a novel task of auto- Figure 1: Sememe annotation of the word “husband” in matic sememe prediction for synsets, aiming to expand the HowNet. seed dataset into a usable KB. We also propose two simple and effective models, which exploit different information of synsets. Finally, we conduct quantitative and qualitative anal- sememes to annotate senses of over 100 thousand Chinese yses to explore important factors and difficulties in the task. -
Kernerman Kdictionaries.Com/Kdn DICTIONARY News
Number 19 y July 2011 Kernerman kdictionaries.com/kdn DICTIONARY News Integrating phonetic transcription in a Brazilian Portuguese dictionary Luiz Carlos Cagliari 1. The dictionary in everyday work. It is always common for a phonetician to K Dictionaries has developed a series of dictionaries for learners transcribe his or her own language. When doing that, all kinds of various languages, including Portuguese/French. This of sound variation are registered, according to the speakers’ dictionary was based on European Portuguese, whose words pronunciation. On the other hand, when doing phonology, the had a phonetic transcription with European pronunciation. A sound patterns are interpreted following theories in order to group of lexicographers from the University of São Paulo, under get an abstract sound system of the language. In this case, the the supervision of Ieda Maria Alves, adapted the Portuguese phonetic variation is accommodated into their phonemes. The entries to Brazilian vocabulary, and another team from UNESP phonological transcription does not represent a pronunciation (São Paulo State University) at São José do Rio Preto, under of the language, in the same sense as phonetic transcription the supervision of Claudia Xatara, adapted to Brazilian the does. Moreover, phonological transcription does not relate to Poruguese translations of the French entries. language in the same way as orthography does. According In line with the change from European to Brazilian to the rules of writing systems, orthography has the function Portuguese, I gathered a group of students to be involved in of neutralizing the phonetic variation of a language on the modifying the phonetic transcription. The collaborators were word pronunciation level, allowing all speakers to read. -
Using an On-Line Dictionary to Find Rhyming Words And
USING AN ON=LINE DICTIONARY TO FIND RHYMING WORDS AND PRONUNCIATIONS FOR UNKNOWN WORDS Roy J Byrd I.B.M. Thomas J. Watson Research Center Yorktown Heights, New York 10598 Martin S. Chodorow Department of Psychology, Hunter College of CUNY and I.B.M. Thomas J. Watson Research Center Yorktown Heights, New York 10598 ABSTRACT word not found in the dictionary. WordSmith also shows the user words that are "close" to a given word Humans know a great deal about relationships among along dimensions such as spelling (as in published dic- words. This paper discusses relationships among word tionaries), meaning (as in thesauruses), and sound (as in pronunciations. We describe a computer system which rhyming dictionaries). models human judgement of rhyme by assigning specific roles to the location of primary stress, the similarity of Figure I shows a sample of the WordSmith user inter- phonetic segments, and other factors. By using the face. The current word, urgency, labels the text box at model as an experimental tool, we expect to improve our the center of the screen. The box contains the output understanding of rhyme. A related computer model will of the PRONUNC application applied to the current attempt to generate pronunciations for unknown words word: it shows the pronunciation of urgency and the by analogy with those for known words. The analogical mapping between the word's spelling and pronunciation. processes involve techniques for segmenting and PRONUNC represents pronunciations in an alphabet matching word spellings, and for mapping spelling to derived from Webster's Seventh Collegiate Dictionary. In sound in known words. -
Producing an Encyclopedic Dictionary Using Patent Documents
Producing an Encyclopedic Dictionary Using Patent Documents Atsushi Fujii Graduate School of Library, Information and Media Studies University of Tsukuba 1-2 Kasuga, Tsukuba, 305-8550, Japan [email protected] Abstract Although the World Wide Web has of late become an important source to consult for the meaning of words, a number of technical terms related to high technology are not found on the Web. This paper describes a method to produce an encyclopedic dictionary for high-tech terms from patent information. We used a collection of unexamined patent applications published by the Japanese Patent Office as a source corpus. Given this collection, we extracted terms as headword candidates and retrieved applications including those headwords. Then, we extracted paragraph-style descriptions and categorized them into technical domains. We also extracted related terms for each headword. We have produced a dictionary including approximately 400 000 Japanese terms as headwords. We have also implemented an interface with which users can explore our dictionary by reading text descriptions and viewing a related-term graph. 1. Introduction CLONE has been used for various research purposes. Term descriptions that have been carefully organized At the same time, we have identified that descriptions of in hand-compiled dictionaries and encyclopedias pro- technical terms associated with high technology are not vide valuable linguistic knowledge for human use and necessarily found on the Web. Example terms are “pho- knowledge-intensive computer systems, developed in the tosensitive lithographic printing plate”, “tracking error sig- human language technology community. However, as with nal”, and “magenta coupler”. Even in Wikipedia, which other types of linguistic knowledge relying on human intro- is a large encyclopedia on the Web, a number of high-tech spection and supervision, compiling encyclopedias is ex- terms are not explained. -
Implementing a Reverse Dictionary, Based on Word Definitions, Using A
Implementing a Reverse Dictionary, based on word definitions, using a Node-Graph Architecture Sushrut Thorat Varad Choudhari Center for Mind/Brain Sciences Department of Computer Science University of Trento Rajarambapu Institute of Technology Rovereto, TN 38068, Italy Islampur, MH 415414, India [email protected] [email protected] Abstract In this paper, we outline an approach to build graph-based reverse dictionaries using word defi- nitions. A reverse dictionary takes a phrase as an input and outputs a list of words semantically similar to that phrase. It is a solution to the Tip-of-the-Tongue problem. We use a distance-based similarity measure, computed on a graph, to assess the similarity between a word and the input phrase. We compare the performance of our approach with the Onelook Reverse Dictionary and a distributional semantics method based on word2vec, and show that our approach is much better than the distributional semantics method, and as good as Onelook, on a 3k lexicon. This simple approach sets a new performance baseline for reverse dictionaries.1 1 Introduction A forward dictionary (FD) maps words to their definitions. A reverse dictionary (RD) (Sierra, 2000), also known as an inverse dictionary, or search-by-concept dictionary (Calvo et al., 2016), maps phrases to single words that approximate the meaning of those phrases. In the Oxford Learner’s Dictionary2, one definition of ‘brother’ is ‘a boy or man who has the same mother and father as another person’. A reverse dictionary will map not only this phrase to ‘brother’, but also phrases such as ‘son of my parents’. -
Find the Right Words with Thesauruses Ebook, Epub
FIND THE RIGHT WORDS WITH THESAURUSES PDF, EPUB, EBOOK Kara Fribley | 24 pages | 01 Jan 2012 | Cherry Lake Publishing | 9781610803694 | English | Ann Arbor, United States Find the Right Words with Thesauruses PDF Book How is the word barren an attack on women? Login or Register. This essential guide for writers provides real-life example sentences and a careful selection of the most relevant synonyms, as well as new usage notes, hints for choosing between similar words, a Word Finder section organized by subject, and a comprehensive language guide. An easy to access list of synonyms can greatly influence any writing assignment a student has. Stay a step ahead with Microsoft For some types of searches only the first result or the first few results are likely to be useful. Check your dictionary… help you extend your vocabulary. Words formed from any letters in thesauruses , plus an optional blank or existing letter All words formed from thesauruses by changing one letter Browse words starting with thesauruses by next letter. Thy shalt If you get back nothing but junk, try restating your query so that it's just two or three simple words. You can also train yourself to self-edit by watching out for words you know you over-rely on. It is a great tool to use for building a vocabulary, however, it should be handled with care when used for writing purposes as it can have the adverse effect. In Excel , on the Review tab, click Thesaurus. At the bottom of the Thesaurus task pane, select a language from the drop- down list.