Methods for Estonian Large Vocabulary Speech Recognition
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
A Survey of Orthographic Information in Machine Translation 3
Machine Translation Journal manuscript No. (will be inserted by the editor) A Survey of Orthographic Information in Machine Translation Bharathi Raja Chakravarthi1 ⋅ Priya Rani 1 ⋅ Mihael Arcan2 ⋅ John P. McCrae 1 the date of receipt and acceptance should be inserted later Abstract Machine translation is one of the applications of natural language process- ing which has been explored in different languages. Recently researchers started pay- ing attention towards machine translation for resource-poor languages and closely related languages. A widespread and underlying problem for these machine transla- tion systems is the variation in orthographic conventions which causes many issues to traditional approaches. Two languages written in two different orthographies are not easily comparable, but orthographic information can also be used to improve the machine translation system. This article offers a survey of research regarding orthog- raphy’s influence on machine translation of under-resourced languages. It introduces under-resourced languages in terms of machine translation and how orthographic in- formation can be utilised to improve machine translation. We describe previous work in this area, discussing what underlying assumptions were made, and showing how orthographic knowledge improves the performance of machine translation of under- resourced languages. We discuss different types of machine translation and demon- strate a recent trend that seeks to link orthographic information with well-established machine translation methods. Considerable attention is given to current efforts of cog- nates information at different levels of machine translation and the lessons that can Bharathi Raja Chakravarthi [email protected] Priya Rani [email protected] Mihael Arcan [email protected] John P. -
Technical Reference Manual for the Standardization of Geographical Names United Nations Group of Experts on Geographical Names
ST/ESA/STAT/SER.M/87 Department of Economic and Social Affairs Statistics Division Technical reference manual for the standardization of geographical names United Nations Group of Experts on Geographical Names United Nations New York, 2007 The Department of Economic and Social Affairs of the United Nations Secretariat is a vital interface between global policies in the economic, social and environmental spheres and national action. The Department works in three main interlinked areas: (i) it compiles, generates and analyses a wide range of economic, social and environmental data and information on which Member States of the United Nations draw to review common problems and to take stock of policy options; (ii) it facilitates the negotiations of Member States in many intergovernmental bodies on joint courses of action to address ongoing or emerging global challenges; and (iii) it advises interested Governments on the ways and means of translating policy frameworks developed in United Nations conferences and summits into programmes at the country level and, through technical assistance, helps build national capacities. NOTE The designations employed and the presentation of material in the present publication do not imply the expression of any opinion whatsoever on the part of the Secretariat of the United Nations concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. The term “country” as used in the text of this publication also refers, as appropriate, to territories or areas. Symbols of United Nations documents are composed of capital letters combined with figures. ST/ESA/STAT/SER.M/87 UNITED NATIONS PUBLICATION Sales No. -
VALTS ERNÇSTREITS (Riga) LIVONIAN ORTHOGRAPHY Introduction the Livonian Language Has Been Extensively Written for About
Linguistica Uralica XLIII 2007 1 VALTS ERNÇSTREITS (Riga) LIVONIAN ORTHOGRAPHY Abstract. This article deals with the development of Livonian written language and the related matters starting from the publication of first Livonian books until present day. In total four different spelling systems have been used in Livonian publications. The first books in Livonian appeared in 1863 using phonetic transcription. In 1880, the Gospel of Matthew was published in Eastern and Western Livonian dialects and used Gothic script and a spelling system similar to old Latvian orthography. In 1920, an East Livonian written standard was established by the simplification of the Finno-Ugric phonetic transcription. Later, elements of Latvian orthography, and after 1931 also West Livonian characteristics, were added. Starting from the 1970s and due to a considerable decrease in the number of Livonian mother tongue speakers in the second half of the 20th century the orthography was modified to be even more phonetic in the interest of those who did not speak the language. Additionally, in the 1930s, a spelling system which was better suited for conveying certain phonetic phenomena than the usual standard was used in two books but did not find any wider usage. Keywords: Livonian, standard language, orthography. Introduction The Livonian language has been extensively written for about 150 years by linguists who have been noting down examples of the language as well as the Livonians themselves when publishing different materials. The prime consideration for both of these groups has been how to best represent the Livonian language in the written form. The area of interest for the linguists has been the written representation of the language with maximal phonetic precision. -
The Fore Language of Papua New Guinea
PACIFIC LINGUISTICS Se�ie� B - No. 47 THE FORE LANGUAGE OF PAPUA NEW GUINEA by Graham Scott Department of Linguistics Research School of Pacific Studies THE AUSTRALIAN NATIONAL UNIVERSITY Scott, G. The Fore language of Papua New Guinea. B-47, xvi + 225 pages. Pacific Linguistics, The Australian National University, 1978. DOI:10.15144/PL-B47.cover ©1978 Pacific Linguistics and/or the author(s). Online edition licensed 2015 CC BY-SA 4.0, with permission of PL. A sealang.net/CRCL initiative. r PACIFIC LINGUISTICS is published through the Lingu��tic Ci�cle 06 Canb e��a and consists of four series: SERIES A - OCCASIONA L PAPERS SERIES B - MONOGRAPHS SERIES C - BOOKS SERIES V - SPECIAL PUB LICATIONS EDITOR: S.A. Wurm. ASSOCIATE EDITORS: D.C. Laycock, C.L. Voorhoeve, D.T. Tryon, T.E. Dutton. EDITORIAL ADVISERS: B. Bender, University of Hawaii N.D. Liem, University of Hawaii D. Bradley, Australian National J. Lynch, University of Papua University New Guinea A. Capell, University of Sydney K.A. McElhanon, University of S. Elbert, University of Hawaii Texas K. Franklin, Summer Institute of H. McKaughan, University of Hawaii Linguistics P. MUhlh�usler, Technische W.W. Glover, Summer Institute of Universit�t Berlin Linguis tics G.N. O'Grady, University of G. Grace, University of Hawaii Victoria, B.C. M.A.K. Halliday, University of K. Pike, University of Michigan; Sydney Summer Institute of Linguistics A. Healey, Summer Institute of E.C. Polome, University of Texas Linguistics E. Uhlenbeck, University of Leiden L. Hercus, Australian National J.W.M. Verhaar, University of University Indonesia, Jakarta ALL CORRESPONDENCE concerning PA CIFIC LINGUIS TICS, including orders and subscriptions, should be addressed to: The Secretary, PACIFIC LINGUIS TICS, Department of Linguistics , School of Pacific Studies , The Australian National University , Box 4, P.O., Canberra , A.C.T. -
A Generative Approach to the Phonology of Bahasa Indonesia
A GENERATIVE APPROACH TO THE P H O N O L O G Y OF BAHASA INDONESIA by Hans Lapoliwa (MATERIALS IN LANGUAGES OF INDONESIA, N o. 3) ,W. A. L. Stokhof, Series Editor A GENERATIVE APPROACH TO THE PHONOLOGY OF BAHASA INDONESIA by Hans Lapoliwa (MATERIALS IN LANGUAGES OF INDONESIA/ NO. 3) W.A.L. Stokhof, Series Editor Department of Linguistics Research School of Pacific Studies THE AUSTRALIAN NATIONAL UNIVERSITY F A K U L T A S - S AS IR A PACIFIC LINGUISTICS is issued through the L-ingu-c6t-cc C-t fide, o & Ca.nbzn.tia. and consists of four series: SERIES A - OCCASIONAL PAPERS SERIES 8 - MONOGRAPHS SERIES C - BOOKS SERIES V - SPECIAL PUBLICATIONS EDITOR: S.A. Wurm. ASSOCIATE EDITORS: D.C. Laycock, C.L. Voorhoeve, D.T. Tryon, T.E. Dutton. EDITORIAL ADVISERS: B. Bender, University of Hawaii K.A. McElhanon, University of Texas D. Bradley, University of Melbourne H. McKaughan, University of Hawaii A. Capell, University of Sydney p - Miihlhausler, Linacre College, Oxford S. Elbert, University of Hawaii G.N. O'Grady, University of Victoria, K. Franklin, Summer Institute of B.C. Linguistics A.K. Pawley, University of Hawaii W.W. Glover, Summer Institute of K. Pike, University of Michigan; Summer Linguistics Institute of Linguistics G. Grace, university of Hawaii E.C. Polome, University of Texas M.A.K. Halliday, University of Sydney G. Sankoff, Universite de Montreal A. Healey, Summer Institute of W.A.L. Stokhof, National Center for Linguistics Language Development, Jakarta; L. Hercus, Australian National University University of Leiden N.D. -
XXVII Fonetiikan Päivät Phonetics Symposium 2012
XXVII Fonetiikan päivät Phonetics Symposium 2012 Tallinn, Estonia February 17-18, 2012 XXVII Fonetiikan päivät – Phonetics Symposium 2012 Tallinn, Estonia, February 17-18, 2012 Phonetics Symposium 2012 (XXVII Fonetiikan päivät) continues the tradition of meetings of Finnish phoneticians started in 1971 in Turku. These meetings, held in turn at different universities in Finland, have been frequently attended by Estonian phoneticians as well. In 1998 the meeting was held in Pärnu, Estonia, and in 2012 it will take place in Estonia for the second time. Phonetics Symposium 2012 (XXVII Fonetiikan päivät) will provide a forum for scientists and students in phonetics and speech technology to present and discuss recent research and development in spoken language communication. In the recent years we have lost three world-renowned scholars in the phonetic sciences – Ilse Lehiste, Matti Karjalainen and Arvo Eek. The symposium shall commemorate and honor their scientific contributions to Estonian and Finnish phonetics and speech technology. The symposium is hosted by the Institute of Cybernetics at Tallinn University of Technology (IoC) and organized in co-operation with the Estonian Centre of Excellence in Computer Science, EXCS (funded mainly by the European Regional Development Fund). XXVII Fonetiikan päivät – Phonetics Symposium 2012 Tallinn, Estonia, February 17-18, 2012 SCIENTIFIC PROGRAM: DAY 1 9:45 Opening SESSION 1: Chair Karl Pajusalu Professori Matti Karjalainen, suomalaisen 10:00 Unto K. Laine puheteknologian edelläkävijä 10:30 Diana -
Estonian Yiddish and Its Contacts with Coterritorial Languages
DISSERT ATIONES LINGUISTICAE UNIVERSITATIS TARTUENSIS 1 ESTONIAN YIDDISH AND ITS CONTACTS WITH COTERRITORIAL LANGUAGES ANNA VERSCHIK TARTU 2000 DISSERTATIONES LINGUISTICAE UNIVERSITATIS TARTUENSIS DISSERTATIONES LINGUISTICAE UNIVERSITATIS TARTUENSIS 1 ESTONIAN YIDDISH AND ITS CONTACTS WITH COTERRITORIAL LANGUAGES Eesti jidiš ja selle kontaktid Eestis kõneldavate keeltega ANNA VERSCHIK TARTU UNIVERSITY PRESS Department of Estonian and Finno-Ugric Linguistics, Faculty of Philosophy, University o f Tartu, Tartu, Estonia Dissertation is accepted for the commencement of the degree of Doctor of Philosophy (in general linguistics) on December 22, 1999 by the Doctoral Committee of the Department of Estonian and Finno-Ugric Linguistics, Faculty of Philosophy, University of Tartu Supervisor: Prof. Tapani Harviainen (University of Helsinki) Opponents: Professor Neil Jacobs, Ohio State University, USA Dr. Kristiina Ross, assistant director for research, Institute of the Estonian Language, Tallinn Commencement: March 14, 2000 © Anna Verschik, 2000 Tartu Ülikooli Kirjastuse trükikoda Tiigi 78, Tartu 50410 Tellimus nr. 53 ...Yes, Ashkenazi Jews can live without Yiddish but I fail to see what the benefits thereof might be. (May God preserve us from having to live without all the things we could live without). J. Fishman (1985a: 216) [In Estland] gibt es heutzutage unter den Germanisten keinen Forscher, der sich ernst für das Jiddische interesiere, so daß die lokale jiddische Mundart vielleicht verschwinden wird, ohne daß man sie für die Wissen schaftfixiert -
Ilse Lehiste (1922-2010) the Significance of Empirical Evidence in Linguistics* (2012)
Ilse Lehiste (1922-2010) The significance of empirical evidence in linguistics* (2012) Zita McRobbie-Utasi Simon Fraser University [email protected] It is with immense sadness that members of the worldwide linguistics community acknowledge the passing away of Ilse Lehiste, Professor Emeritus at the Department of Linguistics, Ohio State University. Her wide-ranging scholarly legacy has greatly influenced the way linguists view the science of phonetics and its role in speech communication. Her pioneering research in employing and advancing experimental phonetic methods, together with groundbreaking theoretical works on the acoustic properties of speech sounds relevant to language, is to be considered a major contribution to the discipline of linguistics. 1. Introduction Ilse Lehiste was born in Tallinn, Estonia. She received a Doctor of Philosophy degree from the University of Hamburg, Germany, in 1949; subsequently she moved to the United States, where she earned her Ph.D. in linguistics from the University of Michigan in 1959. Dr. Lehiste was a recipient of honorary doctoral degrees from the University of Essex, Great Britain (1977); the University of Lund, Sweden (1982); the University of Tartu, Estonia (1989) and the Doctor of Human Letters honorary degree from the Ohio State University (1999). Prior to her appointment at the Ohio State University in 1963, she taught at the Kansas Wesleyan University (1950-1951), at the Detroit Institute of Technology (1951- 1956) and was a research associate at the University of Michigan’s prestigious Communication Sciences Laboratory (1958-1963). Dr. Lehiste was Chair of the Department of Linguistics, Ohio State University (1965-1971) and (1985-1987), as * I wish to acknowledge posthumously Ilse Lehiste’s comments on an earlier version of this paper. -
A Grammar of the Dom Language a Papuan Language of Papua New Guinea
A Grammar of the Dom Language A Papuan Language of Papua New Guinea TIDA Syuntaroˆ i Table of Contents Acknowledgements xiii Abbreviations xv Maps xvii Chapter 1 Introduction 1 1.1 Geographical and demographic background . 1 1.2 Socio-linguistic setting . 1 1.2.1 Tribes and clans . 3 1.2.2 Names and Naming . 4 1.3 Linguistic background . 5 1.3.1 Genetic relationships . 5 1.3.2 Typological profile . 6 1.3.3 Papuan context . 7 1.4 Previous work . 7 1.5 Present study . 8 Chapter 2 Phonology 9 2.1 Vowels . 9 2.1.1 Minimal pairs . 9 2.1.2 Lengthening . 9 2.1.3 /e/ . 9 2.1.4 [1] and /i/ insertion . 10 2.1.5 /i/ . 11 2.1.6 /o/ . 11 2.1.7 /u/ . 12 2.1.8 /a/ . 12 2.1.9 Sequence of vowels . 12 2.2 Consonants . 13 2.2.1 Minimal pairs . 13 2.2.2 Prenasalisation and gemination . 13 2.2.3 Obstruents . 14 2.2.3.1 /p/ . 14 2.2.3.2 /b/ . 14 2.2.3.3 /k/ . 14 2.2.3.4 /g/ . 14 ii Table of Contents 2.2.3.5 /t/ . 15 2.2.3.6 /d/ . 15 2.2.3.7 /s/ . 15 2.2.3.8 /r/ . 15 2.2.3.9 /l/ and /L/........................... 16 2.2.3.10 /s/, /t/ and /l/ . 17 2.2.3.11 /c/ and /j/ . 18 2.2.4 Nasals . 19 2.2.4.1 /n/ . 19 2.2.4.2 /m/ . -
Language Dispersal Beyond Farming
Language Dispersal Beyond Farming edited by Martine Robbeets and Alexander Savelyev John Benjamins Publishing Company Language Dispersal Beyond Farming Language Dispersal Beyond Farming Edited by Martine Robbeets Alexander Savelyev Max Planck Institute for the Science of Human History, Jena John Benjamins Publishing Company Amsterdam / Philadelphia TM The paper used in this publication meets the minimum requirements of 8 the American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ansi z39.48-1984. ./z. Cataloging-in-Publication Data available from Library of Congress () (-) © –John Benjamins B.V. The electronic edition of this book is Open Access under a CC BY-NC-ND 4.0 license. https://creativecommons.org/licenses/by-nc-nd/4.0 This license permits reuse, distribution and reproduction in any medium for non-commercial purposes, provided that the original author(s) and source are credited. Derivative works may not be distributed without prior permission. This work may contain content reproduced under license from third parties. Permission to reproduce this third-party content must be obtained from these third parties directly. Permission for any reuse beyond the scope of this license must be obtained from John Ben- jamins Publishing Company, [email protected] John Benjamins Publishing Company · https://benjamins.com Table of contents List of tables vii List of figures ix List of contributors xi Acknowledgements xiii Chapter 1 Farming/Language Dispersal: Food for thought 1 Martine -
Grammatical Case in Estonian
Grammatical Case in Estonian Merilin Miljan A thesis submitted in fulfilment of requirements for the degree of Doctor of Philosophy to School of Philosophy, Psychology and Language Sciences, University of Edinburgh September 2008 Declaration I hereby declare that this thesis is of my own composition, and that it contains no material previously submitted for the award of any other degree. The work reported in this thesis has been executed by myself, except where due acknowledgement is made in the text. Merilin Miljan ii Abstract The aim of this thesis is to show that standard approaches to grammatical case fail to provide an explanatory account of such cases in Estonian. In Estonian, grammatical cases form a complex system of semantic contrasts, with the case-marking on nouns alternating with each other in certain constructions, even though the apparent grammatical functions of the noun phrases themselves are not changed. This thesis demonstrates that such alternations, and the differences in interpretation which they induce, are context dependent. This means that the semantic contrasts which the alternating grammatical cases express are available in some linguistic contexts and not in others, being dependent, among other factors, on the semantics of the case- marked noun and the semantics of the verb it occurs with. Hence, traditional approaches which treat grammatical case as markers of syntactic dependencies and account for associated semantic interpretations by matching cases directly to semantics not only fall short in predicting the distribution of cases in Estonian but also result in over-analysis due to the static nature of the theories which the standard approach to case marking comprises. -
17Th Century Estonian Orthography Reform, the Teaching of Reading and the History of Ideas
TRAMES, 2011, 15(65/60), 4, 365–384 17TH CENTURY ESTONIAN ORTHOGRAPHY REFORM, THE TEACHING OF READING AND THE HISTORY OF IDEAS Aivar Põldvee Institute of the Estonian Language, Tallinn, and Tallinn University Abstract. Literary languages can be divided into those which are more transparent or less transparent, based on phoneme-grapheme correspondence. The Estonian language falls in the category of more transparent languages; however, its development could have pro- ceeded in another direction. The standards of the Estonian literary language were set in the first half of the 17th century by German clergymen, following the example of German orthography, resulting in a gap between the ‘language of the church’ and the vernacular, as well as a discrepancy between writing and pronunciation. The German-type orthography was suited for Germans to read, but was not transparent for Estonians and created difficulties with the teaching of reading, which arose to the agenda in the 1680s. As a solution, Bengt Gottfried Forselius offered phonics instead of an alphabetic method, as well as a more phonetic and regular orthography. The old European written languages faced a similar problem in the 16th–17th centuries; for instance, Valentin Ickelsamer in Germany, John Hart in England, the grammarians of Port-Royal in France, and Comenius and others suggested using the phonic method and a more phonetic orthography. This article explores 17th century Estonian orthography reform and the reasons why it was realized as opposed to European analogues. Keywords: 17th century, spelling systems, literacy, phonics, Estonian language DOI: 10.3176/tr.2011.4.03 1. The problem In the Early Modern era, several universal processes took place in the history of European languages.