Transfer Learning for Less-Resourced Semitic Languages Speech Recognition: the Case of Amharic
Recommended publications
Language of the Old Testament: Biblical Hebrew “The Holy Tongue”
E-ISSN 2281-4612, ISSN 2281-3993. Academic Journal of Interdisciplinary Studies, Vol 4 No 1, March 2015. MCSER Publishing, Rome-Italy. Language of the Old Testament: Biblical Hebrew “The Holy Tongue”. Associate Professor Luke Emeka Ugwueye, Department of Religion & Human Relations, Faculty of Arts, Nnamdi Azikiwe University, PMB 5025, Awka, Anambra State, Nigeria. Phone: 08067674763. Doi:10.5901/ajis.2015.v4n1p129 Abstract Some kind of familiarity with the structure and thought pattern of biblical Hebrew language enhances translation and improved ways of working with the language needed by students of Old Testament. That what the authors of the Scripture say also has meaning for us today is not in doubt but they did not express themselves primarily for us or in our language, and so it requires training on our part to understand them in their own language. The features of biblical Hebrew as combined in the language’s use of imagery and picturesque description of things are of huge assistance in this training exercise for a better operational knowledge of the language and meaning of Hebrew Scripture. Keywords: Language, Old Testament, Biblical Hebrew, Holy Tongue 1. Introduction Hebrew language is the language of the culture, religion and civilization of the Jewish people since ancient times. It belongs to the northwest ancient Semitic family of languages. The word Semitic, according to Kitchen (1992) is formed from the name Shem, Noah’s eldest son (Genesis 5:32). It is an adjective derived from ‘Shem’ meaning a member of any of the group of people speaking Akkadian, Phoenician, Punic, Aramaic, and especially Hebrew, Modern Hebrew and Arabic language.
Saudi Dialects: Are They Endangered?
Academic Research Publishing Group. English Literature and Language Review, ISSN(e): 2412-1703, ISSN(p): 2413-8827, Vol. 2, No. 12, pp: 131-141, 2016. URL: http://arpgweb.com/?ic=journal&journal=9&info=aims Saudi Dialects: Are They Endangered? Salih Alzahrani, Taif University, Saudi Arabia. Abstract: Krauss, among others, claims that languages will face death in the coming centuries (Krauss, 1992). Austin (2010a) lists 7,000 languages as existing and spoken in the world today. Krauss estimates that this figure could come down to 600. That is, most of the world's languages are endangered. An endangered language is thus a language that loses its speakers within a few generations. According to Dorian (1981), there is what is called a “tip” in language endangerment. Dorian argues that a language's decline can start slowly but then suddenly accelerate towards extinction. Thus, languages must be protected at a much earlier stage. Arabic dialects such as Zahrani Spoken Arabic (ZSA) and Faifi Spoken Arabic (henceforth, FSA), which are spoken in the southern region of Saudi Arabia, have not been studied yet. Few people speak these dialects, among many other dialects in the same region. However, the problem is that most of these dialects' native speakers are moving to other regions in Saudi Arabia where they use other, different dialects. Therefore, are these dialects endangered? What other factors may cause their endangerment? Have they been documented before? What shall we do? This paper discusses three main points regarding this issue: language and endangerment, language documentation and description, and the Arabic language and its family, giving a brief history of Saudi dialects and comparing their situation with that of the existing dialects as a whole.
Modeling Language Variation and Universals: a Survey on Typological Linguistics for Natural Language Processing
Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing. Edoardo Ponti, Helen O’Horan, Yevgeni Berzak, Ivan Vulić, Roi Reichart, Thierry Poibeau, Ekaterina Shutova, Anna Korhonen.
To cite this version: Edoardo Ponti, Helen O’Horan, Yevgeni Berzak, Ivan Vulić, Roi Reichart, et al. Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing. 2018. hal-01856176
HAL Id: hal-01856176, https://hal.archives-ouvertes.fr/hal-01856176. Preprint submitted on 9 Aug 2018.
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Authors and affiliations: Edoardo Maria Ponti (LTL, University of Cambridge); Helen O’Horan (LTL, University of Cambridge); Yevgeni Berzak (Department of Brain and Cognitive Sciences, MIT); Ivan Vulić (LTL, University of Cambridge); Roi Reichart (Faculty of Industrial Engineering and Management, Technion - IIT); Thierry Poibeau (LATTICE Lab, CNRS and ENS/PSL and Univ. Sorbonne nouvelle/USPC); Ekaterina Shutova (ILLC, University of Amsterdam); Anna Korhonen (LTL, University of Cambridge).
Understanding cross-lingual variation is essential for the development of effective multilingual natural language processing (NLP) applications.
A Comprehensive NLP System for Modern Standard Arabic and Modern Hebrew
A Comprehensive NLP System for Modern Standard Arabic and Modern Hebrew: Morphological analysis, lemmatization, vocalization, disambiguation and text-to-speech. Dror Kamir, Naama Soreq, Yoni Neeman. Melingo Ltd., 16 Totseret Haaretz st., Tel-Aviv, Israel.
Abstract This paper presents a comprehensive NLP system by Melingo that has been recently developed for Arabic, based on Morfix™ – an operational formerly developed highly successful comprehensive Hebrew NLP system. The system discussed includes modules for morphological analysis, context sensitive lemmatization, vocalization, text-to-phoneme conversion, and syntactic-analysis-based prosody (intonation) model. It is employed in applications such as full text search, information retrieval, text categorization, textual data mining, online contextual dictionaries, filtering, and text-to-speech applications in the fields of telephony and accessibility and could serve as a handy accessory for non-fluent
1 Introduction 1.1 The common Semitic basis from an NLP standpoint Modern Standard Arabic (MSA) and Modern Hebrew (MH) share the basic Semitic traits: rich morphology, based on consonantal roots (Jiðr / Šoreš), which depends on vowel changes and in some cases consonantal insertions and deletions to create inflections and derivations. For example, in MSA: the consonantal root /ktb/ combined with the vocalic pattern CaCaCa derives the verb kataba ‘to write’. This derivation is further inflected into forms that indicate semantic features, such as number, gender, tense etc.: katab-tu ‘I wrote’, katab-ta ‘you (sing. masc.) wrote’, katab-ti ‘you (sing. fem.) wrote’, ʔa-ktubu ‘I write/will write’, etc.
An Introduction to the Relevance of and a Methodology for a Study of the Proper Names of the Book of Mormon
An Introduction to the Relevance of and a Methodology for a Study of the Proper Names of the Book of Mormon. Paul Y. Hoskisson. Since the appearance of the Book of Mormon in 1830, its proper names have been discussed in diverse articles and books.1 Most of the statements proffer etymologies, while a few suggest the significance of various names. Because of the uneven quality of these statements this paper proposes an apposite methodology. First, though, a few words need to be said about the relevance of name studies to our understanding of the Book of Mormon. Relevance With the exception of a few modern proper names coined for their composite sounds,2 all names have meanings in their language of origin. People are often not aware of these meanings because the name has a private interpretation, or the name has been borrowed into a language in which the original meaning is no longer evident, or the name is very old and the meaning has not been transmitted. For example, the English personal name Wayne is an old form of the more modern English word wain, meaning a “wagon” or “cart,” hence the surname Wainwright, “builder/repairer of “3 However, to our contemporary ears Wayne no longer has a meaning; it is simply a personal name. With training and experience, it is often possible to define the language of origin, the meaning, and, when applicable, the grammatical form of a name. Names like Karen, Tony, and Sasha (also written Sacha from the French spelling) have been borrowed into English from Danish,4 Italian,5 and Russian6 respectively.
Essentials of Language Typology
Lívia Körtvélyessy. Essentials of Language Typology. KOŠICE 2017. © Lívia Körtvélyessy, Department of British and American Studies, Faculty of Arts, UPJŠ in Košice. Reviewers: Doc. PhDr. Edita Kominarecová, PhD.; Doc. Slávka Tomaščíková, PhD. Electronic university textbook for the Faculty of Arts, UPJŠ in Košice. All rights reserved. Neither this work nor any part of it may be reproduced, stored in information systems or otherwise distributed without the consent of the rights holders. The author is responsible for the scholarly and linguistic quality of this publication. The manuscript has undergone editorial and language editing. Language editing: Steve Pepper. Publisher: Univerzita Pavla Jozefa Šafárika v Košiciach. Location: http://unibook.upjs.sk Available from: February 2017. ISBN: 978-80-8152-480-6
Table of Contents
Table of Contents
List of Figures
List of Tables
List of Abbreviations
Preface
CHAPTER 1 What is language typology?
Tasks
Summary
CHAPTER 2 The forerunners of language typology
Rasmus Rask (1787 - 1832)
Franz Bopp (1791 – 1867)
Jacob Grimm (1785 - 1863)
A.W. Schlegel (1767 - 1845) and F. W. Schlegel (1772 - 1829)
Wilhelm von Humboldt (1767 – 1835)
August Schleicher
Neogrammarians (Junggrammatiker)
The name for a new linguistic field
Tasks
Summary
CHAPTER 3 Genealogical classification of languages
Tasks
Summary
CHAPTER 4 Phonological typology
Consonants and vowels
Syllables
Prosodic features
Tasks
Summary
CHAPTER 5 Morphological typology
Morphological classification of languages (holistic
Classical and Modern Standard Arabic Marijn Van Putten University of Leiden
Chapter 3 Classical and Modern Standard Arabic. Marijn van Putten, University of Leiden. The highly archaic Classical Arabic language and its modern iteration Modern Standard Arabic must to a large extent be seen as highly artificial archaizing registers that are the High variety of a diglossic situation. The contact phenomena found in Classical Arabic and Modern Standard Arabic are therefore often the result of imposition. Cases of borrowing are significantly rarer, and mainly found in the lexical sphere of the language. 1 Current state and historical development Classical Arabic (CA) is the highly archaic variety of Arabic that, after its codification by the Arab Grammarians around the beginning of the ninth century, becomes the most dominant written register of Arabic. While forms of Middle Arabic, a style somewhat intermediate between CA and spoken dialects, gain some traction in the Middle Ages, CA remains the most important written register for official, religious and scientific purposes. From the moment of CA’s rise to dominance as a written language, the whole of the Arabic-speaking world can be thought of as having transitioned into a state of diglossia (Ferguson 1959; 1996), where CA takes up the High register and the spoken dialects the Low register. Representation in writing of these spoken dialects is (almost) completely absent in the written record for much of the Middle Ages. Eventually, CA came to be largely replaced for administrative purposes by Ottoman Turkish, and at the beginning of the nineteenth century, it was functionally limited to religious domains (Glaß 2011: 836).
The Modern South Arabian Languages
HETZRON, R. (ed.). 1997. The Semitic Languages. London: Routledge, p. 378-423. The Modern South Arabian Languages. Marie-Claude SIMEONE-SENELLE, CNRS - LLACAN, Meudon, France. 0. INTRODUCTION 0.1. In the South of the Arabian Peninsula, in the Republic of the Yemen and in the Sultanate of Oman, live some 200,000 Arabs whose maternal language is not Arabic but one of the so-called Modern South Arabian Languages (MSAL). This designation is very inconvenient because of the consequent ambiguity, but a more appropriate solution has not been found so far. Although there exists a very close relationship with other languages of the same Western South Semitic group, the MSAL are different enough from Arabic to make intercomprehension impossible between speakers of any of the MSAL and Arabic speakers. The MSAL exhibit many common features also with the Semitic languages of Ethiopia; their relationships with Epigraphic South Arabian (Sayhadic Languages, according to Beeston) remain a point of discussion. There are six MSAL: Mehri (=M), Ḥarsūsi (=H), Baṭḥari (=B), Hobyōt (=Hb), Jibbāli (=J), Soqoṭri (=S). As regards the number of speakers and the geographical extension, Mehri is the main language. It is spoken by the Mahra tribes (about 100,000 speakers) and some Beyt Kathir, in the mountains of Dhofar in Oman, and in the Yemen, in the far eastern Governorate, on the coast, between the border of Oman and the eastern bank of Wadi Masilah, and not in the Mukalla area, contrary to Johnstone's statement (1975:2); in the North-West of the Yemen, Mehri is spoken as far as Thamud, on the border of the Rubʿ al-Khali.
Languages of New York State Is Designed As a Resource for All Education Professionals, but with Particular Consideration to Those Who Work with Bilingual Students
THE LANGUAGES OF NEW YORK STATE: A CUNY-NYSIEB GUIDE FOR EDUCATORS. LUISANGELYN MOLINA, GRADE 9. ALEXANDER FUNK. This guide was developed by CUNY-NYSIEB, a collaborative project of the Research Institute for the Study of Language in Urban Society (RISLUS) and the Ph.D. Program in Urban Education at the Graduate Center, The City University of New York, and funded by the New York State Education Department. The guide was written under the direction of CUNY-NYSIEB's Project Director, Nelson Flores, and the Principal Investigators of the project: Ricardo Otheguy, Ofelia García and Kate Menken. For more information about CUNY-NYSIEB, visit www.cuny-nysieb.org. Published in 2012 by CUNY-NYSIEB, The Graduate Center, The City University of New York, 365 Fifth Avenue, NY, NY 10016. ABOUT THE AUTHOR Alexander Funk has a Bachelor of Arts in music and English from Yale University, and is a doctoral student in linguistics at the CUNY Graduate Center, where his theoretical research focuses on the semantics and syntax of a phenomenon known as ‘non-intersective modification.’ He has taught for several years in the Department of English at Hunter College and the Department of Linguistics and Communications Disorders at Queens College, and has served on the research staff for the Long-Term English Language Learner Project headed by Kate Menken, as well as on the development team for CUNY’s nascent Institute for Language Education in Transcultural Context. Prior to his graduate studies, Mr. Funk worked for nearly a decade in education: as an ESL instructor and teacher trainer in New York City, and as a gym, math and English teacher in Barcelona.
September 2009 Special Edition Language, Culture and Identity in Asia
The Linguistics Journal, September 2009, Special Edition: Language, Culture and Identity in Asia. Editors: Francesco Cavallaro, Andrea Milde, & Peter Sercombe. Published by the Linguistics Journal Press, A Division of Time Taylor International Ltd, Trustnet Chambers, P.O. Box 3444, Road Town, Tortola, British Virgin Islands. http://www.linguistics-journal.com © Linguistics Journal Press 2009. This E-book is in copyright. Subject to statutory exception no reproduction of any part may take place without the written permission of the Linguistics Journal Press. No unauthorized photocopying. All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying or otherwise, without the prior written permission of The Linguistics Journal. Senior Associate Editor: Katalin Egri Ku-Mesu. Journal Production Editor: Benjamin Schmeiser. ISSN 1738-1460
Table of Contents
Foreword by Francesco Cavallaro, Andrea Milde, & Peter Sercombe
1. Will Baker: Language, Culture and Identity through English as a Lingua Franca in Asia: Notes from the Field
2. Ruth M.H. Wong: Identity Change: Overseas Students Returning to Hong Kong
3. Jules Winchester: The Self Concept, Culture and Cultural Identity: An Examination of the Verbal Expression of the Self Concept in an Intercultural Context
4.
Notes on Biblical Hebrew
NOTES ON BIBLICAL HEBREW. JACK FELMAN. HEBREW AS A SEMITIC LANGUAGE Biblical Hebrew is a member of the Semitic family of some seventy languages/dialects spoken in antiquity in Southwest Asia from the Sinai Desert and Arabian Desert in the south, to the Taurus Mountains of Lebanon in the north, the Zagros Mountains of Iran in the east, and the Mediterranean Sea in the west. Hemmed in by these natural barriers they remained a single collective. Later they expanded a bit into North Africa and into the East African Horn. The Semitic languages were all dialects of one large dialect continuum. Over time, however, important centers, usually capital cities, became foci of dialect concentrations and thus ultimately languages developed. These are the five great literary languages: Biblical Hebrew of Jerusalem, Aramaic of Damascus, Akkadian of Babylon and Nineveh, Classical Arabic of Mecca and Classical Ethiopic (Ge'ez) of Axum. Akkadian is often termed East Semitic, Biblical Hebrew and Aramaic Northwest Semitic, and Arabic and Ge'ez Southwest Semitic. Semitic itself is one branch of a much larger superfamily (phylum) of Hamito-Semitic (also known as Afro-Asiatic) of some 150 languages stretching in a band from Egypt through North Africa to Morocco, south to the East African Horn, and southwest in a large area around Lake Chad. Four major branches of languages are noted: Ancient Egyptian including Coptic, Berber, Cushitic, and Chadic. These are all considered sister families of languages related to Semitic. Biblical Hebrew, as noted, is considered a Northwest Semitic language, part of the Canaanite group including Phoenician and Punic, Moabite, Edomite, Ammonite, Amorite and, most importantly, Ugaritic, which shares not only a common linguistic connection but also a common literary culture.
The Arabic Language: a Latin of Modernity? Tomasz Kamusella University of St Andrews
Journal of Nationalism, Memory & Language Politics, Volume 11, Issue 2. DOI 10.1515/jnmlp-2017-0006 The Arabic Language: A Latin of Modernity? Tomasz Kamusella, University of St Andrews. Abstract Standard Arabic is directly derived from the language of the Quran. The Arabic language of the holy book of Islam is seen as the prescriptive benchmark of correctness for the use and standardization of Arabic. As such, this standard language is removed by over a millennium from the vernaculars that Arabic speakers employ nowadays in everyday life. Furthermore, standard Arabic is used for written purposes but very rarely spoken, which implies that there are no native speakers of this language. As a result, no speech community of standard Arabic exists. Depending on the region or state, Arabs (understood here as Arabic speakers) belong to over 20 different vernacular speech communities centered around Arabic dialects. This feature is unique among the so-called “large languages” of the modern world. However, from a historical perspective, it can be likened to the functioning of Latin as the sole (written) language in Western Europe until the Reformation and in Central Europe until the mid-19th century. After the seventh to ninth century, there was no Latin-speaking community, while in day-to-day life, people who employed Latin for written use spoke vernaculars. Afterward these vernaculars replaced Latin in written use also, so that now each recognized European language corresponds to a speech community.