A Comprehensive NLP System for Modern Standard Arabic and Modern Hebrew

Morphological analysis, lemmatization, vocalization, disambiguation and text-to-speech

Dror Kamir, Naama Soreq, Yoni Neeman
Melingo Ltd., 16 Totseret Haaretz st., Tel-Aviv, Israel

Abstract

This paper presents a comprehensive NLP system by Melingo that has been recently developed for Arabic, based on Morfix™ – an operational formerly developed highly successful comprehensive Hebrew NLP system.

The system discussed includes modules for morphological analysis, context sensitive lemmatization, vocalization, text-to-phoneme conversion, and syntactic-analysis-based prosody (intonation) model. It is employed in applications such as full text search, information retrieval, text categorization, textual data mining, online contextual dictionaries, filtering, and text-to-speech applications in the fields of telephony and accessibility and could serve as a handy accessory for non-fluent Arabic or Hebrew speakers.

Modern Hebrew and Modern Standard Arabic share some unique Semitic linguistic characteristics. Yet up to now, the two languages have been handled separately in Natural Language Processing circles, both on the academic and on the applicative levels. This paper reviews the major similarities and the minor dissimilarities between Modern Hebrew and Modern Standard Arabic from the NLP standpoint, and emphasizes the benefit of developing and maintaining a unified system for both languages.

1 Introduction

1.1 The common Semitic basis from an NLP standpoint

Modern Standard Arabic (MSA) and Modern Hebrew (MH) share the basic Semitic traits: rich morphology, based on consonantal roots (Jiðr / Šoreš)¹, which depends on vowel changes and in some cases consonantal insertions and deletions to create inflections and derivations.²

For example, in MSA: the consonantal root /ktb/ combined with the vocalic pattern CaCaCa derives the verb kataba ‘to write’. This derivation is further inflected into forms that indicate semantic features, such as number, gender, tense etc.: katab-tu ‘I wrote’, katab-ta ‘you (sing. masc.) wrote’, katab-ti ‘you (sing. fem.) wrote’, ?a-ktubu ‘I write/will write’, etc.

Similarly in MH: the consonantal root /ktv/ combined with the vocalic pattern CaCaC derives the verb katav ‘to write’, and its inflections are: katav-ti ‘I wrote’, katav-ta ‘you (sing. masc.) wrote’, katav-t ‘you (sing. fem.) wrote’, e-xtov ‘I will write’ etc.

In fact, morphological similarity extends much further than this general observation, and includes very specific similarities in terms of the NLP systems, such as usage of nominal forms to mark tenses and moods of verbs; usage of pronominal enclitics to convey direct objects, and usage of proclitics to convey some prepositions. Moreover, the inflectional patterns and clitics are quite similar in form in most cases. Both languages exhibit construct formation (Iđa:fa / Smixut), which is similar in its structure and in its role. The suffix marking feminine gender is also similar, and similarity goes as far as peculiarities in the numbering system, where the female gender suffix marks the masculine. Some of these phenomena will be demonstrated below.

¹ A remark about the notation: Phonetic transcriptions always appear in Italics, and follow the IPA convention, except the following: ? – glottal stop, ¿ – voiced pharyngeal fricative (‘Ayn), đ – velarized d, ś – velarized s. Orthographic transliterations appear in curly brackets. Bound morphemes (affixes, clitics, consonantal roots) are written between two slashes. Arabic and Hebrew linguistic terms are written in phonetic spelling beginning with a capital letter. The Arabic term comes first.
² For a review on the different approaches to Semitic inflections see Beesley (2001), p. 2.

1.2 Lemmatization of Semitic Languages

A consistent definition of lemma is crucial for a data retrieval system. A lemma can be said to be the equivalent to a lexical entry: the basic grammatical unit of natural language that is semantically closed. In applications such as search engines, usually it is the lemma that is sought, while additional information including tense, number, and person are dispensable.

In MSA and MH a lemma is actually the common denominator of a set of forms (hundreds or thousands of forms in each set) that share the same meaning and some morphological and syntactic features. Thus, in MSA, the forms: ?awla:d, walada:ni, despite their remarkable difference in appearance, share the same lemma WALAD ‘a boy’. This is even more noticeable in verbs, where forms like kataba, yaktubu, kutiba, yuktabu, kita:ba and many more are all part of the same lemma: KATABA ‘to write’.

The rather large number of inflections and complex forms (forms that include clitics, see below 1.5) possible for each lemma results in a high total number of forms, which, in fact, is estimated to be the same for both languages: around 70 million³. The mapping of these forms into lemmas is inconclusive (See Dichy (2001), p. 24). Hence the question rises: what should be defined as lemma in MSA and MH.

The fact that MSA and MH morphology is root-based might promote the notion of identifying the lemma with the root. But this solution is not satisfactory: in most cases there is indeed a diachronic relation in meaning among words and forms of the same consonantal root. However, semantic shifts which occur over the years rule out this method in synchronic analysis. Moreover, some diachronic processes result in totally coincidental “sharing” of a root by two or more completely different semantic domains. For example, in MSA, the words fajr ‘dawn’ and infija:r ‘explosion’ share the same root /fjr/ (the latter might have originally been a metaphor). Similarly, in MH the verbs pasal ‘to ban, disqualify’ and pisel ‘to sculpture’ share the same root /psl/ (the former is an old loan from Aramaic).

In Morfix, as described below (2.1), a lemma is defined not as the root, but as the manifestation of this root, most commonly as the lesser marked form of a noun, adjective or verb. There is no escape from some arbitrariness in the implementation of this definition, due to the fine line between inflectional morphology and derivational morphology. However, Morfix generally follows the tradition set by dictionaries, especially bilingual dictionaries. Thus, for example, difference in part of speech entails different lemmas, even if the morphological process is partially predictable. Similarly each verb pattern (Wazn / Binyan) is treated as a different lemma.

Even so, the roots should not be overlooked, as they are a good basis for forming groups of lemmas; in other words, the root can often serve as a “super-lemma”, joining together several lemmas, provided they all share a semantic field.

³ For Arabic - see Beesley (2001), p. 7. For Hebrew - our own sources.

1.3 The Issue of Nominal Inflections of Verbs

The inconclusive selection of lemmas in MSA and MH can be demonstrated by looking into an interesting phenomenon: the nominal inflections of verbs (roughly parallel to the Latin participle, see below). Since this issue is a good example both for a characteristic of Semitic NLP and for the similarities between MSA and MH, it is worthwhile to further elaborate on it.

Both MSA and MH use the nominal inflections of verbs to convey tenses, moods and aspects. These inflections are derived directly from the verb according to strict rules, and their forms are predictable in most cases. Nonetheless, grammatically, these forms behave as nouns or adjectives. This means that they bear case marking in MSA, nominal marking for number and gender (in both languages) and they can be definite or indefinite (in both languages). Moreover, these inflections often serve as nouns or adjectives in their own right. This, in fact, causes the crucial problem for data retrieval, since the system has to determine whether the user refers to the noun/adjective or rather to the verb for which it serves as inflection.

Nominal inflections of verbs exist in non-Semitic languages as well; in most European languages participles and infinitives have nominal features. However, two Semitic traits make this phenomenon more challenging in our case – the rich morphology which creates a large set of inflections for each base form (i.e. the verb is inflected to create nominal forms and then each form is inflected again for case, gender and number). Furthermore, Semitic languages allow nominal clauses, namely verbless sentences, which increase the ambiguity. For example, in English it is easy to recognize the form ‘drunk’ in ‘he has drunk’ as [...]

It is easy to see the additional difficulty that this writing convention presents for NLP. The string {yktb} in MSA can be interpreted as yaktubu (future tense), yaktuba (subjunctive), yaktub (jussive), yuktabu (future tense passive) and even yuktibu ‘he dictates/will dictate’ a form that is considered by Morfix to be a different lemma altogether (see above 1.2). Furthermore, ambiguity can occur between totally unrelated words, as will be shown in section 1.7. A trained MSA reader can distinguish between these forms by using contextual cues (both syntactic and semantic). A similar contextual sensitivity must be programmed into the NLP system in order to meet this challenge.

Each language also has some orthographic peculiarities of its own. The most striking in MH is the multiple spelling conventions that are used simultaneously. The classical convention has been replaced in most texts with some kind of spelling system that partially indicates vowels, and thus reduces ambiguities. An NLP system has to take into account the various spelling systems and the fact that the classic convention is still occasionally used.
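
The root-and-pattern derivation (root /ktb/ with pattern CaCaCa) and the ambiguity of the unvocalized string {yktb} can be illustrated with a small Python sketch. This is a toy illustration under stated assumptions, not a description of how Morfix works: the function names `apply_pattern`, `strip_vowels` and `analyses`, the miniature lexicon, and the placeholder label `OTHER_LEMMA` (the text only says that yuktibu belongs to a different lemma) are all hypothetical. Dropping the letters a, i and u is only a rough stand-in for unvocalized spelling and works here because none of the quoted forms contains a long vowel.

```python
# Toy lexicon: vocalized form -> (lemma, gloss).
# Forms and glosses are those quoted in the text; "OTHER_LEMMA" is a
# placeholder, since the text does not name the lemma of yuktibu.
LEXICON = {
    "yaktubu": ("KATABA", "future tense"),
    "yaktuba": ("KATABA", "subjunctive"),
    "yaktub": ("KATABA", "jussive"),
    "yuktabu": ("KATABA", "future tense passive"),
    "yuktibu": ("OTHER_LEMMA", "he dictates/will dictate"),
    "kataba": ("KATABA", "to write"),
}

SHORT_VOWELS = set("aiu")


def apply_pattern(root, pattern):
    """Fill each 'C' slot of a vocalic pattern with the next root consonant."""
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in pattern)


def strip_vowels(form):
    """Approximate the unvocalized spelling by dropping short vowels."""
    return "".join(ch for ch in form if ch not in SHORT_VOWELS)


def analyses(unvocalized):
    """Return every lexicon entry whose consonant skeleton matches the input."""
    return [
        (form, lemma, gloss)
        for form, (lemma, gloss) in LEXICON.items()
        if strip_vowels(form) == unvocalized
    ]


if __name__ == "__main__":
    print(apply_pattern("ktb", "CaCaCa"))  # MSA root and pattern from 1.1
    print(apply_pattern("ktv", "CaCaC"))   # MH root and pattern from 1.1
    for form, lemma, gloss in analyses("yktb"):
        print(f"{form:8} {lemma:12} {gloss}")
```

In this sketch a single written string maps to five vocalized candidates spread over two lemmas, so choosing among them requires the kind of contextual (syntactic and semantic) information the text describes; the lookup itself cannot resolve the ambiguity.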