Motivations, Challenges, and Perspectives for the Development of a Deep Learning Based Automatic Speech Recognition System for the Under-Resourced Ngiemboon Language
Total Page:16
File Type:pdf, Size:1020Kb
Motivations, Challenges, and Perspectives for the Development of a Deep Learning based Automatic Speech Recognition System for the Under-resourced Ngiemboon Language Patrice A. Yemmene Laurent Besacier School of Engineering Laboratoire Informatique de Grenoble University of Saint Thomas, MN, USA University of Grenoble, France [email protected] [email protected] Abstract systems for automatic speech recognition were mostly guided by the theory of acoustic-phonetics, which Nowadays, a broad range of speech recognition describes the phonetic elements of speech (the basic technologies (such as Apple Siri and Amazon sounds of the language) and tries to explain how they Alexa) are developed as the user interface has are acoustically realized in a spoken utterance” (Juang become ever convenient and prevalent. Machine learning algorithms are yielding better training and Rabiner, 2005). These efforts date back to the early results to support these developments in 50s. Since then, ASR has yielded incredible Automatic Speech Recognition (ASR). development in a broad range of commercial However, most of these developments have been technologies where Speech Recognition as the user in languages with worldwide, political, economic interface has become ever useful and pervasive. and/or scientific influence such as English, However, most of these developments have been in Japanese, German, French, and Spanish, just to languages with strong scientific, political, and/or name a few. On the other hand, there has been economic influences such as English, German, French, little or no development of ASR systems (or and to some extent Japanese and Spanish, just to name a language technologies) in most minority and few. Historically, most of these languages have always under-resourced languages of the world, especially those spoken in Sub-Sahara Africa. enjoyed social prestige and their extensive vocabulary One of such languages is the Ngiemboon has given them prominence in the world of commerce. language which is the focus of this paper. The It is worth noting that ASR research and innovation in Ngiemboon language is a Grassfield Bantu these languages are significant and continuous. On the language spoken in the West Region of contrary, there has been little or no research and Cameroon (Africa) by about 400,000 people. development efforts in ASR and other Human Language This paper highlights the motivations, challenges Technologies in most minority languages of the world, and perspectives inherent in a work in progress particularly those spoken in Sub-Sahara Africa. Yet, (speech data collection is underway) to build a these languages serve as the main vector for the socio- Deep Learning based Automatic Speech economic development of communities where they are Recognition System for this minority under- resourced Cameroonian local language. This spoken. In this paper, we highlight the motivations, paper introduces the issues critical to conducting challenges, and perspectives that must be considered in research in Speech Processing in this language building Human Language Technologies, more precisely an Automatic Speech Recognition System for 1. Introduction the Ngiemboon language. Automatic Speech Recognition is “the process and the 1.1 Paper objective and contribution related technology for converting the speech signal into A surge of interest in the development of its corresponding sequence of words or other linguistic technologies in African languages is emerging. The entities by means of algorithms implemented in a African Languages in the Field: speech Fundamentals device, a computer, or computer clusters” (Li and and Automation (ALFFA)1 project (spearheaded in O'Shaughnessy, 2003). As an active field of research, France by the “Laboratoire Informatique de Grenoble” Automatic Speech Recognition has told significant stories for a few decades. “Early attempts to design 1 http://alffa.imag.fr/ of the Grenoble Alpes University) is a great example 2.1 Sociolinguistic considerations and has been leading significant efforts in the The nation of Cameroon is home to about 247 local automation of languages spoken in sub-Sahara Africa. languages, two official languages (French and English) Researchers interested in African Languages hope to and Pidgin English (Echu, 2004). In their linguistic contribute to the history of Language Technologies choices, it is estimated that 73% of Cameroonians use innovations as it is being written across the continent. their mother tongue (a local Cameroonian language) The objective of this paper is to contribute to the instead of a foreign language (English and/or French), research on the development of Language Technologies despite the peaceful coexistence of these Indo-European in African Languages. As a pioneer research project on languages with Cameroonian languages. This linguistic ASR for the Ngiemboon language, this work will choice is explained by the fact that Cameroonian local provide a guide for work in Natural Language languages are spoken either in the village of their native Processing (NLP) in minority and under-resourced speakers, their homes, and often used for heritage and language in Cameroon and other Sub-Sahara African cultural identification (Ngefac, 2010). In this diverse languages. linguistic landscape, many industries and fields 1.2 The choice of the language for ASR currently access ASR only in the high-resourced languages of French and English where “presently ASR The authors of this paper both share a very systems find a wide variety of applications in the strong interest in the automation of minority and under- following domains; Medical Assistance; Industrial resourced African languages. In fact, under different Robotics; Forensic and Law enforcement; Defense & circumstances, each of them carried out research on Aviation, Telecommunications Industry; Home some of these languages and have become aware of the Automation and security Access Control; I.T. and challenges faced when working on the digitization and Consumer Electronics” (Vajpai and Bora, 2016). As automation of minority languages of Africa. One of vital as these might be, they are still a luxury for such challenges is “to bridge the gap between language speakers of Cameroonian local languages, including experts (the speakers themselves) and technology Ngiemboon speakers. Speakers of Ngiemboon, as well experts (system developers). Indeed, it is often almost as speakers of other Cameroonian languages, prefer the impossible to find native speakers with the necessary use of their mother tongue in daily communication technical skills to develop ASR systems in their native (Echu, 2004). What if vital ASR applications were language” (Besacier et al. 2014). It becomes obvious developed in the Ngiemboon language as well? It would that a degree of collaboration between native speakers be an opportunity with great excitement for Ngiemboon and systems developers is essential to addressing this speakers’ economic, social, and community identified challenge. Fortunately, one of the authors of development. this paper is a native speaker of the Ngiemboon language and a trained linguist who has contributed to 2.2 Literacy gap the development of an already published trilingual – Over two decades ago, analysis of the literacy landscape French – English – Ngiemboon dictionary. The in Cameroon reported that “four million Cameroonians availability of a native speaker explains the authors’ above fifteen years of age are illiterate. This includes preference in exploring ASR for the Ngiemboon people who never went to school and those who have language. lapsed back into illiteracy. The Cameroon population is about eleven million people. This is a young population. 2. Motivations About 60 percent of Cameroonians are below twenty- The rationale for ASR research and development in five years old. The accuracy of literacy rate estimates is under-resourced, minority languages spoken in Sub- doubtful and could be higher” (Tadadjeu, 2004). It is Sahara Africa such as the Ngiemboon language is highly likely that the population of Cameroon has grounded in a unique sociolinguistic context, an grown significantly since then. A 2018 US Federal observation of existing literacy gap, a recognition of Government civilian foreign intelligence service report advances in technology, a paradigm shift in human suggested that about 25 million heads were counted in rights priorities and scientific discoveries as well as an Cameroon with a 75 % literacy rate. This estimate understanding of the implications of these for economic assumes that about 6 million individuals or more living and community development. In this section, we in Cameroon were illiterate as early as last year. We do highlight these motivation factors. not have any reason to believe this has changed much during the last few months. In an illiteracy context such as this one, the use of oral communication is preponderate and convenient. Human-Machine giving this linguistic community the opportunity to interaction via voice has great potential for economic exercise one of its fundamental rights, it is also a and community development. fascinating endeavor to develop an ASR system for the Ngiemboon language. 2.3 Economic and community development motivations 2.5 Scientific motivations In recent years, the mobile telephone industry has The Development of a Speech Recognition system in experienced a significant boom, in this part of the world the Ngiemboon language will play a great role