Development of Speech Corpus and Automatic Speech Recognition of Angami

Viyazonuo Terhiija, Priyankoo Sarmah and Samudra Vijaya
Indian Institute of Technology Guwahati, Guwahati
[email protected], [email protected], [email protected]

Abstract—Development of speech technologies for under-resourced languages is important. In this paper, we describe the development of a speech corpus in Angami, an under-resourced language, and the implementation of an automatic speech recognition system. Angami is a tone language belonging to the Tibeto-Burman language family. It is spoken in the state of Nagaland in North-East India. The speech corpus and the speech recognition system were developed for the variety of Angami spoken in Kohima village. In this work, we report the creation of a database of Angami sentences, read by eleven Angami speakers. The outcome of this speech database creation effort (speech data, transcriptions and a pronunciation dictionary) was used in the development of an automatic speech recognition system using Kaldi, a public domain toolkit. The performance of various versions of the system, using different types of acoustic models, is presented and discussed. While the word error rate on training data is under 5%, the error on unseen test data from a new speaker is higher by a factor of 2 to 5, depending on the speaker. The average word error rate of a 'leave one speaker out' cross validation evaluation of the context independent phone model is 17.3%. The results and inferences of a few experiments conducted to discover optimal settings of the system are presented.

Index Terms—Angami, speech database, ASR

I. INTRODUCTION

Nagaland is a small, multi-lingual state in north-east India, located at the foothills of the Himalayan range, as shown in Fig. 1. The state is bounded by the Barail mountain range to the southeast and the Patkai range to the northeast. Nagaland is home to sixteen indigenous tribes, of which fourteen are Naga and the other two are the Kukis and the Kacharis. The residents of Nagaland generally speak multiple languages: English as the official language, Nagamese as a lingua franca, and their respective native languages as a subject in school. There are more than a dozen major (spoken by 10,000 or more people) indigenous languages spoken in the state of Nagaland [1]. The Angami are one of the major Naga communities in Nagaland and are primarily found in the Kohima district [2]. Angami is a tone language of the Tibeto-Burman language family.

Fig. 1. The state of Nagaland in North East India.

With the advancement of telecommunication networks and the availability of inexpensive mobile telephone services, the use of the internet has increased in India. Such interaction with a smartphone would be easier if people could talk to the device instead of typing keywords or navigating complex menus. In this context, machine recognition of spoken language becomes important. Such speech recognition systems have been implemented for some major languages of north-east India [3], [4]. However, no such system exists for any indigenous language of Nagaland. This paper reports the creation of a speech database for standard Angami and its use in the implementation of an Automatic Speech Recognition (ASR) system.

The paper is organised as follows. A brief overview of the Angami language is given in section II. The development of a database of the Angami language spoken by native Angami speakers is described in section III. The details of the implementation of an ASR system are given in section IV. The preliminary results of the experiments conducted with the ASR systems are given in section V. A summary of the work and the conclusions are given in section VI.

(This work is part of an ongoing project titled "Sociolinguistic Study of Phonetic Variations among the Clans and Khels of Two Southern Angami Villages", funded by the Indian Council of Social Science Research (ICSSR), Government of India.)

978-1-7281-2449-0/19/$31.00 ©2019 IEEE

II. AN OVERVIEW OF THE ANGAMI LANGUAGE

Angami (also referred to as Tenyidie, ISO 639-3: njm) [5] is spoken by 152,796 people in Nagaland [1]. Angami is an ethno-cultural as well as a linguistic group.

The Angami community is broadly divided into four groups, namely, the Northern, Southern, Western and Chakro (literally meaning people residing below the highway), based on geography and administrative convenience. Each group consists of ten to twenty small villages, and each village is said to have its own variety. The variety of Angami spoken around Kohima village is considered to be the standard form of the Angami language.

Some of the prominent works on Angami are descriptions of grammar in Standard Angami and the western variety [6]–[11]. Dialectal variations and internal variations of the language based on kinship ties and geography have also been reported [12], [13]. Angami is a register tone language with four lexical tones. Previous works have stated tonal inventories ranging from four to five tones [7]–[11], [13]. In a recent study of tone and vowel interaction in Angami, a pilot study was conducted to determine the tonal inventory [14]. The study found that there is no acoustic difference between T2 and T3 produced by Standard Angami speakers. Hence, T2 and T3 were merged and treated as one entity. Examples of tonal minimal pairs in Angami are shown in Table I, where T1 represents the highest tone and T5 the lowest. Tones in Angami not only have lexical significance but also grammatical functions. However, such tones are quite limited in number. Angami has six vowels, /a, i, ɛ, u, o, ə/. Examples of vowel minimal pairs with the cluster /kɹ/ in the onset position are shown in Table II. There are 40 consonant phonemes in the language.

TABLE I
TONAL MINIMAL PAIR SETS IN STANDARD ANGAMI

  Word  Tone   Gloss
  sɛ    T1     to use
        T2/T3  to erect
        T4     three
        T5     snatch
  pɛ    T1     to incline
        T2/T3  fat/bridge
        T4     shiver
        T5     shoot
  ɹi    T1     to twist
        T2/T3  to hold
        T4     also
        T5     to mix

TABLE II
MINIMAL PAIRS IN ANGAMI

  Word  Meaning
  kɹa   plenty
  kɹɛ   laugh
  kɹi   clingy
  kɹu   flow
  kɹə   nest
  kɹo   allergy

Angami uses the Roman script for writing. The IPA symbols of the forty consonants, along with the corresponding labels (in the Roman script) used in this experiment, are shown in Table III.

TABLE III
THE IPA LABELS OF CONSONANTS OF ANGAMI ALONG WITH THE CORRESPONDING LABELS USED IN THIS WORK

  Label  IPA    Label  IPA
  p      p      ph     pʰ
  b      b      t      t
  th     tʰ     d      d
  k      k      kh     kʰ
  g      g      m      m
  mh     mʰ     n      n
  nh     nʰ     ny     ɲ
  nyh    ɲʰ     ng     ŋ
  f      f      v      v
  s      s      z      z
  sh     ʃ      zh     ʒ
  h      h      pf     pf
  pfh    pfʰ    bv     bv
  ts     ts     tsh    tsʰ
  dz     dz     c      tʃ
  ch     tʃʰ    j      dʒ
  l      l      lh     lʰ
  r      ɹ      rh     ɹʰ
  w      w      wh     wʰ
  y      j      yh     jʰ

A. Standard Angami and variations

The Angami community can be categorized into four groups, as mentioned in section II. Several villages constitute each group. It is a folk belief of the Angamis that each village has its own variety, and the identity of a native speaker is the variety (s)he speaks. Due to the variations in Angami speech, Rev. J. E. Tanquist (one of the earliest American missionaries) convened a meeting of the elders of the community in 1939 and formed the Angami Literature Committee to establish a common pattern of writing and a spelling system [15]. This gave birth to standard Angami, which is also known as Tenyidie. In 1971, the literature committee changed its name to the Ura Academy. It acts as a catalyst of the development of the language and of the socio-cultural aspects of the community. With the development of the language, Standard Angami was introduced in the school and university curriculum. Previous linguistic studies on Angami, including descriptions of grammar, were based on data collected from the Khonoma, Mezoma and Jotsoma dialects, spoken in the western region of Kohima district [6], [16]. Studies by Giridhar and Kuolie focussed on Standard Angami [8], [10]. Studies of variations in Angami across the Southern Angami villages, and also of the internal variations in Kohima village based on clans, have been conducted [12], [13].
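The Roman-script labels of Table III serve as the phone inventory for the pronunciation dictionary described later. As a minimal illustration (not part of the paper's toolchain; the mapping below covers only a subset of the 40 consonants, and the helper name is ours), a Roman-script word can be split into these labels greedily, longest label first:

```python
# Subset of the Table III label-to-IPA mapping (consonants) plus four of the
# six vowels; orthographic symbols for the remaining two vowels are omitted.
CONSONANTS = {
    "p": "p", "ph": "pʰ", "b": "b", "t": "t", "th": "tʰ", "d": "d",
    "k": "k", "kh": "kʰ", "g": "g", "m": "m", "n": "n", "ng": "ŋ",
    "ts": "ts", "tsh": "tsʰ", "c": "tʃ", "ch": "tʃʰ", "j": "dʒ",
    "s": "s", "sh": "ʃ", "z": "z", "zh": "ʒ", "r": "ɹ", "l": "l",
    "w": "w", "y": "j", "h": "h", "v": "v", "f": "f",
}
VOWELS = {"a": "a", "i": "i", "u": "u", "o": "o"}
LABELS = {**CONSONANTS, **VOWELS}

def to_phones(word):
    """Split a word into phone labels, preferring the longest match
    (e.g. 'tsh' before 'ts' before 't')."""
    phones, i = [], 0
    while i < len(word):
        for n in (3, 2, 1):  # try the longest label first
            chunk = word[i:i + n]
            if chunk in LABELS:
                phones.append(chunk)
                i += len(chunk)
                break
        else:
            raise ValueError("no label matches at: " + word[i:])
    return phones
```

Longest-first matching matters because labels such as 'tsh' contain shorter valid labels as prefixes; a lexicon entry would then pair the word with the resulting label sequence, one word per line.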
III. DEVELOPMENT OF THE ANGAMI SPEECH DATABASE

Here, we describe the steps in the development of the Angami speech database. The first subsection describes the preparation of the text materials that were read by the informants. The subsequent subsections describe the pronunciation dictionary, the geographical areas of data collection, the speech recording procedure, the segmentation and annotation of the speech files, and an overview of the speech database.

A. Text corpus

The sentences to be read were a collection of passages from books, articles and poetry. The distribution of the sources of the text data is shown in Fig. 2. The length of the sentences varied from 5 to 15 words. The text corpus used in this work contains 579 unique sentences comprising 1694 unique words.

Fig. 2. The distribution of the sources of sentences.

B. Pronunciation dictionary

The pronunciation dictionary, in the format specified by the Kaldi toolkit, was manually created. Since the speakers of Angami use the Roman script, the first column of the pronunciation dictionary consists of Angami words in the Roman script. The second column contains the sequence of phonemes corresponding to the word in the first column. The labels used to denote the phonemes of spoken Angami are listed in Table III.

C. Recording of speech data

Data were collected from eleven native speakers of Angami residing in Kohima village, and two speakers residing in Kohima town. The research associates took prior appointments with each of the speakers. The speakers did the recording voluntarily, without any remuneration. All the speakers are multilingual; they speak at least three languages, namely, Angami, English and Nagamese. The speakers have a minimum education of a high school certificate, and thus were able to read the text given to them. The speakers were presented with printed passages, and the investigator requested the speaker to read the given text aloud while the speech was digitally recorded. After recording the speech, the speakers were requested to provide a few personal details such as their age, the languages known to them, and the clan/kinship ties to which they belong.

The data were recorded in ambient conditions, and not in a controlled acoustic environment. A hand-held Tascam linear PCM recorder was used. The characteristics of the digitized speech files were as follows: 44,100 Hz, 24-bit, mono, MS wav format. The speech data were collected in September 2018.

The speech recordings were then transferred to a computer for further processing. A passage, an article or a poem read by a speaker had been recorded in a single speech file. The text corresponding to such speech was divided into sentences or phrases. Each text was numbered serially and stored in an Excel file. Using the Praat 6.0.43 toolkit [18], each sound file was segmented and annotated with the corresponding sentence number in the Excel file. Care was taken to ensure that the speech in a sound file matched the text with the same serial number. Out of the recordings from 13 speakers, two were removed due to high levels of noise and disturbances from the environment. Thus, the speech database used in this work consists of speech recordings from a total of 11 speakers: 6 male and 5 female. The average age of the speakers is 28.

IV. IMPLEMENTATION OF THE SPEECH RECOGNITION SYSTEM

An Automatic Speech Recognition (ASR) system was implemented using the open source Kaldi toolkit [19]. The details of the implementation are described in this section.

A. Feature extraction and model training

We used the default settings of the Kaldi toolkit to implement the first version of the Angami ASR system. Mel Frequency Cepstral Coefficients (MFCC) and their first and second time derivatives were computed from 25 ms long speech frames separated by 10 ms each.

The standard recipe of Kaldi permits training several types of hidden Markov models (HMM) to represent a linguistic unit such as a context independent phone (monophone) or a context dependent phone (triphone). Depending on the type of model or the feature transformations carried out, three types of triphone models can be trained; these are called 'tri1', 'tri2' and 'tri3'. The probability density function associated with a state of an HMM is modeled by a Gaussian Mixture Model (GMM) in the case of these 4 acoustic models. An acoustic model where the emission probabilities are modeled by a subspace GMM is denoted as SGMM. Further, the state dependent likelihoods of feature vectors can be computed from the posterior probabilities estimated by a Deep Neural Network (DNN). The standard recipe allows training and evaluation of all these 6 acoustic models.

The word level bigram language model was trained from the transcriptions of the training data using the IRSTLM toolkit [20].

V. RESULTS AND DISCUSSION

The performance of various versions of the Angami ASR system is presented and discussed in this section. The measure of performance is the Word Error Rate (WER, in %); the lower the value of the WER, the better the system. The word sequence hypothesised by the decoder for a wave file is aligned with the reference transcription (ground truth). The numbers of words inserted, deleted and substituted are computed. This is repeated for each (test) wav file fed to the decoder. The WER is computed as

WER(%) = 100 (I + D + S) / N

where I, D and S are the numbers of insertion, deletion and substitution errors, respectively, and N is the total number of words in the reference transcriptions of the test data set.

Fig. 3. Performance of the Angami ASR system, employing different types of acoustic models, as a function of the number of training iterations.

Fig. 4. Performance of the Angami DNN-HMM based ASR system, with varying numbers of nodes (from 64 to 500) in the 4 hidden layers, as a function of the number of training iterations. The WER increases if the number of nodes per hidden layer is much larger than 100.
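The WER computation defined above can be sketched with a standard dynamic-programming (Levenshtein) word alignment. This is an illustrative re-implementation, not the scoring code of the Kaldi toolkit:

```python
def wer(ref, hyp):
    """Word Error Rate: 100 * (I + D + S) / N, where N = len(ref).
    Computed via Levenshtein alignment of reference and hypothesis
    word lists; assumes a non-empty reference."""
    # d[i][j] = minimum edit cost of aligning ref[:i] with hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                       # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                       # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # match or substitution
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)  # vs. deletion, insertion
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("to use".split(), "to mix".split())` counts one substitution out of two reference words, giving 50%.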
1) Performance on training data: The Angami speech database consists of 579 speech files from 11 speakers. Since the number of files in this preliminary database is small, we used all 579 files for training the ASR system. Subsequently, each of the 579 files was fed to the decoder of the ASR system, and the WER was computed. In other words, we measured the performance of the system in recognizing the speech files with which it was trained. Such an evaluation of the ASR system with respect to seen (train) data was carried out for systems employing 6 different types of acoustic models: mono, tri1, tri2, tri3, SGMM and DNN. The Baum-Welch algorithm used for training the acoustic models is an iterative algorithm. In every iteration, the parameters of the model are tuned, following the maximum likelihood criterion, to represent the training data better. Consequently, the WER is expected to decrease as a function of the increasing number of training iterations.

The word error rates of the 6 versions of the ASR system are shown graphically as a function of the number of training iterations in Fig. 3. As expected, the WER shows a decreasing trend with the number of iterations. The WER of the ASR system that models context independent phones with GMM-HMM ('mono') is 27% after the first iteration. The WER reduces to a single digit after 11 iterations of training. The WER of the systems that model context dependent phones either with GMM-HMM ('tri1', 'tri2', 'tri3') or with subspace-GMM-HMM ('SGMM') is below 5% right from the beginning. These models enjoy the triple benefits of (i) being initialized with trained monophone models, (ii) exploiting phonetic context effects, and (iii) recognising 'seen' speech data. In contrast, the ASR system using DNN-HMM acoustic models has a high WER value initially. The WER decreases with an increasing number of iterations. The WER curve of the DNN-HMM closely follows that of the mono model. A possible explanation for this behaviour is that the deep neural network has four hidden layers with hundreds of nodes in each layer. The tuning of a large number of weight parameters to achieve better phone classification accuracy takes time. We conducted an experiment to investigate the performance of the DNN-HMM based system as a function of the number of parameters.

2) Optimal number of nodes in the hidden layers of the DNN: The default number of hidden layers in the DNN is four. Each hidden layer had 300 nodes to begin with; thus, there are 300² weights on the connections between adjacent hidden layers. In view of the high WER values associated with the DNN-HMM ASR system, we explored the possibility of reducing the WER by discovering the optimum number of nodes in a hidden layer. In a similar experiment, increasing the number of nodes in the hidden layer from 64 to 1024 decreased the WER of a DNN-HMM system [21]. So, we increased the number of nodes from 300 to 500. Unfortunately, the WER increased. However, the WER decreased when the number of nodes was reduced to 150. Decreasing the number further, down to 64, did not have a significant effect on the WER. Thus, about 100 nodes per hidden layer of the DNN seems to be optimal with respect to the small database of 579 speech files. This was found to be true for varying numbers of iterations of training the DNN.

Fig. 4 shows the WER of the DNN-HMM based system, trained with all 579 speech files and tested with the same 579 speech files, as a function of the number of iterations of training the DNN. The WER is consistently high when the number of nodes in each hidden layer is 500. The WERs are nearly equal when this number is either 150 or 64. The WER values of the DNN-HMM models in Fig. 3 refer to the DNN-HMM model with 64 nodes in each hidden layer.

3) Performance on test data: In the previous subsection, we presented and discussed the performance of the ASR systems with respect to training data. The WERs of the systems on test ('unseen') data are expected to be higher, especially in view of the small amount of speech data available for training.

The complete Angami speech database consists of 579 speech files spoken by 11 speakers. The number of files per speaker varies from 24 to 100. So, we conducted a 'leave-one-speaker-out' cross validation experiment to assess the efficacy of the trained models on 'unseen' data from a speaker whose characteristics are not known to the system. In the first experiment, we kept aside all the speech files from the first speaker as the test data, and trained the system with the speech files from the remaining 10 speakers. The WER of the system when fed with all the files from the first speaker was computed. This evaluation procedure was repeated for each of the remaining 10 speakers.

Since the set of sentences read by a speaker is mutually exclusive from the set of sentences read by the remaining speakers, we trained the bigram language model from the combined transcriptions of the test and train files. Otherwise, some words spoken by the test speaker might not be part of the training vocabulary. In that case, the grammar network trained with the transcriptions of the training data would not contain such 'unseen' words. Such 'out-of-vocabulary' words would be mis-recognised as one or more words of the training vocabulary. This would artificially increase the WER value.

The WER of the ASR systems on 'unseen' test data was found to be larger for the models of context dependent phones in comparison with the 'mono' model. So, here we report the WER of only the 'mono' ASR system. The mono system was trained with 20 iterations. The WER for the unseen test data varied from 6.8% to 35.7% in the 11 rounds of evaluation corresponding to the 11 speakers taken as the test speaker. The average WER was 17.3%. There was no apparent correlation of the WER with attributes such as the gender or the number of test files belonging to the test speaker.

On inspection, it was observed that speakers sometimes uttered extra words in addition to the text provided. The performance is likely to improve if such speech disfluencies are detected and incorporated in the reference transcriptions. Moreover, there are a large number of parameters, such as the number of shared HMM states (senones), that can be tuned to achieve optimal performance with respect to unseen test data. However, it may be better to train the system with a larger amount of speech data from more speakers, and then tune the parameters of the system for the best performance on test data.

VI. SUMMARY AND CONCLUSIONS

An automatic speech recognition system was implemented for Standard Angami, an under-resourced language spoken in the Nagaland state of North-East India. The system was trained with sentences read by native Angami speakers. The performance of the ASR system was presented and discussed in the light of the small size of the speech database. Collection of a larger database from more speakers is needed to improve the performance of the system.

REFERENCES

[1] Office of the Registrar General & Census Commissioner, India, "Statement-1 Part-B: Languages not specified in the eighth schedule (non-scheduled languages)". Online: http://www.censusindia.gov.in/2011Census/Language-2011/Statement-1.pdf.
[2] Eberhard, David M., Gary F. Simons, and Charles D. Fennig (eds.), "Ethnologue: Languages of the World", twenty-second edition, Dallas, Texas: SIL International. Online version: http://www.ethnologue.com/map/IN 05.2019.
[3] Abhishek Dey, Wendy Lalhminghlui, Priyankoo Sarmah, K. Samudravijaya, S. R. Mahadeva Prasanna, Rohit Sinha and S. R. Nirmala, "Mizo Phone Recognition System", Proc. IEEE INDICON 2017, IIT Roorkee, 15-17 December, 2017.
[4] Barsha Deka, Nirmala S. R. and Samudravijaya K., "Development of Assamese Continuous Speech Recognition System", Proc. of the 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages, pp. 215-219, 29-31 August, Gurugram, India, 2018.
[5] "Naga, Angami: a language of India", SIL International. Online version: https://www.ethnologue.com/language/njm, 2018.
[6] Grierson, George A., "Tibeto-Burman Family: Specimens of the Bodo, Naga, and Kachin Groups, Volume III Part II of Linguistic Survey of India", Office of the Superintendent of Government Printing, Calcutta, 1903.
[7] Burling, Robbins, "Angami phonemics and word list", Indian Linguistics, vol. 21, pp. 51–60, 1960.
[8] Giridhar, Puttushetra Puttuswamy, "Angami grammar", Central Institute of Indian Languages, 1980.
[9] Ravindran, N., "Angami phonetic reader", Central Institute of Indian Languages, 1984.
[10] Kuolie, D., "Structural description of Tenyidie: a Tibeto-Burman language of Nagaland", Ura Academy, Publication Division, 2006.
[11] Chase, N., "A descriptive analysis of the Khwunomia dialect of Angami", Ph.D. dissertation, University of Poona, 1992.
[12] Terhiija, Viyazonuo, Sarmah, Priyankoo and Vijaya, Samudra, "Acoustic Analysis of Vowels in Two Southern Angami Dialects", Oriental COCOSDA 2018, 7-8 May, Miyazaki, Japan, 2018.
[13] Suokhrie, Kelhouvinuo, "Clans and clanlectal contact", Asia-Pacific Language Variation, vol. 2, pp. 188-214, 2017.
[14] Lalhminghlui, Wendy, Terhiija, Viyazonuo and Sarmah, Priyankoo, "Vowel-Tone Interaction in Two Tibeto-Burman Languages", INTERSPEECH 2019, 15-19 September, Graz, Austria, 2019.
[15] Liezietsu, Shiirhozelie, "Ura Academy and the evolution of Tenyidie", in Angami Society at the beginning of the 21st Century, edited by Kikhi, K. et al., Akansha Publishing House, 2009.
[16] McCabe, Robert Blair, "Outline Grammar of the Angami Naga Language", Calcutta: Superintendent of Government Printing, 1887.
[17] Crystal, David, "A dictionary of linguistics and phonetics", 6th edition, Blackwell Publishing, 2008.
[18] Boersma, Paul and Weenink, David, "Praat: A system for doing phonetics by computer", 1992.
[19] D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, and K. Vesely, "The Kaldi Speech Recognition Toolkit", IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, IEEE Signal Processing Society, Dec. 2011. Online: https://publications.idiap.ch/downloads/papers/2012/Povey ASRU2011 2011.pdf
[20] Marcello Federico, Nicola Bertoldi and Mauro Cettolo, "IRSTLM: An open source toolkit for handling large scale language models", Proceedings of Interspeech, pp. 1618-1621, 2008.
[21] Barsha Deka, Priyankoo Sarmah and Samudra Vijaya, "Assamese Database and Speech Recognition", unpublished.