General Purpose Methodology and Tooling for Text-To-Speech Support in Voice Services for Under-Resourced Languages
Total Page:16
File Type:pdf, Size:1020Kb
General purpose methodology and tooling for Text-to-Speech support in voice services for under-resourced languages Justyna Kl˛eczar Vrije Universiteit Amsterdam De Boelelaan 1105 1081 HV Amsterdam The Netherlands Student No. 2602204 [email protected] ABSTRACT [7]. According to the GSMA study conducted in 2016, it In Africa, mobile telephony has become widespread in re- was found that over half a billion people across the conti- cent years. This trend has induced growth in development nent have been subscribed to mobile services [6]. The fore- of mobile applications and services. However, due to low lit- cast number of subscriptions is estimated to increase to 725 eracy levels as well as a huge linguistic diversity across the million by 2020. continent, it proves challenging to create accessible applica- This development opens up many opportunities to im- tions for a wide range of African communities. Implementing prove African economy and information sharing. However, voice-based services is one possible way to bypass the prob- due to the low literacy levels among the African community, lem of illiteracy, but not linguistic diversity. In this paper many of the regular services provided by mobile phones turn the Text-to-Speech Slot and Filler system, originally devel- out to be inaccessible. In Sub-Saharan Africa, around 182 oped as part of the Lwazi II project, is presented. The sys- million adults and 48 million youths (aged 15-24) are illiter- tem is aimed at under-resourced languages and supports a ate [19]. On top of this, there is a huge diversity of languages limited dictionary of words. Thanks to its relative simplicity across the continent, summing up to over 1500 in total [4]. of use, it allows for rapid development and creating support For these two reasons, illiteracy and linguistic diversity, it for new languages by non-experts. The Text-to-Speech Slot proves challenging to create useful and accessible applica- and Filler system has the potential to help existing mobile tions for such a wide range of communities. services overcome the linguistic diversity in Africa. To show Implementing voice-based services is one possible way to this, the system was used to create TTS support for a new bypass the problem of illiteracy. This approach has been language, Twi, spoken in Ghana. With the help of a na- undertaken by researchers working at The Network Insti- tive speaker, a number of suitable recordings was created in tute at Vrije Universiteit in Amsterdam. The Radio March´e the domain of weather forecast and then used to build the system they developed is a great example of how a voice in- system. To test the quality of the TTS conversion, several terface can make the application accessible to local farmers new sentences in the language of Twi were synthesised and [14]. Its main purpose was to provide market information evaluated by a group of native speakers in a feedback form. to users through a radio broadcast service. The market in- Most of the syntheses were deemed easy to understand and formation is input by the farmers, who can advertise their sounded mostly natural. In some cases, a word was mis- produce by making a phone call to the system. The infor- pronounced due to incorrect unit selection, indicating that mation is stored on the Web, after which it is shared on local this part of the system may need to be revised. Finally, community radios in different languages. the system was integrated into an existing voice-based ser- One especially interesting feature of Radio March´eis the vice called InfoM´et´eo, extending it by the language of Twi. TTS (Text-to-Speech) system, which allows for automatic Based on this use case, a general purpose methodology and creation of audio files from text input. It was developed tooling of limited domain TTS support for under-resourced by using the Slot and Filler method [14], which is based on languages is provided. mapping plain text to audio files and representing them as attribute-value pairs. The system has only been deployed once, providing support for the languages of Bambara and Keywords Bomu. The great advantage of the Slot and Filler TTS sys- TTS, Text-to-Speech, Slot and Filler, unit selection, illiter- tem is its suitability for many different languages, includ- acy, voice-based services, language, linguistic diversity, ac- ing those that are under-resourced. New languages can be cessibility, Africa added to the system with the help of a native speaker. It is important to note that this system is designed for a lim- ited number of words, making it suitable for applications 1. INTRODUCTION that focus on a very specific domain (for example: market In Africa, mobile telephony has become widespread in re- information). Radio March´euses this TTS system to auto- cent years. For the majority of the African community, the matically generate the radio readings from the input market mobile phone is first and primary means of communication information, and by doing so does not rely on actual human broadcasters for the various languages. Voice based services are a very promising option, especially There are many other examples of useful voice services since they are already familiar to the people. They have been in Africa that could benefit from the TTS technology. Re- used as solutions in several projects carried out in Ghana and using it in such services could potentially help with solving Mali. This section gives a brief overview of these projects. the challenge of linguistic diversity in building mobile appli- cations for African communities. The Slot and Filler system, 2.2.1 Radio Marché though often overlooked in the mainstream state of the art, Radio March´eis a voice-based service targeted at the is definitely of interest in the field of ICT4D due to its sim- farmers community. It is used to broadcast various pro- plicity to set up and use by non-expert users. It does not duces offered by farmers on the local radio [14]. The market require any background in the field of linguistics, nor expert offerings information is first made available on the Web and programming knowledge. With detailed documentation, it then shared by local community radios, thus making this can be handled by a student with basic programming and data accessible to individuals without Internet access. command line skills. Another advantage of the TTS Slot and The Radio March´eproject was executed in Mali and its Filler system is that it is computationally inexpensive, mak- aim was to improve an already existing Market Information ing it an attractive candidate for Text-to-Speech conversion System (MIS), which was mostly based on sharing infor- in ICT4D. mation via SMS and storing it in an Excel file. For many, This paper will provide a general purpose methodology the use of SMS was not an accessible option for sharing in- and tooling of the Slot and Filler TTS method, including a formation. Radio Marche's new system allowed for manual detailed guide of how the system can be built and used. This processing of the information from both voice and text mes- will be done for the language of Twi, based on an example sages, thus supporting many more users with low literacy use case of meteorological readings. levels. The information collected is used for creating the market communiqu´es in a web interface especially made for 2. RELATED WORK that purpose. The audio communiqu´esare generated au- tomatically by a TTS (Text-to-Speech) system, which was 2.1 Accessibility in African Context created solely for under-resourced languages by using the Accessibility of mobile applications in Africa has a some- Slot and Filler method (see Section 3). The languages used what different meaning than in the rest of the world. W3C in this project were Bambara and Bomu. describes "accessibility" as "making websites and applica- tions more accessible to people with disabilities when they 2.2.2 Marketing for Agricultural Products are using mobile phones and other devices" [5]. This mean- The voice-based system for marketing of agricultural prod- ing of the term is based on a Western perspective. In Africa, ucts was developed in Ghana by Dittoh et al. [13]. It was accessibility is affected by a number of factors taken for designed to be accessed through a phone call from a local granted in W3C's description, including access to electric- number, which would be answered by an Interactive Voice ity, technology and the Internet, and last, but not least, a Response (IVR). Support for integration of new languages certain standard of literacy. was provided. In addition to the mobile voice interface, a If we look at the technology, it is apparent that Africa Web interface was also created. is behind on the Western world. The majority of Africans The work flow of the system is as follows: in order to add use outdated mobile phones, which is an important factor his products for sale to the system, a farmer makes a phone that needs to be taken into consideration when creating ap- call to a local phone number, which is answered by the Voice plications for the African communities. One of the obvious Server. Once the farmer has provided the input, he is asked solutions to this problem is the use of SMS for a broad range for confirmation. As soon as it is given, the system transfers of services, as it is supported by all mobile phones. How- the obtained data to a database, which is in turn used by a ever, SMS is problematic in the sense that it does not cover web interface providing all listings to potential buyers with the desired range of users, because of its heavy dependence Internet access.