University of Cape Town

The copyright of this thesis vests in the author. No quotation from it or information derived from it is to be published without full acknowledgementTown of the source. The thesis is to be used for private study or non- commercial research purposes only. Cape Published by the University ofof Cape Town (UCT) in terms of the non-exclusive license granted to UCT by the author. University University of Cape Town Department of Computer Science Thesis submitted in partial fulfilment for the award of Masters Town In Information Technology Cape of Evaluation of the Usability and Usefulness of Automatic Speech Recognition among users in South Africa. Idowu Modupeola Florence [email protected] Supervisor Dr. Audrey Mbogho. University of Cape Town November 18, 2011 DECLARATION I declare that this thesis is not a copy of any other and every supporting paper has been properly referenced. I declare that this work has not been submitted for any degree elsewhere but to the University of Cape Town for the award of Masters Degree in Information Technology in the Department of Computer science. Signature: ……………………………………….. Date: Town Cape of University ii ACKNOWLEDGEMENT I would like to appreciate the relentless effort of Dr. Audrey Mbogho in making this research study a reality. I appreciate her guardianship throughout this period and her time dedication. Your inputs made a great difference. To my husband, and the best daughters in the whole wide world, Tunmi & Tomi. I appreciate your support and love all the way! Town Cape of University iii ABSTRACT An automatic speech recognition (ASR) system is a software application which recognizes human speech, processes it as input, and displays a text version of the speech as output or uses the input as commands for another application’s usage. ASR can either be speaker-dependent or speaker- independent. A speaker-dependent ASR system requires every user to perform training before its usage, while speaker-independent ASR requires no prior training before usage. The technology of ASR is based on identification and comparison of sound patterns; these sound patterns are combinations of the smallest units of sound called phonemes [55]. The phonemes constitute fragments of uttered sounds in speech and their combination gives meaningful sound patterns in languages. There exists a set of phonemes for every language group [51], and associated with each group is the method of pronunciation called the accent. A language group could be identified by the accent in their speech; accent is the set of pronunciation rules of a language group. Accent reflects the cultural divide of a multi cultural society with a common language such as English. Some commercially available ASR systems are designed based on the accents of the following language groups: English, French, German, Italian, Dutch, and Spanish [15]. These language groups are European with none having any similarities with African languages and accents, (except Afrikaans and English, which, though spoken in Africa, originated from Proto-Indo-European languages [20, 51]). Town This study involved the evaluation of commercially available English ASR systems, establishing their usability and usefulness among different language groups in South Africa which use English as a common language. Of particular interest was the effect of African accents on the performance of the ASR systems. ASR technology is widely used and researched in the developed world with reported recognition accuracy of up to 99% [15]. However, EnglishCape spoken with African accents may have adverse effect on the recognition accuracy. of Despite the fact that most existing ASR systems are not designed for English spoken with South Africans’ accents, one can easily purchase them over the shelf in South Africa. The systems used in this study are: 1. Nuance Dragon NaturallySpeaking, Version10.0 (NDNS). 2. WindowsUniversity Speech Recognition, Windows Vista version (WSR). The result of this study indicated that accent has influence on the ASR recognition accuracy. It also indicated that users’ satisfaction was greatly affected by the recognition accuracy obtained. The results also indicated poor performance in environments where speech cannot be loud, for example, in the library. iv TABLE OF CONTENTS DECLARATION .................................................................................................................................... ii ACKNOWLEDGEMENT ..................................................................................................................... iii ABSTRACT ........................................................................................................................................... iv TABLE OF CONTENTS ........................................................................................................................ v List of Tables ....................................................................................................................................... viii List of Tables ....................................................................................................................................... viii List of Figures ........................................................................................................................................ ix ABBREVIATIONS: ............................................................................................................................... 1 CHAPTER 1: INTRODUCTION .......................................................................................................... 2 1.1 Background ....................................................................................................................................... 2 1.1.1 Applications of ASR Systems ......................................................................................... 3 1.1.2 Usability Evaluation Methods ..........................................................................................Town 5 1.2 Problem Statement ..................................................................................................................... 6 1.3 Research Objectives ................................................................................................................... 6 1.4 Research Questions and Hypotheses ..........................................................................................Cape 7 CHAPTER 2: BACKGROUND AND RELATED WORK. .................................................................. 9 2.1 Usage of ASR as a Speech User Interface (SUI).of ......................................................................... 9 2.2 Human Computer Interaction............................................................................................................ 9 2.2.1 Usability ........................................................................................................................ 10 2.2.2 Relevance of usability .................................................................................................. 12 2.2.3.1 Usability Engineering .................................................................................................. 15 2.3.1 ASR PerformanceUniversity and Constraints on Usability. ......................................................................... 19 2.4 The Current State of Research in ASR system Design and Development. ..................................... 23 2.5 Related Work. .......................................................................................................................... 24 2.6 South African Users and ASR systems. ................................................................................... 26 CHAPTER 3: METHODOLOGY ........................................................................................................ 27 3.0 Study Methodology. .................................................................................................................. 27 v 3.1 Participants. .............................................................................................................................. 28 3.1.1 Pilot Test. ...................................................................................................................... 28 3.2 Study Design Approach. ........................................................................................................... 31 3.3 Facilities ................................................................................................................................... 33 3.4 Systems Requirements. ............................................................................................................ 33 3.5 Procedure .................................................................................................................................. 33 3.6 Data Analysis ........................................................................................................................... 35 CHAPTER 4: RESULTS ...................................................................................................................... 36 4.1 Result Analysis ........................................................................................................................ 36 4.1.2 NDNS Data Analysis. .................................................................................................. 38 4.1.3 WSR Data Analysis. ....................................................................................................

University of Cape Town

Review Report

A STUDY on INNOVATIVE TRENDS in MULTIMEDIA LIBRARY USING SPEECH ENABLED SOFTWARES V Jeevitha Periyar University, [email protected]

Speech Recognition Software Review

Desenvolvimento De Um Protótipo De Sistema Web Para Elaboração De Laudos Médicos Utilizando Sistemas De Reconhecimento Automático De Fala

“Voice Recognition System”