DESIGN OF AN ENHANCED SPEECH

AUTHENTICATION SYSTEM OVER MOBILE

DEVICES

by

MOHSHIN UDDIN ANWAR

MASTER OF SCIENCE IN INFORMATION AND COMMUNICATION TECHNOLOGY

Institute of Information and Communication Technology
BANGLADESH UNIVERSITY OF ENGINEERING AND TECHNOLOGY
2018



Dedicated

to

My Beloved Parents


Table of Contents

Declaration
Table of Contents
List of Tables
List of Figures
List of Abbreviations of Technical Symbols and Terms
Acknowledgment
Abstract
CHAPTER 1 INTRODUCTION
1.1 Introduction
1.2 Motivation
1.3 Objective
1.4 Organization of the Thesis
CHAPTER 2 LITERATURE REVIEW
2.1 Speech Recognition and Authentication Concepts
2.1.1 Speech Recognition
2.1.2 Speaker Recognition
2.1.3 Speech Features Recognition
2.1.4 Voice Activity Detection
2.1.5 Some Use Cases
2.2 Different Speech Recognition Models
2.2.1 Gaussian Mixture Models
2.2.2 The HMM and Variants
2.2.3 GMM-HMM Coupling
2.2.4 Multilayer Perceptron Neural Networks
2.2.5 Evolutionary Algorithms with DNN
2.2.6 DNN-HMM Hybrid Model
2.3 Challenges
2.3.1 Vulnerabilities in Mobile Speaker Verification
2.3.2 Cost
2.4 Terminologies
2.4.1 FAR and FRR
2.4.2 GA
2.4.3 GFCC

2.4.4 LACE
2.4.5 LSTM
2.4.6 MFCC
2.4.7 WER and WRR
2.5 Past Experiments with Several Databases
CHAPTER 3 PROPOSED WORK
3.1 Algorithm
3.2 Platform, Compiler and Language
3.3 Data Set
3.4 Experimental Setup
3.5 Proposed Solution
3.6 Details of the Experiment
3.6.1 Steps/Methods in the Experiment
3.6.2 Flowchart
3.6.3 Pseudocode
3.6.4 Getting decision on any unknown sample using this model
CHAPTER 4 RESULT AND OPTIMIZATIONS
4.1 Outputs
4.2 Discussions and Comparative Gain from this Experiment
4.2.1 Outcome
4.2.2 Gain Obtained from Present Work
4.2.3 Comparative Analysis of the Accuracy
4.3 Discussion
CHAPTER 5 CONCLUSION
5.1 Conclusion
5.2 Future Work
References
Appendix A: Setting of Hyper Parameters
A.1 Iteration 1
Appendix B: OUTPUTS
B.1 Iteration-1
B.2 Iteration-2
B.3 Iteration-3


List of Tables

Table 2.1: Speaker Recognition Systems with/without Smartphones
Table 4.1: 1st Iteration Results
Table 4.2: 1st Iteration 6th Generation is with Dense Model
Table 4.3: 2nd Iteration Results: 3 generations
Table 4.4: 2nd Iteration Results: Generation 1
Table 4.5: 2nd Iteration Results: Generation 2
Table 4.6: 2nd Iteration Results: Generation 3
Table 4.7: 3rd Iteration Results: 4 generations
Table 4.8: 3rd Iteration Results: Generation 1
Table 4.9: Confusion Matrix over the Optimum Result
Table 4.10: Classification Report over the Optimum Result
Table 4.11: Comparison with similar experiment using GA over DNN
Table 4.12: Comparison with similar experiment using any evolution strategies over DNN


List of Figures

Figure 2.1: (a) Speaker identification, (b) Speaker Verification
Figure 2.2: DNN-HMM hybrid architecture
Figure 2.3: LACE Architecture
Figure 4.1: Accuracy vs Generations (Iteration-1)
Figure 4.2: 1st Iteration: most promising network is with Dense model
Figure 4.3: Iteration 2 - 1st Generation shows promise, but deviates on evaluation
Figure 4.4: Iteration 2 - 2nd Generation follows 1st Generation
Figure 4.5: Iteration 2 - 3rd Generation also follows 1st generation
Figure 4.6: Iteration 3 - 1st Generation surpasses previous accuracies


List of Abbreviations of Technical Symbols and Terms

A3C: Asynchronous Advantage Actor-Critic

AFE: Advanced Front-End Feature Extraction

AM: Acoustic Model

ANN: Artificial Neural Network

ASA: Auditory Scene Analysis

ASR: Automatic Speech Recognition

BBOB: Black-Box Optimization Benchmarking

BLSTM: Bidirectional Long Short-Term Memory

BN: Bayesian Network

BP: Backpropagation

CASA: Computational Auditory Scene Analysis

CIR: Correct Identification Rate

CMA: Covariance Matrix Adaptation

CMA-ES: Covariance Matrix Adaptation Evolution Strategy

CMN: Cepstral Mean Normalization

CNN: Convolutional Neural Network

CNTK: Cognitive Toolkit

CTC: Connectionist Temporal Classification


DA: Distribution Algorithm

DAG: Directed Acyclic Graph

DCT: Discrete Cosine Transform

DFT: Discrete Fourier transform

DNN: Deep Neural Network

DNN-HMM: Deep Neural Network-Hidden Markov Model

DP: Dynamic Programming

DQN: Deep Q Network

DRL: Deep Reinforcement Learning

DTW: Dynamic Time Warping

EA: Evolutionary Algorithm

EM: Expectation-Maximization

ES: Evolution Strategy

ETSI: European Telecommunication Standards Institute

Euc: Euclidean

FAR: False Acceptance Rate

FRR: False Rejection Rate

GA: Genetic Algorithm

GF: Gammatone Feature

GFCC: Gammatone Frequency Cepstral Coefficient

GMM: Gaussian Mixture Model


GPU: Graphics Processing Unit

HMM: Hidden Markov Model

HTER: Half Total Error Rate

ISV: Inter Session Variability

JFA: Joint Factor Analysis

LACE: Layer-wise Context Expansion with Attention

LM: Language Model

LSTM: Long Short-Term Memory (recurrent neural network)

LVCSR: Large Vocabulary Continuous Speech Recognition

LVSR: Large Vocabulary Speech Recognition

Mah: Mahalanobis

MDP: Markov Decision Process

MFCC: Mel-Frequency Cepstral Coefficient

MLP: Multilayer Perceptron

MMI: Maximum Mutual Information

MSE: Mean Square Error

NIST: National Institute of Standards and Technology

PG: Policy Gradient

PLP: Perceptual Linear Prediction

PSO: Particle Swarm Optimization/ Optimizer

RBM: Restricted Boltzmann Machine


ReLU: Rectified Linear Unit

ResNet: Residual-Network

RFM: Random Forest Machines

RNN: Recurrent Neural Network

RNN-T: Recurrent Neural Network Transducer

SGD: Stochastic Gradient Descent

SID: Speaker Identification

SMD: Short Message Dictation

SN: Score Normalization

SNR: Signal to Noise Ratio

SSC: Speech Separation and Recognition Corpus

SV: Speaker Verification

SVM: Support Vector Machines

TDNN: Time Delay Neural Network

UBM: Universal Background Model

VS: Voice Search

WER: Word Error Rate

WRR: Word Recognition Rate


Acknowledgment

Alhamdulillah. All praises belong to Allah (SWT), who bestowed His Mercy upon me to complete this work successfully.

It is a great privilege for me to express my deepest gratitude to my supervisor, Professor Dr. Md. Liakot Ali, Institute of Information & Communication Technology, BUET, Dhaka, for his continuous inspiration, encouragement, guidance and invaluable suggestions in carrying out this research work. His initiative, appreciation and advice in performing such a long research in the field of my interest are highly acknowledged.

Many thanks to Google, which made it possible to review thousands of papers and research publications circulated worldwide, and to their open-source framework TensorFlow, without which the results on the various deep learning models and the genetic algorithm could never have been obtained.

Also, thanks to the open-source coders, bloggers and YouTube speakers who made Keras, explained its implementation with TensorFlow and helped in many more ways by sharing their experiences.

I acknowledge the teaching of my teachers, Professor Dr. S. M. Lutful Kabir, Professor Dr. Md. Abul Kashem Mia and Professor Dr. Mohammad Nurul Huda, for their impressive and inspiring lectures on Neural Networks, Bioinformatics and Speech Recognition respectively, which drew me to choose such a challenging topic, combining the three areas of learning into one and identifying how promising that combination can be.

Most deserving of special recognition are my wife, Dr. Tasmin Aziz, my daughters Maryam and Mahira, and my son Aariz Anas, who often did not find me by their side when they needed me but still inspired me and prayed for my success while sacrificing their share of family life. My deepest appreciation and thanks for their patience, unconditional love, mental support, sacrifice and help, without which this work could not have materialized.

The Author

Abstract

Increasing the efficiency of biometric authentication via speech recognition and identification, and its use in mobile devices, has been one of the most heavily invested research areas in the computing industry worldwide. From the earliest use of speech recognition algorithms up to very recent times, even in the current year 2018, different strategies and combinations have been used to optimize results in order to match and even surpass human recognition capacity.

For many long years, various speech signal processing techniques were experimented with and optimized using expectation-maximization, gradient descent optimization or their variations across end-to-end speech feature extraction and recognition schemes, but the results remained below a satisfactory level despite the multitude of time, cost and effort invested.

Very recently, the huge improvement in the computing power of devices has made it possible to use complex multi-layered neural network technologies (i.e., deep learning or deep neural networks) such as convolutional networks, long short-term memory and bidirectional recurrent neural networks, as well as complex statistical or evolutionary strategies and their variations, to further optimize the results and reduce the error rates.

To this end, using a series of combinations of various deep learning algorithms across end-to-end speech feature and language modelling, some big companies and joint-venture investments have attained a somewhat notable achievement: their experiments just surpassed human efficiency.

Still, recognition efficiency remains far from a practically useful and achievable optimal solution, one that can perform equally well in noisy environments and under mutations of speech features.

This thesis work has emphasized mostly how to devise an efficient technique that would reduce the time, cost and complexity of such huge efforts, so that future improvements can be made along this optimal path.


To this end, it has been identified that text-independent speech recognition can be trained efficiently if deep learning technology is adopted under the guidance of a genetic algorithm (GA) that intelligently chooses the hyper-parameters of the networks.

The experiments show that a series of iterations to estimate and re-estimate the hyper-parameters can lead to a better, near-optimal solution with far less time and cost. The runtime is O(number of generations) instead of O(variations ^ number of network parameters), an extreme time saving compared to legacy processes of selecting a series of deep learning networks.

As a way forward, we have suggested that more automated parameter fixing, followed by automated iterations, can be a future direction for such an implementation.



CHAPTER 1 INTRODUCTION

1.1 Introduction

Recently, mobile devices, wearable devices, intelligent living-room devices and in-vehicle infotainment systems have become increasingly popular. As these portable devices have grown to cover and control most of the technologies used worldwide1, and as mobile applications have become ever easier to use, these devices and the technologies behind them have become part and parcel of human life and assets, with information at the center. Hacking and unauthorized usage are therefore naturally growing issues for these devices and the applications on them2. This leads to prioritizing the security and authentication[1] processes that protect personal and organizational data and control.

It has now been established that multi-factor authentication, with one factor being a biometric, is fundamentally more secure and convenient than other legacy authentication alternatives, assuring identity and knowing "who" to a high degree of certainty[2]. So, biometric authentication has become one of the most prominent technologies, using unique features of human physiological or behavioral characteristics[3]. When identity is thus firmly established, the use of mobile devices in authentication offers greater

1 It has been predicted that the number of mobile-connected devices per capita in the world will reach 1.5 by 2020, with more than 60% of those being "smart" devices[4]; the massive volume of low-cost, sophisticated wireless capabilities, such as amazing e-applications and secured e-transactions, will be the attraction in the future[1].

2 It has been studied that revenue growth surrounding biometrics in the banking sector globally will be driven by an increased emphasis on protecting financial transactions from fraud, identity theft and security breaches[5].


personalization and a more seamless experience for authorizing users based on that identification3.

However, recent biometric authentication applications suffer from problems including large size, operational complexity and extremely high cost, whereas the desired implementation must be quick, convenient, robust and highly secure [6][7][8].

As interaction with handheld and portable devices via the legacy keyboard and mouse becomes less convenient where mobility is the issue, speech, being the natural way of human-human communication and a skill that the majority of people already have, is becoming a more natural4 and therefore favorable interaction modality on these devices and systems [4].

Therefore, to interact with mobile devices in a more secure and authenticated way, speech has now become the most natural and promising area of interest [9][10], with the following pros and cons [11]:

Advantages:

• All mobile phones already have microphones to capture voice.
• Mobile phones provide a natural and comfortable way of communication.
• Voice biometrics has the advantage of enabling remote applications5[12].
• It is the most sought-after technology by users [13].

3 It has been predicted that, 5.5 billion biometrically-enabled mobile devices will be in use by 2022, including fingerprint and face scans, and by this time global mobile biometric market revenues will reach $50.6 billion annually. This includes annual biometric sensor revenue $3.1 billion from 2.7 billion biometrically authenticated mobile devices, annual revenues from direct purchase and software development fees of $29.2 billion from 16.7 billion biometric app downloads, and annual authentication fees of $18.3 billion from 1.37 trillion biometrically secured payment and non-payment transactions[5].

4 Of all the ready-to-be-implemented biometric methods, speech recognition can be deemed the most feasible worldwide, covering devices with or without cameras and other smart features.

5 Some otherwise promising biometrics, such as fingerprints, have negative implications as a mode of biometric authentication: they can only be used for identification when the person is physically present at the site, so they cannot be used remotely [10].


• It can show promising performance because of the increasing processing power of recent devices.
• While asking for information from a call center, it can be transparent to the user of a mobile phone.

Drawbacks:

• Speech biometrics is generally very constrained by noise.
• People's voices can change depending on sickness, mood, the time of the day, age or injuries.
• The most comfortable systems, and those with the best performance, are based on passphrases, but the false acceptance rate (FAR) increases a lot when impostors somehow know the probable sentences.
• Text-independent systems require long conversations for training and access.
• The processing of speech signals is hard. The analysis is complex and often requires long and complex experiments to optimize the solutions.
• Robust algorithms are needed to address the degradation of speech over the mobile communication environment [12].
• Voice is quite easy to capture or replicate. There are many possible attacks on these systems, and countermeasures are yet to be promising.

It has been established that combining a user-borne device (what a user has), a biometric feature (what a user is) and an answer to a randomized question (what a user knows) increases the robustness of the identity-proofing process[3]. However, with their lower processing capacity, memory and fixed-point architectures (in lower-end models), mobile devices face more limitations in attaining the required speed and accuracy[4][14] and in fighting impostors[3]; the task of overcoming these limitations is highly challenging6 and therefore offers huge scope for improvement and interest.

6 That is, to increase the accuracy of authentication in terms of FAR (False Acceptance Rate), FRR (False Rejection Rate) and Equal Error Rate (EER) [EER is the point where FAR=FRR], as well as decreasing the WER (Word Error Rate).


1.2 Motivation

For a long time, various speech signal processing techniques, such as combining Mel-frequency cepstral coefficients (MFCC), gammatone frequency cepstral coefficients (GFCC) and perceptual linear prediction (PLP) with the Gaussian Mixture Model (GMM), Hidden Markov Model (HMM) or Dynamic Time Warping (DTW), have been experimented with by a multitude of researchers and scientists working on speech recognition. But such experiments could not provide the expected outcome (they could not even surpass human recognition efficiency; see Table 2.1). In the last couple of years, researchers have concentrated more on deep neural network (DNN) architectures and other optimized versions of complex gradient descent techniques, as the computing power of devices has improved tremendously.

After multiple years of research, experience and trial and error, several DNN architectures have proved promising in a substantial number of areas such as biometric recognition and machine learning: the Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Bidirectional Recurrent Neural Network (BRNN) and their variations ultimately proved useful. Speech recognition, like other areas of biometrics, has now become on par with human recognition in terms of accuracy and speed[15][16][17].

However, the main challenge in devising such a deep neural network is to identify the most optimized network, which may involve many units, layer types (e.g., convolution, pooling, memory, recurrence and normalization), sequences and techniques (optimization, loss parameters and activation, etc.). Optimizing so many parameters in a reasonable amount of time is computationally expensive and complex.

Since neural network architecture design can be viewed as the model selection problem in speech recognition and other such machine learning schemes, identifying the complete end-to-end solution that is the most optimal and least time-consuming can be a way forward. Therefore, we can treat such network architecture design as an optimization problem over hyper-parameters (such as the number of layers and neurons, activation function, loss parameters and optimization function, etc.).


The most optimized current deep networks in speech recognition can now just surpass human efficiency, but not by as much as expected. To overcome this, evolutionary computation such as the Genetic Algorithm (GA) and its variations are candidates for solving such an optimization problem with ease.

Considering the deep learning architecture over a search space of variable-length network structures, connection dropout, recurrence ratios and other such parameters, we can take the performance of the network as the fitness of the architecture. Based on this fitness evaluation, an evolutionary algorithm should optimize the DNN architectures in order to automatically find a network competitive with state-of-the-art models.

Hence, the motivation of this thesis is to analyze an effective evolutionary algorithm that can optimize the deep network architecture with considerably less time, cost and effort.

1.3 Objective

The present work mainly contributes towards the following objectives:

i. Identify challenges in the smooth and faster recognition of spoken words during the training phase, which can later be used in mobile devices;
ii. Analyze and devise a technique that performs best for mutated speech (i.e., changes of person, tone or voice level);
iii. Compare the outputs with the expected results and formulate the necessary recommendations.

1.4 Organization of the Thesis

The thesis consists of five chapters and two appendices.

Chapter 1 presents an introduction to the study. It includes the research background, objectives, scope of work and outline of methodology.

Chapter 2 deals with the literature review. It has five sections. The first section illustrates concepts around speech, speaker and speech feature recognition, voice activity detection, use cases and a comparison among several research outcomes. The second section deals with different models used for similar research work: it illustrates Gaussian Mixture Models, Neural Networks and their variations, and hybrid models such as Evolutionary Algorithms with Neural Networks and DNN-HMM combinations. The third section deals with the challenges faced in similar research, including vulnerabilities and cost. The fourth section describes some terminologies used in similar experiments, and the last section deals with past experiments over several speech databases and their outcomes.

Chapter 3 deals with the current experiment. It explains the chosen platform, compiler and language, data set, authentication framework, proposed solution and details of the experiments.

Chapter 4 deals with the results, comparisons and optimization steps, with discussions around them.

Chapter 5 deals with conclusions and recommendations of this research work.

Appendix A provides the hyper-parameter settings and Appendix B provides the outputs of the experiments under this work.


CHAPTER 2 LITERATURE REVIEW

This chapter deals with the literature review. It has five sections. Section 2.1 illustrates concepts around speech, speaker and speech feature recognition, voice activity detection, use cases and a comparison among several research outcomes. Section 2.2 deals with different models used for similar research work: it illustrates Gaussian Mixture Models, Neural Networks and their variations, and hybrid models such as Evolutionary Algorithms with Neural Networks and DNN-HMM combinations. Section 2.3 deals with the challenges faced in similar research, including vulnerabilities and cost. Section 2.4 describes some terminologies used in similar experiments, and Section 2.5 deals with past experiments over several speech databases and their outcomes.

2.1 Speech Recognition and Authentication Concepts

To authenticate a speech signal, three items are to be recognized and verified: the speech, the speaker and the speech features. In the sections below, all three items are described in detail with their pros and cons.

2.1.1 Speech Recognition

The speech recognition system has four main components [4]:

a) Signal processing and feature extraction:

The signal processing and feature extraction component takes the audio signal as input, enhances the speech by removing noises and channel distortions, converts the signal from time-domain to frequency-domain and extracts main feature vectors suitable for the selected acoustic models.

b) Acoustic model (AM):

The acoustic model integrates knowledge about acoustics and phonetics, takes the features generated by the 'signal processing and feature extraction' component above as input, and generates an AM score for the variable-length feature sequence.


The two main issues the AM component must deal with are the variable-length feature vectors and the variability in the audio signals, as described in Section 2.1.3.

c) Language model (LM):

The language model estimates the probability of a hypothesized word sequence, or LM score, by learning the correlation between words from training corpora (typically text). The LM score can often be estimated more accurately if prior knowledge about the domain or task is known.

d) Hypothesis search:

The hypothesis search component combines AM and LM scores given the feature vector sequence and the hypothesized word sequence, and outputs the word sequence with the highest score as the recognition result.
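As a toy illustration of this combination (the hypothesis list, the scores and the LM weight below are invented for the example, not taken from any cited system), the decoder simply returns the hypothesis whose weighted sum of AM and LM scores is highest:

    # Hypothetical (word sequence, AM log-score, LM log-score) candidates.
    hypotheses = [
        ("call my office", -120.4, -8.1),
        ("call my of ice",  -118.9, -14.6),
        ("fall my office",  -124.2, -9.0),
    ]

    LM_WEIGHT = 10.0   # assumed language-model scale factor

    def best_hypothesis(hyps, lm_weight=LM_WEIGHT):
        """Hypothesis search: combine AM and LM scores and keep the best one."""
        return max(hyps, key=lambda h: h[1] + lm_weight * h[2])[0]

    print(best_hypothesis(hypotheses))   # prints "call my office"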

2.1.2 Speaker Recognition

Speaker recognition is the process of automatically recognizing who is speaking by using speaker-specific information included in the voice waves [18]. Generally, the required sensor is a microphone, which captures the voice at a given sampling rate, is always available with good quality in a mobile phone [19], and sends these data to a computing device responsible for analyzing them. In addition to physical differences in the speech signal data, each speaker has his or her own particular accent, rhythm, intonation style, pronunciation pattern, choice of vocabulary, etc. [20].

Like any other biometric recognition, speaker recognition can be classified into speaker identification and speaker verification or authentication [21][22]. Identification is the process of determining from which of the registered speakers a given utterance comes. Verification is the process of accepting or rejecting the identity claim of a speaker. In a sense, speaker identification is a 1:S match where the voice is compared against S templates (also called "voice prints" or "voice models"), whereas speaker verification is a 1:1 match where one speaker's voice is matched to one template.


Figure 2.1: (a) Speaker identification, (b) Speaker Verification

Three types of speaker verification technology can be used: text-independent, text-dependent or text-prompted [23][24][25][26]:

• Text-dependent: Text-dependent technology is usually based on template-matching techniques. The user is recognized when saying the specific word or phrase (represented by a sequence of feature vectors, which are generally short-term) that he or she was enrolled with. This means that the speaker recognition system knows a priori the sentence (i.e., the sequence of feature vectors) the person is going to say, giving a lot of information to the system. In general, these systems have better performance and are simpler than text-independent systems. At present, text-dependent speaker verification systems are more useful for verification/authentication purposes and more commercially viable for applications, since a cooperative action is required to accomplish the authentication.
• Text-independent: The user is recognized while he or she is having a conversation, no matter which words are being pronounced. This is a much more complicated scenario but a more flexible one. It is quite often used for transparent and forensic identification [21]. Text-independent systems are more useful for background recognition and liveness detection [27].
• Text-prompted: In the text-prompted speaker recognition method, the recognition system prompts each user with a new key sentence every time the system is used and accepts the input utterance only when it decides that it was the registered speaker who repeated the prompted sentence. The sentence can be displayed as characters or spoken by a synthesized voice. Because the vocabulary is unlimited, prospective impostors cannot know in advance what sentence will be requested; this mitigates the weakness of the other two approaches, in which an impostor can play back the recorded voice of a registered speaker saying the key words or sentences and thereby be accepted as the registered speaker [28].

Such an identification and verification of a speaker comprises three processes [27][29][30]:
• Feature extraction,
• Speaker modeling, and
• Decision making using pattern classification methods.

For speaker modeling7, the prime chosen approach in both commercial and research systems has been the use of Gaussian mixture models (GMMs) to model distributions of spectral information across short time frames [21][31]. This approach, reflecting information about a speaker's vocal physiology and not relying on phonetic content, is simple, highly successful, and has the further advantage of applicability to text-independent recognition schemes.
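As a minimal illustration of how GMM speaker models feed the decision step, the sketch below (a sketch under assumptions: it uses scikit-learn's GaussianMixture and an arbitrary decision threshold, neither of which is prescribed by the systems cited here) fits one GMM per enrolled speaker, identifies a speaker by picking the model with the highest average log-likelihood, and verifies a claimed identity by comparing that log-likelihood against a threshold:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_speaker_models(features_per_speaker, n_components=16):
        """Fit one GMM per enrolled speaker on that speaker's feature frames
        (rows = frames, columns = e.g. MFCC dimensions)."""
        models = {}
        for speaker_id, frames in features_per_speaker.items():
            gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
            models[speaker_id] = gmm.fit(frames)
        return models

    def identify(models, test_frames):
        """1:S identification: return the speaker whose GMM gives the highest
        average log-likelihood for the test frames."""
        scores = {sid: gmm.score(test_frames) for sid, gmm in models.items()}
        return max(scores, key=scores.get)

    def verify(models, claimed_id, test_frames, threshold=-45.0):
        """1:1 verification: accept the claim if the claimed speaker's average
        log-likelihood exceeds a (hypothetical) decision threshold."""
        return models[claimed_id].score(test_frames) > threshold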

In testing any speech authentication method with an enriched corpus, random scenarios [32] and environments mixed with additional packet loss and noise [32][34] also have to be examined. The performance of any selected method can then be measured in

7 Recognition decisions are usually made based on likelihoods of observing feature frames given a speaker model. Such systems usually do not perform well under noisy conditions[30] because extracted features are distorted by noise, causing mismatched likelihood calculation.


terms of Equal Error Rate8 (EER), recognition accuracy rate and ultimate verification time [35].
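A small NumPy sketch of how FAR, FRR and the approximate EER operating point can be estimated from verification scores is given below; the genuine and impostor score arrays are placeholders for real system outputs:

    import numpy as np

    def far_frr_eer(genuine_scores, impostor_scores):
        """Sweep a decision threshold over all observed scores and return the
        operating point where FAR and FRR are (approximately) equal."""
        thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
        best = None
        for t in thresholds:
            far = np.mean(impostor_scores >= t)   # impostors wrongly accepted
            frr = np.mean(genuine_scores < t)     # genuine users wrongly rejected
            if best is None or abs(far - frr) < abs(best[1] - best[2]):
                best = (t, far, frr)
        return best   # (threshold, FAR, FRR) at the approximate EER point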

2.1.2.1 Noisy Conditions

It has been studied that conventional speaker recognition systems perform poorly under noisy conditions, especially when legacy speech feature extraction methods such as Mel-frequency cepstral coefficients (MFCC9) are used[36]. Inspired by auditory perception, computational auditory scene analysis (CASA) [36] typically segregates speech by producing a binary time-frequency mask. This research investigated CASA for robust speaker identification, first introducing the gammatone frequency cepstral coefficient (GFCC), based on an auditory periphery model, and showed that this feature captures speaker characteristics and performs substantially better10 than conventional speaker features under noisy conditions. To deal with noisy speech, CASA separation was applied, and corrupted components, indicated by a CASA mask, were either reconstructed or marginalized, which proved to be effective.

2.1.3 Speech Features Recognition

Like speech and speaker recognition, the speech features recognition methods rely on short-term feature extraction followed by speech pattern classification [37][38][39].

Usually, the speech features can be divided into high-level features, i.e., variability, that capture phonetic, prosodic, and lexical information and low-level characteristics, i.e., variable length features reflecting voice parameters [18][4].

2.1.3.1 Use of Low Level Features, i.e., Variable Length Features

The variable-length feature problems are related to the spectrum and are often addressed by techniques such as dynamic time warping (DTW) and the hidden Markov model (HMM) [4]. These features are easy to extract and are almost always applied to automatic

8 Where FAR=FRR

9 The mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.

10 It has been further supported by another experiment that, if Gammatone frequency cepstral coefficients (GFCC) can be used, such features perform better in noisy conditions[41].


speaker recognition, with the further advantage of applicability to text-independent recognition since they do not rely on phonetic content[18][40].

Regarding low-level feature extraction, different parameterizations of the short- and long-term spectrum have been popular. Typically, extracted speaker features are short-time cepstral coefficients such as the mel-spectrum and Mel-frequency cepstral coefficients (MFCC), relative spectral transform-perceptual linear prediction (RASTA-PLP) and other PLP coefficients, or long-term features such as prosody [42]. The mel-frequency cepstral coefficient (MFCC), a compact and effective representation of acoustic signals, actually takes into consideration the characteristics of the human vocal system [43].

Even though a few studies [12] have also attempted to use alternative time and frequency domain features, such as zero crossing rate, spectral flux or spectral centroids, or even sparse time-frequency patterns [11], there is experimental evidence that such features can merely complement the MFCCs or MFCC-like spectral shape descriptors [8].

MFCC features are also used in the ETSI standard front-end for distributed speech recognition (DSR); therefore, integrating context or environment detection for improved ASR model adaptation would not increase overhead at the front end.11

By far, MFCC and GMM have been the most prevalent feature and classifier (acoustic model) respectively in the past, used to represent a voice signal for feature extraction and feature representation in speaker recognition systems using short time frames of speech [18][35][44].
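For illustration, a minimal sketch of such a front end is given below; it assumes the librosa library, which is not a tool prescribed by the cited work, and computes 13 MFCCs per frame together with their delta and double-delta time derivatives:

    import librosa
    import numpy as np

    def extract_mfcc_with_deltas(wav_path, sr=16000, n_mfcc=13):
        """Load an utterance and compute 13 MFCCs plus their first (delta) and
        second (delta-delta) time derivatives, giving 39-dimensional frames."""
        y, sr = librosa.load(wav_path, sr=sr)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape: (13, frames)
        delta = librosa.feature.delta(mfcc)                      # first derivative
        delta2 = librosa.feature.delta(mfcc, order=2)            # second derivative
        return np.vstack([mfcc, delta, delta2]).T                # shape: (frames, 39)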

In the 1990s, GMM-HMM acoustic models were trained using the maximum likelihood (ML) training criterion. In the 2000s, sequence discriminative training algorithms such as minimum classification error (MCE) and minimum phone error (MPE) were proposed and further improved ASR accuracy.

11 We also utilized the standard MFCCs in our experiment with their first and second order time derivatives, the delta and double delta coefficients.


In recent years, discriminative hierarchical models such as deep neural networks (DNNs) have become feasible and have significantly reduced error rates, a combined result of continued improvements in computation power, the availability of large training sets, and a better understanding of these models [4].

2.1.3.2 Use of High Level Features, i.e., Variability

The variability in the audio signals is caused by a complicated interaction of speaker characteristics, i.e., emotion or other states (e.g., gender, illness, or stress), speech style and rate, environment noise, side talks, channel distortion (e.g., microphone differences), dialect differences, and nonnative accents. A successful speech recognition system must cope with all of these sources of acoustic variability.

Higher-level speech information, i.e., variability can significantly improve performance when combined with lower-level cepstral information. Higher-level information also offers the possibility of increased robustness to channel variation, since features such as lexical usage or temporal patterns do not change with changes in acoustic conditions. And finally, higher-level features can provide useful metadata about a speaker, such as what topic is being discussed, how a speaker is interacting with another talker, whether the speaker is emotional or disfluent, and so on [18].

2.1.4 Voice Activity Detection

Rabiner and Sambur [45] proposed an algorithm for voice activity detection that is based on measurements of energy and zero crossing rate. The main idea of this algorithm is the following:
i) Divide the speech signal into frames;
ii) Calculate the energy and the zero crossing rate for each frame;
iii) Compare these with pre-established thresholds;
iv) Define the endpoints of each word.

These thresholds define the characteristic values of the parameters for silence or background noise.

The start of the recording interval is considered to have only silence or noise signal.


Initially the energy values of each frame are compared with the thresholds and candidates qualified to be the endpoints of the word are defined.

Next, the zero crossing rate is analyzed in an interval of S frames from the points which are candidates to be the start and the end of the voice signal to improve the location of the words that begin or end with fricative consonants.
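A compact NumPy sketch of this energy/zero-crossing endpoint idea follows; the frame size, hop and threshold rules are illustrative assumptions, not the exact values used in [45]:

    import numpy as np

    def frame_signal(x, frame_len=240, hop=80):
        """Split a speech signal into overlapping frames (e.g. 30 ms / 10 ms at 8 kHz)."""
        n = 1 + max(0, (len(x) - frame_len) // hop)
        return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

    def energy_and_zcr(frames):
        """Per-frame short-time energy and zero-crossing rate."""
        energy = np.sum(frames.astype(float) ** 2, axis=1)
        zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
        return energy, zcr

    def detect_endpoints(x, noise_frames=10):
        """Crude endpoint detection: thresholds are set from the first few frames,
        which are assumed to contain only silence/background noise."""
        frames = frame_signal(x)
        energy, zcr = energy_and_zcr(frames)
        e_thr = energy[:noise_frames].mean() + 3 * energy[:noise_frames].std()
        z_thr = zcr[:noise_frames].mean() + 2 * zcr[:noise_frames].std()
        speech = (energy > e_thr) | ((zcr > z_thr) & (energy > 0.5 * e_thr))
        idx = np.where(speech)[0]
        return (idx[0], idx[-1]) if idx.size else None   # (start frame, end frame)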


Table 2.1: Speaker Recognition Systems with/without Smartphones

Reference | Year | Features | Classifiers | Mobile implementation | Special Methods | Database Used | Error % | Comments
[15] | 2017 | Feature size 40x7xt with 512 hidden units per direction, per layer | CNN (two types: ResNet, LACE) + BLSTM without frame-skipping | No | Added LM rescoring after confusion network, 1-bit SGD parallelization | LDC Broadcast News corpus and UW Conversational Web corpus | WER of 5.1% | Surpassed human accuracy
[16] | 2018 | - | TDNN + TDNN-LSTM + CNN-bLSTM + Dense TDNN-LSTM across two kinds of trees | No | - | TED-LIUM release 2 and LibriSpeech | WER of 3.19% on test data, 7.64% on others | Surpassed human accuracy
[46] | 2017 | Connectionist temporal classification (CTC)-based acoustic model | Recurrent neural network transducer (RNN-T): a twelve-layer LSTM encoder with a two-layer LSTM decoder | No | Through a streaming, all-neural, sequence-to-sequence architecture | Trained with 30,000 wordpieces | WER of 8.5% on voice search and 5.2% on voice dictation | Better than baseline at 8.3% on voice search and 5.4% on dictation
[17] | 2016 | - | HMM-TDNN trained with MMI + data augmentation (speed) + iVectors + 3 regularizations | No | With phone n-gram language model, fine tuning by word-lattice based sMBR objective function | Fisher, Switchboard | WER of 4.28% | Surpassed human accuracy
[35] | 2013 | MFCC | HMM | No | - | NATO words | WER of 25% | -
[47] | 2013 | - | Hierarchical Multilayer Model | Yes | - | MyIdea [48] | EER of 0.84% / 4.11% without / with impostors | Constrained temporal structures
[49] | 2013 | iVector, MFCC, LFCC, PLP and gender-independent training | Fusion of 9 subsystems | - | - | MOBIO [51] | EER of 10% for females and 7% for males | Real noise, text independent
[50] | 2013 | - | Fusion of all 12 procedures | - | - | MOBIO [51] | EER of 7% for females and 4.8% for males | Real noise, text independent
[52][53] | 2013 | MFCC | GMM | - | Vaulted Voice Verification (V3) | MIT database [54] | EER of 6% | Fusion of text independent and text dependent
[35] | 2013 | MFCC | VQ | Yes | Android implementation | 18 speakers | EER of 4.52% | -
[55] | 2013 | LFCC | DTW | Yes | - | 16 speakers | FAR of 13%, FRR of 12% | Same passphrase
[56] | 2012 | LFCC | HMM | - | Speech and speaker recognition | RSR2015 | EER around 1% for males and females | Long enrolment
[57] | 2012 | MFCC | GMM | Yes | Voice Activity Detection | MIT database [54] | - | -
[58] | 2012 | MFCC | kNN, VQ, GMM, SVM | Input | - | - | - | -
[59] | 2012 | MFCC | PLDA | Yes | - | - | - | -
[60] | 2012 | MFCC | VQ | No | - | - | - | -
[61] | 2012 | MFCC | VQ | No | - | - | - | -
[62] | 2012 | MFCC | GMM | Yes | - | - | - | -
[63] | 2012 | MFCC, LPC, PLP | SVM | Input | - | - | - | -
[11] | 2012 | MFCC | VQ | Yes | - | - | - | -
[64] | 2012 | MFCC | GMM | No | - | - | - | -
[65] | 2012 | MFCC | GMM | - | ISV, JFA | MOBIO [51] | HTER of 8.9% in males and 15.3% in females | Session variability model
[66] | 2011 | MFCC | GMM | Input | - | - | - | -
[67] | 2011 | MFCC | ANN | Yes | - | 15 speakers | EER of 7-8% | Same word
[68] | 2011 | MFCC, dynamic coefficients | DTW | Yes | - | 23 speakers | CIR of 80% in under 1 sec., 96% with 3 times the delay | Same word ("airplane")
[70] | 2011 | MFCC | GMM (UBM) | No | Additionally, cepstral mean normalization | BioTechDat [71] | EER of 4.8% | Noisy environment
[72] | 2011 | MFCC, LFCC | GMM | - | - | 50 speakers | 96.7% identification in clean data; with noise, 88% (10 dB SNR), 79.4% (0 dB SNR) and 76.7% (-5 dB) | Synthetic noise added
[73] | 2011 | Low-level features | MLP | - | - | CHAINS corpus [74] | CIR of 80% | Different environment conditions
[75] | 2010 | MFCC | ANN | No | - | - | - | -
[76] | 2010 | MFCC | GMM, SVM | No | JFA + i-vector | MOBIO [51] | EER of 10.47% (male) and 10.85% (female) | 1st MOBIO evaluation
[77] | 2010 | MFCC | GMM | Yes | - | - | - | -
[78] | 2009 | MFCC, SD | GMM | No | - | - | - | -
[79] | 2008 | AM-FM framework | GMM | - | - | CHAINS corpus [74] | CIR of 90% | Different environment conditions
[33] | 2007 | DLFBE | GMM | Yes | - | MIT database [54] | EER of 6.50% in office-office and 12% in office-street | Lombard effect
[54] | 2006 | MFCC | GMM | Yes | - | MIT database [54] | EER of 7.8% in office, 10% in lobby, 11.1% in street | Lombard effect
[80] | 2006 | DLFBE | Wiener filter and a universal compensation process | Yes | - | MIT database [54] | EER of 10.19% | Lombard effect
[81] | 2004 | MFCC | VQ | Yes | - | - | - | -
[82] | 2003 | MFCC | DTW, VQ | - | - | BioID [83] | EER of 2.7% | -
[21] | 1995 | - | GMM | - | Background normalization, likelihood ratio test | TIMIT, NTIMIT, YOHO, Switchboard | - | -


2.1.5 Some Use Cases

Recently, the use of multifactor authentication has increased, as has the use of biometric factors that include voice and speech. For example[84]:

Google Trusted Voice:

Available in early 2015, this voice recognition tool allows an Android smartphone to be unlocked with a voice command. After setting up the service, whenever a user says "Ok Google" from a secure lock screen, Google can be asked to do things for the user, or visit sites, without manually unlocking the device.

Many banks are using voice recognition to improve the customer experience with both mobile and call centers. Organizations using voice recognition include the following:

• Canadian bank Tangerine combines voice recognition with speech recognition, using Nuance, so that voice banking users can interact with the mobile banking app through a conversational interface.
• USAA, ING, US Bank, and the Polish banks Meritum Bank and Bank SMART all use voice recognition.
• Wells Fargo is using SpeechPro's biometric technology, which combines voice, facial, and anti-spoofing "liveness" tests to authenticate customers.
• Varam Capital, a provider of micro-finance inclusion solutions in India, is using SayPay to authenticate payment transactions with voice.

2.2 Different Speech Recognition Models

2.2.1 Gaussian Mixture Models

Gaussian Mixture Models (GMMs) have been widely used in density modeling and clustering. The GMM as a statistical model for Fourier-spectrum-based speech features plays an important role in modelling the distribution of feature vectors of speaker utterances.


Generally, the system has one GMM for each speaker [21]. GMMs are considered to have a universal approximation ability because they can closely model any density function provided that it contains enough mixture components [85].

Let X = {x1, x2, ..., xN} be a data set of N vectors xn, each a d-dimensional feature vector extracted from utterances of a speaker. Since the distribution of these vectors is unknown, a Gaussian mixture can be used to approximately model this distribution. The GMM is a weighted sum of M mixture components of multivariate Gaussians, each one represented by

g(x_n \mid \mu_i, \Sigma_i) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_i|^{1/2}} \exp\left( -\frac{1}{2} (x_n - \mu_i)^{T} \Sigma_i^{-1} (x_n - \mu_i) \right) \qquad (1)

where μi is the mean vector and Σi is the covariance matrix of component i. The GMM can be represented as,

p(x_n \mid \lambda) = \sum_{i=1}^{M} w_i \, g(x_n \mid \mu_i, \Sigma_i) \qquad (2)

having the component weights w_i defined such that \sum_{i=1}^{M} w_i = 1.

In the training phase, the GMM parameters are estimated such that they best match the distribution of the training vectors. The most popular estimation method is the maximum likelihood (ML) estimation, generally executed with the Expectation-Maximization (EM) algorithm [86].

For a sequence of training vectors X, the current likelihood of the GMM is:

p(X \mid \lambda) = \prod_{n=1}^{N} p(x_n \mid \lambda) \qquad (3)

where λ denotes the parameter model, i.e., the weights wi, mean vectors μi and covariance matrices Σi of the Gaussian mixture.


The aim of ML estimation is to obtain a new parameter model λ̄ such that p(X|λ̄) ≥ p(X|λ). Maximizing this function directly is not easy and is computationally expensive, hence an auxiliary function Q is used:

Q(\lambda, \bar{\lambda}) = \sum_{n=1}^{N} \sum_{i=1}^{M} p(i \mid x_n, \lambda) \, \log\left[ \bar{w}_i \, g(x_n \mid \bar{\mu}_i, \bar{\Sigma}_i) \right] \qquad (4)

where p(i|xn, λ) is the posterior probability of component i, i = 1, ..., M, and satisfies,

p(i \mid x_n, \lambda) = \frac{w_i \, g(x_n \mid \mu_i, \Sigma_i)}{\sum_{k=1}^{M} w_k \, g(x_n \mid \mu_k, \Sigma_k)} \qquad (5)

The maximization of the Q function is performed using the EM algorithm. The basic principle of this algorithm is: if Q(λ, λ̄) ≥ Q(λ, λ) then p(X|λ̄) ≥ p(X|λ) [18].

EM is an iterative algorithm that consists of two steps: the expectation step and the maximization step.

The aim of the expectation step is to find the value of the likelihood, p(X|λ), assuming the current values of λ for the GMM. In the maximization step, the values of λ (μi, Σi and the component weight wi) for each Gaussian component i in the GMM are updated to λ̄ such that,

\bar{\mu}_i = \frac{\sum_{n=1}^{N} p(i \mid x_n, \lambda) \, x_n}{\sum_{n=1}^{N} p(i \mid x_n, \lambda)} \qquad (6)

\bar{\Sigma}_i = \frac{\sum_{n=1}^{N} p(i \mid x_n, \lambda) \, (x_n - \bar{\mu}_i)(x_n - \bar{\mu}_i)^{T}}{\sum_{n=1}^{N} p(i \mid x_n, \lambda)} \qquad (7)

and,

\bar{w}_i = \frac{1}{N} \sum_{n=1}^{N} p(i \mid x_n, \lambda) \qquad (8)


The process of training a GMM is described in four steps as follows:

Step 1: Generate the first posteriori probability p(i|xn, λ) following equation (5) and using random values for λ;

Step 2: Compute a new ̅ by updating the values of μ, Σ and W using equations (6), (7) and (8), respectively;

Step 3: Update the posteriori probability p(i|xn, λ) using the new values of λ computed in Step 2;

Step 4: Compute the auxiliary Q function following equation (4). Stop the training process if the increase in the value of Q at the current iteration relative to the value of Q at the previous iteration is below a chosen threshold, otherwise go to the next iteration starting at Step 2.
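The four steps above can be condensed into a short NumPy sketch; for brevity it assumes diagonal covariance matrices, which is a simplification of the general formulation:

    import numpy as np

    def gmm_em(X, M=8, n_iter=100, tol=1e-4, eps=1e-6):
        """EM training of a diagonal-covariance GMM, following Steps 1-4 above.
        X: (N, d) feature vectors of one speaker. Returns weights, means, variances."""
        N, d = X.shape
        rng = np.random.default_rng(0)
        w = np.full(M, 1.0 / M)                      # component weights
        mu = X[rng.choice(N, M, replace=False)]      # random initial means
        var = np.tile(X.var(axis=0), (M, 1)) + eps   # initial variances
        prev_ll = -np.inf
        for _ in range(n_iter):
            # E-step: posterior p(i | x_n, lambda), equation (5)
            log_g = -0.5 * (np.log(2 * np.pi * var).sum(axis=1)
                            + (((X[:, None, :] - mu) ** 2) / var).sum(axis=2))
            log_p = np.log(w) + log_g                              # (N, M)
            log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
            post = np.exp(log_p - log_norm)
            # M-step: update weights, means and variances, equations (6)-(8)
            Nk = post.sum(axis=0) + eps
            w = Nk / N
            mu = (post.T @ X) / Nk[:, None]
            var = (post.T @ (X ** 2)) / Nk[:, None] - mu ** 2 + eps
            # Convergence check on the total log-likelihood, equation (3)
            ll = log_norm.sum()
            if ll - prev_ll < tol:
                break
            prev_ll = ll
        return w, mu, var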

2.2.2 The HMM and Variants

The popularity of the HMM in speech recognition stems from its ability as a generative sequence model of acoustic features of speech. One of the most interesting and unique problems in speech modeling, and in the related speech recognition application, lies in the variable length of acoustic feature sequences. This unique characteristic of speech rests primarily in its temporal dimension. That is, the actual values of the speech features are correlated lawfully with the elasticity in the temporal dimension. As a consequence, even if two word sequences are identical, the acoustic data of the speech features typically have distinct lengths. For example, different acoustic samples of the same sentence usually have different data dimensionality, depending on how the speech sounds are produced and, in particular, how fast the speaking rate is.

Further, the discriminative cues among separate speech classes are often distributed over a reasonably long temporal span, which often crosses neighboring speech units. Other special aspects of speech include class-dependent acoustic cues. These cues are often expressed over diverse time spans that would benefit from different lengths of analysis windows in speech analysis and feature extraction.


The view that speech is a one-dimensional temporal signal, in contrast to image and video as higher-dimensional signals, is simplistic and does not capture the essence and difficulties of the speech recognition problem. Speech is best viewed as a two-dimensional signal, where the spatial (or frequency) and temporal dimensions have vastly different characteristics, in contrast to images where the two spatial dimensions tend to have similar properties. The "spatial" dimension in speech is associated with the frequency distribution and related transformations, capturing a number of variability types including primarily those arising from environments, speakers, accent, speaking style, and speaking rate, etc. The latter induces correlations between the spatial and temporal dimensions, and the environment factors include microphone characteristics, the speech transmission channel, ambient noise, and room reverberation.

The temporal dimension in speech, and in particular its correlation with the spatial or frequency-domain properties of speech, constitutes one of the unique challenges for speech recognition. The HMM addresses this challenge to a limited extent. In this case, a selected set of advanced generative models, as various extensions of the HMM, are aimed to address the challenge, where Bayesian approaches are used to provide temporal constraints as prior knowledge about aspects of the physical process of human speech production.

2.2.3 GMM-HMM Coupling

In speech recognition, one of the most common generative learning approaches is based on the Gaussian-mixture-model-based hidden Markov model, or GMM-HMM. Basically, a GMM-HMM is a statistical model that describes two dependent random processes, an observable process and a hidden Markov process. The observation sequence is assumed to be generated by each hidden state according to a Gaussian mixture distribution. A GMM-HMM is parameterized by a vector of state prior probabilities, the state transition probability matrix, and a set of state-dependent parameters of the Gaussian mixture models. In terms of modeling speech, a state in the GMM-HMM is typically associated with a sub-segment of a phone in speech. One important innovation in the use of HMMs for speech recognition is the introduction of context-dependent states, motivated by the


desire to reduce the output variability of the speech feature vectors associated with each state, a common strategy for "detailed" generative modeling.

A consequence of using context dependency is a vast expansion of the HMM state space, which, fortunately, can be controlled by regularization methods such as state tying. Such context dependency also plays a critical role in the recent advances of speech recognition in the area of discrimination-based deep learning, to be discussed in the following sections.

The introduction of the HMM and the related statistical methods to speech recognition in the mid-1970s was due to the highly efficient EM (maximum likelihood) algorithm. This method, often called the Baum-Welch algorithm, had been the main way of training HMM-based speech recognition systems until 2002, and is still one major step in training these systems nowadays.

The Baum-Welch algorithm serves as one major motivating example for the later development of the more general EM algorithm. The goal of the maximum likelihood or EM method (such as the Baum-Welch algorithm) in training GMM-HMM speech recognizers is to minimize the empirical risk with respect to the joint likelihood loss involving a sequence of linguistic labels and a sequence of acoustic data of speech, often extracted at the frame level.

In large-vocabulary speech recognition systems, it is normally the case that word-level labels are provided, while state-level labels are latent. Moreover, in training GMM-HMM-based speech recognition systems, parameter tying is often used as a type of regularization. For example, similar acoustic states of the tri-phones can share the same Gaussian mixture model.

The use of the generative model of HMMs for representing the dynamic (piecewise stationary) speech pattern and the use of the EM algorithm for training the tied HMM parameters constitute one of the most prominent and successful examples of generative learning in speech recognition.


This success12 has spread widely to machine learning and related communities. In fact, the HMM has become a standard tool not only in speech recognition but also in machine learning and its related fields, such as bioinformatics and natural language processing.

2.2.4 Multilayer Perceptron Neural Networks

Multilayer Perceptrons (MLPs) are neural networks (NNs) with one or more hidden layers[87]. Each layer is composed of a set of neurons. Two neurons belonging to adjacent layers are connected by directed arrows, and each connection has a weight that reflects its degree of importance. One method to adjust the weights is the Backpropagation algorithm, a supervised learning algorithm based on error correction. The Backpropagation algorithm is a generalization of the least squares algorithm and aims to minimize the root mean squared error between the current network output and the desired output through a gradient technique[88]. It consists of:

Step 1 - Propagation: After presenting the input pattern, the response of a unit is propagated as input to the next units in the next layer until the output layer, where the network response is obtained and the error is calculated.

Step 2 - Backpropagation: The synaptic weights are updated based on the difference between the output of the network and the desired output.
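A minimal Keras sketch of such an MLP is given below (Keras/TensorFlow being the toolkit acknowledged earlier); the toy data, layer sizes and loss function are illustrative assumptions, and calling fit runs exactly the two steps above, forward propagation followed by backpropagation of the error with gradient descent:

    import numpy as np
    from tensorflow import keras

    # Hypothetical toy data: 39-dimensional feature frames, 10 target classes.
    x_train = np.random.rand(1000, 39).astype("float32")
    y_train = keras.utils.to_categorical(np.random.randint(0, 10, 1000), 10)

    # An MLP with one hidden layer: each training batch is propagated forward,
    # the error is computed at the output, and the weights are then updated by
    # backpropagation with stochastic gradient descent.
    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu", input_shape=(39,)),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=5, batch_size=32)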

Notable work using NNs includes Waibel et al.'s time delay neural network (TDNN)[89][90] and Morgan and Bourlard's artificial neural network (ANN)/hidden Markov model (HMM) hybrid system [91][92].

Interest in neural network-based ASR was renewed at "The 2009 NIPS Workshop on Deep Learning for Speech Recognition and Related Applications"[93], where Mohamed et al. from the University of Toronto presented a primitive version of a deep neural network (DNN)-HMM hybrid system for phone recognition [94].

12 For many machine learning as well as speech recognition researchers, the success of HMMs in speech recognition is a bit surprising due to the well-known weaknesses of the HMM in modeling speech dynamics.


Detailed error analysis and comparisons of the DNN with other speech recognizers were carefully conducted at Microsoft Research (MSR) jointly by MSR and University of Toronto researchers, and relative strengths and weaknesses were identified prior to and after the workshop. See [95] for discussions and reflections on this part of the early studies, which were carried out with "a lot of intuitive guesses without much evidence to support the individual decisions." In [93], the same type of ANN/HMM hybrid architecture was adopted as that developed in the early 1990s [91][92], but it used a DNN to replace the shallow multilayer perceptron (MLP) often used in the early ANN/HMM systems. More specifically, the DNN was constructed to model mono-phone states and was trained using the frame-level cross-entropy criterion on conventional MFCC features. They showed that by just using a deeper model they managed to achieve a 23.0% phone error rate (PER) on the TIMIT core test set. This result is significantly better than the 27.7% and 25.6% PER achieved by a mono-phone and a tri-phone Gaussian mixture model (GMM)-HMM, respectively, trained with the maximum likelihood estimation (MLE) criterion [96], and is also better than the 24.8% PER achieved by a deep, mono-phone version of generative models of speech developed at MSR, with distinct recognition error patterns [97].

Although their model performed worse than the tri-phone GMM-HMM system trained using the sequence-discriminative training (SDT) criterion, which achieved 21.7% PER on the same task, and was evaluated only on the phone recognition task, it had potential: in the past it had been hard for the ANN/HMM hybrid system to beat the context-dependent (CD)-GMM-HMM system trained with the MLE criterion, and the DNN and the deep generative models were observed to produce very different types of recognition errors with explainable causes based on aspects of human speech production and perception mechanisms. In the meantime, collaborations between MSR and University of Toronto researchers, which started in 2009, also looked carefully into the use of raw speech features, one of the fundamental premises of deep learning being not to use human-engineered features such as MFCCs. Deep autoencoders were first explored on speech during 2009-2010 at MSR for binary feature encoding and bottleneck feature extraction, where deep architectures were found superior to shallow ones and spectrogram features superior to MFCCs[98]. All of these insightful and exciting results and


progress on speech feature extraction, phone recognition, error analysis, etc. had never been seen before in speech research history and pointed to the high promise and practical value of deep learning. This early progress excited MSR researchers to devote more resources to pursuing ASR research using deep learning approaches, the DNN approach in particular. A series of studies along this line can be found in [99].

2.2.4.1 Deep Neural Networks and Deep Learning

In recent years, the increased success of voice search (VS), short message dictation (SMD), large vocabulary continuous speech recognition (LVCSR) and virtual speech assistants (e.g., Apple's Siri, Google Now, and Microsoft's Cortana) has been possible due to deep learning techniques and the use of big data, together with the significant increase in mobile computing ability [4].

Therefore, deep neural networks (DNNs) have become standard techniques in various speech and language applications including speech recognition and spoken term detection [100][101] and well-trained DNNs make it possible to build high-performance systems for such tasks.

However, DNNs have very complicated configurations including network topologies and several tuning parameters (number of hidden states and learning rate in each layer) that greatly affect the performance of speech processing applications.

DNNs have another important configuration, tuned by human experts, that deals with their topologies. In fact, the progress in neural network speech processing arises from the discovery of appropriate DNN topologies, e.g., deep layers, deep stacking networks, and tandem concatenation of original inputs with neural network (bottleneck layer) outputs for GMM features [102][103][104][105]. We call this problem the structure optimization of DNNs. There are several studies of structure optimization for Gaussian-based models [106][107][108][109][110]; however, these studies do not address the optimization of DNNs.


It has been stated that a combination of a set of deep learning techniques has led to more than 1/3 error rate13 reduction over the conventional state-of-the-art GMM–HMM framework on many real-world LVCSR tasks and helped to pass the adoption threshold for many real-world users [4].

2.2.5 Evolutionary Algorithms with DNN

Evolutionary algorithms are a generic term for nonlinear/nonconvex optimization algorithms inspired by biological evolution mechanisms, including genetic algorithms and evolution strategies, and are widely used in various applications.

In an evolution strategy (ES) or genetic algorithm (GA), breeding means taking two members of a population and generating one or more children, where each child represents a combination of its parents. For a neural network, each child is a random assortment of parameters from its parents. For instance, one child might have the same number of layers as its mother and take the rest of its parameters from its father, while a second child of the same parents may have the opposite. This mirrors real-world biology and ultimately can lead to an optimized network quickly.

Applications of evolutionary algorithms to acoustic modeling in speech processing have been studied by various researchers [112][113][114][115][116]. However, these approaches have mainly focused on model parameter optimization or a few tuning parameters.

A popular genetic algorithm [117] and different variations of the covariance matrix adaptation evolution strategy (CMA-ES) [118][119] have been used as evolutionary algorithms. They share the following common functions:

13 For example, the context-dependent (CD)-DNN-HMM achieved a one-third error rate reduction on the Switchboard conversational transcription task over conventional GMM-HMM systems trained with sequence-discriminative criteria [111].


1) Generating various vectors encoding system configurations, with multiple hypotheses created by a process of simulated biological gene generation in a genetic algorithm and by probabilistic random-number sampling in CMA-ES, and testing them.

2) Scoring multiple DNN systems with these prepared configurations, which includes training and evaluating the DNNs to obtain the scores and usually takes a very long time. However, this step can be parallelized across configurations.

These steps are performed iteratively to search for the optimal tuning parameters in less time, while avoiding local minima and converging more reliably towards the global minimum.

A simple genetic algorithm is given below:

Simple Genetic Algorithm over DNN
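A minimal, illustrative sketch of such a genetic loop over DNN hyper parameters is shown below; the parameter names, candidate values and the fitness helper are assumptions for illustration only, not the exact settings used later in this thesis:

```python
import random

# Illustrative search space; the names and candidate values are assumptions
PARAM_CHOICES = {
    "nb_layers": [1, 2, 3, 4],
    "nb_neurons": [64, 256, 1024],
    "activation": ["relu", "elu", "tanh", "sigmoid"],
    "optimizer": ["rmsprop", "adam", "sgd", "adagrad"],
}

def random_network():
    """Sample one candidate DNN configuration at random."""
    return {key: random.choice(values) for key, values in PARAM_CHOICES.items()}

def breed(mother, father, mutate_chance=0.2):
    """Child takes each hyper parameter from either parent, with an occasional mutation."""
    child = {key: random.choice([mother[key], father[key]]) for key in PARAM_CHOICES}
    if random.random() < mutate_chance:
        key = random.choice(list(PARAM_CHOICES))
        child[key] = random.choice(PARAM_CHOICES[key])
    return child

def evolve(fitness, population_size=20, generations=10, retain=0.4):
    """fitness(network) -> accuracy; in practice this trains and evaluates a DNN,
    so scores should be cached rather than recomputed as in this short sketch."""
    population = [random_network() for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: max(2, int(retain * population_size))]
        children = []
        while len(survivors) + len(children) < population_size:
            mother, father = random.sample(survivors, 2)
            children.append(breed(mother, father))
        population = survivors + children
    return max(population, key=fitness)
```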

It has been shown experimentally that an efficient optimization strategy for DNN structure and parameters using evolutionary algorithms can be applied [120], with a binary vector based on a directed acyclic graph (DAG) representation, where each connection between neuron modules is represented by a binary value.


Genetic algorithm (GA) and covariance matrix adaptation evolution strategy (CMA-ES) efficiently optimize the performance jointly with respect to the above binary values and the other tuning parameters.

Experiments on phoneme recognition and spoken digit detection tasks show the effectiveness of this approach, which discovers appropriate DNN structures automatically [120].

It has also been shown experimentally that a GA and a DNN working jointly (often referred to as a Deep GA) perform much better than a DNN alone [121].

2.2.5.1 Performance evaluation using GA

Suppose it takes five minutes to train and evaluate a DNN on a dataset, and suppose there are four parameters with five possible settings each. Trying them all would take (5^4) * 5 minutes, i.e., 3125 minutes, or about 52 hours.

As an alternative, a genetic algorithm can be used to evolve 10 generations with a population of 20, keeping the top 25% (i.e., 5) plus a few more randomly selected members to avoid local minima, giving roughly 8 parents per generation. In the first generation, 20 networks must be scored, taking about 20 * 5 = 100 minutes. Every generation after that only requires around 12 new runs, breeding 12 children to refill the population to 20 while the 12 worst performers die off. That is 100 + (9 generations * 5 minutes * 12 networks) = 640 minutes, or about 11 hours. In other words, the parameter tuning time can be reduced by almost 80% while still finding near-optimum parameters.
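The arithmetic above can be verified with a few lines; the numbers simply mirror the example and are not tied to any particular dataset:

```python
# Exhaustive grid: 4 parameters, 5 settings each, 5 minutes per training run
grid_minutes = (5 ** 4) * 5              # 3125 minutes, about 52 hours

# GA schedule: 20 networks in generation 1, then ~12 new networks in each of 9 more generations
ga_minutes = 20 * 5 + 9 * 12 * 5         # 100 + 540 = 640 minutes, about 11 hours

print(grid_minutes / 60, ga_minutes / 60)    # ~52.1 hours vs ~10.7 hours
print(1 - ga_minutes / grid_minutes)         # ~0.795, i.e. roughly an 80% saving
```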

It has been observed that the Deep GA can successfully evolve networks with over four million free parameters, the largest neural networks ever evolved with a traditional evolutionary algorithm [121]. Such results (1) expand the sense of the scale at which GAs can operate, (2) suggest, intriguingly, that in some cases following the gradient is not the best choice for optimizing performance, and (3) make immediately available the multitude of techniques developed in the neuro-evolution community to improve performance on reinforcement learning (RL) problems.


Thus, combining DNNs with novelty search can solve high-dimensional problems on which reward-maximizing algorithms such as the deep Q-network (DQN) and Asynchronous Advantage Actor-Critic (A3C) fail [121]. Additionally, the Deep GA parallelizes better than ES, A3C, and DQN, and enables a state-of-the-art compact encoding technique that can represent million-parameter DNNs in thousands of bytes.

2.2.5.2 Steps in a GA Model for Neural Network Implementation

1. Create a population of (randomly generated) networks (say N networks).
2. Score each network of the population; that is, each network is trained and tested for optimum weights. Since this is a speech/speaker classification task, the classification accuracy serves as the fitness function.
3. Sort all the networks in the population by score (accuracy/fitness). Select the best members, based on some percentage (say p%) of the top networks, to become part of the next generation and to breed children for it.
4. In addition, choose a few of the non-top networks randomly. This helps find potentially lucky combinations between worse performers and top performers, and also helps avoid getting stuck in a local minimum.
5. Kill off the rest (survival of the fittest, i.e., let roughly N*(1-p%) members die).
6. Randomly mutate some of the parameters of some of the networks in an attempt to find even better candidates. To keep the population at N networks, the open spots left by step 5 are filled via breeding.
7. Repeat from step 2. Each pass through these steps is called a generation.
8. Repeat this process enough times and the very best possible members of the population should remain.

2.2.6 DNN-HMM Hybrid Model

In the DNN-HMM hybrid model, the HMM is used to model the sequential property of the speech signal and a DNN is used to model the emission probabilities of the HMM. The DNN-HMM hybrid system outperforms conventional Gaussian mixture model (GMM-HMM) systems significantly on many large vocabulary continuous speech recognition tasks [6]. Here the architecture and training procedure of the DNN-HMM hybrid system are described, and the key components of such systems are pointed out by comparing a range of system setups.

The DNNs described in Section 2.2.4.1 cannot be used directly to model speech signals, since speech signals are time series while DNNs require fixed-size inputs. To exploit the strong classification ability of DNNs in speech recognition, we need a way to handle the variable-length problem of speech signals. The combination of artificial neural networks (ANNs) and HMMs as an alternative model for ASR started between the end of the 1980s and the beginning of the 1990s [6]. A variety of different architectures and training algorithms were proposed at that time. This line of research was revived after the strong representation-learning power of DNNs became well known.

Figure 2.2: DNN-HMM hybrid architecture.

One of the approaches that have been proven to work well is to combine DNNs with HMMs in a framework called DNN-HMM hybrid system as illustrated in Figure 2.2 above.


In this framework, the dynamics (sequential property) of the speech signal are modeled with HMMs and the observation probabilities are estimated through DNNs. Each output neuron of the DNN is trained to estimate the posterior probability of a continuous-density HMM state given the acoustic observations.
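In decoding, such state posteriors are typically converted into scaled likelihoods by dividing by the state priors before being used as HMM emission scores. A minimal sketch of this conversion (array shapes and the epsilon guard are assumptions) is:

```python
import numpy as np

def scaled_log_likelihoods(state_posteriors, state_priors, eps=1e-10):
    """Turn DNN outputs p(state | frame) into pseudo log-likelihoods
    log p(frame | state) = log p(state | frame) - log p(state) + const,
    which the HMM then uses as emission scores during decoding."""
    return np.log(state_posteriors + eps) - np.log(state_priors + eps)

# state_posteriors: (num_frames, num_states) array from the DNN softmax layer
# state_priors:     (num_states,) array estimated from state counts in the training alignments
```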

2.3 Challenges

2.3.1 Vulnerabilities in Mobile Speaker Verification

A speaker recognition system should resolve at least the following issues [11][76]:

• Algorithms should be sophisticated enough to work around problems such as crosstalk and background noise.
• Sufficient anti-spoofing safeguards and liveness detection to become aware of changes in speakers or playback recordings.
• Automated fraudster detection to build fraudster databases and detect malicious individuals as they interact with a smartphone.

Such a system has an intrinsic vulnerability based on the fact that anyone can record the voice of the authentic person in order to fool the biometric system. Nowadays, many devices are able to record voices or even phone conversations. Furthermore, voice is a biometric characteristic that is often exposed, even more than fingerprints. The different cases of spoofing attacks on automatic speaker verification systems are well summarized in the recent work of [122].

Some of these attacks can occur together, so the solutions provided by the ASV system should deal with all of them at the same time.

2.3.1.1 Impersonation

Impersonation implies spoofing attacks with human-altered voices, mostly involving mimicry of prosodic or stylistic cues rather than aspects related to the vocal tract. Impersonation is therefore considered more effective in fooling human listeners than a genuine threat to today's state-of-the-art automatic speaker verification (ASV) systems [123]. The same conclusion was reached in [124] in an experiment with a professional imitator, which provided access to the recordings used and showed that they can fool human listeners but not ASV systems.

2.3.1.2 Replay

Replay involves the presentation of speech samples earlier captured from a genuine client, either as continuous speech recordings or as samples resulting from the concatenation of shorter segments. Vulnerabilities were investigated in [125] by replaying far-field recorded speech to forge an ASV system. Using a baseline ASV system, they concluded that with such recordings the equal error rate (EER) increased from 1% to almost 70%. The same authors showed that it is possible to detect such spoofing attacks by measuring the channel differences caused by far-field recording [126], reducing the error rates to around 10%. However, today's state-of-the-art approaches to channel compensation leave some systems even more vulnerable to replay attacks.

2.3.1.3 Speech synthesis

ASV vulnerabilities to synthetic speech were first demonstrated over a decade and a half ago, using an HMM-based, text-prompted ASV system and an HMM-based synthesizer whose acoustic models were adapted to specific human speakers [127]. Experimental results showed that the FAR for synthetic speech reached over 70% when the synthesis system was trained using only one sentence from each genuine user; however, this work involved only 20 speakers.

Successful detection of synthetic speech was presented in [128], with a system based on prior knowledge of the acoustic differences of specific speech synthesizers, such as the dynamic ranges of spectral parameters at the utterance level and the variance of the higher-order MFCC coefficients. In their experiments, they showed that, because synthetic speech is generated from HMM parameters and the training of HMM parameters can be viewed as a smoothing process, the variance of the higher-order MFCCs of synthetic speech is smaller than that of real voices, and this can be used to detect synthetic speech.


Spoofing experiments using a single HMM-based synthetic trial against a forensic speaker verification tool were also reported in [129], highlighting the serious vulnerability and the need to include synthetic-voice detection so that speech synthesizers do not pose a genuine threat to ASV.

Larger-scale experiments using the Wall Street Journal corpus, containing 300 speakers and two different ASV systems (GMM-UBM and an SVM using Gaussian supervectors), were reported in [130]. Using a state-of-the-art HMM-based speech synthesizer, the FAR was shown to rise to 81%. They proposed a new feature based on relative phase shift to detect synthetic speech, which was able to reduce the FAR to 2.5%.

The same authors complemented the previous work in [131] by analyzing words that provided strong discrimination between human and synthetic speech, resulting in 98% accuracy in correctly classifying human versus synthetic speech.

Other approaches to synthetic speech detection use fundamental frequency (F0) statistics [132], based on the difficulty of reliable prosody modeling in both unit-selection and statistical parametric speech synthesis. F0 patterns generated by the statistical parametric approach tend to be over-smoothed, and the unit-selection approach frequently exhibits 'F0 jumps' at concatenation points of speech units. Results showed 98% accuracy in correctly classifying human speech and 96% accuracy in correctly classifying synthetic speech.

2.3.1.4 Voice conversion

Voice conversion is a sub-domain of voice transformation which aims to convert one speaker's voice towards that of another [133]. When applied to spoofing, the aim of voice conversion is to synthesize a new speech signal such that the extracted ASV features are close, in some sense, to those of the target speaker.

This type of spoofing attack has been studied in depth by the authors of [134]. In this work, they noted that certain short intervals of converted speech yield extremely high scores or likelihoods in ASV, even though these intervals are not representative of intelligible speech. They showed that artificial signals optimized with a genetic algorithm provoked increases in the EER from 10% to almost 80% for a GMM-UBM system and from 5% to almost 65% for a factor analysis (FA) system.

Two approaches to artificial signal detection were reported in [135] by the same authors. Experimental work shows that supervector-based SVM classifiers are naturally robust to such attacks, whereas all spoofing attacks can be detected using an utterance-level variability feature which detects the absence of the natural, dynamic variability characteristic of genuine speech. An alternative approach based on voice quality analysis is less dependent on explicit knowledge of the attack but less effective in detecting it.

A related approach to detect converted voice has been recently proposed in [136], also by the same authors. Probabilistic mappings between source and target speaker models are shown to yield converted speech with less short-term variability than genuine speech. The threshold average pair-wise distance between consecutive feature vectors is used to detect converted voice with an EER of under 3%.

Additionally, the authors of [137] studied how to distinguish natural speech from converted speech, showing that features derived from the phase spectrum tremendously outperform MFCCs, reducing the EER from 20.20% (MFCC) to 2.35%.

2.3.1.5 Hearing CAPTCHA

In [138] the authors proposed a CAPTCHA system that asks the user to repeat a random sentence. The reply is analyzed to verify that it is the requested sentence, not a recording, and was said by a human rather than a speech synthesis system. Using an acoustic model trained on the voices of over 1000 users, their system can verify the user's answer with 98% accuracy and distinguish humans from computers with 80% success. The same authors recently presented an article [139] proposing two implementations of the CAPTCHA system on mobile devices, concluding that "seesay" is more comfortable for users than "hearsay".


2.3.2 Cost

One of the main challenges of speaker recognition systems is their high computational cost, which must be reduced before they can be incorporated into a mobile phone. Many researchers focus on reducing the computational load of recognition while keeping the accuracy reasonably high. For this purpose, optimizing vector quantization (VQ) has been proposed in many works [140]. This method reduces the number of test vectors by pre-quantizing the test sequence before matching; consequently, unlikely speakers can be rejected quickly.

Another widely used option for this is a generalization of the GMM [21].

2.4 Terminologies

2.4.1 FAR and FRR

A verification system has to deal with two kinds of events: either the person claiming a given identity is the one who he claims to be (in which case, he is called a client), or he is not (in which case, he is called an impostor). Moreover, the system may generally take two decisions: either accept the client or reject him and decide he is an impostor.

Thus, the system may make two types of errors: a false acceptance, when the system accepts an impostor, and a false rejection, when the system rejects a client.

Let FA be the total number of false acceptances made by the system, FR the total number of false rejections, NC the number of client accesses, and NI the number of impostor accesses. In order to be independent of the specific dataset distribution, the performance of the system is often measured in terms of the rates of these two errors, as follows:

FAR = FA / NI,   FRR = FR / NC
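As a small, self-contained illustration of these definitions (the counts below are placeholders, not results from this thesis):

```python
def far_frr(false_accepts, false_rejects, impostor_trials, client_trials):
    """FAR = FA / NI and FRR = FR / NC, following the definitions above."""
    return false_accepts / impostor_trials, false_rejects / client_trials

far, frr = far_frr(false_accepts=12, false_rejects=30, impostor_trials=1000, client_trials=600)
print(f"FAR = {far:.2%}, FRR = {frr:.2%}")   # FAR = 1.20%, FRR = 5.00%
```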


2.4.2 GA

A genetic algorithm is a gradient-free metaheuristic inspired by the process of natural selection. Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on bio-inspired operators such as mutation, crossover and selection. A typical genetic algorithm requires two prerequisites: a genetic representation of the solution domain, and a fitness function to evaluate each individual. The selection process preserves strong individuals while eliminating weak ones. The ways of performing mutation and crossover vary from case to case, often based on the properties of the specific problem.

2.4.3 GFCC

The Gammatone filter cepstral coefficient (GFCC) is an auditory-based feature used in speaker recognition. GFCC features are computed by taking the discrete cosine transform (DCT) of the output of a Gammatone filter bank. According to physiological observations, the Gammatone filter bank more closely resembles the cochlear filter bank. GFCC-based speaker identification has been found to achieve very robust performance.

Instead of the log operation commonly used in MFCC calculation, a cubic root is applied to extract GFCC features. Most of the information in 64-dimensional Gammatone features is retained in the lower 23 GFCC coefficients, due to the energy-compaction property of the DCT used in the method.

2.4.4 LACE



2.4.5 LSTM

Long short-term memory (LSTM) is a recurrent neural network architecture with memory that can persist over long periods of time. An LSTM is well suited to classifying, processing and predicting time series when there are time lags of unknown size and duration between important events.

2.4.6 MFCC

The Mel-frequency cepstral coefficient (MFCC) is a short-time cepstral representation of speech which is widely used as a feature in speech processing applications such as voice recognition, speaker recognition, speech emotion detection, and gender classification. The representation commonly consists of a set of 39 coefficients in three groups: Ceps (the Mel-frequency cepstral coefficients), Del (the derivatives of Ceps) and Ddel (the derivatives of Del), with 13 features in each group.
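Since the experiment described later extracts MFCCs with the open source librosa library, a minimal extraction sketch (the file path and parameter values are illustrative) is:

```python
import librosa

# Load one utterance; sr=None keeps the file's native sampling rate
signal, sr = librosa.load("sample.wav", sr=None)

# 13 cepstral coefficients per frame; deltas and delta-deltas give the usual 39-dimensional feature
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
delta = librosa.feature.delta(mfcc)
delta2 = librosa.feature.delta(mfcc, order=2)

print(mfcc.shape)   # (13, number_of_frames)
```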

2.4.7 WER and WRR

The performance of a speech recognizer is measured in terms of the word error rate (WER) and word recognition rate (WRR). Word errors are categorized into numbers of insertions, substitutions and deletions, from which the word error rate and word recognition rate are computed.
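In their standard form, these measures can be written as

WER = (S + D + I) / N,   WRR = 1 - WER = (N - S - D - I) / N

where S, D and I are the numbers of substituted, deleted and inserted words and N is the total number of words in the reference transcription.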

2.5 Past Experiments with Several Databases

One of the initiatives was the First Mobile Biometry (MOBIO) Face and Speaker Verification Evaluation, carried out in the MOBIO [51] and TABULA RASA [141] projects as a competition to recognize people from their face and voice through their mobile phone. The results of this competition were presented in [76], making use of the first version of the MOBIO database. This database was composed of text-dependent and text-independent voice samples; however, the evaluation did not treat the two scenarios separately. The best speaker verification results using voice biometrics were obtained by the Brno University of Technology, achieving an EER of 10.47% (male) and 10.85% (female) [11].

T. I. Ren et al. in [57] proposed a hybrid GMM speaker verification system for mobile devices in variable environments. A neural network based on back-propagation learning and a kNN algorithm were used to determine the probabilities in the GMM models to verify the speaker from a voice signal. A voice activity detection (VAD) algorithm was also used to improve the voice identification ratio.

P. Motlicek et al. in [65] used the MOBIO [51] database to evaluate their voice algorithms in mobile environments. In this work, the authors presented a session variability model based on GMM. For speaker authentication, the speech segments are first isolated using energy-based voice detection. Then MFCC features are extracted from 25 ms frames with 10 ms overlap and a 24-band filter bank. The resulting 60-dimensional feature vectors contained 19 MFCCs together with energy, delta and double-delta coefficients. These feature vectors are examined with Inter-session Variability (ISV) and JFA methods, obtaining an HTER of 8.9% for males and 15.3% for females.

Additionally, in [82] H. Hertlein et al. proposed a verification system based on a password phrase with voice authentication. For this purpose they used the "BioID: A Multimodal Biometric Identification System" database [83]. In this work, the authors trained the verification system with limited samples and made the system consume very little time and memory. They used DTW for classification of MFCC features. Robustness is increased with speech enhancement and cepstral mean subtraction. In order to decrease the storage requirements, they used VQ with speaker-specific codebooks, obtaining a 2.7% EER with limited training.

In [56], A. Larcher et al. released the RSR2015 database for mobile text-dependent speaker verification using multiple pass-phrases. This database is composed of 300 users who said 30 phrases in different sessions using 4 mobile phones and 2 tablets. In this work, the authors used the ideas of the system proposed in [142] to evaluate its performance on this database, obtaining an EER of around 1% for both males and females. However, they required 3 sessions with 30 phrases per user for enrolment. They proposed a hierarchical multilayer acoustic model with three layers: the first to detect the gender, the second to choose some candidates and the third to detect the identity. The authors extracted 50 features per sample (19 linear frequency cepstral coefficients (LFCC), their derivatives, 11 second derivatives and the delta energy). They used HMMs for the classification.

The same main author, together with another team, recently presented a work proposing a text-dependent speaker verification system with constrained temporal structures [47]. There, the authors described these constraints as the limited enrolment data and computing power typically found in mobile devices. They proposed client-customized pass-phrases and new Markov model structures. They also worked on a hierarchical multilayer model, where the first layer detects the gender, the second detects the speaker candidates based on text-independent techniques and the third uses HMMs to make the decision for the text-dependent system. In this work, the authors used the MyIdea database [48], obtaining an EER of 0.84% when impostors do not know the pass-phrase and 4.11% when impostors use the authentic pass-phrase.

Another text-dependent speaker verification system for mobile devices was suggested in [67]. In this work, the authors proposed to verify all the people when saying the Arabic word "Naam" ("Yes" in English). They extracted the MFCCs for each sample and trained a model for each user by means of an ANN using a batch gradient descent algorithm. For training each model, they used samples of other users and other pass-phrases. They worked with a private database of 15 different speakers recorded on an Android HTC Nexus One, obtaining an EER of around 7-8%.

Y. Chen et al. [55] also presented a text-dependent speaker recognition scenario on Android platforms. In this case, they suggested a preprocessing step consisting of normalization, silence removal and end-pointing techniques. Then, they extracted LFCC features and performed classification based on DTW. In order to decide whether a sample belongs to a user, they proposed to use personal thresholds for each user based on their training samples. At enrolment, they suggested two types of training: a sequential training where the first sample is taken and the rest are aligned to it through DTW, and a collective training where all the training samples are taken and the one with the median length is chosen as the DTW reference. In this work, they used a private database of 16 people with 15 recordings of the same passphrase (the same sentence for all users). They used 10 samples for training and 5 for testing, obtaining a FAR of 13% and an FRR of 12%.

Additionally, Andrei et al. [68] proposed an implementation of a real-time text-dependent speaker identification system with recording equipment similar to that integrated in mobile devices and algorithms designed to consume little memory and processing power. The speaker identification is based on MFCCs and the derived dynamic coefficients, classifying features using a DTW approach. They constructed a private database of 23 speakers saying the Romanian equivalent of the word "airplane". The recordings were captured using a low-cost microphone attached to a low-cost MP4 player at a sampling frequency of 8 kHz and 8 bits per sample. The authors claimed a CIR of 80% on their database while answering each identification request in less than one second. The CIR increased to 96% when using the dynamic coefficients, but the time required also rose by a factor of 3.

I. Kramberger et al. [70] presented a door-phone embedded system and a platform with speech technology support for recognition, identification and verification, with an emphasis on noisy environments. They used the BioTechDat database, captured from telephone speech in noisy environments in the KEYSPOT project [71]. The proposed algorithm is based on modeling each speaker with a GMM. To minimize the influence of background sounds, they used background modeling, in which they trained a single speaker-independent GMM background model, also called a Universal Background Model (UBM). In this work too, the voice signals were converted to MFCC features, and cepstral mean normalization was carried out to enhance the speaker verification accuracy. This algorithm obtains an EER of 4.80%.


Another initiative to study speaker verification on mobile phones was carried out by MIT in 2006 [53]. In this work, the authors presented a corpus and some preliminary experiments. The database was captured with a handheld device provided by Intel. There were three environments with different noise conditions (office, lobby, street) and two different types of microphones. Because recordings were made in noisy environments, this corpus contains the Lombard effect (speakers alter their style of speech in noisier conditions in an attempt to improve intelligibility), an effect that is missing from databases that add noise electronically. They captured a list of phrases from 48 speakers, extracted MFCC features from speech regions of the signal, and built speaker models based on GMM. The EER obtained was 7.77% in the office scenario, 10.01% in the lobby and 11.11% in the street.

The same MIT database was also used by J. Ming et al. [57] in a comparative study of methods for handheld speaker verification in realistic noisy conditions. In this case, the authors used de-correlated log filter-bank energies (DLFBE) as an alternative to MFCC features. The best algorithm evaluated is based on a Wiener filter that estimates the power spectrum of the noise at the beginning of each voice sample in order to remove it, together with a universal compensation process using simulated noise. With this approach, they obtained an EER of 10.19%, reducing the EER of the baseline model (19.96%).

One year later, the same group, J. Ming et al. [33], continued working on robust speaker recognition in noisy conditions. They used data from the office scenario to train the system and data from the office and the street to test it. They again used DLFBE features but modeled the voice with GMMs, obtaining an EER of 6.50% in the office-office scenario and 12% in the office-street scenario.

The problems of noisy environments in text-dependent speaker identification were also studied in [72]. In this case, the authors prepared a synthetic database, adding speech and F16 noises at -5 dB, 0 dB and 10 dB signal-to-noise ratio (SNR) levels to a clean database of 50 speakers and 10 Hindi digits. They compared MFCC and LFCC features with GMM-based classification, with the best results for the former. They obtained a 96.65% speaker identification rate on the clean database; however, this rate is reduced to 88.02% (10 dB SNR), 79.42% (0 dB SNR) and 76.71% (-5 dB SNR).


J. Novakovic [73] used a multilayer perceptron (MLP) to classify the voice samples. He used low-level features, such as intensity, pitch, formant frequencies and bandwidths, and spectral coefficients, in order to train an MLP for each user. In this case, he used the CHAINS corpus [74], made up of 36 speakers recorded under a variety of speaking conditions. The best MLP approach obtains an accuracy of 80%.

This corpus was also used by M. Grimaldi et al. [79] to develop a speaker identification system using instantaneous frequencies. In this case, the authors proposed to use an AM-FM framework to compute the instantaneous frequency of the speech signal instead of MFCC features. Using these features on this database with a GMM classifier improves the accuracy of the system to around 90%.

One noteworthy competition was reported and further synthesized by E. Khoury et al. in the paper "The 2013 speaker recognition evaluation in mobile environment" [50], which was conducted in the BEAT project [143]. This was a text-independent competition but with conversations obtained from real mobile phones, where the competitors completed the MOBIO database with a mobile scenario composed of face and voice samples captured from mobile phones at a sampling rate of 16 kHz. The speech segments were acquired with real noise and some of them are shorter than 2 seconds. The best results in terms of EER were around 10% for females and 7% for males [49]. The authors of [50] then proposed a fusion of all 12 evaluated systems, obtaining an EER of 7% for females and 4.8% for males.

The best performance for males [49] in the competition was shown by i-vector based subsystems made of 9 different total variability models combined with score fusion. All the subsystems were identical but used different acoustic features: 3 different cepstral-based features (MFCC, LFCC, perceptual linear prediction (PLP)) were extracted over 3 different frequency regions (0-8 kHz, 0-4 kHz and 300-3400 Hz). A gender-independent training was performed with a low dimensionality of the voices.

Furthermore, R. C. Johnson et al. [52][57] proposed a verification protocol called Vaulted Voice Verification (V3) using both text-dependent and text-independent speaker recognition. The experiments used the MIT database [53], obtaining an EER of approximately 6%. They created the models of each person by means of MFCC and GMM.

Later, a prototype of an Android implementation working on a mobile phone was presented in [35], obtaining an EER of 4.52% with a text-independent speaker recognition system based on MFCC and VQ. Additionally, they confirmed that different mobile devices have different parameters and therefore different performance, so they suggested a preliminary calibration step on the mobile device.

Very recently, a Google team investigated [46] improving the WER towards 5.2% by training end-to-end speech recognition models with the recurrent neural network transducer (RNN-T): a streaming, all-neural, sequence-to-sequence architecture which jointly learns acoustic and language model components from transcribed acoustic data. The model consists of an 'encoder', which is initialized from a connectionist temporal classification (CTC) based acoustic model, and a 'decoder', which is partially initialized from a recurrent neural network language model trained on text data alone. The entire neural network is trained with the RNN-T loss and directly outputs the recognized transcript as a sequence of graphemes. It was found that performance can be improved further through the use of sub-word units ('wordpieces') which capture longer context and significantly reduce substitution errors. The best RNN-T system, a twelve-layer LSTM encoder with a two-layer LSTM decoder trained with 30,000 wordpieces as output targets, achieves a word error rate (WER) of 8.5% on voice-search and 5.2% on voice-dictation tasks and is comparable to a state-of-the-art baseline at 8.3% on voice-search and 5.4% on voice-dictation.


CHAPTER 3 PROPOSED WORK

This chapter deals with the proposed work under this thesis. It explains the chosen algorithm, platform, compiler and language, data set, authentication framework, proposed solution and details of the experiments.

3.1 Algorithm

As described in earlier sections, starting from the fundamental neural network model for classification, researchers have been increasing the depth of the network as well as designing new inner structures to improve recognition accuracy. Although these networks have been shown to be effective, especially very recently, their structures are manually designed rather than learned, which limits the flexibility of the approach and increases the complexity of the designs and algorithms.

Therefore, we explore the idea of automatically learning the structure of deep neural networks efficiently using the genetic algorithm, as described in Section 2.2.5.1 above. We can even combine convolution, pooling and LSTM layers in our network designs, alongside plain dense networks. In this way, the total number of possible network structures grows exponentially with the number of layers. Since it is impractical to try all the candidates and find the best one, we apply the genetic algorithm to traverse this large search space efficiently.

The genetic algorithm involves constructing an initial generation of individuals (candidate solutions) and performing genetic operations to allow them to evolve. To this end, we represent each network structure by a combination of hyper parameters. We then define several standard genetic operations, i.e., selection, mutation and crossover, which eliminate weak individuals of the previous generation and use the stronger ones to generate competitive new individuals. The quality of each individual is determined by its recognition accuracy on a reference dataset. Throughout the genetic process, we evaluate each individual (i.e., network structure) by training it from scratch. The genetic process ends after a fixed number of generations.

Since a complete network training process must be run for each generated individual in this GA implementation, which is computationally expensive, we restricted the experiment to a finite set of samples to reduce the experiment time. We therefore chose a single small speech dataset and demonstrate the method's ability to find high-quality network structures.

3.2 Platform, Compiler and Language

To expedite the experiment, we used Keras, a high-level neural networks API with a rich deep learning library, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. By default, Keras uses TensorFlow as its tensor manipulation library.

Using Keras provides the following advantages:

1. Easy and fast prototyping (through user friendliness, modularity, and extensibility).
2. Support for both convolutional networks and recurrent networks, as well as combinations of the two.
3. Seamless running on CPU and GPU.
4. Modular implementation using neural layers, cost functions, optimizers, initialization schemes, activation functions, regularization schemes, etc.
5. Models described in Python code (currently compatible with Python 2.7-3.6), which is compact, easier to debug, and allows for ease of extensibility.
6. Models that can be easily deployed across a greater range of platforms than any other deep learning framework, e.g. iOS, Android, web browsers, Google Cloud, Python webapp backends, the JVM and Raspberry Pi.


3.3 Data Set

There are several open source speech data sets available, such as:

a) The Common Voice corpus: voice samples donated through Common Voice, available from https://voice.mozilla.org/en/data.
b) LibriSpeech: a corpus of approximately 1000 hours of 16 kHz read English speech derived from read audiobooks of the LibriVox project: http://www.openslr.org/12
c) TED-LIUM Corpus: made from audio talks and their transcriptions available on the TED website: http://www.openslr.org/7/
d) VoxForge: set up to collect transcribed speech, at http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/, for use with free speech recognition engines.
e) Tatoeba: a large database of sentences, translations, and spoken audio for use in language learning. This download contains spoken English recorded by their community, available at https://tatoeba.org/eng/downloads.
f) Google Speech Commands Dataset: created by the TensorFlow and AIY teams and generally used for speech learning with Keras, TensorFlow, etc. The dataset has 65,000 one-second-long utterances of 30 short words, by thousands of different people, contributed by members of the public through the AIY website and released under a Creative Commons BY 4.0 license; it will continue to grow in future releases as more contributions are received. The dataset is designed for building basic but useful voice interfaces, with common words, digits, and directions included, and is available at https://www.tensorflow.org/tutorials/audio_recognition.

The experiment here is conducted with Google Speech Commands Dataset.

3.4 Experimental Setup

There are two phases in the authentication system: a training phase and a testing phase. During the training phase, the voice signal is pre-processed with end-point detection modules. The end-point detection module determines the start and end points of the first word when the user answers an incoming call in a noisy environment. Another module reduces the environmental noise in the voice signal. Then MFCC features are extracted from the voice signal; the MFCC calculation used 12 coefficients and 256-sample frames. For speaker modeling, vector quantization (VQ) with 16 MFCC features is performed to generate a user template. VQ is selected because of its ease of implementation. In the testing phase, another voice signal is used. This voice signal goes through the same preprocessing and feature extraction processes, and the resulting MFCC features are compared with the users' template obtained from the optimized network converged using the neural network and genetic algorithms.
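As a rough sketch of the VQ template step described above, a codebook can be trained with a clustering routine such as scikit-learn's KMeans; the codebook size, array shapes and distortion measure here are illustrative assumptions rather than the exact configuration of this setup:

```python
import numpy as np
from sklearn.cluster import KMeans

def train_codebook(mfcc_frames, codebook_size=16):
    """Build a VQ codebook (user template) from MFCC frames of shape (num_frames, num_coeffs)."""
    kmeans = KMeans(n_clusters=codebook_size, n_init=10, random_state=0).fit(mfcc_frames)
    return kmeans.cluster_centers_

def quantization_distortion(mfcc_frames, codebook):
    """Average distance of test frames to their nearest codeword; lower means a closer match."""
    dists = np.linalg.norm(mfcc_frames[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.min(axis=1).mean()
```

At verification time, the test utterance would be accepted if its distortion against the claimed speaker's codebook falls below a chosen threshold.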

3.5 Proposed Solution

1. Instead of MATLAB, other C/C++/Java/Python based approaches such as HTK, Kaldi, bob, TensorFlow or Keras can be used for the simulation/experiment.
2. As the human ear is more sensitive to some frequencies than others, speech recognition behaves more naturally if the signal is turned into a set of Mel-frequency cepstral coefficients (MFCC), a two-dimensional, one-channel representation. The converted vectors can therefore be treated like images, and convolutional as well as dense neural networks can work on them.
3. Convolutional LSTM and sequential dense networks are to be tried and compared; other possible solutions can also be explored.
4. Each processor-intensive calculation should be a single step, to reduce the time to calculate and synchronize.
5. Training should be done in the cloud or on a computing device with parallel processors, a GPU or multiple CPUs, to reduce training time.
6. Runtime layer compression (deep architecture decomposition) can be advised.
7. The number of hidden layers, number of nodes in each layer, batch size, patience (number of consecutive trials tolerated if not converging), number of epochs, activation function, loss function, optimizer and other network parameters are to be determined by the GA.
8. Real-life usage on mobile devices, after the said training and testing is done, can be examined for these processes and implementations.


3.6 Details of the Experiment

3.6.1 Steps/Methods in the Experiment

1. Get recorded audio data from a reliable source (one channel only) for any number of classes (here we have chosen sample sets for 3 words only: 'Bed', 'Cat' and 'Happy').
2. Perform a downsampling operation to smooth the signals.
3. To prepare a standard vector for each audio file (saved in designated file paths) for classification, audio embedding via MFCC encoding is performed, MFCC being one of the most effective codings for speech signals. The MFCC is computed using the librosa library in Python (open source, available on GitHub).
4. MFCC vectors may vary in size for different audio inputs. In order to apply a convolutional network, we need a fixed-size vector for all of the audio files, i.e., the output vectors are padded with a constant value (e.g. 0). [A procedure save_data_to_array reads all audio files from each directory and saves14 the vectors in a .npy file named after the label.] The MFCC is therefore audio data converted into image-like data.
5. Prepare the train set and test set from the data formatted so far. The open source sklearn library has the function train_test_split, which automatically splits the whole dataset and can be used here.
6. If we print the data shape using print(X_train.shape), the output has shape (3112, 20, 11), where the first number is the number of rows, i.e., audio samples. So for each audio file we get a 20 by 11 matrix, which is just like a 2-dimensional image. For the convolutional network, we have to reshape it by adding a depth of 1, as if it were an image with a single channel.
7. Further, the train/test split actually outputs class indices such as [0, 1, 2], which we need to one-hot encode to perform the final softmax activation:

14 Since computing MFCCs is time consuming, this was done for all samples only once; on later runs the outputs were simply loaded from the saved files, to make the processing efficient.


y_train_hot = to_categorical(y_train)

y_test_hot = to_categorical(y_test)
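Putting steps 1-7 together, a condensed data-preparation sketch is shown below; the folder layout, the max_len padding value of 11 and the 40/60 test/train split are assumptions that simply mirror the shapes quoted above:

```python
import os
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical

def wav2mfcc(file_path, n_mfcc=20, max_len=11):
    """Load a wav file and return an MFCC matrix padded/truncated to (n_mfcc, max_len)."""
    wave, sr = librosa.load(file_path, sr=None)
    mfcc = librosa.feature.mfcc(y=wave, sr=sr, n_mfcc=n_mfcc)
    if mfcc.shape[1] < max_len:
        mfcc = np.pad(mfcc, ((0, 0), (0, max_len - mfcc.shape[1])), mode="constant")
    return mfcc[:, :max_len]

# Read ./data/<label>/*.wav and build the feature array X and integer labels y
labels = ["bed", "cat", "happy"]
X, y = [], []
for index, label in enumerate(labels):
    folder = os.path.join("data", label)
    for fname in os.listdir(folder):
        X.append(wav2mfcc(os.path.join(folder, fname)))
        y.append(index)
X, y = np.array(X), np.array(y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)

# Add a channel axis so the 20x11 feature matrix can be fed to a convolutional layer
X_train = X_train.reshape(X_train.shape[0], 20, 11, 1)
X_test = X_test.reshape(X_test.shape[0], 20, 11, 1)

y_train_hot = to_categorical(y_train)
y_test_hot = to_categorical(y_test)
```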

8. For GA implementation, we have chosen the following 9 hyper parameters to optimize:

Patience attribute (to discard after how many iterations, if not improving)

Number of layers (or the network depth)

Neurons per layer (or the network width)

Type of layers (Dense or LSTM)

Names of Layer activation function

Names of Loss Function

Network optimizers

Batch size

Number of Epochs

9. In the first iteration, the candidate activation functions were ReLU, ELU, tanh and Sigmoid. The loss functions were Mean Squared Error and Categorical Cross-entropy. The optimizers were RMSProp, ADAM, SGD, ADAGrad, ADADelta, ADAMax and NADAM. The batch size varied from 20 to 200, and the number of epochs had the same range. The number of layers was from 1 to 4, and the numbers of neurons were powers of 4, from 64 to 1024. Two versions of network model were considered: convolutional LSTM and dense. The patience parameter was chosen as a multiple of 5 from 5 to 20.
10. After several values for each parameter are chosen intuitively, we fix the percentage of retained parents (say 0.4) and the population size and number of generations (say 20 and 10, respectively). Then, following the calculation in Section 2.2.5.2, we score 20 networks in generation 1 and, in each of the next 9 generations, score 12 more after random mutation of parameters.


11. The scores provide an optimum network with near-optimum parameters in a reasonable time, as usual for a GA implementation (here an optimizer.py module supplies the functions of the genetic algorithm). Based on the progress of the optimization, a second iteration can be run to expedite the optimization after intuitively narrowing the values of each parameter. Depending on the progress of that iteration, a further iteration can be decided, and so on, until satisfactory results come up.
12. Keras model: models in Keras are a sequence of layers, with two versions of network topology here: convolutional LSTM or dense. For the convolutional LSTM network, we have to add convolutional and max-pooling layers before the first LSTM layer, and flatten before the output (dense) layer. Moreover, the input layer has to have the right input dimensions for the convolutional as well as the dense layer. For the LSTM, the input sequence is propagated to the following LSTM layers. A rough sketch of both templates is given after this list.
13. As the GA implementation optimizes the number of layers, the neurons in each layer and their types, a big hurdle of experimenting with these is removed.
14. The activation function and optimization technique are also optimized through the GA implementation, as included in the parameter list above. However, for multi-class classification, the output layer uses the softmax activation function.
15. Compiling the model with the Keras model.compile function uses the efficient numerical libraries of the Keras/TensorFlow backend, which automatically chooses the best way to represent the network for training and prediction, running on CPU or GPU as chosen. Here, for classification, classification accuracy is the metric.
16. Finally, calling the fit function runs the model on the training and test data for a fixed number of iterations called epochs, with the batch_size parameter setting the number of instances that are evaluated before a weight update is performed.
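As a rough sketch of the two network templates in step 12 (the layer sizes, dropout rate and optimizer here are illustrative placeholders; in the experiment these are chosen by the GA):

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, TimeDistributed, Flatten, LSTM, Dense, Dropout
from keras.callbacks import EarlyStopping

def build_dense(nb_layers=2, nb_neurons=256, activation="relu", num_classes=3):
    """Plain dense network over the flattened 20x11x1 MFCC 'image'."""
    model = Sequential()
    model.add(Flatten(input_shape=(20, 11, 1)))
    for _ in range(nb_layers):
        model.add(Dense(nb_neurons, activation=activation))
        model.add(Dropout(0.2))
    model.add(Dense(num_classes, activation="softmax"))   # softmax output for multi-class
    return model

def build_conv_lstm(nb_neurons=128, activation="tanh", num_classes=3):
    """Convolution and max-pooling first, then the rows are fed as a sequence into an LSTM."""
    model = Sequential()
    model.add(Conv2D(32, (3, 3), padding="same", activation="relu", input_shape=(20, 11, 1)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(nb_neurons, activation=activation))
    model.add(Dense(num_classes, activation="softmax"))
    return model

model = build_dense()
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
early_stopper = EarlyStopping(monitor="val_loss", patience=5)
# model.fit(X_train, y_train_hot, batch_size=100, epochs=100,
#           validation_data=(X_test, y_test_hot), callbacks=[early_stopper])
```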


3.6.2 Flowchart

The flowchart of the procedure can be summarized as follows: get the speech dataset from a reliable source for a limited number of classes; extract MFCC features from the voice signal after noise reduction and save them as fixed-size vectors; split the formatted data into training and test sets, one-hot encode the outputs for softmax activation, and convert the 20x11 audio vectors into single-channel image-like inputs for convolution; fix the retain percentage r, population size p, number of generations g, random-select percentage s, mutation chance m, and set iteration k = 1; represent each network structure as a combination of hyper parameters; for each of the p networks in generation j, randomly choose parameter values, compile the sequential model from those parameters, fit and evaluate it to obtain a score; compute the average score of generation j and evolve the next generation; if the result is not satisfactory, fine-tune the parameter set, increment the iteration (k = k + 1) and repeat; otherwise, take the optimum networks of iteration k.

3.6.3 Pseudocode

We use one subroutine, two classes and one main routine iteratively in this experiment.

3.6.3.1 Subroutine 1: Training and Scoring the Model

A subroutine Train contains the following functions:

a) A function get_labels to get the labels from the default folder path of the speech dataset and convert them into a tuple of labels, label indices and one-hot encoded labels.
b) A function wav2mfcc to convert a speech waveform to MFCCs, built on the open source librosa library.

This function converts each speech signal into a fixed-length (here 11 frames) feature vector. If the fixed length exceeds the MFCC length, the remaining positions are zero-padded; otherwise the MFCC signal is cut off at the fixed length.

c) A function save_data_to_array to save the obtained fixed-length MFCC vectors in the specified data path with a .npy extension.
d) A function get_train_test to split training and testing data according to a selected ratio from a data path, appending the whole dataset into two arrays: training and testing (inputs X and outputs Y).
e) A function get_speech to retrieve the speech dataset with the selected feature dimensions for training and testing. (For the convolutional network, reshaping is required to add a channel dimension of 1.)

The obtained train and test output vectors need to be converted into matrices of binary one-hot values.

f) The main function train_and_score gets the selected hyper parameters of the current generation (patience for early stopping, number of layers, number of neurons, network model, activation function, loss function, optimizer, batch size, number of epochs), prints them, gets the training and testing data, compiles a sequential model and provides the score.


For each layer, the network is defined and connected to the previous layer with a pre-fixed drop-out ratio, activation function and sequencing. In the first layer, there may be convolution and pooling for the convolutional LSTM. The output layer is dense with a fixed softmax activation for more than two output classes. The model is compiled using the batch size, epochs, data and early-stopping value, and evaluated to provide the scores.

3.6.3.2 Class for Defining the Network

A class Network is to be created with the following attributes and functionalities:

a) A function __init__ to initially represent an MLP network, with the accuracy initialized to 0 and the hyper parameters reset.
b) A function create_random to choose each hyper parameter randomly.
c) A function create_set to set the network properties.
d) A function train to train the network on a dataset and record the accuracy using the train_and_score function of the subroutine Train.
e) A function print_network to print out a network with its accuracy.

3.6.3.3 Class that Implements the Genetic Algorithm for MLP Optimization

A class Optimizer is to be created with the following attributes and functionalities:

a) A function __init__ that fixes the possible network parameters with the following arguments:
   i. retain (float): percentage of the population to retain after each generation
   ii. random_select (float): probability of a rejected network remaining in the population
   iii. mutate_chance (float): probability that a network will be randomly mutated
b) A function create_population to create a population of random networks, with the argument count (int): the number of networks to generate, i.e., the size of the population.
c) A function fitness to return the accuracy, which serves as the fitness function here.


d) A function breed to make two children from their parent (mother, father) networks by looping through the hyper parameters of the parent networks and picking parameters for each child at random, mutating a fixed portion.
e) A function mutate to randomly mutate one parameter of a network.
f) A function evolve to evolve a population of networks: it gets the score for each network, sorts them in descending order, keeps the retained parents plus some randomly chosen ones according to random_select, and then fills the remaining gaps in the population by breeding randomly chosen mothers and fathers (when they are not the same network).

3.6.3.4 Main Routine that Runs the Total Evaluation to Choose the Best Parameters Using the GA Implementation

The main routine has the following functions:

a) A function train_networks that trains the current networks of a fixed population, using the function train of class Network, on a speech dataset.
b) A function get_average_accuracy to get the average accuracy of a population of networks.
c) A function generate to generate networks with the genetic algorithm, using the following arguments:
   i. generations (int): number of times to evolve the population
   ii. population (int): number of networks in each generation
   iii. nn_param_choices (array): parameter choices for networks
   iv. dataset (str): dataset to use for training/evaluating
It uses the create_population function of class Optimizer to initialize the population of networks and then evolves the generations in a loop using the train_networks function of the main routine. The average accuracy of each generation is computed using the get_average_accuracy function of the current routine. The function then runs the evolution via the evolve function and regenerates the population from the previous one.


The final population is then reduced to the 5 best networks, i.e., the 5 top-performing neural networks are reported.

d) A function print_networks(networks) to print a list of networks on the display.
e) A function main consisting of:
   i. a call to the save_data_to_array function of the Train subroutine to save the speech dataset in a fixed folder defined by 'DATA_PATH'
   ii. setting the hyper parameters as an array called nn_param_choices
   iii. a call to the info function of the open source logging module to display the results
   iv. a call to the generate function of the main routine to examine the whole population and get the optimum result; it takes parameters such as generations (the number of times to evolve the population) and population (the number of networks in each generation).

3.6.4 Getting decision on any unknown sample using this model

1. Create the MFCC from an audio file:
   sample = wav2mfcc('./data/happy/012c8314_nohash_0.wav')
2. Reshape it:
   sample_reshaped = sample.reshape(1, 20, 11, 1)
3. Feed the vector to the model with predict and take the argmax to find the index of the label:
   print(get_labels()[0][np.argmax(model.predict(sample_reshaped))])


CHAPTER 4 RESULT AND OPTIMIZATIONS

This chapter presents the results, comparisons and optimization steps, together with the surrounding discussion.

4.1 Outputs

The complete results and outputs are shown in Appendix B: OUTPUTS.

In order to obtain results in a reasonable time, which is a real constraint for such a large experiment on a single CPU, we ran the optimization in staged iterations.

In Iteration-1, the goal is to identify the optimum parameter ranges for each of the 9 chosen parameters.

In the following iterations, we gradually fine-tune the parameter list so that the best result can be attained in minimum time. This is done by gradually narrowing the range of values for each parameter.

The result of 1st iteration is shown here in a summary tabular form in Table 4.1:

Table 4.1: 1st Iteration Results

Gen. | Batch size | No. of Epochs | Network Chosen | Early stopped at Epoch | Training Accuracy | Validation Accuracy | Total Time required (approx.) | Avg. time per epoch
1  | 50  | 200 | LSTM  | 10 | 34.16% | 32.85% | 4 hr 26 m  | 1550 s
2  | 50  | 200 | LSTM  | 40 | 50.61% | 50.14% | 12 hr 18 m | 1200 s
3  | 100 | 100 | Dense | 17 | 83.19% | 78.03% | 12 hr 8 m  | 1500 s
4  | 100 | 100 | LSTM  | 8  | 33.52% | 33.24% | 14 hr 10 m | 6000 s
5  | 50  | 100 | LSTM  | 46 | 44.18% | 44.17% | 17 hr 33 m | 1300 s
6  | 20  | 100 | Dense | 7  | 86.09% | 79.58% | 7 hr 35 m  | 3700 s
7  | 100 | 200 | LSTM  | 9  | 40.89% | 33.77% | 14 hr 37 m | 5000 s
8  | 50  | 10  | Dense | 10 | 80.08% | 77.02% | 10 hr 16 m | 4000 s
9  | 100 | 100 | LSTM  | 10 | 33.39% | 34.78% | 22 hr 7 m  | 5430 s
10 | 100 | 10  | Dense | 10 | 84.90% | 79.58% | 7 hr 25 m  | 4600 s
11 | 200 | 10  | Dense | 10 | 79.40% | 79.29% | 23 hr 42 m | 5780 s
12 | 50  | 50  | Dense | 15 | 60.25% | 60.12% | 15 hr      | 7000 s
Average total time: 13 hours

It can be observed from Table 4.1 above that if the batch size and number of epochs are not moderate (i.e., < 100), the result does not converge well. It generally takes a long time to settle these factors intuitively, changing a few parameters each time and starting over from the beginning. Using the genetic algorithm, however, this is tracked automatically, since the algorithm itself chooses the better parameters, as is evident from the result of Iteration-1 shown in Appendix B.1.

Here the graphical representation of the 1st iteration is shown in Figure 4.1:

Figure 4.1: Accuracy vs Generations (Iteration-1)


It can be observed in Table 4.1 that for dense models the error is lower and the convergence rate is higher. Table 4.2 and Figure 4.2 show the increasing accuracy at different epochs, which is very promising.

Table 4.2: 1st Iteration 6th Generation is with Dense Model

Epoch | Training accuracy | Validation accuracy
1 | 71.63% | 76.97%
2 | 78.98% | 79.09%
3 | 81.14% | 78.28%
4 | 82.71% | 78.81%
5 | 84.58% | 78.56%
6 | 84.74% | 79.58%
7 | 86.09% | 79.58%

Figure 4.2: 1st Iteration: most promising network is with Dense model

It has been observed in this experiment that, when the batch size is below 100, the models stop early because sufficient convergence is not attained on both the training and validation sets within the selected number of consecutive epochs (here 5), a threshold termed the patience.

Therefore, in the next iteration the network model can be limited to Dense and the batch size to >= 100. It is also clear from this 6th generation that, when the number of epochs is low, the solution reached is not satisfactory, so this number has to be increased as well.
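The patience mechanism described above corresponds to a standard early-stopping callback. The following is a minimal sketch assuming the Keras API of that period (where the monitored metric is named 'val_acc'; newer versions use 'val_accuracy'); the model and data objects are assumed to be defined elsewhere, and the exact callback settings of the generated networks may differ.

# Minimal early-stopping sketch (assumed settings; not the exact thesis code).
from keras.callbacks import EarlyStopping

# Stop training if the validation accuracy has not improved for 5 consecutive epochs.
early_stopper = EarlyStopping(monitor='val_acc', patience=5)

# model, x_train, y_train, x_test, y_test are assumed to be defined elsewhere.
history = model.fit(x_train, y_train,
                    batch_size=100,            # a moderate batch size, as suggested by Table 4.1
                    epochs=200,                # upper bound; early stopping usually ends sooner
                    validation_data=(x_test, y_test),
                    callbacks=[early_stopper])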

From the sequence of results and the graph in Figure 4.2 above, it is clear that the Deep GA model was selecting Dense networks and increasing the number of epochs in order to maximize the accuracy.


It can also be seen that with a batch size of 200 and the 'adagrad' optimizer, the result converges better towards the optimum solution (as shown in Table 4.3 below and in the results of Appendix B.2 Iteration-2). Here, the accuracy reached up to 87.11% in Generation 1.

Table 4.3: 2nd Iteration Results: 3 generations

Generation | Batch size | No. of Dense layers | Optimizer function | Activation function | Early stopped at epoch | Training accuracy | Validation accuracy | Approx. time required
1 | 200 | 4 | adagrad | tanh | 16 | 87.11% | 80.83% | 45 hr 30 m
2 | 200 | 4 | nadam | tanh | 18 | 86.99% | 77.26% | 54 hr
3 | 200 | 2 | adadelta | tanh | 22 | 85.99% | 80.73% | 150 hr

Table 4.4 below shows how this 87.11% training accuracy, together with an increased validation accuracy, is attained in only 16 epochs. Figure 4.3 shows the same improvement as a graph.

Table 4.4: 2nd Iteration Results: Generation 1

Epoch | Training accuracy | Validation accuracy | Time required per epoch
1 | 48.26% | 61.66% | 18497 s
2 | 67.64% | 70.86% | 9192 s
3 | 75.90% | 72.64% | 35518 s
4 | 76.35% | 77.50% | 5980 s
5 | 81.27% | 78.61% | 5959 s
6 | 81.39% | 78.61% | 9939 s
7 | 82.01% | 78.13% | 6413 s
8 | 82.29% | 77.84% | 7383 s
9 | 83.87% | 79.77% | 8288 s
10 | 85.09% | 79.91% | 10653 s
11 | 85.67% | 76.97% | 10175 s
12 | 85.22% | 81.17% | 7602 s
13 | 86.18% | 80.35% | 7504 s
14 | 86.02% | 80.44% | 8044 s
15 | 86.70% | 81.17% | 20204 s
16 | 87.11% | 80.83% | 10576 s


Figure 4.3: Iteration 2- 1st Generation shows promise, but deviates on evaluation

Table 4.5 and Table 4.6 below, with the corresponding graphs in Figure 4.4 and Figure 4.5, show that the subsequent generations produce similar results.

Table 4.5: 2nd Iteration Results: Generation 2

Epoch | Training accuracy | Validation accuracy
1 | 58.19% | 71.29%
2 | 75.03% | 74.47%
3 | 76.54% | 76.06%
4 | 78.92% | 63.39%
5 | 78.79% | 72.35%
6 | 80.21% | 75.10%
7 | 83.29% | 75.72%
8 | 83.10% | 78.95%
9 | 85.15% | 77.60%
10 | 84.29% | 77.50%
11 | 83.97% | 76.88%
12 | 85.28% | 79.43%
13 | 85.54% | 76.97%
14 | 86.90% | 77.99%
15 | 86.95% | 74.66%
16 | 85.99% | 76.06%
17 | 86.99% | 77.26%


Figure 4.4: Iteration 2- 2nd Generation follows 1st Generation

Table 4.6: 2nd Iteration Results: Generation 3

Epoch | Training accuracy | Validation accuracy
1 | 58.42% | 69.32%
2 | 72.62% | 76.97%
3 | 75.42% | 77.55%
4 | 79.69% | 75.67%
5 | 79.43% | 78.23%
6 | 80.53% | 71.63%
7 | 80.94% | 73.31%
8 | 80.33% | 72.50%
9 | 82.81% | 80.01%
10 | 84.19% | 80.59%
11 | 82.46% | 79.62%
12 | 83.58% | 77.70%
13 | 84.00% | 79.43%
14 | 83.58% | 79.87%
15 | 84.13% | 80.59%
16 | 84.09% | 79.96%
17 | 83.71% | 77.94%
18 | 84.83% | 78.18%
19 | 85.31% | 79.29%



Figure 4.5: Iteration 2- 3rd Generation also follows 1st generation

From the above experiment it is evident that all the generations of the 2nd iteration converge towards almost the same result despite varying the optimizer, the number of layers, the number of epochs and other parameters. In other words, the experiment shows that the expected result is achievable after a minimal number of iterations once the hyper-parameters are well tuned. This is supported by the 3rd iteration, shown in Table 4.7 below and in the results of Appendix B.3 Iteration-3, i.e., with optimizer = 'adagrad' and activation function = 'tanh'.

Table 4.7: 3rd Iteration Results: 4 generations

Generation | No. of Dense layers | No. of neurons per layer | Optimizer function | Activation function | Early stopped at epoch | Training accuracy | Validation accuracy | Total time required (approx.)
1 | 3 | 64 | adagrad | tanh | 25 | 91.39% | 80.20% | 26 hr 37 m
2 | 2 | 256 | adagrad | relu | 5 | 33.71% | 32.95% | 33 hr 12 m
3 | 1 | 128 | adam | tanh | 23 | 86.86% | 80.54% | 33 hr 21 m
4 | 1 | 256 | adamax | relu | 46 | 62.56% | 59.78% | 175 hr
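For concreteness, the following is a minimal Keras-style sketch of a network in the spirit of the Generation-1 configuration above (3 dense layers of 64 neurons, tanh activation, adagrad optimizer, three output classes). It is an illustrative reconstruction under these assumptions, not the code generated by the GA; details such as the flattening of the 20 x 11 MFCC input may differ from the actual generated model.

# Illustrative sketch of a Generation-1 style network (assumptions noted above).
from keras.models import Sequential
from keras.layers import Flatten, Dense

model = Sequential()
model.add(Flatten(input_shape=(20, 11, 1)))        # 20 MFCC coefficients x 11 frames, 1 channel
model.add(Dense(64, activation='tanh'))            # 3 hidden dense layers of 64 neurons each
model.add(Dense(64, activation='tanh'))
model.add(Dense(64, activation='tanh'))
model.add(Dense(3, activation='softmax'))          # 3 classes: bed, cat, happy

model.compile(optimizer='adagrad',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()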


In the 1st generation, the combination of the adagrad optimizer, tanh activation and 3 layers performs best, while the other combinations lag far behind. Table 4.8 below and the corresponding graph in Figure 4.6 show this result over the increasing epochs.

Table 4.8: 3rd Iteration Results: Generation 1

Epoch | Training accuracy | Validation accuracy
1 | 55.78% | 71.29%
2 | 73.71% | 72.59%
3 | 78.57% | 77.99%
4 | 80.94% | 78.61%
5 | 83.29% | 77.07%
6 | 83.77% | 79.19%
7 | 85.25% | 79.82%
8 | 85.09% | 80.44%
9 | 86.95% | 79.82%
10 | 86.73% | 79.05%
11 | 86.15% | 80.30%
12 | 88.08% | 79.14%
13 | 86.76% | 80.64%
14 | 88.59% | 80.44%
15 | 89.17% | 81.12%
16 | 89.43% | 80.54%
17 | 89.62% | 79.53%
18 | 89.14% | 79.43%
19 | 89.78% | 79.87%
20 | 90.39% | 79.43%
21 | 90.49% | 79.72%
22 | 90.84% | 78.42%
23 | 90.94% | 79.87%
24 | 91.61% | 78.32%
25 | 91.39% | 80.20%



Figure 4.6: Iteration 3- 1st Generation surpasses previous accuracies

We obtained the following confusion matrix (Table 4.9) and classification report (Table 4.10) from this optimum result (Iteration 3, Generation 1), computed on the test samples used to measure the validation performance.

Table 4.9: Confusion Matrix over the Optimum Result

Actual class | Predicted: Bed | Predicted: Cat | Predicted: Happy | Total actual samples | FRR (false negative %)
Bed | 548 | 89 | 73 | 710 | 22.8%
Cat | 84 | 537 | 63 | 684 | 21.5%
Happy | 51 | 52 | 579 | 682 | 15.1%
Total predicted samples | 683 | 678 | 715 | 2076 |
FAR (false positive %) | 19.8% | 20.8% | 19.0% | |

Test/validation accuracy: for this three-class problem the test accuracy is the number of correctly classified samples (the diagonal of the confusion matrix) divided by the total number of test samples, i.e., (548 + 537 + 579) / 2076 = 1664 / 2076, or approximately 80.2%, consistent with the 80.20% validation accuracy reported in Table 4.8.

Table 4.10: Classification Report over the Optimum Result

Class | Precision (1 - FAR) | Recall (1 - FRR) | F1-score (harmonic mean) | Support (actual class samples)
Bed | 0.80 | 0.77 | 0.79 | 710
Cat | 0.79 | 0.79 | 0.79 | 684
Happy | 0.81 | 0.85 | 0.83 | 682
Avg / Total | 0.80 | 0.80 | 0.80 | 2076

Notes: The precision for a class is the number of true positives (items correctly labelled as belonging to the class) divided by the total number of items labelled as belonging to that class (true positives plus false positives); it is often referred to as the positive predictive value (PPV). Recall is the number of true positives divided by the total number of items that actually belong to the class (true positives plus false negatives); it is often referred to as sensitivity. The F1 score is the harmonic mean of precision and recall, F1 = 2 / (1/Recall + 1/Precision).
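The figures in Tables 4.9 and 4.10 can be reproduced directly from the confusion matrix. The following is a minimal NumPy sketch using only the counts from Table 4.9; it is provided as a worked check of the reported metrics.

# Recompute accuracy, FRR, FAR, precision, recall and F1 from the Table 4.9 counts.
import numpy as np

# Rows = actual class, columns = predicted class, order: bed, cat, happy.
cm = np.array([[548,  89,  73],
               [ 84, 537,  63],
               [ 51,  52, 579]])

total = cm.sum()                                   # 2076 test samples
accuracy = np.trace(cm) / total                    # (548 + 537 + 579) / 2076, about 0.802

row_totals = cm.sum(axis=1)                        # actual samples per class (710, 684, 682)
col_totals = cm.sum(axis=0)                        # predicted samples per class (683, 678, 715)

recall = np.diag(cm) / row_totals                  # 1 - FRR per class
precision = np.diag(cm) / col_totals               # 1 - FAR per class
frr = 1 - recall                                   # about 22.8%, 21.5%, 15.1%
far = 1 - precision                                # about 19.8%, 20.8%, 19.0%
f1 = 2 * precision * recall / (precision + recall) # harmonic mean of precision and recall

for name, p, r, f in zip(['bed', 'cat', 'happy'], precision, recall, f1):
    print(f'{name:>5}: precision={p:.2f} recall={r:.2f} f1={f:.2f}')
print(f'accuracy={accuracy:.3f}')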

These reports show that the test accuracy obtained from the confusion matrix is somewhat lower than the training accuracy, i.e., the model is mildly overfitted. At the same time, the nearly identical average precision and recall scores (0.80 in both cases) show that the sample set is well balanced, which supports the robustness of the system's performance.

4.2 Discussions and Comparative Gain from this Experiment

4.2.1 Outcome

It is interesting to note that, among the network structures generated by this GA implementation, dense networks consistently performed better than convolutional LSTM networks, a behaviour that has not been studied much before.

4.2.2 Gain Obtained from Present Work

As discussed earlier in section 2.2.5.1, a similar calculation can be made of the effectiveness of this experiment with the GA implementation.

If there are 9 hyper-parameters with, on average, 3 candidate values each, an exhaustive grid would contain 3^9 = 19,683 combinations. Running a DNN experiment for every combination, at an average of 13 hours each on a single CPU, would require about 19,683 x 13, i.e., roughly 256,000 hours to compare and then optimize the solution, which is on the order of 29 years!
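The comparison can be written out as a few lines of arithmetic. The sketch below only restates the estimate above; the per-network training time (13 hours) and the population size (20) are the averages reported earlier in this chapter.

# Back-of-the-envelope comparison of exhaustive grid search vs. the GA search.
values_per_param = 3          # average candidate values per hyper-parameter
num_params = 9                # hyper-parameters being tuned
hours_per_network = 13        # average training time per network (Table 4.1)

grid_combinations = values_per_param ** num_params          # 3**9 = 19683
grid_hours = grid_combinations * hours_per_network          # about 255,879 hours (roughly 29 years)

population = 20               # networks per generation in Iteration-1
iteration1_hours = population * hours_per_network           # 20 * 13 = 260 hours (about 11 days)

print(f'Exhaustive grid: {grid_combinations} networks, ~{grid_hours} hours')
print(f'GA Iteration-1 : {population} networks, ~{iteration1_hours} hours '
      f'({100 * iteration1_hours / grid_hours:.2f}% of the grid estimate)')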



Instead, using the GA implementation and a minimal series of iterations, this time can be reduced drastically.

Here the 1st iteration takes only 20 x 13 = 260 hours, i.e., about 11 to 12 days, which is roughly 0.1% of the exhaustive-search estimate above (the exact figure depends on the population size). The 2nd and subsequent iterations take even less time to converge and to provide the optimal result.

These results were obtained on a single CPU. Using multiple CPUs in parallel, or multiple machines, both the exhaustive and the GA-based search times would be reduced at the same rate, so that experiment was excluded.

Since training DNNs is computationally intensive, it can thus be established that the runtime of this approach is O(number of generations) instead of O(values per parameter ^ number of parameters); i.e., far less time is required when the GA is applied over any combination of parameters (as encouraged by [121]). This saves an enormous amount of time compared to analysing each and every combination of parameters, which would otherwise demand large amounts of processing power, time and memory.

4.2.3 Comparative Analysis of the Accuracy

We have found very limited research on similar genetic-algorithm implementations for speech recognition over deep neural networks. We obtained only one paper that did similar work; it used the huge computing power of a supercomputer with parallel processing but reported a lower accuracy. The comparison is shown in Table 4.11 below.

Table 4.11: Comparison with similar experiment using GA over DNN

Ref. | Year | Features used | Classifiers | Training equipment | Speech dataset | Max training accuracy %
[120] | 2015 | MFCC | GA over DNN | TSUBAME 2.5 supercomputer with 12 NVidia GPUs in parallel | CSJ: 650 hrs of Japanese spontaneous speech (7000k words) | 81%
Present work | 2018 | MFCC | GA over DNN | Intel Core i5 PC on a single CPU | Google Speech Commands dataset with 105K audio files of random people | 91.4%

Here it is evident that in the present work, while the cost of computing has been reduced substantially (by intelligently choosing the hyper-parameters on a single CPU instead of parallel supercomputing GPUs), the accuracy is also markedly higher after selective optimization of the network parameters over a series of iterations.

We can also compare this work with two other similar experiments that use evolutionary strategies over DNNs, as shown in Table 4.12 below:

Table 4.12: Comparison with similar experiment using any evolution strategies over DNN

Ref. | Year | Features used | Classifiers | Training equipment | Speech dataset | Max training accuracy %
[145] | 2007 | Virtual acoustic model | EA over NN | EVERA simulator | Simulated corpus | 80.9%
[144] | 2015 | MFCC | CMA-ES over DNN-HMM | TSUBAME 2.5 supercomputer with multiple NVidia GPUs in parallel | CSJ: 240 hours of academic presentations of Japanese spontaneous speech, with a 100-hour subset | 87.3%
Present work | 2018 | MFCC | GA over DNN | Intel Core i5 PC on a single CPU | Google Speech Commands dataset with 105K audio files of random people | 91.4%

The current work therefore shows that the success of a GA implementation over a DNN depends on the selection of the hyper-parameters, the range of values allowed for each hyper-parameter, and efficient parameter tuning over a series of iterations, aspects which we observed were absent in the previous research.


Thus, we have shown a way of optimizing a multitude of parameters to a near-optimum accuracy while keeping the computational cost and complexity low.

4.3 Discussion

The main challenge in a GA implementation is choosing the parameter sets that converge most efficiently, since this reduces the number of subsequent iterations required. Human intuition has been used for this in the current experiment, and it could indeed be replaced by a suitable algorithm.

As a viable alternative, a further improvement would be to re-fix the parameter sets for the next iterations via machine learning instead of intuition.

Hence, we conclude that automated parameter fixing followed by automated iterations can be a future way forward for such implementations.

It has also been successfully tested that any combination of different types of network layers can be accommodated in a single run. Instead of selecting the combination intuitively, it can be optimized further by assigning a sequence number to each layer type, or by encoding the layer combination into a fixed-length binary string, and then choosing that sequence or string at random to design the combination under test; a minimal sketch of such an encoding is given below.
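The following is a minimal sketch of the fixed-length encoding idea, written as ordinary Python. The layer vocabulary, genome length, crossover and mutation scheme are illustrative assumptions rather than part of the thesis implementation.

# Encode a network's layer combination as a fixed-length genome of layer types.
import random

LAYER_TYPES = ['Dense', 'LSTM', 'Conv2D', 'Dropout']   # illustrative vocabulary
GENOME_LENGTH = 4                                      # fixed number of layer slots


def random_genome():
    """Pick one layer type (by index) for each slot of the genome."""
    return [random.randrange(len(LAYER_TYPES)) for _ in range(GENOME_LENGTH)]


def decode(genome):
    """Translate the list of indices back into a layer-type sequence."""
    return [LAYER_TYPES[gene] for gene in genome]


def crossover(mother, father):
    """Single-point crossover producing one child genome."""
    point = random.randrange(1, GENOME_LENGTH)
    return mother[:point] + father[point:]


def mutate(genome, rate=0.1):
    """Randomly replace each gene with probability `rate`."""
    return [random.randrange(len(LAYER_TYPES)) if random.random() < rate else g
            for g in genome]


parent_a, parent_b = random_genome(), random_genome()
child = mutate(crossover(parent_a, parent_b))
print(decode(parent_a), decode(parent_b), '->', decode(child))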

Further, only 9 sets of hyper-parameters have been used in this experiment. More sets could be added, such as different combinations of the learning rate, the test-train split ratio, momentum, population size, number of generations, retain ratio, drop-out ratios and so on.

Moreover, this experiment has used a finite set of sample speech data to limit the time required. In practice, we can transfer the learned structures to large-scale experiments and verify their effectiveness.


CHAPTER 5 CONCLUSION

5.1 Conclusion

In this thesis we experimented on the most difficult part of speech recognition, i.e., identifying the optimal network architecture with a promising set of hyper-parameters, and explored automated techniques for improving it.

To this end, it has been identified that the candidate network parameters can be tuned efficiently if a genetic algorithm is implemented and further optimized to reduce the time and cost of searching their space.

The experiments showed that a series of iterations to estimate and re-estimate the hyper-parameters led to a better, near-optimal solution at far lower time and cost. It has been shown that the runtime of this work is O(number of generations) instead of O(values per parameter ^ number of parameters), which saves remarkable time compared to the legacy process of training a long series of deep learning networks.

Compared to similar implementations done earlier, both the accuracy and the time required to optimize the network have been improved substantially in the current work, while the cost of computing has been reduced drastically by using a single CPU instead of parallel supercomputing GPUs.

5.2 Future Work

In light of the experiments in this thesis, similar analysis can be applied to the remaining steps of speech recognition or authentication for stationary and mobile devices, mainly speech conversion, feature extraction and language modelling. A combined effort using the genetic algorithm to optimize the different hyper-parameters could therefore increase the overall efficiency of such projects.


Also, as emphasized in [121], additional research can be done to examine whether Deep GA can be established as the most efficient technique compared to recent state-of-the-art techniques such as DQN, A3C, etc.

This technique is not only usable in pattern recognition; we recommend testing GA implementations over DNN hyper-parameters in other areas of interest as well, especially in the field of artificial intelligence and machine learning systems.


References

[1]. Wang, S. and Liu, J., ―Biometrics on Mobile Phone‖, Recent Application in Biometrics, Yang J. (Ed.), ISBN 978-953-307-488-7, InTech, 2011. [2]. Aleksandr Ometov, Sergey Bezzateev, Niko Mäkitalo, Sergey Andreev, Tommi Mikkonen and Yevgeni Koucheryavy. ―Multi-Factor Authentication: A Survey‖, Cryptography 2, no. 1: 1. 2018. [3]. Rawlson O‘Neil King et. el., ―Biometrics and Banking‖, biometricupdate.com, Biometrics Research Group Inc., 2016. [4]. Davythe Dicochea, ―Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2015–2020 White Paper‖, Cisco, 2016. [5]. https://www.techrepublic.com/article/biometric-mobile-payments-to-hit-1t-transactions-51b-in- revenue-by-2022/ [6]. D. Yu and L. Deng, ―Automatic Speech Recognition - A Deep Learning Approach‖, Springer- Verlag, London, 2015. [7]. Shanthini, B., Swamynathan, S., ―ASecured Authentication System for MANETs using Voice and Fingerprint Biometrics‖, European Journal of Scientific Research ISSN 1450-216X, pp. 533-546, Vol.59 No.4, 2011. [8]. Backes, M., Doychev G., Durmuth, M., and Kopf, B., ―Speaker Recognition in Encrypted Voice Streams‖, Proceedings of 15th European Symposium on Research in Computer Security, Greece, 2010. [9]. Inthavisas, K. and Lopresti, D., ―Attacks on Speech Biometric Authentication”, Proceeding of International Conference on Image Processing & Communications, 2011 [10]. Karpov, E., ―Efficient Speaker Recognition for Mobile Devices‖, Dissertations in Forestry and Natural Sciences, No 52, Faculty of Science and Forestry, University of Eastern Finland, Joensuu, 2011. [11]. Avila, C.S., Casanova, J.G., Francisco, B., Garcia, L.J.M., Gomez, M.F.A., Sierra, D.S., Pozo, G.B. ―State of the Art of Mobile Biometrics, Liveness and Non-Coercion Detection‖, Deliverable D3.1 of the project on Personalised Centralized Authentication System, a Collaborative Project supported by the 7th Framework Programme of the European Commission research, technological development and demonstration under grant agreement no 610713. 2014. [12]. Seung-eunJi and Wooil Kim1, ―Whispered Speech Recognition for Mobile-Based Voice Authentication System‖, International Journal of Image and Signal Systems Engineering, Vol.


1, No. 1 (2017), pp. 15-20, (http://dx.doi.org/10.21742/ijisse.2017.1.1.03), Australia (2017). [13]. http://www.sensory.com/google-play-statistics-shows-ease-of-use-correlates-to-biometric- ratings/ [14]. Saastamoinen, J., Karpov, E., Hautamäki, V., and Fränti, P. "Accuracy of MFCC based speaker recognition in Series 60 device", EURASIP Journal of Applied Signal Processing pp. 2816-2827, 2005. [15]. W. Xiong, L. Wu, F. Alleva, J. Droppo, X. Huang, A. Stolcke, ―The Microsoft 2017 Conversational Speech Recognition System‖, Microsoft AI and Research Technical Report (MSR-TR-2017-39), August (2017) [16]. Kyu J. Han, Akshay Chandrashekaran, Jungsuk Kim, and Ian Lane, ―The CAPIO 2017 Conversational Speech Recognition System‖. (http://arxiv.org/abs/1801.00059). [17]. Daniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahremani, Vimal Manohar, Xingyu Na, Yiming Wang, Sanjeev Khudanpur, ―Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI‖. INTERSPEECH 2016: 2751-2755 (2016) [18]. Reynolds, D.A., Quatieri, T.F., Dunn, R.B. ―Speaker verification using adapted Gaussian mixture models‖. Digital Signal Processing 10(1-3), pp. 19–41, 2000. [19]. Abdullah M.F.A., Bashier H. K., Sayeed S., Yusof I., Azman A., Ibrahim S. Z., ―Answering Incoming Call for Implicit Authentication Using Smartphone‖, Journal of Theoretical and Applied Information Technology, pp. 193-199, Vol. 61, No.1, 2014. [20]. Rao, R. R., Prasad, A., Rao, C. K., ―Robust Features for Automatic Text-Independent Speaker Recognition Using Gaussian Mixture Model‖, International Journal of Soft Computing and Engineering (IJSCE) ISSN: 2231-2307, pp. 330-335, Vol. 1, No. 5, 2011. [21]. Douglas A Reynolds. ―Speaker identification and verification using gaussian mixture speaker models‖. Speech communication, 17(1):91-108, 1995. [22]. S. Furui. ―Recent advances in speaker recognition‖. Pattern Recognition Letters, 18(9):859–872, 1997. [23]. Mayrhofer, R and Kaiser T. ―Towards usable authentication on mobile phones: An evaluation of speaker and face recognition on off-the-shelf handsets‖, Fourth International Workshop on Security and Privacy in Spontaneous Interaction and Mobile Phone Use (IWSSI/SPMU), 2012, Newcastle, UK. [24]. Aronowitz, H. ―Voice Biometrics for User Authentication‖, IBM Research – Haifa, Israel, 2012. [25]. Fattah, M. A., Ren, F., Kuroiwa, S. ―Speaker Recognition for Wire/ Wireless


Communication System‖, The International Arab Journal of Information Technology, pp. 28- 34, Vol. 3, No. 1, 2006 [26]. Besacier, L., Grassi, S., Dufaux, A., Ansorge. M., Pellandini, F. ―GSM Speech Coding and Speaker Recognition‖, International Conference on Acoustics, Speech, and Signal Processing, pp. 1085-88, Vol. 2, 2000 [27]. M. Brookes, ―VOICEBOX: Speech Processing Toolbox for MATLAB‖.[Online]. Available: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html.[Accessed: 04-Mar- 2015]. [28]. Hiremani, V. A, ―Speaker Recognition: A Survey‖, International Journal of Emerging Technology and Advanced Engineering, Vol. 5, no. 7, pp. 327-335, July 2015. [29]. J. Campbell. ―Speaker recognition: a tutorial‖. Proceedings of the IEEE, 85(9):1437–1462, 1997. [30]. S. Furui, ―40years of progress in automatic speaker recognition,‖ Lecture Notes Comput. Sci., vol. 5558, pp. 1050–1059, 2009. [31]. E. Shriberg, ―Higher-level features in speaker recognition,‖ Lecture Notes Computer Science, vol. 4343, pp. 241–259, 2007. [32]. Trapote, A. H., Mencía, B. L., Díaz, D., Pozo, R. F., Caminero, J., ―Embodied Conversational Agents for Voice-Biometric Interfaces‖, TEC 2006 Project, Spain, 2008. [33]. Ming, J., Hazen, T. J., Glass, J. R., and Reynolds, D. A., ―Robust Speaker Recognition in Noisy Conditions‖, IEEE Transactions on Audio, Speech, and Language Processing, pp. 1711- 1723, Vol. 15, No. 5, 2007. [34]. Besacier, L., Ariyaeeinia, A. M., Mason, J. S., Bonastre, J. F., Mayorga, P., Fredouille, C., Meignier, S., Siau, J., Evans, N. W. D., Auckenthaler, R., Stap, R. ―Voice Biometrics over the Internet in the Framework of COST Action 275‖, EURASIP Journal of Applied Signal Processing pp. 466–479, Vol. 4, 2004. [35]. Kevin Brunet, Karim Taam, Estelle Cherrier, Ndiaga Faye, Christophe Rosenberger. ―Speaker recognition for mobile user authentication: An android solution‖. In 8éme Conférence sur la Sécurité des Architectures Réseaux et Systèmes d'Information (SAR SSI), pp.10, Sep 2013. [36]. Xiaojia Zhao, Yang Shao, and DeLiang Wang, ―CASA-Based Robust Speaker Identification‖, IEEE Transactions on Audio, Speech & Language Processing 20(5): pp.1608- 1616, 2012. [37]. A. J. Eronen, V. T. Peltonen, J. T. Tuomi, A. P. Klapuri, S. Fagerlund, T. Sorsa, G. Lorho,


and J. Huopaniemi, ―Audio based context recognition‖, IEEE Trans. On Audio, Speech and Language Processing, vol. 14, no. 1, pp. 321–329, January 2006. [38]. L. Ma, B. Milner, and D. Smith, ―Acoustic environment classification‖, ACM Trans. on Speech and Language Processing, vol. 3, no. 2, pp. 1–22, July 2006. [39]. W. Dargie, ―Adaptive audio-based context recognition‖, IEEE Trans. on Systems, Man, and Cybernetics – Part A: Systems and Humans, vol. 39, no. 4, pp. 715–725, July 2009. [40]. S. Furui. ―Cepstral analysis technique for automatic speaker verification‖. Acoustics, Speech and Signal Processing, IEEE Transactions, 29(2):254-272, 1981. [41]. Islam MA, Jassim WA, Cheok NS, Zilany MSA ―A Robust Speaker Identification System Using the Responses from a Model of the Auditory Periphery‖. PLOS ONE 11(7): e0158520. https://doi.org/10.1371/journal.pone.0158520. 2016 [42]. Y. Shao and D. L. Wang, ―Robust speaker recognition using binary time-frequency masks,‖ in Proc. ICASSP, 2006, pp. 645–648. [43]. P. Davis, S., Mermelstein, ―Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences‖, IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 28, no. 4, pp. 357–366, 1980. [44]. Douglas A Reynolds. ―Anoverview of automatic speaker recognition technology‖. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2002, volume 4, pages IV-4072. IEEE, 2002. [45]. Rabiner, L.R., Sambur,M.R. ―Analgorithm for determining the endpoints of isolated utterances‖. Bell Syst. Tech. J. 54(2), pp. 297–315, 1975.

[46]. K. Rao, H. Sak and R. Prabhavalkar, "Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer," IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Okinawa, 2017, pp. 193-199 (2017) [47]. Anthony Larcher, Jean-Francois Bonastre, and John S.D. Mason. ―Constrained temporal structure for text-dependent speaker verification”. Digital Signal Processing, 23(6):1910 - 1917, 2013. [48]. B. Dumas, C. Pugin, J. Hennebert, D. Petrovska Delacretaz, A. Humm, F. Evequoz, R. Ingold, and D. Von Rotz. ―MyIDea - multimodal biometrics database, description of acquisition protocols‖. Proc. of 3rd COST 275 Workshop (COST 275), pp. 59-62, Hatfield, UK, 2005. [49]. N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet. ―Front-end factor analysis for speaker verification‖. IEEE Transactions on Audio, Speech, and Language Processing,


19(4):788-798, 2011. [50]. E. Khoury, B. Vesnicer, J. Franco-Pedroso, R. Violato, Z. Boulkcnafet, L.M. Mazaira Fernandez, M. Diez, J. Kosmala, H. Khemiri, T. Cipr, R. Saeidi, M. Gunther, J. Zganec-Gros, R.Z. Candil, F. Simoes, M. Bengherabi, A. Alvarez Marquina, M. Penagarikano, A. Abad, M. Boulayemen, P. Schwarz, D. Van Leeuwen, J. Gonzalez-Dominguez, M.U. Neto, E. Boutellaa, P. Gomez Vilda, A. Varona, D. Petrovska-Delacretaz, P. Matejka, J. Gonzalez-Rodriguez, T. Pereira, F. Harizi, L.J. Rodriguez-Fuentes, L. El Shafey, M. Angeloni, G. Bordel, G. Chollet, and S. Marcel. ―The 2013 speaker recognition evaluation in mobile environment‖. International Conference on Biometrics (ICB), 2013, pages 1-8, 2013. [51]. Mobile biometry[Online]. Available: http://www.mobioproject.org. [52]. R.C. Johnson, Walter J. Scheirer, and Terrance E. Boult. ―Secure voice-based authentication for mobile devices: Vaulted voice verification‖. SPIE Defense, Security + Sensing Symposium, May 2013. [53]. R.C. Johnson, Terrance E. Boult, and Walter J. Scheirer. ―Voice authentication using short phrases: Examining accuracy, security and privacy issues‖. IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS), September 2013. [54]. R.H. Woo, A. Park, and T.J. Hazen. ―The MIT mobile device speaker verification corpus: Data collection and preliminary experiments‖, The IEEE Odyssey: Speaker and Language Recognition Workshop, pp. 1-6, 2006. [55]. Yanling Chen, E. Heimark, and D. Gligoroski. ―Personal threshold in a small scale text- dependent speaker recognition‖, International Symposium on Biometrics and Security Technologies (ISBAST), pp. 162-170, 2013. [56]. Anthony Larcher, Kong-Aik Lee, Bin Ma, and Haizhou Li. ―The RSR2015: Database for text-dependent speaker verification using multiple pass-phrases‖. INTERSPEECH. ISCA, 2012. [57]. Ren, Tsang Ing, et al., ―Ahybrid GMM speaker verification system for mobile devices in variable environments‖, Lecture Notes in Computer Science on Intelligent Data Engineering and Automated Learning-IDEAL 2012, Vol. 7435, pp. 451-458, 2012. [58]. Kinnunen, Tomi, et al., ―Audio context recognition in variable mobile environments from short segments using speaker and language recognizers‖, Odyssey 2012-The Speaker and Language Recognition Workshop. 2012. [59]. McCool, Christopher, et al., ―Bi-modal person recognition on a mobile phone: using mobile phone data‖, IEEE International Conference on Multimedia and Expo Workshops


(ICMEW), 2012. [60]. Baloul, Mossab, Estelle Cherrier, and Christophe Rosenberger, ―Challenge-based speaker recognition for mobile authentication‖, Proceedings of the International Conference of the Biometrics Special Interest Group (BIOSIG), 2012, pp. 1-7. [61]. Martinez, Jorge, et al., ―Speaker recognition using Mel frequency Cepstral Coefficients (MFCC) and Vector quantization (VQ) techniques‖, 22nd International Conference on Electrical Communications and Computers (CONIELECOMP), 2012. [62]. G. Pradhan, B. C. Haris, S. R. M. Prasanna, and R. Sinha, ―Speaker verification in sensor and acoustic environment mismatch conditions‖, International Journal of Speech Technology, Vol. 15, No. 3, pp. 381– 392, 2012. [63]. A. Alarifi, I. Alkurtass, and A. S. Alsalman, ―SVM based Arabic speaker verification system for mobile devices‖, 2012 International Conference on Information Technology and e- Services, pp. 1–6. 2012. [64]. S. G. Koolagudi, S. E. Fatima, and K. S. Rao, ―Speaker recognition in the case of emotional environment using transformation of speech features‖, Proceedings of the CUBE International Information Technology Conference on - CUBE ’12, pp. 118, 2012. [65]. P. Motlicek, L.E. Shafey, R. Wallace, C. Mccool, and S. Marcel. ―Bi-modal authentication in mobile environments using session variability modeling‖. 21st International Conference on Pattern Recognition (ICPR), pp. 1100-1103, 2012. [66]. Lu, Hong, et al., ―SpeakerSense: energy efficient unobtrusive speaker identification on mobile phones‖, Pervasive Computing, pp. 188-205, 2011. [67]. A. Alarifi, I. Alkurtass, and A.-M.S. Al-Salman. ―Arabic text-dependent speaker verification for mobile devices using artificial neural networks‖, 10th International Conference on Machine Learning and Applications and Workshops (ICMLA), Vol. 2, pp. 350-353, 2011. [68]. V. Andrei, C. Paleologu, and C. Burileanu. ―Implementation of a real-time text dependent speaker identification system‖, 6th Conference on Speech Technology and Human-Computer Dialogue (SpeD), 2011, pp. 1-6, 2011. [69]. C. Liu, W. Liming, H. Wei, L. Wenjie, and L. Qiudi, ―Research of an improved algorithm in Speaker Recognition‖, IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems (CYBER), May 2012. [70]. I. Kramberger, M. Grasic, and T. Rotovnik. ―Door phone embedded system for voice based user identification and verification platform‖, IEEE Transactions on Consumer Electronics, 57(3):1212-1217, 2011.


[71]. Keyspot: Keyword spotting in continuous speech. https://www.idiap.ch/scientific- research/projects/keyword-spotting-in-continuous-speech .[Accessed: 09-May-2015]. [72]. P. Kumar, N. Jakhanwal, and M. Chandra. ―Text dependent speaker identification in noisy environment‖, International Conference on Devices and Communications (ICDeCom), pp. 1-4, 2011. [73]. J. Novakovic. ―Speaker identification in smart environments with multilayer perceptron‖. 19th Telecommunications Forum (TELFOR), pp 1418-1421, 2011. [74]. Fred Cummins, Marco Grimaldi, Thomas Leonard, and Juraj Simko. ―The chains : Characterizing individual speakers‖. Proceedings of SPECOM, pp. 1-6, 2006. [75]. K. S. Rao, A. K. Vuppala, S. Chakrabarti, and L. Dutta, ―Robust speaker recognition on mobile devices‖, International Conference on Signal Processing and Communications (SPCOM), 2010, pp. 1–5. [76]. Sébastien Marcel, Chris McCool, Pavel Matêjka, Timo Ahonen, Jan Černockÿ, Shayok Chakraborty, Vineeth Balasubramanian, Sethuraman Panchanathan, Chi Ho Chan, Josef Kittler, et al. ―On the results of the first mobile biometry (MOBIO) face and speaker verification evaluation‖. Recognizing Patterns in Signals, Speech, Images and Videos, pp. 210- 225. Springer, 2010. [77]. Tresadern, P. A., et al., ―Mobile biometrics (mobio): Joint face and voice verification for a mobile platform‖, IEEE Pervasive Computing, 2012. [78]. W.-S. Chen and J.-F. Huang, ―Speaker Recognition Using Spectral Dimension Features‖, Fourth International Multi- Conference on Computing in the Global Information Technology, 2009, pp. 132–137. [79]. M. Grimaldi and Fred Cummins. ―Speaker identification using instantaneous frequencies‖. IEEE Transactions on Audio, Speech, and Language Processing, 16(6):1097-1111, 2008. [80]. J. Ming, T.J. Hazen, and J.R. Glass. ―Acomparative study of methods for handheld speaker verification in realistic noisy conditions‖, The IEEE Odyssey 2006: Speaker and Language Recognition Workshop, pp. 1-8, 2006. [81]. Saastamoinen, Juhani, et al., ―Automatic speaker recognition for series 60 mobile devices‖, 9th Conference on Speech and Computer, 2004. [82]. Heinz Hertlein, Robert Frischholz, and Elmar Noth. ―Pass phrase based speaker recognition for authentication‖. Arslan Bromme and Christoph Busch, editors, BIOSIG, volume 31 of LNI, pp. 71-80. GI, 2003. [83]. R.W. Frischholz and U. Dieckmann. ―BioID: A Multimodal Biometric Identification


System‖. Computer, Vol 3(2):64-68, 2000. [84]. Tony Sabetti, JPMorgan Chase et el., ―Mobile Identity Authentication‖, Secure Technology Alliance, NJ, USA, March (2017) [85]. Zeng, J., Xie, L., Liu, Z.Q. ―Type-2 fuzzy Gaussian mixture models‖. Pattern Recognition, 41(12), pp. 3636–3643, 2008. [86]. Duda, R., Hart, P., Stork, D. Pattern classification. Citeseer, 2001. [87]. Lippmann, R.P. An introduction to computing with neural nets. ARIEL 209, pp. 115–245, 1987. [88]. Knight, K. ―Connectionist ideas and algorithms‖. Communications of the ACM 33(11), 74, 1990. [89]. Lang, K.J.,Waibel, A.H., Hinton, G.E. ―Atime-delay neural network architecture for isolated word recognition‖. Neural Network. 3(1), pp. 23–43, 1990. [90]. Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., Lang, K.J. ―Phoneme recognition using time-delay neural networks‖. IEEE Transactions on Speech Audio Process. 37(3), pp. 328–339, 1989. [91]. Morgan, N., Bourlard, H.: ―Continuous speech recognition using multilayer perceptrons with hidden Markov models‖. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 413–416 (1990). [92]. Morgan, N., Bourlard, H.A.: ―Neural networks for statistical recognition of continuous speech‖. Proc. IEEE 83(5), 742–772 (1995). [93]. Deng, L., Hinton, G., Yu, D.: ―Deep learning for speech recognition and related applications‖. In: NIPS Workshop. Whistler, Canada (2009). [94]. Mohamed, A.R., Dahl, G.E., Hinton, G.: ―Deep belief networks for phone recognition‖. In: NIPS Workshop on Deep Learning for Speech Recognition and Related Applications (2009). [95]. Deng, L., Hinton, G., Kingsbury, B.: ―New types of deep neural network learning for speech recognition and related applications: an overview‖. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). Vancouver, Canada (2013).

[96]. T. N. Sainath, A. Mohamed, B. Kingsbury, and B. Ramabhadran, ―Deep Convolutional Neural Networks for LVCSR,‖ in Proceedings of the International Conference on Acoustics, Speech and Signal Processing(ICASSP), 2013. [97]. Deng, L.,Yu, D., ―Use of differential cepstra as acoustic features in hidden trajectory modelling for phonetic recognition‖. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 445–448, 2007


[98]. Deng, L., Seltzer, M., Yu, D., Acero, A., Mohamed, A., Hinton, G.: ―Binary coding of speech spectrograms using a deep auto-encoder‖. In: Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH), 2010. [99]. Deng, L., Li, J., Huang, J.T., Yao, K., Yu, D., Seide, F., Seltzer, M., Zweig, G., He, X., Williams, J., Gong, Y., Acero, A.: ―Recent advances in deep learning for speech research at Microsoft‖. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). Vancouver, Canada, 2013. [100]. G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury, ―Deepneural networks for acoustic modeling in speech recognition: The shared views of four research groups,‖ IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012. [101]. B. Kingsbury, J. Cui, X. Cui, M.J.F. Gales, K. Knill, J. Mamou, L. Mangu, D. Nolden, M. Picheny, B. Ramabhadran, et al., ―Ahigh-performance Cantonese keyword search system,‖ in Proc. ICASSP. IEEE, pp. 8277–8281, 2013. [102]. A. R. Mohamed, G. E. Dahl, and G. Hinton, ―Acoustic modeling using deep belief networks,‖ IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 14–22, 2012. [103]. L. Deng, D. Yu, and J. Platt, ―Scalable stacking and learning for building deep architectures,‖ in Proc. ICASSP. IEEE, pp. 2133–2136, 2012. [104]. H. Hermansky, DPW Ellis, and S. Sharma, ―Tandem connectionist feature extraction for conventional HMM systems,‖ in Proc. ICASSP, vol. 3, pp. 1635–1638, 2000. [105]. F. Grezl, M. Karafi´at, S. Kont´ar, and J. Cernock`y, ―Probabilistic and bottle-neck features for LVCSR of meetings.,‖ in Proc. ICASSP, pp. 757–760, 2007. [106]. K. Shinoda and T.Watanabe, ―Acoustic modeling based on the MDL criterion for speech recognition,‖ in Proc. Eurospeech, vol. 1, pp. 99–102, 1997. [107]. S. S. Chen and P. S. Gopalakrishnan, ―Clustering via the Bayesian information criterion with applications in speech recognition,‖ in Proc. ICASSP, vol. 2, pp. 645–648, 1998. [108]. S. Watanabe, Y. Minami, A. Nakamura, and N. Ueda, ―Variational Bayesian estimation and clustering for speech recognition,‖ IEEE Transactions on Speech and Audio Processing, vol. 12, pp. 365–381, 2004. [109]. X. Liu and M. Gales, ―Automatic model complexity control using marginalized discriminative growth functions,‖ IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 4, pp. 1414–1424, 2007.


[110]. T. Shinozaki, S. Furui, and T. Kawahara, ―Gaussian mixture optimization based on efficient cross-validation,‖ IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 3, pp. 540–547, 2010. [111]. Seide, F., Li, G., Yu, D., ―Conversational speech transcription using context-dependent deep neural networks‖. Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), pp. 437–440, 2011. [112]. C. W. Chau, S. Kwong, C. K. Diu, and W. R. Fahrner, ―Optimization of HMM by a genetic algorithm,‖ in Proc. ICASSP, vol. 3, pp. 1727–1730, 1997 [113]. S. Kwong, C. W. Chau, K. F. Man, and K. S. Tang, ―Optimisation of HMM topology and its model parameters by genetic algorithms,‖ Pattern Recognition, vol. 34, no. 2, pp. 509–522, 2001. [114]. F. Yang, C. Zhang, and T. Sun, ―Comparison of particle swarm optimization and genetic algorithm for HMM training,‖ in Proc. ICPR, pp. 1–4, 2008. [115]. G. E. Dahl, T. N. Sainath, and G. E. Hinton, ―Improving deep neural networks for LVCSR using rectified linear units and dropout,‖ in Proc. ICASSP, pp. 8609–8613, 2013. [116]. S. Watanabe and J. Le Roux, ―Black box optimization for automatic speech recognition,‖ in Proc. ICASSP. IEEE, pp. 3256–3260, 2014. [117]. L. Davis et al., Handbook of genetic algorithms, vol. 115, Van Nostrand Reinhold New York, 1991. [118]. N. Hansen, S. D. M¨uller, and P. Koumoutsakos, ―Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES),‖ Evolutionary Computation, vol. 11, no. 1, pp. 1–18, 2003. [119]. N. Hansen, A. Auger, R. Ros, S. Finck, and P. Poˇs´ık, ―Comparing results of 31 algorithms from the black-box optimization benchmarking BBOB-2009,‖ Proceedings of the 12th annual conference companion on Genetic and evolutionary computation (GECCO), pp. 1689–1696, 2010. [120]. Shinozaki, T.; Watanabe, S., ―Structure Discovery of Deep Neural Network Based on Evolutionary Algorithms‖, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015 [121]. Felipe Petroski Such, Vashisht Madhavan, Edoardo Conti, Joel Lehman, Kenneth O. Stanley, Jeff Clune, ―Deep Neuroevolution: Genetic Algorithms are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning‖, Uber AI Labs paper, arXiv: submit/2107557 [cs.AI], Dec 2017


[122]. Nicholas Evans, Tomi Kinnunen, and Junichi Yamagishi. ―Spoofing and countermeasures for automatic speaker verification‖. INTERSPEECH, 2013. [123]. P.Z. Patrick, G. Aversano, Raphael Blouet, M. Charbit, and G. Chollet. ―Voice forgery using alisp: Indexation in a client memory”. IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. (ICASSP '05), volume 1, pp. 17-20, 2005. [124]. Johnny Mariéthoz and Samy Bengio. ―Cana professional imitator fool a GMM-based speaker verification system?‖ Idiap-RR-61-2005, IDIAP, 2005. [125]. Jesus Villalba and Eduardo Lleida. ―Detecting replay attacks from far-field recordings on speaker verification systems‖. Biometrics and ID Management, volume 6583 of Lecture Notes in Computer Science, Editors: Claus Vielhauer, Jana Dittmann, Andrzej Drygajlo, NielsChristian Juul, and MichaelC. Fairhurst, pp. 274-285. Springer Berlin Heidelberg, 2011. [126]. J. Villalba and E. Lleida. ―Preventing replay attacks on speaker verification systems‖. IEEE International Carnahan Conference on Security Technology (ICCST), 2011, pp. 1-8, 2011. [127]. Takashi Masuko, Takafumi Hitotsumatsu, Keiichi Tokuda, and Takao Kobayashi. ―Onthe security of hmm-based speaker verification systems against imposture using synthetic speech‖. Proceedings of the European Conference on Speech Communication and Technology, pp. 1223-1226, 1999. [128]. Guillaume Galou, Gérard Chollet, et al. Synthetic voice forgery in the forensic context: a short tutorial. Forensic Speech and Audio Analysis Working Group (ENFSI-FSAAWG), 2011. [129]. Phillip L. De Leon, Bryan Stewart, and Junichi Yamagishi. ―Synthetic speech discrimination using pitch pattern statistics derived from image analysis”. INTERSPEECH. ISCA, 2012. [130]. P.L. De Leon, M. Pucher, J. Yamagishi, I. Hernaez, and I. Saratxaga. ―Evaluation of speaker verification security and detection of HMM-based synthetic speech‖. IEEE Transactions on Audio, Speech, and Language Processing, 20(8): pp. 2280-2290, 2012. [131]. P.L. De Leon and B. Stewart. ―Synthetic speech detection based on selected word discriminators‖. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3004-3008, 2013. [132]. Lian-Wu Chen, Wu Guo, and Li-Rong Dai. ―Speaker verification against synthetic speech‖. 7th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 309-312, 2010. [133]. Y. Stylianou. ―Voice transformation: A survey‖. IEEE International Conference on


Acoustics, Speech and Signal Processing, ICASSP, pp. 3585-3588, 2009. [134]. Federico Alegre, Ravichander Vipperla, Nicholas Evans, and Benoît Fauve. ―Onthe vulnerability of automatic speaker recognition to spoofing attacks with artificial signals‖. Proceedings of the 20th European Signal Processing Conference (EUSIPCO), IEEE, pp. 36-40, 2012. [135]. Federico Alegre, Ravichander Vipperla, Nicholas Evans, et al. ―Spoofing countermeasures for the protection of automatic speaker recognition systems against attacks with artificial signals‖. 13th Annual Conference of the International Speech Communication Association, INTERSPEECH, 2012. [136]. Federico Alegre, Asmaa Amehraye, Nicholas Evans, et al. ―Spoofing countermeasures to protect automatic speaker verification from voice conversion‖. IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, Vancouver, Canada, 2013. [137]. Zhi-ZhengWu, Chng Eng Siong, and Haizhou Li. ―Detecting converted speech and natural speech for anti-spoofing attack in speaker recognition‖. INTERSPEECH, 2012. [138]. Sajad Shirali-Shahreza, Yashar Ganjali, and Ravin Balakrishnan. ―Verifying human users in speech-based interactions‖. INTERSPEECH, ISCA, pp. 1585-1588, 2011. [139]. Sajad Shirali-Shahreza, Gerald Penn, Ravin Balakrishnan, and Yashar Ganjali. ―Seesay and hearsay captcha for mobile interaction‖. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '13, ACM, New York, USA. pp. 2147-2156, 2013. [140]. F Soong, A Rosenberg, L Rabiner, and B Juang. ―Avector quantization approach to speaker recognition‖. IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP'85, volume 10, pp. 387-390. IEEE, 1985. [141]. Trusted biometrics under spoofing attacks (tabula rasa). http://www.tabularasa- euproject.org/. .[Accessed: 03-June-2015] [142]. Kong-Aik Lee, Anthony Larcher, Helen Thai, Bin Ma, and Haizhou Li. ―Joint application of speech and speaker recognition for automation and security in smart home‖. INTERSPEECH, ISCA pp. 3317-3318. 2011. [143]. ―Biometrics Evaluation and Testing (BEAT)‖. http://www.beat-eu.org/. [Accessed: 14- May-2015] [144]. T. Moriya, T. Tanaka, T. Shinozaki, S. Watanabe and K. Duh, "Automation of system building for state-of-the-art large vocabulary speech recognition using evolution strategy," 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Scottsdale, AZ, pp. 610-616. 2015.


[145]. Anne Spalanzani. ―Evolutionary Speech Recognition‖. Robust Speech Recognition and Understanding, Grimm and Kroschel (editors). I-Tech, 2007 [146]. Chelba Ciprian, Bikel Dan, Shugrina Maria, Nguyen Patrick, Kumar Shankar. ―Large scale language modelling in automatic speech recognition.‖ Google Research. 2012.


Appendix A: Setting of Hyper Parameters

A.1 Iteration 1

net_model: ['Dense', 'LSTM'],
nb_layers: [1, 2, 3, 4],
patience: [4, 6, 8, 10],
nb_neurons: [64, 128, 256, 512, 768, 1024],
activation: ['relu', 'elu', 'tanh', 'sigmoid'],
loss: ['mean_squared_error', 'categorical_crossentropy'],
optimizer: ['rmsprop', 'adam', 'sgd', 'adagrad', 'adadelta', 'adamax', 'nadam'],
batch_size: [20, 50, 100, 200],
epochs: [10, 20, 50, 100, 200]
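How the GA might draw an individual from this search space can be illustrated with a small sketch. The dictionary below mirrors the listing above; the sampling function and the population size are illustrative assumptions rather than the thesis code.

# Illustrative sketch: drawing random network descriptions from the Iteration-1 choices.
import random

nn_param_choices = {
    'net_model':  ['Dense', 'LSTM'],
    'nb_layers':  [1, 2, 3, 4],
    'patience':   [4, 6, 8, 10],
    'nb_neurons': [64, 128, 256, 512, 768, 1024],
    'activation': ['relu', 'elu', 'tanh', 'sigmoid'],
    'loss':       ['mean_squared_error', 'categorical_crossentropy'],
    'optimizer':  ['rmsprop', 'adam', 'sgd', 'adagrad', 'adadelta', 'adamax', 'nadam'],
    'batch_size': [20, 50, 100, 200],
    'epochs':     [10, 20, 50, 100, 200],
}


def random_network(choices):
    """Pick one value for every hyper-parameter to form an individual of the population."""
    return {param: random.choice(values) for param, values in choices.items()}


population = [random_network(nn_param_choices) for _ in range(20)]  # 20 networks per generation
print(population[0])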


Appendix B: OUTPUTS

B.1 Iteration-1

0%| | 0/20 [00:00

(None, 19, 10, 32)

(None, 9, 5, 32)

(None, 60, 24)

(None, 8)

Train on 3112 samples, validate on 2076 samples

Epoch 1/200

3112/3112 [======] - 1597s 513ms/step - loss: 1.1653 - acc: 0.3229 - val_loss: 1.1244 - val_acc: 0.3420

Epoch 2/200

3112/3112 [======] - 1554s 499ms/step - loss: 1.1338 - acc: 0.3297 - val_loss: 1.1070 - val_acc: 0.3420

Epoch 3/200

3112/3112 [======] - 1556s 500ms/step - loss: 1.1181 - acc: 0.3223 - val_loss: 1.1008 - val_acc: 0.3420

Epoch 4/200

3112/3112 [======] - 1577s 507ms/step - loss: 1.1140 - acc: 0.3377 - val_loss: 1.0990 - val_acc: 0.3420

Epoch 5/200

3112/3112 [======] - 1556s 500ms/step - loss: 1.1054 - acc: 0.3422 - val_loss: 1.0986 - val_acc: 0.3420

Epoch 6/200

3112/3112 [======] - 1591s 511ms/step - loss: 1.1110 - acc: 0.3271 - val_loss: 1.0987 - val_acc: 0.3285

Epoch 7/200

3112/3112 [======] - 1555s 500ms/step - loss: 1.1101 - acc: 0.3323 - val_loss: 1.0989 - val_acc: 0.3285

Epoch 8/200

3112/3112 [======] - 1555s 500ms/step - loss: 1.1105 - acc: 0.3307 - val_loss: 1.0991 - val_acc: 0.3285

Epoch 9/200

3112/3112 [======] - 1570s 505ms/step - loss: 1.1080 - acc: 0.3307 - val_loss: 1.0993 - val_acc: 0.3285

Epoch 10/200

3112/3112 [======] - 1560s 501ms/step - loss: 1.1062 - acc: 0.3416 - val_loss: 1.0994 - val_acc: 0.3285

2076/2076 [======] - 296s 143ms/step

5%|██████▏ | 1/20 [4:26:41<84:27:05, 16001.34s/it] batch_size: 50


epochs: 200

(None, 19, 10, 32)

(None, 9, 5, 32)

(None, 60, 24)

(None, 8)

Train on 3112 samples, validate on 2076 samples

Epoch 1/200

3112/3112 [======] - 1069s 344ms/step - loss: 0.2221 - acc: 0.3570 - val_loss: 0.2213 - val_acc: 0.3854

Epoch 2/200

3112/3112 [======] - 1067s 343ms/step - loss: 0.2213 - acc: 0.3676 - val_loss: 0.2207 - val_acc: 0.3940

Epoch 3/200

3112/3112 [======] - 1065s 342ms/step - loss: 0.2202 - acc: 0.3811 - val_loss: 0.2199 - val_acc: 0.3892

Epoch 4/200

3112/3112 [======] - 1070s 344ms/step - loss: 0.2195 - acc: 0.4026 - val_loss: 0.2197 - val_acc: 0.3935

Epoch 5/200

3112/3112 [======] - 1069s 344ms/step - loss: 0.2189 - acc: 0.3988 - val_loss: 0.2183 - val_acc: 0.4143

Epoch 6/200

3112/3112 [======] - 1071s 344ms/step - loss: 0.2170 - acc: 0.4049 - val_loss: 0.2175 - val_acc: 0.4037

Epoch 7/200

3112/3112 [======] - 1075s 345ms/step - loss: 0.2163 - acc: 0.4181 - val_loss: 0.2158 - val_acc: 0.4181

Epoch 8/200

3112/3112 [======] - 1075s 345ms/step - loss: 0.2148 - acc: 0.4242 - val_loss: 0.2139 - val_acc: 0.4258

Epoch 9/200

3112/3112 [======] - 1072s 344ms/step - loss: 0.2137 - acc: 0.4280 - val_loss: 0.2148 - val_acc: 0.4244

Epoch 10/200

3112/3112 [======] - 1068s 343ms/step - loss: 0.2121 - acc: 0.4338 - val_loss: 0.2143 - val_acc: 0.4234

Epoch 11/200

3112/3112 [======] - 1072s 345ms/step - loss: 0.2129 - acc: 0.4377 - val_loss: 0.2129 - val_acc: 0.4355

Epoch 12/200

3112/3112 [======] - 1072s 345ms/step - loss: 0.2112 - acc: 0.4486 - val_loss: 0.2102 - val_acc: 0.4470

Epoch 13/200

3112/3112 [======] - 1072s 344ms/step - loss: 0.2094 - acc: 0.4605 - val_loss: 0.2100 - val_acc: 0.4499

Epoch 14/200

3112/3112 [======] - 1073s 345ms/step - loss: 0.2110 - acc: 0.4473 - val_loss: 0.2111 - val_acc: 0.4408

Epoch 15/200

3112/3112 [======] - 1071s 344ms/step - loss: 0.2092 - acc: 0.4544 - val_loss: 0.2113 - val_acc: 0.4412

Epoch 16/200


3112/3112 [======] - 1071s 344ms/step - loss: 0.2081 - acc: 0.4679 - val_loss: 0.2083 - val_acc: 0.4610

Epoch 17/200

3112/3112 [======] - 1070s 344ms/step - loss: 0.2080 - acc: 0.4714 - val_loss: 0.2103 - val_acc: 0.4562

Epoch 18/200

3112/3112 [======] - 1069s 344ms/step - loss: 0.2065 - acc: 0.4698 - val_loss: 0.2093 - val_acc: 0.4634

Epoch 19/200

3112/3112 [======] - 1071s 344ms/step - loss: 0.2069 - acc: 0.4711 - val_loss: 0.2056 - val_acc: 0.4783

Epoch 20/200

3112/3112 [======] - 1069s 344ms/step - loss: 0.2066 - acc: 0.4653 - val_loss: 0.2063 - val_acc: 0.4677

Epoch 21/200

3112/3112 [======] - 1069s 344ms/step - loss: 0.2062 - acc: 0.4653 - val_loss: 0.2055 - val_acc: 0.4745

Epoch 22/200

3112/3112 [======] - 1068s 343ms/step - loss: 0.2044 - acc: 0.4778 - val_loss: 0.2045 - val_acc: 0.4865

Epoch 23/200

3112/3112 [======] - 1864s 599ms/step - loss: 0.2050 - acc: 0.4762 - val_loss: 0.2048 - val_acc: 0.4759

Epoch 24/200

3112/3112 [======] - 2558s 822ms/step - loss: 0.2043 - acc: 0.4881 - val_loss: 0.2049 - val_acc: 0.4778

Epoch 25/200

3112/3112 [======] - 1323s 425ms/step - loss: 0.2044 - acc: 0.4823 - val_loss: 0.2061 - val_acc: 0.4706

Epoch 26/200

3112/3112 [======] - 1179s 379ms/step - loss: 0.2046 - acc: 0.4878 - val_loss: 0.2029 - val_acc: 0.4865

Epoch 27/200

3112/3112 [======] - 1195s 384ms/step - loss: 0.2041 - acc: 0.4868 - val_loss: 0.2063 - val_acc: 0.4706

Epoch 28/200

3112/3112 [======] - 1243s 399ms/step - loss: 0.2054 - acc: 0.4865 - val_loss: 0.2028 - val_acc: 0.4884

Epoch 29/200

3112/3112 [======] - 1225s 394ms/step - loss: 0.2045 - acc: 0.4733 - val_loss: 0.2019 - val_acc: 0.4986

Epoch 30/200

3112/3112 [======] - 1202s 386ms/step - loss: 0.2028 - acc: 0.4904 - val_loss: 0.2019 - val_acc: 0.4836

Epoch 31/200

3112/3112 [======] - 1754s 564ms/step - loss: 0.2030 - acc: 0.4871 - val_loss: 0.2016 - val_acc: 0.4904

Epoch 32/200

3112/3112 [======] - 2246s 722ms/step - loss: 0.2041 - acc: 0.4862 - val_loss: 0.2017 - val_acc: 0.5000

Epoch 33/200

3112/3112 [======] - 2302s 740ms/step - loss: 0.2012 - acc: 0.4952 - val_loss: 0.2008 - val_acc: 0.4937

Epoch 34/200

3112/3112 [======] - 2094s 673ms/step - loss: 0.2028 - acc: 0.4949 - val_loss: 0.2050 - val_acc: 0.4807


Epoch 35/200

3112/3112 [======] - 1219s 392ms/step - loss: 0.2022 - acc: 0.4936 - val_loss: 0.2002 - val_acc: 0.4981

Epoch 36/200

3112/3112 [======] - 1244s 400ms/step - loss: 0.1998 - acc: 0.5074 - val_loss: 0.2006 - val_acc: 0.5000

Epoch 37/200

3112/3112 [======] - 1261s 405ms/step - loss: 0.2006 - acc: 0.5035 - val_loss: 0.2008 - val_acc: 0.4976

Epoch 38/200

3112/3112 [======] - 1258s 404ms/step - loss: 0.2008 - acc: 0.4907 - val_loss: 0.2003 - val_acc: 0.4957

Epoch 39/200

3112/3112 [======] - 1251s 402ms/step - loss: 0.1991 - acc: 0.5093 - val_loss: 0.2004 - val_acc: 0.4990

Epoch 40/200

3112/3112 [======] - 1258s 404ms/step - loss: 0.1997 - acc: 0.5061 - val_loss: 0.2002 - val_acc: 0.5014

2076/2076 [======] - 239s 115ms/step

10%|████████████ | 2/20 [18:44:37<133:13:06, 26643.68s/it] batch_size: 100 epochs: 100

(None, 7040)

Train on 3112 samples, validate on 2076 samples

Epoch 1/100

3112/3112 [======] - 1456s 468ms/step - loss: 0.9363 - acc: 0.6234 - val_loss: 0.9026 - val_acc: 0.6686

Epoch 2/100

3112/3112 [======] - 1461s 470ms/step - loss: 0.6622 - acc: 0.7326 - val_loss: 1.0759 - val_acc: 0.6440

Epoch 3/100

3112/3112 [======] - 1463s 470ms/step - loss: 0.5983 - acc: 0.7699 - val_loss: 0.6773 - val_acc: 0.7355

Epoch 4/100

3112/3112 [======] - 1462s 470ms/step - loss: 0.5418 - acc: 0.7882 - val_loss: 0.6516 - val_acc: 0.7630

Epoch 5/100

3112/3112 [======] - 1470s 472ms/step - loss: 0.5353 - acc: 0.7921 - val_loss: 0.9718 - val_acc: 0.6618

Epoch 6/100

3112/3112 [======] - 1462s 470ms/step - loss: 0.5200 - acc: 0.7953 - val_loss: 0.9651 - val_acc: 0.7081

Epoch 7/100

3112/3112 [======] - 1458s 469ms/step - loss: 0.4887 - acc: 0.8152 - val_loss: 0.6047 - val_acc: 0.7726

Epoch 8/100

3112/3112 [======] - 2464s 792ms/step - loss: 0.4824 - acc: 0.8101 - val_loss: 0.8829 - val_acc: 0.7038

Epoch 9/100

3112/3112 [======] - 52821s 17s/step - loss: 0.4652 - acc: 0.8281 - val_loss: 0.7024 - val_acc: 0.7505

Epoch 10/100


3112/3112 [======] - 1528s 491ms/step - loss: 0.4505 - acc: 0.8297 - val_loss: 0.6461 - val_acc: 0.7673

Epoch 11/100

3112/3112 [======] - 1559s 501ms/step - loss: 0.4657 - acc: 0.8210 - val_loss: 0.8176 - val_acc: 0.6946

Epoch 12/100

3112/3112 [======] - 1534s 493ms/step - loss: 0.4633 - acc: 0.8326 - val_loss: 0.5705 - val_acc: 0.7856

Epoch 13/100

3112/3112 [======] - 1396s 449ms/step - loss: 0.4411 - acc: 0.8348 - val_loss: 0.5856 - val_acc: 0.7890

Epoch 14/100

3112/3112 [======] - 1258s 404ms/step - loss: 0.4500 - acc: 0.8246 - val_loss: 0.6029 - val_acc: 0.7750

Epoch 15/100

3112/3112 [======] - 2487s 799ms/step - loss: 0.4479 - acc: 0.8361 - val_loss: 0.6422 - val_acc: 0.7500

Epoch 16/100

3112/3112 [======] - 2653s 852ms/step - loss: 0.4443 - acc: 0.8316 - val_loss: 0.5792 - val_acc: 0.7900

Epoch 17/100

3112/3112 [======] - 1496s 481ms/step - loss: 0.4368 - acc: 0.8319 - val_loss: 0.6411 - val_acc: 0.7803

2076/2076 [======] - 212s 102ms/step

15%|██████████████████▏ | 3/20 [40:51:58<200:53:52, 42543.11s/it] batch_size: 100 epochs: 100

(None, 19, 10, 32)

(None, 9, 5, 32)

(None, 60, 24)

(None, 64)

Train on 3112 samples, validate on 2076 samples

Epoch 1/100

3112/3112 [======] - 6084s 2s/step - loss: 0.4205 - acc: 0.3361 - val_loss: 0.4370 - val_acc: 0.3338

Epoch 2/100

3112/3112 [======] - 8459s 3s/step - loss: 0.4366 - acc: 0.3345 - val_loss: 0.4333 - val_acc: 0.3434

Epoch 3/100

3112/3112 [======] - 5975s 2s/step - loss: 0.4373 - acc: 0.3361 - val_loss: 0.4227 - val_acc: 0.3526

Epoch 4/100

3112/3112 [======] - 5943s 2s/step - loss: 0.4370 - acc: 0.3348 - val_loss: 0.4447 - val_acc: 0.3276

Epoch 5/100

3112/3112 [======] - 5915s 2s/step - loss: 0.4411 - acc: 0.3313 - val_loss: 0.4309 - val_acc: 0.3502

Epoch 6/100

3112/3112 [======] - 5937s 2s/step - loss: 0.4331 - acc: 0.3477 - val_loss: 0.4362 - val_acc: 0.3434

Epoch 7/100


3112/3112 [======] - 5929s 2s/step - loss: 0.4399 - acc: 0.3358 - val_loss: 0.4318 - val_acc: 0.3478

Epoch 8/100

3112/3112 [======] - 5912s 2s/step - loss: 0.4409 - acc: 0.3352 - val_loss: 0.4440 - val_acc: 0.3324

2076/2076 [======] - 953s 459ms/step

20%|████████████████████████▏ | 4/20 [55:04:02<200:31:18, 45117.39s/it] batch_size: 50 epochs: 100

(None, 19, 10, 32)

(None, 9, 5, 32)

(None, 60, 24)

(None, 8)

Train on 3112 samples, validate on 2076 samples

Epoch 1/100

3112/3112 [======] - 1211s 389ms/step - loss: 0.2342 - acc: 0.3647 - val_loss: 0.2204 - val_acc: 0.3704

Epoch 2/100

3112/3112 [======] - 1216s 391ms/step - loss: 0.2230 - acc: 0.3621 - val_loss: 0.2201 - val_acc: 0.3791

Epoch 3/100

3112/3112 [======] - 1210s 389ms/step - loss: 0.2211 - acc: 0.3663 - val_loss: 0.2198 - val_acc: 0.4032

Epoch 4/100

3112/3112 [======] - 1215s 390ms/step - loss: 0.2193 - acc: 0.3834 - val_loss: 0.2188 - val_acc: 0.4167

Epoch 5/100

3112/3112 [======] - 1214s 390ms/step - loss: 0.2200 - acc: 0.3756 - val_loss: 0.2189 - val_acc: 0.4056

Epoch 6/100

3112/3112 [======] - 1215s 390ms/step - loss: 0.2191 - acc: 0.3978 - val_loss: 0.2183 - val_acc: 0.4017

Epoch 7/100

3112/3112 [======] - 1212s 389ms/step - loss: 0.2186 - acc: 0.4010 - val_loss: 0.2181 - val_acc: 0.4090

Epoch 8/100

3112/3112 [======] - 1212s 389ms/step - loss: 0.2185 - acc: 0.3978 - val_loss: 0.2174 - val_acc: 0.4186

Epoch 9/100

3112/3112 [======] - 1214s 390ms/step - loss: 0.2175 - acc: 0.4078 - val_loss: 0.2164 - val_acc: 0.4229

Epoch 10/100

3112/3112 [======] - 1228s 395ms/step - loss: 0.2174 - acc: 0.4087 - val_loss: 0.2161 - val_acc: 0.4249

Epoch 11/100

3112/3112 [======] - 1215s 390ms/step - loss: 0.2178 - acc: 0.3936 - val_loss: 0.2159 - val_acc: 0.4224

Epoch 12/100

3112/3112 [======] - 1215s 391ms/step - loss: 0.2158 - acc: 0.4126 - val_loss: 0.2144 - val_acc: 0.4277

Epoch 13/100


3112/3112 [======] - 1214s 390ms/step - loss: 0.2170 - acc: 0.4110 - val_loss: 0.2145 - val_acc: 0.4311

Epoch 14/100

3112/3112 [======] - 1226s 394ms/step - loss: 0.2163 - acc: 0.4129 - val_loss: 0.2136 - val_acc: 0.4316

Epoch 15/100

3112/3112 [======] - 1219s 392ms/step - loss: 0.2157 - acc: 0.4206 - val_loss: 0.2132 - val_acc: 0.4374

Epoch 16/100

3112/3112 [======] - 1221s 392ms/step - loss: 0.2146 - acc: 0.4206 - val_loss: 0.2138 - val_acc: 0.4422

Epoch 17/100

3112/3112 [======] - 1224s 393ms/step - loss: 0.2150 - acc: 0.4145 - val_loss: 0.2128 - val_acc: 0.4408

Epoch 18/100

3112/3112 [======] - 1217s 391ms/step - loss: 0.2146 - acc: 0.4242 - val_loss: 0.2127 - val_acc: 0.4417

Epoch 19/100

3112/3112 [======] - 1216s 391ms/step - loss: 0.2152 - acc: 0.4232 - val_loss: 0.2120 - val_acc: 0.4451

Epoch 20/100

3112/3112 [======] - 1217s 391ms/step - loss: 0.2145 - acc: 0.4181 - val_loss: 0.2114 - val_acc: 0.4417

Epoch 21/100

3112/3112 [======] - 1215s 391ms/step - loss: 0.2132 - acc: 0.4364 - val_loss: 0.2115 - val_acc: 0.4422

Epoch 22/100

3112/3112 [======] - 1213s 390ms/step - loss: 0.2134 - acc: 0.4287 - val_loss: 0.2120 - val_acc: 0.4388

Epoch 23/100

3112/3112 [======] - 1215s 390ms/step - loss: 0.2138 - acc: 0.4168 - val_loss: 0.2107 - val_acc: 0.4383

Epoch 24/100

3112/3112 [======] - 1216s 391ms/step - loss: 0.2134 - acc: 0.4354 - val_loss: 0.2107 - val_acc: 0.4417

Epoch 25/100

3112/3112 [======] - 1325s 426ms/step - loss: 0.2135 - acc: 0.4344 - val_loss: 0.2107 - val_acc: 0.4436

Epoch 26/100

3112/3112 [======] - 1307s 420ms/step - loss: 0.2127 - acc: 0.4299 - val_loss: 0.2105 - val_acc: 0.4408

Epoch 27/100

3112/3112 [======] - 1220s 392ms/step - loss: 0.2115 - acc: 0.4348 - val_loss: 0.2101 - val_acc: 0.4427

Epoch 28/100

3112/3112 [======] - 1227s 394ms/step - loss: 0.2126 - acc: 0.4361 - val_loss: 0.2103 - val_acc: 0.4427

Epoch 29/100

3112/3112 [======] - 1216s 391ms/step - loss: 0.2121 - acc: 0.4280 - val_loss: 0.2099 - val_acc: 0.4436

Epoch 30/100

3112/3112 [======] - 1213s 390ms/step - loss: 0.2125 - acc: 0.4254 - val_loss: 0.2107 - val_acc: 0.4417

Epoch 31/100

3112/3112 [======] - 1304s 419ms/step - loss: 0.2133 - acc: 0.4145 - val_loss: 0.2106 - val_acc: 0.4408

Epoch 32/100

3112/3112 [======] - 1356s 436ms/step - loss: 0.2138 - acc: 0.4341 - val_loss: 0.2106 - val_acc: 0.4403

Epoch 33/100

3112/3112 [======] - 1683s 541ms/step - loss: 0.2115 - acc: 0.4364 - val_loss: 0.2099 - val_acc: 0.4427

Epoch 34/100

3112/3112 [======] - 1209s 388ms/step - loss: 0.2127 - acc: 0.4277 - val_loss: 0.2097 - val_acc: 0.4432

Epoch 35/100

3112/3112 [======] - 1207s 388ms/step - loss: 0.2120 - acc: 0.4293 - val_loss: 0.2097 - val_acc: 0.4379

Epoch 36/100

3112/3112 [======] - 1206s 388ms/step - loss: 0.2115 - acc: 0.4486 - val_loss: 0.2096 - val_acc: 0.4383

Epoch 37/100

3112/3112 [======] - 1208s 388ms/step - loss: 0.2112 - acc: 0.4409 - val_loss: 0.2093 - val_acc: 0.4451

Epoch 38/100

3112/3112 [======] - 1289s 414ms/step - loss: 0.2108 - acc: 0.4518 - val_loss: 0.2092 - val_acc: 0.4441

Epoch 39/100

3112/3112 [======] - 1332s 428ms/step - loss: 0.2113 - acc: 0.4441 - val_loss: 0.2092 - val_acc: 0.4422

Epoch 40/100

3112/3112 [======] - 1328s 427ms/step - loss: 0.2122 - acc: 0.4438 - val_loss: 0.2095 - val_acc: 0.4383

Epoch 41/100

3112/3112 [======] - 1701s 547ms/step - loss: 0.2105 - acc: 0.4402 - val_loss: 0.2090 - val_acc: 0.4432

Epoch 42/100

3112/3112 [======] - 5963s 2s/step - loss: 0.2117 - acc: 0.4287 - val_loss: 0.2090 - val_acc: 0.4412

Epoch 43/100

3112/3112 [======] - 1373s 441ms/step - loss: 0.2126 - acc: 0.4283 - val_loss: 0.2091 - val_acc: 0.4417

Epoch 44/100

3112/3112 [======] - 1400s 450ms/step - loss: 0.2113 - acc: 0.4338 - val_loss: 0.2091 - val_acc: 0.4436

Epoch 45/100

3112/3112 [======] - 1256s 403ms/step - loss: 0.2100 - acc: 0.4473 - val_loss: 0.2092 - val_acc: 0.4393

Epoch 46/100

3112/3112 [======] - 1366s 439ms/step - loss: 0.2112 - acc: 0.4418 - val_loss: 0.2097 - val_acc: 0.4417

2076/2076 [======] - 301s 145ms/step

25%|██████████████████████████████▎ | 5/20 [72:37:11<210:34:42, 50538.84s/it] batch_size: 20 epochs: 100

(None, 14080)

Train on 3112 samples, validate on 2076 samples

Epoch 1/100

3112/3112 [======] - 3549s 1s/step - loss: 0.7098 - acc: 0.7163 - val_loss: 0.5832 - val_acc: 0.7697

Epoch 2/100

3112/3112 [======] - 3486s 1s/step - loss: 0.5412 - acc: 0.7898 - val_loss: 0.5477 - val_acc: 0.7909

Epoch 3/100

3112/3112 [======] - 3375s 1s/step - loss: 0.4929 - acc: 0.8114 - val_loss: 0.6008 - val_acc: 0.7828

Epoch 4/100

3112/3112 [======] - 3776s 1s/step - loss: 0.4379 - acc: 0.8271 - val_loss: 0.6119 - val_acc: 0.7881

Epoch 5/100

3112/3112 [======] - 3979s 1s/step - loss: 0.4214 - acc: 0.8458 - val_loss: 0.5813 - val_acc: 0.7856

Epoch 6/100

3112/3112 [======] - 5006s 2s/step - loss: 0.4064 - acc: 0.8474 - val_loss: 0.5570 - val_acc: 0.7958

Epoch 7/100

3112/3112 [======] - 3734s 1s/step - loss: 0.3738 - acc: 0.8609 - val_loss: 0.5901 - val_acc: 0.7958

2076/2076 [======] - 429s 207ms/step

30%|████████████████████████████████████▎ | 6/20 [80:12:52<169:28:29, 43579.24s/it] batch_size: 100 epochs: 200

(None, 19, 10, 32)

(None, 9, 5, 32)

(None, 60, 24)

(None, 32)

Train on 3112 samples, validate on 2076 samples

Epoch 1/200

3112/3112 [======] - 2086s 670ms/step - loss: 0.4252 - acc: 0.3380 - val_loss: 0.4324 - val_acc: 0.3290

Epoch 2/200

3112/3112 [======] - 2051s 659ms/step - loss: 0.4252 - acc: 0.3380 - val_loss: 0.4318 - val_acc: 0.3304

Epoch 3/200

3112/3112 [======] - 2020s 649ms/step - loss: 0.4349 - acc: 0.3146 - val_loss: 0.4255 - val_acc: 0.3333

Epoch 4/200

3112/3112 [======] - 2015s 648ms/step - loss: 0.4042 - acc: 0.3461 - val_loss: 0.4026 - val_acc: 0.3319

Epoch 5/200

3112/3112 [======] - 2026s 651ms/step - loss: 0.3993 - acc: 0.3435 - val_loss: 0.4136 - val_acc: 0.3372

Epoch 6/200

3112/3112 [======] - 2037s 655ms/step - loss: 0.4099 - acc: 0.3345 - val_loss: 0.4130 - val_acc: 0.3396

Epoch 7/200

3112/3112 [======] - 2029s 652ms/step - loss: 0.4043 - acc: 0.3445 - val_loss: 0.4085 - val_acc: 0.3459

Epoch 8/200

3112/3112 [======] - 2039s 655ms/step - loss: 0.4001 - acc: 0.3499 - val_loss: 0.4042 - val_acc: 0.3444

Epoch 9/200

3112/3112 [======] - 31642s 10s/step - loss: 0.4089 - acc: 0.3406 - val_loss: 0.4072 - val_acc: 0.3377

2076/2076 [======] - 4633s 2s/step

35%|██████████████████████████████████████████▎ | 7/20 [94:49:48<167:09:36, 46290.46s/it] batch_size: 50 epochs: 10

(None, 3520)

Train on 3112 samples, validate on 2076 samples

Epoch 1/10

3112/3112 [======] - 1525s 490ms/step - loss: 0.8329 - acc: 0.6465 - val_loss: 0.7537 - val_acc: 0.6927

Epoch 2/10

3112/3112 [======] - 29326s 9s/step - loss: 0.6565 - acc: 0.7355 - val_loss: 0.7249 - val_acc: 0.7081

Epoch 3/10

3112/3112 [======] - 747s 240ms/step - loss: 0.6203 - acc: 0.7487 - val_loss: 0.7099 - val_acc: 0.7298

Epoch 4/10

3112/3112 [======] - 748s 240ms/step - loss: 0.6038 - acc: 0.7535 - val_loss: 0.6413 - val_acc: 0.7457

Epoch 5/10

3112/3112 [======] - 748s 240ms/step - loss: 0.5622 - acc: 0.7728 - val_loss: 0.6709 - val_acc: 0.7529

Epoch 6/10

3112/3112 [======] - 745s 239ms/step - loss: 0.5536 - acc: 0.7796 - val_loss: 0.6132 - val_acc: 0.7587

Epoch 7/10

3112/3112 [======] - 742s 239ms/step - loss: 0.5318 - acc: 0.7844 - val_loss: 0.6059 - val_acc: 0.7673

Epoch 8/10

3112/3112 [======] - 757s 243ms/step - loss: 0.5190 - acc: 0.7876 - val_loss: 0.6271 - val_acc: 0.7659

Epoch 9/10

3112/3112 [======] - 745s 239ms/step - loss: 0.5181 - acc: 0.8011 - val_loss: 0.6116 - val_acc: 0.7640

Epoch 10/10

3112/3112 [======] - 746s 240ms/step - loss: 0.5089 - acc: 0.8008 - val_loss: 0.6131 - val_acc: 0.7702

2076/2076 [======] - 130s 63ms/step

40%|████████████████████████████████████████████████ | 8/20 [105:05:51<144:58:24, 43492.06s/it] batch_size: 100 epochs: 10

(None, 19, 10, 32)

(None, 9, 5, 32)

(None, 60, 24)

(None, 64)

Train on 3112 samples, validate on 2076 samples

Epoch 1/10

3112/3112 [======] - 3076s 988ms/step - loss: 1.1341 - acc: 0.3246 - val_loss: 1.1002 - val_acc: 0.3160

Epoch 2/10

3112/3112 [======] - 3088s 992ms/step - loss: 1.1160 - acc: 0.3316 - val_loss: 1.0991 - val_acc: 0.3261

Epoch 3/10

3112/3112 [======] - 3103s 997ms/step - loss: 1.1163 - acc: 0.3175 - val_loss: 1.1004 - val_acc: 0.3285

Epoch 4/10

3112/3112 [======] - 4013s 1s/step - loss: 1.1136 - acc: 0.3297 - val_loss: 1.0981 - val_acc: 0.3425

Epoch 5/10

3112/3112 [======] - 3584s 1s/step - loss: 1.1097 - acc: 0.3361 - val_loss: 1.1000 - val_acc: 0.3285

Epoch 6/10

3112/3112 [======] - 4381s 1s/step - loss: 1.1095 - acc: 0.3332 - val_loss: 1.0978 - val_acc: 0.3603

Epoch 7/10

3112/3112 [======] - 6330s 2s/step - loss: 1.1085 - acc: 0.3429 - val_loss: 1.0985 - val_acc: 0.3285

Epoch 8/10

3112/3112 [======] - 3613s 1s/step - loss: 1.1062 - acc: 0.3271 - val_loss: 1.0989 - val_acc: 0.3497

Epoch 9/10

3112/3112 [======] - 4125s 1s/step - loss: 1.1063 - acc: 0.3303 - val_loss: 1.0971 - val_acc: 0.3960

Epoch 10/10

3112/3112 [======] - 43659s 14s/step - loss: 1.1071 - acc: 0.3339 - val_loss: 1.0983 - val_acc: 0.3478

2076/2076 [======] - 569s 274ms/step

45%|██████████████████████████████████████████████████████ | 9/20 [127:11:43<165:56:52, 54310.20s/it] batch_size: 100 epochs: 10

(None, 7040)

Train on 3112 samples, validate on 2076 samples

Epoch 1/10

3112/3112 [======] - 2520s 810ms/step - loss: 5.0189 - acc: 0.5084 - val_loss: 5.1548 - val_acc: 0.4557

Epoch 2/10

3112/3112 [======] - 2519s 809ms/step - loss: 1.3573 - acc: 0.7265 - val_loss: 0.9936 - val_acc: 0.7206

Epoch 3/10

3112/3112 [======] - 2512s 807ms/step - loss: 0.6555 - acc: 0.7728 - val_loss: 1.0526 - val_acc: 0.6354

Epoch 4/10

3112/3112 [======] - 2507s 806ms/step - loss: 0.5559 - acc: 0.7921 - val_loss: 0.7847 - val_acc: 0.7091

Epoch 5/10

3112/3112 [======] - 2506s 805ms/step - loss: 0.4980 - acc: 0.8133 - val_loss: 0.5355 - val_acc: 0.8030

Epoch 6/10

3112/3112 [======] - 2489s 800ms/step - loss: 0.4559 - acc: 0.8271 - val_loss: 0.5760 - val_acc: 0.7842

Epoch 7/10

3112/3112 [======] - 2546s 818ms/step - loss: 0.4708 - acc: 0.8191 - val_loss: 0.5264 - val_acc: 0.7972

Epoch 8/10

3112/3112 [======] - 2787s 895ms/step - loss: 0.4296 - acc: 0.8355 - val_loss: 0.6532 - val_acc: 0.7678

Epoch 9/10

3112/3112 [======] - 2824s 908ms/step - loss: 0.4173 - acc: 0.8419 - val_loss: 0.4883 - val_acc: 0.8160

Epoch 10/10

3112/3112 [======] - 2762s 888ms/step - loss: 0.4027 - acc: 0.8490 - val_loss: 0.6187 - val_acc: 0.7958

2076/2076 [======] - 718s 346ms/step

50%|███████████████████████████████████████████████████████████▌ | 10/20 [134:36:39<127:50:59, 46025.96s/it] batch_size: 200 epochs: 10

(None, 7040)

Train on 3112 samples, validate on 2076 samples

Epoch 1/10

3112/3112 [======] - 7909s 3s/step - loss: 0.4141 - acc: 0.3403 - val_loss: 0.2773 - val_acc: 0.4581

Epoch 2/10

3112/3112 [======] - 5060s 2s/step - loss: 0.3054 - acc: 0.4473 - val_loss: 0.1390 - val_acc: 0.6975

Epoch 3/10

3112/3112 [======] - 6449s 2s/step - loss: 0.1940 - acc: 0.5951 - val_loss: 0.1324 - val_acc: 0.7124

Epoch 4/10

3112/3112 [======] - 8862s 3s/step - loss: 0.1431 - acc: 0.6996 - val_loss: 0.1095 - val_acc: 0.7688

Epoch 5/10

3112/3112 [======] - 5316s 2s/step - loss: 0.1306 - acc: 0.7265 - val_loss: 0.0939 - val_acc: 0.8136

Epoch 6/10

3112/3112 [======] - 4904s 2s/step - loss: 0.1179 - acc: 0.7500 - val_loss: 0.1088 - val_acc: 0.7760

Epoch 7/10

3112/3112 [======] - 5374s 2s/step - loss: 0.1123 - acc: 0.7629 - val_loss: 0.0942 - val_acc: 0.8078

Epoch 8/10

3112/3112 [======] - 5905s 2s/step - loss: 0.1075 - acc: 0.7757 - val_loss: 0.0968 - val_acc: 0.8064

Epoch 9/10

3112/3112 [======] - 6542s 2s/step - loss: 0.0959 - acc: 0.8062 - val_loss: 0.1029 - val_acc: 0.7866

Epoch 10/10

3112/3112 [======] - 24378s 8s/step - loss: 0.0998 - acc: 0.7940 - val_loss: 0.0994 - val_acc: 0.7929

2076/2076 [======] - 2473s 1s/step

55%|█████████████████████████████████████████████████████████████████▍ | 11/20 [158:18:12<144:30:53, 57805.98s/it] batch_size: 50 epochs: 50

(None, 14080)

Train on 3112 samples, validate on 2076 samples

Epoch 1/50

3112/3112 [======] - 8263s 3s/step - loss: 10.8690 - acc: 0.3217 - val_loss: 10.6056 - val_acc: 0.3420

Epoch 2/50

3112/3112 [======] - 5315s 2s/step - loss: 9.1278 - acc: 0.4161 - val_loss: 6.7155 - val_acc: 0.5568

Epoch 3/50

3112/3112 [======] - 5278s 2s/step - loss: 6.9603 - acc: 0.5456 - val_loss: 6.5857 - val_acc: 0.5674

Epoch 4/50

3112/3112 [======] - 12910s 4s/step - loss: 6.7227 - acc: 0.5639 - val_loss: 6.3667 - val_acc: 0.5872

Epoch 5/50

3112/3112 [======] - 4801s 2s/step - loss: 6.7420 - acc: 0.5633 - val_loss: 6.2122 - val_acc: 0.5992

Epoch 6/50

3112/3112 [======] - 4857s 2s/step - loss: 6.5954 - acc: 0.5762 - val_loss: 8.4967 - val_acc: 0.4547

Epoch 7/50

3112/3112 [======] - 4806s 2s/step - loss: 6.5020 - acc: 0.5861 - val_loss: 6.3211 - val_acc: 0.5939

Epoch 8/50

3112/3112 [======] - 5174s 2s/step - loss: 6.4729 - acc: 0.5871 - val_loss: 7.6181 - val_acc: 0.5154

Epoch 9/50

3112/3112 [======] - 4841s 2s/step - loss: 6.4674 - acc: 0.5845 - val_loss: 6.1755 - val_acc: 0.6060

Epoch 10/50

3112/3112 [======] - 4862s 2s/step - loss: 6.3229 - acc: 0.5954 - val_loss: 6.2084 - val_acc: 0.6065

Epoch 11/50

3112/3112 [======] - 4930s 2s/step - loss: 6.3567 - acc: 0.5942 - val_loss: 6.1524 - val_acc: 0.6079

Epoch 12/50

3112/3112 [======] - 4828s 2s/step - loss: 6.3941 - acc: 0.5925 - val_loss: 6.6440 - val_acc: 0.5785

Epoch 13/50

3112/3112 [======] - 4893s 2s/step - loss: 6.3119 - acc: 0.5993 - val_loss: 6.1456 - val_acc: 0.6084

Epoch 14/50

3112/3112 [======] - 5829s 2s/step - loss: 6.3432 - acc: 0.5942 - val_loss: 6.1639 - val_acc: 0.6089

Epoch 15/50

3112/3112 [======] - 47456s 15s/step - loss: 6.2596 - acc: 0.6025 - val_loss: 6.2409 - val_acc: 0.6012

Epoch 16/50

2750/3112 [======>....] - ETA: 46:01 - loss: 6.2773 - acc: 0.6036

B.2 Iteration-2

0%| | 0/20 [00:00

(None, 28160)

Train on 3112 samples, validate on 2076 samples

Epoch 1/200

3112/3112 [======] - 18497s 6s/step - loss: 0.3198 - acc: 0.4826 - val_loss: 0.2406 - val_acc: 0.6166

Epoch 2/200

3112/3112 [======] - 9192s 3s/step - loss: 0.2033 - acc: 0.6764 - val_loss: 0.1813 - val_acc: 0.7086

Epoch 3/200

3112/3112 [======] - 35518s 11s/step - loss: 0.1494 - acc: 0.7590 - val_loss: 0.1719 - val_acc: 0.7264

Epoch 4/200

3112/3112 [======] - 5980s 2s/step - loss: 0.1446 - acc: 0.7635 - val_loss: 0.1385 - val_acc: 0.7750

Epoch 5/200

3112/3112 [======] - 5959s 2s/step - loss: 0.1144 - acc: 0.8127 - val_loss: 0.1310 - val_acc: 0.7861

Epoch 6/200

3112/3112 [======] - 9939s 3s/step - loss: 0.1139 - acc: 0.8139 - val_loss: 0.1275 - val_acc: 0.7861

Epoch 7/200

3112/3112 [======] - 6413s 2s/step - loss: 0.1105 - acc: 0.8201 - val_loss: 0.1317 - val_acc: 0.7813

Epoch 8/200

3112/3112 [======] - 7383s 2s/step - loss: 0.1069 - acc: 0.8229 - val_loss: 0.1367 - val_acc: 0.7784

Epoch 9/200

3112/3112 [======] - 8288s 3s/step - loss: 0.0992 - acc: 0.8387 - val_loss: 0.1217 - val_acc: 0.7977

Epoch 10/200

3112/3112 [======] - 10653s 3s/step - loss: 0.0907 - acc: 0.8509 - val_loss: 0.1210 - val_acc: 0.7991

Epoch 11/200

3112/3112 [======] - 10175s 3s/step - loss: 0.0866 - acc: 0.8567 - val_loss: 0.1367 - val_acc: 0.7697

Epoch 12/200

3112/3112 [======] - 7602s 2s/step - loss: 0.0892 - acc: 0.8522 - val_loss: 0.1126 - val_acc: 0.8117

Epoch 13/200

3112/3112 [======] - 7504s 2s/step - loss: 0.0837 - acc: 0.8618 - val_loss: 0.1166 - val_acc: 0.8035

Epoch 14/200

3112/3112 [======] - 8044s 3s/step - loss: 0.0836 - acc: 0.8602 - val_loss: 0.1156 - val_acc: 0.8044

Epoch 15/200

3112/3112 [======] - 20204s 6s/step - loss: 0.0805 - acc: 0.8670 - val_loss: 0.1145 - val_acc: 0.8117

Epoch 16/200

3112/3112 [======] - 10576s 3s/step - loss: 0.0792 - acc: 0.8711 - val_loss: 0.1132 - val_acc: 0.8083

5%|██████▏ | 1/20 [4:26:41<84:27:05, 16001.34s/it] batch_size: 200 epochs: 200 patience: 6 layers: 4 optimizer: nadam activation: tanh loss: mean_squared_error

(None, 28160)

Train on 3112 samples, validate on 2076 samples

Epoch 1/200

3112/3112 [======] - 7881s 3s/step - loss: 0.2059 - acc: 0.5819 - val_loss: 0.1583 - val_acc: 0.7129

Epoch 2/200

3112/3112 [======] - 7773s 2s/step - loss: 0.1330 - acc: 0.7503 - val_loss: 0.1403 - val_acc: 0.7447

Epoch 3/200

3112/3112 [======] - 7815s 3s/step - loss: 0.1229 - acc: 0.7654 - val_loss: 0.1293 - val_acc: 0.7606

Epoch 4/200

3112/3112 [======] - 8498s 3s/step - loss: 0.1128 - acc: 0.7892 - val_loss: 0.1844 - val_acc: 0.6339

Epoch 5/200

3112/3112 [======] - 11289s 4s/step - loss: 0.1180 - acc: 0.7879 - val_loss: 0.1627 - val_acc: 0.7235

Epoch 6/200

3112/3112 [======] - 8939s 3s/step - loss: 0.1123 - acc: 0.8021 - val_loss: 0.1455 - val_acc: 0.7510

Epoch 7/200

3112/3112 [======] - 8078s 3s/step - loss: 0.0948 - acc: 0.8329 - val_loss: 0.1422 - val_acc: 0.7572

Epoch 8/200

3112/3112 [======] - 10844s 3s/step - loss: 0.0972 - acc: 0.8310 - val_loss: 0.1203 - val_acc: 0.7895

Epoch 9/200

3112/3112 [======] - 8359s 3s/step - loss: 0.0842 - acc: 0.8515 - val_loss: 0.1280 - val_acc: 0.7760

Epoch 10/200

3112/3112 [======] - 8623s 3s/step - loss: 0.0885 - acc: 0.8429 - val_loss: 0.1242 - val_acc: 0.7750

Epoch 11/200

3112/3112 [======] - 7819s 3s/step - loss: 0.0916 - acc: 0.8397 - val_loss: 0.1285 - val_acc: 0.7688

Epoch 12/200

3112/3112 [======] - 7842s 3s/step - loss: 0.0839 - acc: 0.8528 - val_loss: 0.1176 - val_acc: 0.7943

Epoch 13/200

3112/3112 [======] - 8458s 3s/step - loss: 0.0829 - acc: 0.8554 - val_loss: 0.1387 - val_acc: 0.7697

Epoch 14/200

3112/3112 [======] - 9422s 3s/step - loss: 0.0763 - acc: 0.8679 - val_loss: 0.1261 - val_acc: 0.7799

Epoch 15/200

3112/3112 [======] - 10597s 3s/step - loss: 0.0761 - acc: 0.8695 - val_loss: 0.1525 - val_acc: 0.7466

Epoch 16/200

3112/3112 [======] - 21607s 7s/step - loss: 0.0814 - acc: 0.8599 - val_loss: 0.1423 - val_acc: 0.7606

Epoch 17/200

3112/3112 [======] - 9655s 3s/step - loss: 0.0766 - acc: 0.8699 - val_loss: 0.1353 - val_acc: 0.7726

2076/2076 [======] - 769s 371ms/step

10%|████████████ | 1/20 [48:18:09<917:44:58, 173889.42s/it] batch_size: 200 epochs: 100 patience: 8 layers: 2 optimizer: adadelta activation: tanh loss: mean_squared_error

(None, 112640)

Train on 3112 samples, validate on 2076 samples

Epoch 1/100

3112/3112 [======] - 22163s 7s/step - loss: 0.2437 - acc: 0.5842 - val_loss: 0.1831 - val_acc: 0.6932

Epoch 2/100

3112/3112 [======] - 23315s 7s/step - loss: 0.1622 - acc: 0.7262 - val_loss: 0.1328 - val_acc: 0.7697

Epoch 3/100

3112/3112 [======] - 31281s 10s/step - loss: 0.1432 - acc: 0.7542 - val_loss: 0.1257 - val_acc: 0.7755

Epoch 4/100

3112/3112 [======] - 22465s 7s/step - loss: 0.1146 - acc: 0.7969 - val_loss: 0.1409 - val_acc: 0.7567

Epoch 5/100

3112/3112 [======] - 22414s 7s/step - loss: 0.1192 - acc: 0.7943 - val_loss: 0.1239 - val_acc: 0.7823

Epoch 6/100

3112/3112 [======] - 26990s 9s/step - loss: 0.1110 - acc: 0.8053 - val_loss: 0.1537 - val_acc: 0.7163

Epoch 7/100

3112/3112 [======] - 22607s 7s/step - loss: 0.1103 - acc: 0.8094 - val_loss: 0.1549 - val_acc: 0.7331

Epoch 8/100

3112/3112 [======] - 22546s 7s/step - loss: 0.1149 - acc: 0.8033 - val_loss: 0.1681 - val_acc: 0.7250

Epoch 9/100

3112/3112 [======] - 22502s 7s/step - loss: 0.1011 - acc: 0.8281 - val_loss: 0.1161 - val_acc: 0.8001

Epoch 10/100

3112/3112 [======] - 32884s 11s/step - loss: 0.0915 - acc: 0.8419 - val_loss: 0.1138 - val_acc: 0.8059

Epoch 11/100

3112/3112 [======] - 24863s 8s/step - loss: 0.1011 - acc: 0.8246 - val_loss: 0.1193 - val_acc: 0.7962

Epoch 12/100

3112/3112 [======] - 23013s 7s/step - loss: 0.0984 - acc: 0.8358 - val_loss: 0.1249 - val_acc: 0.7770

Epoch 13/100

3112/3112 [======] - 30247s 10s/step - loss: 0.0910 - acc: 0.8400 - val_loss: 0.1232 - val_acc: 0.7943

Epoch 14/100

3112/3112 [======] - 23439s 8s/step - loss: 0.0969 - acc: 0.8358 - val_loss: 0.1205 - val_acc: 0.7987

Epoch 15/100

3112/3112 [======] - 22433s 7s/step - loss: 0.0910 - acc: 0.8413 - val_loss: 0.1116 - val_acc: 0.8059

Epoch 16/100

3112/3112 [======] - 24364s 8s/step - loss: 0.0945 - acc: 0.8409 - val_loss: 0.1180 - val_acc: 0.7996

Epoch 17/100

3112/3112 [======] - 25020s 8s/step - loss: 0.0946 - acc: 0.8371 - val_loss: 0.1345 - val_acc: 0.7794

Epoch 19/100

3112/3112 [======] - 26416s 8s/step - loss: 0.0887 - acc: 0.8531 - val_loss: 0.1189 - val_acc: 0.7929

B.3 Iteration-3

0%| | 0/20 [00:00

(None, 14080)

Train on 3112 samples, validate on 2076 samples

Epoch 1/200

3112/3112 [======] - 3559s 1s/step - loss: 0.2536 - acc: 0.5578 - val_loss: 0.1613 - val_acc: 0.7182

Epoch 2/200

3112/3112 [======] - 3505s 1s/step - loss: 0.1433 - acc: 0.7371 - val_loss: 0.1453 - val_acc: 0.7259

Epoch 3/200

3112/3112 [======] - 3502s 1s/step - loss: 0.1118 - acc: 0.7857 - val_loss: 0.1059 - val_acc: 0.7799

Epoch 4/200

3112/3112 [======] - 3529s 1s/step - loss: 0.0985 - acc: 0.8094 - val_loss: 0.1051 - val_acc: 0.7861

Epoch 5/200

3112/3112 [======] - 3506s 1s/step - loss: 0.0855 - acc: 0.8329 - val_loss: 0.1114 - val_acc: 0.7707

Epoch 6/200

3112/3112 [======] - 3503s 1s/step - loss: 0.0835 - acc: 0.8377 - val_loss: 0.1007 - val_acc: 0.7919

Epoch 7/200

3112/3112 [======] - 3499s 1s/step - loss: 0.0783 - acc: 0.8525 - val_loss: 0.1004 - val_acc: 0.7982

Epoch 8/200

3112/3112 [======] - 3506s 1s/step - loss: 0.0772 - acc: 0.8509 - val_loss: 0.0970 - val_acc: 0.8044

Epoch 9/200

3112/3112 [======] - 3513s 1s/step - loss: 0.0705 - acc: 0.8695 - val_loss: 0.0978 - val_acc: 0.7982

Epoch 10/200

3112/3112 [======] - 3509s 1s/step - loss: 0.0695 - acc: 0.8673 - val_loss: 0.1075 - val_acc: 0.7905

Epoch 11/200

3112/3112 [======] - 3499s 1s/step - loss: 0.0713 - acc: 0.8615 - val_loss: 0.0969 - val_acc: 0.8030

Epoch 12/200

3112/3112 [======] - 3507s 1s/step - loss: 0.0641 - acc: 0.8808 - val_loss: 0.1033 - val_acc: 0.7914

Epoch 13/200

3112/3112 [======] - 3501s 1s/step - loss: 0.0670 - acc: 0.8676 - val_loss: 0.0968 - val_acc: 0.8064

Epoch 14/200

3112/3112 [======] - 3503s 1s/step - loss: 0.0625 - acc: 0.8859 - val_loss: 0.0980 - val_acc: 0.8044

Epoch 15/200

3112/3112 [======] - 3513s 1s/step - loss: 0.0604 - acc: 0.8917 - val_loss: 0.0959 - val_acc: 0.8112

Epoch 16/200

3112/3112 [======] - 3501s 1s/step - loss: 0.0592 - acc: 0.8943 - val_loss: 0.0980 - val_acc: 0.8054

Epoch 17/200

3112/3112 [======] - 3505s 1s/step - loss: 0.0566 - acc: 0.8962 - val_loss: 0.1040 - val_acc: 0.7953

Epoch 18/200

3112/3112 [======] - 7930s 3s/step - loss: 0.0588 - acc: 0.8914 - val_loss: 0.1031 - val_acc: 0.7943

Epoch 19/200

3112/3112 [======] - 4610s 1s/step - loss: 0.0568 - acc: 0.8978 - val_loss: 0.1019 - val_acc: 0.7987

Epoch 20/200

3112/3112 [======] - 4170s 1s/step - loss: 0.0545 - acc: 0.9039 - val_loss: 0.1024 - val_acc: 0.7943

Epoch 21/200

3112/3112 [======] - 3999s 1s/step - loss: 0.0527 - acc: 0.9049 - val_loss: 0.1020 - val_acc: 0.7972

Epoch 22/200

3112/3112 [======] - 3818s 1s/step - loss: 0.0512 - acc: 0.9084 - val_loss: 0.1076 - val_acc: 0.7842

Epoch 23/200

3112/3112 [======] - 3809s 1s/step - loss: 0.0508 - acc: 0.9094 - val_loss: 0.1007 - val_acc: 0.7987

Epoch 24/200

3112/3112 [======] - 3676s 1s/step - loss: 0.0478 - acc: 0.9161 - val_loss: 0.1073 - val_acc: 0.7832

Epoch 25/200

3112/3112 [======] - 3730s 1s/step - loss: 0.0478 - acc: 0.9139 - val_loss: 0.1020 - val_acc: 0.8020

2076/2076 [======] - 413s 199ms/step

Confusion Matrix

[[572 75 63]

[ 96 526 62]

[ 50 57 575]]

Classification Report

precision recall f1-score support

bed 0.80 0.81 0.80 710

cat 0.80 0.77 0.78 684

happy 0.82 0.84 0.83 682

avg / total 0.81 0.81 0.81 2076
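The matrix and report above follow scikit-learn's standard metric output. As an illustrative sketch only (not the thesis code; y_true and y_pred are hypothetical placeholders for the ground-truth classes of the 2076 evaluation samples and the argmax of the network's outputs), a summary of this form can be produced as follows:

import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

# Hypothetical toy labels; in the experiment these would be the evaluation-set
# labels and the model's predicted classes (0 = bed, 1 = cat, 2 = happy).
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2])

print("Confusion Matrix")
print(confusion_matrix(y_true, y_pred))
print("Classification Report")
print(classification_report(y_true, y_pred, target_names=["bed", "cat", "happy"]))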

5%|██████ | 1/20 [26:37:01<505:43:36, 95821.93s/it] batch_size: 200 epochs: 200 patience: 4 layers: 2 nb_neurons: 256 optimizer: adagrad activation: relu loss: mean_squared_error

(None, 56320)

Train on 3112 samples, validate on 2076 samples

Epoch 1/200

3112/3112 [======] - 27205s 9s/step - loss: 0.4425 - acc: 0.3310 - val_loss: 0.4470 - val_acc: 0.3295

Epoch 2/200

3112/3112 [======] - 23160s 7s/step - loss: 0.4417 - acc: 0.3374 - val_loss: 0.4470 - val_acc: 0.3295

Epoch 3/200

3112/3112 [======] - 21841s 7s/step - loss: 0.4419 - acc: 0.3371 - val_loss: 0.4470 - val_acc: 0.3295

Epoch 4/200

3112/3112 [======] - 21876s 7s/step - loss: 0.4419 - acc: 0.3371 - val_loss: 0.4470 - val_acc: 0.3295

Epoch 5/200

3112/3112 [======] - 20790s 7s/step - loss: 0.4419 - acc: 0.3371 - val_loss: 0.4470 - val_acc: 0.3295

2076/2076 [======] - 4595s 2s/step

10%|████████████ | 2/20 [59:48:19<514:35:32, 102918.49s/it] batch_size: 200 epochs: 200 patience: 4 layers: 1 nb_neurons: 128 optimizer: adam activation: tanh loss: mean_squared_error

(None, 28160)

Train on 3112 samples, validate on 2076 samples

Epoch 1/200

3112/3112 [======] - 4757s 2s/step - loss: 0.2243 - acc: 0.5900 - val_loss: 0.1521 - val_acc: 0.7423

Epoch 2/200

3112/3112 [======] - 4769s 2s/step - loss: 0.1452 - acc: 0.7558 - val_loss: 0.1423 - val_acc: 0.7596

Epoch 3/200

3112/3112 [======] - 4781s 2s/step - loss: 0.1298 - acc: 0.7834 - val_loss: 0.1379 - val_acc: 0.7717

Epoch 4/200

3112/3112 [======] - 5842s 2s/step - loss: 0.1208 - acc: 0.7979 - val_loss: 0.1436 - val_acc: 0.7567

Epoch 5/200

3112/3112 [======] - 6047s 2s/step - loss: 0.1192 - acc: 0.8008 - val_loss: 0.1319 - val_acc: 0.7823

Epoch 6/200

3112/3112 [======] - 6480s 2s/step - loss: 0.1078 - acc: 0.8188 - val_loss: 0.1300 - val_acc: 0.7885

Epoch 7/200

3112/3112 [======] - 5027s 2s/step - loss: 0.1086 - acc: 0.8139 - val_loss: 0.1393 - val_acc: 0.7630

Epoch 8/200

3112/3112 [======] - 5185s 2s/step - loss: 0.1120 - acc: 0.8114 - val_loss: 0.1363 - val_acc: 0.7707

Epoch 9/200

3112/3112 [======] - 5301s 2s/step - loss: 0.1065 - acc: 0.8223 - val_loss: 0.1265 - val_acc: 0.7881

Epoch 10/200

3112/3112 [======] - 5883s 2s/step - loss: 0.0990 - acc: 0.8339 - val_loss: 0.1216 - val_acc: 0.7967

Epoch 11/200

3112/3112 [======] - 5655s 2s/step - loss: 0.0929 - acc: 0.8425 - val_loss: 0.1228 - val_acc: 0.7895

Epoch 12/200

3112/3112 [======] - 5278s 2s/step - loss: 0.0889 - acc: 0.8454 - val_loss: 0.1187 - val_acc: 0.7977

Epoch 13/200

3112/3112 [======] - 4809s 2s/step - loss: 0.0849 - acc: 0.8557 - val_loss: 0.1181 - val_acc: 0.7953

Epoch 14/200

3112/3112 [======] - 4805s 2s/step - loss: 0.0819 - acc: 0.8557 - val_loss: 0.1161 - val_acc: 0.8030

Epoch 15/200

3112/3112 [======] - 4790s 2s/step - loss: 0.0788 - acc: 0.8631 - val_loss: 0.1144 - val_acc: 0.7962

Epoch 16/200

3112/3112 [======] - 4785s 2s/step - loss: 0.0792 - acc: 0.8641 - val_loss: 0.1127 - val_acc: 0.8035

Epoch 17/200

3112/3112 [======] - 4801s 2s/step - loss: 0.0780 - acc: 0.8615 - val_loss: 0.1188 - val_acc: 0.7909

Epoch 18/200

3112/3112 [======] - 4806s 2s/step - loss: 0.0787 - acc: 0.8654 - val_loss: 0.1139 - val_acc: 0.8006

Epoch 19/200

3112/3112 [======] - 4797s 2s/step - loss: 0.0775 - acc: 0.8628 - val_loss: 0.1105 - val_acc: 0.8078

Epoch 20/200

3112/3112 [======] - 4831s 2s/step - loss: 0.0766 - acc: 0.8650 - val_loss: 0.1160 - val_acc: 0.8035

Epoch 21/200

3112/3112 [======] - 5667s 2s/step - loss: 0.0820 - acc: 0.8557 - val_loss: 0.1161 - val_acc: 0.8039

Epoch 22/200

3112/3112 [======] - 5120s 2s/step - loss: 0.0837 - acc: 0.8528 - val_loss: 0.1128 - val_acc: 0.8059

Epoch 23/200

3112/3112 [======] - 5040s 2s/step - loss: 0.0759 - acc: 0.8686 - val_loss: 0.1113 - val_acc: 0.8054

2076/2076 [======] - 808s 389ms/step
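Each run above is driven by the hyper-parameter set printed next to its progress bar (batch_size, epochs, patience, layers, nb_neurons, optimizer, activation, loss). The following is a minimal sketch, not the thesis implementation, of how one such configuration could be assembled and trained in Keras with early stopping; build_and_train and the arrays x_train, y_train, x_val, y_val are hypothetical names for the 3112 training and 2076 validation feature vectors with their one-hot labels.

from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping

def build_and_train(params, x_train, y_train, x_val, y_val, n_classes=3):
    # Fully connected network described by one hyper-parameter set, e.g.
    # {'batch_size': 200, 'epochs': 200, 'patience': 4, 'layers': 1,
    #  'nb_neurons': 128, 'optimizer': 'adam', 'activation': 'tanh',
    #  'loss': 'mean_squared_error'}.
    model = Sequential()
    model.add(Dense(params['nb_neurons'], activation=params['activation'],
                    input_shape=(x_train.shape[1],)))
    for _ in range(params['layers'] - 1):
        model.add(Dense(params['nb_neurons'], activation=params['activation']))
    model.add(Dense(n_classes, activation='softmax'))   # bed / cat / happy
    model.compile(loss=params['loss'], optimizer=params['optimizer'],
                  metrics=['accuracy'])
    # Stop when validation loss has not improved for 'patience' epochs.
    stopper = EarlyStopping(monitor='val_loss', patience=params['patience'])
    model.fit(x_train, y_train, batch_size=params['batch_size'],
              epochs=params['epochs'], validation_data=(x_val, y_val),
              callbacks=[stopper], verbose=1)
    return model.evaluate(x_val, y_val, verbose=1)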

15%|██████████████████ | 3/20 [93:09:27<510:18:00, 108063.57s/it] batch_size: 200 epochs: 100 patience: 10 layers: 1 nb_neurons: 256 optimizer: adamax activation: relu loss: mean_squared_error

(None, 56320)

Train on 3112 samples, validate on 2076 samples

Epoch 1/100

3112/3112 [======] - 18294s 6s/step - loss: 0.4396 - acc: 0.3371 - val_loss: 0.4473 - val_acc: 0.3290

Epoch 2/100

3112/3112 [======] - 13728s 4s/step - loss: 0.4392 - acc: 0.3413 - val_loss: 0.4473 - val_acc: 0.3290

Epoch 3/100

3112/3112 [======] - 13722s 4s/step - loss: 0.4393 - acc: 0.3409 - val_loss: 0.4473 - val_acc: 0.3290

Epoch 4/100

3112/3112 [======] - 13736s 4s/step - loss: 0.4392 - acc: 0.3413 - val_loss: 0.4473 - val_acc: 0.3290

Epoch 5/100

3112/3112 [======] - 13855s 4s/step - loss: 0.4392 - acc: 0.3413 - val_loss: 0.4473 - val_acc: 0.3290

Epoch 6/100

3112/3112 [======] - 13752s 4s/step - loss: 0.4392 - acc: 0.3413 - val_loss: 0.4473 - val_acc: 0.3290

Epoch 7/100

3112/3112 [======] - 14065s 5s/step - loss: 0.4392 - acc: 0.3413 - val_loss: 0.4473 - val_acc: 0.3290

Epoch 8/100

3112/3112 [======] - 13517s 4s/step - loss: 0.4392 - acc: 0.3413 - val_loss: 0.4473 - val_acc: 0.3290

Epoch 9/100

3112/3112 [======] - 13565s 4s/step - loss: 0.4392 - acc: 0.3413 - val_loss: 0.4473 - val_acc: 0.3290

Epoch 10/100

3112/3112 [======] - 13602s 4s/step - loss: 0.4392 - acc: 0.3413 - val_loss: 0.4473 - val_acc: 0.3290

Epoch 11/100

3112/3112 [======] - 13606s 4s/step - loss: 0.4391 - acc: 0.3413 - val_loss: 0.4462 - val_acc: 0.3304

Epoch 12/100

3112/3112 [======] - 13880s 4s/step - loss: 0.4387 - acc: 0.3419 - val_loss: 0.4449 - val_acc: 0.3324

Epoch 13/100

3112/3112 [======] - 15769s 5s/step - loss: 0.4378 - acc: 0.3432 - val_loss: 0.4327 - val_acc: 0.3492

Epoch 14/100

3112/3112 [======] - 17825s 6s/step - loss: 0.3857 - acc: 0.4184 - val_loss: 0.3910 - val_acc: 0.4104

Epoch 15/100

3112/3112 [======] - 13857s 4s/step - loss: 0.3955 - acc: 0.4046 - val_loss: 0.3855 - val_acc: 0.4196

Epoch 16/100

3112/3112 [======] - 13861s 4s/step - loss: 0.3637 - acc: 0.4512 - val_loss: 0.3545 - val_acc: 0.4624

Epoch 17/100

3112/3112 [======] - 17983s 6s/step - loss: 0.3420 - acc: 0.4820 - val_loss: 0.3209 - val_acc: 0.5135

Epoch 18/100

3112/3112 [======] - 17124s 6s/step - loss: 0.3116 - acc: 0.5286 - val_loss: 0.3558 - val_acc: 0.4639

Epoch 19/100

3112/3112 [======] - 26551s 9s/step - loss: 0.3233 - acc: 0.5112 - val_loss: 0.3063 - val_acc: 0.5361

Epoch 20/100

3112/3112 [======] - 13903s 4s/step - loss: 0.2820 - acc: 0.5726 - val_loss: 0.2932 - val_acc: 0.5549

Epoch 21/100

3112/3112 [======] - 13756s 4s/step - loss: 0.2789 - acc: 0.5765 - val_loss: 0.2883 - val_acc: 0.5645

Epoch 22/100

3112/3112 [======] - 15948s 5s/step - loss: 0.2742 - acc: 0.5861 - val_loss: 0.2919 - val_acc: 0.5588

Epoch 23/100

3112/3112 [======] - 14379s 5s/step - loss: 0.3025 - acc: 0.5405 - val_loss: 0.3084 - val_acc: 0.5342

Epoch 24/100

3112/3112 [======] - 13998s 4s/step - loss: 0.2919 - acc: 0.5591 - val_loss: 0.3222 - val_acc: 0.5135

Epoch 25/100

3112/3112 [======] - 13870s 4s/step - loss: 0.2890 - acc: 0.5633 - val_loss: 0.2981 - val_acc: 0.5501

Epoch 26/100

3112/3112 [======] - 13914s 4s/step - loss: 0.2669 - acc: 0.5958 - val_loss: 0.2803 - val_acc: 0.5761

Epoch 27/100

3112/3112 [======] - 13901s 4s/step - loss: 0.2570 - acc: 0.6105 - val_loss: 0.2828 - val_acc: 0.5727

Epoch 28/100

3112/3112 [======] - 15293s 5s/step - loss: 0.2642 - acc: 0.6012 - val_loss: 0.2731 - val_acc: 0.5877

Epoch 29/100

3112/3112 [======] - 20984s 7s/step - loss: 0.2708 - acc: 0.5897 - val_loss: 0.2760 - val_acc: 0.5824

Epoch 30/100

3112/3112 [======] - 20016s 6s/step - loss: 0.2647 - acc: 0.5983 - val_loss: 0.2834 - val_acc: 0.5732

Epoch 31/100

3112/3112 [======] - 56665s 18s/step - loss: 0.2575 - acc: 0.6105 - val_loss: 0.2731 - val_acc: 0.5877

Epoch 32/100

3112/3112 [======] - 16540s 5s/step - loss: 0.2525 - acc: 0.6173 - val_loss: 0.2813 - val_acc: 0.5742

Epoch 33/100

3112/3112 [======] - 17488s 6s/step - loss: 0.2656 - acc: 0.5977 - val_loss: 0.2732 - val_acc: 0.5853

Epoch 34/100

3112/3112 [======] - 15612s 5s/step - loss: 0.2557 - acc: 0.6134 - val_loss: 0.2721 - val_acc: 0.5891

Epoch 35/100

3112/3112 [======] - 16092s 5s/step - loss: 0.2528 - acc: 0.6170 - val_loss: 0.2806 - val_acc: 0.5751

Epoch 36/100

3112/3112 [======] - 17116s 6s/step - loss: 0.2496 - acc: 0.6221 - val_loss: 0.2694 - val_acc: 0.5930

Epoch 37/100

3112/3112 [======] - 16594s 5s/step - loss: 0.2479 - acc: 0.6256 - val_loss: 0.2677 - val_acc: 0.5968

Epoch 38/100

3112/3112 [======] - 14874s 5s/step - loss: 0.2508 - acc: 0.6195 - val_loss: 0.2667 - val_acc: 0.5968

Epoch 39/100

3112/3112 [======] - 13674s 4s/step - loss: 0.2491 - acc: 0.6240 - val_loss: 0.2711 - val_acc: 0.5877

Epoch 40/100

3112/3112 [======] - 15497s 5s/step - loss: 0.2530 - acc: 0.6179 - val_loss: 0.3072 - val_acc: 0.5361

Epoch 41/100

3112/3112 [======] - 17210s 6s/step - loss: 0.2809 - acc: 0.5755 - val_loss: 0.2979 - val_acc: 0.5491

Epoch 42/100

3112/3112 [======] - 13796s 4s/step - loss: 0.2612 - acc: 0.6048 - val_loss: 0.2796 - val_acc: 0.5776

Epoch 43/100

3112/3112 [======] - 16933s 5s/step - loss: 0.2569 - acc: 0.6118 - val_loss: 0.2737 - val_acc: 0.5857

Epoch 44/100

3112/3112 [======] - 13914s 4s/step - loss: 0.2533 - acc: 0.6170 - val_loss: 0.2718 - val_acc: 0.5886

Epoch 45/100

3112/3112 [======] - 13891s 4s/step - loss: 0.2463 - acc: 0.6282 - val_loss: 0.2684 - val_acc: 0.5934

Epoch 46/100

3112/3112 [======] - 13876s 4s/step - loss: 0.2481 - acc: 0.6256 - val_loss: 0.2657 - val_acc: 0.5978