Design of an Enhanced Speech Authentication System Over Mobile Devices
Total Page:16
File Type:pdf, Size:1020Kb
DESIGN OF AN ENHANCED SPEECH AUTHENTICATION SYSTEM OVER MOBILE DEVICES by MOHSHIN UDDIN ANWAR MASTER OF SCIENCE IN INFORMATION AND COMMUNICATIN TECHNOLOGY Institute of Information and Communication Technology BANGLADESH UNIVERSITY OF ENGINEERING AND TECHNOLOGY 2018 ii iii Dedicated to My Beloved Parents iv Table of Contents Declaration .................................................................... Error! Bookmark not defined. Table of Contents ............................................................................................................ v List of Tables................................................................................................................. vii List of Figures ............................................................................................................. viii List of Abbreviations of Technical Symbols and Terms ............................................... ix Acknowledgment ........................................................................................................ xiii Abstract ........................................................................................................................ xiv CHAPTER 1 INTRODUCTION .................................................................................... 1 1.1 Introduction ...................................................................................................... 1 1.2 Motivation ........................................................................................................ 4 1.3 Objective .......................................................................................................... 5 1.4 Organization of the Thesis ............................................................................... 5 CHAPTER 2 LITERATURE REVIEW ......................................................................... 7 2.1 Speech Recognition and Authentication Concepts .......................................... 7 2.1.1 Speech Recognition ................................................................................... 7 2.1.2 Speaker Recognition ................................................................................. 8 2.1.3 Speech Features Recognition .................................................................. 11 2.1.4 Voice Activity Detection......................................................................... 13 2.1.5 Some Use Cases: ..................................................................................... 23 2.2 Different Speech Recognition Models ........................................................... 23 2.2.1 Gaussian Mixture Models ....................................................................... 23 2.2.2 The HMM and Variants .......................................................................... 26 2.2.3 GMM-HMM coupling ............................................................................ 27 2.2.4 Multilayer Perceptron Neural Networks ................................................. 29 2.2.5 Evolutionary Algorithms with DNN ....................................................... 32 2.2.6 DNN-HMM Hybrid Model ..................................................................... 35 2.3 Challenges ...................................................................................................... 37 2.3.1 Vulnerabilities in Mobile Speaker Verification ...................................... 37 2.3.2 Cost ......................................................................................................... 41 2.4 Terminologies ................................................................................................. 41 2.4.1 FAR and FRR: ........................................................................................ 41 2.4.2 GA ........................................................................................................... 42 2.4.3 GFCC ...................................................................................................... 42 v 2.4.4 LACE ...................................................................................................... 42 2.4.5 LSTM ...................................................................................................... 44 2.4.6 MFCC ...................................................................................................... 44 2.4.7 WER and WRR ....................................................................................... 44 2.5 Past Experiments with Several Databases ...................................................... 44 CHAPTER 3 PROPOSED WORK ............................................................................... 51 3.1 Algorithm ....................................................................................................... 51 3.2 Platform, Compiler and Language ................................................................. 52 3.3 Data Set .......................................................................................................... 53 3.4 Experimental Setup ........................................................................................ 53 3.5 Proposed Solution ........................................................................................... 54 3.6 Details of the Experiment ............................................................................... 55 3.6.1 Steps/ Methods in the Experiment .......................................................... 55 3.6.2 Flowchart ................................................................................................ 58 3.6.3 Pseudocode .............................................................................................. 59 3.6.4 Getting decision on any unknown sample using this model ................... 62 CHAPTER 4 RESULT AND OPTIMIZATIONS ........................................................ 63 4.1 Outputs ........................................................................................................... 63 4.2 Discussions and Comparative Gain from this Experiment............................. 72 4.2.1 Outcome .................................................................................................. 72 4.2.2 Gain Obtained from Present Work .......................................................... 72 4.2.3 Comparative Analysis of the Accuracy................................................... 73 4.3 Discussion ...................................................................................................... 75 CHAPTER 5 CONCLUSION ....................................................................................... 76 5.1 Conclusion ...................................................................................................... 76 5.2 Future Work ................................................................................................... 76 References ..................................................................................................................... 78 Appendix A: Setting of Hyper Parameters.................................................................... 91 A.1 Iteration 1 ........................................................................................................... 91 Appendix B: OUTPUTS ............................................................................................... 92 B.1 Iteration-1 ........................................................................................................... 92 B.2 Iteration-2 ......................................................................................................... 105 B.3 Iteration-3 ......................................................................................................... 108 vi List of Tables Table 2.1: Speaker Recognition Systems with/ without Smartphones ......................... 15 Table 4.1: 1st Iteration Results ...................................................................................... 63 Table 4.2: 1st Iteration 6th Generation is with Dense Model ......................................... 65 Table 4.3: 2nd Iteration Results: 3 generations .............................................................. 66 Table 4.4: 2nd Iteration Results: Generation 1 ............................................................... 66 Table 4.5: 2nd Iteration Results: Generation 2 ............................................................... 67 Table 4.6: 2nd Iteration Results: Generation 3 ............................................................... 68 Table 4.7: 3rd Iteration Results: 4 generations .............................................................. 69 Table 4.8: 3rd Iteration Results: Generation 1 ............................................................... 70 Table 4.9: Confusion Matrix over the Optimum Result ............................................... 71 Table 4.10: Classification Report over the Optimum Result ........................................ 71 Table 4.11: Comparison with similar experiment using GA over DNN....................... 73 Table 4.12: Comparison with similar experiment using any evolution strategies over DNN .............................................................................................................................. 74 vii List of Figures Figure 2.1: (a) Speaker identification, (b) Speaker Verification ..................................... 9 Figure 2.2: DNN-HMM hybrid architecture. ................................................................ 36 Figure 2.3: LACE Architecture ....................................................................................