Speech Pattern Recognition for Speech to Text Conversion”, Thesis Phd, Saurashtra University
Total Page:16
File Type:pdf, Size:1020Kb
Saurashtra University Re – Accredited Grade ‘B’ by NAAC (CGPA 2.93) Kumbharana, Chandresh K., 2007, “Speech Pattern Recognition for Speech to Text Conversion”, thesis PhD, Saurashtra University http://etheses.saurashtrauniversity.edu/id/eprint/337 Copyright and moral rights for this thesis are retained by the author A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the Author. The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the Author When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given. Saurashtra University Theses Service http://etheses.saurashtrauniversity.edu [email protected] © The Author Speech Pattern Recognition for Speech To Text Conversion A thesis to be submitted to Saurashtra University in fulfillment of the requirement for the degree of Doctor of Philosophy (Computer Science) By CK Kumbharana Guided by Dr. N. N. Jani Prof. & Head Department of Computer Science Saurashtra University Rajkot Department of Computer Science Saurashtra University Rajkot November - 2007 Certificate by Guide This is to certify that the contents of the thesis entitled “Speech Pattern Recognition for Speech to Text Conversion” is the original research work of Mr. Kumbharana Chandresh Karsanbhai carried out under my supervision and guidance. I further certified that the work has not been submitted either partially or fully to any other university/Institute for the award of any degree. Place: Rajkot Dr. N.N. Jani Date: 30-11-2007 Prof. & Head Department of Computer Science Saurashtra University, Rajkot I Candidate’s Statement I hereby declare that the work incorporated in the present thesis is original and has not been submitted to any University/Institution for the award of the Degree. I further declare that the result presented in the thesis, considerations made there-in, contribute in general the advancement of knowledge in education and particular to – “Speech Pattern Recognition For Speech to Text Conversion” Date: 30-11-2007 (CK Kumbharana) Candidate II Acknowledgement I extend my sincere thanks to Dr. NN Jani, Professor and Head, Department of Computer Science, who has permitted to work under his guidance. I take this opportunity to convey my deepest gratitude to him for his valuable advice, comfort encouragement, keen interest, constructive criticism, scholarly guidance & wholehearted support. I admire the assistance of my friend Prof. Suneelchandra Ramanth Dwivedi, Associate Professor, Department of Computer Science, Bhavnagar University, Bhavnagar, for his help and enthusiastic support during the work. I express my special thanks to Dr. Anil D. Ambasana, The Associate Professor, Department of Education, Saurashtra University, Rajkot for his constant encouragement, elderly advice and for providing the required moral support during the work. His mentorship is a rewarding experience, which I will treasure my whole life. I am highly obliged by the five persons Dr Anil Ambasana, Mr. Prashant Ambasana, Heena Kumbharana, Mansi Kumbharana & Kirti Makadia. Who has given their valuable time and voice for recording of the Gujarati Characters, which is finally used as the master database for my researcher work. I have no words to express my sincere thanks to all the teaching and non-teaching staff of my department i.e. Department of Computer Science, Saurashtra University, Rajkot. I can also not forget all the current as well as passed students of MCA who are always eagerly seeking and praying to “GOD” for the completion of my work as per my ambition and desire. I am afraid the list of names will be very long one. I am grateful to their encouragement and convictions and again thanks them all with due regards. III I must express my indebtedness to my father Karasanbhai and mother Dhirajben, wife Heena & my lovely loving daughter Ku. Mansi. Without their kind and silent support this work could not have been completed. Above all, I offered my heartiest regard to “ GOD ” for giving strength to work inspiration. -CK Kumbharana IV List of Figures Figure Page Name of Figure No. No. 2.1 Simple speech production model..………………………………… 23 2.2 Schematic view of human speech production mechanism..……….. 24 2.3 Mechanism for creating speech……………………………………. 25 2.4 Speech coding and decoding process……………………………… 27 2.5 Speech analysis synthesis system………………………………….. 27 2.6 Synthesis models applied in speech coding ……………………… 27 2.7 Block diagram of text-to-speech synthesis system………………... 29 2.8 Speech Enhancement by spectral magnitude estimation…………... 31 2.9 Basic structure of speaker recognition system…….………………. 33 2.10 Wave………………………………………………………………. 40 2.11 Wave..……………………………………………………………… 40 2.12 Sine waves of various frequencies………………………………… 41 2.13 Wavelength………………………………………………………… 44 Two tones of different amplitude (frequency and phase is 2.14 47 constant)…………………………………………………………… The male scale, showing the relationship between pitch (in mels) 2.15 48 and frequency in Hertz…………………………………………….. 2.16 Zero crossing………………………………………………………. 52 2.17 The basic/enhanced SSP pipeline………………………………….. 57 3.1 Speech Processing Mechanism……………………………………. 94 3.2 Speech recognition system………………………………………… 95 3.3 Model for speech file preparation…………………………………. 96 3.4 Speech ditaction process…………………………………………... 96 3.5 Speech recognition system setup…………………………………... 97 4.1 Layout of S in Glyph………………………………………………. 123 4.2 Layout of F in Glyph……………………………………………….. 124 4.3 Layout of l in Glyph……………………………………………… 124 4.4 Layout of L in Glyph………………………………………………. 124 4.5 Layout of ] in Glyph……………………………………………... 124 4.6 Layout of } in Glyph……………………………………………... 125 4.7 Layout of [ in Glyph………………………………………………. 125 4.8 Layout of { in Glyph………………………………………………. 125 4.9 Layout of M in Glyph……………………………………………… 125 4.10 Layout of Á in Glyph……………………………………………… 126 4.11 Layout of \ in Glyph………………………………………………. 126 V Figure Page Name of Figure No. No. 4.12 Layout of o in Glyph……………………………………………… 126 4.13 Main screen for editor ckksedit……………………………………. 130 4.14 Title bar……………………………………………………………. 130 4.15 Menu Bar…………………………………………………………... 131 4.16 File menu…………………………………………………………... 131 4.17 Edit menu………………………………………………………….. 133 4.18 Format menu………………………………………………………. 134 4.19 Toolbar – 1………………………………………………………… 135 4.20 Toolbar – 2………………………………………………………… 136 4.21 Editing region……………………………………………………… 137 4.22 Status bar…………………………………………………………... 138 4.23 Accelerator (IDR_MENU1)……………………………………….. 155 4.24 Dialog IDD_CKKPLAY…………………………………………... 155 4.25 Icon (IDR_MENU1)………………………………………………. 156 4.26 Cepstrum analysis steps for pitch detection……………………….. 168 Conceptual diagram illustrating vector quantization codebook 4.27 169 formation.………………………………………………………….. 4.28 Block diagram of the MFCC processor……………………………. 173 4.29 Original & extracted wave signal of S for male speaker…………... 174 4.30 Mel-frequency cepstral coefficient of S for male speaker………… 175 4.31 Detailed fft magnitude of S for male speaker……………………… 175 4.32 log 10 mel-scale filter bank output of S for male speaker…………. 176 4.33 Original & extracted wave signal of S for female speaker………… 177 4.34 Mel-frequency cepstral coefficient of S for female speaker………. 178 4.35 Detailed fft magnitude of S for female speaker…………………… 178 4.36 log 10 mel-scale filter bank output of S for female speaker………. 179 4.37 Original & extracted wave signal of R for male speaker………….. 180 4.38 Mel-frequency cepstral coefficient of R for male speaker………… 181 4.39 Detailed fft magnitude of R for male speaker……………………... 181 4.40 log 10 mel-scale filter bank output of R for male speaker………… 182 4.41 Original & extracted wave signal of R for female speaker………... 183 4.42 Mel-frequency cepstral coefficient of R for female speaker………. 184 4.43 Detailed fft magnitude of R for female speaker…………………… 184 4.44 log 10 mel-scale filter bank output of R for female speaker………. 185 4.45 Original & extracted wave signal of Z for male speaker…………... 186 4.46 Mel-frequency cepstral coefficient of Z for male speaker…..……... 187 VI Figure Page Name of Figure No. No. 4.47 Detailed fft magnitude of Z for male speaker……………………… 187 4.48 log 10 mel-scale filter bank output of Z for male speaker…………. 188 4.49 Original & extracted wave signal of Z for female speaker………… 189 4.50 Mel-frequency cepstral coefficient of Z for female speaker……….. 190 4.51 Detailed fft magnitude of Z for female speaker……………………. 190 4.52 log 10 mel-scale filter bank output of Z for female speaker………. 191 4.53 Original & extracted wave signal of T for male speaker………….. 192 4.54 Mel-frequency cepstral coefficient of T for male speaker………… 193 4.55 Detailed fft magnitude of T for male speaker……………………... 193 4.56 log 10 mel-scale filter bank output of T for male speaker………… 194 4.57 Original & extracted wave signal of T for female speaker………... 195 4.58 Mel-frequency cepstral coefficient of T for female speaker………. 196 4.59 Detailed fft magnitude of T for female speaker…………………… 196 4.60 log 10 mel-scale filter bank output of “T for female speaker……… 197 4.61 Original & extracted wave signal of D for male speaker………….. 198 4.62 Mel-frequency cepstral coefficient of D for male speaker………… 199 4.63 Detailed fft magnitude of D for male speaker……………………... 199 4.64 log 10 mel-scale filter bank output of D for male speaker………… 200 4.65 Original & extracted wave signal of D for female speaker………... 201 4.66 Mel-frequency cepstral coefficient of D for female speaker……… 202 4.67 Detailed fft magnitude of D for female speaker…………………… 202 4.68 log 10 mel-scale filter bank output of D for female speaker………. 203 VII List of Tables Table Page Name of Table No. No. 1.1 Comparative char. for dragon naturally speaking professional and 12 IBM via voice………………………………………………………... 4.1 Font measurement…………………………………………………… 122 4.2 Character measurement……………………………………………… 123 4.3 File menu popup options…………………………………………….