"SOME INVESTIGATIONS FOR SEGMENTATION IN SPEECH SYNTHESIS BY CONCATENATION FOR MORE NATURALNESS WITH APPLICATION TO TEXT TO SPEECH (TTS) FOR MARATHI LANGUAGE"

A THESIS SUBMITTED TO
BHARATI VIDYAPEETH UNIVERSITY, PUNE
FOR THE AWARD OF THE DEGREE OF
DOCTOR OF PHILOSOPHY
IN ELECTRONICS ENGINEERING
UNDER THE FACULTY OF ENGINEERING AND TECHNOLOGY

SUBMITTED BY
MRS. SMITA P. KAWACHALE

UNDER THE GUIDANCE OF
DR. J. S. CHITODE

RESEARCH CENTRE
BHARATI VIDYAPEETH DEEMED UNIVERSITY
COLLEGE OF ENGINEERING, PUNE - 411043

JUNE, 2015

CERTIFICATE

This is to certify that the work incorporated in the thesis entitled “Some investigations for segmentation in speech synthesis by concatenation for more naturalness with application to text to speech (TTS) for Marathi language” for the degree of ‘Doctor of Philosophy’ in the subject of Electronics Engineering under the faculty of Engineering and Technology has been carried out by Mrs. Smita P. Kawachale in the Department of Electronics Engineering at Bharati Vidyapeeth Deemed University, College of Engineering, Pune, during the period from August 2010 to October 2014 under the guidance of Dr. J. S. Chitode.

Principal
College of Engineering, Bharati Vidyapeeth University, Pune

Place:
Date:

DECLARATION BY THE CANDIDATE

I declare that the thesis entitled “Some investigations for segmentation in speech synthesis by concatenation for more naturalness with application to text to speech (TTS) for Marathi language” submitted by me for the degree of ‘Doctor of Philosophy’ is the record of work carried out by me during the period from August 2010 to October 2014 under the guidance of Dr. J. S. Chitode and has not formed the basis for the award of any degree, diploma, associateship, fellowship or title in this or any other university or other institution of higher learning. I further declare that the material obtained from other sources has been duly acknowledged in the thesis.

Signature of the candidate
(Mrs. Smita P. Kawachale)

Place:
Date:

CERTIFICATE OF GUIDE

This is to certify that the work incorporated in the thesis entitled “Some investigations for segmentation in speech synthesis by concatenation for more naturalness with application to text to speech (TTS) for Marathi language” submitted by Mrs. Smita P. Kawachale for the degree of ‘Doctor of Philosophy’ in the subject of Electronics Engineering under the faculty of Engineering and Technology has been carried out in Bharati Vidyapeeth University's College of Engineering, Pune, during the period from August 2010 to October 2014 under my direct guidance.

Dr. J. S. Chitode
(Research guide)

Place:
Date:

ACKNOWLEDGEMENT

There are a number of people without whom this thesis might not have been written and to whom I am greatly indebted.

In the first place, I would like to record my gratitude to the Honorable Prof. Dr. A. R. Bhalerao for his supervision, advice and guidance from the very early stage of this research, as well as for giving me extraordinary experiences throughout the work. Above all, and most needed, he provided me with unflinching encouragement and support in various ways. His true scientific and engineering intuition has made him a constant oasis of ideas and passion in science, which has exceptionally inspired and enriched my growth as a student, a researcher and the scientist I want to be. I am indebted to him more than he knows.

I would like to thank my guide, Professor Dr. J. S. Chitode, for providing me the opportunity to work with him. I am so deeply grateful for his help, professionalism and valuable guidance throughout this work and my entire program of study that I do not have enough words to express my deep and sincere appreciation.

To my parents, who have been sources of encouragement and inspiration to me throughout my life. To my brother and brother-in-law, who have been very helpful and supportive in the completion of this thesis.
To my father-in-law and mother-in-law, both of whom have supported my work and this thesis and encouraged me to redefine and recreate my abilities. To my dear husband, a very special thank you for your practical and emotional support as I added the roles of wife and then mother to the competing demands of work, study and personal development. To my daughter, special thanks for being patient and helpful during my daily work routine on this thesis.

I would also like to thank the experts who were involved in the validation and evaluation of this research work: Dr. S. R. Gengage, Dr. G. S. Mani and Dr. Mrs. K. S. Jog. Without their participation and input, the validation and evaluation of this work could not have been successfully conducted.

Many thanks go in particular to the M.I.T. college team, my HOD, Dr. G. N. Mulay, and all my colleagues and students. I am much indebted to Professor Allwyn Anuse for his valuable advice, support, guidance and help in carrying out the neural network (NN) based segmentation part of my work. Special thanks to all my students for their great help and support: Pritam, Vrushali, Nilesh, Anand, Rahul, Ankit, Rushikesh, Nishit, Khushboo, Nikhil, Jaydeep, Pratap, Kuldeep, Nitish, Gaurav, Rohit, Saurabh, Chaitnya, Nihar and Subhash.

I would also like to acknowledge my entire team at Bharati Vidyapeeth, Prof. Vaidya Sir, Mrs. Raut madam, Prof. Dawande, Prof. Chimate, Prof. Kurkute, Mrs. Sampada Dhole madam, Prof. Prachi Mule, Prof. Mrs. Paygude and Prof. Mrs. Vandana Gaikwad, for their advice and for sharing their bright thoughts with me, which were very fruitful in shaping my ideas and research. Special thanks to Mrs. Mangal Patil madam for her consistent help and support and for the time she has given to my work over the last five years. Special thanks to Mr. Salunke of the RD cell. Thanks to my entire team of linguists, Prof. Smita Bondre, Prof. H. G. Mate, Prof. Birajdaar and Mrs. Swati Kulkarni, for their language guidance and the time they gave to contextual analysis, which helped in database optimization and syllabification.

I am very grateful to Mr. Suryawanshi, Bharati Bhavan, for his timely help and support. Thanks for all the help related to university submissions, formats and procedures.

Finally, I would like to thank everybody who was important to the successful realization of this thesis, and I apologize that I could not mention each of them personally.

Mrs. Smita Kawachale

CONTENTS

1 List of tables (p. XVIII)
2 List of figures (pp. XIX-XXVIII)
3 Abbreviations (p. XXIX)
4 Abstract (pp. XXX-XXXV)

No. Title of the chapter & contents (page no.)

1. System block diagram (pp. 1-3)
   Introduction to system block diagram (p. 2)
2. Objectives of the research (pp. 4-9)
   2.1 Objectives (p. 6)
   2.2 Sub-objectives of the research work (p. 7)
   2.3 Organization of the thesis (p. 7)
3. Theory of TTS (pp. 10-34)
   3.1 Introduction (p. 11)
   3.2 Sound elements for speech synthesis (p. 15)
      3.2.1 Classification of speech (p. 16)
      3.2.2 Elements of a language (p. 18)
   3.3 Methods and approaches to speech synthesis (p. 18)
   3.4 Language study (p. 25)
      3.4.1 The consonants (p. 26)
      3.4.2 Vowels (p. 27)
      3.4.3 Consonant conjuncts (p. 28)
   3.5 Present scenario of TTS systems (p. 29)
      3.5.1 DECTalk (p. 29)
      3.5.2 Bell labs text-to-speech (p. 30)
      3.5.3 Laureate (p. 30)
      3.5.4 SoftVoice (p. 31)
      3.5.5 CNET PSOLA (p. 31)
      3.5.6 ORATOR (p. 32)
      3.5.7 Eurovocs (p. 32)
      3.5.8 Lernout & Hauspie's (p. 33)
      3.5.9 Apple plain talk (p. 33)
      3.5.10 Silpa (p. 34)
4. Literature review (pp. 35-63)
   4.1 Introduction (p. 36)
   4.2 Review of the literature (p. 37)
      4.2.1 Review of context based speech synthesis papers (p. 37)
      4.2.2 Review of evaluation of speech synthesis with spectral smoothing methods (p. 42)
      4.2.3 Review of concatenative speech synthesis and segmentation (p. 48)
   4.3 Review of recent papers on performance improvement of TTS systems (p. 51)
   4.4 Summary of literature review (p. 61)
5. Contextual analysis and classification of syllables for Marathi language (pp. 64-88)
   5.1 Review of the related literature (p. 65)
   5.2 Language study (p. 65)
   5.3 CV structure (p. 67)
   5.4 Block diagram of contextual analysis (p. 67)
   5.5 Implementation of contextual analysis system (p. 76)
      5.5.1 Input text (p. 76)
      5.5.2 Text encoding (p. 76)
      5.5.3 CV structure formation (p. 77)
      5.5.4 Performance evaluation and result discussion of contextual analysis (p. 77)
   5.6 Conclusion of contextual analysis (p. 88)
6. Position based syllabification using neural and non-neural techniques (pp. 89-185)
   6.1 Factors of voice quality variation for database creation (p. 94)
   6.2 Neural network for segmentation (p. 95)
   6.3 Neural network and its types (p. 96)
      6.3.1 Basic block diagram of neural network for segmentation (p. 98)
   6.4 Why automatic segmentation? (p. 103)
      6.4.1 Syllable as unit (p. 105)
      6.4.2 Why not dictionary? (p. 108)
      6.4.3 Energy, extracted feature for NN (p. 109)
      6.4.4 Why energy is used as parameter for soft-cutting? (p. 110)
   6.5 Basic algorithm of neural network segmentation system (p. 111)
      6.5.1 Purpose of algorithm (p. 111)
      6.5.2 Methodology used (p. 111)
      6.5.3 Result/outcome (p. 112)
      6.5.4 Basic algorithm of segmentation system (p. 112)
      6.5.5 Flowchart (p. 112)
   6.6 Energy calculation (p. 113)
      6.6.1 Actual formula used (p. 114)
      6.6.2 Algorithm for energy calculation (p. 115)
         6.6.2.1 Purpose of energy algorithm (p. 115)
         6.6.2.2 Methodology used (p. 115)
         6.6.2.3 Results/outcome of the algorithm (p. 115)
      6.6.3 Flowchart (p. 115)
   6.7 Post-processing of energy plot (p. 117)
      6.7.1 Normalization (p. 117)
         6.7.1.1 Purpose of normalization (p. 118)
         6.7.1.2 Methodology used (p. 118)
         6.7.1.3 Results/outcome of the algorithm (p. 118)
      6.7.2 Normalization algorithm (p. 118)
      6.7.3 Smoothing energy plot (p. 119)
   6.8 Block diagram of basic TTS system and segmentation (p. 120)