The Hieroglyphs

(Third Draft)
The Hieroglyphs: Building Speech Applications Using CMU Sphinx and Related Resources

Arthur Chan, Evandro Gouvêa, Rita Singh, Mosur Ravishankar, Ronald Rosenfeld, Yitao Sun, David Huggins-Daines, Mike Seltzer

March 11, 2007

Copyright © by Arthur Chan, Evandro Gouvêa, Rita Singh, Ravi Mosur, Ronald Rosenfeld, Yitao Sun, David Huggins-Daines and Mike Seltzer

This page was generated with the help of DOC++ http://docpp.sourceforge.net March 11, 2007

Contents

Table of Contents
List of Figures
List of Tables

I Before You Start

1 License and use of CMU Sphinx, SphinxTrain and CMU-Cambridge LM Toolkit
    1.1 License Agreement of Sphinx 2, Sphinx 3 and SphinxTrain Source Code
    1.2 License Agreement of CMU Sphinx 4
    1.3 License Agreement of the CMU-Cambridge LM Toolkit
    1.4 About The Hieroglyphs
    1.5 How to contribute?
    1.6 Version History of Sphinx 2
        1.6.1 Sphinx 2.5 Release Notes
        1.6.2 Sphinx 2.4 Release Notes
    1.7 Version History of Sphinx 3
        1.7.1 A note in current development of Sphinx 3.7
        1.7.2 Sphinx 3.6.3 Release Notes
        1.7.3 Sphinx 3.6.2 Release Notes
        1.7.4 Sphinx 3.6.1 Release Notes
        1.7.5 Sphinx 3.6 Release Notes
        1.7.6 Sphinx 3.5 Release Notes
        1.7.7 Sphinx 3.4 Release Notes
    1.8 Version History of Sphinx 4
    1.9 Version History of PocketSphinx
        1.9.1 PocketSphinx 0.2
        1.9.2 PocketSphinx 0.2.1
        1.9.3 PocketSphinx 0.2.1

2 Introduction
    2.1 Why do I want to use CMU Sphinx?
    2.2 The Hieroglyphs
    2.3 The Sphinx Developers
    2.4 Which CMU Sphinx should I use?
    2.5 Other Open Source Speech Recognition Project in CMU: Hephaestus
        2.5.1 Festival
        2.5.2 CMU Communicator Toolkit
        2.5.3 CMUdict, CMUlex
        2.5.4 Speech Databases
        2.5.5 OpenVXML and OpenSALT browser
        2.5.6 Miscellaneous
    2.6 How do I get more help?

II Tutorial

3 Software Installation
    3.1 Sphinx 3.X
    3.2 SphinxTrain
    3.3 CMU LM Toolkit
    3.4 Other tools related to the recognizer
    3.5 FAQ of the installation process
        3.5.1 What if I am a user of Microsoft Windows?
        3.5.2 How about other Windows compilers?
        3.5.3 How about other Unix platforms?
        3.5.4 What if I can't build Sphinx 3.X or SphinxTrain?
        3.5.5 What if I am not a fan of C?
        3.5.6 How do I get help for installation?

4 Building speech recognition system
    4.1 Step 1: Design of a Speech Recognition System
        4.1.1 Do you really need speech?
        4.1.2 Is Your Requirement Feasible?
        4.1.3 System Concern
        4.1.4 Our recommendation
        4.1.5 Summary
    4.2 Step 2: Design the Vocabulary
        4.2.1 The System for This Tutorial
        4.2.2 Pronunciations of the Words
    4.3 Step 3: Design the Grammar for a Speech Recognition System
        4.3.1 Coverage of a Grammar
        4.3.2 Mismatch between the Dictionary and the Language Model
        4.3.3 Interpretation of ARPA N-gram format
        4.3.4 Grammar for this tutorial
        4.3.5 Conversion of the language model to DMP file format
    4.4 Step 4: Obtain an Open Source Acoustic Model
        4.4.1 How to get a model?
        4.4.2 Are there other models?
        4.4.3 Why Acoustic Models are so Rare?
        4.4.4 Interpretation of Model Formats
        4.4.5 Summary
    4.5 Step 5: Develop a speech recognition system
        4.5.1 About the Sphinx 3's Package
        4.5.2 A batch mode system using decode and decode_anytopo: What could go wrong?
        4.5.3 A live mode simulator using live pretend
        4.5.4 A Live-mode System
        4.5.5 Could it be Faster?
    4.6 Step 6: Evaluate your own system
        4.6.1 On-line Evaluation
        4.6.2 Off-line Evaluation
    4.7 Interlude
    4.8 Step 7: Collection and Processing of Data
        4.8.1 Determine the amount of data needed
        4.8.2 Record the data in appropriate condition: environmental noise
    4.9 Step 8: Transcription of data
        4.9.1 Word-level and Phoneme-level Transcription
        4.9.2 Checking of Transcription
        4.9.3 Software Tools for Transcription
        4.9.4 Transcription for Acoustic Modeling and Language Modeling
        4.9.5 SphinxTrain's and align Transcription Formats
    4.10 Step (7 & 8) Plan B: Buying Data
    4.11 Training of Acoustic Model. A note before we proceed
        4.11.1 SphinxTrain as a Set of Training Tools
        4.11.2 Training follow a recipe
        4.11.3 Training from scratch
        4.11.4 Our Recommendations
    4.12 Step 9 and beyond

III CMU Sphinx Speech Recognition System

5 Sphinx's Front End
    5.1 Introduction
    5.2 Block Diagram
    5.3 Front End Processing Parameters
    5.4 Detail of Front End processing
        5.4.1 Pre-emphasis
        5.4.2 Framing
        5.4.3 Windowing
        5.4.4 Power Spectrum
        5.4.5 Mel Spectrum
        5.4.6 Mel Cepstrum
    5.5 Mel Filter Banks
    5.6 The Default Front End Parameters
    5.7 Computation of Dynamic Coefficients
        5.7.1 Generic and simplified notations in SphinxTrain
        5.7.2 Formulae for Dynamic Coefficient
        5.7.3 Handling of Initial and Final Position of an Utterance
    5.8 Cautions in Using the Front End
    5.9 Compatibility of Front Ends in Different Versions
    5.10 cepview

6 General Software Description of SphinxTrain
    6.1 On-line Help of SphinxTrain
    6.2 Software Architecture of SphinxTrain
        6.2.1 General Description of the C-based Applications
        6.2.2 General Description of the Perl-based Tool
    6.3 Basic Acoustic Models Format in SphinxTrain
    6.4 Sphinx data and model formats
        6.4.1 Sphinx 2 data formats
        6.4.2 Sphinx 2 Model formats
        6.4.3 Sphinx 3 model formats
        6.4.4 Sphinx 4 model formats

7 Acoustic Model Training

8 Language Model Training
    8.1 Installation of the Toolkit
    8.2 Terminology and File Formats
    8.3 Typical Usage
    8.4 Discounting Strategies
        8.4.1 Good Turing discounting
        8.4.2 Witten Bell discounting
        8.4.3 Linear discounting

9 Search structure and Speed-up of the speech recognizer
    9.1 Introduction
    9.2 Sphinx 3.X's recognizer general architecture
    9.3 Importance of tuning
    9.4 Sphinx 3.X's decode_anytopo (or s3slow)
        9.4.1 GMM Computation
        9.4.2 Search structure
        9.4.3 Treatment of language model
        9.4.4 Triphone representation
        9.4.5 Viterbi Pruning
        9.4.6 Search Tuning
        9.4.7 2-nd pass Search
        9.4.8 Debugging
    9.5 Sphinx 3.X's decode (aka s3fast)
    9.6 Architecture of Search in decode
        9.6.1 Initialization
        9.6.2 Lexicon representation
        9.6.3 Language model
        9.6.4 Pruning
        9.6.5 Phoneme look-ahead
    9.7 Architecture of GMM Computation in decode
        9.7.1 Frame-level optimization
        9.7.2 GMM-level optimization
        9.7.3 Gaussian-level optimization: VQ-based Gaussian Selection and SVQ-based Gaussian Selection
        9.7.4 Gaussian/Component-level optimization: Sub-vector quantization
        9.7.5 Interaction between different level of optimization
        9.7.6 Related tools, gs_select, gs_view, gausubvq
        9.7.7 Interaction between GMM computation and search routines in Sphinx 3.X
    9.8 Overall search structure of decode
    9.9 Multi-pass systems using Sphinx 3.X
        9.9.1 Word Lattice
        9.9.2 astar
        9.9.3 dag
    9.10 Other tools inside S3.X packages
        9.10.1 align
        9.10.2 allphone
    9.11 Using the Sphinx 3 decoder with semi-continuous and continuous models

10 Speaker Adaptation using Sphinx 3
    10.1 Speaker Adaptation
    10.2 Different Principles of Speaker Adaptation
        10.2.1 In terms of the mode of collecting adaptation data
        10.2.2 In terms of technique of parameter estimation
    10.3 MLLR with SphinxTrain
        10.3.1 Using bw for adaptation
        10.3.2 mllr_solve
        10.3.3 Using mllr_transform to do offline mean transformation
        10.3.4 On-line adaptation using Sphinx 3.0 and Sphinx 3.X decoder
    10.4 MAP with SphinxTrain

A Command Line Information
    A.1 Sphinx 3 Decoders
        A.1.1 decode
        A.1.2 livedecode
