
The Hieroglyphs


(Third Draft) The Hieroglyphs: Building Speech Applications Using CMU Sphinx and Related Resources

Arthur Chan, Evandro Gouvêa, Rita Singh, Mosur Ravishankar, Ronald Rosenfeld, Yitao Sun,
David Huggins-Daines, Mike Seltzer

March 11, 2007

Copyright by Arthur Chan, Evandro Gouvêa, Rita Singh, Ravi Mosur, Ronald Rosenfeld, Yitao Sun, David Huggins-Daines and Mike Seltzer


Table of Contents iii

List of Figures xiii

List of Tables xiv

I Before You Start 1

1 License and use of CMU Sphinx, SphinxTrain and CMU-Cambridge LM Toolkit 3
1.1 License Agreement of Sphinx 2, Sphinx 3 and SphinxTrain Source Code ...... 4
1.2 License Agreement of CMU Sphinx 4 ...... 5
1.3 License Agreement of the CMU-Cambridge LM Toolkit ...... 6
1.4 About The Hieroglyphs ...... 6
1.5 How to contribute? ...... 7
1.6 Version History of Sphinx 2 ...... 7
1.6.1 Sphinx 2.5 Release Notes ...... 7
1.6.2 Sphinx 2.4 Release Notes ...... 8
1.7 Version History of Sphinx 3 ...... 8
1.7.1 A note on current development of Sphinx 3.7 ...... 8
1.7.2 Sphinx 3.6.3 Release Notes ...... 8


1.7.3 Sphinx 3.6.2 Release Notes ...... 9
1.7.4 Sphinx 3.6.1 Release Notes ...... 10
1.7.5 Sphinx 3.6 Release Notes ...... 11
1.7.6 Sphinx 3.5 Release Notes ...... 15
1.7.7 Sphinx 3.4 Release Notes ...... 16
1.8 Version History of Sphinx 4 ...... 17
1.9 Version History of PocketSphinx ...... 19
1.9.1 PocketSphinx 0.2 ...... 19
1.9.2 PocketSphinx 0.2.1 ...... 19
1.9.3 PocketSphinx 0.2.1 ...... 19

2 Introduction 21
2.1 Why do I want to use CMU Sphinx? ...... 21
2.2 The Hieroglyphs ...... 22
2.3 The Sphinx Developers ...... 23
2.4 Which CMU Sphinx should I use? ...... 23
2.5 Other Open Source Project in CMU: Hephaestus ...... 26
2.5.1 Festival ...... 26
2.5.2 CMU Communicator Toolkit ...... 27
2.5.3 CMUdict, CMUlex ...... 27
2.5.4 Speech Databases ...... 27
2.5.5 OpenVXML and OpenSALT browser ...... 28
2.5.6 Miscellaneous ...... 28
2.6 How do I get more help? ...... 28

II Tutorial 29

3 Software Installation 31
3.1 Sphinx 3.X ...... 31
3.2 SphinxTrain ...... 32


3.3 CMU LM Toolkit ...... 33
3.4 Other tools related to the recognizer ...... 34
3.5 FAQ of the installation process ...... 34
3.5.1 What if I am a user of Microsoft Windows? ...... 34
3.5.2 How about other Windows compilers? ...... 34
3.5.3 How about other Unix platforms? ...... 35
3.5.4 What if I can't build Sphinx 3.X or SphinxTrain? ...... 35
3.5.5 What if I am not a fan of C? ...... 35
3.5.6 How do I get help for installation? ...... 35

4 Building a speech recognition system 37
4.1 Step 1: Design of a Speech Recognition System ...... 38
4.1.1 Do you really need speech? ...... 39
4.1.2 Is Your Requirement Feasible? ...... 39
4.1.3 System Concern ...... 43
4.1.4 Our recommendation ...... 43
4.1.5 Summary ...... 43
4.2 Step 2: Design the Vocabulary ...... 44
4.2.1 The System for This Tutorial ...... 44
4.2.2 Pronunciations of the Words ...... 44
4.3 Step 3: Design the Grammar for a Speech Recognition System ...... 49
4.3.1 Coverage of a Grammar ...... 49
4.3.2 Mismatch between the Dictionary and the Language Model ...... 49
4.3.3 Interpretation of ARPA N-gram format ...... 51
4.3.4 Grammar for this tutorial ...... 52
4.3.5 Conversion of the language model to DMP file format ...... 54
4.4 Step 4: Obtain an Open Source Acoustic Model ...... 55
4.4.1 How to get a model? ...... 55
4.4.2 Are there other models? ...... 57
4.4.3 Why Acoustic Models are so Rare? ...... 57


4.4.4 Interpretation of Model Formats ...... 57
4.4.5 Summary ...... 62
4.5 Step 5: Develop a speech recognition system ...... 63
4.5.1 About the Sphinx 3's Package ...... 63
4.5.2 A batch mode system using decode and decode_anytopo: What could go wrong? ...... 66
4.5.3 A live mode simulator using livepretend ...... 71
4.5.4 A Live-mode System ...... 71
4.5.5 Could it be Faster? ...... 73
4.6 Step 6: Evaluate your own system ...... 75
4.6.1 On-line Evaluation ...... 75
4.6.2 Off-line Evaluation ...... 76
4.7 Interlude ...... 77
4.8 Step 7: Collection and Processing of Data ...... 77
4.8.1 Determine the amount of data needed ...... 78
4.8.2 Record the data in appropriate conditions: environmental noise ...... 78
4.9 Step 8: Transcription of data ...... 80
4.9.1 Word-level and Phoneme-level Transcription ...... 80
4.9.2 Checking of Transcription ...... 80
4.9.3 Software Tools for Transcription ...... 80
4.9.4 Transcription for Acoustic Modeling and Language Modeling ...... 81
4.9.5 SphinxTrain's and align's Transcription Formats ...... 81
4.10 Step (7 & 8) Plan B: Buying Data ...... 83
4.11 Training of Acoustic Model: A note before we proceed ...... 84
4.11.1 SphinxTrain as a Set of Training Tools ...... 84
4.11.2 Training following a recipe ...... 85
4.11.3 Training from scratch ...... 86
4.11.4 Our Recommendations ...... 86
4.12 Step 9 and beyond ...... 88


III CMU Sphinx Speech Recognition System 89

5 Sphinx’s Front End 91 5.1 Introduction ...... 91 5.2 BlockDiagram ...... 92 5.3 FrontEndProcessingParameters ...... 93 5.4 DetailofFrontEndprocessing...... 93 5.4.1 Pre-emphasis ...... 93 5.4.2 Framing ...... 95 5.4.3 Windowing ...... 95 5.4.4 PowerSpectrum ...... 95 5.4.5 MelSpectrum...... 95 5.4.6 MelCepstrum...... 96 5.5 MelFilterBanks ...... 97 5.6 TheDefaultFrontEndParameters ...... 97 5.7 ComputationofDynamicCoefficients...... 98 5.7.1 Generic and simpilified notations in SphinxTrain . . . 100 5.7.2 FormulaeforDynamicCoefficient ...... 100 5.7.3 Handling of Initial and Final Position of an Utterance . 100 5.8 CautionsinUsingtheFrontEnd ...... 101 5.9 Compatibility of Front Ends in Different Versions ...... 102 5.10cepview ...... 102

6 General Software Description of SphinxTrain 105
6.1 On-line Help of SphinxTrain ...... 105
6.2 Software Architecture of SphinxTrain ...... 107
6.2.1 General Description of the C-based Applications ...... 107
6.2.2 General Description of the Perl-based Tool ...... 109
6.3 Basic Acoustic Models Format in SphinxTrain ...... 110
6.4 Sphinx data and model formats ...... 110
6.4.1 Sphinx 2 data formats ...... 110


6.4.2 Sphinx 2 model formats ...... 111
6.4.3 Sphinx 3 model formats ...... 116
6.4.4 Sphinx 4 model formats ...... 120

7 Acoustic Model Training 121

8 Language Model Training 123
8.1 Installation of the Toolkit ...... 123
8.2 Terminology and File Formats ...... 125
8.3 Typical Usage ...... 126
8.4 Discounting Strategies ...... 127
8.4.1 Good-Turing discounting ...... 127
8.4.2 Witten-Bell discounting ...... 128
8.4.3 Linear discounting ...... 128

9 Search structure and Speed-up of the speech recognizer 129
9.1 Introduction ...... 129
9.2 Sphinx 3.X's recognizer general architecture ...... 130
9.3 Importance of tuning ...... 131
9.4 Sphinx 3.X's decode_anytopo (or s3slow) ...... 131
9.4.1 GMM Computation ...... 132
9.4.2 Search structure ...... 132
9.4.3 Treatment of language model ...... 133
9.4.4 Triphone representation ...... 133
9.4.5 Viterbi Pruning ...... 133
9.4.6 Search Tuning ...... 133
9.4.7 2nd-pass Search ...... 134
9.4.8 Debugging ...... 135
9.5 Sphinx 3.X's decode (aka s3fast) ...... 135
9.6 Architecture of Search in decode ...... 135
9.6.1 Initialization ...... 136


9.6.2 Lexicon representation ...... 137
9.6.3 Language model ...... 139
9.6.4 Pruning ...... 142
9.6.5 Phoneme look-ahead ...... 145
9.7 Architecture of GMM Computation in decode ...... 145
9.7.1 Frame-level optimization ...... 145
9.7.2 GMM-level optimization ...... 146
9.7.3 Gaussian-level optimization: VQ-based Gaussian Selection and SVQ-based Gaussian Selection ...... 146
9.7.4 Gaussian/Component-level optimization: Sub-vector quantization ...... 147
9.7.5 Interaction between different levels of optimization ...... 148
9.7.6 Related tools: gs_select, gs_view, gausubvq ...... 148
9.7.7 Interaction between GMM computation and search routines in Sphinx 3.X ...... 148
9.8 Overall search structure of decode ...... 149
9.9 Multi-pass systems using Sphinx 3.X ...... 150
9.9.1 Word Lattice ...... 150
9.9.2 astar ...... 152
9.9.3 dag ...... 152
9.10 Other tools inside S3.X packages ...... 153
9.10.1 align ...... 153
9.10.2 allphone ...... 154
9.11 Using the Sphinx 3 decoder with semi-continuous and continuous models ...... 155

10 Speaker Adaptation using Sphinx 3 157
10.1 Speaker Adaptation ...... 157
10.2 Different Principles of Speaker Adaptation ...... 158
10.2.1 In terms of the mode of collecting adaptation data ...... 158
10.2.2 In terms of technique of parameter estimation ...... 159
10.3 MLLR with SphinxTrain ...... 160


10.3.1 Using bw for adaptation ...... 161
10.3.2 mllr_solve ...... 162
10.3.3 Using mllr_transform to do offline mean transformation ...... 162
10.3.4 On-line adaptation using Sphinx 3.0 and Sphinx 3.X decoder ...... 163
10.4 MAP with SphinxTrain ...... 163

A Command Line Information 167
A.1 Sphinx 3 Decoders ...... 167
A.1.1 decode ...... 167
A.1.2 livedecode ...... 175
A.1.3 livepretend ...... 184
A.1.4 gausubvq ...... 193
A.1.5 align ...... 196
A.1.6 allphone ...... 196
A.1.7 dag ...... 196
A.1.8 astar ...... 196
A.1.9 cepview ...... 196
A.2 SphinxTrain: Executables ...... 197
A.2.1 agg_seg ...... 197
A.2.2 bldtree ...... 201
A.2.3 bw ...... 204
A.2.4 cp_parm ...... 209
A.2.5 delint ...... 211
A.2.6 dict2tri ...... 213
A.2.7 inc_comp ...... 215
A.2.8 init_gau ...... 217
A.2.9 init_mixw ...... 220
A.2.10 kmeans_init ...... 222
A.2.11 map_adapt ...... 226
A.2.12 make_quests ...... 228


A.2.13 mixw_interp ...... 230
A.2.14 mk_flat ...... 232
A.2.15 mk_mdef_gen ...... 234
A.2.16 mk_mllr_class ...... 236
A.2.17 mk_model_def ...... 238
A.2.18 mk_s2cb ...... 240
A.2.19 mk_s2hmm ...... 242
A.2.20 mk_s2phone ...... 244
A.2.21 mk_s2phonemap ...... 246
A.2.22 mk_s2sendump ...... 248
A.2.23 mk_s3gau ...... 250
A.2.24 mk_s3mix ...... 252
A.2.25 mk_s3tmat ...... 254
A.2.26 mk_ts2cb ...... 256
A.2.27 mllr_solve ...... 258
A.2.28 mllr_transform ...... 260
A.2.29 norm ...... 262
A.2.30 param_cnt ...... 264
A.2.31 printp ...... 266
A.2.32 prunetree ...... 268
A.2.33 tiestate ...... 270
A.2.34 wave2feat ...... 273
A.2.35 QUICK_COUNT ...... 278
A.3 SphinxTrain: Perl Training Scripts ...... 280
A.3.1 00.verify/verify_all.pl ...... 280
A.3.2 01.vector_quantize/slave.VQ.pl ...... 280
A.3.3 02.ci_schmm/slave_convg.pl ...... 280
A.3.4 03.makeuntiedmdef/make_untied_mdef.pl ...... 280
A.3.5 04.cd_schmm_untied/slave_convg.pl ...... 280
A.3.6 05.buildtrees/slave.treebuilder.pl ...... 280
A.3.7 06.prunetree/prunetree.pl ...... 280


A.3.8 07.cd-schmm/slave_convg.pl ...... 280
A.3.9 08.deleted-interpolation/deleted_interpolation.pl ...... 280
A.3.10 Summary of Configuration Parameters at script level ...... 280
A.3.11 Notes on parallelization ...... 280
A.4 CMU-Cambridge LM Toolkit ...... 281
A.4.1 binlm2arpa ...... 281
A.4.2 evallm ...... 282
A.4.3 idngram2lm ...... 284
A.4.4 idngram2stats ...... 286
A.4.5 interpolate ...... 287
A.4.6 mergeidngram ...... 289
A.4.7 ngram2mgram ...... 289
A.4.8 text2idngram ...... 290
A.4.9 text2wfreq ...... 291
A.4.10 text2wngram ...... 292
A.4.11 wfreq2vocab ...... 293
A.4.12 wngram2idngram ...... 294

List of Figures

4.1 Resource used by the decoders ...... 67

5.1 Block Diagram for the front-end processing ...... 92


List of Tables

4.1 Description of Phone Set in CMUlex ...... 50

5.1 Command-line options for the front end parameters in wave2feat and Sphinx 3's decoders ...... 94
5.2 Default Parameters for the front end in Sphinx 3 ...... 98
5.3 Naming Convention table for feature type s2_4x ...... 99
5.4 Naming Convention table for feature type s3_1x39 ...... 99
5.5 Naming Convention table for feature type 1s_c_d_dd ...... 100
5.6 Generic and Simplified notations for parameters in SphinxTrain ...... 101

A.1 Typical parameter values for common sampling rates ...... 273


Preface at Draft III

The open source project "CMU Sphinx" started by providing only speech recognizers, that is, devices for transforming speech to text (STT). By definition, they are devices which are capable of transforming acoustic waveforms into words. Later on, the developers realized that building speech applications involves not only the recognizers (or, technically, the decoders) but also several interrelated resources. Therefore, the CMU Sphinx project now includes the acoustic model trainer, the language model trainer, as well as publicly available acoustic and language models.

The comprehensiveness of the tools and components which CMU Sphinx provides does not change the fact that building a speech application requires special techniques and knowledge. In the discussion forum of Sphinx, we observed that many users stumbled on trivial issues of building a system. It became obvious that the most challenging problem which many developers faced was still access to documentation. The difficulty of finding documented information restricted the use of Sphinx to users with long years of experience in programming and speech recognition research. Whatever information one could find was distributed over several web pages of experienced users and developers. Using Sphinx seemed like playing a jigsaw puzzle; the user always needed to gather a lot of information before building a real system.

Some of The Sphinx Developers, including me, were soon convinced that the best way to change this situation was to introduce a document that recorded most, if not all, knowledge about using Sphinx and its related resources to build speech applications. When I told this idea to mentors and friends of mine at CMU, many of them shook their heads. It was a great task but also a tremendous one. Nevertheless, Prof. Alex Rudnicky, my mentor at that time, told me that it was a good idea. Half jokingly, he appointed me the "Editor" of this document. What the title really means is that I cut and pasted a


lot of documents into these tiny notes. That is why I started to work on this about two years ago. It started with two chapters, and progress was very slow. There was a lot of material to cover, so later on I decided to expand it into a thirteen-chapter book. Dr. Evandro Gouvêa and David Huggins-Daines have given me a lot of help. By the end of the summer of 2005, about 10 chapters of the book had a first draft. That was the point when I decided to give this document a name.

Following the CMU Sphinx Group's tradition of naming projects after mythological or ancient items, we named the current effort "The Hieroglyphs". The name "The Hieroglyphs" symbolizes an important aspect of learning any speech recognition toolkit: it requires extensive knowledge. It is not an exaggeration that experts in any speech recognition system are usually experts in several seemingly unrelated fields, such as phonetics, linguistics, signal processing, pattern recognition, computer algorithms and artificial intelligence. The knowledge required might be as deep as the knowledge required to understand hieroglyphs.

What followed was a long year of work on funded project(s) for me. I was very exhausted towards the end of them and was seeking a change. That is why I decided to leave CMU and join a small company named Scanscout. Project "The Hieroglyphs" was abandoned for 8 months.

Professionalism made me restart the writing, mainly because "The Hieroglyphs" is probably the only CMU project of mine which is not wrapped up well. Sphinx 3.X now has X equal to 6. SphinxTrain, under the lead of Dave and Evandro, has become more robust. Even the CMU-Cambridge LM Toolkit was re-licensed in version 3. These pieces of software are probably my favorite things, and I was glad to see their progress. I also witnessed the emergence of Sphinx 4, mainly by the Sun Microsystems team, and PocketSphinx, mainly by David Huggins-Daines. Many believe that they are the future of the Sphinx projects. With this I absolutely agree.

Progress does not seem to happen on "The Hieroglyphs", though. My life-long problem in writing is that I would like to share a lot of things, and sometimes too much. I used to hope that "The Hieroglyphs" would grow from being a mere manual for Sphinx into an introduction to speech recognition systems, a book that allows both beginners and experts to learn the essential facts of using Sphinx and its related tools. But that could be too much, and in a way it made me feel too exhausted to finish it.

However, I finally realized that we could still share parts of the knowledge with the users first. Therefore, I decided to trim some parts of the


document and put the rest out as the third draft. The trimmed parts include:

• The history section.
• The introduction to speech recognition.
• Some portions of the recipe section.
• All of the sections on acoustic modeling.
• Development and research using Sphinx.
• The FAQ.

They will be rewritten in future drafts. Compared to the first draft, the only draft I have released to the public, the third draft looks more like a book. The reader will also find that it is more informative. Some chapters could be essential when working on speech recognition systems. For example, the chapter about the front end is very useful; I often use it as a reference myself. I hope that the information we have collected in the last few years can be shared with users, even if it is not sufficiently polished in a scholarly manner. I would like to say that my knowledge of speech recognition, or even of Sphinx, is very limited. Therefore, I hope the reader of this book will understand its drafty nature and provide feedback to improve it in the future. My thanks go to the other authors of this book, especially Dave and Evandro. I would also like to thank Prof. Alex Rudnicky, Prof. Richard Stern and Prof. Alan Black for their encouragement.

Note in Second Draft: The writing process and current status

The Writing Process

There is a vast amount of knowledge about using Sphinx, and the fact that multiple versions of Sphinx(en) exist also steepens the user's learning curve. Therefore, we chose to use one single recognizer as the starting point of the material. The goal of the first edition is therefore to create a single document that a developer could use


Sphinx 3 + SphinxTrain + CMU-Cambridge LM Toolkit

to create an application. We chose to use Sphinx 3 mainly because it was the active branch which CMU was maintaining at the time. We also found that the combination is more coherent because all three tools are written in the same programming language.

Even with this more limited scope, the editor still found that 15 chapters of material (2 of them appendices) were required to be completed. It is fortunate that the original authors of all previous documents on each part of Sphinx were willing to give permission to either reprint or re-use their material. That made the writing process much easier.

As of June 2005, the first drafts of 9 chapters had already been fully completed. As of the end of August 2005, the first drafts of Chapters 6, 7 and 8 were also completed. Here are descriptions of the completed chapters, the material they reference and their current status.

In Chapter 1, the license and version history of Sphinx 2, Sphinx 3, Sphinx 4, SphinxTrain and the CMU-Cambridge LM Toolkit are described. This chapter is completed.

Chapter 2 describes the history of the development of Sphinx 2, Sphinx 3, Sphinx 4, SphinxTrain and the CMU-Cambridge LM Toolkit, the differences between the versions of Sphinx, and a brief guideline on how to choose which recognizer to use. The history of Sphinx largely references Dr. Rita Singh's "History of Sphinx" document. This chapter is completed.

Chapter 3 describes how to install the Sphinx software. This part is completed.

Chapter 5 provides information on how Sphinx carries out feature extraction and dynamic coefficient computation. This part is largely referenced from Dr. Mike Seltzer's document "Sphinx 3 Front End Specification". The content of this part is almost completed but requires proofreading. We also feel that it is necessary to add details such as CMN, variance normalization and AGC to this chapter.

Chapter 6 provides a general description of the software package of our training process. It is almost completed. However, some details, such as the description of the Sphinx 3 file formats and what the headers look like, are still missing.

Chapter 7 provides a detailed look at how acoustic models can be trained for both continuous HMMs and semi-continuous HMMs. It is largely adapted from Dr. Rita Singh's web page of training instructions. This


part is largely completed. However, it still lacks examples that could help users solve problems in real-life situations. In the next revision, we will focus on addressing these problems.

Chapter 8 provides a manual for using the CMU-Cambridge language model toolkit. The material is largely adapted from Dr. Philip Clarkson's instruction web page.

Chapter 9 provides a detailed description of how the search of Sphinx 3 works. This part references Dr. Mosur Ravishankar's tutorial on Sphinx 3 and my own presentations "From Sphinx 3.3 to Sphinx 3.4" and "From Sphinx 3.4 to Sphinx 3.5". This part is largely completed. However, important detailed descriptions of how the lexical tree, N-gram search and context-dependent triphones work in decode and decode_anytopo are still missing at the current stage.

Chapter 10 discusses how to use the speaker adaptation facilities that began to be provided by SphinxTrain and Sphinx 3 in 2004. This part is mostly completed. However, we might still want to add detail on how better alignment of the transcription could provide larger gains in speaker adaptation.

Appendix A provides the command line information for all tools in Sphinx 3, SphinxTrain and the CMU-Cambridge LM Toolkit. This part relies on automatic generation of the TeX file from the SphinxTrain and Sphinx 3 command lines. The SphinxTrain tool part is completed but needs to be updated. The command-line information for the SphinxTrain scripts, Sphinx 3 and the CMU-Cambridge LM Toolkit is not yet completed.

Two chapters ended up being removed in the second draft

The first is "Development using Sphinx", which discusses how to make use of the live-mode APIs of Sphinx 3 to develop speech applications. The backbone of this chapter is completed. It largely references Yitao Sun's code documentation that was automatically generated by DOC++. However, in 2005, the Sphinx developers decided to switch from DOC++ to Doxygen. Therefore, we expect to replace the current documentation very soon.

The second is "FAQ of using Sphinx", which lists the frequently asked questions collected by Dr. Rita Singh. This part is largely copied from the page on cmusphinx.org which was first collected by Dr. Rita Singh and is now maintained by Dr. Evandro Gouvêa. This part is completed. However, updates will be necessary when new questions come up in developers' minds.


Focus of writing of second draft

The editor is currently focused on Chapter 4, which provides a detailed description of how to use the Sphinx system from scratch. After its completion, the document will be released on the editor's web page again.

Future Plan of this Document

The coexistence of Sphinx 2, Sphinx 3 and Sphinx 4 and their extensive usage in industry are perhaps a sign of the creativity of workers in speech recognition. We hope that this trend can continue. Each Sphinx represents a spectrum of speech applications that developers or researchers want to build. Therefore, it is fair to explain their usage in detail in this document as well. As we mentioned, edition one of the Hieroglyphs will only cover Sphinx 3, which happens to be the active branch. After its completion, we will extend it to cover the usage of Sphinx 2 and Sphinx 4 as well.

For the users

This manual was created to aid users in dealing with the difficulties of using Sphinx and its related tools. It is reasonable for us to include so much material in the first place: serious knowledge of building an HMM-based speech recognizer could hardly be written in only 10 pages. Making a serious speech recognition system is a very difficult task. Even with advanced tools like Sphinx, it can still require a high level of skill and, perhaps more importantly, a tremendous amount of knowledge as well as patience.

We recommend users without basic knowledge of HMMs to at least go through the background material and find a standard textbook on the topic. At this point, some effort has been made to write an introduction to speech recognition. We hope that it can be completed in the near future.

For users who have basic knowledge of HMMs, we recommend browsing through Chapter 2 and then following the instructions given in Chapter 4. Chapter 2 gives a nice overview of the resources the CMU Sphinx Group provides for building a speech recognition system, whereas Chapter 4 can serve as a reference for building the first system.

For users who already have knowledge of Sphinx, we recommend


you to keep track of new releases of this document and of the release and backward compatibility notes we add in every minor version update. We are also well aware that the document itself may contain mistakes, and we always welcome users' feedback on how to improve it. Just send your opinions to the current maintainers: Arthur Chan, David Huggins-Daines, Evandro Gouvêa, Alan Black and Kevin Lenzo.

Arthur Chan
The Editor (who only cuts and pastes)

Part I

Before You Start


Chapter 1

License and use of CMU Sphinx, SphinxTrain and CMU-Cambridge LM Toolkit

Author: Arthur Chan, Editor: Arthur Chan

• CMU Sphinx 2, CMU Sphinx 3 and SphinxTrain have been developed by researchers and developers at Carnegie Mellon University (CMU).

• CMU Sphinx 4 has been co-developed by researchers and developers at Carnegie Mellon University (CMU), Sun Microsystems Laboratories, Mitsubishi Electric Research Labs (MERL), and Hewlett Packard (HP), with contributions from the University of California at Santa Cruz (UCSC) and the Massachusetts Institute of Technology (MIT).

• CMU-Cambridge LM Toolkit was co-developed by researchers at CMU and Cambridge University.

If you are interested in any CMU Sphinx, your rights regarding the code and binaries in Sphinx 2, Sphinx 3, Sphinx 4, SphinxTrain and the CMU-Cambridge LM Toolkit may be your major concern. Here are some things you may want to know:

1. Copying of Sphinx 2, Sphinx 3, Sphinx 4 and SphinxTrain: The code and binaries of all CMU Sphinx(en) are free for commercial and non-commercial use, with or without modification. For more details, please check the license terms below.

2. Copying of the CMU-Cambridge LM Toolkit: The code of the CMU-Cambridge LM Toolkit is made available for research purposes only. It may be redistributed freely for this purpose.

3. Academic license: We are not using a GPL-style license. Hence, you are not obligated to distribute your source code.

4. Warranty: Although the Sphinx 2, Sphinx 3, Sphinx 4 and SphinxTrain source code is free and the CMU-Cambridge LM Toolkit's source code is free for research purposes, we do not provide any warranty or support at all. However, the maintainers of the Sphinx Project are always willing to discuss with and help developers building applications using any Sphinx(en). As of March 11, 2007, the maintainers are Arthur Chan, David Huggins-Daines, Evandro Gouvêa, Alan Black and Kevin Lenzo; you are always welcome to talk to any one of them.

Sections 1.1, 1.2 and 1.3 contain the full license terms for the software.

1.1 License Agreement of Sphinx 2, Sphinx 3 and SphinxTrain Source Code

Copyright (c) 1995-2004 Carnegie Mellon University. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistribution of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

This work was supported in part by funding from the Defense Advanced Research Projects Agency and the National Science Foundation of the United States of America, and the CMU Sphinx Speech Consortium.

THIS SOFTWARE IS PROVIDED BY CARNEGIE MELLON UNIVERSITY "AS IS" AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL CARNEGIE MELLON UNIVERSITY NOR ITS EMPLOYEES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

1.2 License Agreement of CMU Sphinx 4

Copyright 1999-2004 Carnegie Mellon University. Portions Copyright 2002-2004 Sun Microsystems, Inc. Portions Copyright 2002-2004 Mitsubishi Electric Research Laboratories. All Rights Reserved. Use is subject to license terms.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Original authors' names are not deleted.

4. The authors' names are not used to endorse or promote products derived from this software without specific prior written permission.

This work was supported in part by funding from the Defense Advanced Research Projects Agency and the National Science Foundation of the United States of America, the CMU Sphinx Speech Consortium, and Sun Microsystems, Inc.

CARNEGIE MELLON UNIVERSITY, SUN MICROSYSTEMS, INC., MITSUBISHI ELECTRONIC RESEARCH LABORATORIES AND THE

CONTRIBUTORS TO THIS WORK DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL CARNEGIE MELLON UNIVERSITY, SUN MICROSYSTEMS, INC., MITSUBISHI ELECTRONIC RESEARCH LABORATORIES NOR THE CONTRIBUTORS BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

1.3 License Agreement of the CMU-Cambridge LM Toolkit

Copyright (C) 1996, Carnegie Mellon University, Cambridge University, Ronald Rosenfeld and Philip Clarkson. All rights reserved.

This software is made available for research purposes only. It may be redistributed freely for this purpose, in full or in part, provided that this entire copyright notice is included on any copies of this software and applications and derivations thereof.

This software is provided on an "as is" basis, without warranty of any kind, either expressed or implied, as to any matter including, but not limited to warranty of fitness of purpose, or merchantability, or results obtained from use of this software.

1.4 About The Hieroglyphs

We hope that this document, The Hieroglyphs, itself can be distributed so that its quality can be improved. Hence, the following permissions are granted to the users:

• Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

• Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

• Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by Carnegie Mellon University.

1.5 How to contribute?

The Sphinx Project is developed for all researchers and developers over the world. We chose to use the X11-style license because this allows users higher flexibility in incorporating the code.

To truly achieve this goal, your participation is necessary. Due to the limitation of resources, the Sphinx system's performance still has a certain gap with state-of-the-art systems. Your participation is necessary to refine and perfect this software. We would also like to collect models for different languages and different conditions. This will allow users and developers around the world to develop speech recognition systems easily.

The software is still under heavy development. To know which parts of the software require change, you can visit the CMU Sphinx web page, http://cmusphinx.org, to learn about the open projects within the Sphinx group. Please contact Arthur Chan, David Huggins-Daines, Evandro Gouvêa, Alan Black and Kevin Lenzo if you are interested in contributing to the Sphinx speech recognition system.

1.6 Version History of Sphinx 2

1.6.1 Sphinx 2.5 Release Notes

This release (Sphinx 2-0.5) includes support for a language model defined as a Finite State Grammar (FSG), as well as bug fixes.

The FSG implementation supports:

• an application can choose at run time which FSG to use, from a list provided by the user (also supported with the statistical LM).

• the FSG can be passed to the decoder as a data structure instead of as a file name.

• FSG automatically inserts multiple pronunciations and noise models, if provided.

Among the fixes, the executables and libraries created with MS Visual Studio are placed in a more sensible location. It should work in several flavors of Unix (including Linux and Solaris), and Windows NT or later.

1.6.2 Sphinx 2.4 Release Notes

CMU Sphinx is a speech recognition system. This version includes a FreeBSD-style license (2 conditions) and numerous fixes. It should work for Linux, Solaris, various Unixes, and Windows under both Visual Studio and Cygwin.

1.7 Version History of Sphinx 3

1.7.1 A note on current development of Sphinx 3.7

Sphinx 3.7 requires a common library set called SphinxBase in order to be compiled. Please consult the Sphinx web site for instructions on compilation.

1.7.2 Sphinx 3.6.3 Release Notes

Sphinx is a speaker-independent large vocabulary continuous speech recognizer released under a Berkeley-style license. It is also a collection of open source tools and resources that allows researchers and developers to build speech recognition systems.

You can download release 3.6.3 from here: https://sourceforge.net/project/showfiles.php?group_id=1904

Sphinx 3.6.3 (corresponds to SourceForge SVN rev 5957) is a bug fix release.

(rev 5906) The hypseg generated by sphinx3_decode (mode 4) was ill-formed. This used to corrupt sphinx3_conf in confidence generation. This is now fixed.

(rev 5912) The frame length is now checked to make sure it is smaller than the number of FFT points. The number of FFT points is also checked to see if it is a power of 2.

For Developers:

(rev 5915) sphinx3_conf is now tested. The PASS count in make check is now 98.

(rev 5955) wave2feat is not installed in sphinx3. Users and developers are recommended to use the SphinxTrain version instead.

1.7.3 Sphinx 3.6.2 Release Notes

Sphinx is a speaker-independent large vocabulary continuous speech recognizer released under a Berkeley-style license. It is also a collection of open source tools and resources that allows researchers and developers to build speech recognition systems.

You can download release 3.6.2 from here: https://sourceforge.net/project/showfiles.php?group_id=1904

Sphinx 3.6.2 (corresponds to SourceForge SVN rev 5842) fixed another issue related to the 32-bit LM; it now allows the use of an LM with tg_seg_size of more than 0xffff. It also included a completed but untested version of full-covariance modeling.

Several bug fixes:

(rev 5814) When decode, decode_anytopo or lm_convert were used to read in an LM with fewer than 0xffff unigram words, the code would assume the LM is in 16-bit mode. Now lm_convert will back off to use 32-bit mode to read it in. Users, except those who happen to read this message, will have no idea this happens.

(rev 5794) Fixed a compile warning which also happened to be a bug. A

16-bit LM word ID was used in the 32-bit LM mode. This will cause problems when users use word IDs outside the range of 0xffff.

Developing Features:

(rev 5757:5761, 5763, 5771, 5774:5776) Incorporated full covariance matrices in acoustic modeling.

(rev 5766) Fixed front-end issues; now Sphinx 3.6 can accept features that were accepted in Sphinx 3.0 (or archive s3).

For Developers:

(rev 5785, 5786, 5791 and 5792) sphinx3 is now indented using the scheme originally designed for PocketSphinx and SphinxBase.

(rev 5819) Several tests were added to support the fixes. The PASS count in make check is now 97.

1.7.4 Sphinx 3.6.1 Release Notes

Sphinx 3.6.1 (corresponds to SourceForge SVN rev 5753). Sphinx 3.6.1 (from rev 5660) enables the 32-bit LM. It has been tested under a real-life scenario and shown to provide better results. Also, the LTS code had a bug, so it was not truly enabled in the official Sphinx 3.6. Several bug fixes:

• (rev 5621) Random crash with multiple LMs.
• (rev 5639) Don't allow ld_end_utt() to be called if ld_begin_utt() hasn't been called (this leads to crashing and badness).
• (rev 5663) Fixed FST output behavior.
• (rev 5711) Fixed underflow problem of ug if the backoff weight is too small (<< 0). LTS is now truly enabled.
• (rev 5730) Fixed Visual Studio 8.0 compilation.
• (rev 5732) Fixed astar memory problem, tested in a 64000-word scenario.

Developing features:

• (rev 5738) Full covariance matrices.
• (rev 5744) Enable fast GMM computation for align.

1.7.5 Sphinx 3.6 Release Notes

The Sphinx 3.6 official release includes all changes found in Sphinx 3.6 RC I. From 3.6 RC I to 3.6 official:

New Features:

• Added support for Sphinx 2-style semi-continuous HMMs in Sphinx 3.
• Added sphinx3_continuous, which performs on-line decoding on both Windows and Linux platforms.
• Synchronized the front end with Sphinx 2, adding an implementation of VTLN (i.e. -warp_type = inverse_linear, piecewise_linear, affine).

Changes:

• The prefix "sphinx3_" has been added to the programs align, allphone, astar, dag, decode, decode_anytopo and ep to avoid confusion on some Unix systems.

Bug Fixes:

• 1459402: Serious memory relocation issue is fixed.
• In RC I, -dither was not properly implemented; this has been fixed.

Known Problem:

• When the model contains NaN, there will be abnormal output of the result. At this point, this issue is resolved in SphinxTrain.

Sphinx 3.6 Release Candidate I

The corresponding SphinxTrain tag is SPHINX3_6_CMU_INTERNAL_RELEASE. One can check out the matching SphinxTrain of the Sphinx 3.6 release with the command:

svn co https://svn.sourceforge.net/svnroot/cmusphinx/tags/SPHINX3_6_CMU_INTERNAL_RELEASE

A Summary of Sphinx 3.6 RC I

Sphinx 3.6 is a gently refactored version of Sphinx 3.5. Our programming is defensive, and we only aim at further consolidation and unification of our code bases in Sphinx 3. Even though our programming is defensive, there are still several interesting new features to be found in this release. Their details can be found in the "New Features" section below. Here is a brief summary:

1. Further speed-up of CIGMMS in the 4-level GMM Computation Scheme (4LGC).
2. Multiple regression classes and MAP adaptation in SphinxTrain.
3. Better support for using LMs in Sphinx 3.X.
4. FSG search is now supported. This is adapted from Sphinx 2.
5. Support of full triphone search in flat lexicon search.
6. Some support of different character sets in Sphinx 3.X. Models in multiple languages are now tested in Sphinx 3.X.

We hope you enjoy this release candidate. In the future, we will continue to improve the quality of CMU Sphinx and CMU Sphinx's related software.

New Features

- Speaker Adaptation: Multiple regression classes (phoneme-based) are now supported.

- GMM Computation

1. Improvements of CIGMMS are now incorporated.
   (a) One can specify the upper limit of the number of CD senones to be computed in each frame by specifying -maxcdsenpf.
   (b) The best Gaussian index (BGI) is now stored and can be used as a mechanism to speed up GMM computation.
   (c) A tightening factor (-tighten_factor) is introduced to smooth between the fixed naive down-sampling technique and CI-GMMS.
2. Support of SCHMM and FCHMM.
   (a) decode will fully support computation of SCHMM.

-Language Model

1. Reading an LM in ARPA text format is now supported. Users now have an option to bypass the use of lm3g2dmp.
2. The live decoding API now supports switching of language models.
3. Full support of class-based LMs. See also the Bug fixes section.
4. lm_convert is introduced. lm_convert supersedes the functionality of lm3g2dmp: not only can lm_convert convert an LM from TXT format to DMP format, it can also do the reverse.

- Search

This part details the changes we made in the different search algorithms. In 3.6, the collection of algorithms can all be used under a single executable, decode. decode_anytopo is still kept for backward compatibility. decode now supports three modes of search.

1. Mode 2 (FSG): (Adapted from Sphinx 2) FSG search.

2. Mode 3 (FLAT): Flat-lexicon search. (The original search in decode_anytopo in 3.X (X < 6).)

3. Mode 4 (TREE): Tree-lexicon search. (The original search in decode in 3.X (X < 6).)

Some of these functionalities will only be applicable in one particular search. We will mark them with FSG, FLAT and TREE.

1. One could now turn off -bt_wsil to control whether silence should be used as the ending word. (FLAT, TREE)

2. In FLAT, full triphones could be used instead of multiplexed triphones.

3. FSG is a newly added routine in 3.6 which is adapted from Sphinx 2.5.

-Frontend

1. -dither is now supported in livepretend and livedecode; the initial seed can always be set with the option -seed. (Jerry Wolf will be very happy about this feature.)

-Miscellaneous

1. One can turn on the built-in letter-to-sound rules in dict.c by using -lts_mismatch.

2. The current Sphinx 3.6 is tested to work on setups for English, Mandarin Chinese and French.

3. Changes in allphone: allphone can now generate a match and a matchseg file just like the decode* recognizers.

Bug fixes

1. Miscellaneous memory leak fixed in the tree search (mode 4)

2. The class-based LM initialization routine used to switch the order of the word insertion penalty and the language model weight. This is now fixed.

3. An assertion generated in vithist.c is now turned into an error message instead of stopping the whole program; decoding will simply fail for that sentence. We suspect that this is the problem which caused possible memory wipe-outs in Sphinx 3.4 & 3.5.

4. The number of CI phones can now be at most 32767 (instead of 127).

5. 1236322: str2words special character bugs.

Behavior Changes

1. The endpointer (ep) now uses the s3 log computation.

2. Multi-stream GMM computation will not truncate the pdf to 8 bits anymore. This will avoid confusion for programmers. However

3. Except in allphone and align, when .cont. is used in -senmgau, the code will automatically switch to the fast GMM computation routine. To make sure the multi-stream GMM computation is in effect, one needs to specify .s3cont.

4. executable dag hadn’t accounted for the language weight. Now this issue is fixed.

5. (See Bug fixes also.) decode will now return an error message when vithist is fed a history of -1. Instead of asserting, the recognizer will dump a warning message. Usually that means the beam widths need to be increased.

Functions still under test

1. Encoding conversion in lm_convert.

2. LIUM contribution: an LM can now be represented in AT&T FSM format.

Known bugs

1. In confidence estimation, the computations of the forward and backward posterior probabilities have a mismatch.

2. In allphone, the scores generated in the matchseg file are sometimes very low.

3. Regression tests on the second-stage search still have bugs.

Corresponding changes in SphinxTrain

Please note that SphinxTrain is distributed as a separate package and you can get it by:

svn co https://svn.sourceforge.net/svnroot/cmusphinx/tags/SPHINX3_6

1. Support for generation of MAP and multiple-class MLLR adaptation.

2. Support for BBI tree generation

1.7.6 Sphinx 3.5 Release Notes

New features of this release:

• Live-mode APIs are stable and are now officially released.

• A Windows version of the live-mode decoder based on the new APIs is released.

• A live-mode decoder simulator (livepretend) based on the new APIs is still under development. We expect it to be stable in Sphinx 3.6.

• Mean transformation-based adaptation is now supported.

• The feature extraction library of sphinx3 is now EXACTLY the same as SphinxTrain

• Four of the s3.0 (flat decoder) tools are now incorporated in Sphinx 3.X:

– align - a time aligner
– astar - an N-best generator
– allphone - a phoneme recognizer
– dag - a best-path finder for a lattice

SphinxTrain has been tagged to match release 0.5 of Sphinx 3. One can retrieve the matching SphinxTrain by the command:

$ cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cmusphinx co -r SPHINX3_5_CMU_INTERNAL_RELEASE SphinxTrain

New features of this release:

• The feature extraction libraries of SphinxTrain are now EXACTLY the same as sphinx3.

• New tools introduced:

– mllr_solve: given the EM posterior probabilities of the means and a regression class definition, this routine estimates regression matrices for each class.
– mllr_transform: given a transformation matrix, this tool applies the transformation to the mean vectors of the models.

• The command line interface for all SphinxTrain tools is unified.

1.7.7 Sphinx 3.4 Release Notes

1. Sphinx 3.4 has a rewritten Gaussian computation routine. The following techniques are implemented:

• Naive and model-based down-sampling such that only some of the frames are computed.
• Context-independent phone-based Gaussian mixture model selection.
• Two Gaussian selection algorithms, including vector quantizer-based Gaussian selection and sub-vector quantizer-based Gaussian selection.
• Sub-vector quantization.

2. Phoneme look-ahead based on three different types of heuristics is implemented.

3. The Sphinx 3.3 front-end problem is fixed. The current front end supports the original s3_1x39 feature and also the standard 1s_c_d_dd features.

4. Live mode recognizer’s core dump problem is fixed.

5. sphinx-test and sphinx-simple should work as they are.

6. The Visual C++ .dsw file is moved to the uppermost level of the code; a project is also built for the program decode on Windows.

7. A README and several compilation documents are now included in the distribution.

1.8 Version History of Sphinx 4

Sphinx 4 1.0 beta

In this release, we have provided the following new features and improvements over the 0.1 alpha release:

• Confidence scoring

• Dynamic grammar support

• JSGF limitations removed

• Improved performance for large, perplex JSGF grammars

• Filler support for JSGF Grammars

• Out-of-grammar utterance rejection

• Narrow bandwidth acoustic model

• WSJ5K Language model

• More demonstration programs

• Better control over microphone selection

• Lots of bug fixes

Sphinx 4 is a state-of-the-art, speaker-independent, continuous speech recognition system written entirely in the Java programming language. It was created via a joint collaboration between the Sphinx group at Carnegie Mellon University, Sun Microsystems Laboratories, Mitsubishi Electric Research Labs (MERL), and Hewlett Packard (HP), with contributions from the University of California at Santa Cruz (UCSC) and the Massachusetts Institute of Technology (MIT).

The design of Sphinx 4 is based on patterns that have emerged from the design of past systems as well as new requirements based on areas that researchers currently want to explore. To exercise this framework, and to provide researchers with a "research-ready" system, Sphinx 4 also includes several implementations of both simple and state-of-the-art techniques. The framework and the implementations are all freely available via open source under a very generous BSD-style license.

With the 1.0 beta release, you get the complete Sphinx 4 source tree along with several acoustic and language models capable of handling a

variety of tasks ranging from simple digit recognition to large vocabulary n-Gram recognition.

Because it is written entirely in the Java programming language, Sphinx 4 can run on a variety of platforms without requiring any special compilation or changes. We've tested Sphinx 4 on the following platforms with success: the Solaris 9 operating system on the SPARC platform, Mac OS X 10.3.5, RedHat 9.0, Fedora Core 1, Microsoft Windows XP, and Microsoft Windows 2000.

Please give Sphinx 4 1.0 beta a try and post your questions, comments, and feedback to one of the CMU Sphinx Forums: https://sourceforge.net/forum/?group_id=1904 We can also be reached at [email protected].

Sphinx 4 0.1 alpha

Sphinx 4 is a state-of-the-art, speaker-independent, continuous speech recognition system written entirely in the Java programming language. It was created via a joint collaboration between the Sphinx group at Carnegie Mellon University, Sun Microsystems Laboratories, Mitsubishi Electric Research Labs (MERL), and Hewlett Packard (HP), with contributions from the University of California at Santa Cruz (UCSC) and the Massachusetts Institute of Technology (MIT).

The design of Sphinx 4 is based on patterns that have emerged from the design of past systems as well as new requirements based on areas that researchers currently want to explore. To exercise this framework, and to provide researchers with a "research-ready" system, Sphinx 4 also includes several implementations of both simple and state-of-the-art techniques. The framework and the implementations are all freely available via open source under a very generous BSD-style license.

With the Alpha 0.1 release, you get the complete Sphinx 4 source tree along with several acoustic and language models capable of handling a variety of tasks ranging from simple digit recognition to large vocabulary n-Gram recognition.

Because it is written entirely in the Java programming language, Sphinx 4 can run on a variety of platforms without requiring any special compilation or changes. We've tested Sphinx 4 on the following platforms with success: the Solaris 9 operating system on the SPARC platform, Mac OS X 10.3.3, RedHat 9.0, Fedora Core 1, Microsoft Windows XP, and Microsoft Windows 2000.

Please give Sphinx 4 0.1 alpha a try and post your questions, comments, and feedback to one of the CMU Sphinx Forums: https://sourceforge.net/forum/?group_id=1904 We can also be reached at [email protected].

1.9 Version History of PocketSphinx

1.9.1 PocketSphinx 0.2

This is the first real release of PocketSphinx. PocketSphinx is a derivative of Sphinx-II designed for embedded and handheld systems. See the release notes for more information. This release corresponds to revision 5967 in Subversion.

1.9.2 PocketSphinx 0.2.1

This is a bug-fix release to make fixed-point computation work, which was inadvertently broken in the 0.2 release.

1.9.3 PocketSphinx 0.2.1

New versions of PocketSphinx and SphinxBase are now available. This version of SphinxBase will be used as the base for the next version of Sphinx3. It includes many updates to the utility library, the feature extraction code, and the dynamic feature computation code.

This version of PocketSphinx is quite different from the last one. It is considerably smaller due to the elimination of large amounts of redundant and legacy code. It is also up to 20% faster in some cases. The command-line arguments have been synchronized with Sphinx3, and support for Sphinx2-format acoustic models has been removed.

Chapter 2

Introduction

Authors: Arthur Chan, Rita Singh; Editor: Arthur Chan

In this chapter, we describe the history of Sphinx as a speech recognition tool and give an overview of the different software packages of Sphinx.

2.1 Why do I want to use CMU Sphinx?

If you are a developer, the major reason to use Sphinx is that Sphinx is currently the most complete archive of speech recognition tools. Sphinx is also released under a liberal license. You may be aware that other very powerful toolkits are also open-sourced. Unfortunately, some of their licenses disallow you from using their code in your project, or from using the code and distributing it later with your project. Sphinx allows you to have full control of your source code as well as Sphinx's code. You are free to use, modify and distribute your code with Sphinx's code. Put another way, Sphinx is not using the GPL; that means you do not have to make your code free, and you can keep it closed source. We believe this is the kind of flexibility developers really want.

Another aspect of the Sphinx Project is that, based on Sphinx and its related resources, and with enough knowledge¹, you can build a lot of different types of applications. By speech applications, I mean things like dictation or automatic speech servers. More importantly, the most common

¹ The knowledge needed to use a speech recognizer properly and produce high-performance results can be as difficult to acquire as that needed to modify the gcc compiler or the Linux kernel source code.

denominator of all these applications, i.e. the recognizer, is open-sourced in CMU Sphinx. You will also get a lot of bundled resources, such as a dictionary and a phone list, from CMU's speech archive. This is very important if you want to have full control of your speech recognition system.

If you are a researcher and your interest is in HMM-based speech recognition, there are several reasons why you may want to use Sphinx 3 and Sphinx 4 as your research tools.

First of all, Sphinx(en) assume that GMM computation and the search process are separate. The important consequence is that you can carry out two different types of research without them conflicting with each other. For example, you can try to devise a new observation probability without touching the source code of the search routine. At the same time, you can also build new search algorithms without thinking about the GMM computation. This gives you an advantage in speech recognition research.

The second reason is in training. Most of the time when you are doing modeling research, what you would like to change is the estimation algorithm. SphinxTrain's Baum-Welch algorithm solves this problem in two stages. It first dumps the posterior probability counts to a separate file which can easily be read by the libraries of SphinxTrain. You can just play with these counts, and there is no need to devise your own Baum-Welch trainer. This advantage can greatly shorten your research time from weeks to days.

The third reason is code clarity. Later recognizers, such as Sphinx 3 and Sphinx 4, have made readability of the code one of their most important design goals. Many researchers not only want to use the recognizer as a tool; they also want to change the code to suit their purposes. Sphinx 3, for example, has only 4 layers of code from main() to the GMM computation. This is a good advantage for researchers who like to make changes in a speech recognizer.
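To make the two-stage idea described above concrete, here is a minimal sketch in C of count-based re-estimation of a Gaussian mean. It is purely illustrative: the type and function names are hypothetical and do not reflect SphinxTrain's actual count-file format or APIs. It only shows the separation between accumulating posterior-weighted statistics (what a Baum-Welch pass produces) and turning those counts into parameters (what a separate estimation step does).

#include <stdio.h>

#define DIM 3 /* kept tiny for the demo; a real system might use 39-dimensional features */

/* Accumulators for one Gaussian: total posterior mass and the
 * posterior-weighted sum of the observations. Illustrative only. */
typedef struct {
    double occ;           /* sum over frames of gamma_t(k)       */
    double obs_sum[DIM];  /* sum over frames of gamma_t(k) * x_t */
} gauss_acc_t;

/* Stage 1: add one frame's contribution, where gamma is the
 * posterior of this Gaussian for the feature vector x. */
static void accumulate(gauss_acc_t *acc, double gamma, const double x[DIM])
{
    acc->occ += gamma;
    for (int d = 0; d < DIM; d++)
        acc->obs_sum[d] += gamma * x[d];
}

/* Stage 2: compute the re-estimated mean from the counts,
 * mean[d] = sum_t gamma_t(k) x_t[d] / sum_t gamma_t(k). */
static void reestimate_mean(const gauss_acc_t *acc, double mean[DIM])
{
    for (int d = 0; d < DIM; d++)
        mean[d] = (acc->occ > 0.0) ? acc->obs_sum[d] / acc->occ : 0.0;
}

int main(void)
{
    gauss_acc_t acc = {0.0, {0.0}};
    double frame1[DIM] = {1.0, 2.0, 3.0};
    double frame2[DIM] = {3.0, 2.0, 1.0};
    double mean[DIM];

    accumulate(&acc, 0.8, frame1);   /* frame 1, posterior 0.8 */
    accumulate(&acc, 0.2, frame2);   /* frame 2, posterior 0.2 */
    reestimate_mean(&acc, mean);

    printf("re-estimated mean: %.2f %.2f %.2f\n", mean[0], mean[1], mean[2]);
    return 0;
}

Because the counts live on disk between the two stages, a researcher can replace only the estimation step while reusing the accumulation pass unchanged, which is exactly the flexibility the text above describes.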

2.2 The Hieroglyphs

Author: Arthur Chan, Editor: Arthur Chan
The Hieroglyphs is this manual. As part of a continuous effort to improve every aspect of Sphinx, one of the goals of the team is to create high quality documentation for expert as well as general users. In 2004, this manual was created by Arthur Chan to

fulfill this goal. The project name "Hieroglyphs" symbolizes two aspects of documenting a speech recognizer. First, using a speech recognizer and building a successful speech recognition system requires a large amount of knowledge; the depth of knowledge required of a researcher can be compared with learning how a type of character is encoded. Second, building a speech recognizer is sometimes very hard for the developer or the researcher; it can require effort similar to learning a type of hieroglyphs. The first draft of the Hieroglyphs aims to provide a comprehensive guide on how to use Sphinx 3, SphinxTrain and the CMU LM Toolkit to build a speech recognition system. Future plans include coverage of Sphinx 2 and Sphinx 4. As of June 2005, 70% of the first draft of the Hieroglyphs is complete and can currently be found at http://www-2.cs.cmu.edu/∼archan/sphinxDoc.html

2.3 The Sphinx Developers

The Sphinx Developers are the people who maintain and continue to develop aspects of the Sphinx Project. A large portion of them are from academic institutions such as CMU. A significant number are also from companies such as Sun, Compaq and Mitsubishi. Development of CMU Sphinx is not exclusive to the Sphinx Developers: users are welcome to contribute patches to Sphinx, and patches are continuously incorporated to enrich and strengthen the current code base. If you are interested in developing Sphinx, you can contact Arthur Chan, David Huggins-Daines, Evandro Gouvêa, Alan Black and Kevin Lenzo.

2.4 Which CMU Sphinx should I use?

It depends on what kind of application you are trying to build. Here are a couple of considerations to think about.

• FSG and natural language In short, if you expect to build a system with highly structured dialog, an FSG system is probably suitable

for you. Currently, your choice will be either Sphinx 2 or Sphinx 4. If you want to build a system with more natural input and fewer constraints, then a speech recognizer plus a robust parser will be more suitable. What you do is use the speech recognizer to generate an output string, then use a parser to parse this possibly errorful output; the latter can be handled well by a parser such as Phoenix. • Programming Languages Sphinx 2 and Sphinx 3 are in C, which is quite portable in general. Sphinx 4 is in Java. You need to decide whether you and your team can handle either of these two languages. Many people are religious about a single programming language, and the different versions give you a lot of flexibility to choose. • Accuracy and Speed We have not carried out very extensive benchmarking of the different Sphinx(en), so the following can only be regarded as a rough estimate based on our measurements. The accuracy numbers can be roughly summarized as

Sphinx 4 >= Sphinx 3 >> Sphinx 2

and in terms of speed,

Sphinx 4 >= Sphinx 3 >= Sphinx 2

That means Sphinx 2 is still the fastest recognizer in our inventory; however, its accuracy is also the lowest. Sphinx 3 contains our best C recognizer, which can be configured to be both fast and quite accurate. Sphinx 4 is possibly the most all-round recognizer we have and, contrary to popular belief, it can actually be pretty fast; this is partially because Java development has reached a stage where it can handle computationally heavy loads. • Interfaces Sphinx has several versions, and depending on your goal, your choice of recognizer will differ. In general, Sphinx 4 provides the best interface of all: it can be used by specifying a config.xml file and a command line, which makes web developers' tasks much easier in general. Using Sphinx 2 and Sphinx 3 requires much more skill in scripting and a general understanding of the programs. At the current stage,

they are still regarded as systems for expert users, though the Sphinx team continues to improve the interfaces and make them easier to use for more users.

• Platforms Sphinx 2, Sphinx 3 and Sphinx 4 are all highly platform-independent. However, the use of the Java language in Sphinx 4 potentially allows an even higher degree of platform independence. A common question is whether any version of the decoders can be used on an embedded platform. David Huggins-Daines created an embedded version of Sphinx 2 (based on the Sharp Zaurus), so Sphinx 2 is perhaps the recognizer of choice for embedded platforms.

• Research We would recommend any of the three recognizers, because all of them have a clean design and all of them support continuous HMMs, which are currently the de facto standard. For individual research areas, we have the following recommendations. For acoustic modeling and fast GMM computation, Sphinx 3 is actually a pretty good research platform for speech recognition: by itself it supports SCHMM, CHMM and sub-vector quantized HMMs, which represent three different kinds of modeling technique for speech recognition. For research in search, Sphinx 4 is a good candidate, because you can avoid annoying memory management and focus on implementing the search algorithm. For speaker adaptation or general acoustic model research, you will find that you need to change SphinxTrain directly; its staged estimation process will be very convenient for you to use. A lot of research requires a very fast implementation of a speech recognizer; we believe Sphinx 2 could be the choice there. It is unfortunate that Sphinx's current architecture makes feature extraction research pretty difficult to carry out. In the future, we will try to change that situation.

• Documentation, support and community As of 2005, all three Sphinx(en) (II, III and IV) are actively maintained by the current maintainers of CMU Sphinx. Most of the developers have years of experience in large scale code development and industrial

experience. They usually provide a very fast turnaround for most problems. Documentation for both Sphinx 3 and Sphinx 4 is probably more comprehensive than that of Sphinx 2; both provide Javadoc-style code documentation as well as very detailed documentation for the software itself. As for the community, one could imagine that Sphinx 4 attracts more young, energetic researchers, whereas many users of Sphinx 2 and Sphinx 3 turn out to be experienced experts. As each piece of software becomes more balanced, more users find that all three recognizers are a pretty good choice in general.

2.5 Other Open Source Projects in CMU: Hephaestus

Author: Arthur Chan, Editor: Arthur Chan
CMU's Project Hephaestus is a project that collects open source toolkits for speech and language processing. CMU Sphinx is one of the key tools in this project. A lot of effort is also put into toolkits for speech synthesis, dialogue systems and general speech resources. In this section, you will learn about some of these efforts. They are important because a speech recognizer alone cannot build an interesting application such as a dialog system. Running a recognizer and a trainer also requires many related resources, such as a dictionary and training data. Therefore, it is very important to have a basic understanding of each of these components.

2.5.1 Festival

Festival is a free, open source text-to-speech synthesizer developed at the University of Edinburgh. It is now mainly maintained by Prof. Alan Black at CMU. In the simplest case, you can feed in a string and Festival will create the waveform and play it through your speaker. Festival's functionality allows users to modify the speech synthesis at multiple levels. It is also extremely flexible and has a very good programming interface. Festival has very good documentation online at

• Festival’s main page www.cstr.ed.ac.uk/projects/festival/

• FestVox www.festival.org

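As a small illustration of the "simplest case" mentioned above, and assuming the festival binary is installed and on your path, you can synthesize a sentence directly from the shell:

$ echo "Welcome to CMU Sphinx." | festival --tts

Alternatively, at Festival's interactive Scheme prompt you can type (SayText "Welcome to CMU Sphinx."). This is only a sketch of basic usage; see the Festival documentation above for the full programming interface.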
2.5.2 CMU Communicator Toolkit

CMU Communicator is a project that allows users to get flight information using speech. It is led by Prof. Alex Rudnicky at CMU. The whole system is available as a toolkit so the public can use it freely. CMU Communicator contains several open source components, including CMU Sphinx (the recognizer), Phoenix (the robust parser), Helios (a post-processor that generates confidence scores), Rosetta (the language generation routines) and Festival (the speech synthesizer). It is therefore a very interesting demonstration of how a speech recognition system can be built in general. Another important aspect is that the CMU Communicator data is free to use, and its collection process allows you to use the data as freely as you like; CMU will only charge you the distribution cost of the acquisition process. For more detail about Communicator, you can go to http://www.speech.cs.cmu.edu/Communicator/ For information about the corpus, see http://www.speech.cs.cmu.edu/Communicator/Corpus/

2.5.3 CMUdict, CMUlex

CMUdict is the dictionary collected and built at CMU over its 15 years of research. CMUdict is short for the Carnegie Mellon University Pronouncing Dictionary. You can download it at http://www.speech.cs.cmu.edu/cgi-bin/cmudict

2.5.4 Speech Databases

CMU provides a few databases for research purposes, for example the AN4 and Communicator databases. Please contact Arthur Chan, David Huggins-Daines, Evandro Gouvêa, Alan Black and Kevin Lenzo for further details.

2.5.5 OpenVXML and OpenSALT browser

Voice XML (VXML) and Speech Application Language Tags (SALT) are two markup languages that integrate speech services into web pages. These browsers are very important toolkits for building speech applications for people who are used to web development. You can find more information at http://www.speech.cs.cmu.edu/openvxi/index.html and http://hap.speech.cs.cmu.edu/salt/

2.5.6 Miscellaneous

CMU’s speech group’s web page contains a lot of useful information for using, building and researching speech recognition systems. You can find its web site at http://www.speech.cs.cmu.edu/

2.6 How do I get more help?

CMU Sphinx is currently maintained by CMU and by developers/researchers from institutions such as Sun Microsystems, Hewlett-Packard, Mitsubishi and MIT. Feel free to ask any questions in the discussion forum at http://sourceforge.net/forum/?group_id=1904

Part II

Tutorial


Chapter 3

Software Installation

Author: Arthur Chan, Editor: Arthur Chan
This chapter describes the general installation procedure for the software toolkits that you will need to build a speech recognition system. We will assume Unix is the environment we use; if you are a Windows user, we recommend you use Cygwin. There are a couple of components you need to download before we start. Make sure you have the following standard software that can be found in Unix. If you installed Linux, or Cygwin on a Windows machine, these tools usually come standard and are not difficult to get.

• tar and gzip Software to decompress archives. If you are a Windows user, WinZip, WinRAR and 7-Zip will work for you.

• gcc The C compiler. Sphinx 3 supports both gcc 3.2.2 and 2.95; to find out which version you have, type gcc -v

• make A utility that automatically determines which parts of the software need to be compiled, given a makefile.

3.1 Sphinx 3 .X

The first piece of software we will get is Sphinx 3.X, which you can download at http://sourceforge.net/projects/cmusphinx/

Go to the "Latest File Releases" section and click sphinx3, then choose the topmost entry, labelled sphinx3-0.5.tgz. 1 We will go through the standard Linux toolkit compilation process here (i.e. configure -> make -> make install).

1. Configure the makefile $ configure

2. Compile the program $ make

3. Install the program $ make install

By default the programs will be installed in /usr/local/bin/. If you want to change the installation path, you can type $ configure --prefix=<your path> in the configuration step.

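Putting the three steps together, a typical installation session looks roughly like the following. The tarball name matches the release mentioned above, but the unpacked directory name and the prefix path are only examples; adjust them to your machine:

$ tar -xzf sphinx3-0.5.tgz
$ cd sphinx3-0.5
$ ./configure --prefix=/home/me/sphinx
$ make
$ make install

After this, the executables appear under the prefix you chose (here /home/me/sphinx/bin) instead of /usr/local/bin/.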
3.2 SphinxTrain

If you want to get a copy of SphinxTrain, you need to use CVS to get it. Again, you can use the configure -> make -> make install process above to compile the code.

1. Check out the source code You can get the latest source code from Sourceforge by typing $ cvs -d:pserver:[email protected]:/cvsroot/cmusphinx co SphinxTrain When you are asked for a password, just press Enter. The checkout process will then start, and you will see something like this on the screen: cvs checkout: Updating SphinxTrain

1 For reasons caused by early interactions between the Sphinx developers and Sourceforge, sphinx3-0.X.tgz ended up meaning Sphinx 3.X. It became a legacy naming problem that later maintainers simply followed.

U SphinxTrain/COPYING
U SphinxTrain/Makefile
U SphinxTrain/README.tracing
U SphinxTrain/README.txt
U SphinxTrain/ReleaseNotes
U SphinxTrain/SphinxTrain.dsw
U SphinxTrain/config.guess
U SphinxTrain/config.sub
U SphinxTrain/configure
U SphinxTrain/configure.in
U SphinxTrain/install-sh
U SphinxTrain/missing
U SphinxTrain/mkinstalldirs
cvs checkout: Updating SphinxTrain/bin

2. Configure the makefile $ configure

3. Compile the program $ make

3.3 CMU LM Toolkit

You can get the Toolkit at http://svr-www.eng.cam.ac.uk/∼prc14/toolkit.html

1. Configure the makefile > configure

2. Compile the program > make

3.4 Other tools related to the recognizer

There are two extra tools which are not packaged in the above distributions. One is called cepview, which allows one to inspect MFCC files. The other is called lm3g2dmp, which can transform a standard text-based LM into the specific DMP file format accepted by Sphinx 2 and Sphinx 3. Both of these can be found in the share directory on Sourceforge. You can get the latest source code from Sourceforge by typing $ cvs -d:pserver:[email protected]:/cvsroot/cmusphinx co share

3.5 FAQ of the installation process

3.5.1 What if I am a user of Microsoft Windows?

Our recommendation is that you consider using Cygwin. What if I only know how to use Visual C++? Does that mean I won't be able to use Sphinx? Don't worry. Sphinx's developers have created a setup for Microsoft Visual C++. From our testing, it works for both VC 6.0 and VC .NET 2003. However, with Microsoft's constant changes to its programming environments, it is very unlikely we can follow every change they make, so if you have a newer compiler, you may need to create the setup yourself. For Sphinx 3.X, you just need to click program.dsw and do a Build; then you are done. The files can be found in ./bin/{Debug,Release}. For SphinxTrain, you also just need to click program.dsw and do a build. The files can be found in ./bin/{Debug,Release}. Unfortunately, there is no VC setup for the CMU LM Toolkit. Maybe we will try to fix this later.

3.5.2 How about other Windows compilers?

The software itself doesn’t impose any restrictions on the compilers. The problems where most people are facing usually caused by lack of knowl- edge of the compiler itself.

The basic principle of compilation is simple. First create libraries from libaudio/, libs3decoder/ and libutil/, then compile the files in the program directory and link them against those libraries. The VC setup is basically built by the above procedure. If you work out a setup to compile Sphinx, please kindly contact Arthur Chan, David Huggins-Daines, Evandro Gouvêa, Alan Black and Kevin Lenzo.

3.5.3 How about other Unix platforms?

Sphinx 3 and SphinxTrain are written in ANSI C and are fairly portable to other Unix platforms, though not much testing has been done for platforms other than Windows, Linux and Mac OS X. Still, we continuously try to make sure the code is as portable as possible.

3.5.4 What if I can’t build Sphinx 3 .X or SphinxTrain ?

Please report the problem to the Sphinx Developers.

3.5.5 What if I am not a fan of C?

In that case, try Sphinx 4; it is a better, cleaner system written in Java. Documenting Sphinx 4 is not the purpose of this manual; you can check out its installation procedure at http://cmusphinx.sourceforge.net/sphinx4/ As we mentioned in the Preface, the current manual focuses only on the usage of Sphinx 3. Of course, don't be disappointed. :-) There will always be new additions.

3.5.6 How do I get help for installation?

Try the Sphinx Forum on Sourceforge; there are multiple experts there to help you. The URL is http://sourceforge.net/forum/?group_id=1904

Chapter 4

Building speech recognition system

Author: Arthur Chan, Editor: Arthur Chan
Sphinx 3 and SphinxTrain are good candidates for an application programmer to build a speech recognition system. By itself, the package only provides a bare-bones demonstration system (livedecode) for speech recognition, which is far from what most users need. 1 This chapter provides a step by step guide to building a speech recognition system. There are two approaches to building one. The first is to create a system using the decoder of the Sphinx Project and some of its open source models; this mode of development is the easiest. The knowledge required for this first approach can be divided into two parts. One part is understanding the particularities of using speech as the input to a program; for basic knowledge of HMM-based speech recognition, please consult Chapter 3 for further detail. The other part is interfacing the speech recognizer with other programs. Neither kind of knowledge is trivial. The first 5 steps will guide you through the procedure. The second, and preferred, way to create a system is to build a full system, including the acoustic model, yourself. There are several reasons for doing so. One of them is that, although the models you can

1 Unfortunately, many users have the misunderstanding that livedecode is a full system.

find in the open source model archives are fairly good, they may not be the best for your situation. For example, if one would like to build a simple digit recognition system, models trained on many hours of general data may have Gaussian distributions which are too "flat" and do not match the special needs of digit recognition. Training your own model allows you to control the quality of the acoustic model. 2

Experts in the second mode of development usually attain advanced knowledge in pattern recognition, a thorough understanding of speech recognition, and savvy skills in manipulating the various speech recognition tools. It is one of the most satisfying processes in the world. Step 6 to Step 11 will briefly go through this process, and Chapter 6 gives a detailed description of every sub-component of the process.

After you have tried these two modes of development, you might be interested in 1) tuning the current system to make it faster and more accurate, or 2) understanding what is going on inside Sphinx 3. Step 12 and Step 13 describe these further.

A final note on this tutorial: several high quality tutorials already exist on the web. If you find that this tutorial alone cannot solve all of your problems, also try Evandro Gouvêa's tutorial at

http://www-2.cs.cmu.edu/∼robust/Tutorial/opensource.html

It is a full run-through tutorial which contains a complete recipe for training using the AN4 or RM1 database. The resulting system is a very simple evaluation system for these databases. You are highly recommended to run through either this tutorial or the online one if you are new to speech recognition or to Sphinx.

4.1 Step 1: Design of a Speech Recognition System

In this section, I will list some concerns you might need to consider before you build a speech recognition system.

2Sometimes it might give you a job as well.

4.1.1 Do you really need speech?

Not every type of information is suitable for speech input. For example, a digit string is a very good candidate for input by speech, yet it is not very natural to enter through other input modes (such as a touch pad). What kind of application is suitable for speech? Dictation, for example, is well known as one application where speech does very well. Other applications can be summarized as variants of command-and-control applications. For example, you may just need several kinds of "speech shortcuts" when you are using Windows: you might want to have a few commands in your system, and when each command is recognized, the corresponding action is taken by your system.

4.1.2 Is Your Requirement Feasible?

With HMM-based speech recognition, there are several types of input that are very hard to recognize, and some of the difficulty is intrinsic to the problem itself. Let's first talk about linguistic confusion: in general, if there are no linguistic constraints (e.g. an N-gram LM or an FSM), the recognition rate will usually be very low. This situation is very common, it occurs in research on conversational speech recognition, and there is no easy solution for it. So if your goal is to recognize everything that can be spoken by anyone, Sphinx (including all Sphinx(en)) probably can't help you. There are two major factors in designing your system: one is linguistic confusability and the other is acoustic confusability. They are usually the reason why speech systems perform poorly in practice.

Linguistic Confusability

If you rank the difficulty of building different systems, you would probably get a list like this: 3

1. Recognize everything that could be said by anyone in the world

3 A very important note should be made here. Although the author marks the 3000 word case as doable, this just reflects the level that beginning users can reach. It is very important to realize that Sphinx, with one capable speech recognition expert, can be used to create an arbitrary type of speech application with a large vocabulary (> 20k words).

This is the case we just mentioned, and it is very hard. There are also the effects of multiple languages, and it is generally not easy to do.

2. Recognize everything that could be said by anyone who speaks the same language This is slightly easier, but we still have no constraints.

3. Recognize what two people are saying in a conversation This might be easier if we could impose a constraint through the topic. For example, if we know that the two people are programmers and are likely to talk about programming all the time, then we can assume that terms such as "program" and "function" will appear more often.

4. Recognize what a person wants to write in a letter This is also known as dictation. It is still quite hard, but it is fairly constrained given the topic of the letter and the type of letters in that topic. For example, if your recognizer sees "Dear" in a marketing letter, it will usually be followed by "customer". Usually, the number of words a person uses can be confined to fewer than 60k words. That explains the success of a lot of commercial speech recognizers. Sphinx, by itself, does not forbid developers from building a dictation system. However, building a dictation system usually requires a lot of application tuning and training, which is too much to describe in this manual; just collecting the data may take you months to complete, so it is not suitable for this tutorial. As you might know, there are already several successful commercial recognizers in the world. There have also been several efforts to build such an open source system using Sphinx; unfortunately, most of them follow incorrect approaches and use incorrect methods. We can only hope that someday we will write another guide on how to use Sphinx for dictation. After this point, things start to be more easily doable by you (of course, provided you read this manual).

5. Build a 3000 word dialogue system This has already been done and has proved to be very successful; the CMU Communicator system is one good example. In the Communicator system http://www.speech.cs.cmu.edu/Communicator/

one can ask for information about a flight itinerary; the destination and arrival locations can be obtained. This kind of system is very successful because speech of this type is very constrained. 4 Flight information is just one type of information. What the user might want to query could be things like weather information, TV listings, product information, etc. Most of the time, such queries are very constrained. This will make your life easier.

6. Build a 1000-person phone directory system with English names This is also known as a speech directory system, and it is definitely doable and useful. I give this example because it can actually be harder than building a 3000 word dialog system: it reflects the intrinsic difficulty that people's names sometimes sound very similar. Even human beings have problems with them; we will talk about this more when we discuss acoustic confusability. The more names there are to be recognized, the higher the chance of names being confused. This is the nature of the problem. However, it is a problem which is doable with Sphinx and could be done by people who follow this manual.

7. Build a 100-command command-and-control system in English This is obviously another easy task, and Sphinx can definitely help you. For example, take a look at the "Robosapien Dancing Machine" project at http://www.robodance.com/ Command-and-control system is a very general term for many types of application. You might want to use one for controlling your Windows applications, or to control a character in a game.

There are two notes we need to mention at this point. The first is that it is important for you to realize that speech recognition systems come in different levels of difficulty. Some of them are very hard to tackle as a single person; you might need to form a team and deal with them with expert guidance. This will set your expectations at the correct level. The second is that we want to show you that vocabulary size is not the only reason why a task is hard. Whether a task is difficult partially

4 Notice that CMU Communicator only used Sphinx 2; the Sphinx 3 system was found to be 30% more accurate in later years.

depends on the perplexity of the language model, a quantity described more in Chapter 9. Another factor is whether the words are acoustically confusable with each other; this was briefly mentioned in the task "Build a 1000-person phone directory system with English names", and we will discuss it further in the next section. After you read this part, I urge you to test yourself: is recognition of an unconstrained 16 digit string harder, or is recognition of a credit card number (VISA or Mastercard) harder? 5

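For reference, the quantity just mentioned has a compact standard definition (Chapter 9 treats it properly; the notation here is ours). For a language model P evaluated on a test word sequence w_1, ..., w_N, the perplexity is

PP = P(w_1, ..., w_N)^{-1/N}

i.e. the inverse probability of the test text, normalized by the number of words. A lower perplexity means a more constrained, and usually easier, recognition task.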
Acoustic Confusability

Have you been to China and used romanized Chinese names to call your friends? If you have done it once, you will know that it is very hard. Let us consider another task: "Build a 1000 romanized Chinese name directory system". This task could be extremely hard because romanized Chinese names can easily be confused with each other. Consider my romanized Chinese name, Chan Yu Chung, and one of my friends' names, Cheng Yue Chun. Try reading them aloud on the phone and you will see that it might be nearly impossible even for the human ear to distinguish them. The same situation happens in alphabet recognition: it is generally very difficult to differentiate some letters. For example, A, J and K can be hard to tell apart; B, C, D, E, G, V and Z can be hard to tell apart; M and N are yet another pair. Yet another example of acoustic confusability is at the phoneme level. For example, on the phone, many humans misrecognize the word "Arthur" as "Althur". This is a case where the system should consider reprompting the user for spelling or another means of confirmation. Another possibility is even more subtle: it might happen that two things you want to recognize have the same name, for example two people with the same name. This is a natural situation where the name cannot be their unique identifier, and obviously a speech recognizer cannot do much about that. Acoustic confusability is one important concern in building a system. If

5 The second case is easier because credit card numbers are usually generated by a special algorithm, so one can check whether the 16 digits are valid or not.

you put in a lot of confusable words, you will end up with a very poor recognition rate.

4.1.3 System Concern

Running a speech recognizer can take a lot of computation on your machine. The speed of the system mainly depends on the size of the vocabulary of the task and on the number of mixture components in the Gaussian distributions (a model parameter; details can be found in Chapter 6). We recommend a PC with performance comparable to a 1 GHz Pentium or better to use Sphinx 3; the recommended system is a 2 GHz Pentium or better. This is the range where many systems (<= 10k vocabulary) can run under 1 times real time; that is to say, 1 second of speech is recognized using less than 1 second of computation time. However, a task with only 10-100 words will have no problem running under 1 times real time even on a 500 MHz machine.

4.1.4 Our recommendation

We don’t recommend first-time user to build system more than 10k be- cause at that limit expert choice is required to tune the system down to 1 times real time. This estimates is valid at Jun 2005. The member of the Sphinx Developers alway continues to improve the speed of the per- formance. May be when you read this manual later, the speed of the recognizer will be fast enough for your system requirement. A vocabulary which is too large has another problems, most of the time such a system is not suitable for the user’s requirement. This recommen- dation will hopefully make sure that the user will consider their require- ment carefully.

4.1.5 Summary

Designing your speech recognition system is very important. Not everything is suitable for input by speech, and some vocabulary requirements can be impossible to implement; some of them might not be possible even when a human is the listener. Speed is another concern to consider, and it is usually related to the size of the vocabulary.

4.2 Step 2: Design the Vocabulary

4.2.1 The System for This Tutorial

In this tutorial we will build a recognition system that allows users to say 5 key phrases: "spread sheet", "word processor", "calculator", "web browser" and "movie player". The user can say these 5 phrases, and the system will start the corresponding application. This is a very simple system, but from it we can learn a lot of interesting aspects of speech recognition design. We will develop this simple system first and expand it later.

4.2.2 Pronunciations of the Words

Let’s look at the words we chose again. spread sheet word processor calculator web browser movie player There are several interesting aspects for the choice, the first aspect is that the words have at least 2 or more syllables. This is chosen because speech recognizer could have hard time to recognizer keywords. Another aspect is that the word is not easy to confuse with each others. One simple common sense test is try to read pairs of word in pair. This is not automatic method but it is a basic principle of how people will figure out confusability of English words. As you already learnt in Chapter 4, Sphinx is a phone-based speech recognizer. Simply speaking, in the run time, the recognizer will create a giant Hidden Markov Model (HMM) and the search will try to search the best path given this HMM. Now we have the word list we want to recognize. We obviously need the phone-based representation of these words. Some people will just call this representation as “pronouciations”. How do we come up with these pronunciations? This is actually quite simple. What happened if need the pronunciation of a word? The obvious answer is to look up this pronunciation from the dictionary.

Create the Phonemes Manually

So the first way to get the pronunciations is to look them up in a dictionary to decide on the phone sequences. The caveat is that you need to make sure the phoneme symbols you use are the same as the phoneme set used by the open source acoustic model. 6 Another concern is that the dictionary you are using might not match the dialect of English (the same holds for other languages). For example, British English, American English and Australian English are actually quite different; if you use a British English dictionary for American English, the end result may not be ideal. In our case, let us simplify the discussion with a simple choice of dictionary. If you are using an open source acoustic model, the following method might be easier. You can use a free dictionary such as cmudict, which you can download at http://www.speech.cs.cmu.edu/cgi-bin/cmudict CMUdict is short for the Carnegie Mellon University Pronouncing Dictionary. It is a machine-readable pronunciation dictionary for North American English that contains over 125,000 words and their transcriptions. Though the pronunciations are mainly for American English, it is actually a reasonable start. The current version is CMUdict 0.6, and if you look inside this file (you can use the command cat to do it), you will see entries like:
,COMMA K AA M AH
-DASH D AE SH
-HYPHEN HH AY F AH N
...ELLIPSIS IH L IH P S IH S
.DECIMAL D EH S AH M AH L
.DOT D AA T
.FULL-STOP F UH L S T AA P
.PERIOD P IH R IY AH D
.POINT P OY N T
/SLASH S L AE SH

6 If they do not match, Sphinx 3's decode and decode_anytopo will inform you that they couldn't find the model, and the recognizer will stop during initialization.

MALEFACTORS M AE L AH F AE K T ER Z
:COLON K OW L AH N
;SEMI-COLON S EH M IY K OW L AH N
;SEMI-COLON(2) S EH M IH K OW L AH N
?QUESTION-MARK K W EH S CH AH N M AA R K
A AH
A'S EY Z
A(2) EY
A. EY
A.'S EY Z
A.S EY Z
A42128 EY F AO R T UW W AH N T UW EY T
AAA T R IH P AH L EY
AABERG AA B ER G
AACHEN AA K AH N
AAKER AA K ER
AALSETH AA L S EH TH
AAMODT AA M AH T
AANCOR AA N K AO R
AARDEMA AA R D EH M AH
AARDVARK AA R D V AA R K
AARON EH R AH N
AARON'S EH R AH N Z

The first column of every entry is the word; the rest is the phoneme representation of the word. What do these phonemes mean? Table 4.1 shows their usage. The phoneme example table is useful when you start to change the pronunciations yourself.

46 Usage of automatic tools

After obtaining cmudict, you can write a simple perl script to look up the pronunciation of each word. This is actually pretty simple, so we skip the details here; I recommend you try it yourself because it is fun. Many people have gone through this process, and some of them are nice enough to share the resources. For example, Prof. Alex Rudnicky and Mr. Kevin Lenzo from CMU have written a web page at http://www.speech.cs.cmu.edu/cgi-bin/cmudict (the same web page we used for downloading cmudict v0.6). You can look pronunciations up by typing the words in the edit box "Look up words or a sentence". You will soon get the pronunciation of each word:
SPREADSHEET S P R EH D SH IY T
WORD PROCESSOR W ER D P R AA S EH S ER
CALCULATOR K AE L K Y AH L EY T ER
WEB BROWSER W EH B B R AW Z ER
MOVIE PLAYER M UW V IY P L EY ER
We will call this file simple.dict. Yet another way to do it is to use the QUICK LM tool by Prof. Alex Rudnicky at http://www.speech.cs.cmu.edu/tools/lmtool.html You first create a file like this:
SPREADSHEET
WORD PROCESSOR
CALCULATOR
WEB BROWSER
MOVIE PLAYER
Let us call this file simple.txt. You can submit it just by clicking the "Compile Knowledge Base" button. You will see that the phone set is slightly different: you need to map AXR to ER, replace AX with AH, replace all TS with T and S, replace all PD with P, and replace all DX with either D or T. This can easily be done with perl or sed.

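To make the two automatic routes just described concrete, here is a minimal shell sketch. The file names cmudict.0.6 and simple.raw are only examples of the downloaded dictionary and the raw lmtool output, and the sed word-boundary syntax assumes GNU sed; the DX rule below picks D, but you should decide between D and T word by word as described above:

# Look one word of our list up in the downloaded dictionary
grep -i '^CALCULATOR ' cmudict.0.6

# Map the lmtool phone set back to the cmudict-style phone set
sed -e 's/\bAXR\b/ER/g' -e 's/\bAX\b/AH/g' \
    -e 's/\bTS\b/T S/g' -e 's/\bPD\b/P/g' \
    -e 's/\bDX\b/D/g' simple.raw > simple.dict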
Whether you use automatic pronunciation tools or create the pronunciations manually, remember that you can freely change the dictionary yourself. Be careful, though: the best pronunciations are those that best fit your users, not necessarily the official pronunciation of a word. In general, humans display a very wide variety of pronunciations, and the so-called "official" pronunciations may not fit your case. Here is a question you will always run into: if we fiddle with the system parameters or settings, how do we know whether these changes have a positive effect on the system? The key is to benchmark the system every time a change is made. Testing the system by actually using it is not a bad idea. However, an HMM-based speech recognition system is statistical: a single failure, or one failure out of ten trials, does not tell you much about the performance of the system; it is just not statistically significant. Therefore, it is important to do quantitative benchmarking. One metric people usually use is the word error rate (WER). If your change improves the WER on a large test set, that usually means the performance of the system has really improved. We will discuss benchmarking techniques and the definition of WER more in Step 6 of this chapter.

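As a quick preview of the standard definition used there: if, after aligning a hypothesis against its reference transcript of N words, there are S substituted words, D deleted words and I inserted words, then

WER = (S + D + I) / N

usually reported as a percentage. Step 6 explains how the alignment and the test set are set up.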
4.3 Step 3: Design the Grammar for a Speech Recognition System

The grammar plays an important part in a speech recognition system: it governs what the recognizer can actually produce as a hypothesis. In speech recognition, there are two prominent ways to represent a grammar. One is the n-gram; the other is the finite state machine. In this tutorial, we will use the n-gram, because currently Sphinx 3 only supports N-grams. 7 If you would like to use a finite state machine, you might consult the documentation for Sphinx 2 and Sphinx 4.

4.3.1 Coverage of a Grammar

Generally speaking, if the grammar is too constrained, it won't cover the words that users actually say; in other words, many out-of-vocabulary (OOV) words will result. One trivial way to solve this problem is to add new words. However, blindly increasing coverage without moderation can easily introduce unnecessary confusion into speech recognition. Therefore, careful decisions and benchmarking must be made before adding new words to the grammar.

4.3.2 Mismatch between the Dictionary and the Language Model

One common confusion is how the speech recognizer behaves when the vocabulary of the dictionary and the grammar do not match. In Sphinx recognizers, both the unigrams of the language model and the dictionary decide whether a word exists in the search graph. The recognizer tries to look up the pronunciation (i.e. the phoneme representation) of each word in the dictionary; if it fails to find the pronunciation of a word, the word is not included in the search. In other words, the search graph includes the set of words that is the intersection of the vocabularies of the dictionary and the unigrams. 8

7 This is very likely to change, because the implementation that existed in Sphinx 2 has already been incorporated into Sphinx 3. 8 From this point of view, it is actually fine to include a dictionary with a large number

49 Table 4.1: Description of Phone Set in CMU Lex

Phoneme Example Translation

AA odd AA D
AE at AE T
AH hut HH AH T
AO ought AO T
AW cow K AW
AY hide HH AY D
B be B IY
CH cheese CH IY Z
D dee D IY
DH thee DH IY
EH Ed EH D
ER hurt HH ER T
EY ate EY T
F fee F IY
G green G R IY N
HH he HH IY
IH it IH T
IY eat IY T
JH gee JH IY
K key K IY
L lee L IY
M me M IY
N knee N IY
NG ping P IH NG
OW oat OW T
OY toy T OY
P pee P IY
R read R IY D
S sea S IY

4.3.3 Interpretation of ARPA N-gram format

The first thing you will encounter if you use an n-gram based speech recognizer is the ARPA n-gram format, so it is quite important to learn how to interpret it. The following is one example.

[ Editor Notes: This example was not shown correctly. ]
.... Comment Sections Skipped here ....
\data\
ngram 1=2001
ngram 2=29445
ngram 3=75360

\1-grams:
-2.0960 -0.5198
-2.8716 'BOUT -0.5978
-4.6672 'CAUSE -0.3765
-5.1065 'FRISCO -0.2588
-4.4781 'HEAD -1.1010
-4.3863 'KAY -0.1962
-4.5624 'ROUND -0.5975

\2-grams:
.
.

\3-grams:
.
.

\end\

of words. However, in the actual implementation the dictionary is read into memory, so the size of the dictionary is constrained by the system memory the user has.

The \data\ section

This section shows the number of each type of n-gram. In Sphinx, up to trigrams are supported.

The \X-grams: sections

These specify the log probabilities and the back-off weights for each N-gram.

The \end\ section

This ends the file.

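To make the layout concrete, each line of the \1-grams: section follows the standard ARPA convention

log10 P(word)    word    log10 backoff(word)

so, taking one entry from the example above, the line "-4.6672 'CAUSE -0.3765" says that the unigram probability of 'CAUSE is 10^-4.6672 and its back-off weight is 10^-0.3765. Higher-order sections list the words of the history and the predicted word between the two numbers, and entries of the highest order carry no back-off weight.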
4.3.4 Grammar for this tutorial

For this tutorial, we create a very simple language model for our purposes.

[ Editor Notes: This file is not shown correctly]
\data\
ngram 1=107
ngram 2=1

\1-grams:
-2.0253 -99.0000
-99.0000 0.0000
-0.6990 SPREADSHEET 0.0000
-0.6990 WORD PROCESSOR 0.0000
-0.6990 CALCULATOR 0.0000
-0.6990 WEB BROWSER 0.0000
-0.6990 MOVIE PLAYER 0.0000

\2-grams:
0.0000
0.0000 SPREADSHEET SPREADSHEET 0.0000
0.0000 SPREADSHEET WORD PROCESSOR 0.0000

0.0000 SPREADSHEET CALCULATOR 0.0000
0.0000 SPREADSHEET WEB BROWSER 0.0000
0.0000 SPREADSHEET MOVIE PLAYER 0.0000
0.0000 WORD PROCESSOR SPREADSHEET 0.0000
0.0000 WORD PROCESSOR WORD PROCESSOR 0.0000
0.0000 WORD PROCESSOR CALCULATOR 0.0000
0.0000 WORD PROCESSOR WEB BROWSER 0.0000
0.0000 WORD PROCESSOR MOVIE PLAYER 0.0000
0.0000 CALCULATOR SPREADSHEET 0.0000
0.0000 CALCULATOR WORD PROCESSOR 0.0000
0.0000 CALCULATOR CALCULATOR 0.0000
0.0000 CALCULATOR WEB BROWSER 0.0000
0.0000 CALCULATOR MOVIE PLAYER 0.0000
0.0000 WEB BROWSER SPREADSHEET 0.0000
0.0000 WEB BROWSER WORD PROCESSOR 0.0000
0.0000 WEB BROWSER CALCULATOR 0.0000
0.0000 WEB BROWSER WEB BROWSER 0.0000
0.0000 WEB BROWSER MOVIE PLAYER 0.0000
0.0000 MOVIE PLAYER SPREADSHEET 0.0000
0.0000 MOVIE PLAYER WORD PROCESSOR 0.0000
0.0000 MOVIE PLAYER CALCULATOR 0.0000
0.0000 MOVIE PLAYER WEB BROWSER 0.0000
0.0000 MOVIE PLAYER MOVIE PLAYER 0.0000

\end\

Here we use a trick: the n-gram simulates a grammar in which each of the keywords we chose is entered only once. Because all the possible bigrams are zeroed, the only possible path is a hypothesis that enters a word only once. Let us call this file simple.arpa from now on.

4.3.5 Conversion of the language model to DMP file format

Sphinx 3 actually doesn’t support the text ARPA file format. If you want to use your grammar in Sphinx 3 , you could use a tool called lm3g2dmp to convert the text file to .DMP file format. You could download it in the share directory in CVS from sourceforge using the software installation process discussed in Chapter 4. Then conversion of the LM could be run as lm3g2dmp simple.arpa A file named simple.arpa.DMP will be created. You will need it in Step 5.

4.4 Step 4: Obtain an Open Source Acoustic Model

Enough about the language model; it is time to gather resources such as the acoustic model. There are a couple of notes before we start. The first thing one needs to know about open source acoustic models is that they are still rare and valuable resources. One perhaps appreciates this fact more after learning how acoustic model training is actually done. One hopes that the availability of open source speech recognizers will encourage more effort on building open source acoustic models. Do give two thumbs up to anyone who builds and open-sources a model; they are doing for speech recognition something akin to the work of St. Teresa. The second thing about open source acoustic models (as well as language models) is that they may not always suit everyone's needs. In general, training a model on a large amount of speech makes it very accurate in general. However, in a specific domain, say digit recognition, it may be better in some cases to train using only digit utterances. This explains why many developers find that the open source models they obtain do not satisfy them. If you don't think the open source models are the best, training is always an option: Sphinx also provides a trainer, so you have the freedom to fine-tune the model. In this section, we will discuss the various things one needs to do to get and use a model. Some basic knowledge of how to interpret the model will also be covered.

4.4.1 How to get a model?

The first place to try is the open source models page: http://www.speech.cs.cmu.edu/sphinx/models/ There is one model that many people use, the so-called broadcast news model or hub-4 model. This is an important model that you will keep coming across, so let us talk about it a little. First of all, the model was trained by CMU using 140 hours of 1996 and 1997 hub4 training data. This is probably the largest amount of data that has been put into an open source model.

Another thing you might notice is that this model is trained on read speech, so it is not particularly good for conversational speech; in general, conversational speech can indeed be quite hard to handle. The phone set for which the models are provided is that of the dictionary cmudict 0.6d, available through the CMU web pages. This matches the procedure we used in Step 2. The dictionary is used without stress markers, resulting in 40 phones, including the silence phone, SIL. Adding stress markers degrades performance by about 5%. The models are within-word and cross-word triphone HMMs with no skips permitted between states. There are two sets of models in this package, comprising 4000 and 6000 senones respectively; they are placed in directories named 4000_senones and 6000_senones. Two sets are provided partially because the trainer trained both so that the performance of the two settings could be compared, and partially because we hope you will try which setting is more suitable for you. There are also two feature settings you will see, 1s_c_d_dd and s3_1x39. These funny symbols denote the type of dynamic coefficients one computes; the detail of what they mean can be found in Chapter 5. The two models can be used with both sphinx3 fast (decode) and sphinx3 slow (decode_anytopo). 9

9 As of 2005-05-23, this is slightly different from the notes one can find.

4.4.2 Are there other models?

Actually, there are more. For example, the TIDIGITS models can be found in ????, and the RM models can be found in ????. You can also train a model pretty easily for AN4. Note, though, that those are much smaller models.

4.4.3 Why Acoustic Models are so Rare?

An acoustic model is actually pretty hard to come up with: it takes weeks to prepare the data and days to train. Most academic or commercial institutes simply keep their models, because they are a valuable resource. Many people appreciate the fact that Sphinx is open source. I personally hope that more and more people will also appreciate that the HUB-4 models are open source, because they are probably the first models one can use to do large vocabulary speech recognition.

4.4.4 Interpretation of Model Formats

Sphinx’s HMM definition are spread in five different files. They are mean, variance, mixture weight, transition matrices and model definition. It is quite important to have some ideas on what it means before we go on further.

A model definition file

So, let’s start from the model definition file. They are usually located in the model architecture directory when SphinxTrain ’s standard procedure is used to train the model. Here is an example: 0.3 55 n base 118004 n tri

472236 n_state_map
2165 n_tied_state
165 n_tied_ci_state
55 n_tied_tmat
#
# Columns definitions
#base lft rt p attrib tmat ... state id's ...
+BACKGROUND+ - - - filler 0 0 1 2 N
+BREATH+ - - - filler 1 3 4 5 N
+CLICKS+ - - - filler 2 6 7 8 N
+COUGH+ - - - filler 3 9 10 11 N
+FEED+ - - - filler 4 12 13 14 N
+LAUGH+ - - - filler 5 15 16 17 N
+NOISE+ - - - filler 6 18 19 20 N
+SMACK+ - - - filler 7 21 22 23 N
+UH+ - - - filler 8 24 25 26 N
+UHUH+ - - - filler 9 27 28 29 N
+UM+ - - - filler 10 30 31 32 N
AA - - - n/a 11 33 34 35 N
AE - - - n/a 12 36 37 38 N
AH - - - n/a 13 39 40 41 N
AO - - - n/a 14 42 43 44 N
AW - - - n/a 15 45 46 47 N
AX - - - n/a 16 48 49 50 N
AXR - - - n/a 17 51 52 53 N
AY - - - n/a 18 54 55 56 N
B - - - n/a 19 57 58 59 N
CH - - - n/a 20 60 61 62 N
D - - - n/a 21 63 64 65 N
DH - - - n/a 22 66 67 68 N
DX - - - n/a 23 69 70 71 N

EH - - - n/a 24 72 73 74 N
ER - - - n/a 25 75 76 77 N
EY - - - n/a 26 78 79 80 N
F - - - n/a 27 81 82 83 N
G - - - n/a 28 84 85 86 N
HH - - - n/a 29 87 88 89 N
...
The first few lines give the number of parameters in the whole system.
0.3
0.3 is the model format version.
55 n_base
n_base is the number of base phones, including fillers, in the system. In this case, the number is 55.
118004 n_tri
n_tri is the number of triphones in the system. In this case, the number is 118004.
472236 n_state_map
2165 n_tied_state
n_tied_state is the number of tied states (senones) in the system. In this case, the number is 2165.
165 n_tied_ci_state
n_tied_ci_state is the number of tied CI states in the system. In this case, the number is 165.
55 n_tied_tmat
n_tied_tmat is the number of tied transition matrices in the system. In this case, the number is 55.
Now let us try to interpret the following line:
#base lft rt p attrib tmat ... state id's ...
+BACKGROUND+ - - - filler 0 0 1 2 N
From left to right, it reads: the base phone is +BACKGROUND+, with no left context and no right context, so it is a CI phone. It is a filler. Its transition matrix ID is 0; its first state has senone ID 0, its second state 1, and its third state 2.

It is quite useful to know how an HMM state maps to a senone (in HTK terminology, a tied state). In the above example, if you know that the senone ID of the first state is 0, you just need to look it up in the means and variances files to get its values.
AA - - - n/a 11 33 34 35 N
Here is another entry in the model definition file. This time, we see a typical CI phone entry. If you used the standard training procedure of SphinxTrain (which you will also have a chance to play with in the second half of this chapter), then every state of a CI phone HMM has its own unique senone. In contrast, look at some triphone definitions:
AA AA AA s n/a 11 177 190 216 N
AA AA AE s n/a 11 177 190 216 N
AA AA AH s n/a 11 177 190 216 N
AA AA AO s n/a 11 177 190 216 N
AA AA AW s n/a 11 177 190 216 N
AA AA AX s n/a 11 177 190 216 N
AA AA AXR s n/a 11 177 196 216 N
AA AA AY s n/a 11 177 190 216 N
AA AA B b n/a 11 178 189 219 N
AA AA B s n/a 11 177 189 217 N
AA AA CH s n/a 11 177 189 216 N
AA AA D b n/a 11 178 198 219 N
AA AA D s n/a 11 177 198 214 N
You will see that the states of multiple triphones are actually mapped to the same senone IDs. Why does that happen? This is essentially the result of the state tying performed when training the context-dependent model. There is never enough training data for every triphone model, so it is necessary to cluster the HMM states so that the data can be used more efficiently. Most of the time, clustering is done either by decision-tree-based clustering or by bottom-up agglomerative clustering.

The Means and Variances, Mixture Weights and Transition Matrices

Given the above, you can probably make more sense of the remaining four files. They are usually located in a directory called model_parameters.

All of them are binary files. To read them, you can use a tool called printp. You can find the detailed usage of printp in Appendix B. However, as it is the first SphinxTrain tool we encounter, let us take a closer look at it. Every tool in SphinxTrain is designed in a consistent way so that users can use it easily. If you just type
printp
the following help information will be shown (as of 20050811):
-help no Shows the usage of the tool
-example no Shows example of how to use the tool
-tmatfn The transition matrix parameter file name
-mixwfn The mixture weight parameter file name
-mixws Start id of mixing weight subinterval
-mixwe End id of mixing weight subinterval
-gaufn A Gaussian parameter file name (either for means or vars)
-gaucntfn A Gaussian parameter weighted vector file
-regmatcntfn MLLR regression matrix count file
-moddeffn The model definition file
-lambdafn The interpolation weight file
-lambdamin 0 Print int. wt. >= this
-lambdamax 1 Print int. wt. <= this
-norm yes Print normalized parameters
-sigfig 4 Number of significant digits in 'e' notation
If you type printp -help yes, the help information is displayed. If you type printp -example yes, several examples of the command are shown. Therefore, if you are familiar with these basic command conventions, you probably will not need to consult the manual much. All the

synopses and examples are all replicated again at the end of this book, which will be a plus for people who like to have the manual in book form. Let us get back to our example. One practical thing you may want to do from time to time is to inspect whether the mean values make sense; for example, acoustic model training can make some components of the mean and variance vectors abnormal. printp allows you to inspect these values by displaying them on the screen. For means and variances, you can use printp -gaufn mean or printp -gaufn var to show the values. If you want to look at the mean for a particular senone, you just need to look the senone up in the model definition file. printp also allows you to load a model definition so that you don't need to look up the model definition file manually all the time. To read the mixture weights and transition matrices, use the -mixwfn and -tmatfn options. The rest is pretty trivial; you can probably get it by just running the command once.

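For instance, a quick inspection session might look like the following. The flags are the ones listed in the help output above; the directory and file names (model_parameters/hub4_4000sen and the typical SphinxTrain output names) are only illustrative, so substitute the paths of the model you actually downloaded:

printp -gaufn model_parameters/hub4_4000sen/means
printp -gaufn model_parameters/hub4_4000sen/variances
printp -mixwfn model_parameters/hub4_4000sen/mixture_weights
printp -tmatfn model_parameters/hub4_4000sen/transition_matrices

Each command dumps the corresponding parameters to the screen, where you can scan for abnormal values such as near-zero variances.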
4.4.5 Summary

In this section, we retrieved a model and learned how to interpret its parameters. The next section will use all the resources from Step 1 to Step 4 to build a system.

4.5 Step 5: Develop a speech recognition system

Now you have the acoustic model, the language model and the dictionary; it is time to use them in the recognizer. You can think of running a speech recognizer in two ways. One is called batch mode: the waveform is recorded first, then decoding is done on the whole waveform. The other is called live mode: recognition is done on the fly as speech samples are captured by the audio device. The two modes are slightly different in practice because cepstral mean normalization (CMN) is usually run before decoding. What CMN does is estimate the mean of all the cepstra and use it to normalize every cepstral vector (for details, please read Chapter 5). In batch mode, the estimate is usually obtained by computing the cepstral mean over all frames; most of the time this is a very good estimate. In live mode, though, since only part of the utterance has been seen, one needs to approximate the CMN process by other means, and these approximations usually introduce some degradation. In this section, we will talk about how to come up with a system that can do batch mode decoding, live mode simulation and actual live mode decoding. In terms of technical difficulty, the three are in ascending order, so going through them one after another is a good idea for novices.

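As a minimal sketch of the batch-mode computation just described (the notation is ours): if c_1, ..., c_T are the cepstral vectors of an utterance, batch-mode CMN replaces each frame by

c'_t = c_t - (1/T) * (c_1 + c_2 + ... + c_T)

In live mode the full sum over T frames is not available while decoding is in progress, so the mean has to be estimated from the frames seen so far (or carried over from previous utterances), which is the source of the degradation mentioned above.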
4.5.1 About the Sphinx 3 ’s Package

Before we go on, let us take a look at what one gets in a Sphinx package. Assuming you go to the install path you specified in Chapter 3, the following software will show up in the directory. These executables are what you find when you install Sphinx 3.5:
align allphone dag livepretend astar decode decode_anytopo
On Windows, you will also find another executable called livedecode. For many, it is not easy to understand what these executables are supposed to do, so here is a simple description. decode, decode_anytopo,

livepretend and livedecode are decoders. However, each of them has slightly different functions and should be used in different situations.

• decode, also called "s3 fast", is a batch mode decoder using a tree lexicon, with optimizations on both GMM computation and cross-word triphone traversal. In accuracy it is poorer than decode anytopo (described below); in speed it is around 5 times faster. From Sphinx 3 .6, decode also allows decoding with a finite state grammar and different implementations of the search. It is therefore a well-balanced recognizer and is probably the first choice for batch mode recognition.

• decode anytopo, also called "s3 slow", is a batch mode decoder using a flat lexicon, with optimization on trigrams and cross-word triphones. Its performance is very close to a full-trigram, full cross-word recognizer. Internally at CMU, it was used in several evaluations from 1994 to 2000, and it gives the best accuracy among all the decoders. Both decode and decode anytopo work only with cepstral files, not waveforms; waveforms need to be converted to cepstral vectors first using wave2feat.

• livepretend is a live mode simulator using the engine of decode. livepretend segments a waveform and feeds the segments of speech to the search engine of decode. The segmentation is currently done in an ad hoc manner by choosing 25 frames as one segment. As explained before, segmenting the speech affects cepstral mean normalization, so livepretend usually performs slightly worse than decode.

• livedecode is a live mode demonstration using the engine of decode. livedecode allows the user to recognize speech in a push-button manner. It is by no means a dictation machine; it is an example that shows how the live mode decoding API (detailed in Chapter ??) can be used to build a system.

After the four major decoding tools, we should look at the other tools: align, allphone, astar and dag. They are search programs with specific purposes.

• align Given the transcription of an utterance, align produces state-, phone- and word-level alignments for the user.

• allphone Carries out phoneme recognition for an utterance with full triphone expansion.

• dag Both decode and decode anytopo can generate a lattice for second-stage rescoring. dag is the tool that performs this rescoring: it searches for the best path within the lattice given a trigram language model.

• astar astar could generate an N-best list given a lattice. It is very useful for N-best rescoring.

When using the tools we just mentioned, it is useful to bear in mind some common design properties of the Sphinx 3 .x package. All the tools in Sphinx 3 .x have the option -logfn, which lets you record every message of each program. Sphinx 3 .x applications can take quite a lot of command line arguments. If you just type the command by itself, a fairly detailed command-line usage description is printed. If you do not like to specify the command line manually, you have a couple of options. The trivial one is to store the command-line options in a Unix script file, which will look like this: decode -opt1 -opt2

Another way is to put the options in a file and treat it as a configuration file. You could create a file that looks like this: -opt1 -opt2

and call it config. Then you just type $ decode config

decode will then automatically use the contents of config to initialize itself. That saves a lot of typing. Besides, since some tools share the same command-line structure, a configuration file can often be reused, which is an extra convenience.

4.5.2 A batch mode system using decode and decode anytopo: What could go wrong?

Let us start by building a batch mode system using decode and decode anytopo; they are the first two decoders we will play with. To make your work easier, you need a big-picture view of how a recognizer works. There are a couple of resources a recognizer needs before it can run:

• The dictionary In Sphinx 3 , that includes a lexical dictionary and a filler dictionary.

• The language model In Sphinx 3 , this could be an n-gram as we described or it could also be a finite state machine.

• The acoustic model As we mentioned, it has five components: the means, the variances, the mixture weights, the transition matrices and the model definition.

In batch mode, you also need to figure out a way to let the recognizer read in the cepstral files. Let us think about why the recognizer needs these resources. Essentially, an HMM-based speech recognizer composes a graph of states and searches for the best path within that graph. At the first level, a word graph is required; it can come from either an N-gram or an FSM. At the second level, the recognizer needs to know how each word maps to a phone sequence; that is why a dictionary is necessary. Like most large vocabulary recognizers, Sphinx also uses the concept of fillers. The main reason is that specifying fillers in the language model is not feasible and, most of the time, fillers need special treatment; that is why a separate filler dictionary is required. At the final level, the recognizer needs to know how each phone is represented as a state sequence; that is where the HMM set is used. The above concept is illustrated in Figure 4.1.

Figure 4.1: Resources used by the decoders. (Block diagram: the language model yields the word graph; the dictionary and model definition expand it into a phone graph and then a state graph; the acoustic model parameters (means, variances, mixture weights and transition matrices), together with the cepstra computed from the waveforms, drive the Viterbi algorithm that produces the answer.)

So, these four resources are essentially what you need to provide to the decoder, and they are the four things you need to identify first. This sounds pretty simple. "If it is that simple, why does it take me so much time to set up the decoder?" That happens to be many people's question. It is therefore worthwhile to do some initial checks of your resources. In general, you cannot feed the decoder an arbitrary combination of dictionary, language model, acoustic model and features. Many people do not realize this, and that is why setups so often go wrong. Here is a checklist you may want to go through first.

LM/Dictionary Mismatch

The first situation is an inconsistency between the language model and the dictionary. This is what happens: for every word in the language model, the decoder looks it up in the dictionary. If the word exists, it is included in the word graph; if it does not, it is left out. This situation can be described as a language model/dictionary mismatch. All Sphinx 3 decoders warn the user about it. However, it is a design decision not to stop the recognizer even when the situation exists, because it is very common in real life, where the dictionary and the language model are usually prepared by two different people. If you find this happening to you, add the missing words to the dictionary and things will be fine. A small check like the sketch below can catch this before you run the decoder.
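Here is a minimal sketch of such a check in Python. The ARPA parsing is deliberately simplified (it only reads the 1-gram section described in Section 4.3.3), and simple.arpa and simple.txt are the tutorial files assumed above; adapt the names to your setup.

def dict_words(dict_file):
    words = set()
    for line in open(dict_file):
        if line.strip():
            word = line.split()[0]
            words.add(word.split("(")[0])   # strip alternate-pronunciation suffix, e.g. TO(2)
    return words

def lm_unigram_words(arpa_file):
    words, in_1gram = set(), False
    for line in open(arpa_file):
        line = line.strip()
        if line == "\\1-grams:":
            in_1gram = True
        elif line.startswith("\\") and in_1gram:
            break                           # reached the 2-gram (or end) section
        elif in_1gram and line:
            words.add(line.split()[1])      # each line is: logprob word [backoff]
    return words

missing = lm_unigram_words("simple.arpa") - dict_words("simple.txt") - {"<s>", "</s>"}
print("LM words missing from the dictionary:", sorted(missing))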

Dictionary/AM Mismatch

The next common problem is a mismatch between the dictionary and the acoustic model. This is what happens: when the phone set used in the dictionary does not match the phone set used in the acoustic model, the decoder stops. This occurs a lot when the dictionary and the model come from different sources. If you see it happen, check all the phones of the dictionary and compare them with what you have in the acoustic model. For models trained with the SphinxTrain recipe, this should not happen. What if you have a model and a dictionary, you know they mismatch, but you still want to use them together? That happens a lot, because most of the time you may not be the one who trained the model.

Now what you can do is map the phones found in the acoustic model or in the dictionary so that, in the end, the two sets match. Of course, make sure you know what you are doing. Also remind yourself that linguistic knowledge can be misleading in this particular situation; validation of the model on data should be the final check. A sketch of a simple phone-set comparison follows.
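The sketch below assumes the dictionary format used in this tutorial (each line is a word followed by its phones) and a plain phone-list file with one phone per line; the file name model.phonelist is purely illustrative.

def dict_phones(dict_file):
    phones = set()
    for line in open(dict_file):
        parts = line.split()
        if parts:
            phones.update(parts[1:])        # everything after the word is a phone
    return phones

def model_phones(phonelist_file):
    # assumes one phone per line, as in a SphinxTrain phone list
    return {line.strip() for line in open(phonelist_file) if line.strip()}

extra = dict_phones("simple.txt") - model_phones("model.phonelist")
if extra:
    print("Phones in the dictionary but not in the acoustic model:", sorted(extra))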

AM/Feature mismatch

This problem might be the most subtle and the hardest to check. It is also the number one killer of decoding setups reported in the Sphinx Forum. The idea is this: unless you know what you are doing, any difference between the front-end settings used to train the acoustic model and those used for decoding can be detrimental to the recognition result. There are several parameters in wave2feat and livedecode that control the front end. Bear in mind that parameters such as -cmn, -agc and -varnorm also affect the front end. If any of them differ, it is hard to predict what will happen. In general, when a model is released, it is important for the trainer to also release the feature parameters used in training, to avoid such mismatches in the future.

Now we can run it.

First, convert the waveform to cepstra. Let us call the input waveform a.wav.

[ Editor Notes: We need to verify both the arguments of wave2feat and decode here. ] Run wave2feat as follows:

wave2feat -alpha 0.97 -srate 16000.0 -frate 100 -wlen 0.0256 -nfft 512
    -nfilt 40 -lowerf 133.33334 -upperf 6855.49756 -ncep 13
    -dither yes -mswav yes -i a.wav -o a.mfc

This setting follows exactly what one finds in the open source Hub-4 model at http://www.speech.cs.cmu.edu/sphinx/models/hub4opensrc jan2002/ (see the INFO ABOUT MODELS file there). The setting may not fit your situation exactly. First of all, we assume Microsoft wave format input (-mswav yes); wave2feat can also read NIST (-nist yes) or raw (-raw yes) files. If you have a batch of files, you can use -c to specify a control file; its usage is straightforward. We also recommend specifying -dither yes: if the wave file contains a frame that is all zeros, wave2feat then adds a small random number to the samples. Without it, the front end can give erroneous results.10 After that, you just need to run decode to get the results:

decode -lm simple.arpa.DMP -dict simple.txt -fdict filler.dict
    -mdef hub4opensrc.6000.mdef -mean means -var variances
    -mixw mixture_weights -tmat transition_matrices
    -ctl simple.control -cepdir <your cepstral directory>
    -agc none -varnorm no -cmn current

Inside simple.control, add an entry for a like this:

10 Mainly because of the log operation.

a

Make sure that the cepstral file for a (a.mfc) exists in <your cepstral directory>. The result will then be displayed on the screen; you can also redirect it to a file. If you find the output too chatty, you can specify -hyp, which gives you just the answer.11 As we mentioned, you could also use decode anytopo instead of decode. Starting from 3.6, decode anytopo and decode have a synchronized command-line interface, so the usage is the same for the two executables, and we recommend using this synchronized interface rather than the older one.
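For a whole directory of recordings, the two commands can be glued together with a small driver script. The sketch below is illustrative only: the directory names wav/ and cep/ are assumptions, and the wave2feat call omits the front-end parameters for brevity (in practice, pass the full argument list shown above).

import glob, os, subprocess

wavs = sorted(glob.glob("wav/*.wav"))
ids = [os.path.splitext(os.path.basename(w))[0] for w in wavs]

# 1. Convert every waveform to cepstra (add the full front-end argument list here).
for wav, utt in zip(wavs, ids):
    subprocess.run(["wave2feat", "-mswav", "yes", "-dither", "yes",
                    "-i", wav, "-o", os.path.join("cep", utt + ".mfc")], check=True)

# 2. Write the control file: one utterance id per line.
with open("simple.control", "w") as ctl:
    ctl.write("\n".join(ids) + "\n")

# 3. Run the decoder once over the whole control file.
subprocess.run(["decode", "-lm", "simple.arpa.DMP", "-dict", "simple.txt",
                "-fdict", "filler.dict", "-mdef", "hub4opensrc.6000.mdef",
                "-mean", "means", "-var", "variances",
                "-mixw", "mixture_weights", "-tmat", "transition_matrices",
                "-ctl", "simple.control", "-cepdir", "cep",
                "-agc", "none", "-varnorm", "no", "-cmn", "current"], check=True)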

4.5.3 A live mode simulator using livepretend

Now we take one further step and start to simulate live mode recognition. livepretend's usage is very similar to using wave2feat and decode combined; in effect it runs the two together. Unlike most other tools, livepretend, just like livedecode, only accepts a configuration file and does not accept options on the command line; they are the only two applications with this property.12 One interesting thing about the Sphinx live-mode recognizer is that it allows the user to retrieve a partial answer before the utterance has ended. In livepretend, you can turn this functionality on and off by specifying -phypdump 1/0 in your configuration. By default, the code always gives you the partial hypothesis. What is the use of a partial hypothesis, you may ask? One use is that, if you find a particular keyword has already been recognized, you can stop the recognition without processing the rest of the utterance.

4.5.4 A Live-mode System

So now we come to the final part: actually using a recognizer live. For many of you, the batch mode recognizer may already suffice for your purpose. Most people who use Sphinx are either researchers or developers. For researchers, changing some part of the system and

11 Another trick: you can simply grep for the pattern FWDVIT in the recognition output. 12 This might be something we will want to change in the future; backward compatibility will be maintained if that happens.

testing the changes are usually the most important tasks in their lives. In that case, they can use either decode or livepretend. decode is essentially a batch mode decoder, whereas livepretend also simulates the situation of a live-mode decoder; the latter is slightly better for scenarios where a technique could possibly make use of all the frames of an utterance. For developers, decode together with wave2feat already forms a very nice software collection for building some types of real-life application. For example, if your application can afford it, a push-button application can first record the waveform and then recognize it: record the waveform, process it with wave2feat and then pass it to decode. The whole process simply replicates the commands of Section 4.5.2. For developers, though, using the batch mode decoder in an application may not always be the best choice. The major reason is delay. With the record-then-recognize approach, the response time of the system equals the recording time plus the recognition time. If the user speaks for 10 s and the recognizer also takes 10 s to recognize (a 1x RT scenario, explained later), it might take 20 s for the system to respond. Not many users will be happy about that. Therefore, for application developers, a more favorable scenario is to start recognizing the recorded speech while still recording the rest of the user's speech, even though the utterance is not yet finished. This is usually called live-mode recognition, and most commercial dictation engines work that way. This is what we want to discuss in this subsection. So how do we do it? We will not cover this in detail in this chapter; we will give you some ideas and point you to further details in Chapter ??. Let us just go through the process first. The first step is to record audio and put it into some kind of buffer in your program. For recording, you could use the libs3audio routines, such as cont ad. If you want more flexibility, we suggest a library called portaudio.13 Either package can give the desired result. The samples should be passed to a feature extraction routine, which converts the audio samples into cepstral coefficients. The coefficients are then passed to the recognition routine and recognition is performed. Notice that this time only a segment of speech is decoded at a time. What you need to do is continue this process until there

13 Not Sphinx software, but it has a pretty liberal license; you can find the details at www.portaudio.org

is no speech left in the audio buffer. The above description should give you an idea of what you need to do. In Chapter ??, you will also see detailed program listings, and we will discuss the issues involved in creating a live-mode application. A very rough outline of the loop is sketched below.
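The following Python skeleton only illustrates the control flow described above. The three arguments and all method names (begin_utterance, process, partial_hypothesis, end_utterance) are placeholders, not the actual Sphinx live-mode API, which is documented in Chapter ??.

def live_mode_loop(record_chunk, extract_features, decoder):
    """Skeleton of a live-mode loop with injected audio, front-end and decoder objects."""
    decoder.begin_utterance()
    while True:
        samples = record_chunk()              # small block of audio samples, or None when speech ends
        if samples is None:
            break
        frames = extract_features(samples)    # cepstra (and dynamic coefficients) for this block only
        decoder.process(frames)               # extend the search with just these frames
        print(decoder.partial_hypothesis())   # optional: inspect the partial answer so far
    return decoder.end_utterance()            # final backtrace over the whole utterance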

4.5.5 Could it be Faster?

Assume you have finished the above exercise of building a recognition system; it is time to notice what is wrong and try to improve it. The first thing you may have noticed is that decoding might be too slow to be useful. To end this section, we will take a look at how this problem can be tackled. Search in speech recognition is a very time-consuming process. In Chapter ??, we see that the Viterbi algorithm is usually used to search efficiently for the best path in the HMM. In this chapter, we also learned that the actual HMM, composed from the different levels of resources, can be huge. This leaves us in a difficult situation: even with the Viterbi algorithm, we need to search a huge graph with many possible hypotheses. So it is important to know something fundamental about how to speed up the process. In this tutorial, we explain the concept of beam pruning. Consider a search process where multiple paths are being scored in a time-synchronous manner. What one can do is to "prune", or remove, paths which have a very low score. How? Here the idea of beam pruning comes to our rescue. We use the score of the best path as a reference and remove all paths whose scores lag far behind the best path. Usually this process is controlled by a parameter called the beam. If the beam is wider, more paths are preserved and the search takes longer; if the beam is narrower, decoding is usually faster. In the Sphinx 3 decoder set, including decode, decode anytopo and livepretend, you can use the parameter -beam to control the number of paths you want to preserve. By default it is set to 1e-80; if the number becomes larger, the beam width becomes smaller. For example, you could consider using 1e-60 to increase the speed. Notice that there is a trade-off between the size of the beam and the accuracy you can get. Usually a tuning set is first defined and experiments are performed to determine the best beam width for a task; the next section will give you a better idea of how this can be done. A small sketch of the pruning step follows.
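This minimal sketch shows the core of beam pruning at one time frame. It is illustrative only and does not reflect the decoder's actual data structures; real decoders also work with log scores to avoid underflow, while this sketch uses linear likelihoods to match the 1e-80 style of the -beam parameter.

def prune(active_paths, beam=1e-80):
    """Keep only paths whose score is within 'beam' of the best path.

    active_paths: dict mapping a path (or state) to its likelihood score.
    A larger beam value (e.g. 1e-60) keeps fewer paths and is faster;
    a smaller value (e.g. 1e-100) keeps more paths and is slower.
    """
    best = max(active_paths.values())
    threshold = best * beam                 # scores below best * beam are removed
    return {p: s for p, s in active_paths.items() if s >= threshold}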

Beam pruning is not the only way the Sphinx 3 decoder set allows the user to speed up the recognizer. There are several other important features for fast recognition in Sphinx, including:

1. Fast GMM computation, including techniques such as frame down-sampling, context-independent model-based GMM selection (CIGMMS) and Gaussian selection.

2. A lexical tree implementation that compresses the search space by organizing it in a tree-based structure. Several interesting issues arise from this structure, and Sphinx provides practical solutions for them.

3. Different levels of pruning. The beam we mentioned in this chapter is only one type of beam used in practice. For example, many experiments find that phone-level and word-level beams are useful as well.

We advise users to look at Chapter 9 for details of these techniques.

4.6 Step 6: Evaluate your own system

It is not enough just to build a system. If you would like to know how to make it even better, evaluation of the system is necessary. One may separate the methods of evaluating a speech recognition system into two types, on-line and off-line. In on-line evaluation, a person actually calls or uses the system and sees whether it performs well. Usually the evaluator uses subjective judgment to decide whether the system performs well enough. To make the evaluation more objective, it is reasonable to ask the evaluator to carry out multiple tasks that test different functionalities of the system. In off-line evaluation, a collection of recordings from users is first obtained. These recordings are then used to test how the system performs. There are two generally accepted performance metrics for this type of evaluation: the sentence error rate and the word error rate. It is generally true that off-line evaluation with a large test set gives more informative and objective results. However, we should not ignore the usefulness of on-line evaluation, because some system-related issues can only be detected by actually using the system. In this section we discuss which mode of evaluation should be used for your application and, generally, how to carry it out.

4.6.1 On-line Evaluation

There are two distinct issues in testing a real-life speech recognition system: one is to test the speech recognizer's performance, and the other is to test the application-specific capabilities. In our opinion, on-line evaluation can be very useful for the latter but dangerous for the former. The general reason is this: on-line evaluation relies on the subjective judgment of the evaluator about whether the recognizer is good or not. Chances are that the acoustic and language models do cover most speakers of the target population, but the evaluator happens to be an outlier. The evaluator usually has no idea whether this is happening during the evaluation, so his opinion of the system can easily be biased by factors other than the strength of the system. The other danger of on-line evaluation is that the evaluator

could be the system programmer, who will then tend to evaluate the system using phrases which are easy to recognize. This can result in over-optimism about the system. Despite these problems, on-line evaluation is indispensable for getting a "feeling" for the system, such as its speed and stability, though it suffices to say that both can also be affected by other programs running in the background on the same machine.

4.6.2 Off-line Evaluation

Off-line evaluation is perhaps a more objective way to evaluate speech systems. Usually, a set of test data is first recorded, and then the sentence error rate (SER) or word error rate (WER) is computed. The sentence error rate is defined as the percentage of misrecognized sentences among all tested utterances. To take substitution, insertion and deletion errors into account, the word accuracy is usually defined as

A = (H − S − I − D) / H × 100%,     (4.1)

where H is the total number of words in the reference, S is the number of substitution errors, I is the number of insertion errors and D is the number of deletion errors. The word error rate is 100% minus this word accuracy. Usually, a dynamic string matching algorithm is used to decide the alignment between the reference transcription and the decoding output. One program we recommend is sclite, developed by NIST: ftp://jaguar.ncsl.nist.gov/pub/sctk-2.1-20050603-1454.tgz. Despite the soundness of the method, a particular collected test set may be biased towards a certain type of system. Therefore, system improvement should not be driven by optimizing on just one test set but on a variety of conditions.
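The sketch below shows the idea of the dynamic string matching step and Equation 4.1 in a few lines of Python. It is a minimal Levenshtein-style alignment, not a replacement for sclite, which also produces detailed error reports.

def word_accuracy(reference, hypothesis):
    """Align two word lists by dynamic programming and return word accuracy (%)."""
    ref, hyp = reference, hypothesis
    # cost[i][j] = minimum number of edits to turn ref[:i] into hyp[:j]
    cost = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        cost[i][0] = i                              # i deletions
    for j in range(len(hyp) + 1):
        cost[0][j] = j                              # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(sub,                   # substitution or exact match
                             cost[i - 1][j] + 1,    # deletion
                             cost[i][j - 1] + 1)    # insertion
    errors = cost[len(ref)][len(hyp)]               # S + I + D for the best alignment
    return 100.0 * (len(ref) - errors) / len(ref)

print(word_accuracy("i'd like to fly to seattle".split(),
                    "i'd like to fly seattle".split()))   # one deletion, about 83.3%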

4.7 Interlude

Now you have built your first system. You sweated to collect all the resources and incorporate them into the speech recognizer. You also evaluated it and got a number as a performance metric. This is usually not the most satisfactory result. As we explained, the model you used might have been trained under totally different conditions and might not be useful in your situation. For example, one common problem is that the only model you can obtain is a 16 kHz model, while the actual scenario requires an 8 kHz setup. What you can do is recalculate the positions of the filter banks, but that does not always give the best possible result. This is the time when you may want to train a new model. However, it is important to realize that training can be harder than you think. There are several stages that can take you a long time, and these are unfortunately not well understood by many novices in speech recognition.14 In general, the actual training procedure, compared with preparation steps such as data collection and transcription, is usually the least laborious. Therefore, before giving you any real technical advice, we recommend you be psychologically prepared for the process: no matter how detailed and careful our instructions are, training still takes quite an amount of time. In the rest of this chapter, we will go through the basic process of training. We will not cover everything and will point you to later chapters for further details. We will also consider alternatives to some of the laborious steps, and you can decide which way to go.

4.8 Step 7: Collection and Processing of Data

From now on, we train a model from scratch, from collecting data to creating the final acoustic models. So let us look at the process from the beginning: how data is collected in the first place in speech recognition. Essentially, what we need to do is just collect some data and record it as waveforms. The tricky parts are that

• It is better to collect as much data as possible.

14 Worst of all, when they have no one to blame, they blame the recognizer.

• The waveforms have to be recorded in conditions similar to those of the actual decoding.

4.8.1 Determine the amount of data needed

How much recording is needed? Most of the time, at least one hour of speech is required for a basic model. If the model is supposed to be used for speaker-independent speech recognition over a large population, more hours of speech are required. The amount of data that needs to be collected also depends on the number of parameters of the acoustic model. Here is a real-life example. Suppose every Gaussian distribution requires 100 data points to train. With a system of 1000 senones, each senone represented by 8 Gaussians, then, loosely speaking,15 800,000 frames of speech will be required. Assuming 1000 frames correspond to 10 s of speech (this depends on the frame rate), you will need 800 utterances composed solely of speech. Now assume the speech density is 50%; then 1600 utterances of waveforms will be needed. If you do the calculation, that is 1600 × 10 / (60 × 60) ≈ 4.5 hours of recorded audio. (A small sketch of this calculation appears below.) It is also important to make sure that the recordings are phonetically balanced, meaning that the amount of training data for each phone should be balanced. For example, following the example above, suppose you did collect 1600 utterances but all of them are "Hello"; in that case only 4 phones will be trained (and the number of triphones is also very limited). All the other phones in your phone list will be totally empty in your acoustic model, which usually causes the training process to behave abnormally one way or another.
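The back-of-the-envelope estimate above can be written down directly; the numbers are only the rough rule of thumb used in the text, not a guarantee of how much data a given task needs.

points_per_gaussian = 100
senones = 1000
gaussians_per_senone = 8
frames_per_second = 100          # depends on the frame rate of the front end
speech_density = 0.5             # fraction of each recording that is actually speech

frames_needed = points_per_gaussian * senones * gaussians_per_senone   # 800,000 frames
speech_seconds = frames_needed / frames_per_second                     # 8,000 s of speech
audio_hours = speech_seconds / speech_density / 3600                   # about 4.4 hours
print(audio_hours)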

4.8.2 Record the data in appropriate conditions: environmental noise

To explain what we mean by appropriate, consider the following very simple example: a model trained on data collected in an office room would likely perform badly in a condition where there is construction noise

15it’s loose because Baum-Welch algorithm will give every states a fractional counts for training.

around (and the same holds for the human auditory system), and vice versa. In general, the environment changes the characteristics of the speech. In signal processing terms, there are usually two different types of noise: additive noise and convolutive noise. Additive noise is usually caused by external sound sources such as construction noise, babble noise or gunshot noise. The sound wave is superimposed on the speech, causing the receiving machine to record speech that differs from the original source. Convolutive noise is also called channel noise. When a sound wave passes through a medium such as air, or is recorded through a microphone, its characteristics change simply because different frequency components of the sound wave respond differently to the channel; in other words, the channel acts as a filter on the signal. From the above discussion, it is easy to understand why most dictation engines' instructions tell users to record speech in a quiet environment.16 Some vendors even provide users with a particular microphone. Likely, this is not really because that microphone is better than others, but because that type of microphone was used consistently in the collection process. For developers, this implies that data collection could be the first challenge to deal with. For a desktop application, a quiet room and consistent use of the same type of microphone would suffice. It is also important to instruct users to speak to the microphone at a constant distance, especially with a headset microphone.

16 Algorithms exist that can account for additive noise.

4.9 Step 8: Transcription of data

After recording all the data, the next stage is to create a text description of the waveforms. So, how could we do this quickly? "Well, maybe we could use a speech recognizer to recognize what the data says," you may say. Of course, you will immediately see the chicken-and-egg nature of the problem: most of the time, we do not have a good recognizer in the first place. Hence transcription is usually done by a human being. In some sense, this is the stage of the work where humans need to be involved the most. Human transcription is a topic of its own, so it is difficult to give it a complete exposition here. Let us just touch on several aspects of the process.

4.9.1 Word-level and Phoneme-level Transcription

Transcription essentially tries to give the most accurate description of the waveforms. It is a relatively easy process at the word level, i.e. when the final transcription is a word sequence; then what you need is to find a person who knows the language you are working on and have them create the transcription. There are some situations where you need to create a phone-level transcription, for example to do research on accented speech for a particular population. In that situation you need an expert, for example at least a student in the field of phonetics or general linguistics, to help with your task.

4.9.2 Checking of Transcription

Unavoidably, transcription done by a single transcriber can have inconsistencies and problems. One common way to deal with this issue is to use another transcriber to double-check the transcription.

4.9.3 Software Tools for Transcription

Wavesurfer is great for analysing waveforms.

[ Editor Notes: We need a good recommendation of transcriber tools. I probably need to ask folks in sphinxmail]

4.9.4 Transcription for Acoustic Modeling and Language Modeling

After you have finished the transcription, it is important to format it for its different purposes. It is quite important to know that there are some important differences between transcriptions for acoustic model training and for language model training. For example, it is not common to add filler words to language model training data. Careful consideration should be given to every situation of transcript preparation.

4.9.5 SphinxTrain ’s and align Transcription Formats

Generally, both the Sphinx 3 and the SphinxTrain packages accept transcriptions in a similar format. Usually it looks like this:

<s> HELLO </s> (hublog.19990723.001.002)
<s> HELLO </s> (hublog.19990723.001.003)
<s> I'D LIKE TO FLY TO SEATTLE TOMORROW AFTERNOON </s> (hublog.19990723.001.004)
<s> HOW 'BOUT SOMETHING LATER </s> (hublog.19990723.001.005)

Notice two things here:

• <s> and </s> are by default used as the sentence begin and end markers of the utterance. Internally, they are also counted as silence.

• hublog.19990723.001.004, etc., are the utterance IDs; these need to match what you have in the control file used for processing. The control file will generally look like this:

hublog.19990721.000.002
hublog.19990721.000.003
hublog.19990721.000.004
hublog.19990721.000.005

[ Editor Notes: ARCHAN: How about the situation where the waveform is huge? ] So you might need to write a simple script to convert your transcription format into this one if you want to use SphinxTrain for training; a sketch of such a script is shown below. Because different tools have different purposes, the best transcription format for each of them also differs.
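The sketch below assumes the raw transcripts arrive as lines of the form "uttid<TAB>text", which is only one possible input format; adapt the parsing (and the output file names) to whatever your transcribers produced.

with open("raw_transcripts.txt") as raw, \
     open("train.transcription", "w") as trans, \
     open("train.ctl", "w") as ctl:
    for line in raw:
        uttid, text = line.rstrip("\n").split("\t", 1)
        # transcript line: <s> TEXT </s> (uttid), plus one control-file entry per utterance
        trans.write("<s> %s </s> (%s)\n" % (text.upper(), uttid))
        ctl.write(uttid + "\n")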

For all the tools in SphinxTrain, notably bw, it is recommended to put the most acoustically correct transcription into the transcripts, say something like

I'D LIKE TO(3) FLY TO(3) SEATTLE TOMORROW AFTERNOON (hublog.19990723.001.004)

The major reason the current bw requires the best transcription is that it does not automatically generate multiple pronunciations or optionally skip silences.17 To generate this "most correct" transcription, the tool align is necessary. However, the transcription it takes as input is different from bw's, and it looks like:

[ Editor Notes: ARCHAN: Am I correct? ] [ Editor Notes: Not very sure about the technical contents here. ] For align, one just needs to give it the simple format of transcripts before the forced alignment process, like

I'D LIKE TO FLY TO SEATTLE TOMORROW AFTERNOON (hublog.19990723.001.004)

17This is likely to change in 2006

4.10 Step (7 & 8) Plan B: Buying Data

[If you don’t want to sweat] Likely, after you have read the last two sections, you might be scared by the amount of effort you need to put on data recording and transcription. Fortunately, if you have money, this problem could be solved in an easier way, which is to buy data. The first place you could go to is Linguistic Data Consortium (LDC), their web site is http://www.ldc.upenn.edu/ At there, you could consider to buy some of the corpora and use it for acoustic model training. There are usually two price tag on the corpora, one for academic use, one for commercial use. Rightly, the price for aca- demic use is much lower than the one for commercial use. From what we know, LDC actually welcomed a lot the public to buy and support the corpora. Most of the time, your site (assuming you are working in a school) might have already been the member of LDC and ordering corpora will be much cheaper in that way. Try to ask the local expert on this issue. Other sites, for example, universities such as Carnegie Mellon Unversity (CMU), Oregon Graduate Institute (OGI) and Mississipi State University also open source data on their own. For detail, you could look them up in the individual web site. There are couple of things you need to know about the bought data. One is on the legal side, for example, there might be restriction on the use of the data, usually18, you could use the data obtained from LDC to train an acoustic model and distribute it for your purpose. However, it might also happen that it is not the case. So again, checking with the legal statement will give you the most. The second issue is on the transcription, even though most of the data are transcribed. It doesn’t mean the software you are using could handle the format used in the corpus. Hence, a procedure that transform this raw format transcription is usually necessary. This procedure is usually called text normalization. The text normalization procedure is actually one crucial process to get a good acoustic model. For example, small issue in handling filler words such as silence could have huge difference in resulting error rate. Unfor-

18 Please check the legal statement packaged with each CD and the web page for the individual situation.

Unfortunately, the details of these problems are usually specific to a task or a particular corpus, so we cannot discuss them at length here. We still hope, however, that users are aware of these issues.19

4.11 Training of the Acoustic Model: a note before we proceed

After data collection and transcription, it is probably time to really train a model. The first thing to understand about training is that it is never an easy task. Even if you have put a lot of time into preparation, there are still small problems you will need to solve; this statement is independent of the tool you use. In the Sphinx system there are indeed further concerns you need to be careful about. There are two ways to proceed in training, and both have their fans. However, when used incorrectly, both can be difficult. We will talk about both, so you can choose which way to go, and we will also discuss why people like or dislike each approach.

4.11.1 SphinxTrain as a Set of Training Tools

The major tool you will use in training is a package called SphinxTrain . You can download it using CVS:

$ cvs -d:pserver:[email protected]:/cvsroot/cmusphinx co SphinxTrain

Then you just need to compile it through the standard dance:

$ configure
$ make

Like Sphinx 3 , SphinxTrain is not a single tool. As a matter of fact, it is a package composed of 35 C-based tools and 9 perl scripts. The 9 perl scripts are also called the "standard recipe" of training because they call the low-level C-based tools in the training process. If used properly, the

19 From what we know, poor text normalization could be the reason why several open source dictation model building efforts have failed in the last couple of years.

"standard recipe" should guide you through the whole process and create a new set of models. So it is easy to see two paths one can take in training.

• To use the recipe provided in SphinxTrain, (Using the perl script).

• To use the C applications directly and build the whole process from scratch.

This book provides material on both paths, so it is up to you to decide which way to go. The next two sections describe what you need to do in each case.

4.11.2 Training following a recipe

Training following the "recipe" is, in essence, similar to training with a tool that is composed of both perl scripts and C code. If you have ever run an initialization script on Linux, you will have some idea of what is going on with the Sphinx recipe scripts. There are several reasons why people like to use a training script. First of all, using the training script is more convenient than building everything from scratch: you just need to change the configuration file a little to make sure things are correct. For many, this is very nice. Second, if you are trying to learn how to train a model, the training script will teach you a lot about training. What is the downside of using the training scripts, then? Admittedly, having training scripts is much better than having none. However, it does not mean the trainer (i.e. you) does not need to do anything to get the training process to complete. You still need to take care of many strange situations that can be caused by your own mistakes or by something that simply went wrong in the data or transcription. A lot of people think that the training script will solve all their problems; this is, of course, a naive point of view. The script should be treated more as a reference for how things should be done: in reality, the trainer needs to decide what needs doing, and most of the time, if problems are not taken care of correctly, the process will simply die prematurely. We cannot blame users too much, though, because the very existence of the training script gives them an illusion of simplicity. That is perhaps the major issue with the training script, or we should say with the temptation to

use the training script as is without making any changes; this does make people lazier.

4.11.3 Training from scratch

The next possibility is to train everything from scratch; this is something we do not usually recommend to novice users. The idea is to train the acoustic model using only the C commands provided by SphinxTrain. Doing so usually requires a fair level of skill on the platform (say Unix or Windows), and far fewer people have been able to do it. However, most people who succeed either find a job in speech recognition or become very familiar with the tools. The major reason is that the SphinxTrain commands developed in the past few years have become fairly robust and self-explanatory, and using them directly lets the user learn much more about the process than just running the script. The difficulty is obvious, but building your own scripts also means you can control parameters such as beams and thresholds, which can be crucial in the training procedure; most of the time, this is a major source of improvement over the standard recipe. There is one caveat about building one's own script which is seldom talked about but is worth mentioning: one should treat the scripting process as writing a solid program. Most of the time, an ad hoc script lacks the robustness of a well-tested script such as SphinxTrain 's standard recipe. Because a working training script tends to be copied around from user to user, an initial training script can become a bottleneck for a whole site. This is something to be careful about when starting to write a training script.

4.11.4 Our Recommendations

So which way should you go? Among the Sphinx Developers , the general consensus is to recommend that users follow the standard recipe first before they start to tweak the scripts. What we observed is that a lot of users still run into miscellaneous troubles. Therefore, what we provide on top of that is a standard training script that can run through the whole process for a specific task, or a "push-here dummy" (PhD) script in general. In the next section, we will describe one of the most common training scripts and how to use it.

Given the PhD script, it does not mean there is no need for the user to inspect the system from stage to stage for potential problems, or, even better, to customize some of the scripts appropriately to deal with individual situations. This will be the topic of Section ??.

4.12 Step 9 and beyond

At this revision, we ask the reader to take a look at the training tutorial page at cmusphinx.org first. In a later draft of this document, we will add more detail to this part.

Part III

CMU Sphinx Speech Recognition System

Chapter 5

Sphinx’s Front End

[ Editor Notes: Need to update this chapter for Dave's latest work in sphinxbase.] Author: Evandro Gouvêa, Mike Seltzer and Arthur Chan, Editor: Arthur Chan

5.1 Introduction

This chapter describes the signal processing front end of the Sphinx 3 speech recognition system. The front end transforms a speech waveform into a set of features to be used for recognition. The current implementation creates mel-frequency cepstral coefficients (MFCCs). The chapter is largely adapted from Dr. Mike Seltzer's "Sphinx 3 Signal Processing Front End Specification", which first appeared in the Sphinx 3 software package.

5.2 Block Diagram

Below is a block diagram of feature extraction operations performed by the Sphinx 3 front end.

Figure 5.1: Block diagram of the front-end processing. The 16-bit speech waveform is processed frame by frame through pre-emphasis, framing, windowing, power spectrum, mel spectrum and mel cepstrum stages, producing mel-frequency cepstral coefficients stored as 32-bit floating point numbers.

5.3 Front End Processing Parameters

Several parameters control the front end. Here is a description of all of them. We also give the corresponding option names used in wave2feat and in the front end of the Sphinx 3 decoders.

• Sampling rate: the sampling frequency of the input speech signal.

• Frame rate: the rate at which the analysis window moves, in frames per second.

• Window length: the duration of the moving analysis window, in seconds.

• Number of cepstra: the number of cepstral coefficients, including the energy coefficient, in the feature vector.

• Number of filters: The number of filters that would be used in the filter banks.

• Size of FFT: the number of points in the FFT.

• The "lower" filter frequency: the lower frequency of the lowest filter in the filter bank.

• The “upper” filter frequency: It is actually the upper frequency of the highest filter in the filter bank.

• The pre-emphasis coefficient: the value of α in the pre-emphasis filter. See Section 5.4.1.

There are two occasions where specifying the above parameters is very important: when you use wave2feat, and when you use the front end of Sphinx 3 . Table 5.1 shows the correspondence between the option names. For further details, you can also consult the appendix of the book.

5.4 Detail of Front End processing

5.4.1 Pre-emphasis

The following finite impulse response pre-emphasis filter is applied to input waveforms:

Table 5.1: Command-line options for the front end parameters in wave2feat and Sphinx 3 's decoders

Parameter                     wave2feat    livepretend or livedecode
Sampling rate                 -srate       -srate
Frame rate                    -frate       -frate
Window length                 -wlen        -wlen
Number of cepstra             -ncep        -ncep
Size of FFT                   -nfft        -nfft
Number of filters             -nfilt       -nfilt
Lower filter frequency        -lowerf      -lowerf
Upper filter frequency        -upperf      -upperf
Pre-emphasis coefficient      -alpha       -alpha

y[n] = x[n] − α x[n − 1]

α is provided by the user or set to the default value. If α = 0, this step is skipped. In addition, the appropriate sample of the input is stored as a history value for use during the next round of processing.
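A minimal NumPy sketch of this filter is shown below. The handling of the very first sample (using the first sample itself as the history value) is an assumption made only for the sketch; the real front end carries the history over from the previous block.

import numpy as np

def pre_emphasis(x, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]."""
    y = np.empty_like(x, dtype=float)
    y[0] = x[0] - alpha * x[0]      # history value assumed to be x[0] for this sketch
    y[1:] = x[1:] - alpha * x[:-1]
    return y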

5.4.2 Framing

Framing depends on the window length (in seconds) and the frame rate (in frames per second).

[ Editor Notes: Under construction]

5.4.3 Windowing

The frame is multiplied by the following Hamming window:

w[n] = 0.54 − 0.46 cos(2πn / (N − 1))

where N is the length of the frame.

5.4.4 Power Spectrum

The power spectrum of the frame is computed by performing a discrete Fourier transform of the length specified by the user, and then computing its magnitude squared:

S[k] = (Re(X[k]))^2 + (Im(X[k]))^2

5.4.5 Mel Spectrum

The mel spectrum of the power spectrum is computed by multiplying the power spectrum by each of the triangular mel weighting filters and integrating the result:

S~[l] = Σ_{k=0..N/2} S[k] M_l[k],    l = 0, 1, . . . , L − 1

where N is the length of the DFT and L is the total number of triangular mel weighting filters.

5.4.6 Mel Cepstrum

A DCT is applied to the natural logarithm of the mel spectrum to obtain the mel cepstrum:

c[n] = Σ_{i=0..L−1} ln(S~[i]) cos(πn(2i + 1) / (2L)),    n = 0, 1, . . . , C − 1

where C is the number of cepstral coefficients.
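The NumPy sketch below strings Sections 5.4.3 to 5.4.6 together for a single frame. It is only an illustration of the formulas: the construction of the mel filter matrix is omitted, and the real Sphinx implementation differs in details such as filter normalization and fixed defaults.

import numpy as np

def frame_to_mfcc(frame, mel_filters, num_ceps=13, nfft=512):
    """One pre-emphasized frame through windowing, power spectrum, mel spectrum and DCT.

    frame:       samples of one analysis window (length N).
    mel_filters: (L, nfft//2 + 1) matrix of triangular mel weighting filters M_l[k].
    """
    N = len(frame)
    window = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(N) / (N - 1))   # Hamming window
    spectrum = np.fft.rfft(frame * window, n=nfft)
    power = spectrum.real ** 2 + spectrum.imag ** 2                      # S[k]
    mel_spec = np.maximum(mel_filters.dot(power), 1e-10)                 # S~[l], floored to avoid log(0)
    L = mel_filters.shape[0]
    i = np.arange(L)
    # DCT of the log mel spectrum gives c[0..num_ceps-1]
    return np.array([np.sum(np.log(mel_spec) * np.cos(np.pi * n * (2 * i + 1) / (2 * L)))
                     for n in range(num_ceps)])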

5.5 Mel Filter Banks

The mel scale filterbank is a series of L triangular bandpass filters that have been designed to simulate the bandpass filtering believed to occur in the auditory system. This corresponds to a series of bandpass filters with constant bandwidth and spacing on a mel frequency scale. On a linear frequency scale, this filter spacing is approximately linear up to 1 kHz and logarithmic at higher frequencies. The following warping function transforms linear frequencies to mel frequencies:

mel(f) = 2595 log10(1 + f / 700)

A plot of the warping function is shown below.

[ Editor Notes: Should include Mike's diagram here] A series of L triangular filters with 50% overlap is constructed such that the filters are equally spaced on the mel scale, spanning [mel(fmin), mel(fmax)], where fmin and fmax can be set by the user. The triangles are all normalized so that they have unit area.

5.6 The Default Front End Parameters

These are the default values for the current Sphinx 3 front end:

Table 5.2: Default parameters for the front end in Sphinx 3

Parameter                       Default
Sampling rate                   16000.0 Hz
Frame rate                      100 frames/sec
Window length                   0.025625 sec
Filterbank                      Mel filterbank
Number of cepstra               13
Number of mel filters           40
DFT size                        512
Lower filter frequency          133.33334 Hz
Upper filter frequency          6855.4976 Hz
Pre-emphasis coefficient (α)    0.97

5.7 Computation of Dynamic Coefficients

Author: Arthur Chan, Editor: Arthur Chan

There are three major mechanisms for generating dynamic coefficients. For space efficiency, the dynamic coefficient computation is usually carried out in the routine that directly uses the features, for example the decoders decode or decode anytopo.

There are two sets of notation for describing the mechanism of dynamic coefficient generation, one for the decoder and one for the trainer. There is a one-to-one mapping between them; the following tables are indispensable for understanding the two notations.

The dynamic coefficient types s2 4x, s3 1x39 and 1s c d dd are commonly used in Sphinx 3 . The other coefficient types, 1s c, 1s c d and 1s c dd, are currently obsolete; the various recognizers currently guard against their use.

Table 5.3: Naming convention table for feature type s2 4x

Sphinx 2                   s2 4x
SphinxTrain, simplified    4s 12c 24d 3p 12dd
SphinxTrain, generic       c/1..L-1/,d/1..L-1/,c/0/d/0/dd/0/,dd/1..L-1/
Vector length              4 streams in parallel, of dimensions 12, 24, 3 and 12
Feature layout             12 ceps; 12 short-term ∆ ceps, 12 long-term ∆ ceps; pow, short-term ∆ pow, ∆∆ pow; 12 ∆∆ ceps

Table 5.4: Naming convention table for feature type s3 1x39

Sphinx 3                   s3 1x39
SphinxTrain, simplified    1s 12c 12d 3p 12dd
SphinxTrain, generic       c/1..L-1/d/1..L-1/c/0/d/0/dd/0/dd/1..L-1/
Vector length              39
Feature layout             12 ceps, 12 ∆ ceps, pow, ∆ pow, ∆∆ pow, 12 ∆∆ ceps

Table 5.5: Naming convention table for feature type 1s c d dd

Sphinx 3                   1s c d dd
SphinxTrain, simplified    1s c d dd
SphinxTrain, generic       c/0..L-1/d/0..L-1/dd/0..L-1/
Vector length              39
Feature layout             13 ceps (i.e. power + 12 cepstra), 13 ∆ ceps, 13 ∆∆ ceps

5.7.1 Generic and simplified notations in SphinxTrain

Because of code legacy, SphinxTrain actually maintains two sets of notations that are equivalent but written differently. The following table allows one to convert between them.

5.7.2 Formulae for Dynamic Coefficient

Let x[i] be the incoming cepstrum at time i, c[i] the cepstral vector, d[i] the delta cepstral vector (i.e. the first derivative), and dd[i] the delta-delta cepstral vector (i.e. the second derivative), all at time i. The dynamic coefficients for s3 1x39 and 1s c d dd are computed as:

c[i] = x[i]
d[i] = x[i + 2] − x[i − 2]
dd[i] = d[i + 1] − d[i − 1] = (x[i + 3] − x[i − 1]) − (x[i + 1] − x[i − 3])
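A direct NumPy transcription of these formulae is shown below. It is a sketch only: frames near the utterance boundaries are simply skipped here, whereas the real implementation handles them by frame replication as described in Section 5.7.3.

import numpy as np

def add_deltas(x):
    """Compute c, d and dd per the formulae above for one utterance.

    x: (T, num_ceps) array of incoming cepstra.
    """
    T = len(x)
    feats = []
    for i in range(3, T - 3):
        c = x[i]
        d = x[i + 2] - x[i - 2]
        dd = (x[i + 3] - x[i - 1]) - (x[i + 1] - x[i - 3])
        feats.append(np.concatenate([c, d, dd]))   # 1s c d dd layout: 13 + 13 + 13 = 39
    return np.array(feats)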

5.7.3 Handling of Initial and Final Position of an Utterance

The above formulae will require that, for every frame, the previous 3 frames and the future 3 frames exist. So, special treatment is required to pad 3 frames at the beginning of an utterance and 3 frames at the end of an

Table 5.6: Generic and simplified notations for parameters in SphinxTrain

Generic                                          Simplified              sphinx3
c/1..L-1/,d/1..L-1/,c/0/d/0/dd/0/,dd/1..L-1/     4s 12c 12d 3p 12dd      s2 4x
c/1..L-1/d/1..L-1/c/0/d/0/dd/0/dd/1..L-1/        1s 12c 12d 3p 12dd      s3 1x39
c/0..L-1/d/0..L-1/dd/0..L-1/                     1s c d dd               1s c d dd
c/0..L-1/d/0..L-1/                               1s c d                  -
c/0..L-1/                                        1s c                    -
c/0..L-1/dd/0..L-1/                              1s c dd                 -

utterance. Sphinx 3 .x takes care of it by replicating the first three frames of an utterance once. The sequence of features (i.e. c[i], d[i], dd[i]) will be delayed relative to the incoming cepstra. Moreover, the first couple of derivative coefficients are computed as if the sequence of frames had been converted from: x[0]x[1]x[2]x[3]x[4]x[5] . . . to: x[1]x[2]x[3]x[0]x[1]x[2]x[3]x[4]x[5] . . .

2

Therefore, when actually viewing the frames (say by dumping them with printf), the first frame will have zero delta and delta-delta coefficients, whereas the second frame will have zero delta coefficients.

5.8 Cautions in Using the Front End

Several common pitfalls of using the front end.

2 In older versions of Sphinx 3 (Sphinx 3 .5 and before), we used to follow the convention x[0]x[0]x[0]x[1]x[2]x[3] . . . instead. We changed mainly for consistency, and experimentally the new convention gave slightly better results.

• The user is advised to use dithering (-dither) for waveform decoding. In Sphinx 3 .X's wave2feat, this process is completely repeatable and is controlled by the option -seed. Dithering is useful because frames filled with zeros can cause the front end to run into a divide-by-zero condition.

• The user is strongly advised to match the front end parameters in both decoding and training. 3

5.9 Compatibility of Front Ends in Different Versions


The cepstral generation source code is shared by Sphinx 2 , Sphinx 3 and SphinxTrain . However, the dynamic coefficients routine in Sphinx 3 only supports types s3 1x39 and 1s c d dd. Moreover, Sphinx 2 is hardwired to use the type s2 4x. Therefore, if you are training models for Sphinx 2 , you will have to use this same type in SphinxTrain .

5.10 cepview

One useful tool for viewing cepstral coefficients is cepview. It can be found in Sphinx 3 .X (X > 5). For example, the following command can be used to view the feature file a.mfcc:

$ cepview -d 13 -describe 1 -header 1 -f a.mfcc

You will see

329 frames
frame#:  c[ 0]  c[ 1]  c[ 2]  c[ 3]  c[ 4]  c[ 5]  c[ 6]  c[ 7] ...
0:  5.916 -0.248  0.128  0.083 -0.143  0.254 -0.012 -0.327 ...
1:  5.668 -0.203  0.235  0.123 -0.184  0.145  0.031 -0.053 ...

3 There are workarounds if the sampling frequency used in training is higher than in decoding. However, the upper and lower frequencies of the filter bank need to be scaled for the decoding frequency in that case. Avoid this unless you know what you are doing.

2:  5.371 -0.159  0.271  0.001 -0.055  0.242 -0.011 -0.071 ...
3:  5.499 -0.182  0.301  0.071  0.093 -0.036 -0.263 -0.164 ...
4:  5.447 -0.307  0.241 -0.067 -0.071  0.177 -0.050 -0.070 ...
5:  5.515 -0.070  0.062 -0.113 -0.008  0.180 -0.079 -0.073 ...
6:  5.431 -0.107 -0.007 -0.133 -0.037  0.137 -0.145 -0.089 ...
7:  5.502 -0.094  0.162 -0.142 -0.047  0.077 -0.073 -0.086 ...

Note: for legacy reasons, mainly to let all the parameters fit in an 80-column console, the default number of parameters displayed is 10. Beware that this does not mean the cepstral vector has only 10 parameters.

Chapter 6

General Software Description of SphinxTrain

Author: Arthur Chan, Editor: Arthur Chan

The acoustic model training package, SphinxTrain , is a powerful set of routines that can be used to train several types of hidden Markov models for speech recognition. As of Aug 2005, SphinxTrain consists of 35 C-based applications and 9 perl scripts. For many practitioners, its sheer size is the main difficulty in learning it. This chapter provides an overview of the tools, and it is recommended that users read this chapter before reading Chapter 7, "Acoustic Model Training".

6.1 On-line Help of SphinxTrain

The tools of SphinxTrain have a coherent design: they behave the same at the application level, and there are common conventions that help users work with them. First of all, if the user invokes a command without any command-line arguments, a list of its command-line options is displayed. For example, for the tool printp, we would type

$ printp

[Switch]      [Default]  [Description]
-help         no         Shows the usage of the tool
-example      no         Shows example of how to use the tool
-tmatfn                  The transition matrix parameter file name
-mixwfn                  The mixture weight parameter file name
-mixws                   Start id of mixing weight subinterval
-mixwe                   End id of mixing weight subinterval
-gaufn                   A Gaussian parameter file name
-gaucntfn                A Gaussian parameter weighted vector file
-regmatcntfn             MLLR regression matrix count file
-moddeffn                The model definition file
-lambdafn                The interpolation weight file
-lambdamin    0          Print int. wt. >= this
-lambdamax    1          Print int. wt. <= this
-norm         yes        Print normalized parameters
-sigfig       4          Number of significant digits in 'e' notation

If the user needs further help, each routine provides two additional mechanisms: the option -help and the option -example. The first gives a written description of the tool. For example, printp -help yes will produce the current description of the tool:

./bin.i686-pc-linux-gnu/printp -help yes

Description:
Display numerical values of resources generated by Sphinx.
Currently we support the following formats:
-tmatfn    : transition matrix
-mixwfn    : mixture weight file
-gaufn     : mean or variance
-gaucntfn  : sufficient statistics for mean and diagonal covariance
-lambdafn  : interpolation weight

Currently, some parameters (such as the mixture weights) can be specified as intervals. You can also specify the number of significant digits with -sigfig and normalize the parameters with -norm. The second mechanism,

$ printp -example yes

will produce

Example:
Print the mean of a Gaussian:       printp -gaufn mean
Print the variance of a Gaussian:   printp -gaufn var
Print the sufficient statistics:    printp -gaucntfn gaucnt
Print the mixture weights:          printp -mixw mixw
Print the interpolation weight:     printp -lambdafn lambda

The help and example messages of SphinxTrain 's tools are also generated automatically and converted into the instruction manual found in Appendix A.2 of this manual.

6.2 Software Architecture of SphinxTrain

Author: Arthur Chan, Editor: Arthur Chan

Based on the C tools, a 9-step procedure usually called the "SphinxTrain perl scripts" was built; the individual steps are listed in Section 6.2.2.

6.2.1 General Description of the C-based Applications

As of Aug 2005, there are 35 tools in SphinxTrain [1]:

agg_seg, bldtree, bw, cp_parm, delint, dict2tri, inc_comp, init_gau, init_mixw, kmeans_init, make_quests, map_adapt, mixw_interp, mk_flat, mk_mdef_gen, mk_mllr_class, mk_model_def, mk_s2cb, mk_s2hmm, mk_s2phone, mk_s2phonemap, mk_s2sendump, mk_s3gau, mk_s3mixw, mk_s3tmat, mk_ts2cb, mllr_solve, mllr_transform, norm, param_cnt, printp, prunetree, tiestate, wave2feat


It is hard to describe them one by one without a proper categorization. Here is one categorization that is widely used.

1. Model conversion routines: these include mk_s2cb, mk_s2hmm, mk_s2phone, mk_s2phonemap and mk_s2sendump, which convert models from the Sphinx 3 format to the Sphinx 2 format, and mk_s3gau, mk_s3mixw, mk_s3tmat and mk_ts2cb, which convert models from the Sphinx 2 format to the Sphinx 3 format.

2. Model manipulation routines: these include printp, which can display acoustic model parameters as well as some important data structures; cp_parm, which can copy model parameters; inc_comp, which can increase the number of mixture components; and mk_model_def and mk_mdef_gen, which can create the model definition file.

3. Model adaptation routines: mk_mllr_class, which can create regression classes; mllr_transform, which can transform a set of models based on a set of regression classes; map_adapt, which can adapt a set of models based on the maximum a posteriori (MAP) criterion; and mllr_solve, which can compute the regression matrices based on sufficient statistics of the adaptation data.

4. Feature extraction routine: wave2feat, which can transform a wave file into cepstral coefficients.

[1] The tool QUICK_COUNT also exists, but it has largely been replaced by dict2tri.

5. Decision tree manipulation routines: bldtree, which can create a decision tree based on a set of questions; tiestate, which can tie states based on the decision trees; prunetree, which can prune a decision tree to a more reasonable size; and make_quests, which can automatically generate a set of questions.

6. Model initialization routines: mk_flat, which can be used for flat initialization; init_gau and init_mixw, which can be used to initialize the Gaussian and mixture-weight parameters; and agg_seg and kmeans_init, which can be used for codebook initialization (vector quantization).

7. Model estimation routines: bw, which can be used to carry out Baum-Welch estimation and generate the sufficient statistics; norm, which can collect the sufficient statistics and generate the model; delint, which can carry out deleted interpolation; and mixw_interp, which can carry out mixture weight interpolation.

8. Miscellaneous routines: dict2tri, which can convert the transcription to triphones, and param_cnt, which can count the parameters from several resources.

6.2.2 General Description of the Perl-based Tool

00.verify
01.vector_quantize
02.ci_schmm
03.makeuntiedmdef
04.cd_schmm_untied
05.buildtrees
06.prunetree
07.cd-schmm
08.deleted-interpolation
09.make_s2_models

The current scripts in SphinxTrain can carry out both fully continuous HMM training and semi-continuous HMM training. The user is advised to first follow a training recipe before starting from scratch. One well-maintained tutorial can be found at http://www.cs.cmu.edu/~robust/Tutorial/opensource.html

6.3 Basic Acoustic Models Format in SphinxTrain

In all of the Sphinxes (Sphinx 2, Sphinx 3 and Sphinx 4), the basic acoustic model unit is the HMM set. However, for efficiency, the HMM set is usually decomposed into several components in its representation. This is quite different from other software packages such as HTK and the ISIP recognizer. For example, in Sphinx 3, the models are represented by a set of means, variances, mixture weights, transition matrices and the model definition. This method of decomposition allows researchers to manipulate just one type of quantity instead of all the files. You can find more information on the acoustic model format in the next section.

6.4 Sphinx data and model formats

6.4.1 Sphinx 2 data formats

Note: the data format described here is obsolete.

Feature set: This is a binary file with all the elements in each of the vectors stored sequentially. The header is a 4-byte integer which tells us how many floating point numbers there are in the file. This is followed by the actual cepstral values (usually 13 cepstral values per frame, with a 10 ms skip between adjacent frames; the frame size is usually fixed, typically 25 ms):

<4 byte integer header>
vec 1 element 1
vec 1 element 2
.
.
vec 1 element 13
vec 2 element 1
vec 2 element 2
.
.
vec 2 element 13
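As an illustration of the layout just described, the following minimal Python sketch reads such a file. It is not part of the toolkit; the 13-dimensional frame size and the byte order are assumptions, and real files may need byte swapping depending on the machine that wrote them.

import struct

def read_s2_feat(path, veclen=13, byte_order=">"):
    """Return a list of frames, each a list of `veclen` floats."""
    with open(path, "rb") as f:
        data = f.read()
    # 4-byte integer header: number of floats that follow
    (n_floats,) = struct.unpack(byte_order + "i", data[:4])
    values = struct.unpack(byte_order + "%df" % n_floats, data[4:4 + 4 * n_floats])
    # Group the flat list of values into frames of `veclen` coefficients
    return [list(values[i:i + veclen]) for i in range(0, n_floats, veclen)]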

6.4.2 Sphinx 2 Model formats

Author: Rita Singh and Arthur Chan, Editor: Arthur Chan

Sphinx 2 semi-continuous HMM (SCHMM) formats: The Sphinx 2 SCHMM format is rather complicated. It has the following main components (each of which has sub-components):

1. A set of codebooks

2. A ”sendump” file that stores state (senone) distributions

3. A "phone" and a "map" file which map senones on to the states of a triphone

4. A set of ”.chmm” files that store transition matrices

Here is a description of each.

Codebooks

There are 8 codebook files. Sphinx 2 uses a four-stream feature set:

• cepstral feature: [c1-c12], (12 components)

• delta feature: [delta_c1-delta_c12, longterm_delta_c1-longterm_delta_c12] (24 components)

• power feature: [c0, delta_c0, doubledelta_c0] (3 components)

• doubledelta feature: [doubledelta_c1-doubledelta_c12] (12 components)

The 8 codebook files store the means and variances of all the Gaussians for each of these 4 features. The 8 codebooks are:

• cep.256.vec [this is the file of means for the cepstral feature]

• cep.256.var [this is the file of variances for the cepstral feature]

• d2cep.256.vec [this is the file of means for the delta cepstral feature]

• d2cep.256.var [this is the file of variances for the delta cepstral feature]

• p3cep.256.vec [this is the file of means for the power feature]

• p3cep.256.var [this is the file of variances for the power feature]

• xcep.256.vec [this is the file of means for the double delta feature]

• xcep.256.var [this is the file of variances for the double delta feature]

All files are binary and have the following format:

[4 byte int][4 byte float][4 byte float][4 byte float]......

The 4-byte integer header stores the number of floating point values to follow in the file. For cep.256.var, cep.256.vec, xcep.256.var and xcep.256.vec this value should be 3328. For d2cep.* it should be 6400, and for p3cep.* it should be 768. The floating point numbers are the components of the mean vectors (or variance vectors) laid end to end. So cep.256.[vec,var] have 256 mean (or variance) vectors, each 13 dimensions long; d2cep.256.[vec,var] have 256 mean/var vectors, each 25 dimensions long; p3cep.256.[vec,var] have 256 vectors, each of dimension 3; and xcep.256.[vec,var] have 256 vectors of length 13 each. The 0th component of the cep, d2cep and xcep distributions is not used in likelihood computation and is part of the format for purely historical reasons.

The “sendump” file

The "sendump" file stores the mixture weights of the states associated with each phone. (This file has a small ASCII header, which might help you a little.) Except for the header, this is a binary file. The mixture weights have all been transformed to 8-bit integers by the following operation:

intmixw = (-log(float_mixw)) >> shift

The log base is 1.0003. The "shift" is the number of bits the smallest mixture weight has to be shifted right to fit in 8 bits. The sendump file stores:

for each feature (4 features in all)
  for each codeword (256 in all)
    for each ci-phone (including noise phones)
      for each tied state associated with the ci-phone
        probability of codeword in tied state
      end
      for each CI state associated with the ci-phone (5 states)
        probability of codeword in CI state
      end
    end
  end
end

The sendump file has the following storage format (all data, except for the header strings, are binary):

Length of header as 4 byte int (including terminating '\0')
HEADER string (including terminating '\0')
0 (as 4 byte int, indicates end of header strings)
256 (codebook size, 4 byte int)
Num senones (total number of tied states, 4 byte int)
[lut[0], (4 byte integer, lut[i] = -(i << shift))
 prob of codeword[0] of feat[0] 1st CD sen of 1st ciphone (uchar)
 prob of codeword[0] of feat[0] 2nd CD sen of 1st ciphone (uchar)
 ..
 prob of codeword[0] of feat[0] 1st CI sen of 1st ciphone (uchar)
 prob of codeword[0] of feat[0] 2nd CI sen of 1st ciphone (uchar)
 ..
 prob of codeword[0] of feat[0] 1st CD sen of 2nd ciphone (uchar)
 prob of codeword[0] of feat[0] 2nd CD sen of 2nd ciphone (uchar)
 ..
 prob of codeword[0] of feat[0] 1st CI sen of 2nd ciphone (uchar)
 prob of codeword[0] of feat[0] 2nd CI sen of 2nd ciphone (uchar)
 ..
]
[lut[1], (4 byte integer)
 prob of codeword[1] of feat[0] 1st CD sen of 1st ciphone (uchar)
 prob of codeword[1] of feat[0] 2nd CD sen of 1st ciphone (uchar)
 ..
 prob of codeword[1] of feat[0] 1st CD sen of 2nd ciphone (uchar)
 prob of codeword[1] of feat[0] 2nd CD sen of 2nd ciphone (uchar)
 ..
]
... 256 times ..
The above repeats for each of the 4 features.
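To make the 8-bit quantization concrete, here is a minimal Python sketch (ours, not the SphinxTrain code) of the operation intmixw = (-log(mixw)) >> shift with log base 1.0003; the helper names are assumptions.

import math

LOG_BASE = 1.0003

def quantize_mixw(weights):
    """Convert floating-point mixture weights to 8-bit codes plus a shift."""
    neglogs = [int(-math.log(w) / math.log(LOG_BASE)) for w in weights]
    shift = 0
    while (max(neglogs) >> shift) > 255:       # fit the largest code in 8 bits
        shift += 1
    return [n >> shift for n in neglogs], shift

def dequantize_mixw(codes, shift):
    """Approximately invert the quantization (used when loading the file)."""
    return [LOG_BASE ** (-(c << shift)) for c in codes]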

PHONE file

The phone file stores a list of phones and triphones used by the decoder. It is an ASCII file with two sections. The first section lists the CI phones in the models. For example:

AA 0 0 8 8

"AA" is the CI phone, the first "0" indicates that it is a CI phone, the first 8 is the index of the CI phone, and the last 8 is the line number in the file. The second 0 is there for historical reasons.

The second section lists TRIPHONES and consists of lines of the format

A(B,C)P -1 0 num num2

"A" stands for the central phone, "B" for the left context, and "C" for the right context phone. The "P" stands for the position of the triphone and can take 4 values "s", "b", "i", and "e", standing for single word, word beginning, word internal, and word ending triphone. The -1 indicates that it is a triphone and not a CI phone. num is the index of the CI phone "A", and num2 is the position of the triphone (or ciphone) in the list, essentially the number of the line in the file (beginning with 0).

The “map” file

The "map" file stores a mapping table that shows which senone each state of each triphone is mapped to. This is also an ASCII file with lines of the form

AA(AA,AA)s<0> 4
AA(AA,AA)s<1> 27
AA(AA,AA)s<2> 69
AA(AA,AA)s<3> 78
AA(AA,AA)s<4> 100

The first line indicates that the 0th state of the triphone "AA" in the context of "AA" and "AA" is modelled by the 4th senone associated with the CI phone AA. Note that the numbering is specific to the CI phone. So the 4th senone of "AX" would also be numbered 4 (but this should not cause confusion).

chmm FILES

There is one *.chmm file per CI phone. Each stores the transition matrix associated with that particular CI phone in the following binary format. (Note that all triphones associated with a CI phone share its transition matrix.) (All numbers are 4-byte integers.)

• -10 (a header to indicate this is a tmat file)

• 256 (no of codewords)

• 5 (no of emitting states)

• 6 (total no. of states, including non-emitting state)

• 1 (no. of initial states. In fbs8 a state sequence can only begin with state[0]. So there is only 1 possible initial state)

• 0 (list of initial states. Here there is only one, namely state 0)

• 1 (no. of terminal states. There is only one non-emitting terminal state)

• 5 (id of terminal state. This is 5 for a 5 state HMM)

• 14 (total no. of non-zero transitions allowed by topology)

These are followed by the transition entries (source, dest, transition prob, source id):

[0 0 (int)log(tmat[0][0]) 0]
[0 1 (int)log(tmat[0][1]) 0]
[1 1 (int)log(tmat[1][1]) 1]
[1 2 (int)log(tmat[1][2]) 1]
[2 2 (int)log(tmat[2][2]) 2]
[2 3 (int)log(tmat[2][3]) 2]
[3 3 (int)log(tmat[3][3]) 3]
[3 4 (int)log(tmat[3][4]) 3]
[4 4 (int)log(tmat[4][4]) 4]
[4 5 (int)log(tmat[4][5]) 4]
[0 2 (int)log(tmat[0][2]) 0]
[1 3 (int)log(tmat[1][3]) 1]
[2 4 (int)log(tmat[2][4]) 2]
[3 5 (int)log(tmat[3][5]) 3]

There are thus 65 integers in all, and so each *.chmm file should be 65*4 = 260 bytes in size.
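Following the layout above (nine header integers plus 14 four-integer transition entries), a minimal Python sketch of a *.chmm reader might look like this. It is an illustration under the stated assumptions, not a SphinxTrain tool, and the byte order is an assumption.

import struct

def read_chmm(path, byte_order=">"):
    with open(path, "rb") as f:
        ints = struct.unpack(byte_order + "65i", f.read(260))
    header = {
        "magic": ints[0],            # -10, marks a transition-matrix file
        "n_codewords": ints[1],      # 256
        "n_emit_states": ints[2],    # 5
        "n_states": ints[3],         # 6, including the non-emitting state
        "n_initial": ints[4], "initial": ints[5],
        "n_terminal": ints[6], "terminal": ints[7],
        "n_transitions": ints[8],    # 14
    }
    # Each transition entry: [source, dest, (int)log(prob), source id]
    transitions = [ints[9 + 4 * k: 13 + 4 * k] for k in range(header["n_transitions"])]
    return header, transitions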

6.4.3 Sphinx 3 model formats

Sphinx 3 models are composed of 5 parts

• Model Definition

• Means

• Variances

• Mixture Weights

• Transition Matrices

Headers of all binary formats

Model Definition

[ Editor Notes: Plainly copied from the recipe chapter, need to rewrite]

Here is an example:

0.3
55 n_base
118004 n_tri
472236 n_state_map
2165 n_tied_state
165 n_tied_ci_state

55 n_tied_tmat
#
# Columns definitions
#base lft rt p attrib tmat ... state id's ...
+BACKGROUND+ - - - filler 0 0 1 2 N
+BREATH+ - - - filler 1 3 4 5 N
+CLICKS+ - - - filler 2 6 7 8 N
+COUGH+ - - - filler 3 9 10 11 N
+FEED+ - - - filler 4 12 13 14 N
+LAUGH+ - - - filler 5 15 16 17 N
+NOISE+ - - - filler 6 18 19 20 N
+SMACK+ - - - filler 7 21 22 23 N
+UH+ - - - filler 8 24 25 26 N
+UHUH+ - - - filler 9 27 28 29 N
+UM+ - - - filler 10 30 31 32 N
AA - - - n/a 11 33 34 35 N
AE - - - n/a 12 36 37 38 N
AH - - - n/a 13 39 40 41 N
AO - - - n/a 14 42 43 44 N
AW - - - n/a 15 45 46 47 N
AX - - - n/a 16 48 49 50 N
AXR - - - n/a 17 51 52 53 N
AY - - - n/a 18 54 55 56 N
B - - - n/a 19 57 58 59 N
CH - - - n/a 20 60 61 62 N
D - - - n/a 21 63 64 65 N
DH - - - n/a 22 66 67 68 N
DX - - - n/a 23 69 70 71 N
EH - - - n/a 24 72 73 74 N
ER - - - n/a 25 75 76 77 N
EY - - - n/a 26 78 79 80 N

F - - - n/a 27 81 82 83 N
G - - - n/a 28 84 85 86 N
HH - - - n/a 29 87 88 89 N
...

The first few lines describe the number of parameters in the whole system.

0.3

0.3 is the model format version.

55 n_base

n_base is the number of base phones, including fillers, in the system. In this case, the number is 55.

118004 n_tri

n_tri is the number of triphones in the system. In this case, the number is 118004.

472236 n_state_map
2165 n_tied_state

n_tied_state is the number of tied states (senones) in the system. In this case, the number is 2165.

165 n_tied_ci_state

n_tied_ci_state is the number of tied CI states in the system. In this case, the number is 165.

55 n_tied_tmat

n_tied_tmat is the number of tied transition matrices in the system. In this case, the number is 55.

Now let us try to interpret the following line:

#base lft rt p attrib tmat ... state id's ...
+BACKGROUND+ - - - filler 0 0 1 2 N

From left to right, it reads: the base phone is +BACKGROUND+, with no left context and no right context, so it is a CI phone. It is a filler. Its transition matrix ID is 0, its first state has senone ID 0, its second state 1, and its third state 2.

It is quite useful to know how an HMM state maps to a senone (in HTK terminology, the tied state). In the above example, if you know that the senone ID of the first state is 0, you just need to look up the means and variances files to get their values.

AA - - - n/a 11 33 34 35 N

Here is another entry in the model definition file. This time, we see a typical CI phone entry. If you used the standard training procedure of SphinxTrain (which you will also have a chance to play with in the second half of this chapter), then every state of a CI phone HMM has its own senone, unique to it. Compare this with a triphone definition:

AA AA AA s n/a 11 177 190 216 N
AA AA AE s n/a 11 177 190 216 N
AA AA AH s n/a 11 177 190 216 N
AA AA AO s n/a 11 177 190 216 N
AA AA AW s n/a 11 177 190 216 N
AA AA AX s n/a 11 177 190 216 N
AA AA AXR s n/a 11 177 196 216 N
AA AA AY s n/a 11 177 190 216 N
AA AA B b n/a 11 178 189 219 N
AA AA B s n/a 11 177 189 217 N
AA AA CH s n/a 11 177 189 216 N
AA AA D b n/a 11 178 198 219 N
AA AA D s n/a 11 177 198 214 N

You will see that the states of multiple phones are actually mapped to the same senone ID. Why does that happen? This is essentially the result of the tying that happens when training the context-dependent model. There is almost never enough training data for every triphone model, so it is necessary to cluster the HMM states so that the data can be used more efficiently. Most of the time, clustering is done either by decision-tree-based clustering or by bottom-up agglomerative clustering.
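As a small illustration of how the state-to-senone mapping in the model definition file can be read, here is a minimal Python sketch. The field layout follows the example above (base lft rt p attrib tmat, then the state IDs, then a trailing N); the function name is ours and this is not a SphinxTrain tool.

def read_mdef(path):
    senone_map = {}                      # (base, lft, rt, posn) -> [senone ids]
    with open(path) as f:
        lines = [l.strip() for l in f if l.strip() and not l.startswith("#")]
    # The first 7 non-comment lines are the version (0.3) and the counts
    # n_base, n_tri, n_state_map, n_tied_state, n_tied_ci_state, n_tied_tmat.
    for line in lines[7:]:
        fields = line.split()
        base, lft, rt, posn = fields[0:4]
        senones = [int(s) for s in fields[6:-1]]   # state ids, dropping trailing 'N'
        senone_map[(base, lft, rt, posn)] = senones
    return senone_map

# e.g. senone_map[("AA", "-", "-", "-")] would give [33, 34, 35] for the
# CI phone AA in the example above.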

Means, Variances, Mixture Weights and Transition Matrices

[ Editor Notes: Need to fill in more information here. ] Please read the recipe chapter for further details.

119 6.4.4 Sphinx 4 model formats

A common way to create a Sphinx 4 model is to generate a Sphinx 3 model with SphinxTrain and convert it to the Sphinx 4 format. One can find detailed information at http://cmusphinx.sourceforge.net/sphinx4/doc/UsingSphinxTrainModels.html

Chapter 7

Acoustic Model Training

[ Editor Notes: At this draft, this part is not yet finished. We would like the user to read the training page in cmusphinx.org first. ]

Chapter 8

Language Model Training

Author: Roni Rosenfeld and Philip Clarkson, Editor: Arthur Chan

[ Editor Notes: Plain copy from Philip’s web site. Pretty dangerous to just open source it. ]

8.1 Installation of the Toolkit

For "big-endian" machines (e.g. those running HP-UX, IRIX, SunOS, Solaris) the installation procedure is simply to change into the src/ directory and type

$ make install

The executables will then be copied into the bin/ directory, and the library file SLM2.a will be copied into the lib/ directory.

For "little-endian" machines (e.g. those running Ultrix, Linux) the variable BYTESWAP_FLAG will need to be set in the Makefile. This can be done by editing src/Makefile directly, so that the line

#BYTESWAP_FLAG = -DSLM_SWAP_BYTES

is changed to

BYTESWAP_FLAG = -DSLM_SWAP_BYTES

Then the program can be installed as before. If you are unsure of the "endian-ness" of your machine, then the shell script endian.sh should be able to provide some assistance.

In case of problems, more information can be found by examining src/Makefile.

Before building the executables, it might be worth adjusting the value of STD_MEM in the file src/toolkit.h. This value controls the default amount of memory (in MB) that the programs will attempt to assign for the large buffers used by some of the programs (this value can, of course, be overridden at the command line). The result is that the final process sizes will be a few MB bigger than this value. The more memory that can be grabbed, the faster the programs will run. The default value is 100, but if the machines on which the tools will be run contain less, or much more, memory than this, then this value should be adjusted to reflect this.

8.2 Terminology and File Formats

Text stream (.text): An ASCII file containing text. It may or may not have markers to indicate context cues, and white space can be used freely.

Word frequency file (.wfreq): An ASCII file containing a list of words, and the number of times that they occurred. This list is not sorted; it will generally be used as the input to wfreq2vocab, which does not require sorted input.

Word n-gram file (.w3gram, .w4gram, etc.): ASCII file containing an alphabetically sorted list of n-tuples of words, along with the number of occurrences.

Vocabulary file (.vocab.20K, .vocab.60K, etc., depending on the size of the vocabulary): ASCII file containing a list of vocabulary words. Comments may also be included - any line beginning ## is considered a comment. The vocabulary is limited in size to 65535 words.

Context cues file (.ccs): ASCII file containing the list of words which are to be considered "context cues". These are words which provide useful context information for the n-grams, but which are not to be predicted by the language model. Typical examples would be <s> and <p>, the begin sentence and begin paragraph tags.

Id n-gram file (.id3gram.bin, .id4gram.ascii, etc.): ASCII or binary (by default) file containing a numerically sorted list of n-tuples of numbers, corresponding to the mapping of the word n-grams relative to the vocabulary. Out of vocabulary (OOV) words are mapped to the number 0.

Binary language model file (.binlm): Binary file containing all the n-gram counts, together with discounting information and back-off weights. It can be read by evallm and used to generate word probabilities quickly.

ARPA language model file (.arpa): ASCII file containing the language model probabilities in ARPA-standard format.

Probability stream (.fprobs): ASCII file containing a list of probabilities (one per line). The probabilities correspond to the probability for each word in a specific text stream, with context cues and OOVs removed.

Forced back-off file (.fblist): ASCII file containing a list of vocabulary words from which to enforce back-off, together with either an 'i' or an 'e' to indicate inclusive or exclusive forced back-off respectively.

These files may all be written and read by all the tools in compressed or uncompressed mode. Specifically, if a filename is given a .Z extension, then it will be read from the specified file via a zcat pipe, or written via a compress pipe. If a filename is given a .gz extension, it will be read from the specified file via a gunzip pipe, or written via a gzip pipe. If either of these compression schemes is to be used, then the relevant tools (i.e. zcat, and compress or gzip) must be available on the system, and pointed to by the path. If a filename argument is given as - then it is assumed to represent either the standard input or standard output (according to context). Any file read from the standard input is assumed to be uncompressed, and therefore all desired compression and decompression should take place in a pipe:

$ zcat < abc.Z | abc2xyz | compress > xyz.Z

8.3 Typical Usage

Given a large corpus of text in a file a.text, but no specified vocabulary.

• Compute the word unigram counts $ cat a.text | text2wfreq > a.wfreq

• Convert the word unigram counts into a vocabulary consisting of the 20,000 most common words $ cat a.wfreq | wfreq2vocab -top 20000 > a.vocab

• Generate a binary id 3-gram of the training text, based on this vocab- ulary $ cat a.text | text2idngram -vocab a.vocab > a.idngram

• Convert the idngram into a binary format language model $ idngram2lm -idngram a.idngram -vocab a.vocab -binary a.binlm

• Compute the perplexity of the language model, with respect to some test text b.text

$ evallm -binary a.binlm
Reading in language model from file a.binlm
Done.
evallm : perplexity -text b.text

Computing perplexity of the language model with respect to the text b.text
Perplexity = 128.15, Entropy = 7.00 bits
Computation based on 8842804 words.
Number of 3-grams hit = 6806674 (76.97%)
Number of 2-grams hit = 1766798 (19.98%)
Number of 1-grams hit = 269332 (3.05%)
1218322 OOVs (12.11%) and 576763 context cues were removed from the calculation.
evallm : quit

Alternatively, some of these processes can be piped together:

$ cat a.text | text2wfreq | wfreq2vocab -top 20000 > a.vocab
$ cat a.text | text2idngram -vocab a.vocab | idngram2lm -vocab a.vocab -idngram - -binary a.binlm -spec_num 5000000 15000000
$ echo "perplexity -text b.text" | evallm -binary a.binlm

8.4 Discounting Strategies

Discounting is the process of replacing the original counts with modified counts so as to redistribute the probability mass from the more commonly observed events to the less frequent and unseen events. If the actual number of occurrences of an event E (such as a bigram or trigram occurrence) is c(E), then the modified count is d(c(E))c(E), where d(c(E)) is known as the discount ratio.

8.4.1 Good Turing discounting

Good-Turing discounting defines d(r) = ((r+1) n(r+1)) / (r n(r)), where n(r) is the number of events which occur r times. The discounting is only applied to counts which occur fewer than K times, where typically K is chosen to be around 7. This is the "discounting range", which is specified using the -disc_ranges parameter of the idngram2lm program.

For further details see "Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer", Slava M. Katz, in "IEEE Transactions on Acoustics, Speech and Signal Processing", volume ASSP-35, pages 400-401, March 1987.
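As an illustration of the formula above (and not of the idngram2lm implementation), a minimal Python sketch of the Good-Turing discount ratios might look like this; the function name and the small worked example are ours.

from collections import Counter

def good_turing_ratios(counts, K=7):
    n = Counter(counts)                       # n[r] = number of events seen r times
    ratios = {}
    for r in range(1, K):
        if n[r] > 0 and n[r + 1] > 0:
            ratios[r] = (r + 1) * n[r + 1] / (r * n[r])
        else:
            ratios[r] = 1.0                   # no data to estimate a discount
    return ratios

# Example: event counts [1,1,1,1,2,2,3] give n(1)=4, n(2)=2, n(3)=1,
# so d(1) = 2*2/(1*4) = 1.0 and d(2) = 3*1/(2*2) = 0.75.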

8.4.2 Witten Bell discounting

The discounting scheme which we refer to here as "Witten-Bell discounting" is that which is referred to as type C in "The Zero-Frequency Problem: Estimating the Probabilities of Novel Events in Adaptive Text Compression", Ian H. Witten and Timothy C. Bell, in "IEEE Transactions on Information Theory", Vol 37, No. 4, July 1991. The discounting ratio is not dependent on the event's count, but on t, the number of types which followed the particular context. It defines d(r, t) = n / (n + t), where n is the size of the training set in words. This is equivalent to setting P(w|h) = c / (n + t) (where w is a word, h is the history and c is the number of occurrences of w in the context h) for events that have been seen, and P(w|h) = t / (n + t) for unseen events.

Absolute discounting

Absolute discounting defines d(r) = (r - b) / r. Typically b = n(1) / (n(1) + 2 n(2)). The discounting is applied to all counts. This is, of course, equivalent to simply subtracting the constant b from each count.

8.4.3 Linear discounting

Linear discounting defines d(r) = 1 - n(1)/C, where C is the total number of events. The discounting is applied to all counts.

For further details of both linear and absolute discounting, see "On structuring probabilistic dependencies in stochastic language modeling", H. Ney, U. Essen and R. Kneser, in "Computer Speech and Language", volume 8(1), pages 1-28, 1994.
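For completeness, a minimal Python sketch of the absolute and linear discount ratios defined above; again an illustration only, not the toolkit's implementation, and the function names are ours.

def absolute_ratio(r, n1, n2):
    """d(r) = (r - b)/r with b = n(1) / (n(1) + 2*n(2))."""
    b = n1 / (n1 + 2.0 * n2)
    return (r - b) / r

def linear_ratio(n1, C):
    """d(r) = 1 - n(1)/C, the same for every count r."""
    return 1.0 - n1 / float(C)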

Chapter 9

Search structure and Speed-up of the speech recognizer

9.1 Introduction

Author: Arthur Chan, Ravi Mosur, Editor: Arthur Chan

[ Editor Notes: This is largely adapted from Ravi Mosur’s code-walk document file.]

In this chapter, we will describe the search structure of the recognizers in Sphinx 3. There are several recognizers in the standard Sphinx 3 distribution package. Each of them can be used for a particular purpose, and it is very important to understand their differences and to apply each of them correctly. For example, s3.X decode_anytopo in 2004 was designed to be used as an offline batch-mode recognizer. It is designed to be accurate, and speed optimization was intentionally not applied to it. These are all done by design. If you accidentally used this recognizer for a real-time application, I am sure you would lose your clients. :-)

9.2 Sphinx 3.X's recognizer general architecture

Conceptually, all recognizers in Sphinx 3.X share the same recognition engine. One can always view Sphinx 3's search as two separate but interrelated components:

1. GMM computation module
2. Graph search module

At every frame, the GMM computation module computes all the active GMMs marked by the graph search module. The graph search module then makes use of the GMM scores to carry out the graph search. Since there is a pruning mechanism in the search, only part of the GMMs will be found to be active in the next frame, and the information on whether a GMM is active is used by the GMM computation module. In a first-pass system, this separation of the GMM computation and search modules has been found to be very useful. The major reason is that techniques can be applied to the individual modules and speed performance can be benchmarked separately. This usually provides valuable insights for further improvement of the system.

The history of speech recognizer development shows that there are two major, inter-related goals of speech recognizer design. One is to optimize the accuracy of the recognizer. The other is to allow a known, imperfect recognizer to be used in a practical scenario. Usually, this practical scenario imposes constraints on the design of the recognizer, such that the recognizer has to run with a limited amount of computational resources (such as the amount of random access memory and the computation power). For example, a NIST evaluation, a kind of competition between research sites, requires each site to provide its best performing recognition system. This basically requires fulfilling the first goal. Usually, these systems run at around 20x-100x real time (RT). Sphinx 3 was implemented following a similar historical trend: the Sphinx 3.0 recognizer (or s3slow, s3 accurate) is designed to be an accurate but perhaps slow recognizer. In 2004, running s3 on the Wall Street Journal 5000-word task required 10-12xRT of computation [1]. It is the most accurate among the Sphinx 3 family of recognizers, whereas Sphinx 3.X (or s3fast) can run the task in 0.94xRT with around 5% relative degradation.

[1] 5000 senones, trigram.

It is therefore understandable why two implementations of the speech recognition engine exist in the first place. s3slow is always meant for research purposes and allows the most accurate results. s3fast is always meant to be a fast and perhaps application-specific recognizer. Hence, one finds only a small number of speed optimization techniques in s3slow, whereas s3fast employs different techniques in general and allows the user to choose among them. There exist customized modifications of each of the recognizers. For example, Arthur Chan has written a version of s3 which employs fast GMM computation. This was unfortunately not suitable to be incorporated into the main trunk of s3slow. One can get this information from the Sphinx developer's web page www.cs.cmu.edu/∼archan/ The conceptual model described above is not fixed; from time to time we will revise it and possibly incorporate more interesting features into the recognizers.

9.3 Importance of tuning

It is a widespread misunderstanding that a single default generic speech model can be used in all situations. It is also incorrect to assume that the default parameters used in speech recognition are the best in general. First-class speech recognition systems are tuned intensively for both speed and accuracy. In this chapter, apart from describing the principles of the search, we will also describe the parameters that can affect its operation. Internal to CMU, these parameters were tuned through exhaustive trial-and-error testing using a small test set. So how about you? We recommend that you read the following description before you start tuning. Usually, tuning is done one parameter at a time. It is also possible to devise a kind of multi-dimensional search for this kind of problem.

9.4 Sphinx 3.X's decode_anytopo (or s3slow)

In one sentence, decode_anytopo is a generic trigram speech recognizer which accepts multi-stream continuous HMMs with tied mixtures and with any HMM topology. As of Sep 2004, you can get this recognizer with

$ cvs -d:pserver:[email protected]:/cvsroot/cmusphinx co archive_s3/

What this means is that one can use either standard fully continuous HMMs (1-stream, tied state) or semi-continuous HMMs (4-stream, tied mixture) as the input model. The user is allowed to specify the grammar in the form of an N-gram. It is very generic and can be used in most situations. We will describe several aspects of decode_anytopo in this chapter. Note that decode_anytopo doesn't contain much optimization in the code; as we already mentioned, its purpose is more for benchmarking.

9.4.1 GMM Computation

The Gaussian mixture models (GMMs, or senones in CMU's terminology) are computed as they are; usually only the active GMMs are computed at every frame. One special aspect of the GMM computation is that its output is a scaled integer. Back in 1995, this was the only way that log values could be computed efficiently. Internally, the code is also optimized using loop unrolling [2]. Other than these two techniques, there is no approximation at all.
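To illustrate the scaled-integer log representation mentioned above, here is a minimal Python sketch of converting likelihoods to integer scores in a log base close to 1 (1.0003 is the s3 default described later); the function names are ours and this is only an illustration, not the decoder's code.

import math

LOG_BASE = 1.0003

def to_logs3(p):
    """Convert a linear probability/likelihood to an integer log score."""
    return int(round(math.log(p) / math.log(LOG_BASE)))

def from_logs3(score):
    return LOG_BASE ** score

# Multiplying likelihoods becomes adding integer scores:
# to_logs3(0.5) + to_logs3(0.25) is approximately to_logs3(0.125)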

9.4.2 Search structure

The data structure for including different words in the search is arranged as a flat lexicon. This is the standard way to arrange the lexicon and does not have the adverse effect on accuracy that the tree lexicon has (as described in Section ?). However, not being able to tie the prefixes of different words together roughly doubles the time spent traversing the lattice. This is part of the reason why decode_anytopo is at least 2 times slower than decode. The score accumulation in the search is based on integers. Again, the reason is that computing the log value of a number was best done using lookup tables when the recognizer was first implemented. Even now, computing log values using the standard C library is still not as fast as an integer lookup table.

[2] Loop unrolling is a C programming technique that tries to take advantage of parallel scheduling in modern central processors. It is a common way to optimize code in a for loop.

9.4.3 Treatment of language model

decode_anytopo can take care of trigrams in a single-pass search. However, an exhaustive search over full trigrams is extremely expensive because it always requires storing the two previous words at any particular time. It would actually cause the storage required in the search to become n^2 instead of n. Instead of doing so, Sphinx 3 uses a so-called poor man's trigram in decoding. The idea is that for a given word w(n), only one single previous word w(n-1) is stored. For w(n-2), only the most likely previous word derived from the Viterbi search is used. In early experiments back at the time of the Wall Street Journal (WSJ) evaluations (93-96), this assumption was found to be a very good approximation (less than 5% degradation of accuracy). It has therefore been employed and used until nowadays.

9.4.4 Triphone representation

[Under construction]

9.4.5 Viterbi Pruning

Although decode_anytopo is designed to be used for evaluation, it also implements standard pruning features in the search. The major reason is that people long ago realized that a fully exhaustive Viterbi search will explore a lot of very unlikely paths. Viterbi pruning is the simplest and most effective way to reduce the number of paths searched. There are two beams used in decode_anytopo:

• -beam Main pruning beam applied to triphones in forward search

• -nwbeam Pruning beam applied in forward search upon word exit

9.4.6 Search Tuning

There are several important parameters in the search that can affect the accuracy.

• -langwt Language weight: empirical exponent applied to LM probability

• -ugwt LM unigram weight: unigram probs interpolated with uniform distribution with this weight

• -inspen Word insertion penalty

• -silpen Language model ’probability’ of silence word

• -noisepen Language model 'probability' of each non-silence filler word

• -fillpenfn Filler word probabilities input file (used in place of -silpen and -noisepen)

9.4.7 2-nd pass Search

One can use decode_anytopo as a building block of a so-called multi-pass system. This is based on generating a word lattice of hypotheses instead of a single hypothesis; rescoring is then done based on the lattice. The options that control these properties are:

• -bestpath Whether to run a bestpath DAG search after the forward Viterbi search

• -dagfudge (0..2); 1 or 2: add edge if endframe == startframe; 2: if start == end-1

• -bestpathlw Language weight for bestpath DAG search (default: same as -langwt)

• -inlatdir Input word-lattice directory with per-utt files for restricting words searched

• -inlatwin Input word-lattice words starting within +/- <this arg> of current frame considered during search

• -outlatdir Directory for writing word lattices (one file/utterance); optional ,NODES suffix to write only the nodes

• -latext Word-lattice filename extension (.gz or .Z extension for compression)

• -bestscoredir Directory for writing best score/frame (used to set beam width; one file/utterance)

We will describe the behavior of the 2nd-pass search in Section ?.

9.4.8 Debugging

There are many situations in which you would like to get internal information from the Sphinx 3.0 recognizer:

• -hmmdumpsf Starting frame for dumping all active HMMs (for debugging/diagnosis/analysis)

• -worddumpsf Starting frame for dumping all active words (for debugging/diagnosis/analysis)

• -logfn Log file (default stdout/stderr)

• -backtrace Whether detailed backtrace information (word segmentation, scores) is shown in the log

9.5 Sphinx 3 .X’s decode (aka s3 fast)

The so-called Sphinx 3.X decode, or s3fast, was implemented 4 years after the implementation of decode_anytopo. The major motivation in those days was to allow a recognizer to run in less than 10xRT for the Broadcast News task. That target was fulfilled at the time, and it is estimated that, with proper tuning, Broadcast News may be able to run at less than 1xRT within the next 5 years, even using a single-pass search [3]. In 2004, it is safe to assume that the 1-pass search can be tuned to run at less than 1xRT for systems with fewer than 10000 words and an LM perplexity of around 80. The major reason why it can be much faster than decode_anytopo is that many techniques have been applied to speed up the search. This section will describe each of these techniques in detail.

9.6 Architecture of Search in decode

Author: Ravi Mosur and Arthur Chan, Editor: Arthur Chan

[ Editor Notes: this part largely copied from Ravi’s sphinx 3.3 codewalk]

[3] Writing in Sep 2004; at that time, a PC equipped with a 3 GHz Pentium could produce 3-4xRT performance in s3.X.

9.6.1 Initialization

The decoder is configured during the initialization step, and the configuration holds for the entire run. This means, for example, that the decoder does not dynamically reconfigure the acoustic models to adapt to the input. To choose another example, there is no mechanism in this decoder to switch language models from utterance to utterance, unlike in Sphinx 2. The main initialization steps are outlined below.

Log-Base Initialization: Sphinx 3 performs all likelihood computations in the log domain. Furthermore, for computational efficiency, the base of the logarithm is chosen such that the likelihoods can be maintained as 32-bit integer values. Thus, all the scores reported by the decoder are log-likelihood values in this peculiar log base. The default base is typically 1.0003, and can be changed using the -logbase configuration argument. The main reason for modifying the log base would be to control the length (duration) of an input utterance before the accumulated log-likelihood values overflow the 32-bit representation, causing the decoder to fail catastrophically. The log base can be changed over a wide range without affecting the recognition.

Models Initialization: The lexical, acoustic, and language models specified via the configuration arguments are loaded during initialization. This set of models is used to decode all the utterances in the input. (The language model is actually only partly loaded, since s3.X uses a disk-based LM strategy. It is also possible to load the LM into memory by setting -lminmemory to 1.)

Effective Vocabulary: After the models are loaded, the effective vocabulary is determined. It is the set of words that the decoder is capable of recognizing. Recall that the decoder is initialized with three sources of words: the main and filler lexicon files, and the language model. The effective vocabulary is determined from them as follows:

• Find the intersection of the words in the LM and the main pronunci- ation lexicon

• Include all the alternative pronunciations to the set derived above (using the main lexicon)

• Include all the filler words from the filler lexicon, but excluding the distinguished beginning and end of sentence words: <s> and </s>.

The effective vocabulary remains in effect throughout the batch run. Currently, it is not possible to add to or remove from this vocabulary dynamically, unlike in the Sphinx 2 system.

136 9.6.2 Lexicon representation

A pronunciation lexicon (or dictionary) file specifies word pronunciations. In Sphinx 3, pronunciations are specified as a linear sequence of phonemes. Each line in the file contains one pronunciation specification, except that any line that begins with a "#" character in the first column is treated as a comment and is ignored. Example dictionary for digits:

ZERO Z IH R OW
ONE W AH N
TWO T UW
THREE TH R IY
FOUR F AO R
FIVE F AY V
SIX S IH K S
SEVEN S EH V AX N
EIGHT EY TD
NINE N AY N

The lexicon is completely case-insensitive (unfortunately). For example, it's not possible to have two different entries Brown and brown in the dictionary.

Multiple Pronunciations

A word may have more than one pronunciation, each one on a separate line. They are distinguished by a unique parenthesized suffix for the word string. For example:

ACTUALLY AE K CH AX W AX L IY
ACTUALLY(2nd) AE K SH AX L IY
ACTUALLY(3rd) AE K SH L IY

If a word has more than one pronunciation, its first appearance must be the unparenthesized form. For the rest, the parenthesized suffix may be any string, as long as it is unique for that word. There is no other significance to the order of the alternatives; each one is considered to be equally likely.

Compound Words

In Sphinx 3, the lexicon may also contain compound words. A compound word is usually a short phrase whose pronunciation happens to differ significantly from the mere concatenation of the pronunciations of its constituent words. Compound word tokens are formed by concatenating the component word strings with an underscore character; e.g.:

WANT_TO W AA N AX

(The s3.X decoder, however, treats a compound word as just another word in the language, and does not do anything special with it.)
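As a small illustration of the lexicon conventions just described (comments, case-insensitivity and parenthesized alternate pronunciations), here is a minimal Python sketch of a dictionary parser; it is ours, not part of Sphinx.

import re

def read_dict(path):
    prons = {}                                    # word -> list of phone lists
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):  # '#' in column 1 is a comment
                continue
            fields = line.split()
            # Strip an alternate-pronunciation suffix such as "(2nd)"
            word = re.sub(r"\(.*\)$", "", fields[0]).upper()
            prons.setdefault(word, []).append(fields[1:])
    return prons

# prons["ACTUALLY"] would then hold all three pronunciations of ACTUALLY.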

Tree representation

The decoder constructs lexical trees from the effective vocabulary described above. Separate trees are constructed for words in the main and filler lexicons. Furthermore, several copies may be instantiated for the two, depending on the -Nlextree configuration argument. The lexical tree representation is motivated by the fact that most active HMMs are word-initial models, with the number decaying rapidly afterwards (e.g., on a 60K-word Hub-4 task, roughly 55% of active HMMs are word-initial). But the number of distinct word-initial model types is much smaller:

START S-T-AA-R-TD
STARTING S-T-AA-R-DX-IX-NG
STARTED S-T-AA-R-DX-IX-DD
STARTUP S-T-AA-R-T-AX-PD
START-UP S-T-AA-R-T-AX-PD

Therefore, it makes sense to use a so-called prefix tree to represent the lexicon and to maximize sharing among words. [Under construction] [Insert p.21 of Ravi's slides here]
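To make the sharing idea concrete, here is a minimal Python sketch of a prefix tree over pronunciations: words that share a prefix of phones share the corresponding tree nodes, so the word-initial models of START, STARTING, STARTED, and so on are represented only once. This is only an illustration of the data structure, not the s3.X implementation.

def build_lextree(prons):
    """prons: dict word -> list of phones.  Returns a nested-dict tree."""
    root = {}
    for word, phones in prons.items():
        node = root
        for ph in phones:
            node = node.setdefault(ph, {})     # shared node for a shared prefix
        node.setdefault("__words__", []).append(word)   # word ends here
    return root

tree = build_lextree({
    "START":    ["S", "T", "AA", "R", "TD"],
    "STARTING": ["S", "T", "AA", "R", "DX", "IX", "NG"],
    "STARTED":  ["S", "T", "AA", "R", "DX", "IX", "DD"],
})
# All three words share the single S -> T -> AA -> R chain near the root.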

Triphone representation

[Under construction] [Insert p.22 and p.23 of Ravi’s slides at here]

Integration with language model

[Under construction] [Insert p.24-32 of Ravi’s slides at here]

LM lookahead

One of the keys to fast search is to make sure that high-level information can be used as soon as possible. This is the idea behind LM look-ahead.

Unigram LM probabilities are factored into the tree using the technique described in "Spoken Language Dialogue". [Need to create reference.]

9.6.3 Language model

The main language model (LM) used by the Sphinx decoder is a conventional bigram or trigram backoff language model. The CMU-Cambridge LM Toolkit is capable of generating such a model from LM training data. Its output is an ASCII text file. But a large text LM file can be very slow to load into memory. To speed up this process, the LM must be compiled into a binary form. The code to convert from an ASCII text file to the binary format is available at SourceForge in the CVS tree, in a module named share.

Unigrams, Bigrams, Trigrams, LM Vocabulary

A trigram LM primarily consists of the following:

• Unigrams: The entire set of words in this LM, and their individual probabilities of occurrence in the language. The unigrams must include the special beginning-of-sentence and end-of-sentence tokens: <s> and </s>, respectively.

• Bigrams: A bigram is mathematically P(word2 | word1). That is, the conditional probability that word2 immediately follows word1 in the language. An LM typically contains this information for some subset of the possible word pairs. That is, not all possible word1 word2 pairs need be covered by the bigrams.

• Trigrams: Similar to a bigram, a trigram is P(word3 | word1, word2), or the conditional probability that word3 immediately follows a word1 word2 sequence in the language. Not all possible 3-word combinations need be covered by the trigrams.

The vocabulary of the LM is the set of words covered by the unigrams. The LM probability of an entire sentence is the product of the individual word probabilities. For example, the LM probability of the sentence "HOW ARE YOU" is:

P(HOW | <s>) * P(ARE | <s>, HOW) * P(YOU | HOW, ARE) * P(</s> | ARE, YOU)
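The product above can be sketched in a few lines of Python. Here trigram_prob is a stand-in for a real LM lookup (which would also apply back-off weights for unseen n-grams); the function name is ours.

import math

def sentence_logprob(words, trigram_prob):
    """words: e.g. ["HOW", "ARE", "YOU"]; trigram_prob(w, hist) -> P(w | hist)."""
    tokens = ["<s>"] + words + ["</s>"]
    logp = 0.0
    for i in range(1, len(tokens)):
        history = tuple(tokens[max(0, i - 2):i])   # up to two previous words
        logp += math.log(trigram_prob(tokens[i], history))
    return logp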

Pronunciation and Case Considerations

In Sphinx 3, the LM cannot distinguish between different pronunciations of the same word. For example, even though the lexicon might contain two different pronunciation entries for the word READ (present and past tense forms), the language model cannot distinguish between the two. Both pronunciations would inherit the same probability from the language model. Secondly, the LM is case-insensitive. For example, it cannot contain two different tokens READ and read. The reasons for the above restrictions are historical. Precise pronunciation and case information has rarely been present in LM training data. It would certainly be desirable to do away with the restrictions at some time in the future.

Binary LM File

The binary LM file (also referred to as the LM dump file) is more or less a disk image of the LM data structure constructed in memory. This data structure was originally designed during the Sphinx 2 days, when efficient memory usage was the focus. In Sphinx 3, however, memory usage is no longer an issue since the binary file enables the decoder to use a disk-based LM strategy. That is, the LM binary file is no longer read entirely into memory. Rather, the portions required during decoding are read in on demand, and cached. For large vocabulary recognition, the memory-resident portion is typically about 10-20% of the bigrams, and 5-10% of the trigrams. Since the decoder uses a disk-based LM, it is necessary to have efficient access to the binary LM file. Thus, network access to an LM file at a remote location is not recommended. It is desirable to have the LM file resident on the local machine. The binary dump file can be created from the ASCII form using the lm3g2dmp utility, which is part of the Sphinx 2 distribution, and also available as standalone code, as mentioned before. (The header of the dump file itself contains a brief description of the file format.)

Silence and Filler Words

Language models typically do not cover acoustically significant events such as silence, breath noise, or the UM and UH sounds made by a person hunting for the right phrase. These are known generally as filler words, and are excluded from the LM vocabulary. The reason is that a language model training corpus, which is simply a lot of text, usually does not include such information.

Since the main trigram LM ignores silence and filler words, their "language model probability" has to be specified in a separate file, called the filler penalty file. The format of this file is very straightforward; each line contains one word and its probability, as in the following example:

++UH++ 0.10792
++UM++ 0.00866
++BREATH++ 0.00147

The filler penalty file is not required. If it is present, it does not have to contain entries for every filler word. The decoder allows a default value to be specified for filler word probabilities (through the -fillprob configuration argument), and a default silence word probability (through the -silprob argument).

Like the main trigram LM, filler and silence word probabilities are obtained from appropriate training data. However, training them is considerably easier since they are merely unigram probabilities. Filler words are invisible or transparent to the trigram language model. For example, the LM probability of the sentence "HAVE CAR ++UH++ WILL TRAVEL" is:

P(HAVE | <s>) * P(CAR | <s>, HAVE) * P(++UH++) * P(WILL | HAVE, CAR) * P(TRAVEL | CAR, WILL) * P(</s> | WILL, TRAVEL)

Language Weight and Word Insertion Penalty

During recognition the decoder combines both acoustic likelihoods and language model probabilities into a single score in order to compare various hypotheses. This combination of the two is not just a straightforward product. In order to obtain optimal recognition accuracy, it is usually necessary to exponentiate the language model probability using a language weight before combining the result with the acoustic likelihood. (Since likelihood computations are actually carried out in the log domain in the Sphinx decoder, the LM weight becomes a multiplicative factor applied to the LM log-probabilities.) The language weight parameter is typically obtained through trial and error. In the case of Sphinx, the optimum value for this parameter has usually ranged between 6 and 13, depending on the task at hand.

Similarly, though with lesser impact, it has also been found useful to include a word insertion penalty parameter, which is a fixed penalty for each new word hypothesized by the decoder. It is effectively another multiplicative factor in the language model probability computation (before the application of the language weight). This parameter has usually ranged between 0.2 and 0.7, depending on the task.
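In the log domain, the combination described above amounts to a weighted sum. The following minimal Python sketch illustrates it; the function and parameter names are ours, the default values are merely typical points in the ranges quoted above, and in s3 the scores are scaled integers rather than floats.

import math

def combined_score(acoustic_logprob, lm_logprob, n_words, lw=9.5, wip=0.7):
    """Acoustic log-likelihood + language weight * LM log-prob + insertion penalty."""
    return acoustic_logprob + lw * lm_logprob + n_words * math.log(wip)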

9.6.4 Pruning

Each entry in the control file, or utterance, is processed using the given input models, and using the Viterbi search algorithm. In order to constrain the active search space to computationally manageable limits, pruning is employed, which means that the less promising hypotheses are continually discarded during the recognition process. There are two kinds of pruning in s3.X, beam pruning and absolute pruning.

Viterbi Pruning or Beam Pruning

Each utterance is processed in a time-synchronous manner, one frame at a time. At each frame the decoder has a number of currently active HMMs to match with the next frame of input speech. But it first discards or deactivates those whose state likelihoods are below some threshold, relative to the best HMM state likelihood at that time. The threshold value is obtained by multiplying the best state likelihood by a fixed beamwidth. The beamwidth is a value between 0 and 1, the former permitting all HMMs to survive, and the latter permitting only the best scoring HMMs to survive. Similar beam pruning is also used in a number of other situations in the decoder, e.g., to determine the candidate words recognized at any time, or to determine the component densities in a mixture Gaussian that are

closest to a given speech feature vector. The various beamwidths have to be determined empirically and are set using configuration arguments.
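A minimal Python sketch of the beam test described above: since scores are log-likelihoods, multiplying the best likelihood by the beamwidth becomes adding log(beam). The data structures here are illustrative, not the decoder's internal ones.

import math

def prune(active_hmms, beam=1e-60):
    """active_hmms: dict hmm_id -> best state log-likelihood for this frame."""
    best = max(active_hmms.values())
    threshold = best + math.log(beam)          # e.g. the effect of -beam 1e-60
    return {h: s for h, s in active_hmms.items() if s >= threshold}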

Histogram Pruning or Absolute Pruning

Absolute Pruning. Even with beam pruning, the number of active entities can sometimes become computationally overwhelming. If there are a large number of HMMs that fall within the pruning threshold, the decoder will keep all of them active. However, when the number of active HMMs grows beyond certain limits, the chances of detecting the correct word among the many candidates are considerably reduced. Such situations can occur, for example, if the input speech is noisy or quite mismatched to the acoustic models. In such cases, there is no point in allowing the active search space to grow to arbitrary extents. It can be contained using pruning parameters that limit the absolute number of active entities at any instant. These parameters are also determined empirically, and set using configuration arguments.

• -beam: Determines which HMMs remain active at any given point (frame) during recognition. (Based on the best state score within each HMM.)

• -pbeam: Determines which active HMM can transition to its succes- sor in the lexical tree at any point. (Based on the exit state score of the source HMM.)

• -wbeam: Determines which words are recognized at any frame during decoding. (Based on the exit state scores of leaf HMMs in the lexical trees.)

• -maxhmmpf: Determines the number of HMMs (approx.) that can remain active at any frame.

• -maxwpf: Controls the number of distinct words recognized at any given frame.

• -maxhistpf: Controls the number of distinct word histories recorded in the backpointer table at any given frame.

• -subvqbeam: For each senone and its underlying acoustic model, determines its active mixture components at any frame.

In order to determine the pruning parameter values empirically, it is first necessary to obtain a test set, i.e., a collection of test sentences not used in any training data. The test set should be sufficiently large to ensure statistically reliable results. For example, a large-vocabulary task might require a test set that includes a half-hour of speech, or more. It is difficult to tune a handful of parameters simultaneously, especially when the input models are completely new. The following steps may be followed to deal with this complex problem.

1. To begin with, set the absolute pruning parameters to large values, making them essentially ineffective. Set both -beam and -pbeam to 1e-60, and -wbeam to 1e-30. Set -subvqbeam to a small value (e.g., the same as -beam). Run the decoder on the chosen test set and obtain accuracy results. (Use default values for the LM related pa- rameters when tuning the pruning parameters for the first time.)

2. Repeat the decoder runs, varying -beam up and down, until the set- ting for best accuracy is identified. (Keep -pbeam the same as -beam every time.)

3. Now vary -wbeam up and down and identify its best possible set- ting (keeping -beam and -pbeam fixed at their most recently obtained value).

4. Repeat the above two steps, alternately optimizing -beam and -wbeam, until convergence. Note that during these iterations -pbeam should always be the same as -beam. (This step can be omitted if the accu- racy attained after the first iteration is acceptable.)

5. Gradually increase -subvqbeam (i.e., towards 1.0 for a narrower set- ting), stopping when recognition accuracy begins to drop noticeably. Values near the default are reasonable. (This step is needed only if a sub-vector quantized model is available for speeding up acoustic model evaluation.)

6. Now gradually increase -pbeam (i.e., towards 1.0), stopping when recognition accuracy begins to drop noticeably. (This step is optional; it mainly optimizes the computational effort a little more.)

7. Reduce -maxhmmpf gradually until accuracy begins to be affected. Repeat the process with -maxwpf, and then with -maxhistpf. (How- ever, in some situations, especially when the vocabulary size is small, it may not be necessary to tune these absolute pruning parameters.)

In practice, it may not always be possible to follow the above steps strictly. For example, considerations of computational cost might dictate that the absolute pruning parameters or the parameters of the GMM computation be tuned first. Let us assume we use sub-vector quantization as the GMM computation technique; the following is a practical way to tune the parameters. [4]

9.6.5 Phoneme look-ahead

[Add Jahanzeb’s idea of phoneme lookahead at here.]

9.7 Architecture of GMM Computation in decode

The technique described in this section can be found in "Four-Level Categorization Scheme of Fast GMM Computation Techniques in Large Vocabulary Continuous Speech Recognition Systems" by Arthur Chan. One can treat the computation of GMMs as a 4-level process:

• Frame-level

• GMM-level

• Gaussian-level

• Component-level

Approximation can be done at every level such that accuracy will not be sacrificed. Several of these techniques have been implemented in Sphinx 3.X.

9.7.1 Frame-level optimization

Algorithms that decide whether a frame's GMM scores should be computed or skipped are categorized as frame-layer algorithms. In our discussion, we will assume the score of a skipped frame is copied from the most recently computed frame [5].

[4] 20040927 (Arthur): This section is copied from Ravi's codewalk. It will later be expanded so that -ci_pbeam and -ds will also be employed and used for tuning.
[5] This implementation can preserve transition information.

The simplest example is to compute frame scores only every other frame; we call this simple down-sampling (SDS). One can apply this technique in Sphinx 3.X by specifying -ds N. Other than SDS, we also evaluate another scheme in which a VQ codebook is trained from all the means of the GMMs of a set of trained acoustic models. Then, in decoding, every frame's feature vector is quantized using that codebook. A frame is skipped only if its feature vector was quantized to the same codeword as that of the previous frame. We call this method VQ-based Down-Sampling (VQDS). It can be used with -cond_ds; if this flag is specified, the flag -gs also needs to be supplied. Comparatively speaking, SDS provides more gain in speed but causes some degradation; it is ideal for users who seek speed.
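A minimal Python sketch of simple down-sampling: senone scores are computed only every ds frames and copied from the most recently computed frame otherwise (the effect of "-ds N"). Here compute_senone_scores stands in for the real GMM evaluation; the names are ours.

def score_frames(frames, compute_senone_scores, ds=2):
    scores, last = [], None
    for t, feat in enumerate(frames):
        if t % ds == 0 or last is None:
            last = compute_senone_scores(feat)   # full GMM computation
        scores.append(last)                      # skipped frames reuse old scores
    return scores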

9.7.2 GMM-level optimization

Algorithms that ignore some GMMs in the computation of each computed frame are assigned to the GMM layer. One representative technique is context-independent (CI) GMM-based GMM selection (CIGMMS), experimented with by Lee et al. in Julius. Briefly, the algorithm can be implemented as follows. At every frame, the CI GMMs' scores are first computed and a beam is applied to these scores. For each context-dependent (CD) GMM, if the corresponding CI GMM's score is within the beam, the detailed CD GMM score is computed. If not, the CD GMM's score is backed off to the corresponding CI GMM's score. This scheme is highly effective in reducing GMM computation. However, because some scores are backed off, they become identical when they are fed into the search module. As a consequence, beam pruning becomes less effective. This technique can be used by specifying -ci_pbeam. The range that can be specified is between 10^-5 and 10^-80.
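The following minimal Python sketch illustrates the CIGMMS idea just described. All names are illustrative (g.score(feat) stands in for a GMM likelihood evaluation, cd_to_ci maps each CD senone to its CI phone); this is not the s3.X implementation.

import math

def cigmms_scores(feat, ci_gmms, cd_gmms, cd_to_ci, ci_beam=1e-10):
    ci_scores = {p: g.score(feat) for p, g in ci_gmms.items()}
    threshold = max(ci_scores.values()) + math.log(ci_beam)
    cd_scores = {}
    for s, g in cd_gmms.items():
        if ci_scores[cd_to_ci[s]] >= threshold:
            cd_scores[s] = g.score(feat)           # detailed CD computation
        else:
            cd_scores[s] = ci_scores[cd_to_ci[s]]  # back off to the CI score
    return cd_scores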

9.7.3 Gaussian-level optimization : VQ-based Gaussian Selection and SVQ-based Gaussian Selection

Usually only a few Gaussians dominate the likelihood of a GMM, and different techniques are used to decide which Gaussians dominate the likelihood computation. We categorize these techniques as part of the Gaussian layer. Generally, a rough model, either a vector quantizer (VQ),

a sub-vector quantizer (SVQ), or a kd-tree, is first used to decide which Gaussians should be computed in the GMM. Hence, these techniques are also called Gaussian selection. The major issue in using any Gaussian selection technique is the trade-off between the cost of computing the rough model (e.g., the VQ codebook) and the accuracy degradation. Usually, a more detailed model (e.g., a higher-order VQ codebook) gives better accuracy, but results in more computation. Usually, the neighborhood for a particular codeword is computed by thresholding the distance of each Gaussian mean from the codeword (cite Bocchieri’s paper). The consequence of this scheme is that there will be cases in which a neighborhood contains no Gaussian at all. Hence, in work by Gales et al., it is suggested that one could fix this problem by setting a minimum number of Gaussians for each neighborhood. Note that work by Douglas (cite Douglas’s work) suggests that, instead of using the codeword as the reference point for deciding the neighborhood, one could use the closest Gaussian’s mean as the reference point. In this case, one can always use the closest Gaussian as one of the Gaussians in the neighborhood. One can use -gsfn to specify the Gaussian selector map. The Gaussian selector map can be generated by gselect. In Sphinx 3.X, we focused on two techniques which make use of simple VQ as a Gaussian selector (VQGS) and SVQ as a Gaussian selector (SVQGS). We ignored tree-based techniques because of the practical difficulty of subsequently applying adaptation techniques such as Maximum Likelihood Linear Regression (MLLR). One can use -svqfn to specify the sub-VQ map. The sub-vector quantization can be generated by gausubvq.
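The following Python fragment sketches how such a Gaussian-selection map might be built: neighborhoods are formed by thresholding the distance of each Gaussian mean from the codeword, with a minimum neighborhood size to avoid the empty-neighborhood problem mentioned above. The function name and the plain Euclidean distance are illustrative assumptions, not the Sphinx implementation.

import numpy as np

def build_gs_neighborhoods(codewords, gaussian_means, threshold, min_gaussians=1):
    """For each VQ codeword, keep the Gaussians whose means lie within
    `threshold` of the codeword; guarantee at least `min_gaussians` per
    neighborhood by falling back to the closest Gaussians."""
    neighborhoods = []
    for cw in codewords:
        dists = np.linalg.norm(gaussian_means - cw, axis=1)
        members = np.where(dists < threshold)[0]
        if len(members) < min_gaussians:          # avoid empty neighborhoods
            members = np.argsort(dists)[:min_gaussians]
        neighborhoods.append(members)
    return neighborhoods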

9.7.4 Gaussian/Component-level optimization : Sub-vector quantization

As in the full feature space, the distribution of features in a projection of the full space (a subspace) can be approximated as a GMM. The full-space likelihood can thus be obtained by summing the individual subspace log-likelihoods. Similar to the situation in the full space, only a few Gaussians dominate the likelihood in a particular subspace. Therefore, techniques can be used to choose the best Gaussians in individual subspaces and combine them. We categorize algorithms that make use of this fact as component-layer algorithms. We will focus on one representative technique in this layer, sub-vector quantization (SVQ), in which VQ

is used as a Gaussian selector in the subspaces. The sub-vector quantized model can be generated by gausubvq:

$ gausubvq
-mean ./hub4_cd_continuous_8gau_1s_c_d_dd/means
-var ./hub4_cd_continuous_8gau_1s_c_d_dd/variances
-mixw ./hub4_cd_continuous_8gau_1s_c_d_dd/mixture_weights
-svspec 0-38
-iter 20
-svqrows 16
-subvq svq.out

The output is written to a file called svq.out, which can then be supplied to decode.
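The following sketch (illustrative Python, not the decode implementation) shows how SVQ combines subvector scores: with diagonal covariances, a Gaussian’s log-likelihood decomposes into a sum over subvectors, so cheap per-subvector lookups can approximate the full score.

import numpy as np

def svq_gaussian_scores(frame_subscores, gaussian_codewords):
    """Approximate full-space Gaussian log-likelihoods from subvector scores.

    frame_subscores:    list over subvectors; entry s is an array of length
                        n_codewords[s] holding the log-likelihood of the current
                        frame's s-th subvector under each codeword.
    gaussian_codewords: array of shape (n_gaussians, n_subvectors); entry [g, s]
                        is the codeword that quantizes Gaussian g in subvector s.
    """
    n_gauss = gaussian_codewords.shape[0]
    total = np.zeros(n_gauss)
    for s, scores in enumerate(frame_subscores):
        total += scores[gaussian_codewords[:, s]]   # add this subvector's contribution
    return total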

9.7.5 Interaction between different level of optimization

Our categorization scheme results in a layered architecture for GMM computation, and many fast GMM computation techniques can be categorized into one of the layers.6 The advantage of this scheme is that one can follow a conceptual model when implementing different algorithms. It also simplifies the study of interactions between different schemes. For example, once we understand that two techniques are in the same layer (such as VQGS and SVQGS), we will probably not want to implement them in the same system, as that can only result in higher overhead.

9.7.6 Related tools, gs select, gs view, gausubvq

[To be completed]

9.7.7 Interaction between GMM computation and search routines in Sphinx 3 .X

The performance of fast GMM computation techniques is usually less effective when a tighter beam is used in search. From the perspective of our

6 We also note that some techniques have the characteristics of multiple layers.

conceptual model, pruning can be regarded as a GMM-layer algorithm and, as such, only affects the layers above it, namely the GMM and frame layers.

We summarize the effect below.

Relationship with the Frame Layer: We assume that skipped frames’ scores are copied from previous frames’ scores. However, the search module decides whether a GMM is active or not based on pruning. There will be problematic situations where some GMMs are active in the current frame but were deactivated in the previous frame. Hence, recomputation of active states is necessary. When the beam is tight, this recomputation can be time-consuming and can cancel out the computational gain obtained from down-sampling.7

Relationship with the GMM Layer: As the GMM scores feed into the search module, one observation is that when CIGMMS is applied, the search time increases. If the CI score beam (mentioned in Section [ Editor Notes: Need to find the section this refers to]) is tight, CIGMMS causes many CD GMMs’ scores to become identical and also narrows the range of scores. Hence, the search module becomes more computationally intensive. In practice, this problem can be alleviated by tightening the Viterbi beam.

9.8 Overall search structure of decode

In the last two sections, we described the search and the GMM computation. The following algorithm provides a unified view of how they fit together.

Initialize the start state of <s> with path-score = 1;

[To be completed]

Find the final BP table entry and back-trace through the table to retrieve the result;

7 One can also deactivate the state. However, in our preliminary experiments, we found that deactivation can result in no valid paths at the final frame.
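The following is a highly simplified Python sketch of such a unified loop, tying together the GMM computation, HMM evaluation, beam pruning, and the back-pointer (BP) table; all helpers are placeholders, scores are treated as relative likelihoods, and the sketch glosses over lextree details.

def viterbi_decode(frames, active_hmms, compute_senone_scores,
                   evaluate_hmm, expand_word_exits, backtrace, beam=1e-55):
    """Skeleton of a time-synchronous Viterbi search with beam pruning."""
    bp_table = []                                      # word back-pointer entries
    for t, frame in enumerate(frames):
        senone_scores = compute_senone_scores(frame)   # GMM layer (Section 9.7)
        hmm_scores = {h: evaluate_hmm(h, senone_scores) for h in active_hmms}
        best = max(hmm_scores.values())
        # Beam pruning: keep only HMMs whose score is within `beam` of the best.
        active_hmms = [h for h, s in hmm_scores.items() if s >= best * beam]
        # Record word exits in the BP table and enter successor words/lextrees.
        bp_table.extend(expand_word_exits(active_hmms, t))
    return backtrace(bp_table)                         # retrieve the best hypothesis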

9.9 Multi-pass systems using Sphinx 3.X

9.9.1 Word Lattice

During recognition the decoder maintains not just the single best hypothesis, but also a number of alternatives or candidates. For example, REED is a perfectly reasonable alternative to READ. The alternatives are useful in many ways: for instance, in N-best list generation. To facilitate such post-processing, the decoder can optionally produce a word lattice output for each input utterance. This output records all the candidate words recognized by the decoder at any point in time, and their main attributes such as time segmentation and acoustic likelihood scores.

The term “lattice” is used somewhat loosely. The word lattice is really a directed acyclic graph, or DAG. Each node of the DAG denotes a word instance that begins at a particular frame within the utterance; that is, it is a unique <word, start-time> pair. (However, there can be a number of end times for this word instance. One of the features of a time-synchronous Viterbi search using beam pruning is that word candidates hypothesized by the decoder have a well-defined start time, but a fuzzy range of end times. This is because the start time is primarily determined by Viterbi pruning, while the possible end times are determined by beam pruning.)

There is a directed edge between two nodes in the DAG if the start time of the destination node immediately follows one of the end times of the source node; that is, the two nodes can be adjacent in time. Thus, the edge determines one possible segmentation for the source node: beginning at the source’s start time and ending one frame before the destination’s start time. The edge also contains an acoustic likelihood for this particular segmentation of the source node.

Note: the beginning and end of sentence tokens, <s> and </s>, are not decoded as part of an utterance by the s3.X decoder. However, they have to be included in the word lattice file, for compatibility with the older Sphinx 3 decoder software. They are assigned 1-frame segmentations, with log-likelihood scores of 0. To accommodate them, the segmentations of adjacent nodes have to be “fudged” by 1 frame.
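The structure just described maps naturally onto a small data structure. The following Python sketch (illustrative names only) records the <word, start-time> nodes, their fuzzy end times, and the adjacency-based edges:

from dataclasses import dataclass, field
from typing import List

@dataclass
class LatticeNode:
    word: str
    start_frame: int           # a node is a unique <word, start-time> pair
    end_frames: List[int]      # fuzzy range of possible end times

@dataclass
class LatticeEdge:
    src: int                   # source node ID
    dst: int                   # destination node ID
    ascore: float              # acoustic score of the implied segmentation

@dataclass
class WordLattice:
    nodes: List[LatticeNode] = field(default_factory=list)
    edges: List[LatticeEdge] = field(default_factory=list)

    def add_edge_if_adjacent(self, i, j, ascore):
        """An edge exists when the destination starts one frame after one of
        the source's end times, i.e. the two nodes can be adjacent in time."""
        if self.nodes[j].start_frame - 1 in self.nodes[i].end_frames:
            self.edges.append(LatticeEdge(i, j, ascore))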

Word Lattice File Format

A word lattice file essentially contains the above information regarding the nodes and edges in the DAG. It is structured in several sections, as follows:

1. A comment section, listing important configuration arguments as comments

2. Frames section, specifying the number of frames in utterance

3. Nodes section, listing the nodes in the DAG

4. Initial and Final nodes sections (for <s> and </s>, respectively)

5. BestSegAscr section, a historical remnant that is now essentially empty

6. Edges section, listing the edges in the DAG

The file is formatted as follows. Note that any line in the file that begins with the # character in the first column is considered to be a comment.

# getcwd: <current working directory>
# -logbase <logbase value>
# -dict <main lexicon>
# -fdict <filler lexicon>
# ... (other arguments, written out as comment lines)
#
Frames <number of frames in utterance>
#
Nodes <number of nodes> (NODEID WORD STARTFRAME FIRST-ENDFRAME LAST-ENDFRAME)
... (for all nodes in DAG)
#
Initial <initial node ID>
Final <final node ID>
#
BestSegAscr 0 (NODEID ENDFRAME ASCORE)
#
Edges (FROM-NODEID TO-NODEID ASCORE)
... (for all edges in DAG)
End

Note that the node-ID values for DAG nodes are assigned sequentially, starting from 0. Furthermore, they are sorted in descending order of their earliest-end-time attribute.
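As a reading aid, here is a small Python sketch of how the Nodes and Edges sections of such a file could be pulled out (assuming the lattice has already been gunzipped to plain text); it follows the field layout shown above and is not part of the Sphinx tools.

def parse_lattice(path):
    """Minimal parser for the node and edge lines of an s3.X word lattice.
    Returns (nodes, edges): nodes maps id -> (word, start, first_end, last_end);
    edges is a list of (from_id, to_id, ascore)."""
    nodes, edges, section = {}, [], None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue                           # skip comments and blank lines
            if line.startswith("Nodes"):
                section = "nodes"; continue
            if line.startswith("Edges"):
                section = "edges"; continue
            if line.startswith(("Frames", "Initial", "Final", "BestSegAscr", "End")):
                section = None; continue
            parts = line.split()
            if section == "nodes":
                nid, word, sf, fef, lef = parts[:5]
                nodes[int(nid)] = (word, int(sf), int(fef), int(lef))
            elif section == "edges":
                frm, to, ascr = parts[:3]
                edges.append((int(frm), int(to), int(ascr)))
    return nodes, edges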

9.9.2 astar

Given a lattice, the program astar can generate the N-best results. In the following example, a lattice file named abc.lat has been generated in the directory ./an4/. The following command rescores each of these lattices and generates an N-best list:

$ src/programs/astar
-mdeffn ./hub4_cd_continuous_8gau_1s_c_d_dd/hub4opensrc.6000.mdef
-fdict ./filler.dict
-dict ./an4.dict
-lm ./an4.ug.lm.DMP
-langwt 13.0
-inspen 0.2
-ctlfn ./an4.ctl
-inlatdir ./an4/
-logbase 1.0003
-backtrace 1

9.9.3 dag

Given a lattice, the program dag finds the best path through the lattice.

$ src/programs/dag
-mdef hub4_cd_continuous_8gau_1s_c_d_dd/hub4opensrc.6000.mdef
-fdict filler.dict
-dict an4.dict
-lm an4.ug.lm.DMP
-lw 13.0
-wip 0.2
-ctl an4.ctl
-inlatdir an4/
-logbase 1.0003
-backtrace

The usage of dag is very similar to that of astar. Please consult the manual for astar.

9.10 Other tools inside S3.X packages

9.10.1 align

This tool creates a phone-level alignment given the transcription. It estimates the timing information for each word in the transcription. For example:

$ align
-logbase 1.0003
-mdef ./hub4opensrc.6000.mdef
-mean ./means
-var ./variances
-mixw ./mixture_weights
-tmat ./transition_matrices
-feat 1s_c_d_dd
-topn 1000
-beam 1e-80
-senmgaufn .cont.
-fdict .//model/lm/an4/filler.dict
-dict .//model/lm/an4/an4.dict
-ctl .//model/lm/an4/an4.ctl
-cepdir .//model/lm/an4/
-insent .//model/lm/an4/align.correct
-outsent @.out
-wdsegdir ./
-phsegdir ./

9.10.2 allphone

This tool generates the best phoneme sequence that matches a waveform. Example:

$ allphone
-logbase 1.0003
-mdeffn ./hub4opensrc.6000.mdef
-meanfn ./means
-varfn ./variances
-mixwfn ./mixture_weights
-tmatfn ./transition_matrices
-feat 1s_c_d_dd
-topn 1000
-beam 1e-80
-senmgaufn .cont.
-ctlfn ./an4.ctl
-cepdir ./an4/
-phsegdir ./

9.11 Using the Sphinx 3 decoder with semi-continuous and continuous models

There are two flags which are specific to the type of model being used; the rest of the flags are independent of the model type. The flags you need to change to switch from continuous models to semi-continuous ones are:

• the -senmgaufn flag would change from ”.cont.” to ”.semi.”

• the -feat flag would change from the feature you are using with continuous models to the feature you are using with semi-continuous models (usually s3_1x39 or 1s_c_d_dd for continuous models and s2_4x for semi-continuous models)

Some of the other decoder flags and their usual settings are as follows:

-logbase 1.0001
-bestpath 0
-mdef $mdef
-senmgau .cont.
-mean ACMODDIR/means
-var ACMODDIR/variances
-mixw ACMODDIR/mixture_weights
-tmat ACMODDIR/transition_matrices
-langwt 10.5
-feat s3_1x39
-topn 32
-beam 1e-80
-nwbeam 1e-40
-dict dict
-fdict fdict
-fillpen fillpen
-lm lmfile
-inspen 0.2
-ctl ctl
-ctloffset ctloffset
-ctlcount ctlcount
-cepdir cepdir
-bptblsize 400000
-matchseg matchfile
-outlatdir outlatdir
-agc none
-varnorm yes

Chapter 10

Speaker Adaptation using Sphinx 3

[ Editor Notes: Need more math.]

10.1 Speaker Adaptation

Author: David Huggins-Daines, Arthur Chan. Editor: Arthur Chan.

In the previous chapters, various tools for creating acoustic models and language models for a speech application were introduced. One problem with the maximum likelihood framework used in SphinxTrain is that if there is a mismatch between the training and testing conditions, the recognition rate will be significantly affected. There are several sources of mismatch:

1. Environmental Mismatches: the influence of additive and convolutive noise can significantly change the characteristics of the acoustic models.

2. Speaker Variabilities: inter-speaker variability of speech, e.g. Speaker A and Speaker B will utter the same word with very different acoustic characteristics.

3. Language Mismatches: the difference in usage between the training and testing corpora can reduce the power of the language model in decoding.

In this chapter, we will discuss the techniques of speaker adaptation provided by SphinxTrain and Sphinx 3. This chapter currently discusses only acoustic model adaptation, which can alleviate the first two problems mentioned above, but not the third.

10.2 Different Principles of Speaker Adaptation

10.2.1 In terms of the mode of collecting adaptation data

Generally, speaker adaptation assumes that an acoustic model already exists and a new set of adaptation data is available. This new set of adaptation data can be collected by asking the users to speak following a predefined transcription. This is called supervised adaptation. The process of collecting data for adaptation is usually called “enrollment”.

The process of enrollment can be tedious and time-consuming, and many users dislike it. In some applications, it is also impossible to allow such a design. For example, in telephony applications, users may be travelling while they make the calls, and asking them to provide enrollment data can be very annoying. However, supervised adaptation has its advantages. The performance of the adapted system is usually better because the transcription is known. Therefore, many systems, such as dictation systems, usually ask users to give 5 to 30 minutes of enrollment data (sometimes more). This usually gives much better results.

As opposed to supervised adaptation, unsupervised adaptation attempts to determine the transcription by using an automatic speech recognizer. This avoids the need for enrollment. However, the performance of the resulting adaptation will usually be poorer, because it is inevitable that the speech recognizer will make some mistakes in determining the transcription.

So which way to go? The choice between supervised and unsupervised adaptation depends on many human factors; the major one is whether the user is patient enough to give enrollment data. In terms of performance, it can be very hard to foresee how much gain one can get from speaker adaptation. Usually, the improvement is a relative 5-15%; with unsupervised adaptation, the accuracy can even end up worse than the original system. Therefore, the use of each of these modes of adaptation should be a subject of serious consideration.

10.2.2 In terms of technique of parameter estimation

There are two major ways to do speaker adaptation. One is the so-called “maximum a posteriori” (MAP) re-estimation of the parameters. The other is maximum likelihood linear regression (MLLR). We will mainly discuss speaker adaptation using MLLR. The use of regression analysis to find the best linear (or non-linear) relation between two sets of data is well known. Linear regression consists of trying to find the terms A and b in the following set of equations:

y = Ax + b

The standard way of solving this problem is the least mean square estimate (LMSE) method. Maximum likelihood linear regression basically puts linear regression into the Baum-Welch estimation framework. The consequence of this technique is that the estimated transform is optimal according to the maximum likelihood criterion.1

The use of MLLR allows speaker adaptation of all phonemes even if there are only a few utterances of adaptation data. In such a case, all the data are used to estimate a single transformation matrix. This also adds some robustness to incorrect transcriptions, making MLLR suitable for unsupervised adaptation even with fairly noisy data. The problem with single-transformation MLLR is that each phone usually has different acoustic characteristics, so the transformations are usually different. Therefore, in order to make more efficient use of the adaptation data, it is necessary to allow different phones to have their own adaptation matrices, or to tie data from different phones together and create the same transformation matrix for them. Usually, this technique is called multiple regression classes. In practice, the number of regression classes to use is usually determined by validation on testing data.

1 The term “regression” meant “going back to the linear relation” when it was first used by Fisher. (Is that true?)
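To make the regression concrete, the following Python sketch estimates A and b from paired data by least squares and applies the transform to a set of mean vectors. This is only an illustration of the idea; the actual MLLR estimation in mllr solve maximizes likelihood using the Baum-Welch statistics rather than plain least squares.

import numpy as np

def estimate_affine_transform(X, Y):
    """Least-squares estimate of A, b in y = A x + b from paired rows of X, Y."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])   # append 1 for the bias term
    W, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)       # W has shape (dim+1, dim)
    A, b = W[:-1].T, W[-1]
    return A, b

def transform_means(means, A, b):
    """Apply the estimated transform to every Gaussian mean vector."""
    return means @ A.T + b

# Tiny usage example with synthetic data: recover a known transform.
rng = np.random.default_rng(0)
A_true, b_true = rng.normal(size=(3, 3)), rng.normal(size=3)
X = rng.normal(size=(200, 3))
Y = X @ A_true.T + b_true
A_est, b_est = estimate_affine_transform(X, Y)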

MAP adaptation is really a variant of the acoustic model training procedure. The standard Baum-Welch algorithm results in a maximum likelihood estimate of the HMM model parameters given the training data and an initial model, which serves only to fill in the “hidden” data in the re-estimation formulae (i.e. the state sequence and observation counts). The MAP technique extends this to produce a maximum a posteriori estimate, where re-estimation takes into account the prior distribution of the model parameters, which in the case of speaker adaptation is (or can be directly calculated from) the parameters of the baseline, speaker-independent model.

In the case where only the means are adapted (which is sufficient for continuous density HMMs), this can be simplified to an interpolation between the baseline acoustic mean and the ML estimate from the adaptation data, weighted by the observation count of the adaptation data and the variances of the baseline and adaptation data models. Intuitively, we move the parameters for each senone observed in the adaptation data such that we maximize the likelihood of the adaptation data, while at the same time minimizing the variance.
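The interpolation just described can be sketched as follows (a hedged illustration of the usual MAP mean update with prior weight τ; the exact formula used by map_adapt may differ in detail):

import numpy as np

def map_adapt_means(prior_means, obs_sum, obs_count, tau=10.0):
    """MAP-style interpolation between speaker-independent means and the ML
    estimate from the adaptation data.

    prior_means: (n_gaussians, dim) baseline means
    obs_sum:     (n_gaussians, dim) sum of adaptation frames assigned to each Gaussian
    obs_count:   (n_gaussians,)     soft counts (occupancies) for each Gaussian
    tau:         prior weight; a smaller tau means faster adaptation
    """
    counts = obs_count[:, None]
    return (tau * prior_means + obs_sum) / (tau + counts)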

10.3 MLLR with SphinxTrain

This section describes the procedure for using the speaker adaptation facilities of Sphinx. Sphinx provides facilities for supervised adaptation.

1. Given a previous acoustic model and a new set of adaptation data, bw (refer to Chapter ?) is first used to collect the sufficient statistics. For MLLR, we need only the mean statistics. For MAP, we need the means, the variances, and optionally also the mixture weights and transition matrices.

2. Estimate the transformation matrix using the program mllr solve.

3. Apply the transformation to the mean vectors.

10.3.1 Using bw for adaptation

As described in Chapter ?, bw is the program which carries out the forward-backward algorithm and collects the sufficient statistics2 for a particular set of sentences. MLLR can be thought of as a regression matrix estimator operating on these sufficient statistics. The actual command is:

$ bw
-moddeffn mdef
-mixwfn mixw
-meanfn mean
-varfn var
-tmatfn tmat
-dictfn dict
-fdictfn filler_dict
-cepdir feat_dir
-cepext .mfc
-lsnfn transcripts_fn
-meanreest yes
-varreest yes
-2passvar yes
-feat 1s_c_d_dd
-ceplen 13
-ctlfn adapt.ctl
-accumdir accumdir
-agc none
-cmn current

The command will use the adaptation data listed in adapt.ctl to gather the mean and variance statistics into the directory accumdir.

2 You will find that many people call these posterior probabilities. They are correct, but to explain the reason I would need to go into the mathematics, so I have tried to avoid it. :-)

10.3.2 mllr solve

mllr solve is the program that actually computes the transformation matrix.

$ mllr_solve
-outmllrfn adapt.matrix
-accumdir accumdir
-meanfn mean
-varfn vars
-moddefn mdef
-cdonly yes

The command will make use of the mean accumulators in the directory accumdir and estimate the transformation. You can also see that the means and variances are loaded in. MLLR is basically an iterative algorithm, so in theory, if you repeat the above process for more iterations, the matrix you get will give better performance. In practice, 1-2 iterations are usually enough.

Notice that we only apply the transformation to the CD senones, as we specified -cdonly here. Notice also that when you specify the option -cdonly, you need to specify a model definition file, because it provides the mapping that determines which model is CI or CD. In some cases, e.g. when you are using CI-based GMM selection, you may want to adapt the CI senones as well. Note, though, that the effect of transformation-based adaptation (like MLLR) on the fast search is somewhat less studied. We have found that in general it does not hurt to adapt the CI senones as well, so it is reasonable to omit the -cdonly option.

10.3.3 Using mllr transform to do offline mean transformation

There are two ways to use the resulting matrix. One is to use it to transform the mean vectors offline, which is shown in this subsection. The other is to use it to transform the mean vectors on-line, which we will show in the next subsection.

$ mllr_transform
-inmeanfn mean
-outmeanfn out.mean
-mllrmat adapt.matrix
-cdonly yes

In such a case a new set of means, out.mean, will be created, and you can simply use it in speech recognition.

10.3.4 On-line adaptation using the Sphinx 3.0 and Sphinx 3.X decoders

In the Sphinx 3.0 and Sphinx 3.X decoders (decode_anytopo and decode), one can apply the adaptation matrix on-line by specifying an MLLR control file. For example, if you want the n-th sentence to use regression matrix abc to adapt the means on-line, you can create a file called ctl.mllr that contains abc as its n-th line. You can then apply this transformation by specifying -ctl_mllr in the command line of decode. When the recognizer is decoding the n-th sentence, the transformation matrix abc will be used to transform the means. In addition, Sphinx 3.6 allows you to specify a default adaptation matrix, using the -mllr option to specify the transformation matrix file.

10.4 MAP with SphinxTrain

The process of doing MAP adaptation is similar to that of MLLR. First, you must collect statistics from the adaptation data using the bw tool, as described above. It is important that you specify the -2passvar no option to bw, because it is necessary to collect some statistics that are required for adaptation of the variances (that is, if you wish to adapt your variances). If you wish to do MAP adaptation of the mixture weights and transition matrices, you should also pass -mixwreest and -tmatreest to bw. Next, the map_adapt tool is used, in much the same way as norm is during training, to create a MAP re-estimate of the model parameters:

$ map_adapt
-accumdir accumdir
-meanfn means

-varfn variances
-mixwfn mixture_weights
-tmatfn transition_matrices
-mapmeanfn map_means
-mapvarfn map_variances
-mapmixwfn map_mixture_weights
-maptmatfn map_transition_matrices

As alluded to above, you may not wish to actually use all of the re-estimated parameters. In particular, it is possible for re-estimation of the variances and mixture weights to degrade the accuracy of the models. Re-estimation of the transition matrices generally has no effect on accuracy. Unfortunately, the optimal set of parameters must be determined experimentally. It is not necessary to specify all of the input and output files on the command line for map_adapt, only the ones which you wish to adapt, although in all cases the means and variances are required.

There are two drawbacks and one great advantage to using MAP. First, the bad news: MAP is not suitable for unsupervised adaptation. It is very important that the transcriptions of the adaptation data be correct; you may even wish to consider force-aligning them. A somewhat lesser concern is that, since MAP requires re-estimation of the means, there is no “on-line” adaptation, and thus it is necessary to store a separate copy of the mean parameter file for each speaker.

The good news is that MAP can make much more effective use of large amounts of adaptation data. If you have more than 50 sentences of adaptation data, it is a good idea to use MAP in addition to MLLR. To do this, the recommended procedure is to perform MLLR adaptation as described above, using mllr_transform to generate a new means file, then re-run bw on the adaptation data using the transformed means, and finally do MAP adaptation as described here.

There are a few extra arguments to map_adapt that allow you to tune the adaptation process. MAP adaptation relies on a prior distribution which is estimated from the speaker-independent parameters in conjunction with an extra parameter τ, which controls the “speed” of adaptation, i.e. the weight given to the adaptation data vs. the prior models. By default this parameter is not used for the means, and is estimated automatically for all other parameters. To enable the use of τ for means re-estimation, you can specify -bayesmean no on the command line. If you wish to manually set a uniform τ across all models, you should specify -fixedtau yes and use the -tau parameter to specify a value.

The smaller the value of τ, the less weight is given to the prior models, and the faster the adaptation will be. In general, using a fixed τ = 10.0 works well, particularly for semi-continuous models, where the estimation of τ is not particularly robust.

Appendix A

Command Line Information

A.1 Sphinx 3 Decoders

A.1.1 decode

Tool Description

Example

Command-line Argument Description

Usage: [options]

• -agc (Default Value : max) Description: Automatic gain control for c0 (’max’ or ’none’); (max: c0 -= max-over-current-sentence(c0))

• -beam (Default Value : 1.0e-55) Description: Beam selecting active HMMs (relative to best) in each frame [0(widest)..1(narrowest)]

• -bghist (Default Value : 0) Description: Bigram-mode: If TRUE only one BP entry/frame; else one per LM state

• -bptbldir (Default Value : NONE) Description: Directory in which to dump word Viterbi back pointer table (for debugging)

• -cepdir (Default Value : NONE) Description: Input cepstrum files directory (prefixed to filespecs in control file)

• -ci_pbeam (Default Value : 1e-80) Description: CI phone beam for CI-based GMM Selection. [0(widest) .. 1(narrowest)]

• -cmn (Default Value : current) Description: Cepstral mean normalization scheme (default: Cep -= mean-over-current-sentence(Cep))

• -cond_ds (Default Value : 0) Description: Conditional down-sampling; overrides normal down-sampling.

• -ctl (Default Value : NONE) Description: Control file listing utterances to be processed

• -ctlcount (Default Value : 1000000000) Description: No. of utterances to be processed (after skipping -ctloffset entries)

• -ctloffset (Default Value : 0) Description: No. of utterances at the beginning of -ctl file to be skipped

• -ctl_lm (Default Value : NONE) Description: Control file that lists the corresponding LM for each utterance

• -ctl_mllr (Default Value : NONE) Description: Control file that lists the corresponding MLLR matrix for each utterance

• -dict (Default Value : NONE) Description: Pronunciation dictionary input file

• -ds (Default Value : 1) Description: Down-sampling the frame computation.

• -epl (Default Value : 3) Description: Entries Per Lextree; #successive entries into one lextree before lextree-entries shifted to the next

• -fdict (Default Value : NONE) Description: Filler word pronunciation dictionary input file

• -feat (Default Value : 1s_c_d_dd) Description: Feature type: Must be s3_1x39 / 1s_c_d_dd / s2_4x.

• -fillpen (Default Value : NONE) Description: Filler word probabilities input file

• -fillprob (Default Value : 0.1) Description: Default non-silence filler word probability

• -gs (Default Value : NONE) Description: Gaussian Selection mapping file.

• -gs4gs (Default Value : 1) Description: Flag that specifies whether the input GS map will be used for Gaussian Selection. If it is disabled, the map will only provide information to other modules.

• -hmmdump (Default Value : 0) Description: Whether to dump active HMM details to stderr (for debugging)

• -hmmhistbinsize (Default Value : 5000) Description: Performance histogram: #frames vs #HMMs active; #HMMs/bin in this histogram

• -hyp (Default Value : NONE) Description: Recognition result file, with only words

• -hypseg (Default Value : NONE) Description: Recognition result file, with word segmentations and scores

• -latext (Default Value : lat.gz) Description: Filename extension for lattice files (gzip compressed, by default)

• -lextreedump (Default Value : 0) Description: Whether to dump the lextree structure to stderr (for debugging)

• -lm (Default Value : NONE) Description: Word trigram language model input file

• -lmctlfn (Default Value : NONE) Description: Control file for language model

• -lmdumpdir (Default Value : NONE) Description: The directory for dumping the DMP file.

• -lminmemory (Default Value : 0) Description: Load language model into memory (default: use disk cache for LM)

• -log3table (Default Value : 1) Description: Determines whether to use the log3 table or to compute the values at run time.

• -logbase (Default Value : 1.0003) Description: Base in which all log-likelihoods are calculated

• -lw (Default Value : 8.5) Description: Language weight

• -maxhistpf (Default Value : 100) Description: Max no. of histories to maintain at each frame

• -maxhmmpf (Default Value : 20000) Description: Max no. of active HMMs to maintain at each frame; approx.

• -maxwpf (Default Value : 20) Description: Max no. of distinct word exits to maintain at each frame

• -mdef (Default Value : NONE) Description: Model definition input file

• -mean (Default Value : NONE) Description: Mixture gaussian means input file

• -mixw (Default Value : NONE) Description: Senone mixture weights input file

• -mixwfloor (Default Value : 0.0000001) Description: Senone mixture weights floor (applied to data from -mixw file)

• -Nlextree (Default Value : 3) Description: No. of lextrees to be instantiated; entries into them staggered in time

• -outlatdir (Default Value : NONE) Description: Directory in which to dump word lattices

• -outlatoldfmt (Default Value : 1) Description: Whether to dump lattices in old format

• -pbeam (Default Value : 1.0e-50) Description: Beam selecting HMMs transitioning to successors in each frame [0(widest)..1(narrowest)]

• -pheurtype (Default Value : 0) Description: 0 = bypass, 1 = sum of max, 2 = sum of avg, 3 = sum of 1st senones only

• -pl_beam (Default Value : 1.0e-80) Description: Beam for phoneme look-ahead. [1 (narrowest)..10000000 (very wide)]

• -pl_window (Default Value : 1) Description: Window size (actually window size - 1) of phoneme look-ahead.

• -ptranskip (Default Value : 0) Description: Use wbeam for phone transitions every so many frames (if >= 1)

• -senmgau (Default Value : .cont.) Description: Senone to mixture-gaussian mapping file (or .semi. or .cont.)

• -silprob (Default Value : 0.1) Description: Default silence word probability

• -subvq (Default Value : NONE) Description: Sub-vector quantized form of acoustic model

• -subvqbeam (Default Value : 3.0e-3) Description: Beam selecting best components within each mixture Gaussian [0(widest)..1(narrowest)]

• -svq4svq (Default Value : 0) Description: Flag that specifies whether the input SVQ will be used as approximate scores of the Gaussians

• -tmat (Default Value : NONE) Description: HMM state transition matrix input file

• -tmatfloor (Default Value : 0.0001) Description: HMM state transition probability floor (applied to -tmat file)

• -treeugprob (Default Value : 1) Description: If TRUE (non-0), use unigram probs in lextree

• -utt (Default Value : NONE) Description: Utterance file to be processed (-ctlcount argument times)

• -uw (Default Value : 0.7) Description: Unigram weight

• -var (Default Value : NONE) Description: Mixture gaussian variances input file

• -varfloor (Default Value : 0.0001) Description: Mixture gaussian variance floor (applied to data from -var file)

• -varnorm (Default Value : no) Description: Variance normalize each utterance (yes/no; only applicable if CMN is also performed)

• -vqeval (Default Value : 3) Description: Value controlling how much of the cepstral vector is used in the estimation

• -wbeam (Default Value : 1.0e-35) Description: Beam selecting word-final HMMs exiting in each frame [0(widest)..1(narrowest)]

• -wend_beam (Default Value : 1.0e-80) Description: Beam selecting word-final HMMs exiting in each frame [0(widest) .. 1(narrowest)]

• -wip (Default Value : 0.7) Description: Word insertion penalty

A.1.2 livedecode

Tool Description

Example

Command-line Argument Description

Usage: [options]

• -agc (Default Value : max) Description: Automatic gain control for c0 (’max’ or ’none’); (max: c0 -= max-over-current-sentence(c0))

• -alpha (Default Value : 0.97) Description: Alpha for pre-emphasis window

• -beam (Default Value : 1.0e-55) Description: Beam selecting active HMMs (relative to best) in each frame [0(widest)..1(narrowest)]

• -bghist (Default Value : 0) Description: Bigram-mode: If TRUE only one BP entry/frame; else one per LM state

• -bptbldir (Default Value : NONE) Description: Directory in which to dump word Viterbi back pointer table (for debugging)

• -cepdir (Default Value : NONE) Description: Input cepstrum files directory (prefixed to filespecs in control file)

• -ci_pbeam (Default Value : 1e-80) Description: CI phone beam for CI-based GMM Selection. Good number should be [0(widest) .. 1(narrowest)]

• -cmn (Default Value : current) Description: Cepstral mean normalization scheme (default: Cep -= mean-over-current-sentence(Cep))

• -cond_ds (Default Value : 0) Description: Conditional down-sampling; overrides normal down-sampling.

• -ctl (Default Value : NONE) Description: Control file listing utterances to be processed

• -ctlcount (Default Value : 1000000000) Description: No. of utterances to be processed (after skipping -ctloffset entries)

• -ctloffset (Default Value : 0) Description: No. of utterances at the beginning of -ctl file to be skipped

• -ctl_lm (Default Value : NONE) Description: Control file that lists the corresponding LMs

• -dict (Default Value : NONE) Description: Pronunciation dictionary input file

• -doublebw (Default Value : 0) Description: Whether the mel filter triangle will have double the bandwidth; 0 is false

• -ds (Default Value : 1) Description: Down-sampling the frame computation.

• -epl (Default Value : 3) Description: Entries Per Lextree; #successive entries into one lextree before lextree-entries shifted to the next

• -fdict (Default Value : NONE) Description: Filler word pronunciation dictionary input file

• -feat (Default Value : 1s_c_d_dd) Description: Feature type: Must be 1s_c_d_dd / s3_1x39 / s2_4x / cep_dcep[,

• -fillpen (Default Value : NONE) Description: Filler word probabilities input file

• -fillprob (Default Value : 0.1) Description: Default non-silence filler word probability

• -frate (Default Value : 100) Description: Frame rate

• -gs (Default Value : NONE) Description: Gaussian Selection mapping file.

• -gs4gs (Default Value : 1) Description: Flag that specifies whether the input GS map will be used for Gaussian Selection. If it is disabled, the map will only provide information to other modules.

• -hmmdump (Default Value : 0) Description: Whether to dump active HMM details to stderr (for debugging)

• -hmmhistbinsize (Default Value : 5000) Description: Performance histogram: #frames vs #HMMs active; #HMMs/bin in this histogram

• -hyp (Default Value : NONE) Description: Recognition result file, with only words

• -hypseg (Default Value : NONE) Description: Recognition result file, with word segmentations and scores

• -input_endian (Default Value : 0) Description: The input data byte order, 0 is little, 1 is big endian

• -latext (Default Value : lat.gz) Description: Filename extension for lattice files (gzip compressed, by default)

• -lextreedump (Default Value : 0) Description: Whether to dump the lextree structure to stderr (for debugging)

• -lm (Default Value : NONE) Description: Word trigram language model input file

• -lmctlfn (Default Value : NONE) Description: Control file for language model

• -lmdumpdir (Default Value : NONE) Description: The directory for dumping the DMP file.

• -lminmemory (Default Value : 0) Description: Load language model into memory (default: use disk cache for LM)

• -log3table (Default Value : 1) Description: Determines whether to use the log3 table or to compute the values at run time.

• -logbase (Default Value : 1.0003) Description: Base in which all log-likelihoods are calculated

• -lowerf (Default Value : 200) Description: Lower edge of filters

• -lw (Default Value : 8.5) Description: Language weight

• -machine_endian (Default Value : 0) Description: The machine’s endian, 0 is little, 1 is big endian

• -maxcepvecs (Default Value : 256) Description: Maximum number of cepstral vectors that can be obtained from a single sample buffer

• -maxhistpf (Default Value : 100) Description: Max no. of histories to maintain at each frame

• -maxhmmpf (Default Value : 20000) Description: Max no. of active HMMs to maintain at each frame; approx.

• -maxhyplen (Default Value : 1000) Description: Maximum number of words in a partial hypothesis (for block decoding)

• -maxwpf (Default Value : 20) Description: Max no. of distinct word exits to maintain at each frame

• -mdef (Default Value : NONE) Description: Model definition input file

• -mean (Default Value : NONE) Description: Mixture gaussian means input file

• -mixw (Default Value : NONE) Description: Senone mixture weights input file

• -mixwfloor (Default Value : 0.0000001) Description: Senone mixture weights floor (applied to data from -mixw file)

• -nfft (Default Value : 256) Description: No. of points for FFT

• -nfilt (Default Value : 31) Description: No. of mel filters

• -Nlextree (Default Value : 3) Description: No. of lextrees to be instantiated; entries into them staggered in time

• -outlatdir (Default Value : NONE) Description: Directory in which to dump word lattices

• -outlatoldfmt (Default Value : 1) Description: Whether to dump lattices in old format

• -pbeam (Default Value : 1.0e-50) Description: Beam selecting HMMs transitioning to successors in each frame [0(widest)..1(narrowest)]

• -pheurtype (Default Value : 0) Description: 0 = bypass, 1 = sum of max, 2 = sum of avg, 3 = sum of 1st senones only

• -pl_beam (Default Value : 1.0e-80) Description: Beam for phoneme look-ahead. [0(widest) .. 1(narrowest)]

• -pl_window (Default Value : 1) Description: Window size (actually window size - 1) of phoneme look-ahead.

• -ptranskip (Default Value : 0) Description: Use wbeam for phone transitions every so many frames (if >= 1)

• -samprate (Default Value : 8000) Description: Sampling rate (only 8K and 16K currently supported)

• -senmgau (Default Value : .cont.) Description: Senone to mixture-gaussian mapping file (or .semi. or .cont.)

• -silprob (Default Value : 0.1) Description: Default silence word probability

• -subvq (Default Value : NONE) Description: Sub-vector quantized form of acoustic model

• -subvqbeam (Default Value : 3.0e-3) Description: Beam selecting best components within each mixture Gaussian [0(widest)..1(narrowest)]

• -svq4svq (Default Value : 0) Description: Flag that specifies whether the input SVQ will be used as approximate scores of the Gaussians

• -tmat (Default Value : NONE) Description: HMM state transition matrix input file

• -tmatfloor (Default Value : 0.0001) Description: HMM state transition probability floor (applied to -tmat file)

• -treeugprob (Default Value : 1) Description: If TRUE (non-0), use unigram probs in lextree

• -upperf (Default Value : 3500) Description: Upper edge of filters

• -utt (Default Value : NONE) Description: Utterance file to be processed (-ctlcount argument times)

• -uw (Default Value : 0.7) Description: Unigram weight

• -var (Default Value : NONE) Description: Mixture gaussian variances input file

• -varfloor (Default Value : 0.0001) Description: Mixture gaussian variance floor (applied to data from -var file)

• -varnorm (Default Value : no) Description: Variance normalize each utterance (yes/no; only applicable if CMN is also performed)

• -vqeval (Default Value : 3) Description: How many vectors should be analyzed by VQ when building the shortlist. It speeds up the decoder, but at a cost.

• -wbeam (Default Value : 1.0e-35) Description: Beam selecting word-final HMMs exiting in each frame [0(widest)..1(narrowest)]

• -wend_beam (Default Value : 1.0e-80) Description: Beam selecting word-final HMMs exiting in each frame [0(widest) .. 1(narrowest)]

• -wip (Default Value : 0.7) Description: Word insertion penalty

• -wlen (Default Value : 0.0256) Description: Window length

A.1.3 livepretend

Tool Description

Example

Command-line Argument Description

Usage: [options]

• -agc (Default Value : max) Description: Automatic gain control for c0 (’max’ or ’none’); (max: c0 -= max-over-current-sentence(c0))

• -alpha (Default Value : 0.97) Description: Alpha for pre-emphasis window

• -beam (Default Value : 1.0e-55) Description: Beam selecting active HMMs (relative to best) in each frame [0(widest)..1(narrowest)]

• -bghist (Default Value : 0) Description: Bigram-mode: If TRUE only one BP entry/frame; else one per LM state

• -bptbldir (Default Value : NONE) Description: Directory in which to dump word Viterbi back pointer table (for debugging)

• -cepdir (Default Value : NONE) Description: Input cepstrum files directory (prefixed to filespecs in control file)

• -ci_pbeam (Default Value : 1e-80) Description: CI phone beam for CI-based GMM Selection. Good number should be [0(widest) .. 1(narrowest)]

• -cmn (Default Value : current) Description: Cepstral mean normalization scheme (default: Cep -= mean-over-current-sentence(Cep))

• -cond_ds (Default Value : 0) Description: Conditional down-sampling; overrides normal down-sampling.

• -ctl (Default Value : NONE) Description: Control file listing utterances to be processed

• -ctlcount (Default Value : 1000000000) Description: No. of utterances to be processed (after skipping -ctloffset entries)

• -ctloffset (Default Value : 0) Description: No. of utterances at the beginning of -ctl file to be skipped

• -ctl_lm (Default Value : NONE) Description: Control file that lists the corresponding LMs

• -dict (Default Value : NONE) Description: Pronunciation dictionary input file

• -doublebw (Default Value : 0) Description: Whether the mel filter triangle will have double the bandwidth; 0 is false

• -ds (Default Value : 1) Description: Down-sampling the frame computation.

• -epl (Default Value : 3) Description: Entries Per Lextree; #successive entries into one lextree before lextree-entries shifted to the next

• -fdict (Default Value : NONE) Description: Filler word pronunciation dictionary input file

• -feat (Default Value : 1s_c_d_dd) Description: Feature type: Must be 1s_c_d_dd / s3_1x39 / s2_4x / cep_dcep[,

• -fillpen (Default Value : NONE) Description: Filler word probabilities input file

• -fillprob (Default Value : 0.1) Description: Default non-silence filler word probability

• -frate (Default Value : 100) Description: Frame rate

• -gs (Default Value : NONE) Description: Gaussian Selection mapping file.

• -gs4gs (Default Value : 1) Description: Flag that specifies whether the input GS map will be used for Gaussian Selection. If it is disabled, the map will only provide information to other modules.

• -hmmdump (Default Value : 0) Description: Whether to dump active HMM details to stderr (for debugging)

• -hmmhistbinsize (Default Value : 5000) Description: Performance histogram: #frames vs #HMMs active; #HMMs/bin in this histogram

• -hyp (Default Value : NONE) Description: Recognition result file, with only words

• -hypseg (Default Value : NONE) Description: Recognition result file, with word segmentations and scores

• -input_endian (Default Value : 0) Description: The input data byte order, 0 is little, 1 is big endian

• -latext (Default Value : lat.gz) Description: Filename extension for lattice files (gzip compressed, by default)

• -lextreedump (Default Value : 0) Description: Whether to dump the lextree structure to stderr (for debugging)

• -lm (Default Value : NONE) Description: Word trigram language model input file

• -lmctlfn (Default Value : NONE) Description: Control file for language model

• -lmdumpdir (Default Value : NONE) Description: The directory for dumping the DMP file.

• -lminmemory (Default Value : 0) Description: Load language model into memory (default: use disk cache for LM)

• -log3table (Default Value : 1) Description: Determines whether to use the log3 table or to compute the values at run time.

• -logbase (Default Value : 1.0003) Description: Base in which all log-likelihoods are calculated

• -lowerf (Default Value : 200) Description: Lower edge of filters

• -lw (Default Value : 8.5) Description: Language weight

• -machine_endian (Default Value : 0) Description: The machine’s endian, 0 is little, 1 is big endian

• -maxcepvecs (Default Value : 256) Description: Maximum number of cepstral vectors that can be obtained from a single sample buffer

• -maxhistpf (Default Value : 100) Description: Max no. of histories to maintain at each frame

• -maxhmmpf (Default Value : 20000) Description: Max no. of active HMMs to maintain at each frame; approx.

• -maxhyplen (Default Value : 1000) Description: Maximum number of words in a partial hypothesis (for block decoding)

• -maxwpf (Default Value : 20) Description: Max no. of distinct word exits to maintain at each frame

• -mdef (Default Value : NONE) Description: Model definition input file

• -mean (Default Value : NONE) Description: Mixture gaussian means input file

• -mixw (Default Value : NONE) Description: Senone mixture weights input file

• -mixwfloor (Default Value : 0.0000001) Description: Senone mixture weights floor (applied to data from -mixw file)

• -nfft (Default Value : 256) Description: No. of points for FFT

• -nfilt (Default Value : 31) Description: No. of mel filters

• -Nlextree (Default Value : 3) Description: No. of lextrees to be instantiated; entries into them staggered in time

• -outlatdir (Default Value : NONE) Description: Directory in which to dump word lattices

• -outlatoldfmt (Default Value : 1) Description: Whether to dump lattices in old format

• -pbeam (Default Value : 1.0e-50) Description: Beam selecting HMMs transitioning to successors in each frame [0(widest)..1(narrowest)]

• -pheurtype (Default Value : 0) Description: 0 = bypass, 1 = sum of max, 2 = sum of avg, 3 = sum of 1st senones only

• -pl_beam (Default Value : 1.0e-80) Description: Beam for phoneme look-ahead. [0(widest) .. 1(narrowest)]

• -pl_window (Default Value : 1) Description: Window size (actually window size - 1) of phoneme look-ahead.

• -ptranskip (Default Value : 0) Description: Use wbeam for phone transitions every so many frames (if >= 1)

• -samprate (Default Value : 8000) Description: Sampling rate (only 8K and 16K currently supported)

• -senmgau (Default Value : .cont.) Description: Senone to mixture-gaussian mapping file (or .semi. or .cont.)

• -silprob (Default Value : 0.1) Description: Default silence word probability

• -subvq (Default Value : NONE) Description: Sub-vector quantized form of acoustic model

• -subvqbeam (Default Value : 3.0e-3) Description: Beam selecting best components within each mixture Gaussian [0(widest)..1(narrowest)]

• -svq4svq (Default Value : 0) Description: Flag that specifies whether the input SVQ will be used as approximate scores of the Gaussians

• -tmat (Default Value : NONE) Description: HMM state transition matrix input file

• -tmatfloor (Default Value : 0.0001) Description: HMM state transition probability floor (applied to -tmat file)

• -treeugprob (Default Value : 1) Description: If TRUE (non-0), use unigram probs in lextree

• -upperf (Default Value : 3500) Description: Upper edge of filters

• -utt (Default Value : NONE) Description: Utterance file to be processed (-ctlcount argument times)

• -uw (Default Value : 0.7) Description: Unigram weight

• -var (Default Value : NONE) Description: Mixture gaussian variances input file

• -varfloor (Default Value : 0.0001) Description: Mixture gaussian variance floor (applied to data from -var file)

• -varnorm (Default Value : no) Description: Variance normalize each utterance (yes/no; only applicable if CMN is also performed)

• -vqeval (Default Value : 3) Description: How many vectors should be analyzed by VQ when building the shortlist. It speeds up the decoder, but at a cost.

• -wbeam (Default Value : 1.0e-35) Description: Beam selecting word-final HMMs exiting in each frame [0(widest)..1(narrowest)]

• -wend_beam (Default Value : 1.0e-80) Description: Beam selecting word-final HMMs exiting in each frame [0(widest) .. 1(narrowest)]

• -wip (Default Value : 0.7) Description: Word insertion penalty

• -wlen (Default Value : 0.0256) Description: Window length

A.1.4 gausubvq

Tool Description

Example

Command-line Argument Description

Usage: [options]

• -eps (Default Value : 0.0001) Description: Stopping criterion: stop iterations if relative decrease in sq(error) < eps

• -iter (Default Value : 100) Description: No. of k-means iterations for clustering

• -log3table (Default Value : 1.0003) Description: Determines whether to use the log3 table or to compute the values at run time.

• -mean (Default Value : NONE) Description: Means file

• -mixw (Default Value : NONE) Description: Mixture weights file (needed, even though it’s not part of the computation)

• -mixwfloor (Default Value : 0.0000001) Description: Floor for non-zero mixture weight values in input model

• -stdev (Default Value : 0) Description: Use std. dev. (rather than var) in computing vector distances during clustering

• -subvq (Default Value : NONE) Description: Output subvq file (stdout if not specified)

• -svqrows (Default Value : 4096) Description: No. of codewords in output subvector codebooks

• -svspec (Default Value : NONE) Description: Subvectors specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)

• -var (Default Value : NONE) Description: Variances file

• -varfloor (Default Value : 0.0001) Description: Floor for non-zero variance values in input model

A.1.5 align

A.1.6 allphone

A.1.7 dag

A.1.8 astar

A.1.9 cepview

Tool Description

cepview can be used to view the cepstral coefficients created by wave2feat.

Command-line Argument Description

A.2 SphinxTrain: Executables

A.2.1 agg seg

Tool Description

Description: agg seg samples and accumulates feature vectors and uses them to quantize the vector space. This functionality is very useful in S2 training initialization. Not all feature vectors are used; they are sampled using the option -stride. Many of the other options of this command are currently obsolete. Please type -example yes to get a working argument list.

Example

Example: agg seg -segdmpdirs segdmpdir -segdmpfn dumpfile -segtype all -ctlfn ctl -cepdir cepdir -cepext .mfc -ceplen 13 -stride 10

Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-segdmpdirs Segment dump directories (Default Value : NONE)

-segdmpfn Segment dump file (Default Value : NONE)

-segidxfn Segment index into the dump file. (Default Value : NONE)

-segtype Type of segments to dump. all,st,phn (Default Value : st)

-cntfn Per id count file (Default Value : NONE)

-ddcodeext Extension of the VQ 2nd difference cepstrum files (Default Value : xcode)

-lsnfn Lexical transcript file (contains all utts in ctl file) (Default Value : NONE)

-sentdir Root directory of sent (lexical transcript) files (Default Value : NONE)

-sentext Extension of sent (lexical transcript) files (Default Value : NONE)

-ctlfn The control file name (enumerates utts in corpus) (Default Value : NONE)

-mllrctlfn Lists the MLLR transforms for each utterance (Default Value : NONE)

-mllrdir Directory for MLLR matrices (Default Value : NONE)

-nskip The number of utterances to skip in the control file (Default Value : 0)

-runlen The number of utterances to process after skipping (Default Value : -1)

-moddeffn Model definition file containing all the triphones in the corpus. State/transition matrix definitions are ignored. (Default Value : NONE)

-ts2cbfn Tied state to codebook mapping file (may be ’.semi.’ or ’.cont.’) (Default Value : NONE)

-cb2mllrfn Codebook to MLLR class mapping file (may be ’.1cls.’) (Default Value : NONE)

-dictfn Lexicon containing all the words in the lexical transcripts. (Default Value : NONE)

-fdictfn Lexicon containing all the filler words in the lexical transcripts. (Default Value : NONE)

-segdir Root directory of the state segmentation files (Default Value : NONE)

-segext Extension of the state segmentation files (Default Value : v8_seg)

-ccodedir Root directory of the VQ cepstrum files (Default Value : NONE)

-ccodeext Extension of the VQ cepstrum files (Default Value : ccode)

-dcodedir Root directory of the VQ difference cepstrum files (Default Value : NONE)

-dcodeext Extension of the VQ cepstrum files (Default Value : d2code)

-pcodedir Root directory of the VQ power files (Default Value : NONE)

-pcodeext Extension of the VQ power files (Default Value : p3code)

-ddcodedir Root directory of the VQ 2nd difference cepstrum files (Default Value : NONE)

-ddcodeext Extension of the VQ 2nd difference cepstrum files (Default Value : xcode)

-cepdir Root directory of the cepstrum files (Default Value : NONE)

-cepext Extension of the cepstrum files (Default Value : mfc)

-ceplen # of coefficients per cepstrum frame (Default Value : 13)

-agc The type of automatic gain control to do: max or emax (Default Value : max)

-cmn Do cepstral mean normalization based on current or prior utterance(s) (Default Value : current)

-varnorm Normalize utterance by its variance (Default Value : no)

-silcomp Do silence compression based on current or prior utterance (Default Value : none)

-feat Feature set to compute, e.g. 4s_12c_24d_3p_12dd or 1s_12c_12d_3p_12dd (Default Value : NONE)

-cachesz Feature cache size in Mb (Default Value : 200)

-stride Take every -stride’th frame when producing dmp (Default Value : 1)

A.2.2 bldtree

Tool Description

Description: Given a set of questions, build a decision tree for a given state of a particular phone. By default, decision trees are not built for filler phones or for the phone tagged SIL. One very confusing parameter of this tool is -stwt: if you are training an n-state HMM, you need to specify n values after this flag.

Example: bldtree -treefn tree -moddeffn mdef -mixwfn mixw -meanfn mean -varfn var -psetfn questions -stwt 1.0 0.05 0.01 -state 0 -ssplitmin 1 -ssplitmax 7 -ssplitthr 1e-10 -csplitmin 1 -csplitmax 2000 -csplitthr 1e-10 -cntthresh 10

Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-treefn Name of output tree file to produce (Default Value : NONE)

-moddeffn Model definition file of the discrete models (Default Value : NONE)

-ts2cbfn The type of models to build trees on (Default Value : .semi.)

-meanfn means file for tree building using continuous HMMs (Default Value : NONE)

-varfn variances file for tree building using continuous HMMs (Default Value : NONE)

-varfloor The minimum variance (Default Value : 0.00001)

-cntthresh Ignore all states with counts less than this (Default Value : 0.00001)

-mixwfn PDF’s for tree building using semicontinuous HMMs (Default Value : NONE)

-psetfn phone set definitions for phonetic questions (Default Value : NONE)

-phone Build trees over n-phones having this base phone (Default Value : NONE)

-state Build tree for this state position. E.g. For a three state HMM, this value can be 0,1 or 2. For a 5 state HMM, this value can be 0,1,2,3 or 4, and so on (Default Value : NONE)

-mwfloor Mixing weight floor for tree building (Default Value : 1e-4)

-stwt Weights on neighboring states. This flag needs a string of numbers equal to the number of HMM states (Default Value : NONE)

-ssplitthr Simple node splitting threshold (Default Value : 8e-4)

-ssplitmin Minimum # of simple tree splits to do. (Default Value : 1)

-ssplitmax The maximum number of bifurcations in the simple tree before it is used to build complex questions. (Default Value : 5)

-csplitthr Compound node splitting threshold (Default Value : 8e-4)

-csplitmin Minimum # of compound tree splits to do (Default Value : 1)

-csplitmax Maximum # of compound tree splits to do (Default Value : 100)

203 A.2.3 bw

Tool Description

Description: Strictly speaking, bw only implements the first part of the Baum-Welch algorithm. That is, it goes through the forward and backward algorithms and collects the necessary statistics for parameter estimation. The advantage of this architecture is that researchers can easily write programs to do parameter estimation without having to tweak the large and usually difficult Baum-Welch code.

In terms of functionality, one important thing you need to know is the pair of options -part and -npart. They allow you to split the training into N equal parts. Say there are M utterances in your control file; this enables you to run the training separately on each (M/N)-utterance part. The -part flag specifies which of these parts you currently want to train on. As an example, if your total number of parts (-npart) is 3, -part can take one of the values 1, 2 or 3.

To control the speed of the training, -abeam (controlling the alpha search) and -bbeam (controlling the beta search) can be used to limit the searching time. Notice that if the beams are too small, a path may not reach the end of the search, which results in estimation errors. Too many lost paths may also cause the training-set likelihood to stop increasing.

Several options allow the user to control the behaviour of bw so that silence or pauses are taken care of. For example, one could use -sildelfn to specify periods of time which are assumed to be silence. One could also use -sildel and -siltag to specify a silence phone and allow it to be optionally deleted. Finally, one can use the Viterbi training mode of the code; notice, though, that this mode is not always tested by CMU's researchers.

I also include the following paragraph from Rita's web page. I largely adapted it from there, and I think it is a pretty good wrap-up of the convergence issues: "bw is an iterative re-estimation process, so you have to run many passes of the Baum-Welch re-estimation over your training data. Each of these passes, or iterations, results in a slightly better set of models for the CI phones. However, since the objective function maximized in each of these passes is the likelihood, too many iterations would ultimately result in models which fit very closely to the training data. You might not want this to happen for many reasons. Typically, 5-8 iterations of Baum-Welch are sufficient for getting good estimates of the CI models. You can automatically determine the number of iterations that you need by looking at the total likelihood of the training data at the end of the first iteration and deciding on a "convergence ratio" of likelihoods. This is simply the ratio of the total likelihood in the current iteration to that of the previous iteration. As the models get more and more fitted to the training data in each iteration, the training data likelihoods typically increase monotonically. The convergence ratio is therefore a small positive number. The convergence ratio becomes smaller and smaller as the iterations progress, since each time the current models are a little less different from the previous ones. Convergence ratios are data and task specific, but typical values at which you may stop the Baum-Welch iterations for your CI training may range from 0.1-0.001. When the models are variance-normalized, the convergence ratios are much smaller."
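The stopping rule quoted above can be automated once you record the total likelihood reported after each pass. The following is a minimal Python sketch (not part of SphinxTrain; the function and the exact definition of the ratio are assumptions for illustration) that treats the relative change between successive total likelihoods as the convergence ratio and stops once it falls inside the 0.1-0.001 range suggested above:

    def should_stop(likelihoods, threshold=0.01):
        # likelihoods: total log likelihood of the training data after each
        # completed Baum-Welch iteration, oldest first.
        if len(likelihoods) < 2:
            return False
        prev, curr = likelihoods[-2], likelihoods[-1]
        # Relative change between the last two iterations; one possible reading
        # of the "convergence ratio" described above (an assumption).
        ratio = abs((curr - prev) / prev)
        return ratio < threshold

    # Example: the change from -2.95e6 to -2.93e6 is below one percent.
    history = [-3.41e6, -3.02e6, -2.95e6, -2.93e6]
    print(should_stop(history, threshold=0.01))   # -> True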

Example

Example: Command used to train continuous HMMs (beware, this only illustrates how to use the command; for details on how to tune it, please consult the manual): bw -moddeffn mdef -ts2cbfn .cont. -mixwfn mixw -tmatfn tmatn -meanfn mean -varfn var -dictfn dict -fdictfn fillerdict -ctlfn control files -part 1 -npart 1 -cepdir feature dir -cepext mfc -lsnfn transcription -accumdir accumdir -abeam 1e-200 -bbeam 1e-200 -meanreest yes -varreest yes -tmatreest yes -feat 1s 12c 12d 3p 12dd -ceplen 13. If you want to do parallel training on N machines, run N trainers with -part 1 -npart N, -part 2 -npart N, ..., -part N -npart N (see the sketch below).
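To make the -part/-npart pattern concrete, here is a minimal sketch (not part of SphinxTrain) that launches the N parts from Python; the bw flags are the ones documented below, while the model, dictionary and corpus file names are placeholders you would replace with your own:

    import subprocess

    NPART = 3   # number of equal parts to split the control file into

    procs = []
    for part in range(1, NPART + 1):
        cmd = [
            "bw",
            "-moddeffn", "mdef", "-ts2cbfn", ".cont.",
            "-mixwfn", "mixw", "-tmatfn", "tmat",
            "-meanfn", "mean", "-varfn", "var",
            "-dictfn", "dict", "-fdictfn", "fillerdict",
            "-ctlfn", "control_file", "-lsnfn", "transcription",
            "-cepdir", "feature_dir", "-cepext", "mfc",
            "-accumdir", "accumdir_%d" % part,    # one accumulator buffer per part
            "-part", str(part), "-npart", str(NPART),
        ]
        procs.append(subprocess.Popen(cmd))

    # Wait for every part to finish before running norm on the accumulators.
    for p in procs:
        p.wait()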

205 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-moddeffn The model definition file for the model inventory to train (De- fault Value : NONE)

-tmatfn The transition matrix parameter file name (Default Value : NONE)

-mixwfn The mixture weight parameter file name (Default Value : NONE)

-meanfn The mean parameter file name (Default Value : NONE)

-varfn The var parameter file name (Default Value : NONE)

-mwfloor Mixing weight smoothing floor (Default Value : 0.00001)

-tpfloor Transition probability smoothing floor (Default Value : 0.0001)

-varfloor The minimum variance (Default Value : 0.00001)

-topn Compute output probabilities based on this number of top scoring densities. (Default Value : 4)

-dictfn The content word dictionary (Default Value : NONE)

-fdictfn The filler word dictionary (e.g. SIL, SILb, ++COUGH++) (Default Value : NONE)

-ctlfn The training corpus control file (Default Value : NONE)

-nskip The number of utterances to skip at the beginning of a control file (Default Value : NONE)

-runlen The number of utterances to process in the (skipped) control file (Default Value : -1)

-part Identifies the corpus part number (range 1..NPART) (Default Value : NONE)

-npart Partition the corpus into this many equal sized subsets (Default Value : NONE)

-cepext The cepstrum file extension (Default Value : mfc)

-cepdir The cepstrum data root directory (Default Value : NONE)

-segext State segmentation file extension (Default Value : v8 seg)

-segdir State segmentation file root directory (Default Value : NONE)

-sentdir The sentence transcript file directory (Default Value : NONE)

-sentext The sentence transcript file extension (Default Value : sent)

-lsnfn The corpus word transcript file (Default Value : NONE)

-accumdir A path where accumulated counts are to be written. (Default Value : NONE)

-ceplen The length of the input feature (e.g. MFCC) vectors (Default Value : 13)

-agc The type of automatic gain control to do, max|emax (Default Value : max)

-cmn Do cepstral mean normalization based on current|prior utterance(s) (Default Value : current)

-varnorm Variance Normalize? (Default Value : no)

-silcomp Do silence compression based on current|prior utterance (Default Value : none)

-sildel Allow optional silence deletion in the Baum-Welch algorithm or the Viterbi algorithm. (Default Value : no)

-siltag Specify the tag of silence; by default it is SIL. (Default Value : SIL)

-abeam Evaluate alpha values subject to this beam (Default Value : 1e-100)

-bbeam Evaluate beta values (update reestimation sums) subject to this beam (Default Value : 1e-100)

-varreest Reestimate variances (Default Value : yes)

-meanreest Reestimate means (Default Value : yes)

-mixwreest Reestimate mixing weights (Default Value : yes)

-tmatreest Reestimate transition probability matrices (Default Value: yes)

-spkrxfrm A speaker transform to use for SAT modelling (Default Value : NONE)

-mllrmult Reestimate multiplicative term of MLLR adaptation of means (Default Value : no)

-mllradd Reestimate shift term of MLLR adaptation of means (Default Value : no)

-ts2cbfn Tied-state-to-codebook mapping file name (Default Value : NONE)

-feat This argument selects the derived feature computation to use. (De- fault Value : NONE)

-timing Controls whether profiling information is displayed (Default Value : yes)

-viterbi Controls whether Viterbi training is done (Default Value : no)

-2passvar Reestimate variances based on prior means (Default Value : no)

-sildelfn File which specifies frames of background ’silence’ to delete (De- fault Value : NONE)

-cb2mllrfn Codebook-to-MLLR-class mapping file name (Default Value : NONE)

-spthresh State posterior probability floor for reestimation. States below this are not counted (Default Value : 0.0)

-maxuttlen Maximum # of frames for an utt ( 0 => no fixed limit ) (Default Value : 0)

-ckptintv Checkpoint the reestimation sums every -ckptintv utts (Default Value : NONE)

208 A.2.4 cp parm

Tool Description

Description: Copy parameters such as means and variances from one model to another model. You need to specify a "copy operation file" in which each line describes one operation: dest_idx1 src_idx1, dest_idx2 src_idx2, dest_idx3 src_idx3, and so on. For example, the first line instructs cp parm to copy the source model with index src_idx1 to the destination model with index dest_idx1. This tool was still under heavy development as of 2004-08-07.

Example

Example: This example copies the mean from a single source file to 5 mean vectors in the destination file. First you need to prepare a copy-operation file containing the pairs 0 0, 1 0, 2 0, 3 0, 4 0 (one pair per line); let's call it cp op. Then run: cp parm -cpopsfn cp op -igaufn globalmean -ncbout 5 -ogaufn out.means -feat [Your feature type]

209 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-cpopsfn Copy operation file name (Default Value : NONE)

-imixwfn Input mixing weight file (Default Value : NONE)

-omixwfn Output mixing weight file (Default Value : NONE)

-nmixwout # of mixing weight arrays in the output file (Default Value : NONE)

-itmatfn Input transition matrix file (Default Value : NONE)

-otmatfn Output transition matrix file (Default Value : NONE)

-ntmatout # of transition matrices in the output file (Default Value : NONE)

-igaufn Input Gaussian density parameter file (Default Value : NONE)

-ogaufn Output Gaussian density parameter file (Default Value : NONE)

-ncbout # of codebooks in the output file (Default Value : NONE)

-feat Feature set to use (Default Value : c[1..L-1]d[1..L-1]c[0]d[0]dd[0]dd[1..L-1])

210 A.2.5 delint

Tool Description

Description: (Copied from Rita's web page.) Deleted interpolation is the final step in creating semi-continuous models. The output of deleted interpolation is a set of semi-continuous models in sphinx-3 format. These have to be further converted to sphinx-2 format if you want to use the SPHINX-II decoder. Deleted interpolation is an iterative process to interpolate between CD and CI mixture-weights to reduce the effects of overfitting. The data are divided into two sets, and the data from one set are used to estimate the optimal interpolation factor between CI and CD models trained from the other set. Then the two data sets are switched and this procedure is repeated, using the last estimated interpolation factor as an initialization for the current step. The switching is continued until the interpolation factor converges. To do this, we need *two* balanced data sets. Instead of the actual data, however, we use the Baum-Welch buffers, since the related math is convenient. We therefore need an *even* number of buffers that can be grouped into two sets. DI cannot be performed if you train using only one buffer. At least in the final iteration of the training, you must perform the training in (at least) two parts. You could also do this serially as one final iteration of training AFTER BW has converged, on a non-lsf setup. Note here that the norm executable used at the end of every Baum-Welch iteration also computes models from the buffers, but it does not require an even number of buffers. BW returns numerator terms and denominator terms for the final estimation, and norm performs the actual division. The number of buffers is not important, but you would need to run norm at the end of EVERY iteration of BW, even if you did the training in only one part. When you have multiple parts, norm sums up the numerator terms from the various buffers, and the denominator terms, and then does the division.

Example delint -accumdirs accumdir -moddeffn mdef -mixwfn mixw -cilambda 0.9 -feat c/1..L-1/,d/1..L-1/,c/0/d/0/dd/0/,dd/1..L-1/ -ceplen 13 -maxiter 4000

211 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-moddeffn The model definition file name (Default Value : NONE)

-mixwfn The mixture weight parameter file name (Default Value : NONE)

-accumdirs A path where accumulated counts are to be read. (Default Value : NONE)

-cilambda Weight of CI distributions with respect to uniform distribution (Default Value : 0.9)

-maxiter max # of iterations if no lambda convergence (Default Value : 100)

-feat Feature stream definition (Default Value : 4s 12c 24d 3p 12dd)

-ceplen Input feature vector length (e.g. MFCC) (Default Value : 13)

212 A.2.6 dict2tri

Tool Description

Description: find the triphone list from the dictionary

Example

Example : dict2tri -dictfn dict -basephnfn phonelist -btwtri yes

213 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-dictfn Dictionary file name (Default Value : NONE)

-basephnfn Base phone list file name (Default Value : NONE)

-btwtri Compute between-word triphone set (Default Value : yes)

214 A.2.7 inc comp

Tool Description

Description: Increase the number of mixtures of a continuous HMM. Notice that option -ninc actually means the final number of mixtures one wants to obtain. Usually it is a power of two. You are also recommended to split the number of mixtures in steps: 1 -> 2 -> 4 -> 8 and so on.

Example

Example : inc comp -ninc 16 -dcountfn mixture weights -inmixwfn mixture weights -outmixwfn out mixture weights -inmeanfn means -outmeanfn out means -invarfn variance -outvarfn out variance -ceplen 13

215 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-ninc The # of densities to split (Default Value : 1)

-inmixwfn The weight file for all N den/mix (Default Value : NONE)

-outmixwfn The output mixing weight file name w/ N+NINC density weights/mixture (Default Value : NONE)

-inmeanfn The source mean file w/ N means (Default Value : NONE)

-outmeanfn The new mean file w/ N+NINC means (Default Value : NONE)

-invarfn The source variance file w/ N means (Default Value : NONE)

-outvarfn The new variance file w/ N+NINC means (Default Value : NONE)

-dcountfn The density counts for the N source den/mix (Default Value : NONE)

-ceplen The length of the input feature (e.g. MFCC) vectors (Default Value : 13)

-feat Defines the acoustic feature set. (Default Value : NONE)

216 A.2.8 init gau

Tool Description

Description: (Copied from Rita's web manual) To initialize the means and variances, global values of these parameters are first estimated and then copied into appropriate positions in the parameter files. The global mean is computed using all the vectors you have in your feature files. This is usually a very large number, so the job is divided into many parts. At this stage you tell the Sphinx how many parts you want it to divide this operation into (depending on the computing facilities you have), and the Sphinx "accumulates" or gathers up the vectors for each part separately and writes them into an intermediate buffer on your machine. The executable init gau is used for this purpose.

Example

Example: init gau -accumdir accumdir -ctlfn controlfn -part 1 -npart 1 -cepdir cepdir -feat 1s 12c 12d 3p 12dd -ceplen 13

217 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-moddeffn Model definition file for the single density HMM’s to initialize (Default Value : NONE)

-ts2cbfn Tied-state-to-codebook mapping file (Default Value : NONE)

-accumdir Where to write mean/var counts (Default Value : NONE)

-meanfn Mean file for variance initialization (Default Value : NONE)

-ctlfn Control file of the training corpus (Default Value : NONE)

-nskip # of lines to skip in the control file (Default Value : NONE)

-runlen # of lines to process in the control file (after any skip) (Default Value : NONE)

-part Identifies the corpus part number (range 1..NPART) (Default Value : NONE)

-npart Partition the corpus into this many equal sized subsets (Default Value : NONE)

-lsnfn All word transcripts for the training corpus (consistent order w/ -ctlfn!) (Default Value : NONE)

-dictfn Dictionary for the content words (Default Value : NONE)

-fdictfn Dictionary for the filler words (Default Value : NONE)

-segdir Root directory of the training corpus state segmentation files. (De- fault Value : NONE)

-segext Extension of the training corpus state segmentation files. (Default Value : v8 seg)

-scaleseg Scale existing segmentation to fit new parameter stream length. (Default Value : no)

-cepdir Root directory of the training corpus cepstrum files. (Default Value : NONE)

-cepext Extension of the training corpus cepstrum files. (Default Value : mfc)

-silcomp Controls silence compression. (Default Value : none)

-cmn Controls cepstral mean normalization. (Default Value : current)

-varnorm Controls variance normalization. (Default Value : no)

-agc Controls automatic gain control. (Default Value : max)

-feat Controls which feature extraction algorithm is used. (Default Value : NONE)

-ceplen # of components in cepstrum vector (Default Value : 13)

219 A.2.9 init mixw

Tool Description

Description: Initialization of mixture weight

Example

Example: init mixw -src moddeffn src moddeffn -src ts2cbfn .semi. -src mixwfn src mixwfn -src meanfn src meanfn -src varfn src varfn -src tmatfn src tmatfn -dest moddeffn dest moddeffn -dest ts2cbfn .semi. -dest mixwfn dest mixwfn -dest meanfn dest meanfn -dest varfn dest varfn -dest tmatfn dest tmatfn -feat c/1..L-1/,d/1..L-1/,c/0/d/0/dd/0/,dd/1..L-1/ -ceplen 13

220 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-src moddeffn The source model definition file name (Default Value : NONE)

-src ts2cbfn The source state definition file name (Default Value : NONE)

-src mixwfn The source mixing weight file name (Default Value : NONE)

-src meanfn The source mean file name (Default Value : NONE)

-src varfn The source variance file name (Default Value : NONE)

-src tmatfn The source transition matrix file name (Default Value : NONE)

-dest moddeffn The destination model definition file name (Default Value : NONE)

-dest ts2cbfn The destination state definition file name (Default Value : NONE)

-dest mixwfn The destination mixing weight file name (Default Value : NONE)

-dest meanfn The destination mean file name (Default Value : NONE)

-dest varfn The destination variance file name (Default Value : NONE)

-dest tmatfn The destination transition matrix file name (Default Value : NONE)

-feat Derived feature computation to use (Default Value : 1s 12c 12d 3p 12dd)

-ceplen Size of the input feature vector length (Default Value : 13)

221 A.2.10 kmeans init

Tool Description

Description: Using the segment dump file generated by external software such as agg seg to initialize the model. It performs k-means clustering to create the initial means and variances for s2 hmms. This is an important step in the initialization of s2 training.
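For intuition only, the following generic k-means sketch (it is not the kmeans init code and makes no attempt to read segment dump files) clusters feature vectors into a requested number of densities and takes the per-cluster mean and variance as the initial parameters:

    import numpy as np

    def kmeans_initialize(vectors, ndensity, niter=20, seed=0):
        """Generic k-means sketch: return initial (means, variances)."""
        rng = np.random.default_rng(seed)
        # Random initialization from within the data, similar in spirit to
        # -method rkm with -ntrial random restarts (here a single trial).
        means = vectors[rng.choice(len(vectors), ndensity, replace=False)]
        for _ in range(niter):
            # Assign each vector to its nearest centroid.
            d = ((vectors[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
            assign = d.argmin(axis=1)
            # Re-estimate centroids; keep the old one if a cluster went empty.
            for k in range(ndensity):
                members = vectors[assign == k]
                if len(members):
                    means[k] = members.mean(axis=0)
        variances = np.stack([
            vectors[assign == k].var(axis=0) if (assign == k).any()
            else np.ones(vectors.shape[1])
            for k in range(ndensity)
        ])
        return means, variances

    feats = np.random.rand(1000, 13)          # e.g. 13-dimensional cepstra
    means, variances = kmeans_initialize(feats, ndensity=8)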

Example

Example : kmeans init -gthobj single -stride 1 -ntrial 1 -minratio 0.001 -ndensity 256 -meanfn outhmm/means -varfn outhmm/variances -reest no -segdmpdirs segdmpdir -segdmpfn dumpfile -ceplen 13

222 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-segdir Directory containing the state segmentations (Default Value : NONE)

-segext Extension of state segmentation files (Default Value : v8 seg)

-omoddeffn Model definition of output models (Default Value : NONE)

-dmoddeffn Model definition used for observation dump (Default Value : NONE)

-ts2cbfn Tied-state-to-codebook mapping file (Default Value : NONE)

-lsnfn LSN file name (word transcripts) (Default Value : NONE)

-dictfn Dictionary file name (Default Value : NONE)

-fdictfn Filler word dictionary file name (Default Value : NONE)

-cbcntfn File containing # of times a codebook ID appears in the corpus (Default Value : NONE)

-maxcbobs Cluster at most this many observations per codebook (Default Value : NONE)

-maxtotobs Cluster at most approximately this many observations over all codebooks (Default Value : NONE)

-featsel The feature stream ( 0, 1, ...) to select (Default Value : NONE)

-ctlfn The training corpus control file (Default Value : NONE)

-cepext The cepstrum file extension (Default Value : mfc)

-cepdir The cepstrum data root directory (Default Value : NONE)

-ceplen The length of the input feature (e.g. MFCC) vectors (Default Value : 13)

-agc The type of automatic gain control to do, max|emax (Default Value : max)

-cmn Do cepstral mean normalization based on current|prior utterance(s) (Default Value : current)

-varnorm Normalize utterance by its variance (Default Value : no)

-silcomp Do silence compression based on current|prior utterance (Default Value : none)

-feat This argument selects the derived feature computation to use. (Default Value : 1s 12c 12d 3p 12dd)

-segdmpdirs segment dmp directories (Default Value : NONE)

-segdmpfn segment dmp file (Default Value : NONE)

-segidxfn segment dmp index file (Default Value : NONE)

-fpcachesz # of file descriptors to cache for observation dmp files (Default Value : 3000)

-obscachesz # of Mbytes cache to use for observations (Default Value : 92)

-ndensity # of densities to initialize per tied state per feature (Default Value : NONE)

-ntrial random initialized K-means: # of trials of k-means w/ random ini- tialization from within corpus (Default Value : 5)

-minratio K-means: minimum convergence ratio, (p squerr - squerr) / p squerr (Default Value : 0.01)

-maxiter K-means: maximum # of iterations of updating to apply (Default Value : 100)

-mixwfn Output file for mixing weights (Default Value : NONE)

-meanfn Output file for means (Default Value : NONE)

-varfn Output file for variances (Default Value : NONE)

-method Initialization method. Options: rkm | fnkm (Default Value : rkm)

-reest Reestimate states according to usual definitions assuming vit seg. (Default Value : yes)

-niter # of iterations of reestimation to perform per state (Default Value : 20)

-gthobj Gather what kind of obj state, phn, frame (Default Value : state)

-stride Gather every -stride’th frame (Default Value : 32)

-runlen # of utts to process from ctl file (Default Value : NONE)

-tsoff Begin tied state reestimation with this tied state (Default Value : 0)

-tscnt Reestimate this many tied states (if available) (Default Value : NONE)

-tsrngfn The range of tied states reestimated expressed as offset,count (Default Value : NONE)

-vartiethr Tie variances if # of observations for state exceed this number (Default Value : 0)

225 A.2.11 map adapt

Tool Description

Description: Given a speaker-independent (or other baseline) model and a maximum-likelihood estimate of model parameters from adaptation data, map adapt will update (interpolate) the mean vectors to maximize the a posteriori probability of the adaptation data. The ML estimate is generated by running a single iteration of the forward-backward algorithm on the adaptation data, using the baseline models as the initial estimate. This can be accomplished using the programs 'bw' and 'norm'. The -mlmeanfn and -mlvarfn arguments are the output parameter files created by 'norm'. The -mlcntfn argument is the file containing observation counts, which is generated with the -dcountfn argument to 'norm'.
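For intuition, a commonly used form of the MAP mean update interpolates the ML mean with the speaker-independent mean, weighted by the observation count and the prior weight (-dnom weight). The sketch below shows that textbook formula; it is an assumption for illustration, and the exact update implemented by map adapt may differ in detail:

    import numpy as np

    def map_update_mean(si_mean, ml_mean, count, prior_weight=1.0):
        # Textbook MAP interpolation of a Gaussian mean (an assumption, see text):
        #   mu_map = (count * mu_ml + prior_weight * mu_si) / (count + prior_weight)
        # With little adaptation data (small count) the result stays close to the
        # speaker-independent mean; with lots of data it approaches the ML mean.
        return (count * ml_mean + prior_weight * si_mean) / (count + prior_weight)

    si = np.array([0.0, 0.0, 0.0])
    ml = np.array([1.0, 2.0, 3.0])
    print(map_update_mean(si, ml, count=4.0, prior_weight=1.0))   # [0.8 1.6 2.4]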

Example

Example: map adapt -mapmeanfn map model/means -mlmeanfn ml model/means -mlvarfn ml model/variances -simeanfn baseline/means -sivarfn baseline/variances -mlcntfn bwaccumdir/gauden counts

226 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-mapmeanfn The output MAP mean file (Default Value : NONE)

-simeanfn The input speaker-independent (baseline) mean file (Default Value : NONE)

-sivarfn The input speaker-independent (baseline) var file (Default Value : NONE)

-mlmeanfn The input ML mean file (output of ‘norm’ on adaptation data) (Default Value : NONE)

-mlvarfn The input ML var file (output of 'norm' on adaptation data) (Default Value : NONE)

-mlcntfn The ML total observation count (output of ‘norm -dcountfn’) file (Default Value : NONE)

-dnom weight The prior adaptation weight (Default Value : 1.0)

227 A.2.12 make quests

Tool Description

Description: This is an implementation of Dr. Rita Singh's automatic question generation. (Copied from Rita's comment) The current algorithm clusters CI distributions using a hybrid bottom-up top-down clustering algorithm to build linguistic questions for decision trees. (From Arthur: I need to do some tracing before I understand the internals of the code.)

Example

Example: make quests -moddeffn mdef -meanfn mean -varfn var -mixwfn mixwfn -npermute 8 -niter 1 -qstperstt 20 -tempfn temp -questfn questions

228 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-moddeffn Model definition file of the ci models (Default Value : NONE)

-meanfn means file for tree building using continuous HMMs (Default Value : NONE)

-varfn variances file for tree building using continuous HMMs (Default Value : NONE)

-varfloor The minimum variance (Default Value : 1.0e-08)

-mixwfn PDF’s for tree building using semicontinuous HMMs (Default Value : NONE)

-npermute The minimum variance (Default Value : 6)

-niter Number of iterations (Default Value : 0)

-qstperstt something per state (Default Value : 8)

-tempfn File to write temporary results to (important) (Default Value : /tmp/TEMP.QUESTS)

-questfn File to write questions to (Default Value : NONE)

-type HMM type (Default Value : NONE)

229 A.2.13 mixw interp

Tool Description

Description: A routine that provides an ad-hoc way of speaker adaptation by mixture-weight interpolation. The SD and SI models' mixture weights are first determined, and they act as the inputs of this program. The output is the interpolated mixture weight. The interpolation is controlled by the value lambda (-SIlambda).
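The interpolation itself is a simple convex combination. A minimal sketch, assuming -SIlambda is the weight given to the SI mixture weights (as documented below) and the remainder goes to the SD weights:

    import numpy as np

    def interpolate_mixw(si_mixw, sd_mixw, si_lambda=0.5):
        # out = lambda * SI + (1 - lambda) * SD, applied element-wise to the
        # mixture-weight arrays (same shape in both models).
        return si_lambda * si_mixw + (1.0 - si_lambda) * sd_mixw

    si = np.array([0.25, 0.25, 0.25, 0.25])
    sd = np.array([0.70, 0.10, 0.10, 0.10])
    print(interpolate_mixw(si, sd, si_lambda=0.7))   # stays closer to the SI weights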

Example

Example: mixw interp -SImixwfn si mixw -SDmixwfn sd mixw -outmixwfn final mixw -SIlambda 0.7

230 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-SImixwfn The SI mixture weight parameter file name (Default Value : NONE)

-SDmixwfn The SD mixture weight parameter file name (Default Value : NONE)

-outmixwfn The output interpolated mixture weight parameter file name (Default Value : NONE)

-SIlambda Weight given to SI mixing weights (Default Value : 0.5)

231 A.2.14 mk flat

Tool Description

Description: (Copied from Rita's web page) In flat-initialization, all mixture weights are set to be equal for all states, and all state transition probabilities are set to be equal. Unlike in continuous models, the means and variances of the codebook Gaussians are not given global values, since they are already estimated from the data in the vector quantization step. To flat-initialize the mixture weights, each component of each mixture-weight distribution of each feature stream is set to be a number equal to 1/N, where N is the codebook size. The mixture weights and transition matrices are initialized using the executable mk flat.
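As a concrete illustration of the 1/N rule (a sketch, not the mk flat code; the array sizes are hypothetical), with a codebook of size N every component of every mixture-weight distribution is set to the same value:

    import numpy as np

    n_states, n_streams, n_density = 3, 4, 256   # hypothetical sizes
    # Flat initialization: every component of every mixture-weight distribution
    # of every feature stream is set to 1/N, where N is the codebook size.
    mixw = np.full((n_states, n_streams, n_density), 1.0 / n_density)
    print(mixw.sum(axis=2))   # every distribution sums to 1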

Example

Example: mk flat -moddeffn CFS3.ci.mdef -topo CFS3.topology -mixwfn mixture weights -tmatfn transition matrices -nstream 1 -ndensity 1

232 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-moddeffn A SPHINX-III model definition file name (Default Value : NONE)

-mixwfn An output SPHINX-III mixing weight file name (Default Value : NONE)

-topo A template model topology matrix file (Default Value : NONE)

-tmatfn An output SPHINX-III transition matrix file name (Default Value : NONE)

-nstream Number of independent observation streams (Default Value : 4)

-ndensity Number of densities per mixture density (Default Value : 256)

233 A.2.15 mk mdef gen

Tool Description

Description: (Copied from Rita's comment, and I think it is a pretty good description.) Multi-function routine to generate an mdef for context-independent training, untied training, and an all-triphones mdef for state tying. Flow: if a triphone list is given, make the CI phone list and the CD phone list, and make the all-triphones mdef if it is needed. If a raw phone list is given, make the CI phone list (and the CI mdef if it is needed), generate the all-triphones list from the dictionary, and make the all-triphones mdef if it is needed. If neither a triphone list nor a raw phone list is given, quit. Then count the triphones and triphone types in the transcript, adjust the threshold according to -minocc and -maxtriphones, prune the triphone list, and make the untied mdef.

Example

Example: Create CI model definition file: mk mdef gen -phnlstfn phonefile -ocimdef ci mdeffile -n state pm 3. Create untied CD model definition file: mk mdef gen -phnlstfn rawphonefile -dictfn dict -fdictfn filler dict -lsnfn transcription -ountiedmdef untie mdef -n state pm 3 -maxtriphones 10000. Create tied CD model definition file: mk mdef gen -phnlstfn rawphone -oalltphnmdef untie mdef -dictfn dict -fdictfn filler dict -n state pm 3.

234 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-phnlstfn List of phones (Default Value : NONE)

-inCImdef Input CI model definition file. If given, -phnlstfn is ignored. (Default Value : NONE)

-triphnlstfn A SPHINX-III triphone file name. If given, -incimdef is ignored. (Default Value : NONE)

-inCDmdef Input CD model definition file. If given, -phnlstfn and -incimdef are ignored. (Default Value : NONE)

-dictfn Dictionary (Default Value : NONE)

-fdictfn Filler dictionary (Default Value : NONE)

-lsnfn Transcripts file (Default Value : NONE)

-n state pm No. of states per HMM (Default Value : 3)

-ocountfn Output phone and triphone counts file (Default Value : NONE)

-ocimdef Output CI model definition file (Default Value : NONE)

-oalltphnmdef Output all triphone model definition file (Default Value : NONE)

-ountiedmdef Output untied model definition file (Default Value : NONE)

-minocc Minimum number of times a triphone must occur for inclusion in the mdef file (Default Value : 1)

-maxtriphones Max. number of triphones desired in mdef file (Default Value : 100000)

235 A.2.16 mk mllr class

Tool Description

Description: Create the senone to mllr class mapping.

Example

Example: mk mllr class -nmap mapfile -nclass 4 -cb2mllrfn out.cb2mllr

236 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-nmap # of codebook -> MLLR class mappings (Default Value : NONE)

-nclass # of MLLR classes to map into (Default Value : NONE)

-cb2mllrfn Codebook-to-MLLR mapping file to create (Default Value : NONE)

237 A.2.17 mk model def

Tool Description

Description: (Copied from Eric Thayer's comment) Make SPHINX-III model definition files from a variety of input sources. One input source is a SPHINX-II senone mapping file and triphone file. Another is a list of phones (in which case the transition matrices are tied within the base phone class and the states are untied).

Example

Example: [Under construction]

238 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-moddeffn A SPHINX-III model definition file name (Default Value : NONE)

-triphonefn A SPHINX-II triphone file name (Default Value : NONE)

-phonelstfn List of phones (first 4 columns of phone def’ns) (Default Value : NONE)

-mapfn A SPHINX-II senone mapping file name (Default Value : NONE)

-n cdstate total # of CD senones/tied states (even though mapfn may ref- erence fewer) (Default Value : NONE)

-n cistate # of CI senones/tied states (Default Value : NONE)

-n tmat # of tied transition matrices (Default Value : NONE)

-n state pm # of states/model (Default Value : 5)

239 A.2.18 mk s2cb

Tool Description

Description: Convert s3 means and variances to s2 codebook format. The cepstrum, delta, delta-delta and power cepstrum basenames can be changed by users.

Example

Example: mk s2cb -meanfn s3mean -varfn s3var -cbdir s2dir -varfloor 0.00001

240 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-meanfn A SPHINX-III mean density parameter file name (Default Value : NONE)

-varfn A SPHINX-III variance density parameter file name (Default Value : NONE)

-cbdir A directory containing SPHINX-II 1PD codebooks (Default Value : NONE)

-varfloor Minimum variance value (Default Value : NONE)

-cepcb Basename of the cepstrum codebook (Default Value : cep.256)

-dcepcb Basename of the difference cepstrum codebook (Default Value : d2cep.256)

-powcb Basename of the power codebook (Default Value : p3cep.256)

-2dcepcb Basename of the 2nd order difference cepstrum codebook (De- fault Value : xcep.256)

-meanext Mean codebook extension (Default Value : vec)

-varext Variance codebook extension (Default Value : var)

241 A.2.19 mk s2hmm

Tool Description

Description: Convert s3 model definition file, mixture weight and transition matrices to s2 hmm format.

Example

Example: mk s2hmm -moddeffn s3mdef -mixwfn s3mixw -tmatfn s3tmat -hmmdir s2dir

242 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-moddeffn Model definition file for the S3 models (Default Value : NONE)

-tmatfn S3 transition matrix parameter file name (Default Value : NONE)

-tpfloor Transition probability smoothing floor (Default Value : 0.0001)

-mixwfn S3 mixing weight parameter file name (Default Value : NONE)

-hmmdir S2 model output directory (Default Value : NONE)

-hmmext Extension of a SPHINX-II model file. (Default Value : chmm)

-cepsenoext Extension of cepstrum senone weight file (Default Value : ccode)

-dcepsenoext Extension of difference cepstrum senone weight file (Default Value : d2code)

-powsenoext Extension of power senone weight file (Default Value : p3code)

-2dcepsenoext Extension of 2nd order difference cepstrum senone weight file (Default Value : xcode)

-mtype Model type sdm,dhmm (Default Value : sdm)

243 A.2.20 mk s2phone

Tool Description

Description: (Copied from Eric Thayer's comment) Make a SPHINX-II phone file given a list of phones (in SPHINX-III format). OK, this sounds like a bizarre function, but I did need it once.

Example

Example: mk s2phone -s2phonefn s2.phone -phonelstfn s3.phone

244 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-s2phonefn A SPHINX-II phone file name (Default Value : NONE)

-phonelstfn List of phones (Default Value : NONE)


245 A.2.21 mk s2phonemap

Tool Description

Description: Convert s3 model definition file to s2 phone and map files.

Example

Example: mk s2phonemap -moddeffn s3mean -phonefn s2/phone -mapfn s2/map

246 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-moddeffn The model definition file for the model to be converted (Default Value : NONE)

-phonefn Output phone file name (Default Value : NONE)

-mapfn Output map file name (Default Value : NONE)

247 A.2.22 mk s2sendump

Tool Description

Description: Convert an s3 model definition file and s3 mixture weight file to an s2 sendump file.

Example

Example: mk s2sendump -moddeffn s3mdef -mixwfn s3mixw -tpfloor 0.0000001 -feattype s2 4x -sendumpfn s2dir/sendump

248 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-moddeffn The model definition file for the model inventory to train (De- fault Value : NONE)

-mixwfn The mixture weight parameter file name (Default Value : NONE)

-sendumpfn Output sendump file name (Default Value : NONE)

-feattype Feature type e.g. s2 4x (Default Value : NONE)

-tpfloor Transition probability smoothing floor (Default Value : 0.0001)

249 A.2.23 mk s3gau

Tool Description

Description: Conversion from sphinx 2 codebook to sphinx3 means and variances

Example

Example: mk s3gau -meanfn s3mean -varfn s3var -cbdir s2dir -feat 4s 12c 24d 3p 12dd

250 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-meanfn A SPHINX-III mean density parameter file name (Default Value : NONE)

-varfn A SPHINX-III variance density parameter file name (Default Value : NONE)

-cbdir A directory containing SPHINX-II 1PD codebooks (Default Value : NONE)

-varfloor Minimum variance value (Default Value : NONE)

-cepcb Basename of the cepstrum codebook (Default Value : cep.256)

-dcepcb Basename of the difference cepstrum codebook (Default Value : d2cep.256)

-powcb Basename of the power codebook (Default Value : p3cep.256)

-2dcepcb Basename of the 2nd order difference cepstrum codebook (De- fault Value : xcep.256)

-meanext Mean codebook extension (Default Value : vec)

-varext Variance codebook extension (Default Value : var)

-fixpowvar Fix the power variance to the SPHINX-II standards (Default Value : false)

-feat Defines the feature set to use (Default Value : 4s 12c 24d 3p 12dd)

-ceplen Defines the input feature vector (e.g. MFCC) len (Default Value : 13)

251 A.2.24 mk s3mix

Tool Description

Description: Conversion from sphinx 2 hmms to sphinx3 mixture weights

Example

Example: (By Arthur: Not sure, may be obsolete) mk s3mixw -mixwfn s3mixw -moddeffn s3mdef -hmmdir s2hmmdir

252 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-mixwfn A SPHINX-III mixture weight parameter file name (Default Value : NONE)

-moddeffn A SPHINX-III model definition file name (Default Value : NONE)

-floor Floor weight value to apply before renormalization (Default Value : NONE)

-ci2cd CD weights initialized to CI weights (-hmmdir is to a set of CI mod- els) (Default Value : false)

-hmmdir A directory containing SPHINX-III models consistent with -moddeffn (Default Value : NONE)

-cepsenoext Extension of cepstrum senone weight file (Default Value : ccode)

-dcepsenoext Extension of difference cepstrum senone weight file (Default Value : d2code)

-powsenoext Extension of power senone weight file (Default Value : p3code)

-2dcepsenoext Extension of 2nd order difference cepstrum senone weight file (Default Value : xcode)

253 A.2.25 mk s3tmat

Tool Description

Example

254 Command-line Argument Description

255 A.2.26 mk ts2cb

Tool Description

Description: (Copied from Eric's comments) Create a tied-state-to-codebook mapping file for semi-continuous, phone dependent or fully continuous Gaussian density tying.

Example

Example: (By Arthur: Not sure, may be obsolete) mk ts2cb -moddeffn semi -ts2cbfn ts2cb

256 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-ts2cbfn A SPHINX-III tied-state-to-cb file name (Default Value : NONE)

-moddeffn A SPHINX-III model definition file name (Default Value : NONE)

-tyingtype Output a state parameter def file for fully continuous models (Default Value : semi)

-pclassfn A SPHINX-II phone class file name (Default Value : NONE)

257 A.2.27 mllr solve

Tool Description

Description: Given a set of mean accumulators, mllr solve can compute the transform matrix based on the maximum likelihood criterion. The means and variances are required as input through the arguments -meanfn and -varfn. For the linear regression equation y = Ax + b: if you specify only -mllrmult, then only A will be estimated; if you specify only -mllradd, then only b will be estimated.

Example

Example: The simplest case: mllr solve -outmllrfn output.matrix -accumdir accumdir -meanfn mean -varfn var. Adapt based on only CD senones: mllr solve -outmllrfn output.matrix -accumdir accumdir -meanfn mean -varfn var -cdonly yes -moddeffn mdef. Only adapt A: mllr solve -outmllrfn output.matrix -accumdir accumdir -meanfn mean -varfn var -mllrmult yes -mllradd no. Help and example: mllr solve -help yes -example yes

258 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-outmllrfn Output MLLR regression matrices file (Default Value : NONE)

-accumdir Paths containing reestimation sums from bw (Default Value : NONE)

-meanfn Baseline Gaussian density mean file (Default Value : NONE)

-varfn variance (baseline-var, or error-var) file (Default Value : NONE)

-cb2mllrfn Codebook to mllr class mapping index file (If it is given, ignore -cdonly) (Default Value : .1cls.)

-cdonly Use only CD senones for MLLR (If yes, -moddeffn should be given.) (Default Value : no)

-moddeffn Model Definition file (to get CD starting point for MLLR) (De- fault Value : NONE)

-mllrmult Reestimate full multiplicative term of MLLR adaptation of means (yes/no) (Default Value : yes)

-mllradd Reestimate shift term of MLLR adaptation of means (yes/no) (De- fault Value : yes)

-varfloor var floor value (Default Value : 1e-3)

259 A.2.28 mllr transform

Tool Description

Description: Given a set of MLLR transforms, mllr transform can transform the means according to the formula y = Ax + b. The output and input files are specified by -outmeanfn and -inmeanfn respectively. You may also transform only the context-dependent models using the option -cdonly. In that case you need to specify a model definition using -moddeffn.
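Applying the stated formula y = Ax + b to every mean vector is straightforward. A minimal numpy sketch for illustration (the real tool reads and writes Sphinx parameter files, which this does not attempt):

    import numpy as np

    def apply_mllr(means, A, b):
        # means: array of shape (n_gaussians, dim); A: (dim, dim); b: (dim,)
        # y = A x + b applied to each mean vector x.
        return means @ A.T + b

    dim = 3
    A = np.eye(dim) * 1.1            # example regression matrix
    b = np.array([0.5, 0.0, -0.2])   # example shift term
    means = np.array([[1.0, 2.0, 3.0],
                      [0.0, 1.0, 0.0]])
    print(apply_mllr(means, A, b))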

Example

Example: The simplest case: mllr transform -inmeanfn inMeans -outmeanfn outMeans -mllrmat matrix. Adapt only the CD phones: mllr transform -inmeanfn inMeans -outmeanfn outMeans -mllrmat matrix -cdonly yes -moddeffn mdef. Help and example: mllr transform -help yes -example yes

260 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-inmeanfn Input Gaussian mean file name (Default Value : NONE)

-outmeanfn Output Gaussian mean file name (Default Value : NONE)

-mllrmat The MLLR matrix file (Default Value : NONE)

-cb2mllrfn The codebook-to-MLLR class file. Override option -cdonly (De- fault Value : .1cls.)

-cdonly Use CD senones only. -moddeffn must be given. (Default Value : no)

-varfloor Value of the variance floor. Mainly for smoothing the mean. (De- fault Value : 1e-2)

-moddeffn Model Definition file. (Default Value : NONE)

-varfn Gaussian variance file name. For smoothing. (Default Value : NONE)


261 A.2.29 norm

Tool Description

Description: Compute the HMM parameters from the reestimation sums generated by bw.

Example

Example: norm -accumdir accumdir -mixwfn mixw -tmatfn tmat -meanfn mean -varfn variances

262 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-accumdir Paths containing reestimation sums from bw (Default Value : NONE)

-oaccumdir Path to contain the overall reestimation sums (Default Value : NONE)

-tmatfn Transition prob. matrix file to produce (if any) (Default Value : NONE)

-mixwfn Mixing weight file to produce (if any) (Default Value : NONE)

-meanfn Gaussian density mean file to produce (if any) (Default Value : NONE)

-varfn Gaussian density variance file to produce (if any) (Default Value : NONE)

-regmatfn MLLR regression matrices file to produce (if any) (Default Value : NONE)

-dcountfn Gaussian density count file to produce (Default Value : NONE)

-inmixwfn Use mixing weights from this file if never observed (Default Value : NONE)

-inmeanfn Use mean from this file if never observed (Default Value : NONE)

-invarfn Use var from this file if never observed (Default Value : NONE)

-feat The feature set to use. (Default Value : NONE)

-ceplen The vector length of the source features (e.g. MFCC) (Default Value : NONE)

263 A.2.30 param cnt

Tool Description

Description: Find the number of times each of the triphones listed in a given model definition file (specified by -moddeffn) occurred in a set of transcriptions, specified by -lsnfn.

Example

Example : param cnt -moddeffn mdef -ts2cbfn .cont. -ctlfn controlfile - lsnfn transcripts -dictfn dict -fdictfn fillerdict -paramtype phone

264 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-moddeffn Model definition file for the single density HMM’s to initialize (Default Value : NONE)

-ts2cbfn Tied-state-to-codebook mapping file (Default Value : NONE)

-ctlfn Control file of the training corpus (Default Value : NONE)

-part Identifies the corpus part number (range 1..NPART) (Default Value : NONE)

-npart Partition the corpus into this many equal sized subsets (Default Value : NONE)

-nskip # of lines to skip in the control file (Default Value : NONE)

-runlen # of lines to process in the control file (after any skip) (Default Value : NONE)

-lsnfn All word transcripts for the training corpus (consistent order w/ -ctlfn!) (Default Value : NONE)

-dictfn Dictionary for the content words (Default Value : NONE)

-fdictfn Dictionary for the filler words (Default Value : NONE)

-segdir Root directory of the training corpus state segmentation files. (De- fault Value : NONE)

-segext Extension of the training corpus state segmentation files. (Default Value : v8 seg)

-paramtype Parameter type to count ’state’, ’cb’, ’phone’ (Default Value : state)

265 A.2.31 printp

Tool Description

Description: Display numerical values of resources generated by Sphinx. Currently we support the following formats: -tmatfn : transition matrix; -mixwfn : mixture weight file; -gaufn : mean or variance; -gaucntfn : sufficient statistics for mean and diagonal covariance; -lambdafn : interpolation weight. Currently, some parameters, such as the mixture weights, can be specified as intervals. You can also specify with -sigfig the number of significant digits you would like to see, and normalize the parameters with -norm.

Example

Example: Print the mean of a Gaussian: printp -gaufn mean. Print the variance of a Gaussian: printp -gaufn var. Print the sufficient statistics: printp -gaucntfn gaucnt. Print the mixture weights: printp -mixwfn mixw. Print the interpolation weight: printp -lambdafn lambda

266 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-tmatfn The transition matrix parameter file name (Default Value : NONE)

-mixwfn The mixture weight parameter file name (Default Value : NONE)

-mixws Start id of mixing weight subinterval (Default Value : NONE)

-mixwe End id of mixing weight subinterval (Default Value : NONE)

-gaufn A Gaussian parameter file name (either for means or vars) (Default Value : NONE)

-gaucntfn A Gaussian parameter weighted vector file (Default Value : NONE)

-regmatcntfn MLLR regression matrix count file (Default Value : NONE)

-moddeffn The model definition file (Default Value : NONE)

-lambdafn The interpolation weight file (Default Value : NONE)

-lambdamin Print int. wt. >= this (Default Value : 0)

-lambdamax Print int. wt. <= this (Default Value : 1)

-norm Print normalized parameters (Default Value : yes)

-sigfig Number of significant digits in ’e’ notation (Default Value : 4)

267 A.2.32 prunetree

Tool Description

Description: Using prunetree, the bifurcations in the decision trees which resulted in the minimum increase in likelihood are progressively removed and replaced by the parent node. The selection of the branches to be pruned out is done across the entire collection of decision trees globally.

Example

Example: prunetree -itreedir input tree dir -nseno 5000 -otreedir output tree dir -moddeffn mdef -psetfn questions -minocc 100

268 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-moddeffn CI model definition file (Default Value : NONE)

-psetfn Phone set definition file (Default Value : NONE)

-itreedir Input tree directory (Default Value : NONE)

-otreedir Output tree directory (Default Value : NONE)

-nseno # of senones defined by the output trees (Default Value : NONE)

-minocc Prune nodes w/ fewer than this # of observations (Default Value : 0.0)

269 A.2.33 tiestate

Tool Description

Description : Create a model definition file with tied states from a model definition file without tied states.

Example

Example: tiestate -imoddeffn imdef -omoddeffn omdef -treedir trees -psetfn questions

This is an example of the input and output format, copied from Rita's web page. This is a hypothetical input of tiestate:

# triphone: (null)
# seno map: (null)
#
0.3
5 n base
34 n tri
156 n state map
117 n tied state
15 n tied ci state
5 n tied tmat
#
# Columns definitions
#base lft rt p attrib tmat ... state id's ...
SIL - - - filler 0 0 1 2 N
AE - - - n/a 1 3 4 5 N
AX - - - n/a 2 6 7 8 N
B - - - n/a 3 9 10 11 N
T - - - n/a 4 12 13 14 N
AE B T i n/a 1 15 16 17 N
AE T B i n/a 1 18 19 20 N
AX AX AX s n/a 2 21 22 23 N
AX AX B s n/a 2 24 25 26 N
AX AX SIL s n/a 2 27 28 29 N
AX AX T s n/a 2 30 31 32 N
AX B AX s n/a 2 33 34 35 N
AX B B s n/a 2 36 37 38 N
AX B SIL s n/a 2 39 40 41 N
AX B T s n/a 2 42 43 44 N
AX SIL AX s n/a 2 45 46 47 N
AX SIL B s n/a 2 48 49 50 N
AX SIL SIL s n/a 2 51 52 53 N
AX SIL T s n/a 2 54 55 56 N
AX T AX s n/a 2 57 58 59 N
AX T B s n/a 2 60 61 62 N
AX T SIL s n/a 2 63 64 65 N
AX T T s n/a 2 66 67 68 N
B AE AX e n/a 3 69 70 71 N
B AE B e n/a 3 72 73 74 N
B AE SIL e n/a 3 75 76 77 N
B AE T e n/a 3 78 79 80 N
B AX AE b n/a 3 81 82 83 N
B B AE b n/a 3 84 85 86 N
B SIL AE b n/a 3 87 88 89 N
B T AE b n/a 3 90 91 92 N
T AE AX e n/a 4 93 94 95 N
T AE B e n/a 4 96 97 98 N
T AE SIL e n/a 4 99 100 101 N
T AE T e n/a 4 102 103 104 N
T AX AE b n/a 4 105 106 107 N
T B AE b n/a 4 108 109 110 N
T SIL AE b n/a 4 111 112 113 N
T T AE b n/a 4 114 115 116 N

This is used as the base to give the following CD-tied model definition file with 39 tied states (senones):

# triphone: (null)
# seno map: (null)
#
0.3
5 n base
34 n tri
156 n state map
54 n tied state
15 n tied ci state
5 n tied tmat
#
# Columns definitions
#base lft rt p attrib tmat ... state id's ...
SIL - - - filler 0 0 1 2 N
AE - - - n/a 1 3 4 5 N
AX - - - n/a 2 6 7 8 N
B - - - n/a 3 9 10 11 N
T - - - n/a 4 12 13 14 N
AE B T i n/a 1 15 16 17 N
AE T B i n/a 1 18 16 19 N
AX AX AX s n/a 2 20 21 22 N
AX AX B s n/a 2 23 21 22 N
AX AX SIL s n/a 2 24 21 22 N
AX AX T s n/a 2 25 21 22 N
AX B AX s n/a 2 26 21 27 N
AX B B s n/a 2 23 21 27 N
AX B SIL s n/a 2 24 21 27 N
AX B T s n/a 2 25 21 27 N
AX SIL AX s n/a 2 26 21 28 N
AX SIL B s n/a 2 23 21 28 N
AX SIL SIL s n/a 2 24 21 28 N
AX SIL T s n/a 2 25 21 28 N
AX T AX s n/a 2 26 21 29 N
AX T B s n/a 2 23 21 29 N
AX T SIL s n/a 2 24 21 29 N
AX T T s n/a 2 25 21 29 N
B AE AX e n/a 3 30 31 32 N
B AE B e n/a 3 33 31 32 N
B AE SIL e n/a 3 34 31 32 N
B AE T e n/a 3 35 31 32 N
B AX AE b n/a 3 36 37 38 N
B B AE b n/a 3 36 37 39 N
B SIL AE b n/a 3 36 37 40 N
B T AE b n/a 3 36 37 41 N
T AE AX e n/a 4 42 43 44 N
T AE B e n/a 4 45 43 44 N
T AE SIL e n/a 4 46 43 44 N
T AE T e n/a 4 47 43 44 N
T AX AE b n/a 4 48 49 50 N
T B AE b n/a 4 48 49 51 N
T SIL AE b n/a 4 48 49 52 N
T T AE b n/a 4 48 49 53 N

271 Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-imoddeffn Untied-state model definition file (Default Value : NONE)

-omoddeffn Tied-state model definition file (Default Value : NONE)

-treedir SPHINX-III tree directory containing pruned trees (Default Value : NONE)

-psetfn Phone set definition file (Default Value : NONE)

272 A.2.34 wave2feat

Tool Description

Description: Create cepstra from an audio file. The main parameters that affect the final output are srate, lowerf, upperf, nfilt, and nfft. Typical values for nfft are 256 and 512. The input format can be raw audio (pcm, two byte signed integers), NIST Sphere (.sph) or MS Wav (.wav). Typical values are shown in Table A.1.

Table A.1: Typical parameter values for common sampling rates

srate    8000   11025   16000
lowerf    200     130     130
upperf   3700    5200   16000
nfilt      31      37      40

Example

Example: This example creates a cepstral file named "output.mfc" from an input audio file named "input.raw", a raw audio file (no header information) originally sampled at 16kHz. wave2feat -i input.raw -o output.mfc -raw yes -input_endian little -samprate 16000 -lowerf 130 -upperf 6800 -nfilt 40 -nfft 512
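If you want to inspect the resulting cepstral file from a script, the sketch below assumes the common SPHINX feature-file layout, namely a single 4-byte integer giving the number of float values followed by 32-bit floats, in the byte order implied by -mach endian; treat this layout as an assumption and verify it against your own files:

    import struct
    import numpy as np

    def read_mfc(path, ncep=13, byte_order="<"):
        """Read a SPHINX cepstral file (assumed layout: int32 count + float32 data)."""
        with open(path, "rb") as f:
            (count,) = struct.unpack(byte_order + "i", f.read(4))
            data = np.frombuffer(f.read(count * 4), dtype=byte_order + "f4")
        return data.reshape(-1, ncep)   # one row of ncep coefficients per frame

    # frames = read_mfc("output.mfc", ncep=13)
    # print(frames.shape)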

274 Command-line Argument Description

Usage: [options]

• -help (Default Value : no) Description: Shows the usage of the tool

• -example (Default Value : no) Description: Shows example of how to use the tool

• -i (Default Value : NO DEFAULT VALUE) Description: Single audio input file

• -o (Default Value : NO DEFAULT VALUE) Description: Single cepstral output file

• -c (Default Value : NO DEFAULT VALUE) Description: Control file for batch processing

• -di (Default Value : NO DEFAULT VALUE) Description: Input directory, input file names are relative to this, if defined

• -ei (Default Value : NO DEFAULT VALUE) Description: Input extension to be applied to all input files

• -do (Default Value : NO DEFAULT VALUE) Description: Output directory, output files are relative to this

• -eo (Default Value : NO DEFAULT VALUE) Description: Output extension to be applied to all output files

• -nist (Default Value : no) Description: Defines input format as NIST sphere

• -raw (Default Value : no) Description: Defines input format as raw binary data

• -mswav (Default Value : no) Description: Defines input format as Microsoft Wav (RIFF)

• -input endian (Default Value : little) Description: Endianness of input data, big or little, ignored if NIST or MS Wav

• -nchans (Default Value : 1) Description: Number of channels of data (interlaced samples assumed)

• -whichchan (Default Value : 1) Description: Channel to process

• -logspec (Default Value : no) Description: Write out logspectral files instead of cepstra

• -feat (Default Value : sphinx) Description: SPHINX format - big endian

• -mach endian (Default Value : little) Description: Endianness of machine, big or little

• -alpha (Default Value : 0.97) Description: Preemphasis parameter

• -srate (Default Value : 16000.0) Description: Sampling rate

• -frate (Default Value : 100) Description: Frame rate

• -wlen (Default Value : 0.025625) Description: Hamming window length

• -nfft (Default Value : 512) Description: Size of FFT

• -nfilt (Default Value : 40) Description: Number of filter banks

• -lowerf (Default Value : 133.33334) Description: Lower edge of filters

• -upperf (Default Value : 6855.4976) Description: Upper edge of filters

• -ncep (Default Value : 13) Description: Number of cep coefficients

• -doublebw (Default Value : no) Description: Use double bandwidth filters (same center freq)

• -blocksize (Default Value : 200000) Description: Block size, used to limit the number of samples used at a time when reading very large audio files

• -dither (Default Value : no) Description: Add 1/2-bit noise

• -verbose (Default Value : no) Description: Show input filenames

277 A.2.35 QUICK COUNT

Tool Description

Description: Generate a list of context-dependent phones (usually triphones) given a dictionary. A phone list is first created in the following format (one phone per line, followed by four zeros): phone1 0 0 0 0, phone2 0 0 0 0, phone3 0 0 0 0. The phone list used for CI model training must be used to generate this. Next a temporary dictionary is generated, which has all words except the filler words (words enclosed in ++()++). The entry "SIL SIL" must be added to the temporary dictionary, and the dictionary must be sorted in alphabetical order. -q: a mandatory flag that tells QUICK COUNT to consider all word pairs while constructing the phone list. -p: the formatted phone list. -b: a temporary dictionary file. -o: the output triphone list.

Example

Example: QUICK_COUNT -q yes -p phonelist -b dictionary -o outputlist
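To make the inputs concrete, the following is a minimal hypothetical sketch (the words and phones are invented for illustration). The formatted phone list passed with -p might contain:

AE 0 0 0 0
K 0 0 0 0
SIL 0 0 0 0
T 0 0 0 0

and the temporary dictionary passed with -b, with filler words removed, the SIL SIL entry added and the entries sorted alphabetically, might contain:

CAT K AE T
SIL SIL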

Command-line Argument Description

Usage: [options]

-help Shows the usage of the tool (Default Value : no)

-example Shows example of how to use the tool (Default Value : no)

-v Verbose (Default Value : no)

-q Flag to consider all word pairs (Default Value : yes)

-b Base file or the dictionary file (Default Value : NONE)

-p Phone list file (Default Value : NONE)

-P Another argument for the phone list file. Equivalent to -p. (Kept for backward compatibility.) (Default Value : NONE)

-o Output triphone list (Default Value : NONE)

-i Input control file (Default Value : NONE)

-I Another argument for the input control file. Equivalent to -i. (Kept for backward compatibility.) (Default Value : NONE)

-f Find examples (?) (Default Value : no)

-S Use single path to make triphones (Default Value : no)

-s Directory of sentences (Default Value : NONE)

A.3 SphinxTrain: Perl Training Scripts

A.3.1 00.verify/verify_all.pl

A.3.2 01.vector_quantize/slave.VQ.pl

A.3.3 02.ci_schmm/slave_convg.pl

A.3.4 03.makeuntiedmdef/make_untied_mdef.pl

A.3.5 04.cd_schmm_untied/slave_convg.pl

A.3.6 05.buildtrees/slave.treebuilder.pl

A.3.7 06.prunetree/prunetree.pl

A.3.8 07.cd-schmm/slave_convg.pl

A.3.9 08.deleted-interpolation/deleted_interpolation.pl

A.3.10 Summary of Configuration Parameters at the Script Level

A.3.11 Notes on parallelization

A.4 CMU-Cambridge LM Toolkit

A.4.1 binlm2arpa

binlm2arpa : Convert a binary format language model to ARPA format. Input : A binary format language model, as generated by idngram2lm. Output : An ARPA format language model. Command Line Syntax: binlm2arpa -binary .binlm -arpa .arpa [ -verbosity 2 ]

A.4.2 evallm

Input : A binary or ARPA format language model, as generated by idngram2lm. In addition, one may also specify a text stream to be used to compute the perplexity of the language model. The ARPA format language model does not contain information as to which words are context cues, so if an ARPA format language model is used, then a context cues file may be specified as well. Output : The program can run in one of two modes.

• compute-PP Output is the perplexity of the language model with respect to the input text stream.

• validate Output is confirmation or denial that the probabilities of all the words in the vocabulary, given the context supplied by the user, sum to one.

Command Line Syntax: evallm [ -binary .binlm | -arpa .arpa [ -context .ccs ] ] Notes: evallm can receive and process commands interactively. When it is run, it loads the language model specified at the command line, and waits for instructions from the user. The user may specify one of the following commands:

• perplexity Computes the perplexity of a given text. May optionally specify words from which to force back-off. Syntax: perplexity -text .text [ -probs .fprobs ] [ -oovs .oov_file ] [ -annotate .annotation_file ] [ -backoff_from_unk_inc | -backoff_from_unk_exc ] [ -backoff_from_ccs_inc | -backoff_from_ccs_exc ] [ -backoff_from_list .fblist ] [ -include_unks ] If the -probs parameter is specified, then each individual word probability will be written out to the specified probability stream file. If the -oovs parameter is specified, then any out-of-vocabulary (OOV) words which are encountered in the test set will be written out to the specified file. If the -annotate parameter is used, then an annotation file will be created, containing information on the probability of each word in the test set according to the language model, as well as the back-off class for each event. The back-off classes can be interpreted as follows: Assume we have a trigram language model, and are trying to predict P(C | A B). Then back-off class "3" means that the trigram "A B C" is contained in the model, and the probability was predicted based on that trigram. "3-2" and "3x2" mean that the model backed off and predicted the probability based on the bigram "B C"; "3-2" means that the context "A B" was found (so a back-off weight was applied), while "3x2" means that the context "A B" was not found. To force back-off from all unknown words, use the -backoff_from_unk_inc or -backoff_from_unk_exc flag (the difference being inclusive versus exclusive forced back-off). To force back-off from all context cues, use the -backoff_from_ccs_inc or -backoff_from_ccs_exc flag. One can also specify a list of words from which to back off, by storing this list in a forced back-off list file and using the -backoff_from_list switch. -include_unks results in a perplexity calculation in which the probability estimates for the unknown word are included.

• validate Calculate the sum of the probabilities of all the words in the vocabulary given the context specified by the user. Syntax: validate [ -backoff_from_unk_inc | -backoff_from_unk_exc ] [ -backoff_from_ccs_inc | -backoff_from_ccs_exc ] [ -backoff_from_list .fblist ] word1 word2 ... word(n-1) where n is the n in n-gram.

• help Displays a help message. Syntax: help

• quit Exits the program. Syntax: quit

Since the commands are read from standard input, a command file can be piped into it directly, thus removing the need for the program to run interactively: $ echo "perplexity -text b.text" | evallm -binary a.binlm
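For instance, to produce the per-word probability stream that the interpolate tool (Section A.4.5) expects as input, the -probs option of the perplexity command can be included in the piped command; the file names here are hypothetical:

$ echo "perplexity -text test.text -probs a.fprobs" | evallm -binary a.binlm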

A.4.3 idngram2lm

Input : An id n-gram file (in either binary (by default) or ASCII (if specified) mode), a vocabulary file, and (optionally) a context cues file. Output : A language model, in either binary format (to be read by evallm) or ARPA format. Command Line Syntax: idngram2lm -idngram .idngram -vocab .vocab -arpa .arpa | -binary .binlm [ -context .ccs ] [ -calc_mem | -buffer 100 | -spec_num y ... z ] [ -vocab_type 1 ] [ -oov_fraction 0.5 ] [ -two_byte_bo_weights [ -min_bo_weight nnnnn ] [ -max_bo_weight nnnnn ] [ -out_of_range_bo_weights ] ] [ -four_byte_counts ] [ -linear | -absolute | -good_turing | -witten_bell ] [ -disc_ranges 1 7 7 ] [ -cutoffs 0 ... 0 ] [ -min_unicount 0 ] [ -zeroton_fraction ] [ -ascii_input | -bin_input ] [ -n 3 ] [ -verbosity 2 ]

Usage: The -context parameter allows the user to specify a file containing a list of words within the vocabulary which will serve as context cues (for example, markers which indicate the beginnings of sentences and paragraphs). -calc_mem, -buffer and -spec_num x y ... z are options to dictate how it is decided how much memory should be allocated for the n-gram counts data structure. -calc_mem demands that the id n-gram file should be read twice, so that the amount of memory required can be calculated accurately. -buffer allows the user to specify an amount of memory to grab, and divides this memory equally between the 2-gram, 3-gram, ..., n-gram tables. -spec_num allows the user to specify exactly how many 2-grams, 3-grams, ..., and n-grams will need to be stored. The default is -buffer STD_MEM.

The toolkit provides for three types of vocabulary, which each handle out-of-vocabulary (OOV) words in different ways, and which are specified using the -vocab_type flag. A closed vocabulary (-vocab_type 0) model does not make any provision for OOVs. Any such words which appear in either the training or test data will cause an error. This type of model might be used in a command/control environment where the vocabulary is restricted to the commands that the system understands, and we can therefore guarantee that no OOVs will occur in the training or test data. An open vocabulary model allows for OOVs to occur; out-of-vocabulary words are all mapped to the same symbol. Two types of open vocabulary model are implemented in the toolkit. The first type (-vocab_type 1) treats this symbol the same way as any other word in the vocabulary. The second type (-vocab_type 2) of open vocabulary model is to cover situations where no OOVs occurred in the training data, but we wish to allow for the situation where they could occur in the test data. This situation could occur, for example, if we have a limited amount of training data, and we choose a vocabulary which provides 100% coverage of the training set. In this case, a proportion of the discount probability mass (specified by the -oov_fraction option) is reserved for OOV words.

The discounting strategy and its parameters are specified by the -linear, -absolute, -good_turing and -witten_bell options. With Good-Turing discounting, one can also specify the range over which discounting occurs, using the -disc_ranges option. The user can specify the cutoffs for the 2-grams, 3-grams, ..., n-grams by using the -cutoffs parameter. A cutoff of K means that n-grams occurring K or fewer times are discarded. If the parameter is omitted, then all the cutoffs are set to zero. The -zeroton_fraction option specifies that P(zeroton) (the unigram probability assigned to a vocabulary word that did not occur at all in the training data) will be at least that fraction of P(singleton) (the probability assigned to a vocabulary word that occurred exactly once in the training data).

By default, the n-gram counts are stored in two bytes by use of a count table (this allows the counts to exceed 65535, while keeping the data structures used to store the model compact). However, if more than 65535 distinct counts need to be stored (very unlikely, unless constructing 4-gram or higher language models using Good-Turing discounting), the -four_byte_counts option will need to be used. The floating point values of the back-off weights may be stored as two-byte integers, by using the -two_byte_alphas switch. This will introduce slight rounding errors, and so should only be used if memory is short. The -min_alpha, -max_alpha and -out_of_range_alphas parameters are used by the functions for using two-byte alphas. Their values should only be altered if the program instructs the user to do so. For further details, see the comments in the source file src/two_byte_alphas.c.
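A minimal sketch of an invocation, assuming hypothetical file names and a trigram model with the (default) Good-Turing discounting, might look like:

idngram2lm -idngram train.idngram -vocab train.vocab -n 3 -good_turing -binary train.binlm

Replacing -binary train.binlm with -arpa train.arpa would instead write the model in ARPA format.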

A.4.4 idngram2stats

idngram2stats : Report statistics for an id n-gram file. Input : An id n-gram file (in either binary (by default) or ASCII (if specified) mode). Output : A list of the frequency-of-frequencies for each of the 2-grams, ..., n-grams, which can enable the user to choose appropriate cut-offs, and to specify appropriate memory requirements with the -spec_num option in idngram2lm. Command Line Syntax: idngram2stats [ -n 3 ] [ -fof_size 50 ] [ -verbosity 2 ] [ -ascii_input ] < .idngram > .stats

A.4.5 interpolate

Input : Files containing probability streams, as generated by the -probs option of the perplexity command of evallm. Alternatively, these probabilities could be generated from a separate piece of code, which assigns word probabilities according to some other language model, for example a cache-based LM. This probability stream can then be linearly interpolated with one from a standard n-gram model using this tool. Output : An optimal set of interpolation weights for these probability streams, and (optionally) a probability stream corresponding to the linear combination of all the input streams, according to the optimal weights. The optimal weights are calculated using the expectation maximisation (EM) algorithm. Command Line Syntax : interpolate +[-] model1.fprobs +[-] model2.fprobs ... [ -test_all | -test_first n | -test_last n | -cv ] [ -tags .tags ] [ -captions .captions ] [ -in_lambdas .lambdas ] [ -out_lambdas .lambdas ] [ -stop_ratio 0.999 ] [ -probs .fprobs ] [ -max_probs 6000000 ]

The probability stream filenames are prefaced with a + (or a +- to indicate that the weighting of that model should be fixed). There is a range of options to determine which part of the data is used to calculate the weights, and which is used to test them. One can test the perplexity of the interpolated model based on all the data, using the -test_all option, in which case a set of lambdas must also be specified in advance (i.e., the lambdas are pre-specified, and not calculated by the program). One can specify that the first or last n items are the test set by use of the -test_first n or -test_last n options. Or one can perform two-way cross validation using the -cv option. If none of these are specified, then the whole of the data is used for weight estimation. By default, the initial interpolation weights are fixed as 1/(number of models), but alternative values can be stored in a file and used via the -in_lambdas option. The -probs switch allows the user to specify a filename in which to store the combined probability stream. The optimal lambdas can also be stored in a file by use of the -out_lambdas option. The program stops when the ratio of the test-set perplexity between two successive iterations is above the value specified in the -stop_ratio option. The data can be partitioned into different classes (with optimisation being performed separately on each class) using the -tags parameter. The tags file will contain an integer for each item in the probability streams corresponding to the class that the item belongs to. A file specified using the -captions option will allow the user to attach names to each of the classes. There should be one line in the captions file for each tag, with each line corresponding to the name of the tag. The amount of memory allocated to store the probability streams is dictated by the -max_probs option, which indicates the maximum number of probabilities allowed in one stream.
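As a hypothetical example, two probability streams a.fprobs and b.fprobs (each produced by the -probs option of evallm's perplexity command) could be combined, holding out the last 1000 items as the test set and saving the optimal weights:

interpolate +a.fprobs +b.fprobs -test_last 1000 -out_lambdas weights.lambdas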

A.4.6 mergeidngram

Input : A set of id n-gram files (in either binary (by default) or ASCII (if specified) format - note that they should all be in the same format, however). Output : One id n-gram file (in either binary (by default) or ASCII (if specified) format), containing the merged id n-grams from the input files. Notes : This utility can also be used to convert id n-gram files between ASCII and binary formats. Command Line Syntax: mergeidngram [ -n 3 ] [ -ascii_input ] [ -ascii_output ] .idngram_1 .idngram_2 ... .idngram_N > .idngram
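For example, a single binary id 3-gram file can be converted to ASCII by merging it with ASCII output requested (file names hypothetical):

mergeidngram -n 3 -ascii_output train.idngram > train.idngram.ascii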

A.4.7 ngram2mgram

Input : Either a word n-gram file, or an id n-gram file. Output : Either a word m-gram file, or an id m-gram file, where m < n. Command Line Syntax: ngram2mgram -n N -m M [ -binary | -ascii | -words ] < .ngram > .mgram The -binary, -ascii and -words flags correspond to the format of the input and output (note that the output file will be in the same format as the input file). -ascii and -binary denote id n-gram files, in ASCII and binary formats respectively, and -words denotes a word n-gram file.
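For instance, a binary id trigram file could be reduced to a bigram file as follows (file names hypothetical):

ngram2mgram -n 3 -m 2 -binary < train.id3gram > train.id2gram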

A.4.8 text2idngram

Input : Text stream, plus a vocabulary file. Output : List of every id n-gram which occurred in the text, along with its number of occurrences. Notes : Maps each word in the text stream to a short integer as soon as it has been read, thus enabling more n-grams to be stored and sorted in memory. Command Line Syntax: text2idngram -vocab .vocab [ -buffer 100 ] [ -temp /usr/tmp/ ] [ -files 20 ] [ -gzip | -compress ] [ -n 3 ] [ -write_ascii ] [ -fof_size 10 ] [ -verbosity 2 ] < .text > .idngram By default, the id n-gram file is written out as a binary file, unless the -write_ascii switch is used. The size of the buffer which is used to store the n-grams can be specified using the -buffer parameter. This value is in megabytes, and the default value can be changed from 100 by changing the value of STD_MEM in the file src/toolkit.h before compiling the toolkit. The program will also report the frequency-of-frequencies of the n-grams, and the corresponding recommended values for the -spec_num parameters of idngram2lm. The -fof_size parameter allows the user to specify the length of this list. A value of 0 will result in no list being displayed. The -temp option allows the user to specify where the program should store its temporary files. In the case of really huge quantities of data, it may be that more temporary files are generated than can be opened at one time by the filing system. In this case, the temporary files will be merged in chunks, and the -files parameter can be used to specify how many files are allowed to be open at one time.

A.4.9 text2wfreq

Input : Text stream Output : List of every word which occurred in the text, along with its number of occurrences. Notes : Uses a hash table to provide an efficient method of counting word occurrences. The output list is not sorted (due to the "randomness" of the hash table), but can easily be sorted into the user's desired order by the UNIX sort command. In any case, the output does not need to be sorted in order to serve as input for wfreq2vocab. Usage: text2wfreq [ -hash 1000000 ] [ -verbosity 2 ] < .text > .wfreq Higher values for the -hash parameter require more memory, but can reduce computation time.
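Since the output is unsorted, a typical invocation pipes it through the UNIX sort command when a sorted word frequency list is desired (file names hypothetical):

text2wfreq < train.text | sort > train.wfreq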

A.4.10 text2wngram

Input : Text stream Output : List of every word n-gram which occurred in the text, along with its number of occurrences. The maximum numbers of characters and words that can be stored in the buffer are given by the -chars and -words options. The default numbers of characters and words are chosen so that the memory requirement of the program is approximately that of STD_MEM, and the number of characters is seven times greater than the number of words. The -temp option allows the user to specify where the program should store its temporary files. Usage : text2wngram [ -n 3 ] [ -temp /usr/tmp/ ] [ -chars 63636363 ] [ -words 9090909 ] [ -gzip | -compress ] [ -verbosity 2 ] < .text > .wngram

A.4.11 wfreq2vocab

wfreq2vocab : Generate a vocabulary file from a word frequency file. Input : A word unigram file, as produced by text2wfreq. Output : A vocabulary file. The -top parameter allows the user to specify the size of the vocabulary; if the program is called with the option -top 20000, then the vocabulary will consist of the 20,000 most common words. The -gt parameter allows the user to specify the number of times that a word must occur to be included in the vocabulary; if the program is called with the option -gt 10, then the vocabulary will consist of all the words which occurred more than 10 times. If neither the -gt nor the -top parameter is specified, then the program runs with the default setting of taking the top 20,000 words. The -records parameter allows the user to specify how many word and count records to allocate memory for. If the number of words in the input exceeds this number, then the program will fail, but a high number will obviously result in a higher memory requirement. Usage : wfreq2vocab [ -top 20000 | -gt 10 ] [ -records 3000000 ] [ -verbosity 2 ] < .wfreq > .vocab
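A minimal sketch, keeping the 20,000 most frequent words from a hypothetical word frequency file produced by text2wfreq:

wfreq2vocab -top 20000 < train.wfreq > train.vocab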

A.4.12 wngram2idngram

Input : Word n-gram file, plus a vocabulary file. Output : List of every id n-gram which occurred in the text, along with its number of occurrences, in either ASCII or binary format. Note : For this program to be successful, it is important that the vocabulary file is in alphabetical order. If you are using vocabularies generated by the wfreq2vocab tool then this should not be an issue, as they will already be alphabetically sorted. Command Line Syntax: wngram2idngram -vocab .vocab [ -buffer 100 ] [ -hash 200000 ] [ -temp /usr/tmp/ ] [ -files 20 ] [ -gzip | -compress ] [ -verbosity 2 ] [ -n 3 ] [ -write_ascii ] < .wngram > .idngram The size of the buffer which is used to store the n-grams can be specified using the -buffer parameter. This value is in megabytes, and the default value can be changed from 100 by changing the value of STD_MEM in the file src/toolkit.h before compiling the toolkit. The program will also report the frequency-of-frequencies of the n-grams, and the corresponding recommended values for the -spec_num parameters of idngram2lm. The -fof_size parameter allows the user to specify the length of this list. A value of 0 will result in no list being displayed. Higher values for the -hash parameter require more memory, but can reduce computation time. The -temp option allows the user to specify where the program should store its temporary files. The -files parameter is used to specify the number of files which can be open at one time.
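Putting the tools of this appendix together, a typical pipeline for building and evaluating a trigram language model from a training text might look as follows; all file names are hypothetical placeholders:

text2wfreq < train.text | wfreq2vocab -top 20000 > train.vocab
text2idngram -vocab train.vocab -n 3 < train.text > train.idngram
idngram2lm -idngram train.idngram -vocab train.vocab -n 3 -binary train.binlm
echo "perplexity -text test.text" | evallm -binary train.binlm
binlm2arpa -binary train.binlm -arpa train.arpa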


Index

1s_12c_12d_3p_12dd, 98
1s_c_d_dd, 98
4s_12c_24d_3p_12dd, 98
c/0..L-1/d/0..L-1/dd/0..L-1/, 98
c/1..L-1/,d/1..L-1/,c/0/d/0/dd/0/,dd/1..L-1/, 98
c/1..L-1/d/1..L-1/c/0/d/0/dd/0/dd/1..L-1/, 98
cepview, 102
CMU Communicator Toolkit, 27
CMU lex, 27
CMUdict, 27
cmudict, 27
Data collection, 77
Dynamic Coefficient: Formulae, 100
Dynamic coefficients, 98
    Naming conventions, 98
Festival, 26
flat initialization, 109
Front-end
    Block Diagram, 92
    filter banks, 97
    Framing, 95
    Mel Cepstrum, 96
    Mel Spectrum, 95
    Parameters, 93
        FFT Size, 93
        frame rate, 93
        Lower Filter frequency, 93
        Number of cepstra, 93
        Pre-emphasis coefficient, 93
        Pre-emphasis filter, 93
        Sampling rate, 93
        Upper Filter frequency, 93
        Window length, 93
    Power Spectrum, 95
    Pre-emphasis, 93
    Windowing, 95
Hephaestus, 26
Hieroglyph
    Origin, xviii
    writing process, xix
Hieroglyphs, 22
License, 3
    CMU-Cambridge LM Toolkit, 6
    Sphinx 2, 4
    Sphinx 3, 4
    Sphinx 4, 5
    SphinxTrain, 4
Linguistic Data Consortium (LDC), 83
Mike Seltzer, 91
Mississippi State University, 83
Off-line evaluation, 76
OGI, 83
On-line evaluation, 75
OpenSALT, 28
OpenVXML, 28
PhD, 86
Push-here Dummy (PhD) script, 86
s2_4x, 98
s3_1x39, 98
sclite, 76
Sentence Error Rate (SER), 76
    definition, 76
Sphinx 2
    Sphinx 2.4, 8
    Sphinx 2.5, 7
Sphinx 3
    Sphinx 3.4, 16
    Sphinx 3.5, 15
    Sphinx 3.6, 10
    Sphinx 3.6.1, 10
    Sphinx 3.6.2, 9
    Sphinx 3.6.3, 8
Sphinx Developers, xvii
SphinxTrain
    Decision Tree Manipulation Routines, 109
    Feature Extraction Routines, 108
    Miscellaneous Routines, 109
    Model Adaptation Routines, 108
    Model Conversion Routines, 108
    Model Estimation Routines, 109
    Model Initialization Routines, 109
    Model Manipulation Routines, 108
    Software Architecture, 107
text normalization, 83
Transcription, 82
Word Error Rate (WER), 76
    definition, 76
