Automatic Speech Recognition and Natural Language Processing
Research Project Report: Automatic Speech Recognition and Natural Language Processing
Christoph Kuhr (11046220)
18.03.2016

Abstract

This report describes the theoretical background, the work and the results of a research project at the Cologne University of Applied Sciences, conducted from September 2015 to March 2016. The project Automatic Speech Recognition and Natural Language Processing outlines state-of-the-art research in multiple fields of computational linguistics. To demonstrate a use case, an automatic speech recognition (ASR) system with natural language processing (NLP) capabilities is implemented. The aspects under investigation are the deep learning algorithms used in ASR systems and their acceleration. Current concepts in computational linguistics and the customization of the corpora for both speech recognition and language processing are investigated as well.

Contents

1 Introduction
2 Automatic Speech Recognition
2.1 Brief History
2.2 Theory
2.2.1 Speech and its Acoustic Features
2.2.2 Hidden-Markov-Model
2.2.3 N-grams and Statistical Language Modeling
2.2.3.1 Decoding
2.2.3.2 Smoothing
2.2.3.3 Quality Measurements
2.2.4 Knowledge Base
2.2.4.1 Phonetic Dictionary
2.2.4.2 Acoustic Model
2.2.4.3 Language Model
2.3 Recent Research
2.3.1 Recurrent Neural Network Language Model
2.3.2 Continuous Space Language Model
2.3.3 Case Studies
2.3.3.1 Google Voice Search in Mobile Applications
2.3.3.2 Emotion-Detection Voice Applications in Automated Call Centers
2.3.3.3 You're as Sick as you Sound
3 Natural Language Processing
3.1 Brief History
3.2 Theory
3.2.1 Parsing
3.2.2 Chunking
3.2.3 Part-of-Speech (POS) Tagging
3.2.4 Stemming
3.2.4.1 Suffix-stripping
3.2.4.2 Lemmatisation
3.2.4.3 Stochastic Algorithms
3.2.4.4 Matching Algorithms
3.3 Recent Research
3.3.1 POS Tagging and Chunking
3.3.2 Native Language Acquisition
3.3.3 Simple Synchrony Networks (SSN)
4 Test System, Training and Experiments
4.1 Heterogeneous System Architecture (HSA)
4.2 CMU-Sphinx
4.2.1 Sphinx-4
4.2.2 Sphinxbase
4.2.3 Pocketsphinx
4.2.4 Sphinxtrain
4.3 Knowledge Base
4.3.1 Voxforge Dictionary, Acoustic and Language Models
4.3.2 German Corpus for Building Language Models
4.3.2.1 Wikipedia Dump
4.3.2.2 Gutenberg Online Archive
4.3.2.3 Zeit Online Archive
4.3.3 Language Model Training and Toolkits
4.3.3.1 SRILM Toolkit
4.3.3.2 RNNLM Toolkit
4.3.3.3 CSLM Toolkit
4.3.3.4 CMUCLM Toolkit
4.4 NLP with Parsers, Treebanks and Taggers
4.4.1 Pattern
4.4.2 Stanford Parser
4.4.3 NLTK Unigram POS Tagger
4.4.4 GermaNet
4.5 psApp.py - A Python ASR and NLP Tool
4.5.1 User Interface
4.5.2 Recognition Process
4.5.2.1 Activation by Keyword Search
4.5.2.2 Continuous Speech Recognition
4.5.2.3 Processing of Recognized Sentences
4.6 Experiments
4.6.1 ASR
4.6.1.1 Test Environments and Configurations
4.6.1.2 Test Sets
4.6.1.3 Test Results
4.6.2 NLP
4.6.2.1 Imperative
4.6.2.2 Question - Yes-No
4.6.2.3 Question - Probe
4.6.2.4 Identify Numerics
5 Conclusions
5.1 Automatic Speech Recognition
5.2 Natural Language Processing
5.3 Future Work
Appendices

A Wikipedia Dump Cleanup Python Script
B Gutenberg Online Cleanup Python Script
C Building RNNLM Toolkit from Source
D RNNLM Toolkit Training Output
E librnnlm.cpp with clBLAS SGEMV Implementation
F Building CMULM Toolkit from Source
G CMULM Toolkit Training Output
H Building SRILM Toolkit from Source
I SRILM Toolkit Training Output
J Building Pocketsphinx from Source
J.1 Dependencies
J.2 Sphinxbase
J.3 Pocketsphinx
K Recognition Process C++ Implementation
L Numeric Reduction Python Parser
M CKY Recognition Algorithm
N Probabilistic CKY Recognition Algorithm
O Earley Recognition Algorithm

List of Figures

2.1 Spectrogram of the Word Sequence "Hello World"
2.2 Hidden Markov Model Chain of an Utterance
2.3 N-gram of the Sentence Ist die Sonne grün?
2.4 Architecture of a Recurrent Neural Network
2.5 Architecture of a CSLM Neural Network
2.6 Block Diagram of Google Search by Voice
2.7 Anger Detection System
2.8 HMM for Anger Turn Prediction
3.9 POS Tagging Decision Tree
3.10 Decision Tree with Classification Error Count
3.11 Simple Synchrony Networks unfolded over a derivation sequence
4.12 Non-Uniform Memory Access
4.13 Heterogeneous Unified Memory Access
4.14 Sphinx-4 Decoder Architecture
4.15 Python Application psApp.py UI

List of Tables

2.1 HF Optimizer Algorithm
4.2 "Heimdall" IPA and Arpabet Notation
4.3 Pocketsphinx Keyword Search Parameters
4.4 Pocketsphinx Voxforge Parameters
4.5 Test Run Configuration for the Voxforge Language Model
4.6 Test Run Configuration for the merged Gutenberg and Voxforge Language Model
4.7 Recognition Test Set Voxforge
4.8 Recognition Test Set Voxforge and Gutenberg-Online
4.9 Recognition Test Results showing the Number of Misrecognized Words of Voxforge
4.10 Recognition Test Results showing the Number of Misrecognized Words of Voxforge and Gutenberg-Online
4.11 NLP Imperative: Chunks, POS Tags, Relations
4.12 NLP Question - Yes-No: Chunks, POS Tags, Relations
4.13 NLP Question - Probe: Chunks, POS Tags, Relations
4.14 NLP Numerical Expression Reduction
M.15 CKY Recognition Algorithm
N.16 Probabilistic CKY Recognition Algorithm
O.17 Earley Recognition Algorithm