Loughborough University Institutional Repository

Acceleration of Sphinx 3 for implementation in embedded systems

This item was submitted to Loughborough University's Institutional Repository by the author.

Additional Information:

• A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of Doctor of Philosophy of Loughborough University.

Metadata Record: https://dspace.lboro.ac.uk/2134/9237

Publisher: SUNYI HU

Please cite the published version.

This item was submitted to Loughborough’s Institutional Repository (https://dspace.lboro.ac.uk/) by the author and is made available under the following Creative Commons Licence conditions.

For the full text of this licence, please go to: http://creativecommons.org/licenses/by-nc-nd/2.5/

ACCELERATION OF SPHINX 3 FOR

IMPLEMENTATION IN EMBEDDED SYSTEMS

BY

SUNYI HU

A DOCTORAL THESIS

SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE AWARD OF

DOCTOR OF PHILOSOPHY OF LOUGHBOROUGH UNIVERSITY

DECEMBER 2011

© 2011 SUNYI HU

DEDICATED TO MY BELOVED

FATHER ZIJING HU, MOTHER HAIOU YU, SISTER YANJUAN HU,

WIFE LIYA YING AND SON BOFEI HU.

ABSTRACT

This thesis presents a fully pipelined and parameterised parallel hardware implementation of a large vocabulary, user-independent and continuous speech recognition system for use in mobile applications. Algorithm acceleration is achieved by realising in hardware the most time-consuming components of the speech recognition system. By adopting a parallel solution, the necessary calculations can be completed in a sufficiently short elapsed time for embedded target systems.

Sphinx 3 is identified as an appropriate speech recognition system for this work and is profiled to determine the most time-consuming parts of the code. As these parts of the code employ calculations based on floating-point operations, which are not suited to high-performance, low-power execution on embedded systems, the calculations have been converted to scaled integer operations. It is verified using the AN4, RM1 and TIMIT speech databases that the scaled integer version of the speech recognition system can achieve a similar word error rate to the original floating-point version, while taking less than 8% of the calculation time used by the original version.

The scaled integer version of the speech recognition system is redesigned in VHDL for parallel implementation in electronic hardware. The designs of a calculation module and a data module are described, both of which can be configured according to the number of parallel units; the data module can additionally be configured according to the total numbers of feature vectors and senones used in the speech representation. The hardware designs are synthesised to a range of FPGAs and the results show that the larger Virtex7 devices are capable of holding several thousand senones, which is sufficient for most recognition tasks. Hardware designs with different numbers of parallel calculation units are simulated at both the behavioural and platform-based levels, and the resulting implementations are able to operate in real time. The results show that the hardware implementation, even with only one calculation unit, can perform the same calculations almost 80 times faster than a modern embedded microprocessor, even when operating at only one fifth of the clock frequency. With larger numbers of parallel calculation units, the whole design can operate at even lower clock frequencies, saving power while maintaining a rapid calculation speed. The hardware designs are also implemented on a physical system having both an FPGA and a microprocessor board to demonstrate the operational capabilities of a full system.

ACKNOWLEDGEMENTS

First of all, I would like to take this opportunity to thank my supervisors Dr. David Mulvaney and Dr. Sekharjit Datta for their continual guidance and support throughout this research. Without their valuable advice and assistance in our weekly meetings, it would not have been possible for me to achieve all the objectives of this research and complete this thesis. My thanks also go to Dr. Vassilios Chouliaras and Dr. Omar Farooq for their help in the earlier stages of this research, and to Professor Bryan Woodward for his help with the correction of this thesis.

Second, I would like to express my deep gratitude to my father Zijing Hu, my mother Haiou Yu and my sister Yanjuan Hu for their unconditional love, enormous support and encouragement in all aspects of my life. My special thanks go to my wife Liya Ying, for her endless love, great understanding and patience throughout my long period of study in the UK. I also want to thank all of my family members who took care of my lovely son Bofei Hu when I was away from home.

Last but not least, I would like to thank all of my friends and research colleagues who have helped me to complete this research work. Thank you for everything you have done to make my life in Loughborough so memorable.


TABLE OF CONTENTS

Abstract ...... i!

Acknowledgements ...... ii!

Table of Contents ...... iii!

List of Figures ...... x!

List of Tables ...... xviii!

List of Abbreviations ...... xx!

List of Publications ...... xxiv!

1! Introduction ...... 1!

1.1! Motivation ...... 4!

1.2! Aim and objectives ...... 5!

1.3! Original contributions ...... 7!

1.4! Structure of the thesis ...... 7!

2! Background of automatic speech recognition ...... 9!

2.1! Overview of automatic speech recognition ...... 10!

2.1.1! ASR difficulties ...... 10!

2.1.2! ASR types...... 11!

2.1.3! ASR performance ...... 13!

2.1.4! ASR applications ...... 14!

2.2! Framework of automatic speech recognition ...... 17!

2.2.1! Front-end feature extraction ...... 18!

2.2.2! Back-end pattern classification ...... 21!

2.3! Available ASR software ...... 40!

2.3.1! Software selection criteria ...... 40!

2.3.2! Types of ASR software ...... 41!

2.3.3! Free software comparison ...... 42!

2.4! ASR on mobile phones...... 49!

2.4.1! Network speech recognition ...... 49!

2.4.2! Distributed speech recognition ...... 50!

2.4.3! Embedded speech recognition ...... 51!

2.5! Embedded ASR research ...... 51!

2.5.1! Whole speech recognition by hardware ...... 52!

2.5.2! Back-end search acceleration by hardware ...... 54!

2.5.3! GMM scoring acceleration in FPGA ...... 55!

2.5.4! GMM scoring acceleration in ASIC ...... 57!

2.5.5! GMM scoring acceleration in GPU ...... 59!

2.6! Summary ...... 60!

3! Speech recognition system based on Sphinx 3 ...... 62!

3.1! Sphinx recogniser selection ...... 63!

3.1.1! Overview ...... 63!

3.1.2! Sphinx 1 ...... 64!

3.1.3! Sphinx 2 ...... 65!

3.1.4! Sphinx 3 ...... 65!

3.1.5! Sphinx 4 ...... 66!

3.1.6! Pocket Sphinx...... 67!

3.1.7! Summary ...... 67!

3.2! Sphinx 3 introduction ...... 68!

3.2.1! Available executable ...... 68!

3.2.2! Front-end processing ...... 70!

3.2.3! Acoustic model and language model ...... 72!

3.2.4! Search structure ...... 74!

3.2.5! Search initialisation ...... 74!

3.2.6! Speed-up techniques ...... 75!

3.3! Speech recognition system set-up ...... 79!

3.3.1! Overview ...... 79!

3.3.2! Database preparation ...... 81!

3.3.3! Language model training ...... 83!

3.3.4! Acoustic model training ...... 84!

3.3.5! Speech recognition ...... 86!

3.3.6! Result generation ...... 87!

3.4! Speech recognition system optimisation ...... 87!

3.4.1! Configuration parameters for Sphinx 3 ...... 88!

3.4.2! Speech recognition accuracy optimisation ...... 90!

3.4.3! Recognition experiments set-up ...... 91!

3.4.4! Recognition results and analysis ...... 92!

3.5! Summary ...... 106!

4! Source code profiling, analysis and modification ...... 108!

4.1! Source code profiling ...... 109!

4.1.1! Source code recompilation ...... 109!

4.1.2! Flat profile results analysis...... 111!

4.1.3! Call graph results analysis ...... 117!

4.2! Source code analysis ...... 121!

4.2.1! Input files analysis ...... 122!

4.2.2! Data structure analysis ...... 126!

4.2.3! Function analysis ...... 130!

4.3! Source code modification ...... 136!

4.3.1! New data structure ...... 136!

4.3.2! Input data scaling up ...... 142!

4.3.3! Integer versions of mgau_eval ...... 145!

4.4! Source code verification ...... 146!

4.4.1! Verification set-up ...... 147!

4.4.2! Results analysis ...... 147!

4.5! Source code separation ...... 151!

4.5.1! Functions for independent implementation of mgau_eval ...... 152!

4.5.2! Experiments for independent implementation of mgau_eval ...... 155!

4.6! Summary ...... 159!

5! Hardware design of most time-consuming function mgau_eval ...... 161!

5.1! Senone score module ...... 162!

5.1.1! Basic calculation module ...... 166!

5.1.2! Parallel adder module ...... 169!

5.1.3! DVAL calculation module ...... 174!

5.1.4! Gaussian score calculation module ...... 179!

5.1.5! Senone score calculation module ...... 181!

5.2! On-chip ROM data module ...... 190!

5.2.1! On-chip ROM read control module ...... 194!

5.2.2! Basic ROM module ...... 196!

5.2.3! Senone score module with on-chip ROM data module ...... 199!

5.3! On-chip BRAM data module ...... 200!

5.3.1! On-chip BRAM read control module ...... 204!

5.3.2! Basic BRAM module ...... 206!

5.3.3! Parallel BRAM module ...... 212!

5.3.4! Senone score module with on-chip BRAM data module ...... 218!

5.4! Test bench design ...... 220!

5.4.1! Test bench for ROM-DM ...... 221!

5.4.2! Test bench for BRAM-DM ...... 224!

5.5! Summary ...... 229!

6! Hardware synthesis, simulation and implementation ...... 231!

6.1! Hardware synthesis ...... 232!

6.1.1! Targeted FPGAs ...... 232!

6.1.2! Experimental setup ...... 234!

6.1.3! FPGA resource utilisation ...... 235!

6.1.4! Maximum clock frequency ...... 246!

6.2! Hardware simulation ...... 250!

6.2.1! Simulation procedure ...... 250!

6.2.2! Simulation results ...... 253!

6.2.3! Execution time analysis ...... 263!

6.3! Hardware implementation ...... 267!

6.3.1! Implementation platform ...... 268!

6.3.2! Implementation design ...... 269!

6.3.3! Implementation procedure ...... 275!

6.4! Summary ...... 279!

7! Conclusion and future work ...... 281!

7.1! Conclusion...... 281!

7.1.1! Software selection ...... 281!

7.1.2! Speech recognition system optimisation ...... 282!

7.1.3! Source code profiling and analysis ...... 282!

7.1.4! Software modification and hardware design ...... 283!

7.1.5! Hardware simulation ...... 284!

7.1.6! Hardware implementation ...... 285!

7.2! Future work ...... 285!

References ...... 288!

Appendix A! Brief history of Sphinx system ...... 303!

A.1! Sphinx speech recogniser ...... 303!

A.2! Sphinx acoustic model trainer ...... 305!

A.3! Sphinx language model trainer ...... 305!

Appendix B! TIMIT database preparation ...... 306!

B.1! Phone list file and dictionary file ...... 306!

B.2! Transcription file and control file ...... 308!

B.3! Configuration file ...... 310!

Appendix C! Profiling results for RM1 and TIMIT ...... 311!

C.1! Flat profile results for RM1 ...... 311!

C.2! Flat profile results for TIMIT ...... 313!

Appendix D! Verifying results for RM1 and TIMIT ...... 315!

D.1! Verifying results for RM1 ...... 315!

D.2! Verifying results for TIMIT ...... 316!

Appendix E! Source code analysis ...... 317!

E.1! Data structure analysis ...... 317!

E.2! Function analysis ...... 326!

Appendix F! List of developed bash script ...... 337!

F.1! Setting up Sphinx 3 recognition system ...... 337!

F.2! Optimising Sphinx 3 recognition system ...... 338!

F.3! Recompiling Sphinx 3 for profiling ...... 338!

F.4! Compiling scaled integer version of Sphinx 3 ...... 339!

F.5! Verification of scaled integer version of Sphinx 3 ...... 339!

Appendix G! List of developed C code ...... 340!

G.1! Scaled integer version of mgau_eval ...... 340!

G.2! Exporting required data for mgau_eval ...... 340!

G.3! Independent version of mgau_eval ...... 341!

G.4! Embedded version of mgau_eval ...... 342!

Appendix H! List of developed VHDL code ...... 343!

H.1! Senone score module (SSM) ...... 343!

H.2! On-chip ROM data module (ROM-DM) ...... 344!

H.3! On-chip BRAM data module (BRAM-DM) ...... 345!

H.4! Test bench ...... 346!

H.5! AMBA APB slave ...... 348!

Appendix I! Integrator/AP system ...... 349!

I.1! Clock signal ...... 349!

I.2! Memory map ...... 350!

I.3! AMBA bus ...... 353!

Appendix J! Resources included in DVD ...... 362!


LIST OF FIGURES

Figure 1.1 Main components of multi-language communication system (MLCS) for mobile phone ...... 4! Figure 1.2 Proposed speech recognition system for mobile devices ...... 6! Figure 2.1 Speech recognition as pattern classification [17] ...... 17! Figure 2.2 MFCC feature extraction procedure [23] ...... 20! Figure 3.1 The layout of Sphinx 3 program [97]...... 68! Figure 3.2 Front-end of Sphinx 3 [98] ...... 70! Figure 3.3 First and second derivatives of cepstrum vector [91] ...... 72! Figure 3.4 Hidden Markov model used in Sphinx 3 [91] ...... 73! Figure 3.5 Two stages of Sphinx 3 [99] ...... 74! Figure 3.6 Techniques used for speed-up recognition process [100] ...... 75! Figure 3.7 Flat lexicon for search [104] ...... 77! Figure 3.8 Tree lexicon for search [104] ...... 77! Figure 3.9 Components for speech recognition system...... 80! Figure 3.10 Procedure for training the language model ...... 83! Figure 3.11 WER for AN4 with 1000 senones and 1 Gaussian ...... 93! Figure 3.12 WER for AN4 with 1000 senones and 2 Gaussians...... 93! Figure 3.13 WER for AN4 with 1000 senones and 4 Gaussians...... 93! Figure 3.14 WER for AN4 with 1000 senones and 8 Gaussians...... 94! Figure 3.15 WER for AN4 with 600 senones ...... 96! Figure 3.16 WER for AN4 with 800 senones ...... 96! Figure 3.17 WER for AN4 with 1200 senones ...... 97! Figure 3.18 WER for AN4 with 1400 senones ...... 97! Figure 3.19 Lowest WER for recognising AN4 ...... 98! Figure 3.20 WER for RM1 with 600 senones ...... 100! Figure 3.21 WER for RM1 with 800 senones ...... 100! Figure 3.22 WER for RM1 with 1000 senones ...... 100! Figure 3.23 WER for RM1 with 1200 senones ...... 101!

x Figure 3.24 WER for RM1 with 1400 senones ...... 101! Figure 3.25 Lowest WER obtained for RM1 ...... 102! Figure 3.26 WER for TIMIT ...... 104! Figure 3.27 Lowest WER obtained for TIMIT ...... 105! Figure 4.1 Components for speech recognition system for profiling source code ...... 110! Figure 4.2 Example of flat profile results for recognising AN4...... 112! Figure 4.3 Percentage of execution time for AN4 with 600 senones, 4 Gaussians, language weight 15 and word insert penalty 0.7 ...... 114! Figure 4.4 Percentage of execution time for AN4 with 600 senones, 8 Gaussians, language weight 18 and word insert penalty 0.5 ...... 114! Figure 4.5 Percentage of execution time for AN4 with 1000 senones, 4 Gaussians, language weight 15 and word insert penalty 0.7 ...... 114! Figure 4.6 Percentage of execution time for AN4 with 600 senones, 2 Gaussians, language weight 12 and word insert penalty 0.6 ...... 115! Figure 4.7 Percentage of execution time for AN4 with 800 senones, 8 Gaussians, language weight 16 and word insert penalty 0.7 ...... 115! Figure 4.8 Percentage of execution time taken by mgau_eval for AN4 using the language weight and word insert penalty giving the lowest WER ...... 115! Figure 4.9 Example of call graph results for recognising AN4 ...... 118! Figure 4.10 Call graph leading to the execution of the most time-consuming function mgau_eval...... 120! Figure 4.11 Call graph used for initialisation of Sphinx 3 ...... 121! Figure 4.12 Bash script for displaying the cepstral file ...... 122! Figure 4.13 Cepstral file format ...... 123! Figure 4.14 Model definition file format ...... 123! Figure 4.15 Bash script used for displaying the acoustic model ...... 124! Figure 4.16 Gaussian mean and variance file format ...... 125! Figure 4.17 Mixture weights file format ...... 125! Figure 4.18 Transition matrix file format ...... 126! Figure 4.19 Data structures that are most related to mgau_eval ...... 127! Figure 4.20 Shared interface for Sphinx 3 ...... 131! Figure 4.21 Search operations performed on each frame [99] ...... 132! Figure 4.22 Parent functions for mgau_eval ...... 133!

xi Figure 4.23 The interfaces and procedures for mgau_eval ...... 134! Figure 4.24 Floating point version of data structures for senones ...... 137! Figure 4.25 Floating point version of data structures for speech frames ...... 139! Figure 4.26 Integer version of data structures for speech frames ...... 140! Figure 4.27 Integer version of data structures for senones ...... 140! Figure 4.28 Declaration of structures used by mgau_eval ...... 141! Figure 4.29 Scaling up functions for senones ...... 142! Figure 4.30 Scaling up functions for speech frames ...... 143! Figure 4.31 Interfacing functions for scaling up senones ...... 144! Figure 4.32 Interfacing functions for scaling up speech frames ...... 145! Figure 4.33 Integer version of mgau_eval ...... 146! Figure 4.34 WER for AN4 using the 32-bit integer version recognisor with the best five sets of parameters ...... 148! Figure 4.35 WER for AN4 using the 64-bit integer version recognisor with the best five sets of parameters ...... 148! Figure 4.36 WER for AN4 using the 32-bit integer version ...... 150! Figure 4.37 WER for AN4 using the 64-bit integer version ...... 151! Figure 4.38 Procedure for the main functions calling mgau_eval independently ...... 156! Figure 5.1 Interface of the senone score module (SSM) ...... 163! Figure 5.2 Main modules in the senone score module (SSM) ...... 164! Figure 5.3 The calculations performed in each module in SSM ...... 166! Figure 5.4 Interface of basic calculation module (BCM) ...... 167! Figure 5.5 Architecture of basic calculation module (BCM) ...... 168! Figure 5.6 Architecture of parallel adder module (PAM) for two parallel BCM units 170! Figure 5.7 Architecture of parallel adder module (PAM) for three parallel BCM units ...... 171! Figure 5.8 Architecture of sub-PAM with three inputs ...... 172! Figure 5.9 Architecture of sub-PAM with two inputs ...... 172! Figure 5.10 Architecture of PM for two parallel BCM units ...... 174! Figure 5.11 Interface of DVAL calculation module (DCM) ...... 175! Figure 5.12 Architecture of DCM for 39 parallel BCM units ...... 176! Figure 5.13 Architecture of DCM for fewer than 39 parallel BCM units ...... 177! Figure 5.14 Architecture of 32-bit delay module for two parallel BCM units ...... 178!

xii Figure 5.15 Architecture of DM for two parallel BCM units...... 178! Figure 5.16 Interface of Gaussian score calculation module (GSCM) ...... 179! Figure 5.17 Architecture of Gaussian score calculation module (GSCM) ...... 180! Figure 5.18 Architecture of GSM for two parallel BCM units ...... 181! Figure 5.19 Interface of senone score calculation module (SSCM) ...... 182! Figure 5.20 Architecture of senone score calculation module (SSCM) ...... 183! Figure 5.21 Interface of sub-SSCM...... 184! Figure 5.22 Architecture of initialisation part of sub-SSCM ...... 185! Figure 5.23 Architecture of logarithmic calculation part of sub-SSCM ...... 186! Figure 5.24 Architecture of sub-SSCM ...... 189! Figure 5.25 Architecture of senone score module (SSM) for two parallel BCM units 190! Figure 5.26 Interface of on-chip ROM data module (ROM-DM)...... 191! Figure 5.27 Main modules included in on-chip ROM data module (ROM-DM) ...... 193! Figure 5.28 Interface of on-chip ROM read control module (ROM-RCTL) ...... 194! Figure 5.29 Required outputs from ROM-RCTL ...... 195! Figure 5.30 Interface of ROM-FEAT module ...... 196! Figure 5.31 Interface of ROM-DVAL module...... 197! Figure 5.32 Senone score module with on-chip ROM data module (SSM-ROM) ...... 199! Figure 5.33 Interface of on-chip BRAM data module (BRAM-DM) ...... 201! Figure 5.34 Architecture of on-chip BRAM data module (BRAM-DM) ...... 202! Figure 5.35 Interface of on-chip BRAM read control module (BRAM-RCTL) ...... 204! Figure 5.36 Required outputs from BRAM-RCTL ...... 205! Figure 5.37 Interface of basic BRAM module (BRAM-BASIC)...... 207! Figure 5.39 Interface of write control module for basic BRAM module (BRAM- BASIC-WCTL) ...... 207! Figure 5.38 Architecture of basic BRAM module (BRAM-BASIC) ...... 208! Figure 5.40 Architecture of the write control module for basic BRAM module (BRAM- BASIC-WCTL) ...... 209! Figure 5.41 Interface of individual BRAM module for basic BRAM module (BRAM- 1W1R) ...... 210! Figure 5.42 Architecture of individual BRAM module for basic BRAM module (BRAM-1W1R) ...... 211! Figure 5.43 Interface of parallel BRAM module (BRAM-PARALLEL) ...... 212!

xiii Figure 5.44 Architecture of parallel BRAM module (BRAM-PARALLEL) when implemented with a single BCM unit ...... 213! Figure 5.45 Architecture of parallel BRAM module (BRAM-PARALLEL) when implemented with two parallel BCM units ...... 214! Figure 5.46 Interface of parallel BRAM write control module (BRAM-PARALLEL- WCTL) ...... 215! Figure 5.47 Vectors of data stored in individual BRAM-1W1Rs in BRAM-PARALLEL ...... 216! Figure 5.48 Required outputs for write enable signal wenxx in BRAM-PARALLEL- WCTL ...... 217! Figure 5.49 Interface of senone score module with on-chip BRAM data module (SSM- BRAM) ...... 218! Figure 5.50 Architecture of senone score module with on-chip BRAM data module (SSM-BRAM) when implemented with a single BCM unit ...... 219! Figure 5.51 Test benches for ROM-DM, SSM-ROM and SSM ...... 221! Figure 5.52 Signals generated by the test benches for ROM-DM ...... 222! Figure 5.53 Signals generated by test benches for SSM when implemented with a single BCM unit ...... 223! Figure 5.54 Signals generated by test benches for SSM when implemented with two parallel BCM units ...... 223! Figure 5.55 Test benches for BRAM-DM and SSM when implemented with a single BCM unit ...... 224! Figure 5.56 Signals generated for BRAM-DM by test benches ...... 225! Figure 5.57 Test benches for SSM-BRAM ...... 227! Figure 5.58 Signals generated for SSM-BRAM by test benches ...... 228! Figure 6.1 Slice register usage for SSM ...... 236! Figure 6.2 Slice LUT usage for SSM ...... 237! Figure 6.3 DSP48E usage for SSM ...... 238! Figure 6.4 Slice register usage for ROM-DM ...... 239! Figure 6.5 Slice LUT usage for ROM-DM ...... 239! Figure 6.6 Projected maximum number of senones in ROM-DM that can be implemented in FPGAs ...... 240! Figure 6.7 Slice register usage for BRAM-DM ...... 241!

xiv Figure 6.8 Slice LUT usage for BRAM-DM...... 242! Figure 6.9 BRAM usage for BRAM-DM...... 242! Figure 6.10 Projected maximum number of senones in BRAM-DM that can be implemented in FPGAs ...... 243! Figure 6.11 Comparison of slice register usage ...... 244! Figure 6.12 Comparison of slice LUT usage...... 244! Figure 6.13 Maximum clock frequencies achieved for SSM in FPGAs ...... 247! Figure 6.14 Maximum clock frequencies achieved for ROM-DM in FPGAs ...... 248! Figure 6.15 Maximum clock frequencies achieved for BRAM-DM in FPGAs...... 248! Figure 6.16 Maximum clock frequencies achieved for SSM-ROM in FPGAs ...... 249! Figure 6.17 Maximum clock frequencies achieved for SSM-BRAM in FPGAs ...... 250! Figure 6.18 Procedure of behavioural simulation ...... 251! Figure 6.19 Procedure of platform-based simulation ...... 252! Figure 6.20 Simulation results for the on-chip ROM data module (ROM-DM) ...... 254! Figure 6.21 Simulation result for the senone score module (SSM) ...... 256! Figure 6.22 Simulation result for SSM with ROM-DM (SSM-ROM) ...... 257! Figure 6.23 Simulation results for BRAM-DM when implemented on a single BCM unit ...... 259! Figure 6.24 Simulation results for BRAM-DM when implemented on two parallel BCM units ...... 260! Figure 6.25 Simulation results for SSM with BRAM-DM (SSM-BRAM) when implemented on a single BCM unit ...... 261! Figure 6.26 Simulation result for SSM with BRAM-DM (SSM-BRAM) when implemented with two parallel BCM units ...... 262! Figure 6.27 Time taken to calculate 102 CI and 1000 CD senone scores for a single frame of speech...... 265! Figure 6.28 Lowest FPGA clock frequency needed to maintain real time performance ...... 267! Figure 6.29 System bus Architecture for Integrator/AP board [126] ...... 268! Figure 6.30 Implementation of SSM-BRAM on the Integrator/AP board ...... 270! Figure 6.31 Implementation of SSM-ROM on the Integrator/AP board...... 271! Figure 6.32 Memory map for the registers that provide the inputs and outputs for SSM- BRAM ...... 272!

xv Figure 6.33 Signals in reg-CTL that control the operation of SSM-BRAM ...... 273! Figure 6.34 Control signal generation for SSM-BRAM ...... 274! Figure 6.35 Time taken to calculate 102 CI and 1000 CD senone scores for a single frame of speech...... 278! Figure B.1 Format of phone list file ...... 307! Figure B.2 Format of dictionary file...... 307! Figure B.3 Format of training transcript file ...... 308! Figure B.4 Format of testing transcript file ...... 309! Figure B.5 Format of control file ...... 309! Figure B.6 Data organisation in TIMIT database ...... 310! Figure C.1 Percentage of execution time taken by mgau_eval for RM1 using the language weight and word insert penalty giving the lowest WER ...... 311! Figure C.2 Percentage of execution time for RM1 with 600 senones, 8 Gaussians, language weight 10 and word insert penalty 0.7 ...... 311! Figure C.3 Percentage of execution time for RM1 with 800 senones, 8 Gaussians, language weight 15 and word insert penalty 0.7 ...... 312! Figure C.4 Percentage of execution time for RM1 with 800 senones, 4 Gaussians, language weight 14 and word insert penalty 0.7 ...... 312! Figure C.5 Percentage of execution time for RM1 with 1000 senones, 4 Gaussians, language weight 12 and word insert penalty 0.7 ...... 312! Figure C.6 Percentage of execution time for RM1 with 1200 senones, 4 Gaussians, language weight 14 and word insert penalty 0.7 ...... 312! Figure C.7 Percentage of execution time taken by mgau_eval for TIMIT using the language weight and word insert penalty giving the lowest WER ...... 313! Figure C.8 Percentage of execution time for TIMIT with 1500 senones, 8 Gaussians, language weight 11 and word insert penalty 0.7 ...... 313! Figure C.9 Percentage of execution time for TIMIT with 1500 senones, 8 Gaussians, language weight 12 and word insert penalty 0.7 ...... 314! Figure C.10 Percentage of execution time for TIMIT with 1000 senones, 8 Gaussians, language weight 11 and word insert penalty 0.7 ...... 314! Figure C.11 Percentage of execution time for TIMIT with 1500 senones, 8 Gaussians, language weight 12 and word insert penalty 0.7 ...... 314!

xvi Figure C.12 Percentage of execution time for TIMIT with 1000 senones, 8 Gaussians, language weight 10 and word insert penalty 0.7 ...... 314! Figure D.1 WER for RM1 using the 32-bit integer version recognisor with the best five sets of parameters ...... 315! Figure D.2 WER for RM1 using the 64-bit integer version recognisor with the best five sets of parameters ...... 315! Figure D.3 WER for TIMIT using the 32-bit integer version recognisor with the best five sets of parameters ...... 316! Figure D.4 WER for TIMIT using the 64-bit integer version recognisor with the best five sets of parameters ...... 316! Figure I.1Top level Integrator/AP memory map [126] ...... 351! Figure I.2 Memory map for motherboard [126] ...... 351! Figure I.3 Memory map for core module [126] ...... 352! Figure I.4 Memory map for logic module [126] ...... 352! Figure I.5 A typical AMBA system [126] ...... 353! Figure I.6 The AMBA AHB system with three masters and four slaves [126] ...... 354! Figure I.7 The interface of AHB slave module [129] ...... 355! Figure I.8 Example of multiple transfers for AHB slave [129] ...... 357! Figure I.9 Example of transfers with BUSY transfer type for AHB slave [129] ...... 358! Figure I.10 Example of transfers with RETRY response for AHB slave [129] ...... 359! Figure I.11The interface of AHB slave module [129] ...... 359! Figure I.12 Example of burst write transfers for APB slave [129] ...... 360! Figure I.13 Example of burst read transfers for APB slave [129] ...... 361!


LIST OF TABLES

Table 2.1 The most popular free speech recognition software ...... 42! Table 3.1 Parameters used for front-end processing in Sphinx 3 [98] ...... 71! Table 3.2 The number of speech utterances in AN4, RM1 and TIMIT ...... 81! Table 3.3 Main parameters affecting the output of Sphinx Train [113] ...... 85! Table 3.4 Main parameters affecting the output of Sphinx 3 [113] ...... 86! Table 3.5 Essential parameters for invoking Sphinx 3 ...... 89! Table 3.6 Performance tuning parameters for Sphinx 3 ...... 89! Table 3.7 Lowest WER for recognising AN4 ...... 95! Table 3.8 Language weight and word insert penalty for AN4 that give the lowest WER ...... 98! Table 3.9 The lowest five WERs achieved for AN4 ...... 99! Table 3.10 The lowest five WERs achieved for RM1 ...... 102! Table 3.11 The lowest five WERs achieved for TIMIT ...... 105! Table 4.1 Floating point version of the head data structure for senones ...... 138! Table 4.2 Floating point version of the entry data structure for senones ...... 138! Table 4.3 Floating point version of the head data structure for speech frames ...... 139! Table 4.4 Floating point version of the entry data structure for speech frames ...... 139! Table 4.5 Functions for exporting required input data for mgau_eval ...... 153! Table 4.6 Functions for manipulating on required input data for mgau_eval ...... 154! Table 4.7 Functions for implementing mgau_eval ...... 155! Table 4.8 Percentage of calculation time used in integer versions of mgau_eval compared to the floating point version ...... 158! Table 5.1 The number of cycles required to provide all input data for one Gaussian .. 169! Table 5.2 The number of layers of adders needed in PAM ...... 171! Table 5.3 The number of cycles spent in PM ...... 173! Table 5.4 The number of cycles spent in SSM ...... 191! Table 5.5 Outputs from on-chip ROM read control module (ROM-RCTL) ...... 194! Table 5.6 Data output from ROM-DM (bits/cycle)...... 198!

xviii Table 6.1 Resources available in selected FPGAs ...... 233! Table 6.2 The number of clock cycles taken to calculate the senone scores for one speech frame ...... 264! Table 6.3 The maximum frequencies on which the SSM-BRAM can run ...... 265! Table 6.4 The registers that providing inputs and outputs for SSM-BRAM...... 272! Table 6.5 Predefined values for control register reg-CTL ...... 273! Table E.2 Members of the core model structure kbcore_t ...... 317! Table E.1 Members of the global structure kb_t ...... 318! Table E.3 Members of the transition matrix structure tmat_t ...... 319! Table E.4 Members of the whole acoustic model structure mgau_model_t ...... 319! Table E.5 Members of the single acoustic model structure mgau_t...... 320! Table E.6 Members of the statistics structure stat_t ...... 321! Table E.7 Members of the timer structure ptmr_t ...... 322! Table E.8 Members of the timer structure ascr_t ...... 322! Table E.9 Auxiliary members of the search structure srch_t ...... 323! Table E.10 General members of the search structure srch_t ...... 324! Table E.11 Function pointer members of the search structure srch_t ...... 324! Table I.1 Clock used in Integrator/AP evaluation board ...... 349! Table I.2 Inputs and outputs for AHB slave [129] ...... 356! Table I.3 Inputs and outputs for APB slave [129] ...... 360!


LIST OF ABBREVIATIONS

ADS  ARM developer suite
AGC  automatic gain control
AMBA  advanced microcontroller bus architecture
AN4  alphanumeric database
ANN  artificial neural network
ANSI  American national standards institute
APB  advanced peripheral bus
API  application programming interface
ARM  advanced reduced instruction set computer machine
ARPA  advanced research projects agency
ASB  advanced system bus
ASCII  American standard code for information interchange
ASIC  application-specific integrated circuit
ASR  automatic speech recognition
BCM  basic calculation module
BEE  Berkeley emulation engine
BRAM  block random access memory
BRAM-DM  on-chip BRAM data module
BRAM-DVAL  BRAM DVAL module
BRAM-FEAT  BRAM feature module
BRAM-MEAN  BRAM mean module
BRAM-MIXW  BRAM mixture weight module
BRAM-RCTL  BRAM read control module
BRAM-VAR  BRAM variance module
BSD  Berkeley software distribution
CD  context-dependent
CDGMM  context-dependent Gaussian mixture model
CIGMM  context-independent Gaussian mixture model
CFG  context-free grammar

CHMM  continuous hidden Markov model
CI  context-independent
CLB  configuration logic block
CMN  cepstrum mean-normalisation
CMOS  complementary-symmetry metal-oxide-semiconductor
CMU  Carnegie Mellon University
CPU  central processing unit
CUDA  compute unified device architecture
CVN  cepstrum variance normalisation
DCM  DVAL calculation module
DDR  double data rate
DFT  discrete Fourier transform
DHMM  discrete hidden Markov model
DM  DVAL module
DMA  direct memory access
DP  dynamic programming
DRAM  dynamic random access memory
DSP  digital signal processor
DSR  distributed speech recognition
DTW  dynamic time warping
EBI  external bus interface
ESR  embedded speech recognition
FFT  fast Fourier transform
FPGA  field programmable gate array
FSG  finite-state grammar
FST  finite-state transducer
GMM  Gaussian mixture model
GPL  general public license
GPU  graphical processing unit
GSCM  Gaussian score calculation module
GSM  Gaussian score module
HMM  hidden Markov model
HTK  hidden Markov model toolkit

ISE  integrated synthesis environment
JTAG  joint test action group
LED  light-emitting diode
LPC  linear predictive coding
LUT  look-up table
MFCC  mel-frequency cepstral coefficient
MLCS  multi-language communication system
NIST  national institute of standards and technology
NSR  network speech recognition
PAM  parallel adder module
PC  personal computer
PCB  printed circuit board
PCI  peripheral component interconnect
PCM  pulse-code modulation
PLP  perceptual linear prediction coefficient
PM  parallel module
PSSM  parallel senone score module
RAM  random access memory
RM1  resource management database
ROM  read only memory
ROM-DM  on-chip ROM data module
ROM-DVAL  ROM DVAL module
ROM-FEAT  ROM feature module
ROM-MEAN  ROM mean module
ROM-MIXW  ROM mixture weight module
ROM-RCTL  ROM read control module
ROM-VAR  ROM variance module
RTF  real-time factor
SCDGMM  semi-continuous density Gaussian mixture model
SCHMM  semi-continuous hidden Markov model
SDR  single data rate
SDRAM  synchronous dynamic random access memory
SDS  simple down-sampling
SER  sentence error rate

SIMD  single instruction multiple data
SRAM  static random access memory
SSCM  senone score calculation module
SSM  senone score module
SSRAM  synchronous static random access memory
SVQGS  sub-vector quantiser Gaussian selection
TIMIT  acoustic-phonetic continuous speech corpus
TM  template matching
VHDL  very-high-speed integrated circuit hardware description language
VLSI  very-large-scale integration
VQDS  vector-quantiser-based down-sampling
VQGS  simple vector-quantiser Gaussian selection
WER  word error rate
WFST  weighted finite state transducer


LIST OF PUBLICATIONS

Accepted paper

S. Hu, D. Mulvaney and S. Datta, “Modification of Sphinx 3 for Embedded System Implementation”, Proceedings of International Conference on Multimedia, Signal Processing and Communication Technologies, Aligarh Muslim University, Aligarh, India, accepted on 23 September 2011.

Published papers

S. Hu, D. Mulvaney and S. Datta, “Optimisation of the Configuration Parameters in the Sphinx 3 Speech Recognition System”, Proceedings of National Symposium on Acoustics NSA-2010, Pt. L. M. S. Govt P. G. College, Rishikesh, India, pp. 114-129, 2010.

S. Hu, D. Mulvaney and S. Datta, “Parallel Implementation of a Sphinx Recognition System”, Proceedings of National Symposium on Acoustics NSA-2009, Research Centre Imarat, Andhra Pradesh, India, pp. 97-101, 2009.


1 INTRODUCTION

Speech is a natural form of human communication that provides an efficient means of conveying thoughts, presenting new ideas and sharing experiences. As machines grow more artificially intelligent, the capability to interact with their human counterparts using speech is likely to become increasingly important. From the human perspective, significantly less training is needed to control a machine using speech than using the conventional mouse and keyboard [1]. Interfacing to machines using speech is also important for users with certain physical disabilities, or when operating in specific environments, such as in surroundings of poor light or visibility, or in hand-busy or eye-busy conditions such as during the operation of a vehicle [2].

In order to create a speech interface between humans and machines, two fundamental speech technologies are needed, namely automatic speech recognition (ASR), the process of converting speech into text, and speech synthesis, the process of creating speech from text. Generally, ASR is the more complex task of the two due to the many sources of variability associated with natural speech [3]. For example, the same word may not sound the same to a machine when it is pronounced by two separate speakers, when spoken by the same person under different environmental conditions, when the word is re-read following its move to a new position in a sentence, or when noise is captured along with the speech. Further, in many languages, a number of words of quite different meaning often sound similar, and an understanding of context then becomes important for accurate recognition. Human beings have evolved the ability to deal with these challenges, but the techniques needed for machines to reach our level of performance are still in their infancy.

Over the last 60 years, a considerable body of knowledge has been established, both by understanding how humans are able to recognise speech and by the research and development of new ASR techniques, leading to a gradual improvement in accuracy [4][5]. However, the problem of successfully delivering ASR at a level of performance that rivals that of humans remains unsolved.

By constraining the problem to one with a limited vocabulary, to operate in specific contexts and to capture the speech under carefully controlled environmental conditions, the complexity of the ASR task can be managed and a potentially useful system can be developed. Simpler solutions may only be able to recognise a single speaker (user dependent) rather than any speaker (user independent) and may be able to recognise only single words (isolated speech) rather than connected or continuous speech. The performance of an ASR system is normally measured using the word error rate (WER), which indicates the proportion of the spoken words that are incorrectly recognised following training, and the real-time factor (RTF), which assesses whether real-time performance is achievable [6].

Of the techniques developed to date, the statistical ASR method based on the hidden Markov model (HMM) has been the most widely adopted and has met with particular success in the building of large vocabulary, user-independent and continuous speech ASR [7][8]. The HMM-based approach has now become the industry standard in the speech recognition area. The popularity of the HMM-based approach is mainly due to its ability to provide a formal statistical framework to model the variations in speech [9]. Sphinx 3, used to build the speech recognition system in this project, is one of the recognisers in the Carnegie Mellon University (CMU) Sphinx package [10], one of the most successful HMM-based open-source speech recognition toolkits. A brief history of CMU Sphinx and more detailed information can be found in Appendix A and Chapter 3 respectively.
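For reference, the formal statistical framework underlying HMM-based recognition is the standard Bayes decision rule: the recogniser selects the word sequence that is most probable given the observed acoustic feature sequence O, with the HMM-based acoustic model supplying P(O | W) and the language model supplying P(W),

\[ \hat{W} = \arg\max_{W} P(W \mid O) = \arg\max_{W} P(O \mid W)\,P(W). \]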

However, in order to achieve a low WER, the HMM-based approach requires the execution of a computationally intensive set of algorithms. With recent advances in the technical capabilities of computer workstations, it is now possible to execute these algorithms so that they perform speech recognition in real time (an RTF of less than unity) on the desktop. However, as it remains impractical to implement such powerful ASR operations in real time on mobile devices, high-quality continuous embedded speech recognition (ESR) on these platforms is not currently realisable, degrading the user experience.

Due to the need to conserve power, ESRs operate at a lower processor clock frequency and have smaller memory capacities than their desktop counterparts [11]. Implementing ASR in mobile devices also brings additional practical problems that affect performance, namely the broad range of acoustic environments in which the ESR is expected to operate and the distortions in speech introduced by the often greater variations in the distance between speaker and microphone.

In mobile devices connected to a communication network, the lack of computational resources for implementing high-performance ASR has been addressed in a novel way [12]. In network speech recognition (NSR), the speech is transmitted to a powerful server located remotely on the network that is able to complete the operation off-line before the return of the text output to the user or onward to a specified destination. Due to the computational capabilities afforded by its powerful servers, NSR is the only solution widely implemented for high-performance ASR, but it has the drawbacks that the speech recognition capability depends on the availability of the communications link and operating costs can be high depending on the geographical location of the server. For completeness, note that there is a third alternative, termed distributed speech recognition (DSR), where part of the speech recognition is performed by the mobile devices and the remainder by the server.

Existing ESR only offers limited-vocabulary recognition of isolated words, allowing the replacement of simple application interfaces that require the press of a button or the flick of a switch, such as number dialling, launching of programmes, operating consumer appliances and accepting commands for car navigation systems. The focus of the research work described in this thesis is to extend ESR in such a manner that it is capable of recognising large vocabulary, user-independent and continuous speech while maintaining the recognition performance typical of desktop computer workstations. This is achieved by supplementing the embedded microprocessor with bespoke electronic hardware and so accelerating the most time-consuming parts of a state-of-the-art ASR system without placing additional demands on power consumption.

1.1 MOTIVATION

The work described in this thesis forms part of a larger project, termed the multi-language communication system (MLCS) for a mobile phone, to solve communication problems between speakers of different languages by implementing speech translation within the mobile device itself. The main components of MLCS are shown in Figure 1.1.

FIGURE 1.1 MAIN COMPONENTS OF MULTI-LANGUAGE COMMUNICATION SYSTEM (MLCS) FOR MOBILE PHONE

In addition to the conventional communication between two mobile phones, MLCS proposes three additional components in each phone, namely a speech recogniser, a text translator and a speech synthesiser. The speech of the first user is recognised and transformed into text before transmission to the second user’s phone, where it is translated into the text of the receiver’s own language and finally synthesised into speech. The work presented in this thesis concentrates on designing and implementing an appropriate embedded real-time solution for the speech recognition component of MLCS.

1.2 AIM AND OBJECTIVES

The current project aims to design a speech recognition system that operates at a rate suitable for real-time implementation, while delivering the high accuracy and the low power consumption needed for mobile phone use. These three requirements are clearly interdependent, and improvements made to any one are likely to sacrifice the satisfaction of the other two. In this project, the principal effort is placed on ensuring that the calculation time is adequate for an embedded environment, on the premise that the accuracy of an existing high-performance speech recognition system is maintained. By ensuring there are sufficient alternatives within the design to give a broad range of parallel implementation options, the power consumption can be refined after the calculation time and accuracy considerations have been met.

In order to reduce the recognition time, the work in this project divides the speech recognition process into two parts, where the first contains the more time-consuming operations that are implemented in electronic hardware and the second contains the less time-consuming operations that are implemented on the main microprocessor of the mobile phone, as shown in Figure 1.2.
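To illustrate the intended split, the following minimal C sketch (not code from the thesis) walks through the stages one frame at a time. The function names are invented, the 39-element feature vector and the count of 102 CI plus 1000 CD senones are assumptions borrowed from the later experiments, and the accelerator call is a software stub standing in for the hardware senone-scoring module developed in later chapters.

```c
#include <stdio.h>

#define N_FEAT   39    /* width of one feature vector (assumed)                   */
#define N_SENONE 1102  /* illustrative: 102 CI + 1000 CD senones, as in Chapter 6 */

/* Stub front-end: would compute MFCC-based features on the phone's CPU. */
static void extract_features(int frame, int feat[N_FEAT])
{
    for (int i = 0; i < N_FEAT; ++i)
        feat[i] = frame + i;                   /* placeholder values only */
}

/* Stub accelerator call: stands in for the hardware senone score module,
 * which performs the time-consuming GMM (senone) scoring in parallel.    */
static void accelerator_senone_scores(const int feat[N_FEAT], int score[N_SENONE])
{
    for (int s = 0; s < N_SENONE; ++s)
        score[s] = feat[0] - s;                /* placeholder values only */
}

/* Stub back-end: the lexical-tree/Viterbi search stays on the CPU. */
static int search_step(const int score[N_SENONE])
{
    int best = 0;
    for (int s = 1; s < N_SENONE; ++s)
        if (score[s] > score[best]) best = s;
    return best;
}

int main(void)
{
    int feat[N_FEAT], score[N_SENONE];
    for (int frame = 0; frame < 5; ++frame) {
        extract_features(frame, feat);             /* CPU: light work          */
        accelerator_senone_scores(feat, score);    /* offloaded: dominant cost */
        printf("frame %d best senone %d\n", frame, search_step(score));
    }
    return 0;
}
```

In the physical realisation, the stub call would instead be replaced by register reads and writes over the bus linking the microprocessor to the FPGA, as described for the evaluation platform in Chapter 6.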

To achieve the aim of this project, the research can be divided into the following objectives.


FIGURE 1.2 PROPOSED SPEECH RECOGNITION SYSTEM FOR MOBILE DEVICES

1) Investigate available automatic speech recognition (ASR) software and select a suitable solution capable of recognising large-vocabulary, user-independent and continuous speech.

2) Optimise the ASR to achieve the best performance in terms of word error rate (WER) by adjusting the configuration parameters of the software.

3) Profile the ASR system to find the most time-consuming parts of the source code for implementation in the hardware part of the speech recogniser.

4) Modify the most time-consuming parts of the source code for embedded implementation and re-write them in VHDL (very-high-speed integrated circuit hardware description language) for parallel realisation.

5) Simulate the parallel design to verify its correctness and to investigate its feasibility by implementation on a wide range of FPGAs (field programmable gate array).

6) Implement the parallel design in a physical FPGA and realise its communication with a microprocessor.

1.3 ORIGINAL CONTRIBUTIONS

The novel contributions of this thesis are as follows.

1) Identification of the most time-consuming functions in the source code of Sphinx 3 and a detailed analysis of their operation, thereby enabling the implementation of embedded versions.

2) Development of scaled integer versions of the most time-consuming functions in Sphinx 3 to make them better suited for implementation in both embedded microprocessors and bespoke hardware; a brief sketch of this idea is given after this list.

3) Creation of the hardware designs, including one calculation module and two data input modules that are able to implement the most time-consuming operations in parallel units and in a pipelined manner.

4) Verification of the correctness and suitability of the hardware produced by simulating its execution on a wide range of FPGAs and by implementation on an evaluation board containing both an FPGA and a microprocessor.
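As a rough illustration of contribution 2 above, the sketch below contrasts the floating-point per-dimension term of a diagonal-covariance Gaussian score with a scaled-integer equivalent. The power-of-two scale factor, the 32/64-bit widths and the function names are illustrative assumptions only; the actual scaling scheme, data structures and mgau_eval interface used in the thesis are those described in Chapter 4.

```c
#include <stdint.h>
#include <stdio.h>

#define SCALE_BITS 10   /* assumed scale: inputs pre-multiplied by 2^10 */

/* Floating-point per-dimension term of a diagonal-covariance Gaussian score:
 * sum over i of (x[i] - mean[i])^2 * inv_var[i].                             */
static double gauss_term_float(const double *x, const double *mean,
                               const double *inv_var, int n)
{
    double acc = 0.0;
    for (int i = 0; i < n; ++i) {
        double d = x[i] - mean[i];
        acc += d * d * inv_var[i];
    }
    return acc;
}

/* Scaled-integer equivalent: x, mean and inv_var are pre-scaled by 2^SCALE_BITS
 * and rounded to 32-bit integers; a 64-bit accumulator absorbs the product
 * growth and the result is shifted back to (approximately) unscaled units.     */
static int64_t gauss_term_int(const int32_t *x, const int32_t *mean,
                              const int32_t *inv_var, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; ++i) {
        int64_t d = (int64_t)x[i] - mean[i];          /* still scaled by 2^SCALE_BITS        */
        acc += ((d * d) >> SCALE_BITS) * inv_var[i];  /* keep the intermediate scale bounded */
    }
    return acc >> (2 * SCALE_BITS);                   /* remove the remaining scale          */
}

int main(void)
{
    double  xf[2] = { 1.25, -0.50 }, mf[2] = { 1.00, 0.25 }, vf[2] = { 2.0, 4.0 };
    int32_t xi[2], mi[2], vi[2];
    for (int i = 0; i < 2; ++i) {                     /* quantise the example inputs         */
        xi[i] = (int32_t)(xf[i] * (1 << SCALE_BITS));
        mi[i] = (int32_t)(mf[i] * (1 << SCALE_BITS));
        vi[i] = (int32_t)(vf[i] * (1 << SCALE_BITS));
    }
    printf("float  : %f\n", gauss_term_float(xf, mf, vf, 2));
    printf("integer: %lld\n", (long long)gauss_term_int(xi, mi, vi, 2));
    return 0;
}
```

For these example values the floating-point result is 2.375 and the integer result is truncated to 2; a real implementation would retain extra scale on the result rather than discarding the fractional part.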

1.4 STRUCTURE OF THE THESIS

Chapter 2 introduces the relevant ASR background, including available ASR software, use of ASR on mobile phones, and existing research relating to the acceleration of speech recognition in embedded systems.

Chapter 3 presents work related to the Sphinx speech recognition system, namely the selection of a suitable version for use in the current work, and then concentrates on the selected Sphinx 3 system and how it is configured and optimised for minimising the WER for a range of different databases.

Chapter 4 gives a detailed analysis of the Sphinx 3 source code, involving profiling to identify the most time-consuming function, analysing and modifying the source code to best suit embedded targets, verifying the correctness of the newly developed code and separating the most time-consuming function for implementation in the embedded system.

Chapter 5 describes the hardware design of the most time-consuming function and specifically its division into a parallel module to perform part of the speech recognition operations and two parallel data modules to provide streamed data. The test benches used to verify the hardware modules are also described.

Chapter 6 provides the results obtained from the hardware implementation and verification of the most time-consuming function. This involves the synthesis of many alternative hardware design configurations to a wide range of FPGAs and, for the physical realisation of the design, the use of an evaluation system that includes an FPGA in communication with an embedded processor.

Chapter 7 concludes the work achieved in this thesis and gives directions for related future work.


2 BACKGROUND OF AUTOMATIC SPEECH RECOGNITION

This chapter covers most of the related background of automatic speech recognition (ASR), and is divided into five parts: an overview of ASR, the framework of ASR, available ASR software, ASR on mobile phones and embedded ASR research.

The overview of ASR gives very basic information about ASR, which includes the difficulties that could arise during recognition, the types of ASR available, performance evaluation of ASR and applications of ASR.

The framework of ASR is explained in detail as a two-stage process: the front-end feature extraction and the back-end pattern classification. The first stage explains how the speech can be represented and how the feature vectors are extracted from the original speech waveform. The second stage introduces how pattern classification can be achieved. There are many approaches available, such as the statistical approach, template matching, artificial neural networks and the rule-based approach. The statistical approach is introduced later in detail.

The criteria used to select the most appropriate software are presented, and then the various types of ASR software are introduced along with a detailed analysis of the most popular free speech recognition software.

There are generally three types of ASR system available for use with mobile phones, classified according to where the parts of the speech recognition process are implemented: network speech recognition (NSR), distributed speech recognition (DSR) and embedded speech recognition (ESR).

The last part of this chapter reviews research on accelerating the speech recognition process for embedded speech recognition. This is presented in five parts: complete speech recognition by hardware, back-end search acceleration by hardware, Gaussian mixture model (GMM) scoring acceleration in FPGA, GMM scoring acceleration in application-specific integrated circuit (ASIC), and GMM scoring acceleration in graphical processing unit (GPU).

2.1 OVERVIEW OF AUTOMATIC SPEECH RECOGNITION

Automatic speech recognition by machine is the essential component needed to build an intelligent interaction interface between human and machine. Due to its great convenience and ease of use, ASR has already been used in a wide variety of applications. In order to recognise speech accurately and efficiently, an enormous amount of research work has been done and great technological progress has been made over the last 60 years [4][5][13]. Owing to its great complexity, however, the full potential of ASR has still not been realised.

2.1.1 ASR DIFFICULTIES

Despite the long history of research on ASR, the problem of automatic speech recognition by a machine is far from being solved. ASR is a very difficult and complex problem due to the many sources of variability associated with natural speech. Any successful ASR system should be capable of handling the following sources of variability in speech [3][14][15]:

1) Across-speaker variations. One person’s voice is different from another’s. Due to the speaker’s sex, age, accent, socio-linguistic background, vocal cords, vocal tract size, shape and so on, the pronunciation of the same words by different speakers may vary considerably.

2) Within-speaker variations. The voice of the same person is not exactly the same under different circumstances. These variations can arise from the speaker’s physical and emotional state, speaking rate, speaking style, prosody (stress and intonation of words) or voice quality.

3) Context variations. The sounds of phonemes, the smallest sound units of speech, are highly dependent on the context in which they appear. Natural speech is a continuously changing sound pattern, which means there are no regular pauses between one word and another. The contextual variations can therefore be quite dramatic, making the same word sound different in different parts of the speech.

4) Environmental variations. Speech signals are usually “contaminated” by the surrounding environment. They are mixed with environmental noise or even other voices when they are detected by a microphone. In addition, speech signals may also be distorted or delayed during transmission in a communications channel.

5) Ambiguous speech. Many words have very similar or even identical pronunciations. They cannot be recognised correctly without additional contextual information.

2.1.2 ASR TYPES

A wide variety of ASR systems has been developed over the last 60 years. These systems are developed for different purposes and based on different techniques. In order to characterise the capabilities of ASR systems, many parameters should be taken into account. Typical parameters include vocabulary size, speaking mode, speaker dependency, task and language constraints, environmental conditions and so on [14] [16][17].

1) Vocabulary size. Based on the size of the vocabulary that can be recognised, ASR systems can be divided into small vocabulary (tens of words), medium vocabulary (hundreds of words), large vocabulary (thousands of words) and very-large vocabulary (tens of thousands of words). Larger vocabularies mean that more confusable words or phrases are included. Increasing the size of the vocabulary increases the complexity of the ASR. The larger the size of the vocabulary, the more difficult the recognition task is.

2) Speaking mode. Based on the type of input speech, ASR systems can be divided into isolated words speech, connected words speech, continuous speech, spontaneous speech and conversational speech. As their names indicate, the words are pronounced in different ways in the different types of speech. In isolated words speech, each word is pronounced totally separately, so that there is a pause before and after the word. Connected words are similar to isolated words except that two words “run together” with a minimal pause between them. Continuous speech means each word is not separated by a pause, and the words are pronounced naturally by the speaker. Spontaneous speech (also called fluent speech) is even more complex than continuous speech. It may include a variety of natural speech features such as sound linking, sound disappearance and sound change between two words. Different methods should be used to determine the boundaries of words in the different types of speech. Obviously, spontaneous speech is much more difficult to recognise than isolated words.

3) Speaker dependency. ASR systems can also be divided into speaker-dependent and speaker-independent types. A speaker-dependent system is designed to recognise a single speaker’s voice, while a speaker-independent system is designed to recognise any speaker’s voice. It is clearly much easier to recognise an individual speaker’s speech than to recognise all speech types and all dialects.

4) Task and language constraints. Another important aspect of a speech recognition system is whether the system can undertake a wide range of recognition tasks and whether it should comply with particular language constraints. Application domains range from digits and domain-limited tasks to newspaper text, broadcast TV and unrestricted domains.

5) Environmental conditions. Obviously, poor-quality speech, i.e. speech distorted or obscured by noise and extraneous speech, is more difficult to recognise than high-quality speech. So it is also important that the recognition system can handle a noisy environment, such as that detected by a microphone or mobile phone.

To sum up, an ideal speech recogniser should be capable of recognising continuous speech from multiple speakers with different accents and speaking styles, and in noisy environments. It should not be constrained to particular vocabularies, grammatical structures and languages, but it should be able to adapt and learn new lexical, syntactic, semantic and pragmatic information as a human does. Judged against the criteria mentioned above, the field of speech recognition research is still at an early stage and current speech recognisers are still comparatively primitive.

2.1.3 ASR PERFORMANCE

The performance of an ASR system is usually specified in terms of accuracy and speed [6].

1) Recognition accuracy

Recognition accuracy can be measured by the word error rate (WER). After aligning a recognised word sequence with a reference sequence, there are three types of errors that could be found:

a) Word substitution, recognising the original spoken word as an incorrect word.

b) Word deletion, not recognising a word that has actually been spoken.

c) Word insertion, recognising a word which has not been spoken.

These three types of errors are equally weighted. The WER for ASR is defined by

\mathrm{WER} = \frac{S + D + I}{N}    (2.1)

where S is the number of word substitutions, D is the number of word deletions, I is the number of word insertions, and N is the total number of words in the spoken sentence.

2) Recognition speed

Recognition speed is measured by the real-time factor (RTF), which is defined by

\mathrm{RTF} = \frac{P}{I}    (2.2)

where P is the total recognition time and I is the total duration of the input speech utterance. When the real-time factor is no greater than 1, the processing is considered to be performed in real time.
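As an illustration of how these two metrics are obtained in practice, the short C sketch below computes the WER of equation (2.1) and the RTF of equation (2.2). It assumes the substitution, deletion and insertion counts have already been produced by an alignment tool such as the NIST Sclite tool mentioned later in this chapter; the function names and the counts in main() are hypothetical and serve only as an example.

#include <stdio.h>

/* WER = (S + D + I) / N, equation (2.1) */
static double word_error_rate(int subs, int dels, int ins, int ref_words)
{
    return (double)(subs + dels + ins) / (double)ref_words;
}

/* RTF = P / I, equation (2.2): processing time divided by speech duration */
static double real_time_factor(double processing_secs, double speech_secs)
{
    return processing_secs / speech_secs;
}

int main(void)
{
    /* hypothetical counts for a 1000-word reference transcription */
    double wer = word_error_rate(52, 17, 9, 1000);
    double rtf = real_time_factor(42.0, 60.0);
    printf("WER = %.1f%%, RTF = %.2f (real time when RTF <= 1)\n",
           wer * 100.0, rtf);
    return 0;
}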

2.1.4 ASR APPLICATIONS

ASR has numerous existing applications, which make it possible for people to communicate with a machine by speech. The advantages of a human-machine speech interface are obvious in a number of situations [18], such as: when the user's hands or eyes are busy; when only a limited keyboard and/or screen is available; for disabled users; when pronunciation is the subject matter of computer use; and when natural language interaction is preferred. Based on what type of speech can be used and how easy it is for the machine to process it, ASR applications can be divided into three categories: command and control, directed dialogue and natural language [19].

2.1.4.1 COMMAND AND CONTROL

A command and control system is designed to receive words and short phrases from a user and then perform a particular function or action [16][20]. The vocabulary that the machine can recognise is severely constrained to a range of specialised commands. The user has to learn the pre-defined commands in advance and speak exactly the same commands when interacting with the machine. Such a system is difficult to use until the user has fully mastered those commands. However, it usually achieves a very high level of recognition accuracy across a very wide range of users. The command and control system is the simplest and most reliable ASR application, and includes the following areas:

1) Personal computers, including voice control of the personal computer (PC) environment and voice control of programs such as word processors, calendars, spreadsheets and so on.

2) Portable devices, including mobile phones for voice dialling and application launching.

3) Automotive, including hands-free control of in-car equipment such as the radio, air conditioning, navigation aids and so on.

4) Industrial, including control and maintenance of machine tools in factories, aids to manufacturing processes, astronaut information management during extra-vehicular activity in space, applications for stock management, etc.

5) Military, including cockpit management.

6) Disabled users, including control of household appliances and wheelchairs.

2.1.4.2 DIRECTED DIALOGUE

A directed dialogue system is usually designed to achieve a particular task. Within the machine-prompted dialogues, a user is required to follow a certain path and answer one question at a time to provide the key information. Once the key information has been received, the system is expected to function with high reliability for a broad spectrum of the general public. It is not easy to build and maintain a good directed dialogue system, because the designated tasks may need to be broken down into the form of a dialogue graph; further, these dialogue graphs have to be rebalanced when a new function is added.

The most typical directed dialogue system is the telephone-based application, which is the largest on-going commercial application, providing access to data or services for the user over the telephone network [17]. These services include automated operator-assisted call handling, automated banking services, and automated information services such as weather reports and telephone directory enquiries. Hundreds of millions of users are assisted by unmanned systems, resulting not only in huge savings for the service provider but also in freedom for the user over when, where and how they access information.

2.1.4.3 NATURAL LANGUAGE

Compared to the command and control and directed dialogue systems, the natural language system has more flexibility and intelligence. A machine can listen to or communicate with a user who speaks as though talking to another person. Although it is the ultimate goal to achieve open, flexible communication between human and machine, in practice, there are significant constraints such as vocabulary, syntax and functionality that need to be applied to make the natural language system more feasible and more reliable [21].

The most common use of the natural language application is dictation and transcription, which accepts continuous speech input and then transcribes it into written text. It is an extremely productive way of creating highly constrained documents containing technical jargon that is used repeatedly in each document. This includes medical, legal and business transcription, such as medical records, diagnostic reports, pathology analyses, wills and legal briefs, as well as general word processing.

Natural language recognition can be used as a teaching tool to aid foreign language learning and pronunciation teaching [22]. In such applications, the user is asked to pronounce sentences provided by a computer. Then the system can recognise and analyse the speech to find out whether it is correctly pronounced.

Natural language recognition can also be used to assist deaf people to communicate with the hearing world by using a special telephone which can convert the caller's speech to text. Other medical applications include assisted rehabilitation such as automatic exercises for recovering from speech disorders.

2.2 FRAMEWORK OF AUTOMATIC SPEECH RECOGNITION

Automatic Speech Recognition (ASR) is the process of using a machine to recognise the words in speech, which is fundamentally a pattern classification task [17]. The ASR system takes the speech waveform as an input pattern, and classifies it as one of a set of spoken words, phrases, or sentences. The whole task includes two major processes: feature extraction and pattern classification, which are illustrated in Figure 2.1.

FIGURE 2.1 SPEECH RECOGNITION AS PATTERN CLASSIFICATION [17]

In the front-end feature extraction part, the speech signal is analysed to extract certain features or characteristics sequentially in time. Then, in the back-end pattern classification part, the generated sequence of features is compared against the machine's knowledge of speech and a decision rule is applied to arrive at a transcription of the speech. These two parts are further introduced in the following sections.

2.2.1 FRONT-END FEATURE EXTRACTION

The representation of speech is first introduced and then the actual front-end feature extraction is covered in detail.

2.2.1.1 SPEECH REPRESENTATION

Speech can be analysed in several ways: articulation, acoustics and perception [23]:

1) Articulation. Articulation is the analysis of how speech sounds are produced. It focuses on the vocal apparatus, consisting of the throat, mouth and nose, where the sounds of speech are produced. The basic sounds of speech, called phonemes, can be categorised by the way they are produced. The two main categories of phonemes are consonants and vowels. Since each word can be represented by a sequence of phonemes, phonemes are also called sub-words. A lexicon contains a list of words, each with its phonemic transcription. A modern large-vocabulary speech recogniser is based on phonemes rather than words; with the help of a lexicon, the phonemes can easily be converted into recognised words.

2) Acoustics. Acoustics involves the analysis of the speech signal as a stream of sounds. Speech is an analogue signal, a continuous flow of sound waves with silent intervals. There are four important features in the acoustic analysis of speech, namely frequency, amplitude, harmonic structure and resonance, by which one phoneme is differentiated from another. Acoustics is more important than articulation for speech recognition systems, because the speech received by the recognition system is just a stream of sounds rather than the movements of the vocal apparatus.

3) Perception. Perception is the analysis of how speech is perceived by a human. Humans can easily understand speech even under difficult circumstances, such as in a noisy environment or when people speak quietly. The performance of a speech recognition system can be greatly improved with more understanding of how the human auditory system works and what features of speech are used during perception.

2.2.1.2 SPEECH PRE-PROCESSING

The speech waveform is not directly processed by a speech recognition system. Instead it is pre-processed by the front-end of the system to suppress redundancies, noise, distortion and variations of the input speech, and to extract useful feature vectors that are suitable for discriminating between phonemes.

1) Pre-processing procedure [13]. Typically, the analogue speech signal is first sampled and quantised into a digital signal. The normal sampling rate is 16 kHz, which is regarded as sufficient for speech recognition and synthesis, and the amplitude of the speech wave is represented by a 16-bit signed integer. The digitised speech signal is then blocked into overlapping frames, typically at a rate of 100 frames per second. Although a new frame is produced every 10 ms, its duration is 25 ms, which means the current frame includes not only the new 10 ms of data, but also the last 7.5 ms of the previous frame and the first 7.5 ms of the next frame. After that, each frame is transformed into a feature vector, normally in three stages: (1) generating a raw feature to describe the power spectrum envelope of the speech; (2) compiling the raw feature into an extended feature composed of static and dynamic features; and (3) transforming the extended feature into a more compact and robust feature. (A short code sketch of the framing and windowing steps is given after this list.)

2) Feature extraction method [24]. There are many methods for feature extraction in speech recognition, including principal component analysis, linear discriminant analysis, independent component analysis, kernel-based feature extraction, the integrated phoneme sub-space method, linear predictive coding, cepstral analysis, mel-frequency scale analysis, filter bank analysis, mel-frequency cepstrum coefficients (MFCC), wavelets, spectral subtraction and cepstral mean subtraction. None of these is the focus of this research, so they are not introduced in detail here; details of these methods can be found elsewhere.

3) MFCC feature extraction [24][25]. The most widely used feature is the mel-frequency cepstral coefficient (MFCC), because it matches some of the characteristics of the spectral analysis of the human auditory system. Typically, 13 MFCCs and their first- and second-order derivatives are calculated for each speech frame, so there are 39 coefficients in one MFCC feature vector. The process of MFCC feature extraction from a speech frame includes the following stages: pre-emphasis, Hamming window, spectral analysis, mel-frequency filter bank, log compression, discrete cosine transform and temporal derivative, which are shown in Figure 2.2 and explained below.

FIGURE 2.2 MFCC FEATURE EXTRACTION PROCEDURE [23]

a) Pre-emphasis: The speech frame is spectrally flattened by a first-order (high-pass) digital filter so that the high frequencies are amplified to compensate for their attenuation because of the mouth's directivity.

b) Hamming window: The pre-emphasised frame is multiplied by the Hamming window to minimise the effect of discontinuities at the edges of each frame during spectral analysis.

c) Spectral analysis: The windowed frame is converted from the time domain to the frequency domain during spectral analysis. This process is performed by fast Fourier transform (FFT).

d) Mel-frequency filter bank: The spectral feature is converted from a linear frequency scale to a mel-frequency scale by the mel filter bank, which is a set of non-linear filters to approximate the behaviour of the human auditory system.

e) Log compression: Each value in the mel-frequency spectral feature is replaced by its natural logarithm to make the statistical distribution of the spectrum approximately Gaussian.

f) Discrete cosine transform: The spectral information is compressed into a set of low-order coefficients by the discrete cosine transform, so that the higher-order coefficients can be ignored. The low-order coefficients are called mel-frequency cepstral coefficients (MFCC). The number of MFCCs needed normally ranges from 9 to 15 and typically 13 are chosen.

g) Temporal derivative: The low-order coefficients are normalised and the first-order and second-order derivatives of MFCCs are calculated.
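To make the front-end stages above more concrete, the following C sketch blocks a 16 kHz signal into 25 ms frames every 10 ms and applies pre-emphasis and a Hamming window to one frame, i.e. the framing step of the pre-processing procedure and stages a) and b) of the MFCC pipeline. The constants are typical textbook values and the function is an illustrative sketch, not the front-end code of any particular toolkit.

#include <math.h>
#include <stddef.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define FRAME_LEN   400     /* 25 ms at 16 kHz                     */
#define FRAME_SHIFT 160     /* 10 ms at 16 kHz (100 frames/second) */
#define PREEMPH     0.97    /* typical pre-emphasis coefficient    */

/* Extract frame number idx from the sampled signal, apply pre-emphasis and
 * a Hamming window.  Returns 0 on success, -1 if the frame runs past the
 * end of the signal. */
int window_frame(const short *speech, size_t n_samples,
                 size_t idx, double frame[FRAME_LEN])
{
    size_t start = idx * FRAME_SHIFT;
    if (start + FRAME_LEN > n_samples)
        return -1;

    for (size_t i = 0; i < FRAME_LEN; i++) {
        /* pre-emphasis: first-order high-pass filter, stage a) */
        double prev = (start + i == 0) ? 0.0 : speech[start + i - 1];
        double s = speech[start + i] - PREEMPH * prev;
        /* Hamming window to reduce edge discontinuities, stage b) */
        double w = 0.54 - 0.46 * cos(2.0 * M_PI * i / (FRAME_LEN - 1));
        frame[i] = s * w;
    }
    return 0;
}

The remaining stages (FFT, mel filter bank, logarithm, DCT and derivatives) follow the same frame-by-frame pattern but are omitted here for brevity.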

2.2.2 BACK-END PATTERN CLASSIFICATION

Pattern classification in speech recognition can be achieved by various approaches. The most widely used approach is statistical ASR using the hidden Markov model (HMM) as the acoustic model and the N-gram as the language model, which are explained in detail in the following sections. At the end of this section, other available approaches are also introduced.

2.2.2.1 STATISTICAL ASR APPROACH

Almost all state-of-the-art speech recognisers are based on the statistical method, which was proposed in the 1970s [26]. It is by far the most successful method for recognising large-vocabulary, speaker-independent and continuous speech [7][8]. The success of the statistical ASR approach is due to its general framework, which allows the recognition system to learn the parameters from a set of training speech (known as the training part) and then use those parameters to recognise unknown speech (known as the decoding part). In the training process, the learned parameters are stored as reference patterns. In the decoding process, the input speech is matched to the reference patterns and the probability of each pattern is calculated. The pattern with the highest probability is chosen as the recognised result.

2.2.2.2 MATHEMATICAL DEFINITION

The speech recognition problem is to find the word sequence with the highest probability of representing the input speech. This whole process can be summarised by three steps [27].

1) Convert the speech waveform signal into a sequence of spectral feature vectors, denoted as

X = x_1 x_2 \cdots x_N    (2.3)

where N is the number of vectors in the input speech, each being a compact representation of the short-time speech spectrum covering typically 10 ms in the time domain.

2) Evaluate the score P(W | X) (i.e. the a posteriori probability of the word sequence W, given the measured feature vectors X) for every possible valid word sequence in the task language L.

3) Choose the one with the highest score as the most probable word sequence, denoted as

\hat{W} = w_1 w_2 \cdots w_M    (2.4)

where M is the total number of words in the recognised result. This is the so-called maximum a posteriori probability (MAP) decision principle, originally suggested by Bayes.

The above process can be defined mathematically as

\hat{W} = \arg\max_{W \in L} P(W \mid X)    (2.5)

Using Bayes' rule [28], the above expression can be rewritten as

\hat{W} = \arg\max_{W \in L} \frac{P(X \mid W) P(W)}{P(X)}    (2.6)

Representing the probability of the measured feature vector sequence X given the word sequence W, P(X | W) is referred to as the acoustic model. Representing the probability of the word sequence W occurring in the language L, P(W) is referred to as the language model. Representing the probability of the measured feature vector sequence X occurring in the spoken language, P(X) does not depend on the word sequence W, so \hat{W} can be computed without knowing P(X). Finally, the equation can be rewritten as

\hat{W} = \arg\max_{W \in L} P_A(X \mid W) P_L(W)    (2.7)

A typical ASR system includes four basic components: a feature extraction unit, an acoustic model unit, a language model unit and a search unit. Feature analysis has been discussed in the previous section. For most modern ASR systems, the acoustic model is based on the hidden Markov model (HMM) and the language model is based on N-grams, which are presented in more detail in the following sections.
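In practice, equation (2.7) is evaluated in the log domain, and the language model log-probability is usually scaled by a language weight before being added to the acoustic score. The sketch below illustrates this decision rule over a set of candidate hypotheses whose log scores are assumed to be available; the language-weight value is an arbitrary illustrative choice, not the default of any particular recogniser.

#include <stddef.h>

#define LANGUAGE_WEIGHT 9.5   /* hypothetical scaling of log P_L(W) */

/* Returns the index of the hypothesis maximising
 * log P_A(X|W) + LANGUAGE_WEIGHT * log P_L(W), or -1 if there are none. */
int best_hypothesis(const double *log_acoustic,  /* log P_A(X|W) per hypothesis */
                    const double *log_language,  /* log P_L(W)   per hypothesis */
                    size_t n_hyps)
{
    int best = -1;
    double best_score = 0.0;

    for (size_t i = 0; i < n_hyps; i++) {
        double score = log_acoustic[i] + LANGUAGE_WEIGHT * log_language[i];
        if (best < 0 || score > best_score) {
            best = (int)i;
            best_score = score;
        }
    }
    return best;
}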

2.2.2.3 ACOUSTIC MODELLING BY HMM

The acoustic model is based on the hidden Markov model (HMM), which provides a formal statistical framework to model the speech variations [9]. First, the fundamental component of speech (phonemes or words) is modelled by Markov chains, which include states, the transition probabilities between them and also their output probabilities. The differences in time scale and the variations between and within speakers for the same phoneme or word can be modelled by these probabilities. Second, the HMM is combined with computationally feasible and rigorous mathematical methods, which make it possible to estimate and optimise the parameters of the model (transition probabilities and output probabilities) from a large set of pronunciation examples of the same phoneme or word during the training process. Third, the HMM is also very flexible in terms of the size or the architecture of the models, which can easily be adjusted to suit different users' requirements, such as modelling phonemes, words, phrases, or even sentences. The HMM is summarised in the following section and further details can be found elsewhere [29][30][31].

HMM model definition

The hidden Markov model (HMM) can be characterised by the following parameters.

1) N hidden states, denoted as

S = (s_1, s_2, \ldots, s_N)    (2.8)

2) The state at time t is denoted as q_t.

3) M distinct observation symbols per state, denoted as

V = (v_1, v_2, \ldots, v_M)    (2.9)

4) The symbol observed at time t is denoted as o_t.

5) Hidden state transition probability distribution, denoted as

A = [a_{ij}], \quad a_{ij} = P(q_{t+1} = s_j \mid q_t = s_i), \quad 1 \le i, j \le N    (2.10)

where a_{ij} represents the probability of moving to the next state s_j when the current state is s_i.

6) Observation symbol emission probability distribution, denoted as

B = [b_j(k)], \quad b_j(k) = P(o_t = v_k \mid q_t = s_j), \quad 1 \le j \le N, \ 1 \le k \le M    (2.11)

where b_j(k) represents the probability of observing symbol v_k when the current state is s_j.

7) Initial state probability distribution, denoted as

\pi = [\pi_i], \quad \pi_i = P(q_1 = s_i), \quad 1 \le i \le N    (2.12)

where \pi_i represents the probability that the first state is s_i.

For convenience, the above complete parameter set for an HMM can be denoted by the compact notation

\lambda = (A, B, \pi)    (2.13)
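The parameter set λ = (A, B, π) maps naturally onto a simple data structure. The sketch below shows one possible C representation of a discrete-observation HMM; the field names and memory layout are choices made for this illustration and do not reflect the internal representation used by Sphinx or any other toolkit.

#include <stdlib.h>

typedef struct {
    int N;        /* number of hidden states                 */
    int M;        /* number of distinct observation symbols  */
    double **A;   /* A[i][j] = P(q_{t+1} = s_j | q_t = s_i)  */
    double **B;   /* B[j][k] = P(o_t = v_k | q_t = s_j)      */
    double *pi;   /* pi[i]   = P(q_1 = s_i)                  */
} Hmm;

static double **alloc_matrix(int rows, int cols)
{
    double **m = malloc((size_t)rows * sizeof *m);
    for (int i = 0; i < rows; i++)
        m[i] = calloc((size_t)cols, sizeof **m);
    return m;
}

/* Allocate an HMM with all probabilities initialised to zero. */
Hmm *hmm_create(int N, int M)
{
    Hmm *h = malloc(sizeof *h);
    h->N  = N;
    h->M  = M;
    h->A  = alloc_matrix(N, N);
    h->B  = alloc_matrix(N, M);
    h->pi = calloc((size_t)N, sizeof *h->pi);
    return h;
}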

Gaussian mixture model

If the observations are continuous, then a continuous probability density function is used. If the observations are discrete, then a set of discrete probabilities is used. Speech is a continuous signal, so the observation probability b_j(o_t) is typically approximated by a mixture of multivariate Gaussian distributions given by

b_j(o_t) = \sum_{m=1}^{M} c_{jm} \, N(\mu_{jm}, \Sigma_{jm}, o_t), \quad 1 \le j \le N, \ 1 \le m \le M    (2.14)

where M is the number of Gaussians, c_{jm} are the mixture weight coefficients, N(\mu_{jm}, \Sigma_{jm}, o_t) is the Gaussian distribution, \mu_{jm} is the mean vector of the Gaussian and \Sigma_{jm} is the covariance matrix of the Gaussian.
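In practical recognisers the covariance matrices are usually taken to be diagonal, and the mixture of equation (2.14) is evaluated in the log domain for numerical stability. The following sketch computes such a log-likelihood for one state under those assumptions; it is only an illustration of the kind of observation-probability calculation discussed in this thesis, not the actual Sphinx scoring code.

#include <math.h>
#include <float.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* log(exp(a) + exp(b)) computed without overflow */
static double log_add(double a, double b)
{
    if (a < b) { double t = a; a = b; b = t; }
    return a + log1p(exp(b - a));
}

/* feat: D-dimensional feature vector; c: M mixture weights;
 * mean, var: M x D row-major means and diagonal variances. */
double gmm_log_likelihood(const double *feat, int D, int M,
                          const double *c,
                          const double *mean, const double *var)
{
    double total = -DBL_MAX;                /* behaves like log(0) */

    for (int m = 0; m < M; m++) {
        /* log of a D-dimensional diagonal-covariance Gaussian */
        double lg = -0.5 * D * log(2.0 * M_PI);
        for (int d = 0; d < D; d++) {
            double diff = feat[d] - mean[m * D + d];
            lg -= 0.5 * (log(var[m * D + d]) + diff * diff / var[m * D + d]);
        }
        total = log_add(total, log(c[m]) + lg);
    }
    return total;
}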

HMM assumptions for ASR

When HMM is used for acoustic modelling in speech recognition, two formal assumptions are made:

1) The first-order Markov assumption: The next state only depends on the current state. All of the previous states have no influence on the next state.

P(q_{t+1} \mid q_t, q_{t-1}, q_{t-2}, \ldots, q_1) = P(q_{t+1} \mid q_t)    (2.15)

2) Output independence assumption: The current observation only depends on the current state. All of the previous observations and states have no influence on it.

P(o_t \mid o_{t-1}, o_{t-2}, \ldots, o_1, q_t, q_{t-1}, q_{t-2}, \ldots, q_1) = P(o_t \mid q_t)    (2.16)

Sequence generation by HMM

An HMM can be used as a sequence generator. At the beginning of the observation time, t = 1, the model chooses the first state q_1 according to the initial state probability distribution \pi, and outputs the first observation symbol o_1 according to the observation symbol emission probability distribution B. At each following observation time t + 1, the model moves from the current state q_t to the next state q_{t+1} according to the hidden state transition probability distribution A, and outputs the observation symbol o_{t+1} for the state q_{t+1} according to the observation symbol emission probability distribution B. After a period of observation time t = 1, 2, 3, \ldots, T, the generated observation sequence can be represented as

O = o_1 o_2 \cdots o_T    (2.17)

and the underlying hidden state sequence can be represented as

Q = q_1 q_2 \cdots q_T    (2.18)

Three HMM problems

There are three fundamental problems that must be solved when the HMM is used in speech recognition.

1) Evaluation problem: Given the observation sequence O = o_1 o_2 \cdots o_T and the model parameters \lambda = (A, B, \pi), how can the probability P(O | \lambda) that the observed sequence is generated by the given model be computed? In terms of speech recognition, this problem can be stated as: how likely is it that the spoken speech is the realisation of a given word or other unit modelled by the HMM?

2) Training problem: Given the observation sequence O = o_1 o_2 \cdots o_T, how can the model parameters \lambda = (A, B, \pi) be adjusted to maximise P(O | \lambda), so that the model best describes how the observed sequences are generated? In terms of speech recognition, this problem can be stated as: how can the acoustic model be trained from the training speech?

3) Decoding problem: Given the observation sequence O = o_1 o_2 \cdots o_T and the model parameters \lambda = (A, B, \pi), how can the hidden state sequence Q = q_1 q_2 \cdots q_T that best explains the observed sequence be chosen? In terms of speech recognition, this problem can be stated as: how can the acoustic model be used to recognise the unknown speech?

Three HMM algorithms

In order to solve the fundamental problems as mentioned above, three algorithms are developed.

1) Forward or backward algorithm

The forward or backward algorithm is used to solve the evaluation problem more efficiently. The direct calculation of the probability of O = o_1 o_2 \cdots o_T being generated by the model \lambda (defined by equation (2.13)) results in about 2T \cdot N^T calculations (including (2T - 1) \cdot N^T multiplications and N^T additions):

P(O \mid \lambda) = \sum_{\text{all } Q} P(O, Q \mid \lambda) = \sum_{\text{all } Q} P(O \mid Q, \lambda) P(Q \mid \lambda) = \sum_{\text{all } Q} \pi_{q_1} b_{q_1}(o_1) a_{q_1 q_2} b_{q_2}(o_2) \cdots a_{q_{T-1} q_T} b_{q_T}(o_T)    (2.19)

where P(O, Q \mid \lambda) is the joint probability of O = o_1 o_2 \cdots o_T and Q = q_1 q_2 \cdots q_T being produced by the model \lambda; P(Q \mid \lambda) is the probability of the hidden state sequence Q = q_1 q_2 \cdots q_T being generated by the given model \lambda; and P(O \mid Q, \lambda) is the probability of the observation sequence O = o_1 o_2 \cdots o_T being generated by the given model \lambda and hidden state sequence Q. The number of all possible hidden state sequences Q is N^T.

The evaluation problem can be solved more efficiently by either the forward or the backward algorithm.

The forward variable is defined as the probability of observing the partial observation sequence o_1 o_2 \cdots o_t and being in state q_t = s_i at time t, given the model \lambda, which can be denoted as

\alpha_t(i) = P(o_1 o_2 \cdots o_t, q_t = s_i \mid \lambda)    (2.20)

When t = 1,

\alpha_1(i) = \pi_i b_i(o_1), \quad 1 \le i \le N    (2.21)

At each subsequent time t + 1,

\alpha_{t+1}(j) = b_j(o_{t+1}) \sum_{i=1}^{N} \alpha_t(i) a_{ij}, \quad 1 \le t \le T-1, \ 1 \le j \le N    (2.22)

When t = T, the evaluation problem can be solved as

P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)    (2.23)

The forward algorithm reduces the required number of calculations from 2T \cdot N^T to the order of T \cdot N^2, which is a great improvement.

The backward algorithm is the exact reverse of the forward algorithm. It is not essential for the evaluation problem, but it is used later to solve the training problem. The backward variable is defined as the probability of the remaining partial observation sequence o_{t+1} o_{t+2} \cdots o_T, given that the model is in state s_i at time t, which can be denoted as

\beta_t(i) = P(o_{t+1} o_{t+2} \cdots o_T \mid q_t = s_i, \lambda)    (2.24)

When t = T,

\beta_T(j) = 1, \quad 1 \le j \le N    (2.25)

For t = T-1, T-2, \ldots, 1,

\beta_t(i) = \sum_{j=1}^{N} a_{ij} b_j(o_{t+1}) \beta_{t+1}(j), \quad 1 \le i \le N    (2.26)

When t = 1, the evaluation problem can be solved as

P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i b_i(o_1) \beta_1(i)    (2.27)
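To show how the recursion is realised in code, the sketch below is a direct transcription of the forward algorithm, equations (2.21)-(2.23), for a discrete HMM with the parameters stored in row-major arrays. It is an illustrative sketch only: a practical implementation would scale the alpha values or work in the log domain to avoid numerical underflow on long utterances.

#include <stdlib.h>

/* obs[t] in [0, M); A is N x N, B is N x M, pi has length N (row-major). */
double forward_probability(int N, int M, int T, const int *obs,
                           const double *A, const double *B, const double *pi)
{
    double *alpha = malloc((size_t)T * N * sizeof *alpha);

    /* initialisation, equation (2.21) */
    for (int i = 0; i < N; i++)
        alpha[i] = pi[i] * B[i * M + obs[0]];

    /* induction, equation (2.22) */
    for (int t = 0; t < T - 1; t++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int i = 0; i < N; i++)
                sum += alpha[t * N + i] * A[i * N + j];
            alpha[(t + 1) * N + j] = B[j * M + obs[t + 1]] * sum;
        }

    /* termination, equation (2.23) */
    double p = 0.0;
    for (int i = 0; i < N; i++)
        p += alpha[(T - 1) * N + i];

    free(alpha);
    return p;
}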

2) Baum-Welch algorithm

The Baum-Welch algorithm, also called the forward-backward algorithm [27], is used to solve the training problem. It uses a combination of the forward and backward algorithms described above in an iterative procedure to estimate the best model \lambda, i.e. the one that produces the maximum P(O \mid \lambda). At each iteration the model parameters are adjusted to produce a new model \hat{\lambda} which is more likely than the previous model \lambda, meaning that P(O \mid \hat{\lambda}) is larger than P(O \mid \lambda). This process of re-estimation is continued until no further improvement in P(O \mid \lambda) can be obtained; the model \lambda is then successfully trained. Further details of the Baum-Welch algorithm can be found elsewhere.

3) Viterbi algorithm

The Viterbi algorithm is used to solve the decoding problem, which is finding the single best state sequence for an observation sequence. It is very similar to the Forward algorithm, except that the transition probabilities are maximised at each step instead of summed.

The following variable is defined as the probability of the most probable state path q_1 q_2 \cdots q_{t-1} that accounts for the partial observation sequence o_1 o_2 \cdots o_t and ends in state q_t = s_i at time t, denoted as

\delta_t(i) = \max_{q_1 q_2 \cdots q_{t-1}} P(q_1 q_2 \cdots q_{t-1}, q_t = s_i, o_1 o_2 \cdots o_t \mid \lambda)    (2.28)

Another variable \psi_t(i) is used to record the most probable state path.

When t = 1,

\delta_1(i) = \pi_i b_i(o_1), \quad 1 \le i \le N    (2.29)

\psi_1(i) = 0    (2.30)

At each subsequent time t + 1,

\delta_{t+1}(j) = \left[ \max_{1 \le i \le N} \delta_t(i) a_{ij} \right] b_j(o_{t+1}), \quad 1 \le t \le T-1, \ 1 \le j \le N    (2.31)

\psi_{t+1}(j) = \arg\max_{1 \le i \le N} \delta_t(i) a_{ij}, \quad 1 \le t \le T-1, \ 1 \le j \le N    (2.32)

When t = T,

P^* = \max_{1 \le i \le N} \delta_T(i)    (2.33)

q_T^* = \arg\max_{1 \le i \le N} \delta_T(i)    (2.34)

The most probable state path can then be retrieved by back-tracking:

q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, T-2, \ldots, 1    (2.35)
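The Viterbi recursion above can be transcribed in the same style as the forward algorithm. The sketch below keeps the delta and psi tables explicitly and recovers the best state sequence by back-tracking, following equations (2.29)-(2.35); probabilities are multiplied directly for clarity, whereas a real decoder would add log-probabilities instead.

#include <stdlib.h>

/* Writes the best state sequence into path[0..T-1] and returns P*. */
double viterbi_decode(int N, int M, int T, const int *obs,
                      const double *A, const double *B, const double *pi,
                      int *path)
{
    double *delta = malloc((size_t)T * N * sizeof *delta);
    int    *psi   = malloc((size_t)T * N * sizeof *psi);

    /* initialisation, equations (2.29)-(2.30) */
    for (int i = 0; i < N; i++) {
        delta[i] = pi[i] * B[i * M + obs[0]];
        psi[i] = 0;
    }

    /* recursion, equations (2.31)-(2.32) */
    for (int t = 1; t < T; t++)
        for (int j = 0; j < N; j++) {
            double best = 0.0;
            int arg = 0;
            for (int i = 0; i < N; i++) {
                double v = delta[(t - 1) * N + i] * A[i * N + j];
                if (v > best) { best = v; arg = i; }
            }
            delta[t * N + j] = best * B[j * M + obs[t]];
            psi[t * N + j]   = arg;
        }

    /* termination, equations (2.33)-(2.34) */
    double p_star = 0.0;
    path[T - 1] = 0;
    for (int i = 0; i < N; i++)
        if (delta[(T - 1) * N + i] > p_star) {
            p_star = delta[(T - 1) * N + i];
            path[T - 1] = i;
        }

    /* back-tracking, equation (2.35) */
    for (int t = T - 2; t >= 0; t--)
        path[t] = psi[(t + 1) * N + path[t + 1]];

    free(delta);
    free(psi);
    return p_star;
}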

2.2.2.4 LANGUAGE MODELLING BY N-GRAM

The language model gives the probability P(W) of a word sequence W, which can be mathematically expressed in the following form:

P(W) = P(w_1 w_2 \cdots w_M) = P(w_1) P(w_2 \mid w_1) \cdots P(w_M \mid w_1 w_2 \cdots w_{M-1}) = \prod_{i=1}^{M} P(w_i \mid w_1 w_2 \cdots w_{i-1})    (2.36)

where M is the number of words in the sequence, and P(w_i \mid w_1 w_2 \cdots w_{i-1}) is the probability of word w_i given the preceding words w_1 w_2 \cdots w_{i-1}.

N-gram

The complexity of this type of language model increases exponentially when the vocabulary size is large. In order to reduce the complexity of the language model, the N-gram [32] language model assumes that the probability of w_i depends only on the preceding n - 1 words, denoted as

P(w_i \mid w_1 w_2 \cdots w_{i-1}) = P(w_i \mid w_{i-n+1} \cdots w_{i-1})    (2.37)

The probabilities for every combination of the words are predefined before speech recognition and can be calculated directly from the training text data by simply counting the total number of their appearances. Equation (2.37) can be re-written as

P(w_i \mid w_{i-n+1} \cdots w_{i-1}) = \frac{F(w_{i-n+1} \cdots w_{i-1} w_i)}{F(w_{i-n+1} \cdots w_{i-1})}    (2.38)

where F(w_{i-n+1} \cdots w_{i-1} w_i) refers to the frequency of occurrence of w_{i-n+1} \cdots w_{i-1} w_i in the training text and F(w_{i-n+1} \cdots w_{i-1}) refers to the frequency of occurrence of w_{i-n+1} \cdots w_{i-1} in the training text.

The most commonly used language models are the unigram, bigram and trigram, corresponding to N = 1, N = 2 and N = 3 respectively. The unigram model means that a word in the word sequence does not depend on any preceding word; its probability equals the probability of the word itself appearing in the training text, as defined in equation (2.39). The bigram model means that a word depends only on the previous word, and can be calculated by equation (2.40). The trigram model means that a word depends on the preceding two words, as defined by equation (2.41).

P(w_i \mid w_1 w_2 \cdots w_{i-1}) = P(w_i) = \frac{F(w_i)}{\sum_{w} F(w)}, \quad \text{when } N = 1    (2.39)

P(w_i \mid w_1 w_2 \cdots w_{i-1}) = P(w_i \mid w_{i-1}) = \frac{F(w_{i-1} w_i)}{F(w_{i-1})}, \quad \text{when } N = 2    (2.40)

P(w_i \mid w_1 w_2 \cdots w_{i-1}) = P(w_i \mid w_{i-2} w_{i-1}) = \frac{F(w_{i-2} w_{i-1} w_i)}{F(w_{i-2} w_{i-1})}, \quad \text{when } N = 3    (2.41)

In practice, not all possible three-word sequences for the trigrams appear in the training data. In this case, the trigram probability is replaced by its bigram probability scaled by a back-off weight, which reflects that it has a lower probability of occurring because the higher-order trigram was not found. The same method is applied to a bigram if it is not defined: the undefined bigram is replaced by its unigram probability scaled by a back-off weight.
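The following minimal C sketch illustrates the count-based bigram estimate of equation (2.40) together with the back-off idea just described: if a bigram was never seen in training, its probability is taken from the unigram estimate of equation (2.39), scaled by a back-off weight. The count tables, vocabulary size and back-off weight are all illustrative assumptions; a real language-model toolkit also computes the back-off weights so that each conditional distribution still sums to one.

#define VOCAB          3
#define BACKOFF_WEIGHT 0.4     /* hypothetical back-off weight */

static const double unigram_count[VOCAB]       = { 50, 30, 20 };
static const double bigram_count[VOCAB][VOCAB] = {
    { 10,  5,  0 },
    {  0, 12,  8 },
    {  6,  0,  4 },
};

double bigram_prob(int prev, int word)
{
    double total = 0.0;
    for (int w = 0; w < VOCAB; w++)
        total += unigram_count[w];

    if (bigram_count[prev][word] > 0)        /* seen in training: eq. (2.40) */
        return bigram_count[prev][word] / unigram_count[prev];

    /* back-off: scaled unigram probability, cf. equation (2.39) */
    return BACKOFF_WEIGHT * unigram_count[word] / total;
}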

2.2.2.5 SEARCH TECHNIQUES

Following the acoustic and language models, the next stage in the speech recognition process is to search for a word sequence with the highest probability of representing the input speech. The search unit, also known as a decoding engine, is the kernel of a large-vocabulary continuous-speech recognition system. It is this search unit that combines all available knowledge sources, such as acoustic and linguistic constraints, and uses the decoding algorithm to generate the results. The overall performance of the recognition system is largely dependent on this search unit.

Classification of decoder

According to how the search architecture is organised and how the search is performed, the decoding engines can be classified in different ways [33][34][35][36]:

1) Static-expansion or dynamic-expansion: whether the search space is pre-compiled with all knowledge sources before the search or dynamically generated during the search.

2) Time-synchronous or time-asynchronous: whether the results are formed in a time-synchronous way or time-asynchronous way over the sequence of acoustic vectors during the search.

3) Word-conditioned or time-conditioned: whether the search is conditioned on the predecessor word or the end time of the predecessor word, and whether the optimisation over the unknown word boundary is performed in the early phase of the search or as a final step of the search.

4) Linear-lexicon or tree-lexicon: whether the search space is represented as a set of linear sequences of phonemes or as tree structures of phonemes.

5) Single-tree or multi-tree: whether the search space is organised in a single lexical tree or in multiple tree copies.

6) Breadth-first or depth-first: whether the search first considers all of the possible paths at the same depth and then expands to the next depth, or first expands along one path until reaching a depth bound and then considers the alternative paths.

7) One-pass or multi-pass: whether the results are produced from one pass or multiple passes of search over the input sentences.

8) Single-best or n-best: whether the output is the single most likely word sequence or a list of the most likely word sequences together with their probabilities.

Viterbi search

The Viterbi search is a dynamic-expansion, breadth-first, time-synchronous search technique. It is based on the dynamic programming (DP) algorithm [33][34], which is a sequential optimisation algorithm. It first decomposes a problem into sequential, independent sub-problems, and then solves the original problem by recursively solving the sub-problems. In terms of speech recognition, the optimal word sequence for the speech is obtained by recursively finding the optimal partial word sequences from time 1 to T, where T is the duration of the speech. More specifically, at every time period all of the possible hypotheses are first evaluated. Since all of the hypotheses correspond to the same portion of the utterance, they can be compared with each other directly. The one with the maximum score is then selected as the optimal hypothesis for this time period. After this, the search continues for the next time period until the last time period is reached. The Viterbi search is therefore a breadth-first and time-synchronous search technique. However, in reality, a complete Viterbi search is impractical when the vocabulary is large. Normally, a beam search is adopted: only those hypotheses whose likelihood falls within a fixed beam of the best hypothesis are considered for further growth. The beam size can be determined empirically or adaptively.
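The beam pruning step at the heart of the Viterbi beam search can be expressed very compactly. The sketch below marks which hypotheses survive a frame, assuming the hypothesis scores are log-probabilities; the beam width is a tunable constant chosen here purely for illustration.

#include <stddef.h>

#define LOG_BEAM_WIDTH 80.0   /* hypothetical beam width in log-probability units */

/* Sets active[i] to 1 for hypotheses within the beam of the best score,
 * 0 otherwise, and returns the number of survivors. */
size_t beam_prune(const double *log_score, int *active, size_t n_hyps)
{
    double best = -1.0e300;
    size_t kept = 0;

    for (size_t i = 0; i < n_hyps; i++)
        if (log_score[i] > best)
            best = log_score[i];

    for (size_t i = 0; i < n_hyps; i++) {
        active[i] = (log_score[i] >= best - LOG_BEAM_WIDTH);
        kept += (size_t)active[i];
    }
    return kept;
}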

A* search

A* search [33] is a dynamic-expansion, depth-first, time-asynchronous search technique. The principle of A* search is to use a heuristic function to guide the sequential search. The heuristic function is the sum of two parts: the first part is a heuristic score from the start node to the current node, while the second part is a heuristic score from the current node to the end node. The main steps of the A* algorithm can be found in [33]. The A* search is called an admissible algorithm because, in theory, it is guaranteed to find the best path, whereas the Viterbi beam search cannot guarantee this. Moreover, the A* algorithm is a depth-first algorithm, which means that the best path is always the first output, without considering all other incomplete paths; the Viterbi beam search is a breadth-first algorithm, so the best path is produced at the same time as all other possible paths. The A* search is not as efficient as the Viterbi beam search in a large-vocabulary, continuous speech recognition system for two reasons. First, an informed heuristic function is very difficult to find. The term "informed" means that when one heuristic function A is smaller than another heuristic function B at every node of the search network, A is said to be more informed than B. Second, because the A* search is time-asynchronous, it is not easy to use beam pruning to discard unpromising paths. The A* search is nevertheless very important for multiple-pass searching, in which the search scores are computed over several passes through the data; it is typically used in the second pass, based on the word lattice generated by the first pass.

Stack decoder

The stack decoder [37] is a dynamic-expansion, depth-first, time-asynchronous search technique. Stack decoding implements a best-first tree search which proceeds by extending hypotheses word by word, without the constraint that they all end at the same time. The partial search hypotheses are handled by a stack, which is a priority queue sorted on likelihood scores. There may be a single stack or several stacks depending on the implementation. The best partial solution is maintained as follows: first, put all of the resulting partial solutions on the stack; second, keep only the top n candidates; finally, continue to expand the partial solutions until a complete solution is obtained.

Weighted finite state transducer (WFST)

WFST [38][39] was developed at AT&T and is based on the static expansion of the search network. It offers an elegant unified framework for combining knowledge sources and producing the full search network optimised up to the HMM states. The transducers are finite state networks associating input and output symbols on each arc, possibly weighted with a logarithmic probability value. They can be used to represent various knowledge sources. Furthermore, transducers can be combined using the composition operator, which means the knowledge sources are integrated into one input-output relation. The network is then further optimised by weighted determinisation and minimisation.

2.2.2.6 OTHER APPROACHES

Template matching (TM)

Template matching is the simplest approach and one of the first techniques to recognise speech. During the training process, a template is generated for each word in the vocabulary to be recognised. In the decoding process, the unknown input is firstly compared with each template by calculating the spectral distance between them, and then recognised as the template with the smallest distance.

In order to cope with speech variations, the template for a word can be derived from several examples of the same word spoken by different speakers at different speaking rates. If the difference between the examples is relatively small, the chosen template can be based on either selecting the most representative example or computing the average over all examples. If the difference between the examples is too big, a clustering algorithm can be used to generate a set of templates representing the different pronunciations of that word. When comparing the input with the template, there are some inevitable differences in time scale between them. A technique called dynamic programming can be used to find the optimal alignment of the input pattern with the template [26][40].

Dynamic time warping (DTW) is a widely used and efficient dynamic programming procedure for achieving the time alignment between the input speech and the template. As the name suggests, the input speech is stretched or compressed in time so that it can be matched as closely as possible with the templates in the database.
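A compact C sketch of the DTW alignment just described is given below, using a Euclidean local distance between feature vectors and the usual three-way recurrence over insertion, deletion and match steps. The maximum sequence length and feature dimension are fixed constants chosen only for this illustration.

#include <math.h>
#include <float.h>

#define MAX_LEN  512    /* maximum number of frames per sequence (assumed) */
#define FEAT_DIM 13     /* feature vector dimension (assumed)              */

static double local_dist(const double *a, const double *b)
{
    double d = 0.0;
    for (int k = 0; k < FEAT_DIM; k++)
        d += (a[k] - b[k]) * (a[k] - b[k]);
    return sqrt(d);
}

/* Accumulated distance of the best warping path aligning input[0..n-1]
 * with templ[0..m-1]; n and m must not exceed MAX_LEN. */
double dtw_distance(double input[][FEAT_DIM], int n,
                    double templ[][FEAT_DIM], int m)
{
    static double D[MAX_LEN + 1][MAX_LEN + 1];

    for (int i = 0; i <= n; i++)
        for (int j = 0; j <= m; j++)
            D[i][j] = DBL_MAX;
    D[0][0] = 0.0;

    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++) {
            double best = D[i - 1][j];                          /* insertion */
            if (D[i][j - 1]     < best) best = D[i][j - 1];     /* deletion  */
            if (D[i - 1][j - 1] < best) best = D[i - 1][j - 1]; /* match     */
            D[i][j] = local_dist(input[i - 1], templ[j - 1]) + best;
        }
    return D[n][m];
}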

When the vocabulary grows, template matching becomes very computationally intensive. In addition, template matching is only capable of modelling small variations of speech in pronunciation and speaking rate, which is far from enough to model the variation in real speech. Template matching is therefore only suitable for small-vocabulary, isolated-word and speaker-dependent recognition applications.

Artificial neural network (ANN)

Artificial neural networks (ANN) were developed from a growing understanding of the human central nervous system. They can be used both for discriminating between similar sound classes and for learning the relations between known inputs and phonetic data. Neural networks can represent a separate structural approach to speech recognition or be regarded as an implementation architecture, possibly incorporated in other speech recognition approaches. More detail can be found in [41][42][43].

Typical ANN techniques include the multi-layer perceptron and learning vector quantisation, as well as hybrids of ANN and HMM applying, for example, recurrent neural networks, self-organising maps and mixtures of experts. There are also ANN-HMM hybrid systems that use the neural network part to deal with the dynamic aspects of speech and the hidden Markov model part for language modelling.

Rule-based approach

The idea behind the rule-based approach is to express the knowledge that people have learned about speech in terms of explicit rules which are then used to recognise the input speech. Such knowledge includes acoustic, lexical and semantic knowledge, which represent rules describing spectrograms of speech, rules describing words of the dictionary and rules describing the syntax of the language respectively. Based on what type of knowledge is used, the rule-based approach can be further divided into an expert system and a heuristic system [44][45].

An expert system tries to model the human ability to interpret the explicit representation of speech, such as spectrograms or other visual representations. A spectrogram is an image showing how the spectral density of a speech signal varies with time. By analysing the spectral shape, duration and transitions, it is possible to label the basic sounds (phonemes) or words of language. The knowledge used in a heuristic system is derived from the constraints of the designated task, such as lexical and semantic knowledge, rather than acoustic knowledge.

A rule-based system basically works as follows. Before recognition, an expert system converts the useful knowledge into rules and stores them as a database. During recognition, the expert system first uses these rules to segment and label the input speech into small regions and then converts the list of labelled regions into a list of phonemes or words [46].

A rule-based system can achieve acceptable speech recognition performance when it carries out the particular task for which it was designed, such as recognising a small, speaker-dependent vocabulary. When the size of the vocabulary increases or the variety of speakers and speech conditions is taken into account, the rule-based system is not practicable. This is because the existing knowledge of speech is still incomplete and because there is no well-defined, automatic procedure to establish all the a priori knowledge of phonetic units over a wide variety of speakers and speech conditions. The knowledge is usually converted into rules manually, which requires significant effort. Furthermore, no standard way to segment and label the input speech has been developed. Due to these unsolved problems the rule-based approach is not competitive with the self-organising approaches (TM, ANN, HMM) introduced in the previous sections.

2.3 AVAILABLE ASR SOFTWARE

This section presents details of the available ASR software. This is introduced in three parts: first, the criteria that are used to select the appropriate software are presented; second, the types of ASR software are introduced; and third, the most suitable candidates are introduced in further detail and analysed to find the most suitable speech recognition software for our project.

2.3.1 SOFTWARE SELECTION CRITERIA

In this project, the criteria for speech recognition software are as follows:

1) Free for both commercial and academic use.

2) Distributed with source code.

3) No limitation on redistribution of the modified source code.

4) Can be used for building a complete state-of-the-art speech recognition system.

5) Used for speaker independent, large-vocabulary, continuous speech recognition.

6) Has high recognition accuracy.

7) Has fast recognition speed.

2.3.2 TYPES OF ASR SOFTWARE

A list of speech recognition software can be found in Stephen Cook's Speech Recognition HOWTO [16]. The software can be categorised into two groups: commercial software and free software. Commercial speech recognition software includes IBM Via Voice, Speech Works, Naturally Speaking, Dragon Dictate, Philips Speech Magic, Babel Technologies, Vocalis Speechware, Abbot, Entropic, iListen, LumenVox and so on. Free speech recognition software includes NICO, ISIP, Julius, HTK, Sphinx, VoxForge, CVoiceControl, XVoice, Open mind speech and so on.

According to the selection criteria, it is easy to decide that some of this software is not suitable for this project. None of the commercial software is suitable, even though it can provide good performance, because it is not free to use and its source code is not available.

For the free software category, although all of the packages are distributed with their source code, some of them are not suitable for our project due to their limited recognition capability. VoxForge [47] is not a speech recogniser; it collects transcribed speech and "compiles" it into an acoustic model which can be used by other speech recognition engines. Open mind speech [48] is not fully developed and is primarily intended for developers. CVoiceControl [49] is used for recognising commands that can be used to execute Linux commands. XVoice [50] provides continuous speech dictation and speech control of XWindow applications. The remaining free speech recognition software, listed in Table 2.1, is potentially suitable for our project and is further introduced in the next section.

TABLE 2.1 THE MOST POPULAR FREE SPEECH RECOGNITION SOFTWARE

Name: NICO
Reference: [51][52]
Main features:
  1) Written in C;
  2) A general-purpose toolkit for building, training and evaluating artificial neural networks (ANN);
  3) Used as a component for building HMM/ANN hybrid ASR.

Name: ISIP
Reference: [53][54]
Main features:
  1) Built on a vast hierarchy of general-purpose C++ classes;
  2) Provides extreme flexibility in its graph search mechanism;
  3) Accepts multiple language models simultaneously;
  4) Suitable for building customised recognition systems.

Name: Julius
Reference: [55][56]
Main features:
  1) Written in C;
  2) Performs a two-pass forward-backward search;
  3) Accepts multiple language models and acoustic models;
  4) Initially developed for recognising Japanese.

Name: HTK
Reference: [57]
Main features:
  1) Written in C;
  2) A toolkit for building and manipulating hidden Markov models (HMM), mainly used for speech recognition;
  3) Flexible in the topology of the HMM;
  4) Supports multiple types of input speech.

Name: Sphinx
Reference: [58][59]
Main features:
  1) Written in C, or Java, depending on the version;
  2) Based on HMM and N-gram models;
  3) Specifically designed for building high-performance, large-vocabulary, speaker-independent, continuous ASR;
  4) Less flexibility in the topology of the HMM and the input speech format.

2.3.3 FREE SOFTWARE COMPARISON

2.3.3.1 NICO

NICO [51], developed by Nikko Storm, is mainly intended for speech recognition applications, particularly for solving the phoneme probability estimation problem with artificial neural networks (ANN) in automatic speech recognition. The NICO toolkit is written in portable American National Standards Institute (ANSI) C code and distributed under the Berkeley Software Distribution (BSD) licence, meaning it is free to use for either commercial or academic purposes.

NICO provides a general-purpose toolkit for building, training and evaluating artificial neural networks (ANN). All the tools can be invoked by command-line instructions at the shell prompt or run from a shell script under UNIX. The network topology built in NICO is flexible due to the hierarchical structure adopted in the toolkit, which makes it easy to build multi-layer modular networks with arbitrary connection structures. For the training part, only a back-propagation learning algorithm is available in NICO, which is optimised for fast training and evaluation on large networks with recurrence and time-delay windows [52].

Besides the core tools for ANN training and evaluation, NICO also provides tools for specifying I/O file formats and for normalising the input data into a range that the network can compute. In addition, special tools are available for speech input, output and processing. NICO tools can be used for extracting mel, bark or linearly spaced filter-bank features, generating phoneme target phonetic transcription files and so on.

However, NICO is not a full toolkit for speech processing and recognition. The NICO toolkit alone is not sufficient to build a whole recognition system, but it can be used as one of the components of an HMM/ANN hybrid speech recognition system. Separate HMM-based recognition software would still need to be used, so NICO is not used in this project.

2.3.3.2 ISIP

ISIP [53][54] is a speech recognition engine provided by the Institute for Signal and Information Processing research group at Mississippi State University. It is freely available for both commercial and academic use with no licensing or copyright restrictions. The three primary components in ISIP are the acoustic front-end, the HMM training module and the Viterbi decoder. These are built on top of a vast hierarchy of general-purpose C++ classes. This hierarchical software environment is called the ISIP foundation classes, supporting advanced research in all areas of speech recognition, including signal processing.

There are two major packages available: the production system and the prototype system. The production system is based on the ISIP foundation classes. It includes the basic features of the recognition system, such as front-end processing, training and speech recognition. The prototype system is an implementation of a typical recognition system in C++. There are three prototype systems available for download: (1) TIDigits, for experiments with Baum-Welch and Viterbi training of word, monophone, word-internal triphone and cross-word triphone models; (2) Alphadigits, for constructing a self-contained speech recogniser for the relatively simple task of recognising continuous alpha-digit strings; and (3) Aurora, for building a complete recognition system across multiple processors.

In addition, the decoder in ISIP is based on a generalised hierarchical implementation of the standard time-synchronous Viterbi search paradigm [35]. This graph search mechanism provides extreme flexibility. Unlike the typical N-gram based search, which is restricted to the phrase, word, phoneme and state levels, the search structure in ISIP can be extended to an unlimited number of hierarchical knowledge sources by simply changing a configuration file. Another significant feature of the ISIP decoder is its ability to decode using multiple language models simultaneously.

ISIP is particularly suitable for research purposes, because it provides not only various tutorials on how to build a speech recognition system, but also some powerful software, including many common functions which are essential for building a speech recognition system. Researchers can make full use of these common functions and the flexible search structure to try new ideas and build customised recognition systems. In this project, the requirement is for more specific software with less flexibility but higher performance, so ISIP is not adopted here.

2.3.3.3 JULIUS

Julius [55][56] is a high-performance, open-source, large-vocabulary continuous speech recognition engine for researchers and developers. It is written in the C language and distributed under a BSD licence.

The lexical model, acoustic model and language model used in Julius are in the same format as, or a similar format to, those of HTK, which is introduced in Section 2.3.3.4. The main difference between Julius and other recognition software is that it performs a two-pass forward-backward search.

The first pass is a 2-gram synchronous beam search, which is a high-speed approximate search. This pass generates a "word trellis index", which is a set of surviving word-end nodes per frame with their scores and their corresponding starting frames. It is used by the second pass to conduct wider Viterbi scoring and wider word expansion. The second pass is a 3-gram N-best stack decoding search, which is a high-precision search performed in the reverse direction. The speech input is again evaluated by connecting it with the trellis generated in the previous pass. The precise sentence-dependent Viterbi scores are generated by a word-level stack decoding search.

The old version 3.x of Julius is a stand-alone program, mainly for research purposes, but it is not well modularised. The newer version, Julius 4, is re-written as a C library which can easily be incorporated into other applications. The internal structure of Julius has been re-organised and modularised, resulting in much better readability and flexibility. The new features include decoding with multiple language models and acoustic models, support for combinations of different types of acoustic features, support for N-grams longer than 3 and user-defined language functions, performance optimisation for memory efficiency, and so on.

Julius ultimately aims at achieving higher speech recognition accuracy with its unique features. Although it can run on a PC in real time, it is not adopted in this project because of the complexity of its search technique. Another reason why Julius is not adopted is that it is particularly popular in Japan: most of the models distributed with Julius are in Japanese and much of its documentation is in Japanese, which makes it less convenient for recognising English.

2.3.3.4 HTK

HTK [57] stands for hidden Markov model toolkit. As its name indicates, it is a toolkit for building and manipulating HMM, which can be used to model any time series, not just speech. Hence the core of HTK is similarly general-purpose. Apart from speech recognition it can also be used for speech synthesis, character recognition and DNA sequencing. However, HTK is primarily designed for building HMM-based speech processing tools, in particular recognisers. Thus, much of the infrastructure support in HTK is dedicated to this task.

In HTK, both discrete HMMs and continuous HMMs are supported. The topology of the HMM is very flexible, and the number and type of feature vectors, the number of states, and the transition matrix can easily be defined in the model definition file. The start and end states are non-emitting states, while all of the other states are emitting states. The HMM can have either a very simple or a complex topology, depending on the complexity needed.

The source code, object or executable code, associated technical documentation and any data files in the HTK distribution can only be used for educational or academic purposes. They can only be used within the user's organisation. The modified version cannot be distributed or sub-licensed to any third party in any form.

HTK is an all-in-one package, which is based on the C language and can be organised into four types of library modules based on different functionality: basic modules, file interface modules, speech input and output modules, and speech recognition system modules. Based on these library modules, there are four types of tools required to build a speech recognition system: data preparation tools, training tools, recognition tools and analysis tools.

HTK accepts three types of input: previously encoded speech parameter files, speech waveform files and direct speech from an audio device. The parameter files can be used directly for training and recognition. The speech waveform files and direct speech must be converted into parameter files by the front-end signal processing tools. For each type of input, a variety of file formats are supported by HTK.

For parameter files, HTK supports the following parameters: linear prediction filter coefficients (LPC), linear prediction reflection coefficients, LPC cepstral coefficients, LPC cepstra plus delta coefficients, LPC reflection coefficient in 16-bit integer format, mel-frequency cepstral coefficients (MFCC), log mel filter bank channel outputs, linear mel filter bank channel outputs, user defined sample kind, vector quantised data. The conversion between any two of the different types of parameters can be done by HTK. All types of parameter files can be used to train the HMM model.

To sum up, HTK is very powerful for building very complex and flexible HMM-based applications, and it can be used for other applications beyond speech recognition. With its useful tools, it can handle a wide range of speech file formats. Together with its well-documented manual, it is particularly useful for research purposes. However, its user licence agreement limits its application for commercial purposes; hence HTK is not suitable for this project.

2.3.3.5 SPHINX

CMU Sphinx [58][59] is the first continuous-speech, speaker-independent recognition system adopting the hidden Markov acoustic model (HMM) and the N-gram statistical language model.

Sphinx is released under a BSD-style licence, which gives full control over the Sphinx code and one's own code. Sphinx can be used either for commercial or for non-commercial purposes. Its source code can be freely used, modified and distributed. In addition, Sphinx does not carry a General Public Licence (GPL), which means that modified code is not required to be made freely available and can be kept under one's own licence.

Sphinx is a collection of packages which are essential for building a speech recognition system for different tasks and applications. Most of these packages are written in the C language, except for one of the decoders, which is written in Java. The available packages include: the acoustic model training tool Sphinx Train, the language model training tool (the CMU-Cambridge statistical language modelling toolkit), the speech recognisers Sphinx 1, Sphinx 2, Sphinx 3, Sphinx 4 and Pocket-Sphinx, Sphinx Base, and the recognition performance evaluator Sclite from the NIST tools. Among these packages, Sphinx Train and the CMU-Cambridge statistical language modelling toolkit are the essential tools to train the acoustic models and language models. The NIST tool Sclite can provide more detailed performance analysis information.

All versions of Sphinx are specifically designed for building high-performance, large-vocabulary, speaker-independent continuous speech recognition systems. Apart from its oldest version, Sphinx 1, the succeeding versions Sphinx 2, Sphinx 3, Sphinx 4 and Pocket-Sphinx are all available. Sphinx Base is an essential component for Pocket-Sphinx rather than a whole recogniser. This wide range of speech recognisers provides a lot of flexibility of choice and can be used for different applications. How to choose the suitable version of Sphinx is explained in more detail in the next chapter.

Sphinx can also accept parameter files, waveform files and direct speech, but the file formats accepted for each type of input are very limited. The parameter file is restricted to mel-frequency cepstral coefficient (MFCC) features. The number of coefficients for Sphinx 2 is fixed at 13, but for Sphinx 3 it can be changed to any other length. The waveform should be in raw pulse-code modulation (PCM) format. The front-end signal processing tools in Sphinx are also very limited: they can only convert raw PCM format speech into MFCC features.

In addition to discrete and continuous HMMs, Sphinx also supports semi-continuous HMMs, but the HMM topology is restricted: Sphinx 2 can only use a 5-state left-to-right topology, while Sphinx 3 can use 3- or 5-state left-to-right topologies.

To sum up, Sphinx is specifically designed for speech recognition purposes, and it imposes more limitations on acceptable file formats and model structure. Except for Sphinx 4, the documentation of the other recognisers is not as good as that of HTK. But Sphinx is still very competitive because it provides several versions of the recogniser aimed at different users' requirements. Moreover, it is free to use and distribute for both research and commercial purposes. Sphinx is therefore chosen to build the speech recognition system in this project.

2.4 ASR SYSTEMS FOR MOBILE PHONES

As introduced in the previous section, speech recognition can be considered as a two-stage process: feature extraction and pattern classification. Depending on which part is implemented on the mobile phone, there are three principal types of system: network speech recognition (NSR), distributed speech recognition (DSR) and embedded speech recognition (ESR) [12]. Each system has its own advantages and disadvantages, which are explained in the following sections.

2.4.1 NETWORK SPEECH RECOGNITION

In the NSR system, both feature extraction and pattern classification are performed by the server; the mobile phone only transmits the speech to the recognition server and receives the recognised text back, so it is also called a server-based system. There are several advantages to this type of system. First, only a small amount of resources is required on the mobile phone, so in principle any mobile phone, even an old one with a slow processor and little memory, can use this system. Second, the recognition ability depends entirely on the recognition server, on which a state-of-the-art recognition system can be implemented to recognise not only large-vocabulary, speaker-independent, continuous speech but also speech in different languages. Finally, the recognition engine on the server can be upgraded or modified without affecting the mobile phone.

The disadvantages of the server-based system are also obvious. The mobile phone loses its speech recognition ability when communication between the phone and the server is unavailable. Besides, data transmission errors and communication channel noise may be introduced into the speech, resulting in poor recognition performance. In addition, a very powerful server is required to respond effectively to many mobile phone users simultaneously.

> 3',&%'29*,-##".*%#"(8)'&'()*

In the DSR system, feature extraction is performed on the mobile phone and pattern classification is performed by the server, so it is also called a client-server system. The mobile phone first extracts a set of feature vectors from the speech and then transmits these to the server for recognition. The DSR system has most of the advantages of the NSR system, except that it requires a mobile phone with more processing ability in order to extract the feature vectors. By transmitting feature vectors over the communication channel, DSR gains further advantages. The data rate required for the communication channel is much lower, because the feature vectors are much smaller than the speech itself. In addition, better recognition performance can be achieved on the server, because most of the transmission errors and channel noise over the communication channel can be reduced or avoided.

Apart from the additional requirements on the processing ability of the mobile phone, the DSR system has some of the same disadvantages as the NSR system, such as dependence on the communication between the mobile phone and the server, and the need for a very powerful server. To sum up, the DSR system can be considered an improved version of the NSR system.

2.4.3 EMBEDDED SPEECH RECOGNITION

In the ESR system, the whole speech recognition process, including both feature extraction and pattern classification, is implemented on the mobile phone, so it is also called a client-based system. This significant difference from NSR and DSR gives the ESR system its biggest advantages: it is always ready to use and it does not depend on any recognition server or communication network. On the other hand, the disadvantage of the ESR system is also obvious: the recognition ability is largely restricted by the resources available on the mobile phone.

With the low-speed mobile phones of the past, the ESR system was only capable of recognising a few isolated spoken words from a specific speaker. However, due to the rapid development of mobile phones in terms of processing power and available memory in recent years, the recognition ability of the ESR system has been greatly improved, and it is becoming feasible for the ESR system to recognise large-vocabulary, speaker-independent, continuous speech in the near future. The ESR system is the type of system at which this project is aimed.

2.5 HARDWARE ACCELERATION OF SPEECH RECOGNITION

ASR is a computationally intensive task, usually implemented in software on the main processor of the host device. Such a software-based approach requires extremely fast, powerful, large, expensive and energy-hungry processors [60], so it cannot be directly implemented in an embedded recognition system on a mobile phone. A hardware-based approach offers a faster, more efficient and lower-energy solution.

There has been significant research in the area of hardware accelerators for speech recognition. These efforts range from hardware implementations of full speech recognisers and hardware accelerators for parts of the speech recognition process to combined hardware/software implementations. The studies also cover different types of recognition system, from small-vocabulary to large-vocabulary speech, and the recognition speeds achieved by these hardware implementations are much improved.

Although the aims of the following studies are not exactly the same as the aim of this project, which is only to implement the most time-consuming part of the source code in parallel on an FPGA, they provide novel ideas and detailed analyses of hardware implementations of Sphinx 3 and other recognition systems, which help in understanding how Sphinx 3 works and how to improve our own hardware implementation.

2.5.1 HARDWARE IMPLEMENTATION OF FULL SPEECH RECOGNISERS

For the In Silico Vox project [61] at Carnegie Mellon University, Lin et al. tried to design a complete, high-end recogniser in silicon. The recognition time and memory usage of Sphinx 3.0 were analysed in detail, and many modifications were made to the original recogniser for hardware implementation. They presented a fully functional speech recogniser on a single Xilinx XUP platform, capable of recognising a 1000-word vocabulary of speaker-independent, continuous live-mode speech [62][63]. The developers claimed it was the most complex recogniser architecture ever fully committed to a hardware-only form. Their design runs at 50 MHz, decodes roughly 2.3 times slower than real time, and achieves the same accuracy as the original Sphinx 3 software version.

Melnikoff et al. [64][65] from the University of Birmingham began their research with the implementation of a real-time monophone speech decoder in an FPGA. Their implementation used 49 discrete HMM-based monophones and performed the observation probability computation and Viterbi decoding. They then compared implementations of monophones based on both discrete HMMs and continuous HMMs [66]. The designs were implemented in a Xilinx Virtex XCV1000 FPGA. Their results showed that the discrete version, occupying 12% of the slices and running at 55 MHz, could operate 5000 times faster than real time, while the continuous version, occupying 45% of the slices and running at 47 MHz, could operate 75 times faster than real time.

In order to improve the recognition accuracy, Melnikoff et al. also implemented biphones and triphones (combinations of two and three monophones respectively) in a Xilinx Virtex XCV2000 FPGA, with 49 monophones and 634 biphones and triphones based on continuous HMMs [67]. The observation probability computation unit operated on 24-bit floating point data. Three speech files could be processed simultaneously, which means that the HMM model data loaded for every speech frame could be used three times, reducing the access time to the off-chip RAM holding the model. The whole implementation occupied 77% of the FPGA resources, ran at 33 MHz, and could operate 96 times faster than the software implementation and 13 times faster than real time.

Nedevschi et al. [68] from the University of California at Berkeley implemented a low-power system that performs complete real-time speech recognition for a very small vocabulary of only 30 words. They pointed out that the computations in the recognition network are symmetric and employed a set of very simple processing elements operating in parallel to perform the calculations. Benefiting from this parallelism, the clock frequency required for real-time recognition could be reduced and voltage scaling applied to reduce power.

Choi et al. [69] from Seoul National University developed an HMM-based 5000-word speaker-independent continuous speech recogniser running on a Virtex-4 SX35 FPGA at 100 MHz. Feature extraction was implemented in software on a soft-core central processing unit (CPU), while emission probability computation and Viterbi search were implemented as parallel, pipelined hardware blocks. They pointed out that memory access for the HMM model is the main bottleneck in hardware implementations, so they overlapped the computation time with the memory latency. In addition, the bit-width of the Gaussian parameters was reduced from 16 to 8, and the Gaussian parameters were re-used by computing the emission probabilities for multiple blocks. They also adopted a two-stage language model pruning technique to reduce the memory access bandwidth. Their results showed that the memory access bandwidth was reduced by 81% and 44% for emission probability computation and inner-word transition respectively. The developed system can run 1.52 times faster than real time using double data rate (DDR) synchronous dynamic random access memory (SDRAM).

Choi et al. [70] then moved their FPGA design to a very-large-scale integration (VLSI) chip. The internal SRAM size was minimised by moving data out to DRAM, and a custom DRAM controller was designed to read and write consecutive data efficiently. Results showed that the designed system runs 1.82 and 1.30 times faster than real time using DDR SDRAM and single data rate (SDR) SDRAM respectively.

You et al. [71] further optimised the memory access by employing techniques such as sub-vector clustering. Experimental results show that the implemented system performs speech recognition 2.4 and 1.8 times faster than real time using 32-bit DDR SDRAM and SDR SDRAM respectively. Choi et al. [72] have since expanded their FPGA-based 5000-word speech recogniser into a 20,000-word version, with a DRAM controller supporting various access patterns optimised for each functional unit of the speech recogniser.

You et al. [73] have also optimised the emission probability calculation, the most time-consuming part of the speech recogniser, on a mobile CPU, the Intel PXA270. Single instruction multiple data (SIMD) programming and software pipelining were used to reduce the execution cycles by 56.42%.

2.5.2 BACK-END SEARCH ACCELERATION BY HARDWARE

Lin et al. [63][74], from Carnegie Mellon University (CMU), focused on the design and implementation of a back-end search engine on a multi-FPGA Berkeley emulation engine 2 (BEE2) platform, which could run at 100 MHz and handle the 5000-word Wall Street Journal speech benchmark on average 10 times faster than real time. A BEE2 system with two Xilinx Virtex-II Pro VP70 FPGAs and two DRAM channels was used to validate the functionality of the complete design. In addition, they used Synopsys Design Compiler and Cadence SoC Encounter to estimate the total area and power resources required for an ASIC implementation of the design, which came to an area of 64 mm² and an estimated power of 2 W using 180 nm technology.

The research at CMU concentrated not only on high-performance implementation but also on a low-power implementation of the back-end search engine. Bourke et al. [75] showed that their design could recognise speech from a 5000-word vocabulary 1.3 times faster than real time within a 196 mW power budget.

Krishna et al. [76] from the University of Michigan considered the potential use of the thread-level parallelism inherent in the search stage of Sphinx 2 and designed a speech co-processor to meet the computational and power constraints of hand-held and portable systems. Their results showed that a simple multi-threaded, multi-pipelined processor architecture can significantly improve the performance of the time-consuming search phase of a speech recognition system. They also showed that the primary bottleneck to implementing such a co-processor is the data request rate into the memory system, and proposed some solutions to this problem.

2.5.3 ACOUSTIC MODEL COMPUTATION ACCELERATION BY HARDWARE

Mathew et al. [77] from the University of Utah profiled Sphinx 3.2 and found that the Gaussian computation occupied more than half of the computation time. They therefore designed a low-power accelerator to implement the Gaussian computation for the Sphinx 3 speech recognition system, so that the main processor can perform the rest of the recognition while the accelerator computes the Gaussian probabilities. This co-processor could hide 50% of the computation effort required for speech recognition from the main processor. In order to decrease the bandwidth required, they proposed to compute ten speech frames simultaneously over the same acoustic model data. The accelerator was synthesised for a 0.25 µm complementary-symmetry metal-oxide-semiconductor (CMOS) process and their results indicate that it consumes 29 times less power than a 2.4 GHz Pentium 4 performing the same computation at a similar recognition speed.

Schuster et al. [78] from the University of Pittsburgh presented the design of an FPGA-based acoustic modelling co-processor, based on Sphinx 3, capable of recognising the 1000-word resource management (RM) database in real time. The computationally intensive floating point calculations were replaced with fixed-point calculations. Data-driven pipelines were created to maximise the throughput of the system and minimise the number of pipeline stalls. Their pipelined acoustic modelling core could run at over 125 MHz post place-and-route on a Virtex-4 SX35-10. Schuster et al. [79] also presented the design of an FPGA-based system-on-a-chip capable of performing continuous speech recognition on medium-sized vocabularies in real time. Dedicated pipelines were created for each of the major operations in the system. Furthermore, a token-passing scheme was implemented in the later stages of the system to reduce the complexity of the search space and the amount of active data present in the system.

Marcus et al. [80] from the University of Mannheim presented an FPGA-based hardware co-processor for the Sphinx speech recognition system. The co-processor supports the training of continuous HMMs for the Sphinx recognition system, implementing a critical part of the Baum-Welch training algorithm to assist in the Gaussian probability calculations. The development platform was an mpRACE board equipped with a Xilinx Virtex-II XC2V3000, installed in a PC with a 2.66 GHz processor. The co-processor was tested at a clock frequency of 66 MHz over 13000 sets of features, with component vectors of length 39. The results showed that the PC was 13.5 times faster than the co-processor if the data load time, compute time and result read time of the co-processor were all taken into account; considering only the computation time, the co-processor showed a 1.76 times speed increase over the host PC.

2.5.4 HARDWARE-SOFTWARE IMPLEMENTATION

Vargas et al. [81] from the Catholic University PUCRS proposed a novel hardware-software method to implement a speech recognition system. This method partitions the ASR system into low-complexity, high-volume computations, such as pattern matching and logic decisions, which were implemented in parallel in hardware, and high-complexity, low-volume computations, such as signal analysis and conditioning procedures, which were implemented serially in software. Implementing part of the system in dedicated hardware leads to a considerable reduction in processing time. In addition to the hardware-software implementation of the system, Vargas et al. [82] also used a new approach in which the Viterbi algorithm, used to compute the HMM likelihood score, was built into the HMM structure in hardware, with the probabilistic state machines running as parallel processes. The implementation was designed for isolated word recognition of a very small vocabulary. The results showed that a speech processor about 500 times faster than the original implementation could be achieved.

Pazhayaveetil [83] from North Carolina State University proposed a hardware-software method for real-time, large-vocabulary speech recognition on a mobile embedded device. He designed custom ASIC blocks for Gaussian probability computation and word search, a new way to store the dictionary words in DRAM, and a low-power processor to provide high-level control over the blocks. He claimed that the word search part performed 20 times better and the Gaussian probability computation part 4 times better than corresponding specialised hardware designs proposed elsewhere. Furthermore, the innovative dictionary reduced the memory bandwidth by a factor of 11 compared to a software implementation and by a factor of 4 compared to other hardware implementations.

Chandra [84] from North Carolina State University also used a hardware-software method for real-time, large-vocabulary speech recognition on a mobile embedded device. His novel contribution was to incorporate a degree of flexibility into the ASIC design so that the architecture could be adapted to newly emerging speech recognition techniques, for example by re-designing the Gaussian calculation unit to be flexible enough to react to higher-level algorithm changes. Another contribution was to restrict the memory usage by adopting a lexical-tree dictionary and an innovative time-stamp scheme to reduce the overall memory requirements, bandwidth requirements, and power and energy consumption. Both the Gaussian calculation unit and the Viterbi decoding unit achieve real-time performance. In addition, the Gaussian calculation unit reduced power consumption by two orders of magnitude and the Viterbi decoding unit by three orders of magnitude compared to a software implementation running on a Pentium 4 processor. Compared to previous comparable ASIC designs, the energy consumption of the Gaussian calculation unit was reduced by 43% and the performance of the Viterbi decoding unit was improved by an order of magnitude.

Han et al. [85] from the Chinese University of Hong Kong presented the design of a speech recognition integrated chip that implements HMMs with continuous observation densities. Each state of the HMM is currently represented by a double-mixture Gaussian distribution, but the proposed architecture can be extended to models with higher-order multi-mixture Gaussian distributions. Their design employs a simple table look-up method to simplify the add-log operation in the computation of the probability density functions of the HMM. The chip was fabricated in a 0.35 µm CMOS technology and has a maximum operating frequency of 62.5 MHz at 3.3 V. In order to test the new IC offline, they designed a printed circuit board (PCB) consisting of an oscillator to generate a master clock, one RAM to store the look-up table and the model parameters of the vocabulary, another RAM to store the feature vectors, a 6-bit DIP switch to set the number of words in the vocabulary, and 8 light-emitting diodes (LEDs) to show the recognition results and the completion of the recognition process. In order to perform a live test of the new integrated chip, they connected it to a digital signal processor (DSP) board to form a complete speech recognition system, with the DSP board performing the front-end processing of the input speech and providing the feature vectors for chip testing. In the offline test, the chip was capable of recognising only a 50-word vocabulary; the recognition time for one word was about 0.16 s and the recognition accuracy was 93.8%. The live test was performed on an 11-word vocabulary with an accuracy of 91.8% for five male and five female speakers.

2.5.5 ACCELERATION BY GRAPHICS PROCESSING UNITS

Cardinal et al. [86] from the Centre de Recherche Informatique de Montréal introduced the use of a graphics processing unit (GPU) for computing acoustic likelihoods in a speech recognition system. They used an NVIDIA GeForce 8800 GTX GPU and the compute unified device architecture (CUDA) development framework, which presents the GPU as a parallel co-processor. The likelihood of a given mixture is the logarithmic addition of dot products for each component of the mixture. This operation is implemented as a reduction algorithm, an important building block in parallel computing, in which the addition is used as a reduction operator applied iteratively until only one element remains. In their implementation, the likelihood of each mixture is computed by one block of threads; the number of launched blocks is the number of distributions in the acoustic model, with each block containing 256 threads. Their experiments used 32, 64 and 128 Gaussians per mixture, and the results showed that the GPU implementation could achieve a speed five times faster than the optimised CPU implementation, which increased the speed of speech recognition by 33.5%. Besides the GPU implementation of the acoustic computations, Cardinal et al. also used multi-core CPUs for the Viterbi search. The multi-core implementation of the Viterbi search was 1.3 times faster than the single-core version and gave an improvement of up to 10% absolute in accuracy at real time. The use of the GPU for the acoustic likelihood computations gave a 2.8 times speed-up compared to the single-core CPU implementation, which led to a word accuracy improvement of approximately 16.6% absolute at real time.

Dixon et al. [87] from the Tokyo Institute of Technology also presented a fast method for computing acoustic likelihoods on a GPU. Using GPU acceleration, the proportion of the main processor's run time devoted to acoustic scoring was reduced from being the largest consumer to just a few per cent. They used the same GPU hardware and the same development framework as Cardinal et al., but presented results in much more detail. In their tests, they adopted Gaussian mixture models with different numbers of components (2, 4, 8, 16, 32, 64, 128, 256 and 512) and tested all versions of acoustic scoring on both CPU and GPU with different beam widths (100, 125, 150, 175 and 200). Their results showed that the GPU-accelerated decoder was faster than the CPU approach for all model sizes and beam widths evaluated. With a large beam width the accuracy was much higher, but much more time was needed to obtain the results. With a large model size and a large beam width, the CPU version was not able to evaluate the acoustic scores in real time, but this was not a problem for the GPU version. When the GPU was used as an accelerator for the CPU decoder, the percentage of CPU time spent calculating acoustic scores was reduced to less than 2% and the speed-up factor could be almost 8 for the 512-component model.

Cai et al. [88] from the Université Libre de Bruxelles utilised the GPU and single-instruction-multiple-data (SIMD) technology for the parallel computation of state likelihood evaluation in the HTK speech recognition system. In order to perform the parallel computation on the GPU, the likelihood evaluations were expressed as matrix multiplications. The GPU used was an NVIDIA GeForce 9800 GTX, and the compute unified device architecture (CUDA) development framework was again used to implement the parallel algorithm. Two large-vocabulary continuous speech corpora, TIMIT and WSJ0, were used to evaluate the performance. The acoustic models were based only on 32-Gaussian HMMs, and the results showed that the parallel decoder worked 3 times faster than the original version.

2.6 SUMMARY

In this chapter, relevant knowledge of automatic speech recognition (ASR) for our research has been introduced. Starting with an introduction to ASR difficulties, types, performance and applications, the two fundamental ASR components, front-end feature extraction and back-end pattern classification, have been explained in detail. The front-end feature extraction part covered how speech is represented and how it is pre-processed before it can be used as the input to the ASR system. The back-end pattern classification part focused mainly on the most widely used and most successful ASR approach, the statistical recognition approach, explaining the mathematical definition of the ASR problem, acoustic modelling by the hidden Markov model (HMM) and language modelling by N-gram. This part also briefly introduced other speech recognition approaches: template matching (TM), artificial neural networks (ANN) and rule-based approaches.

The available ASR software, both commercial and free, has been introduced. In particular, the free software packages have been compared to find the most suitable one for our project, which is Sphinx from Carnegie Mellon University. Moreover, the three principal ASR system architectures for mobile phones have also been introduced: network speech recognition (NSR), distributed speech recognition (DSR) and embedded speech recognition (ESR). ESR was adopted for this project.

Finally, relevant research on ESR has been presented, ranging from hardware/software co-design of ASR and hardware accelerators for parts of the speech recognition process to hardware implementations of full speech recognisers for both small-vocabulary and large-vocabulary speech.

The next chapter introduces detailed information about the ASR system based on Sphinx 3, covering the selection of the Sphinx recogniser, background information about Sphinx 3, the set-up of the ASR system and the optimisation of the ASR system.


0 ,-##".*%#"(8)'&'()*,4,*2!,#3*()*,-.')/*0*

Following the background of automatic speech recognition presented in Chapter 2, this chapter concentrates on the set-up and optimisation of the speech recognition system, which is based on Sphinx. It is presented in four parts: Sphinx recogniser selection, Sphinx 3 introduction, speech recognition system set-up, and speech recognition system optimisation.

The first part covers the selection of the appropriate version of Sphinx. The versions available, Sphinx 1, Sphinx 2, Sphinx 3, Sphinx 4 and Pocket Sphinx, have been developed to suit different situations. Each version is introduced and the most appropriate version for our project is identified as Sphinx 3.

The second part of this chapter presents background information on Sphinx 3, which is used to set up the recognition system. This includes the available executables, front-end processing, the acoustic model and language model, the search structure, search initialisation and speed-up techniques. These details are not essential for the set-up of the speech recognition system itself, but are particularly useful for understanding how the Sphinx 3 recognition system works.

The third part of this chapter gives detailed instructions on setting up the Sphinx 3 speech recognition system. To build a working speech recognition system, the following main parts are essential: database, language model training, acoustic model training, speech recognition, recognition results generation.

The final part of this chapter concentrates on optimising Sphinx 3 for best accuracy. Before optimisation is performed, the configuration parameters available in Sphinx 3 are introduced. These include an overview of the configuration parameters, those essential for input and output, and those related to speech recognition performance. After reviewing the configuration parameters, the optimisation process is presented. Firstly, the bash scripts (executable files in Linux containing a list of commands to perform particular tasks) used for large sets of experiments are introduced. Then the results are analysed to find how different configuration parameters affect the recognition results. Finally, the most suitable parameters for different recognition tasks are obtained.

3.1 SPHINX RECOGNISER SELECTION

In this section, the specific version of Sphinx used for our project is decided. The criteria used for choosing the most appropriate version are first introduced, and then Sphinx 1, Sphinx 2, Sphinx 3 and Sphinx 4 are introduced briefly and compared. Finally, a summary of these comparisons is given.

3.1.1 OVERVIEW

In order to decide which version to choose, several aspects need to be considered: programming language, accuracy and speed, platform dependence, user interface and research concern [89].

1) Programming language. Sphinx 4 is written in Java, while Sphinx 2, Sphinx 3 and Pocket Sphinx are written in C language.

2) Accuracy and speed. Sphinx 3 can be configured in two modes: slow and fast. The slow mode is the most accurate. Sphinx 2 is the fastest recogniser, even faster than the fast mode of Sphinx 3. Pocket Sphinx is a lightweight version of Sphinx 2, so it is faster than Sphinx 2. In terms of accuracy and speed, Sphinx 4 is possibly the most well-rounded recogniser.

3) Platform dependency. All versions are highly platform-independent, with Sphinx 4 being potentially the most independent. Pocket Sphinx is a version of Sphinx 2 specially optimised for embedded platforms.

4) User interface. Sphinx 4 provides the best interface, while Sphinx 2, Sphinx 3 and Pocket Sphinx are less friendly and require much more skill in scripting and more understanding of the program.

5) Research concern. Sphinx 3 is a good research platform, particularly suitable for research on acoustic modelling and fast GMM computation. Sphinx 2 and Sphinx 4 provide Finite State Grammar (FSG) language modelling on top of the N-gram language model, which can be used to build a highly structured dialogue system. Sphinx 4 can be a good research tool for implementing the search algorithm, because it obviates the need for memory management.

3.1.2 SPHINX 1

Sphinx 1, originally designed by Kai-Fu Lee [58], is the world's first high-performance, speaker-independent continuous speech recognition system. It uses the discrete hidden Markov model (DHMM) and a vector quantisation algorithm to quantise the speech into a discrete set of symbols. First, the speech signal is converted into LPC cepstral vectors, from which codebooks are computed. Then each vector is quantised by replacing its values with codeword indices from the codebooks. DHMMs are trained from the sequences of quantised vectors and are used to model triphones; these DHMMs are not unique. The same codebooks are used to convert the unknown speech into quantised vectors during the recognition process, which is guided by a simple word-pair language model.

3.1.3 SPHINX 2

Sphinx 2 [59][90] is the fastest speech recognition system. In acoustic model training, Sphinx 2 uses a semi-continuous hidden Markov model (SCHMM) to model the triphones. The SCHMM, also known as the tied-mixture model, is a hybrid of the discrete hidden Markov model (DHMM) and the continuous hidden Markov model (CHMM), so it has the advantages of both. From the DHMM point of view, the SCHMM robustly estimates the discrete output probabilities by considering multiple codewords in the vector quantisation stage, and integrates the quantisation accuracy into the HMM. From the CHMM point of view, the SCHMM reduces the number of free parameters and the computational complexity while maintaining the modelling power of the CHMM, by employing a shared mixture of continuous output probability densities for each individual HMM. The state-dependent output distributions across different phonetic models are clustered to construct the senones; as a result, states of different models are tied to the same senone if they are close in terms of a distance measure. Other improvements include dynamic features and speaker-normalised features, used in the Sphinx 2 feature extraction stages to deal with environment and speaker variation. Long-distance bigrams are used for accurate modelling of linguistic constraints in the language model training. Three pruning thresholds, corresponding to the Markov state level, the phone level and the word level, are adopted to improve the beam search efficiency.

3.1.4 SPHINX 3

Sphinx 3 [91][92] is CMU's state-of-the-art large vocabulary speech recognition system. It uses the continuous hidden Markov model (CHMM). Currently, Sphinx 3 has two recognisers available: s3slow and s3fast. S3slow, used mainly for research, is based on flat-lexicon search and is designed to achieve more accurate results. S3fast, intended for building application-specific recognisers, is based on tree-lexicon search and is designed for faster speed but less accurate results. Both versions operate in batch mode, which means that the unknown speech is pre-recorded and pre-converted into cepstral vectors. Sphinx 3 also provides tools to recognise speech in real time, in which the speech is first recorded, pre-processed and then decoded by the recogniser in a push-button manner. Other available recognition tools include align, allphone, astar and dag, which are discussed in the next section. In order to achieve faster speed in s3fast, many techniques have been adopted in the GMM computation module and the graph search module, such as the four-layer fast GMM computation (frame layer, GMM layer, Gaussian layer and component layer), Viterbi pruning, histogram pruning and phoneme look-ahead.

3.1.5 SPHINX 4

Sphinx 4 [93][94][95] is a state-of-the-art speech recognition system written entirely in the Java programming language. It uses continuous hidden Markov models (CHMM) and its framework has been designed with a high degree of flexibility and modularity. There are three primary modules in Sphinx 4: Front End, Decoder and Linguist. The Front End parameterises the speech signal into feature vectors such as mel-frequency cepstral coefficients (MFCC), perceptual linear prediction coefficients (PLP) and linear predictive coding (LPC) coefficients. The Linguist supports a variety of language models, including context-free grammars (CFG), finite-state transducers (FST), SimpleNGramModel and LargeTrigramModel, and translates the language model, pronunciation dictionary and acoustic model into a SearchGraph. The SearchManager in the Decoder uses the features and the SearchGraph to generate the results. Supported search implementations include Viterbi, Bushderby and parallel searches. In addition, the ConfigurationManager allows the modules to be combined in various ways.

3.1.6 POCKET SPHINX

Pocket Sphinx [96] is an optimised version of Sphinx 2, designed to port large-vocabulary, continuous speech recognition to embedded devices. Due to the hardware limitations of embedded devices, a direct implementation of Sphinx 2 would be very inefficient in terms of time and resources, which are very limited on such devices.

The following optimisations are applied to Sphinx 2 to produce Pocket Sphinx: memory optimisations (memory-mapped file input-output, byte ordering, data alignment and an efficient representation of the triphone-senone mapping), machine-level optimisations (the use of fixed-point arithmetic and the optimisation of data and control structures) and algorithmic optimisations (the frame-layer, GMM-layer, Gaussian-layer and component-layer fast GMM computation techniques).

3.1.7 SUMMARY

Sphinx 3 is chosen for this project because it provides the highest recognition accuracy among the available recognisers; to be precise, the fast version of Sphinx 3 is chosen because of its speed. Sphinx 4 is not chosen because it is written in Java, which is more suitable for web applications than for embedded implementation; in contrast, C is a lower-level programming language that is more suitable for embedded implementation. Although Sphinx 2 and Pocket Sphinx are faster than Sphinx 3, they are not chosen due to their lower accuracy compared to that of Sphinx 3. The lower speed of Sphinx 3 is compensated by hardware acceleration, which is the research interest of this project. However, the ideas behind Pocket Sphinx can be exploited when optimising the Sphinx 3 source code.

3.2 SPHINX 3 INTRODUCTION

This section gives detailed information about Sphinx 3 in the following six parts: available executables, front-end processing, the acoustic model and language model, the search structure, search initialisation and speed-up techniques. This information is useful for setting up the speech recognition system and for understanding the source code of Sphinx 3.

3.2.1 AVAILABLE EXECUTABLES

After downloading and compiling the Sphinx 3 package, there are four decoders and four major decoding tools available. The software architecture is shown in Figure 3.1, which is explained below.

FIGURE 3.1 THE LAYOUT OF SPHINX 3 PROGRAM [97]

The decoders include decode, decode.anytopo, livepretend and livedecode. Each of them has a slightly different function and can be used in different situations.

1) decode, also called s3fast, is a batch-mode decoder using a tree lexicon with optimisations of both the GMM computation and cross-word triphones. It is a well-balanced recogniser in terms of accuracy and speed. Its accuracy is poorer than that of decode.anytopo, while it is around five times faster.

2) decode.anytopo, also called s3slow, is a batch-mode decoder using a flat lexicon with optimisations of trigrams and cross-word triphones. It provides the best accuracy of all the decoders, very close to that of a full-trigram, full cross-word recogniser, but it is slow.

3) livepretend is a live-mode simulator using the engine of decode. It can segment a speech waveform and feed the speech segments (comprising 25 frames) to the search engine of decode. Its accuracy is slightly poorer than that of decode.

4) livedecode is a live-mode demonstration using the engine of decode. It allows the user to recognise speech in a push-button manner. It is an example that shows how the live-mode decoding application programming interface (API) can be used for building a system.

The major decoding tools include align, allphone, astar and dag; each is a decoding program with a specific purpose.

1) align is a tool that gives the user the state-, phone- and word-level alignment between the speech and its transcription.

2) allphone is a tool for phoneme recognition of the speech. It gives the full triphone expansion.

3) dag is a tool to search for the best path within a lattice by re-scoring. This lattice can be generated by decode or decode.anytopo.

4) astar is a tool to generate an N-best list, given a lattice.

3.2.2 FRONT-END PROCESSING

The front end of Sphinx 3 transforms the speech waveform into a vector of features called MFCCs [98]. This process is separated into two stages: off-line front-end processing and on-line front-end processing. Sphinx 3 provides a tool called wave2feat to perform the off-line front-end processing, which converts the raw audio sample stream into a cepstral stream, as shown in Figure 3.2.

FIGURE 3.2 FRONT-END OF SPHINX 3 [98]

This off-line front-end processing tool is controlled by several parameters: the sampling rate, the pre-emphasis coefficient, the frame rate, the window length, the number of cepstra, the number of filters, the lower filter frequency, the upper filter frequency and the DFT size. The sampling rate is the sampling frequency of the input speech signal. The pre-emphasis coefficient defines the parameter used in the pre-emphasis filter. The frame rate is the speed at which the window moves, in frames per second, and the window length is the duration of the moving window in seconds; together they control the framing and windowing steps. The number of cepstra defines the number of coefficients, including the energy coefficient, in the feature vector. The number of filters, together with the lower filter frequency and the upper filter frequency, defines the mel filter bank which transforms a power spectrum into a mel spectrum. The DFT size defines the number of points of the FFT used for computing the mel spectrum. The default values are shown in Table 3.1. The output of the off-line front-end processing is a stream of 13-dimensional vectors (consisting of 1 Pow and 12 Ceps).

TABLE 3.1 PARAMETERS USED FOR FRONT-END PROCESSING IN SPHINX 3 [98]

Parameter                     Default value
Sampling rate                 16000 Hz
Frame rate                    100 frames/sec
Window length                 0.025625 sec
Filter bank                   mel filter bank
Number of cepstra             13
Number of mel filters         40
DFT size                      512
Lower filter frequency        133.33334 Hz
Upper filter frequency        6855.4976 Hz
Pre-emphasis coefficient      0.97
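As a compact reference, the defaults in Table 3.1 can be captured in a small C structure. This is purely illustrative: the field names below are hypothetical and are not the actual Sphinx 3 (wave2feat) option identifiers.

```c
/* Illustrative front-end parameter set; field names are hypothetical,
 * not the real wave2feat option names. Defaults follow Table 3.1. */
typedef struct {
    float sample_rate;     /* Hz */
    float frame_rate;      /* frames per second */
    float window_length;   /* seconds */
    int   num_cepstra;     /* coefficients, including the energy term */
    int   num_filters;     /* mel filter bank channels */
    int   dft_size;        /* FFT points */
    float lower_freq;      /* Hz */
    float upper_freq;      /* Hz */
    float pre_emphasis;    /* pre-emphasis coefficient */
} frontend_params_t;

static const frontend_params_t frontend_defaults = {
    16000.0f, 100.0f, 0.025625f, 13, 40, 512,
    133.33334f, 6855.4976f, 0.97f
};
```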

The second stage is on-line front-end processing [91], which is included in the Sphinx 3 decoder and in Sphinx Train. It converts the stream of cepstrum vectors into a feature stream. This process consists of the following steps: cepstrum mean normalisation (CMN), cepstrum variance normalisation (CVN), automatic gain control (AGC) for normalising the power component of the cepstrum, and feature vector generation. CMN, CVN and AGC are optional, but feature vector generation is compulsory. There are two mechanisms for generating feature vectors in Sphinx 3, denoted 1s_c_d_dd and s3_1x39. The 1s_c_d_dd feature vector is computed by simply concatenating the first and second derivatives of the cepstrum to the cepstrum vector itself, resulting in a 39-dimensional vector, as shown in Figure 3.3; its layout is 1 Pow, 12 Ceps, 1 DPow, 12 DCeps, 1 DDPow and 12 DDCeps. The s3_1x39 feature vector is also a 39-dimensional vector, but its layout is slightly different, comprising 12 Ceps, 12 DCeps, 1 Pow, 1 DPow, 1 DDPow and 12 DDCeps.

FIGURE 3.3 FIRST AND SECOND DERIVATIVES OF CEPSTRUM VECTOR [91]
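The construction of the 39-dimensional 1s_c_d_dd vector can be sketched in C as below. The ±2-frame and ±1-frame differences used for the first and second derivatives are assumptions made for illustration; the exact derivative windows used by the Sphinx 3 front end may differ.

```c
#include <stddef.h>

#define NCEP 13  /* 1 power coefficient + 12 cepstra per frame */

/* Clamp a frame index into [0, nframes-1]. */
static size_t clampf(long t, size_t nframes)
{
    if (t < 0) return 0;
    if ((size_t)t >= nframes) return nframes - 1;
    return (size_t)t;
}

/* First derivative of coefficient i at frame t, using a +/-2 frame
 * difference (illustrative; the exact window may differ). */
static float delta(const float *cep, size_t nframes, long t, size_t i)
{
    return cep[clampf(t + 2, nframes) * NCEP + i]
         - cep[clampf(t - 2, nframes) * NCEP + i];
}

/* Build the 39-dimensional 1s_c_d_dd feature for frame t:
 * layout [1 Pow, 12 Ceps, 1 DPow, 12 DCeps, 1 DDPow, 12 DDCeps]. */
void make_feature_1s_c_d_dd(const float *cep, size_t nframes,
                            size_t t, float feat[3 * NCEP])
{
    for (size_t i = 0; i < NCEP; i++) {
        feat[i]            = cep[t * NCEP + i];
        feat[NCEP + i]     = delta(cep, nframes, (long)t, i);
        feat[2 * NCEP + i] = delta(cep, nframes, (long)t + 1, i)
                           - delta(cep, nframes, (long)t - 1, i);
    }
}
```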

0<><0 !"(9,&'"*1(3#$*!)3*$!)89!8#*1(3#$*

Sphinx 3 is based on sub-phonetic acoustic models [91]. The pronunciation of each word is specified as a linear sequence of basic sounds, called phonemes or phones. There are about 50 phones in the English language. Phones are further refined into context-independent (CI) phones and context-dependent (CD) phones. A CI phone, also called a basephone, refers to a phone without considering any left or right phonetic context. A CD phone, also called a triphone, refers to a phone occurring in given left and right phonetic contexts. Triphones are very important, particularly in continuous speech, because the same phone can sound very different in different contexts, so each context should be modelled by a different acoustic model. Phones also have four position attributes according to their position within the word: beginning, end, internal or single.

Every phone, both CI and CD, is modelled by a hidden Markov model (HMM). In Sphinx 3, either a 3- or a 5-state HMM is used, and each state is defined by a Gaussian mixture model (GMM). However, with 50 basephones and 4 position attributes, the total number of states is 50³ × 4 × 3 (triphones × position attributes × states per HMM), which is unmanageable and impractical to train. The HMM states for triphones are therefore clustered into a much smaller number of groups, and all the states mapped into the same group share the same GMM; each clustered group is called a senone. In addition, not only the states but also the state transition probability matrices are shared. Normally, each basephone has one state transition probability matrix, and all triphones that have the same parent basephone share the same transition matrix. The mapping from triphone states to senones and transition matrices is stored in a model definition file.

FIGURE 3.4 HIDDEN MARKOV MODEL USED IN SPHINX 3 [91]
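A minimal sketch of how the tying described above might be held in memory is given below: each (phone, state) pair maps to a shared senone, and each phone maps to the transition matrix of its parent basephone. The structure and function names are hypothetical, not the Sphinx 3 model-definition API.

```c
/* Hypothetical in-memory form of a model definition file. */
typedef struct {
    int  n_phone;        /* number of CI + CD phones           */
    int  n_state;        /* emitting states per HMM (3 or 5)   */
    int *state2senone;   /* [n_phone * n_state] -> senone id   */
    int *phone2tmat;     /* [n_phone] -> transition matrix id  */
} mdef_t;

/* Senone shared by state `state` of phone `phone`. */
static int senone_of(const mdef_t *m, int phone, int state)
{
    return m->state2senone[phone * m->n_state + state];
}

/* Transition matrix inherited from the parent basephone. */
static int tmat_of(const mdef_t *m, int phone)
{
    return m->phone2tmat[phone];
}
```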

There are two types of GMM in Sphinx: the continuous density GMM (CDGMM) and the semi-continuous density GMM (SCDGMM). In the continuous density model, each senone has its own set of Gaussian distributions and mixture weights. In the semi-continuous model, all the senones share a single codebook of Gaussian distributions, but each senone has its own set of mixture weights. The slow version of Sphinx 3 supports both types of GMM, but the fast version supports only the continuous density GMM.
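For the continuous density case, the senone score is the log of a weighted sum of diagonal-covariance Gaussians evaluated on the 39-dimensional feature vector. The sketch below shows this in plain double-precision C using log-sum-exp for numerical stability; Sphinx 3 itself works with scaled integer log values and precomputed per-Gaussian constants, so this is illustrative only.

```c
#include <math.h>
#include <float.h>

/* Log-likelihood of a feature vector under one senone modelled as a
 * diagonal-covariance Gaussian mixture (continuous density GMM). */
double senone_log_likelihood(const float *feat, int dim, int n_mix,
                             const float *mean,   /* [n_mix * dim] */
                             const float *var,    /* [n_mix * dim] */
                             const float *mixw)   /* [n_mix]       */
{
    const double two_pi = 6.283185307179586;
    double best = -DBL_MAX;
    double logp[64];                 /* assumes n_mix <= 64 for this sketch */

    for (int m = 0; m < n_mix; m++) {
        double lp = log(mixw[m]);
        for (int d = 0; d < dim; d++) {
            double diff = feat[d] - mean[m * dim + d];
            double v    = var[m * dim + d];
            lp += -0.5 * (diff * diff / v + log(two_pi * v));
        }
        logp[m] = lp;
        if (lp > best) best = lp;
    }

    /* log-sum-exp over the mixture components */
    double sum = 0.0;
    for (int m = 0; m < n_mix; m++)
        sum += exp(logp[m] - best);
    return best + log(sum);
}
```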

Sphinx 3 uses a conventional bigram or trigram back-off language model. A trigram language model consists of unigram, bigram and trigram components. A unigram is the probability of a single word in the language. A bigram is the conditional probability of word2 immediately following word1. A trigram is the conditional probability of word3 immediately following the sequence word1 word2.
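A back-off language model of this kind is usually written in the following form (a standard Katz-style formulation, shown here as an assumption; the exact discounting scheme depends on the language model toolkit used):

```latex
P(w_3 \mid w_1, w_2) =
\begin{cases}
  P^{*}(w_3 \mid w_1, w_2) & \text{if the trigram } w_1 w_2 w_3 \text{ occurs in the training data,} \\
  \alpha(w_1, w_2)\, P(w_3 \mid w_2) & \text{otherwise,}
\end{cases}
```

where P* is a discounted trigram probability and alpha(w1, w2) is the back-off weight; the bigram P(w3 | w2) backs off to the unigram P(w3) in the same way.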

The language model file is an ASCII text file. In order to speed up loading a large language model into memory, the text language model file should be compiled into binary form. In Sphinx 3, a disk-based language model strategy is used, which means that only the required portions are read into memory on demand rather than the entire language model file.

3.2.4 SEARCH STRUCTURE

The search implementation in Sphinx 3 is separated into two inter-related components: the Gaussian mixture model (GMM) computation module and the graph search module [99]. For every frame, the computation module computes all active GMMs marked by the graph search module, and the graph search module then uses the GMM scores to perform the graph search. This process is shown in Figure 3.5.

FIGURE 3.5 TWO STAGES OF SPHINX 3 [99]

The separation of the GMM computation and search modules is very useful, because speed-up techniques can be applied to one module, improving speed performance without changing the other module.
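The per-frame interaction between the two modules can be pictured with the following skeleton. The function names are placeholders rather than the Sphinx 3 API; the point is only the ordering: score the senones marked active, then let the graph search consume those scores.

```c
/* Placeholder stages; names are illustrative, not the Sphinx 3 API. */
static void mark_active_senones(int t)          { (void)t; /* flag senones referenced by active HMMs   */ }
static void compute_active_senone_scores(int t) { (void)t; /* GMM computation module                    */ }
static void search_one_frame(int t)             { (void)t; /* Viterbi update, pruning, word transitions */ }

/* Per-frame decoding skeleton reflecting the two-module split. */
void decode_utterance(int n_frames)
{
    for (int t = 0; t < n_frames; t++) {
        mark_active_senones(t);
        compute_active_senone_scores(t);
        search_one_frame(t);
    }
}
```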

3.2.5 SEARCH INITIALISATION

Before the search can be performed, Sphinx 3 must go through an initialisation step [91]. Once this is done, the initialised state remains the same for the entire run. The initialisation includes the following stages: model initialisation, determination of the effective vocabulary, log-base initialisation and lexical tree construction.

Model initialisation loads the pronunciation lexicon and the acoustic and language models. This set of models is used to decode all subsequent input speech and cannot be changed during the recognition process. After the models are loaded, the effective vocabulary is determined by comparing the pronunciation lexicon with the words in the language models. The effective vocabulary includes all the words that the decoder is capable of recognising; it also cannot be changed during the recognition stage.

Log-base initialisation is used for the GMM likelihood computation. For computational efficiency, Sphinx 3 performs all likelihood computations in the log domain, so all the scores reported by the decoder are log-likelihood values. The logarithm base can be specified by the user and adjusted at the beginning of recognition so that the accumulated log-likelihood values do not overflow the range of a 32-bit integer. Initialisation of the search is analysed further in the next chapter.
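The idea behind the user-selectable logarithm base can be illustrated with a small floating point sketch. With a base only slightly greater than 1, log-domain scores are large in magnitude but, after rounding, still fit comfortably in a 32-bit integer; Sphinx 3 additionally replaces the log/exp calls below with a precomputed addition table, so this is a conceptual sketch only.

```c
#include <math.h>

/* Log-domain arithmetic with a user-selectable base. */
typedef struct { double base; double log_of_base; } logmath_t;

static logmath_t logmath_init(double base)
{
    logmath_t lm = { base, log(base) };
    return lm;
}

/* Probability -> log-domain score (would be rounded to an int in Sphinx 3). */
static double logmath_log(const logmath_t *lm, double p)
{
    return log(p) / lm->log_of_base;
}

/* Add two probabilities given only their log-domain scores:
 * log_b(b^a + b^c) = max + log_b(1 + b^(min - max)). */
static double logmath_add(const logmath_t *lm, double a, double c)
{
    double hi = (a > c) ? a : c;
    double lo = (a > c) ? c : a;
    return hi + log(1.0 + pow(lm->base, lo - hi)) / lm->log_of_base;
}
```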

3.2.6 SPEED-UP TECHNIQUES

The search routine of the speech recognition system is a very time-consuming process. Although the Viterbi algorithm is used to search efficiently for the best path through an HMM, the number of HMMs that need to be searched during the whole recognition process is huge. In order to speed up the recognition process, various techniques have been adopted in Sphinx 3 [100]. They can be categorised into two broad types, fast GMM computation techniques and fast search techniques, which are shown in Figure 3.6 and explained below.

FIGURE 3.6 TECHNIQUES USED TO SPEED UP THE RECOGNITION PROCESS [100]

3.2.6.1 FAST GMM COMPUTATION TECHNIQUES

The fast GMM computation techniques are used only in the fast version of Sphinx 3. Fast computation of the GMMs can be achieved at four levels: frame level, senone level, Gaussian level and component level. Further details can be found in [100][101][102].

At the frame level, the speed-up is achieved by skipping some frames without computing their GMM scores. If a frame is skipped, its senone scores are copied from the most recently computed frame. Two techniques are available: simple down-sampling (SDS) and vector-quantiser-based down-sampling (VQDS). SDS simply computes the frame scores only every other frame. In VQDS, a VQ codebook is trained from all the means of the GMMs and the feature vector of every frame is quantised using that codebook; a frame is skipped if its feature vector is quantised to the same codeword as that of the previous frame.
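Simple down-sampling is easy to sketch: evaluate the GMMs on every other frame and copy the scores forward otherwise. The function and array names below are illustrative.

```c
#include <string.h>

/* Placeholder for the full GMM evaluation of one frame. */
static void compute_senone_scores(float *frame_scores, int n_senones, int t)
{ (void)frame_scores; (void)n_senones; (void)t; }

/* Frame-level simple down-sampling (SDS): senone scores are computed only
 * on every other frame; a skipped frame reuses the scores of the most
 * recently computed frame. scores has n_frames * n_senones entries. */
void score_frames_sds(int n_frames, int n_senones, float *scores)
{
    for (int t = 0; t < n_frames; t++) {
        float *cur = scores + (size_t)t * n_senones;
        if (t % 2 == 0)
            compute_senone_scores(cur, n_senones, t);
        else
            memcpy(cur, cur - n_senones, n_senones * sizeof(float));
    }
}
```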

Senone-level techniques, also called GMM-level techniques, ignore some senones when evaluating each frame. The representative technique is context-independent GMM (CIGMM) based selection, which works as follows: firstly, all the CIGMM scores are computed for the frame; secondly, a beam is applied to these scores; thirdly, a decision is made as to whether each context-dependent GMM (CDGMM) should be computed or not. If the corresponding CIGMM's score is within the beam, the CDGMM's score is computed; otherwise the CDGMM's score is backed off to the corresponding CIGMM's score.
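The CIGMM-based selection can be sketched as follows, assuming log-domain scores (so the beam is a negative offset added to the best CI score) and an illustrative mapping from each CD senone to its parent CI GMM.

```c
/* Senone-level (GMM-level) selection sketch: a context-dependent senone is
 * fully evaluated only if its parent context-independent GMM score falls
 * within a beam of the best CI score; otherwise the CI score is used as a
 * back-off. All names are illustrative. */
void select_cd_senones(int n_cd, const int *cd2ci,
                       const float *ci_score, float best_ci_score,
                       float beam,                  /* log-domain, beam <= 0 */
                       float *cd_score,
                       float (*eval_cd)(int senone))
{
    for (int s = 0; s < n_cd; s++) {
        float ci = ci_score[cd2ci[s]];
        if (ci >= best_ci_score + beam)
            cd_score[s] = eval_cd(s);   /* full CD GMM evaluation */
        else
            cd_score[s] = ci;           /* back off to the CI score */
    }
}
```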

Gaussian-level techniques exploit the fact that only a few Gaussians in a GMM dominate the likelihood computation. Different techniques are used to decide which Gaussians are dominant; the most common are simple vector-quantiser Gaussian selection (VQGS) and sub-vector quantiser Gaussian selection (SVQGS).

Component-level techniques are very similar to Gaussian-level techniques. The distribution of features in the full space can be approximated by a GMM, and each component of the GMM can be considered as a sub-space feature distribution. In each sub-space, only a few Gaussians dominate the likelihood computation, so similar techniques can be used to choose the best Gaussians in each individual sub-space. The representative technique is SVQGS applied to the sub-spaces.

0<> +!,&*,#!%".*&#".)'E9#,*

The fast search techniques fall into three categories [103]: the adoption of a tree lexicon, Viterbi pruning and histogram pruning, and phoneme look-ahead. The slow version of Sphinx 3 uses a flat lexicon, in which every word is treated independently, as shown in Figure 3.7. The fast version of Sphinx 3 uses a tree lexicon, as shown in Figure 3.8. By using this prefix-tree structure to represent the lexicon, the sharing among words is maximised, resulting in a large compression of the search space.

FIGURE 3.7 FLAT LEXICON FOR SEARCH [104]

FIGURE 3.8 TREE LEXICON FOR SEARCH [104]

Pruning is adopted in Sphinx 3 to constrain the active search space to computationally manageable limits. Pruning means discarding the less promising hypotheses during the recognition process. There are two kinds of pruning in Sphinx 3: Viterbi pruning and histogram pruning, also known as beam pruning and absolute pruning respectively. The pruning parameters are determined empirically.

Sphinx 3 recognises speech in a time-synchronous manner, that is, one frame at a time. At each frame, the decoder generates a list of active HMMs to match the next frame of input speech according to a threshold value, which is obtained by multiplying the best state likelihood by a fixed beam width. If an HMM's state likelihood is below the threshold, it is discarded from the list of active HMMs for the next frame. The beam width is a value between 0 and 1, where 0 means keeping all HMMs in the active list and 1 means keeping only the best-scoring HMMs. Apart from a beam based on the best state score of the source HMM, a beam can be based on the exit state score of the source HMM for determining the list of active HMMs that can transit to their successors in the lexical tree. In addition, a beam can also be based on the exit state scores of the leaf HMMs in the lexical tree for determining which words are recognised at any frame during decoding.
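In the log domain, multiplying the best likelihood by the beam width becomes adding a (negative) log beam, which is how the pruning threshold is usually formed. A minimal sketch:

```c
/* Viterbi (beam) pruning sketch: after evaluating a frame, an HMM stays
 * active only if its best state score is within the beam of the overall
 * best score. Scores and the beam are log-domain values (beam <= 0). */
typedef struct { float best_score; int active; } hmm_t;

void beam_prune(hmm_t *hmm, int n_hmm, float beam)
{
    float best = -1e30f;
    for (int i = 0; i < n_hmm; i++)
        if (hmm[i].active && hmm[i].best_score > best)
            best = hmm[i].best_score;

    float threshold = best + beam;      /* log domain: multiply -> add */
    for (int i = 0; i < n_hmm; i++)
        if (hmm[i].active && hmm[i].best_score < threshold)
            hmm[i].active = 0;
}
```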

Even with beam pruning, the search space can become computationally overwhelming when a large number of HMMs fall within the pruning threshold, because the decoder keeps all of them as active HMMs. In practice, when the number of active HMMs becomes very large, it is not necessary to evaluate all of them to find the correct word, so absolute pruning is used to limit the number of active entities at any search point. Similarly to beam pruning, three types of absolute pruning are available: one limiting the maximum number of HMMs that can remain active at any frame, one limiting the maximum number of distinct words recognised at any given frame, and one limiting the maximum number of distinct word histories recorded at any given frame.
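Absolute pruning can be sketched as choosing a score threshold that keeps at most a fixed number of the best entries. A real implementation derives this threshold from a score histogram rather than a sort; the version below uses a sort purely for clarity.

```c
#include <stdlib.h>

/* Descending order comparison for qsort. */
static int cmp_desc(const void *a, const void *b)
{
    float fa = *(const float *)a, fb = *(const float *)b;
    return (fa < fb) - (fa > fb);
}

/* Histogram (absolute) pruning sketch: if more than max_active entries
 * survive the beam, return the score of the max_active-th best one so
 * that anything below it can be deactivated. */
float absolute_prune_threshold(const float *scores, int n_active, int max_active)
{
    if (n_active <= max_active)
        return -1e30f;                   /* nothing to prune */

    float *tmp = malloc((size_t)n_active * sizeof(float));
    if (!tmp)
        return -1e30f;
    for (int i = 0; i < n_active; i++) tmp[i] = scores[i];
    qsort(tmp, (size_t)n_active, sizeof(float), cmp_desc);
    float threshold = tmp[max_active - 1];
    free(tmp);
    return threshold;
}
```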

Another technique used for fast search is phoneme look-ahead [105]. This technique uses approximate senone scores for future frames to determine whether a phone arc should be extended during the search: a phone is kept active if any senone of its HMM is active in any of the next N frames. Using this technique, the search space can be considerably reduced.

0<0 ,-##".*%#"(8)'&'()*,4,*,#&A9-*

This section presents details of how to set up a working speech recognition system based on Sphinx 3. The main components needed for the system are first introduced, and then each component is explained in detail.

3.3.1 OVERVIEW

In general, five main components are required, namely a database, a language model trainer, an acoustic model trainer, a speech recogniser and an aligner, which are shown in Figure 3.9.

In this project, AN4 [106], RM1 [107] and TIMIT [108] are used as the databases. The CMU-Cambridge statistical language modelling toolkit [109] is used as the language model trainer, Sphinx Train [110] as the acoustic model trainer, and Sclite [111] as the aligner. These databases and programs are introduced in later sections.

The database shown in Figure 3.9 comprises three parts: a training part, a testing part and a common part. Both the training part and the testing part include four types of file: a speech transcript file, the speech files, a control file and a configuration file. The common part of the database includes the dictionary file and the phone list file. These ten types of file are essential for setting up the whole speech recognition system.

In the transcript file, each line contains the transcript of one speech file, and in the control file, each line contains the name of one speech file; the same line in the transcript file and the control file refers to the same speech file. The configuration file contains all the configuration parameters required by the acoustic model trainer and the speech recogniser. In the dictionary file, each line defines the pronunciation of one word as a linear sequence of phonemes. All the words included in both the training and testing transcript files appear in the dictionary file. The phone list file gives the list of phonemes used in the dictionary file.

FIGURE 3.9 COMPONENTS FOR SPEECH RECOGNITION SYSTEM

As shown in Figure 3.9, both the training speech transcript file and the testing speech transcript file in the database are used by the language model trainer to generate the language model. In addition, the training part and the common part of the database are used by the acoustic model trainer to generate the acoustic model. The generated language model and acoustic model, as well as the testing part and the common part of the database, are used as the input to the speech recogniser.

The output from the speech recogniser is the output transcript file, which contains the recognised transcripts of the testing speech files. The recognised transcripts are also aligned with the original transcripts to show the differences between them. In order to obtain detailed recognition results, such as the word error rate (WER) and the percentages of the three recognition error types, namely substitutions, insertions and deletions, the output transcript file is further processed by the aligner.

3.3.2 DATABASE PREPARATION

Three databases are adopted in this project: AN4, RM1 and TIMIT. The number of speech utterances in each database is shown in Table 3.2.

TABLE 3.2 THE NUMBER OF SPEECH UTTERANCES IN AN4, RM1 AND TIMIT

Database    Utterances for training    Utterances for testing
AN4         948                        130
RM1         1600                       600
TIMIT       4620                       1680

The AN4 database [106], which stands for alphanumeric database, was recorded at Carnegie Mellon University (CMU) in 1991. It contains 948 training and 130 test utterances. Three formats are available for download from the CMU website: raw format (.raw), NIST's Sphere format (.sph) and the mel cepstral coefficients format (.mfc). Both Sphinx Train and Sphinx 3 accept only the mel cepstral coefficients format (also called cepstra), so it is most convenient to download the utterances in that format.

The RM1 database [107] stands for the resource management database. It contains 1600 utterances for training and 600 utterances for testing, and the Sphinx Group provides only the mel cepstral coefficients format for download. This database is created from the speaker-independent portion of the resource management database.

The TIMIT database [108] is an acoustic-phonetic continuous speech corpus designed for the development and evaluation of automatic speech recognition systems. It contains speech from 630 speakers representing eight major dialect divisions of American English, each speaking ten phonetically rich sentences, giving 6300 speech utterances in total. Of these, 4620 utterances are used for training and the remaining 1680 for testing.

The speech files in every database must be in mel cepstral coefficients format (cepstra) before they can be used by Sphinx Train and Sphinx 3, so all the audio files are converted into cepstra files by a tool called wave2feat, which is included in Sphinx Train. The audio files can be in any of the following formats: raw audio (.pcm), NIST Sphere (.sph) or MS Wav (.wav). The audio files in TIMIT are Sphere-headed speech waveforms (.wav), which can be converted into cepstra files by wave2feat. There is a Perl script in the Sphinx Train source code called make_feats.pl which generates the folders and runs wave2feat to perform the conversion automatically; details can be found in Appendix F.

For both the AN4 and RM1 databases, all essential types of files except the training and testing configuration files can be downloaded from the Sphinx website and used to set up the speech recognition system directly. The training and testing configuration files are created by the set-up Perl scripts that come with Sphinx Train and Sphinx 3 respectively, once these have been installed; the details can also be found in Appendix F. For the TIMIT database, all of the files except the speech files are created either manually or by code, which can be found in Appendix B.

3.3.3 LANGUAGE MODEL TRAINING

For the AN4 and RM1 databases, the language model is provided by the Sphinx group. However, for a newly added database such as TIMIT, the language model must be generated using a language model trainer, in this case the CMU-Cambridge statistical language modelling toolkit, which can be downloaded from the Sphinx group website [109]. The procedure for creating the language model is shown in Figure 3.10.

FIGURE 3.10 PROCEDURE FOR TRAINING THE LANGUAGE MODEL

Five programs are used in this procedure. The first four, text2wfreq, wfreq2vocab, text2idngram and idngram2lm, are obtained from the CMU-Cambridge language modelling toolkit and are essential for creating the language model. Two versions of the language model are generated: the Advanced Research Projects Agency (ARPA) standard format (TIMIT.arpa) and a binary format (TIMIT.binlm). The last program, lm3g2dmp, can be found in Sphinx 3 and is used to convert the ARPA-standard format into the ARPA dump binary format.

TIMIT.all consists of both the training transcripts file and the testing transcripts file but excludes the beginning-of-sentence symbol <s>, the end-of-sentence symbol </s> and the utterance name. TIMIT.wfreq, TIMIT.vocab and TIMIT.idngram are intermediate files; further details can be found in the toolkit documentation. TIMIT.ccs is a context cue file which contains the list of words that are to be considered "context cues"; in this project the file includes only the <s> and </s> symbols. The whole procedure can be carried out by the bash script included in Appendix F.

3.3.4 ACOUSTIC MODEL TRAINING

This part generates the acoustic models required by Sphinx 3. Although some acoustic models are available from the CMU Sphinx open source models website [112], it is better to use acoustic models trained on the specific database: the models in [112] are trained on very large databases for general recognition purposes and therefore reduce either the accuracy or the speed of the recognition system for a specific task.

Sphinx Train is used as the acoustic model trainer and can be downloaded from the Sourceforge CMU Sphinx website [110]. The set-up procedure is presented in Appendix F.

The Perl script setup_tutorial.pl in Sphinx Train copies all the required executables, library files and script files into the database folder. After extraction, each database folder contains three sub-folders: etc, wav and feat. All the components required by Sphinx Train, except for the speech utterances, are located in the etc folder, while the wav and feat folders contain the original audio files and the cepstra files respectively.

After running the setup_tutorial.pl script, the training configuration file is copied into the etc folder. In addition, another five folders (bin, scripts_pl, model_architecture, model_parameters and logdir) are created under the corresponding database directory. The bin directory holds all the Sphinx Train executables and the scripts_pl folder holds all the Perl scripts used for training the models automatically. The model_architecture and model_parameters folders contain the outputs from Sphinx Train, and the logdir folder contains the log files produced during the training process.

Most of the default parameters in the training configuration file are used for training the acoustic model. The most important parameters are shown in Table 3.3 [113]. Even small changes to these parameters can result in large changes in the accuracy and speed of the recognition system.

TABLE 3.3 MAIN PARAMETERS AFFECTING THE OUTPUT OF SPHINX TRAIN [113]

Variable name                  Description
$CFG_BASE_DIR                  Specify the directory of the database
$CFG_DB_NAME                   Specify the name of the database
$CFG_HMM_TYPE                  Specify the type of acoustic model to be trained: .cont for Sphinx 3, .semi for Sphinx 2
$CFG_STATESPERHMM              Specify the number of states per HMM
$CFG_SKIPSTATE                 Specify the topology of the HMM
$CFG_N_TIED_STATES             Specify the total number of tied states
$CFG_INITIAL_NUM_DENSITIES     Specify the minimum number of Gaussians included in the HMM states
$CFG_FINAL_NUM_DENSITIES       Specify the maximum number of Gaussians included in the HMM states

The procedure for training the acoustic model includes: verifying the training files, training context independent models for forced alignment, force-aligning the transcripts, vector quantisation, training context independent models, training context dependent models, building trees, training context dependent models, deleted interpolation and converting to Sphinx 2 format models. These steps can be carried out by the Perl script in Sphinx Train. In order to train different acoustic models, a new copy of the configuration file is first produced, and the training is then performed by the bash scripts included in Appendix F.

3.3.5 SPEECH RECOGNITION

Sphinx 3 is used as the speech recogniser, and its source code can be downloaded from the same website as Sphinx Train [103]. Sphinx 3 also includes a Perl script called setup_tutorial.pl which copies all the required executables, library files and script files into the database folder, and also copies the testing configuration file into the etc folder. Table 3.4 shows the main parameters in the testing configuration file that influence the accuracy and speed of Sphinx 3 [113]; their values should be set properly before recognising the test speech.

TABLE 3.4 MAIN PARAMETERS AFFECTING THE OUTPUT OF SPHINX 3 [113]

Variable name                  Description
$DEC_CFG_BASE_DIR              Specify the directory of the database
$DEC_CFG_DB_NAME               Specify the name of the database
$DEC_CFG_SENONES               Specify the number of senones
$DEC_CFG_GAUSSIANS             Specify the number of Gaussians included in the HMMs used by the decoder
$DEC_CFG_MDEF                  Specify the name of the model definition file
$DEC_CFG_MODEL_NAME            Specify the name of the acoustic model
$DEC_CFG_LANGUAGEMODEL         Specify the name of the language model
$DEC_CFG_LANGUAGEWEIGHT        Specify the language weight
$DEC_CFG_BEAMWIDTH             Specify the main pruning beam for triphones
$DEC_CFG_WORDBEAM              Specify the pruning beam for words
$DEC_CFG_ALIGN                 Specify the path to the program that performs word alignment; builtin for using Sphinx 3-0.6's own aligner

The default procedure for recognition is to run the Perl script called slave.pl included in Sphinx 3. To test different parameters, the testing configuration file is modified and the same Perl script is run again. After analysing this Perl script, it is also possible to run the speech recogniser directly from the command line using a bash script; further details are included in Appendix F.

3.3.6 RECOGNITION RESULT GENERATION

If only a few sentences are recognised by Sphinx 3, the result transcripts can be compared with the testing transcripts manually to determine whether they are correct. However, when the number of recognition results is large, the accuracy of Sphinx 3 must be analysed with other tools. Sphinx 3 provides a simple built-in aligner, which can only compute the sentence error rate (SER), and the recogniser uses this aligner by default.

A word alignment program is needed to produce further details about the accuracy of the recogniser, such as the word error rate (WER). A popular and free tool called Sclite is available from the NIST tools page [111], and its installation is described in Appendix F. In order to use it, the $DEC_CFG_ALIGN parameter in the testing configuration file is changed from its default value builtin to the path of the Sclite executable, such as ${HOME}/opt/sctk-2.22/bin/sclite.

3.4 SPEECH RECOGNITION SYSTEM OPTIMISATION

The speech recognition performance of the Sphinx 3 system, in terms of accuracy and speed, depends not only on the type of speech to be recognised but also on the configuration parameters chosen. The goal of this part is to find the set of parameters that achieves the best recognition accuracy for a specific database. This is an essential step in configuring the Sphinx 3 recognition system so that the recogniser is better matched to specific recognition tasks.

3.4.1 CONFIGURATION PARAMETERS

There are more than 240 configuration parameters provided as command-line options in the Sphinx 3 recognition system [89], which can be categorised into several groups: parameters for specifying input and output data, parameters for optional configuration and performance tuning, and parameters used for debugging purposes. In the following sections, the most important ones for this work are introduced.

3.4.1.1 ESSENTIAL INPUT AND OUTPUT PARAMETERS

The large number of parameters gives the user considerable flexibility in specifying the operations carried out by Sphinx 3, but such a large number of alternatives can be confusing. Fortunately, the designers of Sphinx 3 have provided suitable default values for many of the parameters, making good quality recognition possible by specifying only the input and output files. The parameters that must be provided when invoking Sphinx 3 are listed in Table 3.5.

0 -#%+(%1!)"#*%#$!*"()+'89%!&'()*-!%!1#&#%,*

The most important parameters for the acoustic models are the number of senones and the number of Gaussians in the Gaussian mixture model (GMM). Both have a significant influence on the recognition accuracy and speed. Generally, the larger the numbers of senones and Gaussians, the better the recognition results but the longer the calculation time. However, if the number of speech samples is not sufficient to train all of the Gaussians for each senone, then increasing the numbers of senones and Gaussians can result in worse performance. The remaining performance tuning parameters are listed in Table 3.6.
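The influence of these two parameters on the calculation time follows directly from the structure of the senone scoring stage, in which every frame is scored against every Gaussian of every senone across the full 39-dimensional feature vector. The function below merely counts the dominant multiply-accumulate operations; the name and its use are illustrative, not taken from Sphinx 3.

```c
/* Dominant operation count for GMM senone scoring: one multiply-accumulate per
 * feature dimension, per Gaussian, per senone, per frame.  For example, 100
 * frames (one second of speech), 1000 senones, 8 Gaussians and 39 dimensions
 * give roughly 31 million MACs per second of audio. */
long long gmm_mac_count(long long frames, long long senones,
                        long long gaussians, long long dims)
{
    return frames * senones * gaussians * dims;
}
```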

TABLE 3.5 ESSENTIAL PARAMETERS FOR INVOKING SPHINX 3

Parameter                          Description
-dict, -fdict                      Pronunciation dictionary
-senmgau                           Type of acoustic models
-mdef, -mean, -var, -mixw, -tmat   Acoustic models: definition file, mean file, variance file, mixture weight file and transition matrix file
-lm                                Language model
-feat, -cepext, -cepdir            Type of speech features, file name extension, file directory
-ctl, -ctloffset, -ctlcount        File listing the speech to be recognised, offset into the list, number of speech files
-hyp                               Recognition results output file
-logfn                             Recognition log information output file

TABLE 3.6 PERFORMANCE TUNING PARAMETERS FOR SPHINX 3

Parameter        Description
-lw              Language model weight for combining the acoustic probability and language model probability
-wip             Word insertion penalty used in computing the language model probability
-beam            Beam selecting active HMMs in each frame
-pbeam           Beam selecting HMMs transitioning to successors in each frame
-wbeam           Beam selecting word-final HMMs exiting in each frame
-maxhmmpf        Maximum number of active HMMs to maintain at each frame
-maxwpf          Maximum number of distinct word exits to maintain at each frame
-maxhistpf       Maximum number of histories to maintain at each frame
-ci_pbeam        Beam selecting CD senones corresponding to their CI senones
-max_cdsenpf     Maximum number of distinct CD senones to be computed
-ds              Down-sampling ratio for the frame computation

The first two parameters, lw and wip, have an important influence on the recognition accuracy by defining how the language model is incorporated into the recognition. The following six parameters, namely beam, pbeam, wbeam, maxhmmpf, maxwpf and maxhistpf, can be divided into two types: the first three are used for beam pruning and the remaining three are the absolute pruning parameters. Both types of pruning are used to constrain the large active search space to within computationally manageable limits, so that recognition times remain sufficiently short under practical circumstances. In beam pruning, HMMs that are much less likely than the best HMM are discarded or deactivated, whereas in absolute pruning the total number of active HMMs, words or histories is limited to a specific number at any time instant. The remaining three parameters, ci_pbeam, max_cdsenpf and ds, are used to accelerate the GMM computations by controlling the time spent computing the Gaussian likelihoods, and so potentially reduce the total recognition time. In order to make the calculations suitable for an embedded system, an alternative method for speeding up the GMM computations is presented in Chapter 4.
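As an illustration of how the first two parameters enter the search, a path score is conventionally formed in the log domain by adding the acoustic score to the language-model score scaled by the language weight, plus the logarithm of the word insertion penalty at each word transition. The sketch below shows this conventional combination; it is not the exact scaled-integer arithmetic used inside Sphinx 3.

```c
#include <math.h>

/* Conventional log-domain path score for a word transition:
 * acoustic log-score + lw * language-model log-probability + log(wip).
 * Illustrative only; Sphinx 3 performs the equivalent with scaled integers. */
double path_score(double acoustic_logscore, double lm_logprob,
                  double lw, double wip)
{
    return acoustic_logscore + lw * lm_logprob + log(wip);
}
```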

0 ,-##".*%#"(8)'&'()*!""9%!"4*(-&'1',!&'()*

The most important parameters related to the recognition accuracy are the number of senones, the number of Gaussians, the language weight and the word insert penalty. For the numbers of senones and Gaussians, it is generally considered that the larger they are, the better the acoustic models are trained and hence the better the recognition accuracy. In practice this is not always achievable, as the larger the numbers of senones and Gaussians, the more training data are needed. Consequently, for most realistic applications, a compromise must be made that achieves acceptable performance without requiring very large numbers of senones and Gaussians.

For the language weight and word insert penalty, there are no rules for deriving values that result in improved recognition accuracy, and they need to be determined empirically. The best values may vary considerably between speech databases, or even between acoustic models trained on the same database.

In the following sections, the experimental set-up is first introduced, and the results are then analysed to determine how the recognition results are affected by changing the parameters. Finally, the most suitable set of parameters for each speech database is obtained.

3.4.3 EXPERIMENTAL SET-UP

The number of senones (sen) is chosen from 600, 800, 1000, 1200 and 1400 for AN4 and RM1, and from 500, 1000, 1500, 2000, 2500 and 3000 for TIMIT. There are several reasons for these values. Firstly, the default number of senones for both AN4 and RM1 is 1000. Secondly, the default value is not necessarily the most suitable, so both smaller and larger values are included in the experiments. Finally, TIMIT is much larger than AN4 and RM1, as described earlier, and therefore requires a larger number of senones; a wider range of values is used so that the most appropriate value for TIMIT can be found.

The number of Gaussians (gau) can be chosen from 1, 2, 4 and 8 for all databases, and the language weight (lw) from 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 and 20, also for all databases; these language weights are the usual values suggested by the Sphinx 3 documentation. The word insert penalty (wip) can be chosen from 0.2, 0.3, 0.4, 0.5, 0.6 and 0.7 for all databases, also according to the Sphinx 3 documentation.

The experiments are performed in the following manner:

1) The experiments for each database are performed separately.

2) For each database, the experiments are divided into groups according to the number of senones.

3) For each group, the experiments are further divided into sub-groups according to the number of Gaussians.

4) For each sub-group, the experiments are performed with different combinations of the language model weight and the word insert penalty.

The total number of acoustic models to be trained is 20 for AN4 and RM1 (four alternative numbers of Gaussians and five alternative numbers of senones) and 24 for TIMIT (four numbers of Gaussians and six numbers of senones). As 11 language weights and 6 word insert penalty values are available, the total number of speech recognition experiments is 1320 each for AN4 and RM1 and 1584 for TIMIT. Depending on the size of the database, the time needed to train one acoustic model or to perform one speech recognition experiment varies from a few minutes to several hours. As it is time consuming to collect and collate so many results, a set of scripts has been developed to automate the process, which can be found in Appendix F.

3.4.4 RESULTS AND ANALYSIS

The first investigation is carried out on the smallest database, AN4, to provide an initial assessment of the experimental procedure. The analysis of the AN4 results allows the sets of experiments for RM1 and TIMIT to be designed more selectively, and the RM1 and TIMIT results are then used to further confirm how the parameters affect the recognition results.

3.4.4.1 WER FOR AN4

The first group of results shows the word error rate (WER) for AN4 with the Sphinx default of 1000 senones. The results obtained when varying the language weight and word insert penalty for 1, 2, 4 and 8 Gaussians are shown in Figures 3.11 to 3.14.


FIGURE 3.11 WER FOR AN4 WITH 1000 SENONES AND 1 GAUSSIAN

FIGURE 3.12 WER FOR AN4 WITH 1000 SENONES AND 2 GAUSSIANS

FIGURE 3.13 WER FOR AN4 WITH 1000 SENONES AND 4 GAUSSIANS

FIGURE 3.14 WER FOR AN4 WITH 1000 SENONES AND 8 GAUSSIANS

It is clear that the WER is mainly affected by the number of Gaussians. The results reveal that the recognition accuracy improves as the number of Gaussians is increased from 1 to 4, but the performance worsens when the number of Gaussians is further increased to 8. This suggests that there are insufficient training data available for training the acoustic model with 8 Gaussians.

From the figures, it also appears that the language weight has a greater influence on the WER than the word insert penalty for a specific number of Gaussians. Moreover, there is no clear correlation between recognition performance and language weight, so it is not possible to establish a general principle regarding the value that should be used, and a suitable value needs to be determined empirically.

The default value of 0.7 set in Sphinx 3 for word insert penalty can normally be used when finding the best language weight. Once the best language weights have been found, the WER can be further improved slightly by tuning the word insert penalty. The best word insert penalty is also determined empirically.

The lowest WER for AN4 with each particular number of Gaussians is shown in Table 3.7.

TABLE 3.7 LOWEST WER FOR RECOGNISING AN4

WER     Number of Gaussians   Language weight   Word insert penalty
19.0%   1                     15                0.5
15.1%   2                     14                0.6
13.8%   4                     15                0.7
15.5%   8                     16                0.4

The results with 600, 800, 1200 and 1400 senones for AN4 are shown in Figures 3.15 to 3.18. They confirm that the number of Gaussians has the most significant influence on the WER for a given number of senones. The groups of results with 600 and 800 senones show that the greater the number of Gaussians, the better the recognition accuracy, which indicates that the training data are sufficient to train the acoustic models with 8 Gaussians when the number of senones is 600 or 800. If only the results with 1 Gaussian and 2 Gaussians are considered, then all groups of results show that the more Gaussians are used, the better the recognition accuracy.

For the groups of results with 1200 and 1400 senones, little change in the WER is achieved with four Gaussians rather than two, and with eight Gaussians the WER is worse. This is likely to be because the training data are not sufficient to train acoustic models with 8 Gaussians at these numbers of senones.

The results in Figures 3.15 to 3.18 also confirm that the language weight has much more influence on the WER than the word insert penalty for a given number of Gaussians and senones. Of the parameters considered, the language weight should therefore be considered first when attempting to optimise the recognition accuracy, with the word insert penalty tuned afterwards for a final small improvement in the WER.

Following tuning of the language weight and the word insert penalty, the lowest WER for AN4 can be found for different numbers of senones and Gaussians, and these are shown in Figure 3.19.


FIGURE 3.15 WER FOR AN4 WITH 600 SENONES: (A) 1 GAUSSIAN, (B) 2 GAUSSIANS, (C) 4 GAUSSIANS, (D) 8 GAUSSIANS

FIGURE 3.16 WER FOR AN4 WITH 800 SENONES: (A) 1 GAUSSIAN, (B) 2 GAUSSIANS, (C) 4 GAUSSIANS, (D) 8 GAUSSIANS

FIGURE 3.17 WER FOR AN4 WITH 1200 SENONES: (A) 1 GAUSSIAN, (B) 2 GAUSSIANS, (C) 4 GAUSSIANS, (D) 8 GAUSSIANS

FIGURE 3.18 WER FOR AN4 WITH 1400 SENONES: (A) 1 GAUSSIAN, (B) 2 GAUSSIANS, (C) 4 GAUSSIANS, (D) 8 GAUSSIANS

FIGURE 3.19 LOWEST WER FOR RECOGNISING AN4

The optimal set of language weights and word insert penalties that give the smallest WER for different numbers of senones and Gaussians are shown in Table 3.8.

TABLE 3.8 LANGUAGE WEIGHT AND WORD INSERT PENALTY FOR AN4 THAT GIVE THE LOWEST WER

            gau=1       gau=2       gau=4       gau=8
            lw   wip    lw   wip    lw   wip    lw   wip
sen=600     15   0.7    12   0.6    15   0.7    18   0.5
sen=800     17   0.5    12   0.7    16   0.2    16   0.7
sen=1000    15   0.5    14   0.6    15   0.7    16   0.4
sen=1200    13   0.5    15   0.7    12   0.6    11   0.5
sen=1400    16   0.5    15   0.3    15   0.5    17   0.4

Table 3.9 shows the lowest five WERs obtained for AN4 as well as the corresponding parameters which produced these results.

TABLE 3.9 THE LOWEST FIVE WERS ACHIEVED FOR AN4

WER     Number of senones   Number of Gaussians   Language weight   Word insert penalty
13.3%   600                 4                     15                0.7
13.6%   600                 8                     18                0.5
13.8%   1000                4                     15                0.7
14.2%   600                 2                     12                0.6
14.1%   800                 8                     16                0.7

3.4.4.2 WER FOR RM1

From the analysis of the results for AN4, it is seen that the word insert penalty has little influence on the WER, and the number of experiments performed on the database RM1 is reduced accordingly.

As for AN4, the experiments are carried out in groups according to the number of senones, and each group is then divided into sub-groups according to the number of Gaussians. In addition, the word insert penalty is fixed at the default value of 0.7 and only the language weight is varied.

For each of the groups with 600, 800, 1000, 1200 and 1400 senones, the number of Gaussians is chosen from 1, 2, 4 and 8. For all combinations of numbers of senones and Gaussians, the language weight is varied in steps of one between 10 and 20, giving a total of 220 recognition experiments. The recognition results for RM1 are shown in Figures 3.20 to 3.24.


FIGURE 3.20 WER FOR RM1 WITH 600 SENONES

FIGURE 3.21 WER FOR RM1 WITH 800 SENONES

FIGURE 3.22 WER FOR RM1 WITH 1000 SENONES

FIGURE 3.23 WER FOR RM1 WITH 1200 SENONES

FIGURE 3.24 WER FOR RM1 WITH 1400 SENONES

It is clear from Figures 3.20 to 3.24 that the WER is mainly affected by the number of Gaussians for a fixed number of senones. As for the AN4 database, the results further confirm that as the number of Gaussians is increased from one to four, the recognition performance improves regardless of the number of senones. Again, with eight Gaussians the training data are insufficient for the larger numbers of senones and the WER worsens.

By comparing the results for the two databases, it is apparent that the influence of the language weight on the WER is more significant for AN4 than for RM1. Nevertheless, it remains a worthwhile exercise to assess the effect of different language weights on the recognition performance.

By fine tuning the language weight, the lowest WER for RM1 can be found for different numbers of senones and Gaussians, and these are shown in Figure 3.25. The optimal language weight that achieves the lowest WER for each number of senones and Gaussians is also labelled in Figure 3.25.

FIGURE 3.25 LOWEST WER OBTAINED FOR RM1

From Figure 3.25, the lowest five WERs achieved for RM1, together with the corresponding parameters used to produce them, can be identified; these are shown in Table 3.10.

TABLE 3.10 THE LOWEST FIVE WERS ACHIEVED FOR RM1

WER    Number of senones   Number of Gaussians   Language weight
5.8%   600                 8                     10
6.0%   800                 8                     15
6.1%   600                 4                     14
6.2%   1000                4                     12
6.2%   1200                4                     14

3.4.4.3 WER FOR TIMIT

Similar to the RM1 database, the word insert penalty for the TIMIT database is kept at the default value of 0.7, which further reduces the total number of experiments that need to be performed for TIMIT. As the TIMIT database is considerably larger than AN4 or RM1, larger numbers of senones are likely to be more appropriate, so values of 500, 1000, 1500, 2000, 2500 and 3000 are used. The total number of recognition experiments performed is 264.

The recognition results for TIMIT are shown in Figure 3.26. The results again confirm that larger numbers of Gaussians give a lower WER. When the number of senones is smaller than 1500, the number of Gaussians has a strong influence on the WER, whereas when the number of senones is 2000 or more, the WER changes little as the number of Gaussians is varied from 2 to 8. When the number of Gaussians is fixed at 8, the WER falls as the number of senones is increased from 500 to 1500, but rises as the number of senones is further increased from 2000 to 3000.

The lowest WER that can be achieved for TIMIT for different numbers of senones and Gaussians is shown in Figure 3.27. The optimal language weights that achieve the lowest WER for different numbers of senones and Gaussians are also labelled in Figure 3.27.

From Figure 3.27, the lowest five WERs achieved for TIMIT, together with the corresponding parameters used to produce them, are shown in Table 3.11.


FIGURE 3.26 WER FOR TIMIT: (A) 500 SENONES, (B) 1000 SENONES, (C) 1500 SENONES, (D) 2000 SENONES, (E) 2500 SENONES, (F) 3000 SENONES


FIGURE 3.27 LOWEST WER OBTAINED FOR TIMIT

TABLE 3.11 THE LOWEST FIVE WERS ACHIEVED FOR TIMIT

WER     Number of senones   Number of Gaussians   Language weight
15.3%   1500                8                     11
15.5%   1500                8                     12
15.5%   1000                8                     11
15.6%   1500                8                     10
15.6%   1000                8                     10

3.4.5 CONCLUSION

From the optimisation, the five best sets of parameters that give the lowest WER for the AN4, RM1 and TIMIT databases have been identified in Tables 3.9 to 3.11 respectively. These optimal parameters are used in the next chapter for generating the profile information for the source code and for verifying whether the modified versions of the recogniser maintain the same performance.

3.5 SUMMARY

In this chapter, the different versions of the Sphinx speech recogniser (Sphinx 1, Sphinx 2, Sphinx 3, Sphinx 4 and Pocket Sphinx) have been introduced and compared, and the most suitable recogniser, Sphinx 3, has been chosen as the decoder for this project.

The background information about Sphinx 3 has also been covered in the second section of the chapter, including the available executables, front-end processing, the acoustic and language models, the search structure, search initialisation and the speed-up techniques. Although this information is not directly related to the set-up of the speech recognition system, it will be helpful when Sphinx 3 is analysed in further detail in the next chapter.

The instructions for setting up a working speech recognition system based on Sphinx 3 have also been presented. The system consists of five main components: database preparation, language model training, acoustic model training, speech recognition and recognition result generation, and bash scripts have been developed to install all of these components. Three databases, AN4, RM1 and TIMIT, have been used to set up the recognition system, representing recognition tasks with different vocabulary sizes.

The instructions for optimising the speech recognition system for the best performance in terms of accuracy have also been introduced in this chapter. Firstly, the most important configuration parameters for Sphinx 3 were introduced and divided into two categories, namely the input and output file-related parameters and the performance-related parameters. Then the four most important parameters, the number of senones, the number of Gaussians, the language weight and the word insert penalty, were investigated to achieve the best recognition performance for the AN4, RM1 and TIMIT databases. Another set of bash scripts has been developed to perform the time-consuming optimisation process.

From the experimental results, the sets of parameters that produce the best performance in terms of accuracy have been found for each database. These parameters differ between recognition tasks and need to be determined empirically, but are now available for use in the later experiments presented in this thesis.

In the next chapter, the Sphinx 3 speech recognition system is profiled to find its most time-consuming function. This function is then further analysed and modified to make it more suitable for implementation in an embedded system.


? ,(9%"#* "(3#* -%(+'$')8:* !)!$4,',* !)3*

1(3'+'"!&'()*

After the speech recognition system has been successfully set up and the configuration parameters have been optimised in the previous chapter, the main focus of this chapter is to identify the most time-consuming function in the source code, analyse the source code, develop a new version of the recogniser that is suitable for an embedded system, verify the new version of the recogniser, and separate the most time-consuming function from the original source code. These objectives are presented in five sections in this chapter.

The first section is the source code profiling, aiming to find the most time-consuming function. The speech recogniser is recompiled to produce profile information, which is then analysed by the profiler GNU gprof [114]. Two types of output from gprof are analysed: the flat profile, from which the most time-consuming function is located, and the call graph, which shows how the most time-consuming function comes to be called.

The second section is the source code analysis, aiming to understand how the most time-consuming part of the code works. Three types of analysis are performed: input file analysis, data structure analysis and function analysis. The first two are important for understanding how the data are processed and how they are passed between functions, while the third mainly follows the call graph generated from the source code profiling. The source code related to the most time-consuming function is located and analysed directly. The debugger GNU gdb [115] is also used in the analysis, making it possible to step through the source code while it is running and so aiding the tracing of the execution.

The third section is the source code modification, aiming to develop a new version of the most time-consuming functions that is suitable for implementation in hardware. The original code performs floating point calculations on floating point data, which are very time-consuming in an embedded implementation, so a new scaled integer version has been introduced. This is covered in three parts: the new data structure that holds the integer data, the scaling up of the input data, and the integer calculation of the most time-consuming function.

The fourth section is the source code verification, aiming to verify that the newly developed code achieves recognition results similar to those of the original floating point version. Extensive experiments are set up to obtain the most appropriate scaling number.

The last section is the source code separation, aiming to separate the most time-consuming part of the code from the rest of the original code. As the main focus of this work is on this part, only this part of the code, rather than the whole speech recognition code, is implemented in the embedded system. The different versions are also implemented on different platforms to look for any improvements in terms of speed.

?<= ,(9%"#*"(3#*-%(+'$')8*

This section concentrates on profiling the execution of the Sphinx 3 code. The main purpose of this activity is to determine those functions in the source code that take the longest time to execute.

?<=<= ,(9%"#*"(3#*%#"(1-'$!&'()*

GNU gprof [114] is used to analyse the source code in this project. Gprof is a profiler that makes it possible to look inside a program and find out what percentage of the execution time is used by each function and which functions are called by which other functions. The profiling information is especially useful when the source code is too large or too complex to analyse by reading it. Sphinx 3-0.6 includes more than 120 C source files, most of which contain several hundred lines, so it is virtually impossible to understand the code by reading it line by line; the profiling information, however, gives a clear overview of the structure of the code. The gprof profiler collects its information during the actual execution of the program, so information is produced only for the functions that have actually been called, and unused parts of the source code are not profiled.

In order to obtain the profiling information, there are three essential steps: firstly, adding the "-pg" flag to the compiler while compiling and linking the source code; secondly, executing the program as normal to generate the profile information file; and finally, running the profiler gprof on the profile information file to obtain the two parts of the results. These are the flat profile, showing the percentage of calculation time taken by each function, and the call graph, showing the relationships between parent functions and child functions. The speech recognition system used for profiling the source code is shown in Figure 4.1.

FIGURE 4.1 COMPONENTS FOR SPEECH RECOGNITION SYSTEM FOR PROFILING SOURCE CODE

The latter two steps are straightforward, but the first step becomes complicated when the source code is large. The source code for Sphinx 3 is distributed across several folders, each of which contains related source files that work together to fulfil a particular task. There is a Makefile in each folder which gives the instructions for compiling the source code, and the "-pg" flag should be added to each of these Makefiles to make sure that all of the source code is compiled correctly. In fact, all of the Makefiles are generated by the configure command, which is run before the make command, so the "-pg" flag can instead be added to the configure.in file, which holds the initial information used by configure.

Theoretically, this compiles and links all of the source code with the additional "-pg" flag, and one can then carry out the second and third steps to generate the profiling information. In practice, however, although the profile data file is produced when Sphinx3_decode is re-run, no results can be obtained from analysing the profile data file with gprof, indicating that the source code has not been compiled correctly with the "-pg" flag.

Because the Sphinx 3 source code can be compiled on a number of platforms, the original Makefiles appear very complicated. In order to add the "-pg" flags when compiling each source file, a simpler version of the Makefile is produced to compile the Sphinx 3 source code. To keep the compilation process simple, a short Perl script is designed to copy all of the Sphinx 3 source code into two folders: a src folder for the C files and an include folder for the header files. This recompilation process can be found in Appendix F.

4.1.2 FLAT PROFILE RESULTS ANALYSIS

For each database, the flat profile results for the recognition parameters that produced the lowest five WERs are produced and analysed. Then the flat profile results obtained with the optimal language weight and word insert penalty for different numbers of senones and Gaussians are presented and analysed.

4.1.2.1 INFORMATION INCLUDED IN THE FLAT PROFILE FILE

Before analysing the flat profile results, it is necessary to understand what is included in the flat profile file. An example is shown in Figure 4.2, which is for recognising AN4 with the best set of parameters (sen=600, gau=4, lw=15, wip=0.7). In the flat profile results, about 273 functions are identified in Sphinx 3.

In Figure 4.2, there are seven fields of information for each entry in the flat profile file [114]. The last field is the name of the function; the remaining fields give the information for that function. The first field is the percentage of the total running time of the program used by the function, and the second is the running total of the number of seconds taken by this function and those listed above it. The third field is the number of seconds taken by the function, and the fourth is the number of times the function is invoked. The fifth field shows the average number of milliseconds spent in this function per call. Finally, the sixth field shows the average number of milliseconds spent in this function and its descendants per call.

FIGURE 4.2 EXAMPLE OF FLAT PROFILE RESULTS FOR RECOGNISING AN4
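The relationships between these fields can be made explicit with a small sketch that recomputes them from each function's self time and call count; the structure and names below are illustrative, not gprof internals, and the sixth field (which also needs descendant times) is omitted.

```c
#include <stdio.h>

struct prof_entry {
    const char *name;   /* function name (last field)           */
    double self_sec;    /* seconds spent in the function itself */
    long   calls;       /* number of times it was invoked       */
};

/* Print a flat-profile-style table: percentage of total time, running
 * cumulative seconds, self seconds, number of calls and self ms per call. */
void print_flat_profile(const struct prof_entry *e, int n, double total_sec)
{
    double cumulative = 0.0;
    for (int i = 0; i < n; i++) {
        cumulative += e[i].self_sec;
        double ms_per_call = e[i].calls ? 1000.0 * e[i].self_sec / e[i].calls : 0.0;
        printf("%6.2f %9.2f %8.2f %10ld %10.3f  %s\n",
               100.0 * e[i].self_sec / total_sec,  /* percentage of time */
               cumulative,                         /* cumulative seconds */
               e[i].self_sec,                      /* self seconds       */
               e[i].calls,                         /* calls              */
               ms_per_call,                        /* self ms per call   */
               e[i].name);
    }
}
```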

Of the 15 most time-consuming functions listed in Figure 4.2, the first five, in order of decreasing execution time, are mgau_eval, lextree_hmm_eval, hmm_vit_eval_3st, logs3_add and vithist_rescore. In this example, the function mgau_eval by itself consumes about 49% of the recognition time. Clearly, further results are needed to confirm the relative execution times of these functions.

4.1.2.2 FLAT PROFILE RESULTS FOR AN4

The flat profiling results for recognising AN4 are obtained from the experiments using the sets of parameters shown in Table 3.9, and the percentage of recognition time spent in each of the five most time-consuming functions is shown in Figures 4.3 to 4.7.

It can be seen that although the percentage of the total recognition time consumed varies from 30% to 63%, the function mgau_eval is consistently the most time-consuming function regardless of the parameters used in the recognition experiments. This function calculates the senone score between each frame of input speech and each senone in the acoustic model; further details about it can be found in Section 4.2.3.

The number of Gaussians used for recognition has a substantial influence on the relative recognition time consumed by mgau_eval. The percentage times taken by mgau_eval are similar in Figures 4.3 and 4.5 (results obtained using four Gaussians but different numbers of senones), and for eight Gaussians, Figures 4.4 and 4.7 also show similar percentage times. For this database, it is apparent that the larger the number of Gaussians, the greater the proportion of recognition time taken by mgau_eval.

In addition to mgau_eval, two other functions, lextree_hmm_eval and hmm_vit_eval_3st, appear in all the figures. Moreover, the function logs3_add is present in all figures except Figure 4.6, in which the number of Gaussians is 2. When these four functions are considered together, the total amount of time they consume is in the range 50% to 83%, depending on the parameter values used for recognition.

The results shown in Figure 4.8 use the parameters determined from Table 3.8. It is shown that the percentage of time consumed by mgau_eval is highly dependent on the number of Gaussians, but less affected by the number of senones. Figure 4.8 also shows that, for a given number of Gaussians, as the number of senones is increased, the percentage recognition time consumed by mgau_eval is generally increased.


FIGURE 4.3 PERCENTAGE OF EXECUTION TIME FOR AN4 WITH 600 SENONES, 4 GAUSSIANS, LANGUAGE WEIGHT 15 AND WORD INSERT PENALTY 0.7

FIGURE 4.4 PERCENTAGE OF EXECUTION TIME FOR AN4 WITH 600 SENONES, 8 GAUSSIANS, LANGUAGE WEIGHT 18 AND WORD INSERT PENALTY 0.5

FIGURE 4.5 PERCENTAGE OF EXECUTION TIME FOR AN4 WITH 1000 SENONES, 4 GAUSSIANS, LANGUAGE WEIGHT 15 AND WORD INSERT PENALTY 0.7


FIGURE 4.6 PERCENTAGE OF EXECUTION TIME FOR AN4 WITH 600 SENONES, 2 GAUSSIANS, LANGUAGE WEIGHT 12 AND WORD INSERT PENALTY 0.6

FIGURE 4.7 PERCENTAGE OF EXECUTION TIME FOR AN4 WITH 800 SENONES, 8 GAUSSIANS, LANGUAGE WEIGHT 16 AND WORD INSERT PENALTY 0.7

FIGURE 4.8 PERCENTAGE OF EXECUTION TIME TAKEN BY MGAU_EVAL FOR AN4 USING THE LANGUAGE WEIGHT AND WORD INSERT PENALTY GIVING THE LOWEST WER

4.1.2.3 FLAT PROFILE RESULTS FOR RM1

After recognising RM1 with the parameters shown in Table 3.10, the percentages of recognition time taken by the five functions with the longest execution times are shown in Figures C.2 to C.6, and the percentage of recognition time consumed by mgau_eval for a range of numbers of senones and Gaussians is shown in Figure C.1. These figures confirm that the most time-consuming function during recognition is mgau_eval, which takes between 29% and 44% of the recognition time. Compared to the corresponding results for AN4, the percentage of time consumed by mgau_eval is reduced; this is because the vocabulary size of RM1 is larger than that of AN4, so more time is needed for searching the graph to produce the outputs.

4.1.2.4 FLAT PROFILE RESULTS FOR TIMIT

After recognising TIMIT with the parameters shown in Table 3.11, the percentages of recognition time taken by the five functions with the longest execution times are shown in Figures C.8 to C.12, and the percentage of recognition time consumed by mgau_eval for a range of numbers of senones and Gaussians is shown in Figure C.7. These figures also confirm that the most time-consuming function during recognition is mgau_eval, which takes between 26% and 37% of the recognition time. Compared to the corresponding results for AN4 and RM1, the percentage of time consumed by mgau_eval is further reduced; this is because the vocabulary size of TIMIT is larger than that of AN4 and RM1, so more time is needed for searching the graph to produce the outputs.

4.1.2.5 SUMMARY

It is confirmed that mgau_eval is the most time-consuming function for all databases. When the number of senones is fixed, a larger number of Gaussians results in a higher percentage of the time being consumed by mgau_eval; similarly, for a given number of Gaussians, the larger the number of senones, the more time is consumed by mgau_eval. For the larger databases, the percentage of time consumed by mgau_eval is reduced but still represents a significant proportion of the total.

?<=<0 "!$$*8%!-.*%#,9$&,*!)!$4,',*

After identifying mgau_eval as the most time-consuming function from the flat profile results, the call graph results can be used to show the calling route to mgau_eval. Such a call graph is important because it guides the source code analysis of mgau_eval, giving details about how mgau_eval is called, what data it requires, how these data are passed to it, and whether it can be separated cleanly from the rest of the code. All of this information is essential for accelerating mgau_eval in both software and hardware, which is discussed further in Section 4.3 and Chapter 5.

The call graph results for Sphinx 3 run with different parameters or databases involve the same functions but differ in the number of calls made and the position of each entry. As the main sequence of function calls leading to the execution of any given function is similar across the results for different parameters, a detailed analysis of the entire call graph is not necessary in this work.

A brief explanation of the entries in the call graph results is now given, with an example shown in Figure 4.9. Each entry in the call graph consists of a number of rows [114], where the row indicated by an index number in the first field is the function under analysis. The rows above this function are its parent functions (those that call the function under investigation), and those listed below are the child functions called by the current function. If a function's parents cannot be determined, it is marked as spontaneous, as for the function main in Figure 4.9.

FIGURE 4.9 EXAMPLE OF CALL GRAPH RESULTS FOR RECOGNISING AN4

The first field is the index number of the entry and the last field is the name of the function. The third, fourth and fifth fields have different meanings for the current function, its parent functions and its child functions. For the current function, the third field (self) is the total amount of time spent in the current function, the fourth field (children) is the total amount of time propagated into the current function by its child functions, and the fifth field (called) is the total number of times the current function has been called. For a parent function, the third field (self) is the amount of time propagated directly from the current function, the fourth field (children) is the amount of time propagated from the child functions of the current function, and the fifth field (called) gives the number of times that parent called the current function and the total number of times the current function has been called. For a child function, the third field (self) is the amount of time propagated from that child to the current function, the fourth field (children) is the amount of time propagated from the child's own children to the current function, and the fifth field (called) gives the number of times the child has been called by the current function and the total number of times the child has been called.

The index number is used for traversing the call graph. It can be seen from Figure 4.9 that the child function of main is ctl_process, which has index number 2; this index can be used to find the detailed entry for ctl_process in the call graph. The entry for ctl_process shows that it is called only by main (the 1/1 in the fifth column) and that it has 10 children, three of which (utt_decode, new_utt_res and ctl_read_entry) are called only by ctl_process. The entry for ctl_process also shows calls to a number of other functions, and by following their corresponding index numbers it is possible to trace the entire sequence of function calls.

This method of generating the call graph allows the analysis to be initiated at any function. In particular, the approach can be used to clarify the route followed through the code before the most time-consuming function mgau_eval is called. The call graph for mgau_eval is shown in Figure 4.10. This figure also illustrates the importance of mgau_eval and related functions in the recognition process, and indicates the other functions found to have relatively long execution times in the previous analysis, namely lextree_hmm_eval, hmm_vit_eval_3st and logs3_add. In addition, the time-consuming function logs3_add is a child function of mgau_eval, which makes mgau_eval even more significant for the recognition process.
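The operation performed by logs3_add can be summarised as adding two probabilities that are kept in the log domain. Sphinx 3 does this with scaled integer log values and a precomputed lookup table; the floating point sketch below only shows the underlying identity and is not the Sphinx 3 routine.

```c
#include <math.h>

/* Log-domain addition: given a = log(x) and b = log(y), return log(x + y)
 * using log(x + y) = max + log(1 + exp(min - max)), without leaving the
 * log domain.  Sphinx 3 replaces the log1p/exp pair with a table lookup on
 * scaled integer logs. */
double log_add(double a, double b)
{
    if (a < b) { double t = a; a = b; b = t; }   /* ensure a >= b */
    return a + log1p(exp(b - a));
}
```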


FIGURE 4.10 CALL GRAPH LEADING TO THE EXECUTION OF THE MOST TIME-CONSUMING FUNCTION MGAU_EVAL

The call graph for the initialisation of Sphinx 3 is also shown in Figure 4.11; this is useful in identifying the functions in which the principal data structures are created and, in particular, what data are provided as input to, and generated as output by, mgau_eval.

FIGURE 4.11 CALL GRAPH USED FOR INITIALISATION OF SPHINX 3

?<> ,(9%"#*"(3#*!)!$4,',*

The source code analysis is presented in three parts: the input file analysis, explaining what data are fed into the Sphinx 3 recogniser; the data structure analysis, explaining how the data are organised inside Sphinx 3; and the function analysis, investigating how the most time-consuming function mgau_eval is called, the data it processes and the results it produces.

4.2.1 INPUT FILES ANALYSIS

To understand the operation of the source code, it is important to know the formats used to represent all the following input files for Sphinx, the speech to be recognised and the input files for the acoustic models.

?<><=<= ,-##".*"#-,&%!$*+'$#*+(%1!&*

In Sphinx 3, the input speech audio files are converted into speech cepstral files before being sent to the recogniser. This can be done by the tool wave2feat provided in the Sphinx Train package, and the tool called cepview is able to display the cepstral file. A bash script calling cepview to display the cepstral file an406-fcaw-b.mfc is shown in Figure 4.12.

FIGURE 4.12 BASH SCRIPT FOR DISPLAYING THE CEPSTRAL FILE

An example of the cepstral file structure is shown in Figure 4.13. The cepstral file is organised by frames, with each line representing a single frame of duration 25 ms, and each speech frame consists of a 13-dimensional cepstral vector. In this example, the cepstral file includes 398 frames. As explained in Section 3.2.2, this 13-dimensional cepstral vector is further converted to a 39-dimensional feature vector.


FIGURE 4.13 CEPSTRAL FILE FORMAT
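A minimal reader for such a cepstral file is sketched below. It assumes the usual Sphinx feature-file layout of a 32-bit count of float values followed by the values themselves in the machine's native byte order; real files may need byte swapping, so this is an illustration rather than a robust loader, and the function name is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define CEPLEN 13   /* cepstral coefficients per frame */

/* Read a Sphinx-style cepstral (.mfc) file: a 32-bit count of floats followed
 * by the float data, CEPLEN values per frame.  Returns the data (CEPLEN-strided)
 * and sets *nframes.  No byte swapping is performed. */
float *read_cepstra(const char *path, int32_t *nframes)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return NULL;

    int32_t nfloat = 0;
    if (fread(&nfloat, sizeof(nfloat), 1, fp) != 1 || nfloat <= 0) {
        fclose(fp);
        return NULL;
    }

    float *data = malloc((size_t)nfloat * sizeof(float));
    if (data == NULL || fread(data, sizeof(float), (size_t)nfloat, fp) != (size_t)nfloat) {
        free(data);
        fclose(fp);
        return NULL;
    }
    fclose(fp);

    *nframes = nfloat / CEPLEN;
    return data;
}
```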

?<><=<> !"(9,&'"*1(3#$*+'$#*+(%1!&*

The acoustic model used by Sphinx 3 has five component files: model definition, Gaussian mean, Gaussian variance, mixture weight and transition matrix.

The model definition file, shown in Figure 4.14, contains a description of two types of HMMs, namely the context independent (CI) phone (or basephone) HMMs and the context dependent (CD) phone (or triphone) HMMs. Figure 4.14 shows the model definition file for the AN4 database, which contains 34 CI phones and 6372 CD phones.

FIGURE 4.14 MODEL DEFINITION FILE FORMAT

Each HMM consists of three states and a transition matrix. The state IDs and the transition matrix ID for each CI phone are unique. However, as there are many CD phone HMMs, it is not practical to train them all individually unless the database is very large, so the states of a CD phone are shared with similar states in other CD phones. These shared states are called senones in Sphinx; there are 1102 senones (tied states) in the example shown in Figure 4.14. The transition matrix for each CD phone is also shared, and the total number of tied transition matrices is 34, which is equal to the number of CI phones. The definition file defines the mapping of each state to a senone and the mapping of each CD phone to a transition matrix ID.
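In effect, the definition file provides two lookup tables: one mapping each HMM state of each phone to a senone ID, and one mapping each phone to a shared transition matrix ID. A hedged sketch of that mapping is shown below; the structure and names are illustrative, not the Sphinx 3 mdef code.

```c
#include <stdint.h>

#define N_STATE 3   /* emitting states per HMM */

/* Illustrative in-memory form of the model definition file: each phone
 * (basephone or triphone) maps its emitting states to senone IDs and the
 * whole phone to one shared transition matrix. */
typedef struct {
    int32_t senone_id[N_STATE];  /* tied-state (senone) ID per HMM state */
    int32_t tmat_id;             /* shared transition matrix ID          */
} phone_def_t;

/* Look up the senone whose score is used for state 's' of phone 'p'. */
int32_t state_to_senone(const phone_def_t *defs, int32_t p, int32_t s)
{
    return defs[p].senone_id[s];
}
```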

Sphinx Train provides a tool called printp to display the numerical values of the other acoustic model files (Gaussian mean, Gaussian variance, mixture weight and transition matrix). The bash script calling printp is shown in Figure 4.15.

FIGURE 4.15 BASH SCRIPT USED FOR DISPLAYING THE ACOUSTIC MODEL

There are three parameters for each senone in the HMM: the Gaussian mean, the Gaussian variance and the mixture weight, which are held in the Gaussian mean file, the Gaussian variance file and the mixture weights file respectively. The Gaussian mean file and the Gaussian variance file have the same format, shown in Figure 4.16, and the mixture weights file is shown in Figure 4.17. These three files have the same number of entries, equal to the number of senones, which can be changed by setting the parameter $CFG_N_TIED_STATES before training begins.

Each entry in the Gaussian mean and variance files starts with the senone ID followed by several Gaussians, each of which contains a 39-dimensional vector. The number of Gaussians can be modified by setting the parameter $CFG_FINAL_NUM_DENSITIES before training. In Figure 4.16 the number of Gaussians is 8; if the number of Gaussians were 4, then each entry in the Gaussian mean and variance files would include only four Gaussians.

Each entry in the mixture weights file contains a value specifying the number of times this mixture weight is used, together with a vector defining the mixture weights for the corresponding Gaussian means and variances. The length of the vector depends on the number of Gaussians; in the example shown in Figure 4.17, the number of Gaussians is eight.

FIGURE 4.16 GAUSSIAN MEAN AND VARIANCE FILE FORMAT

FIGURE 4.17 MIXTURE WEIGHTS FILE FORMAT

The transition matrix file, shown in Figure 4.18, contains all the HMM state transition probabilities in the model, and the number of entries in this file is defined in the model definition file.

FIGURE 4.18 TRANSITION MATRIX FILE FORMAT

?<><> 3!&!*,&%9"&9%#*!)!$4,',*

Sphinx 3 defines a single data structure, termed kb_t, that contains a number of nested structures to hold the data used during recognition. Before speech recognition begins, most of the required memory is allocated and the data structure is initialised by the function kb_init; after recognition is finished, the memory is freed and the recognition program exits. The data structures within kb_t that are most closely related to mgau_eval are shown in Figure 4.19. They are introduced in the following paragraphs and their details can be found in Appendix E.


FIGURE 4.19 DATA STRUCTURES THAT ARE MOST RELATED TO MGAU_EVAL

?<><><= 8$(2!$*,&%9"&9%#*

The structures contained in kb_t can be categorised into several groups: the core model structure, structures containing parameters, structures recording the progress of the search, file handles and other variables, as shown in Table E.1. Member 15 indicates the search mode and dictates the set of functions called during recognition. The OP_GRAPH mode included in Sphinx 3 processes semi-continuous HMMs as used in Sphinx 2. OP_TST_DECODE (or s3fast) is the default mode and is a tree-lexicon decoding mode based on time-switching tree copies, while OP_WST_DECODE is similar but uses word-switching tree copies. OP_FLATFWD (or s3slow) is a flat-lexicon decoding mode, which is mainly used for research purposes, and OP_DEBUG mode is used for testing purposes and allows many of the values to be recorded in a file.

127 ?<><><> ,-##".*+#!&9%#*,&%9"&9%#*

The memory for member 2 of kb_t is allocated when a new speech utterance is read by the recognition process. The three dimensions of the array are the number of frames in the utterance, the number of streams and the length of the cepstral vector. In the recognition carried out in this thesis, all the input speech features form a single stream, so the second index is always equal to zero and only the number of frames and the vector length are specified.

The length of the input cepstral vector in each frame is 13, but following processing by Sphinx 3 its length is increased to 39. The vector is processed by one of the functions introduced in the source code analysis section.

4.2.2.3 CORE MODEL STRUCTURE (KBCORE_T)

The first member of kb_t is the structure kbcore_t, which contains the pronunciation dictionary, acoustic models and language models as shown in Table E.2. How the acoustic models are stored is important, because these are the data which are further used by the most time-consuming function mgau_eval .

The most important members of kbcore_t that are relevant to the analysis here are the structure tmat_t, storing all of the transition matrices, and the structure mgau_model_t, holding the entire set of mixture Gaussian models for the senones in the Sphinx 3 acoustic model. The members of these two structures are shown in Table E.3 and Table E.4 respectively.

In the structure mgau_model_t, the total number of senones is included. The data of the mixture Gaussian model for each senone is held in the structure mgau_t, which is a sub-structure of mgau_model_t. The members of mgau_t, which include the number of Gaussians, the mean and variance vectors, the mixture weights and other variables, are shown in Table E.5. Most of these members of the structure mgau_t are used by the most time-consuming function mgau_eval, except the seventh member, fullvar, the full co-variance matrix. The parameters for each GMM include only the fifth, sixth and tenth members, which are the mean, variance and mixture weight.

4.2.2.4 STATISTICS STRUCTURE (STAT_T)

The data structure stat_t is used to hold the statistics of Sphinx 3, as shown in Table E.6. The members in stat_t can be categorised into three types: timers for the current utterance, counters for current utterance and counters for all utterances. This statistical information gives a very good overview of what Sphinx 3 has processed and how long it took, which is useful in the source code analysis. The members for timer structure ptmr_t are shown in Table E.7.

4.2.2.5 SENONE SCORE STRUCTURE (ASCR_T)

The structure ascr_t is a member of kb_t, used to hold the senone scores in one frame of input speech. The members of ascr_t are shown in Table E.8; they describe how the different types of senone score, the output of the most time-consuming function mgau_eval, are stored in Sphinx 3.

4.2.2.6 SEARCH STRUCTURE (SRCH_T)

The final member of kb_t is srch_t, which includes all the variables needed for the search operations in Sphinx 3. To understand the members of srch_t it is essential to

understand how the variables are passed across functions during recognition. Its members can be categorised into:

1) Auxiliary members shown in Table E.9, which point to the same members defined in kb_t. They provide access to essential parameters or input data used during the search operation.

2) General members include variables used in the search that are global to all different search modes (Table E.10).

3) Pointers to functions used in the correct search mode (Table E.11).

4.2.3 FUNCTION ANALYSIS

The analysis of functions is presented in three parts. The first part is the entry interface to the Sphinx 3 recogniser; although there are different recognition modes for Sphinx 3, the same interface is shared at the start of the search. The second part is the core of the Sphinx 3 recogniser. In this part, the main search structures for the recognition are included and the different search functions are chosen according to the mode. The code analysis in this part covers only the framework of the search operations rather than the details of the search. The final part investigates the most time-consuming function mgau_eval.

4.2.3.1 SHARED INTERFACE FOR SPHINX 3

The first part of the source code analysis includes the functions shown in Figure 4.20 that are called regardless of the speech recognition mode. They perform basic but essential operations, such as processing the command line parameters, initialising the data structures described in the previous section, loading the language models and acoustic models, and processing the input speech feature vector. Details of these functions can be found in Appendix E.

FIGURE 4.20 SHARED INTERFACE FOR SPHINX 3

4.2.3.2 SEARCH OPERATIONS IN SPHINX 3

This part covers only the level 5 function srch_utt_decode_blk, which is one of the most complex but also one of the most important functions in Sphinx 3, since all search operations for recognition in Sphinx 3 are integrated in this function. Understanding its operation is key to understanding how the source code is organised. Search operations of the function srch_utt_decode_blk are carried out on individual frames of the input speech, as illustrated in Figure 4.21.

The search process is organised in two separate but dependent components, namely GMM (Gaussian mixture model) computation and graph search. This approach makes it possible to introduce a new technique into one component without affecting the other. The analysis of the level 5 function srch_utt_decode_blk is explained below.


FIGURE 4.21 SEARCH OPERATIONS PERFORMED ON EACH FRAME [99]

4.2.3.3 PARENT FUNCTIONS OF THE MOST TIME-CONSUMING FUNCTION

The parent functions of the most time-consuming function mgau_eval are relevant to this investigation, as they explain how mgau_eval is called. The investigation covers not only the direct parent function but also its indirect parent functions.

Among all the functions called by the level 5 function srch_utt_decode_blk, two of them (the level 6 function approx_ci_gmm_compute and the level 6 function s3_cd_gmm_compute_sen_comp) are indirect parent functions of the most time-consuming function mgau_eval. For each of these indirect parent functions, several levels of child functions are called before the most time-consuming function mgau_eval, as can be seen from Figure 4.22.

From the graph, it can be seen that the most time-consuming function mgau_eval is called by one of two separate routes, although it is always called by level 8 function approx_mgau_eval. The left path shown in Figure 4.22 is followed when computing CI senone scores and the right path is followed in order to calculate the CD senone score. Details of these two groups of functions can be found in Appendix E.


FIGURE 4.22 PARENT FUNCTIONS FOR MGAU_EVAL

4.2.3.4 THE MOST TIME-CONSUMING FUNCTION MGAU_EVAL

The most time-consuming function mgau_eval is used to calculate the senone score between each frame of input speech and each senone in the acoustic model. The calculations and the required input data are shown in Figure 4.23.

The first input is the parameters for one senone of the acoustic model. Each senone is modelled by a GMM, and the number of Gaussians in the GMM can be 1, 2, 4 or 8. Each Gaussian includes a pre-computed log reciprocal term, a mean vector, a variance vector and a mixture weight. There are 39 floating point coefficients in each mean vector and variance vector. The second input is the feature vector for one frame of speech; there are also 39 floating point coefficients in each feature vector. The third input is a floor value used to prevent the calculation overflowing. The fourth input is related to recording the best score, and includes one indicator to control whether to update the best score, one variable for storing the best score index, and one value for storing the best score.


FIGURE 4.23 THE INTERFACES AND PROCEDURES FOR MGAU_EVAL

The senone score is calculated by firstly computing the Mahalanobis distance between the feature vector and each Gaussian in the GMM, and then combining these distances into one value by using the mixture weight for each of the Gaussians in the GMM. The details of this calculation are also shown in Figure 4.23.
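As an illustration, a simplified floating point sketch of this per-senone calculation is given below. The variable names follow the descriptions above; the actual Sphinx 3 implementation works with scaled log values and a look-up table for the log-domain addition, so this sketch is indicative rather than a copy of the source.

    #include <math.h>

    #define VECLEN 39   /* length of the feature, mean and variance vectors */

    /* Sketch: score one senone (a GMM) against one frame of speech.  lrd[c] is
       the pre-computed log reciprocal term, var[] holds the precision-style
       terms and log_mixw[] the log mixture weights. */
    static double senone_score(const double feat[VECLEN], int n_gau,
                               const double lrd[], const double mean[][VECLEN],
                               const double var[][VECLEN], const double log_mixw[],
                               double distfloor)
    {
        double score = -HUGE_VAL;                    /* logarithm of zero          */
        for (int c = 0; c < n_gau; c++) {
            double dval = lrd[c];
            for (int i = 0; i < VECLEN; i++) {
                double d = feat[i] - mean[c][i];
                dval -= d * d * var[c][i];           /* Mahalanobis-style distance */
            }
            if (dval < distfloor) dval = distfloor;  /* apply the floor value      */
            double g = dval + log_mixw[c];           /* weighted Gaussian score    */
            double hi = fmax(score, g);              /* log-domain addition:       */
            score = hi + log1p(exp(-fabs(score - g)));  /* log(e^score + e^g)      */
        }
        return score;
    }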

It is not only important to know how the most time-consuming function mgau_eval is called, but also how many times it is called. In the slow recogniser for Sphinx 3 (s3slow), each frame of the input speech and each senone in the acoustic model are evaluated by the function mgau_eval. For example, if the number of frames in one input utterance is 500, representing 5 seconds of speech, and the number of senones in the acoustic model is 1000, then the total number of times mgau_eval is called is 500,000. For the best recognition accuracy, this number of calculations should be performed, but it is at the expense of execution time.

For the fast recogniser (s3fast), a number of techniques are used to accelerate these calculations. There are four layers of speed-ups [100], and these take place in the frame-layer, the GMM-layer, the Gaussian-layer and the component-layer. In the

frame-layer, decisions are taken to determine whether each frame needs to be evaluated to determine its senone scores. In the GMM-layer, decisions are taken to determine whether all senone scores for one frame should be computed. In the Gaussian-layer and the component-layer, Gaussians are selected for calculating the senone score: in the Gaussian-layer, the best Gaussian component in the GMM is selected, and in the component-layer, the best parameters for the Gaussian component are selected. In the GMM-layer, CI-based GMM selection (CIGMMS) is carried out [101]. The senones can be divided into context-independent (CI) senones and context-dependent (CD) senones. For each frame of speech, all of the CI senones are evaluated. After all of the CI senone scores have been computed, the best score is found and a beam is applied to decide whether the CD senones are evaluated or not. Only those CD senones whose corresponding CI GMM score is within the beam are evaluated.
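To make the decision concrete, the sketch below illustrates a GMM-layer (CIGMMS) pass over one frame. All names (parent_ci, beam, eval, score) are illustrative assumptions rather than identifiers from the Sphinx 3 source, and backing a pruned CD senone off to its CI score is shown only as one common choice.

    #include <limits.h>

    /* Evaluate all CI senones, find the best CI score, then evaluate a CD senone
       only when its corresponding CI GMM score falls within the beam. */
    static void gmm_layer(int n_ci, int n_cd, const int *parent_ci, int beam,
                          int (*eval)(int senone_id), int *score)
    {
        int best_ci = INT_MIN;
        for (int s = 0; s < n_ci; s++) {              /* every CI senone is scored */
            score[s] = eval(s);
            if (score[s] > best_ci) best_ci = score[s];
        }
        for (int s = 0; s < n_cd; s++) {              /* CD senones gated by beam  */
            int ci = score[parent_ci[s]];
            if (ci >= best_ci - beam)
                score[n_ci + s] = eval(n_ci + s);     /* full mgau_eval call       */
            else
                score[n_ci + s] = ci;                 /* back off to the CI score  */
        }
    }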

The number of times mgau_eval is called is mainly affected by the frame-layer and GMM-layer assessments. Frame-layer speed-up reduces the total number of frames of the input speech to be evaluated, while the GMM-layer speed-up reduces the total number of senones in the acoustic model that need to be evaluated for each frame of speech. The simplification made in the lower two layers does not affect the number of times mgau_eval is called; rather, it affects the internal operations of mgau_eval.

For example, for the AN4 database, whose acoustic model includes 102 CI senones and 1000 CD senones, the total number of senones evaluated for each frame is 1102 if no fast GMM computation techniques are used. If the GMM-layer speed-up is used, then the average total number of senones evaluated for each frame is around 752 (including 650 CD senones and 102 CI senones) according to the output information from Sphinx 3, so the overall reduction in computation is about 25%. For the RM1 database, whose acoustic model also includes 102 CI senones and 1000 CD senones, the average total number of senones evaluated is about 802 (including 700 CD senones and 102 CI senones) when the GMM-layer speed-up is used, so the overall reduction in computation is about 20%.

For 5 seconds of speech, the total number of frames is 500. To perform recognition involving an acoustic model including 1000 CD senones and 102 CI senones, the function mgau_eval is therefore called approximately 413,250 times (1102 * 500 * 75%).

4.3 SOURCE CODE MODIFICATION

As analysed in the previous sections, the input values for the most time-consuming function mgau_eval are all floating point values. In addition, most of the calculations performed by mgau_eval are floating point operations. These types of operations can take considerable execution time in embedded software or hardware, particularly if no floating point hardware is available.

A new approach proposed here is to implement a scaled integer version of the mgau_eval function. In order to extend the range of values available in integer format, the original floating point data is scaled up before the calculations are performed and a corresponding scaling-down is carried out following the calculations.
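A minimal sketch of the idea is shown below, assuming a scaled factor of 8 (2 to the power 3), the value that the later verification experiments identify as a good choice; the exact position of the scaling-down shift in the thesis is inside the modified mgau_eval itself.

    #include <stdint.h>

    #define INCP 3                    /* exponent of the scaled factor              */
    #define INC  (1 << INCP)          /* scaled factor = 8                          */

    /* scale a floating point parameter up into the 32-bit integer domain */
    static int32_t scale_up(float x) { return (int32_t)(x * INC); }

    /* one distance term computed entirely with integers: feat, mean and var are
       already scaled by INC, so the product carries an extra factor of INC^3 that
       is removed later by right shifting (in this sketch, by 3 * INCP bits) */
    static int32_t fmv_term(int32_t feat, int32_t mean, int32_t var)
    {
        int32_t d = feat - mean;
        return d * d * var;           /* can overflow in 32 bits, hence the 64-bit
                                         variant of the recogniser                  */
    }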

In order to realise this new implementation, the source code has been modified in the manner now described. Firstly, the data structure is modified to hold the scaled data. Then the code is designed to scale up both the input speech data and the acoustic model data. Finally, the mgau_eval function is modified to perform the integer calculation and the scaled-down operations. These three types of source code modification are presented in the following sections.

4.3.1 NEW DATA STRUCTURE

The new data structures are used to hold the scaled input data for the most time-consuming function mgau_eval, as identified in the previous section. As the input data that needs to be scaled up includes the parameters for each senone in the acoustic model and the parameters for each frame of speech, the structures for both the senones and the speech frames need to be modified. For each structure, three versions are needed, namely a floating point version, a 32-bit integer version and a 64-bit integer version.

The floating point version structures are used to hold the original parameters for the senones and speech frames, which are floating point values. The integer and long integer versions of the structures are used to hold the scaled parameters for the senones and speech frames, which are 32-bit and 64-bit scaled integer values respectively. The members of each version differ only in their data type.

4.3.1.1 FLOAT VERSION OF DATA STRUCTURES

The structures for senones include a head structure and an entry structure, which are shown in Figure 4.24. The head structure mgau_head_type_float as shown in Table 4.1 is used to hold information regarding the entire acoustic model and values that are common for all the senones. The entry structure mgau_entry_type_float is used to hold the parameters for a single senone, as shown in Table 4.2.

The speech frame structures also include a head structure and an entry structure, as shown in Figure 4.25. The head structure feat_head_type_float is used to hold the basic information for all the frames in the current input speech, as shown in Table 4.3. The entry structure feat_entry_type_float is used to hold all the parameters for one speech frame, as shown in Table 4.4.

FIGURE 4.24 FLOATING POINT VERSION OF DATA STRUCTURES FOR SENONES

TABLE 4.1 FLOATING POINT VERSION OF THE HEAD DATA STRUCTURE FOR SENONES

ID  Member definition   Description
1   int INCp            The scaled factor is set to be an integral power of 2. This member stores the exponent part for the scaled factor.
2   int INC             The scaled factor (0 for the floating point version).
3   char hmmid[32]      The name for the entire acoustic model.
4   int n_mgau          The total number of senones in the acoustic model.
5   int n_comp          The number of Gaussians in the GMM for each senone.
6   int veclen          The length of the mean vector and variance vector in the Gaussian.
7   double dstf         The floor (rounded down) value for the final dval during senone score computation. This value is the same for all senones.
8   double f            The multiplication factor used to convert log values to logs3 values during the senone score computation. This value is also the same for all of the senones.

TABLE 4.2 FLOATING POINT VERSION OF THE ENTRY DATA STRUCTURE FOR SENONES

ID  Member definition   Description
1   float lrd[8]        One dimensional array that stores the pre-computed log reciprocal terms for each Gaussian in the GMM.
2   int mixw[8]         One dimensional array that stores the mixture weight for each Gaussian in the GMM.
3   float mean[8][39]   Two dimensional array that stores the mean vector for each Gaussian in the GMM.
4   float var[8][39]    Two dimensional array that stores the variance vector for each Gaussian in the GMM.
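For illustration, the entry structure of Table 4.2 could be declared in C roughly as follows; the array sizes assume the eight-Gaussian, 39-element configuration listed in the table, and the exact declarations used in the thesis are given in Appendix G.

    #define N_COMP  8     /* Gaussians per senone, as in Table 4.2          */
    #define VECLEN 39     /* length of the mean and variance vectors        */

    typedef struct {
        float lrd[N_COMP];             /* pre-computed log reciprocal terms  */
        int   mixw[N_COMP];            /* mixture weight for each Gaussian   */
        float mean[N_COMP][VECLEN];    /* mean vector for each Gaussian      */
        float var[N_COMP][VECLEN];     /* variance vector for each Gaussian  */
    } mgau_entry_type_float;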


FIGURE 4.25 FLOATING POINT VERSION OF DATA STRUCTURES FOR SPEECH FRAMES

TABLE 4.3 FLOATING POINT VERSION OF THE HEAD DATA STRUCTURE FOR SPEECH FRAMES

ID  Member definition   Description
1   int INCp            The scaled factor is set to be a power of 2. This member stores the power number for the scaled factor.
2   int INC             The scaled factor (it is 0 for the float version).
3   char uttid[32]      The name for the input speech.
4   int n_frm           The total number of frames in the input speech.
5   int veclen          The length of the feature vector representing one speech frame.

TABLE 4.4 FLOATING POINT VERSION OF THE ENTRY DATA STRUCTURE FOR SPEECH FRAMES

ID  Member definition   Description
1   float feat[39]      One dimensional array storing the feature vector for each frame of the input speech.

4.3.1.2 INTEGER VERSIONS OF THE DATA STRUCTURES

The 32-bit integer version and 64-bit integer version of the data structures for both senones and speech frames are similar in content to their corresponding float versions; only the data types are different. Figure 4.26 shows the integer versions of the data structures for speech frames, and Figure 4.27 shows the integer versions of the data structures for senones.

(A) 32-BIT VERSION

(B) 64-BIT VERSION

FIGURE 4.26 INTEGER VERSION OF DATA STRUCTURES FOR SPEECH FRAMES

(A) 32-BIT VERSION

(B) 64-BIT VERSION

FIGURE 4.27 INTEGER VERSION OF DATA STRUCTURES FOR SENONES

4.3.1.3 POSITION OF THE NEW DATA STRUCTURE

The 32-bit and 64-bit integer versions of the parameters contained in the structures above for both the senones and speech frames should be scaled up from the original float version parameters. The scaling operations are performed in advance before the function mgau_eval is called, which means they should be accessible from the top levels as well as being accessible in mgau_eval for senone score calculation.

In order to minimise source code modifications, the new data structures are integrated into the Sphinx global structure kb_t, specifically in the whole acoustic model structure mgau_model_t, as only this part is passed as an argument to mgau_eval. In mgau_model_t, the parameters can be accessed by the function mgau_eval directly as well as indirectly via the top-level data structure kb_t. The declaration is shown in Figure 4.28.

FIGURE 4.28 DECLARATION OF STRUCTURES USED BY MGAU_EVAL

As seen from Figure 4.28, the pointers to the speech feature and senone data are declared as members of the structure mgau_model_t. Three types of structure pointers are declared, to hold the floating point data, the 32-bit integer data and the 64-bit integer data.

4.3.2 INPUT DATA SCALING UP

The floating point parameters used by mgau_eval are multiplied by a scaled factor when converted to their corresponding 32-bit integer and 64-bit integer versions. The scaled factor is chosen to be an integral power of 2. Using such a scaled factor allows the scaling operation to be carried out by a shift in a single cycle on most modern embedded environments, as they incorporate dedicated barrel shifters [116]. The target platform envisaged for implementation may not have a hardware multiplication unit and so may require many cycles to complete multiplications or divisions; furthermore, even a hardware multiplication unit may take 20 cycles or more to complete an operation. The scaling up operation is performed in two distinct parts: the functions performing the scaling and the functions needed to interface with the existing source code.

4.3.2.1 SCALING UP OPERATIONS

The scaling operations for senones are implemented in the new function mf_compute_mgauint, which converts all the parameters in each senone from the floating point version to the 32-bit integer version, and the new function mf_compute_mgaulong, which converts them to the 64-bit integer version. The declarations of these two functions are shown in Figure 4.29.

(A) 32-BIT VERSION (B) 64-BIT VERSION

FIGURE 4.29 SCALING UP FUNCTIONS FOR SENONES

Both functions have the same interface. The first three parameters for these functions are the same, namely the scaled factor, the floating point version of the head structure for all of the senones, and the floating point version of the entry structure for all of the senones. For mf_compute_mgauint, the final two parameters are the 32-bit versions of the head and entry structures, and for mf_compute_mgaulong they are the 64-bit versions of these structures.

Before either of these two functions can be called, the float versions of the head structure and entry structure for all of the senones need to be prepared. The procedures performed by these two functions are also similar, and involve copying the information in the floating point head structure to the 32-bit or 64-bit head structure as required, multiplying each floating point parameter value for each senone by the scaled factor, and converting each parameter to its 32-bit or 64-bit counterpart. The code for these functions can be found in Appendix G.
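A minimal sketch of the scaling step performed inside these functions is shown below. It covers only the per-vector conversion; the real mf_compute_mgauint and mf_compute_mgaulong in Appendix G also copy the head information and iterate over every senone.

    #include <stdint.h>

    /* Scale a length-n floating point vector up by the scaled factor inc and
       convert it to 32-bit integers (the 64-bit variant uses int64_t instead). */
    static void scale_vector_int32(const float *src, int32_t *dst, int n, int inc)
    {
        for (int i = 0; i < n; i++)
            dst[i] = (int32_t)(src[i] * (float)inc);
    }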

The scaling operations for the speech frames are performed by the functions mf_compute_featint and mf_compute_featlong, which convert all the parameters in every speech frame from the floating point version to the 32-bit integer and 64-bit integer version respectively. The declarations of these two functions are shown in Figure 4.30.

(A) 32-BIT VERSION (B) 64-BIT VERSION

FIGURE 4.30 SCALING UP FUNCTIONS FOR SPEECH FRAMES

These functions have the same interfaces and implement a similar process as the corresponding functions used to scale the senone values.

4.3.2.2 INTERFACE TO THE ORIGINAL SOURCE CODE

Before the above scaling functions can be called, the floating point versions of the parameters for both the senones and the speech frames need to be obtained from the internal data structures in Sphinx 3. The functions need to be called from an appropriate location in the source code: they cannot be called before the input data for mgau_eval are ready, and they should be called only once. Consequently, the interface cannot be placed in the same layer as mgau_eval and called immediately before it.

A number of additional functions are designed to provide the interface between the scaling functions and the original source code. Their operations are explained in detail below as well as their calling position in the source code.

The interfacing functions for scaling up senones are dec_attach_mgau_int and dec_attach_mgau_long, and their definitions are shown in Figure 4.31.

(A) 32-BIT VERSION (B) 64-BIT VERSION

FIGURE 4.31 INTERFACING FUNCTIONS FOR SCALING UP SENONES

The two functions have exactly the same parameters. The first is the command line option that specifies the path of the mean feature vector and the second is the acoustic model structure mgau_model_t. The two functions carry out similar processes, but they call different functions to perform the scaling up.

The process involves getting the name of the acoustic model (hmmid, defined in the head structure for the senones) from the mean feature vector path, copying general information from the whole acoustic model structure mgau_model_t to the float head structure, copying all the parameters for each senone from the whole acoustic model structure mgau_model_t to the float entry structure, and calling either mf_compute_mgauint or mf_compute_mgaulong to scale up the parameters into either the 32-bit or 64-bit integer version.

The interfacing functions for scaling up the speech frames are dec_attach_feat_int and dec_attach_feat_long, and their declarations are shown in Figure 4.32.

(A) 32-BIT VERSION (B) 64-BIT VERSION

FIGURE 4.32 INTERFACING FUNCTIONS FOR SCALING UP SPEECH FRAMES

The two functions have exactly the same interface, their inputs being the command line option that specifies the name of the input speech, the feature vectors for all frames of the speech, the total number of frames in the speech, and the whole acoustic model structure mgau_model_t.

The interfacing functions for scaling up the senones should be called inside the level 1 function main, and the interfacing functions for scaling up the speech frames should be called inside the level 3 function utt_decode.

4.3.3 INTEGER VERSIONS OF MGAU_EVAL

The 32-bit and 64-bit integer versions of the most time-consuming function mgau_eval have been developed, and they have exactly the same interface, as shown in Figure 4.33. This is because the data they use have been integrated into the data structure mgau_model_t.


(A) 32-BIT VERSION (B) 64-BIT VERSION

FIGURE 4.33 INTEGER VERSION OF MGAU_EVAL

Compared to the original floating point function, these two integer versions have the same first five parameters and these have been explained in the previous section. The remaining three input parameters represent the senone ID for the current calculation, the frame ID and the structure that holds all the scaled up values of the senone and speech frame data.

The operations of the integer versions of mgau_eval are similar to those carried out in the original floating point version, as presented in the previous section. The scaling-down operation is carried out inside these functions, before the final log reciprocal value (dval) for each Gaussian in the GMM is calculated and compared with the floor value distfloor. The details can be found in the code for these functions in Appendix G.

4.4 SOURCE CODE VERIFICATION

After modification, the source code should be verified to provide confidence that the integer versions of the recogniser are functionally correct. The verification involved performing experiments with each of the new integer versions to assess whether the results are the same as those for the original floating point version.

4.4.1 EXPERIMENTAL SETUP

A range of 32-bit and 64-bit versions of the speech recogniser, each with a different scaled factor, are produced. The scaled factor is chosen from 1, 2, 4, 8, 16, 32 and 64. Including the original floating point version, there are 15 versions of the Sphinx 3 recogniser that have been generated.

Recognition experiments are performed using the new integer recognisers. On each of the AN4, RM1 and TIMIT databases, the top five optimal sets of parameters obtained in Chapter 3 are used during recognition by the different recognisers. In total, the number of experiments that need to be performed is 75 for each database (5 * 15). This verification process can be done through the bash scripts included in Appendix F.

4.4.2 RESULTS ANALYSIS

For database AN4, the verification is performed by using the best five performing sets of parameters shown in Table 3.9. The experimental results for the 32-bit integer version recogniser are shown in Figure 4.34 and those for the 64-bit integer version recogniser are shown in Figure 4.35.

It is immediately apparent in both Figure 4.34 and Figure 4.35 that the WERs are influenced by the scaled factor. Although the WERs are slightly different when the different sets of parameters are used, the effect on WER of using different parameters is not substantial in most cases.


FIGURE 4.34 WER FOR AN4 USING THE 32-BIT INTEGER VERSION RECOGNISER WITH THE BEST FIVE SETS OF PARAMETERS

FIGURE 4.35 WER FOR AN4 USING THE 64-BIT INTEGER VERSION RECOGNISER WITH THE BEST FIVE SETS OF PARAMETERS

As shown in Figures 4.34 and 4.35, the WER is very high when the scaled factor is 1 or 2. This is because the ranges of the original values are not fully represented when the scaled factors are small. Note that the WER can be above 100% in some cases, where poor representation of the original range of floating point values results in many insertion errors. When the scaled factor is increased to 4, the WERs for both the 32-bit and 64-bit integer recognisers reduce, but remain higher than the WERs for the floating point version. When the scaled factor is further increased to 8, both the 32-bit and 64-bit integer versions achieve almost the same WER as the original version. The WERs for the 32-bit, 64-bit and floating point versions are 5.3%, 5.3% and 5.8% respectively when the best set of parameters is used (sen=600, gau=4, lw=15, wip=0.7).

Comparing Figures 4.34 and 4.35, it can be seen that the WERs for the 32-bit and 64-bit versions become very different when the scaled factor is above 8. For the 64-bit integer version, the WER remains low and comparable with that obtained from the floating point version, but for the 32-bit integer version, the WER increases markedly. This increase in WER results from overflow of the data represented as 32-bit integers. Due to its greater range, the 64-bit integer version did not overflow, but the 64-bit calculation may take longer if the execution target has a word size of less than 64 bits.

Further verification for AN4 is performed using the optimal sets of parameters for different numbers of senones and Gaussians. The results are shown in Figure 4.36 for the 32-bit integer version and in Figure 4.37 for the 64-bit integer version. From these results, it is clear that the WER is principally affected by the scaled factor. Although the WER is influenced by the number of senones and Gaussians, the same pattern is found as that seen in Figures 4.34 and 4.35.

To further verify the integer versions of the recogniser, the same set of experiments is performed using the RM1 and TIMIT databases. Their verification results are shown in Figures D.1 and D.2 for RM1 and Figures D.3 and D.4 for TIMIT in Appendix D.


(A) 1 GAUSSIAN (B) 2 GAUSSIANS

(C) 4 GAUSSIANS (D) 8 GAUSSIANS

FIGURE 4.36 WER FOR AN4 USING THE 32-BIT INTEGER VERSION

Comparing the results for RM1 with those for AN4, the first clear difference is that the WER no longer exceeds 100%. Furthermore, the WER for the 32-bit integer version does not always increase when the scaled factor is increased beyond 8. The results for TIMIT are very similar to those for AN4, except that the WER reduces when the scaled factor is 64. Despite these differences, it has been further demonstrated that the integer version with a scaled factor of 8 is a reasonable practical replacement for the original floating point version, giving almost the same WER.


(A) 1 GAUSSIAN (B) 2 GAUSSIANS

(C) 4 GAUSSIANS (D) 8 GAUSSIANS

FIGURE 4.37 WER FOR AN4 USING THE 64-BIT INTEGER VERSION

4.5 INDEPENDENT IMPLEMENTATION OF MGAU_EVAL

In the previous section, the new integer versions of the recogniser have been shown to work correctly. As our major concern is the most time-consuming function mgau_eval, it is useful to isolate it from the original source code. With an understanding of the data structures used during recognition, the call graph used to reach mgau_eval and a detailed analysis of mgau_eval itself, a set of functions has been designed to perform mgau_eval independently of the rest of the Sphinx source code. After isolation, the floating point, 32-bit integer and 64-bit integer versions of mgau_eval have also been implemented on two PCs and one ARM9 processor to evaluate their execution times.

4.5.1 FUNCTION DESIGN

Five categories of functions have been designed to implement the mgau_eval independently from the Sphinx 3 source code:

1) Functions for exporting the required input data for mgau_eval to binary,

2) Functions for manipulating the required input data for mgau_eval ,

3) The floating point, 32-bit integer and 64-bit integer versions of mgau_eval that are independent from Sphinx3 source code,

4) Top functions for calling different versions of mgau_eval to calculate all senone scores for each speech file,

5) Main functions for calculating all senone scores for the whole database.

A detailed explanation of each function is not included here; only a list of functions with a brief introduction is given. The implementation details can be found in the source code in Appendix G.

4.5.1.1 DATA EXPORT

The functions for exporting the required input data for mgau_eval are incorporated into the original Sphinx 3 source code. After recompiling the Sphinx recogniser, the required input data can be generated and stored in binary files while the Sphinx 3 recogniser is running. These functions are shown in Table 4.5 with a brief description. More details about them can be found in Appendix G.

TABLE 4.5 FUNCTIONS FOR EXPORTING REQUIRED INPUT DATA FOR MGAU_EVAL

Function name               Description
dec_generate_mgau_bin       Save senone data in a binary file
dec_generate_featlist_bin   Create a binary file to save the list of speech feature files
utt_update_featlist_bin     Save each speech feature file name into the binary file created above
utt_generate_feat_bin       Save speech feature data in a binary file
logs3_generate_addtbl_bin   Save the logarithmic look-up table, which is used in mgau_eval, in a binary file

4.5.1.2 DATA MANIPULATION

Functions for manipulating required input data for mgau_eval are shown in Table 4.6. Three sub-categories of functions have been designed, namely functions for scaling up, functions for saving scaled integer data into binary files, and functions for reading in scaled integer data from binary files.

4.5.1.3 IMPLEMENTATION OF MGAU_EVAL

Functions for implementing mgau_eval are shown in Table 4.7. Two sub-categories of functions have been designed, namely mgau_eval itself, for calculating one senone score for one speech feature frame and one senone, and top functions calculating all senone scores for one speech file (between each speech feature frame in the file and each senone).

TABLE 4.6 FUNCTIONS FOR MANIPULATING REQUIRED INPUT DATA FOR MGAU_EVAL

Function name         Description

Functions for scaling up floating point data to integer data
mf_compute_mgauint    Scale up floating point senone data into the 32-bit integer version
mf_compute_featint    Scale up floating point speech feature data into the 32-bit integer version
mf_compute_mgaulong   Scale up floating point senone data into the 64-bit integer version
mf_compute_featlong   Scale up floating point speech feature data into the 64-bit integer version

Functions for saving scaled integer data into binary files
mf_float2int_mgau     Save the 32-bit integer version of the senone data in a binary file
mf_float2int_feat     Save the 32-bit integer version of the speech feature data in a binary file
mf_float2int_db       Save the 32-bit integer version of the speech feature data in binary files for the whole database
mf_float2long_mgau    Save the 64-bit integer version of the senone data in a binary file
mf_float2long_feat    Save the 64-bit integer version of the speech feature data in a binary file
mf_float2long_db      Save the 64-bit integer version of the speech feature data in binary files for the whole database

Functions for reading data from binary files
mf_read_addtbl        Read the logarithmic look-up table
mf_read_mgaufloat     Read the floating point version of the senone data
mf_read_featlist      Read the list of speech files in the database
mf_read_featfloat     Read the floating point version of the speech feature data
mf_read_mgauint       Read the 32-bit integer version of the senone data
mf_read_featint       Read the 32-bit integer version of the speech feature data
mf_read_mgaulong      Read the 64-bit integer version of the senone data
mf_read_featlong      Read the 64-bit integer version of the speech feature data

TABLE 4.7 FUNCTIONS FOR IMPLEMENTING MGAU_EVAL

Function name          Description
mgau_eval_all_float    The floating point version of mgau_eval
mgau_eval_all_int      The 32-bit integer version of mgau_eval
mgau_eval_all_long     The 64-bit integer version of mgau_eval
me_mgau_eval_float_n   Top function calling the floating point version of mgau_eval to calculate all senone scores for one speech file
me_mgau_eval_int_n     Top function calling the 32-bit integer version of mgau_eval to calculate all senone scores for one speech file
me_mgau_eval_long_n    Top function calling the 64-bit integer version of mgau_eval to calculate all senone scores for one speech file

4.5.1.4 MAIN FUNCTIONS

The procedure in the main function is shown in Figure 4.38. Note that the procedure should be implemented with floating point, 32-bit integer and 64-bit integer versions of mgau_eval .

4.6 EXPERIMENTS FOR INDEPENDENT IMPLEMENTATION OF MGAU_EVAL

Each of the three versions of mgau_eval is compiled using the GCC 4.2.1 compiler [117] on three different platforms and the execution times are recorded.

Read logarithmic look-up table from file
Read all senone data from file
Read list of speech files from file
For each speech file in the database
    Read all speech frames from one speech file
    For each speech frame in one speech file
        For each senone
            Call mgau_eval to calculate the senone score
        End of each senone
    End of each speech frame
End of each speech file

FIGURE 4.38 PROCEDURE FOR THE MAIN FUNCTIONS CALLING MGAU_EVAL INDEPENDENTLY

4.6.1 EXPERIMENTAL SETUP

The experiments are set up in the following steps:

1) The float, 32-bit integer and 64-bit integer versions of mgau_eval are targeted to a MacBook with an Intel i5 CPU running at 2.53 GHz (MAC), a Linux PC with an Intel Pentium 4 CPU running at 2.80 GHz (LIN) and an embedded target board with an ARM920T-ETM core running at 50 MHz (ARM9).

2) The AN4 database used in the experiments consists of 1102 senones and 130 feature files that contain speech to be recognised and in which the number of speech frames varies from 69 to 579.

3) For each platform, the senone scores for each version of mgau_eval are required for each senone and speech frame combination. The mean time to calculate a senone score is determined as follows. Let t_0 denote the total time used to calculate all the senone scores for all the speech frames in a feature file; then the mean time taken to calculate one senone score for a single feature file is given by

t_i = t_0 / (N_S * N_F)    (4.1)

where N_S is the number of senones and N_F is the number of feature frames in one feature file. Considering all the feature files, the mean time for calculating one senone score is given by

t_s = (1 / N_FF) * sum_{i=1}^{N_FF} t_i    (4.2)

where N_FF is the number of feature files.

The above parameter is obtained for each of the LIN, MAC and ARM9 platforms, each running the three versions of the mgau_eval function.

4.6.2 RESULTS AND ANALYSIS

The improvements that the 32-bit and 64-bit versions of mgau_eval achieved with respect to the original float version are shown in Table 4.8. In the table, the mean time to calculate one senone score is shown as a percentage of the time obtained using the original floating point version when executing on the same platform.

TABLE 4.8 PERCENTAGE OF CALCULATION TIME USED IN INTEGER VERSIONS OF MGAU_EVAL COMPARED TO THE FLOATING POINT VERSION

Platform   32-bit version   64-bit version
MAC        99.67%           116.96%
LIN        25.62%           67.37%
ARM9       7.86%            27.67%

It can be seen that on the modern MAC system, the 32-bit integer implementation shows only a marginal improvement compared to the floating point version and the 64-bit integer solution took longer to execute. It is clear that this system is able to carry out 32-bit floating point operations as quickly as 32-bit integer calculations and so the integer version gives no computational advantages on this platform.

In the LIN implementation, the improvements that can be achieved by the 32-bit version and the 64-bit version start to become apparent. The 32-bit and 64-bit versions respectively executed in around a quarter and about two-thirds of the time taken by the floating point implementation. The LIN platform used is not as modern as the MAC platform and it appears that its CPU is not able to execute floating point operations as quickly as integer ones.

For the embedded ARM9 platform at which the new integer versions are principally targeted, both the 32-bit and 64-bit versions showed considerable improvements. The 32-bit version executed in around 8% of the time spent in the original floating point version potentially making it worthwhile to consider porting the speech recognition system to an embedded environment.

The scaled integer versions showed little or no improvement in execution time compared with the floating point version when implemented on the most recent PC work-station architectures. In contrast, there are still significant performance benefits to adopting an integer version when considering operating in embedded environments.

4.7 SUMMARY

The source code of Sphinx 3 has been profiled during recognition. Two types of profiling results have been produced: a flat profile and a call graph. From the flat profiling results, mgau_eval has been identified as the most time-consuming function in Sphinx 3. The call graph for this function has also been presented, which is essential for the source code analysis.

An in-depth analysis of the most time-consuming function mgau_eval has also been presented in this chapter, including the input files for Sphinx 3, the data structures that are used both directly and indirectly by mgau_eval, every important procedure in each of the parent functions of mgau_eval, and the interfaces of mgau_eval as well as the operations performed inside it. After this analysis, a good understanding of mgau_eval has been gained, which is essential for the source code modification.

To accelerate the most time-consuming function mgau_eval in the embedded system, two new integer versions of the Sphinx 3 speech recogniser have been developed, namely 32-bit and 64-bit integer versions. Details of source code modification to achieve this have been explained, including the new data structures for holding the new scaled integer input data for mgau_eval , the functions for scaling up input data for mgau_eval and the modification of the mgau_eval itself.

After successful compilation, a large number of experiments have been set up to verify that the new integer versions work. From the results, it is shown that both the 32-bit and 64-bit versions of the Sphinx 3 speech recognition system with a scaled factor of 8 could achieve very nearly the same WER as the original floating point version.

Further code has also been designed to implement the floating point, 32-bit integer and 64-bit integer versions of mgau_eval independently from the original source code. Furthermore, they have been tested on a new MacBook, an old Linux PC and an ARM9 processor. It is shown that although the integer versions of mgau_eval are not faster than the original version on the latest PC, they show a significant reduction in calculation time on the old PC as well as on the ARM9 embedded system. More specifically, the 32-bit integer version of mgau_eval takes less than 8% of the calculation time used by the original floating point version on the ARM9 embedded system.

In the next chapters, the 32-bit integer version of mgau_eval is further designed in VHDL, so that it can be implemented in parallel in hardware to achieve even greater improvements in calculation time.


5 HARDWARE DESIGN OF MOST TIME-CONSUMING FUNCTION MGAU_EVAL

This chapter develops the hardware design for the parallel implementation of the most time-consuming function, mgau_eval. As analysed in the previous chapter, the calculations performed by mgau_eval are relatively simple but consume between a third and a half of the calculation time, with the same calculations being performed repeatedly in a serial manner on different input vector data. The hardware implementation of mgau_eval can reduce the calculation time significantly by performing the calculations in parallel and in a fully pipelined manner.

The VHDL designs can be separated into four parts, namely the senone score module (SSM), the on-chip ROM data module (ROM-DM), the on-chip BRAM data module (BRAM-DM) and the test bench for the designs. The results of the implementation of these designs are introduced in the next chapter.

The first section in this chapter explains the design of the senone score module (SSM), which is able to perform the required mgau_eval calculations in parallel. The module is parameterised on the top level, thereby allowing different numbers of parallel units to be generated.

The next two sections introduce two different versions of data module for SSM, namely the on-chip ROM data module (ROM-DM) and the on-chip BRAM data module (BRAM-DM), either of which can provide the necessary input data to the evaluation module. The throughput in the senone score module is very high, especially when many basic calculation units are implemented in parallel, and consequently the data input module needs to be carefully designed so that it is capable of streaming sufficient data to the evaluation module. The input modules are also parameterised to allow their performance to match that of the SSM. The input data for the senone score module provides both the feature vectors and the senone parameters. As the numbers of feature

vectors and senones are dependent on the problem being tackled, the data input module needs to be adaptable to meet the requirements of the applications.

The fourth section describes the test benches that are used to verify the design of the senone score module (SSM), the on-chip ROM data module (ROM-DM) and the on-chip BRAM data module (BRAM-DM). The test benches for SSM with ROM-DM (SSM-ROM) and for SSM with BRAM-DM (SSM-BRAM) are also introduced in this section.

5.1 SENONE SCORE MODULE

The senone score module (SSM) m_scorexx is the parallel hardware implementation of Sphinx 3's most time-consuming function mgau_eval. This evaluation module needs to be fully pipelined so that the input data can be streamed through at each clock cycle. In addition, this module is parameterised at the top level, making it possible to automatically generate different versions of the parallel hardware. There are two configuration parameters for this design, the scaled factor and the parallel number, which should be specified to create the corresponding hardware. The scaled factor is used for scaling down the results after the calculations. The term parallel number refers to the number of basic calculation modules (BCM) implemented in parallel. The details of the BCM are explained in section 5.1.1.

The interfaces of SSM, when implemented as a single BCM unit and implemented with two parallel BCM units, are shown in Figure 5.1.

The inputs for the senone score module (SSM) fall into five groups.

1) Clock signal clk and reset signal reset, which are used by all its sub-modules.

2) 1-bit input ready signal ird, indicating whether the input data is valid.

3) 32-bit signal feat, representing one value from the feature vector in each speech frame.

4) 32-bit signals mean and var, representing the mean value and variance value from the mean vector and variance vector in each senone.

5) 32-bit signals dval and mixw, representing the pre-computed log reciprocal term and mixture weight of each Gaussian in each senone.

(A) WITH A SINGLE BCM UNIT (B) WITH TWO PARALLEL BCM UNITS

FIGURE 5.1 INTERFACE OF THE SENONE SCORE MODULE (SSM)

The signals feat, mean and var are required by each BCM, so the number of these signals depends on the parallel number. The outputs of the SSM include a 1-bit output ready signal ord, indicating whether the output score is valid, and a 32-bit signal score, representing the calculated senone score.

The architecture of SSM is shown in Figure 5.2 and consists of six main modules.


FIGURE 5.2 MAIN MODULES IN THE SENONE SCORE MODULE (SSM)

1) basic calculation module (BCM) m_fmv;

2) parallel adder module (PAM) m_fmvadd;

3) DVAL calculation module (DCM) m_dvalsub;

4) Gaussian score calculation module (GSCM) m_gauscr;

5) senone score calculation module (SSCM) m_scoreadd;

6) delay modules m_delaybit (1-bit) and m_delay (32-bit).

In addition, three internal modules are used for testing the different levels of the SSM:

1) the parallel module (PM) m_fmvxx, including BCM, PAM and a 1-bit delay module;

2) the DVAL module (DM) m_dvalxx, including PM, DCM and a 32-bit delay module;

3) the Gaussian score module (GSM) m_gauscrxx, including DM, GSCM and a 32-bit delay module.

The calculations performed in each module in SSM are mapped to the calculations in the C code for mgau_eval , and these are shown in Figure 5.3. The details of these modules are explained in the following sections.


FIGURE 5.3 THE CALCULATIONS PERFORMED IN EACH MODULE IN SSM

5.1.1 BASIC CALCULATION MODULE

The basic calculation module (BCM) m_fmv performs the most fundamental calculation in mgau_eval , which is shown in equation (5.1),

fmv[i] = (feat[i] - mean[i])^2 * var[i],  where 0 <= i < 39    (5.1)

It is these calculations that are implemented in parallel to speed up the implementation. If implemented in a serial manner, the total number of times the BCM is called for each frame of speech is equal to the product of the vector length, the number of mixture Gaussians in each senone and the total number of senones. As the typical vector length is 39, the number of mixture Gaussians in each senone is 8 and the total number of senones is 1102, this results in calling the BCM 343,824 times for each frame of speech. As each second of speech has 100 frames, the BCM would need to be called 34,382,400 times a second. Although this would be reduced by Sphinx 3's speed-up techniques, as explained in section 3.2.6, the BCM would still be called more than 24,067,680 times. If implemented in parallel, there is the potential to dramatically reduce the elapsed calculation time.

The interface of the BCM is shown in Figure 5.4.

FIGURE 5.4 INTERFACE OF BASIC CALCULATION MODULE (BCM)

There are six inputs to the BCM. The first three inputs are the feature value for the input speech and the mean and variance values for a senone, all of which are 32-bit integers. The global clock signal (clk) and the global reset signal (reset) are used by all modules in the design and the input ready signal (ird) indicates whether the current feature, mean and variance values are all valid.

The internal architecture of the BCM is shown in Figure 5.5. It includes one 32-bit subtraction unit, two 32-bit multipliers, five 32-bit registers and two 1-bit registers. Both the subtraction unit and the multipliers are combinational in operation, but their outputs are directed to registers to enable their results to be synchronised and made available one cycle after the rising edge on which they are calculated. The calculation of the square of the difference between the feature and the mean, as shown in Figure 5.5, takes two clock cycles, so a two-register delay is inserted into the variance data path to retain synchronisation. One further cycle is then needed to determine the fmv value, so three clock cycles in total are needed to obtain the output from the BCM.
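A software reference model of one BCM evaluation is given below; the comments indicate where the hardware places its pipeline registers, which is why the result appears three clock cycles after the inputs.

    #include <stdint.h>

    /* Reference model of equation (5.1) for a single vector element. */
    static int32_t bcm_model(int32_t feat, int32_t mean, int32_t var)
    {
        int32_t fm  = feat - mean;   /* cycle 1: subtraction, registered          */
        int32_t fm2 = fm * fm;       /* cycle 2: squaring, registered             */
        return fm2 * var;            /* cycle 3: multiply by variance, giving fmv */
    }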


Figure 5.5 data path: (1) fm = feat - mean; (2) register; (3) fm2 = fm * fm; (4) registers; (5) fmv = fm2 * var; (6) register.

FIGURE 5.5 ARCHITECTURE OF BASIC CALCULATION MODULE (BCM)

At each rising clock edge, a new feature value, mean value and variance value are fed into the BCM. The lengths of the feature, mean and variance vectors are all 39, so if only a single BCM is used, 39 input cycles are needed to provide the data. By implementing a number of parallel BCM units, the number of cycles required to input the vector set can be reduced. The relationship between the number of parallel BCM units and the number of input cycles required is shown in Table 5.1.

TABLE 5.1 THE NUMBER OF CYCLES REQUIRED TO PROVIDE ALL INPUT DATA FOR ONE GAUSSIAN

Number of parallel BCM units   Number of cycles required to provide one Gaussian's data
1                              39
2                              20
3                              13
4                              10
5                              8
6                              7
7                              6
8 and 9                        5
10 to 12                       4
13 to 19                       3
20 to 38                       2
39                             1
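In general, the number of input cycles required by P parallel BCM units is the ceiling of 39/P (for example, with P = 4 the value is 10), which agrees with every entry in Table 5.1.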

5.1.2 PARALLEL ADDER MODULE

The parallel adder module (PAM) m_fmvadd is used to combine multiple fmv values into one value fmvxx using several levels of additions depending on the number of parallel BCM units, as shown in the equation (5.2),

fmvxx = sum_{i=0}^{P-1} fmv[i]    (5.2)

where P is the number of parallel BCM units to be added. As adders implemented in hardware generally have only two inputs, calculation time is minimised by using a parallel layered architecture.

Depending on the number of parallel BCM units, a different number of PAM units needs to be generated and organised in a parallel layer and suitably cascaded. The PAM is not needed if there is only one BCM unit. For two parallel BCM units, a single adder layer is needed (as shown in Figure 5.6), and for three parallel BCM units two layers of adders are needed (as shown in Figure 5.7).

FIGURE 5.6 ARCHITECTURE OF PARALLEL ADDER MODULE (PAM) FOR TWO PARALLEL BCM UNITS

The number of parallel BCM units is used to generate the structure of the PAM units. The relationship between the number of parallel BCM units and the number of layers of adders needed in the PAM is shown in Table 5.2. Each layer of adders needs one clock cycle, so the number of cycles spent in the PAM is equal to the number of adder layers.


FIGURE 5.7 ARCHITECTURE OF PARALLEL ADDER MODULE (PAM) FOR THREE PARALLEL BCM UNITS

TABLE 5.2 THE NUMBER OF LAYERS OF ADDERS NEEDED IN PAM

Number of parallel BCM units   Number of layers of adders needed in PAM
1                              0
2                              1
3 and 4                        2
5 to 8                         3
9 to 16                        4
17 to 32                       5
33 to 39                       6
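These values correspond to the ceiling of log2(P) adder layers for P parallel BCM units (for example, with P = 6 three layers are needed), which agrees with every entry in Table 5.2.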

For each layer of adders inside the PAM, a generalised module m_addxx, called the sub-PAM, is used. This sub-PAM can be automatically created according to the number of inputs (denoted as N) to it. The basic units inside the sub-PAM are a two-input adder and a register. As shown in Figure 5.7, when three parallel BCM units are implemented in parallel, the numbers of inputs to the first sub-PAM and second sub-PAM are 3 and 2 respectively. The architectures of those sub-PAMs are shown in Figure 5.8 and Figure 5.9 respectively.


FIGURE 5.8 ARCHITECTURE OF SUB-PAM WITH THREE INPUTS

FIGURE 5.9 ARCHITECTURE OF SUB-PAM WITH TWO INPUTS

If N is an even number, N/2 two-input adders and no registers are needed; each pair of inputs is added together, producing N/2 outputs from the sub-PAM. If N is an odd number, (N-1)/2 two-input adders and one register are needed; the last input is delayed by one cycle and each pair of the remaining inputs is added together, so the number of outputs from the sub-PAM is (N-1)/2+1.

The parallel module (PM) m_fmvxx provides the architecture that can generate multiple parallel BCM units and the corresponding PAM according to the parallel number. In the PM, a 1-bit delay module m_delaybit is included to generate the output ready signal (ord), which is always three clock cycles after the input ready signal (ird); this is further used by the PAM as an indicator that the fmvxx value is ready. When two parallel BCM units are implemented, the architecture of the PM is as shown in Figure 5.10.

Because the BCM always takes three cycles to complete, the total number of clock cycles spent in the PM for different parallel numbers depends only on the cycles spent in the PAM. From Table 5.2, the number of cycles spent in the PM can be worked out, and this is shown in Table 5.3.

TABLE 5.3 THE NUMBER OF CYCLES SPENT IN PM

Number of parallel BCM units   Number of cycles
1                              3
2                              4
3 and 4                        5
5 to 8                         6
9 to 16                        7
17 to 32                       8
33 to 39                       9


FIGURE 5.10 ARCHITECTURE OF PM FOR TWO PARALLEL BCM UNITS

5.1.3 DVAL CALCULATION MODULE

The interface of the DVAL calculation module (DCM) is independent of the number of parallel BCM units and is shown in Figure 5.11.


FIGURE 5.11 INTERFACE OF DVAL CALCULATION MODULE (DCM)

Apart from the global clock signal (clk) and the global reset signal (reset), the inputs for the DCM are: the initial value of dval, the fmvxx value from PAM and the output ready signal ird from PAM. The outputs for DCM are the final dval value for each vector and the output ready signal for dval. The calculations performed in DCM are shown in equation (5.3),

final_dval = initial_dval - sum_{j=0}^{(39/P)-1} fmvxx[j]    (5.3)

where P is the number of parallel BCM units.

For the case where there are 39 parallel BCM units, the fmvxx value from PAM is the required value for one Gaussian, and so the DCM is just a subtraction unit with an output register, with its output available after one clock cycle. This case is shown in Figure 5.12.

When there are fewer than 39 parallel BCM units, the output from the PAM is only one part of the required fmvxx value for one Gaussian. When the first part of fmvxx is available, it is subtracted from the initial dval value. The temporary dval value then needs to be stored and fed back to the subtraction unit. When the next part of the fmvxx value is ready, the subtraction unit is re-used to update the dval value. This process is repeated until all of the fmvxx inputs for one Gaussian have been applied. A counter is initialised according to the number of subtractions required in a particular implementation and decremented as each part of fmvxx is subtracted. The architecture of the DCM for this case is shown in Figure 5.13.
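The counter-based accumulation can be modelled in software as follows; the number of passes is the same ceiling of 39/P noted for Table 5.1, and the names are illustrative.

    #include <stdint.h>

    /* DCM model for fewer than 39 parallel BCM units: in each pass one partial
       sum fmvxx arrives from the PAM and is subtracted from the running dval. */
    static int32_t dcm_model(int32_t initial_dval, const int32_t *fmvxx, int passes)
    {
        int32_t dval = initial_dval;      /* available when the first part arrives */
        for (int j = 0; j < passes; j++)  /* counter decrements once per pass      */
            dval -= fmvxx[j];
        return dval;                      /* final dval for one Gaussian           */
    }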

FIGURE 5.12 ARCHITECTURE OF DCM FOR 39 PARALLEL BCM UNITS

The initial dval value and the first part of fmvxx should be available to the DCM at the same time, so a 32-bit delay module m_delay is used to delay the initial dval value by a number of cycles equal to the cycles spent in the PM, as shown in Table 5.3. The delay module m_delay is implemented as a series of registers. Its architecture for two parallel BCM units is shown in Figure 5.14, introducing a 4-cycle delay for the initial dval value.


FIGURE 5.13 ARCHITECTURE OF DCM FOR FEWER THAN 39 PARALLEL BCM UNITS


FIGURE 5.14 ARCHITECTURE OF 32-BIT DELAY MODULE FOR TWO PARALLEL BCM UNITS

The DVAL module (DM) m_dvalxx, including PM, DCM and the 32-bit delay module, is used to test whether the final dval is the same as that produced by the Sphinx 3 code. When two parallel BCM units are implemented, its architecture is shown in Figure 5.15.

FIGURE 5.15 ARCHITECTURE OF DM FOR TWO PARALLEL BCM UNITS

5.1.4 GAUSSIAN SCORE CALCULATION MODULE

The interface of the Gaussian score calculation module (GSCM) m_gauscrxx is shown in Figure 5.16.

FIGURE 5.16 INTERFACE OF GAUSSIAN SCORE CALCULATION MODULE (GSCM)

The GSCM is used to calculate the Gaussian score using

$$\mathit{gauscr} = \min\bigl(\mathit{distfloor},\ \mathit{dval} \gg \mathit{incp3}\bigr) \times f + \mathit{mixw} \qquad (5.4)$$

whose operations include:

1) scaling down the dval value by right shifting, where incp3 is the power of the scaled factor,

2) comparing the dval value with the floor value distfloor,

3) multiplying by the multiplication factor f,

4) adding the mixture weight mixw.

Because incp3, distfloor, and f remain the same, they are passed to the GSCM as generic values. This means that the first three operations above are between the input dval and constant numbers. Each of the above operations can be done in just one clock cycle. The architecture of GSCM is shown in Figure 5.17.
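The four GSCM operations of equation (5.4) can likewise be modelled in C as a behavioural sketch; the constants passed in below are placeholders rather than values used in the thesis.

/* Behavioural sketch of equation (5.4): shift, floor comparison, scale and
 * mixture-weight addition, mirroring the four GSCM pipeline stages.
 * The constant values below are placeholders, not values from the thesis. */
#include <stdint.h>
#include <stdio.h>

int32_t gauscr(int32_t dval, int32_t distfloor, int incp3, int32_t f, int32_t mixw)
{
    int32_t scaled  = dval >> incp3;                           /* stage 1: scale down */
    int32_t floored = scaled < distfloor ? scaled : distfloor; /* stage 2: min()      */
    return floored * f + mixw;                                 /* stages 3 and 4      */
}

int main(void)
{
    printf("%d\n", gauscr(4400, -100, 3, 2, 7));  /* min(-100, 550)*2 + 7 = -193 */
    return 0;
}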


FIGURE 5.17 ARCHITECTURE OF GAUSSIAN SCORE CALCULATION MODULE (GSCM)

The 32-bit delay module is used to delay the input mixture weight in_mixw by a fixed number of clock cycles until the outputs from the DM are ready. The number of delay cycles is always one more than the number shown in Table 5.3.

The Gaussian score module (GSM) m_gauscrxx, including DM, GSCM and a 32-bit delay module, is used for testing whether the Gaussian score agrees with the value formed using the Sphinx 3 code. Its architecture when two parallel BCM units are implemented is shown in Figure 5.18.

FIGURE 5.18 ARCHITECTURE OF GSM FOR TWO PARALLEL BCM UNITS

5.1.5 SENONE SCORE CALCULATION MODULE

The senone score calculation module (SSCM) computes the score by summing the Gaussian scores gauscr outputted from GSM. Its interface is shown in Figure 5.19.

FIGURE 5.19 INTERFACE OF SENONE SCORE CALCULATION MODULE (SSCM)

The calculations included in SSCM are shown in equation (5.5),

$$\mathit{score} = \sum_{c=0}^{G-1} \mathrm{logs3\_add}\bigl(\mathit{score}, \mathit{gauscr}[c]\bigr) \qquad (5.5)$$

where G is the number of mixture Gaussians. When there is only one Gaussian in a senone, the SSCM is not needed as the senone score is just equal to the Gaussian score. When the number of mixture Gaussians in one senone is 2, 4 or 8, the SSCM is a recursive adder. A counter is used by the SSCM to perform the appropriate number of additions to ensure that every Gaussian score has been included in the sum. The senone score is first initialised to the value of the logarithm of zero, and the incoming Gaussian scores are then added in a recursive manner. After each addition, the partial senone score is stored to be used for the next addition. The architecture of SSCM is shown in Figure 5.20.

Both the Gaussian scores and the senone scores use logarithmic values, and the addition of two scores is performed using a logarithmic look-up table rather than by leaving the logarithmic domain; this is the same approach used to speed up the calculation in the original Sphinx 3 source code. The resolution of this look-up table is one and the size of the table depends on the logarithmic base used in Sphinx 3.
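The table-based log-domain addition can be illustrated with the following C sketch. The logarithmic base and table length used here are assumptions for illustration; the actual values depend on the Sphinx 3 configuration.

/* Sketch of log-domain addition via a look-up table, following the
 * structure described for sub-SSCM: take the absolute difference of the
 * two scores, look it up, and add the result to the larger score.
 * The base (1.0003) and table size are illustrative assumptions. */
#include <math.h>
#include <stdio.h>

#define BASE      1.0003
#define TABLE_LEN 40000

static int table[TABLE_LEN];

static void build_table(void)
{
    for (int d = 0; d < TABLE_LEN; d++)            /* table[d] ~ log_B(1 + B^-d) */
        table[d] = (int)(log(1.0 + pow(BASE, -(double)d)) / log(BASE) + 0.5);
}

static int logadd(int a, int b)                    /* scores in log_B units */
{
    int hi = a > b ? a : b;
    int d  = a > b ? a - b : b - a;                /* absolute difference   */
    return hi + (d < TABLE_LEN ? table[d] : 0);    /* beyond the table: ~0  */
}

int main(void)
{
    build_table();
    /* adding two equal log-scores raises the result by roughly log_B(2) */
    printf("%d\n", logadd(1000, 1000));            /* about 1000 + 2311    */
    return 0;
}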

FIGURE 5.20 ARCHITECTURE OF SENONE SCORE CALCULATION MODULE (SSCM)

This addition operation is implemented in the logarithmic domain addition module called sub-SSCM. Its interface is shown in Figure 5.21.

FIGURE 5.21 INTERFACE OF SUB-SSCM

The sub-SSCM performs the following operations.

1) Initialisation part, m_pqdr_nreg (shown in Figure 5.22), which takes the senone and Gaussian scores as input, and outputs the absolute difference between them and the larger score.

2) The logarithmic calculation part, m_addtable_nreg (shown in Figure 5.23), which takes the absolute difference output from the initialisation part and outputs the result from the look-up table stored in ROM.

3) An adder, m_add_nreg, to sum the larger score produced from the initialisation part with the result from the logarithmic calculation in order to determine the senone score.

FIGURE 5.22 ARCHITECTURE OF INITIALISATION PART OF SUB-SSCM

FIGURE 5.23 ARCHITECTURE OF LOGARITHMIC CALCULATION PART OF SUB-SSCM

By adding registers in different positions between these three parts, three versions of sub-SSCM are available: a 1-cycle, a 2-cycle and a 3-cycle version, each giving a different maximum frequency at which the sub-SSCM can run.

The architectures of the three versions are shown in Figure 5.24. The potential clock frequency would be highest if the 3-cycle version of m_logadd is used, as the partial results can be stored in internal registers, but the realisable clock frequency would be lowest for the 1-cycle version as all the additions need to be carried out before the results are stored. When the number of parallel BCM units is less than 13, the interval between the calculations of consecutive Gaussians is greater than three cycles, so the 3-cycle version could be used to permit the highest system frequency. With 20 parallel BCM units, the interval is 2 cycles, so a 2-cycle addition can be used. For 39 parallel BCM units, the interval is one cycle and the 1-cycle adder could be used.

(a) 1-cycle version

(b) 2-cycle version

(c) 3-cycle version

FIGURE 5.24 ARCHITECTURE OF SUB-SSCM

At this point, all of the essential modules for building the parallel senone score module (SSM) have been introduced. It can be built from GSM and SSCM, and its architecture when two parallel BCM units are implemented is shown in Figure 5.25.

FIGURE 5.25 ARCHITECTURE OF SENONE SCORE MODULE (SSM) FOR TWO PARALLEL BCM UNITS

The total number of clock cycles spent in SSM according to the number of parallel BCM units is shown in Table 5.4.

5.2 ON-CHIP ROM DATA MODULE

For test purposes, the on-chip ROM on the FPGA is used to store the speech feature and senone data, which are required by the SSM. The data in the ROM cannot be changed at run time. However, it is relatively easy to design and control ROMs of different sizes, simplifying the automatic generation of a parameterised system. The multiple input data for the number of parallel BCM units can be easily output from the on-chip ROMs. The interface of the on-chip ROM data module (ROM-DM) for a single and two parallel BCM units is shown in Figure 5.26.

TABLE 5.4 THE NUMBER OF CYCLES SPENT IN SSM

Number of parallel BCM units    1-cycle m_logadd    2-cycle m_logadd    3-cycle m_logadd
1                               6                   7                   8
2                               7                   8                   9
3 and 4                         8                   9                   10
5 to 8                          9                   10                  11
9 to 16                         10                  11                  12
17 to 32                        11                  12                  N/A
33 to 39                        12                  N/A                 N/A

(The three right-hand columns give the number of cycles spent in SSM for each version of m_logadd.)

(A) WITH A SINGLE BCM UNIT (B) WITH TWO PARALLEL BCM UNITS

FIGURE 5.26 INTERFACE OF ON-CHIP ROM DATA MODULE (ROM-DM)

The input signals are independent of the number of parallel BCM units; these are the global clock signal clk, the global reset signal reset, the chip enable signal ienable and the chip select signal iselect. The output out_ird, which is used to indicate whether the current outputs are valid, the mixture weight mixw and the initial value dval are also not affected by the number of parallel BCM units. There is only one mixw and one dval value for each Gaussian, and these are not used by the parallel BCM units, so the numbers of mixw and dval outputs are fixed. The numbers of outputs for feat, mean and variance are each equal to the number of parallel BCM units.

The main modules included in the ROM-DM are shown in Figure 5.27.

The ROM-DM includes the following modules:

1) ROM read control module (ROM-RCTL), m_fmci;

2) ROM feature module (ROM-FEAT), m_rom_featxx;

3) ROM mean module (ROM-MEAN), m_rom_meanxx;

4) ROM variance module (ROM-VAR), m_rom_varxx;

5) ROM DVAL module (ROM-DVAL), m_rom_dval; and

6) ROM mixture weight module (ROM-MIXW), m_rom_mixw.

As shown in Figure 5.27, the ROM-RCTL can be configured by the number of parallel BCM units, the number of feature vectors and the number of senones and Gaussians. The capacity of ROM-FEAT is set according to the number of feature vectors in the speech. Similarly, the capacity of ROM-MEAN, ROM-VAR, ROM-DVAL and ROM-MIXW can be configured by the number of senones and Gaussians. In addition, ROM-FEAT, ROM-MEAN and ROM-VAR are the same type of on-chip ROM, for which the number of outputs can be configured by the number of parallel BCM units. The ROM-DVAL and ROM-MIXW are not affected by the number of parallel BCM units. The operations of these modules are further explained in the following sections.


FIGURE 5.27 MAIN MODULES INCLUDED IN ON-CHIP ROM DATA MODULE (ROM-DM)

5.2.1 ON-CHIP ROM READ CONTROL MODULE

The on-chip ROM read control module (ROM-RCTL) m_fmci shown in Figure 5.28 generates the control signals for the remainder of the ROM modules, including the ready signal out_ready to read ROM data and addresses for each ROM. The details of each output from this module are shown in Table 5.5 and the control signals that are generated from ROM-RCTL are shown in Figure 5.29.


FIGURE 5.28 INTERFACE OF ON-CHIP ROM READ CONTROL MODULE (ROM-RCTL)

TABLE 5.5 OUTPUTS FROM ON-CHIP ROM READ CONTROL MODULE (ROM-RCTL)

Signal     Description
out_f      Index of the speech feature vector, from 0 to number_of_features - 1
out_m      Index of the senone, from 0 to number_of_senones - 1
out_c      Index of the Gaussian in the current senone
out_mc     Consecutive index for every Gaussian in every senone, which provides another way to access the Gaussian in each senone
out_i      Index of the feature vector, mean vector and variance vector
out_ird    Signal indicating whether the outputs from all of the ROMs are valid or not

FIGURE 5.29 REQUIRED OUTPUTS FROM ROM-RCTL

Several internal counters are used to generate the results shown in Figure 5.29. After the ROM-RCTL is reset, enabled and selected, all of the read addresses for the ROMs are counted up from 0. The counter generating out_i counts up from 0, in steps equal to the number of parallel BCM units, until all 39 values have been covered. Once this is completed, out_mc increments before out_i begins its next count. The maximum value for out_mc is equal to the product of the number of senones and the number of Gaussians. Once this has been reached, out_f begins incrementing and the above cycle repeats until all speech vectors have been processed. Note that Figure 5.29 only shows the required outputs from ROM-RCTL when the number of parallel BCM units is between 1 and 5 inclusive. For a larger number of parallel BCM units, the outputs from ROM-RCTL can be generated in the same way.
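The nesting of these counters can be illustrated with a small C model. This is a behavioural sketch rather than the VHDL, and the configuration values are examples only.

/* Behavioural sketch of the ROM-RCTL counter nest: out_i steps through the
 * 39 vector elements in groups of P, out_mc walks every Gaussian of every
 * senone, and out_f walks the feature vectors.  N_F, N_S, N_G and P below
 * are example values only. */
#include <stdio.h>

int main(void)
{
    const int N_F = 2, N_S = 1, N_G = 2, P = 5;   /* example configuration */

    for (int out_f = 0; out_f < N_F; out_f++)              /* feature vectors */
        for (int out_mc = 0; out_mc < N_S * N_G; out_mc++) /* every Gaussian  */
            for (int out_i = 0; out_i < 39; out_i += P)    /* vector elements */
                printf("out_f=%d out_mc=%d out_i=%d\n", out_f, out_mc, out_i);
    return 0;
}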

5.2.2 BASIC ROM MODULE

The ROM-FEAT, ROM-MEAN and ROM-VAR have the same interfaces and their designs are affected by the number of parallel BCM units. The interfaces for ROM-FEAT implemented with a single BCM unit and with two parallel BCM units are shown in Figure 5.30.

(A) WITH A SINGLE BCM UNIT (B) WITH TWO PARALLEL BCM UNITS

FIGURE 5.30 INTERFACE OF ROM-FEAT MODULE

In contrast, ROM-DVAL and ROM-MIXW have the same interface, which is not affected by the number of parallel BCM units. The interface for this ROM is shown in Figure 5.31.

FIGURE 5.31 INTERFACE OF ROM-DVAL MODULE

The architectures of the ROMs are built from basic combinational gates and registers and are not shown here. The details of the ROMs can be found in the VHDL code in Appendix H.

The storage needed for ROM-FEAT depends on the number of feature vectors. Each feature vector has 39 dimensions and each element requires 32 bits of storage, so 1248 bits are needed per vector. The storage needed for ROM-MEAN, ROM-VAR, ROM-DVAL and ROM-MIXW depends on the number of senones and the number of Gaussians in each senone. For each Gaussian, the capacity required to store each of the mean and variance vectors is 1248 bits (39-dimension vectors with each vector element requiring 32 bits of storage). For each Gaussian, the dval and mixw are just 32-bit values rather than vectors. For each senone with eight Gaussians, each of the mean and variance ROMs therefore requires 9984 bits, and each of the dval and mixw ROMs needs 256 bits.

The total data produced by the on-chip ROM module at each clock cycle depends on the number of parallel BCM units and their relationship is shown in Table 5.6. These data are processed by the senone score module (SSM) at each clock cycle.

TABLE 5.6 DATA OUTPUT FROM ROM-DM (BITS/CYCLE)

Number of parallel BCM units   ROM-FEAT   ROM-MEAN   ROM-VAR   ROM-DVAL   ROM-MIXW   ROM-DM (low)   ROM-DM (high)
1                              32         32         32        32         32         96             160
2                              64         64         64        32         32         192            256
3                              96         96         96        32         32         288            352
4                              128        128        128       32         32         384            448
5                              160        160        160       32         32         480            544
6                              192        192        192       32         32         576            640
7                              224        224        224       32         32         672            736
8                              256        256        256       32         32         768            832
10                             320        320        320       32         32         960            1024
13                             416        416        416       32         32         1248           1312
20                             640        640        640       32         32         1920           1984
39                             1248       1248       1248      32         32         3744           3808

For ROM-FEAT, each clock cycle produces a number of bits that is a multiple of 32 (the data width). For ROM-MEAN and ROM-VAR, the number of bits produced in each cycle is the same as that of ROM-FEAT. For ROM-DVAL and ROM-MIXW, the outputs are only updated when one whole mean or variance vector has been produced. Excluding and then including the outputs from ROM-DVAL and ROM-MIXW gives, respectively, the low and high totals of bits output from ROM-DM shown in Table 5.6. These data are then further processed by the senone score module (SSM) at each clock cycle. With a 10 MHz clock frequency, the throughput of ROM-DM ranges from 1.49 Gbit/s to 35.46 Gbit/s as the number of parallel BCM units is changed from 1 to 39.
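The quoted throughput figures can be checked with a few lines of C. Dividing by 2^30 (binary gigabits) reproduces the quoted 1.49 and 35.46 values; this interpretation of the units is an inference from the numbers rather than a statement in the text.

/* Reproduce the quoted ROM-DM throughput at a 10 MHz clock for 1 and 39
 * parallel BCM units, using the "high" bits-per-cycle values of Table 5.6.
 * Dividing by 2^30 (binary gigabits) matches the quoted 1.49 and 35.46
 * figures; this interpretation is an assumption. */
#include <stdio.h>

int main(void)
{
    const double clk_hz = 10e6;
    const int bits_per_cycle[] = {160, 3808};        /* P = 1 and P = 39 */

    for (int i = 0; i < 2; i++)
        printf("%.2f Gbit/s\n",
               bits_per_cycle[i] * clk_hz / (1024.0 * 1024.0 * 1024.0));
    return 0;   /* prints 1.49 and 35.46 */
}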

5.2.3 SENONE SCORE MODULE WITH ON-CHIP ROM DATA MODULE

The on-chip ROM data module (ROM-DM) m_dataxx is combined with the senone score module (SSM) m_scorexx to form the complete design, denoted as SSM-ROM, m_dataxxscorexx. The interface and architecture of SSM-ROM are shown in Figure 5.32 for a single BCM unit. The interface for SSM-ROM is independent of the number of parallel BCM units.

(A) INTERFACE

(B) ARCHITECTURE WHEN IMPLEMENTED WITH A SINGLE BCM UNIT

FIGURE 5.32 SENONE SCORE MODULE WITH ON-CHIP ROM DATA MODULE (SSM-ROM)

5.3 ON-CHIP BRAM DATA MODULE

The second data module for the senone score module (SSM) is the on-chip BRAM data module (BRAM-DM), in which all of the required speech feature and senone data are stored in block RAMs (BRAMs) on the FPGA. Unlike the distributed RAM that is built from general-purpose logic, the BRAM is a dedicated, configurable memory for storing and manipulating large amounts of data. This design takes advantage of the BRAM resources which are available on most recent FPGAs. There are many types of BRAM, which vary in terms of how many inputs and outputs are provided and whether they can be accessed simultaneously. In the current work the BRAM has a single input and a single output, and only one operation (either read or write) can be performed in any given clock cycle. This is appropriate here as all data can be loaded into the BRAM before they are needed for the senone score calculations.

As the feature and senone values need to be changeable in practical implementations, the ROM-DM solution is only suitable for testing purposes. For the BRAM-DM, the data for the feature vector and the senone parameters can be changed simply by enabling the BRAM-DM modules in write mode. In addition, the BRAM-DM has the advantage that once all of the data have been loaded, it can be switched into read mode and functions as the ROM-DM does. The BRAM-DM can also be parameterised at the top level, allowing the total number of BRAMs and the capacity of each BRAM to be automatically adjusted according to the number of parallel BCM units.

The interfaces of the on-chip BRAM data module (BRAM-DM) m_bram_dataxx for a single BCM unit and for two parallel BCM units are shown in Figure 5.33.

(A) WITH A SINGLE BCM UNIT (B) WITH TWO PARALLEL BCM UNITS

FIGURE 5.33 INTERFACE OF ON-CHIP BRAM DATA MODULE (BRAM-DM)

Compared to the inputs of ROM-DM, seven additional inputs are used in BRAM-DM for storing data in the BRAMs. These extra inputs are in_data, iwrite, iValidWrite_feat, iValidWrite_mean, iValidWrite_var, iValidWrite_dval and iValidWrite_mixw. The signal iwrite indicates whether the BRAM-DM is in write mode, and all the data to be stored in BRAM are provided via the in_data input port. The remaining five inputs indicate whether the input data are feature, mean, variance, dval or mixture weight values. Once the BRAM-DM has been loaded with data, its operations in read mode to supply data to SSM are the same as those of ROM-DM. The outputs of BRAM-DM are the same as those of ROM-DM. The main modules included in BRAM-DM are shown in Figure 5.34.

FIGURE 5.34 ARCHITECTURE OF ON-CHIP BRAM DATA MODULE (BRAM-DM)

Compared to the modules in ROM-DM, the BRAM-DM has similar modules, which are:

1) BRAM read control module (BRAM-RCTL), m_fmci;

2) BRAM feature module (BRAM-FEAT), m_bram_unitxx;

3) BRAM mean module (BRAM-MEAN), m_bram_unitxx;

4) BRAM variance module (BRAM-VAR), m_bram_unitxx;

5) BRAM DVAL module (BRAM-DVAL), m_bram_unit; and

6) BRAM mixture weight module (BRAM-MIXW), m_bram_unit.

BRAM-RCTL can be configured by the number of parallel BCM units, the number of feature vectors and the number of senones and Gaussians. The memory requirement of BRAM-FEAT is determined by the number of feature vectors in the speech, and that of BRAM-MEAN, BRAM-VAR, BRAM-DVAL and BRAM-MIXW can be configured by the number of senones and Gaussians. In BRAM-FEAT, BRAM-MEAN and BRAM-VAR, the number of outputs provided is configured by the number of BCM units operating in parallel, and they are built from the parallel BRAM module (BRAM-PARALLEL) m_bram_unitxx. For BRAM-DVAL and BRAM-MIXW, the numbers of outputs provided are not changed by the number of parallel BCM units, and they are built from the basic BRAM module (BRAM-BASIC) m_bram_unit.

The design of BRAM-DM is more complex than that of the ROM-DM. In order to generate the same number of outputs from BRAM-PARALLEL as the number of parallel BCM units, this number of individual BRAMs should be defined inside each BRAM-PARALLEL. For the same quantity of data for feature, mean and variance, an increase in the number of individual BRAMs results in a reduction of the memory required for each individual BRAM. Another difficulty in the design of BRAM-PARALLEL is that the different write control signals for each individual BRAM need to be generated automatically according to the number of parallel BCM units. This is necessary as there is only one input port for the entire BRAM-DM, but there are many individual BRAMs to be populated.

5.3.1 ON-CHIP BRAM READ CONTROL MODULE

The interfaces of the on-chip BRAM read control module (BRAM-RCTL) are shown in Figure 5.35. The inputs for BRAM-RCTL are the same as those for ROM-RCTL. The output readenable from BRAM-RCTL is equivalent to the output out_ready in the ROM-RCTL. The differences between the two implementations are due to the read addresses required for the individual BRAMs.

As shown in Figure 5.35, only three read addresses are generated, which are raddr_feat for BRAM-FEAT, raddr_mean for BRAM-MEAN (also used for BRAM-VAR) and raddr_dval for BRAM-DVAL (also used for BRAM-MIXW). For the read address raddr_feat and raddr_mean, their widths are determined according to the number of parallel BCM units, as the storage capacity requirement of individual BRAMs is different. These control signals that are generated from BRAM-RCTL are shown in Figure 5.36.

(A) WITH A SINGLE BCM UNIT (B) WITH TWO PARALLEL BCM UNITS

FIGURE 5.35 INTERFACE OF ON-CHIP BRAM READ CONTROL MODULE (BRAM-RCTL)

FIGURE 5.36 REQUIRED OUTPUTS FROM BRAM-RCTL

The signals raddr_feat and raddr_dval outputted from BRAM-RCTL are generated in the same way as the out_f and out_mc signals from ROM-RCTL respectively. The signal raddr_mean from BRAM-RCTL is related to, but different from, out_i provided by ROM-RCTL. The step used for the counter that generates raddr_mean is always 1 regardless of the number of parallel BCM units, and it is only reset to 0 when the reset signal is valid. As the number of parallel BCM units increases, the maximum value for the raddr_mean counter is reduced, which can be obtained from

$$\mathit{raddr\_mean}_{\max} = N_S \, N_G \, \mathrm{int}\!\left(\frac{39}{P}\right) - 1 \qquad (5.6)$$

where N_S is the number of senones, N_G is the number of Gaussians in each senone and P is the number of parallel BCM units. The way that BRAM-RCTL generates the read address is dependent on the way that the data are stored, which is further explained in Section 5.3.3.

5.3.2 BASIC BRAM MODULE

Both BRAM-DVAL and BRAM-MIXW are built from the basic BRAM module (BRAM-BASIC) m_bram_unit. Its interface and architecture are not affected by the number of parallel BCM units, as shown in Figures 5.37 and 5.38 respectively.

Inside the BRAM-BASIC there are two sub-modules, the write control module (BRAM-BASIC-WCTL) m_bram_wctl and the individual BRAM module m_bram_dp1w1r (BRAM-1W1R), as well as an output register.

The BRAM-BASIC-WCTL processes the signals from the higher layer and generates the essential write control signals for BRAM-1W1R, namely the chip enable signal cen, the write enable signal wen, the write address waddr, and the write data wdata. The interface and architecture of BRAM-BASIC-WCTL are shown in Figures 5.39 and 5.40 respectively.

The BRAM-BASIC-WCTL resets the write address to 0 on the reset signal, then advances the write address each time new input data arrive. This value is registered before being provided as the write address for BRAM-1W1R. The counter is paused if the valid write signal iValidWrite and the write enable signal iwrite are not both set, or if the read enable signal readenable is set. Once the maximum address is reached, the counter is reset, the BRAM-1W1R is refilled and any stored data are overwritten.
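A compact behavioural model of this write-address counter is sketched below; the signal names follow the text and the maximum address is an assumed generic.

/* Behavioural sketch of the BRAM-BASIC-WCTL write-address counter: the
 * address advances only when a valid write is presented and reads are not
 * in progress, and wraps (overwriting old data) at the maximum address.
 * MAX_ADDR is an assumed generic, not a value from the thesis. */
#include <stdbool.h>
#include <stdio.h>

#define MAX_ADDR 7            /* assumed depth of the individual BRAM - 1 */

static unsigned waddr = 0;

static void wctl_step(bool reset, bool iwrite, bool iValidWrite, bool readenable)
{
    if (reset)
        waddr = 0;
    else if (iwrite && iValidWrite && !readenable)
        waddr = (waddr == MAX_ADDR) ? 0 : waddr + 1;   /* wrap and overwrite */
    /* otherwise the counter is paused */
}

int main(void)
{
    wctl_step(true, false, false, false);              /* reset             */
    for (int i = 0; i < 10; i++) {
        wctl_step(false, true, true, false);           /* ten valid writes  */
        printf("%u ", waddr);
    }
    printf("\n");                                      /* 1 2 3 4 5 6 7 0 1 2 */
    return 0;
}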

FIGURE 5.37 INTERFACE OF BASIC BRAM MODULE (BRAM-BASIC)

FIGURE 5.39 INTERFACE OF WRITE CONTROL MODULE FOR BASIC BRAM MODULE (BRAM-BASIC-WCTL)

FIGURE 5.38 ARCHITECTURE OF BASIC BRAM MODULE (BRAM-BASIC)

FIGURE 5.40 ARCHITECTURE OF THE WRITE CONTROL MODULE FOR BASIC BRAM MODULE (BRAM-BASIC-WCTL)

BRAM-1W1R implements the individual BRAM, and its interface and architecture are shown in Figures 5.41 and 5.42 respectively.

The BRAM-1W1R operates as follows; a behavioural sketch in C is given after the list.

1) If the chip enable signal cen is not set, no data can be written or read from BRAM-1W1R.

2) If the write enable signal wen is set, BRAM-1W1R operates in write mode. The write data are written to the address provided by the write address signal waddr in BRAM-1W1R.

3) If the write enable signal wen is not set, BRAM-1W1R operates in read mode using the address supplied in raddr. The data read from BRAM-1W1R are registered.
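These three rules can be captured in a short behavioural model; the memory depth and data width are assumptions for illustration.

/* Behavioural sketch of BRAM-1W1R: one write port and one registered read
 * port sharing a chip enable; only one operation is performed per cycle.
 * DEPTH is an assumed value. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DEPTH 16

static int32_t mem[DEPTH];
static int32_t rdata_reg;          /* registered read output */

static void bram_1w1r_cycle(bool cen, bool wen,
                            unsigned waddr, int32_t wdata, unsigned raddr)
{
    if (!cen)
        return;                    /* rule 1: disabled, no read or write   */
    if (wen)
        mem[waddr] = wdata;        /* rule 2: write mode                   */
    else
        rdata_reg = mem[raddr];    /* rule 3: read mode, output registered */
}

int main(void)
{
    bram_1w1r_cycle(true, true, 5, 1234, 0);   /* write 1234 to address 5 */
    bram_1w1r_cycle(true, false, 0, 0, 5);     /* read address 5          */
    printf("%d\n", rdata_reg);                 /* prints 1234             */
    return 0;
}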

FIGURE 5.41 INTERFACE OF INDIVIDUAL BRAM MODULE FOR BASIC BRAM MODULE (BRAM-1W1R)


FIGURE 5.42 ARCHITECTURE OF INDIVIDUAL BRAM MODULE FOR BASIC BRAM MODULE (BRAM-1W1R)

5.3.3 PARALLEL BRAM MODULE

The BRAM-FEAT, BRAM-MEAN and BRAM-VAR are each built from the parallel BRAM module (BRAM-PARALLEL) m_bram_unitxx, with their numbers of outputs depending on the number of parallel BCM units. The interface of BRAM-PARALLEL is shown in Figure 5.43 when implemented with a single BCM unit and with two parallel BCM units.

(A) WITH A SINGLE BCM UNIT (B) WITH TWO PARALLEL BCM UNITS

FIGURE 5.43 INTERFACE OF PARALLEL BRAM MODULE (BRAM-PARALLEL)

When implemented with a single BCM unit, the architecture of BRAM-PARALLEL is the same as that of BRAM-BASIC, having a parallel BRAM write control module (BRAM-PARALLEL-WCTL) m_bram_wctlxx, an individual BRAM module (BRAM-1W1R) m_bram_dp1w1r and one output register. This is shown in Figure 5.44.

When implemented with two parallel BCM units, BRAM-PARALLEL consists of one BRAM-PARALLEL-WCTL, two BRAM-1W1R and two output registers, which is shown in Figure 5.45.

FIGURE 5.44 ARCHITECTURE OF PARALLEL BRAM MODULE (BRAM-PARALLEL) WHEN IMPLEMENTED WITH A SINGLE BCM UNIT

FIGURE 5.45 ARCHITECTURE OF PARALLEL BRAM MODULE (BRAM-PARALLEL) WHEN IMPLEMENTED WITH TWO PARALLEL BCM UNITS

Inside the module, the number of individual BRAM units (BRAM-1W1R) is equal to the number of parallel BCM units, but the capacity of each BRAM-1W1R is reduced. The total data held by BRAM-PARALLEL is not affected.

When BRAM-PARALLEL is in read mode, all the BRAM-1W1Rs are provided with the same address, providing one value per parallel BCM unit at each clock cycle. When BRAM-PARALLEL is in write mode, at any one clock cycle only one of the BRAM-1W1Rs is enabled to receive the input data. The chip enable and write enable signals for each BRAM-1W1R are generated by the write control module (BRAM-PARALLEL-WCTL) m_bram_wctlxx. The interfaces for BRAM-PARALLEL-WCTL when implemented with a single BCM unit and with two parallel BCM units are shown in Figure 5.46.

(A) WITH A SINGLE BCM UNIT (B) WITH TWO PARALLEL BCM UNITS

FIGURE 5.46 INTERFACE OF PARALLEL BRAM WRITE CONTROL MODULE (BRAM-PARALLEL-WCTL)

From this figure, it can be seen that the number of bits in the chip enable signal centxx and write enable signal wenxx depend on the number of parallel BCM units. The chip enable signal centxx is always valid when input data is loaded into BRAM-DM.

The write enable signal wenxx should be generated in the way shown in Figure 5.47, so as to store the sequential input data into the individual BRAM-1W1Rs according to the number of parallel BCM units. Vectors of 39-dimensional values for feature, mean or variance data are stored in the manner shown in Figure 5.48. Note that Figures 5.47 and 5.48 show only the cases when the number of parallel BCM units is in the range of one to five. When the number of parallel BCM units is greater than five, the required outputs for the write enable signal wenxx and how the vectors of data are stored in BRAM-1W1R can be worked out in a similar way.
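One plausible round-robin arrangement consistent with this description is sketched below; the exact layout and write-enable pattern are those defined by Figures 5.47 and 5.48.

/* Sketch of how the 39 elements of one vector could be distributed across P
 * individual BRAM-1W1Rs in round-robin order, with wenxx enabling one BRAM
 * per cycle.  This is one plausible arrangement consistent with the text;
 * the exact layout is defined by Figures 5.47 and 5.48. */
#include <stdio.h>

int main(void)
{
    const int P = 3;                           /* example number of BCM units */
    for (int i = 0; i < 39; i++) {
        int bram = i % P;                      /* which BRAM-1W1R is enabled  */
        int addr = i / P;                      /* write address inside it     */
        printf("element %2d -> BRAM %d, address %2d\n", i, bram, addr);
    }
    return 0;
}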

FIGURE 5.47 VECTORS OF DATA STORED IN INDIVIDUAL BRAM-1W1RS IN BRAM-PARALLEL

FIGURE 5.48 REQUIRED OUTPUTS FOR WRITE ENABLE SIGNAL WENXX IN BRAM-PARALLEL-WCTL

5.3.4 SENONE SCORE MODULE WITH ON-CHIP BRAM DATA MODULE

The on-chip BRAM data module (BRAM-DM) m_bram_dataxx is combined with the senone score module (SSM) m_scorexx to form the complete design, denoted as SSM-BRAM, m_bram_dataxxscorexx. The interface of SSM-BRAM is shown in Figure 5.49.

FIGURE 5.49 INTERFACE OF SENONE SCORE MODULE WITH ON-CHIP BRAM DATA MODULE (SSM-BRAM)

Compared to the interface of BRAM-DM, the interface of SSM-BRAM has one extra input signal iValidReadScore and two additional signals REG_SCORE and REG_DONE. REG_DONE is set when all the calculations are completed and the scores themselves can be read from REG_SCORE when iValidReadScore is set. The architecture of SSM-BRAM when implemented with a single BCM unit is shown in Figure 5.50.

FIGURE 5.50 ARCHITECTURE OF SENONE SCORE MODULE WITH ON-CHIP BRAM DATA MODULE (SSM-BRAM) WHEN IMPLEMENTED WITH A SINGLE BCM UNIT

Apart from SSM and BRAM-DM, two extra modules are required in the complete design: the input register module (BRAM-REG) m_bram_inputreg and the output score BRAM module (BRAM-SCORE) m_bram_score. BRAM-REG ensures that all control signals and data input signals are stable for the BRAM-DM, which aids the interface of the test bench during hardware simulation. BRAM-SCORE stores the senone scores and makes them available after the calculation is complete. The size of BRAM-SCORE is affected by the number of feature vectors and the number of senones. BRAM-SCORE is built from the basic BRAM-BASIC, with the extension that both read and write control signals can also be generated.

5.4 TEST BENCHES

This section explains the design of the test benches used to verify the hardware systems described previously in this chapter. The following modules need to be tested to verify that the hardware implementation of the mgau_eval function operates correctly.

1) senone score module (SSM), m_scorexx;

2) on-chip ROM data module (ROM-DM), m_dataxx;

3) on-chip BRAM data module (BRAM-DM), m_bram_dataxx;

4) senone score module with on-chip ROM data module (SSM-ROM), m_dataxxscorexx;

5) senone score module with on-chip BRAM data module (SSM-BRAM), m_bram_dataxxscorexx.

Although separate test benches have been developed for each of these modules, there is some commonality between the stimuli required. Principally, two main types of stimuli are created: the first, described for the ROM-DM, can also be used for the SSM-ROM; the second, developed for the BRAM-DM, can also be used for the SSM-BRAM with a few modifications. Further, the output from either ROM-DM or BRAM-DM can be used as the stimulus for SSM. These two types of stimulus are introduced briefly in the following sections. The details of all test benches can be found in the VHDL code in Appendix H.

5.4.1 TEST BENCH FOR ROM-DM

The test bench for ROM-DM is shown in Figure 5.51. There are only four stimuli provided by this test bench, namely the clock signal clk, the reset signal reset, the chip select signal iselect and the chip enable signal ienable.

FIGURE 5.51 TEST BENCHES FOR ROM-DM, SSM-ROM AND SSM

The stimuli from the test bench of ROM-DM are produced in the following sequence, which are visualised in Figure 5.52.

1) The clock signal clk is derived from the system clock.

2) Reset the module by setting reset to 1 for at least one clock cycle and then setting reset to 0 for the remainder of the clock cycles.

3) De-select and un-enable the module by setting iselect to 0 and ienable to 0.

4) Select and enable the module by setting iselect to 1 and ienable to 1. The ROM-DM then starts to provide its data from cycle 3 as shown in Figure 5.52.

5) After all the data have been outputted from ROM-DM, un-enable and de-select the module by setting ienable to 0 and iselect to 0. The ROM-DM stops providing its data from cycle A+3 as shown in Figure 5.52.

FIGURE 5.52 SIGNALS GENERATED BY THE TEST BENCHES FOR ROM-DM

Although the number of signals generated from the test bench of ROM-DM is not affected by the number of parallel BCM units, the number of cycles (A) needed to provide all of the data from ROM-DM is dependent on the number of parallel BCM units, which can be calculated from

$$A = N_F \, N_S \, N_G \, \mathrm{int}\!\left(\frac{39}{P}\right) \qquad (5.7)$$

where N_F is the number of feature vectors, N_S is the number of senones, N_G is the number of Gaussians in each senone and P is the number of parallel BCM units.
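As a worked example, A can be evaluated for a configuration similar to the one used later for the larger FPGAs (100 feature vectors, 102 senones and, as an assumed example, 8 Gaussians per senone). Here int(39/P) is taken as a rounding-up division so that a partial group of vector elements still costs a cycle; this reading is an interpretation rather than a statement from the thesis.

/* Compute A from equation (5.7) for an example configuration (100 feature
 * vectors, 102 senones, 8 Gaussians per senone).  int(39/P) is taken as a
 * rounding-up division here, which is an interpretation of the text. */
#include <stdio.h>

int main(void)
{
    const long N_F = 100, N_S = 102, N_G = 8;
    const int  P[] = {1, 8, 39};

    for (int i = 0; i < 3; i++) {
        long groups = (39 + P[i] - 1) / P[i];        /* int(39/P), rounded up */
        printf("P = %2d: A = %ld cycles\n", P[i], N_F * N_S * N_G * groups);
    }
    return 0;   /* 3182400, 408000 and 81600 cycles respectively */
}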

As shown in Figure 5.51, the test bench for SSM-ROM is the same as that for ROM-DM, and so the same stimulus shown in Figure 5.51 is used. Before testing SSM-ROM, the SSM is tested. In this case the test bench for SSM consists of the test bench for ROM-DM in addition to the ROM-DM itself. The number of outputs from the test bench for SSM is dependent on the number of parallel BCM units. When implemented with a single BCM unit and with two parallel BCM units, the signals generated by the test benches for SSM are shown in Figures 5.53 and 5.54.

FIGURE 5.53 SIGNALS GENERATED BY TEST BENCHES FOR SSM WHEN IMPLEMENTED WITH A SINGLE BCM UNIT

FIGURE 5.54 SIGNALS GENERATED BY TEST BENCHES FOR SSM WHEN IMPLEMENTED WITH TWO PARALLEL BCM UNITS

In both Figures 5.53 and 5.54, the ROM-DM begins to output data at cycle 3 and finishes at cycle A+3. Note that the total number of cycles (A) required to output all data from ROM-DM can be determined using equation (5.7).

5.4.2 TEST BENCH FOR BRAM-DM

The test bench for BRAM-DM is shown in Figure 5.55. In addition to the four stimuli provided by the test bench for ROM-DM, seven additional stimuli are needed in the BRAM-DM test bench, namely the write control signal iwrite, the write data in_data, the write valid signal for the feature iValidWrite_feat and the write valid signals for the senone data, which include iValidWrite_mean, iValidWrite_var, iValidWrite_dval and iValidWrite_mixw.

FIGURE 5.55 TEST BENCHES FOR BRAM-DM AND SSM WHEN IMPLEMENTED WITH A SINGLE BCM UNIT

The stimuli from the test bench of BRAM-DM are produced in the following sequences, as visualised in Figure 5.56.

FIGURE 5.56 SIGNALS GENERATED FOR BRAM-DM BY TEST BENCHES

1) Derive the clock signal clk from the system clock.

2) Reset the module by setting reset to 1 for at least one clock cycle and then set reset to 0 for the remainder of the clock cycles.

3) De-select and un-enable the module by setting iselect to 0 and ienable to 0.

4) Select and enable the module by setting iselect to 1 and ienable to 1. Set the BRAM-DM to write mode by setting iwrite to 1.

5) The data need to be initially stored in the BRAM so that they are available in the write data signal in_data from cycle 5 in Figure 5.56. As shown in Figure 5.56, one speech feature vector is firstly made available and then the Gaussians from each senone are provided.

6) At any given clock cycle, only one of the write valid signals iValidWrite_feat, iValidWrite_mean, iValidWrite_var, iValidWrite_dval or iValidWrite_mixw can be set to 1 to indicate to where the current write data should be stored.

7) After cycle C+5 in Figure 5.56, all the data for the speech features and senones have been loaded into BRAM-DM, which is set to read mode by setting iwrite to 0.

8) From cycle C+6, the data stored in BRAM-DM are read from the BRAM-DM.

9) After all of the data have been outputted from BRAM-DM, un-enable and de-select the module by setting ienable to 0 and iselect to 0. The BRAM-DM stops providing data from cycle D+6 as shown in Figure 5.56.

The total number of cycles used for loading data into BRAM-DM can be determined using

$$C = 39 N_F + 2\,(39 N_S N_G + 1) \qquad (5.8)$$

and the clock cycle at which the BRAM-DM stops providing data can be found from

$$D = C + A = 39 N_F + 2\,(39 N_S N_G + 1) + N_F \, N_S \, N_G \, \mathrm{int}\!\left(\frac{39}{P}\right) \qquad (5.9)$$

where N_F is the number of feature vectors, N_S is the number of senones, N_G is the number of Gaussians in each senone and P is the number of parallel BCM units.

The test bench for SSM-BRAM is similar to that for BRAM-DM described above, but with an additional stimulus iValidReadScore needed to read out the output scores saved in the BRAM. The test bench for SSM-BRAM can be developed from the test bench for BRAM-DM, as shown in Figure 5.57.

FIGURE 5.57 TEST BENCHES FOR SSM-BRAM

The signals generated by the test bench for SSM-BRAM are shown in Figure 5.58. Before clock cycle D+5, the score read valid signal iValidReadScore should be set to zero and the rest of the signals remain the same, as shown in Figure 5.56. After clock cycle D+5, the chip enable signal ienable is set to 0 to indicate that no more data need to be output from BRAM-DM. From clock cycle D+6, iValidReadScore is set to 1 to indicate that the scores stored in BRAM-DM can be read. At clock cycle E+8, iValidReadScore is set to 0 to indicate that all of the stored scores have been produced. Finally, the chip select signal iselect is set to 0.

FIGURE 5.58 SIGNALS GENERATED FOR SSM-BRAM BY TEST BENCHES

The number of clock cycles used for reading the scores can be found from

$$E = D + N_F \, N_S \qquad (5.10)$$

where N_F is the number of feature vectors, N_S is the number of senones and D is obtained from equation (5.9).
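A short worked example of equations (5.8) to (5.10) is given below. The configuration values are examples only, and int(39/P) is again taken as a rounding-up division.

/* Worked example of equations (5.8)-(5.10): cycles to load the BRAM-DM (C),
 * the cycle at which it stops providing data (D = C + A), and the cycle at
 * which all scores have been read (E).  The configuration is an example. */
#include <stdio.h>

int main(void)
{
    const long N_F = 100, N_S = 102, N_G = 8, P = 8;

    long groups = (39 + P - 1) / P;                 /* int(39/P), rounded up */
    long A = N_F * N_S * N_G * groups;              /* equation (5.7)        */
    long C = 39 * N_F + 2 * (39 * N_S * N_G + 1);   /* equation (5.8)        */
    long D = C + A;                                 /* equation (5.9)        */
    long E = D + N_F * N_S;                         /* equation (5.10)       */

    printf("A=%ld C=%ld D=%ld E=%ld\n", A, C, D, E);
    return 0;   /* A=408000 C=67550 D=475550 E=485750 */
}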

5.5 SUMMARY

This chapter has detailed the hardware design for the most time-consuming function, mgau_eval. The entire design has been presented in four separate parts: the senone score module (SSM) for implementing the function mgau_eval itself in parallel; the on-chip ROM data module (ROM-DM) to provide the fixed input data for SSM; the on-chip BRAM data module (BRAM-DM) to provide the changeable data for SSM; and the test benches to test the designs mentioned above.

The senone score module (SSM) has been parameterised on the top level, so that the different versions of the parallel implementations of the most time-consuming function mgau_eval can be generated easily according to the number of parallel BCM units. The SSM has been built from the following modules: basic calculation module (BCM), parallel adder module (PAM), DVAL calculation module (DCM), Gaussian score calculation module (GSCM), senone score calculation module (SSCM), a 1-bit version of the delay module and a 32-bit version of the delay module.

ROM-DM has been introduced as the first data input module for SSM. In order to provide all the required data for SSM, ROM-DM has been designed to be configured according to the number of parallel BCM units. Five individual ROM modules have been designed, namely ROM-FEAT, ROM-MEAN, ROM-VAR, ROM-DVAL and ROM-MIXW. The read control module (ROM-RCTL) has also been designed to generate the signals which are essential to control these five ROMs. The ROM-DM has then been combined with SSM to create the complete design (SSM-ROM).

The second data input module for SSM, BRAM-DM, has been designed to overcome the drawback of ROM-DM that the data provided to SSM cannot be changed. The BRAM-DM has a similar top-level structure to ROM-DM, which includes the read control module (BRAM-RCTL) and five BRAM modules (BRAM-FEAT, BRAM-MEAN, BRAM-VAR, BRAM-DVAL, BRAM-MIXW). These five modules are built from two types of BRAM module, the basic BRAM module (BRAM-BASIC) for the last two and the parallel BRAM module (BRAM-PARALLEL) for the first three. Once all the data have been loaded into BRAM-DM during write mode, the BRAM-DM can provide data to SSM in the same way as ROM-DM. The BRAM-DM has also been combined with SSM to create the complete design (SSM-BRAM).

To verify the correctness of these designs, the test benches for SSM, ROM-DM, BRAM-DM, SSM-ROM and SSM-BRAM have also been developed and the stimuli that are provided by these test benches have also been explained.

Two versions of hardware implementation of mgau_eval have now been developed, namely SSM-ROM and SSM-BRAM, which consist of the SSM with either the ROM-DM or the BRAM-DM. In the next chapter, these designs are targeted to a range of FPGAs and the necessary resources and performance considerations needed to meet a range of implementation requirements are investigated.

6 HARDWARE SYNTHESIS, SIMULATION AND IMPLEMENTATION

This chapter focuses on verifying the VHDL designs introduced in the previous chapter, namely the senone score module (SSM), the on-chip ROM data module (ROM-DM), the on-chip BRAM data module (BRAM-DM), the SSM with on-chip ROM data module (SSM-ROM) and the SSM with on-chip BRAM data module (SSM-BRAM). The process of verification involves three parts: hardware synthesis to identify the most appropriate FPGAs on which the designs can run, hardware simulation to prove the correctness of the various parallel designs of mgau_eval, and hardware implementation to demonstrate that the parallel hardware realisations can operate alongside the remainder of the Sphinx software.

The first part of this chapter describes the synthesis of the designs using the Xilinx integrated synthesis environment (ISE) [118] for a wide range of target FPGAs, namely the VirtexE, Virtex5, Virtex6 and Virtex7 families. Using the available resources on the FPGAs, experiments are set up for each design using different numbers of parallel BCM units, feature vectors and senones. This permits a wide range of investigations to determine appropriate resources and the clock frequencies at which the designs can be executed.

In the second part of this chapter, the designs are simulated using ModelSim [119]. Following the introduction of the simulation procedure, which includes behavioural simulation and platform-based simulation, ModelSim is used to execute the models to assess both the functional and timing aspects. In this work, the main advantage of using simulation is the flexibility it affords in choosing FPGAs, with any of the available FPGAs in ISE being a possible target platform. The execution time required to perform the senone score calculations is also analysed in this part, so that the performance achieved by the design on different platforms can be compared.

The third part of this chapter considers the hardware implementation of the SSM-ROM and SSM-BRAM designs on an ARM Integrator/AP evaluation board [120]. The Integrator/AP motherboard is configured to include an ARM9 core module to run C code to prepare data for execution of mgau_eval residing on an XCV2000E FPGA logic module. A brief introduction to the Integrator/AP and the advanced microcontroller bus architecture (AMBA) bus specification is presented to explain how the communication procedure operates. The design of the interface for the implementation of SSM-ROM and SSM-BRAM on the Integrator/AP board is then described, and the implementation used to generate performance results is presented.

6.1 HARDWARE SYNTHESIS

This section concentrates on the synthesis of the VHDL designs described in the previous chapter, namely SSM, ROM-DM, BRAM-DM, SSM-ROM and SSM-BRAM. The purpose of the synthesis is to find out what types and numbers of resources are required to construct the designs, and the maximum clock frequencies at which the designs can run on different FPGAs. This is presented in the following four sections: targeted FPGAs, experimental setup, FPGA resource utilisation and maximum clock frequency.

6.1.1 TARGETED FPGAS

The FPGAs listed in Table 6.1 are targeted for synthesis and are chosen as they contain different mixes of logical resources, allowing the most appropriate platform to be identified. There are three main types of resource available in Xilinx FPGAs [121] that are most relevant to our design, namely basic configurable logic blocks (CLB), dedicated block RAM (BRAM) and advanced digital signal processing blocks (DSP48E).

TABLE 6.1 RESOURCES AVAILABLE IN SELECTED FPGAS

Family     FPGA          CLB      BRAM (Kb)   DSP48E
VirtexE    XCV2000E      9600     655         N/A
VirtexE    XCV3200E      16224    851         N/A
Virtex5    XC5VFX70T     6080     5328        128
Virtex5    XC5VFX200T    16320    16416       384
Virtex5    XC5VLX330T    25920    11664       192
Virtex5    XC5VSX240T    18720    18576       1056
Virtex6    XC6VLX760     59280    25920       864
Virtex6    XC6VSX475T    37200    38304       2016
Virtex7    XC7V2000T     152700   46512       2160
Virtex7    XC7VX1140T    89000    67680       3360

The XCV2000E FPGA is used in the hardware implementation using the ARM Integrator/AP board as described in Section 6.3. The XCV3200E FPGA is the largest FPGA (in terms of resources available) in the VirtexE family. The XC5VFX70T FPGA is available in our research laboratory and although it is not used in the current work, it may be used in future experiments. The XC5VFX200T is the largest FPGA in the same group as the XC5VFX70T in the Virtex5 family.

The last six FPGAs are also chosen from Virtex5, Virtex6 and Virtex7 families, because they are the FPGAs which include either the largest number of CLB or BRAM from its own family. In general, the more recent FPGA families can accommodate more feature vectors and senones in the ROM-DM or BRAM-DM implementations.

As can be seen in Table 6.1, the VirtexE FPGAs only include CLBs and BRAMs, whereas all the other FPGAs also include DSP48E. The DSP48E in the Virtex5 FPGAs contains a 25 x 18 multiplier, a 48-bit adder and a 48-bit accumulator. The DSP48E resources in the Virtex6 and Virtex7 FPGAs are the same as those available in the Virtex5 FPGAs.

CLBs are the basic elements used to construct logic. Although the CLBs in all FPGA families consist of two slices, each slice includes different numbers of slice registers and slice look-up tables (LUTs). Each slice in a VirtexE FPGA [122] consists of two slice registers and two 4-input slice LUTs, while each slice in a Virtex5 FPGA [123] consists of four slice registers and four 6-input slice LUTs. Each slice in a Virtex6 [124] or Virtex7 FPGA [125] consists of eight slice registers and four 6-input slice LUTs. Note that the LUTs in the CLB can also be used to create distributed RAM. After the designs are implemented in the targeted FPGA in ISE, the resource utilisation is presented in terms of the percentage usage of slice registers, slice LUTs, BRAMs and DSP48Es.

6.1.2 EXPERIMENTAL SETUP

To cover the range of FPGAs investigated, two versions of the integrated synthesis environment (ISE) are used to synthesise the designs, namely ISE 8 (needed for the VirtexE family) and ISE 13 (for the remaining FPGAs).

The senone score module (SSM) is only dependent on the number of parallel basic calculation module (BCM) units. The experiments in synthesising SSM are set up in the following way.

1) The design is synthesised by ISE using the FPGAs shown in Table 6.1.

2) For each targeted FPGA, although designs with a number of parallel BCM units ranging from 1 to 39 have been generated, the number of parallel BCM units is only chosen from 1, 2, 3, 4, 5, 6, 7, 8, 10, 13, 20 and 39. This is because designs with certain numbers of parallel BCM units take the same amount of time to perform the calculations for one Gaussian as designs with a smaller number of parallel BCM units, as shown in Table 5.1.

For the remaining designs, ROM-DM, BRAM-DM, SSM-ROM and SSM-BRAM, the resource usage is dependent not only on the number of parallel BCM units, but also on the number of feature vectors and the number of senones that are stored. The synthesis experiments are set up in a different way, described below:

1) The targeted FPGAs are divided into two groups: the smaller FPGAs (VirtexE and Virtex5) and the larger FPGAs (Virtex6 and Virtex7). For the first group, the number of senones stored in the designs is fixed at three, which is the minimum needed to represent the three states in an HMM model for one phoneme. Note that the number of speech feature vectors is also set at three. For the larger group, the number of senones stored in the designs is set at 102, which represents all of the CI phonemes in the AN4 database. Here the number of speech feature vectors is set to 100, which is sufficient for one second of speech.

2) For each targeted FPGA, the number of parallel BCM units used in the investigation is 1, 2, 3, 4, 5, 6, 7, 8, 10, 13, 20 and 39, as discussed above.

3) After the resource usages by the individual FPGA using different numbers of parallel BCM units have been generated, the maximum number of senones that can be implemented in the FPGA can be projected.

The results for synthesising all of these designs are divided into two parts, namely FPGA resource utilisation and the maximum clock frequency available, and these are presented in Sections 6.1.3 and 6.1.4 respectively.

6.1.3 FPGA RESOURCE UTILISATION

6.1.3.1 SENONE SCORE MODULE (SSM)

Three types of resources have been used for building SSM, namely the slice register, slice LUT and DSP48E. Slice registers are used to build registers, and their usage for the senone score module (SSM) in the FPGAs is shown in Figure 6.1. For all of the FPGAs, the percentage usage of slice registers increases with the number of parallel BCM units, but all of the FPGAs have more than sufficient slice register resources available.

(A) VIRTEXE AND VIRTEX5 FPGAS (B) VIRTEX6 AND VIRTEX7 FPGAS

FIGURE 6.1 SLICE REGISTER USAGE FOR SSM

The slice LUT is mainly used to build the BCM unit or the ROM data. The slice LUT usage for SSM is shown in Figure 6.2. For larger numbers of parallel BCM units, the VirtexE FPGAs and the XC5VFX70T FPGA have insufficient LUTs to implement the designs. Where the utilisation figure is over 100%, the results indicate how many FPGAs of a given type may be needed to implement the design. For example, the figure of over 200% for the XCV2000E shows that it would be necessary to target three of these devices.

For the VirtexE FPGAs, the percentage of slice LUTs used increases with the number of parallel BCM units, because the basic calculations needed in the BCM unit can only be constructed from slice LUTs.

For the Virtex5 FPGAs, only a small percentage of slice LUTs is utilised when the number of parallel BCM units is smaller than 13. Furthermore, these values do not change with the number of parallel BCM units, as the basic calculations needed in the BCM unit are constructed from the DSP48E units in the Virtex5 FPGAs. When the number of parallel BCM units is more than 13, the slice LUT usage for XC5VFX70T and XC5VLX330T increases dramatically, as the DSP48Es are not sufficient in number to construct the basic calculations needed in the BCM unit and so slice LUTs are used instead. The slice LUT usage for XC5VSX240T is lower than that for XC5VFX200T; the two curves overlap in Figure 6.2(a).

For the Virtex6 and Virtex7 FPGAs, the slice LUT usage for SSM remains almost the same as the number of parallel BCM units increases; in these FPGAs, the LUTs are mainly used for constructing the senone score calculation module (SSCM), which includes a logarithmic look-up table implemented in ROM.


(A) VIRTEXE AND VIRTEX5 FPGAS (B) VIRTEX6 AND VIRTEX7 FPGAS

FIGURE 6.2 SLICE LUT USAGE FOR SSM

The DSP48E usage for SSM is shown in Figure 6.3. Note that there are no DSP48Es available in the VirtexE FPGAs, so their usage is shown as 0 in Figure 6.3(a). When the number of parallel BCM units is low, all of the Virtex5 FPGAs have sufficient DSP48Es to implement the design, and the percentage of DSP48Es used increases with the number of parallel BCM units. When the number of parallel BCM units is higher, only XC5VFX200T and XC5VSX240T have enough DSP48Es to construct all the basic calculations required by the BCM unit.

For XC5VFX70T and XC5VLX330T, the basic calculations required by the BCM unit are built from slice LUTs once the DSP48E resources are exhausted. It can be seen from Figure 6.2(a) that the percentage of slice LUTs used in XC5VFX70T and XC5VLX330T increases dramatically when the number of parallel BCM units is 13 or more.

(A) VIRTEXE AND VIRTEX5 FPGAS (B) VIRTEX6 AND VIRTEX7 FPGAS

FIGURE 6.3 DSP48E USAGE FOR SSM

6.1.3.2 ON-CHIP ROM DATA MODULE (ROM-DM)

The implementation of the on-chip ROM data module (ROM-DM) uses a combination of slice register and slice LUT. The percentage of slice register used by the FPGAs in implementing the on-chip ROM is shown in Figure 6.4. The corresponding slice LUT usages are shown in Figure 6.5.

In Figure 6.4, the slice register usage generally increases with the number of parallel BCM units, with only a few exceptions at smaller numbers of parallel BCM units, such as 4 or 8. For the VirtexE and Virtex5 FPGAs, only three feature vectors and three senones are stored in on-chip ROM. The slice register usage is only around 3% in the XCV2000E and 2% or less in all the remaining FPGA implementations. For the Virtex6 and Virtex7 FPGAs, although there are 100 feature vectors and 102 senones stored in on-chip ROM, the slice register usage is relatively low.

(A) VIRTEXE AND VIRTEX5 FPGAS (3 FEATURE VECTORS AND 3 SENONES)   (B) VIRTEX6 AND VIRTEX7 FPGAS (100 FEATURE VECTORS AND 102 SENONES)

FIGURE 6.4 SLICE REGISTER USAGE FOR ROM-DM

(A) VIRTEXE AND VIRTEX5 FPGAS (3 FEATURE VECTORS AND 3 SENONES)   (B) VIRTEX6 AND VIRTEX7 FPGAS (100 FEATURE VECTORS AND 102 SENONES)

FIGURE 6.5 SLICE LUT USAGE FOR ROM-DM

Figure 6.5 shows that the slice LUT usage generally increases with the number of parallel BCM units. Compared to the slice register usage, the slice LUT usage is much higher, which means that the on-chip ROM is mainly constructed from slice LUTs. One obvious pattern in the results is that the LUT usage drops at certain numbers of parallel BCM units, which is particularly marked at 4, 8 and 20 units.

To determine the senone capacity of the FPGAs, the use of slice LUT is investigated as it is a scarcer resource in this application than slice register. The projected maximum numbers of senones that can fit into the different FPGAs are shown in Figure 6.6. The maximum number of senones does not decrease monotonically with the number of parallel BCM units, but shows a reduced capacity at even numbers of BCM units.

(A) VIRTEXE AND VIRTEX5 FPGAS (B) VIRTEX6 AND VIRTEX7 FPGAS

FIGURE 6.6 PROJECTED MAXIMUM NUMBER OF SENONES IN ROM-DM THAT CAN BE IMPLEMENTED IN FPGAS

The XCV2000E FPGA can only hold up to 58 senones when a single BCM unit is synthesised and only about 8 senones when 39 parallel BCM units are synthesised. The XC5VLX330T FPGA has the largest number of slice LUT in the Virtex5 FPGA range and can accommodate about 500 senones when two parallel BCM units are synthesised, but fewer than 100 for 39 BCM units. In contrast, the Virtex6 and Virtex7 FPGAs have sufficient capacity to hold several thousands of senones making it normally possible to hold the number of senones required for the speech recognition.

6.1.3.3 ON-CHIP BRAM DATA MODULE (BRAM-DM)

In the implementation of the on-chip BRAM data module (BRAM-DM), not only are the slice register and the slice LUT used, but also the BRAM. The slice register usage for BRAM-DM is shown in Figure 6.7. The slice register usage increases in a monotonic manner with the number of parallel BCM units. For all FPGAs, the slice register usage for the on-chip BRAM is greater than that for the on-chip ROM.

(A) VIRTEXE AND VIRTEX5 FPGAS (3 FEATURE VECTORS AND 3 SENONES)   (B) VIRTEX6 AND VIRTEX7 FPGAS (100 FEATURE VECTORS AND 102 SENONES)

FIGURE 6.7 SLICE REGISTER USAGE FOR BRAM-DM

The slice LUT usage for on-chip BRAM is shown in Figure 6.8. The BRAM implementation uses fewer slice LUT compared to the number used by on-chip ROM, since they are mainly built from the available BRAM. It is readily apparent that the slice LUT usage for VirtexE FPGAs increases dramatically when the number of parallel BCMs is larger than 13.

The BRAM usage for BRAM-DM is shown in Figure 6.9. For VirtexE and Virtex5 FPGAs, the BRAM usage generally increases according to the number of parallel BCM units. But for Virtex6 and Virtex7, the BRAM usage does not increase proportionally, but rather behaves in a similar fashion to the slice LUT usage for ROM-DM shown in Figure 6.5.

(A) VIRTEXE AND VIRTEX5 FPGAS (3 FEATURE VECTORS AND 3 SENONES)   (B) VIRTEX6 AND VIRTEX7 FPGAS (100 FEATURE VECTORS AND 102 SENONES)

FIGURE 6.8 SLICE LUT USAGE FOR BRAM-DM

(A) VIRTEXE AND VIRTEX5 FPGAS (3 FEATURE VECTORS AND 3 SENONES)   (B) VIRTEX6 AND VIRTEX7 FPGAS (100 FEATURE VECTORS AND 102 SENONES)

FIGURE 6.9 BRAM USAGE FOR BRAM-DM

As the first resource to be exhausted as memory use increases is the BRAM, its usage is used to project the maximum number of senones that can be synthesised on the FPGAs. The results are shown in Figure 6.10. The XCV2000E FPGA can only hold up to 21 senones when a single BCM unit is synthesised and around three senones when 39 parallel BCM units are synthesised. The XC5VSX240T FPGA has the largest BRAM capacity in the Virtex5 family and can store around 400 senones when two BCM units are implemented, but fewer than 50 for 39 BCM units. With the latest Virtex6 and Virtex7 FPGAs, the number of senones that can be included in BRAM-DM ranges from 500 to 3000. Although this number is less than the number of senones that can be included in ROM-DM, it shows that the BRAM-DM is also a feasible solution for providing data to the SSM.

(A) VIRTEXE AND VIRTEX5 FPGAS (B) VIRTEX6 AND VIRTEX7 FPGAS

FIGURE 6.10 PROJECTED MAXIMUM NUMBER OF SENONES IN BRAM-DM THAT CAN BE IMPLEMENTED IN FPGAS

6.1.3.4 SSM-ROM AND SSM-BRAM

This section shows the resources used in the implementation of SSM with ROM-DM (SSM-ROM) and SSM with BRAM-DM (SSM-BRAM). As the SSM-ROM consists of SSM and ROM-DM, the resource usages for SSM-ROM should be the combination of the resource usages for SSM and ROM-DM. Similarly, the resource usages for SSM-BRAM should be the combination of the resource usages for SSM and BRAM-DM.

As the DSP48E is only used for constructing SSM, its usage for both SSM-ROM and SSM-BRAM is the same as the DSP48E usage for SSM shown in Figure 6.3. Similarly, the BRAM usage in SSM-BRAM is the same as that shown for BRAM-DM in Figure 6.9.

For the slice register and slice LUT usages, the results for the designs in XCV2000E and XC5VFX70T FPGAs are shown in Figure 6.11 and Figure 6.12 respectively.

(A) XCV2000E FPGA (B) XC5VFX70T FPGA

FIGURE 6.11 COMPARISON OF SLICE REGISTER USAGE

(A) XCV2000E FPGA (B) XC5VFX70T FPGA

FIGURE 6.12 COMPARISON OF SLICE LUT USAGE

In Figures 6.11(a) and 6.12(a), the slice register and slice LUT usages for SSM-ROM are missing for 39 parallel BCM units, as this configuration could not be successfully synthesised due to a shortage of available resources. From Figure 6.11, it can be seen that the slice register usage for SSM-ROM is less than that for SSM, and that the slice register usage for SSM-BRAM is roughly equal to the combined slice register usages of SSM and BRAM-DM. The results in Figure 6.12 show that the slice LUT usages for both SSM-ROM and SSM-BRAM are almost the same as the slice LUT usage for SSM.

6.1.4 SUMMARY OF RESOURCE USAGE

From the results discussed above, the following conclusions can be drawn:

1) The SSM is mainly constructed from the DSP48E or the slice LUT if the DSP48E is not available.

2) The ROM-DM is mainly constructed from the slice LUT.

3) The BRAM-DM is mainly constructed from the BRAM.

4) The SSM-ROM is mainly constructed from DSP48E and slice LUT.

5) The SSM-BRAM is mainly constructed from DSP48E and the BRAM.

In order to implement the SSM when the number of parallel BCM units is 20 or 39, two and three XCV2000E FPGAs are needed respectively. To implement the SSM when the number of parallel BCM units is 39, two XC5VFX70T FPGAs are needed. The other Virtex5, Virtex6 and Virtex7 FPGAs shown in Table 6.1 have sufficient DSP48E units to construct the SSM even when the number of parallel BCM units is 39.

In order to implement ROM-DM and BRAM-DM with more than 1000 senones, the Virtex6 and Virtex7 FPGAs should be used. To achieve the same task, the numbers of XCV2000E FPGAs required are 18 and 46 respectively when only a single BCM unit is implemented, and 118 and 333 respectively when 39 parallel BCM units are implemented.

6.1.5 MAXIMUM CLOCK FREQUENCY

The maximum clock frequencies that can be achieved for the designs developed in the previous sections are now presented for the FPGAs being considered.

6.1.5.1 SENONE SCORE MODULE (SSM)

The maximum clock frequencies that can be achieved for the SSM on the FPGAs are shown in Figure 6.13. It can be seen that the maximum clock frequency is consistent when the number of parallel BCM units is below 13, but becomes slightly lower when the number of parallel BCM units is 20 or 39 for the XC5VFX70T FPGA, because the parallel BCM units are built from slice LUTs when there are insufficient DSP48E units. This stable clock frequency is achieved largely as a result of the fully pipelined design of the SSM.

The maximum achievable clock frequencies are mainly dependent on the FPGA which is used. The maximum frequency achievable is 58.94 MHz for both the XCV2000E and XCV3200E; 128.4 MHz for the XC5VFX70T; and 113.15 MHz for the XC5VFX200T, XC5VLX330T and XC5VSX240T. The improvements apparent in the Virtex5 FPGAs are largely due to the DSP48E units, which are dedicated hardware resources not available in the VirtexE FPGAs.

FIGURE 6.13 MAXIMUM CLOCK FREQUENCIES ACHIEVED FOR SSM IN FPGAS

6.1.5.2 ON-CHIP ROM DATA MODULE (ROM-DM)

Figure 6.14 shows that the maximum clock frequencies achieved for ROM-DM are around 58 to 70 MHz for the VirtexE FPGAs and in the range 150 MHz to 250 MHz for the Virtex5, Virtex6 and Virtex7 FPGAs, depending on the number of BCM units. These differences result from the different architectures of the basic CLBs in these FPGA families, as explained in Section 6.1.1.

6.1.5.3 ON-CHIP BRAM DATA MODULE (BRAM-DM)

The maximum clock frequencies achieved for the on-chip BRAM data module (BRAM-DM) are shown in Figure 6.15. The overall maximum achievable frequencies for BRAM-DM are much higher than those achieved for ROM-DM, being about 300 MHz for the Virtex5 FPGAs and 100 MHz for the VirtexE FPGAs. The on-chip BRAM data module (BRAM-DM) is therefore a faster and more consistent way of providing data to the SSM than the on-chip ROM. These improvements arise because the BRAM-DM is built from the dedicated BRAM resources available in the FPGAs, while the ROM-DM is built from general-purpose slice LUTs.

FIGURE 6.14 MAXIMUM CLOCK FREQUENCIES ACHIEVED FOR ROM-DM IN FPGAS

FIGURE 6.15 MAXIMUM CLOCK FREQUENCIES ACHIEVED FOR BRAM-DM IN FPGAS

6.1.5.4 SSM-ROM AND SSM-BRAM

The maximum clock frequencies that are achieved by SSM-ROM and SSM-BRAM implementations are shown in Figures 6.16 and 6.17 respectively. By comparison with Figures 6.13 to 6.15, it can be seen that the maximum achievable frequency is restricted by the slower SSM module. Consequently, when considering execution time, either ROM-DM or BRAM-DM can be used to provide data to SSM.

FIGURE 6.16 MAXIMUM CLOCK FREQUENCIES ACHIEVED FOR SSM-ROM IN FPGAS

FIGURE 6.17 MAXIMUM CLOCK FREQUENCIES ACHIEVED FOR SSM-BRAM IN FPGAS

6.2 HARDWARE SIMULATION

This section details the simulation of the VHDL designs, namely SSM, ROM-DM, BRAM-DM, SSM-ROM and SSM-BRAM. The purpose of the simulation is to establish the timing behaviour of the designs and to demonstrate that it is possible to implement them on the targeted FPGAs. This is presented in three parts: the simulation procedure, the simulation results and an execution time analysis.

6.2.1 SIMULATION PROCEDURE

The simulation of the hardware designs is achieved using ISE and ModelSim. Two types of simulation are carried out. The first is a behavioural simulation to assess the correctness of the VHDL code of the design and the second is a platform-based simulation to verify that the design can execute on the chosen FPGA hardware. The behavioural simulation and the platform-based simulation are carried out using the iterative processes introduced below.

The behavioural simulation shown in Figure 6.18 involves the following steps:

1) design the VHDL for the design;

2) design the corresponding test bench for the design;

3) use the test bench to verify the design at the behavioural level in ModelSim;

4) if the performance does not meet the requirements, return to step 1.

FIGURE 6.18 PROCEDURE OF BEHAVIOURAL SIMULATION

For the platform-based simulations shown in Figure 6.19, the following stages are required:

1) Select the targeted FPGA for the design and implementation.

2) Use ISE to process the design for the FPGA, involving synthesising, translating, mapping, and placing and routing. There are four levels of simulation module that can be generated in ISE, namely the post-synthesise, post-translate, post-map and post-place-and-route simulation modules.

3) The four levels of simulation module are generated by ISE in sequence and each is simulated using the test bench in ModelSim. Only once the simulation module has passed verification successfully at a given level is the next considered.

4) If the verification at any simulation module level fails, then once the VHDL code of the design has been modified, steps 1 to 3 must be followed again until the testing is successful.

5) Only once all the levels of simulation module have passed verification in ModelSim, can the design be considered to be working correctly.

FIGURE 6.19 PROCEDURE OF PLATFORM-BASED SIMULATION

6.2.2 SIMULATION RESULTS

Different target FPGAs have been chosen for the simulation of each design. Designs with smaller numbers of parallel BCM units can be simulated on the smaller FPGAs, but the designs with 39 parallel BCM units and larger numbers of feature vectors and senones can only be tested on the larger FPGAs, as the different levels of simulation module cannot be generated if the FPGA is not large enough to hold the whole design.

All the VHDL designs are simulated in the following order. Firstly, the ROM-DM is simulated to ensure the input data for the SSM can be provided correctly. Secondly, the SSM is simulated to verify the correctness of the parallel implementation in hardware of the most time-consuming function mgau_eval. Thirdly, the complete system SSM-ROM is simulated to confirm the input data can be fed to the SSM from ROM-DM successfully. Fourthly, the BRAM-DM is simulated to ensure the input data can be loaded and accessed correctly. Finally, the complete system SSM-BRAM is simulated to make sure that the input data can be provided to the SSM from BRAM-DM successfully.

As the results of the second-stage (platform-based) simulations, namely the post-synthesise, post-translate, post-map and post-place-and-route simulations, are similar to those of the behavioural simulations, only the results of the behavioural simulation are presented below. In the following results, the number of feature vectors and the number of senones included in ROM-DM, BRAM-DM, SSM-ROM and SSM-BRAM are both fixed at three.

The simulation results for ROM-DM when a single BCM unit and two parallel BCM units are implemented are shown in Figure 6.20 for the first 650 ns of operation, which is sufficient to demonstrate feature vector and senone access. With a single BCM unit, the outputs from ROM-DM start at about 75 ns and the first whole feature vector has been fully output at about 625 ns. During the same period, two complete feature vectors have been output from ROM-DM in the case where two parallel BCM units are implemented. The data output from ROM-DM have been confirmed to be correct. The clock frequencies used for the simulation are 70.487 MHz and 71.306 MHz for the designs with a single BCM unit and two parallel BCM units respectively. The results in Figure 6.20 show only the first 650 ns of output from ROM-DM, in order to illustrate in detail how the parameters for the first feature vector and part of the first senone are provided.

(A) SINGLE BCM UNIT (B) TWO PARALLEL BCM UNITS

FIGURE 6.20 SIMULATION RESULTS FOR THE ON-CHIP ROM DATA MODULE (ROM-DM)

The simulation results for SSM when a single BCM unit and two parallel BCM units are implemented are shown in Figure 6.21. The clock frequency used for the simulation is 58.914 MHz, which is not affected by the number of parallel BCM units. Figure 6.21 shows nine senone scores output from the SSM, with the senone score ready signal ord indicating when each score is ready. It takes about 45500 ns to produce these nine senone scores when only a single BCM unit is implemented and 23000 ns when two parallel BCM units are implemented.

The simulation results for SSM-ROM for one to four parallel BCM units are shown in Figure 6.22. The clock frequency used for the simulation is 58.418 MHz, which again is not affected by the number of parallel BCM units. With a single BCM unit, it takes 51000 ns for the SSM-ROM to output the nine senone scores. These values are correct, which confirms that the input data has been provided to the SSM correctly inside SSM-ROM. The time taken by the SSM-ROM to produce the nine senone scores reduces to 26000 ns, 17000 ns and 13000 ns when the number of parallel BCM units is 2, 3 and 4 respectively, demonstrating the substantial improvement in calculation time that can be obtained by parallel implementation.

(A) SINGLE BCM UNIT (B) TWO PARALLEL BCM UNITS

FIGURE 6.21 SIMULATION RESULT FOR THE SENONE SCORE MODULE (SSM)

(A) SINGLE BCM UNIT (B) TWO PARALLEL BCM UNITS (C) THREE PARALLEL BCM UNITS (D) FOUR PARALLEL BCM UNITS

FIGURE 6.22 SIMULATION RESULT FOR SSM WITH ROM-DM (SSM-ROM)

The simulation results for BRAM-DM when a single BCM unit and two parallel BCM units are implemented are shown in Figures 6.23 and 6.24. The clock frequency used for the simulation is 109.533 MHz and this is not affected by the number of parallel BCM units. The results show that it takes 20500 ns and 21000 ns respectively to load all of the input data into the BRAM-DM when a single BCM unit and two parallel BCM units are implemented. The latter takes slightly longer because the feature, mean and variance vectors are each padded to a length of 40 when two parallel BCM units are implemented, compared to 39 for a single unit. It then takes about 28000 ns and 14500 ns to output the data from BRAM-DM when a single BCM unit and two parallel BCM units are implemented. The simulation showed that BRAM-DM can provide the data in the same manner as does the ROM-DM.

The simulation results for SSM-BRAM when a single BCM unit and two parallel BCM units are implemented are shown in Figures 6.25 and 6.26. The clock frequency used for the simulation is 58.914 MHz and this is not affected by the number of parallel BCM units. Loading the data into BRAM-DM takes 33000 ns and 34000 ns for a single BCM unit and two parallel BCM units respectively. It then takes 45000 ns and 23000 ns for the SSM-BRAM to output six senone scores for a single BCM unit and two parallel BCM units. The senone scores obtained from the simulation are the same as those output by the software, which confirms that the complete SSM-BRAM is fully functional.

FIGURE 6.23 SIMULATION RESULTS FOR BRAM-DM WHEN IMPLEMENTED ON A SINGLE BCM UNIT

FIGURE 6.24 SIMULATION RESULTS FOR BRAM-DM WHEN IMPLEMENTED ON TWO PARALLEL BCM UNITS

FIGURE 6.25 SIMULATION RESULTS FOR SSM WITH BRAM-DM (SSM-BRAM) WHEN IMPLEMENTED ON A SINGLE BCM UNIT

FIGURE 6.26 SIMULATION RESULT FOR SSM WITH BRAM-DM (SSM-BRAM) WHEN IMPLEMENTED WITH TWO PARALLEL BCM UNITS

6.2.3 EXECUTION TIME ANALYSIS

The total number of clock cycles taken to calculate senone scores can be determined as follows:

1) Find the number of clock cycles required to calculate one senone score for different number of BCM units;

2) Find the number of clock cycles required to calculate the first senone score, which is different from that needed to obtain subsequent senone scores, as the calculation of the first senone score involves filling the pipeline with data from memory;

3) The number of calculation cycles taken by the senone score module (SSM) can be determined and these are shown in Table 5.4;

4) The cycles taken to provide data for one senone can be found by multiplying the value shown in Table 5.1 by the number of Gaussians (1, 2, 4, or 8);

5) As the design is fully pipelined, the data required in obtaining the next senone score are provided to the senone score module (SSM) while the previous senone score is being calculated. Hence, the number of clock cycles required for calculating all but the first score is the number of cycles taken to provide data.

Using these calculations, the total number of clock cycles taken to calculate the senone scores for one frame of speech is shown in Table 6.2 for a range of numbers of BCM units. The 102 CI senone scores are for the 34 CI basephones and the 1000 CD senone scores are for the tied states of the CD triphones in the AN4 database; these are also typical of the numbers of senones needed for a general speech recognition task, according to the optimisation results shown in Chapter 3.

TABLE 6.2 THE NUMBER OF CLOCK CYCLES TAKEN TO CALCULATE THE SENONE SCORES FOR ONE SPEECH FRAME

Number of parallel BCM units | Cycles to perform the parallel calculation | Cycles to provide a single Gaussian | Cycles to find the first senone score | Cycles to find each subsequent senone score | Cycles to calculate 102 CI senone scores | Cycles to calculate 102 CI and 1000 CD senone scores
1        | 7  | 39 | 319 | 312 | 31831 | 343831
2        | 8  | 20 | 168 | 160 | 16328 | 176328
3        | 9  | 13 | 113 | 104 | 10617 | 114617
4        | 9  | 10 | 89  | 80  | 8169  | 88169
5        | 10 | 8  | 74  | 64  | 6538  | 70538
6        | 10 | 7  | 66  | 56  | 5722  | 61722
7        | 10 | 6  | 58  | 48  | 4906  | 52906
8        | 10 | 5  | 50  | 40  | 4090  | 44090
9        | 11 | 5  | 51  | 40  | 4091  | 44091
10 to 12 | 11 | 4  | 43  | 32  | 3275  | 35275
13 to 16 | 11 | 3  | 35  | 24  | 2459  | 26459
17 to 19 | 12 | 3  | 36  | 24  | 2460  | 26460
20 to 32 | 12 | 2  | 28  | 16  | 1644  | 17644
33 to 38 | 13 | 2  | 29  | 16  | 1645  | 17645
39       | 13 | 1  | 21  | 8   | 829   | 8829
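The totals in the last two columns of Table 6.2 follow directly from the first-senone and subsequent-senone figures: because the design is fully pipelined, only the first senone score pays the pipeline-fill cost and every later score costs only the data-provision cycles. As an illustration only (the function below is not part of the Sphinx 3 or VHDL source), a short C fragment reproducing these totals is:

```c
#include <stdio.h>

/* Total clock cycles to score n_senones senones in one frame:
 * the first senone costs 'first' cycles (data plus pipeline fill),
 * every following senone overlaps with the data transfer and costs
 * only 'subsequent' cycles (see Table 6.2).
 */
static unsigned long total_cycles(unsigned long first,
                                  unsigned long subsequent,
                                  unsigned long n_senones)
{
    return first + (n_senones - 1UL) * subsequent;
}

int main(void)
{
    /* Values for a single BCM unit, taken from Table 6.2. */
    printf("%lu\n", total_cycles(319, 312, 102));   /* 31831  */
    printf("%lu\n", total_cycles(319, 312, 1102));  /* 343831 */

    /* Values for 39 parallel BCM units. */
    printf("%lu\n", total_cycles(21, 8, 1102));     /* 8829   */
    return 0;
}
```

The same relationship holds for every row of the table.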

The maximum frequencies at which the SSM can run are shown in Table 6.3. Taking this maximum frequency into account, the time to calculate 102 CI senone scores and 1000 CD senone scores for one frame of speech can be determined; the results are shown in Figure 6.27.

TABLE 6.3 THE MAXIMUM FREQUENCIES AT WHICH THE SSM-BRAM CAN RUN

Family  | FPGA       | Maximum frequency (MHz)
VirtexE | XCV2000E   | 58.914
VirtexE | XCV3200E   | 58.914
Virtex5 | XC5VFX70T  | 128.154
Virtex5 | XC5VFX200T | 113.154
Virtex5 | XC5VLX330T | 113.154
Virtex5 | XC5VSX240T | 113.154
Virtex6 | XC6VLX760  | 117.313
Virtex6 | XC6VSX475T | 117.313
Virtex7 | XC7V2000T  | 136.199
Virtex7 | XC7VX1140T | 136.199

FIGURE 6.27 TIME TAKEN TO CALCULATE 102 CI AND 1000 CD SENONE SCORES FOR A SINGLE FRAME OF SPEECH

According to the profiling results presented in Chapter 4, the mgau_eval function requires one third to two thirds of the total recognition time depending on the recognition task. In order to achieve real-time performance, all calculations needed for a speech feature vector must be completed within the 10 ms frame of speech. The VHDL implementation of the SSM presented here performs only the calculations of the most time-consuming function mgau_eval, so all calculations performed by the SSM should be completed in less than 3.3 ms to permit real-time operation for all tasks.

Figure 6.27 shows that the fully pipelined hardware implementation of mgau_eval can be computed in real time on the Virtex5, Virtex6 and Virtex7 FPGAs even with a single BCM, but two BCM units are required for the VirtexE FPGAs. With 39 parallel BCM units, all essential calculations for each frame complete in 0.151 ms on the VirtexE FPGAs, 0.078 ms on the Virtex5 FPGAs, 0.075 ms on the Virtex6 FPGAs, and 0.065 ms on the Virtex7 FPGAs.

The results in Figure 6.27 show only the calculation times once the data are loaded. In practice, the 39 32-bit values of the speech feature vector for each frame of speech need to be streamed into memory for subsequent operations, and all the senone scores (1102 in total in the current analysis) generated from the previous frame need to be streamed out. This would require an additional 1141 clock cycles (39 + 1102) on top of those shown in Table 6.2. However, if both streaming operations are performed in parallel with the frame calculations, these additional clock cycles can be avoided.

As the calculations can be performed faster than real time, a good way to reduce the energy consumption is to lower the operating frequency of the FPGA. The lowest operating frequencies that maintain real-time performance (assuming that all the calculations should be completed within 3.3 ms) are shown in Figure 6.28. A frequency as low as 2.675 MHz is sufficient for 39 parallel BCM units, whereas a frequency of 104.191 MHz is needed for a single BCM unit.
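These figures can be checked directly from Tables 6.2 and 6.3: the per-frame time is the cycle count divided by the clock frequency, and the lowest real-time frequency is the cycle count divided by the 3.3 ms budget. The following C fragment is a worked example only; the constants are taken from the tables above.

```c
#include <stdio.h>

/* Illustrative check of Figures 6.27 and 6.28: execution time per frame
 * and the lowest clock frequency that still meets the 3.3 ms budget.
 * Cycle counts are the "102 CI and 1000 CD" totals from Table 6.2.
 */
int main(void)
{
    const double budget_s = 3.3e-3;            /* 3.3 ms of the 10 ms frame  */
    const double cycles_1_bcm  = 343831.0;     /* single BCM unit            */
    const double cycles_39_bcm = 8829.0;       /* 39 parallel BCM units      */
    const double f_virtex5 = 113.154e6;        /* XC5VSX240T, Table 6.3 (Hz) */

    printf("time, 39 BCM units on Virtex5: %.3f ms\n",
           1e3 * cycles_39_bcm / f_virtex5);   /* about 0.078 ms             */

    printf("lowest real-time frequency, 1 BCM unit:   %.3f MHz\n",
           1e-6 * cycles_1_bcm / budget_s);    /* about 104.191 MHz          */
    printf("lowest real-time frequency, 39 BCM units: %.3f MHz\n",
           1e-6 * cycles_39_bcm / budget_s);   /* about 2.675 MHz            */
    return 0;
}
```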

In a practical implementation, reducing the operating frequency results in an increase in FPGA resource usage, as a larger number of BCM units needs to be synthesised into the FPGA, so a balance between the two should be found depending on the requirements of the task. A further implementation requirement is that the senone data can be stored and accessed by the senone score module (SSM) at the required bandwidth. From the results shown in Section 6.1, it is possible to use either the Virtex6 or Virtex7 FPGAs to store all 1102 senones, but the VirtexE and Virtex5 FPGAs are too small to hold such a large quantity of data.

FIGURE 6.28 LOWEST FPGA CLOCK FREQUENCY NEEDED TO MAINTAIN REAL-TIME PERFORMANCE

6.3 HARDWARE IMPLEMENTATION

This section presents the implementation of the SSM-ROM and SSM-BRAM on the Integrator/AP evaluation board. The purpose of the implementation is to demonstrate that the parallel hardware realisations of mgau_eval can operate alongside the remainder of the Sphinx 3 software. The implementation platform, the implementation design and the implementation procedure are considered in turn.

6.3.1 IMPLEMENTATION PLATFORM

The designs are verified on the ARM Integrator/AP board [120], a motherboard [126] that supports the development of applications and hardware for ARM processor-based products. Using the core module connector (HDRA/HDRB) on the motherboard, up to four core modules (each with an ARM9 processor) [127] can be mounted. Using the logic module connector (EXPA/EXPB) on the motherboard, up to four logic modules (each with an XCV2000E FPGA) [128] can be mounted. At least one core module must be mounted, and the maximum total number of core modules and logic modules that can be mounted on the motherboard is five. The motherboard, core module and logic module are connected through the system bus as shown in Figure 6.29.

FIGURE 6.29 SYSTEM BUS ARCHITECTURE FOR INTEGRATOR/AP BOARD [126]

The three main system buses (A, C, D) are routed through the FPGA controllers in the motherboard, core module and logic module. Bus A is the address bus, bus C is the system control bus and bus D is the data bus. From Figure 6.29, it can also be seen that there is a bus B, which connects the FPGA controllers of the core module and logic module, but bus B is reserved for future use.

The advanced microcontroller bus architecture (AMBA) [129] is used in the Integrator/AP system. AMBA is an on-chip communications standard for designing high-performance embedded microcontrollers, and includes three distinct buses: the advanced high-performance bus (AHB), the advanced system bus (ASB) and the advanced peripheral bus (APB). For the Integrator/AP system, the system bus between the modules and the motherboard can be either ASB or AHB, as configured by the system controller FPGA on the motherboard; the one used in this project is the AHB. More information about the Integrator/AP board and the AMBA bus can be found in Appendix H.

6.3.2 IMPLEMENTATION DESIGN

In order to verify the hardware designs SSM-ROM and SSM-BRAM on the Integrator/AP, it is necessary to implement each design as an APB slave module on the AMBA bus in the logic module. This is because an APB slave has a much simpler read and write control interface than an AHB slave.

The implementation of the SSM-BRAM on the Integrator/AP board is shown in Figure 6.30. There are three main components implemented in the XCV2000E FPGA on the logic module, namely the AHB address decoder, AHB slave 1 (the default slave) and AHB slave 2 (APBSys). AHB slave 2 (APBSys) consists of an AHB-to-APB bridge and the APB slave (APB-SSM-BRAM), and the APB slave (APB-SSM-BRAM) contains the SSM-BRAM and the REG-IO. With this architecture, the ARM9 processor on the core module is able to provide the required input data to the SSM-BRAM and to receive the senone scores from it.

FIGURE 6.30 IMPLEMENTATION OF SSM-BRAM ON THE INTEGRATOR/AP BOARD

The implementation of the SSM-ROM on the Integrator/AP board, shown in Figure 6.31, is similar to that of the SSM-BRAM shown in Figure 6.30. The main difference is that there is no need to load data into the SSM-ROM, as all the data required by the SSM are stored in ROM-DM. The ARM9 processor only needs to provide the control signals to the SSM-ROM and read back the senone scores. Note that the BRAM-SCORE is not available in SSM-ROM, but should be included in the APB slave (APB-SSM-ROM) to allow the calculated senone scores to be accessed at a later time. As the SSM-ROM can be considered a special case of the SSM-BRAM with preloaded data, the following implementation description is mainly based on the SSM-BRAM.

FIGURE 6.31 IMPLEMENTATION OF SSM-ROM ON THE INTEGRATOR/AP BOARD

6.3.2.1 MEMORY MAP

An AMBA AHB system is provided for the Integrator/AP board, so the only new module that needed to be created is the REG-IO module, which provides the inputs and outputs for the SSM-BRAM; these are implemented as registers in the APB slave. The memory map for the registers in the REG-IO module is shown in Figure 6.32 and the details of these registers are given in Table 6.4.

FIGURE 6.32 MEMORY MAP FOR THE REGISTERS THAT PROVIDE THE INPUTS AND OUTPUTS FOR SSM-BRAM

TABLE 6.4 THE REGISTERS THAT PROVIDE THE INPUTS AND OUTPUTS FOR SSM-BRAM

Offset address | Name      | Type       | Size | Description
0x0000010      | reg-CTL   | read/write | 32   | control register for the SSM-BRAM
0x0000020      | reg-FEAT  | read/write | 32   | data register for loading the feature vector
0x0000030      | reg-MEAN  | read/write | 32   | data register for loading the mean vector
0x0000034      | reg-VAR   | read/write | 32   | data register for loading the variance vector
0x0000038      | reg-DVAL  | read/write | 32   | data register for loading the dval value
0x000003C      | reg-MIXW  | read/write | 32   | data register for loading the mixture weight
0x0000040      | reg-SDONE | read only  | 1    | score ready register for returning the senone ready signal
0x0000044      | reg-SCORE | read only  | 32   | score register for returning the senone score
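In software on the ARM9 processor, the registers of Table 6.4 are simply memory-mapped locations. A minimal C sketch of the offsets and access helpers is given below; APB_SSM_BASE is a placeholder, since the base address assigned to the APB slave in the logic module is not specified here, and the helper names are illustrative rather than taken from the project code.

```c
#include <stdint.h>

/* Register offsets of the REG-IO block, as listed in Table 6.4.
 * APB_SSM_BASE is a placeholder: the actual base address of the APB
 * slave in the logic module's memory map is board dependent.
 */
#define APB_SSM_BASE   0x00000000u                 /* placeholder base      */

#define REG_CTL        (APB_SSM_BASE + 0x0000010u) /* control register      */
#define REG_FEAT       (APB_SSM_BASE + 0x0000020u) /* feature vector data   */
#define REG_MEAN       (APB_SSM_BASE + 0x0000030u) /* mean vector data      */
#define REG_VAR        (APB_SSM_BASE + 0x0000034u) /* variance vector data  */
#define REG_DVAL       (APB_SSM_BASE + 0x0000038u) /* dval value            */
#define REG_MIXW       (APB_SSM_BASE + 0x000003Cu) /* mixture weight        */
#define REG_SDONE      (APB_SSM_BASE + 0x0000040u) /* senone score ready    */
#define REG_SCORE      (APB_SSM_BASE + 0x0000044u) /* senone score          */

/* Simple volatile accessors for the memory-mapped registers. */
static inline void reg_write(uint32_t addr, uint32_t value)
{
    *(volatile uint32_t *)(uintptr_t)addr = value;
}

static inline uint32_t reg_read(uint32_t addr)
{
    return *(volatile uint32_t *)(uintptr_t)addr;
}
```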

6.3.2.2 CONTROL SIGNAL

The control register reg-CTL includes four 1-bit signals which are used to control the operation of the SSM-BRAM, and these signals are reset, select, enable and write, as shown in Figure 6.33. The remaining bits in reg-CTL are not used.

FIGURE 6.33 SIGNALS IN REG-CTL THAT CONTROL THE OPERATION OF SSM-BRAM

Five values are predefined for reg-CTL to control the operation of the SSM-BRAM, placing it in one of five modes, namely CTL-IDLE, CTL-RESET, CTL-WRITE, CTL-READ and CTL-SCORE, as shown in Table 6.5.

TABLE 6.5 PREDEFINED VALUES FOR CONTROL REGISTER REG-CTL

Name      | Value      | Description
CTL-IDLE  | 0x00000000 | unselect the SSM-BRAM
CTL-RESET | 0x00000001 | reset the SSM-BRAM
CTL-WRITE | 0x00001010 | load data into BRAM-DM
CTL-READ  | 0x00000110 | output data from BRAM-DM to SSM
CTL-SCORE | 0x00000010 | read senone scores from BRAM-SCORE
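The mode values of Table 6.5 can likewise be captured as constants. The fragment below is a sketch only; the numeric values come from the table, while the macro names and the usage comment (which relies on the hypothetical reg_write() helper introduced above) are illustrative.

```c
/* Predefined values for the control register reg-CTL (Table 6.5). */
#define CTL_IDLE   0x00000000u  /* unselect the SSM-BRAM              */
#define CTL_RESET  0x00000001u  /* reset the SSM-BRAM                 */
#define CTL_WRITE  0x00001010u  /* load data into BRAM-DM             */
#define CTL_READ   0x00000110u  /* output data from BRAM-DM to SSM    */
#define CTL_SCORE  0x00000010u  /* read senone scores from BRAM-SCORE */

/* Example (using the hypothetical helper sketched earlier):
 *     reg_write(REG_CTL, CTL_WRITE);   -- puts the SSM-BRAM into write mode
 */
```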

These four signals in the reg-CTL register are also used, together with the inputs from the APB slave, to generate the five write control signals (iValidWriteFEAT, iValidWriteMEAN, iValidWriteVAR, iValidWriteDVAL and iValidWriteMIXW) and the one read control signal (iValidReadScore) for the SSM-BRAM. The generation of the write control signals and read control signals for SSM-BRAM is shown in Figure 6.34. All the signals shown in Figure 6.34 except PADDR are 1-bit signals and become valid when high. The address decoder compares the current PADDR with the addresses of the registers defined in Table 6.4; if there is a match, the corresponding valid signal is set high and the remaining valid signals are set low. The address decoder is then enabled to output either the read valid or the write valid signals.

(a) write control signals

(b) read control signals

FIGURE 6.34 CONTROL SIGNAL GENERATION FOR SSM-BRAM
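The decoding behaviour described above can be summarised by a small C reference model (the actual logic is VHDL inside the APB slave). The signal names follow the text and the addresses follow Table 6.4; the struct, the function and the separate write/read enables are assumptions made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Behavioural C model of the address decoder of Figure 6.34: given the
 * current PADDR and the write/read enables, at most one valid strobe is
 * raised, matching the register addresses of Table 6.4.
 */
struct ssm_valid {
    bool iValidWriteFEAT;
    bool iValidWriteMEAN;
    bool iValidWriteVAR;
    bool iValidWriteDVAL;
    bool iValidWriteMIXW;
    bool iValidReadScore;
};

static struct ssm_valid decode(uint32_t paddr, bool write_enable, bool read_enable)
{
    struct ssm_valid v = {false, false, false, false, false, false};

    if (write_enable) {
        v.iValidWriteFEAT = (paddr == 0x020u);
        v.iValidWriteMEAN = (paddr == 0x030u);
        v.iValidWriteVAR  = (paddr == 0x034u);
        v.iValidWriteDVAL = (paddr == 0x038u);
        v.iValidWriteMIXW = (paddr == 0x03Cu);
    } else if (read_enable) {
        v.iValidReadScore = (paddr == 0x044u);
    }
    return v;
}
```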

6.3.3 IMPLEMENTATION PROCEDURE

Both the number of feature vectors and the number of senones used in the implementation of SSM-BRAM and SSM-ROM are fixed at three, in order to permit realistic testing while still providing the capacity to add as many parallel BCM units as possible to the XCV2000E FPGA on the logic module. Even so, it has not been possible to fit all 39 parallel BCM units into the FPGA. The numbers of BCM units used are 1, 2, 3, 4, 5, 6, 7, 8, 10, 13 and 20 for each of the SSM-BRAM and SSM-ROM implementations.

6.3.3.1 LOGIC MODULE SETUP

For the targeted XCV2000E FPGA, Xilinx ISE 8.2 [130] is used to generate the programming files (BIT files) for SSM-BRAM and SSM-ROM with a range of different numbers of parallel BCM units. Once the BIT files have been created, the ARM Multi-ICE server [131] is used to connect the computer to the logic module through the joint test action group (JTAG) interface. Finally, the BIT files of the designs are downloaded into the XCV2000E FPGA using the program progcards [128], which is provided with the Integrator/AP system. There are two methods for downloading a BIT file to the FPGA. It is possible to download to the FPGA directly, but the design is then lost once the logic module is powered off. The alternative is to download the file to the flash memory alongside the FPGA, in which case the design is loaded into the FPGA when the logic module is powered up. The flash has sufficient space for two separate configurations.

6.3.3.2 CORE MODULE SETUP

The ARM Developer Suite (ADS) v1.2 [132] is used to create and compile the code for the ARM9 processor. In order to operate the SSM-BRAM, the ARM9 processor needs to provide and receive the control signals and data using the following procedure:

1) unselect the SSM-BRAM, by writing CTL-IDLE to the control register;

2) reset the SSM-BRAM, by writing CTL-RESET to the control register;

3) set the SSM-BRAM to write mode, by writing CTL-WRITE to the control register;

4) load the senone data into SSM-BRAM, by writing each value to its corresponding data register;

5) load the feature vector to SSM-BRAM, by writing feature values to the feature data register reg-FEAT;

6) set the SSM-BRAM to read mode, by writing CTL-READ to the control register;

7) check whether all senone scores have been calculated, by reading values from the score ready register reg-SDONE;

8) wait until the value read from reg-SDONE indicates that the senone scores are ready to be read;

9) set the SSM-BRAM to score mode, by writing CTL-SCORE to the control register;

10) read senone scores from SSM-BRAM, by reading the value from the score register reg-SCORE;

11) once all senone scores have been read back, unselect the SSM-BRAM and

276 12) if more senone scores need to be calculated for the new feature vectors, firstly reset the SSM-BRAM, then set the SSM-BRAM to write mode and then repeat the procedure from step 5 onwards.

To implement the SSM-ROM, the same procedure can be used by the ARM9 processor, except that steps 3 to 5 are not needed. A sketch of this control sequence in C is given below.
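The sketch uses the register offsets and CTL values defined in the earlier fragments (assumed here to be collected in a hypothetical header). The function name, argument types and the choice to show only the mean register in step 4 are illustrative; the actual code compiled with ADS v1.2 is not reproduced.

```c
#include <stdint.h>
#include "ssm_regs.h"   /* hypothetical header collecting the sketched
                           REG_* offsets, CTL_* values and reg_* helpers */

/* Sketch of the ARM9-side control sequence of Section 6.3.3.2. */
static void ssm_bram_score_frame(const uint32_t *feat, int feat_len,
                                 const uint32_t *senone_data, int senone_len,
                                 uint32_t *scores, int n_scores)
{
    int i;

    reg_write(REG_CTL, CTL_IDLE);                 /* 1) unselect            */
    reg_write(REG_CTL, CTL_RESET);                /* 2) reset               */
    reg_write(REG_CTL, CTL_WRITE);                /* 3) write mode          */

    for (i = 0; i < senone_len; i++)              /* 4) load senone data    */
        reg_write(REG_MEAN, senone_data[i]);      /*    (means shown; var,  */
                                                  /*    dval and mixw use   */
                                                  /*    their own registers)*/
    for (i = 0; i < feat_len; i++)                /* 5) load feature vector */
        reg_write(REG_FEAT, feat[i]);

    reg_write(REG_CTL, CTL_READ);                 /* 6) read mode           */

    while (reg_read(REG_SDONE) == 0u)             /* 7)-8) wait for scores  */
        ;                                         /*       to be ready      */

    reg_write(REG_CTL, CTL_SCORE);                /* 9) score mode          */
    for (i = 0; i < n_scores; i++)                /* 10) read scores back   */
        scores[i] = reg_read(REG_SCORE);

    reg_write(REG_CTL, CTL_IDLE);                 /* 11) unselect           */
}
```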

After successfully compiling the code, the ARM AXD debugger is used to download the code to the ARM9 processor, via the ARM multi-ICE server through the JTAG interface.

6.3.3.3 IMPLEMENTATION RESULT

The ARM9 processor can successfully provide the essential control signals and the input data for both SSM-ROM and SSM-BRAM which are implemented on the FPGA. Moreover, the senone scores outputted from SSM-ROM and SSM-BRAM with different numbers of BCM units are the same as those calculated by the ARM processor.

As there are only three feature vectors and three senones in the designs, only nine senone scores can be produced. The calculation of nine senone scores by SSM-ROM or SSM-BRAM in the FPGA is much faster than the time taken by the ARM9 processor to read back the senone score ready signal from SSM-ROM or SSM-BRAM. Consequently, the time used by the SSM-ROM or SSM-BRAM in the FPGA cannot be measured directly. In order to solve this problem, the SSM-BRAM and SSM-ROM are modified by including a repeat counter in the design, making it possible to repeatedly calculate the senone scores on the same input data. By doing so, designs with larger numbers of senones can be emulated effectively on the XCV2000E FPGA, allowing the calculation times used by either the SSM-ROM or the SSM-BRAM modules to be measured by the ARM9 processor. By setting the total number of uses of the data to 34 and 368, the times taken to process the 102 CI senones and 1000 CD senones of the AN4 database can be simulated.

The reference clock cycle counter register (CM_REFCNT) in the core module is used to measure the time taken by either SSM-ROM or SSM-BRAM; the calculation time is obtained from the difference between the counter values before and after the calculations. The calculation time used by SSM-ROM is found to be the same as that for SSM-BRAM. Because the design is implemented as an APB slave, the system bus clock signal SYSCLK from the motherboard is used. System clock values of 10, 20, 30, 40 and 50 MHz are used in the experiments, and the times taken to calculate 102 CI and 1000 CD senone scores for a single frame of speech using the hardware design in the XCV2000E FPGA are shown in Figure 6.35.
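The measurement amounts to reading a free-running counter immediately before and after the repeated calculation. A minimal sketch in C is shown below; the counter is passed in as a pointer because the exact CM_REFCNT address depends on the core module's register map, and run_calculation() is a placeholder for the repeated scoring run.

```c
#include <stdint.h>

/* Placeholder for the repeated SSM-ROM/SSM-BRAM scoring run. */
extern void run_calculation(void);

/* Elapsed reference clock ticks measured around the calculation. */
static uint32_t measure_cycles(volatile const uint32_t *cm_refcnt)
{
    uint32_t before = *cm_refcnt;   /* counter value before the run */
    run_calculation();
    uint32_t after  = *cm_refcnt;   /* counter value after the run  */
    return after - before;
}
```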

FIGURE 6.35 TIME TAKEN TO CALCULATE 102 CI AND 1000 CD SENONE SCORES FOR A SINGLE FRAME OF SPEECH.

Figure 6.35 shows that the calculation time reduces at higher clock frequencies and as the number of parallel BCM units increases. When the clock frequency is high and the number of parallel BCM units is large, there is almost no further reduction in the calculation time; this is because the overhead of the time measurement carried out using the reference clock counter CM_REFCNT in the core module becomes dominant, and because the ARM9 processor needs to read back the senone score ready signal from the FPGA to determine whether the calculations have been completed.

For a 10 ms speech frame, there are only three cases apparent in Figure 6.35 where the calculation takes more time than the length of the frame: the designs with a single BCM unit running at 10 MHz and 20 MHz, and the design with two parallel BCM units operating at 10 MHz. It is also clear that many of the designs could operate at much lower clock frequencies and still complete all the required calculations in less than 10 ms.

6.4 SUMMARY

This chapter has presented the verification of the hardware designs of mgau_eval in three respects, namely synthesis, simulation and implementation, which together show that the designs can be targeted to a range of platforms, that they perform correctly on those platforms and that they can execute on a physical hardware platform.

In the synthesis part, ten different FPGAs are chosen as target platforms to accommodate the designs. There are four types of resources that are mainly related to the designs, namely the slice register, slice LUT, BRAM and DSP48E. The senone score module (SSM) is mainly built up from DSP48E units, which are a dedicated resource for the basic calculations. If the DSP48E is not available, the SSM is mainly built up from slice LUTs. The on-chip ROM data module (ROM-DM) is synthesised onto slice LUTs and the on-chip BRAM data module (BRAM-DM) is realised on the BRAM in the FPGA. These two memory modules also require substantial FPGA resources due to the large number of data values that need to be either included in or stored by these modules. It is found that only the latest Virtex6 and Virtex7 FPGAs have sufficient capacity to hold the several thousand senones that would be required in a practical solution.

The maximum clock frequencies at which these designs can run are also investigated, and it is found that the clock frequencies are mainly affected by the chosen FPGA rather than by the number of parallel units. The ROM-DM and BRAM-DM can run at much higher clock frequencies than can the SSM, and so the frequency for a complete design using either SSM-ROM or SSM-BRAM is constrained by the SSM module.

All of the hardware designs have been simulated in ModelSim and have been verified to produce the correct outputs relative to the Sphinx 3 software; this is done by comparing the senone scores output from Sphinx 3 with those from the hardware design. Two types of simulation, behavioural simulation and platform-based simulation, have been performed for each design, which together investigate the functional and timing aspects of the design. The execution time results showed that a significant reduction in time can be achieved in those designs having a large number of parallel BCM units. As all the calculations for a single speech frame can be completed in such a short period of time compared to the length of the frame, the operating clock frequency can also be lowered; the lowest clock frequency at which the FPGA can be run while maintaining real-time performance has also been estimated.

In the hardware implementation, the Integrator/AP evaluation board has been used to verify the SSM-ROM and SSM-BRAM designs. The SSM-ROM and SSM-BRAM designs are implemented as APB slaves due to the simplicity of developing such an interface, which only requires that the input and output signals are provided as registers. The implementation procedure configured the logic module and core module for verification purposes, and the results showed that the SSM-ROM and SSM-BRAM designs with different numbers of parallel BCM units operated as expected on the real hardware.

The performance of the parallel implementation of mgau_eval in hardware has been compared with the embedded software solution provided by an ARM9 processor operating at 50 MHz. The original floating point version of mgau_eval on the ARM9 processor takes 24.90 s to calculate 1102 senone scores for a single frame of speech, and the 32-bit scaled integer version takes 1.95 s. With the hardware implementation of mgau_eval, even the design with a single BCM unit operating at 10 MHz requires only 25 ms to perform the same calculations.

7 CONCLUSION AND FUTURE WORK

7.1 CONCLUSION

The speech recognition system based on Sphinx 3 has been set up successfully and optimised to achieve the lowest WER for three different databases. The source code of Sphinx 3 has been profiled to find the most time-consuming function, which is identified as mgau_eval. Following detailed analysis of this and related functions, a new scaled integer version of mgau_eval has been designed to make the function better suited to implementation in embedded systems. The scaled integer version of mgau_eval achieves a similar WER to the original floating point version, but uses only 7% of the original recognition time on an embedded ARM9 processor. Further acceleration of the scaled integer version of mgau_eval has been achieved by a parallel pipelined implementation in hardware. The research work achieved in this project is summarised below against the original objectives.

7.1.1 SOFTWARE SELECTION

After a comparison of the available speech recognition software, Sphinx 3 has been chosen for building the speech recognition system because it is open source, places no restriction on the redistribution of modified code, and is able to generate a state-of-the-art ASR system that can recognise large-vocabulary, speaker-independent and continuous speech. Sphinx 3 provides the best recognition accuracy among the systems considered while still operating in real time on modern PCs.

7.1.2 SPEECH RECOGNITION SYSTEM OPTIMISATION

A complete Sphinx 3 speech recognition system has been successfully built, consisting of five main components, namely a selection of databases (AN4, RM1 and TIMIT), a language model trainer (the CMU-Cambridge statistical language modelling toolkit), an acoustic model trainer (SphinxTrain), a speech recogniser (Sphinx 3) and an aligner (Sclite). In addition, the system has been optimised to achieve the best performance in terms of WER by adjusting the four configuration parameters found to affect the recognition most, namely the number of senones, the number of Gaussians, the language weight and the word insertion penalty. A wide range of speech recognition experiments have been performed on the AN4, RM1 and TIMIT databases using a range of different configuration parameters. The five most appropriate sets of parameters that produced the best recognition accuracy for each of the databases have been determined empirically, for later use in the profiling and verification of the modified version of the source code.

7.1.3 SOURCE CODE PROFILING AND ANALYSIS

The source code of Sphinx 3 has been profiled and it has been determined that the most time-consuming function is consistently mgau_eval, which is used for senone scoring. The function is found to occupy between a third and two thirds of the recognition time, depending on the database and the configuration parameters adopted.

A detailed analysis of mgau_eval has been conducted, involving identifying its required input data arrays, the data structures used to receive information to control the graph search, the calling functions and the internal calculations performed. It is apparent that during the recognition of each frame of speech, mgau_eval is called many times to calculate all the senone scores for the acoustic model. The total number of times that mgau_eval is called is equal to the product of the number of feature vectors and the number of senones. The calculations carried out for any given frame of speech are independent of the others, making mgau_eval well suited to parallel implementation.

7.1.4 SCALED INTEGER CONVERSION AND PARALLEL HARDWARE DESIGN

The original floating point version of mgau_eval has been converted into two new scaled integer versions (32-bit and 64-bit). This involved scaling up the floating point input data to either 32-bit or 64-bit integer values, using the relevant integer version of mgau_eval to perform the calculations, and scaling down the results to produce the senone score. It has been verified that the 32-bit scaled integer version of Sphinx 3, operating with a scale factor of eight, can achieve a similar WER to the original floating point implementation. The 32-bit and 64-bit integer versions of mgau_eval have been separated from the original source code and evaluated on a new MacBook, an old Linux PC and an ARM9 processor. Compared to the original floating point implementation, the results for the integer versions showed a significant reduction in calculation time when executed on both the old Linux PC and the ARM9 processor. On all platforms, the 64-bit integer version took considerably longer to execute than the 32-bit integer version, yet achieved the same WER, making the latter a better alternative for the ARM9 embedded system.
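As a rough illustration of the scaled integer approach (and not the actual thesis implementation, whose arithmetic is defined in the earlier chapters), the following C fragment shows the general pattern: model and feature values are scaled up once by a power-of-two factor, the per-frame arithmetic then uses only integer subtractions, multiplications and shifts, and the accumulated value is scaled back down before being combined with the other terms. All names, the 64-bit accumulator and the simplified weighting are assumptions made for the sketch.

```c
#include <stdint.h>

/* Illustrative scaled-integer pattern only.  SCALE_BITS plays the role
 * of the scale factor discussed in the text.
 */
#define SCALE_BITS 8                      /* scale factor of 2^8           */

/* One-off conversion of a floating point model or feature value. */
static int32_t to_scaled(double x)
{
    return (int32_t)(x * (double)(1 << SCALE_BITS));
}

/* Weighted squared-difference accumulation over one feature vector,
 * using only integer operations per frame.
 */
static int64_t scaled_gauss_term(const int32_t *feat, const int32_t *mean,
                                 const int32_t *var_inv, int dims)
{
    int64_t acc = 0;
    int i;

    for (i = 0; i < dims; i++) {
        int64_t d = (int64_t)feat[i] - (int64_t)mean[i];
        acc += ((d * d) >> SCALE_BITS) * (int64_t)var_inv[i];
    }
    return acc >> SCALE_BITS;             /* scale back down for the caller */
}
```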

VHDL code has been designed to implement the 32-bit integer version of mgau_eval in parallel. The calculation module, termed the senone score module (SSM), has been designed to perform the mgau_eval calculations in parallel, and two data modules, called the on-chip ROM data module (ROM-DM) and the on-chip BRAM data module (BRAM-DM), have been designed in order to supply all the required input data. All three modules have been parameterised at the top level to specify the number of parallel SSM units (between 1 and 39 inclusive) that are to be synthesised from the VHDL source code. Apart from the number of parallel SSM units, the two data modules can also be configured according to the numbers of speech feature vectors and senones. As ROM-DM uses read-only memory, it can only operate in read mode to provide fixed data to the SSM. The BRAM-DM, in contrast, first operates in write mode to load the required data into the on-chip BRAM, before operating in read mode to provide the stored data to the SSM.

7.1.5 HARDWARE SYNTHESIS AND SIMULATION

The hardware designs have been synthesised to a range of FPGAs and the main resources used in the designs are the slice register, slice LUT, BRAM and DSP48E. The results show that the SSM uses relatively few FPGA resources, with the smallest XCV2000E VirtexE FPGA being able to accommodate up to 20 parallel units and the other FPGAs 39 parallel units. In contrast, the resource usages for ROM-DM and BRAM-DM are much greater due to the data storage necessary to hold large numbers of senones. For the XCV2000E, the maximum number of senones that can be held in ROM-DM and BRAM-DM is too small for practical use, whereas the number of senones that can be held in the largest Virtex7 FPGA is around 11000 for the ROM-DM and 2700 for the BRAM-DM, easily allowing the storage of any of the databases investigated in the experimental trials. The maximum realisable clock frequency for the designs depends not only on the number of parallel SSM units, but also on the FPGA family used. Although the ROM-DM and BRAM-DM can operate at a higher clock frequency than the SSM, the complete designs are restricted to the clock frequency of the SSM.

All the hardware designs have been verified at both the behavioural and platform-based simulation stages, meeting both functional and timing requirements. For the FPGAs with smaller numbers of resource units, such as the XCV2000E, both the number of feature vectors and the number of senones are fixed at three and fewer parallel units are tested, whereas the larger FPGAs have the capacity to operate using the full test databases.

Based on the simulation results, the number of clock cycles required to perform the senone score calculations has been determined for designs with a range of different numbers of parallel units, allowing the calculation times for a single frame of speech to be found for each of the FPGAs. The results show that a significant reduction in time can be achieved as the number of parallel units is increased from 1 to 39. In order to maintain real-time performance, the SSM-ROM or SSM-BRAM needs to operate at only around 100 MHz for a single BCM unit, reducing to just 3 MHz for 39 parallel units. Such a low operating frequency ensures that an embedded solution adopting the approach described here consumes little power in practice.

7.1.6 HARDWARE IMPLEMENTATION

The completed hardware designs have been implemented and verified in physical hardware consisting of an XCV2000E FPGA operating as a slave on an AMBA system controlled by an ARM9 processor. The results show that a significant reduction in calculation time can be achieved by the parallel hardware implementation of mgau_eval. For example, to calculate 102 CI and 1000 CD senone scores for a single frame of speech, the ARM9 processor running at 50 MHz takes 1.95 s using the 32-bit scaled integer version of mgau_eval, whereas the corresponding FPGA implementation operating at a clock frequency of 10 MHz requires only 25 ms with a single BCM unit and 2.3 ms with 20 parallel units.

7.2 FUTURE WORK

A number of areas of further work have been identified and these are discussed below.

1) Create an ASIC to perform the parallel implementation of mgau_eval. The resource results for the designs in this thesis show that frequently not all of the available resources of the FPGAs are used: the number of slice LUTs used is often far smaller than the number available, particularly for the SSM-BRAM designs, and the BRAM resources are little used in the SSM-ROM design. The advantages of an ASIC implementation compared to an FPGA solution are that it can include precisely the number of units required, the maximum clock frequency is often considerably higher and the power consumption is often considerably reduced. The main drawback is the cost, with a production run being of the order of $1M.

2) Develop scaled integer versions of mgau_eval at different data widths. At present, the available data widths are 32-bit and 64-bit, widths chosen so that they can be easily defined and manipulated in software targeted at standard microprocessor data sizes, whereas bespoke hardware implementations can be generated at any desired width. The results in this thesis showed that the WER achieved by Sphinx 3 depends not only on the integer width, but also on the scale factor used. An investigation of alternative widths, including those narrower than 32 bits, could establish whether a reduction in the required capacity of either the ROM-DM or BRAM-DM on the FPGA is feasible. As the total number of values included in one senone is 640, a ROM-DM or a BRAM-DM with 102 CI and 1000 CD senones contains 705280 values, and so each one-bit reduction in data width would result in a memory saving of about 705 Kbit.

3) Reduce the execution time by extending the parallel implementation of mgau_eval to a higher level. In the current implementation, the maximum number of parallel BCM units is equal to the length of the feature vector, speeding up the senone score calculation for each frame. At a higher level, the number of times that mgau_eval is called for each frame of speech is equal to the total number of senones. By implementing the senone score module (SSM) itself in parallel, a number of parallel SSM units equal to the number of senones could be realised. Such a parallel senone score module (PSSM) would include a number of SSM units given by the number of senones. In order to provide sufficient input data to the PSSM in a timely manner, a redesign of the input data module would also be needed.

4) Implement further time-consuming functions of Sphinx 3 in hardware in order to further reduce the time taken for speech recognition. From the profiling results, the function hmm_vit_eval_3st is consistently the most time-consuming function after mgau_eval and log3_add. Note that log3_add is implemented in the current design, as it is called by mgau_eval. The hmm_vit_eval_3st function is applied to each frame of speech and each phoneme to evaluate the hidden Markov model (HMM) that models the basephone and triphone phonemes. The states in the basephone are not shared and are called CI senones, but the states in the triphone are shared and there is a mapping between each state and the CD senones. The inputs to hmm_vit_eval_3st are one transition matrix and the three senone scores generated by mgau_eval. It would be possible to merge part of a parallel implementation of hmm_vit_eval_3st with that of mgau_eval.


REFERENCES

[1] A. Drygajlo and M. Rajman, "Man-Machine Voice Communication," Speech and Language Engineering, pp. 443-461, 2007.

[2] I. McLoughlin and H. R. Sharifzadeh, "Speech Recognition for Smart Homes," Speech Recognition, Technologies and Applications, vol. 88, p. 550, 2008.

[3] M. Benzeghiba et al., "Automatic Speech Recognition and Speech Variability: A Review," Speech Communication, vol. 49, no. 10-11, pp. 763-786, Oct. 2007.

[4] S. Furui,

[5] B. Juang and L. R. Rabiner, "Automatic Speech Recognition - A Brief History of the Technology Development," Encyclopedia of Language and Linguistics, Elsevier, pp. 1-24, 2005.

[6] I. McCowan, D. Moore, J. Dines, D. Gatica-Perez, M. Flynn, and P. Wellner, "On the Use of Information Retrieval Measures for Speech Recognition Evaluation," Research Report 73, IDIAP Research Institute, 2005.

[7] A. K. Jain and P. W. Duin, "Statistical Pattern Recognition: A Review," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4-37, 2000.

[8] L. Rabiner and B. Juang, "Speech Recognition: Statistical Methods," Encyclopedia of Language and Linguistics, pp. 1-18, 2006.

[9] W. Holmes and M. Huckvale, "Why Have HMMs Been So Successful for Automatic Speech Recognition and How Might They Be Improved?," Speech, Hearing, and Language, vol. 8, pp. 207-219, 1994.

[10] Carnegie Mellon University, "CMUSphinx." [Online]. Available: http://cmusphinx.sourceforge.net. [Accessed: 25-Sep-2011].

[11] J. Cohen, "Embedded Speech Recognition Applications in Mobile Phones: Status, Trends and Challenges," Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 5352-5355, 2008.

[12] A. Schmitt, D. Zaykovskiy, and W. Minker, "Speech Recognition for Mobile Devices," International Journal of Speech Technology, vol. 11, no. 2, pp. 63-72, Jul. 2009.

[13] R. Lippmann, "Speech Recognition by Machines and Humans," Speech Communication, vol. 22, no. 1, pp. 1-15, Jul. 1997.

[14] K. B. Cohen et al., "Survey of the State of the Art in Human Language Technology," Language, vol. 76, no. 1, p. 214, Mar. 2000.

[15] M. Benzeghiba, R. D. Mori, O. Deroo, and S. Dupont, "Impact of Variabilities on Speech Recognition," Proceedings of International Conference on Speech and Computer, pp. 25-29, 2006.

[16] S. Cook, "Speech Recognition HOWTO." [Online]. Available: http://tldp.org/HOWTO/pdf/Speech-Recognition-HOWTO.pdf. [Accessed: 25-Sep-2011].

[17] S. E. Levinson and D. B. Roe, "A Perspective on Speech Recognition," IEEE Communications Magazine, vol. 28, no. 1, pp. 28-34, 1990.

[18] P. R. Cohen and S. L. Oviatt, "The Role of Voice Input for Human-Machine Communication," Proceedings of the National Academy of Sciences of the United States of America, vol. 92, no. 22, pp. 9921-7, Oct. 1995.

[19] S. Tomko, T. K. Harris, A. Toth, J. Sanders, A. Rudnicky, and R. Rosenfeld, "Towards Efficient Human Machine Speech Communication: The Speech Graffiti Project," ACM Transactions on Speech and Language Processing, vol. 2, no. 1, 2005.

[20] A. I. Rudnicky, A. G. Hauptmann, and K. F. Lee, "Survey of Current Speech Technology," Communications of the ACM, vol. 37, no. 3, pp. 52-57, 1994.

[21] J. D. Williams, "Spoken Dialogue Systems: Challenges, and Opportunities for Research," Proceedings of IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 25-25, Dec. 2009.

[22] F. Ehsani and E. Knodt, "Speech Technology in Computer-Aided Language Learning: Strengths and Limitations of a New CALL Paradigm," Language Learning and Technology, vol. 2, no. 1, pp. 45-60, 1998.

[23] D. A. Liauw Kie Fa, "Topics in Speech Recognition," Master Thesis, Delft University of Technology, 2006.

[24] J. W. Picone, "Signal Modeling Techniques in Speech Recognition," Proceedings of the IEEE, vol. 81, no. 9, pp. 1215-1247, 1993.

[25] J.-Hwan Lee, H.-Young Jung, T.-W. Lee, and S.-Young Lee, "Speech Feature Extraction Using Independent Component Analysis," Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1631-1634, 2000.

[26] H. Bourlard and N. Morgan, "Continuous Speech Recognition by Connectionist Statistical Methods," IEEE Transactions on Neural Networks, vol. 4, no. 6, pp. 893-909, 1993.

[27] L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.

[28] D. Norris and J. M. McQueen, "Shortlist B: A Bayesian Model of Continuous Speech Recognition," Psychological Review, vol. 115, no. 2, pp. 357-95, Apr. 2008.

[29] S. Renals, N. Morgan, H. Bourlard, M. Cohen, and H. Franco, "Connectionist Probability Estimators in HMM Speech Recognition," IEEE Transactions on Speech and Audio Processing, vol. 2, no. 1, pp. 161-174, 1994.

[30] B. H. Juang and L. R. Rabiner, "Hidden Markov Models for Speech Recognition," Technometrics, vol. 33, no. 3, pp. 251-272, 1991.

[31] L. Rabiner and B. Juang, "An Introduction to Hidden Markov Models," IEEE ASSP Magazine, vol. 3, no. 1, pp. 4-16, 1986.

[32] V. Siivola, M. Kurimo, and K. Lagus, "Large Vocabulary Statistical Language Modeling for Continuous Speech Recognition in Finnish," Proceedings of Eurospeech, pp. 737-740, 2001.

[33] X. L. Aubert, "A Brief Overview of Decoding Techniques for Large Vocabulary Continuous Speech Recognition," in ASR2000 - Automatic Speech Recognition: Challenges for the new Millenium, ISCA Tutorial and Research Workshop, 2000.

[34] I. Vi, H. Ney, and S. Ortmanns, "Dynamic Programming Search for Continuous Speech Recognition," IEEE Signal Processing Magazine, pp. 1-39, 1999.

[35] N. Deshmukh, A. Ganapathiraju, and J. Picone, "Hierarchical Search for Large-Vocabulary Conversational Speech Recognition: Working Toward a Solution to the Decoding Problem," IEEE Signal Processing Magazine, vol. 16, no. 5, pp. 84-107, 1999.

[36] X. Li, "Real-Time Speaker Independent Large Vocabulary Continuous Speech Recognition," PhD Thesis, University of Missouri-Columbia.

[37] L. R. Bahl et al., "Large Vocabulary Natural Language Continuous Speech Recognition," Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 50, pp. 465-467, 1989.

[38] P. R. Dixon, D. A. Caseiro, T. Oonishi, and S. Furui, "The Titech Large Vocabulary WFST Speech Recognition System," Proceedings of IEEE Workshop on Automatic Speech Recognition & Understanding, pp. 443-448, 2007.

[39] A. T. Jensson, T. Oonishi, K. Iwano, and S. Furui, "Development of a WFST Based Speech Recognition System for a Resource Deficient Language Using Machine Translation," Proceedings of Asia-Pacific Signal and Information Processing Association, pp. 50-56, 2009.

[40] H. Bourlard and N. Morgan, "Continuous Speech Recognition by Connectionist Statistical Methods," IEEE Transactions on Neural Networks, vol. 4, no. 6, pp. 893-909, Jan. 1993.

[41] M. Huckvale, "Exploiting Speech Knowledge in Neural Nets for Recognition," Speech Communication, vol. 9, no. 1, pp. 1-13, Feb. 1990.

[42] R. P. Lippmann, "Review of Neural Networks for Speech Recognition," Neural Computation, vol. 38, pp. 1-38, 1989.

[43] G. P. Zhang, "Neural Networks for Classification: A Survey," IEEE Transactions on Systems, Man, and Cybernetics, vol. 30, no. 4, pp. 451-462, 2000.

[44] S. King, J. Frankel, K. Livescu, E. McDermott, K. Richmond, and M. Wester, "Speech Production Knowledge in Automatic Speech Recognition," The Journal of the Acoustical Society of America, vol. 121, p. 723, 2007.

[45] S. M. O'Brien, "Knowledge-Based Systems in Speech Recognition: A Survey," International Journal of Man-Machine Studies, vol. 38, no. 1, pp. 71-95, 1993.

[46] J. Mariani, "Knowledge-Based Approaches versus Mathematical Model Based Algorithms," Proceedings of the 30th Conference on Decision and Control, vol. 1, pp. 841-846, 1991.

[47] VoxForge, "VoxForge Home Page." [Online]. Available: http://voxforge.org. [Accessed: 25-Sep-2011].

[48] O. M. Speech, "Open Mind Speech - Main Page." [Online]. Available: http://freespeech.sourceforge.net/. [Accessed: 25-Sep-2011].

[49] CVoiceControl, "CVoiceControl - Command and Control for Linux." [Online]. Available: http://www.kiecza.net/daniel/linux/. [Accessed: 25-Sep-2011].

[50] XVoice, "XVoice Main Page." [Online]. Available: http://xvoice.sourceforge.net/. [Accessed: 25-Sep-2011].

[51] N. Ström, "The NICO Toolkit for Artificial Neural Networks." [Online]. Available: http://nico.nikkostrom.com/. [Accessed: 25-Sep-2011].

[52] N. Ström, "Phoneme Probability Estimation with Dynamic Sparsely Connected Artificial Neural Networks," The Free Speech Journal, vol. 5, 1997.

[53] Mississippi State University, "ISIP Automatic Speech Recognition." [Online]. Available: http://www.isip.piconepress.com/projects/speech/. [Accessed: 25-Sep-2011].

[54] B. Jelinek, F. Zheng, N. Parihar, J. Hamaker, and J. Picone, "Generalized Hierarchical Search in the ISIP ASR System," Proceedings of Thirty-Fifth Asilomar Conference on Signals, Systems and Computers, vol. 2, pp. 1553-1556, 2001.

[55] "Open-Source Large Vocabulary Continuous Speech Recognition Engine Julius." [Online]. Available: http://julius.sourceforge.jp/en_index.php. [Accessed: 25-Sep-2011].

[56] A. Lee, T. Kawahara, and K. Shikano, "Julius - An Open Source Real-Time Large Vocabulary Recognition Engine," Proceedings of Seventh European Conference on Speech Communication and Technology, pp. 3-6, 2001.

[57] Cambridge University Engineering Department, "Hidden Markov Model Toolkit (HTK)." [Online]. Available: http://htk.eng.cam.ac.uk. [Accessed: 25-Sep-2011].

[58] K. F. Lee, H. W. Hon, and R. Reddy, "An Overview of the SPHINX Speech Recognition System," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, no. 1, pp. 35-45, 1990.

[59] X. Huang, F. Alleva, M.-Y. Hwang, and R. Rosenfeld, "An Overview of the Sphinx-II Speech Recognition System," Proceedings of the Workshop on Human Language Technology, p. 81, 1993.

[60] L. D. Paulson, "Speech Recognition Moves from Software to Hardware," Computer, vol. 39, no. 11, pp. 15-18, 2006.

[61] E. C. Lin, K.

[62] E. C. Lin, K.

[63] E. C. Lin and R. A. Rutenbar, "A Multi-FPGA 10x Real-Time High-Speed Search Engine for a 5000-Word Vocabulary Speech Recognizer," Proceedings of ACM/SIGDA International Symposium on Field Programmable Gate Arrays, p. 83, 2009.

[64] S. J. Melnikoff, S. F. Quigley, and M. J. Russell, "Reconfigurable Computing for Speech Recognition: Preliminary Findings," Lecture Notes in Computer Science, pp. 495-504, 2000.

[65] S. J. Melnikoff, S. F. Quigley, and M. J. Russell, "Implementing a Simple Continuous Speech Recognition System on an FPGA," Proceedings of 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, pp. 275-276, 2002.

[66] S. J. Melnikoff, S. F. Quigley, and M. J. Russell, "Speech Recognition on an FPGA Using Discrete and Continuous Hidden Markov Models," Lecture Notes in Computer Science, pp. 202-211, 2002.

[67] S. J. Melnikoff, S. F. Quigley, and M. J. Russell, "Performing Speech Recognition on Multiple Parallel Files Using Continuous Hidden Markov Models on an FPGA," Proceedings of IEEE International Conference on Field-Programmable Technology, pp. 399-402, 2002.

[68] S. Nedevschi, R. K. Patra, and E. A. Brewer, "Hardware Speech Recognition for User Interfaces in Low Cost, Low Power Devices," Proceedings of 42nd Design Automation Conference, pp. 684-689, 2005.

[69] Y.-Kyu Choi, K.

[70] Y.-Kyu Choi, K.

[71] K. You, Y.-Kyu Choi, J. Choi, and W. Sung, "Memory Access Optimized VLSI for 5000-Word Continuous Speech Recognition," Journal of Signal Processing Systems, vol. 63, no. 1, pp. 95-105, Nov. 2009.

[72] Y.-Kyu Choi, K.

[73] K.

[74] E. C. Lin, R. Rutenbar, T. Chen, B. Brodersen, and U. C. Berkeley, "A High Performance Custom Hardware Backend Search Engine for a Speech Recognition System," PhD Thesis, Carnegie Mellon University.

[75] P. J. Bourke and R. A. Rutenbar, "A Low-Power Hardware Search Architecture for Speech Recognition," Proceedings of Ninth Annual Conference of the International Speech Communication Association, pp. 3-6, 2008.

[76] R. Krishna, S. Mahlke, and T. Austin, "Architectural Optimizations for Low-Power, Real-Time Speech Recognition," Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems, p. 220, 2003.

[77] B. Mathew, A. Davis, and Z. Fang, "A Low-Power Accelerator for the Sphinx 3 Speech Recognition System," Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems, p. 210, 2003.

[78] J. W. Schuster, K. Gupta, and R. Hoare, "Speech Silicon AM: An FPGA-Based Acoustic Modeling Pipeline for Hidden Markov Model Based Speech Recognition," Proceedings of 20th IEEE International Parallel and Distributed Processing Symposium, p. 4, 2006.

[79] J. W. Schuster, K. Gupta, R. Hoare, and A. K. Jones, "Speech Silicon: An FPGA Architecture for Real-Time Hidden Markov Model Based Speech Recognition," EURASIP Journal on Embedded Systems, vol. 2006, pp. 1-19, 2006.

[80] G. Marcus and J. A. Nolazco-Flores, "An FPGA-Based Coprocessor for the SPHINX Speech Recognition System: Early Experiences," Proceedings of International Conference on Reconfigurable Computing and FPGAs, pp. 27-27, 2005.

[81] F. L. Vargas, R. D. Fagundes, and D. Barros, "A New Approach to Design Reliable Real-Time Speech Recognition Systems," Proceedings of Seventh International On-Line Testing Workshop, pp. 187-191, 2001.

[82] F. L. Vargas, R. D. R. Fagundes, and D. B. Junior, "A FPGA-based Viterbi Algorithm Implementation for Speech Recognition Systems," Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 1217-1220, 2001.

[83] 83D]KD\DYHHWLO'&KDQGUDDQG3)UDQ]RQ³)OH[LEOH/RZ3RZHU3UREDELOLW\ 'HQVLW\ (VWLPDWLRQ 8QLW )RU 6SHHFK 5HFRJQLWLRQ´ Proceedings of IEEE International Symposium on Circuits and Systems, pp. 1117-1120, May. 2007.

[84] ' &KDQGUD 8 3D]KD\DYHHWLO DQG 3 )UDQ]RQ ³$UFKLWHFWXUH IRU /RZ 3RZHU /DUJH 9RFDEXODU\ 6SHHFK 5HFRJQLWLRQ´ Proceedings of IEEE International Symposium on Circuits and Systems, pp. 25-28, Sep. 2006.

[85] W. Han, K.-W. Hon, C.-F. Chan, C.-S. Choy, and K.-3 3XQ ³$ 6SHHFK Recognition IC Using Hidden Markov Models with Continuous Observation 'HQVLWLHV´ The Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology, vol. 47, no. 3, pp. 223-232, Mar. 2007.

[86] 3&DUGLQDO3'XPRXFKHO*%RXOLDQQHDQG0&RPHDX³*38$FFHOHUDWHG $FRXVWLF/LNHOLKRRG&RPSXWDWLRQV´Proceedings of Ninth Annual Conference of the International Speech Communication Association, vol. 2, pp. 1-4, 2008.

[87] P. R. Dixon, T. OonisKL DQG 6 )XUXL ³)DVW $FRXVWLF &RPSXWDWLRQV 8VLQJ *UDSKLFV 3URFHVVRUV´ Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4321-4324, 2009.

[88] -&DL³,PSOHPHQWLQJ3DUDOOHO6SHHFK'HFRGLQJLQ+7.7RRONLWE\([ploiting 0DQ\FRUH *38 DQG 6,0' 7HFKQRORJ\´ Open-source Software for Scientific Computation, 2009.

[89] $ &KDQ ³7KH +LHURJO\SKV´ >2QOLQH@ $YDLODEOH www.cs.cmu.edu/~archan/share/sphinxDoc.pdf. [Accessed: 25-Sep-2011].

[90] ; +XDQJ ³7KH 6SKLQ[-II SpeeFK 5HFRJQLWLRQ 6\VWHP $Q 2YHUYLHZ´ Computer Speech & Language, vol. 7, no. 2, pp. 137-148, Apr. 1993.

297 [91] $ &KDQ ³6SKLQ[  'HVFULSWLRQ´ Carnegie Mellon University. [Online]. Available: http://www.cs.cmu.edu/~archan/s_info/Sphinx3/doc/s3_description.html. [Accessed: 25-Sep-2011].

[92] 33ODFHZD\HWDO³7KH+XE-4 Sphinx-6\VWHP´Proceedings of DARPA Speech Recognition Workshop, pp. 85±89, 1997.

[93] 3 /DPHUH HW DO ³7KH &08 6SKLQ[- 6SHHFK 5HFRJQLWLRQ 6\VWHP´ Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1-4, 2003.

[94] 3/DPHUHHWDO³'HVLJQRIWKH&086SKLQ[-'HFRGHU´Proceedings of the 8th European Conference on Speech Communication and Technology, pp. 1181± 1184, 2003.

[95] W. Walker et al., Sphi nx-4: A Flexible Open Source Framework for Speech Recogni ti on. Sun Microsystems, Inc., 2004.

[96] D. Huggins-Daines, M. Kumar, A. Chan, A. W. Black, M. Ravishankar, and A. , 5XGQLFN\ ³3RFNHW 6SKLQ[ $ )UHH 5HDO-Time Continuous Speech Recognition System for Hand-+HOG 'HYLFHV´ Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 1, 2006.

[97] A. Chan, E. Gouvea, D. J. Huggins-Daines, A. I. Rudnicky, M. Ravishankar, and < 6XQ ³3URJUHVV RI 6SKLQ[ ; )URP ;  WR ; ´ Carnegie Mellon University. [Online]. Available: www.cs.cmu.edu/~archan/presentation/SphinxLunch20060202.ppt. [Accessed: 25-Sep-2011].

[98] 06HOW]HU³6SKLQ[6LJQDO3URFHVVLQJ)URQW(QG6SHFLILFDWLRQ´CMU Speech Group. [Online]. Available: http://www.cs.cmu.edu/~mseltzer/sphinxman/s3_fe_spec.pdf. [Accessed: 04- Oct-2011].

298 [99] $ &KDQ ³6SKLQ[  'HYHORSPHQW 3URJUHVV´ Carnegie Mellon University. [Online]. Available: http://www.cs.cmu.edu/~archan/s_info/Sphinx3/doc/s3- 4.pdf. [Accessed: 25-Sep-2011].

[100] $ &KDQ 0 5DYLVKDQNDU DQG $ 5XGQLFN\ ³2Q ,PSURYHPHQWV WR &, %DVHG *00 6HOHFWLRQ´ Proceedings of European Conference on Speech Communication and Technology, pp. 565-568, 2005.

[101] $&KDQ-6KHUZDQLDQG50RVXU³)RXU-Layer Categorization Scheme of Fast GMM Computation Techniques in Large Vocabulary Continuous Speech 5HFRJQLWLRQ6\VWHPV´Proceedings of Interspeech, vol. 4, pp. 689±692, 2004.

[102] 0 5DYLVKDQNDU ³3DUDOOHO ,PSOHPHQWDWLRQ RI )DVW %HDP 6HDUFK IRU 6SHDNHU- IndependeQW &RQWLQXRXV 6SHHFK 5HFRJQLWLRQ´ ,QGLDQ ,QVWLWXWH RI 6FLHQFH Bangalore, India, 1993.

[103] 6.DQWKDN$6L[WXV60RODX56FKOWHUDQG+1H\³)DVW6HDUFKIRU/DUJH 9RFDEXODU\6SHHFK5HFRJQLWLRQ´Verbmobil: Foundations of Speech-to-Speech Translation, vol. 99, pp. 63±78, 2000.

[104] $ &KDQ ³6SKLQ[  WR ´ Carnegie Mellon University. [Online]. Available: www.cs.cmu.edu/~archan/presentation/s3-2.ppt. [Accessed: 25-Sep-2011].

[105] 62UWPDQQV$(LGHQ+1H\DQG1&RHQHQ³/RRN-Ahead Techniques for )DVW %HDP 6HDUFK´ Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1783-1786, 1997.

[106] &DUQHJLH 0HOORQ 8QLYHUVLW\ ³$1 'DWDEDVH´ >2QOLQH@ $YDLODEOH http://www.speech.cs.cmu.edu/databases/an4/. [Accessed: 25-Sep-2011].

[107] &DUQHJLH 0HOORQ 8QLYHUVLW\ ³50 'DWDEDVH´ >2QOLQH@ $YDLODEOH http://www.speech.cs.cmu.edu/databases/rm1/. [Accessed: 25-Sep-2011].

[108] ³7,0,7 $FRXVWLF-3KRQHWLF &RQWLQXRXV 6SHHFK &RUSXV´ >2QOLQH@ $YDLODEOH http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S1. [Accessed: 25-Sep-2011].

299 [109] 55RVHQIHOGDQG3&ODUNVRQ³&08-Cambridge Statistical Language Modeling 7RRONLW´ Carnegie Mellon University. [Online]. Available: http://mi.eng.cam.ac.uk/~prc14/toolkit.html. [Accessed: 25-Sep-2011].

[110] &DUQHJLH 0HOORQ 8QLYHUVLW\ ³6SKLQ[ 7UDLQ´ >2QOLQH@ $YDLODEOH http://sourceforge.net/projects/cmusphinx/files/sphinxtrain/1.0.7/. [Accessed: 25- Sep-2011].

[111] 1,67³6FOLWH- Score Speech Recognition 6\VWHP2XWSXW´>2QOLQH@$YDLODEOH http://www.icsi.berkeley.edu/Speech/docs/sctk-1.2/sclite.htm. [Accessed: 25- Sep-2011].

[112] &DUQHJLH 0HOORQ 8QLYHUVLW\ ³6SKLQ[ $FFRXVWLF 0RGXOH´ >2QOLQH@ $YDLODEOH http://www.speech.cs.cmu.edu/sphinx/models. [Accessed: 25-Sep-2011].

[113] &DUQHJLH 0HOORQ 8QLYHUVLW\ ³6SKLQ[ 7XWRULDO´ >2QOLQH@ $YDLODEOH http://www.speech.cs.cmu.edu/sphinx/tutorial.html#introduction. [Accessed: 25- Sep-2011].

[114] )UHH 6RIWZDUH )RXQGDWLRQ ,QF ³*18 JSURI´ >2QOLQH@ $YDLODEOH http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html. [Accessed: 25-Sep-2011].

[115] )UHH 6RIWZDUH )RXQGDWLRQ ,QF ³*18 JGE´ >2QOLQH@ $YDLODEOH http://www.gnu.org/s/gdb/. [Accessed: 25-Sep-2011].

[116] % + )OHWFKHU ³)3*$ (PEHGGHG 3URFHVVRUV 5HYHDOLQJ True System 3HUIRUPDQFH´Proceedings of Embedded Systems Conference, 2005.

[117] )UHH6RIWZDUH)RXQGDWLRQ,QF³*&&WKH*18&RPSLOHU&ROOHFWLRQ´>2QOLQH@ Available: http://gcc.gnu.org/. [Accessed: 25-Sep-2011].

[118] ;LOLQ[ ,QF ³;LOLQ[ ,6( 'HVLJQ 6XLWH´ >2QOLQH@ $YDLODEOH http://www.xilinx.com/products/design-tools/ise-design-suite/index.htm. [Accessed: 25-Sep-2011].

300 [119] 0HQWRU*UDSKLFV³0RGHO6LP- $GYDQFHG6LPXODWLRQDQG'HEXJJLQJ´>2QOLQH@ Available: http://model.com/. [Accessed: 25-Sep-2011].

[120] $50 /WG ³$50 ,QWHJUDWRU$3´ >2QOLQH@ $YDLODEOH http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.set.boards/index.ht ml. [Accessed: 25-Sep-2011].

[121] ;LOLQ[ ,QF ³;LOLQ[ )3*$V´ >2QOLQH@ $YDLODEOH http://www.xilinx.com/products/silicon-devices/fpga/index.htm. [Accessed: 25- Sep-2011].

[122] ;LOLQ[ ,QF ³9LUWH[TM-(  9 )LHOG 3URJUDPPDEOH *DWH $UUD\V´ >2QOLQH@ Available: http://www.xilinx.com/support/documentation/data_sheets/ds022.pdf. [Accessed: 25-Sep-2011].

[123] Xilinx IQF ³9LUWH[- )DPLO\ 2YHUYLHZ´ >2QOLQH@ $YDLODEOH http://www.xilinx.com/support/documentation/data_sheets/ds100.pdf. [Accessed: 25-Sep-2011].

[124] ;LOLQ[ ,QF ³9LUWH[- )DPLO\ 2YHUYLHZ´ >2QOLQH@ $YDLODEOH http://www.xilinx.com/support/documentation/data_sheets/ds150.pdf. [Accessed: 25-Sep-2011].

[125] ;LOLQ[ ,QF ³ 6HULHV )3*$V 2YHUYLHZ´ >2QOLQH@ $YDLODEOH http://www.xilinx.com/support/documentation/data_sheets/ds180_7Series_Over view.pdf. [Accessed: 25-Sep-2011].

[126] $50 /WG ³,QWHJUDWRU$3 $6,& 'HYHORSPHQW 0RWKHUERDUG 8VHU *XLGH´ [Online]. Available: http://infocenter.arm.com/help/topic/com.arm.doc.dui0098b/DUI0098B_ap_ug.p df. [Accessed: 25-Sep-2011].

[127] $50 /WG ³,QWHJUDWRU&07&07-ETM, CM922T-(70 8VHU *XLGH´ [Online]. Available: http://infocenter.arm.com/help/topic/com.arm.doc.dui0157a/DUI0157A_cmxx0t _ug.pdf. [Accessed: 25-Sep-2011]. 301 [128] $50 /WG ³,QWHJUDWRU/0-XCV600E+, LM-(3.( 8VHU *XLGH´ [Online]. Available: http://infocenter.arm.com/help/topic/com.arm.doc.dui0146c/DUI0146C_lm600_ ug.pdf. [Accessed: 25-Sep-2011].

[129] $50 /WG ³$0%$ 6SHFLILFDWLRQ 5HY ´ >2QOLQH@ $YDLODEOH http://infocenter.arm.com/help/index.jsp. [Accessed: 25-Sep-2011].

[130] ;LOLQ[,QF³;LOLQ[,6(6RIWZDUH0DQXDOVDQG+HOS´>2QOLQH@Available: http://www.xilinx.com/support/sw_manuals/xilinx82/. [Accessed: 25-Sep-2011].

[131] $50 /WG ³$'6  0XOWL-,&( ,QWHJUDWRU 7XWRULDO´ >2QOLQH@ $YDLODEOH http://infocenter.arm.com/help/topic/com.arm.doc.dai0106a/ADS_Multi- ICE_Tutorial_v01.pdf. [Accessed: 25-Sep-2011].

[132] $50 /WG ³(PEHGGHG 6RIWZDUH 'HYHORSPHQW ZLWK $'6 Y´ >2QOLQH@ Available: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0107a/index.ht ml. [Accessed: 25-Sep-2011].

302

APPENDIX A BRIEF HISTORY OF SPHI NX SYSTEM

!"#$ %&'()*$%&++,'$-+,./)(%+-$

x Sphinx 1, first released in 1988, written in C language.

! Developed by Kai-Fu Lee from Carnegie Mellon University (CMU).

! 7KH ZRUOG¶V ILUVW KLJK SHUIRUPDQFH XVHU-independent, continuous speech recogniser in the world.

! Based on discrete hidden Markov model (HMM) acoustic model and simple word-pair grammar language model.

x Sphinx 2, first released in 1992, written in C language.

! Developed by Xuedong Huang from CMU.

! The recogniser with fast speed in Sphinx.

! Based on semi-continuous HMM acoustic model and N-gram language model.

! In 1996, replaced by the FBS-8 decoder for fast implementation, written by Mosur Ravishankar from CMU.

303 x Sphinx 3, first released in 1996, written in C language.

! Developed by Eric Thayer and Mosur Ravishankar from CMU.

! The recogniser with best accuracy in Sphinx.

! Based on continuous HMM acoustic model and N-gram language model. Also compatible with semi-continuous HMM acoustic model.

! Including slow version based on flat search strategy (referred to as s3.0) and fast version based on lextree search strategy (referred to as s3.x).

x Sphinx 4, first released in 2001, written in Java language.

! Developed by a team of researchers from Carnegie Mellon University, Sun microsystems laboratories, Mitsubishi electric research lab and Hewlett Packard Inc.

! The recogniser with high degree of flexibility and modularity.

! Based on continuous HMM acoustic model and many language models such as N-gram, context-free grammar (CFG), finite-state transducer (FST) and etc.

x Pocket Sphinx, first released in 2006, written in C language.

! Developed by David Huggins-Daines, Mohit Kumar, Arthur Chan, Alan Black, Mosur Ravishankar, Alex Rudnicky and etc from CMU.

! Optimised from Sphinx 2 for the embedded system. 304 ! Incorporate fixed-point arithmetic and efficient algorithm for GMM computation.

!"0$ %&'()*$!,.1%2(,$3.4+5$2-!()+-$

x Sphinx Train

! A suite of Perl scripts and tools for training acoustic models for Sphinx 2, Sphinx 3 and Sphinx 4.

! Firstly developed by Eric Thayer, then refined by Rita Singh, also improved by other research staff in the robust group in CMU.

!"6$ %&'()*$5!)/1!/+$3.4+5$2-!()+-$

x CMU-Cambridge language model toolkit

! Used for training language model for Sphinx 2, Sphinx 3 and Sphinx 4.

! Version 1, created by Roni Rosenfield from CMU, released in 1994

! Version 2, improved by Philip Clarkson from Cambridge University, released in 1996.

305

APPENDIX B TIMIT DATABASE PREPARATION

This appendix gives the instruction on how to create the essential files for the TIMIT database, which are phone list file, dictionary file, transcription file, control file and configuration file.

7"#$ &'.)+$5(%2$8(5+$!)4$4(,2(.)!-9$8(5+$

The TIMIT database contains the phone list file and the dictionary file. They should be modified into the required format as shown in Figures B.1 and B.2 respectively. This can only be done manually. To modify the phone list file is really simple due to the small number of the entries in it. However, there are 6228 entries in the dictionary file. It takes a long time to modify the dictionary file. The unnecessary symbols, such as back slash, double quotation and etc., in the dictionary should be removed The same word entries should be processed in the same way as Sphinx 3 processed. In addition, the phonetic units should be in the correct form.

306 AA

AE

AH

AO

AW

AY

B

«

FIGURE B.1 FORMAT OF PHONE LIST FILE

A AH

A(2) EY

AND AE N D

AND(2) AH N D

APOSTROPHE AH P AA S T R AH F IY

APRIL EY P R AH L

AREA EH R IY AH

«

FIGURE B.2 FORMAT OF DICTIONARY FILE

307 7"0$ 2-!)%,-(&2(.)$8(5+$!)4$,.)2-.5$8(5+$

The TIMIT database does not contain a single speech transcript file and the speech control files. In order to produce these files, both the data organisation in the TIMIT database and the required formats for the speech transcripts files and speech control files should be understood. The format of training transcript file, testing transcript file and control file are shown in Figures B.3 to Figures B.5 respectively.

YES (an251-fash-b)

GO (an253-fash-b)

YES (an254-fash-b)

U M N Y H SIX (an255-fash-b)

H I N I C H (cen1-fash-b)

A M Y (cen2-fash-b)

M O R E W O O D (cen4-fash-b)

P I T T S B U R G H (cen5-fash-b)

«

FIGURE B.3 FORMAT OF TRAINING TRANSCRIPT FILE

308 RUBOUT G M E F THREE NINE (an406-fcaw-b)

ERASE C Q Q F SEVEN (an407-fcaw-b)

B A O Z FIVE THREE (an408-fcaw-b)

GO (an409-fcaw-b)

RUBOUT N I M N ONE (an410-fcaw-b)

W O O D (cen1-fcaw-b)

C I N D Y (cen2-fcaw-b)

ONE THREE SEVEN (cen3-fcaw-b)

M E L V I N (cen4-fcaw-b)

«

FIGURE B.4 FORMAT OF TESTING TRANSCRIPT FILE

an4_clstk/fash/an251-fash-b an4_clstk/fash/an253-fash-b an4_clstk/fash/an254-fash-b an4_clstk/fash/an255-fash-b an4_clstk/fash/cen1-fash-b an4_clstk/fash/cen2-fash-b an4_clstk/fash/cen4-fash-b an4_clstk/fash/cen5-fash-b

«

FIGURE B.5 FORMAT OF CONTROL FILE

309 The data organisation in TIMIT database is shown in the Figure B.6. The TIMIT folder contains two sub-folders: Train and Test. Each of the sub-folder contains eight region folders: DR1 to DR8. Each of the region folders contains several speaker folders, QDPHGE\WKHVSHDNHU¶V,',QHDFKRIWKHVSHDNHUIROGHUVWKHUHDUHWHQDXGLRILOHVWHQ sentence transcripts files, ten time-aligned word transcription and ten time-aligned phonetic transcripts. A Perl script is designed to generate the speech transcripts and the speech control files automatically. If another new database is need to be added into the speech recognition system, the Perl script can also be used with some modifications according to the new database.

FIGURE B.6 DATA ORGANISATION IN TIMIT DATABASE

7"6$ ,.)8(/1-!2(.)$8(5+$

The training configuration file for TIMIT can be created from the setup Perl script from Sphinx Train directly. The setup Perl script from Sphinx3-0.6 should be modified to accept the TIMIT as its input, so the correspondent testing configuration file can be produced for TIMIT database.

310 APPENDIX C PROFILING RESULTS FOR RM1 AND TIMIT

,"#$ 85!2$&-.8(5+$-+%152%$8.-$-3#$

FIGURE C.1 PERCENTAGE OF EXECUTION TIME TAKEN BY MGAU_EVAL FOR RM1 USING THE LANGUAGE WEIGHT AND WORD INSERT PENALTY GIVING THE LOWEST WER

FIGURE C.2 PERCENTAGE OF EXECUTION TIME FOR RM1 WITH 600 SENONES, 8 GAUSSIANS, LANGUAGE WEIGHT 10 AND WORD INSERT PENALTY 0.7 311

FIGURE C.3 PERCENTAGE OF EXECUTION FIGURE C.4 PERCENTAGE OF EXECUTION TIME FOR RM1 WITH 800 SENONES, 8 TIME FOR RM1 WITH 800 SENONES, 4 GAUSSIANS, LANGUAGE WEIGHT 15 AND GAUSSIANS, LANGUAGE WEIGHT 14 AND WORD INSERT PENALTY 0.7 WORD INSERT PENALTY 0.7

FIGURE C.5 PERCENTAGE OF EXECUTION FIGURE C.6 PERCENTAGE OF EXECUTION TIME FOR RM1 WITH 1000 SENONES, 4 TIME FOR RM1 WITH 1200 SENONES, 4 GAUSSIANS, LANGUAGE WEIGHT 12 AND GAUSSIANS, LANGUAGE WEIGHT 14 AND WORD INSERT PENALTY 0.7 WORD INSERT PENALTY 0.7

312 ,"0$ 85!2$&-.8(5+$-+%152%$8.-$2(3(2$

FIGURE C.7 PERCENTAGE OF EXECUTION TIME TAKEN BY MGAU_EVAL FOR TIMIT USING THE

LANGUAGE WEIGHT AND WORD INSERT PENALTY GIVING THE LOWEST WER

FIGURE C.8 PERCENTAGE OF EXECUTION TIME FOR TIMIT WITH 1500 SENONES, 8

GAUSSIANS, LANGUAGE WEIGHT 11 AND WORD INSERT PENALTY 0.7

313

FIGURE C.9 PERCENTAGE OF EXECUTION FIGURE C.10 PERCENTAGE OF EXECUTION TIME FOR TIMIT WITH 1500 SENONES, 8 TIME FOR TIMIT WITH 1000 SENONES, 8 GAUSSIANS, LANGUAGE WEIGHT 12 AND GAUSSIANS, LANGUAGE WEIGHT 11 AND WORD INSERT PENALTY 0.7 WORD INSERT PENALTY 0.7

FIGURE C.11 PERCENTAGE OF EXECUTION FIGURE C.12 PERCENTAGE OF EXECUTION TIME FOR TIMIT WITH 1500 SENONES, 8 TIME FOR TIMIT WITH 1000 SENONES, 8 GAUSSIANS, LANGUAGE WEIGHT 12 AND GAUSSIANS, LANGUAGE WEIGHT 10 AND WORD INSERT PENALTY 0.7 WORD INSERT PENALTY 0.7

314 APPENDIX D VERIFYING RESULTS FOR RM1 AND

TIMIT

4"#$ :+-(89()/$-+%152%$8.-$-3#$

FIGURE D.1 WER FOR RM1 USING THE 32-BIT INTEGER VERSION RECOGNISOR WITH THE BEST FIVE SETS OF PARAMETERS

FIGURE D.2 WER FOR RM1 USING THE 64-BIT INTEGER VERSION RECOGNISOR WITH THE BEST FIVE SETS OF PARAMETERS

315 4"0$ :+-(89()/$-+%152%$8.-$2(3(2$

FIGURE D.3 WER FOR TIMIT USING THE 32-BIT INTEGER VERSION RECOGNI SOR WI TH THE

BEST FIVE SETS OF PARAMETERS

FIGURE D.4 WER FOR TIMIT USING THE 64-BIT INTEGER VERSION RECOGNISOR WITH THE

BEST FIVE SETS OF PARAMETERS

316

APPENDIX E SOURCE CODE ANALYSIS

+"#$ 4!2!$%2-1,21-+$!)!59%(%$

"(%#*1(3#$*,&%9"&9%#*@2"(%#F&*

TABLE E.2 MEMBERS OF THE CORE MODEL STRUCTURE KBCORE_T

ID Definition Description 1 feat_t *fcb Feature structure. 2 dict_t *dict Dictionary structure.

3 dict2pid_t Index from dictionary to phoneme ID. *dict2pid 4 lmset_t *lmset Language model structure.

5 mgau_model _t Acoustic model for single stream. *mgau 6 ms_mgau_model _t Acoustic model for multiple streams. *ms_mgau 7 s2_semi _mgau_t Acoustic model for Sphinx2 operating in semi-continuous *s2_mgau mode. 8 tmat_t *tmat Transition matrix. 9 fillpen_t *fillpen Filler penalty parameter. 10 subvq_t *svq SVQ parameter. 11 gs_t *gs Gaussian selection parameters.

12 s3l mwi d32_t Start word ID. startwid 13 s3l mwi d32_t Finish word ID. finishwid

317 8$(2!$*,&%9"&9%#*@2F&*

TABLE E.1 MEMBERS OF THE GLOBAL STRUCTURE KB_T

ID Definition Description Core model structure for kb_t

1 kbcore_t *kbcore Search parameters including pronunciation dictionary, acoustic models and language models. Variables related to input speech feature

2 float32 ***feat Three dimension pointer to the input speech feature cepstrum. 3 cmn_t *cmn Parameters for the cepstral mean normalisation of the speech feature frame. Structures containing parameters 4 ascr_t *ascr Senone and composite senone scores for one frame. 5 beam_t *beam Beam pruning parameters.

6 fast_gmm_t Fast GMM computation parameters. *fastgmm 7 pl_t *pl Phoneme look-ahead parameters.

8 adapt_am_t Adaptation parameters. *adapt_am Structure to record the progress of search 9 vithist_t *vithist Viterbi history built during search (mode 4 and 5). 10 latticehist_t Viterbi history built during search (mode 3). *lathist 11 stat_t *stat Statistics including timers and counters. File handles 12 FILE *matchfp Match file handle. 13 FILE *matchsegfp Match segmentation file handle.

14 FILE Dumping hidden Markov models along debugging. *hmmdumpfp Other Variables 15 int32 op_mode Mode of the search operation. 16 char *uttid ID of the input speech. 17 void *srch Pointer to search structure.

318 TABLE E.3 MEMBERS OF THE TRANSITION MATRIX STRUCTURE TMAT_T

ID Member Definition Description

1 int32 ***tp Three dimension pointer to the transition matrices, accessed as tp[tmatid][from-state][to-state] 2 int32 n_tmat Storing the total number of transition matrices. 3 int32 n_state Storing the number of emitting states for the HMM.

TABLE E.4 MEMBERS OF THE WHOLE ACOUSTIC MODEL STRUCTURE MGAU_MODEL_T

ID Definition Description

1 int32 n_mgau Total number of senones (mixture Gaussians) in the acoustic model. 2 int32 max_comp Maximum number of components in any of the mixtures. 3 int32 veclen Vector length of the Gaussian density means.

4 mgau_t *mgau Pointer to structure for storing each of the n_mgau senones. 5 float 64 distfloor Floor value to prevent overflow of Mahalanobis distances overflowed. 6 int32 comp_type Type of computation used in this set of mixture Gaussians. 7 int32 verbose Whether to display information. 8 int32 frm_sen_evl Number of senones evaluated in the most recent frame. 9 int32 frm_gau_eval Number of Gaussians evaluated in the most recent frame.

10 int32 Number of content independent (CI) senones evaluated in frm_ci_sen_eval the most recent frame. 11 int32 Number of CI Gaussians evaluated in the most recent frm_ci_gau_eval frame. 12 int32 gau_type Type of HMM, either continuous or semi-continuous.

319 TABLE E.5 MEMBERS OF THE SINGLE ACOUSTIC MODEL STRUCTURE MGAU_T

ID Definition Description 1 int32 n_comp Number of component Gaussians in the GMM model. 2 int32 bstidx Index for the most likely Gaussians in this GMM model.

3 int32 bstscr Score for the most likely Gaussians in this mixture of component. 4 int32 updatetime The time the score for the most likely Gaussians is updated. 5 float32 **mean Two dimensional pointer to the mean vectors for one GMM model, accessed as mean[n_comp][veclen] 6 float32 **var Two dimensional pointer to the variation vectors for one GMM model, accessed as var[n_comp][veclen] 7 float32 ***fullvar Three dimensional pointers to the full co-variance matrix for one GMM. 8 float32 *lrd Pointers to the log reciprocal terms

9 int32 *mixw One dimensional pointer to the mixture weight for one GMM (integer value, which are in logs3 domain) 10 float32 *mixw_f One dimensional pointer to the mixture weight for one GMM (float values)

320 ,&!&',&'",*,&%9"&9%#*,&!&F&*

TABLE E.6 MEMBERS OF THE STATISTICS STRUCTURE STAT_T

ID Definition Description Timers for current utterance 1 ptmr_t tm_sen Senone computation 2 ptmr_t tm_srch Search. 3 ptmr_t tm_ovrhd GMM computation overhead. 4 ptmr_t tm The whole run time to process the current utterance. Counters for current utterance 5 int32 utt_sen_eval Senones evaluated. 6 int32 utt_gau_eval Gaussians evaluated.

7 int32 CI senones evaluated. utt_cisen_eval 8 int32 CI Gaussians evaluated. utt_cigau_eval 9 int32 HMM evaluated. utt_hmm_eval 10 int32 utt_wd_exit Words exits. 11 int32 nfr Frames. Counter for all utterances

12 float64 Total number of CD senones evaluated. tot_sen_eval 13 float64 Total number of CD Gaussians evaluated. tot_gau_eval 14 float64 Total number of CI senones evaluated. tot_ci_sen_eval 15 float64 Total number of CI Gaussians evaluated. tot_ci_gau_eval 16 float64 Total number of HMMs evaluated. tot_hmm_eval 17 float64 Total number of words evaluated. tot_word_exit 18 float64 tot_fr Total number of frames evaluated.

321 TABLE E.7 MEMBERS OF THE TIMER STRUCTURE PTMR_T

ID Definition Description 1 const char *name The name of the timer. 2 float64 t_cpu CPU time accumulated since most recent reset operation.

3 float64 t_elapsed Elapsed time accumulated since most recent reset operation. 4 float64 t_tot_cpu Total CPU time since the creation of the timer.

5 float64 Total elapsed time since the creation of the timer. t_tot_elapsed

,#)()#*,"(%#*,&%9"&9%#*!,"%F&*

TABLE E.8 MEMBERS OF THE TIMER STRUCTURE ASCR_T

ID Definition Description 1 int32 *senscr Pointer to senone scores in the current frame 2 int 32 *comsen Pointer to composite senone scores in the current frame 3 int32 *ssid_active The active senones in a frame.

4 int32 Composite senone active *comssid_active 5 int32 *sen_active Record whether the current state is active

6 int32 Most recent senone active state *rec_sen_active 7 int32 Cache of CI senscr in the next pl_windows frames, **cache_ci_senscr including this frame. 8 int32 Cache of best the CI senscr the next pl_windows, *cache_best_list including this frame. 9 int32 n_sen Number of senones 10 int32 n_cisen Number of CI senones. 11 int32 n_comsen Number of composite senones. 12 int32 n_sseq Number of senone sequences. 13 int32 n_comsseq Number of composite senone sequence. 14 int32 pl_win Phoeme lookahead window.

322 ,#!%".*,&%9"&9%#*,%".F&*

TABLE E.9 AUXILIARY MEMBERS OF THE SEARCH STRUCTURE SRCH _T

ID Definition Description 1 int32 op_mode Search operation mode. 2 char *uttid Input speech ID. 3 stat_t *stat Statistics including timers and counters during the search 4 kbcore_t *kbc Kbcore structure 5 ascr_t *ascr Senone and composite senone scores for one frame. 6 beam_t *beam Parameters related to beam pruning.

7 fast_gmm_t Parameters for fast GMM computation. *fastgmm 8 pl_t *pl Parameters for phoneme look-ahead.

9 adapt_am_t Parameters for adaptation. *adapt_am 10 vithist_t *vithist The viterbi history built during search (mode 4 and 5)

11 latticehist_t The viterbi history built during search (mode 3) *lathist 12 FILE *matchfp Handle for the match file. 13 FILE *matchsegfp Handle for the match segmentation file.

14 FILE Handle for dumping HMMs for debugging. *hmmdumpfp

323 TABLE E.10 GENERAL MEMBERS OF THE SEARCH STRUCTURE SRCH _T

ID Definition Description 15 int32 cache_win Length of windows for approximate search 16 int32 Start index of the window near the end of a block cahe_win_strt 17 int32 senscale Senone scale 18 int32 *ascale Pointer to array of the senone scale for the whole sentence 19 int32 *ascale_sz Length of the asenone scale array 20 int32 num_frm Number of frames of speech processed 21 int32 *segsz Pinter to array of the size of segments for each call 22 int32 *segsz_sz Size of the segments size 23 int32 num_segs Number of segments the search has decoded

TABLE E.11 FUNCTION POINTER MEMBERS OF THE SEARCH STRUCTURE SRCH _T

ID pointer name Description Function pointers for preparation 24 srch_init Initialise search 25 srch_uinit Un-initialise search 26 srch_utt_begin Begin search for one utterance 27 srch_utt_end End search for one utterance 28 srch_decode Recognition function 29 srch_set_lm Set language model operation 30 srch_add_lm Add language model operation 31 srch_delete_lm Delete language model operation 32 srch_read_fsgfile Read finite-state-grammer operation

324 ID pointer name Description Function pointers for approximate search 33 srch_gmm_compute_lv1 Compute CI GMM scores 34 srch_one_srch_frame_lv1 Search one frame of speech 35 srch_hmm_compute_lv1 Compute the CI HMM scores 36 srch_eval_beams_lv1 Compute the beams 37 srch_propagate_graph_ph_lv1 Propagate the graph at phone level 38 srch_propagate_graph_wd_lv1 Propagate the graph at word level Function pointers for detail search 39 srch_gmm_compute_lv2 Compute CD GMM scores 40 srch_one_srch_frame_lv2 Search one frame of speech 41 srch_hmm_compute_lv2 Compute CD HMM scores 42 srch_eval_beams_lv2 Compute beams 43 srch_propagate_graph_ph_lv2 Propagate graph in phone level 44 srch_propagate_graph_wd_lv2 Propagate graph in word level Function pointers for other purposes 45 srch_select_active_gmm Select active GMM for next frame 46 srch_rescoring Re-compute score 47 srch_compute_heuristic Compute score heuristically 48 srch_frame_windup Window up input speech frame 49 srch_shift_one_cache_frame Shift one cache frame Function pointers for output 50 srch_gen_hyp Generate hypothesis of the input speech 51 srch_gen_dag Generate directed acyclic graph 52 srch_dump_vithist Dump details of vithist tree 53 srch_bestpath_impl Generate best search path 54 srch_dag_dump Dump details of directed acyclic graph

325 +"0$ 81),2(.)$!)!59%(%$

$#5#$*=*+9)"&'()*1!')*

¾ Definition: int32 mai n ¾ Description: Defined in decode.c. The main function of the Sphinx 3 speech recogniser sphi nx_decode. ¾ Inputs: The parameters required by Sphinx 3, which are specified by command lines. ¾ Outputs: Return zero if the recognition is success. ¾ Procedures: 1) Print the hostname, the directory, compile time of the recogniser, by calling function print_appl_info, defined in info.h 2) Initialise the command line with default values, by calling function cmd_ln_appl_enter, defined in cmd_ln.c 3) Set the memory usage limit to the maximum of the system, by calling function unlimit, defined in unlimit.c 4) Initialise the main structure kb_t, by calling function kb_init, defined in kb.c 5) Choose one of the following operations a) If -ctl is specified in command line, process a list of input speech utterances, by calling function ctl_process, defined in corpus.c b) If -utt is specified in command line, process a single input speech utterance, by calling function ctl_process_utt, defined in corpus.c c) If neither -ctl nor -utt is specified in the command line, report error. 6) Report the statistics information for the recognition, by calling function stat_report_corpus, defined in sta.c 7) Free the main structure kb_t, by calling function kb_free, defined in kb.c 8) Free the command line initialisation, by calling function cmd_ln_appl_exit, defined in cmd_ln.c

326 $#5#$*>*+9)"&'()*"&$F-%("#,,*

¾ Definition: ptmr_t ctl_process (char *ctlfile, char *ctllmfile, char * ctlmllrfile, int32 nskip, int32 count, void (*func) (void *kb, char *uttfile, int32 sf, int32 ef, char *uttid ), void *kb ); ¾ Description: Defined in corpus.c. Process a list of speech utterances which are listed in the control file specified by command line option -ctl. ¾ Inputs: 1) The name of the control file that specifies the list of speech utterances 2) The name of the control file that specifies the language model for the corresponding utterances 3) The name of the control file that specify the mllr for the corresponding utterances 4) The number of entries to be skipped at the beginning of the control file 5) The number of entries to be processed after those skipped ones in the control file 6) The function used to recognise each of the speech utterances. 7) Pointer to the global structure kb_t ¾ Outputs: Statistical information of the whole recognition process. ¾ Procedures: 1) Check whether the specified control files for the speech utterances, the language models and mllr can be opened or not. If not exit the program. 2) Reset the timer, by calling function ptmr_init, defined in profile.c 3) Skip the entries of the control files by calling function ctl_read_entry, defined in corpus.c 4) For each of the entries in the control files, perform the following steps: a) Read the entry of control file, by calling function ctl_read_entry, defined in corpus.c b) Start the timer, by calling function ptmr_start, defined in profile.c c) Recognise the speech utterance, by calling function utt_decode, defined in utt.c d) Stop the timer, by calling function ptmr_stop, defined in profile.c 327 e) Report the statistics information for the current utterance f) Reset the timer, by calling function ptmr_init, defined in profile.c 5) Close the control files. 6) Return the statistical information.

$#5#$*0*+9)"&'()*9&&F3#"(3#*

¾ Definition: void utt_decode (void *data, utt_res_t *ur, int32 sf, int32 ef, char *uttid ); ¾ Description: Defined in utt.c. Compute the feature vectors of the input speech and perform recognition. ¾ Inputs: 1) The global structure kb_t 2) The utterance resource structure utt_rest_t 3) Start frame for decoding 4) End frame for decoding 5) The ID of utterance ¾ Outputs: None. ¾ Procedures: 1) Read the input speech cepstral file and build feature vectors for the entire utterance, by calling function feat_s2mfc2feat, defined in feat.c 2) Record the total number of frames in the input speech utterance if the feature vectors are built successfully, otherwise report an error 3) Initialise the recognition of the utterance, by calling function utt_begin, defined in utt.c 4) Recognise a block of incoming feature vectors, by calling function utt_decode_block, defined in utt.c 5) Finish the recognition of the utterance, by calling function utt_end, defined in utt.c 6) Update the total number of frames processed, defined in the statistics structure stat_t.

328 $#5#$*?*+9)"&'()*9&&F3#"(3#F2$@*

¾ Definition: void utt_decode_blk (float ***block_feat, int32 no_frm, int32 *curfrm, kb_t *kb); ¾ Description: Defined in utt.c. Call the next layer function to recognise a block of incoming feature vectors. ¾ Inputs: 1) The incoming block of feature vectors 2) The number of feature vectors (frames) in the current speech utterance 3) The total number of speech frames recognised so far for all speech utterances 4) The global structure kb_t ¾ Outputs: None. ¾ Procedures: 1) Recognise the incoming block of feature vectors, by calling function srch_utt_decode_blk, defined in sr ch.c 2) Report an error if the recognition is not successful.

$#5#$*B*+9)"&'()*,%".F9&&F3#"(3#F2$@*

¾ Definition: int32 srch_utt_decode_blk (sr ch_t *s, float ***block_feat, int32 block_nfeatvec, int32 curfrm); ¾ Description: Defined in sr ch.c. This is the controlling function for the search operations in the recognition of the input speech. ¾ Inputs: 1) The search structure sr ch_t 2) The incoming block of feature vectors 3) The number of feature vectors (frames) in the current speech utterance 4) The total number of speech frames recognised so far for all speech utterances ¾ Outputs: Return success if all the utterances have been recognised successfully. ¾ Procedures: 1) The user-defined search function is used to replace the default search operations described below, if it is provided 2) Start the timer used to compute the senone overhead, by calling function ptmr_start, defined in profile.c 329 3) Compute the CI senone scores for the first few frames of the speech, by calling function approx_ci_gmm_compute, defined in gmm_wrap.c 4) Stop the senone overhead timer, by calling function ptmr_end, defined in profile.c 5) For each frame of the input speech frames, the following operations are performed: a) Start the timer for computing the senone, by calling function ptmr_start, defined in profile.c b) Select the active GMM for this frame, by calling function srch_TST_select_active_gmm, defined in srch_time_switch_tree.c c) Compute the CD senone scores for the current frame, by calling function s3_cd_gmm_compute_sen_comp, defined in gmm_wrap.c d) Stop the timer for computing senone, by calling function ptmr_end, defined in profile.c e) Start the timer for graph searching, by calling function ptmr_start, defined in profile.c f) Use the user-defined function to search for one frame, if it is provided. g) Compute the HMM score heuristically if reuired, by calling function srch_TST_compute_heuristic, defined in srch_time_switch_tree.c h) Compute the HMM score, by calling function srch_TST_hmm_compute_lv2, defined in srch_time_switch_tree.c i) Propagate the tokens at the phone-level, by calling function srch_TST_propagate_graph_ph_lv2, defined in srch_time_switch_tree.c j) Return search failure if the search in phone-level is not successful k) Propagate the tokens at the word-level, by calling function srch_TST_propagate_graph_wd_lv2, defined in srch_time_switch_tree.c l) Return search failure if the search in word-level is not successful m) Stop the timer for graph searching, by calling function ptmr_end, defined in profile.c n) Start the timer for computing the senone overhead, by calling function ptmr_start, defined in profile.c o) Shift one frame or skipped to the next frame, by calling function srch_TST_shift_one_cache_frame, defined in srch_time_switch_tree.c

330 p) Compute the CI senone scores for the current frame of speech, by calling function approx_ci_gmm_compute, defined in gmm_wrap.c q) Stop the timer for computing senone and overhead, by calling function ptmr_end, defined in profile.c 6) Update the total number of frames have been processed

$#5#$*C*+9)"&'()*!--%(/F"'F811F"(1-9&#*

¾ Definition: int32 approx_ci_gmm_compute (void *srch, float32 *feat, int32 cache_idx, int32 wav_idx ); ¾ Description: Defined in gmm_wrap.c. This function is responsible for computing the CI senone scores. ¾ Inputs: 1) The search structure sr ch_t 2) Feature vector for one frame of speech 3) Cache index 4) Frame index ¾ Outputs: Return success if the current frame has been processed successfully. ¾ Procedures: 1) Compute the CI senone scores, by calling function approx_cont_mgau_ci_eval, defined in approx_cont_mgau.c 2) Update the number of CI senones evaluated in current frame 3) Update the number of CI Gaussians evaluated in current frame

$#5#$*D*+9)"&'()*!--%(/F"()&F18!9F"'F#5!$*

¾ Definition: void approx_cont_mgau_ci_eval (kbcore_t *kbc, fast_gmm_t *fg, mdef_t *mdef, float32 *feat, int32 *ci_senscr, int32 *best_score, int32 fr ); ¾ Description: Defined in approx_cont_mgau.c. This function evaluates the approximate CI senone scores for one frame of input speech. ¾ Inputs: 1) The core structure kbcore_t

331 2) The structure holding the parameters for fast GMM computation 3) The model definition structure 4) Feature vector for one frame of speech 5) Array to store the CI scores calculated for current frame 6) Variable to store the best score 7) The frame index ¾ Outputs: None ¾ Procedures: 1) Find out the list of CI senones, by calling function mdef_is_cisenone, defined in mdef.c 2) For each of the CI senones, compute the senone score, by calling function approx_mgau_eval, defined in approx_cont_mgau.c 3) Store each CI senone scores 4) Find out the best CI senone score 5) Update the number of Gaussians evaluating in the current frame of speech 6) Update the number of senones evaluated for the current frame of speech

$#5#$*C*+9)"&'()*,0F"3F811F"(1-9&#F,#)F"(1-*

¾ Definition: int32 s3_cd_gmm_compute_sen_comp (void *srch, float32 **feat, int32 wav_idx); ¾ Description: Defined in gmm_wrap.c. This function is responsible for computing the CD senone scores and composite senone scores by calling its children function. ¾ Inputs: 1) The search structure sr ch_t 2) Feature vector for one frame of speech 3) The frame index ¾ Outputs: Return success if the current frame has been processed successfully. ¾ Procedures: 1) Compute the CD senone scores, by calling function s3_cd_gmm_compute_sen, defined in gmm_wrap.c 2) Compute the composite senone scores from the senone scores, by calling function dict2pid_comsenscr, defined in dict2pid.c

332 $#5#$*C2*+9)"&'()*,0F"3F811F"(1-9&#F,#)*

¾ Definition: int32 s3_cd_gmm_compute_sen (void *srch, float32 **feat, int32 wav_idx ); ¾ Description: Defined in gmm_wrap.c. This function is responsible for computing the CD senone scores by calling its children function. ¾ Inputs: 1) The search structure sr ch_t 2) Feature vector for one frame of speech 3) The frame index ¾ Outputs: ¾ Procedures: 1) Depending on the type of acoustic model provided, call the appropriate function to compute the CD senone scores a) if the model is multi-stream, use ms_cont_mgau_frame_eval, defined in ms_mgau.c b) if the model is semi-continuous, use s2_semi_mgau_frame_eval, defined in s2_semi _mgau.c c) if the model is continuous, use approx_cont_mgau_frame_eval, defined in approx_cont_mgau.c 2) Update the number of CI senones evaluated in current frame 3) Update the number of CI Gaussians evaluated in current frame

$#5#$*D*+9)"&'()*!--%(/F"()&F18!9F+%!1#F#5!$*

¾ Definition: int32 approx_cont_mgau_frame_eval (kbcore_t *kbc, fast_gmm_t *fastgmm, ascr_t *a, float32 *feat, int32 frame, int32 *cache_ci_senscr, ptmr_t *tm_ovrhd); ¾ Description: Defined in approx_cont_mgau.h. This function evaluates the approximate CD senone scores for one frame of input speech. The fast GMM computation techniques are implemented in this function. ¾ Inputs: 1) The core structure kbcore_t 2) The structure holding the parameters for fast GMM computation 333 3) The structure used to store all of the senones scores 4) Feature vector for one frame of speech 5) The current frame number 6) The CI scores for current frame 7) The timer for recording the computational overhead ¾ Outputs: The best CD senone scores for current frame ¾ Procedures: 1) Start the timer for computing the overhead, by calling function ptmr_start, defined in profile.c 2) Prepare the different parameters for different fast GMM computation techniques. 3) Stop the timer for computing the overhead, by calling function ptmr_end, defined in profile.c 4) Decide if the current frame should be skipped by calling function approx_isskip, defined in approx_cont_mgau.c 5) If the current frame should be skipped, then tighten the beam used for Gaussian selection 6) For each of the senones, do the following steps: a) for CI senones, the scores are copied from the CI senones computation b) for CD senones, those that are active are computed and the process is as following i) ,I WKH VFRUH IRU &' VHQRQH¶V SDUHQW &, VHQRQH LV ZLWKLQ WKH EHDP compute the senone score, by calling function approx_mgau_eval, defined in approx_cont_mgau.c ii) ,IWKHVFRUHIRU&'VHQRQH¶VSDUHQW&,VHQRQHLVRXWRIWKHEHDP then the CD senone score is equal to its parent CI senone score c) Compare the senone scores with the current best value and keep the best score. 7) Update the number of CI senones evaluated in the current frame 8) Update the number of CI Gaussians evaluated in the current frame

334 $#5#$*I*+9)"&'()*!--%(/F18!9F#5!$*

¾ Definition: int32 approx_mgau_eval (gs_t *gs, subvq_t *svq, mgau_model _t *g, fast_gmm_t *fastgmm, int32 s, int32 *senscr, float32 *feat, int32 best_cid, int32 svq_beam, int32 fr ); ¾ Description: Defined in approx_cont_mgau.c. This function computes the senone scores for the given senone and for one frame of input speech. The fast GMM computation technique is also implemented in this function. ¾ Inputs: 1) The Gaussian selector structure gs_t 2) The sub-vector quantisation structure subvq_t 3) The structure holding the parameters for one senone 4) The fast GMM computation structure fast_gmm_t 5) The index of the senone 6) The array to store the senone scores 7) Feature vector for one frame of speech 8) The best codebook index used in Gaussian selector 9) The beam for sub-vector quantisor 10) The frame number ¾ Outputs: The number of Gaussians computed ¾ Procedures: 1) Determine the number of Gaussians to be used for computation using of the functions below a) if Gaussian selection is to be used, call function gs_mgau_shortlist, defined in gs.c b) if sub-vector quantisation is to be used, call function subvq_mgau_shortlist defined in subvq.c c) if none technique used for fast GMM computation, use function mgau_n_comp defined in cont_mgau.h 2) Compute the senone score by calling one of the following functions a) if sub-vector quantisation is used, call function subvq_mgau_eval defined in subvq.c b) otherwise, use the function mgau_eval defined in cont_mgau.c 335 3) If the senone score is out of range, recomputed the score without the Gaussian selection 4) Save the senone secure

$#5#$*J*+9)"&'()*18!9F#5!$*

¾ Definition: int32 mgau_eval (mgau_model _t *g, int32 m, int32 *active_comp, float32 *x, int32 fr, int32 bUpdBstldx); ¾ Description: Defined in cont_mgau.c. The mgau_eval function is used to compute the senone score by measuring the Mahalanobis distance between a given senone and one frame of input speech. ¾ Inputs: 1) The entire Gaussian mixture model for this senone 2) The senone index 3) An array that includes the information specifying which component of the GMM model is used in the evaluation. 4) The feature vector for one frame of input speech 5) The frame number 6) A indicator to decide whether the best index for the GMM is updated ¾ Outputs: The senone score ¾ Procedures: 1) Initialise the best score to S3_LOGPROB_ZERO, defined in s3types.h 2) Initialise the index for the best score to NO_BSTIDX, defined in s3types.h 3) Set the update time for the best score to the frame number 4) By default, Evaluate all components of the GMM to produce the senone score 5) If the active list is provided, evaluate the active components of GMM to produce the senone score. 6) Print a warning if the computed score is equal to S3_LOGPROB_ZERO. 7) Return the score

336

APPENDIX F LIST OF DEVELOPED BASH SCRIPT

8"#$ %+22()/$1&$%&'()*$6$-+,./)(2(.)$%9%2+3$

0.create-folder

1.install-language-model-trainer

2.install-acoustic-model-trainer

3.install-speech-recogniser

4.install-aligner

5.1-setup-database-an4

5.2-setup-database-rm1

5.3-setup-database-timit

6.train-language-model-timit

7.0-train-acoustic-model

7.1-train-acoustic-model-an4

7.2-train-acoustic-model-rm1

7.3-train-acoustic-model-timit

8.1-recognise-speech-by-perl-script

337 8"0$ .&2(3(%()/$%&'()*$6$-+,./)(2(.)$%9%2+3$

computeacc.pl

9.0.0-recognise-speech

9.0.1-recognise-speech-an4

9.1.1-recognise-speech-an4

9.0.2-recognise-speech-rm1

9.1.2-recognise-speech-rm1

9.0.3-recognise-speech-timit

9.1.3-recognise-speech-timit

8"6$ -+,.3&(5()/$%&'()*$6$8.-$&-.8(5()/$

10- recompile-sphinx3-for-gprof

10.1-copy-sphinx3-source-code-copysrc.pl

10.2-compile-sphinx3-source-code-Makefile

338 8";$ ,.3&(5()/$%,!5+4$()2+/+-$:+-%(.)$.8$%&'()*$6$

11.0-create-modifed-sphinx3

11.1-recompile-modifed-sphinx3

11.2-recompile-modifed-sphinx3-export-data

8"<$ :+-(8(,!2(.)$.8$%,!5+4$()2+/+-$:+-%(.)$.8$%&'()*$6$

12.0.1-verify-one-recogniser-an4

12.1.1-verify-inc-recogniser-an4

12.2.1-verify-top5-recognition-an4

12.0.2-verify-one-recogniser-rm1

12.1.2-verify-inc-recogniser-rm1

12.2.2-verify-top5-recognition-rm1

12.0.3-verify-one-recogniser-timit

12.1.3-verify-inc-recogniser-timit

12.2.3-verify-top5-recognition-timit

339

APPENDIX G LIST OF DEVELOPED C CODE

/"#$ %,!5+4$()2+/+-$:+-%(.)$.8$!"#$%&'#()

Added head files:

mf_struct.h s3float.h mgauandfeat.h s3database.h

Modified head file:

cont_mgau.h

Added C files:

s3float.c mgauandfeat.c s3database.c

Modified C files:

cont_mgau.c utt.c decode.c

srch_time_switch_tree.c

/"0$ +*&.-2()/$-+=1(-+4$4!2!$8.-$!"#$%&'#()

Added head files:

mf_struct.h s3database.h

340 Added C file:

s3database.c

Modified C files:

logs3.c utt.c decode.c

/"6$ ()4+&+)4+)2$:+-%(.)$.8$!"#$%&'#()

Added header files:

mf_struct.h mgauandfeat.h mgaueval.h

Added C files:

mgauandfeat.c mgaueval.c

mgauandfeat_test.c mgaueval_test.c

mf_float2int.c mf_float2long.c

mf_bin2vhd_int.c

mf_bin2log_float.c mf_bin2log_int.c mf_bin2log_long.c

341 /";$ $+37+44+4$:+-%(.)$.8$!"#$%&'#()

Added header files:

mf_struct.h mgauandfeat.h mgaueval.h

sdram.h mytime.h

Modified head files:

platform.h core.h logic.h

Added C files :

mgauandfeat.c mgaueval.c

mgaueval_SDRAM.c

342

APPENDIX H LIST OF DEVELOPED VHDL CODE

'"#$ %+).)+$%,.-+$3.415+$>%%3?$

Package for parallel implementation:

package_parallel.vhd

package_parallelnumber.vhd package_featmgaunumber.vhd

Delay module:

m_delay.vhd m_delaybit.vhd

Basic calculation module (BCM):

m_fmv.vhd

Parallel adder module (PAM):

m_fmvadd.vhd m_addxx.vhd m_fmvxx.vhd

DVAL calculation module (DCM):

m_dvalsub.vhd m_dvalxx.vhd

Gaussian score calculation module (GSCM):

m_gauscr.vhd m_gauscrxx.vhd

343 Senone score calculation module (SSCM):

m_scoreadd.vhd

m_logadd.vhd m_pqdr_nreg.vhd

m_addtbladd_nreg.vhd package_addtbl.vhd

Senone score module (SSM):

m_scorexx.vhd

'"0$ .)@,'(&$-.3$4!2!$3.415+$>-.3@43?$

Package for parallel implementation:

package_parallel.vhd

package_parallelnumber.vhd package_featmgaunumber.vhd

ROM read control module (ROM-RCTL):

m_fmci.vhd

ROM for speech feature vector (ROM-FEAT):

m_rom_featxx.vhd package_fcaw_an406-fcaw-b.int8

ROM for senone data (ROM-SENONE):

m_rom_mgauxx.vhd

344 ROM for mean vector (ROM-MEAN):

m_rom_meanxx.vhd package_mgau_mean.int8

ROM for variance vector (ROM-VAR):

m_rom_varxx.vhd package_mgau_var.int8

ROM for DVAL value (ROM-DVAL):

m_rom_dval.vhd package_mgau_dval.int8

ROM for mixture weight (ROM-MIXW):

m_rom_mixw.vhd package_mgau_mixw.int8

On-chip ROM data module (ROM-DM):

m_dataxx.vhd

Senone score module with On-chip ROM data module (SSM-ROM):

m_dataxxscorexx.vhd

'"6$ .)@,'(&$7-!3$4!2!$3.415+$>7-!3@43?$

Package for parallel implementation:

package_parallel.vhd

package_parallelnumber.vhd package_featmgaunumber.vhd

345 BRAM read control module (BRAM-RCTL):

m_bram_fmci.vhd

BRAM for speech feature vector (BRAM-FEAT):

m_bram_featxx.vhd

BRAM for senone data (BRAM-SENONE):

m_bram_mgauxx.vhd

Basic BRAM module (BRAM-BASIC):

m_bram_unit.vhd m_bram_wctl.vhd

Parallel BRAM module (BRAM-PARALLEL):

m_bram_unitxx.vhd m_bram_wctlxx.vhd m_bram_dp1w1r

On-chip BRAM data module (BRAM-DM):

m_bram_dataxx.vhd

Senone score module with On-chip BRAM data module (SSM-BRAM):

m_bram_dataxxscorexx.vhd

'";$ 2+%2$7+),'$

For ROM-DM:

test_m_dataxx.vhd test_m_dataxx-b.vhd

346 do_test_m_dataxx.do

For SSM:

test_m_scorexx.vhd do_test_m_scorexx.do

For SSM-ROM:

test_m_dataxxscorexx.vhd test_m_dataxxscorexx-b.vhd

do_test_m_dataxxscorexx.do do_test_m_dataxxscorexx-b.do

For BRAM:

m_amba_data.vhd m_load_data.vhd

package_amba.int8.vhd package_amba.int40.vhd

test_m_bram_dataxx.vhd test_m_bram_dataxx-b.vhd

do_test_m_bram_dataxx-1.do do_test_m_bram_dataxx-b1.do

do_test_m_bram_dataxx-2.do do_test_m_bram_dataxx-b2.do

For SSM-BRAM:

test_m_bram_dataxxscorexx.vhd test_m_bram_dataxxscorexx-b.vhd

do_test_m_bram_dataxxscorexx-1.do

do_test_m_bram_dataxxscorexx-b1.do

do_test_m_bram_dataxxscorexx-2.do

do_test_m_bram_dataxxscorexx-b2.do

347 '"<$ !37!$!&7$%5!:+$

For SSM-ROM:

APBSlave_bram_dataxxscorexx.vhd

For SSM-BRAM:

APBSlave_dataxxscorexx.vhd

User constrain file:

apbslave.ucf

Other essential files for building AMBA AHB bus:

AHBAHBTop.vhd

AHBDecoder.vhd AHBMuxS2M.vhd

AHMDefaultSlave.vhd AHBAPBSys.vhd

AHB2APB.vhd AHBZBTRAM.vhd

348

APPENDIX I I NTEGRATOR/AP SYSTEM

This appendix gives the background information about the Integrator/AP system [120] and the AMBA bus [129] used for communication between the core module and logic module in the Integrator/AP system.

("#$ ,5.,A$%(/)!5$

The most important clock signals in the Integrator/AP system are listed in Table I.1.

TABLE I.1 CLOCK USED IN INTEGRATOR/AP EVALUATION BOARD

Name Location Range Description SYSCLK Motherboard 3 to 50 MHz System bus clock CORECLK Core module 12 to 160 MHz Processor core clock LCLK Core module 6 to 66 MHz Local memory bus clock REFCLK Core module 24MHz Reference clock CLK1, CLK2 Logic module 1 to 160 MHz On board clock generator

The SYSCLK is used for the system bus. The range of SYSCLK is controlled by oscillator register SC_OSC (address 0x11000004), which is locked by lock register SC_LOCK (address 0x1100001C). The SC_OSC should be unlocked by writing 0x0000A05F to SC_LOCK before its value can be changed. After change, write any value rather than 0x0000A05F to look the SC_OSC.

349 The CORECLK is used for the ARM9 processor, and LCLK is used for access the local SDRAM memory. The range of CORECLK and LCLK is controlled by oscillator register CM_OSC (address 0x10000008), which is locked by lock register CM_LOCK (address 0x10000014). The CM_OSC should be unlocked by writing 0x0000A05F to CM_LOCK before its value can be changed. After change, write any value rather than 0x0000A05F to look the CM _OSC.

REFCLK is a reference clock running at 24 MHz, which can be used for the core module reference clock cycle counter register (CM_REFCNT, address 0x10000014). The CM_REFCNT is a 32-bit count value and counts up at the fixed reference clock frequency of 24MHz, which can be used as a real-time counter.

The CLK1 and CLK2 are the user-defined clock generators in the FPGA. The configuration of them is dependent on the design implemented in FPGA.

("0$ 3+3.-9$3!&$

In order to get access to the resources available on the Integrator/AP system, its memory map is introduced below. There are four main regions in the system memory map, as shown in Figure I.1. The first address region from 0x00000000 to 0x0FFFFFFF is used for the resources that are available on the motherboard, including the external bus interface (EBI), system control and peripheral registers, which are shown in Figure I.2. For the first 256MB space, the core module and logic module consider it in a different way. For core module, the address can be used to access its local synchronous static random access memory (SSRAM) or SDRAM. For logic module, this space is just the alias to the EBI. The peripheral component interconnect (PCI) address range 0x40000000 to 0x7FFFFFFF is used for PCI bus access on the Integrator/AP motherboard.

350

FIGURE I.1TOP LEVEL INTEGRATOR/AP MEMORY MAP [126]

FIGURE I.2 MEMORY MAP FOR MOTHERBOARD [126]

The core module alias memory region can also be further divided into four parts, with each part representing the SDRAM located on each core module, as shown in Figure I.3. Those alias addresses are used to access the SDRAM from the system bus by any of the core modules. Each of those SDRAM can also be accessed through its local address by its hosting core module, which is 0x00000000 to 0x0FFFFFFF as shown in Figure I.3.

351

FIGURE I.3 MEMORY MAP FOR CORE MODULE [126]

The logic module region can be further divided into four parts, with each part representing one logic module, as shown in Figure I.4. Each logic module has its own address decoder to decode its own address space. The resources available on each logic module are mainly dependent on the designs implemented in it.

FIGURE I.4 MEMORY MAP FOR LOGIC MODULE [126]

352 ("6$ !37!$71%$$

AHB is a new generation of AMBA bus for high-performance system. ASB is the first generation of AMBA system bus. They are suitable for bus masters, on-chip memory blocks, external memory interfaces, high-bandwidth peripherals, direct memory access (DMA) slave peripherals. Either AHB or ASB should be chosen as the system bus. AMBA APB is an extension to the system bus that builds on either AHB or ASB, which is suitable for any peripherals that are low power, low bandwidth and not requiring the high performance of a pipelined bus interface. It is encapsulated as a single AHB or ASB slave device. The typical AMBA system is shown in Figure I.5.

FIGURE I.5 A TYPICAL AMBA SYSTEM [126]

Four types of component are essential in a typical AMBA AHB-based system: the AHB master, AHB slave, AHB arbiter and AHB decoder. Multiple AHB masters and slaves can be connected to the AMBA bus; a system with three masters and four slaves is shown in Figure I.6.


FIGURE I.6 THE AMBA AHB SYSTEM WITH THREE MASTERS AND FOUR SLAVES [126]

The AHB master provides the address and control information to initiate a read or write operation. The AHB slave responds to the read or write operation and returns the transfer status. The AHB arbiter ensures that only one master can initiate a read or write operation at any given time. The AHB decoder decodes the transfer address to provide the select signal for the slave involved.

For the Integrator/AP system, the ARM9 processor in each core module is considered the AHB master, while the other resources on the core module, motherboard and logic module are considered AHB slaves. The AHB arbiter is implemented in the system controller FPGA on the motherboard. Because the Integrator uses a distributed address decoding scheme, the AHB decoder is divided into two parts, a central decoder and individual decoders. The central decoder, implemented in the system controller FPGA, responds to the full address space that is not occupied by peripherals. Once a core module or logic module is fitted, the response from the central decoder is blocked and the individual decoder implemented in that module's FPGA responds to its own address area.

AHB SLAVE

The interface of a typical AHB slave module according to the AMBA specification is shown in Figure I.7. The details of the inputs and outputs can be found in Table I.2.

FIGURE I.7 THE INTERFACE OF AHB SLAVE MODULE [129]

There are four transfer types defined in HTRANS[1:0], which are explained below:

1) NONSEQUENTIAL means a single transfer or the first transfer of a burst.

2) SEQUENTIAL means the remaining transfers in a burst.

3) IDLE means no data transfer is required.

4) BUSY means that the master is continuing the burst but the next transfer cannot take place immediately.

For the transfer burst type HBURST[2:0], four-, eight- and sixteen-beat bursts, as well as single transfers and undefined-length bursts, are defined. The four-, eight- and sixteen-beat bursts can operate as incrementing bursts or wrapping bursts: in an incrementing burst each transfer address is simply an increment of the previous address, while in a wrapping burst the address wraps when the boundary is reached. The corresponding signal encodings are summarised in the sketch below.
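
For reference, the HTRANS and HBURST encodings, as defined in the AMBA 2.0 AHB specification, can be written out as C constants; the names are chosen to match the transfer and burst types just described.

/* HTRANS[1:0] transfer types and HBURST[2:0] burst types as encoded
 * in the AMBA 2.0 AHB specification.                                 */
enum ahb_htrans {
    HTRANS_IDLE   = 0x0,   /* no data transfer is required                 */
    HTRANS_BUSY   = 0x1,   /* burst continues but next transfer is delayed */
    HTRANS_NONSEQ = 0x2,   /* single transfer or first transfer of a burst */
    HTRANS_SEQ    = 0x3    /* remaining transfers in a burst               */
};

enum ahb_hburst {
    HBURST_SINGLE = 0x0,   /* single transfer                       */
    HBURST_INCR   = 0x1,   /* incrementing burst, undefined length  */
    HBURST_WRAP4  = 0x2,   /* 4-beat wrapping burst                 */
    HBURST_INCR4  = 0x3,   /* 4-beat incrementing burst             */
    HBURST_WRAP8  = 0x4,   /* 8-beat wrapping burst                 */
    HBURST_INCR8  = 0x5,   /* 8-beat incrementing burst             */
    HBURST_WRAP16 = 0x6,   /* 16-beat wrapping burst                */
    HBURST_INCR16 = 0x7    /* 16-beat incrementing burst            */
};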

TABLE I.2 INPUTS AND OUTPUTS FOR AHB SLAVE [129]

Name          Type    Description
HCLK          Input   Clock signal for the bus transfer, valid on the rising edge.
HRESETn       Input   Reset signal for the bus transfer, active when LOW.
HSELx         Input   Slave select signal, indicates whether the current transfer is intended for this slave.
HADDR[31:0]   Input   Address bus.
HWRITE        Input   Transfer direction: write transfer when HIGH, read transfer when LOW.
HTRANS[1:0]   Input   Transfer type, indicates the type of the current transfer.
HSIZE[2:0]    Input   Transfer size, typically 8-bit, 16-bit or 32-bit; the AMBA system allows up to a maximum of 1024 bits.
HBURST[2:0]   Input   Transfer burst type, indicates whether the current transfer is part of a burst.
HWDATA[31:0]  Input   Write data bus, carries data from master to slave.
HREADY        Output  Transfer done: the transfer has completed when HIGH and must be extended when LOW.
HRESP[1:0]    Output  Transfer response, provides additional information about the transfer status.
HRDATA[31:0]  Output  Read data bus, carries data from slave to master.

There are four types of transfer response defined in HRESP[1:0], which are explained below (their encodings are summarised after the list):

1) OKAY means the transfer has completed successfully.

2) ERROR means the transfer is unsuccessful.

3) RETRY means the transfer has not yet completed; the master should continue to retry the transfer until it completes.

4) SPLIT means the transfer has not yet completed; the master should retry the transfer when it is next granted access to the bus.
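
Again for reference, the HRESP[1:0] encodings defined in the AMBA 2.0 AHB specification are summarised below in the same style as the constants above.

/* HRESP[1:0] transfer responses as encoded in the AMBA 2.0 AHB
 * specification, matching the four cases listed above.            */
enum ahb_hresp {
    HRESP_OKAY  = 0x0,   /* transfer completed successfully                     */
    HRESP_ERROR = 0x1,   /* transfer was unsuccessful                           */
    HRESP_RETRY = 0x2,   /* not complete: master retries until it completes     */
    HRESP_SPLIT = 0x3    /* not complete: master retries when next granted bus  */
};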

The AHB slave for the logic module in the Integrator/AP system is slightly different from the one shown in Figure I.7. In the Integrator/AP system, the slave select signal HSELx is not provided as an input but is generated by the logic module's own individual decoder from the input address HADDR. Another difference is that the Integrator/AP system uses a bidirectional bus, HDATA, in place of the separate write bus HWDATA and read bus HRDATA. The rest of the input and output signals shown in Figure I.7 are all used by the AHB slave in the Integrator/AP system.

The AHB transfer (either write or read) consists of two phases:

1) Address phase, in which the address and the control signals are provided. This phase lasts for only one cycle and cannot be extended, so the slave must sample the address and the control signals during this time.

2) Data phase, which can last for one or more cycles. It can be extended by driving HREADY LOW, which indicates that the slave requires extra time to sample the data in a write transfer or to provide the data in a read transfer.

An example of multiple AHB two-phase transfers is shown in Figure I.8.

FIGURE I.8 EXAMPLE OF MULTIPLE TRANSFERS FOR AHB SLAVE [129]

Address A and its control signals are available at T2. The corresponding write data A or read data A is available in the next clock cycle, T3, because HREADY is HIGH. For address B and its control signals, which are available at T3, HREADY is LOW at T4, so an extra cycle is inserted for this transfer: write data B is held for an extra cycle until T5, or read data B does not become available until T5. If HREADY is still LOW at T5, another cycle is inserted to extend the transfer. While the transfer for address B is extended, the address phase for address C is also extended.

The address phase of any transfer takes place during the data phase of the previous transfer. This overlapping of the address and data phases is fundamental to the AMBA system: it gives the bus its fully pipelined nature and leaves sufficient time for the slave to respond.

When a BUSY transfer type is inserted on HTRANS[1:0] during a burst, the transfer sequence is as shown in Figure I.9. The BUSY type inserted at T3 causes the address phase of the transfer at T3 to be extended into the next cycle, T4, and the read or write data to become available one cycle later (at T5).

FIGURE I.9 EXAMPLE OF TRANSFERS WITH BUSY TRANSFER TYPE FOR AHB SLAVE [129]

Any successful transfer completes with an OKAY transfer response on HRESP[1:0] and HREADY HIGH. When the transfer response is RETRY with HREADY LOW, the transfer sequence is as shown in Figure I.10. The RETRY that appears at T3 results in an IDLE transfer type being inserted at T4, after which the master restarts the data transfer at T5.

FIGURE I.10 EXAMPLE OF TRANSFERS WITH RETRY RESPONSE FOR AHB SLAVE [129]

APB SLAVE

The interface of a typical APB slave module according to the AMBA specification is shown in Figure I.11. The details of the inputs and outputs can be found in Table I.3. Compared to the AHB slave interface, the APB slave interface is much simpler.

FIGURE I.11 THE INTERFACE OF APB SLAVE MODULE [129]

TABLE I.3 INPUTS AND OUTPUTS FOR APB SLAVE [129]

Name      Type    Description
PCLK      Input   Clock signal for the bus transfer, valid on the rising edge.
PRESETn   Input   Reset signal for the bus transfer, active when LOW.
PSELx     Input   Slave select signal, indicates whether the current transfer is intended for this slave.
PENABLE   Input   Strobe signal, indicates the second phase of an APB transfer, when the data is ready to be sampled.
PADDR     Input   Address bus.
PWRITE    Input   Transfer direction: write transfer when HIGH, read transfer when LOW.
PWDATA    Input   Write data bus, carries data from master to slave.
PRDATA    Output  Read data bus, carries data from slave to master.

The APB slave signals are generated from the AHB bus signals by the APB bridge. The details of the bridge itself are not the focus of this work; what matters here are the signals that the bridge generates. Examples of APB burst write transfers and burst read transfers are shown in Figure I.12 and Figure I.13 respectively.

FIGURE I.12 EXAMPLE OF BURST WRITE TRANSFERS FOR APB SLAVE [129]


FIGURE I.13 EXAMPLE OF BURST READ TRANSFERS FOR APB SLAVE [129]

From the point of view of the APB slave, each APB transfer in a burst, whether read or write, takes two clock cycles. The address and direction signals, PADDR and PWRITE, are valid during the whole APB transfer (either write or read). The write data signal PWDATA is valid during the whole APB write transfer, while the read data signal PRDATA is only valid in the second cycle of a read transfer.

A feature common to both APB write and read transfers is the strobe signal PENABLE, which is asserted in the second clock cycle of the transfer and lasts for only one cycle. When it is HIGH, the data is ready to be written or read. This feature of the APB transfer is used to provide data to and read data from the SSM-ROM and SSM-BRAM designs.
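
From the software running on the ARM9 (the AHB master), such an APB peripheral is reached through plain memory-mapped accesses: the bridge generates PSELx and PENABLE, so each volatile load or store becomes one two-cycle APB transfer. The sketch below is only illustrative; LM_DESIGN_BASE and the two register offsets are hypothetical placeholders, as the real addresses are set by the logic module's individual decoder.

#include <stdint.h>

/* Hypothetical register map of a design (e.g. SSM-BRAM) inside the logic
 * module address space; the real base address and offsets are defined by
 * the individual decoder implemented in the FPGA.                         */
#define LM_DESIGN_BASE  0xC0000000u   /* placeholder base address */
#define DESIGN_DATA_IN  (*(volatile uint32_t *)(LM_DESIGN_BASE + 0x0u))
#define DESIGN_RESULT   (*(volatile uint32_t *)(LM_DESIGN_BASE + 0x4u))

/* Write one input word to the design and read one result word back.
 * Each access is translated by the APB bridge into a two-cycle transfer
 * in which the data is valid while PENABLE is HIGH.                      */
static uint32_t design_write_read(uint32_t value)
{
    DESIGN_DATA_IN = value;   /* APB write transfer */
    return DESIGN_RESULT;     /* APB read transfer  */
}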


APPENDIX J RESOURCES INCLUDED ON THE DVD

• SphinxSourceCode

Includes the open source code for Sphinx 3, Sphinx Train, the CMU-Cambridge language model toolkit, the aligner sclite and the profiler GNU gprof.

• Databases

Includes the three databases used for recognition: AN4, RM1 and TIMIT.

• DevelopedCode

Includes the developed code in three separate sub-folders:

  - Bash: the Bash code listed in Appendix F.

  - C: the C code listed in Appendix G.

  - VHDL: the VHDL code listed in Appendix H.

• PublishedWork

Includes the PDF files of this PhD thesis and the three published papers.



© 2011 SUNYI HU
