Hardware for Speech and Audio Coding

Linköping Studies in Science and Technology
Thesis No. 1093

Mikael Olausson

LiU-TEK-LIC-2004:22
Department of Electrical Engineering
Linköpings universitet, SE-581 83 Linköping, Sweden
Linköping 2004

ISBN 91-7373-953-7
ISSN 0280-7971

Abstract

While microprocessors (MPUs) used as general purpose CPUs are converging (toward the Intel Pentium), DSP processors are diverging. In 1995, approximately 50% of the DSP processors on the market were general purpose processors, but last year only 15% were. General purpose DSP processors fall short of application specific DSP processors because most users want the highest performance at minimal power consumption and minimal silicon cost. Therefore, a DSP processor must be an Application Specific Instruction set Processor (ASIP) for a group of domain specific applications. An essential feature of the ASIP is its functional acceleration on instruction level, which yields an instruction set architecture specific to a group of applications.

Hardware acceleration for digital signal processing in DSP processors is essential to enhance performance while keeping enough flexibility. Over the last 20 years, researchers and DSP semiconductor companies have worked on different kinds of acceleration for digital signal processing. The trade-off between performance and flexibility is always an interesting question because all DSP algorithms are "application specific"; acceleration designed for audio may not be suitable for baseband signal processing.
Even within the same domain, for example speech CODECs (COder/DECoder), the acceleration suited to communication infrastructure differs from the acceleration suited to terminals.

Benchmarks are useful when evaluating a processor or a computing platform, but for domain specific algorithms, such as audio and speech CODECs, they are not enough. The solution here is to profile the algorithm and base the decisions on the resulting statistics. The statistics also suggest where to start optimizing the implementation of the algorithm. The profiling statistics have been used to improve implementations of speech and audio coding algorithms, in terms of both cycle cost and memory efficiency, i.e. code and data memory.

In this thesis, we focus on designing memory efficient DSP processors based on instruction level acceleration methods and data type optimization techniques. Four major areas have been attacked in order to speed up execution and reduce memory requirements. The first is instruction level acceleration, where frequently occurring sequences of consecutive instructions are merged. The merge reduces the code memory size and makes execution faster. Second, complex addressing schemes are handled by acceleration for address calculations, i.e. dedicated hardware is used for address calculations. The third area, data storage and precision, is addressed by a reduced floating point scheme in which the number of bits is reduced compared to the standard IEEE 754 floating point format. The result is a lower data memory requirement, yet enough precision for the application, an MP3 decoder. The fourth contribution is a compact way of storing data in a general CPU. By adding two custom instructions, one load and one store, the data memory efficiency can be improved without making the firmware complex.
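As a rough illustration of the fourth area, the following Python sketch models a bit-addressed load and store pair in software. It is only a sketch under stated assumptions: the 32-bit word width, the function names `store_bits`, `load_bits` and `pack_values`, and the little-endian bit order are hypothetical and are not taken from the thesis or from any real instruction set.

```python
WORD_BITS = 32                      # assumed memory word width (hypothetical)
WORD_MASK = (1 << WORD_BITS) - 1


def store_bits(memory, bit_addr, width, value):
    """Write the low `width` bits of `value` starting at bit address `bit_addr`."""
    if not 0 < width <= WORD_BITS:
        raise ValueError("width must be between 1 and WORD_BITS")
    value &= (1 << width) - 1
    index, offset = divmod(bit_addr, WORD_BITS)
    low_bits = min(width, WORD_BITS - offset)      # bits that fit in the first word
    low_mask = ((1 << low_bits) - 1) << offset
    memory[index] = (memory[index] & ~low_mask & WORD_MASK) | ((value << offset) & low_mask)
    high_bits = width - low_bits                   # bits that spill into the next word
    if high_bits:
        high_mask = (1 << high_bits) - 1
        memory[index + 1] = (memory[index + 1] & ~high_mask & WORD_MASK) | (value >> low_bits)


def load_bits(memory, bit_addr, width):
    """Read `width` bits starting at bit address `bit_addr` as an unsigned integer."""
    if not 0 < width <= WORD_BITS:
        raise ValueError("width must be between 1 and WORD_BITS")
    index, offset = divmod(bit_addr, WORD_BITS)
    low_bits = min(width, WORD_BITS - offset)
    result = (memory[index] >> offset) & ((1 << low_bits) - 1)
    high_bits = width - low_bits
    if high_bits:
        result |= (memory[index + 1] & ((1 << high_bits) - 1)) << low_bits
    return result


def pack_values(values, width):
    """Pack unsigned `width`-bit values back to back into as few words as possible."""
    total_bits = len(values) * width
    memory = [0] * ((total_bits + WORD_BITS - 1) // WORD_BITS)
    for i, v in enumerate(values):
        store_bits(memory, i * width, width, v)
    return memory


# Ten 5-bit values need 50 bits, i.e. two words instead of ten.
samples = [3, 17, 31, 0, 9, 22, 5, 30, 1, 12]
packed = pack_values(samples, 5)
unpacked = [load_bits(packed, i * 5, 5) for i in range(len(samples))]
```

In this model the masking and shifting is hidden behind two calls, which is the kind of firmware simplification that a dedicated load and store pair would aim for; in hardware the same steps would be performed by the instructions themselves.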
We have tried to design application specific instruction sets and processors, and also to improve processors based on an available instruction set. Experiences from this thesis can be used for DSP design for audio and speech applications. They can additionally be used as a reference for a general DSP processor design methodology.

Preface

This thesis presents my research from October 2000 through January 2004. The following four papers are included in this thesis:

• Mikael Olausson and Dake Liu, “Instruction and Hardware Accelerations in G.723.1(6.3/5.3) and G.729”, in Proceedings of the 1st IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Cairo, Egypt, December, 2001, pp. 34-39
• Mikael Olausson and Dake Liu, “Instruction and Hardware Acceleration for MP-MLQ in G.723.1”, in Proceedings of the IEEE Workshop on Signal Processing Systems (SIPS’02), San Diego, California, USA, October 16-18, 2002, pp. 235-239
• Mikael Olausson, Andreas Ehliar, Johan Eilert and Dake Liu, “Reduced Floating Point for MPEG1/2 Layer III Decoding”, to be presented at the International Conference on Acoustics, Speech and Signal Processing (ICASSP’04), Montreal, Quebec, Canada, May 17-21, 2004
• Mikael Olausson, Anders Edman and Dake Liu, “Bit Memory Instructions for a General CPU”, to be presented at the 4th IEEE International Workshop on System-on-Chip for Real-Time Applications (IWSOC’04), Banff, Alberta, Canada, July 19-21, 2004

The following two papers related to my research are not included in the thesis:

• Eric Tell, Mikael Olausson and Dake Liu, “A General DSP Processor at the Cost of 23k Gates and 1/2 a Man-Year Design Time”, in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP’03), Hong Kong, April 6-10, 2003, pp. 657-660, Volume II
• Mikael Olausson and Dake Liu, “The ADSP-21535 Blackfin and Speech Coding”, in Proceedings of the Swedish System-on-Chip Conference (SSoCC), Eskilstuna, Sweden, April 2003

Acknowledgment

First of all, I want to thank my supervisor Professor Dake Liu for guidance, inspiration, and for giving me the opportunity to become a part time Ph.D. student. I would also like to thank Professor Christer Svensson for making it possible to start my research.

I want to thank Dr. Anders Edman and my fellow Ph.D. students Andreas Ehliar, Johan Eilert, and Eric Tell for co-authoring papers with me. Further, I want to thank my fellow Ph.D. students Dr. Tomas Henriksson and Dr. Ulf Nordqvist, fellow Ph.D. students Sumant Sathe, Eric Tell, and Daniel Wiklund, and Research Engineer Anders Nilsson for all the sport and outdoor activities such as bicycle trips, orienteering, ski trips, and bandy both on and off ice.

Thank you to all the members of the divisions of Electronic Devices and Computer Engineering at Linköping University who have contributed to a nice working environment. Thanks to Sectra Communications AB for giving me the chance to still work for them as a part time employee while doing my research at the university.

A special thanks to my girlfriend Anna Larsson, who has read and corrected my English within the papers and supported me in every way.

The thesis work was sponsored by STRINGENT of the Swedish Foundation for Strategic Research (SSF) and the Center for Industrial Information Technology at the Linköping Institute of Technology (CENIIT).

Contents

Abstract
Preface
Acknowledgment
Abbreviations

I Introduction
  1 Introduction
    1.1 Background
  2 Benchmarking
    2.1 Cycle Cost
      2.1.1 Application Profiling
    2.2 Memory Cost
      2.2.1 Code Cost
      2.2.2 Data Cost
    2.3 References

II Coding Strategies
  3 Speech Coding
    3.1 Introduction
    3.2 Speech Coding Techniques
      3.2.1 Vocoding
      3.2.2 Multi Pulse Excitation
      3.2.3 Multiband Excitation
    3.3 Complexity Aspects
      3.3.1 Coding delay
    3.4 Hardware Acceleration Opportunities
    3.5 References
  4 Audio Coding
    4.1 Introduction
    4.2 Description of Perceptual Coding
    4.3 Coding Standard
    4.4 Hardware Acceleration Opportunities
    4.5 References

III Implementation
  5 Hardware
    5.1 Difference between Speech and Audio CODEC
    5.2 References
  6 Architectures
    6.1 Architectures for Speech and Audio Coding
    6.2 Programmable
      6.2.1 General and Embedded DSP
    6.3 FPGA
    6.4 ASIC
    6.5 References
  7 Research Methodology and Achievements
    7.1 Profile the Algorithm
      7.1.1 Stimuli to the Algorithm
    7.2 Acceleration on Instruction Level
    7.3 Acceleration for Address Calculation
    7.4 Function Level Acceleration
    7.5 Data Format and Precision
    7.6 Compact Data Storing
    7.7 Advantages and Disadvantages of Hardware Acceleration
    7.8 References

IV Papers
  8 Paper 1
    8.1 Introduction
    8.2 General description of G.723.1 and G.729
    8.3 Statistics of basic operations
      8.3.1 Description of the operands
      8.3.2 Investigation of the statistics
