Sparsity in Linear Predictive Coding of Speech
Total Page:16
File Type:pdf, Size:1020Kb
Sparsity in Linear Predictive Coding of Speech Ph.D. Thesis Daniele Giacobello Multimedia Information and Signal Processing Department of Electronic Systems Aalborg University Niels Jernes Vej 12, 9220 Aalborg Ø, Denmark Sparsity in Linear Predictive Coding of Speech Ph.D. Thesis August 2010 Copyright c 2010 Daniele Giacobello, except where otherwise stated. All rights reserved. Abstract This thesis deals with developing improved techniques for speech coding based on the recent developments in sparse signal representation. In particular, this work is motivated by the need to address some of the limitations of the well- known linear prediction (LP) model currently applied in many modern speech coders. In the first part of the thesis, we provide an overview of Sparse Linear Predic- tion, a set of speech processing tools created by introducing sparsity constraints into the LP framework. This approach defines predictors that look for a sparse residual rather than a minimum variance one with direct applications to coding but also consistent with the speech production model of voiced speech, where the excitation of the all-pole filter can be modeled as an impulse train, i.e., a sparse sequence. Introducing sparsity in the LP framework will also bring to de- velop the concept of high-order sparse predictors. These predictors, by modeling efficiently the spectral envelope and the harmonics components with very few coefficients, have direct applications in speech processing, engendering a joint estimation of short-term and long-term predictors. We also give preliminary results of the effectiveness of their application in audio processing. The second part of the thesis deals with introducing sparsity directly in the linear prediction analysis-by-synthesis (LPAS) speech coding paradigm. We first propose a novel near-optimal method to look for a sparse approximate excitation using a compressed sensing formulation. Furthermore, we define a novel re-estimation procedure to adapt the predictor coefficients to the given sparse excitation, balancing the two representations in the context of speech coding. Finally, the advantages of the compact parametric representation of a segment of speech, given by the sparse linear predictors and the use of the re- estimation procedure, are analyzed in the context of frame independent coding for speech communications over packet networks. i ii List of Papers The main body of this thesis consists of the following papers: [A] D. Giacobello, M. G. Christensen, J. Dahl, S. H. Jensen, and M. Moonen, “Sparse Linear Predictors for Speech Processing,” in Proceedings of the 9th Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 1353–1356, 2008. [B] D. Giacobello, M. G. Christensen, J. Dahl, S. H. Jensen, and M. Moo- nen, “Joint Estimation of Short-Term and Long-Term Predictors in Speech Coders,” in Proceedings of the 34th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4109–4112, 2009. [C] D. Giacobello, M. G. Christensen, M. N. Murthi, S. H. Jensen, and M. Moonen, “Speech Coding Based on Sparse Linear Prediction,” in Proceedings of the 17th European Signal Processing Conference (EUSIPCO), 2009, pp. 2524–2528. [D] D. Giacobello, M. G. Christensen, M. N. Murthi, S. H. Jensen, and M. Moonen, “Enhancing Sparsity in Linear Prediction of Speech by Iteratively Reweighted 1-norm Minimization,” in Proceedings of the 35th IEEE Inter- national Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4650–4653, 2010. [E] D. Giacobello, M. G. Christensen, M. N. Murthi, S. H. Jensen, and M. Moo- nen, “Sparse Linear Prediction and Its Applications to Speech Processing,” submitted to IEEE Transactions on Audio, Speech, and Language Process- ing, 2010. [F] D. Giacobello, M. G. Christensen, M. N. Murthi, S. H. Jensen, and M. Moonen, “Stable Solutions for Linear Prediction of Speech Based on 1-norm Error Criterion,” to be submitted to IEEE Transactions on Audio, Speech, and Language Processing, 2010. [G] D. Giacobello, T. van Waterschoot, M. G. Christensen, S. H. Jensen, and M. Moonen, “High-Order Sparse Linear Predictors for Audio Processing,” ac- cepted for publication in Proceedings of the 18th European Signal Processing Conference (EUSIPCO), 2010. iii [H] D. Giacobello, M. G. Christensen, M. N. Murthi, S. H. Jensen, and M. Moo- nen, “Retrieving Sparse Patterns Using a Compressed Sensing Framework: Applications to Speech Coding Based on Sparse Linear Prediction,” in IEEE Signal Processing Letters, vol. 17, no. 1, pp. 103–106, 2010. [I] D. Giacobello, M. N. Murthi, M. G. Christensen, S. H. Jensen, and M. Moonen, “Re-estimation of Linear Predictive Parameters in Sparse Linear Prediction,” in Conference Record of the 43rd Asilomar Conference on Sig- nals, Systems and Computers, pp. 1770–1773, 2009. [J] D. Giacobello, M. N. Murthi, M. G. Christensen, S. H. Jensen, and M. Moonen, “Estimation of Frame Independent and Enhancement Components for Speech Communication over Packet Networks,” in Proceedings of the 35th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4682–4685, 2010. The following papers have also been published by the author of this thesis during the Ph.D. studies: [1] D. Giacobello, M. Semmoloni, D. Neri, L. Prati and S. Brofferio, “Voice Activity Detection Based on the Adaptive Multi-Rate Speech Codec Param- eters,” in Proc. 11th International Workshop on Acoustic Echo and Noise Control (IWAENC), 2008. [2] D. Giacobello, D. Neri, L. Prati and S. Brofferio, “Acoustic Echo Cancella- tion on the Adaptive Multi-Rate Speech Codec Parameters,” in Proc. 11th International Workshop on Acoustic Echo and Noise Control (IWAENC), 2008. iv Preface This thesis is submitted to the International Doctoral School of Technology and Science at Aalborg University in partial fulfillment of the requirements for the degree of Doctor of Philosophy. The main body consists of a number of papers that have been published in or have been submitted to peer-reviewed conferences and journals. The work was carried out during the period from September 2007 through August 2010 at the Multimedia Information and Signal Processing Group of the Department of Electronic Systems at Aalborg University. It was funded by the European Union Marie Curie SIGNAL Fellowship, contract no. MEST-CT-2005-021175. There are many people I am indebted to and, without their guidance and encouragement, achieving this important goal in my life would have not been possible. First and foremost, my sincere gratitude goes to my supervisor, Prof. Søren Holdt Jensen, for giving me the opportunity of pursuing a Ph.D. degree and for providing me with a perfect working environment to fully develop my potential. He also supported and encouraged me in all my decisions and provided me with very valuable advices. I also thank my co-promoter within the Marie Curie SIGNAL project, Prof. Marc Moonen, for his invaluable comments on all my papers and also for making my stay at the Katholieke Universiteit Leuven a very pleasant experience. I would also like to extend my gratitude to my co-supervisor Prof. Mads Græsbøll Christensen. Since I first started my Ph.D. studies, he has taken me “under his wings” providing me with some of the ideas he had developed by introducing sparsity constraints in the linear predictive framework. As soon as we started working on it, those ideas truly became the “goose that laid the golden eggs,” and form the core of this thesis. I owe him a great deal and it has been a privilege to work with him, his mentoring has undoubtedly helped me throughout this great scientific adventure. This thesis is also, to a large extent, the result of collaboration with other people, and my various co-authors also deserve an honorable mention here. First of all, Prof. Manohar N. Murthi deserves to be thanked for the technical dis- cussions during my stay at the University of Miami and the very fruitful collab- oration that sprung out of them. I would also like to thank Dr. Joachim Dahl v for his precious insights on convex optimization, and Dr. Toon van Waterschoot for the highly beneficial talks on how to extend our work to other application scenarios. The best part of my Ph.D. studies has undoubtedly been getting to meet and work with many amazing people. In this regards, I would like to acknowledge my present and former colleagues at the Multimedia Information and Signal Processing Group at Aalborg University and all the people at Katholieke Uni- versiteit Leuven, Instituto Superior Tecnico Lisbon, and University of Nice who were involved in the SIGNAL project for the countless interesting technical dis- cussions and the fun times we had at our numerous meetings. I would also like to express my gratitude to Charlotte Skindbjerg Pedersen and the whole admin- istrative staff at Aalborg University for taking care of the bureaucratic matters, thus making my working life easier. In the personal sphere, I would like to thank many people that have been close to me in the past several years and, directly or indirectly, have contributed to this work. In particular (in rigorous alphabetical order): Alessandro, Alessio, Alvaro, Andrea, Behzad, Emilia, Francesca, Gian Paolo, Giulia, Ismael, Kim, Jason, Johan, Lucia, Marco, Mario, Marta, Meg, Pedro, Pierre-Louis, Rocco, Romain, Sabato, Shaminda, Tobias, and Virginia. My largest debt of gratitude is toward my parents. They have been the pillars on which I could hold on to at any moment in my life. They have guided, inspired, encouraged, and supported me. Above all, they have always believed in me. This thesis is dedicated to them. A special thought goes also to all of my family for their unconditional love and support. Finally, I would like to thank Shadi for her support, encouragement, patience, and unwavering love, which made these past three years the best of my life. Daniele Giacobello Aalborg University, August 2010 vi Contents Abstract i List of Papers iii Preface v Introduction 1 1 Background.............................