Linear Prediction (LPC)

ELEN E4896 MUSIC SIGNAL PROCESSING
Lecture 6: Linear Prediction (LPC)

1. Resonance & The Source-Filter Model
2. Linear Prediction (LPC)
3. LP Representations
4. LP Synthesis & Modification

Dan Ellis
Dept. Electrical Engineering, Columbia University
[email protected] http://www.ee.columbia.edu/~dpwe/e4896/
E4896 Music Signal Processing (Dan Ellis) 2013-02-25

1. Resonance

• Resonance is ubiquitous in physical systems
  - e.g. plucked/struck string, drum head
  - room “coloration”
  - vocal tract
• Resonances ↔ Poles
  - easily implemented in LTI filters
[Figure: position and velocity vs. time]

Single Pole-Pair Resonance

• Simple resonance = second-order IIR band-pass filter
[Figure: block diagram from x[n] to y[n] with two $z^{-1}$ delays and feedback gains $2r\cos\omega_c$ and $-r^2$; z-plane plot (real vs. imag) of the pole pair at radius $r$ and angle $\pm\omega_c$; magnitude response $|H(e^{j\omega})|$ vs. $\omega/\pi$ with bandwidth $B$ around $\omega_c$]
• $Q = \omega_c / B$, with $B \approx 2(1-r)$
Demo patch: impulse_bpf.pd

Resonances in Speech

• Vocal tract (throat + tongue + lips) acts as variable resonator
  - resonances = “formants”
[Figure: spectrogram (freq / Hz vs. time / s) with phone labels for the utterance “has a watch thin as a dime”]

Source-Filter Model

• Separation of:
  - Source: fine structure in time/frequency
  - Filter: subsequent shaping by physical resonances
[Figure: Source (pitch, glottal pulse train, voiced/unvoiced switch, frication noise) feeding Filter (vocal tract resonances / formants, radiation characteristic) to give speech]
• Advantages
  - Good match to real signals
  - Salient pieces
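The single pole-pair resonance above can be tried out numerically. The following is a minimal sketch, assuming the difference equation y[n] = x[n] + 2r cos(ωc) y[n-1] - r² y[n-2] implied by poles at r·exp(±jωc); the function name `two_pole_resonator` and the values of `omega_c` and `r` are illustrative choices, not taken from the lecture or from impulse_bpf.pd.

```python
import numpy as np

def two_pole_resonator(x, omega_c, r):
    """Filter x with a pole pair at r * exp(+/- j * omega_c).

    Assumed difference equation (numerator taken as 1):
        y[n] = x[n] + 2 r cos(omega_c) y[n-1] - r^2 y[n-2]
    """
    a1 = 2.0 * r * np.cos(omega_c)
    a2 = -r * r
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += a1 * y[n - 1]
        if n >= 2:
            y[n] += a2 * y[n - 2]
    return y

# Illustrative settings (assumptions): centre frequency 0.2*pi rad/sample.
omega_c = 0.2 * np.pi
r = 0.95

# Impulse response of the resonator: a decaying sinusoid.
impulse = np.zeros(512)
impulse[0] = 1.0
h = two_pole_resonator(impulse, omega_c, r)

# Magnitude response on a dense grid; the peak should sit near omega_c.
H = np.abs(np.fft.rfft(h, 4096))
omega_peak = np.pi * np.argmax(H) / (len(H) - 1)

# Bandwidth and Q from the approximation quoted on the slide.
bandwidth = 2.0 * (1.0 - r)
q_factor = omega_c / bandwidth
```

Moving `r` closer to 1 narrows the peak and lengthens the ringing of `h`, which is the trade-off the Q formula summarises.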
2. Linear Prediction (LPC)

• LPC = Linear Predictive Coding
  - remove redundancy in signal
  - try to predict next point as linear combination of previous values:
    $s[n] = \sum_{k=1}^{p} a_k s[n-k] + e[n]$
  - $\{a_k\}$ are the $p$th-order linear predictor coefficients
  - $e[n]$ is the residual “innovation”, a/k/a prediction error
• Transfer function
    $\frac{S(z)}{E(z)} = \frac{1}{1 - \sum_{k=1}^{p} a_k z^{-k}} = \frac{1}{A(z)}$
  - all-pole “autoregressive” (AR) modeling

Voice Modeling & LPC

• Direct expression of source-filter model:
    $s[n] = \sum_{k=1}^{p} a_k s[n-k] + e[n]$
[Figure: pulse/noise excitation e[n] driving the vocal tract filter $H(z) = 1/A(z)$ to give s[n]; $|H(e^{j\omega})|$ vs. f; z-plane pole plot]
  - acoustic tube model of vocal tract is all-pole
  - vocal tract resonances change slowly, ~10-20 ms
  - but: nasals

Estimating LPC Models

• You can “see” resonances in a spectral slice:
[Figure: spectral slice, energy / dB vs. freq / Hz]
• We can find LPC coefficients $\{a_k\}$ to minimize the energy of the residual $e[n]$:
    $\sum_n e^2[n] = \sum_n \left( s[n] - \sum_{k=1}^{p} a_k s[n-k] \right)^2$
  - differentiate w.r.t. $a_k$ & solve
  - end up with $p$ linear equations involving autocorrelations
    $r_{ss}(|j-k|) = \sum_n s[n-j]\, s[n-k]$

LPC Illustration

[Figure: windowed original and LPC residual vs. time / samp; original spectrum, LPC spectrum and residual spectrum (dB) vs. freq / Hz]
• Actual poles:
[Figure: z-plane pole plot]
Demo: E4896_L06_lpc_diary

Short-Time LP Analysis

• Solve LPC for each ~20 ms frame
[Figure: spectrograms (freq / kHz vs. time / s) and z-plane pole plot (Real Part vs. Imaginary Part)]
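The estimation procedure above (minimise residual energy, obtain p linear equations in the autocorrelations) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind E4896_L06_lpc_diary: the names `lpc_autocorr` and `prediction_residual` are hypothetical, the whole signal is treated as one frame with no window, and the test signal is synthetic noise passed through a known second-order all-pole filter.

```python
import numpy as np

def lpc_autocorr(s, p):
    """Predictor coefficients a[0..p-1] (i.e. a_1..a_p) from the normal equations
    sum_k a_k r(|j-k|) = r(j), j = 1..p, with r(k) = sum_n s[n] s[n+k]."""
    s = np.asarray(s, dtype=float)
    r = np.array([np.dot(s[:len(s) - k], s[k:]) for k in range(p + 1)])
    lags = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    R = r[lags]                      # Toeplitz matrix of autocorrelations
    return np.linalg.solve(R, r[1:])

def prediction_residual(s, a):
    """e[n] = s[n] - sum_k a_k s[n-k], taking s[n] = 0 for n < 0."""
    s = np.asarray(s, dtype=float)
    e = s.copy()
    for k, ak in enumerate(a, start=1):
        e[k:] -= ak * s[:-k]
    return e

# Synthetic test signal (assumption): white noise through a known resonance.
r_pole, omega_c = 0.9, 0.3 * np.pi
a_true = np.array([2.0 * r_pole * np.cos(omega_c), -r_pole ** 2])
rng = np.random.default_rng(0)
excitation = rng.standard_normal(20000)
s = np.zeros_like(excitation)
for n in range(len(s)):
    s[n] = excitation[n]
    for k, ak in enumerate(a_true, start=1):
        if n >= k:
            s[n] += ak * s[n - k]

a_est = lpc_autocorr(s, p=2)          # should land close to a_true
e_est = prediction_residual(s, a_est) # should carry less energy than s
```

For real audio one would typically window each ~20 ms frame before calling `lpc_autocorr` and choose a larger order p; both choices are left out here to keep the sketch short.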
3. LP Representations

• Can interpret LPC filter fit many ways:
• Picking out resonances
  - if signal was source + resonances, should find them
• Low-order spectral approximation
  - minimizing $\sum_n e^2[n]$ also minimizes $\int |E(e^{j\omega})|^2 \, d\omega$
  - different from e.g. Fourier approximation...
• Finding & removing smooth spectrum
  - $1/A(z)$ is a smooth approximation of $S(z)$
  - $\frac{S(z)}{E(z)} = \frac{1}{A(z)} \Rightarrow E(z) = S(z) A(z)$ is “unsmoothed” $S(z)$
• Signal whitening
  - removing linear dependence makes residual like white noise (iid, flat spectrum)

Alternative Forms

• Many formulations for $p$th-order all-pole IIR:
  - predictor coefficients $\{a_k\}$ / polynomial $A(z)$
  - roots $\{\lambda_i = r_i e^{j\omega_i}\}$ of $A(z)$
  - reflection coefficients (for lattice filter structure)
  - Line Spectral Frequencies (LSF)
• Choice depends on:
  - mathematical convenience
  - numerical stability
  - statistical properties (e.g. for coding)
  - opportunities for modification

4. LPC Synthesis

• LP analysis on ~20 ms frames gives prediction filter $A(z)$ and residual $e[n]$
  - recombining them should yield perfect $s[n]$
  - coding applications further compress $e[n]$
[Figure: Encoder: input s[n] goes through LPC analysis to give filter coefficients $\{a_i\}$ and residual e[n], each represented & encoded. Decoder: excitation generator output $\hat{e}[n]$ drives the all-pole filter $H(z) = \frac{1}{1 - \sum_i a_i z^{-i}}$, with response $|1/A(e^{j\omega})|$, to give output $\hat{s}[n]$]
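Two statements above lend themselves to a quick numerical check: that the roots of A(z) give pole radii and angles, and that passing the residual back through 1/A(z) recovers s[n]. The sketch below assumes an arbitrary second-order predictor and a random test signal; `inverse_filter`, `all_pole_filter` and `poles_from_predictor` are hypothetical helper names, not functions from the course materials.

```python
import numpy as np

def inverse_filter(s, a):
    """Residual e[n] = s[n] - sum_k a_k s[n-k], i.e. s filtered by A(z)."""
    s = np.asarray(s, dtype=float)
    e = s.copy()
    for k, ak in enumerate(a, start=1):
        e[k:] -= ak * s[:-k]
    return e

def all_pole_filter(e, a):
    """s[n] = e[n] + sum_k a_k s[n-k], i.e. e filtered by 1/A(z)."""
    e = np.asarray(e, dtype=float)
    s = np.zeros(len(e))
    for n in range(len(e)):
        acc = e[n]
        for k, ak in enumerate(a, start=1):
            if n >= k:
                acc += ak * s[n - k]
        s[n] = acc
    return s

def poles_from_predictor(a):
    """Roots of A(z) = 1 - sum_k a_k z^-k, found as the roots of
    z^p - a_1 z^(p-1) - ... - a_p."""
    return np.roots(np.concatenate(([1.0], -np.asarray(a, dtype=float))))

# Illustrative predictor (assumption): one resonance near 0.3*pi, radius 0.9.
a = np.array([1.058, -0.81])

# Alternative form: roots of A(z) as radius and angle.
poles = poles_from_predictor(a)
pole_radius = np.abs(poles)
pole_angle = np.angle(poles)

# Analysis followed by synthesis with the same coefficients.
rng = np.random.default_rng(1)
s = rng.standard_normal(1000)
e = inverse_filter(s, a)
s_rebuilt = all_pole_filter(e, a)   # matches s up to rounding error
```

In a coder the exact `e` is not available at the decoder, so the interesting question becomes how coarse a stand-in for `e` still sounds acceptable.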
• Excitation coding example: simple pitch tracker ➝ “buzz-hiss” encoding
[Figure: pitch period values against 16 ms frame boundaries, time / s]

LPC Warping

• Replacing delays $z^{-1}$ with allpass elements $\frac{z + \alpha}{\alpha z + 1}$
  - warps frequencies but not magnitudes
[Figure: spectrograms (Frequency vs. Time) of the original and of the warped LPC resynthesis, α = -0.2; frequency-warping curves for α = 0.6 and α = -0.6]
http://www.ee.columbia.edu/~dpwe/resources/matlab/polewarp/

Cross-Synthesis

• Mix residual (source) of one signal with resonances (filter) of another
  - or: just use white noise as excitation
  - formants carry phonemes ➝ vocoder

Summary

• Resonances (poles) color sound
• Source + Filter model decouples excitation and resonances
• Linear Prediction is a simple way to model and implement resonances (filter)
• Many interpretations, representations, modifications
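Cross-synthesis as described above can be sketched in a few lines. This is a simplified illustration under several assumptions: the whole signal is treated as a single frame (real use would work frame by frame on ~20 ms windows), the two inputs are synthetic noise signals shaped by different second-order resonances, and all function names (`lpc_autocorr`, `inverse_filter`, `all_pole_filter`, `cross_synthesize`, `make_ar`) are hypothetical.

```python
import numpy as np

def lpc_autocorr(s, p):
    """Predictor coefficients a_1..a_p from the autocorrelation normal equations."""
    s = np.asarray(s, dtype=float)
    r = np.array([np.dot(s[:len(s) - k], s[k:]) for k in range(p + 1)])
    lags = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    return np.linalg.solve(r[lags], r[1:])

def inverse_filter(s, a):
    """Residual e[n] = s[n] - sum_k a_k s[n-k]."""
    s = np.asarray(s, dtype=float)
    e = s.copy()
    for k, ak in enumerate(a, start=1):
        e[k:] -= ak * s[:-k]
    return e

def all_pole_filter(e, a):
    """s[n] = e[n] + sum_k a_k s[n-k]."""
    e = np.asarray(e, dtype=float)
    s = np.zeros(len(e))
    for n in range(len(e)):
        acc = e[n]
        for k, ak in enumerate(a, start=1):
            if n >= k:
                acc += ak * s[n - k]
        s[n] = acc
    return s

def cross_synthesize(source_sig, filter_sig, p):
    """Residual of source_sig driven through the LPC filter of filter_sig."""
    residual = inverse_filter(source_sig, lpc_autocorr(source_sig, p))
    return all_pole_filter(residual, lpc_autocorr(filter_sig, p))

def make_ar(a, n, seed):
    """Synthetic test signal: white noise through the all-pole filter 1/A(z)."""
    noise = np.random.default_rng(seed).standard_normal(n)
    return all_pole_filter(noise, a)

# Two toy signals with resonances at different frequencies (assumptions).
a_source_true = np.array([-1.058, -0.81])  # resonance near 0.7*pi
a_filter_true = np.array([1.058, -0.81])   # resonance near 0.3*pi
source_sig = make_ar(a_source_true, 8000, seed=2)
filter_sig = make_ar(a_filter_true, 8000, seed=3)

# Output keeps the source's excitation but takes on the filter signal's resonance.
y = cross_synthesize(source_sig, filter_sig, p=2)
a_y = lpc_autocorr(y, 2)   # expected to lie near a_filter_true
```

Replacing `residual` with fresh white noise gives the noise-excited variant mentioned on the slide; using the same signal for both arguments simply returns that signal.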