
SHORTEN: Simple lossless and near-lossless waveform compression

Tony Robinson

Technical report CUED/F-INFENG/TR.156

Cambridge University Engineering Department, Trumpington Street, Cambridge, CB2 1PZ, UK

December 1994

Abstract

This report describes a program that performs compression of waveform files such as audio data. A simple predictive model of the waveform is used, followed by Huffman coding of the prediction residuals. This is both fast and near optimal for many commonly occurring waveform signals. This framework is then extended to lossy coding under the conditions of maximising the segmental signal to noise ratio on a per frame basis and coding to a fixed acceptable signal to noise ratio.

Introduction

It is common to store digitised waveforms on computers, and the resulting files can often consume significant amounts of storage space. General compression algorithms do not perform very well on these files as they fail to take into account the structure of the data and the nature of the signal contained therein. Typically a waveform file will consist of signed 16 bit numbers and there will be significant sample to sample correlation. A compression utility for these files must be reasonably fast, portable, accept data in the most popular formats and give significant compression. This report describes "shorten", a program for the UNIX and DOS environments which aims to meet these requirements.

A significant application of this program is to the problem of compression of speech files for distribution on CDROM. This report starts with a description of this domain, then discusses the two main problems associated with general waveform compression, namely predictive modelling and residual coding. This framework is then extended to lossy coding. Finally, the shorten implementation is described and an appendix details the command line options.

Compression for speech corpora

One important use for lossless waveform compression is to compress speech corpora for distribution on CDROM. State of the art speech recognition systems require gigabytes of acoustic data for model estimation, which takes many CDROMs to store. Use of compression software reduces both the distribution cost and the number of CDROM changes required to read the complete data set.

The key factors in the design of compression software for speech corpora are that there must be no perceptual degradation in the speech signal and that the decompression routine must be fast and portable.

There has been much research into efficient speech coding techniques and many standards have been established. However, most of this work has been for telephony applications where dedicated hardware can be used to perform the coding and where it is important that the resulting system operates at a well defined bit rate. In such applications lossy coding is acceptable and indeed necessary in order to guarantee that the system operates at the fixed bit rate.

Similarly, there has been much work in the design of general purpose lossless compressors for workstation use. Such systems do not guarantee any compression for an arbitrary file, but in general achieve worthwhile compression in reasonable time on general purpose computers.

Speech corpora compression needs some features of both systems. Lossless compression is an advantage as it guarantees there is no perceptual degradation in the speech signal. However, the established compression utilities do not exploit the known structure of the speech signal. Hence shorten was written to fill this gap and is now in use in the distribution of CDROMs containing speech databases.

The recordings used as examples in section [...] and section [...] are from the TIMIT corpus, which is distributed as 16 bit, 16 kHz linear PCM samples. This format is in common use for continuous speech recognition research corpora. The recordings were collected using a Sennheiser HMD 414 noise-cancelling head-mounted
microphone in low noise conditions. All ten utterances from speaker fcjf0 are used, which amount to a total of [...] seconds or about [...] samples.

Waveform Modeling

Compression is achieved by building a predictive model of the waveform (a good introduction for speech is Jayant and Noll, 1984). An established model for a wide variety of waveforms is that of an autoregressive model, also known as linear predictive coding (LPC). Here the predicted waveform is a linear combination of past samples:

$$\hat{s}(t) = \sum_{i=1}^{p} a_i \, s(t-i) \tag{1}$$

The coded signal, $e(t)$, is the difference between the estimate of the linear predictor, $\hat{s}(t)$, and the speech signal, $s(t)$:

$$e(t) = s(t) - \hat{s}(t) \tag{2}$$

However, many waveforms of interest are not stationary, that is, the best values for the coefficients of the predictor, $a_i$, vary from one section of the waveform to another. It is often reasonable to assume that the signal is pseudo-stationary, i.e. there exists a time span over which reasonable values for the linear predictor can be found. Thus the three main stages in the coding process are blocking, predictive modelling, and residual coding.

Blocking

The time frame over which samples are blocked depends to some extent on the nature of the signal. It is inefficient to block on too short a time scale as this incurs an overhead in the computation and transmission of the prediction parameters. It is also inefficient to use a time scale over which the signal characteristics change appreciably as this will result in a poorer model of the signal. However, in the implementation described below the linear predictor parameters typically take much less information to transmit than the residual signal, so the choice of window length is not critical. The default value in the shorten implementation is 256, which results in 16 ms frames for a signal sampled at 16 kHz.

Sample interleaved signals are handled by treating each data stream as independent. Even in cases where there is a known correlation between the streams, such as in stereo audio, the within-channel correlations are often significantly greater than the cross-channel correlations, so for lossless or near-lossless coding the exploitation of this additional correlation only results in small additional gains.

A rectangular window is used in preference to any tapering window as the aim is to model just those samples within the block, not the spectral characteristics of the segment surrounding the block. The window length is longer than the block size by the prediction order, which is typically three samples.

Linear Prediction

Shorten supports two forms of linear prediction: the standard pth order LPC analysis of equation 1, and a restricted form whereby the coefficients are selected from one of four fixed polynomial predictors.

In the case of the general LPC algorithm, the prediction coefficients, $a_i$, are quantised in accordance with the same Laplacian distribution used for the residual signal and described in section [...]. The expected number of bits per coefficient is [...] as this was found to be a good tradeoff between modelling accuracy and model storage. The standard Durbin's algorithm for computing the LPC coefficients from the autocorrelation coefficients is used in an incremental way. On each iteration the mean squared value of the prediction residual is calculated and this is used to compute the expected number of bits needed to code the residual signal. This is added to the number of bits needed to code the prediction coefficients, and the LPC order is selected to minimise the total. As the computation of the autocorrelation coefficients is the most expensive step in this process, the search for the optimal model order is terminated when the last two models have resulted in a higher bit rate. Whilst it is possible to construct signals that defeat this search procedure, in practice for speech signals it has been found that the occasional use of a lower prediction order results in an insignificant increase in the bit rate and has the additional side effect of requiring less compute to decode.

A restrictive form of the linear predictor has been found to be useful. In this case the prediction coefficients are those specified by fitting a p order polynomial to the last p+1 data points, e.g. a line to the last two points:

\begin{align}
\hat{s}_0(t) &= 0 \tag{3} \\
\hat{s}_1(t) &= s(t-1) \tag{4} \\
\hat{s}_2(t) &= 2s(t-1) - s(t-2) \tag{5} \\
\hat{s}_3(t) &= 3s(t-1) - 3s(t-2) + s(t-3) \tag{6}
\end{align}

Writing $e_i(t)$ as the error signal from the ith polynomial predictor:

\begin{align}
e_0(t) &= s(t) \tag{7} \\
e_1(t) &= e_0(t) - e_0(t-1) \tag{8} \\
e_2(t) &= e_1(t) - e_1(t-1) \tag{9} \\
e_3(t) &= e_2(t) - e_2(t-1) \tag{10}
\end{align}

As can be seen from equations 7 to 10, there is an efficient recursive algorithm for computing the set of polynomial prediction residuals. Each residual term is formed from the difference of the previous order predictors. As each term involves only a few integer additions/subtractions, it is possible to compute all predictors and select the best. Moreover, as the sum of absolute values is linearly related to the variance, this may be used as the basis of predictor selection, and so the whole process is cheap to compute as it involves no multiplications.

Figure [...] shows both forms of prediction for a range of maximum predictor orders. The figure shows that first and second order prediction provides a substantial increase in compression and that higher order predictors provide relatively little improvement. The figure also shows that for this example most of the total compression can be obtained using no prediction, that is, a zeroth-order coder achieved about [...] compression and the best predictor [...]. Hence, for lossless compression it is important not to waste too much compute on the predictor and to perform the residual coding efficiently.
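
As an illustration of the polynomial predictors described above, the following Python sketch computes the four residual signals by repeated differencing and selects the order with the smallest sum of absolute values. It is a minimal sketch and not the shorten source: the function names, the `history` argument (assumed to hold exactly the three samples preceding the block) and the rule of preferring the lowest order on a tie are assumptions made for this example.

```python
def polynomial_residuals(block, history=(0, 0, 0)):
    """Return [e0, e1, e2, e3], each the same length as `block`.

    `history` is assumed to hold exactly the three samples that
    precede the block, so that every residual is defined at the
    first sample of the block.
    """
    samples = list(history) + list(block)
    residuals = [samples]                      # e0(t) = s(t)
    for _ in range(3):
        prev = residuals[-1]
        # e_i(t) = e_{i-1}(t) - e_{i-1}(t-1): integer subtractions only
        residuals.append([prev[t] - prev[t - 1] for t in range(1, len(prev))])
    n = len(block)
    # Keep only the part of each residual that lies inside the block.
    return [r[len(r) - n:] for r in residuals]


def select_predictor(block, history=(0, 0, 0)):
    """Pick the predictor order with the smallest sum of |residual|."""
    residuals = polynomial_residuals(block, history)
    costs = [sum(abs(e) for e in r) for r in residuals]
    # min() returns the first minimum, so ties go to the lowest order
    # (an assumption of this sketch).
    order = min(range(4), key=lambda i: costs[i])
    return order, residuals[order]
```

For a linear ramp such as `select_predictor([10, 12, 14, 16, 18], history=(4, 6, 8))` the second order predictor is exact, so the call returns order 2 with an all-zero residual.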
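
The incremental order search used with the general LPC predictor can be sketched in the same way. The Python below runs the Levinson-Durbin recursion one order at a time, computes each autocorrelation lag only when it is needed, and stops once two successive orders have failed to improve on the best total bit count. Several details are assumptions of this sketch rather than statements about shorten: the bit cost of the residual is estimated from the differential entropy of a Laplacian distribution, `bits_per_coeff` is left as a parameter, the stopping rule is one reading of "the last two models have resulted in a higher bit rate", and the coefficients are kept as unquantised floats.

```python
import math


def autocorr(x, lag):
    """Autocorrelation of the block x at a single lag."""
    return sum(x[t] * x[t - lag] for t in range(lag, len(x)))


def residual_bits(energy, n):
    """Assumed cost in bits of n Laplacian residual samples.

    Uses the differential entropy log2(sqrt(2) * e * sigma) per sample,
    floored at zero; this is an illustrative estimate only.
    """
    sigma = math.sqrt(max(energy, 0.0) / n)
    if sigma <= 0.0:
        return 0.0
    return n * max(0.0, math.log2(math.sqrt(2.0) * math.e * sigma))


def select_lpc_order(x, max_order, bits_per_coeff):
    """Return (order, coefficients a_1..a_order) minimising the total bits."""
    n = len(x)
    r = [autocorr(x, 0)]
    coeffs, energy = [], r[0]          # order 0: no prediction
    best_cost, best_coeffs = residual_bits(energy, n), []
    worse_in_a_row = 0
    for p in range(1, max_order + 1):
        if energy <= 0.0:              # nothing left to predict
            break
        r.append(autocorr(x, p))       # one new lag per iteration
        # Levinson-Durbin update from order p-1 to order p.
        k = (r[p] - sum(coeffs[j] * r[p - 1 - j] for j in range(p - 1))) / energy
        coeffs = [coeffs[j] - k * coeffs[p - 2 - j] for j in range(p - 1)] + [k]
        energy *= 1.0 - k * k
        cost = residual_bits(energy, n) + p * bits_per_coeff
        if cost < best_cost:
            best_cost, best_coeffs, worse_in_a_row = cost, coeffs, 0
        else:
            worse_in_a_row += 1
            if worse_in_a_row == 2:    # two orders in a row were no better
                break
    return len(best_coeffs), best_coeffs
```

On a decaying exponential such as `[100 * 0.95 ** t for t in range(200)]` with `bits_per_coeff=8`, a single coefficient close to 0.95 captures almost all of the predictable structure, so the search settles on order 1 and stops after orders 2 and 3 fail to pay for their extra coefficients.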