SILK Speech Codec
TDP 10/11
Xavier Anguera and Ciro Gracia

SILK Codec
Audio codec developed by Skype (February 2009).
Skype previously used the SVOPC codec (Sinusoidal Voice Over Packet Coder):
• LPC analysis.
• Quasi-harmonic modelling of the linear prediction (LPC) residual.
• Both the sinusoidal amplitudes and phases are explicitly encoded using new methods based on Gaussian mixture models.

Requirements (Internet Wideband Audio Codec)
• Optimized for real-time operation.
• Flexible, real-time adaptation of parameters to the current conditions: network, hardware and audio signal.

Parameters (Internet Wideband Audio Codec)
• Bitrate: trade-off between quality and bitrate. Low: <10 kbps (speech in any language). High: excellent quality for any music signal.
• Sampling rate: narrowband (8 kHz) to wideband (24 kHz or more).
• Complexity: 50 MHz of an x86 core in wideband mode (16 kHz sampling rate).
• Packet loss resilience: minimize the propagation of errors.
• Delay: < 30 ms.
• Discontinuous Transmission (DTX): low rate when only background noise is present.

Encoder
• Sampling rate: 8, 12, 16 or 24 kHz.
• Bitrate: 6-40 kbps (1 bit/sample is good, 1.5 bits/sample is transparent).
• Packet rate: 20 ms frames, 1-5 frames/packet. Trade-off between bitrate and latency/sensitivity.
• Packet loss resilience: use of inter-frame dependencies to detect errors.
• Complexity: optimizations.
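
The encoder figures above (bits per sample, 20 ms frames, 1-5 frames per packet) can be related with some simple arithmetic. The following is a minimal sketch; the function names are hypothetical and chosen only for illustration, and packet header overhead is ignored.

```python
FRAME_MS = 20  # frame duration stated on the Encoder slide


def bitrate_for_quality(sampling_rate_hz, bits_per_sample):
    """Bitrate in bit/s needed to spend `bits_per_sample` on every sample."""
    return sampling_rate_hz * bits_per_sample


def packet_payload_bytes(bitrate_bps, frames_per_packet, frame_ms=FRAME_MS):
    """Average coded payload per packet, excluding any header overhead."""
    return bitrate_bps * frames_per_packet * frame_ms / 1000 / 8


def packets_per_second(frames_per_packet, frame_ms=FRAME_MS):
    """Packets sent per second; fewer packets mean more buffering delay."""
    return 1000 / (frames_per_packet * frame_ms)


if __name__ == "__main__":
    for rate_hz in (8000, 12000, 16000, 24000):
        good = bitrate_for_quality(rate_hz, 1.0)
        transparent = bitrate_for_quality(rate_hz, 1.5)
        print(rate_hz, "Hz:", good, "bit/s good,", transparent, "bit/s transparent")
    for frames in range(1, 6):
        print(frames, "frame(s)/packet:",
              packet_payload_bytes(16000, frames), "bytes,",
              packets_per_second(frames), "packets/s")
```

At 16 kHz, 1 bit/sample corresponds to 16 kbps, and a single 20 ms frame at that rate carries 40 bytes of payload. Packing more frames per packet lowers the packet rate, at the cost of extra latency and a larger loss when one packet is dropped.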

Scalability
[Figure: encoder scalability]

Subjective evaluation
Quality assessment using MOS (Mean Opinion Score).

Encoder
[Figure: encoder block diagram. Blocks: High-Pass Filter, Voice Activity Detector, Pitch Analysis, Noise Shaping Analysis, Prediction Analysis, Prefilter, LTP Scaling Control, Gains Processor, LSF Quantizer, Noise Shaping Quantization, Range Encoder]

Decoder
[Figure: decoder block diagram. Blocks: Range Decoder, Decode Parameters, Generate Excitation, LTP Synthesis, LPC Synthesis]
Signals in the diagram:
1) Range encoded bitstream
2) Coded parameters
3) Pulses and gains
4) Pitch lags and LTP coefficients
5) LPC coefficients
6) Decoded signal

Pitch analysis
• Returns a pitch value every 5 ms, together with the voiced/unvoiced decision.
• LPC analysis is done with order 16, 12 or 8.
• Three levels of correlation are used to reduce complexity.

Noise shaping analysis
Optimizes a set of parameters to reduce the audible effect of quantization noise:
• Balances quantization noise and bitrate.
• Spectral shaping of the quantization noise: makes it follow the signal spectrum.
• De-emphasizes spectral valleys (where noise would be more noticeable).
• Matches the levels of the decoded speech formants to the original ones.
• The resulting parameters are applied to the signal in the PREFILTER module.

Prediction analysis
It is done differently depending on whether the signal is voiced or unvoiced:
• Voiced:
• First, a 5-coefficient long-term prediction (LTP) analysis is performed on 20 ms of signal.
• The residual is the input to an LPC analysis.
• The LPC coefficients are converted to Line Spectral Frequencies (LSF), which are less sensitive to quantization noise, and quantized.
• Unvoiced:
• No LTP analysis is needed.
• LPC analysis is performed, and the result is transformed to an LSF vector and quantized.

LSF quantization
A codebook method is used, with a non-uniform quantization rate:
• Rarely occurring values are quantized with low distortion but a high number of bits.
• Commonly occurring values are modeled with low error and a low number of bits.
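
The idea of spending fewer bits on common codebook entries can be sketched as a codebook search that weighs distortion against rate. This is a minimal illustration under stated assumptions, not the SILK quantizer: the cost `distortion + lam * bits`, the rate estimate `-log2(p)`, the toy codebook and the name `quantize` are all hypothetical.

```python
import math


def quantize(vector, codebook, probs, lam=0.0):
    """Return the index of the entry minimizing distortion + lam * rate.

    distortion: squared Euclidean distance to the codebook entry.
    rate: ideal code length in bits, -log2(p), for an entry of probability p
          (assumption: an entropy coder spends about that many bits on it).
    """
    best_index, best_cost = -1, math.inf
    for i, (entry, p) in enumerate(zip(codebook, probs)):
        distortion = sum((v - e) ** 2 for v, e in zip(vector, entry))
        rate_bits = -math.log2(p)
        cost = distortion + lam * rate_bits
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index


# Toy 2-dimensional codebook; entry 2 is rare, so it is expensive to signal.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.6, 0.6]]
probs = [0.5, 0.4, 0.1]
vector = [0.55, 0.55]

closest = quantize(vector, codebook, probs, lam=0.0)    # distortion only
rate_aware = quantize(vector, codebook, probs, lam=0.5)  # rate is penalized
print(closest, rate_aware)
```

With `lam=0.0` the search picks the nearest entry regardless of cost; raising `lam` steers the choice toward more probable, cheaper entries.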
The LSF codebook is trained a priori on a large training set.

LTP quantization
• It also uses a vector codebook, chosen from 3 candidates (containing 10, 20 and 40 vectors respectively).
• For each frame, the best codebook is chosen by minimizing a rate-distortion function.

Noise shaping quantization
• This module combines the outputs of all other modules to generate the overall residual, which is quantized and sent.

Range encoder
• A data compression method proposed in 1979 (now patent free), based on arithmetic coding.
• It uses the probability of occurrence of each pattern to code the most frequent patterns with fewer bits.
• It encodes the following: the voiced/unvoiced decision, the LTP and LPC quantization indices, the residual signal and several intermediate gains.
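
The principle behind the range encoder, frequent symbols costing fewer bits, can be illustrated with a toy arithmetic coder. This sketch uses exact fractions for clarity and is an assumption-laden teaching example: a real range coder works with fixed-width integers and renormalization, and the symbol alphabet and probabilities below are hypothetical.

```python
import math
from fractions import Fraction


def build_intervals(probs):
    """Assign each symbol a slice [start, start + p) of the unit interval."""
    intervals, start = {}, Fraction(0)
    for symbol, p in probs.items():
        intervals[symbol] = (start, p)
        start += p
    return intervals


def encode(symbols, probs):
    """Narrow [0, 1) once per symbol; return the final (low, width)."""
    intervals = build_intervals(probs)
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        start, p = intervals[s]
        low += width * start
        width *= p
    return low, width


def decode(value, count, probs):
    """Recover `count` symbols from any value inside the final interval."""
    intervals = build_intervals(probs)
    out = []
    for _ in range(count):
        for symbol, (start, p) in intervals.items():
            if start <= value < start + p:
                out.append(symbol)
                value = (value - start) / p  # rescale back to [0, 1)
                break
    return out


def bits_needed(width):
    """Bits that suffice to name a binary fraction inside the interval."""
    return math.ceil(-math.log2(float(width))) + 1


# Hypothetical source: "a" is common, "b" and "c" are rare.
probs = {"a": Fraction(3, 4), "b": Fraction(1, 8), "c": Fraction(1, 8)}
message = list("aaaabaac")
low, width = encode(message, probs)
print(bits_needed(width), "bits, versus", 2 * len(message), "with 2 bits/symbol")
```

Each common symbol shrinks the interval only slightly, so it adds a fraction of a bit, while each rare symbol shrinks it by a factor of 8 and adds about 3 bits.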