The Speex Voice Codec Areas of Use
Speex

The Speex voice codec - Areas of use

Provide an alternative to expensive, proprietary speech codecs (e.g. G.723.1, G.729)
Help bring good quality VoIP to Linux and other Free OS
Lower barrier of entry for VoIP in general
  - Both on-disk and over network
  - Wide range of bit-rates, variable complexity

The Speex voice codec - Features 1

Features
  - Free software / open source, patent and royalty-free.
  - Integration of narrowband and wideband in the same bit-stream.
  - Wide range of bit-rates available (from 2 kbps to 44 kbps).
  - Dynamic bit-rate switching and Variable Bit-Rate (VBR).
  - Voice Activity Detection (VAD, integrated with VBR).
  - Variable complexity.
  - Ultra-wideband mode at 32 kHz (up to 48 kHz).
  - Intensity stereo encoding option.

The Speex voice codec - Features 2

Sampling rates
  - Narrowband: 8 kHz
  - Wideband: 16 kHz
  - Ultra-wideband: 32 kHz

Quality
Quality parameter ranges from 0 to 10
  - Integer for constant bit-rate
  - Float for variable bit-rate

The Speex voice codec - Features 3

Complexity (variable)
It is possible to choose a complexity level between 1 and 10.
  - Lower complexity reduces CPU requirements, but increases noise levels.

Variable bit-rate
Allows the codec to change its bit-rate dynamically to adapt to the "difficulty" of the audio being encoded.
Drawbacks:
  - Cannot predict final average bit-rate.
  - Maximum bit-rate must be known for some applications.

The Speex voice codec - Features 4

Average Bit-Rate (ABR)
Adjusts VBR quality in order to meet a specific target bit-rate.
  - Because of the real-time adjustment of the bit-rate, the global quality will be slightly lower than when encoded with VBR and optimal settings.

Voice Activity Detection (VAD)
Tries to differentiate between speech and background noise.
  - Always implicitly activated in VBR -> only useful in non-VBR operation.
  - Reproduces background sound with minimal bit-usage ("comfort noise generation").
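
To make the idea of voice activity detection concrete, here is a minimal sketch of a per-frame speech/noise decision. It is not the detector Speex uses; it is a toy energy threshold, and the names frame_energy, classify_frames and demo_signal, the threshold value, and the 160-sample frame (20 ms at 8 kHz) are all assumptions made for illustration.

```python
import math

# Assumed frame length: 160 samples, i.e. 20 ms of narrowband audio at 8 kHz.
FRAME_SIZE = 160


def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def classify_frames(samples, threshold, frame_size=FRAME_SIZE):
    """Return one boolean per complete frame: True means 'speech'.

    Toy rule (hypothetical, not the Speex algorithm): a frame counts as
    speech when its energy exceeds a fixed threshold.
    """
    flags = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        flags.append(frame_energy(frame) > threshold)
    return flags


def demo_signal():
    """Three frames: quiet background, a loud tone, quiet background."""
    quiet = [0.01 * math.sin(2 * math.pi * 50 * n / 8000) for n in range(FRAME_SIZE)]
    loud = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(FRAME_SIZE)]
    return quiet + loud + quiet


if __name__ == "__main__":
    print(classify_frames(demo_signal(), threshold=0.001))
```

In a codec, frames flagged as noise would be the ones encoded with just enough bits for comfort noise. A practical detector would also have to adapt its threshold to the background level, which this sketch does not attempt.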
The Speex voice codec - Features 5

Discontinuous Transmission (DTX)
Addition to VAD/VBR. Stops transmission when background noise is stationary.
  - In file-based operations 5 bits are written for each frame.

Perceptual enhancement
Part of the decoder. Tries to reduce (the perception of) the noise produced by the coding/decoding process.
  - Usually makes the result (objectively) further from the original, as measured by signal-to-noise ratio.
  - Still makes it sound better (subjectively).

The Speex voice codec - Features 6

Algorithmic delay
Frame size plus the amount of "look-ahead" needed to process each frame (about 10 ms).
  - Narrowband: 30 ms
  - Wideband: 34 ms
The CPU time needed to process the frames comes on top of this.

The Speex voice codec - Comparison

Comparison
  - Multi-rate: Allows the codec to change bit-rate dynamically, at any moment
  - Embedded: A codec that embeds narrowband bitstreams in wideband bitstreams
  - VBR: Variable bit-rate
  - PLC: Packet loss concealment
  - Bit-robust: Robust to corruption at the bit level, as found on wireless networks

The Speex voice codec - CELP 1

CELP (Code Excited Linear Prediction)
CELP makes use of:
  - Linear prediction filters (short-term prediction)
  - An excitation signal for these filters, coded by vector quantization as the residue of a pitch prediction (long-term prediction)
It is implemented in many speech codecs, including the new AMR codec in 3GPP.

The Speex voice codec - CELP 2

[Figure: CELP box diagram]

The Speex voice codec - CELP 3

CELP versions
  - ACELP: ETSI GSM enhanced full rate (European mobile telephony, 1.8 GHz band; 13 kbps)
  - LD-CELP: Low-delay telephony, ITU-T G.728 (16 kbps)
  - CS-ACELP: Voice over IP / digital satellite / HQ digital radio, ITU-T G.729 (8 kbps)
  - RPE-LTP: ETSI GSM 06.10 full rate (13 kbps)
  - VSELP: ETSI GSM half rate (5.6 kbps)

The Speex voice codec - CELP 4

CELP silence compression
Reduces the average bit-rate thanks to a lower-bit-rate compression for silence.
In the encoder, a voice activity detector is used to distinguish between regions with normal speech activity and those with silence or background noise. During normal speech activity, the CELP coding as in Version 1 is used. Otherwise a Silence Insertion Descriptor (SID) is transmitted at a lower bit-rate. This SID enables a Comfort Noise Generator (CNG) in the decoder. The amplitude and spectral shape of this comfort noise are specified by energy and LPC parameters, much as in a normal CELP frame. These parameters are an optional part of the SID and thus can be updated as required.

The Speex voice codec - PESQ comparison - narrowband

[Figure: PESQ comparison - narrowband]
  - PESQ: Perceptual Evaluation of Speech Quality, an enhanced perceptual quality measurement for voice quality in telecommunications.
  - MOS: Mean Opinion Score

The Speex voice codec - Applications using Speex

  - Asterisk
  - foobar2000
  - GnomeMeeting
  - JRoar
  - Marathon: Aleph One
  - LinPhone
  - OpenH323
  - Speex Frontend
  - Sweep
  - TeamSpeak
  - tkPhone
  - Windows builds of Speex utilities

The Speex voice codec - Sources

  - Audiocoding wiki, CELP: http://www.audiocoding.com/wiki/index.php?page=CELP
  - Data compression website of Nam Phamdo: http://www.data-compression.com/speech.html
  - Jean-Marc Valin, University of Sherbrooke: Speex: Speech Compression for Everyone
  - The Speex website: http://www.speex.org/