Speex

The Speex voice codec

Areas of use
Provide an alternative to expensive, proprietary speech codecs (e.g. G.723.1, G.729)
Help bring good quality VoIP to Linux and other Free OS
Lower barrier of entry for VoIP in general
Both on disk and over the network
Wide range of bitrates, variable complexity

Features

Free software / open source, patent-free and royalty-free.
Integration of narrowband and wideband in the same bitstream.
Wide range of bitrates available (from 2 kbps to 44 kbps).
Dynamic bitrate switching and Variable BitRate (VBR).
Voice Activity Detection (VAD, integrated with VBR).
Variable complexity.
Ultra-wideband mode at 32 kHz (up to 48 kHz).
Intensity stereo encoding option.

Sampling rates

Narrowband: 8 kHz
Wideband: 16 kHz
Ultra-wideband: 32 kHz

Quality

Quality parameter ranges from 0 to 10
Integer for constant bitrate
Float for variable bitrate

Complexity (variable)

It is possible to choose a complexity level between 1 and 10. Lower complexity reduces CPU requirements, but increases noise levels.
Variable bitrate

Allows the codec to change its bitrate dynamically to adapt to the “difficulty” of the audio being encoded.
Drawbacks:
Cannot predict the final average bitrate.
Maximum bitrate must be known for some applications.

Average BitRate (ABR)

Adjusts VBR quality in order to meet a specific target bitrate. Because the bitrate is adjusted in real time, the overall quality will be slightly lower than when encoding with VBR and optimal settings.

Voice Activity Detection (VAD)

Tries to differentiate between speech and background noise.
Always implicitly activated in VBR, so it is only useful in non-VBR operation.
Reproduces background sound with minimal bit usage (“comfort noise generation”).

Discontinuous Transmission (DTX)

Addition to VAD/VBR.
Stops transmission when background noise is stationary.
In file-based operation, transmission cannot simply stop, so 5 bits are written for each such frame.
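As a rough check of what 5 bits per frame means as a bitrate, here is a minimal sketch. It assumes a 20 ms frame (50 frames per second), which is not stated on the slide, and the helper name `bitrate_bps` is purely illustrative.

```python
FRAME_MS = 20  # assumption: 20 ms frames, i.e. 50 frames per second


def bitrate_bps(bits_per_frame, frame_ms=FRAME_MS):
    """Bitrate in bits per second for a fixed number of bits per frame."""
    frames_per_second = 1000 / frame_ms
    return bits_per_frame * frames_per_second


# 5 bits per frame, as written during stationary background noise in file-based operation
dtx_file_bps = bitrate_bps(5)
print(dtx_file_bps)  # 250.0
```

Under that assumption the placeholder frames cost about 250 bps, which is small compared with even the lowest speech bitrates listed above.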
Perceptual enhancement

Part of the decoder.
Tries to reduce (the perception of) the noise produced by the coding/decoding process.
Usually makes the result objectively further from the original (as measured by signal-to-noise ratio).
Still makes it sound better (subjectively).

Algorithmic delay

Frame size plus some amount of “look-ahead” needed to process each frame (about 10 ms).
Narrowband: 30 ms
Wideband: 34 ms
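The delay figures can be reproduced as frame size plus look-ahead. The sketch below assumes a 20 ms frame. The per-mode look-ahead values (10 ms and 14 ms) are inferred from the totals quoted above rather than taken from the slides, and all names are illustrative.

```python
FRAME_MS = 20.0  # assumption: 20 ms frames

# Assumption: look-ahead per mode, inferred as (quoted total delay - frame size)
LOOKAHEAD_MS = {"narrowband": 10.0, "wideband": 14.0}
SAMPLE_RATE_HZ = {"narrowband": 8000, "wideband": 16000}


def algorithmic_delay_ms(mode):
    """Algorithmic delay = frame size + look-ahead."""
    return FRAME_MS + LOOKAHEAD_MS[mode]


def samples_per_frame(mode):
    """Number of input samples the encoder must buffer per frame."""
    return int(SAMPLE_RATE_HZ[mode] * FRAME_MS / 1000)


for mode in ("narrowband", "wideband"):
    print(mode, algorithmic_delay_ms(mode), "ms,",
          samples_per_frame(mode), "samples per frame")
```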
On top of this comes the CPU time needed to process the frames.

Comparison

Multi-rate: Allows the codec to change bitrate dynamically, at any moment
Embedded: A codec that embeds narrowband bitstreams in wideband bitstreams
VBR: Variable bitrate
PLC: Packet loss concealment
Bit-robust: Robust to corruption at the bit level, as found on wireless networks

CELP (Code Excited Linear Prediction)
CELP makes use of:
Linear prediction filters (short-term prediction)
Excited by a stimulus which is coded as the residue of a pitch prediction (long-term prediction) by vector quantization.
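To make the short-term prediction step concrete, here is a minimal sketch of how linear prediction coefficients can be estimated with the autocorrelation method and the Levinson-Durbin recursion. This is a textbook illustration, not Speex's actual implementation. The function names and the toy signal are assumptions made for the example.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Find a[1..order] so that x[n] is predicted by sum_k a[k] * x[n-k].

    Returns (coefficients, residual_energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err == 0:
            break  # signal is perfectly predicted (or all zero)
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


# Toy signal: a decaying exponential, where each sample is 0.9 times the previous one
signal = [0.9 ** n for n in range(200)]
coeffs, err = levinson_durbin(autocorrelation(signal, 2), 2)
print(coeffs)  # first coefficient close to 0.9, second close to 0
```

The short-term predictor removes most of the signal's energy. In a CELP coder, what remains is the part that the pitch predictor and the excitation codebook have to represent.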
It is implemented in many speech codecs, including the new AMR codec in 3GPP.

[Figure: CELP box diagram]

CELP versions

ACELP: ETSI GSM enhanced full rate (European mobile telephony, 1.8 GHz band; 12.2 kbps)
LD-CELP: Low-delay telephony, ITU-T G.728 (16 kbps)
CS-ACELP: Voice over IP / digital satellite / HQ digital radio, ITU-T G.729 (8 kbps)
RPE-LTP: ETSI GSM 06.10 full rate (13 kbps)
VSELP: ETSI GSM half rate (5.6 kbps)

CELP silence compression
Reduces the average bitrate by using a lower-bitrate compression for silence.
In the encoder, a voice activity detector is used to distinguish between regions with normal speech activity and those with silence or background noise.
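The simplest possible form of such a detector compares the energy of each frame with a fixed threshold. The sketch below is only an illustration of the idea. Real detectors, including the ones used in practical CELP coders, are considerably more elaborate, and the function names, frame length and threshold here are hypothetical.

```python
import math


def frame_energy_db(frame):
    """Mean power of a frame in dB, relative to a full-scale amplitude of 1.0."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)  # small offset avoids log(0)


def classify_frames(samples, frame_len=160, threshold_db=-40.0):
    """Label each complete frame as 'speech' or 'silence' by its energy."""
    labels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        is_active = frame_energy_db(frame) > threshold_db
        labels.append("speech" if is_active else "silence")
    return labels


# Toy input: one frame of a 440 Hz tone followed by one frame of near silence
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(160)]
quiet = [0.0005] * 160
print(classify_frames(tone + quiet))  # ['speech', 'silence']
```

Frames labelled as silence are the ones for which the encoder can switch to the cheaper representation.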
During normal speech activity, the CELP coding as in Version 1 is used. Otherwise a Silence Insertion Descriptor (SID) is transmitted at a lower bitrate. This SID enables a Comfort Noise Generator (CNG) in the decoder. The amplitude and spectral shape of this comfort noise are specified by energy and LPC parameters similar to those in a normal CELP frame. These parameters are an optional part of the SID and thus can be updated as required.

PESQ comparison narrowband

PESQ: Perceptual Evaluation of Speech Quality. An enhanced perceptual quality measurement for voice quality in telecommunications.
MOS: Mean Opinion Score

Applications using Speex

Asterisk
foobar2000
GnomeMeeting
JRoar
Marathon: Aleph One
LinPhone
OpenH323
Speex Frontend
Sweep
TeamSpeak
tkPhone
Windows builds of Speex utilities

Sources
Audiocoding wiki CELP: http://www.audiocoding.com/wiki/index.php?page=CELP
Data compression website of Nam Phamdo: http://www.data-compression.com/speech.html
Jean-Marc Valin, University of Sherbrooke: Speex: Speech Compression for Everyone
The Speex website: http://www.speex.org/