The Speex voice codec

Areas of use

• Provide an alternative to expensive, proprietary speech codecs (e.g. G.723.1, G.729)

• Help bring good-quality VoIP to Linux and other free operating systems

• Lower the barrier of entry for VoIP in general

• Usable both on disk and over the network, with a wide range of bit-rates and variable complexity

Features

• Free software / open source, patent- and royalty-free.

• Integration of narrowband and wideband in the same bit-stream.

• Wide range of bit-rates available (from 2 kbps to 44 kbps).

• Dynamic bit-rate switching and variable bit-rate (VBR).

• Voice Activity Detection (VAD, integrated with VBR).

• Variable complexity.

• Ultra-wideband mode at 32 kHz (up to 48 kHz).

• Intensity stereo encoding option.

Sampling rates

• Narrowband: 8 kHz
• Wideband: 16 kHz
• Ultra-wideband: 32 kHz

Quality

• The quality parameter ranges from 0 to 10.
  - Integer for constant bit-rate
  - Float for variable bit-rate

Complexity (variable)

• It is possible to choose a complexity level between 1 and 10.
  - Lower complexity reduces CPU requirements, but increases noise levels.

Variable bit-rate (VBR)

• Allows the codec to change its bit-rate dynamically to adapt to the "difficulty" of the audio being encoded.

• Drawbacks:
  - The final average bit-rate cannot be predicted.
  - Some applications need to know the maximum bit-rate in advance.

Average bit-rate (ABR)

• Adjusts the VBR quality in order to meet a specific target bit-rate.
  - Because the bit-rate is adjusted in real time, the overall quality is slightly lower than with VBR encoding at optimal settings.

Voice Activity Detection (VAD)

• Tries to differentiate between speech and background noise.
  - Always implicitly activated in VBR, so the setting is only useful in non-VBR operation.
  - Reproduces background sound with minimal bit usage ("comfort noise generation").

Discontinuous Transmission (DTX)

• Addition to VAD/VBR.

• Stops transmission when the background noise is stationary.
  - In file-based operation, 5 bits are still written for each such frame.
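
As a rough illustration of how VAD and DTX interact, the following Python sketch classifies frames with a simple energy threshold and counts the bits written in file-based operation. The 20 ms frame duration, the energy threshold, the 300-bit size of a coded speech frame, and all function names are assumptions made for the example. Speex's actual detector is more elaborate and is not reproduced here; only the 5 bits per non-transmitted frame come from the slide.

```python
FRAME_MS = 20            # assumed frame duration (50 frames per second)
ACTIVE_FRAME_BITS = 300  # assumed size of a coded speech frame (15 kbps)
DTX_FRAME_BITS = 5       # bits written per non-transmitted frame in a file


def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def is_speech(frame, threshold=1e-3):
    """Toy voice activity decision: energy above a fixed threshold."""
    return frame_energy(frame) > threshold


def file_bits(frames, threshold=1e-3):
    """Total bits written to a file when DTX is active.

    Every non-speech frame is treated as stationary background noise,
    which is a simplification.
    """
    return sum(
        ACTIVE_FRAME_BITS if is_speech(f, threshold) else DTX_FRAME_BITS
        for f in frames
    )


def average_bitrate(frames, threshold=1e-3):
    """Average bit-rate in bits per second over the given frames."""
    seconds = len(frames) * FRAME_MS / 1000
    return file_bits(frames, threshold) / seconds


# One second of audio at 8 kHz: 10 loud frames followed by 40 near-silent ones.
loud = [0.5, -0.5] * 80        # 160 samples, energy 0.25
quiet = [0.001, -0.001] * 80   # 160 samples, energy about 1e-6
frames = [loud] * 10 + [quiet] * 40

print(file_bits(frames))        # 10 * 300 + 40 * 5 = 3200
print(average_bitrate(frames))  # 3200 bits spread over 1 second
```

Under these assumptions, a file that contains nothing but stationary background noise still costs 5 bits × 50 frames per second = 250 bit/s.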

Perceptual enhancement

• Part of the decoder.

• Tries to reduce (the perception of) the noise produced by the coding/decoding process.
  - Usually makes the result objectively further from the original (as measured by signal-to-noise ratio).
  - Still makes it sound subjectively better.

Algorithmic delay

• Frame size plus some amount of "look-ahead" needed to process each frame (about 10 ms).
  - Narrowband: 30 ms
  - Wideband: 34 ms
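
The delay figures can be checked with a little arithmetic. The sketch below assumes 20 ms frames in both modes (the slide does not state the frame size) and derives the look-ahead implied by each total. The function and constant names are made up for the example.

```python
FRAME_MS = 20  # assumed frame duration for both modes

# Total algorithmic delay in ms (from the slide) and sampling rate in Hz.
MODES = {
    "narrowband": {"delay_ms": 30, "rate_hz": 8000},
    "wideband": {"delay_ms": 34, "rate_hz": 16000},
}


def lookahead_ms(mode):
    """Look-ahead implied by the total delay: delay minus one frame."""
    return MODES[mode]["delay_ms"] - FRAME_MS


def delay_samples(mode):
    """Total algorithmic delay expressed in samples at the mode's rate."""
    m = MODES[mode]
    return m["delay_ms"] * m["rate_hz"] // 1000


for name in MODES:
    print(name, lookahead_ms(name), "ms look-ahead,", delay_samples(name), "samples")
```

With a 20 ms frame, the narrowband total implies a 10 ms look-ahead (240 samples of delay at 8 kHz) and the wideband total implies 14 ms (544 samples at 16 kHz), which is consistent with the "about 10 ms" quoted above.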

• On top of the algorithmic delay comes the CPU time needed to process the frames.

Comparison

Multi-rate: allows the codec to change bit-rate dynamically, at any moment
Embedded: a codec that embeds narrowband bit-streams in wideband bit-streams
VBR: variable bit-rate
PLC: packet loss concealment
Bit-robust: robust to corruption at the bit level, as found on wireless networks

CELP (Code Excited Linear Prediction)

• CELP makes use of:
  - Linear prediction filters (short-term prediction)
  - An excitation signal that drives these filters and is coded, by vector quantization, as the residual of a pitch prediction (long-term prediction)
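
To make the short-term prediction step concrete, here is a minimal Python sketch of textbook linear prediction (autocorrelation method with the Levinson-Durbin recursion). It is not Speex's implementation and leaves out the pitch predictor and the vector-quantized codebooks. The test signal, the filter coefficients 1.3 and -0.6, and all function names are assumptions chosen for the example.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] so that
    x[n] is approximated by sum(a[k] * x[n - k]).
    Returns (coefficients, final prediction error energy)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break
        # Reflection coefficient for this order.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def synthesize(coeffs, excitation):
    """Run an excitation through the all-pole filter defined by coeffs."""
    out = []
    for n, e in enumerate(excitation):
        pred = sum(c * out[n - k]
                   for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(e + pred)
    return out


def residual(x, coeffs):
    """Prediction residual: what is left after short-term prediction."""
    out = []
    for n in range(len(x)):
        pred = sum(c * x[n - k]
                   for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(x[n] - pred)
    return out


# Toy "speech": a single pulse exciting a known second-order filter.
signal = synthesize([1.3, -0.6], [1.0] + [0.0] * 399)

r = autocorrelation(signal, 2)
coeffs, err = levinson_durbin(r, 2)
res = residual(signal, coeffs)

print([round(c, 6) for c in coeffs])  # close to the filter used above
print(round(sum(e * e for e in res), 6), "residual energy vs", round(r[0], 3))
```

In this toy case the residual collapses back to the single pulse that excited the filter, so it carries far less energy than the signal itself. In a CELP coder, it is this kind of residual that the pitch predictor and the codebook search then have to represent.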

• It is implemented in many speech codecs, including the new AMR codec in 3GPP.

[Figure: CELP box diagram]

CELP versions

• ACELP: ETSI GSM enhanced full rate (European mobile telephony, 1.8 GHz band; 12.2 kbps)

• LD-CELP: Low-delay telephony, ITU-T G.728 (16 kbps)

• CS-ACELP: Voice over IP / digital satellite / HQ digital radio, ITU-T G.729 (8 kbps)

• RPE-LTP: ETSI GSM 06.10 full rate (13 kbps)

• VSELP: ETSI GSM half rate (5.6 kbps)

CELP silence compression

• Reduces the average bit-rate by compressing silence at a lower bit-rate.

• In the encoder, a voice activity detector is used to distinguish between regions with normal speech activity and those with silence or background noise.

• During normal speech activity, the CELP coding as in Version 1 is used. Otherwise a Silence Insertion Descriptor (SID) is transmitted at a lower bit-rate. This SID enables a Comfort Noise Generator (CNG) in the decoder. The amplitude and spectral shape of this comfort noise are specified by energy and LPC parameters similar to those in a normal CELP frame. These parameters are an optional part of the SID and can therefore be updated as required.

PESQ comparison - narrowband

• PESQ: Perceptual Evaluation of Speech Quality
  - An enhanced perceptual quality measurement for voice quality in telecommunications.

• MOS: Mean Opinion Score

Applications using Speex

• Asterisk
• foobar2000
• GnomeMeeting
• JRoar
• Marathon: Aleph One
• LinPhone
• OpenH323
• Speex Frontend
• Sweep
• TeamSpeak
• tkPhone
• Windows builds of Speex utilities

Sources

• Audiocoding wiki, CELP: http://www.audiocoding.com/wiki/index.php?page=CELP

• Website of Nam Phamdo: http://www.data-compression.com/speech.html

• Jean-Marc Valin, University of Sherbrooke: Speex: Speech Compression for Everyone

• The Speex website: http://www.speex.org/