ETSI/SMG11 "Speech Aspects"

Presentation of SMG11 Activities to Tiphon

Outline

• SMG11
• GSM Speech
• GSM Enhanced
• Tandem Free Operation
• Adaptive Multi-Rate (AMR) Codec
• Narrowband AMR
• Wideband AMR
• UMTS Matters
• Next Meetings

SMG11

• ETSI STC SMG11 is the competent body responsible for speech aspects of the GSM and UMTS standards (since 1996)
• SMG11 Chairman: Mr. Kari Järvinen, Nokia
• SMG11 plenary meets four times a year
  • Additional extraordinary meetings as needed
• SMG11 currently consists of three sub-groups
  • TFO sub-group (Tandem Free Operation issues)
  • AMR sub-group (Adaptive Multi-Rate codec issues)
  • SQ sub-group (Speech Quality issues)
• Sub-groups may have ad-hoc meetings between SMG11 plenary meetings
• Typical attendance in SMG11 plenary meetings is between 30 and 50
• The SMG11 e-mail reflector and the sub-group reflectors are used extensively between meetings

GSM Speech Codecs

• GSM has so far standardised three codecs
  • 13 kbps GSM-FR (1987); good cellular quality and robust operation in the presence of background noise
  • 5.6 kbps GSM-HR (1994); possibility for higher system capacity at the expense of slightly lower speech quality in some conditions (particularly in background noise)
  • 12.2 kbps GSM-EFR (1996); high quality, even exceeding the G.726 "wireline reference" under clear channel conditions and in background noise
• SMG11 is currently defining an Adaptive Multi-Rate (AMR) codec, which will be the fourth GSM speech codec
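As a rough illustration of what these source rates mean per speech frame, the sketch below converts each rate into bits per frame. It assumes a 20 ms frame for all three codecs (this presentation states the 20 ms frame only for EFR; applying it to FR and HR is an assumption of the sketch), and the names `GSM_CODEC_RATES_KBPS`, `bits_per_frame` and `FRAME_BITS` are illustrative only.

```python
# Source bit rates in kbit/s, taken from the codec list above.
GSM_CODEC_RATES_KBPS = {"GSM-FR": 13.0, "GSM-HR": 5.6, "GSM-EFR": 12.2}

FRAME_MS = 20  # assumed speech frame length in milliseconds


def bits_per_frame(rate_kbps: float, frame_ms: int = FRAME_MS) -> int:
    """Return the number of source-coded bits in one speech frame."""
    # kbit/s multiplied by ms gives bits (the factors of 1000 cancel).
    return round(rate_kbps * frame_ms)


FRAME_BITS = {
    name: bits_per_frame(rate) for name, rate in GSM_CODEC_RATES_KBPS.items()
}

if __name__ == "__main__":
    for name, bits in FRAME_BITS.items():
        print(f"{name}: {bits} bits per {FRAME_MS} ms frame")
```

The rounding step only guards against floating-point noise in the multiplication; the rates themselves are the ones quoted on the slide.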

Subjective speech quality    GSM FR   GSM HR   GSM EFR   No coding
Clean conditions (MOS)       3.71     3.85     4.43      4.61
Vehicle noise (DMOS)         3.83     3.45     4.25      4.42
Street noise (DMOS)          3.92     3.56     4.18      4.35

Source: TR 06.85 v5.0.0 (1998-07), "Subjective tests on the interoperability of the HR/FR/EFR speech codecs; single, tandem and tandem free operation"

GSM EFR Codec

• Selected as a basis for a new high quality speech service for PCS 1900 in the US in 1995 (formal standardization procedure in TIA and T1 completed in 1996)
• ETSI standardized the same codec for GSM in 1996
• Provides high quality speech service for GSM, GSM 1800 (DCS 1800), and GSM 1900 (PCS 1900) systems in all continents
• Technical summary
  • Source coding rate 12.2 kbps (channel coding 10.6 kbps)
  • Based on the Algebraic CELP (ACELP) algorithm
  • Speech frame size and algorithmic delay 20 ms
  • Optional VAD/DTX function with comfort noise generation
  • Example implementation for error concealment
  • Complexity (encoder/decoder) approximately 18 MIPS (processor dependent)
  • Memory requirement (incl. RAM and ROM) approximately 16-19k 16-bit words

GSM EFR Speech Quality

• GSM EFR speech quality is characterized in ETSI Technical Report “Performance Characterisation of the GSM EFR speech codec”, GSM 06.55
• Additional performance data can be found in ETSI Technical Report "Subjective tests on the interoperability of the HR/FR/EFR speech codecs; single, tandem and tandem free operation", GSM 06.85
• The GSM EFR codec has been included in numerous other formal and informal subjective listening tests and extensive test data is available
• The examples in the following slides are an extract of test results from COMSAT laboratories obtained during the PCS 1900 EFR codec standardization, comparing
  • 12.2 kbps EFR codec
  • 32 kbps G.726 codec
  • 8 kbps G.729
  • (13 kbps GSM FR codec)

GSM EFR Performance

• Basic speech quality at different input levels and tandeming

Test Condition                                G.726 at 32 kbit/s   GSM EFR   G.729
Clean speech, high level, -16 dBOL (MOS)      3.7                  3.8       3.4
Clean speech, medium level, -26 dBOL (MOS)    3.6                  3.6       3.3
Clean speech, low level, -36 dBOL (MOS)       3.0                  2.9       2.7
Self-tandem = codec-codec tandem (MOS)        3.1                  3.4       2.9
Tandem with G.726 at 32 kbit/s (MOS)          3.2                  3.6       3.3

GSM EFR Performance

• Performance in background noise

Test Condition                                    G.726 at 32 kbit/s   GSM EFR   G.729
Background noise, Home noise 20 dB (DMOS)         4.5                  4.6       4.3
Background noise, Car noise 10 dB (DMOS)          4.4                  4.5       3.9
Background noise, Car noise 20 dB (DMOS)          4.6                  4.6       4.1
Background noise, Street noise 10 dB (DMOS)       3.7                  4.1       3.7
Background noise, Office noise 20 dB (DMOS)       4.3                  4.5       3.7

GSM EFR Performance

• Performance in error conditions

Test Condition                           Frame error rate   BER class 2   GSM FR   GSM EFR
Clean speech, No errors (MOS)            0.0%               ≈0%           3.4      4.1
Clean speech, 13 dB C/I, 30 mph (MOS)    ≈0.0%              ≈2%           3.3      4.0
Clean speech, 10 dB C/I, 30 mph (MOS)    ≈0.5%              ≈4%           3.0      3.8
Clean speech, 7 dB C/I, 30 mph (MOS)     ≈3.0%              ≈8%           2.3      3.2

• In GSM, part of the coded bits is protected by a convolutional code, and residual errors are detected via a CRC. The frame error rate for this part is indicated above. The rest of the data is unprotected and receives the class 2 BER indicated above.
• The frame error rates are not directly comparable to quality figures with no residual errors

Tandem Free Operation (TFO)

• Motivation: "Unnecessary" dual speech encoding and decoding in mobile-to-mobile calls can significantly decrease speech quality
• TFO avoids the decoding and re-encoding otherwise performed in the network
• Applicable to all three GSM codecs (FR, HR, and EFR)
• The same speech codec must be used in both mobile stations for TFO to work
• TFO standardization ongoing in the ETSI SMG11 TFO sub-group
  • Work started (TFO sub-group established) in early 1996
  • Target: specifications ready by 4Q/1998 (ETSI GSM release 98)
  • Current work concentrating on completing four Annexes to the Stage 3 description: in-band signalling, operation with In-Path Equipments (IPEs), SDL definition, test vectors
  • The TFO Stage 3 GSM 04.53 will be forwarded to the SMG#27 plenary in October 1998
  • Formal subjective tests to evaluate the audible effects of TFO signalling are being carried out by Coherent

MS-to-MS Call, no TFO

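[Figure: MS-to-MS call without TFO. Path: MSa (encoding), BSS, TRAU (decoding) and MSC in the A-side PLMN; MSC, TRAU (encoding), BSS and MSb (decoding) in the B-side PLMN. The links between the mobile stations and the TRAUs carry 8 or 16 kbit/s voice coded speech; the link between the TRAUs carries 64 kbit/s PCM coded speech.]

Effect of Tandeming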

MOS value

Speech codec          One encoding and      Two encodings and
                      decoding (normal)     decodings (tandem)
Enhanced Full Rate    4.43                  4.29
Full Rate             3.71                  3.13
Half Rate             3.85                  3.15

Source: TR 06.85 v2.0.0 (1998-06), "Subjective tests on the interoperability of the HR/FR/EFR speech codecs; single, tandem and tandem free operation"

Note: The above results are from clean conditions (no background noise, no channel errors)

Effect of Tandeming in Error Conditions

MOS value

Speech codec          One encoding and      Two encodings and
                      decoding (normal)     decodings (tandem)
Enhanced Full Rate    4.12                  3.45
Full Rate             3.41                  2.64
Half Rate             3.68                  2.77

Source: TR 06.85 v2.0.0 (1998-06), "Subjective tests on the interoperability of the HR/FR/EFR speech codecs; single, tandem and tandem free operation"

Note: EP1 error condition was used (moderate errors).

Effect of Tandeming in Background Noise

MOS value

Speech codec          One encoding and      Two encodings and
                      decoding (normal)     decodings (tandem)
Enhanced Full Rate    4.25                  3.87
Full Rate             3.83                  3.34
Half Rate             3.45                  2.38

Source: TR 06.85 v2.0.0 (1998-06), "Subjective tests on the interoperability of the HR/FR/EFR speech codecs; single, tandem and tandem free operation"

Note: Vehicle noise of 10 dB was used.

TFO Modes

• Two modes in TFO
  • Establishment mode: the necessary conditions for TFO are verified with inaudible bit stealing
    • Verify whether both transcoders support TFO
    • Possible change of speech codecs to enable TFO
    • Duration typically 0.5-1.0 seconds
  • TFO mode: speech is transmitted compressed through the whole network with bit stealing that guarantees smooth transitions in all situations
• TFO includes the proper means to ensure TFO also when In Path Equipment such as Echo Cancellers and DCMEs are used in the fixed network

MS-to-MS Call, with TFO

[Figure: MS-to-MS call with TFO. Path: MSa (encoding), BSS, TRAU and MSC in the A-side PLMN; MSC, TRAU, BSS and MSb (decoding) in the B-side PLMN. Link rates labelled in the figure: 56 or 48 kbit/s and 8 or 16 kbit/s.]

TFO Mode

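[Figure: bit layout of an 8-bit PCM sample in TFO mode. Either 7 bits carry PCM coded speech (56 kbit/s) and 1 bit carries voice coded speech (8 kbit/s), or 6 bits carry PCM coded speech (48 kbit/s) and 2 bits carry voice coded speech (16 kbit/s).]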
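The bit-stealing layout in this figure can be sketched in a few lines of code. The sketch assumes 8-bit PCM samples at 8000 samples/s (64 kbit/s), so that stealing 1 or 2 least significant bits per sample yields the 8 or 16 kbit/s shown. The helper names `stolen_rate_kbps`, `embed` and `extract` are hypothetical, and the real TFO frame structure and signalling are defined in the TFO specifications, not by this example.

```python
from typing import List, Tuple

SAMPLE_RATE_HZ = 8000  # assumed PCM sampling rate (8 bits x 8000/s = 64 kbit/s)


def stolen_rate_kbps(n_lsb: int) -> Tuple[int, int]:
    """Return (remaining PCM rate, coded-speech rate) in kbit/s."""
    coded = n_lsb * SAMPLE_RATE_HZ // 1000
    return 64 - coded, coded


def embed(pcm: List[int], coded_bits: List[int], n_lsb: int) -> List[int]:
    """Overwrite the n_lsb least significant bits of each 8-bit sample."""
    if len(coded_bits) != len(pcm) * n_lsb:
        raise ValueError("need exactly n_lsb coded bits per PCM sample")
    mask = (1 << n_lsb) - 1
    out = []
    for i, sample in enumerate(pcm):
        value = 0
        for bit in coded_bits[i * n_lsb:(i + 1) * n_lsb]:
            value = (value << 1) | bit
        # Keep the upper PCM bits, replace the lower bits with coded speech.
        out.append((sample & ~mask & 0xFF) | value)
    return out


def extract(samples: List[int], n_lsb: int) -> List[int]:
    """Recover the coded bits from the LSBs of the received samples."""
    bits = []
    for sample in samples:
        for shift in range(n_lsb - 1, -1, -1):
            bits.append((sample >> shift) & 1)
    return bits
```

Because only the low-order bits are overwritten, the receiving end still has a slightly noisier PCM signal alongside the coded speech.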

• Coded speech is transmitted in the LSBs of the PCM samples on the A interface, together with the decoded PCM samples
• Both types of speech representation (PCM and coded) are available at the receiving end
• Minor speech degradation in TFO - non TFO transition due to bit-stealing (increased noise) when the 48/56 kbit/s speech samples are used for a very short period

Adaptive Multi-Rate Codec

• Source codec rates probably between 4 kbit/s and 14.4 kbit/s (no fixed source rate requirements)
• Operation in both GSM full rate (22.8 kbps) and half rate (11.4 kbps) channels
• Main advantages in GSM
  • Increased robustness against channel errors
  • Enhanced quality in the half-rate channel in good channel conditions
• Codec rate selected dynamically depending on radio conditions and local capacity requirements
• Codec bit rate selected by an adaptation algorithm specific to the system application, e.g. GSM or UMTS
• Generic speech codec applicable to many mobile systems
• High AMR performance targets and the flexibility obtained by the switchable codec bit-rates (modes) have made it an interesting candidate for UMTS and IMT-2000
• Ability to adapt the bit-rate in a wide range may also be of interest for VoIP applications

Adaptive Multi-Rate Codec Schedule

• Qualification testing has been completed on schedule
  • Substantial improvements demonstrated, justifying the AMR technique
  • 5 codecs advanced to the selection phase
  • Good expectation that all, or nearly all, requirements will be met
• Selection phase to end by September 1998
• The AMR speech codec specifications are planned to be completed by December 1998
• The AMR codec will be selected from among five different proposals passing the qualification phase
  • Alcatel/BT/Cellnet/France Telecom/Nortel/Rockwell
  • Ericsson/Nokia 1
  • Ericsson/Nokia 2
  • Lucent
  • NEC

Delivery Dates of AMR Specifications

Target date                  Specifications

December 1998 (required)     • source codec
                             • channel codec
                             • bad frame handling
                             • in-band signalling of codec mode - transmission aspects and definition of parameters
                             • in-band signalling of channel metric and side information - transmission aspects (bit allocation and channel protection)

December 1998 (objective)    • VAD/DTX/comfort noise generation
                             • definition of channel metric and side information parameters
                             • example of codec mode adaptation
                             • layer 3 signalling

June 1999                    • AMR TRAU frames
                             • channel performance tables (GSM 05.05)
                             • TFO
                             • test sequences

December 1999                • performance characterisation
                             • [minimum performance of adaptation algorithms]

AMR Speech Quality Requirements

• Static error conditions: without background noise

            Full-Rate Channel                       Half-Rate Channel
C/I         Ideal case          Worst case          Ideal case          Worst case
            performance         performance         performance         performance
            (requirement)       (objective)         (requirement)       (objective)
no errors   EFR no errors       G.728 no errors     G.728 no errors     FR no errors
19 dB       EFR no errors       G.728 no errors     G.728 no errors     FR no errors
16 dB       EFR no errors       G.728 no errors     G.728 no errors     FR at 10 dB
13 dB       EFR no errors       G.728 no errors     FR at 13 dB         FR at 7 dB
10 dB       G.728 no errors     EFR at 10 dB        FR at 10 dB         FR at 4 dB
7 dB        G.728 no errors     EFR at 7 dB         FR at 7 dB
4 dB        EFR at 10 dB        EFR at 4 dB         FR at 4 dB

Table 1a: Clean speech requirements and objectives under static test conditions.

AMR Speech Quality Requirements

• Static error conditions: in the presence of background noise

            Full-Rate Channel                                   Half-Rate Channel
C/I         Ideal case                Worst case                Ideal case                            Worst case
            performance               performance               performance                           performance
            (requirement)             (objective)               (requirement)                         (objective)
no errors   EFR no errors             G.729 and FR no errors    better than G.729 and FR no errors    G.729 and FR no errors
19 dB       EFR no errors             G.729 and FR no errors    better than G.729 and FR no errors    G.729 and FR no errors
16 dB       EFR no errors             G.729 and FR no errors    better than G.729 and FR no errors    FR at 10 dB
13 dB       EFR no errors             G.729 and FR no errors    FR at 13 dB                           FR at 7 dB
10 dB       G.729 and FR no errors    FR at 10 dB               FR at 10 dB                           FR at 4 dB
7 dB        G.729 and FR no errors    FR at 7 dB                FR at 7 dB
4 dB        FR at 10 dB               FR at 4 dB                FR at 4 dB

Table 1b: Background noise requirements and objectives under static test conditions.

AMR Speech Quality Requirements

• Dynamic conditions

Full-Rate Channel

Requirement (no background noise)   Same or better than the EFR under the same conditions, and also the same or better than all the AMR full rate tested modes under the same conditions

Objective 1                         Same or better than the EFR using the error pattern + 3 dB

Objective 2                         Same or better than the EFR using the error pattern + 6 dB

Table 2a: Requirements and objectives under dynamic test conditions for the full-rate channel

Half-Rate Channel

Requirement                         Same or better than the FR under the same conditions, and also the same or better than all the AMR half rate tested modes under the same conditions

Objective 1                         Same or better than the FR on a full rate channel using the error pattern + 3 dB

Objective 2                         Same or better than the FR on a full rate channel using the error pattern + 6 dB

Table 2b: Requirements and objectives under dynamic test conditions for the half-rate channel

AMR Design Constraints

• Some AMR design constraints (simplified to a general form)
  • Only very moderate complexity increase compared to existing GSM codecs
  • Maximum source coding rate for FR channel modes is 14.4 kbit/s (due to 16 kbit/s sub multiplexing)
  • In-band signalling for codec modes. Independent adaptation on the up- and down-links.
  • The AMR codec shall support Tandem Free Operation
  • The AMR codec shall support DTX operation
  • The AMR codec and its control will operate without any changes to the air-interface channel multiplexing, with the possible exception of the interleave depth.
  • It shall be possible to operate power control independently of the AMR adaptation. Not included in qualification and selection tests.

AMR Design Constraints

• Some AMR design constraints (continued)
  • Codec mode control relating to capacity or radio link quality should be located in the network (BSS).
  • Transmission delay: The total algorithmic round trip delay is limited by EFR+10 ms in AMR FR channel, and HR+10 ms in AMR HR channel.
  • Frame size: 5 ms, 10 ms or 20 ms
  • The AMR in-band signalling shall be expandable to signal the use of future AMR modes including signalling the use of the existing GSM FR, GSM HR and GSM EFR speech coders, one or two wideband modes and all AMR speech codec modes in FR channel mode (to guarantee proper TFO operation).

Qualification tests

• The expected performance of the AMR candidates was evaluated in qualification tests
• Tests conducted in FR and HR channels, including
  • Clear speech: no errors and C/I 19 dB to 1 dB
  • Speech in background noise with channel errors
    • street noise (@15 dB SNR)
    • car noise (@15 dB SNR)
  • Tandeming
  • Speech level dependency
  • Switching between codec modes
  • Dynamic C/I: 5 error profiles
    • 3 profiles for downlink test
    • 2 profiles for uplink test

Overall performance aims

Introduce improvements where they are needed
• low C/I in FR mode
• high C/I in HR mode

[Figure: overall performance aims, plotted against C/I (dB) with ideal frequency hopping. Curves: AMR-FR envelope, AMR-HR envelope, EFR, HR.]

Qualification results overview

• Major benefits of the AMR technique demonstrated, especially in
  • low C/I in FR mode (1 - 2 delta MOS)
  • high C/I in HR mode (same as G.728 - wireline)
  • dynamic conditions in FR mode (up to 1.6 delta MOS)
• Several codecs close to meeting all the requirements
• Most challenging condition: background noise in HR mode

Static C/I - examples

[Figure: Experiment 1a - Family of Curves (FR channel) and Experiment 1b - Family of Curves (HR channel). MOS (1.00 to 5.00) versus condition: No Errors, then C/I = 19, 16, 13, 10, 7, 4 and 1 dB, at -26 dBovl. Curves: Rate A, Rate B, Rate C, Spec.]

Background noise; static C/I - examples

[Figure: Experiment 2a - Family of Curves in FR. MOS (1.00 to 5.00) versus condition: FR No Errors, EC16 FR, EC10 FR, EC4 FR. Curves: Rate A, Rate B, Rate C, Spec., FR.]

[Figure: Experiment 2a - Family of Curves in HR. MOS (1.00 to 5.00) versus condition: HR No Errors, EC19 HR, EC13 HR, EC7 HR. Curves: Rate A, Rate B, Rate C, Spec., HR. Failed conditions are marked.]

Dynamic C/I - examples

• Dynamic test designed to evaluate AMR performance in a “realistic” radio environment with codec adaptation turned on
• Consistent results demonstrated by all candidates
  • adaptation mechanism finds best codec mode
  • in FR mode, significant improvement compared to the fixed rate codec reference, EFR (up to 1.6 delta MOS)
  • in HR mode, quality equivalent to GSM FR or better (improvement sensitive to dynamic profile)
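To make the idea of an adaptation mechanism concrete, here is a minimal sketch of threshold-based codec mode selection with hysteresis. Everything specific in it is hypothetical: the mode names, source rates, C/I thresholds and hysteresis value are invented for illustration and do not come from any AMR candidate or from the qualification tests. The actual adaptation algorithm is specific to the system application.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Mode:
    name: str
    source_kbps: float  # hypothetical source rate
    min_ci_db: float    # hypothetical C/I needed to enter this mode


# Ordered from most robust (lowest source rate) to highest quality.
MODES = [
    Mode("robust", 5.0, float("-inf")),
    Mode("medium", 8.0, 7.0),
    Mode("high", 12.0, 13.0),
]
HYSTERESIS_DB = 2.0  # hypothetical margin that avoids rapid mode toggling


def next_mode(current: int, ci_db: float) -> int:
    """Return the index of the mode to use for the given C/I estimate."""
    # Step up while the C/I estimate clears the next mode's entry threshold.
    while current + 1 < len(MODES) and ci_db >= MODES[current + 1].min_ci_db:
        current += 1
    # Step down only when C/I falls below the entry threshold minus hysteresis.
    while current > 0 and ci_db < MODES[current].min_ci_db - HYSTERESIS_DB:
        current -= 1
    return current


def adapt(ci_trace: Iterable[float], start: int = 0) -> List[str]:
    """Run the selector over a sequence of C/I estimates (in dB)."""
    current = start
    chosen = []
    for ci_db in ci_trace:
        current = next_mode(current, ci_db)
        chosen.append(MODES[current].name)
    return chosen
```

The hysteresis is a design choice of this sketch: a mode is entered at its threshold but only left once C/I drops a further 2 dB, so small fluctuations around a threshold do not cause constant switching.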

[Figure: Experiment 4a Test Results (typical result in FR) and Experiment 4b Test Results (typical result in HR). MOS versus dynamic error condition, DEC1 to DEC5. Legend entries: EFR (4a), FR (4b), Rate A, Rate B, Rate C, Rate D.]

Examples of dynamic conditions

• Dynamic error profiles from Radio Simulator (SMG2)
• One minute long
• Up and down links
• Correlation of C/I between up and down links controlled

[Figure: C and I profiles (dBm) and C/(I+N) profiles (dB) versus time (0 to 60 s) for profiles etsiq3 and etsiq11, with downlink (DL) and uplink (UL) curves.]

Wideband AMR

• The narrowband AMR work will continue with the specification of a wideband mode
  • No target date for finalized specification yet
• Feasibility phase on-going
  • Discussion on Design Constraints and recommended audio bandwidth
• Preliminary working assumption for optimum audio bandwidth (to be confirmed)
  • 100 Hz to 7 kHz (possibly also 100 Hz to 5 kHz)
  • In some types of background noise, advantages to reducing low frequencies
• So far, there has been little activity on wideband AMR due to work load on the narrowband AMR
  • Several organisations indicated they are studying wideband AMR
  • Results probably not available until end 1998

UMTS Matters

• Liaisons with ARIB (Japan)
  • Set-up collaboration on UMTS/IMT-2000 matters
  • ARIB representatives attending SMG11 meetings
• AMR in UMTS and IMT-2000
  • Working assumption for UMTS (decision from SMG#26, subject to re-evaluation after the AMR selection)
  • A possible candidate for IMT-2000 in ARIB, if standardized on schedule
• WCDMA simulations
  • Initial simulation results with the GSM EFR codec and the AMR concept in a WCDMA channel have been presented to SMG11

New Work Item: Noise Suppression

• A new Work Item on Noise Suppression with AMR was approved by SMG in June
  • Optional DSP feature to reduce audio background noise
  • Can improve ease of conversation
  • Located ahead of the speech codec
  • Effective in many but not all background noise environments
  • Optimised for the AMR speech codec
  • Standardisation to guarantee minimum performance level
• The work has not started yet, and its scope and possible standardization have not been fully defined and agreed

Next SMG11 Plenary Meetings

• SMG11#7: 28 September - 2 October 1998; Sophia Antipolis; host Texas Instruments
• SMG11#8: 11 - 15 January 1999
• SMG11#9: 3 - 5 June 1999