
AMCP/WGD10-WP9

AERONAUTICAL MOBILE COMMUNICATIONS PANEL (AMCP)

Working Group D

Hawaii, U.S.A, January 19 - 28, 1999

Agenda Item 4 : Vocoder Selection

Results of Vocoder Evaluation in Japan

Presented by T. Fujimori

(Prepared by T. Fujimori and M. Ueno, ENRI)

SUMMARY

This paper provides the complete results of the vocoder evaluation conducted in Japan by ENRI (Electronic Navigation Research Institute) in October - November 1998. The evaluation consisted of a MOS test (Japanese accented English and Japanese) and an Articulation test (Japanese). The four vocoders under evaluation were labelled A, B, C and D by FAA-TC (FAA’s William J. Hughes Technical Center): A, C and D are candidate vocoders, and B is an industry standard vocoder used as a performance benchmark. The results indicated that the voice quality of Vocoder C was slightly higher than that of Vocoder A, and that Vocoder D was the lowest performing candidate in all test conditions.

1. Introduction

ENRI conducted the vocoder evaluation in Japan in October - November 1998. The purposes of this evaluation were to obtain basic data for studying the acceptability of digital voice in the air traffic environment in Japan, and to contribute to ICAO’s vocoder selection activity by conducting a MOS test for the Japanese language (i.e. test case #14 specified in the Vocoder Selection Criteria developed at the WG-D/9 meeting).

First, the source tapes were produced and sent to FAA-TC in May 1998. FAA-TC then processed these tapes through all four vocoders and returned them to Japan on September 7, 1998. At that time, FAA-TC labelled the four vocoders A, B, C and D (A, C, D : candidate vocoders, B : industry standard vocoder as a performance benchmark). FAA-TC also informed ENRI that errors could not be injected into Vocoder B; Vocoder B was therefore processed at zero Bit Error Rate only.

The evaluation consisted of a MOS test (Japanese accented English and Japanese) and an Articulation test (Japanese). The MOS test was carried out with the participation of 97 air traffic controllers on October 12 - 28, 1998, and the Articulation test was carried out by 4 trained listeners on October 29 - November 6, 1998.

The mean values of the test data were calculated by the end of November 1998. At FAA’s request, ENRI sent the result of the Japanese MOS test, as test case #14 for vocoder selection, to FAA-TC by e-mail on December 4, 1998 (refer to the Appendix). The complete results of the evaluation are reported in this paper.

2. Test Tape Construction

The source tapes for the MOS test were produced by playing background noise (quiet, jet, prop and helicopter) while the speakers read each script. The source tapes for the Articulation test, on the other hand, were produced by mixing the recorded speech with background noise. Because all syllables used for the Articulation test must be recorded at a constant level, monitored with a level meter during recording, a loudspeaker could not be used for noise reproduction. All syllables were therefore recorded without noise, and the background noise was then superimposed.

The audiometric booth was used for recording (see Figure 1). The recording conditions of the source tapes for each test are shown in Table 1. The combinations of noise condition and microphone used for recording were derived from a typical operational environment.

The source tapes were processed through all four vocoders by FAA-TC. The test tapes were constructed by editing the various sentences prepared as speech samples in the processed tapes.

Table 1. Recording Condition of Source Tapes

Test               Noise Condition     Microphone                Speaker                     Sentence
-----------------  ------------------  ------------------------  --------------------------  --------------------------
MOS test           Quiet               SONY ECM-9035             6 Air Traffic Controllers   - English (DAM-ATC)
                   Jet [79dB/A]        SONY ECM-9035             [3 males, 3 females]        - Japanese (ATC / General)
                   Prop [89dB/A]       Sigtronics S-40
                   Heli [93dB/A]       David Clark model-1 M-4

Articulation test  Quiet               SONY ECM-9035             4 Trained Listeners         - Japanese (100 Syllables)
                   Jet [79dB/A]        SONY ECM-9035             [2 males, 2 females]
                   Prop [89dB/A]       Sigtronics S-40
                   Heli [93dB/A]       David Clark model-1 M-4
                   Military [97dB/A]   SONY ECM-9035

Figure 1. Source Tape Recording System
(Diagram. Labelled elements: audiometric booth, DAT#1 for recording, DAT#2 for noise, headphone amplifier for side tone, display, sound level meter, avionics headset, Speakers #1 to #4 and Speaker #5 on the ceiling.)

3. Test method and procedure

3.1 Test Contents

The evaluation consisted of two separate methods, a MOS test and an Articulation test. The MOS test was carried out with the participation of 97 active Japanese Air Traffic Controllers (71 males, 26 females) from TACC (Tokyo Area Control Center), TIA (Tokyo International Airport) and ENRI. The Articulation test was carried out by four trained listeners. A summary of the test contents is given in Table 2, and the participants of the MOS test are summarized in Table 3.

Table 2. Summary of the Test Contents

Test               Test Conditions                                               Participants
-----------------  ------------------------------------------------------------  ---------------------------
MOS test           Vocoder           4 [A, B, C, D]                              97 Air Traffic Controllers
                   Language          2 [Japanese accented English, Japanese]
                   Speaker           6 [3 males, 3 females]
                   Bit Rate          1 [4800bps]
                   Background Noise  4 [Quiet, Jet, Prop, Heli]
                   Bit Error Rate    2 [0.1%, 2%] (Vocoder B : 0% only)

Articulation test  Vocoder           4 [A, B, C, D]                              4 Trained listeners
                   Language          1 [Japanese]
                   Speaker           4 [2 males, 2 females]
                   Bit Rate          1 [4800bps]
                   Background Noise  5 [Quiet, Jet, Prop, Heli, Military]
                   Bit Error Rate    2 [0.1%, 2%] (Vocoder B : 0% only)

Table 3. Summary of the MOS Test Participants

Affiliation  Test dates            Number of participants
-----------  --------------------  ---------------------------
TACC         Oct. 12 - 28 (1998)   75 [ 54 males, 21 females ]
TIA          Oct. 19 - 23 (1998)   20 [ 15 males, 5 females ]
ENRI         Oct. 15 (1998)         2 [ 2 males ]
Total                              97 [ 71 males, 26 females ]

3.2 Test System

Both tests were conducted in the audiometric booth at NTT-AT (Nippon Telegraph and Telephone Advanced Technology Corporation) in Tokyo (see Figure 2 and Figure 3).

Figure 2. Listening System
(Diagram. Labelled elements: PC, DAT player, equalizer, amplifier, and, in the listening room, the receivers and the keypad for scoring.)

Figure 3. Listening Room

3.3 MOS Test Procedure

3.3.1 MOS test

Each participant listens to the sentences one at a time via a single-ear standard receiver in a four-seat audiometric booth, and rates each sentence on the Listening-effort scale (shown in Table 4) by pressing the appropriate one of five buttons on a keypad. Each button is labelled with a description of the scale. The sentence lists are arranged in random order for each group of four people. The four people in a group evaluate the same speech samples; the next group evaluates different speech samples in a different order. Each participant listens to 336 sentences, that is, one sentence per test condition.
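The figure of 336 sentences per participant is consistent with the test conditions listed in Table 2. The following is a minimal sketch of that count, assuming the conditions are fully crossed and that Vocoder B appears at 0% BER only; the variable names are illustrative and do not come from the test plan.

```python
from itertools import product

# Test factors for the MOS test (values taken from Table 2).
languages = ["Japanese accented English", "Japanese"]
speakers = [f"speaker_{i}" for i in range(1, 7)]  # 3 males, 3 females
noises = ["Quiet", "Jet", "Prop", "Heli"]

# Candidate vocoders A, C and D are tested at two bit error rates.
# Errors could not be injected into benchmark Vocoder B, so it is
# included at 0% BER only (assumption stated in the lead-in).
vocoder_ber = [(v, ber) for v in ("A", "C", "D") for ber in ("0.1%", "2%")]
vocoder_ber.append(("B", "0%"))

# One sentence per fully crossed condition.
conditions = list(product(vocoder_ber, languages, speakers, noises))
print(len(conditions))  # 7 * 2 * 6 * 4
```

Seven vocoder/BER combinations, two languages, six speakers and four noise conditions give 7 x 2 x 6 x 4 = 336.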

Table 4. Listening-effort Scale

Effort required to understand the meaning of sentences   Score
-------------------------------------------------------  -----
Complete relaxation possible; no effort required           5
Attention necessary; no appreciable effort required        4
Moderate effort required                                   3
Considerable effort required                               2
No meaning understood with any feasible effort             1

3.3.2 Questionnaire on acceptability of digital voice

After the MOS test, the participating air traffic controllers were asked to answer the following questionnaire, in order to obtain reference data on how acceptable they find digital voice quality in the Air Traffic Control environment. Although it is a simple approach, it helps to establish the acceptable level for digital voice use among Japanese air traffic controllers.

Questionnaire

Which level of the Listening-effort scale, and above, do you consider acceptable as voice quality for use in the Air Traffic Control environment? Please mark with a check.

    Complete relaxation possible; no effort required
    Attention necessary; no appreciable effort required
    Moderate effort required
    Considerable effort required
    No meaning understood with any feasible effort

3.4 Articulation Test Procedure

Participants listen to the syllables one at a time via a single-ear standard receiver in a four-seat audiometric booth, and type each syllable received with the keypad which has keys assigned all and in Japanese. Each participant listens to the list of 100 syllables (see Table 5) in random order for every test condition.

The monosyllable articulation value (Sav) is defined as the percentage of the 195 monosyllables that are correctly received. Here, the 100 Japanese syllables are separated into 195 monosyllables. According to the ITU-T Handbook on Telephonometry, the sentence articulation value is considered to be nearly 100% if Sav is above 80%. It is also known that satisfaction with communication declines rapidly if Sav is below 80%. Therefore, we tentatively established "Sav above 80%" as the satisfactory value in this articulation test.

Table 5. Japanese Syllables List (syllables: 100, monosyllables: 195)

NO    1           2           3           4           5
--   ---------   ---------   ---------   ---------   ---------
 1   re   pa     ro   pya    bya  kyo    o    do     mi   ryu
 2   kya  te     go   nya    ra   gya    ru   a      pu   kyu
 3   pyu  me     ri   sya    ga   to     pyo  ma     sa   nyu
 4   hu   byu    hi   hyo    ze   zi     su   myo    se   da
 5   e    gyu    gu   mya    ge   ya     bi   byo    ti   zyo
 6   nyo  zu     ku   ho     tya  mu     mo   rya    gi   ka
 7   ni   gyo    bu   bo     pe   tyo    zo   ke     i    hya
 8   zya  u      ryo  he     tyu  ko     tu   so     zyu  ba
 9   myu  ta     syo  ha     za   pi     de   no     si   be
10   nu   wa     yu   ne     ki   po     syu  na     hyu  yo

4. Results

4.1 MOS value

Means for the MOS test are presented in Tables 6 and 7. Graphical representations of these means are shown in Figures 4, 5, 6 and 7.

Table 6. Means for MOS test under Japanese accented English language

Bit Error Rate  Background                Vocoder
(BER)           Noise        A      C      D      B (Ref.)
--------------  ----------  -----  -----  -----  --------
BER 0.1%        QUIET       4.34   4.41   3.06   4.23
                JET         3.71   3.89   2.42   3.46
                PROP        3.94   4.15   2.49   3.75
                HELI        3.12   3.26   1.92   2.95
BER 2%          QUIET       4.14   4.34   2.79   4.23
                JET         3.62   3.86   2.08   3.46
                PROP        3.82   4.07   2.10   3.75
                HELI        2.95   3.16   1.63   2.95

(Vocoder B : BER 0 %)

Table 7. Means for MOS test under Japanese language

Bit Error Rate  Background                Vocoder
(BER)           Noise        A      C      D      B (Ref.)
--------------  ----------  -----  -----  -----  --------
BER 0.1%        QUIET       4.45   4.60   3.11   4.20
                JET         4.09   4.23   2.70   3.67
                PROP        4.14   4.26   2.64   3.57
                HELI        3.15   3.31   2.07   2.88
BER 2%          QUIET       4.25   4.44   2.80   4.20
                JET         3.80   4.09   2.31   3.67
                PROP        3.77   3.99   2.45   3.57
                HELI        2.96   3.20   1.66   2.88

(Vocoder B : BER 0 %)

(Note : The above MOS values under the BER 0.1% & PROP noise condition are the result of the MOS test as test case #14 in the Vocoder Selection Criteria developed at the WG-D/9 meeting. See also the Appendix.)

[Comments]

(1) The MOS values for Vocoder A and Vocoder C are generally higher than those for Vocoder B (performance benchmark), although Vocoder A and Vocoder C are subject to bit errors and Vocoder B is not. The exceptions are Vocoder A at BER 2% in Japanese accented English under Quiet noise (4.14 against 4.23) and under Heli noise (equal at 2.95).
(2) The MOS values for Vocoder C are slightly higher than those for Vocoder A in all test conditions.
(3) The MOS values for Vocoder D are significantly lower than those for the other Vocoders.
(4) In some cases the MOS value for Prop noise (89dB/A) is higher than that for Jet noise (79dB/A). This appears to reflect differences in the characteristics of the microphones used in producing the source tapes. An additional Articulation test found that the Sigtronics S-40 microphone used in Prop noise outperformed the SONY ECM-9035 microphone used in Jet noise. (See Figure 8)
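The comparison between Vocoder C and Vocoder A can be reproduced directly from the means in Tables 6 and 7. The sketch below is illustrative only: the dictionary layout and names are assumptions, and the values are copied from the two tables.

```python
NOISES = ["QUIET", "JET", "PROP", "HELI"]

# MOS means from Tables 6 and 7, keyed by (language, BER).
# Each list follows the order of NOISES.
mos = {
    ("English", "0.1%"): {"A": [4.34, 3.71, 3.94, 3.12], "C": [4.41, 3.89, 4.15, 3.26]},
    ("English", "2%"): {"A": [4.14, 3.62, 3.82, 2.95], "C": [4.34, 3.86, 4.07, 3.16]},
    ("Japanese", "0.1%"): {"A": [4.45, 4.09, 4.14, 3.15], "C": [4.60, 4.23, 4.26, 3.31]},
    ("Japanese", "2%"): {"A": [4.25, 3.80, 3.77, 2.96], "C": [4.44, 4.09, 3.99, 3.20]},
}

# Difference C - A for each of the 16 conditions.
diffs = []
for (language, ber), table in mos.items():
    for noise, a, c in zip(NOISES, table["A"], table["C"]):
        diffs.append(round(c - a, 2))
        print(f"{language:8s} BER {ber:4s} {noise:5s} C - A = {c - a:+.2f}")

c_always_higher = all(d > 0 for d in diffs)
print("C higher than A in every condition:", c_always_higher)
print("Smallest and largest margin:", min(diffs), max(diffs))
```

The margins are all positive but small relative to the five-point scale, which matches the description of Vocoder C as slightly higher than Vocoder A.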

Figure 4. MOS value for Japanese accented English with BER 0.1% (Vocoder B : BER 0 %)
(Chart: MOS value, scale 1 to 5, for Vocoder A, Vocoder C, Vocoder D and Vocoder B (Ref.) under Quiet, Jet, Prop and Heli noise.)

Figure 5. MOS value for Japanese accented English with BER 2% (Vocoder B : BER 0 %)
(Chart: MOS value, scale 1 to 5, for Vocoder A, Vocoder C, Vocoder D and Vocoder B (Ref.) under Quiet, Jet, Prop and Heli noise.)

Figure 6. MOS value for Japanese with BER 0.1% (Vocoder B : BER 0 %)
(Chart: MOS value, scale 1 to 5, for Vocoder A, Vocoder C, Vocoder D and Vocoder B (Ref.) under Quiet, Jet, Prop and Heli noise.)

(Note : The above MOS values under the PROP noise condition are the result of the MOS test as test case #14 in the Vocoder Selection Criteria developed at the WG-D/9 meeting. See also the Appendix.)

Figure 7. MOS value for Japanese with BER 2% (Vocoder B : BER 0 %)
(Chart: MOS value, scale 1 to 5, for Vocoder A, Vocoder C, Vocoder D and Vocoder B (Ref.) under Quiet, Jet, Prop and Heli noise.)

4.2 Response to Questionnaire on acceptability of digital voice

The response to the questionnaire is shown in Table 8.

Table 8. Response to the questionnaire

Listening-effort scale                                     Number of responses
---------------------------------------------------------  -------------------
5 : Complete relaxation possible; no effort required         4 (4%)
4 : Attention necessary; no appreciable effort required     58 (60%)
3 : Moderate effort required                                33 (34%)
2 : Considerable effort required                             2 (2%)
1 : No meaning understood with any feasible effort           0 (0%)
Total                                                       97 (100%)
Mean Score                                                  3.66

[Comments]

(1) 60% of all participants selected "4" (i.e. Attention necessary; no appreciable effort required) and 34% chose "3" (i.e. Moderate effort required). The mean score is 3.66.
(2) According to the above result, MOS=3.66 can be tentatively assumed as the acceptable level for use of digital voice in the Air Traffic Control environment.
(3) Experience indicates that MOS=3.5 can be regarded as a benchmark for voice quality in the construction of a digital network. This supports MOS=3.66 as an acceptable level for use of digital voice.
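The mean score of 3.66 is the response-weighted average of the scale values in Table 8. A short sketch of the arithmetic follows; the variable names are illustrative.

```python
# Number of responses per Listening-effort score (from Table 8).
votes = {5: 4, 4: 58, 3: 33, 2: 2, 1: 0}

total = sum(votes.values())
weighted_sum = sum(score * count for score, count in votes.items())
mean_score = weighted_sum / total

# Percentage share of each score, rounded to the nearest whole percent.
shares = {score: round(100 * count / total) for score, count in votes.items()}

print(f"total responses = {total}")
print(f"mean score      = {mean_score:.2f}")
print(f"shares (%)      = {shares}")
```

The weighted sum is 355 over 97 responses, which gives 3.66 to two decimal places.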

4.3 Articulation value Means for Articulation test (Monosyllable Articulation value) are presented in Table 9 and graphical representations are shown in Figure 9 and 10.

Table 9. Monosyllable Articulation value (%)

Bit Error Rate  Background                Vocoder
(BER)           Noise        A      C      D      B (Ref.)
--------------  ----------  -----  -----  -----  --------
BER 0.1%        QUIET       86.0   86.7   81.5   82.9
                JET         80.9   82.9   77.1   77.4
                PROP        82.6   84.6   76.3   79.3
                HELI        77.4   80.9   72.2   72.2
                MIL         70.2   72.1   67.1   66.2
BER 2%          QUIET       84.1   85.4   78.7   82.9
                JET         79.8   82.5   73.4   77.4
                PROP        80.6   85.0   71.5   79.3
                HELI        73.8   79.3   68.6   72.2
                MIL         66.7   70.0   62.6   66.2

[Comments]

(1) The monosyllable articulation values for Vocoder A and Vocoder C are higher than those for the benchmark Vocoder B, even under the condition of BER 2%.
(2) The monosyllable articulation values for Vocoder C are slightly higher than those for Vocoder A under all the test conditions.
(3) The monosyllable articulation values for Vocoder D are significantly lower than those for Vocoder A and Vocoder C under all the test conditions.
(4) With the exception of Military background noise, the monosyllable articulation values for Vocoder C at BER 0.1% are above 80% (i.e. the tentative satisfactory value). At BER 2%, the value for Heli noise (79.3%) also falls just below 80%.
(5) The monosyllable articulation values for Vocoder A are above 80% under Quiet and Prop noise at both bit error rates, and under Jet noise at BER 0.1% (the Jet value at BER 2% is 79.8%).
(6) Some monosyllable articulation values for Prop noise are higher than those for Jet noise, as in the results of the MOS test. This also seems to depend on the characteristics of the microphones used for recording the speech. (See Figure 8)
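The comparison against the tentative 80% criterion can be checked condition by condition from Table 9. The sketch below is illustrative: the data layout and the helper `conditions_above` are hypothetical names introduced here, and the values are copied from Table 9 for the three candidate vocoders.

```python
THRESHOLD = 80.0  # tentative criterion: Sav above 80%
NOISES = ["QUIET", "JET", "PROP", "HELI", "MIL"]

# Monosyllable articulation values (%) from Table 9, in the order of NOISES.
sav = {
    "0.1%": {
        "A": [86.0, 80.9, 82.6, 77.4, 70.2],
        "C": [86.7, 82.9, 84.6, 80.9, 72.1],
        "D": [81.5, 77.1, 76.3, 72.2, 67.1],
    },
    "2%": {
        "A": [84.1, 79.8, 80.6, 73.8, 66.7],
        "C": [85.4, 82.5, 85.0, 79.3, 70.0],
        "D": [78.7, 73.4, 71.5, 68.6, 62.6],
    },
}


def conditions_above(vocoder):
    """Return the (BER, noise) pairs where the vocoder exceeds the threshold."""
    return [
        (ber, noise)
        for ber, table in sav.items()
        for noise, value in zip(NOISES, table[vocoder])
        if value > THRESHOLD
    ]


for vocoder in ("A", "C", "D"):
    print(vocoder, conditions_above(vocoder))
```

Listing the passing conditions per vocoder makes it easy to see where a value sits just under the criterion, for example Vocoder C under Heli noise at BER 2%.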

Figure 8. Comparison of Microphone Characteristic (Test Condition : Vocoder A, BER 0.1%, Quiet)
(Chart: Monosyllable Articulation Value (%), axis 70 to 100, for the SONY, Sigtronics and David Clark microphones.)

Figure 9. Monosyllable Articulation value for Japanese with BER 0.1% (Vocoder B : BER 0 %)
(Chart: Monosyllable Articulation Value (%), axis 50 to 100, for Vocoder A, Vocoder C, Vocoder D and Vocoder B (Ref.) under Quiet, Jet, Prop, Heli and Military noise.)

Figure 10. Monosyllable Articulation value for Japanese with BER 2% (Vocoder B : BER 0 %)
(Chart: Monosyllable Articulation Value (%), axis 50 to 100, for Vocoder A, Vocoder C, Vocoder D and Vocoder B (Ref.) under Quiet, Jet, Prop, Heli and Military noise.)

5. Data Analysis (Analysis of Variance)

5.1 MOS test

An Analysis of Variance (ANOVA) was conducted to ascertain the relevance of the independent variables (i.e. Vocoder, Language, BER, Sex of speaker and Background noise) in the MOS test. Significance was assessed at the 99% confidence level (α = 0.01). The results of the Analysis of Variance are shown in Table 10. The data for Vocoder B were excluded from the analysis, since it is the performance benchmark vocoder and its BER is 0% only.

Table 10. Analysis of Variance for MOS test

Independent Variable  DF   Anova SS  Mean Square  F        Result
--------------------  ---  --------  -----------  -------  -----------
Vocoder                2   139.78    69.89        550.02   Significant
Language               1     1.29     1.29         10.19   Significant
BER                    1     3.39     3.39         26.72   Significant
Sex of speaker         1    19.67    19.67        154.81   Significant
Background Noise       3    53.21    17.74        139.59   Significant

[Comments]

(1) The dependent measure of the MOS test varied significantly with all independent variables.
(2) As to the language, the MOS values for messages spoken in Japanese are slightly higher than those for messages spoken in Japanese accented English. One possible reason is that the DAM-ATC sentence lists, which were developed by FAA and used in the Japanese accented English MOS test, included some FIX names and phraseology unfamiliar to Japanese air traffic controllers.
(3) As to the sex of speaker, the MOS values for messages spoken by males are significantly higher than those for messages spoken by females. This result is clearly due to the difference of speech level, since the speech levels were left to the speakers when recording the speech sentences for the MOS test. (Refer to Appendix)
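The columns of Table 10 can be cross-checked with the usual ANOVA relations, Mean Square = SS / DF and F = Mean Square / error mean square. The error mean square is not reported in the table, so the sketch below back-computes it from each row as an assumption-based consistency check; it is not a reproduction of the original analysis.

```python
# Rows of Table 10: name -> (DF, Anova SS, Mean Square, F).
rows = {
    "Vocoder": (2, 139.78, 69.89, 550.02),
    "Language": (1, 1.29, 1.29, 10.19),
    "BER": (1, 3.39, 3.39, 26.72),
    "Sex of speaker": (1, 19.67, 19.67, 154.81),
    "Background Noise": (3, 53.21, 17.74, 139.59),
}

implied_error_ms = {}
for name, (df, ss, ms, f) in rows.items():
    ms_from_ss = ss / df  # should reproduce the Mean Square column
    # Assumes F = Mean Square / error mean square for every row.
    implied_error_ms[name] = ms / f
    print(
        f"{name:17s} SS/DF = {ms_from_ss:6.2f} (table {ms:6.2f}), "
        f"implied error MS = {implied_error_ms[name]:.4f}"
    )
```

Every row implies nearly the same error mean square (about 0.127), which suggests the reported F values share a single error term and that the table is internally consistent.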

5.2 Articulation test

ANOVA was conducted to ascertain the relevance of the independent variables (i.e. Vocoder, BER, Sex of speaker and Background noise) in the Articulation test. Significance was assessed at the 99% confidence level (α = 0.01). The results of the ANOVA are shown in Table 11. The data for Vocoder B (performance benchmark) were excluded from the analysis.

Table 11. Analysis of Variance for Articulation test

Independent Variable  DF   Anova SS  Mean Square  F       Result
--------------------  ---  --------  -----------  ------  ---------------
Vocoder                2   1365.5    682.7        12.48   Significant
BER                    1    189.7    189.7         3.47   Not Significant
Sex of speaker         1     71.7     71.7         1.30   Not Significant
Background Noise       4   3573.0    893.2        16.32   Significant

[Comments]

(1) The dependent measure of the articulation test varied significantly with the factors Vocoder and Background noise.
(2) In contrast with the ANOVA for the MOS test, the sex of speaker resulted in "Not significant". This is simply because the speech level was equalized when producing the source tapes for the Articulation test. Consequently, it is considered that there are no differences in speech quality between messages spoken by males and females.

6. Conclusion

The results indicate the following.

(1) The voice quality of Vocoder C is slightly higher than that of Vocoder A under all test conditions.
(2) The voice quality of Vocoder D is significantly lower than that of the other Vocoders.
(3) With the exception of Helicopter noise, the MOS values for Vocoder C exceed the acceptable level of 3.66, which was assumed on the basis of the questionnaire answered by the participating air traffic controllers.
(4) The monosyllable articulation values for Vocoder C are above 80% (i.e. the tentative satisfactory value) under BER 0.1% and all background noise conditions except Military noise.

7. Recommendation WG-D is invited to consider these results in the vocoder selection process with respect to standardizing a 4.8 kbps vocoder for use in VDL Mode 3 air/ground communication system.

[References]

(1) “Vocoder Evaluation Plan in Japan” (ICAO AMCP WG-D/8 WP-23, Oberpfaffenhofen, Germany, December, 1997)

(2) “Overview of Vocoder evaluation program in Japan” (ICAO AMCP WG-D/9 WP-5, Ottawa, Canada, September, 1998)

(3) ITU-T Handbook on Telephonometry

(4) ITU-T Recommendation P.800

Appendix

Result of Japanese MOS Test for Vocoder Selection (The test case #14 specified in the Vocoder Selection Criteria)

Mean scores (MOS value) for each vocoder are presented in Table 1 and a graphical representation of MOS value is shown in Figure 1.

Table 1. MOS value for each vocoder (Japanese Language)

Sex of Speaker             Vocoder
                 A      B      C      D
--------------  -----  -----  -----  -----
Male            4.39   4.07   4.40   2.81
Female          3.90   3.07   4.13   2.48
Mean            4.14   3.57   4.26   2.64

(Vocoder A,C,D : BER 0.1% / Vocoder B : BER 0%)

Figure 1. Graphical representation of MOS value
(Chart: Prop Noise Condition (Japanese). MOS value, scale 1 to 5, for Vocoders A, B, C and D, shown for Male, Female and Mean. Vocoder A,C,D : BER 0.1% / Vocoder B : BER 0%)

The MOS value for Vocoder C is slightly higher than that for Vocoder A. The value for Vocoder D is significantly lower than those for the other Vocoders. The MOS value for male voices is higher than that for female voices; this appears to reflect the difference in speech level between the male and female speakers.
