ISCA Distinguished Lecture

Introduction to Speech Science: with Vocal-tract Models
Takayuki Arai, Ph.D.
Professor, Sophia University (Tokyo, Japan)

ISCA: International Speech Communication Association
• ISCA started in 1999 jointly with ESCA (European Speech Communication Association) and ICSLP (International Conference of Spoken Language Processing)
• Purpose: to promote Speech Communication Science and Technology, both in the industrial and academic areas, covering all the aspects of Speech Communication (acoustics, phonetics, phonology, linguistics, natural language processing, artificial intelligence, cognitive science, signal processing, pattern recognition, etc.)
• ISCA offers a wide range of services; in particular Interspeech, ISCA workshops, SIGs (special interest groups), and Distinguished Lectures.

www.isca-speech.org Aug., 2018 ISCA Distinguished Lecture 2

ISCA Objectives:
• to stimulate scientific research and education,
• to organize conferences, courses and workshops,
• to publish, and to promote publication of scientific works,
• to promote the exchange of scientific views in the field of speech communication,
• to encourage the study of different languages,
• to collaborate with all related associations,
• to investigate industrial applications of research results,
• and, more generally, to promote relations between public and private, and between science and technology.

Part 1:


Chiba and Kajiyama (1941-2)
• The Vowel: Its Nature and Structure
• This book approached the mechanism of vowel production and perception from the viewpoints of
– physiology
– physics
– psychology
• It integrated them together

Chiba and Kajiyama (1941-2)
• Showed that the waveform of a vowel is treatable by Fourier analysis
• Introduced the concept of the electric-circuit analog to simulate a vocal tract for the first time.
• Succeeded in calculating vowel spectra from the data of the vocal tract shapes that they measured.

Chiba and Kajiyama (1941-2)
• They measured the vocal tract shape using a combination of X-ray photography, palatography, and laryngoscopic observation of the pharynx.
• The cross-sectional area function for each vowel was then used to calculate the spectra of the sounds.

Chiba and Kajiyama (1941-2)
• They successfully approximated the first two formants from the vocal tract shapes, and the frequencies matched well to the ones calculated from natural speech sounds.

Chronology (from Arai, 2004)
1929 Chiba started Phonetics Lab. at Tokyo School of Foreign Languages; Acoustical Society of America (ASA) was founded
1934 Research project on "The Vowel" started (until 1939)
1939 World War II started
1941 Pacific War started
1942 "The Vowel" was published (Tokyo-Kaiseikan)
1945 The end of the war
1948 Stevens became a doctoral student at Acoustics Lab., MIT
1950 The first citation of "The Vowel" by Fant; Chiba became a professor of English at Sophia Univ.; Chiba started Phonetics Lab. at Sophia Univ.
1952 Stevens received Sc.D.; Jakobson, Fant and Halle (1952) cited "The Vowel"
1953 Stevens, Kasowski, and Fant (1953) cited "The Vowel"
1954 Stevens became an assistant professor at MIT
1955 Stevens and House (1955) cited "The Vowel"
1958 The 2nd Edition of "The Vowel" was published from PSJ; Fujimura joined the Speech Communication Group at MIT (until 1961); Fujisaki joined the Speech Communication Group at MIT (until 1959); Chiba reported clinical studies on aphasia (The Bulletin, PSJ)
1959 Chiba passed away

Special Issue on 60th Anniversary of the Publication of The Vowel, Its Nature and Structure by Chiba and Kajiyama (J. Phonetic Soc. Jpn., Vol. 5, 2001)
Title | Author | page
T. Chiba and M. Kajiyama, Pioneers in Speech Acoustics | Gunnar Fant | 4
The Chiba and Kajiyama Book as a Precursor to the Acoustic Theory of Speech Production | Kenneth N. Stevens | 6
Nihonno Onsei, OnseiKenyuno KagayaitaHibi | Kazuo Kameda | 8
The Authors of The Vowel | - | 12
Chichi no Omoide | Kouichi Kajiyama | 14
Chiba, Kajiyamano Boin-Ron nituite | Kikuo Maekawa, Kiyoshi Honda | 15
The Replication of Chiba and Kajiyama’s Mechanical Models of the Human Vocal Cavity | Takayuki Arai | 31

Chiba and Kajiyama (1941-2)
• “Artificial Vowels”
– static clay models of five Japanese vowels were made
– their 3D measurements were shown to be realistic by comparing artificially produced sounds with naturally produced counterparts
Chiba and Kajiyama’s mechanical model for vowel [i] (Chiba and Kajiyama, 1941, p.129)

Arai (2001)
• Cylinder-type models: Precise reproduction of original models based on Chiba & Kajiyama’s measurements and simplification
• A vowel-like sound is emitted when a sound source is fed through the glottis end.
• They were used for education purposes in speech science classes and found to be extremely useful for teaching basic concepts in vowel production
– source-filter theory
– relation between vocal-tract shape and vowel quality


Acoustic-Phonetics Demonstrations
Acoustic-Phonetics Demos (splab.net/APD/, please visit!)
• We have been collecting a set of demos in acoustic phonetics
– an online version featuring acoustic demonstrations
• The materials are mainly in the form of
– sounds
– videos/images
– explanatory descriptions
• We have released them through our Web site & YouTube

Acoustic-Phonetics Demonstrations G500
Source-filter Theory
• We can simply model speech sounds as a combination of “source” and “filter”
Source → Vocal tract (filter) → Speech
・ source
・Aspiration source
・Transient source
・Frication source
When changing the vocal-tract configuration, a source is modified in terms of characteristics.


Acoustic-Phonetics Demonstrations G500

Time vs. Frequency
(Figure labels: Time, Frequency, H(ω))

Pure tone as an input signal (from Handbook in Phonetics, 2018)
• There is a tube with a uniform diameter of 28 mm and a length of 170 mm.
• When we input a sinusoid from a loudspeaker, a pure tone is produced through the tube.
• What happens when we vary the frequency of the sinusoidal wave?
• The plot shows the output sound through the acoustic tube when we measure the output level relative to the input level of the sinusoid as its frequency varies.
• It shows that the output level fluctuates as the frequency increases.
• Specifically, the output-level curve has peaks at approximately 500 Hz, 1500 Hz, and 2500 Hz.
• They are due to resonance, and the peak frequencies are called resonance frequencies.
(Figure: pure tone)
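The three peaks quoted for the 170 mm tube agree with the textbook quarter-wavelength formula for a uniform tube that is closed at one end and open at the other, f_n = (2n − 1)c / (4L). The sketch below is only an illustration under stated assumptions: the speed of sound c = 340 m/s is not given on the slide, losses and end corrections are ignored, and the function name is made up for this example.

```python
def uniform_tube_resonances(length_m, n_resonances=3, speed_of_sound=340.0):
    """Quarter-wavelength resonances (Hz) of a lossless uniform tube,
    closed at one end and open at the other.

    speed_of_sound = 340 m/s is an assumed value, not taken from the slides.
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, n_resonances + 1)]


if __name__ == "__main__":
    # 170 mm tube from the pure-tone demonstration
    for n, freq in enumerate(uniform_tube_resonances(0.170), start=1):
        print(f"resonance {n}: {freq:.0f} Hz")
```

Under these assumptions the tube diameter (28 mm) does not enter the formula; only the length and the speed of sound do.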


Vocal Folds

Impulse train as an input signal (from Handbook in Phonetics, 2018)
• We are going to input a combination of many sinusoids at multiples of 100 Hz.
• This means we input an impulse train whose fundamental frequency is 100 Hz.
• The fundamental component and its multiples are called harmonics.
• A glottal source signal also has a similar spectrum (the harmonic structure).
• When we input the impulse train into the tube, the output spectrum has the harmonic structure.
• We can observe that the output signal with the impulse train input has the same peaks.
• The uniform tube approximates the vocal-tract configuration of the neutral vowel, schwa [ə].
(Figure: impulse train)
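The harmonic structure of a 100 Hz impulse train can be checked numerically. The following is a minimal sketch, not the procedure used in the lecture: the sampling rate (8000 Hz), the signal duration (0.1 s), and both function names are assumptions chosen so that the example stays small. It evaluates the discrete Fourier transform at a few frequencies and shows energy at multiples of 100 Hz and none halfway between them.

```python
import cmath
import math


def impulse_train(f0_hz, fs_hz, duration_s):
    """One unit impulse per period (assumes fs_hz is an integer multiple of f0_hz)."""
    period = int(round(fs_hz / f0_hz))
    n_samples = int(round(duration_s * fs_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def dft_magnitude(signal, fs_hz, freq_hz):
    """Magnitude of the discrete Fourier transform of `signal` at one frequency."""
    w = -2j * math.pi * freq_hz / fs_hz
    return abs(sum(x * cmath.exp(w * n) for n, x in enumerate(signal)))


if __name__ == "__main__":
    fs = 8000                          # assumed sampling rate
    x = impulse_train(100, fs, 0.1)    # ten periods of a 100 Hz impulse train
    for f in (100, 150, 200, 250, 300):
        print(f"{f} Hz -> {dft_magnitude(x, fs, f):.3f}")
```

Passing such a signal through a resonator keeps the harmonics where they are and only changes their relative levels, which is why the output shows the same resonance peaks as the pure-tone sweep.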


Acoustic-Phonetics Demonstrations G100
Sound Spectrogram
• Time-Frequency Pattern of speech
– The intensity is represented by colors
(Figure axes: Frequency, Time)

Artificial Larynx
• Types: Electrolarynx, Whistle-type
• Devices shown: SECOM Your Tone, Servox, Myna (Dencom)


Acoustic-Phonetics Demonstrations D710
Sound Spectrograph (from Kent & Read, 1992)
• The variable band-pass filter was incorporated in this machine.
• An utterance is first recorded on a magnetic drum.
• This allows the same sound to be played back repeatedly.
• The signal modulates a variable carrier frequency → called heterodyning.
• It sweeps the signal to be analyzed past a fixed filter.
• There are two filter bandwidths: 300 Hz and 45 Hz.
Phonetics Lab. at Sophia Univ. (Rion, SG-07)

Vocal-tract Models
• We have developed a series of vocal-tract models for different purposes, as shown in this figure.
• Combining any of these models with a sound source enables us to demonstrate
– 1) the source-filter theory of vowel production, and
– 2) the relationship between the vocal-tract shape and vowel quality.
• The online demonstration contains video clips with sounds in addition to a simple description.
– If physical models are available, the benefit to students is even greater, but the video clips alone are still helpful.

Acoustic-Phonetics Demonstrations F100 / F110 / F120
Anatomy & Physiology
• Lung models:

Sliding Two-tube (S2T) Model
T. Arai, Proc. of the ASJ Meeting, Sep. 2018. Handbook in Phonetics, 2018
• A two-tube model is used to simulate vowels.
• The resonances of the entire model are approximated by those of each individual tube.
• The figure shows the cross-sectional dimensions of the two-tube model.
• The first tube has a smaller area than the second tube.
• The narrow tube has the area of A1 and its length is L1.
• The wide tube has the area of A2 and its length is L2.
• If L1+L2 is constant, we can discuss the lower resonance frequencies of the two-tube model as a function of the location of the boundary between the two tubes.
• To demonstrate that the acoustic outcome from the two-tube model and its resonances can be approximated with a simple acoustic theory of a uniform tube, we developed two types of sliding two-tube (S2T) model.
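The idea that each tube contributes its own resonances can be sketched as follows. This is a simplified illustration, not the analysis from the cited papers: it assumes the narrow back tube (closed at the source end, opening into the wide tube) and the wide front tube (nearly closed at the junction, open at the mouth) each behave as independent quarter-wavelength resonators. The speed of sound (340 m/s), the total length (170 mm), and the function names are hypothetical values chosen for the example.

```python
def quarter_wave(length_m, n=1, speed_of_sound=340.0):
    """n-th quarter-wavelength resonance (Hz) of a closed-open tube."""
    return (2 * n - 1) * speed_of_sound / (4.0 * length_m)


def s2t_lowest_resonances(l1_m, total_m, speed_of_sound=340.0):
    """Two lowest resonances when the two tubes are treated as decoupled.

    l1_m    : length of the narrow back tube (L1)
    total_m : L1 + L2, held constant while the boundary slides
    """
    l2_m = total_m - l1_m
    candidates = []
    for length in (l1_m, l2_m):
        if length > 0:  # a zero-length tube contributes no resonance
            candidates += [quarter_wave(length, n, speed_of_sound) for n in (1, 2)]
    return sorted(candidates)[:2]


if __name__ == "__main__":
    total = 0.170  # assumed total length in metres
    for l1_mm in range(20, 171, 30):
        lowest = s2t_lowest_resonances(l1_mm / 1000.0, total)
        print(f"L1 = {l1_mm:3d} mm ->", [round(f) for f in lowest], "Hz")
```

In this decoupled picture the two lowest resonances coincide when L1 = L2; real tubes are acoustically coupled, so the resonances would stay apart rather than meet.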


Sliding Two-tube (S2T) Model
T. Arai, Proc. of the ASJ Meeting, Sep. 2018. Handbook in Phonetics, 2018
• The top panel shows its side view.
• The black arrow indicates that there is a small hole for a source.
• The bottom panel shows the view of the mouth end.
• The hole looks like the combination of a large square (30 mm by 30 mm) & a small square (10 mm by 10 mm).
• A1 is 10 x 10 = 100 mm².
• A2 is 100 + 30 x 30 = 1000 mm².
(Figure labels: L1, L2, Sound source, 30 x 30, 10 x 10)

Sliding Two-tube (S2T) Model
T. Arai, Proc. of the ASJ Meeting, Sep. 2018. Handbook in Phonetics, 2018
• We recorded and analyzed an output signal when we produced sounds from the S2T Model.
• This figure shows a spectrographic representation of an output signal produced by blowing into a reed-type sound source attached to the S2T Model and sliding the inner bar simultaneously.
• The horizontal axis corresponds to the length of the back cavity L1.
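With A1 = 100 mm² and A2 = 1000 mm², the area ratio is 10. For a lossless plane-wave two-tube model that is closed at the source end and ideally open at the mouth, matching the impedances at the junction gives the textbook condition tan(kL1)·tan(kL2) = A2/A1 with k = 2πf/c. The sketch below solves the equivalent form A2·cos(kL1)·cos(kL2) − A1·sin(kL1)·sin(kL2) = 0 by a sign-change scan plus bisection. It is an illustration under assumptions, not the author's calculation: c = 340 m/s, the example lengths, and all function names are hypothetical.

```python
import math


def _junction_mismatch(freq_hz, l1, l2, a1, a2, c):
    """Zero exactly at the resonances of the lossless two-tube model."""
    k = 2.0 * math.pi * freq_hz / c
    return (a2 * math.cos(k * l1) * math.cos(k * l2)
            - a1 * math.sin(k * l1) * math.sin(k * l2))


def two_tube_resonances(l1, l2, a1, a2, c=340.0, f_max=4000.0, step=1.0):
    """Resonances (Hz) below f_max; lengths in metres, areas in any common unit."""
    roots = []
    f_lo = step
    g_lo = _junction_mismatch(f_lo, l1, l2, a1, a2, c)
    while f_lo < f_max:
        f_hi = f_lo + step
        g_hi = _junction_mismatch(f_hi, l1, l2, a1, a2, c)
        if g_lo == 0.0:
            roots.append(f_lo)
        elif g_lo * g_hi < 0.0:
            a, b, g_a = f_lo, f_hi, g_lo
            for _ in range(60):  # bisection inside the bracketing interval
                mid = 0.5 * (a + b)
                g_mid = _junction_mismatch(mid, l1, l2, a1, a2, c)
                if g_a * g_mid <= 0.0:
                    b = mid
                else:
                    a, g_a = mid, g_mid
            roots.append(0.5 * (a + b))
        f_lo, g_lo = f_hi, g_hi
    return roots


if __name__ == "__main__":
    # Areas from the slide (mm^2); the lengths are assumed example values.
    for l1_mm in (40, 85, 130):
        res = two_tube_resonances(l1_mm / 1000.0, (170 - l1_mm) / 1000.0, 100.0, 1000.0)
        print(f"L1 = {l1_mm} mm ->", [round(f) for f in res[:3]], "Hz")
```

Setting A1 equal to A2 reduces the condition to cos(k(L1 + L2)) = 0, the quarter-wavelength resonances of a single uniform tube, which is a convenient sanity check.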


Acoustic-Phonetics Demonstrations G300
Sliding Three-tube (S3T) Model
T. Arai, Acoust. Sci. & Tech., 27(6):384-388, 2006
T. Arai, JASA, 131(3), Pt. 2, 2444-2454, 2012.
• The S3T model is based on the concept of Fant’s three-tube model.
• The short slider slides inside the long tube, simulating a moving constriction by the tongue.


Acoustic-Phonetics Demonstrations G200
Vowel Quality vs. Vocal-tract Configuration
[i] [e] [a] [o] [ɯ] (adapted from Chiba and Kajiyama, 1942)


Acoustic-Phonetics Demonstrations splab.net/APD/

• VTM-N20 (left)
– Precise reproduction of original models based on Chiba & Kajiyama’s measurements and simplification
• VTM-T20 (center)
– Connected-tubes based on further simplification to highlight the aspects of VT shape that account for differences
• VTM-S20 (right)
– Sliding 3-tube model based on the concept of Fant’s 3-tube model (the short slider slides inside the long tube)

Part 2:

International Phonetic Alphabet
Places of articulation (columns): Bilabial, Labio-dental, Dental, Alveolar, Post-alveolar, Palatal, Velar, Uvular, Glottal
Plosive: p b t d k
Nasal: m n
Trill: r
Flap:
Fricative: f v s z h
Approximant: ɹ j w
Lateral Approximant: l

Manner of Articulation (How to articulate)
• When articulators make complete closure
– stops, nasals
– laterals, flaps
– 1-2 ms after the stop release → transient source
• When articulators make a narrow constriction
– narrow: fricatives; constriction lasts for tens of ms → frication source
– wide: approximants


Place of Articulation (Where to articulate)
• bilabial, labio-dental
• alveolar, postalveolar
• palatal
• velar
• uvular

Pattern Playback (by courtesy of Prof. Douglas Whalen)
http://www.haskins.yale.edu/history/Adventure.html


Pattern Playback
(from https://en.wikipedia.org/wiki/Pattern_playback)

Acoustic-Phonetics Demonstrations D800
Web version of DPP
http://www.splab.net/Digital_Pattern_Playback/

You can upload your image file to get a speech file by DPP!


Approximants
• Differences in trajectories of F1-F3
/aja/ /ala/ /ara/ /awa/
(from Lisker, Word, 1957)

DPP (cosine synthesis)
T. Arai et al., Acoust. Sci. Tech., 27(6), 393-395, 2006.
T. Arai et al., Proc. of IEEE ICASSP, 2769-2772, 2012.
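Cosine synthesis can be illustrated in general terms as summing cosine components whose amplitudes are read from a time-frequency pattern. The sketch below is a generic illustration and should not be read as the DPP implementation described in the cited papers: the harmonic spacing (120 Hz), sampling rate (8000 Hz), frame length (10 ms), the pattern layout, and the function name are all assumptions made for this example.

```python
import math


def cosine_synthesis(pattern, f0_hz=120.0, fs_hz=8000, frame_s=0.01):
    """Turn a time-frequency amplitude pattern into a waveform.

    pattern[t][h] is the amplitude of harmonic (h + 1) of f0_hz during frame t.
    All harmonic frequencies are assumed to lie below fs_hz / 2.
    """
    samples_per_frame = int(round(frame_s * fs_hz))
    out = []
    for t, frame in enumerate(pattern):
        for i in range(samples_per_frame):
            n = t * samples_per_frame + i  # global sample index keeps phase continuous
            time_s = n / fs_hz
            out.append(sum(amp * math.cos(2.0 * math.pi * (h + 1) * f0_hz * time_s)
                           for h, amp in enumerate(frame)))
    return out


if __name__ == "__main__":
    # Hypothetical pattern: two frames, three harmonics each.
    pattern = [[1.0, 0.5, 0.0],
               [0.0, 0.5, 1.0]]
    wave = cosine_synthesis(pattern)
    print(len(wave), "samples; first value", wave[0])
```

In an image-based version, each pixel column would supply one frame and each pixel row one harmonic, with darkness mapped to amplitude; that mapping is left out here because the slides do not specify it.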


These days a chicken leg is a rare dish
(Figure panels: PP, DPP; axes: Frequency, Time)

Acoustic-Phonetics Demonstrations splab.net/APD/
O200: From Stop to Approximant
• /bi/-/wi/-/ui/
– Transition time of F1, F2
• From 20 ms through 160 ms (20 ms step)
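The stimulus continuum described for O200 (transition times from 20 ms to 160 ms in 20 ms steps) can be sketched as a set of formant tracks that differ only in how long the initial glide lasts. This is a hypothetical illustration: the formant start and end values, the linear shape of the glide, the frame step, and the function names are assumptions and do not come from the slides.

```python
def transition_durations_ms(start=20, stop=160, step=20):
    """Transition times of the continuum, in milliseconds (stop is inclusive)."""
    return list(range(start, stop + 1, step))


def linear_transition(f_start_hz, f_end_hz, transition_ms, total_ms, frame_ms=5):
    """Formant track: linear glide for `transition_ms`, then steady state."""
    track = []
    for t in range(0, total_ms + 1, frame_ms):
        if t >= transition_ms:
            track.append(float(f_end_hz))
        else:
            track.append(f_start_hz + (f_end_hz - f_start_hz) * t / transition_ms)
    return track


if __name__ == "__main__":
    # Hypothetical F2 glide from 1000 Hz to 2300 Hz.
    for dur in transition_durations_ms():
        track = linear_transition(1000, 2300, dur, total_ms=200)
        print(f"{dur:3d} ms transition: F2 at 20 ms = {track[4]:.0f} Hz")
```

Each track could then drive a synthesizer so that listeners hear how the percept changes along the continuum as only the transition time varies.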

Fricatives
・The source is slightly downstream of the constriction
・An obstacle is often important (lower incisor for /s/)
・The front cavity resonates (but not for bilabials)
– No resonance (no front cavity) for bilabials
(Figure label: obstacle; Fig. 7.4, Johnson, 1997)

Flexible-tongue Model
T. Arai, Acoust. Sci. & Tech., 29(2), 188-190, 2008.
(Figure panels: /i/ /a/ /u/)


Acoustic-Phonetics Demonstrations splab.net/APD/
Flexible-tongue Model
• Advantages: It mimics natural tongue movement and enables us to produce many different sounds and their dynamics
• Disadvantages: One needs practice to master the positioning of the tongue

Arai (INTERSPEECH, 2013)
• A model for lateral and retroflex approximants was developed.
• The flapping tongue enables the front half of the tongue to rotate towards the palate.
• Shortening the length of the tongue produces alveolar/retroflex approximants, and lengthening it produces lateral approximants.

Arai (INTERSPEECH, 2013); Arai (INTERSPEECH, 2014)
• A model for English bunched /r/ was developed.
• This model consists of several blocks in the oral cavity which can be moved up and down.
• We designed dynamic models with a flapping tongue for approximants including English retroflex /r/ and /l/

Arai (ICPhS, 2015); Arai (INTERSPEECH, 2015)

• First, we confirmed the model can produce mid front vowels with high intelligibility
• We further used the model for pronunciation training as a hands-on tool for phonetic education, based on the consideration that tongue and finger movements are related in terms of motor control
– By changing the tongue height of the model, learners can adjust the vowel quality by listening to the output sounds as well as receiving a tactile sensation
• Slight training effects were observed when using the physical model

• Two models:
– 2013 Model / 2014 Model
• We confirmed that
– 1) different configurations of the vocal tract using similar temporal changes yielded similar sounds
– 2) same configuration of the vocal tract with different temporal changes yielded different sounds
(Figure label: 2014 Model)

Arai (INTERSPEECH, 2016)
• New model for /b/, /m/, and /w/

Arai (INTERSPEECH, 2017)
• The 2017 Model produced /br/ cluster


INTERSPEECH 2017
• “Anatomical model”-type
– was presented at the Show & Tell Session
– It looks real & can produce /a/

Presented at the “Science Square” National Museum of Nature and Science (Japan)


Acoustic-Phonetics Demonstrations V100
Now STL files for 3D-Printing are available
• Target types: VTM-N20 & VTM-T20
– Starting in 2017 (for free!)
– Please visit http://splab.net/APD/V100/
– Please read the page for more details


Summary
• Several types of physical models of the human vocal tract were introduced.
• Even though they were originally designed for education in acoustics and speech science, some of the models can also be applied to research and practical applications.
– A model for English /r/ was originally designed to teach how the sound is produced. → It was applied to English vowel practice for non-native speakers.
– Another model for the lateral approximant was originally designed to teach how lateral sounds are produced. → The model was tested to measure differences in sounds radiated from the center and lateral directions because this might help to evaluate misarticulation in a clinical situation.
– A recent model with a movable lower lip and a rotating tongue for the retroflex gesture was applied to simulate an English /br/ cluster. → It is useful because we can test it multiple times with different timings of the articulators and observe each articulator's movement visibly.
• Thus, the vocal-tract models have the potential to contribute to many applications.

Acknowledgements
• JSPS KAKENHI Grant Number 18K02988

Thank you very much!
