Real-Time MRI of Speaking at a Resolution of 33 Ms: Undersampled Radial FLASH with Nonlinear Inverse Reconstruction
Total Page:16
File Type:pdf, Size:1020Kb
Magnetic Resonance in Medicine 69:477–485 (2013) Real-Time MRI of Speaking at a Resolution of 33 ms: Undersampled Radial FLASH with Nonlinear Inverse Reconstruction Aaron Niebergall,1 Shuo Zhang,1* Esther Kunay,2 Gotz€ Keydana,2 Michael Job,2 Martin Uecker,1 and Jens Frahm1 Dynamic MRI studies of the upper airway during speaking, Although MRI promises noninvasive examinations as singing or swallowing are complicated by the need for high well as more flexible and detailed insights into the vocal temporal resolution and the presence of air-tissue interfaces tract, insufficient temporal resolution, and limited image that may give rise to image artifacts such as signal void and quality still emerge as two major obstacles for studying geometric distortions. This work exploits a recently developed speech production. For example, because long image ac- real-time MRI technique to address these challenges for moni- quisition times require either repetitively generated (2,3) toring speech production at 3 T. The method combines a or continuant (4) speech sounds, such strategies preclude short-echo time radial FLASH MRI sequence (pulse repetition a proper visualization of the lips and tongue for plosive time/echo time 5 2.22/1.44 ms; flip angle 5°) with pronounced consonants that are characterized by rapid movements undersampling (15 radial spokes per image) and image recon- struction by regularized nonlinear inversion. The resulting serial within 50–100 ms (5). In addition, motion-induced image images at 1.5 mm in-plane resolution and 33.3 ms acquisition blurring may result from data sharing when using long time are free of motion or susceptibility artifacts. This applica- acquisitions with sliding-window reconstructions to tion focuses on a dynamic visualization of the main articulators achieve high frame rates (6,7). It has therefore been con- during natural speech production (Standard Modern German). cluded that the true temporal resolution for MRI should Respective real-time MRI movies at 30 frames per second be comparable to that of standard videofluoroscopy, clearly demonstrate the spatiotemporal coordination of lips, ideally no longer than 40 ms per image (8). Other prob- tongue, velum, and larynx for generating vowels, consonants, lems are the occurrence of focal signal losses and geo- and coarticulations. The quantitative results for individual metric distortions that depend on the sensitivity of a phonetic events are in agreement with previous non-MRI find- chosen MRI acquisition technique to the unavoidable ings. Magn Reson Med 69:477–485, 2013. VC 2012 Wiley Periodicals, Inc. susceptibility differences near air-tissue interfaces in the upper airway (9,10). Key words: real-time MRI; speaking; articulation; tongue Most recently, we have described a real-time MRI tech- dynamics; speech production; phonetics nique using highly undersampled radial FLASH with image reconstruction by regularized nonlinear inversion (11–13). Preliminary applications to cardiovascular func- tion (14), quantitative blood flow (15), and normal swal- Since the earliest studies of human articulation in the late lowing (16) allow for serial real-time images (i.e., mov- 18th century, methods and applications have continu- ies) with 1.5 to 2.0 mm in-plane resolution and 20–30 ously been expanded. Nowadays, articulatory processes ms acquisition time. Accordingly, the purpose of this during speaking may be analyzed by a wide range of tech- study was to exploit this real-time MRI method to niques including laryngoscopy and laryngography, elec- address the challenges posed by dynamic MRI of speak- tropalatography, electromyography, electromagnetic artic- ing and to investigate the key articulatory configurations ulography, or imaging based on X-ray videofluoroscopy or during natural speech production including vowels, con- sonography (1). However, while former approaches sonants, and coarticulation effects in vowels preceded by require an invasive procedure, which eventually disturbs consonants. the natural articulation of vowels and consonants, video- fluoroscopy involves the exposure to radiation and sonog- raphy is restricted to specific image orientations. METHODS Subjects 1Biomedizinische NMR Forschungs GmbH am Max-Planck-Institut fu¨r biophysikalische Chemie, Gottingen,€ Germany. Twelve subjects (6 men and 6 women; age 28 6 8 years 2Sprachwissenschaftliches Seminar, Georg-August-Universitat,€ Gottingen,€ [mean 6 standard deviation]; range 23–54 years) with no Germany. known illness were recruited from the local University. Additional Supporting Information may be found in the online version of this article. All subjects gave written informed consent before each *Correspondence to: Shuo Zhang, Biomedizinische NMR Forschungs MRI examination and were paid for their participation in GmbH am Max-Planck-Institut fu¨ r biophysikalische Chemie, 37070 this study. Gottingen,€ Germany. E-mail: [email protected] Received 22 December 2011; revised 14 February 2012; accepted 9 March 2012. Magnetic Resonance Imaging DOI 10.1002/mrm.24276 Published online 12 April 2012 in Wiley Online Library (wileyonlinelibrary. All studies were conducted at 3 T using a commercial com). MRI system (TIM Trio, Siemens Healthcare, Erlangen, VC 2012 Wiley Periodicals, Inc. 477 478 Niebergall et al. FIG. 1. a: Setup for real-time MRI of speaking. The arrangement combines a flexible 4-channel receiver coil covering the lower face with a bilateral 2 4 array coil centered to the thyroid prominence and further involves an MR-compatible optical microphone and pro- jection system (screen and mirror). b: T1-weighted scout image (radial FLASH, pulse repetition time/echo time 2.22/1.44 ms, flip angle 5 , and 1.5 1.5 10 mm3) used for defining a midsagittal and coronal (solid line) plane for real-time MRI. c:¼ Reference lines for deriv-   ing spatiotemporal patterns for the lip aperture (LA), tongue tip constrict degree (TTCD), tongue body constrict degree (TBCD), and ve- lum aperture (VEL) during speaking (compare Figs. 6, 8, and 9). Germany) and a body coil for radiofrequency (RF) excita- ported into the MRI system’s database. Typically, most tion. Subjects were examined in a supine position and reconstructions of a speaking study were already avail- MRI signals were acquired by combining a small flexible able at the end of a 20 min in-room time. 4-channel receiver coil covering the lower face (Siemens Healthcare, Erlangen, Germany) with a bilateral 2 4  Speech Tasks and Acoustic Recording array coil centered to the thyroid prominence on both sides of the neck (NORAS MRI products, Hoechberg, Table 1 summarizes four sets of speech tasks. The first Germany) as shown in Fig. 1a. set of seven vowels in German was employed to study Successive image acquisitions during speech produc- tongue gestures at vowel production under different lin- tion relied on a highly undersampled RF-spoiled radial guistic conditions, i.e., isolated segments as well as seg- FLASH MRI sequence with image reconstruction by ments in words and sentences, which vary with respect regularized nonlinear inversion as described (13). The to coarticulation, intonation, and speech rate. The sec- imaging parameters were: repetition time pulse repeti- ond set served to visualize consonant production using tion time 2.22 ms, echo time echo time 1.44 ms, flip meaningless logatom words with a consonant-vowel-con- ¼ ¼2 angle 5, field of view FOV 192 192 mm , in-plane re- sonant-vowel structure. To characterize the main articu- 2  solution 1.5 1.5 mm , and section thickness 10 mm. lators without putative coarticulation effects, the first  Individual images were obtained from a single set of 15 vowel was a long [a:], while the second was a short [a] spokes, which resulted in a temporal resolution of 33.3 in all cases. The third experiment involved real words ms or 30 fps. with or without a nasal consonant ([n], [m], or [n]) Prior to dynamic MRI, scout images (FOV 256 256 between vowels (VCV structure) to identify coarticula- 2  mm ) were obtained in the mid-sagittal plane using the tion effects, i.e., the overlap of articulatory gestures such same FLASH sequence but with full radial sampling and as the timing of velum activity. The final set of logatom conventional gridding reconstruction. An example is words was acquired in a coronal plane as the optimal shown in Fig. 1b. Speech production was then studied orientation for studying larynx motion (18). In general, by multiple real-time MRI movies in mid-sagittal and individual MRI recordings lasted for up to 88 s, while coronal orientations. The midsagittal plane covered the the total examination time was about 15 min per subject. entire vocal tract from the lips, tongue, and nasopharynx All subjects were native speakers of Standard Modern to larynx as well as the upper airway including vocal German (Neuhochdeutsch) and instructed to speak at a folds. The coronal plane was located parallel and central natural rate. For producing isolated segments, they were to the trachea as indicated in Fig. 1b. asked to repeat the vowel of the preceding word. The During data acquisition, online image control was speech tasks were generated on a computer and pro- ensured by sliding-window gridding reconstructions of jected into the MRI magnet using a setup (video projec- 75 spokes obtained by combining five consecutive data tor, lens, screen, and mirror) developed for MRI studies sets each comprising 15 spokes at complementary posi- of human brain function (Fig. 1a). To allow for acoustic tions (17). At the same time, the incoming data are auto- recordings, subjects were equipped with an MR-compati- matically exported for immediate offline reconstruction ble optical microphone with integrated software for by regularized nonlinear inversion to a computer adaptive cancellation of the gradient noise (Dual Chan- equipped with 8 GTX580 graphical processing units nel-FOMRI, Optoacoustics, or Yehuda, Israel). Its two each providing 512 processing cores (Nvidia, CA). Once channels were positioned central to the lips in a unidir- the offline calculation is completed, the images are reim- ectional and omnidirectional orientation.