Speakers are able to categorize vowels based on tongue somatosensation

Jean-François Patri (a,b), David J. Ostry (c,d), Julien Diard (e), Jean-Luc Schwartz (a), Pamela Trudeau-Fisette (f), Christophe Savariaux (a), and Pascal Perrier (a,1)

a GIPSA-lab, UMR 5216, Univ. Grenoble Alpes, CNRS, Grenoble INP, F-38000 Grenoble, France; b Cognition, Motion and Neuroscience Unit, Fondazione Istituto Italiano di Tecnologia, 16152 Genova, Italy; c Department of Psychology, McGill University, Montreal, QC H3A 1G1, Canada; d Haskins Laboratories, New Haven, CT 06511; e Laboratoire de Psychologie et de Neurocognition (LPNC), UMR 5105, Univ. Grenoble Alpes, CNRS, F-38000 Grenoble, France; f Laboratoire de Phonétique, Center for Research on Brain, Language, and Music, Université du Québec à Montréal, Montreal, QC H2X 1L7, Canada

Edited by John F. Houde, University of California, San Francisco, CA, and accepted by Editorial Board Member Renée Baillargeon January 27, 2020 (received for review June 28, 2019). First published March 2, 2020. PNAS, March 17, 2020, vol. 117, no. 11, 6255–6263. www.pnas.org/cgi/doi/10.1073/pnas.1911142117

Auditory speech perception enables listeners to access phonological categories from speech sounds. During speech production and speech motor learning, speakers' experience matches auditory and somatosensory input. Accordingly, somatosensory input might also provide access to phonetic units. The present study assessed whether humans can identify vowels using somatosensory feedback, without auditory information. A tongue-positioning task was used in which participants were required to achieve different tongue postures within the /e, ε, a/ articulatory range, in a procedure that was totally nonspeech like, involving distorted visual feedback of tongue shape. Tongue postures were measured using electromagnetic articulography. At the end of each tongue-positioning trial, subjects were required to whisper the corresponding vocal tract configuration with masked auditory feedback and to identify the vowel associated with the reached tongue posture. Masked auditory feedback ensured that vowel categorization was based on somatosensory rather than auditory feedback. A separate group of subjects was required to auditorily classify the whispered sounds. In addition, we modeled the link between vowel categories and tongue postures in normal speech production with a Bayesian classifier based on the tongue postures recorded from the same speakers for several repetitions of the /e, ε, a/ vowels during a separate speech production task. Overall, our results indicate that vowel categorization is possible with somatosensory feedback alone, with an accuracy that is similar to the accuracy of the auditory categorization of whispered sounds, and in congruence with normal speech articulation, as accounted for by the Bayesian classifier.

somatosensory feedback | categorical perception

Significance

Auditory speech perception is known to categorize continuous acoustic inputs into discrete phonetic units, and this property is generally considered to be crucial for the preservation of the phonological classes associated with the acquisition of a native language. Somatosensory input associated with speech production has also been shown to influence speech perception by inducing shifts in category boundaries when the acoustic signal is unclear. Using a specifically designed experimental paradigm, we demonstrate that in the absence of auditory feedback, speakers were able to identify phonological categories on the basis of somatosensory feedback on its own. This finding indicates that both the auditory and the orosensory correlates of articulation could jointly contribute to the emergence, maintenance, and evolution of the phonological categories of speech.

Author contributions: J.-F.P., D.J.O., J.D., J.-L.S., and P.P. designed research; J.-F.P., P.T.-F., C.S., and P.P. performed research; J.-F.P. and P.P. contributed new reagents/analytic tools; J.-F.P., D.J.O., J.D., J.-L.S., and P.P. analyzed data; and J.-F.P., D.J.O., J.D., J.-L.S., and P.P. wrote the paper. The authors declare no competing interest. This article is a PNAS Direct Submission. J.F.H. is a guest editor invited by the Editorial Board. Published under the PNAS license. 1 To whom correspondence may be addressed. Email: [email protected]. This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1911142117/-/DCSupplemental.

Producing speech requires precise control of the vocal tract articulators in order to perform the specific movements that give rise to sounds. The sensory correlates of speech are therefore both auditory (related to the spectrotemporal characteristics of sounds) and somatosensory (associated with the position and shape of the articulators or with contacts between the articulators and the vocal tract boundaries). While the propagation of sounds is the means through which linguistic information passes between speakers and listeners, most recent models of speech motor control [DIVA (1), HSFC (2), Bayesian GEPPETO (3), FACTS (4), and ACT (5)] posit that both auditory and somatosensory information is used for the planning, monitoring, and correction of speech movements. The crucial role of auditory information in speech production has been documented in experiments using bite blocks or lip tubes (6, 7), in which articulation has been shown to be reorganized in order to preserve the acoustical characteristics of speech. The importance of somatosensory information in speech production has been shown in studies in which external perturbations of jaw movement induced compensatory reactions (8, 9). A study of cochlear implanted patients, who switched their implants on and off (10), has provided evidence that a combination of both sensory inputs results in greater accuracy in speech production.

Auditory speech perception involves the categorization of speech sounds into discrete phonetic units (11, 12), and neural correlates of phonetic representations have been found in the superior temporal gyrus (13). However, it is unknown whether somatosensory information in speech can also be categorized in a similar and coherent way. In nonspeech tasks, there is extensive evidence of the use of somatosensory information for categorization as, for example, in tactile object recognition (14), but no study so far has addressed whether speakers are able to identify phonemes based on somatosensory information alone.

In the present study, we provide evidence that phonological representations can be accessed from somatosensory information alone. Our results indicate that participants are able to recognize vowels based on tongue postures, without any contribution of auditory feedback. This finding has required the design of a paradigm adapted to the specificities of the tongue. Unlike studies of limb somatosensory processing, in which tests can be conducted by using a robotic device to passively displace the limb, provoking passive displacement of

the tongue is very challenging: the tongue is difficult to access inside the oral cavity and highly resistant to displacement. In our paradigm, speakers were instead required to position their tongue using visual feedback in a task that provided no information about the shape and the location of the tongue. We were able to show that subjects succeed in this positioning task, although some speakers are more successful than others. We then tested whether, once they had positioned their tongue, speakers were able to provide a somatosensory categorization of the vowel associated with the reached tongue position. At each reached tongue posture, subjects were asked to whisper in order to enable a subsequent independent auditory evaluation of their articulation by a group of external listeners. The auditory feedback of the speakers was masked by noise in order to ensure that categorization was based on somatosensation only.

We found that speakers were able to identify vowels based on tongue somatosensory information, and there was a good concordance with listeners' judgements of the corresponding sounds. Finally, it is shown that subjects' somatosensory classification of vowels was close to the classification provided by a Bayesian classifier constructed separately from subjects' vowel articulations recorded under normal speech conditions, i.e., with phonation and auditory feedback. These results suggest that phonological categories are specified not only in auditory terms but also in somatosensory ones. The results support the idea that in the sensory–motor representation of phonetic units, somatosensory feedback plays a role similar to that of the auditory feedback.

Results

In order to assess whether vocal tract somatosensory information can be used for phonetic categorization, we first needed a means to instruct subjects to achieve a set of tongue postures without relying on normal speech movement or auditory feedback that might provide nonsomatic cues. We designed a tongue-positioning task using electromagnetic articulography (EMA) to visually guide eight subjects (henceforth somatosensory subjects) toward different target tongue postures, evenly spanning the articulatory range of the three vowels /e, ε, and a/ (Fig. 1B). Nine target tongue postures were specified for each subject, including three vowel tongue postures corresponding to normal productions of vowels /e, ε, and a/, and six intermediate tongue postures distributed regularly over the same tongue vowel workspace. Vowel tongue postures were recorded with EMA for each subject during a preliminary speech production task involving repetitions of vowels under normal speech conditions (Materials and Methods). Fig. 1 shows the placement of the EMA sensors and illustrates the set of nine target tongue postures for one representative subject (the sets of targets for other subjects are presented in SI Appendix, Fig. S3).

Fig. 1. (A) Sagittal view of EMA sensors (black dots) and (B) example of target tongue postures from one subject. Targets include three vowel tongue postures (black lines) and six additional intermediate tongue postures (gray dashed lines) distributed regularly over the /e–ε–a/ tongue workspace. The solid line on the top of B represents the subject's palate trace.

On each trial of the tongue-positioning task (Fig. 2A), one of the nine target tongue postures was displayed on a screen under a spatial transformation intended to remove visual information about tongue position and shape (Fig. 2B). Target positions were always displayed in the same horizontally aligned configuration at the center of the screen, ensuring that targets in all trials looked the same (red circles in Fig. 2A, Bottom). Subjects were provided with real-time visual feedback of their tongue movements according to the spatial transformation shown in Fig. 2B and were instructed to 1) move their tongue in order to reach the displayed target within a 5-s time interval (reaching phase), 2) hold the reached tongue posture and whisper with auditory feedback masked by noise (whispering task), and 3) identify the vowel associated with the reached tongue posture (somatosensory identification task).

Importantly, the target tongue postures used in the tongue-positioning task were chosen in order to sample the articulatory workspace from /e/ to /a/ via /ε/ as comprehensively and as evenly as possible, in order to obtain a set of well-distributed articulatory configurations for the primary aim of the study, which is to evaluate the use of somatosensory information in the identification of these vowels. Thus, for the tongue-positioning task to be carried out correctly, it was not required that the set of tongue postures which subjects produced at the end of the reaching phase exactly matched the displayed target tongue postures but rather that overall, the set of tongue configurations uniformly sampled the articulatory workspace for the vowels that were studied here.

The sounds that were whispered by the eight somatosensory subjects were identified by eight additional listeners (henceforth auditory subjects) in a forced-choice (/e/, /ε/, or /a/) auditory identification task. This perceptual evaluation is crucial since it is the only way to assess whether or not the tongue position described by the EMA sensors corresponded to a vowel and whether or not it was associated with clear auditory characteristics in the /e/–/ε/–/a/ vowel range. However, it is also crucial that the whispering phase did not influence the somatosensory identification performed by somatosensory subjects, by providing either auditory or motor cues that might have helped them in their identification task. In regard to possible auditory cues, we have taken a number of precautions in order to minimize the likelihood that subjects could hear themselves during the whispering phase with the auditory feedback masked by noise. To check the effectiveness of this approach, we asked subjects to report whether or not they could hear themselves whisper, and no subject responded that he/she could. We have also evaluated the acoustic power of the whispers and found them to be more than 40 dB below the masking noise, which has been previously shown to make impossible the auditory perception of vowels in French (15) (see SI Appendix for details). In regard to possible motor control cues, as is commonly observed in postural control, subjects did not strictly maintain a stable tongue posture during the whispering phase (see SI Appendix, Fig. S1, for a quantification of this articulatory variability). To check the possibility that these small tongue movements could have provided helpful motor cues for the somatosensory identification task, we carried out a number of analyses of these articulatory variations, which are described in SI Appendix. We found no indication supporting such a possibility. In particular, we found no evidence suggesting that these small movements would be directed toward vowel tongue postures. These observations indicate that the somatosensory identification task was not biased by auditory or motor cues during whispering.

The analysis of the data was divided into two main parts. The first part was devoted to the evaluation of the tongue-positioning task. This first part is necessary for the second part of the analysis since one cannot expect subjects that did not perform the tongue-positioning task correctly to succeed in the somatosensory identification task.

More specifically, we assessed whether the reached tongue postures 1) varied smoothly and continuously over the whole /e, ε, and a/ articulatory range and 2) corresponded to sounds that can be reliably identified auditorily as a vowel in the /e, ε, a/ range. The reached tongue postures meeting these evaluation criteria will henceforth be referred to as vowel-like tongue postures.

In the second part of the study we assessed performance in the somatosensory identification task. This had three main parts. First, we evaluated the separability of the three vowel categories obtained by the somatosensory categorization. Second, we compared the somatosensory categorization with the categorization provided by the auditory subjects who evaluated the whispered sounds. Finally, we compared the outcome of the somatosensory categorization with a Bayesian classifier that relied on the tongue postures recorded during normal speech movements.

Fig. 2. (A) Design of each trial of the tongue-positioning, whispering, and somatosensory identification tasks. (Top) The real positions of the sensors (sagittal view) corresponding to the target tongue posture (black circles and lines) and to the position of the subject's lip sensors (red circles and lines), as displayed to the subjects. The lip sensors were intended to help subjects keep their lip position constant during the task. (Bottom) Illustration of the spatial transformation used for the visual display in the tongue-positioning and whispering part of the task. The target tongue postures were displayed in such a way that all segments become equal in length and horizontally aligned (Bottom Right, black circles and lines), and the actual tongue posture is then transformed in the modified display in a way that preserves actual tongue shape. (B) The three-alternative forced-choice display of the somatosensory identification task.

Stage 1: Evaluation of the Articulatory Tongue-Positioning Task. Fig. 3 shows, for each of the participants in the study, the average tongue configurations measured for /e, ε, and a/ during normal speech production, superimposed on the set of tongue postures reached in the tongue-positioning task. It can be seen (Fig. 3, Top, left-hand side of each panel) that for subjects S3, S6, S7, and S8 the set of reached tongue postures uniformly covers the /e, ε, and a/ range, whereas for the remaining subjects (S1, S2, S4, and S5; Fig. 3, Bottom, left-hand side of each panel) there were noticeable deviations from the average vowel postures. In order to quantitatively assess this observation, we first evaluated whether the set of reached tongue postures actually covered the expected /e, ε, and a/ range of tongue configurations. We also quantitatively assessed whether these tongue postures were uniformly distributed over the range and direction associated with the set of target tongue postures. To do so, we conducted two separate principal component analyses (PCAs) for each subject, one using their set of target tongue postures and the other using their set of reached tongue postures. Details about this analysis are provided in SI Appendix. We summarize the main results below.

Fig. 3. (A) Distribution of the set of 90 tongue postures reached by each somatosensory subject across trials in the tongue-positioning task. For each subject, the set of reached tongue postures (gray lines) obtained in the tongue-positioning task is represented in the main panel on the left-hand side, together with the average /e, ε, and a/ tongue postures (black lines) obtained from the speech production task (Materials and Methods). The right-hand side of each panel presents the distribution of the reached tongue postures along the main direction (represented vertically), as determined by a PCA carried out on the set of target tongue postures. (B) Clustering of the eight subjects in two classes (elliptical contours drawn by hand), based on the proportion of variance accounted for by the first two principal components of the reached tongue postures.

The PCA on the nine target tongue postures showed that the target tongue posture workspace was well described by a single dimension for all subjects but S2. The second PCA showed that the reached tongue postures were also well represented by a single dimension for subjects S3, S6, S7, and S8, and for these subjects there was a good match between the single direction characterizing the target tongue postures and the one characterizing the reached tongue postures (Fig. 3B). We also tested whether or not there was a uniform coverage of the workspace defined by the target tongue postures. To do so, we estimated the densities of the reached tongue postures along the first principal component of the target postures. In each panel of Fig. 3A, the right side shows the resulting distribution of the reached tongue postures over the target tongue range. For all subjects apart from S1 and S5, we observe quite uniform distributions.
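A uniformity check of this kind can be sketched in Python as follows. This is an illustrative reconstruction under assumed data, not the paper's actual pipeline (which is detailed in SI Appendix): the posture coordinates are simulated, and the rescaling of the projections to [0, 1] is an assumption. The idea is to project the reached postures onto the first principal component of the target postures and compare the projections against a uniform distribution with a one-sample Kolmogorov–Smirnov test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical data: 9 target postures and 90 reached postures, each coded
# by the (x, y) coordinates of 4 EMA sensors flattened into 8 values.
targets = rng.normal(size=(9, 8))
reached = targets[rng.integers(0, 9, size=90)] + 0.1 * rng.normal(size=(90, 8))

# First principal component of the *target* postures.
centered = targets - targets.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = vt[0]

# Project the reached postures on that axis and rescale to [0, 1].
scores = (reached - targets.mean(axis=0)) @ pc1
scores = (scores - scores.min()) / (scores.max() - scores.min())

# One-sample Kolmogorov-Smirnov test against the uniform distribution:
# a small statistic / large p-value is consistent with uniform coverage.
statistic, p_value = stats.kstest(scores, "uniform")
print(statistic, p_value)
```

A subject whose reached postures pile up near a few targets would yield a large KS statistic and a small p-value, flagging nonuniform coverage of the workspace.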

This observation was confirmed by a statistical Kolmogorov–Smirnov test.

Auditory analysis of the adequacy of the reached tongue postures. In order to investigate the adequacy of the reached tongue postures for purposes of somatosensory vowel identification, we evaluated whether the whispered sounds associated with these postures were compatible with vowel production. To do so, subjects' whispered productions were classified by a separate group of listeners (auditory subjects). First, we analyzed the consistency of the responses provided by auditory subjects in the auditory identification task, which we call auditory answers henceforth. Second, we inferred from the set of auditory answers given for each whispered sound a canonical auditory label for this sound (see Materials and Methods for details) and assessed whether these auditory labels were associated with well-separated clusters of tongue postures.

Consistency of Auditory Answers Across Auditory Subjects. We assessed the consistency of the auditory classification of each reached tongue posture recorded during the tongue-positioning task by computing the entropy of the distribution of auditory answers attributed to each of the whispered utterances (see SI Appendix for details). We expected that whispers which sound like whispered vowels would be labeled consistently by auditory subjects and would therefore result in distributions of auditory answers with low entropy. On the other hand, we expected that whispers that would not sound like whispered vowels would be labeled with greater uncertainty, resulting in distributions of auditory answers with greater entropy (close to uniformly random answers). Fig. 4 presents violin plots of the distribution of entropies of auditory answers for each whispered sound produced by each of the somatosensory subjects. From this figure it can be seen that the whispers of all but two somatosensory subjects (S4 and S5) have low average entropy, with violin plots being larger at the base than at the top. This means that most of the whispers produced by the somatosensory subjects were classified in a consistent way across auditory subjects. Statistical analysis confirmed that the distribution of entropy differs across subjects (Kruskal–Wallis χ² = 123.95, P < 0.001). Pairwise comparisons revealed that the entropy distributions for subjects S4 and S5 were significantly greater than for the others (P < 0.01 with Wilcoxon test and Benjamini and Hochberg correction method for multiple comparisons). This indicates that the whispers produced by these two subjects conveyed little information about vowel identity, presumably because the associated reached tongue postures did not correspond to one of the sounds /e, ε, and a/ that were under test.

Fig. 4. Distribution of entropy of auditory answers for each whisper produced by each somatosensory subject (violin plots). Small entropy values correspond to whispers with low uncertainty about vowel identity; in particular, zero entropy values correspond to whispers that were given the same label by all auditory subjects. Data points correspond to the different whispers performed across trials by the somatosensory subjects during the positioning task. They are distributed over 10 entropy values, which correspond to the possible entropy values for three vowel labels and eight auditory answers (see SI Appendix for details). The dashed line indicates the theoretical maximum uncertainty (entropy of 1.1) of auditory labeling. Gray colored bars represent the average entropy of the auditory answers for each somatosensory subject.

Consistency and Separability of the Clusters of Reached Tongue Postures Associated with the Three Auditory Labels. In normal speech production there is substantial similarity of tongue posture across repetitions of the same vowel (consistency across repetitions), and these tongue postures can also be reliably distinguished from the tongue postures associated with repetitions of another vowel (separability across vowels). In order to check whether the tongue-positioning task reproduced these speech characteristics, we asked whether or not the set of tongue postures associated, for example, with sounds that listeners judged as the vowel /a/ was different from the set of tongue postures associated with sounds that listeners judged as /ε/. We assessed the auditory labels assigned by the set of auditory subjects by evaluating the consistency and the separability of the grouping of tongue postures made on the basis of the auditory labels. We expected that whispers that carry relevant information for vowel classification should be associated with articulatory characteristics that are 1) consistent within each category and 2) different enough across categories to preserve their distinctiveness. Hence, if the four EMA sensors are good descriptors of the posture of the tongue, the auditory labels for the whispers that sound like one of the /e, ε, and a/ vowels should be associated with quite compact and well-separated clusters of reached tongue postures. In the case of whispers that do not sound like one of these vowels, the clusters of reached tongue postures should be wide (more variable) and largely overlap one another.

Silhouette scores provide a measure of clustering quality in terms of consistency and separability, by comparing how well each data point belongs to its own cluster as compared to its neighboring clusters (16). Our silhouette analysis assigns to each tongue posture a value in the [−1, 1] range, with positive values corresponding to data that are well inside their cluster, values close to 0 corresponding to data that are close to the boundaries of their cluster, and negative values corresponding to data that are closer to a neighboring cluster than to their own cluster (see SI Appendix for details). The gray bars in Fig. 5, Middle, show for each somatosensory subject the average silhouette score of clusters of reached tongue postures based on auditory labels. We call these scores auditory clustering scores. They are arranged in Fig. 5 in descending order from left to right. Fig. 5, Middle, also shows average silhouette scores of clusters of reached tongue postures based on somatosensory labels (see below for an analysis of somatosensory clustering). Examples of tongue clusters associated with high and low average silhouette scores are shown in the left and right upper panels in Fig. 5, respectively (figures corresponding to other subjects are presented in SI Appendix, Fig. S5). It can be seen that for a subject with high silhouette scores (S6; Fig. 5, Left), the clusters of tongue postures are well separated, with moderate overlap between auditory vowel categories. In contrast, the clusters of a subject with low silhouette scores (S5; Fig. 5, Right) show strong overlap. Furthermore, for well-separated clusters, the tongue postures associated with the auditory labels /e/, /ε/, and /a/ are correctly distributed from high to low around the tongue postures characteristic of each vowel (black lines in Fig. 5, Left).

In order to assess the significance of the auditory clustering scores, as compared to the null hypothesis that tongue postures were labeled at random, we performed a nonparametric randomization test.
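A randomization test of this kind can be sketched as follows. This is an illustrative Python reconstruction under assumed data, not the paper's exact procedure (the cluster geometry, sample sizes, and number of permutations are all assumptions): the observed mean silhouette score of the labeled postures is compared with the distribution of scores obtained after randomly permuting the labels.

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
# Hypothetical 2D posture features (e.g., PCA scores) for 90 trials,
# 30 per auditory label, with well-separated clusters.
postures = np.vstack([rng.normal(loc=(c, 0.0), scale=0.3, size=(30, 2))
                      for c in (0.0, 1.5, 3.0)])
labels = np.repeat(["e", "eh", "a"], 30)  # "eh" stands in for the vowel /ε/

observed = silhouette_score(postures, labels)

# Null distribution: mean silhouette under random relabeling of the postures.
null = np.array([silhouette_score(postures, rng.permutation(labels))
                 for _ in range(1000)])
p_value = (1 + np.sum(null >= observed)) / (1 + len(null))
print(observed, p_value)
```

With well-separated clusters the observed score falls far in the upper tail of the null distribution, so the permutation p-value is small; with heavily overlapping clusters (as for subjects S4 and S5) the observed score falls inside the null distribution and the test does not reject chance labeling.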

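For reference, the per-whisper consistency measure underlying Fig. 4 can be sketched as follows (a minimal reconstruction; the function name and the example counts are hypothetical, and the paper's exact computation is given in SI Appendix). With natural logarithms, the maximum entropy for three labels is ln 3 ≈ 1.1, which matches the dashed line of Fig. 4.

```python
import numpy as np

def answer_entropy(counts):
    """Entropy (natural log) of the distribution of auditory answers
    given by the eight listeners to one whispered sound."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return float(np.sum(p * np.log(1.0 / p)))

print(answer_entropy([8, 0, 0]))  # unanimous labeling -> 0.0
print(answer_entropy([3, 3, 2]))  # split labeling, close to the ln(3) maximum
```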
6258 | www.pnas.org/cgi/doi/10.1073/pnas.1911142117 Patri et al. Downloaded by guest on September 29, 2021 Downloaded by guest on September 29, 2021 3 6 7 n 8 h eut bandfrteohrsub- other the for subjects obtained 3–5): al. et results Patri (Figs. The analysis S8. our and of S7, 1 S6, stage postures, S3, by tongue evidenced task vowel-like tongue-positioning on reached as the satisfactorily focus in they well will because performed the we who of follows, subjects evaluation tongue the which the conventional In identification, of sounds. somatosensory /e, these production the with the of associated in postures coverage tongue- and of the workspace terms perform task in to both ability task, their positioning in subjects across ferences Was 3) other? articulatory the with to consistent patterns? category identification vowel vowel one somatosensory pos- from tongue separated regions categorization reached well specific with each providing among consistent with identification, identification auditory vowel vowel associated somatosensory was Was which 2) /a/ ture? identify to and able to /ε/, subject /e/, feedback somatosensory somatosensory each three Was use addresses 1) analysis questions: The Identi- specific feedback. Somatosensory somatosensory their the on in Performance Task. of fication Evaluation 2: Stage violin the in 4. entropy Fig. high of with plots points those data also fewest are the that scores present fact clustering who between the highest the by relationship with supported subjects further three the the is sounds of and interpretation postures with the tongue associated This that typically reliable vowels. posture and are tongue these positions /a/ whole sensor and the EMA of /ε/, indicators the /e/, in vowels observed the regularities of that productions with compatible indicate typical postures tongue observed, low reached whom regularly also subjects for was these 4). S8), classification and (Fig. 
We used a nonparametric randomization test (17) and found that auditory clustering scores were significantly different from chance (P < 0.01 with Holm–Bonferroni correction for multiple comparisons) for all subjects except S4 and S5. It can be seen that these two subjects also have the highest values of entropy of the auditory answers (Fig. 4). This is consistent with our hypothesis, stated above, that their whispers conveyed little relevant information for vowel identification by listeners because their reached tongue postures correspond poorly to configurations compatible with vowel production. Two other somatosensory subjects (S1 and S2) had clustering scores clearly lower than those of the remaining four subjects, despite the fact that their whispers seemed to convey some relevant information for vowel identification, as indicated by the low entropy of their auditory answers. In contrast, the four remaining somatosensory subjects (S3, S6, S7, and S8) had high auditory clustering scores together with low entropy of the auditory answers.

Fig. 5. Clustering of the reached tongue postures associated with auditory and somatosensory labels. (Middle) Bars show the auditory (light gray) and somatosensory (black) clustering scores (average silhouette values) obtained for each subject, arranged in descending order of auditory clustering score (from left to right). Auditory clustering scores are significantly different from chance (P < 0.01) for all subjects except S4 and S5; somatosensory clustering scores are significantly different from chance (P < 0.01 with Holm–Bonferroni correction) for all subjects. (Left and Right) Clusters of reached tongue postures associated to each vowel category for a representative subject with (Left) good (S6) and (Right) poor (S5) clustering scores, for [Top] auditory labels and [Bottom] somatosensory labels. Black lines correspond to the reference tongue postures for the three vowels (upper, middle, and bottom lines for /e/, /ε/, and /a/, respectively), displayed in order to distinguish them from the other reached tongue postures (indicated in gray).
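The clustering scores in Fig. 5 are average silhouette values (ref. 16). A minimal pure-Python sketch of the mean silhouette over 2D sensor positions follows (our own illustration; the paper's actual computation is described in SI Appendix):

```python
import math

def mean_silhouette(points, labels):
    """Average silhouette over all points (Rousseeuw, ref. 16).

    points: list of (x, y) sensor coordinates; labels: vowel label per point.
    s(i) = (b - a) / max(a, b), with a the mean distance to the point's own
    cluster and b the smallest mean distance to any other cluster.
    """
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        others = {}
        for q, l in zip(points, labels):
            if l != lab:
                others.setdefault(l, []).append(dist(p, q))
        if not same or not others:
            continue  # silhouette is undefined for singleton clusters
        a = sum(same) / len(same)
        b = min(sum(d) / len(d) for d in others.values())
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two tight, well-separated clusters yield a score close to 1.
pts = [(0, 0), (0.1, 0), (5, 5), (5.1, 5)]
labs = ["e", "e", "a", "a"]
```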

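The chance-level assessment can be sketched as a label-shuffling permutation test (the paper reports 10^4 random permutations; the score function below is a deliberately simplified 1D stand-in for the silhouette-based clustering score, and all names are ours):

```python
import random

def clustering_score(points, labels):
    """Toy stand-in for the clustering score: negative mean within-label
    variance of 1D points (higher means tighter clusters)."""
    total, count = 0.0, 0
    for lab in set(labels):
        vals = [p for p, l in zip(points, labels) if l == lab]
        m = sum(vals) / len(vals)
        total += sum((v - m) ** 2 for v in vals)
        count += len(vals)
    return -total / count

def permutation_p(points, labels, n_perm=10_000, seed=0):
    """One-sided P value: fraction of label shuffles that score at least as
    high as the observed labeling (add-one correction to avoid P = 0)."""
    rng = random.Random(seed)
    observed = clustering_score(points, labels)
    shuffled = labels[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if clustering_score(points, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```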

We firstly evaluated somatosensory identification by analyzing the consistency and separability of the clusters of reached tongue postures associated to each vowel category of the answers provided in the somatosensory identification task (we call these answers somatosensory labels henceforth). As in the auditory identification, relevant classification is expected to be associated with compact and well-separated clusters, and irrelevant classification is expected to be associated with wider and poorly separated clusters. The black bars in Fig. 5, Middle, present the average silhouette values associated with the somatosensory labels of each subject (see SI Appendix, Fig. S6, for the shapes of these clusters and the associated silhouette values). We assessed whether or not the somatosensory clustering scores were different from chance using nonparametric randomization tests similar to those used for the auditory clustering scores (10^4 random permutations). We found that the labeling was different from chance for all subjects (P < 0.01 with Holm–Bonferroni correction for multiple comparisons). Subjects S3, S6, S7, and S8, who had the highest auditory clustering scores, likewise had somatosensory clustering scores that were similar in magnitude. This indicates that these four subjects were able to categorize their own tongue postures on the basis of somatosensory information as efficiently as auditory subjects classified their whispered speech. As expected, subjects who did not perform the tongue-positioning task well also had lower somatosensory clustering scores.

We next focus on subjects' classification based on somatosensory feedback along an articulatory continuum. For each somatosensory subject we constructed a somatosensory and an auditory identification curve along the articulatory continuum that spanned the nine target tongue postures. To do so, we subdivided the set of reached tongue postures into nine subsets, each corresponding to one of the nine target tongue postures: each subset was composed of the reached tongue postures that were closest to the target in the articulatory space. This resulted in nine steps along the articulatory continuum from /e/ to /a/. In each of the steps along this continuum we computed the proportion of reached tongue postures that were classified as each of the vowels /e/, /ε/, and /a/ by speakers (somatosensory labels) or listeners (auditory labels; see SI Appendix for details). We quantified the similarity of the resulting somatosensory and auditory identification curves by computing Pearson's correlation coefficient between them. Fig. 6 illustrates the resulting identification curves and the corresponding correlation coefficients for each of the somatosensory subjects. Fig. 6, Left, presents the curves for the subjects who performed well in the tongue-positioning task.

PNAS | March 17, 2020 | vol. 117 | no. 11
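The construction just described — per-vowel answer proportions within the nine continuum subsets, compared across modalities with Pearson's correlation — can be sketched as follows (illustrative names; not the authors' MATLAB code):

```python
import math

def identification_curve(subset_labels, vowels=("e", "ɛ", "a")):
    """Percentage of answers per vowel at each continuum step.

    subset_labels: one list of labels per target-posture subset (nine in the
    experiment), holding the answers given for postures in that subset.
    Returns {vowel: [percent at step 1, percent at step 2, ...]}.
    """
    return {v: [100.0 * sum(1 for l in step if l == v) / len(step)
                for step in subset_labels] for v in vowels}

def pearson_r(xs, ys):
    """Pearson correlation between two flattened identification curves."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

The two curves to compare (somatosensory and auditory) are flattened across vowels before the correlation is computed, mirroring the per-subject coefficients reported in Fig. 6.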




PSYCHOLOGICAL AND COGNITIVE SCIENCES

Fig. 6, Right, gives the curves for the other subjects.

Fig. 6, Left, shows curves that display a clear separability of the regions associated with each vowel in the somatosensory space. First, in terms of similarity, the somatosensory identification curves are very close to the auditory identification curves (Pearson's correlation coefficient above 0.8, P < 0.001 from two-tailed Student statistics implemented in the MATLAB function corr, with Holm–Bonferroni correction for multiple comparisons) that, in agreement with conventional observations, display well-separated acoustic regions for each vowel. Second, we clearly see in each curve the existence of three distinct categories, with regions in which only one vowel gets high identification scores and in a sequential order that matches the sequential order along the articulatory continuum. As expected, this pattern is not observed in the curves obtained from the subjects that did not perform the tongue-positioning task well (Fig. 6, Right).

Fig. 6. Identification curves of somatosensory labels (plain lines) and auditory labels (dashed lines) for subjects associated with (Left) good and (Right) poor clustering scores. Subjects are arranged within Left and Right in descending order (from the top to the bottom) of auditory clustering scores, as in Fig. 5. Blue, cyan, and red curves correspond to /e/, /ε/, and /a/ categories, respectively. For each of these curves the nine dots along the x axis indicate the percentage of answers for the corresponding vowel for each of the nine subsets of reached tongue postures. In all of the panels, Pearson's correlation coefficient is reported as a measure of similarity between the auditory and somatosensory identification curves. Statistical significance: ns = nonsignificant; *** = significant (P < 0.001).

We further explored the consistency between vowel identification provided by subjects in the absence of auditory feedback and the associated tongue postures by comparing the somatosensory identification curves with identification curves resulting from a Bayesian classifier that models the link between vowel categories and tongue postures during normal speech production (see SI Appendix for details). We determined the parameters for this classifier from the statistical distribution of the tongue postures reached for each vowel during the preliminary production task. We used the model to predict the identification curves that would be expected for each subject if their classification was based on statistics of the tongue postures that they achieved during normal speech production (see SI Appendix for details).

Fig. 7 presents the identification curves predicted by the Bayesian classifier (dashed lines) along with those obtained from the actual somatosensory identification (solid lines). For the four subjects who performed well in the tongue-positioning task (Fig. 7, Left), there is a clear similarity between the two curves, as reflected by the high Pearson correlation coefficients. The close agreement between these two curves provides further evidence of a strong link between tongue postures achieved during normal speech production and the perception of these vowels based on somatosensory feedback. As expected, for the remaining four subjects, shown in Fig. 7, Right, the similarity between the curves is lower.

Fig. 7. Identification curves from somatosensory labels (plain lines) along with predictions from a Bayesian classifier (dashed lines). In all panels, Pearson's correlation coefficient is reported as a measure of similarity between model predictions and somatosensory identification curves. Statistical significance: ns = nonsignificant; ** = significant (P < 0.01); *** = significant (P < 0.001). See Fig. 6 for more details.

Discussion

Somatosensory Information May Provide a Perceptual Basis for Vowel Categories. We have designed an experimental task to test whether speakers have access to phonologically relevant somatosensory information from the tongue. In this task subjects were required 1) to reach a range of vowel-like tongue postures between the vowels /e-ε-a/, in a setup designed to minimize the use of everyday speech motor control strategies, and 2) to identify the associated vowels on the basis of tongue somatosensory feedback. A first step in the data analysis involved an evaluation of whether subjects were able to correctly perform the tongue-positioning task. In order to do so, we assessed 1) whether subjects were able to produce a range of articulations that covered the intended vowel space and 2) whether subjects reached tongue postures that correspond to the production of a vowel, by evaluating the consistency of listeners' perceptual judgments. Despite the difficulty of the task, four subjects (S3, S6, S7, and S8) were clearly able to perform the task correctly: 1) they achieved tongue postures that were well distributed over the entire intended vowel space (Fig. 3); 2) their whispers were well identified as vowels by independent listeners, and these identifications were consistent across listeners (Fig. 4); and 3) the clusters of reached tongue postures associated with each auditory vowel category were only partially overlapping and consistent with the expected vowel articulations (Fig. 5, gray bars). The remaining four speakers (S1, S2, S4, and S5) either failed to achieve correct tongue postures, were too variable, or produced whispered speech that was not correctly classified by listeners in the auditory identification task. We interpret these observations as evidence that, for these latter four subjects, tongue postures as a whole did not correspond to vocal tract configurations that are appropriate for vowel production. As a consequence we focused on subjects S3, S6, S7, and S8 for the evaluation of their capacity to identify vowels based on somatosensory feedback.
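A Bayesian classifier of the kind described above can be sketched in one dimension with Gaussian likelihoods fit to normal-production postures (the authors' actual classifier, detailed in SI Appendix, operates on full EMA sensor configurations; the coordinate and sample values below are hypothetical):

```python
import math

def fit_gaussian(samples):
    """Mean and variance of tongue-posture samples for one vowel."""
    m = sum(samples) / len(samples)
    v = sum((s - m) ** 2 for s in samples) / len(samples)
    return m, v

def posterior(x, classes):
    """P(vowel | posture x) with equal priors and Gaussian likelihoods.

    classes: {vowel: (mean, variance)} fit from normal-production postures.
    """
    lik = {v: math.exp(-(x - m) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
           for v, (m, var) in classes.items()}
    z = sum(lik.values())
    return {v: l / z for v, l in lik.items()}

# Hypothetical 1D posture coordinate (e.g., tongue height) per vowel.
classes = {v: fit_gaussian(s) for v, s in
           {"e": [0.9, 1.0, 1.1],
            "ɛ": [0.4, 0.5, 0.6],
            "a": [-0.1, 0.0, 0.1]}.items()}
```

Evaluating the posterior at each of the nine continuum steps yields predicted identification curves that can be correlated with the observed somatosensory curves, as in Fig. 7.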

We found that these four subjects were quite successful in the somatosensory identification task, since they judged vowel categories in a consistent way, and they did so in a way consistent with the way their whispered sounds were judged by auditory subjects (Fig. 5). Hence, a first important result of our study is that 100% of the participants who, according to the criteria described above, successfully performed the tongue-positioning task also succeeded in the somatosensory identification of the three vowels of interest.

These subjects, who produced consistent vowel classification based on their own tongue postures, yielded somatosensory identification curves across the /e-ε-a/ continuum that were similar to the ones provided by auditory subjects listening to their whispers. Moreover, a comparison of the somatosensory and auditory identification curves shows that, for these subjects, somatosensory identification closely follows the categorical pattern of auditory identification. We also find that their somatosensory identification curves are close to those of a Bayesian classifier that provides a categorization of each of the tongue postures based on the statistics of normal speech production. Hence, our study provides evidence that vowel identification based on somatosensory feedback is of interest on its own; that this identification has categorical properties; that somatosensory feedback provides relevant information for vowel identification, at least along the /e-ε-a/ direction from high-front to low-back; and that there is a strong link between speech perception based on somatosensory feedback and tongue postures produced in natural speech.

Note that the conclusion that speech phonetic categories can be perceptually identified from somatosensory information is not weakened by the inability of some of the subjects to perform the tongue-positioning task. Successful performance of the task was a prerequisite for us to evaluate somatosensory identification in our participants. The tongue-positioning task required that subjects perform a complex visuo-motor-somatosensory match, which was needed to avoid movement cues being used for identification, and because passive positioning of the tongue is effectively impossible. Not surprisingly, only half of our subjects could perform this task satisfactorily.

How Does Somatosensory Feedback Operate in Vowel Perception? The role of somatosensory information in speech perception has been previously demonstrated in several studies. Using a robotic device to stretch the facial skin, Ito et al. (18) showed that the perceptual boundary between the vowels /ε/ (in "head") and /æ/ (in "had") was altered when the stretch was applied in directions compatible with the somatosensory input associated with the production of these sounds and when the timing of the stretch was in accord with the production of these speech sounds. These results are in accord with the findings of Fowler and Dekle (19), who have shown that the perceptual boundary between the labial stop /b/ and the alveolar stop /d/ on a /ba-da/ acoustic continuum was influenced by haptic information which resulted from a speaker silently mouthing one of these two syllables. In the same vein, Gick and Derrick (20) applied slight inaudible air puffs to the neck or the hand of participants, in synchrony with the acoustic stimulus, during a perceptual test of identification of /ba/ and /pa/ stimuli in noise. They observed that the presence of the air puffs applied to the neck and to the hand increased the perception of the unvoiced /pa/. Thus, somatosensation linked with speech production appears to contribute, directly or indirectly, to speech perception. More generally, these studies support the idea that speech perception integrates all sensory inputs that are associated with speech sounds, both those that are due to normal experience with speech and those due to specific training (see, for example, the enhancement of lipreading of sentences due to the application of vibrotactile vocoders to the lips in ref. 21). However, so far, the contribution of somatosensory feedback to speech perception was only shown to be modulatory. The present study also shows that somatosensory information plays a relevant linguistic role in speech perception; namely, it provides information for phonetic categorization.

The existence of a somatosensory route to speech perception could be attributable to different processes. First, in the theoretical cognitive framework according to which there are targets in the sensory domains for the production of phonetic units, the identification of phonetic units could occur in the somatosensory domain, in the absence of auditory feedback, either directly from the units that specify the goals of phonetic units in this domain, as was proposed for example by Keating (22), or through an association with regions of the auditory domain typically associated with these units, as proposed by Guenther et al. (23). This association might come about through the repeated pairing of speech sounds and associated somatic feedback during speech production and learning. By this account, one would expect that somatosensory classification would remain possible even if auditory cortex was suppressed.

Alternatively, an explanation would involve a sensory-to-sensory map in the brain, as suggested by the Dual Stream Prediction Model developed by Tian and Poeppel (24, 25) in the context of studies of mental imagery using magnetoencephalography. These authors hypothesized that abstract auditory representations could be estimated in speech production on the basis of the predicted somatosensory consequences of motor commands. In this context, somatosensory vowel identification would result from the association of somatosensory characteristics with auditory phonological categories, via a sensory-to-sensory map. By this account, vowel categorization would in fact occur in the auditory domain rather than in the somatosensory domain. Accordingly, somatosensory classification should not be possible without the involvement of auditory cortex.

Also, in the context of speech production models based on state feedback control, as proposed by Houde et al. (26) and more recently by Parrell et al. (2), phonetic categorization could arise from the estimation of the state of the vocal tract described in terms of phonologically relevant geometrical parameters. This state estimation, which establishes a link to phonetic categories, uses auditory and somatosensory feedback not for a comparison with sensory targets, but to update the prediction of the actual state of the vocal tract and to evaluate whether the motor commands correspond to this predicted state. This state estimation might be performed in the cerebellum, which is widely assumed to be crucial for forward models (27). This might be tested by perturbing the cerebellum: if this prediction is correct, phonetic categorization of speech configurations should be rendered impossible.

By showing that there is categorical somatosensory decoding of speech configurations, the present study demonstrates that somatic information plays a role at a linguistic level, in addition to its previously documented involvement in speech production and learning processes. The present study is intrinsically compatible with the various cognitive processes postulated above, but it does not enable one to disentangle the alternatives.

Materials and Methods

Participants. The overall study was composed of two parts that involved two different groups of subjects (eight subjects in each group, four males, with ages ranging between 20 and 36 y, average 24 y, and between 21 and 35 y, average 29 y, for the first and the second group, respectively). The first group of subjects, called somatosensory subjects below, performed all tasks except the auditory identification task. The second group of subjects, called auditory subjects, performed only the auditory identification task. All subjects were native French speakers and reported no specified cognitive or hearing impairment. The experimental protocol was conducted in accordance with the ethical standards specified in the 1964 declaration of Helsinki and was approved by the institutional ethics committee of the University Grenoble-Alpes (IRB00010290-2016-07-05-08). All subjects provided informed consent.

Experimental Design and Tasks.
Overview. The experimental protocol was designed to achieve two goals. The first goal was to have subjects produce a set of tongue configurations that covers the articulatory range of the French vowels /e, ε, a/. These particular vowels were selected for this study because their primary articulatory features can be readily observed with an EMA system, which provides highly accurate information in the front part of the vocal tract (from the lips to the region of the soft palate). The second goal was to minimize the likelihood that subjects use anything other than somatosensory information, such as auditory or visual information or stored motor strategies, to identify the vowel associated with their tongue configurations.

In a preliminary production task we used EMA to record for each subject tongue postures associated with the production of vowels /e/, /ε/, and /a/. Using the average tongue posture for each vowel, we defined a set of intermediate postures by linear interpolation and extrapolation along /e–ε/ and /ε–a/, such that overall a uniformly distributed set of nine target tongue postures was obtained, with seven of them spanning the articulatory space from /e/ to /a/ average tongue postures and the last two extrapolated outside of this range (one beyond /e/, the other beyond /a/). An example of the set of target tongue postures is shown in Fig. 1, Right (for the remaining subjects, see SI Appendix, Fig. S3). This set of target tongue postures was then used in the tongue-positioning task described below.

In the tongue-positioning task, subjects (somatosensory subjects) could see a real-time visually distorted display of the EMA sensors on their tongue, connected by solid lines, along with a distorted static display of the intended target. They were given 5 s (reaching phase) to move their tongue toward the displayed target.

The visual display was designed to avoid providing the subject with visual information about tongue position or shape (Fig. 2A). Target positions were always displayed in the same horizontally aligned configuration at the center of the screen, ensuring that targets in all trials looked the same (red circles in Fig. 2A). The position and movement of the subject's sensors (black circles in Fig. 2A) were then displayed on the screen relative to the position of the displayed target, according to the spatial transformation shown in Fig. 2B. When the tongue was in the target tongue posture, the sensors' position matched the horizontally displayed target configuration in the visual display. In other words, at the target tongue posture, the displayed position of the subject's EMA sensors did not correspond to the actual physical configuration of the subject's tongue but rather was warped onto a straight line. In this way, visual information about actual tongue shape and position was eliminated (see real position vs. visual display panels in Fig. 2A). The tongue-positioning task thus minimized visual cues regarding tongue posture and the availability of speech motor control strategies. The goal was to ensure that the subsequent judgment of tongue postures (the primary task) was not based on anything other than somatosensory information.

After the reaching phase of the positioning task, somatosensory subjects were instructed to whisper while holding the reached posture (whispering task). In-ear masking noise (white noise) was used in the whispering task to prevent subjects from identifying their reached tongue postures based on auditory information (Fig. 2A, Middle). Masking noise was sent through in-ear headphones compatible with the EMA system (Etymotic ER1). After each positioning and whispering trial, subjects performed a somatosensory identification task by indicating which vowel they thought they were whispering. The response was a three-alternative forced choice among the target vowels /e/, /ε/, and /a/, as illustrated in Fig. 2A, Bottom Right. The masking noise was maintained during the somatosensory identification task.

Subjects' whispers were classified by an additional group of listeners (auditory subjects) in an auditory identification task. Auditory subjects were instructed to indicate which vowel corresponded best to the whispered sound, using a three-alternative choice design similar to the one in the somatosensory identification task. Each subject in this auditory identification task heard all whispers produced by all somatosensory subjects, and each whisper was heard and identified only once by each subject.

The task performed by the somatosensory subjects involved two sessions that were completed on two consecutive days. The session on the first day was set up in order to provide subjects with training for the positioning and whispering task. This tongue-positioning and whispering task was preceded by a production task under normal conditions, which was used for the definition of the target tongue postures used in tongue positioning and whispering. The session on the second day again included the production task under normal conditions for the definition of new target tongue postures, along with the tongue-positioning and whispering tasks, which used these new target tongue postures, and the somatosensory identification task. These last two tasks are the main experimental tasks of interest. No use was made on the second day of any of the data recorded on the first day; the first day was strictly for training. Importantly, no reference to the forthcoming somatosensory identification task was given during the training on the first day, in order to prevent the possible development of strategies for the tongue-positioning task. In all tasks, subjects were seated in a soundproof room in front of a computer screen. The influence of the jaw on tongue positioning was removed by means of a bite block specifically molded for the teeth of each subject (Coltene PRESIDENT putty). The bite block was designed to keep the jaw in a comfortable position but to ensure that articulatory positioning was restricted to the tongue. The bite block was molded by the subject while holding a reference spacing of 15 mm between their teeth.

Design of specific tasks. The sessions on both days began with a habituation task, followed by a production task in which specific tongue postures were recorded and used for each subject separately to establish the target tongue postures used in the positioning and whispering tasks that followed. The habituation task was performed right after the placement of the EMA sensors and consisted of the production of 10 phonetically balanced sentences intended to get subjects used to speaking with the sensors glued to the tongue (see SI Appendix, Fig. S7, for the list of phonetically balanced sentences). The bite block was not used in this habituation task. After the habituation task, the production task consisted of the normal production of five French vowels (/e/, /ε/, /a/, /ɔ/, and /œ/). In each trial of the production task the subjects started from the tongue posture for the consonant /l/ (see below). This was followed by a visually displayed "go" signal, upon which subjects were required to produce the /l/ sound followed by one of the five French vowels. In order to maintain a certain level of naturalness, the consonant–vowel (CV henceforth) syllables were displayed as French words ("les" [plural "the"] for /le/, "lait" ["milk"] for /lε/, "la" [feminine "the"] for /la/, "lors" ["during"] for /lɔ/, and "leur" ["their"] for /lœ/). Subjects were instructed to produce only the syllable onset and nucleus and to hold the vowel for around 3 s (with the help of a countdown displayed on the screen). Each of the five CV items was produced 10 times in random order. The production task and all following tasks were performed with the bite block.

We selected the consonant /l/ for the onset of the syllable because we wanted all five vowels to be articulated in a prototypical way, in order to have typical tongue postures for each vowel. This last point required that the selected initial consonant have little influence on the articulation of the subsequent vowel. The French consonant /l/ has an apical, nonvelarized articulation. A number of studies (summarized in table 1 of ref. 28) have shown that the nonvelarized /l/ enables a fair amount of variability in the anterior–posterior positioning of the tongue associated with the articulation of the subsequent vowel. In addition, electropalatographic data (e.g., ref. 29, p. 189) provide evidence that the nonvelarized /l/ does not feature tongue grooving but an arched tongue in the coronal plane. These observations suggest that the consonant /l/ should minimally constrain the articulation of the subsequent vowel, contrary, for example, to the postalveolar sibilants /s/ and /R/, which would introduce articulatory constraints in terms of anterior–posterior positioning and/or coronal tongue grooving.

The set of target tongue postures for the tongue-positioning task was obtained from the average tongue postures produced for vowels /e/, /ε/, and /a/ during the production task. Average tongue postures were obtained from the 10 repetitions of each vowel. Six intermediate target tongue postures were computed as equally spaced linear interpolations (for four postures) and extrapolations (for two postures) of the average vowel tongue postures, as illustrated in Fig. 1, Right, for one particular subject (S8). The combined set of vowel tongue postures and intermediate tongue postures constituted the nine target tongue postures used during the tongue-positioning task.

The tongue-positioning task. Fig. 2 illustrates the design of each trial. At the start of each trial, subjects were instructed to start from a tongue position for the consonant /l/. The consonant /l/ was chosen because it does not substantially influence subsequent tongue positioning, while providing a relatively stable reference in terms of tongue elevation (close to the palate). The target tongue posture, shown as four points in a straight line (Fig. 2, Bottom Left), was then presented along with a real-time display of the subject's sensor positions using the distorted spatial representation described above (black lines and circles in Fig. 2). Subjects were given 5 s after a visually displayed "go" signal to get as close as they could to the displayed target. The nine target tongue postures were displayed 10 times each in randomized order across trials. Each subject thus completed 90 trials (9 targets × 10 repetitions) in this tongue-positioning task.
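The interpolation/extrapolation scheme for building the nine targets can be sketched as follows (postures represented as flat coordinate lists; the exact extrapolation distance of one-third of a segment is our reading of "equally spaced" and is an assumption):

```python
def lerp(p, q, t):
    """Linear interpolation between two postures; t < 0 or t > 1 extrapolates."""
    return [a + t * (b - a) for a, b in zip(p, q)]

def nine_targets(e, eps, a):
    """Nine target postures: the /e/, /ε/, /a/ averages, four equally spaced
    interpolations, and two extrapolations (one beyond /e/, one beyond /a/).

    e, eps, a: average tongue postures as flat lists of sensor coordinates.
    """
    return [
        lerp(e, eps, -1 / 3),  # extrapolated beyond /e/ (spacing assumed)
        e,
        lerp(e, eps, 1 / 3),
        lerp(e, eps, 2 / 3),
        eps,
        lerp(eps, a, 1 / 3),
        lerp(eps, a, 2 / 3),
        a,
        lerp(eps, a, 4 / 3),   # extrapolated beyond /a/ (spacing assumed)
    ]
```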

6262 | www.pnas.org/cgi/doi/10.1073/pnas.1911142117 Patri et al. Downloaded by guest on September 29, 2021 Downloaded by guest on September 29, 2021 1 .M iemn .S ars .S ofa,B .Gifih h iciiaino speech of discrimination The Griffith, C. B. Hoffman, S. H. Harris, S. K. Liberman, M. A. 11. 4 .J eemn .L ltk,Hn oeet:Awno nohpi object haptic syn- into and Complementarity window Escudier, P. A Lallouache, T. movements: Schwartz, J.-L. Hand Robert-Ribes, J. Klatzky, 15. L. R. Lederman, J. S. 14. signals. significant biologically as Chang sounds F. E. speech of 13. perception the On Pisoni, B. D. 12. Perkell S. J. 10. ar tal. et Patri ino h hsee peh hsts a efre yegtsubjects eight by performed classifica- was auditory task This on speech. based whispered postures the tongue of reached tion the of evaluation task. identification auditory The / /e/, to Right respectively Bottom (corresponding screen a on uea n ftetrevwl fitrs,b eetn “ selecting by interest, of vowels three the of pos- tongue one reached vowels as the the classify identify ture after to correctly immediately instructed were trial, to subjects each task, able whispering On subjects information? Are somatosensory on study: based the of question tongue stability. mary articulatory median task. best the identification of to somatosensory window time closest The 1-s the the was within each calculated that for posture configuration considered we tongue task, pos- the whispering tongue the trial reached a of select representative the to was order generally that In is ture control. it postural as of whisper, kind any each for during case expected was posture tongue the hear could they not or whispers. whether asked in their were confident they feel when subjects, we from noise, reports masking Appendix the (SI of possibility this power of rejecting the compar- with power and values signals, acoustic these bone-conducted and the ing airborne whispers the evaluating in noise, both By masking whispers, We information. 
the the 2). auditory (Fig. despite screen provide whis- that the could their possibility on displayed keep the in-ear was checked to through meter carefully subjects presented level a somatosensory SPL) low, the dB feed- intensity pering for (80 Auditory order noise screen. dis- In white the position, being headphones. with on their s) masked displayed maintain 2 was was subjects (around back posture help whispering To tongue countdown while screen. reached reference posture the computer a the tongue with on reached seconds, played few this a hold for to the instructed during were as task. block whispering bite postures. same tongue The the target holding nine task. while the production performed uniformly by was that defined configurations task tongue workspace, This of full range the of a purpose generate covered main to the since was crucial task not this was targets the reaching precisely that targets (9 trials 90 .S .Nsr .J sr,Smtsnoypeiini pehproduction. speech production. in speech precision Somatosensory of Ostry, basis J. D. Somatosensory Nasir, M. Ostry, S. J. D. 9. Shiller, M. D. Tremblay, S. 8. perturbation the for strategies Compensation Orliaguet, P. J. Perrier, P. equivalence Savariaux, Acoustic C. vowels: bite-block 7. of Production Lubker, J. Lindblom, B. Gay, T. 6. Kr J. B. 5. result- change perceptual the drives What Diard, J. Schwartz, J.-L. Perrier, P. Patri, J.-F. production. 4. speech speech of neuroanatomy of Computational model Hickok, facts G. The 3. Houde, J. Nagarajan, S. acquisition Ramanarayanan, speech V. of Parrell, theory neural B. A model: 2. diva The Guenther, H. F. Tourville, A. J. 1. ic h hseigts atdfrafwscns oesalvrainin variation small some seconds, few a for lasted task whispering the Since gyrus. boundaries. phoneme across and within sounds loss. hearing profound with and hearing normal with ryi ioa peh uioy iul n ui-iulietfiaino French of identification audio-visual and noise. 
The somatosensory identification task. In this task we addressed the primary question of the study: Are subjects able to categorize tongue postures based on somatosensory information? On each trial, immediately after the whispering task, subjects were instructed to identify the reached tongue posture as one of the three vowels of interest by selecting "é," "è," or "a" on the screen (corresponding respectively to /e/, /ε/, and /a/ in French; Fig. 2, Bottom Right).

1. J. A. Tourville, F. H. Guenther, The DIVA model: A neural theory of speech acquisition and production. Lang. Cognit. Process. 26, 952–981 (2011).
2. B. Parrell, V. Ramanarayanan, S. Nagarajan, J. Houde, The FACTS model of speech motor control: Fusing state estimation and task-based control. PLoS Comput. Biol. 15, e1007321 (2019).
3. G. Hickok, Computational neuroanatomy of speech production. Nat. Rev. Neurosci. 13, 135–145 (2012).
4. J.-F. Patri, P. Perrier, J.-L. Schwartz, J. Diard, What drives the perceptual change resulting from speech motor adaptation? Evaluation of hypotheses in a Bayesian modeling framework. PLoS Comput. Biol. 14, e1005942 (2018).
5. B. J. Kröger, J. Kannampuzha, C. Neuschaefer-Rube, Towards a neurocomputational model of speech production and perception. Speech Commun. 51, 793–809 (2009).
6. T. Gay, B. Lindblom, J. Lubker, Production of bite-block vowels: Acoustic equivalence by selective compensation. J. Acoust. Soc. Am. 69, 802–810 (1981).
7. C. Savariaux, P. Perrier, J. P. Orliaguet, Compensation strategies for the perturbation of the rounded vowel [u] using a lip tube: A study of the control space in speech production. J. Acoust. Soc. Am. 98, 2428–2442 (1995).
8. S. Tremblay, D. M. Shiller, D. J. Ostry, Somatosensory basis of speech production. Nature 423, 866–869 (2003).
9. S. M. Nasir, D. J. Ostry, Somatosensory precision in speech production. Curr. Biol. 16, 1918–1923 (2006).
10. J. S. Perkell et al., A theory of speech motor control and supporting data from speakers with normal hearing and with profound hearing loss. J. Phonetics 28, 233–272 (2000).
11. A. M. Liberman, K. S. Harris, H. S. Hoffman, B. C. Griffith, The discrimination of speech sounds within and across phoneme boundaries. J. Exp. Psychol. 54, 358–368 (1957).
12. D. B. Pisoni, On the perception of speech sounds as biologically significant signals. Brain Behav. Evol. 16, 330–350 (1979).
13. E. F. Chang et al., Categorical speech representation in human superior temporal gyrus. Nat. Neurosci. 13, 1428–1432 (2010).
14. S. J. Lederman, R. L. Klatzky, Hand movements: A window into haptic object recognition. Cognit. Psychol. 19, 342–368 (1987).
15. J. Robert-Ribes, J.-L. Schwartz, T. Lallouache, P. Escudier, Complementarity and synergy in bimodal speech: Auditory, visual, and audio-visual identification of French oral vowels in noise. J. Acoust. Soc. Am. 103, 3677–3689 (1998).

6262 | www.pnas.org/cgi/doi/10.1073/pnas.1911142117 Patri et al.
The auditory identification task. This part of the study involved a phonetic evaluation of the reached tongue postures based on auditory classification of the whispered speech. This task was performed by eight subjects (auditory subjects) who were not involved in any of the other tasks. The auditory subjects listened to the whispers one at a time through headphones. The whispers of each of the somatosensory subjects were presented in separate blocks; the order of presentation of the blocks was balanced across subjects, and the order of the whispers within each block was randomized. Auditory subjects labeled each sound by clicking on the appropriate button, called "a," "é," or "è," on the screen. For each whispered sound we extracted the most frequent vowel label among the auditory answers and considered it to be the canonical auditory label, among /e/, /ε/, and /a/, for this sound. Note that some whispered sounds could be associated with two different auditory vowel labels with the same frequency; in that case, we randomly assigned one of these two labels as the auditory label. For the sake of clarity, we call auditory answer each individual label provided by an auditory subject, and auditory label the most frequent answer among the eight auditory answers for a given whisper.
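The labeling rule for the auditory answers (majority vote over the eight listeners, ties broken at random) and the entropy of the auditory answers can be sketched as follows; the function names and the base-2 entropy convention are illustrative assumptions, not details taken from the paper.

```python
import random
from collections import Counter
from math import log2

def auditory_label(answers, rng=random):
    """Most frequent vowel among the auditory answers for one whisper;
    if two vowels tie for the top count, one of them is chosen at
    random, as described in the text."""
    counts = Counter(answers)
    top = max(counts.values())
    tied = [vowel for vowel, c in counts.items() if c == top]
    return rng.choice(tied)

def answer_entropy(answers):
    """Shannon entropy of the auditory answers for one whisper:
    0 when all listeners agree, larger when answers are split.
    (Base-2 entropy is an assumed convention.)"""
    counts = Counter(answers)
    n = len(answers)
    return -sum(c / n * log2(c / n) for c in counts.values())

answers = ["e", "e", "e", "e", "e", "a", "a", "è"]  # one answer per listener
print(auditory_label(answers))           # -> e  (clear majority, no tie)
print(round(answer_entropy(answers), 3))  # -> 1.299
```

With a clear majority the label is deterministic; the random draw only matters for exact ties, mirroring the random assignment described above.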
Data availability. All data discussed in the paper and all programs used for their analysis will be made available to readers upon request. Details about data recording, data processing, and data analysis (computation of the entropy of the auditory answers, computation of silhouette scores in the clustering analysis using silhouette curves, and design and construction of the Bayesian classifier used in the identification) are provided in SI Appendix.

ACKNOWLEDGMENTS. The research leading to these results received funding from the European Research Council under the European Community's Seventh Framework Program (FP7/2007-2013 grant agreement 339152, "Speech Unit(e)s"; principal investigator, Jean-Luc Schwartz), from the European Union's Horizon 2020 research and innovation program under Marie Skłodowska-Curie grant agreement 754490 (MultIscale precision therapies for NeuroDEvelopmental Disorders [MINDED] Program), and from National Institute on Deafness and Other Communication Disorders grant R01DC017439. It was also supported by the "Investissements d'avenir" program NeuroCoG IDEX Université Grenoble Alpes (ANR-15-IDEX-02) in the framework of the "IDEX université de l'innovation." The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

16. P. J. Rousseeuw, Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987).
17. M. D. Ernst, Permutation methods: A basis for exact inference. Stat. Sci. 19, 676–685 (2004).
18. T. Ito, M. Tiede, D. J. Ostry, Somatosensory function in speech perception. Proc. Natl. Acad. Sci. U.S.A. 106, 1245–1248 (2009).
19. C. A. Fowler, D. J. Dekle, Listening with eye and hand: Cross-modal contributions to speech perception. J. Exp. Psychol. Hum. Percept. Perform. 17, 816–828 (1991).
20. B. Gick, D. Derrick, Aero-tactile integration in speech perception. Nature 462, 502–504 (2009).
21. L. E. Bernstein, M. E. Demorest, D. C. Coulter, M. P. O'Connell, Lipreading sentences with vibrotactile vocoders: Performance of normal-hearing and hearing-impaired subjects. J. Acoust. Soc. Am. 90, 2971–2984 (1991).
22. P. A. Keating, "The window model of coarticulation: Articulatory evidence" in Papers in Laboratory Phonology I: Between the Grammar and Physics of Speech, J. Kingston, M. E. Beckman, Eds. (Cambridge University Press, Cambridge, UK, 1990), chap. 26, pp. 451–470.
23. F. H. Guenther, S. S. Ghosh, J. A. Tourville, Neural modeling and imaging of the cortical interactions underlying syllable production. Brain Lang. 96, 280–301 (2006).
24. X. Tian, D. Poeppel, The effect of imagination on stimulation: The functional specificity of efference copies in speech processing. J. Cogn. Neurosci. 25, 1020–1036 (2013).
25. X. Tian, D. Poeppel, Dynamics of self-monitoring and error detection in speech production: Evidence from mental imagery and MEG. J. Cognit. Neurosci. 27, 352–364 (2015).
26. J. F. Houde, S. S. Nagarajan, Speech production as state feedback control. Front. Hum. Neurosci. 5, 82 (2011).
27. D. M. Wolpert, R. C. Miall, M. Kawato, Internal models in the cerebellum. Trends Cogn. Sci. 2, 338–347 (1998).
28. D. Recasens, A. Espinosa, Articulatory, positional and coarticulatory characteristics for clear /l/ and dark /l/: Evidence from two Catalan dialects. J. Int. Phonetic Assoc. 35, 1–25 (2005).
29. P. Ladefoged, I. Maddieson, The Sounds of the World's Languages (Blackwell Publishers Ltd, Oxford, United Kingdom, 1996).

PNAS | March 17, 2020 | vol. 117 | no. 11 | 6263

PSYCHOLOGICAL AND COGNITIVE SCIENCES