Phoneme Recognition by Ear and by Eye H.W. Campbell
PHONEME RECOGNITION BY EAR AND BY EYE
A distinctive feature analysis

H.W. CAMPBELL

PROMOTOR: PROF. DR. W.J.M. LEVELT

Doctoral dissertation for the degree of Doctor in the Social Sciences at the Catholic University of Nijmegen, to be defended in public on the authority of the Rector Magnificus, Prof. Mr F.J.F.M. Duynstee, by decision of the College of Deans, on Friday 22 February 1974 at 2.00 p.m. precisely, by Hugo Wilfred Campbell, born in Paramaribo.

Printed by: Stichting Studentenpers Nijmegen, 1974

to Jeannette and Jennifer

ACKNOWLEDGEMENT

Financial and material support: The Netherlands Organization for the Advancement of Pure Research (Z.W.O.)
Video recording: A. BaMcer (Utrecht University), P. Das (Groningen University), H.P. Sipman, Th. van Haaren and F. Westerbeek (Nijmegen University)
Experimental work: J. Joustra and many other students
Correction of English: Mrs J.A. Thomassen
Typing: Jeannette Campbell and Miss M. Maria
Drawings: W. Klink

CONTENTS

CHAPTER 1. INTRODUCTION. 11
1.1. Aim, Scope and Exposition. 11
1.2. Distinctive feature description of discrete segments. 14
1.3. The Chomsky-Halle distinctive feature system. 17
1.4. The role of distinctive features in speech processing. 20
1.5. Input modality and speech processing. 27

CHAPTER 2. PROCESSING OF DISTINCTIVE FEATURES AND RESPONSE PROBABILITY. 34
2.1. Introduction. 34
2.2. Hierarchical ordering of distinctive features. 35
2.3. Tree structure of HCS as a decision structure. 47
2.3.1. The underlying decision structure. 47
2.3.2. Predicting the confusion matrix. 53
2.3.3. Experimental evidence favouring the tree-like decision structures. 56
2.4. Sequential elimination as a general choice strategy. 75
2.4.1. Elimination by Aspects. 75
2.4.2. Elimination by Aspects and the constant-ratio rule. 76
2.5. Concluding remarks. 84

CHAPTER 3. COMBINING HEARING AND SPEECHREADING FOR SPEECH RECEPTION. 87
3.1. Introduction. 87
3.2. The effectiveness of combining hearing and speechreading for the reception of speech. 88
3.3. Combination principles for the joint effects of hearing and speechreading in speech recognition. 90
3.3.1. Independence-additivity model. 90
3.3.2. The response strength model. 91
3.3.3. Maximization principle. 94
3.3.4. Interaction between hearing and speechreading. 95
3.4. Test of the combination principles. 99
3.4.1. Introduction. 99
3.4.2. Multi-choice tasks. 100
3.4.3. Two-choice tasks. 104
3.4.4. Combined information from different features. 109
3.4.5. Discussion. 112
3.5. Concluding remarks. 114

CHAPTER 4. SEQUENTIAL ELIMINATION PRINCIPLE AND MODE OF FEATURE PROCESSING. 116
4.1. Introduction. 116
4.2. Modes of feature processing: same-different judgments. 117
4.3. Underlying decision structure and response latency. 133
4.4. Concluding remarks. 150

CHAPTER 5. FEATURE SYSTEM AND SPEECH PERCEIVER'S INTERPRETATION OF PHONEMES. 151
5.1. Introduction. 151
5.2. Feature specifications and recognition data. 152
5.3. The hypothesis of binarity. 160
5.3.1. The optimal feature system. 160
5.3.2. The hypothesis of strict binarity. 163
5.4. The hypothesis of feature dominance. 167
5.4.1. The phonological feature hierarchy. 167
5.4.2. Psychological validity of the theory of markedness. 172
5.5. Interpretation of feature hierarchies from recognition data. 177
5.5.1. Feature hierarchy for consonants. 177
5.5.2. Feature hierarchy for vowels. 182
5.5.3. Speech code and speech perceiver's interpretation. 185

APPENDIX 188
SUMMARY 201
SAMENVATTING (summary in Dutch) 208
REFERENCES 216

CHAPTER 1
INTRODUCTION

1.1. AIM, SCOPE AND EXPOSITION

The sensory pathway through which we receive speech reflects the activities of the speech producing system in a specific manner. The physical stimuli that are received may be either sound or visual patterns (e.g. seen movements of the articulators).
In this sense hearing and speechreading (lipreading) are merely different means of perceiving the same speech event. This does not mean, however, that given optimal hearing and speechreading conditions, speech will be received with the same accuracy irrespective of which of these two input modalities is being used. With speechreading fewer distinctions can be made between speech elements than with hearing (e.g. Woodward and Barber, 1961; Ewing, 1962; Frisina, 1963).

Another important difference between hearing and speechreading is that speechreading is restricted to direct face-to-face communication, and thus to a specific bodily orientation towards the speaker. Of course, this does not mean that the speaker must be bodily present: the communication may be perfectly well established by means of a visual recording system, for example a closed-circuit television system that enables the speaker and speechreader to be in different places. Nevertheless, one must look in a specific direction in order to receive the relevant speech cues. Hearing, in contrast, encompasses all directions all the time: when we are listening, it is not necessary to maintain a fixed body position with respect to either a fixed or a changing environment.

Notwithstanding its limited proficiency, speechreading must be considered one of the means by which the producing activities of the speech mechanism can be recovered. If we are interested in speech recognition, this identical referent (the speech producing mechanism) for hearing and speechreading becomes very important. More specifically, by referring to this common speech-encoding mechanism one may initially disregard altogether the specificity of the physical parameters that are involved separately in hearing and speechreading, and attempt to describe the recognition of speech elements received by either hearing or speechreading in terms of the same set of articulatory attributes.
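The idea of one set of articulatory attributes shared by ear and eye can be made concrete with a small sketch. The binary feature values below are purely illustrative, in the spirit of (but not identical to) the Chomsky-Halle system discussed later in this chapter:

```python
# Hypothetical binary distinctive-feature specification for a few
# consonants.  Values are illustrative only, not the specification
# actually used in this study.
FEATURES = ["voiced", "nasal", "continuant", "coronal"]

PHONEMES = {
    "p": (0, 0, 0, 0),
    "b": (1, 0, 0, 0),
    "m": (1, 1, 0, 0),
    "t": (0, 0, 0, 1),
    "s": (0, 0, 1, 1),
}

def feature_distance(x, y):
    """Number of distinctive features on which two phonemes differ."""
    return sum(a != b for a, b in zip(PHONEMES[x], PHONEMES[y]))

def differing_features(x, y):
    """Names of the features that distinguish two phonemes."""
    return [f for f, a, b in zip(FEATURES, PHONEMES[x], PHONEMES[y])
            if a != b]
```

On this toy specification, /p/ and /b/ differ only in voicing, a distinction the ear can use but the speechreader largely cannot, while /p/ and /t/ differ only in place of articulation, which is visually more salient. The same description thus serves both input modalities.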
In the present study, precisely this line of reasoning has been followed. We have exploited the possibility of assuming a recognition system characterized by the processing of general attributes (features) of stimuli, and in doing so have characterized speech elements by their description in terms of phonetic distinctive features, which is essentially a linguistic description of elementary classes of speech events (phonemes). In this way we could describe the recognition of speech elements independently of the way (hearing or speechreading) speech has been received. When speech recognition is, moreover, described in terms of an attribute analysis of the incoming physical speech stimuli, we can see whether it is valid to consider this attribute analysis identical to the phonetic distinctive feature description of the speech elements that we started with. If this is the case, i.e. if phonetic distinctive features, which indicate articulatory differences between phonemes, have perceptual validity, then one cannot escape the necessity of considering their role and relevance with respect to the structure of the underlying speech recognition system.

The aim of the study, therefore, is to consider the relevance of the phonetic distinctive feature as a unit in phoneme recognition irrespective of the input modality and, given the relevance of this unit, the way in which it is processed. We attempt to show that the same distinctive feature description of phonemes can be used to account for the processing of speech received in three different ways: auditorily (hearing only), visually (speechreading, i.e. lipreading, only) and audiovisually (hearing and speechreading combined). This will be done by studying the recognition of consonants and vowels presented in Dutch CV (consonant-vowel) syllables to normal hearing subjects.
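One way the distinctive feature could function as a processing unit is through sequential elimination: one feature is tested per step and candidate phonemes mismatching the perceived value are discarded. A minimal sketch, with a hypothetical feature order and illustrative feature values:

```python
# Sequential elimination sketch: features are tested one at a time in a
# fixed order, and candidates mismatching the perceived value are
# eliminated.  Feature order and values are hypothetical illustrations.
PHONEME_FEATURES = {
    "p": {"voiced": 0, "nasal": 0, "coronal": 0},
    "b": {"voiced": 1, "nasal": 0, "coronal": 0},
    "m": {"voiced": 1, "nasal": 1, "coronal": 0},
    "t": {"voiced": 0, "nasal": 0, "coronal": 1},
}

def eliminate(percept, order=("voiced", "nasal", "coronal")):
    """Serially narrow the candidate set, one feature per step."""
    candidates = set(PHONEME_FEATURES)
    for feature in order:
        candidates = {p for p in candidates
                      if PHONEME_FEATURES[p][feature] == percept[feature]}
        if len(candidates) == 1:  # recognition complete
            break
    return candidates
```

For a percept specified as voiced and nasal, the voicing test leaves {b, m} and the nasality test leaves {m}; the serial scheme never needs to consult all features at once, which is the contrast with parallel feature processing drawn in the next paragraph.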
In each of these modes of speech reception, performance will be considered the result of one underlying speech processing device, which uses the distinctive feature as a processing unit. Furthermore, it will be assumed that distinctive features are processed serially (one feature at a time) rather than in parallel (several features simultaneously). This point of view naturally leads to the question whether the decoding of speech follows a functional scheme strictly conforming to some linguistic feature system, or whether it conforms to an "empirical" feature system which deviates in the number and kind of features from the linguistic system. Finally, an attempt is made to test various combination principles which might be descriptive of the interaction between hearing and speechreading when speech is received audiovisually. More specifically, we will be concerned with finding a combination rule on the basis of which the joint contribution of hearing and speechreading can be described satisfactorily.

The study is organized in the following way. The remaining part of the present chapter is devoted to a description of the status of the distinctive feature, both as a linguistic category and as a unit in speech processing. With respect to its role as a linguistic category, we shall be concerned with both the distinctive feature description of the phoneme and the nature of the linguistic feature system that has been used in our study to obtain a linguistic description of the various phonemes. Considering the distinctive feature as a processing unit, we first present evidence from the literature which confirms its role in speech processing; then a preliminary model is discussed which expresses the initial