Phones and Phonemes
NLPA-Phon1 (4/10/07) © P. Coxhead, 2006
Natural Language Processing & Applications

1 Phonemes

If we are to understand how speech might be generated or recognized by a computer, we need to study some of the underlying linguistic theory. The aim here is to UNDERSTAND the theory rather than memorize it. I’ve tried to reduce and simplify as much as possible without serious inaccuracy.

Speech consists of sequences of sounds. The use of an instrument (such as a speech spectrograph) shows that most of normal speech consists of continuous sounds, both within words and across word boundaries. Speakers of a language can easily dissect its continuous sounds into words. With more difficulty, they can split words into component sounds, or ‘segments’. However, it is not always clear where to stop splitting. In the word strip, for example, should the sound represented by the letters str be treated as a unit, be split into the two sounds represented by st and r, or be split into the three sounds represented by s, t and r?

One approach to isolating component sounds is to look for ‘distinctive unit sounds’ or phonemes.¹ For example, three phonemes can be distinguished in the word cat, corresponding to the letters c, a and t (but of course English spelling is notoriously non-phonemic, so correspondence of phonemes and letters should not be expected).

How do we know that these three are ‘distinctive unit sounds’ or phonemes of the English language? NOT from the sounds themselves. A speech spectrograph will not show a neat division of the sound of the word cat into three parts. Rather we know these are phonemes because BOTH of the following are true:

• The three are ‘unit’ sounds. A different English word cannot be formed by replacing part of the c sound and part of the a sound by a different sound. The whole of a phoneme must be replaced to make a valid English word.
Thus the c sound in cat is a ‘unit’ sound because it can be removed entirely to change cat into at, or replaced entirely by a b sound to change cat into bat.
• The three are ‘distinctive’ sounds. Changing a single phoneme in cat is sufficient to make a word which is recognizably different to a speaker of English. The words bat, kit and cad are each minimally different from the word cat but are recognizably different words to an English speaker.

In summary, a phoneme is defined as a ‘distinctive unit sound’ of a language: ‘unit’ because the whole of a phoneme must be substituted to make a different word; ‘distinctive’ because changing a single phoneme can generate a word which is recognizably different to a speaker of the language.

Note that ‘phoneme’ is a subjective concept, not an objective one. To test whether a particular sound operates as a phoneme in a language we cannot use instruments such as speech spectrographs. Rather we have to ask a speaker of the language whether removing that sound from a word and substituting another (preferably one already known to be a phoneme of the language) generates a new word (or what could be a new word if the new sound sequence doesn’t exist in the language).

Since English spelling in particular is non-phonemic, we need some way of consistently representing phonemes. I will use IPA (International Phonetic Alphabet) symbols where appropriate. You are NOT expected to learn these; a table (see Appendix) will be given if and when required. By convention, phonemic representations of sounds are enclosed in slashes. Thus the English words discussed earlier, cat, bat, kit and cad, can be represented phonemically as /kæt/, /bæt/, /kɪt/ and /kæd/. Comparing ‘minimal pairs’ confirms that /k/, /b/, /æ/, /ɪ/, /t/ and /d/ are indeed English phonemes; e.g. /æ/ is a phoneme because in the word cat it can be substituted by /ɪ/ to make the word kit.
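The mechanical part of the minimal-pair comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transcriptions are hand-written, the names `LEXICON`, `is_minimal_pair` and `contrasts` are hypothetical, and the code cannot replace the speaker's judgement that two words really are different.

```python
def differing_positions(a, b):
    """Return the indices at which two phoneme sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def is_minimal_pair(a, b):
    """True if a and b have the same length and differ in exactly one phoneme."""
    return len(a) == len(b) and len(differing_positions(a, b)) == 1


# Hypothetical hand-written transcriptions, one list item per phoneme.
LEXICON = {
    "cat": ["k", "æ", "t"],
    "bat": ["b", "æ", "t"],
    "kit": ["k", "ɪ", "t"],
    "cad": ["k", "æ", "d"],
}


def contrasts(word, lexicon):
    """List (other_word, phoneme_in_word, phoneme_in_other) for each minimal pair."""
    base = lexicon[word]
    found = []
    for other, phones in lexicon.items():
        if other != word and is_minimal_pair(base, phones):
            i = differing_positions(base, phones)[0]
            found.append((other, base[i], phones[i]))
    return found


print(contrasts("cat", LEXICON))
# prints [('bat', 'k', 'b'), ('kit', 'æ', 'ɪ'), ('cad', 't', 'd')]
```

Each tuple records one substitution that turns cat into another word, which is the evidence the minimal-pair test relies on.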
(Note that these six might or might not be phonemes in another language.)

It’s important to note what I have NOT said. I have not said that a phoneme corresponds to a specific sound. Indeed it does not. No two individuals pronounce the English /k/ phoneme in exactly the same way; for one thing, their vocal tracts are of different shapes. Neither does an individual produce exactly the same sound on different occasions. More importantly, the pronunciation of a phoneme is affected by its neighbours in a word. For example, there is a consistent difference between the pronunciation of the /k/ phoneme in cat and its pronunciation in kit. In normal speech phonemes ‘run together’. One consequence is that because /æ/ is pronounced further back in the throat than /ɪ/, any preceding /k/ will be as well.

A phoneme of a language represents a CLUSTER of similar sounds which a speaker of that language does not regard as distinctively different from one another. I will return to this issue later.

¹ I’ve noticed that a common mistake in reproducing this definition in examinations is to replace distinctive by distinct. Don’t! Distinctive here refers to the ability of a phoneme to distinguish between words; distinct would just mean that the phonemes were different, which isn’t the same.

2 Production of Phonemes

Remembering that a phoneme represents a cluster of sounds treated in some sense as equivalent by speakers of a given language, some 40-odd phonemes can be distinguished in most dialects of English. Although not all the sounds corresponding to a phoneme are produced in exactly the same way, for each phoneme we can describe the ‘typical’ way in which it is produced.

The sounds corresponding to all English phonemes are powered by lung air being pushed out. A sound is then produced in two ways:

• By vibrating the vocal ‘cords’: two muscular folds of skin low down in the throat which can be made to vibrate.
The frequency of the vibration can be changed (within limits).
• By altering the positions of components of the throat and mouth between the vocal cords and the exit of air. These alterations may merely modify the note produced by the vocal cords (by changing the size of the cavity) or may themselves produce a noise (for example by causing air friction).

Vowels

When lung air passes over the vibrating vocal cords and then passes freely out of the mouth, the sounds are called vowels. Thus vowels can be continued until you run out of breath. The positions of the lips and tongue alter the size and shape of the resonating cavity to produce different sounds. Vowels can be classified along a number of independent dimensions, including:

• The height of the tongue (i.e. the size of the smallest opening).
• The part of the tongue (front to back) causing the smallest opening.
• The degree of lip rounding (open to rounded).

Some examples in ‘Standard English English’ (SEE):

/i/ is a high, front, unrounded vowel, as in beet /bit/ or neat /nit/.²
/ɑ/ is a low, back, unrounded vowel, as in bar and bath.
/u/ is a high, back, rounded vowel, as in spoon.

In English, back vowels (other than the very lowest) are automatically rounded, whereas front and mid vowels are not, so that a classification needs only two dimensions, as in the table below. Note that ‘Standard English English’ (SEE) pronunciation is intended.

        Front        Mid                       Back
High    /i/ beat                               /u/ boot
        /ɪ/ bit                                /ʊ/ put
Mid     /ɛ/ bet      /ə/ about, Bert, sofa     /ɔ/ bought
Low     /æ/ bat      /ʌ/ but ³                 /ɒ/ pot
                                               /ɑ/ bar

² A more precise representation of this phoneme is /iː/, where the ː shows length. I will generally ignore length distinctions; the theory presented here is intended to be the minimum necessary.
³ Although it is traditional to use /ʌ/ for this English phoneme, strictly this is not the correct IPA symbol.
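The two-dimensional classification, with rounding derived from the other two features, can be written down as a small data structure. The following Python sketch is illustrative only: the type `Vowel`, the list `VOWELS` and the function `is_rounded` are hypothetical names, the symbols are the usual IPA choices for the example words, and treating /ɑ/ as "the very lowest" back vowel is an assumption based on its description as unrounded.

```python
from typing import NamedTuple


class Vowel(NamedTuple):
    symbol: str
    height: str   # "high", "mid" or "low"
    part: str     # part of the tongue: "front", "mid" or "back"
    example: str


# One entry per pure vowel, using one example word for each.
VOWELS = [
    Vowel("i", "high", "front", "beat"),
    Vowel("ɪ", "high", "front", "bit"),
    Vowel("u", "high", "back", "boot"),
    Vowel("ʊ", "high", "back", "put"),
    Vowel("ɛ", "mid", "front", "bet"),
    Vowel("ə", "mid", "mid", "about"),
    Vowel("ɔ", "mid", "back", "bought"),
    Vowel("æ", "low", "front", "bat"),
    Vowel("ʌ", "low", "mid", "but"),
    Vowel("ɒ", "low", "back", "pot"),
    Vowel("ɑ", "low", "back", "bar"),
]

# Assumption: the "very lowest" back vowel, the one exempt from rounding, is /ɑ/.
LOWEST_BACK = "ɑ"


def is_rounded(vowel):
    """Back vowels other than the very lowest are rounded; front and mid are not."""
    return vowel.part == "back" and vowel.symbol != LOWEST_BACK


print([v.symbol for v in VOWELS if is_rounded(v)])
# prints ['u', 'ʊ', 'ɔ', 'ɒ']
```

Because rounding is computed rather than stored, each vowel needs only the two features, height and part of the tongue.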
In addition to these ‘pure’ vowels, English makes considerable use of diphthongs: sequences of two vowels ‘run together’ to form a SINGLE phoneme. A diphthong may include vowels not normally found alone. Examples of SEE diphthongs are given in the following table.

/eɪ/ baby, wait, day     /oʊ/ bone, soap, no     /ɪə/ ear, cheer ⁴
/aɪ/ kite, cry           /aʊ/ cow, out           /ɛə/ air, share
/ɔɪ/ coin, toy                                   /ʊə/ tour

In principle, vowels are infinitely variable as the position of the tongue and lips can be varied continuously. (This makes learning to make the correct vowel sounds in a foreign language very difficult.) Languages tend to use different sets of more-or-less distinct vowels.

English dialects vary greatly in the vowel phonemes used. In particular, American English differs considerably from English English. This is relevant because the English speech synthesis software currently available is often based on Standard American English (SAE). The main difference is that in SAE the back vowels /ɔ/, /ɒ/ and /ɑ/ are usually replaced by /a/, so that the vowels in taught, tot and tart are all pronounced as /a/, the first sound in the SEE diphthong /aɪ/. (However, /ɔ/ is retained before /r/.) The pure vowels /e/ and /o/ may be substituted for the SEE diphthongs /eɪ/ and /oʊ/. Further, most Americans at least partially pronounce r sounds which SEE speakers omit. Thus ear in SAE is closer to /ɪr/ than to SEE /ɪə/.

Stops

By contrast with vowels, some sounds are made by completely stopping and then releasing the flow of air out of the mouth. These sounds are called stops (or plosives). In SEE there are three stop positions, corresponding to the initial phonemes in pale, tale and kale. The sound is stopped respectively by the lips (bilabial), by the front of the tongue and the ridge behind the top teeth (alveolar), and by the back of the tongue and the soft palate (velar).