Spanish Adaptation of SAMPA and Automatic Phonetic Transcription
Llisterri, J., & Mariño, J. B. (1993). Spanish adaptation of SAMPA and automatic phonetic transcription. SAM-A/UPC/001/V1. ESPRIT Project 6819 SAM-A, Speech Technology Assessment in Multilingual Applications. http://liceu.uab.cat/~joaquim/publicacions/SAMPA_Spanish_93.pdf
ESPRIT PROJECT 6819 (SAM-A) Speech Technology Assessment in Multilingual Applications
Report Title: Spanish adaptation of SAMPA and automatic phonetic transcription
Document No: SAM-A/UPC/001/V1
Status: Final
Date: 19.2.1993 Revised: 20.4.1993
Source: Joaquim Llisterri, Universidad Autónoma de Barcelona
Tel 34 3 581 12 16 Fax 34 3 581 16 86
José B. Mariño, Universidad Politécnica de Cataluña
Tel 34 3 401 64 37 Fax 34 3 401 64 47
Note: This work was partially supported by Spanish Government TIC 91-1488-C06-02
1. The Spanish adaptation of SAMPA
1.1. Phonetic and phonological inventories of Spanish
1.1.1. The phonetic inventory
The traditional description of the inventory of phonetic segments used by Peninsular Spanish speakers is found in Navarro (1918). His early descriptive work can be supplemented with the detailed list of allophones compiled by Canellada and Kuhlman Madsen (1987). The number of allophones cited by these two sources amounts to 20 vocalic elements and 43 consonantal segments. According to these traditional sources, the causes of allophonic variability can be summarized as follows:
Assimilations of place of articulation
• Interdental allophones of /t/, /n/ and /l/, the labiodental allophone of /m/, dental allophones of /n/, /s/ and /l/ and the palatal and velar allophones of /n/ are included in this category. The occurrence of these allophones is always conditioned by the place of articulation of the following segment.
• The vowels can be modified according to the following consonant: /a/ has a palatal allophone and a velar one; the quality of the other vowels is also changed by the following consonant in the same syllable, producing open and close varieties.
Changes in manner of articulation
• The approximant allophones of /b/, /d/ and /g/ and the affricate allophone of /y/ appear according to the character of the preceding consonant.
Devoicing and voicing
• /b/, / /, /r/ and /g/ have devoiced allophones in syllable-final position before a voiceless consonant.
• /θ/ and /s/ have voiced allophones when they are followed by a voiced consonant.
Position in the syllable and syllabic type
• According to Navarro (1918) the Spanish vowels have close and open allophones depending on the structure of the syllable in which they appear; vowels tend to be close in CV syllables and open in CVC syllables.
• The vowels /i/ and /u/ have allophonic variants -- known as semiconsonants or semivowels -- according to their nuclear or peripheral position in the syllable.
• The approximant allophones of /b/, /d/ and /g/ and the affricate allophone of /y/ are also conditioned by their position in the syllable.
Position in the word
• Initial vs. non-initial position in the word is another factor that controls the appearance of the approximant allophones of /b/, /d/ and /g/ and the affricate allophone of /y/.
• Traditional phoneticians such as Navarro (1918) distinguish lax allophones of the vowels depending on their position within the word.
Stress
• Like position in the word, position relative to the main stress also gives rise to lax allophones of the vowels. The affricate allophone of /y/ is also conditioned by stress.
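As an illustration of the place assimilations listed above, the velar allophone of /n/ can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the function name `assimilate_nasal`, the list-of-symbols input and the set of velar contexts are choices made for the example, and the symbols are the SAMPA ones introduced in section 1.4.

```python
# Assumption: the velar consonants that trigger the velar allophone of /n/.
VELAR_CONSONANTS = frozenset({"k", "g", "x"})


def assimilate_nasal(segments):
    """Return a copy of `segments` in which "n" becomes "N" before a velar consonant."""
    result = list(segments)
    for i in range(len(result) - 1):
        # The allophone is conditioned by the place of the FOLLOWING segment.
        if result[i] == "n" and result[i + 1] in VELAR_CONSONANTS:
            result[i] = "N"
    return result


if __name__ == "__main__":
    print(assimilate_nasal(["o", "n", "g", "o"]))  # hongo
    print(assimilate_nasal(["n", "a", "D", "a"]))  # nada
```

The other assimilations of place (interdental, dental, labiodental and palatal allophones) would follow the same pattern with different context sets.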
1.1.2. The phonological inventory
On the other hand, the inventory of phonological units proposed by classical authors such as Alarcos (1950) consists of 5 vowels and 19 consonants. The phonological segments identified by Alarcos are the following (transcribed according to IPA conventions):
Phoneme (IPA)   Description
p               voiceless labial plosive
b               voiced labial plosive
f               voiceless labiodental fricative
t               voiceless dental plosive
d               voiced dental plosive
θ               voiceless interdental fricative
tʃ              voiceless palatal affricate
ʝ               voiced palatal fricative
s               voiceless alveolar fricative
k               voiceless velar plosive
g               voiced velar plosive
x               voiceless velar fricative
m               voiced labial nasal
n               voiced alveolar nasal
ɲ               voiced palatal nasal
l               voiced alveolar lateral
ʎ               voiced palatal lateral
ɾ               voiced alveolar tap
r               voiced alveolar trill
a               central open vowel
e               front mid vowel
i               front close vowel
o               back mid rounded vowel
u               back close rounded vowel
In order to be able to use a manageable number of units, but also to ensure a certain amount of phonetic detail, a compromise has been sought between the maximal number of allophones and the relatively short list of phonological units.
1.2. Statistical study of the occurrence of Spanish allophones
To arrive at such a compromise, a statistical study of the frequency of occurrence of the Spanish allophones was undertaken. Since no data on the distribution of the allophones were available¹, a pilot experiment was carried out to evaluate the frequency of occurrence of the set of allophones described in the literature.
Three native Spanish speakers aged between 20 and 40 were each interviewed by one experimenter for around one hour to obtain a large sample of speech. The interviewer kept interventions to a minimum, so that semi-spontaneous guided interviews were obtained. The recordings took place in an acoustically controlled environment using professional recording equipment. An orthographic transcription was made, introducing punctuation according to prosodic, syntactic and semantic criteria. This transcription was the input to an automatic grapheme-to-allophone conversion programme that generated a phonetic output with most of the allophones described in the literature. A sample of more than 100,000 segments was obtained, and the frequency of occurrence of each allophone, as well as other parameters, was computed.
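The frequency computation described above can be sketched as follows. This is a minimal illustration, not the programme used in the study: the function name `allophone_frequencies` and the toy list of segments are assumptions, and the input is taken to be a flat sequence of allophone symbols.

```python
from collections import Counter


def allophone_frequencies(segments):
    """Return the percentage of occurrence of each symbol in `segments`."""
    counts = Counter(segments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {symbol: 100.0 * count / total for symbol, count in counts.items()}


if __name__ == "__main__":
    # Toy sample (hypothetical), standing in for the corpus of more than
    # 100,000 segments described in the text.
    sample = ["l", "a", "G", "o", "k", "a", "D", "a"]
    for symbol, percentage in sorted(allophone_frequencies(sample).items()):
        print(f"{symbol}\t{percentage:.2f}")
```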
1.3. Final inventory for Spanish
A final inventory was established by eliminating all the allophones with a frequency of occurrence below 0.10% in the corpus analyzed. Following this procedure, 31 segments were retained. The following table shows the IPA transcription for each allophone, its phonetic definition, the frequency of occurrence in the analyzed corpus and the frequency of occurrence quoted by Rojo (1991) when available.
IPA   Phonetic definition                  % of occurrence in     % of occurrence according
                                           the corpus analyzed    to Rojo (1991)
p     voiceless bilabial plosive           2.6                    2.66
b     voiced bilabial plosive              0.45                   2.66
t     voiceless dental plosive             4.63                   4.48
d     voiced dental plosive                0.76                   4.79
k     voiceless velar plosive              4.04                   3.98
g     voiced velar plosive                 0.11                   0.95
m     voiced bilabial nasal                3.63                   3.09
n     voiced alveolar nasal                7.02                   6.99
ɲ     voiced palatal nasal                 0.27                   0.19
ŋ     voiced velar nasal                   0.46                   included in /n/
tʃ    voiceless palatal affricate          0.40                   0.28
β     voiced bilabial approximant          2.47                   included in /b/
f     voiceless labiodental fricative      0.51                   0.68
θ     voiceless interdental fricative      1.53                   1.68
ð     voiced dental approximant            3.20                   included in /d/
s     voiceless alveolar fricative         6.95                   7.58
z     voiced alveolar fricative            1.33                   included in /s/
ʝ     voiced palatal fricative             0.19                   0.22
x     voiceless velar fricative            0.63                   0.73
ɣ     voiced velar approximant             0.79                   included in /g/
l     voiced alveolar lateral              4.25                   5.08
ʎ     voiced palatal lateral               0.54                   0.38
r     voiced alveolar trill                0.40                   0.79
ɾ     voiced alveolar tap                  4.25                   5.67
i     front close vowel                    4.29                   7.5
j     voiced palatal approximant           2.60                   included in /i/
e     front mid vowel                      13.72                  13.51
a     central open vowel                   13.43                  13.40
o     back mid rounded vowel               10.37                  9.57
u     back close rounded vowel             1.98                   3.16
w     voiced labial-velar approximant      1.35                   included in /u/

¹ Previously published studies were carried out considering only phonological segments (see, for example, Rojo (1991)).
Thus, the final inventory contains the 24 phonemes defined by Alarcos (1950) plus 7 segments traditionally considered allophones: the three approximant variants of the voiced plosives [β ð ɣ], the voiced allophone of /s/ [z], the velar allophone of /n/ [ŋ], and the two semiconsonants or semivowels [j w].
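The selection criterion of section 1.3 amounts to a simple threshold on the percentage of occurrence. The sketch below illustrates it; the function name `retained_inventory` is an assumption, the symbols are written in SAMPA, the three percentages for g, J and N are taken from the table above, and the entry for dZ is a hypothetical value added only to show a segment being dropped (the report gives no figure for it).

```python
THRESHOLD_PERCENT = 0.10  # allophones below this frequency are eliminated


def retained_inventory(frequencies, threshold=THRESHOLD_PERCENT):
    """Return the symbols whose frequency (in %) is not below `threshold`."""
    return sorted(
        symbol for symbol, percentage in frequencies.items()
        if percentage >= threshold
    )


if __name__ == "__main__":
    frequencies = {
        "g": 0.11,   # voiced velar plosive (from the table)
        "J": 0.27,   # voiced palatal nasal (from the table)
        "N": 0.46,   # voiced velar nasal (from the table)
        "dZ": 0.05,  # HYPOTHETICAL value, for illustration only
    }
    print(retained_inventory(frequencies))
```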
1.4. Phonetic notation in Spanish using SAMPA
In our Spanish adaptation of SAMPA we have taken into account the proposals made by Wells (1989: 52-53), which are summarized below:
Approximant allophones of / b d g /
Since [D] (IPA [ð]) and [G] (IPA [ɣ]) already exist in SAMPA, only [B] is needed to represent the approximant [β].
Alveolar trill
The digraph [rr] can be used to represent the alveolar trill.
Affricate allophone of /y/
The affricate allophone of /y/ can be symbolized by [dZ] (IPA [dʒ]). However, this allophone has not been retained in our basic inventory due to its low frequency of occurrence.
Palatal fricative consonant
According to Wells (1989: 52) the palatal fricative consonant /y/ can be considered an allophone of the semivowel [j]. Alarcos (1950, § 98), however, offers convincing functional arguments in favor of the phonological status of /y/ and cites minimal pairs contrasting /y/ with the other consonants of the phonological system. His solution is widely accepted in the literature on Spanish phonetics and phonology and, moreover, it does not conflict with native speakers' intuitions. It is widely accepted that the phoneme /y/ can be realized as a fricative, as an approximant, and also as an affricate under certain conditions. Following Wells' suggestion (personal communication), this phoneme will be represented in SAMPA by the digraph /jj/.
The following table summarizes the set of symbols that can be used in SAMPA to transcribe the phonemes and allophones of Spanish selected according to the previously described criteria.
IPA   SAMPA   Phonetic definition                  Example    Transcription
p     p       voiceless bilabial plosive           pala       "pala
b     b       voiced bilabial plosive              bala       "bala
t     t       voiceless dental plosive             tala       "tala
d     d       voiced dental plosive                dar        dar
k     k       voiceless velar plosive              cala       "kala
g     g       voiced velar plosive                 gala       "gala
m     m       voiced bilabial nasal                mala       "mala
n     n       voiced alveolar nasal                nada       "naDa
ɲ     J       voiced palatal nasal                 caña       "kaJa
ŋ     N       voiced velar nasal                   hongo      "oNgo
tʃ    tS      voiceless palatal affricate          chico      "tSiko
β     B       voiced bilabial approximant          lava       "laBa
f     f       voiceless labiodental fricative      falso      "falso
θ     T       voiceless interdental fricative      zona       "Tona
ð     D       voiced dental approximant            cada       "kaDa
s     s       voiceless alveolar fricative         sala       "sala
z     z       voiced alveolar fricative            desde      "dezDe
ʝ     jj      voiced palatal fricative             ayer       a"jjer
x     x       voiceless velar fricative            jamón      xa"mon
ɣ     G       voiced velar approximant             lago       "laGo
l     l       voiced alveolar lateral              la         la
ʎ     L       voiced palatal lateral               llana      "Lana
r     rr      voiced alveolar trill                carro      "karro
ɾ     r       voiced alveolar tap                  caro       "karo
i     i       front close vowel                    tila       "tila
j     j       voiced palatal approximant           labio      "laBjo
e     e       front mid vowel                      tela       "tela
a     a       central open vowel                   tal        tal
o     o       back mid rounded vowel               todo       "toDo
u     u       back close rounded vowel             tul        tul
w     w       voiced labial-velar approximant      agua       "aGwa
If there is a need to represent other allophones not present in the set of segments described, the following SAMPA symbols are available:
IPA   SAMPA   Phonetic definition                  Example    Transcription
dʒ    dZ      voiced palatal affricate             conyugal   kondZu"Gal
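The symbol set above lends itself to a simple lookup table. The following sketch converts a SAMPA transcription into IPA by trying two-character symbols (tS, jj, rr, dZ) before single characters. It is an illustration only: the names `SAMPA_TO_IPA` and `sampa_to_ipa` are assumptions, and the IPA values are the usual equivalents of the phonetic definitions given in the table rather than something specified by the SAMPA proposal itself.

```python
# SAMPA symbol -> IPA symbol (assumed standard equivalents).
SAMPA_TO_IPA = {
    "p": "p", "b": "b", "t": "t", "d": "d", "k": "k", "g": "g",
    "m": "m", "n": "n", "J": "ɲ", "N": "ŋ", "tS": "tʃ", "B": "β",
    "f": "f", "T": "θ", "D": "ð", "s": "s", "z": "z", "jj": "ʝ",
    "x": "x", "G": "ɣ", "l": "l", "L": "ʎ", "rr": "r", "r": "ɾ",
    "i": "i", "j": "j", "e": "e", "a": "a", "o": "o", "u": "u",
    "w": "w", "dZ": "dʒ",
    '"': "ˈ",  # stress mark placed before the stressed syllable
}


def sampa_to_ipa(transcription):
    """Convert a SAMPA string to IPA, matching digraphs before single symbols."""
    output = []
    i = 0
    while i < len(transcription):
        for size in (2, 1):
            chunk = transcription[i:i + size]
            if chunk in SAMPA_TO_IPA:
                output.append(SAMPA_TO_IPA[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"unknown SAMPA symbol at position {i}")
    return "".join(output)


if __name__ == "__main__":
    for word in ['"kaJa', '"tSiko', 'a"jjer', '"karro', '"karo']:
        print(word, sampa_to_ipa(word))
```

Matching the longest symbol first is what keeps the digraph rr (trill) distinct from a single r (tap).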
2. Automatic phonetic transcription for Spanish: generating SAMPA representations from orthographic representations
2.1. Grapheme to allophone correspondences
In order to produce an automatic transcription, the correspondences between the graphemes and the SAMPA symbols have to be established. The table below summarizes some of the main correspondences that have been taken into account in designing the transcription algorithm.
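To give an idea of what such correspondences look like in practice, the sketch below handles three of them: silent h, the digraph qu, and the choice between r and rr. It is not the algorithm of the report. The function name `transcribe_h_qu_r` is hypothetical, the trill contexts (word-initial position and after n, l or s) are an assumption based on standard Spanish orthography and on the examples honra and rasgo, all other letters are copied unchanged, and stress marks are not generated.

```python
# Assumption: consonant letters after which a single <r> is a trill.
TRILL_AFTER = frozenset({"n", "l", "s"})


def transcribe_h_qu_r(word):
    """Apply the <h>, <qu> and <r>/<rr> rules to a lowercase word."""
    output = []
    i = 0
    while i < len(word):
        letter = word[i]
        previous = word[i - 1] if i > 0 else ""
        if letter == "h" and previous != "c":
            i += 1                      # <h> has no sound (except in <ch>)
        elif word.startswith("qu", i):
            output.append("k")          # the <u> after <q> has no sound
            i += 2
        elif word.startswith("rr", i):
            output.append("rr")         # orthographic <rr> is always a trill
            i += 2
        elif letter == "r":
            is_trill = i == 0 or previous in TRILL_AFTER
            output.append("rr" if is_trill else "r")
            i += 1
        else:
            output.append(letter)       # every other letter is copied as is
            i += 1
    return "".join(output)


if __name__ == "__main__":
    for word in ["honra", "queso", "pera", "perro", "arpa"]:
        print(word, transcribe_h_qu_r(word))
```

A full transcriber would also need the context-dependent rules for the other graphemes, syllabification and stress assignment, none of which are attempted here.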
grapheme   rules for the transcription to SAMPA            examples
a
           after a pause: b
           after , ,
p                                                          perro: "perro
           always followed by : k                          queso: "keso
           : rr                                            honra: "onrra
           other cases: r                                  arpa: "arpa   trampa: "trampa   pera: "pera   amor: a"mor
s                                                          rasgo: "rrasGo   casa: "kasa   trasto: "trasto
           : no sound                                      queso: "keso
           in non-nuclear position in the syllable: w      cigüeña: Ti"GweJa
           in nuclear position in the syllable: u          lujo: "luxo

3. References
ALARCOS, E. (1950) Fonología española. Madrid: Gredos (Biblioteca Románica Hispánica, Manuales 1), 1965 4ª ed. aumentada y revisada.
CANELLADA, M. J. - KUHLMAN MADSEN, J. (1987) Pronunciación del español. Lengua hablada y literaria. Madrid: Castalia.
NAVARRO TOMÁS, T. (1918) Manual de pronunciación española. Consejo Superior de Investigaciones Científicas: Madrid, Instituto Miguel de Cervantes (Publicaciones de la Revista de Filología Española, III). 21ª edición, 1982.
ROJO, G. (1991) "Frecuencia de fonemas en español actual", in BREA, M. - FERNANDEZ REI, F. (Coord.) Homenaxe ó profesor Constantino García. Santiago de Compostela: Universidade de Santiago de Compostela, Servicio de Publicación e Intercambio Científico. Pp. 451-467.
WELLS, J. C. (1989) "Computer-coded phonemic notation of individual languages of the European Community", Journal of the International Phonetic Association 19,1: 31-54.