TIPA: a System for Processing Phonetic Symbols in LATEX
Total Page:16
File Type:pdf, Size:1020Kb
TIPA: A System for Processing Phonetic Symbols in LATEX Fukui Rei Department of Asian and Pacific Linguistics, Institute of Cross-Cultural Studies, Faculty of Letters, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, TOKYO 113 Japan [email protected] Introduction TIPA Encoding TIPA1 is a system for processing IPA (International Selection of symbols The selection of TIPA pho- Phonetic Alphabet) symbols in LATEX. It is based netic symbols5 was made based on the following 2 AFONT on TSIPA but both MET source codes and works. LATEX macros have been thoroughly rewritten so • Phonetic Symbol Guide [9] (henceforth abbre- that it can be considered as a new system. viated as PSG). Among many features of TIPA, the following • The official IPA charts of ’49, ’79, ’89 and ’93 are the new features as compared with TSIPA or any versions. other existing systems for processing IPA symbols. • Recent articles published in the JIPA6,such • A new 256 character encoding for phonetic sym- as “Report on the 1989 Kiel Convention” [6], bols (‘T3’), which includes all the symbols and “Further report on the 1989 Kiel Convention” diacritics found in the recent versions of IPA [7], “Computer Codes for Phonetic Symbols” and some non-IPA symbols. [3], “Council actions on revisions of the IPA” • Complete support of LATEX2ε. [8], etc. • • Roman, slanted, bold, bold extended and sans An unpublished paper by J. C. Wells: “Com- serif font styles. puter-coding the IPA: a proposed extension of SAMPA” [10]. • Easy input method in the IPA environment. • Popular textbooks on phonetics. • Extended macros for accents and diacritics.3 More specifically, TIPA contains all the sym- • A flexible system of macros for ‘tone letters’. bols, including diacritics, defined in the ’79, ’89 and • An optional package (vowel.sty) for drawing ’93 versions of IPA. And in the case of the ’49 version vowel diagrams.4 of IPA, which is described in the Principles [5], • A slightly modified set of fonts that go well there are too many obsolete symbols and only those when used with Times Roman and Helvetica symbols that had had some popularity at least for fonts. some time or for some group of people are included. Besides IPA symbols, TIPA also contains sym- 1 TIPA stands for TEXIPAor Tokyo IPA. The primary bols that are useful for the following areas of pho- ftp site in which the latest version of TIPA is placed is netics and linguistics. ftp://tooyoo.L.u-tokyo.ac.jp/pub/TeX/tipa,andalsoit • is mirrored onto the directory fonts/tipa of the CTAN Symbols used in the American phonetics (e.g. £ ± « archives. ¯, , , ,etc.). 2 TSIPA was made in 1992 by Kobayashi Hajime, Fukui • Symbols used in the historical study of Indo- Rei and Shirakawa Shun. It is available from a CTAN archive. ß ÿ Þ º » One problem with TSIPA was that symbols already in- European languages (e.g. þ, , , , , ,and a e cluded in OT1, T1 or Math fonts are excluded, because of accents such as , ,etc.). the limitation of its 128 character encoding. As a result, a • Symbols used in the phonetic description of string of phonetic representation had to be often composed § ¢ µ languages in East Asia (e.g. ¥, , , , ,etc.). of symbols from different fonts, disabling the possibility of automatic inter-word kerning. And also too many symbols • Diacritics used in ‘extIPA Symbols for Disor- had to be realized as macros. dered Speech’ [4] and ‘VoQS (Voice Quality 3 These macros are now defined in a separate file called Symbols)’ [1] (e.g. n ,f, m, etc). ‘ ’ in order for the authors of other packages to " exaccent.sty " be able to make use of them. The idea of separating these It should be also noted that TIPA includes all macros from other ones was suggested by Frank Mittelbach. the necessary elements of ‘tone letters’, enabling 4 This package (vowel.sty) can be used independently from the TIPA package. Documentation is also made sepa- 5 In the case of TSIPA, the selection of symbols was based rately in ‘vowel.tex’ so that no further mention will be made on “Computer coding of the IPA: Supplementary Report” [2]. here. 6 Journal of the International Phonetic Association. 102 TUGboat, 17, Number 2 — Proceedings of the 1996 Annual Meeting TIPA: A System for Processing Phonetic Symbols in LATEX all the theoretically possible combinations of the ’0 ’1 ’2 ’3 ’4 ’5 ’6 ’7 tone letter system. In the recent publication of the ’00x International Phonetic Association tone letters are Accents and diacritics admitted as an official way of representing tones ’04x but the treatment of tone letters is quite insuffi- ’05x Punctuation marks cient in that only a limited number of combina- ’06x Basic IPA symbols I (vowels) tion is allowed. This is apparently due to the fact ’07x Diacritics, etc. that there has been no ‘portable’ way of combining ’10x symbols that can be used across various computer Basic IPA symbols II environments. Therefore TEX’s productive system of macro is an ideal tool for handling a system like ’13x Diacritics, etc. tone letters. ’14x Pct. AFONT In the process of writing MET source Basic IPA symbols III codes for TIPA phonetic symbols there have been (lowercase letters) many problems besides the one with the selection ’17x Diacr. of symbols. One of such problems was that some- ’20x times the exact shape of a symbol was unclear. Tone letters and other For example, the shapes of the symbols such as  ’23x suprasegmentals (Stretched C), J (Curly-tail J) differ according to ’24x sources. This is partly due to the fact that the Old IPA, non-IPA symbols IPA has been continuously revised for the past few ’27x decades, and partly due to the fact that different ’30x ways of computerizing phonetic symbols on different Extended IPA symbols systems have resulted in the diversity of the shapes ’33x Gmn. of phonetic symbols. ’34x Although there is no definite answer to such a Basic IPA symbols IV problem yet, it seems to me that it is a privilege of ’37x Gmn. AFONT those working with MET to have a systematic way of controlling the shapes of phonetic symbols. Pct. = Punctuation marks, Diacr. = Diacritics, Gmn. = Symbols for Germanic languages. Encoding The 256 character encoding of TIPA is now officially called the ‘T3’encoding.7 In deciding Table 1: Layout of the T3 encoding this new encoding, care is taken to harmonize with existing other encodings, especially with the T1 encoding. Also the easiness of inputting phonetic symbols is taken into consideration in such a way text encodings. However it is a matter of trade-off to that frequently used symbols can be input with decide which punctuation marks are to be included. small number of keystrokes. For example ‘:’ and ‘;’ might have been preserved in Table 1 shows the layout of the T3 encoding. T3 but in this case ‘:’ has been traditionally used as The basic structure of the encoding found in the a substitute for the length mark ‘ :’ so that I decided first half of the table (character codes ’000-’177) to exclude ‘:’ in favor of the easiness of inputting the is based on normal text encodings (ASCII, OT1 length mark by a single keystroke. and T1) in that sectioning of this area into several The encoding of the section for accents and groups such as the section for accents and diacritics, diacritics is closely related to T1 in that the accents the section for punctuation marks, the section for commonly included in T1 and T3 have the same numerals, the sections for uppercase and lowercase encoding. letters is basically the same with these encodings. The sections for numerals and uppercase letters Note also that the T3 encoding contains not are filled with phonetic symbols that are used fre- only phonetic symbols but also usual punctuation quently in many languages, because numerals and marks that are used with phonetic symbols, and in uppercase letters are usually not used as phonetic such cases the same codes are assigned as the normal symbols. And the assignments made here are used as the ‘shortcut characters’, which will be explained 7 In a discussion with the LATEX2ε team it was suggested that the 128 character encoding used in WSUIPA would be in the section entitled “Ordinary phonetic symbols” refered to as the OT3 encoding. (page 105). TUGboat, 17, Number 2 — Proceedings of the 1996 Annual Meeting 103 Fukui Rei ASCII :;" For a table of the T3 encoding, see Appendix C ; " TIPA : (page 114). ASCII 0123 456789 TIPA fonts 1 2 3 4 5 6 7 8 9 TIPA 0 ASCII @ABC DEFGHI This version of TIPA includes two families of IPA A B C D E F G H I TIPA @ fonts, tipa and xipa. The former family of fonts ASCII JKLM NOPQRS is for normal use with LATEX, and the latter family K L M N O P Q R S TIPA J is intended to be used with ‘times.sty’(PSNFSS). ASCII TUVW XYZ| They all have the same T3 encoding as explained in U V W X Y Z | TIPA T the previous section. • tipa Table 2: TIPA shortcut characters Roman: tipa8, tipa9, tipa10, tipa12, tipa17 As for the section for uppercase letters in the Slanted: tipasl8, tipasl9, tipasl10, usual text encoding, a series of discussion among tipasl12 the members of the ling-tex mailing list revealed Bold extended: tipabx8, tipabx9, that there seem to be a certain amount of consensus tipabx10, tipabx12 on what symbols are to be assigned to each code. Sans serif: tipass8, tipass9, tipass10, For example they were almost unanimous for the tipass12, tipass17 B D S assignments such as A for A, for B, for D, for S, Bold: tipab10 T for T, etc.