Title: Encoding Finno-Ugric Phonetic Alphabet (FUPA) in the UCS
Source: Michael Everson and Klaas Ruppel
Version: 2.0
Date: 1998-11-02

The Finno-Ugric Phonetic Alphabet (FUPA)

0. Introduction

This paper presents a collection of characters, diacritics, and notation marks used in the FUPA scheme. This transcription has been, and still is, used creatively by different scientists in different countries at different times (Lagercrantz probably made the most baroque use of the system); therefore the phonetic or phonematic values of the elements of the FUPA are intentionally left aside here. However, all users of this scheme have one thing in common: they use the FUPA in a technical way, as described below. We prefer the term “FUPA” to “FUT” (Finno-Ugric Transcription) because of its analogy to the IPA (International Phonetic Alphabet).

It is not the intention of this paper to encourage exact (or over-exact) phonetic transcription or the extensive use of diacritics. The intention is rather to facilitate the use of the FUPA by the Uralicist community in the context of the Universal Character Set (UCS), by ensuring that a comprehensive analysis of the system makes it possible to encode FUPA texts, past and present, with the UCS.

In the exploratory versions of this document, firm advice on how to use FUPA will not be given; when the study is complete, however, advice will be given on which characters in the UCS correspond to which letters used in the FUPA system. An example of this, we believe, will be the advice that LATIN SMALL LETTER ENG be used in coded representations of FUPA texts, past and present, for the velar nasal, in preference to GREEK SMALL LETTER ETA, although a great many printed texts use an eta glyph (η) and not an eng glyph (ŋ). This project will lead to a standardization and normalization of the FUPA which will enable Uralicists to use the UCS unambiguously to represent their texts, past, present, and future.
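The eng-versus-eta advice above could be applied mechanically to existing text. The following is only a minimal sketch in Python: the function name `normalize_velar_nasal` and the sample string are hypothetical illustrations, not part of the FUPA analysis or of any standard.

```python
import unicodedata

GREEK_ETA = "\u03b7"  # GREEK SMALL LETTER ETA, often printed for the velar nasal
LATIN_ENG = "\u014b"  # LATIN SMALL LETTER ENG, the character advised for coded text


def normalize_velar_nasal(text: str) -> str:
    """Replace every Greek eta with Latin eng (hypothetical helper)."""
    return text.replace(GREEK_ETA, LATIN_ENG)


# Hypothetical sample: a transcription keyed with eta in place of eng.
sample = "ka\u03b7a"
normalized = normalize_velar_nasal(sample)

# Show each character of the result with its UCS position and name.
for ch in normalized:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

A blanket replacement of this kind would also alter genuine Greek text, so in practice it would have to be restricted to passages known to be FUPA transcription.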
We are well aware that this collection is not yet complete and may have other faults. Comments are invited; the point of this project is to arrive at a consensus on the FUPA in the UCS for all future work in Uralics. Characters are identified by their glyphs, their UCS positions, and their names. If no UCS position (no hexadecimal code) is given, the character is not presently found in the UCS (ISO/IEC 10646 and Unicode), and is therefore a candidate for inclusion. It is intended that missing characters be proposed for addition to the UCS.

1.0. Basic elements. The basic elements of the FUPA are SMALL LETTERs. The normal display of FUPA examples is italic. Full texts may be presented in plain style. Letters crossed out here have not yet been identified as having been used in any Uralicist source. Letters followed by an asterisk (*) are problematic characters which need further clarification and discussion. Further definitions are given below. The glyphs below are given in plain style, not in italics.

NOTE: In order to facilitate discussion of this document, the following comment with regard to the asterisked characters should be noted. The IPA and the FUPA are different systems, but there is some overlap between them. The IPA does not use the curly g, for instance, but instead always uses the script ɡ to represent the voiced velar stop. No FUPA text has yet been found which uses both at the same time for different purposes. The question is: shall we, in standardizing the FUPA, abolish the distinction between g and ɡ, or shall we ensure that both characters are defined, so that the choice of one or the other is left to the discretion of the Uralicist? In ordinary text this is not so important, since both characters are already in the UCS, but the superscript forms, ᵍ and ᶢ, are not. Both of these characters must, if the distinction is considered important, be proposed to the responsible committees.
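To make the g and ɡ question concrete, the two letters can be inspected as coded characters. This is a small illustrative sketch using Python's standard `unicodedata` module; the helper name `describe` is an assumption made for the example.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the UCS position and character name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


curly_g = "g"        # LATIN SMALL LETTER G
script_g = "\u0261"  # LATIN SMALL LETTER SCRIPT G

for ch in (curly_g, script_g):
    print(describe(ch))

# Compatibility normalization leaves the script g untouched, so software
# will not merge the two letters on its own.
print(unicodedata.normalize("NFKC", script_g) == curly_g)  # False
```

Because the two are separate coded characters that normalization does not unify, any merging of g and ɡ would have to come from a convention adopted by the FUPA user community.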
This distinction is even more acutely questionable with regard to the letters alpha and epsilon; that is, with regard to the choice between the IPA alpha and open e, and the Greek alpha and epsilon. A choice must be made with regard to FUPA practice as to which of the available characters should be used. Texts do differentiate between a round script Latin ɑ and a round crossed α (Greek alpha), but does this mean that one (in non-italics) is Latin a and one is Latin ɑ, or is the distinction between Latin ɑ and Greek α? Or is one Latin a, with an italic ɑ, and the other Latin ɑ, with an italic α? Latin open e is always single-humped ɛ, but Greek epsilon may be double-humped ε or single-humped ϵ. Is the latter acceptable in FUPA transcription? This issue is probably the first which the URA-LIST should undertake with regard to discussion of the present paper.

1.1. Base character. A character with no diacritics attached to it. Base characters can be added to the UCS.

1.2. Precomposed character. A character with one or more diacritical marks attached to it. Precomposed characters include, for example, ä, ü, and å (regardless of their use as basic letters in Germanic and Finnic alphabets). There are many precomposed characters already encoded in the UCS, but it is proposed that, for FUPA support, no additional precomposed characters should be added to the UCS. The FUPA must make use of Level 3, combining character technology, precisely because it is a dynamic and productive system.

1.3. Combining characters. A combining character as defined in the UCS is the same as what Uralicists call a diacritic or a diacritical mark.

1.4. The repertoire of base characters. In version 2 of this document, citations for attested characters here and a complete bibliography will be given.

Glyph  UCS   Name                                        Citation
a      0061  LATIN SMALL LETTER A *                      Sovijärvi & Peltola 1977:3
ɐ      0250  LATIN SMALL LETTER TURNED A *               Itkonen 1986:7
ɑ      0251  LATIN SMALL LETTER ALPHA *                  Sovijärvi & Peltola 1977:4
ɒ      0252  LATIN SMALL LETTER TURNED ALPHA *           Sovijärvi & Peltola 1977:4
æ      00E6  LATIN SMALL LETTER AE                       Sovijärvi & Peltola 1977:3
-            LATIN SMALL LETTER TURNED AE                Lehtisalo 1956:cvii
-            LATIN SMALL LETTER SIDEWAYS AE              Sovijärvi & Peltola 1977:3
b      0062  LATIN SMALL LETTER B                        Itkonen 1986:7
ƀ      0180  LATIN SMALL LETTER B WITH STROKE            Itkonen 1992:15
c      0063  LATIN SMALL LETTER C                        Itkonen 1986:7
d      0064  LATIN SMALL LETTER D                        Sovijärvi & Peltola 1977:3
đ      0111  LATIN SMALL LETTER D WITH STROKE            Itkonen 1986:7
ð      00F0  LATIN SMALL LETTER ETH                      Sovijärvi & Peltola 1977:3
e      0065  LATIN SMALL LETTER E                        Sovijärvi & Peltola 1977:3
ɛ      025B  LATIN SMALL LETTER OPEN E *                 Itkonen 1986:7
ə      0259  LATIN SMALL LETTER SCHWA                    Sovijärvi & Peltola 1977:4
-            LATIN SMALL LETTER TURNED OPEN E *          Itkonen 1958:xxxiii
f      0066  LATIN SMALL LETTER F                        Sovijärvi & Peltola 1977:3
g      0067  LATIN SMALL LETTER G *                      Sovijärvi & Peltola 1977:3, Itkonen 1986:7
ǥ      01E5  LATIN SMALL LETTER G WITH STROKE            Itkonen 1992:15
ɡ      0261  LATIN SMALL LETTER SCRIPT G *               Toivonen 1948:xxvii, Itkonen 1958:xxxii
ɣ      0263  LATIN SMALL LETTER GAMMA *
h      0068  LATIN SMALL LETTER H                        Sovijärvi & Peltola 1977:3
ɦ      0266  LATIN SMALL LETTER H WITH HOOK              Sovijärvi & Peltola 1977:4
i      0069  LATIN SMALL LETTER I                        Sovijärvi & Peltola 1977:3
-            LATIN SMALL LETTER TURNED I *               Sovijärvi & Peltola 1977:4
j      006A  LATIN SMALL LETTER J                        Sovijärvi & Peltola 1977:3
k      006B  LATIN SMALL LETTER K                        Sovijärvi & Peltola 1977:3
l      006C  LATIN SMALL LETTER L                        Sovijärvi & Peltola 1977:3
ł      0142  LATIN SMALL LETTER L WITH STROKE            Sovijärvi & Peltola 1977:3
m      006D  LATIN SMALL LETTER M                        Sovijärvi & Peltola 1977:3
n      006E  LATIN SMALL LETTER N                        Sovijärvi & Peltola 1977:3
ŋ      014B  LATIN SMALL LETTER ENG                      Itkonen 1986:7
o      006F  LATIN SMALL LETTER O                        Sovijärvi & Peltola 1977:3
-            LATIN SMALL LETTER SIDEWAYS O               Sovijärvi & Peltola 1977:4
-            LATIN SMALL LETTER SIDEWAYS DIAERESIZED O
ø      00F8  LATIN SMALL LETTER O WITH STROKE            Sovijärvi & Peltola 1977:3
-            LATIN SMALL LETTER SIDEWAYS O WITH STROKE   Sovijärvi & Peltola 1977:4
ɵ      0275  LATIN SMALL LETTER BARRED O                 Itkonen 1958:xxx
ɔ      0254  LATIN SMALL LETTER OPEN O                   Sovijärvi & Peltola 1977:4
-            LATIN SMALL LETTER SIDEWAYS OPEN O          Sovijärvi & Peltola 1977:4
œ      0153  LATIN SMALL LIGATURE OE                     Sovijärvi & Peltola 1977:3
-            LATIN SMALL LETTER TURNED OE                Sovijärvi & Peltola 1977:4
-            LATIN SMALL LETTER UK                       Sovijärvi & Peltola 1977:10
-            LATIN SMALL LETTER TOP HALF O               Itkonen 1958:xxxii
-            LATIN SMALL LETTER BOTTOM HALF O            Lagercrantz 1939:146
p      0070  LATIN SMALL LETTER P                        Sovijärvi & Peltola 1977:3
q      0071  LATIN SMALL LETTER Q                        Sovijärvi & Peltola 1977:3
r      0072  LATIN SMALL LETTER R                        Sovijärvi & Peltola 1977:3
ɹ      0279  LATIN SMALL LETTER TURNED R                 Sovijärvi & Peltola 1977:4
s      0073  LATIN SMALL LETTER S                        Sovijärvi & Peltola 1977:3
ʃ      0283  LATIN SMALL LETTER ESH                      Benkő 1993:xviii
ß      00DF  LATIN SMALL LETTER SHARP S                  Benkő 1993:xviii
t      0074  LATIN SMALL LETTER T                        Sovijärvi & Peltola 1977:3
ŧ      0167  LATIN SMALL LETTER T WITH STROKE            Sinor 1988:276
u      0075  LATIN SMALL LETTER U                        Sovijärvi & Peltola 1977:3
-            LATIN SMALL LETTER SIDEWAYS U               Sovijärvi & Peltola 1977:4
-            LATIN SMALL LETTER SIDEWAYS DIAERESIZED U   Sovijärvi & Peltola 1977:4
ɯ      026F  LATIN SMALL LETTER TURNED M                 Sovijärvi & Peltola 1977:4
-            LATIN SMALL LETTER SIDEWAYS TURNED M        Lehtisalo 1956:cvi
v      0076  LATIN SMALL LETTER V                        Itkonen 1986:7
ʌ      028C  LATIN SMALL LETTER TURNED V *               Sovijärvi & Peltola 1977:4
w      0077  LATIN SMALL LETTER W                        Sovijärvi & Peltola 1977:3
ʍ      028D  LATIN SMALL LETTER TURNED W
x      0078  LATIN SMALL LETTER X
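Sections 1.2 and 1.3 rely on representing a letter with diacritics as a base character followed by combining characters, rather than as a single precomposed character. A minimal sketch of that idea in Python follows, using the standard `unicodedata` module. The helper name `compose` and the choice of eng with an acute accent are assumptions made for illustration; they are not taken from any FUPA source.

```python
import unicodedata


def compose(base_name: str, *mark_names: str) -> str:
    """Build a base-plus-diacritics sequence from UCS character names."""
    base = unicodedata.lookup(base_name)
    marks = "".join(unicodedata.lookup(name) for name in mark_names)
    return base + marks


# A precomposed character decomposes into a base letter plus a combining mark.
precomposed = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS
decomposed = unicodedata.normalize("NFD", precomposed)
print([f"U+{ord(ch):04X}" for ch in decomposed])  # ['U+0061', 'U+0308']

# Hypothetical combination with no precomposed form: it can only be
# represented as a sequence of two coded characters.
eng_acute = compose("LATIN SMALL LETTER ENG", "COMBINING ACUTE ACCENT")
print([unicodedata.name(ch) for ch in eng_acute])

# Canonical composition finds nothing to merge, so the length stays 2.
print(len(unicodedata.normalize("NFC", eng_acute)))
```

The same approach extends to stacks of several diacritics on one base letter, which is why a productive system such as the FUPA does not need a precomposed character for every attested combination.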
