ISO/IEC JTC1/SC2/WG2 N2957
L2/05-183
2005-08-02

Universal Multiple-Octet Coded Character Set
International Organization for Standardization
Organisation internationale de normalisation
Международная организация по стандартизации

Doc Type: Working Group Document
Title: Preliminary proposal to add medievalist characters to the UCS
Source: Michael Everson, Odd Einar Haugen, António Emiliano, Susana Pedro, Florian Grammel, Peter Baker, Andreas Stötzner, Marcus Dohnicht, Diana Luft
Status: Expert Contribution
Action: For consideration by JTC1/SC2/WG2 and UTC
Date: 2005-08-02

Introduction. A set of characters used by specialists in medieval European philology and linguistics is absent from the Universal Character Set. These characters differ in nature; some are original ligatures which acquired letter status due to their phonemic value; some are letterforms distinct from other letterforms innovated to distinguish sounds; some are combining diacritical letters used in abbreviations or suspensions of various kinds; and some are best described as “letters with syllabic content”. This is not a complete proposal, but a work in progress; all of the example Figures referred to will be added in the next version of this document. Note that the glyphs used in the code charts here may not be optimal.

Theoretical preliminaries. Contemporary medievalist philologists and linguists want to be able to represent typographically (in printed format and on computer screens) the character sets which were in use for many centuries in several regions of medieval Europe. Those character sets derive from the common Latin script and contained many characters which simply disappeared with the development of contemporary printing conventions. Early printers made abundant use of “special” medieval characters, but eventually these fell out of use, with notable exceptions like $, ¶, &, Ç, ˜, @, and the ¯ used in Ireland.
Contemporary philologists and linguists who want to study the graphemic conventions in use in medieval times, thereby drawing solid or grounded conclusions about the nature and structure of the language systems represented in writing, must rely on bona fide transcriptions of the texts. Bona fide transcriptions are only possible when the elemental character set used in the manuscripts is encoded uniquely and available for use in fonts. What most philologists did in the 20th century was to publish transliterations, that is, editions which substitute modern characters (or sequences of modern characters) for the original medieval characters. Transliteration-based editions are virtually useless for those scholars who are interested in the study of medieval writing systems, phonology, and even textual structure. Transliterations (or “normalized editions”) and even translations may, of course, be required for editions aimed at students or the general public, but the base texts must result from transcribing the sources: the first step must always be a transcription.

Transcription is an editorial process which does not entail the replacement or the distortion of the original character set. A bona fide transcription of a Runic, Coptic, or Egyptian Hieroglyphic text can be no less than a close rendition of the original characters, using typographic versions of Runic letters, Coptic letters, and Hieroglyphic characters. The same practice must apply to European medieval texts. The practice of expanding abbreviations, common to medievalists throughout the 20th century, is not transcription, but simply transliteration: abbreviations, or brachygraphemes, were special characters. (Brachygraphy is, according to the Oxford English Dictionary, “The art or practice of writing with abbreviations or with abbreviated characters; shorthand, stenography.”) Many of these graphemes were polyvalent; that is, they could be transliterated into different sequences of “normal letters”, according to textual context, country, region, time period, and even individual scribal practices. Polyvalence is not an uncommon feature of alphabetic writing systems, as our own modern spelling systems show; for instance, in European Portuguese orthography the letter E can have the values [E], [e], [I], [i], [j], [i],8 [ı8$], and Ø; this letter combined with -m or -n can further represent [e$] and [å$ı8$]. Graphemic polyvalence results in many instances from language change and from the conservative nature of spelling systems.

Another common practice in the 20th century was to eliminate the original punctuation and to add modern punctuation; many scholars believed that medieval punctuation served no discernible purpose. “Modernization” of the use of capital letters was also standard practice. A medieval character set makes it possible to shed many “chronocentric” biases and prejudices which tainted many editorial efforts in the 19th and 20th centuries in many countries, rendering the ensuing editions virtually useless for at least some kinds of contemporary research.

This does not mean that medievalist scholars wish or even need to represent in print every minutia that handwritten sources present; that is palaeography proper. In medieval texts this is a particularly delicate issue, because scholars have to deal with a considerable amount of regional or individual stylistic variation. The rationale behind encoding medieval characters and designing medieval fonts is not to capture in print every single glyph variation (a task which is virtually impossible and also meaningless), but to capture the character set used in the manuscripts under scrutiny. We understand the character/glyph model and how it applies to the medieval character set.
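
The difference between transcription and transliteration described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the substitution table, the function names, and the sample line are all invented for this example, and the two special characters used (LONG S, U+017F, and TIRONIAN SIGN ET, U+204A) were chosen because they are already encoded in the UCS. The sketch counts distinct word forms before and after transliteration, showing that the substitution merges forms which the transcription keeps apart.

```python
from collections import Counter

# Hypothetical transliteration table (illustrative values only): each
# special character is replaced by one modern spelling, although a
# polyvalent sign may call for different expansions in different contexts.
TRANSLITERATION = {
    "\u017f": "s",    # LATIN SMALL LETTER LONG S
    "\u204a": "and",  # TIRONIAN SIGN ET, expanded here as English "and"
}


def transliterate(text: str) -> str:
    """Replace each special character with its modern substitute."""
    return "".join(TRANSLITERATION.get(ch, ch) for ch in text)


def wordlist(text: str) -> Counter:
    """Count whitespace-separated word forms."""
    return Counter(text.split())


# Made-up sample line, not taken from any manuscript.
transcription = "\u017fo \u204a so and"
transliteration = transliterate(transcription)

distinct_in_transcription = len(wordlist(transcription))
distinct_in_transliteration = len(wordlist(transliteration))
print(distinct_in_transcription, distinct_in_transliteration)
```

With this sample the transcription yields four distinct word forms and the transliteration only two, and the mapping cannot be inverted: once the substitution has been made, nothing in the output records which "s" or "and" stood for a special character in the source.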
Accurate transcriptions of medieval texts allow scholars to quote medieval texts without distorting their graphemic content, and allow the texts to be studied by means of computer applications such as concordance generators and wordlist generators. Accurate transcriptions which make use of a medieval character set are a means to preserve and interchange all the relevant graphemic, textual, and linguistic information contained in a text; they are also indirectly a means to contribute to the preservation of Europe’s early heritage.

Case-pairing. Most of the casing pairs shown below are attested in the examples. Those which are not fall into two categories: those for which no capital can be constructed (such as LONG S) and those for which natural capitals can be easily formed. In an early version of this document we had proposed a single lower-case character <µ> LATIN SMALL LETTER Y WITH STROKE in use by some Welsh medievalists to indicate an epenthetic schwa sound (Figure 8). Subsequently we discovered that this character and its capital are being balloted in FPDAM2, as a character used in the Lubuagan Kalinga language of the Philippines. Because of the general structural feature of the Latin script (from a theoretical point of view), and in order to facilitate modern casing operations for these letters, we have judged it appropriate to supply case-pairs for all the letters which admit of them. In a scholarly publication, for instance, an article title at the top of a journal page might be set in all caps; it would be nonsensical for all but one or two of the medievalist Latin letters to be able to be cased with an all caps command. (This precedent was set with the encoding of the archaic Coptic extensions.)

Discussion.

1. Letters used for medieval Welsh.

<ê ë> The voiceless lateral fricative written <ll> in modern Welsh may be written in medieval Welsh as a joined ligature, <ë> LATIN SMALL LETTER MIDDLE-WELSH LL.
Its capital form <ê> LATIN CAPITAL LETTER MIDDLE-WELSH LL has been used as an abbreviation in Welsh scholarship (Figures 1, 2, 3, 4, 5, 52).

<¢> The voiced dental fricative written <dd> in modern Welsh is written <¢> LATIN SMALL LETTER INSULAR D distinctly from <d> in some editions of medieval Welsh texts; Nordic medievalists also make use of this letter (Figures 1, 5, 21, 29, 39, 40, 53, 73).

<ß> Some Welsh medievalists (and other Indo-Europeanists of a certain era) also use <ß> LATIN SMALL LETTER SCRIPT D to write this sound in transcription. While this letter may sometimes have been represented in print by using a DELTA from a Greek lead-type font, it derives from the handwritten Latin d, behaves like a Latin letter in ordering, and is found alongside Greek text proper (Figures 6, 7, 8, 41).

<∏ ≥> A unique <≥> LATIN SMALL LETTER MIDDLE-WELSH V is used distinctly from <u>, <v>, and <w>, though it is true to say that the phonetic value of all four of these letters is polyvalent in medieval Welsh (Figures 3, 9, 10, 11, 12, 13).

<º ¥> Some Welsh medievalists use LATIN SMALL LETTER Y WITH LOOP to indicate the schwa sound of <y> (Figure 14).

<≤ ¡> As in many medieval traditions, <¡> LATIN SMALL LETTER R ROTUNDA is distinguished from <r> LATIN SMALL LETTER R. This named character is derived from a positional variant of <r> following <o> in the South Italian Beneventan, though in medieval Welsh and Nordic it is not limited to this position. In any case, Welsh and Nordic medievalists distinguish R from R ROTUNDA in their printed editions; the letter is also common in early printed texts throughout Europe (Figures 1, 3, 16, 17, 21, 24, 35, 36, 38, 42, 70, 71, 73). The case-pairing <≤> LATIN CAPITAL LETTER R ROTUNDA is attested in printed texts from the 15th century (Figure XXX).

2. Letters used for medieval