
A Survey of Media and Data Processing Development for Written Taiwanese

Iûⁿ Ún-giân, Lecturer of Dahan Inst. Tech.
Henry H. Tân-Tēⁿ, Montclair State University

Abstract

Many proposals have been advanced for writing Taiwanese; none has yet been designated official. Of all the proposals, Pe̍h-ōe-jī (POJ) has the longest history and the largest corpus of didactic materials, dictionaries, and literary works. This paper surveys developments in Taiwanese media and data processing as they pertain to POJ. In the former category we include the print media of books, periodicals, and newspapers, as well as the broadcast media and the Internet. In the latter we discuss the evolution of Taiwanese textual processing, segmentation, machine translation, and Unicode support. Finally we propose a preliminary program for applied computational linguistics in Taiwanese, for the purpose of revitalizing and advancing written Taiwanese.

Keywords: Pe̍h-ōe-jī (POJ), written Taiwanese, vernacular literature, media, textual processing, computational linguistics

1. Introduction

Broadly speaking, “Taiwanese” refers to the languages of the Taiwanese people, including Holo¹, Hakka, and the Austronesian languages. Of all the groups the Holo are the most numerous, accounting for over 70% of the population (Huang 1995). For this reason Holo has itself been known as Taiwanese since at least the Japanese colonial period a century ago. In this article we use the two terms interchangeably.

The Holo written language and literature, or Tâi-gú-bûn (TGB), can ultimately be traced to the Southern Min dramas dated 1566 (G 1995), prior to the era of mass migration from China to Taiwan. At that time Han characters were primarily employed in the classical language, not in service of the written vernacular.
The earliest orthography tailored to the language (and related dialects) is undoubtedly Pe̍h-ōe-jī (POJ, “vernacular writing”), traceable to the 1832 A Dictionary of the Hok-këèn Dialect of the Chinese Language by the missionary Walter Henry Medhurst (Heylen 2001; Klöter 2002). Since then the romanization scheme has undergone a number of minor changes and is currently stable. It has, moreover, been adapted to provide for the minority Hakka language.

POJ initially served the interests of the Protestant missionaries and local Taiwanese converts and their descendants, particularly the illiterate and semi-literate. Until the 1980s much of the POJ literature therefore revolved around Christian themes, followed distantly by education of a more general nature.

Whereas both the Japanese and Chinese Nationalist regimes on the island had a history of suppressing romanization, the post-martial law era (1987-) of democratic reforms saw the emergence of small, competing organizations seeking to promote written Taiwanese of one form or another. In this lively if sometimes fractious environment, POJ found renewed usage in a newly secularized context even as its functions in the religious domain appeared to keep declining. Concomitant with this development was the switch from a single script to mixed scripts within running text: among those familiar with POJ, the mainstream preference today is to mix Han (Chinese) characters and romanization (a practice known as hàn-lô), particularly in formal publication² (TiuN 1998: 230).

Many new phonetic systems and orthographies, at least 64 in one study, emerged during this period (Iûⁿ and Tiuⁿ 1999). By virtue of its long history and recent revival, POJ is presently the system with the most numerous and varied publications, including didactic materials, dictionaries, and literary works.
The status derived therefrom is nevertheless insecure, as a number of alternative systems have sought legitimacy via endorsement by the political and academic establishments, at the same time engaging in publishing efforts of their own³.

In the sections to follow we first survey the use of POJ in various types of media over the past century and more, in its capacity as an orthography or phonetic scheme for Holo. Other orthographic or annotative choices, such as Han characters, are not considered. We then summarize recent efforts in the area of POJ Taiwanese computing, including text manipulation tools, pedagogic tools, and research in applied computational linguistics.

2. Development of written Taiwanese in the media

This section is divided into three parts, namely the print media (books, periodicals, newspapers), the broadcast media, and the newly emerged Internet.

2.1 Print media: books, periodicals, newspapers

We have collected a substantial bibliography of publications utilizing POJ as an orthography or as a phonetic annotation system for Han characters. We have additionally consulted an as-yet unpublished bibliography by Lī Heng-chhiong.

2.1.1 Books

Printed books are by far the most common sources of POJ. Table 1 identifies several categories of books employing POJ. In principle, literary works of an overtly religious nature (e.g. The Psalms) are classified as “religious works.”

Table 1. POJ Books By Year and Type

Type         Year N/A Pre-1900  1900~09  1910~19  1920~29  1930~39  1940~49  1950~59  1960~69  1970~79  1980~89  1990~99  2000~02  Total (%)
Religious          40       51       11       27       38       56       30      119       80       11       11       26        3   503 (60)
Didactic            2        4        -        -        9        5        2       20        9        3        9       58       15   136 (16)
Literary            -        -        -        1        4        -        1        9        4        -        1       71       17   108 (13)
Reference           -       15        1        1        5        3        2        -        1        4        2       13        2     49 (6)
Specialized         -       11        2        3        5        2        2        -        3        1        4        8        3     44 (5)
Total              42       81       14       32       61       66       37      148       97       19       27      176       40  840 (100)
(%)               (5)     (10)      (2)      (4)      (7)      (8)      (4)     (18)     (12)      (2)      (3)     (21)      (5)

1. Religious works: These account for the single largest category, over 500 titles (60%) in our catalog, all Christian. Publishers no longer reprint most of them.
2. Language teaching materials: Over 130 titles are known, including some bilingual textbooks in English or Japanese.
3. Literary works: Including fiction, prose, poetry, drama, translated works, and folk literature. At least 100 titles are known, the earliest indigenous fictional work being Mother’s Tears (1925) by Lōa Jîn-seng (g 2000).
4. References: Over 40 volumes, including dictionaries featuring Mandarin, English, Japanese, and Spanish. Other references concern geography, proverbs, and botany.
5. Specialized texts: Covering subjects other than language pedagogy, such as mathematics, astronomy, medicine, botany, and social commentary. In addition, in recent years a few academic papers and monographs in Taiwanese have appeared in spite of the general reluctance of institutions to accept them.

2.1.2 Newspapers

As of this writing no newspaper or similar publication, either religious or secular in orientation, exists in Taiwanese. Historically, Taiwan Church News (originally Taiwan Prefectural City Church News) published in the Holo language using POJ. Founded in 1885, it was the first newspaper in any language to publish on the island. As the longest-publishing paper as well, it is in the unique position of having documented Taiwanese society through a century of Manchurian, Japanese, and Chinese rule, and of having done so in a major language of the masses. According to the Committee on History of the Presbyterian Church of Formosa, in 1928 and 1932 it merged with three regional church newspapers also publishing in POJ (2000: 190). In 1942 the Japanese colonial authorities forced it to cease publishing. It resumed publication after 1945, but in 1970 it succumbed to pressure from the Chinese Nationalist regime and abandoned the traditional language in favor of the official Mandarin (see section 2.1.4).
This policy was partly reversed in the 1990s with the inclusion of a “special column” consciously devoted to the mother tongue, and then mostly in hàn-lô. The space afforded it was quite limited and increasingly so; eventually this column also came to include Mandarin content.

Other historical newspapers include one regional Presbyterian weekly publishing in the early 1940s and a Catholic newspaper in the mid-1930s. We believe additional sources have yet to be unearthed.

In the 1990s several mass-circulation (Mandarin) dailies featured occasional columns on language-related topics by well-known teachers or activists. Neither the front-page news nor the popular entertainment sections, however, have offered Taiwanese alternatives, even as headlines incorporating Holo catchwords have become more fashionable⁴. Indeed the literary pages have been devoted to works in the standard Mandarin language (the rare poem notwithstanding).

As is the case with periodicals (section 2.1.3), immigrant newspapers in the United States have been more willing to experiment with new forms. The Pacific Times and particularly Taiwan Tribune have featured original and reprinted articles in Taiwanese. The historically more radical Taiwan Tribune pioneered Taiwanese-language editorials in the 1990s; its literature section often carried poetry and essays in the language. Unlike the Taiwan-based papers, both of these arrange the text horizontally, a format aesthetically compatible with hàn-lô.

In 2002 the trilingual Taiwanese Children’s Newspaper began soliciting subscribers. Apparently responding to the market potential partly driven by the English and local-language components of the Nine-year Comprehensive Curriculum for Elementary and Junior High Education, it targeted elementary schoolchildren and their parents by offering content in Mandarin, Taiwanese, and English. Even though the Taiwanese portion accounted for only a quarter of the space, the concept was unprecedented.
2.1.3 Periodicals

As with newspapers, all known POJ periodicals (primarily monthlies) prior to the 1970s were Church-published. Of these, the last new periodical to appear did so in 1960, and the last to cease publication did so in 1969. The longest-lived periodical ran for fifteen years (ah-miā ê bí-niû, 1954-68). The first known periodical of a secular nature was the Taiwanese Language and Literature Bi-monthly (1977-1979). As the domestic political milieu of the time prohibited such a publication, it was produced and published in the United States (and distributed outside Taiwan).