A Preliminary Study of Nuosu Yi Syllable Frequency in Text Revised 2015 February

Presented at the International Conference on Yi-Burmese Languages and Linguistics (ICYBLL), Chengdu, Sichuan, China, 2012 November

Dennis Walters, Doerthe Schilken, and Susan Walters
SIL International, East Asia Group

Abstract

The standard written form of Nuosu Yi includes 1,165 symbols. The symbols correspond to phonemic syllables as spoken in the Shengzha variety of Northern Yi. Aspiring readers of Nuosu Yi text must memorize this sound-symbol correspondence. Studies across languages and writing systems have shown that early study of frequently used symbols can speed learning progress (Dale and Chall 1948; Hu and Catts 1998; Johnson, Smith, and Jensen 1972). While software exists for doing corpus-based analysis of Yi language data (Chen 2010; 2011), until now the literature lacked an ordered list of frequently used Nuosu syllables. This study lists Nuosu Yi syllables in order of their frequency of occurrence in a body of text. The corpus includes nine texts, containing a total of 23,536 syllables. In the sample, 783 unique syllables occurred at least once. The cumulative usage data show that a person with reading knowledge of 402 symbols (of the available 1,165) could have read 95% of our text sample. The Nuosu Yi syllable frequency data are also compared with Mandarin Chinese syllable frequency (Sung 2005).

Introduction

Since the Yi language syllabary was approved by the China State Council in 1980, its use has been successfully popularized in the Liangshan region (Lewis, Simons, and Fennig 2014; Bradley 2009). Nuosu people take pride in seeing their written language on public signs, in their schools, on television, and in the Liangshan Daily newspaper (Yi language version). The writing system is used in traditional Nuosu culture, and there is a body of popular literature available in bookstores.
Yi language material is increasingly available on the Internet as well, and it is clear that many people are studying written Yi language, both in school and informally.

The task for aspiring readers of Nuosu Yi is to memorize the sound-symbol correspondence; most of them find that learning to read takes considerable effort and time. Hu and Catts (1998) showed that high-frequency symbols are more readily learned than low-frequency symbols in logographic orthographies as well as in alphabetic ones. This means that knowing the most frequently used characters can help a teacher teach effectively and a reader learn more quickly.

Until now, a quantitative study listing the most commonly used Nuosu Yi syllables has not been publicly available. This study lists the syllables that occur in a body of Nuosu Yi text, in order of their frequency of occurrence (Appendix). It is intended to provide a preliminary set of data, to explore and refine the method, and to suggest directions for future research.

Nuosu Yi syllables and symbols

The Nuosu Yi syllabary is based on a traditional writing system used among the Yi people of southern Sichuan Province (Huang 2001; Chen et al. 1985). Since its approval, it has been used, in addition to the national language, in education in the Liangshan region.

The syllabary includes 819 basic symbols, plus a syllable iteration character and punctuation. Unlike an abugida (or alphasyllabary) system, the Nuosu Yi system pairs a single unitary symbol with each basic phonemic syllable. The symbols generally do not have systematic variations that could help a reader memorize the corresponding sounds, except that mid-high tone syllables are formed by adding an inverted breve mark above related basic symbols. There is also a syllable iteration character ꀕ, which stands in for the second occurrence of a reduplicated syllable: ꈀꎭꎭ → ꈀꎭꀕ. Including mid-high tone symbols and the iteration character, there are 1,165 symbols (Table 1).
Because of the one-to-one correspondence between syllables and their symbols, we refer to Nuosu “symbols” and “syllables” interchangeably in this paper.

Commonly recognized varieties of Northern Yi include Yinuo and Tianba in the north, Shengzha in the central and southwestern parts of Liangshan, and Suodi and Adur in the south (Bradley 2001; Chen et al. 1985). The standard syllabary is based on phonemic analysis of the Shengzha variety as spoken in the vicinity of Xide. Because it is a phonemic system, native speakers of most Northern Yi varieties find at least an approximate match between the syllables they speak and the symbols in the syllabary.

Table 1. Number of standard Yi symbols

Nuosu Symbols            Count
Basic symbols              819
Mid-high tone symbols      345
Iteration symbol “w”         1
Total                    1,165

Traditionally, Nuosu writing was taught in the home by bimos, the keepers and agents of Nuosu traditional religion, equipping their sons, and sometimes their daughters, to use the writing in Nuosu folk culture. Teaching involved memorization of traditional poetry and other texts.

A literacy campaign in the 1950s promoted a romanized writing system, not using the traditional Nuosu symbols. While the romanized system was easy to learn, Nuosu people preferred their traditional writing. Further study and development resulted in approval of the character-based Scheme for Standard Yi Writing (China State Council 1980; Chen et al. 1985). After that time, public education in Liangshan began to include a special track using Nuosu Yi language as the medium of instruction for all subjects.

Currently, home instruction and school instruction ensure that some Nuosu Yi people become confident readers, yet the proportion is small, and many others still desire to learn to read their own language.

Text Corpus

The text corpus under study was a collection of material readily available to the authors in electronic form.
As shown in Table 2, about half the material is narrative, including some transcribed oral material. Another forty percent or so is behavioral, in the form of traditional proverbs and poetry. About five percent of the material is hortatory or expository in Longacre’s (1996) classification.

The variety in genre as well as in written versus spoken text gives a measure of balance to the corpus. Still, the present sample has a greater proportion of poetry and proverbs than anything else. Because of this, we might expect a reduced frequency of some function words, and a greater proportion of content words (nouns, verbs, and descriptors) than we would see in a more balanced sample.

Table 2. Text corpus by size and genre

Text           Description                                  Genre               Syllable Count  Proportion of Total
Witch          Folk tale from traditional Nuosu culture     Folk Narrative               2,029                 8.6%
Day die        Folk tale from traditional Nuosu culture     Folk Narrative                 861                 3.7%
Firewood       Young person describes a daily life task.    Personal Narrative             215                 0.9%
Flood          Mythical flood account.                      Poetic Narrative             8,525                36.2%
Proverbs       Poetic proverbs.                             Proverbs                    10,242                43.5%
Magpie         Old person recounts a childhood experience.  Personal Narrative             328                 1.4%
No fight       A teacher warns students not to fight.       Hortatory                      187                 0.8%
Welcome        A teacher welcomes new students.             Hortatory                      180                 0.8%
Sewing needle  Adult recounts an experience as a student.   Personal Narrative             969                 4.1%
Total count                                                                             23,536                 100%

Data Processing

Finding the relative frequency of language symbols is done by combing through volumes of text, listing units found there, counting their occurrences, and storing the results. Afterward, the data may be collated and presented in various ways. Automated techniques for storing and processing text have been available almost since the invention of electronic computers.
For Yi language material, Shama (2000) initially used a double-byte encoding, similar to what was done for Chinese characters before Unicode. This scheme allowed for input, storage, editing, and typesetting of Yi language material. Later, Yi characters were included in the GB18030 standard, and in Unicode since version 3.0 (Unicode Consortium 2000). These developments have greatly facilitated computer processing of Yi language data.

Data shown in this study were extracted in the following steps:

1. Install Primer (Weber 1999) software and set up a project for the language under study.
2. Choose texts and prepare the electronic files.
3. Create working copies of data files for analysis.
4. Strip each text of metadata, leaving raw text only.
5. For each text, use BabelPad (West 2004) or a similar utility to convert Yi symbols to romanized form with spaces between syllables.
6. Place each file in the directory where Primer will expect to find data.
7. Use Primer to generate the frequency word list.
8. Import the frequency word list to a spreadsheet program.
9. Sort the data, record counts, generate histogram, etc.

This workflow yielded the desired data, but with some drawbacks. For example, the iteration character ꀕ appears in our frequency list as number 42, although in the text corpus it actually stands for a number of different characters. Ideally, the software would automatically identify the reduplicated syllables and correct the counts. Also, Primer’s counting feature expected text data to be presented in romanized form with spaces between counted forms, so we converted the character texts to romanized form.

With newer tools, syllable counts and word counts may be done more simply. PrimerPro (Schroeder 2011), an updated version of Primer, is Unicode compliant and has a graphical user interface.
UnicodeCCount (Warfel and White 2011) may produce the ordered frequency list without the need to convert syllabary symbols to romanized form. Alternatively, a skilled programmer could automate the entire process of harvesting electronic data and analyzing it for frequency, as described in Chen (2010, 2011).

Results

As shown in Table 3, the most frequently occurring character in our sample, ꃅ /mu/ ‘do; ADVR’, occurred 707 times.
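
The counting and cumulative-usage calculations described in this paper can be sketched directly on Unicode text, without romanization. The Python below is a minimal illustration only, not the workflow used in this study (which relied on Primer and a spreadsheet). The function names `yi_syllables`, `frequency_list`, and `symbols_needed` are hypothetical. The sketch assumes that the text uses the Unicode Yi Syllables block (U+A000 to U+A48C), that the iteration character ꀕ is encoded as U+A015, and that an iteration character always repeats the Yi syllable most recently seen.

```python
from collections import Counter

# Assumption: text is encoded with the Unicode "Yi Syllables" block.
YI_FIRST, YI_LAST = 0xA000, 0xA48C
ITERATION_MARK = "\uA015"  # the syllable iteration character


def yi_syllables(text, expand_iteration=True):
    """Yield each Yi syllable in text, skipping all other characters.

    When expand_iteration is True, an iteration mark is replaced by the
    Yi syllable most recently seen (a simplifying assumption).
    """
    previous = None
    for ch in text:
        if not (YI_FIRST <= ord(ch) <= YI_LAST):
            continue  # punctuation, spaces, digits, Latin letters, etc.
        if ch == ITERATION_MARK and expand_iteration and previous is not None:
            ch = previous
        previous = ch
        yield ch


def frequency_list(text, expand_iteration=True):
    """Return (syllable, count) pairs ordered from most to least frequent."""
    return Counter(yi_syllables(text, expand_iteration)).most_common()


def symbols_needed(freq, coverage=0.95):
    """Return how many top-ranked symbols are needed to reach the coverage share."""
    total = sum(count for _, count in freq)
    running = 0
    for rank, (_, count) in enumerate(freq, start=1):
        running += count
        if running >= coverage * total:
            return rank
    return len(freq)


if __name__ == "__main__":
    sample = "ꈀꎭ" + ITERATION_MARK  # the reduplication example from the text
    freq = frequency_list(sample)
    for syllable, count in freq:
        print(syllable, count)
    print("symbols needed for 95% coverage:", symbols_needed(freq))
```

Resolving the iteration character before counting credits each reduplicated syllable to the syllable it repeats, which addresses the drawback noted under Data Processing. Passing `expand_iteration=False` instead counts ꀕ as a symbol in its own right, as in the frequency list reported here.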
