The Yurok Language Project at Berkeley
International Journal of Lexicography, Vol. 24 No. 4, pp. 405–419 doi:10.1093/ijl/ecr018
Advance access publication 20 July 2011

AN ONLINE DICTIONARY WITH TEXTS AND PEDAGOGICAL TOOLS: THE YUROK LANGUAGE PROJECT AT BERKELEY

Downloaded from https://academic.oup.com/ijl/article/24/4/405/1076335 by guest on 27 September 2021

Andrew Garrett: University of California, Berkeley

Abstract

In this paper, I report on an online dictionary project for a highly endangered Native language of North America. This project involves a dynamic lexicon, linked to a corpus of texts and enriched with several associated tools, which is designed to be useful for (and is regularly used by) scholars, language teachers, and language learners. The interest of this project for a broader audience emerges not from lexicographic innovations as such, but from how texts and lexicon are combined and how the interests of diverse user communities are addressed.

1. Background

1.1 Language context

Yurok is spoken in northwestern California, near the Oregon border; it is one of three distantly related branches of the Algic language family. One of the other two branches also consists of a single language, Wiyot, formerly spoken just to the south along the Pacific coast; Teeter (1964) presents an overview (the last first-language speaker of Wiyot died in the 1960s). The third Algic branch comprises the widely dispersed Algonquian language family of central and eastern North America; for a brief survey see Mithun (1999: 327–340). Like most indigenous languages of California, Yurok was spoken in pre-contact times by relatively few people in a small territory; there were about 2,500 Yurok people occupying 80 miles along the lower Klamath River and the Pacific coast (Kroeber 1925: 17). In 2011 there are still a few elderly first-language speakers of Yurok, but it is no longer easy for them to play an active role in language teaching.
(At the other end of the North American spectrum, compare Navajo, with over 150,000 speakers in Arizona and New Mexico, or the Algonquian language Ojibwe, with some 50,000 speakers in over three Canadian provinces and five US states.) For this reason, sound recordings and other linguistic documentation created over many decades are essential resources for language learners, teachers, curriculum designers, and academic researchers. At the University of California, Berkeley, the Yurok Language Project (YLP) has created a corpus of such documentary materials, and makes it available to users of an online lexicon at linguistics.berkeley.edu/yurok.

© 2011 Oxford University Press. All rights reserved.

To understand the structure and purposes of the YLP lexicon, it is helpful to know about the language's external circumstances: in particular, its history of documentation and the current situation of community members. Even among the hundreds or thousands of endangered languages in the world today, these circumstances are somewhat distinctive.

First, Yurok has a substantial record of previous documentation. Strictly speaking, this began in the nineteenth century with vocabulary and sentences taken down by explorers, settlers, and US government ethnographers, but serious linguistic research began in 1901. In that year the anthropologist A. L. Kroeber, a student of Franz Boas, came to the University of California and initiated a program of ethnographic and linguistic documentation of California's indigenous languages. California had some 80–100 indigenous languages belonging to over 20 distinct families; most were still spoken in thriving speech communities as late as 1850, and almost all were still spoken by communities or at least individuals when Kroeber began his research.
As it happens, Yurok was the California language on which he worked most intensively. Thus, at a time when everyone in the community still spoke Yurok, and elderly monolingual speakers had grown up prior to White contact (which was around 1850 along the Klamath River), Kroeber was able to document traditional speech genres (ceremonies, narrative, song, usufruct) and record extensive information about grammar and language usage (including register and style). In the first half of the twentieth century J. P. Harrington and Kroeber's Berkeley colleague T. T. Waterman also recorded ethnogeographical information in Yurok, and in the mid-twentieth century the linguists Edward Sapir and R. H. Robins did significant grammatical documentation. Most of these scholars published important work on Yurok (Waterman 1920, Robins 1958, Sapir 2001), but the bulk of the documentary material remains unpublished, in archives in Berkeley, Philadelphia, and Washington. In more recent decades, especially in the 1980s and 2000s, linguistic documentation with a more modern orientation has enriched the corpus further, as has community-based material prepared by native speakers working with the tribal language program (Exline n.d., Trull 2003).

Second, although the Yurok community, like all Native communities in the US, is economically disadvantaged, with very high unemployment and associated health and social problems, it has a good local technological infrastructure. Schools and tribal offices have computers and broadband internet connections, and computer literacy is common; Humboldt State University, part of the California State University system, is located in the area and is quite involved in Native education and cultural projects.
These facts, combined with the very small number of remaining first-language speakers of Yurok, make it both desirable and feasible to use earlier material in forms that can be distributed online in collaboration with language teaching programs. The Yurok Tribe has an active language revitalization program, situated administratively in its Education Department. Language classes are available in most area schools, from preschool through secondary school, and there are informal evening adult classes as well. Members of the Yurok community are highly motivated to learn their heritage language, with hundreds of people having acquired good basic vocabulary skills and some rudimentary conversational ability. But as the fluent first-language speakers are elderly, those who teach the language in schools are themselves also learners; they need access to information about vocabulary, usage, and pronunciation. Obviously, there is no national or even statewide curriculum support for teaching this one language from among the dozens of California indigenous languages whose communities would need such support.

1.2 Project goals

The Yurok Language Project has three main goals. One is language documentation, that is, linguistic fieldwork with the (few remaining) fluent speakers of Yurok: recording vocabulary, researching grammatical topics, and documenting language usage. A second goal is to develop a corpus of texts that is diverse in both genre and period (described further below). A third goal is to produce scholarly publications and community-oriented language materials.

Work associated with all three goals is organized around a lexicon project, and is driven by two assumptions. Our first assumption is that lexicographic and other linguistic claims should be transparent, in the sense that the data justifying those claims should be directly accessible.
Evidence supporting claims made in a traditional grammar or dictionary is usually not presented in full; space may make this impossible even if the researchers would prefer it. But Oxford English Dictionary users can now easily check lexicographic claims against online corpora such as Literature Online and the Corpus of Contemporary American English, and it would be equally desirable for users of small-corpus languages to be able to do the same. Insofar as possible, as a matter of scholarly transparency, the corpus underlying linguistic claims should be available to users.

Our second assumption is related: any stakeholder may engage with the lexicon or the text corpus as a researcher. In many traditional projects, there is a sharp distinction between scholarly researchers, whose findings determine the content of a grammar or dictionary, and other users. In an endangered-language project, with many engaged participants from the heritage community, maintaining this separation would inhibit progress. In this context, stakeholders include not only scholars interested in typological or theoretical claims about language, but teachers who wish to understand why a certain grammatical claim is made, why a word in the dictionary is defined or classified in a certain way, or why it is said to have the pronunciation it has; learners who wish to learn about usage patterns that are implied but not clearly exemplified in a dictionary entry; and community members who wish to take issue with definitions of plant or animal terms, based on the usage in their own families. The quality of documentary products is improved if such users are involved with analytic decisions, and if they have the ability to interpret the underlying data themselves. How this works in the YLP database is described below.

2. Lexicon and documentation database

The database associated with the Yurok lexicon and text corpus has three main elements (a corpus of audio recordings, a set of texts, and a lexicon), along with some additional material. These are described separately in the following sections.¹

2.1 Audio recordings

Audio recordings in the YLP database are of two sorts: primary and secondary. A primary recording is a recording of one or more speakers of Yurok using the language. Some recordings are 'natural' in the sense that they document an unplanned linguistic interaction, for example a conversation in which the speakers forget that they are being recorded.
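The three-part organization of recordings, texts, and lexicon, together with the transparency assumption that a user should be able to trace a lexical claim back to the corpus, can be illustrated with a small data model. The Python sketch below is hypothetical: the class names (`Recording`, `Text`, `Entry`, `Database`), their fields, and the exact-string matching used to find attestations are assumptions for illustration, not the YLP's actual schema, and the sample data are invented placeholders rather than Yurok forms.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Literal


@dataclass
class Recording:
    """An audio file; 'kind' mirrors the primary/secondary distinction (hypothetical field)."""
    rec_id: str
    kind: Literal["primary", "secondary"]
    speakers: list[str]


@dataclass
class Text:
    """A transcribed text, optionally tied to the recording it was transcribed from."""
    text_id: str
    genre: str
    year: int
    tokens: list[str]            # word forms in textual order
    rec_id: str | None = None    # source recording, if any


@dataclass
class Entry:
    """A lexicon entry: headword and gloss, plus variant forms to look for in texts."""
    headword: str
    gloss: str
    forms: set[str] = field(default_factory=set)


@dataclass
class Database:
    recordings: dict[str, Recording] = field(default_factory=dict)
    texts: dict[str, Text] = field(default_factory=dict)
    lexicon: dict[str, Entry] = field(default_factory=dict)

    def attestations(self, headword: str) -> list[tuple[str, int]]:
        """Return (text_id, token_index) pairs where the headword or a listed form
        occurs, so a user can check the entry against the underlying corpus."""
        entry = self.lexicon[headword]
        wanted = entry.forms | {entry.headword}
        hits: list[tuple[str, int]] = []
        for text in self.texts.values():
            for i, tok in enumerate(text.tokens):
                if tok in wanted:
                    hits.append((text.text_id, i))
        return hits

    def audio_for(self, headword: str) -> set[str]:
        """IDs of recordings whose transcribed text contains at least one attestation."""
        return {
            self.texts[tid].rec_id
            for tid, _ in self.attestations(headword)
            if self.texts[tid].rec_id is not None
        }


# Illustrative usage with invented placeholder data (not real Yurok words).
db = Database()
db.recordings["rec-001"] = Recording("rec-001", "primary", ["speaker-1"])
db.texts["text-01"] = Text("text-01", "narrative", 1907,
                           ["wordA", "wordB", "wordA-x"], rec_id="rec-001")
db.texts["text-02"] = Text("text-02", "conversation", 2005, ["wordC", "wordA"])
db.lexicon["wordA"] = Entry("wordA", "placeholder gloss", forms={"wordA-x"})

print(db.attestations("wordA"))  # [('text-01', 0), ('text-01', 2), ('text-02', 1)]
print(db.audio_for("wordA"))     # {'rec-001'}
```

Exact matching on a listed set of forms is the simplest possible design; for a language with rich inflection, a real system would more likely link entries to lemmatized or morphologically parsed texts than rely on string lookup.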