Steps Toward a Grammar Embedded in Data Nicholas Thieberger

1. Introduction

Inasmuch as a documentary grammar of a language can be characterized – given the formative nature of the discussion of documentary linguistics (cf. Himmelmann 1998, 2008) – part of it has to be based in the relationship of the analysis to the recorded data, both in the process of conducting the analysis with interactive access to primary recordings and in the presentation of the grammar with references to those recordings. This chapter discusses the method developed for building a corpus of recordings and time-aligned transcripts, and embedding the analysis in that data. Given that the art of grammar writing has received detailed treatment in two recent volumes (Payne and Weber 2006, Ameka, Dench, and Evans 2006), this chapter focuses on the methodology we can bring to developing a grammar embedded in data in the course of language documentation, observing what methods are currently available, and how we can envisage a grammar of the future.

The process described here creates archival versions of the primary data while allowing the final work to include playable versions of example sentences, in keeping with the understanding that it is our professional responsibility to provide the data on which our claims are based. In this way we are shifting the authority of the analysis, which traditionally has been located only in the linguist’s work, and acknowledging that other analyses are possible using new technologies.
The approach discussed in this chapter has three main benefits: first, it gives a linguist the means to interact instantly with digital versions of the primary data, indexed by transcripts; second, it allows readers of grammars or other analytical works to verify claims made by access to contextualised primary data (and not just the limited set of data usually presented in a grammar); third, it provides an archival form of the data which can be accessed by speakers of the language. The great benefit for typologists is that they can access annotated corpora from fieldwork-based research in order to address topics not considered by the original analyst.

Well-structured linguistic data will allow a number of outputs to be created, including a grammatical description written as a book, but with other potential ways of visualizing the data. The citable archival form of the data can be derived in more ephemeral multimedia or online forms suited to ‘mobilising’ the data (Nathan 2006). For example, archival files are typically very high resolution and are accordingly very large. To deliver them via the web or on a CD or DVD attached to a grammar requires that they be converted to a lower resolution. Similarly, it is now common to create lexical databases from which a dictionary can be derived. This separation of underlying forms of the data from delivery forms is central to the methods discussed in this chapter.

Writing a grammar of a previously undescribed language is a major undertaking, typically the endpoint of fieldwork-based linguistic research, of which the methodology has recently undergone significant changes with the introduction of new tools for digital recording, transcription and analysis. The process of recording such a language has, until recently, relied little on an empirical dataset and more on the genius of the researcher – observing and writing notes while they live in a community of speakers.
The resulting work, the grammar, is a crafted collection of these observations and, more often than not, it is only fieldnotes and written texts that are recorded, with perhaps a few tapes to confirm phonological claims (cf. Dixon 2006).

The innovative approach discussed in this chapter does not supersede the linguist’s role, it enhances the scientific basis of the linguist’s work by providing ready access to primary data. The decisions about what to record, how to record it, and all the normal elicitation and experimental techniques that make up fieldwork today are still implemented by the linguist, but it is in the methods of recording, naming and transcribing the field materials and consequent access to data that novel outcomes can be achieved. An opportunistic corpus cannot answer all questions asked by a grammar writer and there will always be a need to elicit forms, especially for paradigms. Such forms do, of course, need to be marked as being elicited rather than naturally occurring so that their status is clear.

The typical grammar of the past few decades makes no reference to the source of its data nor to how to access further data on the language than is included in the grammatical description. For example, was the data all elicited or was it recorded and transcribed? If it was recorded, then who was it recorded with – are the speakers old or young, male or female? If texts are the source of example sentences, then where in the text does the example come from? Where is the data itself stored? A sample of some thirty grammars from that period found one (Heath 1984, discussed below) that provided sufficient data to allow verification of the author’s analysis with textual data readily available to investigate features of the grammar not addressed by the grammar writer.
None of them provided recordings and none provided links from example sentences to the context of the sentence, neither by provision of the media, nor by the more arduous path of spelled out timecodes to media files that may be available somewhere in the world (but not collocated with the grammatical analysis).

It is appropriate to focus on the past few decades because it has been possible to provide access to textual and dynamic media recordings for most of that time, but it has not been part of normal linguistic methodology to take advantage of this possibility. Thus it is not a criticism of any one grammar to observe that it does not consider the corpus on which it is built to be a relevant part of its construction so that the whole corpus or a suitable presentation version of it should be provided to the reader. Rather, one has a sense of wonder that field linguistics as a discipline has kept going as long as it has in willful ignorance of the availability of new methods for recording, transcribing, concordancing, annotating, and presenting the data on which it bases its generalizations. These are methods with which it should be completely engaged, relishing the opportunity to access recorded textual material instantly, and to account for any small inconsistency in the data by reference to that data rather than by sidestepping one or two seemingly aberrant forms in the transcripts because of the difficulty of locating the primary media in an analog tape. Not only do these methods improve the linguist’s work, but they also create the kinds of records that speakers of the languages can reasonably expect to result from fieldwork, and that funding bodies are increasingly coming to demand of publicly funded research.

2. The art of grammar writing in recent literature

Two recent works on grammar writing summarise the state of the art, but neither considers the possibilities offered by new technologies for access to primary data.
Thus, in a collection of work which details many aspects of grammar writing, Ameka, Dench, and Evans (2006) briefly discuss the issue of new technological methods for accessing data, but conclude that it means that data should be made available by a digital archive (Evans and Dench 2006: 25). Archiving data ensures its longevity; however, it is the relationship of the grammar to the data that ideally forms the basis of the analysis engaged in by the grammar writer. Archived data provides the foundation for this relationship, and archiving is a necessary but not sufficient activity to ensure both that linguistic analysis is embedded in the data, and that there can be long-term access to the data. Mosel (2006), in the same volume, hopes that every grammar would include a text collection which:

consists of annotated digitalized recordings of different language genres (e.g. myths, anecdotes, procedural texts, casual conversation, political debates and ritual speech events), accompanied by a transcription, a translation and a commentary on the content and linguistic phenomena. (Mosel 2006: 53)

New technologies provide the means for creation of digital records in the course of linguistic fieldwork and analysis, and this requires a change in linguistic methodology, as discussed below.

Another recent collection of papers on grammar writing (Payne and Weber 2006) makes no reference to the potential of a new kind of grammatical description interoperating with its source data via the use of new technologies, despite two chapters touching on technology in grammar writing (Weber 2006a, 2006b). Weber’s (2006b) discussion of the linguistic example likens a grammatical description to a museum of fine art, with galleries exhibiting the features of the language, so there could be a gallery of relative clauses, a gallery of noun classes and so on. In these galleries the example sentences form the exhibits.
He points out, however, that a museum is not a warehouse – suggesting that data collection on its own, as advocated by some proponents of language documentation, simply results in a warehouse of recorded material, in contrast to a museum in which each item needs to be provided with interpretive material. Similarly, I suggest, a grammatical description must be based on a corpus (the warehouse) of catalogued items (time-aligned transcripts, recordings, and so on), but use examples to illustrate given points within the description.
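
The relationship between the warehouse and the exhibits can be made concrete with a small sketch. The following is not the method described in this chapter but an illustration under stated assumptions: a corpus held as a list of time-aligned transcript segments, and each example sentence in the grammar stored as a reference (recording identifier plus start and end times) rather than as copied text. All class names, field names, and identifiers such as `rec001` are hypothetical, and the transcription strings are placeholders rather than language data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One time-aligned transcript unit (hypothetical field names)."""
    recording_id: str        # identifier of the archived media file
    start: float             # offset into the recording, in seconds
    end: float
    text: str                # transcription (placeholder strings below)
    gloss: str               # free translation
    elicited: bool = False   # flag elicited forms so their status is clear


@dataclass(frozen=True)
class Example:
    """An example sentence in the grammar, cited by reference to the corpus."""
    number: str
    recording_id: str
    start: float
    end: float


def resolve(example, corpus, context=1):
    """Return the cited segment with `context` neighbouring segments each side."""
    same_recording = sorted(
        (s for s in corpus if s.recording_id == example.recording_id),
        key=lambda s: s.start,
    )
    for i, seg in enumerate(same_recording):
        if seg.start == example.start and seg.end == example.end:
            first = max(0, i - context)
            return same_recording[first:i + context + 1]
    raise LookupError(f"example {example.number} not found in corpus")


def citation(example):
    """Format a spelled-out timecode reference suitable for print."""
    return f"({example.recording_id}, {example.start:.1f}-{example.end:.1f} s)"


# Placeholder corpus: three consecutive segments of one hypothetical recording.
corpus = [
    Segment("rec001", 0.0, 2.5, "<transcription 1>", "<translation 1>"),
    Segment("rec001", 2.5, 5.0, "<transcription 2>", "<translation 2>"),
    Segment("rec001", 5.0, 8.0, "<transcription 3>", "<translation 3>"),
]
ex = Example("(1)", "rec001", 2.5, 5.0)
in_context = resolve(ex, corpus)   # the cited segment plus one neighbour each side
```

Because the example holds only a pointer into the corpus, the same record could in principle drive a printed citation, a playable clip in a delivery version, or a lookup of the surrounding text, which is the sense in which the exhibit remains tied to the warehouse.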
