What Corpus-Based Research on Negation in Auslan and PJM Tells Us About Building and Using Sign Language Corpora
Anna Kuder¹, Joanna Filipczak¹, Piotr Mostowski¹, Paweł Rutkowski¹, Trevor Johnston²
¹Section for Sign Linguistics, Faculty of the Polish Studies, University of Warsaw, Krakowskie Przedmieście 26/28, 00-927 Warsaw, Poland
²Department of Linguistics, Faculty of Human Sciences, Macquarie University, Sydney, NSW, Australia, 2109
{anna.kuder, j.filipczak, piotr.mostowski, p.rutkowski}@uw.edu.pl, [email protected]

Abstract
In this paper, we would like to discuss our current work on negation in Auslan (Australian Sign Language) and PJM (Polish Sign Language, polski język migowy) as an example of experience in using sign language corpus data for research purposes. We describe how we prepared the data for two detailed empirical studies, given similarities and differences between the Australian and Polish corpus projects. We present our findings on negation in both languages, which turn out to be surprisingly similar. At the same time, what the two corpus studies show seems to be quite different from many previous descriptions of sign language negation found in the literature. Some remarks on how to effectively plan and carry out the annotation process of sign language texts are outlined at the end of the present paper, as they might be helpful to other researchers working on designing a corpus. Our work leads to two main conclusions: (1) in many cases, usage data may not be easily reconciled with intuitions and assumptions about how sign languages function and what their grammatical characteristics are like, (2) in order to obtain representative and reliable data from large-scale corpora one needs to plan and carry out the annotation process very thoroughly.

Keywords: sign language, corpus linguistics, corpus building, annotation, tagging, negation, non-manuals

1. Introduction

Representative and reliable data are indispensable in conducting linguistic research on sign languages. Due to significant sociolinguistic variation, resulting from numerous distinctive acquisition and usage patterns found in signing communities, researchers are often unable to draw clear generalizations concerning sign language grammars from individual signers’ intuitions (as such judgments are not always accepted unanimously by other signers). The fact that sign language grammars have not been standardized to the extent typical for languages with a long tradition of writing and schooling (like English) comes as no surprise taking into account that Deaf people usually live dispersed within much larger speaking communities; sign languages are fairly young; and the inter-generational transmission of language in signing communities is often interrupted. To explore the extensive inter- and intra-signer variation, more and more research groups have decided to undertake the task of creating a corpus of the sign language they work on. Among those projects are: the Dutch Sign Language (NGT) corpus¹ (Crasborn and Zwitserlood, 2008), the British Sign Language (BSL) corpus² (Schembri et al., 2013) and the German Sign Language (DGS) corpus³ (Hanke et al., 2010). More projects are underway. Basing linguistic analyses of the communication of the Deaf on real usage data (rather than on intuitions of individual signers) is becoming a methodological standard worldwide.

Our current work also belongs to the field of sign language corpus linguistics. In this paper, we would like to discuss our study on negation in Auslan (Australian Sign Language) and PJM (Polish Sign Language, polski język migowy) as an example of experience in using corpus data for research purposes (cf. Filipczak et al., 2015). The Auslan and PJM teams agreed upon fundamental methodological issues but actually worked separately on their own corpus material. Interestingly, both teams then made very similar observations about their annotation procedures and the phenomena they were revealing. These findings are outlined in the present paper as they might be of interest to other researchers working on corpus annotation and, in particular, on negation in sign languages.

¹ www.ru.nl/corpusngten/about-corpus-ngt/latest-news/
² www.bslcorpusproject.org/project-information/
³ www.sign-lang.uni-hamburg.de/dgs-korpus/index.php/the-project.html

2. Building and Annotating a Sign Language Corpus

Needless to say, building a sign language corpus is extremely time-consuming and labor-intensive. In most projects that are currently being developed, Deaf people are filmed in pairs as they respond to elicitation materials shown to them on a screen (see, e.g., Hanke et al., 2010; Rutkowski et al., 2017). Once videos are collected, they need to be annotated (Johnston, 2010). When starting the annotation process, it is vital to create written translations of as much of the recordings as possible as a matter of priority, even before glossing annotation starts. Translations are invaluable for being able to locate potentially interesting parts of the text in order to prioritize what should be glossed first. Translations can be prepared by Deaf signers, bilinguals, or hearing interpreters. It is important to employ a number of translators in order to have each chunk inspected by more than one person to ensure there is broad agreement. Individual sign glossing can be compromised if the overall meaning is not first established. (However, if there is unresolved disagreement among competent signers, this is also relevant and interesting. It may point to some real ambiguity or indeterminacy in the structure of the utterance that linguists need to take account of.)

When it comes to assigning glosses to individual signs one can either have a predefined lexical database or build the lexicon as one annotates the material. From our experience, either strategy will help ensure that the task is carried out consistently. Each lexeme needs to have its own unique label (assigned to every occurrence of the sign). Once glossed, the video material is machine readable and ready to be used in linguistic research. (One must have confidence that the sign tokens identified during any searching, sorting and counting of the corpus are all instances of the particular type that one is interested in, as well as representing all of the instances of that type in the corpus.)

3. Negation study

The study reported in this paper is one of the first studies on sign language negation discussing corpus data. There is a corpus-based study by Oomen and Pfau (2017) concerning sentential negation in NGT. However, our work is the first one to compare negation data extracted from two independently created sign language corpora. It should be noted that there exists a widely-cited typology of negation patterns in sign languages (Zeshan, 2004; 2006), however, it was proposed on the basis of individual signers’ grammaticality judgments and questionnaire data. Research based on corpus findings (for NGT, as well as for Auslan and PJM) offers a completely new perspective on Zeshan’s typology.

3.1 Sources of Data

The source of data for the Australian negation study was the Auslan corpus – the first sign language corpus in the world. The Auslan archive⁴ consists of 1100 video clips which, taken together, last approximately 300 hours. 100 Deaf signers were recorded for the purpose of creating the corpus; each of them performed 11 elicitation tasks during the recording session. Video recordings were edited and uploaded into the ELAN annotation software (Crasborn and Sloetjes, 2008). The Auslan corpus annotation is an on-going process. So far, more than 350 clips have annotation files containing annotation at different levels of detail.

[...] annotation/tagging specifically for the purposes of this study, on top of already existing annotations in each corpus. As the general annotation guidelines for the Auslan and PJM corpora are different, the negation tagging systems employed by the two teams also differed substantially, which makes the fact that the results were quite similar (as shown below) even more interesting.

For the Australian negation study, 413 video clips (24.7 hours of signed interaction) that had previously been segmented into signs and then glossed were examined. The annotation files for these clips were produced by 89 of the 100 individuals in the corpus. At the beginning of this study, approximately 9000 clauses had already been identified in these files during previous research. However, in only 89 of these was the entire text segmented into clauses and given time-aligned translation into written English. The remaining 324 clips already contained clause boundary annotations only at points that had been relevant to corpus-based research prior to this study. The 89 texts contained monologic spontaneous narratives, re-tells or elicited responses to visual stimuli (pictures and videos), or responses to interview questions involving dialogue with the interviewer and another participant also being interviewed. 375 of the 413 files had comprehensive time-aligned translations in written English and these accounted for 12 hours of recordings.

Taking into account that the Auslan data were prepared as outlined above, there were three ways in which it was possible to locate all instances of negation in them:
• searching the gloss annotations for all instances of Auslan signs known to be associated with negation or negative semantics, and investigating the relevant clauses for headshaking;
• searching the English translations for any words or word forms associated