
Utterance-Unit Annotation for the JSL Dialogue Corpus: Toward a Multimodal Approach to Corpus Linguistics

Mayumi Bono 1&2, Rui Sakaida 1, Tomohiro Okada 2, Yusuke Miyao 3
1 National Institute of Informatics, 2 SOKENDAI (The Graduate University for Advanced Studies), 3 The University of Tokyo
1&2: 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, JAPAN
3: 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, JAPAN
{bono, lui, tokada-deaf}@nii.ac.jp, [email protected]

Abstract
This paper describes a method for annotating the Japanese Sign Language (JSL) dialogue corpus. We developed a way to identify interactional boundaries and to define an 'utterance unit' in sign language using the various multimodal features that accompany signing. The utterance unit is an original concept for segmenting and annotating sign language dialogue that draws on signers' native sense, from the perspectives of Conversation Analysis (CA) and Interaction Studies. First of all, we postulated that we should identify a fundamental, interaction-specific unit for understanding interactional mechanisms, such as turn-taking (Sacks et al., 1974), in sign-language social interactions. Obviously, such a unit should not rely on a spoken-language writing system for storing signings in corpora and making translations. We believe that utterance units have two possible applications: one is to develop corpus linguistics research for both signed and spoken corpora; the other is to build informatics systems that include, but are not limited to, machine translation systems for sign languages.

Keywords: utterance unit, annotation, sign language dialogue

1. Introduction
This paper describes a method for annotating the Japanese Sign Language (JSL) dialogue corpus (Bono et al., 2014)¹. Some linguists, including Deaf researchers interested in collecting sign language dialogue, began collecting data in April 2011. When we started, the general purpose of the project was to increase awareness of sign language as a distinct language in Japan. However, the academic aspects of the study recently became clear through interdisciplinary collaboration with engineering researchers, i.e., in natural language processing and image processing. In this paper, we introduce a preliminary result of our annotation process and annotated data, while explaining the concept of an 'utterance unit.' We anticipate that this concept will serve as a theoretical benchmark for promoting interdisciplinary research using spontaneous dialogue data in the corpus linguistics of sign languages.

2. Research Question and Background
In this study, we sought a way to identify interactional boundaries in sign languages and defined an utterance unit using the various multimodal features that accompany signing.

2.1 Utterance Unit
The concept of the utterance unit has already been applied to segmenting and annotating spontaneous Japanese dialogues (Den et al., 2010; Maruyama et al., in print). These studies propose a way of annotating utterance units at two levels by merging four linguistic and phonetic schemes: inter-pausal units, intonation units, clause units, and pragmatic units.
In this paper, we define the concept of the utterance unit for segmenting and annotating JSL dialogue data. We utilize JSL signers' native sense, which is related not only to grammatical features but also to multimodal features, such as mouth movements, non-manual movements, and gaze directions, to identify utterance units. The method is based on classic observations in the research fields of Conversation Analysis (CA) and Interaction Studies on spoken social interactions.

2.2 Sentence Unit
Previous studies in sign language linguistics have focused on the 'sentence unit' from the perspective of traditional linguistics. Crasborn (2007) introduces a workshop, organized by his colleagues and himself, that focused on how to recognize a sentence in sign languages. He concludes that "we need to be alert to the risk of letting translations in another language influence our segmentation of signed language discourse, and keep our minds open for possible constructions that are modality specific" (Crasborn, 2007: 108).
Obviously, segmentation should not rely on the writing system of spoken languages, because there is a risk of detecting an interactional chunk as a candidate utterance unit using the grammatical boundaries of translated texts (e.g., JSL to Japanese). As is widely known, there are functional and grammatical utterance-final particles in Japanese, such as ne (ね), yo (よ), and yone (よね), which can signal an interactional boundary. On the other hand, there are no such functional and grammatical manual signs in JSL. In sign languages, these kinds of utterance-final elements are spread across multiple modalities, such as facial expressions and body postures.

2.3 Turn Constructional Units (TCUs) in CA
First of all, we had to introduce a classic concept of an interaction-specific unit for understanding interactional mechanisms, such as turn-taking (Sacks et al., 1974).

1 Bono et al. (2014) introduce the JSL colloquial corpus, which is composed of a dialogue part and a lexicon part. Because we treat only the dialogue part in this paper, we call it the JSL dialogue corpus.

Conversation Analysis (CA) is a sociological approach to the study of social interactions that applies the concept of turn constructional units (TCUs) (Sacks et al., 1974) as fundamental building chunks of 'turns' in spoken interactions, composed of utterances, clauses, phrases, and single words. CA research indicates that participants can anticipate TCUs and possible completion points of the ongoing turn using grammatical, prosodic, and pragmatic features of turn endings. Consequently, the turns in an interaction are exchanged smoothly among participants without difficulty.
Signers also naturally identify the boundaries of an utterance in social interaction, namely TCUs, to exchange turns visually. Signers probably recognize visual signals that are related to the grammatical, prosodic, and pragmatic completion points of turns. The concepts of TCUs and utterance units are similar. Here, we try to define an utterance unit in sign languages that aligns with the theoretical background of TCUs.

2.4 Applications
After identifying utterance units, we believe that they will have two applications: one is to develop corpus linguistics research for signed and spoken corpora; the other is to build an informatics system that includes, but is not limited to, a machine translation system for sign languages.
With regard to the former application, we anticipate that the research target of sign language studies will change drastically from example-based data to naturally occurring data, allowing study of not only the grammatical aspects but also the social aspects of sign language interactions, such as turn-taking systems (Sacks et al., 1974) and repair sequences (Schegloff et al., 1977), from the perspective of CA.
With regard to the latter application, we anticipate technical and theoretical breakthroughs in data collection and data storage using informatics technology, such as natural language processing and image processing. To recognize small hand and body movements in sign languages using image processing techniques (e.g., OpenPose), we will need to redesign the settings for data collection, lighting, frame rate, etc. If we want to translate sign language dialogue into spoken and written languages using deep learning or artificial intelligence technology, we will need to build a shared corpus to develop these systems. The basic concept of the utterance unit is simple. However, we believe that it is a fundamental issue for developing sign language studies by combining research issues in linguistics and informatics.

3. Data
We collected JSL dialogues from 2012 to 2016, in 7 of the 47 Japanese prefectures (Table 1).

Table 1: Fundamental information of the dataset (collection year, number of dialogues, gender, and age range for each prefecture).
Prefecture | Year | No. of Dialogues | Gender     | Age Range
Gunma      | 2012 | 10               | M:10, W:10 | 30s-70s
Nara       | 2012 | 10               | M:10, W:10 | 30s-70s
Nagasaki   | 2014 | 8                | M:8, W:8   | 30s-70s
Fukuoka    | 2015 | 8                | M:8, W:8   | 40s-80s
Toyama     | 2015 | 8                | M:8, W:8   | 30s-70s
Ishikawa   | 2015 | 7                | M:14, W:0  | 20s-80s
Ibaragi    | 2016 | 9                | M:8, W:10  | 30s-70s
Total      |      | 60               | 120        | 20s-80s

3.1 The first stage of data collection
As the first stage of data collection, we recorded videos of 40 deaf subjects in Gunma and Nara Prefectures (yellow in Fig. 1) from May to July 2012. Each prefecture has one school for the deaf. We obtained data from an age-balanced sample of individuals aged 30–70 years in each prefecture, and each age group was divided into same-sex pairs. Our participants from Gunma and Nara were in their 30s, 40s, 50s, 60s, and 70s, and included both male and female pairs.

3.2 The methods used for collecting dialogues
We used three methods to collect data: interviews, in which field workers and assistants (native signers living in the same area who knew the procedures in advance) asked the participants about their language, life, environment, etc. (for introductory purposes only, not open access); animation narrative (AniN), in which one participant had memorized the story "Canary Row" and explained it to the other participant; and lexical elicitation, in which participants showed the corresponding signs for 100 slides of pictures and texts shown on a monitor, which is called the JSL lexicon corpus (not included in this paper).
We collected pre-formed, lexical-level signing produced in a single-narrative setting and spontaneous, utterance-level signing in a dialogue setting. In the single-narrative setting, we tried to detect enriched, deaf-specific signings using a theme for the narrative (i.e., folklore) and stimuli (pictures, images, etc.) to elicit signing at a lexical level. In the dialogue setting, we used video material to evoke a depictive signing (i.e., constructed action; Cormier, 2013) narrative task. We did not prepare a script for signing in advance. Consequently, the boundaries of the utterances were free, and were determined by the participants, who organized a turn-taking system in dialogue.

3.3 The amount of data
In the second stage of data collection, we collected data in Nagasaki, Fukuoka, Toyama, Ishikawa, and Ibaragi Prefectures, from 2014 to 2016 (green in Fig. 1). In this collection, we added two more dialogue tasks: 'my curry recipe (Cur)' and 'proud of my country (Pro).'

Figure 1: Prefectures where dialogues were collected.

The total number of participants is 120 in 60 dialogues (see Table 1). The total recording time of the corpus is 40 hours, 52 minutes and 28 seconds. If we narrow this down to the dialogue tasks only (AniN, Cur, and Pro), the total recording time is 15 hours, 42 minutes and 59 seconds. As shown in Table 2, the total number of video clips is 140. We have started adding basic annotations (word glosses and/or utterance unit glosses) to 85 files (60%). The annotated number of tokens, which is nearly equal to the number of word glosses, was 27,371 as of November 26, 2019.

Table 2: The percentage of video clips for which basic annotation (word glosses and/or utterance unit glosses) has been started.
Prefecture | AniN        | Cur         | Pro         | Total
Gunma      | 3/10        |             |             | 3/10
Nara       | 0/10        |             |             | 0/10
Nagasaki   | 8/8         | 8/8         | 4/8         | 20/24
Fukuoka    | 8/8         | 8/8         | 8/8         | 24/24
Toyama     | 8/8         | 8/8         | 4/8         | 20/24
Ishikawa   | 7/7         | 7/7         | 4/7         | 18/21
Ibaragi    | 0/9         | 0/9         | 0/9         | 0/27
Total      | 34/60 (56%) | 31/40 (77%) | 20/40 (50%) | 85/140 (60%)

Three independent video clips, collected by cameras A, B, and C, were synchronized using Final Cut Pro. The original combined-angle image includes the interlocutor's back recorded by cameras B and C; there is also dead space, shown in black in Fig. 2. Cropped combined-angle images do not include the interlocutor's back and have no dead space. The video images from all camera angles were enlarged to facilitate detailed analysis.²

Figure 2: Image of two versions of the three camera angles: original (top), cropped (bottom).

4. Utterance Unit Annotation
We set two annotation levels: the individual and integration levels. The individual level is composed of four tiers: word gloss, mouth movement, non-manual movement (NMM), and gaze. If annotators find features that can be used to define an utterance unit, such as the narrator's nodding behavior at turn endings or a gaze shift from the signing space to the interlocutor, they classify them into the corresponding tier. Note that the annotated information does not include everything that happened in a dialogue; annotators tagged only information related to the grammatical, prosodic, or pragmatic features of turn endings.
At the integration level, all information annotated at the individual level is combined to identify utterance units.
Figure 3 presents an example of the tier structure. 'NS_11_SH_40F' is the participant's information, which in this case means a female in her 40s who comes from the southern part (SH) of Nagasaki (NS), participant ID 11. Each tier carries the participant's information to avoid confusing the annotated data among annotators. The tier names are placed after the participant information.

Figure 3: An example of the tier structure.

4.1 Individual Level
As mentioned above, the individual level is composed of four tiers. However, only the word gloss tier is mandatory; the others are optional.
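To make the two-level tier layout of Figure 3 concrete, the following sketch builds a minimal ELAN (.eaf) file with one participant's individual-level tiers and an integration-level utterance unit tier. It is only a sketch: it assumes the third-party pympi-ling library, and the tier names, time values, and annotation values are hypothetical illustrations, not the corpus's actual naming scheme.

```python
# A minimal sketch of the two-level tier layout, assuming the pympi-ling
# library (pympi.Elan.Eaf). Tier names, times, and values are hypothetical.
import pympi

PARTICIPANT = "NS_11_SH_40F"  # female, 40s, southern (SH) Nagasaki (NS), ID 11

eaf = pympi.Elan.Eaf()

# Individual level: word gloss is mandatory, the other three are optional.
for tier in ["WordGloss", "MouthMovement", "NMM", "Gaze"]:
    eaf.add_tier(f"{PARTICIPANT}_{tier}")
# Integration level: the utterance unit tier.
eaf.add_tier(f"{PARTICIPANT}_UtteranceUnit")

# Example annotations (times in milliseconds, invented for illustration).
eaf.add_annotation(f"{PARTICIPANT}_WordGloss", 1000, 1600, "TRY.HARD (rep3)")
eaf.add_annotation(f"{PARTICIPANT}_MouthMovement", 1000, 1600,
                   "m: ga-m-ba-ri-ma-su [try hard]")
eaf.add_annotation(f"{PARTICIPANT}_Gaze", 1500, 1700, "to interlocutor")
eaf.add_annotation(f"{PARTICIPANT}_UtteranceUnit", 800, 1700,
                   "Yes, I'll do my best.")

eaf.to_file("NS_11_SH_40F_example.eaf")
```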

2 For more information about the JSL Dialogue Corpus, see http://research.nii.ac.jp/jsl-corpus/research/data/manual/manual.html

4.1.1 Word Gloss Tier (mandatory)
Because JSL does not yet have a digital dictionary or an ID-gloss system, such as SignBank or Global Sign Bank, the annotators placed the Japanese and English meanings of the signed words directly in the word gloss tier. Before annotating word glosses, all annotators learned the concept of the gesture unit (GU) proposed by Kendon (1970, 2004) in order to identify the start and end points of signed words.
One of our original plans was to establish a physical hand-movement unit smaller than the word gloss (Bono et al., 2014). We applied the concept of the GU to annotate the beginning and end points of signed words. The GU is the interval between successive rests of the limbs, rest positions, or home positions. A GU consists of one or several gesture phrases. A gesture phrase is what we intuitively call a 'gesture,' and consists of up to five phases: preparation, stroke, retraction, and pre- and post-stroke hold phases. We used the preparation, stroke, and retraction phases to identify the start and end points of a word gloss.
Figure 4 presents an image of a word unit for word gloss annotation. When several signed words form one utterance, as in the lower image, the retraction phase is replaced by the preparation phase of the next signed word. In utterance unit annotation, annotators did not include the GU phases, although they learned about this concept before annotating the word glosses.

Figure 4: An image of word units for word gloss annotation.

Annotators write the meanings of signed words in capital letters in ELAN. If one signed word has a meaning that is a combination of two or more spoken words, such as TRY HARD (Excerpt 1, Section 5.1), the words are connected by '.', e.g., 'TRY.HARD', to indicate that it is one word in sign. If the sign is repeated several times, the number of repetitions follows, e.g., 'TRY.HARD (rep3)'.

4.1.2 Mouth Movement Tier (optional)
In sign linguistics, two kinds of mouth movement are distinguished: mouth gestures and mouthing. A mouth gesture has a grammatical function, such as acting adverbially on the hand signing. Mouthing refers to shapes and movements that originate from spoken languages and are used while signing. Annotators classify these mouth movements by noting 'mg:' for mouth gestures and 'm:' for mouthing at the beginning of every annotation. When signers use mouth movements at the same time as signing, annotators place the mouth movement after the hand signing information, in parentheses, e.g., 'TRY.HARD(rep3) (m: ga-m-ba-ri-ma-su [try hard])' (Excerpt 1).
As mentioned above, we do not annotate all mouth movements, but only those related to utterance units, i.e., to the beginnings or ends of turns.

4.1.3 NMM Tier (optional)
We included a tier for non-manual movements (NMM), covering grammatical and non-grammatical elements produced by the body. For instance, a signer in the role of narrator may use nodding to show utterance endings. We would classify 'nod' into this tier.

4.1.4 Gaze Tier (optional)
Kendon (1967) observed, from a psychological perspective, a systematic mechanism of gaze direction at the ends of turns, which correspond to TCU endings and transition relevance places (TRPs) in CA terms. A shifting gaze can be crucial for identifying an utterance boundary, such as when signers shift their attention from the signing space to the interlocutor at the beginning or end of an utterance to confirm the interlocutor's understanding of the narrative. We would classify gaze directions into this tier.

4.2 Integration Level
Glosses at the individual level are combined at the integration level, which yields the utterance unit. Annotators make an overall judgement of the start and end of an utterance using the information annotated at the individual level in advance.
All tiers at both levels are annotated by two native Deaf signers and one CODA (child of Deaf adults). Currently, these three annotators have annotated five dialogues in ELAN as a first test.
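As a rough illustration of the integration-level judgement described in Section 4.2, the sketch below (our own simplified assumption, not the annotators' actual procedure) collects boundary-related cues from the optional tiers (a gaze shift to the interlocutor, a nod, utterance-related mouthing) and proposes candidate utterance-unit boundaries at word-gloss end points where several cues cluster. All tier values, time stamps, and the 300 ms window are hypothetical.

```python
# A simplified sketch of integration-level judgement: propose utterance-unit
# boundary candidates where several multimodal cues cluster around the end of
# a word gloss. Illustrative assumption only; all values are hypothetical.
from dataclasses import dataclass

@dataclass
class Annotation:
    tier: str      # e.g. "WordGloss", "Gaze", "NMM", "MouthMovement"
    start: int     # milliseconds
    end: int       # milliseconds
    value: str

def boundary_candidates(annotations, window_ms=300, min_cues=2):
    """Return end times of word glosses near which at least `min_cues`
    boundary-related cues (gaze to interlocutor, nod, mouthing) occur."""
    glosses = [a for a in annotations if a.tier == "WordGloss"]
    cues = [a for a in annotations if a.tier in ("Gaze", "NMM", "MouthMovement")]
    candidates = []
    for g in glosses:
        near = [c for c in cues
                if abs(c.end - g.end) <= window_ms
                and ("interlocutor" in c.value
                     or "nod" in c.value
                     or c.value.startswith("m:"))]
        if len(near) >= min_cues:
            candidates.append(g.end)
    return candidates

# Hypothetical example: the end of TRY.HARD coincides with a gaze shift and a nod.
tiers = [
    Annotation("WordGloss", 1000, 1600, "TRY.HARD (rep3)"),
    Annotation("Gaze", 1450, 1700, "to interlocutor"),
    Annotation("NMM", 1550, 1650, "nod"),
]
print(boundary_candidates(tiers))  # -> [1600]
```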

Table 3: Results of annotations. For each dialogue, the values for the two participants are separated by a slash: (1) number of word glosses, (2) number of utterance unit glosses, and words per utterance (1)/(2).

Data ID | Gender of pair | Age      | Task          | Prefecture           | Length of dialogue | (1) Word glosses | (2) Utterance unit glosses | Words in utterance (1)/(2)
Data 1  | Male           | 60s      | Cur           | Toyama               | 0:09:44            | 499 / 624        | 102 / 119                  | 4.89 / 5.24
Data 2  | Female         | 60s      | Cur           | Toyama               | 0:08:04            | 304 / 483        | 38 / 64                    | 8.00 / 7.55
Data 3  | Female         | 40s      | Cur           | Toyama               | 0:07:09            | 358 / 490        | 67 / 70                    | 5.34 / 7.00
Data 4  | Female         | 40s      | AniN          | Toyama               | 0:10:04            | 896 / 258        | 109 / 70                   | 8.22 / 3.69
Data 5  | Female         | 40s      | AniN          | Nagasaki             | 0:05:55            | 551 / 57         | 67 / 33                    | 8.22 / 1.73
Total   | M:1; F:4       | 40s; 60s | Cur:3; AniN:2 | Toyama:4; Nagasaki:1 | 0:40:56            | 4,520            | 739                        |
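As a quick sanity check of the derived figures in Table 3, the short sketch below recomputes the words-per-utterance ratios and the overall token rate from the raw counts in the table; the data structure and variable names are ours, introduced only for illustration.

```python
# Recompute the derived figures in Table 3 from its raw counts.
# Counts are copied from the table; variable names are ours.
dialogues = {
    # data_id: (length_in_seconds, [(word_glosses, utterance_unit_glosses), ...])
    "Data 1": (9 * 60 + 44, [(499, 102), (624, 119)]),
    "Data 2": (8 * 60 + 4,  [(304, 38),  (483, 64)]),
    "Data 3": (7 * 60 + 9,  [(358, 67),  (490, 70)]),
    "Data 4": (10 * 60 + 4, [(896, 109), (258, 70)]),
    "Data 5": (5 * 60 + 55, [(551, 67),  (57, 33)]),
}

total_seconds = sum(length for length, _ in dialogues.values())
total_tokens = sum(w for _, pair in dialogues.values() for w, _ in pair)

for data_id, (_, pair) in dialogues.items():
    ratios = [round(w / u, 2) for w, u in pair]
    print(data_id, "words per utterance:", ratios)

print("total length (min):", round(total_seconds / 60, 1))  # ~40.9 min
# ~110 tokens/min with the exact length; ~113 if the length is rounded to 40 min.
print("tokens per minute:", round(total_tokens / (total_seconds / 60)))
```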

As shown in Table 3, the total length of the five targeted dialogues is almost 41 minutes. The average number of tokens per minute is about 113³.
Furthermore, Table 3 shows the number of word glosses, the number of utterance unit glosses, and the number of words per utterance. Data 1, 2, and 3 are dialogues from the 'my curry recipe' task, and Data 4 and 5 are dialogues from the animation narrative task (Canary Row). There is a difference in the frequencies of utterance unit glosses between these tasks. In the curry recipe task, the numbers of words per utterance are balanced between the participants in a dialogue, e.g., 4.89 and 5.24 in Data 1; in the animation narrative task, on the other hand, they are unbalanced, e.g., 8.22 and 1.73 in Data 5. In the animation narrative task, the participant who watched the movie clip in advance tends to hold turns and produce multiple TCUs within a turn, while the interlocutor gives only a small number of words to the narrator in short reactions, comparable to 'uh huh' or 'I see' in English.

5. Analysis
In the following analyses, we present three excerpts analyzed using the CA framework to clarify how we integrate the features in the tiers at the individual level to identify utterance units.

5.1 Excerpt 1: TRPs with Mouthing
In Excerpt 1, signer TY_12 (lower tiers in the ELAN view of Excerpt 1, Figure 5) says, "You should cook a delicious meal for your husband. (HUSBAND/ FOR/ DELICIOUS/ MAKE/ GIVE (m: a-ge-te [give]))". Signer TY_11 answers by mouthing and signing, "Yes, I'll do my best. ((m: ha-i [yes])/ TRY.HARD (rep3) (m: ga-m-ba-ri-ma-su [try hard]))".

Figure 5: ELAN annotation of Excerpt 1, Transition relevance place with mouthing.

3 Reviewer #1 pointed out to us that the average of tokens per min in a dialogue setting is about 120 in DGS (German Sign Language). There is a similar tendency in our corpus.

Interestingly, TY_11 produces mouthing before hand signing to answer TY_12's recommendation. Specifically, during the middle of the final sign /GIVE/, TY_11 starts her answer by mouthing 'ha-i [yes]'. /GIVE/ is a subsidiary verb in Japanese and is also an agreement verb in JSL. TY_12 moves both of her hands to the right, where the semantic meaning "TY_11's husband" was established in advance (Figure 5) (Liddell, 2012). Before signing /GIVE/, TY_12's utterance is already almost grammatically and semantically complete. Consequently, a point within /GIVE/, specifically close to the end of the stroke of /GIVE/, is the earliest sequential position for TY_11 to give a response. And TY_11 gives a first response, not by signing, but by mouthing 'm: ha-i [yes]'.
In this segment, their utterances overlap. We assume that this is a typical case of a transition relevance place (TRP) in sign language dialogues. It implies that we should include mouthing when discussing utterance units.

5.2 Excerpt 2: Narrative and Role-shift with Gaze
Next, we discuss utterance completion and the narrator's gaze behavior. In Excerpt 2, signer NS_11 (upper tiers in the ELAN view of Excerpt 2, Figure 6) is telling a story about the animation she has watched, 'Canary Row', an animation clip widely used in Gesture Studies (e.g., McNeill, 1992).
She describes its famous first scene by producing multiple utterances: (1) "A chick is swinging inside a bird cage. (cl:human:un(stop)/ cl:human:chick:having a swing(stop)/ cl:sphere(circle)/ cl:un (something that swings like a swing in the sphere)/ cl:the shape of a cage/)"; (2) "The cute chick is playing on the swing. (pt3(rep)/ CUTE/ cl:human:chick: (ges:flaps the wings)/ cl:human:chick: (ges: enjoying playing on a swing))"; and (3) "A cat finds the chick, and climbs something like a pillar quickly. (pt1 (meaning: pt (cat))/ CAT/ cl:human:cat: (ges: looks around, notices something and claps his hands)/ cl:human:cat:climbs something like a pillar(stop)/ cl:explanation of something like a pillar standing upright/ cl:human:cat (ges:looks around quickly)/ cl:human:cat climbs the pillar quickly/ cl:climbs/ cl:climbs+NMM)".
As we can see, she uses lexical expressions only for /CUTE/ and /CAT/ in this part. Moreover, these lexical signs are accompanied by mouthing. The other expressions are depicting signs (cl) and gestures (ges) without mouthing. This style of signing is very familiar in sign language narrative talk.
We focus on the narrator's gaze behavior at the boundary of each utterance. At the end of every utterance, the narrator (NS_11) looks at the interlocutor (NS_12). Furthermore, the narrator gives a nod at the end of utterances (1) and (3), and the interlocutor gives a response, such as /UNDERSTAND/, at these points.
The analysis of Excerpt 2 revealed that the gaze directions and head nods accompanying the narrative provide clues for the interlocutors to identify utterance units. Moreover, the integration of these clues indicates the utterance boundary more strongly.

Figure 6: ELAN annotation of Excerpt 2, Narrative and role-shift with gaze.


Figure 7: ELAN annotation of Excerpt 3, Other-initiated repair with overlap.

5.3 Excerpt 3: Other-initiated Repair with Overlap
Finally, Excerpt 3 is an example in which the participants exchange utterances while orienting to a turn-taking system, rather than producing a narrative or one-way signing. Excerpts 2 and 3 are similar in that the interlocutor's actions (responses in Excerpt 2 and confirmation questions in Excerpt 3) overlap with the current signer's signing, and the overlapped stretches are retroactively defined as utterance units.
In Excerpt 3, TY_08 starts to explain his experience of cooking pan-fried noodles (yakisoba) by producing multiple sequential utterances: (1) "I'll make it in my house. (pt1/ SOMETIMES/ HOUSE (rep-2) (m: o-u-chi [house])/ INSIDE/ EXIST (rep-2)/ pt1 (body)/ (rep2)/ SAME (stop))"; (2) "Sometimes… (A.BIT/ EXIST)"; and (3) "Yeah, I helped cooking. (EXIST (rep-2)/ fal (pt)/ HELP/ un (pt)/ EXPERIENCE)".
However, these three utterances are not connected like the narrative in Excerpt 2. From TY_07's questions, we can see how TY_08's multiple sequential utterances are connected. That is, TY_07 asks TY_08 a question to display his understanding (Sacks, 1992). TY_08 answers him as soon as possible, as in utterance (2) mentioned above.
In more detail, although TY_08 continues his turn with utterance (1), TY_07 asks him, "Don't you make them outside (like camping)? (pt2/ OUTSIDE/ PAN-FRIED.NOODLES/ NOT.EXIST (zero)/ pt2)" during the utterance-final particle, pt1(rep), which forms a sandwich construction with the pt1 at the utterance onset and which is a TRP. Then, TY_08 gives an answer by connecting his utterance with his previous utterance, as in utterance (2), "Sometimes…". That is the earliest place for him to give a response.
The exchanges in Excerpt 3 are related to the concept of the other-initiated repair sequence (Schegloff et al., 1977). There are four techniques for others to initiate repair: open class forms, category-specific interrogatives, repeats of the trouble-source turn, and candidate understandings (Sidnell & Stivers, 2013). TY_07's question overlapping utterance (1) uses pt2 (pointing at the interlocutor) at both the onset and the offset of the utterance, which makes it clear that there is some trouble for TY_07 in TY_08's utterance, TY_08 being the one pointed at by pt2.
Consequently, the annotators segmented TY_08's signing into three parts: utterances (1), (2), and (3). Utterance (1) is a description of his experience; utterance (2) is an answer to TY_07's question; and utterance (3) is an elaboration of his own answer in utterance (2).
From the analysis of Excerpt 3, we found that not only the multimodal features annotated in the tiers at the individual level, but also the sequential structure of the dialogue, provide clues for identifying utterance units.

6. Conclusions
This paper describes an annotation method for the Japanese Sign Language (JSL) dialogue corpus (Bono et al., 2014) based on the concept of an utterance unit. By analyzing three excerpts, we showed how complicated it is to identify utterance units using a combination of signing and various other features.

However, the annotators, who are all native signers (Deaf and CODA) with a native understanding of JSL, used multimodal features to identify the utterance units. We found that it was very difficult to establish a standard criterion for identifying such features among annotators.
The utterance is a fundamental unit of language. It is obvious that we cannot rely on the writing system of spoken languages. Defining the utterance unit in sign languages will have useful applications, such as providing a fundamental unit for storing data in sign language corpora, and for manual or machine translation using advanced technologies.

7. Bibliographical References
Bono, M., Kikuchi, K., Cibulka, M. and Osugi, Y. (2014). Colloquial corpus of Japanese Sign Language: A design of language resources for observing sign language conversations. Proceedings of the Ninth International Conference on Language Resources and Evaluation, pp. 1898-1904.
Crasborn, O. (2007). How to recognise a sentence when you see one. Sign Language & Linguistics, 10(2), pp. 103-111.
Den, Y., Koiso, H., Maruyama, T. and Yoshida, N. (2010). Two-level annotation of utterance-units in Japanese dialogs: An empirically emerged scheme. Proceedings of the Seventh International Conference on Language Resources and Evaluation, pp. 17-23.
Kendon, A. (1967). Some functions of gaze direction in social interaction. Acta Psychologica, 26, pp. 22-63.
Kendon, A. (1970). Movement coordination in social interaction. Acta Psychologica, 32, pp. 1-25.
Kendon, A. (2004). Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Liddell, S. (2012). Gesture, Grammar and Meaning in American Sign Language. Cambridge: Cambridge University Press.
Maruyama, T., Den, Y. and Koiso, H. (in print). Design and annotation of two-level utterance units in Japanese. In Izre'el, S. et al. (eds.), In Search of Basic Units of Spoken Language: A Corpus-driven Approach. John Benjamins.
McNeill, D. (1992). Hand and Mind. Chicago, IL: University of Chicago Press.
Sacks, H., Schegloff, E. A. and Jefferson, G. (1974). A simplest systematics for the organisation of turn-taking for conversation. Language, 50, pp. 696-735.
Sacks, H. (1992). Lectures on Conversation, Volumes I and II. Edited by G. Jefferson, with an introduction by E. A. Schegloff. Oxford: Blackwell.
Schegloff, E. A., Jefferson, G. and Sacks, H. (1977). The preference for self-correction in the organization of repair in conversation. Language, 53(2), pp. 361-382.
Sidnell, J. and Stivers, T. (2013). The Handbook of Conversation Analysis. Wiley-Blackwell.

8. Appendix

HUSBAND                | A sign is indicated in upper case.
NOT.EXIST (zero)       | When there are more than two expressions for what is being signed, the expression selected is indicated in parentheses ( ).
cl:the shape of a cage | Classifiers or depicting signs are indicated in lower case.
(m: a-ge-te [give])    | Mouthing. Hyphens (-) are used to delimit each kana syllable. The translation of the mouthing is added within brackets [ ].
pt1                    | Pointing to the speaker him/herself.
pt2                    | Pointing to the hearer.
pt3                    | Pointing to neither the speaker nor the hearer.
pt1(body)              | The object pointed to is indicated in lower case within parentheses ( ).
(ges:flaps the wings)  | Gestures.
NMM                    | Non-manual markers.
(rep3)                 | Reduplications. When the number of iterations is known, it is indicated as "(rep2)" for two iterations, "(rep3)" for three, and so forth.
(stop)                 | A cut-off or truncation.
un                     | Unclear hand movements.
fal                    | Signing errors.
(meaning: pt (cat))    | The meaning of a sign in the conversational context is sometimes described.
