Automatic Focus Annotation: Bringing Formal Pragmatics Alive in Analyzing the Information Structure of Authentic Data
Automatic Focus Annotation: Bringing Formal Pragmatics Alive in Analyzing the Information Structure of Authentic Data

Ramon Ziai and Detmar Meurers
Collaborative Research Center 833, University of Tübingen
{rziai,dm}@sfs.uni-tuebingen.de

Proceedings of NAACL-HLT 2018, pages 117–128, New Orleans, Louisiana, June 1–6, 2018. © 2018 Association for Computational Linguistics

Abstract

Analyzing language in context, both from a theoretical and from a computational perspective, is receiving increased interest. Complementing the research in linguistics on discourse and information structure, in computational linguistics identifying discourse concepts was also shown to improve the performance of certain applications, for example, Short Answer Assessment systems (Ziai and Meurers, 2014).

Building on the research that established detailed annotation guidelines for manual annotation of information structural concepts for written (Dipper et al., 2007; Ziai and Meurers, 2014) and spoken language data (Calhoun et al., 2010), this paper presents the first approach automating the analysis of focus in authentic written data. Our classification approach combines a range of lexical, syntactic, and semantic features to achieve an accuracy of 78.1% for identifying focus.

1 Introduction

The interpretation of language is well known to depend on context. Both in theoretical and computational linguistics, discourse and information structure of sentences are thus receiving increased interest: attention has shifted from the analysis of isolated sentences to the question of how sentences are structured in discourse and how information is packaged in sentences analyzed in context.

As a consequence, a rich landscape of approaches to discourse and information structure has been developed (Kruijff-Korbayová and Steedman, 2003). Among these perspectives, the Focus-Background dichotomy provides a particularly valuable structuring of the information in a sentence in relation to the discourse. (1) is an example question-answer pair from Krifka and Musan (2012, p. 4) where the focus in the answer is marked by brackets.

(1) Q: What did John show Mary?
    A: John showed Mary [[the PICtures]]_F.

In the answer in (1), the NP the pictures is focused and hence indicates that there are alternative things that John could show Mary. It is commonly assumed that focus here typically indicates the presence of alternative denotations (denotation focus, Krifka and Musan 2012, p. 8), making it a semantic notion. Depending on the language, different devices are used to mark focus, such as prosodic focus marking or different syntactic constructions (e.g., clefts). In this paper, we adopt a notion of focus based on alternatives, as advanced by Rooth (1992) and, more recently, Krifka and Musan (2012), who define focus as indicating “the presence of alternatives that are relevant for the interpretation of linguistic expressions” (Krifka and Musan, 2012, p. 7).

Formal semantics has tied the notion of alternatives to an explicit relationship between questions and answers called Question-Answer Congruence (Stechow, 1991), where the idea is that an answer is congruent to a question if both evoke the same set of alternatives. Questions can thus be seen as a way of making alternatives explicit in the discourse, an idea also taken up by the Question-Under-Discussion (QUD) approach (Roberts, 2012) to discourse organization.

Complementing the theoretical linguistic approaches, in the last decade corpus-based approaches started exploring which information structural notions can reliably be annotated in what kind of language data. While the information status (Given-New) dimension can be annotated successfully (Riester et al., 2010; Nissim et al., 2004) and even automated (Hempelmann et al., 2005; Nissim, 2006; Cahill and Riester, 2012), the inter-annotator agreement results for Focus-Background (Ritz et al., 2008; Calhoun et al., 2010) show that it is difficult to obtain high levels of agreement, especially due to disagreement about the extent or size of the focused unit.

More recently, Ziai and Meurers (2014) showed that for data collected in task contexts including explicit questions, such as answers to reading comprehension questions, reliable focus annotation is possible. In addition, an option for externally validating focus annotation was established by showing that such focus annotation improves the performance of Short Answer Assessment (SAA) systems. Focus enables the system to zoom in on the part of the answer addressing the question instead of considering all parts of the answer as equal.

In this paper, we want to build on this strand of research and develop an approach for automatically identifying focus in authentic data including explicit question contexts. In contrast to Calhoun (2007) and Sridhar et al. (2008), who make use of prosodic properties to tackle the identification of focus for content words in spoken language data, we target the analysis of written texts.

We start in section 2 by discussing relevant related work before introducing the gold standard focus annotation we use as the foundation of our work in section 3. Section 4 then presents the different types of features used for predicting which tokens form part of the focus. In section 5 we employ a supervised machine learning setup to evaluate the perspective and the specific features in terms of their ability to predict the gold standard focus labeling. Building on these intermediate results and the analysis thereof in section 6, in section 7 we then present two additional feature groups which lead to our final focus detection model. Finally, section 8 explores options for extrinsically showing the value of the automatic focus annotation for the automatic meaning assessment of short answers. It confirms that focus analysis pays off when aiming to generalize assessment to previously unseen data and contexts.

2 Previous Approaches

There is only a very small number of approaches dealing with automatically labeling information structural concepts.¹ Most approaches related to detecting focus automatically center almost exclusively on detecting the ‘kontrast’ notion in the English Switchboard corpus (Calhoun et al., 2010). We therefore focus on the Switchboard-based approaches here.

¹ For a broader perspective of computational approaches in connection with information structure, see Stede (2012).

The availability of the annotated Switchboard corpus (Calhoun et al., 2005, 2010) sparked interest in information-structural categories and enabled several researchers to publish studies on detecting focus. This is especially true for the Speech Processing community, and indeed many approaches described below are intended to improve computational speech applications in some way, by detecting prominence through a combination of various linguistic factors. Moreover, with the exception of Badino and Clark (2008), all approaches use prosodic or acoustic features.

All approaches listed below tackle the task of detecting ‘kontrast’ (as focus is called in the Switchboard annotation) automatically on various subsets of the corpus using different features and classification approaches. For each approach, we therefore report the features and classifier used, the data set size as reported by the authors, the (often very high) majority baseline for a binary distinction between ‘kontrast’ and background, and the best accuracy obtained. If available in the original description of the approach, we also report the accuracy obtained without acoustic and prosodic features.

Calhoun (2007) investigated how focus can be predicted through what she calls “prominence structure”. The essential claim is that a “focus is more likely if a word is more prominent than expected given its syntactic, semantic and discourse properties”. The classification experiment is based on 9,289 words with a 60% majority baseline for the ‘background’ class. Calhoun (2007) reports 77.7% accuracy for a combination of prosodic, syntactic and semantic features in a logistic regression model. Without the prosodic and acoustic features, the accuracy obtained is 74.8%. There is no information on a separation between training and test set, likely because the setup of the study was geared towards determining relevant factors in predicting focus, not towards building a focus prediction model for a real application case. Relatedly, the approach uses only the gold-standard annotation already available in the corpus as the basis for features, not automatic annotation.

Sridhar et al. (2008) use lexical, acoustic and part-of-speech features in trying to detect pitch accent, givenness and focus. Concerning focus, the work attempts to extend Calhoun’s (2007) analysis to “understand what prosodic and acoustic differences exist between the focus classes and background items in conversational speech”. 14,555 words of the Switchboard corpus are used in total, but filtered for evaluation later to balance the skewed distribution between ‘kontrast’ and ‘background’. With the thus obtained random baseline of 50%, Sridhar et al. (2008) obtain 73% accuracy when using all features, which again drops only slightly to 72.95% when using only parts of speech. They use a decision tree classifier to combine the features in 10-fold cross-validation for training and testing.

Badino and Clark (2008) aim to model contrast both for its role in analyzing discourse and information structure, and for its potential in speech

questions, other approaches such as Pinchak and Lin (2006) avoid assigning pre-determined classes to questions and instead favor a more data-driven label set. In more recent work, Lally et al. (2012) use a sophisticated combination of deep parsing, lexical clues and broader question labels to analyze questions.

3 Data

The present work is based on the German CREG corpus (Ott et al., 2012). CREG contains responses by American learners of German to comprehension questions on reading texts. Each response is rated by two teaching assistants with regard to whether it answers the question or not.
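The majority baselines quoted for these Switchboard experiments are simply the accuracy obtained by always predicting the most frequent class. As a quick toy illustration (the label counts below are made up for this sketch, not taken from any of the cited studies):

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy achieved by always predicting the most frequent label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

# Made-up toy distribution: 6 'background' vs. 4 'kontrast' tokens,
# mirroring the kind of class skew reported for Switchboard subsets.
toy_labels = ["background"] * 6 + ["kontrast"] * 4
print(majority_baseline(toy_labels))  # 0.6, i.e. a 60% majority baseline
```

On a deliberately balanced evaluation set such as the one used by Sridhar et al. (2008), the same computation yields the 50% random baseline they report.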
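Since each response carries two independent judgments, one standard way to quantify how well two such raters agree is Cohen's κ, which corrects raw agreement for chance. The sketch below is purely illustrative: the helper function and the ratings are invented for this example, not agreement figures from CREG.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater1) == len(rater2) and rater1
    n = len(rater1)
    # Observed agreement: fraction of items both raters label identically
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: expected overlap given each rater's marginal distribution
    m1, m2 = Counter(rater1), Counter(rater2)
    expected = sum(m1[label] * m2[label] for label in m1.keys() | m2.keys()) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented ratings: two TAs judging 8 responses as answering (1) or not (0)
ta1 = [1, 1, 1, 0, 0, 1, 0, 1]
ta2 = [1, 1, 0, 0, 0, 1, 1, 1]
print(round(cohens_kappa(ta1, ta2), 3))  # 0.467
```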