Challenges in modality annotation in a Brazilian Portuguese Spontaneous Speech Corpus

Luciana Beatriz Avila
PosLin-UFMG/UFV/Capes
Av Antonio Carlos 6627
Belo Horizonte, MG 31270-901 Brazil

Heliana Mello
UFMG/FGV/CNPq
Av Antonio Carlos 6627
Belo Horizonte, MG 31270-901 Brazil
[email protected] heliana.mello@
Abstract
This short paper introduces the first notes about a modality annotation system that is under development for a Brazilian Portuguese spontaneous speech corpus (C-ORAL-BRASIL). We indicate our methodological decisions, the points which seem to be well resolved, and two issues for further discussion and investigation.

1 Credits
The authors are thankful to CNPq, FAPEMIG and CAPES (Proc. nº BEX 9537/12-0) for research funding support.

2 Introduction
Modality annotation is nonexistent for both written and spoken Brazilian Portuguese corpora, hence the novelty of this project. [...] category stands for, as well as identifying linguistic elements that carry it, is of utmost relevance. Our goal in annotating modality in a Brazilian Portuguese spontaneous speech corpus is to provide a reliable starting point for researchers who might be interested in developing NLP methodologies for extracting markers of reliability, certainty and factuality from oral discourse, carrying out sentiment analysis, modeling modality, and pursuing similar objectives.

3 Defining modality
In this paper we study modality in a spontaneous speech corpus, the C-ORAL-BRASIL, which will be presented in Section 4 below. As for spontaneous speech, we follow Cresti and Scarano (1998:5) in characterizing it as “the fulfillment of linguistic acts, not programmed and not programmable, because they emerge during the unfolding of an interaction, always new and unpredictable,