XFORMAL: a Benchmark for Multilingual Formality Style Transfer

Olá, Bonjour, Salve! XFORMAL: A Benchmark for Multilingual Formality Style Transfer Eleftheria Briakou∗ Di Lu Ke Zhang Joel Tetreault University of Maryland Dataminr, Inc. Dataminr, Inc. Dataminr, Inc. [email protected] [email protected] [email protected] [email protected] Abstract BRAZILIAN-PORTUGUESE saiam disso, força de vontade!! We take the first step towards multilingual get out of it, willpower!! style transfer by creating and releasing XFOR- MAL, a benchmark of multiple formal refor- Abandonem essa situação, tenham força de vontade. mulations of informal text in Brazilian Por- Abandon this situation, have willpower! tuguese, French, and Italian. Results on XFOR- FRENCH MAL suggest that state-of-the-art style transfer Il avait les yeux braqués ailleurs. approaches perform close to simple baselines, He had his eyes fixed elsewhere. indicating that style transfer is even more chal- Il ne prêtait pas attention à la situation. lenging when moving multilingual.1 He was not paying attention to the situation. 1 Introduction ITALIAN Style Transfer (ST) is the task of automatically in bocca al lupo! transforming text in one style into another (for ex- good luck! ample, making an impolite request more polite). Ti rivolgo un sincero augurio! Most work in this growing field has focused primar- I send you a sincere wish! ily on style transfer within English, while covering Table 1: Informal-Formal pairs in XFORMAL. different languages has received disproportional interest. Concretely, out of 35 ST papers we re- (BR-PT) in Table1. Both informal-formal pairs viewed, all of them report results for ST within share the same content. However, the informal English text, while there is just a single work cov- language conveys more information than is con- ering each of the following languages: Chinese, tained in the literal meaning of the words (Hovy, Russian, Latvian, Estonian, and French (Shang 1987). These examples relate to the notion of deep et al., 2019; Tikhonov et al., 2019; Korotkova et al., formality (Heylighen et al., 1999), where the ulti- 2019; Niu et al., 2018). Notably, even though some mate goal is that of adding the context needed to efforts have been made towards multilingual ST, disambiguate an expression. On the other hand, researchers are limited to providing system outputs variations in formality might just reflect different as a means of evaluation, and progress is hampered situational and personal factors, as shown in the by the scarcity of resources for most languages. At Italian (IT) example. arXiv:2104.04108v1 [cs.CL] 8 Apr 2021 the same time, ST lies at the core of human com- This work takes the first step towards a more munication: when humans produce language, they language-inclusive direction for the field of ST condition their choice of grammatical and lexical by building the first corpus of style transfer for transformations to a target audience and a specific non-English languages. In particular, we make situation. Among the many possible stylistic varia- the following contributions: 1. Building upon tions, Heylighen et al.(1999) argue that “a dimen- prior work on Formality Style Transfer (FoST)(Rao sion similar to formality appears as the most im- and Tetreault, 2018), we contribute an evaluation portant and universal feature distinguishing styles, dataset, XFORMAL that consists of multiple for- registers or genres in different languages”. Con- mal rewrites of informal sentences in three Ro- sider the informal excerpts and their formal refor- mance languages: Brazilian Portuguese (BR-PT), mulations in French (FR) and Brazilian Portuguese French (FR), and Italian (IT); 2. Without assum- ∗Work done as a Research Intern at Dataminr, Inc. ing access to any gold-standard training data for 1Code and data: https://github.com/Elbria/xformal-FoST the languages at hand, we benchmark a myriad of leading ST baselines through automatic and hu- that use reconstruction and back-translation losses man evaluation methods. Our results show that (e.g., Logeswaran et al.(2018); Prabhumoye et al. FoST in non-English languages is particularly chal- (2018)). Another line of work, focuses on ma- lenging as complex neural models perform on par nipulation methods that remove the style-specific with a simple rule-based system consisting of hand- attribute of text (e.g., Li et al.(2018); Xu et al. crafted transformations. We make XFORMAL, our (2018)), while recent approaches use reinforcement annotations protocols, and analysis code publicly learning (e.g., Wu et al.(2019); Gong et al.(2019), available and hope that this study facilitates and probabilistic formulations (He et al., 2020), and encourages more research towards Multilingual ST. masked language models (Malmi et al., 2020). 2 Related Work Inter-language ST is introduced by Mirkin and Meunier(2015) who proposed personalized MT Controlling style aspects in generation tasks is stud- for EN-French and EN-German. Subsequent MT ied in monolingual settings with an English-centric works control for politeness (Sennrich et al., focus (intra-language) and cross-lingual settings 2016a), voice (Yamagishi et al., 2016), personality together with Machine Translation (MT) (inter- traits (Rabinovich et al., 2017), user-provided ter- language). Our work rests in intra-language ST minology (Hasler et al., 2018), gender (Vanmassen- with a multilingual focus, in contrast to prior work. hove et al., 2018), formality (Niu et al., 2017; Feely et al., 2019), morphological variations (Moryossef ST datasets that consist of parallel pairs in dif- et al., 2019), complexity (Agrawal and Carpuat, ferent styles include: GYAFC for formality (Rao 2019) and reading level (Marchisio et al., 2019). and Tetreault, 2018), Yelp (Shen et al., 2017) and Amazon Product Reviews for sentiment (He and 3 XFORMAL Collection McAuley, 2016), political slant and gender con- trolled datasets (Prabhumoye et al., 2018), Expert We describe the process of collecting formal Style Transfer (Cao et al., 2020), PASTEL for imitat- rewrites using data statements protocols (Bender ing personal (Kang et al., 2019), SIMILE for simile and Friedman, 2018; Gebru et al., 2018). generation (Chakrabarty et al., 2020), and others. Curation rational To collect XFORMAL, we Intra-language ST was first cast as generation firstly curate informal excerpts in multiple lan- task by Xu et al.(2012) and is addressed through guages. To this end, we follow the procedures methods that use either parallel data or unpaired described in Rao and Tetreault(2018) (hence- corpora of different styles. Parallel corpora de- forth RT18) who create a corpus of informal- signed for the task at hand are used to train tra- formal sentence-pairs in English (EN) entitled ditional encoder-decoder architectures (Rao and Grammarly’s Yahoo Answers Formality Corpus Tetreault, 2018), learn mappings between latent rep- (GYAFC). resentation of different styles (Shang et al., 2019), Concretely, we use the L6 Yahoo! Answers cor- or fine-tune pre-trained models (Wang et al., 2019). pus that consists of questions and answers posted Other approaches use parallel data from similar to the Yahoo! Answers platform.2 The corpus con- tasks to facilitate transfer in the target style via do- tains a large number of informal text and allows main adaptation (Li et al., 2019), multi-task learn- control for different languages and different do- ing (Niu et al., 2018; Niu and Carpuat, 2020), and mains.3 Similar to the collection of GYAFC, we zero-shot transfer (Korotkova et al., 2019) or create extract all answers from the Family & Relation- pseudo-parallel data via data augmentation tech- ships (F&R) topic that correspond to the three niques (Zhang et al., 2020; Krishna et al., 2020). languages of interest: Família e Relacionamen- Approaches that rely on non-parallel data include tos (BR-PT), Relazioni e famiglia (IT ), and Amour disentanglement methods based on the idea of learn- et relations (FR)(Step 1). We follow the same ing style-agnostic latent representations (e.g., Shen pre-processing steps as described in RT18 for con- et al.(2017); Hu et al.(2017)). However, they sistency (Step 2). We filter out answers that: a) are recently criticized for resulting in poor content consist of questions; b) include URLs; c) have fewer preservation (Xu et al., 2018; Jin et al., 2019; Luo 2https://webscope.sandbox.yahoo.com/ et al., 2019; Subramanian et al., 2018) and alter- catalog.php?datatype=l&did=11 natively, translation-based models are proposed 3More details are included under A.F. Corpus BR-PT FR IT Step Description BR-PT FR IT L6 Yahoo! 230;302 225;476 101;894 (QC1) Location restriction 151 78 59 + Step 1 37;356 34;849 13;443 (QC2) Qualification test 54 40 33 + Step 2 14;448 14;856 4;095 (QC3) Rewrites review 9 16 11 + Step 3 8;617 11;118 2;864 Table 3: Number of Turkers after each QC step. Table 2: Number of sentences in filtered versions of the Turkers’ demographics We recruit Turkers L6 Yahoo! Corpus across curation steps and languages. from Brazil, France/Canada, and Italy for BR-PT, than five or more than 25 tokens; or d) constitute FR, and IT, respectively. Beyond their country of duplicates.4 We automatically extract informal can- residence, no further information is available. didate sentences, as described in §5.3( Step 3). Fi- nally, we randomly sample 1;000 sentences from Compensation We compensate at a rate of $0:10 the pool of informal candidates for each language. per HIT with additional one-time bonuses that Table2 presents statistics of the curation steps. bumps them up to a target rate of over $10/hour. Procedures We use the Amazon Mechanical After this entire process, we have constructed a Turk (MTurk) platform to collect formal rewrites high-quality corpus of formality rewrites of 1;000 for our informal sentences. For each language, we sentences for three languages.

Load more