Fact-based Text Editing

Hayate Iso†∗ Chao Qiao‡ Hang Li‡ †Nara Institute of Science and Technology ‡ByteDance AI Lab [email protected], {qiaochao, lihang.lh}@bytedance.com

Abstract

We propose a novel text editing task, referred to as fact-based text editing, in which the goal is to revise a given document to better describe the facts in a knowledge base (e.g., several triples). The task is important in practice because reflecting the truth is a common requirement in text editing. First, we propose a method for automatically generating a dataset for research on fact-based text editing, where each instance consists of a draft text, a revised text, and several facts represented in triples. We apply the method to two public table-to-text datasets, obtaining two new datasets consisting of 233k and 37k instances, respectively. Next, we propose a new neural network architecture for fact-based text editing, called FACTEDITOR, which edits a draft text by referring to given facts using a buffer, a stream, and a memory. A straightforward approach to address the problem would be to employ an encoder-decoder model. Our experimental results on the two datasets show that FACTEDITOR outperforms the encoder-decoder approach in terms of fidelity and fluency. The results also show that FACTEDITOR conducts inference faster than the encoder-decoder approach.

Set of triples: {(Baymax, creator, Duncan Rouleau), (Duncan Rouleau, nationality, American), (Baymax, creator, Steven T. Seagle), (Steven T. Seagle, nationality, American), (Baymax, series, Big Hero 6), (Big Hero 6, starring, Scott Adsit)}

Draft text: Baymax was created by Duncan Rouleau, a winner of Eagle Award. Baymax is a character in Big Hero 6.

Revised text: Baymax was created by American creators Duncan Rouleau and Steven T. Seagle. Baymax is a character in Big Hero 6 which stars Scott Adsit.

Table 1: Example of fact-based text editing. Facts are represented in triples. The facts in green appear in both the draft text and the triples. The facts in orange are present in the draft text, but absent from the triples. The facts in blue do not appear in the draft text, but do appear in the triples. The task of fact-based text editing is to edit the draft text on the basis of the triples, by deleting unsupported facts and inserting missing facts while retaining supported facts.

1 Introduction

Automatic editing of text by computer is an important application, which can help human writers to write better documents in terms of accuracy, fluency, etc. The task is easier and more practical than the automatic generation of text from scratch and has been attracting attention recently (Yang et al., 2017; Yin et al., 2019). In this paper, we consider a new and specific setting of it, referred to as fact-based text editing, in which a draft text and several facts (represented in triples) are given, and the system aims to revise the text by adding missing facts and deleting unsupported facts. Table 1 gives an example of the task.

As far as we know, no previous work has addressed this problem. In text-to-text generation, given a text, the system automatically creates another text, where the new text can be a text in another language (machine translation), a summary of the original text (summarization), or a text in better form (text editing). In table-to-text generation, given a table containing facts in triples, the system automatically composes a text which describes the facts. The former is a text-to-text problem, and the latter a table-to-text problem. In comparison, fact-based text editing can be viewed as a 'text&table-to-text' problem.

∗The work was done when Hayate Iso was a research intern at ByteDance AI Lab.
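To make the task setting concrete, the instance in Table 1 can be written down as plain data. The sketch below is purely illustrative; the variable names and the list-of-tuples layout are our own choices, not a prescribed data format.

```python
# The instance of Table 1, represented as plain Python data.
# A fact is a (subject, predicate, object) triple; the task is to revise
# the draft so that it states exactly the facts in `triples`.
triples = [
    ("Baymax", "creator", "Duncan Rouleau"),
    ("Duncan Rouleau", "nationality", "American"),
    ("Baymax", "creator", "Steven T. Seagle"),
    ("Steven T. Seagle", "nationality", "American"),
    ("Baymax", "series", "Big Hero 6"),
    ("Big Hero 6", "starring", "Scott Adsit"),
]
draft = ("Baymax was created by Duncan Rouleau, a winner of Eagle Award. "
         "Baymax is a character in Big Hero 6.")
revised = ("Baymax was created by American creators Duncan Rouleau and "
           "Steven T. Seagle. Baymax is a character in Big Hero 6 "
           "which stars Scott Adsit.")
```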
First, we devise a method for automatically creating a dataset for fact-based text editing. Recently, several table-to-text datasets have been created and released, consisting of pairs of facts and corresponding descriptions. We leverage such data in our method. We first retrieve facts and their descriptions. Next, we take the descriptions as revised texts and automatically generate draft texts based on the facts using several rules. We build two datasets for fact-based text editing on the basis of WEBNLG (Gardent et al., 2017) and ROTOWIRE (Wiseman et al., 2017), consisting of 233k and 37k instances respectively.¹

¹The datasets are publicly available at https://github.com/isomap/factedit

Second, we propose a model for fact-based text editing called FACTEDITOR. One could employ an encoder-decoder model to perform the task. The encoder-decoder model implicitly represents the actions for transforming the draft text into a revised text. In contrast, FACTEDITOR explicitly represents the actions for text editing, namely Keep, Drop, and Gen, which mean retention, deletion, and generation of a word, respectively. The model utilizes a buffer for storing the draft text, a stream for storing the revised text, and a memory for storing the facts. It also employs a neural network to control the entire editing process. FACTEDITOR has a lower time complexity than the encoder-decoder model, and thus it can edit a text more efficiently.

Experimental results show that FACTEDITOR outperforms the baseline of using an encoder-decoder model for text editing in terms of fidelity and fluency, and also show that FACTEDITOR can perform text editing faster than the encoder-decoder model.

2 Related Work

2.1 Text Editing

Text editing has been studied in different settings such as automatic post-editing (Knight and Chander, 1994; Simard et al., 2007; Yang et al., 2017), paraphrasing (Dolan and Brockett, 2005), sentence simplification (Inui et al., 2003; Wubben et al., 2012), grammar error correction (Ng et al., 2014), and text style transfer (Shen et al., 2017; Hu et al., 2017).

The rise of encoder-decoder models (Cho et al., 2014; Sutskever et al., 2014) as well as the attention (Bahdanau et al., 2015; Vaswani et al., 2017) and copy mechanisms (Gu et al., 2016; Gulcehre et al., 2016) has dramatically changed the landscape, and now one can perform the task relatively easily with an encoder-decoder model such as Transformer, provided that a sufficient amount of data is available. For example, Li et al. (2018) introduce a deep reinforcement learning framework for paraphrasing, consisting of a generator and an evaluator. Yin et al. (2019) formalize the problem of text editing as learning and utilization of edit representations and propose an encoder-decoder model for the task. Zhao et al. (2018) integrate paraphrasing rules with the Transformer model for text simplification. Zhao et al. (2019) propose a method for English grammar correction using a Transformer and copy mechanism.

Another approach to text editing is to view the problem as sequential tagging instead of encoder-decoder generation. In this way, the efficiency of learning and prediction can be significantly enhanced. Vu and Haffari (2018) and Dong et al. (2019) conduct automatic post-editing and text simplification on the basis of edit operations and employ the Neural Programmer-Interpreter (Reed and De Freitas, 2016) to predict the sequence of edits given a sequence of words, where the edits include KEEP, DROP, and ADD. Malmi et al. (2019) propose a sequential tagging model that assigns a tag (KEEP or DELETE) to each word in the input sequence and also decides whether to add a phrase before the word. Our proposed approach is also based on sequential tagging of actions. It is designed for fact-based text editing, not text-to-text generation, however.

2.2 Table-to-Text Generation

Table-to-text generation is the task which aims to generate a text from structured data (Reiter and Dale, 2000; Gatt and Krahmer, 2018), for example, a text from an infobox about a term in biology in Wikipedia (Lebret et al., 2016) or a description of a restaurant from a structured representation (Novikova et al., 2017). Encoder-decoder models can also be employed in table-to-text generation with structured data as input and generated text as output, for example, as in (Lebret et al., 2016). Puduppully et al. (2019) and Iso et al. (2019) propose utilizing an entity tracking module for document-level table-to-text generation.

One issue with table-to-text is that the style of generated texts can be diverse (Iso et al., 2019). Researchers have developed methods to deal with the problem using other texts as templates (Hashimoto et al., 2018; Guu et al., 2018; Peng et al., 2019). The difference between that approach and fact-based text editing is that the former is about table-to-text generation based on other texts, while the latter is about text-to-text generation based on structured data.

y′: AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission that was operated by PATIENT-2.
x̂′: AGENT-1 served as PATIENT-3 was a crew member of the BRIDGE-1 mission.
x′: AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission.

(a) Example for insertion. The revised template y′ and the reference template x̂′ share subsequences. The set of triple templates T′\T̂′ is {(BRIDGE-1, operator, PATIENT-2)}. Our method removes "that was operated by PATIENT-2" from the revised template y′ to create the draft template x′.

y′: AGENT-1 was created by BRIDGE-1 and PATIENT-2.
x̂′: The character of AGENT-1, whose full name is PATIENT-1, was created by BRIDGE-1 and PATIENT-2.
x′: AGENT-1, whose full name is PATIENT-1, was created by BRIDGE-1 and PATIENT-2.

(b) Example for deletion. The revised template y′ and the reference template x̂′ share subsequences. The set of triple templates T̂′\T′ is {(AGENT-1, fullName, PATIENT-1)}. Our method copies "whose full name is PATIENT-1" from the reference template x̂′ to create the draft template x′.

Table 2: Examples for insertion and deletion, where words in green are matched, words in gray are not matched, words in blue are copied, and words in orange are removed. Best viewed in color.

3 Data Creation

In this section, we describe our method of data creation for fact-based text editing. The method automatically constructs a dataset from an existing table-to-text dataset.

3.1 Data Sources

There are two benchmark datasets of table-to-text, WEBNLG (Gardent et al., 2017)² and ROTOWIRE (Wiseman et al., 2017)³. We create two datasets on the basis of them, referred to as WEBEDIT and ROTOEDIT respectively. In the datasets, each instance consists of a table (structured data) and an associated text (unstructured data) describing almost the same content.⁴

²The data is available at https://github.com/ThiagoCF05/webnlg. We utilize version 1.5.
³We utilize the ROTOWIRE-MODIFIED data provided by Iso et al. (2019), available at https://github.com/aistairc/rotowire-modified. The authors also provide an information extractor for processing the data.
⁴In ROTOWIRE, we discard redundant box-scores and unrelated sentences using the information extractor and heuristic rules.

For each instance, we take the table as triples of facts and the associated text as a revised text, and we automatically create a draft text. The set of triples is represented as T = {t}. Each triple t consists of subject, predicate, and object, denoted as t = (subj, pred, obj). For simplicity, we refer to the nouns or noun phrases of subject and object simply as entities. The revised text is a sequence of words denoted as y. The draft text is a sequence of words denoted as x.

Given the set of triples T and the revised text y, we aim to create a draft text x, such that x is not in accordance with T, in contrast to y, and therefore text editing from x to y is needed.

3.2 Procedure

Our method first creates templates for all the sets of triples and revised texts, and then constructs a draft text for each set of triples and revised text based on their related templates.

Creation of templates

For each instance, our method first delexicalizes the entity words in the set of triples T and the revised text y to obtain a set of triple templates T′ and a revised template y′. For example, given T = {(Baymax, voice, Scott Adsit)} and y = "Scott Adsit does the voice for Baymax", it produces the set of triple templates T′ = {(AGENT-1, voice, PATIENT-1)} and the revised template y′ = "AGENT-1 does the voice for PATIENT-1". Our method then collects all the sets of triple templates T′ and revised templates y′ and retains them in a key-value store with y′ being a key and T′ being a value.
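The template-creation step can be sketched as follows. This is a simplified reconstruction, assuming entities can be matched by exact string replacement; the `delexicalize` function and its AGENT-k/PATIENT-k numbering are our own simplifications (WebNLG additionally uses BRIDGE-k placeholders for entities that act as both subject and object).

```python
from collections import defaultdict

def delexicalize(triples, text):
    """Replace entity strings with placeholders (illustrative sketch only)."""
    subj_map, obj_map = {}, {}
    for subj, _, obj in triples:
        subj_map.setdefault(subj, f"AGENT-{len(subj_map) + 1}")
        obj_map.setdefault(obj, f"PATIENT-{len(obj_map) + 1}")
    mapping = {**subj_map, **obj_map}
    triple_templates = tuple(sorted(
        (mapping[s], p, mapping[o]) for s, p, o in triples))
    text_template = text
    for entity, placeholder in mapping.items():
        text_template = text_template.replace(entity, placeholder)
    return triple_templates, text_template

# Key-value store: revised template (key) -> sets of triple templates (value).
store = defaultdict(list)
T = [("Baymax", "voice", "Scott Adsit")]
y = "Scott Adsit does the voice for Baymax"
T_tpl, y_tpl = delexicalize(T, y)
store[y_tpl].append(T_tpl)   # y_tpl now has entities replaced by placeholders
```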
Creation of draft text

Next, our method constructs a draft text x using a set of triple templates T′ and a revised template y′. For simplicity, it only considers the use of either insertion or deletion in the text editing, and one can easily make an extension to a more complex setting. Note that the process of data creation is the reverse of that of text editing.

Given a pair of T′ and y′, our method retrieves another pair, denoted as T̂′ and x̂′, such that y′ and x̂′ have the longest common subsequence. We refer to x̂′ as a reference template. There are two possibilities: T̂′ is a subset or a superset of T′. (We ignore the case in which they are identical.) Our method then manages to change y′ into a draft template, denoted as x′, on the basis of the relation between T′ and T̂′. If T̂′ ⊊ T′, then the draft template x′ created is for insertion, and if T̂′ ⊋ T′, then the draft template x′ created is for deletion.

For insertion, the revised template y′ and the reference template x̂′ share subsequences, and the set of triple templates T′\T̂′ appears in y′ but not in x̂′. Our method keeps the shared subsequences in y′, removes the subsequences in y′ about T′\T̂′, and copies the rest of the words in y′, to create the draft template x′. Table 2a gives an example. The shared subsequences "AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission" are kept. The set of triple templates T′\T̂′ is {(BRIDGE-1, operator, PATIENT-2)}. The subsequence "that was operated by PATIENT-2" is removed. Note that the subsequence "served" is not copied because it is not shared by y′ and x̂′.

For deletion, the revised template y′ and the reference template x̂′ share subsequences. The set of triple templates T̂′\T′ appears in x̂′ but not in y′. Our method retains the shared subsequences in y′, copies the subsequences in x̂′ about T̂′\T′, and copies the rest of the words in y′, to create the draft template x′. Table 2b gives an example. The subsequences "AGENT-1 was created by BRIDGE-1 and PATIENT-2" are retained. The set of triple templates T̂′\T′ is {(AGENT-1, fullName, PATIENT-1)}. The subsequence "whose full name is PATIENT-1" is copied. Note that the subsequence "the character of" is not copied because it is not shared by y′ and x̂′.
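The insertion case of Table 2a can be approximated with a token-level longest-common-subsequence (LCS) alignment. The sketch below is our reconstruction, not the exact rule set used to build the datasets: it removes the contiguous non-shared span of y′ that mentions a placeholder occurring only in T′\T̂′, and copies everything else.

```python
def lcs_keep_mask(a, b):
    """Boolean mask over `a` marking one longest common subsequence with `b`."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    mask, i, j = [False] * n, 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            mask[i] = True
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return mask

def make_draft_for_insertion(y_tpl, x_hat_tpl, unique_placeholders):
    """Drop the contiguous non-shared span mentioning a placeholder that only
    occurs in the extra triples T' \\ T_hat' (cf. Table 2a); copy the rest."""
    y = y_tpl.split()
    shared = lcs_keep_mask(y, x_hat_tpl.split())
    drop = [False] * len(y)
    for k, tok in enumerate(y):
        if tok in unique_placeholders and not shared[k]:
            lo = hi = k                      # expand over the non-shared run
            while lo > 0 and not shared[lo - 1]:
                lo -= 1
            while hi + 1 < len(y) and not shared[hi + 1]:
                hi += 1
            for p in range(lo, hi + 1):
                drop[p] = True
    return " ".join(tok for k, tok in enumerate(y) if not drop[k])

y_tpl = ("AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission "
         "that was operated by PATIENT-2 .")
x_hat = "AGENT-1 served as PATIENT-3 was a crew member of the BRIDGE-1 mission ."
print(make_draft_for_insertion(y_tpl, x_hat, {"PATIENT-2"}))
# -> AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission .
```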
After getting the draft template x′, our method lexicalizes it to obtain a draft text x, where the lexicons (entity words) are collected from the corresponding revised text y.

We obtain two datasets with our method, referred to as WEBEDIT and ROTOEDIT, respectively. Table 3 gives the statistics of the datasets.

            WEBEDIT                 ROTOEDIT
      TRAIN   VALID   TEST    TRAIN   VALID   TEST
#D    181k    23k     29k     27k     5.3k    4.9k
#Wd   4.1M    495k    624k    4.7M    904k    839k
#Wr   4.2M    525k    649k    5.6M    1.1M    1.0M
#S    403k    49k     62k     209k    40k     36k

Table 3: Statistics of WEBEDIT and ROTOEDIT, where #D is the number of instances, #Wd and #Wr are the total numbers of words in the draft texts and the revised texts, respectively, and #S is the total number of sentences.

In the WEBEDIT data, sometimes entities only appear in the subj's of triples. In such cases, we also make them appear in the obj's. To do so, we introduce an additional triple (ROOT, IsOf, subj) for each subj, where ROOT is a dummy entity.

4 FACTEDITOR

In this section, we describe our proposed model for fact-based text editing, referred to as FACTEDITOR.

4.1 Model Architecture

FACTEDITOR transforms a draft text into a revised text based on given triples. The model consists of three components: a buffer for storing the draft text and its representations, a stream for storing the revised text and its representations, and a memory for storing the triples and their representations, as shown in Figure 1.

FACTEDITOR scans the text in the buffer, copies parts of the text from the buffer into the stream if they are described in the triples in the memory, deletes parts of the text if they are not mentioned in the triples, and inserts new parts of text into the stream which are only presented in the triples.

The architecture of FACTEDITOR is inspired by those in sentence parsing (Dyer et al., 2015; Watanabe and Sumita, 2015). The actual processing of FACTEDITOR is to generate a sequence of words into the stream from the given sequence of words in the buffer and the set of triples in the memory. A neural network is employed to control the entire editing process.

4.2 Neural Network

Initialization

FACTEDITOR first initializes the representations of content in the buffer, stream, and memory.

There is a feed-forward network associated with the memory, utilized to create the embeddings of triples. Let M denote the number of triples. The embedding of triple t_j, j = 1, ..., M, is calculated as

t_j = tanh(W_t · [e_j^subj; e_j^pred; e_j^obj] + b_t),

where W_t and b_t denote parameters, e_j^subj, e_j^pred, and e_j^obj denote the embeddings of the subject, predicate, and object of triple t_j, and [;] denotes vector concatenation.

There is a bi-directional LSTM associated with the buffer, utilized to create the embeddings of words of the draft text. The embeddings are obtained as

b = BiLSTM(x),

where x = (x_1, ..., x_N) is the list of embeddings of words, b = (b_1, ..., b_N) is the list of representations of words, and N denotes the number of words.

There is an LSTM associated with the stream for representing the hidden states of the stream. The first hidden state is initialized as

s_1 = tanh(W_s · [(1/N) Σ_{i=1}^N b_i ; (1/M) Σ_{j=1}^M t_j] + b_s),

where W_s and b_s denote parameters.
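The initialization step can be sketched in PyTorch as follows, assuming the WEBEDIT dimensions (embedding size 300, stream size 600). The module names and toy inputs are our own choices; this is a minimal sketch, not the released implementation.

```python
import torch
import torch.nn as nn

d = 300                                  # embedding size (WEBEDIT setting)
triple_ff = nn.Linear(3 * d, d)          # feed-forward net for the memory
buffer_lstm = nn.LSTM(d, d // 2, bidirectional=True, batch_first=True)
stream_init = nn.Linear(2 * d, 2 * d)    # maps pooled inputs to s_1 (size 600)

e_subj = torch.randn(1, 5, d)            # toy embeddings: M = 5 triples
e_pred = torch.randn(1, 5, d)
e_obj = torch.randn(1, 5, d)
# t_j = tanh(W_t [e^subj; e^pred; e^obj] + b_t)
t = torch.tanh(triple_ff(torch.cat([e_subj, e_pred, e_obj], dim=-1)))

x = torch.randn(1, 12, d)                # toy word embeddings: N = 12 words
b, _ = buffer_lstm(x)                    # b = BiLSTM(x)

# s_1 = tanh(W_s [mean(b); mean(t)] + b_s)
s1 = torch.tanh(stream_init(torch.cat([b.mean(dim=1), t.mean(dim=1)], dim=-1)))
```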

Figure 1: Actions of FACTEDITOR. (a) The Keep action, where the top embedding b_t of the buffer is popped and the concatenated vector [t̃_t; b_t] is pushed into the stream LSTM. (b) The Drop action, where the top embedding b_t of the buffer is popped and the state of the stream is reused at the next time step t+1. (c) The Gen action, where the concatenated vector [t̃_t; W_p y_t] is pushed into the stream, and the top embedding of the buffer is reused at the next time step t+1.

Action prediction

FACTEDITOR predicts an action at each time t using the LSTM. There are three types of action, namely Keep, Drop, and Gen. First, it composes a context vector t̃_t of triples at time t using attention:

t̃_t = Σ_{j=1}^M α_{t,j} t_j,

where α_{t,j} is a weight calculated as

α_{t,j} ∝ exp(v_α^⊤ · tanh(W_α · [s_t; b_t; t_j])),

where v_α and W_α are parameters. Then, it creates the hidden state z_t for action prediction at time t:

z_t = tanh(W_z · [s_t; b_t; t̃_t] + b_z),

where W_z and b_z denote parameters.
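The attention and action-scoring computation can be sketched as below, with shapes following the equations above; all parameter modules (W_alpha, v_alpha, W_z, W_a) are illustrative stand-ins rather than the authors' code.

```python
import torch
import torch.nn as nn

d_s, d_b, d_t, n_actions = 600, 300, 300, 3   # Keep, Drop, Gen

W_alpha = nn.Linear(d_s + d_b + d_t, d_t)
v_alpha = nn.Linear(d_t, 1, bias=False)
W_z = nn.Linear(d_s + d_b + d_t, d_t)
W_a = nn.Linear(d_t, n_actions)

def predict_action(s_t, b_t, triples):
    """s_t: (d_s,), b_t: (d_b,), triples: (M, d_t) -> action logits, context."""
    M = triples.size(0)
    query = torch.cat([s_t, b_t]).expand(M, -1)           # (M, d_s + d_b)
    scores = v_alpha(torch.tanh(W_alpha(torch.cat([query, triples], dim=-1))))
    alpha = torch.softmax(scores.squeeze(-1), dim=0)      # attention weights
    t_tilde = alpha @ triples                             # context vector
    z_t = torch.tanh(W_z(torch.cat([s_t, b_t, t_tilde])))
    return W_a(z_t), t_tilde                              # softmax -> P(a_t|z_t)

logits, t_tilde = predict_action(torch.randn(d_s), torch.randn(d_b),
                                 torch.randn(5, d_t))
```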

Next, it calculates the probability of action a_t as

P(a_t | z_t) = softmax(W_a · z_t),

where W_a denotes parameters, and chooses the action having the largest probability.

Action execution

FACTEDITOR takes an action based on the prediction result at time t.

For Keep at time t, FACTEDITOR pops the top embedding b_t in the buffer, and feeds the combination of the top embedding b_t and the context vector of triples t̃_t into the stream, as shown in Fig. 1a. The state of the stream is updated with the LSTM as s_{t+1} = LSTM([t̃_t; b_t], s_t). FACTEDITOR also copies the top word in the buffer into the stream.

For Drop at time t, FACTEDITOR pops the top embedding in the buffer and proceeds to the next state, as shown in Fig. 1b. The state of the stream is updated as s_{t+1} = s_t. Note that no word is inputted into the stream.

For Gen at time t, FACTEDITOR does not pop the top embedding in the buffer. It feeds the combination of the context vector of triples t̃_t and the linearly projected embedding of the generated word into the stream, as shown in Fig. 1c. The state of the stream is updated with the LSTM as s_{t+1} = LSTM([t̃_t; W_p y_t], s_t), where y_t is the embedding of the generated word y_t and W_p denotes parameters. In addition, FACTEDITOR copies the generated word y_t into the stream.

FACTEDITOR continues the actions until the buffer becomes empty.

Draft text x: Bakewell pudding is Dessert that can be served Warm or cold.
Revised text y: Bakewell pudding is Dessert that originates from Derbyshire Dales.
Action sequence a: Keep Keep Keep Keep Gen(originates) Gen(from) Gen(Derbyshire Dales) Drop Drop Drop Drop Keep

Table 4: An example of an action sequence derived from a draft text and a revised text.
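Putting the pieces together, the editing process is a single left-to-right pass over the buffer. The following pure-Python sketch shows the control flow only; `choose_action` and `update_stream` stand in for the neural components described above (returning an action label plus a generated word for Gen, and the next stream state, respectively).

```python
def edit(draft_words, buffer_states, triples, s_t):
    """Control flow of FACTEDITOR at inference time (illustrative sketch)."""
    stream_words = []
    i = 0                                # index of the current buffer top
    while i < len(draft_words):          # continue until the buffer is empty
        action, word = choose_action(s_t, buffer_states[i], triples)
        if action == "Keep":             # copy the top word, pop the buffer
            stream_words.append(draft_words[i])
            s_t = update_stream(s_t, buffer_states[i])
            i += 1
        elif action == "Drop":           # pop the buffer, reuse the state
            i += 1
        else:                            # "Gen": emit a word, keep the top
            stream_words.append(word)
            s_t = update_stream(s_t, word)
            # (a practical implementation also caps consecutive Gen steps)
    return stream_words
```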

Word generation

FACTEDITOR generates a word y_t at time t when the action is Gen:

P_gen(y_t | z_t) = softmax(W_y · z_t),

where W_y denotes parameters.

To avoid the generation of OOV words, FACTEDITOR exploits the copy mechanism. It calculates the probability of copying the object o_j of triple t_j as

P_copy(o_j | z_t) ∝ exp(v_c^⊤ · tanh(W_c · [z_t; t_j])),

where v_c and W_c denote parameters. It also calculates a gating probability

p_gate = sigmoid(w_g^⊤ · z_t + b_g),

where w_g and b_g are parameters. Finally, it calculates the probability of generating the word y_t through either generation or copying as

P(y_t | z_t) = p_gate · P_gen(y_t | z_t) + (1 − p_gate) · Σ_{j: o_j = y_t} P_copy(o_j | z_t),

where it is assumed that the triples in the memory have the same subject and thus only objects need to be copied.
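The generate-or-copy mixture can be sketched as follows. The vocabulary size and parameter modules are illustrative, and `object_token_ids` (our own name) is assumed to map each triple to the vocabulary id of its object word.

```python
import torch
import torch.nn as nn

V, d_z, d_t = 5000, 300, 300
W_y = nn.Linear(d_z, V)
W_c = nn.Linear(d_z + d_t, d_t)
v_c = nn.Linear(d_t, 1, bias=False)
w_g = nn.Linear(d_z, 1)

def word_distribution(z_t, triples, object_token_ids):
    """Mix P_gen over the vocabulary with P_copy over the triple objects."""
    p_gen = torch.softmax(W_y(z_t), dim=-1)                      # (V,)
    scores = v_c(torch.tanh(W_c(torch.cat(
        [z_t.expand(triples.size(0), -1), triples], dim=-1))))
    p_copy = torch.softmax(scores.squeeze(-1), dim=0)            # (M,)
    p_gate = torch.sigmoid(w_g(z_t))                             # scalar gate
    p = p_gate * p_gen
    # add the copy mass onto the vocabulary entries of the triple objects
    p = p.index_add(0, object_token_ids, (1 - p_gate) * p_copy)
    return p

p = word_distribution(torch.randn(d_z), torch.randn(4, d_t),
                      torch.tensor([7, 42, 9, 13]))
```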

4.3 Model Learning

The conditional probability of the sequence of actions a = (a_1, a_2, ..., a_T) given the set of triples T and the sequence of input words x can be written as

P(a | T, x) = Π_{t=1}^T P(a_t | z_t),

where P(a_t | z_t) is the conditional probability of action a_t given state z_t at time t and T denotes the number of actions.

The conditional probability of the sequence of generated words y = (y_1, y_2, ..., y_T) given the sequence of actions a can be written as

P(y | a) = Π_{t=1}^T P(y_t | a_t),

where P(y_t | a_t) is the conditional probability of generated word y_t given action a_t at time t, which is calculated as

P(y_t | a_t) = P(y_t | z_t) if a_t = Gen, and 1 otherwise.

Note that not all positions have a generated word. In such a case, y_t is simply a null word.

The learning of the model is carried out via supervised learning. The objective of learning is to minimize the negative log-likelihood of P(a | T, x) and P(y | a):

L(θ) = − Σ_{t=1}^T {log P(a_t | z_t) + log P(y_t | a_t)},

where θ denotes the parameters.

A training instance consists of a pair of draft text and revised text, as well as a set of triples, denoted as x, y, and T respectively. For each instance, our method derives a sequence of actions denoted as a, in a similar way as that in (Dong et al., 2019). It first finds the longest common subsequence between x and y, and then selects an action of Keep, Drop, or Gen at each position, according to how y is obtained from x and T (cf. Table 4). Action Gen is preferred over action Drop when both are valid.
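The derivation of oracle action sequences from a training pair can be sketched as below. We use difflib's matching blocks as an approximate longest common subsequence, and the Gen-before-Drop ordering inside each unaligned gap realizes the stated preference. This is our reconstruction in the spirit of Dong et al. (2019), not the authors' exact code.

```python
from difflib import SequenceMatcher

def lcs_pairs(a, b):
    """Aligned index pairs of a (near-)longest common subsequence of a, b."""
    pairs = []
    for blk in SequenceMatcher(a=a, b=b, autojunk=False).get_matching_blocks():
        pairs.extend((blk.a + k, blk.b + k) for k in range(blk.size))
    return pairs

def derive_actions(x_words, y_words):
    """Oracle Keep/Drop/Gen sequence turning draft x into revised y."""
    actions, i, j = [], 0, 0
    for ai, aj in lcs_pairs(x_words, y_words) + [(len(x_words), len(y_words))]:
        while j < aj:                     # words only in y: generate them
            actions.append(("Gen", y_words[j]))
            j += 1
        while i < ai:                     # words only in x: drop them
            actions.append(("Drop", None))
            i += 1
        if ai < len(x_words):             # aligned word: keep it
            actions.append(("Keep", x_words[ai]))
            i, j = ai + 1, aj + 1
    return actions

x = "Bakewell pudding is Dessert that can be served Warm or cold .".split()
y = "Bakewell pudding is Dessert that originates from Derbyshire Dales .".split()
# Yields Keeps for the shared prefix, Gens for the new words, Drops for the
# removed ones, and a final Keep for "." (cf. Table 4; tokenization differs).
print(derive_actions(x, y))
```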

Figure 2: Model architectures of the baselines: (a) Table-to-Text, (b) Text-to-Text, and (c) ENCDECEDITOR. All models employ the attention and copy mechanisms.

4.4 Time Complexity

The time complexity of inference in FACTEDITOR is O(NM), where N is the number of words in the buffer and M is the number of triples. Scanning the data in the buffer is of complexity O(N). The generation of an action needs the execution of attention, which is of complexity O(M). Usually, N is much larger than M.

4.5 Baseline

We consider a baseline method using the encoder-decoder architecture, which takes the set of triples and the draft text as input and generates a revised text. We refer to the method as ENCDECEDITOR. The encoder of ENCDECEDITOR is the same as that of FACTEDITOR. The decoder is the standard attention and copy model, which creates and utilizes a context vector and predicts the next word at each time step.

The time complexity of inference in ENCDECEDITOR is O(N² + NM) (cf. Britz et al. (2017)). Note that in fact-based text editing, usually N is very large. That means that ENCDECEDITOR is less efficient than FACTEDITOR.

5 Experiment

We conduct experiments to make a comparison between FACTEDITOR and the baselines using the two datasets WEBEDIT and ROTOEDIT.

5.1 Experiment Setup

The main baseline is the encoder-decoder model ENCDECEDITOR, as explained above. We further consider three baselines: No-Editing, Table-to-Text, and Text-to-Text. In No-Editing, the draft text is directly used. In Table-to-Text, a revised text is generated from the triples using the encoder-decoder model. In Text-to-Text, a revised text is created from the draft text using the encoder-decoder model. Figure 2 gives illustrations of the baselines.

We evaluate the results of revised texts by the models from the viewpoints of fluency and fidelity. We utilize ExactMatch (EM), BLEU (Papineni et al., 2002), and SARI (Xu et al., 2016) scores⁵ as evaluation metrics for fluency. We also utilize precision, recall, and F1 score as evaluation metrics for fidelity. For WEBEDIT, we extract the entities from the generated text and the reference text and then calculate the precision, recall, and F1 scores. For ROTOEDIT, we use the information extraction tool provided by Wiseman et al. (2017) for calculation of the scores.

⁵We use a modified version of SARI where β equals 1.0, available at https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/sari_hook.py

For the embeddings of subject and object for both datasets and the embedding of the predicate for ROTOEDIT, we simply use the embedding lookup table. For the embedding of the predicate for WEBEDIT, we first tokenize the predicate, look up the embeddings of the lower-cased words from the table, and use the averaged embedding to deal with the OOV problem (Moryossef et al., 2019).

We tune the hyperparameters based on the BLEU score on a development set. For WEBEDIT, we set the sizes of embeddings, buffers, and triples to 300, and set the size of the stream to 600. For ROTOEDIT, we set the size of embeddings to 100 and set the sizes of buffers, triples, and stream to 200. The initial learning rate is 2e-3, and AMSGrad is used for automatically adjusting the learning rate (Reddi et al., 2018). Our implementation makes use of AllenNLP (Gardner et al., 2018).

5.2 Experimental Results

Quantitative evaluation

We present the performances of our proposed model FACTEDITOR and the baselines on fact-based text editing in Table 5. One can draw several conclusions from the results.

                          FLUENCY                             FIDELITY
Model            BLEU   SARI   KEEP   ADD    DELETE  EM      P%     R%     F1%
Baselines
No-Editing       66.67  31.51  78.62  3.91   12.02   0.      84.49  76.34  80.21
Table-to-Text    33.75  43.83  51.44  27.86  52.19   5.78    98.23  83.72  90.40
Text-to-Text     63.61  58.73  82.62  25.77  67.80   6.22    81.93  77.16  79.48
Fact-based text editing
ENCDECEDITOR     71.03  69.59  89.49  43.82  75.48   20.96   98.06  87.56  92.51
FACTEDITOR       75.68  72.20  91.84  47.69  77.07   24.80   96.88  89.74  93.17

(a) WEBEDIT

                          FLUENCY                             FIDELITY
Model            BLEU   SARI   KEEP   ADD    DELETE  EM      P%     R%     F1%
Baselines
No-Editing       74.95  39.59  95.72  0.05   23.01   0.      92.92  65.02  76.51
Table-to-Text    24.87  23.30  39.12  14.78  16.00   0.      48.01  24.28  32.33
Text-to-Text     78.07  60.25  97.29  13.04  70.43   0.02    63.62  41.08  49.92
Fact-based text editing
ENCDECEDITOR     83.36  71.46  97.69  44.02  72.69   2.49    78.80  52.21  62.81
FACTEDITOR       84.43  74.72  98.41  41.50  84.24   2.65    78.84  52.30  63.39

(b) ROTOEDIT

Table 5: Performances of FACTEDITOR and baselines on two datasets in terms of Fluency and Fidelity. EM stands for exact match.
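For WEBEDIT, the fidelity scores compare the entities found in the generated and reference texts. A minimal sketch, assuming entities are detected by exact string matching against the entities of the triples (the actual extraction may be more involved):

```python
def fidelity(generated, reference, entities):
    """Entity-level precision, recall, and F1 (illustrative sketch)."""
    gen_ents = {e for e in entities if e in generated}
    ref_ents = {e for e in entities if e in reference}
    tp = len(gen_ents & ref_ents)
    precision = tp / len(gen_ents) if gen_ents else 0.0
    recall = tp / len(ref_ents) if ref_ents else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```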

First, our proposed model, FACTEDITOR, achieves significantly better performances than the main baseline, ENCDECEDITOR, in terms of almost all measures. In particular, FACTEDITOR obtains significant gains in DELETE scores on both WEBEDIT and ROTOEDIT.

Second, the fact-based text editing models (FACTEDITOR and ENCDECEDITOR) significantly improve upon the other models in terms of fluency scores, and achieve similar performances in terms of fidelity scores.

Third, compared to No-Editing, Table-to-Text has higher fidelity scores, but lower fluency scores. Text-to-Text has almost the same fluency scores, but lower fidelity scores on ROTOEDIT.

Qualitative evaluation

We also manually evaluate 50 randomly sampled revised texts for WEBEDIT. We check whether the revised texts given by FACTEDITOR and ENCDECEDITOR include all the facts, and we categorize the factual errors made by the two models. Table 6 shows the results. One can see that FACTEDITOR covers more facts than ENCDECEDITOR and has fewer factual errors than ENCDECEDITOR.

                 Covered facts      Factual errors
                 CQT    UPARA       RPT   MS    USUP   DREL
ENCDECEDITOR     14     7           16    21    3      12
FACTEDITOR       24     4           9     19    1      3

Table 6: Evaluation results on 50 randomly sampled revised texts in WEBEDIT in terms of the numbers of correct editings (CQT), unnecessary paraphrasings (UPARA), repetitions (RPT), missing facts (MS), unsupported facts (USUP), and different relations (DREL).

FACTEDITOR has a larger number of correct editings (CQT) than ENCDECEDITOR for fact-based text editing. In contrast, ENCDECEDITOR often includes a larger number of unnecessary paraphrasings (UPARA) than FACTEDITOR.

There are four types of factual errors: fact repetition (RPT), fact missing (MS), fact unsupported (USUP), and relation difference (DREL). Both FACTEDITOR and ENCDECEDITOR often fail to insert missing facts (MS), but rarely insert unsupported facts (USUP). ENCDECEDITOR often generates the same facts multiple times (RPT) or facts with different relations (DREL). In contrast, FACTEDITOR seldom makes such errors.

Table 7 shows an example of results given by ENCDECEDITOR and FACTEDITOR. The revised texts of both ENCDECEDITOR and FACTEDITOR appear to be fluent, but that of FACTEDITOR has higher fidelity than that of ENCDECEDITOR. ENCDECEDITOR cannot effectively eliminate the description of an unsupported fact (in orange) appearing in the draft text. In contrast, FACTEDITOR can deal with the problem well. In addition, ENCDECEDITOR conducts an unnecessary substitution in the draft text (underlined). FACTEDITOR tends to avoid such unnecessary editing.

Set of triples: {(Ardmore Airport, runwayLength, 1411.0), (Ardmore Airport, 3rd runway SurfaceType, Poaceae), (Ardmore Airport, operatingOrganisation, Civil Aviation Authority of New Zealand), (Ardmore Airport, elevationAboveTheSeaLevel, 34.0), (Ardmore Airport, runwayName, 03R/21L)}

Draft text: Ardmore Airport, ICAO Location Identifier UTAA. Ardmore Airport 3rd runway is made of Poaceae and Ardmore Airport. 03R/21L is 1411.0 m long and Ardmore Airport is 34.0 above sea level.

Revised text: Ardmore Airport is operated by Civil Aviation Authority of New Zealand. Ardmore Airport 3rd runway is made of Poaceae and Ardmore Airport name is 03R/21L. 03R/21L is 1411.0 m long and Ardmore Airport is 34.0 above sea level.

ENCDECEDITOR: Ardmore Airport, ICAO Location Identifier UTAA, is operated by Civil Aviation Authority of New Zealand. Ardmore Airport 3rd runway is made of Poaceae and Ardmore Airport. 03R/21L is 1411.0 m long and Ardmore Airport is 34.0 m long.

FACTEDITOR: Ardmore Airport is operated by Civil Aviation Authority of New Zealand. Ardmore Airport 3rd runway is made of Poaceae and Ardmore Airport. 03R/21L is 1411.0 m long and Ardmore Airport is 34.0 above sea level.

Table 7: Example of generated revised texts given by ENCDECEDITOR and FACTEDITOR on WEBEDIT. Entities in green appear in both the set of triples and the draft text. Entities in orange only appear in the draft text. Entities in blue should appear in the revised text but do not appear in the draft text.

Runtime analysis

We conduct a runtime analysis on FACTEDITOR and the baselines in terms of the number of processed words per second, on both WEBEDIT and ROTOEDIT. Table 8 gives the results when the batch size is 128 for all methods. Table-to-Text is the fastest, followed by FACTEDITOR. FACTEDITOR is always faster than ENCDECEDITOR, apparently because it has a lower time complexity, as explained in Section 4. The texts in WEBEDIT are relatively short, and thus FACTEDITOR and ENCDECEDITOR have similar runtime speeds. In contrast, the texts in ROTOEDIT are relatively long, and thus FACTEDITOR executes approximately two times faster than ENCDECEDITOR.

                 WEBEDIT   ROTOEDIT
Table-to-Text    4,083     1,834
Text-to-Text     2,751     581
ENCDECEDITOR     2,487     505
FACTEDITOR       3,295     1,412

Table 8: Runtime analysis (# of words/second). Table-to-Text always shows the fastest performance (boldfaced). FACTEDITOR shows the second fastest runtime performance (underlined).

6 Conclusion

In this paper, we have defined a new task referred to as fact-based text editing and made two contributions to research on the problem. First, we have proposed a data construction method for fact-based text editing and created two datasets. Second, we have proposed a model for fact-based text editing, named FACTEDITOR, which performs the task by generating a sequence of actions. Experimental results show that the proposed model FACTEDITOR performs better and faster than the baselines, including an encoder-decoder model.

Acknowledgments

We would like to thank the reviewers for their insightful comments.

References

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In International Conference on Learning Representations.

Denny Britz, Melody Guan, and Minh-Thang Luong. 2017. Efficient Attention using a Fixed-Size Memory Representation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 392–400, Copenhagen, Denmark. Association for Computational Linguistics.

Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734, Doha, Qatar. Association for Computational Linguistics.

William B. Dolan and Chris Brockett. 2005. Automatically constructing a corpus of sentential paraphrases. In Proceedings of the Third International Workshop on Paraphrasing (IWP2005).

Yue Dong, Zichao Li, Mehdi Rezagholizadeh, and Jackie Chi Kit Cheung. 2019. EditNTS: An Neural Programmer-Interpreter Model for Sentence Simplification through Explicit Editing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3393–3402, Florence, Italy. Association for Computational Linguistics.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-Based Dependency Parsing with Stack Long Short-Term Memory. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 334–343, Beijing, China. Association for Computational Linguistics.

Claire Gardent, Anastasia Shimorina, Shashi Narayan, and Laura Perez-Beltrachini. 2017. Creating training corpora for NLG micro-planners. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 179–188, Vancouver, Canada. Association for Computational Linguistics.

Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep Dasigi, Nelson F. Liu, Matthew Peters, Michael Schmitz, and Luke Zettlemoyer. 2018. AllenNLP: A Deep Semantic Natural Language Processing Platform. In Proceedings of Workshop for NLP Open Source Software (NLP-OSS), pages 1–6, Melbourne, Australia. Association for Computational Linguistics.

Albert Gatt and Emiel Krahmer. 2018. Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation. Journal of Artificial Intelligence Research (JAIR), 61:65–170.

Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O.K. Li. 2016. Incorporating copying mechanism in sequence-to-sequence learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1631–1640, Berlin, Germany. Association for Computational Linguistics.

Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bowen Zhou, and Yoshua Bengio. 2016. Pointing the unknown words. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 140–149, Berlin, Germany. Association for Computational Linguistics.

Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, and Percy Liang. 2018. Generating Sentences by Editing Prototypes. Transactions of the Association for Computational Linguistics, 6:437–450.

Tatsunori B. Hashimoto, Kelvin Guu, Yonatan Oren, and Percy S. Liang. 2018. A Retrieve-and-Edit Framework for Predicting Structured Outputs. In Advances in Neural Information Processing Systems, pages 10052–10062. Curran Associates, Inc.

Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, and Eric P. Xing. 2017. Toward Controlled Generation of Text. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1587–1596, International Convention Centre, Sydney, Australia. PMLR.

Kentaro Inui, Atsushi Fujita, Tetsuro Takahashi, Ryu Iida, and Tomoya Iwakura. 2003. Text simplification for reading assistance: A project note. In Proceedings of the Second International Workshop on Paraphrasing, pages 9–16, Sapporo, Japan. Association for Computational Linguistics.

Hayate Iso, Yui Uehara, Tatsuya Ishigaki, Hiroshi Noji, Eiji Aramaki, Ichiro Kobayashi, Yusuke Miyao, Naoaki Okazaki, and Hiroya Takamura. 2019. Learning to Select, Track, and Generate for Data-to-Text. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pages 2102–2113, Florence, Italy.

Kevin Knight and Ishwar Chander. 1994. Automated Postediting of Documents. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 94, pages 779–784.

Rémi Lebret, David Grangier, and Michael Auli. 2016. Neural Text Generation from Structured Data with Application to the Biography Domain. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1203–1213, Austin, Texas. Association for Computational Linguistics.

Zichao Li, Xin Jiang, Lifeng Shang, and Hang Li. 2018. Paraphrase Generation with Deep Reinforcement Learning. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3865–3878, Brussels, Belgium. Association for Computational Linguistics.

Eric Malmi, Sebastian Krause, Sascha Rothe, Daniil Mirylenka, and Aliaksei Severyn. 2019. Encode, Tag, Realize: High-Precision Text Editing. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5053–5064, Hong Kong, China. Association for Computational Linguistics.

Amit Moryossef, Yoav Goldberg, and Ido Dagan. 2019. Step-by-Step: Separating Planning from Realization in Neural Data-to-Text Generation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2267–2277, Minneapolis, Minnesota. Association for Computational Linguistics.

Hwee Tou Ng, Siew Mei Wu, Ted Briscoe, Christian Hadiwinoto, Raymond Hendy Susanto, and Christopher Bryant. 2014. The CoNLL-2014 Shared Task on Grammatical Error Correction. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, pages 1–14, Baltimore, Maryland. Association for Computational Linguistics.

Jekaterina Novikova, Ondřej Dušek, and Verena Rieser. 2017. The E2E Dataset: New Challenges For End-to-End Generation. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pages 201–206, Saarbrücken, Germany. Association for Computational Linguistics.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.

Hao Peng, Ankur Parikh, Manaal Faruqui, Bhuwan Dhingra, and Dipanjan Das. 2019. Text Generation with Exemplar-based Adaptive Decoding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2555–2565, Minneapolis, Minnesota. Association for Computational Linguistics.

Ratish Puduppully, Li Dong, and Mirella Lapata. 2019. Data-to-text Generation with Entity Modeling. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2023–2035, Florence, Italy. Association for Computational Linguistics.

Sashank J. Reddi, Satyen Kale, and Sanjiv Kumar. 2018. On the convergence of Adam and beyond. In International Conference on Learning Representations.

Scott Reed and Nando De Freitas. 2016. Neural Programmer-Interpreters. In International Conference on Learning Representations.

Ehud Reiter and Robert Dale. 2000. Building Natural Language Generation Systems. Studies in Natural Language Processing. Cambridge University Press.

Tianxiao Shen, Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2017. Style Transfer from Non-Parallel Text by Cross-Alignment. In Advances in Neural Information Processing Systems 30, pages 6830–6841. Curran Associates, Inc.

Michel Simard, Cyril Goutte, and Pierre Isabelle. 2007. Statistical Phrase-Based Post-Editing. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pages 508–515, Rochester, New York. Association for Computational Linguistics.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to Sequence Learning with Neural Networks. In Advances in Neural Information Processing Systems 27, pages 3104–3112. Curran Associates, Inc.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30, pages 5998–6008. Curran Associates, Inc.

Thuy-Trang Vu and Gholamreza Haffari. 2018. Automatic Post-Editing of Machine Translation: A Neural Programmer-Interpreter Approach. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3048–3053, Brussels, Belgium. Association for Computational Linguistics.

Taro Watanabe and Eiichiro Sumita. 2015. Transition-based neural constituent parsing. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1169–1179, Beijing, China. Association for Computational Linguistics.

Sam Wiseman, Stuart Shieber, and Alexander Rush. 2017. Challenges in Data-to-Document Generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2253–2263, Copenhagen, Denmark. Association for Computational Linguistics.

Sander Wubben, Antal van den Bosch, and Emiel Krahmer. 2012. Sentence simplification by monolingual machine translation. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1015–1024, Jeju Island, Korea. Association for Computational Linguistics.

Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, and Chris Callison-Burch. 2016. Optimizing Statistical Machine Translation for Text Simplification. Transactions of the Association for Computational Linguistics, 4:401–415.

Diyi Yang, Aaron Halfaker, Robert Kraut, and Eduard Hovy. 2017. Identifying Semantic Edit Intentions from Revisions in Wikipedia. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2000–2010, Copenhagen, Denmark. Association for Computational Linguistics.

Pengcheng Yin, Graham Neubig, Miltiadis Allamanis, Marc Brockschmidt, and Alexander L. Gaunt. 2019. Learning to Represent Edits. In International Conference on Learning Representations.

Sanqiang Zhao, Rui Meng, Daqing He, Andi Saptono, and Bambang Parmanto. 2018. Integrating Transformer and Paraphrase Rules for Sentence Simplification. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3164–3173, Brussels, Belgium. Association for Computational Linguistics.

Wei Zhao, Liang Wang, Kewei Shen, Ruoyu Jia, and Jingming Liu. 2019. Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 156–165, Minneapolis, Minnesota. Association for Computational Linguistics.