Fact-based Text Editing

Hayate Iso, Chao Qiao, Hang Li

The status quo of Text Editing

‣ A model, p(y | x), learns how to edit the input x into the desired output y.

Style Transfer x = “This is the worst game!” y = “This is the best game!”

Simplification x = “Last year, I read the book that is authored by Jane” y = “Jane wrote a book. I read it last year”

Grammatical Error Correction x = “Fish firming uses the lots of specials” y = “Fish firming uses a lot of specials”

2 Fact-based Text Editing

Hayate Iso†* (†Nara Institute of Science and Technology), Chao Qiao‡, Hang Li‡ (‡ByteDance AI Lab)
[email protected], {qiaochao, lihang.lh}@bytedance.com

What is Fact-based Text Editing?

• The goal of fact-based text editing is to revise a given document to better describe the facts in a knowledge base (e.g., several triples).

Set of triples:
{ (Baymax, creator, Douncan Rouleau),
  (Douncan Rouleau, nationality, American),
  (Baymax, creator, Steven T. Seagle),
  (Steven T. Seagle, nationality, American),
  (Baymax, series, Big Hero 6),
  (Big Hero 6, starring, Scott Adsit) }

Draft text: Baymax was created by Duncan Rouleau, a winner of Eagle Award. Baymax is a character in Big Hero 6.

Revised text: Baymax was created by American creators Duncan Rouleau and Steven T. Seagle. Baymax is a character in Big Hero 6 which stars Scott Adsit.

Abstract
We propose a novel text editing task, referred to as fact-based text editing, in which the goal is to revise a given document to better describe the facts in a knowledge base (e.g., several triples). The task is important in practice because reflecting the truth is a common requirement in text editing. First, we propose a method for automatically generating a dataset for research on fact-based text editing, where each instance consists of a draft text, a revised text, and several facts represented in triples. We apply the method to two public table-to-text datasets, obtaining two new datasets consisting of 233k and 37k instances, respectively. Next, we propose a new neural network architecture for fact-based text editing, called FACTEDITOR, which edits a draft text by referring to given facts using a buffer, a stream, and a memory. A straightforward approach to address the problem would be to employ an encoder-decoder model. Our experimental results on the two datasets show that FACTEDITOR outperforms the encoder-decoder approach in terms of fidelity and fluency. The results also show that FACTEDITOR conducts inference faster than the encoder-decoder approach.

Table 1: Example of fact-based text editing. Facts are represented in triples. The facts in green appear in both draft text and triples. The facts in orange are present in the draft text, but absent from the triples. The facts in blue do not appear in the draft text, but in the triples. The task of fact-based text editing is to edit the draft text on the basis of the triples, by deleting unsupported facts and inserting missing facts while retaining supported facts.

1 Introduction
Automatic editing of text by computer is an important application, which can help human writers to write better documents in terms of accuracy, fluency, etc. The task is easier and more practical than the automatic generation of texts from scratch and is attracting attention recently (Yang et al., 2017; Yin et al., 2019). In this paper, we consider a new and specific setting of it, referred to as fact-based text editing, in which a draft text and several facts (represented in triples) are given, and the system aims to revise the text by adding missing facts and deleting unsupported facts. Table 1 gives an example of the task.

As far as we know, no previous work has addressed the problem. In text-to-text generation, given a text, the system automatically creates another text, where the new text can be a text in another language (machine translation), a summary of the original text (summarization), or a text in better form (text editing). In table-to-text generation, given a table containing facts in triples, the system automatically composes a text, which describes the facts. The former is a text-to-text problem, and the latter a table-to-text problem. In comparison, fact-based text editing can be viewed as a ‘text&table-to-text’ problem.

* The work was done when Hayate Iso was a research intern at ByteDance AI Lab.

Overview of this research

• Data Creation:

• We have proposed a data construction method for fact-based text editing and created two datasets.

• Fact-based Text Editing model:

• We have proposed a model for fact-based text editing, which performs the task by generating a sequence of actions, instead of words.

4 Data Creation: Factual Masking

• For each table-to-text pair in the training data, we create a template by factual masking.

Τ = {(Baymax, voice, Scott_Adsit)}
x = “Scott_Adsit does the voice for Baymax”

Masking

Τ’ = {(AGENT-1, voice, PATIENT-1)}
x’ = “PATIENT-1 does the voice for AGENT-1”

Storing → Set of templates for Τ’
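The masking step can be sketched as follows. The function name `mask_facts`, its AGENT/PATIENT numbering scheme, and the whitespace-level string replacement are illustrative assumptions, not the authors' actual code; the paper also uses a BRIDGE role for entities acting as both subject and object, which this sketch omits.

```python
def mask_facts(triples, text):
    """Replace entity strings with typed slots (AGENT-i for subjects,
    PATIENT-i for objects), yielding a reusable template."""
    mapping, counters = {}, {"AGENT": 0, "PATIENT": 0}
    for subj, rel, obj in triples:
        for entity, role in ((subj, "AGENT"), (obj, "PATIENT")):
            if entity not in mapping:
                counters[role] += 1
                mapping[entity] = f"{role}-{counters[role]}"
    # Mask the triples and the text with the same entity-to-slot mapping.
    masked_triples = [(mapping[s], r, mapping[o]) for s, r, o in triples]
    masked_text = text
    for entity, slot in mapping.items():
        masked_text = masked_text.replace(entity, slot)
    return masked_triples, masked_text

T = [("Baymax", "voice", "Scott_Adsit")]
x = "Scott_Adsit does the voice for Baymax"
T_masked, x_masked = mask_facts(T, x)
# T_masked == [("AGENT-1", "voice", "PATIENT-1")]
# x_masked == "PATIENT-1 does the voice for AGENT-1"
```

The masked template `x_masked` would then be stored under the key `T_masked`, so that any table with the same relational shape can retrieve it later.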

5 Data Creation: Retrieve LCS matched template

Τ’ = {(AGENT-1, occupation, PATIENT-3),
     (AGENT-1, was_a_crew_member_of, BRIDGE-1),
     (BRIDGE-1, operator, PATIENT-2)}
y’ = “AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission that was operated by PATIENT-2.”

Set of templates for {(AGENT-1, occupation, PATIENT-3), (AGENT-1, was_a_crew_member_of, BRIDGE-1)}

Retrieve

x̂’ = AGENT-1 served as PATIENT-3 was a crew member of the BRIDGE-1 mission.
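The retrieval step can be sketched with a standard longest-common-subsequence dynamic program over token sequences. `lcs_len` and `retrieve_reference` are illustrative names, and the flat list `store` stands in for the paper's template store keyed by triple-template sets.

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of token lists a and b
    (classic O(len(a) * len(b)) dynamic program)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, u in enumerate(a):
        for j, v in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if u == v else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def retrieve_reference(y_tokens, template_store):
    """Return the stored template sharing the longest common subsequence with y'."""
    return max(template_store, key=lambda cand: lcs_len(y_tokens, cand))

y = "AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission that was operated by PATIENT-2 .".split()
store = [
    "AGENT-1 served as PATIENT-3 was a crew member of the BRIDGE-1 mission .".split(),
    "AGENT-1 was born in PATIENT-1 .".split(),
]
best = retrieve_reference(y, store)
# best is the first template: it shares AGENT-1 ... as PATIENT-3 ... BRIDGE-1 mission . with y'
```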

6 Data Creation: Token Alignment

x̂’ = AGENT-1 served as PATIENT-3 was a crew member of the BRIDGE-1 mission .

y’ = AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission that was operated by PATIENT-2 . To delete

7 Data Creation: Delete Substring

x̂’ = AGENT-1 served as PATIENT-3 was a crew member of the BRIDGE-1 mission .

y’ = AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission that was operated by PATIENT-2 . To delete

x’ = AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission that was operated by PATIENT-2 .

Keep Keep Delete

x’ = AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission.
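The alignment-and-delete step above hinges on marking which tokens of y’ lie on a longest common subsequence with the reference x̂’. A sketch of that marking step follows; `lcs_keep_mask` is an illustrative helper, and the final removal of the span realizing the extra triple is omitted.

```python
def lcs_keep_mask(y, x_hat):
    """Flag each token of y' that lies on a longest common subsequence with
    the reference template x_hat'; unflagged tokens are deletion candidates
    when they realize triples absent from the reference."""
    n, m = len(y), len(x_hat)
    dp = [[0] * (m + 1) for _ in range(n + 1)]  # dp[i][j] = LCS of y[i:], x_hat[j:]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if y[i] == x_hat[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    keep, i, j = [False] * n, 0, 0
    while i < n and j < m:          # greedy traceback of one optimal alignment
        if y[i] == x_hat[j]:
            keep[i] = True; i += 1; j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return keep

y = "AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission that was operated by PATIENT-2 .".split()
x_hat = "AGENT-1 served as PATIENT-3 was a crew member of the BRIDGE-1 mission .".split()
shared = [tok for tok, k in zip(y, lcs_keep_mask(y, x_hat)) if k]
# shared == ["AGENT-1", "as", "PATIENT-3", "BRIDGE-1", "mission", "."]
```

The non-shared tail "that was operated by PATIENT-2" is what the method deletes here, while non-shared words like "performed" and "on" are still copied because they do not realize the extra triple.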

8 Data Creation: Fact Unmasking

• Recover the factual information using the original facts, Τ.

x’ = AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission.

Τ = {(Alan_Bean, occupation, Test_pilot),
     (Alan_Bean, was_a_crew_member_of, Apollo_12),
     (Apollo_12, operator, NASA)}

Unmask

x = Alan_Bean performed as Test_pilot on Apollo_12 mission.
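Unmasking is the inverse substitution of the masking step. A minimal sketch, assuming a tokenized template and a slot-to-entity mapping recovered from the original facts Τ (`unmask` is an illustrative name):

```python
def unmask(template, slot_to_entity):
    """Lexicalize a draft template: substitute each slot token with its entity."""
    return " ".join(slot_to_entity.get(tok, tok) for tok in template.split())

slots = {"AGENT-1": "Alan_Bean", "PATIENT-3": "Test_pilot", "BRIDGE-1": "Apollo_12"}
draft = unmask("AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission .", slots)
# draft == "Alan_Bean performed as Test_pilot on Apollo_12 mission ."
```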

Fact-based Text Editing instance

Τ = {(Alan_Bean, occupation, Test_pilot), (Alan_Bean, was a crew member of, Apollo_12), (Apollo_12, operator, NASA)}

x = Alan_Bean performed as Test_pilot on Apollo_12 mission.

y = Alan_Bean performed as Test_pilot on Apollo_12 mission that was operated by NASA.

9 Data Creation: Statistics

• We applied our data creation method to two publicly available datasets, WebNLG (Gardent et al., 2017) and RotoWire (Wiseman et al., 2017), to create fact-based text editing datasets, WebEdit and RotoEdit.

Note that the process of data creation is the reverse of that of text editing.

Given a pair of Τ’ and y’, our method retrieves another pair denoted as Τ̂’ and x̂’, such that y’ and x̂’ have the longest common subsequence. We refer to x̂’ as a reference template. There are two possibilities: Τ̂’ is a subset or a superset of Τ’. (We ignore the case in which they are identical.) Our method then manages to change y’ to a draft template denoted as x’ on the basis of the relation between Τ’ and Τ̂’. If Τ̂’ ⊂ Τ’, then the draft template x’ created is for insertion, and if Τ̂’ ⊃ Τ’, then the draft template x’ created is for deletion.

For insertion, the revised template y’ and the reference template x̂’ share subsequences, and the set of triples Τ’\Τ̂’ appears in y’ but not in x̂’. Our method keeps the shared subsequences in y’, removes the subsequences in y’ about Τ’\Τ̂’, and copies the rest of the words in y’, to create the draft template x’. Table 2a gives an example. The shared subsequences “AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission” are kept. The set of triple templates Τ’\Τ̂’ is {(BRIDGE-1, operator, PATIENT-2)}. The subsequence “that was operated by PATIENT-2” is removed. Note that the subsequence “served” is not copied because it is not shared by y’ and x̂’.

For deletion, the revised template y’ and the reference template x̂’ share subsequences. The set of triples Τ̂’\Τ’ appears in x̂’ but not in y’. Our method retains the shared subsequences in y’, copies the subsequences in x̂’ about Τ̂’\Τ’, and copies the rest of the words in y’, to create the draft template x’. Table 2b gives an example. The subsequences “AGENT-1 was created by BRIDGE-1 and PATIENT-2” are retained. The set of triple templates Τ̂’\Τ’ is {(AGENT-1, fullName, PATIENT-1)}. The subsequence “whose full name is PATIENT-1” is copied. Note that the subsequence “the character of” is not copied because it is not shared by y’ and x̂’.

After getting the draft template x’, our method lexicalizes it to obtain a draft text x, where the lexicons (entity words) are collected from the corresponding revised text y.

We obtain two datasets with our method, referred to as WEBEDIT and ROTOEDIT, respectively. Table 3 gives their statistics.

              WEBEDIT                 ROTOEDIT
         TRAIN  VALID  TEST     TRAIN  VALID  TEST
#D        181k    23k   29k       27k   5.3k  4.9k
#Wd       4.1M   495k  624k      4.7M   904k  839k
#Wr       4.2M   525k  649k      5.6M   1.1M  1.0M
#S        403k    49k   62k      209k    40k   36k

Table 3: Statistics of WEBEDIT and ROTOEDIT, where #D is the number of instances, #Wd and #Wr are the total numbers of words in the draft texts and the revised texts, respectively, and #S is the total number of sentences. (Code and data: https://github.com/isomap/factedit)

In the WEBEDIT data, sometimes entities only appear in the subj’s of triples. In such cases, we also make them appear in the obj’s. To do so, we introduce an additional triple (ROOT, IsOf, subj) for each subj, where ROOT is a dummy entity.

4 FACTEDITOR
In this section, we describe our proposed model for fact-based text editing, referred to as FACTEDITOR.

4.1 Model Architecture
FACTEDITOR transforms a draft text into a revised text based on given triples. The model consists of three components: a buffer for storing the draft text and its representations, a stream for storing the revised text and its representations, and a memory for storing the triples and their representations, as shown in Figure 1.

FACTEDITOR scans the text in the buffer, copies the parts of text from the buffer into the stream if they are described in the triples in the memory, deletes the parts of the text if they are not mentioned in the triples, and inserts new parts of text into the stream which are only present in the triples.

The architecture of FACTEDITOR is inspired by those in sentence parsing (Dyer et al., 2015; Watanabe and Sumita, 2015). The actual processing of FACTEDITOR is to generate a sequence of words into the stream from the given sequence of words in the buffer and the set of triples in the memory. A neural network is employed to control the entire editing process.

4.2 Neural Network
Initialization
FACTEDITOR first initializes the representations of content in the buffer, stream, and memory. There is a feed-forward network associated with the memory, utilized to create the embeddings of triples. Let M denote the number of triples.

10 How to model Fact-based Text Editing?

• A natural choice is an encoder-decoder model with attention & copy that generates the revised text from scratch.
✘ Unnecessary word replacement can happen.
✘ Inefficient for long inputs & outputs.

Attention & Copy

Table Encoder Text Encoder Decoder

11 Approach: Editing through Tagging

• Instead of generating words from scratch, the model just predicts predefined actions.
✓ The model focuses only on the explicit edits.
✓ Robust to the length of the input & output.

Draft text x:       Bakewell pudding is Dessert that can be served Warm or cold .
Revised text y:     Bakewell pudding is Dessert that originates from Derbyshire Dales .
Action sequence a:  Keep Keep Keep Keep Gen(originates) Gen(from) Gen(Derbyshire Dales) Drop Drop Drop Drop Keep

Table 4: An example of action sequence derived from a draft text and revised text.
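The derivation of an action sequence like the one in Table 4 can be sketched with an LCS dynamic program, preferring Gen over Drop on ties as the paper specifies. `derive_actions` is an illustrative name, tokens are simplified to underscore-joined entities, and the triples (which the paper also consults when labeling) are ignored here.

```python
def derive_actions(x, y):
    """Derive Keep/Drop/Gen actions rewriting draft x into revised y via
    their longest common subsequence. Keep/Drop consume a buffer word;
    Gen(w) emits a word of y without consuming the buffer."""
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]  # dp[i][j] = LCS of x[i:], y[j:]
    for i in range(len(x) - 1, -1, -1):
        for j in range(len(y) - 1, -1, -1):
            if x[i] == y[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    actions, i, j = [], 0, 0
    while i < len(x):
        if j < len(y) and x[i] == y[j]:
            actions.append("Keep"); i += 1; j += 1
        elif j < len(y) and dp[i][j + 1] >= dp[i + 1][j]:
            actions.append(f"Gen({y[j]})"); j += 1   # word only in y: generate (preferred on ties)
        else:
            actions.append("Drop"); i += 1           # word only in x: drop it
    actions.extend(f"Gen({w})" for w in y[j:])       # any leftover words of y
    return actions

x = "Bakewell_pudding is Dessert that can be served Warm_or_cold .".split()
y = "Bakewell_pudding is Dessert that originates from Derbyshire_Dales .".split()
acts = derive_actions(x, y)
# acts matches Table 4: Keep x4, Gen(originates), Gen(from), Gen(Derbyshire_Dales), Drop x4, Keep
```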

12

…the combination of the context vector of triples t̃_t and the linearly projected embedding of word w into the stream, as shown in Fig. 1c. The state of the stream is updated with the LSTM as s_{t+1} = LSTM([t̃_t; W_p y_t], s_t), where y_t is the embedding of the generated word y_t and W_p denotes parameters. In addition, FACTEDITOR copies the generated word y_t into the stream.

FACTEDITOR continues the actions until the buffer becomes empty.

Word generation
FACTEDITOR generates a word y_t at time t, when the action is Gen:

    P_gen(y_t | z_t) = softmax(W_y · z_t)

where W_y denotes parameters. To avoid generation of OOV words, FACTEDITOR exploits the copy mechanism. It calculates the probability of copying the object o_j of triple t_j:

    P_copy(o_j | z_t) ∝ exp(v_c⊤ · tanh(W_c · [z_t; t_j]))

where v_c and W_c denote parameters. It also calculates the probability of gating:

    p_gate = sigmoid(w_g⊤ · z_t + b_g)

where w_g and b_g are parameters. Finally, it calculates the probability of generating a word y_t through either generation or copying:

    P(y_t | z_t) = p_gate · P_gen(y_t | z_t) + (1 − p_gate) · Σ_{j=1, o_j=y_t}^{M} P_copy(o_j | z_t)

where it is assumed that the triples in the memory have the same subject and thus only objects need to be copied.

4.3 Model Learning
The conditional probability of the sequence of actions a = (a_1, a_2, …, a_T) given the set of triples Τ and the sequence of input words x can be written as

    P(a | Τ, x) = Π_{t=1}^{T} P(a_t | z_t)

where P(a_t | z_t) is the conditional probability of action a_t given state z_t at time t, and T denotes the number of actions.

The conditional probability of the sequence of generated words y = (y_1, y_2, …, y_T) given the sequence of actions a can be written as

    P(y | a) = Π_{t=1}^{T} P(y_t | a_t)

where P(y_t | a_t) is the conditional probability of generated word y_t given action a_t at time t, which is calculated as

    P(y_t | a_t) = P(y_t | z_t) if a_t = Gen; 1 otherwise.

Note that not all positions have a generated word. In such a case, y_t is simply a null word.

The learning of the model is carried out via supervised learning. The objective of learning is to minimize the negative log-likelihood of P(a | Τ, x) and P(y | a):

    L(θ) = − Σ_{t=1}^{T} { log P(a_t | z_t) + log P(y_t | a_t) }

where θ denotes the parameters.

A training instance consists of a pair of draft text and revised text, as well as a set of triples, denoted as x, y, and Τ respectively. For each instance, our method derives a sequence of actions denoted as a, in a similar way as that in Dong et al. (2019). It first finds the longest common subsequence between x and y, and then selects an action of Keep, Drop, or Gen at each position, according to how y is obtained from x and Τ (cf., Tab. 4). Action Gen is preferred over action Drop when both are valid.

A running example: Keep Keep

Stream Buffer

that can be served Warm_or_Cold . Bakewell_pudding is Dessert

(Bakewell_pudding, region, Derbyshire_Dales) Triples { (Bakewell_pudding, course, Dessert) } 13 A running example: Keep

Stream push pop Buffer

Dessert that that can be served Warm_or_Cold . Bakewell_pudding is

(Bakewell_pudding, region, Derbyshire_Dales) Triples { (Bakewell_pudding, course, Dessert) }

14 A running example: Gen Gen(originates)

Stream Buffer

is Dessert that can be served Warm_or_Cold.

(Bakewell_pudding, region, Derbyshire_Dales) Triples { (Bakewell_pudding, course, Dessert) } 15 A running example: Gen emb

Stream push Buffer

Dessert that originates can be served Warm_or_Cold. is

(Bakewell_pudding, region, Derbyshire_Dales) Triples { (Bakewell_pudding, course, Dessert) } 16 A running example: Drop Drop

Stream Buffer

originates from Derbyshire_Dales can be served Warm_or_Cold.

(Bakewell_pudding, region, Derbyshire_Dales) Triples { (Bakewell_pudding, course, Dessert) } 17 A running example: Drop

Stream pop Buffer

originates from Derbyshire_Dales can be served Warm_or_Cold.

(Bakewell_pudding, region, Derbyshire_Dales) Triples { (Bakewell_pudding, course, Dessert) }
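The Keep/Gen/Drop walk-through above can be replayed with a minimal control-loop sketch. `fact_edit`, the scripted `choose_action` policy, and `generate_word` are illustrative stand-ins for the paper's neural components, which score each action from the buffer, stream, and memory states.

```python
def fact_edit(draft_tokens, triples, choose_action, generate_word):
    """Skeleton of the editing loop: scan the buffer left to right; Keep moves
    the head word to the stream, Drop discards it, and Gen inserts a new word
    into the stream without consuming the buffer."""
    buffer = list(draft_tokens)   # draft text, consumed left to right
    stream = []                   # revised text under construction
    while buffer:
        action = choose_action(buffer, stream, triples)
        if action == "Keep":
            stream.append(buffer.pop(0))
        elif action == "Drop":
            buffer.pop(0)
        else:  # "Gen"
            stream.append(generate_word(buffer, stream, triples))
    return stream

# Replay the running example with a scripted policy instead of a neural one.
actions = iter(["Keep", "Keep", "Keep", "Keep", "Gen", "Gen", "Gen",
                "Drop", "Drop", "Drop", "Drop", "Keep"])
words = iter(["originates", "from", "Derbyshire_Dales"])
revised = fact_edit(
    "Bakewell_pudding is Dessert that can be served Warm_or_Cold .".split(),
    [("Bakewell_pudding", "region", "Derbyshire_Dales"),
     ("Bakewell_pudding", "course", "Dessert")],
    lambda *_: next(actions),
    lambda *_: next(words),
)
# revised == "Bakewell_pudding is Dessert that originates from Derbyshire_Dales .".split()
```

Because Gen does not consume the buffer while Keep and Drop do, the loop terminates exactly when the draft has been fully scanned, which is the stopping condition the paper describes.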

18 Experimental Results

• The proposed model, FactEditor, shows generally better performance.

WebEdit

RotoEdit

Further results are in the paper 19 Examples

Set of triples:
{ (Ardmore Airport, runwayLength, 1411.0),
  (Ardmore Airport, 3rd runway SurfaceType, Poaceae),
  (Ardmore Airport, operatingOrganisation, Civil Aviation Authority of New Zealand),
  (Ardmore Airport, elevationAboveTheSeaLevel, 34.0),
  (Ardmore Airport, runwayName, 03R/21L) }

Draft text: Ardmore Airport , ICAO Location Identifier UTAA . Ardmore Airport 3rd runway is made of Poaceae and Ardmore Airport . 03R/21L is 1411.0 m long and Ardmore Airport is 34.0 above sea level .

Revised text: Ardmore Airport is operated by Civil Aviation Authority of New Zealand . Ardmore Airport 3rd runway is made of Poaceae and Ardmore Airport name is 03R/21L . 03R/21L is 1411.0 m long and Ardmore Airport is 34.0 above sea level .

ENCDECEDITOR: Ardmore Airport , ICAO Location Identifier UTAA , is operated by Civil Aviation Authority of New Zealand . Ardmore Airport 3rd runway is made of Poaceae and Ardmore Airport . 03R/21L is 1411.0 m long and Ardmore Airport is 34.0 m long .

FACTEDITOR: Ardmore Airport is operated by Civil Aviation Authority of New Zealand . Ardmore Airport 3rd runway is made of Poaceae and Ardmore Airport . 03R/21L is 1411.0 m long and Ardmore Airport is 34.0 above sea level .

Fluency: EncDecEditor ☺, FactEditor ☺
Adequacy: EncDecEditor ☹, FactEditor ☺
Unnecessary paraphrasing: EncDecEditor ☹, FactEditor ☺

Table 7: Example of generated revised texts given by ENCDECEDITOR and FACTEDITOR on WEBEDIT. Entities in green appear in both the set of triples and the draft text. Entities in orange only appear in the draft text. Entities in blue should appear in the revised text but do not appear in the draft text.

20

ENCDECEDITOR fails to remove the description about an unsupported fact (in orange) appearing in the draft text. In contrast, FACTEDITOR can deal with the problem well. In addition, ENCDECEDITOR conducts an unnecessary substitution in the draft text (underlined). FACTEDITOR tends to avoid such unnecessary editing.

Runtime analysis
We conduct runtime analysis on FACTEDITOR and the baselines in terms of the number of processed words per second, on both WEBEDIT and ROTOEDIT. Table 8 gives the results when the batch size is 128 for all methods. Table-to-Text is the fastest, followed by FACTEDITOR. FACTEDITOR is always faster than ENCDECEDITOR, apparently because it has a lower time complexity, as explained in Section 4. The texts in WEBEDIT are relatively short, and thus FACTEDITOR and ENCDECEDITOR have similar runtime speeds. In contrast, the texts in ROTOEDIT are relatively long, and thus FACTEDITOR executes approximately two times faster than ENCDECEDITOR.

                 WEBEDIT   ROTOEDIT
Table-to-Text      4,083      1,834
Text-to-Text       2,751        581
ENCDECEDITOR       2,487        505
FACTEDITOR         3,295      1,412

Table 8: Runtime analysis (# of words/second). Table-to-Text always shows the fastest performance (boldfaced). FACTEDITOR shows the second fastest runtime performance (underlined).

6 Conclusion
In this paper, we have defined a new task referred to as fact-based text editing and made two contributions to research on the problem. First, we have proposed a data construction method for fact-based text editing and created two datasets. Second, we have proposed a model for fact-based text editing, named FACTEDITOR, which performs the task by generating a sequence of actions. Experimental results show that the proposed model FACTEDITOR performs better and faster than the baselines, including an encoder-decoder model.

Acknowledgments
We would like to thank the reviewers for their insightful comments.

21 Runtime analysis

• FactEditor shows the 2nd fastest inference performance.
• It processes about three times faster than EncDecEditor on the RotoEdit dataset.

Summary

• We introduced the new task, Fact-based Text Editing.

• We have proposed a data construction method for fact-based text editing and created two datasets.

• We have proposed a model for fact-based text editing, which performs the task by generating a sequence of actions.

Code & Data available at https://github.com/isomap/factedit

22