Text Editing by Command

Felix Faltings†* Michel Galley♠ Gerold Hintz♠ Chris Brockett♠ Chris Quirk♠ Jianfeng Gao♠ Bill Dolan♠
†Department of Computer Science, ETH Zürich  ♠Microsoft Research
[email protected]  {mgalley,gehint,chrisbkt,chrisq,jfgao,billdol}@microsoft.com

Abstract

A prevailing paradigm in neural text generation is one-shot generation, where text is produced in a single step. The one-shot setting is inadequate, however, when the constraints the user wishes to impose on the generated text are dynamic, especially when authoring longer documents. We address this limitation with an interactive text generation setting in which the user interacts with the system by issuing commands to edit existing text. To this end, we propose a novel text editing task, and introduce WikiDocEdits, a dataset of single-sentence edits extracted from Wikipedia revision histories. We show that our Interactive Editor, a transformer-based model trained on this dataset, outperforms baselines and obtains positive results in both automatic and human evaluations. We present empirical and qualitative analyses of this model's performance.¹

[Figure 1: An illustration of our interactive text generation setting. This is an example generated by our model. The blue panels represent the text being edited, taken from the document shown on the right. The orange panels represent user edit commands. The model grounds edits in query results from a commercial search engine. The example proceeds:
  "Barack Obama was the 44th President of the United States."
  (command: expand)
  "Barack Obama was the 44th President of the United States and the first African-American to hold the office."
  (command: add years in office)
  "Barack Obama was the 44th President of the United States from 2009 to 2017 and the first African-American to hold the office."]

1 Introduction
A long-standing goal of natural language processing research has been to generate long-form text (Lebowitz, 1985; Fan et al., 2018; Rashkin et al., 2020). Recent large generative language models such as GPT-2 (Radford et al., 2019) and GPT-3 (Brown et al., 2020) demonstrate an impressive ability to generate fluent text, but their outputs are difficult to control beyond a prompt, and they manifest a tendency to hallucinate facts (Wiseman et al., 2017). Much recent work has thus focused on making such models more controllable (Keskar et al., 2019; Hu et al., 2017; Zhang et al., 2020; Dathathri et al., 2019) and factually grounded (Guu et al., 2020; Liu et al., 2018b).

Most such work only considers a one-shot generation setting. Given a set of inputs, which may be a prompt, a control code (Keskar et al., 2019), or a table of data (Liu et al., 2018b), for example, the system generates text in a single step. Humans, though, often produce text through an evolutionary process involving multiple draft-edit cycles. This is not simply because they make mistakes when writing, but because they may require multiple iterations to help them shape and even make sense of what they want to express (Pirolli and Card, 2005). For example, consider a user writing an article about Barack Obama. They might start with a simple sentence such as "Barack Obama was the 44th President of the United States". Next, they may wish to expand on that sentence, adding information, or rephrasing it to integrate it better with the text.

* Work done at Microsoft Research.
¹ All our code (including code to recreate our data) and pre-trained models will be made available at: http://microsoft.com/research/project/interactive-document-generation

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5259–5274, June 6–11, 2021. ©2021 Association for Computational Linguistics
Replicating this process in software will mean allowing users to adjust their requirements in response to model outputs. Even an error-free system that meets all of a user's initial requirements does not obviate the need for iteration, since those constraints are themselves dynamic. While this work focuses on text, we also note that these arguments extend to other settings where a system must generate a complex, structured object for a user, such as image or code generation.

The purpose of this paper is to bring into view the task of controllable text editing, as a step beyond one-shot generation towards interactive document generation. A full interactive document generation system will likely comprise multiple components, possibly including one-shot generation to create a first draft. Editing is crucial to interactivity because it allows users to change previously generated text to fit their dynamic constraints. This is a stateful operation, where the state is the current version of the document, as opposed to stateless recasting of text from scratch using a one-shot model. While services like Grammarly or MS Word already offer rewriting suggestions, they mainly focus on syntactic or stylistic edits such as paraphrases (Gupta et al., 2018). In this work, we are interested in a broader range of edits, particularly those that add or remove content, or change the meaning of text. Figure 1 illustrates this editing setting with an example from our trained model, where a user produces a sentence about Barack Obama over multiple edits.

In sum, we make the following contributions: We introduce a challenging new text editing task, wherein a model must learn to edit text in response to a user command, while drawing on grounding to avoid problems of hallucination (Wiseman et al., 2017). To accompany this task, we release an open-source dataset of sentence-level edits extracted from Wikipedia, including editor comments, which we leverage as natural language commands, together with pre-retrieved grounding documents. We show that a transformer-based editing model trained on our data outperforms "parrot" and GPT-2 baselines, and obtains competitive results compared to gold-standard edits in human evaluations. We then perform an empirical analysis of our model's performance, showing the importance of the command and grounding, and the varying difficulty of edits in our dataset.

2 Text Editing Task

We now formalize our text editing task. Let D be a document, q a user command², and G some appropriate form of grounding. Moreover, let D′ be an edited version of D. Then our task is, given a dataset of edits D = {(D_0, q_0, G_0, D′_0), ..., (D_N, q_N, G_N, D′_N)}, to learn to produce document D′, given D, q, and G.

² This notation reflects that the edit command is analogous to a query in a retrieval or QA setting in that it expresses a form of user intent.

Note that while previous work on text editing usually only considers D as input, we include both a form of control q and grounding G. The command is needed because otherwise the type of edit to be made is undefined, while the grounding provides external knowledge needed to make an edit. In our specific instance of this task, we will only consider sentence-level edits. More formally, we consider edits D → D′, where D and D′ differ only on a single sentence s ∈ D, respectively s′ ∈ D′. While, in general, edits can vary in complexity from document-level to character-level changes, sentences are a natural way to break down text into relatively independent units of meaning, so it makes sense to edit text one sentence at a time. More complex, document-level edits can be seen as a composition of multiple sentence-level edits.

Additionally, we will consider user commands q written in natural language, e.g., "add years in office". The command could also take other forms, such as a categorical variable, but natural language allows for the greatest flexibility in specifying what the edit should accomplish. Moreover, natural language commands are a good fit for our model, which we will initialize with pre-trained language model weights. For similar reasons, we will also consider corpora of text snippets as our grounding G. Alternatively, the grounding could also consist of structured data such as tables or graphs. In a real user scenario, this grounding might be supplied by the user, or retrieved on the fly. For our dataset, we pre-retrieve groundings by querying a commercial search engine.

3 Data

To accompany our text editing task we present a novel dataset of nearly 12 million sentence-level edits, WikiDocEdits. These edits were extracted from the revision histories in the February 1, 2020 dump of English Wikipedia.

Statistic          25%   50%   75%   Mean
Sentence length     16    23    31   25.25
Diff length          2     3     9    7.27
Comment length       2     3     7    5.20

Table 1: Summary statistics of WikiDocEdits. All statistics were computed on a 1% subsample of the data. Lengths reported in number of words. The diff length corresponds to the number of words, inserted or deleted, affected by a given edit.

For a given Wikipedia page, a revision consists of a source and target text, corresponding to the old and new versions of the page. Each revision is also accompanied by an editor comment, which we will use as a proxy for the user command. For a given revision, we split the source and target texts into sentences and then attempt to match the sentences between source and target. For efficiency, we only look at a k-sentence neighborhood. Unmatched sentences are candidates for edits. A source sentence s and target sentence t form an edit pair s → t if f(s, t) > ε, where f is sentence-level BLEU without smoothing and ε = 0.1 in our case.

3.2 Data Analysis
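Concretely, each training example in Section 2's dataset pairs a document, a command, and grounding with the edited target. A minimal sketch of such an edit tuple follows; the field names are illustrative assumptions, not the released schema:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Edit:
    """One example (D, q, G, D'): produce target from document,
    command, and grounding. Field names are hypothetical."""
    document: str         # D: current version of the text
    command: str          # q: natural-language edit command
    grounding: List[str]  # G: retrieved text snippets
    target: str           # D': edited version of the text

example = Edit(
    document="Barack Obama was the 44th President of the United States.",
    command="add years in office",
    grounding=["Obama served as president from 2009 to 2017."],
    target=("Barack Obama was the 44th President of the United States "
            "from 2009 to 2017."),
)
```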

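The sentence-matching rule from Section 3 (pair s → t when sentence-level BLEU without smoothing exceeds ε = 0.1) can be sketched as below. This is a simplified reimplementation under stated assumptions: the paper does not specify its tokenization, BLEU implementation, or exact neighborhood handling, so the helpers here are illustrative only.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU without smoothing: any zero n-gram
    precision yields a score of 0 (a stand-in for the paper's f)."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(1, len(cand) - n + 1)
        precisions.append(overlap / total)
    if min(precisions) == 0.0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty for candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(1, len(cand)))
    return bp * math.exp(log_avg)

def match_edit_pairs(source_sents, target_sents, k=5, eps=0.1):
    """Pair each unmatched source sentence with the most similar
    target sentence in a k-sentence neighborhood, keeping pairs
    whose BLEU exceeds the threshold eps (0.1 in the paper)."""
    pairs = []
    for i, s in enumerate(source_sents):
        if s in target_sents:  # identical sentences are not edits
            continue
        window = target_sents[max(0, i - k): i + k + 1]
        scored = [(sentence_bleu(s, t), t) for t in window
                  if t not in source_sents]
        if not scored:
            continue
        score, t = max(scored)
        if score > eps:
            pairs.append((s, t))
    return pairs
```

Whitespace tokenization is used purely for brevity; any sentence splitter and tokenizer could be substituted.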