
From: AAAI-93 Proceedings. Copyright © 1993, AAAI (www.aaai.org). All rights reserved.

Jacques Robin and Kathleen McKeown
Department of Computer Science
Columbia University
New York, NY 10027
{robin,kathy}@cs.columbia.edu

Abstract

The complex sentences of newswire reports contain floating content units that appear to be opportunistically placed where the form of the surrounding text allows. We present a corpus analysis that identified precise semantic and syntactic constraints on where and how such information is realized. The result is a set of revision tools that form the rule base for a report generation system, allowing incremental generation of complex sentences.

Introduction

Generating reports that summarize quantitative data raises several challenges for language generation systems. First, sentences in such reports are very complex (e.g., in newswire basketball game summaries the lead sentence ranges from 21 to 46 words in length). Second, while some content units consistently appear in fixed locations across reports (e.g., game results are always conveyed in the lead sentence), others float, appearing anywhere in a report and at different linguistic ranks within a given sentence. Floating content units appear to be opportunistically placed where the form of the surrounding text allows. For example, in Fig. 1, sentences 2 and 3 result from adding the same streak information (i.e., data about a series of similar outcomes) to sentence 1 using different syntactic categories at distinct structural levels.

1. Draft sentence:
“San Antonio, TX - David Robinson scored 32 points Friday night lifting the San Antonio Spurs to a 127-111 victory over the Denver Nuggets.”
2. Clause coordination with reference adjustment:
“San Antonio, TX - David Robinson scored 32 points Friday night LIFTING THE SAN ANTONIO SPURS TO A 127-111 VICTORY OVER DENVER and handing the Nuggets their seventh straight loss”.
3. Embedded nominal apposition:
“San Antonio, TX - David Robinson scored 32 points Friday night lifting the San Antonio Spurs to a 127-111 victory over THE DENVER NUGGETS, losers of seven in a row”.

Figure 1: Attaching a floating content unit onto different draft sentence SUBCONSTITUENTS

Although optional in any given sentence, floating content units cannot be ignored. In our domain, they account for over 40% of lead sentence content, with some content types only conveyed as floating structures. One such type is historical information (e.g., maximums, minimums, or trends over periods of time). Its presence in all reports and a majority of lead sentences is not surprising, since the relevance of any game fact is often largely determined by its historical significance. However, report generators to date [Kukich, 1983], [Bourbeau et al., 1990] are not capable of including this type of information due to its floating nature. The issue of optional, floating content is prevalent in many domains and is receiving growing attention (cf. [Rubinoff, 1990], [Elhadad and Robin, 1992], [Elhadad, 1993]).

These observations suggest a generator design where a draft incorporating fixed content units is produced first and then any floating content units that can be accommodated by the surrounding text are added. Experiments by [Pavard, 1985] provide evidence that only such a revision-based model of complex sentence generation can be cognitively plausible¹.

¹ cf. [Robin, 1992] for discussion.

To determine how floating content units can be incorporated in a draft, we analyzed a corpus of basketball reports, pairing sentences that differ semantically by a single floating content unit and identifying the minimal syntactic transformation between them. The result is a set of revision tools, specifying precise semantic and syntactic constraints on (1) where a particular type of floating content can be added in a draft and (2) what linguistic constructs can be used for the addition.

The corpus analysis presented here serves as the basis for the development of the report generation system STREAK (Surface Text Revision Expressing Additional Knowledge). The analysis provides not only the knowledge sources for the system and motivations for its novel architecture (discussed in [Robin, 1993]), but also the means for ultimately evaluating its output. While this paper focuses on the analysis, the on-going system implementation based on functional unification is discussed in [Robin, 1992].

After describing our corpus analysis methodology, we present the resulting revision tools and how they can be used to incrementally generate complex sentences. We conclude by previewing our planned use of the corpus for evaluation and testing.

Corpus analysis methodology

We analyzed the lead sentences of over 800 basketball game summaries from the UPI newswire. We focused on the first sentence after observing that all reports followed the inverted pyramid structure with summary lead [Fensch, 1988] where the most crucial facts are packed in the lead sentence. The lead sentence is thus a self-contained mini-report. We first noted that all 800 lead sentences contained the game result (e.g., “Utah beat Miami 105-95”), its location, date and at least one final game statistic: the most notable statistic of a winning team player. We then semantically restricted our corpus to about 300 lead sentences which contained only these four fixed content units and zero or more floating content units of the most common types, namely:

• Other final game statistics (e.g., “Stockton finished with 27 points”).
• Streaks of similar results (e.g., “Utah recorded its fourth straight win”).
• Record performances (e.g., “Stockton scored a season-high 27 points”).

Complex Sentence Structure  We noted that basic corpus sentences, containing only the four fixed content units, and complex corpus sentences, which in addition contain up to five floating content units, share a common top-level structure. This structure consists of two main constituents, one containing the notable statistic (the notable statistic cluster) and the other containing the game result (the game result cluster), which are related either paratactically or hypotactically with the notable statistic cluster as head. Hence, the only structural difference is that in the complex sentences additional floating content units are clustered around the notable statistic and/or the game result. For example, the complex sentence at the bottom of Fig. 2 has the same top-level structure as the basic sentence at the top, but four floating content units are clustered around its notable statistic and a fifth one with its game result. Furthermore, we found that when floating elements appear in the lead sentence, their semantics almost always determines in which of the two clusters they appear (e.g., streaks are always in the game result cluster).

Basic Sentence Example:
Patrick Ewing scored 41 points Tuesday night to lead the New York Knicks to a 97-79 win over the Hornets

Complex Sentence:
Karl Malone scored 28 points Saturday and John Stockton added a season-high 27 points and a league-high 23 assists leading the Utah Jazz to its fourth straight victory, a 105-95 win over the Los Angeles Clippers

Figure 2: Similarity of basic and complex sentence structures

These corpus observations show that any complex sentence can indeed be generated in two steps: (1) produce a basic sentence realizing the fixed content units, (2) incrementally revise it to incorporate floating content units. Furthermore, they indicate that floating content units can be attached within a cluster, based on local constraints, thus simplifying both generation and our corpus analysis. When we shifted our attention from whole sentence structure to internal cluster structures, we split the whole sentence corpus into two subsets: one containing notable statistic clusters and the other, game result clusters.

Cluster structure  To identify syntactic and lexical constraints on the attachment of floating content units within each cluster, we analyzed the syntactic form of each cluster in each corpus lead sentence to derive realization patterns. Realization patterns abstract from lexical and syntactic features (e.g., connectives, mood) to represent the different mappings from semantic structure to syntactic structure. Examples of realization patterns are given in Fig. 3. Each column corresponds to a syntactic constituent and each entry provides information about this constituent: (1) semantic content², (2) grammatical function, (3) structural status (i.e., head, argument, adjunct, etc.) and (4-5) syntactic category³. Below each pattern a corpus example is given.

Realization patterns represent the structure of entire clusters, whether basic or complex. To discover how complex clusters can be derived from basic ones [...] onto a verb (“to defeat”) in &I. &, rather than Rdi is thus the surface decrement of Ri2.

We identified 270 surface decrement pairs in the corpus. For each such pair, we then determined the structural transformations necessary to produce the more complex pattern from the simpler base pattern. We grouped these transformations into classes that we call revision tools.

Revisions for incremental generation

We distinguished two kinds of revision tools. Simple revisions consist of a single transformation which preserves the base pattern and adds in a new constituent. Complex revisions are in contrast non-monotonic; an introductory transformation breaks up the integrity of the base pattern in adding in new content. Subsequent restructuring transformations are then necessary to restore grammaticality. Simple revisions can be viewed as elaborations while complex revisions require true revision.

Simple revisions  We identified four main types of simple revisions: Adjoin⁵, Append, Conjoin and Absorb. Each is characterized by the type of base structure to which it applies and the type of revised structure it produces. For example, Adjoin applies only to hypotactic base patterns. It adds an adjunct A under the base pattern head Bh as shown in Fig.
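
The description of Adjoin above (a simple revision that applies only to hypotactic base patterns and adds an adjunct under the base pattern head while preserving the base pattern) can be illustrated with a minimal Python sketch. This is not the STREAK implementation, which the paper describes as based on functional unification; the `Pattern` class, its `linkage` field and the `adjoin` function are hypothetical names introduced here for illustration only, and the pattern is reduced to a head plus a flat tuple of adjuncts.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Pattern:
    """Hypothetical, highly simplified stand-in for a realization pattern."""
    head: str                       # label of the pattern head (Bh in the text)
    linkage: str                    # "hypotactic" or "paratactic" (assumed encoding)
    adjuncts: Tuple[str, ...] = ()  # adjuncts attached under the head


def adjoin(base: Pattern, adjunct: str) -> Pattern:
    """Sketch of a simple (monotonic) revision: add one adjunct under the head.

    The base pattern is never modified; a new pattern is returned that keeps
    every constituent of the base and adds exactly one new adjunct.
    """
    if base.linkage != "hypotactic":
        # Assumption: the applicability condition is modeled as a plain check.
        raise ValueError("Adjoin applies only to hypotactic base patterns")
    return Pattern(base.head, base.linkage, base.adjuncts + (adjunct,))
```

For instance, calling `adjoin(Pattern("Bh", "hypotactic", ("existing-adjunct",)), "A")` would return a pattern whose adjuncts are `("existing-adjunct", "A")`, leaving the base untouched. Returning a new immutable object is one way to mirror the claim that simple revisions preserve the base pattern; a complex, non-monotonic revision would not fit this shape, since it has to restructure the base.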