
Generating Natural Language Summaries from Multiple On-Line Sources
PhD Thesis Proposal
Dragomir R. Radev
radev@cs.columbia.edu
Technical Report CUCS
Department of Computer Science
Columbia University
New York, NY
March

Contents
Introduction
Summarization of multiple articles
Summarization from multiple sources
Symbolic summarization through text understanding and generation
Automatic acquisition of lexical resources for generation
Interoperability
Asynchronous summarization
Structure of the proposal
Related Work
Text Summarization
Building Knowledge Sources for Generation
Summarization of Non-Textual Data
Description Extraction
Architectures for Intelligent Information Processing Systems
Previous Work on Summary Generation at Columbia
System overview
Generating the Base Summary
Overview of the Summarization Component
Methodology: collecting and using a summary corpus
Summary operators for content planning
Change of perspective
Contradiction
Addition
Refinement
Agreement
Superset/Generalization
Trend
No information
Algorithm
Input
Preprocessing
Heuristic combination
Discourse planning
Format conversion
Ordering of templates and linguistic generation
An example of system operation
Asynchronous notification
Including Information from Non-Textual Sources
Introduction
Case Study
Generating Descriptions
Creation of a Database of Profiles
Extraction of entity names from old newswire
Extraction of descriptions
Categorization of descriptions
Organization of descriptions in a database of profiles
Generation of Descriptions
Transformation of descriptions into Functional Descriptions
Lexicon creation
System Status, Planned Work and Proposed Evaluation
System status
Planned work and proposed evaluation
Generating updates from live news
Integrating textual and structured information
Robustness/lexicon
Proposed evaluation
Schedule for completion
Corpus Analysis
Lexicon and Language Generation
Summarization Operators
Message Ontology
Integration of Sources
Message Understanding
Agents and KQML
User Interface
Interoperability
Possible extensions beyond the scope of the thesis work
Conclusion and Contributions
Contributions
Acknowledgments
Appendix A: Schedule for Completion

List of Figures
SUMMONS Architecture
Sample MUC Template
Parsed MUC Template
Sample output from SUMMONS
Rules for the Contradiction operator
Messages
Template for newswire message one
Template for newswire message two
Template for newswire message three
Template for newswire message four
SUMMONS output based on the four messages
KQML subscription message
KQML reply message
Finite-state representation of "Yasser Arafat" in the search pattern
Profile for John Major
Retrieved description for Silvio Berlusconi
Generated FD for Silvio Berlusconi

List of Tables
Two-word and three-word sequences retrieved by the system
Examples of retrieved descriptions
Schedule for completion

Abstract
We present a methodology for summarization of news on current events. Our approach is included in a system called SUMMONS. It presents news summaries to the user in natural language form, together with appropriate background historical information. This information comes from both textual newswire and structured database knowledge sources. The system presents novel approaches to several problems:
- summarization of multiple sources;
- summarization of multiple articles;
- symbolic summarization through text understanding and generation;
- asynchronous summarization;
- generation of textual updates.

We pay specific attention to the generation of summaries that include descriptions of entities such as people and places. We show how certain classes of lexical resources can be automatically extracted from on-line corpora and used in the generation of textual summaries. We describe our approach to solving the interoperability problem among the system's various modules. All components are wrapped with facilitators, which carry out the communication between the components using a standardized language. We present a plan
for completion of the research, as well as a set of metrics that can be used to measure the performance of the system.

Chapter 1: Introduction

One of the major problems with the Internet is the abundance of information. It is difficult for the average computer user to read everything that exists on a specific topic. There now exist numerous operational sources of live newswire on the Internet, mostly accessible through the World-Wide Web (Berners-Lee). Some of the most popular sites include the following:
- Reuters News (Reuters);
- CNN's Web site (CNN);
- ClariNet's e.News online newspaper (ClariNet);
- the New York Times on-line edition (NYT).

For the typical user, it is nearly impossible to go through megabytes of news every day to select the articles he wishes to read. Even if the user can select all the news relevant to his topic of interest, he still has to choose a small subset of it to read. That subset must come from the immense body of available news and be readable in a limited time. Hence, users need search and selection services as well as summarization facilities.

There currently exist numerous search and selection services on the World-Wide Web, such as DEC's AltaVista (AltaVista), Lycos (Lycos), and DejaNews (DejaNews). All of them allow keyword searches for recent news. However, practical results in the area of summarization have appeared only recently. One currently existing Web-based summarization system is NetSumm, developed by the Language Group at British Telecom Laboratories (Preston and Williams). NetSumm uses a statistical, language-independent approach to select relevant sentences from a news article. It has an impressive user interface and is practically domain-independent, but it does not address two major issues:
- it only summarizes articles that the user has selected;
- it only summarizes a single article at a time.

Other statistical systems (Kupiec et al.; Rau et al.), while using different