Automatic Text Summarization: a Review
eKNOW 2017: The Ninth International Conference on Information, Process, and Knowledge Management
Copyright (c) IARIA, 2017. ISBN: 978-1-61208-542-5

Naima Zerari, Samia Aitouche, Mohamed Djamel Mouss, Asma Yaha
Automation and Manufacturing Laboratory
Department of Industrial Engineering
Batna 2 University
Batna, Algeria
Email: [email protected], [email protected], [email protected], [email protected]

Abstract—As we move into the 21st century, with very rapid mobile communication and access to vast stores of information, we seem to be surrounded by more and more information, with less and less time or ability to digest it. Automatic summarization was an ingenious solution to this complicated problem. However, applying this solution has proved complex. In reality, there are many problems that need to be addressed before the promises of automatic text summarization can be fully realized. Basically, it is necessary to understand how humans summarize text and then build the system based on that. Yet, individuals are so different in their thinking and interpretation that it is hard to create a "gold-standard" summary against which output summaries will be evaluated. In this paper, we discuss the basic concepts of this topic by giving the most relevant definitions, characterizations, types and the two different approaches of automatic text summarization: extraction and abstraction. Special attention is devoted to the extractive approach. It consists of selecting important sentences and paragraphs from the original text and concatenating them into a shorter form. Broadly, the importance of sentences is decided based on statistical features of sentences. This approach avoids any effort at deep text understanding. It is conceptually simple and easy to implement.

Keywords- Text summarization; Automatic text summarization; Abstractive approach; Extractive approach; Natural language processing.

I. INTRODUCTION

The rapid evolution of the WWW has made a huge quantity of documents on a variety of topics available to users [1][2]. To exploit these documents effectively, it must be possible to obtain a summary of them. However, it is very difficult for humans to create a handwritten summary of every available document. Automatic Text Summarization (ATS) provides a solution to this information overload problem [2]. Hence, ATS has become an important and timely tool for users to quickly understand a large volume of information [3]. Automatic summarization, part of the language processing field, is the process of dealing with a large amount of information by retaining only the essential parts. Summarization often occurs in everyday communication and is an important professional skill for some people. Automatic text summarization aims at providing a condensed representation of the content according to the information that the user wants to get [4]. With a document summary available, users can easily decide its relevance to their interests and acquire the desired documents with much less mental load [5].

Furthermore, the goal of automatic text summarization is to condense documents into a shorter version while preserving important content [3]. Text summarization methods can be classified into two major groups: extractive and abstractive summarization [6].

The rest of the paper is organized as follows: Section 2 is about text summarization, precisely the definition of the summary; Section 3 describes automatic text summarization; Section 4 depicts the models of automatic text summarization; Section 5 defines the characteristics of summaries; Section 6 presents a brief review of the two text summarization methods; and finally Section 7 concludes this paper and outlines the envisaged research work.

II. TEXT SUMMARIZATION

People need summaries mainly because a summary reduces reading time and makes selection easier when searching for documents.

Text summarization can be used by various applications; for instance, researchers need a summary to decide whether or not to read an entire document, and summaries can condense the information a user has searched for on the Internet. Summarizing documents involves cognitive effort from the summarizer: different fragments of a text must be selected, reformulated and assembled according to their relevance. The coherence of the information included in the summary must also be taken into account [7]. Thus, text summarization, the reduction of a text to its essential content, is a task that requires linguistic competence, world knowledge, and intelligence [7]. The subfield of summarization has been investigated by the Natural Language Processing (NLP) community for nearly the last half century. Radev et al. [8] define a summary as: “a text that is produced from one or more texts that convey important information in the original text, and that is no longer than half of the original text(s) and usually significantly less than that”. This simple definition captures three important aspects that characterize research on automatic summarization [8]:

• Summaries may be produced from a single document or multiple documents.
• Summaries should preserve important information.
• Summaries should be short.

A summary produced by means of a computer, i.e., automatically, is called Automatic Text Summarization.

III. AUTOMATIC TEXT SUMMARIZATION

Automatic text summarization is the technique that compresses a large text into a shorter text that includes the important information. The computer program is given a text and returns a summary of the original text. This is done by reducing the redundancy of the text and by extracting its essence [9]. Generally, a summary should be much shorter than the source text. This characteristic is defined by the compression rate, which measures the ratio of the length of the summary to the length of the original text [3]. The first effort to build an automatic text summarization system was made in the late 1950s. This automatic summarizer selected significant sentences from the document and concatenated them together [3]. Currently, automatic text summarization benefits from the expertise of a range of research fields: information retrieval and information extraction, natural language generation, discourse studies, machine learning and technical studies used by professional summarizers [7]. Summaries can be divided into two main categories: extractive and abstractive.

An abstractive summarization tries to develop an understanding of the main concepts in a document and then express those concepts in clear natural language. It uses linguistic methods to study the text and then to find new concepts and expressions that best describe it, generating a new shorter text that conveys the salient information from the original text document [6]. This method is the more difficult of the two and is rarely practical. It is highly complex, as it needs extensive natural language processing.

An extractive summarization consists of selecting important sentences or paragraphs from the original document and concatenating them into a shorter form. The importance of sentences is decided based on statistical and linguistic features of sentences [6]. This method is fairly applicable

most of the abstractive summarization methods produce highly coherent, cohesive, rich information and less redundant summary. Munot and Govilkar [12] discussed in detail the two main categories of text summarization methods and also presented a taxonomy of summarization systems, together with statistical and linguistic approaches for summarization. Sariki et al. [2] proposed a system to generate a summary of a single document, specifying the keywords and adjusting the length of the final summary to be produced. The proposed system has been considerably improved in accuracy. The authors also specify that the generated summary can be visualized in the form of a PowerPoint presentation (PPT), thus making it easy for the user to create an effective classroom presentation. They propose to extend their work to multiple documents in the future. Chandra et al. [5] proposed the K-mixture semantic relationship significance (KSRS) approach, a statistical approach to text summarization. The proposed approach combines the K-mixture term weighting scheme, based on a mathematical (probabilistic) ground, with a linguistic technique. The latter explores term relationships by finding the semantic relationship significance of nouns, which signifies term and sentence semantics. The authors specified that the proposed approach, KSRS, performs better and that its feasibility in text summarization applications is consequently justifiable. They also specified that its use allows the choice of a lower summary proportion without worrying about performance deterioration.

IV. AUTOMATIC TEXT SUMMARIZATION MODELS

Depending upon the number of documents accepted as input by a summarization process, automatic text summarization can be categorized as single document summarization or multi-document summarization, as shown in Fig. 1 below.

In the Single Document Text Summarization model, a summary is produced from a single input document. The single document summarization process flow is depicted in Fig. 2. In Multi Document Text Summarization, however, a summary is produced from multiple input documents dealing with the same topic, as illustrated in Fig. 3. In 1995, Radev and McKeown [13] were the first