Longsumm 2021: Session Based Automatic Summarization Model for Scientific Document

Total Page:16

File Type:pdf, Size:1020Kb

Longsumm 2021: Session Based Automatic Summarization Model for Scientific Document LongSumm 2021: Session based automatic summarization model for scientific document Senci Ying, Yanzhao Zheng, Wuhe Zou NetEase, China {yingsenci, zhengyanzhao, zouwuhe}@corp.netease.com Abstract of the advantages of different models. Combined with the later models and solutions, we propose an Most summarization task focuses on generat- ensemble method based on session like figure1. ing relatively short summaries. Such a length constraint might not be appropriate when sum- marizing scientific work. The LongSumm task needs participants generate long summary for scientific document. This task usual can be solved by language model. But an impor- tant problem is that model like BERT is limit to memory, and can not deal with a long in- put like a document. Also generate a long output is hard. In this paper, we propose a session based automatic summarization model (SBAS) which using a session and ensemble mechanism to generate long summary. And our model achieves the best performance in the LongSumm task. Figure 1: SBAS: a session based automatic summariza- 1 Introduction tion model Most of the document summarization tasks fo- cus on generate a short summary that keeps the We split the task into four steps: session gen- core idea of the original document. For long eration, extraction, abstraction, and merging the scientific papers, a short abstract is not long results at the end. First, we split an document enough to cover all the salient information. Re- into several sessions with a certain size, and use searchers often summarize scientific articles by a rouge metric to match the ground truth (sen- writing a blog, which requires specialized knowl- tences from given document’s summary). Then, edge and a deep understanding of the scien- we train two different types of model. One is the tific domain. The LongSumm, a shared task abstraction-based model. Specifically, we use the of SDP 2021(https://sdproc.org/2021/ BIGBIRD(Zaheer et al., 2020), a sparse attention sharedtasks.html), opts to leverage blog mechanism that reduces this quadratic dependency posts created by researchers that summarize sci- to linear, and PEGASUS(Zhang et al., 2020), a pre- entific articles and extractive summaries based on trained model specially designed for summariza- video talks from associated conferences(Lev et al., tion. The other one is based on extraction method. 2019) to address the problem mentioned above. We test the performance of TextRank(Mihalcea Most of the previous methods divide the doc- and Tarau, 2004; Xu et al., 2019), DGCNN(Dilate ument according to section, and use the extrac- Gated Convolutional Neural Network)(Su, 2018) tion or abstraction model to predict the summary and BERTSUMM(Liu, 2019). In the end, for each for each part respectively, and combine the results type of model, we generate the summary from the as the final summary of the document. Section one which has the best performance, and use an based method may drop some important informa- ensemble method to merge the summaries together. tion among the sections. Generally, only uses one The result show that our method is effective and type of model for prediction can not make good use beats the state-of-art models in this task. 97 Proceedings of the Second Workshop on Scholarly Document Processing, pages 97–102 June 10, 2021. ©2021 Association for Computational Linguistics 2 Related Work generate the final summary. The common automatic summarization is mainly 3.1 Session Generation divided into the extraction-based summarization Limited by the computational power, many meth- and the abstraction-based summarization. The ods chose to truncate long articles directly, which extraction-based model extracts several sentences makes the model unable to perceive the content of and words from the original article by the seman- the following articles, and the generated summary tic analysis and sentence importance analysis to can only reflect part of the input text. Others divide form the abstract of the article. Typical models the article into sections, but this also raise some include TextRank(Mihalcea and Tarau, 2004; Xu problems. The length and content of section are et al., 2019) algorithm which based on sentence different between different articles. The division importance and the extraction method based on based on section may not reflect the relationship be- pre-training model(Liu, 2019). The abstracts ob- tween text and abstract well. This paper proposes tained by the extraction model can better reflect a segmentation method based on session, which the focus of the article, but because the extracted divides the article into different sessions according sentences are scattered in different parts of the ar- to the selected size, predicts the summary for each ticle, the coherence of the abstracts is a problem session, and selects the most appropriate window to be challenged. The abstraction-based models size in this task by adjusting the size of the session. are based on the structure of seq2seq, and the pre- The specific data processing steps are as follows: training model is used to achieve better generation (1) First, select the appropriate session size(2048 effect like BART(Lewis et al., 2019), T5(Raffel words) and a buffer(128 words), which is used to et al., 2019). Recently, PEGASUS(Zhang et al., keep the last text of the previous session as the 2020), a pre-training model released by Google, context of the current session. (2) For generating specially designed the pre-training mode for the models. The real summary is divided into sen- summarization task, and achieved the state-of-art tences, and the corresponding summary sentence performance on all 12 downstream datasets. is assigned to each session according to the rouge This task focuses on the the solution of the long metric. In order to make the model predict long summary. The input and ouput text of the tradi- summaries as much as possible, a greedy match- tional model is limited due to the memory and ing rule is used to allocate the summary sentences time-consuming. However, this task requires the to each session. we first drop the sentences with model to summarize scientific papers and generate the threshold 0.7, which denotes the rouge score very long summaries. To solve this problem, most between the session and summary sentences. Then of the solutions in the the previous are based on sec- we pick the sentences according to the scores until tions(Li et al., 2020; Roy et al., 2020). They divide meets the length we set, default 256 words. scientific papers into sections, generate abstracts Although this may cause different sessions to for each seciton, and finally combine them to get predict the same summary, we think that duplicate the final results. Resently, Google’s new model sentences can be detected through the later data BIGBIRD(Zaheer et al., 2020) , using sparse atten- processing, and it is more important for the train- tion mechanism to enable the model fit long text, is ing model to generate long sentences . (3) For suitable for this task scenario. the extraction model, we only need to match dif- 3 Method ferent sessions with their corresponding summary sentences. The pre-training model plays a significant role in the field of automatic summarization, but due to 3.2 Abstraction-based Model its huge amount of parameters, most of the models The training data contains around 700 abstractive can only be used for short text tasks. For long summaries that come from different domains of articles, there are two common ways to do. One CS including ML, NLP, AI, vision, storage, etc. is to directly truncate the long articles, the other And the abstractive summaries are blog posts cre- is to predict the articles according to the section. ated by NLP and ML researchers. The traditional This paper proposes a text segmentation method generation model is mainly based on the classical based on session, and use an ensemble method with transformers structure. In order to solve the prob- the extraction model and the abstraction model to lem of long text input , we use the sparse attention 98 structure BIGBIRD(Zaheer et al., 2020), which according to the golden extractive summarize. is proposed by Google recently, and makes fine- tuning on its two open source pre-training models: 2. Transform each sentence by Roberta-Large (1) Roberta(Liu et al., 2019): a bert model with pre-trained model(Liu et al., 2019), and get the dynamic masking and drops the next predict the output of last hidden layers as the feature loss representation, then convert the feature matrix (2) PEGASUS(Zhang et al., 2020): a trans- to a fixed-size vector by average-pooling. former model while using gap sentences generation 3. TRAINING: Feed the obtained sentence vec- to pre-training. tors into the DGCNN-Extraction model (Fig- The models used in this paper are both pre- ure2) and binary classify each sentence. trained on arXiv datasets, so they have strong abil- ity to generate abstracts. 4. INFERENCE: Take the sigmoid-output of the model as the importance score for each sen- 3.3 Extraction-based Model tence, according to which we extract the cor- The extractive data have 1705 extractive summaries responding sentences from the paper as the which are based on video talks from associated con- extractive summary and the total length of the ferences(Lev et al., 2019). We have tried tree differ- summary is limited. ent extraction models to select important sentence from the documents. (3) BERTSUMM(Liu, 2019): BERTSUMM is (1) TextRank(Mihalcea and Tarau, 2004): We a Bert-based model designed for the extractive simply use the TextRank algorithm to pick out summarization task. Different from DGCNN- some most important sentences from the docu- Extraction model, because of the limit of the input ments and limited the number of sentences ex- length of Bert, we have to divide each paper into tracted.
Recommended publications
  • Automatic Summarization of Student Course Feedback
    Automatic Summarization of Student Course Feedback Wencan Luo† Fei Liu‡ Zitao Liu† Diane Litman† †University of Pittsburgh, Pittsburgh, PA 15260 ‡University of Central Florida, Orlando, FL 32716 wencan, ztliu, litman @cs.pitt.edu [email protected] { } Abstract Prompt Describe what you found most interesting in today’s class Student course feedback is generated daily in Student Responses both classrooms and online course discussion S1: The main topics of this course seem interesting and forums. Traditionally, instructors manually correspond with my major (Chemical engineering) analyze these responses in a costly manner. In S2: I found the group activity most interesting this work, we propose a new approach to sum- S3: Process that make materials marizing student course feedback based on S4: I found the properties of bike elements to be most the integer linear programming (ILP) frame- interesting work. Our approach allows different student S5: How materials are manufactured S6: Finding out what we will learn in this class was responses to share co-occurrence statistics and interesting to me alleviates sparsity issues. Experimental results S7: The activity with the bicycle parts on a student feedback corpus show that our S8: “part of a bike” activity approach outperforms a range of baselines in ... (rest omitted, 53 responses in total.) terms of both ROUGE scores and human eval- uation. Reference Summary - group activity of analyzing bicycle’s parts - materials processing - the main topic of this course 1 Introduction Table 1: Example student responses and a reference summary Instructors love to solicit feedback from students. created by the teaching assistant. ‘S1’–‘S8’ are student IDs.
    [Show full text]
  • Using N-Grams to Understand the Nature of Summaries
    Using N-Grams to Understand the Nature of Summaries Michele Banko and Lucy Vanderwende One Microsoft Way Redmond, WA 98052 {mbanko, lucyv}@microsoft.com views of the event being described over different Abstract documents, or present a high-level view of an event that is not explicitly reflected in any single document. A Although single-document summarization is a useful multi-document summary may also indicate the well-studied task, the nature of multi- presence of new or distinct information contained within document summarization is only beginning to a set of documents describing the same topic (McKeown be studied in detail. While close attention has et. al., 1999, Mani and Bloedorn, 1999). To meet these been paid to what technologies are necessary expectations, a multi-document summary is required to when moving from single to multi-document generalize, condense and merge information coming summarization, the properties of human- from multiple sources. written multi-document summaries have not Although single-document summarization is a well- been quantified. In this paper, we empirically studied task (see Mani and Maybury, 1999 for an characterize human-written summaries overview), multi-document summarization is only provided in a widely used summarization recently being studied closely (Marcu & Gerber 2001). corpus by attempting to answer the questions: While close attention has been paid to multi-document Can multi-document summaries that are summarization technologies (Barzilay et al. 2002, written by humans be characterized as Goldstein et al 2000), the inherent properties of human- extractive or generative? Are multi-document written multi-document summaries have not yet been summaries less extractive than single- quantified.
    [Show full text]
  • Automatic Summarization of Medical Conversations, a Review Jessica Lopez
    Automatic summarization of medical conversations, a review Jessica Lopez To cite this version: Jessica Lopez. Automatic summarization of medical conversations, a review. TALN-RECITAL 2019- PFIA 2019, Jul 2019, Toulouse, France. pp.487-498. hal-02611210 HAL Id: hal-02611210 https://hal.archives-ouvertes.fr/hal-02611210 Submitted on 30 May 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Jessica López Espejel Automatic summarization of medical conversations, a review Jessica López Espejel 1, 2 (1) CEA, LIST, DIASI, F-91191 Gif-sur-Yvette, France. (2) Paris 13 University, LIPN, 93430 Villateneuse, France. [email protected] RÉSUMÉ L’analyse de la conversation joue un rôle important dans le développement d’appareils de simulation pour la formation des professionnels de la santé (médecins, infirmières). Notre objectif est de développer une méthode de synthèse automatique originale pour les conversations médicales entre un patient et un professionnel de la santé, basée sur les avancées récentes en matière de synthèse à l’aide de réseaux de neurones convolutionnels et récurrents. La méthode proposée doit être adaptée aux problèmes spécifiques liés à la synthèse des dialogues. Cet article présente une revue des différentes méthodes pour les résumés par extraction et par abstraction et pour l’analyse du dialogue.
    [Show full text]
  • Exploring Sentence Vector Spaces Through Automatic Summarization
    Under review as a conference paper at ICLR 2018 EXPLORING SENTENCE VECTOR SPACES THROUGH AUTOMATIC SUMMARIZATION Anonymous authors Paper under double-blind review ABSTRACT Vector semantics, especially sentence vectors, have recently been used success- fully in many areas of natural language processing. However, relatively little work has explored the internal structure and properties of spaces of sentence vectors. In this paper, we will explore the properties of sentence vectors by studying a par- ticular real-world application: Automatic Summarization. In particular, we show that cosine similarity between sentence vectors and document vectors is strongly correlated with sentence importance and that vector semantics can identify and correct gaps between the sentences chosen so far and the document. In addition, we identify specific dimensions which are linked to effective summaries. To our knowledge, this is the first time specific dimensions of sentence embeddings have been connected to sentence properties. We also compare the features of differ- ent methods of sentence embeddings. Many of these insights have applications in uses of sentence embeddings far beyond summarization. 1 INTRODUCTION Vector semantics have been growing in popularity for many other natural language processing appli- cations. Vector semantics attempt to represent words as vectors in a high-dimensional space, where vectors which are close to each other have similar meanings. Various models of vector semantics have been proposed, such as LSA (Landauer & Dumais, 1997), word2vec (Mikolov et al., 2013), and GLOVE(Pennington et al., 2014), and these models have proved to be successful in other natural language processing applications. While these models work well for individual words, producing equivalent vectors for sentences or documents has proven to be more difficult.
    [Show full text]
  • Keyphrase Based Evaluation of Automatic Text Summarization
    International Journal of Computer Applications (0975 – 8887) Volume 117 – No. 7, May 2015 Keyphrase based Evaluation of Automatic Text Summarization Fatma Elghannam Tarek El-Shishtawy Electronics Research Institute Faculty of Computers and Information Cairo, Egypt Benha University, Benha, Egypt ABSTRACT KpEval idea is to count the matches between the peer The development of methods to deal with the informative summary and reference summaries for the essential parts of contents of the text units in the matching process is a major the summary text. KpEval have three main modules, i) challenge in automatic summary evaluation systems that use lemma extractor module that breaks the text into words and fixed n-gram matching. The limitation causes inaccurate extracts their lemma forms and the associated lexical and matching between units in a peer and reference summaries. syntactic features, ii) keyphrase extractor that extracts The present study introduces a new Keyphrase based important keyphrases in their lemma forms, and iii) the Summary Evaluator (KpEval) for evaluating automatic evaluator that scoring the summary based on counting the summaries. The KpEval relies on the keyphrases since they matched keyphrases occur between the peer summary and one convey the most important concepts of a text. In the or more reference summaries. The remaining of this paper is evaluation process, the keyphrases are used in their lemma organized as follows: Section 2 reviews the previous works; form as the matching text unit. The system was applied to Section 3 the proposed keyphrase based summary evaluator; evaluate different summaries of Arabic multi-document data Section 4 discusses the performance evaluation; and section 5 set presented at TAC2011.
    [Show full text]
  • Latent Semantic Analysis and the Construction of Coherent Extracts
    Latent Semantic Analysis and the Construction of Coherent Extracts Tristan Miller German Research Center for Artificial Intelligence0 Erwin-Schrodinger-Straße¨ 57, D-67663 Kaiserslautern [email protected] Keywords: automatic summarization, latent semantic analy- many of these techniques are tied to a particular sis, LSA, coherence, extracts language or require resources such as a list of dis- Abstract course keywords and a manually marked-up cor- pus; others are constrained in the type of summary We describe a language-neutral au- they can generate (e.g., general-purpose vs. query- tomatic summarization system which focussed). aims to produce coherent extracts. It In this paper, we present a new, recursive builds an initial extract composed solely method for automatic text summarization which of topic sentences, and then recursively aims to preserve both the topic coverage and fills in the topical lacunae by provid- the coherence of the source document, yet has ing linking material between semanti- minimal reliance on language-specific NLP tools. cally dissimilar sentences. While exper- Only word- and sentence-boundary detection rou- iments with human judges did not prove tines are required. The system produces general- a statistically significant increase in tex- purpose extracts of single documents, though it tual coherence with the use of a latent should not be difficult to adapt the technique semantic analysis module, we found a to query-focussed summarization, and may also strong positive correlation between co- be of use in improving the coherence of multi- herence and overall summary quality. document summaries. 2 Latent semantic analysis 1 Introduction Our system fits within the general category of IR- A major problem with automatically-produced based systems, but rather than comparing text with summaries in general, and extracts in particular, the standard vector-space model, we employ la- is that the output text often lacks fluency and orga- tent semantic analysis (LSA) [Deerwester et al., nization.
    [Show full text]
  • Leveraging Word Embeddings for Spoken Document Summarization
    Leveraging Word Embeddings for Spoken Document Summarization Kuan-Yu Chen*†, Shih-Hung Liu*, Hsin-Min Wang*, Berlin Chen#, Hsin-Hsi Chen† *Institute of Information Science, Academia Sinica, Taiwan #National Taiwan Normal University, Taiwan †National Taiwan University, Taiwan * # † {kychen, journey, whm}@iis.sinica.edu.tw, [email protected], [email protected] Abstract without human annotations involved. Popular methods include Owing to the rapidly growing multimedia content available on the vector space model (VSM) [9], the latent semantic analysis the Internet, extractive spoken document summarization, with (LSA) method [9], the Markov random walk (MRW) method the purpose of automatically selecting a set of representative [10], the maximum marginal relevance (MMR) method [11], sentences from a spoken document to concisely express the the sentence significant score method [12], the unigram most important theme of the document, has been an active area language model-based (ULM) method [4], the LexRank of research and experimentation. On the other hand, word method [13], the submodularity-based method [14], and the embedding has emerged as a newly favorite research subject integer linear programming (ILP) method [15]. Statistical because of its excellent performance in many natural language features may include the term (word) frequency, linguistic processing (NLP)-related tasks. However, as far as we are score, recognition confidence measure, and prosodic aware, there are relatively few studies investigating its use in information. In contrast, supervised sentence classification extractive text or speech summarization. A common thread of methods, such as the Gaussian mixture model (GMM) [9], the leveraging word embeddings in the summarization process is Bayesian classifier (BC) [16], the support vector machine to represent the document (or sentence) by averaging the word (SVM) [17], and the conditional random fields (CRFs) [18], embeddings of the words occurring in the document (or usually formulate sentence selection as a binary classification sentence).
    [Show full text]
  • Automatic Summarization and Readability
    COGNITIVE SCIENCE MASTER THESIS Automatic summarization and Readability LIU-IDA/KOGVET-A–11/004–SE Author: Supervisor: Christian SMITH Arne JONSSON¨ [email protected] [email protected] List of Figures 2.1 A simplified graph where sentences are linked and weighted ac- cording to the cosine values between them. 10 3.1 Evaluation of summaries on different dimensionalities. The X-axis denotes different dimensionalities, the Y-axis plots the mean value from evaluations on several seeds on each dimensionality. 18 3.2 The iterations of PageRank. The figure depicts the ranks of the sentences plotted on the Y-axis and the iterations on the X-axis. Each series represents a sentence. 19 3.3 The iterations of PageRank in different dimensionalities of the RI- space. The figure depicts 4 different graphs, each representing the trial of a specific setting of the dimensionality. From the left the dimensionalities of 10, 100, 300 and 1000 were used. The ranks of the sentences is plotted on the Y-axis and the iterations on the X-axis. Each series represents a sentence. 20 3.4 Figures of sentence ranks on different damping factors in PageRank 21 3.5 Effect of randomness, same text and settings on ten different ran- dom seeds. The final ranks of the sentences is plotted on the Y, with the different seeds on X. The graph depicts 9 trials at follow- ing dimensionalities, from left: 10, 20, 50, 100, 300, 500, 1000, 2000, 10000. 22 3.6 Values sometimes don’t converge on smaller texts. The left graph depicts the text in d=100 and the right graph d=20.
    [Show full text]
  • Question-Based Text Summarization
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Drexel Libraries E-Repository and Archives Question-based Text Summarization A Thesis Submitted to the Faculty of Drexel University by Mengwen Liu in partial fulfillment of the requirements for the degree of Doctor of Philosophy December 2017 c Copyright 2017 Mengwen Liu. All Rights Reserved. ii Dedications This thesis is dedicated to my family iii Acknowledgments I would like to thank to my advisor, Prof. Xiaohua Tony Hu for his generous guidance and support in the past five and a half years. Prof. Hu provided a flexible environment for me to explore different research fields. He encouraged me to work with intellectual peers and attend professional activities, and gave immediate suggestions on my studies and career development. I am also grateful to Prof. Yuan An, whom I worked closely in the first two years of my doctoral study. I thank to him for offering guidance about research, paper review, collaboration with professionals from other fields, and scientific writings. I would like to thank my dissertation committee of Prof. Weimao Ke, Prof. Erjia Yan, Prof. Bei Yu, and Prof. Yi Fang. Prof. Ke provided many suggestions towards my research projects that relate to information retrieval and my thesis topic. Prof. Yan offered me lots of valuable suggestions for my prospectus and thesis and encouraged me to conduct long-term research. Prof. Yu inspired my interest in natural language processing and text mining, and taught me how to do research from scratch with great patience.
    [Show full text]
  • Automatic Text Summarization: Past, Present and Future Horacio Saggion, Thierry Poibeau
    Automatic Text Summarization: Past, Present and Future Horacio Saggion, Thierry Poibeau To cite this version: Horacio Saggion, Thierry Poibeau. Automatic Text Summarization: Past, Present and Future. T. Poibeau; H. Saggion. J. Piskorski, R. Yangarber. Multi-source, Multilingual Information Extraction and Summarization, Springer, pp.3-13, 2012, Theory and Applications of Natural Language Process- ing, 978-3-642-28569-1. hal-00782442 HAL Id: hal-00782442 https://hal.archives-ouvertes.fr/hal-00782442 Submitted on 27 Jul 2016 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Chapter 1 Automatic Text Summarization: Past, Present and Future Horacio Saggion and Thierry Poibeau Abstract Automatic text summarization, the computer-based production of con- densed versions of documents, is an important technology for the information soci- ety. Without summaries it would be practically impossible for human beings to get access to the ever growing mass of information available online. Although research in text summarization is over fifty years old, some efforts are still needed given the insufficient quality of automatic summaries and the number of interesting summa- rization topics being proposed in different contexts by end users (“domain-specific summaries”, “opinion-oriented summaries”, “update summaries”, etc.).
    [Show full text]
  • Automatic Text Summarization: a Review
    eKNOW 2017 : The Ninth International Conference on Information, Process, and Knowledge Management Automatic Text Summarization: A review Naima Zerari, Samia Aitouche, Mohamed Djamel Mouss, Asma Yaha Automation and Manufacturing Laboratory Department of Industrial Engineering Batna 2 University Batna, Algeria Email: [email protected], [email protected], [email protected], [email protected] Abstract—As we move into the 21st century, with very rapid that the user wants to get [4]. With document summary mobile communication and access to vast stores of information, available, users can easily decide its relevancy to their we seem to be surrounded by more and more information, with interests and acquire desired documents with much less less and less time or ability to digest it. The creation of the mental loads involved. [5]. automatic summarization was really a genius human solution to solve this complicated problem. However, the application of Furthermore, the goal of automatic text summarization is this solution was too complex. In reality, there are many to condense the documents into a shorter version and problems that need to be addressed before the promises of preserve important contents [3]. Text Summarization automatic text summarization can be fully realized. Basically, methods can be classified into two major methods extractive it is necessary to understand how humans summarize the text and abstractive summarization [6]. and then build the system based on that. Yet, individuals are so The rest of the paper is organized as follows: Section 2 is different in their thinking and interpretation that it is hard to about text summarization, precisely the definition of the create "gold-standard" summary against which output summaries will be evaluated.
    [Show full text]
  • An Automatic Text Summarization: a Systematic Review
    EasyChair Preprint № 869 An Automatic Text Summarization: A Systematic Review Vishwa Patel and Nasseh Tabrizi EasyChair preprints are intended for rapid dissemination of research results and are integrated with the rest of EasyChair. March 31, 2019 An Automatic Text Summarization: A Systematic Review Vishwa Patel1;2 and Nasseh Tabrizi1;3 1 East Carolina University, Greenville NC 27858, USA 2 [email protected] 3 [email protected] Abstract. The 21st century has become a century of information over- load, where in fact information related to even one topic (due to its huge volume) takes a lot of time to manually summarize it into few lines. Thus, in order to address this problem, Automatic Text Summarization meth- ods have been developed. Generally, there are two approaches that are currently being used in practice: Extractive and Abstractive methods. In this paper, we have reviewed papers from IEEE and ACM libraries those related to Automatic Text Summarization for the English language. Keywords: Automatic Text Summarization · Extractive · Abstractive · Review paper 1 Introduction Nowadays, the Internet has become one of the important sources of information. We surf the Internet, to find some information related to a certain topic. But search engines often return an excessive amount of information for the end user to read increasing the need for Automatic Text Summarization(ATS), increased in this modern era. The ATS will not only save time but also provide important insight into a piece of information. Many years ago scientists started working on ATS but the peak of interest in it started from the year 2000. In the early 21st century, new technologies emerged in the field of Natural Language Processing (NLP) to enhance the capabilities of ATS.
    [Show full text]