Using Semantic Folding with Textrank for Automatic Summarization
Recommended publications
-
Approximate Pattern Matching Using Hierarchical Graph Construction and Sparse Distributed Representation
Portland State University, PDXScholar, Dissertations and Theses, 9-29-2020. Approximate Pattern Matching Using Hierarchical Graph Construction and Sparse Distributed Representation. Aakanksha Mathuria, Portland State University. Follow this and additional works at: https://pdxscholar.library.pdx.edu/open_access_etds Part of the Electrical and Computer Engineering Commons. Let us know how access to this document benefits you. Recommended Citation: Mathuria, Aakanksha, "Approximate Pattern Matching Using Hierarchical Graph Construction and Sparse Distributed Representation" (2020). Dissertations and Theses. Paper 5581. https://doi.org/10.15760/etd.7453 This Thesis is brought to you for free and open access. It has been accepted for inclusion in Dissertations and Theses by an authorized administrator of PDXScholar. Please contact us if we can make this document more accessible: [email protected]. Approximate Pattern Matching using Hierarchical Graph Construction and Sparse Distributed Representation by Aakanksha Mathuria. A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical and Computer Engineering. Thesis Committee: Dan Hammerstrom, Chair; Christof Teuscher; Nirupama Bulusu. Portland State University, 2020.
Abstract: With recent developments in deep networks, there have been significant advances in visual object detection and recognition. However, some of these networks are still easily fooled/hacked and have shown “bag of features” kinds of failures. Some of this is due to the fact that even deep networks make only marginal use of the complex structure that exists in real-world images. Primate visual systems appear to capture the structure in images, but how? In the research presented here, we are studying approaches for robust pattern matching using static, 2D Blocks World images based on graphical representations of the various components of an image. -
Automatic Summarization of Student Course Feedback
Automatic Summarization of Student Course Feedback. Wencan Luo†, Fei Liu‡, Zitao Liu†, Diane Litman†. †University of Pittsburgh, Pittsburgh, PA 15260. ‡University of Central Florida, Orlando, FL 32716. {wencan, ztliu, litman}@cs.pitt.edu [email protected]
Abstract: Student course feedback is generated daily in both classrooms and online course discussion forums. Traditionally, instructors manually analyze these responses in a costly manner. In this work, we propose a new approach to summarizing student course feedback based on the integer linear programming (ILP) framework. Our approach allows different student responses to share co-occurrence statistics and alleviates sparsity issues. Experimental results on a student feedback corpus show that our approach outperforms a range of baselines in terms of both ROUGE scores and human evaluation.
Table 1: Example student responses and a reference summary created by the teaching assistant. ‘S1’–‘S8’ are student IDs.
Prompt: Describe what you found most interesting in today’s class
Student Responses:
S1: The main topics of this course seem interesting and correspond with my major (Chemical engineering)
S2: I found the group activity most interesting
S3: Process that make materials
S4: I found the properties of bike elements to be most interesting
S5: How materials are manufactured
S6: Finding out what we will learn in this class was interesting to me
S7: The activity with the bicycle parts
S8: “part of a bike” activity
... (rest omitted, 53 responses in total.)
Reference Summary: group activity of analyzing bicycle’s parts; materials processing; the main topic of this course
1 Introduction: Instructors love to solicit feedback from students. -
Using N-Grams to Understand the Nature of Summaries
Using N-Grams to Understand the Nature of Summaries. Michele Banko and Lucy Vanderwende, One Microsoft Way, Redmond, WA 98052. {mbanko, lucyv}@microsoft.com
Abstract: Although single-document summarization is a well-studied task, the nature of multi-document summarization is only beginning to be studied in detail. While close attention has been paid to what technologies are necessary when moving from single to multi-document summarization, the properties of human-written multi-document summaries have not been quantified. In this paper, we empirically characterize human-written summaries provided in a widely used summarization corpus by attempting to answer the questions: Can multi-document summaries that are written by humans be characterized as extractive or generative? Are multi-document summaries less extractive than single-
[...] views of the event being described over different documents, or present a high-level view of an event that is not explicitly reflected in any single document. A useful multi-document summary may also indicate the presence of new or distinct information contained within a set of documents describing the same topic (McKeown et al., 1999; Mani and Bloedorn, 1999). To meet these expectations, a multi-document summary is required to generalize, condense and merge information coming from multiple sources. Although single-document summarization is a well-studied task (see Mani and Maybury, 1999 for an overview), multi-document summarization is only recently being studied closely (Marcu & Gerber 2001). While close attention has been paid to multi-document summarization technologies (Barzilay et al. 2002, Goldstein et al. 2000), the inherent properties of human-written multi-document summaries have not yet been quantified. -
Automatic Summarization of Medical Conversations, a Review Jessica Lopez
Automatic summarization of medical conversations, a review. Jessica Lopez. To cite this version: Jessica Lopez. Automatic summarization of medical conversations, a review. TALN-RECITAL 2019-PFIA 2019, Jul 2019, Toulouse, France. pp.487-498. hal-02611210. HAL Id: hal-02611210 https://hal.archives-ouvertes.fr/hal-02611210 Submitted on 30 May 2020. HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Automatic summarization of medical conversations, a review. Jessica López Espejel 1, 2. (1) CEA, LIST, DIASI, F-91191 Gif-sur-Yvette, France. (2) Paris 13 University, LIPN, 93430 Villetaneuse, France. [email protected]
ABSTRACT: Conversation analysis plays an important role in the development of simulation devices for training health professionals (doctors, nurses). Our goal is to develop an original automatic summarization method for medical conversations between a patient and a health professional, based on recent advances in summarization using convolutional and recurrent neural networks. The proposed method must be adapted to the specific problems of dialogue summarization. This article presents a review of the different methods for extractive and abstractive summarization and for dialogue analysis. -
Exploring Sentence Vector Spaces Through Automatic Summarization
Under review as a conference paper at ICLR 2018. EXPLORING SENTENCE VECTOR SPACES THROUGH AUTOMATIC SUMMARIZATION. Anonymous authors. Paper under double-blind review.
ABSTRACT: Vector semantics, especially sentence vectors, have recently been used successfully in many areas of natural language processing. However, relatively little work has explored the internal structure and properties of spaces of sentence vectors. In this paper, we will explore the properties of sentence vectors by studying a particular real-world application: Automatic Summarization. In particular, we show that cosine similarity between sentence vectors and document vectors is strongly correlated with sentence importance and that vector semantics can identify and correct gaps between the sentences chosen so far and the document. In addition, we identify specific dimensions which are linked to effective summaries. To our knowledge, this is the first time specific dimensions of sentence embeddings have been connected to sentence properties. We also compare the features of different methods of sentence embeddings. Many of these insights have applications in uses of sentence embeddings far beyond summarization.
1 INTRODUCTION: Vector semantics have been growing in popularity for many other natural language processing applications. Vector semantics attempt to represent words as vectors in a high-dimensional space, where vectors which are close to each other have similar meanings. Various models of vector semantics have been proposed, such as LSA (Landauer & Dumais, 1997), word2vec (Mikolov et al., 2013), and GLOVE (Pennington et al., 2014), and these models have proved to be successful in other natural language processing applications. While these models work well for individual words, producing equivalent vectors for sentences or documents has proven to be more difficult. -
Keyphrase Based Evaluation of Automatic Text Summarization
International Journal of Computer Applications (0975 – 8887), Volume 117 – No. 7, May 2015. Keyphrase based Evaluation of Automatic Text Summarization. Fatma Elghannam, Electronics Research Institute, Cairo, Egypt. Tarek El-Shishtawy, Faculty of Computers and Information, Benha University, Benha, Egypt.
ABSTRACT: The development of methods to deal with the informative contents of the text units in the matching process is a major challenge in automatic summary evaluation systems that use fixed n-gram matching. The limitation causes inaccurate matching between units in a peer and reference summaries. The present study introduces a new Keyphrase based Summary Evaluator (KpEval) for evaluating automatic summaries. KpEval relies on keyphrases since they convey the most important concepts of a text. In the evaluation process, the keyphrases are used in their lemma form as the matching text unit. The system was applied to evaluate different summaries of the Arabic multi-document data set presented at TAC2011.
The idea of KpEval is to count the matches between the peer summary and reference summaries for the essential parts of the summary text. KpEval has three main modules: i) a lemma extractor module that breaks the text into words and extracts their lemma forms and the associated lexical and syntactic features, ii) a keyphrase extractor that extracts important keyphrases in their lemma forms, and iii) the evaluator, which scores the summary by counting the matched keyphrases that occur between the peer summary and one or more reference summaries. The remainder of this paper is organized as follows: Section 2 reviews the previous works; Section 3 the proposed keyphrase based summary evaluator; Section 4 discusses the performance evaluation; and section 5 -
Latent Semantic Analysis and the Construction of Coherent Extracts
Latent Semantic Analysis and the Construction of Coherent Extracts. Tristan Miller, German Research Center for Artificial Intelligence, Erwin-Schrödinger-Straße 57, D-67663 Kaiserslautern. [email protected]
Keywords: automatic summarization, latent semantic analysis, LSA, coherence, extracts
Abstract: We describe a language-neutral automatic summarization system which aims to produce coherent extracts. It builds an initial extract composed solely of topic sentences, and then recursively fills in the topical lacunae by providing linking material between semantically dissimilar sentences. While experiments with human judges did not prove a statistically significant increase in textual coherence with the use of a latent semantic analysis module, we found a strong positive correlation between coherence and overall summary quality.
1 Introduction: A major problem with automatically-produced summaries in general, and extracts in particular, is that the output text often lacks fluency and organization.
[...] many of these techniques are tied to a particular language or require resources such as a list of discourse keywords and a manually marked-up corpus; others are constrained in the type of summary they can generate (e.g., general-purpose vs. query-focussed). In this paper, we present a new, recursive method for automatic text summarization which aims to preserve both the topic coverage and the coherence of the source document, yet has minimal reliance on language-specific NLP tools. Only word- and sentence-boundary detection routines are required. The system produces general-purpose extracts of single documents, though it should not be difficult to adapt the technique to query-focussed summarization, and may also be of use in improving the coherence of multi-document summaries.
2 Latent semantic analysis: Our system fits within the general category of IR-based systems, but rather than comparing text with the standard vector-space model, we employ latent semantic analysis (LSA) [Deerwester et al., -
Soft Similarity and Soft Cosine Measure: Similarity of Features in Vector Space Model
Soft Similarity and Soft Cosine Measure: Similarity of Features in Vector Space Model. Grigori Sidorov1, Alexander Gelbukh1, Helena Gomez-Adorno1, and David Pinto2. 1 Centro de Investigacion en Computacion, Instituto Politecnico Nacional, Mexico D.F., Mexico. 2 Facultad de Ciencias de la Computacion, Benemerita Universidad Autonoma de Puebla, Puebla, Mexico. {sidorov,gelbukh}@cic.ipn.mx, [email protected], [email protected]
Abstract. We show how to consider similarity between features for calculation of similarity of objects in the Vector Space Model (VSM) for machine learning algorithms and other classes of methods that involve similarity between objects. Unlike LSA, we assume that similarity between features is known (say, from a synonym dictionary) and does not need to be learned from the data. We call the proposed similarity measure soft similarity. Similarity between features is common, for example, in natural language processing: words, n-grams, or syntactic n-grams can be somewhat different (which makes them different features) but still have much in common: for example, words “play” and “game” are different but
1 Introduction: Computation of similarity of specific objects is a basic task of many methods applied in various problems in natural language processing and many other fields. In natural language processing, text similarity plays a crucial role in many tasks from plagiarism detection [18] and question answering [3] to sentiment analysis [14-16]. The most common manner to represent objects is the Vector Space Model (VSM) [17]. In this model, the objects are represented as vectors of values of features. The features characterize each object and have numeric values. -
Leveraging Word Embeddings for Spoken Document Summarization
Leveraging Word Embeddings for Spoken Document Summarization. Kuan-Yu Chen*†, Shih-Hung Liu*, Hsin-Min Wang*, Berlin Chen#, Hsin-Hsi Chen†. *Institute of Information Science, Academia Sinica, Taiwan. #National Taiwan Normal University, Taiwan. †National Taiwan University, Taiwan. *{kychen, journey, whm}@iis.sinica.edu.tw, #[email protected], †[email protected]
Abstract: Owing to the rapidly growing multimedia content available on the Internet, extractive spoken document summarization, with the purpose of automatically selecting a set of representative sentences from a spoken document to concisely express the most important theme of the document, has been an active area of research and experimentation. On the other hand, word embedding has emerged as a newly favorite research subject because of its excellent performance in many natural language processing (NLP)-related tasks. However, as far as we are aware, there are relatively few studies investigating its use in extractive text or speech summarization. A common thread of leveraging word embeddings in the summarization process is to represent the document (or sentence) by averaging the word embeddings of the words occurring in the document (or sentence).
[...] without human annotations involved. Popular methods include the vector space model (VSM) [9], the latent semantic analysis (LSA) method [9], the Markov random walk (MRW) method [10], the maximum marginal relevance (MMR) method [11], the sentence significant score method [12], the unigram language model-based (ULM) method [4], the LexRank method [13], the submodularity-based method [14], and the integer linear programming (ILP) method [15]. Statistical features may include the term (word) frequency, linguistic score, recognition confidence measure, and prosodic information. In contrast, supervised sentence classification methods, such as the Gaussian mixture model (GMM) [9], the Bayesian classifier (BC) [16], the support vector machine (SVM) [17], and the conditional random fields (CRFs) [18], usually formulate sentence selection as a binary classification -
Evaluating Vector-Space Models of Word Representation, Or, the Unreasonable Effectiveness of Counting Words Near Other Words
Evaluating Vector-Space Models of Word Representation, or, The Unreasonable Effectiveness of Counting Words Near Other Words. Aida Nematzadeh, Stephan C. Meylan, and Thomas L. Griffiths, University of California, Berkeley. fnematzadeh, smeylan, tom griffi[email protected]
Abstract: Vector-space models of semantics represent words as continuously-valued vectors and measure similarity based on the distance or angle between those vectors. Such representations have become increasingly popular due to the recent development of methods that allow them to be efficiently estimated from very large amounts of data. However, the idea of relating similarity to distance in a spatial representation has been criticized by cognitive scientists, as human similarity judgments have many properties that are inconsistent with the geometric constraints that a distance metric must obey. We show that two popular vector-space models, Word2Vec and GloVe, are unable to capture certain critical aspects of human word association data as a consequence of these constraints. However, a probabilistic topic model estimated from a rela-
[...] angle between word vectors (e.g., Mikolov et al., 2013b; Pennington et al., 2014). In this paper, we examine whether these constraints imply that Word2Vec and GloVe representations suffer from the same difficulty as previous vector-space models in capturing human similarity judgments. To this end, we evaluate these representations on a set of tasks adopted from Griffiths et al. (2007) in which the authors showed that the representations learned by another well-known vector-space model, Latent Semantic Analysis (Landauer and Dumais, 1997), were inconsistent with patterns of semantic similarity demonstrated in human word association data. -
Anomalous Behavior Detection Framework Using HTM-Based Semantic Folding Technique
Hindawi, Computational and Mathematical Methods in Medicine, Volume 2021, Article ID 5585238, 14 pages. https://doi.org/10.1155/2021/5585238 Research Article: Anomalous Behavior Detection Framework Using HTM-Based Semantic Folding Technique. Hamid Masood Khan,1 Fazal Masud Khan,1 Aurangzeb Khan,2 Muhammad Zubair Asghar,1 and Daniyal M. Alghazzawi3. 1Institute of Computing and Information Technology, Gomal University, D.I.Khan, Pakistan. 2Department of Computer Science, University of Science and Technology, Bannu, Pakistan. 3Information Systems Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia. Correspondence should be addressed to Muhammad Zubair Asghar; [email protected] Received 23 January 2021; Revised 17 February 2021; Accepted 26 February 2021; Published 16 March 2021. Academic Editor: Waqas Haider Bangyal. Copyright © 2021 Hamid Masood Khan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Upon the working principles of the human neocortex, the Hierarchical Temporal Memory model has been developed which is a proposed theoretical framework for sequence learning. Both categorical and numerical types of data are handled by HTM. Semantic Folding Theory (SFT) is based on HTM to represent a data stream for processing in the form of sparse distributed representation (SDR). For natural language perception and production, SFT delivers a solid structural background for semantic evidence description to the fundamentals of the semantic foundation during the phase of language learning. Anomalies are the patterns from data streams that do not follow the expected behavior. -
Approximate Pattern Matching Using Hierarchical Graph Construction and Sparse Distributed Representation
Portland State University, PDXScholar, Electrical and Computer Engineering Faculty Publications and Presentations, Electrical and Computer Engineering, 7-23-2019. Approximate Pattern Matching using Hierarchical Graph Construction and Sparse Distributed Representation. Aakanksha Mathuria, Portland State University, [email protected]; Dan Hammerstrom, Portland State University. Follow this and additional works at: https://pdxscholar.library.pdx.edu/ece_fac Part of the Engineering Commons. Let us know how access to this document benefits you. Citation Details: "Approximate Pattern Matching using Hierarchical Graph Construction and Sparse Distributed Representation," A. Mathuria, D.W. Hammerstrom, International Conference on Neuromorphic Systems, Knoxville, TN, July 23-5, 2019. This Conference Proceeding is brought to you for free and open access. It has been accepted for inclusion in Electrical and Computer Engineering Faculty Publications and Presentations by an authorized administrator of PDXScholar. Please contact us if we can make this document more accessible: [email protected].
Approximate Pattern Matching using Hierarchical Graph Construction and Sparse Distributed Representation. Aakanksha Mathuria, Electrical and Computer Engineering, Portland State University, USA, [email protected]. Dan W. Hammerstrom, Electrical and Computer Engineering, Portland State University, USA, [email protected]
ABSTRACT: With recent developments in deep networks, there have been significant advances in visual object detection and recognition. However, some of these networks are still easily fooled/hacked and have shown “bag of features” failures.
[...] training on large numbers of images. None of these techniques actually captures the spatial relationships of the low level or high-level features, which biological networks appear to do. Efficient graph representations capture the higher order information content of the objects and provide algorithmic benefits