Aspects of Salience in Natural Language Generation / by T

Total Page:16

File Type:pdf, Size:1020Kb

Aspects of Salience in Natural Language Generation / by T ASPECTS OF SALIENCE IN NATURAL LANGUAGE GENERATION T. Pattabhiraman B.Tech., Indian Institute of Technology, Madras, 1982 M. S., University of Kentucky, Lexington, 1984 A THESISSUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOROF PHILOSOPHY in the School of Computing Science @ T. Pattabhiraman 1992 SIMON FRASER UNIVERSITY August 1992 All rights reserved. This work may not be reproduced in whole or in part, by photocopy or other means, without the permission of the author. APPROVAL Name: T. Pattabhiraman Degree: Doctor of Philosophy Title of Thesis: Aspects of Salience in Natural Language Gen- eration Examining Committee: Dr. Bill Havens, Chairman Dr. Nicholas J. ~erkone,Senior Supervisor vy fi.,j$. Hadley, Supervisor wR&mond E. Jennings, Supervisor Dr. Frederick P. Popowich, Examiner Dr. David D. McDonald, Professor, Computer Science, Brandeis University, External Examiner Date Approved: PARTIAL COPYRIGHT LICENSE I hereby grant to Simon Fraser University the right to lend my thesis, project or extended essay (the title of which is shown below) to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users. I further agree that permission for multiple copying of this work for scholarly purposes may be granted by me or the Dean of Graduate Studies. It is understood that copying or publication of this work for financial gain shall not be allowed without my written permission. Title of Thesis/Project/Extended Essay Aspects of Salience in Natural Language Generat ion. Author: (signature) Thiyagaraj asarma Pat tabhiraman (name) August 11, 1992 (date) Abstract This dissertation examines the role of salience in natural language generation (NLG). The salience of an entity, in intuitive terms, refers to its prominence, and is interpreted as a measure of how well an entity stands out from other entities and biases the preference of the generator in selecting words and com- plex constructs. Through an analysis of previous work in diverse disciplines, we show the variety of salience effects in NLG. Next, we classify several impor- tant determinants of salience, corresponding to different factors contributing to salience. We then delineate two theoretically-significant categories: canonical salience and instantial salience. The former is characterized as a built-in preference in the general conceptual- and linguistic knowledge of the speaker. The latter refers to the salience of specific objects in the context of NLG, and may accrue through such determinants as vividness and recency of mention. Psycholinguis- tic results of Osgood and Bock are highlighted to suggest the multiplicative interaction between canonical salience and instantial salience. This interac- tion is captured in a decision-theoretic formalization which models canonical salience as probabilities, instantial salience as utilities, and the selection crite- rion as maximization of expected utility. Next we consider the phenomena of basic level- and entry level preference in naming objects. We argue that basic level preference in taxonomic concept knowledge is a form of canonical salience. By establishing the stark similarity of the conclusions of the psychological experiments of Jolicoeur et al. to those of Osgood and Bock, the dynamics of preferring names for objects is concep- tualized as an interaction between canonical salience and instantial salience. We demonstrate the suitability of decision theory to model this interaction. Our model of salience interactions is further reinforced in a study of simile generation. We model property salience as information-theoretic redundancy and equivalently, as expected utility. To avert miscommunication and anomaly, we propose two different cost measures (antipodal to utility) using probabilis- tic knowledge and intrinsicness, and develop net expected utility as a decision criterion for selecting objects of comparison in similes. We conclude in this thesis that the multi-aspect notion of salience pro- vides subtle, quantitative control in language generation decisions, and cap- tures interesting and significant effects such as graded judgments and relative frequencies of linguistic constructs. To Amma and Appa Acknowledgments I acknowledge with gratitude the support, help and encouragement I received from many individuals in conducting my graduate studies at SFU and com- pleting this dissertation. First and foremost, my thanks are due to my Senior Supervisor Dr. Nick Cercone for providing individualized and patient supervision, constant encour- agement and opportunities for all-round development, and most of all, for creating an intellectually liberal and open-minded environment for conducting research. I thank Dr. Ray Jennings for the many fruitful discussions, and for inspiring creativity ('Never do a safe thesis'), and Dr. Bob Hadley for encouraging me to seek psychological clues. I also thank Dr. Veronica Dahl for introducing me to computational linguistics and to natural language gen- eration (NLG), and Dr. Romas Aleliunas for the many fascinating discussions we had at the incipient stages of this research. Drs. Fred Popowich, Dan Fass and Jim Delgrande helped with literature search. Drs. Binay Bhattacharya and Art Liestman provided moral support and good counsel. Kersti Jaager helped in numerous ways and showed a personal interest in my progress. Tim, Stephen, Honman, Frank, Peter, Steve and Ed helped with the use of the computing facilities at the Centre for Systems Science and the School of Computing Science. SFU library was very responsive to my suggestions for book acquisition. Generous financial support for sustaining my studies and travelling to con- ferences was provided by Simon Fraser University (several graduate research fellowships), the Natural Sciences and Engineering ~esearchCouncil of Canada and Centre for Systems Science (thanks to Dr. Nick Cercone), the British Columbia Advanced Systems Foundation (graduate scholarship) and IJCAII (conference travel award). Valuable initial experience for NLG research was gained through participation in the project supported by IBM Canada and conducted by Dr. Veronica Dahl. My special thanks are due to the members of the international NLG re- search community (the list is too long to enumerate here!) who have helped me immensely by obliging literature requests, discussing research issues over e-mail, and providing feedback and encouragement at conferences. Sanjay and Sumathi, Mahesh and Pathmini, Ram, Ramesh, Rao, Ekam and their families, Param and Kalpagam and innumerable other friends and colleagues have made my stay at SFU pleasant, and have helped me in countless ways. I thank them all. Special thanks to Bob and Audrey who provided a home away from home. I am greatly indebted to my parents, grandparents, brother, sisters, teach- ers, friends and relatives for their role in shaping and supporting my interests and instilling in me a zest for life. vii Contents ... Abstract 111 Acknowledgments vi 1 Introduction 1 1.1 Overview of the Thesis ....................... 3 2 Salience Effects in NLG 7 2.1 Studying Salience in NLG ..................... 8 2.2 Salience and NLG Decisions .................... 9 2.2.1 Content Selection ...................... 9 2.2.2 Content Ordering ...................... 11 2.2.3 Lexical Selection ...................... 11 2.2.4 Linearization ........................ 14 2.2.5 Generating Referring Expressions ............. 17 2.3 Synthesis ............................... 21 3 Determinants of Salience 23 3.1 Bearers of Salience ......................... 24 3.1.1 Propagation ......................... 25 3.2 Salience Determinants ....................... 26 3.2.1 Vividness and Imageability ................. 27 3.2.2 Uniqueness ..........................28 3.2.3 Pervasiveness ........................ 29 ... Vlll 3.2.4 Property Salience in Cognitive Psychology ........ 30 3.2.5 Perspectives from Cognitive Linguistics .......... 34 3.2.6 Association, Conventionality and Entrenchment ..... 36 3.2.7 Concreteness ........................ 37 3.2.8 Speaker Motivation ..................... 37 3.2.9 Humanness and Animacy ................. 40 3.2.10 Processual Determinants .................. 41 4 Canonical Salience 44 4.1 Built-in Preference ......................... 46 4.2 Canonical Salience and Instantial Salience ............ 46 4.2.1 Instantial Salience ..................... 47 4.2.2 Remarks ........................... 48 4.3 Naturalness and the Perceptual Connection ........... 49 4.3.1 Evidence for Naturalness .................. 49 4.3.2 Discussion .......................... 52 4.4 Sridhar's Perceptual Hypotheses .................. 52 4.5 Herskovits on Spatial Prepositions ................. 54 4.5.1 Figure/Ground Asymmetry ................ 54 4.5.2 Remarks ........................... 56 4.6 Langacker's Analysis ........................ 56 4.7 Clues from Formal Linguistics ................... 57 4.8 Discussion .............................. 59 4.8.1 Forms of Statement of Canonical Salience ........ 59 4.8.2 Computationally Relevant Characteristics of Canonical Salience ........................... 60 4.8.3 Difficult Issues ....................... 60 5 Salience Interactions in Phrase Choice 63 5.1 Interactions between Canonical Salience and Instantial Salience 63 5.1.1 Two Major Types of Interactions ............. 64 5.2 Evidence on the Canonical-Instantial
Recommended publications
  • Autonomous Agents and the Concept of Concepts
    Autonomous Agents and the Concept of Concepts Autonomous Agents and the Concept of Concepts Paul Davidsson Doctoral dissertation submitted in partial ful®llment of the requirements for the degree of PhD in Computer Science. Department of Computer Science, Lund University First edition c 1996 by Paul Davidsson Department of Computer Science Lund University Box 118 S±221 00 Lund Sweden [email protected] http://www.dna.lth.se ISBN 91±628±2035±4 CODEN LUTEDX/(TECS±1006)/1±221/(1996) PrintedinSwedenbyKFSiLundAB Preface Although this thesis is written in the form of a monograph, some of the results presented have already been published in various articles and reports. These are listed below and will in the text be referenced using the associated Roman numerals as follows: I. E. Astor, P. Davidsson, B. Ekdahl, and R. Gustavsson. ªAnticipatory Planningº, In Advance Proceedings of the First European Workshop on Planning, 1991. II. P. Davidsson, ªConcept Acquisition by Autonomous Agents: Cognitive Model- ing versus the Engineering Approachº, Lund University Cognitive Studies 12, ISSN 1101±8453, Lund University, Sweden, 1992. III. P. Davidsson, ªA Framework for Organization and Representation of Concept Knowledge in Autonomous Agentsº, In Scandinavian Conference of Arti®cial Intelligence ± 93, pages 183±192, IOS Press, 1993. IV. P. Davidsson, ªToward a General Solution to the Symbol Grounding Problem: Combining Machine Learning and Computer Visionº, In AAAI Fall Symposium Series, Machine Learning in Computer Vision, pages 157±161, 1993. V. P. Davidsson, E. Astor, and B. Ekdahl, ªA Framework for Autonomous Agents Based on the Concept of Anticipatory Systemsº, In Cybernetics and Systems '94, pages 1427±1434, World Scienti®c, 1994.
    [Show full text]
  • DATA MINING the WEB Uncovering Patterns in Web Content, Structure, and Usage
    SPH SPH JWDD053-FM JWDD053-Markov March 8, 2007 22:51 Char Count= 0 DATA MINING THE WEB Uncovering Patterns in Web Content, Structure, and Usage ZDRAVKO MARKOV AND DANIEL T. LAROSE Central Connecticut State University New Britain, CT WILEY-INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION iii SPH SPH JWDD053-FM JWDD053-Markov March 8, 2007 22:51 Char Count= 0 vi SPH SPH JWDD053-FM JWDD053-Markov March 8, 2007 22:51 Char Count= 0 DATA MINING THE WEB i SPH SPH JWDD053-FM JWDD053-Markov March 8, 2007 22:51 Char Count= 0 ii SPH SPH JWDD053-FM JWDD053-Markov March 8, 2007 22:51 Char Count= 0 DATA MINING THE WEB Uncovering Patterns in Web Content, Structure, and Usage ZDRAVKO MARKOV AND DANIEL T. LAROSE Central Connecticut State University New Britain, CT WILEY-INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION iii SPH SPH JWDD053-FM JWDD053-Markov March 8, 2007 22:51 Char Count= 0 Copyright C 2007 by John Wiley & Sons, Inc. All rights reserved. Published by John Wiley & Sons, Inc., Hoboken, New Jersey Published simultaneously in Canada No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax 978-750-4470, or on the web at www.copyright.com.
    [Show full text]
  • Conceptual Clustering, Categorization, and Polymorphy
    Machine Learning 3: 343-372, 1989 @ 1989 Kluwer Academic Publishers - Manufactured in The Netherlands Conceptual Clustering, Categorization, and Polymorphy STEPHEN JOSl~ HANSON t ([email protected]) Bell Communications Research, 435 South Street, Morristown, NJ 07960, U.S.A. MALCOLM BAUER ([email protected]) Cognitive Science Laboratory, Princeton University, 221 Nassau Street, Prineeton~ NJ 08542, U.S.A. (Received: October 1, 1986) (Revised: October 31, 1988) Keywords: Conceptual clustering, categorization, correlation, polymorphy, knowledge structures, coherence Abstract. In this paper we describe WITT, a computational model of categoriza- tion and conceptual clustering that has been motivated and guided by research on human categorization. Properties of categories to which humans are sensitive include best or prototypieal members, relative contrasts between categories, and polymor- phy (neither necessary nor sufficient feature rules). The system uses pairwise feature correlations to determine the "similarity" between objects and clusters of objects, allowing the system a flexible representation scheme that can model common-feature categories and polymorphous categories. This intercorrelation measure is cast in terms of an information-theoretic evaluation function that directs WITT'S search through the space of clusterings. This information-theoretic similarity metric also can be used to explain basic~level and typicality effects that occur in humans. WITT has been tested on both artificial domains and on data from the 1985 World Almanac, and we have examined the effect of various system parameters on the quality of the model's behavior. 1. Introduction The majority of machine learning research has followed AI in using logic or predicate calculus for the representation of knowledge. Such logical formalisms have many advantages: they are precise, general, consistent, powerful, and extensible, and they seem distantly related to natural language.
    [Show full text]
  • Machine Generalization and Human Categorization: an Information
    I Machine Generalization and Human Categorization: I An Information-Theoretic View I James E. Corter Mark A. Gluck I Columbia University Stanford University I 1. INTRODUCTION I In designing an intelligent system that must be able to explain its reasoning to a human user, or to provide generalizations that the human user finds reasonable, it may he useful to take into consideration psychological data on what types of concepts and categories people naturally use. The psychological literature on concept learning and I categorization provides strong evidence that certain categories are more easily learned, recalled, and recognized than others. We show here how a measure of the informational value of a category predicts the results of several important categorization experiments I better than standard alternative explanations. This suggests that information-based approaches to machine generalization may prove particularly useful and natural for I human users of the systems. I 2. WHAT CONSTITUTES AN OPTIMAL CATEGORY? 2.1 Psychological Evidence Concerning the Optimality of Categories I Many studies have shown that some categories or groupings of instances are easier than others to learn and recall as coherent concepts or generalizations. For example, within a hierarchically nested set of categories (such as a taxonomy of animals), there is I Borne level of abstraction--called the "basic level"--that is most natural for people to use (Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976). For example, in the hierarchy animal-bird-robin, bird is the basic level category. The preferrential status of basic level I categories can be measured in a variety of ways. Basic level names are generally learned earlier by children (Rosch et al., 1976; Daehler, Lonardo, and Bukatko, 1979), and arise earlier in the development of languages (Berlin, Breedlove, & Raven, 1973).
    [Show full text]
  • Mutual Information Based Clustering of Market Basket Data for Profiling
    Mutual Information based Clustering of Market Basket Data for Profiling Users Bartholomaus¨ Ende Rudiger¨ Brause E-Finance Lab Dept. of Computer Science and Mathematics Johann Wolfgang Goethe-University Johann Wolfgang Goethe-University 60054 Frankfurt, Germany 60054 Frankfurt, Germany Abstract • structuring discussion groups as well as identifying their main interests. Attraction and commercial success of web sites depend heavily on the additional values visitors may find. Here, in- To obtain data for these purposes portal operators might ei- dividual, automatically obtained and maintained user pro- ther record page view information like time duration, or ex- files are the key for user satisfaction. This contribution plicit user feedback, e.g. ratings, downloads and printouts shows for the example of a cooking information site how of specific texts as well as any other kinds of services. user profiles might be obtained using category information Beside the multiple applications from above, the modeling provided by cooking recipes. It is shown that metrical dis- of state transitions and sequences is not covered by this con- tance functions and standard clustering procedures lead to tribution. Attempts that compute user profiles by model- erroneous results. Instead, we propose a new mutual infor- ing user action sequences via Markov chains can be found mation based clustering approach and outline its implica- in [20] and [2]. Other approaches like those of [18] and tions for the example of user profiling. [21] incorporate frequent item sets for predicting page vis- its. [14] even advances this idea by introducing click stream trees which represent frequent sequences of page views. 1 Introduction One common enhancement of all these approaches is the reduction of dimensionality by clustering portal contents.
    [Show full text]
  • Identifying Experts and Authoritative Documents in Social Bookmarking Systems
    IDENTIFYING EXPERTS AND AUTHORITATIVE DOCUMENTS IN SOCIAL BOOKMARKING SYSTEMS by Jonathan P. Grady B.A. Economics & Business Administration, Ursinus College, 1997 M.S. Electronic Commerce, Carnegie Mellon Univ., 2001 Submitted to the Graduate Faculty of School of Information Sciences in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Pittsburgh 2013 UNIVERSITY OF PITTSBURGH SCHOOL OF INFORMATION SCIENCES This dissertation was presented by Jonathan P. Grady It was defended on March 27, 2013 and approved by Peter Brusilovsky, Ph.D., Professor, School of Information Sciences Daqing He, Ph.D., Associate Professor, School of Information Sciences Stephen C. Hirtle, Ph.D., Professor, School of Information Sciences Brian S. Butler, Ph.D., Associate Professor, College of Information Studies, University of Maryland Dissertation Advisor: Michael B. Spring, Ph.D., Associate Professor, School of Information Sciences ii Copyright © by Jonathan P. Grady 2013 iii IDENTIFYING EXPERTS AND AUTHORITATIVE DOCUMENTS IN SOCIAL BOOKMARKING SYSTEMS Jonathan P. Grady University of Pittsburgh, 2013 Social bookmarking systems allow people to create pointers to Web resources in a shared, Web-based environment. These services allow users to add free-text labels, or “tags”, to their bookmarks as a way to organize resources for later recall. Ease-of-use, low cognitive barriers, and a lack of controlled vocabulary have allowed social bookmaking systems to grow exponentially over time. However, these same characteristics also raise concerns. Tags lack the formality of traditional classificatory metadata and suffer from the same vocabulary problems as full-text search engines. It is unclear how many valuable resources are untagged or tagged with noisy, irrelevant tags.
    [Show full text]
  • Ahn & Kim 1 the Causal Status Effect in Categorization
    Ahn & Kim 1 The causal status effect in categorization: An overview Woo-kyoung Ahn Vanderbilt University Nancy S. Kim Yale University To appear in D. L. Medin (Ed.) Psychology of Learning and Motivation Ahn & Kim 2 When we categorize objects in the world or think about concepts, some aspects of entities matter more than others. In our concept of tires, for instance, roundness is more important than a black color. Even children believe that origins are more primary than certain specific aspects of appearances in categorizing animals (Keil, 1989). This chapter discusses why some features of concepts are more central than others. We will begin with a review of possible determinants of feature centrality. Then, the main body of this chapter will focus on one important determinant, the effect of people’s causal background knowledge on feature centrality. Different Approaches to Understanding Feature Centrality In general, three approaches have been taken to understand feature centrality in concepts: the content-based approach, the statistical approach, and the theory-based approach. We will review each of these approaches, and discuss how they complement each other. Content-based Approach In what we have termed the content-based approach, specific dimensions are documented as being central for certain categories or domains, but not necessarily for others. For instance, it was found that for children, the shape of a rigid object is the most important dimension used to name the object. Landau, Smith, and Jones (1988) presented two- and three-year-old children with a small, blue, wooden inverted U-shaped object, and told the children that the object was a "dax." When asked to select other objects that were also a dax, children preferred objects with the same shape over those with the same size or material.
    [Show full text]
  • Core Concepts in Data Analysis: Summarization, Correlation, Visualization
    September 30, 2010 Core Concepts in Data Analysis: Summarization, Correlation, Visualization Boris Mirkin Department of Computer Science and Information Systems, Birkbeck, University of London, Malet Street, London WC1E 7HX UK Department of Data Analysis and Machine Intelligence, Higher School of Economics, 11 Pokrovski Boulevard, Moscow RF Abstract This book presents an in-depth description of main data analysis methods: 1D summarization, 2D analysis, popular classifiers such as naïve Bayes and linear discriminant analysis, regression and neuron nets, Principal component analysis and its derivates, K-Means clustering and its extensions, hierarchical clustering, and network clustering including additive and spectral methods. These are sys- tematized based on the idea that data analysis is to help enhance concepts and rela- tions between them in the knowledge of the domain. Modern approaches of evolu- tionary optimization and computational validation are utilized. Various relations between criteria and methods are formulated as those underlain by data-driven least-squares frameworks invoked for most of them. The description is organized in three interrelated streams: presentation, formulation and computation, so that the presentation part can be read and studied by students with little mathematical background. A number of self-study tools – worked examples, case studies, pro- jects and questions – are provided to help the reader in mastering the material. ii Acknowledgments Too many people contributed to the material of the book to list all their names. First of all, my gratitude goes to Springer’s editors who were instrumental in bringing forth the very idea of writing such a book and in channeling my efforts by providing good critical reviews.
    [Show full text]
  • Effects of Statistical Density on Learning and Representation of Categories
    Journal of Experimental Psychology: General Copyright 2008 by the American Psychological Association 2008, Vol. 137, No. 1, 52–72 0096-3445/08/$12.00 DOI: 10.1037/0096-3445.137.1.52 What’s Behind Different Kinds of Kinds: Effects of Statistical Density on Learning and Representation of Categories Heidi Kloos VladimirM. Sloutsky University of Cincinnati The Ohio State University This research examined how differences in category structure affect category learning and category representation across points of development. The authors specifically focused on category density—or the proportion of category-relevant variance to the total variance. Results of Experiments 1–3 showeda clear dissociation between dense and sparse categories: Whereas dense categories were readily learned without supervision, learning of sparse categories required supervision. There were also developmental differences in how statistical density affected category representation. Although children represented both dense and sparse categories on the basis of the overall similarity (Experiment 4A), adults represented dense categories on the basis of similarity and represented sparse categories on the basis of the inclusion rule (Experiment 4B). The results support the notion that statistical structure interacts with the learning regime in their effects on category learning. In addition, these results elucidate important developmental differences in how categories are represented, which presents interesting challenges for theories of categorization. Keywords: categorization, category learning, selective attention, cognitive development The ability to form categories (i.e., treating discriminable enti- new cats) paired with members ofa nonstudied category (e.g., ties as members of an equivalence class) isa critically important dogs). If infants learn the category during familiarization, they component of human cognition.
    [Show full text]
  • Lightweight Structure in Text
    Lightweight Structure in Text Robert C. Miller May 2002 CMU-CS-02-134 CMU-HCII-02-103 School of Computer Science Computer Science Department Carnegie Mellon University Pittsburgh, PA Thesis Committee: Brad A. Myers, Co-chair David Garlan, Co-chair James H. Morris Brian Kernighan, Princeton University Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy Copyright c 2002 Robert C. Miller This research was sponsored by the National Science Foundation under grant no. IRI-9319969 and grant no. IIS- 0117658, the Army Research Office (ARO) under National Defense Science and Engineering Grant no. DAAH04-95- 1-0552, the Defense Advanced Research Project Agency (DARPA) - US Army under contract no. DAAD1799C0061, DARPA - Navy under contract no. N66001-94-C-6037, N66001-96-C-8506, and the USENIX Association. The views and conclusions contained herein are those of the author and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of any sponsoring party or the U.S. Government. Keywords: text processing, structured text, pattern matching, regular expressions, grammars, web automation, repetitive text editing, machine learning, programming-by-demonstration, simul- taneous editing, outlier finding, LAPIS Abstract Pattern matching is heavily used for searching, filtering, and transforming text, but existing pattern languages offer few opportunities for reuse. Lightweight structure is a new approach that solves the reuse problem. Lightweight structure has three parts: a model of text structure as contiguous segments of text, or regions; an extensible library of structure abstractions (e.g., HTML elements, Java expressions, or English sentences) that can be implemented by any kind of pattern or parser; and a region algebra for composing and reusing structure abstractions.
    [Show full text]
  • On Interestingness Measures of Formal Concepts
    On interestingness measures of formal concepts Sergei O. Kuznetsov, Tatiana Makhalova National Research University Higher School of Economics Kochnovsky pr. 3, Moscow 125319, Russia [email protected], [email protected] Abstract. Formal concepts and closed itemsets proved to be of big im- portance for knowledge discovery, both as a tool for concise represen- tation of association rules and a tool for clustering and constructing domain taxonomies and ontologies. Exponential explosion makes it diffi- cult to consider the whole concept lattice arising from data, one needs to select most useful and interesting concepts. In this paper interestingness measures of concepts are considered and compared with respect to vari- ous aspects, such as efficiency of computation and applicability to noisy data and performing ranking correlation. Formal Concept Analysis intrestingess measures closed itemsets 1 Introduction and Motivation Formal concepts play an important role in knowledge discovery, since they can be used for concise representation of association rules, clustering and constructing domain taxonomies, see surveys [40, 41, 34]. Most of the difficulties in the appli- cation of closed itemsets (or, a more common name, attribute sets) to practical datasets are caused by the exponential number of formal concepts. It compli- cates both the process of the model construction and the analysis of the results as well. Recently, several approaches to tackle these issues were introduced. For a dataset all closed sets of attributes ordered by inclusion relation make a lattice, below we use the terminology of concept lattices and Formal Concept Analy- sis [21]. In terms of FCA, a concept is a pair consisting of an extent (a closed set of objects, or transactions in data mining terms) and an intent (the respective closed set of attributes, or itemset).
    [Show full text]
  • Spontaneous and Relative Categorisation
    _________________________________________________________________________Swansea University E-Theses Spontaneous and relative categorisation. Edwards, Darren J How to cite: _________________________________________________________________________ Edwards, Darren J (2010) Spontaneous and relative categorisation.. thesis, Swansea University. http://cronfa.swan.ac.uk/Record/cronfa43199 Use policy: _________________________________________________________________________ This item is brought to you by Swansea University. Any person downloading material is agreeing to abide by the terms of the repository licence: copies of full text items may be used or reproduced in any format or medium, without prior permission for personal research or study, educational or non-commercial purposes only. The copyright for any work remains with the original author unless otherwise specified. The full-text must not be sold in any format or medium without the formal permission of the copyright holder. Permission for multiple reproductions should be obtained from the original author. Authors are personally responsible for adhering to copyright and publisher restrictions when uploading content to the repository. Please link to the metadata record in the Swansea University repository, Cronfa (link given in the citation reference above.) http://www.swansea.ac.uk/library/researchsupport/ris-support/ Spontaneous and Relative Categorisation Darren J. Edwards Submitted to the University of Wales in fulfilment of the requirements for the Degree of Doctor of Philosophy Swansea University September 2010 Supervision provided by Dr. Emmanuel Pothos ProQuest Number: 10821591 All rights reserved INFORMATION TO ALL USERS The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a com plete manuscript and there are missing pages, these will be noted.
    [Show full text]