Spontaneous and Relative Categorisation

Total Page:16

File Type:pdf, Size:1020Kb

Spontaneous and Relative Categorisation _________________________________________________________________________Swansea University E-Theses Spontaneous and relative categorisation. Edwards, Darren J How to cite: _________________________________________________________________________ Edwards, Darren J (2010) Spontaneous and relative categorisation.. thesis, Swansea University. http://cronfa.swan.ac.uk/Record/cronfa43199 Use policy: _________________________________________________________________________ This item is brought to you by Swansea University. Any person downloading material is agreeing to abide by the terms of the repository licence: copies of full text items may be used or reproduced in any format or medium, without prior permission for personal research or study, educational or non-commercial purposes only. The copyright for any work remains with the original author unless otherwise specified. The full-text must not be sold in any format or medium without the formal permission of the copyright holder. Permission for multiple reproductions should be obtained from the original author. Authors are personally responsible for adhering to copyright and publisher restrictions when uploading content to the repository. Please link to the metadata record in the Swansea University repository, Cronfa (link given in the citation reference above.) http://www.swansea.ac.uk/library/researchsupport/ris-support/ Spontaneous and Relative Categorisation Darren J. Edwards Submitted to the University of Wales in fulfilment of the requirements for the Degree of Doctor of Philosophy Swansea University September 2010 Supervision provided by Dr. Emmanuel Pothos ProQuest Number: 10821591 All rights reserved INFORMATION TO ALL USERS The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a com plete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion. uest ProQuest 10821591 Published by ProQuest LLC(2018). Copyright of the Dissertation is held by the Author. All rights reserved. This work is protected against unauthorized copying under Title 17, United States C ode Microform Edition © ProQuest LLC. ProQuest LLC. 789 East Eisenhower Parkway P.O. Box 1346 Ann Arbor, Ml 48106- 1346 Abstract This thesis concerns two different aspects of categorization, the first is an investigation with a novel paradigm, which relates to relative vs. absolute (supervised) categorization, and the second, an investigation of the simplicity model (Pothos & Chater, 2002). The first investigation was motivated from the relative judgment model (Stewart et al., 2005). According to this model, classification judgments in absolute identification tasks are influenced by the relative context in which they are presented. We examine the generality of this conclusion in categorization. In the present study, we tested 320 participants in 5 experiments, in which participants had to classify new items into predefined artificial categories. In three experiments, we observed a (predominantly) relative mode of classification, and in 2 experiments we observed an absolute mode of classification. These results suggest three factors which promote a relative mode of classification; when there are fewer items per group, more training groups, and the presence of a time delay. Overall, we propose that less information about the distributional properties of a category and/or weaker memory traces for the category exemplars (induced, e.g., by smaller item numbers per category, or a time delay respectively) can encourage relative judgment. For the simplicity model, we conducted three experiments, a free sort task, a learning task and a memory task. In the free sort task, we asked 169 participants to spontaneously categorize nine sets of items. A category structure was assumed to be more intuitive if a large number of participants consistently produced the same classification. Our results provide a rich empirical framework for examining the simplicity model of unsupervised categorization (Pothos & Chater, 2002). fi 2 Declaration and Statements Declaration This work has not been previously accepted in substance for any degree and is not concurrently submitted in candidature for any degree. Signed ................ Darren Edwards ............................................... D ate................ 29/9/2010.............................................. Statement 1 This thesis is the result of my own investigations, except where otherwise stated. Where correction services have been used, the extent and nature of the correction is clearly marked in a footnote(s). Other sources are acknowledged by footnotes giving explicit references. A bibliography is appended. Signed Darren Edwards ............ ................................... D ate................29/9/2010............................................... Statement 2 I hereby give consent for my thesis, if accepted, to be available for photocopying and for inter library loan, and for the title and summary to be made available to outside organizations. Signed ..................... Darren Edwards ............................................ D ate....................29/9/2010............................................. 3 Table of Contents Abstract.............................................................................................................................. 2 Acknowledgements............................................................................................................7 Research publications and conference proceedings.................................................... 8 Tables and Figures..........................................................................................................10 Chapter 1 Introduction.................................................................................................. 13 Chapter 2 The Simplicity Model in Unsupervised Categorization.........................15 2.1 Introduction.................................................................................................. 15 2.2 Exemplar Approach in Supervised Categorization vs. Unsupervised Categorization........................................................................... 17 2.3 The simplicity principle.............................................................................. 19 2.4 Measuring Simplicity.................................................................................. 23 2.5 The Simplicity Model of Unsupervised Categorization........................ 28 2.6 Other Unsupervised Models vs. the Simplicity Model...........................37 2.7 Summary....................................................................................................... 47 Chapter 3 Supervised Categorization and Absolute Judgment..............................48 3.1 An Introduction............................................................................................48 3.2 Exemplar Models..........................................................................................48 3.3 Prototype vs. Exemplar Theories of Categorisation..............................49 3.4 The Generalized Context Model of Supervised Categorization..........50 3.5 Other supervised categorization.............................................................. 52 4 3.6 Exemplar theory; the GCM and how this relates to absolute judgment ...............................................................................................................................55 3.7 Summary....................................................................................................... 57 Chapter 4 Relative Judgment in Categorization.......................................................58 4.1 An Introduction........................................................................................... 58 4.2 Absolute identification tasks...................................................................... 58 4.3 Models that account for the effects observed in absolute identification tasks..............................................................................................61 4.4 Stewarts Relative Judgement Model (RJM, 2005).................................65 4.5 Relative Judgment in Analogical Mapping............................................. 66 4.6 Relative Judgment in our experimentation..............................................69 Chapter 5 Experimental results; relative vs. absolute representation in categorization...................................................................................................................71 5.1 Introduction..................................................................................................71 5.2 Relative vs. absolute representation in supervised categorization 71 5.3 Defining relative and absolute representation........................................ 74 5.4 Experiment .................................................................................................1 77 5.5 Experiment 2.................................................................................................81 5.6 Experiment .................................................................................................3 84 5.7 Experiment .................................................................................................4 87 5.8 Experiment .................................................................................................5 88 5.9 General Discussion.....................................................................................
Recommended publications
  • Autonomous Agents and the Concept of Concepts
    Autonomous Agents and the Concept of Concepts Autonomous Agents and the Concept of Concepts Paul Davidsson Doctoral dissertation submitted in partial ful®llment of the requirements for the degree of PhD in Computer Science. Department of Computer Science, Lund University First edition c 1996 by Paul Davidsson Department of Computer Science Lund University Box 118 S±221 00 Lund Sweden [email protected] http://www.dna.lth.se ISBN 91±628±2035±4 CODEN LUTEDX/(TECS±1006)/1±221/(1996) PrintedinSwedenbyKFSiLundAB Preface Although this thesis is written in the form of a monograph, some of the results presented have already been published in various articles and reports. These are listed below and will in the text be referenced using the associated Roman numerals as follows: I. E. Astor, P. Davidsson, B. Ekdahl, and R. Gustavsson. ªAnticipatory Planningº, In Advance Proceedings of the First European Workshop on Planning, 1991. II. P. Davidsson, ªConcept Acquisition by Autonomous Agents: Cognitive Model- ing versus the Engineering Approachº, Lund University Cognitive Studies 12, ISSN 1101±8453, Lund University, Sweden, 1992. III. P. Davidsson, ªA Framework for Organization and Representation of Concept Knowledge in Autonomous Agentsº, In Scandinavian Conference of Arti®cial Intelligence ± 93, pages 183±192, IOS Press, 1993. IV. P. Davidsson, ªToward a General Solution to the Symbol Grounding Problem: Combining Machine Learning and Computer Visionº, In AAAI Fall Symposium Series, Machine Learning in Computer Vision, pages 157±161, 1993. V. P. Davidsson, E. Astor, and B. Ekdahl, ªA Framework for Autonomous Agents Based on the Concept of Anticipatory Systemsº, In Cybernetics and Systems '94, pages 1427±1434, World Scienti®c, 1994.
    [Show full text]
  • DATA MINING the WEB Uncovering Patterns in Web Content, Structure, and Usage
    SPH SPH JWDD053-FM JWDD053-Markov March 8, 2007 22:51 Char Count= 0 DATA MINING THE WEB Uncovering Patterns in Web Content, Structure, and Usage ZDRAVKO MARKOV AND DANIEL T. LAROSE Central Connecticut State University New Britain, CT WILEY-INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION iii SPH SPH JWDD053-FM JWDD053-Markov March 8, 2007 22:51 Char Count= 0 vi SPH SPH JWDD053-FM JWDD053-Markov March 8, 2007 22:51 Char Count= 0 DATA MINING THE WEB i SPH SPH JWDD053-FM JWDD053-Markov March 8, 2007 22:51 Char Count= 0 ii SPH SPH JWDD053-FM JWDD053-Markov March 8, 2007 22:51 Char Count= 0 DATA MINING THE WEB Uncovering Patterns in Web Content, Structure, and Usage ZDRAVKO MARKOV AND DANIEL T. LAROSE Central Connecticut State University New Britain, CT WILEY-INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION iii SPH SPH JWDD053-FM JWDD053-Markov March 8, 2007 22:51 Char Count= 0 Copyright C 2007 by John Wiley & Sons, Inc. All rights reserved. Published by John Wiley & Sons, Inc., Hoboken, New Jersey Published simultaneously in Canada No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax 978-750-4470, or on the web at www.copyright.com.
    [Show full text]
  • Aspects of Salience in Natural Language Generation / by T
    ASPECTS OF SALIENCE IN NATURAL LANGUAGE GENERATION T. Pattabhiraman B.Tech., Indian Institute of Technology, Madras, 1982 M. S., University of Kentucky, Lexington, 1984 A THESISSUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOROF PHILOSOPHY in the School of Computing Science @ T. Pattabhiraman 1992 SIMON FRASER UNIVERSITY August 1992 All rights reserved. This work may not be reproduced in whole or in part, by photocopy or other means, without the permission of the author. APPROVAL Name: T. Pattabhiraman Degree: Doctor of Philosophy Title of Thesis: Aspects of Salience in Natural Language Gen- eration Examining Committee: Dr. Bill Havens, Chairman Dr. Nicholas J. ~erkone,Senior Supervisor vy fi.,j$. Hadley, Supervisor wR&mond E. Jennings, Supervisor Dr. Frederick P. Popowich, Examiner Dr. David D. McDonald, Professor, Computer Science, Brandeis University, External Examiner Date Approved: PARTIAL COPYRIGHT LICENSE I hereby grant to Simon Fraser University the right to lend my thesis, project or extended essay (the title of which is shown below) to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users. I further agree that permission for multiple copying of this work for scholarly purposes may be granted by me or the Dean of Graduate Studies. It is understood that copying or publication of this work for financial gain shall not be allowed without my written permission. Title of Thesis/Project/Extended Essay Aspects of Salience in Natural Language Generat ion.
    [Show full text]
  • Conceptual Clustering, Categorization, and Polymorphy
    Machine Learning 3: 343-372, 1989 @ 1989 Kluwer Academic Publishers - Manufactured in The Netherlands Conceptual Clustering, Categorization, and Polymorphy STEPHEN JOSl~ HANSON t ([email protected]) Bell Communications Research, 435 South Street, Morristown, NJ 07960, U.S.A. MALCOLM BAUER ([email protected]) Cognitive Science Laboratory, Princeton University, 221 Nassau Street, Prineeton~ NJ 08542, U.S.A. (Received: October 1, 1986) (Revised: October 31, 1988) Keywords: Conceptual clustering, categorization, correlation, polymorphy, knowledge structures, coherence Abstract. In this paper we describe WITT, a computational model of categoriza- tion and conceptual clustering that has been motivated and guided by research on human categorization. Properties of categories to which humans are sensitive include best or prototypieal members, relative contrasts between categories, and polymor- phy (neither necessary nor sufficient feature rules). The system uses pairwise feature correlations to determine the "similarity" between objects and clusters of objects, allowing the system a flexible representation scheme that can model common-feature categories and polymorphous categories. This intercorrelation measure is cast in terms of an information-theoretic evaluation function that directs WITT'S search through the space of clusterings. This information-theoretic similarity metric also can be used to explain basic~level and typicality effects that occur in humans. WITT has been tested on both artificial domains and on data from the 1985 World Almanac, and we have examined the effect of various system parameters on the quality of the model's behavior. 1. Introduction The majority of machine learning research has followed AI in using logic or predicate calculus for the representation of knowledge. Such logical formalisms have many advantages: they are precise, general, consistent, powerful, and extensible, and they seem distantly related to natural language.
    [Show full text]
  • Machine Generalization and Human Categorization: an Information
    I Machine Generalization and Human Categorization: I An Information-Theoretic View I James E. Corter Mark A. Gluck I Columbia University Stanford University I 1. INTRODUCTION I In designing an intelligent system that must be able to explain its reasoning to a human user, or to provide generalizations that the human user finds reasonable, it may he useful to take into consideration psychological data on what types of concepts and categories people naturally use. The psychological literature on concept learning and I categorization provides strong evidence that certain categories are more easily learned, recalled, and recognized than others. We show here how a measure of the informational value of a category predicts the results of several important categorization experiments I better than standard alternative explanations. This suggests that information-based approaches to machine generalization may prove particularly useful and natural for I human users of the systems. I 2. WHAT CONSTITUTES AN OPTIMAL CATEGORY? 2.1 Psychological Evidence Concerning the Optimality of Categories I Many studies have shown that some categories or groupings of instances are easier than others to learn and recall as coherent concepts or generalizations. For example, within a hierarchically nested set of categories (such as a taxonomy of animals), there is I Borne level of abstraction--called the "basic level"--that is most natural for people to use (Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976). For example, in the hierarchy animal-bird-robin, bird is the basic level category. The preferrential status of basic level I categories can be measured in a variety of ways. Basic level names are generally learned earlier by children (Rosch et al., 1976; Daehler, Lonardo, and Bukatko, 1979), and arise earlier in the development of languages (Berlin, Breedlove, & Raven, 1973).
    [Show full text]
  • Mutual Information Based Clustering of Market Basket Data for Profiling
    Mutual Information based Clustering of Market Basket Data for Profiling Users Bartholomaus¨ Ende Rudiger¨ Brause E-Finance Lab Dept. of Computer Science and Mathematics Johann Wolfgang Goethe-University Johann Wolfgang Goethe-University 60054 Frankfurt, Germany 60054 Frankfurt, Germany Abstract • structuring discussion groups as well as identifying their main interests. Attraction and commercial success of web sites depend heavily on the additional values visitors may find. Here, in- To obtain data for these purposes portal operators might ei- dividual, automatically obtained and maintained user pro- ther record page view information like time duration, or ex- files are the key for user satisfaction. This contribution plicit user feedback, e.g. ratings, downloads and printouts shows for the example of a cooking information site how of specific texts as well as any other kinds of services. user profiles might be obtained using category information Beside the multiple applications from above, the modeling provided by cooking recipes. It is shown that metrical dis- of state transitions and sequences is not covered by this con- tance functions and standard clustering procedures lead to tribution. Attempts that compute user profiles by model- erroneous results. Instead, we propose a new mutual infor- ing user action sequences via Markov chains can be found mation based clustering approach and outline its implica- in [20] and [2]. Other approaches like those of [18] and tions for the example of user profiling. [21] incorporate frequent item sets for predicting page vis- its. [14] even advances this idea by introducing click stream trees which represent frequent sequences of page views. 1 Introduction One common enhancement of all these approaches is the reduction of dimensionality by clustering portal contents.
    [Show full text]
  • Identifying Experts and Authoritative Documents in Social Bookmarking Systems
    IDENTIFYING EXPERTS AND AUTHORITATIVE DOCUMENTS IN SOCIAL BOOKMARKING SYSTEMS by Jonathan P. Grady B.A. Economics & Business Administration, Ursinus College, 1997 M.S. Electronic Commerce, Carnegie Mellon Univ., 2001 Submitted to the Graduate Faculty of School of Information Sciences in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Pittsburgh 2013 UNIVERSITY OF PITTSBURGH SCHOOL OF INFORMATION SCIENCES This dissertation was presented by Jonathan P. Grady It was defended on March 27, 2013 and approved by Peter Brusilovsky, Ph.D., Professor, School of Information Sciences Daqing He, Ph.D., Associate Professor, School of Information Sciences Stephen C. Hirtle, Ph.D., Professor, School of Information Sciences Brian S. Butler, Ph.D., Associate Professor, College of Information Studies, University of Maryland Dissertation Advisor: Michael B. Spring, Ph.D., Associate Professor, School of Information Sciences ii Copyright © by Jonathan P. Grady 2013 iii IDENTIFYING EXPERTS AND AUTHORITATIVE DOCUMENTS IN SOCIAL BOOKMARKING SYSTEMS Jonathan P. Grady University of Pittsburgh, 2013 Social bookmarking systems allow people to create pointers to Web resources in a shared, Web-based environment. These services allow users to add free-text labels, or “tags”, to their bookmarks as a way to organize resources for later recall. Ease-of-use, low cognitive barriers, and a lack of controlled vocabulary have allowed social bookmaking systems to grow exponentially over time. However, these same characteristics also raise concerns. Tags lack the formality of traditional classificatory metadata and suffer from the same vocabulary problems as full-text search engines. It is unclear how many valuable resources are untagged or tagged with noisy, irrelevant tags.
    [Show full text]
  • Ahn & Kim 1 the Causal Status Effect in Categorization
    Ahn & Kim 1 The causal status effect in categorization: An overview Woo-kyoung Ahn Vanderbilt University Nancy S. Kim Yale University To appear in D. L. Medin (Ed.) Psychology of Learning and Motivation Ahn & Kim 2 When we categorize objects in the world or think about concepts, some aspects of entities matter more than others. In our concept of tires, for instance, roundness is more important than a black color. Even children believe that origins are more primary than certain specific aspects of appearances in categorizing animals (Keil, 1989). This chapter discusses why some features of concepts are more central than others. We will begin with a review of possible determinants of feature centrality. Then, the main body of this chapter will focus on one important determinant, the effect of people’s causal background knowledge on feature centrality. Different Approaches to Understanding Feature Centrality In general, three approaches have been taken to understand feature centrality in concepts: the content-based approach, the statistical approach, and the theory-based approach. We will review each of these approaches, and discuss how they complement each other. Content-based Approach In what we have termed the content-based approach, specific dimensions are documented as being central for certain categories or domains, but not necessarily for others. For instance, it was found that for children, the shape of a rigid object is the most important dimension used to name the object. Landau, Smith, and Jones (1988) presented two- and three-year-old children with a small, blue, wooden inverted U-shaped object, and told the children that the object was a "dax." When asked to select other objects that were also a dax, children preferred objects with the same shape over those with the same size or material.
    [Show full text]
  • Core Concepts in Data Analysis: Summarization, Correlation, Visualization
    September 30, 2010 Core Concepts in Data Analysis: Summarization, Correlation, Visualization Boris Mirkin Department of Computer Science and Information Systems, Birkbeck, University of London, Malet Street, London WC1E 7HX UK Department of Data Analysis and Machine Intelligence, Higher School of Economics, 11 Pokrovski Boulevard, Moscow RF Abstract This book presents an in-depth description of main data analysis methods: 1D summarization, 2D analysis, popular classifiers such as naïve Bayes and linear discriminant analysis, regression and neuron nets, Principal component analysis and its derivates, K-Means clustering and its extensions, hierarchical clustering, and network clustering including additive and spectral methods. These are sys- tematized based on the idea that data analysis is to help enhance concepts and rela- tions between them in the knowledge of the domain. Modern approaches of evolu- tionary optimization and computational validation are utilized. Various relations between criteria and methods are formulated as those underlain by data-driven least-squares frameworks invoked for most of them. The description is organized in three interrelated streams: presentation, formulation and computation, so that the presentation part can be read and studied by students with little mathematical background. A number of self-study tools – worked examples, case studies, pro- jects and questions – are provided to help the reader in mastering the material. ii Acknowledgments Too many people contributed to the material of the book to list all their names. First of all, my gratitude goes to Springer’s editors who were instrumental in bringing forth the very idea of writing such a book and in channeling my efforts by providing good critical reviews.
    [Show full text]
  • Effects of Statistical Density on Learning and Representation of Categories
    Journal of Experimental Psychology: General Copyright 2008 by the American Psychological Association 2008, Vol. 137, No. 1, 52–72 0096-3445/08/$12.00 DOI: 10.1037/0096-3445.137.1.52 What’s Behind Different Kinds of Kinds: Effects of Statistical Density on Learning and Representation of Categories Heidi Kloos VladimirM. Sloutsky University of Cincinnati The Ohio State University This research examined how differences in category structure affect category learning and category representation across points of development. The authors specifically focused on category density—or the proportion of category-relevant variance to the total variance. Results of Experiments 1–3 showeda clear dissociation between dense and sparse categories: Whereas dense categories were readily learned without supervision, learning of sparse categories required supervision. There were also developmental differences in how statistical density affected category representation. Although children represented both dense and sparse categories on the basis of the overall similarity (Experiment 4A), adults represented dense categories on the basis of similarity and represented sparse categories on the basis of the inclusion rule (Experiment 4B). The results support the notion that statistical structure interacts with the learning regime in their effects on category learning. In addition, these results elucidate important developmental differences in how categories are represented, which presents interesting challenges for theories of categorization. Keywords: categorization, category learning, selective attention, cognitive development The ability to form categories (i.e., treating discriminable enti- new cats) paired with members ofa nonstudied category (e.g., ties as members of an equivalence class) isa critically important dogs). If infants learn the category during familiarization, they component of human cognition.
    [Show full text]
  • Lightweight Structure in Text
    Lightweight Structure in Text Robert C. Miller May 2002 CMU-CS-02-134 CMU-HCII-02-103 School of Computer Science Computer Science Department Carnegie Mellon University Pittsburgh, PA Thesis Committee: Brad A. Myers, Co-chair David Garlan, Co-chair James H. Morris Brian Kernighan, Princeton University Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy Copyright c 2002 Robert C. Miller This research was sponsored by the National Science Foundation under grant no. IRI-9319969 and grant no. IIS- 0117658, the Army Research Office (ARO) under National Defense Science and Engineering Grant no. DAAH04-95- 1-0552, the Defense Advanced Research Project Agency (DARPA) - US Army under contract no. DAAD1799C0061, DARPA - Navy under contract no. N66001-94-C-6037, N66001-96-C-8506, and the USENIX Association. The views and conclusions contained herein are those of the author and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of any sponsoring party or the U.S. Government. Keywords: text processing, structured text, pattern matching, regular expressions, grammars, web automation, repetitive text editing, machine learning, programming-by-demonstration, simul- taneous editing, outlier finding, LAPIS Abstract Pattern matching is heavily used for searching, filtering, and transforming text, but existing pattern languages offer few opportunities for reuse. Lightweight structure is a new approach that solves the reuse problem. Lightweight structure has three parts: a model of text structure as contiguous segments of text, or regions; an extensible library of structure abstractions (e.g., HTML elements, Java expressions, or English sentences) that can be implemented by any kind of pattern or parser; and a region algebra for composing and reusing structure abstractions.
    [Show full text]
  • On Interestingness Measures of Formal Concepts
    On interestingness measures of formal concepts Sergei O. Kuznetsov, Tatiana Makhalova National Research University Higher School of Economics Kochnovsky pr. 3, Moscow 125319, Russia [email protected], [email protected] Abstract. Formal concepts and closed itemsets proved to be of big im- portance for knowledge discovery, both as a tool for concise represen- tation of association rules and a tool for clustering and constructing domain taxonomies and ontologies. Exponential explosion makes it diffi- cult to consider the whole concept lattice arising from data, one needs to select most useful and interesting concepts. In this paper interestingness measures of concepts are considered and compared with respect to vari- ous aspects, such as efficiency of computation and applicability to noisy data and performing ranking correlation. Formal Concept Analysis intrestingess measures closed itemsets 1 Introduction and Motivation Formal concepts play an important role in knowledge discovery, since they can be used for concise representation of association rules, clustering and constructing domain taxonomies, see surveys [40, 41, 34]. Most of the difficulties in the appli- cation of closed itemsets (or, a more common name, attribute sets) to practical datasets are caused by the exponential number of formal concepts. It compli- cates both the process of the model construction and the analysis of the results as well. Recently, several approaches to tackle these issues were introduced. For a dataset all closed sets of attributes ordered by inclusion relation make a lattice, below we use the terminology of concept lattices and Formal Concept Analy- sis [21]. In terms of FCA, a concept is a pair consisting of an extent (a closed set of objects, or transactions in data mining terms) and an intent (the respective closed set of attributes, or itemset).
    [Show full text]