Data Mining: A Heuristic Approach
Hussein A. Abbass
Ruhul A. Sarker
Charles S. Newton
University of New South Wales, Australia

Idea Group Publishing • Information Science Publishing
Hershey • London • Melbourne • Singapore • Beijing

Acquisitions Editor: Mehdi Khosrowpour
Managing Editor: Jan Travers
Development Editor: Michele Rossi
Copy Editor: Maria Boyer
Typesetter: Tamara Gillis
Cover Design: Debra Andree
Printed at: Integrated Book Technology

Published in the United States of America by
Idea Group Publishing
1331 E. Chocolate Avenue
Hershey PA 17033-1117
Tel: 717-533-8845
Fax: 717-533-8661
E-mail: [email protected]
Web site: http://www.idea-group.com

and in the United Kingdom by
Idea Group Publishing
3 Henrietta Street
Covent Garden
London WC2E 8LU
Tel: 44 20 7240 0856
Fax: 44 20 7379 3313
Web site: http://www.eurospan.co.uk

Copyright © 2002 by Idea Group Publishing. All rights reserved. No part of this book may be reproduced in any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher.

Library of Congress Cataloging-in-Publication Data

Data mining : a heuristic approach / [edited by] Hussein Aly Abbass, Ruhul Amin Sarker, Charles S. Newton.
p. cm.
Includes index.
ISBN 1-930708-25-4
1. Data mining. 2. Database searching. 3. Heuristic programming. I. Abbass, Hussein. II. Sarker, Ruhul. III. Newton, Charles, 1942-
QA76.9.D343 D36 2001
006.31--dc21 2001039775

British Cataloguing in Publication Data
A Cataloguing in Publication record for this book is available from the British Library.

Table of Contents

Preface

Part One: General Heuristics

Chapter 1: From Evolution to Immune to Swarm to …? A Simple Introduction to Modern Heuristics
    Hussein A. Abbass, University of New South Wales, Australia

Chapter 2: Approximating Proximity for Fast and Robust Distance-Based Clustering
    Vladimir Estivill-Castro, University of Newcastle, Australia
    Michael Houle, University of Sydney, Australia

Part Two: Evolutionary Algorithms

Chapter 3: On the Use of Evolutionary Algorithms in Data Mining
    Erick Cantú-Paz, Lawrence Livermore National Laboratory, USA
    Chandrika Kamath, Lawrence Livermore National Laboratory, USA

Chapter 4: The discovery of interesting nuggets using heuristic techniques
    Beatriz de la Iglesia, University of East Anglia, UK
    Victor J. Rayward-Smith, University of East Anglia, UK

Chapter 5: Estimation of Distribution Algorithms for Feature Subset Selection in Large Dimensionality Domains
    Iñaki Inza, University of the Basque Country, Spain
    Pedro Larrañaga, University of the Basque Country, Spain
    Basilio Sierra, University of the Basque Country, Spain

Chapter 6: Towards the Cross-Fertilization of Multiple Heuristics: Evolving Teams of Local Bayesian Learners
    Jorge Muruzábal, Universidad Rey Juan Carlos, Spain

Chapter 7: Evolution of Spatial Data Templates for Object Classification
    Neil Dunstan, University of New England, Australia
    Michael de Raadt, University of Southern Queensland, Australia

Part Three: Genetic Programming

Chapter 8: Genetic Programming as a Data-Mining Tool
    Peter W.H. Smith, City University, UK

Chapter 9: A Building Block Approach to Genetic Programming for Rule Discovery
    A.P. Engelbrecht, University of Pretoria, South Africa
    Sonja Rouwhorst, Vrije Universiteit Amsterdam, The Netherlands
    L. Schoeman, University of Pretoria, South Africa

Part Four: Ant Colony Optimization and Immune Systems

Chapter 10: An Ant Colony Algorithm for Classification Rule Discovery
    Rafael S. Parpinelli, Centro Federal de Educacao Tecnologica do Parana, Brazil
    Heitor S. Lopes, Centro Federal de Educacao Tecnologica do Parana, Brazil
    Alex A. Freitas, Pontificia Universidade Catolica do Parana, Brazil

Chapter 11: Artificial Immune Systems: Using the Immune System as Inspiration for Data Mining
    Jon Timmis, University of Kent at Canterbury, UK
    Thomas Knight, University of Kent at Canterbury, UK

Chapter 12: aiNet: An Artificial Immune Network for Data Analysis
    Leandro Nunes de Castro, State University of Campinas, Brazil
    Fernando J. Von Zuben, State University of Campinas, Brazil

Part Five: Parallel Data Mining

Chapter 13: Parallel Data Mining
    David Taniar, Monash University, Australia
    J. Wenny Rahayu, La Trobe University, Australia

About the Authors

Index

Preface

The last decade has witnessed a revolution in interdisciplinary research where the boundaries of different areas have overlapped or even disappeared. New fields of research emerge each day where two or more fields have integrated to form a new identity. Examples of these emerging areas include bioinformatics (synthesizing biology with computer and information systems), data mining (combining statistics, optimization, machine learning, artificial intelligence, and databases), and modern heuristics (integrating ideas from tens of fields such as biology, forest, immunology, statistical mechanics, and physics to inspire search techniques). These integrations have proved useful in substantiating problem-solving approaches with reliable and robust techniques to handle the increasing demand from practitioners to solve real-life problems.
With the revolution in genetics, databases, automation, and robotics, problems are no longer those that can be solved analytically in a feasible time. Complexity arises because of new discoveries about the genome, path planning, changing environments, chaotic systems, and many others, and has contributed to the increased demand to find search techniques that