MANAGING and MINING MULTIMEDIA DATABASES
Bhavani Thuraisingham

CRC Press
Boca Raton London New York Washington, D.C.

Library of Congress Cataloging-in-Publication Data

Thuraisingham, Bhavani M.
Managing and mining multimedia databases / Bhavani Thuraisingham.
p. cm.
Includes bibliographical references and index.
ISBN 0-8493-0037-1
1. Database management. 2. Data mining. 3. Multimedia systems. I. Title.
QA76.9.D3 T458 2001
006.7--dc21 2001025368

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher.

The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying.

Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

© 2001 by CRC Press LLC
No claim to original U.S.
Government works
International Standard Book Number 0-8493-0037-1
Library of Congress Card Number 2001025368
Printed in the United States of America 1 2 3 4 5 6 7 8 9 0
Printed on acid-free paper

Preface

Recent developments in information systems technologies have resulted in computerizing many applications in various business areas. Data has become a critical resource in many organizations; therefore efficient access to data, sharing or extracting information from the data, and making use of this information have become urgent needs. As a result, there have been many efforts to integrate the various data sources scattered across several sites and to extract information from these databases in the form of patterns and trends. These data sources may be databases managed by database management systems, or they could be warehoused in a repository from multiple sources.

The advent of the World Wide Web (WWW) in the mid-1990s has resulted in even greater demand for managing data, information, and knowledge effectively. There is now so much data on the Web that managing it with conventional tools is becoming almost impossible. New tools and techniques are needed to manage this data effectively. Therefore, various tools are being developed to provide interoperability and warehousing between the multiple data sources and systems, as well as to extract information from the databases and warehouses on the Web.

Data in Web databases are both structured and unstructured. Structured databases include relational and object databases. Unstructured databases include text, image, audio, and video databases. In general, multimedia databases are unstructured. Some text databases are semistructured, meaning that they have partial structure. Developments in multimedia database management systems have exploded during the past decade.
While numerous papers and some texts have appeared on multimedia databases, more recently these databases are being mined to extract useful information. Furthermore, multimedia databases are being accessed on the Web. There is currently little information about providing a complete set of services for multimedia databases. These services include managing, mining, and integrating multimedia databases on the Web for electronic enterprises.

The focus of this book is on managing and mining multimedia databases for the electronic enterprise. We focus on database management system techniques for text, image, audio, and video databases. We then address issues and challenges regarding mining the multimedia databases to extract information that was previously unknown. Finally, we discuss the directions and challenges of integrating multimedia databases for the Web. In particular, e-business and its relationship to managing and mining multimedia databases will be discussed.

Few texts cover a comprehensive set of services for multimedia data management, although numerous research papers have been published on this topic. The purpose of this book is to discuss complex ideas in multimedia data management and mining in a way that can be understood by someone who wants background information in this area. Technical managers as well as those interested in technology will benefit from this book. We employ a data-centric approach to describe multimedia technologies. The concepts are explained using e-commerce and the Web as an application area.

This book is divided into three parts. Part I describes multimedia database management. Without the underlying concepts such as querying and storage management, one cannot develop multimedia information management for the Web. We start with an overview of multimedia database system architectures and data models.
This is followed by a discussion of some critical functions for multimedia database management. These functions include query processing, metadata management, storage management, and distribution.

Part II describes multimedia data mining. We discuss text, image, video, and audio mining. These discussions also provide overviews of text/information retrieval, image processing, video information retrieval, and audio/speech processing.

Part III describes multimedia on the Web. We start with a discussion of how multimedia databases may be integrated on the Web and then address multimedia data management and mining for e-business. We discuss some of the emerging technologies to support multimedia data management, e.g., collaboration, knowledge management, and training. Next, we discuss security and privacy issues for multimedia databases with the Web in mind. Finally, emerging standards as well as prototypes and products for multimedia data management and mining are explored.

Since a lot of background information is needed to understand the concepts in this book, six appendices are included. Appendix A provides an overview and framework for data management, showing where multimedia data management fits into this framework. We then provide a discussion of database systems technologies followed by a discussion of data mining technologies. Object technologies are discussed next; these include object-programming languages, object databases, object-based design and analysis, distributed objects, and components and frameworks, which all have applications in multimedia data management. Next, we discuss security issues, and finally, we provide an overview of Web technologies and e-commerce. Since multimedia on the Web will be a critical part of our lives and the Web is central to this book, we have also provided an introduction to the Web.

Although our first three books, Data Management Systems: Evolution and Interoperation; Data Mining: Technologies, Techniques, Tools, and Trends; and Web Data Management and Electronic Commerce, would serve as excellent sources of reference, this book is fairly self-contained. We have provided a reasonably comprehensive overview of the background material necessary to understand multimedia databases in the six appendices. However, some of the details of this background information, especially on data management and mining, can be found in our previous texts.

We have tried to obtain current information on products and standards. However, as emphasized repeatedly in our books, vendors and researchers are continually updating their systems, and therefore information valid today may not be accurate tomorrow. We urge the reader to contact the vendors and get up-to-date information. Note that many of the products are trademarks of various corporations. If we know or have heard of such trademarks, we use capital italic letters for the product when it is first introduced. Again, due to the rapidly changing nature of the computer industry, we encourage the reader to contact the vendors to obtain up-to-date information on trademarks and ownership of the various products.

We have tried our best to obtain references from books, journals, magazines, and conference and workshop proceedings, and have given only a few Web page URLs as references. Although we tried to limit URLs as references, we found that it was almost impossible to write a current text without referencing them. Although URLs often contain excellent reference material, some may no longer be available even by the time this book is published. Therefore, we also encourage the reader to check the Web periodically for current information on multimedia data management developments, prototypes, and products.
There are several conference series devoted to this topic. We repeatedly use the terms data, data management, database systems,