Intelligent Big Multimedia Databases

Published by
World Scientific Publishing Co. Pte. Ltd.
Toh Tuck Link, Singapore
USA office: Warren Street, Hackensack, NJ
UK office: Shelton Street, Covent Garden, London

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

Copyright by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., Rosewood Drive, Danvers, MA, USA. In this case permission to photocopy is not required from the publisher.

Printed in Singapore

for Manuela

Preface

Multimedia databases address a growing number of commercially important applications such as media on demand, surveillance systems and medical systems. This book presents essential and relevant techniques and algorithms for the development and implementation of large multimedia database systems.

The traditional relational database model is based on a relational algebra that is an offshoot of first-order logic and of the algebra of sets. The simple relational model is not powerful enough to address multimedia data. Because of this, multimedia databases are categorized into many major areas.
Each of these areas is now so extensive that a thorough understanding of the mathematical core concepts requires the study of different fields such as information retrieval, digital image processing, fractals, machine learning, neural networks and high-dimensional indexing. This book unifies the essential concepts and recent algorithms into a single volume.

Overview of the book

The book is divided into ten chapters. We start with some examples and a description of multimedia databases. In addressing multimedia information, we are addressing digital data representations and how these data can be stored and manipulated. Multimedia data provide more functionality than is available in traditional forms of data. They allow new data access methods such as query by image, in which the image most similar to the presented image is determined.

In the third chapter, we address the basic transform functions that are required when addressing multimedia databases, such as the Fourier and cosine transforms as well as the wavelet transform, which is the most popular. Starting from continuous wavelet transforms, we investigate the discrete fast wavelet transform for images, which is the basis for many compression algorithms. It is also related to the image pyramid, which will play an important role when addressing indexing techniques. We conclude the chapter with a description of the Karhunen-Loève transform, which is the basis of principal component analysis (PCA), and of the k-means algorithm.

The size of a multimedia object may be huge. For the efficient storage and retrieval of large amounts of data, a clever method of encoding the information using fewer bits than the original representation is essential. This is the topic of the fourth chapter, which addresses compression algorithms.
In addition to lossless compression, in which no information is lost, lossy compression based on human perceptual features is essential; in this form of compression, we represent only the part of the information that we experience.

Lossy compression is related to feature extraction, which is described in the fifth chapter. We introduce the basic image features and, starting from the image pyramid and the scale space, describe the scale-invariant feature transform (SIFT). Next, we turn to speech and explain the speech formant frequencies. A feature vector represents the extracted features that describe multimedia objects. We introduce the distinction between nearest neighbor similarity and epsilon similarity for vectors in a database. When the features are represented by sequences of varying length, time warping is used to determine the similarity between them.

For fast access to large data, divide-and-conquer methods based on hierarchical structures are used; these are discussed in the sixth chapter. For numbers, a tree can be used to prune branches when processing queries. The access is fast: it is logarithmic in the size of the database representing the numbers. Usually, multimedia objects are described by vectors rather than by numbers. For low-dimensional vectors, metric index trees such as kd-trees and R-trees can be used. Alternatively, an index structure based on space-filling curves can be constructed.

The metric index trees operate efficiently when the number of dimensions is small. Growth in the number of dimensions has negative implications for the performance of multidimensional index trees; these negative effects are called the "curse of dimensionality." The curse of dimensionality states that, for an exact nearest neighbor search, any algorithm for high dimension d and n objects must either use space on the order of n^d or have a query time on the order of n × d [Böhm et al. (2001)], [Pestov (2012)]. In approximate indexing, some data points may be lost because some distances are distorted. Approximate indexing seems to be, in some sense, free from the curse of dimensionality. We describe the popular locality-sensitive hashing (LSH) algorithm in the seventh chapter.

An alternative method based on exact indexing is generic multimedia indexing (GEMINI), which is introduced in the eighth chapter. The idea is to determine a feature extraction function that maps the high-dimensional objects into a low-dimensional space. In this low-dimensional space, a so-called "quick-and-dirty" test can discard the non-qualifying objects. Based on the ideas of the image pyramid and the scale space, this approach can be extended to the subspace tree. The search in such a structure starts at the subspace with the lowest dimension. In this subspace, the set of all possible similar objects is determined. The algorithm can be easily parallelized for large data: the database is divided into chunks, and the chunks may be processed individually by tens to thousands of servers.

In the following chapter, we address information retrieval for text databases. Documents are represented as sparse vectors, in which most components are zero. To address this, alternative indexing techniques based on random projections are described.

The tenth chapter addresses an alternative approach to feature extraction based on statistical supervised machine learning. Based on perceptrons, we introduce the back-propagation algorithm and the radial-basis function networks, both of which may be constructed by the support-vector-learning algorithm.

We conclude the book with a chapter about applications, in which we highlight some architecture issues and present multimedia database applications in medicine.
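The quick-and-dirty test behind GEMINI, as outlined in the overview above, can be illustrated with a small sketch. This is not code from the book. It assumes Euclidean distance and a hypothetical feature extraction function, `block_feature`, which replaces each block of four components by its scaled sum; the function names, the block size and the random test data are all assumptions made for illustration. The only property the sketch relies on is that the distance between two feature vectors never exceeds the distance between the original vectors.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def block_feature(v, block=4):
    """Hypothetical feature extraction: one value per block of `block` entries.

    Each value is the block sum divided by sqrt(block). By the Cauchy-Schwarz
    inequality, the distance between two feature vectors never exceeds the
    distance between the original vectors (up to floating-point rounding).
    """
    scale = math.sqrt(block)
    return [sum(v[i:i + block]) / scale for i in range(0, len(v), block)]


def epsilon_query(database, features, query, eps, block=4):
    """Return (indices of objects within eps of query, number of candidates)."""
    q_feat = block_feature(query, block)
    # Quick-and-dirty test in the low-dimensional space: discard every object
    # whose feature distance already exceeds eps.
    candidates = [i for i, f in enumerate(features)
                  if euclidean(f, q_feat) <= eps]
    # Refinement: check the surviving candidates in the original space.
    hits = [i for i in candidates if euclidean(database[i], query) <= eps]
    return hits, len(candidates)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    database = [[rng.random() for _ in range(16)] for _ in range(1000)]
    features = [block_feature(v) for v in database]  # built once, offline
    query = [rng.random() for _ in range(16)]
    hits, n_candidates = epsilon_query(database, features, query, eps=1.0)
    print(f"{n_candidates} candidates passed the filter, {len(hits)} qualify")
```

Because the feature distance is a lower bound on the true distance, the filter may let through false alarms, which the refinement step removes, but it does not discard a qualifying object. The result is therefore the same as that of a full scan, while most of the expensive high-dimensional comparisons are avoided.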
The book is written for general readers and information professionals as well as students and professors who are interested in the topic of large multimedia databases and want to acquire the required essential knowledge. In addition, readers interested in general pattern recognition engineering can profit from the book. It is based on a lecture that was given for several years at the Universidade de Lisboa.

My research in recent years has benefited from many discussions with Ângelo Cardoso, Catarina Moreira and João Sacramento. I would like to acknowledge financial support from Fundação para a Ciência e Tecnologia (Portugal) through the programme PTDC/EIA-CCO/119722/2010.

Finally, I would like to thank my son André and my loving wife Manuela; without their encouragement, the book would never have been finished.

Andreas Wichert

Contents

Preface

1. Introduction
1.1 Intelligent Multimedia Database
1.2 Motivation and Goals
1.3 Guide to the Reader
1.4 Content

2. Multimedia Databases
2.1 Relational Databases
2.1.1 Structured Query Language SQL
2.1.2 Symbolical artificial intelligence and relational databases
2.2 Media Data
2.2.1 Text
2.2.2 Graphics and digital images
2.2.3 Digital audio and video
2.2.4 SQL and multimedia
2.2.5 Multimedia extender
2.3 Content-Based Multimedia Retrieval
2.3.1 Semantic gap and metadata

3. Transform Functions
3.1 Fourier Transform
3.1.1 Continuous Fourier transform
3.1.2 Discrete Fourier transform
