Approximation-Based Similarity Search for 3-D Surface Segments1

A revised version of this report appeared in GeoInformatica, Int. Journal on Advances of Computer Science for Geographic Information Systems 2(2): 113-147, 1998. query Approximation-Based Similarity Search for 3-D Surface Segments1 DB = … HANS-PETER KRIEGEL AND THOMAS SEIDL { } University of Munich, Institute for Computer Science, Oettingenstr. 67, D-80538 München, Germany Contact: [email protected], phone +49-89-2178-2191, fax +49-89-2178-2192 Figure 1. Similarity search in a database of 3-D surface segments. Abstract protein surface and subsequently be stored in a database [SK 95]. Thus, the problem of The issue of finding similar 3-D surface segments arises in many recent applications of spatial database finding docking partners is reduced to finding similar (complementary) surface segments systems, such as molecular biology, medical imaging, CAD, and geographic information systems. Sur- from a large database of segments matching a given query segment. face segments being similar in shape to a given query segment are to be retrieved from the database. The Medical Imaging. Modern medical imaging technology such as CTI or MRI produces de- two main questions are how to define shape similarity and how to efficiently execute similarity search scriptions of 3-D objects like organs or tumors by a set of 2-D images. These images repre- queries. We propose a new similarity model based on shape approximation by multi-parametric surface sent slices through the object, from which the 3-D shape can be reconstructed. A method for functions that are adaptable to specific application domains. We then define shape similarity of two 3-D retrieving similar surface segments can help to discover correlations between shape defor- surface segments in terms of their mutual approximation errors. Applying the multi-step query process- mations of organs and certain deceases. ing paradigm, we propose algorithms to efficiently support complex similarity search queries in large spatial databases. A new query type, called the ellipsoid query, is utilized in the filter step. Ellipsoid Geographic Information Systems. Regions with similar topography or hills that have a queries, being specified by quadratic forms, represent a general concept for similarity search. Our major similar shape as a given example are of great interest for geographers and the mining indus- contribution is the introduction of efficient algorithms to perform ellipsoid queries on multi-dimension- try, for example. A modern GIS will benefit users by supporting shape similarity search for al index structures. Experimental results on a large 3-D protein database containing 94,000 surface 3-D surface segments. segments demonstrate the successful application and the high performance of our method. Further application fields include CAD and mechanical engineering. In order to meet the Keywords: Approximation-based similarity search, multi-step similarity query processing, ellipsoid specific requirements of these application domains, our method supports invariance against queries on multidimensional index structures, 3-D spatial database systems translation and rotation, because the position and orientation of the objects in 3-D space does not affect shape similarity. Since the number of objects in a spatial database typically is very large, efficient query processing is important and is supported by our algorithm. Our method 1. Introduction requires the surface segments to be given as sets of points which can be obtained from common surface representations. Figure 1 illustrates the problem of similarity search for 3-D surface Currently, more and more applications managing spatial objects become involved with the segments: Given a query segment query, retrieve all segments from the database DB of 3-D problem of efficient similarity search in large databases. The application areas of retrieving surface segments that are similar to query. similar surface segments range from molecular biology and medical imaging to geographic information systems (GIS) and CAD databases containing mechanical parts. The following ex- The paper is organized as follows: In the remainder of this introduction, we sketch some amples illustrate typical requirements and potential benefits of shape-oriented similarity related work as well as the basic idea of our approach. In Section 2, we provide the background search: for shape approximation of 3-D segments by multi-parametric surface functions. Our novel Molecular Biology. A challenging problem in molecular biology is the prediction of pro- shape similarity model for 3-D surface segments is defined in Section 3, along with an experi- tein-protein interactions (the molecular docking problem): Which proteins from the data- mental evaluation of similarity results. Starting with Section 4, we turn to efficient query pro- base form a stable complex with a given query protein? It is well known that docking part- cessing and provide a framework for multi-step similarity query processing. We derive a lower- ners are recognized by complementary surface regions [MWS 96]. In many cases, the active bounding filter distance function that guarantees no false dismissals. This function corresponds sites of the proteins, i.e. the docking regions, are known and can be extracted from the to an ellipsoid query which represents a new and general query type for spatial database systems. In Section 5 we introduce a new algorithm for efficiently performing ellipsoid queries on 1. This research was funded by the German Ministry for Education, Science, Research and Technology multidimensional index structures. Section 6 shows the performance results for our similarity (BMBF) under grant no. 01 IB 307 B. The authors are responsible for the content of this paper. search system, and Section 7 concludes the paper. - 1 - - 2 - 1.1. Related Work a) b) c) In recent years, considerable work has been done on similarity search in database systems. s1 s2 Most of the previous approaches deal with one- or two-dimensional data, such as time series, digital images or polygonal data. However, they do not manage three-dimensional objects. s3 Agrawal et al. present a method for similarity search in a sequence database of one-dimensional data [AFS 93]. The sequences are mapped onto points of a low-dimensional feature space using a Discrete Fourier Transform. A Point Access Method (PAM) is then used for effi- Figure 2. a) 3-D spatial objects such as molecules or mechanical parts; b) surface representation cient retrieval of similar sequences. This technique was later generalized for subsequence by points; c) surface segments as subsets of the surface points. matching in [FRM 94], and searching in the presence of noise, scaling, and translation in [ALSS 95]. Nevertheless, it remains restricted to one-dimensional sequence data. tions or vector representations, this assumption does not restrict generality. Figure 2 shows an Jagadish proposes a technique for the retrieval of similar shapes in two dimensions [Jag 91]. example from molecular biology. A point set is computed for the surface of every molecule. He derives an appropriate object description from a rectilinear cover of an object, i.e. a cover The surface is then decomposed into (not necessarily disjoint) segments. The resulting set of consisting of rectilinear rectangles. The rectangles belonging to a single object are sorted by segments should include all docking sites at which the interaction with partner molecules takes size, and the largest ones serve as retrieval key for the shape of the object. Normalization is used place. to achieve invariance with respect to scaling and translation. Though this method can be gener- Several techniques are available for the segmentation of molecular surfaces. They are adapt- alized to three dimensions by using covers of hyperrectangles, it has not been evaluated for real ed from image and signal processing, or from clustering techniques in spatial databases. Two world 3D data and, furthermore, does not achieve rotational invariance. different classes may be distinguished: Mehrotra and Gary suggest the use of boundary features for the retrieval of shapes [MG 93] Guided segmentation. If typical shapes or locations of docking sites on molecular surfaces [GM 93]. A 2D-shape is represented by an ordered set of surface points, and fixed-sized subsets are known, the segmentation technique may be provided with appropriate heuristics to de- of this representation are extracted as shape features. All of these features are mapped to points termine potential docking segments. Such a guided algorithm, while returning a small num- in a multidimensional space which are stored using a PAM. This method provides translational, ber of segments, has a low probability of dismissing or splitting actual docking sites. The rotational and scaling invariance. It can handle partially occluded objects, but is limited to two reliability of the algorithm, however, critically depends on the quality of the underlying dimensions. heuristics. It is non-trivial to provide well-suited heuristics without deep insight into the For retrieving similar 2D polygon shapes from a CAD database system, previous work is characteristics of molecular docking processes [Mei+ 95] [MPSS 97]. presented in [BKK 97] and [BK 97]. This technique applies the Fourier Transformation in order Naive segmentation. If no information on how to extract typical docking sites from molec- to obtain a shape encoding for retrieving similar sections of polygon contours. The polygon ular surfaces is available, a brute force segmentation has to be applied. A lot of segments are sections are stored as extended multidimensional feature objects and a Spatial Access Method produced for each object. The more segments we produce, the higher the probability that no (SAM) is used for efficient retrieval [BKK 96] [Ber+ 97]. This approach is also limited to two- actual docking sites are missed. The drawback as compared to the guided segmentation dimensional objects. approach is, that the system has to manage considerably more 3-D segments. Korn et al. propose a method for searching similar tumor shapes in a medical image database [Kor+ 96].

Approximation-Based Similarity Search for 3-D Surface Segments1

Chapter 5 Dimensional Analysis and Similarity

High School Geometry Model Curriculum Math.Pdf

The Development of Thales Theorem Throughout History Slim Mrabet

Geometry Course Outline

Similar Quadrilaterals Cui, Kadaveru, Lee, Maheshwari Page 1

∆ Congruence & Similarity

Document Similarity in Information Retrieval

SIMILARITY Euclidean Geometry Can Be

Chapter 4 Euclidean Geometry

Comparison Dimensions and Similarity: Addressing Individual Heterogeneity

Similarity Measures Similarity and Dissimilarity Are Important Because

SIMILARITY Euclidean Geometry Can Be Described As a Study of The