New Jersey Institute of Technology Digital Commons @ NJIT

Dissertations Electronic Theses and Dissertations

Spring 5-31-2004

Efficient in high-dimensional data spaces

Yue Li New Jersey Institute of Technology

Follow this and additional works at: https://digitalcommons.njit.edu/dissertations

Part of the Computer Sciences Commons

Recommended Citation Li, Yue, "Efficient similarity search in high-dimensional data spaces" (2004). Dissertations. 633. https://digitalcommons.njit.edu/dissertations/633

This Dissertation is brought to you for free and open access by the Electronic Theses and Dissertations at Digital Commons @ NJIT. It has been accepted for inclusion in Dissertations by an authorized administrator of Digital Commons @ NJIT. For more information, please contact [email protected].

Copyright Warning & Restrictions

The copyright law of the United States (Title 17, United States Code) governs the making of photocopies or other reproductions of copyrighted material.

Under certain conditions specified in the law, libraries and archives are authorized to furnish a photocopy or other reproduction. One of these specified conditions is that the photocopy or reproduction is not to be “used for any purpose other than private study, scholarship, or research.” If a, user makes a request for, or later uses, a photocopy or reproduction for purposes in excess of “fair use” that user may be liable for copyright infringement,

This institution reserves the right to refuse to accept a copying order if, in its judgment, fulfillment of the order would involve violation of copyright law.

Please Note: The author retains the copyright while the New Jersey Institute of Technology reserves the right to distribute this thesis or dissertation

Printing note: If you do not wish to print this page, then select “Pages from: first page # to: last page #” on the print dialog screen

The Van Houten library has removed some of the personal information and all signatures from the approval page and biographical sketches of theses and dissertations in order to protect the identity of NJIT graduates and faculty. ASAC

EICIE SIMIAIY SEAC I IGIMESIOA AA SACES

b Y

Simiaiy seac i ig-imesioa aa saces is a oua aaigm o may moe aaase aicaios suc as coe ase image eiea ime seies aaysis i iacia a makeig aaases a aa miig Oecs ae ee- see as ig-imesioa ois o ecos ase o ei imoa eaues Oec simiaiy is e measue y e isace ewee eaue ecos a simiaiy seac is imemee ia age queies o k-eaes eigo (k- queies

Imemeig k- queies ia a sequeia sca o age aes o eaue

ecos is comuaioay eesie uiig mui-imesioa iees o e

eaue ecos o k- seac aso es o e usaisacoy we e ime- sioaiy is ig is is ue o e oo ie eomace cause y e ime- sioaiy cuse

imesioaiy eucio usig e Sigua aue ecomosiio meo is e aoac aoe i is suy o ea wi ig-imesioa aa oig a o may ea-wo aases aa isiuio es o e eeogeeous imesioaiy

eucio o e eie aase may cause a sigiica oss o iomaio Moe eicie eeseaio is soug y cuseig e aa io omogeeous suses o ois a ayig imesioaiy eucio o eac cuse eseciey ie uiiig oca ae a goa imesioaiy eucio

e esis eas wi e imoeme o e eiciecy o quey ocessig associae wi oca imesioaiy eucio meos suc as e Cuseig a

Sigua aue ecomosiio (CS a e oca imesioaiy eucio ( meos aiaios i e imemeaio o CS ae cosiee a e wo meos ae comae om e iewoi o e comessio aio CU ime a

eiea eiciecy

A eac k- agoim is esee o oca imesioaiy eucio meos

y eeig a eisig mui-se k- seac agoim wic is esige o goa imesioaiy eucio Eeimea esus sow a e ew meo

equies ess CU ime a e aoimae meo oose oigia o CS a a comaae ee o accuacy

Oima susace imesioaiy eucio as e ie o miimiig oa quey cos e oem is comicae i a eac cuse ca eai a iee

ume o imesios A yi meo is esee comiig e es eaues o

e CS a meos o i oima susace imesioaiies o cuses geeae y oca imesioaiy eucio meos e eeimes sow a e

oose meo woks we o o ea-wo aases a syeic aases EICIE SIMIAIY SEAC I IG-IMESIOA AA SACES

y Yue i

A isseaio Sumie o e acuy o ew esey Isiue o ecoogy i aia uime o e equiemes o e egee o oco o iosoy i Comue Sciece

eame o Comue Sciece

May Copyright © 2004 by Yue Li ALL RIGHTS RESERVED AOA AGE

EICIE SIMIAIY SEAC I IGIMESIOA AA SACE

Y

Aeae omasia isseaio Aiso ae oesso o Comue Sciece I

ose eug Commiee Meme ae isiguise oesso o Comue Sciece I

Wociec ye Commiee Meme ae oesso o Comue Sciece I

ice Oia Commiee Meme ae Assisa oesso o Comue Sciece I

ia Yag Commiee Meme ae Assisa oesso o Iusia a Mauacuig Egieeig I IOGAICA SKEC

Athr: Yue i

r: oco o iosoy

t: May

Undrrdt nd Grdt Edtn:

• oco o iosoy i Comue Sciece ew esey Isiue o ecoogy ewak

• Mase o Sciece i Comue Sciece Saog Uiesiy ia Cia 1999 • aceo o Sciece i Comue Sciece Saog Uiesiy ia Cia 1993

Mjr: Comue Sciece

rnttn nd bltn:

Y i A omasia a ag "iig Oima Susace imesioaiy o k- Seac i Cusee aases" 15 I Co o aaase a Ee Sysems Aicaios (EA sumie ag A omasia a Y i "A esise a yamic Oee aiio Ie o k eaes eigo Seac" 15 I Co o aaase a Ee Sysems Aicaios (EA sumie Y i A omasia a ag "A Eac k- Seac Agoim o CS" IEEE asacios o Kowege a aa Egieeig (KE oua i eiew 3

v o e memoy o my moe

ACKOWEGME

I wou ike o eess my eees aeciaio o Aeae omasia wo

o oy see as my eseac aiso oiig auae a couess esouces isig a iuiio u aso cosay gae me suo ecouageme a

eassuace Secia aks ae gie o ose eug Wociec ye

ice Oia a ia Yag o aciey aiciaig i my commiee

I wou ike o ak youg-Kee Yi a ioio Casei o ei e o my eseac is wok cou o ae ee comee wiou ei ackgou

e Secia aks ae gie o iua ag wo woke wi me o wo yeas a assise me i imemeig cuseig a mui-imesioa ieig agoims

I wou ike o ak my oe eow sues i e Iegae Sysems

aoaoy — Cuqi a Cag iu a Gag u — o ei iesi suo a assisace oe yeas

iay I wou ike o ak my usa my ae my auge a my i-aws o ei oe a suo

v

AE O COES

Chptr 1 IOUCIO 1 11 Moiaio 1 1 Caeges Coiuios a Ouie 11 imesioaiy eucio 1 Eac k- Seac 5 13 Oima Susace imesioaiy 1 Ouie 7 IG-IMESIOA IEIG 9 1 Simiaiy Queies 0 isace Meics 11 3 Mui-imesioa Ie Sucues 13 31 K--ees 1 3 e -ee a Is aiaios 15 imesioaiy Cuse 17 5 imesioaiy eucio 19 51 owe ouig oey 1 5 omaie Mea Squae Eo 53 Sigua aue ecomosiio (S 5 iscee ouie asom ( 55 iscee Waee asom (W 5 5 Segmeaio Mea Meo (SMM Meic ase Ie Sucues 7 Summay 7 3 EOMACE O CS 9 31 Ioucio 9

v AE O COES (Cntnd Chptr 3 eae Wok 33 31 Cuseig 33 3 3 33 MM 37 33 e CS Meo 39 331 CS Imemeaio Ses 39 33 Aoimae k- Seac Agoim 3 3 eomace Suy 31 Eeimea Seu 3 eomace Meics 7 33 Eeimes 35 Cocusios K-EAES EIGO SEAC 1 Ioucio eae Wok 3 3 Eac k- Seac Agoim o CS 5 Eeimes 1 CU Cos esus MSE o e Eac Meo Eac Meo esus Aoimae Meo 7 3 e Eec o Maiaiig a Ea imesio 71 5 Cocusio a iscussio 73 5 OIMA SUSACE IMESIOAIY 75 51 Ioucio 75 5 eae Wok 79 53 e yi Meo o Cusee aases 1 531 Iomaio oss a Susace imesioaiy 1

v AE O COES (Coiue Cae age

53 e yi Meo 5 533 Oima Susace imesioaiy 5 Eeimes 7 51 MSE a e Aeage Susace imesioaiy 9 5 Quey Coss a Susace imesioaiy 91 55 Cocusio 93 COCUSIO 95 AEI A K- AGOIM O 97 AEI CU COS O EGUA K-MEAS A EIICA K-MEAS 99 AEI C AAYIC K- QUEY COS MOE 11 C1 aca imesios 11 C aousoss Quey Cos Moes 1 C3 oms Quey Cos moes 15 C e eise Quey Moe o age Queies 1 C5 Eeimes 17 EEECES 19

i IS O AES

ae age

11 Symo ae

31 aases 7

3 MM aamees a eaie imesios Comae o CS

51 Agoim 1

5 Agoim 5

53 aases o e Eeimes

5 Cusee aases (C Geeae y wi iee aamees 9

A1 k- Agoim o 9

C1 esus o Equaio s Eeimes (M = 1 E = 1 E = 1 1

IS O IGUES

r

1 A - eame o (a ee yes o isaces ( age quey o aius E ue e ee meics 1 Eame o a k---ee o - sace 15 3 Eame o a - -ee 1 A cice a is miimum ouig squae i - sace 1 5 isaces ewee aa ois a cees o the datasets: (a) COLH64 (b) TXT55 (c) GABOR64 (d) SYHT64. 20 Segmeaio mea meo [79] 31 Eame o S o a - same o ois 31 3 (a S ( CS o ocay coeae aa sames 3 33 Seacig eaes eigos acoss muie cuses 3 Aoimae isace ucio 5 35 (a e aeage ume o imesios eaie y S a CS-GM1 o iee ume o cuses s MSE ( aa oume o S a CS-GM1 s MSE (c ume o imesios eaie y e ee CS meos o 1 cuses s MSE ( Same wi 1 cuses 55 was use i a cases 5 3 (a ume o imesios eaie o egua k-meas a eiica k- meas usig GM as a ucio o MSE ( aios o ume o imesios eaie y egua k-meas o eiica k-meas o ee meos 51 37 (a e acio o aa ois isie (k* = k, SYNT64). ( Seeu o CS a S s MSE (k* = k, SY (c e aeage ume o cuses ouce uig e seac ( = 75 AEIA5 ( Seeu o e ee CS meos a S (3 cuses = 75 AEIA5 CS-GM1 uiie i (a ( a (c 5 3 ecisio s MSE o e ee CS meos a S o (a 1 cuses AEIA5 = 75 a ( 1 cuses 55 = 75 53 39 aios o (a CU cos a ( ecisio o eee megig (m o o-e-y megig (om s MSE CS-GM1 ( = 75 55 5

x IS O IGUES (Coiue igue age

31O (a CU cos a ( ecisio s MSE aameeie y eee- megig-oigia isace (-m-o eee-megig-ao isace (-m-o a o-e-y-megig-ao isace (o-m-a AEIA5 eca = O cuses GM1 55 311 esus o CS(GM1 a o SY (a Aeage ume o imesios eaie ( ecisio s MSE (c CU cos s MSE ( acio o aa ois isie s MSE 5 31 esus o CS(GM1 a o COIS (a Aeage ume o imesios eaie ( ecisio s MSE (c CU cos s MSE ( acio o aa ois isie s MSE 57 313 esus o CS(GM1 s o 55 (a Aeage ume o imesios eaie ( ecisio (c CU cos ( acio o aa ois isie 5 31 esus o CS (GM1 s o A5 (a Aeage ume o imesios eaie ( ecisio (c CU cos ( acio o aa ois isie 59 315 Comaiso o CS a MM o (a CU cos ( ume o ois eiee (c acio o ois isie SY O- 1 1 Aoimae isace i CS Ieig sucue o CS 3 CU cos o e eac k- agoim 9 Comaiso o CU cos o e eac a aoimae meos 7O 5 aue o k o e eac a e aoimae agoim 7 e isaces ewee 5O queies a ei O eaes eigos MSE = O 55 oe a eac isace is e sum o squae-isaces o e O ois o e quey oi 73 7 (a CU cos a ( ume o ois eiea (k o imesios a + 1 imesios GAOO 5 cuses 7 51 Oima susace imesioaiy wi esec o miima quey cos 7 5 Cusee aase a imesioaiy euce cusee aase 1

ii IS O IGUES (Coiue igue age

53 Aeage susace imesioaiy s (a eso ecois o e cusee aases o GAOO a SY (MSE (y usig Agoim 1 o e cusee aases o a ou aases 9

5 CU coss o - queies s MSE o (a SY ( GAO 91

55 I/O coss o - queies s (aMSE a (susace imesios o SY wi S-ees 9

5 I/O coss o - queies s (aMSE a (susace imesios o 55 wi S-ees 93

57 I/O coss o - queies s (aMSE a (susace imesios o CO wi S-ees 9

5 I/O coss o - queies s (aMSE a (susace imesios o GAO wi S-ees 9

1 CU cos esus MSE o eac k- agoim o e wo k-meas agoim o 55 (a cuses ( 1 cuses 99

C1 (a Sieiski iage; ( Koc sowake; (c Maeio se; ( e 1

C ie ses i geeaig Sieiski iages 13 CAE 1

IOUCIO

11 Moiaio

ig-imesioa aa as aways ee a caege i may aicaio aeas suc as iomaio eiea image ocessig aa miig ae ecogiio a

ecisio suo Coemoay aaase Maageme Sysems (MS ae ecome muc moe comicae a ei eecessos a e em oes o oy mea e aiioa aaases suc as eaioa o oec-oiee aaases u aso may oe yes o aaases suc as

• Muimeia aaases Muimeia aaases coai aious yes o aa

ike images auio a ieo cis Muimeia aaases ae aie i a wie

aiey o ies [1] a amog em igia imagey ays a auae oe i

umeous uma aciiies ee ae may aicaios eaig wi oo-

gaic images saeie images (o emoey sese images [ 53] meica

images (ike -imesioa -ays [] a 3-imesioa MAI ai scas []

geoogic images a iomeic ieiicaio images (ike ige iig [3] I

ese aicaios e goa is o i oecs i e aaase a ae simia o

some age oec eeoe eac image is asome io eaue ecos y

eacig eaues suc as coo eue o sae wi umeic aues a e

"simiaiy" is acuay eemie y e feature vectors a isace measues

ewee e ecos

• ime seies aaases ime seies o ime sequeces aa accous o a

mao acio o a iacia meica makeig a scieiic aa a is

usuay use o aaysis aa miig a ecisio makig A ime seies is

oe cae a signal [7] ime seies aaases coe ime seies segmes io

1

mui-imesioa ois usig some asomaio suc as iscee ouie

asom ( [3] a iscee Waee asom [3] Simiaiy seac

o ime seies wic is usuay eome o asome aa is ey oua

sice eoe ae ieese i iig simia aes i ime seies a e

esus o macig ae oe use o e ue aaysis o make es

• A aaases Geeic maeia (A soes comee isucios o

a e ceua ucios o a ogaism A is sigs o a ou-caace

aae kow as e uceoie ases eesee y A C G a [7]

A aaases coai a age coecio o suc og sigs A ew sig

(eg a ukow isease as o e mace agais e o sigs ase o

a ceai isace ucio o i e es caiaes

e aious yes o aaases a ae eaue ecos a ca e ee- see as ig-imesioa aa ois o ecos wi umeic aues eeoe ey ae eee o as ig-imesioa aases geeay Mui-imesioa aaases

equie "Simiaiy ase" queies o Coe ase eiea (C ae a

aiioa queies wic ae ase o keys Seacig o simia aes i e aoe aaases is esseia ecause i es i eicios ecisio makig comue- aie meica iagosis yoesis esig a i aa miig [7]

e simiaiy ewee wo aa oecs is yicay measue y e isace

ewee wo ecos a seacig o oecs us ecomes a seac o ois i e

eaue sace e coice o e isace meic is usuay eemie y e ai- caio o iee aa a aicaios e ways o maig aa oecs o ig-

imesioa aa ois ae iee a so ae e isace ucios Muimeia oecs ae eesee y ow-ee eaues suc as saia sae coo isogam

a eue o images eaues ae e asome io ig-imesioa ois

(ecos o eame some aicaios ocus o coo eaues I is case

eac oec is eesee y a -imesioa coo isogam a e simiaiy

ewee wo images is eemie y e isace ewee e coesoig wo coo isogams esies coo some aicaios eac sae [39] o eues iomaio [5] wie oes may equie e miue o coo eue saia a sae iomaio

ime seies aa ae umeica i aue u usuay a siga is ey og eg

e cosig sock ice o a woe yea e eaue eacio meos o ime seies aim a aoimaig e oigia siga wi a soe oe ia some asomaio

ee ae wo yes o macig whole match a sub-pattern matching, wee woe macig assumes a e aa a quey seies ae e same eg a su-ae macig cosies e moe geea case wee e aa a quey seies ae iee egs

e mos commoy use isace meic is e Euciea isace wic is aso e eau isace measue i is suy I Cae some o e oe

isace meics ae iouce

ee ae seea yes o simiaiy queies suc as age queies k-eaes

eigo (k- queies a saia ois e k-NN quey is a imoa oo i C I eiees e k coses aa oecs o e quey oec o eame i image aaases a yica quey wou e "find 20 photographs most similar to a given photograph" , o "locate 10 persons who have fingerprints most similar to that of the suspect" .

e aiioa MS ca ay suo suc ki o queies ecause ey

o o ae eicie access meos o mui-imesioa aa e -ee [9] (o

+-ee as ee use as a cassica ieig meo o commecia aaases wie i is a oe-imesioa ieig meo wic is oy suiae o aiioa

imay-key-seac

uig e as wey yeas may mui-imesioa ieig sucues ae aeae suc as -ees [3] k--ees [3] a gi ies [59] Mui-imesioa 4 indexes provide an efficient way to selectively access some data points in a large collection associatively. They work well in low-dimensional space. Unfortunately, as a result of the dimensionality curse, the efficiency of indexing structures degrades rapidly as the number of dimensions increases: almost all pages in an index have to be visited and the query processing is even slower than a sequential scan on the entire dataset [15]. In order to make existing multi-dimensional indexing methods suitable for high-dimensional data, many variants of the R- have been proposed, such as the R-tree [69], the R*tree [10], the SR-tree [42] and the X-tree [13]. The main idea is to improve the space utilization and minimize overlaps. Other methods improve the performance of multi-dimensional indexing methods by combining the advantages of two or more existing structures (like the hybrid tree [21]). These improvements could not solve the problem of the dimensionality curse when the dimensionality of a dataset is very high.

.2 Chlln, Cntrbtn nd Otln

.2. nnlt dtn

A well-known technique to break the dimensionality curse is to reduce the dimen- sionality and then build a multi-dimensional index structure on the reduced dimen- sionality space. The challenge is to achieve index space compression with limited loss of information and in particular with little effect on information retrieval performance.

Dimensionality reduction methods are usually based on a linear or nonlinear transformation followed by retaining a subset of features which are supposed to be more important. Techniques based on linear transformations, such as the Karhunen-

Love Transform (KLT), the Singular Value Decomposition (SVD) and the Principal

Component Analysis (PCA) have been widely used for and . The SVD is shown to be very effective in compressing large tables of numeric data. It relies on global information derived from all the vectors in a 5 dataset. Its applications are therefore more effective when the dataset consists of

omogeeousy distributed vectors. However, high-dimensional datasets in the real world are often not globally correlated. With eeogeeousy distributed feature vectors, using SVD to perform dimensionality reduction on the entire dataset may cause a significant loss of information. In this case, data points are not globally correlated, but rather there exist subsets of data points which are locally-correlated.

More efficient representation can be generated by dividing the dataset into clusters, reducing dimensionality individually or recursively. The two categories of dimen- sionality reduction techniques are classified as goa methods and oca methods.

Cuseig a Sigua aue ecomosiio (CSVD) and oca imesioaiy eucio (LDR) are two such local methods. CSVD clusters datasets using an off-the-shelf clustering method, rotates each cluster using SVD into an uncorrelated coordinating space, and then reduces the dimensionality. LDR identifies local correlated clusters with a special clustering technique and decide the subspace dimensionality for each cluster according to the correlations.

In this thesis, three methods for selecting dimensions to be retained for CSVD are presented and compared with each other, and then the best CSVD method is compared to LDR from the viewpoints of compression ratio, CPU cost and retrieval quality. Experiments are held on four datasets and the results show that CSVD outperforms LDR.

1 Eac k- Seac

Methods for k- search can be divided into two categories: eac methods and aoimae methods. Exact k-NN search returns the k closest points which are the same as the results obtained from linearly searching the original dataset, while approximate k- search returns approximate results and guarantees a certain accuracy

(ratio of the number of relevant nearest neighbors to the total number of retrieved 6

ois

A agoim o i e k-eaes eigos as ee oose eseciay o

CS I is a aoimae meo sice i ioaes e owe-ouig oey

[71] Aoug e ow-ouig oey was iiiay sae o age queies i aso woks o k- queies sice a k- quey ca e asome o a age quey wi a esimae seac aius

I is esis a eac k- agoim is esee o oca imesioaiy

eucio meos ase o a mui-se k- seac agoim esee i []

Eeimes wi wo aases sow a i coss ess CU ime a e aoimae agoim a a comaae ee o accuacy

.2. Optl Sbp nnlt

Sice imesioaiy eucio esus i isace iomaio oss e ume o

imesios o e eaie ecomes a ciica issue ase o e owe-ouig

oey e oa cos o a simiaiy quey sou e e sum o ie quey cos a osocessig cos wic is use o emoe uquaiie caiaes eucig

oo ew imesios oes o soe e oem o imesioaiy cuse wie eucig

oo may imesios esus i ecessie isace iomaio oss ee sou e a oimum iea o imesioaiy i wic e eomace is e es ie

e quey cos is miimie A oima susace imesioaiy coesoig o

e miimum quey cos ca e ou o goa imesioaiy eucio oug eeimes o moeig oca imesioaiy eucio meos aiio a aase io muie cuses suc a eac o wic as iee susace imesioaiy

I is iicu o i e oima susace imesioaiy o eac cuse wi esec

o e oa miimum quey cos

I is esis a yi meo is esee o iscoe e eaiosi amog quey cos aio o oa iomaio oss a susace imesioaiy o eac cuse 7 so a oima susace imesioaiy ca e eemie e eeimes sow

a a oima susace imesioaiy iee eiss a e oose meo woks we o o ea-wo aases a syeic aases

1 Ouie

e es o e esis is ogaie as oows Cae oies a ackgou o ig-imesioa ieig eciques Cae 3 iouces e eise CS meo a comaes i o e LDR meo [73] Cae ooses a eac k- agoim o oca imesioaiy eucio meos [51] I Cae 5 a yi meo o ieiyig oima susace imesioaiy wi esec o miimum quey cos is escie i eai [5] iay e cocusio is gie i Cae

I aiio some useu eciques a esus ae escie i Aeies wee a k- agoim uiie i Cae 5 is escie i Aei A e CU cos o

wo yes o k-meas agoims is comae i Aei B, a some ackgou iomaio aou aca imesios as we as some eeimea esus eae o a quey cos moe ae gie i Aei C ae 11 eais some o e symos wic ae equey use i is esis 8 CHAPTER 2

HIGH-DIMENSIONAL INDEXING

Over the last three decades, relational database management systems have been well developed and are now prevalent. Most of the contemporary are too large to fit in main memory and thus have to be stored on secondary storage — such as disks.

Storing data on disks is important also because disk is non-volatile and provides highly reliable and durable storage. A major characteristic of the secondary storage is that it is organized into blocks (or ages Accessing data from disk involves mechanical movement of the read/write heads of the disk, which is slow and expensive, therefore every disk access results in a whole block of data being brought into main memory.

Thus, it makes a large performance difference if similar data are grouped into the same disk blocks.

Traditional access methods to handle query processing in databases are usually based on primary keys, which is one-dimensional, such as asig or -ees However, this technique is not well suited to multimedia information retrieval, which is based on content similarity. Many multi-dimensional indexes have been proposed to support similarity search for multimedia and scientific databases [33]. When the dimen- sionality is very high, e.g., higher than 60 or even in hundreds, most of the multi- dimensional indexing methods have a poor performance because of the imesioaiy cuse [15, 76, 18]. One of the solutions to solve the problem is to do dimensionality reduction, then build an index on the dimensionality reduced data. Another solution is to map the high-dimensional points into one dimension by some special techniques and then build an index on the one-dimensional space. In the following sections, firstly some preliminary background information is given, then some important multi- dimensional techniques are described respectively.

9 0

2. Slrt Qr

Simiaiy queies ca e cassiie as oi queies age queies k-eaes eigo

(k- queies a saia ois

nt r: i a aa oi i a aase Eame gie a quey oi ocae

i i e aaase

n r: i a aa ois wii a ceai age (aius E o e quey

oi Eame i a images sowig a umo o sie ess a 5cm

r: Amog a aa ois i e k ois a ae coses o e quey

oi Eame i images a ae coses (o mos simia o e quey

image

Sptl jn: i a uique ais o isic aa ois wose eaie isace

is ess a a gie aius E Eame i a ais o images a ae coses

(wii isace 1 o eac oe

age quey a k- quey ae gaie muc moe aeios a e oes

ecause a oi quey ca e escie as a age quey wi € = a saia oi is us ike "a-ai eaes eigo seac"

e esis coceaes o eaes eigo queies o e easo a ey

ay a cea oe i coe-ase eiea om muimeia aaases

ase o e oaiiy o e quey isiuio ee ae wo moes o simiaiy queies [1]

nd dl, wic assumes a e quey ois ae uiomy isiue i

e aa sace

d dl, wic assumes a queies ae moe oae i ig-esiy aeas

o aess sace ie e queies a e aa as e same isiuio is 11

is usuay ue i may aicaios eg i a asoaio aicaio wi

a ma o ciies oe wou eec ew queies o eses a oies o wae

a moe queies o igy ouae aeas

I is esis e same queies ae aways eace om e aase eeoe

ey ae biased queries.

I is easy o see a egaess o e ye o quey e isace meic is ey imoa iee isace meics ae iee ucios o cacuae

isace ewee wo ois a ey make e meaigs o "coses" o "age" quie iee

2.2 Distance Metrics

o a seciic aicaio ase o a ceai eaue eacio meo e is imoa se is o oie a measue o e isace ewee wo oecs I is

esis DO, q) is use o eoe e isace o e wo aa ois a ee ae may kis o isace ucios i e ieaue o simiaiy seac £p norms a Quadratic distances ae wo caegoies a ae mos useu

We P = i is e , wic is e mos oua isace

ucio I is e eau isace ucio (O i is esis i om is

aso cae Manhattan distance o city block distance, wic is oe use i GIS

aicaios I e eeme case we P = oo e aoe ucio ecomes

Maximum distance a ca e ewie as igue 1 A - eame o (a ee yes o isaces ( age quey o aius E ue e ee meics

Quaaic isaces ae weige isace measues wic ae sueio i C o

muimeia oecs ecause ey o oy ake accou o e coesoece

ewee eac imesio as oe isace meics u aso make use o io-

maio acoss imesios y cauig e coeaio ewee imesios

[7 1] e quaaic isace ewee wo eaue ecos /5 a q is gie

y

imesio i a j. Mahalanobis distance is a secia case o e quaaic

isace meic i wic e asom mai is gie y e coaiace mai

oaie om a aiig se o eaue ecos e normalized Mahalanobis

isace ewee ecos Cy a q is eie as [71]

wee C is e coaiace mai a ICI is e eemia o C. Acuay i is

a weige Euciea isace I gies moe weig o imesio wi smae

aiace a gies ess weig o imesio wi age aiace

I ees o e aicaio a eaue eacio meos o seec e

isace meic o e use o eame i ime seies aaysis aeas Euciea

isace is oe use Wie i C o image ocessig Maaaois isace is wiey use [71 ] I Cae 3 a ig-imesioa cuseig meo — e elliptical k-means agoim wic uiies Maaaois isace — is escie

2. Mlt dnnl Indx Strtr

ase o aa yes suoe mui-imesioa ieig meos ca e cassiie io wo oa caegoies Point Access Methods (AM a Spatial Access Methods

(SAM [33] AM wee imaiy esige o eom saia seaces o oi

aaases wic soe oy muiimesioa ois a o o ae saia eesio

O e oe a SAM maage oecs a aa om ei osiio i sace

ae saia caaceisics (sae Suc oecs ae ies oygos o ige-

imesioa oyea Sice ig-imesioa aa ae us ois a a SAMs ca ucio as AMs i is o ecessay o isiguis em i is suy ase o

e aiioig o e aa sace mui-imesioa ie meos ca e iie io space-partitioning meos ike gi ies k--ees [3] a qua-ees wic 4 divide the data space along pre-determined hyper-planes regardless of data clusters, and data-partitioning methods, like R-trees [36], X-tree [13], SR-tree [42], M-tree [24], and TV-tree [54], which partition the data space according to the data distribution.

Another classification of multi-dimensional access methods, which has gained more attention, is based on where the index resides: in main memory or on disk.

Memory-resident multi-dimensional indexes (such as the k-d-tree) appeared a decade earlier than disk-resident methods (such as the R-tree). As the databases get larger and larger, disk-resident methods become more popular. In recent years, the sizes of main memories have increased rapidly and the prices have dropped rapidly, therefore large indexes can be held and queried in main memories easily. Consequently, memory resident indexes are popular again. For example, the ordered partition index [45] utilized in [19] is a memory resident index and it demonstrates a significant improvement over sequential scan. The most popular indexing structures are discussed here and among them only k-d-trees are memory-resident methods and the others are disk-resident methods.

2.. K d tr The k-d-tree can be considered as an extension of the . It is an space- partitioning method which stores points in k-dimensional space. At each inner node, the k-d-tree divides the k-dimensional space in two parts by a (k-1)-dimensional hyperplane. The direction of the hyperplane alternates between the k possibilities from one tree level to the next. The coordinate axis can be selected using a round-

robin criterion. Each splitting hyperplane contains at least one point, which is used

as the hyperplanes representation in the tree. Points are stored at leaves. Searching

and insertion of new nodes are straightforward. Deletion may cause re-organization

of the tree under the deleted node, thus it is more complicated than insertion.

Since the k-d-trees are main memory data structures and they do not account 15

o age secoay memoy ey ae o cosiee o e suiae o age saia

aaases

e k---ee [5] is a oosa o make e k--ee esise A oes o

e ee coeso o isk ages A ea oe soes e aa ois a ae ocae i e esecie aiio ike e -ee e k---ee is eecy aace

owee i cao esue soage uiiaio igue sows a eame o k---

ee i -imesioa sace

igue Eame o a k---ee o - sace

3 e -ee a Is aiaios

e -ee is a ieacica aa sucue wi saia ee I is use o soe o

e oigia sace oecs u ae ei Minimum Bounding Rectangles (Ms wic ae eie as e miimum -imesioa ecage a coais e oigia

-imesioa oec e -ee is aace Eac o-ea oe coais eies wi a om o (ptr, Rect), wee ptr is e aess o a ci oe a Rect is e

M o e ci oe ea oes coai eies o om (obj_id, Rect), wee obj_id

ois o a aa oec a Rect is e M o a oec igue 3 is a -ee o eig i -imesioa sace e ecages wi oe ies ae Ms I is easy o see a is sucue aows oea amog oes 6

Searching in an R-tree is done from the root node in a o-ow manner. All rectangles that intersect with the query object are visited. The R-tree does not guarantee that traversing one path of the tree is enough when searching for an object, since the MBRs may overlap one another. In the worst case, the search algorithm may have to visit all index pages for a query.

Insertion operation includes inserting the MBR of an object to the leaf node of the SR-tree along with a reference to that object. If the MBR of the object intersects many entries of an intermediate node, the child whose MBR is enlarged least after the insertion will be selected. If the insertion causes the leaf page to overflow, the page splits in two. The split can be propagated to the ancestor nodes. If an insertion causes enlargement of the leaf page's MBR, it is adjusted properly and the change is propagated upwards.

Deletion in an R-tree requires an exact match query for the object. If the object is found in a leaf, it is deleted. The deletion may cause the leaf page to underflow.

In this case the whole node is deleted, and all its entries are stored in a temporary buffer, and are reinserted in the tree.

There are many variations of R-tree, such as R+-tree [69], which addresses the problem of minimizing overlap and R*tree [10], which introduces eee siig and e-iseig to improve the space utilization and minimize overlaps. Variations of R*tree include SS-tree [77] and SR-tree. The SS-tree partitions search space io ye-sees ae a ye-ecages e aaage o e SS-ee is

a i equies muc ess soage comae o e ee ecause a ye-see is eemie y e cee a aius wie a ye-ecage is eemie y ue a owe ou o eey imesio owee e ye-see occuies muc

age oume a e ye-ecage wi ig-imesioa aa a is euces

e seac eiciecy e S-ee uiies o ye-ecages a ye-sees a uses e iesecio o a ouig ecage wi a ouig see as e

aiioig eeme Comae o e ee a SS-ee e S-ee imoes

eomace y saig moe CU ime a isk accesses ecause i akes aaage o e goo eaues o o meos aoe eeoe i is suy e S-ee is use o mos cases we mui-imesioa ieig is ioe e -ee is aso a eesio o -ee wic iouces a moe soisicae si agoim a

sueoes i oe o euce e oea owee i is o easy o kee e ee

aace aso e age sueoes comicae e cocuecy coo mecaism

e -ee eoms we o ow-imesioa aases sice i is esige o saia aases wi - 3 imesios Is aiaios imoe eomace y

eucig oea a imoe soage uiiaio u si ca o make i eicie o

ig-imesioa aases I is suy ig-imesioa aases geeay ee o

aases a ae imesioaiy moe a 3 We e imesioaiy ecomes

ig e eomace o -ees egaes ecause o e imesioaiy cuse a

amos a aa ocks ee o e accesse A ume o ece esus ae sow

e egaie eecs o iceasig imesioaiy o ie sucues [15 7]

2.4 nnlt Cr

imesioaiy cuse iusaes e oems cause y ig-imesioaiy uma

iuiio ase o eeiece wi 3-imesioa wo ey ie eas em o eiee

a umeous geomeic oeies o i ig-imesioa sace wie i eaiy 8 they are not true in many cases.

Dimensionality curse affects multi-dimensional indexing and similarity search in two ways. One problem is related to ouay eecs For example, in 2 dimensional space, a circle with radius r is well approximated by the minimum bounding square (see Figure 2.4). The ratio of the areas of the circle to that of the square is 5 =

7-/ t 0.785. However, in 100-dimensional space, the ratio of volume of the hyper- sphere to that of the minimum bounding hyper-cube becomes

In this case the minimum bounding hyper-cube is an very poor approximation of the hyper-sphere since most of the volume of the hyper-cube is outside the hyper-sphere.

r 2.4 A circle and its minimum bounding square in 2-d space.

It is known that most of the multi-dimensional indexing structures partition data space into hyper-cubes or hyper-rectangles, therefore another example related to hyper-cubes is given here. Consider an 100-dimensional unit hyper-cube to be the search space and 100,000 points. Do a two-way-partition on each dimension and finally there are 2' units. Even the data points are evenly distributed, most of the units (around 10 5 units) are empty. It means for high dimensional data, the storage utilization is quite poor if the index is based on partitioning of the search space.

High dimensionality results in high overlap and low fan-out and therefore increase the number of page accesses for a query. When dimensionality is higher than a certain value, sequential scan can outperform a multi-dimensional index.

The other problem is intrinsic in the geometry of high-dimensional space. One o e caaceisic o ig-imesioa sace is a ois aomy same om

e same isiuio aea uiomy om eac oe a eac oi sees ise as a ouie [15 1] is makes e meaig o eaes eigos quesioae ecause

o eac aa oi i a ig-imesioa aase mos o e oe ois ae simia isaces o i a e ieece ewee e eaes eigo a aes

eigo is sma I igue 5 e isaces om eac oi o e ceoi ae

ecoe o a ou aases I is easy o see a e aa isiuio o mos o

em oow oma isiuio e isaces o mos o e ois o e ceoi ae simia o e aeage isace eg o 55 (igue 5 ( aou %

ois ae wii a isace o 5 o e ceoi wie e aeage isace is I

is case e ieece ewee e eaes eigo a eaes eigo ca e ey sma I igue igue 5 ( e eaes oi om e ceoi as

isace 333 a e maimum isace is us 59 a ee is amos o ois wii a age o 3 o e cee I is case a age quey ecomes ey sesiie

o e coice o quey aius wi a aius o 3 o esu is eue; wie wi a

aius o 1/3 o ois wi e eue

ee ae may eciques aime a soig e oem o imesioaiy cuse

imesioaiy eucio is oe o e mos oua eciques a aso some ew ie sucues o e eise esio o e eisig iees ae esige eseciay

o ig-imesioa aicaios I aiio aoe caegoy o ieig meo cae Meic-Sace Meos o i Meic ase Ieig ies o ie e isace

ewee a aa oi a a quey oi ae a e aa oi ise

2. nnlt dtn

o euce e imesioaiy a e ui a mui-imesioa ie o ime- sioaiy euce aa is a commo ecique o oecome e imesioaiy cuse

Usuay o iee aicaios iee imesioaiy eucio eciques ae 20

used. Singular Value Decomposition SVD or Karhunen-Loeve Transform (KLT), or Principal Component Analysis (PCA) [41]) is a most commonly used linear transfor- mation method. SVD transforms a dataset by rotating and eliminating the corre- lations among dimensions. It can be used in many areas, especially image databases and it is "optimal" because it minimizes the Normalized Mean Square Error (NMSE) [27] (see Section 3 for details). There are some other techniques which are used for databases, such as Discrete Fourier Transform (DFT), Discrete Cosine

Transform (DCT) [34], Discrete Transform (DWT) [63] and Segment Mean

Method (SMM) [79]. DFT, DCT and DWT are also known as popular feature 1

extraction methods for image databases and signal processing [60].

The SVD, KLT and PCA reduce dimensions based on the global information

of the whole dataset and they involve matrix calculation, therefore for applications

which have large volume data and extremely high dimensionality, such as large time series databases, the DFT, DCT, DWT and SMM are applicable.

In order to be used for indexing, a dimensionality reduction technique must

satisfy the owe ouig oey [27].

51 owe ouig oey

Dimensionality reduction results in distance information loss. There are two types of errors caused by dimensionality reduction and distance selection: ase ismissas (or false negatives) and false alarms (or false positives). False dismissals refers to those qualifying objects that are not included in the of retrieved objects, whereas false alarms refers to those non-qualifying objects that are included in the set of retrieved objects. For similarity queries on dimensionality reduced data space, in order to get correct answers, false dismissal should be avoided, otherwise it is not known for sure how many points in the response set are correct and that makes your answer meaningless. False alarms are acceptable since at least it guarantees all correct points are in the answer set, and furthermore, false alarms can be removed by post-processing as described in detail in Chapters 4 and 5.

emma 51 owe ouig oey ( [7] o guaaee o ase

ismissas o seacig i imesioaiy euce sace (susace e isace ewee

aa ois i susace ( ( mus owe ou e isace ewee ois i oigia sace A (

If the distance function in subspace is just the Euclidean distance, the LBP holds because subspace dimensionality n is always less than or equal to the original dimensionality When a certain number of dimensions are removed, the difference 22

between D ( 0 and M( might be very large for some points. In order to have a

better approximation of original distance, some revised subspace distance functions have been used like aoimae isace in [19] and ewmage( in [22]. Only those that have the property of LBP can be used for exact k- search algorithms.

2..2 rlzd Mn Sr Errr

Before introducing any specific dimensionality reduction method, it is necessary to introduce a metric to measure the information loss due to dimensionality reduction.

The Normalized Mean Squared Error (NMSE) is defined as the ratio of total distance information loss after dimensionality reduction to the total distance information before dimensionality reduction.

Given a matrix (corresponding to dataset with size M and dimensionality

if the transformed data matrix is in the original space, the definition of NMSE is as in Equation 2.2

where i is the mean of j-th column.

2.. Snlr l ptn (S

Singular Value Decomposition (SVD) is a popular technique that has been used in numerous applications such as statistical analysis (as icia Comoe Aaysis text retrieval (as ae Semaic Ieig and (as Kaue-

oee asom

matrix. Without loss of generality, it is assumed that the mean (average of each column) is zero. The SVD of is the factorization icia Comoe Aaysis (CA is ase o e ecomosiio o e coaiace mai C o

coesos o e mai o eigeecos (as eoe a A is a iagoa mai wic os e eigeaues o C C is osiie-semieiie ece is eigeecos ae oooma a is eigeaues ae oegaie e ace (sum o eigeaues o C is iaia ue oaio Eigeaues simiay o e sigua aues ae i

eceasig oe e sigua aues ae eae o e eigeaues y

yies ucoeae eaues eaiig e is imesios o Y miimies e

MSE

Sice S is a iea asomaio i oes o cage e Euciea isace

ewee wo aiay ois e isace iomaio oss o asome aase is equa o a o e oigia aase eeoe ae imesioaiy eucio e

MSE ca aso e eie as i Equaio wee yi is e i- asome aa

oi a is e ume o imesios eaie a eo-mea ecoiio si

os

ee is a sime way o cacuae MSE Accoig o Equaio 7 e

MSE is equa o aio o e sum o iscae eigeaues o e sum o a eigeaues wee Ai is e - ages eigeaue

oo o Equaio 7 Equaio ca e oaie om Equaio 5

ase o o Equaio a Equaio

y e-muiyig a os-muiyig wi T a

eeoe Equaio 7 os

I oows a e CA a S aciee e same goa e CA equies

M2 muiicaios o comue C a oaiig eigeaues is a ( 3 comu-

aio wie S is (M2

5 iscee ouie asom (

iscee ouie asom is oe o e eaies eciques o eucig e imesios o ime seies [3] e asic iea is a ay siga ca e eesee y e sue-

osiio o a iie ume o siusoia waes wee eac wae is eesee y a sige come ume kow as a ouie coeicie A ime seies eesee i is way is sai o e i e equecy omai ee ae may aaages o 25 representing a time series in the frequency domain, the most important of which is dimensionality reduction. A signal can be decomposed into sine waves that can be recombined into the original signal. However, the last few coefficients that contribute little to the reconstructed signal can be discarded without much loss of information thereby producing dimensionality reduction.

One of the useful properties of the DFT is that it preserves the energy (square of the length) of the signal, and the other property is that the DFT also preserves the Euclidean distance and therefore the lower bounding property is satisfied. These good properties make the DFT a popular method for indexing and dimensionality reduction.

55 iscee Waee asom (W

Wavelets are mathematical functions that represent data in terms of the sum and difference of a prototype function, called the basis function. Several DWT methods have been proposed [50]. They are different from the DFT in several important respects. One important difference is that are localized in time, i.e., each wavelet coefficient of a signal contributes to the reconstruction of a small portion of the object. This is in contrast to the DFT where each Fourier coefficient contributes to the reconstruction of each and every data point of the time series. This property of the DWT is useful for multi-resolution analysis of the data. The first few coefficients contain an overall, coarse approximation of the data, while additional coefficients can be imagined as zooming-in to areas of high detail.

The simplest DWT to describe and code is the Haar wavelets. It gives the sum and the difference of the left and right part of a signal, then it focuses recursively on each of the halves, and computes the difference of their two sub-halves, until it reaches an interval with one only sample in it. 26

2..6 Snttn Mn Mthd (SMM

Segmentation mean method, which is also called Piecewise Aggregate Approximation

[44], refers to a group of feature extraction methods that divide each time sequence into equal [79] or nonequal [20] sized segments and record the mean value of each segment as a feature. See Figure 2.6 for an example of equal sized SMM. They can be used for arbitrary p norms and it is very simple, therefore multiple similarity models can be supported. Also it guarantees "no false dismissal" since the distance in SMM space lower bounds the distance in the original space. However, this type of dimensionality reduction method is usually only used on time series databases because for time series "time" is the variable and the series reflects the change based on time, therefore it is still meaningful if the time interval becomes larger. For other datasets, like image databases, it may be meaningless to take the mean of shape information and color information.

r 2.6 Segmentation mean method [79].

2.6 Mtr d Indx Strtr

Content-based retrieval of images, audio, video or other multimedia objects is usually based on the similarity between two objects. The basic idea is to extract features like shape, texture and color from multimedia objects and map them into high- 2 dimensional feature vectors. Then searching similar objects becomes searching feature vectors with smallest distances. The distance function can be very complicated for some application, while the most common one is the Euclidean distance (B ). The basic idea of metric based indexes is to take advantage of the properties of similarity to build a tree, which can be used to prune branches in processing the queries.

The M-tree is a height-balanced tree that uses both features and distance infor- mation for indexing [24]. It partitions multimedia objects on the basis of their relative distances and stores features of objects in leaf nodes, whereas it stores the so-called

ouig oes as well as child node pointers in internal nodes. All objects in the subtree are within a certain distance (stored as an entry of the routing node) of the routing nodes. The routing nodes are mainly designed for pruning unnecessary traversal and checking. The insertion and splitting process is conceptually similar to the corresponding processes of the R-tree.

A more recently method called iisace relies on partitioning the data and defining a reference point for each point [80]. Then each point is indexed using a single-dimensional value which is the distance to the reference point. iDistance is designed especially for k-NN search and its effectiveness depends on how the data are partitioned and how the reference points are selected.

Metric based indexing methods support fast similarity search and can solve some problems due to high-dimensionality, but they do not support range queries since data points are not indexed on individual attribute values.

2. Sr In this chapter, some background information and techniques related to similarity search in high-dimensional space are given.

Generally, when dimensionality is not so high, multi-dimensional index structures can be used directly. If the dimensionality is very high, index structures might 28

e ee ess eicie a a sequeia sca ecause o e imesioaiy cuse

escie i e cae imesioaiy eucio oowe y ceaig iees o

imesioaiy-euce aases ca soe e oem o aases wi eeo- geeousy isiue aa cuseig sou e eome eoe imesioaiy

eucio i oe o miimie iomaio oss CS is seece as e asic meo

o imesioaiy eucio i is suy CAE 3

EOMACE O CS

31 Ioucio

Content Based Retrieval (CBR) or similarity search has been an area of intense interest in multimedia applications and it was given additionally impetus by the QBIC

(Query By Image Content) project a decade ago [58]. Domain experts characterize objects by their features and define similarity measures, which can be incorporated into a similarity index [77]. CBR seeks to find objects in the database that are similar to some target object. Feature vectors and similarity measures for color, texture, and shape are discussed in Chapters 11, 12, 13 in [18], respectively. The objects are segments of an image for which feature vectors have been computed. Efficient k-NN search over feature vectors is concerned.

An object is specified by its feature vector 15, which is a point or vector in

N-dimensional space. N may be in the hundreds, while the number of objects M tends to be very large, e.g., in the millions. This information is summarized in a dataset X, which is in fact an M matrix.

The default distance metric used in this chapter is i.e., the Euclidean distance. For two points and Q,

The processing of k-NN queries can be quite costly for large values of N and especially M, since it requires the scanning of the dataset to compute the distances of all points with respect to the query point Q. Clustering is one method to deal with this problem, see e.g., [30], since hopefully only the points in the appropriate cluster need to be considered in the processing of the k-NN query, although it is known that

9 0 clustering does not work as well as one would expect in high dimensional spaces.

Clustering can be attained indirectly by building a multi-dimensional index [33] to reduce the number of points to be checked. As reported in [77], as the dimen- sionality increases by a factor of 5-10, the performance of k- queries degrades by a factor of twenty for R-trees, as well as SS-trees described in [77].

Dimensionality reduction by rotating o its principal components (Y = as given in Chapter 2) and retaining p dimensions such that can be attained by applying a linear transformation commonly known as Singular Value Decomposition

(SVD), or Karhunen-Loeve Transform (KLT) and performing Principal Component

Analysis (PCA). PCA is the "best dimensionality reduction" method in that it minimizes the Normalized Mean Square Error (NMSE) (see Chapter 2 for definition)

[27].

One of the earliest papers that utilize SVD in database area is [46]. It uses SVD to compress a large dataset into a format that supports ad hoc querying, provided that a small error can be tolerated when the data is uncompressed.

Ad hoc queries require access to data records, either individually or in the aggregate. Some of the typical queries are on specific cells of the data matrix like

"wa was e amou o saes o AC Ic o eceme 3 others are aggregate queries on selected rows and columns like "i e oa saes o AC Ic o 3"

To support such queries, one has to maintain "random access" , i.e., fast reconstruction of any desired cell of the matrix. Thus the algorithm consists of 2-pass computation of

SVD to get the eigenvectors (matrix V), eigenvalues (matrix A) (Equation 2.4), matrix

U (Equation 2.3), and reconstruction matrix where xi is the reconstructed value of any desired cell x i , and

and is the number of dimensions retained.

Some data points may be approximated poorly. For those cells for which SVD reconstruction shows the highest error, a set of triplets of the form (ow coum

ea is maintained, where ea is the difference between the actual value and recon- structed value of that cell. With this structure, one can adjust the query results to make them more accurate.

The intuition behind SVD can be illustrated in 2-dimensional space. Figure 3.1 gives an 2-d example to illustrate the rotation of axis that SVD implies. SVD transforms a sample of points to a best coordinate space along which the properties of the sample are most clearly exhibited. It can be seen that if only one dimension is retained, the "best" axis to project is

r . Example of SVD on a 2-d sample of points.

The SVD has been shown to be an efficient method for compressing large tables of numeric data. It relies on goa information derived from the dataset, its application is therefore more effective when the dataset consists of omogeeousy

isiue ecos In other words, SVD works well for a dataset whose distribution is well captured by the centroid and the covariance matrix [19]. For datasets with

eeogeeousy distributed vectors, however, performing dimensionality reduction using SVD on the entire dataset may cause a significant loss of information, as illus- trated by Figure 3.2(a). In this case, data are not globally correlated, but rather subsets of data are locally-correlated. 3

igue 3 (a) SVD (b) CSVD on locally correlated data samples.

High-dimensional datasets in the real world are often not globally correlated and therefore a method that can capture the local correlations in a dataset is needed.

The solution is to partition the large dataset into clusters, and do dimensionality reduction separately on each cluster (Figure 3.2(b)).

The Clustered Singular Value Decomposition (CSVD) method captures this local correlation by applying clustering first followed by SVD, although SVD can be optionally applied before clustering, which is an instance of Recursive SVD -

RCSVD [72], [19]. This study is motivated by the Spire project at IBM Research, which applied CBR to texture feature vectors extracted from satellite images.

The Local Dimensionality Reduction ( method [22] combines the dimen- sionality reduction step with clustering to seek clusters yielding a higher dimen- sionality reduction (or so called "SVD-friendly" clusters [72]). Another method called

Multi-level Mahalanobis-based Dimensionality Reduction (MMDR) [40] to cluster a high dimensional dataset using the low-dimensional subspace based on Mahalanobis distance instead of Euclidean distance. The CSVD method is compared with LDR in

[19], but additional results with new datasets are reported in this chapter.

Three variations for implementing CSVD are considered and compared against each other, and also the SVD and LDR methods from the viewpoint of the compression ratio, CPU time, and retrieval efficiency. The comparisons are carried out using three real-world and one synthetic dataset [22]. Also an elliptical k-means algorithm based on Mahalanobis distance is implemented and compared with the regular k-means method from the viewpoint of compression ratio.

Two variations in merging query results obtained from different clusters are also evaluated, referred to as o-e-y and eee merging. In o-e-y merging, the points retrieved from various clusters are merged based on their aoimae distance, In eee megig each cluster yields k nearest neighbors, with oece or aoimae distances, which can be merged using oigia distances.

The roadmap to this chapter is as follows. Section 3.2 provides a description on the related work, especially LDR method and MMDR method. Section 3.3 describes the steps required for implementing CSVD and Section 3.3.2 specifies the approximate algorithm for k-NN search. Section 3.4 summarizes experimental results for CSVD and LDR with and without indexing structures. Conclusions and areas for future research are given in Section 3.5.

.2 ltd Wr

There are several methods in the literature that capitalize on the observation of doing dimensionality reduction based on cuseig Consequently, before those techniques are described, a brief introduction of clustering is needed.

.2. Cltrn

Cuseig is the process of grouping a set of the objects into clusters where similar objects are aggregated into the same cluster and dissimilar objects into different clusters. Clustering is an unsupervised learning process, since it does not require any predefined training classes. As a branch of statistics, clustering plays an outstanding 3 role in information retrieval, machine learning and data mining applications such as scientific data exploration, text mining, applications, marketing, medical diagnostics, computational biology, and many other fields [57, 14].

Clustering algorithms can be classified into four categories: aiioig methods

(such as k-means [30] and EM algorithm [25]), ieacica methods (such as CURE

[35] and BIRCH [82]), esiy-ase methods (such as DBSCAN [26]) and Gi-ase methods (such as STING [75]).

k-means is a commonly used partitioning clustering method. A regular k-means algorithm has the following steps:

1. Randomly select k points from the dataset to serve as centroids.

2. Assign each point to the closest centroid to form k clusters.

3. Recompute the centroid for each cluster.

4. Go back to Step 2 until there is no new assignment.

The k-means method is utilized for CSVD because it is the simplest clustering method that works well for CSVD. Since the quality of the clusters varies significantly from run to run, the clustering procedure is repeated for 10 times and the result yielding the smallest sum o squaes (SSQ) distance (C denotes the set of points in the huh cluster) is selected.

The choice of clustering algorithm is often determined by applications. The regular k-means method tends to discover clusters with spherical shapes. In appli- cations like image processing and pattern recognition, it is often desirable to find natural clusters. Data points that are locally correlated should be grouped into one cluster. Therefore another clustering method that discover elliptical shaped 35 clusters is also utilized, which is called eiica k-meas algorithm [71]. It is based on an adaptively changing omaie Maaaois distance metric as shown in

Equation 2.1.

The outline of the algorithm is as follows [71]:

1. Utilize Euclidean-based k-means algorithm to obtain k clusters.

2. Compute the covariance matrix for each cluster.

3. Reassign points to the closest clusters based on the normalized Maaaois

distances obtained with current k centroids and their covariance matrices.

4. If there is new assignment, recompute the k centroids and go back to step 3.

Otherwise, go to step 5.

5. If the number of outer-loops (step 2 to 5) has not exceeded MaOuoo go to

step 2, otherwise return the current k clusters and their centroids.

The elliptical k-means algorithm differs from the regular k-means algorithm in that it iteratively refines and recovers the cluster shapes by recomputing the covariance matrix. Therefore, it ends up with the elliptical shaped clusters. However, there is a tradeoff for these advantage. The covariance matrix needs to be recal- culated many times and the CPU cost grows quadratically rather than linearly with the number of dimensions.

Some new clustering methods are designed for high-dimensional datasets, such as the BIRCH [82], the CLARANS [2], the CURE and the CLIQUE [4] method. The

BIRCH method uses a hierarchical data structure called C-ee to incrementally build clusters. The CURE method uses more than one representative point for each cluster and it adjusts well to different shapes of clusters. The CLARANS method uses a restricted search space to improve the efficiency, while the CLIQUE method tends to discover clusters in all subspaces of the original data space. These clustering 6 methods are much more complicated than the k-means methods. However, like a

regular k-means method, they do not help to identify locally correlated clusters along

arbitrary directions.

.2.2 LDR

A technique called oca imesioaiy eucio( is proposed in [22]. It

tries to find local correlations in a dataset and performs dimensionality reduction on the locally correlated clusters individually. The partitioning of the data and the dimensionality reduction are carried out simultaneously.

Maecois is a parameter specified by users to restrict the amount of infor-

mation loss within cluster. A point in a cluster must satisfy ecois( S

Maecois where ecois( S of a point from a cluster S measures the

distance between the originally dimensional representation of and its approximate

representation in the reduced-dimensional subspace S The higher the error, the more

the distance information lost. The LDR method also restricts the maximum number

of dimensions (M a im and minimum number of points in a cluster for indexing

issue.

The clustering algorithm starts with spherical clusters. Then PCA is performed

on each cluster individually. Finally, points are "reclustered" based on the correlation

information to obtain the correlated clusters. A point is reassigned to a cluster that

requires the minimum subspace dimensionality to satisfy the reconstruction distance

bound ecois( S Maecois i.e., for each point, for each cluster, the

minimum number of dimensions required to approximate the point (with error at

most equal to Maecois is determined. And the cluster requiring the minimum

number of dimensions mi is determined. If Ami M aim the point is added

to that cluster, and the required number of dimensions is recorded. If there is no such

cluster, the point is added to the outlier set. 37

The quantities M eso e Maecois Maim acuies and MiSie must be provided by the user.

An index structure to support point, range and k- queries is created on entire dataset. Clusters are indexed separately by an existing multi-dimensional indexing structure like the hybrid tree [21] and a single root node then connects them together.

The index of each cluster is built on the i + 1-dimensional space, while d i denotes the dimensionality of subspace in a cluster and the d i+1-th dimension is the value of

ecois( S

Algorithms for range search and k-NN search are also provided for similarity search on a LDR index. For k- search, it maintains a priority queue and performs breadth-first search throughout the trees until k-nearest neighbors are found. The distance between a query point and an inner node is computed in the reduced- dimensional space and the distance between the query point and a data point is computed in the original-dimensional space. Since the reduced-dimensional distances are lower bound the original-dimensional distance, the results should be the exactly k-nearest neighbors. Since this algorithm is utilized in this this study, it is described in Appendix A in more detail.

33 MM

Mui-ee Maaaois-ase imesioaiy eucio (MMDR) was proposed in [40].

MMDR can be considered as an extension of LDR. The features that are different from LDR are: first, it argues that the locally correlated clusters are elliptical-shaped instead of spherical shaped, which can be explored by the Mahalanobis distance instead of the Euclidean distance. Second, it states that certain level of lower dimen- sional subspaces may contain sufficient information for correlated cluster discovery in the high-dimensional space.

An algorithm that discovers elliptical clusters using the low-dimensional subspace 8 is provided. ME is the average eeseaio eo (like the reconstruction error

ecois in LDR) of all points when they are mapped from the original space to the eliminated subspace. MaME is specified by users as the maxinum MPE allowed.

The algorithm consists of two major steps. The first step Geeae Eisoi is as following:

1. Project data into its first s principal components (s should be very small).

2. Cluster on s-dimensional space using eiica k-meas algorithm.

3. For each cluster, project data into its local first s principal components, increase

s to 2s and recursively call this procedure until ME MaME or original

dimensionality is reached, then the cluster qualifies for the next procedure.

The above procedure produces possible ellipsoids in their s-dimensional space, and the dimensionality of each cluster can be further reduced by calling the second step: imesioaiy Oimiaio which keeps decreasing the dimensionality by 1 until the change of MPE is less than a pre-set threshold. Another threshold value 0 is employed to determine whether a point belongs to a cluster. If the projection error is greater than 0, the point is considered an outlier.

The MMDR use an index structure that represent high-dimensional data in a single dimensional space and index them with a Bk-tree. A detailed description of this method called iisace can be found in [80].

The process of k- querying is as follows: Given a query point Q finding k nearest neighbors begins with a query sphere defined by a eaiey small radius r centered at Q Q is projected into each cluster and then, step by step, the radius r is enlarged and at last when the distance of the k-th nearest neighbor is less than r the search stops. It describes some cases needed to consider for each step when r is enlarged, but does not give a clear description on how the k- search is going on and how to decide r according to k 39

e meo uses iiiua eo ecois o eac oi as e cieio

o imesioaiy eucio wie e MM uses e aeage eo o a ois

ME e wo cieia ae acuay simia o e MSE escie i e eious secio u ey ae o as goo as e MSE o aessig e iomaio oss

ue o imesioaiy eucio ecause ey ae asoue aues wic ae aa-

eee wie e MSE is a aio wic is aa-ieee

33 e CS Meo

e CS is a oca imesioaiy eucio meo a aies cuseig is

oowe y eomig S o eac cuse a e o imesioaiy eucio i

a goa mae [19] I is secio ee aiaios o imemeig imesioaiy

eucio se ae cosiee a e aoimae k- agoim o CS is

escie i eai

331 CS Imemeaio Ses

e CS ocees accoig o e oowig ses

1 Sueiaio

Usuay a aase sou e eocesse eoe cuseig y aoiaey scaig

e umeica aues o e iee eaue o equaie ei eaie imoace o

k- queies wi e Euciea isace [19] I mig e uecessay o a syeic

aase wee e seece eaues ae e same scae

a saa eiaio o e coum

Seecig imesioaiy eucio ages

imesioaiy eucio ca e caie ou ase o a age comessio aio

eie as e aio o e "oume" o e oigia aase

imesioaiy euce aase us e meaaa e is wo ems ae ue o e meaaa e sace equie o e ceois a eigeecos o a cuses e as summaio is e oume o a cuses wee mG (es 1A is e ume o ois (es ume o imesios eaie i cuse o CG

e use ca seciy e comessio aio a comue e MSE u ey aggessie aa comessio mig esu i a ey ig MSE a uacceae

eca a ecisio Sice o a gie comessio aio e MSE aies wiey oe aases e eee meo is o seciy a age MSE - MSE wic is seece o e sma eoug o esue a saisacoy ecisio a eca

3 Cuseig e aase

aiio io cuses A = 1 wi mG ois o cuse CG y usig

e k-meas meo escie o Secio 3

1AG) o eac cuse CG soe e ceoi a e aius AG wic is eie as e isace ewee e ceoi a e aes oi eogig o e cuse

om e ceoi

e coice o a aoiae ume o cuses o cuseig emais a oe oem o a gie comessio aio iceasig esus i a eucio i

MSE u a oi o imiisig eus is eace ae a ceai [71

Ay e S o CA o Eac Cuse

Comue e eigeecos AG a e eigeaues

ecos o (G ae oae oo 17(G accoig o 41 in a cluster (less than a small multiple of the original dimensionality is kept. The code for SVD is from the umeica ecies package [63].

Three methods are considered to perform dimensionality reduction on the clusters of a dataset:

Local Method (LM): According to Equation 3.4, LM removes dimensions, starting

with the smallest eigenvalue, until NMSE AG ti TNMSE and NMSE AG < TNMSE.

Global Method 2 (GM2): Similar to GM1 but replace the third element of the

triple with Ae(3 • mA 3 The intuition behind GM2 is that clusters with more

points should retain more dimensions, since they will be queried more frequently.

The LM retains more dimensions than necessary in comparison with global

methods, as shown in the experiments. This is due to a bin-packing effect, which 42 allows a larger number of dimensions with smaller eigenvalues to be discarded in the latter cases. The global methods solve the problem by sorting all eigenvalues, although they come from different clusters, and removing dimensions according to one MSE By using global methods, local NMSEs may not be the same, some may be larger than MSE and others may be smaller, while the global NMSE is guaranteed to be equal to MSE

The global methods should be better than the local method. Although CSVD captures the local structures of a dataset by clustering, there do exist some relationship among those clusters. Doing dimensionality reduction separately and independently cluster by cluster ignores the global relationship among the clusters. Furthermore, global methods result in the least error accumulation since they round to the upper bound of the retained dimensions only once.

The experiments show that Goa Meo 2 is not as good as Goa Meo

1 A possible explanation could be: In equation 3.4, when the global NMSE is calculated, m can be considered as a weight, if sort the eigenvalue times m instead of the eigenvalue itself, m is considered twice.

Global methods may result in some clusters having no dimension left. In this case no index is built for the specific clusters and instead, the original data are scanned to find the k nearest neighbors. Conversely, if all the dimensions are retained, building an index might in fact degrade performance. Rather than imposing a restriction such as Maim in [22], a high dimensionality index is built and its performance is measured, although from a practical viewpoint a sequential scan would have been more efficient.

6. Constructing the Within-Cluster Indices

Due to the clustering and dimensionality reduction, the size and dimensionality of each cluster are much smaller than the original dataset. Therefore they are much more amenable to efficient indexing than the entire dataset. For the experiments 3 sequeia sca is use o sow e eomace imoeme oaie uey y

CS I a eaie suy e auos cosiee e oee aiio ie [5] wic is mai memoy esie

33 Aoimae k- Seac Agoim

CS suos aoimae k- queies wic ca yie ase aams a ase

ismissas [7] e isace meic use i is secio is Euciea isace I oe o aciee a ceai accuacy k k ois ae eiee

e oowig ses ae oowe

1 eocess e Quey eco Sueie e quey oi Q as e oe

ois i e aase

Ieiy e imay Cuse e imay cuse is e cuse o wic

Q eogs I e case o k-meas cuseig meo i is simy e cuse

wi e coses ceoi o Q Oewise e cuse ecoig meo is use

o eemie e imay cuse

3 Comue isace om Cuses e isace o a quey oi Q om a

cuse C is eie as is isace om e ceoi o e cuse (p A ) mius

is aius is isace is eo i e oi is wii e aius o C

Cuses ae soe i iceasig oe o isace wi e imay cuse i

is osiio

Seac e imay Cuse is se ouces a is o k ois soe

i iceasig oe o isace e Ama e e isace o e aes oi

om e quey oi Q i e is 44

Figure 3.3 Searching nearest neighbors across multiple clusters.

5. Search the Other Clusters: Search the other clusters in the order obtained

by Section 3. A cluster C is to be searched if D(Q, C Tma or if the number of results is less than k otherwise the search terminates (Figure 3.3). Since

Tisma updated after a cluster is searched, the potential number of clusters to

be visited is trimmed as going along.

6. Merge the Search Results: Distinguish between two approaches for merging

k-NN results obtained from the different clusters:

On-the-fly merging: While searching clusters, if points closer than the farthest

neighbor from the query point are found, they are inserted into the list of

current results dynamically, and r,„, is updated.

In this approach, it is needed to compare the distances among multiple

clusters. Since each cluster has different number of retained dimensions,

care must be taken when performing the within-cluster search. Since the

data points stored in a cluster are in reduced-dimensional space, one has

to use the reduced distances between the projections of Q and points in a

cluster instead of original-dimensional distances. Simple geometry shows 4

r .4 Aoimae isace ucio

a wo ois a ae aiaiy a aa i e oigia sace ca ae

aiaiy cose oecios e seac agoim mus eeoe accou

o is aoimaio y eyig o geomeic oeies o e sace

a o e ie cosucio meo As sow i igue 3 CS

aoimaes e squae isace (Q ewee Q a aa ois

wi (Q wee is e oecio o oo e cuse susace

(Q aoimae isace is cae a e isace ewee a

Q is eoe as oece isace ece

frrd rn: A seaaey soe is o k s is ke o eac cuse

a mege em io oe is wi k eeme y -way mege so ee

ee ae wo coices o e isace ucios

• Use e aoimae isace ((Q escie i igue 35

• Use e oece isace (( Q o wii-cuse seac

I case e isaces ae ase o oece isaces ((Q i oe

o ecocie e isace o Q wi esec o iee cuses e oigia

aase is accesse o oai e oigia cooiaes o a comue

(Q before merging. Approximate distances ((Q can be alter- natively used for merging.

0n-the-fly merging is fast and straightforward and is best suited for sequential

scans of main memory resident datasets. Deferred merging is appropriate

when processing k- queries on indexing structures, which yield the k-

nearest neighbors in a batch.

7. os-ocessig The distance between the query point Q and the k resulting

points are computed in the original space and the closest k results are returned.

This may already have been accomplished in Se

3 eomace Suy

31 Eeimea Seu

As summarized in Table 3.1, three real-world datasets (TXT55, AERIAL56, and

C0LH64) and one synthetic dataset (SYNT64) are used in the experiments. SYNT64 and C0LH64 are also the datasets used to investigate the performance of the method. TXT55 and AERIAL56 are studentized but C0LH64 and SYNT64 are not, since the values in different dimension have the same scale.

For the elliptical k-means algorithm, the clusters created by regular k-means is used as the input. Instead of S SQ in Equation 3.2, MaOuoo is specified by users to control the number of the iterations in which covariance matrices are recalculated.

MaOuoo is set to 10 in this experiment.

The CPU costs and precisions plotted in each figure are summation over one thousand iase k- queries with k = 20. iase means that the query points were randomly selected from the original dataset.

The experiments were carried out on a Dell Workstation (Intel Pentium 4 CPU,

2.0 GHz, and 512 MB RAM) with Windows 2000 Professional. 4

.4.2 rfrn Mtr

The performance comparisons are carried out from the following viewpoints:

Cprn t is the space required by the original dataset X divided by the

space required by the dimensionality reduced dataset (without or with the space

required for metadata as in Equation 3.3 in Section 3).

ll nd rn are two metrics used to estimate retrieval efficiency, since

SVD is a lossy compression method. In k-NN search, to account for the approx-

imation, k* > k is retrieved, k' are among the k desired nearest neighbors. It

is easy to see that k' < k < k*. Recall, a measure of retrieval accuracy, is

defined as 'R, = k' I k. Precision, a measure of retrieval efficiency, is defined as

P = le I k* . For a target , k* starts at k* = k and is increased until this target

is met and measure P at this point. When a fixed k* is used (or k* = k) then

AZk' Mk.= P =

eiea Seeu is the ratio of the CPU time in processing k - queries in two different ways, which correspond to a sequential scan of the original dataset

versus

(0 querying CSVD generated data using sequential scan;

(ii) querying the indexing structure described in [45].

There is a three-fold speedup because:

(a) only a subset of the clusters are searched;

(b) the dimensionality is reduced;

(c) only a subset of points in each cluster are searched due to indexing.

Possible optimizations uses the squared distance in comparisons and stops the

iteration for summation when the squared distance exceeds the distance from

the query point to the farthest nearest neighbor (7-7,„ ).

With the first two terms pre-computed, the inner product in

+ I 4'112 —2/54 can be computed very efficiently using IBM's ESSL package for

PowerPC [19], but not on an processor.

33 Eeimes

In this chapter, the optimized sequential scan, which bypasses any unnecessary distance calculations, is used for within-cluster search instead of any multi-dimensional indexing structures. Thus the relative performance of SVD, CSVD, and MMDR can be studied.

The first experiment is to compare the performance of CSVD with that of the global SVD, the second experiment is to compare CSVD with and the third one is to compare CSVD with MMDR. 9

I Comaiso o CS a S

The following experiments are carried out to illustrate the different facets of

CSVD.

Comessio aio First compare SVD with CSVD to determine the improvement

in the compression ratio. TXT55 is partitioned into varying number of clusters

and the average number of dimensions retained and data volume (Equation

3.3) are shown in Figure 3.5. It is clear that CSVD reduces more dimensions

than SVD for a given NMSE. Furthermore, it follows from Figure 3.5(a) that

CSVD reduces more dimensions as the number of clusters increases. However,

a point of diminishing returns is soon reached and for a very large number of

clusters, e.g., 128 clusters in Figure 3.5(b), the index volume exceeds that of 32

and 64 clusters. This is due to the increased space required for metadata (see

Equation(3.3) in Section 3.3). According to this experiment, 32 or 64 clusters

can be the final choice for

The effect of the different options of CSVD is studied as discussed in Section

3. LM is outperformed by GM1 and GM2 (Figures 3.5(c) and 3.5(d)). It

follows from Equation 3.4, which takes into account the number of points in

each cluster, that using ma il in GM2 is an overkill. Consequently, for all

datasets in this paper GM1 outperforms GM2.

Also the compression ratio of the regular k-means and the elliptical k-means

is compared using GM1 on the TXT55 dataset which creates 16 clusters. The

results are shown in Figure 3.6. From 3.6 (a) it is easy to see that with same

level of the NMSE, the elliptical k-means is always able to retain less dimensions

than the regular k-means. Figure 3.6 (b) shows that as the NMSE increases,

the advantage of the elliptical k-means over the regular k-means becomes more

significant. Retrieval Speedup: 0ne reason for the speedup is that data clustering and the

approximate k-NN querying reduce the number of data points visited as well

as number of dimensions checked during query processing. CSVD prunes the

search space by visiting a small number of clusters (Figure 3.7(c)), so that only

a small fraction of points is visited (Figure 3.7(a)).

Figure 3.7(b) provides the speedup, obtained by SVD and CSVD methods versus

NMSE with different degrees of clustering, with respect to sequential scan for

SYNT64. It is observed that over a 30-fold speedup is obtained for H = 10 for smaller NMSEs.

Figure 3.7(d) shows the speedup obtained by SVD and variants of CSVD, while

maintaining 1?, = 0.75, for AERIAL56. It is observed that CSVD variations

provide a higher speedup than SVD and that GM1 is the best.

The CPU cost of regular k-means and elliptical k-means can be found in Appendix

B.

Quaiy o eiea Figure 3.8 shows the precision versus the NMSE for the three

CSVD methods and SVD with 7Z, = 0.75. For both datasets, the results of

CSVD are better than SVD and among the three CSVD methods, GM1 is the

best.

O-e-y esus eee megig Figure 3.9 compares on-the-fly merging with

deferred merging from the viewpoint of CPU cost and precision. In Figure 3.9(a)

for 128 clusters, when NMSE = 0.4, the CPU cost of deferred merging is twice

as high as that of on-the-fly merging. This is because deferred merging requires

access to the original dataset residing in the main memory. On the other hand, 5

eee megig oueoms o-e-y megig i ecisio (igue 39(

is is ue o e ac a eee megig isis e oigia aase a

oais e oigia isaces ewee ois

Accessig o o Accessig aaase wo coiios ue iee isace

ucios ae cosiee Accessig aaase Wii-cuse-seac uses

oece isace o i k caiaes o eac cuse e ocaes em i

e aaase a cacuae e oigia isace ewee e caiaes a 53

query points before merging them to a final list of k points. Since k and

(the number of clusters) are both small constants, this will not increase the I/0

cost much. o-accessig aaase Within-cluster-search uses approximate

distance to find k candidates for each cluster and uses the same distance to

merge.

Figure 3.10 illustrates the curves of CPU cost (3.10(a)) and precision (3.10(b))

for eee-megig using the original distance and approximate distance, as

well as o-e-y-megig using approximate distance. It turns out that they

have similar precisions, while eee-megig using approximate distance is a

compromise between the other two cases.

II. Comaiso o CS a

For CSVD, the best of the three proposed methods, which is GM1, is used. For

LDR, even for the same set of parameters, the results differ from one instantiation to another because of the randomized choice of centroids for spatial clusters. Therefore for each set of parameters LDR is run over 10 times and the best configuration which results in the smallest value of NMSE, is selected.

Many parameters need to be specified for In order to simplify the problem, the MaxDim (maximum subspace dimensionality of a cluster) is set to the original dimensionality, and MaxReconDist (maximum reconstruction distance that a point in a cluster can have) as well as FracOutliers (permissible fraction of outliers) are varied to obtain different levels of approximation. NMSE, the control variable used in CSVD, is a better criterion for comparison, because it reflects the fraction of dataset variance lost and is independent of dataset characteristics.

Experiments with the Synthetic Dataset. Data points are generated using the

synthetic dataset generator which was presented with the LDR method [22]

to form groups that are separated from each other and with different intrinsic

dimensionality. Therefore, both LDR and CSVD can generate good clusters

with little overlap. This can be seen from the fraction of data points which has

to be visited during query processing (Figure 3.11(d)). It can be seen that CSVD

has a better compression ratio than LDR on the average (Figure 3.11(a)), and

the precision of k-NN queries for LDR and CSVD are similar (Figure 3.11(b)).

Furthermore, CSVD has a lower CPU cost (Figure 3.11(c)), since it visits a

smaller fraction of data points. 55

Eeimes wi ea-wo aases o ig-imesioa ea-wo aases cosiee i e eeimes o a CS geeae cuses oea eaiy eeoe a age acio o cuses ees o e isie uig k- queyig

om igues 31 ee is o sigiica ieece ewee e comessio aios o a CS o CIS CS oueoms i ecisio we NMSE < Accoig o igue 313 a 31 o 55 a

AEIA5 CS as ige ecisio a owe CU cos a 56

Summary of the Comparison. Neither method outperforms the other in all cases.

The two methods yield similar compression ratios. Since CSVD partitions

datasets by a regular k-means method and reduces the dimensionality in a global

manner, the whole process is conceptually much simpler than LDR. CSVD is

a more adaptive method because clustering and dimensionality reduction are

carried out independently. In most cases CSVD outperforms LDR as far as

retrieval efficiency and CPU cost are concerned.

0n the other hand, LDR has the potential to better identify more "SVD

friendly" clusters [72] and hence outperforms CSVD in other cases, e.g., when a dataset has very clear local correlation. It might yield a better compression ratio and retrieval efficiency if its parameters are chosen carefully after an exploratory analysis of the dataset. In the experiments LDR did not capture the local data structure well for studentized datasets. For a high dimensional dataset, after studentization, it is even more difficult to form good clusters, since data points are brought closer to each other. In this condition, local correlation becomes unclear. This is the reason that LDR performs not as well as CSVD for TXT55 and AERIAL56. 5

Due to space limitations not all results are included in this paper. The reader

is referred to [73] for additional experimental results.

In conclusion, CSVD seems to be the preferred method for high dimensional

dimensionality reduction. It is more flexible in that it can use any clustering

method according to the application and datasets.

II. Comaiso o CS a MM Experiment on the MMDR is held on only the synthetic dataset SYNT64.

Firstly, several clustered datasets of the MMDR were generated by varying MaxMPE.

Then NMSEs were calculated while CSVD clustered datasets with same values of

NMSEs were generated so that the CSVD and the MMDR can be compared based on the same values of NMSEs.

From Table 3.2, it is easy to see that as the MaxMPE increases, NMSE increases as well. And like LDR, when parameters change, the number of clusters might change as well. For a same value of NMSE, CSVD retains a slightly fewer dimensions than

MMDR, which means the compression ratio of CSVD is higher.

As for query cost, from Figure 3.15 (a), MMDR has a slightly lower CPU cost compared to CSVD at all NMSEs. This is because CSVD retrieved a slightly more points so that the fraction of total points visited are higher than that of MMDR.

According to the above results, CSVD is a simpler method which does not

eom wose a e muc moe comicae MM meo

35 Cocusios

I is cae is some aiaios o CS a wee o eoe i [19] ae seciie a iesigae e M imesioaiy eucio wic is aamou

o ayig S wi a age MSE (MSE o iiiua cuses a GM meo ae oueome y e GM1 meo wic seecs e icia comoes

o e eaie o a goa asis Aso e aeo i uiiig aoimae isaces agais e oigia isace is iesigae i ocessig k- queies

e CS is comae wi wi ee ea-wo aases a oe syeic aase om e iewoi o e comessio aio eiea eiciecy

o k- queies a CU cos o sequeia sca CS comaes aoay o oueoms i mos cases

A aaage o CS wi esec o is is eiiiy i a e cuseig

ase is ecoue om imesioaiy eucio ey age aases ca e cusee usig cuseig meos aicae o isk esie aa coue wi CA aie

o e coaiace mai C (C ca e comue i a sige aase sca [] is

emais a aea o uue iesigaio 6 CAE 4

KEAES EIGO SEAC

4. Intrdtn

In recent years, the k-NN query has become an important tool in content-based retrieval or similarity search in multimedia databases containing images, audio and video clips, etc..

k-NN queries retrieve k closest objects to the query object. Multimedia objects are usually represented by features, such as color, shape and texture. Features are then transformed into high-dimensional points (vectors). Similarity queries, especially k-NN queries based on a sequential scan of large files or tables of feature vectors is computationally expensive and result in a long response time. Multi-dimensional indexing methods fail to work efficiently in high-dimensional space due to the problem of the dimensionality curse.

A well-known technique is to reduce the dimensionality of feature vectors and then build a multi-dimensional index structure on the dimensionality reduced space.

Techniques based on linear transformations, like the SVD, have been widely used for this purpose.

Clustering and Singular Value Decomposition (CSVD) is proposed in [19] to solve the problem by clustering the dataset, reducing the dimensionality and building index in the dimensionality reduced space for each cluster. It has been shown to outperform global SVD when the data are locally correlated.

An algorithm to find k nearest neighbors has been proposed especially for CSVD

(See Section 3). It is an approximate method since it violates the lower-bounding property [27]. The lower-bounding property which is initially defined for range queries also works for k-NN queries, since a k-NN query can be transferred to a range query

62 3 wi a esimae seac aius []

I is cae a eac k- agoim is esee o CS ase o a mui-se k- seac agoim esee i [] wic is esige o goa ime- sioaiy eucio meos Eeimes wi wo aases sow a i equies ess

CU ime a e aoimae agoim a a comaae ee o accuacy

e es o e cae is ogaie as oows Secio gies e eae wok Secio 3 escies e ew agoim i eais e eeimea esus ae gie i Secio a e cocusio aeas i Secio 5

eae Wok

ee ae wo caegoies o eaes eigo seac meos eac meos ([

37 ] a aoimae meos ([31 7 19] Eac k- agoims eiee

e k ois wic ae eacy e same as ose oaie om oigia sace usig sequeia sca o aases wi eemey age ume o ois a ey ig

imesioaiies i is eesie o oai eac esus uemoe e meaig o

"eac" as ee quesioe [15] ecause usig eaue ecos a a isace ucio

o aess simiaiy o muimeia oec ise is us a euisic [1] I is case aoimae meos may e moe eicie a e cos o a owe accuacy o k- queies

o queyig iees ui o imesioaiy euce sace i e isace

ewee ay wo oece ois ie imesioaiy euce ois owe ous

e isace ewee e coesoig oigia ois e i is ossie o ae

a eac k- agoim e k- seac agoim o CS i [19] is a

aoimae meo ecause i uses aoimae isace ewee wo oece

ois Aoimae isace eie owe ous o ue ous e oigia

isace igue 1 sows a eame i -imesioa sace o a aa oi

igue 1 Approximate distance in CSVD.

In order to obtain a certain accuracy, k ( k points has to be retrieved, where the precision can not be probabilistically controlled, i.e., k can not be predefined as a function of eca On the other hand, the good point of this method is that it doesn't need to access the original database. Exact methods, when working on dimensionality reduced indexes, have to access the original databases for post-processing.

Many algorithms for exact k-NN queries have been reported in the literature.

The earliest k-NN method for multi-dimensional indexes [66] is designed for R-tree.

It uses MIIS and MIMAIS to prune branches. There is no dimensionality reduction involved in this basic algorithm.

The state-of-the-art of multi-step exact k-NN algorithm which can be applied on dimensionality reduced datasets or indexes appears in [48]. It has three steps: 5

1 i e k eaes eigos o e quey oi Q i e susace (ime-

sioaiy euce aa sace;

i e acua isaces o ese k ois o Q a eseciay e aes

isace (Amax);

3 Issue a age quey o e susace wi aius dma a oai ei oigia

isaces o Q, e ick e coses k ois as ouu

e aoe meo is ieee o ie sucues a i ca e ugge i

o ay eisig mui-imesioa ie a ee sequeia sca

Aoe oua meo cae Ranking method [37] is k eaes eigos

y usig a ioiy queue a i a case k is o ecessay o e a ie ume

e meo iouce i [] ees e akig meo o imesioaiy euce

aa a eses a eac agoim a esus i ess ase aams e agoim

o Local Dimensionality Reduction ( [] is aso ase o akig I uses a

ioiy queue o aigae e ie o a cuses a eoes oy ose oecs

a ae wii e age o k- eaes eigo 1 A ese ew meos ae

esige o mui-imesioa iees a ey ae ie-eee

3 Eac k- Seac Agoim o CS

e ie sucue ae eomig CS is sow i igue e oo oe coai asic aa iomaio ( sie imesioaiy ceoi a aius — e

aes ois o e ceoi o e woe aase — a e ume o cuses as we as e aess o eac cuses o eac cuse esies e asic aa iomaio a oie o e oca ie is oie Ay muiimesioa ie ca e use as oca ie

1 is k- agoim wic is aso aie i Cae 5 is escie i eai i Aei A 66

r 4.2 Indexing structure of CSVD.

In this chapter, to make life simple, "approximate method" just refers to the approximate k-NN algorithm for CSVD in [191, and "exact method" only denotes the newly proposed method.

Given a set of clusters C1 C , ..., C and their centroids and radius, the exact k-NN algorithm proceeds as follows, noting that the idea for pruning unnecessary clusters is actually similar to the approximate algorithm:

1. Find the primary cluster C which is the cluster with the closest centroid to

query Q. 0rder other clusters based on their distances to Q, which is defined

where ph is the centroid of Ch and Rh

is the radius of Ch.

2. Project Q onto C and obtain k closest points with dimensionality reduced

(projected) distance.

3. Compute the original distances of the above k points to Q and return the

maximum distance Amax

4. Perform a range query centered at Q and with radius dmax in the projected

space. 67

5. Compute the original distance of each retrieved point and insert it into the

sorted list of k nearest neighbors and replace ma with the distance of current k-th nearest point.

6. For the next candidate cluster C If ma i s(Q then go to step 4 to

search that cluster, otherwise stop and return the current k points in the sorted

list.

emma 31 e aoe agoim guaaees o ase ismissa o k- queies

oo According to Lemma 4 in [48], since the projected distance lower-bounds the original distance, Step 2, 3, 4 and 5 do not generate any false dismissal because it is just the algorithm in [48] applied to one cluster. It is necessary to prove that for clustered datasets, the same conclusion can be drawn. Consequently, Step 6 has to be proved to not generate any false dismissals.

The proof is by contradiction. Suppose is(Q = dma , but there is a point in Ch which should be one of the k-nearest neighbors. Since ma is the current k nearest neighbor, (Q dma < However, according to the definition of is(Q C ), for all points inside is the smallest possible distance to

Q therefore (Q which contradicts (Q Therefore, Step 6 does not generate any false dismissal. QED.

This exact method not only guarantee 100% accuracy but also has shown exper- imentally to have lower CPU cost than the approximate one at a comparable level of accuracy. This is due to the following reasons:

1. In order to obtain a desired recall, the approximate method needs to estimate

a value k k for k-NN queries, which can not be obtained without iteration.

For the exact method, only k points needs to be retrieved.

2. For indexes which reside in main memory, o-e-y megig policy [73] is

utilized for collecting results of all clusters. In that case, approximate method

requires more CPU time to maintain the sorting list for the final results which

could be much longer than that of the exact method.

3. For multi-dimensional indexes which resides in disks, eee megig policy

[73] has to be utilized for merging results of multi-clusters. Therefore, for

approximate method, a k*NN query, which is known to be more complex than

range query, is issued for each cluster. In the exact algorithm, on the other

hand, k-NN query is issued only once and from then on only range queries are

involved, which tend to be more efficient.

Eeimes

The experiments are carried out with two datasets: TXT55 (real-world dataset with size 79,814 and dimensionality 55) and SYNT64 (synthetic dataset with size 99,972 and dimensionality 64). 0ne thousand points are randomly selected as query points.

For within-cluster search, optimized linear scan which bypasses unnecessary calcu- lations instead of multidimensional indexes is used. The NMSE was used in [19] to illustrate the ratio of total information loss caused by dimensionality reduction. It is proportional to the index size. Here it is also used as a parameter to show the relationship between CPU cost and compression ratio of indexing.

Recall ( and CPU cost are used to quantify the accuracy and efficiency of retrieval. Let A(4) denote the k nearest points to a query point. To account for the approximation, one may retrieve a result set ( containing k k elements. Let C(q = AI n (1 Then, =1C(11(1

1 CU Cos esus MSE o e Eac Meo

First the relationship between CPU cost of the exact k-NN algorithm for different approximation level of CSVD indexing is explored, qualified by the NMSE.

In Figure 4.3, the CPU cost of k-NN queries is given versus the NMSE, for 9

igue 3 CPU cost of the exact k-NN algorithm. different number of clusters. It can be seen that the exact method has much lower

CPU cost than linear scanning the original dataset. Furthermore, as the number of clusters increases, more CPU time is saved because fewer points are visited. An interesting point is, as the NMSE becomes larger, the CPU costs exhibits a (global) minimum. This is because with fewer dimensions the index query cost decreases, while there is an increase in post-processing due to an increase of false alarms. 7

Eac Meo esus Aoimae Meo

e e eeime is o comae e aoimae k- agoim wi e eac agoim a comaae eca aues — o eac meo R = 1 o sue u o aoimae meo

I oe o oai a gie eca aoimae k- ees o esimae k*. I is o easy o i e k* wiou ieaios u e CU ime o iig k* is o cosiee i is comaiso ee e ume o cuses ae cose o e 1 o

55 a cuses o SY

igue Comaiso o CU cos o e eac a aoimae meos 71

igue sows e CU coss o e eac agoim a e aoimae agoim wi iee MSEs I ca e see a e eac meo as muc

owe CU cos a e aoimae meo i mos cases a oy ue e coiio a MSE is ey sma ( 3 o SY e aoimae meo oueoms e eac meo I sou e oice a o e aoimae meo we R aoaces 1 (eg R > 9999 o SY k* is a uacceae age

aue

e aues o k* ae gie i igue 5 e aoimae meo icus muc moe CU ime we e MSE is age ecause i as o eiee a ey age

ume o caiaes o guaaee a ig eca

3 e Eec o Maiaiig a Ea imesio

o miimie e ieece ewee e oigia isace a oece isace e meo oose i ae [] is aoe o a gie MSE cuse C kees d imesios ae imesioaiy eucio eiousy e ie wou e ui o

e -imesioa sace ee isea d + 1 imesios wi e ke o eac aa

wie e ea oe imesio is e ReconDist, eie

way e os isace iomaio o a iiiua oi

ue o imesioaiy eucio e e Euciea isace is comue i d + 1-

imesioa e oece isace is cose o e oigia isace a i is guaaee o owe ou e oigia isace []

igue iusaes e eaiosi amog e oigia isace e

isace a e aoimae isace escie i Cae 3 oe a eac aue o e isace wi esec o a quey is e sum o squae-isaces o e

eaes eigos o e quey oi I is easy o see a ougy e aoimae

isace a isace oie a simia ee o aoimaio o e oigia

isace owee e isace aways owe ous e oigia isace a 72

Figure 4.5 Value of k for the exact and the approximate algorithm. on the other hand, the approximate distance does not has this property.

However, since an extra dimension is stored for each point, the CPU time for calculation increases. It can be seen in Figure 4.7, as the NMSE changes from 0.01 to 0.3, although k* is much smaller, the total CPU cost is higher when using d + 1 dimensions than when using d + 1 dimensions. This is only true when sequential scan is used for within-cluster search, when multi-dimensional indexes are used for within-cluster search, the curves of query costs are different. The results with SR-tree can be found in Chapter 5. 4. Cnln nd n

I is cae a agoim o eac k- seac o CS geeae aases is

eeoe e eac meo esues a 1% eca wie e aoimae meo

escie o e same uose i [19] ca o guaaee a acceae miimum

eca uemoe eeimea esus sow a e eac meo as a owe

CU cos a e aoimae meo uess e MSE is ey sma igue 7 (a CU cos a ( ume o ois eiea (k*) o d imesios a d + 1 imesios GAO 5 cuses CAE

OIMA SUSACE IMESIOAIY

. Intrdtn

e k- queies ae ee use i a wie aiey o aicaios suc as Coe-

ase eiea (C om muimeia aaase a aa miig o i e k mos simia oecs o e quey oec e commo gou o ese aie aicaios is a e oecs ca e escie as mui-imesioa ois wi ie ume o imesios a e "simiaiy" o wo oecs is eemie y a isace meic suc as e Euciea isace ewee e wo coesoig ois

I oe o aciiae as quey ocessig o age mui-imesioa aases mui-imesioa ieig sucues ae use isea o sequeia sca owee as imesioaiy iceases mui-imesioa ieig sucues egae aiy

ecause o e imesioaiy cuse [1] e oem ca e soe y eucig

imesios wiou osig muc iomaio eoe uiig a ie

ee ae may imesioaiy eucio meos ase o iee ai- caios a eciques Oe oua meo is o uiie iea asomaio ike

Sigua aue ecomosiio (S o Kaue-oee asom (K o o

icia Comoes Aaysis (CA [1] a oae a oec aa ois io a owe imesioa sace Accoig o e ways o uiiig S o CA ey ca

e iie io wo caegoies Goa meos a oca meos

Goa meos eom S o CA o a eie aase Gie a aase

wi sie M a imesioaiy oe ca use S o ge eigeaues A

A A (wiou oss o geeaiy ey ae assume i a eceasig oe a

e coesoig eigeecos mai = ( i 11 A o e coaiace mai o

[19] e asom e woe aase io Y = e eigeecos ae aso

6 called the principal components of Since they are ordered in a way that the first n dimensions of Y keep most of the variation or energy, the last — n dimensions can be reduced without losing much information.

Global methods rely on global information derived from the dataset, therefore it is more effective when the dataset is goay coeae In other words, it works well for a dataset whose distribution is well captured by the centroid and the covariance matrix. When the dataset is not globally correlated , i.e., the data points distribution is "heterogeneous" in the dataset — this is often the case for real-world datasets, performing dimensionality reduction using SVD on the entire dataset may cause a significant loss of information, as illustrated by Figure 3.2(a). In this case, there are subsets of the dataset which exhibit local correlation. Local method partitions the large dataset into clusters, and do dimensionality reduction using SVD respectively for each cluster (Figure 3.2(b)).

Local dimensionality reduction methods are usually associated with clustering.

CSVD (Cuseig a Sigua aue ecomosiio [72, 19] partitions a dataset into clusters using LBG [55] first and then perform dimensionality reduction in a global manner for all clusters. LDR (oca imesioaiy eucio [22] starts from spatial clusters and rebuilds clusters by assigning each point to a cluster requiring minimum dimensionality to hold it with an error ecois below Maecois

Another method called MMDR [40] tries to cluster a high-dimensional dataset using the low-dimensional subspace using elliptical k-means clustering based on Maaaois distance instead of regular k-means clustering which is based on Euclidean distance.

The clusters generated by LDR and MMDR should be "SVD friendly" [72] because clusterings are obtained based on error thresholds after projecting points into subspaces.

Range queries and k- queries are the two most popular query types for similarity search. k-NN queries retrieve k closest data objects to the query object and range queries return all the data objects within a distance E to the query object. k- 77 search methods can be divided into two categories: eac methods ([66, 37, 48, 68]) and aoimae methods ([31, 7, 19]). Exact k- search returns the exact k closest points which appear the same as the results of linear searching on the original dataset, while approximate k- search returns approximate results and guarantees a certain accuracy.

No matter which k-NN search method is considered, it is a critical issue to decide the number of dimensions to be retained. Few of the multi-dimensional indexing structures perform well when dimensionality exceed 30. On the other hand, too much dimensionality reduction results in too much distance information loss. The challenge here is to find a tradeoff between dimensionality reduction and information loss. Nothing that in this chapter, only exact k-NN method is considered.

It has been observed that when the index is created on dimensionality reduced data, the cost of a query (Cos q is composed of two parts: the cost of querying the index (Cos and the cost of post-processing (Cos i.e., the cost of removing ase aams which are non-qualified points. Therefore,

Cos decreases as more dimensions are moved, but at the same time, Cos increases because of more distance information lost. There must be a point or an interval of dimensions in which the minimum Cos is reached, e.g., point A and interval [B, C] in Figure 5.1.

The query cost has been studied through modeling for specific indexing structures.

However, the previous work has the following limitations:

• Only considered the cost for index querying.

• Only can be used for global dimensionality reduction methods. 7

igue 51 Optimal subspace dimensionality with respect to minimal query cost.

• Failed to consider all side effects when dimensionality gets higher.

Local dimensionality reduction methods generate multiple clusters so that each of them may have different subspace dimensionality respectively, thus to find optimal subspace dimensionality with respect to minimum query cost becomes much more difficult.

In this chapter, a hybrid method is presented which takes advantage of the clustering algorithm of existing local dimensionality reduction methods and removes dimensions from each cluster according to the ratio of information loss. Through this method, the relationships among query cost, ratio of total information loss, and subspace dimensionality of each cluster is discovered so that not only optimal subspace dimensionality can be determined, but also users can have more freedom to make tradeoff between query cost and index size. The new method is based on experiments, i.e., perform off-line experiments before real applications.

The rest of the paper is organized as follows. Section 5.2 introduces related work. In Section 5.3, the new method is presented in detail. Experiments are given in Section 5.4 and conclusion is given in Section 5.5. 79

5 eae Wok

The cost of a similarity query consists of CPU cost and I/O cost. For memory-resident indexing methods or linear scan, usually only CPU cost is considered, while for disk- resident indexing structures, I/O cost, measured as the number of page accesses, is typically considered. The estimations of k-NN query cost and range query can be transformed to each other by estimating the seeciiy of a range query and the approximate query radius of a k-NN query.

The problem of modeling query cost for multi-dimensional index structure has been studied for many years [28, 12, 16, 47]. Faloutsos et al. [28] present a model to analysis the range query cost for tree [36]. Then in [47] the cost model for k-NN search is given for two different distance metrics. They both consider effect of corre- lation among dimensions by utilizing aca imesios but the models are limited to low-dimensional data space. A cost model for high-dimensional indexing proposed in [16] takes into account the ouay eecs of datasets in high-dimensional space. But it assumes the index space is overlap-free, which is impossible in high-dimensional space for most of the popular indexing structures. In fact, as the dimensionality increases, the overlap among the nodes of a spatial tree structure is getting so heavier that can not be ignored. Therefore, the cost models can not estimate the real cost of querying for high-dimensional datasets. In additional, the cost models above only focus on the indexing cost, they fail to consider the cost due to dimensionality reduction. Detailed information about fractal dimensions and the above query models is provided in Appendix C.

In this thesis, the cost of k- queries is analyzed through experiments. The author of this thesis chooses not to analyze index query cost by modeling because:

1. when dimensionality gets higher, there are so many side effects that mathe-

matical formulation can hardly consider all of them and the precision can not

be guaranteed. The equations in [16] are already very complicated although it 80

only consider some of the effects.

2. Most of the cost models assume the input of data points are bulk loading or

packing, which means divide the dataset into rather small regions which fit into a

single data page so as to reduce overlap. One of the commonly used techniques is

the space filling curve like Hilbert curve [28] and Z-ordering. Another technique

is to divide the data space into partitions which correspond to data pages, either

in a top-down manner [49] or bottom-up manner [17]. However, although they

have very good performance for low-dimensional data, the packing algorithms

are not efficient for high-dimensional datasets. Moreover, as most modern

applications require dynamic insertion and deletion, indexes should be created

dynamically, rather than statically. Some results in Appendix C show that the

cost models do not work for datasets which are not preprocessed.

3. It is shown that the index querying cost is strongly related to the ratio of

information loss in this chapter, therefore the approximate interval of subspace

dimensionality with respect to minimum query cost is expectable and therefore

estimating query cost by experiments is applicable.

The CSVD is also a hybrid method that combines clustering and dimensionality reduction, but it uses LBG for clustering which can not be able to identify locally correlated clusters and furthermore, it fails to provide any discussion on optimal subspace dimensionality.

High-dimensional clustering method like CLIQUE [4] discovers clusters embedded in subspaces, but it can not be used as the first step of the new method because it can only find correlation along the dimensions in the original space. 8

. h brd Mthd fr Cltrd tt

In this chapter, the data being processed are not original datasets but the resulting datasets after clustering or even after dimensionality reduction, as shown in Figure 5.2.

r .2 Clustered dataset and Dimensionality reduced clustered dataset.

fntn ..2 A nnlt dd Cltrd tt C ies

om Cc i a e imesioaiy o C is (h e aeage imesioaiy

.. Infrtn nd Sbp nnlt

The LDR generate clusters based on local correlation inside a dataset and therefore is efficient for locally correlated dataset. However, according to the LDR method, the subspace dimensionality of each cluster is determined by many factors, like

Maecois acOuies oca_eso and the threshold of cluster size - MiSie

These parameters must be carefully selected to obtain "good" results and usually it

akes may ieaios o igue em ou o a aase Wie wee o o a

esu is "goo" eies o uses emas imesioaiy eucio esus i io- maio oss a uses cae moe aou e aio o oa iomaio oss ae a iiiua iomaio oss (Recordist i ecause e aio o oa iomaio

oss is a ey imoa ik ewee e susace imesioaiy a eiciecy o quey ocessig eg ow oes i aec e CU cos o k- queies we ou o imesios ae euce? I ees o ow muc iomaio oss

omaie Mea Squae Eo - MSE

I is kow a e oa isace iomaio oss is iecy eae o e susace

imesioaiy we S is eome o e woe aase e moe imesios

eaie e ess isace iomaio os u o cusee aase ow oes e

imesioaiy eucio o iiiua cuse aecs e oa iomaio oss a

ow o coo e iiiua susace imesioaiy accoig o e eo use ca oeae? I ese quesios ae aswee e a oima coice o susace

imesioaiies ca e ecie o esue eiciecy

o soe is oem a oma esciio o e oa isace iomaio oss

- omaie Mea Squae Eo (MSE- is eee I cacuaes e aio o e

oa isace iomaio oss ae imesioaiy eucio o e oa isace iomaio eoe imesioaiy eucio

Equaio a Equaio i Cae gies e eiiio o MSE o a

aase wi sie M a imesioaiy N. Sice S is a iea asomaio a

oes o cage e Euciea isace ewee ois e isace iomaio oss o asome aase is equa o a o e oigia aase Aie o cusee

aases Equaio 5 is oaie wee is e ume o cuses a m a 83 refer to the size and dimensions retained for C

There is a simpler way to calculate NMSE. As shown in Equation 2.7 (Chapter

2), the NMSE of a single cluster is also equal to the ratio of summation of retained eigenvalues to the summation of all eigenvalues, where Ai is the j-th largest eigenvalue.

Based on Equation 2.7 and Equation 5.2, the equation of the NMSE for clustered datasets (Equation 3.4) is obtained, where A(, is the j-th largest eigenvalue of C

The proofs of Equation 2.7 and Equation 3.4 can be found in [73].

Dimensionality Reduction According to the NMSE

Equation 3.4 can be also summarized as a function between NMSE and subspace dimensionality of each cluster: MSE = (i n , ..., O For such a function, given a target NMSE (MSE there are many solutions for choosing n 1 --, o CSVD has a good approach to reduce dimensionality of clustered datasets in a global manner by sorting eigenvalues of all clusters and removing least significant ones until MSE is reached. The algorithm uses a mm-priority queue Q which has three fields for each entry: the eigenvalue (key), cluster ID it belongs to and the dimension ID associated with it. The algorithm is shown in Table 5.1, named Algorithm 1. This method has been named GM1 in Chapter 3, while the details of the algorithm has not been given there.

It follows Equation (3.4) to calculate ceMSE therefore the resulting set of dimensions n 1 o satisfies MSE

Step 1, 2 3 and 4 finish the initialization, where suase and sumemoe are used to store and compute the denominator and numerator of Equation (3.4) respectively. Step 5, 6 and 7 remove the least significant eigenvalue from the min- priority queue and updates ceMSE iteratively until the queue becomes empty or the ceMSE is large enough. The subspace dimensionality of each cluster is updated and returned in step 8 and 9.

After performing Algorithm 1, MSE = (i n , ..., o becomes a one-to-one 5 monotonous function. Then once the relation between query cost and NMSE is found, the optimal subspace dimensionality can be obtained right away. Since there are at most items in the queue, and the number of clusters is a constant usually smaller than 100, the cost of Algorithm 1 is just (

53 e yi Meo

The hybrid method is shown in Table 5.2. Step 1 uses a local dimensionality reduction

ae 5 Algorithm 2

1. Perform the clustering algorithm of a local dimensionality reduction method

(like LDR) to obtain clusters and principal components. Ignore the

subspace dimensionality it generated.

2. For MSE = sa o e step /3

(b) Create index for each cluster in subspace using any existing multi-

dimensional indexing structure.

(c) Perform k-NN search with sample queries and record I/O cost.

3. Plot the I/O cost versus NMSE and get the value of NMSE corresponding to

the minimum cost, then the corresponding set of subspace dimensionalities

is the answer.

method to generate local correlated clusters. If the MSE is specified by users, step

2 only has one iteration. Otherwise, the value sa e and have to be decided.

According to experiences, sa could be 0.005 and e is not necessary to be larger than 0.3. If you want a precise answer the step ,3 should be small enough, on the other hand, if you want just an approximate interval of NMSE with respect to minimum query cost, can be larger, which can reduce the cost of the algorithm. In the

experiments of this chapter, /3 is set such that e ;ao=

Actually you don't need to have more than five iterations for a dataset since the experiments in Section 5.4 show that minimal query cost always fall in a small range of NMSE from 0.05 to 0.35, no matter what kind of dataset is involved.

The subspace dimensionality generated is ignored because it is useless when iterations are involved. Suppose only MaxReconDist is varied. Then each time MaxReconDist is changed, the whole process including clustering has to be redone completely to obtain another set of subspace dimensionality. While using the proposed method, the dataset only needs to be clustered once and then vary

NMSE to generate subspace dimensionality using Algorithm 1 without even touching any data point.

533 Oima Susace imesioaiy

Searching index structures built on dimensionality reduced datasets results in false alarms, which can be removed by accessing the original dataset to obtain the original distances between the query and candidate nearest neighbors. The distance between projected (dimensionality reduced) points lower bounds the distance between the original points, therefore there is no false dismissal [27].

According to Equation 5.1, when indexes are created on an original dataset,

Cost is zero, but Cost, is high and therefore the query cost Cost is quite high because of the dimensionality curse. Cost drops as more dimensions are reduced, but

Cost increases because of more false alarms as the NMSE increases. When too many dimensions are removed, Cost, becomes low; however, too much distance information is lost and Cost becomes very high, therefore Cost q is very high. Consequently, 7

Cost is like a convex function of the NMSE. There must be an optimal subspace dimensionality at which the query cost is the lowest. This optimal subspace dimen- sionality could be affected by indexing methods or dimensionality reduction methods, but for a given index and a given dimensionality reduction method, it should be only related to the NMSE. Besides query cost, index size is another concern of users. The more dimensions being removed, the smaller the index is. User may want a subspace dimensionality such that the index size does not exceed a given value, although the query cost might not be the lowest. Therefore, the more general objective function should be described as:

If only query cost is considered, just set a to 1. If index size is also considered, tradeoff can be made between index size and query cost by set a to a certain value between 0 and 1 Of course no matter through modeling or experimental analysis, it is not easy to find the exact optimal value of subspace dimensionality with respect to the minimum query cost. Instead, it is practical and still valuable to find a small interval and make sure the optimal value is in that interval.

5 Eeimes

In this section, several experiments are carried out for the hybrid method on four datasets (three real-world datasets and one synthetic dataset), as shown in Table 5.3.

The three real-world datasets are all image datasets with different feature vectors.

The synthetic dataset is created to have local correlation along arbitrary directions using the algorithm in [22]. 8

LDR is utilized to generate clusters. For each dataset, two clustered datasets are generated by varying LDR parameters. The detailed information of those clustered datasets are given in Table 5.4. In order to find optimal subspace dimensionality, the

NMSE is varied gradually from 0 to 0.5 for each clustered dataset.

SR-trees [42] is used as the indexing method because it was shown to be more efficient than R-trees in high dimensional space. The split factor for SR-trees is 0.4 and the reinsert factor is 0.3. Before run the algorithm on SR-trees, some experiments has been done using sequential scan as within-cluster searching methods.

For sequential scan, the exact k-NN algorithm proposed in Chapter 4 is applied, while when using SR-trees, the exact k-NN algorithm of LDR (see Appendix A), which has been proved to be optimal [22], is applied. It uses a priority queue to navigate the tree structures of all clusters and finds the exact answers by ranking. Therefore, it accesses as few points as possible.

The query costs that appear in all figures are averaged over one thousand 20-NN queries, which are randomly selected from each dataset.

The experiments were implemented using C++ on a Dell Workstation (Intel Pentium CPU, 512 MB RAM) with Windows Professional.

5.4.1 NMSE and the Average Subspace Dimensionality

The datasets used here are clustered datasets with more than one cluster each. To make life easier, the average subspace dimensionality defined in Section 5.3.1 is plotted instead of the dimensionality of each cluster. Besides the reason specified in Section 5.3, the subspace dimensionality obtained by LDR is ignored for the following two points:

• The average subspace dimensionality cannot always be described as a monotonous function of MaxReconDist; see the series of SYNT64 in Figure 5.3 (a), even if all other parameters are fixed. Also, it can be seen from Table 5.4 that the number of clusters changes as well. The curves in Figure 5.3 (b) show that, when using Algorithm 1, the average subspace dimensionality decreases monotonously and smoothly as the NMSE increases, although each dataset may have a different pace.

• For some datasets, the curve of average subspace dimensionality with respect to MaxReconDist is monotonously decreasing, like the series of GABOR60 in Figure 5.3 (a), but the magnitude of MaxReconDist is unpredictable. It can be seen from Figure 5.3 (a) that for SYNT64 the average subspace dimensionality is 18 at MaxReconDist = 1, while for GABOR60, LDR does not reach such a level of dimensionality reduction until MaxReconDist > 4. Consequently, the choice of LDR parameters is very much data-dependent. Unlike MaxReconDist, the NMSE always lies in [0, 1].

5.4.2 Query Cost and Subspace Dimensionality

A. Without High-Dimensional Indexing

The following experiments were run without building any indexes, just linearly scanning the clusters. Figure 5.4 plots the CPU costs versus the NMSE for SYNT64 and GABOR60. The results show that the curves are just as expected, and the optimal dimensionality is always obtained at around 5% to 20% information loss, i.e., when NMSE ∈ [0.05, 0.2], no matter what kind of dataset is involved.

B. With SR-trees

The algorithm is implemented on the clustered datasets in Table 5.4 for the four datasets respectively. The results are plotted in Figures 5.5, 5.6, 5.7 and 5.8, where part (a) of each illustrates query costs versus NMSEs and part (b) illustrates query costs versus subspace dimensionalities. The query costs are the summation of index query costs and post-processing cost, which are both measured as the number of page accesses. Like the LDR data structure, d + 1 dimensions are kept when the dimensionality is reduced to d, where the reconstruction distance ReconDist is stored in the extra dimension. The following conclusions are obtained from the experimental results:

• The curve of query cost versus average subspace dimensionality has a "U" shape, as expected, and an optimal subspace dimensionality does exist.

• Different clustered datasets derived from the same dataset have almost the same optimal intervals of subspace dimensionality.

• Different datasets have different optimal subspace dimensionalities, since their degrees of correlation are different, but they have almost the same NMSE values with respect to the minimum query costs. From the four figures it can be seen that the optimal subspace dimensionalities are attained at around 20 for SYNT64 (Figure 5.5 (b)), 23 for TXT55 (Figure 5.6 (b)), 18 for COLH64 (Figure 5.7 (b)) and 32 for GABOR60 (Figure 5.8 (b)), but the minimum costs are all attained at around the same NMSE. This means that the minimum query cost is strongly related to the NMSE.

With SR-trees, the minimum k-NN query cost is achieved at NMSE ≈ 0.03 for the LDR-generated clustered datasets, whether on the synthetic dataset or the real-world image datasets. In other cases, i.e., using other indexing methods, other dimensionality reduction methods, or other kinds of datasets, the value of the NMSE might be different.

5.5 Conclusions

In this chapter, a hybrid method is presented to discover the relationship among k-NN query cost, ratio of total information loss, and subspace dimensionality for clustered datasets, which are generated by a local dimensionality reduction method. The author found that the optimal subspace dimensionality does exist and can be identified through the proposed method. In addition, it can be seen that the minimum query cost is strongly related to the NMSE for a given index method and dimensionality reduction method, and therefore using the NMSE for finding the optimal subspace dimensionality is a good choice. The experiments show that the new method works well for both real-world datasets and synthetic datasets.

CHAPTER 6

CONCLUSION

Similarity search in high-dimensional datasets is very important for many modern database applications. In order to deal with the dimensionality curse as the dimensionality increases, many techniques, such as multi-dimensional indexing and dimensionality reduction, have been introduced to improve the efficiency of similarity search.

The contributions of this thesis can be summarized as follows:

• Improvements to CSVD. Dimensionality reduction using the Singular Value Decomposition method, to retain a subset of features which are supposed to be more important, is considered to be an effective way to improve the efficiency of index structures. The SVD or PCA have been widely used for identifying the principal components of the original datasets, to which dimensionality reduction is applied. This method is very effective when the dataset consists of homogeneously distributed vectors. For heterogeneously distributed vectors or locally correlated datasets, a more efficient representation with the same degree of normalized mean squared error can be generated by CSVD: dividing the dataset into clusters and reducing dimensionality individually using SVD. In this thesis, three methods for selecting the dimensions to be retained for CSVD (LM, GM1 and GM2) have been presented and compared with each other. Experimental results with four datasets show that GM1 outperforms LM and GM2.

• Performance comparison of local dimensionality reduction methods. In this thesis, the local methods CSVD and LDR have been analyzed and compared from the viewpoints of compression ratio, CPU cost and retrieval efficiency. MMDR was also compared to CSVD from the viewpoints of compression ratio and CPU cost on synthetic as well as real-world datasets. Experiments are performed by using sequential scan for within-cluster search, and the results show that CSVD outperforms LDR and MMDR.

• An exact k-NN search algorithm. An algorithm to find the exact k nearest neighbors has been proposed for local dimensionality reduction methods. The original k-NN algorithm for CSVD is an approximate method, since it violates the lower-bounding property. The proposed k-NN algorithm is based on a multi-step k-NN search algorithm which was designed for global dimensionality reduction methods. Experiments with two datasets show that it requires less CPU time than the approximate algorithm at a comparable level of accuracy.

• Optimal subspace dimensionality for clustered datasets. Since dimensionality reduction causes distance information loss, the number of dimensions to be retained becomes a critical issue. The total cost of a similarity query is the sum of the index query cost and the postprocessing cost, which is used to remove false alarms. As the number of dimensions being removed increases, the index query cost obviously decreases; however, the postprocessing cost increases due to the increasing number of false alarms. There must be an optimal subspace dimensionality at which the query cost is minimized. Local dimensionality reduction methods generate multiple clusters, and each of them has a different subspace dimensionality. This makes the identification of optimal subspace dimensionalities more difficult. In this thesis, a hybrid method has been presented to determine the optimal subspace dimensionality for each cluster. The experiments on four datasets show that the proposed method works well for both real-world datasets and synthetic datasets.

APPENDIX A

K-NN ALGORITHM OF LDR

The algorithm for k-NN queries is shown in Table A.1. It was proposed for the LDR method in [22]. It uses a min-priority queue (queue) to navigate the nodes and objects in the multi-dimensional indexes of the clustered dataset in increasing order of their distances from the query Q. Each entry in the queue is either a node or a point and stores: the id of the node or point T it corresponds to, the cluster S it belongs to, and its distance dist from the query anchor Q. The items are sorted on dist, i.e., the smallest item appears at the top of the queue. For nodes, the distance is defined by MINDIST, while for objects, it is just the point-to-point distance.

Initially, for each cluster Si, Q is mapped to its subspace using the information stored in the root node Ri, and becomes Qi. Then, for each cluster, the distance MINDIST(Qi, Ri) of Qi from the root node Ri is computed and pushed into the queue along with the id of the cluster Si. The k closest neighbors of Q among the outliers are calculated with a sequential scan and are put into the set temp.

Steps 5 through 19 navigate the indexes (one SR-tree for each cluster, for example). An item is popped from the top of the queue at each outer loop. If the popped item is a point, the distance of the original E-dimensional point (obtained by accessing the full tuple on disk) from Q is computed and the point is appended to temp (Lines 11-13). If it is a node, the distance of each of its children to the appropriate projection of Q onto the subspace of top.S (where top.S denotes the cluster that top belongs to) is computed and the children are pushed into the queue (Lines 14-19). A point O is moved from temp to result only when it is certain to be among the k nearest neighbors of Q. The condition O.dist ≤ top.dist in Line 7 ensures that there exists no unexplored point O' such that D(O', Q) < D(O, Q).

By inserting the points in temp (i.e., already explored items) into result in increasing order of their distances in the original space (by keeping temp sorted), it is also ensured that there exists no explored point O' such that D(O', Q) < D(O, Q). This shows that the algorithm returns the correct answer, i.e., the same set of points as querying in the original space.
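For concreteness, the following C++ sketch captures the essential behavior of this search in simplified form: the per-cluster SR-trees are replaced by flat lists of projected points, and the stopping rule is phrased as "stop once the smallest remaining lower bound cannot improve the current k-th candidate." The structures and the function knnSearch are illustrative only and are not the implementation used in this thesis.

// Simplified exact k-NN over clustered, dimensionality-reduced data.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <queue>
#include <vector>

using Vec = std::vector<double>;

static double l2(const Vec& a, const Vec& b) {                 // Euclidean distance
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) { double d = a[i] - b[i]; s += d * d; }
    return std::sqrt(s);
}

struct Cluster {
    Vec centroid;                    // cluster centroid in the original space
    std::vector<Vec> components;     // retained principal directions (orthonormal)
    std::vector<Vec> projected;      // member points in the reduced subspace
    std::vector<Vec> original;       // the same points in the original space

    Vec project(const Vec& q) const {            // map the query into this subspace;
        Vec p(components.size(), 0.0);            // orthogonal projection never increases L2
        for (std::size_t j = 0; j < components.size(); ++j)
            for (std::size_t i = 0; i < q.size(); ++i)
                p[j] += components[j][i] * (q[i] - centroid[i]);
        return p;
    }
};

struct Entry { double lb; std::size_t cluster; std::size_t point; };   // lower bound + ids
struct ByLb { bool operator()(const Entry& a, const Entry& b) const { return a.lb > b.lb; } };

// Returns the k smallest original-space distances from q to the clustered data.
std::vector<double> knnSearch(const std::vector<Cluster>& clusters, const Vec& q, std::size_t k) {
    std::priority_queue<Entry, std::vector<Entry>, ByLb> pq;   // min-heap on lower bound
    for (std::size_t c = 0; c < clusters.size(); ++c) {
        Vec qc = clusters[c].project(q);
        for (std::size_t i = 0; i < clusters[c].projected.size(); ++i)
            pq.push({l2(qc, clusters[c].projected[i]), c, i});
    }
    std::vector<double> best;                      // current candidate distances, sorted
    while (!pq.empty()) {
        Entry top = pq.top();
        if (best.size() == k && top.lb >= best.back()) break;   // no unexplored point can do better
        pq.pop();
        double d = l2(q, clusters[top.cluster].original[top.point]);  // refine with the full tuple
        best.insert(std::upper_bound(best.begin(), best.end(), d), d);
        if (best.size() > k) best.pop_back();
    }
    return best;
}

Because the projected distance lower-bounds the original distance, the early-termination test cannot discard a true neighbor, which is the same correctness argument as above.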

This algorithm is applied in Chapter 5 of this thesis.

APPENDIX B

CPU COST OF REGULAR K-MEANS AND ELLIPTICAL K-MEANS

The regular k-means method tends to discover clusters with spherical shapes. In applications like image processing and pattern recognition, it is often desirable to find natural clusters: data points that are locally correlated should be grouped into one cluster. The elliptical k-means algorithm discovers elliptically shaped clusters; it is based on an adaptively changing normalized Mahalanobis distance metric, as defined earlier in this thesis.

The CPU costs of the regular k-means and the elliptical k-means are compared in this appendix because the comparison was done with the exact k-NN search algorithm described in Chapter 4, not with the approximate k-NN of Chapter 3, and because it did not produce any significant results.

Figure B.1 CPU cost versus NMSE for the exact k-NN algorithm with the two k-means algorithms on TXT55; (a) and (b) correspond to two different numbers of clusters.

Figure B.1 shows the CPU costs of querying clustered datasets produced by the regular k-means and the elliptical k-means algorithms with the exact k-NN algorithm described in Chapter 4. For the TXT55 dataset, firstly, there is no big difference between the CPU costs of the two clustering methods. This is the reason that the author uses regular k-means for CSVD in most cases. Secondly, the minimum query costs are reached at different NMSE values for the two methods (0.1 for the elliptical k-means and a somewhat larger value for the regular k-means). When NMSE < 0.1, the elliptical k-means has a lower query cost; when 0.1 ≤ NMSE < 0.3, the regular k-means has a lower cost; when NMSE > 0.3, the elliptical k-means has a lower query cost again.

APPENDIX C

ANALYTIC K-NN QUERY COST MODEL

The problem of modeling the query cost of multi-dimensional index structures has been studied for many years. Faloutsos et al. [28] present a model to analyze the range query cost for the R-tree [36]. A cost model for k-NN search under two different distance metrics is then given in [47]. Both models consider the effect of correlation among dimensions by utilizing fractal dimensions; however, they are limited to low-dimensional data spaces. A cost model for high-dimensional indexing proposed in [16] takes into account the boundary effects of datasets in high-dimensional space, but it assumes the index space is overlap-free, which is impossible in high-dimensional space for most of the popular indexing structures.

The cost models to be verified in this appendix are based on the above two models; therefore, they are introduced in detail in the following sections. Since they both utilize fractal dimensions, it is necessary to first give a brief introduction to fractals and fractal dimensions.

C.1 Fractal Dimensions

A fractal is a rough or fragmented geometric shape that can be subdivided into parts, each of which is (at least approximately) a reduced-size copy of the whole. Fractals are generally self-similar and independent of scale.

There are many mathematical structures that are fractals, e.g., the Sierpinski triangle, the Koch snowflake, the Mandelbrot set and the Fern (Figure C.1). Fractals also describe many real-world objects, such as clouds, mountains, turbulence, and coastlines, that do not correspond to simple geometric shapes.

Figure C.1 (a) Sierpinski triangle; (b) Koch snowflake; (c) Mandelbrot set; (d) Fern.

A set of points is a fractal if it exhibits self-similarity over all scales. This is illustrated by an example: Figure C.2 shows the first few steps in constructing the Sierpinski triangle. Theoretically, the Sierpinski triangle is derived from an equilateral triangle ABC by excluding its middle and recursively repeating this procedure for each of the resulting smaller triangles. The resulting set of points exhibits "holes" at any scale; moreover, each smaller triangle is a miniature replica of the whole triangle. In general, the characteristic of fractals is this self-similarity property: parts of the fractal are similar (exactly or statistically) to the whole fractal. The Sierpinski triangle gives an example of points which follow a highly non-uniform but deterministic distribution. There should be a way to describe it mathematically.

Often high-dimensional datasets suffer from large differences between their embedding dimensions and fractal dimensions. Embedding dimension (E) refers to the dimensionality of the original data space, i.e., the number of attributes or features.

Figure C.2 Five steps in generating Sierpinski triangles.

Fractal dimension (d) is the intrinsic dimension of a dataset and is defined as the real dimensionality in which the points can be embedded while preserving the distances among them [23]. For example, a line embedded in a 100-dimensional space has intrinsic dimension 1 and embedding dimension 100.

The Hausdorff fractal dimension, or box-counting fractal dimension, is the basic type of fractal dimension and is defined as follows [67]: divide the E-dimensional space into hyper-cubic grid cells of side r, and let N(r) denote the number of cells that are penetrated by the fractal (i.e., that contain one or more points of it). Then the (box-counting) fractal dimension d0 of a fractal is defined as
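the standard box-counting limit (see, e.g., [67]):

d_0 = \lim_{r \to 0} \frac{\log N(r)}{\log (1/r)}.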

This definition is useful for mathematical fractals that consist of an infinite number of points. For a point set that has the self-similarity property in the range of scales (r1, r2), its Hausdorff fractal dimension d0 is measured as [28]
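the commonly used finite-scale form (cf. [28]), required to be constant over the given range of scales:

d_0 = -\,\frac{\partial \log N(r)}{\partial \log r} = \mathrm{constant}, \qquad r \in (r_1, r_2).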

Another popular fractal dimension, often used for query cost models, is the correlation fractal dimension, which is defined as follows.
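With p_i denoting the fraction of points that fall in the i-th grid cell of side r, the usual definition (cf. [11, 28]) is

D_2 = \frac{\partial \log \sum_i p_i^2}{\partial \log r} = \mathrm{constant}, \qquad r \in (r_1, r_2).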

C.2 Faloutsos's Query Cost Models

According to the studies in [28] and [11], real-world datasets also behave like fractals with linear box-count plots. Fractal dimensions can be used to estimate query cost, because the previous query models [29, 5] assume the data distribution to be uniform and independent (d = E), which is not true for real-world datasets whose fractal dimension is much smaller than their embedding dimension. The authors believed that the previous estimations of R-tree cost tend to be too pessimistic. Using the fractal dimension, Faloutsos et al. derived a function to estimate the number of page accesses for range queries on the R-tree.

In that formula, DA is the average number of nodes accessed by a query of side q on each dimension, h is the height of the tree, s is the side size of the rectangles in the R-tree, and Ceff is defined as the effective capacity of the nodes of the R-tree, i.e., the average number of entries per node (the average node utilization), while C is the maximum number of rectangles per node.

This equation can only be used for the L∞ norm, and the query is assumed to be uniformly and independently distributed. A formula is given in [47] to estimate the cost of nearest neighbor search for both the L∞ and L2 norms on R-trees using the correlation fractal dimension; there, the query can also have the same distribution as the data itself. However, for more than three dimensions an accurate expression for the L2 case is not available. Instead, the results for the L∞ norm are used to give an upper bound and a lower bound on the query cost for the L2 norm.

Assume the biased model, with square queries of radius ε. The average number of query-sensitive anchors (the centers of gravity of the query regions) for an MBR of given side length is derived first; from it, the average number of accesses to R-tree pages needed to answer a k-NN query with the L2 distance is estimated. For a range query, the average number of query-sensitive anchors for an MBR is obtained by simply replacing the k-NN distance term with ε for an ε-range query.

C.3 Böhm's Query Cost Models

Based on the above results, Böhm derives a more general query cost model in [16], which can be used for both the L∞ and L2 distance metrics and especially for high-dimensional indexing (e.g., the X-tree); again, the query can be either uniformly and independently distributed or follow the same distribution as the dataset itself.

Actually, in high-dimensional space a Minimum Bounding Rectangle (MBR) is a hyper-rectangle; however, for a range query in which the Euclidean distance is the metric that applies, the shape of the query range should be a hyper-sphere (see the example given earlier in this thesis). To estimate how many pages are accessed, one has to know the number of pages intersecting the query; therefore, it is important to calculate the intersection of a hyper-rectangle and a hyper-sphere. The Minkowski Sum [43] is utilized in [16] to enlarge the page region so that, if the original page touches any point of the query hyper-sphere, then the enlarged page touches the center point of the query sphere. The calculation of page volume in the previous models then becomes the calculation of the hyper-volume of the Minkowski enlargement.
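As a concrete special case (a standard Steiner-type formula rather than one quoted from [16]): for a hyper-cubic page with side a and an L2 query sphere of radius r in E dimensions, the Minkowski enlargement has hyper-volume

V = \sum_{i=0}^{E} \binom{E}{i}\, a^{E-i}\, V_i(r), \qquad V_i(r) = \frac{\pi^{i/2}}{\Gamma(i/2 + 1)}\, r^{i},

where V_i(r) is the volume of an i-dimensional sphere of radius r; the i = 0 term is the page volume itself and the i = E term is the full query sphere.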

Some effects of high-dimensionality are considered in that paper, including boundary effects and the large extension of the query region. Boundary effects refer to the phenomenon, occurring in high-dimensional data spaces, that all data and query points are likely to be near the boundary of the data space, while the large extension of the query region refers to the phenomenon that, as the dimensionality gets higher, the query radius of a range query becomes so large that it approaches the radius of the whole space, given that the query selectivity is the same as that in a lower-dimensionality space. The combination of the two effects leads to the observation that a large part of a typical range or k-NN query sphere must lie outside the boundary of the data space.

C.4 The Revised Query Model for Range Queries

Although Böhm's model is more complicated, the principle is actually the same as that of Faloutsos's. In both models, the query cost is proportional to a hyper-volume raised to a power involving the fractal dimension and the embedding dimension E, where the hyper-volume is that of the inflated MBR described in [47] or that of the Minkowski Sum in [16]. Therefore, the author chooses Faloutsos's cost model [47] as the basic model and improves it so that it can handle datasets of arbitrarily high dimensionality for range queries. Based on Equation C.7, a revised equation (Equation C.8) is obtained for the chosen distance metric.

C.5 Experiments

The cost model for range queries, Equation C.8, is verified using a 16-dimensional time series dataset with 100,000 signals, which is generated by a Random-Walk model, a synthetic time series generator obtained from Dr. Byoung-Kee Yi. The feature extraction method of [79] is applied to generate five feature datasets with dimensionality (E) equal to 2, 4, 6, 8 and 16. Then Equation C.8 is used to estimate the average number of pages accessed by a range query with radius ε for each dataset. All data are normalized to a unit hyper-cube.

The fractal dimension can be calculated in linear time [74]. The query costs are averaged over 1000 biased queries. The DR-tree (v2.5) library developed at the University of Maryland, with some modifications to handle ε-based search, is used to create R-trees and process queries. The page size is set to 4096 bytes.

The estimated and experimental results are listed in Table C.1. The first column is the feature dimensionality, and the next two columns show the values of d0 (the Hausdorff fractal dimension) and d2 (the Correlation fractal dimension); then come the results of the original equation, Equation C.6, and the results of Equation C.8. The last two columns are the experimental results without packing and with packing. The packing algorithm is from [49].

Compared to the experimental values, the estimates are much smaller and do not increase with the dimensionality as they should (as the experimental results do). According to Equation C.8, the cost is proportional to the hyper-volume divided by Ceff (without loss of generality, only the number of leaf nodes accessed is considered here). When the dimensionality increases, the hyper-volume drops dramatically, because ε and the side sizes are very small (much less than 1.0) due to the normalization, even though ε increases a little. Therefore, although Ceff also decreases, the cost still drops.

From Table C.1, it is also seen that, for the experimental results, the difference between packing and no packing is insignificant when the dimensionality becomes higher. This means that the packing algorithm, at least the STR algorithm of [49], is not efficient for high-dimensional data.

REFERENCES

[1] C. Aggarwal. On the effects of dimensionality reduction on high dimensional similarity search. In Proc. ACM Symposium on Principles of Database Systems (PODS), pages 256-266, 2001.

[2] C. Aggarwal and P. Yu. Finding generalized projected clusters in high dimensional space. In Proc. Conf. ACM Special Interest Group on Management of Data (SIGMOD), pages 70-81, 2000.

[3] R. Agrawal, C. Faloutsos, and A. Swami. Efficient similarity search in sequence databases. In Proc. Int. Conf. on Foundations of Data Organization and Algorithms (FODO), pages 69-84, 1993.

[4] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In Proc. Conf. ACM Special Interest Group on Management of Data (SIGMOD), pages 94-105, 1998.

[5] W. Aref and H. Samet. Optimization strategies for spatial query processing. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 81-90, 1991.

[6] M. Arya, W. F. Cody, C. Faloutsos, J. Richardson, and A. Toga. QBISM: A prototype 3-D medical image database system. IEEE Data Engineering Bulletin, 16(1):38-42, 1993.

[7] S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Y. Wu. An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. Journal of the ACM, 45(6):891-923, 1998.

[8] S. D. Backer and P. Scheunders. A competitive elliptical clustering algorithm. Pattern Recognition Letters, 20(11-13), 1999.

[9] R. Bayer and E. McCreight. Organization and maintenance of large ordered indexes. Acta Informatica, 1(3):173-189, 1972.

[10] N. Beckmann, H. P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: an efficient and robust access method for points and rectangles. In Proc. Conf. ACM Special Interest Group on Management of Data (SIGMOD), pages 322-331, 1990.

[11] A. Belussi and C. Faloutsos. Estimating the selectivity of spatial queries using the 'correlation' fractal dimension. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 299-310, 1995.

[12] A. Belussi and C. Faloutsos. Self-spatial join selectivity estimation using fractal concepts. ACM Transactions on Information Systems, 16(2):161-201, 1998.

[13] S. Berchtold, D. Keim, and H. P. Kriegel. The X-tree: an index structure for high-dimensional data. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 28-39, 1996.

[14] P. Berkhin. Survey of clustering data mining techniques. Technical report, Accrue Software, 2002.

[15] K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When is "nearest neighbor" meaningful? In Proc. International Conference on Database Theory (ICDT), pages 217-235, 1999.

[16] C. Böhm. A cost model for query processing in high-dimensional data spaces. ACM Transactions on Database Systems (TODS), 25(2):129-178, 2000.

[17] C. Böhm and H. P. Kriegel. Efficient bulk loading of large high-dimensional indexes. In Proc. Int. Conf. on Data Warehousing and Knowledge Discovery (DaWaK), 1999.

[18] V. Castelli and L. Bergman, editors. Image Databases: Search and Retrieval of Digital Imagery. John Wiley and Sons, 2002.

[19] V. Castelli, A. Thomasian, and C. S. Li. CSVD: Clustering and singular value decomposition for approximate similarity search in high dimensional spaces. IEEE Transactions on Knowledge and Data Engineering (TKDE), 15(3):671-685, 2003.

[20] K. Chakrabarti, E. Keogh, S. Mehrotra, and M. Pazzani. Locally adaptive dimensionality reduction for indexing large time series databases. ACM Transactions on Database Systems (TODS), 27(2):188-228, 2002.

[21] K. Chakrabarti and S. Mehrotra. The hybrid tree: An index structure for high dimensional feature spaces. In Proc. International Conference on Data Engineering (ICDE), pages 440-447, 1999.

[22] K. Chakrabarti and S. Mehrotra. Local dimensionality reduction: A new approach to indexing high dimensional spaces. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 89-100, 2000.

[23] E. Chávez, G. Navarro, R. Baeza-Yates, and J. L. Marroquín. Proximity searching in metric spaces. ACM Computing Surveys, 33(3):273-321, 2001.

[24] P. Ciaccia, M. Patella, and P. Zezula. M-tree: an efficient access method for similarity search in metric spaces. In Proc. 23rd Int. Conf. on Very Large Data Bases (VLDB), pages 426-435, 1997.

[25] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1-38, 1977.

[26] M. Ester, H. P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. Conf. ACM Special Interest Group on Knowledge Discovery in Data and Data Mining (SIGKDD), pages 226-231, 1996.

[27] C. Faloutsos. Searching Multimedia Databases by Content. Kluwer Academic Publishers, Boston, MA, 1996.

[28] C. Faloutsos and I. Kamel. Beyond uniformity and independence: Analysis of R-trees using the concept of fractal dimension. In Proc. ACM Symposium on Principles of Database Systems (PODS), 1994.

[29] C. Faloutsos, T. Sellis, and N. Roussopoulos. Analysis of object oriented spatial access methods. In Proc. Conf. ACM Special Interest Group on Management of Data (SIGMOD), pages 426-439, 1987.

[30] F. Farnstrom, J. Lewis, and C. Elkan. Scalability for clustering algorithms revisited. SIGKDD Explorations Newsletter, 2(1):51-57, 2000.

[31] H. Ferhatosmanoglu, E. Tuncel, D. Agrawal, and A. El Abbadi. Approximate nearest neighbor searching in multimedia databases. In Proc. International Conference on Data Engineering (ICDE), pages 503-511, 2001.

[32] R. A. Finkel and J. L. Bentley. Quad trees: A data structure for retrieval on composite keys. Acta Informatica, 4(1):1-9, 1974.

[33] V. Gaede and O. Günther. Multidimensional access methods. ACM Computing Surveys, 30(2):170-231, 1998.

[34] D. Le Gall. MPEG: a video compression standard for multimedia applications. Communications of the ACM (CACM), 34(4):46-58, 1991.

[35] S. Guha, R. Rastogi, and K. Shim. CURE: an efficient clustering algorithm for large databases. In Proc. Conf. ACM Special Interest Group on Management of Data (SIGMOD), pages 73-84, 1998.

[36] A. Guttman. R-trees: A dynamic index structure for spatial searching. In Proc. Conf. ACM Special Interest Group on Management of Data (SIGMOD), pages 47-57, 1984.

[37] G. R. Hjaltason and H. Samet. Ranking in spatial databases. In Proc. Symposium on Advances in Spatial Databases (SSD), pages 83-95, 1995.

[38] A. Jain, L. Hong, S. Pankanti, and R. Bolle. An identity-authentication system using fingerprints. Proceedings of the IEEE, 85(9):1365-1388, 1997.

[39] A. K. Jain and A. Vailaya. Image retrieval using color and shape. Pattern Recognition, 29(8):1233-1244, 1996.

[40] H. Jin, B. Ooi, H. Shen, and C. Yu. An adaptive and efficient dimensionality reduction algorithm for high-dimensional indexing. In Proc. International Conference on Data Engineering (ICDE), pages 87-100, 2003.

[41] I. T. Jolliffe. Principal Component Analysis, 2nd edition. Springer-Verlag New York, Inc., 2002.

[42] N. Katayama and S. Satoh. The SR-tree: An index structure for high dimensional nearest neighbor queries. In Proc. Conf. ACM Special Interest Group on Management of Data (SIGMOD), pages 369-380, 1997.

[43] A. Kaul and M. O'Connor. Computing Minkowski sums of regular polyhedra. Technical Report RC 18891 (82557), IBM T.J. Watson Research Center, 1992.

[44] E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra. Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information Systems, 3(3):263-286, 2001.

[45] B. Kim and S. Park. A fast k-nearest-neighbor finding algorithm based on the ordered partition. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 8(6):761-766, 1986.

[46] F. Korn, H. V. Jagadish, and C. Faloutsos. Efficiently supporting ad hoc queries in large datasets of time sequences. In Proc. Conf. ACM Special Interest Group on Management of Data (SIGMOD), pages 289-300, May 1997.

[47] F. Korn, B. Pagel, and C. Faloutsos. On the "dimensionality curse" and the "self-similarity blessing". IEEE Transactions on Knowledge and Data Engineering (TKDE), 13(1):96-111, 2001.

[48] F. Korn, N. Sidiropoulos, C. Faloutsos, E. Siegel, and Z. Protopapas. Fast nearest neighbor search in medical image databases. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 215-226, 1996.

[49] S. T. Leutenegger, J. M. Edgington, and M. A. Lopez. STR: A simple and efficient algorithm for R-tree packing. In Proc. International Conference on Data Engineering (ICDE), pages 597-605, 1997.

[50] C. Li, P. Yu, and V. Castelli. HierarchyScan: a hierarchical algorithm for similarity search of databases consisting of long sequences. In Proc. International Conference on Data Engineering (ICDE), pages 546-553, New Orleans, LA, 1996.

[51] Y. Li, A. Thomasian, and L. Zhang. An exact k-nearest neighbor search algorithm for CSVD. Technical Report ISL-2003-02, Integrated Systems Lab, Dept. of Computer Science, New Jersey Institute of Technology, 2003.

[52] Y. Li, A. Thomasian, and L. Zhang. Finding optimal subspace dimensionality for k-NN search in clustered datasets. Technical Report ISL-2004-01, Integrated Systems Lab, Dept. of Computer Science, New Jersey Institute of Technology, 2004.

[53] T. Lillesand and R. W. Kiefer. Remote Sensing and Image Interpretation. John Wiley and Sons, New York.

[54] K. Lin, H. V. Jagadish, and C. Faloutsos. The TV-tree: an index structure for high-dimensional data. The VLDB Journal, 3(4):517-542, 1994.

[55] Y. Linde, A. Buzo, and R. Gray. An algorithm for vector quantizer design. IEEE Transactions on Communications, COM-28:84-95, 1980.

[56] B. S. Manjunath and W. Y. Ma. Texture features for browsing and retrieval of image data. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 18(8):837-842, 1996.

[57] H. J. Miller and J. Han, editors. Geographic Data Mining and Knowledge Discovery. Taylor and Francis, New York, NY, 2001.

[58] W. Niblack, R. Barber, W. Equitz, M. Flickner, E. Glasman, D. Petkovic, P. Yanker, C. Faloutsos, and G. Taubin. The QBIC project: Querying images by content using color, texture, and shape. In Proc. SPIE Vol. 1908, Storage and Retrieval for Image and Video Databases, pages 173-187, 1993.

[59] J. Nievergelt, H. Hinterberger, and K. C. Sevcik. The grid file: An adaptable, symmetric multikey file structure. ACM Transactions on Database Systems (TODS), 9(1):38-71, 1984.

[60] A. Oppenheim and R. W. Schafer. Digital Signal Processing. Prentice-Hall, Englewood Cliffs, 1975.

[61] B. Pagel, H. Six, H. Toben, and P. Widmayer. Towards an analysis of range query performance in spatial data structures. In Proc. ACM Symposium on Principles of Database Systems (PODS), pages 214-221, 1993.

[62] E. Petrakis and C. Faloutsos. Similarity searching in medical image databases. IEEE Transactions on Knowledge and Data Engineering (TKDE), 9(3):435-447, 1997.

[63] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, United Kingdom, 2nd edition, 1992.

[64] J. A. Richards. Remote Sensing Digital Image Analysis: An Introduction. Wiley and Sons, New York, 1993.

[65] J. T. Robinson. The K-D-B-tree: a search structure for large multidimensional dynamic indexes. In Proc. Conf. ACM Special Interest Group on Management of Data (SIGMOD), pages 10-18, 1981.

[66] N. Roussopoulos, S. Kelley, and F. Vincent. Nearest neighbor queries. In Proc. Conf. ACM Special Interest Group on Management of Data (SIGMOD), pages 71-79, 1995.

[67] M. Schroeder. Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise. W. H. Freeman and Company, New York, 1991.

[68] T. Seidl and H. P. Kriegel. Optimal multi-step k-nearest neighbor search. In Proc. Conf. ACM Special Interest Group on Management of Data (SIGMOD), pages 154-165, 1998.

[69] T. Sellis, N. Roussopoulos, and C. Faloutsos. The R+-tree: a dynamic index for multi-dimensional objects. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 507-518, 1987.

[70] J. R. Smith. Integrated Spatial and Feature Image System: Retrieval, Analysis and Compression. PhD thesis, Electrical Engineering Dept., Columbia University, 1997.

[71] K. Sung and T. Poggio. Example-based learning for view-based human face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 20(1):39-51, 1998.

[72] A. Thomasian, V. Castelli, and C. S. Li. RCSVD: Recursive clustering and singular value decomposition for approximate high-dimensionality indexing. In Proc. Conf. on Information and Knowledge Management (CIKM), pages 267-272, 1998.

[73] A. Thomasian, Y. Li, and L. Zhang. Performance comparison of local dimensionality reduction methods. Technical Report ISL-2003-01, Integrated Systems Lab, Dept. of Computer Science, New Jersey Institute of Technology, 2003.

[74] C. Traina Jr., A. J. M. Traina, L. Wu, and C. Faloutsos. Fast feature selection using fractal dimension. In Brazilian Symposium on Databases (SBBD), October 2000.

[75] W. Wang, J. Yang, and R. Muntz. STING: a statistical information grid approach to spatial data mining. In Proc. 23rd Int. Conf. on Very Large Data Bases (VLDB), pages 186-195, 1997.

[76] R. Weber, H. J. Schek, and S. Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 194-205, 1998.

[77] D. White and R. Jain. Similarity indexing with SS-trees. In Proc. 12th International Conference on Data Engineering (ICDE), pages 516-523, 1996.

[78] H. Williams and J. Zobel. Indexing nucleotide databases for fast query evaluation. In Proc. 5th Int. Conf. on Extending Database Technology (EDBT), pages 275-288, 1996.

[79] B. Yi and C. Faloutsos. Fast time sequence indexing for arbitrary Lp norms. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 385-394, 2000.

[80] C. Yu, B. C. Ooi, K. Tan, and H. V. Jagadish. Indexing the distance: An efficient method to kNN processing. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 421-430, 2001.

[81] D. Zhang and G. Lu. Evaluation of similarity measurement for image retrieval. In IEEE Int. Conf. on Neural Networks and Signal Processing, pages 928-931, 2003.

[82] T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: an efficient data clustering method for very large databases. In Proc. Conf. ACM Special Interest Group on Management of Data (SIGMOD), pages 103-114, 1996.