Multi-Dimensional Index Structures

Multi-Dimensional Index Structures

Multi-dimensional Index Structures Krzysztof Dembczy´nski Intelligent Decision Support Systems Laboratory (IDSS) Pozna´nUniversity of Technology, Poland Software Development Technologies Master studies, second semester Academic year 2017/18 (winter course) 1 / 40 Review of the previous lectures • Mining of massive datasets • Classification and regression • Evolution of database systems • MapReduce 2 / 40 Outline 1 Motivation 2 Hash Structures for Multidimensional data 3 Tree Structures for Multidimensional Data 4 Summary 3 / 40 Outline 1 Motivation 2 Hash Structures for Multidimensional data 3 Tree Structures for Multidimensional Data 4 Summary 4 / 40 Multi-dimensional structures • Conventional index structures are one dimensional and are not suitable for multi-dimensional search queries. age salary 5 / 40 I Geographic Information Systems (GIS): where-am-I queries. I Computer vision: find the most similar picture. I Learning: decision trees, rules, nearest neighbors. I Recommender systems: find the most similar users/items. I Similarity of documents: plagiarism, mirror pages, articles from the same source. I ... Multi-dimensional structures • Typical applications: 6 / 40 I Computer vision: find the most similar picture. I Learning: decision trees, rules, nearest neighbors. I Recommender systems: find the most similar users/items. I Similarity of documents: plagiarism, mirror pages, articles from the same source. I ... Multi-dimensional structures • Typical applications: I Geographic Information Systems (GIS): where-am-I queries. 6 / 40 I Learning: decision trees, rules, nearest neighbors. I Recommender systems: find the most similar users/items. I Similarity of documents: plagiarism, mirror pages, articles from the same source. I ... Multi-dimensional structures • Typical applications: I Geographic Information Systems (GIS): where-am-I queries. I Computer vision: find the most similar picture. 6 / 40 I Recommender systems: find the most similar users/items. I Similarity of documents: plagiarism, mirror pages, articles from the same source. I ... Multi-dimensional structures • Typical applications: I Geographic Information Systems (GIS): where-am-I queries. I Computer vision: find the most similar picture. I Learning: decision trees, rules, nearest neighbors. 6 / 40 I Similarity of documents: plagiarism, mirror pages, articles from the same source. I ... Multi-dimensional structures • Typical applications: I Geographic Information Systems (GIS): where-am-I queries. I Computer vision: find the most similar picture. I Learning: decision trees, rules, nearest neighbors. I Recommender systems: find the most similar users/items. 6 / 40 I ... Multi-dimensional structures • Typical applications: I Geographic Information Systems (GIS): where-am-I queries. I Computer vision: find the most similar picture. I Learning: decision trees, rules, nearest neighbors. I Recommender systems: find the most similar users/items. I Similarity of documents: plagiarism, mirror pages, articles from the same source. 6 / 40 Multi-dimensional structures • Typical applications: I Geographic Information Systems (GIS): where-am-I queries. I Computer vision: find the most similar picture. I Learning: decision trees, rules, nearest neighbors. I Recommender systems: find the most similar users/items. I Similarity of documents: plagiarism, mirror pages, articles from the same source. I ... 6 / 40 I Partial match queries: for specified values for one or more dimensions find all points matching those values in those dimensions: where salary = 5000 and age = 30 I Range queries: for specified ranges for one or more dimensions find all the points within those ranges: where salary between 3500 and 5000 and age between 25 and 35 I Nearest-neighbor queries: find the closest one or more points to a given point. I Where-am-I queries: for a given point, where this point is located (in which shape). Multi-dimensional structures • Typical types of multi-dimensional queries: 7 / 40 I Range queries: for specified ranges for one or more dimensions find all the points within those ranges: where salary between 3500 and 5000 and age between 25 and 35 I Nearest-neighbor queries: find the closest one or more points to a given point. I Where-am-I queries: for a given point, where this point is located (in which shape). Multi-dimensional structures • Typical types of multi-dimensional queries: I Partial match queries: for specified values for one or more dimensions find all points matching those values in those dimensions: where salary = 5000 and age = 30 7 / 40 I Nearest-neighbor queries: find the closest one or more points to a given point. I Where-am-I queries: for a given point, where this point is located (in which shape). Multi-dimensional structures • Typical types of multi-dimensional queries: I Partial match queries: for specified values for one or more dimensions find all points matching those values in those dimensions: where salary = 5000 and age = 30 I Range queries: for specified ranges for one or more dimensions find all the points within those ranges: where salary between 3500 and 5000 and age between 25 and 35 7 / 40 I Where-am-I queries: for a given point, where this point is located (in which shape). Multi-dimensional structures • Typical types of multi-dimensional queries: I Partial match queries: for specified values for one or more dimensions find all points matching those values in those dimensions: where salary = 5000 and age = 30 I Range queries: for specified ranges for one or more dimensions find all the points within those ranges: where salary between 3500 and 5000 and age between 25 and 35 I Nearest-neighbor queries: find the closest one or more points to a given point. 7 / 40 Multi-dimensional structures • Typical types of multi-dimensional queries: I Partial match queries: for specified values for one or more dimensions find all points matching those values in those dimensions: where salary = 5000 and age = 30 I Range queries: for specified ranges for one or more dimensions find all the points within those ranges: where salary between 3500 and 5000 and age between 25 and 35 I Nearest-neighbor queries: find the closest one or more points to a given point. I Where-am-I queries: for a given point, where this point is located (in which shape). 7 / 40 Multi-dimensional queries with conventional indexes • Consider a range query: where salary between 3500 and 5000 and age between 25 and 35 age age age salary salary salary • To answer the query: I Scan along either index at once, I Intersect the elements returned by indexes • This approach produces many false hits on each index! 8 / 40 I Given a query point q scan through each of n data points in database I Computational complexity for 1-NN query: O(n). I Computational complexity for k-NN query: O(n log k)! • With large databases linear complexity can be too costly. • Can we do better? Nearest neighbor queries • Brute force search: 9 / 40 I Computational complexity for 1-NN query: O(n). I Computational complexity for k-NN query: O(n log k)! • With large databases linear complexity can be too costly. • Can we do better? Nearest neighbor queries • Brute force search: I Given a query point q scan through each of n data points in database 9 / 40 O(n). I Computational complexity for k-NN query: O(n log k)! • With large databases linear complexity can be too costly. • Can we do better? Nearest neighbor queries • Brute force search: I Given a query point q scan through each of n data points in database I Computational complexity for 1-NN query: 9 / 40 I Computational complexity for k-NN query: O(n log k)! • With large databases linear complexity can be too costly. • Can we do better? Nearest neighbor queries • Brute force search: I Given a query point q scan through each of n data points in database I Computational complexity for 1-NN query: O(n). 9 / 40 O(n log k)! • With large databases linear complexity can be too costly. • Can we do better? Nearest neighbor queries • Brute force search: I Given a query point q scan through each of n data points in database I Computational complexity for 1-NN query: O(n). I Computational complexity for k-NN query: 9 / 40 • With large databases linear complexity can be too costly. • Can we do better? Nearest neighbor queries • Brute force search: I Given a query point q scan through each of n data points in database I Computational complexity for 1-NN query: O(n). I Computational complexity for k-NN query: O(n log k)! 9 / 40 • Can we do better? Nearest neighbor queries • Brute force search: I Given a query point q scan through each of n data points in database I Computational complexity for 1-NN query: O(n). I Computational complexity for k-NN query: O(n log k)! • With large databases linear complexity can be too costly. 9 / 40 Nearest neighbor queries • Brute force search: I Given a query point q scan through each of n data points in database I Computational complexity for 1-NN query: O(n). I Computational complexity for k-NN query: O(n log k)! • With large databases linear complexity can be too costly. • Can we do better? 9 / 40 I There is no point within the selected range. I The closest point within the range might not be the closest point overall. • There are two situations we need to take into account: Nearest neighbor queries • To solve the nearest neighbor search one can ask the range query and select the point closest to the target within that range. age salary 10 / 40 I There is no point within the selected range. I The closest point within the range might not be the closest point overall. • There are two situations we need to take into account: Nearest neighbor queries • To solve the nearest neighbor search one can ask the range query and select the point closest to the target within that range.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    136 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us