IEEE International Conference on Data Engineering Computing Distance Histograms Efficiently in Scientific Databases Yi-Cheng Tu 1, Shaoping Chen 2, and Sagar Pandit 3 1Department of Computer Science and Engineering, University of South Florida 4202 E. Fowler Ave., ENB118, Tampa, FL 33620, U.S.A.
[email protected] 2Department of Mathematics, Wuhan University of Technology 122 Luosi Road, Wuhan, Hubei, 430070, P. R. China
[email protected] 3Department of Physics, University of South Florida 4202 E. Fowler Ave., PHY114, Tampa, FL 33620, U.S.A.
[email protected] Abstract— Particle simulation has become an important re- However, many of such basic analytical queries require super- search tool in many scientific and engineering fields. Data gen- linear processing time if handled in a straightforward way, erated by such simulations impose great challenges to database as they are in current scientific databases. In this paper, we storage and query processing. One of the queries against particle simulation data, the spatial distance histogram (SDH) query, is report our efforts to design efficient algorithms for a type of the building block of many high-level analytics, and requires query that is extremely important in the analysis of particle quadratic time to compute using a straightforward algorithm. simulation data. In this paper, we propose a novel algorithm to compute SDH Particle simulations are computer simulations in which the based on a data structure called density map, which can be easily basic components of large systems (e.g., atoms, molecules, implemented by augmenting a Quad-tree index. We also show the results of rigorous mathematical analysis of the time complexity stars, galaxies ...) are treated as classical entities (i.e., particles) 3 of the proposed algorithm: our algorithm runs on Θ(N 2 ) for that interact for certain duration under postulated empirical 5 two-dimensional data and Θ(N 3 ) for three-dimensional data, forces.